Forecasting Malaria Morbidity to 2036 Based on Geo-Climatic Factors in the Democratic Republic of Congo

Background: Malaria is a global burden in terms of morbidity and mortality. In the Democratic Republic of Congo, malaria prevalence is increasing due to strong climatic variations. Reductions in malaria morbidity and mortality, the fight against climate change, good health and well-being constitute key development aims as set by the United Nations Sustainable Development Goals (SDGs). This study aims to predict malaria morbidity to 2036 in relation to climate variations between 2001 and 2019, which may serve as a basis to develop an early warning system that integrates monitoring of rainfall and temperature trends and early detection of anomalies in weather patterns. Methods: Meteorological data were collected at the Mettelsat and the database of the Epidemiological Surveillance Directorate including all malaria cases registered in the surveillance system based on positive blood test results, either by microscopy or by a rapid diagnostic test for malaria, was used to estimate malaria morbidity and mortality by province of the DRC from 2001 to 2019. Malaria prevalence and mortality rates by year and province using direct standardization and mean annual percentage change were calculated using DRC mid-year populations. Time series combining several predictive models were used to forecast malaria epidemic episodes to 2036. Finally, the impact of climatic factors on malaria morbidity was modeled using multivariate time series analysis. Results: The geographical distribution of malaria prevalence from 2001 and 2019 shows strong disparities between provinces with the highest of 7700 cases per 100,000 people at risk for South Kivu. In the northwest, malaria prevalence ranges from 4980 to 7700 cases per 100,000 people at risk. Malaria has been most deadly in Sankuru with a case-fatality rate of 0.526%, followed by Kasai (0.430%), Kwango (0.415%), Bas-Uélé, (0.366%) and Kwilu (0.346%), respectively. However, the stochastic trend model predicts an average annual increase of 6024.07 malaria cases per facility with exponential growth in epidemic waves over the next 200 months of the study. This represents an increase of 99.2%. There was overwhelming evidence of associations between geographic location (western, central and northeastern region of the country), total evaporation under shelter, maximum daily temperature at two meters altitude and malaria morbidity (p < 0.0001). Conclusions: The stochastic trends in our time series observed in this study suggest an exponential increase in epidemic waves over the next 200 months of the study. The increase in new malaria cases is statistically related to population density, average number of rainy days, average wind speed, and unstable and intermediate epidemiological facies. Therefore, the results of this research should provide relevant information for the Congolese government to respond to malaria in real time by setting up a warning system integrating the monitoring of rainfall and temperature trends and early detection of anomalies in weather patterns.


Type of Study and Data Collection
This was a retrospective longitudinal ecological study. We exploited the Mettelsat database for meteorological data on the one hand and the Epidemiological Surveillance Directorate (ESD) database including malaria data from 2001 to 2019 on the other hand.
The ESD database is centralized data of 26 provincial health directorates of the DRC including all malaria cases registered in the surveillance system based on positive blood test results, either by microscopy or by a rapid diagnostic test for malaria, which was used to estimate malaria morbidity by province of the DRC.
Climatic data, including total rainfall quantity, temperature, wind speed, total evaporation under shelter, and average relative humidity, were measured on a permanent basis using different weather stations in the DRC. The distribution of these stations takes into account the variability of the different eco-zones with respect to their climate, their proximity to rivers and altitude. Geographic data are collected at the National Geographic Institute including longitude, latitude, altitude, seasons and hydrography ( Figure 2). The United Nations World Population Prospects estimates the 2022population of DRC to be 95,240,792 inhabitants with a population growth rate of 3.19%, the highest in the world and the population projected to surpass 100 million in 2024 and double its population by 2047. The fertility rates in DRC of 6.11 births per woman is also among the highest in the world. However, since the invasion of DRC in 1996 by neighbouring countries mainly Rwanda and Uganda and the second war of invasion in 1998, also called the African world war, the deadliest conflict on earth since the Chinese Civil War with an estimated over 6 million civilian deaths from 1993 to 2003 alone and 47% of the deaths were children under 5. In the economic war in DRC, which is still ongoing, 90% of soldiers died from malaria and disease and displaced a huge percentage of the population, which makes the DRC one of the most challenging environments for health development in Africa with many issues that a rapidly increasing population can exacerbate such as ongoing conflict, lack of modern health services in addition to major health problems, the EBOLA pandemic, HIV/AIDs as well as rape, human trafficking and child labor [10,11].

Type of Study and Data Collection
This was a retrospective longitudinal ecological study. We exploited the Mettelsat database for meteorological data on the one hand and the Epidemiological Surveillance Directorate (ESD) database including malaria data from 2001 to 2019 on the other hand.
The ESD database is centralized data of 26 provincial health directorates of the DRC including all malaria cases registered in the surveillance system based on positive blood test results, either by microscopy or by a rapid diagnostic test for malaria, which was used to estimate malaria morbidity by province of the DRC.
Climatic data, including total rainfall quantity, temperature, wind speed, total evaporation under shelter, and average relative humidity, were measured on a permanent basis using different weather stations in the DRC. The distribution of these stations takes into account the variability of the different eco-zones with respect to their climate, their proximity to rivers and altitude. Geographic data are collected at the National Geographic Institute including longitude, latitude, altitude, seasons and hydrography ( Figure 2). This figure shows that malaria and climate data were collected separately and then merged for the analyses.

Sampling
Sampling was conducted in three stages as follows: In the first stage, we drew lots from the Provincial Health Directorates (PHD) with at least one weather station. In total, the DRC has 26 PHD. For the malaria data, 13,332,072 observations were selected for our analysis. These are 26 provinces × 19 years × 52 epidemiological weeks × 519 Health Zones or the total of 13,332,072. In the second stage, we selected the health zones with meteorological data. We reduced the observations by grouping them into 26 provinces, 19 years, 52 epidemiological weeks. Outliers and incomplete data reported by some health zones represented less than one percent for the 26 provinces from 2001 to 2019 ( Figure 3). Outliers were removed during the data cleaning process. Missing data were replaced by the mean value imputation technique. In the third stage, we have the malaria cases completely and correctly recorded in the database. The data were reduced in terms of 26 provinces and 19 years. This gives a total of 494. This allowed us to forecast malaria prevalence to 2036 ( Figure 3).

Statistical Methods
The raw data from the different sources mentioned above were cleaned, recoded and merged into a single Excel file. The imputation technique of mean values was used to fill in missing data. The data were analyzed using Stata, R for Windows 4.1.3, R-studio This figure shows that malaria and climate data were collected separately and then merged for the analyses.

Sampling
Sampling was conducted in three stages as follows: In the first stage, we drew lots from the Provincial Health Directorates (PHD) with at least one weather station. In total, the DRC has 26 PHD. For the malaria data, 13,332,072 observations were selected for our analysis. These are 26 provinces × 19 years × 52 epidemiological weeks × 519 Health Zones or the total of 13,332,072. In the second stage, we selected the health zones with meteorological data. We reduced the observations by grouping them into 26 provinces, 19 years, 52 epidemiological weeks. Outliers and incomplete data reported by some health zones represented less than one percent for the 26 provinces from 2001 to 2019 ( Figure 3). Outliers were removed during the data cleaning process. Missing data were replaced by the mean value imputation technique. In the third stage, we have the malaria cases completely and correctly recorded in the database. The data were reduced in terms of 26 provinces and 19 years. This gives a total of 494. This allowed us to forecast malaria prevalence to 2036 ( Figure 3). This figure shows that malaria and climate data were collected separately and then merged for the analyses.

Sampling
Sampling was conducted in three stages as follows: In the first stage, we drew lots from the Provincial Health Directorates (PHD) with at least one weather station. In total, the DRC has 26 PHD. For the malaria data, 13,332,072 observations were selected for our analysis. These are 26 provinces × 19 years × 52 epidemiological weeks × 519 Health Zones or the total of 13,332,072. In the second stage, we selected the health zones with meteorological data. We reduced the observations by grouping them into 26 provinces, 19 years, 52 epidemiological weeks. Outliers and incomplete data reported by some health zones represented less than one percent for the 26 provinces from 2001 to 2019 ( Figure 3). Outliers were removed during the data cleaning process. Missing data were replaced by the mean value imputation technique. In the third stage, we have the malaria cases completely and correctly recorded in the database. The data were reduced in terms of 26 provinces and 19 years. This gives a total of 494. This allowed us to forecast malaria prevalence to 2036 ( Figure 3).

Statistical Methods
The raw data from the different sources mentioned above were cleaned, recoded and merged into a single Excel file. The imputation technique of mean values was used to fill in missing data. The data were analyzed using Stata, R for Windows 4.1.3, R-studio

Statistical Methods
The raw data from the different sources mentioned above were cleaned, recoded and merged into a single Excel file. The imputation technique of mean values was used to fill in missing data. The data were analyzed using Stata, R for Windows 4.1.3, R-studio 2022.02.3 (Vienna, Austria) and ARC-GIS 10.2 (ESRI, Redlands, CA, USA). Flat sorting and calculations of measures of association were performed on the collected data. Variables were compared using Spearman correlation analysis. Finally, the modeling and forecasting of malaria epidemic episodes to 2036 was done by combining several time series models as follows: Let X t be a time series with the following general additive structure: s t is the seasonality, i.e., a periodic component of period d such that ∃d, ∀t, s t+d + s t . Seasonality corresponds to pure seasonal variations. ε t is the residual or noise of the time series.
The trend and seasonality correspond to the deterministic part of the series, the noise is the random component.
The study of time series has two aspects, on the one hand the extraction of the deterministic character and on the other hand the determination of a model for the noise. These two steps are necessary to be able to make forecasts.
To evaluate the predictive performance of a model, it is common to divide the available data into a "training set" and a "test set", i.e., to separate the data set into two parts: one to estimate the number and value of the model's parameters, the other to test the model's predictions [10] (Figure 4). The use of a test set allows us to know if the model makes relevant predictions when confronted with new data. In this case, we have divided our data series into a training set and a test set of sizes 90% (or 205 months) and 10% (or 23 months), respectively. Thus, we calculated the forecast errors on a test data set size of 228 months (19 years at a frequency of 12 months). Sequential ensemble methods were used to select the training set and the test set.
model's predictions [10] (Figure 4). The use of a test set allows us to know if the model makes relevant predictions when confronted with new data. In this case, we have divided our data series into a training set and a test set of sizes 90% (or 205 months) and 10% (or 23 months), respectively. Thus, we calculated the forecast errors on a test data set size of 228 months (19 years at a frequency of 12 months). Sequential ensemble methods were used to select the training set and the test set.
The most commonly used forecast validation metrics are the MAE (Mean Absolute Error) and the RMSE (Root Mean Squared Error) whose formulas are as follows: With: n = number of data ŷi = predicted data yi = reserved data Optimal model selection [12,13].  The most commonly used forecast validation metrics are the MAE (Mean Absolute Error) and the RMSE (Root Mean Squared Error) whose formulas are as follows: With: n = number of datâ yi = predicted data yi = reserved data Optimal model selection [12,13]. The Box-Jenkins SARIMA approach and the combination of several forecasting models (ETS, ARIMA, STL-ETS, NNAR, and TBATS) was implemented. Using malaria case count data from the last 19 years (228 months) of observations, we model and forecast malaria episodes to 2036.

Spatial and Temporal Distribution of Malaria Morbidity and Mortality
Malaria prevalence in South Kivu was 7700 cases per 100,000 people at risk over the past 19 years. Several provinces reported between 4980 and 7700 cases per 100,000 people exposed. In the northwestern part, there is Kinshasa, Equateur and North-Ubangi. In the center of the country, we find Kasai Central and Oriental. In the southeastern part of the country, there is Haut-Uélé and Sud-Kivu. The orange color symbolizes the malaria prevalence ranging from 4310 to 4980 cases per 100,000 exposed persons ( Figure 5).

Spatial and Temporal Distribution of Malaria Morbidity and Mortality
Malaria prevalence in South Kivu was 7700 cases per 100,000 people at risk over the past 19 years. Several provinces reported between 4980 and 7700 cases per 100,000 people exposed. In the northwestern part, there is Kinshasa, Equateur and North-Ubangi. In the center of the country, we find Kasai Central and Oriental. In the southeastern part of the country, there is Haut-Uélé and Sud-Kivu. The orange color symbolizes the malaria prevalence ranging from 4310 to 4980 cases per 100,000 exposed persons ( Figure 5).  Malaria killed the most people in Sankuru with a case fatality rate of 0.526%, followed by Kasai (0.430%), Kwango (0.415%), Bas-Uélé (0.366%) and Kwilu (0.346%).

Temperatures (in °C)
Describing the variability of temperatures, it is very useful to carry out a detailed analysis of the results obtained with these variables. In this case, it is necessary to determine the trends in temperature values over the 19-  Malaria killed the most people in Sankuru with a case fatality rate of 0.526%, followed by Kasai (0.430%), Kwango (0.415%), Bas-Uélé (0.366%) and Kwilu (0.346%).

Temperatures (in • C)
Describing the variability of temperatures, it is very useful to carry out a detailed analysis of the results obtained with these variables. In this case, it is necessary to determine the trends in temperature values over the 19  Distributed by geographical entities, it is clear that the average temperatures are divided into three regions. The northwestern part is the warmest with average temperatures ranging from 25.4 °C to 26.8 °C. In Tshopo and the center of the DRC, temperatures range from 24.9 to 25.6 °C. Finally, it is between 18.9 and 24.9 °C in the southeastern part of the country. Over the course of the year, temperatures begin to rise gradually from January to May. These increases are between about 1 °C and 2.7 °C. Then, a significant drop is seen between June and August. This cycle starts again from September to January of the next year. Other maps showing weather variables and malaria forecast maps can be found in the Appendices A and B ( Figures A1-A12). Distributed by geographical entities, it is clear that the average temperatures are divided into three regions. The northwestern part is the warmest with average temperatures ranging from 25.4 • C to 26.8 • C. In Tshopo and the center of the DRC, temperatures range from 24.9 to 25.6 • C. Finally, it is between 18.9 and 24.9 • C in the southeastern part of the country. Over the course of the year, temperatures begin to rise gradually from January to May. These increases are between about 1 • C and 2.7 • C. Then, a significant drop is seen between June and August. This cycle starts again from September to January of the next year. Other maps showing weather variables and malaria forecast maps can be found in the Appendices A and B ( Figures A1-A12).

Time Series Profile
The different steps of the stochastic modeling of the Malaria series are the following: To begin, let us present the temporal evolution of all the variables of the study in the following Charts ( Figure 7):

Time Series Profile
The different steps of the stochastic modeling of the Malaria series are the following: (1) Identification of the series, (2) Diagnosis of the series, (3) Choice of model, (4) Diagnosis of the quality of the model, (5) Adjustment of the model, (6) Validation of the model, (7) Forecasting.
To begin, let us present the temporal evolution of all the variables of the study in the following Charts ( Figure 7): All of the study variables follow an oscillating pattern over the past 19 years. A general trend, probably driven by weather parameters, is detected as well as cyclical trends. Later on, we will see how to predict future occurrences, and the uncertainty of these predictions, from the recorded data.

Identification of the Time Series
We use the "Palmet" series, which represents the number of malaria cases between January 2001 and December 2019 in DRC ( Figure 8). All of the study variables follow an oscillating pattern over the past 19 years. A general trend, probably driven by weather parameters, is detected as well as cyclical trends. Later on, we will see how to predict future occurrences, and the uncertainty of these predictions, from the recorded data.

Identification of the Time Series
We use the "Palmet" series, which represents the number of malaria cases between January 2001 and December 2019 in DRC ( Figure 8). An oscillating trend is observed between 2001 and 2004. In contrast, a linear and increasing trend is observed between 2005 and 2019. The variance of malaria cases does not always appear to be constant around the seasonal mean in the short term, but it remains relatively stable over longer periods. The annual cycle is also clearly repetitive. This Chart confirms the sharp increase in malaria episodes in the first and last quarter. The downward trend is observed in the third quarter.

Modeling Choices
In order to model the stochastic part of the series, we start by removing the deterministic part (trend + seasonality). We can then identify and model the stochastic part. In this case, we use the moving average method. We estimate the trend and the seasonality in a non-parametric way by the moving average method ( Figure 9).
The "Palmet" series follows a linear trend over the last 19 years. However, there are seasonal fluctuations that oscillate around the general trend on a regular basis during the study period. After decomposing the Palmet series, the random (stochastic) component shows low variabilities around the mean. This justifies a modeling of the Stochastic part. An oscillating trend is observed between 2001 and 2004. In contrast, a linear and increasing trend is observed between 2005 and 2019. The variance of malaria cases does not always appear to be constant around the seasonal mean in the short term, but it remains relatively stable over longer periods. The annual cycle is also clearly repetitive. This Chart confirms the sharp increase in malaria episodes in the first and last quarter. The downward trend is observed in the third quarter.

Modeling Choices
In order to model the stochastic part of the series, we start by removing the deterministic part (trend + seasonality). We can then identify and model the stochastic part. In this case, we use the moving average method. We estimate the trend and the seasonality in a non-parametric way by the moving average method (Figure 9). Over the past 19 years, malaria episodes have shown high monthly variability around the average, with a peak in April. It decreases to lowest values in June, but rises again to maximum in July, September, October and November ( Figure 10).

Diagnosis of Seasonality
To diagnose seasonality, we used the autocorrelation test. Just as correlation measures the extent of a linear relationship between two variables, autocorrelation measures the linear relationship between lagged values in a time series. Palmet: Our forecasting model was named weather malaria (in French "paludisme météorologie").
The "Palmet" series follows a linear trend over the last 19 years. However, there are seasonal fluctuations that oscillate around the general trend on a regular basis during the study period. After decomposing the Palmet series, the random (stochastic) component shows low variabilities around the mean. This justifies a modeling of the Stochastic part.
Over the past 19 years, malaria episodes have shown high monthly variability around the average, with a peak in April. It decreases to lowest values in June, but rises again to maximum in July, September, October and November ( Figure 10). Over the past 19 years, malaria episodes have shown high monthly variability around the average, with a peak in April. It decreases to lowest values in June, but rises again to maximum in July, September, October and November ( Figure 10).

Diagnosis of Seasonality
To diagnose seasonality, we used the autocorrelation test. Just as correlation measures the extent of a linear relationship between two variables, autocorrelation measures the linear relationship between lagged values in a time series.
If the test of stationarity of the series were confirmed, by "difference", the second step in the diagnosis of the series would be the analysis of autocorrelations. Two indicators are then used: Total autocorrelation (ACF) and partial autocorrelation (PACF). These indicators can be examined by Charts or by numerical analysis.
There are several autocorrelation coefficients, corresponding to each panel in the shift plot. For example, r 1 measures the relationship between y t and y t−1 , r 2 measures the relationship between y t and y t−2 and so on.
The top ten autocorrelation coefficients for the mean annual malaria cases data are shown in the following Table 1: The Box-Pierce test was used to deduce the significance of total autocorrelations (ACF) at the 5% significance level.
The null hypothesis of these tests is H0: suggests the absence of autocorrelation (between 1 and the maximum lag indicated in parameter). Below is the implementation of the Box-Pierce test in R: data: Palu X-squared = 2284.5, df = 24, p-value < 2.2 × 10 −16 .
We reject H0 because p-value = 2.2 × 10 −16 << 5%. The autocorrelation coefficients are plotted to show the autocorrelation function known as a correlogram.
For the total autocorrelation (ACF), we see that r 1 is higher than for the other lags. This is due to the seasonal pattern of the data: peaks tend to be one quarter apart and troughs one quarter apart. All total autocorrelations (ACF) are significant. As for partial autocorrelation (PACF), r 6 , r 8 , r 9 and r 10 are more negative than for the other lags because the troughs tend to be two quarters behind the peaks.

Charts Visualization: Correlograms
A total autocorrelation (a partial autocorrelation) is significant at a lag of order p (resp. order q) when the "bar" at that lag falls outside the confidence interval (except for the lag of order 0, the first bar). It can be seen that for the total autocorrelation (ACF), autocorrelations of order 1 to 10 are significant. For PACF, the significant partial autocorrelations are those of order 1 to 2 ( Figure 11). If the test of stationarity of the series were confirmed, by "difference", the second step in the diagnosis of the series would be the analysis of autocorrelations. Two indicators are then used: Total autocorrelation (ACF) and partial autocorrelation (PACF). These indicators can be examined by Charts or by numerical analysis.
There are several autocorrelation coefficients, corresponding to each panel in the shift plot. For example, r1 measures the relationship between yt and yt−1, r2 measures the relationship between yt and yt−2 and so on.
The top ten autocorrelation coefficients for the mean annual malaria cases data are shown in the following Table 1: The Box-Pierce test was used to deduce the significance of total autocorrelations (ACF) at the 5% significance level.
The null hypothesis of these tests is H0: suggests the absence of autocorrelation (between 1 and the maximum lag indicated in parameter). Below is the implementation of the Box-Pierce test in R: data: Palu X-squared = 2284.5, df = 24, p-value < 2.2 × 10 −16 .
We reject H0 because p-value = 2.2 × 10 −16 << 5%. The autocorrelation coefficients are plotted to show the autocorrelation function known as a correlogram.
For the total autocorrelation (ACF), we see that r1 is higher than for the other lags. This is due to the seasonal pattern of the data: peaks tend to be one quarter apart and troughs one quarter apart. All total autocorrelations (ACF) are significant. As for partial autocorrelation (PACF), r6, r8, r9 and r10 are more negative than for the other lags because the troughs tend to be two quarters behind the peaks.

Charts Visualization: Correlograms
A total autocorrelation (a partial autocorrelation) is significant at a lag of order p (resp. order q) when the "bar" at that lag falls outside the confidence interval (except for the lag of order 0, the first bar). It can be seen that for the total autocorrelation (ACF), autocorrelations of order 1 to 10 are significant. For PACF, the significant partial autocorrelations are those of order 1 to 2 ( Figure 11). The performance of the predictions depends on the size of the prediction errors. It is a matter of comparing the predictions with the actual observed values. The smaller the training data, the better the method performs (Table 2). It is obvious from the Charts below that the naive seasonal method is the best for these data, although it can still be improved, as we will discover later. Sometimes different measures of accuracy lead to different results as to which forecasting method is more appropriate. However, in this case, all results indicate that the seasonal naive method is the best of the three methods for this data set ( Figure 12). The results can be seen in Charts below (Table 3). The performance of the predictions depends on the size of the prediction errors. It is a matter of comparing the predictions with the actual observed values. The smaller the training data, the better the method performs (Table 2). It is obvious from the Charts below that the naive seasonal method is the best for these data, although it can still be improved, as we will discover later. Sometimes different measures of accuracy lead to different results as to which forecasting method is more appropriate. However, in this case, all results indicate that the seasonal naive method is the best of the three methods for this data set ( Figure 12). The results can be seen in Charts below (Table 3). Here, the best method is the seasonal naive method (regardless of the accuracy measure used).  Here, the best method is the seasonal naive method (regardless of the accuracy measure used).

Forecast Combinations
Using malaria case count data from the last 19 years (228 months) of observations, we modeled and predicted future malaria infestations for the next 200 months using forecasts from the following models: ETS, ARIMA, STL-ETS, NNAR, and TBATS; and we compare the results (Figure 13).
Using malaria case count data from the last 19 years (228 months) of observations, we modeled and predicted future malaria infestations for the next 200 months using forecasts from the following models: ETS, ARIMA, STL-ETS, NNAR, and TBATS; and we compare the results (Figure 13). The combined model approach predicts an exponential increase in malaria episodes over the next 16 years in DRC. The combinatorial approach was used to improve the accuracy of the forecasts.

Predictive Models
The NNAR is particularly well suited to this series while the combined approach is twice as good as the component method. In the next 16 years, there will be an average increase of 309,198 new malaria infestations per year per province (Table 3).

Final Model Selection
Before selecting the final model for the prediction, several models are proposed and compared ( Table 4).
The time series models presented above allow for the inclusion of information from past observations of a series, but not for the inclusion of other information that might also be relevant. If we want to include external variables that can explain some of the historical variation and can lead to more accurate forecasts, we need to exploit dynamic regression models that take into account the subtle variation in time series. In this case, we will use the deterministic and stochastic model and then compare them. A deterministic trend is obtained by using the regression model: where ηt is an ARMA process.  The combined model approach predicts an exponential increase in malaria episodes over the next 16 years in DRC. The combinatorial approach was used to improve the accuracy of the forecasts.

Predictive Models
The NNAR is particularly well suited to this series while the combined approach is twice as good as the component method. In the next 16 years, there will be an average increase of 309,198 new malaria infestations per year per province (Table 3).

Final Model Selection
Before selecting the final model for the prediction, several models are proposed and compared (Table 4). The time series models presented above allow for the inclusion of information from past observations of a series, but not for the inclusion of other information that might also be relevant. If we want to include external variables that can explain some of the historical variation and can lead to more accurate forecasts, we need to exploit dynamic regression models that take into account the subtle variation in time series. In this case, we will use the deterministic and stochastic model and then compare them. A deterministic trend is obtained by using the regression model: where η t is an ARMA process.
The following equations are generated: Each year, there will be an average of 6420 of the new malaria cases per health facility. Our stochastic trend model is estimated as follows (Table 5): The model can be written as: In this case, of the stochastic model, the annual increase in malaria cases is 6024.07 per health facility. Although the growth estimates are similar, the prediction intervals are not, as shown in Figure 14. In particular, the stochastic trends have much wider prediction intervals because the errors are not stationary.
There is an implicit assumption with deterministic trends that the slope of the trend will not change over time. On the other hand, stochastic trends can change, and the estimated growth is only assumed to be the average growth over the historical period, not necessarily the growth rate that will be observed in the future. Therefore, it is safer to forecast with stochastic trends as our final choice of model, especially for longer forecasting horizons, because the forecasting intervals allow for greater uncertainty about future growth. There is an implicit assumption with deterministic trends that the slope of the trend will not change over time. On the other hand, stochastic trends can change, and the estimated growth is only assumed to be the average growth over the historical period, not necessarily the growth rate that will be observed in the future. Therefore, it is safer to Figure 14. Forecast of the annual number of malaria cases in DRC using a deterministic trend model and a stochastic trend model.

Climate Predictors Associated with Malaria
We also modelled the relationship between malaria morbidity and climate factors between 2001 and 2019 adjusting for spatio-temporal variations to identify climatic predictors of malaria (Table 6).  Figure 15 shows the correlation matrix. Spearman correlation analysis indicates a statistically significant association between evaporation within two meters of the ground and the number of annual malaria cases (r = 0.25; p < 0.001).    A statistically insignificant association was observed between the rest of the climate variables and the number of annual malaria cases.
Based on the historical average values of geoclimatic factors (all else being equal), over the next 16 years, the number of malaria cases will increase by its historical average value of 653,071 cases per year.
In addition, for each annual increase of one unit of evaporation at two meters above ground, the absolute number of malaria cases will increase by its historical average value of 13,285 cases. This change is statistically significant (p < 0.0001).
Year-over-year, when relative soil moisture increases by an average of one percent, the average number of malaria cases will increase by an average of 39,233 cases. A statistically significant change (p = 0.049).
Similarly, the annual increase in heavy rainfall in millimeters will result in an increase in malaria cases from its historical average of 3058 cases per year. This increase is statistically significant (p = 0.049).
Similarly, when the density increases by one inhabitant per square kilometer from one year to the next, the average number of malaria cases will increase by 1320 cases compared to its historical value (p < 0.001).
On the other hand, a one-degree Celsius increase in minimum temperature each year will result in an average decrease in malaria cases by 23,287 of its historical average value per year (p = 0.008).

Discussion
Malaria is caused by infection with protozoan parasites of the species Plasmodium. Plasmodium falciparum is widespread in Africa, while P. vivax, P. ovale and P. malariae infections are less common and geographically limited [14,15].
In DRC, our results show an average malaria prevalence of the last 19 years of 13,246 (11,783.83-1417.83) cases per 100,000 people at risk. This prevalence increases significantly during the whole study period. The year 2002 was more morbid with 29,137.99 (12,094.51-38,304.56) cases per 100,000 persons at risk.
Most cases in 2018 (213 million or 93%) were recorded in the WHO African region, far ahead of the Southeast Asian region (3.4%) and the Eastern Mediterranean region (2.1%) [16]. Our study showed that malaria-attributable mortality had reached 117.33 (71.08-163.58) malaria deaths per 100,000 people exposed to malaria in 2002. In 2008, the lowest level recorded was 19.04 (14.14-23.35) malaria deaths per 100,000 people exposed to malaria. In general, malaria mortality has been declining dramatically over the past 19 years.
The same observations were made by WHO 2019 showing that on a global scale, malaria incidence declined between 2010 and 2018 from 71 cases per 1000 people at risk of malaria to 57 per 1000. Nevertheless, this decline slowed considerably between 2014 and 2018, with incidence decreasing to 57 per 1000 in 2014 and remaining at a similar level through 2018 [16].
The WHO Africa region alone accounted for 94% of global malaria deaths in 2018. Yet, it also accounted for 85% of the 180,000 fewer deaths from the disease compared to 2010 [16]. Nearly 85% of global malaria deaths in 2018 were concentrated in 20 countries in the WHO African region and India. Nigeria alone accounted for nearly 24% of these deaths, followed by the Democratic Republic of Congo (11%), the United Republic of Tanzania (5%), and Angola, Mozambique, and Niger (4% each) [16].
In our study, Spearman's correlation analysis indicates a statistically significant association between evaporation at two meters from the ground and the number of annual malaria cases. A statistically insignificant association was observed between the rest of the climate variables and the number of annual malaria cases. In contrast, in South Africa, a statistically significant association between all climate variables studied and monthly malaria cases was reported where minimum temperature had the highest correlation (R = 0.39; p < 0.001), followed by total quantity of rainfall at two meters altitude and mean temperature (r = 0.35; p < 0.001, r = 0.35; p < 0.001), then relative humidity and maximum temperature with (R = 0.29; p < 0.001, r = 0.25; p < 0.001), respectively [17].
This difference could be related to the disparity in transmission levels in Africa, both in intensity and in seasonality and regularity. This disparity has consequences at several levels: biological, immunological, pathological, etc. Subjected to a strong and permanent plasmodial infection, the population develops a premunition, a temporary immunity, non-sterilizing and maintained by the infection, itself maintained by the transmission.
It should be noted that South Africa is located in an area where transmission is regular every year, with a long seasonal interruption of about 6 months, linked to the rhythm of the rains. The populations of these areas have a very high degree of resistance to malaria but are not totally free of "moderate" malaria attacks and may be susceptible to severe malaria attacks in case of plasmodial infections by foreign strains [18].
Wilson's classifications of epidemiological facies as "models" reflect the dynamics of host-vector-parasite relationships in different environments, and he points out that all intermediate situations can be found within and between these basic models.
There are seasonal variations in the intensity of transmission, but no interruption, however brief. Transmission is generally by An. gambiae and An. funestus. This mode of transmission is well illustrated by the degraded forest areas of Central Africa (DRC, Congo, Cameroon, etc.).
This approach by Wilson highlights the transmission/malaria disease relationships, taking into consideration the intensity of transmission (chance of infection or frequency of infection), the duration of transmission with its regularity from one year to the next (transmission continues nearly all-year round or annually recurring season or seasons of malaria transmission), the ecological variables (altitude, temperature, humidity) with the relative development of immunity. Variations in these different variables are reflected in variations in the classical malaria indices (splenic index, plasmodic index) but above all in malaria morbidity according to age groups and general mortality in the event of an epidemic. Some authors agree with Wilson's categorization of epidemiological facies where the DRC/Zaire is located in the Degraded Forest Zone: permanent transmission (=Wilson's Group I): where malaria is endemic with intense and permanent transmission [19].
The results of our study reveal a negative but not statistically significant association observed between the remaining number of annual malaria cases and wind speed, number of rainy days, maximum and minimum temperature. Our results concur with those found by Lindsay et al. showing a negative but statistically significant association between rainfall and malaria cases, which could be due to the fact that the breeding sites were washed away by heavy rains [20].
The significant relationships observed with a lag of more than one month between malaria onset and associated climatic variables (p < 0.05), such as rainfall and minimum temperature at two months, suggest that climatic conditions in a given year (early onset of rains, especially those associated with La Niña conditions and tropical cyclones) may affect malaria transmission in the following year.
Periods of heavy rainfall combined with low temperatures and high relative humidity promote saturated soil moisture, which prolongs the life of water pockets and may result in the persistence of larval habitats [21,22].
Several forecasting models were used in this study to predict future malaria episodes. Some authors used the SARIMAX model to forecast monthly time series of malaria cases using the SARIMA Box-Jenkins approach and multiple linear regression (MLR). The forecast shows that the number of malaria cases remains high until July, which is normally a period of low malaria transmission, indicating a change in the malaria season [17].
For Kumar et al. [23], the autoregressive integrated moving average, ARIMA (0,1,1) (0,1,0) 12, was the best fitting model and could explain 72.5% variability in the time series data. Rainfall (p = 0.004) and relative humidity (p = 0.001) were found to be significant predictors for malaria transmission in the study area. The seasonal adjustment factor (SAF) for malaria cases showed a peak during August and September. Although ARIMA time series analysis models are a simple and reliable tool for producing reliable forecasts for malaria in Delhi, India, in our study however, the combination of several forecasts led to higher forecast accuracy.
The combination of multiple model forecasts (ETS, ARIMA, STL-ETS, NNAR, and BATS) fit better and led to greater accuracy in modeling and forecasting for future malaria infestations over the next 200 months. The results were virtually unanimous: the combined model approach predicts an average increase of 309,198 new malaria infestations per year per province over the next 16 years in DRC. However, the NNAR predictive model was particularly well suited in this series.
Wangdi et al. [24] also found that ARIMA (2,1,1) (0,1,1) 12 was the best possible model for predicting malaria cases in Bhutan. In the same vein, The ARIMA model has also been used for malaria case prediction in Sri Lanka [25] and Ethiopia [26]. To evaluate the predictive performance of our model, we divided our data set into a training set and a test set of sizes 90% and 10%, respectively.
The mean absolute scaled error (MASE) as a measure of the quality of the predictions made by our models showed that the naive seasonal method was the best of the three methods for this data set with a MASE of 0.894. Recall that the closer the MASE is to zero, the better the forecast.
Our forecasting horizon was set at 2036, and it was desirable and wise to finally retain the stochastic model, as it is safer to forecast with stochastic trends, especially for longer forecasting horizons, as the forecasting intervals allow for greater uncertainty about future growth.
Our stochastic model predicts an average annual increase of 6024.07 new malaria cases per facility.
As for the estimated seasonal component, this study shows a slight decrease in malaria cases from January to February throughout the study period, rising again in March and reaching the peak in April. It declines to lowest values in June, but rises again to the peak in July, September and November. In addition, malaria is present almost year-round in the DRC due to the tropical climate and the country's extensive river system. The DRC has about 30 major rivers with at least 20,000 km of shoreline and 15 lakes with about 180,000 km of shoreline.
As for the "seasonality" component, this study reveals that the "Palmet" series will have variation between months simply due to the different number of days in each month, in addition to the seasonal variation in future years. This disparity is closely related to the variability of precipitation during the study period showing very intense palustrine episodes from January to June and then from September to December. On the other hand, a decrease in rainfall is observed between July and October (dry season).
The relief of the DRC can also have an impact on the seasonality of malaria. There is a vast low-lying area in the center of the country, which is a plateau shaped by the westward flowing river basin and covered by a large tropical forest. This area is surrounded by mountainous terraces such as the Mitumba Mountains in the east and the Virunga Mountains in the north, savanna-covered plateaus in the south and southwest, and the north is bordered beyond the river by dense forest. High mountains are found at the eastern end of the country (Great Rift region).
A study conducted in Kenya shows an increase in malaria cases in the high and medium altitude areas while in the riverine and lowland areas, the number of malaria cases had significantly decreased as of 2011, during the study period [27].
Finally, our results of the multivariate time series analysis showed that over the next 16 years, the number of malaria cases will increase from its historical average value of 653,071 cases per year due to increase in geo-climatic factors such as evaporation under shelter at two meters from the ground, relative soil moisture, high rainfall and population growth with a density of one inhabitant per square kilometer will significantly increase the number of malaria cases from its historical mean value each year. On the other hand, increasing the minimum temperature by one degree Celsius each year will lead to an average decrease in malaria cases by 23.287 of its historical average value per year.
Our previous study conducted in 2022 showed that the number of rainy days resulted in a statistically significant 7% increase in malaria cases with malaria morbidity significantly associated with geographic location, total evaporation under shelter, maximum daily temperature at two meters altitude, and humidity at two meters altitude [28].
Our results are similar to those of Solomon Kibret et al. in their study on environmental and meteorological factors related to malaria transmission around large dams in three ecological settings in Ethiopia, which showed that Rainfall (lagged 1 and 2 months) was significantly associated with malaria incidence, but only at the lowland dam, while minimum and maximum air temperatures (lagged 1 and 2 months) were significant factors only at the highland dam [29].
The polynomial model Mopuri R et al. had predicted malaria cases with high predictive power and revealed that first lag malaria cases and population play a vital role in malaria transmission. Similarly, mean temperature, rainfall and normalized vegetation index had significant impact on malaria cases [30].
In the same perspective, the results of Nkiruka et al. suggest that, although the exact association between malaria incidence and climate variability varies across geographic regions, non-seasonal changes in three climatic factors (rainfall, temperature, and surface radiation) contribute significantly to malaria occurrence [31].
The results of Sahai et al. agree with ours. The authors note that changes in rainfall and temperature regimes probably play a major role in increasing the incidence of these diseases in different geographical locations. A detailed analysis was conducted on the incidence of malaria and diarrhea in two districts of Maharashtra State. It is found that the increased likelihood of high (low) rainfall, high (low) minimum temperatures and low (moderate) maximum temperatures are more (less) conducive to both diseases in these areas, but with different thresholds [32].

Conclusions
The purpose of this study was to forecast malaria episodes to 2036 using the epidemiological and meteorological model in order to strengthen the surveillance and response to diseases with epidemic potential throughout the DRC. The results of the predictive analysis showed that malaria transmission is highly seasonal in the DRC despite being endemic in the country. Our stochastic trend model predicts an average annual increase of 6024.07 malaria cases per health facility. As such, this model forecasts an exponential growth of epidemic waves over the next 200 months of the study.
Furthermore, the geo-climatic predictors most associated with malaria were geographic location (western, central and northeastern region of the country), total evaporation under shelter, maximum daily temperature at two meters altitude. Finally, the average number of malaria cases increased positively as a function of the average number of rainy days, the total quantity of rainfall and the average daily temperature.
Therefore, the results of this research should provide relevant information for the Congolese government to respond to malaria in real time by setting up a warning system integrating the monitoring of rainfall and temperature trends and early detection of anomalies in weather patterns.