Using the Hurst Exponent and Entropy Measures to Predict Effective Transmissibility in Empirical Series of Malaria Incidence

Abstract: We analyze the empirical series of malaria incidence, using the concepts of autocorrelation, Hurst exponent and Shannon entropy with the aim of uncovering hidden variables in those series. From the simulations of an agent model for malaria spreading, we first derive models of the malaria incidence, the Hurst exponent and the entropy as functions of gametocytemia, measuring the infectious power of a mosquito to a human host. Second, upon estimating the values of three observables (incidence, Hurst exponent and entropy) from the data set of different malaria empirical series, we predict a value of the gametocytemia for each observable. Finally, we show that the independent predictions show considerable consistency, with only a few exceptions which are discussed in further detail.


Introduction
In Babylon, during the year 323 BC, Alexander the Great fell ill after returning from Persia, and died shortly after at the age of 32. The cause of his death was most probably a disease now known to be caused by an infection with the parasite Plasmodium falciparum [1]. Malaria cases have declined globally from 238 million cases in 2000 to 229 million in 2019 [2]. Its mortality also decreased worldwide from around 736 thousand deaths in 2000 to 409 thousand deaths in 2019 [2]. However, this reduction has stabilized since 2016, and several challenges and risks remain, particularly for children under 5 years old. Due to their limited immune protection, they are a highly vulnerable group, accounting for 67% of global fatal malaria cases in 2019 [2].
Of particular importance in malaria transmission is the amount of gametocytes in the blood circulation. Gametocytes are the parasite forms which mediate transmission from humans to mosquitoes and, therefore, are also obvious targets in implementing preventive actions such as vaccine immunization or anti-malarial drugs. Submicroscopic gametocytemia detection is difficult and often not as precise as statistical data collection from e.g., malaria incidence varying heterogeneously from one community to another [3,4].
In this paper we propose an approach to estimate effective Plasmodium transmissibility levels, which reflect submicroscopic gametocytemia in individuals of a specific community, based on collected time series of malaria incidence in that same community.
Time series models have been used as important tools and metrics, not only in epidemiology, but also in economics, geophysics, biology and ecology [5]. One of them, the so-called Hurst exponent, was introduced by Hurst in 1951 [6] in the context of hydrology planning, specifically, during the study of flooding levels in the river Nile. The Hurst exponent is a measure of memory in a series of values. Applying it to the series of Nile flooding levels, based upon ancient Egyptian hydrology data collected over 847 years, Hurst was able to estimate the rate at which the autocorrelation of that series decreased as the time interval between measurements increased. In this way, Hurst used the collected data to model the flooding levels of the Nile and, with that, assessed optimum dam sizes to contain extreme rain events. Soon the applicability of the Hurst exponent extended to many other fields, particularly financial theory [7] and the study of complex phenomena with evidence of fractal features [8][9][10]. However, while different time series models have been applied to epidemic infections in general [11][12][13][14][15][16][17][18], and to malaria incidence forecasting in particular [19,20] (mostly auto-regressive and linear models), the application of the Hurst exponent to malaria spreading is still lacking.
To relate gametocytemia levels with malaria incidence time series we introduce a methodology based upon the use of Hurst exponents. To the best of our knowledge, this is the first analysis of empirical data of malaria incidence series using the Hurst exponent to derive an estimate of a hidden variable, namely, the level of transmissibility or ease of spreading within the community from which the series of malaria incidence is collected.
We show that the Hurst exponent is able to grasp long-range dependencies close to the phase transition between disease elimination and stable prevalence scenarios of malaria. In a more general scope, complexity in time series may also be related to the level of information entropy, which is commonly used to address emergence phenomena and self-organization. In this paper, we analyze the effects of memory in time series of malaria incidence. Two types of memory have been defined in random stochastic processes, long and short memory, where the transition between these regimes may be represented as a phase transition in the context of a stochastic random process. We focus on Hurst exponent estimation and Shannon entropy at different levels of disease transmission intensity, applied to malaria time series derived from simulations with a previously introduced agent-based model [3] with different (parameterized) levels of gametocytemia, as well as to different empirical malaria time series.
Our objective is to properly identify malaria transmission patterns, as well as to link long-range dependence processes in malaria incidence time series to the occurrence of phase transitions in close proximity to disease elimination. We also test the importance of Hurst exponent estimation and Shannon entropy as sound predictors of the presence of long-range dependence and long memory processes in malaria transmission. In particular, we show that from simple models connecting gametocytemia levels and measures easily extractable from empirical series of malaria incidence, such as incidence levels, a Hurst exponent and entropy, we are able to predict an indicator of "effective" gametocytemia for regions where malaria incidence is regularly monitored. Our study uses eight different empirical series, shown in Figure 1.
We start in Section 2 by describing the different empirical data sets analyzed in this study and the agent model used to produce simulated scenarios of different transmissibility levels, parameterized by the gametocytemia level. Moreover, we briefly describe the basic tools (the Hurst exponent and Shannon entropy), explaining how they are computed from series of malaria incidence. In Sections 3 and 4 we describe, qualitatively, the behavior of the Hurst exponent and entropy, and the form of the autocorrelation function decay, in eight different empirical malaria time series. In Section 5 we derive models to fit the values of the three observables (the malaria incidence, the Hurst exponent and the entropy) for different simulated scenarios, as a function of the gametocytemia level. We then measure the average malaria incidence, Hurst exponent and entropy of the empirical data sets and, introducing these values into the fitted expressions, retrieve an estimate of the associated "effective" gametocytemia level, i.e., the effective transmissibility level in each empirical case. Finally, in Section 6 we discuss the limitations of our approach, as well as possible extensions to it. All data series are available from the respective original references.

Empirical Series of Malaria Incidence
The empirical part of our investigation comprises eight series of malaria incidence, as presented in Figure 1. These series are available from previous studies namely [27]. The data sets were chosen as representative of regions with malaria incidence diversity in Africa. They have a time duration ranging from 60 [16] to 132 months [22], while showing different trends and periodicity (see Figure 1 and Table 1). Malaria incidence was measured at each month as the expected number of new malaria cases per 100 inhabitants in a full year, if malaria incidence were to be kept constant. In this way, to obtain the precise incidence at each month, the yearly value presented graphically must be divided by 12. As can be seen from Figure 1, the eight cases differ in their level of malaria transmission and epidemic behavior. All empirical series show some form of irregular periodicity as a consequence of climate seasonality. Declining malaria incidence is clear in the series from Aregawi and Okech, as well as in Alhassan and Bedane, although with a final disease outbreak in these last two cases. High levels of malaria transmission occur in Gomez-Elipe (identified solely as the Elipe time series in some of the figures, for clarity) for a brief period of time. Malaria incidence remains quite stable in the Muwanika, Appiah and Landoh empirical series, with a consistent upward trend in the last case.

Agent Model for Malarial Spreading
To combine results from empirical data with simulations from the agent model introduced and developed in previous works [3,4], we conduct a series of simulations at different levels of disease transmission efficiency. Human-to-mosquito (H-to-M) transmission was defined in terms of the fraction of human disease days with the presence of gametocytemia in blood circulation, henceforth represented as $w_h$.
Six different scenarios were considered, corresponding to a wide range of different levels of positive gametocytemia duration and disease transmission efficiency, specifically, 110 days of positive gametocytemia during 150 days of expected disease duration (i.e., $w_h = 110/150 = 0.733$), 90 days ($w_h = 0.600$), 75 days ($w_h = 0.500$), 70 days ($w_h = 0.467$), 68 days ($w_h = 0.453$), and 63 days ($w_h = 0.420$). For the simulations, we consider a system of $N_m = 4000$ mosquitoes and $N_h = 2000$ human individuals, both including healthy and infected individuals. We have modeled the number of mosquitoes as a small but effective fraction of the overall mosquito mass that randomly feeds on a human individual, twice daily on average. The simulation time lasts 30 years while evaluating each human individual in terms of disease duration and human-to-mosquito transmission.
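As a quick check of the scenario values quoted above, the following minimal sketch recomputes each transmission efficiency as the ratio of days of positive gametocytemia to the 150 days of expected disease duration. The function and variable names are illustrative assumptions, not taken from the agent-model implementation of Ref. [3].

```python
# Days of positive gametocytemia in the six scenarios (values quoted in the text).
DISEASE_DURATION_DAYS = 150
positive_days = [110, 90, 75, 70, 68, 63]


def transmission_efficiency(days_positive, disease_days=DISEASE_DURATION_DAYS):
    """Fraction of disease days with positive gametocytemia (w_h)."""
    return days_positive / disease_days


# Round to three decimals, as the values are reported in the text.
w_h = [round(transmission_efficiency(d), 3) for d in positive_days]
print(w_h)
```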
The algorithm keeps track of several attributes for each agent, whether human or mosquito, at a particular age, such as the time spent since the first day of infection, and the individual immunity status. Beyond 5 years of persistent human reinfections, the human host will develop partial protective immunity at the maximum possible level, while losing it after 2 years without infection. The computational cycle includes a realistic mosquito daily mortality routine. Dead mosquitoes are to be replaced by uninfected mosquitoes. Human disease duration reflects realistic human recovery from malaria.
Human-to-mosquito transmission efficiency ($w_h$) is also stochastically defined and directly dependent on the number of days with positive gametocytemia. Upon updating the number of healthy human individuals and mosquitoes, the algorithm generates one episode of mosquito feeding on a human individual, with the possibility of protection from long-lasting insecticide-impregnated nets (LLIN), insecticide-impregnated nets (ITN) or indoor residual spraying (IRS). Our model is inspired by Mozambique's seasonality [18,28], considering 150 days for the duration of its high-transmission season (see Figure 2).
Relevant details as well as the flowchart describing the computer implementation of the agent-based model are given in Ref. [3]. That model was also used as support for an additional study concerning the presence of heterogeneity in malaria transmission along with the use of ivermectin [4].
Using the present model we analyzed the behavior of the human-mosquito coupled system, resulting from a complex interaction between the two compartments. Human-to-mosquito transmission efficiency ($w_h$) was used to define the probability of a sustained presence of gametocytemia in human blood circulation, as well as the survival probability of infected mosquitoes beyond latency. These aspects are considered critical in disease transmission. Our model simulations use gametocytemia as an independent variable affecting human-to-mosquito transmission. Different levels of gametocytemia define different stages of disease transmission efficiency. Theoretical gametocytemia reduction is considered equivalent to an effective treatment with gametocidal agents, such as primaquine or methylene blue, in a fraction of the human population.

Hurst Exponent and Entropy to Assess Memory Effects in Stochastic Series
We use two different metrics to assess memory in series of malaria incidence, empirical and simulated, investigating whether long-range dependencies could occur close to the phase transition near disease elimination when compared to more stable epidemic scenarios. The Hurst exponent is defined as
$$\left(\frac{R}{D}\right)_T = k\,T^{H}, \tag{1}$$
where the left-hand side represents the rescaled range, a dimensionless ratio between R (the maximal range of all observations) and D (the standard deviation of all observations). An explicit mathematical definition is given below in Equation (4). T stands for the time index (number of observations in the time series), k is some constant to be determined and H represents the Hurst exponent. With the Hurst exponent estimation it becomes possible to distinguish among three different regimes characterizing the time series: (i) the anti-persistent regime, characterized by 0.0 < H < 0.5, where, if the series increases (resp. decreases) in one period, it is very likely that it will decrease (resp. increase) in the next period; (ii) the persistent regime, characterized by 0.5 < H < 1, where, if the series decreases (resp. increases) in one period, it is very likely that it will decrease (resp. increase) in the next period; and (iii) the memoryless regime, characterized by H = 0.5, when the process is uncorrelated in time.
A Hurst exponent estimate close to 0.5 (random walk process) is found in empirical time series with heavier disease transmission. In malaria time series from our model simulations, transitions from prevalence to disease elimination are characterized by values of the Hurst exponent larger than 0.5 and close to 1, the signature of a persistent time series.
As for entropy, it is related to the complexity of the time series [29,30]. The complexity of stochastic processes may be calculated with the use of entropy-based measures. For that purpose, several functions may be employed. The significance of complexity, emergence phenomena and self-organization may provide us with useful information concerning continuous as well as discrete systems, in the form of time series results.
Information entropy is given by the equation
$$S = -\sum_{i} P(x_i)\,\log_2 P(x_i), \tag{2}$$
where P(x) is the probability density function of the observable x, which, in its discretized form, is estimated by a histogram with a finite set of values $P(x_i)$ for the set of bin-points $x_i$. The logarithm is base two. Information theory defines entropy in terms of information uncertainty in the evolution of time series results. Other entropy-related quantities, such as mutual information, may be used as alternative methods for time series analysis in malaria.
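A minimal sketch of the histogram-based entropy estimate described above is given next. It assumes equal-width bins and a bin count `n_bins` chosen purely for illustration; the text does not state the binning actually used.

```python
import numpy as np


def shannon_entropy(values, n_bins=10):
    """Base-2 Shannon entropy of a sample, estimated from a histogram.

    n_bins is a hypothetical choice; the binning is not specified in the text.
    """
    counts, _ = np.histogram(values, bins=n_bins)
    # Empty bins contribute nothing to the sum (0 * log 0 is taken as 0).
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))
```

With this estimator a constant series has zero entropy, while a sample spread evenly over all bins reaches the maximum value, log2 of the number of bins.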

Estimating the Hurst Exponent and Entropy in Series of Malaria Incidence
Both the Hurst exponent and Shannon entropy are influenced by the length of the time series sample. Therefore, we introduced a standardization procedure, which is independent of the length of the time series.
The procedure is illustrated in Figure 3 and is as follows. We define a 36-month moving average of malaria incidence, symbolized as $I_{36}(m)$, at month m, as the average malaria incidence in the previous 36 months:
$$I_{36}(m) = \frac{1}{36}\sum_{n=m-35}^{m} I(n), \tag{3}$$
where I(n) is the monthly malaria incidence, measured or simulated, composing the series of values.
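The moving average can be sketched as follows. The sketch assumes a trailing window that includes the current month; the exact index convention is an assumption here, and the function name is illustrative.

```python
import numpy as np


def moving_average(incidence, window=36):
    """Trailing moving average; entry j averages incidence[j : j + window]."""
    x = np.asarray(incidence, dtype=float)
    if len(x) < window:
        # Series shorter than the window: no full window is available.
        return np.array([])
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")
```

For a series of length T the result has T - 35 entries for the default window, one per complete 36-month window.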
In the Alhassan malaria time series (Kasena Nankana municipality in Ghana, 2017) [21] one can see a declining trend in malaria incidence in the form of a decreasing 36-month moving average ($I_{36}$), from ∼11 to ∼4 cases per 100 inhabitants per year (phy), despite the small final outbreak in the last 6 months. Consistent with this declining trend in $I_{36}$, we may also witness a decreasing trend in $S_{36}$, as well as a rising trend in $H_{36}$ after month 55. In this case, the behavior of both the Hurst exponent and entropy in the preceding 36 months is well correlated with the malaria incidence trend.
The Gomez-Elipe time series (Karuzi in Burundi, 2007) [24] is quite different from the remaining empirical examples. It shows a consistently stable pattern of malaria incidence below 110 cases phy until month 45, peaking at ∼500 cases phy around month 48, with a fast downward trend to ∼60 cases phy at month 56, and a slower decline thereafter to a final value of ∼20 cases phy (see Figure 1). The 36-month malaria incidence ($I_{36}$) reveals a consistent upward trend peaking at month 64, with a downward pattern thereafter. In this case, we find a persistent oscillatory behavior in the value of $H_{36}$ during the entire time series, usually above 0.9. Yet, in parallel with the outbreak in malaria incidence, we see a sudden fall in $H_{36}$ to values close to 0.6 (closer to random noise) at around month 48. Entropy in the form of a 36-month moving average ($S_{36}$) reveals a behavior similar to that of $I_{36}$ from month 48 onwards.
The Landoh malaria time series (Est Mono district in Togo, 2012) [25] reveals a consistent upward trend in malaria incidence in the form of a mild increase in the 36-month moving average ($I_{36}$) during the whole time series, ranging from the initial ∼18 cases phy to a peak at ∼30 cases phy in month 72.
Consistent with the steady upward trend in $I_{36}$ during the entire time series, we find a declining trend in the 36-month moving average of the Hurst exponent ($H_{36}$), from a peak of ∼0.85 at month 47 to an all-time-low value of ∼0.55. Along with this downward trend in $H_{36}$ there is a consistent wave-like increase in entropy in the form of a 36-month moving average ($S_{36}$) during the entire time series. In the Landoh time series, the behavior of $H_{36}$ and $S_{36}$ is reasonably well correlated with the behavior of $I_{36}$ in time.
Of a similar form, but with a trend opposite to that of the Landoh time series, the Okech time series (Kenya, 2008) [27] reveals a steady decreasing trend. In the Okech time series, malaria incidence consistently decreases from an initial peak of ∼400 cases phy to a final value close to 50 cases phy. Consistent with the steady downward trend in $I_{36}$ during the entire time series, we find an upward trend in the 36-month moving average of the Hurst exponent ($H_{36}$), from an all-time low of ∼0.6 at month 42 to a peak of ∼1.2 at month 82. Along with this upward trend in $H_{36}$, there is also a consistent decrease in entropy in the form of a 36-month moving average ($S_{36}$) during the whole time series. In the Okech time series, the behavior of $H_{36}$ and $S_{36}$ is also well correlated with the behavior of $I_{36}$ in time.
As the Hurst exponent (H) and Shannon entropy (S) are influenced by the length of the time-series sample, we also considered a 36-month moving window estimate. In the case of the Hurst exponent, we compute the quantities R and D for each window (Equation (4)). Having the series of 36 values R(T)/D(T) for each window, we then applied Equation (1) to find fitting values for k and for the Hurst exponent H. As for Shannon entropy, an estimate of the probability density was first computed, given by the set of values $P(I_i)$ for an assumed set of bin values $I_i$, with $i = 1, \ldots, N_b$, of malaria incidence (I), and then Equation (2) was applied, running the sum only over the 36 incidence values observed within each time window.
In this way, we have computed the averages of malaria incidence I, H and S over 36-month windows, independently of the size of the empirical series. In order to quantify I, H and S for each empirical case, we then obtained the average of the corresponding time series and computed its standard deviation. Table 1 shows the empirical values of the three observables. Notice that, in these cases, higher moments seem not to be particularly relevant, as can be seen by comparing the averages in Table 1 with the medians and quartiles in Table 2.
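The window-wise fit of k and H can be illustrated with the sketch below. It assumes the classical rescaled-range construction, in which R(T) is the range of the cumulative deviations from the mean over the first T values and D(T) is their standard deviation; this may differ in detail from the definition used for Equation (4). The cutoff `t_min`, which discards the shortest and noisiest sub-series, is likewise a hypothetical choice.

```python
import numpy as np


def rescaled_range(x):
    """Classical R/D ratio: range of cumulative mean deviations over the std."""
    x = np.asarray(x, dtype=float)
    cumulative_dev = np.cumsum(x - x.mean())
    r = cumulative_dev.max() - cumulative_dev.min()
    d = x.std()
    return r / d if d > 0 else np.nan


def hurst_rs(window_values, t_min=8):
    """Fit (R/D)_T = k * T**H by least squares in log-log space.

    Returns (H, k). t_min is an assumed lower cutoff on T.
    """
    x = np.asarray(window_values, dtype=float)
    ts = np.arange(t_min, len(x) + 1)
    rs = np.array([rescaled_range(x[:t]) for t in ts])
    ok = np.isfinite(rs) & (rs > 0)
    if ok.sum() < 2:
        return np.nan, np.nan
    # Slope of log(R/D) against log(T) is H; the intercept is log(k).
    h, log_k = np.polyfit(np.log(ts[ok]), np.log(rs[ok]), 1)
    return float(h), float(np.exp(log_k))
```

Applying `hurst_rs` to each consecutive 36-month window of an incidence series would yield a series comparable in spirit to $H_{36}$, under the assumptions stated above.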

Qualitative Analysis and Robustness Assessment of the Hurst Exponent and Entropy in Empirical Time Series Behavior
While the eight empirical cases show a broad range of values for malaria incidence and the Hurst exponent, the entropy seems much more resistant to changes. The cases of Alhassan, Elipe and Okech form a group separate from the other series, which form a second group. We will address these cases in more detail when building the model for effective human-to-mosquito transmissibility (gametocytemia).
We have also looked at the way those indices behaved in a typical low-transmission empirical time series, such as that obtained from Okech, 2008 [27]. In this empirical time series, when the Hurst exponent was evaluated in 36-month partial intervals ($H_{36}$), it revealed a clear inverse correlation with malaria annual incidence. Furthermore, the Shannon entropy consistently decreased with progressively lower values of malaria incidence. This correlation pattern was similar to the one found in model simulations when comparing high- and low-transmission scenarios.
By consistently searching for evidence of the presence of long-range dependence in malaria time series, we looked into the time evolution of information (Shannon) entropy in scenarios with stronger disease transmission, and compared the results with those from other time series of lower disease transmission. In our model simulations, at low levels of disease burden near an eradication-prevalence transition, we consistently found malaria time series with lower information entropy. In Appendix A we describe, in more detail, the eight particular cases. Table 3 shows the correlation between the three properties, I, H and S, for the eight empirical data sets. One finds evidence of linear correlation between malaria incidence and the Hurst exponent or Shannon entropy in seven of the eight presented empirical malaria time series. The case of Bedane is an exception. The estimation of the Hurst exponent by R/S-analysis may be biased due to the short window length of 36 months [31][32][33]. To ascertain how robust the results with 36-month windows are, we repeated our estimates for 24- and 48-month averages. The correlation is also reasonably evident in the 48-month time frame. Entropy is clearly more linearly correlated to malaria incidence than the Hurst exponent.
In summary, the data shown in Table 3 suggest the presence of significant linear correlation between malaria incidence and the Hurst exponent or Shannon entropy. In Appendix B we present a more detailed comparison of 36-month averages with 24- and 48-month averages.

Autocorrelation Function and Stochastic Memory in Malaria Empirical Series
The autocorrelation function ($\rho_k$) has been used with reasonable success in different research fields, from finance to hydrology and climate data time series. It measures the linear relationship between two values of a time series separated by a specific time lag k. The autocorrelation function ($\rho_k$) expresses the magnitude of that correlation between k-lagged values:
$$\rho_k = \frac{\mathrm{Cov}\left[I(n),\, I(n+k)\right]}{\mathrm{Var}\left[I(n)\right]}. \tag{5}$$
Figure 4 shows the autocorrelation function for each empirical case. The algebraic decay of the empirical autocorrelation function $\rho_k$ is strongly connected to the memory of stochastic processes, such as long memory in the form of Long-Range Dependence (LRD). The existence of LRD assumes the presence of stationarity in the time series. A memory parameter d is defined in relation to the slope of the autocorrelation function ($\rho_k$) decay. When d > 0 the term persistent defines the time series, with progressively larger values in time. Here, we borrow the concept as defined in Ref. [34]. For d < 0 we have anti-persistence; i.e., positive values tend to be followed by negative values and vice versa. In the special case of d = 0, the process will correspond to the presence of white noise, without evidence of autocorrelation, and corresponding to a pure Markovian process [34].
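For illustration, the sample autocorrelation at lags 0 to `max_lag` can be computed as in the sketch below. It uses the common estimator that normalizes by the total sum of squared deviations; the exact estimator behind Figure 4 is not stated in the text, so this choice is an assumption.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation rho_k for k = 0, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    n = len(x)
    # Lagged products of deviations, normalized by the total variance sum.
    return np.array([np.sum(dev[: n - k] * dev[k:]) / denom
                     for k in range(max_lag + 1)])
```

Plotting the result against k on logarithmic axes is one simple way to inspect whether the decay looks closer to exponential (short memory) or to a power law (long memory).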
The decay in time of the autocorrelation function $\rho_k$ in a stochastic process correlates with the persistence of memory of past events in the present state of the system. In the case of a fast exponential decay of $\rho_k$, the system memory will be short. With slower $\rho_k$ decays (corresponding to power-law processes), memory will be longer, in relation to the presence of LRD.
For the present eight empirical time series, the autocorrelation function $\rho_k$ revealed similar decay patterns. In most cases the decay of $\rho_k$ seems to deviate from the exponential decay that would be expected for a pure white-noise Markovian process, thus suggesting the presence of memory persistence in all the empirical series shown. A slow $\rho_k$ decay is usually related to non-stationarity of the time series. In more than half of the presented examples, a persistent undulatory pattern in the $\rho_k$ values, resulting from seasonality and periodicity in disease transmission, overlaps with the background decaying trend.

Towards a More Quantitative Malaria Model for Predicting Effective Gametocytemia
Figure 5 shows the results obtained for I, H and S in the six different scenarios of gametocytemia levels. From Figure 5a we observe that the six simulations cover all different incidence regimes, ranging from low incidence (I ∼ 0) to high incidence (I ∼ 1). For the same simulations, the Hurst exponent, shown in Figure 5b, reveals a clear weakening of memory patterns as the gametocytemia level $w_h$ increases.

As for the dependence of the entropy S, shown in Figure 5c, we also observe a transition to large entropy values as $w_h$ increases, though the transition seems much more abrupt. At the phase transition near disease extinction we have consistently found lower values of information entropy, clearly defining a stochastic process with long memory. At higher disease transmission rates (higher $w_h$) entropy becomes higher and more stable, evolving towards a short-memory stochastic process. This dichotomy defines the nature of transmission stability and may be useful in defining how distant a malaria time series is from a situation of disease extinction.

Models for the Three Observables as Function of Parameter Gametocytemia
Having described the values of the incidence, the Hurst exponent and entropy obtained in six simulations with an agent model for malaria spreading, we now derive models for each one of these observables as a function of the central parameter in our approach, the gametocytemia level $w_h$.
Notice that, being a parameter that cannot be measured directly from empirical series of malaria incidence, a model from simulations relating the observables with this parameter will enable predicting the "effective" gametocytemia level (i.e., the transmissibility) in empirical cases.
To model the malaria incidence I we consider a continuous step function, varying from I = 0 for $w_h = 0$ to I = 1 for $w_h = 1$ (Equation (6)). The dashed line in Figure 5a shows the function given by Equation (6) for A = 0.562 and α = 8.3. Here, parameter A gives the gametocytemia level that brings the malaria incidence to the level of 50%, while parameter α indicates how abrupt the transition from eradication to prevalence is when the gametocytemia level increases. Through inspection of Figure 5b, we choose to model the Hurst exponent H by a power law (Equation (7)), for which the best fit yields B = 0.56 and β = 0.58. Here, the parameters B and β have no direct interpretation.
Finally, to address the abrupt transition observed for the entropy when the gametocytemia level varies, we choose a step function tuned by an exponential of $w_h$ (Equation (8)), with the best fit yielding C = 10,136 and γ = 32. The parameter C tunes how low the entropy is for the extreme case of $w_h = 0$, while, similarly to parameter α, γ controls how abrupt the transition from that minimum level to S = 1 is.

Prediction of Effective Gametocytemia for Empirical Cases
To predict the "effective" gametocytemia in the empirical cases, we first invert the functions defined in Equations (6)-(8) with respect to the gametocytemia level. We call the values obtained estimates of gametocytemia; in general, they do not coincide. If all three models retrieve the same value when predicting $w_h$, we can assume almost zero error (maximum consistency of the models). In general, however, there will be deviations between the three predictions for the gametocytemia level. Therefore we take, as an estimate $\hat{w}_h$ for the gametocytemia level, the average of the three independent predictions and, as the corresponding error $\sigma_{\hat{w}_h}$, the largest deviation of the independent predictions from that estimate. Figure 6 repeats the models drawn for the simulations of the agent-based model, together with the estimates for the eight empirical series. While malaria incidence and the Hurst exponent retrieve reasonably acceptable predictions, the entropy seems to be very sensitive. The reason for this may be related to the fact that the values of the entropy are all very similar, making it difficult to derive a numerical model that distinguishes between the different values. Moreover, the errors are typically large, showing a broad range of different predictions depending on which models are used (cf. Equation (9)).
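The combination rule described above (average of the independent predictions, with the largest deviation from that average as the error) can be sketched as follows. The function name and the numerical values in the usage example are hypothetical and serve only to show the arithmetic.

```python
def combine_predictions(predictions):
    """Return (estimate, error) for a set of independent w_h predictions.

    estimate: arithmetic mean of the predictions.
    error: largest absolute deviation of any prediction from the mean.
    """
    estimate = sum(predictions) / len(predictions)
    error = max(abs(p - estimate) for p in predictions)
    return estimate, error


# Hypothetical predictions from the incidence, Hurst and entropy models.
w_hat, sigma = combine_predictions([0.40, 0.50, 0.60])
```

When the three predictions coincide, the error vanishes, which corresponds to the case of maximum consistency between the models.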
In Figure 6 we plot the curves of Figure 5 together with the estimates for the eight empirical cases. To predict the gametocytemia in empirical datasets, we assume that the values of I, H and S observed across the collection of empirical series cover the range of admissible values between a minimum and a maximum. The same holds for the collection of our simulations but, since there is no guarantee of proper calibration, the minimum and maximum values may differ. Still, assuming that the values obtained from simulations should cover the same range of possibilities as those in the empirical cases, we normalize the range of values observed in the empirical cases to the range observed in the simulations. This is a necessary step to predict the effective gametocytemia.
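One simple way to read this normalization is as a linear (min-max) rescaling of each observable, so that the smallest and largest empirical values coincide with the smallest and largest simulated values. The text does not spell out the formula, so the sketch below is an assumption; the function name `rescale_to_range` and all numbers are hypothetical.

```python
from typing import List, Sequence


def rescale_to_range(values: Sequence[float],
                     target_min: float, target_max: float) -> List[float]:
    """Linearly map `values` so that their min and max match the target range."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("values must span a non-zero range")
    scale = (target_max - target_min) / (v_max - v_min)
    return [target_min + (v - v_min) * scale for v in values]


# Hypothetical numbers for illustration only (not taken from the paper).
empirical_H = [0.62, 0.75, 0.91, 0.80]   # Hurst exponents of empirical series
sim_H_min, sim_H_max = 0.55, 0.95        # range observed across simulations

normalized_H = rescale_to_range(empirical_H, sim_H_min, sim_H_max)
```

The rescaling preserves the ordering of the empirical series, so only the scale of the observable changes and not the ranking of the cases.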

Discussion and Conclusions
The use of time series models is still a long way from becoming standard practice in malaria prevention. Differences in climate and geographic factors between world regions act as confounding factors in the strictly mathematical time series approach, lowering the precision of malaria forecasts. In recent years the Box-Jenkins theory has seen consistent development in malaria forecasting [5,18,35,36]. However, little attention has been devoted to the Hurst theory, information entropy, short- and long-memory stochastic processes or long-range dependence. It is noteworthy that the Hurst theory was initially implemented in the field of hydrology, given that malaria surges are clearly correlated with rainfall, temperature and climate seasonality [6].
Malaria epidemic time series consistently present different memory patterns, depending on disease transmission intensity. By comparing time series from our model simulations with real-data malaria time series from different parts of the world, it was possible to obtain a better definition of epidemic stability, according to disease transmission efficiency, from field data time series results. In stationary time series, long-memory processes have been related to the presence of long-range dependence (LRD) between present and past results [37].
At low H-to-M disease transmission intensity, time series patterns were consistent with the presence of LRD. However, at high disease transmission intensity, this pattern reverted to a low-memory process. By looking at the present model time series with changing w_h, one could witness significant differences in stochastic memory patterns. Additionally, in the presented empirical time series, the Hurst exponent and entropy correlated reasonably well with different epidemic growth rates. A similar pattern was evident when looking at the correlation of malaria incidence with the Hurst exponent and Shannon entropy. As these parameters may be affected by the time series length, their use in a normalized setting should be considered a reliable option.
By using the standardized forms of the Hurst exponent (H_36) and entropy measurement (S_36) it was possible to define the type of memory of stochastic malaria incidence time series with greater precision. This fact may be of significant relevance, as both parameters may become additional and useful tools in malaria forecasting.
In this paper we used the 36-month standard time length for a specific analysis. We considered it a compromise between a shorter time length (24 months) with less information available to the Hurst exponent estimation, and a longer time length (48 months) with less data available for analysis in shorter empirical time series, such as the series of Appiah et al. [16] (60 months). Other alternative methods, such as the generalized Hurst exponent and adaptive fractal analysis [38,39], could be applied, but, in our case, the results did not show significant improvements. See Appendix B.
As SARIMA modeling is a standard approach in time series analysis, we also fitted SARIMA models to the simulated and empirical cases (Figures 7 and 8). The SARIMA forecast of both empirical series is presented in Figure 8, while its equations and coefficients are available in Table A4.
Altogether, our results seem to indicate that, in malaria incidence time series, long-range dependence may occur close to the phase transition between epidemic stability and disease elimination. The presence of these long-memory stochastic processes in malaria incidence time series could become an additional and useful tool in the early detection of epidemic resurgence, as well as a potential improvement in malaria prevention strategies.

Acknowledgments: We thank Miguel Prudêncio (IMM-Lisbon) for his help in defining model parameters, and for his crucial remarks in discussing the model design.

Conflicts of Interest:
No competing interests apply to this manuscript.

Appendix A. Qualitative Analysis of Hurst Exponent and Entropy: Case-by-Case Description
Appendix A.1. Alhassan (2017)
In the Alhassan malaria time series (Kasena Nankana municipality in Ghana, 2017) [21] one can see a declining trend in malaria incidence in the form of a decreasing 36-month moving average (I_36), from ∼11 to ∼4 cases per 100 inhabitants per year (phy), despite the small final outbreak in the last 6 months. Consistent with the declining trend in I_36, we also see a decreasing trend in S_36, as well as a rising trend in H_36 after month 55. In this case, the behavior of both the Hurst exponent and entropy in the preceding 36 months is well correlated with the malaria incidence trend (see Figures A1-A3).

Appendix A.2. Appiah (2015)
In the Appiah malaria time series (Ejisu-Juaben municipality in Ghana, 2015) [16] an irregular oscillation of malaria incidence is detectable, superimposed on a stable trend in malaria incidence of ∼12.4 cases phy, with a range between a peak incidence of ∼20 cases phy at 32 months and an all-time low of ∼5 cases phy at 18 months (see Figure 1). Consistent with the initial declining trend in I_36 until month 53, we also see a declining trend in S_36 as well as a rising trend in H_36. In the present case, the behaviors of both the Hurst exponent and entropy in the initial 53 months are well correlated with the malaria incidence trend; see Figures

Appendix A.3. Bedane (2016)
In the Bedane malaria time series (Kucha district in Ethiopia, 2016) [23] we can see an initial declining trend in malaria incidence in the form of a decreasing 36-month moving average (I_36), lasting until month 98, from ∼15 cases phy to ∼10 cases phy, followed by a small final upsurge in I_36 to ∼11.5 cases phy. Consistent with the initial declining trend in I_36 until month 98, we also see a delayed declining trend in S_36 from months ∼70 to ∼115, as well as a rising trend in H_36 from months ∼60 to ∼80, despite the presence of a superimposed irregular oscillatory noise pattern. In this case, the behavior of both the Hurst exponent and entropy in the initial ∼98 months shows some degree of correlation with the global malaria incidence trend (see Figures A1-A3).

Appendix A.4. Aregawi (2014)
In the Aregawi malaria time series (Ethiopia, 2014) [22] we can see an initial small upward trend in malaria incidence in the form of an increasing 36-month moving average (I_36) in the initial ∼62 months, with ∼0.26 cases per 100 inhabitants per year, declining thereafter to less than ∼0.12 cases per 100 inhabitants per year.
Consistent with the initial small upward trend in I_36, we also see a declining trend in H_36 after month 62. In this case, entropy revealed an atypical behavior, peaking a little later, at month ∼72. Along with a declining trend in I_36 after month 62, one can see a consistent rise in H_36, lasting to the end of the time series despite a transitory fall at month ∼105, with a rapid recovery at month ∼114. In the Aregawi time series, the behavior of S_36 was more unpredictable, with a more delayed response. This may be related to the low malaria incidence in the time series.
However, in the present case, the behavior of the Hurst exponent in the form of a 36-month moving average (H_36) in the initial 65 months still reveals some degree of correlation with the global malaria incidence trend (see Figure A3). At such low levels of malaria incidence (∼0.2 phy) this behavior could be interpreted as a possible outlier result (see Figures A1-A3).

Appendix A.5. Gomez-Elipe (2007)
The Gomez-Elipe time series (Karuzi in Burundi, 2007) [24] is quite different from the remaining empirical examples. It shows a consistently stable pattern of malaria incidence below 110 cases phy until month 45, peaking at ∼500 cases phy around month 48, with a rapid fall to ∼60 cases phy at month 56 and a slower decline thereafter to a final value of ∼20 cases phy (see Figure 1). The 36-month malaria incidence (I_36) reveals a consistent upward trend peaking at month 64, with a downward pattern thereafter.
In this case, we find a persistent oscillatory behavior in the value of H_36 during the entire time series, usually ranging above 0.9. Yet, coinciding with the outbreak in malaria incidence at month 48, H_36 falls suddenly to values close to 0.6 (closer to random noise). Entropy in the form of a 36-month moving average (S_36) behaves in parallel with I_36 from month 48 onwards. Despite the presence of a superimposed oscillatory noise pattern, the behaviors of the Hurst exponent and entropy also show some degree of correlation with the overall malaria incidence trend; see Figures

Appendix A.6. Muwanika (2017)
In the Muwanika malaria time series (Uganda, 2017) [26] we see an initial small upward trend in malaria incidence in the form of a mild increase in the 36-month moving average (I_36) during the initial ∼47 months, with a peak at ∼60 cases phy, declining thereafter to ∼56 cases phy.
Consistent with the initial small upward trend in I_36 until month 47, we also see a declining trend in H_36 from an initial value of ∼0.9 to ∼0.5 close to month 45.

In a similar form, but with an opposite trend to the Landoh time series, the Okech time series (Kenya, 2008) [27] also reveals a steady decreasing trend. In the Okech time series, malaria incidence consistently decreases from an initial peak of ∼400 cases phy to a final value close to 50 cases phy.
Consistent with the steady downward trend in I_36 during the entire time series, we find an upward trend in the 36-month moving average of the Hurst exponent (H_36), from an all-time low of ∼0.6 at month 42 to a peak of ∼1.

Appendix B. Inspecting the Robustness of 36-Month Averages
The results in the previous appendix were derived using average values in windows of 36 months, i.e., three years. While the number of points is small, it covers three annual cycles. As is known, and as shown in Appendix C, malaria spreading shows periodic behavior following annual seasonality.
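The windowed quantities used throughout (I_36, H_36, S_36) are statistics evaluated over a trailing window of fixed length. The sketch below shows the mechanics for the windowed mean only; the function names and the synthetic monthly series are hypothetical, and the same `rolling` helper could take any other statistic in place of `mean`.

```python
from typing import Callable, List, Sequence


def rolling(series: Sequence[float], window: int,
            stat: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply `stat` to every full trailing window of length `window`."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [stat(series[i - window:i]) for i in range(window, len(series) + 1)]


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


# Hypothetical monthly incidence with an exact 12-month cycle (60 months):
# six "high" months at 15 followed by six "low" months at 10 in every year.
monthly = [10.0 + (5.0 if m % 12 < 6 else 0.0) for m in range(60)]

i24 = rolling(monthly, 24, mean)   # two annual cycles per window
i36 = rolling(monthly, 36, mean)   # three annual cycles per window
i48 = rolling(monthly, 48, mean)   # four annual cycles per window
```

Because each window length is a whole number of annual cycles, the seasonal oscillation of this synthetic series averages out and every windowed mean equals 12.5; a window that is not a multiple of 12 months would let the seasonal phase leak into the average.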
To evaluate the robustness of the estimated values for malaria incidence, the Hurst exponent and entropy, shown in the previous appendix and discussed in the main text, we present, next, the results for estimates done from two and four annual cycles, i.e., 24 and 48 months, respectively. The results are shown in Figures A5-A8.
We choose to use the standard method to estimate the Hurst exponent, based on the quotient R/S [34,37,40-43]. An alternative method could be the so-called generalized Hurst exponent (GHE), defined from the function K_q(τ) in Equation (A1), where ⟨·⟩ is the average over time t. The GHE H_G depends in general on q as [44]

K_q(τ) ∼ τ^{q H_G(q)}. (A2)
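To make the scaling relation concrete, the sketch below estimates H_G(q) as the slope of log K_q(τ) against log τ, divided by q. It assumes the commonly used form K_q(τ) = ⟨|X(t+τ) − X(t)|^q⟩ / ⟨|X(t)|^q⟩ for a series X(t); this form, the function names and the choice of the maximum lag are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def k_q(x: Sequence[float], tau: int, q: float) -> float:
    """Assumed form of K_q(tau): q-th moment of increments over q-th moment of x."""
    n_inc = len(x) - tau
    numerator = sum(abs(x[t + tau] - x[t]) ** q for t in range(n_inc)) / n_inc
    denominator = sum(abs(v) ** q for v in x) / len(x)
    return numerator / denominator


def fit_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return sxy / sxx


def generalized_hurst(x: Sequence[float], q: float = 1.0, max_tau: int = 5) -> float:
    """Estimate H_G(q) from log K_q(tau) ~ q * H_G(q) * log tau, tau = 1..max_tau."""
    if max_tau < 2 or max_tau >= len(x):
        raise ValueError("need 2 <= max_tau < len(x)")
    taus: List[int] = list(range(1, max_tau + 1))
    log_tau = [math.log(t) for t in taus]
    log_k = [math.log(k_q(x, t, q)) for t in taus]
    return fit_slope(log_tau, log_k) / q


# Sanity check on a linear ramp: its increments over lag tau are exactly tau,
# so K_q(tau) is proportional to tau**q and the estimate is H_G = 1 for any q.
ramp = [float(t + 1) for t in range(50)]
h_q1 = generalized_hurst(ramp, q=1.0)
h_q2 = generalized_hurst(ramp, q=2.0)
```

The fit uses only small lags, which is where the linear relation between log K_q and log τ is usually expected to hold.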
A generalized Hurst exponent that varies with the moment q flags the presence of multifractality in the series. GHE is a more recent approach to Hurst analysis with specific advantages for complex and inhomogeneous time series. Applied to electrocardiography (ECG) signals, it has been shown to be a promising tool for the study of atrial fibrillation (AF) organization from the surface ECG [45]. It is usually recommended for short time series, where it has been shown to be slightly more efficient, and it has been used mainly in assessing the stability of financial firms in the stock market. However, GHE does not solve the problem of short time series such as the ones analysed in this paper. Moreover, deriving the exponent from the scaling relation in Equation (A2) imposes a linear relation between log K_q and log τ that usually holds only for the smallest values of τ. Consequently, it may give too much weight to more recent events, reducing the significance of past events. This is indeed the case for the GHE derived for Okech (see Figure 1), where, as malaria is eradicated towards the end of the data record, it can wrongly show an antipersistent value (H_G < 0.5). See Table A1. For this reason we choose the R/S analysis to estimate the Hurst exponent.
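For reference, a minimal sketch of a classical rescaled-range (R/S) estimate follows. It is a textbook-style version and not necessarily the exact procedure used for the results in this paper: the series is cut into non-overlapping windows of several sizes n, R/S is averaged per size, and H is taken as the slope of log(R/S) against log n. The window sizes, function names and synthetic test series are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence


def rescaled_range(chunk: Sequence[float]) -> Optional[float]:
    """R/S statistic of one window; None if the window is constant."""
    n = len(chunk)
    mean = sum(chunk) / n
    cumulative = 0.0
    z_min = z_max = 0.0      # the final cumulative deviation is always zero
    sum_sq = 0.0
    for value in chunk:
        deviation = value - mean
        cumulative += deviation
        z_min = min(z_min, cumulative)
        z_max = max(z_max, cumulative)
        sum_sq += deviation * deviation
    std = math.sqrt(sum_sq / n)
    if std == 0.0:
        return None
    return (z_max - z_min) / std


def hurst_rs(series: Sequence[float], window_sizes: Sequence[int]) -> float:
    """Slope of log(mean R/S) versus log(n) over distinct window sizes n."""
    log_n: List[float] = []
    log_rs: List[float] = []
    for n in window_sizes:
        starts = range(0, len(series) - n + 1, n)   # non-overlapping windows
        values = [rescaled_range(series[s:s + n]) for s in starts]
        valid = [v for v in values if v is not None]
        if valid:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(valid) / len(valid)))
    if len(log_n) < 2:
        raise ValueError("need at least two usable window sizes")
    mean_x = sum(log_n) / len(log_n)
    mean_y = sum(log_rs) / len(log_rs)
    sxx = sum((a - mean_x) ** 2 for a in log_n)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_n, log_rs))
    return sxy / sxx


# Synthetic check (hypothetical data): uncorrelated noise versus its running sum.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
walk: List[float] = []
total = 0.0
for step in noise:
    total += step
    walk.append(total)

sizes = [8, 16, 32, 64, 128]
h_noise = hurst_rs(noise, sizes)   # expected near 0.5, with small-sample bias
h_walk = hurst_rs(walk, sizes)     # strongly persistent, expected well above h_noise
```

With windows as short as the 24 to 48 months considered here, far fewer window sizes are available than in this synthetic check, which is one reason estimates from short series should be read with caution.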
Those results will be compared with two empirical time series with different transmission patterns from separate world regions where malaria is still endemic, "Okech" and "Landoh" [25,27]. The Okech time series reveals a pattern consistent with an unstable epidemic state, with a decreasing trend from high levels of malaria incidence, evolving towards imminent disease elimination. The other empirical time series, Landoh, also reveals unstable epidemic behavior, but with a non-stationary increasing trend and an average level of malaria incidence by African standards, between 10 and 50 malaria cases per 100 inhabitants per year. See Table A2.

Table A2. (Top) Human-to-mosquito disease transmission efficiency (low, with w_h = 0.420, and high, with w_h = 0.733), and average malaria incidence (±SE) of ten simulations in all settings. (Bottom) Empirical time series with average malaria incidence (±SD) in different settings of malaria transmission.

As our model shows approximately annual seasonality, we choose a SARIMA model with a period of 12 months: