Parameter Estimation and Forecasting Strategies for Cholera Dynamics: Insights from the 1991–1997 Peruvian Epidemic

Karami, Hamed; Chowell, Gerardo; Mujica, Oscar J.; Smirnova, Alexandra

doi:10.3390/math13101692

¹

Department of Mathematics & Statistics, Georgia State University, Atlanta, GA 30303, USA

²

Department of Population Health Sciences, Georgia State University, Atlanta, GA 30303, USA

³

Department of Evidence and Intelligence for Action in Health, Pan American Health Organization, Washington, DC 20037, USA

^*

Author to whom correspondence should be addressed.

Mathematics 2025, 13(10), 1692; https://doi.org/10.3390/math13101692

Submission received: 15 April 2025 / Revised: 16 May 2025 / Accepted: 19 May 2025 / Published: 21 May 2025

(This article belongs to the Special Issue Advanced Intelligent Algorithms for Decision Making Under Uncertainty)


Abstract

Environmental transmission is a critical driver of cholera dynamics and a key factor influencing model-based inference and forecasting. This study focuses on stable parameter estimation and forecasting of cholera outbreaks using a compartmental SIRB model informed by three formulations of the environmental transmission rate: (1) a pre-parameterized periodic function, (2) a temperature-driven function, and (3) a flexible, data-driven time-dependent function. We apply these methods to the 1991–1997 cholera epidemic in Peru, estimating key parameters; these include the case reporting rate and human-to-human transmission rate. We assess practical identifiability via parametric bootstrapping and compare the performance of each transmission formulation in fitting epidemic data and forecasting short-term incidence. Our results demonstrate that while the data-driven approach achieves superior in-sample fit, the temperature-dependent model offers better forecasting performance due to its ability to incorporate seasonal trends. The study highlights trade-offs between model flexibility and parameter identifiability and provides a framework for evaluating cholera transmission models under data limitations. These insights can inform public health strategies for outbreak preparedness and response.

Keywords:

infectious diseases; parameter estimation; cholera transmission; forecasting

MSC:

92-08; 92-10; 65K10

1. Introduction

Cholera is an acute diarrheal infection caused by the bacterium Vibrio cholerae [1,2,3]. People contract cholera from drinking water or eating food contaminated with cholera bacteria [4]. The disease can spread quickly in areas with polluted water, poor sanitation, and inadequate hygiene [4,5,6]. Cholera symptoms include acute watery diarrhea [7,8], vomiting [6], dehydration [6], electrolyte imbalance [9], body weakness and fainting [10,11], abdominal cramps [12], and chills [5,13]. Patients with mild symptoms can be effectively treated with oral rehydration therapy [14,15]. More severe cases require intravenous rehydration and antibiotics [14]. With immediate treatment, the cholera mortality rate is under 1%; yet, if infection remains untreated, the death rate can rise to 70% [12].

For over 200 years, cholera has been one of the world’s most dangerous and deadly diseases [10,11,16,17,18,19]. As recently as the 1980s, annual cholera death rates were still around three million [14]. In the Americas, cholera was largely absent for most of the past century; however, it reemerged in the early 1990s with a severe outbreak in Peru [20,21,22], followed by two outbreaks in Haiti, in 2010 and 2018–2023, respectively [23]. In Peru, the first cases of cholera were identified in several coastal cities in January 1991. The disease then spread rapidly through the Peruvian highlands and jungle regions. By 28 November 1991, over 300,000 cholera cases had been reported, along with 3516 deaths [24]. However, the actual number of cholera cases might have considerably exceeded the official statistics due to the large number of asymptomatic and mild cases [25]. According to [26], the cost of the cholera outbreak to the country’s economy in 1991 surpassed USD 770 million due to food trade embargoes and a significant decline in tourism.

Mathematical models of disease transmission coupled with robust optimization algorithms for parameter estimation help to understand cholera progression and assess the efficiency of preventive measures. Basic cholera models extend SIR dynamics by adding environment-to-human transmission to account for the infection from bacteria-contaminated water. This introduces an extra compartment into the model, typically denoted as B, that stands for the concentration of pathogenic bacterium V. cholerae in the aquatic space [27,28]. In [29], the authors take this one step further by dividing B into lower and hyper pathogen compartments,

B_{l}

and

B_{h}

, to incorporate a hyper-infectious stage of V. cholerae. A network approach to modeling cholera spread is considered in [30], where a human population is split into a number of host groups residing in different locations. Human-to-human and environment-to-human pathogen dynamics inside each region are coupled with between-regions disease transmission. Advanced models studying multi-scale infection dynamics, the impacts of control policies, and uncertainty in disease transmission and diagnostics have all become invaluable research tools in the global fight against cholera [27,28].

In most studies, human-to-human and environmental transmission rates are assumed to be constant; their values are estimated by fitting model predictions to reported incidence (or cumulative) series [27]. In [31], a time-dependent environment-to-human transmission rate was introduced to leverage the connection between the rate of aquatic transmission and the weekly temperature fluctuations in the corresponding region. Across 25 departments in Peru, the reconstructed transmission rate, set up as a function of temperature, helped explain the primary drivers of cholera in a population with no immunity. To expand upon this approach, in this paper, we investigate and compare three distinct approximation strategies for the environmental transmission rate: (1) pre-parameterized periodic transmission, (2) temperature-dependent transmission, and (3) data-driven time-dependent transmission, while concurrently estimating other key epidemic parameters, such as the case reporting rate and the human-to-human transmission rate. Additionally, we use pre-parameterized and temperature-dependent transmission rates to forecast cholera cases beyond the training period, with particular attention given to the seasonal nature of the disease and its strong correlation with temperature. Our results include several forecasts, along with an in-depth analysis of cholera trends, offering insights into model performance and practical implications for epidemic preparedness.

The paper is organized as follows: Section 2 introduces the model of cholera dynamics and the optimization algorithm for stable parameter estimation employed in our study. In Section 3, we present numerical analysis of three different approaches to the discretization of the environmental transmission rate, and Section 4 details the forecasting methodology and evaluates multiple forecasting scenarios. Finally, Section 5 concludes the paper with key findings and their implications for cholera epidemic modeling and public health awareness.

2. Modeling Cholera Transmission

The

S I R B

model of cholera spread [31,32,33,34,35,36] captures the transmission dynamics of this alarming bacterial infection within an immunologically naive host population:

\begin{matrix} \frac{d S}{d t} & = μ N - β_{h} S (t) I (t) - β_{e} (t) S (t) \frac{B (t)}{B (t) + κ} - μ S (t), \\ \frac{d I}{d t} & = β_{h} S (t) I (t) + β_{e} (t) S (t) \frac{B (t)}{B (t) + κ} - (μ + γ) I (t), \\ \frac{d R}{d t} & = γ I (t) - μ R (t), \\ \frac{d B}{d t} & = λ I (t) - δ B (t), \end{matrix}

(1)

with the initial conditions

S (0) = N - I_{0}, I (0) = I_{0}, R (0) = 0, B (0) = B_{0} .

(2)

In the above, N is the total population of the region. Susceptible, S, humans acquire cholera (indirectly) from infectious, I, individuals and (directly) from bacteria-contaminated water sources. The fourth compartment, B, accounts for the concentration of vibrios in the environment, and R is the number of people removed from cholera disease following their recovery or, in less than 1% of cases, death. The model omits any potential movement back from the removed, R, to the susceptible, S, class since Vibrio cholerae infection is known to provide long-lasting immunity to the recovered population [27,28]. As in a classical SIR model, system (1) assumes that cholera dynamics are much faster than the dynamics of natural birth and death. Therefore, birth and death rates are expected to balance one another in (1).
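As a rough illustration of how system (1) can be integrated numerically, the following Python sketch implements its right-hand side and a classical fixed-step Runge–Kutta scheme. It is not the authors' implementation: the fixed parameters are taken from Table 1, while the population size `N_POP`, the initial number of infections `I0`, the constant rates passed in the last line, and the helper names are hypothetical choices made only for this example.

```python
# Illustrative Python sketch of system (1); not the authors' MATLAB code.
MU = 1.0 / (60 * 52)  # natural birth and death rate, week^-1 (Table 1)
KAPPA = 1.0e6         # 50% infectious dose, mL^-1 (Table 1)
GAMMA = 7.0 / 5.0     # recovery rate, week^-1 (Table 1)
LAMBDA = 10.0         # contribution rate of vibrios (Table 1)
DELTA = 7.0 / 30.0    # death rate of vibrios, week^-1 (Table 1)
B0 = 5.0e4            # initial vibrio concentration, mL^-1 (Table 1)

# Hypothetical values, chosen only for illustration (not from the paper).
N_POP = 500_000.0
I0 = 10.0


def sirb_rhs(t, y, beta_h, beta_e):
    """Right-hand side of (1); beta_e is a callable t -> rate."""
    s, i, r, b = y
    new_inf = beta_h * s * i + beta_e(t) * s * b / (b + KAPPA)
    return (
        MU * N_POP - new_inf - MU * s,
        new_inf - (MU + GAMMA) * i,
        GAMMA * i - MU * r,
        LAMBDA * i - DELTA * b,
    )


def _shift(y, k, h):
    """Return y + h * k, component by component."""
    return tuple(a + h * d for a, d in zip(y, k))


def rk4_solve(y0, t_end, dt, beta_h, beta_e):
    """Classical RK4 with a fixed step; returns the state at t_end."""
    y, t = tuple(y0), 0.0
    for _ in range(round(t_end / dt)):
        k1 = sirb_rhs(t, y, beta_h, beta_e)
        k2 = sirb_rhs(t + dt / 2, _shift(y, k1, dt / 2), beta_h, beta_e)
        k3 = sirb_rhs(t + dt / 2, _shift(y, k2, dt / 2), beta_h, beta_e)
        k4 = sirb_rhs(t + dt, _shift(y, k3, dt), beta_h, beta_e)
        y = tuple(
            a + dt / 6 * (p + 2 * q + 2 * u + v)
            for a, p, q, u, v in zip(y, k1, k2, k3, k4)
        )
        t += dt
    return y


# Ten weeks with constant, hypothetical transmission rates.
state = rk4_solve((N_POP - I0, I0, 0.0, B0), 10.0, 0.01, 1.0e-6, lambda t: 0.05)
```

Because births and deaths balance in (1), the sum S + I + R should stay equal to N along the computed trajectory, which gives a simple sanity check. A fixed-step explicit scheme is used here only for brevity; a stiff solver may be preferable for some parameter regimes.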

To ensure stable parameter estimation, we incorporated both incidence,

D = {[D_{1}, D_{2}, \dots, D_{m}]}^{⊤}

, and cumulative data,

C = {[C_{1}, C_{2}, \dots, C_{m}]}^{⊤}

, in our analysis framework, with m being the total number of weeks in the study period. The errors in the reported incidence data can be treated as independent and identically distributed (i.i.d.), which potentially results in higher estimation accuracy. At the same time, incidence data are often irregular and widely scattered, which complicates the identification of disease parameters consistent with the level of noise in the reported data. The cumulative data are smooth, making it easier to adjust the fit. However, since earlier cases dominate in the cumulative series, using cumulative data alone may result in growing noise propagation towards the end of the study interval. Combining incidence and cumulative data lets us benefit from both the accuracy of incidence data and the stabilizing effect of its cumulative counterpart. Due to a large number of asymptomatic cases [25], we suppose that incidence data,

D

, are under-reported, and the true number of incidence cases is

D / ψ

,

0 < ψ \leq 1

. The epidemic parameters used in model (1), along with their specific values, are listed in Table 1.

Thus, our goal is to optimize the model with respect to human-to-human transmission rate,

β_{h}

, transmission rate from the environment,

β_{e} (t)

, and the reporting rate,

ψ

. To that end, we introduce the vector of unknowns,

θ = {[θ_{1}, θ_{2}, \dots, θ_{n}]}^{⊤}

, set

β_{h} = θ_{1}

,

ψ = θ_{2}

, and express

β_{e} (t)

in terms of

θ_{3}, θ_{4}, \dots, θ_{n}

,

n \geq 3

, leading to its approximation,

{\tilde{β}}_{e} [θ] (t)

. Now, if one solves the ODE system (1) with

θ_{1}

,

θ_{2}

, and

{\tilde{β}}_{e} [θ] (t)

, replacing

β_{h}

,

ψ

, and

β_{e} (t)

, respectively, one obtains state variables,

\tilde{S} [θ] (t)

,

\tilde{I} [θ] (t)

,

\tilde{R} [θ] (t)

, and

\tilde{B} [θ] (t)

, as functions of

θ

. This allows us to estimate

β_{h}

,

ψ

, and

β_{e} (t)

by fitting the observable part of (1) to the reported data while minimizing the misfit with respect to

θ

. In other words, the original task of optimizing the model with respect to state variables and epidemic parameters is reduced to the unconstrained optimization with respect to epidemic parameters only. The size of the solution space will depend on the number of parameters used to approximate

β_{e} (t)

. In what follows, three different approaches to this approximation will be explored. Of course, a closed-form solution to (1) for any given set of parameters,

θ

, is unattainable. Thus, system (1) has to be solved numerically at every step of an optimization algorithm.

According to the first equation in (1), the number of newly infected people over the week

t_{i}

is given by

θ_{1} \tilde{S} [θ] (t_{i}) \tilde{I} [θ] (t_{i}) + {\tilde{β}}_{e} [θ] (t_{i}) \tilde{S} [θ] (t_{i}) \frac{\tilde{B} [θ] (t_{i})}{\tilde{B} [θ] (t_{i}) + κ}

. Hence, the observation operator for the incidence data,

D

, is defined as

\begin{matrix} Z = {[Z [θ] (t_{1}), Z [θ] (t_{2}), \dots, Z [θ] (t_{m})]}^{⊤}, \end{matrix}

(3)

\begin{matrix} Z [θ] (t_{i}) = θ_{1} \tilde{S} [θ] (t_{i}) \tilde{I} [θ] (t_{i}) + {\tilde{β}}_{e} [θ] (t_{i}) \tilde{S} [θ] (t_{i}) \frac{\tilde{B} [θ] (t_{i})}{\tilde{B} [θ] (t_{i}) + κ}, i = 1, 2, \dots, m, \end{matrix}

(4)

and one arrives at the following nonlinear least squares problem (NLSP):

\min_{θ} F (θ), \quad F (θ) = {\left\| θ_{2} \begin{bmatrix} Z \\ ω T Z \end{bmatrix} - \begin{bmatrix} D \\ ω C \end{bmatrix} \right\|}^{2}, \quad T = {\begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 1 & 1 & 0 & \dots & 0 \\ 1 & 1 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \dots & 1 \end{bmatrix}}_{m \times m},

(5)

where

F (θ)

is a discrete analog of the least squares error,

Z

represents the model-predicted new infections, T is the cumulative sum operator matrix,

ω > 0

is an appropriate weighting factor, and

\| \cdot \|

stands for the Euclidean norm in

\mathbb{R}^{2 m}

.
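To make the structure of the objective in (5) concrete, the following Python sketch evaluates F for a given reporting rate and a given vector of model-predicted incidence. The function name `objective` and its argument names are hypothetical; in the actual workflow the vector Z would come from a numerical solution of (1) rather than being supplied directly.

```python
from itertools import accumulate


def objective(psi, z, d, c, omega):
    """Evaluate F from (5) for a given model incidence vector.

    psi   -- reporting rate (theta_2)
    z     -- model-predicted weekly new infections, Z
    d, c  -- reported incidence and cumulative series, D and C
    omega -- weight on the cumulative block
    """
    tz = list(accumulate(z))  # T Z: running sum of Z
    inc = [psi * zi - di for zi, di in zip(z, d)]
    cum = [omega * (psi * ti - ci) for ti, ci in zip(tz, c)]
    return sum(r * r for r in inc) + sum(r * r for r in cum)


# Toy numbers (not from the paper): the reported data equal psi * Z exactly.
z_model = [100.0, 200.0, 400.0]
d_reported = [50.0, 100.0, 200.0]
c_reported = [50.0, 150.0, 350.0]
f_exact = objective(0.5, z_model, d_reported, c_reported, omega=0.1)
```

Multiplying by the lower-triangular matrix of ones, T, is the same as taking a running sum, so the matrix never needs to be formed explicitly.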

Stable estimation of cholera transmission rates allows for the calculation of the basic reproduction number,

R_{0}

, defined as the average number of secondary cases caused by each infected individual in an entirely susceptible population [31]. Following [32], one can calculate the basic reproduction number for model (1) as the spectral radius of its next-generation matrix,

F V^{- 1}

, where

F = \begin{pmatrix} β_{h} N & \frac{β_{e} (0) N}{κ} \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad V = \begin{pmatrix} γ + μ & 0 \\ - λ & δ \end{pmatrix} .

Thus,

R_{0} = \frac{N}{δ κ (γ + μ)} (λ β_{e} (0) + δ κ β_{h})

. The basic reproduction number,

R_{0}

, is influenced by multiple factors, such as water management, hygiene practices, access to quality healthcare, and others. Over time, due to a long-term immunity of the recovered population and variations in

β_{e} (t)

, the transmission potential of cholera disease tends to change. This change is reflected in the effective reproduction number,

R_{e} (t)

, which accounts for the depletion of susceptible individuals in the context of a time-dependent environmental component [31]:

R_{e} (t) = \frac{S (t)}{δ κ (γ + μ)} (λ β_{e} (t) + δ κ β_{h}) .

(6)

Accurate monitoring of

R_{e} (t)

allows for examining the strength of control and prevention measures and ensures reliable forecasting of future incidence cases [37].
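The link between the next-generation matrix and the closed-form expressions above can be checked numerically. The Python sketch below evaluates (6) and, separately, the top-left entry of the next-generation matrix; since the second row of F is zero, that entry is the only nonzero eigenvalue and hence the spectral radius. The fixed parameters are those of Table 1; the population size and the two transmission rates are hypothetical values used only for this check.

```python
import math

MU = 1.0 / (60 * 52)  # week^-1 (Table 1)
KAPPA = 1.0e6         # mL^-1 (Table 1)
GAMMA = 7.0 / 5.0     # week^-1 (Table 1)
LAMBDA = 10.0         # contribution rate of vibrios (Table 1)
DELTA = 7.0 / 30.0    # week^-1 (Table 1)


def effective_reproduction_number(s, beta_e, beta_h):
    """R_e from (6) for susceptible count s and the current rates."""
    scale = DELTA * KAPPA * (GAMMA + MU)
    return s * (LAMBDA * beta_e + DELTA * KAPPA * beta_h) / scale


def next_generation_radius(n_pop, beta_e0, beta_h):
    """Top-left entry of F V^{-1}, which equals its spectral radius."""
    # V^{-1} = [[DELTA, 0], [LAMBDA, GAMMA + MU]] / ((GAMMA + MU) * DELTA)
    det_v = (GAMMA + MU) * DELTA
    return (beta_h * n_pop * DELTA + beta_e0 * n_pop / KAPPA * LAMBDA) / det_v


# Hypothetical inputs, for illustration only.
N_POP, BETA_E0, BETA_H = 500_000.0, 0.05, 1.0e-6
r0_formula = effective_reproduction_number(N_POP, BETA_E0, BETA_H)
r0_matrix = next_generation_radius(N_POP, BETA_E0, BETA_H)
```

With S(t) = N and the initial environmental rate, (6) reduces to the basic reproduction number, so the two computed values should agree up to rounding.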

Table 1. Parameter definitions and baseline values for the SIRB cholera model (1).

Symbol	Definition	Value (units)	Reference
$μ$	Natural birth and death rate	$\frac{1}{60 \cdot 52}$ week⁻¹	[31]
$κ$	50% infectious dose	$10^{6}$ mL⁻¹	[35]
$γ$	Recovery rate	$\frac{7}{5}$ week⁻¹	[38]
$λ$	Contribution rate of vibrios from infected individuals	$10$ mL⁻¹ week⁻¹	[35]
$δ$	Death rate of vibrios in the environment	$\frac{7}{30}$ week⁻¹	[39]
$B_{0}$	Initial concentration of vibrios	$5 \cdot 10^{4}$ mL⁻¹	–
$β_{h}$	Human-to-human transmission rate	(person⁻¹ week⁻¹)	Estimated
$β_{e} (t)$	Environmental transmission rate	(week⁻¹)	Estimated
$ψ$	Reporting rate	(unitless)	Estimated

3. Parameter Estimation with Quantified Uncertainty

3.1. Discrete Approximation of the Environmental Transmission Rate

In this section, we focus on stable estimation of the human-to-human transmission rate,

β_{h}

, the environmental transmission rate,

β_{e} (t)

, and the incidence reporting rate,

ψ

. The goal of our study is to investigate the main drivers behind the aggressive spread of cholera during the 1991–1997 cholera epidemic in Peru, and to assess the efficiency of the government response and the behavioral changes in the population. To that end, we investigate and compare three discretization strategies for the environmental transmission rate,

β_{e} (t)

, which is responsible for the direct transmission of Vibrio cholerae: (1) pre-parameterized periodic transmission, (2) temperature-dependent transmission, and (3) data-driven time-dependent transmission. We concurrently estimated two other important disease parameters,

β_{h}

and

ψ

.

Peru is known for its diverse climate due to a combination of the cold current in the Pacific Ocean, the mountain climate of the Andean highlands, and the tropical temperatures in the Amazon jungle; the difference between summer (December to April) and winter (June to October) temperatures in each region may be significant. It has been observed that the emergence of cholera is strongly correlated with temperature fluctuations, with warmer temperatures leading to a higher number of cholera cases [27,28,31]. Hence, modeling

β_{e} (t)

as a function of temperature can potentially help to reconstruct a more nuanced environmental transmission rate, which is crucial to a better understanding of cholera dynamics:

{\tilde{β}}_{e} [θ] (t) = θ_{3} T_{norm} (t) + θ_{4}, T_{norm} (t) = \frac{T (t) - μ (T)}{σ (T)},

(7)

where

T_{norm} (t)

represents the standardized temperature, and

T (t)

is the original temperature measurement. The terms

μ (T)

and

σ (T)

refer to the mean and standard deviation of all temperature values in the data set, respectively, with any missing values omitted from these calculations. This transformation centers the temperature data around zero and scales it according to its variability.
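A minimal Python sketch of the standardization in (7) is given below. It assumes that missing weeks are marked with `None`, that they are simply passed through to the output, and that the sample standard deviation is used; none of these choices is specified in the text, and the function names and temperature values are hypothetical.

```python
from statistics import mean, stdev


def standardize(temps):
    """Z-score a temperature series; None marks a missing week.

    Missing values are left out of the mean and standard deviation
    and stay None in the output. Using the sample standard deviation
    is an assumption of this sketch.
    """
    observed = [x for x in temps if x is not None]
    m, s = mean(observed), stdev(observed)
    return [None if x is None else (x - m) / s for x in temps]


def beta_e_temperature(theta3, theta4, t_norm):
    """Equation (7): affine function of the standardized temperature."""
    return [None if z is None else theta3 * z + theta4 for z in t_norm]


# Toy weekly temperatures in degrees Celsius (not from the data set).
t_norm = standardize([14.0, 16.0, None, 18.0])
beta_e = beta_e_temperature(0.02, 0.05, t_norm)
```

After standardization, the second coefficient in (7) is the environmental transmission rate at the mean temperature, and the first is the change in that rate per standard deviation of temperature.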

Note that seasonality in cholera transmission is also driven by factors other than temperature. For example, cholera outbreaks often peak during (or after) heavy rainfall due to increased contamination of water sources with cholera bacteria and possible damage to sanitation systems. Therefore, in some sense, using a general pre-parameterized periodic transmission,

β_{e} (t)

, accounts for seasonality patterns more broadly:

{\tilde{β}}_{e} [θ] (t) = θ_{3} [sin (\frac{2 π t}{52} + θ_{4}) + 1] + θ_{5}, θ_{i} \geq 0, i = 3, 4, 5,

(8)

where

θ_{3}

determines the amplitude of oscillations, and

θ_{4}

and

θ_{5}

control horizontal and vertical shifts, respectively. The addition of 1 to the sine function ensures that

{\tilde{β}}_{e} [θ] (t)

remains nonnegative within the study window. The factor 52 represents the number of weeks in a year, so the period of oscillation is one year.
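The periodic rate in (8) is straightforward to evaluate. The short Python sketch below uses hypothetical parameter values, with time measured in weeks, and computes the rate over two years to show the bounds implied by the nonnegativity constraints.

```python
import math


def beta_e_periodic(t, theta3, theta4, theta5):
    """Equation (8); t is measured in weeks."""
    return theta3 * (math.sin(2.0 * math.pi * t / 52.0 + theta4) + 1.0) + theta5


# Hypothetical parameter values, for illustration only.
THETA3, THETA4, THETA5 = 0.02, 1.0, 0.005
weekly_rates = [beta_e_periodic(t, THETA3, THETA4, THETA5) for t in range(104)]
```

With nonnegative amplitude and vertical shift, the rate stays between the vertical shift and the vertical shift plus twice the amplitude, and it repeats every 52 weeks.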

Finally, approximating

β_{e} (t)

as a linear or nonlinear combination of basis functions without imposing any pre-set behavior helps to account for triggers beyond climate, such as sanitation infrastructure, travel patterns, natural disasters, food preparation, and others. In particular, writing

{\tilde{β}}_{e} [θ] (t)

as a linear combination of Fourier basis functions or Legendre polynomials,

P_{j} (t)

,

j = 0, 1, 2, \dots, n - 3

, gives rise to

{\tilde{β}}_{e} [θ] (t) = \sum_{j = 3}^{n} θ_{j} P_{j - 3} (t), \quad n \geq 3 .

(9)
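As one possible reading of (9) with a Legendre basis, the Python sketch below evaluates the polynomials by the standard three-term (Bonnet) recursion and forms their linear combination. Mapping the study window linearly onto the interval [-1, 1] is an assumption of this sketch, as are the function names and the coefficient values.

```python
def legendre_values(x, degree):
    """Return [P_0(x), ..., P_degree(x)] via Bonnet's recursion."""
    values = [1.0, x]
    for k in range(1, degree):
        nxt = ((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1)
        values.append(nxt)
    return values[: degree + 1]


def beta_e_legendre(t, coeffs, t_start, t_end):
    """Equation (9) with P_j taken as Legendre polynomials.

    coeffs holds theta_3, ..., theta_n. Mapping [t_start, t_end]
    onto [-1, 1] is an assumption of this sketch.
    """
    x = 2.0 * (t - t_start) / (t_end - t_start) - 1.0
    basis = legendre_values(x, len(coeffs) - 1)
    return sum(c * p for c, p in zip(coeffs, basis))


# Hypothetical coefficients, evaluated mid-window (x = 0).
rate_mid = beta_e_legendre(26.0, [0.05, 0.01, 0.02], 0.0, 52.0)
```

Unlike (7) and (8), this form does not by itself keep the rate nonnegative, so the sign of the reconstructed rate depends on the estimated coefficients.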

In 1989, shortly before the start of the 1991–1997 cholera epidemic, in collaboration with the US Centers for Disease Control (CDC), the Peruvian Field Epidemiology Training Program (FETP) was established. Over the course of the 1991–1997 outbreak, the system covered nearly 6000 health centers in 25 different departments [20,21,22] (see Figure 1). Epidemiological surveillance included both laboratory-confirmed and suspected cases (that is, cases of acute and watery diarrhea in patients older than five). The data are publicly available in the Figshare repository (https://doi.org/10.6084/m9.figshare.10005170.v1, accessed on 15 April 2025) [31,40,41]. For the study of temperature-dependent transmission,

β_{e} (t)

, weekly temperature time series can be retrieved from the European Centre for Medium-Range Weather Forecasts’ ERA-Interim atmospheric reanalysis archive (https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-interim, accessed on 15 April 2025), covering the period from 1991 to 1997 [31]. This archive provides daily minimum, mean, and maximum temperatures for all 25 Peruvian departments, which we used to explore the relationship between case incidence and temperature.

3.2. Parameter Estimation for the Ayacucho Region

To examine the advantages and limitations of our proposed parameter estimation models for

β_{e} (t)

, i.e., pre-parameterized periodic transmission (8), temperature-dependent transmission (7), and data-driven time-dependent transmission (9), we focused on the cholera epidemic in the inland region of Ayacucho during two time windows: from 26 February 1991 to 17 December 1991, and from 4 June 1991 to 19 May 1992. The outbreak in Ayacucho was influenced by a combination of environmental and socioeconomic factors, with increased rainfall, flooding, and the lack of basic services all contributing to the severity of the cholera spread [42,43,44].

To solve the nonlinear least squares problem (5), we employed the built-in function ‘lsqcurvefit’ from the MATLAB Optimization Toolbox, which implements the trust-region-reflective algorithm. At every step of the iterative process, we used ‘ode23s’ for the numerical approximation of state variables in the ODE system (1), since, for some epidemic scenarios, in the presence of two different transmission pathways, environmental and human-to-human, model (1) may be stiff. To quantify the uncertainty in our estimated disease parameters,

β_{e} (t)

,

β_{h}

, and

ψ

, we refit the model to

M = 100

additional data sets for incidence and cumulative cases, assuming a Poisson error structure. The resulting M best-fit parameter sets were used to estimate the mean values and the 95% confidence intervals for each of the three parameters,

β_{e} (t)

,

β_{h}

, and

ψ

, and for the effective reproduction number,

R_{e} (t)

(6).
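The resampling idea can be sketched in Python as follows. This is a deliberately simplified stand-in for the procedure described above: the model incidence curve is held fixed, only the reporting rate is re-estimated, and the refit uses the closed-form Poisson maximum-likelihood estimate instead of a full nonlinear least squares solve. The function names, the toy curve `Z_FIT`, and the use of nearest order statistics for the interval are all assumptions of this sketch.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate only for modest means."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def bootstrap_reporting_rate(psi_hat, z, n_boot=100, seed=0):
    """Parametric bootstrap for psi with the model curve z held fixed."""
    rng = random.Random(seed)
    total_z = sum(z)
    estimates = []
    for _ in range(n_boot):
        # Synthetic incidence with Poisson noise around the best fit.
        synthetic = [poisson_draw(psi_hat * zi, rng) for zi in z]
        # Simplified refit: for D_i ~ Poisson(psi * z_i), the
        # maximum-likelihood estimate of psi is sum(D) / sum(z).
        estimates.append(sum(synthetic) / total_z)
    estimates.sort()
    lo = estimates[int(0.025 * (n_boot - 1))]
    hi = estimates[math.ceil(0.975 * (n_boot - 1))]
    return sum(estimates) / n_boot, (lo, hi)


# Toy best-fit curve of weekly infections (not from the paper).
Z_FIT = [200.0, 800.0, 1500.0, 900.0, 400.0]
psi_mean, (psi_lo, psi_hi) = bootstrap_reporting_rate(0.03, Z_FIT)
```

In the full procedure, each synthetic data set would instead be passed through the same least squares fit used for the original data, so that all estimated parameters receive bootstrap distributions.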

Figure 2 compares the fit to incidence and cumulative cases for three transmission rate modeling approaches, (7)–(9), with parameters reconstructed from epidemic data for the Ayacucho region, February–December 1991. The incidence and cumulative curves generated from pre-parameterized periodic (8) and temperature-dependent (7) transmission rates (left and middle columns, respectively) follow the reported data quite well, considering the limitations of the a priori assumptions enforced by these models. Both approximations, (7) and (8), merge the first two peaks into a single peak located between them. Overall, however, they represent the general data trend correctly. As expected, the pre-parameterized periodic model (8) over-smooths both the incidence and cumulative curves, yet the confidence intervals cover most of the data points in the reported sets. In contrast to the periodic transmission rate (8), the temperature-dependent transmission model (7) (middle column) shows greater sensitivity to data variations. This increased sensitivity to environmental factors produces trajectories that mimic data fluctuations more closely, but the model struggles with overall accuracy.

As Figure 2 illustrates, among all three discretization methods, the reconstructed data-driven time-dependent (9) transmission rate (right column) achieves the highest accuracy for both incidence and cumulative cases. By projecting the environmental transmission rate onto a finite-dimensional space with a sufficiently large number of basis functions (we used 30 basis functions in our simulations), one obtains a near-perfect fit with narrow confidence intervals. This demonstrates the effectiveness of the proposed methodology in reconstructing the complex dynamics of cholera transmission.

The 95% confidence intervals (CI) provide insights into the estimation uncertainty, with the time-dependent

β_{e} (t)

model (9) generating the narrowest intervals with the best data coverage, reinforcing its superior performance. For (9), the median curves follow the incidence and cumulative data closely, whereas visible misfits appear when models (7) and (8) are used to approximate

β_{e} (t)

.

Figure 3 represents a comprehensive comparison of parameter estimation results across the three approximation models of environmental transmission in the Ayacucho region, February–December 1991. Each column corresponds to a different discretization method, and the rows represent different epidemiological parameters.

In the top row, the environmental transmission rates,

β_{e} (t)

, for models (7) and (8) exhibit similar behavioral patterns, with the temperature-dependent (7) transmission rate (middle column) showing wider confidence intervals and greater sensitivity to the observed data, for which temperature distribution was expected to serve as a proxy. The periodic (8) discretization (left column) gives rise to a smooth, low-amplitude curve with minimal fluctuations (ranging from approximately 0 to 0.04), representing a simplified version of the environmental transmission rate. The temperature-dependent (7) approximation of

β_{e} (t)

(middle column) takes a higher average value (approximately 0.05) with a wide CI (ranging from approximately 0 to 0.14), but it still changes rather slowly. In contrast, the data-driven time-dependent (9) transmission rate (right column) displays complex, oscillating behavior, with a significantly higher amplitude (up to 0.13). This complex pattern closely follows the reported incidence data presented in Figure 2, demonstrating this method’s ability to capture the intricate transmission dynamics driving the outbreak.

The second row illustrates the effective reproduction number,

R_{e} (t)

, which provides crucial insights into disease transmissibility over time. The periodic method (8) leads to a relatively stable reproduction number, close to 1, with minor changes (between approximately 0.8 and 1.4) that appear over-smoothed (similar to its underlying transmission rate). The temperature-dependent approach (7) shows higher initial values of

R_{e} (t)

that rapidly decrease and then stabilize around 1 for most of the study period; the CI becomes narrower towards the right end of the interval. The reproduction number,

R_{e} (t)

, for the reconstructed time-dependent method (9) reveals the most complex pattern, with multiple peaks exceeding 1.5 that align with the incidence series in timing but not necessarily in height. The reproduction number based on (9) drops below 0.5 in July 1991, then rises again to form another wave towards the end of the study period.

The third row presents the estimated human-to-human transmission rates,

β_{h}

, with their respective confidence intervals. When the periodic method (8) is used for

β_{e} (t)

approximation,

β_{h} = 2.83 \times 10^{- 6}

(95% CI:

2.75 \times 10^{- 6}, 3.08 \times 10^{- 6}

). When using the temperature-dependent model (7) for

β_{e} (t)

, we obtain

β_{h} = 1.85 \times 10^{- 6}

(95% CI:

4.91 \times 10^{- 7}, 2.93 \times 10^{- 6}

), and the data-driven time-dependent discretization (9) for

β_{e} (t)

gives rise to

β_{h} = 1.28 \times 10^{- 6}

(95% CI:

1.01 \times 10^{- 6}, 1.66 \times 10^{- 6}

). While these estimates differ, all three methods suggest relatively low human-to-human transmission rates of order

10^{- 6}

, indicating that (indirect) human-to-human transmission plays a minor role compared to the transmission from the aquatic environment. The periodic method shows the narrowest histogram distribution with clear central values, and the temperature-dependent approach exhibits the widest uncertainty range (consistent with the uncertainty in

β_{e} (t)

reconstruction).

The bottom row displays the reporting rate estimates,

ψ

, representing the proportion of reported cases. The periodic transmission rate method (8) estimates

ψ = 2.79 \times 10^{- 2}

(95% CI:

1.3 \times 10^{- 2}, 3.55 \times 10^{- 2}

), suggesting that only about 2.8 percent of all cases were reported. The temperature-dependent approach (7) leads to

ψ = 2.75 \times 10^{- 2}

(95% CI:

1.65 \times 10^{- 2}, 7.89 \times 10^{- 2}

), and the data-driven time-dependent discretization (9) corresponds to

ψ = 4.2 \times 10^{- 2}

(95% CI:

2.98 \times 10^{- 2}, 6.21 \times 10^{- 2}

). These consistently low reporting rates across all methods indicate significant under-reporting, with the data-driven time-dependent method (9) suggesting the highest reporting rate at approximately 4.2%. This is consistent with most infected people having mild or no symptoms, so that these cases remain unaccounted for. Figure 3 reveals that while the three methods for the approximation of

β_{e} (t)

differ in how accurately they characterize the transmission dynamics from the environment, they reconstruct similar magnitudes for human-to-human transmission,

β_{h}

(of order

10^{- 6}

), and the reporting rate,

ψ

(of order

10^{- 2}

). Model (9) provides the most reliable methodology for analyzing cholera transmission and the efficiency of the control and prevention measures put in place by the authorities.
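Because the true incidence is modeled as the reported incidence divided by the reporting rate, each point estimate of the reporting rate quoted above translates directly into an under-reporting multiplier. As a quick arithmetic check for the periodic (8), temperature-dependent (7), and data-driven (9) estimates, respectively:

```latex
\frac{1}{2.79 \times 10^{-2}} \approx 35.8, \qquad
\frac{1}{2.75 \times 10^{-2}} \approx 36.4, \qquad
\frac{1}{4.2 \times 10^{-2}} \approx 23.8 .
```

Under the model, one reported case in this window therefore corresponds to roughly 24 to 36 infections, depending on the formulation of the environmental transmission rate.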

Figure 4 displays the fit to incidence and cumulative cases in Ayacucho, June 1991–May 1992, associated with three discretization strategies for the transmission rate,

β_{e} (t)

, which is responsible for the direct transmission of Vibrio cholerae from the environment: pre-parameterized periodic transmission (8), temperature-dependent transmission (7), and data-driven time-dependent transmission (9). Two other important disease parameters,

β_{h}

and

ψ

, have also been estimated along with

β_{e} (t)

.

The periodic transmission rate (8) method (left column) captures the overall disease dynamics rather accurately while skipping over the details. It averages the peaks in the second half of the window and underestimates the initial peak. Nevertheless, it captures the general trend well. For the June 1991–May 1992 time frame, as presented in Figure 4, the temperature-dependent (7) approximation method for

β_{e} (t)

(middle column) yields a poorer data fit than for the February–December 1991 window shown in Figure 2. The reconstructed incidence curve struggles to capture any peaks beyond the initial one, predicting a relatively stable epidemic trend instead. A possible explanation is that temperature fluctuations were not the dominant factor in environmental transmission during this period, with other events, such as increased rainfall and flooding, contributing more to cholera spread.

The data-driven time-dependent transmission rate (9) method (right column) continues to perform best, achieving a near-perfect fit to both incidence and cumulative data across all waves. This consistent performance across different time windows indicates that, given the numerous factors impacting cholera, a transmission rate learned from data fits better than any pre-set behavior. The numerical study across the two time periods shows that while the pre-parameterized periodic approach (8) generally outperforms the temperature-dependent method (7), neither can match the in-sample accuracy of the data-driven time-dependent algorithm (9).

Figure 5 illustrates the comparison of parameter estimation results among the three modeling strategies for the June 1991–May 1992 period. Each column corresponds to a different method, while the rows represent different epidemiological parameters. In the upper row, the environmental transmission rate,

β_{e} (t)

, displays dissimilar patterns across the three methods. The periodic (8) transmission rate (left column) is a full sine wave with a narrow confidence interval, ranging from approximately 0.05 to 0.35, and it clearly demonstrates the limitations of its built-in behavior. The mean curve for the temperature-dependent (7) rate (middle column) is near-horizontal (ranging from approximately 0.02 to 0.04), with slight variations and large uncertainty (similar to the temperature-dependent rate in Figure 3 for the previous time interval).

The reconstructed data-driven time-dependent (9) transmission rate (right column) shows complex oscillatory behavior, with multiple peaks (ranging from approximately 0.01 to 0.15). As before, the trajectory for (9) correlates with the waves observed in the incidence data (see Figure 4), though the magnitude of the peaks is not consistent. The second row illustrates the time-dependent effective reproduction number R_{e}(t) associated with β_{e}(t) and β_{h}. The pre-parameterized method (8) produces a smooth trajectory, suggesting two consecutive epidemic waves, with values starting above 2.5, dropping below 1 around August 1991, rising above 1 again around October 1991 (reaching approximately 2), and finally declining toward the end of the period. The temperature-dependent transmission rate approach (7) gives rise to significantly different dynamics for R_{e}(t), with values fluctuating near 1 throughout most of the study window and showing minimal variation. The data-driven time-dependent method (9) yields a fast-changing reproduction number, with values swinging between 2.5 and almost 0.

The third row presents the estimated human-to-human transmission rates, β_{h}, with their respective confidence intervals. The periodic method (8) estimates β_{h} = 3.67 \times 10^{-7} (95% CI: 4.43 \times 10^{-14}, 7.78 \times 10^{-7}), the temperature-dependent approach (7) yields β_{h} = 1.23 \times 10^{-6} (95% CI: -1.33 \times 10^{-6}, 1.91 \times 10^{-6}), and the data-driven time-dependent discretization (9) reconstructs β_{h} = 6.85 \times 10^{-7} (95% CI: 9.17 \times 10^{-8}, 1.59 \times 10^{-6}). These estimates show greater variability compared to the first time period. In particular, the confidence interval for the temperature-dependent method (7) includes a few outliers (due to instability), with negative values that lack physical meaning. All three methods indicate relatively low rates of human-to-human transmission, which is in agreement with the findings from the previous data set.

The bottom row displays the reporting rate approximations, ψ, representing the proportion of reported cases. The periodic method (8) estimates ψ = 8.22 \times 10^{-3} (95% CI: 7.65 \times 10^{-3}, 8.73 \times 10^{-3}), that is, approximately 0.8 percent of cases are reported. This is substantially lower than the estimate for the first time period. The temperature-dependent approach (7) gives ψ = 1.62 \times 10^{-1} (95% CI: 9.11 \times 10^{-2}, 2.75 \times 10^{-1}), and data-driven time-dependent discretization (9) estimates ψ = 6.22 \times 10^{-2} (95% CI: 4.16 \times 10^{-2}, 1.09 \times 10^{-1}). Evidently, algorithm (9) yields the most accurate reporting rate histogram, with methods (8) and (7) giving the lowest and highest estimates, respectively.
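
To make the scale of these reporting rates concrete, the short sketch below converts each point estimate of ψ quoted above into the number of infections implied by a single reported case. It assumes the simple relation "reported cases = ψ × infections"; the function name `cases_per_report` is illustrative and not part of the authors' code.

```python
# Point estimates of the reporting rate psi quoted in the text
# for the June 1991-May 1992 window.
psi_estimates = {
    "periodic (8)": 8.22e-3,
    "temperature-dependent (7)": 1.62e-1,
    "data-driven (9)": 6.22e-2,
}


def cases_per_report(psi: float) -> float:
    """Infections implied by one reported case, assuming
    reported = psi * infections (an illustrative simplification)."""
    if not 0.0 < psi <= 1.0:
        raise ValueError("psi must lie in (0, 1]")
    return 1.0 / psi


implied = {name: cases_per_report(psi) for name, psi in psi_estimates.items()}
for name, ratio in implied.items():
    print(f"{name}: roughly one report per {ratio:.0f} infections")
```

Under this reading, the three estimates correspond to roughly one reported case per 122, 6, and 16 infections, respectively, which shows how strongly the inferred epidemic size depends on the chosen transmission formulation.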

Overall, the above experiments illustrate that parameter estimation based on periodic β_{e}(t) approximation (8) is slightly more reliable than the temperature-dependent method (7) since temperature fluctuations are important but not the only factors that contribute to the seasonality of cholera spread. Yet, time-dependent discretization (9) offers unparalleled accuracy regarding data fit (see Table 2 and Table 3). It leads to the most informed estimates of the effective reproduction number, R_{e}(t), which allows for monitoring (and adjusting) the impact of control and prevention measures. Estimates of the human-to-human transmission rate, β_{h}, and case reporting rate, ψ, are relatively consistent (though, understandably, not identical) across all three discretization methods for β_{e}(t). The experiments convincingly demonstrate that the reporting rate, ψ, is low (due to the prevalence of asymptomatic and mild cases that go largely under-reported) and (indirect) human-to-human transmission, β_{h}, is much less of a factor in cholera spread than (direct) cholera transmission from the environment (since cholera is unlikely to pass from person to person after a casual contact [14]).

3.3. Practical Identifiability of Model Parameters

To assess the practical identifiability of key transmission parameters in our cholera modeling framework, we implemented an uncertainty quantification strategy based on parametric bootstrapping. Specifically, we generated M = 100 synthetic data sets by introducing Poisson-distributed noise to both weekly incidence and cumulative case data and re-estimated model parameters under each scenario. This allowed us to empirically derive confidence intervals for the human-to-human transmission rate, β_{h}, the environmental transmission rate, β_{e}(t), and the case reporting probability, ψ.
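
The bootstrapping procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `refit` callable is a hypothetical stand-in for the full regularized SIRB calibration, the toy model (reported cases = ψ × a known infection curve) is invented for the example, and only the incidence series is perturbed here, whereas the study also perturbs cumulative data.

```python
import numpy as np


def parametric_bootstrap(fitted_incidence, refit, M=100, seed=0):
    """Refit the model on M Poisson-perturbed copies of the fitted curve.

    fitted_incidence : expected weekly cases from the calibrated model
    refit            : hypothetical callable mapping a synthetic incidence
                       series to a vector of parameter estimates
    """
    rng = np.random.default_rng(seed)
    mean = np.clip(np.asarray(fitted_incidence, dtype=float), 0.0, None)
    estimates = []
    for _ in range(M):
        synthetic = rng.poisson(mean)  # Poisson noise around the fitted curve
        estimates.append(refit(synthetic))
    return np.asarray(estimates)


def percentile_ci(estimates, level=0.95):
    """Empirical two-sided bounds, one column per parameter."""
    tail = 100.0 * (1.0 - level) / 2.0
    return np.percentile(estimates, [tail, 100.0 - tail], axis=0)


# Toy stand-in for the calibrated model (not the SIRB system).
infections = 5000.0 * np.exp(-0.5 * ((np.arange(40) - 15) / 6.0) ** 2)
psi_true = 0.06
fitted = psi_true * infections


def refit_psi(synthetic):
    # Closed-form least-squares estimate of psi for the toy model.
    return np.array([synthetic @ infections / (infections @ infections)])


boot = parametric_bootstrap(fitted, refit_psi, M=100, seed=1)
lower, upper = percentile_ci(boot)
print(f"psi 95% CI: [{lower[0]:.4f}, {upper[0]:.4f}]")
```

Percentile intervals are used here for simplicity. If the refitting step is unconstrained, such intervals can extend below zero for a rate parameter, which is one way implausible lower bounds like those reported for β_{h} could arise.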

We found that practical identifiability varied notably across the three environmental transmission structures under consideration: pre-parameterized periodic, temperature-dependent, and data-driven time-dependent. The periodic model yielded narrow confidence intervals for β_{h} and ψ, suggesting relatively stable estimation, though it lacked flexibility to capture short-term fluctuations in transmission dynamics. In contrast, the temperature-dependent model showed wider intervals, and in some instances, produced biologically implausible lower bounds for β_{h}, highlighting challenges in recovering this parameter reliably under that formulation. The data-driven approach, which projects β_{e}(t) onto a high-dimensional basis function space, provided the closest fit to the observed data and yielded narrow confidence bounds for all estimated parameters. However, these tight intervals may partially reflect the model’s flexibility and its capacity to overfit the observed data, rather than true practical identifiability.

Taken together, these results underscore the importance of jointly considering model fit and parameter identifiability when evaluating the utility of alternative transmission models. While data-driven methods can improve in-sample accuracy, their capacity to recover epidemiologically meaningful parameters may be compromised in settings with sparse or noisy data. These observations are consistent with prior work demonstrating identifiability limitations in epidemic models under data-constrained conditions [46,47].

4. Forecasting of Future Incidence Cases

Stable estimation of disease parameters is crucial for our ability to predict future cholera trends. Forecasting helps to ensure the most powerful government response focused on improving water safety, sanitation measures, and dispatching the necessary resources to the highest risk areas. As our numerical study in Section 3 demonstrates, data-driven discrete approximation (9) of the environmental transmission rate, β_{e}(t), is very efficient. However, the expansion coefficients in (9) do not work beyond the calibration interval, thus failing to provide any insight into the number of future incidence cases. For this reason, to forecast the next phase of the outbreak, one needs to employ a parametric representation of β_{e}(t) that would, in a meaningful way, extrapolate the disease dynamics observed during the training period. Discretization methods (7) and (8) can answer this call since they involve behavioral patterns that are expected to be carried over to the post-calibration stage. In what follows, we detail the steps required to adapt approximations (7) and (8) to forecast cholera transmission while addressing challenges encountered earlier, such as noise sensitivity and the influence of seasonal variations.
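
The reason a basis expansion fitted on the calibration interval carries no information beyond it can be illustrated with a small numerical experiment. The sketch below is only an analogy for (9): the Legendre basis, the degree 10, and the sinusoidal "true" rate `beta_true` are all assumptions chosen for the demonstration and are not taken from the study.

```python
import numpy as np
from numpy.polynomial import Legendre


def beta_true(t_weeks):
    """Hypothetical seasonal transmission rate, used only for illustration."""
    return 0.08 + 0.05 * np.sin(2.0 * np.pi * t_weeks / 52.0)


# One-year calibration window, in weeks.
t_cal = np.arange(52.0)

# Least-squares Legendre expansion on the calibration interval
# (basis and degree are assumptions, not the paper's settings).
beta_fit = Legendre.fit(t_cal, beta_true(t_cal), deg=10, domain=[0.0, 51.0])

# Evaluate one year past the end of the calibration window.
t_future = np.array([104.0])

in_sample_err = np.max(np.abs(beta_fit(t_cal) - beta_true(t_cal)))
out_sample_err = np.max(np.abs(beta_fit(t_future) - beta_true(t_future)))
print(f"in-sample max error:  {in_sample_err:.2e}")
print(f"out-of-sample error:  {out_sample_err:.2e}")
```

Inside the calibration window the expansion reproduces the curve almost exactly, while outside it the polynomial terms grow without any constraint from the data. A parametric form with built-in seasonal structure, such as (7) or (8), avoids this by construction.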

The primary advantage of the transmission rate (7) is its ability to include future temperature variations in the forecasting model. Indeed, with epidemic data coming in daily, one can use the official weather forecasts to predict future incidence cases for the next 10 or 15 days based on model (7). However, when it comes to weekly reporting, which is the case for the 1991–1997 cholera epidemic in Peru, in order to predict cholera trends in the upcoming weeks, one would need to use long-term weather forecasts that are generally less reliable. Therefore, in the experiments reported in this section, to estimate future temperature values, we employed the average temperature for the corresponding period over the past 5 years. For this purpose, we downloaded historic temperature data from Visual Crossing (https://www.visualcrossing.com/weather-data/, accessed on 15 April 2025), covering the years 1986–1991. We then calculated a 5-year average for each forecasting window to serve as a proxy for our real temperature input.
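
The temperature proxy described above can be sketched as a week-by-week average over past years. The code below is illustrative only: it uses synthetic temperatures rather than the Visual Crossing data, the array layout (one row per year, 52 weekly values) is an assumption, and the z-score in `standardize` is one plausible reading of "standardized temperature" that may differ from the definition of T_{norm}(t) used in the study.

```python
import numpy as np


def climatological_proxy(weekly_temps, forecast_weeks):
    """Average temperature per week of the year over the available past years.

    weekly_temps   : array of shape (n_years, 52), one row per historical year
    forecast_weeks : week-of-year indices (0-51) covered by the forecast
    """
    weekly_temps = np.asarray(weekly_temps, dtype=float)
    climatology = weekly_temps.mean(axis=0)  # multi-year mean for each week
    return climatology[np.asarray(forecast_weeks) % 52]


def standardize(temps):
    """Z-score standardization (assumed; the paper's T_norm may differ)."""
    temps = np.asarray(temps, dtype=float)
    return (temps - temps.mean()) / temps.std()


# Synthetic stand-in for five years of weekly temperatures (not real data).
weeks = np.arange(52)
rng = np.random.default_rng(0)
seasonal = 18.0 + 4.0 * np.cos(2.0 * np.pi * weeks / 52.0)
history = seasonal + rng.normal(0.0, 1.0, size=(5, 52))

# Proxy temperatures for a 5-week forecasting window starting at week 25.
proxy = climatological_proxy(history, forecast_weeks=np.arange(25, 30))
proxy_norm = standardize(climatological_proxy(history, weeks))
```

Averaging across years smooths out year-specific anomalies, which is also why the proxy can mislead when the actual season departs from its historical pattern.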

To predict future cholera trends using model (1), we analyzed four different Peruvian regions during four different periods: Arequipa (May–December 1991), Ayacucho (April–November 1991), Huanuco (March–October 1991), and Junin (October 1991–May 1992). For each region, we considered five distinct time intervals and compared the forecasting performance of two discretization algorithms for β_{e}(t): temperature-dependent transmission (7) and pre-parameterized periodic transmission (8).

The study windows for each region were as follows:

Arequipa: May 1991–July 1991, July 1991–September 1991, August 1991–October 1991, September 1991–November 1991, and October 1991–December 1991;
Ayacucho: April 1991–June 1991, May 1991–July 1991, June 1991–August 1991, August 1991–October 1991, and September 1991–November 1991;
Huanuco: March 1991–May 1991, April 1991–June 1991, May 1991–July 1991, July 1991–September 1991, and August 1991–October 1991;
Junin: October 1991–December 1991, November 1991–January 1992, December 1991–February 1992, January 1992–March 1992, and March 1992–May 1992.

4.1. Epidemic Forecasts: Arequipa Region

Figure 6 displays incidence data for the Arequipa region. The forecasting process follows a moving-window approach with a 5-week step, where each window lasts 10 weeks. The brown dashed vertical lines, spaced 5 weeks apart, indicate the starting points of the forecasts. At the top, a comparison of historic standardized temperature (1986–1991) with actual standardized temperature, T_{norm}(t), during the same months is presented. Both temperature curves reveal seasonal variations, showing moderate temperatures in May, peaking around November, and then falling slightly in December.
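
The moving-window scheme can be sketched as a generator of calibration and forecast index ranges. This is a simplified illustration with hypothetical names (`Window`, `moving_windows`); it assumes uniformly spaced weekly data, 5 weeks of calibration followed by 5 weeks of forecast, and a 5-week step, whereas the actual study windows listed above are given by calendar month.

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    calib: range      # week indices used for calibration
    forecast: range   # week indices held out for forecasting


def moving_windows(n_weeks: int, calib_len: int = 5,
                   forecast_len: int = 5, step: int = 5) -> Iterator[Window]:
    """Yield consecutive calibration/forecast splits over n_weeks of data."""
    start = 0
    while start + calib_len + forecast_len <= n_weeks:
        split = start + calib_len
        yield Window(range(start, split), range(split, split + forecast_len))
        start += step


# Thirty weeks of data give five overlapping 10-week windows.
windows = list(moving_windows(n_weeks=30))
for w in windows:
    print(f"calibrate on weeks {w.calib.start}-{w.calib.stop - 1}, "
          f"forecast weeks {w.forecast.start}-{w.forecast.stop - 1}")
```

With these settings, the forecast portion of one window becomes the calibration portion of the next, so every week after the first five is forecast exactly once.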

Figure 7 and Figure 8 illustrate a comprehensive analysis of cholera forecasting results for incidence and cumulative cases, respectively, in the Arequipa region during five sequential time periods in 1991. Each figure compares temperature-dependent transmission (7) (bottom row) and pre-parameterized periodic transmission (8) (top row).

In the first time period (May–July 1991), the confidence interval (CI) for periodic approximation (8) covers almost all incidence data during the calibration and forecasting periods, but it predicts a downward trajectory that fails to capture the real trend in incidence data. The temperature-dependent model (7), despite showing greater uncertainty (wider 95% CI) and over-estimating the expected number of incidence cases, predicts the overall disease trend remarkably well. This suggests that temperature variations may help to estimate future cholera dynamics in ways that a simple periodic model cannot. The second time period (July–September) reveals an improvement in the temperature-dependent model (7) performance. Not only does it successfully cover most of the observed data points within its 95% CI, but the lower bound of the CI also correctly forecasts the peak and the subsequent decline in cholera cases. The periodic model (8), despite its narrower confidence band, over-estimates the rate of decline in incidence cases towards the end of the forecasting period.

During the third period (August–October), the temperature-dependent model (7) provides accurate predictions of both the trend and the timing of case fluctuations. It suggests a relatively stable trajectory, with an accurate forecast of ups and downs in the incidence data. However, the periodic method (8) shows a misguided downhill trend that is not supported by the data.

The fourth column (September–November) presents an interesting case, where both models cover most data points within their confidence intervals. Yet, the temperature-dependent model (7) correctly forecasts increasing incidence, whereas the periodic model (8) predicts relative stability and thus under-estimates the incidence data.

In the final period (October–December), we observe that both models predict that incidence cases will go up, and neither CI captures all data points beyond the calibration barrier. This inconsistency appears to be caused by an unexpected decrease in cases immediately following the training period, possibly due to intervention measures or reporting irregularities not accounted for in either model. In fact, towards the end of the training period, the incidence data suggest an uphill trend, but the case counts suddenly fall after the calibration line.

The cumulative case forecasts in Figure 8 provide additional observations, showing how errors in incidence forecasting accumulate over time. The temperature-dependent model (7) generally shows better alignment with the observed cumulative case data, particularly in the fourth forecasting frame.

4.2. Epidemic Forecasts: Ayacucho Region

Figure 9 displays predictive data analysis for the Ayacucho region, utilizing a 6-week moving window approach that spans 10 weeks for each study period, 5 weeks for the calibration, and 5 weeks for the forecast. After reaching their lowest point in mid-June, temperatures begin to rise. Apparently, warmer weather creates favorable conditions for cholera spread. About a month later, the outbreak responds with a growing number of newly reported cases, suggesting a time lag effect, with higher temperatures leading to increased transmission. However, during the early months of the study period, the pattern is reversed.

While the temperatures decline from April to June, the cholera transmission actually increases from early May to mid-June, reaching its peak at the time when temperatures are the coolest. This underlines the fact that temperature is an important but not the only factor in the formation of seasonal cycles.

Figure 10 and Figure 11 illustrate cholera forecasting in Ayacucho through incidence and cumulative data, respectively. As before, our goal is to compare the numerical efficiency of two distinct modeling approaches, pre-parameterized periodic transmission rate (8) (top row) and temperature-dependent transmission rate (7) (bottom row).

In the first time period (April–June 1991), both methods fully cover incidence data within the 95% CI. However, the periodic approach (8) wrongly forecasts a downward trend for incidence data (instead of upward), and the temperature-dependent approach (7) suggests a slight increase in data incidence right after the calibration period, followed by a relatively stable trajectory, which is better aligned with the actual trend in cholera data.

In the second time period (May–July 1991), the data exhibit irregular patterns during the forecasting period, making it hard to expect a good prediction of the trajectory from either method. The periodic model (8) performs adequately by capturing the data within 95% CI with wide uncertainty bounds, although the prediction of the trend mistakenly under-estimates cholera incidence data. In contrast, the temperature-dependent model (7) does better by not only capturing the data within 95% CI but also showing an accurate mean curve cutting through data points during the forecast, which is quite remarkable considering the variability of incidence data.

During the third time period (June–August), both methods correctly predict the dynamics of cholera data. However, the temperature-dependent approach (7), in showing wider uncertainty, is more successful in capturing all data points within the 95% CI. It also appears to predict the incidence trend slightly more accurately during the forecasting period.

During the fourth period (August–October), the two methods are very consistent and are both wrong in exhibiting downhill behavior, which is not in sync with the exponential increase in incidence cases (much more rapid than anticipated). Although the fit during the calibration period is adequate, the forecasts based on models (7) and (8) do not capture the three highest points of the data set.

In the final period (September–November), again, we see a pattern similar to the first column in Figure 7. The temperature-dependent method (7), although unsuccessful in covering all data points, predicts the disease trend correctly by suggesting an initial ascent followed by a decrease towards the end of the window. The periodic approach (8) anticipates near-exponential growth, which is not supported by the data.

The cumulative case forecasts illustrated in Figure 11 confirm the numerical findings outlined above. Again, during the fourth period (August–October 1991), the two methods treat the last two points of the data set as outliers and under-estimate the spike in cumulative incidence in the month of October (which actually could have been expected considering the rising temperature at this time of the year).

4.3. Epidemic Forecasts: Huanuco Region

Figure 12 displays the forecasting data analysis for the Huanuco region, utilizing a 5-week moving window approach that spans 10 weeks for each study period, 5 weeks for the calibration period, and 5 weeks for the forecasting period as before. This figure further illustrates the lagged relationship between historic temperatures and disease incidence. Indeed, a steady decline in temperature values is followed, with a delay, by a (rather rapid) decrease in reported incidence cases. However, the uphill temperature trend towards the end of the study window is not followed by any increase in cholera transmission, indicating that other factors might have been at play.

Figure 13 and Figure 14 present the comparison of cholera forecasting results for the Huanuco region across five sequential periods in 1991. The analysis contrasts two modeling approaches: the periodic transmission rate (8) (top row) and the temperature-dependent transmission rate (7) (bottom row).

In the first period (March–May 1991), model (8) successfully captures average incidence, with a linear regression through the incidence data, but it fails to forecast the downward trend during the last 3 weeks. The temperature-dependent model, while overestimating incidence, is remarkably accurate in predicting the overall dynamics of the outbreak. Notably, both models encompass all data points within their 95% CI.

During the second period (April–June 1991), both models are correct in showing a steady decline in incidence cases. The periodic approach (8) performs slightly better, covering all data points within its 95% CI, whereas model (7) leaves one point just outside the lower bound of its CI.

The third period (May–July), again, underscores the strength of the temperature-dependent model (7). It provides an accurate trend prediction without over-smoothing and covers all but one of the data points within an appropriately widened 95% CI. In contrast, the periodic model (8), while following the general trend, misses the first two data points in the forecasting window and consistently under-estimates the expected cases.

The fourth period (July–September) essentially confirms these findings, with the temperature-dependent model (7) accurately forecasting the behavioral pattern and covering the data points through wider uncertainty bounds, which properly account for what might have otherwise been dismissed as outliers. The periodic model (8) shows more uncertainty while over-simplifying the general trend and under-estimating future incidence cases.

In the final period (August–October), neither model demonstrates clear superiority. The temperature-dependent method shows a more accurate fit over the calibration period but over-estimates future cases. On the other hand, the periodic model under-estimates future cases.

The cumulative case forecasts presented in Figure 14 reinforce the comparative performance of both modeling approaches across the five time periods. In the first period (March–May 1991), while the periodic model initially tracks cases well, both models ultimately over-estimate the cumulative burden, though reported cases remain within their 95% CI. The second and third periods display strong predictive performance from both approaches, with particularly close alignment between median predictions and actual data. In the fourth period (July–September), the two models are noticeably different, with the temperature-dependent model (7) showing better alignment with observations, especially at the early stages of the forecast. The final period reveals different uncertainty characteristics between the models, with the temperature-dependent approach (7) demonstrating a better data fit during the calibration period but over-estimating future incidence cases.

4.4. Epidemic Forecasts: Junin Region

Figure 15 displays the forecasting data for the Junin region, utilizing a 5-week moving window approach that spans 10 weeks for each analysis period, with 5 weeks for the calibration period and 5 weeks for the forecast period, as before. The plot shows significant fluctuations in incidence data throughout this period, with several pronounced peaks and valleys. Yet, the incidence pattern is rather consistent with the actual standardized temperature, which deviates from the historical average toward the end of the period (March–May 1992): the actual temperature remains relatively stable, whereas the historical average shows a steady decline. This discrepancy is significant for forecasting, as models calibrated using historical temperature patterns might produce inaccurate predictions when actual temperatures deviate from their expected seasonal trends. This highlights a potential limitation of temperature-dependent transmission models that rely on historical temperature data for future projections.

Figure 16 and Figure 17 illustrate cholera forecasting bundles in Junin across five sequential periods in 1991–1992, comparing periodic transmission rate (8) (top row) and temperature-dependent transmission rate (7) (bottom row) modeling approaches. In the first period (October–December 1991), both models forecast similar upward trends, thus overestimating real incidence. Their uncertainty bounds look similar, though the temperature-dependent model exhibits slightly higher upper and lower bounds. Both approaches successfully capture all data points within their 95% CI.

During the second period (November 1991–January 1992), both models predict a downward trend, with the mean curve for the temperature-dependent model (7) better following the actual data pattern. The unexpected spike at the endpoint, potentially an outlier, results in a wider uncertainty interval for both models. The periodic model (8) continues to under-estimate the incidence cases.

The third period (December 1991–February 1992) reveals contrasting strengths: the periodic model (8) is accurate in the first half of the forecasting window; yet, it fails to track the downward trend in the last 3 weeks. The temperature-dependent model (7) identifies the overall trend better but does not capture some data points within its 95% CI, whereas the periodic model encompasses all observations.

The fourth (January–March 1992) and the final (March–May 1992) periods further demonstrate the strong characteristics of the temperature-dependent model (7). In both cases, model (7) accurately predicts the overall trend and covers most of the data points within its 95% CI (without excessive uncertainty). In contrast, the periodic model (8) performs poorly. For January–March 1992, it incorrectly predicts fast decay and leaves all actual cases outside its 95% CI. For March–May 1992, model (8) mistakenly forecasts a growing trend and excludes all but one data point from its 95% CI.

The cumulative case forecasts in Figure 17 are consistent with those of the incidence cases, particularly in the fourth and fifth periods, where the periodic approach under-estimates and over-estimates cumulative cases, respectively.

To summarize, between the two methods, (7) and (8), for the multiple data sets considered, the temperature-dependent method (7) was a clear winner (in contrast to the parameter estimation experiments, where (8) did better than (7)). In general, the discretization algorithm (7) has accurately predicted the overall trend and covered most of the data points within its 95% CI (without excessive uncertainty). This observation is supported by Table 4, Table 5, Table 6 and Table 7 and, most of all, by the above figures. The superiority of method (7) is understandable since the temperature data, though not perfect, provide a priori information about the future for forecasting analysis.
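
The qualitative criteria used throughout this section (how many observations fall inside the 95% CI, how close the mean curve is to the data, and whether the predicted trend has the right direction) can be made explicit with a few simple functions. The sketch below uses made-up numbers and generic metrics; it does not reproduce the performance measures reported in the tables, and all function names are illustrative.

```python
import numpy as np


def ci_coverage(observed, lower, upper):
    """Fraction of observations lying inside [lower, upper]."""
    observed, lower, upper = map(np.asarray, (observed, lower, upper))
    return float(np.mean((observed >= lower) & (observed <= upper)))


def mean_absolute_error(observed, predicted):
    """Average absolute gap between observations and the mean forecast."""
    observed, predicted = map(np.asarray, (observed, predicted))
    return float(np.mean(np.abs(observed - predicted)))


def same_trend(observed, predicted):
    """True when the first-to-last change has the same sign in both series."""
    obs_change = observed[-1] - observed[0]
    pred_change = predicted[-1] - predicted[0]
    return bool(np.sign(obs_change) == np.sign(pred_change))


# Made-up 5-week forecast, for illustration only.
observed = np.array([40, 46, 52, 49, 61])
mean_forecast = np.array([42.0, 45.0, 48.0, 51.0, 54.0])
lower = mean_forecast - 5.0
upper = mean_forecast + 5.0

coverage = ci_coverage(observed, lower, upper)
mae = mean_absolute_error(observed, mean_forecast)
trend_ok = same_trend(observed, mean_forecast)
print(f"coverage={coverage:.2f}, MAE={mae:.1f}, trend matches: {trend_ok}")
```

In this toy example four of the five observations fall inside the band, so a forecast can have the correct trend and a modest error while still missing a late spike, which mirrors several of the cases discussed above.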

5. Conclusions and Future Work

Mathematical models of disease transmission coupled with robust optimization algorithms for parameter estimation help to understand the spread of cholera and to assess the efficiency of preventive measures. The primary goal of this study was stable parameter estimation and forecasting of future incidence cases based on a compartmental model (1) along with various discretization strategies for the environmental transmission rate, β_{e}(t). To reconstruct β_{e}(t), three approximation methods were employed: temperature-dependent transmission (7), pre-parameterized periodic transmission (8), and general data-driven time-dependent transmission (9). Concurrently, other key epidemic parameters, such as case reporting rate, ψ, and human-to-human transmission, β_{h}, were also estimated. Focusing on the 1991–1997 cholera outbreak in Peru, we applied these discretization tools to calibrate the model and to assess practical implications of parameter estimation for epidemic preparedness. In the next stage, the pre-parameterized and temperature-dependent transmission rates were used to forecast cholera cases beyond the training period, with particular attention paid to the seasonal nature of the disease and its strong correlation with temperature.

Numerical simulations using two data sets associated with the 1991–1997 cholera epidemic in Peru illustrate that reconstructed disease parameters based on pre-parameterized periodic approximation (8) are slightly more reliable than parameters estimated using the temperature-dependent method (7); this is because temperature fluctuations are important, but they are not the only factor contributing to seasonality of cholera spread. Yet, time-dependent discretization (9) offers unparalleled accuracy of data fit; it leads to the most informed estimates of the effective reproduction number, R_{e}(t), which allows for monitoring (and adjusting) the impact of control and prevention measures. Estimates of the human-to-human transmission rate, β_{h}, and the case reporting rate, ψ, are relatively consistent (though, understandably, not identical) across all three β_{e}(t) discretization methods. The experiments convincingly demonstrate that the reporting rate, ψ, is low (apparently, less than 10% due to the prevalence of asymptomatic and mild cases that go largely under-reported) and (indirect) human-to-human transmission, β_{h}, is much less of a factor in cholera spread than (direct) cholera transmission from the environment (since cholera is unlikely to pass from person to person after a casual contact [14]).

While our study primarily focuses on practical identifiability assessed via parametric bootstrapping, we note that the structural identifiability of the baseline cholera model with constant parameters has been well-established in the literature, assuming known initial conditions and full observability of incidence data [47]. However, in the data-driven time-dependent formulation, where β_{e}(t) is approximated by a high-dimensional basis expansion (e.g., Fourier or Legendre polynomials), structural identifiability becomes more challenging to assess analytically. The increased model flexibility introduces parameter redundancy, which can hinder the unique recovery of parameters, even under ideal, noise-free conditions.

To forecast the next phase of the outbreak, discretization methods (7) and (8) were employed for the parametric representation of β_{e}(t) since such methods involve behavioral patterns that are expected to be carried over to the post-calibration cholera stage and can, therefore, in a meaningful way, extrapolate the disease dynamics beyond the training period. The primary challenge in predicting future cholera cases using (7) is the availability of temperature data. Indeed, with epidemic data coming in daily, one can use the official weather forecasts to estimate the number of new cases for the next 10 or 15 days based on model (7). However, when it comes to weekly reporting, which is the case for the 1991–1997 cholera epidemic in Peru, in order to predict cholera trends in the upcoming weeks, one would need to use long-term weather forecasts that are generally less reliable. Therefore, in our experiments, to approximate future temperature values, we employed the average temperature for the corresponding period over the past 5 years. For this purpose, we downloaded historical temperature data from Visual Crossing (https://www.visualcrossing.com/weather-data/, accessed on 15 April 2025), covering the years 1986–1991. We then calculated a 5-year average for each forecasting window to serve as a proxy for our real temperature input. When it comes to forecasting, between the two methods, (7) and (8), for the multiple data sets considered, the temperature-dependent method (7) was a clear winner (as opposed to the parameter estimation experiments, where (8) did better than (7)). In general, the discretization algorithm (7) accurately predicted the overall trend and covered most of the data points within its 95% CI (without excessive uncertainty).

In conclusion, despite the significant progress made in our understanding of cholera transmission dynamics, substantial challenges remain in the development of rigorous forecasting methodologies. While the relationship between historic temperatures and disease incidence allows us to successfully incorporate an unknown future into the forecasting algorithm, incidence cases do not always correlate with temperature fluctuations, suggesting that, ideally, cholera models need to account for other environmental and non-environmental factors. Moreover, models based on historical temperature patterns might produce inaccurate predictions when actual temperatures significantly deviate from their expected seasonal trends. Nevertheless, in this paper, we have made an important first step in the design of a reliable forecasting strategy, which builds upon (1) a novel modeling approach to the discrete approximation of a time-dependent environmental transmission rate and (2) a regularized optimization algorithm for stable parameter estimation and the quantification of uncertainty in the reconstructed disease parameters.

In future work, to discretize the environmental transmission rate, β_{e}(t), we plan to explore hybrid strategies, e.g., semi-parametric approximations based on a (truncated) real Fourier basis with different frequencies. We will also evaluate the impact of various numerical methods on the accuracy and stability of parameter estimation. This will include the study of regularized optimization procedures with low-rank Hessian updates as well as second-order iterative schemes coupled with nudging.

Author Contributions

Conceptualization, A.S. and G.C.; methodology, A.S. and G.C.; software, H.K. and A.S.; validation, A.S., G.C. and H.K.; formal analysis, A.S.; investigation, A.S. and H.K.; resources, G.C. and O.J.M.; data curation, O.J.M.; writing—original draft preparation, A.S. and H.K.; writing—review and editing, A.S., G.C., H.K. and O.J.M.; visualization, H.K.; supervision, A.S.; project administration, A.S. and G.C.; funding acquisition, A.S. and G.C. All authors have read and agreed to the published version of the manuscript.

Funding

G.C. was supported by NSF awards 2125246 and 2026797; A.S. was supported by NSF award 2409868.

Data Availability Statement

The code and datasets used in this study are publicly available at https://github.com/hkarami-GSU/Cholera_code/.

Acknowledgments

We would like to express our deepest gratitude to the anonymous reviewers for their most valuable comments on our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Holmgren, J.; Svennerholm, A.M. Mechanisms of disease and immunity in cholera: A review. J. Infect. Dis. 1977, 136, S105–S112.
2. Munot, K.; Kotler, D.P. Small intestinal infections. Curr. Gastroenterol. Rep. 2016, 18, 31.
3. Finkelstein, R.A. Cholera, Vibrio cholerae O1 and O139, and other pathogenic vibrios. In Medical Microbiology, 4th ed.; University of Texas Medical Branch at Galveston: Galveston, TX, USA, 1996.
4. Rabbani, G.; Greenough, W., III. Food as a vehicle of transmission of cholera. J. Diarrhoeal Dis. Res. 1999, 17, 1–9.
5. Rabbani, G.; Greenough, W.B., III. Pathophysiology and clinical aspects of cholera. In Cholera; Springer: Berlin/Heidelberg, Germany, 1992; pp. 209–228.
6. Bennish, M.L. Cholera: Pathophysiology, clinical features, and treatment. In Vibrio Cholerae and Cholera: Molecular to Global Perspectives; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1994; pp. 227–255.
7. Aranda-Michel, J.; Giannella, R.A. Acute diarrhea: A practical review. Am. J. Med. 1999, 106, 670–676.
8. Guerrant, R.L.; Carneiro-Filho, B.A.; Dillingham, R.A. Cholera, diarrhea, and oral rehydration therapy: Triumph and indictment. Clin. Infect. Dis. 2003, 37, 398–405.
9. Cieza, J.; Sovero, Y.; Estremadoyro, L.; Dumler, F. Electrolyte disturbances in elderly patients with severe diarrhea due to cholera. J. Am. Soc. Nephrol. 1995, 6, 1463–1467.
10. Zafar, M.Z.; Gulzar, H. A case study: Cholera. Occup. Med. Health Aff. 2016, 4, 2–5.
11. Bell, C.W. On the Epidemic Ague or “Fainting Fever” of Persia, a Species of Cholera, Occurring in Teheran in the Autumn of the Year 1842. Br. Foreign Med. Rev. 1843, 16, 558.
12. Kuna, A.; Gajewski, M. Cholera—The new strike of an old foe. Int. Marit. Health 2017, 68, 163–167.
13. Christie, A.T. On the Symptoms of Epidemic Cholera. Lond. Med. Phys. J. 1833, 14, 456.
14. World Health Organization. Cholera. 2024. Available online: https://www.who.int/news-room/fact-sheets/detail/cholera (accessed on 15 April 2025).
15. Sharifi-Mood, B.; Metanat, M. Diagnosis, clinical management, prevention, and control of cholera; a review study. Int. J. Infect. 2014, 1, e18303.
16. Haggard, J.V. Epidemic Cholera in Texas, 1833–1834. Southwest. Hist. Q. 1937, 40, 216–230.
17. Cooper, D.B. The new “black death”: Cholera in Brazil, 1855–1856. Soc. Sci. Hist. 1986, 10, 467–488.
18. Arnold, D. Cholera and colonialism in British India. In Past Present; Oxford University Press: Oxford, UK, 1986; pp. 118–151.
19. Hsueh, B.Y.; Waters, C.M. Combating cholera. F1000Research 2019, 8, 589.
20. Koo, D.; Traverso, H.; Libel, M.; Drasbek, C.J.; Tauxe, R.; Brandling-Bennett, D.A. Epidemic cholera in Latin America, 1991–1993: Implications of case definitions used for public health surveillance. Bull. Pan Am. Health Organ. (PAHO) 1996, 30, 134–143.
21. Del Aguila, R.; Benavides, B.; Jacoby, E.; Novara, J. Reconocimiento de cólera por personas sintomáticas después del brote epidémico en las UDES Lima sur y la sub-región Luciano Castillo-región Grau. Rev. Peru. Epidemiol. (Online) 1992, 5, 5–9.
22. Vugia, D.J.; Koehler, J.E.; Ries, A.A. Surveillance for epidemic cholera in the Americas: An assessment. Morb. Mortal. Wkly. Rep. Surveill. Summ. 1992, 41, 27–34.
23. Tuite, A.R.; Tien, J.; Eisenberg, M.; Earn, D.J.; Ma, J.; Fisman, D.N. Cholera epidemic in Haiti, 2010: Using a transmission model to explain spatial spread of disease and identify optimal control interventions. Ann. Intern. Med. 2011, 154, 593–601.
24. Anderson, C. Cholera epidemic traced to risk miscalculation. Nature 1991, 354, 255.
25. Finger, F.; Lemaitre, J.; Juin, S.; Jackson, B.; Funk, S.; Lessler, J.; Mintz, E.; Dely, P.; Boncy, J.; Azman, A. Inferring the proportion of undetected cholera infections from serological and clinical surveillance in an immunologically naive population. Epidemiol. Infect. 2024, 152, e149.
26. Curioso, W.H.; Miranda, J.J.; Kimball, A.M. Rapid Response: Controlling the cholera epidemic in Peru: The community’s Oral Rehydration Units. Br. Med. Assoc. 2004, 328, 777.
27. Wang, J. Mathematical models for cholera dynamics—A review. Microorganisms 2022, 10, 2358.
28. Anteneh, L.M.; Lokonon, B.E.; Kakaï, R.G. Modelling techniques in cholera epidemiology: A systematic and critical review. Math. Biosci. 2024, 373, 109210.
29. Hartley, D.; Morris, J.J.; Smith, D. Hyperinfectivity: A critical element in the ability of V. cholerae to cause epidemics? PLoS Med. 2006, 3, e7.
30. Ratchford, C.; Wang, J. Multi-scale modeling of cholera dynamics in a spatially heterogeneous environment. Math. Biosci. Eng. 2020, 17, 948–974.
31. Smirnova, A.; Sterrett, N.; Mujica, O.J.; Munayco, C.; Suárez, L.; Viboud, C.; Chowell, G. Spatial dynamics and the basic reproduction number of the 1991–1997 cholera epidemic in Peru. PLoS Neglected Trop. Dis. 2020, 14, e0008045.
32. Mukandavire, Z.; Liao, S.; Wang, J.; Gaff, H.; Smith, D.L.; Morris, J.G., Jr. Estimating the reproductive numbers for the 2008–2009 cholera outbreaks in Zimbabwe. Proc. Natl. Acad. Sci. USA 2011, 108, 8767–8772.
33. Mukandavire, Z.; Smith, D.L.; Morris, J.G., Jr. Cholera in Haiti: Reproductive numbers and vaccination coverage estimates. Sci. Rep. 2013, 3, 997.
34. Azman, A.S.; Luquero, F.J.; Rodrigues, A.; Palma, P.P.; Grais, R.F.; Banga, C.N.; Grenfell, B.T.; Lessler, J. Urban cholera transmission hotspots and their implications for reactive vaccination: Evidence from Bissau city, Guinea Bissau. PLoS Neglected Trop. Dis. 2012, 6, e1901.
35. Codeço, C.T. Endemic and epidemic dynamics of cholera: The role of the aquatic reservoir. BMC Infect. Dis. 2001, 1, 1–14.
36. Fung, I.C.H. Cholera transmission dynamic models for public health practitioners. Emerg. Themes Epidemiol. 2014, 11, 1–11.
37. Cauchemez, S.; Boëlle, P.Y.; Thomas, G.; Valleron, A.J. Estimating in real time the efficacy of measures to control emerging communicable diseases. Am. J. Epidemiol. 2006, 164, 591–597.
38. Hendrix, T.R. The pathophysiology of cholera. Bull. N.Y. Acad. Med. 1971, 47, 1169.
39. Kaper, J.; Morris, J.; Levine, M. Cholera. Clin. Microbiol. Rev. 1995, 8, 48–86.
40. Mujica, O.; Seminario, L.; Taxe, R.; Beingolea, L.; Palacios, A.; Vásquez, L.; Vargas, R.; Moreno, D.; Rodriguez, M.; Tejada, E.; et al. Investigación epidemiológica del cólera en el Perú; lecciones para un continente en riesgo. Rev. Méd. Hered. 1992, 2, 121–131.
41. Seminario, C.; Mujica, O.; Fishbein, D. Priorities for public health surveillance when resources are limited. Morb. Mortal. Wkly. Rep. 1992, 41, 85–89.
42. Ramírez, I.J.; Lee, J. Deconstructing the spatial effects of El Niño and vulnerability on cholera rates in Peru: Wavelet and GIS analyses. Spat. Spatio-Temporal Epidemiol. 2022, 40, 100474.
43. Choi, B.; Kim, B. Prevalence and risk factors of intestinal parasite infection among schoolchildren in the peripheral highland regions of Huanuco, Peru. Osong Public Health Res. Perspect. 2017, 8, 302.
44. Mougenot, B.; Amaya, E.; Herrera-Añazco, P. Water, sanitation, and hygiene (WASH) conditions and prevalence of office visits due to anemia: A regional-level analysis from Peru. J. Water Sanit. Hyg. Dev. 2020, 10, 951–958.
45. Hodson, T.O. Root-mean-square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. 2022, 15, 5481–5487.
46. Roosa, K.; Lee, Y.; Luo, R.; Kirpich, A.; Rothenberg, R.; Hyman, J.M.; Yan, P.; Chowell, G. Short-term forecasts of the COVID-19 epidemic in Guangdong and Zhejiang, China: February 13–23, 2020. J. Clin. Med. 2020, 9, 596.
47. Eisenberg, M.C.; Robertson, S.L.; Tien, J.H. Identifiability and estimation of multiple transmission pathways in cholera and waterborne disease. J. Theor. Biol. 2013, 324, 84–102.

Figure 1. Map of Peru, showing its 25 regions, each marked with a number and a different color. The code to generate this map is available in the GitHub repository (https://github.com/hkarami-GSU/Peru_Map, accessed on 15 April 2025), with geodata sourced from Stanford University’s Geospatial Center (https://geowebservices.stanford.edu/geoserver/web/?0, accessed on 15 April 2025).

Figure 2. (Top) incidence cases and (bottom) cumulative data for the Ayacucho region, from 26 February to 17 December 1991, vs. model predictions with three different discretizations for the environmental transmission rate: (left) pre-parameterized periodic transmission (8), (middle) temperature-dependent transmission (7), and (right) data-driven time-dependent transmission (9).

Figure 3. Parameter estimation results while fitting the simple cholera model (1) to incidence and cumulative data for the Ayacucho region, from 26 February to 17 December 1991. The rows represent (first row) the time-dependent environmental transmission rate, β_{e}(t), (second row) the time-dependent basic reproduction number, R_{e}(t), (third row) histograms for the human-to-human transmission rate, β_{h}, and (fourth row) histograms for the case reporting rate, ψ. The columns correspond to three different discretizations for the environmental transmission rate: (left) pre-parameterized periodic transmission (8), (middle) temperature-dependent transmission (7), and (right) data-driven time-dependent transmission (9).

Figure 4. (Top) incidence cases and (bottom) cumulative data for the Ayacucho region, from 4 June 1991 to 19 May 1992 vs. model predictions with three different discretizations for the environmental transmission rate: (left) pre-parameterized periodic transmission (8), (middle) temperature-dependent transmission (7), and (right) data-driven time-dependent transmission (9).

Figure 5. Parameter estimation results while fitting the simple cholera model (1) to incidence and cumulative data for the Ayacucho region, from 4 June 1991 to 19 May 1992. The rows represent (first row) the time-dependent environmental transmission rate, β_{e}(t), (second row) the time-dependent basic reproduction number, R_{e}(t), (third row) histograms for the human-to-human transmission rate, β_{h}, and (fourth row) histograms for the case reporting rate, ψ. The columns correspond to three different discretizations for the environmental transmission rate: (left) pre-parameterized periodic transmission (8), (middle) temperature-dependent transmission (7), and (right) data-driven time-dependent transmission (9).

Figure 6. Incidence data for the Arequipa region. The forecasting process uses a moving window that advances in 5-week steps, with each window lasting 10 weeks. The brown dashed vertical lines, spaced 5 weeks apart, indicate the starting points of the forecasts. At the top, the historic temperature is compared with the actual temperature during the same months.

Figure 7. Comparison of epidemic incidence forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods for Arequipa across different periods in 1991. Each panel displays a 5-week calibration period followed by a 5-week forecasting period, separated by vertical dashed lines.

Figure 8. Comparison of epidemic cumulative forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods for Arequipa across different periods in 1991. Each panel displays a 5-week calibration period followed by a 5-week forecasting period, separated by vertical dashed lines.

Figure 9. Incidence data for the Ayacucho region. The forecasting process uses a moving window that advances in 5-week steps, with each window lasting 10 weeks. The brown dashed vertical lines, spaced 5 weeks apart, indicate the starting points of the forecasts. At the top, the historic temperature is compared with the actual temperature during the same months.

Figure 10. Comparison of epidemic incidence forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods for Ayacucho across different periods in 1991. Each panel displays a 5-week calibration period followed by a 5-week forecasting period, separated by vertical dashed lines.

Figure 11. Comparison of epidemic cumulative forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods for Ayacucho across different periods in 1991. Each panel displays a 5-week calibration period followed by a 5-week forecasting period, separated by vertical dashed lines.

Figure 12. Incidence data for the Huanuco region. The forecasting process uses a moving window that advances in 5-week steps, with each window lasting 10 weeks. The brown dashed vertical lines, spaced 5 weeks apart, indicate the starting points of the forecasts. At the top, the historic temperature is compared with the actual temperature during the same months.

Figure 13. Comparison of epidemic incidence forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods in Huanuco across different periods in 1991. Each panel displays a 5-week calibration period followed by a 5-week forecasting period, separated by vertical dashed lines.

Figure 14. Comparison of epidemic cumulative forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods in Huanuco across different periods in 1991. Each panel displays a 5-week calibration period followed by a 5-week forecasting period, separated by vertical dashed lines.

Figure 15. Incidence data for the Junin region. The forecasting process uses a moving window that advances in 5-week steps, with each window lasting 10 weeks. The brown dashed vertical lines, spaced 5 weeks apart, indicate the starting points of the forecasts. At the top, the historic temperature is compared with the actual temperature during the same months.

Figure 16. Comparison of epidemic incidence forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods in Junin across different periods in 1991 and 1992. Each panel displays a 5-week calibration period followed by a 5-week forecasting period, separated by vertical dashed lines.

Figure 17. Comparison of epidemic cumulative forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods in Junin across different periods in 1991 and 1992. Each panel displays a 5-week calibration period followed by a 5-week forecasting period, separated by vertical dashed lines.

Table 2. Goodness of fit [45] to incidence data for the Ayacucho region from 26 February to 17 December 1991 using a simple cholera model (1).

Metric	Pre-Parameterized Periodic	Temperature-Dependent	Data-Driven Time-Dependent
MAE	32.75	38.25	13.21
RMSE	44.78	50.32	19.66

Table 3. Goodness of fit [45] to incidence data for the Ayacucho region from 4 June 1991 to 19 May 1992 using a simple cholera model (1).

Metric	Pre-Parameterized Periodic	Temperature-Dependent	Data-Driven Time-Dependent
MAE	46.84	40.44	14.53
RMSE	58.55	52.21	21.12
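Tables 2 and 3 report the mean absolute error (MAE) and the root-mean-square error (RMSE) of the fitted incidence curves. As a minimal sketch, assuming the standard definitions of these two metrics (the study's own code may differ in detail), they can be computed as follows. The function names and the toy numbers are illustrative and are not taken from the study data.

```python
import math

def mae(observed, predicted):
    """Mean absolute error: the average of |observed - predicted|."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root-mean-square error: square root of the mean squared difference."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mean_sq)

# Toy weekly incidence values (made up for illustration).
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 40]
print(mae(observed, predicted))   # 1.75
print(rmse(observed, predicted))  # sqrt(4.25), about 2.06
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE on the same data, which matches the pattern seen in every column of the tables.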

Table 4. Goodness of fit [45] comparison for epidemic incidence forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods for Arequipa across different periods in 1991.

Method	Period 1 MAE	Period 1 RMSE	Period 2 MAE	Period 2 RMSE	Period 3 MAE	Period 3 RMSE	Period 4 MAE	Period 4 RMSE	Period 5 MAE	Period 5 RMSE
Periodic	112.36	123.36	78.13	93.4	242.89	283.45	184.72	216.43	527.83	549.32
T-based	473.33	507.51	100.86	106.91	120.45	145.8	72.26	87.94	499.41	524.63

Table 5. Goodness of fit [45] comparison of epidemic incidence forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods for Ayacucho across different periods in 1991.

Method	Period 1 MAE	Period 1 RMSE	Period 2 MAE	Period 2 RMSE	Period 3 MAE	Period 3 RMSE	Period 4 MAE	Period 4 RMSE	Period 5 MAE	Period 5 RMSE
Periodic	76.43	77.17	69.31	91.9	11.22	12.64	15.61	20.48	194.77	236.28
T-based	48.16	53.33	66.02	79.99	9.13	10.29	13.01	18.13	128.62	153.43

Table 6. Goodness of fit [45] comparison of epidemic incidence forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods for Huanuco across different periods in 1991.

Method	Period 1 MAE	Period 1 RMSE	Period 2 MAE	Period 2 RMSE	Period 3 MAE	Period 3 RMSE	Period 4 MAE	Period 4 RMSE	Period 5 MAE	Period 5 RMSE
Periodic	63.37	79.38	11.44	14.02	27.38	33.66	34.48	41.33	9.88	14.88
T-based	110.86	125.77	18.58	20.56	23.89	25.42	21.52	29.97	17.45	19.31

Table 7. Goodness of fit [45] comparison of epidemic incidence forecasts using the periodic transmission rate (top row) and temperature-dependent transmission rate (bottom row) methods for Junin across different periods in 1991 and 1992.

Method	Period 1 MAE	Period 1 RMSE	Period 2 MAE	Period 2 RMSE	Period 3 MAE	Period 3 RMSE	Period 4 MAE	Period 4 RMSE	Period 5 MAE	Period 5 RMSE
Periodic	94.49	111.65	78.63	107.65	96.91	137.72	150.34	167.21	197.64	215.67
T-based	148.9	169.25	40.74	66.31	64.36	70.58	53.5	72.87	52.58	68.89
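As a reading aid for Tables 4–7, the following Python sketch tallies, for each region, the number of forecasting periods in which the temperature-dependent (T-based) method has a lower MAE than the periodic method. The MAE values are copied from the tables above; the variable and function names are illustrative, and a simple win count is only one possible summary, not an analysis performed in the paper.

```python
# MAE values per forecasting period (1-5), copied from Tables 4-7.
mae_by_region = {
    "Arequipa": {"periodic": [112.36, 78.13, 242.89, 184.72, 527.83],
                 "t_based": [473.33, 100.86, 120.45, 72.26, 499.41]},
    "Ayacucho": {"periodic": [76.43, 69.31, 11.22, 15.61, 194.77],
                 "t_based": [48.16, 66.02, 9.13, 13.01, 128.62]},
    "Huanuco": {"periodic": [63.37, 11.44, 27.38, 34.48, 9.88],
                "t_based": [110.86, 18.58, 23.89, 21.52, 17.45]},
    "Junin": {"periodic": [94.49, 78.63, 96.91, 150.34, 197.64],
              "t_based": [148.9, 40.74, 64.36, 53.5, 52.58]},
}

def count_t_based_wins(scores):
    """Count periods where the T-based MAE is strictly lower than the periodic MAE."""
    return sum(t < p for p, t in zip(scores["periodic"], scores["t_based"]))

wins = {region: count_t_based_wins(scores) for region, scores in mae_by_region.items()}
total_wins = sum(wins.values())
total_periods = sum(len(s["periodic"]) for s in mae_by_region.values())

for region, n in wins.items():
    print(f"{region}: T-based MAE lower in {n} of 5 periods")
print(f"Overall: {total_wins} of {total_periods}")
# Arequipa: 3, Ayacucho: 5, Huanuco: 2, Junin: 4, Overall: 14 of 20
```

Under this count, the T-based method has the lower MAE in 14 of the 20 region-period pairs, and the first period is the exception in three of the four regions.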

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
