Article

A State-Space Agent-Based Model for Infectious Disease Spread

by Durward A. Cator 1, Martial L. Ndeffo-Mbah 2 and Ulisses M. Braga-Neto 1,*
1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
2 Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
* Author to whom correspondence should be addressed.
Computation 2026, 14(6), 117; https://doi.org/10.3390/computation14060117
Submission received: 24 March 2026 / Revised: 11 May 2026 / Accepted: 18 May 2026 / Published: 22 May 2026
(This article belongs to the Section Computational Social Science)

Abstract

We present a novel framework for epidemiological disease spread modeling that combines agent-based simulation with Boolean state-space representations and optimal filtering for state estimation under noisy observations. Our approach models individual agents in discrete Susceptible-Exposed-Infected-Recovered (SEIR) states using a compact 2-bit Boolean representation, with agent interactions governed by scheduled contact patterns. To address the challenge of inferring latent infection states from limited and noisy testing data, we develop two complementary inference approaches: (1) a Boolean Kalman particle filter for small populations that tracks the full joint distribution over agent states, and (2) a mean-field approximation for large populations that factorizes the posterior into independent marginal distributions, enabling scalability to realistic population sizes. Unlike continuous-state Kalman filters, our methods naturally handle the discrete nature of epidemiological states while accommodating realistic observation models where only a subset of agents are tested at each time step, with test results subject to false positive and false negative errors. We demonstrate that this framework enables accurate reconstruction of population-level infection dynamics and individual agent states from sparse, noisy observations across populations from 100 to 50,000 agents, providing a computationally tractable approach for real-time epidemic monitoring.

1. Introduction

The COVID-19 pandemic has underscored the critical need for accurate, real-time epidemiological modeling frameworks capable of tracking disease spread through populations despite limited and imperfect surveillance data [1]. Traditional compartmental models (e.g., Susceptible-Infected-Recovered (SIR), Susceptible-Exposed-Infected-Recovered (SEIR)) provide valuable insights into aggregate disease dynamics but lack the granularity to capture individual-level heterogeneity in contact patterns, testing behavior, and infection progression [2]. Conversely, agent-based models (ABMs) offer rich microscopic detail but face significant challenges in state estimation: how can we infer the true infection states of thousands of individuals when we observe only noisy test results from a small subset?
This paper addresses this fundamental inference problem by developing a state-space framework that combines the representational power of agent-based models with the statistical rigor of optimal filtering theory. Our key innovation lies in recognizing that epidemiological states are inherently discrete and can be efficiently represented using Boolean variables, enabling the application of Boolean Kalman filtering techniques originally developed for gene regulatory networks [3]. However, while particle filtering is asymptotically exact as the number of particles grows [4], the curse of dimensionality makes straightforward particle filtering impractical in high-dimensional epidemic ABMs [5], motivating scalable mean-field approximations for realistic population sizes [6].

1.1. Motivation and Challenges

Real-world epidemic surveillance presents several distinct challenges:
  • Partial observability: Only a fraction of the population is tested at any given time, leaving most infection states latent.
  • Noisy measurements: Diagnostic tests exhibit false positive and false negative rates, corrupting observations with measurement error.
  • Discrete states: Epidemiological compartments (S, E, I, R) are categorical, not continuous quantities.
  • Nonlinear dynamics: Disease transmission depends on complex, nonlinear interactions between agents.
  • High dimensionality: Tracking individual-level states in large populations creates computational challenges that grow exponentially with population size.
Standard Kalman filtering assumes continuous Gaussian states and linear dynamics, making it unsuitable for this domain. Extended and unscented Kalman filters can handle nonlinearity but still assume continuous state spaces. Particle filters offer a general solution for nonlinear, non-Gaussian problems, but become computationally prohibitive in high dimensions without careful design or approximations.

1.2. Our Approach

We propose a Boolean state-space agent-based model (BS-ABM) where each agent’s SEIR state is encoded using two Boolean variables, yielding four possible states: $\{00, 10, 11, 01\} \leftrightarrow \{S, E, I, R\}$. Agent interactions follow predefined schedules (e.g., workplace, household, social contacts), and state transitions are governed by probabilistic rules that depend on contact with infected individuals. For state estimation, we develop two complementary approaches tailored to different population scales:
Small Populations (Particle Filter): For populations up to approximately 100 agents, we adapt the Auxiliary Particle Filter implementation of the Boolean Kalman Filter (APF-BKF) framework of Imani and Braga-Neto [3] to the epidemiological setting. This approach maintains a population of particles, each representing a hypothesis about all agents’ Boolean states, and propagates these through the discrete-state transition dynamics while updating particle weights based on noisy observations. While the state space of $4^{100} \approx 10^{60}$ configurations is enormous, particle filtering provides an effective approximation that captures dependencies between agent states arising from their contact structure.
Large Populations (Mean-Field Approximation): For populations of $N \gg 100$ agents, the state space of $4^N$ configurations renders particle filtering utterly intractable: no practical number of particles can meaningfully sample this space. We therefore employ a mean-field approximation that factorizes the posterior distribution into independent marginal distributions for each agent. Instead of tracking the full joint distribution $p(X_k \mid Y_{1:k})$, we maintain $N$ separate marginal distributions $p(X_k^{(i)} \mid Y_{1:k})$, one for each agent $i$, where each marginal is defined over only 4 SEIR states. This reduces the representation complexity from exponential to linear in $N$, enabling scalable inference while sacrificing the ability to capture correlations between agent states.
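To make the difference in representation size concrete, the following minimal Python sketch compares the number of joint configurations with the number of values stored by a per-agent factorization. The function names are hypothetical and chosen only for illustration.

```python
import math


def log10_joint_configurations(n_agents):
    """Base-10 logarithm of 4**n_agents, the number of joint SEIR configurations."""
    return n_agents * math.log10(4)


def marginal_parameters(n_agents):
    """Number of probabilities stored when each agent keeps its own 4-state marginal."""
    return 4 * n_agents


for n in (100, 50_000):
    print(n, round(log10_joint_configurations(n), 1), marginal_parameters(n))
```

For 100 agents the logarithm is about 60.2, in line with the order of magnitude quoted for the small-population case.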

1.3. Contributions

Our main contributions are: (1) a formal Boolean state-space framework for agent-based epidemiological modeling that naturally represents discrete disease states; (2) an adaptation of Boolean Kalman particle filtering to epidemic state estimation in small populations under realistic observation models including random sampling and measurement noise; (3) a scalable mean-field approximation for large populations that performs inference by tracking independent per-agent marginal distributions while preserving accurate population-level estimates; (4) computational implementations that scale from 100 to 50,000 agents, illustrating the practical transition from exact particle filtering to mean-field inference as population size grows; and (5) an empirical demonstration that both methods can accurately reconstruct latent states from sparse, noisy observations.

2. Related Work

2.1. Epidemiological Modeling

Compartmental models based on ordinary differential equations have been the workhorse of mathematical epidemiology since Kermack and McKendrick [7]. The SEIR model extends the basic SIR framework by including an Exposed (latent) compartment [2]. While computationally efficient, these models assume homogeneous mixing and cannot capture individual-level heterogeneity.
Agent-based models address these limitations by explicitly simulating individual agents and their interactions [8,9]. ABMs have been successfully applied to model influenza [10], COVID-19 [11], and other infectious diseases. However, most ABM applications focus on forward simulation rather than inverse problems.

2.2. State Estimation and Data Assimilation

The Kalman filter [12] provides optimal state estimates for linear Gaussian systems. Extensions, including the Extended Kalman Filter [13] and Unscented Kalman Filter [14], handle nonlinear dynamics but assume continuous state spaces.
Particle filters (Sequential Monte Carlo methods) [4,15] offer a general framework for nonlinear, non-Gaussian state estimation. These methods have been applied to epidemiological inference [16,17], but standard implementations face challenges with high-dimensional state spaces and discrete states.

2.3. Boolean Network Models and Filtering

Boolean networks, introduced by Kauffman [18] for modeling gene regulatory networks, represent system states as binary vectors with deterministic or probabilistic transition rules. Probabilistic Boolean Networks [19] add stochasticity, enabling richer dynamics.
Imani and Braga-Neto [3] developed particle filtering methods specifically for partially-observed Boolean dynamical systems. This approach has been successfully applied to gene regulatory network inference [20] but has not previously been adapted to epidemiological agent-based models.

2.4. Mean-Field Approximations

Mean-field approximations provide a bridge between microscopic agent-based models and macroscopic compartmental models by assuming that agent interactions can be approximated by their average effect in large populations [21]. The mathematical foundations of such approximations for epidemic processes on networks have been rigorously established [22].

3. Materials and Methods

3.1. Boolean State Representation

Consider a population of $N$ agents indexed by $i \in \{1, \dots, N\}$. At discrete time step $k$, each agent occupies one of four SEIR states: Susceptible (S), Exposed (E), Infected (I), or Recovered (R). We encode each agent’s state using two Boolean variables $(b_{i1}, b_{i2}) \in \{0,1\}^2$: $(0,0) \to S$; $(1,0) \to E$; $(1,1) \to I$; $(0,1) \to R$. The global system state at time $k$ is represented by the Boolean matrix:
$$X_k = \begin{bmatrix} b_{11}(k) & b_{12}(k) \\ b_{21}(k) & b_{22}(k) \\ \vdots & \vdots \\ b_{N1}(k) & b_{N2}(k) \end{bmatrix} \in \mathbb{B}^{N \times 2}$$
where $\mathbb{B} = \{0,1\}$ denotes the Boolean domain. At each time step $k$, we obtain noisy observations on a subset $T_k \subseteq \{1, \dots, N\}$ of agents’ states in the observation process $Y_k$.
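The two-bit encoding can be sketched in a few lines of Python. This is only an illustration under the mapping stated in the text; the helper names are hypothetical and the matrix is stored as a plain list of rows rather than any particular array type.

```python
# Two-bit Boolean encoding of SEIR states, following the mapping in the text:
# (0,0) -> S, (1,0) -> E, (1,1) -> I, (0,1) -> R.
STATE_TO_BITS = {"S": (0, 0), "E": (1, 0), "I": (1, 1), "R": (0, 1)}
BITS_TO_STATE = {bits: state for state, bits in STATE_TO_BITS.items()}


def encode_population(states):
    """Return the N x 2 Boolean matrix X_k as a list of [b1, b2] rows."""
    return [list(STATE_TO_BITS[s]) for s in states]


def decode_population(matrix):
    """Invert encode_population: map each [b1, b2] row back to a state label."""
    return [BITS_TO_STATE[tuple(row)] for row in matrix]


def count_states(matrix):
    """Tally how many agents occupy each compartment."""
    counts = {"S": 0, "E": 0, "I": 0, "R": 0}
    for label in decode_population(matrix):
        counts[label] += 1
    return counts
```

Under this mapping each step of the S, E, I, R progression flips exactly one bit (00, 10, 11, 01).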

3.2. State Transition and Observation Dynamics

The states are updated and observed at each discrete time through the following nonlinear signal model:
$$X_k = f_k(X_{k-1}, n_k), \qquad Y_k = h_k(X_k, v_k)$$
where $n_k$ represents stochastic process noise capturing random elements of disease transmission, $v_k$ is observation noise, $f_k$ maps the previous state and process noise to the current state, and $h_k$ maps the current state and observation noise into the measurement space.

3.2.1. Contact Structure and Scheduling

Agents move according to a location schedule $L^k = \{L_1(k), L_2(k), \dots, L_N(k)\}$, where $L_i(k) \subseteq \{1, \dots, M\}$ denotes the subset of the $M$ locations that agent $i$ visits at time $k$. At each location $j$, agents interact according to a contact schedule $C_j^k = \{C_1(j,k), C_2(j,k), \dots, C_N(j,k)\}$, where $C_i(j,k) \subseteq \{1, \dots, N\}$ denotes the set of agents that agent $i$ contacts at location $j$ at time $k$. Let $C^k = \{C_1^k, \dots, C_M^k\}$ denote the set of all scheduled contacts at time $k$. These contacts might represent household interactions (daily, fixed contacts), workplace interactions (weekday patterns), social interactions (varying contacts), or community interactions (spatial proximity). Figure 1 depicts a typical contact structure and observation for an agent on a university campus.
Susceptible to Exposed Transition: For a susceptible agent $i$, denote:
$$p_{ij} = 1 - \prod_{l \in C_i(j,k)} \left(1 - \lambda_E \cdot \mathbb{I}_I(l, k-1)\right) \tag{3}$$
where $\lambda_E \in [0,1]$ is the per-contact transmission probability and $\mathbb{I}_I(l, k-1)$ is an indicator function equal to 1 if agent $l$ is in the infected state at time $k-1$. This captures independent exposure chances for all contacts in $C_i(j,k)$. The probability of transitioning to exposed at time $k$ is:
$$P(S_i \to E_i \mid X_{k-1}, L^k, C^k) = 1 - (1 - \theta) \cdot \prod_{j \in L_i(k)} \left(1 - e_j \cdot p_{ij}\right) \tag{4}$$
where $\theta \in [0,1]$ is the spontaneous exposure rate, and $e_j \in [0,1]$ is an environmental risk factor associated with location $j$. This captures independent exposure chances across all locations in $L_i(k)$. For brevity, we suppress the dependencies and use $P(S_i \to E_i)$ going forward.
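The two exposure formulas above can be illustrated with a short Python sketch. The data layout (a dictionary of contacts per location and a set of infected agent identifiers) and the function names are assumptions made for this example, not the authors' implementation.

```python
def contact_infection_prob(contacts, infected_prev, lambda_e):
    """p_ij = 1 - prod over contacts l of (1 - lambda_E * 1[l infected at k-1])."""
    escape = 1.0
    for l in contacts:
        if l in infected_prev:
            escape *= 1.0 - lambda_e
    return 1.0 - escape


def exposure_prob(locations, contacts_at, infected_prev, lambda_e, theta, env_risk):
    """P(S_i -> E_i) = 1 - (1 - theta) * prod over visited j of (1 - e_j * p_ij).

    locations     : location ids visited by agent i at time k
    contacts_at   : dict, location id -> agent ids contacted there (hypothetical layout)
    infected_prev : set of agent ids infected at time k-1
    env_risk      : dict, location id -> environmental risk factor e_j
    """
    escape = 1.0 - theta
    for j in locations:
        p_ij = contact_infection_prob(contacts_at[j], infected_prev, lambda_e)
        escape *= 1.0 - env_risk[j] * p_ij
    return 1.0 - escape
```

With no infected contacts, every $p_{ij}$ is zero and the exposure probability reduces to the spontaneous rate $\theta$.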
Exposed to Infected Transition: Exposed agents transition to infected with probability $\lambda_I$ per time step:
$$P(E_i \to I_i) = \lambda_I \tag{5}$$
where $1/\lambda_I$ represents the mean incubation period.
Infected to Recovered Transition: Infected agents recover with probability $\lambda_R$ per time step:
$$P(I_i \to R_i) = \lambda_R \tag{6}$$
where $1/\lambda_R$ represents the mean infectious period.
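A per-agent transition draw consistent with the three rules above might be sketched as follows. The function names are hypothetical, and the helper `mean_sojourn` is included only to check numerically that a per-step probability $\lambda$ corresponds to a mean waiting time of $1/\lambda$ steps.

```python
import random


def step_agent(state, p_exposure, lambda_i, lambda_r, rng):
    """Sample one agent's next SEIR state from the per-agent Bernoulli rules."""
    u = rng.random()
    if state == "S":
        return "E" if u < p_exposure else "S"
    if state == "E":
        return "I" if u < lambda_i else "E"
    if state == "I":
        return "R" if u < lambda_r else "I"
    return "R"  # no transition out of R is defined in the model


def mean_sojourn(rate, trials, rng):
    """Monte Carlo estimate of the mean number of steps until a transition fires."""
    total = 0
    for _ in range(trials):
        steps = 1
        while rng.random() >= rate:
            steps += 1
        total += steps
    return total / trials


rng = random.Random(0)
print(mean_sojourn(1 / 5, 20_000, rng))   # close to 5 steps
print(mean_sojourn(1 / 10, 20_000, rng))  # close to 10 steps
```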

3.2.2. Observation Model

Let $y_{ij} \in \mathbb{B}$ denote the observation of bit $j$ belonging to a tested agent $i \in T_k$. Then for each agent $i \in T_k$, the observation model independently measures each bit as a Bernoulli trial with errors
$$\begin{aligned} P(y_{i1} = 1 \mid b_{i1} = 0) &= \alpha, & P(y_{i1} = 0 \mid b_{i1} = 1) &= \beta, \\ P(y_{i2} = 1 \mid b_{i2} = 0) &= \gamma, & P(y_{i2} = 0 \mid b_{i2} = 1) &= \delta. \end{aligned}$$
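As an illustration of this observation model, the sketch below evaluates the likelihood of a two-bit test result given a state and draws a noisy test. The function names are hypothetical; the only modeling assumption used is the one stated in the text, namely that the two bits are measured independently.

```python
import random

STATE_TO_BITS = {"S": (0, 0), "E": (1, 0), "I": (1, 1), "R": (0, 1)}


def bit_likelihood(observed, true_bit, false_pos, false_neg):
    """P(observed | true_bit) for one noisy bit."""
    if true_bit == 0:
        return false_pos if observed == 1 else 1.0 - false_pos
    return false_neg if observed == 0 else 1.0 - false_neg


def observation_likelihood(y, state, alpha, beta, gamma, delta):
    """P(y | state) for a tested agent, as a product over the two bits."""
    b1, b2 = STATE_TO_BITS[state]
    return (bit_likelihood(y[0], b1, alpha, beta)
            * bit_likelihood(y[1], b2, gamma, delta))


def sample_test(state, alpha, beta, gamma, delta, rng):
    """Draw a noisy two-bit test result for an agent in the given state."""
    b1, b2 = STATE_TO_BITS[state]
    flip1 = rng.random() < (beta if b1 == 1 else alpha)
    flip2 = rng.random() < (delta if b2 == 1 else gamma)
    return (b1 ^ int(flip1), b2 ^ int(flip2))
```

For an untested agent no likelihood term is evaluated, which is equivalent to using a likelihood of 1 for every state.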

3.3. Inference Approaches

The central challenge of epidemic state estimation is computing the posterior distribution $p(X_k \mid Y_{1:k})$ over all agent states given all observations up to time $k$. We develop two complementary approaches tailored to different population scales.

3.3.1. Boolean Kalman Particle Filter

For populations where $N \leq 100$, we maintain a set of $N_p$ particles $\{x_{k,m}\}_{m=1}^{N_p}$, where each particle $x_{k,m}$ is a complete hypothesis about all $N$ agents’ Boolean states. The particle filter approximates the posterior distribution as follows:
$$p(X_k \mid Y_{1:k}) \approx \sum_{m=1}^{N_p} W_{k,m}\, \delta(X_k - x_{k,m})$$
where $W_{k,m}$ are importance weights and $\delta(\cdot)$ is the Dirac delta function assigning point mass probability to each particle $x_{k,m}$.
The agent-based SEIR dynamics define a state transition distribution $p(X_k \mid X_{k-1})$ through the stochastic function $f_k(X_{k-1}, n_k)$ described by Equations (3)–(6). These equations define per-agent transition probabilities that are independent Bernoulli trials given the previous state $X_{k-1}$. This conditional independence allows the state transitions to factorize:
$$p(X_k \mid X_{k-1}) = \prod_{i=1}^{N} p(X_k^{(i)} \mid X_{k-1}).$$
Two important distributions are the look-ahead distribution, $p(Y_k \mid X_{k-1})$, and the guided proposal distribution, $p(X_k \mid X_{k-1}, Y_k)$. If observations are conditionally independent, the look-ahead becomes the following:
$$p(Y_k \mid X_{k-1}) = \prod_{i=1}^{N} \sum_{X_k^{(i)} \in \{0,1\}^2} p(Y_k^{(i)} \mid X_k^{(i)})\, p(X_k^{(i)} \mid X_{k-1})$$
where $p(Y_k^{(i)} \mid X_k^{(i)}) = 1$ for $i \notin T_k$ (untested agents). The guided proposal $p(X_k \mid X_{k-1}, Y_k) = p(Y_k \mid X_k)\, p(X_k \mid X_{k-1}) / p(Y_k \mid X_{k-1})$ factorizes identically:
$$p(X_k \mid X_{k-1}, Y_k) = \prod_{i=1}^{N} \frac{p(Y_k^{(i)} \mid X_k^{(i)})\, p(X_k^{(i)} \mid X_{k-1})}{\sum_{X_k^{(i)} \in \{0,1\}^2} p(Y_k^{(i)} \mid X_k^{(i)})\, p(X_k^{(i)} \mid X_{k-1})}$$
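Because both expressions factorize over agents, each agent contributes one normalizing sum (its look-ahead factor) and one normalized four-state distribution (its guided proposal). A minimal Python sketch of that per-agent computation follows; the function name and the dictionary layout are assumptions made for illustration.

```python
def lookahead_and_proposal(transition, likelihood):
    """Per-agent look-ahead factor and guided proposal.

    transition : dict, state -> p(X_k^(i) = state | X_{k-1})
    likelihood : dict, state -> p(Y_k^(i) | X_k^(i) = state); all ones if untested
    Returns (lookahead, proposal): the normalizing sum and the normalized product.
    Assumes the sum is positive, which holds when error rates lie strictly in (0, 1).
    """
    joint = {s: likelihood[s] * transition[s] for s in transition}
    lookahead = sum(joint.values())
    proposal = {s: joint[s] / lookahead for s in joint}
    return lookahead, proposal


# Example: a susceptible agent with a 10% exposure probability tests as (1, 0)
# under 5% error rates on both bits.
transition = {"S": 0.9, "E": 0.1, "I": 0.0, "R": 0.0}
likelihood = {"S": 0.0475, "E": 0.9025, "I": 0.0475, "R": 0.0025}
print(lookahead_and_proposal(transition, likelihood))
```

The full look-ahead for a particle is the product of the per-agent factors, and a draw from the guided proposal is obtained by sampling each agent independently from its own normalized distribution.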
We now reproduce the APF-BKF of Imani and Braga-Neto [3] adapted to our epidemiological model. Given the initial distribution $p(X_0)$, sample particles $x_{0,m} \sim p(X_0)$ and initialize weights $W_{0,m} = 1/N_p$, for $m = 1, \dots, N_p$. For each time step $k = 1, 2, \dots$:
  1. Compute first-stage weights: $V_{k,m} = p(Y_k \mid x_{k-1,m})\, W_{k-1,m}$ for $m = 1, \dots, N_p$.
  2. Sample auxiliary variables: $\{\zeta_{k,m}\}_{m=1}^{N_p} \sim \mathrm{Cat}(\{V_{k,m}\}_{m=1}^{N_p})$.
  3. Obtain new particles from the guided proposal: $x_{k,m} \sim p(X_k \mid x_{k-1,\zeta_{k,m}}, Y_k)$.
  4. Compute second-stage weights:
     $$\tilde{W}_{k,m} = \frac{p(Y_k \mid x_{k,m})\, p(x_{k,m} \mid x_{k-1,\zeta_{k,m}})}{p(x_{k,m} \mid x_{k-1,\zeta_{k,m}}, Y_k)}$$
  5. Normalize: $W_{k,m} = \tilde{W}_{k,m} / \sum_{j=1}^{N_p} \tilde{W}_{k,j}$.
  6. Compute expectation: $z_k = \sum_{m=1}^{N_p} W_{k,m}\, x_{k,m}$.
  7. Compute minimum mean square error (MMSE) estimate: $\hat{X}_k^{\mathrm{MS}} = \bar{z}_k$, where $\bar{z}_k(i,j) = 1$ if $z_k(i,j) > 1/2$, else 0 (elementwise thresholding at 1/2).
  8. Compute mean square error (MSE): $\mathrm{MSE}(\hat{X}_k^{\mathrm{MS}}, Y_{1:k}) = \|\min\{z_k, z_k^c\}\|_1$, where $z_k^c(i,j) = 1 - z_k(i,j)$, min is performed elementwise, and $\|\cdot\|_1$ denotes the matrix $L_1$ norm.
Note that while the MSE in step 8 may not look like a conventional expectation of an $L_2$ norm, it is indeed the true MSE in the space of binary vectors. Also note that if the exact look-ahead and guided proposal are used (conditional independence holds), then the second-stage weights are uniform:
$$\frac{p(Y_k \mid x_{k,m})\, p(x_{k,m} \mid x_{k-1,\zeta_{k,m}})}{p(x_{k,m} \mid x_{k-1,\zeta_{k,m}}, Y_k)} = \frac{p(Y_k \mid x_{k,m})\, p(x_{k,m} \mid x_{k-1,\zeta_{k,m}})}{p(Y_k \mid x_{k,m})\, p(x_{k,m} \mid x_{k-1,\zeta_{k,m}}) / p(Y_k \mid x_{k-1,\zeta_{k,m}})} = p(Y_k \mid x_{k-1,\zeta_{k,m}})$$
Since the first-stage weights $p(Y_k \mid x_{k-1,m})\, W_{k-1,m}$ already use this look-ahead to sample $\zeta_{k,m}$, the final weights become uniform under the fully adapted proposal.
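The filtering recursion can be sketched in Python for the fully adapted case only, in which the exact look-ahead and guided proposal are available and the final weights are taken to be uniform, as argued above. This is a simplified illustration, not the authors' implementation: the names `apf_step`, `transition_fn`, and `likelihood_fn` are hypothetical, particles are stored as lists of state labels rather than Boolean matrices, and likelihoods are assumed strictly positive.

```python
import random

STATES = ("S", "E", "I", "R")


def agent_factors(trans_i, lik_i):
    """Per-agent look-ahead factor and normalized guided proposal."""
    joint = [lik_i[s] * trans_i[s] for s in STATES]
    total = sum(joint)
    return total, [v / total for v in joint]


def apf_step(particles, weights, transition_fn, likelihood_fn, rng):
    """One fully adapted auxiliary particle filter step.

    particles     : list of particles, each a list of N state labels
    weights       : normalized weights from the previous step
    transition_fn : particle -> list of N dicts p(X_k^(i) = s | x_{k-1})
    likelihood_fn : agent index -> dict p(Y_k^(i) | X_k^(i) = s), all ones if untested
    """
    n_agents = len(particles[0])
    liks = [likelihood_fn(i) for i in range(n_agents)]
    first_stage, proposals = [], []
    for particle, w in zip(particles, weights):
        trans = transition_fn(particle)
        lookahead, props = 1.0, []
        for i in range(n_agents):
            total, prop = agent_factors(trans[i], liks[i])
            lookahead *= total
            props.append(prop)
        first_stage.append(lookahead * w)  # first-stage weight V_{k,m}
        proposals.append(props)
    # Sample auxiliary indices from Cat(V), then draw each agent from the
    # guided proposal of the selected ancestor.
    n_p = len(particles)
    ancestors = rng.choices(range(n_p), weights=first_stage, k=n_p)
    new_particles = [
        [rng.choices(STATES, weights=proposals[a][i])[0] for i in range(n_agents)]
        for a in ancestors
    ]
    # Fully adapted case: final weights are uniform (see the argument in the text).
    new_weights = [1.0 / n_p] * n_p
    return new_particles, new_weights
```

A proposal other than the exact guided one would require computing and normalizing the second-stage weights explicitly, which this sketch omits.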

3.3.2. Mean-Field Approximation for Large Populations

For populations where $N \gg 100$, the state space of $4^N$ configurations becomes intractable. We employ a mean-field (MF) approximation that factorizes the posterior:
$$p(X_k \mid Y_{1:k}) \approx \prod_{i=1}^{N} q_k^{(i)}(X_k^{(i)})$$
where $q_k^{(i)}(s) = p(X_k^{(i)} = s \mid Y_{1:k})$ for $s \in \{S, E, I, R\}$ is the marginal distribution for agent $i$. Each marginal is represented as a probability vector of length 4:
$$q_k^{(i)} = \left[\, q_k^{(i)}(S),\ q_k^{(i)}(E),\ q_k^{(i)}(I),\ q_k^{(i)}(R) \,\right].$$
This requires storing only $4N$ probability values rather than tracking $4^N$ joint configurations.
Mean-Field Prediction Step: At time $k$, we compute the expected infection pressure each agent faces. For agent $i$ at location $j$, we use a moment closure in place of Equation (3) for the mean-field infection probability:
$$\bar{p}_{ij} = 1 - \prod_{l \in C_i(j,k)} \left(1 - \eta \cdot \lambda_E \cdot q_{k-1}^{(l)}(I)\right)$$
where $\eta \in [0,1]$ is an effective infectious susceptibility scaling parameter that accounts for over-mixing in the mean-field approximation, and $q_{k-1}^{(l)}(I)$ is the probability that agent $l$ is infected at time $k-1$.
The predicted marginal distribution for agent $i$ at time $k$ is the following:
$$\begin{aligned} q_k^{(i),-}(S) &= q_{k-1}^{(i)}(S) \cdot \left(1 - P(S_i \to E_i)\right) \\ q_k^{(i),-}(E) &= q_{k-1}^{(i)}(S) \cdot P(S_i \to E_i) + q_{k-1}^{(i)}(E) \cdot (1 - \lambda_I) \\ q_k^{(i),-}(I) &= q_{k-1}^{(i)}(E) \cdot \lambda_I + q_{k-1}^{(i)}(I) \cdot (1 - \lambda_R) \\ q_k^{(i),-}(R) &= q_{k-1}^{(i)}(I) \cdot \lambda_R + q_{k-1}^{(i)}(R) \end{aligned}$$
where the superscript $-$ denotes the predicted distribution before incorporating observations.
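The prediction step can be sketched in Python as below. The function names are hypothetical, marginals are stored as dictionaries keyed by state label, and the exposure probability passed to `mf_predict` is assumed to have been obtained by substituting the moment-closure probability into the susceptible-to-exposed formula.

```python
def mf_contact_pressure(contacts, q_prev, lambda_e, eta):
    """Moment-closure infection probability for one agent at one location.

    contacts : agent ids contacted at this location
    q_prev   : dict, agent id -> marginal dict over "S", "E", "I", "R" at time k-1
    """
    escape = 1.0
    for l in contacts:
        escape *= 1.0 - eta * lambda_e * q_prev[l]["I"]
    return 1.0 - escape


def mf_predict(q_i, p_exposure, lambda_i, lambda_r):
    """Propagate one agent's marginal through the SEIR transition rules."""
    return {
        "S": q_i["S"] * (1.0 - p_exposure),
        "E": q_i["S"] * p_exposure + q_i["E"] * (1.0 - lambda_i),
        "I": q_i["E"] * lambda_i + q_i["I"] * (1.0 - lambda_r),
        "R": q_i["I"] * lambda_r + q_i["R"],
    }


q = {"S": 0.7, "E": 0.1, "I": 0.15, "R": 0.05}
q_pred = mf_predict(q, p_exposure=0.3, lambda_i=0.2, lambda_r=0.1)
print(q_pred, sum(q_pred.values()))
```

The four update rules move probability mass between compartments without creating or destroying it, so the predicted marginal still sums to one.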
Mean-Field Update Step: When agent $i$ is observed at time $k$ with measurement $Y_k^{(i)}$, we update only that agent’s marginal using Bayes’ rule:
$$q_k^{(i)}(s) = \frac{p(Y_k^{(i)} \mid X_k^{(i)} = s) \cdot q_k^{(i),-}(s)}{\sum_{s' \in \{S,E,I,R\}} p(Y_k^{(i)} \mid X_k^{(i)} = s') \cdot q_k^{(i),-}(s')}$$
Note that this differs from updating the full joint distribution, where a positive test for one agent can provide information on other agents through the contact network. For agents not observed at time $k$, the marginal remains unchanged: $q_k^{(i)}(s) = q_k^{(i),-}(s)$. These marginals are then converted to the Boolean state vector according to our two-bit encoding:
$$z_k^{(i)} = \left[\, q_k^{(i)}(E) + q_k^{(i)}(I),\ q_k^{(i)}(I) + q_k^{(i)}(R) \,\right], \quad i = 1, \dots, N$$
with MMSE state estimate and MSE computed following the APF-BKF algorithm. Note that the MSE may no longer be optimal as we are no longer targeting the joint distribution.
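A compact Python sketch of the update step, the conversion to bit expectations, and the thresholded estimate with its error measure is given below. The function names are hypothetical; the thresholding and the elementwise minimum follow the definitions in the APF-BKF algorithm listing.

```python
def mf_update(q_pred, likelihood):
    """Bayes update of one tested agent's predicted marginal."""
    post = {s: likelihood[s] * q_pred[s] for s in q_pred}
    total = sum(post.values())
    return {s: v / total for s, v in post.items()}


def marginal_to_bits(q):
    """Expected value of the two Boolean bits: [P(bit 1 = 1), P(bit 2 = 1)]."""
    return [q["E"] + q["I"], q["I"] + q["R"]]


def mmse_estimate(z):
    """Elementwise thresholding of the bit expectations at 1/2."""
    return [[1 if v > 0.5 else 0 for v in row] for row in z]


def mse(z):
    """Sum over all entries of min(z, 1 - z)."""
    return sum(min(v, 1.0 - v) for row in z for v in row)


# Example: predicted marginal of a mostly susceptible agent, test result (1, 0),
# 5% error rates on both bits.
q_pred = {"S": 0.9, "E": 0.1, "I": 0.0, "R": 0.0}
likelihood = {"S": 0.0475, "E": 0.9025, "I": 0.0475, "R": 0.0025}
q_post = mf_update(q_pred, likelihood)
z = [marginal_to_bits(q_post)]
print(q_post, mmse_estimate(z), mse(z))
```

In this example a single positive first bit is enough to move the estimate from Susceptible to Exposed, while the error measure stays well above zero because the evidence from one noisy test is limited.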

4. Results

We evaluate our BS-ABM frameworks on two realistic campus scenarios demonstrating different inference methodologies necessitated by computational constraints:
  • Small College (100 agents): We apply the Boolean Kalman particle filter (BKF) with $4 \times 10^6$ particles.
  • Large University (50,000 agents): We employ mean-field approximation with linear computational complexity.
Both scenarios assess the ability to reconstruct population-level infection dynamics and individual agent states from sparse, noisy testing data.
Agent schedules are generated as follows. During weekdays, both campuses operate on five class periods per day. Each agent attends three of the five periods in randomly selected classrooms, yielding 15 class meetings per week. No classes occur on weekends. This weekly schedule repeats over 100 days. Additionally, a specified fraction of agents live in dormitories, interacting with roommates daily (including weekends).
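One way such a weekly class schedule could be generated is sketched below. The function names are hypothetical, and two details are assumptions made for this example because the text does not specify them: periods and classrooms are redrawn independently for each weekday, and agents sharing a classroom in the same period are grouped together as candidate contacts.

```python
import random

WEEKDAYS = 5


def weekly_class_schedule(n_agents, n_classrooms, rng,
                          periods_per_day=5, classes_per_day=3):
    """Return schedule[agent][day] = {period: classroom} for the five weekdays."""
    schedule = []
    for _ in range(n_agents):
        week = []
        for _ in range(WEEKDAYS):
            periods = rng.sample(range(periods_per_day), classes_per_day)
            week.append({p: rng.randrange(n_classrooms) for p in sorted(periods)})
        schedule.append(week)
    return schedule


def classroom_groups(schedule, day, period):
    """Group agents by classroom for one (day, period) slot."""
    rooms = {}
    for agent, week in enumerate(schedule):
        room = week[day].get(period)
        if room is not None:
            rooms.setdefault(room, []).append(agent)
    return rooms


rng = random.Random(0)
schedule = weekly_class_schedule(n_agents=100, n_classrooms=3, rng=rng)
print(sum(len(day) for day in schedule[0]))  # class meetings per week for agent 0
```

Dormitory contacts, which also occur on weekends, would be layered on top of this weekday schedule.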
For each campus, we run simulations for two outbreak regimes. In the small outbreak regime, classroom and dormitory environmental risk factors are sampled uniformly in [0.1, 0.2], and $\lambda_E = 0.01$. In the large outbreak regime, risk factors are sampled from [0.2, 0.3], and $\lambda_E = 0.05$. For all experiments, $\lambda_I = 1/5$ and $\lambda_R = 1/10$.
For every combination of campus size, outbreak regime, and testing strategy, we sampled an ensemble of 100 random initializations, each with its own schedules, environmental risk factors, and five initially infected agents. All parameters are summarized in Table A1.
All experiments were originally run as Google Colab notebooks in an L4 GPU environment with 24 GB of RAM and took approximately 15 h of wall time. A single BKF simulation day took approximately 810 ms, and a single MF simulation day took approximately 90 ms. The source code used to generate our results can be found at https://github.com/UBragaNeto/State-Space_Agent-Based_Model_Infectious_Disease, accessed on 6 May 2026.

4.1. Observation Model Parameters

At each time step, we independently test $N_T$ agents selected randomly. Each test measures both Boolean state bits with symmetric error rates: false positive rate $\alpha = 0.05$ and false negative rate $\beta = 0.05$ for the first bit, and false positive rate $\gamma = 0.05$ and false negative rate $\delta = 0.05$ for the second bit. We evaluate testing rates of 1%, 5%, and 10% of the population per day.

4.2. Quantities of Interest

We analyze several performance metrics:
Ground Truth Test Error: The portion of tested agents whose tests do not match their true state. With symmetric 5% error on both bits, expected error is $P(\text{state error}) = 1 - P(\text{bit 1 correct})\, P(\text{bit 2 correct}) = 1 - 0.95^2 = 0.0975$.
BKF/MF Test Error: The portion of tested agents whose BKF/MF states do not match their true states. Our model incorporates test error information via the likelihood, so we expect agents in the test set to be closer to ground truth in the BKF/MF than their naïve test results would indicate.
BKF/MF Total Error: The portion of all agents whose BKF/MF states do not match their true states. This evaluates the quality of estimates for untested agents.
BKF/MF Precision: Measures the accuracy of positive predictions (exposed or infected) as the ratio of true positives over true positives + false positives.
BKF/MF Recall: Measures the ability to find all positives (exposed or infected) as the ratio of true positives over true positives + false negatives.
Relative $L_1$ Error of E + I Counts: Let $\hat{E} = [\hat{E}_1, \dots, \hat{E}_{100}]$ denote the vector of Exposed agent counts predicted by the BKF/MF over 100 days, and let $\tilde{E}$ denote ground truth. Similarly, define $\hat{I}$ and $\tilde{I}$ for Infectious agents. The relative $L_1$ error on E + I counts is $\|(\hat{E} + \hat{I}) - (\tilde{E} + \tilde{I})\|_1 / \|\tilde{E} + \tilde{I}\|_1$.
Ground Truth Within MSE: To determine uncertainty quantification calibration, we measure the portion of ground truth E + I counts that lie in the interval $\tilde{E} + \tilde{I} \in [\hat{E} + \hat{I} - \mathrm{MSE},\ \hat{E} + \hat{I} + \mathrm{MSE}]$.
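The metrics above can be expressed compactly in Python. The sketch below is illustrative only: the function names are hypothetical, precision and recall are returned as `None` when their denominators are zero (a convention chosen for this example), and the per-day MSE values used for the coverage interval are taken as given inputs.

```python
def expected_test_error(p_bit1_correct, p_bit2_correct):
    """Probability that at least one of the two observed bits is wrong."""
    return 1.0 - p_bit1_correct * p_bit2_correct


def precision_recall(true_states, est_states, positive=("E", "I")):
    """Precision and recall for the positive class (exposed or infected)."""
    tp = fp = fn = 0
    for t, e in zip(true_states, est_states):
        t_pos, e_pos = t in positive, e in positive
        tp += int(t_pos and e_pos)
        fp += int(e_pos and not t_pos)
        fn += int(t_pos and not e_pos)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


def relative_l1_error(est_counts, true_counts):
    """Relative L1 error between daily estimated and true E + I counts."""
    num = sum(abs(e - t) for e, t in zip(est_counts, true_counts))
    den = sum(abs(t) for t in true_counts)
    return num / den


def coverage_within_mse(est_counts, true_counts, mse_values):
    """Fraction of days on which the true count lies within estimate +/- MSE."""
    hits = sum(1 for e, t, m in zip(est_counts, true_counts, mse_values)
               if e - m <= t <= e + m)
    return hits / len(true_counts)


print(expected_test_error(0.95, 0.95))
```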

4.3. Small College Environment: Particle Filter Approach

The small college comprises 100 agents moving among three classroom buildings and three dormitories over 100 days. Half the population lives on campus; the other half lives off campus with a daily exposure probability of $10^{-3}$. The spontaneous exposure rate $\theta$ is $10^{-4}$.
The small outbreak regime had an average size of 3.12 infections with a peak of 7.68. The large outbreak regime had an average size of 14.61 infections with a peak of 54.23. Figure 2a,b show the mean and ±2 standard deviation Susceptible (S), Exposed + Infected (E + I), and Recovered (R) curves over 100 random initializations for the small outbreak and large outbreak regimes, respectively.
Small Outbreak Performance: Table 1 summarizes results averaged over 100 ensembles. At a 1% test rate, BKF test error is about 59% of the ground truth test error (0.0588 vs. 0.0998). This improves to less than half at 10% testing (0.0460 vs. 0.0968), indicating a growing ability to outperform naïve test results as observations increase.
The total BKF error begins at 0.1411, showing an accurate prediction of 85.89% of agent states with noisy data on only 1% of the population. This improves to 92.53% accuracy at 10% testing. Precision begins at 0.6703 and slightly increases to 0.6995, indicating nearly 70% accuracy in reported exposed and infected states for all test rates. Recall starts quite low at 0.0261, but is over double the testing rate of 1%. It remains over double the testing rate as it climbs to 10%, indicating an ability to find exposed and infected agents at a higher rate than naïve testing.
At 1% testing, relative $L_1$ error is 0.6942 (approximately ±2 E + I counts around the 3.12 average) with 99.94% of error captured within MSE. At 10% testing, error drops to 0.5104 (±1.5 counts) with 93.94% MSE coverage, indicating the BKF becomes slightly overconfident as observations become more informative.
Large Outbreak Performance: Table 2 summarizes results. For all testing rates, BKF test error remains around half the ground truth error, indicating the BKF outperforms naïve tests even as SEIR states become broadly distributed.
BKF total error begins at 0.1750 (82.50% accuracy with a single daily test) and improves to 85.55% at 10% testing, showing modest improvement with tenfold increase in observations. Precision begins at 0.5654 at a 1% test rate and climbs to 0.6591 at 10% testing. This is lower than the small outbreak scenario since agent states are more scattered, making them less reliable to predict, but there is a noticeable increase in performance with the testing rate that the small outbreak scenario lacks. Recall starts much higher than the small outbreak at 0.2195 and similarly grows with the test rate, indicating a better ability to find exposed and infected agents when the outbreak signal is large.
Relative $L_1$ error begins at 0.1575 (±2.3 counts around the 14.61 average) with 90.03% error within MSE. This demonstrates strong ability to track large outbreaks with a single daily test. Error improves modestly with increased testing, again becoming slightly overconfident.

4.4. Large University Environment: Mean-Field Approach

The large university comprises 50,000 agents moving among 1000 classrooms and 5000 dormitories over 100 days. Twenty percent live on campus; the other 80% have a daily exposure probability of $10^{-6}$. The spontaneous exposure rate $\theta$ is $10^{-6}$, and the effective infectious susceptibility scaling $\eta$ is 0.9. This choice of $\eta = 0.9$ was found to best fit the simulation data and does not seem overly sensitive to deviations, with $\eta \in [0.8, 1.0]$ yielding similar results.
The small outbreak regime had an average size of 8.57 infections with a peak of 17.13. The large outbreak regime had an average size of 7393 infections with a peak of 29,329. Figure 3a–c show the mean and ±2 standard deviation S, E + I, and R curves over 100 random initializations for the small outbreak S, small outbreak E + I and R, and large outbreak regimes, respectively.
Small Outbreak Performance: Table 3 summarizes results. MF test error performs much better than the expected ground truth error of 0.0975. At 1% testing, MF achieves test error 0.0007, improving to 0.0003 at 10%. Total error ranges from 0.0007 at 1% to 0.0004 at 10%. While seemingly exceptional, these results reflect that >99% of the population remain susceptible, with errors occurring predominantly around the average 8.57 E + I agents out of 50,000.
Precision at a 1% testing rate is 0.6859, rising to 0.7258 at 10% testing. This similarly hovers around the 70% value of the small college, small outbreak scenario, but now with a larger improvement as the test rate grows. Recall is woefully below the 1% test rate at 0.0015; however, it grows above the 10% test rate at 0.1282. This indicates the mean field approximation needs more tests to start capturing the exposed and infected agents for small outbreaks.
Relative $L_1$ error starts at 0.7391 at 1% testing (approximately ±6 E + I counts around the 8.57 average) with 99.29% error within MSE. At 10% testing, error drops to 0.4922 (±4 counts) with 99.74% MSE coverage. All testing rates show the MF’s ability to capture the endemic nature with very few estimates dying off or blowing up.
Large Outbreak Performance: Table 4 summarizes results. MF test error begins at half the ground truth at 1% testing (0.0447 vs. 0.0972) and drops to a third at 10% (0.0313 vs. 0.0974). Unlike the small outbreak, where nearly everyone remains susceptible, the large outbreak can have the population in any mixture of SEIR states. The MF demonstrates clear ability to outperform naïve test results.
Total error begins at 0.1657 at 1% testing, indicating accurate prediction of 83.43% of states with noisy tests on only 1% in any SEIR configuration. Performance increases to 87.89% accuracy at 10% testing. Precision starts similar to the small college, large outbreak scenario at 0.5715 for 1% testing. It then increases at a slightly faster rate to 0.7097 at 10% testing. Recall is much improved for large outbreaks, starting at 0.3505 for 1% testing and growing to 0.6298 at 10%, showing a strong ability to find exposed and infected agents far above the testing rate.
Relative $L_1$ error begins at 0.0931 (roughly ±688 E + I counts around the 7393 average) and drops to 0.0221 at 10% testing (±163 counts). This is exceptional performance with nearly all errors (0.9998–0.9999) captured within MSE, indicating nearly perfect uncertainty quantification for all test rates.

4.5. Key Findings

Results demonstrate several important insights:
Scaling Transition: Experiments illustrate a practical transition from exact particle filtering (small college, $N = 100$) to scalable mean-field inference (large university, $N = 50{,}000$). The mean-field approximation scales linearly with population size and remains effective at realistic scales.
Statistical Denoising: Both methods provide clear evidence of denoising relative to raw test outcomes. With 5% bit error, naïve tests show ∼9.75% error. In small college experiments, BKF reduces tested-agent error by approximately half in both outbreak regimes. In large university experiments, mean-field similarly improves substantially over naïve tests in large outbreaks; in small outbreaks, extremely low error reflects that nearly all agents remain susceptible.
Outbreak Magnitude Effects: Outbreak magnitude strongly affects difficulty. In both environments, large outbreaks are easier to track in aggregate epidemic intensity. For the small college, large outbreaks yield low relative errors (0.1459–0.1575), whereas small outbreaks are more difficult (0.5104–0.6942). The pattern amplifies in the large university: small outbreaks have large relative errors (0.4922–0.7391), while large outbreaks are tracked closely (0.0221–0.0931). Similar findings occur for recall. Small college, large outbreaks have substantially larger recall (0.2195–0.5599) compared with small outbreaks (0.0261–0.2680). The large university is similar, where large outbreak recall (0.3505–0.6298) drastically outperforms small outbreaks (0.0015–0.1282). When few agents are infected, the epidemic signal is weak; when many are infected, the system provides a stronger aggregate signal.
Testing Rate Diminishing Returns: Increasing testing from 1% to 5% produces noticeable improvement. Additional improvement from 5% to 10% is smaller, suggesting that modest daily testing provides the most benefit for population-level monitoring.
Uncertainty Calibration: In the small college, small outbreak scenario, coverage is near-perfect at low test rates (0.9994 at 1%) but decreases as tests become more informative (0.9394 at 10%), indicating mild overconfidence. In the small college, large outbreak scenario, coverage is lower overall (0.8667–0.9003). By contrast, in the large university, large outbreak scenario, MF coverage is essentially perfect (0.9998–0.9999). All experiments maintained >86% coverage, showing strong uncertainty calibration.
Particle Filter vs. Mean-Field Tradeoff: Particle filtering provides principled approximation of the full joint posterior, with strong performance for small populations under sparse testing, but computational cost limits use at large N. Mean-field sacrifices joint correlation structure, but offers linear-time scalability and, in these regimes, delivers accurate reconstruction of both individual states and population-level epidemic intensity.
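The ∼9.75% naïve test error quoted under Statistical Denoising can be reproduced with a short calculation. The sketch below assumes that each of the two state bits is misread independently with probability 0.05 (the value of α, β, γ, and δ in Table A1), so a test reports the wrong 2-bit state whenever at least one bit is flipped. The function name is illustrative.

```python
def naive_state_error(bit_error: float, n_bits: int = 2) -> float:
    """Probability that at least one of n_bits independent bits is misread."""
    return 1.0 - (1.0 - bit_error) ** n_bits


# 5% per-bit error on a 2-bit state: 1 - 0.95**2
print(f"{naive_state_error(0.05):.4f}")
```

This matches the "Ground Truth Test Error" rows of Tables 1–4, which all lie close to 0.0975.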

5. Conclusions

We have presented a comprehensive framework for epidemic state estimation that combines Boolean state-space representations of agent-based SEIR models with optimal filtering methods tailored to different population scales. Our key contributions include the following: (1) a formal Boolean encoding of epidemiological states that enables discrete-state optimal filtering; (2) the application of Boolean Kalman particle filtering to small populations, demonstrating accurate inference from sparse, noisy observations; (3) the development of a mean-field approximation for large populations that achieves linear computational complexity while maintaining accuracy; and (4) a comprehensive empirical evaluation across multiple outbreak scenarios, testing rates, and population scales.
Our experimental results demonstrate that mean-field approximations achieve exceptional performance when populations are large and epidemic signals are strong, that even 1% daily testing rates enable accurate population-level monitoring, and that large outbreaks are easier to track because of their high signal-to-noise ratio. The Boolean state-space framework, combined with scale-appropriate inference methods, provides a promising foundation for real-time epidemic monitoring systems that balance modeling fidelity, computational efficiency, and estimation accuracy.

Future Directions

Several promising extensions warrant future investigation:
Hybrid Particle Filter / Mean-Field Methods: Our results suggest that particle filtering excels at capturing correlations in small groups while mean-field scales to large populations. A hybrid approach that applies particle filtering to highly connected subpopulations (e.g., households, classrooms) while using mean-field for inter-group interactions could combine the strengths of both methods.
Adaptive Testing Strategies: While we used fixed-rate random testing, a more sophisticated strategy based on the selection of agents with maximal MSE could further reduce testing requirements. This would alter the observation model to account for probabilistic selection of the test set and may break the conditional independence assumption in the likelihood.
Parameter Learning: Our current framework assumes known parameters (Test Rate, λ_E, λ_I, λ_R, c_j, d_j, α, β, γ, δ, η, θ). Joint state and parameter estimation using particle MCMC or variational methods could enable real-time parameter learning from surveillance data [3].
Extended States: Extending the Boolean representation to handle symptomatic status, vaccination status, or multiple circulating strains (requiring additional Boolean bits per agent) would enable modeling of competitive dynamics and strain-specific interventions.
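The adaptive testing idea above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: each agent is assumed to carry a posterior probability that each of its two state bits equals 1 (as a mean-field posterior would provide), and the per-agent uncertainty score is taken to be the expected number of bit errors made when each bit is estimated by thresholding its probability at 1/2, that is, the sum of min(p, 1 − p) over bits. Whether this score coincides with the MSE used in the paper is an assumption, and all names and example values are hypothetical.

```python
from typing import List, Sequence


def bit_uncertainty(bit_probs: Sequence[float]) -> float:
    """Expected number of bit errors when each bit is estimated by
    thresholding its posterior probability at 1/2."""
    return sum(min(p, 1.0 - p) for p in bit_probs)


def select_agents_to_test(marginals: List[Sequence[float]],
                          n_tests: int) -> List[int]:
    """Return the indices of the n_tests agents with the largest uncertainty."""
    scores = [bit_uncertainty(m) for m in marginals]
    ranked = sorted(range(len(marginals)), key=lambda j: scores[j], reverse=True)
    return ranked[:n_tests]


# Hypothetical per-agent posteriors P(bit = 1) for the two state bits.
marginals = [
    (0.01, 0.02),  # almost certain
    (0.50, 0.40),  # highly uncertain
    (0.90, 0.30),  # moderately uncertain
    (0.05, 0.95),  # fairly certain
]

chosen = select_agents_to_test(marginals, n_tests=2)
print(chosen)
```

Because the test set here depends on the current posterior rather than being drawn uniformly at random, the observation model would need the adjustment described in the text before such a rule could be used inside the filter.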

Author Contributions

Conceptualization, M.L.N.-M. and U.M.B.-N.; methodology, D.A.C., M.L.N.-M. and U.M.B.-N.; software, D.A.C.; validation, D.A.C. and M.L.N.-M.; formal analysis, D.A.C.; writing—original draft preparation, D.A.C.; writing—review and editing, M.L.N.-M. and U.M.B.-N.; supervision, U.M.B.-N.; project administration, U.M.B.-N.; funding acquisition, U.M.B.-N. and M.L.N.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NSF-AoF: A Bayesian Paradigm for Physics-Informed Machine Learning, Award Number 2225507.

Data Availability Statement

The source code used to generate our results can be found at https://github.com/UBragaNeto/State-Space_Agent-Based_Model_Infectious_Disease, accessed on 6 May 2026.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SEIR: Susceptible-Exposed-Infected-Recovered
ABM: agent-based model
BS-ABM: Boolean state-space agent-based model
BKF: Boolean Kalman particle filter
APF-BKF: Auxiliary Particle Filter implementation of the Boolean Kalman Filter
MF: mean-field
MSE: mean square error
MMSE: minimum mean square error

Appendix A

Table A1. Model Parameters.
| Parameter | Small College | Large University | Description |
|---|---|---|---|
| λ_E (small outbreak) | 0.01 | 0.01 | Probability of S → E |
| λ_E (large outbreak) | 0.05 | 0.05 | Probability of S → E |
| λ_I | 1/5 | 1/5 | Probability of E → I |
| λ_R | 1/10 | 1/10 | Probability of I → R |
| off_campus | 0.5 | 0.8 | Portion of agents living off campus |
| off_exp | 10^−3 | 10^−6 | Off campus probability of S → E |
| c_j (small outbreak) | Unif([0.1, 0.2]) | Unif([0.1, 0.2]) | Classroom risk factors |
| c_j (large outbreak) | Unif([0.2, 0.3]) | Unif([0.2, 0.3]) | Classroom risk factors |
| d_j (small outbreak) | Unif([0.1, 0.2]) | Unif([0.1, 0.2]) | Dormitory risk factors |
| d_j (large outbreak) | Unif([0.2, 0.3]) | Unif([0.2, 0.3]) | Dormitory risk factors |
| θ | 10^−4 | 10^−6 | Spontaneous exposure rate |
| α | 0.05 | 0.05 | First bit false positive rate |
| β | 0.05 | 0.05 | First bit false negative rate |
| γ | 0.05 | 0.05 | Second bit false positive rate |
| δ | 0.05 | 0.05 | Second bit false negative rate |
| η | N/A | 0.9 | Infectious susceptibility scaling factor |
| N_p | 4 × 10^6 | N/A | Number of particles |
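The transition probabilities in Table A1 can be read as expected stage durations. The sketch below assumes that each transition is an independent trial with a constant probability at every time step, so the waiting time is geometric with mean 1/λ; if one time step is one day (an assumption suggested by the daily testing rates), this gives the mean exposed and infectious periods. The function name is illustrative.

```python
from fractions import Fraction


def mean_sojourn_steps(p: Fraction) -> Fraction:
    """Mean of a geometric waiting time with per-step success probability p."""
    return 1 / p


lambda_I = Fraction(1, 5)   # probability of E -> I per step (Table A1)
lambda_R = Fraction(1, 10)  # probability of I -> R per step (Table A1)

print(mean_sojourn_steps(lambda_I), mean_sojourn_steps(lambda_R))
```

Under these assumptions, an agent spends on average 5 steps in the exposed state and 10 steps in the infected state.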

References

  1. Ferguson, N.M.; Laydon, D.; Nedjati-Gilani, G.; Imai, N.; Ainslie, K.; Baguelin, M.; Bhatia, S.; Boonyasiri, A.; Cucunubá, Z.; Cuomo-Dannenburg, G.; et al. Report 9: Impact of Non-Pharmaceutical Interventions (NPIs) to Reduce COVID-19 Mortality and Healthcare Demand; Imperial College London: London, UK, 2020.
  2. Keeling, M.J.; Rohani, P. Modeling Infectious Diseases in Humans and Animals, 1st ed.; Princeton University Press: Princeton, NJ, USA, 2011; pp. 15–104.
  3. Imani, M.; Braga-Neto, U.M. Particle filters for partially-observed Boolean dynamical systems. Automatica 2018, 87, 238–250.
  4. Doucet, A.; De Freitas, N.; Gordon, N. Sequential Monte Carlo Methods in Practice, 1st ed.; Springer: New York, NY, USA, 2001; pp. 17–41.
  5. Snyder, C.; Bengtsson, T.; Bickel, P.; Anderson, J. Obstacles to high-dimensional particle filtering. Mon. Weather Rev. 2008, 136, 4629–4640.
  6. Wainwright, M.J.; Jordan, M.I. Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 2008, 1, 1–305.
  7. Kermack, W.O.; McKendrick, A.G. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. A 1927, 115, 700–721.
  8. Tracy, M.; Cerdá, M.; Keyes, K.M. Agent-Based Modeling in Public Health: Current Applications and Future Directions. Annu. Rev. Public Health 2018, 39, 77–94.
  9. Zhang, X.; Wang, J.; Yu, C.; Fei, J.; Luo, T.; Cao, Z. Agent-Based Modeling of Epidemics: Approaches, Applications, and Future Directions. Technologies 2025, 13, 272.
  10. Kraul, M.; Zimmerman, R.K.; Williams, K.V.; Raviotta, J.M.; Harrison, L.H.; Williams, J.V.; Roberts, M.S. Agent-based model of the impact of higher influenza vaccine efficacy on seasonal influenza burden. Vaccine X 2023, 13, 100249.
  11. Kerr, C.C.; Stuart, R.M.; Mistry, D.; Abeysuriya, R.G.; Rosenfeld, K.; Hart, G.R.; Núñez, R.C.; Cohen, J.A.; Selvaraj, P.; Hagedorn, B.; et al. Covasim: An agent-based model of COVID-19 dynamics and interventions. PLoS Comput. Biol. 2021, 17, e1009149.
  12. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45.
  13. Jazwinski, A.H. Stochastic Processes and Filtering Theory, 1st ed.; Academic Press: Cambridge, MA, USA, 1970; pp. 332–366.
  14. Julier, S.J.; Uhlmann, J.K. New extension of the Kalman filter to nonlinear systems. Signal Process. Sens. Fusion Target Recognit. 1997, 3068, 182–193.
  15. Arulampalam, M.S.; Maskell, S.; Gordon, N.; Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 2002, 50, 174–188.
  16. Wheeler, J.; Rosengart, A.; Jiang, Z.; Tan, K.; Treutle, N.; Ionides, E.L. Informing policy via dynamic models: Cholera in Haiti. PLoS Comput. Biol. 2024, 20, e1012032.
  17. Temfack, D.; Wyse, J. Sequential Monte Carlo Squared for online inference in stochastic epidemic models. Epidemics 2025, 52, 100847.
  18. Kauffman, S.A. Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. 1969, 22, 437–467.
  19. Shmulevich, I.; Dougherty, E.R.; Kim, S.; Zhang, W. Probabilistic Boolean networks: A rule-based uncertainty model for gene regulatory networks. Bioinformatics 2002, 18, 261–274.
  20. Imani, M.; Braga-Neto, U.M. Boolean Kalman filter and smoother under model uncertainty. Automatica 2020, 111, 108609.
  21. Kiss, I.Z.; Miller, J.C.; Simon, P.L. Mathematics of Epidemics on Networks: From Exact to Approximate Models, 1st ed.; Springer: Cham, Switzerland, 2017; pp. 165–205.
  22. Decreusefond, L.; Dhersin, J.-S.; Moyal, P.; Tran, V.C. Large graph limit for an SIR process in random network with heterogeneous connectivity. Ann. Appl. Probab. 2012, 22, 541–575.
Figure 1. University Contact Structure: The agent vector at time k − 1 follows the schedule of Agent j (Green). Agent j attends three classes each with contacts of infectious agents (Red) and noninfectious agents (Gray). Agent j lives in a dormitory with infectious roommate contacts. Agent j is selected for random testing at time k. Agent j’s state is updated according to state transition dynamics and loops back to the agent vector at time k.
Figure 2. Small College Susceptible (S), Exposed+Infected (E + I), and Recovered (R) Trajectories averaged over 100 random initializations for (a) Small Outbreak Scenario, (b) Large Outbreak Scenario. Mean trajectories are shown as solid lines, and ± 2 standard deviation bands are shown as shaded regions.
Figure 3. Large University S, E + I, and R Trajectories averaged over 100 random initializations for (a) Small Outbreak S, (b) Small Outbreak E + I and R, (c) Large Outbreak. Mean trajectories are shown as solid lines, and ±2 standard deviation bands are shown as shaded regions.
Table 1. Small College, Small Outbreak Boolean Kalman Particle Filter (BKF) Results.

| Test Rate (%) | 1 | 5 | 10 |
|---|---|---|---|
| Ground Truth Test Error | 0.0998 | 0.0936 | 0.0968 |
| BKF Test Error | 0.0588 | 0.0553 | 0.0460 |
| BKF Total Error | 0.1411 | 0.1014 | 0.0747 |
| BKF Precision | 0.6703 | 0.6688 | 0.6995 |
| BKF Recall | 0.0261 | 0.1343 | 0.2680 |
| BKF E + I Relative L1 Error | 0.6942 | 0.5630 | 0.5104 |
| Ground Truth Within MSE | 0.9994 | 0.9803 | 0.9394 |
Table 2. Small College, Large Outbreak BKF Results.

| Test Rate (%) | 1 | 5 | 10 |
|---|---|---|---|
| Ground Truth Test Error | 0.0968 | 0.0939 | 0.0950 |
| BKF Test Error | 0.0478 | 0.0457 | 0.0483 |
| BKF Total Error | 0.1750 | 0.1612 | 0.1445 |
| BKF Precision | 0.5654 | 0.6056 | 0.6591 |
| BKF Recall | 0.2195 | 0.4521 | 0.5599 |
| BKF E + I Relative L1 Error | 0.1575 | 0.1490 | 0.1459 |
| Ground Truth Within MSE | 0.9003 | 0.8880 | 0.8667 |
Table 3. Large University, Small Outbreak Mean-Field (MF) Results.

| Test Rate (%) | 1 | 5 | 10 |
|---|---|---|---|
| Ground Truth Test Error | 0.0974 | 0.0975 | 0.0974 |
| MF Test Error | 0.0007 | 0.0004 | 0.0003 |
| MF Total Error | 0.0007 | 0.0005 | 0.0004 |
| MF Precision | 0.6859 | 0.7025 | 0.7258 |
| MF Recall | 0.0015 | 0.0352 | 0.1282 |
| MF E + I Relative L1 Error | 0.7391 | 0.6258 | 0.4922 |
| Ground Truth Within MSE | 0.9929 | 0.9986 | 0.9974 |
Table 4. Large University, Large Outbreak MF Results.

| Test Rate (%) | 1 | 5 | 10 |
|---|---|---|---|
| Ground Truth Test Error | 0.0972 | 0.0974 | 0.0974 |
| MF Test Error | 0.0447 | 0.0368 | 0.0313 |
| MF Total Error | 0.1657 | 0.1425 | 0.1211 |
| MF Precision | 0.5715 | 0.6534 | 0.7097 |
| MF Recall | 0.3505 | 0.5090 | 0.6298 |
| MF E + I Relative L1 Error | 0.0931 | 0.0431 | 0.0221 |
| Ground Truth Within MSE | 0.9999 | 0.9999 | 0.9998 |
Share and Cite

MDPI and ACS Style

Cator, D.A.; Ndeffo-Mbah, M.L.; Braga-Neto, U.M. A State-Space Agent-Based Model for Infectious Disease Spread. Computation 2026, 14, 117. https://doi.org/10.3390/computation14060117
