1. Introduction
The COVID-19 pandemic has underscored the critical need for accurate, real-time epidemiological modeling frameworks capable of tracking disease spread through populations despite limited and imperfect surveillance data [1]. Traditional compartmental models (e.g., Susceptible-Infected-Recovered (SIR), Susceptible-Exposed-Infected-Recovered (SEIR)) provide valuable insights into aggregate disease dynamics but lack the granularity to capture individual-level heterogeneity in contact patterns, testing behavior, and infection progression [2]. Conversely, agent-based models (ABMs) offer rich microscopic detail but face significant challenges in state estimation: how can we infer the true infection states of thousands of individuals when we observe only noisy test results from a small subset?
This paper addresses this fundamental inference problem by developing a state-space framework that combines the representational power of agent-based models with the statistical rigor of optimal filtering theory. Our key innovation lies in recognizing that epidemiological states are inherently discrete and can be efficiently represented using Boolean variables, enabling the application of Boolean Kalman filtering techniques originally developed for gene regulatory networks [3]. However, while particle filtering is asymptotically exact as the number of particles grows [4], the curse of dimensionality makes straightforward particle filtering impractical in high-dimensional epidemic ABMs [5], motivating scalable mean-field approximations for realistic population sizes [6].
1.1. Motivation and Challenges
Real-world epidemic surveillance presents several distinct challenges:
Partial observability: Only a fraction of the population is tested at any given time, leaving most infection states latent.
Noisy measurements: Diagnostic tests exhibit false positive and false negative rates, corrupting observations with measurement error.
Discrete states: Epidemiological compartments (S, E, I, R) are categorical, not continuous quantities.
Nonlinear dynamics: Disease transmission depends on complex, nonlinear interactions between agents.
High dimensionality: Tracking individual-level states in large populations creates computational challenges that grow exponentially with population size.
Standard Kalman filtering assumes continuous Gaussian states and linear dynamics, making it unsuitable for this domain. Extended and unscented Kalman filters can handle nonlinearity but still assume continuous state spaces. Particle filters offer a general solution for nonlinear, non-Gaussian problems, but become computationally prohibitive in high dimensions without careful design or approximations.
1.2. Our Approach
We propose a Boolean state-space agent-based model (BS-ABM) where each agent’s SEIR state is encoded using two Boolean variables, yielding four possible states: S, E, I, and R. Agent interactions follow predefined schedules (e.g., workplace, household, social contacts), and state transitions are governed by probabilistic rules that depend on contact with infected individuals. For state estimation, we develop two complementary approaches tailored to different population scales:
Small Populations (Particle Filter): For populations up to approximately 100 agents, we adapt the Auxiliary Particle Filter implementation of the Boolean Kalman Filter (APF-BKF) framework of Imani and Braga-Neto [3] to the epidemiological setting. This approach maintains a population of particles, each representing a hypothesis about all agents’ Boolean states, and propagates these through the discrete-state transition dynamics while updating particle weights based on noisy observations. While the state space of $4^N$ configurations for N agents is enormous, particle filtering provides an effective approximation that captures dependencies between agent states arising from their contact structure.
Large Populations (Mean-Field Approximation): For larger populations of N agents, the state space of $4^N$ configurations renders particle filtering utterly intractable; no practical number of particles can meaningfully sample this space. We therefore employ a mean-field approximation that factorizes the posterior distribution into independent marginal distributions for each agent. Instead of tracking the full joint distribution over all agents’ states, we maintain N separate marginal distributions, one for each agent i, where each marginal is defined over only the four SEIR states. This reduces the representation complexity from exponential to linear in N, enabling scalable inference while sacrificing the ability to capture correlations between agent states.
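To make the difference in representation size concrete, the following minimal sketch contrasts the joint state space with the per-agent marginals. The bit assignment in `SEIR_BITS`, the helper names, and the initial infection probability are illustrative assumptions, not the encoding or code used in this work.

```python
import math

import numpy as np

# Hypothetical 2-bit encoding of the SEIR states (assumption: the actual
# bit assignment is not specified in this section).
SEIR_BITS = {"S": (0, 0), "E": (0, 1), "I": (1, 0), "R": (1, 1)}
STATES = list(SEIR_BITS)  # ["S", "E", "I", "R"]


def joint_size_digits(n_agents: int) -> int:
    """Number of decimal digits of 4**n_agents, the joint state-space size."""
    return math.floor(n_agents * math.log10(4)) + 1


def init_marginals(n_agents: int, p_infected: float = 0.0) -> np.ndarray:
    """Mean-field representation: one length-4 probability vector per agent."""
    marginals = np.zeros((n_agents, 4))
    marginals[:, STATES.index("S")] = 1.0 - p_infected
    marginals[:, STATES.index("I")] = p_infected
    return marginals


marginals = init_marginals(50_000, p_infected=1e-4)
print(marginals.shape)          # (50000, 4): storage is linear in N
print(joint_size_digits(100))   # 61: 4**100 already has 61 decimal digits
```

The marginal array grows with 4N entries, whereas enumerating the joint distribution is already out of reach for 100 agents.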
1.3. Contributions
Our main contributions are: (1) a formal Boolean state-space framework for agent-based epidemiological modeling that naturally represents discrete disease states; (2) an adaptation of Boolean Kalman particle filtering to epidemic state estimation in small populations under realistic observation models including random sampling and measurement noise; (3) a scalable mean-field approximation for large populations that performs inference by tracking independent per-agent marginal distributions while preserving accurate population-level estimates; (4) computational implementations that scale from 100 to 50,000 agents, illustrating the practical transition from particle filtering to mean-field inference as population size grows; and (5) empirical demonstration that both methods can accurately reconstruct latent states from sparse, noisy observations.
4. Results
We evaluate our BS-ABM frameworks on two realistic campus scenarios demonstrating different inference methodologies necessitated by computational constraints:
Small College (100 agents): We apply the Boolean Kalman particle filter (BKF) with particles.
Large University (50,000 agents): We employ mean-field approximation with linear computational complexity.
Both scenarios assess the ability to reconstruct population-level infection dynamics and individual agent states from sparse, noisy testing data.
Agent schedules are generated as follows. During weekdays, both campuses operate on five class periods per day. Each agent attends three of the five periods in randomly selected classrooms, yielding 15 class meetings per week. No classes occur on weekends. This weekly schedule repeats over 100 days. Additionally, a specified fraction of agents live in dormitories, interacting with roommates daily (including weekends).
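The weekday schedule described above can be sketched as follows. This is an illustration only: the function name, the array layout, and the choice to redraw periods and classrooms independently for each weekday are assumptions, and dormitory contacts are omitted.

```python
import numpy as np


def make_weekly_schedule(n_agents, n_classrooms, rng,
                         periods_per_day=5, classes_per_day=3, weekdays=5):
    """Return an int array of shape (weekdays, periods_per_day, n_agents).

    Entry [d, p, a] is the classroom index that agent `a` occupies in
    period `p` of weekday `d`, or -1 if the agent has no class then.
    """
    schedule = -np.ones((weekdays, periods_per_day, n_agents), dtype=int)
    for d in range(weekdays):
        for a in range(n_agents):
            # Three distinct periods out of five, each in a random classroom.
            periods = rng.choice(periods_per_day, size=classes_per_day, replace=False)
            schedule[d, periods, a] = rng.integers(n_classrooms, size=classes_per_day)
    return schedule


schedule = make_weekly_schedule(100, 3, np.random.default_rng(0))
print((schedule >= 0).sum(axis=(0, 1))[:5])   # [15 15 15 15 15] class meetings per week
```

Repeating this weekly array, with empty weekends, over 100 days gives the classroom part of the contact structure.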
For each campus, we run simulations for two outbreak regimes. In the small outbreak regime, classroom and dormitory environmental risk factors are sampled uniformly in [0.1, 0.2], and . In the large outbreak regime, risk factors are sampled from [0.2, 0.3], and . For all experiments, and .
For each combination of campus size, outbreak regime, and testing strategy, we sampled an ensemble of 100 random initializations, each comprising schedules, environmental risk factors, and five initially infected agents. All parameters are summarized in Table A1.
All experiments were originally run as Google Colab notebooks in an L4 GPU environment with 24 GB of RAM and took approximately 15 h of wall time, with a single BKF simulation day taking approximately 810 ms and a single MF simulation day taking approximately 90 ms. The source code used to generate our results can be found at https://github.com/UBragaNeto/State-Space_Agent-Based_Model_Infectious_Disease, accessed on 6 May 2026.
4.1. Observation Model Parameters
At each time step, we independently test randomly selected agents. Each test measures both Boolean state bits with symmetric error rates: a false positive rate and a false negative rate of 5% each for the first bit, and a false positive rate and a false negative rate of 5% each for the second bit. We evaluate testing rates of 1%, 5%, and 10% of the population per day.
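A minimal simulation of this observation model might look as follows. The function name, the fixed number of tests per day (rather than an independent draw per agent), and the (N, 2) bit layout are assumptions made for illustration.

```python
import numpy as np


def simulate_tests(true_bits, test_rate, rng, flip_prob=0.05):
    """Test a random subset of agents and return (tested_idx, observed_bits).

    true_bits: (N, 2) array of 0/1 Boolean state bits per agent.
    Each observed bit is flipped independently with probability flip_prob,
    so false positive and false negative rates coincide (symmetric error).
    """
    n_agents = true_bits.shape[0]
    n_tests = max(1, round(test_rate * n_agents))
    tested_idx = rng.choice(n_agents, size=n_tests, replace=False)
    flips = rng.random((n_tests, 2)) < flip_prob
    observed_bits = np.where(flips, 1 - true_bits[tested_idx], true_bits[tested_idx])
    return tested_idx, observed_bits


rng = np.random.default_rng(0)
true_bits = np.zeros((50_000, 2), dtype=int)          # everyone in the all-zero state
tested_idx, observed = simulate_tests(true_bits, test_rate=0.10, rng=rng)
naive_error = np.mean((observed != true_bits[tested_idx]).any(axis=1))
print(len(tested_idx), naive_error)                   # 5000 tests; error close to 0.0975
```

A test counts as wrong when either bit is misread, which is why the naive error sits near 1 - 0.95^2 rather than 5%.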
4.2. Quantities of Interest
We analyze several performance metrics:
Ground Truth Test Error: The portion of tested agents whose tests do not match their true state. With symmetric 5% error on both bits, expected error is $1 - 0.95^2 = 0.0975$.
BKF/MF Test Error: The portion of tested agents whose BKF/MF states do not match their true states. Our model incorporates test error information via the likelihood, so we expect agents in the test set to be closer to ground truth in the BKF/MF than their naïve test results would indicate.
BKF/MF Total Error: The portion of all agents whose BKF/MF states do not match their true states. This evaluates the quality of estimates for untested agents.
BKF/MF Precision: Measures the accuracy of positive predictions (exposed or infected) as the ratio of true positives over true positives + false positives.
BKF/MF Recall: Measures the ability to find all positives (exposed or infected) as the ratio of true positives over true positives + false negatives.
Relative Error of E + I Counts: Let denote the vector of Exposed agents predicted by the BKF/MF over 100 days, and let denote ground truth. Similarly, define and for Infectious agents. The relative error on E + I counts is .
Ground Truth Within MSE: To determine uncertainty quantification calibration, we measure the portion of ground truth E + I counts that lie in the interval .
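The error, precision, recall, and relative-error metrics above can be computed along the following lines. This is a sketch under stated assumptions: states are stored as hypothetical integer labels (0 = S, 1 = E, 2 = I, 3 = R), and the relative error uses the Euclidean norm, which is an assumption because the norm is not recoverable from the text.

```python
import numpy as np

POSITIVE = (1, 2)  # hypothetical integer labels: 0 = S, 1 = E, 2 = I, 3 = R


def state_error(est, truth):
    """Fraction of agents whose estimated state differs from the true state."""
    return float(np.mean(np.asarray(est) != np.asarray(truth)))


def precision_recall(est, truth):
    """Precision and recall, treating E or I as the positive class."""
    est_pos = np.isin(est, POSITIVE)
    true_pos = np.isin(truth, POSITIVE)
    tp = np.sum(est_pos & true_pos)
    fp = np.sum(est_pos & ~true_pos)
    fn = np.sum(~est_pos & true_pos)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return float(precision), float(recall)


def relative_error(est_counts, true_counts):
    """Relative error of daily E + I counts (Euclidean norm is an assumption)."""
    est_counts = np.asarray(est_counts, dtype=float)
    true_counts = np.asarray(true_counts, dtype=float)
    return float(np.linalg.norm(est_counts - true_counts) / np.linalg.norm(true_counts))


# Expected naive test error: a test is correct only if both bits are read correctly.
expected_test_error = 1 - (1 - 0.05) ** 2

truth = np.array([0, 1, 2, 3, 0, 2])
est = np.array([0, 1, 0, 3, 1, 2])
print(state_error(est, truth), precision_recall(est, truth))
```

Applying `state_error` to the tested subset gives the test error, and applying it to all agents gives the total error.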
4.3. Small College Environment: Particle Filter Approach
The small college comprises 100 agents moving among three classroom buildings and three dormitories over 100 days. Half the population lives on campus; the other half lives off campus with a daily exposure probability of . The spontaneous exposure rate is .
The small outbreak regime had an average size of 3.12 infections with a peak of 7.68. The large outbreak regime had an average size of 14.61 infections with a peak of 54.23.
Figure 2a,b show the mean and standard deviation of the Susceptible (S), Exposed + Infected (E + I), and Recovered (R) curves over 100 random initializations for the small outbreak and large outbreak regimes, respectively.
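For orientation, the measurement step of a particle filter over agent states can be sketched as below. This is a generic bootstrap-style reweighting and multinomial resampling step, not the APF-BKF used in this work; the bit encoding in `BITS`, the function names, and the omission of the transition step through the contact schedule are all simplifying assumptions.

```python
import numpy as np

# Hypothetical encoding: row k holds the two bits of state k (S, E, I, R).
BITS = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])


def obs_likelihood(states, observed_bits, flip_prob=0.05):
    """P(observed bits | state), one independent factor per bit."""
    match = BITS[states] == observed_bits            # (..., 2) Boolean
    return np.where(match, 1 - flip_prob, flip_prob).prod(axis=-1)


def particle_update(particles, weights, tested_idx, observed_bits, rng):
    """Reweight particles by the test likelihood, then resample.

    particles: (P, N) int array, each row a hypothesis for all N agents.
    tested_idx: indices of tested agents; observed_bits: (n_tests, 2).
    """
    lik = obs_likelihood(particles[:, tested_idx], observed_bits)   # (P, n_tests)
    weights = weights * lik.prod(axis=1)
    weights = weights / weights.sum()
    idx = rng.choice(len(weights), size=len(weights), p=weights)    # multinomial resampling
    return particles[idx], np.full(len(weights), 1.0 / len(weights))


rng = np.random.default_rng(0)
particles = rng.integers(4, size=(2000, 5))          # 2000 hypotheses over 5 agents
weights = np.full(2000, 1 / 2000)
particles, weights = particle_update(
    particles, weights, np.array([0]), np.array([[1, 0]]), rng)
print(np.mean(particles[:, 0] == 2))                 # share of particles with agent 0 in I
```

Because each particle is a hypothesis about every agent at once, resampling on one agent's test also reshapes beliefs about that agent's contacts once the transition step is included.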
Small Outbreak Performance:
Table 1 summarizes results averaged over 100 ensembles. At a 1% test rate, BKF test error is roughly 59% of the ground truth test error (0.0588 vs. 0.0998). This improves to less than half at 10% testing (0.0460 vs. 0.0968), indicating a growing ability to outperform naïve test results as observations increase.
The total BKF error begins at 0.1411, showing an accurate prediction of 85.89% of agent states with noisy data on only 1% of the population. This improves to 92.53% accuracy at 10% testing. Precision begins at 0.6703 and slightly increases to 0.6995, indicating nearly 70% accuracy in reported exposed and infected states for all test rates. Recall starts quite low at 0.0261, but is over double the testing rate of 1%. It remains over double the testing rate as it climbs to 10%, indicating an ability to find exposed and infected agents at a higher rate than naïve testing.
At 1% testing, relative error is 0.6942 (approximately ±2.2 E + I counts around the 3.12 average) with 99.94% of error captured within MSE. At 10% testing, error drops to 0.5104 (±1.6 counts) with 93.94% MSE coverage, indicating the BKF becomes slightly overconfident as observations become more informative.
Large Outbreak Performance:
Table 2 summarizes results. For all testing rates, BKF test error remains around half the ground truth error, indicating the BKF outperforms naïve tests even as SEIR states become broadly distributed.
BKF total error begins at 0.1750 (82.50% accuracy with a single daily test) and improves to 85.55% accuracy at 10% testing, a modest improvement for a tenfold increase in observations. Precision begins at 0.5654 at a 1% test rate and climbs to 0.6591 at 10% testing. This is lower than in the small outbreak scenario since agent states are more scattered, making them harder to predict reliably, but there is a noticeable increase in performance with the testing rate that the small outbreak scenario lacks. Recall starts much higher than in the small outbreak scenario at 0.2195 and similarly grows with the test rate, indicating a better ability to find exposed and infected agents when the outbreak signal is large.
Relative error begins at 0.1575 (±2.3 counts around the 14.61 average) with 90.03% of error within MSE. This demonstrates strong ability to track large outbreaks with a single daily test. Error improves modestly with increased testing, again becoming slightly overconfident.
4.4. Large University Environment: Mean-Field Approach
The large university comprises 50,000 agents moving among 1000 classrooms and 5000 dormitories over 100 days. 20% live on campus; the other 80% have daily exposure probability of . The spontaneous exposure rate is , and effective infectious susceptibility scaling is 0.9. This choice of was found to best fit the simulation data and does not seem overly sensitive to deviations, with yielding similar results.
The small outbreak regime had an average size of 8.57 infections with a peak of 17.13. The large outbreak regime had an average size of 7393 infections with a peak of 29,329.
Figure 3a–c show the mean and ±2 standard deviation of the S, E + I, and R curves over 100 random initializations: (a) S for the small outbreak regime, (b) E + I and R for the small outbreak regime, and (c) all curves for the large outbreak regime.
Small Outbreak Performance:
Table 3 summarizes results. MF test error is far lower than the expected ground truth error of 0.0975. At 1% testing, MF achieves a test error of 0.0007, improving to 0.0003 at 10%. Total error ranges from 0.0007 at 1% to 0.0004 at 10%. While seemingly exceptional, these results reflect that >99% of the population remain susceptible, with errors occurring predominantly around the average 8.57 E + I agents out of 50,000.
Precision at a 1% testing rate is 0.6859, rising to 0.7258 at 10% testing. This similarly hovers around the 70% value of the small college, small outbreak scenario, but now with a larger improvement as the test rate grows. Recall is well below the 1% test rate at 0.0015; however, it grows above the 10% test rate at 0.1282. This indicates that the mean-field approximation needs more tests to start capturing the exposed and infected agents for small outbreaks.
Relative error starts at 0.7391 at 1% testing (approximately ±6 E + I counts around the 8.57 average) with 99.29% error within MSE. At 10% testing, error drops to 0.4922 (±4 counts) with 99.74% MSE coverage. All testing rates show the MF’s ability to capture the endemic nature with very few estimates dying off or blowing up.
Large Outbreak Performance:
Table 4 summarizes results. MF test error begins at half the ground truth at 1% testing (0.0447 vs. 0.0972) and drops to a third at 10% (0.0313 vs. 0.0974). Unlike the small outbreak, where nearly everyone remains susceptible, the large outbreak can have the population in any mixture of SEIR states. The MF demonstrates clear ability to outperform naïve test results.
Total error begins at 0.1657 at 1% testing, indicating accurate prediction of 83.43% of states with noisy tests on only 1% in any SEIR configuration. Performance increases to 87.89% accuracy at 10% testing. Precision starts similar to the small college, large outbreak scenario at 0.5715 for 1% testing. It then increases at a slightly faster rate to 0.7097 at 10% testing. Recall is much improved for large outbreaks, starting at 0.3505 for 1% testing and growing to 0.6298 at 10%, showing a strong ability to find exposed and infected agents far above the testing rate.
Relative error begins at 0.0931 (roughly ±688 E + I counts around the 7393 average) and drops to 0.0221 at 10% testing (±163 counts). This is exceptional performance with nearly all errors (0.9998–0.9999) captured within MSE, indicating nearly perfect uncertainty quantification for all test rates.
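The count figures quoted in parentheses are consistent with scaling the relative error by the average outbreak size. This reading is inferred from the reported numbers rather than from a stated definition:

```latex
\begin{align*}
0.0931 \times 7393 &\approx 688, \\
0.0221 \times 7393 &\approx 163.
\end{align*}
```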
4.5. Key Findings
Results demonstrate several important insights:
Scaling Transition: Experiments illustrate a practical transition from particle filtering (small college, N = 100) to scalable mean-field inference (large university, N = 50,000). The mean-field approximation scales linearly with population size and remains effective at realistic scales.
Statistical Denoising: Both methods provide clear evidence of denoising relative to raw test outcomes. With 5% bit error, naïve tests show ∼9.75% error. In small college experiments, BKF reduces tested-agent error by approximately half in both outbreak regimes. In large university experiments, mean-field similarly improves substantially over naïve tests in large outbreaks; in small outbreaks, extremely low error reflects that nearly all agents remain susceptible.
Outbreak Magnitude Effects: Outbreak magnitude strongly affects difficulty. In both environments, large outbreaks are easier to track in aggregate epidemic intensity. For the small college, large outbreaks yield low relative errors (0.1459–0.1575), whereas small outbreaks are more difficult (0.5104–0.6942). The pattern amplifies in the large university: small outbreaks have large relative errors (0.4922–0.7391), while large outbreaks are tracked closely (0.0221–0.0931). Similar findings occur for recall. Small college, large outbreaks have substantially larger recall (0.2195–0.5599) compared with small outbreaks (0.0261–0.2680). The large university is similar, where large outbreak recall (0.3505–0.6298) drastically outperforms small outbreaks (0.0015–0.1282). When few agents are infected, the epidemic signal is weak; when many are infected, the system provides a stronger aggregate signal.
Testing Rate Diminishing Returns: Increasing testing from 1% to 5% produces noticeable improvement. Additional improvement from 5% to 10% is smaller, suggesting that modest daily testing provides the most benefit for population-level monitoring.
Uncertainty Calibration: For small college, small outbreaks, coverage is near-perfect at low test rates (0.9994 at 1%) but decreases as tests become more informative (0.9394 at 10%), indicating mild overconfidence. For small college, large outbreaks, coverage is lower overall (0.8667–0.9003). By contrast, for large university, large outbreaks, MF coverage is essentially perfect (0.9998–0.9999). All experiments maintained >86% coverage, showing strong uncertainty calibration.
Particle Filter vs. Mean-Field Tradeoff: Particle filtering provides principled approximation of the full joint posterior, with strong performance for small populations under sparse testing, but computational cost limits use at large N. Mean-field sacrifices joint correlation structure, but offers linear-time scalability and, in these regimes, delivers accurate reconstruction of both individual states and population-level epidemic intensity.
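The denoising effect for a tested agent can be illustrated with a single Bayes update of one mean-field marginal. The sketch below assumes a hypothetical bit encoding and a hypothetical prior; it is not the full mean-field filter, which also propagates marginals through the contact dynamics.

```python
import numpy as np

# Hypothetical encoding: row k holds the two bits of state k (S, E, I, R).
BITS = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])


def update_marginal(prior, observed_bits, flip_prob=0.05):
    """Posterior over (S, E, I, R) for one tested agent via Bayes' rule."""
    match = BITS == np.asarray(observed_bits)                     # (4, 2) Boolean
    likelihood = np.where(match, 1 - flip_prob, flip_prob).prod(axis=1)
    posterior = prior * likelihood
    return posterior / posterior.sum()


prior = np.array([0.97, 0.01, 0.01, 0.01])    # agent believed to be susceptible
posterior = update_marginal(prior, observed_bits=(1, 0))   # test reads as state I
print(posterior.round(4))
```

With this prior, one positive reading raises the probability of I from 0.01 to roughly 0.16 while S remains the most likely state, which is how the filter can disagree with a raw test result when prior evidence is strong.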
5. Conclusions
We have presented a comprehensive framework for epidemic state estimation that combines Boolean state-space representations of agent-based SEIR models with optimal filtering methods tailored to different population scales. Our key contributions include the following: (1) formal Boolean encoding of epidemiological states enabling discrete-state optimal filtering; (2) application of Boolean Kalman particle filtering to small populations demonstrating accurate inference from sparse, noisy observations; (3) development of mean-field approximation for large populations achieving linear computational complexity while maintaining accuracy; and (4) comprehensive empirical evaluation across multiple outbreak scenarios, testing rates, and population scales.
Our experimental results demonstrate that mean-field approximations achieve exceptional performance when populations are large and epidemic signals are strong, that even 1% daily testing rates enable accurate population-level monitoring, and that large outbreaks are easier to track due to a high signal-to-noise ratio. The Boolean state-space framework, combined with scale-appropriate inference methods, provides a promising foundation for real-time epidemic monitoring systems balancing modeling fidelity, computational efficiency, and estimation accuracy.
Future Directions
Several promising extensions warrant future investigation:
Hybrid Particle Filter / Mean-Field Methods: Our results suggest that particle filtering excels at capturing correlations in small groups while mean-field scales to large populations. A hybrid approach that applies particle filtering to highly connected subpopulations (e.g., households, classrooms) while using mean-field for inter-group interactions could combine the strengths of both methods.
Adaptive Testing Strategies: While we used fixed-rate random testing, a more sophisticated strategy based on the selection of agents with maximal MSE could further reduce testing requirements. This would alter the observation model to account for probabilistic selection of the test set and may break the conditional independence assumption in the likelihood.
Parameter Learning: Our current framework assumes known parameters (the test rate and the model parameters summarized in Table A1). Joint state and parameter estimation using particle MCMC or variational methods could enable real-time parameter learning from surveillance data [3].
Extended States: Extending the Boolean representation to handle symptomatic status, vaccination status, or multiple circulating strains (requiring additional Boolean bits per agent) would enable modeling of competitive dynamics and strain-specific interventions.
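As a rough illustration of the adaptive testing direction, agents could be ranked by a per-agent uncertainty score derived from their marginals. The score used here, the sum over bits of min(p, 1 - p), is an assumed stand-in for the per-agent MSE, and the encoding and function name are hypothetical.

```python
import numpy as np

# Hypothetical encoding: row k holds the two bits of state k (S, E, I, R).
BITS = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])


def select_most_uncertain(marginals, n_tests):
    """Pick the n_tests agents whose Boolean state bits are least certain."""
    bit_probs = marginals @ BITS                                   # (N, 2): P(bit = 1)
    per_agent_score = np.minimum(bit_probs, 1 - bit_probs).sum(axis=1)
    return np.argsort(per_agent_score)[::-1][:n_tests]


marginals = np.array([
    [1.0, 0.0, 0.0, 0.0],    # certainly susceptible
    [0.5, 0.0, 0.5, 0.0],    # S or I with equal probability
    [0.9, 0.1, 0.0, 0.0],    # probably susceptible
])
print(select_most_uncertain(marginals, 2))   # [1 2]
```

Selecting tests this way makes the tested set depend on the current estimate, which is the change to the observation model noted above.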