Analysis of COVID-19 Spread in Tokyo through an Agent-Based Model with Data Assimilation

In this paper, we introduce an agent-based model together with a particle filter approach to study the spread of COVID-19. Investigations are mainly performed on the metropolis of Tokyo, but other prefectures of Japan are also briefly surveyed. A novel method for evaluating the effective reproduction number is one of the main outcomes of our approach. Other unknown parameters are also evaluated. Uncertain quantities, such as, for example, the probability that an infected agent develops symptoms, are tested and discussed, and the stability of our computations is examined. Detailed explanations are provided for the model and for the assimilation process.


Introduction
The outbreak of COVID-19 had a huge impact on human society, and has also triggered an enormous scientific response. In this paper, we provide a rather detailed account of an agent-based model for the estimation of the effective reproduction number R t . Our approach is based on data assimilation and on a particle filter scheme.
The effective reproduction number is one quantity of major importance for any epidemiological studies, as indicated in the studies [1][2][3][4][5] and references therein. Since it works as an estimator of the number of secondary cases produced by a primary case at time t, it provides crucial information about the spread of the disease, allowing health services to evaluate previous interventions and forecast the evolution of the epidemic. However, its precise evaluation is very challenging, and several independent approaches exist or have been recently developed. As thoroughly investigated in [6,7], all these techniques present advantages and flaws, but nevertheless each of them brings some information.
One of the earliest attempts to implement epidemiological observations with epidemic models through a variational data assimilation approach was published in [8]. In relation to COVID-19, by March 2020, some investigations using a method called Ensemble Smoother with Multiple Data Assimilation (ES-MDA) appeared in a preprint format [9], quickly followed by [10], which forecasts the evolution of the epidemic in Pakistan with a Kalman filter coupled with the Arima model. Still, in the early responses, let us mention the implementation of a switching Kalman filter [11] to overtake the Gaussian and linearity assumptions, while an extended Kalman filter is used in [12] for an evaluation of the effective reproduction number. However, as acknowledged by the authors in [12] "the presented approach can only be used for countries/regions/areas that have performed mass testing with laboratory confirmation", which drastically reduces its applicability. In another direction, let us mention [13] where a Kalman filter and random forest algorithms are applied to make short-term predictions in India.
Another important work in this framework is the international collaboration [14], in which it clearly presents that the regional characters of some parameters are important. More recently, let us mention [15] in which a model with vaccination and using an Ensemble Kalman Filter is implemented for Saudi Arabia, and [16], which discusses vaccination strategies by using data assimilation techniques. Reference [17] tracks the effective reproduction number with the Kalman Filter, and provides estimates for the effectiveness of non-pharmaceutical interventions, while [18] uses Bayesian sequential data assimilation for forecasting the evolution of COVID-19 in several Mexican localities. Additional recent investigations mixing data assimilation techniques and epidemic models can be found in [19][20][21][22][23][24][25].
In contrast to the references mentioned above, contact-based (or agent-based) models provide a more microscopic approach for the spread of diseases; see, for example, [26]. Up to the best of our knowledge, very few works have investigated the COVID-19 outbreak with such models coupled with epidemiological data. As a rare example, let us mention [27] which uses an ensemble Kalman filter and a undirected graph of 200 nodes with a smallworld topology for estimating the evolution of the epidemic.
In the present study, we further develop the agent-based model and the assimilation of data, but simplify the graph structure. More precisely, we propose a method coupling a particle filter with an agent-based model for the evaluation of the effective reproduction number R t . It consists of two main steps, which are repeated on a regular basis: (1) Generate a prediction for the model's state on the next time step based on the current model's state, and move the time step forward; (2) update the model's state based on observations on the current time step. Since new data about COVID-19 are available on a daily basis, we regard one day as one time step in our model. In one experiment, numerous independent simulations are considered simultaneously, and consequently some quantities of interest can be estimated with a probability distribution.
We also aim to make our approach and results easily reproducible. Since some parameters in the model are location-dependent, and in order to rely on a constant source of observations, our investigations are mainly focusing on the metropolis of Tokyo. We also provide results for two other Japanese prefectures and the entire country, and these investigations are briefly discussed. The selection of these prefectures is motivated by the diversity of their size, and by the respective impact of the epidemic. This illustrates that our approach can be adapted to different cities, regions, or countries, with little effort.
In Figure 1 we present our main result about the evaluation of the effective reproduction number R t for Tokyo, from 6 March 2020 to 14 August 2021. Our approach provides a daily mean value and confidence intervals based on the distribution of parameters from numerous independent simulations.
Before moving to a more precise description of the content of this paper, let us emphasize a few key points of our investigations. First of all, our approach does not rely on a knowledge of the size of the underlying population. This is in contrast with the standard SIR compartmental model involving differential equations, for which the size of the susceptible population is part of the initial condition. The proportion of fully (or partially) vaccinated persons inside the population is also not a necessary information for our investigations. Accordingly, acquiring a immunity (or not) after recovering from the disease does not play any role in our approach. In fact, these non-dependencies are a special feature of the phenomenological character of our method for the computation of the effective reproduction number. Relatedly, note that our computations does not involve an inverse problem, nor rely on Bayesian estimation, or on any assumption of Poisson distributions. For comparison, let us mention for example [28] which presents an "inverse method for extracting the full transfer function of infection, of which the reproduction number is the in-tegral", or [29] for a statistical approach for the estimation of the instantaneous reproduction number, involving Bayesian estimation and the assumption of some Poisson distributions. Figure 1. Effective reproduction number. The black curve indicates the mean value obtained from numerous independent simulations; the dark and light colored regions refer to the 68% and 90% confidence intervals, respectively. States of emergency correspond to grey regions.
Let us now describe the content of our paper. In Section 2 we precisely introduce our agent-based compartmental model (an extended version of the SEIR model): its compartments, the flows between these compartments, the probability of agents entering each compartment, and the time agents spend in each compartment. Some of these values are fixed, in which case we provide a reference; other values are random variables, whose distributions are either provided by some references or will be evaluated during the experiments. The procedure for producing secondary cases is also described, and the computation of the effective reproduction number R t is explained.
In Section 3, the evolution process and the data assimilation technique are introduced. The leading idea is to consider simultaneously numerous independent epidemics (called particles), and to associate to each of them a weight on a daily basis. The weights are assigned based on the proximity of the simulated epidemics to the observations. More precisely, since direct observations are only possible for three compartments of the model (the agents under medical treatment, the agents who have recovered after medical treatment, and the deceased agents), we compare the number of agents in these three compartments to the corresponding observations, and compute the weights accordingly. Various plots illustrate the outcomes of our experiments, such as, for example, the total number of asymptomatic agents. A resampling process necessary for our experiments is also introduced and discussed. In the final part of this section, we provide some estimations about two parameters evaluated during the experiments.
A discussion about the probability than an infected agent develop symptoms is presented in Section 4. Indeed, for COVID-19 this probability is still not precisely known, and contradictory values can be found in the literature; see, for example, [30,31]. Similarly, it is not certain by how much asymptomatic agents are less infectious than symptomatic agents, and various numbers can be found in the literature [32]. For this reason, we performed stability tests on our model with different probabilities of developing symptoms, and different relative infectivities, and compared the resulting effective reproduction numbers. At a more technical level, we also discuss in this section the relationship between the stability of the computations and the frequency of resamplings introduced in Section 3.
In Section 5, the last section of this paper, we compare our results for the effective reproduction number with similar estimations from independent sources. In particular, one of these estimations is based on a simplified version of a maximum likelihood estimation for the effective reproduction number provided by H. Nishiura [33]. Additional results for other prefectures and for Japan are provided and discussed. We also expose the strength and the weakness of our approach. Some possible improvements are presented, and future extensions are finally discussed.

Description of the Model
The model consists of an extension of the well-known SEIR model, but is adapted in the context of an agent-based model. It is based on seven compartments, as shown in Figure 2, which can be described as follows:

S
The compartment of all susceptible agents. E The exposed agents: right after infections, agents move from S to E, and they are not infectious in this compartment. They are not recorded by health authorities.

I a
The asymptomatic infectious compartment: one part of these agents are asymptomatic (will never develop any symptom), while the other part of these agents are pre-symptomatic. The asymptomatic agents are leaving I a after recovery. Presymptomatic agents will move to I s as symptoms develop. They are not recorded by health authorities yet. Note that the labels and the definitions used for I a and I s (introduced below) coincide with the ones used in [34], but I a corresponds to the compartment P in the SEPIA model, see [35].

I s
The symptomatic infectious compartment: agents are showing mild or severe symptoms. Most of these agents will look for medical support, but others will self-quarantine without contacting any health authority. The first cohort will move to T once supervised by health authorities, while the latter cohort will leave I s after recovery. T The agents undergoing treatment. These agents are treated in hospitals, at home, or in other facilities. They are recorded by health authorities. D The deceased agents coming from T. They are recorded by health authorities.

R
The recovered agents: The agents coming from T are recorded by health authorities, while the ones coming from I a or from I s are not recorded by any health authority. As shown in Figure 2, should a compartment have multiple outflowing paths, its agent will enter different paths following certain probabilities. We provide in Table 1 the different values and the sources. Note that these values will be discussed again with additional experiments in Section 4. Note also that the probabilities P d and P r will be evaluated during the experiments, and therefore are time (≡day) dependent. For the probability P q of self-quarantining without contacting any health authority, since no information is available for Tokyo, we use the result of the survey [36] conducted in Osaka. As emphasized in this survey, it is mainly because of social pressure that some symptomatic persons prefer self-quarantining without contacting any health service. In some of these compartments (Cpt), the number of days spent by agents is important. We list in Table 2 the necessary information about these durations and provide the sources. Some given distributions are provided in Figure 3. Note also that the time duration τ T (time in compartment T) will be evaluated during the experiments, and therefore is time dependent. Note that if any of the parameters are time dependent, the behavior of agents are determined by the parameters on the day they enter their current compartments.   Let us now emphasize some assumptions made in our model: (1) Only agents in I a are going to spread the disease. Indeed, asymptomatic agents or presymptomatic agents in I a are not aware of their condition but are already infectious.
On the other hand, agents in I s are aware of their condition and are supposed to take all precautionary actions for not spreading the disease. Clearly, this is a simplifying assumption, but it is known that the transmission of the virus is mostly taking place before the appearance of the symptoms, with a gradual decline of virus load after the appearance of symptoms [40]. Also, without this assumption one should introduce an additional coefficient reflecting the fraction of symptomatic agents (aware of their status) spreading the disease. Clearly, very little information is available about such a parameter. (2) Births and natural deaths are not included in the diagram flow of Figure 2. As discussed in Section 5, these agents would not play any role. (3) The immunization status of agents in the compartment R is not specified and does not play any role in our model. This is also discussed in Section 5.
Thus, for agents in I a , a simplified daily offspring distribution Od is provided in Figure 4, and we refer for example to [41] for more details. The implementation of this distribution is explained in the next section. One important question concerns the relation between the transmission coefficient for asymptomatic agents and the transmission coefficient for pre-symptomatic and symptomatic agents. For our investigations, we shall rely on the result of the systematic review [32]. Consequently, we shall fix this relative infectivity coefficient k to 0.58. This factor is slightly smaller but of a comparable scale compared to earlier investigations; see, for example, [30]. This factor will be discussed again with additional experiments in Section 4.
Let us set r t ∈ [0, 1] as a time-dependent multiplicative factor which takes into account the real interaction between agents. This factor (updated on a daily basis) depends on individual behavior but also on non-pharmaceutical interventions. As a consequence, for asymptomatic agents in I a , the daily production of second-generation infections is given by the daily offspring production with all # secondary cases ≥1 entries' probabilities multiplied by the factor k · r t . For pre-symptomatic agents in I a , the daily production of second-generation infections is given by the same rule, but only with multiplicative factor r t instead of k · r t . Therefore, the effective reproduction number R t is given by the formula where E(s.c.) stands for the mean value of the secondary cases provided by the offsping distribution. Note that no special interpretation is given for the factor r t or the scaling factor 2.17.

Evolution Process, Particle Filter, and Main Results
The observation data for compartments T, R, and D (colored in yellow in Figure 2) provided by health authorities for Tokyo start on 6 March 2020. However, the epidemic had already started in January at the latest, since the departure of the Diamond Princess from Yokohama took place on 20 January 2020. For this reason, and after several trials, we fixed the following initial conditions for our experiment: 17 January 2020 (long after having chosen this date as the most suitable one for the start of our simulations, we were informed that the first case of COVID-19 detected in Japan was on 16 January 2020). This date corresponds to 49 days before the start of the observations available for Tokyo.
On a daily basis, some parameters have a certain freedom to change their values. In order to describe their evolutions, let us use the notation TN(µ, σ; a, b) for the truncated normal distribution of mean µ, variance σ 2 , minimum value a and maximal value b. For the minimum value a and the maximum value b of this distribution, we use some prior information from the literature, or the normalized interval [0, 1]. For the standard deviation σ, we chose a value corresponding to (b − a)/20. Indeed, it was observed that this value provides enough freedom to the system for its daily evolution, while a larger value generates a spurious self-averaging effect. For the evolution of r t with t ≥ 1, we randomly chose its value according to the distribution TN(r t−1 , 0.05; 0, 1). The evolution of P d is also given by using a truncated normal distribution, namely, for t ≥ 1 the value P d (t) was chosen randomly according to the distribution TN(P d (t − 1), 0.0025; 0, 0.05).
Since the time spent undergoing treatment, τ T , is integer valued, its evolution is slightly more complicated, but nevertheless follows a scheme similar to the previous two parameters. The mean value E(τ T (t)) of τ T (t) was chosen randomly according to the distribution TN(E(τ T (t − 1)), 0.75; 4,19), where the minimum and maximum values were fixed according to information provided by health authorities [39]. Then, one constructs a distribution supported on the greatest integer less than or equal to E(τ T (t)) and the least integer greater than or equal to E(τ T (t)), and such that its expectation is equal to E(τ T (t)). The agents entering the compartment T on that day are then assigned a time in T chosen at random according to this distribution.
Each agent in the compartment I a will infect susceptible agents belonging to the compartment S on a daily basis. We now describe this process. Let us denote by MN(x j , n; p j ) ≡ MN(0, 1, 2, 3, 4, 5, n; p 0 , p 1 , p 2 , p 3 , p 4 , p 5 ), the multinomial distribution for n trials in the set of values x j ∈ {0, 1, 2, 3, 4, 5} with the probability of x j given by p j . We also denote by the vector X = (X 0 , X 1 , X 2 , X 3 , X 4 , X 5 ) one realization of this distribution. For example, if p j is the the probability of having j offsprings as given in Figure 4, then one has ∑ 5 j=0 X j = n, and the expectation value for X j is n · p j for any j ∈ {0, 1, 2, 3, 4, 5}.
On a given day t, assume that the compartment I a contains n asymptomatic agents and m pre-symptomatic agents. Then, the n agents will infect ∑ n j=1 j · X j susceptible agents, where X is one realization of the multinomial distribution MN(x j , n; p a j ), with p a j = k · r t · p j for j ∈ {1, 2, 3, 4, 5} and p a 0 = (1 − k · r t ) + k · r t · p 0 . Similarly, the m agents will infect ∑ n j=1 j · Y j susceptible agents, where Y is one realization of the multinomial distribution MN(x j , m; p s j ), with p s j = r t · p j for j ∈ {1, 2, 3, 4, 5} and p s 0 = (1 − r t ) + r t · p 0 . As highlighted in Figure 2, three compartments, T, D, and R, are associated with observations (collected from [42]) and their values on day t are denoted by T(t), D(t) and R(t), respectively. Some uncertainties surround these observations, and these uncertainties are commonly called observation error and will be denoted by σ. For our experiments, we consider two of these observation errors constant over time, while one will be time dependent. In fact, since the daily variations of T are quite important, the corresponding uncertainties will take them into account. More precisely, if we write T(t) for the number of agents undergoing treatment at time t, we set These expressions for the errors have been chosen after several trials; see the remark below. As shown later, even if these values look very large, they lead to a successful selection process.
Let us now describe more precisely the experimental setup. We used a particle filter approach; namely we created a large number N of independent simulations (=particles) of the propagation of the epidemic in Tokyo in one experiment. Each of the simulations is a realization of the model, and involves all the daily random processes described above. Typically, we chose N = 50 000 or N = 100 000, and kept this number constant during each experiment. The initial conditions we used are gathered in Table 3. On a daily basis, a weight w was assigned to each particle. Let i denote the index of the particle, and let T i (t), R i (t), and D i (t) denote the number of agents in the three colored compartments of Figure 2 for this particle at time t. At each time t ≥ 1 one first computes the not normalized weight W i (t) by the formula with the convention that w i (0) = 1 N . Then, the normalized weight for the particle i is given by With these weights, one obtains three distributions for the analyzed values of T, R, and D at time t, respectively.

Remark 1.
As already mentioned, the expressions for σ X with X ∈ {T, R, D} were chosen after several trials. Based on the expression (2) for the weight, let us provide more information about these observation errors. Clearly, if σ X is too big, the factor exp{− (X i (r)−X(t)) 2 2σ 2 X } will be close to 1 for all particles, and this factor will not play any role. On the other hand, if σ X is too small, then only the particles having |X i (t) − X(t)| ∼ = σ X will keep this factor in (0.1, 1), while those having a difference |X i (t) − X(t)| much bigger than σ X will be assigned an extremely small weight. In addition, since the non-normalized weight (2) is defined by the product of three factors, their relative importance is also encoded in the values given to σ T , σ R and σ D . During the experiments, we observed that the data about the treated agents are clearly more important than the data related to the recovered agents and to the deceased agents for the computation of the effective reproduction number. The choice of two large and constant values for σ R and σ D reflects this observation.
On a daily basis, the exact number of agents who recover, or the exact number of deceased agents, did not match too closely with the real data. However, we kept a weight related to these two quantities because the simulations can also not show arbitrary numbers of recovered agents or of deceased agents. The expression for σ T takes into accounted the current number of agents in the compartment T and the daily variation, together with a regularization factor 20 2 . In this expression, note that none of the numerical values has a very meaningful interpretation, but it is important that the same values are kept for the computation of the weight of all particles.
Now, if we keep computing weights with the formula described above, it will soon turn out that very few particles will concentrate almost all weights, and that nearly all other particles would end up with negligible weights. For dealing with this problem, a process called resampling was implemented: For a given time t, we took one realization X of the multinomial distribution MN(i, N; w i (t − 1)) with i ∈ {1, . . . , N}. Then, for a particle indexed i, we created X i copies of the particle and removed the original one. Finally, we assigned a weight 1/N to all new particles. Clearly, some particles will appear several times, but their trajectories will diverge due to the randomness involved on a daily basis.
One still has to decide when resamplings are organized. For this purpose, let us define the effective number of particles as and observe that if then N e f f = N, while if w i ≈ 1 for one i, and w j ≈ 0 for all j = i, then N e f f ≈ 1. For this reason, N e f f is often used as an indicator of the number of particles still playing a role in the process. As a consequence, we shall use the following rule: If N e f f < N 10 , then a resampling has to take place. As a practice to stabilize the experiments, a resampling is also performed if the most recent resampling took place 15 days ago. In Figure 5a, we present a typical graph for N e f f for a number of particles N = 100 000, while in Figure 5b we provide an indication about the time between the resamplings, namely whenever a resampling took place, we set y = 1/(number of days since the last resampling). We shall soon see that this information plays a role in the stability of the predictions. Let us stress that the computation of the weights and the implementation of the resampling start only on day 50 of the simulation, namely on March 6. Indeed, as there is no observation available before this date, the particles evolve freely and independently.
After running the experiment with observation data up to day t, we ran the experiment for one more day without observation data. With the weight computed on day t still assigned to each particle, one obtains the so-called forecast (or prior) values for T i , R i , and D i , on day t + 1. These values generate the corresponding forecast distributions for T, R, and D. Then, by computing the weights w i (t + 1), as explained above, one obtains the analyzed values for the compartments, also called posterior values, and the corresponding analyzed distributions. Clearly, the forecast distributions and the analyzed distributions do not match in general. The relative difference between the mean values of the forecast and the analyzed distributions for T can be visualized in Figure 6 (lower part). Let us just emphasize that the weights b computed with three compartments, not only with T. A representation of the three compartments with observations is also presented in Figure 6 (upper part) and in Figure 7. In these plots, the observations are represented in red, and the analyzed distributions are represented in green (confidence intervals) and in black (means). Similarly, we provide in Figure 8 the plot of the total number of asymptomatic agents in I a following the estimate provided in [32]. By the definition of asymptomatic agents, there is no observation corroborating these values.  As mentioned in Section 2, P d , P r , and E(τ T ) were evaluated during the experiments, which are the probability of dying, the probability of recovering from T, and the average time agents spend in T, respectively. With our approach, these quantities are provided with their distributions created by their values taken by the N particles. Clearly, if more accurate medical information was available, these parameters could be directly implemented in our model, but in their absence, we have to evaluate them from the observations. For the expected value of P d shown in Figure 9a, observe that the sudden increase visible between mid-February and mid-March 2021 is due to a conjunction of two factors: (1) a large number of death, see Figure 7b, due to the large number of treated agents in January 2021 (as visible in Figure 6) and (2) the relatively small daily number of agents recovering between mid-February and mid-March 2021, as visible in Figure 7a with the small slope of the curve during this period. In other words, this increase in P d is due to the longer stay in compartment T of agents who will ultimately die compared to agents who will recover after treatment. For the expected time E(τ T ) spent in T, we provide in Figure 9b the distributions E(τ T ) deduced from our investigations. The observed long-term trend of decay is expected to be due to the improvement of the medication and treatment for infected patients. Unfortunately, we cannot deduce the difference of time spent in T between patients with mild symptoms and patients with serious symptoms from our model.

Parameters' Dependence and Stability
As mentioned in Section 2, the relative infectivity k of asymptomatic agents compared to pre-symptomatic and symptomatic agents is a very uncertain parameter. For most of our simulations, we used k = 0.58 based on the information provided by the literature [32]. Since this value is quite uncertain, we compared the outcomes of simulations with all conditions the same but with k = 0.2, 0.58, and 1.0. The corresponding mean values for the effective reproduction number R t are shown in Figure 10a. The patterns are similar, but a larger k corresponds to a slightly larger value of R t , most of the time. This is not surprising since the factor k plays a role in the computation of the effective reproduction number, as shown in (1). On the other hand, with one noticeable exception around late August to early September 2020, the three curves cross the critical value R t = 1 more or less simultaneously. Similarly, the probability P s that an infected agent develops symptoms is also a somewhat controversial parameter. For our simulations, we used the proportion of 83% of symptomatic agents and 17% of asymptomatic agents from [32], but other sources mentioned a very different numbers [31]. Since asymptomatic cases are very difficult to detect, and since we cannot be fully confident in these probabilities, a sensitivity test was performed. To do this, we decreased the probability P s to 50% and to 20% and performed the whole experiment while keeping all other parameters the same as the original ones. Compared to the original setting, more agents are asymptomatic and recover without showing any symptoms in these two new scenarios. On the other hand, the number of symptomatic agents is more or less constant since they are dominated by the number of agents undergoing treatment, which is compared and adjusted according to the observations on a daily basis. In Figure 10b, the different curves for the mean R t have similar patterns. However, we observed that a bigger proportion of asymptomatic agents leads to a slightly bigger R t . Indeed, more total infected agents have to be created in this scheme, which means that a larger R t is necessary.
Let us mention one observation linking the stability of the computation of R t with the frequency of resamplings. In Figure 11, we provide the mean value of R t for two independent experiments under exactly the same setting, and each of them involves 100 000 particles. Clearly, these two curves are very similar, but at a few places, small differences are visible, such as, for example, in September 2020, in February and first half of March 2021, and in August 2021. Note that the same kind of discrepancies can also be observed in Figure 10 (around the first two mentioned periods). Now, if we look back to Figure 5b, it can be seen that these three periods follow or coincide with periods of time when resamplings took place more frequently (local maxima of the curve presented in Figure 5b). Our understanding is the following: when abrupt changes in the observations are taking place, and/or when the model is not accurate enough, more resamplings are necessary to keep the experiment following the observation data since only a small portion of particles will have the correct behavior. As the weights concentrate on those particles, N e f f drops rapidly, and more resamplings are triggered as a consequence. This process also quickly eliminates the short-term diversity of particles: since most of the particles are sampled from that small portion of particles, they will share similar behavior for a few days. This could lead to the following two consequences in the short term: (1) the system is less capable of adapting to further rapid changes and (2) the randomness plays a more important role. Figure 11. The mean values of the effective reproduction number obtained with two independent experiments, each involving 100 000 particles and the same setting.

Discussion and Conclusions
The effective reproduction number R t is certainly one of the most important parameters for the study of an epidemic, but it is also one of the most difficult parameters to estimate. Indeed, because of the delay between the infection of an agent and the appearance of symptoms, the effect of the current R t will only be visible in a few days. Accordingly, the current situation (number of agents in I a , I s , or in T) is related to some effective reproduction numbers which took place over the preceding days. Additionally, since the infectious period lasts for several days, it is not possible (as for example mentioned in [6]) to shift R t by some days to obtain a better picture. These drawbacks are well-known, and are taken into account by medical institutions. Our approach does not solve these problems, but it provides an alternative way of estimating R t , and makes it clear that R t is really the reproduction number taking place on day t. As emphasized in [6], this synchronicity is not shared by all methods estimating R t .
In Figure 12, we compare our evaluation for R t with two other resources. One of them is provided by Toyo Keizai, a book and magazine publisher based in Tokyo [42]. Their approach for the computation of R t is completely different from ours, and is based on a simplified version of a formula proposed by H. Nishiura [33]. The second resource is a companion paper [44] in which similar investigations were performed with a different approach: a continuous model (with differential equations) is used for the simulation, and the data assimilation part is based on the ensemble Kalman filter. Clearly, in the first several months of the epidemic, these different methods led to rather diverse values of R t . This phenomenon is expected to be due to inaccurate data, irregular release from T, and small numbers of agents. On the other hand, since mid-July 2020, the two curves from other studies and our mean R t share very similar patterns. From the figure, one can notice that our approach and the approach of [44] both provide a more stable R t estimation than the approach of [42]. Indeed, Ref. [42] uses the observations for the computation of R t directly, while our approach and the approach of [44] model that pandemic and use data assimilation to produce R t estimations. As a result, the rapid variations in R t estimations do not appear anymore. Figure 12. The effective reproduction number for Tokyo computed with different approaches: the black curve and the green confidence intervals are from our approach, the blue curve is provided by [44], and the red curve is borrowed from [42].
In Figures 13 and 14a, we present the effective reproduction numbers evaluated for Japan, Osaka, and Aichi. As of September 2021, Japan has about four times more accumulated infected agents than Tokyo, while Osaka has about half of Tokyo, and Aichi about one quarter of Tokyo. Our evaluations are compared once again with the R t provided by [42] and based on a maximum likelihood estimation. It clearly appears on these figures that the methods lead to similar trends and values for R t , especially after July 2020, and independently of the size of the epidemic. As for Tokyo, our curves do not present the rapid oscillations, and look more stable.
(a) (b) Figure 13. The effective reproduction number for Japan (a) and for Osaka (b): The black curve and the green confidence intervals are from our approach, and the red curve is borrowed from [42]. The black curve and the green confidence intervals are from our approach, and the red curve is borrowed from [42]. (b) Observation value of T (in red) and analyzed values with mean value (in black) and 68%, resp. 90%, confidence interval for Aichi.
The previous comparisons with populations of different sizes allows us to emphasize a specific aspect of our approach: the size of the population in S does not play any role. In fact, we never introduced the sizes of Tokyo, Osaka, Aichi, or Japan in our settings. The only ingredients which matter for the computation of the effective reproduction number R t are the data corresponding to the three yellow compartments of Figure 2. As a result of this independence of the size of the underlying population, births and deaths (not due to COVID-19) do not play any role. Accordingly, the immunization status of agents in R and the vaccination status of the population in S are not relevant for our investigations. On the other hand, it is clear that an efficient vaccination campaign, or a change in the variant of the virus, will have an impact in the effective reproduction number, but our approach based on real data automatically takes these parameters into account. In this sense, our model really computes the effective or phenomenological reproduction number.
We believe that our agent-based model, together with the particle filter approach, constitutes a rather simple method for studying the evolution of epidemics. Based on a model for the propagation of the disease, it allows us to perform several tests about uncertain parameters, as shown in Section 4. Note, however, that it implicitly takes some assumptions into account, such as, for example, that only agents in I a can infect new agents. The complexity of the model could also be enlarged, for example, by increasing the number of compartments and flows between them. Another simplification in our agent-based model is the absence of graph structure. Indeed, we could have considered each agent located on a node of a large graph (for example, homogeneous, random, or temporal) and implemented the interactions between agents by the edges of the graph. Such additional structures would have increased the complexity of the model, but the approach with a particle filter could still be implemented. Interacting populations could also be considered: In Figure 14b, we suspect that the sudden increase in infected agents and then agents undergoing treatment, visible around August 2020 in Aichi prefecture, could have been triggered by an influx of infected agents coming from other prefectures. Our model could then be elaborated on a modular network, such as, for example, the one developed in [45]. We plan to work on these issues in the future. Funding: The authors acknowledge the "RIKEN President's discretionary funds COVID-19 related R&D", and the RIKEN Pioneering Project "Prediction for Science". S.R. is supported by the grant "Topological invariants through scattering theory and noncommutative geometry" from Nagoya University, and by JSPS Grant-in-Aid for scientific research C No. 18K03328 & 21K03292.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data available on [42]. Program available upon request.