## 1. Introduction

The COVID-19 epidemic started in December 2019 in Hubei province, China. Since then, the disease has spread around the world, reaching the pandemic stage, according to the WHO [1], on 11 March 2020. The first cases were detected in France on 24 January 2020. The infection fatality ratio (IFR), defined as the number of deaths divided by the number of infected cases, is an important quantity that informs us on the expected number of casualties at the end of an epidemic, when a given proportion of the population has been infected. Although the data on the number of deaths from COVID-19 are probably accurate, the actual number of infected people in the population is not known. Thus, due to the relatively low number of screening tests that have been carried out in France (about five in 10,000 people, compared with 50 in 10,000 in South Korea, up to 15 March 2020; sources: Santé Publique France and Korean Center for Disease Control), the direct computation of the IFR is not possible. Based on the PCR-confirmed cases in international residents repatriated from China in January 2020, Verity et al. [2] obtained an estimate of the IFR of 0.66% in China, and, adjusting for non-uniform attack rates by age, an IFR of 0.9% was obtained in the UK [3]. Using data from the quarantined Diamond Princess cruise ship in Japan and correcting for delays between confirmation and death, Russell et al. [4] obtained an IFR of 1.3%.

Using the early data (up to 17 March) available in France, our objectives are: (1) to compute the IFR in France, (2) to estimate the number of people infected with COVID-19 in France, and (3) to compute the basic reproduction number ${R}_{0}$.

## 3. Results

Model fit. To assess model fit, we compared the observations, i.e., the cumulated number of cases ${\Sigma}_{t}:={\Sigma}_{s=1,\dots ,t}{\widehat{\delta}}_{s}$, with the expectation of the observation model associated with the MLE (expectation of a binomial). Namely, we compared ${\Sigma}_{t}$ and ${\Sigma}_{s=1,\dots ,t}{n}_{s}\,{p}_{s}^{\ast}$, where ${p}_{s}^{\ast}$ is the binomial success probability computed from ${I}^{\ast}(s)$ and ${S}^{\ast}(s)$, the solutions of the system (1) (at time $s$) associated with the MLE. The results are presented in Figure 1. We observe a good match with the data.
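The comparison described above can be sketched as follows. This is a minimal illustration only: the daily numbers of tests, the model probabilities and the observed counts below are hypothetical placeholders, not the data or the MLE output of this study, and the function name `cumulative_expected_cases` is introduced here for illustration.

```python
from itertools import accumulate

def cumulative_expected_cases(n_tests, p_positive):
    """Cumulated expectation of a binomial observation model: sum over s <= t of n_s * p_s."""
    return list(accumulate(n * p for n, p in zip(n_tests, p_positive)))

# Hypothetical inputs (NOT from the paper): tests per day and model probabilities p_s*.
n_tests = [100, 200, 400]
p_positive = [0.10, 0.15, 0.20]
# Hypothetical observed daily positives, to be cumulated into Sigma_t.
observed_daily = [9, 32, 78]

expected_cum = cumulative_expected_cases(n_tests, p_positive)
observed_cum = list(accumulate(observed_daily))

for day, (obs, exp) in enumerate(zip(observed_cum, expected_cum), start=1):
    print(f"day {day}: observed {obs}, expected {exp:.1f}")
```

Plotting `observed_cum` against `expected_cum` over time gives the kind of visual check reported in Figure 1.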

Infection fatality ratio and actual number of infected cases. Using the posterior distribution of the model parameters (the pairwise distributions are presented in Appendix A, see Figure A1), we computed the daily distribution of the actual number of infected people. Using the relation (2) together with the data on $D(t)={\Sigma}_{t}$, we deduce the distribution of the parameter $\gamma(t)$ at each date. The IFR corresponds to the fraction of the infected who die. We thus obtain, on 17 March, an IFR of 0.5% (95%-CI: 0.3–0.8), and the distribution of the IFR is relatively stable over time (see Figure A3 in Appendix A).

Additionally, the distribution of the cumulated number of infected cases ($I(t)+R(t)$) across time is presented in Figure 2. We observe that it is much higher than the total number of observed cases (compare with Figure 1). The average estimated ratio between the actual number of individuals that have been infected and the observed cases, $(I(t)+R(t))/{\Sigma}_{t}$, is eight (95%-CI: 5–12) over the considered period.

Taking into account the data in the nursing homes. The above computation of the IFR is based on the official counting of deaths from COVID-19 in France, which does not take into account the number of deaths in nursing homes. Based on the local data in the Grand Est region, we infer that the IFR that we computed has been underestimated by a factor of about $(1015+570)/1015\approx 1.6$, leading to an adjusted IFR of 0.8% (95%-CI: 0.45–1.25).
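The adjustment above is simple arithmetic and can be checked with a short sketch. The variable names are introduced here for illustration, and reading 1015 as the hospital death count and 570 as the nursing home death count in Grand Est is an assumption drawn from the form of the ratio.

```python
# Figures quoted in the text for the Grand Est region.
# Assumption: 1015 = deaths counted in hospitals, 570 = deaths in nursing homes.
hospital_deaths = 1015
nursing_home_deaths = 570

# Under-counting factor applied to the hospital-based IFR.
factor = (hospital_deaths + nursing_home_deaths) / hospital_deaths

# Hospital-based IFR in percent: mean and 95% CI bounds, as reported in the text.
ifr_mean, ifr_low, ifr_high = 0.5, 0.3, 0.8

adjusted = tuple(round(x * factor, 2) for x in (ifr_mean, ifr_low, ifr_high))
print(f"factor = {factor:.2f}, adjusted IFR (%) = {adjusted}")
```

Starting from the rounded inputs, the mean and upper bound agree with the reported 0.8% and 1.25%. The lower bound comes out slightly above the reported 0.45%, which may be because the published values were computed from unrounded quantities.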

Basic reproduction number. With SIR systems of the form (1), the basic reproduction number ${R}_{0}$ can be computed directly, based on the formula ${R}_{0}=\alpha /\beta$ [15]. When ${R}_{0}<1$, the epidemic cannot spread in the population. When ${R}_{0}>1$, the infected compartment $I$ increases as long as ${R}_{0}\,S>N=S+I+R$. We computed the marginal posterior distribution of the basic reproduction number ${R}_{0}$. This leads to a mean value of ${R}_{0}$ of 3.2 (95%-CI: 3.1–3.3). The full distribution is available in Appendix A (Figure A4).
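The threshold property stated above can be illustrated numerically. The sketch below assumes the standard SIR form $S' = -\alpha SI/N$, $I' = \alpha SI/N - \beta I$, $R' = \beta I$, since system (1) is not reproduced in this section. It uses a plain explicit Euler scheme, a rounded population size of 67 million and a single initial case, all of which are illustrative assumptions and not the inference procedure of this study.

```python
def simulate_sir(alpha, beta, n_pop, i0, days, dt=0.05):
    """Explicit Euler integration of a standard SIR model (assumed form of system (1))."""
    s, i, r = n_pop - i0, float(i0), 0.0
    trajectory = [(0.0, s, i, r)]
    for k in range(1, int(days / dt) + 1):
        new_infections = alpha * s * i / n_pop * dt
        new_removals = beta * i * dt
        s, i, r = s - new_infections, i + new_infections - new_removals, r + new_removals
        trajectory.append((k * dt, s, i, r))
    return trajectory

alpha, beta = 0.32, 0.10   # transmission and removal rates, so that R0 = 3.2
n_pop = 67_000_000         # rounded population size, illustrative assumption
r0 = alpha / beta

trajectory = simulate_sir(alpha, beta, n_pop, i0=1, days=300)

# I(t) peaks when R0 * S drops to N, i.e. when S/N reaches 1/R0.
t_peak, s_peak, i_peak, r_peak = max(trajectory, key=lambda row: row[2])
print(f"R0 = {r0:.1f}, peak of I at day {t_peak:.0f}, R0*S/N at peak = {r0 * s_peak / n_pop:.3f}")
```

At the peak of $I$, the printed ratio ${R}_{0}\,S/N$ is close to 1, which is the condition given in the text for $I$ to stop increasing.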

Sensitivity of the results with respect to the fixed model parameters. We computed the MLE with a larger infectious period ($1/\beta$) of 20 days, as estimated by [12]. This leads to a much larger basic reproduction number ${R}_{0}=4.8$ and a factor $\times 15$ between the reported cases and the actual number of cases. However, the value of the IFR remains unchanged (0.5%). We also checked whether the window width of the smoothing (moving average over 5 days) had an impact on our results. Computations of the MLE with a window width of 3 days (and $\beta =1/10$) led to the same results as those presented above, namely ${R}_{0}=3.2$ and an IFR of 0.5%.

## 4. Discussion

On the IFR and the number of infected cases. The actual number of infected individuals in France is probably much higher than the observations (we find here a factor ×8), which leads to a lower mortality rate than that calculated on the basis of the observed cases: we found here an IFR of 0.5% based on hospital death counting data, to be compared with a case fatality rate (CFR, number of deaths over number of diagnosed cases) of 2% on 17 March. Adjusting for the number of deaths in the nursing homes, we obtained an IFR of 0.8%. These values for the IFR are consistent with the findings in [2] (0.66% in China) and [3] (0.9% in the UK). The value of 1.3% estimated on the Diamond Princess cruise ship [4] falls above the top end of our 95% CI. This reflects the age distribution on the ship, which was skewed towards older individuals (mean age: 58 years), among whom the IFR is higher [3,4].

The objective of our study was to estimate the IFR based on early data, before large-scale surveys become available. By late April, new data and preliminary studies are available and can be compared to our results. An antibody study in New York released on 24 April 2020 shows that about 14 percent of participants tested positive, corresponding to 2.7 million cases, to be compared with the 271,000 confirmed cases and a statewide total of 15,500 deaths. This corresponds to an IFR of 0.6%. In France, another preliminary study conducted by the Pasteur Institute [16], based on a joint estimate from French data up to 14 April (hospital death counting data) and from the Diamond Princess cruise ship data, finds an IFR of 0.5%, thus confirming our result. By 28 April, the number of deaths from COVID-19 in Lombardy (Italy) is 13,575 (source: Ministero della Salute), for a population of 10 million people, showing that the IFR is at least 0.14%. On the other hand, in South Korea, where the number of detected cases rapidly reached a plateau, suggesting a small proportion of undetected cases, the ratio between the number of deaths and the number of positive cases is 244/10,752 ≈ 2.3% (source: Johns Hopkins University Center for Systems Science and Engineering [5]), which can be considered as an upper bound for the IFR, though likely an overestimate.
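The three external figures quoted above follow from simple ratios, which can be reproduced with the sketch below. The helper `ratio_percent` and the variable names are introduced here for illustration; the inputs are the numbers stated in the text.

```python
def ratio_percent(deaths, denominator):
    """Deaths as a percentage of a reference count (infections, population or confirmed cases)."""
    return 100.0 * deaths / denominator

# New York: deaths over infections estimated from the antibody study.
new_york_ifr = ratio_percent(15_500, 2_700_000)

# Lombardy: deaths over the whole population, a lower bound for the IFR
# because at most the whole population can have been infected.
lombardy_lower_bound = ratio_percent(13_575, 10_000_000)

# South Korea: deaths over confirmed cases (a CFR), an upper bound for the IFR
# because undetected infections are missing from the denominator.
south_korea_cfr = ratio_percent(244, 10_752)

print(f"New York IFR: {new_york_ifr:.2f}%")
print(f"Lombardy lower bound: {lombardy_lower_bound:.2f}%")
print(f"South Korea CFR: {south_korea_cfr:.2f}%")
```

The estimate of 0.5% to 0.8% obtained for France lies between the Lombardy lower bound and the South Korean upper bound.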

If the virus were to infect 80% of the French population [3], the total number of deaths in the absence of variation in the mortality rate (increase induced for example by the saturation of hospital structures, or decrease linked to better patient care) would be 336,000 (95%-CI: 192,000–537,000), excluding the number of deaths in the nursing homes. This estimate could be corroborated or invalidated once 80% of the population has been infected, possibly over several years, assuming that an infected individual is permanently immunised. It has to be noted that measures of confinement or social distancing can decrease both the percentage of infected individuals in the population and the degree of saturation of hospital structures.

On the value of ${R}_{0}$. The estimated distribution in France is high compared to recent estimates (2.0–2.6, see [3]) but consistent with the findings in [17] (2.24–3.58). A direct estimate, by a non-mechanistic method, of the parameters $(\rho ,{t}_{0})$ of a model of the form ${\widehat{\delta}}_{t}={e}^{\rho \,(t-{t}_{0})}$ gives ${t}_{0}=36$ (5 February) and $\rho =0.22$. With the SIR model, ${I}^{\prime}(t)\approx I\,(\alpha -\beta )$ for small times ($S\approx N$), which leads to a growth rate equal to $\rho \approx \alpha -\beta$ and, with $\beta =1/10$, a value of $\alpha \approx 0.32$, that is to say ${R}_{0}=3.2$, which is consistent with our distribution of ${R}_{0}$. Note that we have assumed here an infectious period of 10 days. A shorter period would lead to a lower value of ${R}_{0}$.
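The back-of-the-envelope computation above can be written as a short sketch. It relies only on the early-phase approximation $\rho \approx \alpha -\beta$ and on ${R}_{0}=\alpha /\beta$; the function name `r0_from_growth_rate` and the 5-day alternative are illustrative assumptions, not values used in this study.

```python
def r0_from_growth_rate(rho, infectious_period_days):
    """R0 implied by an early exponential growth rate rho, using rho ~ alpha - beta."""
    beta = 1.0 / infectious_period_days   # removal rate
    alpha = rho + beta                    # transmission rate from rho ~ alpha - beta
    return alpha / beta                   # equals 1 + rho * infectious_period_days

rho = 0.22  # growth rate estimated from the exponential fit in the text

r0_10_days = r0_from_growth_rate(rho, 10)  # infectious period assumed in the study
r0_5_days = r0_from_growth_rate(rho, 5)    # hypothetical shorter period, for comparison

print(f"R0 with a 10-day infectious period: {r0_10_days:.1f}")
print(f"R0 with a 5-day infectious period:  {r0_5_days:.1f}")
```

Since this approximation gives ${R}_{0}\approx 1+\rho /\beta$, a shorter infectious period mechanically lowers ${R}_{0}$ for a given observed growth rate.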

On the uncertainty linked to the data. The uncertainty on the actual number of infected individuals, and therefore on the IFR, is very high. We must therefore interpret with caution the inferences that can be made based on the data we currently have in France. In addition, we do not draw forecasts here: the future dynamics will be strongly influenced by the containment measures that will be taken and should be modelled accordingly.

On the sensitivity of the results with respect to the fixed model parameters. We deliberately chose a parsimonious model with a few parameters to avoid identifiability issues. However, we needed to fix some parameter values. In particular, we assumed a mean duration of the infectious period ($1/\beta$) of 10 days. A much larger infectious period of 20 days (corresponding to the median period of viral shedding found in [12]) would lead to a much larger basic reproduction number ${R}_{0}=4.8$ (but still within the range 1.4–6.49 described in [18]) and a factor $\times 15$ between the reported cases and the actual number of cases. However, our main result on the IFR would remain unchanged (0.5%). We also assessed the sensitivity of the inference with respect to the prior knowledge, by proposing a set of more informative uniform prior distributions than the set specified in the main text. Overall, this prior modification does not significantly influence the posterior distributions; see Appendix A.

On the hypotheses underlying the model. The data used here contain a limited amount of information, especially since the observation period considered is short and corresponds to the initial phase of the epidemic dynamics, which can be strongly influenced by discrete events. This limit led us to use a particularly parsimonious model in order to avoid problems of identifiability for the parameters. The assumptions underlying the model are therefore relatively simple and the results must be interpreted with regard to these assumptions. For instance, the date of introduction ${t}_{0}$ must be seen as an effective date of introduction, for dynamics in which a single introduction would be decisive for the outbreak and the other (earlier and later) introductions would have an insignificant effect.

A more complex epidemiological model of the COVID-19 epidemic in China has been proposed in [19], with an infectious class divided into several compartments (asymptomatic individuals, unobserved symptomatic infectious and observed symptomatic infectious). The authors use this model in [20] to make forecasts on the cumulative number of cases in China, while taking into account management strategies. In these two studies the authors emphasise the importance of being able to estimate the fraction of infectious cases that are not observed in order to forecast the dynamics of the epidemic. Our study, though based on a simpler SIR model, shows that this fraction can be estimated based on early data.