Understanding Unreported Cases in the COVID-19 Epidemic Outbreak in Wuhan, China, and the Importance of Major Public Health Interventions

We develop a mathematical model to provide epidemic predictions for the COVID-19 epidemic in Wuhan, China. We use reported case data up to 31 January 2020 from the Chinese Center for Disease Control and Prevention and the Wuhan Municipal Health Commission to parameterize the model. From the parameterized model, we identify the number of unreported cases. We then use the model to project the epidemic forward with varying levels of public health interventions. The model predictions emphasize the importance of major public health interventions in controlling COVID-19 epidemics.


Introduction
An epidemic outbreak of a new human coronavirus, termed the novel coronavirus COVID-19, has occurred in Wuhan, China. The first cases occurred in early December, 2019, and, by 29 January 2020, more than 7000 cases had been reported in China [1]. Early reports advise that COVID-19 transmission may occur from an infectious individual, who is not yet symptomatic [2]. Evidently, such asymptomatic infectious cases are not reported to medical authorities. For epidemic influenza outbreaks, reported cases are typically only a fraction of the total number of the symptomatic infectious individuals. For the current epidemic in Wuhan, it is likely that intensive efforts by Chinese public health authorities have reduced the number of unreported cases.
Our objective is to develop a mathematical model, which recovers from data of reported cases, the number of unreported cases for the COVID-19 epidemic in Wuhan. For this epidemic, a modeling approach has been developed in [3], which did not consider unreported cases. Our work continues the investigation in [4,5] of the fundamental problem of parameter identification in mathematical epidemic models. We address the following fundamental issues concerning this epidemic: How will the epidemic evolve in Wuhan with respect to the number of reported cases and unreported cases? How will the number of unreported cases influence the severity of the epidemic? How will public health measures, such as isolation, quarantine, and public closings, mitigate the final size of the epidemic?

The Model and Data
Our model consists of the following system of ordinary differential equations: Here, t ≥ t 0 is time in days, t 0 is the beginning date of the epidemic, S(t) is the number of individuals susceptible to infection at time t, I(t) is the number of asymptomatic infectious individuals at time t, R(t) is the number of reported symptomatic infectious individuals (i.e., ,symptomatic infectious with sever symptoms) at time t, and U(t) is the number of unreported symptomatic infectious individuals (i.e., symptomatic infectious with mild symptoms) at time t. This system is supplemented by initial data (2) Figure 1 is the flow chart of the model (1). The parameters of the model are listed in Table 1.  We use three sets of reported data to model the epidemic in Wuhan: First, data from the Chinese CDC for mainland China (Table 2), second, data from the Wuhan Municipal Health Commission for Hubei Province (Table 3), and third, data from the Wuhan Municipal Health Commission for Wuhan Municipality (Table 4). These data vary but represent the epidemic transmission in Wuhan, from which almost all the cases originated in the larger regions.

Comparison of Model (1) with the Data
For influenza disease outbreaks, the parameters τ, ν, ν 1 , ν 2 , η, as well as the initial conditions S(t 0 ), I(t 0 ), and U(t 0 ), are usually unknown. Our goal is to identify them from specific time data of reported symptomatic infectious cases. To identify the unreported asymptomatic infectious cases, we assume that the cumulative reported symptomatic infectious cases at time t consist of a constant fraction along time of the total number of symptomatic infectious cases up to time t. In other words, we assume that the removal rate ν takes the following form: ν = ν 1 + ν 2 , where ν 1 is the removal rate of reported symptomatic infectious individuals, and ν 2 is the removal rate of unreported symptomatic infectious individuals due to all other causes, such as mild symptom, or other reasons.
The cumulative number of reported symptomatic infectious cases at time t, denoted by CR(t), is Our method is the following: We assume that CR(t) has the following special form: We evaluate χ 1 , χ 2 , χ 3 using the reported case data in Tables 2-4. We obtain the model starting time of the epidemic t 0 from (4): We fix S 0 = 11.081 × 10 6 , which corresponds to the total population of Wuhan. We assume that the variation in S(t) is small during the period considered, and we fix ν, η, f . By using the method in Section 4, we can estimate the parameters ν 1 , ν 2 , τ and the initial conditions U 0 and I 0 from the cumulative reported cases CR(t) given (4). We then construct numerical simulations and compare them with data. The evaluation of χ 1 , χ 2 , χ 3 , and t 0 , using the cumulative reported symptomatic infectious cases in Tables 2-4, is shown in Table 5 and in Figure 2 below. Table 5. Estimation of the parameters χ 1 , χ 2 , χ 3 , and t 0 by using the cumulated reported cases in Tables 2-4.  Tables 2-4, the time t = 0 will correspond to 31 December. Thus, in Table 5, the value t 0 = 5.12 means that the starting time of the epidemic is 5 January, the value t 0 = −2.45 means that the starting time of the epidemic is 28 December, and t 0 = −4.52 means that the starting time of the epidemic is 26 December. In the left side figures, the dots correspond to t → ln (CR(t) + χ 3 ), and in the right side figures, the dots correspond to t → CR(t), where CR(t) is taken from the cumulated confirmed cases in Table 2 (top), in Table 3 (middle), and in Table 4 (bottom). The straight line in the left side figures corresponds to t → ln (χ 1 ) + χ 2 t. We first estimate the value of χ 3 and then use a least square method to evaluate χ 1 and χ 2 . We observe that the data for China in Table 2 and Hubei in Table 3 provides a good fit for CR(t) in (4), while the data for Wuhan in Table 4 does not provide a good fit for CR(t) in (4).

Remark 2.
As long as the number of reported cases follows (1), we can predict the future values of CR(t). For From now on, we fix the fraction f of symptomatic infectious cases that are reported. We assume that between 80% and 100% of infectious cases are reported. Thus, f varies between 0.8 and 1. We assume 1/ν, and the average time during which the patients are asymptomatic infectious varies between one day and seven days. We assume that 1/η, the average time during which a patient is symptomatic infectious, varies between one day and seven days. Thus, we fix f , ν, η. Since f and ν are known, we can compute Moreover, by following the approach described in the supplementary information, we should have and By using the approach described in the supplementary material, the basic reproductive number for model (1) is given by By using ν 2 = (1 − f ) ν and (7), we obtain

Numerical Simulations
We can find multiple values of η, ν and f which provide a good fit for the data. For application of our model, η, ν and f must vary in a reasonable range. For the corona virus COVID-19 epidemic in Wuhan at its current stage, the values of η, ν and f are not known. From preliminary information, we use the values f = 0.8, η = 1/7, ν = 1/7.
In what follows, we plot the graphs of t → CR(t), t → U(t), and t → R(t) for Wuhan by using model (1). We define the turning point t p as the time at which the red curve (i.e., the curve of the non-cumulated reported infectious cases) reaches its maximum value. For example, in the figure below, the turning point t p is day 54, which corresponds to 23 February for Wuhan.
In the following, we take into account the fact that very strong isolation measures have been imposed for all China since 23 January. Specifically, since 23 January, families in China are required to stay at home. In order to take into account such a public intervention, we assume that the transmission of COVID-19 from infectious to susceptible individuals stopped after 25 January. Therefore, we consider the following model: for t ≥ t 0 , The figure below takes into account the public health measures, such as isolation, quarantine, and public closings, which correspond to models (10) and (11). By comparison of Figure 4a with Figure 5, we note that these measures greatly mitigate the final size of the epidemic, and shift the turning point about 24 days before the turning point without these measures.   . We use f = 0.8, η = 1/7, ν = 1/7, and S 0 = 11.081 × 10 6 . The remaining parameters are derived by using (6)- (8). We obtain τ = 4.44 × 10 −08 , I 0 = 3.62 and U 0 = 0.2. The cumulated number of reported cases goes up to 8.5 million people and the turning point is day 54. Thus, the turning point is 23 February (i.e., 54-31).

Discussion
An epidemic outbreak of a new human coronavirus COVID-19 has occurred in Wuhan, China. For this outbreak, the unreported cases and the disease transmission rate are not known. In order to recover these values from reported medical data, we present the mathematical model (1) for outbreak diseases. By knowledge of the cumulative reported symptomatic infectious cases, and assuming (1) the fraction f of asymptomatic infectious that become reported symptomatic infectious cases, (2) the average time 1/ν asymptomatic infectious are asymptomatic, and (3) the average time 1/η symptomatic infectious remain infectious, we estimate the epidemiological parameters in model (1). We then make numerical simulations of the model (1) to prodict forward in time the severity of the epidemic. We observe that public health measures, such as isolation, quarantine, and public closings, greatly reduce the final size of the epidemic, and make the turning point much earlier than without these measures. We observe that the predictive capability of model (1) requires valid estimates of the parameters f , ν, and η, which depend on the input of medical and biological epidemiologists. Our results can contribute to the prevention and control of the COVID-19 epidemic in Wuhan.
As a consequence of our study, we note that public health measures, such as isolation, quarantine, and public closings, greatly reduce the final size of this epidemic, and make the turning point much earlier than without these measures. With our method, we fix η, ν, and f and get the same turning point for the three data sets in Tables 2-4. We choose f = 0.8, which means that around 80% of cases are reported in the model, since cases are very well documented in China. Thus, we only assume that a small fraction, 20%, were not reported. This assumption may be confirmed later on.
We also vary the parameters η, ν, and f , and we do not observe a strong variation of the turning point. Nevertheless, the number of reported cases are very sensitive to the data sets, as shown in the figures. Formula (4) for CR(t) is very descriptive until 26 January for the reported case data for China and Hubei but is not reasonable for Wuhan data. This suggests that the turning point is very robust, while the number of cases is very sensitive. We can find multiple values of η, ν, and f that provide a good fit for the data. This means that η, ν, and f should also be evaluated using other methods. The values 1/η = 7 days and 1/ν = 7 days are taken from information concerning earlier corona viruses, and are used now by medical authorities [2].
The predictive capability of models (1) and (10) requires valid estimates of the parameters f (fraction of asymptomatic infectious that become reported as symptomatic infectious), the parameter 1/ν (average time asymptomatic infectious are asymptomatic), and the parameter 1/η (average time symptomatic infectious remain infectious). In Figure 4, we graph R 0 as a function of f and 1/ν for the data in Table 2, to illustrate the importance of these values in the evolution of the epidemic. The accuracy of these values depend on the input of medical and biological epidemiologists. Figure 6. In this figure, we use 1/η = 7 days, and we plot the basic reproductive number R 0 as a function of f and 1/ν using (9) with χ 2 = 0.38, which corresponds to the data for China in Table 2. If both f and 1/ν are sufficiently small, R 0 < 1. The red plane is the value of R 0 = 4.13.
In influenza epidemics, the fraction f of reported cases may be significantly increased by public health reporting measures, with greater efforts to identify all current cases. Our model reveals the impact of an increase in this fraction f in the value of R 0 , as evident in Figure 6 above, for the COVID-19 epidemic in Wuhan.

Method to Estimate the Parameters of (1) from the Number of Reported Cases
In the following the parameters f , ν and η are fixed.
Step 1: Since f and ν, we know that Step 2: By using Equation (3), we obtain and exp (χ 2 t) exp (χ 2 t 0 ) = I(t) I(t 0 ) , and therefore Moreover, by using (12) at t = t 0 , Step 3: In order to evaluate the parameters of the model, we replace S(t) by S 0 = 11.081 × 10 6 on the right-hand side of (1) (which is equivalent to neglecting the variation of susceptibles due to the epidemic, which is consistent with the fact that t → CR(t) grows exponentially). Therefore, it remains to estimate τ and η in the following system: By using the first equation, we obtain and therefore, by using (13), we must have so, by substituting these expressions into (15), we obtain Remark 3. Here, we fix τ in such a way that the value χ 2 becomes the dominant eigenvalue of the linearized Equation (21) and (I 0 , U 0 ) is the positve eigenvector associated with this dominant eigenvalue χ 2 . Thus, we apply implicitly the Perron-Frobenius theorem. Moreover, the exponentially growing solution (I(t), U(t)) that we consider (which is starting very close to (0,0)) follows the direction of the positive eigenvector associated with the dominant eigenvalue χ 2 .
By dividing the first equation of (16) by I 0 we obtain By using the second equation of (16), we obtain By using (17) and (18), we obtain By using (18), we can compute

Computation of the Basic Reproductive Number R 0
In this section, we apply results in Diekmann, Heesterbeek, and Metz [7] and Van den Driessche and Watmough [8]. The linearized equation of the infectious part of the system is given by The corresponding matrix is A = τS 0 − ν τS 0 ν 2 −η and the matrix A can be rewritten as A = V − S where V = τS 0 τS 0 ν 2 0 and S = ν 0 0 η . Therefore, the next generation matrix is which is a Leslie matrix, and the basic reproductive number becomes By using (19), we obtain and, by using ν 2 = (1 − f ) ν, we obtain