From a scientific perspective, the COVID-19 pandemic has highlighted the crucial role of mathematical and statistical models in providing guidance for health policies. Expressions such as “flatten the curve”, the “apex” and the “plateau” have been widely heard in the media and employed by decision makers to explain their choices regarding rules and policies during this critical period. In this short article, we first introduce a simple Susceptible-Infected-Recovered (SIR) model, in which we adjust a key parameter k, standing for a control on the Susceptible-Infected rate, and secondarily the death rate, in order to fit the data of the pandemic in New York (NY) state in March 2020 and provide predictions for the near future. Then, we add a node to the model to take into account the daily fluxes between NY and New Jersey (NJ) states. Note that these two neighboring states are, up to the day of publication of this article, the most severely hit by the pandemic in the United States. Of course, the coupling may be extended to other states. However, in this article, we restrict ourselves to NY and NJ. Accordingly, the main key points of this article are that (1) it highlights the dynamics and epidemiological characteristics which have been discussed in the press and in health policies, showing qualitatively how lockdown policies have decreased the spread of the virus and providing a prediction and explanation of an upcoming apex, (2) it fits real data provided for New York state and (3) it fits the data of NJ state by considering coupled equations taking into account the daily fluxes between NY and NJ. This provides a quantitative visualization of how the virus may spread from an attractive hot spot (New York City in NY state) towards nearby states through the daily fluxes of commuters.

We especially focus on fitting the total number of cases tested positive for COVID-19 as well as the number of deaths in both NY and NJ states. We also give insights into predicting the number of people needing hospitalization in NY state.

This simple model classically has three classes: S for susceptible, I for infected and R for recovered. Specifically, the class I is intended to represent all the people who actually carry the virus at a given time and can transmit it if in contact with other people. It includes all infected people, with or without symptoms, reported or not. There are some differences with Equation (1). First, it includes a death rate $d\left(t\right)$. Even though the number of deaths does not appear explicitly as a variable, it is simply given by the integral
${\int}_{0}^{t}d\left(u\right)I\left(u\right)du$. Note that in the expression $k\left(t\right)\frac{S}{S+I+R}I$, the rate of contamination from S to I is proportional to the proportion of susceptible individuals (S) in the whole population ($S+I+R$). This is a classical expression reflecting the fact that the probability for each individual in the I class to spread the virus among the class S is proportional to the share of S in the whole population; see for example [14, 15] and references cited therein. This rate is corrected by a crucial coefficient
$k\left(t\right)$ which is intended to fit the real transfer rate and which contains the effects inherent to the properties of the virus (for example, a change of propagation rate due to genetic mutation of the virus) or to specific policies (like quarantine, social distancing, lockdown). This time dependence allows us to adjust the dynamics to fit the data. This is a specificity of our model and turns it into a non-autonomous equation. This time dependence of k is obviously relevant in our model since the rate of transfer from S to I is the main target of health policies and is therefore liable to vary over time. Secondarily, we also allow the death rate to vary. Many internal or external factors may affect the death rate, among which are concomitant lethal diseases, temperature and hospital conditions. More significantly, one has to note that the transfer rates considered here are instantaneous transfers between compartments, and the function
$d\left(t\right)$ is different from the Case Fatality Rate (CFR). Recall that the CFR is the death rate per confirmed case over a given period of time, and is a typical indicator of the death rate. In South Korea, the country which conducted the highest number of tests, it has been reported to be 1 percent; see [16]. In China as of 20 February, this rate varied between 3.8 percent in the region of Wuhan and 0.7 percent in other regions; see [17] and also [18]. Using Table 1 and Table 2 gives a CFR in NY from 1 March to 1 April of $\frac{1941}{83804}\simeq 2.3$ percent. Since our model fits the data, by definition, it fits the CFR. At the end of the epidemic, if everyone who contracted the virus were tested positive, the CFR would give the probability of dying for an individual who catches the virus. However, during a growing phase such as the month of March considered here, the CFR varies widely. Moreover, there is a delay between the time a person tests positive and the time of death. The time between symptom onset and death has been reported to range from about 2 to 8 weeks, see [18], the typical average being 23 days according to [16]. One could for example introduce a time-integral death expression
${\int}_{0}^{t}\tilde{d}(u,t)I\left(u\right)du$ to take this information into account. However, the time window considered here is short and corresponds to the beginning of reported cases in NY. Furthermore, in this short article, we wanted to focus on a simple model able to fit data, highlight relevant dynamics and provide estimates. Since a person in the I compartment will either recover or die, the above remarks on the function d also hold for the coefficient r. In the present work, for the sake of simplicity, we have set the r coefficient to the constant value $0.64$. This is a classical simplification in SIR models. Note that setting the coefficient
r to a constant value is equivalent to assuming that the number of people recovering between times $t-1$ and t, which is given by $R\left(t\right)-R(t-1)$, can also be written $r{\int}_{t-1}^{t}I\left(u\right)du$ according to Equation (2). A more meaningful expression for $R\left(t\right)-R(t-1)$ would be ${\int}_{0}^{t}\tilde{r}(u,t)I\left(u\right)du$, reflecting the fact that people recovering between times $t-1$ and t have been infected at some time ranging from 0 to t, with a transfer rate given by $\tilde{r}(u,t)$. This would lead to the equation $r=\frac{{\int}_{0}^{t}\tilde{r}(u,t)I\left(u\right)du}{{\int}_{t-1}^{t}I\left(u\right)du}$. Once
r is set to a constant value, to fit data over a reasonable period, numerical experiments provide a unique choice for the constants k and d. It was still possible to use different values of r to fit the data. However, different values of r result in different dynamics over time, and notably a different time for the apex. Our choice of $r=0.64$ was made to provide dynamics that seem relevant to us regarding the time course beyond the data. In particular, values of r that are too small would provide dynamics with a late apex, which are less relevant regarding the timing of the disease. Other studies have considered models with a constant r, with different values. See for example [19] and references cited therein. Remarkably, varying r, and making it depend on time, provides some freedom to later fit data over a longer period of time, taking into account the end of the first wave in NY.
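
To make the preceding discussion concrete, the following Python sketch integrates a time-dependent SIR model of the kind described above. It assumes that the model reads $S'=-k(t)\frac{S}{S+I+R}I$, $I'=k(t)\frac{S}{S+I+R}I-(r+d(t))I$, $R'=rI$, which is our reading of the text rather than a quotation of the model equations. Only $r=0.64$ and the counts 1941 and 83804 come from the text. The population size, the initial number of infected, the lockdown day and the values of k and d are hypothetical placeholders, not fitted values. The extra variable D accumulates ${\int}_{0}^{t}d\left(u\right)I\left(u\right)du$, the number of deaths.

```python
# Sketch of a time-dependent SIR model with a death rate, integrated with RK4.
# ASSUMPTION: the model form is inferred from the text; every numeric value
# except r = 0.64 and the CFR counts is a hypothetical placeholder.

R_COEFF = 0.64  # constant recovery coefficient r quoted in the text


def rhs(t, y, k, d, r):
    """Right-hand side for the state y = (S, I, R, D); D counts deaths."""
    s, i, rec, _ = y
    new_infections = k(t) * s * i / (s + i + rec)  # k(t) * S/(S+I+R) * I
    return (
        -new_infections,
        new_infections - (r + d(t)) * i,
        r * i,
        d(t) * i,  # dD/dt, so D(t) is the integral of d(u) I(u) du
    )


def rk4_step(t, y, h, k, d, r):
    """Advance the state by one classical Runge-Kutta step of size h."""
    def shifted(slope, factor):
        return [yi + factor * si for yi, si in zip(y, slope)]

    k1 = rhs(t, y, k, d, r)
    k2 = rhs(t + h / 2, shifted(k1, h / 2), k, d, r)
    k3 = rhs(t + h / 2, shifted(k2, h / 2), k, d, r)
    k4 = rhs(t + h, shifted(k3, h), k, d, r)
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + e)
        for yi, a, b, c, e in zip(y, k1, k2, k3, k4)
    ]


def simulate(k, d, y0, days, r=R_COEFF, steps_per_day=20):
    """Return the state at the end of each day, starting with day 0."""
    h = 1.0 / steps_per_day
    t, y = 0.0, list(y0)
    daily = [tuple(y)]
    for _day in range(days):
        for _step in range(steps_per_day):
            y = rk4_step(t, y, h, k, d, r)
            t += h
        daily.append(tuple(y))
    return daily


def cfr_percent(deaths, confirmed):
    """Case Fatality Rate: deaths per confirmed case, in percent."""
    return 100.0 * deaths / confirmed


# Hypothetical scenario: k drops on a given day to mimic a lockdown.
POPULATION = 19_450_000  # rough NY state population (assumed)
LOCKDOWN_DAY = 22        # hypothetical day on which k drops


def k_of_t(t):
    return 0.95 if t < LOCKDOWN_DAY else 0.60  # hypothetical values


def d_of_t(t):
    return 0.005  # hypothetical constant death rate


daily = simulate(k_of_t, d_of_t, (POPULATION - 100.0, 100.0, 0.0, 0.0), days=60)
apex_day = max(range(len(daily)), key=lambda day: daily[day][1])
print("apex day:", apex_day)
print("deaths at day 60:", round(daily[-1][3]))
print("CFR, 1 March to 1 April:", round(cfr_percent(1941, 83804), 1), "percent")
```

In this sketch, changing `LOCKDOWN_DAY` or the two values returned by `k_of_t` shifts the apex of I, which illustrates why the time dependence of k is the main lever for fitting the data. Changing `R_COEFF` while keeping the early growth rate $k-r-d$ fixed gives a different long-term course, in line with the remark that different values of r lead to different times for the apex.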