1. Introduction
In December 2019, China reported 27 cases of pneumonia of unknown origin. The causative agent was identified as a new virus, named SARS-CoV-2, and the associated disease has been called COVID-19. On March 11, the World Health Organization (WHO) declared the outbreak a pandemic. From the beginning of the epidemic to the date of this paper (April 20), more than two and a half million cases have been reported worldwide, more than 200,000 of them in Spain.
Unlike other pandemics that originated in less economically developed nations, COVID-19 has emerged in several of the world's largest economies, spreading across the globe and causing enormous disruption to economic, working, and social life. To analyze the impact on healthcare economics and policies, accurate and reliable data are essential.
As the health authorities themselves have recognized, the official data on COVID-19 are incomplete and inconsistent. Reported values are far from those obtained with mathematical modeling [1,2]. For that reason, we have based our analysis on more reliable information: mortality. According to the daily mortality monitoring report for Spain from the Ministry of Justice (MoMo), mortality has increased unexpectedly compared with the same period in 2019 [3]. Given that there are currently no other epidemic diseases, we assume that this increase corresponds, directly or indirectly, to undetected cases of COVID-19. Based on this hypothesis, we estimated the expected casualties, the CFR (case fatality rate), the IFR (infection fatality rate), and the percentage of the population that may test positive in serological testing for COVID-19 in Spain, using a mathematical model and a meta-analysis with easily obtainable data from previous reports. The mathematical model is applied to the outbreak in Spain starting on March 9. This procedure may be especially useful for making estimates in the event of a disease outbreak in the coming months. The estimation is used as a guide to assess the suitability of health policies based on herd immunity in Spain.
2. Materials and Methods
Two different mathematical procedures are used. The first starts on the first date with more than 30 new deaths per day and extends to the calculated point with fewer than 30 new deaths per day, assuming a lockdown of 45 days, no significant changes in the course of the outbreak, and no rebounds. The number of deaths at a given time t (today) is the sum of the past infections weighted by their probability of death, where the probability of death depends on the number of days since infection.
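Using a previously reported methodology [4], the expected number of deaths, $d_t$, on a given day $t$ is given by the following discrete sum:

$$d_t = \sum_{\tau=0}^{t-1} c_\tau \, \pi_{t-\tau}$$

where $c_\tau$ is the number of new infections on day $\tau$ and $\pi_{t-\tau}$ is the probability of death on day $t$ for those infected on day $\tau$; it may be discretized via

$$\pi_s = \int_{s-0.5}^{s+0.5} \pi(\tau)\, d\tau$$

for $s = t - \tau$ with $s = 2, 3, \ldots$, ending when the daily casualties are <30, and

$$\pi_1 = \int_{0}^{1.5} \pi(\tau)\, d\tau$$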
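The discrete sum above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the infection-to-death density is log-normal and that the overall probability of death is a fixed fraction `ifr`; the names `death_weights` and `expected_deaths` and the parameters `mu`, `sigma`, and `ifr` are hypothetical and are not taken from this study or from [4].

```python
import math

def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal distribution (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def death_weights(max_days, ifr, mu, sigma):
    """Discretized probability of dying s days after infection, s = 1..max_days.

    The density is integrated over [0, 1.5] for s = 1 and over
    [s - 0.5, s + 0.5] for s >= 2, then scaled by the ASSUMED fraction ifr.
    """
    weights = []
    for s in range(1, max_days + 1):
        lower = 0.0 if s == 1 else s - 0.5
        upper = s + 0.5
        mass = lognormal_cdf(upper, mu, sigma) - lognormal_cdf(lower, mu, sigma)
        weights.append(ifr * mass)
    return weights

def expected_deaths(infections, weights):
    """Expected deaths per day: d_t = sum over tau < t of c_tau * pi_(t - tau)."""
    deaths = []
    for t in range(len(infections)):
        total = 0.0
        for tau in range(t):
            s = t - tau
            if s <= len(weights):
                total += infections[tau] * weights[s - 1]
        deaths.append(total)
    return deaths

# Hypothetical input: 1000 infections on day 0 and none afterwards.
infections = [1000] + [0] * 59
weights = death_weights(60, ifr=0.01, mu=math.log(20), sigma=0.4)
deaths = expected_deaths(infections, weights)
```

With a single cohort infected on day 0, the daily deaths simply trace the discretized delay distribution scaled by the cohort size, which makes the weighting easy to inspect before real incidence data are used.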
In forecasting new outbreak data at a time $t$, $i(t)$, it may be useful to consider models with a time-related exponential growth rate $r$ [5]:

$$i(t) = i_0 \, e^{rt}$$

where $i_0$ is the expected number of infected cases at time $t = 0$.
The cumulative incidence $I(t)$ is the integral of $i(t)$ over the period from 0 to $t$:

$$I(t) = \int_0^t i(s)\, ds = \frac{i_0 \left(e^{rt} - 1\right)}{r}$$

The cumulative incidence may be adjusted to the date of report by a factor $u$ that depends on the parameters of the delay distribution. To estimate the distribution of the time delay from onset of disease to death, the authors corrected for truncation and fitted a log-normal distribution [6].
For a log-normal delay distribution, the factor $u$ is the multiplier that adjusts $I(t)$, by date of report $t$, for the time from onset to death. The expression for $u$ follows [5].
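The exponential growth model and its cumulative incidence can be checked numerically. The following Python sketch compares the closed-form integral of an exponential incidence with a trapezoidal approximation; the values of `i0` and `r` are hypothetical and chosen only for illustration, and the adjustment factor u is deliberately left out.

```python
import math

def incidence(t, i0, r):
    """New infections at time t under exponential growth: i0 * exp(r * t)."""
    return i0 * math.exp(r * t)

def cumulative_incidence(t, i0, r):
    """Closed-form integral of the incidence from 0 to t."""
    return i0 * (math.exp(r * t) - 1.0) / r

def cumulative_incidence_numeric(t, i0, r, steps=10_000):
    """Trapezoidal approximation of the same integral, used as a cross-check."""
    h = t / steps
    total = 0.5 * (incidence(0.0, i0, r) + incidence(t, i0, r))
    for k in range(1, steps):
        total += incidence(k * h, i0, r)
    return total * h

# Hypothetical parameters: 10 initial cases, growth rate 0.15 per day, 30 days.
closed_form = cumulative_incidence(30, i0=10, r=0.15)
numeric = cumulative_incidence_numeric(30, i0=10, r=0.15)
```

Agreement between the two values confirms that the closed form is the integral of the incidence curve under the stated model.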
It is also possible to evaluate the effects of lockdown with transmission models that use a Bayesian framework to jointly infer parameters, as done by the French Pasteur group [7] (p.12), in our case for 45 days of lockdown. Other models analyze the serial interval (the time between symptom onset in a primary case and in a secondary case). However, owing to the uncertainty about the real number of infected cases, this mathematical approach has been used only to check whether the approximate data obtained with the meta-analysis were consistent.
3. Results
According to information from MoMo [3], the excess (unexpected) and observed all-cause deaths reported for the period from March 17 to April 18 were 25,907 and 63,676, respectively, with the excess representing 40.7% of the total (Table 1). The expected value of 37,769 is consistent with the 2018 report of the National Institute of Statistics of Spain (INE) [8], which gives a daily average of 1172 deaths. COVID-19 deaths reported by the Ministry of Health as of April 18 numbered 20,043 (Ministry of Health, Spain, daily release information: https://www.mscbs.gob.es/profesionales/saludPublica/ccayes/alertasActual/nCov-China/home.htm (accessed April 19, 2020)).
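The relationship between these figures can be verified with a few lines of Python. The sketch only restates the arithmetic of the MoMo values quoted above; the variable names are illustrative, and the 33-day count assumes that both March 17 and April 18 are included.

```python
observed_deaths = 63_676   # all-cause deaths, March 17 to April 18 (MoMo)
excess_deaths = 25_907     # deaths above the expected baseline (MoMo)

# Expected (baseline) deaths are the observed deaths minus the excess.
baseline_deaths = observed_deaths - excess_deaths

# Share of all observed deaths that is unexpected.
excess_share_pct = 100 * excess_deaths / observed_deaths

# Average expected deaths per day over the period (ASSUMED 33 days, inclusive).
days = 33
baseline_per_day = baseline_deaths / days

print(baseline_deaths)              # 37769
print(round(excess_share_pct, 1))   # 40.7
print(round(baseline_per_day))      # 1145
```

The resulting daily baseline of roughly 1145 deaths is of the same order as the INE daily average of 1172.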
Using the adjusted casualty values for 45 days, starting on March 9, a prediction curve for the estimated duration of the outbreak was constructed (Figure 1).
The evolution follows a right-skewed curve (skewness = 0.53, kurtosis = −1.10) with mean = 9.4058 and standard error of the mean (SEM) = 1.003517915. This study focuses on a point estimate of the cumulative data (casualties) at the end of the outbreak (defined in this study as fewer than 30 new daily deaths in Spain).
If the total predicted duration of this (first) outbreak, 69 days, is correct (Figure 1), the adjusted overall total mortality in Spain would be about 30,568 deaths (27,307–33,830). This corresponds to 649 deaths per million population (PMP) for this outbreak (0.0649%) [9].
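As a worked check of the per-million figure, the following Python sketch converts the estimated deaths into deaths per million population. It assumes the Spanish population of 47.1 million used in the Discussion; the function name `per_million` is illustrative.

```python
POPULATION = 47_100_000  # Spanish population assumed in this paper

def per_million(deaths, population=POPULATION):
    """Deaths per million population (PMP)."""
    return deaths / population * 1_000_000

central = per_million(30_568)
lower, upper = per_million(27_307), per_million(33_830)

# PMP divided by 10,000 gives the same quantity as a percentage.
central_pct = central / 10_000

print(round(central), round(lower), round(upper))  # 649 580 718
print(round(central_pct, 4))                       # 0.0649
```

Under the same population assumption, the interval of 27,307–33,830 deaths corresponds to roughly 580–718 PMP.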
The estimation for the regions (CCAA), based on the relative ratio of reported casualties PMP, is presented in Table 2, including the total number of casualties and the case fatality rate adjusted PMP. This computation assumes a CFR that is equally distributed among the regions and depends only on population, an assumption that is unproven at present.
For the meta-analysis procedure to estimate seroprevalence, it is necessary to determine the proportion of symptomatic versus asymptomatic patients. The two key figures, CFR and IFR, are forecasted based on previously available reports.
The report from the Diamond Princess cruise ship, where an outbreak occurred and which was quarantined from January 20 to February 29, 2020 [10], is particularly informative. Of a total of 3711 people on board (passengers and crew), 705 became sick (19.0%) and seven died (crude mortality in the sample of 0.19%). By February 20, 2020, 3063 PCR tests had been performed, with a positive result in 634 people (20.7%), 476 of them over 60 years of age. Of the 634 confirmed cases, approximately half (306) were asymptomatic, and 313 were female. The age pattern was: 6 aged 0–19 years (0.94%), 152 aged 20–59 years (23.98%), and 476 aged 60 years and older (75.08%), with a genetic diversity from 28 countries [11]. The data were statistically modeled, and observations were treated as survival data with right censoring. The probability of being asymptomatic once infected and the infection time for each case were estimated using a Hamiltonian Monte Carlo algorithm. The estimated total number of true asymptomatic cases was 113.3 (95% credible interval: 98.2–128.3) and the estimated asymptomatic proportion among all infected cases was 17.9% (95% credible interval: 15.5–20.2%). The results are summarized in Table 3.
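The crude proportions quoted for the Diamond Princess can be recomputed directly from the counts. The Python sketch below only restates that arithmetic; the variable names are illustrative, and recomputed age-band shares may differ from the quoted values in the last digit because of rounding.

```python
on_board = 3711
sick = 705
deaths = 7
pcr_tests = 3063
pcr_positive = 634
age_bands = {"0-19": 6, "20-59": 152, "60+": 476}  # PCR-confirmed cases by age

def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

# The three age bands should account for every confirmed case.
bands_total = sum(age_bands.values())

attack_rate = pct(sick, on_board)            # share of people on board who became sick
positivity = pct(pcr_positive, pcr_tests)    # share of PCR tests that were positive
crude_mortality = pct(deaths, on_board)      # deaths among everyone on board
age_shares = {band: pct(n, pcr_positive) for band, n in age_bands.items()}
```

Keeping the denominators explicit (people on board, tests performed, confirmed cases) helps avoid mixing crude mortality with case fatality.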
This reported probability of death following the development of symptoms (1.35%) is very close to the 1.4% (0.9–2.1%) published in China from an analysis of 79,394 cases [12] and to the result (1.6%) of another report [13]. Starting from the number of deaths, this allows us to estimate the number of symptomatic cases. However, as seen, the crude mortality percentage is about 4 times higher than the 0.06% previously found for the general population. This underlines the importance of age band. As mentioned, more than 75% of the PCR-detected cases were aged 60 or older. All 7 deaths reported from the Diamond Princess cruise ship were patients aged 70 years or older [14].
In the second study used in the meta-analysis, one from China with 72,314 case records [15], CFR increased to 8% in patients aged 70–79 years and to 14.8% in patients aged 80 years or more, with an overall CFR of 2.3%. Other papers report CFR values of around 5% [7,16,17]. Once more, the age pyramid is of paramount importance in mortality (Table 4), as is access to the ICU, particularly for elderly patients.
In the stepwise process, it must be considered that age, gender, and comorbidity, particularly cardiovascular comorbidity, play important roles in the final CFR. Oke and co-workers reported data from the scientific adviser to the Italian Health Ministry (Professor Walter Ricciardi) indicating that 88% of Italian death certificates related to COVID-19 included at least one pre-morbidity, and frequently two or three [18]. Consequently, evaluating the age, gender, and comorbidity profile becomes crucial when comparing different data series. The Italian series dated March 26, which includes 73,780 cases [19], provides CFR information closer to our Mediterranean society (Table 5).
To better account for age band in future estimations, the report from The Centre for Evidence-Based Medicine [18] reproduces the whole series of the Italian report [19], describing the statistics as «a grouped-binomial logistic regression with log-link function with main effects for age-band and sex (no two-way interaction terms). Deviance statistic is 30.9 on 6 degrees of freedom» (para.16), and provides a table of risk ratios (Table 6) taking the 60–69 age band as reference.
4. Discussion
During the COVID-19 outbreak, the data reported by the authorities have proved to be inconsistent. We estimated that deaths were under-reported by 29%, close to the inconsistencies of about 24% found in the UK [20] and to those found in other countries [21,22]. A median delay of 13 days from illness onset to death (17 days with right truncation) [6] and a median basic reproduction number (R0) of 4–6, not far from the range (2–5) found for other SARS viruses, such as in the Singapore outbreak [5,23,24], have been reported. The casualties we found for the period studied (649 PMP) are also consistent with other reports [9], and the characteristics of the curve, including a duration of about 10 weeks, are consistent with an RNA virus pattern [25].
When information is incomplete, as in the COVID-19 outbreak, the death rate may provide the most reliable information to begin with, but an important point when comparing different fatality ratios is to analyze data adjusted to the corresponding age band. In this regard, we propose an additional reference index, the age-adjusted case fatality ratio (aaCFR), based on the risk ratio with one age band (e.g., 60–69) set as reference. Taking the age bands into consideration, the estimation of the evolution of casualties may be more precise. Once the number of casualties has been determined with the mathematical model, the meta-analysis using available data from the literature allows the estimation of CFR, IFR, and seroprevalence.
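The idea of a risk ratio relative to a reference age band can be sketched in Python as follows. This is a crude ratio of band-specific CFRs, not the grouped-binomial regression with sex effects described for the Italian series, and the counts in `example_bands` are HYPOTHETICAL values chosen only to show the calculation; they are not data from any of the cited reports.

```python
def case_fatality_ratio(deaths, cases):
    """Crude CFR for one age band."""
    return deaths / cases

def risk_ratios(bands, reference):
    """CFR of each age band divided by the CFR of the reference band."""
    ref_cfr = case_fatality_ratio(*bands[reference])
    return {
        band: case_fatality_ratio(deaths, cases) / ref_cfr
        for band, (deaths, cases) in bands.items()
    }

# HYPOTHETICAL (deaths, cases) per age band, for illustration only.
example_bands = {
    "50-59": (10, 1000),
    "60-69": (35, 1000),
    "70-79": (80, 1000),
    "80+": (150, 1000),
}

ratios = risk_ratios(example_bands, reference="60-69")
```

By construction the reference band has a ratio of 1, so series from different countries can be compared on a common scale even when their overall CFRs differ because of age structure.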
Based on the above-mentioned reports, the CFR in Spain could be 4–7%, which is one third to one half of the 15% CFR reported by the WHO for SARS [26] (p.10), although according to that report a «global case-fatality ratio of 11% was recorded at the end of the outbreak». Consequently, the two SARS outbreaks may not differ greatly in fatality rate.
In the study of the Diamond Princess cruise ship, the IFR was about half the CFR [10,11] (CFR 2.3% (95% CI 0.75–5.3%) and IFR 1.2% (95% CI 0.38–2.7%)). Together with data from Wuhan and other reports, including those from the WHO [27,28], this allows us to forecast that the IFR is about half the CFR. This ratio is supported by a computation using Bayesian Markov chain Monte Carlo methodology in an age-stratified CFR and IFR model, which gave an (adjusted) IFR/CFR ratio close to 0.5 (0.478) [29]. The IFR could be analyzed by predicting the attack rate for each age group [4], but since the main interest here was to move from mortality data to the overall population affected, in order to evaluate the gross number of possibly infected patients, this age-band analysis is not essential.
It is difficult to make a comparison with influenza A (H1N1), as a review of 77 CFR estimates from 50 studies showed substantial heterogeneity, ranging from fewer than 1 to more than 10,000 deaths per 100,000 cases or infections [30]. The official report of the Spanish influenza surveillance group computed a CFR of 0.43 deaths per 1000 cases for the 2009 (H1N1) pandemic [31].
The number of infected patients, assuming most of them will develop herd immunity (natural immunity)—something far from being proved—could be a gross indirect index of the extension and severity of future outbreaks.
Following mainly the Italian report, a crude CFR of 10% applied to the 30,568 casualties estimated by mathematical modeling at the end of the outbreak implies a crude IFR of 5% at most, or around 0.6 million infected patients, including both symptomatic and asymptomatic cases (1.3% of a Spanish population of 47.1 million). This figure is about half the lower bound of the range predicted for Spain by the Imperial College report [4]. If the lower WHO estimate relating CFR and IFR (1/3) is considered, the value is 0.4 million infected patients (0.87% of tests would be positive).
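The step from deaths to infected population can be made explicit with a short Python sketch. It assumes one particular reading of the two scenarios: in the first, the IFR is half the CFR (as many asymptomatic as symptomatic cases); in the second, asymptomatic cases number one third of the symptomatic ones. This reading reproduces the figures quoted above but is an interpretation, and the variable names are illustrative.

```python
deaths = 30_568          # estimated casualties at the end of the outbreak
population = 47_100_000  # Spanish population used in the text
crude_cfr = 0.10         # crude CFR, following the Italian series

# Symptomatic cases implied by the deaths and the crude CFR.
symptomatic = deaths / crude_cfr

# Scenario 1 (ASSUMED reading): IFR is half the CFR.
infected_half = deaths / (crude_cfr / 2)

# Scenario 2 (ASSUMED reading): asymptomatic cases are 1/3 of symptomatic cases.
infected_third = symptomatic * (1 + 1 / 3)

share_half_pct = 100 * infected_half / population
share_third_pct = 100 * infected_third / population

print(round(infected_half), round(share_half_pct, 1))    # 611360 1.3
print(round(infected_third), round(share_third_pct, 2))  # 407573 0.87
```

Both scenarios start from the same death estimate, so the spread between them reflects only the uncertainty about the proportion of asymptomatic infections.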
Another indirect and approximate estimate of the highest percentage of seropositive cases can be obtained from the number of hospitalized patients in Spain, who represent 55% of the total number of declared cases [32]. Let us assume that this percentage does not result from health policies and protocols in Spain (or Italy) that differ from those of China and other countries, but rather from incomplete information on cases. Let us also assume that the number of hospitalized patients corresponds in reality to only 15–20% of the infected population, in line with the WHO report [28]. With about 225,000 declared cases estimated at the end of this outbreak (defined, as mentioned, as fewer than 30 new deaths per day), this represents approximately 123,000 hospitalized patients; if this value is only 15–20% of total symptomatic cases, it implies 615,000–819,000 cases. Taking the highest value and the highest ratio of asymptomatic to symptomatic cases (0.5), that extreme limit gives an estimate of about 1.2 million people who have come into contact with the virus (with or without symptoms). Assuming that all of them test positive, this upper estimate represents only 2.6% of the population.
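The upper-bound estimate based on hospitalizations can be retraced step by step in Python. The sketch assumes that the ratio of 0.5 means one asymptomatic case for every two symptomatic cases; the intermediate values differ slightly from those in the text because the text rounds the number of hospitalized patients to 123,000. Variable names are illustrative.

```python
population = 47_100_000
declared_cases = 225_000     # declared cases estimated at the end of the outbreak
hospitalized_share = 0.55    # hospitalized patients as a share of declared cases

hospitalized = declared_cases * hospitalized_share

# If hospitalized patients are only 15-20% of symptomatic cases:
symptomatic_high = hospitalized / 0.15
symptomatic_low = hospitalized / 0.20

# ASSUMED reading: asymptomatic cases equal 0.5 times the symptomatic cases.
upper_infected = symptomatic_high * (1 + 0.5)
upper_share_pct = 100 * upper_infected / population

print(round(hospitalized))                      # 123750
print(round(symptomatic_low), round(symptomatic_high))  # 618750 825000
print(round(upper_infected / 1e6, 1))           # 1.2
print(round(upper_share_pct, 1))                # 2.6
```

Because every assumption in this chain is chosen to maximize the result, the 2.6% figure is best read as a ceiling rather than a central estimate.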
Consequently, serological analysis is expected to show immunity in about 0.87–1.3% of the population, a value close to those of other preliminary studies, such as the one from Stanford University in Santa Clara with 3324 cases and a result of 1.5% (exact binomial 95% CI 1.1–2.0%). The specificity of their test was 99.5% (95% CI 99.2–99.7%) and its sensitivity was 82.8% (95% CI 76.0–88.4%). The unweighted prevalence adjusted for test performance characteristics was 1.2% (95% CI 0.7–1.8%). After weighting for the population demographics of Santa Clara County, the prevalence was 2.8% (95% CI 1.3–4.7%), using bootstrap to estimate confidence bounds [33]. In our case, the value is less than 3%.
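The adjustment of a raw prevalence for test performance can be illustrated with the standard Rogan-Gladen correction. The sketch below assumes that this correction is applicable to the unweighted figures quoted above; the text does not state which formula the Santa Clara study used, and the function name `adjust_prevalence` is illustrative.

```python
def adjust_prevalence(raw, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from apparent prevalence.

    true = (raw + specificity - 1) / (sensitivity + specificity - 1)
    """
    return (raw + specificity - 1.0) / (sensitivity + specificity - 1.0)

# Unweighted point estimates quoted in the text.
adjusted = adjust_prevalence(raw=0.015, sensitivity=0.828, specificity=0.995)

print(round(100 * adjusted, 1))  # 1.2
```

With a raw prevalence close to the false-positive rate (1 minus specificity), small changes in specificity shift the adjusted value considerably, which is one reason low seroprevalence estimates carry wide intervals.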
A value of about 2–3% positive results when testing the general population has been suggested as realistic by the Director-General of the WHO, Dr Tedros Adhanom Ghebreyesus [27], and this seems to agree with the first results of the Netherlands study. Higher percentages (14%), such as that reported in the German study of a limited sample of 500 subjects in Heinsberg [9], have been criticized as potentially misleading. The low rate of casualties (0.37%) reported in that study is also notable, being far from the overall reported fatality rate in Germany of 2% [34]. The ratio of infected people (2%) to those with antibodies (14%) is also surprising and is at odds with other reports (discussed above) suggesting that asymptomatic patients number about 1/3 to 1/2 of the cases with clinical symptoms.