Analysis of Reproduction Number R0 of COVID-19 Using Current Health Expenditure as Gross Domestic Product Percentage (CHE/GDP) across Countries

(1) Background: The impact and severity of the coronavirus pandemic on health infrastructure vary across countries. We examine the role that the percentage of health expenditure plays in the preparedness of various countries and see how countries improved their public health policy between the first and second waves of the coronavirus pandemic; (2) Methods: We considered the infectious period during the first and second waves in 195 countries, together with their current health expenditure as gross domestic product percentage (CHE/GDP). An exponential model was used to calculate the slope of the regression line, while the ARIMA model was used to calculate the initial autocorrelation slope and also to forecast new cases for both waves. The relationship between epidemiologic and CHE/GDP data was used for ordinary least squares multivariate modeling and for classifying countries into different groups using PC analysis, K-means and hierarchical clustering; (3) Results: Results show that some countries with high CHE/GDP improved their public health strategy against the virus during the second wave of the pandemic; (4) Conclusions: Results revealed that countries that spend more on health infrastructure improved their handling of the pandemic in the second wave, as they were worst hit in the first wave. This research will help countries to decide on how to increase their CHE/GDP in order to properly tackle other pandemic waves of the present COVID-19 outbreak and future diseases that may occur. We are also opening up a debate on the crucial role socio-economic determinants play during the exponential phase of the pandemic modelling.


Introduction
Among the main economic indicators used is the CHE/GDP index, which is the percentage of the gross domestic product (GDP, equal to the total monetary or market value of all the finished goods and services produced within a country's borders) devoted to health expenditure by a country (available on the World Bank website [1]). This index is high for developed countries (except Japan) and it has been shown to be correlated with the Gini index, which measures the degree of inequality in the distribution of income in a country. To summarize, a rich developed or developing country with a large gap between the incomes of the richest and poorest parts of the population spends a lot on health, both on high-tech care for the rich (usually in the privatized part of the health system) and on essential care for the often unhealthy poor. A poor developing country with a low Gini index generally spends more rationally, on its middle and poor classes. In this article, we seek to identify the relationships between the socio-economic index CHE/GDP and the spread of COVID-19 in countries where the corresponding data are available from the World Bank [1] and Worldometers [2] websites. So far, most countries have experienced at least two peaks of the COVID-19 pandemic, and it is necessary to look at both waves before drawing conclusions on the efficacy of the response during both waves. Health officials, scientists and those involved in the modelling of the pandemic have made many suggestions since the day the first case was recorded in Wuhan, China. Current health expenditure as gross domestic product percentage (CHE/GDP) is key to the preparedness of different countries to respond to and curtail the pandemic, even though it is generally believed that no one was prepared during the first wave of the pandemic, as most developed nations were worst hit and the death toll increased exponentially.
Our goal is to correlate the maximum basic reproduction number R0 of both waves with CHE/GDP. In order to approach this subject holistically, we used several regression tools and also developed clustering strategies across all countries considered. The results are key to protecting lives and improving health infrastructure in the future, even though we know that the pandemic is still evolving in different countries.

Materials: The Variables
Seven variables are used in this research. The maximum basic reproduction number R0 for the first and second waves is chosen during the exponential phase for all countries considered. The exponential and autocorrelation slopes are calculated over 100 days from the start of a wave: for the first wave, the start depends on the date on which a particular country recorded its first case between February and August 2020, while for the second wave the 100 days were taken between 15 October 2020 and 22 January 2021 for all countries considered. The opposite of the initial autocorrelation slope was averaged over six days. CHE/GDP was collated from World Bank data [1]. The deterministic R0 was taken from previous research [3] and it was calculated as the Malthusian growth parameter during the exponential phase of both waves across countries. The daily new cases were taken from the Worldometers ® [2] and Renkulab ® [4] databases and processed using Python ® facilities [5].

Exponential and ARIMA Model
The exponential model is given as $y = a \cdot 10^{bx}$, where y is the daily number of new cases, x is the number of days, b is the slope and a is a constant; in logarithmic form it can be written as $\log_{10} y = \log_{10} a + bx$.
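As an illustration of how the slope b could be obtained, the following is a minimal sketch that fits the logarithmic form by ordinary least squares. The function name `fit_exponential` and the synthetic series are assumptions made for this example, not the authors' code or data; days with zero new cases are skipped because their logarithm is undefined.

```python
import math

def fit_exponential(daily_cases):
    """Least-squares fit of log10(y) = log10(a) + b*x; returns (a, b)."""
    # Keep only days with a positive count, since log10 is undefined at 0.
    points = [(x, math.log10(y)) for x, y in enumerate(daily_cases) if y > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_ly = sum(ly for _, ly in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in points)
    b = sxy / sxx                    # slope of the regression line
    log_a = mean_ly - b * mean_x     # intercept, i.e. log10(a)
    return 10 ** log_a, b

# Synthetic series y = 5 * 10**(0.02 x) over 100 days (illustrative values only).
cases = [5 * 10 ** (0.02 * x) for x in range(100)]
a, b = fit_exponential(cases)
```

On this noise-free series the fit recovers the constant a and the slope b used to generate it, up to rounding error.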
ARIMA modelling was introduced by N. Wiener for prediction and forecasting [5]. Its parametric approach assumes that the underlying stationary stochastic process of the COVID-19 new daily cases N(t) can be described by a small number of parameters using the autoregressive model $N(t) = \sum_{i=1}^{s} a(i)\, N(t-i) + W(t)$, where W is a random residual whose variance is to be minimized. The autocorrelation analysis is done by calculating the correlation A(k) between the N(t)'s and the N(t − k)'s (t belonging to a moving time window) by using the formula $A(k) = \frac{E\left[(N(t) - E[N(t)])\,(N(t-k) - E[N(t-k)])\right]}{\sigma(N(t))\,\sigma(N(t-k))}$, where E denotes the expectation and σ the standard deviation. The autocorrelation function A allows the serial dependence of the N(t)'s to be examined. We used the ARIMA form (6, 1, 0), which we have shown to be the best for modelling the COVID-19 outbreak [6].
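The autocorrelation A(k) described above can be sketched in a few lines. This example covers only the autocorrelation step, not the ARIMA (6, 1, 0) fit itself, which would normally be delegated to a time-series library. The function names are hypothetical, and the reading of the "opposite of the initial autocorrelation slope" as the mean decrease of A per lag over the first six lags is an assumption of this sketch.

```python
import math

def autocorrelation(series, k):
    """Pearson correlation A(k) between N(t) and N(t - k) over the window."""
    if k == 0:
        current, lagged = series, series
    else:
        current, lagged = series[k:], series[:-k]
    n = len(current)
    mean_c = sum(current) / n
    mean_l = sum(lagged) / n
    cov = sum((c - mean_c) * (l - mean_l) for c, l in zip(current, lagged)) / n
    sd_c = math.sqrt(sum((c - mean_c) ** 2 for c in current) / n)
    sd_l = math.sqrt(sum((l - mean_l) ** 2 for l in lagged) / n)
    return cov / (sd_c * sd_l)

def opposite_initial_slope(series, lags=6):
    """Assumed reading: (A(0) - A(lags)) / lags, the mean decrease of A per lag."""
    return (autocorrelation(series, 0) - autocorrelation(series, lags)) / lags

# Synthetic alternating series (illustrative only, not case data).
alternating = [1.0, -1.0] * 10
a0 = autocorrelation(alternating, 0)
a1 = autocorrelation(alternating, 1)
slope = opposite_initial_slope(alternating, lags=1)
```

For the alternating series, consecutive values are perfectly anticorrelated, so A(0) is 1 and A(1) is −1, which gives a quick sanity check of the formula.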

Clustering Methodology
Clustering is a branch of machine learning known as 'unsupervised learning' and is frequently used to classify biomedical data. We used three classical methods: K-means, PCA (principal component analysis) and hierarchical clustering [6]. K-means clustering fixes the number of clusters a priori and starts out with random centroids, while hierarchical clustering starts with every point in the dataset as its own cluster, then finds the two closest points and merges them into a cluster, repeating the process until a single large cluster remains; the sequence of merges is then displayed as a dendrogram.
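A minimal sketch of the K-means iteration described above is given below. It is not the authors' implementation: the function name `kmeans` is hypothetical, the toy points are invented, and the initial centroids are passed explicitly (rather than drawn at random as in the text) so that the run is reproducible. Hierarchical clustering is not shown.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids no longer move
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Two well-separated toy groups (illustrative values only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, centroids=[(0, 0), (1, 0)])
```

Even though both starting centroids lie in the first group, the assignment and update steps separate the two groups after a few iterations.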
Principal component analysis (PCA) also helps to cluster data points and is a dimension reduction technique, useful here because each variable has a different dimension. It allows us to summarize and visualize the information in a data set described by multiple inter-correlated variables. PCA is used to extract the important information from the variables in the dataset and to express this information as a small set of new variables called principal components (PCs).
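To make the idea of a principal component concrete, the sketch below standardizes the variables and extracts the first component by power iteration on the correlation matrix. This is one simple way to compute it, chosen here for self-containment; it is not claimed to be the procedure used in the study, and the function names and toy data are assumptions.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def first_principal_component(rows, n_iter=500):
    """Power iteration on the correlation matrix; returns (loadings, explained ratio)."""
    z = standardize(rows)
    n, d = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    # Asymmetric start vector, to avoid starting orthogonal to the leading direction.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue / d  # the trace of a correlation matrix equals d

# Toy data: the second variable is twice the first, the third is uncorrelated with both.
data = [(1, 2, 1), (2, 4, -2), (3, 6, 2), (4, 8, -2), (5, 10, 1)]
loadings, ratio = first_principal_component(data)
```

In this toy case the first component loads equally on the two correlated variables and explains two thirds of the total variance.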

Linear and Polynomial Regression
Linear regression models use historical data (a 100-day infectivity period in our case) for the independent and dependent variables (CHE/GDP) and assume a linear relationship between them, while polynomial regression models follow a similar approach but model the dependent variable as a polynomial of degree n (2 ≤ n ≤ 6) in x.
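The polynomial fit can be sketched as a least-squares problem solved through the normal equations. The helper names below are hypothetical and the data are a synthetic parabola, not study data; for high degrees such as n = 6 the normal equations can become ill-conditioned, so a library routine based on an orthogonal decomposition would usually be preferable in practice.

```python
import math

def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[degree]*x**degree."""
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V.
    size = degree + 1
    vtv = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    vty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(vtv, vty)

def rmse(xs, ys, coeffs):
    """Root of the mean square error of the fitted polynomial."""
    preds = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))

# Synthetic parabola y = 1 - 2x + 0.5x^2 (illustrative values, not study data).
xs = [float(x) for x in range(-3, 4)]
ys = [1 - 2 * x + 0.5 * x ** 2 for x in xs]
coeffs = polyfit(xs, ys, degree=2)
error = rmse(xs, ys, coeffs)
```

Setting `degree=1` gives the linear regression as a special case of the same routine.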

Multivariate Ordinary Least Square Method
The multivariate least squares method allows us to test much more complex relations between variables. It can be represented as $y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \epsilon$, where $\beta_1, \beta_2, \cdots$ are coefficients or weights, $\epsilon$ is the residual noise, y is the dependent variable and $x_1, x_2, \cdots$ are the independent variables.
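For reference, the same model can be written in matrix form, which makes the least squares estimate explicit. The notation below is an assumption introduced for this note: $X$ is the matrix whose rows hold the independent variables of each country, $\mathbf{y}$ stacks the observed dependent variable, and $X^{\top}X$ is assumed to be invertible.

```latex
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon},
\qquad
\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert^{2}
  = (X^{\top}X)^{-1} X^{\top}\mathbf{y}
```

The residuals $\mathbf{y} - X\hat{\boldsymbol{\beta}}$ are then the quantities examined when checking how well the model explains the data.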

Parabolic and Cubic Regression
The abbreviations used in Figure 1 have the following meaning: LinregressResult slope = slope of the linear regression, intercept = ordinate at the origin of the regression curve, r value = correlation coefficient, p value = p value of the test of nullity of the correlation coefficient, stderr = standard error of the regression, RMSE = root of the mean square error. Figure 1 shows classical linear and polynomial regressions (parabolic for Graphs (a) and (b) and cubic for Graph (c)) between the opposite of the slope at the origin of the autocorrelation function of the ARIMA model and, successively, the slope of the logarithmic regression line of the new daily cases of COVID-19 of the first wave (a), that of the second wave (b), and finally the number of days since the start of the outbreak (c).
The curves show a different behavior between the two waves (a) and (b), probably due to an increase in the contagion parameter, the basic reproduction number R0 (linked to the Malthusian parameter of the exponential growth phase), despite a shortening of the duration of contagiousness. This duration is linked to the slope at the origin of the autocorrelation function, which becomes stronger as the distance from the start of the epidemic increases, no doubt because of the mitigation measures, which decrease the duration of the contagiousness period.

Quartic Regression
We have used in Figure 2 a polynomial of degree 4 to obtain a fit showing a minimum at a value of the maximum R0 equal to 3.5, which is considered as the observed value of the maximal effective reproduction number at the start of the first wave in many developed countries (France, Germany, Switzerland, UK, USA, etc.) [4]. This corresponds to the fact that the opposite of the initial autocorrelation slope (a high absolute value of the slope indicating a short contagiousness period) decreases, i.e., the contagiousness duration increases, when the maximum R0 increases, which seems logical.

Sextic Regression
We studied the correlation between the opposite of the slope at the origin of the autocorrelation function of the first wave and the economic and health index CHE/GDP, using a polynomial regression of degree 6 (Figure 3). It shows an anticorrelation in the linear regression and a local maximum for countries with an average CHE/GDP ratio of around 7. Countries with a high CHE/GDP ratio (such as France and the United States) have a low value of the opposite of this slope. The explanation for this phenomenon may come from the correlation reported in the introduction between the CHE/GDP and Gini indices, the poor classes having a longer duration of contagiousness due to a weaker state of immunological defense and perhaps less compliance with mitigation measures.

Developed and Developing Countries
The correlation between the first wave exponential regression slope and the CHE/GDP index for developed and developing countries is significantly positive (R = 0.57) in Figure 4. Figure 5 shows the same for developed countries only, with a still higher correlation (R = 0.65).

All Countries
Figures 4 and 5 show a positive correlation between the slope of the logarithmic regression curve of the new cases of COVID-19 as a function of time (a high slope being a sign of rapid growth of the epidemic) and the economic index CHE/GDP. This is true when we observe the developed and developing countries together (Figure 4) or the developed countries alone, for which the positive correlation is higher, the correlation coefficient increasing from 0.57 to 0.65 (Figure 5a). This trend is reversed for the second wave (Figure 5b), where the correlation coefficient equals −0.57, possibly due to the early implementation of mitigation measures in developed countries, reducing the exponential growth of new cases in the second wave. This trend is confirmed in the study of the correlation between the slope of the logarithmic regression and the maximum R0 (Figure 5c,d), which increases during the second wave in developed countries (the correlation coefficient rising from 0.33 to 0.44), showing a more abrupt but shorter growth of new cases, undoubtedly due to the establishment of a faster and more effective lockdown. For the first wave, this correlation coefficient remains for all countries close to that for developed countries (Figure 6).

ARIMA Model for First and Second Wave
The ARIMA model fits are significant at more than the 95% confidence level, as can be seen in Figure 7a-d: for Mali, p = 0.01 for the first wave and $p = 6.3 \times 10^{-10}$ for the second wave, while for the first wave for Slovenia p = 0.01 and for the second wave in Luxembourg p = 0.01.

First Wave ARIMA Model
The comparison during the first wave between two countries (Figure 7), one developed (Luxembourg) and one developing (Mali), shows a difference in the length of the contagiousness period (linked to the value of the opposite of the slope at the origin of the autocorrelation function) and in the shape of the growth curve, indicating a lower virulence of SARS-CoV-2 in Mali, possibly due to the influence of temperature [7]. This tendency is reversed during the second wave between Mali and Slovenia (Figure 8).

ARIMA Model Forecast for First and Second Wave
The forecast using the ARIMA method shows a good retrospective adjustment to past data, but a weak predictive power of the future trend of new cases, in particular for the prediction of the entry into the endemic phase after an epidemic wave (Figure 9).

Clustering of Countries from Epidemic and Economic Variables
The hierarchical clustering allows developed and developing countries to be grouped into 5 separate clusters (Figure 10) and Principal Component Analysis into 3 separate clusters (Figure 11), one being a singleton corresponding to Spain (Figure 12).

Ordinary Least Square Method. The Multivariate Case
The clustering of the countries from epidemic and economic variables is described in Figures 10-13 and shows several features:

1. The hierarchical clustering (Figures 10 and 11b) shows a trend common to developed countries (shown in green), with the notable exception of Germany and Czechia;
2. The principal component analysis shows the importance of the CHE/GDP index in the first principal component (Figure 11a,c,d) and of the deterministic R0 (R0det) of the exponential phase of the first wave in the second principal component and of the second wave in the third principal component (Figure 11e);
3. The analysis of parallel coordinates for cluster centroids also shows the importance of the deterministic R0 in the discrimination of clusters (Figure 12);
4. The analysis of the residuals shows a good explanatory power of the first three principal components (60% of the total variance in Figure 11c, confirmed by the projections on the two first principal planes of Figure 11d,e), and a weak correlation of the principal components with these residuals (Figure 13a,b).

Discussion
There are many differences between the first and second wave results concerning the exponential regression slope and the initial autocorrelation slope: while some countries have higher figures for the first wave, others have lower figures for the second wave, and vice versa. This was also evident in the regression plots, where some countries have negative correlation values with the CHE/GDP for some growth parameters in the first wave and positive values in the second wave, and vice versa for other countries. These phenomena show that the way the pandemic spread in the second wave is different from what was experienced in the first wave. In the principal component analysis, we found that the first wave deterministic R0 and the CHE/GDP had high weights in the first and second principal components (PC1 and PC2), which are the dominant components in the PC analysis.
More precisely, in Figure 1a,b the first and second waves of the COVID-19 pandemic are compared using linear and parabolic or cubic regression, showing a significant positive (resp. negative) correlation between the opposite of the initial autocorrelation slope and the exponential regression slope of the first (resp. second) wave for developed (resp. all) countries. This opposition between the two waves could result from the application of a more severe lockdown in developed countries during the second wave. In Figure 1c, the opposite of the initial autocorrelation slope decreases significantly if the start of the first wave in a country is late with respect to the start of the COVID-19 outbreak in China, probably due to the progressive implementation of mitigation measures in that country, taking into account the experience of the countries whose first wave started earlier. In Figure 2, the opposite of the initial autocorrelation slope is significantly negatively correlated with the maximum R0 observed at the inflection point of the new cases curve, confirming that long contagiousness periods give high exponential increases of the new cases. In Figure 3, for the first wave, the opposite of the initial autocorrelation slope is positively (resp. negatively) correlated with the CHE/GDP (resp. maximum R0) for developed countries, which could correspond to the efficiency of the mitigation measures decided in these countries. This is confirmed in Figure 4, where the first wave exponential regression slope is positively correlated with the CHE/GDP in a mix of developed and developing countries. Figure 5a shows the same type of effect of public health policies in developed countries for the first wave, where CHE/GDP increases with the first wave exponential regression slope, but this result is inverted in Figure 5b for the second wave, perhaps due to a rationalization of the care activity between the first two waves.
Figure 5c,d shows a similar behavior of the two waves concerning the positive correlation between the exponential regression slope and the maximum R0, which makes sense, as these quantities are both related to the initial exponential growth of an epidemic wave. For the first wave of all countries, Figure 6 shows the same positive correlation as Figure 5a between the exponential regression slope and CHE/GDP. Figure 7 compares two countries, one from Sahelian Africa, Mali, and one from western Europe, Luxembourg, during the first wave of the COVID-19 outbreak in spring 2020: Mali shows a quasi-endemic behavior with a weakly varying autocorrelation function and Luxembourg a clear epidemic wave with a classic shape. For the second wave in fall 2020, Mali presents an attenuated epidemic shape (probably due to specific geoclimatic conditions in western Africa [7]) and a country from central Europe, Slovenia, shows in this period an endemic behavior with an oscillatory occurrence of new cases. Figure 9 proposes a forecast based on the ARIMA decomposition for the first and second waves in Mali, with a better approximation for the epidemic second wave than for the quasi-endemic first wave. The same holds for Luxembourg with the order of the phases inverted: an epidemic wave followed by a well-predicted endemic state. On the contrary, for Slovenia, the endemic state with oscillations is badly predicted.
Clustering of all countries is then studied in Figures 10-12. Figure 10a shows the boxplot of the seven initial variables used in hierarchical clustering: the first and second wave opposite of the initial autocorrelation slope (respectively ARIMAF and ARIMAS), exponential regression slope and maximum R0 (respectively FirstwaveD, SecondD, FirstwaveR, SecondR), and CHE/GDP. The data contain five clusters, represented in Figure 10b,c, corresponding to the more "developing" (in red, with some notable exceptions such as the Czech Republic and Germany) and the more "developed" (in green and partially in orange) parts of the hierarchy tree, with a small "exotic" cluster for Tanzania and Mauritius. Figure 11a-e shows the results of the principal component analysis (PCA), with (a) the three principal components expressed in terms of the initial variables calculated for all countries (first and second wave maximum R0's denoted first wR0 and second wR0, deterministic R0's denoted first wR0det and second wR0det, ARIMA slopes denoted first wArima and second wArima, and the current health expenditure as gross domestic product percentage denoted CHE/GDP), (b) the projection of the points corresponding to countries on the first PC plane, (c) the explained variance plot and (d,e) the correlation circles for the first three principal components with projection of the initial variables as vectors (having 195 components corresponding to the 195 countries of Table A1 in Appendix A) on the corresponding principal planes. In Figure 11a, the main initial variable in the linear combination of the first (resp. the second) principal component is the first wave deterministic R0det (resp. the CHE/GDP), and these two initial variables R0det and CHE/GDP are anticorrelated, as we have already noticed when commenting on Figure 3 (a country devoting a large share of its GDP to health expenditure reduces the occurrence of new cases).
Figure 11b gives the projection of 204 countries on the first PC plane and distinguishes two main clusters of 118 and 85 countries, respectively, plus a singleton representing Botswana, with more developed countries in green and more developing countries in orange. Figure 11c shows that 60% of the variance is explained by the first three PCs, and Figure 11d,e presents the correlation circles with projection of the initial variables as vectors on the corresponding two principal planes (PC1, PC2) and (PC2, PC3), showing, as in Figure 11a, the preeminence of two opposite vectors, the first wave deterministic R0 and the CHE/GDP. Figure 12 also shows, for the first k-means cluster, the importance of the first wave deterministic R0.
Finally, Figure 13a,b corresponds to the ordinary multivariate least squares method. Figure 13a shows the eccentric position of developed countries such as Belgium and the USA and developing countries such as Equatorial Guinea and Suriname as outliers not fitting the bulk of the data, and Figure 13b shows the concentration of the initial variable CHE/GDP with the first and second wave deterministic R0det, in agreement with the fact that they are the most dominant initial variables in PCA and k-means clustering.

Conclusions
We have shown in this article that there exist correlations between the growth parameters directly linked to the occurrence of new cases of COVID-19 and socio-economic variables, in particular the current health expenditure as gross domestic product percentage (CHE/GDP), which is anticorrelated with the basic reproduction number R0. This shows the effectiveness of public health mitigation measures, even if they involve significant medico-economic costs. Larger perspectives are offered by combining studies on geoclimatic and demographic severity factors of the COVID-19 outbreak [7,8] with the present socio-economic determinants, in order to obtain the most comprehensive and accurate picture of non-biological exogenous influences on the expanding COVID-19 pandemic.
Concerning contagious diseases, public health physicians and policy-makers are constantly faced with four challenges. The first concerns the estimation of the basic reproduction number R0. The systematic use of R0 simplifies the decision-making process of policy-makers, advised by public health authorities, but it is too simplistic to account for the biology behind the viral spread. We have observed that R0 was not constant during an epidemic wave, due to exogenous and endogenous factors influencing both the duration of the contagiousness period and the transmission rate during this phase. The first challenge therefore also involves the estimation of the mean duration of the contagiousness period for infected patients. As for the transmission rate, realistic assumptions made it possible to obtain an upper limit to this duration [9][10][11], in order to better guide the individual quarantine or lockdown measures decided by the authorities in charge of public health. This upper bound also makes it possible to obtain a lower bound for the percentage of unreported infected patients, which gives an idea of the quality of the census of cases of infected patients, which is the second challenge facing specialists of contagious diseases. The third challenge is the estimation of the daily reproduction numbers over the contagiousness period [2]. The fourth and final challenge is the extension of the methods developed in the present paper to contagious non-infectious diseases (i.e., those without causal infectious agents), such as social contagious diseases, the best example being the pandemic linked to obesity, for which many concepts and modelling methods presented here remain applicable.

Conflicts of Interest:
The authors declare no conflict of interest.