Analyzing the Data of COVID-19 with Quasi-Distribution Fitting Based on Piecewise B-spline Curves

Facing the worldwide coronavirus disease 2019 (COVID-19) pandemic, we develop a new fitting method, quasi-distribution fitting (QDF), based on piecewise quasi-uniform B-spline curves, for analyzing COVID-19 data. For any given country or district, the method fits the distribution histogram data built from the daily confirmed cases (or other data, such as daily recovery cases and daily fatality cases) with piecewise quasi-uniform B-spline curves. After area normalization, the fitting curve can be regarded as a kind of probability density function (PDF), whose mathematical expectation and variance can be used to analyze the state of the pandemic. Numerical experiments on the data of several countries indicate that the QDF method captures the intrinsic characteristics of the COVID-19 data of a given country or district. Because the data interval used in this paper is longer than one year (500 days), the results also reveal that, after multiple waves of transmission, the case fatality rate has declined noticeably. The results show that QDF is an effective and feasible appraisal method.


Introduction
In late December 2019, cases of pneumonia of unknown aetiology were reported in the city of Wuhan, China [1]. The causative agent, identified as the betacoronavirus SARS-CoV-2, is closely related to SARS-CoV, which was responsible for the outbreak of SARS between 2002 and 2004 [2]. SARS-CoV-2 caused a sizable epidemic of COVID-19 in China, then spread globally, and was declared a pandemic in March 2020 [3]. Since then, many research articles have examined the pandemic from different angles. Some compared its pathogenesis with that of the earlier Middle East respiratory syndrome (MERS) and SARS [4], showed how COVID-19 pneumonia compromises the distal lung, which performs essential respiratory functions [5], or presented detailed virological analyses of COVID-19 cases that provide proof of active virus replication in tissues of the upper respiratory tract [6]. Others discussed COVID-19-related mortality by age [7], gender [8], or race [9], or shed light on the frequency of asymptomatic SARS-CoV-2 infection [10]. Other work focused on building mathematical models to simulate the spread of SARS-CoV-2, specifically a metapopulation susceptible-exposed-infectious-removed (SEIR) model that integrates fine-grained, dynamic mobility networks to simulate the spread of COVID-19 in ten of the largest US metropolitan areas [11], and a full-spectrum dynamics model that reconstructs the transmission of COVID-19 in Wuhan between 1 January and 8 March 2020 [12]. Data analyses showed that anti-contagion policies significantly and substantially slowed the growth of COVID-19 infections [13] and, likewise, that major non-pharmaceutical interventions, lockdowns in particular, had a large effect on reducing transmission [14]. One research group studied the relationship between socio-economic factors and the COVID-19 pandemic in Germany by analyzing both infections and fatalities; their results showed that the population of poorer and more socially deprived districts was not necessarily more likely to be infected with SARS-CoV-2, but combined an average infection rate with a higher-than-average death rate [15]. Besides these social problems, COVID-19 also changed people's everyday practices during restriction measures [16].
Estimating the size of the COVID-19 pandemic is made challenging by inconsistencies in the available data [17], so it is very important to analyze the COVID-19 data of a given country or district, including the numbers of daily confirmed cases, daily recovery cases, and daily fatality cases. In this paper, we use a quasi-distribution fitting method based on piecewise quasi-uniform B-spline curves to investigate the consistency of infection and the trend patterns across different countries. The method fits the distribution histogram data built from the COVID-19 data of a given country or district with piecewise quasi-uniform B-spline curves. After area normalization, the fitting curve can be regarded as a kind of probability density function (PDF) of the data, and its mathematical expectation and variance are calculated as the evaluation result.

Theoretical considerations
In computer aided geometric design, the B-spline form is widely used to represent polynomial curves. B-spline curves have optimal shape-preserving properties, and a B-spline curve of order $n$ is evaluated by the de Casteljau algorithm with a computational cost of $O(n^2)$ elementary operations [18]. However, the B-spline curve has a shortcoming: its control polygon does not meet the curve at the endpoints, and changing even one control point changes all the points of the curve. Therefore, in this paper we use piecewise quasi-uniform B-spline curves for the fitting.
The data $f_k^D$ can be regarded as a kind of probability distribution, which we call the histogram distribution. Table 2 shows the beginning and ending dates of the countries whose COVID-19 data are used in our fitting process. The function $P(x)$ has all the properties of a probability density function (PDF), so the histogram distribution data $f_k^D$ can be analyzed from the perspective of probability theory. In this paper, we fit $f_k^D$ with a piecewise quasi-uniform B-spline curve, which is more flexible in curve modeling. The piecewise quasi-uniform B-spline curve is a kind of parametric curve (here the parameter is denoted $t$) derived from the uniform B-spline curve. First, we give the base functions of the quintic quasi-uniform B-spline curve defined on the interval $[0,1]$, whose node vector divides $[0,1]$ evenly into ten subintervals; we denote them $N_i$, $i = 0, 1, \dots, 14$. The concrete expressions of the base functions $N_i$ can be found in the appendix. Second, we define the base functions of the piecewise quasi-uniform B-spline, which depend on a segmentation point.
Assume $C_i$, $i = 0, 1, \dots, 28$, are points in the two-dimensional plane. The quintic piecewise quasi-uniform B-spline curve $B(t)$ is then defined by equation (1), where the $C_i$ are called the control points of the curve.
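The quintic base functions on $[0,1]$ described above can be sketched with the Cox-de Boor recursion. This is a minimal illustration rather than the appendix expressions: it assumes that "quasi-uniform" means a knot vector whose end knots are repeated degree + 1 times with evenly spaced interior knots, and the names `quasi_uniform_knots` and `basis` are hypothetical. Under that assumption, ten subintervals give 21 knots and 15 base functions, matching $i = 0, 1, \dots, 14$.

```python
def quasi_uniform_knots(degree=5, n_intervals=10):
    """Knot vector on [0, 1]; end knots repeated degree + 1 times (assumed)."""
    interior = [j / n_intervals for j in range(1, n_intervals)]
    return [0.0] * (degree + 1) + interior + [1.0] * (degree + 1)


def basis(i, p, t, knots):
    """Cox-de Boor recursion for the i-th base function of degree p at t."""
    if p == 0:
        # Half-open spans; the last non-empty span also contains t = 1.
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        if t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:  # skip terms with a zero-length support (repeated knots)
        left = (t - knots[i]) / d1 * basis(i, p - 1, t, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + p + 1] - t) / d2 * basis(i + 1, p - 1, t, knots)
    return left + right


knots = quasi_uniform_knots()
n_basis = len(knots) - 5 - 1  # number of quintic base functions
values = [basis(i, 5, 0.37, knots) for i in range(n_basis)]
print(n_basis, sum(values))
```

The base functions sum to 1 at every parameter value, and the first and last of them equal 1 at $t = 0$ and $t = 1$ respectively, so a curve built on them starts and ends at its first and last control points.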

Fitting process
For the given data, equation (2) can be solved for the control points $C_i$. To decide the segmentation point, we need a way to evaluate the goodness of a fitting result. Being a parametric fitting curve, $B(t)$ needs to be discretized into a standard discrete signal $B_k$, taken from points $P_k$ on the curve whose x-coordinate is $k$. We then evaluate the goodness of the fitting result with the mean square deviation (MSE) between $B_k$ and $f_k^D$. Denoting by $B$ the standard discrete signal obtained with a given segmentation point, the best segmentation point is the one whose signal gives the smallest MSE.
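The selection of the segmentation point can be sketched as a search over candidate points that keeps the one with the smallest MSE. The sketch below makes several assumptions: the MSE is taken as the average squared difference between the fitted signal and the data, `fit` is a hypothetical callback standing in for the B-spline fit, and `two_means_fit` is only a toy replacement used to make the example runnable.

```python
def mse(fitted, data):
    """Mean square deviation between a fitted discrete signal and the data."""
    if len(fitted) != len(data):
        raise ValueError("fitted signal and data must have the same length")
    return sum((b - f) ** 2 for b, f in zip(fitted, data)) / len(data)


def best_segmentation(candidates, fit, data):
    """Return (segmentation point, MSE) with the smallest MSE.

    `fit(data, s)` is a hypothetical callback that returns the discrete
    fitted signal obtained with segmentation point `s`.
    """
    best = None
    for s in candidates:
        err = mse(fit(data, s), data)
        if best is None or err < best[1]:
            best = (s, err)
    return best


def two_means_fit(data, s):
    """Toy stand-in for the spline fit: one constant level on each side of s."""
    left, right = data[:s], data[s:]
    left_level = sum(left) / len(left)
    right_level = sum(right) / len(right)
    return [left_level] * len(left) + [right_level] * len(right)


data = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
point, err = best_segmentation(range(1, len(data)), two_means_fit, data)
print(point, err)
```

With this toy data the search returns the split after the third sample, where the stand-in fit reproduces the data exactly. In the actual method, the callback would fit the piecewise quasi-uniform B-spline curve and discretize it.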

Quasi-distribution
The fitting signal $B_k$ is an approximation to the histogram distribution $f_k^D$, so the sum of the $B_k$ may not equal 1. Nevertheless, this can be enforced through an adjustment factor. The adjusted $B_k$ then satisfies the properties of a probability density function, but its expression is unlike that of any existing probability density function, so we call it a quasi-distribution.
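The normalization step, together with the expectation and variance used for evaluation, can be sketched as follows. This is an illustration under assumptions: the adjustment factor is taken to be the reciprocal of the sum of the fitted signal, the samples are indexed by $k = 1, 2, \dots$ (the day number), and the function names are hypothetical.

```python
def quasi_distribution(signal):
    """Rescale a fitted signal so that its values sum to 1."""
    total = sum(signal)
    if total <= 0:
        raise ValueError("signal must have a positive sum")
    factor = 1.0 / total  # assumed form of the adjustment factor
    return [factor * b for b in signal]


def expectation(p):
    """Mathematical expectation of the day index k = 1, 2, ... under p."""
    return sum(k * pk for k, pk in enumerate(p, start=1))


def variance(p):
    """Variance of the day index under p."""
    mean = expectation(p)
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(p, start=1))


p = quasi_distribution([1.0, 2.0, 1.0])
print(p, expectation(p), variance(p))
```

For a signal concentrated around one day the variance is small, while several well-separated waves give a larger variance, which is one way the two statistics can summarize the shape of the fitted curve.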

Experimental result
In this paper, we investigated the COVID-19 data of eighteen countries over a data interval of 500 days. Table 2 shows the beginning and ending dates for each country, and the quasi-distribution fitting results are shown in the following figures. In those figures, the green signal denotes the 7-day moving average of the original data (called the histogram data), including daily confirmed cases, daily recovery cases, and daily fatality cases. With the algorithm just presented, we obtain the corresponding quasi-distribution fitting of those data. Finally, we plot the quasi-distribution fitting curves together to find the inner trend of the pandemic.
Figure 1 shows the experimental result for Austria. In figure 1 (a), the red curve is the quasi-distribution fitting of the histogram data of daily confirmed cases; based on the previous definition, it can be regarded as the probability density function of the daily confirmed cases treated as a random variable. The quasi-distribution curve fits the histogram data of daily confirmed cases very well. Similarly, in figure 1 (b), the blue curve is the quasi-distribution fitting of the histogram data of daily recovery cases, and in figure 1 (c), the black curve is the quasi-distribution fitting of the histogram data of daily fatality cases. In figure 1 (d), we plot the three fitting curves in the same coordinate system. All three have three peaks, but at the first peak the fatality curve is higher than the other two, at the second peak there is no obvious difference, and at the third peak the fatality curve is much lower than the other two. Does this mean that, as time goes on, even with virus mutation, the case fatality rate of COVID-19 keeps declining? Or does this situation occur only in Austria? The data of more countries need to be analyzed.
In figure 2, we apply the same procedure to the data of Brazil, but the result is unlike that of Austria. In particular, in figure 2 (d), the fatality peak around day 400 is much higher than the previous peak around day 100, although the trends of the quasi-distribution fitting curves of daily confirmed cases and daily recovery cases are still quite similar, so the data of more countries need to be considered.
Figure 3 shows the experimental result for Canada, which is similar to that of Austria. In figure 3 (d), the third peak of the fatality curve is much lower than those of the daily confirmed cases and daily recovery cases. In figure 3 (b), around day 140 there is a sudden surge in daily recovery cases; apparently this was caused not by a sudden increase in the capacity of the medical system but by the release of accumulated data. That is why we use the quasi-distribution fitting curves, rather than the histogram data (7-day moving average data), to discover the inner trend of the pandemic.
In figure 10, we can see that, for daily confirmed cases, daily recovery cases, and daily fatality cases alike, the histogram data and their corresponding quasi-distribution fitting curves have only two peaks. This does not mean that COVID-19 rarely spread during the first 50 to 70 days; rather, the values around day 200 and day 430 (approximately the abscissas of the two peaks) are so large that the early data are barely visible. In figure 11 (d), the second peak of the quasi-distribution fitting curve of daily fatality cases is lower than that of daily confirmed cases, which shows the decline of the fatality rate of the pandemic. The same is true for Poland, as shown in figure 12.
Figure 13 shows the fitting result for Israel. In figure 13 (d), although the last peak of the quasi-distribution fitting curve of daily fatality cases is lower than that of daily confirmed cases, the difference is not very pronounced, and the same holds for the peak before it. Therefore, we can only cautiously conclude that the mortality rate in Israel has not increased over time. The epidemic situations in Japan and the Philippines are similar, as shown in figure 14 and figure 15 respectively.
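The histogram data in the figures are 7-day moving averages of the raw daily counts. A minimal sketch of such a smoothing step is given below; it assumes a trailing window (the current day and the six days before it) and averages over fewer days at the start of the series, neither of which is specified in the text, and the name `moving_average` is hypothetical.

```python
def moving_average(values, window=7):
    """Trailing moving average; early entries average over the days available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        out.append(running / min(i + 1, window))
    return out


# A single-day release of accumulated reports is spread over the window.
daily = [0, 0, 0, 0, 0, 0, 7, 0]
print(moving_average(daily))
```

A one-day reporting spike of this kind remains visible in the smoothed series as a plateau, which is consistent with the remark above that the fitted curves, rather than the histogram data, are used to read the trend.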

Conclusion
In this paper, we propose a new method, called the QDF method, based on piecewise quasi-uniform B-spline curves, to evaluate the spread of the COVID-19 epidemic. By fitting the distribution histogram data built from the daily confirmed cases, daily recovery cases, and daily fatality cases of eighteen countries, we conclude that as the epidemic spreads, even with virus mutation, the case fatality rate of COVID-19 keeps declining. In the fitting results, the shapes of the fitting curves of daily recovery cases are very similar to those of daily confirmed cases, as shown in figure 1

Fig. 3. Histogram data and quasi-distribution fitting of Canada

Table 2 Beginning date and ending date of different countries' COVID-19 data
For the given data $f_k^D$, a function with unknown control points $C_i$ can be fitted to those data. The most important part is to calculate the unknown control points: we build a vector equation group with $N$ equations and solve it for the control points $C_i$.