Comparison of Positivity in Two Epidemic Waves of COVID-19 in Colombia with FDA

Abstract: We use functional data analysis (FDA) to examine whether there are significant differences between two waves of COVID-19 contagion in Colombia between 7 July 2020 and 20 July 2021. A pointwise functional t-test is used first; we then present an alternative test for paired samples, which has a theoretical null distribution and performs well in small samples. Our test yields a scalar p-value, which gives a global assessment of the significance of the difference between the positivity curves and thereby complements the existing pointwise tests.


Introduction
Functional data analysis (FDA) is a branch of statistics that has developed considerably in recent years due to its many applications in different fields of science [1-7]. One of the reasons for its popularity is that all consecutive observations of a continuous phenomenon can be viewed as a single curve, and the objective of this field is precisely to analyze sets of curves. FDA has a wide range of descriptive and inferential techniques to accomplish this [8,9].
During the global emergency caused by COVID-19, the Colombian government chose the positivity rate as a key variable for early decisions regarding the management of the disease. The rate can be calculated over different periods (daily, weekly, monthly, etc.). In this study, the positivity rate is the daily percentage of positive COVID-19 tests out of the total number of tests processed. Its trend helps to detect an epidemic wave or outbreak [10,11].
Because the positivity rate is related to waves of contagion [10,12], we assume that one year of confirmed coronavirus cases in Colombia is enough to identify the most critical trend changes in that period. As the information broken down by department was only reported in Colombia from July 2020, we have usable information from 19 July 2020 onward. Several contagion waves can be seen, but we focus our attention on two of them, depicted in Figure 1.
In 2021, Colombia witnessed strong protests. People took to the streets due to multiple social factors, which prevented isolation and other care measures against COVID-19. This may have increased both the duration and the magnitude of the contagion wave, since the protests began on 28 April 2021 and lasted for more than two months [13].
Because of this, we consider two case studies. In the first case, we assume that the two waves of contagion have the same duration; in the second, we allow the two waves to have different durations. The purpose of this study is to determine, separately for each case study, whether there are significant differences between the first and second waves. Because the data are continuous and paired, we use the FDA methodology with a pointwise functional t-test, followed by a proposed hypothesis test based on the integral of the difference between the positivity rate curves, to test the equality of functional means.
The rest of the paper is organized as follows. Section 2 describes the data used in the study, the functional data framework, the pointwise t-test, the theory behind our proposed test, and the basis of our simulation. Section 3 shows the constructed functional data and the results of the tests. Section 4 presents some comments on the results and, finally, we offer our acknowledgments.

Materials and Methods
In this section, we describe the data and the functional data framework used in the article, and we propose a hypothesis test for functional data.

About the COVID-19 Dataset in Colombia and the Two Case Studies
In Colombia, data on COVID-19 are officially reported by the National Health Institute (NHI) of the Colombian Ministry of Health. Colombia is divided into 32 regional sections called departments and one special capital district, Bogotá D.C. Each of these reports to the NHI the number and type of tests performed and the number of positive cases. In some cases this information is not properly reported, which leaves incomplete records. In this study, we only consider departments whose information from 20 July 2020 to 20 July 2021 is complete. This gives 23 departments and, including Bogotá D.C., 24 regions in total: Antioquia, Atlántico, Bolívar, Boyacá, Caldas, Casanare, Cauca, Cesar, Córdoba, Cundinamarca, Guajira, Huila, Magdalena, Meta, Nariño, Norte de Santander, Putumayo, Quindío, Risaralda, Santander, Sucre, Tolima, Valle del Cauca, and Bogotá D.C.
It is suspected that the national strike that began in Colombia on 28 April 2021 and lasted for at least two months [13] led people to neglect personal protection measures against COVID-19, and that this lengthened the wave of SARS-CoV-2 contagion in Colombia. Two study scenarios were considered. The first, called Case 1, assumes the measurements for the first wave of contagion from 20 November 2020, to 18 [...]
It is important to note that, for each case, the data from the first and second waves are paired: for each department, the first wave is followed by the second, which constitutes before-and-after observations.

About Functional Data Analysis
Functional data analysis is a statistical methodology in which the objects of study are not scalar values but continuous functions [7,14], considered as observations of a stochastic process $\{X(t) : t \in T\}$. Thus, the set of functional observations $x_1(t), x_2(t), \dots, x_n(t)$ constitutes a simple random sample of the process, and each observation is called a functional datum.
FDA can draw on the mathematical theory of functions collected in functional analysis, since the functional observations considered are smooth, square-integrable curves. That is, if $\{x_i(t)\}_{i=1}^{n}$ is a functional random sample associated with the stochastic process $\{X(t) : t \in T\}$, then Equation (1) holds for $i = 1, 2, \dots, n$:
$$\int_T x_i^2(t)\,dt < \infty. \tag{1}$$
Thus, the functional data are elements of a Hilbert vector space over the field $\mathbb{R}$ of real numbers. This space is denoted $L^2(T)$, where $T$ is the domain of the elements of $L^2$, which, without loss of generality, can be taken to be the interval $[0, 1]$ [15,16]. In FDA, the underlying stochastic process $\{X(t) : t \in T\}$ is assumed to be a second-order stochastic process [17], so its expected value exists in functional form, defined as in Equation (2) on the probability space $(\Omega, \mathcal{A}, P)$:
$$\mu(t) = E[X(t)] = \int_{\Omega} X(t, \omega)\,dP(\omega). \tag{2}$$
Although FDA treats observations of the stochastic process as continuous functions, the curves must be obtained from pointwise observations of the phenomenon. There are different methodologies for this; we use the vector space structure of $L^2([0, 1])$ and assume that the observations are elements of a finite-dimensional subspace $H$ of $L^2([0, 1])$. This allows us to assume the existence of a finite basis $B = \{\phi_j(t)\}_{j=1}^{k}$ of size $k$ for the subspace $H$, where each element of $B$ is a basis function and $k$ is the dimension of $H$. Thus, each functional datum $x_i(t)$ is uniquely expressed as a linear combination of the elements of $B$ with real coefficients, as in Equation (3):
$$x_i(t) = \sum_{j=1}^{k} c_{i,j}\,\phi_j(t), \tag{3}$$
where $c_{i,j} \in \mathbb{R}$. In this work, the values of $c_{i,j}$ are obtained by least squares [1]. This allows the mean function $\mu(t)$ to be estimated through Expression (4):
$$\hat{\mu}(t) = \frac{1}{n}\sum_{i=1}^{n} x_i(t), \tag{4}$$
where $\{x_i(t)\}_{i=1}^{n}$ is a set of functional data [8]. There are different types of bases to generate functional data. In this study, we consider basis functions $\phi_j$ of the form defined in Equation (5) for $j = 2, 3, \dots, k$, since, together with the constant function $\phi_1(t) = 1$, the set of functions $\{\phi_j\}_{j=1}^{k}$ constitutes a finite orthonormal basis for a vector subspace of dimension $k$ of the Hilbert space $L^2([0, 1])$ [18,19].
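The construction of functional data by least squares, as in Equations (3) and (4), can be illustrated with a minimal Python sketch. It assumes a cosine basis, $\phi_1(t) = 1$ and $\phi_j(t) = \sqrt{2}\cos((j-1)\pi t)$, which is one orthonormal basis of $L^2([0, 1])$ and only stands in for the basis of Equation (5); the function names and the data are hypothetical.

```python
import numpy as np

def cosine_basis(t, k):
    """Evaluate k basis functions at the points t in [0, 1].

    ASSUMPTION: phi_1(t) = 1 and phi_j(t) = sqrt(2) * cos((j - 1) * pi * t)
    for j = 2..k. This stands in for the basis of Equation (5), which may differ.
    """
    t = np.asarray(t, dtype=float)
    j = np.arange(k)                                   # 0, 1, ..., k - 1
    basis = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, j))
    basis[:, 0] = 1.0                                  # constant function phi_1
    return basis                                       # shape (len(t), k)

def fit_coefficients(t, y, k):
    """Least-squares coefficients c_{i,j}, one row per observed curve in y."""
    design = cosine_basis(t, k)
    # Solve min ||design @ c_i - y_i||^2 for all curves at once.
    coefs, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float).T, rcond=None)
    return coefs.T                                     # shape (n_curves, k)

def evaluate(coefs, t):
    """Evaluate x_i(t) = sum_j c_{i,j} * phi_j(t) on the points t."""
    return coefs @ cosine_basis(t, coefs.shape[1]).T

# Hypothetical data: 3 noisy curves observed at 60 daily points.
rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 1.0, 60)
y_obs = 10 + 5 * np.cos(np.pi * t_obs) + rng.normal(0, 0.5, size=(3, 60))

coefs = fit_coefficients(t_obs, y_obs, k=7)
curves = evaluate(coefs, np.linspace(0.0, 1.0, 200))
mean_curve = curves.mean(axis=0)                       # sample mean function
```

The number of basis functions `k` controls the smoothness of the fitted curves; the value used here is arbitrary.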

Hypothesis Testing in Functional Data
A hypothesis test for functional data stems from the same theoretical foundation as a hypothesis test for scalar data. Accordingly, an initial hypothesis is generated about a population parameter, known as the null hypothesis and denoted as H 0 , which is contrasted with a hypothesis, generally complementary about the same parameter, called the alternative hypothesis and denoted as H 1 [20].
Since the objective of our study is to determine whether there are significant differences between the two waves of COVID-19 contagion, and since the positivity rate behaves continuously, we use the functional data methodology. We take the functional mean as the parameter of the hypothesis test and denote by $\mu_X$ and $\mu_Y$ the functional means of the stochastic processes $\{X(t) : t \in T\}$ and $\{Y(t) : t \in T\}$ associated with the positivity rate of COVID-19 in the first and second waves of contagion, respectively. With this, the contrast of hypotheses is stated in Equation (6):
$$H_0: \mu_X = \mu_Y \quad \text{versus} \quad H_1: \mu_X \neq \mu_Y. \tag{6}$$
Based on the data samples, a test statistic is computed. Its value is located within the null distribution, which is the probability distribution the statistic would follow if the null hypothesis were true. Using the null distribution, a p-value is then calculated: the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one observed [20].
The literature on hypothesis testing for functional data is diverse and covers different approaches, such as [21-28]. However, as early work on functional data shows, hypothesis testing for functional data can be performed using a pointwise t-test [1], based on the idea of fixing a value $t \in [0, 1]$. The hypothesis test of Equation (6) is then performed for each of the infinitely many points $t \in [0, 1]$; since the values $x_i(t)$ and $y_i(t)$ are scalars for each fixed $t$, a t-test for scalar data can be applied.
We first make the statistical comparison with a pointwise t-test, which is a natural extension of the t-test to the functional context. This methodology has a limitation: because the scalar tests are performed over the values of the domain $[0, 1]$, the p-value is itself a function of $t$, so it is difficult to draw a global conclusion about the contrast. For this reason, we propose a different approach to hypothesis testing for functional data, which produces a single global p-value that supports a decision over the entire domain rather than over sections of it.

Another Hypothesis Test Approach for Functional Data
In our case study, the data of interest are not only continuous but also paired. That is, for each department there is a curve for the first wave and a curve for the second wave, which gives 24 pairs of curves. Thus, we proceed as in the scalar case and restate the contrast as in Equation (7):
$$H_0: \mu_X - \mu_Y = 0_{L^2([0,1])} \quad \text{versus} \quad H_1: \mu_X - \mu_Y \neq 0_{L^2([0,1])}, \tag{7}$$
where $0_{L^2([0,1])}$ is the zero function in $L^2([0, 1])$. If the difference curve of two continuous functions is close to the zero function, the two functions are similar. Therefore, if the null hypothesis is true, the integral of the difference curve must be zero, and we obtain the contrast of Equation (8):
$$H_0: \int_0^1 \big(\mu_X(t) - \mu_Y(t)\big)\,dt = 0 \quad \text{versus} \quad H_1: \int_0^1 \big(\mu_X(t) - \mu_Y(t)\big)\,dt \neq 0. \tag{8}$$
For the hypothesis test of Equation (8), we present a test statistic based on the average of the integrals of the functional differences, called the Mean Integral of Differences (MID), which is obtained with a little algebra from the sample estimates of the parameters. Given two paired sets of functional data $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$, the MID contrast statistic is defined by Equation (9):
$$\mathrm{MID} = \frac{1}{n}\sum_{i=1}^{n}\int_0^1 d_i(t)\,dt, \tag{9}$$
where $d_i(t)$ is defined as in Equation (10):
$$d_i(t) = x_i(t) - y_i(t), \tag{10}$$
for each $i = 1, 2, \dots, n$.
Under the null hypothesis, the proposed test statistic approximately follows a normal distribution with mean zero and variance $\sigma^2$, $\mathrm{MID} \sim N(0, \sigma^2)$, as guaranteed by the central limit theorem. The contrast can be carried out by standardizing MID using Expression (11):
$$\mathrm{S.MID} = \frac{\mathrm{MID}}{\sigma}. \tag{11}$$
Here $\sigma$ can be obtained from Equation (12):
$$\sigma = \frac{S_d}{\sqrt{n}}, \tag{12}$$
where $S_d^2$ is the sample variance of the set $\left\{\int_0^1 d_i(t)\,dt\right\}_{i=1}^{n}$. In this way, a scalar p-value can be computed as $2P(Z \geq |\mathrm{S.MID}|)$, where $Z$ is a real random variable such that $Z \sim N(0, 1)$.
In addition to the theoretical approach, to apply our contrast statistic to the specific study cases we also ran a simulation to obtain a null distribution and perform the test, where the null hypothesis is that the two paired functional samples come from populations with the same mean. We simulate paired scalar points from a common functional mean for the two groups and use them to construct the curves that constitute the functional data samples of size 24; we then apply the test statistic. The process is repeated 4000 times for that sample size.
Moreover, a Shapiro-Wilk test is performed on the values of the integrals of the difference functions to assess whether there is evidence that they do not come from a normal distribution. The p-value of this test is reported in the results section and supports the use of the proposed methodology.
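The MID test with its theoretical null distribution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the curves are sampled on a common grid, the integrals are approximated with the trapezoidal rule, and $\sigma$ is estimated as $S_d/\sqrt{n}$; the names `trapezoid` and `mid_test` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def trapezoid(values, grid):
    """Trapezoidal approximation of the integral of a sampled curve."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )

def mid_test(x_curves, y_curves, grid):
    """Paired MID test for curves sampled on a common grid.

    Returns (MID, standardized MID, two-sided p-value).
    """
    # Integral of each difference curve d_i(t) = x_i(t) - y_i(t).
    integrals = [
        trapezoid([xv - yv for xv, yv in zip(x, y)], grid)
        for x, y in zip(x_curves, y_curves)
    ]
    n = len(integrals)
    mid = mean(integrals)
    # ASSUMPTION: sigma is estimated as S_d / sqrt(n), with S_d the sample
    # standard deviation of the n integrals.
    sigma = stdev(integrals) / sqrt(n)
    s_mid = mid / sigma
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(s_mid)))
    return mid, s_mid, p_value

# Hypothetical example: 4 pairs of constant curves on a grid over [0, 1].
grid = [i / 100 for i in range(101)]
x_curves = [[level] * len(grid) for level in (1.0, 2.0, 3.0, 4.0)]
y_curves = [[0.0] * len(grid) for _ in range(4)]
mid, s_mid, p_value = mid_test(x_curves, y_curves, grid)
```

With real data, `x_curves` and `y_curves` would hold the fitted first-wave and second-wave curves of each department evaluated on the same grid.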

Results
In this section, we present the results of our study applying the FDA methodology to the positivity data for COVID-19 in Colombia in both cases considered.

Constructed Functional Data
Figure 4 shows the functional data of the COVID-19 positivity rate in Colombia for case 1, built with the orthonormal basis of Equation (5). The top left and top right panels show the functional data of the positivity rate for the first and second waves, respectively. The bottom left panel shows the functional means of the positivity rate for both waves of contagion, which are the objects of the comparison. The comparison is carried out through the difference curves of the positivity rate between the two waves, shown in the bottom right panel.
Similarly, the functional data of the COVID-19 positivity rate in Colombia for case 2 are depicted in Figure 5. The top left and top right panels show the functional data of the positivity rate for the first and second waves, respectively. A certain difference in the trend of positivity between the two waves can be appreciated. It can also be seen in the functional means of both waves, shown in the bottom left panel, which are the objects of comparison in case 2. In addition, the difference curves of the positivity rate shown in the figure follow a different trend from the difference curves of case 1.

Pointwise Hypothesis Contrast for Curves
As stated above, the pointwise t-test assumes that, for each $t \in [0, 1]$, a scalar t-test can be performed on the values of the functions at the point $t$. That is, a t-test is applied to the two groups of scalar values $\{x_i(t)\}_{i=1}^{n}$ and $\{y_i(t)\}_{i=1}^{n}$ that result from evaluating the functions $x_i$ and $y_i$ at the same fixed point $t$, for $i = 1, 2, \dots, n$. In this case, the contrast is defined as in Equation (13):
$$H_0: \mu_{X(t)} = \mu_{Y(t)} \quad \text{versus} \quad H_1: \mu_{X(t)} \neq \mu_{Y(t)}, \tag{13}$$
where $\mu_{X(t)}$ and $\mu_{Y(t)}$ are scalar parameters, since $t$ is a fixed value. This test is performed using the test statistic of Equation (14):
$$T(t) = \frac{\bar{d}(t)}{s_d(t)/\sqrt{n}}, \qquad \bar{d}(t) = \frac{1}{n}\sum_{i=1}^{n}\big(x_i(t) - y_i(t)\big), \tag{14}$$
where $s_d(t)$ is the standard deviation of the scalar values $\{x_i(t) - y_i(t)\}_{i=1}^{n}$. In this way, we take 1000 values of $t$ within the interval $[0, 1]$ and perform the test for each of them. We then obtain 1000 p-values, which are shown in Figure 6 (left panel for case 1, right panel for case 2). Note that, so far, it is not possible to determine globally, through the positivity curves, whether there are significant differences between the two waves of COVID-19 contagion.
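A minimal sketch of the pointwise paired t-test on a grid of 1000 points follows. It computes only the statistic curve and flags the grid points where its absolute value exceeds a fixed threshold; the threshold 2.069 is an assumed approximation of the two-sided 5% critical value of a Student t distribution with 23 degrees of freedom, and the simulated curves and all names are hypothetical.

```python
import numpy as np

def pointwise_paired_t(x_vals, y_vals):
    """Paired t statistic at each grid point.

    x_vals, y_vals: arrays of shape (n_curves, n_points) holding the curves
    evaluated on a common grid of t values.
    """
    d = np.asarray(x_vals, dtype=float) - np.asarray(y_vals, dtype=float)
    n = d.shape[0]
    d_bar = d.mean(axis=0)                 # mean difference at each t
    s_d = d.std(axis=0, ddof=1)            # sample standard deviation at each t
    return d_bar / (s_d / np.sqrt(n))

# Hypothetical paired curves: 24 pairs on a grid of 1000 points.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 1000)
level = rng.normal(10.0, 1.0, size=(24, 1))            # pair-specific level
x_vals = level + np.sin(np.pi * grid)                  # "first wave" curves
y_vals = x_vals + 2.0 * grid + rng.normal(0.0, 0.5, size=(24, 1))

t_curve = pointwise_paired_t(x_vals, y_vals)

# ASSUMPTION: approximate two-sided 5% critical value for 23 degrees of freedom.
T_CRIT = 2.069
reject = np.abs(t_curve) > T_CRIT          # True where the pointwise test rejects
```

To obtain a p-value at each grid point, as in Figure 6, the statistic would be passed to a Student t distribution function, for example through `scipy.stats.ttest_rel(x_vals, y_vals, axis=0)` if SciPy is available.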

Another Hypothesis Test Approach for Functional Data
Following the methodology described above, two groups of 24 curves were simulated in pairs under the null hypothesis that the functional means are equal, and the MID test statistic was calculated for the sample. This process was repeated 4000 times separately for each case study, yielding 4000 values of the MID statistic. After the null distribution was obtained as a histogram, the value of the test statistic was calculated on the real functional data of the positivity rate for COVID-19 in Colombia in both case studies. Using the simulated histogram, the critical values corresponding to a significance level of 0.05 were found from the empirical frequencies. The histograms, together with the value of the statistic and the respective critical values, are shown in Figure 7 (left panel for case 1, right panel for case 2).
In addition, Table 1 shows the p-values of the Shapiro-Wilk test performed on the 24 integrals of the differences of the paired functional data in the two study cases. It also shows, for each case, the value of the test statistic, its p-value under the theoretical distribution, its p-value under simulation, and the critical values of the simulated null distribution.
Note that, with the scalar p-values produced by our proposal, it is now possible to decide globally whether there are significant differences between the two waves of COVID-19 contagion. Thus, for case 2, we can say that there are significant differences between the two waves of contagion, since the p-value is 0.00001 under the theoretical null distribution and 0.0015 under the simulated null distribution. Therefore, the hypothesis that the functional means of the positivity rate are the same in both waves of contagion by COVID-19 is rejected.
In turn, for the first case, the p-values are 0.08906 under the theoretical null distribution and 0.0875 under the simulated null distribution, so the hypothesis that the functional means of the positivity rate are the same in both waves of contagion by COVID-19 is not rejected. The margin with respect to the reference significance level of 0.05 is small.
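The simulation used in this subsection to obtain a null distribution can be sketched as follows. The noise model is an assumption made only for illustration (each curve is the common mean plus a random level shift and a random linear tilt), as are the mean curve, the function names, and the parameter values; the critical values and the p-value are read from the empirical distribution of the 4000 simulated MID values.

```python
import numpy as np

def mid_statistic(x_vals, y_vals, grid):
    """Mean over pairs of the integral of x_i(t) - y_i(t) (trapezoidal rule)."""
    d = x_vals - y_vals
    widths = np.diff(grid)
    integrals = np.sum(0.5 * (d[:, 1:] + d[:, :-1]) * widths, axis=1)
    return integrals.mean()

def simulate_null(mean_curve, grid, n_pairs=24, n_rep=4000, noise_sd=1.0, seed=0):
    """Simulate MID values when both groups share the same mean curve.

    ASSUMPTION: each curve is the common mean plus a random level shift and a
    random linear tilt; this noise model is chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    stats = np.empty(n_rep)
    for r in range(n_rep):
        # Shape (2, n_pairs, 1): index 0 is the first group, index 1 the second.
        shift = rng.normal(0.0, noise_sd, size=(2, n_pairs, 1))
        tilt = rng.normal(0.0, noise_sd, size=(2, n_pairs, 1))
        curves = mean_curve + shift + tilt * (grid - 0.5)
        stats[r] = mid_statistic(curves[0], curves[1], grid)
    return stats

def empirical_p_value(observed, null_stats):
    """Two-sided p-value: share of simulated |MID| at least as large as observed."""
    return np.mean(np.abs(null_stats) >= abs(observed))

# Hypothetical common mean curve on a grid over [0, 1].
grid = np.linspace(0.0, 1.0, 101)
mean_curve = 10 + 5 * np.sin(np.pi * grid)

null_stats = simulate_null(mean_curve, grid)
lower, upper = np.quantile(null_stats, [0.025, 0.975])   # 5% critical values
```

An observed MID value computed on real data would be compared with `lower` and `upper`, or passed to `empirical_p_value` together with `null_stats`.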

Discussion
It is important to point out that, as Figure 6 shows, the pointwise t-test for functional data allows us to identify the sections of the domain of the functions where there are significant differences. In terms of the case studies, the dates between which the two contagion waves differ most could be identified. Nevertheless, the pointwise t-test is insufficient to determine whether or not the two contagion waves are significantly different in each case study.
In contrast, our proposed hypothesis test for paired data allows a global decision to be made based on a scalar p-value. As can be seen in Figure 7 (left panel), for Case 1 the contrast statistic does not reject the hypothesis of equal means at a significance level of 0.05; i.e., at that level there is no evidence of significant differences between the two contagion waves, even though the p-value of 0.082 is relatively close to it. If the significance level were set at 0.1, the decision would be to reject the null hypothesis, although again by a narrow margin. With regard to Case 2, as shown in Figure 7 (right panel), the p-value found with our statistic is 0.0015, so we can say that there is sufficient evidence that the two contagion waves are significantly different.
In view of the above, although the p-value in case 1 leaves some doubt, it is important to highlight the difference between the p-values of the two cases. From a broader point of view, this difference seems to support the idea that the two case studies are markedly different, and that the national strike in Colombia should not be ignored when analyzing epidemiological behavior, since the case studies suggest a possible change in the positivity data due to noncompliance with care measures during the national strike.
Author Contributions: Both authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.