On the Asymptotic Capacity of Information-Theoretic Privacy-Preserving Epidemiological Data Collection

Paradigm-shifting developments in cryptography and information theory have focused attention on the privacy of data-sharing systems, such as epidemiological studies, where agencies collect far more personal data than they need, intruding on patients' privacy. To study the capability of data collection while protecting privacy from an information-theoretic perspective, we formulate a new distributed multiparty computation problem called privacy-preserving epidemiological data collection. In our setting, a data collector requires a linear combination of K users' data through a storage system consisting of N servers. Privacy must be protected when the users, servers, and data collector do not trust each other: for the users, all data must be protected from up to E colluding servers; for the servers, no information beyond the desired linear combination may be leaked to the data collector; and for the data collector, no single server may learn anything about the coefficients of the linear combination. Our goal is to find the optimal collection rate, defined as the ratio of the size of the user's message to the total size of the downloads from the N servers to the data collector. For achievability, we propose an asymptotically capacity-achieving scheme when E < N − 1 by applying the cross-subspace alignment method to our construction; for the converse, we prove an upper bound on the asymptotic rate of all achievable schemes when E < N − 1. Additionally, we show that a positive asymptotic capacity is not possible when E ≥ N − 1. The achievability and converse results meet as the number of users goes to infinity, yielding the asymptotic capacity. Our work broadens current research on data privacy in information theory and gives the best achievable asymptotic performance that any epidemiological data collector can obtain.


Introduction
During any prevention and control period in an epidemic, strengthening the protection of personal information is conducive not only to safeguarding personal interests, but also to better controlling the development of the epidemic. Unlike the collection of other homogeneous data, such as the large-scale labeled samples used in machine learning, medical or epidemiological data collection has the following characteristics: (1) to establish a surveillance system that collects syndromic data from a dynamic group of people, the sample size is constantly changing [1,2]; (2) data sharing and early response are critical in containing the spread of highly infectious diseases, such as COVID-19, which requires the data stored in the database to be updated to track real-time changes in symptoms, severity, or other disease-related patterns [3,4]; (3) some epidemiological data, such as the locations of infected individuals, the blood-oxygen-saturation levels of patients with respiratory diseases after medication, or physical-condition monitoring data after viral infection, relate to users' privacy, so a "privacy-first" approach that uses dynamic identifiers and stores data in a cryptographically secure manner is needed [5]. Hence, given these characteristics of epidemiological data collection, the key to ensuring both efficient information sharing and privacy protection is to optimize the balance between data privacy and collection rate in epidemic analysis.
While data collection by public health authorities and open-source access for researchers provide convenience to epidemiologists, these strategies may also significantly intrude upon citizens' privacy. Some individuals have already been affected by unwanted privacy invasions, and ubiquitous data-surveillance devices certainly exacerbate those concerns. Therefore, the responsible use of shared data and algorithms, compliance with data-protection regulations, and appropriate respect for privacy and confidentiality have become important topics in epidemiological data collection [6,7]. In 2014, the Global System for Mobile Communications (GSM) Association outlined privacy standards for data-processing agencies regarding mobile-phone data collection in response to the Ebola virus [8]. Other methods that effectively guarantee user privacy include encryption mechanisms [9], the privacy-aware energy-efficient framework (P-AEEF) protocol [10], differential-privacy-based classification algorithms [11], and the spatio-temporal trajectory method [12]. Moreover, unauthorized agencies, malicious hackers, and unidentified attacks, such as traffic-analysis attacks, fake-node injection, and selective forwarding, may also eavesdrop on users' data under the current ambiguous and non-uniform collection algorithms [13]. To avoid the leakage of information to malicious collecting servers or untrusted accessors, the privacy of the data needs to be ensured during both storage and processing.
In line with ensuring the security and privacy of user data, epidemiological data collection also requires the protection of data collectors. When accessing public databases, various epidemiological investigation agencies have different requirements for data privacy. For example, many epidemiological surveys are conducted by governments, universities, research institutes, pharmaceutical companies, and private institutions, and their access to public data may reveal these agencies' data preferences, resulting in information leakage. Therefore, privacy-preserving epidemiological data collection includes the users' privacy with respect to both the storage system and the data collector, along with the data collector's privacy with respect to the storage system.
In epidemiological modeling, many recent studies have shown that various models fit the dynamics of epidemics well, such as Bayesian models [14] and deep-learning models, including multi-head attention, long short-term memory (LSTM), and convolutional neural networks (CNNs) [15]. Additionally, when it comes to privacy and security concerns, work in computer science, cryptography, and information theory provides handy tools to model and solve such problems. Privacy leakage has been modeled through differential privacy [16], k-anonymity [17], t-closeness [18], interval privacy [19], etc. With these concerns and analyses, several studies in cryptography and information theory have focused on the issue of sharing messages with untrusted agencies, such as distributed linearly separable computation [20][21][22], secure matrix multiplication [23][24][25], secure aggregation [26], participatory sensing [27], and private information retrieval [28][29][30][31]. Specifically, a cryptographic epidemiological model with data security and privacy called RIPPLE was analyzed in [32]. This model enables the execution of epidemiological models based on the most recent real contact graph while keeping all users' information private. As an extension of the model, the data collector uses sum private information retrieval (PIR) schemes to obtain the statistical data, described as the sum of a set of elements from a database.
Inspired by the cryptographic epidemiological model in [32], we propose an information-theoretic privacy-preserving epidemiological-data-collection problem, described as follows. A large and changing number of users share their real-time epidemiological data with N servers. Each server updates its database once new data are uploaded. The storage of these users' data is also open to all authorized data collectors. Any data collector who desires only some statistical feature, rather than all the data, directly retrieves the statistic from the N servers. We design the protocol of interaction between the N servers and the data collector so that, when all N servers honestly answer the queries from the data collector, the desired statistical data can be correctly decoded at the data collector. To simplify the problem, we assume the desired statistics to be linear combinations of the users' personal data. The privacy of this model is reflected in the following three aspects: (1) The privacy of users' data against the data collector: after downloading all answers from the servers, the data collector can decode the desired data without learning anything about the irrelevant details of the users' personal information.
(2) The privacy of users' data against the servers: for any user successfully sharing data with all N servers, the personal information of the user remains confidential, even if any set of up to E (E < N) servers colludes. (3) The privacy of the data collector's preference against the servers: the protocol between the data collector and the servers should guarantee that no single server can learn any preference of the desired statistical data from the query generated by the data collector. In this paper, we take the above privacy concerns into consideration and analyze the data collector's ability to privately receive the shared data, measured by the number of symbols it needs to download.
The remainder of the paper is organized as follows: In Section 2, we introduce an information-theoretic description of the privacy-preserving epidemiological-data-collection problem, and we define the communication rate and capacity of our problem. In Section 3, we give a closed-form expression for the asymptotic capacity, which is the main result of the paper. In Section 4, we derive a converse proof for E < N − 1, which provides an upper bound on the asymptotic capacity when the number of users tends to infinity. An asymptotically capacity-achieving scheme using the technique of cross-subspace alignment when E < N − 1 is given in Section 5, and in Section 6, we prove that the problem is not asymptotically achievable when E = N − 1. Finally, in Section 7, we summarize our results and suggest some open problems in this field.

System Model
We formulate the privacy-preserving epidemiological-data-collection problem over a typical distributed, secure computation system consisting of K users, N servers, and a data collector, where the number of users K ∈ N+ is a large enough integer. Here and throughout the paper, we assume that all random symbols in our system are generated over a large enough finite field GF_q, and we normalize the entropy of any uniformly distributed symbol to 1 by taking the logarithms in entropy, conditional entropy, and mutual information to base q. The model is depicted in Figure 1. Let W_1, ..., W_K be K independent messages, where W_k, k ∈ [1:K], denotes the epidemiological data of user k, and W_k ∈ GF_q^{L×1}, L ∈ N+, is an L × 1 vector of L i.i.d. uniform symbols from the finite field GF_q; i.e., H(W_1, ..., W_K) = H(W_1) + ... + H(W_K) = KL. The data-collection problem contains two phases: the upload phase and the computation phase. The upload phase starts when the users are required to upload their data to each of the N servers. Before uploading, user k knows nothing about the contents of the other users' epidemiological data. While uploading their personal data, the users would like to keep their messages private against E (E ≤ N, E ∈ N) colluding servers; i.e., any set of up to E servers must learn nothing about the messages uploaded by the K users. For any n ∈ [1:N], let D_{k,n} ∈ D denote the uploaded message from the k-th user to the n-th server. We have the following equality, called the privacy constraint of the users against E servers:

I(W_{[1:K]}; D_{[1:K],E}) = 0, ∀E ⊆ [1:N], |E| = E. (2)

For any k ∈ [1:K], let Z_k ∈ Z_k denote a random variable privately generated by user k, whose realization is not known to any of the N servers. User k utilizes the user-side randomness Z_k to encipher the message W_k; i.e., for any user k, there exist N functions {d_{k,n}}_{n∈[1:N]} such that d_{k,n}(W_k, Z_k) = D_{k,n}, where d_{k,n}: GF_q^L × Z_k → D is the encoding function from user k to server n.
When the servers receive the updated data from the users, the old contents of each server are replaced with the new contents; all servers are then ready for the computation phase and allow data collectors to access the updated data.
The computation phase starts when a data collector would like to compute statistical data from the current epidemiological database. To simplify our problem, we only consider statistical data that are a linear combination of all messages W_{[1:K]}. Let W_f and f be the statistical data and the corresponding coefficient vector. Then, we have

W_f = f_1 W_1 + f_2 W_2 + ... + f_K W_K, (3)

where W_f ∈ GF_q^L has the same number of symbols as each message, and f = [f_1, ..., f_K]^T ∈ GF_q^K contains K elements of the finite field GF_q. In our setting, the coefficient vector f only encodes the preference of the data collector among the K users' epidemiological data records. Therefore, the value of f does not depend on the users' messages W_{[1:K]} or on the storage of the N servers. Let Z ∈ Z denote a random variable privately generated by the data collector, whose realization is not available to any of the N servers or the K users. In the computation phase, the data collector with preference vector f utilizes the randomness Z to generate its queries to the N servers. Assume that the query from the data collector to the n-th server is denoted by Q^f_n ∈ Q_n. The data collector uses the strategy g to map the randomness and the coefficient vector to the queries, such that

(Q^f_1, ..., Q^f_N) = g(f, Z), (4)

where g : GF_q^K × Z → Q_1 × ... × Q_N is the encoding function from the data collector to the servers. Hence, we have

H(Q^f_{[1:N]} | f, Z) = 0. (5)

Since the randomness Z and the queries Q^f_{[1:N]} are generated privately, and the user-side data and randomness W_{[1:K]}, Z_{[1:K]} are already fixed before the computation phase starts, the data collector has no knowledge of W_{[1:K]}, Z_{[1:K]} when the queries Q^f_{[1:N]} are generated. Thus, we have

I(W_{[1:K]}, Z_{[1:K]}; Q^f_{[1:N]}, Z) = 0. (6)

Upon receiving the query Q^f_n, the n-th server sends an answer A^f_n back to the data collector according to its storage. We assume that there are no drop-outs in the model; i.e., every server n successfully returns its answer to the data collector.
Let A^f_n ∈ A_n be the answer from the n-th server to the data collector. Then, for any n ∈ [1:N], there exists a deterministic function a^f_n such that A^f_n = a^f_n(Q^f_n, D_{[1:K],n}); i.e.,

H(A^f_n | Q^f_n, D_{[1:K],n}) = 0. (7)

We would like to design a scheme φ = {d_{k,n}, g, a^f_n}_{k∈[1:K], n∈[1:N], f∈GF_q^K} for the upload and computation phases such that the following three constraints are satisfied. Firstly, the data collector must be able to reconstruct the desired message from all the information it has received, which we call the decodability constraint. Let ψ be the reconstruction function of the data collector, where ψ : A_1 × ... × A_N × Q_1 × ... × Q_N × Z × GF_q^K → GF_q^L. We have

Ŵ_f = ψ(A^f_{[1:N]}, Q^f_{[1:N]}, Z, f). (8)

Let P_e = Pr(Ŵ_f ≠ W_f) be the probability of decoding error achieved by a given scheme φ and decoding function ψ. According to Fano's inequality, the decodability constraint is equivalent to

H(W_f | A^f_{[1:N]}, Q^f_{[1:N]}, Z, f) = o(L) (9)

when P_e → 0, where o(L) denotes any function h(L) such that h(L)/L → 0 as L → ∞. The second constraint guarantees the data privacy of the K users against the data collector. Some privacy leakage to the data collector is unavoidable, as the data collector must learn the desired statistic of the users' data. This constraint requires that the data collector learn nothing about the information of the K users other than the desired data W_f. We have

I(W_{[1:K]}; A^f_{[1:N]}, Q^f_{[1:N]}, Z, f | W_f) = o(L). (10)

To protect the privacy of the data collector, the third constraint requires the coefficient vector f of the data collector to be indistinguishable from the perspective of each server; i.e., for any two different (linearly independent) coefficient vectors f and f′ under the same scheme φ, the queries to every single server are identically distributed, so that no server can deduce f merely from its query and storage without communicating with other servers. Hence, we have

Q^f_n ∼ Q^{f′}_n, ∀n ∈ [1:N], ∀f, f′ linearly independent, (11)

where A ∼ B means that the random variables A and B have the same distribution. This constraint is called the privacy constraint of the data collector against non-colluding servers.
The reason why we consider up to E colluding servers in the upload phase but non-colluding servers in the computation phase is the following: the upload phase and the computation phase do not always occur at the same time. For example, the users may be required to upload their epidemiological data on a regular basis, while the data collector may issue its queries for certain statistics at relatively random times. Due to the dynamic topology of the servers, the number of colluding servers may differ between the upload phase and the computation phase. Our work assumes that the servers are non-colluding in the computation phase, since the servers may be more interested in the users' epidemiological data than in the data collector's preferences. If E = 1, we obtain a model in which the servers are non-colluding in both the upload phase and the computation phase.
For any scheme φ = {d_{k,n}, g, a^f_n}_{k∈[1:K], n∈[1:N], f∈GF_q^K} that satisfies the above decodability constraint (9) and the privacy constraints, i.e., the privacy constraint of the users against E servers (2), the privacy constraint of the users against the data collector (10), and the privacy constraint of the data collector against the non-colluding servers (11), the communication rate is characterized by the number of symbols the data collector decodes per downloaded symbol; i.e.,

R := L / Σ_{n=1}^{N} H(A^f_n). (12)

It is worth noticing that R is not a function of f, due to (11). A rate R is said to be (ε-error) achievable if there exists a sequence of schemes indexed by L whose communication rates are less than or equal to R and whose probability of error P_e goes to zero as L → ∞. The ε-error capacity of this privacy-preserving data-collection problem is defined as the supremum of all ε-error achievable rates, i.e., C := sup R, where the supremum is over all possible ε-error achievable schemes.

Main Result
For the information-theoretic framework of our privacy-preserving epidemiological-data-collection problem presented in Section 2, our main result is the characterization of the asymptotic capacity as K → ∞ for any N ∈ N+ and E ∈ N.
To begin with, we show that the problem is infeasible when N = 1 or N = E, since in these cases some constraints of our setting contradict each other and no scheme can satisfy all of them. Firstly, when there is only one server, the privacy of the users against the data collector and the privacy of the data collector against the servers conflict with each other, for the following reason. First, by (11), the distribution of the answer A is the same for two different coefficient vectors f and f′. Second, the decodability constraint (9) guarantees that W_f can be derived from A. Then W_{f′} can equally be derived from A, which contradicts the privacy of the users against the data collector.
Moreover, when E = N, the decodability constraint and the privacy of the users against the servers conflict with each other, for the following reason. The answers A^f_{[1:N]} from the N servers are determined by the queries Q^f_{[1:N]} and the storage of all N servers; by (2) and (6), the answers A^f_{[1:N]} are then independent of the database W_{[1:K]}, which contradicts the decodability constraint. Therefore, no scheme can simultaneously satisfy all the constraints in the single-server or E = N scenario, and a positive capacity does not exist.
The following theorem gives the asymptotic capacity of privacy-preserving epidemiological data collection for an infinitely large K, where N ≥ 2 and E < N.

Theorem 1. Let E be a non-negative integer and N ≥ 2 be the number of servers. When E < N and the number of users K → ∞, the asymptotic capacity of the privacy-preserving epidemiological-data-collection problem is

C_∞ = (N − E − 1)/N, if E < N − 1; C_∞ = 0, if E = N − 1. (13)

When E < N − 1, the converse proof of Theorem 1 is given in Section 4, and the achievability proof is given in Section 5 for any finite K ∈ N+. When K → ∞, our scheme in Section 5 remains achievable at the same rate, and the achievability and converse bounds meet. When E = N − 1, the proof of the zero asymptotic capacity is given in Section 6.

Remark 1.
From Theorem 1, we can see a threshold in the asymptotic capacity at the maximum possible number of colluding servers E. When E ≥ N − 1, no scheme has a positive asymptotic capacity; when E < N − 1, the asymptotic capacity is a decreasing function of E. When N approaches infinity while E is held constant, the asymptotic capacity approaches one.
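As a quick numeric illustration of this remark, the closed-form expression from Theorem 1 can be checked directly; `asymptotic_capacity` is our own helper name for this sketch.

```python
def asymptotic_capacity(N, E):
    """Asymptotic capacity from Theorem 1: (N - E - 1)/N if E < N - 1, else 0."""
    return (N - E - 1) / N if E < N - 1 else 0.0

caps = [asymptotic_capacity(10, E) for E in range(10)]
assert all(x >= y for x, y in zip(caps, caps[1:]))   # decreasing in E for fixed N
assert caps[9] == 0.0                                # zero at the threshold E = N - 1
assert asymptotic_capacity(10**6, 2) > 0.999         # tends to 1 as N grows, E fixed
```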

Remark 2.
When the number of users K is a finite integer, the achievability and converse results do not always meet. The upper bound from our converse proof in Section 4 depends on the value of K. However, the achievable rate of our scheme in Section 5 does not depend on K, even though the scheme we propose differs for each finite K ∈ N+. Closing the gap between the upper and lower bounds for finite K remains an open problem.

Proof of Theorem 1: Converse When E < N − 1
We give the converse proof of Theorem 1 for E < N − 1 in this section. The converse allows any feasible scheme φ, and we derive an upper bound on the rates of all possible schemes. We start with the following lemma, which states an iterative relationship among the numbers of decodable linear combinations of the users' messages.

Lemma 1. Consider K linearly independent vectors f_1, f_2, ..., f_K ∈ GF_q^K, and let E ⊆ [1:N] with |E| = E. Then the iterative bound (14) holds.

Proof. According to our problem setting, (15) holds because of the non-negativity of the following conditional entropy:

where the first equality follows from (7), and the second equality follows from (9). Finally, (18) holds because W_{f′} is independent of the queries and randomness, by the security constraint and the fact that f, f′ ∈ GF_q^K are linearly independent vectors.
The following lemma shows that the answers from any set of servers are conditionally independent of the remaining queries, given that set's queries for the same coefficient vector together with the messages and randomness. This is a direct inference from the independence of the messages, queries, and randomness generated by the data collector (6).

Lemma 2.
Assume that f ∈ GF_q^K, N_1, N_2 ⊆ [1:N], and K ⊆ [1:K]. Then, the equality (20) holds. Proof. The proof is the same as that of Lemma 1 in Section VI of [29]; the key to this proof is that A^f_{N_1} is determined by W_{[1:K]}, conditioned on Z and Q^f_{N_1}. We omit the detailed proof here.
The lemma below has a form similar to Lemma 2; it shows that any set of E answers does not depend on the desired statistic, conditioned on the corresponding set of queries and the randomness generated by the data collector.

Proof. We only need to show that I(A^f_E; W_f | Q^f_E, Z) is less than or equal to zero; combined with the non-negativity of mutual information, it then equals zero.
The following lemma shows that we can split the answers into two parts: one from E servers, which cannot decode the database, and the other from the remaining N − E servers.

Proof. Based on the system model, the total download is lower-bounded by L, where (24) follows from (20), (6), and (7), and (27) is due to (7). Now, we can obtain the lower bound on the asymptotic download size when L and K go to infinity by applying (14) to (23). Thus, for any possible scheme φ satisfying the constraints of the problem, the rate cannot be more than (N − E − 1)/N when E < N − 1 and K → ∞.

Proof of Theorem 1: Achievability When E < N − 1
In this section, we give a cross-subspace alignment (CSA) scheme [33], based on coding the interference in the computation phase, that reaches the asymptotic capacity for any integers N > E + 1 and K ≥ 2. Throughout the scheme, we choose the length of each personal message to be L = N − E − 1 ≥ 1. First, we specify the encoding functions {d_{k,n}} in the upload phase. Let W^l_k ∈ GF_q be the l-th symbol of W_k, k ∈ [1:K], l ∈ [1:L], and let W^l ∈ GF_q^{1×K} be the row vector of the l-th symbols of all K messages, i.e., W^l = [W^l_1, ..., W^l_K]. Assume that α_n, n ∈ [1:N], are N distinct constants, all belonging to the set {α ∈ GF_q : α + i ≠ 0, ∀i ∈ [1:L]}; i.e., for any i, j ∈ [1:N] with i ≠ j, we have α_i ≠ α_j. Note that the α_n's are globally shared constants known to the users, the servers, and the data collector. In order to protect the privacy of the users against the servers, each user k generates L × E random noise symbols Z^k_{l,e} uniformly from GF_q. The l-th symbol uploaded to the n-th server by the k-th user is given by

D^l_{k,n} = W^l_k + Σ_{e=1}^{E} (l + α_n)^e Z^k_{l,e}. (33)
For convenience of notation, we write the content stored at server n in vector form as

D_n = [D^1_{1,n}, ..., D^1_{K,n}, D^2_{1,n}, ..., D^2_{K,n}, ..., D^L_{1,n}, ..., D^L_{K,n}], (34)

where D_n ∈ GF_q^{1×KL}, and Z_{l,e} is defined as Z_{l,e} = [Z^1_{l,e}, ..., Z^K_{l,e}]. In the computation phase, the query to server n is determined by the coefficient vector f and the randomness Z of the data collector. Writing Δ_n = (1 + α_n)(2 + α_n)...(L + α_n), we design the l-th block of the query to server n based on f as

Q^f_n[l] = (Δ_n / (l + α_n)) (f + (l + α_n) Z_l), l ∈ [1:L], (35)

where Z_1, ..., Z_L are L random column vectors of length K, generated by the data collector, whose elements are i.i.d. uniform over GF_q. For any server n ∈ [1:N], the answer to the data collector, A^f_n ∈ A_n = GF_q, is calculated as the inner product of the storage and the query:

A^f_n = Σ_{l=1}^{L} [D^l_{1,n}, ..., D^l_{K,n}] Q^f_n[l] (36)
      = Σ_{l=1}^{L} (Δ_n / (l + α_n)) W^l · f + Δ_n Σ_{l=1}^{L} (W^l · Z_l + Σ_{e=1}^{E} (l + α_n)^{e−1} (Z_{l,e} · f) + Σ_{e=1}^{E} (l + α_n)^e (Z_{l,e} · Z_l)). (38)

Expanding (38) in descending powers of α_n, we see that A^f_n consists of the desired terms (Δ_n/(l + α_n)) W^l · f and a polynomial of degree E in α_n scaled by the known constant Δ_n. Dividing by Δ_n, we can rewrite (38) as

A^f_n / Δ_n = Σ_{l=1}^{L} (1/(l + α_n)) W^l · f + Σ_{e=0}^{E} I_e α_n^e, (39)

where I_e is the coefficient of α_n^e for any e ∈ [0:E]. We can clearly see from (38) and (39) that I_e is not a function of n. Writing the answers from all N servers together, we obtain the following matrix form:

[A^f_1/Δ_1; ...; A^f_N/Δ_N] = M [W^1 · f; ...; W^L · f; I_0; I_1; ...; I_E], (40)

where the n-th row of the N × N matrix M is [1/(1 + α_n), ..., 1/(L + α_n), 1, α_n, ..., α_n^E]. Now, we prove that this scheme satisfies the decodability constraint, i.e., (9), and the privacy constraints, i.e., the privacy constraint of the users against E colluding servers (2), the privacy constraint of the users against the data collector (10), and the privacy constraint of the data collector against the non-colluding servers (11).
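Before verifying the constraints, the whole pipeline (share, query, answer, decode) can be made concrete with a toy end-to-end sketch over a prime field. All parameters below (the field size q = 10007, N = 5, E = 2, K = 4, and the α_n values) are illustrative choices of ours, and the handling of the normalization factor Δ_n is one of several equivalent bookkeeping arrangements; this is a sketch of the CSA idea, not the paper's exact implementation.

```python
import random

q = 10007                        # prime field size (illustrative)
N, E = 5, 2                      # servers and collusion threshold
L = N - E - 1                    # symbols per message (= 2 here)
K = 4                            # users
alphas = [10, 20, 30, 40, 50]    # distinct, with l + alpha != 0 mod q

rng = random.Random(0)
inv = lambda x: pow(x, -1, q)    # modular inverse (Python >= 3.8)

def delta(n):                    # Delta_n = prod_{l=1}^{L} (l + alpha_n)
    d = 1
    for l in range(1, L + 1):
        d = d * (l + alphas[n]) % q
    return d

# ---- upload phase: user k masks symbol W[l][k] with E noise symbols ----
W = [[rng.randrange(q) for _ in range(K)] for _ in range(L)]           # W[l][k]
Z = [[[rng.randrange(q) for _ in range(E)] for _ in range(L)] for _ in range(K)]

def share(k, l, n):              # D^{l+1}_{k,n}: noise coefficients (l+alpha)^e
    a = alphas[n]
    return (W[l][k] + sum(Z[k][l][e - 1] * pow(l + 1 + a, e, q)
                          for e in range(1, E + 1))) % q

# ---- computation phase: collector hides f inside noisy query blocks ----
f  = [rng.randrange(q) for _ in range(K)]                              # preference
Zq = [[rng.randrange(q) for _ in range(K)] for _ in range(L)]          # fresh noise

def query(n, l):                 # l-th query block to server n
    a = alphas[n]
    c = delta(n) * inv(l + 1 + a) % q
    return [c * (f[k] + (l + 1 + a) * Zq[l][k]) % q for k in range(K)]

def answer(n):                   # inner product of storage and query
    return sum(share(k, l, n) * query(n, l)[k]
               for l in range(L) for k in range(K)) % q

# ---- decoding: solve the N x N Cauchy-Vandermonde system ----
def solve(M, b):                 # Gaussian elimination over GF(q)
    M = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(N):
        p = next(r for r in range(c, N) if M[r][c])
        M[c], M[p] = M[p], M[c]
        piv = inv(M[c][c])
        M[c] = [x * piv % q for x in M[c]]
        for r in range(N):
            if r != c and M[r][c]:
                M[r] = [(x - M[r][c] * y) % q for x, y in zip(M[r], M[c])]
    return [row[-1] for row in M]

b = [answer(n) * inv(delta(n)) % q for n in range(N)]    # normalize by Delta_n
M = [[inv(l + 1 + alphas[n]) for l in range(L)] +        # Cauchy part (desired)
     [pow(alphas[n], e, q) for e in range(E + 1)]        # Vandermonde part (noise)
     for n in range(N)]
decoded  = solve(M, b)[:L]                               # first L unknowns: W^l . f
expected = [sum(W[l][k] * f[k] for k in range(K)) % q for l in range(L)]
assert decoded == expected
```

Running the sketch, the collector downloads exactly one field symbol from each of the N servers and recovers the L = N − E − 1 desired combinations, matching the rate (N − E − 1)/N.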
Recall that in our scheme, we set L = N − E − 1. The decodability constraint is satisfied because the matrix M in (40) is an N × N Cauchy-Vandermonde matrix, which is full rank when the α_n's are distinct, by Lemma 5 in [33]. Hence, W^1 · f, ..., W^L · f can be fully recovered from the N answers, and the desired statistic W_f in (3) is obtained by stacking these L symbols. The privacy constraint of the users against E colluding servers is satisfied due to the sharing strategy of the users. From (33), the k-th user shares its l-th symbol with the n-th server in a form in which W^l_k is masked by E noise symbols. The security must guarantee that any E out of the N servers learn nothing about W_k for any k ∈ [1:K]. For any set E := {o_1, ..., o_E} with E ⊆ [1:N] and |E| = E, we can write the storage of these E servers with respect to the shares of W^l_k in matrix form:

[D^l_{k,o_1}; ...; D^l_{k,o_E}] = [W^l_k; ...; W^l_k] + V_E [Z^k_{l,1}; ...; Z^k_{l,E}], (43)

where the (j, e) entry of V_E is (l + α_{o_j})^e. Notice that the Vandermonde-type matrix V_E in (43) is invertible for distinct, nonzero {l + α_{o_j} : j ∈ [1:E]}, so the second term of (43) contains E linearly independent uniform symbols. The privacy constraint of the users against E colluding servers is therefore guaranteed, as the E additional random symbols protect the shared message; we have I(W_{[1:K]}; D_{[1:K],E}) = 0, where (44) comes from (43), and (45) holds because the cross terms with i < j and i > j vanish, leaving only the terms (i, j) with i = j = l.
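The invertibility argument above can be checked numerically on a tiny example: for a fixed symbol and any two colluding servers (E = 2), the joint distribution of the two shares is uniform and identical for different stored symbols, so the colluders learn nothing. The field size and evaluation points below are illustrative choices of ours.

```python
from itertools import product
from collections import Counter

q, E, l = 11, 2, 1
alphas = [3, 5]            # evaluation points of the two colluding servers

def shares(w, z):          # the two servers' views of one masked symbol
    return tuple((w + sum(z[e - 1] * pow(l + a, e, q) for e in range(1, E + 1))) % q
                 for a in alphas)

# enumerate all noise realizations for two different stored symbols
dist = {w: Counter(shares(w, z) for z in product(range(q), repeat=E))
        for w in (0, 7)}
assert dist[0] == dist[7]          # identical joint distributions: zero leakage
assert len(dist[0]) == q ** E      # in fact uniform over all q^E share pairs
```

The two assertions hold because the noise coefficient matrix is an invertible Vandermonde-type matrix, so the map from noise to shares is a bijection for every fixed symbol value.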
To prove that the privacy constraint of the data collector against the non-colluding servers is satisfied, notice that the query to each server is composed of the desired coefficient vector f and independent additional randomness Z_{[1:L]}. Consider two linearly independent vectors f, f′ ∈ GF_q^K. For any l ∈ [1:L], the l-th blocks of Q^f_n and Q^{f′}_n are (Δ_n/(l + α_n))(f + (l + α_n)Z_l) and (Δ_n/(l + α_n))(f′ + (l + α_n)Z_l), respectively. Notice that Z_l (and thus (Δ_n/(l + α_n)) · (l + α_n)Z_l) is uniform over GF_q^K, and that (Δ_n/(l + α_n)) · f and (Δ_n/(l + α_n)) · f′ are two deterministic vectors in GF_q^K. Hence, the l-th block of Q^f_n and that of Q^{f′}_n have the same distribution, so no single server can distinguish queries generated for different coefficient vectors. Furthermore, if we regard the desired coefficient vector as a random variable t with some distribution known only to the data collector, and f as a realization of t, we can prove that the mutual information between t and (Q^t_n, A^t_n, W_{[1:K]}, Z_{[1:K]}) is zero, where (46) follows from (38), (47) follows from (10), and (48) follows from (6).
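This indistinguishability argument reduces to a simple fact: adding uniform noise scaled by a nonzero constant makes a query block uniform regardless of f. A tiny brute-force check (field size and parameters are illustrative; the common factor Δ_n/(l + α_n) is a fixed nonzero constant and is omitted, since multiplying by it does not change the distribution):

```python
from itertools import product
from collections import Counter

q, K = 7, 2
c = 4                       # (l + alpha_n) mod q for some server; nonzero

def block_distribution(f):  # distribution of one query block over the noise Z_l
    return Counter(tuple((f[k] + c * z[k]) % q for k in range(K))
                   for z in product(range(q), repeat=K))

# two different preference vectors yield identical query distributions
assert block_distribution((1, 0)) == block_distribution((2, 5))
```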
Finally, to prove that the privacy constraint of the users against the data collector is satisfied, we construct a basis of GF_q^K containing the desired vector f. Assume that the vectors in the basis are denoted by {f_1, f_2, ..., f_K}, where f_1 = f. We then have I(W_{[1:K]}; A^f_{[1:N]}, Q^f_{[1:N]}, Z, f | W_f) = o(L). Thus, we have proved that the scheme satisfies all the constraints. As the answer from any server consists of one symbol (the inner product) in GF_q, the rate of our proposed scheme is

R = L / N = (N − E − 1) / N. (54)

The scheme constructed by (33), (36), and (38) therefore achieves the rate (54), which does not depend on K, by arranging the data W_{[1:K]} into the L symbol blocks W^{[1:L]}. We notice that this achievable rate meets the asymptotic upper bound for any K ∈ N+, so the scheme is asymptotically optimal, as seen by letting K → ∞.

Converse Result When E = N − 1
In this section, we prove that the asymptotic capacity is zero when N = E + 1. First, the inequality (23) also holds when N = E + 1, because the inequality in (15) becomes an equality. Hence, for any linearly independent vectors f_1, f_2, ..., f_K ∈ GF_q^K, applying the iterative bound (14) shows that the achievable rate is upper-bounded by a sequence that tends to zero as K → ∞; i.e., lim_{K→∞} R = 0.
The upper bound on C indicates that it is infeasible to construct a scheme with a positive communication rate as K → ∞. However, unlike the N = E scenario, our proof of zero asymptotic capacity does not imply that no scheme satisfies all the constraints of our problem. In other words, there may be schemes that satisfy all the constraints, albeit with an asymptotic rate of zero. The detailed construction of a feasible scheme for E = N − 1 and finite K remains an open problem.

Conclusions
Motivated by research on data privacy and the modeling of infectious diseases, we proposed the privacy-preserving epidemiological-data-collection problem, which aims to maximize the collection rate while protecting the privacy of all users' data and of the data collector's preferences. We found the asymptotic capacity of this problem; the result shows that a positive asymptotic capacity exists whenever at least two servers remain that do not collude with the others to decode the users' data. The objective of this work was to find the best performance for the privacy-preserving epidemiological-data-collection problem, and we partly achieved this goal by giving the construction and proof of an optimal scheme when K is infinitely large. The achievability for N ≥ E + 2 was shown by the cross-subspace alignment method, and the infeasibility of N = 1 and N = E was also proved. Our findings not only extend the research on secure multi-party computation systems in information theory, but also provide information-theoretic frameworks, implementations, and capacity bounds for the study of privacy-preserving epidemiological modeling. Although we characterized the asymptotic capacity of this problem, the exact capacity is still unknown. Future directions include finding the exact capacity for finite K, constructing a scheme for N = E + 1, and characterizing the asymptotic capacity under irregular colluding patterns. In general, our asymptotic-capacity result provides useful insights for further studies in both data privacy and epidemiological modeling.

Institutional Review Board Statement: Not applicable.
Data Availability Statement: Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: