Comparison of Antimicrobial Treatment Incidence Quantiﬁcation Based on Detailed Field Data on Animal Level with the Standardized Methodology of the European Medicines Agency in Veal Calves, Switzerland, 2016–2018 Estimation of COVID-19 Dynamics in the Different States of the United States during the First Months of the Pandemic †

: Precise quantiﬁcation of antimicrobial treatment incidence (TI) is crucial for benchmarking. Two widespread methods for treatment incidence quantiﬁcation were compared for agreement. Field data were obtained from 38 veal farms from 2016 to 2018 (1905 calves, 1864 treatments). Calculation of TI swiss for calves was based on detailed treatment records using pharmacokinetic values from the Swiss Veterinary Medicines Compendium. The method published by the European Medicines Agency was used to calculate TI in deﬁned daily doses (TI DDD ). For each calf and treatment, TI swiss and TI DDD were calculated on level of the antimicrobial class, drug, application route, and farm. The quotient (Q) of TI swiss and TI DDD was calculated. Divergence in results between the two methods of ≤ 25% was arbitrarily set as good agreement. The agreement between TI swiss and TI DDD was mostly good. On class level, good agreement was observed for treatments representing 71.5% of the TI DDD, and 74.5% of the total TI DDD on drug level. Poor agreement was mainly observed for tylosin and sulfadimidine. The agreement was better for parenteral than for oral treatments (81.6% vs. 72.3%). For practically orientated calculation on farm level, good agreement was observed (77.5% of the TI DDD ). The TI DDD method showed mostly good agreement, especially for parenteral treatments. Abstract: Estimation of COVID-19 dynamics and its evolution is a multidisciplinary effort, which requires the uniﬁcation of heterogeneous disciplines (scientiﬁc, mathematics, epidemiological, biological/bio-chemical, virologists and health disciplines to mention the most relevant) to work together towards a better understanding of this pandemic. Time series analysis is of great importance to determine both the similarity in the behavior of COVID-19 in certain countries/states and the establishment of models that can analyze and predict the transmission process of this infectious disease. In this contribution, an analysis of the different states of the United States will be carried out to measure the similarity of COVID-19 time series, using dynamic time warping distance (DTW) as a distance metric. A parametric methodology is proposed to jointly analyze infected and deceased persons. This metric allows comparison of time series that have a different time length, making it very appropriate for studying the United States, since the virus did not spread simultaneously in all the states/provinces. After a measure of the similarity between the time series of the states of United States was determined, a hierarchical cluster was created, which makes it possible to analyze the behavioral relationships of the pandemic between different states and to discover interesting patterns and correlations in the underlying data of COVID-19 in the United States. With the proposed methodology, nine different clusters were obtained, showing a different behavior in the eastern zone and western zone of the United States. Finally, to make a prediction of the evolution of COVID-19 in the states, Logistic, Gompertz and SIR models were computed. With these mathematical models, it is possible to have a more precise knowledge of the evolution and forecast of the pandemic.


Introduction
Antimicrobial resistance (AMR) is likely to become an obst ment, as antimicrobials may become ineffective for disease cont may rise considerably [1]. There is a large basis of evidence show (AMU) is a major driver for the development of AMR in human through selection of resistant strains, which emphasizes the n Different methods are used to quantify and compare AMU am aim of developing strategies to monitor drug use and to identify use. Sales figures of antimicrobial therapeutic products have b for various countries for more than a decade [3][4][5][6]. However, an the interpretation of sales figures is that antimicrobials cannot

Introduction
The COVID-19 epidemic started in Hubei Province, China, around December 2019. Since then, the disease has spread to all continents and countries of the world, being categorized as a pandemic by World Health Organization on 11 March 2020.
In recent months, contributions have been made that analyze the evolution of the pandemic in different countries, implementing mathematical models to predict their evolution. Traditional predictive models for infectious diseases mainly include models for predicting differential equations and models for predicting time series based on statistics and random processes.
For example, in [1] a methodology with the aim of estimating the actual number of people infected with COVID-19 in France is presented, since according to the authors, the number of screening tests carried out and the methodology do not directly calculate the actual number of cases and infection mortality rate (IFR). A mechanistic-statistical approach was developed that combines an epidemiological SIR model that describes these unobserved epidemiological dynamics, a probabilistic model that describes the data collection process and a method of statistical inference. The logistic growth model, the generalized logistic growth model, the generalized growth model and the generalized Richards model were used to model the number of infected cases in the 29 provinces of China (and several countries), performing a detailed analysis on the heterogeneous situations by four phases of the outbreak in China [2].
In [3] the Kermack-McKendrick SEIR model (Susceptible, Exposed, Infectious and Recovered) is presented to analyze the effects of behavioral changes on the reduction in community transmission in Mexico. A variable contact rate over time is proposed and the consequences of disease spread in an affected population of non-essential activities is analyzed.
The behavior of the virus in Japan has also been analyzed [4]. By 29 February 2020, in addition to the 619 confirmed cases (passengers and crew members) infected with COVID-19 in a cruise ship (near Tokyo), 215 locally transmitted cases had been also confirmed in Japan. To evaluate the effectiveness of reaction strategies based on avoiding large accumulations or crowded areas and to predict the spread of COVID-19 infections in Japan, in [4] a stochastic transmission model produced by expanding the epidemiological model based on SIR (Susceptible-Infected-Removed) had been presented. The simulation results showed that the number of Infected and Removed patients will increase rapidly if there is no reduction in the time spent in crowded zone.
In [5] using the Maximum-Hasting (MH) parameter estimation method and the SEIR model, the spread of COVID-19 and its prediction in South Africa, Egypt, Nigeria, Senegal, Kenya, and Algeria under three intervention scenarios (suppression, mitigation, mildness) is presented.
In addition to the most relevant epidemiological models used in the literature, models typically based on time series have also been used to analyze the behavior of the pandemic in different countries. The autoregressive integrated moving average (ARIMA) model is a mathematical model widely studied in the context of time series that has been successfully applied in the field of health (estimate the incidence and prevalence of influenza mortality, malaria incidence, hepatitis, and other infectious diseases) as well as in different fields in the past due to its simple structure, fast applicability and ability to explain the data set.
In [6], ten Brazilian states are analyzed using the autoregressive integrated moving average (ARIMA), the cubistic regression (CUBIST), the random forest (RF), ridge regression (RIDGE), the support vector regression (SVR) and the stacking-ensemble learning in the task of time series forecasting of the number of patients infected with COVID-19 with one, three, and six-days ahead. A forecasting model based on ARIMA has also been presented in [7] for Pakistan, presenting the high exponential growth in the number of confirmed cases, deaths and recoveries. In [8], ARIMA time series models were applied to forecast the total confirmed cases of COVID-19 for the next ten days using the model ARIMA (0,2,1), ARIMA (1,2,0) and ARIMA (0,2,1) for Italy, Spain, and France, respectively.
Currently, the analysis of the evolution of COVID-19 in America is of great importance due to the impact of this epidemic on this continent. In this contribution we will focus on the United States. The first patient detected in the United States was a travel-associated case from Washington state on 19 January 2020. The preponderance of initial cases of infected patients with COVID-19 in the United States were correlated with travel to a "high-risk" country or close contacts of previously identified cases corresponding to the testing criteria adopted by the Centers for Disease Control and Prevention (CDC) (https://www.cdc.gov/, accessed March 2020). From 1 to 31 March 2020, the number of reported COVID-19 cases in the United States rapidly increased from 30 to 188,172, the number of deaths rising from 1 to 5531, and the virus being detected in all the states. At the end of April, the number of infected reached 1,069,424 and the number of deceased stood at 62,996. At the time of writing this contribution (14 June 2020) the number of infected is more than 2× 10 6 and Proceedings 2021, 5, 53 3 of 9 more than 100.000 deaths, the United States being one of the countries that is suffering the most from COVID-19.
In a recent paper [9], an attempt was made to estimate the actual number of infected people, even if they have not been counted. It was estimated that the true number of COVID-19 cases in the United States is likely in the tens of thousands, suggesting substantial undetected infections and spread within the country [10].
This contribution presents a methodology to analyze the evolution patterns of COVID-19 in the states of United States (including Puerto Rico and the District of Columbia). A parametric similarity measure is presented, based on a robust distance measure between time series, the dynamic time warping distance (DTW), with which the number of infected and dead in each of the states can be compared simultaneously, even though the start of the epidemic originated on different dates in each zone (therefore, the time series that need to be compared have different lengths).
To the best of our knowledge, this contribution is the first study that tries to develop a hierarchical clustering time series algorithm in order to globally compare and classify the behavior of all the states of United State simultaneously in their evolution of infected and deceased patients suffering COVID-19. Carrying out this classification is very useful, since it will allow the establishment of similarities and patterns in the evolution of the pandemic among the states of the United States.

Material and Methods
A time series is a sequence of numerical (temporal) data points in successive order, which is naturally high dimensional and large in data size. There are two main operations that could be performed when working with time-series with its sequential data: (a) the analysis of a single time series; (b) the analysis of multiple time series simultaneously. This contribution is concentrated on the analysis of multiple time series for all the states of US suffering COVID-19, with the purpose of finding similarities between multiple time series by performing a clustering time-series methodology.
Clustering such complex objects is particularly advantageous because it may lead to the discovery of interesting patterns in time-series datasets, which contributes to a better understanding of the COVID-19 spread in different regions of the United States.
Clustering of time-series sequences has received noteworthy attention [11,12], not only as a formidable exploratory method and powerful tool for discovering patters, but also as a pre-processing step or subroutine for other tasks [13].
In this section, the database used is presented first (Section 2.1). Subsequently, a review of the most popular distance measures for time series is described (Section 2.2) and a new parametric distance is proposed.

Data Set
The COVID-19 epidemic data set used in this contribution was collected from the Johns Hopkins University [14]. In this platform, the number of confirmed, deaths and recovered cases until 14 June 2020 for different countries are presented. For the United States, two additional .csv files are provided, in with detail of administration and province/state is reported (including Puerto Rico and District of Columbia). In order to compare countries behavior, the time-series data are divided by state population.

Similarity/Distance Measure in Time Series
In a simplified way, the similarity of two simple time series with the same number of points (denoted by m), and defined by X = {x 1 , x 2 , . . . .x m } and Y = {y 1 , y 2 , . . . .y m }, can be achieved by simply calculating the Minkowski (or Euclidean) distance (shortest path between two points) between points on both time series that happen at the same time. This distance is the measure of similarity, denoted as d(X,Y), and it is a function that takes both times series (X,Y) as input and calculates their distance "d", defined as: when k = 2, the distance between two series is called Euclidean Distance. Using the Minkowski distance is a good metric to analyze the similarity of two time series, if these time series are synchronized (that is, all similar events in both time series occur at exactly the same time) and have the same length.
The evolution of time series in the different states of the United States present a different start date, both for the number of confirmed and death cases, and therefore its length is also different. Suppose as an analogy the time series of the sound of a mother's voice when she speaks slowly to her child. If the mother says the same phrase quickly, the child will most likely recognize that she is still his mother. However, if the Euclidean distance between both series were used as a metric, these two time series would have a very low similarity and would not be considered fundamentally equal. This would lead to the conclusion that the two voices did not come from the same person. To solve this problem, the dynamic time warping distance (DTW) method is frequently used in the bibliography [15].
DTW is a technique that can be considered as an extension of the Euclidean Distance between series [16], that calculates an optimal match between two given time series with certain restriction, performing non-linearly in the series (by stretching or shrinking along its time-axis). This distortion (denoted as warping) between two time series is used to find corresponding regions and determine the similarity between them.
The DTW of two series X and Y, defined as X = {x 1 , x 2 , . . . .x n } and Y = {y 1 , y 2 , . . . .y m } is computed in the following way. An n-by-m matrix D is computed with the (i. j)th element, defining the local distance of two elements by: The point-to-point alignment between series X and Y can be represented by a time warping path W, defined as: where p is the length of the warping path W, and w x (k) and w y (k) represent the indexes in time series X and Y, respectively. The warping path w x (k) w y (k) indicates that the w x (k)th element in time series X maps to the w y (k)th element in time series Y. There are some constraints and rules for the construction of the warping path: • Every index from the first time series must be matched with one or more indices from the other time series (and vice versa) • The first (the same for the last index) index from the first time series must be matched (not only this match) with the first (last) index from the other time series. That is, the warping path should start at W(1) = (1,1) and end up at W (p) = (n,m).

•
The mapping of the indices from the first time series to indices from the other time serie must be monotonically increasing, and vice versa. The adjacent elements of path W, W(k) and W(k + 1) must be subject to w x (k + 1) − w x (k) ≥ 0 and w y (k + 1) − w y (k) ≥ 0.

•
The warping path should also have the property of continuity, mathematically expressed as adjacent elements of path W, W(k) and W(k + 1) must be subject to w x (k + 1) − w x (k) ≤1 and w y (k + 1) − w y (k) ≤ 1.
The optimal match is denoted by the match that satisfies all the restrictions and the rules and that has the minimal cost, where the cost is computed as the sum of absolute differences, for each matched pair of indices, between their values. The DTW (minimal distance and optimal warping path) could be found using a dynamic programming algorithm: where RD x i , y j is the minimal cumulative distance from (0, 0) to (i, j) in matrix D. In the methodology proposed in this paper, for each of the states analyzed, both the time series of the number of infected and the time series of deaths will be simultaneously taken into account.
If each of these time series needs to be weighted differently, the following parametric metric, DTW ∝ (S A , S B ) is defined: that measures the similarity in the evolution of the COVID time series for two states of the United States (S A y S B ), TSC A and TSC B represent the time series of the number of infected, TSD A and TSD B represent the time series of the number of deaths for the states S A and S B , respectively. The parameter α (with 0 ≤ α ≤ 1) indicates the relative relevance given to the similarity measure, taking into account the time series of infected or deaths.

Clustering Method for Time Series
Clustering is a data mining technique in which similar data are divided into related or homogeneous groups, in an unsupervised way, that is, without a priori advanced knowledge of the data. For the problem presented in this contribution, working with time series of states of the United States suffering COVID-19, given a set of individual time series data, the objective is to group similar time series into the same cluster.
The problem of grouping time series data is formally defined by, given a dataset of N time series data Q = {X 1 , X 2 . . . X N }, finding in an unsupervised way a partition of Q into K cluster, denoted as C={C 1 , C 2 , . . . C k }, taking into account that homogeneous or similar series are grouped together based on a certain similarity/distance measure. In this paper, the parametric metric DTW ∝ (S A , S B ) is used and there is no intersection between clusters, therefore: The methods used in the area of time series clustering [11,17] are usually based in a conventional clustering algorithm by substituting standard distance measurements with a more suitable distance to compare time series (raw methods) or converting series into normal data and using directly classical algorithms (feature-based methods and models).
Among the most popular clustering algorithms, the hierarchical clustering and the k-means algorithm are widely used in time series clustering. In this contribution the hierarchical clustering is used, mainly due to its great visualization power and its simple and intuitive interpretation.
Hierarchical clustering creates a nested hierarchy of similar time series, according to a pair-wise distance matrix of the time series analyzed. The similarity measure DTW ∝ (S A , S B ) used is therefore essential in this time-series clustering process.
The most widely used linkage criteria, such as single, average and complete linkage variants [18], were analyzed. Hierarchical clustering can be converted into a partitional clustering, with k cluster, by cutting the first k links.

Results and Discussion
To evaluate the performance of the proposed method, several experiments are conducted in this section for three values of the parameter α in the distance metric DTW ∝ (S A , S B ). The time series of the states of the United States was taken from John Hopkins database. Data range from 22 January 2020 to 14 June 2020, covering all the states (including Puerto Rico and District of Columbia). For the computation of the distance metric, a threshold I min was defined, defining the minimum number of infected people to start the time series, being for this study I min = 5 (the number of confirmed was greater than 5). Therefore, the length of the time series is different for each state, being on average 101 days (Figure 1). The index of each of the states is presented in Table 1.
Eng. Proc. 2021, 5, 53 6 of 9 The most widely used linkage criteria, such as single, average and complete linkage variants [18], were analyzed. Hierarchical clustering can be converted into a partitional clustering, with k cluster, by cutting the first k links.

Results and Discussion
To evaluate the performance of the proposed method, several experiments are conducted in this section for three values of the parameter α in the distance metric ∝ ( , ). The time series of the states of the United States was taken from John Hopkins database. Data range from 22 January 2020 to 14 June 2020, covering all the states (including Puerto Rico and District of Columbia). For the computation of the distance metric, a threshold Imin was defined, defining the minimum number of infected people to start the time series, being for this study Imin = 5 (the number of confirmed was greater than 5). Therefore, the length of the time series is different for each state, being on average 101 days (Figure 1). The index of each of the states is presented in Table 1.  Table 1. Distribution of the states obtained by means of hierarchical clustering with 9 clusters (α = 0.5 and, in bold, the state for which the SIR model is calculated). DCluster is the distance between the elements that make up a cluster (its value is zero in the case that there is only one element in a cluster).

DCluster
States (α = 0.5)  Table 1. Distribution of the states obtained by means of hierarchical clustering with 9 clusters (α = 0.5 and, in bold, the state for which the SIR model is calculated). DCluster is the distance between the elements that make up a cluster (its value is zero in the case that there is only one element in a cluster).

Cluster Number (C N ) D Cluster
States (α = 0.5) The values of the parameter α analyzed will be {0, 0.5, 1}. This section of results begins with the value of α = 0.5, that is, the information of the confirmed patients time series with the same relevance as the time series of deaths for the final computation of the distance DTW ∝ (S A , S B ). The distance matrix between the different states is presented in Figure 2 (for a better visual representation, the distance matrix was multiplied by a constant and the states are ordered according to the cluster to which each one belongs). The values of the parameter α analyzed will be {0, 0.5, 1}. This section of results begins with the value of α = 0.5, that is, the information of the confirmed patients time series with the same relevance as the time series of deaths for the final computation of the distance ∝ ( , ). The distance matrix between the different states is presented in Figure 2 (for a better visual representation, the distance matrix was multiplied by a constant and the states are ordered according to the cluster to which each one belongs). It is important to highlight the existence of various clusters with only one state (corresponding with District of Columbia, New Jersey and Rhode Island). Cluster 7 (New Jersey, listed as 48 in Table 1) links directly to cluster 8 ((49) Connecticut, (50) Massachusetts, (51) New York), which denote similar behavior between these states. For cluster 6 (District of Columbia, number 47), the linkage is performed for both cluster 3 ((32) Michigan, (33) Pennsylvania) and cluster 4 ((34) Delaware, (35) Illinois, (36) Louisiana, (37) Maryland). There are two large clusters (cluster 2 and cluster 5) that contain 29 and 9 states, respectively. Its linkage is performed through cluster 1, which contains only two states (Nebraska and South Dakota).
The similarities and distances between the different states and clusters obtained can be analyzed using the results presented in the hierarchical clustering ( Figure 3) and distance matrix (Figure 2). Figure 4 presents a geographical representation of all the states according to the cluster grouping obtained in Table 2, using the similarity between the clusters obtained using the dendogram in Figure 3.  It is important to highlight the existence of various clusters with only one state (corresponding with District of Columbia, New Jersey and Rhode Island). Cluster 7 (New Jersey, listed as 48 in Table 1) links directly to cluster 8 ((49) Connecticut, (50) Massachusetts, (51) New York), which denote similar behavior between these states. For cluster 6 (District of Columbia, number 47), the linkage is performed for both cluster 3 ((32) Michigan, (33) Pennsylvania) and cluster 4 ((34) Delaware, (35) Illinois, (36) Louisiana, (37) Maryland). There are two large clusters (cluster 2 and cluster 5) that contain 29 and 9 states, respectively. Its linkage is performed through cluster 1, which contains only two states (Nebraska and South Dakota).
The similarities and distances between the different states and clusters obtained can be analyzed using the results presented in the hierarchical clustering ( Figure 3) and distance matrix ( Figure 2). Figure 4 presents a geographical representation of all the states according to the cluster grouping obtained in Table 1, using the similarity between the clusters obtained using the dendogram in Figure 3. The values of the parameter α analyzed will be {0, 0.5, 1}. This section of results begins with the value of α = 0.5, that is, the information of the confirmed patients time series with the same relevance as the time series of deaths for the final computation of the distance ∝ ( , ). The distance matrix between the different states is presented in Figure 2 (for a better visual representation, the distance matrix was multiplied by a constant and the states are ordered according to the cluster to which each one belongs). It is important to highlight the existence of various clusters with only one state (corresponding with District of Columbia, New Jersey and Rhode Island). Cluster 7 (New Jersey, listed as 48 in Table 1)  There are two large clusters (cluster 2 and cluster 5) that contain 29 and 9 states, respectively. Its linkage is performed through cluster 1, which contains only two states (Nebraska and South Dakota).
The similarities and distances between the different states and clusters obtained can be analyzed using the results presented in the hierarchical clustering ( Figure 3) and distance matrix (Figure 2). Figure 4 presents a geographical representation of all the states according to the cluster grouping obtained in Table 2, using the similarity between the clusters obtained using the dendogram in Figure 3.   Table 2 using the ∝ ( , ) and α = 0.5.
The hierarchical clustering previously presented was obtained taking into account the time series of the number of confirmed and death cases simultaneously (α = 0.5).

Conclusions
A powerful tool for the analysis of time series is the grouping through clustering. Clustering time series is usually an unsupervised process, with the aim of finding behavioral similarities between the different time series that are analyzed. This article proposed a parametric metric, based on the dynamic time warping distance, in order to measure the distance or similarity between time series corresponding to different states in the United States, taking into account the behavior of the number of COVID-19 confirmed cases and persons deceased due to COVID-19 simultaneously. The proposed parametric metric, named ∝ ( , ), is robust to the different lengths of data sequences (different beginning of the epidemic in the different states of the United States). - Using the Calinski-Harabasz criterion, the optimal number of clusters in which the different states of United States can be grouped was obtained, taken as a value of α = 0.5 (same relevance for the time series of confirmed and death patients). A total of nine heterogeneous clusters were found, in the sense that there are clusters within a large number of states (there are two large clusters, which encompass 29 and 9 countries) and other clusters with only one state (indicating that their behavior was unique, as they do not have excessive similarities with the rest of states).
-With the proposed hierarchical clustering procedure, it is possible to identify and summarize interesting patterns and correlations in the underlying data of the time series of the states of the United States suffering COVID-19 and therefore determine similar behaviors that different states may have.   Table 1 using the DTW ∝ (S A , S B ) and α = 0.5.
The hierarchical clustering previously presented was obtained taking into account the time series of the number of confirmed and death cases simultaneously (α = 0.5).

Conclusions
A powerful tool for the analysis of time series is the grouping through clustering. Clustering time series is usually an unsupervised process, with the aim of finding behavioral similarities between the different time series that are analyzed. This article proposed a parametric metric, based on the dynamic time warping distance, in order to measure the distance or similarity between time series corresponding to different states in the United States, taking into account the behavior of the number of COVID-19 confirmed cases and persons deceased due to COVID-19 simultaneously. The proposed parametric metric, named DTW ∝ (S A , S B ), is robust to the different lengths of data sequences (different beginning of the epidemic in the different states of the United States). - Using the Calinski-Harabasz criterion, the optimal number of clusters in which the different states of United States can be grouped was obtained, taken as a value of α = 0.5 (same relevance for the time series of confirmed and death patients). A total of nine heterogeneous clusters were found, in the sense that there are clusters within a large number of states (there are two large clusters, which encompass 29 and 9 countries) and other clusters with only one state (indicating that their behavior was unique, as they do not have excessive similarities with the rest of states). -With the proposed hierarchical clustering procedure, it is possible to identify and summarize interesting patterns and correlations in the underlying data of the time series of the states of the United States suffering COVID-19 and therefore determine similar behaviors that different states may have. Informed Consent Statement: Not applicable.