Mathematical Modeling and Software Tools for Breeding Value Estimation Based on Phenotypic, Pedigree and Genomic Information of Holstein Friesian Cattle in Serbia

Simple Summary This research aims to evaluate the possibility of introducing modern methods for the selection of dairy cattle in Serbia. A total of eight models were applied to estimate the genetic values and their accuracy based on the data of milk yield, milk fat yield, milk fat content, milk protein yield and milk protein content. A total of 6041 animals were included, and of them, 2565 had data for milk production traits. Two cases were studied, the first when only the first lactations were observed, and the second when all lactations were observed using a repeatability model. For the formation of a genomic relationship matrix, 58 K SNP data were used for a total of 1491 cows. Higher accuracy was obtained using models with repeated measurements. Multivariate analysis with repeated measurements showed the best results. An ssGBLUP methodology showed promising results. Expanding the genotyped population is necessary to achieve better accuracy. Abstract In this paper, six univariate and two multivariate best linear unbiased prediction (BLUP) models were tested for the estimation of breeding values (BV) in Holstein Friesian cattle in Serbia. Two univariate models were formed using the numerator relationship matrix (NRM), four using the genomic relationship matrix (GRM). Multivariate models contained only an NRM. Two cases were studied, the first when only first lactations were observed, and the second when all lactations were observed using a repeatability model. A total of 6041 animals were included, and of them, 2565 had data on milk yield (MY), milk fat yield (FY), milk fat content (FC), milk protein yield (PY) and milk protein content (PC). Finally, out of those 2565 cows, 1491 were genotyped. A higher accuracy of BV was obtained when using a combination of NRM and GRM compared to NRM alone in univariate analysis, while multivariate analysis with repeated measures gave the highest accuracy with all 6041 animals. When only genotyped animals were observed, the highest accuracy of the estimated BV was calculated by the ssGBLUPp model, and the lowest by the univariate BLUP model. In conclusion, the current breeding programs in Serbia should be changed to use multivariate analysis with repeated measurements until the optimal size of the reference population, which must include genotyping data on both bulls and cows, is reached.


Introduction
Animal breeding, by studying variability, primarily genetic, and using different statistical methods, aims to select parents of the next generations that are characterized by high production capacity, appropriate functional and conformational characteristics, and the ability to pass them on to offspring. Selection programs, when setting the criteria for the selection of parents, rely on the statistical analysis of large databases made up of data on the origin and the phenotype measurements of the desired traits, most often by the method of the best linear unbiased prediction (BLUP), which is used to estimate the breeding values of candidates for selection. In the last few years, traditional methods of selection, based on quantitative genetics, have increasingly been complemented by molecular genetic analyses. Molecular genetics has enabled the direct analysis of animal genomes, and for this purpose polymerase chain reaction (PCR), analysis of genetic markers, DNA sequencing, etc., are used. Several types of genetic markers have been developed, of which single nucleotide polymorphism (SNP) is the most widely used today. By providing insight into the structure and function of the genome, SNPs have become powerful tools for improving the selection of all types of domestic animals. On the basis of this knowledge, genomic selection has been recently developed, a new method that enables greater genetic progress compared to previous forms of selection. Genomic selection is based on applying statistical models to a set of fully phenotyped and genotyped individuals, and then using the applied model to predict the genetic value of unmeasured individuals [1]. The first official genomic evaluations in the United States were published in 2009 for the Holstein, Jersey, and Brown Swiss breeds. The incorporation of DNA-marker technology and genomics into traditional methods for evaluating animal breeding values has doubled the rate of genetic progress for economically important traits, reduced the generation interval, increased selection accuracy, reduced costs relative to progenic testing and enabled the identification of recessive lethal alleles [2]. The core data set, which makes such prediction possible, is the reference data set consisting of animals for which three sources of information are available: SNP genotypes, phenotypes and pedigree [3].
Modern breeding programs that include molecular methods for data collection on animals generate large amounts of information. The whole process, from data storage to drawing conclusions, requires complex computer operations, and so presents a great challenge in the search for new algorithms and methods. Therefore, animal breeding is a multidisciplinary scientific field that relies on achievements from genetics, computer science and statistical mathematics. Additionally, the collection, analysis and dissemination of biological data fall within the scope of bioinformatics. In bioinformatics, mathematical, statistical and computational methods are aimed at solving biological problems using genomic data and related information. Genome sequencing is the first step towards understanding genome organization and gene structure, and in addition, it is necessary to develop techniques for quantifying gene expression through complex computer-based modeling. The implementation of complex analytical methods requires the proper application of mathematical and statistical knowledge to the study of biological systems. Based on the above, it can be said that genomic selection is an example of how the advantages of bioinformatics are used in animal husbandry in order to improve the production system.
Genomic selection implies making selection decisions based on genomically estimated breeding values (gEBVs). gEBVs are calculated using an estimate of SNP effects from a prediction equation, which is derived from animals in a reference (train set) group, and which have certain SNP genotypes and phenotypes for the traits of interest. The accuracy of the gEBV depends on the size of the reference population used to calculate the prediction equation, the heritability of the traits, and the degree of relatedness between the candidate for selection and the reference population [4]. The mostly used methods for genomic prediction include best linear unbiased prediction, and several different models are available, such as TABLUP, RRBLUP, GBLUP, ssGBLUP, and ssSNPBLUP. Ref. [5] pointed out that the TABLUP method provides better results compared to RRBLUP and GBLUP because the TA matrix takes into account one marker that explains to a greater extent the genetic variance for a certain trait, while the matrix is created based on all other data from the pedigree and markers regardless of their influence on a given trait. Ref. [6] studied two dairy cattle populations (German-Austrian-Czech Fleckvieh and German-Austrian Brown Swiss) and concluded that the application of ssGBLUP clearly improves genomic evaluation. Ref. [7] pointed out that ssGBLUP uses all available phenotype, genotype and pedigree information to provide unbiased genomic prediction with higher accuracy than other genomic prediction models. Furthermore, those authors reported that ssSNPBLUP proved to be the most efficient of all variants of the ssGBLUP model, highlighting its advantage in large database applications, such as data processing of up to a million genotyped animals.
The dominant dairy breed in the registered cattle population in Serbia is the Holstein Friesian. The largest part of the population is breeding on the territory of AP Vojvodina, where, according to the data of the Central Breeding Organization for the year 2021, there were registered 67,671 heads [8]. According to the data from this report, the average milk production of Holstein Friesian cows in AP Vojvodina during 305 days of lactation in 2021 was 7405 kg. By comparing this value with the milk yield of recorded cows in other ICAR member countries [9], we see that there are opportunities for improving the traits in the Serbian dairy population, and one of the ways is through the improvement of the methods of the selection of cows. This research aims to evaluate the possibility of introducing the most modern methods for the prediction of breeding values in dairy cattle breeding programs in Serbia. In this regard, the objectives of this research are to: 1) test the implementation of the GBLUP and ssGBLUP methods for calculating breeding values instead of standard uni-and multivariate BLUP models; 2) compare the accuracy of breeding value estimates obtained by the applied methods if only the first lactation is observed; and 3) compare the accuracy of breeding value estimates using the applied methods and when using all lactations, i.e., repeated measurements. Based on the obtained results, recommendations will be defined for changing the existing breeding programs in terms of assessing national breeding values and ranking cattle.

Data Sets and Quality Control
The initial database consisted of 1600 Holstein Friesian cows genotyped with the GeneSeek GGP Bovine 100 K SNP chip, for which data were collected on milk yield traits (MY-milk yield; FY-milk fat yield; FC-milk fat content; PY-milk protein yield; PC-milk protein content). The production results referred to lactation milk production standardized to 305 days using ICAR approved methods for calculating daily yields from AM/PM milkings [10,11] and were provided by the Central Breeding Organization, Department of Animal Husbandry, Faculty of Agriculture Novi Sad.

Preparation of SNP File
The first processing of SNPs meant turning off the chips for the so-called Mendelian traits, as well as turning off the double SNPs with a smaller GeneTrain parameter. This was followed by the preparation of SNPs for use in the Wombat software [12], which does not accept markers with missing values. For this purpose, some SNPs and animals were eliminated, after which the database consisted of 1566 animals with 62,807 SNPs, each with a call rate of 100%. The next step included SNP quality control in the preGSF90 software [13], whereby one SNP with Mendelian conflicts (MC) and 4091 SNPs with a minor allele frequency of <0.05 were removed. Those animals (n = 14) with parent-progeny Mendelian conflicts were also removed. The database after quality control included 1491 animals with 58,715 SNPs.

Preparation of Pedigree Files
Three files were prepared based on origin data for two generations of ancestors. The first file is related to the standard pedigree file for calculating the numerical relationship matrix (A) from the pedigree, which is organized through three columns (animal, father, mother). A second pedigree file was created to calculate a genomic relationship matrix (G), which is organized through four columns and includes ID numbers for only genotyped animals. The first three columns refer to data on origin (animal, father, mother), and the fourth column indicates that there is a SNP marker for the animal. The third pedigree file was a combination of the previous two files and was used to calculate a hybrid relationship matrix (H), which used both pedigree data and SNP data to calculate relationships. It was also organized through four columns with the same meaning as in the second file.

Preparation of Data Files for Measurements of Traits and Factors
First, two files were prepared for a total of 2565 animals, i.e., cows from the pedigree file A, for which there were measurement results for the investigated production traits. One file related to the cows for which there were data on production during the first lactation, while the second file included repeated measurements, i.e., production results from other lactations as well (I lactation-2565 measurements; II-1448 measurements; III-768 measurements; IV-344; V-116; VI-40). Lactations refer to calvings from 2012 to 2021, which were grouped into three seasons (I-November, December, January, February; II-March, April, September, October; III-May, June, July, August). Then, a third file was prepared, which included the aforementioned data, but only for genotyped animals. Table 1 provides an overview of the number of animals included in the input files that participated in the trials.

Statistical Analysis of Data
BLUP is often used in animal breeding and is a method that evaluates random effects, which in animal breeding, are most often related to evaluations of breeding values. In all the presented models, the values suggested by Wombat [12] were used for the initial values of the variance components. These are actually variances that were calculated according to well-known formulae. Covariances of A and B are calculated based on the formulâ cov(A, B) = r A,B × s 2 A × s 2 B where r A,B is the correlation coefficient and s 2 A and s 2 B are variances of A and B. Final values of sample variances and sample covariances were obtained by REML estimation of (co)variance components. Additionally, in all analyses, the fixed part of the model included the influence of the interaction between farm, year and season (FYS), and in the model with repeated measurements (Fix 2 ), the influence of lactation number (L) was additionally included, as follows: Ref. [14] first developed BLUP, by which fixed effects and breeding values can be estimated simultaneously. BLUP has found wide use in the genetic evaluation of domestic animals due to its statistical properties. With the constant progress of computers, its application has expanded from simpler models, such as the father model, to more complex models, such as the animal model and its derivatives.
We considered the following equation for a mixed linear model: where: y is the vector of observations (features) dimensions n × 1, and n is the number of data; b is the vector of unknown fixed effects of dimension p × 1, and p is the number of levels of fixed effects; a is the vector of unknown random effects (breeding values) of dimension q × 1, and q is the number of levels of random effects; e is the vector of random error effects of dimension n × 1; X is a known design matrix of dimension n × p relating the given data to the fixed effects; Z is a known design matrix of dimension n × q relating the given data to the random effects. Matrices X and Z are also called incidence matrices and their elements are 0 and 1. As in ref. [15], we assumed that the expectations are actually: E(y) = Xb, E(a) = E(e) = 0, and as the error effects are independently distributed with variance σ 2 e , var(e) = Iσ 2 e = R, var(a) = Aσ 2 a = F i cov(a, e) = cov(e, a) = 0, where A is a matrix of relationship dimension q × q, and I is the unit matrix of dimension n × n. That is, when var(y) = ZAZ σ 2 a + Iσ 2 e , cov(y, a) = ZAσ 2 a , cov(y, e) = Iσ 2 e Z denotes the transposed matrix Z. Mathematically speaking, BLUP provides an estimate for b that we denoted byb, and an estimate for a, denoted byâ, and which were obtained by solving the equations (MME-Mixed Model Equations) which are, for Equation (1), given as assuming that the matrices R and F are regular. Note that when F −1 tends to the zero matrix, then Equations (2) become a generalized least-squares equations (LSE), and then both b and a are considered as fixed effects [16]. Based on the earlier definition of the matrix R, Equation (2) become where α 1 = σ 2 e /σ 2 a . If the matrix in Equations (3) is singular, then a generalized inverse matrix can be used. Also, if the matrix in Equations (3) is written in the form U 11 U 12 U 21 U 22 , and with U 11 U 12 U 21 U 22 its generalized inverse matrix, then the accuracy (r) of the breeding values prediction is obtained by the previously published formula [15], r = √ 1 − u i α 1 , where u i is the i-th diagonal element of the matrix U 22 .

Prediction of Breeding Values Using Numerical Relationship Matrix (NRM), Model with Two Random Effects (Model 2-BLUPp)
The repeatability model was used to analyze data when multiple measurements of the same trait, such as milk yield in consecutive lactations, were recorded in an individual [17].
For models with repetition, the following form was used: where: y is the vector of observations, b is the vector of unknown fixed effects, a is the vector of unknown random effects, p is the vector of unknown environmental random effects of dimension r × 1, and e is the vector of error effects. Furthermore, X and Z are the design matrices for the given data regarding fixed and random effects, respectively. The matrix W is a design matrix related to the effects of the external environment (i.e., only those animals for which we had repeated measurements were observed) of dimension n × r.
The vector a contains only the additive random effects of the animal, while the nonadditive genetic effects are contained in the vector p. The assumption is that environmental effects and error effects are independently distributed with a mean of zero and variances σ 2 p and σ 2 e , respectively. That is when The phenotypic variance σ 2 y is equal to σ 2 a + σ 2 p + σ 2 e , while the repeatability coefficient is equal to R = (σ 2 a + σ 2 p /σ 2 y . The MME model for Equation (4) is of the form After replacing matrix R with matrix Iσ 2 e from Equations (5) we get where α 1 = σ 2 e /σ 2 a and α 2 = σ 2 e /σ 2 p . The aim was to evaluate the effects of number of lactations and to predict the breeding values for the animals and the lasting effect of the environment on the observed cows. If the matrix in Equations (6) is written in the then the accuracy (r) of the prediction of breeding values is obtained using the formula

Prediction of Breeding Values Using Genomic Relationship Matrix (GRM), Single Random Effect Model (Model 3-GBLUP)
First, we will explain the algorithm for obtaining the genomic relationship matrix [18] assuming the matrix M 012 is of dimensions q × m, where is q the number of animals, and m is the number of marker columns, i.e., locuses (number of SNPs). If the alleles are A and B and if AA = 0, AB = 1 and BB = 2, then the matrix M 012 consists of the elements 0, 1 and 2. After its centering, we get the matrix M, that is M = M 012 − J, where J is a matrix of dimension q × m and all elements of which are equal to 1. The elements of matrix M are the numbers −1, 0 and 1. For example, the main diagonal of the matrix MM shows the number of homozygotes for each individual, and outside the diagonal is the number of alleles they share with relatives.
It is necessary to determine the frequencies of allele A and allele B in each of the columns in matrix M. First, in each column in matrix , we determined how many times the numbers −1, 0 and 1 were repeated. Then, p j,−1 , p j,0 and p j,1 marked the number of occurrences of the numbers −1, 0 and 1 in the j-th column of matrix M, respectively, j = 1, 2, . . . , m. The frequency p j,A of allele A and the frequency p j,B of allele B in the j-th column were obtained using the formula Now, matrix P = p ij q×m is formed, where its elements are of the form Finally, the matrix S is of the form S = M − P. This means that the mean of the elements in each column of the matrix S is exactly 0. One way to obtain the genomic relationship matrix G is to use the following formula [18] By dividing SS by 2∑ m j=1 p j,B 1 − p j,B , the matrix G becomes analogous to the relationship matrix A. The genomic inbreeding coefficient for the i-th individual is obtained by subtracting 1 from the i-th element on the main diagonal of matrix G.
As in [19], we wrote the vector of breeding values a as a function of the marker effect, a = Sm. With var(m) = Iσ 2 m , we denoted the variance of the marker vector m [20]. Assuming the same variance per locus, variance of a is As the genetic variance is equal [21] then it follows from Equation (8) that and after including Equation (9) in (7), we have As calculated by another author [18], the genomic relationship matrix (G) is given by with dimensions q × q, and then, based on Equations (10) and (11) var(a) = Gσ 2 a The basic model is the same as in Equation (1), where var(e) = Iσ 2 e = R. An appropriate MME model is If the matrix G is singular, then it is replaced by G r [22], where with 0 ≤ λ ≤ 1, we denoted the ratio of the total genetic variance and the marker effect, A, as the relationship matrix, α, and β as the leveling factors proposed in [23] and [24], J is a square matrix with all elements equal to 1, I is a unit matrix and 0 < ε 1 is a small constant. Thus, GBLUP is BLUP, where A is replaced by the genomic matrix G, i.e., G r .

Prediction of Breeding Values Using Genomic Relationship Matrix (GRM), All Animals Were Genotyped, Model with Two Random Effects (Model 4-GBLUPp)
The basic model was then Equation (4) with var(e) = Iσ 2 e = R, var(a) = Gσ 2 a and var(p) = Iσ 2 p . The MME model is where α 1 = σ 2 e /σ 2 a and α 2 = σ 2 e /σ 2 p . If the matrix in Equations (12) is singular, its generalized inverse matrix is used when obtainingb,â andp. The idea for ssGBLUP arose from the fact that a smaller part of the animals from the observed population was genotyped. A relationship matrix of the form A is observed where index number 1 refers to animals that have not been genotyped, and index number 2 to animals that have been genotyped. Now, the matrix M g012 has dimensions q g × m, where q g is the number of genotyped animals and m is the number of marker columns (i.e., the number of SNPs). After centering the matrix M g012 , we get the matrix M g , whose elements are −1, 0 and 1. After determining the allele frequency (as in the previous section), the matrix S will be of the form S = M g − P. The genomic relationship matrix (G g ) is given by where p j,B is the frequency of allele B in the j-th column of the SNP. If the matrix G g is singular, then it is replaced by G gr where Clearly, A 22 is a matrix of dimension q g × q g and is the relationship matrix only for genotyped animals. Matrices J and I were defined in the previous section. The matrix H −1 is obtained as follows [25] The MME model for Equation (1) is where are α 1 = σ 2 e /σ 2 a , var(e) = Iσ 2 e = R, var(a) = Hσ 2 a . Thus, ssGBLUP is BLUP, where A is replaced by the genomic matrix H. Note that if all animals are genotyped, H −1 = G −1 gr .

Prediction of Breeding Values Using Genetic Markers (GRM), Existing Animals Not Genotyped, Model with Two Random Effects (Model 6-ssGBLUPp)
The basic model is Equation (4) with var(e) = Iσ 2 e = R, var(a) = Hσ 2 a and var(p) = Iσ 2 p . The MME model is where α 1 = σ 2 e /σ 2 a and α 2 = σ 2 e /σ 2 p . In the case of a singular matrix for this model, its generalized inverse matrix was used.

Multivariate Models
In this study, we had some dairy cow data that encompassed only the first lactation, but other data encompassing multiple lactations. Therefore, we defined MBLUP and MBLUPp methods that referred only to the first lactation and to all lactations, respectively, but each method simultaneously analyzed all five traits. The model for multivariate analysis is reduced to the grouping of multiple univariate models for each of the traits. If the number of traits is np, then the univariate model for each of the np traits is Here, y i is the observation vector for the i-th trait, b i is the vector of fixed effects for a trait i; a i is the vector of random effects of animals for a trait i, e i is the vector of random effects of error for i-th trait. X i and Z i are known matrices of the design that relate to the given data to fixed effects and random effects, respectively, and also for trait ii. In matrix form, a multivariate model can be written in the form We assumed the version reported previously [15].
where: G = g ij np×np is the additive genetic matrix of variances and covariances of animal effects where, for example, g 11 is the additive genetic variance of the first trait, g 12 = g 21 is the additive genetic covariance between the first and second traits, etc.; A is a relationship matrix;R = r ij np×np is a matrix of variances and covariances of error effects. The MME model for Equations (15) is If G −1 = g ij np×np , then In matrix form, a multivariate model (np traits) with environmental effects was written as where W i is the design matrix related to the effects of the external environment for the i-th trait. If C = c ij np×np is the matrix of variances and covariances of external environmental effects, the MME model for Equations (18) is where, along with (17),

Two-Tailed t-Test
A two-tailed t-test [26] was performed to check statistically significant differences in the average estimation of accuracy and breeding values between two selected models. Suppose we have two independent normal populations with unknown means µ 1 and µ 2 , and unknown but equal variances. Let a random sample from the first population have n 1 observations and a random sample from the second population have n 2 observations. Let − X 1 , − X 2 , S 2 1 and S 2 2 be the sample means and sample variances, respectively. For has a t distribution with n 1 + n 2 − 2 degrees of freedom. We wish to test Here, t α,n 1 +n 2 −2 is the value in the t-statistic table for level of significance α. In this manuscript, these two populations were estimates of the accuracy of breeding values or estimates of the breeding values for two different mathematical models that have been considered in the case of all traits. The null hypothesis were that there was no statistically significant difference between the average estimate of the accuracy of breeding values or the average estimate of the breeding values regarding the two performed mathematical models.

Spearman's Rank and Pearson's Correlation Coefficients
Spearman's rank correlation [27] measures the strength and direction of association between two ranked variables (breeding values). It basically provides the measure of monotonicity of the relation between two variables, i.e., how well the relationship between two variables could be represented using a monotonic function. The formula for Spearman's rank coefficient is: where ρ is Spearman's rank correlation coefficient, d i is the difference between two ranks of each observation, n is the number of observations. The Spearman's rank correlation can take a value from +1 to −1. A value of +1 means a perfect association of rank. A value of 0 means that there is no association between ranks. Finally, a value of −1 means a perfect negative association of rank. Pearson's correlation coefficient [27] measures the strength between the different variables and their relationships. Therefore, whenever any statistical test is conducted between the two variables (breeding values), it is always a good idea for the person analyzing to calculate the value of the correlation coefficient to know how strong the relationship between the two variables is. Let X 11 , X 12 , . . . , X 1n be the samples of the first variable and X 21 , X 22 , . . . , X 2n of the second variable. Pearson's correlation coefficient (R) is Pearson's correlation coefficient returns a value between −1 and 1. The interpretation of the correlation coefficient is: if the correlation coefficient is −1, it indicates a strong negative relationship, implying a perfect negative relationship between the variables; if the correlation coefficient is 0, it indicates no relationship; if the correlation coefficient is +1, it indicates a strong positive relationship, implying a perfect positive relationship between the variables.

Software Tools for Data Preparation and Processing
Within the BioITGenoSelect project, several software solutions were developed that were used in the preparation of data for this work. Some of the functionality is part of a central application that manages animal data (e.g., origin, lactation), and several programs were written to run independently and can be run from the command line on given input files.
Genotype data were obtained from the Neogen laboratory in a format standard for Illumina GenomeStudio Software (https://www.illumina.com/techniques/microarrays/ array-data-analysis-experimental-design/genomestudio.html, accessed on 12 December 2022). The exports important for the project are in the so-called Final Report format, in which pairs of animals and SNP reads are given in several forms. For the purposes of the work, it was necessary to convert the data into 0125 input form, in which the number of alleles marked with B was counted, while the number 5 represented an unknown value. In the output format, each animal has an associated sequence of numbers representing individual SNP values. Additionally, applications were written that allowed the minimum amount of SNP columns and animals to be removed that would achieve 100% of known values (i.e., 5 does not appear in the output). Since the work used tools from the BLUPF90 family [13] and Wombat [12], which have different input formats for this data, applications were also written that performed translations where necessary.
Data on origin and lactations were generated from the web application BioITGenoSelect (https://bioitgenoselect.polj.uns.ac.rs/genoselect, accessed on 12 December 2022) for the requested animals, based on imported data from the databases curated by the Faculty of Agriculture of the Central Breeding Organization in the livestock sector in Vojvodina province, Serbia.

PreGSF90
PreGSF90 is an interface program for the genomic module to process the genomic information for the BLUPF90 family of programs. Within the BLUPF90 program package [13], there are a number of applications that can be used in different steps of data processing. The package as a whole is designed for various calculations using mixed models in animal breeding. For the purposes of this project, some of the program's data quality control applications were used, primarily PreGSF90, in combination with RENUMF90, which was required for some of the inputs.

Wombat
Wombat [12] is a software for analyzing data related to quantitative genetics and animal breeding, and uses linear mixed models with a focus on REML estimation of (co)variance components. It was first presented at the eighth WCGALP 2006, and in the last few years, it has been significantly improved and expanded, primarily with multivariate analyses and analyses for solving mixed model equations using genomic data and ssGBLUP methods. In Serbia, this program is currently used for estimations of breeding values and ranking of cattle for the purposes of implementing national breeding programs, using univariate analysis with the random effect of the animal. In this study, existing analyses were extended to the use of multivariate analysis (MBLUP), as well as the use of models for genomic prediction (GBLUP and ssGBLUP), with the aim of defining and recommending the model that best fits the currently available data.

BLUP(p)
, ssGBLUP(p) and MBLUP(p) models were first compared because they provided breeding values for all 6041 animals. After that, BLUP(p), GBLUP(p) and ssGBLUP(p) methods were compared, but only for 1491 genotyped animals because GBLUP(p) methods gave breeding values only for those. As the Wombat software does not support multivariate analysis when using SNP markers, it was applied only in cases where the NRM was used, and not when the GRM was used.  Table 2 provides the accuracies of the evaluations of the obtained breeding values for each observed trait, which were created by solving Equations (3), (6), (13), (14), (16) and (19).  For each observed trait, there was a statistically significant difference in the average estimate of accuracy depending on the method used. The average accuracy was higher using the BLUPp model compared to BLUP, except in the case of the trait PC. The same was true for the ssGBLUPp and MBLUPp models versus ssGBLUP and MBLUP models, respectively. The average increase in accuracy of the BLUPp model compared to the BLUP model for the first four traits was 15.12%, the accuracy increase of ssGBLUPp compared to ssGBLUP was 14.78%, while for the MBLUPp model compared to MBLUP, it was 15.11%. For each trait, the MBLUP model gave higher accuracy than the BLUP model. The same trend was clear for the MBLUPp model in relation to BLUPp (MBLUPp was more accurate), except for the trait FC. Table 3 shows the mean values of the estimated breeding values obtained by solving the observed systems, as well as the p-values of the applied two-tailed t-test (the mean values obtained by the BLUP and BLUPp models, ssGBLUP and ssGBLUPp models, and MBLUP and MBLUPp models were compared).

Comparison of BLUP(p), ssGBLUP(p) and MBLUP(p) Models
A two-tailed t-test showed that there were no statistically significant differences in the average estimates of breeding values when comparing BLUP and BLUPp, except for the trait PC. When the ssGBLUP and ssGBLUPp models were compared, there were statistically significant differences in the average estimates of breeding values for all traits except for the trait FC. When comparing MBLUP and MBLUPp models, there were statistically significant differences in the average estimates of breeding values for all the traits studied.

Estimates of Breeding Values and Comparison of Accuracy of Estimates Using NRM for BLUP(p) Models and GRM for GBLUP(p) and ssGBLUP(p) Models for 1491 Genotyped Animals
Breeding value estimates for 1491 genotyped animals were extracted from data generated by the BLUP(pe) and ssGBLUP(p) models that had provided breeding values for 6041 animals, to compare the accuracy of breeding values estimated by the GBLUP(p) models; these models used data and calculated breeding values only for genotyped animals. Descriptive statistics for these methods are shown in Table 4.
When we consider only the first lactations, the GBLUP model proved to be more accurate than the BLUP model, except for the PC trait. The most accurate method was ssGBLUPp, with a maximum standard error of 0.0406. The average percent increase in milk fat yield when estimated using the ssBLUP model compared to BLUP was 5.01%. If we look at all lactations, for all five traits, the highest estimate of accuracy was obtained by the ssGBLUPp model. With the BLUPp model, the accuracy ranged from 0.540 to 0.753, with GBLUPp from 0.529 to 0.736 and with ssGBLUPp from 0.582 to 0.766. The average percentage increase in accuracy of the ssGBLUPp method was 5.91% over the BLUPpe method. With the ssGBLUPp model, the standard error was not greater than 0.0559. Table 5 shows the mean estimated breeding values calculated using the BLUP(p), GBLUP(p) and ssGBLUP(p) models. Table 4. Descriptive statistics and p-values obtained using a two-tailed t-test for the accuracy of prediction of breeding values calculated using the BLUP(p), GBLUP(p) and ssGBLUP(p) models.

Correlations between BVs
The correlations between BVs for the BLUP, GBLUP and ssGBLUP models are shown in Table 6. These correlations were performed on 1491 cows because GBLUP and GBLUPp can only be used on genotyped cows.
For all five traits, the highest correlations were between GBLUP and ssGBLUP. The highest Spearman's rank correlation coefficient is 0.956 (PC trait, GBLUP-ssGBLUP) and the lowest 0.785 (MY trait, BLUP-GBLUP). The highest Pearson's coefficient is 0.962 (PC trait, GBLUP-ssGBLUP) and the lowest 0.789 (MY trait, BLUP-GBLUP). In Figure 1, for the MY trait, the scatter plots between BVs predicted in BLUP, GBLUP and ssGBLUP are shown. For all five traits, the highest correlations were between GBLUP and ssGBLUP. The highest Spearman's rank correlation coefficient is 0.956 (PC trait, GBLUP-ssGBLUP) and the lowest 0.785 (MY trait, BLUP-GBLUP). The highest Pearson's coefficient is 0.962 (PC trait, GBLUP-ssGBLUP) and the lowest 0.789 (MY trait, BLUP-GBLUP). In Figure  1, for the MY trait, the scatter plots between BVs predicted in BLUP, GBLUP and ssGBLUP are shown. The correlations between BVs for the BLUPp, GBLUPp and ssGBLUPp models are also shown in Table 7 for 1491 genotyped cows. Table 7. Spearman's rank (above diagonal) and Pearson's correlation coefficients (below diagonal) between predicted breeding values for milk production traits in all lactations. The correlations between BVs for the BLUPp, GBLUPp and ssGBLUPp models are also shown in Table 7 for 1491 genotyped cows.

Discussion
The estimation of breeding values and animal ranking are important parts of modern breeding programs. The accuracy of an estimated breeding value is defined as the correlation between the true and estimated breeding values [28]. This precision is highly dependent on the amount of information available on any single animal, including data

Discussion
The estimation of breeding values and animal ranking are important parts of modern breeding programs. The accuracy of an estimated breeding value is defined as the correlation between the true and estimated breeding values [28]. This precision is highly dependent on the amount of information available on any single animal, including data on its relatives. Three main sources of information are: the performance data, measured on animals; data on correlations between different traits; and pedigree data, which nowadays are often accompanied by the results of molecular genetic analysis [29]. This enables the implementation of genomic selection that results in great genetic progress and lowers the costs of progeny testing, presenting breeders with the opportunity to select animals that have inherited desirable gene combinations [30,31]. Genomic selection includes the genotyping of animals irrespective of their age and sex, and this allows breeding values to be calculated, even for a newborn male or female animal. This selection approach is recognized by the relevant international organizations, and its advantages in animal breeding are a shorter generation interval, increased selection effect, more accessible data validation and inbreeding prevention. The application of genomic selection in dairy cattle breeding started during the last years of the first decade of the 21st century [32,33], enabled by the analysis of SNP markers that are utilized over the whole genome, and by being relatively inexpensive.
Breeding value estimation and the ranking of dairy cows in Serbia is currently based on the application of the univariate BLUP animal model, where the greatest emphasis is placed on the milk yield characteristics, primarily on the milk yield in the first lactation [34]. Milk yield traits are of primary economic importance in breeding programs because they directly affect the profitability of cattle production [35]. However, the selection of breeding animals should be based on the combination of traits included in a selection index by which the animals are then ranked. Multivariate analysis is the optimal methodology for the analysis of several traits because it simultaneously evaluates the breeding values of animals and calculates phenotypic and genetic correlations between these traits [15]. In addition, one of the major advantages of multivariate over univariate analysis is that it increases the accuracy of the estimate [15]. While observing the data presented in Table 2, it can be seen that using the multivariate analyses (MBLUP/MBLUPp), higher accuracies of breeding values were obtained, on average, for all traits compared to the univariate analyses (BLUP/BLUPp). Additionally, from Table 2, increases in the accuracies of the estimated breeding values were also achieved by the use of the repeated measurements (milk yield results from repeated lactations), which was confirmed for the majority of traits (except for the protein content), regardless of whether it was a standard univariate or multivariate analysis with the use of a numerical relationship matrix or analysis combining numerical and genomic relationship matrices (ssGBLUP).
It is well known that increasing the amount of information on a single animal moves an estimated breeding value closer to the true breeding value. Table 4 shows that, for all traits included, ssGBLUP(p) produced more accurate estimated breeding values than did BLUP(p) and GBLUP(p), while GBLUP had better accuracy than BLUP. However, data in Table 2 show that the application of the ssGBLUP model only partially improved the accuracy. More precisely, the inclusion of SNP marker data resulted in increased accuracy over the univariate BLUP(p) models, but not over the multivariate BLUP(p) models. It can be concluded that models that include genomic data produce more accurate estimated breeding values for genotyped animals, while the same is not necessarily true at the level of the whole population; we speculate that in our case, this was probably a result of the small number of genotyped animals within the population. Ref. [36] compared single-step blending models and GBLUP models with and without adjustment of the genomic relationship matrix for the genomic prediction of 16 traits in a Nordic Holstein population. The accuracy for traits of milk, fat and protein yields were 0.657, 0.675 and 0.655, respectively, and these results were obtained using the GBLUP model. In the current research, the accuracies of milk, fat and protein yields were 0.523, 0.521 and 0.545, respectively, with the GBLUP model. Ref. [37] compared the accuracies of genomic prediction for milk, fat and protein yields from Philippine dairy buffaloes using BLUP, GBLUP and ssGBLUP. In their research, the ssGBLUP model was 13% more accurate than the GBLUP model for milk yield, while in our research, the difference in milk yield estimates between those same models was 4.2%. For fat yield, [37] found an increase of 4%, while ours was 2.5%, and for protein yield, the increase was 3% in [37], while it was 2.6% in our current study. Similar comparisons were also made in other species. The accuracy of BLUP and ssGBLUP for predicting breeding value in Huacaya Alpacas were compared in [38]. Their model with genomic data was, on average, 3.51% more accurate. In our current study, the ssGBLUP model was 4.90% more accurate than the BLUP model. In [39], authors compared the accuracy of BLUP and GBLUP predictions of immune traits in Beijing oil chickens. Average accuracies were 0.316 and 0.274, using BLUP and GBLUP models, respectively. In our current research, these accuracies were 0.573 and 0.579, using BLUP and GBLUP models, respectively. According to [40] accuracies in genomic selection depend on the number, distribution, and contributions of genotypes and phenotypes to the genomic evaluation and [41] conclude that the design and size of the reference population plays a major role in achieving accuracy for the breeding schemes.
A comparison of predicted breeding values calculated using different analyses (BLUP, GBLUP and ssGBLUP) was also performed using Spearman's and Pearson's correlation coefficients, which is in agreement with [42]. In this research, high positive correlations were also obtained in relation to the applied analyses, and the highest were between GBLUP and ssGBLUP, regardless of the investigated trait. Some authors compared genomic prediction using GBLUP and ssGBLUP methods in Iranian Holstein cattle by calculating reliability [43]. In this research of ssGBLUP, the average confidence of genomic predictions for the three traits (milk, fat, and protein yield) was 0.39, which is 0.98% points higher than the average confidence from the GBLUP method.

Conclusions
The integration of bioinformatics in animal breeding is a new approach in Serbia, offering numerous advantages regarding the accuracy and speed of achieving previously set goals. The development of databases that incorporate pedigree, performance and genotype data results in valuable resources for detecting new candidate genes, linked with traits of economic interest, while also increasing the accuracy of estimated breeding values of animals that are candidates for selection.
Greater accuracy when using repeated measures over multiple lactations was confirmed for almost all traits. Furthermore, greater accuracy was obtained with the combination of NRM and GRM, i.e., using the ssGBLUP model compared with either NRM or using GRM in BLUP and GBLUP models, but only when focusing just on genotyped animals. At the level of the examined population that combined both genotyped and non-genotyped animals, the highest accuracy was achieved using multivariate analysis with repeated measures.
In conclusion, this study shows that the current breeding programs in Serbia should be changed to use multivariate analysis with repeated measurements for estimating breeding values and for ranking animals that are candidates for selection, until the optimal size of reference population data, which must include genotyping data on both bulls and cows, is reached. This would have economic significance at the national level because greater accuracy in the prediction of breeding value expands the possibilities for increasing the intensity of selection and leads to an increase in the rate of genetic improvement of economically useful traits in cows.  Institutional Review Board Statement: The animal study protocol was approved by the Ministry of agriculture, forestry and water management-Veterinary Directorate (protocol code 323-07-10977/2019-05 from 19 November 2019).

Informed Consent Statement: Not applicable.
Data Availability Statement: The original data used in this article are available by contacting the corresponding author upon request. After acceptance for publication they will be available on the project website database: https://bioitgenoselect.polj.uns.ac.rs/genoselect/data-p1, accessed on 12 December, 2022.