Genome-Wide Association Analysis of Growth Curve Parameters in Chinese Simmental Beef Cattle

Simple Summary Complex traits that require observations over multiple time points for the same individual are called longitudinal traits. Understanding the genetic architecture of beef cattle growth cannot be limited simply to a genome-wide association study (GWAS) for body weight at any specific ages, but should be extended to a more general purpose by considering the longitudinal weight–age using a growth curve approach. We compared three nonlinear models that described the body weight data of Chinese Simmental beef cattle. The parameters of the suitable model were treated as phenotypes of single-trait GWAS and multi-trait GWAS. We identified 87 significant single nucleotide polymorphisms (SNPs) associated with body weight. Many candidate genes associated with body weight were identified which may be useful for exploring the full genetic architecture underlying growth and development traits in livestock. Abstract The objective of the present study was to perform a genome-wide association study (GWAS) for growth curve parameters using nonlinear models that fit original weight–age records. In this study, data from 808 Chinese Simmental beef cattle that were weighed at 0, 6, 12, and 18 months of age were used to fit the growth curve. The Gompertz model showed the highest coefficient of determination (R2 = 0.954). The parameters’ mature body weight (A), time-scale parameter (b), and maturity rate (K) were treated as phenotypes for single-trait GWAS and multi-trait GWAS. In total, 9, 49, and 7 significant SNPs associated with A, b, and K were identified by single-trait GWAS; 22 significant single nucleotide polymorphisms (SNPs) were identified by multi-trait GWAS. Among them, we observed several candidate genes, including PLIN3, KCNS3, TMCO1, PRKAG3, ANGPTL2, IGF-1, SHISA9, and STK3, which were previously reported to associate with growth and development. Further research for these candidate genes may be useful for exploring the full genetic architecture underlying growth and development traits in livestock.


Introduction
With improved quality of life in China, the demand for meat, particularly beef, is increasing [1]. The Simmental breed accounts for more than 70% of the total number of beef cattle in China [2]. Therefore, it is necessary to analyze the genetic mechanism of the growth traits in beef cattle production [3].
Genome-wide association study (GWAS) and genomic selection (GS) are powerful statistical tools that can broadly identify candidate genes with significant single nucleotide polymorphisms (SNPs) involved in production traits [4,5], growth traits [6,7] and fertility traits [8,9]. However, current research mainly focuses on single data records, such as birth weight, weaning weight, and weight before slaughter [3,10,11]. Complex traits that require observations over multiple time points for the same individual are called longitudinal traits. Compared with single data records, longitudinal data can better describe the growth and production of livestock and poultry [12,13]. The fitting growth curve model is one of the most common strategies for such data [14]. Different models [15] provide a few parameters for people to show the regularity of weight gain in livestock, such as mature body weight (A), time-scale parameter (b), and maturity rate (K), which might reflect the influence of genetic impacts on body weight. In the current study, parameters (such as A and K) were considered as phenotypes of the mixed linear model, and quantitative trait loci affecting the growth curve were identified by GWAS. In addition, Das et al. [16] proposed a series of methods based on random regression models. Previous research has demonstrated that these methods are more sophisticated and flexible [17], but they did not provide biologically-interpretable parameters, such as A and K, which are usually required in daily breeding management.
Generally, a quantitative trait locus (QTL), which affects complex traits, may affect multiple traits [18]. Therefore, the genetic correlations between the parameter estimates (mainly A and K) should be considered. These correlations may be attributed to SNPs that have pleiotropic effects on multiple traits. Therefore, multiple trait GWAS (multi-trait GWAS) is more reasonable in this study [18,19].
In this study, body weights of 808 Chinese Simmental beef cattle at four stages of growth were used to fit the growth curve. The best fitting growth curve parameters were treated as phenotypes for single-trait GWAS and multi-trait GWAS. The aim of our study was to comprehensively analyze candidate genes and QTL regions associated with growth traits by two GWAS methods. We also undertook post-GWAS analyses to identify and prioritize annotated genes within detected genomic regions using the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Our findings offer valuable insights for exploring the full genetic architecture underlying growth and development traits in livestock.

Resource Population and Phenotypes Collection
All animals used in the study were treated following the guidelines established by the Council of China Animal Welfare. Protocols of the experiments were approved by the Science Research Department of the Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China (approval number: RNL09/07). The training population consisted of 808 Chinese Simmental beef cattle established in Ulgai, Xilingole League, Inner Mongolia of China. Body weight at four growth stages (0, 6, 12, and 18 months after birth) were measured for each individual. Since fixed effects were related to body weight and not to growth curve parameters, original body weight data at each age were pre-adjusted for fixed effects (breed, year, and month of birth) by a generalized linear model (GLM). The descriptive statistics of the pre-adjusted phenotypic data are presented in Table 1.

Genotyping and Quality Control
Genomic DNA was isolated from blood samples using the TIANamp Blood DNA Kit (Tiangen Biotech Co.Ltd., Beijing, China) and DNA quality was acceptable when the A260/A280 ratio was between 1.8 and 2.0. In total, 808 cattle were genotyped using Illumina BovineHD Beadchip (Illumina Inc., San Diego, CA, USA). Before statistical analysis, SNPs were pre-processed by PLINK v1.07 [20]. Duplicated SNPs were also removed. Finally, 671,192 SNPs on 29 autosomal chromosomes with an average distance of 3 kb were generated for the analysis. SNPs were deleted according to the following standards, including minor allele frequency (<0.01), SNP call rate (<0.05), and Hardy-Weinberg equilibrium values (p < 1 × 10 −6 ).

Growth Curve Fitting
Three of the most widely used nonlinear models ( Table 2) to describe animal growth curves-Gompertz, Logistic, and Brody-were fitted for each animal using the iterative nonlinear least-squares method via the Gauss-Newton [21] algorithm implemented in SAS 9.4. In the function, A is the mature body weight, which means the ultimate body weight of an individual; b is the time-scale parameter, which means the time for an individual to reach its maximum growth rate; K is the maturity rate, which means the rate at which an individual approaches its mature body weight (A). Table 2. Growth curve model. A is the mature body weight, b is the time-scale parameter, K is the maturity rate, W is the observed body weight, t is the growth time, and e is the natural logarithm.

Model Function
The coefficient of determination R 2 [22] was used to evaluate the fitting effect of the growth curve model. The expression is as follows: where W represents the observation of body weight,Ŵ represents the estimated body weight of the growth curve model, and W is the average value of body weight. The genetic correlation of A, b, and K was also calculated, which used the following formula: where σ 2 a1 is the additive genetic variance of trait 1, σ 2 a2 is the additive genetic variance of trait 2, and σ a12 is the covariance of additive effects.

Single-Trait GWAS
After selecting the nonlinear model which best fit the body weight data, the parameters A, b, and K were used in GWAS. Firstly, we used principal component analysis (PCA) and obtained the kinship matrix using the package GAPIT [23] (Genomic Association and Prediction Integrated Tool) (http://www.maizegenetics.net/gapit) in R software (R 3.6.1).
The following mixed linear model was considered: where y represents the vector of observations from the three phenotypes (A, b, K estimates) for each individual; s is the SNP effects vector; W is a matrix of SNP genotype indicators, which were coded as 0, 1, and 2 corresponding to the three genotypes AA, AB, and BB; µ is a vector of these polygenic effects with an assumed N (0, Kσ 2 ) distribution, where σ 2 is the polygenic variance and K is a marker inferred kinship matrix; X is an incidence matrix for β, and β is a non-genetic vector of fixed effects only including principal component effects (the top three eigenvectors of the Q matrix). The other fixed effects were not included at this step, since they were already considered before fitting nonlinear functions (see pre-adjustment for fixed effects in Section 2.1). Z is an incidence matrix for µ; while e is a vector for random residual errors with a putative N (0, Iσ 2 e ) distribution, where σ 2 e is the residual variance.
The false discovery rate (FDR) was used to determine the threshold values for singletrait GWAS and multi-trait GWAS. The FDR was set as 0.05, and the threshold p-value was calculated as follows: where n is the number of p < 0.05 in the results and m is the total number of SNPs [24].

Multi-Trait GWAS
The multi-trait GWAS was conducted to detect pleiotropic SNPs for the parameters A, b, and K. The model was a Chi-squared distribution which was calculated for each SNP using the following formula [25]: where t i is a 3 × 1 vector of the t-values for ith SNP obtained from single-trait GWAS; v i is the estimate of v; V(v i ) is the corresponding variance which can be obtained by the compressed mixed linear model (CMLM); t i is the transpose of the vector t i ; V −1 is the inverse of the 3 × 3 correlation matrix between traits, which was calculated using the qualified SNPs.

Gene Function Annotation
We explored the biological mechanism of significant SNPs based on the interpretability of the gene functions related to the relevant SNPs. To select the candidate genes based on the physical location of SNPs, the BioMart module of Ensembl was used to match the significant SNPs with the UMD Bostaurus 3.1 (http://www.animalgenome.org). Then we confirmed the biological function of related genes by the genome databanks National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/), and the genes associated with growth and development traits were screened out. GO and KEGG pathways were used to annotate the main biological functions, metabolic pathways, and signal transduction pathways involved in differentially expressed genes.

Growth Curve Fitting
Models are shown in Table 3, and the three growth curves are plotted in Figure 1. The R 2 values for the Gompertz, Logistic, and Brody models were 0.954, 0.951, and 0.951, respectively; the Gompertz model showed the best goodness of fit. Figure 1 shows four growth curves of the Gompertz model, Logistic model, Brody model, and the weight average. The curves representing the Gompertz model and average body weight overlap almost completely, while the other curves have some deviation. Therefore, the parameters of the Gompertz model were selected as phenotypes of GWAS.
The correlation coefficients were 0.087 (A and b), -0.578 (A and K), and 0.369 (b and K) respectively, which showed that A and K have a strong negative correlation.

Principal Components Analysis (PCA)
The population stratification of the Simmental beef cattle population based on the PCA is shown in Figure 2. The population was divided into five separate clusters, demonstrating an obvious stratification in the reference population. The majority of individuals are located in the lower right corner, while a small number of individuals are distributed in other regions. Therefore, the first two principal components are selected as covariables to eliminate the influence of population stratification on correlation analysis.

Principal Components Analysis (PCA)
The population stratification of the Simmental beef cattle population based on the PCA is shown in Figure 2. The population was divided into five separate clusters, demonstrating an obvious stratification in the reference population. The majority of individuals are located in the lower right corner, while a small number of individuals are distributed in other regions. Therefore, the first two principal components are selected as covariables to eliminate the influence of population stratification on correlation analysis.

Principal Components Analysis (PCA)
The population stratification of the Simmental beef cattle population based on the PCA is shown in Figure 2. The population was divided into five separate clusters, demonstrating an obvious stratification in the reference population. The majority of individuals are located in the lower right corner, while a small number of individuals are distributed in other regions. Therefore, the first two principal components are selected as covariables to eliminate the influence of population stratification on correlation analysis.

Summary of Results by Two GWAS Methods
The quantile-quantile (Q-Q) plots and Manhattan plots of single-trait GWAS are shown in Figures 3 and 4. Most points were near the diagonal line in quantile-quantile (Q-Q) plots because the population structure was considered in the GWAS function. The Q-Q plots suggested that there was no inflation or systematic bias in this research. There were nine significant SNPs for mature body weight (A) in the Manhattan plots of single-trait GWAS. The nine SNPs were located on bos taurus autosomes (BTA) 4, 7, 10, 11, 15, and 22, and the locus with the lowest p-value was located at 20,500,709 bp on BTA 7 ( Figure 4A). The 49 significant SNPs were shown for time-scale parameter (b), and the SNP with the lowest p-value was located at 98,989,710 bp on BTA 9. For the maturity rate (K), Manhattan plot indicated seven significant loci which were located on BTA 22 and 25, and the SNP with the lowest p-value was located at 18,694,612 bp on BTA 22. We observed several associated genes involved in growth and development, including PLIN3, KCNS3, TMCO1, ANGPTL2, IGF-1, ASPH, ALPL, GRM7, and SHISA9. All results are shown in Table 4.
The Q-Q plot and the Manhattan plot of multi-trait GWAS are shown in Figures 5 and 6. The same conclusion as the single-trait GWAS was given in the Q-Q plot of multi-trait GWAS. The 22 significant SNPs were identified. The SNP with the lowest p-value was located at 25,336,507 bp on BTA 10. We also observed several associated genes involved in growth and development which included STK3, CD58, and bta-mir-2285de. The results are shown in Table 4.

Summary of Results by Two GWAS Methods
The quantile-quantile (Q-Q) plots and Manhattan plots of single-trait GWAS in Figure 3 and Figure 4. Most points were near the diagonal line in quantile-qua plots because the population structure was considered in the GWAS function. The suggested that there was no inflation or systematic bias in this research. There significant SNPs for mature body weight (A) in the Manhattan plots of single-tr The nine SNPs were located on bos taurus autosomes (BTA) 4, 7, 10, 11, 15, and locus with the lowest p-value was located at 20,500,709 bp on BTA 7 ( Figure 4 significant SNPs were shown for time-scale parameter (b), and the SNP with th value was located at 98,989,710 bp on BTA 9. For the maturity rate (K), Manhatt dicated seven significant loci which were located on BTA 22 and 25, and the SN lowest p-value was located at 18,694,612 bp on BTA 22. We observed several genes involved in growth and development, including PLIN3, KCNS3, TMCO1, IGF-1, ASPH, ALPL, GRM7, and SHISA9. All results are shown in Table 4.
The Q-Q plot and the Manhattan plot of multi-trait GWAS are shown in Fi Figure 6. The same conclusion as the single-trait GWAS was given in the Q-Q plo trait GWAS. The 22 significant SNPs were identified. The SNP with the lowe was located at 25,336,507 bp on BTA 10. We also observed several associated volved in growth and development which included STK3, CD58, and bta-mir-2 results are shown in Table 4.

GO and KEGG Pathway Analysis
We found 29 KEGG pathways and 135 GO terms, and 12 pathways and 99 GO terms were significantly enriched (p < 0.05) (e.g., thiamine metabolism, circadian rhythm, protein stabilization, nephric duct morphogenesis, glycosylphosphatidylinositol (GPI)linked ephrin receptor activity) (Table S1). Particularly, seven KEGG pathways and 14 GO terms which were related to growth and development are shown separately in Table 5, including Hippo signaling pathway-multiple species, longevity regulating pathwaymultiple species, nephric duct morphogenesis, and limb morphogenesis.

Growth Curve Fitting
R 2 of Gompertz model reached 0.954, which was the highest of the three models. The parameter A of the Gompertz model showed that the mature body weight of Chinese Simmental beef cattle reached 617.9 kg, which was within the normal mature weight range (600-800 kg) for the population [26]. Though the R 2 of Logistic model and Brody model reached 0.951, the parameter A (551.0 and 1458.5) was inconsistent with the actual weight of Chinese Simmental beef cattle. The results indicated that the two models may not be suitable for the data in this study. Though the coefficient of determination (R 2 ) for Logistic model and Brody model for A are the same (0.951), the estimate of A for the two models was quite different. The reason for this phenomenon may be that the function expressions of the two models are different, and the estimation methods are also different, so the models adapt to different breeds. Among the three models, the growth curve fitting by the Gompertz model with the highest R 2 had well-matched performance for the actual cattle population. Therefore, the Gompertz model was chosen as the best model for Chinese Simmental beef cattle, which was the same conclusion as Liang et al. [27].
The negative relationship between parameters A and K has been reported many times [14,28,29], which suggests that individuals with smaller mature weight will gain its mature body weight at a young age. Thus, we can predict that precocious animals will not gain a high mature body weight, even if we put in the same cost (such as feed) as other individuals. The conclusion could help us reduce the cost of raising animals by learning to manage individuals separately.
Although there are few studies about growth curves in Chinese Simmental beef cattle, some authors have concluded that the Gompertz model provides the best fit for body weight of beef cattle. Zainaguli et al. [30] used four common models (Logistic, Gompertz, Brody, and Bertallanffy) to fit the weight growth curves of 344 Xinjiang Brown cattle. The Gompertz model showed the best fit for the population. Liang et al. [27] compared four growth curve models (Logistic, Gompertz, Brody, and Bertallanffy) fitted to body weight of Simmental beef cattle and concluded that the Gompertz model was superior to the other models.

GWAS, GO, and KEGG Pathway Analysis
We performed single-trait GWAS and multi-trait GWAS for the body weight trait of Chinese Simmental beef cattle. A great number of genes involved in growth and development were identified by each method. The reason for this phenomenon may be the limited dataset. However, since most growth and development traits are controlled by multiple genes [31], the genes associated with growth and development identified by separate GWAS cannot be ignored. Single-trait GWAS and multi-trait GWAS have their specific advantages in the identification of distinct loci. For example, compared to the metaanalysis GWAS, the single-population GWAS was more powerful for the identification of SNPs [32], whereas multi-trait GWAS has the advantage of increasing statistical power and identifying pleiotropic loci [33][34][35]. Therefore, it should be noted that multi-trait GWAS cannot replace single-trait GWAS, rather it was complementary to single-trait GWAS. Thus, combining single-trait GWAS and multi-trait GWAS methods was expected to markedly improve the analysis of the genetic mechanism of the body weight traits for Chinese Simmental beef cattle.
Single-trait GWAS: For mature body weight (A), the significant locus ARS-BFGL-NGS-14531 which has the lowest p-value was near PLIN3 (perilipin 3). PLIN3 is an important regulator of adipogenesis and triglyceride storage [36], and PLIN3 functions are intertwined with the lipogenic pathways implicated in sebaceous lipogeneses, such as desaturation and triglyceride synthesis [37]; three significant SNPs were near KCNS3 (potassium voltage-gated channel modifier subfamily S member 3) which was proven to be significantly associated with the percent body fat (%BF) [38]. For time-scale parameter (b), two SNPs were within TMCO1 (transmembrane and coiled-coil domains 1), which may affect muscle development because of the significant relationship with PRKAG3 (protein kinase AMP-activated noncatalytic subunit gamma 3); two significant SNPs were near ANGPTL2 (angiopoietin like 2). A study showed that ANGPTL2 may be used as a new type of adipocyte factor [39]. One SNP was located in CFB (complement factor B), which has been identified as related to the total number of piglets born (TNB) and reproductive traits [40]; four significant SNPs were near or within IGF-1 (insulin-like growth factor 1). IGF-1 and its signaling pathway play a primary role in normal growth and aging [41,42]. The locus BovineHD1400008371 was near ASPH (aspartate beta-hydroxylase), which is involved in regulating the growth and development of beef cattle carcass [43]; four significant SNPs were near or within ALPL (alkaline phosphatase, biomineralization associated). A study showed that the expression level of ALPL in white blood cells of obese people is significantly higher than that of lean people, indicating that ALPL may be related to the production of fat [44]; two significant loci were within EPHA4 (EPH receptor A4) which was one of the potential candidate genes for growth trait of pigs [45]. Seven significant loci of maturity rate (K) were concentrated on chromosomes 22 and 25; two associated genes GRM7 (glutamate metabotropic receptor 7) and SHISA9 (shisa family member 9) were found and SHISA9 was highly correlated with growth and development [46].
Multi-trait GWAS: CD58 (CD58 molecule) and STK3 (serine/threonine kinase 3) were found by both methods. Two SNPs (BovineHD1400018901 and BovineHD1400018902) from single-trait GWAS and one SNP (BovineHD1400018913) from multi-trait GWAS were near STK3, which is also named MST2 (macrophage stimulating 2). MST1 (macrophage stimulating 1) and MST2 were central to the Hippo signaling pathway in mammals, which enabled the dynamic regulation of tissue homeostasis in animal development [47]. One significant SNP (BovineHD0100017897) was within bta-mir-2285de which might be an important regulator of bovine mammary lipogenesis and metabolism [48]. The locus BovineHD2700010439 was near KAT6A (lysine acetyltransferase 6A), and it has been shown to be significantly associated with growth retardation [49]. One significant SNP(BovineHD1100023984) was near NBAS (NBAS subunit of NRZ tethering complex) which was significantly associated with bone development [50].
GO and KEGG pathway: There are also some candidate genes which are closely related to growth and development found by GO and KEGG pathway. For example, STK3 was involved in more than one GO and KEGG pathway, including Hippo signaling pathwaymultiple species, Hippo signaling pathway, MAPK signaling pathway, cell differentiation involved in embryonic placenta development [51], hepatocyte apoptotic process, negative regulation of organ growth, negative regulation of cell population proliferation, positive regulation of fat cell differentiation, and central nervous system development, which suggests that STK3 may be closely related to cell proliferation and differentiation [51], organ growth and development, and nervous system development [52]. ASPH was involved in limb morphogenesis and roof of mouth development, which suggests that ASPH is significantly associated with body development [43]. ANGPTL2 was involved in angiogenesis, which suggests that ANGPTL2 is closely related to the development of individuals [39].

Conclusions
In conclusion, the three growth curve models were used to fit the body weight data of Chinese Simmental beef cattle. The parameters of the Gompertz model with the best fitting effect are the phenotypes of GWAS. A total of 65 significant SNPs from single-trait GWAS and 22 SNPs from multi-trait GWAS were found. Seven KEGG pathways and 14 GO terms, which were related to growth and development, were also identified. Several candidate genes that were significantly associated with growth and development traits were observed, including PLIN3, KCNS3, TMCO1, ANGPTL2, CFB, IGF-1, ALPL EPHA4, SHISA9, STK3, and bta-mir-2285de. The role of associated genes in growth and development was also discussed. Further research for these candidate genes may be useful for exploring the full genetic architecture underlying growth and development traits in livestock.

Data Availability Statement:
We confirm that all raw data underlying our findings are publicly available without restriction. Data is available from the Dryad Digital Repository: doi: https://doi.org/ 10.5061/dryad.4qc06.