Population Structure and Genetic Diversity of Yunling Cattle Determined by Whole-Genome Resequencing

Yunling cattle, a three-breed cross comprising 50% Brahman, 25% Murray Grey and 25% Yunnan Yellow cattle, have several advantageous traits, including rapid growth, superior meat quality, tolerance of hot and humid climates, tick resistance and tolerance of coarse feed. Yunling cattle also serve as a vital genetic repository of local Yunnan cattle. Insight into the genetics of Yunling cattle is important for formulating sound breeding strategies for the breed, safeguarding genetic resources and mitigating the risk of inbreeding depression. In this study, we constructed a Yunling cattle reference genome and aligned the whole genomes of 129 Yunling cattle individuals to it to estimate the current genetic status of Yunling cattle in Yunnan Province, China. The average alignment rate and the average percentage of properly paired reads were both 99.72%. The average nucleotide diversity in Yunling cattle was 0.000166, which indicates a low level of diversity. Population structure analysis classified Yunling cattle into two subgroups. Inbreeding analysis revealed that inbreeding has occurred in Yunling cattle, which may have contributed to the low genetic diversity observed. This study presents a comprehensive assessment of the genetic structure and diversity of Yunling cattle and provides a theoretical foundation for the preservation and exploitation of these precious germplasm resources.


Introduction
Yunling cattle is a breed developed by the Yunnan Academy of Grassland and Animal Science in China. It is a three-breed cross comprising 50% Brahman, 25% Murray Grey and 25% Yunnan Yellow cattle, and it has various benefits, such as rapid growth, superior meat quality, improved tolerance of hot and humid climates, tick resistance and tolerance of coarse feed. Inter se mating to fix the cross has reached the fifth generation. The breed is primarily distributed across numerous regions within Yunnan Province, with small numbers in adjacent provinces [1,2]. Livestock and poultry germplasm resources are strategic national resources, and the preservation of local breeds is crucial to the implementation of the seed industry revitalization strategy. With these advantageous traits, the Yunling cattle breed has emerged as a significant resource for beef production and cattle breeding in China.
Genetic diversity is the cornerstone of biological adaptation to the environment and of evolution, and it serves as a vital reference for assessing the status of germplasm resources [3]. The richer the genetic diversity within a population, the stronger its ability to adapt to environmental change [4]. Assessing the genetic diversity of a population helps to clarify the present status and future prospects of its germplasm resources, and plays a crucial role in the conservation, development and utilisation of livestock and poultry germplasm resources [5].
Therefore, assessing the genetic characteristics of Yunling cattle is necessary for constructing a rational breeding strategy for the breed and for designing a local conservation programme for Yunnan cattle [2]. The genetic diversity in Yunling cattle has been investigated using various methods, including karyotypic analysis [6], microsatellite DNA markers [1] and the study of Y-chromosome polymorphisms, including Y-SNPs and Y-STRs [7]. Previous genome-level research has also revealed the global and local components of ancestry in the breed [8]. However, the current genetic status of Yunling cattle remains uncertain, particularly at the genome-wide level. Here, we provide supplementary insights into the genetic diversity of the Yunling breed from a comprehensive analysis of whole-genome DNA.
In this study, we aligned 129 Yunling genomes to the reference genome of Yunling cattle (BioProject accession number PRJNA978937) and assessed genetic status by calculating nucleotide diversity and heterozygosity indexes. Subsequently, we evaluated the population structure of Yunling cattle at the genomic level through ADMIXTURE, principal component analysis (PCA) and construction of a neighbour-joining tree. The inbreeding status of the Yunling cattle population was then analysed by calculating the inbreeding coefficient and fixation indexes. Our results will aid understanding of the population structure and genetic characteristics of Yunling cattle and establish a basis for breeding Yunling cattle and maintaining their genetic diversity.

Samples and Sequencing
A set of biological tissues was sampled from a four-year-old male Yunling cattle reared at Chuxiong JinDa Farm in Chuxiong City, Yunnan Province, and quickly frozen in liquid nitrogen. Heart tissue was used for DNA sequencing in order to assemble the genome. Genomic DNA was extracted from the heart tissue using the standard phenol-chloroform extraction method [9] for DNA sequencing library construction. The integrity of the genomic DNA molecules was checked using agarose gel electrophoresis. We used the BGISEQ DNBSEQ-T7 platform for short-read sequencing (bp) to obtain 161.89 GB of raw data (64× coverage of the estimated genome size), and the PacBio Sequel II platform (CCS mode) for long-read sequencing to obtain 61.81 GB of raw data for genome assembly. The sequencing was performed at GrandOmics Biosciences Co., Ltd. (Wuhan, China). The raw sequencing data used to assess Yunling cattle genetic diversity were obtained from NCBI under BioProject accession number PRJNA555741 (Supplementary Table S1). In project PRJNA555741, all samples were obtained from Xiaoshao Farm of the Yunnan Academy of Grassland and Animal Science, Kunming, Yunnan, China. During sampling, animals were chosen on the basis of pedigree information to minimise relatedness between individuals [2].
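The yield and coverage figures quoted above imply an estimated genome size. The following is a rough arithmetic check, not part of the original pipeline; it assumes that coverage is simply raw bases divided by genome size and that "GB" means 10^9 bases, neither of which is stated in the text.

```python
def implied_genome_size_gb(raw_bases_gb: float, coverage: float) -> float:
    """Genome size (Gb) implied by a raw yield and its fold coverage."""
    return raw_bases_gb / coverage


short_read_yield_gb = 161.89  # short-read raw data, from the text
short_read_coverage = 64.0    # reported fold coverage, from the text
long_read_yield_gb = 61.81    # PacBio raw data, from the text

genome_gb = implied_genome_size_gb(short_read_yield_gb, short_read_coverage)

# Assumption: all long-read bases count towards coverage.
long_read_coverage = long_read_yield_gb / genome_gb

print(f"implied genome size: {genome_gb:.2f} Gb")
print(f"implied long-read coverage: {long_read_coverage:.1f}x")
```

Under these assumptions the implied genome size is about 2.53 Gb, and the long-read data correspond to roughly 24-fold coverage.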

Assembly and Annotation
We estimated genome size and heterozygosity by 27-mer analysis of the short paired-end reads using KMC [10] and GenomeScope [11]. Initial assembly was performed using HiFiasm with HiFi long reads, followed by four rounds of correction using Nextpolish with short reads and default settings in order to improve assembly accuracy. HiC-Pro [12] and fastp were used to filter low-quality Hi-C data and paired-end reads, respectively. Clean paired-end reads were subsequently aligned to the assembly using bowtie2 [13] in order to obtain uniquely mapped paired-end reads. HiC-Pro was then used to identify valid interaction pairs among the uniquely mapped paired-end reads and to filter out invalid read pairs. The scaffolds were further clustered, ordered and oriented onto chromosomes by LACHESIS [14]. Finally, Juicebox [15] was used for manual correction of large-scale inversions and translocations, yielding the final pseudochromosomes.
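The idea behind k-mer based genome size estimation can be illustrated with a simple peak-based approximation. This is a minimal sketch, not GenomeScope's method (GenomeScope fits a mixture model to the k-mer spectrum). The histogram values and the `min_depth` error cutoff are hypothetical.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Peak-based genome size estimate from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    K-mers below min_depth are treated as sequencing errors and ignored.
    Returns (estimated genome size in k-mers, peak depth).
    """
    solid = {d: c for d, c in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # The depth with the most distinct k-mers approximates the coverage.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences divided by coverage gives the genome size.
    total_kmers = sum(d * c for d, c in solid.items())
    return total_kmers / peak_depth, peak_depth


# Hypothetical toy histogram: error k-mers at depth 1-2, main peak at 21.
toy_histogram = {1: 1000, 2: 200, 20: 50, 21: 100, 22: 50}
size, peak = estimate_genome_size(toy_histogram)
print(size, peak)
```

A real histogram would come from a k-mer counter such as KMC; the approximation ignores heterozygous and repeat peaks, which is why model-based tools are preferred in practice.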
We then performed gene function annotation and annotation of non-coding RNAs (ncRNAs) on the genome. For gene function annotation, we ran InterProScan [16] with default parameters to identify putative domains and GO (Gene Ontology) terms of genes, and used BLASTP to compare the EvidenceModeler-integrated protein sequences with four well-known databases: SwissProt, NR (Non-Redundant Protein Database), KEGG (Kyoto Encyclopedia of Genes and Genomes) and KOG (Eukaryotic Orthologous Groups of proteins). An E value cutoff of 1 × 10^−5 was applied, and for each query the hit with the lowest E value was retained. Combining the results from these databases, a total of 19,172 genes were annotated, corresponding to 92.80% of the predicted protein-coding genes. For ncRNA annotation, two strategies were used: searching against a database and prediction with a model. Transfer RNAs (tRNAs) were predicted using tRNAscan-SE [17] with eukaryote parameters. MicroRNA, rRNA, small nuclear RNA and small nucleolar RNA were detected using Infernal cmscan to search the Rfam [18] database. The rRNAs and their subunits were predicted by RNAmmer [19].
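The best-hit rule described above (apply an E value cutoff, then keep the lowest E value per query) can be sketched as follows. The sketch assumes BLAST tabular output with the default 12 columns, in which the E value is the eleventh column; the gene and subject identifiers are hypothetical, and the authors' actual post-processing is not described in the text.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: (subject, evalue)} keeping the lowest E value per query."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > max_evalue:
            continue  # fails the E value cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical hits: gene1 has two passing hits, gene2 has none.
example = "\n".join([
    "gene1\tP12345\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-50\t380",
    "gene1\tQ67890\t80.0\t180\t36\t0\t1\t180\t5\t184\t1e-20\t150",
    "gene2\tA11111\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t30",
])
hits = best_hits(example)
print(hits)
```

Queries with no hit passing the cutoff, such as `gene2` here, are simply absent from the result and would count as unannotated for that database.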

Population Structure, Principal Component Analysis and Phylogenetic Tree Analysis
The population structure of Yunling cattle was studied using ADMIXTURE 1.3.0 software [27], with the number of assumed ancestral populations (K) ranging from 2 to 19. We calculated the genetic distance matrix between individuals using PLINK. Based on this matrix, we performed PCA using Genome-wide Complex Trait Analysis 1.94.1 (GCTA) [28] and constructed the phylogenetic tree using the neighbour-joining method in MEGA v7.0 [29].
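Choosing K from ADMIXTURE runs usually means picking the value with the lowest cross-validation error. The sketch below assumes that ADMIXTURE was run with cross-validation enabled and that the per-run logs contain lines of the form `CV error (K=2): 0.4512`; the error values shown are hypothetical.

```python
import re

# Matches log lines such as "CV error (K=2): 0.4512".
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def best_k(log_text):
    """Return (K with the lowest CV error, {K: error}) from concatenated logs."""
    errors = {int(k): float(e) for k, e in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get), errors


# Hypothetical log excerpt for K = 2..4.
example_log = """\
CV error (K=2): 0.4512
CV error (K=3): 0.4630
CV error (K=4): 0.4788
"""
chosen_k, cv_errors = best_k(example_log)
print(chosen_k, cv_errors)
```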

Inbreeding Analysis and Subgroup Analysis
To assess the extent of inbreeding in the Yunling cattle population, we used PLINK with the parameter '--het'. Fixation indexes were calculated to estimate the degree of inbreeding in Yunling cattle using VCFtools with the options '--fst-window-size 20000 --fst-window-step 5000'. Based on the ADMIXTURE results, we also calculated H_E, H_O and F_HOM for each subgroup separately to evaluate the inbreeding status of each subpopulation.
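The homozygosity-based inbreeding coefficient compares observed and expected counts of homozygous genotypes: F = (O − E) / (N − E). The sketch below recomputes it from a table laid out like PLINK 1.9 `--het` output (columns `FID IID O(HOM) E(HOM) N(NM) F`); the sample rows are hypothetical.

```python
def f_hom(observed_hom, expected_hom, n_sites):
    """Excess-of-homozygosity inbreeding coefficient, F = (O - E) / (N - E)."""
    return (observed_hom - expected_hom) / (n_sites - expected_hom)


def parse_het(text):
    """Recompute F per individual from whitespace-delimited --het style text."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    result = {}
    for line in lines[1:]:
        rec = dict(zip(header, line.split()))
        result[rec["IID"]] = f_hom(
            float(rec["O(HOM)"]), float(rec["E(HOM)"]), float(rec["N(NM)"])
        )
    return result


# Hypothetical records: cow1 has excess homozygosity, cow2 has none.
example = """\
FID IID O(HOM) E(HOM) N(NM) F
fam1 cow1 800 7.000e+02 1000 0.3333
fam1 cow2 700 7.000e+02 1000 0
"""
coefficients = parse_het(example)
print(coefficients)
```

A positive value means more homozygous genotypes than expected under random mating; averaging the per-individual values gives a population-level figure.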

Diversity in Single-Nucleotide Polymorphisms, Genetic Diversity
More than 15 billion clean reads were aligned to the Yunling cattle reference genome sequence (BioProject accession number PRJNA978937), with an average alignment rate of 99.72% and an average percentage of properly paired reads of 99.72% (Supplementary Table S2). More than 49 million SNPs were identified; the density plots for filtered SNP quality are shown in Figure S1. The majority of the SNPs were detected in intergenic regions, which accounted for 69.8% of the total. SNPs located upstream and downstream of open reading frames each accounted for 0.5% of the total, while those located in introns and exons accounted for 28.3% and 0.7%, respectively. In exons, there were 145,317 non-synonymous SNPs and 190,812 synonymous SNPs (Figure 1). Figure 2 shows that LD decays relatively fast in Yunling cattle. This outcome is in line with the research of Chen et al., suggesting a potential lack of genetic diversity in the Yunling cattle breed [8]. In 2021, Liu et al. computed the nucleotide diversity, observed heterozygosity, expected heterozygosity and inbreeding coefficient of Chinese indigenous breeds and Western breeds [30], and we used the same methodology and parameters to evaluate these indices in Yunling cattle (Table 1). With the exception of Yunling cattle, all the data in Table 1 are sourced from Liu et al. [30]. The pi value in Yunling cattle (0.000166) is notably lower than that of most Chinese indigenous and Western breeds, as shown in Table 1. The H_O value of the various breeds ranged from 0.094 to 0.317, with Yunling cattle ranking fourth (0.177) among all breeds. The value of H_E ranged from 0.089 to 0.279, with Yunling cattle ranking third from last (0.256) among all breeds. Unexpectedly, in Yunling cattle the value of H_O was significantly smaller than that of H_E. Additionally, Yunling cattle displayed the highest heterozygosity deficit ((H_E − H_O)/H_E) of the breeds compared, which may indicate a high degree of inbreeding in the population (Table 1). These results suggest that Yunling cattle display a relatively low level of genetic diversity and a certain amount of inbreeding within the population.
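The two summary statistics discussed above can be written out explicitly. The sketch below uses the standard per-site estimator of nucleotide diversity for biallelic SNPs and the heterozygosity deficit formula given in the text. It is illustrative only: the allele counts and window length are hypothetical, and the software actually used for Table 1 is not named in this section.

```python
def site_pi(alt_count, n_chrom):
    """Nucleotide diversity at one biallelic site among n_chrom chromosomes."""
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def window_pi(alt_counts, n_chrom, window_length):
    """Average pi per base over a window; invariant sites contribute zero."""
    return sum(site_pi(c, n_chrom) for c in alt_counts) / window_length


def heterozygosity_deficit(h_e, h_o):
    """Relative heterozygosity deficit, (H_E - H_O) / H_E."""
    return (h_e - h_o) / h_e


# Hypothetical window: three SNPs in 10 kb, 129 diploid animals.
n_chrom = 2 * 129
pi_window = window_pi([10, 120, 3], n_chrom, 10_000)

# Values reported for Yunling cattle in the text.
deficit = heterozygosity_deficit(0.256, 0.177)
print(f"pi = {pi_window:.6f}, deficit = {deficit:.3f}")
```

With the reported H_E of 0.256 and H_O of 0.177, the deficit works out to about 0.309.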

Population Structure
The ADMIXTURE analysis indicated that the population is divided into two subgroups: the cross-validation (CV) error was lowest at K = 2 (Figure 3). At K = 2, 29 individuals clustered into a small group, while the remaining individuals formed a markedly larger group. The ADMIXTURE result for K = 2 is shown in Figure 4, in which every individual is depicted by a slender vertical bar divided into two coloured segments along the y-axis (orange = group I; green = group II). The lengths of these segments are proportional to the estimated membership of the individual in each group. The genetic relationship among Yunling cattle was also examined using a neighbour-joining dendrogram constructed from the genetic distance between individual animals. The neighbour-joining tree was broadly in agreement with the ADMIXTURE analysis, separating the 129 individuals into two main groups, named subgroups I and II (Figure 5). However, the classification of individuals is not entirely consistent with ADMIXTURE: in the tree, subgroup I contains 28 individuals, while subgroup II consists of the remaining 101 individuals. The sample numbers of the two subgroups are shown in Supplementary Table S3.
We performed PCA on the genetic distance matrix between individuals and extracted the first three principal components, on which OPTICS clustering was then performed. As a result, Yunling cattle could be classified into two subgroups, which is concordant with the results of the NJ-tree and ADMIXTURE analyses, although the assignment of individuals differs somewhat from those two methods. Group I as defined by PCA contains 40 individuals, and the remaining 89 individuals form subgroup II. To facilitate an intuitive comparison of how the three grouping methods classify individuals, we constructed three-dimensional scatterplots using the value of every sample on the first (PC1), second (PC2) and third (PC3) principal components. We then coloured the individuals according to the clustering results of each method, obtaining three plots corresponding to ADMIXTURE (Figure 6a), NJ-tree (Figure 6b) and PCA (Figure 6c). The clustering results of the three methods are shown in Supplementary Table S3.
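One simple way to quantify how far the three groupings agree is the fraction of individuals given the same label by each pair of methods. This sketch is not part of the original analysis. It assumes that group labels are already aligned across methods (group I means the same cluster in each), and the sample identifiers and assignments are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(assignments):
    """Fraction of shared samples given the same label, per pair of methods.

    assignments maps method name -> {sample id: group label}.
    """
    result = {}
    for a, b in combinations(sorted(assignments), 2):
        shared = assignments[a].keys() & assignments[b].keys()
        same = sum(assignments[a][s] == assignments[b][s] for s in shared)
        result[(a, b)] = same / len(shared)
    return result


# Hypothetical assignments for four samples.
groups = {
    "ADMIXTURE": {"s1": "I", "s2": "I", "s3": "II", "s4": "II"},
    "NJ": {"s1": "I", "s2": "II", "s3": "II", "s4": "II"},
    "PCA": {"s1": "I", "s2": "I", "s3": "I", "s4": "II"},
}
agreement = pairwise_agreement(groups)
print(agreement)
```

Applied to the per-sample assignments in Supplementary Table S3, this would give one agreement value for each pair of methods.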

Inbreeding Analysis and Subgroup Analysis
To better understand the genetic status of Yunling cattle, we calculated homozygosity and heterozygosity separately for the two subpopulations (Table 2). In both subgroup I and subgroup II, the observed heterozygosity is lower than the expected heterozygosity (Table 2). This indicates a certain level of inbreeding within each of the two subgroups, which is consistent with the analysis of the entire group. To evaluate the level of inbreeding within the Yunling cattle population, the excess of homozygosity inbreeding coefficient (F_HOM) was computed. The mean value of F_HOM in Yunling cattle (0.251) was at an intermediate level (Table 2), and the mean F_HOM values of subgroups I and II were similar to that of the whole Yunling cattle population. All the above evidence points to the occurrence of inbreeding in Yunling cattle.
To analyse the breeding situation of the Yunling cattle population, we calculated fixation indexes. Among these measures, F_IS indicates the relative degree of inbreeding of individuals within a breed, with higher values indicating a greater degree of inbreeding [31]. The F_IS of Yunling cattle is 0.309, which indicates a noteworthy level of inbreeding. F_ST quantifies the extent of genetic differentiation among populations, with higher F_ST values representing greater genetic distances [31]. We used F_ST to estimate the differentiation within the Yunling cattle population. The F_ST value is 0.0274, which indicates a certain degree of differentiation between subgroup I and subgroup II. This conclusion is in line with the results of the ADMIXTURE analysis, the phylogenetic tree analysis and PCA.
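The meaning of F_ST can be illustrated with its classical definition for a single biallelic locus, F_ST = (H_T − H_S)/H_T, where H_S is the mean expected heterozygosity within subgroups and H_T that of the pooled population. This is a deliberately simple sketch with hypothetical allele frequencies and subgroup sizes; VCFtools reports the Weir and Cockerham estimator, which additionally corrects for sample size, so values will not match exactly.

```python
def nei_fst(p1, p2, n1, n2):
    """Single-locus F_ST = (H_T - H_S) / H_T for two subpopulations.

    p1, p2: allele frequencies in each subgroup; n1, n2: subgroup sizes,
    used as weights.
    """
    w1 = n1 / (n1 + n2)
    w2 = n2 / (n1 + n2)
    p_bar = w1 * p1 + w2 * p2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # locus is monomorphic overall
    h_s = w1 * 2 * p1 * (1 - p1) + w2 * 2 * p2 * (1 - p2)  # within subgroups
    return (h_t - h_s) / h_t


# Hypothetical frequencies in two equally sized subgroups.
fst_example = nei_fst(0.4, 0.6, 50, 50)
print(round(fst_example, 4))
```

Identical allele frequencies give 0 and fixed differences give 1, so a genome-wide value such as 0.0274 corresponds to modest frequency differences between the subgroups.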

Discussion
Livestock and poultry germplasm resources are strategic resources that guarantee the provision of national livestock products and serve as the fundamental basis for the rejuvenation of the seed industry. The objective of conservation work is to maintain the genetic diversity of the population, mitigate the adverse effects of genetic drift and gene loss, secure the thriving development and adaptability of the population to its environment and stimulate the enhancement and innovation of livestock and poultry breeds [32]. Scientific evaluation of conservation effects can ascertain the existence of genetic risks in current populations and identify the interventions necessary to alleviate them. Yunling cattle is one of the excellent local breeds in China and holds a significant position in the development of China's livestock industry. However, present knowledge of its genome-wide genetic diversity and population structure is insufficient. Therefore, in this research we assembled and annotated the genome of Yunling cattle and used whole-genome resequencing data to examine 129 Yunling cattle from the core breeding facility of the Yunnan Academy of Grassland and Animal Science. The study investigated genetic diversity, population structure and inbreeding status to provide direction for Yunling cattle breeding. The findings serve as a scientific basis for the formulation of a Yunling cattle breeding and conservation plan.
In this study, we observed that Yunling cattle exhibited lower nucleotide diversity than other breeds. This is potentially due to inbreeding or selective breeding during the formation of Yunling cattle over the last 30 years, and it is consistent with previously reported findings on diversity and on global and local ancestry components [8]. Population structure analysis helps to determine the relationships between individuals in a population. In our study, we found that the Yunling cattle population was differentiated into two subgroups.
Mating among closely related individuals can have several negative consequences, such as diminished genetic diversity within the population, increased risk of disease in offspring, a heightened impact of genetic drift, and inbreeding depression. In this study, we confirm that there has been inbreeding in the Yunling cattle population, which has affected both subgroups. This is one of the reasons why the genetic diversity in Yunling cattle is low and why the advantages of the breed have degraded (for example, decreased meat quality). In this context, our results on the population structure and genetic diversity of Yunling cattle will be an important basis for developing a scientific breeding programme for the breed and for better conserving its genetic diversity.
In summary, we analysed the whole-genome sequence data of 129 Yunling cattle using the Yunling cattle genome assembly and annotation generated in this study. The outcomes of the genetic structure and diversity analyses indicate that the Yunling cattle breed should be regarded as a precious genetic resource. There is an urgent need to optimise the breeding system to maintain sufficient genetic diversity and prevent the onset of inbreeding depression.

Figure 1. Functional classification of the detected SNPs. The pie chart on the left shows the composition of SNPs, with the percentage of each of the following components labelled: intergenic, intronic, exonic, upstream, downstream, UTR3, UTR5, splicing and ncRNA. The pie chart on the right shows the distribution of exonic SNPs and labels the number of each type, including synonymous, nonsynonymous, stopgain, stoploss and unknown variants.

Figure 2. Genome-wide average LD decay estimated for Yunling cattle. The vertical axis denotes the LD coefficient r^2 and the horizontal axis denotes the physical distance between loci.

Figure 3. Cross-validation error of the ADMIXTURE analysis (K from 2 to 19). The x-axis represents the number of predetermined subpopulations, while the y-axis shows the cross-validation error obtained from the ADMIXTURE analysis.

Figure 4. Population structure analysis of the 129 Yunling cattle individuals at K = 2. Labels show the BioSample numbers of the individuals. Each vertical bar represents a sample, and the proportion of each colour indicates the probability of that sample being assigned to the corresponding subgroup. Orange for subgroup I; green for subgroup II.

Figure 5. Phylogenetic tree of the relationship among 129 Yunling cattle individuals. Labels show the BioSample numbers of the individuals. Orange for subgroup I; green for subgroup II.

Figure 6. Classification of the 129 Yunling cattle individuals by three different methods (orange = group I; green = group II). (a) ADMIXTURE grouping. (b) Neighbour-joining tree grouping. (c) PCA clustering grouping.

Table 1. Genetic diversity among the 26 cattle breeds.
pi, average nucleotide diversity; H_E, expected heterozygosity; H_O, observed heterozygosity; F_HOM, excess of homozygosity inbreeding coefficient.

Table 2. Inbreeding indicators of the Yunling cattle population and its subpopulations.