Genetic Differentiation among Livestock Breeds—Values for Fst

Simple Summary: The degree of relationship among livestock breeds can be quantified by the Fst statistic, which measures the extent of genetic differentiation between them. An Fst value of 0.1 has often been taken to indicate that two breeds are genetically distinct, but this criterion has not been evaluated critically. Here, Fst values have been collated for the six major livestock species: cattle, sheep, goats, pigs, horses, and chickens. These values are remarkably variable both within and between species, demonstrating that Fst > 0.1 is not a reliable criterion for breed distinctiveness. However, the large body of Fst data accumulated over the last 20–30 years represents an untapped database that could contribute to the development of interdisciplinary research involving livestock breeds.

Abstract: (1) Background: The Fst statistic is widely used to characterize between-breed relationships. Fst = 0.1 has frequently been taken as indicating genetic distinctiveness between breeds. This study investigates whether this is justified. (2) Methods: A database was created of 35,080 breed pairs and their corresponding Fst values, derived from microsatellite and SNP studies covering cattle, sheep, goats, pigs, horses, and chickens. Overall, 6560 (19%) of breed pairs were between breeds located in the same country, 7395 (21%) between breeds of different countries within the same region, 20,563 (59%) between breeds located far apart, and 562 (1%) between a breed and the supposed wild ancestor of the species. (3) Results: General values for between-breed Fst were as follows: cattle, microsatellite 0.06–0.12, SNP 0.08–0.15; sheep, microsatellite 0.06–0.10, SNP 0.06–0.17; horses, microsatellite 0.04–0.11, SNP 0.08–0.12; goats, microsatellite 0.04–0.14, SNP 0.08–0.16; pigs, microsatellite 0.06–0.27, SNP 0.15–0.22; chickens, microsatellite 0.05–0.28, SNP 0.08–0.26.
(4) Conclusions: (1) Large amounts of Fst data are available for a substantial proportion of the world’s livestock breeds, (2) a between-breed Fst value of 0.1 is not an appropriate criterion, owing to the considerable variability of Fst, and (3) accumulated Fst data may have value for interdisciplinary research.


Introduction
Much research effort over the last 30 years has been devoted to the characterization of livestock breeds by molecular genetics, primarily using microsatellite (MS) and single-nucleotide polymorphism (SNP) technologies. This research has usually aimed to support the conservation and sustainable utilization of livestock biodiversity, and also to elucidate the processes of domestication and the evolution and differentiation of breeds.
One output of this work has been quantification of the extent to which breeds have diverged from each other, and the Fst statistic has been a very widely used measure of this genetic differentiation. As originally described [1], Fst values from 0.05 to 0.15 indicate moderate differentiation between populations, values from 0.15 to 0.25 indicate high differentiation, and values greater than 0.25 indicate very high differentiation. In principle, Fst could therefore inform discussion of particular breeds: for example, whether they are sufficiently different from each other to justify support for their conservation or, conversely, sufficiently similar to be merged. In practice, Fst measurements are seldom used as the main genetic justification for policy decisions on breed conservation, but the large number of available Fst measurements represents a data resource that could yield insights into overall patterns of breed differentiation. Indeed, genetic differentiation of breeds has often been placed in a spatial context by investigating the extent to which it parallels geographic distance [2,3], and further work has shown correlations with human [4,5] and ecological [6] diversity.
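The qualitative scale attributed to [1] can be expressed as a simple lookup. This is a minimal sketch: the function name is hypothetical, the label for values below 0.05 is an assumption (the scale as quoted only names the bands from 0.05 upward), and assigning the boundary values 0.15 and 0.25 to "high" is an arbitrary choice.

```python
def describe_fst(fst: float) -> str:
    """Return the qualitative band for an Fst value, following the scale quoted from [1].

    Bands: 0.05-0.15 moderate, 0.15-0.25 high, >0.25 very high. The band below
    0.05 is not named in the text, so it is labelled generically (assumption);
    placing exactly 0.15 and 0.25 in "high" is also an assumption.
    """
    if fst < 0.05:
        return "below moderate"
    if fst < 0.15:
        return "moderate"
    if fst <= 0.25:
        return "high"
    return "very high"


# 0.019 is the "close relationship" value quoted in the Introduction; 0.1 is the informal threshold.
for value in (0.019, 0.1, 0.2, 0.3):
    print(f"Fst = {value}: {describe_fst(value)}")
```

Under this scale the informal 0.1 threshold sits in the middle of the "moderate" band rather than at a band edge.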
For many years, the literature has included such statements as " . . . the level often found between related breeds (e.g., Fst > 0.1) . . . " [7]; "a close relationship between [two breeds] (Fst = 0.019) . . . " [8]; "a threshold value . . . " [9]; " . . . strongly indicated that the two . . . are sufficiently different to be considered separate breeds" [10]; " . . . the overall differentiation assessed in the entire dataset was higher than most other studies carried out on European cattle . . . " [11]; " . . . pairwise comparison . . . showed Fst < 0.1 and suggested clearly differentiated populations . . . " [12]. The present study aimed to provide an extensive review of the literature and is therefore a test of the informal hypothesis embodied in these statements, namely that differentiation of breeds can be signalled by Fst > 0.1.

Materials and Methods
Published data on Fst calculations were obtained from MS or SNP studies on cattle, sheep, goats, horses, pigs, and chickens. A keyword search was not made because Fst is seldom used as a keyword or included in the title of a paper. The search proceeded initially by studying the reference lists and citations of key papers such as [13][14][15][16][17] and, for cattle, an extensive bibliography assembled for a compilation on world cattle breeds [18]. Data presented solely as heatmaps or as Nei's genetic distance were not used. Attempts were made to obtain unpublished information directly from authors. Only data that clearly used Wright's Fst [19] were included, and Reynolds genetic distances (D_R) were transformed to Fst [20]. Fst calculations among herds or flocks were not used, except when they related to differentiation between these entities and other, distinct breeds. Breed names and country affiliations followed [18] where available; otherwise, the usage considered most widespread and valid was adopted, or a breed name was assigned for the purposes of the study. Technical details such as sample sizes, numbers of alleles, and details of SNP technology were not considered. The references cited are listed in Table 1. A preliminary analysis, made when 30,000 breed pairs had been obtained, showed interpretable patterns in the distribution of Fst for each of the twelve combinations of species and methodology (MS and SNP). Attention was then focused on recent publications, and a further 5080 breed pairs were added, giving a final total of 35,080 breed pairs from 166 papers. No claim is made that this is a complete literature survey.
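Reference [20] is cited for the conversion from Reynolds distance, but the formula is not given here. Assuming the standard definition of Reynolds' coancestry distance, D_R = -ln(1 - θ), with θ read as Fst, the transformation is Fst = 1 - exp(-D_R). The sketch below implements that assumed relation in both directions; the function names are hypothetical.

```python
import math


def reynolds_to_fst(d_r: float) -> float:
    """Convert a Reynolds coancestry distance to Fst.

    Assumes D_R = -ln(1 - Fst), hence Fst = 1 - exp(-D_R).
    """
    if d_r < 0:
        raise ValueError("D_R must be non-negative")
    return 1.0 - math.exp(-d_r)


def fst_to_reynolds(fst: float) -> float:
    """Inverse of the assumed relation: D_R = -ln(1 - Fst), defined for 0 <= Fst < 1."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("Fst must lie in [0, 1)")
    return -math.log(1.0 - fst)


d = fst_to_reynolds(0.1)  # about 0.105
print(round(d, 4), round(reynolds_to_fst(d), 4))
```

For small distances the two quantities are nearly equal, because 1 - exp(-D) ≈ D; the conversion matters mainly for more divergent breed pairs.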
Fst calculations were classified according to whether the two breeds involved were affiliated with the same country or with different countries. Pairs from different countries were coded according to the spatial relationship of the two countries, as defined by their borders. Pairs that included a wild ancestor (as defined in the respective studies) were also considered (Table 2).
Table 2. Classification of geographical relationships of breed pairs.

Spatial Relationship of Breeds | Code | Geographical Class
In the same country | 1-Same | Same country
In countries sharing a land border | 2-Land-adj | Regional
In countries sharing a water border (sea or lake) | 3-Marine-adj | Regional
In countries separated by a third country with land borders | 4-Nbut1 | Regional
In countries separated by a third country with a water border | 5-Nbut1marine | Regional
In more widely separated countries | 6-Remote | Remote
One member of breed pair a wild ancestor (1) | 7-Wild_ancestor | Wild ancestor
(1) Mouflon, bezoar, wild boar, red jungle fowl, Przewalski horse.
For some analyses, to achieve an overview of coverage of the different spatial relationships, Fst values involving a wild ancestor were excluded, and those for breed pairs classified as 2-Land-adj, 3-Marine-adj, 4-Nbut1, and 5-Nbut1marine were merged into a combined geographical class designated Regional.
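The merging of spatial-relationship codes into geographical classes can be sketched as a lookup table. This is a minimal illustration, assuming pair records carry the Table 2 code as a string; the names GEOGRAPHICAL_CLASS and geographical_class and the example code list are hypothetical.

```python
from collections import Counter
from typing import Optional

# Table 2 codes mapped to the condensed geographical classes; codes 2-5 merge into "Regional".
GEOGRAPHICAL_CLASS = {
    "1-Same": "Same country",
    "2-Land-adj": "Regional",
    "3-Marine-adj": "Regional",
    "4-Nbut1": "Regional",
    "5-Nbut1marine": "Regional",
    "6-Remote": "Remote",
}
WILD_ANCESTOR = "7-Wild_ancestor"


def geographical_class(code: str) -> Optional[str]:
    """Condensed class for a spatial-relationship code; None for wild-ancestor
    pairs, which are excluded from these summaries."""
    if code == WILD_ANCESTOR:
        return None
    if code not in GEOGRAPHICAL_CLASS:
        raise ValueError(f"unknown spatial-relationship code: {code!r}")
    return GEOGRAPHICAL_CLASS[code]


# Hypothetical list of pair codes, for illustration only.
codes = ["1-Same", "2-Land-adj", "6-Remote", "7-Wild_ancestor", "5-Nbut1marine"]
summary = Counter(c for c in map(geographical_class, codes) if c is not None)
print(summary)
```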
Some breeds that occur internationally, often with national prefixes, were designated here as global breeds (Table 3) regardless of their country affiliation. No sheep breeds were so designated: although Merino sheep, for example, are very widely distributed, these populations represent well-established distinct breeds [178], and there is no equivalent of the global trade in germplasm seen in, for example, Holstein cattle, Large White (Yorkshire) pigs, and Angora goats. All breed pairs that included a global breed were classified as 6-Remote.
Table 3 notes (partially recovered): . . . [56]. (2) All commercial breeds, varieties, and strains.
Owing to the large number of breeds considered, direct assessment of which breeds had been characterized by which methodology was not practicable, but preliminary analysis suggested that global breeds were more frequently included in studies using SNP approaches than in those using MS. This was investigated by comparing, for each methodology × species combination, the frequencies of breed pairs that included a global breed.
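The comparison described above amounts to computing, within each methodology × species cell, the share of breed pairs that contain at least one global breed. A minimal sketch, assuming a flat list of pair records; the BreedPair fields and the placeholder breed names are hypothetical, and no significance test is included because the text does not specify one.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class BreedPair(NamedTuple):
    species: str   # e.g. "cattle"
    method: str    # "MS" or "SNP"
    breed_a: str
    breed_b: str


def global_pair_share(pairs: Iterable[BreedPair],
                      global_breeds: Set[str]) -> Dict[Tuple[str, str], float]:
    """Share of breed pairs containing at least one global breed, per (species, method)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    with_global: Dict[Tuple[str, str], int] = defaultdict(int)
    for pair in pairs:
        key = (pair.species, pair.method)
        totals[key] += 1
        if pair.breed_a in global_breeds or pair.breed_b in global_breeds:
            with_global[key] += 1
    return {key: with_global[key] / n for key, n in totals.items()}


# Toy records: "Breed X" and "Breed Y" are placeholders, not breeds from the dataset.
pairs = [
    BreedPair("cattle", "MS", "Holstein", "Breed X"),
    BreedPair("cattle", "MS", "Breed X", "Breed Y"),
    BreedPair("cattle", "SNP", "Holstein", "Breed Y"),
    BreedPair("cattle", "SNP", "Breed X", "Holstein"),
]
share = global_pair_share(pairs, {"Holstein"})
print(share)
```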

Results
The literature search concluded in August 2021, with 35,080 Fst calculations having been assembled (Table 4). Numbers of breed pairs per study ranged very widely, from 1 to 10,296. The breakdown of the dataset according to spatial relationship is given in Table 5, and according to specific breeds in Supplementary Information File S1. The complete dataset is in Supplementary Information File S2. To characterize the range of Fst values within each species × methodology group, the largest and smallest of the medians calculated for each spatial relationship were identified; medians relating to wild ancestors were excluded for this purpose. Species varied in the degree to which different spatial relationships were covered in the literature. Reflecting the relatively small numbers of breed pairs in the four regional classes (Table 2), these were condensed in Table 6 into the geographical classes 1-Same, 6-Remote, and Regional. Of the Fst calculations, 562 included a wild ancestor; of the remaining 34,518, 59% (20,563) were of breed pairs classified as 6-Remote, 21% (7395) were Regional, and 19% (6560) were 1-Same. For most species, the proportion of breed pairs classified as 6-Remote was higher in studies using SNP methodologies than in MS studies. Fst values for breed pairs also varied according to the spatial relationships of the breeds. In all twelve (species × methodology) cases, from cattle MS through to chicken SNP, differences in Fst between spatial relationships were highly significant (p < 0.001; Kruskal-Wallis statistic, d.f. in brackets, respectively: 201.59 (5) . . . ). The rank orders of the median Fst values for the spatial relationship categories (excluding 7-Wild_ancestor) were significantly correlated (Kendall concordance test; for MS, W = 0.52, for SNP, W = 0.66, both p < 0.01).
These differences are illustrated in Figures 1 and 2, for MS and SNP data, respectively.
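Two computations in the Results can be sketched in plain Python: the benchmark range for one species × methodology group (smallest and largest per-relationship median, ignoring 7-Wild_ancestor), and Kendall's coefficient of concordance W. For W, the sketch assumes that each species acts as a "judge" ranking the spatial-relationship categories by median Fst; the text does not state the layout explicitly, and this version omits the correction for ties. Function names and toy numbers are illustrative only.

```python
from collections import defaultdict
from statistics import median

WILD = "7-Wild_ancestor"


def median_range(records):
    """records: (code, fst) tuples for ONE species x methodology group.
    Returns (smallest, largest) of the per-code medians, ignoring wild-ancestor pairs."""
    by_code = defaultdict(list)
    for code, fst in records:
        if code != WILD:
            by_code[code].append(fst)
    if not by_code:
        raise ValueError("no records outside 7-Wild_ancestor")
    medians = [median(values) for values in by_code.values()]
    return min(medians), max(medians)


def rank(values):
    """1-based ranks, smallest value first; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i..j, 1-based
        i = j + 1
    return ranks


def kendalls_w(table):
    """table: m rows (judges) x n columns (objects). Kendall's W without tie correction."""
    m, n = len(table), len(table[0])
    rank_rows = [rank(row) for row in table]
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Toy numbers only, not values from the study.
toy = [("1-Same", 0.05), ("1-Same", 0.07), ("2-Land-adj", 0.08),
       ("6-Remote", 0.12), ("6-Remote", 0.14), (WILD, 0.30)]
low, high = median_range(toy)
print(f"benchmark range: {low:.2f}-{high:.2f}")
```

For significance, W is commonly referred to a chi-square distribution via m(n - 1)W with n - 1 degrees of freedom. For the Kruskal-Wallis comparisons, SciPy's `scipy.stats.kruskal` accepts the per-relationship Fst lists as separate sample arguments.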

Discussion
It is reported [182] that there are 5517 livestock breeds in the world (1047 cattle, 1164 sheep, 580 goat, 720 horse, 569 pig, and 1437 chicken). About one-fifth of the world's breeds are therefore represented in the dataset assembled in this report: at least 1040 different breeds have been studied by MS and 797 by SNP (both methodologies have been applied to some breeds, almost always in separate studies). Principal reasons for this work have included the characterization and conservation of this livestock biodiversity. Much of it has focused on establishing the extent of differentiation of national breeds from those of remote countries (many of which are global breeds), with an emphasis on breed pairs of which one member was a national breed and the other was from a remote country (59% of breed pairs). Only 19% of breed pairs comprised breeds that were both of the same country. Thus, an unexpected result of this study is the suggestion that, so far as conservation is concerned, genetic studies have been more interested in the introgression of breeds from abroad than in maintaining the genetic distinctiveness of the diverse breeds of a country. This tendency is evident in both MS and SNP studies, particularly the latter. However, as 21% of breed pairs related to breeds of different but nearby countries, there has been some interest in regional patterns of breed differentiation.
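The "about one-fifth" estimate follows from the figures quoted above. Because the overlap between the MS and SNP sets is not known, this sketch uses only the MS count as a safe lower bound on the number of distinct breeds covered; the variable names are illustrative.

```python
# Breed counts per species as reported in [182].
world_breeds = {"cattle": 1047, "sheep": 1164, "goat": 580,
                "horse": 720, "pig": 569, "chicken": 1437}
total = sum(world_breeds.values())

# At least this many distinct breeds were studied by MS alone.
min_breeds_studied = 1040
coverage_lower_bound = min_breeds_studied / total

print(total, round(coverage_lower_bound, 3))
```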
The original aim of this study was, however, to test the general prediction that a realistic threshold value for between-breed Fst is 0.1. The ranges of Fst values between pairs of breeds are shown to be so wide that this prediction appears obsolete for practical purposes. It is now very evident that the Fst approach is only one method of visualizing the findings of genomic studies of breeds [183], and reports are now typically accompanied by genetic distance calculations, STRUCTURE plots, plots generated by multivariate statistics, and heatmaps, often within a framework of landscape genomics [12]. Nevertheless, there may still be a requirement for benchmark values of Fst as indicators of breed differentiation, for example in interdisciplinary studies or to provide a context for the conservation genetics of wild populations. For these purposes, the following benchmarks, based on the median values obtained in the present study, could be adopted: cattle, MS 0.06–0.12, SNP 0.08–0.15; sheep, MS 0.06–0.10, SNP 0.06–0.17; horses, MS 0.04–0.11, SNP 0.08–0.12; goats, MS 0.04–0.14, SNP 0.08–0.16; pigs, MS 0.06–0.27, SNP 0.15–0.22; chickens, MS 0.05–0.28, SNP 0.08–0.26.
The finding that different spatial relationships of breed pairs may influence Fst values is novel but not surprising. Fst statistics are well known [10,17,184] to yield insights into patterns of migration and gene flow when placed in a geographical framework. In principle, a formal statistical analysis of the dataset assembled for this paper might quantify the relative contributions of the different variables (species, methodology, and spatial relationship). However, with genotype data now publicly available, extensive meta-analysis of published Fst values from earlier studies may itself be an obsolete approach, as raw genotypes from multiple sources could be combined and Fst values reliably calculated from the pooled data.
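To illustrate the last point, Fst can be computed directly from allele frequencies once genotypes are pooled. The sketch below is a simplified, Nei-style estimator (Fst = (H_T - H_S)/H_T, combined across biallelic loci as a ratio of sums) with no correction for sample size; it is not one of the sample-size-corrected estimators (such as that of Weir and Cockerham) used in many published studies, and the function name is hypothetical.

```python
from typing import Sequence


def fst_from_frequencies(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Multi-locus Fst for two breeds at biallelic loci (simplified, uncorrected).

    p1[i], p2[i]: frequency of the same reference allele at locus i in each breed.
    Per locus, H_S is the mean within-breed expected heterozygosity and H_T the
    expected heterozygosity of the pooled population (equal breed weights).
    Loci are combined as sum(H_T - H_S) / sum(H_T).
    """
    if len(p1) != len(p2):
        raise ValueError("both breeds need the same loci")
    num = den = 0.0
    for a, b in zip(p1, p2):
        h_s = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2
        p_bar = (a + b) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        num += h_t - h_s
        den += h_t
    if den == 0:
        raise ValueError("all loci are monomorphic in the pooled sample")
    return num / den


# Hypothetical allele frequencies at two loci in two breeds.
print(round(fst_from_frequencies([0.2, 0.5], [0.8, 0.5]), 3))
```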
The considerable amount of Fst data accumulated over the last few decades is still likely to represent a valuable resource. It could be used to audit breed conservation activities, although it will not be a definitive determinant of whether a breed is truly distinctive [185,186]. At the level of original research, these data may help in formulating hypotheses for future work on breed differentiation and, as awareness of their existence and accessibility increases, they could stimulate new interdisciplinary research.

Conclusions
The use of specific values of Fst as indicating breed differentiation is not justified, but benchmark values can be proposed for use in specified contexts. Fst data, as obtained from published studies, represent a resource for interdisciplinary research.
Data Availability Statement: All data are available in Supplementary Files S1 and S2.