Evolutionary Subdivision of Domestic Chickens: Implications for Local Breeds as Assessed by Phenotype and Genotype in Comparison to Commercial and Fancy Breeds

: To adjust breeding programs for local, commercial, and fancy breeds, and to implement molecular (marker-assisted) breeding, a proper comprehension of phenotypic and genotypic variation is a sine qua non for breeding progress in animal production. Here, we investigated an evolutionary subdivision of domestic chickens based on their phenotypic and genotypic variability using a wide sample of 49 different breeds/populations. These represent a signiﬁcant proportion of the global chicken gene pool and all major purposes of breed use (according to their traditional classiﬁcation model), with many of them being characterized by a synthetic genetic structure and notable admixture. We assessed their phenotypic variability in terms of body weight, body measurements, and egg production. From this, we proposed a phenotypic clustering model (PCM) including six evolutionary lineages of breed formation: egg-type, meat-type, dual purpose (egg-meat and meat-egg), game, fancy, and Bantam. Estimation of genotypic variability was carried out using the analysis of ﬁve SNPs, i.e., at the level of genomic variation at the NCAPG-LCORL locus. Based on these data, two generally similar genotypic clustering models (GCM1 and GCM2) were inferred that also had several overlaps with PCM. Further research for SNPs associated with economically important traits can be instrumental in marker-assisted breeding programs.


Introduction
The poultry industry is currently the most highly efficient livestock production sector, and the domestic chicken, Gallus gallus (Linnaeus, 1758), is the most common farm animal and a notable model organism (e.g., [1,2]). During long-term domestication of poultry, various breeds have been formed that differ significantly, both at the phenotypic and genetic levels, in terms of body weight, plumage color, and many other traits [3][4][5][6]. According to Bennett et al. [7], breeding efforts in the period from the late Middle Ages to the present day resulted in a doubling of the body size and a change in the skeleton morphology in chickens. Significant changes in overall appearance, arising from the synthesis of breeds through crossbreeding and selection, reflect the acquired biological characteristics of these birds and can be closely related to the purpose of use (i.e., utility types), and admixture of breeds formed from the moment of domestication and during further breeding. Therefore, information on the phenotypic diversity of poultry and its genetic background, including it would be interesting to identify the respective variants at the NCAPG-LCORL locus, which may be selection markers associated with egg production, growth, and development in chickens. A better understanding of genomic variation at this locus can help to adjust breeding programs for commercial, local, and fancy breeds and include marker-assisted selection (MAS) using NCAPG-LCORL SNPs. In this regard, the main goal of this study was to study the phenotypic and genomic variability at the NCAPG-LCORL region on GGA4 to characterize genetic differences between 49 purebred and crossbred chicken populations of various purposes of use (i.e., utility types) that belong to different evolutionary branches and TCM categories of domestic chicken. Our extensive survey of various chicken breeds and populations enabled to reveal variation at the NCAPG-LCORL locus in connection with the evolutionary lineage affiliation and purpose of use of these breeds. We also discussed use of this locus as a possible candidate for MAS for egg and meat productivity traits.

Experimental Design
All chicken experiments were conducted with an ethical approval of the RRIFAGB-Branch of the L. K. Ernst Federal Research Center for Animal Husbandry (Protocol No. 2020-4 dated 3 March 2020).
The objects of the present experiments were chickens from the RRIFAGB bioresource collection farm, officially referred to as the "Genetic Collection of Rare and Endangered Chicken Breeds" (Pushkin, St. Petersburg, Russia), based on which this investigation was carried out.
In total, the study included 954 individuals from the following 49 Table S1 for further details). The basic diet and maintenance conditions were similar for all the experimental birds and comply with zootechnic/zoohygienic standards.

Phenotypic Characteristics of the Studied Chicken Populations
To assess the phenotypic diversity, a core of the RRIFAGB bioresource collection was examined including the 39 breeds and populations of chickens that represent, according to TCM, various purposes of breed use (Tables S2 and S3). We collected a series of phenotypic traits including body weight of sexually mature birds at the age of 52 weeks and 13 main body measurements at the age of 330 days were obtained. The body measurements were produced using a compass and a tape measure and included: body length (cm), body slanting length (cm), body and neck length (cm), keel length (cm), chest girth (cm), chest depth (cm), pectoral angle ( • ), distance between shoulder joints (cm), distance between hip joints (cm), femur length (cm), tibia length (cm), shank length (cm), and shank girth (cm). Morphometric parameters for hens and cocks specific to each breed/population are presented in Tables S2 and S3, respectively. Egg production was assessed over a 52-week life period and egg weight at 35 weeks of age (Table S2).
A variety of chicken exterior parameters was further examined relative to TCM and other clustering models (evolutionary affiliation), and genotypes at the NCAPG-LCORL locus.

Genotyping of Chickens and SNP Data Processing
DNA used in this study was isolated from whole blood cells by the conventional phenol-chloroform method. SNP scanning was performed using an Illumina Chicken 60K SNP iSelect BeadChip (Illumina, San Diego, CA, USA). Ten SNPs were selected in the NCAPG-LCORL region on GGA4 using the PLINK 1.9 program [33] and the extract function. Quality control of genotyped SNP loci was proceeded using PLINK 1.9. Additionally, DNA samples with a SNP genotyping quality of more than 90% were included in the further analysis as assessed using the GenomeStudio 2.0 software (Illumina Inc., San Diego, CA, USA). A threshold was set for Hardy-Weinberg equilibrium (HWE) errors (at p < 0.0001), and only SNPs with a minor allele frequency (MAF) of more than 5% were taken into account. After all filtering steps, five SNPs were selected for subsequent computations.

Mathematical and Statistical Analyzes
To assess variability of phenotypic traits, a heatmap for the distribution of the 39 phenotyped chicken breeds and populations was built using the heatmap function in the stats package (R v. 4.2; [34]). Fuzzy analysis clustering (FAC), principal component analysis (PCA), and average linkage clustering (ALC) were implemented using R and libraries for the R environment, along with Euclidean distance metric. For the FAC method, we applied the fanny function from the cluster package [35,36]. In the case of ALC, the unweighted pair group method with arithmetic mean (UPGMA) method was employed using the Euclidean distance matrix between objects with a bootstrapping validation using 1000 bootstrap samples. The pvclust software package for R [37] was used to calculate approximately unbiased (AU) p-values and bootstrap probability (BP) values. Clusters with AU values over 95% were considered significant. The optimal number of clusters was selected for interpretation using the Elbow method [38] and the factoextra package for R [39].
To estimate genotypic variation, biometric data processing was performed using PLINK 1.9 and Microsoft Excel programs. Generation of admixture models for distinguishing clusters based on genotype data for the five selected SNPs at the NCAPG-LCORL locus, including cross-validation (CV) error plots for determining the number of ancestral populations (K), was carried out using the ADMIXTURE program [40].
Degree of genetic diversity of the analyzed populations was evaluated by the indicators of observed (H o ) and expected (H e ) heterozygosity as calculated based on data for the genotype and allele occurrence frequencies at each polymorphic locus. Assessment of genetic diversity between populations was also carried out using the Hudson F ST statistics implemented in the EIGENSOFT 6.1.4 software package [41].
The correspondence of the allele and genotype frequency distribution to HWE in each population was verified using the χ 2 test. In particular, we analyzed the H o − H e deviations for each locus in accordance with the HWE principle that also states that genotype frequencies are related to allele frequencies by simple (quadratic) relationships [42]. Estimation of the reliability of the data obtained was carried out using the Pearson χ 2 test. If the obtained value of the χ 2 statistic is greater than the critical one (i.e., 3.84, the number of degrees of freedom being 1), it was concluded that there was a shift in the genetic equilibrium in an analyzed population. For the studied populations, the inbreeding coefficient F is was also calculated [43]. Using the STATISTICA 10.0 package (Statsoft, Inc./TIBCO, Palo Alto, CA, USA), a comparison of differences in allele frequencies between population groups (clusters) according to TCM and other clustering models was conducted by employing the nonparametric Kruskal-Wallis test for one-way analysis of variance (ANOVA) by ranks because the data did not pass the normality test. Boxplots were generated using STATISTICA 10.0, as well.
Pairwise values of the F ST statistic and the average within-group and between-group divergence D were calculated using the SMARTPCA package (part of EIGENSOFT, [41]), as described elsewhere [44]. Unrooted neighbor-joining dendrograms were constructed based on pairwise F ST statistics and average between-group divergence D using the online T-REX tool with the tree inference, also known as the NJ option [45].
The Chicken QTLdb database [46] was explored to test if any of the five significant SNPs identified in the present study would overlap with the previously established QTLs at the NCAPG-LCORL locus.
Linkage disequilibrium (LD) between pairs of SNPs in the 49 chicken populations was assessed using the PLINK 1.9 software, the D coefficient proposed by Lewontin (see for review [47]), and the squared Pearson correlation coefficient r 2 [48]. LD block structure was determined according to the solid spine algorithm as implemented in the Haploview 4.2 software [48] wherein the specified parameters corresponded to the values D ≥ 0.75.

Analysis of Phenotypic Traits and Breed Clustering by Phenotype
As a result of the survey across the 39 breeds and populations of chickens, values of phenotypic traits were obtained for adult birds of both sexes (Tables S2 and S3). Variation was observed in terms of body weight, 13 main body measurements, egg number over a 52-week period of life, and egg weight. Herewith, the average body weight of females was 2.23 ± 0.80 kg, and that of males 2.49 ± 0.90 kg. The average egg production was 149.55 ± 27.67 eggs, and the average egg weight was 57.14 ± 4.60.
The clustering patterns by phenotypic traits in the breeds and populations observed were produced using more sophisticated mathematical approaches and are discussed in detail below.
Before the onset of the experiments, all breeds and populations in the RRIFAGB bioresource collection were a priori divided into groups in accordance with TCM, i.e., with five generally accepted categories (purposes of breed use, or utility types), namely ETBs, DPBs (with two subcategories, EMBs and MEBs), MTBs, GBs, and FBs (Table S1).
For all phenotypic traits (Tables S2 and S3) measured in hens and cocks, FAC and PCA clustering of breeds and populations (Figure 1a,b) revealed configurations (patterns) of breed grouping that differed from that in TCM. In these analyses, the first component corresponded to 75.5% of genetic variability, and the second one was responsible for 9.1% of genetic variability. In general, using both FAC ( Figure 1a) and PCA (Figure 1b), six clusters were identified that we attributed, with a certain degree of conventionality, to ETBs (cluster I), DPBs (II) (including subclusters EMB/IIa, and MEB/IIb), MTBs (III), GBs (IV), FBs (V), and BTBs (VI; i.e., carriers of the dwarf mutation). We designated this breed grouping pattern as the phenotypic clustering model (PCM).   Table S1. Within clusters I, IIa and IIb, there was a core of breeds that more or less adequately matched the basic characteristics of each breed category, as well as several other breeds that joined the core due to the similarity of the main phenotypic traits, regardless of their affiliation according to TCM and often in contrast to TCM. For example, the RWD breed was assigned to BTBs instead of MTBs by its phenotypic characteristics (Tables S2-S4).
A clear differentiation and a significantly isolated position on the FAC and PCA plots (Figure 1a,b) were observed for the only representative of the MTB category, i.e., the WC × (BL × SL) population (single cluster III). A separate position was also occupied, on the one hand, by clusters of two true GBs (IV), i.e., MGs and UGs, and, on the other hand, by BTBs and similar dwarf chickens (VI). Interestingly, a separate FB cluster (V) included the following three closely spaced and similar breeds with the most pronounced ornamental characters: PWB, and two varieties of the Pavlov breed (PS and PW). The suggested PCM and the respective subdivision of breeds into six phenotypic clusters were further used for comparison with other clustering patterns (models) obtained on the basis of both phenotypic traits and SNP genotypes at the NCAPG-LCORL locus.
Similarly to PCM (as in Figure 1a,b), the populations were located on the FAC and PCA plots that were produced for morphometric characters only (Figures S1a and S2a). At the same time, a different pattern can be seen in the FAC and PCA plots for egg production characteristics (Figures S1b and S2b).
A similar (as in Figure 1a,b) distribution of breeds and populations by phenotypic traits was discovered when using other mathematical methods of comparative analysis, i.e., building a heatmap based on indicators of body weight, egg weight, and egg number (Figure 2), and phylogenetic UPGMA trees using Euclidean distances for different groups of characters (all, Figure S3a; morphometric, Figure S3b; and egg characters, Figure S3c). Some incongruent clustering patterns observed in comparison with the FAC and PCA plots (Figure 1a,b) were explained in each case by peculiarities of the applied mathematical algorithms and a sample (set) of the phenotypic indicators used.

Population Genetic Parameters
As a result of SNP scanning, five significant SNPs were identified in all the 49 breeds and crossbreds in a region on GGA4 that harbors NCAPG-LCORL, as well as in the flanking regions as follows: GGaluGA265966, GGaluGA265969, rs15619223, rs14491017, and rs14491028 (Table 1). Genotype frequency distributions for the five SNP markers in the 49 chicken populations are presented in Table S4. Analysis of the actual and theoretical distribution of genotypes (data not shown) revealed a significant shift (p < 0.05) of the genetic equilibrium for the rs14491017 substitution in the Ts × SL crossbred chickens (χ 2 = 4.25), that for the rs14491028 substitution in the chicken breeds Ts (χ 2 = 5.71), NH (χ 2 = 5.00), and Pu (χ 2 = 3.95), and for GGaluGA265969 substitution in the chicken populations F (χ 2 = 4.36) and PRB (χ 2 = 5.14). and columns (traits) are ordered in accordance with hierarchical clustering as displayed through the respective dendrograms. Lighter colors correspond to lower correlation, and darker ones to higher correlation. Population codes are given according to Supplementary Table S1.

Population Genetic Parameters
As a result of SNP scanning, five significant SNPs were identified in all the 49 breeds and crossbreds in a region on GGA4 that harbors NCAPG-LCORL, as well as in the flanking regions as follows: GGaluGA265966, GGaluGA265969, rs15619223, rs14491017, and rs14491028 (Table 1). Genotype frequency distributions for the five SNP markers in the 49 chicken populations are presented in Table S4. Lighter colors correspond to lower correlation, and darker ones to higher correlation. Population codes are given according to Supplementary Table S1. For the substitutions GGaluGA265966 and rs15619223 in all analyzed populations of the bioresource collection, regardless of breed, the χ 2 values did not significantly exceed the critical value, that is, there was no significant difference between the observed and expected heterozygosity. Lower occurrence frequency of individual genotypes did not correlate with a shift in the genetic equilibrium since it was a consequence of lower frequency of individual alleles. In general, it can be noted that the populations were in the genetic equilibrium.
Inbreeding coefficient, F is , in the studied populations of the bioresource collection varied in the range -1 ≤ F is ≤ 1 (data not shown). The maximum negative value was found Agriculture 2021, 11, 914 9 of 23 in the RWP chicken population, and the maximum positive one in the CB population. With increasing positive value of the coefficient, a rise in inbreeding occurred in the studied populations. At the GGaluGA265969 locus, two breeds, Pu (F is = 0.44) and F (F is = 0.46), had higher positive values. FB chickens of the BMF, SW, PS, and PW breeds differed from all other analyzed groups in that the C allele was found in all their individuals, which could indicate a strong selection pressure on this locus. As for GGaluGA265966, the UM population had F is = 0, with H obs = H exp , that suggested a panmixia in this population at this marker locus.
In almost each category of chicken breeds, except GBs, some separate populations were monomorphic and had only one certain genotype for the rs15619223 marker. For instance, in three MTB populations (WC1, WC2, and WC × (BL × SL)), only the AA genotype was found. Interestingly, for this same SNP, associations with body weight and egg weight were previously found in RWG [50,51]. Thus, this locus can be considered as a potentially effective candidate for selection, since it could be selected in the process of targeted selection based on the desired trait of body weight. It can also be noted that, in the ETB populations of RWS and RWP, only the CC genotype was found at rs14491017 in all individuals.

Clustering of the Analyzed Chicken Breeds by Genotypes
Using the results of the genotype frequency distributions for the five SNPs in the 49 genotyped breeds and chicken populations (Table S4), we performed the appropriate clustering procedures that showed population grouping patterns that was somewhat different from those for TCM and PCM. On the respective FAC and PCA plots ( Figure S4a,b), a quite clearly distinguishable, isolated localization of ETB and MTB clouds was observed, with a cloud of DPB populations between them. In this case, the first component contributed to~30% of the total genetic variability, i.e., significantly less than the analogous first component when using the phenotypic traits, and the second component accounted for 17% of genetic variability. Hereby, the observed clouds (clusters) were more diffuse along the contours. Besides the true representatives of each cluster (ETBs, MTBs, or DPBs), breeds and populations classified in other TCM and PCM categories were joining them. The ETB cluster was subdivided into two distinct subclusters (Ia and Ib), and the DPB cluster could also be subdivided into two subclusters (IIa and IIb). In addition, each cluster (subcluster) had outliers, e.g., LLB and HSSD in the ETB subcluster Ia, AoB in the DPB subcluster IIb, and CB and Pu in the MTB cluster, which also did not always correspond to the definition of either cluster. We designated the resulting distribution pattern of populations relative to each other ( Figure S4a,b) as the genotypic clustering model 1 (GCM1).
Construction of the UPGMA tree using Euclidean metrics led to a slightly different clustering of the analyzed groups of chickens ( Figure S4c). We conditionally designated this pattern as genotypic clustering model 2 (GCM2), which consists, as it were, of four population groups (clusters). In the first GCM2 group, we conditionally combined two ETB and two BTB populations from the GCM1 subcluster Ia. The second GCM2 cluster involved one DPB/MEB and one BTB population from the GCM1 cluster III; and the third GCM2 cluster involved three FB populations (those that formed a separate FB cluster in PCM), one ETB, and one DPB/EMB population from the GCM1 subcluster Ia. The largest was the fourth GCM2 cluster that consisted of the following two large subclusters: IVa (mainly MTBs with adjoining purebred populations and crossbreds from the GCM1 cluster III) and IVb (breeds and crossbreds from the GCM1 subclusters Ib, IIa, and IIb).

Assessment of Breed Admixture by Genotypes
In general, there was a significant admixture of the 49 chicken genotyped breeds and populations, which made it somewhat difficult to identify clearly pronounced admixture models. Comparative analysis of admixture was carried out between groups of breeds subdivided both in relation to their purpose of use (TCM) and according to other clustering models (PCM, GCM1, and GCM2). The plots for determining the number of K groups in the admixture models that could best fit the obtained data showed that the optimal number of ancestral populations was close to the values of K = 6 ( Figure S5).
Analysis of admixture by genotype frequencies revealed some genetic characteristics of individual populations and their groups at the NCAPG-LCORL locus using TCM, PCM, GCM1, and GCM2 ( Figure S6). Particularly, clustering analysis of admixture models revealed to a certain extent the subdivision of individuals according to their origin. At the same time, ETB and MTB chickens, as a rule, tended to form two separate patterns (clusters), which is consistent with the results of genotype frequencies and variants of the compared clustering (classification) models.

Genetic Differentiation of Populations by Alleles
At the first stage of determining the genetic differentiation of the 49 breeds and populations by allele frequencies in the five SNPs (as presented in Supporting Information (SI) S1), they were combined into classes and subclasses according to TCM. When comparing allele frequencies for the GGaluGA265969 substitution using the Kruskal-Wallis test (SI S2), we found out that ETB chickens significantly differed from MTBs and FBs, MTBs from DPBs, DPBs from FBs (at p < 0.05; SI S2a and SI S3a), and the MEB subgroup from FBs (at p < 0.05; SI S2f and SI S3a). For the GGaluGA265966 substitution, differences were found for the ETB group in comparison with the MTB and DPB groups (p < 0.05; SI S2b and SI S3b), as well as with MEB (p < 0.05; SI S2g and SI S3b). For the SNPs rs15619223 and rs14491028, ETB chickens differed from all other groups and subgroups at p < 0.05 (SI S2c,h and SI S3c; and SI S2e,j and SI S3e, respectively). MTB chickens differed in frequency from the DPB group in the SNPs rs15619223 and rs14491017 (p < 0.05; SI S2c and SI S3c; and SI S2d and SI S3d, respectively). In addition, the MTB group had significant differences with the MEB subgroup in allele frequencies at the rs15619223 locus (p < 0.05; SI S2h and SI S3c), as well as with the EMB and MEB subgroups at the rs14491017 locus (p < 0.05; SI S2i and SI S3d). FB chickens significantly differed from the ETB, DPB, and MEB subgroups at GGaluGA265969 (p < 0.05; SI S2a and SI S3a), and also from ETB, MTB, and DPB at rs15619223 (p < 0.05; SI S2c,h and SI S3c).
Next, we compared clusters and subclusters if combined according to PCM for the significance of differences in allele frequencies in the five SNPs (SI S3f-j). All PCM clusters/subclusters differed in pairs at all five marker loci, except for FBs which had significant pairwise differences at four loci. At the same time, FBs had the greatest number of significant pairwise differences, i.e., 17. For other clusters/subclusters, the following number of significant pairwise differences was found: ETBs and MTBs, 12 each; MEBs, 11; EMBs, 8; and GBs, 6.
Genetic differentiation between the analyzed groups of chickens by allele frequencies was also compared based on paired F ST . The differences for the calculated pairwise F ST values were significant at p < 0.05. In the case of TCM (data not shown), the maximum F ST distances relative to other groups were obtained for ETBs when paired with MTBs (0.   1 based on allele frequencies in the five SNPs at the NCAPG-LCORL locus among the studied chicken breeds/populations grouped by differences in phenotypic traits (according to PCM that involves six clusters as in Figure 1a,b). An unrooted neighbor-joining dendrogram built on the basis of pairwise F ST values for six PCM groups ( Figure 3) seemed to adequately reflect both TCM and PCM, and the contemporary concept of the main evolutionary branches of domestic chickens [3]. Herewith, FBs and BTBs were closer to ETBs (on the right side of the tree), GBs to MTBs (on the left side of the tree), and DPBs occupied an intermediate position between these two major parts of the dendrogram. Similar relationships between the six PCM groups were also foun between-group divergence values D based on allele frequencies (Ta metrics, a phylogenetic neighbor-joining tree was built (data n completely coincides in its topology with the FST tree ( Figure 3). Addit tree was generated based on the analysis of genotype frequencies (F its topology had a slightly different spatial configuration, mutual pos branches relative to each other generally coincided with the topology  (Table 2) inferred from allele frequencies in the five breed/population groups as clustered according to PCM (Figure 1a,b).
In addition, we note that, judging from values of the ave divergence D, a clear interrelation was also observed within each of ( In general, the results of examining the populations and the frequencies in the five SNPs showed that the compared groups genetically from each other, which has also been confirmed by the frequencies.

Overlapping of SNPs with QTLs
Using the Chicken QTLdb database [46], we found that the NC (75,897,761…75,920,718; GRCg6a build [49]) harbors 15 known S  (Table 2) inferred from allele frequencies in the five SNPs for six chicken breed/population groups as clustered according to PCM (Figure 1a,b).
Similar relationships between the six PCM groups were also found for paired average between-group divergence values D based on allele frequencies ( Table 2). Using the D metrics, a phylogenetic neighbor-joining tree was built (data not shown), which completely coincides in its topology with the F ST tree (Figure 3). Additionally, an UPGMA tree was generated based on the analysis of genotype frequencies ( Figure S7). Although its topology had a slightly different spatial configuration, mutual positions of individual branches relative to each other generally coincided with the topology of the F ST tree.
In addition, we note that, judging from values of the average within-group divergence D, a clear interrelation was also observed within each of the six PCM groups ( Table 2). BTBs and ETBs had the highest D values (1.20 and 1.27, respectively); MTBs, GBs, and FBs had the lowest D values (0.79-0.84); and DPBs showed a more intermediate value (1.11).
In general, the results of examining the populations and their groups by allele frequencies in the five SNPs showed that the compared groups of chickens differ genetically from each other, which has also been confirmed by the data on genotype frequencies.
Out of the five significant SNPs discovered in the present study (Table 1), two, rs15619223 and rs14491017, were directly listed in the Chicken QTLdb database. For one more, rs14491028, there was a very close interval flanked by two known SNPs. Anyway, all the five SNPs were located near and within the NCAPG-LCORL region and overlapped with known QTLs for egg weight, egg production and growth traits (SI S4).  Figure 4). Further, a comparative assessment of these populations was carried out. As a result, a population specificity of the LD structure at the NCAPG-LCORL locus was found in these 11 chicken populations. In Figure 4, SNPs GGaluGA265966 in PW, BMF, and PS, and rs14491017 in RWS were removed from the LD analysis due to their complete heterozygosity. Two populations of the RW breed (RWP and RWS), FS, and CB showed a strong linkage between all the five SNPs, making up a common 102-kb block.  There were an analogous block structure and almost complete LD between the substitutions GGaluGA265969 and rs15619223 observed in the FS and CB breeds (r 2 = 1 and r 2 = 0.89, respectively), as well as a similar LD between the substitutions GGaluGA265969 and rs15619223 in BB and FS (r 2 = 1 and r 2 = 0.73, respectively).
In the breeds ZS, PC, and NH, a 49-kb block of similar structure can be distinguished, which included the following three polymorphisms: GGaluGA265969, GGaluGA265966, and rs15619223. In the same three breeds, linkage was revealed between the SNPs GGaluGA265966 and rs14491017: at r 2 = 1 for NH and PC, and at r 2 = 0.84 for ZS.
The PC and BMF breeds had the same 63-kb block structure between the substitutions GGaluGA265969, rs15619223, rs14491017, and rs14491028. Moreover, high r 2 values (0.90 and 0.80) were found in these two breeds between the rs15619223 and rs14491028 nucleotide substitutions. Figure 5 illustrates the distribution of haplotypes in the 11 studied populations. In the populations studied, a different degree of haplotype diversity was found, while only three breeds contained the same haplotypes. Of the hypothetically possible 45 haplotypes (when a single diploid individual has nine possible haplotypes) and based on the five SNPs at the NCAPG-LCORL locus, we identified 36 haplotypes. The same haplotypes The color scheme represents a linkage rate between the SNPs: white, no linkage (r 2 = 0); light gray, slight linkage (0 < r 2 ≤ 0.5); dark gray, strong linkage (0.5 < r 2 <1); black, full linkage (r 2 = 1).
There were an analogous block structure and almost complete LD between the substitutions GGaluGA265969 and rs15619223 observed in the FS and CB breeds (r 2 = 1 and r 2 = 0.89, respectively), as well as a similar LD between the substitutions GGaluGA265969 and rs15619223 in BB and FS (r 2 = 1 and r 2 = 0.73, respectively).
In the breeds ZS, PC, and NH, a 49-kb block of similar structure can be distinguished, which included the following three polymorphisms: GGaluGA265969, GGaluGA265966, and rs15619223. In the same three breeds, linkage was revealed between the SNPs GGaluGA265966 and rs14491017: at r 2 = 1 for NH and PC, and at r 2 = 0.84 for ZS.
The PC and BMF breeds had the same 63-kb block structure between the substitutions GGaluGA265969, rs15619223, rs14491017, and rs14491028. Moreover, high r 2 values (0.90 and 0.80) were found in these two breeds between the rs15619223 and rs14491028 nucleotide substitutions. Figure 5 illustrates the distribution of haplotypes in the 11 studied populations. In the populations studied, a different degree of haplotype diversity was found, while only three breeds contained the same haplotypes. Of the hypothetically possible 45 haplotypes (when a single diploid individual has nine possible haplotypes) and based on the five SNPs at the NCAPG-LCORL locus, we identified 36 haplotypes. The same haplotypes (CAA, CGA, TGA, and TAA) were found in the breeds NH, PC, and ZS, with haplotype CGC being found in NH and PC. It should also be noted that there were the formation of a common SNP haploblock and the presence of complete LD between nucleotide substitutions in the above breeds.
(CAA, CGA, TGA, and TAA) were found in the breeds NH, PC, and ZS, with haplotype CGC being found in NH and PC. It should also be noted that there were the formation of a common SNP haploblock and the presence of complete LD between nucleotide substitutions in the above breeds.

Analysis of Phenotypic Traits and a Model for Clustering Breeds by Phenotypes (PCM)
To preserve and use the available poultry genetic resources effectively, priority should be given to phenotypic characterization [52,53]. Phenotypic traits, primarily body weight and linear body measurements, are not only of economic importance, but also significant, alongside genetic parameters, in the classification of domestic animals, their assessment, and the search for ways to improve their productivity (e.g., [3,[54][55][56]). Phenotypic and morphometric evaluation is often used in chicken breeding because it is simple, quick, and cost-effective (e.g., [3,53,[57][58][59]). Body measurements are not uncommon in assessing the diversity of indigenous chicken breeds (e.g., [53,56,59,60]).
Based on the phenotypic diversity analysis of a wide range of global chicken gene pool breeds, we constructed PCM and compared it with TCM. For this purpose, we took phenotypic characteristics reflecting the morphometric parameters of the breeds including body weight and 13 body measurements of hens and cocks, as well as two main traits of egg productivity: egg number for 52 weeks of life and egg weight (Figure 1). We have shown that morphometric characters provided the greatest contribution to the formation of the breed clustering pattern in accordance with the PCM. This was clearly seen on the FAC and PCA plots (Figures S1a and S2a), in which the morphometric characters were used alone (i.e., without egg production traits), while fully resembling the PCM pattern. If clustering was achieved only based on egg production traits, the PCM pattern was disrupted (Figures S1b and S2b). At the same time, if we compare the characteristics of egg number and body weight, we can see that the contribution of the former to the clustering of breeds significantly exceeds the contribution of the latter (Figures 2 and S1a,b). Thus, we can assume that among the morphometric and all phenotypic traits in general, the main contribution to PCM seems to be made by the traits of body measurements in females and males. Moiseyeva et al. [3] stated that body

Analysis of Phenotypic Traits and a Model for Clustering Breeds by Phenotypes (PCM)
To preserve and use the available poultry genetic resources effectively, priority should be given to phenotypic characterization [52,53]. Phenotypic traits, primarily body weight and linear body measurements, are not only of economic importance, but also significant, alongside genetic parameters, in the classification of domestic animals, their assessment, and the search for ways to improve their productivity (e.g., [3,[54][55][56]). Phenotypic and morphometric evaluation is often used in chicken breeding because it is simple, quick, and cost-effective (e.g., [3,53,[57][58][59]). Body measurements are not uncommon in assessing the diversity of indigenous chicken breeds (e.g., [53,56,59,60]).
Based on the phenotypic diversity analysis of a wide range of global chicken gene pool breeds, we constructed PCM and compared it with TCM. For this purpose, we took phenotypic characteristics reflecting the morphometric parameters of the breeds including body weight and 13 body measurements of hens and cocks, as well as two main traits of egg productivity: egg number for 52 weeks of life and egg weight (Figure 1). We have shown that morphometric characters provided the greatest contribution to the formation of the breed clustering pattern in accordance with the PCM. This was clearly seen on the FAC and PCA plots (Figures S1a and S2a), in which the morphometric characters were used alone (i.e., without egg production traits), while fully resembling the PCM pattern. If clustering was achieved only based on egg production traits, the PCM pattern was disrupted (Figures S1b and S2b). At the same time, if we compare the characteristics of egg number and body weight, we can see that the contribution of the former to the clustering of breeds significantly exceeds the contribution of the latter ( (Figures 2 and S1a,b). Thus, we can assume that among the morphometric and all phenotypic traits in general, the main contribution to PCM seems to be made by the traits of body measurements in females and males. Moiseyeva et al. [3] stated that body measurements, being quantitative traits, have a high heritability coefficient (h 2 ≈ 0.5) and a lower intrapopulation variability. These features determine their high correlation with the genetic structure of breeds that was created over a relatively long evolutionary process of their development.
To the best of our knowledge, the proposed assessment of phenotypic diversity using phenotypic clustering of chicken breeds that we designated PCM, as well as the assessment of the contributions of certain phenotypic traits to this PCM, have not been specifically and purposefully examined in other previous studies. It is noteworthy that we carried out such assessments using quite a vast part of the world gene pool of chicken breeds that have different origins (often mixed), purpose of use, and the degree of admixture. Moiseyeva et al. [3] were also able to demonstrate good distinguishability of phenotypic (morphometric) traits to reproduce plausible breed differentiations and topologies of appropriate phylogenetic trees, even using smaller samples of 8-10 different breeds.
Our research not only has confirmed the main insights of the work by Moiseyeva et al. [3], but also significantly expanded them. Due to a larger sample of the global gene pool, including a great number of synthetic breeds and populations, and in the light of the clustering data obtained, we were able to revise the concept of four main evolutionary lineages of breed formation in domestic chickens as postulated by Moiseyeva et al. [3], which embraced ETBs, MTBs, GBs, and BTBs. According to our updated concept based on PCM, it is also necessary to distinguish two more evolutionary branches of chicken breeds, namely DPBs and FBs (Figure 1a,b). The PCM postulated by us and the appropriate concept of six major evolutionary lineages of chicken breed formation would, in our opinion, most fully reflect the entire spectrum of phenotypic diversity of the world gene pool of domestic chickens.
It should also be noted that our concept of evolutionary subdivision of domestic chickens and breed formation based on PCM also has certain discrepancies with TCM adopted in conventional and specialized literature on poultry (e.g., [32]). TCM is grounded on a rather simplified, speculative, and, to a certain extent, very artificial scheme for dividing chicken breeds into conditional utility types, i.e., purposes of use, when only one criterion for selecting poultry for egg or meat productivity can determine if a breed belongs to a certain TCM class. This traditional approach does not consider history of the origin (often mixed) and development of this breed, the degree of its synthetic nature and introgression, as well as the whole complex of its phenotypic features. This also results in significant artificiality, fuzziness, and uncertainty in terms of a defined phenotypic characterization and classification among numerous synthetic breeds from the EMB and MEB subcategories, which are commonly referred to as DPBs. The FB class looks no less artificial and speculative in TCM because it involves any breeds (including BTB) kept by fanciers for their phenotypic diversity and not related to explicit ETBs, MTBs, and GBs. Our proposed concept and PCM produced through an appropriate analysis of the phenotypic diversity of chicken breeds would highlight and correct the said shortcomings of TCM.

Genotypic Models of Clustering and Admixture
Concerning the development of two genotypic clustering models (GCM1 and GCM2), in a number of instances we noted both their congruence with each other and some similarities with PCM. For example, breeds and populations from the GCM2 cluster I were fully included in the GCM1 subcluster Ia (ETB), and members of the GCM2 subcluster IVa in the GCM1 cluster III (MTBs). Breeds of the PCM cluster V (FB) were stably clustered together in the GCM1 subcluster Ia and in the GCM2 cluster III, etc. The existing differences between the models can be attributed to the various algorithms used to construct them, and to the differences in the nature of the compared traits. It should also be noted that, for PCM, we used a fairly wide set of phenotypic traits describing both morphology and performance characteristics of poultry, while for genotypic models, one region of the chicken genome corresponding to the NCAPG-LCORL locus was employed, which apparently narrowed the resolving power of analyzing the breeds and populations by genotypes. Many other investigations including Moiseyeva et al. [3] also observed an unequal contribution of certain sets of phenotypic/genetic factors to the resulting patterns of phylogeny/clustering among examined breeds.
Nevertheless, combining data on genotyping (i.e., allele frequencies in the five SNPs) and on their grouping according to PCM showed significant differences between groups (clusters) of breeds and an expected topology of the breed phylogenetic tree (Figure 3) in full accordance with our proposed updated concept of the evolutionary subdivision of domestic chickens into six evolutionary lineages of breed formation. In addition, the observed differences at the NCAPG-LCORL locus between the identified groups (clusters) of breeds did not contradict the previous information on the relationship of variability at this locus with certain phenotypic traits (e.g., [50,51,61,62]). Considering the above, we would suggest the importance of our findings at the NCAPG-LCORL locus that demonstrated a genetic variability among a wide sample of commercial, local, and fancy chicken breeds, including synthetic ones, which largely overlaps with the patterns of their phenotypic variability and does not contradict the general ideas about history of formation and development of a particular breed.
An important aspect of our study was the detection of significant admixture among the 49 genotyped chicken breeds and populations. This was expressed in complex patterns of admixture models ( Figure S6), which, as a rule, did not allow us to clearly distinguish among themselves the classes (clusters) that we proposed for four models (TCM, PCM, GCM1, and GCM2). This significant admixture can be explained by the history of mixed origin of the studied breeds, i.e., a synthetic nature of their formation, when genetic makeup of many breeds was composed by mixing the genomes of several original, distinctive breeds and/or due to individual crossbreeding events with other breeds and introgression during their breeding (e.g., [63]). Nevertheless, by exploring the obtained patterns of admixture models, one could generally notice certain differences between ETB and MTB chickens, which is consistent with the results of the genotype frequency analysis. We would also suggest that the admixture models themselves can be considered as an auxiliary tool in clarifying evolutionary signatures of subdivision and inference of demographic history among chicken breeds and populations.

Analysis of Genetic Variation at the Locus NCAPG-LCORL
Characterization of genetic variation and population genetic structure based on SNPs at key chicken performance-related genes not only helps to determine the characteristics of various commercial, fancy, and local breeds and populations, including synthetic and highly admixed ones, but also to assess whether this information can be useful in MAS [50,64]. We genotyped chickens of various purposes of use (TCM) and phenotypes (PCM) using the Illumina Chicken 60K SNP iSelect BeadChip, which revealed the presence of the five significant SNPs on GGA4 in the area of the NCAPG-LCORL locus and flanking regions.
The differences found at the genetic level, i.e., in the SNPs within the locus covering the NCAPG-LCORL genes may be of great interest, since these genes are one of the key regulators of RNA polymerase II transcription and have pleiotropic effects in terms of body weight/size and egg weight/size [65,66]. Guo et al. [67] identified genomic variants associated with the size and mass of chicken bones at the NCAPG-LCORL locus. SNPs at the NCAPG gene (rs14491030) and the LCORL gene (rs14699480) were associated with egg weight [68]. According to Yi et al. [17], rs14491030 at the NCAPG gene can simultaneously affect both egg weight and body weight. In an investigation by Barkova and Smaragdov [69], significant associations of rs14491030 at the NCAPG gene with egg weight and shell elastic deformation were found. Sun et al. [70] showed that NCAPG had a definite effect on the eggshell quality in pullets. The pleiotropic effect of the NCAPG-LCORL locus may be related to the fact that egg weight affects the body weight of chickens at birth, their physical shape and further performance [71]. Eggshell, providing gas exchange between an embryo and environment and supplying calcium to embryo bones, is of biological importance for the development of avian embryos [72]. In a study by Liu et al. [61], the LCORL locus was significantly associated with the body weight of chickens. It was also shown that the LCORL gene has different levels of expression in slow-growing and fast-growing broiler chickens.
In the present investigation, we found that following even TCM, ETB chickens in general were significantly different from chickens of other purposes of use (utility types). So, for the GGaluGA265969 substitution, significant differences were shown when comparing the ETB-MTB and ETB-FB group pairs. Presumably, such differences could be due to the fact that this SNP might be associated with the body weight of hens.
SNP rs15619223 showed the largest number (28) of significant pairwise differences between chicken groups of various use/productivity types (TCM) or various clusters/subclusters (PCM and GCM1) (SI S3). According to our previous studies [50,51], this SNP is probably associated with body weight and egg weight in RW chickens. Although for SNPs GGaluGA265966 (14 significant pairwise differences), rs14491028 (14 differences), and rs14491017 (23 differences) we did not find any other reports on their relationship with productive traits in chickens, the presence of polymorphism and a significant difference in allele frequencies between groups of contrasting chicken breeds (SI S3) would suggest considering these SNPs as potentially momentous.
In general, for three models, TCM, PCM, and GCM1, we demonstrated a reliable resolution of the obtained data on allele frequencies in the SNPs of the NCAPG-LCORL locus to discriminate chicken breeds and populations related to ETBs, MTBs, DPBs (EMBs, MEBs), GBs, or FBs (SI S3). Thus, based on our own data available for the region on GGA4 that involves NCAPG-LCORL, we can confirm that there is a definite relationship between genetic variation at this locus and phenotypic diversity in chicken breeds, and this is an important QTL for body weight and egg weight [61].
The maximum heterozygosity for a particular locus is achieved when the frequencies of its alleles are equal and depends on the number of alleles. This is relevant for populations under selection for a desired trait. In the conditions of artificial breeding of populations, parental pairs are assorted, while relationship of individuals participating in mating increases. As a result of mating closely related individuals (i.e., inbreeding), proportion of homozygous genotypes is growing in a population. An important feature of inbreeding is the constancy of allele frequencies in all inbred generations observed while the number of heterozygous genotypes declines [73]. In our study, we used closed gene pool populations with a small population size and identified isolated cases of inbreeding (see Section 3.2.1). However, due to paucity of the studied populations, assessment of the inbreeding level in them would be possible with full confidence in further detailed genome-wide studies.
Among the identified SNPs, a shift in genetic equilibrium was found in crossbred chickens Ts × SL and in purebred populations Ts, NH, Pu, F and PRB (χ 2 > 3.84), which may be indicative of strong selection pressure and a consequence of the selection of offspring from the best sire.
In the SNP-assisted analysis of genetic differentiation between chicken groups using TCM and paired F ST , significant differences were detected for ETB chickens in comparison with MTBs, EMBs, GBs, and MEBs, as well as MTBs and FBs. With applying PCM (Table 2), greatest F ST distances were shown for FBs relative to other clusters. These results of the F ST -based analysis suggest a certain "predictive power" of the five SNPs for discriminating chickens of different other groups (clusters) when employing TCM and PCM.
The five significant SNPs identified in this study overlapped with the previously established QTLs for egg weight, egg production and growth traits at the NCAPG-LCORL locus (SI S4), confirming that these SNPs are relevant to and important for evaluating genomic variation among the studied local, commercial and fancy chicken breeds.
In the present study, we also investigated LD structure at the LCORL gene among various chicken breeds. Close LD was observed in the FS, CB, and BB breeds, suggesting their common origin as the Cochin, Brama, Dorking, and Houdan were FS ancestors. Chickens of the ZS breed were developed by crossing the RW, RIR, NH, and YC breeds. PC, such as NH, was created using the RIR breed [31]. The common origin of the listed breeds could explain their almost complete identity in LD. A similar conformation and diminutive sizes of the BMF and PS chicken breeds probably determined the same structure of their LD patterns. This may be due to the fact that polymorphism in the LCORL gene in animals can have a significant effect on the height, skeletal size, bone formation, and muscle development during embryogenesis [62].
When examining LD haploblocks between the SNPs in the studied chicken populations, important information was obtained regarding a population specificity of the haploblock structure at the NCAPG-LCORL locus for the 11 breeds/populations. These included two ETB populations (RWS and RWP), two EMB populations (FS and BB), three MEB populations (NH, PC and ZS), two FB populations (PS and PW), and two BTB populations (BMF and CB).
LD analysis revealed 36 haplotypes for these breeds. Common haplotypes are confirmed by the origin of breeds and similar phenotypes, suggesting a common mechanism for the formation of these LD patterns. Four similar haplotypes in the three MEB (NH, PC, and ZS) breeds, as well as one more common haplotype in NH and PC, can be associated with their common descent from the same ancestral breeds.
Due to the intensive selection in the breeding program and a significant reduction in the population size, changes in LD were observed in the chicken gene pool populations over several generations. Characterization of LD is of fundamental importance in carrying out genome-wide association analysis and genomic selection, as well as in identifying recent genomic rearrangements. LD can make a certain positive contribution when used in MAS in the future poultry farming [74]. At the same time, the costs of genotyping SNPs can be significantly reduced since markers are usually linked with each other in the area of influence on a trait [75]. Comparative assessment of populations with different demographic histories is an important source of information on changes in the genome when breeding small groups and assessing the results of crossing [64,76].

Conclusions
According to the FAO recommendations [52], to characterize the genetic resources of agricultural animals, it is necessary to rely on three types of information: phenotypic, genetic, and historical, which we attempted to do in the framework of this study. Herein, we proposed and studied four models of clustering (classification), i.e., TCM, PCM, GCM1, and GCM2, using a large spectrum of the global gene pool of chicken breeds that often have a synthetic genetic structure, significant admixture, and possible introgression. Based on these models, we expanded the earlier concept of four evolutionary lineages in domestic chicken breeding postulated by Moiseyeva et al. [3]. Our updated concept of evolutionary subdivision and breed formation includes six evolutionary branches of domestic chickens: ETBs, FBs, BTBs, DPBs, MTBs, and GBs as represented on the appropriate phylogenetic tree (Figure 3). We also found a complex and indistinct character of admixture models for many phenotypically and genotypically diverse breeds, many of which have a specific demographic history and a synthetic genetic blueprint.
Additionally, we discovered significant differences in allele frequencies for the analyzed SNPs at the NCAPG-LCORL locus in the small chicken populations of different phenotypes and purposes of use. This may point out that the importance of this locus in understanding the genetic basis for the formation of productive traits in poultry. Chickens of ETB and MTB are genetically far apart and formed two distinct clusters. Analysis of LD patterns revealed a close linkage between the SNPs in some DPB populations, which may prove a history of their origin from common ancestral breeds. Based on the data obtained, it can be assumed that the presence of LD can be an auxiliary tool in carrying out MAS for a small number of markers associated with each other in the area of influence on a trait, which can significantly reduce the costs of genotyping SNPs in general.
In addition, the current authentic allele composition in FB chickens at the NCAPG-LCORL locus was characterized, suggesting their gene pool as unique "inexhaustible" genetic resources. Since the NCAPG-LCORL locus is one of the key regulators of RNA polymerase II transcription and has a noticeable pleiotropic effect in relation to body weight and size, as well as egg size in chickens, we characterized genetic variants at this locus in various commercial, fancy, and local breeds and populations. Our research findings seem to be relevant for the purpose of identifying individuals that are carriers of SNP variants potentially associated with economically important QTLs. This information can also be in demand in MAS for meat and egg performance in chickens.
Genome-wide association studies can identify SNPs related to formation of phenotypic traits. This provides a unique opportunity to improve the methods for choosing markers of choice for MAS using information about many SNP markers across the entire genome, and thus increasing the selection accuracy. The data obtained can also be used in determining parental pairs and correcting selection targets in local chicken populations. The tested approaches in analyzing SNPs show their high efficiency and will be put into practical use in breeding and preservation programs for small chicken breeds and populations. Importantly, the current thorough phenotypic and genotypic survey involved several remarkable native breeds of Russia and the former USSR including OM [77], YC [78,79], RW [50,51,64], PC [80][81][82][83], PS/PW, RC, MG, UM, and UG. The obtained information will be helpful in their future conservation and commercial use.
Overall, our investigation has further contributed to solving the problem of evolutionary subdivision of domestic chickens including implications for synthetic breeds and admixture through a comprehensive assessment of phenotypic variation and genotypic structure at the NCAPG-LCORL locus, an important QTL for body meat and egg production traits.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/agriculture11100914/s1, Figure S1: Fuzzy Analysis Clustering-based plots showing the distribution of the 39 phenotyped breeds/populations and built using (a) morphometric traits and (b) egg production traits; Figure S2: PCA plots showing the distribution of the 39 phenotyped breeds/populations and built using (a) morphometric traits and (b) egg production traits; Figure S3: Plots of phylogenetic UPGMA-based trees constructed in comparison with PCM (shown via color coding) and using Euclidean distances for different sets of phenotypic traits; Figure S4: Clustering plots for the 49 chicken breeds/populations based on their genotypes in the five SNPs at the NCAPG-LCORL locus and produced using (a) Fuzzy Analysis Clustering, (b) PCA, and (c) an UPGMAassisted tree using Euclidean metrics; Figure S5: Number of ancestral populations using genotypes in the five SNPs at the NCAPG-LCORL locus for the 49 genotyped chicken breeds/populations and applying different clustering models; Figure S6: Population structure based on the genetic variability in the 49 genotyped chicken breeds/populations for five SNP markers at the NCAPG-LCORL locus and produced by Bayesian clustering using the ADMIXTURE program (as visualized in color in R v.4.1.0); Figure S7: An UPGMA-based tree using Euclidean distances for six chicken breed clusters according to PCM and based on genotype frequencies in the five SNPs at the NCAPG-LCORL locus among the 49 genotyped chicken breeds/populations; Table S1: Forty-nine chicken breeds, strains and crosses used in the study and listed in accordance with TCM; Table S2: Phenotypic traits in females among the 39 studied chicken gene pool breeds/populations listed in accordance with PCM; Table S3: Phenotypic traits in males among the 39 studied chicken gene pool breeds/populations listed in accordance with PCM; Table S4: Frequency distribution of genotypes for five SNPs at the NCAPG-LCORL locus on GGA4 among the 49 studied chicken gene pool breeds/populations listed in accordance with PCM; Supporting Information (SI) S1: Allele frequencies in the five studied SNPs at the NCAPG-LCORL locus; SI S2: Significance levels of differences in allele frequencies for the five SNPs; SI S3: Boxplots and significance of differences in allele frequencies for the studied five SNPs at the locus NCAPG-LCORL as assessed for chicken breed groups; SI S4: Known QTLs at the NCAPG-LCORL locus and relevant SNPs in the Chicken QTLdb database.