Balancing at the Borderline of a Breed: A Case Study of the Hungarian Short-Haired Vizsla Dog Breed, Definition of the Breed Profile Using Simple SNP-Based Methods

The aim of this study was to determine the breed boundary of the Hungarian Short-haired Vizsla (HSV) dog breed. Seventy registered purebred HSV dogs were genotyped on approximately 145,000 SNPs. Principal Component Analysis (PCA) and Admixture analysis confirmed that they belong to the same population. The outer point of the breed demarcation was a single Hungarian Wire-haired Vizsla (HWV) individual, the animal genetically closest to the HSV population in the PCA. Three approaches were used for the breed assignment calculations: the widely used GeneClass2.0 software and two approaches developed here, the 'PCA-distance' and 'IBS-central' methods. Both new methods calculate a single number that represents how closely a dog fits into the current reference population. The former calculates this number from the PCA distance to the median of the HSV animals. The latter calculates it from identity-by-state (IBS) data, as the distance from a central animal that best represents the breed. Because no mixed-breed dogs with a known HSV genome proportion were available, admixed animals were simulated from the data of HSV and HWV individuals to calibrate the inclusion/exclusion probabilities for the assignment. The numbers generated by these relatively simple calculations can help breeders and clubs keep their populations under genetic supervision.


Introduction
Currently, there are approximately 450 dog breeds [1], most of which were created during the Victorian era in the mid-19th century [2]. Dogs were bred for various working abilities, such as hunting, guarding, and herding, and for morphometric and behavioural standards, which became the dominant determinants of selection. "Breed-defining" phenotypic traits were under strong selection pressure, which reduced heterozygosity and initiated fixation in regions harbouring genes with a major effect on these traits. Accordingly, a dog breed can be considered a kind of homogeneous strain with special phenotypes and genomic makeup [2]. It was at this point that breeding clubs were formed and pedigree registration became required, and the reproductive isolation of the breeds gradually increased genetic differentiation [3,4], creating breed barriers around the populations. This raises fundamental questions: which dogs are inside this border, and which are outside it? Which dogs can be assigned to a breed, and which must be excluded? Before the advent of genetic marker-based population analyses, breed affiliation relied on breed-specific phenotypes. Later, advanced genetic markers, including microsatellites
Table 1. Pointer-Setter, Retriever, and Spaniel clades, breeds, and their acronyms [24].

Modelled Animals
To test how the proposed assignment methods work, dogs with different HSV genome proportions were needed in addition to the purebred HSVs. The genotypes of these hypothetical animals were created artificially from the genotype data actually acquired. Admixed animals were created as follows: an 'empty' genome was filled step by step with the genotypes of individuals from 24 Vizsla-related breeds (Table 1) [24] and with an HSV genotype. A total of 924 SNPs (Table S1), in ascending order of chromosome number and position, were used. Because the genome was assembled sequentially from 25 animals, each animal contributed approximately 1/25 of the artificial genome: 38 SNPs each were entered from 24 breeds and 12 SNPs from the 25th breed, GORD (24 × 38 + 12 = 924; Table 2). For example, the first region contains 38 SNPs from the ESSP, the next region 38 SNPs from the CKCS, and so forth. With this allocation, a single 38-SNP region could span the end of one chromosome and the beginning of the next. Two artificial animals (Admix1 and Admix2) were created in this way to serve as controls in the GeneClass2 runs.
In Table 2, the first column shows the genome assembly steps, the second the breed name abbreviations used in the database [24], the third the number of SNPs representing a genome region, and the fourth the cumulative serial number of the SNPs ordered by chromosome and position (Table S1). An asterisk (*) marks the step at which this study's HSV animals were included in the construction of the admixed animals.
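The block-wise filling of the 'empty' genome can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes genotypes are stored as 0/1/2 allele-dosage vectors over the 924 SNPs in chromosome and position order, and build_mosaic is a hypothetical helper name. In practice, the donor list would follow the assembly steps of Table 2.

```python
from typing import List, Sequence

def build_mosaic(donors: Sequence[Sequence[int]], block_size: int = 38) -> List[int]:
    """Fill an 'empty' genome block by block, each block from the next donor.

    donors: one genotype vector per donor animal (e.g. 0/1/2 allele dosages),
    all on the same SNPs in chromosome/position order. Every donor except the
    last supplies block_size consecutive SNPs; the last donor supplies the rest.
    """
    n_snps = len(donors[0])
    if any(len(d) != n_snps for d in donors):
        raise ValueError("all donors must be genotyped on the same SNP set")
    if block_size * (len(donors) - 1) > n_snps:
        raise ValueError("too many donors for this block size")
    mosaic: List[int] = []
    for i, donor in enumerate(donors):
        start = i * block_size
        # the final donor takes all remaining SNPs (12 in the 924-SNP case)
        end = n_snps if i == len(donors) - 1 else start + block_size
        mosaic.extend(donor[start:end])
    return mosaic

# Toy check with 25 donors on 924 SNPs: 24 blocks of 38 SNPs, then 12 SNPs
# from the last donor. Donor i is filled with the label i so that the origin
# of every SNP in the mosaic can be read off directly.
donors = [[i] * 924 for i in range(25)]
mosaic = build_mosaic(donors)
```

Because a block is defined by SNP count rather than by chromosome, a block can straddle a chromosome boundary, as the text notes.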
To test how well GeneClass2, PCA-distance, and IBS-central include or exclude animals at different admixture levels, different percentages of the loci of selected HSVs were replaced by the genotypes of an HWV animal, in the same manner as described above.
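A minimal sketch of the dilution step, assuming the same 0/1/2 dosage vectors. The text specifies only that loci were replaced "in the same manner as described above"; the hypothetical helper dilute assumes a single contiguous block starting at the first SNP, so treat it as an illustration of the idea rather than the procedure actually used.

```python
from typing import Dict, List, Sequence

def dilute(hsv: Sequence[int], hwv: Sequence[int], hwv_fraction: float) -> List[int]:
    """Replace a fraction of an HSV genotype vector with HWV genotypes.

    Assumption (hypothetical): the replaced loci form one contiguous block that
    starts at the first SNP in chromosome/position order; the text does not
    state exactly which loci were replaced.
    """
    if len(hsv) != len(hwv):
        raise ValueError("genotype vectors must cover the same SNPs")
    if not 0.0 <= hwv_fraction <= 1.0:
        raise ValueError("hwv_fraction must lie between 0 and 1")
    n_replaced = round(hwv_fraction * len(hsv))
    return list(hwv[:n_replaced]) + list(hsv[n_replaced:])

# Toy genotypes: the HSV is homozygous 0 and the HWV homozygous 2 everywhere,
# so the count of 2s shows how many loci were replaced.
hsv_toy, hwv_toy = [0] * 924, [2] * 924
series: Dict[int, List[int]] = {
    pct: dilute(hsv_toy, hwv_toy, pct / 100) for pct in range(0, 101, 10)
}
```

Substituting a real HSV vector (for example HSV38) and the HWV1 vector for the toy data would give a series of modelled animals at 10% increments.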

234 K SNP-Set
The samples were SNP-typed on a chip containing the SNPs of the Illumina Canine HD chip (Illumina, San Diego, CA, USA) together with SNPs from the LUPA consortium [25,26]. Genotyping was performed by Neogen Corporation (Ayr, UK). The chip contains 234 K SNPs, including the subset of markers described by Parker et al. [24].

Merge of Databases and Filtering for HSV-Enhanced SNP-Set
An important aspect of this research was to compare the genotypes of the breeds in the database of Parker et al. [24], obtained with the former version of the Illumina Canine HD chip (174 K), with the genotypes of the Hungarian samples typed on the Canine HD (234 K) chip. The 174 K [24] and 234 K datasets were merged and used in the Admixture (Figure 1) and PCA (Figure 2) analyses. Only SNPs present in both datasets and having a call rate above 0.95 were retained; after filtering, the merged dataset contained 145,453 SNPs. This dataset was then used to search for HSV-enhanced markers: the Fst values of the markers were calculated between the HSV and all other breeds in the Parker et al. database [24], except the VIZS breed. Only SNPs with Fst values of 0.4 or higher that were not in linkage disequilibrium (threshold = 0.5, composite haplotype method [27]) were retained for further analyses. Finally, 924 SNPs were selected into the HSV-enhanced set and used in the GeneClass2, PCA (Figure 3), PCA-distance, and IBS-central analyses.
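The marker selection can be illustrated with a short Python sketch. It is not the SVS implementation: Fst is computed with a simple two-group estimator, (H_T - H_S)/H_T, between the HSV and all other dogs pooled (whether the comparison was pooled or breed by breed is an assumption), and the composite-haplotype LD measure is approximated by the squared correlation of allele dosages. The thresholds 0.4 and 0.5 come from the text; applying 0.5 to r² rather than r is an assumption, and all function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence

Calls = Sequence[Optional[int]]  # 0/1/2 dosages of animals at one SNP; None = missing

def allele_freq(calls: Calls) -> float:
    called = [g for g in calls if g is not None]
    return sum(called) / (2 * len(called))

def fst(group_a: Calls, group_b: Calls) -> float:
    """Two-group Fst = (H_T - H_S) / H_T from expected heterozygosities.
    A simple textbook estimator; SVS may use a different one."""
    p1, p2 = allele_freq(group_a), allele_freq(group_b)
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

def r_squared(x: Calls, y: Calls) -> float:
    """Squared Pearson correlation of dosages, a stand-in for composite LD."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    mx = mean(a for a, _ in pairs)
    my = mean(b for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def select_markers(hsv: Sequence[Calls], others: Sequence[Calls],
                   fst_min: float = 0.4, ld_max: float = 0.5) -> List[int]:
    """Indices of SNPs with Fst >= fst_min, pruned greedily for LD.

    hsv[i] / others[i]: calls of HSV dogs / all other (pooled) dogs at SNP i.
    SNPs are visited from highest to lowest Fst; a SNP is dropped if its r^2
    with an already kept SNP (over all dogs) reaches ld_max.
    """
    scored = [(fst(h, o), i) for i, (h, o) in enumerate(zip(hsv, others))]
    candidates = [c for c in scored if c[0] >= fst_min]
    kept: List[int] = []
    for _, i in sorted(candidates, key=lambda c: (-c[0], c[1])):
        pooled_i = list(hsv[i]) + list(others[i])
        if all(r_squared(pooled_i, list(hsv[j]) + list(others[j])) < ld_max
               for j in kept):
            kept.append(i)
    return sorted(kept)
```

Visiting SNPs in order of decreasing Fst means that, within a block of linked markers, the most HSV-specific one is the one retained.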

Calculated Indices and the Software Packages Applied
Call rates of markers and samples, as well as the Fst values of the markers, were calculated with the SNP & Variation Suite (SVS) program (GoldenHelix, Bozeman, MT, USA). Genome-wide pairwise IBS values were determined with both SVS and PLINK [28]. PCA was performed on this matrix of pairwise IBS values with SVS and PLINK to obtain the positions of the animals relative to each other.
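Pairwise IBS similarity, the input to both the PCA and the IBS-central method, can be sketched as below, assuming biallelic SNPs coded as 0/1/2 dosages. To our understanding this is the same proportion that PLINK 1.9 reports as DST in its --genome output, (IBS2 + 0.5 × IBS1)/N; the function names are hypothetical, and the PCA step itself is left to SVS or PLINK.

```python
from typing import List, Optional, Sequence

Genotype = Optional[int]  # 0/1/2 copies of one allele; None = missing call

def ibs_similarity(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """(IBS2 + 0.5 * IBS1) / N over SNPs called in both animals.

    For biallelic SNPs coded as allele dosages, |a - b| = 0, 1, 2 means two,
    one or zero alleles shared identical by state, so each SNP contributes
    1 - |a - b| / 2.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no SNP is called in both animals")
    return sum(1 - abs(x - y) / 2 for x, y in shared) / len(shared)

def ibs_matrix(animals: Sequence[Sequence[Genotype]]) -> List[List[float]]:
    """Symmetric pairwise IBS similarity matrix with 1.0 on the diagonal."""
    n = len(animals)
    m = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = ibs_similarity(animals[i], animals[j])
    return m
```

Missing calls are skipped pair by pair, so each similarity is averaged over the SNPs that both animals actually have.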
To assess the proportions of mixed ancestry of the animals, ADMIXTURE v1.3 was used with the --cv option to determine the most probable number of clusters (K) from the cross-validation error at each K [29]. The cross-validation was performed five times, and the algorithm was terminated when the log-likelihood increased by less than 10⁻⁴ between iterations. Before the analysis, the alleles of the SNP loci were recoded to the numerical values 1 and 2 with PLINK using the --recode12 switch, as required by ADMIXTURE.
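Choosing K from the cross-validation errors can be automated by scanning the ADMIXTURE logs. The sketch assumes one run per K (for example, admixture --cv data.ped K for K = 2 to 33 on a file produced by plink --recode12; the file names are hypothetical) and that each log contains ADMIXTURE's usual "CV error (K=...)" line. parse_cv_errors and best_k are hypothetical helpers.

```python
import re
from typing import Dict, Iterable, Tuple

# ADMIXTURE run with --cv prints a line of the form "CV error (K=<K>): <error>".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")

def parse_cv_errors(log_lines: Iterable[str]) -> Dict[int, float]:
    """Collect {K: cross-validation error} from concatenated ADMIXTURE logs."""
    errors: Dict[int, float] = {}
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    return errors

def best_k(errors: Dict[int, float]) -> Tuple[int, float]:
    """K with the lowest cross-validation error, together with that error."""
    k = min(errors, key=errors.get)
    return k, errors[k]
```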
Inclusion probabilities were determined by GeneClass2 [11], and distances to reference points by the PCA-distance and IBS-central methods. In GeneClass2, individuals were assigned to a reference population using the Rannala & Mountain [5] criterion and the simulation algorithm of Paetkau et al. [30].
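To make the GeneClass2 settings more concrete, the following simplified Python sketch scores a genotype with a Rannala & Mountain style likelihood (allele counts with a Dirichlet prior of 1/k per allele) and estimates a probability by simulating animals from the reference population, in the spirit of Paetkau et al. It is not GeneClass2's code: the simulation is reduced to drawing each SNP genotype from its predictive distribution, and the function names and defaults are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Counts = Tuple[int, int]  # (copies of allele A, copies of allele B) in the reference population

def rm_genotype_prob(genotype: int, counts: Counts, k: int = 2) -> float:
    """Predictive probability of a diploid SNP genotype (0/1/2 copies of A)
    from allele counts with a Dirichlet(1/k, ..., 1/k) prior, k = alleles."""
    n_a, n_b = counts
    n = n_a + n_b
    a, b = n_a + 1 / k, n_b + 1 / k
    denom = (n + 1) * (n + 2)
    if genotype == 2:
        return a * (a + 1) / denom
    if genotype == 0:
        return b * (b + 1) / denom
    return 2 * a * b / denom

def log_likelihood(genotypes: Sequence[int], ref: Sequence[Counts]) -> float:
    return sum(math.log(rm_genotype_prob(g, c)) for g, c in zip(genotypes, ref))

def inclusion_probability(genotypes: Sequence[int], ref: Sequence[Counts],
                          n_sim: int = 1000, seed: int = 1) -> float:
    """Share of simulated reference animals whose likelihood is not higher
    than the tested animal's (simplified Monte Carlo test)."""
    rng = random.Random(seed)
    observed = log_likelihood(genotypes, ref)
    # per-SNP genotype distribution used to draw simulated animals
    weights: List[List[float]] = [[rm_genotype_prob(g, c) for g in (0, 1, 2)]
                                  for c in ref]
    not_higher = 0
    for _ in range(n_sim):
        simulated = [rng.choices((0, 1, 2), weights=w)[0] for w in weights]
        if log_likelihood(simulated, ref) <= observed:
            not_higher += 1
    return not_higher / n_sim
```

An animal whose genotypes are typical of the reference population scores close to one, whereas one carrying alleles that are rare in the reference population scores close to zero.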

PCA-Distance
PCA-distance is built on the coordinates obtained from principal component analysis. In PCA, the animals are positioned in a space of three or more dimensions, and the standardised distance of individuals from a reference point in that space can be determined (Table 2). The reference point, defined here as the median of the HSV individuals, is calculated solely from the principal component values of HSV animals. The outgroup is a single HWV individual (HWV1), the one closest to the HSV group in the PCA (Figures 2 and 3). This HWV individual defines the maximum distance from the median of the HSVs.
When a new animal is assigned to the population, PCA coordinates are determined anew for the new animal and all RP animals. As a result, the PCA coordinates of individuals change at each consecutive assignment step, and the standardised distances of all animals to the current HSV median are recalculated.
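The PCA-distance calculation reduces to a few lines once the coordinates of the current PCA run are available. In this minimal sketch, the HSV median is taken axis by axis (the text does not say whether a per-axis or a geometric median is used), and the outgroup name HWV1 and the function name are illustrative. Because the coordinates change after every assignment step, the function would simply be called again on the new PCA output.

```python
import math
from statistics import median
from typing import Dict, Sequence

def pca_distance(coords: Dict[str, Sequence[float]], hsv_ids: Sequence[str],
                 outgroup: str = "HWV1") -> Dict[str, float]:
    """Standardised distance of every animal from the HSV median in PC space.

    coords: principal-component coordinates (any number of axes) from the
    current PCA run. The reference point is the per-axis median of the HSV
    animals; distances are divided by the outgroup's distance, so the
    outgroup scores exactly 1.
    """
    n_axes = len(coords[outgroup])
    centre = [median(coords[a][d] for a in hsv_ids) for d in range(n_axes)]
    scale = math.dist(coords[outgroup], centre)
    if scale == 0:
        raise ValueError("the outgroup coincides with the HSV median")
    return {a: math.dist(xy, centre) / scale for a, xy in coords.items()}
```

Since only Euclidean distances are used, the same function works whether two, three, or more principal components are supplied.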

IBS-Central
This model is based on genetic similarity to a central animal. The method calculates the pairwise genetic similarity matrix of the individuals in the studied population (PLINK, SVS). In this symmetric matrix, the value for each pair of animals is a constant that does not change during the iterative calculations. Each dog is characterised by the sum of the identity-by-state values in its row, $\mathrm{IBS}_{\mathrm{sum}}$. The individual with the maximum sum, $\mathrm{IBS}_{\mathrm{max\_of\_sums}}$, is the central animal, the one most similar to all other individuals. The delta value of an animal is $\Delta = \mathrm{IBS}_{\mathrm{sum}} - \mathrm{IBS}_{\mathrm{max\_of\_sums}}$. The delta values are normalised between 0 and 1: the value 0 belongs to the central animal, and the value 1 indicates the outgroup, the HWV1 individual.
When a new animal is inserted into the population, it is sufficient to calculate the pairwise IBS values between the new individual and the existing individuals, but all $\mathrm{IBS}_{\mathrm{sum}}$ values must be recalculated. An insertion can also change which animal is central.
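A minimal Python sketch of IBS-central follows, assuming the pairwise IBS matrix (for example from PLINK or SVS) covers the current RP plus the HWV1 outgroup. Dividing each animal's delta by the outgroup's delta gives 0 for the central animal and 1 for HWV1, matching the description above. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def ibs_central(ibs: Sequence[Sequence[float]], ids: Sequence[str],
                outgroup: str = "HWV1") -> Tuple[str, Dict[str, float]]:
    """Return the central animal and each animal's normalised delta.

    ibs: symmetric pairwise IBS matrix, rows and columns in the order of ids.
    Row sums give IBS_sum; the central animal has the largest sum. The
    diagonal adds the same constant to every row and so does not matter.
    """
    sums = {a: sum(row) for a, row in zip(ids, ibs)}
    central = max(sums, key=sums.get)
    span = sums[central] - sums[outgroup]  # minus the outgroup's delta
    if span == 0:
        raise ValueError("the outgroup is as central as the central animal")
    # (IBS_max_of_sums - IBS_sum) / span: 0 for the central animal, 1 for the outgroup
    return central, {a: (sums[central] - s) / span for a, s in sums.items()}

def add_animal(ibs: List[List[float]], ids: List[str], new_id: str,
               new_row: Sequence[float]) -> None:
    """Insert a new animal in place. Only its IBS values to the existing
    animals (new_row, in the order of ids) are needed; existing pairs are
    unchanged, but all sums must be recomputed with ibs_central."""
    if len(new_row) != len(ids):
        raise ValueError("need one IBS value per existing animal")
    for row, value in zip(ibs, new_row):
        row.append(value)
    ibs.append(list(new_row) + [1.0])
    ids.append(new_id)
```

Because every row sum gains the new animal's column, the central animal can change after an insertion even though no existing pair value changes.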

Admixture
Going clockwise around Figure 1, the first two clades, the Spaniels and the Retrievers, are more distant relatives of the Vizsla breed. The Pointer-Setter clade varieties, including the HSV and HWV breeds, begin with the DALM and end with the GORD population. The remaining seven populations at the end of the circle are Hungarian breeds not related to the Vizsla. Among K = 2 to 33, K = 18 has the lowest cross-validation error and is therefore the optimal number of clusters. The subject of this study, the HSV group, already forms a completely homogeneous set at K = 2 and remains so up to K = 20. Figure 1B depicts the HSV population in more detail and highlights four individuals (HSV24, 30, 38, and 68) that differ slightly genetically from the majority of HSV animals. The HWV and VIZS groups show strong similarity. Since only the origin of the HWV and HSV individuals is known to us, and not that of the VIZS individuals, the reason for this genetic similarity can only be speculated upon. The HWV group is also similar to the HSV group, which is consistent with the history of the HWV breed: it was created by crossbreeding the HSV and GWHP breeds during the 1930s [31].
Genes 2022, 13, 2022
Some of the seven Hungarian breeds unrelated to the Vizsla, such as the KOM and KUV groups, separated quickly and uniformly at low K values. Others, such as the HG, do not show a uniform pattern within themselves even at K = 20. The PUL and PUM populations do not separate from each other even at K = 30, reflecting the close relationship between the two breeds. In addition, the MUD group displays strong similarities with the PUL and PUM groups.
Among the breeds from Parker et al. [24], the Spaniels and Retrievers are well structured at K = 20, except for CCRT, although some breeds (ESSP and ECKR) show individuals protruding from the group. In the Pointer-Setter group, some breeds are separated and structured at K = 20, including DALM, WEIM, and ESET, whereas others, including WHPG, GWHP, GSHP, and BRIT, do not separate from one another in this analysis even at K = 20. Finally, some breeds are not structured at all, presumably because of the small number of samples (Figure 1). For more details, see Parker et al. [24].

PCA on Merged Dataset, 145,453 SNPs
The PCA with 145,453 SNPs on 26 breeds clearly distinguishes both the HSV group and the overlapping HWV and VIZS groups from the other breeds (Figure 2). The eigenvalues of axes 1 to 3 are 15.731, 7.008, and 6.229, respectively.

PCA on HSV-Enhanced, 924 SNPs
PCA was also performed on the 26 breeds with the HSV-enhanced set of 924 SNPs (Figure 3). The first component (eigenvalue = 106.785) separated the HSVs and the closely related HWVs with much greater power than in the previous analysis (Figure 2), where the eigenvalue of the first axis was 15.731; at the same time, however, the other breeds moved closer to each other. The animals were distributed similarly to Figure 2, but the Vizsla populations were more spread out along the y-axis. At this resolution, four HSVs (samples 24, 30, 38, and 68) were somewhat detached from the main 'cloud' of the population, towards the HWV. The HWV and VIZS groups were closely aligned.

Assignments
The individuals in the initial RP and those added at each step were not selected deliberately; they simply followed the order of the animals in the database.

Assignments of 40 HSVs
The initial RP consisted of 30 animals (Table 3). There were 20 assignment steps altogether, and at each step two new HSVs joined the continuously growing RP. At each step, five individuals were offered to the RP: the two new HSV individuals and three negative controls, namely HWV1, the animal closest to the HSV cloud on the PCA plot (Figure 3), and two artificially admixed animals (Admix1 and Admix2).
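The stepwise schedule can be written as a small loop. This sketch assumes that animals are offered in database order and that only the two HSVs join the RP after each step; score stands for any of the three methods and, like run_assignment, is a hypothetical placeholder rather than an interface of GeneClass2.

```python
from typing import Callable, Dict, List, Sequence

def run_assignment(hsv_ids: Sequence[str], controls: Sequence[str],
                   score: Callable[[Sequence[str], str], float],
                   initial_rp: int = 30, per_step: int = 2) -> List[Dict[str, float]]:
    """Offer animals to a growing RP in database order.

    score(rp, animal) is a placeholder for any of the three methods
    (GeneClass2 probability, PCA-distance or IBS-central value).
    Returns one {animal: value} dict per assignment step.
    """
    rp = list(hsv_ids[:initial_rp])
    queue = list(hsv_ids[initial_rp:])
    results: List[Dict[str, float]] = []
    while queue:
        new, queue = queue[:per_step], queue[per_step:]
        # the new HSVs and the negative controls are all scored against the current RP
        results.append({a: score(rp, a) for a in list(new) + list(controls)})
        rp.extend(new)  # only the HSVs join the RP; controls never do
    return results
```

With 70 HSV identifiers, the loop runs 20 steps, and the RP grows from 30 to 70 animals.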
For each animal to be assigned, GeneClass2.0 provides an inclusion probability between zero and one: zero means complete exclusion, and one indicates a maximal fit to the population. As expected, GeneClass2.0 accepted all HSV individuals (purebred by registry and genetically confirmed by the Admixture program) and excluded all three negative controls. The negative controls always received zero values, while the HSV samples had values from 0.112 to 0.999. The exceptions were two animals, HSV38 and HSV68, entered at the fourth and nineteenth steps with inclusion probabilities of 0.045 and 0.007, respectively (Table 3).
Table 3. Assignments of animals to the reference population (RP) by GeneClass2. The RP starts with 30 animals; two animals are assigned at each step, after which the RP grows by two.

In the PCA, the HSV45 and HSV67 animals were located in the centre of the HSV distribution (Figures 2 and 3; HSV45 and HSV67 are not marked), and they had high inclusion probabilities, 0.921 and 0.998, respectively, in the GeneClass2 assignments (Table 3). Two animals from the periphery, HSV38 and HSV68, with low GeneClass2 inclusion probabilities of 0.045 and 0.007, respectively (Table 3), were also used, together with HSV45 and HSV67, to create artificially admixed animals. The genomes of these four animals were progressively replaced by the genome of the HWV1 individual. The first column of Table 4 lists the proportion of the HWV genome. The 0% row shows the original inclusion probabilities of the four HSVs; these differ slightly from those in Table 3, mainly because these individuals were reassigned against an RP of 66 HSVs.
PCA-Distance
In the first step, the RP consists of 30 HSV individuals. The HWV1 individual sets the maximum PCA-distance value, highlighted in red; PCA-distance values above 0.6 are also marked in red. Each RP column has a minimum value, indicated in blue, which points to the animal currently closest to the HSV median of the PCA (Table 5).
Across the 21 consecutively expanding RPs, the minimum value belonged to seven different individuals (compacted dataset, Table 5; full dataset, Supplementary Table S3), and some individuals held the lowest value two or three times. The dynamics and influence of insertions and recalculations can be tracked through the fluctuating positions of the individuals (Figure S1). This dynamic behaviour is also visible in the movement of the values above 0.6 (Table 5). In that table, five individuals protrude from the main population (HSV24, 28, 30, 33, and 68). Four individuals show a value higher than 0.6 at least once, and this also varies as the RP grows. HSV28 exceeded 0.6 more than once but fell below 0.6 at later entry steps and never regained its high value.

IBS-Central
Pairwise IBS values were determined among all 30 individuals of the starting RP and arranged in a 30 × 30 matrix (Table S2), with the row sums in the final column. In the RP, the central animal is the one with the highest summed value, which is normalised to zero. As in the PCA-distance method, an outer border had to be defined, and it is again determined by the HWV1 individual. This dog had the smallest summed IBS value; in other words, it had the lowest similarity to the HSVs. Its value is standardised to one.
The layout of Table 6 is identical to that of Table 5. Here, the blue colour specifically indicates zero, which belongs to the central animal in a particular RP. The red colour indicates the maximum value set by HWV1 and any value above 0.4.

Standardised Values of Modelled HSVs by PCA-Distance and IBS-Central Methods
To test the behaviour of the two methods presented here, the same animals (HSV38, 68, 45, and 67) and their admixed versions, with the genomic fraction of HWV1 increasing in 10% increments (Table 7), were used as previously with GeneClass2 (Section 3.3.1). PCA-distance gave 1.3- to 2.3-fold higher standardised distances (exclusion probabilities) than the IBS-central method. In both approaches, HSV68 was the most protruding specimen; in PCA-distance, its value reached and even exceeded one at 60-80% HWV.

Discussion
In the admixture analysis (Figure 1A), the PUM and PUL groups appeared very similar, which is consistent with the Pumi breed having been created by crossbreeding the primitive Puli with German and French terrier-type herding dogs [32]. The HWV and VIZS groups showed strong similarity to each other and also to the HSV group. The HSV population formed a homogeneous subset, confirming the purebred status supported by the pedigree information provided by the breeders. Within this group, four individuals (HSV24, 30, 38, and 68) displayed admixture patterns slightly different from those of the majority of HSV animals (Figure 1B). These peripheral animals were also highlighted by the PCA, GeneClass2, PCA-distance, and IBS-central approaches.
PCA confirmed the separation of HSVs from other, related breeds both with the merged dataset and with the HSV-enhanced set of 924 SNPs applied to the Parker et al. [24] dataset. The latter set was extracted from the larger one by contrasting the HSV with the remaining breeds. This smaller set, used in the subsequent analyses, has greater discrimination power, as indicated by the eigenvalue of the first axis, which increased from 15.731 (Figure 2) to 106.785 (Figure 3). Since the HSV-enhanced set contains the loci that differ most between the HSV and the other breeds, the resolution of Vizsla individuals improves, while that of the other breeds deteriorates slightly.
The first step common to all assignment procedures is to build an initial RP. An RP of only a few animals is unrealistic, but when the PCA-distance and IBS-central methods were tested, optimisation started from a very low number of dogs (Figure S2). The initial RP size was set at 30 individuals and increased from that point.
GeneClass2 does not automatically exclude individuals that were included in the RP in the initial assignment steps but later appeared to be outliers. This is a cautious technical approach: such removal is particularly undesirable while the RP is still small, because an animal that appears to be an outlier could fit later, as more genotyped animals become available in the subsequent assignment steps. The larger the RP, the more likely it is to represent the entire breed, including animals not yet genotyped and/or bred outside the country.
The expectations derived from the admixture analysis were confirmed by GeneClass2 when 40 HSVs were assigned to an RP that started with 30 HSV dogs (Table 3). Two animals (HSV24 and 30) that proved slightly different from the majority of the HSV population in the admixture analysis (Figure 1B) were randomly included in the RP. The other two such animals (HSV38 and 68), which also appeared more admixed in Figure 1B, had much lower inclusion probabilities than the other 38 specimens.
To test how GeneClass2 classifies animals with different degrees of admixture, simulated HSV/HWV admixed animals were assigned. As starting genomes for the gradual dilution, two HSVs with low and two with high inclusion probabilities were selected based on the values in Table 3.
As expected, during the dilution of HSV genomes with HWV genomes, the inclusion probabilities declined faster in the two peripheral animals, HSV38 and 68, which reached zero at 50% and 30% HWV, respectively, whereas HSV45 and 67, from the central core of the HSV population, reached zero at 70% and 80% genome exchange, respectively (Table 4). In all four cases, a large decrease in the inclusion probability was already observed at a 10% HWV ratio. Even the smallest decrease, in HSV67, shows that the method detects a foreign genome fraction with good sensitivity, even when that fraction is small and comes from a very closely related breed.
At their current stage, the two newly presented assignment approaches do not themselves decide on inclusion or exclusion. Before each assignment step, the RP should be updated based on the previous step, in the same way as was done here with GeneClass2. An animal whose probability of belonging to the main group differs significantly from that of the others will be shifted towards the periphery of the group. Once the RP is large enough, individuals above an empirically set threshold can be removed after each assignment step. The inclusion/exclusion threshold may change as the RP grows and could be determined from the actual data. Both the PCA-distance and IBS-central methods give values between zero and one, which can be interpreted as inclusion/exclusion probabilities. This number represents the entire genome and its similarity or dissimilarity to all other animals in the population. A value closer to one may indicate that one or more of the individual's ancestors belonged to a related breed. Consequently, an animal with a value above a certain threshold, which must be established on professional grounds, should not be classified into the RP. This decision process could be called "balancing at the breed boundary".
With this approach, individuals that do not fit into the RP drift to the periphery. If they do not move closer to the core in subsequent steps, they must finally be removed manually once the RP is large, when it can be ensured that the individual is an outlier animal rather than one of the extreme specimens of the breed.
The PCA-distance approach measures fit to the current RP as the standardised distance of a given individual from the median calculated from the individuals of the RP. Accordingly, it does not matter whether the PCA places the individuals in a two-, three-, or higher-dimensional space. The method has two key reference points: the median of the HSV individuals, and the cut-off (maximum) distance from that median, which designates the outer circle of the breed boundary. Beyond that maximum lies the territory of another breed. A single individual of the nearest breed is enough to establish this demarcation point. From experience with HSVs, their closest relative breed is clearly the HWV. Within the HWV, the single individual closest to the HSV population, HWV1, was chosen.
In the IBS-central approach, the most prominent representative of the current RP is the individual with the highest similarity to the other individuals; as such, it is the breed representative most worthy of whole-genome sequencing. This animal has the shortest summed distance from all individuals in a given RP.
Four of the certified purebred HSVs were slightly peripheral compared with the central core of the HSVs. The most distal HSV in both the PCA-distance and IBS-central analyses was HSV24. The two analyses differ in their assessment of HSV38: in PCA-distance, seven individuals protruded further than this animal. The IBS-central method compresses the main group and separates outliers from it more explicitly. Although the results of the two methods are similar, the IBS-central method might be better suited for practical breed assignment, as it distinguishes individuals in the peripheral region of the breed distribution more explicitly.
When the results of the PCA-distance and IBS-central methods are compared (Tables 5 and 6), both low and high values occur fairly consistently in the two analyses.
The values in the last columns of the two analyses (labelled '70' in Tables 5 and 6) were also plotted on two circular plots (Figure 4). It is more appropriate to call them 'dotted balls', since this reflects the dynamic nature of the calculations and the continuous change of the RPs.
On both dotted balls, the HWV1 individual is at 1.0. In the PCA-distance calculation (Figure 4A), the inner two circles (0-0.2 and 0.2-0.4) are not crowded, and there is no animal in the centre. In the IBS-central approach (Figure 4B), the data are more condensed in the inner two circles, and the central animal, HSV51, is at the origin. The same two data series (the last columns, labelled '70', in Tables 5 and 6) are also presented in tabular form in ascending order (Table S4).
In the PCA-distance method (Table 5), the 0.6 value is an arbitrarily chosen threshold, not the borderline of the breed; it was selected as a first trial to highlight the animals on the periphery. The positions of these animals agree well with the admixture, PCA, GeneClass2, and IBS-central results. The borderline of the breed could be clarified as the RP grows. In the IBS-central method, fewer animals are highlighted on the periphery than in the PCA-distance method, because the normalised values are lower overall. Four individuals (HSV24, 30, 38, and 68) have values above 0.4 (Table 6). These animals were indicated to be slightly different by the admixture analysis based on the merged 145,453-SNP set (Figure 1B) and by the PCA plots (Figures 2 and 3). The role of the central animal alternates between two individuals under the influence of consecutive assignments and recalculations. In the IBS-central results, the oscillation of the standardised distances is more attenuated than in the PCA-distance results.
To test the behaviour of these two methods, the same animals (HSV38, 68, 45, and 67) and their admixed versions, with the genomic fraction of HWV1 increasing in 10% increments (Table 7), were used as previously with GeneClass2 (Section 3.3.1). In PCA-distance, at 60-80% HWV, the admixed HSV68 specimens exceeded the outer HSV border of 1.0 set by the single HWV1 individual. We assume, and Figure 1 suggests to some extent, that HSV68 may already carry an HWV or other closely related breed background, such as a breed from the Pointer-Setter clade, so that its admixed genome ended up further away than the outer point, the HWV1 dog itself. Conversely, the HWV1 genome noticeably carries regions more specific to HSV, owing to a very small genome proportion left over from the initial HSV × GWHP cross at the start of the HWV breed's creation.
As mentioned earlier, we set the initial thresholds of the PCA-distance and IBS-central methods to 0.6 and 0.4, respectively. If breeders decide that a 20% foreign DNA ratio is acceptable, the inclusion limits for the RP could be set at 0.820 and 0.570, respectively, derived from the results of the artificially admixed animals (Table 7). Based on the current picture of the 70 sampled HSV animals, the inclusion limits could be 0.800 (Table 5) and 0.650 (Table 6) for the PCA-distance and IBS-central methods, respectively.
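The text does not spell out how the 0.820 and 0.570 limits were read from Table 7. One possible rule, stated here only as an assumption, is to take the highest standardised value that any modelled animal reaches at the accepted HWV fraction, interpolating linearly between the 10% steps. The sketch below implements that hypothetical rule with hypothetical function names; it does not reproduce Table 7.

```python
from bisect import bisect_left
from typing import Dict, Sequence

def value_at(fractions: Sequence[float], values: Sequence[float], x: float) -> float:
    """Linearly interpolate a standardised-value series at HWV fraction x.
    fractions must be sorted in ascending order (e.g. 0.0, 0.1, ..., 1.0)."""
    if not fractions[0] <= x <= fractions[-1]:
        raise ValueError("x lies outside the simulated range")
    i = bisect_left(fractions, x)
    if fractions[i] == x:
        return values[i]
    x0, x1 = fractions[i - 1], fractions[i]
    y0, y1 = values[i - 1], values[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def inclusion_limit(series: Dict[str, Sequence[float]], fractions: Sequence[float],
                    accepted: float) -> float:
    """Hypothetical rule: the largest value any modelled animal reaches at the
    accepted foreign-DNA fraction (e.g. 0.2 for 20% HWV)."""
    return max(value_at(fractions, values, accepted) for values in series.values())
```

Taking the maximum across animals makes the limit permissive for the most HWV-like starting genome; taking the mean or minimum would be stricter choices.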

Conclusions
Two simple SNP-based methods were designed and presented to support breeders' decisions when assigning new individuals to the reference population (RP) of the Hungarian Short-haired Vizsla. The PCA-distance method calculates the standardised distances of individuals to the median position of the breed members, defined by the coordinates of Principal Component Analysis. The IBS-central method calculates standardised distances based on identity-by-state values, where the reference point is the animal that is genetically closest to all others in the RP. The outer border of the breed is defined by the genetically closest member of the most closely related breed, the Hungarian Wire-haired Vizsla.
We plan to genotype more animals from other breeds or species and to establish where the described procedures can be applied with high confidence. Based on the Admixture, PCA, GeneClass2, PCA-distance, and IBS-central analyses presented here, we conclude that an RP of 70 animals is satisfactory for the Hungarian Short-haired Vizsla.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes13112022/s1. Figure S1: Oscillations of standardised values obtained from the PCA-distance method; Figure S2: Oscillations of standardised values obtained from the IBS-central method; Table S1: List of the 924 SNPs in the HSV-enhanced set; Table S2: Examples of the calculation of the PCA-distance and IBS-central methods, and PCA coordinates of Figure 3; Table S3: Results of the PCA-distance and IBS-central methods; Table S4: Ordered and colour-coded results of the PCA-distance and IBS-central methods.

Data Availability Statement:
The data presented in this study are available on request from the first and corresponding authors, subject to permission from the Hungarian kennel clubs of the respective breeds.