Genome Enhanced Marker Improvement for Potato Virus Y Disease Resistance in Potato

Abstract: Potato is an important food crop worldwide and is grown in a large number of countries. As such, the crop is under disease pressure, and selecting for disease resistance genes during breeding programs is essential. Of particular importance within Australia and other parts of the world is the potyvirus Potato virus Y (PVY). In this paper, three commonly used PVY resistance markers, M45, RYSC3 and M6, were evaluated using existing genomic resources and phenotypic data from the Australian potato breeding program to identify a region where the PVY resistance gene, Ry adg, may reside. A region of Chromosome XI was investigated, and a cluster of disease resistance genes was identified within which Ry adg is suspected to reside. Protein characterization was also performed on the putative resistance gene. A specific variant that had complete association with the resistance gene was identified and a single nucleotide polymorphism (SNP) assay was designed to avoid dissociation of marker and gene in future breeding programs. This SNP marker (SNP3279) was validated as a Kompetitive Allele-specific PCR (KASP) genotyping assay and was found to perform more accurately than all previously used markers for detecting Ry adg.


Introduction
Potato (Solanum tuberosum) is the fourth most important food crop globally with production exceeding 350 million tonnes per annum [1]. This is mainly due to its climate adaptability, large yield and high nutritional value [2]. The risk of disease to potato production in such an extensively grown crop is significant and has wide-reaching implications. Potato is a monoculture crop, making many of the pests and pathogens of high economic importance. Data collected from 19 regions worldwide in 2001-2003 estimated that the potential losses in potatoes from insect pests, pathogens and viruses is 44.9% [3].
Susceptibility to viral diseases is particularly high despite years of breeding. A recurring impediment to potato production is Potato virus Y (Family Potyviridae, genus Potyvirus, PVY), a ssRNA virus that is transmitted in a non-persistent manner via aphids and mechanically spread through germination of infected plant material. PVY has been reported to cause 10-80% yield losses in potato, depending on the virus strain and if co-infections are occurring with other viruses [4]. It has been designated the fifth most economically important plant virus [3] and is the most important virus worldwide infecting potato [5]. Within Australia, PVY has been detected in all states but is successfully controlled in Western Australia and Tasmania. Detection of potato infecting viruses through seed certification schemes has assisted in minimising the rate of infection in field. Despite these efforts, PVY is now being more routinely observed, and several strains have now been detected in the field [3,6]. Of most importance is PVY NTN , inducing symptoms such as leaf and stem necrosis, leaf-drop, stunting and necrosis of the tuber.
Due to the easy transmission of PVY by aphid vectors, infected plant material and mechanical means, attempts of control using chemical-based treatments are often not practical or cost-effective. In addition, symptoms of PVY infection can be strain specific and certain cultivars do not exhibit symptoms when infected [5]. A more appealing method for control of viral diseases is through breeding and the delivery of resistant cultivars [7]. Resistant cultivars can produce harvestable yield without the need for additional inputs or management requirements. However, producing new resistant cultivars can be a long process as phenotypic recurrent selection can take 10-12 years to produce a new variety with the desired trait [8][9][10][11]. The use of molecular tools such as marker-assisted selection (MAS) to assist breeding efforts can reduce this timeframe through easier and earlier screening and allow more informed decisions through reduced phenotypic errors [12,13].
Introgression of PVY resistance genes into commercial germplasm from the wild relatives Solanum tuberosum subsp. andigena and S. stoloniferum has provided a robust source of resistance [7,14,15]. These introgressions provide the resistance genes Ry adg and Ry sto that confer resistance to all strains of PVY [16,17]. Within the Australian potato germplasm, which has a strong European and North American heritage but is also well adapted to local conditions, Ry adg is the most prominent resistance source [5][18][19][20]. In addition, the majority of resistance genes are under simple genetic control where the simplex state provides the trait. Thus, the identification of parents with higher genetic dosage is more valuable for breeding programs to increase desired allele transmission to progeny [9,21,22].
Virus resistance traits are typically selected in conjunction with other specific aims of breeding programs. Selection of resistant parents can be assisted by using marker-based genotyping in conjunction with phenotyping. This combinatorial approach is of benefit because some cultivars, such as Shepody, are asymptomatic when infected with PVY and would not be detected using phenotyping alone [5]. Several markers have been developed to select for cultivars containing Ry adg; these include the sequence-characterized amplified region (SCAR) marker RYSC3 [22] and the converted amplified fragment length polymorphism (AFLP) marker M45 [23], which have been the primary markers used to genotype the Australian collection. Slater et al. [24,25] performed genotyping using both markers and found that 10/74 cultivars were incorrectly genotyped as susceptible to PVY using the RYSC3 marker. M45 delivered superior accuracy, with the only error being cv. Emma, which genotyped as resistant with both markers yet is phenotypically classified as susceptible to PVY. Low association between RYSC3 marker alleles and Ry adg mediated resistance was also reported by Dalla Rizza et al. [26] and Herrera et al. [27]. As a result, use of the RYSC3 marker has been replaced by alternative markers such as M45 and M6 [23,27]. There is currently no formal evidence in the literature to suggest that the M6 marker incorrectly genotypes any samples; however, it has only recently been reported and has not been subjected to close scrutiny or evaluation.
Recently, efforts have been made to improve the genomic resources available for the auto-tetraploid potato to the community for the improvement of potato breeding. This has resulted in the first draft of the 840 Mb potato genome, identification of c. 40,000 genes, estimated breeding values for desirable market traits and the identification of large cohorts of sequence variants [28][29][30][31]. These resources allow for the mining of diagnostic markers for specific traits, such as disease resistance and for the application of more traditional breeding tools, such as marker assisted selection (MAS) or the identification of SNP markers for use in genomic selection. Both methods of genomic assisted breeding would deliver benefits to the potato industry through new improved cultivars that are more disease resistant.
The marker systems currently in place for the selection of PVY resistant germplasm are ineffective due to the breakdown of association between the markers and the PVY resistant trait [25]. In this paper we will show that M6 will likely fall into the same category as other previously reported markers, resulting in no predictive diagnostic markers available that are 100% accurate for Ry adg . In addition, screening for these markers is not compatible or effective for high-throughput (HTP) multiplexed screening methods and is not able to be integrated into more complex genomic selection systems where multiple complex traits can be selected for simultaneously. With the genomic resources now available, the application of genetic mapping to find causal markers with more precision than classical biparental maps has been enabled. The aims of this paper are to use existing genomic resources and genotypic data from the Australian potato cultivar collection [24,25,31] to investigate the position of the Ry adg gene on Chromosome XI and develop a diagnostic SNP marker system for the reliable detection of the resistance allele for implementation in potato varietal improvement.

Plant Material and Sampling
The germplasm collection used for the SNP discovery portion of this study was as described in Caruana et al. [31]. A SNP validation panel of 80 cultivars with the majority having known Ry adg phenotypes was assembled. Cultivars from the germplasm collection were used to perform crosses under glasshouse conditions. Two populations (Lady Christl × La Ratte and Lady Claire × Friar consisting of 93 and 45 individuals, respectively) were developed to validate the developed molecular markers. Crosses were chosen on the basis that one cultivar (Lady Christl and Friar) contained the PVY resistance gene, Ry adg [25]. Seeds from the crosses were collected and grown to seedlings and all plants available from each cross sampled for DNA screening at 4 weeks.

Phenotyping
The parental material used in this study [31] had been previously screened for virus resistance [24,25]. Briefly the samples as well as known resistant and susceptible cultivars were grown in 150 mm pots in standard potting mix under glasshouse conditions. Three replicates for each cultivar and a control were inoculated with PVY NTN at the sixth leaf stage. Leaves from infected plants were ground in chilled 0.05 M phosphate inoculation buffer at a 1:5 dilution of leaf material to buffer. Two leaves of each plant were dusted with carborundum powder then lightly rubbed with the PVY NTN inoculum. At 21 days post infection, visual symptoms were noted, and new growth was sampled to test for virus infection. Samples were tested for PVY using a double antibody sandwich ELISA technique [24,25].

DNA Extractions
The germplasm collection and F1 progeny were sampled for DNA extraction. In all cases the sixth leaf was sampled for an automated 96 well format DNA extraction. Tissue was lysed after being frozen at −80 °C using a Tissuelyser II Mixer Mill (Qiagen, Hilden, Germany). DNA was extracted using a MagJET Plant Genomic DNA Kit (Thermo Scientific, Waltham, MA, USA) on a KingFisher Flex 96 (Thermo Scientific) and then stored at −20 °C.

Bioinformatic Data Analysis
The underlying DNA sequences of markers M45, M6 and RYSC3 were compared to the potato reference genome sequence with BLASTn, using the Spud DB blast function (http://solanaceae.plantbiology.msu.edu/blast.shtml, accessed on 19 August 2020).
All 2 × 150 base, paired-end RNA sequencing data were taken from the previously published study of Caruana et al. [31]. Briefly, c. 3 million reads from leaf tissue of 181 potato cultivars were generated and SNP variants were identified using the following pipeline. Initial sequence data fastq files were processed through a custom perl script for quality trimming and then adapters were removed using cutadapt v1.9 [32]. The trimmed sequence data were aligned to the Solanum tuberosum Group Phureja DM1-3 Assembly Version 3 DM pseudomolecule (v4.03) assembly [28,33] using the Spliced Transcripts Alignment to a Reference (STAR) software v2.5.3a, with default settings [34]. The alignments were converted to bam files and were initially processed using Picard v2.1.0 to mark and remove duplicate reads (http://broadinstitute.github.io/picard). Base score recalibration and variant calling were performed using the HaplotypeCaller function of the Genome Analysis Toolkit (GATK, [35]) under the following parameters: quality of mapped read >30; base quality >20; more than five reads covering the base in every genotype; more than four reads covering the alternate base (relative to the reference used) in at least one genotype; a minimum alternate allele fraction of 0.4. SNP calls (a minimum of two alternate bases at a depth (dp) of 5 to call a heterozygous sample) and genotype assignments (AAAA, AAAB, AABB, ABBB or BBBB) were then made. All of these parameters needed to be met for a SNP to be called.
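The site-level read-depth criteria listed above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the `SampleCounts` type and `site_passes` function are hypothetical names, the counts are assumed to have already passed the mapping and base quality cut-offs, and the 0.4 alternate allele fraction is assumed here to apply to at least one genotype (the text does not state this explicitly).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SampleCounts:
    depth: int      # reads covering the base (already quality filtered)
    alt_reads: int  # of those, reads carrying the alternate base


def site_passes(samples: Dict[str, SampleCounts],
                min_depth: int = 5,
                min_alt_reads: int = 4,
                min_alt_fraction: float = 0.4) -> bool:
    """Return True if a candidate SNP site meets all depth criteria."""
    # More than five reads covering the base in every genotype.
    if not all(s.depth > min_depth for s in samples.values()):
        return False
    # More than four alternate reads in at least one genotype; the
    # alternate allele fraction is checked in that same genotype
    # (an assumption made for this sketch).
    return any(
        s.alt_reads > min_alt_reads
        and s.alt_reads / s.depth >= min_alt_fraction
        for s in samples.values()
    )
```

For example, a site covered by 10 reads (5 alternate) in one cultivar and 8 reads (0 alternate) in another would pass, whereas a site with only 5 reads in any cultivar would not.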
Due to the low occurrence of the Ry adg gene in the population used, the initial filtering parameters were removed so that SNPs with a low minor allele frequency in the population were included. A region of Chromosome XI was delimited with the existing markers RYSC3 and M45. SNPs of interest were obtained by identifying, within the delimited region, genotypes that were unique to the known resistant cultivars. Correlation of the SNP and virus resistance phenotypes was possible through the generation of marker-trait association proportions, enabling the identification of associated alleles and nucleotide bases.
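The selection of SNPs unique to resistant cultivars, and the marker-trait association proportion, can be sketched as follows. This is an illustrative sketch only: the function names, the dosage-coded genotype strings (with "B" taken as the alternate allele) and the toy cultivar and SNP identifiers are all hypothetical and are not data from this study.

```python
from typing import Dict, List, Set


def resistant_specific_snps(genotypes: Dict[str, Dict[str, str]],
                            resistant: Set[str]) -> List[str]:
    """SNPs whose alternate allele is carried by every resistant
    cultivar and by no other cultivar."""
    hits = []
    for snp_id, calls in genotypes.items():
        carriers = {cv for cv, gt in calls.items() if "B" in gt}
        if carriers == resistant:
            hits.append(snp_id)
    return hits


def association_proportion(calls: Dict[str, str],
                           resistant: Set[str]) -> float:
    """Proportion of cultivars whose marker call agrees with phenotype."""
    agree = sum(("B" in gt) == (cv in resistant) for cv, gt in calls.items())
    return agree / len(calls)


# Toy data: R1 and R2 are resistant, S1 and S2 are susceptible.
toy = {
    "snp_a": {"R1": "AAAB", "R2": "AAAB", "S1": "AAAA", "S2": "AAAA"},
    "snp_b": {"R1": "AAAB", "R2": "AAAA", "S1": "AABB", "S2": "AAAA"},
}
unique = resistant_specific_snps(toy, {"R1", "R2"})
```

In this toy set only `snp_a` is retained, with an association proportion of 1.0, while `snp_b` agrees with the phenotype in two of four cultivars.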

KASP Assay
Kompetitive Allele-specific PCR (KASP) assays were designed for one SNP in haplotypic configuration with the putative Ry adg gene. For each SNP, two forward allele specific primers and one reverse primer were developed using Sequencher 5.0 (Genecodes, Ann Arbor, MI, USA; Table 1). The developed primers were tested on a panel of 80 cultivars, for the majority of which the presence or absence of the Ry adg gene was known, as well as on 138 segregating progeny from two crosses (Lady Christl × La Ratte and Lady Claire × Friar), for which the status of individual progeny was unknown but in which the gene was expected to segregate. KASP genotyping assays were performed in a total volume of 5 µL containing 0.07 µL of primer mix (containing 12 µM of each allele specific forward primer and 30 µM of reverse primer), 2.5 µL of KASP™ 2x Master Mix (LGC Genomics, Teddington, UK) and 2.5 µL of normalised genomic DNA (10 ng/µL) according to the manufacturer's guidelines (LGC Genomics). Assays were run under the following cycling conditions: 94 °C for 15 min; 10 touchdown cycles of 94 °C for 20 s, 61-55 °C for 60 s (dropping 0.6 °C per cycle); 26 cycles of 94 °C for 20 s, 55 °C for 60 s. Detection of fluorescent samples was performed using a FLUOstar Omega microplate reader (BMG Labtech, Ortenberg, Germany) and cluster plot analysis was completed in KlusterCaller™ software (LGC Genomics). Alternatively, both fluorescence detection and cluster plot analysis were performed using CFX Manager (Bio-Rad, Hercules, CA, USA).
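The annealing temperatures implied by the touchdown programme can be written out with a short sketch. The function name and its parameters are hypothetical; it assumes that the first touchdown cycle anneals at 61 °C and that each subsequent touchdown cycle drops by 0.6 °C.

```python
from typing import List


def annealing_schedule(start: float = 61.0,
                       step: float = 0.6,
                       touchdown_cycles: int = 10,
                       hold: float = 55.0,
                       hold_cycles: int = 26) -> List[float]:
    """Annealing temperature (°C) for each PCR cycle, in order."""
    # Touchdown phase: temperature falls by `step` each cycle.
    temps = [round(start - step * i, 1) for i in range(touchdown_cycles)]
    # Remaining cycles are held at the final annealing temperature.
    temps += [hold] * hold_cycles
    return temps


schedule = annealing_schedule()
```

Under these assumptions the tenth touchdown cycle anneals at 55.6 °C, and the following 26 cycles at 55 °C, giving 36 cycles in total.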

Protein Characterisation
The putative resistant allele of the transcript PGSC0003DMT400043732 was identified through SNP identification and translated into a protein sequence to identify open reading frames using the ExPASy Translate Tool (https://web.expasy.org/translate/, accessed on 23 October 2020). The protein sequence for transcript PGSC0003DMT400043732 was initially characterised using InterProScan (EMBL-EBI; http://www.ebi.ac.uk/interpro/, accessed on 23 October 2020; [36]) for functional analysis, family classification and prediction of domains. A BLASTp of the protein sequence was also performed for identification of significant alignments (NCBI; https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 23 October 2020) and a protein model was developed using I-TASSER [37,38]. Alignment of the two sequences (putative resistant allele vs. susceptible) was performed using Clustal Omega (EMBL-EBI; [39]).

Results
The underlying DNA sequences of markers M45, M6 and RYSC3 were compared to the potato reference genome sequence using BLASTn, and all mapped to a specific region on Chromosome XI (Table 2). The maximum region that was delimited by the markers was included in the analysis. This area spanned position 1 to 1,826,346 bp. The region from the start of Chromosome XI to the M45 marker was of most interest (1-1,494,355 bp; Table 2) as M45 has higher predictive ability when compared to RYSC3. The M6 marker was detected within the sequence interval between M45 and RYSC3 and is located closer to the RYSC3 marker than to M45 (Table 2). Within the maximum region of 1,826,346 bp, 200 genes were identified by extracting all predicted and known coding sequences from the genome via the general feature format (GFF) file. From the available genotypic data, a total of 35,569 unfiltered SNPs were also putatively located within these genes in the candidate target region. Within the germplasm collection, 6 out of the 181 cultivars used to generate the sequence and SNP data had previously been phenotyped and genotyped as positive for Ry adg [25]. SNPs unique to these six cultivars were extracted by comparing the known resistant cultivars (Carlingford, Friar, Galil, Lady Christl, PO3 and Royal Blue) to all susceptible cultivars. From this, only seven SNPs were found to correlate with the PVY resistant phenotype, and five of those SNPs were contained within one gene (PGSC0003DMG402016981; gene model ID PGSC0003DMT400043732), located between 1,407,371 and 1,411,692 bp in the reference genome. Four of the five SNPs were identified as triallelic across the germplasm set used (Table 3). The candidate Ry adg gene containing the associated SNPs was annotated as a Bacterial spot disease resistance protein 4 (Figure 1).
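Extracting the genes that fall within a delimited interval from a GFF file can be sketched as below. This is a minimal illustration rather than the procedure used here: the function name, the sequence identifier `chr11` and the second gene record (`hypothetical_gene`) are assumptions, and only the standard nine tab-separated GFF columns are relied upon.

```python
from typing import Iterable, List


def genes_in_region(gff_lines: Iterable[str], seqid: str,
                    region_start: int, region_end: int) -> List[str]:
    """IDs of 'gene' features lying wholly inside the given interval."""
    ids = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[0] != seqid or cols[2] != "gene":
            continue
        start, end = int(cols[3]), int(cols[4])
        if start >= region_start and end <= region_end:
            # Attributes are 'key=value' pairs separated by ';'.
            attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
            ids.append(attrs.get("ID", ""))
    return ids


toy_gff = [
    "##gff-version 3",
    "chr11\ttoy\tgene\t1407371\t1411692\t.\t+\t.\tID=PGSC0003DMG402016981",
    "chr11\ttoy\tgene\t2500000\t2503000\t.\t-\t.\tID=hypothetical_gene",
]
found = genes_in_region(toy_gff, "chr11", 1, 1826346)
```

With the toy records above, only the candidate gene at 1,407,371-1,411,692 bp is returned for the 1-1,826,346 bp interval.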
The candidate gene identified is located within a cluster of disease resistance genes in the reference genome, with 6/8 genes between 1,390,274 and 1,430,902 bp being categorised with a similar annotation and function (Supplementary file 1). A triallelic SNP within the gene at genome position 1,410,649 bp (position 3279 of transcript PGSC0003DMT400043732) was selected for primer design for diagnostic assay development (Table 3).
An alignment (Clustal Omega) of the protein sequences translated from transcript PGSC0003DMT400043732 in resistant and susceptible cultivars revealed three amino acid changes at positions 1016, 1059 and 1060, all of which are downstream of the LRR region.
Table 3. The SNPs identified in this study that were tested as potential markers for the detection of Ry adg. From left to right: the transcript that they were identified from, the position they reside at within that transcript, the reference base, the alternative base, and the single nucleotide polymorphism (SNP) classification.
A KASP assay was developed for SNP3279, putatively associated with the Ry adg locus. The assay was screened across a collection of 80 breeding parents for validation of the SNP marker and included known resistant sources (Carlingford, Friar, Galil, Lady Christl, PO3 and Royal Blue) as well as Emma, which was known to incorrectly genotype as resistant with the M45 and RYSC3 assays but phenotypes as susceptible (Table 4). The data generated by the SNP3279 molecular marker were in complete agreement with the known phenotypic results. All known resistant cultivars genotyped as positive for the resistant source with the M45 marker and also phenotyped as resistant to PVY N. The cv. PO3 was the only known resistant source used in the study that genotyped as resistant using both the RYSC3 and the M45 marker (Table 4). The assay also agreed with the M45 marker data, with the exception that the cultivar Emma genotyped correctly as susceptible with the SNP3279 assay. In addition, SNP3279 genotyped a further four cultivars as containing the resistant allele (Granola, Suvi, Foxton and Maranca).
The complete set of results of phenotyping and genotyping for the collection of breeding parents are presented in Supplementary File 2.
The SNP3279 marker was also evaluated on novel progeny resulting from the crosses Lady Christl × La Ratte and Lady Claire × Friar in an attempt to identify allele dosage as well as to validate the molecular marker in novel segregating germplasm. From the Lady Christl × La Ratte cross, 93 progeny were evaluated. In total, 38/93 progeny were genotyped as resistant while the remaining 55/93 progeny were genotyped as susceptible. The observed segregation ratio was 1:1.4 and a chi-squared (χ²) analysis yielded a nonsignificant result for deviation from a 1:1 ratio. A total of 45 progeny from the Lady Claire × Friar cross were also evaluated. From this cross, 16/45 were genotyped as resistant while the remaining 29/45 were genotyped as susceptible. The observed segregation ratio was 1:1.8 and a χ² analysis also yielded a nonsignificant result for deviation from a 1:1 ratio. In both crosses, the resistant parents Lady Christl and Friar therefore appear to carry the allele in simplex dosage, as the observed segregation ratios in the offspring did not deviate significantly from the 1:1 ratio expected of a simplex by nulliplex cross.
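The goodness-of-fit test against a 1:1 ratio can be reproduced from the progeny counts reported above. The sketch below is an illustration, not the software used in the study: it computes the Pearson χ² statistic with one degree of freedom and no continuity correction, and obtains the p-value from the identity p = erfc(√(χ²/2)), which holds for one degree of freedom. The function name is hypothetical.

```python
import math
from typing import Tuple


def chi2_one_to_one(resistant: int, susceptible: int) -> Tuple[float, float]:
    """Chi-squared statistic and p-value for deviation from a 1:1 ratio."""
    expected = (resistant + susceptible) / 2
    chi2 = ((resistant - expected) ** 2 + (susceptible - expected) ** 2) / expected
    # Survival function of the chi-squared distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


christl_chi2, christl_p = chi2_one_to_one(38, 55)  # Lady Christl × La Ratte
friar_chi2, friar_p = chi2_one_to_one(16, 29)      # Lady Claire × Friar
```

With these counts the statistics are approximately 3.11 and 3.76, both below the 3.84 critical value for one degree of freedom at α = 0.05, in line with the nonsignificant results reported.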

Protein Characterisation
The putative resistant allele of the transcript PGSC0003DMT400043732, which has the potential to be the introgressed gene Ry adg, was translated into a protein sequence to identify open reading frames using the Translate Tool (ExPASy). The protein sequence for transcript PGSC0003DMT400043732 was initially characterised using InterProScan (EMBL-EBI) for functional analysis, family classification and prediction of domains. The main matches included toll/interleukin-1 receptor homology (TIR) domain superfamily, P-loop containing nucleoside triphosphate hydrolase, winged helix DNA-binding domain superfamily, leucine-rich repeat (LRR) domain superfamily, helical domain of apoptotic protease-activating factors, disease resistance protein signatures and other uncharacterised leucine-rich repeating domains. A BLASTp of the translated protein sequence was also performed for identification of significant alignments. A Tobacco mosaic virus (TMV) resistance protein (Accession number Q40392.1) produced the most significant alignment to transcript PGSC0003DMT400043732 with 95% query coverage and an E value of 0.
Table 4. Cultivars identified that have exhibited a dissociation event between a marker and Ry adg. From left to right: the cultivar being evaluated; the result for RYSC3; the result for M45; the result for SNP3279; the phenotypic result from being challenged with PVY N; the resistance classification pulled from the European potato databases (SASA, IPK, NIVAP). For each marker, "-" indicates a negative result and "+" indicates a positive result. If no resistance classification was present, the classification of the parents of the cultivar was provided. Validation of the KASP SNP marker was performed using parental material positive for Ry adg and Emma, negative for Ry adg.

Discussion
Advancements in genotyping technologies have removed the need for anonymous marker systems and molecular marker assays have coalesced to SNP variants and markers. This standardisation has significant benefits including the ability to transfer assays across platforms (target capture assays, SNP chips, genome sequencing) as well as the scalability to simultaneously screen for multiple genetic traits using a wide spectrum of multiplexing strategies. For applications in genomic selection (GS), routine genotyping using SNPs that are causal variants for traits will improve the selection power of the markers. The ability to rapidly develop and resolve GS issues relating to historic molecular marker assays has now been enabled in potato through the delivery of the reference genomic sequence along with large collections of known sequence variants [28][29][30][31]33].
Despite the introduction of GS and its initial development in potato, the application of marker assisted selection on simple traits, in some circumstances can still be of value and can be a cost-effective solution. For species like potato with complex genetic inheritance, marker assisted selection is still an important tool to increase selection accuracy of specific genotypes and ultimately reduce the time required to deliver new cultivars. For the most beneficial long-lived potato cultivar, robust forms of disease resistance are of most value, if the mechanism of resistance is durable. As potato has a wide range of traits that have to meet stringent production needs, initial selection of key traits that are under simple genetic control could remove unwanted genotypes from the breeding process prior to more costly genome-wide genotyping and GS.
This study aimed to exemplify and utilise the benefits of existing genetic and genomic resources in potato, through combining SNP genotypic data with the reference genome sequence for the development of new improved diagnostic markers linked to Ry adg . The identification of SNP markers in a high throughput genotyping assay like KASP can validate such markers in a rapid cost-effective manner. The M45 and RYSC3 markers are the most commonly used molecular markers for detection of the PVY resistance gene, Ry adg , and have been effective to some degree in selecting PVY resistant germplasm. However, errors have occurred in typing germplasm and these errors in genotyping are primarily due to the distance between the diagnostic marker and PVY resistance gene. This was exemplified with cv. Emma which contains both the M45 and RySC3 marker but is susceptible to PVY infection. The three PVY resistance markers (M45, M6 and RYSC3) were identified on Chromosome XI at positions 1,494,355, 1,708,102 and 1,826,029, respectively. M6 is a more recently described marker for PVY resistance [27] and has not been screened against a large population of phenotypes. Given that this marker is located on Chromosome XI in-between M45 and RYSC3 and both of these markers have lost linkages with Ry adg [25], it is unlikely that the M6 assay will be more robust.
The presence and location of the Ry adg gene is a result of a historic introgression from S. tuberosum ssp. andigena [40,41]. Because of this introgression event, the region of Chromosome XI on which this gene is located is expected to contain novel variation and haplotypes compared to the more conventional potato genome. As the reference genome was derived from a S. tuberosum Phureja clone, it is also expected that the reference sequence does not contain the Ry adg gene. Given the location of the M45 and RYSC3 markers towards the end of Chromosome XI, and the greater number of linkage breaks between RYSC3 and the PVY resistance trait, as demonstrated by cvs. Carlingford, Friar, Galil, Lady Christl and Royal Blue, it was surmised that the actual Ry adg gene was upstream of both markers. In addition, the cv. Emma genotyped as resistant with both markers, whilst phenotyping as susceptible [24,25], indicating that the break in linkage had occurred between these markers and the Ry adg gene.
A list of genes in the region has been supplied that were available on PGSC (Supplementary File 1). To shorten this list, SNPs unique to resistant cultivars in the delimited region were extracted and positioned on the genome. Five SNPs between 1,407,371 and 1,411,692 bp were found in a "Bacterial spot resistance gene" (PGSC0003DMG402016981; gene model ID PGSC0003DMT400043732), which was located close to the M45 marker. This gene showed typical R gene qualities such as leucine-rich repeats and nucleotide binding sites [42][43][44] while also residing in a large cluster of disease resistance genes. Out of the five SNPs located within the resistance gene, four were triallelic, suggesting the gene was highly variable or has arisen from a highly divergent haplotype or is the result of a local gene duplication in a specific lineage that is not present within the reference genome. Presence absence variation has been well documented for disease resistance in plants [45]. As the Ry adg has resulted from a wild relative introgression into the potato genome, it is also possible that the cluster of introgressed genes could contain a large cohort of genes that act together and strongly affect disease resistance within the plant. It is suggested that any gene that is present within this cluster could potentially be Ry adg but SNP3279 is likely to hold a strong association due to the effect an introgressed cluster has. Wild introgressions in other plant species as well as specific examples in potato have observed reduced recombination rates within introgressed genetic fragments due to highly divergent genetic sequences and other characteristics [46][47][48][49]. As a result, a genetic marker within the introgressed region is likely to hold association with the trait as a diagnostic assay in a robust manner whilst not necessarily having to be the causative variant. 
In order to unambiguously determine the Ry adg gene and identify the exact genes that were introgressed, functional genomic knockout studies are required.
KASP technology was chosen in an attempt to evaluate SNP copy number and so provide the dosage of the resistant allele in parental material. Allele dosage counts are valuable to potato breeding as the autotetraploid nature of the potato generates complex patterns of inheritance that can dictate the size of the population that has to be screened to remove unwanted plants [12]. The KASP assay described here can accurately distinguish between nulliplex and simplex; however, higher dosage levels have not knowingly been evaluated and would require further evaluation. Segregation of Ry adg was examined in populations of unknown genotype and, although the size of the populations was limiting, segregation and allele dosage were confirmed, making it possible to accurately designate allele dosage to the parental cultivars.
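The link between parental allele dosage and the expected segregation ratio can be illustrated by enumerating gametes. The sketch below is a simplified model and not part of the study's analysis: it assumes an autotetraploid parent crossed to a nulliplex parent, random chromosome segregation without double reduction, and a dominant resistance allele. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def resistant_progeny_fraction(dosage: int, ploidy: int = 4) -> Fraction:
    """Expected fraction of progeny carrying at least one resistance
    allele when a parent of the given dosage is crossed to a nulliplex."""
    # Label each homologous chromosome as resistant (True) or not.
    carries = [True] * dosage + [False] * (ploidy - dosage)
    # Each gamete receives half of the chromosomes, chosen at random.
    gametes = list(combinations(range(ploidy), ploidy // 2))
    with_allele = sum(any(carries[i] for i in g) for g in gametes)
    return Fraction(with_allele, len(gametes))


simplex = resistant_progeny_fraction(1)
duplex = resistant_progeny_fraction(2)
triplex = resistant_progeny_fraction(3)
```

Under these assumptions a simplex parent gives half resistant progeny (1:1), a duplex parent gives five sixths (5:1), and a triplex parent gives all resistant progeny, which is why segregation ratios can be used to infer parental dosage.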
The newly designed KASP assay for the putatively associated Ry adg variant, SNP3279, accurately genotyped the parent cultivars that did or did not contain the PVY resistance gene, Ry adg, and corrected the calls for cultivars that had previously been genotyped incorrectly due to recombination events between M45 and Ry adg (Table 4). As the SNP used for the primer design is triallelic, the marker assay developed was specifically targeted to identify the base associated with the resistance allele, while also assaying one of the other two possible nucleotides. As the assay technology only works in a biallelic manner, there is the possibility that some cultivars would genotype with a missing allele and may appear as missing data, whilst being accurately genotyped. While the SNP3279 marker accurately repeated the genotyping results for known resistant cultivars, there were new cultivars (Foxton, Maranca, Suvi and Granola) that putatively had resistance based on the presence of SNP3279 but that had not been phenotyped previously. Although these cultivars had not been previously phenotyped, they had been assigned as susceptible through genotyping with the historic markers and so represent a potential mislabelled source of PVY resistance. Upon examining the pedigrees of these cultivars, most had been rated as resistant or were developed from known resistant parents (Table 4). It is likely that these genotyping errors are due to recombination events occurring between the previously used markers and the Ry adg gene, as is the case with cv. Emma. In the case of cv. Granola, for example, its pedigree includes breeding lines known to carry Ry adg, and cv. Granola has been reported to show varying levels of resistance to PVY (Table 4).
Protein characterisation of transcript PGSC0003DMT400043732 showed classic disease resistance protein components when characterised through InterProScan (EMBL-EBI). Of most interest were: the identification of Toll/Interleukin receptors known to mediate cell death and disease resistance pathways across all kingdoms [50]; a P-loop containing nucleoside triphosphate hydrolase where reactions catalysed by enzymes cause hydrolysis of bound nucleotide triphosphates (NTPs) [51]; a winged helix DNA-binding domain superfamily responsible for DNA recognition through contact [52]; and a leucine-rich repeat (LRR) domain superfamily responsible for detection of pathogens. The majority of currently identified disease resistance genes in plants contain nucleotide binding (NB) and LRR domains [53], and through in silico characterisation in this study it appears that the gene encoded by transcript PGSC0003DMT400043732 falls in this group. The protein sequences of the resistant and susceptible cultivars showed 99.73% identity with three amino acid changes at positions 1016, 1059 and 1060 of this transcript. At these positions the resistant transcript contains Arginine (Arg), Glutamic Acid (Glu) and Alanine (Ala), respectively, whereas the susceptible transcript contains Glutamine (Gln) at position 1016 and Aspartic Acid (Asp) at positions 1059 and 1060. The amino acid changes at positions 1016 and 1059 are considered to be minor, with a strong similarity between the substituted amino acids. The most extreme change is at position 1060, where Ala (a nonpolar/aliphatic amino acid) is replaced with Asp (a polar, negatively charged amino acid).

Conclusions
The aim of this study was to combine the genomic resources that have been developed for Solanum species, including large genotypic and variant datasets, with historic phenotypic records to identify genes directly associated with PVY resistance. By combining these new and established resources, this study demonstrates the utility of applied genomics in potatoes to further enhance precision breeding. We identified a gene showing strong characteristics similar to known viral resistance genes and developed a simple diagnostic assay to screen unknown samples that accurately detects its presence in potato germplasm. However, because a haploid reference genome was used, significant differences between the reference and the wild introgressed fragment are likely to be present. To further characterise the region containing Ry adg, long read DNA sequencing would unequivocally define the gene content of the targeted region, enabling detailed gene knockout studies for absolute clarity. Alternatively, extensive use of the marker in an applied breeding setting, where the marker would undergo a robust evaluation, could confirm the predictive diagnostic nature of the marker or identify limitations that would indicate a need for further studies.