Molecular Characterisation of a Supergene Conditioning Super-High Vitamin C in Kiwifruit Hybrids

During analysis of kiwifruit derived from hybrids between the high vitamin C (ascorbic acid; AsA) species Actinidia eriantha and A. chinensis, we observed bimodal segregation of fruit AsA concentration suggesting major gene segregation. To test this hypothesis, we performed whole-genome sequencing on pools of hybrid genotypes with either high or low AsA fruit. Pool-GWAS (genome-wide association study) revealed a single Quantitative Trait Locus (QTL) spanning more than 5 Mbp on chromosome 26, which we denote as qAsA26.1. A co-dominant PCR marker was used to validate this association in four diploid (A. chinensis × A. eriantha) × A. chinensis backcross families, showing that the A. eriantha allele at this locus increases fruit AsA levels by 250 mg/100 g fresh weight. Inspection of genome composition and recombination in other A. chinensis genetic maps confirmed that the qAsA26.1 region bears hallmarks of suppressed recombination. The molecular fingerprint of this locus was examined in leaves of backcross validation families by RNA sequencing (RNASEQ). This confirmed strong allelic expression bias across this region as well as differential expression of transcripts on other chromosomes. This evidence suggests that the region harbouring qAsA26.1 constitutes a supergene, which may condition multiple pleiotropic effects on metabolism.


Introduction
Kiwifruit cultivars of Actinidia chinensis are known as a rich source of dietary vitamin C (AsA). However, the related species A. eriantha has AsA concentrations in its fruit of up to 800 mg/100 g fresh weight but has small fruit with a bland flavour [1]. Recently a large-fruited high AsA A. eriantha cultivar ('White') has been described [2]. If this high concentration could be transferred by crossing to more palatable kiwifruit species, an ultra-high health fruit could be developed. The availability of high-quality genome sequences for A. eriantha [3] as well as A. chinensis var. chinensis [4,5] provides the basis for functional and genetic approaches to aid such introgression.
The dominant pathway of AsA biosynthesis in Actinidia species including A. eriantha appears to be the L-galactose pathway [1], with AsA biosynthesis occurring early in fruit development and then declining. Control of this pathway lies in the early committed steps catalysed by the enzymes GDP-galactose phosphorylase (GGP) and GDP-mannose epimerase (GME), with some input from GDP-mannose pyrophosphorylase (GMP) [6,7]. Transformation of plants to over-express GGP results in a several-fold increase in fruit or tuber ascorbate [8], and over-expression of GME, which by itself has little effect, synergistically increases ascorbate yet further [9]. Oxidised ascorbate is also reduced by several enzymes which have been implicated in controlling ascorbate concentrations, as have a range of transcription factors and other regulators [7]. In addition, the upstream open reading frame of the GGP gene has a role in controlling translation of GGP and thus ascorbate concentration, forming a feed-back control loop in response to elevated ascorbate [9]. Thus, a complex of enzymes and regulators controls ascorbate concentration in plants, any of which may explain why A. eriantha has such a high ascorbate concentration.
In both apple [10] and tomato [11], QTL mapping has successfully identified candidate genes for regulation of ascorbate content. In this paper we analyse the genetic basis of the high ascorbate content of A. eriantha by studying crosses between A. eriantha and other Actinidia species, and locate the chromosomal region conditioning super-high ascorbate levels in A. eriantha.

Pooled Whole-Genome Sequencing and Genome-Wide Association Study (GWAS)
Quantitative HPLC analysis of AsA levels in fruit harvested from tetraploid hybrid Actinidia backcross populations revealed evidence for bimodal segregation in all families as well as differences in family medians (Figure 1). The parents of these populations were selected from crosses between hexaploid A. chinensis var. deliciosa and diploid A. eriantha, and between hexaploid A. chinensis var. deliciosa and diploid A. chinensis var. chinensis (Figure 2). Because of the complex polyploid genome composition of these populations and the observation of bimodal segregation suggesting a major gene, we conducted genetic analysis by pooled whole-genome sequencing, exploiting the availability of draft genome assemblies of A. chinensis var. chinensis as reference [4,5]. Since there was wide variation for fruit weight and this was not correlated with AsA levels, we constructed pools by both traits to enable orthogonal, replicated tests of allele frequencies for each trait (Figure 3).
Small-insert paired-end Illumina sequencing over two lanes yielded 965,452,550 reads with 92.8% Q30, and 86% of reads mapped to the Red5 PS1.1.68.5 pseudomolecules. Pool-GWAS (genome-wide association study) scans performed on both normalised and non-normalised read count data using Popoolation2 [12] revealed a single major QTL for AsA content on chromosome 26 (Figure 4), but no significant associations with fruit weight (data not shown). Closer inspection of the chromosome 26 region and windowed analysis using QTLseqR [13] revealed a broad distribution of significant scores for AsA on chromosome 26 (Figure 5; Table A1). Single nucleotide polymorphisms (SNPs) showing association with pool AsA were observed over an interval of 7 Mbp. We denote this major QTL as qAsA26.1.

Validation in Diploid Backcross Populations
To validate this association, we designed a set of high-resolution melting (HRM) assays (Table 3) of the associated variants on chromosome 26 which were homozygous in the low AsA pools (Table A1). Marker KCH00062, targeting the polymorphisms at 7647158-7647167 bp, exhibited agreement with pool AsA levels in 78/80 samples used to construct sequencing pools. A two-way ANOVA model of fruit AsA concentration showed that marker dosage and paternal family explained 79% (p < 2 × 10^−16) and 10% (p = 1.21 × 10^−7), respectively, of total variance. This marker was evaluated in a further six diploid backcross families: three (A. eriantha × A. chinensis) × A. chinensis and three (A. chinensis × A. eriantha) × A. chinensis (Figure 6). The maternal parent 11-06-16e of the EACK2 family used by Fraser et al. [14] was homozygous for the A. chinensis var. chinensis allele and the family did not have any high AsA (>400 mg/100 g FW) fruit. In the AI247 and AJ247 families, ANOVA indicated that the marker explained 78% of the phenotypic variance and residual analysis revealed 3/196 (1.5%) recombinants. The presence of the A. eriantha allele is associated with an increase in AsA content of approximately 250 mg/100 g FW.
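As a rough illustration of what "proportion of variance explained" by a marker means, the following Python sketch computes eta-squared (between-group sum of squares divided by total sum of squares) for a single factor. It is a simplification, not the analysis reported here: the reported model was a two-way ANOVA that also included paternal family, and the genotype labels and AsA values below are invented for illustration only.

```python
from collections import defaultdict


def variance_explained(groups, values):
    """Return eta-squared: between-group SS / total SS for one factor."""
    if len(groups) != len(values) or not values:
        raise ValueError("groups and values must be non-empty and of equal length")
    grand_mean = sum(values) / len(values)
    by_group = defaultdict(list)
    for group, value in zip(groups, values):
        by_group[group].append(value)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return ss_between / ss_total


# Hypothetical data: marker genotype (CC = homozygous A. chinensis allele,
# CE = heterozygous for the A. eriantha allele) and fruit AsA in mg/100 g FW.
genotypes = ["CC", "CC", "CC", "CE", "CE", "CE"]
asa = [100.0, 120.0, 110.0, 350.0, 370.0, 360.0]
eta_sq = variance_explained(genotypes, asa)
print(f"Proportion of variance explained by marker: {eta_sq:.3f}")
```

With real data, a dedicated statistics package would normally be used so that additional factors such as family can be fitted in the same model.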
Additional HRM markers were evaluated from targets in the 8.2-8.5 Mb interval and three informative markers were identified targeting SNPs at 8,193,148, 8,453,577 and 8,874,229 bp. The marker at 8,453,577 bp exhibited 10% recombination in the tetraploid families, but the others exhibited complex segregation patterns and could not be scored. Efforts to design further co-dominant HRM markers in the 0-7 Mbp region were unsuccessful, suggesting that other marker types may be better suited to these highly heterozygous polyploid hybrids.

Genome Architecture of Actinidia Chromosome 26
Inspection of chromosome 26 repeat density and recombination estimates from genetic mapping [15] shows that the location of qAsA26.1 coincides with the boundary of a region with high repeat density and lower recombination (Figure 7). Alignment of the chromosome 26 pseudomolecules from the assemblies of A. chinensis 'Red5' [4] and A. eriantha 'White' [3] indicates that these are highly collinear apart from differences in the terminal repeat-rich region (Figure A1).

Characterising the qAsA26.1 Introgression in Leaf Tissues
To better characterise the qAsA26.1 introgression we compared the leaf transcriptome and metabolome of low and high AsA progeny in the diploid A. chinensis × A. eriantha backcross AI247 and AJ247 families used for marker validation. We chose analysis of leaf tissues for ease of reproducible sampling and because it has been shown that A. eriantha also exhibits very high leaf AsA levels [1]. Zhang et al. [16] have reported segregation for leaf AsA content in the cross between hexaploid A. chinensis var. deliciosa and a diploid A. eriantha × A. chinensis var. chinensis. We confirmed by HPLC that leaf AsA levels were higher in samples of immature leaves from backcross progeny carrying the introgression (ACH0007 homozygotes 10.4 mg/100 g FW versus 25.3 mg/100 g FW in heterozygotes; p < 0.025 by t-test). These analyses were performed on tissue samples collected in RNALater without the precautions necessary for good preservation of AsA, and the values are therefore lower than previously observed [1].

Pooled RNASEQ
RNASEQ was performed on three pools of backcross progeny with high fruit AsA which were heterozygous for the introgression and three pools of low AsA progeny lacking it, yielding 21.4-24.4 million reads per library. To determine the patterns of allelic expression on chromosome 26 we performed read assignment using PolyCat [17] based on a set of SNPs identified between A. chinensis and A. eriantha. This revealed that A. eriantha reads were essentially absent in low AsA pools across the first 10 Mbp of chromosome 26 (Figure 8A), providing additional genetic evidence that recombination is strongly suppressed in this region. File 1). Of these, 82 mapped to the qAsA26.1 region of chromosome 26 (Figure 8B), of which 61 mapped to annotated gene models. Because of the degree of allelic divergence between the two Actinidia species, transcript analysis based on the de novo assembly we used would be expected to frequently reveal novel alleles or splice variants absent in A. chinensis.
Significant expression differences were observed for 31 transcripts mapping to chromosomes other than chromosome 26, of which 26 mapped to annotated gene models (Table 1). These include one DET of particular interest. The transcript TRINITY_DN123292_c0_g2_i2 maps to the gene model Acc20170.1 (GenBank: PSS04323.1) encoding a putative GDP-L-fucose synthase 2 homologous to Arabidopsis GER2 (KEGG K02377; [18]). We previously reported significantly higher expression of GER in A. eriantha compared to A. chinensis [1; Figure 4, Panel D] (Authors note: This panel is mis-labelled as GMD). Because fucose synthesis draws upon the same substrate pool as ascorbate [6,7], this may have implications for regulation of mannose channelling to ascorbate. The observation of association between qAsA26.1 genotype and transcript expression at this locus suggests that qAsA26.1 contains transcriptional regulators of carbohydrate metabolism. DETs annotated as beta-glucosidase (Acc03845.1) and pectin acetylesterase (Acc29080.1) were also observed. A further DET of potential functional relevance to AsA metabolism is TRINITY_DN120596_c0_g1_i7 mapping to Acc29025.1 on chromosome 26. This gene is annotated as a component of the dolichol-phosphate mannose synthase complex which mediates mannosylation of glycans [19]. Similar associations between competing carbohydrate metabolic pathway expression and fruit AsA have been reported in studies of tomato interspecific introgressions [20] and ripening [21].
Differential expression was also observed for transcripts homologous to laccase (Acc02955.1) and anthocyanidin reductase (Acc09639.1) mapping to chromosomes 3 and 8, respectively. The differentially expressed transcripts identified on chromosome 26 include both structural genes (Acc29585.1, 4-coumarate CoA ligase; Acc29568.1, Shikimate O-hydroxycinnamoyltransferase) and transcriptional regulators of polyphenol metabolism (Acc18102.1, AtMyb4 homolog). Collectively these observations suggest that polymorphism at qAsA26.1 could exert a direct or indirect influence on polyphenol metabolism. Over-expression of GGP in tomato and strawberry not only increased ascorbate but also increased flavonoids and phenylpropanoids [8]. Further evidence for cross-talk comes from studies of Arabidopsis vtc mutants, which have shown that these are also impaired in transcriptional regulation of anthocyanin synthesis [22].

Untargeted Metabolomics
Liquid chromatography-MS analysis of leaf extracts revealed some evidence for more frequent occurrence of elevated levels of flavonoids and phenylpropanoids in progeny carrying the A. eriantha marker allele (Table 2). These data are from a single time point in an orchard environment and we expect they would be highly influenced by local variability in infection by the pandemic Pseudomonas syringae pv. actinidiae [15,23]. Targeted metabolomic analyses of fruit and vine tissues with standards, especially for antioxidants and key carbohydrates, are desirable to better characterise the phenotype of qAsA26.1 alleles.

Discussion
This study confirms the findings in other horticultural crops such as Cucumis [24,25] that pooled sequencing offers a cost-effective and practical means to conduct genome scans in segregating plant populations. Since restricted recombination will preclude further genetic dissection of the locus, the application of more sophisticated RNASEQ strategies for analysis of differential transcript usage and QTL [26] would enable a more detailed dissection of allelic and splice variation in future studies. Since our preliminary evidence suggests the potential for complex pleiotropic effects, more detailed metabolic profiling would be desirable.
The qAsA26.1 QTL is notable for its large effect, size and simple dominant inheritance. Although large effect QTL (>20%) for AsA levels have been reported in other fruits such as apple [27] and tomato [20,21,28], this QTL leads to AsA levels an order of magnitude higher. The structural, linkage and expression data presented here suggest that this QTL constitutes a supergene, that is, a group of tightly linked loci inherited as a single Mendelian locus [29]. Supergenes commonly exert multiple pleiotropic effects and may be key to preserving adaptive variation through protecting a haplotype comprising multiple genes [30]. The qAsA26.1 region bears many similarities to the partially differentiated Actinidia sex chromosome (chromosome 25; [31]). Whereas A. chinensis is widely distributed in eastern lowland China, A. eriantha is restricted to southeastern China [32]. Because AsA can play multiple functional roles in higher plants, including as a key redox signal in responses to biotic and abiotic stresses [33], it may be speculated that this extended haplotype has been preserved due to its benefits to adaptive fitness. More detailed functional analysis of the genes lying within qAsA26.1 may permit testing whether the locus action is due to a single as opposed to multiple linked regulators [34].
The simple inheritance and large effect of this QTL offer some interesting opportunities not only for plant breeding but also for studies of AsA in human and plant physiology. Our findings suggest that practical genetic markers may be easily obtained and applied due to limited recombination and that these could be used to develop breeding lines fixed for high AsA alleles of qAsA26.1. The availability of the 'White' genome assembly will greatly simplify design of allele-specific markers that can be applied in highly heterozygous and polyploid backgrounds.
Selecting lines with comparable eating qualities expressing 'normal' or 'super-high' AsA could provide unique materials for human dietary studies. Similarly, the ability to obtain both male and female vines with significantly different AsA content in vegetative tissues would allow replicated testing of hypotheses concerning the role of AsA in plant adaptation and fitness. In addition to marker-based methods, we hope that use of such materials may facilitate discovery of new targets for improvement of AsA levels that are transferable to other crops [35,36].

Ascorbate Analyses
Three whole fruit per seedling were analysed. Each fruit was cut equatorially as a 1 mm slice using a double-bladed knife. The three fruit slices were immediately placed in a plastic 15 mL tube and frozen in liquid nitrogen, then stored at −80 °C until analysis. Fruit were then thawed and centrifuged at 4000× g to separate solid material from the juice. It was critical to freeze the fruit before analysis, as directly centrifuged fruit juice gave a much lower ascorbate reading. A 0.1 mL aliquot of the juice was then transferred to a micro tube containing 0.9 mL of 0.8% w/v metaphosphoric acid, 2 mM EDTA and 2 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP HCl). These samples were then centrifuged at 14,000× g for 15 min to clarify the juice and then analysed by HPLC using a rocket column (Altima C18, 3 micron, from Phenomenex Ltd., Auckland, New Zealand) at 35 °C. Ascorbate was quantified by injecting 5 µL into a Dionex Ultimate® 3000 Rapid Separation LC system (Thermo Scientific). Instrument control and data analysis were performed using Chromeleon v7.2 (Thermo Scientific). Solvent A was 5 mL methanol, 1 mL 0.2 M EDTA pH 8.0 and 0.8 mL o-phosphoric acid in 2 L. Solvent B was 100% acetonitrile. The flow was 1.0 mL/min and the linear gradient started with 100% A; B was increased to 30% at 4.5 min, then to 90% at 6 min. The column was then washed with 100% B and returned to 100% A. The column was monitored at 245 nm and ascorbate was quantified by use of authentic standards. Ascorbate was verified by its UV spectrum. This method gave the sum of oxidised and reduced ascorbate, namely total ascorbate. Ascorbate concentration in the juice was calculated directly and in preliminary assays compared to ascorbate extracted from powdered flesh. The juice method gave about a 5% higher result than the powdered whole fruit method.
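The quantification step above can be sketched in Python as follows. This is only an illustration under stated assumptions: the standard concentrations, peak areas and units (mg/100 mL of juice) are hypothetical, and a simple linear standard curve is assumed. The one value taken from the method is the 10-fold dilution that results from adding 0.1 mL of juice to 0.9 mL of extractant.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a standard curve."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# From the method: 0.1 mL juice added to 0.9 mL extractant.
JUICE_ML = 0.1
EXTRACTANT_ML = 0.9
DILUTION = (JUICE_ML + EXTRACTANT_ML) / JUICE_ML  # 10-fold

# Hypothetical authentic standards: concentration (mg/100 mL) and peak area
# at 245 nm (arbitrary units). These are NOT values from the study.
std_conc = [0.0, 5.0, 10.0, 20.0, 40.0]
std_area = [0.0, 10.0, 20.0, 40.0, 80.0]
slope, intercept = fit_line(std_conc, std_area)


def juice_ascorbate(peak_area):
    """Convert a sample peak area to ascorbate in the undiluted juice."""
    injected_conc = (peak_area - intercept) / slope
    return injected_conc * DILUTION


print(f"{juice_ascorbate(50.0):.1f} mg/100 mL juice")
```

Converting a juice concentration to mg/100 g fresh weight would need a further assumption about juice density or yield, which is not specified here.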

Library Preparation
DNA was isolated from leaf bud tissue collected in spring 2015 using a cetyl trimethylammonium bromide extraction method [37] followed by purification with Qiagen columns and quantitated using the 3500 Genetic Analyzer (Applied Biosystems™, Foster City, CA, USA). Four normalised DNA pools of 20 individuals each were created as shown in Table 3. Small-insert Thruplex DNA-seq libraries (Rubicon Genomics Ltd.) were synthesised at NZ Genomics Ltd. and sequenced on two lanes of Illumina Hi-Seq 2500, yielding 965 million reads totalling 120 Gbp with 92.8% >Q30. Quality control using FastQ Screen (http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/) revealed that 85-88% of reads mapped to the Red5 A. chinensis var. chinensis reference version PS1 1.68.5 and 6% mapped to the Actinidia chloroplast reference [38]. Read data were deposited as Genbank SRA accession PRJNA551536.

Pooled GWAS and Variant Analysis
Pool-GWAS scans for association of individual SNPs with AsA and fruit weight were performed using Popoolation2 [12]. Variants were summarised using samtools pileup (flags -B -Q 0) and called using popoolation mpileup2sync.jar with option -min-qual 20. Replicated contingency tests were initially performed on non-normalised data over AsA concentration and fruit weight strata using the cmh-test.pl script (flags -min-count 6 -min-coverage 4 -max-coverage 120 -max-coverage 200 -method withreplace). To facilitate comparison over sites with varying coverage, common odds ratios were calculated for significant SNPs using R mantelhaen.test.
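To make the replicated contingency test concrete, the following Python sketch computes a Cochran-Mantel-Haenszel statistic and Mantel-Haenszel common odds ratio for one SNP across strata of 2 × 2 tables. It is a minimal illustration, not the Popoolation2 or R implementation: the layout (rows are high and low AsA pools, columns are reference and alternate allele read counts, strata are the two fruit weight classes) is an assumed arrangement, and the read counts are invented.

```python
import math


def cmh_test(tables, correct=True):
    """Cochran-Mantel-Haenszel test over strata of 2x2 tables [[a, b], [c, d]].

    Returns (chi-square statistic, p-value on 1 df, common odds ratio).
    """
    diff_sum = 0.0  # sum over strata of observed a minus expected a
    var_sum = 0.0   # sum over strata of the hypergeometric variance of a
    or_num = 0.0    # sum of a*d/n
    or_den = 0.0    # sum of b*c/n
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # stratum too small to contribute
        diff_sum += a - (a + b) * (a + c) / n
        var_sum += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    if var_sum == 0:
        raise ValueError("no informative strata")
    delta = abs(diff_sum)
    # Continuity correction of 0.5, applied only when |delta| is at least 0.5.
    yates = 0.5 if (correct and delta >= 0.5) else 0.0
    stat = (delta - yates) ** 2 / var_sum
    # Upper tail of chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    odds_ratio = or_num / or_den if or_den > 0 else float("inf")
    return stat, p_value, odds_ratio


# Hypothetical allele read counts at one SNP (not data from this study).
tables = [
    [[10, 30], [38, 2]],  # high fruit weight stratum: high AsA pool, low AsA pool
    [[12, 28], [36, 4]],  # low fruit weight stratum: high AsA pool, low AsA pool
]
stat, p_value, odds_ratio = cmh_test(tables)
print(f"CMH chi-square = {stat:.2f}, p = {p_value:.2e}, common OR = {odds_ratio:.4f}")
```

Because the odds ratio is a ratio of count products, it is less sensitive to differences in read depth between sites than the raw test statistic, which is the motivation given above for reporting it.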
Subsequent analyses focused on the genic regions using data resampled with replacement to a read depth of 40 using subsample-synchronized.pl (flags -target-coverage 40). Cochran-Mantel-Haenszel (CMH) test p-values were adjusted for multiple testing using R p.adjust with the Benjamini and Hochberg correction. Output files for CMH tests are available as supplementary material at 10.5281/zenodo.1309045.
To complement the SNP-based analysis, windowed scans for AsA QTL were performed by Next Generation Sequencing Bulked Segregant Analysis (NGS-BSA) [43] using the R package QTLseqr [13]. Input files were generated from VCF files separately for high and low fruit weight samples using samtools bcftools (http://www.htslib.org/doc/bcftools.html), filtering on a set of fixed polymorphisms (file PS1_EA_specific_SNPs.csv.gz in 10.5281/zenodo.3257749) identified between a set of A. chinensis genotypes [31] and A. eriantha using Bambam intersnp [44]. Two pairs of bulks were compared: High AsA/High Fruit Weight versus Low AsA/High Fruit Weight (pools 1 and 3) and High AsA/Low Fruit Weight versus Low AsA/Low Fruit Weight (pools 2 and 4) (Table 2). QTLseqr accepts two population types, F2 and RIL. However, the lines used for constructing these pools were the result of backcrosses, with the SNP data filtered to collect only alleles segregating in EA; the function simulateAlleleFreq (https://rdrr.io/github/bmansfeld/QTLseqR/man/simulateAlleleFreq.html) was therefore modified to permit analysis of a backcross population type (called BC4x) where the expected allele frequency was 0:0.25 and the expected segregation ratio was 1:1. NGS-BSA analysis was conducted to estimate QTL locations based on allele frequency differences among the pairs of pools. SNPs from all 29 chromosomes were analysed in each single analysis. The population type was set to BC4x, the window size was 1 Mbp, and the simulations were bootstrapped 10,000 times. The FDR threshold was set to p < 0.001 based on adjustment by the method of Benjamini and Hochberg [45].
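For readers unfamiliar with the Benjamini and Hochberg adjustment referred to above, the following Python sketch shows the standard step-up calculation of adjusted p-values. It is intended to mirror the behaviour of R p.adjust with method "BH" for complete input, but it is an illustrative reimplementation rather than the code used in the analysis; the p-values in the example are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so that a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four SNPs.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

A SNP or window is then declared significant when its adjusted value falls below the chosen FDR threshold.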
For downstream analysis, SNPs and indels were called with the frequentist variant caller Varscan2 v2.4.2 [46], using a hard filter for MAF >0.1, minimum coverage 20 and only reporting sites called in all four pools. VCF files are available at 10.5281/zenodo.1309045.

PCR Marker Design for Validation
Filtered SNP loci detected by Popoolation2 cmh-test.pl which were homozygous in low AsA pools were used as targets for HRM primer design (Table 4) using the script https://github.com/PlantandFoodResearch/pcr_marker_design/blob/master/design_primers.py [50]. PCR amplification and HRM analysis on a Roche LightCycler 480 were performed as described previously [50].

Sample Collection and Processing
Tissue was sampled in October 2016 between 11 am and 1 pm from young leaves (3-5 cm) of the AI247 and AJ247 families used for marker validation and placed in RNAlater (Sigma-Aldrich Co. LLC, St Louis, MO, USA) for shipping at 4 °C. The first fully expanded leaf from the same vine was also sampled for metabolomic analysis by taking ten 2 mm discs with a biopsy punch and placing them into 50% v/v methanol. Metabolomic analysis is described in Appendix B. RNA was prepared using the Spectrum Plant Total RNA Kit (Sigma-Aldrich Co. LLC, St Louis, MO, USA) and purified with the RNeasy Plant Mini Kit (Qiagen N.V., Hilden, Germany). Poly(A) RNA was isolated from 1.5 µg total RNA using NEXTflex Poly(A) Beads (PerkinElmer, Inc.). Six libraries (three high AsA, three low AsA) were made using the NEXTflex Rapid Directional qRNA-Seq Kit (PerkinElmer, Inc.). Samples were pooled by family and by AsA phenotype, with high AsA pools comprising individuals heterozygous for the KCH0062 marker. Pools were formed as follows: Pool 1 AI247 N = 3 high AsA; Pool 2 AI247 N = 3 low AsA; Pool 3 AJ247 N = 8 high AsA; Pool 4 AJ247 N = 13 low AsA; Pool 5 AJ247 N = 7 high AsA; Pool 6 AJ247 N = 10 low AsA.
Synthesis of cDNA and quantitative PCR for genes GGP, GMD, T2 and DHAR2 were performed on individual samples from pools 1, 2, 5 and 6 against PP2A catalyst control as reported previously [1].

RNASEQ Transcript Analysis
A de novo assembly was performed on trimmed reads using Trinity v2.32 [54], yielding an assembly of 345,495 transcripts in 213,327 genes with contig N50 of 798 bp. Transcript abundance was estimated using RSEM [55] and differential expression analysis was performed using DESeq2 Release 3.9 [56]. Transcripts exhibiting differential expression at FDR <0.01 were aligned to the Red5 genome assembly using gmap [57] and intersection with annotated gene models was performed using bedtools [58]. Putative open reading frames and deduced peptides were identified with Transdecoder (https://github.com/TransDecoder) and annotated using GhostKoala [59].
Table S1: Genome locations, read counts and associated annotations of de novo assembled transcripts mapping to kiwifruit genome assembly Red5_PS1_1.69.0 (https://www.ncbi.nlm.nih.gov/assembly/GCA_003024255.1/) which exhibited differential expression between high and low AsA pools at FDR p < 0.05.
Table S2: Trinity de novo assembly transcripts related to mannose metabolism that did not exhibit significant differential expression between AsA pools. Transcripts annotated as GDP-L-fucose synthase include the differentially expressed TRINITY_DN123292_c0_g2_i2.
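The contig N50 quoted for the Trinity assembly is a length-weighted summary statistic. As a brief sketch of the usual definition (the length of the contig at which the running total of sorted contig lengths, longest first, reaches half of the total assembly length), it can be computed in Python as below. The contig lengths are invented, and assembly tools may differ in which sequences they include.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # The first contig that takes the running total to at least half
        # of the assembly length defines N50.
        if running * 2 >= total:
            return length


# Hypothetical contig lengths in bp (not from the assembly described here).
contigs = [100, 200, 300, 400, 500]
print(n50(contigs))
```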

LCMS system
The system consisted of a Thermo Scientific™ (San Jose, CA, USA) Q Exactive™ Plus Orbitrap coupled with a Vanquish™ UHPLC system (Binary Pump H, Split Sampler HT, Dual Oven). Calibrations were performed immediately prior to sample analysis batch with Thermo™ premixed solutions (Pierce™ LTQ ESI Positive and negative ion calibration solutions, catalogue numbers: 88322 and 88324 respectively).