Genomic Dissection of a Wild Region in a Superior Solanum pennellii Introgression Sub-Line with High Ascorbic Acid Accumulation in Tomato Fruit

The Solanum pennellii introgression lines (ILs) have been exploited to map quantitative trait loci (QTLs) and identify favorable alleles that could improve fruit quality traits in tomato varieties. Over the past few years, ILs exhibiting increased content of ascorbic acid in the fruit have been selected, among which the sub-line R182. The aims of this work were to identify the genes of the wild donor S. pennellii harbored by the sub-line and to detect genes controlling ascorbic acid accumulation by using genomics tools. A Genotyping-By-Sequencing (GBS) approach confirmed that no wild introgressions were present in the sub-line besides one region on chromosome 7. By using a dense single nucleotide polymorphism (SNP) map obtained by RNA sequencing (RNA-Seq), the wild region of the sub-line was finely identified; thus, defining 39 wild genes that replaced 33 genes of the ILs genetic background (cv. M82). The differentially expressed genes mapping in the region and the variants detected among the cultivated and the wild alleles evidenced the potential role of the novel genes present in the wild region. Interestingly, one upregulated gene, annotated as a major facilitator superfamily protein, showed a novel structure in R182, with respect to the parental lines. These genes will be further investigated using gene editing strategies.


Introduction
The cultivated tomato (Solanum lycopersicum) belongs to one of the most important and large plant families worldwide. Indeed, the Solanaceae family consists of about 98 genera and 2700 species [1]. In addition to the cultivated species, there are other related wild species, including Solanum pimpinellifolium, Solanum chmielewskii, Solanum chilense, Solanum neorickii, Solanum peruvianum, Solanum habrochaites, and Solanum pennellii [2,3]. Recent studies have focused on S. pennellii, a wild species from South America, considered an important donor of germplasm for the cultivated tomato, which lost genetic variability during domestication and evolution [4]. In order to fully exploit the genetic variability of wild species, introgression lines (ILs) were obtained, which carry homozygous regions of the wild genomes in a common cultivated background. An entire set of ILs represent a wild species' entire genome and can be used to transfer traits into commercial cultivars. Up until now, the most widely exploited IL population for genetic and molecular studies was that obtained by crossing S. pennellii with the cultivated variety M82, consisting of 76 introgression lines and sub-lines [5][6][7][8].

Plant Materials
Plant material consisted of the S. pennellii introgression line IL7-3 (accession LA4066), the IL7-3 sub-line R182, and the cultivated genotype M82 (accession LA3475). The sub-line R182, kindly provided by Dr. Dani Zamir (Hebrew University, Israel), was selected by using species-specific markers at the Department of Agricultural Sciences of University of Naples Federico II, and then characterized considering different fruit quality traits [32,35]. The three genotypes were grown in open field during the year 2017 in a randomized scheme, using three replicates per genotype, and 10 plants per each replicate, and following the traditional farming practices in the geographical area. A pool of 10 fruits from the genotypes M82 and R182 was collected from different plants at two development stages: breaker (BR, 45 days post anthesis) and mature red (MR, 55 days post anthesis). For each sample, seeds and columella were removed and fruits were ground in liquid nitrogen and stored at −80 • C until analyses. Leaves were collected from seedlings of IL7-3, M82, and R182, ground in liquid nitrogen by mortar and pestle to a fine powder, and stored at −80 • C until DNA genotyping analyses.

GBS Analysis
Genomic DNA was extracted from 100 mg of young leaf tissue from R182 and the parental lines M82 and IL7-3 using the DNeasy plant mini kit (Qiagen). DNA concentration was determined by using a fluorometer (Invitrogen, Carlsbad, CA) while the 260/280 and 260/230 ratios were measured by using a NanoDrop spectrophotometer (Thermo Fischer Scientific, Waltham, MA, USA). For DNA sequencing, 1 µg of DNA diluted in 30 µL of sterile Milli-Q water was used to obtain the libraries for the ddRAD technology, as reported by Peterson et al. [36]. The restriction enzymes MboI and SphI were used for DNA digestion and the fragments were sequenced using the V4 chemistry paired end 125 bp mode on a HiSeq2500 instrument (Illumina, San Diego, CA, USA). The pipeline Stacks v2.0 [37] was used to demultiplex, clean raw Illumina reads and detect the variants. The reads alignment to the reference genome of Solanum lycopersicum (Tomato Genome version SL3.0) was performed using BWA-MEM [38] with default parameters. Raw variants were filtered using VCFtools v.0.1.13 (http://vcftools.sourceforge.net) [39]. The minimum mean of Depth of Coverage (min-mean DP) = 5 was used as filtering parameter. Gene annotation and prediction of the possible effect of the SNP mutations were evaluated by the SnpEff tool (http://snpeff.sourceforge.net/) [40], using the tomato genome assembly SL3.0 and the iTAG3.2 annotation as references.

RNA-Seq Analysis
RNA extraction was performed from M82 and R182 fruits collected at breaker and mature red developmental stages by using the TRIzol reagent (Thermo Fischer Scientific, Waltham, MA, USA) and following the manufacturer's guidelines with some modifications. One mL of TRIzol reagent was used for 100 mg of frozen tomato fruit powder, afterwards the mixture was vortexed, incubated on ice for 5 min, and centrifuged at 14,000 rpm for 10 min at 4 • C. The supernatant was transferred in a fresh Eppendorf tube and stored on ice for 5 min. Then, 0.2 mL of chloroform were added, the mixture was vortexed, incubated for 5 min on ice, and centrifuged at 14,000 rpm at 4 • C for 15 min. The colorless upper aqueous phase was transferred in a fresh Eppendorf tube, 0.5 mL of 2-propanol was added, the mixture was gently mixed by pipetting, incubated on ice for 10 min and centrifuged at 14,000 rpm for 10 min at 4 • C. The supernatant was discarded; the RNA pellet was washed four times by adding 1 mL of 75% ethanol, vortexed, and centrifuged at 14,000 rpm for 5 min at 4 • C. The pellet was then briefly dried for 5-10 min and dissolved in DEPC-treated water. Then, 0.1 V of 3 M Sodium Acetate (NaOAc, pH 5.5) was added and the RNA solution was gently mixed by pipetting. After that, 2 V of 100% ethanol were added, and the solution was gently mixed. For RNA sequencing, 3 µg of RNA diluted in 50 µL of DEPC-treated water were used and the experiment was performed by the "TruSeq mRNA" protocol for preparing libraries, followed by a pair-end strategy for sequencing.
The differential expression analysis was carried out using the Sequentia Biotech data analysis software "AIR" (https://transcriptomics.sequentiabiotech.com/), which relies on the Bioconductor package edgeR [41]. Genes were considered significantly differentially expressed if the false discovery rate (FDR) of the statistical test was less than 0.05.

R182 Introgression Analysis and Genome Reconstruction
The RNA-Seq reads obtained from the genotype R182 were used to identify variants against the two parental genomes (S. lycopersicum M82 and S. pennellii). In particular, the variants obtained against the M82 genome (the background of the sub-IL) allowed identifying the break points of the introgressed region, whereas the variants identified against the S. pennellii genome (the wild donor of the introgression) allowed defining the size of the introgressed region. In order to map the variants, refine the mapping on the splicing junctions, and call the variants on the two genomes, three software were exploited using default parameters: STAR [42], OPOSSUM [28], and PLATYPUS [43]. Then, heterozygous and low-quality variants were filtered out using VCFtools [39]. The genome of the R182 was reconstructed by merging the M82 genome, defined by the break points with the S. pennellii complementary segment. The corresponding gene features coordinates of both parental lines were transferred to the reconstructed R182 genome by a lift-over process.
For comparative genomic analyses, the Best Bi-directional Hits (BBH) between genes of M82 and S. pennellii were assessed by InParanoid v8 [44] software using as input the complete proteins set of S. lycopersicum iTAG4.1 and the S. pennellii v2 annotations. For the genes that did not show a BBH, a BLASTp [45] search was conducted (e-value 1 × 10 −10 ) to evaluate the best match. A synteny analysis between the parental lines was also performed by using SyMAP v.5 [46].

Molecular Marker Analysis
A border check reliability was carried out by looking at the expression of the genes in RNA-Seq data around the putative borders of the introgression. When the genes were not expressed (no reads were available for identifying polymorphisms) specific SCAR/CAPS markers were designed in order to better identify the genes of the wild region introgressed into the R182 genome. Amplification of genomic DNA was carried out using primers designed based on polymorphisms detected in the R182 introgressed region between S. lycopersicum (iTAG4.1) and S. pennellii (v2) genomes. PCR amplification was carried out in 50 µL reaction volume containing 50 ng DNA, 1X of MyTaq TM reaction buffer, 1.0 mM primer, and 1 U MyTaq TM DNA polymerase (Bioline). For designing CAPS markers, restriction enzymes suitable to detect polymorphic SNPs between the fragments amplified were found using the tool CAPS Designer available at the Sol Genomics Network (solgenomics.net). Amplified and restricted fragments were visualized on agarose gel at different concentrations depending on their expected size. In order to identify if the last gene of the introgressed region is present in its wild (Sopen07g024640) or cultivated (Solyc07g049310) allele, the whole gene was sequenced in the genotype R182. The amplification was carried out using three primers pairs (Table S1) designed based on the Sopen07g024640 genomic sequence. PCR amplification was carried out in 50 µL reaction volume containing 50 ng DNA, 1X of Phusion HF buffer, 1.0 mM primer, 10 mM dNTPs and 1 U Phusion™ High-Fidelity DNA Polymerase (Thermo Scientific™). PCR products were analyzed on 1% (w/v) agarose gel and amplicons obtained were purified using the QIAquick Gel Extraction Kit (Qiagen). Then PCR purified products were sequenced using the Eurofins sequencing service (Mix2Seq kit). Geneious software v11.1.5 (Biomatters, http://www.geneious.com) was used to process sequences and MUSCLE algorithm to perform the multiple sequence alignment (https://www.ebi.ac.uk/Tools/msa/muscle/).

GBS Analysis
In order to better characterize the sub-line R182, the presence of spurious (outside of the introgressed region) wild fragments in its genome was ascertained by analyzing data coming from a wider GBS experiment, including the sub-line R182, IL7-3 and the cultivated parental genotype cv. M82. As total, 458 filtered polymorphisms (Table S2) were extracted. Among these, just six (1.3%) mapped outside of the introgressed region 7-3, one on chromosome 1, one on chromosome 4, and two on chromosomes 5 and 12, respectively. The remaining 452 variants, as expected, mapped within the wild region 7-3, being 365 (80.8%) SNPs and 87 (19.2%) insertions or deletions (INDELs). R182 genotyping showed 14 SNPs, and four INDELs in a well-defined region of chromosome 7 extending from SNP at position 59,347,691 (Solyc07g048020) to SNP at position 59,641,222 (Solyc07g049230). The result confirms that no spurious wild introgressed regions were present in the parental line IL7-3 neither in its sub-line R182. The SnpEff analysis of the 18 mutations of R182 evidenced that they affected the genes Solyc07g048020 (two SNPs and one INDEL), Solyc07g048060 (seven SNPs), Solyc07g049160 (one SNP and two INDELs) and Solyc07g049230 (four SNPs and one INDEL). In most cases, these mutations caused modifier effects on the proteins, being classified as intergenic variants.

RNA-Seq Analysis of R182 and DEGs Identification
A RNA-Seq analysis was carried out to highlight transcriptomic differences between the sub-line R182 and the parental genotype M82. Two stages of fruit ripening were assessed, the breaker (BR) and mature red (MR). The quality check (QC), performed on the raw sequencing data to remove low quality portions and Illumina adapters, showed an average reduction of 15% of reads number (Table S3). The remaining high-quality reads aligned against the S. lycopersicum reference genome showed an average of uniquely mapped reads of approximately 94% and an average mismatch rate per base of 0.14% (Table S3).
Following the RNA-Seq data analysis, the R HTSFilter package was applied for the statistical analysis aimed at removing the not expressed genes and the ones showing a high variability. The software relies on a filtering procedure for replicated transcriptome sequencing data based on a Jaccard similarity index. As a result, 19,515 and 16,444 genes were retained for BR and MR samples, respectively. The genes passing the HTSFilter were then used for differential expression analysis ( Figure 1A). The comparison between R182 and M82 for the BR stage showed 54 differentially expressed genes (DEGs) of which 35 upregulated and 19 downregulated, whereas for the MR stage evidenced 100 DEGs of which 40 upregulated and 60 downregulated (for a complete list of the DEGs see Tables S4 and S5). As a whole, the DEGs identified in BR and MR fruit mapped across all the chromosomes (Table 1), with a higher number on chromosomes 1, 3, and 7. The most significant differences in expression at BR stage ( Figure 1A) were highlighted for Solyc03g045140 (Cyclopropane-fatty-acyl-phospholipid synthase), Solyc09g007020 (Pathogenesis-related protein) as well as for Solyc07g048100 (BRCT domain-containing protein) and Solyc07g062600 (Acyl-CoA N-acyltransferase). At MR stage ( Figure 1B) the most significant expression change was recorded for Solyc07g049140 (Metallocarboxypeptidase inhibitor), which appears to be highly downregulated (-13 log2FC) in R182 compared to M82.
Searching among genes commonly regulated in both the studied fruit developmental stages (BR and MR), seven upregulated and eight downregulated genes were found, five of which mapped in the introgressed region of the sub-line R182 ( Table 2). Another gene (Solyc07g049290) of the introgressed region of R182 resulted upregulated, even though only in the MR stage; and it is annotated as a major facilitator superfamily (MFS) protein-like. The DEGs were also functionally annotated to analyze their gene ontologies (GO). DE genes at BR stage highlighted GOs enriched for plant cell wall organization, proteolysis and for fatty acid metabolism terms. By contrast, GO enrichment for MR stage showed, among others, an abundance of terms related to amino acid and sucrose metabolic processes (Table S6). mature red fruit stages. The y-axis corresponds to the mean expression value of log 10 (false discovery rate, FDR) and the x-axis displays the log2 (fold change) value. The red dots represent the significantly upregulated genes while the green dots represent the significantly downregulated genes. Grey dots represent the genes whose expression levels did not reach statistical significance.

R182 Introgressed Genome Reconstruction
In order to define the group of wild genes mapping in R182 introgressed region, we took advantage of the expressed reads obtained from the RNA-Seq analysis, by identifying those that were polymorphic between R182 and M82. In particular, in this analysis a gene was considered wild (from S. pennellii) in R182 if the number of variants that it harbors is greater than the number of variants found in the cultivated genome background (M82). A check across the whole genome ( Figure S1) clearly showed as only one single region on chromosome 7 presents wild genes in the R182 sub-line confirming data reported by the GBS analysis. Specifically, an interval including a group of 31 genes, from Solyc07g048010 to Solyc07g049310, showed an average of 17 variants per gene indicating that they come from the wild donor ( Figure S2). The same analysis was carried out by comparing the reads of R182 across the S. pennellii genome. In this case, to confirm that a wild gene was introgressed in the R182 sub-line we do not expect to detect variants respect to the wild genome. The result confirmed the presence of two segments with wild genes in the R182 genome, one including a group of genes spanning from Sopen07g024420 to Sopen07g024640 and the second including another group of 16 genes from Sopen07g025130 to Sopen07g025280 ( Figure S2). Figure 2 shows as these two splitted wild regions are syntenic with one region identified on M82. However, in order to clarify this unexpected result and to exclude mis-assembly issues of the S. pennellii genome, we analyzed more in details the homology of the genes mapping in the region. First of all, we evaluated the correspondence on the basis of their protein and genomic alignments; thus, detecting the homologous relationships between S. pennellii and S. lycopersicum genes belonging to the introgressed region of R182 sub-line ( Figure S2). Twenty-two Best Bidirectional Hits (BBH) relationships were found pointing out the high similarity/orthology between the two parental lines in this region (Table 3). However, some differences were also observed. In particular, three S. lycopersicum (Solyc07g048020, Solyc07g049135 and Solyc07g049215) and seven S. pennellii (Sopen07g024570, Sopen07g024580, Sopen07g025150, Sopen07g025180, Sopen07g025190, Sopen07g025200, Sopen07g025250) genes appeared to be species-specific since they did not show any homology compared to annotated genes of the contrasting genome. In addition, from this analysis we could also report a one-to-many relationship between a S. pennellii gene (Sopen07g024560) and two S. lycopersicum genes (Solyc07g049120, Solyc07g049130) underlying a probable duplication event in S. lycopersicum cv. M82. Probable duplication events were observed in the opposite direction as well, as for example between one S. lycopersicum (Solyc07g049240) and two S. pennellii (Sopen07g025260, Sopen07g025270) genes. After, we combined this analysis with the information obtained from eight species-specific markers built to confirm missing points in the polymorphism analysis obtained by RNA-Seq data (due to absence of expression/available reads). Altogether, these data allowed reconstructing the presence and order of the wild genes of the introgressed region of the sub-line R182 ( Figure 2). Comprehensively, in the R182 sub-line 39 wild genes replaced 33 S. lycopersicum cv.
M82 genes. Marker analysis confirmed that no other region between the wild segments was introgressed, thus also confirming the synteny of the wild segments with the corresponding S. lycopersicum segment. Figure 2. Reconstruction of the introgressed wild region in R182 by the synteny analysis of the genomic portion of chromosome 7 between S. pennellii and S. lycopersicum. The introgressed genes of S. pennellii, the replaced genes of S. lycopersicum, and their function annotation were highlighted in the right side of the figure. Genes evidenced in red represent species-specific genes. Experimentally tested markers (+) and duplicated genes (*) were reported next to the corresponding genes. Table 3. Homologous relationships between genes of S. pennellii (S. pen) (release v2) and S. lycopersicum M82 (S. lyc) (release iTAG 4.1) belonging to the introgressed region of R182 sub-line. For each gene, the position and the relationship type were reported.

Sopen07g024640 Sequencing
Since the molecular marker analysis carried out on the last gene of the introgressed region (Solyc07g049310/Sopen07g024640) did not allow to clearly identify the presence of the cultivated or wild allele, we carried out the sequencing of Sopen07g024640 (8928 bp) of the R182 genotype, dividing this gene in three parts spanning from 70,111,740 to 70,114,744 bp, from 70,114,745 to 70,117,772 bp and from 70,117,773 to 70,120,667 bp. Results from the Sanger sequencing showed that the first two parts of the Sopen07g024640 sequenced in R182 perfectly align with the sequences of S. pennellii. As for the third part, the multi-alignment between the Sopen07g024640, the ortholog Solyc07g049310 and the corresponding sequence in R182, showed at position 70,119,958 the last S. pennellii variant of the R182 sequencing product, whereas the first S. lycopersicum variant was detected at position 70,120,258 bp ( Figure 3). Starting from position 70,120,258 bp up to the end of the gene, the sequence of R182 perfectly aligned with the sequence of S. lycopersicum. Indeed, six SNPs and three INDELs were found between sequences from R182 and the S. pennellii genome in this last region (Figure 3).

Discussion
Since the last three decades, S. pennellii ILs were studied for identifying QTLs controlling different traits [5,8,47,48] including final yield, disease resistance and tolerance to abiotic stresses [7,49,50]. The ILs were obtained in the year 1992 through a series of backcrosses by the group of Eshed and Zamir [51,52], and then were again characterized by Fulton et al. (Tomato-EXPEN 2000) [53] and Chitwood et al. [6]. However, all these studies relied only on the use of the tomato genome of S. lycopersicum cv. Heinz as unique reference, though the IL parental lines derive from two species: S. lycopersicum and S. pennellii. This simplification could theoretically produce erroneous results when trying to precisely identify borders and extension size of the introgressed regions, considering that S. pennellii is a wild species with a genome quite different from S. lycopersicum in terms of size, collinearity, transposable elements, and so on [4]. For these reasons, a precise characterization of these introgression lines should take into consideration the two parental genomes to identify both the wild regions inserted (from the S. pennellii donor) and the cultivated region replaced (from the S. lycopersicum background).
Previously, the S. pennellii IL7-3 sub-line R182 was selected in our laboratory considering its good performances in terms of fruit quality [35]. This prompted us to better define genes mapping in R182 wild region, since this sub-line could be an important source of candidate genes and regulators controlling AsA synthesis and/or accumulation in red ripe fruit. In our previous study, we attempted to define the R182 introgressed region by designing 18 species-specific molecular markers using S. lycopersicum as reference genome [35].
The integration of genomics tools carried out here, as well as the comparison between the two parental genomes (S. pennellii and S. lycopersicum), allowed reaching two main aims of our work: (1) to exclude the presence of additional wild regions in the sub-line, which could be involved in AsA accumulation in the fruit, and (2) to exactly reconstruct the number and position of wild alleles that map in the introgressed region, following recombination events that occurred between the two parental genomes. First of all, both the genotyping procedures adopted in the present study (GBS and RNA-Seq variants detection) were conducted at genome-wide scale and allowed excluding that spurious wild fragments in the R182 sub-line could contribute to AsA synthesis and/or accumulation (Table S2 and Figure S1). Therefore, these results led us to only focus on the small-sized wild region present in the sub-line. Afterwards, we succeeded to finely define both the break points and the extension of the wild introgressed segment by investigating the variants detected from the RNA-Seq reads. This analysis allowed relying on a wide number of SNP-markers (>500 SNP in the putative introgressed region) that helped to precisely identify the interval of the wild genome in the sub-line. Additional molecular markers were also built on critical spots and the complete sequencing of a border-gene was also performed to accurately define the point where the introgression breaks.
The genome-based approaches performed helped to highlight that the introgressed region has a different size and a different number of annotated genes compared to both the parental lines S. pennellii and S. lycopersicum. In particular, we found that a 394 Kbp fragment including 33 S. lycopersicum genes (spanning from Solyc07g047990 to Solyc07g049310) was replaced by two segments of 284 Kbp and 164 Kbp including 23 (from Sopen07g024420 to Sopen07g024640) and 16 (from Sopen07g025130 to Sopen07g025280) S. pennellii genes, respectively. A synteny analysis between the parental lines demonstrated a translocation of the second wild segment (including 16 genes), which was embedded in the first region with 23 wild genes (Figure 2 and Figure S2). Although the hypothesis of a double recombination event is also possible, the short distance between the two wild segments make it more likely the hypothesis that a mis-assembly of a scaffold in that region of the S. pennellii genome occurred. Further investigation on the genome of S. pennellii will be necessary to ascertain this issue.
Since a recent update of the S. lycopersicum genome assembly and annotation was released [54], a more detailed study to shed light on the exact order of genes in the introgressed region of R182 was undertaken. The most recent release (iTAG4.1) just exhibits slight differences in the analyzed region when compared to the previous version v3.2. The main changes concern the genes Solyc07g048035, Solyc07g048085, and Solyc07g049165, which were removed from the iTAG4.1 annotation, the gene Solyc07g049135 that substituted the previous gene Solyc07g049140, and the gene Solyc07g049215, which was ex-novo annotated in the last release.
Among all the genes mapping in the introgressed region of R182, we focused our attention on the pairs Solyc07g049280/Sopen07g024610, Solyc07g049290/Sopen07g024620, and Solyc07g049310/Sopen07g024640. These genes, coding for a pyrophosphate-fructose 6-phosphate 1-phosphotransferase (PFP) and two major facilitator superfamily (MFS) membrane proteins, were already proposed as candidate genes [35] that could play an indirect role in AsA biosynthesis and accumulation, modulating the amount of AsA precursors. Indeed, the PFP is a key enzyme involved in the glycolysis/gluconeogenesis pathways [55], whereas MFSs perform the import or export of target substrates such as sugars [56]. The importance of MFSs was further confirmed by the results of the transcriptomics analysis performed on fruits at breaker and mature stages from the sub-line R182 and the cultivated genotype M82. The RNA-Seq analyses pointed out that both MFS genes were upregulated in R182. Therefore, these two genes might be good candidates for controlling AsA content in tomato fruit. Noteworthy, Sopen07g024640 and its ortholog Solyc07g049310, map on the border of the introgression. Our fine investigation of the lower border pointed out that the recombination event leading to the formation of the R182 sub-line occurred in the 3 UTR of this gene. This event produced a chimeric sequence of the gene (Figure 3), which combined most of the gene features of the S. pennellii allele (5 UTR, coding sequence and part of 3 UTR) with the remaining part of the 3 UTR of the S. lycopersicum form. This change/recombination of the 3 UTR could be responsible for the altered expression level of the gene Solyc07g049310, being UTRs important regulator regions for transcription [57]. The function and mechanisms of action of this gene will be validated by using genome editing techniques, such as the CRISPR/Cas9 technology.
Additional interesting findings were highlighted when comparing S. lycopersicum and S. pennellii genes belonging to the introgressed region of R182 (Figure 2). The integration of synteny and orthology analyses demonstrated that three cultivated genes, Solyc07g048020 annotated as charged multivesicular body, Solyc07g049135 annotated as fruit-specific protein, and Solyc07g049215 described as an unknown protein, do not seem to have any ortholog in the corresponding S. pennellii genome annotation. The absence of the Solyc07g049135 ortholog (also reported as Metallocarboxypeptidase inhibitor TCMP-2) into the genome of the R182 sub-line was also corroborated by the RNA-Seq analysis that showed a severe downregulation (almost no expression) of this gene in R182 (-13 logFC) at both fruit ripening stages (BR and MR) when compared to the M82 control line. In plant, the Metallocarboxypeptidase inhibitors, a type of proteases inhibitors (PI), are multifunctional proteins primarily involved in plant protection against pathogens and in plant tolerance modulation to diverse abiotic stresses. The production of PIs is highly regulated by different hormones, such as jasmonic acid, abscisic acid, and ethylene [58], which are reported to also regulate in turn ascorbic acid content [59][60][61][62][63]. In addition, a recent study reported that two specific metallocarboxypeptidase inhibitors in tomato, TCMP-1 and TCMP-2, were associated with fruit development [64]. The lack of this specific protease inhibitor in R182 line could, to some extent, influence the regulation of pathways that are linked to fruit development and ripening, maybe acting on the modulation of the hormonal network and, ultimately, on the production of AsA in the fruit.
Looking at the R182 wild fragment, we found seven S. pennellii specific genes (without any S. lycopersicum orthology), including a RNA-directed DNA polymerase protein (Sopen07g025250), a Shaker family potassium ion (K+) channel (Sopen07g025150), two cysteine-rich receptor-like protein kinases (Sopen07g024580 and Sopen07g025180) and three hypothetical proteins (Sopen07g024570, Sopen07g025190 and Sopen07g025200). Receptor-like kinases (RLKs) are ubiquitous molecular components and play essential roles in signal transduction by recognizing extracellular stimuli and activating downstream signaling pathways. A role for RLK was reported in environment responses [65], differentiation [66], growth, and development [67]. Even if no direct evidence of their involvement is reported for AsA accumulation, most CRKs genes are differentially expressed when the levels of reactive oxygen species (ROS) rise [68]. This highlights a possible role of RLKs in the oxido-reduction balance, which could probably imply an AsA homeostasis/modulation too.
Moreover, Sopen07g025260 and Sopen07g025270 align both with Solyc07g049240, representing a probable gene duplication in S. pennellii. Indeed, both Sopen07g024560 and Sopen07g025270 encode for a cold-inducible cationic peroxidase. Peroxidase are genes involved in the recycling pathway of ascorbate [33], which might have a role in shifting the final ratio of the oxidized/reduced AsA in tomato fruit.
Finally, since a consistent group of genes mapping outside the introgressed regions was differentially expressed, a more in-depth investigation of the introgressed region was performed in order to verify if some genes/TFs mapping within it could control the expression of genes in other portions of the genome. Among the genes mapping outside of the introgressed region of the sub-line R182, two genes Solyc03g097580 and Solyc05g024260, both belonging to the Bidirectional sugar transporter SWEET family and the Semi-SWEET-SWEET glucoside transporter superfamily, were upregulated in the breaker and mature stages, respectively, and the Solyc03g098290 encoding for a sucrose synthase was downregulated in the MR stage. All these genes could be involved in the AsA biosynthetic pathways and will be further investigated.

Conclusions
The genomic tools exploited to in depth reconstruct the introgressed region in the S. pennellii sub-line R182 allowed defining that 39 wild genes replaced 33 cultivated genes. In particular, a group of genes mapping within and out the introgressed region were identified as candidate genes controlling ascorbic acid content in the fruit of the sub-line, and are indirectly or directly involved in sugar and/or hormones pathways. Other wild genes encoding proteins with still unknown function could act as regulators of genes mapping outside the region and their potential role in this regulatory mechanism will be explored in the future. In addition, the chimeric structure of a Major facilitator superfamily (MFS) membrane protein mapping at the lower border of the introgression region deserves further investigation.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/8/847/ s1, Figure S1. Distribution of the variants observed comparing the RNA-Seq reads from the R182 sub-line and S. lycopersicum (M82) across the 12 tomato chromosomes. Red rectangle highlights the peak of variants corresponding to the wild introgression on chromosome 7. Figure S2. Homology between the introgressed and replaced genes of the R182 sub-line. The number of variants against R182 is reported for both parental lines (S. pennellii and S. lycopersicum). Genes for which no expression was recorded are marked as "_NE". Grey connectors represent orthology/homology between genes computationally computed while red connectors represent orthology/homology experimentally confirmed. Table S1. Primer sequences (5'-3') designed based on the Sopen07g024640 genomic sequence. Table S2. GBS variants detected for M82, IL7-3 and R182 genotypes. For each variant, the chromosome number, the position in the genome (SL3.0), the reference and the alternative allele, the genotyping (0|0 = Homozygous reference, 0|1 = heterozygous alternative and 1|1 = homozygous alternative) as well as the gene ID, the mutation site and the predicted effect according to SNPeff analysis are reported. In yellow, the variants of the putative introgressed region of R182 sub-line. Table S3: Number of reads of R182 and M82 before and after the quality check (QC). Number of unique, multi-and un-mapped reads, the mismatch ratio as well as the number of reads assigned to genic features (Assigned-GTF) are reported. Table S4. Downregulated genes between R182 and M82 at breaker (BR) stage. For each gene the logarithm Fold Change (logFC), the p-value, the false discovery rate (FDR), the Gene Description, the mean of fragments per kilobase of exon per million reads mapped (FPKM) reference and the mean FPKM query are reported. Table S5. Downregulated genes between R182 and M82 at mature red (MR) stage. For each gene the logarithm Fold Change (logFC), the p-value, the false discovery rate (FDR), the Gene Description, the mean of fragments per kilobase of exon per million reads mapped (FPKM) reference and the mean FPKM query are reported. Table S6. Gene Ontology enrichment for upregulated and downregulated genes between R182 and M82 at breaker (BR) and mature red (MR) stages. For each gene, the GO ID, the GO category, the GO description, the p-Value, the false discovery rate (FDR) and the enrichment score (E_Score) are reported.