Quercus suber Transcriptome Analyses: Identification of Genes and SNPs Related to Cork Quality †

: Cork is an ancestral natural material derived from the cork oak tree ( Quercus suber L.), with multiple industrial applications. During the years, this material has been the subject of several studies. The recent sequencing of the Q. suber genomes opened the possibility to make new studies regarding cork quality. In this study, the transcriptome of cork with superior and poor quality is compared to highlight new molecular pathways and identify SNPs that can be associated to the cork trait, which remains one of the main concerns of the cork industry.


Introduction
Cork oak (Quercus suber L) is an evergreen broad-leaved tree that belongs to the genus Quercus (oaks) of the Fagaceae family and is one of the most important Mediterranean forest tree species. It plays an important environmental, social, and economic role in the Mediterranean ecosystems known as "Montado" in Portugal and "Dehesa" in Spain. Cork is obtained from the extraction of the outside layer of the cork oak tree which is composed by suberized cells. The first cork extraction occurs when the tree has between 20-25 years old. After that, cork is extracted at regular intervals of at least 9 years [Error! Reference source not found.]. The chemical cork composition has already been extensively described, and it is known that can change depending on environmental and genetic conditions [2,3]. Parameters as thickness and structural discontinuities have been used to evaluate the cork quality [4]. Based on quality, the cork's economic value can change, which is one of the main concerns in the cork industry. Scientific studies have been performed to identify conditions and processes that can help to stabilize the cork quality and, therefore, ensure a more profitable and sustainable production [5,6].
Recent advances in sequencing technology, and subsequently sequencing of the cork oak genome, open the opportunity to perform new studies regarding the cork formation  process [7]. The aim of this study was to analyze the dynamic profile of differential expressed genes associated with cork quality, highlighting for the first time, to our knowledge, a set of SNPs that could be involved in cork differentiation.

Plant Material and RNA Isolation
Amadia cork planks were collected from physiologically active cork oak, during the period July-August. Cork samples from eight genotypes were classified in Good and Bad cork quality (GQ and BQ, respectively) according to the main traits of thickness and structural discontinuities (porosity and inclusion of woody cells). GQ and BQ cork samples were harvested in the Barranco Velho and Cercal regions, Portugal, respectively. The inner part of harvested cork planks, corresponding to phellogen and phellem cells, was scraped and stored, according to  for RNA extractions [8]. Total RNA from GQ and BQ samples (four cork oak trees per condition) was isolated according to Almeida et al. (2013), and sequenced by Illumina HiSeq 2000, producing PE reads of 100 bp in length [9].

Reads Pre-Processing and Mapping
The raw reads were preprocessed keeping only reads with a minimum quality of 20 and minimum length of 80 bp using Trimmomatic (v0.38) [10]. Then, the pre-processed reads were mapped against the Q. suber genome with STAR (version 2.5.2 b) using the multi-sample 2-pass mapping mode, according to its user's guidelines [11]. The unique mapped reads (UMR) were then filtered and extracted using SAMtools [12].

Differential Expression Analysis
The differential expression analyses was performed using edgeR, a Bioconductor package [13]. To avoid knowing issues with the incompatibility of some tools such as edgeR to use biological and technical replicates at the same time for differential expression analyses, technical replicates were merged using the function "sumTechReps" from edgeR. Then, genes with low counts were excluded and a Trimmed Mean of M-values (TMM) normalization was applied. In the end, only genes with a log fold change (logFC) ≥ |2| and with a False Discovery Rate (FDR) ≤ 0.05 were considered as differentially expressed genes (DEGs). At last, the interactions between DEGs associated with KEGG pathways and Gene Onthology (GO) terms were visualized and analyzed with Cytoscape. Additionally, BinGO (Cytoscape plugin) was used to identify GOs overrepresented over the set of genes up-or down-regulated against the GO database [14,15].

Variant Calling and Annotation
For Single Nucleotide Polimorfism (SNP) identification, a variant calling analyses was performed using mpileup from SAMtools. The raw variants were then filtered by SNP quality (≥30) and minimum depth coverage per genotype (≥8); indels were removed and only bi-allelic SNPs were maintained, resulting in a high-quality SNP set which was annotated using ANNOVAR [16].

Results and Discussion
The sequencing of RNA from good and bad cork quality samples resulted in a total of 708,803,510 reads. After pre-processing, 98% of the reads were kept. Pre-processed reads were then mapped against the cork oak genome [7] and the UMR obtained, representing 78% of the mapped reads, and used for the differential gene expression analyses.

Differential Expression Analysis
A total of 172 genes were differentially expressed, of which 66 were more expressed in GQ, and 106 in BQ. Within the set of the identified DEGs, only 73 were associated with at least one GO term. For each sub-ontology (biological process (BP), molecular function (MF) and cellular components (CC), a total of 50, 20 and 18 terms were associated with the DEGs (Table 1). After performing the over-represented analyses with BinGO a total of 23 GO terms were over-represented in the set of genes more expressed in GQ (BP:19; CC:1; MF:3) while only one GO term (BP: response to stress process) was over-represented in the set of genes more expressed in BQ. Several proteins such as thiamine thiazole, late embryogenesis abundant protein lea5, and dehydrin erd10 were associated with the response to stress process in BQ (Table  2). Additionally, heat shock proteins were also found differentially expressed being HSP17.5-E, HSP17.6C, HSP26.5 and HSP22.7 more expressed in BQ and HSP70-15 more expressed in GQ. This is not the first time that proteins from heat shock group are identified in phellem of cork oak trees [5]. Regarding the heat shock proteins more expressed in BQ the only information available so far is that its expression confers resistance to heat stress. [17,18]. Contrarily, HSP70-15, which was more expressed in GQ, belongs to 70-kDA heat shock proteins group, a well-known group, frequently found highly expressed in tissues under stress [19]. These group of proteins are actively involved in the folding of de novo synthesized proteins, translocation of precursor proteins into organelles and degradation of damaged proteins under stress conditions [19].
During the differential gene expression analysis, genes uniquely expressed in both conditions were identified. For instance, SRG1, which is involved in oxidation-reduction processes, and PIP2-2, an important aquaporin involved in the transport of water and other small solutes across cell membrane [20], were the only genes exclusively expressed in BQ. On the other hand, 14 genes were found uniquely expressed in GQ. It is important to highlight the presence of genes such as EMF2 and FYPP-3, that are involved in regulation of flower process [21,22], and KUA1, which is a transcript factor, from MYB-like protein family, that acts as a repressor and promotes response to auxin, ethylene, and abscisic acid [23]. These plant growth regulators (PGR) are involved in the regulation of plant growth and development, and abscisic acid is also associated with the increase of resistance of plants to different stresses. Likewise, EMF2 and FYPP-3 are also associated with the regulation of abscisic acid [21,24]. These results could indicate a correlation between hormonal regulation and cork quality, however more studies are necessary to assess its influence.
Two genes, both annotated as acetyl-coenzyme A carboxylase carboxyl transferase (beta subunit) (AccD), were found more expressed in GQ. The AccD protein is responsible to produce malonyl-CoA from acetyl-CoA, a starting unit of fatty acid biosynthetic process. The fatty acid pathway is responsible for very long and long-chain fatty acids synthesis, which are important precursors of waxes and some suberin monomers, two important compounds of cork. The expression of genes involved in fatty acids biosynthesis in phellogen cork tissue from GQ have already been reported [5] which reinforce the hypothesis that genes involved in the regulation and activity of cell wall assembly can affect cork quality [25].

SNPs analysis
The variant calling resulted in the identification of 1,296,640 raw variants, of which 159,248 were considered high-quality SNPs ( Table 2). The high-quality SNPs were further evaluated to confirm if some of them were located in genes identified as differentially expressed. As a result, 8078 SNPs were identified in 148 DEGs from which only the exonic and non-synonymous SNPs (879) were analysed.
The identification of exclusive SNPs, that means, a SNP is considered as exclusive if is only present in at least 75% of the individuals from one group, GQ or BQ, was performed. Following this criterion, for GQ 121 exclusives SNPs were found in a total of 49 genes, while in BQ 68 exclusive SNPs were found among 44 genes. Regarding the SNPs found in DEGs, 18 were exclusive in 8 genes more expressed in GQ, while 5 SNPs were exclusive in 5 genes more expressed in BQ.

Conclusions
In this study were identified a set of candidate genes for cork quality in Quercus suber. It reveals some mechanisms associated with cork quality, that allow us to hypothesis that the observed differences in cork quality could be directly related with regulation of specific PGR increasing resistance to stress and cell wall assembly. Additionally, several exclusive SNPs for individuals of contrasting phenotypes for cork quality were identified, although further studies will be needed to assess their phenotypic influence and its potential usage as genetic markers for cork quality.
Author Contributions: This study was conceived by S.G. and A.R.; Collection and identification of