Comparative Seeds Storage Transcriptome Analysis of Astronium fraxinifolium Schott, a Threatened Tree Species from Brazil

Astronium fraxinifolium Schott (Anacardiaceae), also known as ‘gonçalo-alves’, is a tree of the American tropics, distributed in Mexico, part of Central America, Argentina, Bolivia, Brazil and Paraguay. In Brazil it is an endangered species that occurs in the Cerrado, Caatinga and Amazon biomes. In support of ex situ conservation, this work aimed to study two accessions of A. fraxinifolium with different longevity (p50), collected from two different geographic regions, and to evaluate the transcriptome during aging of the seeds in order to identify genes related to seed longevity. Artificial ageing was performed at a constant temperature of 45 °C and 60% relative humidity. RNA was extracted from 100 embryonic axes exposed to control and aging conditions for 21 days. The transcriptome analysis revealed differentially expressed genes such as Late Embryogenesis Abundant (LEA) genes, genes involved in the photosystem, glycine-rich protein (GRP) genes, several transcription factors associated with embryo development, and ubiquitin-conjugating enzymes. Thus, these results contribute to understanding which genes play a role in seed ageing, and may serve as a basis for future functional characterization of the seed aging process in A. fraxinifolium.


Introduction
Seed persistence in natural habitats depends on the physical and physiological characteristics of the seed, which vary among species and populations [1]. The effects of climate change can affect flowering in forests and, therefore, there is a need to also evaluate seeds for this type of stress [2]. Continuous human exploitation causes deforestation and necessitates conservation to mitigate the loss of plant diversity. The maintenance of germplasm in seed banks is a valuable contribution to that conservation goal [3]. Seed quality is strongly influenced by the inevitable process of aging, including during long-term storage. Currently, germplasm banks evaluate the viability of seeds through a germination test; however, this kind of test does not allow the progress of the events that underlie deterioration to be detected, indicating only the final stages of the process [4]. Monitoring the viability of specimens kept in germplasm banks is a routine practice of the banks, and refers to the evaluation of the physiological quality of the seeds during storage [5]. Thus, alternatives to the germination test are needed. Therefore, the present work aims to use the different seed longevity of A. fraxinifolium Schott accessions collected from two different geographic regions, evaluate the transcriptome during the aging of these seeds, and suggest possible markers associated with the aging process.

Effects on Physiological Indexes during A. fraxinifolium Seed Treatment
The accelerated aging process revealed great differences between the two accessions (Table 1). Germination was lower in accession MINAS than in GOIAS, with browning of seeds observed (Figure 1).

Sequencing Transcriptome Profile
In order to obtain a set of genes involved in seed longevity of A. fraxinifolium, high-throughput RNA sequencing was performed on the two accessions, using GOIAS and MINAS controls with 98% and 93% germination, respectively, and ageing-induced GOIAS and MINAS seeds with 97% and 81% germination, respectively. Filtered reads totaled more than 90 million per treatment, and 124,528 transcripts were assembled from all sequenced reads. Sequencing reads were deposited at the National Center for Biotechnology Information (NCBI) under SRA BioProject accession number PRJNA881610. The filtered reads were mapped against the assembled transcriptome, with more than 85% of reads successfully mapped (Table 2).

Comparison of Seed Longevity and Identification of Differentially Expressed Genes
The analysis of differentially expressed genes (DEGs) identified 296 genes for GOIAS aged vs. control seeds, 115 genes for MINAS aged vs. control seeds, and 327 genes for GOIAS aged vs. MINAS aged seeds (Figure 2). The comparison between GOIAS control and MINAS control seeds showed no significant differentially expressed genes. The complete list of DEGs can be found in Supplementary Table S1.
Among the DEGs identified as up-regulated in both treatments with aged seeds, two genes are common. Their functions are related to ABC transporter B family member 26 (c31142_g1_i6) and Myosin-15 (c29015_g2_i10). On the other hand, down-regulated DEGs are related to regulation, endocytosis and development (Supplementary Table S2).
In order to identify the functions associated with differentially expressed genes, we performed a GO analysis. All genes identified between treatments were annotated and categorized into three classes: biological process, molecular function and cellular component. The most representative terms for biological process were nucleic acid metabolic process (11.2%), protein modification process (9.3%) and macromolecule modification (9.3%); for molecular function, catalytic activity (50%), binding (32.7%) and ATP-dependent activity (9.4%) were most abundant; and for cellular component, intracellular organelle (33.8%), membrane-bounded organelle (32%) and cytoplasm (17.8%) were observed most frequently (Figure 3). Considering the comparison between GOIAS control vs. induced aged seeds, the main GO terms in both up- and down-regulated genes are related to regulation of transcription, response to stimulus, transport and protein phosphorylation. When considering MINAS control vs. induced aged seeds, the main GO terms from up-regulated genes are related to DNA repair, chromatin organization, regulation of transcription, and protein ubiquitination. For the down-regulated genes, the main GO terms are related to transmembrane transport, cell differentiation, phosphorylation, and signal transduction (Supplementary Table S1). From these, we selected some genes possibly directly related to the aging process in each treatment (Table 3). A heatmap of these selected genes possibly involved with the aging/longevity process is presented in Figure 4, in which red indicates highly expressed (up-regulated) genes and green indicates down-regulated genes; the green-to-red transition reflects FPKM-normalized, log2-transformed counts.
Gene Ontology enrichment analysis revealed that most genes are mainly related to the control of gene expression in the GOIAS control vs. GOIAS aged-seeds comparison (Figure 5). When considering MINAS control vs. MINAS aged-seeds, most results are related to the production of miRNA involved in gene silencing and the negative regulation of development, with no enriched pathways found for down-regulated genes. Finally, when comparing GOIAS aged-seeds vs. MINAS aged-seeds, the enriched pathways are mainly related to the response to external stimulus, such as UV-light (Figure 4; Supplementary Tables S3-S5).
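The heatmap values mentioned above are FPKM-normalized, log2-transformed counts. As a minimal sketch of that transformation, the snippet below applies the standard FPKM formula followed by a log2 transform. The pseudocount of 1, the function names, and all numbers are assumptions for illustration; they are not taken from the study's data or pipeline.

```python
import math

def fpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

def log2_fpkm(value, pseudocount=1.0):
    """log2 transform; the pseudocount (an assumption) keeps zero counts finite."""
    return math.log2(value + pseudocount)

# Hypothetical fragment counts for one transcript in two libraries.
samples = {
    "control": {"count": 500, "total": 90_000_000},
    "aged": {"count": 2000, "total": 90_000_000},
}
length_bp = 1500  # hypothetical transcript length

heatmap_values = {
    name: log2_fpkm(fpkm(s["count"], length_bp, s["total"]))
    for name, s in samples.items()
}
for name, value in heatmap_values.items():
    print(f"{name}: log2(FPKM + 1) = {value:.2f}")
```

Normalizing by both transcript length and library size is what makes values comparable across transcripts and across the control and aged libraries before they are placed on a common color scale.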

Identification of Transcription Factors and Related Transcription-Mediated Complex
Based on annotation of the DEGs, transcription factors and mediators of RNA polymerase were identified. In GOIAS control vs. GOIAS aged-seeds, six transcripts encoded putative mediators/co-activators of RNA polymerase II, four encoded transcription factors and three encoded transcription activators/adapters. In the MINAS control vs. MINAS aged-seeds comparison, one transcript encoded a transcription factor and another a transcription initiation factor. Finally, in the GOIAS aged-seeds vs. MINAS aged-seeds comparison, two transcription factors, one transcription initiation factor, one mediator of RNA polymerase and one co-repressor were found (Table 4).

Discussion
When stored for long periods, seeds eventually lose their viability and, with it, their germination capacity. This affects not only commercial operations but also ex situ seed banks for species conservation representing remaining in situ populations. The ex situ maintenance of viable seeds covering wide genetic variability is an extremely important process for the genetic conservation of species [31]. In the case of long-lived tree species, such as A. fraxinifolium, the maintenance of seed banks still depends on long years of development and is hampered by irregular flowering throughout the reproductive season [32], which implies the need to add to existing collections frequently. Single-stranded RNA is notoriously unstable and degrades even in dry stored seeds. It was reported that the degradation of long mRNAs is stronger in aged seeds [6,33]. Other studies of RNA degradation have examined seed water content [34], RNA integrity number (RIN) [7,35], and transcriptome and gene expression levels [9,22]. This makes mRNA-associated processes a focal point in seed storage studies.
The molecular mechanisms behind the aging of seeds are associated with the oxidation of molecules such as nucleic acids, lipids and proteins, and protection from these effects involves the production of antioxidants, reduction of metabolism and active repair of nucleic acids [36]. In the current study, GO enrichment analysis showed that expression control and gene silencing pathways, such as miRNAs, and developmental, transcription and metabolism genes are up-regulated in all treatment comparisons. Although we found several enriched gene silencing pathways, several others are associated with the cellular developmental process, cell differentiation, or response to external stimulus, such as cellular response to light stimulus. Other studies also indicated that abscisic acid (ABA) is involved in seed dormancy and desiccation tolerance [37]. In this study, one gene encoding a glycine-rich protein (GRP), glycine-rich domain-containing protein 1, was identified as down-regulated in the treatment comparison. Increased expression levels of GRP proteins were associated with ABA induction in other species [38][39][40]. In both treatments, ABC transporter B family members are up-regulated. Other studies suggest that these genes are associated with abscisic acid [41,42], revealing the need for further investigation of these genes regarding their expression and abscisic acid content in A. fraxinifolium. These results suggest that this set of genes may be useful for future evaluation of seed viability in A. fraxinifolium. Despite the antagonistic effects of the ethylene and ABA signaling pathways, the presence of another down-regulated gene, Ethylene-responsive transcription factor RAP2-12, suggests a complex interaction, since both inhibit root growth after germination [43][44][45]. Interestingly, among genes common to both treatments, casein kinase 1-like protein 1 was suggested as a positive mediator of ABA signaling in Arabidopsis [46], but in this study it was shown to be down-regulated.
When considering the other common genes, one sterol synthesis-related gene (3beta-hydroxysteroid-dehydrogenase/decarboxylase isoform 1) [47] was found to be down-regulated. Low sterol contents result in inhibition of germination, while high levels induce earlier germination [48,49]. Another down-regulated gene, Protein FAR1-RELATED, is a component of the phytochrome A signaling pathway and has been found to be involved in abscisic acid (ABA) signaling, UV-B signaling, and reactive oxygen species (ROS) homeostasis, among others [50,51]. These results indicate that these pathways and differentially expressed genes can be further analyzed in the future for the development of an expression diagnostics tool for seed aging.
Of the up-regulated genes of aged seeds from Goiás (GOIAS), the DEGs with the highest expression values were related to protein kinases, helicases, microtubule proteins, cell membrane components, polymerases and the transport of carbohydrates. Of the up-regulated genes of aged seeds from Minas Gerais (MINAS), the DEGs with the highest expression values were related to DNA transcription, ubiquitin proteins, starch biosynthesis, zinc transport and late embryogenesis abundant (LEA) proteins. It has been shown that the synthesis of LEA proteins and heat shock proteins (HSP) is associated with longevity [52][53][54][55]. Changes were reported in gene transcript abundance in Arabidopsis during seed maturation and desiccation, related to the regulation of LEA and heat shock proteins, DNA repair, organelle protein synthesis, a decrease in the metabolism of carbohydrates, amino acids and nucleic acids, sugar transport, abiotic stress, starch synthesis, synthesis of storage proteins and synthesis of hormones [10]. LEA proteins are synthesized at the end of seed formation and are involved in protecting the plant from damage caused by environmental stresses, especially drought, cold and salinity, and are particularly related to protecting mitochondrial membranes from dehydration damage. HSPs are molecular chaperones produced by cells that oppose stress-induced denaturation of other proteins [56,57].
In a genome-wide association study (GWAS) with A. thaliana using transgenic plants, knockout mutants for a late embryogenesis abundant (LEA) protein showed a drastic reduction in germination after 18 months of natural aging of the seeds, as well as in artificial aging treatments. A mutant for another protein, related to photosystem I (PSAD1), was reported to exhibit the same pattern of low germination [58]. Thus, the results obtained here indicate that LEA may be a target gene for the development of future molecular tests in A. fraxinifolium, as may the proteins common among the treatments that were identified in the GO enrichment for response to light and UV stimulus.
Other changes in metabolism that affect seed longevity are often associated with oxidative damage, such as lipid peroxidation and the formation of reactive oxygen species [17,59]. Several studies have indicated the presence of a large number of proteins involved in the response to oxidative stress in dry mature seeds and in germination [60][61][62]. In addition, antioxidants such as glutathione [63], tocopherols [64] and flavonoids present in the integument [65] also play a role in longevity by relieving the oxidation that occurs during storage. In our study, an antioxidant-related gene, glutathione S-transferase U17, was identified among the MINAS up-regulated DEGs of aged seeds. To control cell damage caused by free radicals, seeds have developed a detoxification mechanism that includes antioxidant enzymes such as catalase, ascorbate peroxidase, glutathione peroxidase and glutathione reductase, among others [66]. In addition, ubiquitin proteins were found to be down-regulated in the treatments. Ubiquitin proteins play a role in the integration of environmental stimuli and signaling pathways, which results in complex interactions in response to environmental adversities, hormone responses, and plant growth and development (including seed longevity), and they are involved in the protection system [67][68][69][70][71].
When considering the differences between the two accessions, for MINAS the most abundant terms for the up-regulated genes are related to DNA repair, chromatin organization, transcription regulation and protein ubiquitination. On the other hand, in the GOIAS accession, the most abundant terms are transcriptional regulation, stimulus response, and transport. DNA repair is associated with seed longevity, so this intense activity accompanied by DNA synthesis is indicative of germination, where accumulated DNA damage is repaired early in imbibition [72][73][74][75]. However, in both accessions the Gene Ontology enrichment indicates processes of gene silencing regulation. In MINAS, the most representative enrichment category is the production of miRNAs involved in gene silencing by miRNA, whereas in GOIAS, Gene Ontology enrichment showed a high fold enrichment for gene silencing. The most representative term is negative regulation of gene silencing, followed by several processes involved in the cellular development process, possibly indicating that seeds are preparing to enter cell division. This scenario reinforces the greater seed viability of the GOIAS accession, shown in Table 1 by its higher germination compared with MINAS, the latter already being at an advanced stage of cellular and nuclear organization in relation to germination.
The control of gene expression during development requires a set of protein complexes that act on chromatin, methylation sites and histones, and as transcription factors that modulate expression. Thus, the identification of such factors/mediators associated with transcription is of great importance in transcriptome studies, including those on seed viability. We identified several transcription factors that have already been associated with germination and embryo development in plants. Among the transcription factors differentially expressed in the treatments, there is a large presence of transcriptional complex mediators and of transcription factor TFIID, associated with RNA polymerase II. TFIID has a central role in the transcription complex of RNA polymerase [76]. The mediators are co-factors that can increase or decrease expression and are also related to signaling pathways in plants [77][78][79]. Some mediators, such as MED21 in Arabidopsis, are required for embryo development and cotyledon expansion [80]. In addition, some mediators are described as being related to hormones, such as brassinosteroid and abscisic acid [81]. Regarding transcription factors, we identified the transcription factor RF2b, a bZIP (basic leucine zipper) described as a regulator of expression in response to tungro disease in rice [82]. Other studies indicate that bZIPs are related to seed maturation in Arabidopsis and peanuts [83,84]. Another, transcription factor UNE10, is related to seed desiccation sensitivity in Quercus [85] and to the regulation of cotyledon germination in Camellia oleifera [86]. The transcriptional adapter ADA2b is related to histone modifications and affects development in Arabidopsis [87,88].
Present only in the comparison of GOIAS control vs. GOIAS aged-seeds, ethylene-responsive transcription factor RAP2-12 is associated with gene expression under hypoxia in Arabidopsis, contributing to the control of oxidative stress, where overexpression of this type of transcription factor increased the survival of mutant plants [89][90][91]. Furthermore, the transcriptional activator DEMETER is present. This activator is associated with the DNA demethylase gene, acting on plant gene imprinting and modifying chromatin structure [92][93][94]. Another group of factors associated with germination are calmodulin-binding transcription activators, reported as essential to Na+ homeostasis, hormonal signaling pathways and processes related to plant development [95][96][97]. In MINAS control vs. MINAS aged-seeds, the transcription factor GTE10 (Global Transcription Factor Group E) was found, which is associated with ABA and sugar signaling [98]. In the GOIAS aged-seeds vs. MINAS aged-seeds comparison, the transcriptional corepressor SEUSS is present; it is associated with embryonic development in Arabidopsis and with the regulation of gene expression in stem cells [99,100].
Overall, a string of transcription factors and associated genes appear to play a role in the response to seed ageing. Some of these have already been described in seed germination or viability in other species, while others are novel in this context. Thus, the investigation of molecular mechanisms of seed longevity in this study can contribute to this and other native species. The indication of differentially expressed genes such as LEA and others from the photosystem, GRP and the ubiquitin-conjugating enzyme can serve as the basis for future investigations and contribute to the functional characterization of the seed aging process in A. fraxinifolium.

Sampling
Seed samples were obtained in 2013 from two mother trees in the Cerrado biome, in the Brazilian states of Goiás (coordinates 14°39′32.00″ S and 48°35′21.00″ W) and Minas Gerais (coordinates 16°45′38.80″ S and 43°53′02.10″ W). After cleaning, seeds from Minas Gerais (MINAS accession) and Goiás (GOIAS accession) were stored at 18% relative humidity (RH) and 5 °C. Following a previous study of seed longevity and viability [101], these two accessions were selected based on their longevity, which was high for GOIAS and low for MINAS. Aged and control seeds were used for both accessions (Figure 1). Each accession corresponds to the seeds from one tree. The seed collection was authorized for activities with a scientific purpose, under number 41166-1, by the Chico Mendes Institute for Biodiversity Conservation (ICMBio), which is linked to the Ministry of the Environment (MMA).

Artificial Aging Treatment
Embryonic axes were extracted from control seeds and from seeds artificially aged for 21 days (temperature of 45 °C and RH of 60%) according to [101]. Briefly, the viability data were transformed into probit units for sigma and P50 calculations [102]. Sigma indicates the time, in days, over which viability decreases by one probit unit, and P50 refers to the time, in days, at which seeds lose 50% of viability. Germination tests were performed according to Regras para Análise de Sementes [103]. For this, four replicates of 25 seeds were placed on filter paper rolls moistened with 2.5 times their weight in deionized distilled water, then rolled up and placed in transparent plastic bags. Seeds were placed in a BOD-type incubator at 30 °C for ten days with a photoperiod of 8 h of dark and 16 h of light, using 20WT1 tubular fluorescent lamps with a fluence rate of 30 µmol m⁻² s⁻¹. Germination was scored daily and seeds were considered germinated when radicle protrusion reached at least 2 mm. Prior germination tests established 30 °C as the optimal germination temperature. The embryonic axes extracted from seeds without protruded radicles after 20 h of imbibition were immersed in liquid nitrogen and stored at −80 °C until RNA extraction [101].
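The sigma and P50 definitions above can be illustrated with a minimal sketch. It assumes a straight-line survival curve in probit units, probit(viability) = Ki - t/sigma, fitted by ordinary least squares; published probit analyses usually use maximum likelihood, so this is a simplification rather than the procedure of [101,102]. The function names and the viability values are hypothetical.

```python
from statistics import NormalDist

def probit(p):
    """Convert a viability proportion (0 < p < 1) to a normal equivalent deviate.

    Assumption: probits are expressed so that 50% viability equals 0
    (some texts add 5 to avoid negative values; sigma is unaffected).
    """
    return NormalDist().inv_cdf(p)

def fit_survival_curve(days, viability):
    """Least-squares fit of probit(viability) = Ki - days / sigma."""
    y = [probit(v) for v in viability]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(days, y))
    slope = sxy / sxx
    ki = mean_y - slope * mean_x  # initial viability in probit units
    sigma = -1.0 / slope          # days needed to lose one probit unit
    p50 = ki * sigma              # days until probit = 0, i.e. 50% viability
    return ki, sigma, p50

# Hypothetical germination proportions during artificial aging.
# Proportions of exactly 0 or 1 must be excluded (probit is undefined there).
days = [0, 7, 14, 21, 28]
viability = [0.98, 0.95, 0.85, 0.60, 0.30]

ki, sigma, p50 = fit_survival_curve(days, viability)
print(f"Ki = {ki:.2f} probits, sigma = {sigma:.1f} days, P50 = {p50:.1f} days")
```

Under this model, an accession with a larger sigma or a larger Ki takes longer to fall to 50% viability, which is how two accessions can be ranked by longevity.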

Library Construction and Transcriptome Sequencing
RNA extraction was done by pooling 100 embryo axes per sample. Ageing conditions used for RNA-Seq are described in the Artificial Aging Treatment section. Embryos were ground in liquid nitrogen and RNA was extracted using a NucleoSpin® RNA Plant kit (Macherey-Nagel, Düren, Germany) following the manufacturer's instructions. RNA concentration and purity were determined using a spectrophotometer (NanoDrop 2000, Thermo Fisher Scientific, Waltham, MA, USA). Libraries were constructed using a TruSeq® RNA Library Prep Kit v2 (Illumina, San Diego, CA, USA) and sequenced in a single-lane, 100 bp paired-end run on a HiSeq 2500 (Illumina, San Diego, CA, USA).

De Novo Assembly, Functional Annotation and Differential Gene Expression
Raw reads were trimmed with Trimmomatic [104], filtering out those with a quality value (QV) lower than 30. Transcriptome de novo assembly was done using Trinity [105]. Transcripts were annotated with Trinotate (https://trinotate.github.io/; accessed on 1 January 2022) using the UniProt/Swiss-Prot database. Transcript abundances were calculated with RSEM [106]. The analysis of differentially expressed genes (DEGs) was performed for the following comparisons: (1) GOIAS aged vs. control; (2) MINAS aged vs. control; (3) GOIAS vs. MINAS aged seeds; and (4) GOIAS vs. MINAS control seeds. Differential gene expression analysis was performed using DESeq [107], with a Benjamini-Hochberg adjusted p-value (Padj) ≤ 0.05 and a log2 fold change ≥ 2 as cut-offs. The list of UniProt IDs from differentially expressed transcripts was used for functional annotation, and enzyme commission numbers (EC numbers) were assigned to differentially expressed genes according to the functional annotation data retrieved from UniProt through Blast2GO software v. 5.2.1 (Valencia, Spain) [108]. Gene Ontology (GO) enrichment analysis was performed using the ShinyGO software [109], with the Arabidopsis thaliana genome as a model and a significance level of 0.05 (hypergeometric test), using the False Discovery Rate (FDR) as the adjustment method. The top 20 enriched pathways were used for gene function investigation and functional category clustering.
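The DEG cut-offs described above can be sketched in a few lines. The example below applies a plain Benjamini-Hochberg adjustment to raw p-values and then keeps transcripts with Padj ≤ 0.05 and an absolute log2 fold change ≥ 2. It is an illustration only: it does not reproduce the statistical model of DESeq, treating the fold-change threshold as an absolute value is an assumption, and the transcript names and numbers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_degs(results, padj_cutoff=0.05, lfc_cutoff=2.0):
    """Filter (transcript_id, log2_fold_change, raw_pvalue) records.

    Assumption: the fold-change cut-off applies to the absolute value,
    so both up- and down-regulated transcripts are retained.
    """
    padj = benjamini_hochberg([r[2] for r in results])
    return [
        (tid, lfc, q)
        for (tid, lfc, _), q in zip(results, padj)
        if q <= padj_cutoff and abs(lfc) >= lfc_cutoff
    ]

# Hypothetical test results for four transcripts (aged vs. control).
results = [
    ("transcript_1", 3.1, 0.0001),
    ("transcript_2", -2.5, 0.002),
    ("transcript_3", 1.2, 0.0005),
    ("transcript_4", 4.0, 0.20),
]

for tid, lfc, q in select_degs(results):
    direction = "up" if lfc > 0 else "down"
    print(f"{tid}: log2FC = {lfc:+.1f}, Padj = {q:.4f} ({direction}-regulated)")
```

In this toy input, one transcript is dropped for a small fold change despite a low p-value and another for a large adjusted p-value despite a large fold change, which shows why both cut-offs are applied together.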