Comparative Genomics of Seasonal Senescence in Forest Trees

In the course of evolution, both flowering plants and some gymnosperms have developed such an adaptation to winter and unfavorable living conditions as deciduousness. Of particular interest is Siberian larch (Larix sibirica Ledeb.), which is the only species in the pine family (Pinaceae) with a seasonal deciduousness. New generation sequencing technologies make it possible to study this phenomenon at the genomic level and to reveal the genetic mechanisms of leaf and needle aging in angiosperms and gymnosperms. Using a comparative analysis of the genomes of evergreen and deciduous trees, it was found that the genes that control EXORDIUM LIKE 2 (EXL2) and DORMANCY-ASSOCIATED PROTEIN 1 (DRM1) proteins are most represented in Siberian larch, while an excess of genes that control proteins acting as immune receptors were found in evergreens. Orthologs from the family of genes that control leucine-rich repeat receptor-like kinases (LRR-RLK) contributed mostly to the distinction between evergreens and deciduous plants.


Introduction
Deciduousness and seasonal senescence are evolutionarily important adaptations to winter and unfavorable living conditions in higher plants (both angiosperms and gymnosperms). Genetic regulation of the seasonal senescence is of primary interest for us. Deciduousness in broad-leaved woody plants of the boreal zone is clearly related to the autumn season: the vast majority of boreal plant species lose their foliage at the end of the vegetation season. Obviously, this reaction is strongly correlated with weather and is regulated by environmental factors such as temperature and daylight length. In cold winter, most perennials hibernate and shed their foliage entirely. Simultaneously, the metabolism of these plants changes and slows down.
The majority of conifer species in the boreal climatic zone are evergreen. Needles fall off throughout the entire year with an intensity that does not have a pronounced seasonality. It is typical for evergreens and the vast majority of conifers, which are gymnosperms. Larches in the Genus Larix, including Larix sibirica Ledeb. represent an exception among conifers, shedding all the needles in the fall, like most flowering plants do.

Genetic Regulation of Leaf Senescence
The genes controlling proteins of the NAC and WRKY families are among the most common and well-studied genetic regulators of leaf senescence [1]. Expression of these genes depends on both internal (changes in the level of reactive oxygen species (ROS), PH, metal ions, and hormones) and external (drought, cold, and light level) signals [1].
In addition, the onset of the leaf aging stage depends on the rate of development of the leaves themselves from the moment of their formation. Coordination of the stages of leaf development is associated with a number of physiological processes and associated with them are gene expression such as cell growth and proliferation (genes BOP1, KNAT1, TCP4, miR396, GPFs, GIFs, ARGOS, ANT, and TOR), hormonal regulation associated with cytokinins (AHKs and CRFs) and auxins (ARF2 and SAUR36), transmission of signals of oxidative stress (FHY3, FAR1 and REF), light transmission of signals (phyA, PHYb, PIFs, and GLK2), and circadian regulation (TOC1) [1].
Gene expression can be regulated at several levels: transcriptional, post-transcriptional, translational, and post-translational.

Regulation at the Transcriptional Level
The regulation of gene expression at the transcriptional level is carried out mainly by modifying histones and enzymes that model chromatin. It has been shown that mutations in the helicase domain in SWI2/SNF2-like chromatin-remodeling proteins, DRD1 and DDM1, delay leaf senescence in Arabidopsis thaliana and also increase its lifespan [2]. Normally, these proteins, participating with ATP, change the composition and arrangement of nucleosomes, allowing other proteins to gain access to DNA, in particular DNA methyltransferases, which is necessary for suppressing gene expression (silencing). Histone H3K4 demethylase JMJ16 negatively regulates leaf senescence in Arabidopsis, at least partly through repressing the expression of positive regulators of leaf senescence, WRKY53 and SAG201 [3].Yellowing of leaves due to the destruction of chlorophyll is one of the visible manifestations of leaf aging. Chlorophyll degrades with the participation of EIN3, ANAC092 (ethylene-mediated degradation) [4] and MYC2/3/4 (in the presence of methyl jasmonate) genes [5]. The transcription factor ANAC046 provides a positive feedback in the regulation of leaf senescence; it directly binds to promoters of genes associated with chlorophyll catabolism (NYC1, NYE, and NYE2), activating their expression [6]. The genes ANAC042, ANAC017/082/090, ANAC083, WRKY54/70, WRKY18/40/60, bHLH03/13/14/17, MYB44, LUX, TCP19, and TCP20 inhibit the leaf senescence [1].

Regulation at the Post-Transcriptional Level
Several studies reported post-transcriptional regulation of leaf senescence by some miRNAs [7][8][9][10]. For instance, miR164 expression gradually decreases with leaf aging through negative regulation by EIN2 [7], which leads to the elaborate up-regulation of ORE1 expression (a positive regulator of cell death and leaf senescence in Arabidopsis) [8]. Plants with over-expression of miR319 exhibit the phenotype of delayed leaf senescence; the target is TCP genes that play the key role in leaf senescence [9]. MiR390 activates the production of tasiRNAs (formed from TAS3) targeted on ARF2 gene (positive regulator of leaf senescence) [10].
Mutations in the HDA9 (histone deacetylase 9) gene slow down leaf aging [11]. It was suggested that HDA9 in complex with a SANT domain-containing protein POWERDRESS (PWR) (that facilitates the transport of HDA9 from the cytoplasm into the nucleus) and transcription factor WRKY53 (that attaches to the W-box) binds the promoters of the key negative regulators of aging (APG9, NPX1, and WRKY57) and then deacetylates lysine residues at the n-terminal portion of histones, thereby suppressing gene expression.

Regulation at the Translational Level
A mutation in the ORE4 gene is known for the delay of senescence and the growth rate of leaves [12]. It was suggested that it is associated with a decrease in the photosynthesis rate due to inhibiting the expression of the PRPSI7 gene. Interestingly, the mutation delayed natural aging, but it had a minor effect on aging induced by darkness, abscisic acid (ABA), methyl jasmonate (MeJA), and ethylene.

Regulation at the Post-Translational Level
Post-translational regulation involves phosphorylation, glycosylation, ubiquitylation, methylation, and acetylation that affect the conformation, activity, stability, and localization of proteins [13]. WRKY53 enzyme provides an example of post-translational regulation; it is a positive regulator of leaf senescence. This enzyme can be phosphorylated by MEKK1 (mitogen-activated protein kinase kinase kinase 1), resulting in the growth of its ability to bind to target promoters in DNA [14]. It is deactivated due to ubiquitination of the WRKY53 protein by ubiquitin ligase UPL5 [15]. RECEPTOR PROTEIN KINASE 1 (RPK1) has an important regulatory role in ABA-mediated and age-dependent leaf senescence. Loss-offunction mutations in RPK1 lead to a significant delay in ABA-induced and age-dependent leaf senescence [16].

Results
The number of eukaryotic and undefined orthologs selected by the EggNOG program in the conifer genomes for the comparative analysis are presented in Table 1. Six orthogroups were found with a statistically significant quantitative difference in the number of orthologs within an orthogroup between evergreen and deciduous trees based on the two-sample t-test ( Figure 1, Table 2). Quantitative analysis of the representation of protein groups in the studied organisms performed using the OrthoFinder program. Yellow and green colors highlight deciduous and evergreen species, respectively. The multigenic phylogenetic tree was obtained using the STAG method for inferring a species tree from a set of single gene trees (supertree approach). Red color highlights less than the average number of proteins in the group, blue highlights more. The first four groups are immune receptors, while the last two are sugar-associated: EXORDIUM-like 2 is involved in sugar sensing, and dormancy-associated protein 1 is involved in responses to fructose, glucose, and sucrose. A smaller number of proteins were observed in L. sibirica than in evergreen conifers in the first three groups, and it was the opposite in the next three groups. In the fourth group, the protein value in Pinus lambertiana, unlike other conifers, was slightly above average, similar to L. sibirica, but still significantly lower than in L. sibirica (p = 4.2 × 10 −6 , Table 3). It is important to note that in the first group, the pattern of protein distribution in deciduous and evergreens was the same for both gymnosperms and angiosperms-the number of proteins was much lower in deciduous species than in evergreens. An orthogroup is a set of genes from multiple species descended from a single gene in the last common ancestor (LCA) of that set of species [19]. There were orthogroups presented in only a few species, but we selected only those that were present in all organisms. However, the remaining orthogroups that are represented in only a few species are also of interest for further research. In general, the generated phylogenetic tree in Figure 1 is in agreement with consensual phylogenetic relationships for these species based on traditional markers [20,21]. The OrthoFinder software was used to carry out a comparative analysis of the proteomes of evergreen and deciduous trees. As a result, a difference was found in the representation of orthologs associated with cell reception and signaling. The common domain for these proteins was a protein kinase domain ("Pkinase"; identifier in the PFAM database-PF00069). Among the proteins containing the "Pkinase" domain, 1496 orthogroups were identified.
We additionally tested under-and overrepresentation of ortholog members in the six selected orthogroups for conifers using Fisher's exact test for a 2 × 2 contingency table. The obtained two-tailed p-values that supported the selection of these orthogroups that demonstrated under-or overrepresentation of ortholog members in deciduous L. sibirica vs. all other four evergreen conifer species (Table 3).
Principal components analysis showed that for proteins containing the protein kinase domain (Table 4) the angiosperms and gymnosperms formed two separate clusters ( Figure 2). In the angiosperms cluster, a certain order in the mutual arrangement of the studied taxa is revealed. In general, evergreens are located in the upper half of the cluster, while deciduous ones are in the lower half. A similar pattern is observed in the cluster of gymnosperms: the only deciduous species (Siberian larch) occupies a separate position in this cluster. In general, order of mutual arrangement of taxa according to deciduousness was similar in angiosperms and gymnosperms.  The differences in the representation of some orthologues between evergreen and deciduous forest trees are shown in Figure 3. Evergreens, both gymnosperms and angiosperms, differed from deciduous species in the representation of At3g47570-like LRRkinases orthologues ( Figure 3A). For other orthogroups among angiosperms, there were no differences between deciduous and evergreen species (Figure 3B-F). However, peculiarities were found in gymnosperms, namely: L. sibirica had more proteins in the orthogroups of receptor-like lectin kinases of the L-type ( Figure 3B), EXORDIUM-like 2 proteins (Figure 3E), and DRM1-like proteins ( Figure 3F); while evergreens had a higher representation of proteins in the orthogroups LRK10L serine/threonine-protein kinases ( Figure 3C) and disease resistance protein ADR1 ( Figure 3D).

Discussion
The orthologues of the At3g47570 Arabidopsis gene, encoding serine/threonine kinaselike proteins containing leucine repeats contributed mostly into the distinction between evergreens and deciduous plants ( Figure 3A). This observation is in agreement with published data on the increase in activity of this kinase with the leaf age in Arabidopsis thaliana [22]. The At3g47570 gene is also a target of the WRKY40 transcription factor, which is involved in the regulation of the protective response when exposed to pathogenic microflora [23].
For Siberian larch, an enriched cluster was found that comprises the orthologs of the receptor-like L-type lectin kinase (LecRK) ( Figure 3B). The proteins of this type are involved in important biological process of the protection from pathogens [24]. It was reported that overexpression of LecRK-IX.1 or LecRK-IX.2 in Arabidopsis increased Phytophthora resistance but also induced cell death [25]. However, cell death induced by overexpression of LecRK-IX.1 and LecRK-IX.2 was not correlated with leaf senescence [25].
Contrary to deciduous conifers, evergreen conifers are characterized by a predominance of orthologs of the receptor-like serine/threonine protein kinase LRK-10L (biological process-innate cellular response) ( Figure 3C). The loss-of-function mutants of the Arabidopsis orthologue of the wheat LRK10 gene showed ABA-insensitive and drought stress-sensitive phenotypes [26].
The difference between deciduous and evergreen conifers in the abundance of disease resistance associated proteins (LRR-kinases, LRK10L serine/threonine-protein kinases, ADR1 and receptor-like lectin kinases of the L-type) was also found (Table 3). Additionally, they are all immune receptors. ADR1 proteins contain NB-ARC domain, and it is known that NB-ARC proteins function as molecular switches regulating many processes, including immunity and apoptosis [4]. Resistance (R) genes are associated with preformed and inducible defenses [27]. It was also suggested that R genes likely contribute to the remarkable longevity of Ginkgo biloba [28]. ADR1 orthologues are most represented in evergreen conifers ( Figure 3D). They convey broad spectrum disease resistance. Overexpression of ADR1 also increased drought tolerance in Arabidopsis [29].
In the reconstructed proteome of larch, the orthologs of the EXL2 ( Figure 3E) and DRM1 ( Figure 3F) proteins were more abundant in comparison to evergreen trees. The EXL2 protein is involved in the signaling pathway of sugars; an increase in the expression of this protein occurs with a decrease in the availability of carbon [30]. The activity of the dormancy-associated DRM1 protein increases during leaf senescence and is associated with the response to the level of glucose, sucrose, and fructose [31].

Materials and Methods
Genomes and their annotations for 14 studied species were downloaded from the open NCBI GenBank database (https://www.ncbi.nlm.nih.gov. accessed on 1 February 2020) and presented in Table 5, including the genome of L. sibirica [32]. Extraction of coding nucleotide sequences of genes in the studied genomes was carried out using Cufflinks software package v. 2.2.1 [33]. Then, the retrieved sequences were translated into amino acids using EMBOSS software v. 6.6.0 [34].
We double-checked the plant genomes used in our study to exclude foreign sequences representing bacteria, viruses, fungi, etc., from the analysis of the plant specific genes. Subsequently, all nucleotide sequences encoding proteins isolated from the assembled genomes were checked against genes representing archaea, bacteria, eukaryotes, and viruses using EggNOG-mapper v. 2.1.6 [35] to strictly select only the eukaryotic orthologs for further analysis and also undefined orthologs that do not belong archaea, bacteria, and viruses. Annotation of protein sequences by functional domains was performed using InterProScan software v. 79.0 [36].
The orthologs search was performed using OrthoFinder software v. 2.3.12 [19]. For the analysis of protein data, we used with default parameters DIAMOND for sequence similarity search [37], mafft for multiple sequence alignment, and fasttree for tree inference using a maximum likelihood approach with the JTT model [38]. The multigenic phylogenetic tree was obtained using the STAG method for inferring a species tree from a set of single gene trees (supertree approach) [39] and was visualized using Dendroscope 3 software v. 3.7.2 [40,41]. Upon clustering of the proteins into orthologous groups, the orthogroups by which L. sibirica differed from other conifers were analyzed using the BLAST program v. 2.11.0 in the UniProt database [42]. All genes in an orthogroup were supposedly descended from a single ancestral gene [27]. Principal component analysis (PCA) was performed to cluster orthogroups using the Past3 software v. 3.22 [43] and pairwise Euclidean distances based on differences in the representation of genes in orthogroups.

Conclusions
The presented data show that the proteins functioning as immune receptors are the most abundant in evergreen trees. They also provide resistance to pathogens and may also be associated to the longer life expectancy of conifers. It supports the role of immune defense in the regulation of leaf lifespan. The orthologues of the LRR receptor-like serine/threonine-protein kinase At3g47570 made the greatest contribution to the difference between evergreen and deciduous plants, which is in agreement with published data on their activity increase during leaf senescence in Arabidopsis thaliana, whereas other orthogroups in L. sibirica were also very significantly different in number than others. It is unlikely that studying more species would change the main findings and conclusions, but certainly additional species should be studied and can provide more details on genetic differences between evergreen and deciduous.  Data Availability Statement: All data are available in the article.

Acknowledgments:
We are thankful to Dmitry Kuzmin and Vadim Sharov (Siberian Federal University, Krasnoyarsk, Russia) for their help with computing and support in data processing. We thank two anonymous reviewers for thoroughly and promptly reviewing our manuscript and their valuable recommendations and comments that helped us improve the manuscript. We acknowledge support from the Open Access Publication Funds of the University of Göttingen.

Conflicts of Interest:
The authors declare no conflict of interest.