Genome-Wide Analysis of the Late Embryogenesis Abundant (LEA) and Abscisic Acid-, Stress-, and Ripening-Induced (ASR) Gene Superfamily from Canavalia rosea and Their Roles in Salinity/Alkaline and Drought Tolerance

Canavalia rosea (bay bean), distributed in coastal areas and on islands in tropical and subtropical regions, is an extremophile halophyte with good adaptability to seawater and drought. Late embryogenesis abundant (LEA) proteins typically accumulate in response to various abiotic stresses, including dehydration, salinity, high temperature, and cold, or during the late stage of seed development. Abscisic acid-, stress-, and ripening-induced (ASR) genes are stress- and developmentally regulated plant-specific genes. In this study, we report the first comprehensive survey of the LEA and ASR gene superfamily in C. rosea. A total of 84 CrLEAs and three CrASRs were identified in C. rosea and classified into nine groups. All CrLEAs and CrASRs harbored the conserved motif for their family proteins. Our results revealed that the CrLEA genes were widely distributed on different chromosomes, and all of the CrLEA/CrASR genes showed wide expression features in different tissues in C. rosea plants. Additionally, we introduced 10 genes from different groups into yeast to assess the functions of the CrLEAs/CrASRs. These results contribute to our understanding of LEA/ASR genes from halophytes and provide robust candidate genes for functional investigations in plant species adapted to extreme environments.


Introduction
Abiotic stresses, such as high temperature, high salinity/alkaline, extreme aridity, and cold or freezing, influence plant growth and often cause a deficit in cellular water, thereby causing a series of changes, including biochemical alterations in gene expression, osmolytes, and the accumulation of specific proteins involved in the stress response. Late embryogenesis abundant proteins (LEAs) and abscisic acid-, stress-, and ripening-induced proteins (ASRs) are thought to play crucial roles in resistance to drought and other water-deficit stresses [1][2][3][4][5]. LEA genes have been characterized in plants ranging from algae to higher plants, as well as in invertebrates, fungi, and bacteria [6]. LEA proteins, which are intrinsically disordered proteins, are often highly hydrophilic and are identified as hydrophilins [7,8]. LEA proteins play protective roles during cell on tropical and subtropical coral islands and coastal zones. There is no doubt that C. rosea has developed elaborate mechanisms to adapt to the high salinity/alkaline and drought stress caused by saltwater intrusion or freshwater deficit in sand dunes or coral reefs, and thus the identification of stress-relevant genes in this species is necessary. In the present study, based on the genome sequence of C. rosea, we explored the number of LEA/ASR proteins and their families as well as their structural characterization, gene chromosomal location, and tissue- or habitat-specific expression patterns in C. rosea. The availability of a whole-genome sequence of C. rosea will facilitate genome-wide analysis for identifying the evolutionary relationships of C. rosea LEAs/ASRs with related leguminous species. Additionally, the genome provides a source for mining the genetic resources, including LEA/ASR genes, of C. rosea plants, which have evolved to survive under extreme stress.

Identification of the C. rosea LEA Family and ASR Family
Based on the hidden Markov model (HMM) profile search and protein BLAST search results, a total of 84 CrLEA and three CrASR members were identified and annotated from the C. rosea genome database. The set of C. rosea LEA and ASR proteins included three LEA_1 members, 60 LEA_2 members, six LEA_3 members, two LEA_4 members, two LEA_5 members, two LEA_6 members, five dehydrins, four SMP members, and three ASR members (Table 1 and Table S2). The predicted CrLEAs and CrASRs were named according to their subfamilies and the C. rosea gene nomenclature system (Table 1). The sequence information of all 84 CrLEA and three CrASR genes can be found in Table S2.
The physicochemical properties were assessed using a series of bioinformatics programs. In this large and varied family, the protein lengths of all CrLEAs and CrASRs ranged from 82 to 499 aa, with a molecular weight (MW) ranging from 8.58 (CrDHN1) to 57.65 (CrLEA2-35) kDa (Table 1). The theoretical isoelectric points (pI) of the CrLEAs and CrASRs ranged from 4.77 (CrSMP3) to , and most of them (67 members, 75%) were considered to be basic (pI > 7). The instability index (II) ranged from −5.30 (CrDHN5) to 62.03 (CrLEA2-37), with an average of about 39; a minority (38 members, 43%) had a high II (>40), which indicated that most of these members might be stable. Most CrLEAs and CrASRs presented calculated grand average of hydropathy (GRAVY) values of less than 0, implying that these proteins are quite hydrophilic. Consistently, the aliphatic index (AI) assessment showed that most members had lower values, indicating that only a small number of these proteins appeared to be lipophilic or hydrophobic. We calculated the contents of disordered amino acids in all CrLEAs and CrASRs according to the prediction of PrDOS, and the results indicated that most CrLEA_2s and all CrSMPs had lower values (<30%) than the other CrLEAs or CrASRs (>40% in most members) (Table 1). This is consistent with the view that the LEA_2 subfamily is atypical because it contains more hydrophobic amino acids and possesses a more defined secondary structure in solution than the other LEA subfamilies [20]. We also summarized the 3D structures of all CrLEAs and CrASRs (Figure S1). The results showed that all CrLEA_2s comprised several β-sheets and presented a defined secondary structure, while the other CrLEAs and CrASRs showed disordered structures, except for the two CrLEA_4s, which contained two α-helices. This is in accordance with the features of intrinsically disordered proteins (IDPs) [23].
The detailed disorder profile plots of all CrLEAs and CrASRs predicted by PrDOS program were also summarized in Supplementary Materials Figure S2.
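The hydropathy and aliphatic indices discussed above can be illustrated with a minimal sketch. It assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and Ikai's formula for the aliphatic index; the section does not name the programs used, so the function names and the example sequences here are hypothetical and are not taken from Table 1.

```python
# Kyte-Doolittle hydropathy values per residue (standard scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()
    n = len(seq)

    def pct(aa):
        return 100.0 * seq.count(aa) / n

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


def is_predicted_stable(instability_index):
    """Threshold used in the text: an instability index above 40 is treated as unstable."""
    return instability_index <= 40.0


if __name__ == "__main__":
    toy = "MKKEESGDKHH"  # hypothetical hydrophilic peptide, not a CrLEA sequence
    print(round(gravy(toy), 3), round(aliphatic_index(toy), 2))
```

A negative GRAVY value from `gravy` corresponds to the "hydrophilic" reading used in the text, and a low `aliphatic_index` corresponds to few aliphatic side chains.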
The subcellular localization prediction revealed that the CrLEA and CrASR proteins were distributed across nearly all subcellular compartments, including the nucleus, cytoplasm, chloroplast, and mitochondria, and even extracellular regions or secretory pathways. As exceptions, the dehydrin subfamily was more likely to be located in the cell nucleus, and the SMP subfamily was more likely to exist in the cytoplasm. The CrASR members were predicted to localize to both the nucleus and the cytoplasm (Table 1), which is consistent with their possible functions as transcription factors or chaperones, as previously reported in other species [11,12,14].

Phylogenetic Analysis of CrLEA Proteins and CrASR Proteins
To systematically classify the C. rosea LEA and ASR genes and uncover the evolutionary relationships among this diverse gene family, an unrooted phylogenetic tree of 84 CrLEA and three CrASR members was constructed with MEGA6 using neighbor-joining analysis based on the sequences of the CrLEA and CrASR proteins ( Figure 1). The members of this superfamily, including a total of 87 members, clustered together according to their Pfam domains and were classified into two main branches: The LEA_2 subfamily was the largest, with only one distinctive member, CrLEA6-1; the other LEA_1, LEA_3, LEA_4, LEA_5, SMP, DHN, ASR members, and CrLEA6-2 formed another branch. Interestingly, according to the evolutionary relationships, the CrASR subfamily showed a closer relationship with the LEA_4 and LEA_1 subfamilies. There were 29 sister gene pairs in the evolutionary tree, with a bootstrap support value >90% (Figure 1). Among them, nine sub-branches, including

As plant-specific ASRs have been considered a small gene family independent from the plant LEA family in some previous reports [11], we also constructed a separate ASR phylogenetic tree using MEGA6 (Figure S3). The sequence information of all plant ASR proteins is presented in Table S2. Our results indicated that the ASR members did not show strict evolutionary relationships within the same species, or even within the same family. The ASR sequences varied considerably within the same species, which indicated that this family was not particularly evolutionarily conserved (Figure S3).

Gene Structures and Conserved Motifs of CrLEAs and CrASRs in C. rosea
Gene structure provides an important reference for estimating evolutionary relationships and functional expression patterns, and conserved motif analyses of proteins are valuable for determining their biochemical functions. To investigate the structural characteristics of CrLEAs and CrASRs, the exon-intron structures (Figure 2A) and conserved motifs (Figure 2B) of the 87 member genes or proteins were analyzed. Gene structure analysis showed that the majority of the CrLEAs and CrASRs contained zero or one intron, and only 12 CrLEA genes (CrLEA2-10, CrLEA2-13, CrLEA2-18, CrLEA2-27, CrLEA2-35, CrLEA2-36, CrLEA2-37, CrLEA2-41, CrLEA2-47, CrLEA2-55, CrLEA6-1, and CrSMP2) possessed two or three introns (Figure 2A). The CrLEA_2 subfamily had the most intronless members, which might indicate that these members evolved recently through retrotransposition under a certain degree of evolutionary pressure imposed by natural selection. Meanwhile, in each subfamily, some members possessed similar exon-intron structures and intron and exon lengths, which also indicated that these genes were more closely related to each other than to the other members.
The conserved motifs of the CrLEA and CrASR proteins were analyzed and compared (Figure 2B). Because the proteins of this superfamily are relatively short, we separately analyzed three independent conserved motifs in each CrLEA and CrASR subfamily. These conserved motifs are listed in Table S3. The results suggested that members of each subfamily possessed similar specific conserved motifs, implying functional specificities of different subfamily proteins. We also compared the conserved motifs with the Pfam domains retrieved from the Pfam database, and the preliminary results suggested that the conserved motif prediction was consistent with the Pfam domain localization in most CrLEA or CrASR members (Figure 2C).

Chromosomal Locations and Evolutionary Characterization of CrLEAs and CrASRs
To further investigate the evolutionary relationships and genomic positions of the CrLEA/CrASR superfamily, we mapped their chromosomal locations according to their gene loci in Table 1. Most of the CrLEAs/CrASRs were extensively and evenly distributed on the 11 chromosomes. High-density LEA_2 gene clusters were identified in certain chromosomal regions, including on chromosomes 2, 3, 7, 8, 9, 10, and 11 (Figure 3), which indicated that the clustered chromosomal locations of the CrLEA_2 genes may be the result of gene duplication.
Gene family expansion occurs via three mechanisms: tandem duplication, segmental duplication, and whole-genome duplication (WGD). Tandem duplication and segmental duplication are essential for the evolution of gene families in order to adapt to varying environmental conditions. The results showed that 19 pairs of segmentally duplicated genes and seven pairs of tandemly duplicated genes were identified among the 84 C. rosea LEA genes (Table 2), and one tandem duplication gene cluster was found at the end of chromosome 11 (CrLEA2-58/CrLEA2-59/CrLEA2-60) (Figure 3).
Synonymous (Ks) and nonsynonymous (Ka) substitution values were calculated from the nucleotide sequences of the CrLEAs to explore the selective pressures on duplicated CrLEAs. In general, a Ka/Ks ratio greater than 1 indicates that the duplicated gene is under positive selection, a ratio equal to 1 indicates neutral selection, and a ratio less than 1 indicates purifying selection. The results revealed that most of the CrLEAs possessed Ka/Ks ratios greater than 0.1. The Ka/Ks ratios of the paralogous CrLEA gene pairs ranged from 0.101 to 0.750, with a mean value of ~0.307. These results indicated that the duplicated genes appeared to have undergone extensive purifying selection during evolution and might preferentially conserve function and structure under selective pressure (Table 2). The distribution of segmentally duplicated CrLEA genes on the C. rosea chromosomes is illustrated in Figure 4.
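The interpretation rule for the Ka/Ks ratio can be written as a small sketch. The function name, the tolerance used for "equal to 1", and the Ka and Ks inputs in the usage lines are illustrative assumptions; only the ratios 0.101 and 0.750 correspond to the range reported in the text, and the actual Ka and Ks values are in Table 2.

```python
def selection_mode(ka, ks, tol=1e-6):
    """Return (Ka/Ks, label) for one duplicated gene pair.

    ratio > 1  -> positive selection
    ratio == 1 -> neutral selection (within a small tolerance, an assumption here)
    ratio < 1  -> purifying selection
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        label = "neutral"
    elif ratio > 1.0:
        label = "positive"
    else:
        label = "purifying"
    return ratio, label


if __name__ == "__main__":
    # Hypothetical Ka and Ks values chosen so the ratios match the reported extremes.
    for ka, ks in [(0.101, 1.0), (0.750, 1.0)]:
        print(selection_mode(ka, ks))
```

Under this rule, every pair in the reported range of 0.101 to 0.750 falls in the purifying-selection class.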

Cis-Regulatory Element Analyses of CrLEAs and CrASRs
The CrLEA members showed a duplication-prone pattern (Table 2, Figure 4), and within some subfamilies, the CrLEA/CrASR members often possessed similar gene structures and conserved amino acid motifs (Figure 2), which suggested that these members might have redundant or overlapping functions. This is also a process of adaptive evolution in which plant species promote their adaptability and improve their survival under extreme environments or special habitats by altering their DNA. Compared with the numbers or structures of genes, the promoter regions of functional genes showed more variability, which also created a more elaborate and more efficient regulatory mechanism for exercising the biological functions of the genes. To further understand the potential functions and regulatory mechanisms of CrLEAs and CrASRs, especially the possible roles of this gene family in the adaptation of C. rosea to high salinity and drought, we analyzed the 1000 bp promoter regions upstream of the 87 putative CrLEAs and CrASRs, as these regions are believed to play important roles in regulating the spatial and temporal expression of genes. In total, 13 types of cis-regulatory elements were summarized in this study, including light response elements, gibberellin-responsive elements, MeJA-responsive elements, auxin response elements, salicylic acid-responsive elements, ABRE, ERE, MYC, MYB, MBS, TC-rich repeats, LTR, and as-1. The results are shown in Figure 5A, and the interpretation and localization of these cis-regulatory elements are provided in Table S4.
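A promoter scan of this kind can be sketched as a motif search over the upstream sequence. This is an illustration only, not the tool used in the study: the two motif strings below (an ACGTG core for ABRE and the CANNTG E-box for MYC) are assumed consensus sequences, the actual element definitions are those in Table S4, and the toy promoter is hypothetical.

```python
import re

# IUPAC nucleotide codes translated into regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Assumed consensus motifs, for illustration only.
MOTIFS = {"ABRE": "ACGTG", "MYC": "CANNTG"}


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(promoter, motif):
    """Count motif occurrences (overlaps included) on the + and - strands."""
    # A lookahead lets the regex engine report overlapping matches.
    pattern = re.compile("(?=" + "".join(IUPAC[c] for c in motif) + ")")
    promoter = promoter.upper()
    return {
        "+": len(pattern.findall(promoter)),
        "-": len(pattern.findall(reverse_complement(promoter))),
    }


if __name__ == "__main__":
    toy_promoter = "TTACGTGGCATATGAA"  # hypothetical 16 bp fragment
    for name, motif in MOTIFS.items():
        print(name, count_sites(toy_promoter, motif))
```

Palindromic motifs such as CANNTG match the same site on both strands, so per-strand counts should not simply be summed when tallying elements per promoter.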
In addition, ASRs and dehydrins have been reported to be crucial in the tolerance to different abiotic stresses, including drought, high salinity/alkaline, or extreme temperatures [11,41]. The LEA_4 subfamily has been suggested to have a close association with the evolution of desiccation tolerance in plants [20]. In this study, we paid special attention to the promoter regions of CrASRs (three members), CrDHNs (five members), and CrLEA_4s (two members). We summarized the abiotic stress-related cis-regulatory elements (including MYB, MYC, as-1, ABRE, MBS, ERE, LTR, and TC-rich repeats) within these 10 CrASR and CrLEA promoter regions ( Figure 5B). The categories and numbers of these elements suggested that the mechanisms regulating CrASR and CrLEA expression are involved in stress responses. As these genes had a high number of cis-regulatory elements related to ABA and drought stress responses, they were also selected for further functional analysis to obtain their detailed functions and regulatory mechanisms.

Expression Profiles of CrLEAs and CrASRs in Different Tissues and Plants Residing in Different Habitats
To obtain the expression patterns of the CrLEA and CrASR family members in different tissues, we selected five tissue types (roots, vines, young leaves, flowering buds, and young fruits) gathered from SCBG for detailed RNA-Seq analysis. As shown in Figure 6A, there were only four gene members (CrLEA2-22, CrLEA2-58, CrLEA2-59, and CrLEA5-1) whose transcripts could not be detected in any of the five tissues, which indicated that these genes were probably not expressed or only showed very low transcriptional levels. Furthermore, considering the CrLEA gene duplications, we did not find obvious similarity in expression between the duplicated gene pairs (Figure 6A, Table 2), which indicated that these genes underwent evolution mainly in the regulation of their expression patterns before their basic functions exhibited significant differentiation. Overall, the RNA-Seq analysis of different tissues showed that CrLEAs and CrASRs have tissue-specific expression: some showed relatively high expression levels in all tested tissues, while others were undetectable (Figure 6A).

Expression Profiles of CrLEAs and CrASRs in Response to Different Stressors and the ABA Treatment
Based on reports on the different LEA members involved in abiotic stresses [8,11,20,41-45], and with reference to the RNA-Seq results and the promoter region analysis, we further selected 10 genes (three CrASRs, five CrDHNs, and two CrLEA_4s) for expression analysis in different C. rosea tissues. Our aim was to examine the spatio-temporal expression patterns of these genes under various abiotic stress conditions and ABA treatment, for which we simulated, as far as possible, the stressful conditions of the tropical coral reef in the laboratory. Plant ASRs have been shown to be closely related to water deficit stress and drought tolerance [11,41], and the dehydrins play fundamental roles in plant response and adaptation to salinity and dehydration stresses [8,42]. LEA_4 genes, also known as group 3 [8], have been shown to be strongly associated with drought tolerance in basal and angiosperm resurrection plants via the ABA signaling pathway [20]. The RNA-Seq analysis showed that nine of these 10 genes (all except CrASR2) displayed higher expression levels in the YX leaf sample than in the SCBG leaf sample. The qRT-PCR analysis showed that all 10 genes were induced by different stresses or ABA in different organs to various degrees, while their expression levels were specific and variable (Figure 7). The mannitol stress (simulated drought) and ABA treatment caused significantly higher expression induction than the high salinity and alkaline treatments. Generally, alkaline stress had a weaker effect than the other three treatments. CrASR1 showed clearly elevated expression in the C. rosea roots under mannitol or ABA challenges, while CrASR3 showed marked expression induction in the C. rosea leaves or vines in response to high salt or ABA treatments. CrASR3 expression was also induced by mannitol in the roots (Figure 7A). Distinctively, CrLEA4-1 and CrLEA4-2 were significantly induced in all C. rosea tissues by the mannitol stress and ABA treatment, and the transcripts of CrLEA4-2 showed a significant increase in the roots of C. rosea under high salt treatment (Figure 7A). Regarding the five dehydrin genes in C. rosea, all of the CrDHNs, without exception, responded strongly to mannitol and showed extremely significant expression induction in the root tissues of the C. rosea plants, while their expression patterns in response to the other three treatments (high salinity, alkaline, and ABA) showed different degrees of induction or suppression (Figure 7B). Based on the above analysis, the results of the qRT-PCR are basically consistent with the results of the RNA-Seq analysis pertaining to the different habitats, and we propose that these genes may play important roles in the response of C. rosea to abiotic stress and adaptation to the extreme environment of the coral reef.
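Relative induction in qRT-PCR experiments of this kind is often expressed as a fold change. The sketch below assumes the widely used 2^(−ΔΔCt) calculation with a single reference gene; this section does not state which quantification method or reference gene was used, so the function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs. control) by the 2^(-ddCt) method.

    Assumes roughly 100% amplification efficiency for both target and
    reference genes, which is the standard assumption of this method.
    """
    delta_treated = ct_target_treated - ct_ref_treated   # dCt in the treated sample
    delta_control = ct_target_control - ct_ref_control   # dCt in the control sample
    return 2.0 ** -(delta_treated - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold 3 cycles earlier
    # after treatment while the reference gene is unchanged.
    print(relative_expression(22.0, 18.0, 25.0, 18.0))
```

A value above 1 corresponds to induction and a value below 1 to suppression relative to the untreated control.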

Abiotic Stress Tolerance of Yeast Heterologously Expressing CrLEAs and CrASRs
Based on our transcriptional analyses and promoter region predictions, we could preliminarily conclude that these 10 CrLEAs and CrASRs (three CrASRs, five CrDHNs, and two CrLEA_4s) are closely related to the adaptation of C. rosea to abiotic stresses, especially high salinity, alkaline, or water deficit. To identify their potential roles in vivo, we performed a series of heterologous expression assays of these 10 genes in a yeast system to investigate their functions in stress tolerance. For the oxidative stress tolerance test, the 10 CrLEAs and CrASRs were introduced into two H2O2-sensitive mutant strains, yap1∆ and skn7∆, with the corresponding wild-type (WT) yeast BY4741 and the two mutant strains transformed with the empty vector pYES2 as controls. As shown in Figure 8A, with the exception of CrASR1, the other nine C. rosea genes all conferred varying degrees of increased tolerance to H2O2, which indicated that these nine genes all possessed some antioxidative activity. Given the highly hydrophilic features and putative antioxidative abilities of CrLEAs and CrASRs, we also tested several other abiotic stress tolerances in the yeast WT strain (Figure 8B, upper). With respect to salt stress, the majority of the 10 genes conferred higher salinity tolerance in yeast than the empty vector pYES2, although the effects varied (Figure 8B). The expression of all 10 genes elevated the alkaline stress tolerance of the yeast. We assessed the cadmium (Cd) tolerance of WT yeast expressing CrLEAs and CrASRs, mainly based on their antioxidative abilities (Figure 8A) as well as reports speculating that ASR or dehydrin proteins can bind metals because they possess His-rich motifs [33,46,47]. Our results indicated that at least three CrASRs and four CrDHNs (CrDHN1, CrDHN2, CrDHN3, and CrDHN4) presented a certain degree of enhanced Cd tolerance in yeast (Figure 8B, lower). Furthermore, most of the 10 CrLEAs and CrASRs could increase the high osmotic stress tolerance of yeast when expressed in the WT strain (Figure 8B, lower). These yeast stress tolerance results are preliminary, and a few members showed no significant effects against some stress challenges; further functional identification in plants using transgenic systems is therefore still needed.


Analysis of the Transcriptional Activation Activity of CrASRs
Plant-specific ASRs are typically small gene families comprising several members, and some ASRs have been confirmed to act as transcription factors, with their N-termini possessing transcription activation ability [48], although in some ASR members this domain is clearly absent. We first aligned the sequences of the three CrASRs, which indicated that only CrASR1 had the possible N-terminal transcription activation domain and could act as a transcription factor (Figure 9A). To test this speculation, the complete coding regions of all three CrASR cDNAs were ligated in-frame with the GAL4 DNA binding domain of the pGBKT7 vector and transformed into yeast, with the empty vector GAL4-BD (pGBKT7-BD) as the negative control. The yeast growth on the SD medium lacking tryptophan (SD/-Trp) was normal and even, while the growth on the SD medium lacking tryptophan and histidine (SD/-Trp/-His) was inhibited. On SD/-Trp/-His, only the yeast cells containing GAL4-BD-CrASR1 exhibited normal growth, and the LacZ staining assay of β-galactosidase activity was positive (Figure 9B), which indicated that CrASR1 showed transcription activation activity and might be a transcription factor.

Discussion
Due to the significant roles of LEA and ASR proteins in water deficit stress responses and ROS scavenging, they are believed to act in multiple developmental processes and in response to various stresses, as indicated in a number of previous reports [1-3,9]. As C. rosea is a halophyte with the typical features of high salt/alkaline and drought tolerance, it is necessary to systematically investigate the potential role of LEA/ASRs, especially given the lack of studies on LEA/ASR genes in legumes (Fabaceae). As C. rosea generally occurs in tropical and subtropical coastal regions, water shortage is one of the main constraints imposed by its environment. It is believed that C. rosea has developed a series of sophisticated mechanisms at multiple levels to cope with stress, including morphological, physiological, and genetic changes and adaptations, ultimately regulating the expression of stress-responsive genes through complex networks. The downstream responsive genes include some water stress-related genes, such as LEAs or ASRs, aquaporins, or ROS-producing and scavenging genes.
LEA protein genes, which were first identified on account of their marked transcript accumulation in embryos for coping with rapid dehydration during seed maturation [15], were later found to be induced in vegetative plant tissues under environmental stress conditions. ASR genes, which encode hydrophilic proteins or transcription factors, were first identified from a tomato cDNA library and participate in the response to water-stressed conditions both in stressed leaves and in ripe fruits [41]. LEA proteins constitute a large multigene family that is closely related to the response to abiotic stresses in multiple plant species and that protects cells against water deficit caused by drought and other stresses. While LEAs are not confined to Plantae, ASRs are exclusive to the plant kingdom (but are absent in Brassicaceae). Both LEAs and ASRs are known to participate in multiple developmental processes and in response to various stresses, mainly water shortage challenges. In the present study, we conducted whole-genome scanning in C. rosea, and a total of 84 CrLEAs and three CrASRs were identified (Table 1). The number of LEA or ASR genes in C. rosea was compared with that in other plant species [11,20], and the number of different subfamilies of LEA/ASR genes from other legume species was summarized (Table S5). We found that the numbers of LEA/ASR genes in all of these typical diploid legume species were similar, while the soybean (Glycine max) genome contained clearly more (143), probably due to a whole-genome duplication event in the distant past. Among them, the LEA_2 subfamily (PF03168) possessed the largest member number and the greatest variability in all plant species (Figure 1, Table S5), which might imply a diversified functionality of this atypical LEA subfamily [20].
All of these CrLEA/CrASR proteins possess common characteristics, including small molecular weights, an abundance of hydrophilic amino acids, and coding genes that are intronless or contain few introns (Table 1, Figure 2). Most LEA proteins are termed hydrophilins mainly due to their unifying and outstanding feature of high hydrophilicity and a high content of Gly and small amino acids such as Ala and Ser [8]. Previous research indicated that some LEA proteins present a high degree of unordered structure in solution and are considered IDPs [23]. The high hydrophilicity and intrinsically disordered features of LEAs facilitate their protective functions by promoting associations with membrane surfaces or protein partners for protection, and by sequestering H 2 O, ROS, ions, or other small molecules for alleviating damage or toxicity [23]. Except for some LEA_2 members, most of CrLEA/CrASR proteins showed negative GRAVY scores (<0) ( Table 1), indicating that these proteins have strong hydrophilicity and could offer pivotal protective roles under rapid and severe dehydration in plants. Subcellular localization prediction revealed that the CrLEA/CrASR proteins were present in all subcellular compartments, which is consistent with the functional prediction that LEA/ASR proteins in principal groups are ubiquitous within cells and are required in all cellular compartments responding to abiotic stress [49]. We also predicted the 3D structures and calculated the disordered amino acid contents of all CrLEA/CrASR proteins, and the results indicated that except CrLEA_2 and CrSMP subfamilies, the other proteins showed obviously disordered structures (Figures S1 and S2). Furthermore, previous studies have indicated that plant stress-responsive genes without or with few introns could reduce the time required from transcription to translation, therefore providing good adaptation abilities for plants responding to changes in environments and habitats [46]. 
Taken together, the CrLEA/CrASR gene structures and protein motifs suggest that CrLEA/CrASR proteins are evolutionarily conserved and that their functions are group-specific.
Gene duplication plays a crucial role in the expansion of gene families and is a major way in which genomes can be reshaped, therefore promoting organismal adaptive evolution to the environment [47]. Based on the evolutionary characterization of CrLEAs and CrASRs, we found that most segmental and tandem duplications occurred in the CrLEA_2 subfamily, and similar characteristics have also been reported for LEA genes in other plant species [20], which suggests that the LEA_2 subfamily contains the most diverse LEA members in plants. LEA_2 is considered an atypical LEA protein because it possesses more hydrophobic amino acids and a more defined secondary structure compared with other LEA subfamily members [20]. Additionally, we found that this subfamily contained the only tandem duplicate cluster in C. rosea (Table 2), indicating that tandem duplications have contributed significantly to the expansion and diversification of the large LEA_2 family in C. rosea. The other segmental duplications, which occurred in CrLEA_1, CrLEA_3, CrLEA_4, and CrLEA_5, indicated that these LEA genes might participate in the adaptive evolution of C. rosea to water shortage stresses by increasing their gene numbers.
Because LEAs have been confirmed to endow plants with tolerance to a variety of abiotic and biotic stresses, LEA families often contain multiple gene copies in various plant species. Correspondingly, even within the same LEA subfamily, our transcriptome data showed that the expression of different members was specific (Figure 6), which probably reflects the influence of the promoter regions. Our statistical analysis of the promoters of CrLEAs/CrASRs showed that most of the promoter regions contained cis-regulatory elements, such as ABRE, MYC, MYB, MBS, TC-rich repeats, and several hormone-related responsive elements, suggesting that these genes could be regulated or affected by different stresses (Figure 5A). We also analyzed 10 promoter regions (including three CrASRs, five CrDHNs, and two CrLEA_4s) in detail. We found that the promoter regions showed more variability than the gene coding regions, which corresponds with their gene-specific expression patterns (Figures 5B and 6). The contribution of single CrLEA or CrASR genes to the stress tolerance and environmental adaptation of C. rosea needs to be further explored.
An investigation of the natural habitat of C. rosea indicates that this species possesses significant growth potential and is an adaptable pioneer species that can be used for island greening, sand fixation, and the ecological restoration of coral islands and coastal zones in tropical or subtropical regions [40]. We first examined the expression of the CrLEA and CrASR genes in different tissues and developmental periods using RNA-Seq. The expression profiles revealed spatial variations in the expression of CrLEAs or CrASRs in different organs (Figure 6A). Our habitat-specific RNA-Seq data further indicated that most of the CrLEA/CrASR genes had greater expression levels in coastal C. rosea (YX) than in inland C. rosea (SCBG), while some showed the opposite pattern (Figure 6B). Our results suggested that the differential expression of CrLEA/CrASR genes might be an adaptive mechanism for dealing with intracellular and extracellular water-deficit signals in C. rosea plants, which is associated with different water loss strategies in different habitats and, in the longer term, may also function in the biological adaptability to changes in environmental factors and different habitats.
We also used qRT-PCR to investigate the expression patterns of 10 candidate genes in C. rosea plants under salt, alkaline, and high osmotic stresses and ABA treatment. Detailed expression profiles of these 10 CrLEA/CrASR genes revealed that most were significantly up-regulated after the high osmotic stress (NaCl or mannitol) or ABA treatment, while the alkaline stress had the least impact on the expression of the candidate genes (Figure 7). This result is consistent with several previous reports of LEA genes in other plant species, in which their expression was also greatly affected by abiotic stress challenges and hormone treatments [19,48,50]. The relevant roles of ASR, DHN, or LEA_4 subfamily genes during plant stress tolerance have been reported in earlier studies using various techniques [11,20,42]. Here, both the RNA-Seq and the qRT-PCR results indicated that these genes might be vital factors influencing the adaptability of C. rosea plants to salt/alkaline and drought stresses in tropical or subtropical coastal regions.
The accumulation of LEA proteins in cells is crucial for the response of organisms to different abiotic stresses. Numerous reports have demonstrated that overexpressing LEA genes in different organisms resulted in improved salinity and dehydration tolerance. For example, Ipomoea pes-caprae is a perennial herbaceous vine plant distributed mainly on sandy beaches or in sunny positions on the roadside in tropical and sub-tropical regions. It exhibits excellent salt tolerance and drought resistance, and the induction of IpLEAs in yeast obviously improved the salt and oxidative stress tolerance of yeast clones [51]. Prior studies revealed that some LEA genes from Dendrobium officinale were able to enhance the cellular tolerance to high temperature and salt stresses of E. coli cells [49]. For ASR genes, our previous research also demonstrated that an I. pes-caprae ASR gene, IpASR, could improve salinity and drought tolerance in transgenic E. coli and Arabidopsis [45]. In summary, LEA and ASR genes are pertinent research subjects in plant stress physiology and are mainly involved in drought or salinity tolerance in plants. Here, we identified and characterized 10 CrLEA/CrASR genes in C. rosea for the first time and performed functional verification in yeast to investigate bioactivities for abiotic stress resistance. Based on our results, nine of the candidate genes increased the yeast cell resistance to H2O2, showing individual differences (Figure 8A). The majority of the 10 tested genes elevated the high salinity, alkaline, and high osmotic stress tolerance of yeast compared with the empty vector control, while some even showed heavy metal Cd tolerance, but not significantly so (Figure 8B). Our results demonstrated that these CrLEA/CrASR genes may play protective roles in cells under water-deficit conditions caused by high salt/alkaline and high osmotic stresses, and they also could improve cell survival via their antioxidant capacities or metal chelating abilities.
Although CrASR1 did not present antioxidant properties in yeast, it is a highly hydrophilic protein with a relatively low GRAVY score (Table 1). Unlike CrASR2 or CrASR3, CrASR1 has a remarkable property in that it contains a Gly-rich domain in its N-terminus (Figure 9A). According to our previous reports [35,45], it is uncertain whether this type of ASR protein could activate transcription and act as a transcription factor; however, our transcriptional activation assay in yeast indicated that CrASR1 did exhibit transcription factor activity, whereas CrASR2 and CrASR3 showed no such activity (Figure 9B). This indicated that CrASR1 could act as a transcriptional regulator together with chaperone-like proteins or hydrophilins, like other LEA/ASR proteins [11,34]. Previous studies have shown that tomato ASR1 is a drought stress-responsive transcription factor, and its consensus ASR1-binding motif was enriched in some specific tomato genomic loci, possibly containing several aquaporin genes [12,52]. A recent study on CaASR1 in another Solanaceae species, Capsicum annuum, also provided evidence that CaASR1 could interact with the transcription factor CabZIP63 and is a positive regulator of the defense response of pepper to the pathogen R. solanacearum, probably by acting as a transcription activator [53]. Our results strongly suggest that CrASR1 shows quite different features from the other two CrASRs or the CrLEAs (Figures 8 and 9), and this protein might fulfill an important function in salt/drought tolerance and regulate the environmental adaptation of C. rosea to tropical coastal regions, although more research is needed for further clarification.
To analyze the tissue-specific transcriptional patterns of the CrLEA and CrASR genes, the roots, stems, leaves, flowers, and fruits were sampled from C. rosea plants grown in SCBG. To investigate the involvement of the CrLEA and CrASR genes in the adaptation to a high-temperature island climate, coastal saline environment, and seasonal drought environment, adult plant leaves were also gathered from C. rosea growing on both YX Island and SCBG.

Plant Materials and Stress Treatments
The responses of the CrLEA and CrASR genes to different stresses and hormone treatments were investigated. Seedlings of C. rosea germinated from seeds in a soil/vermiculite mixture for 30 d were subjected to treatments. In brief, for high salinity stress, the C. rosea seedlings were removed from the pots and carefully washed with distilled water to remove soil from the roots, following which they were transferred into 600 mM NaCl solution; for salt-alkaline stress, the cleaned C. rosea seedlings were soaked in a 150 mM NaHCO3 (pH 8.2) solution; for the drought treatment, the seedlings were soaked in a 300 mM mannitol solution; and for abscisic acid (ABA) treatment, a freshly prepared working solution of 100 µM exogenous ABA was sprayed onto the leaves of the C. rosea seedlings. The second and/or third mature leaves from the shoot apexes were collected at 0, 2, and 24 h during the stress treatments, and the starting point (0 h) was used as the control. All samples were immediately frozen in liquid nitrogen and stored at −80 °C for subsequent gene expression analysis. Three independent biological replicates were conducted.

Identification of LEA/ASR Genes in the C. rosea Genome and Phylogenetic Analysis of CrLEA/CrASR Superfamily Proteins
The putative CrLEA and CrASR gene sequences were collected from the genome database of C. rosea. All of the C. rosea proteins were first screened using DIAMOND [54] and InterProScan (https://www.ebi.ac.uk/interpro/search/sequence/, accessed on 1 March 2021) to assess the conserved domains and motifs (e < 1 × 10⁻⁵), following which they were annotated using the Pfam database (http://pfam.xfam.org/, accessed on 1 March 2021). The Pfam IDs PF03760 (LEA_1), PF03168 (LEA_2), PF03242 (LEA_3), PF02987 (LEA_4), PF00477 (LEA_5), PF10714 (LEA_6), PF00257 (dehydrin, DHN), PF04927 (SMP), and PF02496 (ASR) were used to search for the CrLEAs and CrASRs, and the putative sequences of CrLEA and CrASR proteins were identified and submitted to SMART (http://smart.embl-heidelberg.de/, accessed on 1 March 2021) and the NCBI Conserved Domain Database (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, accessed on 1 March 2021) to confirm the presence of the LEA or ASR domains. Finally, the selected CrLEAs and CrASRs were named based on their sequence homology with other known plant LEAs or ASRs and the C. rosea genome annotations. An unrooted neighbor-joining phylogenetic tree was created based on multiple protein sequence alignments of all identified CrLEAs and CrASRs from C. rosea using Clustal X 2.0 and MEGA 6 with 1000 bootstrap replicates. The obtained LEA and ASR nucleotide and protein sequences from C. rosea are listed in Table S1. The gene structures of the CrLEAs and CrASRs were displayed with GSDS (http://gsds.cbi.pku.edu.cn/, accessed on 1 March 2021).
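The domain-based assignment described above can be illustrated with a minimal Python sketch. The Pfam accessions, subfamily names, and e-value cutoff are taken from the text; the flat tuple layout of the hits, the function name, and the example protein IDs are hypothetical and do not reflect the actual InterProScan output format or the pipeline used in this study.

```python
# Pfam accessions and subfamily names as listed in the text.
PFAM_TO_SUBFAMILY = {
    "PF03760": "LEA_1",
    "PF03168": "LEA_2",
    "PF03242": "LEA_3",
    "PF02987": "LEA_4",
    "PF00477": "LEA_5",
    "PF10714": "LEA_6",
    "PF00257": "DHN",
    "PF04927": "SMP",
    "PF02496": "ASR",
}

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the text


def assign_subfamilies(hits):
    """Map protein IDs to LEA/ASR subfamilies.

    `hits` is an iterable of (protein_id, pfam_accession, e_value) tuples.
    This flat layout is an assumption made for illustration only.
    """
    assigned = {}
    for protein_id, pfam, e_value in hits:
        subfamily = PFAM_TO_SUBFAMILY.get(pfam)
        # Skip domains outside the nine families and weak hits.
        if subfamily is None or e_value >= E_VALUE_CUTOFF:
            continue
        assigned.setdefault(protein_id, set()).add(subfamily)
    return assigned


# Hypothetical hits, for illustration only.
example_hits = [
    ("protein_A", "PF00257", 3e-20),
    ("protein_B", "PF02496", 1e-12),
    ("protein_C", "PF03168", 0.2),    # fails the e-value cutoff
    ("protein_D", "PF00069", 1e-50),  # not a LEA/ASR domain
]
print(assign_subfamilies(example_hits))
```

In practice, candidates retained by such a filter would still need the SMART and CDD confirmation step described in the text.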

Analysis of Protein-Conserved Motifs and Biochemical Features of CrLEAs/ASRs
To investigate the characteristics of the CrLEA and CrASR proteins, the molecular weight (MW), theoretical isoelectric point (pI), and grand average of hydropathy (GRAVY) were predicted using the ProtParam tool (http://web.expasy.org/protparam/, accessed on 1 March 2021). Furthermore, the subcellular localization predictions for these CrLEA proteins were carried out using the WoLF_PSORT tool (http://www.genscript.com/wolfpsort.html, accessed on 1 March 2021). The contents of disordered amino acids (%) in each CrLEA/ASR were calculated using the online program PrDOS (Protein DisOrder prediction System, http://prdos.hgc.jp/cgi-bin/top.cgi, accessed on 22 April 2021). The deduced amino acid sequences of the CrLEAs and CrASRs were analyzed with the Multiple Expectation Maximization for Motif Elicitation (MEME) tool (http://meme-suite.org/index.html, accessed on 1 March 2021) to identify conserved domains and motifs of each subgroup of these proteins. The maximum number of motifs was set to 3, with a minimum width of 11 and a maximum width of 50 amino acids, and an e-value < 1 × 10⁻⁸. The Phyre2 server (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index, accessed on 1 March 2021) was used for homology modelling of the three-dimensional (3D) structure of all CrLEA and CrASR proteins.
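For readers unfamiliar with the GRAVY score, the following Python sketch shows how it can be computed: the Kyte-Doolittle hydropathy values of all residues are summed and divided by the sequence length, so that negative scores indicate a hydrophilic protein. This is an illustrative re-implementation rather than the ProtParam code itself; the function name and the short test peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Raises KeyError for non-standard residue codes (e.g. X or *),
    which this sketch does not attempt to handle.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical peptides: one rich in Gly/Ser, one rich in aliphatic residues.
print(round(gravy("GGSS"), 3))  # negative, i.e. hydrophilic
print(round(gravy("AILV"), 3))  # positive, i.e. hydrophobic
```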

Gene Duplication and Collinearity Analysis of CrLEAs
The CrLEA and CrASR genes were mapped onto C. rosea chromosomes according to the positional information of these genes in the C. rosea genome database and were displayed using MapInspect software (http://mapinspect.apponic.com/, accessed on 1 March 2021). Gene segmental duplications were assessed using MCScanX software (http://chibba.pgml.uga.edu/mcscan2/, accessed on 1 March 2021), and tandem duplications were identified manually. The number of non-synonymous substitutions per non-synonymous site (Ka), the number of synonymous substitutions per synonymous site (Ks), and the p-value from Fisher's exact test of neutrality were calculated using the Nei-Gojobori model with 1000 bootstrap replicates [55]. A Ka/Ks ratio < 1 indicates purifying selection, a Ka/Ks ratio = 1 indicates neutral evolution, and a Ka/Ks ratio > 1 indicates positive selection. The gene segmental duplications of the CrLEAs were visualized using the online shinyCircos software (https://venyao.xyz/shinyCircos/, accessed on 1 March 2021).
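The interpretation of the Ka/Ks ratio given above can be written as a small Python helper. This is a minimal sketch: the function name, the numerical tolerance used to treat a ratio as equal to 1, and the example values are assumptions, and the sketch does not estimate Ka or Ks from sequence alignments.

```python
def classify_selection(ka, ks, tolerance=1e-9):
    """Return (Ka/Ks ratio, selection mode) for one duplicated gene pair.

    ka: non-synonymous substitutions per non-synonymous site
    ks: synonymous substitutions per synonymous site
    tolerance: hypothetical margin for treating the ratio as exactly 1
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        mode = "neutral"
    elif ratio < 1.0:
        mode = "purifying"
    else:
        mode = "positive"
    return ratio, mode


# Hypothetical values, for illustration only.
print(classify_selection(0.10, 0.50))
print(classify_selection(0.60, 0.30))
```

A ratio alone does not establish significance; the p-value from the neutrality test mentioned in the text would still be needed.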

Promoter Sequence Profiling of CrLEAs/ASRs
The promoter regions (1000 bp upstream from the translation start site) of all CrLEA and CrASR genes were retrieved from the genome database of C. rosea for further analysis of the cis-regulatory elements and motifs by querying them through the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed on 1 March 2021). The stress- and hormone-related cis-regulatory elements included light response elements, gibberellin-responsive elements, methyl jasmonate (MeJA)-responsive elements, auxin response elements, salicylic acid responsiveness elements, ABA response elements (ABRE), ethylene response elements (ERE), MYC transcription factor binding elements (MYC), MYB transcription factor binding elements (MYB and MBS), TC-rich repeats, LTR, and as-1. These elements are believed to be involved in plant responses to dehydration, low temperature, salt stress, and other abiotic stresses (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed on 1 March 2021).
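Retrieving the sequence upstream of the translation start site requires strand-aware slicing, which the following Python sketch illustrates. The function names, the coordinate convention (a 1-based genomic position of the A of the start codon), and the toy sequences are assumptions for illustration; the default length of 1000 bp follows the text.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_region(chrom_seq, atg_pos, strand, length=1000):
    """Sequence of up to `length` bp upstream of the start codon.

    atg_pos: 1-based genomic coordinate of the A of the start codon
             (for minus-strand genes this is the highest coordinate
             of the codon). This convention is an assumption.
    The result is returned in the sense orientation of the gene and is
    shorter than `length` if the chromosome end is reached.
    """
    if strand == "+":
        end = atg_pos - 1              # 0-based index of the A
        start = max(0, end - length)
        return chrom_seq[start:end]
    if strand == "-":
        start = atg_pos                # first base to the right of the A
        end = min(len(chrom_seq), start + length)
        return reverse_complement(chrom_seq[start:end])
    raise ValueError("strand must be '+' or '-'")


# Toy sequences, for illustration only.
print(promoter_region("AAAACCCCATGGGG", 9, "+", length=4))
print(promoter_region("GGGCATTTCC", 6, "-", length=4))
```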

RNA-Seq of Different C. rosea Tissues
The C. rosea RNA-Seq datasets were constructed using Illumina HiSeq X sequencing technology. In brief, reads from seven different tissue samples of C. rosea plants (root, vine, young leaf, flower bud, and young silique samples collected from C. rosea plants growing in SCBG; mature leaf samples from C. rosea growing in SCBG and on YX Island) were quality-checked using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/, accessed on 1 March 2021) based on the primary 40 Gb of clean reads and were mapped to the C. rosea reference genome using Tophat v2.0.10 (http://tophat.cbcb.umd.edu/, accessed on 1 March 2021). The heatmaps showing the CrLEA and CrASR gene expression profiles were generated using TBtools [56] from the fragments per kilobase of transcript per million mapped reads (FPKM) values, and the expression levels (log2 of FPKM values) of these genes were visualized.
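The log2 transformation applied before plotting the heatmaps can be sketched in Python as follows. The pseudocount of 1, which keeps genes with an FPKM of 0 finite, is an assumption and is not stated in the text; the function name and the example values are likewise hypothetical.

```python
import math


def log2_fpkm(fpkm_by_gene, pseudocount=1.0):
    """Log2-transform FPKM values for heatmap display.

    fpkm_by_gene: dict mapping a gene name to a list of FPKM values,
                  one per tissue sample.
    pseudocount:  added before the logarithm so that FPKM = 0 stays
                  finite (an assumption, not taken from the text).
    """
    return {
        gene: [math.log2(value + pseudocount) for value in values]
        for gene, values in fpkm_by_gene.items()
    }


# Hypothetical FPKM values for one gene across three samples.
print(log2_fpkm({"gene_1": [0.0, 1.0, 7.0]}))
```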

Expression Analysis by Quantitative Reverse Transcription (qRT) PCR of CrLEAs/ASRs under Different Stress Treatments
The transcript abundance of several CrLEA and CrASR genes was further investigated using qRT-PCR. Total RNA was extracted from the C. rosea seedling roots, vines, and leaves under the stress/ABA treatment and unstressed treatment using an RNA plant extraction kit (Magen Biotech, Guangzhou, China) according to the manufacturer's instructions, and approximately 1 µg of purified total RNA was reverse transcribed into cDNA in a 20 µL reaction volume using AMV reverse transcriptase (TransGen Biotech, Beijing, China) according to the supplier's instructions. Quantitative RT-PCR was conducted using the LightCycler480 system (Roche, Basel, Switzerland) and TransStart Tip Green qPCR SuperMix (TransGen). All of the gene expression data obtained via qRT-PCR were normalized to the expression of CrEF-α (Table S1). The primers used for qRT-PCR (CrEF-αRTF/CrEF-αRTR for the reference gene and other CrLEA- or CrASR-specific primer pairs) are listed in Supplementary Materials Table S1.
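Normalization to a reference gene is commonly carried out with the 2^(−ΔΔCt) approach, sketched below in Python. The text states only that expression was normalized to CrEF-α; the use of this particular formula, the assumption of near-100% amplification efficiency, the function name, and the Ct values are all assumptions made for illustration.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) approach.

    Each argument is a cycle threshold (Ct) value. The reference gene
    plays the role of CrEF-alpha in the text, and the control is the
    0 h sample. Assumes roughly 100% amplification efficiency.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target amplifies 3 cycles earlier after
# treatment while the reference gene is unchanged.
print(relative_expression(22.0, 18.0, 25.0, 18.0))
```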

Cloning of CrLEA/ASR cDNAs and Heterologous Expression in Yeast
To clone the CrLEA/ASR cDNAs, total RNA isolation and cDNA synthesis were performed as described above. The full-length CrLEA/ASR cDNAs were PCR-amplified from C. rosea leaf cDNA using the gene-specific primers listed in Supplementary Materials Table S1. The yeast expression vectors pYES2-CrLEAs/ASRs were constructed using the In-Fusion technique (BD In-Fusion PCR cloning Kit, Takara Bio USA, Mountain View, CA, USA) according to the manufacturer's instructions. In short, the PCR fragments were inserted into the BamHI and EcoRI sites of pYES2 and, after sequencing confirmation, these recombinant expression vectors were transformed into different yeast strains to verify the stress tolerance functions. The wild-type (WT) yeast (Saccharomyces cerevisiae) strain BY4741 (Y00000, MATa; ura3∆0; leu2∆0; his3∆1; met15∆0) and the two H2O2-sensitive mutant strains yap1∆ (Y00569, BY4741; MATa; ura3∆0; leu2∆0; his3∆1; met15∆0; YML007w::kanMX4) and skn7∆ (Y02900, BY4741; MATa; ura3∆0; leu2∆0; his3∆1; met15∆0; YHR206w::kanMX4) were obtained from Euroscarf (Frankfurt, Germany). The various yeast strains were transformed using the LiOAc/PEG technique, and uracil complementation was used for selection. A synthetic defined (SD) yeast medium without uracil (SD-Ura) containing 2% (w/v) galactose (Gal) was used as the yeast induction medium (SDG-Ura) during the salt/alkaline tolerance and oxidative stress experiments on solid medium for the localization observations. The concentrations of the different stress factors are indicated in the figure legends.

Transcriptional Activity Analysis of CrASRs in Yeast
The open reading frames (ORFs) of the three CrASRs were cloned into the vector pGBKT7 (Clontech, Mountain View, CA, USA) for transcription activation analysis with the yeast two-hybrid assay according to the manufacturer's instructions (Clontech, CA, USA). In brief, the full CDSs of the CrASRs were amplified from C. rosea leaf cDNA using the corresponding primers listed in Supplementary Materials Table S1. All PCR products were then inserted into the GAL4-DBD vector pGBKT7 at the EcoRI site and sequenced. These constructs, along with the negative control pGBKT7 (containing the binding domain, BD), were transformed into yeast strain AH109 using the LiOAc/PEG method. The yeast clones were cultured in liquid SD-Trp medium to an OD600 value of 2, after which they were serially diluted (1:10, 1:100, and 1:1000). Two-microliter aliquots of the yeast cultures were spotted onto the corresponding synthetic defined (SD/-Trp and SD/-Trp/-Leu) medium plates and incubated for 2 days at 30 °C. Yeast transformation and determination of blue/white colonies were conducted according to the instructions of the manufacturer (Clontech), and X-α-Gal was used as a substrate for the reporter gene MEL1. The primers used in the construction of the pGBKT7 vectors for the CrASR transactivation activity assay in yeast are listed in Supplementary Materials Table S1.

Statistical Analysis
All the experiments in this study were repeated three times independently, with the results shown as the mean ± SD (n ≥ 3). Pairwise differences between means were analyzed using Student's t-tests in Excel 2010 (Microsoft Corporation, Redmond, WA, USA).
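The summary statistics described above can be reproduced with the Python standard library, as in the following sketch. It computes the mean ± SD and the t statistic of a two-sample Student's t-test with pooled variance; the choice of the unpaired, equal-variance form is an assumption, since the text does not state which variant was used in Excel, and the function names and example values are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def mean_sd(values):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    return mean(values), stdev(values)


def student_t(sample_a, sample_b):
    """t statistic and degrees of freedom for an unpaired Student's
    t-test with pooled variance (equal variances assumed)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a)
              + (n_b - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical relative expression values from three replicates each.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
print(mean_sd(control))
t_value, df = student_t(control, treated)
# With df = 4, |t| > 2.776 corresponds to p < 0.05 (two-tailed).
print(round(t_value, 3), df)
```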

Conclusions
In summary, this study is the first to systematically survey members of the LEA and ASR gene families in a leguminous halophyte, C. rosea. In total, 84 CrLEA genes and three CrASR genes were identified in the C. rosea genome and were classified into nine subgroups. Chromosomal mapping and synteny analysis revealed that these members are distributed across all 11 chromosomes of C. rosea, with several tandem and segmental gene duplications contributing to LEA gene expansion in the C. rosea genome. The CrLEA/CrASR members within the same subfamily are highly conserved in both gene structures and protein motifs. The CrLEA/CrASR superfamily was involved in the adaptation of C. rosea to different habitats, and 10 of the CrLEA/CrASR members showed obvious differences in gene expression in response to salt/alkaline stress, high osmotic stress, or ABA treatment. This systematic study provides new information on the LEA/ASR superfamily in C. rosea and further expands our understanding of the association of CrLEA/CrASR genes with natural ecological adaptability and abiotic stress responses in C. rosea. Our findings should also inform the genetic improvement of other legume plants or crops and provide candidate stress-resistance genes for future research.

Data Availability Statement:
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. Most of the data are provided in the Supplementary files.