Genome-Wide Identification of the Alba Gene Family in Plants and Stress-Responsive Expression of the Rice Alba Genes

Architectural proteins play key roles in genome construction and regulate the expression of many genes, although the modulation of genome plasticity by these proteins remains largely unknown. A critical screening of the architectural proteins in five crop species, viz., Oryza sativa, Zea mays, Sorghum bicolor, Cicer arietinum, and Vitis vinifera, and in the model plant Arabidopsis thaliana along with evolutionarily relevant species such as Chlamydomonas reinhardtii, Physcomitrella patens, and Amborella trichopoda, revealed 9, 20, 10, 7, 7, 6, 1, 4, and 4 Alba (acetylation lowers binding affinity) genes, respectively. A phylogenetic analysis of the genes and of their counterparts in other plant species indicated evolutionary conservation and diversification. In each group, the structural components of the genes and motifs showed significant conservation. The chromosomal location of the Alba genes of rice (OsAlba) showed an unequal distribution on 8 of its 12 chromosomes. The expression profiles of the OsAlba genes indicated a distinct tissue-specific expression in the seedling, vegetative, and reproductive stages. The quantitative real-time PCR (qRT-PCR) analysis of the OsAlba genes confirmed their stress-inducible expression under multivariate environmental conditions and phytohormone treatments. The evaluation of the regulatory elements in 68 Alba genes from the 9 species studied led to the identification of conserved motifs and overlapping microRNA (miRNA) target sites, suggesting the conservation of their function in related proteins and a divergence in their biological roles across species. The 3D structure and the prediction of putative ligands and their binding sites for OsAlba proteins offered a key insight into the structure–function relationship. These results provide a comprehensive overview of the subtle genetic diversification of the OsAlba genes, which will help in elucidating their functional role in plants.


Introduction
Plants encounter multiple abiotic and biotic stresses in a complex environment, which hamper their growth and development [1]. The stress perception and transduction of signals activate self-defense mechanisms in plants for acclimatization and survival by alterations in protein balance. The regulation of gene expression differs widely between prokaryotes and eukaryotes. The evolutionarily conserved DNA-binding proteins, particularly histones in the chromatin, act as an on-off switch that turns genes on or off [2,3]. The DNA-binding Alba (acetylation lowers binding (DB) Phytozome v12.0 [26] and NCBI. The Pfam database [27] was used to retrieve the Hidden Markov Model (HMM) profile of the Alba domain (Pfam PF01918), which was further submitted to a BLASTP search (p = 0.001). The amino acid sequences were examined for the presence of the Alba domain using the NCBI Conserved Domain Database (CDD) [28]. The Alba protein sequences of rice and Arabidopsis were independently aligned with those of chickpea, maize, and sorghum using BLASTP (e-value cutoff: 1 × 10⁻⁵). The HMM profiles obtained from the Pfam database for the Alba domain and the putative Alba genes identified from other species were merged to develop a non-redundant list for each species and were examined for the presence of a conserved Alba domain.
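The merging step described above (candidates from the HMM profile search combined with BLASTP hits that pass the e-value cutoff of 1 × 10⁻⁵, then reduced to a non-redundant list) can be sketched as follows. This is only a minimal illustration: the function name, the input layout, and the gene identifiers are hypothetical and are not taken from the study.

```python
E_VALUE_CUTOFF = 1e-5  # BLASTP e-value cutoff stated in the text


def merge_candidates(hmm_hits, blast_hits, cutoff=E_VALUE_CUTOFF):
    """Combine two candidate lists into one sorted, non-redundant list.

    hmm_hits   -- iterable of gene IDs found by the HMM profile search
    blast_hits -- iterable of (gene_id, e_value) pairs from BLASTP
    (Both input layouts are assumptions made for this sketch.)
    """
    # Keep only BLASTP hits at or below the e-value cutoff.
    passing = {gene for gene, e_value in blast_hits if e_value <= cutoff}
    # A set union removes genes reported by both strategies.
    return sorted(set(hmm_hits) | passing)


if __name__ == "__main__":
    hmm = ["geneA", "geneB"]  # hypothetical identifiers
    blast = [("geneB", 1e-30), ("geneC", 2e-7), ("geneD", 0.01)]
    print(merge_candidates(hmm, blast))
```

Each merged candidate would still need to be checked for the conserved Alba domain, as the text describes; that step is not modeled here.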

Sequence Analysis and Structural Characterization
The information regarding protein sequences, genomic sequences, coding DNA sequences (CDS), and upstream 1500-bp nucleotide stretches from the translation initiation codon along with their locations on the chromosomes were downloaded (www.phytozome.net). To analyze the structure and diversity of the Alba genes and the exon-intron positions, their sequences were surveyed using the online GSDS2.0 program [29]. The MEME online tools [30] were used to identify the motifs of Alba proteins. The parameters chosen were: maximum length of the conserved motif, 50; minimum length, 6; largest number, 15.
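Retrieving the 1500-bp stretch upstream of the translation initiation codon depends on the strand of the gene. The sketch below shows one way this could be done on a raw chromosome sequence. The coordinate convention (0-based positions on the plus strand) and the function names are assumptions for illustration; the study itself downloaded these regions from Phytozome.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def upstream_region(chrom_seq, start_pos, strand, length=1500):
    """Return up to `length` bases upstream of the start codon.

    start_pos is the 0-based plus-strand index of the first base of the
    start codon as read on the gene's own strand: the 'A' of ATG for a
    plus-strand gene, or the base paired with that 'A' for a minus-strand
    gene (assumed convention).
    """
    if strand == "+":
        begin = max(0, start_pos - length)
        return chrom_seq[begin:start_pos].upper()
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher plus-strand
        # coordinates and must be read as the reverse complement.
        return reverse_complement(chrom_seq[start_pos + 1:start_pos + 1 + length])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    print(upstream_region("AAACCCATGGGG", 6, "+", length=3))
    print(upstream_region("TTTCATGGGCCC", 5, "-", length=4))
```

The region is silently truncated when the gene lies closer than `length` bases to the end of the sequence.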

Phylogenetic Analysis
The amino acid sequences of Alba proteins from Chlamydomonas (CreAlba), Physcomitrella (PpAlba), Amborella (AmtrAlba), grape (VvAlba), chickpea (CaAlba), Arabidopsis (AtAlba), rice (OsAlba), maize (ZmAlba), and sorghum (SbAlba) were used to construct the cladogram. The SsoAlba1 of Sulfolobus solfataricus was used as an outgroup. The multiple sequence alignment (MSA) of the Alba proteins was conducted using Clustal Omega. The MEGA 7.0 software was used with default settings to construct a neighbor-joining phylogenetic tree [31]. The statistical significance for each tree node was determined by bootstrap analysis using 100 replicates [32].
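The bootstrap analysis mentioned above rests on resampling alignment columns with replacement; a tree is then rebuilt from each pseudo-alignment and node support is the fraction of replicates that recover a given clade. MEGA performs this internally. The sketch below illustrates only the resampling step, with hypothetical function names and a toy alignment, and does not build trees.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment -- dict mapping sequence name to aligned sequence
    rng       -- a random.Random instance
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have equal length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate n_replicates pseudo-alignments (100 in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"seq1": "MKVL-A", "seq2": "MRVLGA", "seq3": "MKIL-S"}  # toy data
    replicates = bootstrap_replicates(toy)
    print(len(replicates), replicates[0])
```

Because the same column indices are applied to every sequence, each resampled column remains a true column of the original alignment.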

Prediction of miRNA Targets
The putative miRNA targets were identified with the psRNATarget program [42] using default parameters. Recently identified miRNA and considerably different known miRNA sequences were used as custom sequences. The redundant sequences were removed after identifying potential target mRNA sequences for further analysis.

Molecular Modeling of OsAlba Proteins
The 3D structure of the rice Alba proteins was generated by I-TASSER server [43]. The sequences of OsAlba1-9 were used as input. The 3D models were developed from multiple threading alignments and iterative structural assembly simulations. We chose the best model with the highest scores, retrieved the template analogs, predicted the ligand-binding sites, and refined the model with PyMOL software v1.3. The secondary and tertiary structures were further analyzed for predicting the ligand-binding sites and GO terms.

Plant Materials, Growth Conditions, and Stress Treatment
The seedlings of rice (Oryza sativa L. ssp. indica) were grown in pots containing a combination of soilrite and soil (1:2, w/w; 10-12 seedlings/pot) in a program-regulated growth chamber. The seedlings were maintained at 70 ± 5% relative humidity under a 16 h photoperiod (300 µmol·m⁻²·s⁻¹ light intensity) at 28 ± 2 °C. Two-week-old seedlings were exposed to multivariate stresses (dehydration, hypersalinity, high temperature, and cold) as described earlier [44]. In a separate set, the seedlings were supplied with 200 mM NaCl solution, and the samples were collected at 0, 6, 12, and 24 h. The hormonal treatments were carried out by spraying the seedlings with abscisic acid (ABA, 200 µM), jasmonic acid (JA, 200 µM), and salicylic acid (SA, 200 µM), and the tissues were harvested at 0, 0.5, 1, 3, 6, and 12 h intervals [44]. Independently, the seedlings were kept at 4 °C for cold stress and sampled at 0, 1, 3, 6, and 12 h post-treatment. Dehydration was imposed by withholding water supply, and the tissues were collected at 0-, 2-, 4-, and 5-day intervals. High temperature treatment was imposed by transferring the potted seedlings to a 42 °C chamber, harvesting at 0, 6, 12, and 24 h post-treatment. For tissue-specific expression analysis, the tissues were collected from six major organs, viz., roots, stems, leaves, flag leaves, leaf sheath, and panicles of 120-day-old plants. A total of three replicates were chosen for each experiment (at an average of three plants per replicate). The tissues sampled for each treatment were frozen in liquid nitrogen and stored at −80 °C, unless otherwise mentioned. The tissues from the unstressed seedlings were harvested for each time point for various stress treatments and finally pooled to normalize any possible effect of growth and development.

RNA Isolation and Quantitative Real-Time PCR Analysis
Total RNA was isolated using TRI reagent (Sigma Life Science, St. Louis, MO, USA) as per the manufacturer's instructions. The quantity and quality of the extracted RNA was estimated using Nanodrop1000 (Thermo Fisher Scientific, Wilmington, DE, USA). The complementary DNA (cDNA) was synthesized using RNA samples with a value of 260/280 ratio between 1.8 and 2.1, and 260/230 ratio between 2.0 and 2.5. The gene-specific primers were designed using the Primer Express software v3.0 (Applied Biosystems, Foster City, CA, USA) and are listed in Table S1. Two biological replicates from each treatment, comprised of at least three technical replicates, were analyzed. The transcript analysis was conducted by quantitative real-time PCR (qRT-PCR) using an ABI Prism 7500 Detection System (Applied Biosystems, Foster City, CA, USA). The reactions were carried out in a 10 µL volume containing 200 nM primers, 5 µL SYBR Premix Ex Taq II (Takara, Beijing, China), and 1 µL diluted cDNA in the following conditions: 10 min at 95 °C, 40 cycles of 15 s at 95 °C, and 30 s at 60 °C. The melting curve analysis was carried out to verify the specificity of the amplicon for each primer pair. The expression of tubulin was used as an internal control for normalization.
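The text states that tubulin served as the internal control but does not name the quantification formula. A common choice for reference-gene normalization is the comparative Ct (2^−ΔΔCt) method, which assumes near-100% amplification efficiency for both target and reference. The sketch below illustrates that method purely as an assumption; the function name and the Ct values are hypothetical and are not data from the study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change by the comparative Ct (2^-ddCt) method.

    Assumed method: the source does not state which formula was used.
    The reference gene here would be tubulin, per the text.
    """
    # Normalize the target gene to the reference gene in each sample.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the control sample.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses the threshold two cycles
    # earlier (relative to the reference) after treatment.
    print(relative_expression(22.0, 18.0, 24.0, 18.0))  # 4.0
```

A value above 1 indicates induction relative to the control and a value below 1 indicates repression.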

Identification of Plant-Specific Alba Proteins
The identification and in silico analysis of the Alba superfamily was carried out with the available genome sequences. The gene family was investigated following two strategies, i.e., HMM profile search and BLASTP. A non-redundant list was obtained for rice and maize by combining the identified Alba genes. To further confirm the conserved Alba domain, Pfam and SMART databases were used for the candidate proteins. The analysis revealed 1, 4, 4, 7, 7, 6, 9, 20, and 10 Alba genes in Chlamydomonas, Physcomitrella, Amborella, grape, chickpea, Arabidopsis, rice, maize, and sorghum, respectively. The presence of Alba genes as a multigene family in higher eukaryotes suggests their biological significance. The characteristic signatures of CreAlba, PpAlba, AmtrAlba, VvAlba, CaAlba, AtAlba, OsAlba, ZmAlba, and SbAlba representing gene names, identifier, chromosome location, mRNA length, CDS, and protein sequence along with their physical and chemical properties showed a significant conservation as well as variations (Table 1).
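As a quick consistency check, the per-species counts reported above can be tallied against the total of 68 Alba genes from 9 species quoted elsewhere in the article. The dictionary layout and function name are illustrative only.

```python
# Alba gene counts per species, as reported in the text.
ALBA_COUNTS = {
    "Chlamydomonas": 1,
    "Physcomitrella": 4,
    "Amborella": 4,
    "grape": 7,
    "chickpea": 7,
    "Arabidopsis": 6,
    "rice": 9,
    "maize": 20,
    "sorghum": 10,
}


def total_genes(counts):
    """Sum the gene counts over all species."""
    return sum(counts.values())


if __name__ == "__main__":
    print(len(ALBA_COUNTS), total_genes(ALBA_COUNTS))  # 9 68
```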

Genomic Organization of Alba Genes and Their Chromosomal Distribution
The Alba proteins identified in monocots and dicots showed the characteristic Alba domain with sequence conservation in the core region. The exon-intron organization of the plant-specific Alba genes revealed variation in the intron number, ranging from 0 to 17. Most of the Alba genes had a similar intron-phasing distribution (Figure 1). Next, we analyzed the chromosomal distribution of the Alba genes. In rice, the Alba genes were found to be distributed unevenly throughout the chromosomes. All nine OsAlba genes studied were found to be distributed on the 12 chromosomes, except for chromosomes 5, 7, 8, and 10. However, the number of Alba genes varied widely on each chromosome. Most of the Alba genes were found to be localized on the distal ends of the chromosomes. Two genes were found to be located on chromosome 3, whereas one each were identified on chromosomes 1, 2, 4, 6, 9, 11, and 12. In maize, all 20 ZmAlba genes were found to be distributed on chromosomes 1-10: 4 located on chromosome 1, 3 each on chromosomes 8 and 10, 2 each on chromosomes 2, 3, 4, and 9, while 1 each on chromosomes 5 and 7 (Table 1). In sorghum, the SbAlba genes were found to be distributed across the chromosomes: two on chromosome 1, and one each on chromosomes 2, 3, 4, 5, 6, 8, 9, and 10. We observed seven Alba genes in chickpea with three genes on chromosome 5 and one each located on chromosomes 1, 4, 6, and 8. In grape, there were seven genes distributed on chromosomes 5, 7, 9, 11, and 18. In Arabidopsis, all six AtAlba genes were found to be distributed on three of five chromosomes: three were present on chromosome 1, two on chromosome 3, and one on chromosome 2. In Chlamydomonas, there was only one Alba gene located on chromosome 9. In Physcomitrella, all four Alba genes were distributed on chromosomes 12, 20, 23, and 24. In Amborella, all the Alba genes were found to be present on scaffold00002, scaffold00017, scaffold00067, and scaffold00104.
The major forces during the course of genome evolution in plants have been the duplication of individual genes, of chromosomal segments, or of the entire genome itself [45]. We analyzed the possibility of gene duplication in the Alba gene family using the Plant Genome Duplication database [46]. The distribution of OsAlba1 and OsAlba6 on chromosomes 1 and 6, respectively, suggests a segmental duplication. Similarly, ZmAlba7 and ZmAlba13 on chromosomes 1 and 6 and ZmAlba4 and ZmAlba11 on chromosomes 1 and 5 indicate gene duplication. The presence of more than one member of a gene family on the same chromosome is suggestive of tandem duplication, while segmental duplication is defined as the event of gene duplication on different chromosomes [47]. In sorghum, tandem duplication was observed in SbAlba1 and SbAlba2 on chromosome 1, while segmental duplication was observed in SbAlba4, SbAlba9, and SbAlba10 on chromosomes 3, 9, and 10, respectively. Interestingly, the presence of SbAlba4, SbAlba5, and SbAlba10 on chromosomes 3, 4, and 10, respectively, was also predicted to be an event of segmental duplication. Arabidopsis showed tandem duplication for AtAlba1 and AtAlba2 on chromosome 1. These results suggest a tandem as well as a segmental duplication across the Alba gene family in plants. The variation in gene sequences during duplication indicated the neofunctionalisation of the paralogs [48]. It is suggested that two genes with identical functions can stably be maintained in the genome only when an extra amount of a gene product becomes advantageous for an organism [49]. To investigate the positive selection among OsAlba proteins, we analyzed the value of ω (dN/dS) through the codeml programme of PAML software by the maximum-likelihood method [50]. While OsAlba2 and OsAlba3 indicated a high non-synonymous substitution rate, OsAlba2 and OsAlba4 along with OsAlba4 and OsAlba6 suggested substitutions in an adaptive manner. 
The expansion of the Alba gene family in plants through evolution provides new insights into their diverse biological roles.
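The working definitions used in this section lend themselves to a small sketch: paralogs on the same chromosome are treated as tandem duplicates and paralogs on different chromosomes as segmental duplicates, while ω = dN/dS is read in the conventional way (ω > 1 positive selection, ω < 1 purifying selection, ω = 1 neutral evolution). The function names are hypothetical, and the dN and dS values in the usage lines are invented for illustration; they are not codeml output from the study.

```python
def classify_duplication(chrom_a, chrom_b):
    """Apply the rule stated in the text to a pair of paralogs."""
    return "tandem" if chrom_a == chrom_b else "segmental"


def interpret_omega(dn, ds):
    """Return (omega, label) for non-synonymous and synonymous rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    omega = dn / ds
    if omega > 1:
        label = "positive selection"
    elif omega < 1:
        label = "purifying selection"
    else:
        label = "neutral evolution"
    return omega, label


if __name__ == "__main__":
    # Chromosome assignments taken from the text.
    print(classify_duplication(1, 1))  # SbAlba1 / SbAlba2
    print(classify_duplication(1, 6))  # OsAlba1 / OsAlba6
    # Hypothetical substitution rates.
    print(interpret_omega(0.05, 0.10))
```

The chromosome rule is deliberately as simple as the definition given in the text; stricter analyses usually also require tandem duplicates to be physically adjacent.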

Phylogenetic Analysis of Alba Gene Families
To explore the evolutionary relationships among various Alba family members, full-length amino acid sequences were analyzed. The MSA of sequences was performed, and, sequentially, the phylograms were generated (Figure 2).
Three different phylogenetic trees were constructed using the Alba proteins from the crop families and Alba proteins from other species to better understand the phylogenetic relationships. The phylogram constructed with the most conserved region of the Alba domain of the most similar Alba homologs across species indicated a major grouping between dicots and monocots along with lower plants (Figure 2A). The analysis with all the Alba homologs from the crop species showed two major clades from monocots and dicots. Proteins from rice, maize, and sorghum clustered together, while proteins from chickpea and Arabidopsis formed separate groups (Figure 2B). The Alba proteins from Chlamydomonas, Amborella, Physcomitrella, and grape showed diversity in sequences as compared to most of their homologs in the crop species (Figure 2C). Interestingly, the Alba proteins of grape and Amborella were found to be clustered together, while their counterparts from lower plants grouped separately. Altogether, these results suggested diversity in Alba proteins among the different strata of species along with sequence conservation within similar groups. The rice Alba family of proteins revealed the existence of two distinct groups. The first group was found to contain the typical N-terminal Alba domain, while the second group possessed an Alba domain in the middle. However, OsAlba7 showed a C-terminal Alba domain with a different sequence composition compared to other members. Interestingly, the Alba proteins with a similar domain composition were clustered in the same clade. A diverse domain architecture in plant-specific Alba sequences has previously been reported [51]. Proteasome regulatory subunit Rpn3_C, Trx-4, DEAD/DEAH box helicase, Taxi_N, and Taxi_C domains were found in combination with the Alba domain across the species [21]. The phylogenetic analysis grouped L. infantum Alba proteins in Rpp25/Mdp2 (LiAlba20 or LiAlba3) and Rpp20/Pop7 (LiAlba13 or LiAlba1) groups [22,52].
In Trypanosoma, TbAlba1 and TbAlba2 were classified in the Rpp20/Pop7 subunit-containing Alba-domain family, whereas TbAlba3 and TbAlba4 were grouped together with the Rpp25/Mdp2 subfamily [16]. Furthermore, the motif analysis led to the identification of 15 different conserved motifs, with a sharing of conserved motifs among related proteins (Figure 3). The order, number, and type of motifs were found to be similar in proteins within the same subfamily, but differed across the subfamilies. Motifs 1, 2, and 3 were found to be conserved in most Alba proteins analyzed, and motifs 4 and 5 were the most conserved among plant-specific Alba proteins (Figure S1). The combination of different domains and motifs along with the Alba domains indicates the possible functional diversities of these proteins across species, as suggested by earlier reports [11,21].

Subcellular Localization of the Alba Family Proteins
To analyze the subcellular location, 68 plant-specific Alba proteins were examined using various localization-prediction tools. While the Alba proteins were primarily predicted to be translocated to the nucleus, cytoplasm, or chloroplast, the majority of them indicated their localization in the nucleus (Table S2). The in-silico prediction analysis in this study and our previous observation of OsAlba1 being translocated to the nucleus [53], confirmed the nuclear localization of the rice Alba proteins. A comparison of data from the species studied further validated the nuclear localization of Alba proteins. The nuclear localization of Alba proteins is suggestive of their putative role in gene expression, particularly in stress responses [53]. In archaea, the Alba proteins were previously shown to be involved in histone modifications, besides the regulation of gene expression [54].

Analysis of Upstream Regulatory Elements in Alba Genes
To evaluate the regulation of plant-specific Alba genes under stress conditions, we examined 1500-bp sequences upstream of the transcriptional start site. The identified putative cis-acting regulatory elements (CAREs) were classified into seven groups: enhancer, essential element, hormone-responsive, stress-responsive, and other elements (Table S3). The CAREs associated with environmental stress and hormone response included ABRE, CE1, and CE3 involved in ABA response; MYB-binding sites (MBS) involved in water-deficit; low temperature responsive elements (LTRs) involved in the cold and hypersalinity response; TCA-element involved in SA response; and TGACG-motif and CGTCA-motif involved in JA response. These results indicate the stress-responsive role of Alba genes across plant species.
We observed the presence of more than one stress- and hormone-responsive element in the proximal promoter region of the Alba genes. The rice Alba genes were found to contain multiple stress-responsive elements (MBS, HSE, LTR, and TC-rich repeats) along with hormone-responsive elements (CGTCA-motif and TGACG-motif). An ABA-responsive motif (ABRE) was detected in all OsAlba genes except for OsAlba7, suggesting their ABA-mediated regulation. The maize Alba genes also showed an ABRE motif as well as other stress- and hormone-responsive elements in the promoter region. Most Alba genes in sorghum harbored dehydration- and hormone-responsive motifs, including ABRE elements in SbAlba1, SbAlba2, SbAlba4, SbAlba5, and SbAlba8. In chickpea, stress- and hormone-responsive motifs were prevalent in CaAlba3 and CaAlba4, while VvAlba3, VvAlba4, VvAlba5, and VvAlba6 in grape showed the presence of an ABRE motif along with other stress-responsive elements. Arabidopsis Alba genes, AtAlba1, AtAlba2, and AtAlba3 also showed such elements, besides ABRE and MBS motifs in AtAlba1-6. Interestingly, an ABRE motif was found in Chlamydomonas, indicating its origin in primitive plants, which might have become functional in higher plants. Along with other stress- and hormone-responsive motifs, an ABRE motif was also found in PpAlba1 and PpAlba2 of Physcomitrella and in all four Alba genes of Amborella (Table S3). Altogether, these results suggest that the stress-responsive motifs along with ABRE in Alba genes might have a key role in stress tolerance.
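A scan for cis-acting elements of the kind reported above can be sketched as a search for short consensus strings on both strands of a promoter sequence. The text does not name the tool or the exact consensus sequences used, so the strings below are assumptions: ACGTG is taken as the ABRE core, and the TGACG- and CGTCA-motifs are taken literally from their names (they are reverse complements of each other, so one site yields one hit on each strand). The function names and the toy promoter are hypothetical.

```python
# Assumed consensus strings; not taken from the study.
MOTIFS = {"ABRE": "ACGTG", "TGACG-motif": "TGACG", "CGTCA-motif": "CGTCA"}

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_motifs(promoter, motifs=MOTIFS):
    """Return {motif name: sorted [(plus-strand start, strand), ...]}."""
    seq = promoter.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    hits = {}
    for name, motif in motifs.items():
        m = len(motif)
        found = []
        for i in range(n - m + 1):
            if seq.startswith(motif, i):
                found.append((i, "+"))
            if rc.startswith(motif, i):
                # Convert the reverse-strand index to plus-strand coordinates.
                found.append((n - i - m, "-"))
        hits[name] = sorted(found)
    return hits


if __name__ == "__main__":
    toy_promoter = "TTACGTGAATGACGTT"  # toy sequence, not real data
    print(find_motifs(toy_promoter))
```

Exact string matching is the simplest possible approach; dedicated tools typically use curated matrices and tolerate degenerate positions.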

Prediction of miRNA Targets in Plant-Specific Alba Proteins
Under stress conditions, many genes have been reported to be regulated post-transcriptionally through several miRNA families [55]. We identified miRNA target sequences for miR2673 and a few other miRNAs in Alba genes (Table S4) using the miRNA prediction tool (http://plantgrn.noble.org/psRNATarget/). More than one target site for miR2673 was found in OsAlba5, ZmAlba5, SbAlba7, and AtAlba1. miR2673 was found to regulate genes responsible for auxin- and ethylene-mediated signal transduction [56], plant defense response, and cellular signaling [57]. This miRNA was also found to be upregulated during the late induction of fruit abscission, senescence, and proline accumulation [58]. A target site for miR5208, which was previously reported to be involved in disease responses, was found in OsAlba7, ZmAlba3, ZmAlba14, ZmAlba8, and SbAlba3 (Table S4) [59]. ZmAlba1 exhibited a target site for miR444, which was earlier shown to be associated with the cold response [60]. Further, a target site for miR394, which was previously documented in hypersalinity and phytohormone response, was found in ZmAlba19 [61][62][63].

Tissue-Specific Expression Profiles of OsAlba Genes
To determine the biological roles of OsAlba genes, their transcript abundance was evaluated in six major organs, viz., roots, stems, leaves, flag leaf, leaf sheath, and panicle. While OsAlba1 and OsAlba7 showed a moderate expression in all tissues analyzed, OsAlba2 and OsAlba5 showed a minimum expression. OsAlba3 and OsAlba9 showed a minimum expression or no expression. However, OsAlba4, OsAlba6, and OsAlba8 were found to be highly expressed in all tissue types (Figure 4).
Figure 4. Tissue-specific expression of the OsAlba genes. The transcript profile of the Alba genes was determined in six major organs, viz., roots, stems, leaves, flag leaves, leaf sheath, and panicle. Germinating seedlings (control) were used as a reference to quantitate the relative mRNA levels in different tissues. The error bars indicate SE (standard error). The asterisks indicate a statistically significant difference between the control and other tissues (* p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001, and **** p ≤ 0.0001).
While OsAlba4 was found to be root-specific, OsAlba8 was mostly expressed in the stem. Furthermore, OsAlba4, OsAlba6, and OsAlba8 showed comparatively higher expression in the flag leaf and panicles. The flag leaf and panicles in rice play an important role in providing photosynthates [64] and help in grain filling during seed development [65]. Additionally, the panicles support the survival of seeds during dehydration and heat stress [66], indicating a vital role of OsAlba4, OsAlba6, and OsAlba8 in rice. The differential expression pattern of these genes indicates that they might have a role in the coordination of various physiological pathways.

Stress-Induced Expression of OsAlba Genes
To gain a deeper insight into the role of OsAlba genes in stress tolerance, we investigated their transcript profiles under dehydration, hypersalinity, heat, and cold. The Alba genes showed a diverse expression pattern suggesting their stress-induced differential responses. The transcripts of OsAlba1 were induced significantly under dehydration, heat, and hypersalinity, but showed reduced expression under cold (Figure 5). Under dehydration, OsAlba3, OsAlba4, OsAlba6, OsAlba7, OsAlba8, and OsAlba9 showed upregulation, OsAlba4, OsAlba7 and OsAlba9 showed a steady-state level, and OsAlba5 showed downregulated expression. Under heat and hypersalinity, OsAlba1, OsAlba2, OsAlba6, and OsAlba7 were upregulated, while OsAlba4 showed downregulated expression. OsAlba8 showed induced expression under heat and lower expression under hypersalinity. Under cold stress, OsAlba3, OsAlba6, and OsAlba7 were upregulated, but OsAlba2, OsAlba4, OsAlba5, and OsAlba8 were downregulated. OsAlba9 showed reduced expression up to 3 h, but its expression was induced subsequently at 6 and 9 h of exposure. The transcript abundance of OsAlba7 was markedly induced under hypersalinity, cold, heat, and dehydration, indicating its role in multivariate stress responses. Interestingly, OsAlba3 and OsAlba9 exhibited upregulated expression under dehydration and hypersalinity, suggesting their stress-responsive function. OsAlba6 and OsAlba8 showed tissue-specific expression in the flag leaf and panicles (Figure 4) and induced expression during heat stress, indicating their possible role in seed maintenance during stress conditions. Several flag leaf- and panicle-specific genes are known to be induced under dehydration, hypersalinity, and heat stress [66,67], and the overlap of stress-responsive gene expression in different organs is no exception in rice [68].
These results demonstrated that most Alba genes in rice are expressed at significantly higher levels under multivariate stresses and phytohormone treatments, but their exact role remains unclear.

Influence of Phytohormones on the Expression of the OsAlba Genes
Phytohormones play an important role in mediating host responses to various biotic and abiotic stresses. ABA controls numerous physiological processes in plants and is best known for its regulatory role in abiotic stress tolerance. Under ABA treatment, OsAlba1, OsAlba4, and OsAlba8 were found to be upregulated, whereas OsAlba2, OsAlba3, OsAlba5, OsAlba6, OsAlba7, and OsAlba9 showed reduced expression. ABA has been reported to promote tolerance to desiccation under conditions of water deficit and hypersalinity [69]. The treatment with SA and JA showed upregulated expression of OsAlba2, OsAlba3, OsAlba4, OsAlba5, OsAlba6, OsAlba7, and OsAlba9 but downregulated expression of OsAlba1 and OsAlba8. The transcript abundance of OsAlba7 was markedly induced under JA, whereas a mixed pattern of expression was observed following treatment with SA and ABA, indicating its role in different physiological responses. Traditionally, SA and JA are known to be associated with resistance when plants are challenged by biotrophic and necrotrophic pathogens [70,71]. However, the stress hormone ABA, better known for its role in the response to drought stress and in the maintenance of seed dormancy [72], has also been demonstrated to influence plant-pathogen interactions [73][74][75]. Altogether, the differential responses of OsAlba genes under various phytohormone treatments suggest the specific physiological roles of individual members of this family in rice.

Three-Dimensional Structure Prediction and Homology Modeling
The structural features of a protein predict its putative interactions and binding to various other molecules or ligands, and eventually provide its sequence-structure-function relationships. The sequence identity/similarity and accurate alignment between the template and a target protein lead to the prediction of 3D models. We selected the best template based on the QMEAN score value. The scores and the parameters of the selected templates for all OsAlba proteins are mentioned in Table 2. The amino acid sequences were submitted to LOMETS [76] to generate 3D structures. The 3D structures predicted for OsAlba proteins (Figure 6) were aligned to their respective templates in the TM-align server [77]. The predicted models were further analyzed in Chimera 1.2 [78], which showed different numbers of α-helices, β-strands, and coils (Figure 6). The OsAlba proteins showed the presence of 2-4 α-helices and 4-6 β-strands, while the coils were found in the range of 7-9, representing a high structural conservation. However, OsAlba7 showed very different structural components, having 7 α-helices, 29 β-strands, and 37 coils (Table S5).
The alignment with the template demonstrated a good structural match despite low sequence identities in some of the OsAlba proteins. The sequence identity of the template with the query sequence ranged from 0.11 to 0.64, while the sequence identity between the templates in the threading-aligned region and the query sequence remained between 0.09 and 0.80. The threading alignment coverage ranged between 0.33 and 0.98. The threading alignments had normalized Z-scores of >1.0, which suggests a significant alignment with the respective templates. The C-score for OsAlba4 was −4.03, while the score for OsAlba6 was −0.72. The values of other parameters (number of decoys and cluster density) remained in a reliable range (Table 3). The TM-score was found to be maximum for OsAlba6 (0.62) and minimum for OsAlba8 (0.28). The estimated RMSD was maximum for OsAlba7 and minimum for OsAlba9. A TM-score < 0.17 suggests a random similarity, and a TM-score > 0.5 indicates a model with correct topology. The TM-scores for the predicted template for all the OsAlba proteins were observed within an acceptable range, demonstrating the reliability of the models. The best aligned templates and their Protein Data Bank (PDB) IDs along with their sequence identities and query coverage of amino acid residues are mentioned in Table S6.
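The TM-score thresholds quoted above can be written as a small helper, shown here with the maximum and minimum values reported for the OsAlba models. The function name and the label for the range between the two thresholds ("intermediate") are assumptions made for this sketch; the text assigns no label to that range.

```python
RANDOM_THRESHOLD = 0.17    # below this: random similarity (from the text)
TOPOLOGY_THRESHOLD = 0.5   # above this: correct topology (from the text)


def classify_tm_score(tm_score):
    """Label a TM-score using the two thresholds given in the text."""
    if not 0.0 <= tm_score <= 1.0:
        raise ValueError("TM-score must lie between 0 and 1")
    if tm_score < RANDOM_THRESHOLD:
        return "random similarity"
    if tm_score > TOPOLOGY_THRESHOLD:
        return "correct topology"
    return "intermediate"  # label assumed for this sketch


if __name__ == "__main__":
    # Maximum and minimum TM-scores reported in the text.
    for protein, score in [("OsAlba6", 0.62), ("OsAlba8", 0.28)]:
        print(protein, score, classify_tm_score(score))
```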

Structure-Function Relationship of OsAlba Proteins
The analysis of conserved patterns and of their structural components, as observed in the MSA profile of homologous sequences, provided potential information about the possible ligand-binding sites of the OsAlba proteins. The best predicted structural template was found to be 1VM0B for OsAlba1, OsAlba6, and OsAlba9. The predicted template for OsAlba4, OsAlba5, and OsAlba8 was 4NL6A (Table S6). We identified the binding residues on the basis of the alignment between the template and the obtained OsAlba models. OsAlba1, OsAlba2, OsAlba3, OsAlba6, and OsAlba9 showed binding affinity for arginine and nucleic acids; of these, OsAlba1, OsAlba3, and OsAlba6 were predicted to bind both RNA and DNA, whereas OsAlba2 had binding affinity for RNA only. The RNA-binding properties of the Alba proteins in vivo, apart from their DNA-binding ability in a histone-like manner, have been studied in archaea [4,6,12,13]. Earlier, the binding of the Alba proteins to nucleic acids was reported from other species [14]. OsAlba4, OsAlba5, OsAlba7, and OsAlba8 showed no binding to nucleic acids, but showed an affinity for ligands such as KA, PHR, Mg, and chlorophyll-a (Figure S2). Previous studies had reported the function of Alba in transcriptional regulation through nucleic acid-binding [79]. The binding of the OsAlba proteins with different ligands suggests their various functions in different environmental conditions. The stress-responsive function of OsAlba1 has previously been established in various abiotic stress treatments [53]. GO annotation further suggests various roles of the Alba family proteins (Table S7). While OsAlba1, OsAlba2, and OsAlba9 were found to have a putative role in DNA-packaging and chromosome organization, OsAlba3 was predicted to function in transcriptional regulation and gene expression. OsAlba5 was predicted to help in DNA replication, whereas OsAlba7 was predicted to function in proteolysis.
OsAlba4 and OsAlba8 exhibited putative roles in oxido-reductive and metabolic processes, respectively, indicating their similarity in structure as well as function.

Conclusions
In the present study, we identified 68 Alba genes from 9 different species across the plant kingdom and evaluated the gene structure, phylogenetic relationships, upstream regulatory elements, conserved motifs, and their subsequent transcriptional regulation through miRNA target sequences. A number of CAREs were found in the regulatory sequences upstream of the Alba genes, suggesting their expression through a complex regulatory scheme. The transcript profile of the OsAlba genes showed their distinct tissue-specific expression, indicating their specific roles in rice. The OsAlba transcript profiles under dehydration, hypersalinity, heat, cold, and phytohormone treatments indicate that most OsAlba genes might play a crucial role for stress adaptation. Additionally, the distinct subcellular localization of the OsAlba proteins and their homologs in other plant species suggests their organelle-specific biological function. The structural features of the Alba proteins and their evolutionary relationships aided in predicting their putative functions. The results altogether will not only facilitate the understanding of the molecular mechanisms of stress-responsive adaptation in rice but will also give new insights on the role of the Alba proteins in plants, in general.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2073-4425/9/4/183/s1, Table S1: The primer sequences of OsAlba gene for qRT-PCR, Figure S1: Conserved motif arrangement in the plant-specific Alba proteins. The font size of amino acid sequences of each motif represents the frequency of the respective amino acid, Table S2: Subcellular localization of plant-specific Alba proteins, Table S3: cis-acting regulatory elements in plant-specific Alba promoters, Table S4: Identification of miRNA targets in plant-specific Alba genes, Table S5: Secondary structure elements in OsAlba proteins, Table S6: Templates for 3D structure prediction of OsAlba proteins, Figure S2 Table S7: GO annotation for OsAlba proteins.