Characterization and Functional Divergence of a Novel DUF668 Gene Family in Rice Based on Comprehensive Expression Patterns

The domain of unknown function (DUF) superfamily encodes proteins whose functions in plants are unknown. Among them, members of the DUF668 family possess a conserved domain of 29 amino acids, and this family has not been described previously. Here, we report this novel plant-specific DUF668 gene family, which contains 12 OsDUF668 genes in rice (Oryza sativa) and 91 DUF668s in seven other plant species. DUF668 genes were present in both dicot and monocot plants, indicating that DUF668 is a conserved gene family that originated before the dicot–monocot divergence. Based on gene structure and motif composition, the DUF668 family consists of two distinct clades, I and II, in the phylogenetic tree. Remarkably, OsDUF668 genes clustered on the chromosomes rarely show close phylogenetic relationships, suggesting that gene duplication and collinearity seldom occurred. Cis-element prediction shows that over 80% of DUF668s contain phytohormone- and light-responsive elements. Comprehensive expression analyses of the OsDUF668 family were further performed in 22 tissues and under five hormone treatments, seven environmental stresses, and two pathogen-defense-related stresses. The OsDUF668 genes are expressed ubiquitously in the rice tissues analyzed, and seven genes show tissue-specific high expression. All OsDUF668s respond to drought, and some of the Avr9/Cf-9 rapidly elicited genes respond to salt, wound, and rice blast with rapidly altered expression patterns. These findings imply that OsDUF668 genes are important for drought tolerance and plant defense. Together, our results highlight the important role of the DUF668 gene family in rice development and fitness.


Introduction
The structures, functions, and evolutionary patterns of gene families have long fascinated plant biologists. Our understanding of how plants interact with and adapt to the environment is based on the vast amount of information available for these gene families [1]. Among them, the domain of unknown function (DUF) families are protein families whose domains have no known function. Generally, DUF families consist of numerous members, and the entire numbering (http://hmmer.org/) [21]. Non-redundant DUF668 proteins were verified individually with SMART (Simple Modular Architecture Research Tool) (http://smart.embl-heidelberg.de/) and Pfam. We searched China's National Rice Data Center (http://www.ricedata.cn/gene/) to obtain annotation information for the OsDUF668s. Finally, physicochemical parameters of the rice DUF668 proteins, including theoretical isoelectric point (pI) and molecular weight (MW), were generated with the ProtParam tool [22], and subcellular localizations were predicted via the Plant-PLoc server (http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/) [23]. The presence of DUF668 proteins in different organisms was also checked in the European Bioinformatics Institute database (EMBL-EBI, https://www.ebi.ac.uk/).

Phylogenetic Analysis, Gene Structure, and Conserved Motifs
Multiple amino acid sequence alignments were performed with ClustalW in MEGA 7.0 using default parameters [24]. An unrooted Neighbor-Joining (NJ) phylogenetic tree was constructed from the 103 DUF668 genes of the eight plants with 1000 bootstrap replicates. Conserved motifs were searched with the MEME (Multiple Em for Motif Elicitation) online tool (http://meme-suite.org/tools/meme), with the maximum number of motifs set to 20 [25]. TBtools was used to analyze and visualize the gene structures (exon/intron) and conserved motifs of the DUF668 genes [26]. The phylogenetic tree was visualized with Evolview v2 (https://evolgenius.info//evolview-v2/) [27].

Chromosomal Locations, Gene Duplication Events, and Cis-acting Elements Analysis
Coordinates of the rice DUF668 genes on the reference genome sequence were obtained from the genome annotation file. Gene duplication events were identified with MCScanX [28]. The locations of the OsDUF668 genes were mapped with TBtools [26]. In addition, the 2 kb region upstream of each rice DUF668 coding sequence was treated as the promoter, and its cis-acting regulatory elements were analyzed with the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) [29]. The radar map was generated with custom R 3.5.1 scripts.
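The promoter definition used here (2 kb upstream of the coding sequence) depends on strand, which is easy to get wrong when working from an annotation file. The following is a minimal sketch of that coordinate calculation, assuming 1-based inclusive coordinates as in GFF files; the function name `promoter_interval` and the example coordinates are hypothetical and are not taken from the study.

```python
def promoter_interval(cds_start, cds_end, strand, upstream=2000, chrom_length=None):
    """Return the 1-based inclusive (start, end) of the region upstream of a CDS.

    cds_start/cds_end are genomic coordinates with cds_start <= cds_end.
    Returns None when no upstream sequence exists on the chromosome.
    """
    if strand == "+":
        # Upstream of a plus-strand gene lies at smaller coordinates.
        end = cds_start - 1
        start = max(1, end - upstream + 1)
    elif strand == "-":
        # Upstream of a minus-strand gene lies at larger coordinates.
        start = cds_end + 1
        end = cds_end + upstream
        if chrom_length is not None:
            end = min(end, chrom_length)
    else:
        raise ValueError("strand must be '+' or '-'")
    if start > end:
        return None
    return start, end


# Hypothetical coordinates, for illustration only.
plus_promoter = promoter_interval(5001, 8000, "+")
minus_promoter = promoter_interval(5001, 8000, "-")
```

The extracted interval on a minus-strand gene would still need to be reverse-complemented before being submitted to a promoter analysis tool.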

Plant Materials and Treatments
Tissue samples were collected from the Oryza sativa ssp. indica cultivar 93-11 grown under natural field conditions. Altogether, 22 tissues were prepared according to the procedure of Wang et al. 2016 [30], namely root (R, 12-day-old seedlings), leaf (L, 12-day-old seedlings), sink leaf (SL, unexpanded flag leaf before heading stage), sink flag leaf sheath (SiFLS, before heading stage), flag leaf (FL, one week after heading), source flag leaf sheath (SoFLS, one week after heading), node (N, the first node from the top at panicle stage), internode (IN, the part between the first and second nodes from the top at panicle stage), panicle (P5, P10, P15, and P20: panicles grown to lengths of 5 cm, 10 cm, 15 cm, and 20 cm, respectively), hull of flower (H1, 1-3 days before flowering), stamen (St, 1-3 days before flowering), pistil (Pi, 1-3 days before flowering), spikelets (Sp1, Sp5, Sp15, Sp20, at 1, 5, 15, and 20 days after flowering), hull of seed (H2, at 12-15 days after flowering), immature seed (IS, embryo and endosperm at 12-15 days after flowering), and calli (Ca, induced 30 days before subculture). Three biological replicates were used. Each sample was collected from more than three plants and pooled. Samples were ground immediately in liquid nitrogen and stored in TRIzol Reagent (Invitrogen) at −80 °C until use.

RNA Isolation and RT-PCR Analysis
Total RNA was isolated from the samples using TRIzol Reagent (Invitrogen). First-strand cDNA was synthesized from 5 µg of total RNA using M-MLV reverse transcriptase (Promega). Quantitative real-time PCR (qRT-PCR) was conducted with gene-specific primers (Table S1) in 96-well plates on a Bio-Rad CFX96 real-time PCR system, using 2× SYBR Green Master Mix reagent (Bio-Rad). The thermal cycles were as follows: 95 °C for 5 min; 40 cycles of 95 °C for 10 s, primer-specific annealing temperature for 10 s, and 72 °C for 15 s; followed by a melt curve from 65 to 95 °C. Reference genes were selected according to each experimental condition, as described in Table S2 [30]. Based on the corresponding reference gene(s), the relative expression levels of the OsDUF668 genes were calculated using the Bio-Rad CFX Manager 2.1 software (tissues) and were additionally log2-transformed for the experimental treatments.
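As an illustration of what reference-gene normalization followed by a log2 transformation involves, the sketch below uses the widely used comparative Ct (2^-ΔΔCt) calculation. This is an assumption made for illustration: the text states only that the values were computed in Bio-Rad CFX Manager 2.1, and the sketch further assumes 100% amplification efficiency and a single reference gene. The function names and Ct values are hypothetical.

```python
import math


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Comparative Ct estimate of expression relative to a control sample.

    Assumes perfect doubling per cycle (100% efficiency) for both the
    target gene and the reference gene.
    """
    delta_sample = ct_target - ct_ref          # normalize treated sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalize control (e.g. 0 h)
    return 2.0 ** -(delta_sample - delta_ctrl)


def log2_relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """log2 of the relative expression, as used for treatment heat maps."""
    return math.log2(
        relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl)
    )


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# after treatment while the reference gene is unchanged.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
log2_fold = log2_relative_expression(24.0, 18.0, 26.0, 18.0)
```

With more than one reference gene, a common choice is to replace `ct_ref` with the mean Ct of the reference genes, which corresponds to normalizing against their geometric mean expression.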

Genome-Wide Identification and Characterization of DUF668 Genes in Rice
To identify DUF668 genes in rice, we searched the whole rice genome for the 29 amino-acid DUF668 conserved domain. As a result, we identified 12 putative non-redundant entries and named them OsDUF668-1 to OsDUF668-12 according to their chromosomal order; they were dispersed unevenly across the chromosomes, with none on chromosomes 7-10 (Figure S1). We then characterized the OsDUF668s further to obtain their chromosome locations, mRNA lengths, numbers of amino acids (aa), MW, and theoretical pI (Table 1). The encoded proteins ranged from 357 to 656 aa in length, and their predicted theoretical pI varied from 6.37 to 10.5. We also predicted the subcellular localization of the rice DUF668 proteins: the majority (7/12) were predicted to localize to the chloroplast, while the others were predicted to be targeted to the nucleus, peroxisome, cytoplasm, and mitochondrion (Table 1). Of note, OsDUF668-1, -4, -5, -6, -9, and -12 were annotated as Avr9/Cf-9 rapidly elicited proteins, whereas the others were not. We were therefore tempted to speculate that the DUF668 genes in rice could be clustered into two distinct groups.

Phylogenetic Analysis, Gene Structure, and Motif Composition of the DUF668s in Eight Plants
To extend our understanding of the DUF668 gene family, we searched for the DUF668 domain in the European Bioinformatics Institute database (EMBL-EBI, https://www.ebi.ac.uk) and found that DUF668 genes exist in up to 62 plant organisms but not in animals or other non-plant species (Table S3). We further examined four model genomes (Homo sapiens, Mus musculus, Drosophila melanogaster, and Danio rerio) and found no DUF668 in any of them. This finding suggested that DUF668s might constitute a plant-specific gene family.
Gramineae is known as the fourth largest flowering plant family and is an essential genetic resource comprising many commercial crops [35]. Apart from O. sativa, we therefore screened for DUF668 members in the genome of the model dicot A. thaliana and in six representative graminaceous genomes, namely B. distachyon, O. rufipogon, P. virgatum, S. bicolor, S. italica, and Z. mays. In these genomes, 6, 11, 13, 22, 12, 11, and 16 non-redundant DUF668 family members were detected, respectively (Figure 1A; Table S4). To investigate the evolutionary relationships of the DUF668 family in these species, we constructed a neighbor-joining (NJ) phylogenetic tree and combined it with exon/intron structure and conserved motif analyses. The results showed that all 103 DUF668 proteins could be assigned to two clades, I and II (Figure 1B). Forty-nine and fifty-four DUF668s belonged to clades I and II, respectively, indicating only a small difference in gene copy number. We noticed that half of the rice OsDUF668s fell into clade I, and all of them were annotated as Avr9/Cf-9 rapidly elicited genes, which supported our preceding assumption. Furthermore, these two groups differed not only in gene structure but also in motif arrangement. With a few exceptions, DUF668s in clade II had approximately 9-16 exons, in sharp contrast to those in clade I, most of which possessed only one exon (Figure 1C). We identified 20 different conserved motifs (Figure 1D; Table S5). Motifs 1 and 5 were related to the unknown-function domains DUF668 and DUF3475, while the others have no functional annotation. In general, clade II had more motifs than clade I. Motif 1 was the most common motif, being present in all DUF668 genes. In addition, the vast majority of DUF668s included motifs 2, 3, 4, 5, 6, 7, 8, and 20. Motifs 9, 10, 13, 15, and 16 were specific to clade II, whereas motif 19 occurred mainly in clade I.
In summary, the exon-intron structures and motif compositions of the DUF668 genes were largely consistent with their phylogenetic relationships.
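The family sizes reported above can be cross-checked with a short tally. The sketch below uses only the counts stated in the text (12 genes in rice, the per-species counts in the order the species are listed, and the clade sizes of 49 and 54); the variable names are illustrative.

```python
# Per-species DUF668 counts, in the order the species are listed in the text.
other_species_counts = {
    "A. thaliana": 6,
    "B. distachyon": 11,
    "O. rufipogon": 13,
    "P. virgatum": 22,
    "S. bicolor": 12,
    "S. italica": 11,
    "Z. mays": 16,
}
rice_count = 12
clade_sizes = {"I": 49, "II": 54}

other_total = sum(other_species_counts.values())   # genes outside O. sativa
family_total = rice_count + other_total            # genes in all eight plants
clade_total = sum(clade_sizes.values())            # genes placed in the tree

print(f"other species: {other_total}, all eight plants: {family_total}")
print(f"clade I + clade II: {clade_total}")
```

The totals agree with the 91 non-rice genes and the 103 genes used for the phylogenetic tree.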

Cis-acting Elements Prediction of DUF668 Genes in Rice
Cis-acting regulatory elements can provide clues to gene expression patterns in various organs or under environmental stresses [36]. Previous studies reported a significant positive relationship between responsive genes and the cis-elements in their upstream promoter regions [37,38]. Here, we identified potential cis-regulatory elements in the 2 kb upstream regions of the OsDUF668s using the PlantCARE database. In total, 32 cis-regulatory elements were detected (Figure 2A), which formed eleven subclasses and four main categories, namely plant growth, abiotic stress, phytohormone responsiveness, and light responsiveness (Figure 2B). The largest category was light responsiveness, which contained 55.5% of the predicted cis-elements, with the AAA-motif (light-responsive element) and AE-box (part of a module for light response) as representatives. Regulatory elements involved in plant hormone responsiveness ranked second and included cis-acting elements responsive to abscisic acid, auxin, flavonoid, gibberellin, and salicylic acid. Among them, ABRE (related to the abscisic acid response) accounted for the largest portion, followed by the TGA-element (auxin-responsive element). In the abiotic stress response category, elements related to oxygen-deficiency induction (GC-motif, ARE) were the most common, followed by those related to low-temperature responsiveness (LTR) and drought inducibility (MBS). In the plant growth regulation category, only two main cis-acting elements were identified, namely the CAT-box (related to meristem expression) and the O2-site (involved in zein metabolism regulation). Intriguingly, all kinds of cis-regulatory elements were distributed widely throughout the promoter regions of the OsDUF668 genes, suggesting that OsDUF668s may have intricate expression profiles and be important in the regulation of rice development and stress resistance.

Expression Profiles of OsDUF668s in Response to Plant Hormone
Plant hormones, including auxin, cytokinin, and ethylene, have a lasting impact on plant architecture and development [39]. To explore the possible functions of OsDUF668s in hormone responses, we used qRT-PCR to analyze their relative expression under 6BA, IAA, GA, SA, and ABA treatments (Figure 4). In this study, a two-fold change (|log2 fold change| > 1) was considered a significant difference in gene expression under each treatment. Of the 11 OsDUF668s, seven were hardly up-regulated under any treatment, while the remaining four (OsDUF668-2, -5, -6, and -11) showed elevated expression in at least one treatment. For instance, OsDUF668-5 and OsDUF668-6 were both up-regulated under 6BA and IAA treatments. In addition, OsDUF668-5 was induced after 3 h of SA treatment. Expression of OsDUF668-2 and OsDUF668-11 was elevated when ABA was applied. By contrast, more genes were down-regulated or suppressed under certain hormone treatments, such as OsDUF668-4 and OsDUF668-1 under all five hormone treatments, OsDUF668-2 and OsDUF668-10 under all hormone treatments except ABA, OsDUF668-3 under 6BA, IAA, and GA, and OsDUF668-5 under GA and ABA. These results indicated that the OsDUF668 genes might be involved in the corresponding hormone signaling pathways.
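The two-fold criterion described above amounts to a simple threshold on the log2-transformed expression values. A minimal sketch of that rule is given below; the function names are illustrative, and the expression values are hypothetical numbers chosen for the example rather than data from Figure 4.

```python
def classify_response(log2_fc, threshold=1.0):
    """Label one log2 fold change using the |log2 fold change| > 1 rule."""
    if log2_fc > threshold:
        return "up"
    if log2_fc < -threshold:
        return "down"
    return "unchanged"


def summarize_gene(profile, threshold=1.0):
    """Map each treatment name to its response label for a single gene."""
    return {
        treatment: classify_response(value, threshold)
        for treatment, value in profile.items()
    }


# Hypothetical log2 fold changes for one gene (not measured values).
example_profile = {"6BA": 1.8, "IAA": 1.2, "GA": -1.4, "SA": 0.4, "ABA": -1.0}
labels = summarize_gene(example_profile)
```

Because the rule uses a strict inequality, a value of exactly ±1 (a two-fold change) is labeled as unchanged in this sketch.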

Transcriptional Responses of OsDUF668s Facing Pathogen-Defense Related Stresses
Previous studies have reported that Avr9/Cf-9 rapidly elicited genes play an important role in plant defense against pathogen invasion [14,16]. Given that half of the OsDUF668s were annotated as Avr9/Cf-9 rapidly elicited genes, we used qRT-PCR to analyze whether OsDUF668s responded to two pathogen-defense-related stresses: wound and rice blast (Figure 6). The results showed that OsDUF668-1, -3, -4, and -5 were significantly up-regulated under wound and rice blast conditions. Transcripts of these genes were induced particularly strongly by rice blast. OsDUF668-1 showed fast elicitation after 15 min of wound stress, while OsDUF668-1, -4, and -5 showed fast elicitation after 15 min of rice blast treatment. In addition, the expression levels of OsDUF668-6, -7, -11, and -12 were significantly down-regulated under both pathogen-defense-related stresses. These results implied that the OsDUF668 genes might be involved in rice defense against pathogens.
Figure 6. Expression changes of OsDUF668s under wound and rice blast stresses. Relative expression levels are shown in the heat map as fold changes from blue to red, with 0 h used as the control to normalize the data; gray cells represent undetected values. RB, rice blast.

Discussion
In this study, we report that plants possess a unique set of DUF668 genes, based on genome-wide identification. We identified 103 DUF668 genes in the dicot Arabidopsis thaliana and seven representative Gramineae species, suggesting that the DUF668 family originated in an ancestral plant species before the divergence of monocotyledons and dicotyledons (Figure 1). DUF668 has therefore been a conserved gene family over a long period of evolution. In the monocot grass family (Gramineae), the number of DUF668 family members was similar in each species, with the exception of P. virgatum and Z. mays. One possible explanation is that the total number of annotated genes in the Phytozome database for rice (52,425) is in the same range as for B. distachyon (52,973), S. bicolor (47,122), and S. italica (43,002). Given that Z. mays and P. virgatum have relatively large genomes and many more genes, they consequently contain more DUF668 genes than the other species. These results indicated similar gene numbers across a broad diversity of economically important grasses. In rice, many gene family members with high sequence similarity have been observed to cluster on the chromosomes as paralogous pairs [40,41]. However, we identified 12 OsDUF668s and found no collinearity relationships among them. This suggests that the DUF668 family in rice has not undergone expansion driven by tandem or segmental duplication. Furthermore, phylogenetic analysis categorized all DUF668 proteins into two distinct lineages that differ in exon-intron structure and motif arrangement (Figure 1). All clade I members in rice were annotated as Avr9/Cf-9 rapidly elicited proteins, which have not been sufficiently studied.
Cis-acting elements responsive to light, phytohormones, and abiotic stresses were present ubiquitously in the promoter regions of OsDUF668s, indicating that these light and chemical factors may interact in regulating DUF668 genes (Figure 2). Tissue expression profiles showed that the majority of OsDUF668s were ubiquitously expressed at various developmental stages (Figure 3), implying roles throughout the life cycle of rice. OsDUF668-1, -3, -4, -5, -10, -11, and -12 also exhibited high expression levels in some specific tissues (Figure 3), reflecting the tissue specificity of these OsDUF668 genes. The positive response of most OsDUF668 genes to hormones was marginal, although OsDUF668-5 and OsDUF668-6 were significantly up-regulated under 6BA, IAA, and SA treatments (Figure 4), implying roles in these hormone pathways. In addition, OsDUF668-1 and OsDUF668-4 responded positively to UVB, suggesting roles in rice adaptation to solar UV light (Figure 5). A surprising finding was that all the OsDUF668 genes were extensively up-regulated by drought stress (Figure 5), suggesting that OsDUF668 may be important for rice resistance to drought. Similarly, OsDUF668-1, -2, -4, and -10 were up-regulated under NaCl stress (Figure 5), suggesting that these genes are also involved in plant salt-tolerance mechanisms. Considering the common positive response of OsDUF668-1, -2, -4, and -10 to drought and NaCl stresses, overexpression of these four genes in rice may be an effective way to engineer plant fitness under drought and salinity conditions. Avr9/Cf-9 elicited genes play key roles in plant-pathogen interactions. Expression levels of Avr9/Cf-9 elicited genes were found to change differentially in a transcriptome profiling analysis of resistance induced by burdock fructooligosaccharide in tobacco [42].
The protein kinase ACIK1 is encoded by an Avr9/Cf-9 elicited gene and is essential for full Cf-9-dependent disease resistance in tomato [17]. The F-box protein ACRE189/ACIF1, encoded by another Avr9/Cf-9 elicited gene, regulates cell death and the defense response activated during pathogen recognition in tobacco and tomato [16]. In our study, the clade I members OsDUF668-1, -4, and -5 were up-regulated, and OsDUF668-6 and OsDUF668-12 were down-regulated, under wound and rice blast treatments (Figure 6), suggesting that these Avr9/Cf-9 elicited genes may also mediate biotic defense responses in rice. Moreover, OsDUF668-1, -3, -4, and -5 responded positively to both wound and rice blast conditions (Figure 6), implying that overexpression of these four genes in rice may help to improve plant defense against pathogens. In addition, a key characteristic of Avr9/Cf-9 elicited genes is that their expression is usually altered rapidly (15 min after elicitation) during the biotic defense response [17]. In rice, the corresponding Avr9/Cf-9 elicited genes in the DUF668 gene family also responded to drought (OsDUF668-1, -5, -6, and -12), NaCl (OsDUF668-1), wound (OsDUF668-1), and rice blast (OsDUF668-1, -4, and -5) within 15 min (Figures 5 and 6), implying that the regulation and function of Avr9/Cf-9 elicited genes are conserved among plant species. However, why and how these Avr9/Cf-9 elicited genes respond so quickly requires further study.
Importantly, the evolutionary conservation of the DUF668 genes in the rice genome and the responses of some members to adverse conditions reflect their importance for rice growth and resistance to multiple stresses. Dissecting the functions of the DUF668s will require more powerful approaches such as transgenic experiments. Taken together, we provide a detailed and multi-level view of the DUF668 gene family in rice. This study lays a foundation for further work to dissect the evolutionary significance of this gene family and should accelerate functional research into its roles in rice growth and development as well as in the stress response.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/12/980/s1, Figure S1: Chromosomal locations of DUF668 genes in rice. Different colors of OsDUF668s represent the members of different subfamilies; Table S1: Primers for qRT-PCR; Table S2: Reference genes for each experimental condition; Table S3: DUF668 gene family number in 62 plant species; Table S4: Summary of information on DUF668 genes from seven other plants; Table S5: Annotation of conserved motifs in DUF668 proteins.