Genome Wide Identification and Annotation of NGATHA Transcription Factor Family in Crop Plants

The NGATHA (NGA) transcription factor (TF) family belongs to the RELATED TO ABI3/VP1 (RAV) transcriptional subfamily, a subgroup of the B3 superfamily, and is relatively well-studied in Arabidopsis. However, limited data are available on the contributions of NGA TFs in other plant species. In this study, 207 NGA gene family members were identified in the genomes of 18 dicots and seven monocots through a genome-wide search using Arabidopsis thaliana sequences as queries. The phylogenetic and sequence alignment analyses divided the NGA genes into different clusters and revealed that the number of genes varied depending on the species. The phylogeny was followed by the characterization of the Solanaceae (tomato, potato, capsicum, tobacco) and Poaceae (Brachypodium distachyon, Oryza sativa L. japonica, and Sorghum bicolor) family members in comparison with A. thaliana. The gene and protein structures revealed a similar pattern for NGA and NGA-like sequences, suggesting that both have been conserved during evolution. Promoter cis-element analysis identified elements responsive to phytohormones such as abscisic acid, auxin, and gibberellins, suggesting that these hormones play a crucial role in regulating the NGA gene family. Gene ontology analysis revealed that the NGA gene family participates in diverse biological processes such as flower development, leaf morphogenesis, and the regulation of transcription. The gene duplication analysis indicated that most of the genes evolved through segmental duplication and have undergone purifying selection. Finally, the gene expression analysis indicated that the NGA genes are abundantly expressed in lateral organs and flowers. This analysis presents a detailed and comprehensive study of the NGA gene family, providing basic knowledge of the gene, protein structure, function, and evolution. These results lay the foundation for further understanding of the role of the NGA gene family in various plant developmental processes.


Introduction
Plant growth and development require numerous rigorous regulatory processes, and transcriptional regulation therefore plays an important role at every stage. TFs bind to their target genes or adjacent regions and control gene expression by turning them on and off as needed [1,2]; they thus play crucial roles in regulating various plant processes and stress responses. To date, many TF families and their binding sites have been reported [3]. One such family is the B3 superfamily, which regulates the expression of various genes. B3 proteins are expressed in various plant tissues, suggesting roles in different plant processes [4]. NGA belongs to the RELATED TO ABI3/VP1 (RAV) transcriptional subfamily, forming a subgroup of the B3 superfamily. The RAV subfamily is further divided into two categories: Class-I (proteins AtNCED3 increased under drought stress. A similar pattern was seen in ABA-deficient mutants of Arabidopsis, where NGA induced AtNCED3 to synthesize ABA in response to stress. In addition, Guo et al. [22] showed that overexpression of MtNGA1 from Medicago truncatula in A. thaliana increased tolerance to high salt stress. The overexpression lines also showed fewer branches and delayed flowering, indicating that NGA genes are key players in crucial aspects of plant development as well as stress responses. To examine the reduced shoot branching, the authors analyzed the transcript levels of SMXL genes in the MtNGA1 overexpression lines and observed that AtSMXL6, AtSMXL7, and AtSMXL8 were downregulated, while AtMAX1/2, AtBRC1, and AtBRC2 were upregulated. The repressed shoot branching in the transgenic lines provides evidence that NGA not only influences ABA but also regulates strigolactones [22].
To date, phylogenetic analyses of the NGA family of a few plant species such as A. thaliana, B. napus, G. max, B. distachyon, O. sativa, P. patens, and M. truncatula have been reported in the literature [10,21]. Furthermore, Pfannebecker et al. [23] combined the phylogeny of members of the NGA family of cruciferous, nightshade, and grass families. Their study concluded that each gene family evolved independently through several rounds of gene duplication events.
In this study, we performed a detailed analysis of the NGA family in higher plant species, focusing on Solanaceae and Poaceae. Phylogenetic reconstruction of the gene family was followed by characterization of the Solanaceae NGA gene family in comparison with the monocot members of Poaceae. The characterization included gene and protein structure, protein motifs, promoter analysis, Gene Ontology, and quantitative RT-PCR analysis of the NGA genes. Our data provide a comprehensive view of the NGA gene family in higher plants and should facilitate further research related to crop plant development and new control methods.

Identification and Characterization of NGA Genes
We used four NGA and three NGA-Like sequences from A. thaliana as queries to identify NGA sequences in different plant species. The initial BLASTP search was run against Phytozome and Ensembl Plants. Databases such as the Sol Genomics Network and the Rice Genome Annotation Project were also searched for NGA family members. Altogether, 460 sequences were retrieved and subjected to reciprocal BLASTP against the NGA sequences of A. thaliana at NCBI. The resulting sequences were checked for the presence of the B3 domain (PF02362.21) using the HMM profile from Pfam and the SMART database. The validated sequences were then filtered with CD-HIT at a sequence identity threshold of ≥90% to eliminate redundant sequences. After filtering, 207 sequences from monocots and dicots were retained to characterize the gene family (Table S1). The genes were named according to their homology to the Arabidopsis NGA family and the previous literature [6-12,14,16,18,23].
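The redundancy-removal step can be illustrated with a minimal sketch of greedy, identity-based filtering in the spirit of CD-HIT. This is not CD-HIT itself: the identity measure here is a simple stand-in based on Python's difflib rather than a sequence alignment, and the function names and toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Approximate identity: matched residues / length of the shorter sequence.

    Stand-in for an alignment-based identity; real tools align the sequences.
    """
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def remove_redundant(seqs: dict[str, str], threshold: float = 0.90) -> dict[str, str]:
    """Greedy filtering: process longest sequences first and keep a sequence
    only if it is below the identity threshold to every kept representative."""
    representatives: dict[str, str] = {}
    ordered = sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, seq in ordered:
        if all(identity(seq, rep) < threshold for rep in representatives.values()):
            representatives[name] = seq
    return representatives


# Toy sequences (hypothetical, not taken from Table S1).
toy = {
    "seqA": "MKRLFGVDLSAAPEE",
    "seqB": "MKRLFGVDLSAAPEQ",  # differs from seqA at one position
    "seqC": "WWWWWYYYYYHHHHH",  # unrelated
}
kept = remove_redundant(toy, threshold=0.90)
print(sorted(kept))
```

Processing the longest sequences first means that each cluster is represented by its longest member, which is the general idea behind this kind of greedy filter.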

Phylogenetic Analysis of NGA Family
The evolutionary history of the NGA TF family was investigated by assessing the phylogenetic relationships among the NGA proteins. The phylogenetic tree was constructed from the protein sequences of 18 dicots and 7 monocots (Figure 1; Table S1). The phylogenetic analysis showed that both the NGA and NGA-Like sequences of dicots and monocots diversified into different clades (Figures S1 and S2). Furthermore, the NGA and NGA-Like sequences of dicots grouped by family, such as Brassicaceae (A. thaliana, B. rapa, Camelina sativa) and Solanaceae (S. lycopersicum, S. tuberosum, C. annuum, N. tabacum), alongside other species such as Populus trichocarpa, Phaseolus vulgaris, and Medicago truncatula (Figure S1).
Among the dicots, the number of NGA and NGA-Like protein sequences varied among species, while among the selected monocots, the number of NGA and NGA-Like protein sequences was almost the same, at five and two, respectively. However, there were some exceptions: T. aestivum possessed the highest number of proteins (26), followed by C. sativa and Musa acuminata (banana) with 20 and 17 sequences, respectively. Furthermore, the NGA sequences of banana grouped separately from those of other monocots, suggesting that the banana NGA genes evolved independently of those of other monocots (as shown in Figures 1 and S2). Altogether, our results indicate that the evolution of NGA and NGA-Like sequences has followed divergent lineages. A comparison of the protein sequences of A. thaliana, S. lycopersicum, and O. sativa L. japonica showed that the proteins were 47.63% similar on average (Table S2).

Physical and Chemical Properties of NGA Family
The physical and chemical properties of the NGA family of Solanaceae (tomato, potato, capsicum, and tobacco), along with Arabidopsis, and of Poaceae (rice, sorghum, and Brachypodium) are outlined in Tables 1 and 2. The tables list the gene ID, chromosome location, gene length, complete coding sequence (CDS), protein molecular weight (MW), and isoelectric point (pI), as well as the predicted location of the signal peptide of the respective proteins. Protein length ranged from 176 amino acids (CaNGA-Like1-1) to 477 amino acids (CaNGA3), with an average of 324 amino acids. The MW ranged from 20.31 kDa (CaNGA-Like1-1) to 52.49 kDa (CaNGA3), with an average of 35.84 kDa (SbNGA1). The predicted pI varied from 4.66 (SbNGA4) to 10.45 (BdNGA5); the pI of twenty proteins was below 7, while that of twenty-seven proteins was above 7. The signal peptide predictions indicated that the majority of the NGA proteins were located in the nucleus, whereas OsNGA2 and OsNGA4 were predicted to be located in the chloroplast and cytoplasm, respectively (Tables 1 and 2).
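As a rough illustration of how a protein MW such as those in Tables 1 and 2 can be derived from a sequence, the sketch below sums average residue masses and adds one water molecule for the free termini. The mass table is the commonly tabulated set of average residue masses and should be treated as approximate; the function name is hypothetical, and no assumption is made about which tool produced the values in Tables 1 and 2. As a sanity check on the reported figures, an average length of 324 residues at roughly 110 Da per residue gives about 35.6 kDa, close to the reported average of 35.84 kDa.

```python
# Average residue masses in Da (amino acid minus one water); approximate values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water added for the free N- and C-termini


def molecular_weight_kda(sequence: str) -> float:
    """Return the approximate average molecular weight of a protein in kDa."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return (sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS) / 1000.0


# The five-residue repressor motif discussed in this paper, used as a toy input.
print(f"{molecular_weight_kda('RLFGV'):.3f} kDa")
```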

Gene Structure and Protein Motifs Analyses
The gene structures of the Solanaceae members followed a pattern similar to that of A. thaliana (Figure 2a-c). All the NGA genes possessed a single intron; some genes had untranslated regions (UTRs) and others did not. Among all of the NGAs, StNGA3 had the longest 3′ UTR, at 4154 bp. The NGA-LIKE genes contained three exons and two introns, with the exceptions of AtNGA-LIKE3 and NtNGA-LIKE, which had two exons and one intron, and CaNGA-LIKE1-1, which had a single exon and no UTRs. Poaceae members such as rice, sorghum, and Brachypodium followed the same pattern (Figure 2d-f). In Poaceae, an NGA-LIKE gene with two exons and a single intron was observed in OsNGA-LIKE1-2. Although the number of exons (two or three) is not consistent across all genes, the results suggest that gene structure is conserved within this family.

All of the NGA proteins included in the study possessed only one B3 domain (PF02362.21/CL0405). The NGA proteins contained a repressor motif (R/KLFGV) that is responsible for regulating heat stress-related genes (Figure S3) [16,23-26]. Most of the NGA and NGA-Like proteins possessed the repressor motif, except for OsNGA3, CaNGA-Like1-1, and StNGA-Like1-1 (Figure S3c,d). These results suggest that NGA and NGA-Like proteins may play a role in combating heat stress. We also searched for other protein motifs in the NGA and NGA-Like sequences using MEME 2.0; the results are shown in Figure 2c,f.
Furthermore, we considered four species of the Solanaceae family (tomato, potato, capsicum, and tobacco), referred to as "Solanaceae members" in the current study. Motifs 1, 2, and 4 are common to the NGA and NGA-Like proteins of Arabidopsis and the Solanaceae members and represent the conserved B3 domain in these species (Figures 2c and S4a). Motifs 7 and 8 are distributed among the NGA proteins of tomato, potato, capsicum, tobacco, and A. thaliana and represent conserved domains specific to NGA proteins, named the NGA-I and NGA-II domains [7,12]. The repressor motif RLFGV corresponds to motif 3 and is involved in various stress responses such as heat, salt, and drought [10,22,24,27]. Among the other motifs, motif 16 is unique to NGA3 of N. tabacum, while motif 19 is unique to the NGA3 proteins of C. annuum and N. tabacum, suggesting that these motifs may be important in plant development processes. Similarly, the unique motifs in CaNGA3, NtNGA3-1, and NtNGA3-2 might take part in various stress responses.
Motif analysis was also performed on the NGA protein sequences of the Poaceae species O. sativa L. japonica, B. distachyon, and S. bicolor (Figures 2f and S4b), referred to as "Poaceae members" in this paper. Three motifs (motifs 1, 2, and 4) were common to Arabidopsis, the Solanaceae members, and the Poaceae members, indicating the conserved B3 domain. In the Poaceae members, motifs 9 and 10 were conserved only in the NGA sequences and are named the NGA-II and NGA-I motifs, as described in [7,12]. The repressor motif RLFGV in these monocot species is represented as motif 3 in  Figure S5) [28].
Our results reveal that the Solanaceae and Poaceae members share motifs with Arabidopsis, suggesting a common role for NGA and NGA-Like proteins. Further experimental analysis would provide more knowledge of the possible roles of these proteins, especially in stress responses.

Promoter cis-Element Analysis of NGA Genes
The promoter cis-element analysis was performed as described by Wei et al. [29]. We considered the 1500 bp upstream region of the NGA genes and searched for various cis-elements, including those related to light; hormones (abscisic acid [ABA], gibberellic acid [GA], auxin, methyl jasmonic acid [MeJA], and salicylic acid [SA]); stress responses (drought inducibility, defense and stress response); and other functions, namely anaerobic induction, circadian control, meristem development, flavonoid biosynthesis, low-temperature responsiveness, zein metabolism, seed-specific regulation, AT-rich DNA binding protein, endosperm expression, anoxic specific inducibility, and cell cycle regulation. The details of the promoter cis-elements of NGA and NGA-LIKE are represented in Figure 3.
Figure 3. The analysis of cis-acting elements in the promoters of NGATHA genes. The X-axis represents the NGATHA genes, and the Y-axis represents the number of cis-elements in the promoter of each gene. The cis-elements were predicted in the 1500 bp upstream regions using PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/; accessed on 30 January 2022) [30]. "Others" include promoter elements related to anaerobic induction, circadian control, meristem development, flavonoid biosynthesis, zein metabolism, seed specific regulation, endosperm expression and cell cycle regulation.
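The promoter scan described in this section can be sketched as two steps: extracting the 1500 bp upstream of a gene in a strand-aware manner, and counting occurrences of a cis-element core on both strands. This is only an illustration under stated assumptions. PlantCARE uses its own curated element definitions, whereas the core sequence used here (ACGTG) and the coordinates are illustrative, and all function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom: str, tss: int, strand: str, length: int = 1500) -> str:
    """Return up to `length` bp upstream of a gene, read 5'->3' on the gene's strand.

    `tss` is the 0-based index of the first transcribed base on the reference.
    """
    if strand == "+":
        return chrom[max(0, tss - length):tss]
    if strand == "-":
        # On the minus strand, upstream lies to the right on the reference.
        return reverse_complement(chrom[tss + 1:tss + 1 + length])
    raise ValueError("strand must be '+' or '-'")


def count_element(promoter: str, core: str) -> int:
    """Count (possibly overlapping) occurrences of `core` on both strands."""
    def count(seq: str, motif: str) -> int:
        k = len(motif)
        return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

    total = count(promoter, core)
    rc = reverse_complement(core)
    if rc != core:  # avoid double-counting palindromic elements
        total += count(promoter, rc)
    return total


# Toy promoter: one forward ACGTG and one reverse-strand copy (CACGT).
toy_promoter = "TTACGTGAACACGTAA"
print(count_element(toy_promoter, "ACGTG"))
```

Counting on both strands matters because a cis-element can act in either orientation relative to the gene; palindromic cores such as CACGTG are counted once per site.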

Three-Dimensional Structure of NGA Proteins
We analyzed the 3D structure of NGA proteins in Arabidopsis, tomato, and rice (Figures 4 and S6). In Arabidopsis and tomato, the protein structure had a conserved B3 domain and no AP2 domain, suggesting that NGA structure and function might be conserved in these two species (Figure S3). The conserved B3 domain is a shared feature of the RAV subfamily, whereas the AP2 domain is present specifically in RAV proteins. Multiple sequence alignment of the NGA proteins also confirmed the conserved nature of the B3 domain in Arabidopsis, tomato, and rice, suggesting functional conservation of these proteins during evolution (Figure S3). Furthermore, a five-amino-acid motif (RLFGV; green box in Figure S3) appeared to be present in all three species, suggesting an essential role for this motif in plant development.

Synteny or Gene Duplication Analysis
Although the NGA family is relatively well-studied in A. thaliana compared to other species such as B. rapa, the evolutionary history of the NGA family is not yet understood. In this study, we investigated the evolution and origin of the NGA genes of S. lycopersicum in comparison with A. thaliana (Figure 5).
We identified five syntenic gene pairs between A. thaliana and S. lycopersicum: AtNGA-LIKE1 and AtNGA-LIKE2 are each linked to both SlNGA-LIKE1-1 and SlNGA-LIKE1-3, and AtNGA-LIKE3 is linked to SlNGA-LIKE1-1, forming the fifth pair. Although five pairs link A. thaliana and S. lycopersicum, this small number of synteny events suggests a distant evolutionary relationship between the two species (Figure 5a). Synteny was also evaluated between O. sativa L. japonica and A. thaliana, where only one gene pair was observed, between OsNGA-LIKE1-2 and AtNGA-LIKE2 (Figure 5c), further suggesting a distant relationship between these two species. Moreover, despite the distant syntenic relationships between the genomes of Arabidopsis and tomato and of Arabidopsis and rice, the genes belonging to the same subfamily were linked in each syntenic block, suggesting that these species evolved from the same ancestor (Figure 5). The syntenic relationship was also assessed in other species such as B. rapa, where 32 syntenic pairs were observed, indicating that B. rapa is the closest relative of A. thaliana (Figure 5b). In addition, gene duplication analysis was also carried out in other dicots such as P. trichocarpa, Vitis vinifera, S. tuberosum and monocots such as B. distachyon with 14, 11, 9, 6, and 1 syntenic pairs, respectively (Figure S7). These results show that P. trichocarpa is closely related to A. thaliana, while B. distachyon seems to be a distant relative of A. thaliana, with only one syntenic pair.
We further assessed the nonsynonymous (Ka) and synonymous (Ks) substitution rates of NGA gene pairs to understand their evolutionary rates (Tables 3 and 4). The Ka/Ks ratio of most gene pairs was less than 1, indicating that the majority have evolved slowly under purifying selection. These results indicate that the genes have evolved under stringent constraints, thus maintaining the conserved nature of the NGA family during evolution. However, gene pairs in the sorghum NGA gene family in which SbNGA-LIKE1-2 is paired with other members of the family had Ka/Ks ratios greater than 1, indicating positive Darwinian selection (Table 4).
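The interpretation of the Ka/Ks ratio used here can be written as a small helper: a ratio below 1 is read as purifying selection, a ratio above 1 as positive selection, and a ratio of 1 as neutral evolution. The sketch below is only illustrative; the function name is hypothetical, and the Ka and Ks values are made up rather than taken from Tables 3 and 4.

```python
import math


def classify_selection(ka: float, ks: float) -> tuple[float, str]:
    """Return the Ka/Ks ratio and the conventional reading of that ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if math.isclose(ratio, 1.0):
        label = "neutral"
    elif ratio < 1.0:
        label = "purifying"
    else:
        label = "positive"
    return ratio, label


# Made-up example values for two hypothetical gene pairs.
for pair, ka, ks in [("pair_1", 0.10, 0.50), ("pair_2", 0.50, 0.25)]:
    ratio, label = classify_selection(ka, ks)
    print(f"{pair}: Ka/Ks = {ratio:.2f} ({label} selection)")
```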

Functional Annotation of the NGA Gene Family
According to the GO analysis, the NGA family is predicted to be involved in biological processes (BPs) including leaf shaping (GO:0010358), flower development (GO:0009908), response to karrikin (GO:0080167), meristem maintenance (GO:0010073), seed growth regulation (GO:0080113), glucosinolate metabolic processes (GO:0019760), regulation of leaf morphogenesis (GO:1901371), and the regulation of transcription (GO:0006355) (Figure 6; Table 5). The molecular functions (MFs) include DNA binding (GO:0003677) and protein binding (GO:0005515), while the cellular components (CCs) include the nucleus (GO:0005634), suggesting that the NGA family transcription factors reside in the nucleus. The GO data suggest that NGA genes play an essential role in regulating lateral organs and gynoecium development, and participate in the regulation of various genes involved in plant development and stress responses.

GO Term Annotation Involved Genes
BP GO:0006355 Regulation of transcription, DNA-templated

in cotyledon, while very minimal expression was observed in the mature leaf. AtNGA3 expression was similar in the cotyledon and flower, and it gradually decreased in the mature leaf, followed by the rosette leaf. AtNGA4 showed the highest expression in the mature leaf, while its expression was significantly lower in the cotyledon and flower. Similar to AtNGA4, AtNGA-LIKE1 and AtNGA-LIKE3 expression was significantly higher in the mature leaf and drastically reduced in the cotyledons, flower, and rosette leaf (Figure 7).

The expression of SlNGA and SlNGA-LIKE genes in young tomato leaf was significantly higher than in the cotyledons, flower, and mature leaf (Figure 7). In the cotyledons and mature leaf, SlNGA and SlNGA-LIKE expression was consistently low, except for SlNGA-LIKE1-1 in the cotyledon and SlNGA-LIKE1-3 in the mature leaf, where the expression levels were significantly higher. SlNGA2 showed the highest expression in tomato flower, followed by SlNGA1 and SlNGA-LIKE1-3 (Figure 7).

Discussion
The NGA family belonging to the RAV subfamily of the B3 superfamily is relatively well-characterized in A. thaliana compared to other plant species [6,8,9,12,21]. In Arabidopsis, the NGA family is known to be involved in the development of gynoecium and the regulation of lateral organs. However, functional annotation of the NGA family is still an area of limited knowledge. In this study, we performed phylogenetic reconstruction of the NGA family using several dicots (Solanaceae) and monocots (Poaceae) (Figure 1).
The NGA phylogenetic tree has a notable feature: the NGA and NGA-LIKE sequences are very well distinguished, suggesting that these genes have evolved separately, with well-demarcated evolution in dicots and monocots (Figure 1). Furthermore, the NGA and NGA-LIKE sequences group by plant family, with members of the Brassicaceae, Solanaceae, and Poaceae phylogenetically well separated, suggesting that these sequences resulted from multiple duplication events from the most recent common ancestor. Based on the phylogenetic analysis, the number of sequences in each subfamily and the number of genes in each species vary. For example, in B. rapa, ten NGA and seven NGA-LIKE genes were present, while in B. vulgaris, only one NGA and one NGA-LIKE gene were identified. The highest numbers of genes were identified in T. aestivum, with 18 NGAs and eight NGA-LIKEs, followed by C. sativa, with 14 NGAs and seven NGA-LIKEs (Figures 1 and S2). These results indicate that the NGA genes have evolved through multiple rounds of duplication, leading to the expansion of the gene family. Furthermore, among the monocots, banana forms a distinct clade with respect to both the NGA and NGA-LIKE genes, suggesting that the genes within this species might have resulted from repeated segmental duplications (Figures 1 and S2).
Furthermore, the gene structure analysis gives a framework for understanding gene duplications and the functional relationships within gene families. The exon-intron structures of the NGA family in our analysis revealed that the numbers of exons and introns were conserved among subfamilies, indicating conserved function of the genes within subfamilies (Figure 2). The same trend was observed among the protein structures, where the NGA and NGA-LIKE proteins share some common motifs; however, a few motifs are present only within particular subfamilies or are unique to particular species. For example, motifs 9, 11, 13, 14, 15, 17, 18, and 19 were acquired during evolution in the NGA and NGA-LIKE proteins of Solanaceae species such as S. lycopersicum, S. tuberosum, C. annuum, and N. tabacum, suggesting novel functions of these proteins. Similarly, monocots such as O. sativa L. japonica, B. distachyon, and S. bicolor possess common motifs that are also present in Solanaceae members, suggesting a conserved function of the NGA proteins. Consistent with these results, the three-dimensional structure of the proteins was conserved in these species; however, minor alterations in the amino acid sequences may contribute to the functional variation among the NGA proteins (Figure 4). The presence of the RLFGV protein motif in the NGA proteins of A. thaliana, S. lycopersicum, and O. sativa implies that this motif plays an essential role in plant development (Figure S4). Consistent with this, it has been observed that AtNGA1, which possesses the RLFGV motif, directly binds to the promoter of AtNCED3, thereby inducing ABA biosynthesis in Arabidopsis in response to drought stress (Figure S3) [10]. This repressor motif has also been reported in the NGA protein sequences of N. benthamiana, Amborella trichopoda, and Aquilegia caerulea [7]. In addition, this repressor motif is reported to be involved in regulating heat stress in the Heat shock factor B family [7,24,25,27].
These findings indicate the significance of NGA proteins in many aspects of plant development, which is yet to be explored.
The analysis of cis-elements in the promoter region of a gene provides clues about its transcriptional regulation. We therefore searched the upstream regions of the NGA genes for cis-regulatory elements. Light-responsive elements are present in the promoters of the genes, suggesting that light plays an important role in regulating these genes (Figure 3). Almost half of the genes carry elements associated with stress-related responses such as drought inducibility and defense, suggesting that these genes play a role in the stress response. The NGA promoters also possess hormone-responsive elements for ABA, GA, MeJA, SA, and auxin. ABA and SA are known to participate in plant stress responses, and the cis-element analysis indicates that NGA genes might be involved in the defense response [36,37,38]. The presence of auxin-responsive elements in the promoters of the NGA genes is an interesting feature. As discussed above, the NGA family regulates the AtYUC2 and AtYUC4 genes involved in auxin biosynthesis, especially in carpel development [6,12,16]. However, a direct link between auxin-responsive elements and NGA regulation is yet to be discovered. Furthermore, the promoters of some of these genes contain elements that implicate them in gibberellin signaling and methyl jasmonate pathways. In addition, the phytohormone ABA seems to play a major part in carpel development [39,40], whereas the roles of other hormones such as GA, SA, and MeJA in NGA regulation are still not understood. Among the other cis-elements, those related to anaerobic induction and meristem development seem to be prominently involved in the regulation of NGA genes.
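As a minimal sketch of how a promoter sequence can be scanned for short cis-element cores on both strands, the following Python example may be helpful. It is not the tool used in this study; the function names, the toy promoter, and the two motif strings (an ABRE-like core and an auxin-responsive element) are illustrative assumptions rather than the element definitions applied to the NGA promoters.

```python
# Illustrative sketch only: exact-match scan for short cis-element cores.
# The motif strings and the toy promoter are assumptions for demonstration.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(promoter, motif):
    """Return sorted (start, strand) hits of motif in promoter.

    Positions are 0-based starts on the forward strand. A minus-strand hit
    is reported where the reverse complement of the motif matches.
    Palindromic motifs are scanned once to avoid double counting.
    """
    promoter = promoter.upper()
    motif = motif.upper()
    patterns = {"+": motif}
    rc = reverse_complement(motif)
    if rc != motif:
        patterns["-"] = rc
    hits = []
    for strand, pattern in patterns.items():
        start = promoter.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = promoter.find(pattern, start + 1)  # allow overlaps
    return sorted(hits)


# Hypothetical motif table and promoter fragment.
ILLUSTRATIVE_MOTIFS = {"ABRE_core": "ACGTG", "AuxRE": "TGTCTC"}
toy_promoter = "TTACGTGAACACGTTT"

hits = {name: find_motif(toy_promoter, m) for name, m in ILLUSTRATIVE_MOTIFS.items()}
print(hits)
```

Exact string matching is the simplest possible approach; dedicated cis-element databases typically use curated consensus sequences or position weight matrices, which this sketch does not attempt to reproduce.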
Gene duplications, predominantly tandem and segmental duplication events, are the main source of gene family evolution [41]. The synteny analysis of the NGA genes of Arabidopsis with tomato, potato, and other species such as P. trichocarpa, M. truncatula, and O. sativa L. japonica showed that most of them have evolved through segmental duplications. However, these duplications are followed by the diversification of gene functions during evolution. In addition, tandem duplications are not uncommon, as can be seen in the phylogeny, where such genes or proteins cluster together, reflecting their similarity in sequence and function [42]. Nucleotide variations are the key to evolution within gene families. The Ka/Ks ratio compares the rates of non-synonymous (Ka) and synonymous (Ks) substitutions acquired in the gene sequences during evolution and thereby measures the selection pressure acting on the nucleotide variations within the genes [43,44]. The Ka/Ks ratio was assessed for the NGA genes. Most of the genes have evolved under negative (purifying) selection pressure, which removes random deleterious mutations, whereas in S. bicolor, each gene pair with SbNGA-LIKE1-2 showed positive Darwinian selection. Our study revealed that the NGA family has evolved under stringent selection pressure, resulting in the conservation of the gene family.
GO analysis revealed the possible roles of NGA genes in Arabidopsis and tomato. Being derived from the B3 superfamily, NGA is primarily involved in gene regulation through sequence-specific DNA binding activity (including binding to cis-elements) and is predicted to be localized in the nucleus (Figure 6; Table 5). As evident from the previous literature on the NGA family in A. thaliana and S. lycopersicum, the AtNGAs are involved in regulating leaf morphogenesis and flower development [6,9,12,21,45]. In addition, AtNGA-LIKE1 is thought to be responsive to karrikins, indicating that this gene has a role in strigolactone signaling. Other functions of NGA-LIKE genes include negative regulation of transcription, seed growth regulation, leaf shaping, and meristem maintenance. Based on the Gene Ontology results, the NGA gene family has potential roles in plant growth, development, and defense. Consistent with the Gene Ontology results, the gene expression of the NGA family in Arabidopsis and tomato reflected the importance of the genes in regulating leaf morphogenesis and flower development (Figure 7). However, the localization of NGA proteins would provide better evidence for protein expression in different cell types than gene expression studies alone. These results correlate with the expression of NGA genes in Arabidopsis and B. rapa, which affects the development of lateral organs and flowers [8,9,11,21,46].

Multiple Sequence Alignment and Phylogenetic Tree Construction
The 207 sequences were aligned with MAFFT using default parameters (https://mafft.cbrc.jp/alignment/server/; accessed on 19 August 2021) [48]. The aligned sequences were used to construct the phylogenetic tree using the neighbor-joining (NJ) method with 1000 bootstrap replicates in MEGA 11 [26].
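The NJ method operates on a matrix of pairwise distances computed from the alignment. As a minimal sketch of that first step, the Python example below computes a p-distance (the proportion of differing sites) for each pair of aligned sequences. The substitution model used in MEGA 11 is not stated here, so the p-distance, the function names, and the toy alignment are assumptions chosen only for illustration.

```python
# Illustrative sketch only: pairwise p-distances from an aligned set of
# sequences. The toy alignment below is hypothetical, not NGA data.
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # pairwise deletion of gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment):
    """Return {(name_i, name_j): p-distance} for every unordered pair of names."""
    names = sorted(alignment)
    return {(x, y): p_distance(alignment[x], alignment[y])
            for x, y in combinations(names, 2)}


toy_alignment = {
    "seqA": "MKRLF-GV",
    "seqB": "MKRLFAGV",
    "seqC": "MRRLY-GV",
}

for pair, dist in distance_matrix(toy_alignment).items():
    print(pair, round(dist, 3))
```

A tree-building program would then join the closest pairs iteratively and repeat the whole procedure on resampled alignment columns to obtain bootstrap support; those steps are left to dedicated software.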

Exon-Intron Structure and Protein Motif and Structure Analysis
The Gene Structure Display Server 2.0 tool was used to illustrate the location and length of the exons and introns within the respective genes (http://gsds.gao-lab.org/; accessed on 13 September 2021) [49]. The protein motifs were predicted using MEME suite version 5.4.1 (https://meme-suite.org/meme/tools/meme; accessed on 16 September 2021) [50]. The motif width was restricted to between 6 and 20 amino acids, and a maximum of 20 motifs was set.

Three-Dimensional (3D) Structure of NGA Proteins
The 3D structure of the full-length proteins was generated using I-TASSER (https://zhanggroup.org/I-TASSER/; accessed on 18 January 2022). The best threading template close to the target protein was chosen based on the C-score and TM-score [35].

Chromosomal Location and Gene Duplication and Ontology Analysis
The complete genome, gene, and protein sequences were downloaded from the respective databases for the synteny analysis. The Multiple Collinearity Scan Toolkit (MCScanX) was used to scan the genomes to identify duplicated gene pairs [53]. Finally, the orthologous gene pairs were identified using the Dual Synteny Plotter in TBtools (https://github.com/CJ-Chen/TBtools; accessed on 18 January 2022) [54]. The non-synonymous and synonymous substitution rates (Ka and Ks) were assessed using the Ka/Ks calculation tool (http://services.cbu.uib.no/tools/kaks; accessed on 22 February 2022) [55].
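The interpretation of Ka/Ks values follows a simple rule: a ratio below 1 suggests purifying (negative) selection, a ratio of about 1 suggests neutral evolution, and a ratio above 1 suggests positive selection. A minimal Python sketch of this classification is given below; the function names and the Ka and Ks values are hypothetical and are not results from this study.

```python
# Illustrative sketch only: classifying duplicated gene pairs by Ka/Ks.
# The gene-pair names and the Ka/Ks values are hypothetical.


def ka_ks_ratio(ka, ks):
    """Return Ka/Ks; Ks must be positive for the ratio to be defined."""
    if ks <= 0:
        raise ValueError("Ks must be greater than zero")
    return ka / ks


def classify_selection(ka, ks, tolerance=1e-9):
    """Label a gene pair as purifying, neutral, or positive selection."""
    ratio = ka_ks_ratio(ka, ks)
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


hypothetical_pairs = {
    "pairA": (0.12, 0.48),  # Ka, Ks
    "pairB": (0.90, 0.30),
}

for name, (ka, ks) in hypothetical_pairs.items():
    print(name, round(ka_ks_ratio(ka, ks), 2), classify_selection(ka, ks))
```

In practice, ratios close to 1 are usually assessed with a statistical test rather than a fixed tolerance, which this sketch does not attempt.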

Plant Material
The seeds were surface-sterilized using chlorine gas for four hours and plated on half-strength Murashige and Skoog (MS) medium with 1% sucrose. After a 3-day stratification, seedlings were transferred to normal growth conditions (150 µmol/m²/s, 16/8 h photoperiod, 21 °C, and 60% relative humidity). In Arabidopsis, cotyledons, rosette leaf, mature leaf, and flowers were collected from 7, 21, 28, and 32 day old plants, respectively. In tomato, cotyledons, developing young leaves (meristem), and developed mature leaves (second node from the ground) were collected from 12, 28, and 35 day old plants, respectively. The flowers were obtained from the first set of flowers in both Arabidopsis and tomato.

RNA Extraction and Quantitative (q)PCR Analysis
The RNA from the respective samples was extracted using the Trizol method. The contaminating DNA was then removed from the extracted RNA using DNase as per the manufacturer's protocol. The RNA integrity was assessed using the RNA 6000 Pico Kit and Agilent 2100 Bioanalyzer. Finally, 1 µg of RNA was reverse transcribed to cDNA using the iScript™ cDNA Synthesis Kit (BIO-RAD, CZ). This cDNA was used for qPCR analysis. Quantitative PCR was performed using the SYBR Green PCR Master Mix (Agilent Technologies, Santa Clara, CA, USA) on a 7300 Fast Real-Time PCR system (Applied Biosystems, CA, USA). The primer sequences used for Real-Time PCR were designed using Primer3 software (Table S3). Ubiquitin and RNaseH were used as the internal controls for Arabidopsis, while actin and ubiquitin were used for tomato. The relative expression was calculated using the formula 2^(−ΔΔCt), where ΔCt = (Ct value of target gene) − (Ct value of the internal control, e.g., actin) and ΔΔCt = ΔCt of accession − ΔCt of reference.
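The relative expression calculation can be illustrated with a short worked example. The Python sketch below implements the 2^(−ΔΔCt) formula under the usual assumption of roughly 100% amplification efficiency for both target and internal control; the function name and all Ct values are hypothetical and are not measurements from this study.

```python
# Illustrative sketch only: relative expression by the 2^(-ddCt) method.
# All Ct values below are hypothetical.


def relative_expression(ct_target_sample, ct_control_sample,
                        ct_target_reference, ct_control_reference):
    """Return the fold change of a target gene relative to a reference sample.

    dCt  = Ct(target gene) - Ct(internal control), computed per sample
    ddCt = dCt(sample of interest) - dCt(reference sample)
    fold = 2 ** (-ddCt)
    """
    delta_ct_sample = ct_target_sample - ct_control_sample
    delta_ct_reference = ct_target_reference - ct_control_reference
    delta_delta_ct = delta_ct_sample - delta_ct_reference
    return 2.0 ** (-delta_delta_ct)


# Hypothetical example: the target amplifies two cycles earlier (relative to
# the internal control) in the sample of interest than in the reference.
fold = relative_expression(
    ct_target_sample=22.0, ct_control_sample=18.0,        # dCt = 4
    ct_target_reference=24.0, ct_control_reference=18.0,  # dCt = 6
)
print(fold)  # ddCt = -2, so the fold change is 2 ** 2 = 4.0
```

When two internal controls are used, one common choice is to average their Ct values (or use their geometric mean on the expression scale) before computing ΔCt; how the two controls were combined is not specified here.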

Conclusions
The comprehensive analysis of the NGA gene family identified 207 sequences that were classified into different gene families according to species. The identified genes from the selected dicot species (Arabidopsis, tomato, potato, capsicum, and tobacco) and monocot species (rice, sorghum, and Brachypodium) were characterized for gene structure, protein motifs, the 3D structure of proteins, gene duplications, Gene Ontology, and expression. The gene structure and protein 3D structure revealed the conserved nature of the gene families across different species. Furthermore, Gene Ontology studies implicated the gene families in various aspects of plant development and in stress or defense responses. This is in concordance with the expression of the NGA genes, which suggests that they are mainly involved in the regulation of lateral organs, such as the development of leaves and flowers. Therefore, detailed characterization of NGA genes in different species is required to further understand the role of the gene family in various plant developmental processes.