Article

Computational Identification and Characterization of Glycosyltransferase 47 (GT47) Gene Family in Sorghum bicolor and Their Expression Profile in Internode Tissues Based on RNA-Seq Data

by Rehana Rehana 1, Muhammad Anwar 2,*, Sarmad Frogh Arshad 3,* and Muhammad Asif Saleem 4

1 Institute of Plant Breeding & Biotechnology (IPBB), Muhammad Nawaz Shareef University of Agriculture, Multan 60000, Pakistan
2 School of Tropical Agriculture and Forestry (School of Agricultural and Rural Affairs, School of Rural Revitalization), Hainan University, Haikou 570228, China
3 Department of Biochemistry and Biotechnology, Muhammad Nawaz Shareef University of Agriculture, Multan 60000, Pakistan
4 Department of Plant Breeding & Genetics, Bahauddin Zakariya University, Multan 60000, Pakistan
* Authors to whom correspondence should be addressed.
Processes 2025, 13(3), 628; https://doi.org/10.3390/pr13030628
Submission received: 30 December 2024 / Revised: 16 February 2025 / Accepted: 19 February 2025 / Published: 22 February 2025
(This article belongs to the Section Biological Processes and Systems)

Abstract

Sorghum is an essential crop for biofuel. Many glycosyltransferase (GT) families, including GT47, are involved in polysaccharide production. However, a comprehensive study of the GT47 gene family in sorghum is still needed. The GT47 family contributes to the synthesis of xylose, pectin, and xyloglucan and plays an essential role in shaping the plant cell wall. In this study, we identified the GT47 gene family in sorghum and analyzed its phylogeny, physicochemical properties, subcellular localization, protein–protein interaction network, conserved motifs, gene structure, secondary structure, functional domains, gene duplication, cis-acting elements, sequence logos, and gene expression profiles based on RNA-Seq data. We identified thirty-one members of the GT47 gene family, which the phylogenetic analysis grouped into three distinct clusters. According to their physicochemical properties, all GT47 proteins were hydrophilic, and their molecular weights ranged from 22.7 to 88.6 kDa. Motif and conserved domain analysis identified three essential motifs, emphasizing structural conservation. Subcellular localization predictions placed the proteins in several cellular compartments, consistent with varied functional roles. Gene structure analysis showed significant variation in intron–exon organization, and promoter analysis indicated responsiveness to phytohormones such as ABA. RNA sequencing revealed that several GT47 genes were highly expressed in internodes, which was linked to biomass accumulation, cell wall biosynthesis, and stem elongation. Analysis of protein–protein interaction networks and cis-elements supported involvement in stress adaptation and growth regulation. These results contribute to a better understanding of the functional and evolutionary significance of the GT47 gene family in sorghum.

1. Introduction

Glycosyltransferases (EC 2.4.x.y) are enzymes that transfer activated sugars from a donor molecule to an acceptor molecule, forming glycosidic bonds [1,2]. The Carbohydrate-Active Enzymes (CAZy) database contains information on 105 different GT subfamilies based on expression and sequence similarity [3]. Multiple GT subfamilies have been discovered in a large number of plants. Studies in plant species, including Arabidopsis thaliana, demonstrate that glycosyltransferases (GTs) such as GT8, GT37, GT43, GT34, GT2, and GT47 are involved in the synthesis of carbohydrates [4,5,6]. The β-glucuronyltransferase domain (PF03016) has been used to identify the GT47 family in plants [7,8,9]. The GT47 classification was established on the β-glucuronyltransferase domain of animal exostosins. Exostosins have an additional domain of the GT64 family with 1,4-N-acetylglucosaminyltransferase activity. The functions of several GT47 genes have been verified.
Multiple families of GT genes (GT2s, GT8s, GT31s, GT34s, GT37s, and GT47s) are responsible for the synthesis of the three main components of cell walls (hemicellulose, cellulose, and pectin) [7]. Enzymes encoded by the GT47 genes contribute to the biogenesis of several essential components of the plant cell wall, including pectin, xyloglucans, and xylan. MUR3, which encodes a xyloglucan galactosyltransferase, is an example of an Arabidopsis gene belonging to the GT47 family, a large and diverse group of type-II membrane proteins. The mur3 mutant of Arabidopsis thaliana exhibits a heavily modified xyloglucan structure [4,10]. Both Arabidopsis (a dicot) and rice (a monocot) have GT47 family members [11].
Additionally, GT47s play an important role in responses to biotic and abiotic stresses and in seed development in grasses such as maize [12]. GT47 proteins are responsible for the synthesis of xylan and xyloglucan in plants. The GT47 gene family may also play a role in the response of maize to drought stress [13]. Different GT47 genes have distinct biological functions. In rice, OsGT47A, which is similar to AtIRX10, is involved in xylan synthesis and has the same functions as AtIRX10 [14]. The genetic functions of GT47 in cotton have been explored for fiber production [7]. A recent biochemical study identified a relationship between GAX biosynthesis and the wheat GT43, GT47, and GT75 families [15]. The primary function of GT47 is xylan synthesis in the secondary cell wall of the plant, and it also contributes to biomass development [16].
The grain crop sorghum (Sorghum bicolor L. Moench) is well known for its ability to block the nitrification pathway through a plant-mediated process called biological nitrification inhibition (BNI). Employing plants that exhibit BNI is an economically and ecologically responsible way to decrease nitrogen losses, such as nitrous oxide (N2O) gas emissions and nitrate (NO3) leaching [17].
Sorghum (Sorghum bicolor) is one of the most widely grown food crops and has extraordinary tolerance to poor soil conditions [18,19,20]. Sorghum’s genome is smaller (730 Mb) than the genomes of lignocellulosic biomass crops like switchgrass and Miscanthus [21]. The main components of sorghum biomass are cellulose (24.3–38.2%), non-cellulosic polysaccharides (12.2–22.1%), lignin (17.2–22.1%), and starch (1.2–38.2%) [22]. Multiple families of glycosyltransferases (GTs), including GT47, are involved in the synthesis of non-cellulosic polysaccharides, which contributes to biomass and biofuel production. However, the GT47 gene family has not been appropriately investigated in sorghum [4]. Understanding wood formation fully requires familiarity with the GT47 gene family, which is essential in xylan biosynthesis [23]. Xylan exists in both the primary and secondary cell walls of sorghum (S. bicolor) and is one of the factors that contributes to biomass recalcitrance in both tissues. Knowing how to maximize the contribution of sorghum’s GT47 gene family to xylan biosynthesis is essential for the crop’s high biomass yield [24].
Even though the sorghum genome is already available, functional research on sorghum has been limited by the lack of a high-efficiency genetic transformation technology [25]. Here, we used computational analysis to profile the GT47 gene family in sorghum.

2. Materials and Methods

2.1. Identification of Gene Family in Sorghum

The GT47 protein sequences of A. thaliana, barley (Hordeum vulgare), wheat (Triticum aestivum), rye (Secale cereale), and rice (Oryza sativa) were downloaded from the Phytozome database (https://phytozome-next.jgi.doe.gov/, accessed on 7 March 2022) [26]. The hidden Markov model (HMM) profile of the GT47 conserved domain (PF03016) was obtained from the Pfam database [7,27]. The HMM file was used to query the GT47s in sorghum (S. bicolor), using the National Center for Biotechnology Information (NCBI) database, with an E-value threshold of 1.0 × 10⁻⁵ [28].
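The E-value cutoff described above can be illustrated with a minimal sketch. It assumes the search hits have already been parsed into (gene ID, E-value) pairs; the gene IDs and E-values below are hypothetical placeholders, not results from this study, and the function name `filter_hits` is introduced only for illustration.

```python
from typing import Iterable, List, Tuple

# Threshold stated in the text (1.0 x 10^-5).
E_VALUE_CUTOFF = 1.0e-5


def filter_hits(hits: Iterable[Tuple[str, float]],
                cutoff: float = E_VALUE_CUTOFF) -> List[str]:
    """Return IDs of hits whose E-value is at or below the cutoff, best hit first."""
    kept = [(gene_id, e_value) for gene_id, e_value in hits if e_value <= cutoff]
    kept.sort(key=lambda pair: pair[1])  # smaller E-value = stronger hit
    return [gene_id for gene_id, _ in kept]


# Hypothetical hits for illustration only (not real sorghum gene IDs).
example_hits = [
    ("gene_A", 3.2e-40),
    ("gene_B", 0.02),      # fails the cutoff
    ("gene_C", 1.0e-5),    # exactly at the cutoff, kept
    ("gene_D", 4.1e-12),
]
selected = filter_hits(example_hits)
```

Whether a hit exactly at the threshold is kept is a design choice; the sketch keeps it, which may differ from the behavior of the search tool actually used.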

2.2. Sequence Alignment and Phylogenetic Analysis

ClustalX 2.0 [29] was used for multiple sequence alignment of GT47 protein sequences with default parameters. The alignment result was used to construct a phylogenetic tree with the neighbor-joining method using MEGA 5.0 software (https://mega.software.informer.com/5.0/, accessed on 10 March 2022) [23,30]. Bootstrap resampling (100 replicates) was used to assess the reliability of interior branches. The online tool iTOL (https://itol.embl.de/, accessed on 10 March 2022) was used to display the phylogenetic tree [31]. To improve the reliability of the phylogenetic tree, SplitsTree4 [32] was used to compute a phylogenetic network.

2.3. Physicochemical Properties Prediction

The online ExPASy ProtParam tool (https://web.expasy.org/protparam/, accessed on 15 March 2022) was used to compute the chromosome number, start and end points, number of introns and exons, protein molecular weight (MW), positively and negatively charged residues, grand average of hydropathicity (GRAVY), total number of amino acids, average residue weight (ARW), isoelectric point (pI), and aliphatic index (AI) for each GT47 protein sequence [33].

2.4. Subcellular Localization Analysis

The subcellular localization was predicted using the WoLF PSORT tool (https://wolfpsort.hgc.jp/, accessed on 25 March 2022) [34]. The heatmap of subcellular localization was constructed using TBtools v0.665 software [35].

2.5. Protein–Protein Interaction Network Analysis

The GT47 protein interaction network was examined by using the STRING online server (https://string-db.org/, accessed on 7 April 2022) [36].

2.6. Gene Structure and Motif Analysis

The GSDS website (http://gsds.cbi.pku.edu.cn/, accessed on 15 April 2022) was used to analyze the exon and intron structure and draw gene structures [37]. The MEME tool (https://bio.tools/meme/, accessed on 15 April 2022) was used to carry out the motif analysis based on the protein sequences of the thirty-one GT47 genes [38].

2.7. Secondary Structure Prediction

The Prabi tool (https://npsa-prabi.ibcp.fr/, accessed on 25 April 2022) was used to identify the secondary structure of GT47 protein sequences [39].

2.8. Prediction of Functional Domain

The Conserved Domain Database (CDD) website (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi/, accessed on 30 April 2022) was used to predict the functional domains of the GT47 gene family in sorghum [40].

2.9. Gene Duplication Analysis

TBtools software was used to estimate the nonsynonymous (Ka) and synonymous (Ks) substitution rates of duplicated GT47 gene pairs. The divergence time (T) was estimated using the formula T = Ks/2λ, where λ is 6.5 × 10⁻⁹ in sorghum [30,41].
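The divergence-time formula can be sketched as a small helper. This is an illustrative calculation only: it assumes that λ is expressed as substitutions per synonymous site per year (the unit is not stated in the text), and the function name and the Ks value in the example are hypothetical.

```python
# λ for sorghum as given in the text; the per-year unit is an assumption.
LAMBDA_SORGHUM = 6.5e-9


def divergence_time_mya(ks: float, lam: float = LAMBDA_SORGHUM) -> float:
    """Estimate divergence time T = Ks / (2 * lambda), returned in millions of years."""
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if lam <= 0:
        raise ValueError("lambda must be positive")
    years = ks / (2.0 * lam)
    return years / 1e6


# Hypothetical Ks value for illustration (not taken from Table 7).
example_t = divergence_time_mya(0.13)
```

With these assumptions, a Ks of 0.13 corresponds to roughly 10 million years.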

2.10. Cis-Acting Regulatory Element Analysis

To identify cis-acting regulatory elements (CAREs), the 2000 bp upstream sequences of GT47 genes were downloaded from the Phytozome database and analyzed using the PlantCARE online server (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed on 30 May 2022) [42]. The number of occurrences of each CARE motif was counted for the GT47 genes, and the most commonly occurring CAREs were visualized in TBtools [43].
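The counting step can be sketched as follows. This is a minimal illustration that assumes the server output has already been reduced to (gene ID, element category) pairs; the gene IDs and the records below are hypothetical, and the category labels simply reuse the hormone abbreviations from this article.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_cares(records: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, Counter], Counter]:
    """Count element occurrences per gene and across all genes."""
    per_gene: Dict[str, Counter] = {}
    for gene_id, element in records:
        per_gene.setdefault(gene_id, Counter())[element] += 1
    total: Counter = Counter()
    for counts in per_gene.values():
        total.update(counts)
    return per_gene, total


# Hypothetical records for illustration only.
example_records = [
    ("gene_A", "ABA"), ("gene_A", "ABA"), ("gene_A", "GA"),
    ("gene_B", "ABA"), ("gene_B", "SA"),
    ("gene_C", "IAA"),
]
per_gene_counts, total_counts = count_cares(example_records)
most_common_element = total_counts.most_common(1)[0][0]
```

The per-gene table is the kind of matrix that a heatmap tool can display, while the overall totals identify the most frequent element categories.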

2.11. Sequence Logo Analysis

The WebLogo 2.8 online tool (https://weblogo.berkeley.edu/logo.cgi, accessed on 1 June 2022) was utilized to analyze sequence logos of GT47 protein sequences [44].

2.12. Expression Profiles of RNA Sequence of Internode of GT47 Gene Family in Sorghum

To investigate the expression patterns of the GT47 gene family in sorghum internodes, the raw transcriptome data (accession number GSE98817) from internodes of Sorghum bicolor were downloaded from the NCBI-GEO database (https://www.ncbi.nlm.nih.gov/geo/, accessed on 7 July 2022) [37]. All RNA-seq data were first quality assessed using FastQC, followed by quality-control cleaning. The relative expression levels (TPM values) of the GT47 gene family were then obtained by quantifying the transcriptome data against the sorghum genome with Kallisto 0.46.2 software [45]. Finally, the heatmap tool of TBtools was used for visualization.
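The TPM (transcripts per million) unit mentioned above can be illustrated with a short sketch of its standard definition. The quantification software reports TPM directly, so this is not the pipeline used in the study; the counts and effective lengths below are hypothetical.

```python
from typing import List, Sequence


def tpm(counts: Sequence[float], effective_lengths: Sequence[float]) -> List[float]:
    """Convert estimated read counts to TPM using effective transcript lengths."""
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have the same length")
    if any(length <= 0 for length in effective_lengths):
        raise ValueError("effective lengths must be positive")
    # Length-normalized rate for each transcript.
    rates = [count / length for count, length in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Scale so that all values sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Hypothetical counts and effective lengths (bp) for three transcripts.
example_tpm = tpm([100.0, 200.0, 0.0], [1000.0, 1000.0, 500.0])
```

Because TPM values always sum to one million within a sample, they are convenient for comparing the relative abundance of transcripts within that sample.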

3. Results and Discussion

3.1. GT47 Gene Family Identification in Sorghum bicolor

According to the NCBI database search results, 39 candidate genes were obtained for the GT47 gene family in sorghum (V3.1.1). These candidate genes were then subjected to several levels of filtering, the most important of which were 90% identity with the query sequence, the presence of the GT47 domain, and significant E-values. This resulted in 31 putative GT47 protein sequences in sorghum (Table 1). To identify the GT47 protein sequences in A. thaliana, barley (Hordeum vulgare), wheat (Triticum aestivum), Zea mays, and rice (Oryza sativa), the GT47 domain (PF03016) was used as a query to perform a local BLAST search in the Phytozome v13 database. As a result, 25, 24, 27, and 21 GT47 genes in Arabidopsis thaliana, wheat (Triticum aestivum), Zea mays, and barley (Hordeum vulgare), and rice (Oryza sativa) were obtained (Table 2).

3.2. Phylogenetic Tree Analysis

To understand the evolutionary relationships between sorghum GT47 genes and GT47 genes in other species, we first constructed a neighbor-joining phylogenetic tree of the 31 GT47 protein sequences in sorghum (Figure 1a). Next, we constructed an ML phylogenetic tree using GT47 protein sequences from Arabidopsis thaliana, barley (Hordeum vulgare), wheat (Triticum aestivum), Zea mays, and rice (Oryza sativa) (Figure 1b). The phylogenetic trees showed that the 31 sorghum genes were clustered into three groups (Group I to Group III) (Figure 1a) and that 128 GT47 genes were divided into seven groups (Group I–VII) (Figure 1b). Among them, Group VI was the largest subgroup, containing 31 GT47 protein sequences. In contrast, Group II (21) and Group VII (11) had the lowest numbers of GT47 genes. Notably, the sorghum GT47 genes clustered separately as Group VI, suggesting that these GT47 genes may be specific to the evolutionary development of sorghum.

3.3. Physicochemical Properties of the GT47 Gene Family

In this study, we characterized the protein sequences and physicochemical properties of GT47 family members. The interaction between water and proteins can be estimated using the GRAVY value. The GRAVY score of a peptide is calculated by dividing the sum of the hydropathy values of all amino acids by the protein’s length. GRAVY thus measures and classifies protein hydrophobicity: a positive GRAVY score suggests a hydrophobic protein, and a negative value indicates a hydrophilic protein [46]. Accordingly, the more negative the GRAVY value, the more soluble the protein is expected to be. All GT47 protein sequences in the current study had a negative GRAVY score (Table 3), indicating that every member is hydrophilic and likely to be soluble.
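The GRAVY calculation described above can be sketched in a few lines. The sketch assumes the Kyte-Doolittle hydropathy scale, which is the scale conventionally used for GRAVY; the peptide in the example is hypothetical and is not a GT47 sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids (assumed scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Sum of residue hydropathy values divided by sequence length."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = sorted(set(seq) - set(KYTE_DOOLITTLE))
    if unknown:
        raise ValueError(f"unsupported residues: {unknown}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydropathy_class(score: float) -> str:
    """Positive scores suggest hydrophobic proteins, negative scores hydrophilic ones."""
    if score > 0:
        return "hydrophobic"
    if score < 0:
        return "hydrophilic"
    return "neutral"


# Hypothetical peptide for illustration only.
example_score = gravy("MKRDE")
example_class = hydropathy_class(example_score)
```

A sequence dominated by charged residues, as in this example, yields a negative score and is classified as hydrophilic.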
The molecular weight (MW) ranged from 22.70 to 88.60 kDa, and the length of the encoded GT47 proteins ranged from 206 to 783 amino acids (Table 3). The isoelectric point (pI) is the pH value at which the molecule carries no net electrical charge. It is an essential property of any amino acid because every amino acid has at least two acid–base (titratable) groups [47]. The isoelectric point of the GT47 proteins ranged from 5.67 to 10.06 (Table 3). A pI below 7 indicates an acidic protein, whereas a pI above 7 indicates an alkaline protein. The aliphatic index of a protein is defined from the relative content of aliphatic side chains (alanine, valine, isoleucine, and leucine) [48]. The aliphatic index of the GT47 proteins ranged from 67.47 to 92.59. A high aliphatic index suggests that a protein may be stable over a wide range of temperatures. Other physical and chemical parameters of the GT47 protein members, such as chromosome number, start point, end point, number of introns and exons, total positively and negatively charged residues, and average residue weight (ARW), are shown in Table 3.
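As an illustration of the aliphatic index, the following sketch uses the commonly cited formula AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The coefficients are assumed to match those applied by the tool used in this study, and the example sequence is hypothetical.

```python
# Coefficients of the commonly used aliphatic index formula (assumed here).
VALINE_WEIGHT = 2.9
ISOLEUCINE_LEUCINE_WEIGHT = 3.9


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    # Mole percent of each aliphatic residue.
    x_ala = 100.0 * seq.count("A") / n
    x_val = 100.0 * seq.count("V") / n
    x_ile = 100.0 * seq.count("I") / n
    x_leu = 100.0 * seq.count("L") / n
    return x_ala + VALINE_WEIGHT * x_val + ISOLEUCINE_LEUCINE_WEIGHT * (x_ile + x_leu)


# Hypothetical peptide for illustration only.
example_ai = aliphatic_index("AG")
```

Residues outside the four aliphatic amino acids contribute only to the sequence length, so they lower the index without raising it.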

3.4. Subcellular Localization Analysis

The predicted subcellular localization results showed that the GT47 proteins were located in the vacuole (Vacuo); nine in the cytoplasm/cytosol, chloroplast (Chlo), and extracellular space (Extr); twenty-two in the mitochondria (Mito); seventeen in the plasma membrane (Plas); fifteen in the nucleus and endoplasmic reticulum (E.R); and twelve in the Golgi and peroxisome (Pero) (Table 4). TBtools software was then used to construct a heatmap of subcellular localization for the GT47 genes (Figure 2). The heatmap shows the predicted localization signal of each GT47 protein across compartments (e.g., Chlo, Mito, Plas). Green, yellow–orange, and gray represent high (~20.00), intermediate (~5.00–17.00), and low or no (<2.00) signal, respectively. For example, SbGT47-01 shows a high signal in Chlo, whereas SbGT47-23 shows a moderate signal in other compartments.

3.5. Protein–Protein Interaction Network Analysis

A network was constructed using the STRING database to investigate protein–protein interactions between GT47 proteins (Figure 3). In this network, all protein interactions with a minimum interaction score of 0.4 are displayed, and an additional coefficient (0.3) was applied to improve the precision of the interactions. The resulting network consisted of 41 nodes and 40 edges representing the relationships between the GT47 genes in sorghum (S. bicolor) (Figure 3). The cluster was composed of closely connected protein interactions. Edges ending in a ball indicate an unspecified effect, and arrows indicate a positive effect. The different node colors show the functional enrichment of the network (Figure 3).

3.6. Prediction of Functional Domain Analysis

The conserved domains were predicted using the Conserved Domain Database (CDD). The following domains were present in the 31 GT47 genes: exostosin, exostosin superfamily, TyeA, TyeA superfamily, EGF_2, and EGF_CA superfamily (Figure 4). The exostosin superfamily domain was present in all members of the GT47 gene family. The TyeA superfamily domain was present in only one gene (Sobic.001G506900). TyeA appears to be involved in calcium-responsive regulation of the delivery of type III effectors, which are important regulators of growth and development. The IPR000742 domain, also known as the EGF-like domain because it was first identified in the epidermal growth factor, is an evolutionarily conserved protein domain. It is a small domain of about 30–40 amino acid residues that is mostly present in animal proteins. Calcium-binding sites have been discovered at the N-terminus of specific EGF-like domains, and many of these proteins require calcium for proper function. Calcium binding may be important for a variety of protein–protein interactions. The six conserved core cysteine residues form three disulfide bridges. Tandem repeats of EGF_CA have been identified. The only gene encoding the EGF_CA subfamily was Sobic.001G486900 (Figure 4).

3.7. Gene Structure and Motif Analysis

The evolution of the sorghum GT47 gene family was further explored by studying the intron–exon structure of the GT47 genes. The number of introns in the GT47 gene family ranged from 1 to 13 (Figure 5). Changes in exon and intron structure are among the most important contributors to variation between gene families and plant varieties. Genes have different functions and expression patterns because they have different structures [49,50]. In Figure 5, yellow boxes and black lines represent exons and introns, respectively. We also identified the conserved motifs of the GT47 genes using the MEME (Multiple EM for Motif Elicitation) online server. Three conserved motifs were identified in the 31 GT47 genes, and all three were highly conserved across the members of the GT47 gene family (Figure 6). The sequence, width, and description of motifs 1–3 are presented in Table 5. In Figure 6, the expanded legend identifies the colors green (Motif 1: highly conserved domain), yellow (Motif 2: regulatory function), and pink (Motif 3: specific conserved sequence), and the labeled axis shows motif locations along the gene length.

3.8. Secondary Structure Analysis of the GT47 Proteins

The secondary structure of GT47 proteins consists of alpha helices, extended strands, beta turns, and random coils. The alpha helix content ranged from 25% to 41.26%, random coils from 40.789% to 59.51%, extended strands from 9.79% to 17.89%, and beta turns from 2.91% to 5.84% (Table 6). The functional variety of the GT47 proteins is probably reflected in this structural diversity. Variations in the proportions of alpha helices, extended strands, beta turns, and random coils may indicate differences in enzymatic activity, stability, and flexibility. Proteins with higher alpha-helix and extended-strand content may exhibit greater structural stability, which could improve their catalytic efficacy.

3.9. Gene Duplication Analysis

In the present study, the divergence time (T) was determined as T = Ks/2λ, where λ is 6.5 × 10⁻⁹ (Table 7). Synonymous (Ks) and nonsynonymous (Ka) substitution rates were calculated for ten pairs of duplicated genes among the 31 GT47 genes, using the Ka/Ks calculator in TBtools, to analyze their rate of molecular evolution. The Ka/Ks ratio provides insight into the evolutionary pressure acting on a gene: a duplicated gene pair is considered to be under neutral (Ka/Ks = 1), negative (Ka/Ks < 1), or positive (Ka/Ks > 1) selection [51]. The Ka/Ks ratio for the duplicated GT47 genes ranged from 0.083 to 0.678 (Table 7), which indicates negative (purifying) selection and reflects retention of core functions such as cell wall biosynthesis and stress response.
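The selection thresholds above can be expressed as a small classifier. This is an illustrative sketch: the function name, the tolerance used to decide when a ratio counts as exactly 1, and the example inputs are assumptions, and only the reported range endpoints (0.083 and 0.678) come from the text.

```python
def selection_mode(ka: float, ks: float, tolerance: float = 1e-9) -> str:
    """Classify selection pressure on a duplicated gene pair from its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    if ka < 0:
        raise ValueError("Ka must be non-negative")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "negative (purifying)" if ratio < 1.0 else "positive"


# The endpoints of the reported Ka/Ks range, written as ratios over Ks = 1.
lowest = selection_mode(0.083, 1.0)
highest = selection_mode(0.678, 1.0)
```

Both ends of the reported range fall below 1, so every duplicated pair in that range would be classified as being under negative (purifying) selection.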

3.10. Cis-Acting Elements in the Promoter Regions Analysis

To better understand the potential regulatory mechanisms of GT47 genes in sorghum regarding abiotic stress responses, phytohormones, and growth and development, we analyzed the 2000 bp promoter regions of the sorghum GT47 genes. Four types of cis-elements, responsive to auxin (IAA), gibberellin (GA), salicylic acid (SA), and abscisic acid (ABA), occurred in more than one GT47 gene (Figure 7 and Table 8). These phytohormones are involved in plant growth and development, hormone response, and plant defense. The results suggest that genes in the GT47 family may be regulated by a variety of phytohormones. In this analysis, ABA-responsive elements were the most frequently associated with GT47 genes, which may support adaptability to environmental stresses such as drought or temperature changes.

3.11. Sequence Logos Analysis

The WebLogo sequence logo analysis shows the conservation of GT47 amino acids in Sorghum bicolor, Zea mays, and Arabidopsis. Key conserved residues, such as DXD motifs (Asp-X-Asp), play a role in coordinating divalent cations, which are essential for enzymatic activity. Glutamine (Q) and tyrosine (Y) contribute to substrate binding, while histidine (H) and glutamate (E) stabilize the transition state [52] (Figure 8). The results showed that the amino acid composition at these sites was similar in sorghum, maize, and Arabidopsis.

3.12. Expression Profiles of RNA Sequence of Internode of GT47 Gene Family in Sorghum

The high expression of specific GT47 genes in the internodes of Sorghum bicolor points to their roles in stem elongation and structural development. Internodes are essential to plant height and biomass accumulation, primarily due to their contribution to cell wall biosynthesis. The GT47 gene family is known to encode glycosyltransferase enzymes that contribute to the synthesis of polysaccharides such as pectin and hemicellulose, which are main components of the plant cell wall [53]. The expression analysis in internode tissue showed that Sobic.003G410800, Sobic.003G410700, Sobic.009G220200, and Sobic.00G220100 were highly expressed in sorghum internodes (Int); these genes are likely to play a vital role in building key cell wall components that provide flexibility and mechanical strength to support stem elongation (Figure 9). Their activity may enhance hemicellulose composition, thus contributing to biomass accumulation and structural integrity in sorghum. In contrast, genes such as Sobic.003G331000, Sobic.003G360300, Sobic.001G387300, Sobic.001G506300, Sobic.001G538700, Sobic.001G303300, and Sobic.006G260900 had lower expression, which suggests that their main roles lie in other tissues or developmental stages and points to functional variation within the GT47 family (Figure 9). The gene expression level in the heatmap is measured in FPKM (Fragments Per Kilobase of transcript per Million mapped reads) from the RNA-Seq data.

4. Discussion

GTs serve important functions in the anabolic metabolism of carbohydrates in organisms [54]. Many polysaccharide molecules are synthesized with the help of multiple GT genes. However, more research is needed on sorghum GT genes, particularly the GT47 gene family. Among the GTs, the β-glucuronyltransferase domain is used to classify the GT47 gene family [55]. Sorghum, as a typical C4 crop, holds significant importance as a raw material in nutrition, brewing industries, and animal husbandry. Despite the publication of the reference genome of sorghum [56], a comprehensive understanding of the GT47 gene family in sorghum is still lacking. This gene family’s evolutionary and functional significance in sorghum, especially for its applications in nutrition, brewing, and animal husbandry, might be highlighted by contrasting it with other C4 crops like maize that have been the subject of more thorough research [57].
This study identified 31 GT47 genes in sorghum. Following previous studies on gene families [7], the GT47s were named from GT47-1 to GT47-31 (Table 1). Among the groups in sorghum, Group I, Group II, and Group III contained 16, 4, and 11 members, respectively (Figure 1a). The results from the phylogenetic tree indicate an increase in the number of GT47 family members during plant evolution, potentially driven by duplication, which could serve as the primary source of plant evolution and genetic variation (Figure 1b). GT47 family members exhibit relatively similar molecular weights and short protein lengths (Table 1 and Table 2). The analysis of subcellular localization revealed that GT47 family members were present in different compartments of sorghum cells (Figure 2 and Table 4). Protein–protein interaction network results showed that all genes in the GT47 gene family were related to one another [58]. Conserved domain analysis revealed that the GT47 gene family has six domains, of which only the exostosin superfamily was present in all 31 GT47 genes (Figure 4). Furthermore, the distribution of conserved protein motifs among GT47 family members is shown in Figure 6. The analysis of gene structure and motifs revealed that most GT47s within the family have 1–13 introns and that the number and distribution of exons and introns are relatively consistent (Figure 5).
The analysis of the secondary structure of GT47 proteins showed varying compositions of alpha helices, random coils, extended strands, and beta turns in sorghum, as in other plants [59]. During evolution, both tandem duplication and segmental duplication contribute to the formation of gene families [60]. Analysis of gene duplication revealed that all duplicated GT47 gene pairs have a Ka/Ks value < 1, which indicates negative (purifying) selection pressure on GT47 genes in sorghum. The statistical analysis of the cis-acting elements showed the existence of multiple elements involved in plant stress response, growth, development, and hormone response, indicating the functional diversity of this gene family, as shown in previous studies [12,61]. GT47s have similar functions due to their conserved domains, but the diversity of non-conserved partial structures also gives them specificity [62]. The RNA-Seq expression analysis showed that genes such as Sobic.003G410800, Sobic.003G410700, Sobic.009G220200, and Sobic.00G220100 were highly expressed in internodes of the sorghum plant (Figure 9).
In the sorghum genome, we identified 31 members of the GT47 gene family and performed predictive analysis to explore their protein properties, gene structure, motifs, evolutionary relationships, promoter elements, RNA-Seq expression, sequence logo, multiple sequence alignments, gene duplication, 3D model, and secondary structure. The results indicated that GT47 gene family members may be involved in plant growth and development as well as stress response. Our results provided important information on the functional role of GT47 genes in sorghum plants. Integrating this analysis can aid in screening potential GT47 genes for further functional identification, concurrently aiming to improve the resistance of gramineous crops and ensure food security.

5. Conclusions

In this study, we identified 31 GT47 members in sorghum and classified them into three groups based on the phylogenetic tree. Gene structure and conserved motif analysis revealed the presence of conserved cysteine residues and highlighted their evolutionary significance. The promoter analysis indicated the involvement of the GT47 genes in abiotic stress responses, with the ABA-associated cis-element likely playing a vital role in drought adaptability. Subcellular localization and RNA-seq expression profiling identified key genes such as Sobic.003G410800, Sobic.003G410700, Sobic.009G220200, and Sobic.00G220100 as being expressed in internodes, signifying their contribution to stem elongation and biomass accumulation. In addition, gene duplication and selection analysis supported the strong functional conservation of GT47 genes. The results provide valuable insights into the role of GT47 genes in plant development, growth, and stress resilience. This study improves our understanding of this plant gene family and provides a basis for future functional studies that address the remaining gaps in knowledge of the GT47 gene family.

Author Contributions

R.R., M.A., S.F.A. and M.A.S.: performed the analysis, writing, and investigation of this study. R.R., M.A., S.F.A. and M.A.S.: made an equal contribution to the article and approved the submitted version. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by Hainan University Research Initiation Project Fund (XJ2400005264).

Data Availability Statement

Data will be available on request.

Acknowledgments

This article is part of the research thesis “Identification and computational proteomic characterization of glycosyltransferase (GT47) gene family in sorghum germplasm of Pakistan”. Authors are thankful to the staff of the biotechnology lab, Muhammad Nawaz Shareef University of Agriculture, Multan, 60000, Pakistan.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial relationships that could be construed as a potential conflict of interest.

Abbreviations

MW: Molecular weight
ARW: Average residue weight
PI: Isoelectric point
GRAVY: Grand average of hydropathicity
AI: Aliphatic index
GA: Gibberellin
SA: Salicylic acid
ABA: Abscisic acid
ER: Endoplasmic reticulum

References

  1. Tvaroška, I. Glycosyltransferases as Targets for Therapeutic Intervention in Cancer and Inflammation: Molecular Modeling Insights. Chem. Pap. 2022, 76, 1953–1988. [Google Scholar] [CrossRef]
  2. Capurro, J.I.B.; Hopkins, C.W.; Sottile, G.P.; González Lebrero, M.C.; Roitberg, A.E.; Marti, M.A. Theoretical Insights into the Reaction and Inhibition Mechanism of Metal-Independent Retaining Glycosyltransferase Responsible for Mycothiol Biosynthesis. J. Phys. Chem. B 2017, 121, 471–478. [Google Scholar] [CrossRef] [PubMed]
  3. Davies, G.; Gilbert, H.; Henrissat, B.; Svensson, B.; Vocadlo, D.; Williams, S. Ten Years of CAZypedia: A Living Encyclopedia of Carbohydrate-Active Enzymes. Glycobiology 2018, 28, 3–8. [Google Scholar] [CrossRef]
  4. Xu, H.; Ding, A.; Chen, S.; Marowa, P.; Wang, D.; Chen, M.; Hu, R.; Kong, Y.; O’Neill, M.; Chai, G.; et al. Genome-Wide Analysis of Sorghum GT47 Family Reveals Functional Divergences of MUR3-like Genes. Front. Plant Sci. 2018, 871, 1773. [Google Scholar] [CrossRef]
  5. Burton, R.A.; Fincher, G.B. Evolution and Development of Cell Walls in Cereal Grains. Front. Plant Sci. 2014, 5, 456. [Google Scholar] [CrossRef]
  6. Brown, D.M.; Goubet, F.; Wong, V.W.; Goodacre, R.; Stephens, E.; Dupree, P.; Turner, S.R. Comparison of Five Xylan Synthesis Mutants Reveals New Insight into the Mechanisms of Xylan Synthesis. Plant J. 2007, 52, 1154–1168. [Google Scholar] [CrossRef]
  7. Wu, A.; Hao, P.; Wei, H.; Sun, H.; Cheng, S.; Chen, P.; Ma, Q.; Gu, L.; Zhang, M.; Wang, H.; et al. Genome-Wide Identification and Characterization of Glycosyltransferase Family 47 in Cotton. Front. Genet. 2019, 10, 824. [Google Scholar] [CrossRef]
  8. Harholt, J.; Jensen, J.K.; Sørensen, S.O.; Orfila, C.; Pauly, M.; Scheller, H.V. ARABINAN DEFICIENT 1 Is a Putative Arabinosyltransferase Involved in Biosynthesis of Pectic Arabinan in Arabidopsis. Plant Physiol. 2006, 140, 49–58. [Google Scholar] [CrossRef]
  9. Rai, K.M.; Thu, S.W.; Balasubramanian, V.K.; Cobos, C.J.; Disasa, T.; Mendu, V. Identification, Characterization, and Expression Analysis of Cell Wall Related Genes in (Sorghum bicolor (L.)) Moench, a Food, Fodder, and Biofuel Crop. Front. Plant Sci. 2016, 7, 1287. [Google Scholar] [CrossRef]
  10. Mariette, A.; Kang, H.S.; Heazlewood, J.L.; Persson, S.; Ebert, B.; Lampugnani, E.R. Not Just a Simple Sugar: Arabinose Metabolism and Function in Plants. Plant Cell Physiol. 2021, 62, 1791–1812. [Google Scholar] [CrossRef]
  11. Zhang, B.; Zhao, T.; Yu, W.; Kuang, B.; Yao, Y.; Liu, T.; Chen, X.; Zhang, W.; Wu, A.M. Functional Conservation of the Glycosyltransferase Gene GT47A in the Monocot Rice. J. Plant Res. 2014, 127, 423–432. [Google Scholar] [CrossRef] [PubMed]
  12. Tan, J.; Miao, Z.; Ren, C.; Yuan, R.; Tang, Y.; Zhang, X.; Han, Z.; Ma, C. Evolution of Intron-Poor Clades and Expression Patterns of the Glycosyltransferase Family 47. Planta 2018, 247, 745–760. [Google Scholar] [CrossRef] [PubMed]
  13. Urbanowicz, B.R.; Peña, M.J.; Moniz, H.A.; Moremen, K.W.; York, W.S. Two Arabidopsis Proteins Synthesize Acetylated Xylan in Vitro. Plant J. 2014, 80, 197–206. [Google Scholar] [CrossRef] [PubMed]
  14. Zhang, B.; Gao, Y.; Zhang, L.; Zhou, Y. The Plant Cell Wall: Biosynthesis, Construction, and Functions. J. Integr. Plant Biol. 2021, 63, 251–272. [Google Scholar] [CrossRef]
  15. Zhong, R.; Ye, Z.H. Secondary Cell Walls: Biosynthesis, Patterned Deposition and Transcriptional Regulation. Plant Cell Physiol. 2015, 56, 195–214. [Google Scholar] [CrossRef]
  16. Wu, A.; Rihouey, C.; Seveno, M.; Ho, E.; Singh, S.K.; Matsunaga, T.; Ishii, T.; Lerouge, P.; Marchant, A. The Arabidopsis IRX10 and IRX10-LIKE Glycosyltransferases Are Critical for Glucuronoxylan Biosynthesis during Secondary Cell Wall Formation. Plant J. 2009, 2, 718–731. [Google Scholar] [CrossRef]
  17. Vega-Mas, I.; Ascencio-Medina, E.; Menéndez, S.; González-Torralba, J.; González-Murua, C.; Marino, D.; González-Moro, M.B. Selecting an Optimal Sorghum Cultivar Can Improve Nitrogen Availability and Wheat Yield in Crop Rotation. J. Sci. Food Agric. 2024, 105, 1930–1940. [Google Scholar] [CrossRef]
  18. Kolozsvári, I.; Kun, Á.; Jancsó, M.; Palágyi, A.; Bozán, C.; Gyuricza, C. Agronomic Performance of Grain Sorghum (Sorghum bicolor). Agronomy 2022, 12, 1185. [Google Scholar] [CrossRef]
  19. Byrt, C.S.; Grof, C.P.L.; Furbank, R.T. C4 Plants as Biofuel Feedstocks: Optimising Biomass Production and Feedstock Quality from a Lignocellulosic Perspective. J. Integr. Plant Biol. 2011, 53, 120–135. [Google Scholar] [CrossRef]
  20. Taylor, S.H.; Hulme, S.P.; Rees, M.; Ripley, B.S.; Woodward, I.F.; Osborne, C.P. Ecophysiological Traits in C3 and C4 Grasses: A Phylogenetically Controlled Screening Experiment. New Phytol. 2010, 185, 780–791. [Google Scholar] [CrossRef]
  21. Hoang, N.V.; Furtado, A.; Botha, F.C.; Simmons, B.A.; Henry, R.J. Potential for Genetic Improvement of Sugarcane as a Source of Biomass for Biofuels. Front. Bioeng. Biotechnol. 2015, 3, 182. [Google Scholar] [CrossRef] [PubMed]
  22. Khalil, S.R.; Abdelhafez, A.A.; Amer, E.A.M. Evaluation of Bioethanol Production from Juice and Bagasse of Some Sweet Sorghum Varieties. Ann. Agric. Sci. 2015, 60, 317–324. [Google Scholar] [CrossRef]
  23. Wilson, L.F.L.; Dendooven, T.; Hardwick, S.W.; Echevarría-Poza, A.; Tryfona, T.; Krogh, K.B.R.M.; Chirgadze, D.Y.; Luisi, B.F.; Logan, D.T.; Mani, K.; et al. The Structure of EXTL3 Helps to Explain the Different Roles of Bi-Domain Exostosins in Heparan Sulfate Synthesis. Nat. Commun. 2022, 13, 3314. [Google Scholar] [CrossRef] [PubMed]
  24. Dien, B.S.; Sarath, G.; Pedersen, J.F.; Sattler, S.E.; Chen, H.; Funnell-Harris, D.L.; Nichols, N.N.; Cotta, M.A. Improved Sugar Conversion and Ethanol Yield for Forage Sorghum (Sorghum bicolor L. Moench) Lines with Reduced Lignin Contents. Bioenergy Res. 2009, 2, 153–164. [Google Scholar] [CrossRef]
  25. Sharma, R.; Liang, Y.; Lee, M.Y.; Pidatala, V.R.; Mortimer, J.C.; Scheller, H. V Agrobacterium—Mediated Transient Transformation of Sorghum Leaves for Accelerating Functional Genomics and Genome Editing Studies. BMC Res. Notes 2020, 13, 116. [Google Scholar] [CrossRef]
  26. Sezer, F. Identification and Gene Expression Analysis of TIR1 in Easy and Hard to Root Olive (Olea europaea L.) Cultivars. Int. J. Innov. Approaches Sci. Res. 2023, 7, 1–8. [Google Scholar] [CrossRef]
  27. Kozlova, L.V.; Nazipova, A.R.; Gorshkov, O.V.; Gilmullina, L.F.; Sautkina, O.V.; Petrova, N.V.; Trofimova, O.I.; Ponomarev, S.N.; Ponomareva, M.L.; Gorshkova, T.A. Identification of Genes Involved in the Formation of Soluble Dietary Fiber in Winter Rye Grain and Their Expression in Cultivars with Different Viscosities of Wholemeal Water Extract. Crop J. 2022, 10, 532–549. [Google Scholar] [CrossRef]
  28. Ge, H.; Xu, J.; Hua, M.; An, W.; Wu, J.; Wang, B.; Li, P.; Fang, H. Genome-Wide Identification and Analysis of ACP Gene Family in Sorghum bicolor (L.) Moench. BMC Genom. 2022, 23, 538. [Google Scholar] [CrossRef]
  29. Contreras-Moreira, B.; Saraf, S.; Naamati, G.; Casas, A.M.; Amberkar, S.S.; Flicek, P.; Jones, A.R.; Dyer, S. GET_PANGENES: Calling Pangenes from Plant Genome Alignments Confirms Presence-Absence Variation. Genome Biol. 2023, 24, 223. [Google Scholar] [CrossRef]
  30. Górecki, K.; McEvoy, M.M. Phylogenetic Analysis Reveals an Ancient Gene Duplication as the Origin of the MdtABC Efflux Pump. PLoS ONE 2020, 15, e0228877. [Google Scholar] [CrossRef]
  31. Zhou, T.; Xu, K.; Zhao, F.; Liu, W.; Li, L.; Hua, Z.; Zhou, X. Itol.Toolkit Accelerates Working with ITOL (Interactive Tree of Life) by an Automated Generation of Annotation Files. Bioinformatics 2023, 39, btad339. [Google Scholar] [CrossRef] [PubMed]
  32. Pereira, D.S.; Hilário, S.; Gonçalves, M.F.M.; Phillips, A.J.L. Diaporthe Species on Palms: Molecular Re-Assessment and Species Boundaries Delimitation in the D. arecae Species Complex. Microorganisms 2023, 11, 2717. [Google Scholar] [CrossRef]
  33. Basumatary, D.; Saikia, S.; Yadav, H.S.; Yadav, M. In Silico Analysis of Peroxidase from Luffa acutangula. 3 Biotech 2023, 13, 25. [Google Scholar] [CrossRef] [PubMed]
  34. Nash, B.; Gregory, W.F.; White, R.R.; Protasio, A.V.; Gygi, S.P.; Selkirk, M.E.; Weekes, M.P.; Artavanis-Tsakonas, K. Large-Scale Proteomic Analysis of T. Spiralis Muscle-Stage ESPs Identifies a Novel Upstream Motif for in Silico Prediction of Secreted Products. Front. Parasitol. 2023, 2, 1078443. [Google Scholar] [CrossRef] [PubMed]
  35. Yan, L.; Fang, Z.; Zhang, N.; Yang, L.; Zhang, Y.; Zhuang, M.; Lv, H.; Ji, J.; Wang, Y. Genome-Wide Identification, Characterization, and Expression Analysis of the Geranylgeranyl Pyrophosphate Synthase (GGPPS) Gene Family Reveals Its Importance in Chloroplasts of Brassica oleracea L. Agriculture 2023, 13, 1615. [Google Scholar] [CrossRef]
  36. Verma, R.N.; Malik, Z.; Singh, G.P.; Subbarao, N. Healthcare Analytics Identification of Key Proteins in Host—Pathogen Interactions between Mycobacterium Tuberculosis and Homo Sapiens: A Systematic Network Theoretical Approach. Healthc. Anal. 2022, 2, 100052. [Google Scholar] [CrossRef]
  37. Zhang, A.; Xu, J.; Xu, X.; Wu, J.; Li, P.; Wang, B.; Fang, H. Genome-Wide Identification and Characterization of the KCS Gene Family in Sorghum (Sorghum bicolor (L.) Moench). PeerJ 2022, 10, e14156. [Google Scholar] [CrossRef]
  38. Coff, L.; Chan, J.; Ramsland, P.A.; Guy, A.J. Identifying Glycan Motifs Using a Novel Subtree Mining Approach. BMC Bioinform. 2020, 21, 1–18. [Google Scholar] [CrossRef]
  39. Li, H.; Chapla, D.; Amos, R.A.; Ramiah, A.; Moremen, K.W.; Li, H. Structural Basis for Heparan Sulfate Co-Polymerase Action by the EXT1–2 Complex. Nat. Chem. Biol. 2023, 19, 565–574. [Google Scholar] [CrossRef]
  40. Song, N.; Liang, H.; An, Y.; Bai, S.; Ma, F.; Zhang, Z.; Li, H.; Zhou, Y.; Guo, G.; Song, C. Identification and Bioinformatics Analysis of KNOX Gene Family in Wheat (Triticum aestivum L.). Mol. Plant Breed. 2021, 12, 1–11. [Google Scholar] [CrossRef]
  41. Khan, W.; Liu, W.; Liu, Z.; Zhu, X.; Wu, J.; Wang, P. Genome-Wide Identification, Expression Analysis, and Functional Verification of the JMJ (Jumonji) Histone Demethylase Gene Family in Pear (Pyrus bretchneideri). Tree Genet. Genomes 2023, 19, 10. [Google Scholar] [CrossRef]
  42. Zhu, K.; Wu, Q.; Huang, Y.; Ye, J.; Xu, Q.; Deng, X. Genome-Wide Characterization of Cis-Acting Elements in the Promoters of Key Carotenoid Pathway Genes from the Main Species of Genus Citrus. Hortic. Plant J. 2020, 6, 385–395. [Google Scholar] [CrossRef]
  43. Wang, S.; Li, R.; Zhou, Y.; Fernie, A.R.; Ding, Z.; Zhou, Q.; Che, Y.; Yao, Y.; Liu, J.; Wang, Y.; et al. Pectin Methylesterase (MePME) Genes to Filter Candidate Gene Responses to Multiple Abiotic Stresses. Plants 2023, 12, 2529. [Google Scholar] [CrossRef] [PubMed]
  44. Yang, G. Genomic Analysis, Evolution and Characterization of E3 Ubiquitin Protein Ligase (TRIM) Gene Family in Common Carp (Cyprinus carpio). Genes 2023, 14, 667. [Google Scholar] [CrossRef]
  45. Qiu, C.; Chen, J.; Wu, W.; Liao, B.; Zheng, X.; Li, Y.; Huang, J.; Shi, J.; Hao, Z. Of COBRA-like Gene Family in Liriodendron chinense. Plants 2023, 12, 1616. [Google Scholar] [CrossRef]
  46. Ur Rehman, S.; Nadeem, A.; Javed, M.; Ul Hassan, F.; Luo, X.; Khalid, R.B.; Liu, Q. Genomic Identification, Evolution and Sequence Analysis of the Heat-Shock Protein Gene Family in Buffalo. Genes 2020, 11, 1388. [Google Scholar] [CrossRef]
  47. Sun, L.; Chen, Q.; Lu, H.; Wang, J.; Zhao, J.; Li, P. Electrodialysis with Porous Membrane for Bioproduct Separation: Technology, Features, and Progress. Food Res. Int. 2020, 137, 109343. [Google Scholar] [CrossRef]
  48. Sanjaya, R.E.; Dwi, K.; Putri, A.; Kurniati, A.; Rohman, A. In Silico Characterization of the GH5- Cellulase Family from Uncultured Microorganisms: Physicochemical and Structural Studies. J. Genet. Eng. Biotechnol. 2021, 19, 143. [Google Scholar] [CrossRef]
  49. Mo, F.; Li, L.; Zhang, C.; Yang, C.; Chen, G.; Niu, Y.; Si, J.; Liu, T.; Sun, X.; Wang, S.; et al. Genome-Wide Analysis and Expression Profiling of the Phenylalanine Ammonia-Lyase Gene Family in Solanum tuberosum. Int. J. Mol. Sci. 2022, 23, 6833. [Google Scholar] [CrossRef]
  50. Wang, X.; Tang, Q.; Zhao, X.; Jia, C.; Yang, X.; He, G.; Wu, A.; Kong, Y.; Hu, R.; Zhou, G. Functional Conservation and Divergence of Miscanthus Lutarioriparius GT43 Gene Family in Xylan Biosynthesis. BMC Plant Biol. 2016, 16, 102. [Google Scholar] [CrossRef]
  51. Li, Z.; Wang, X.; Yang, K.; Zhu, C.; Yuan, T.; Wang, J.; Li, Y.; Gao, Z. Identification and Expression Analysis of the Glycosyltransferase GT43 Family Members in Bamboo Reveal Their Potential Function in Xylan Biosynthesis during Rapid Growth. BMC Genom. 2021, 22, 876. [Google Scholar] [CrossRef] [PubMed]
  52. Boens, S.; Szekér, K.; Van Eynde, A.; Bollen, M. Interactor-Guided Dephosphorylation by Protein Phosphatase-1. In Phosphatase Modulators; Millán, J., Ed.; Methods in Molecular Biology Series 1053; Humana Press: Totowa, NJ, USA, 2013; ISBN 978-1-62703-561-3. [Google Scholar]
  53. Oliver, J.; Fan, M.; McKinley, B.; Zemelis-Durfee, S.; Brandizzi, F.; Wilkerson, C.; Mullet, J.E. The AGCVIII Kinase Dw2 Modulates Cell Proliferation, Endomembrane Trafficking, and MLG/Xylan Cell Wall Localization in Elongating Stem Internodes of Sorghum bicolor. Plant J. 2021, 105, 1053–1071. [Google Scholar] [CrossRef] [PubMed]
  54. Yamada, R.; Han, S.; Park, H. Complete Genome Analysis of Subtercola sp. PAMC28395: Genomic Insights into Its Potential Role for Cold Adaptation and Biotechnological Applications. Microorganisms 2023, 11, 1480. [Google Scholar] [CrossRef] [PubMed]
  55. Coutinho, P.M.; Stam, M.; Blanc, E.; Henrissat, B. Why Are There so Many Carbohydrate-Active Enzyme-Related Genes in Plants? Trends Plant Sci. 2003, 8, 563–565. [Google Scholar] [CrossRef]
  56. Tanwar, R.; Panghal, A.; Chaudhary, G.; Kumari, A.; Chhikara, N. Nutritional, Phytochemical and Functional Potential of Sorghum: A Review. Food Chem. Adv. 2023, 3, 100501. [Google Scholar] [CrossRef]
  57. Courtial, A.; Soler, M.; Rey-, M.; De Génétique, U.; Fourragères, P. Review Article Open Access Breeding Grasses for Capacity to Biofuel Production or Silage Feeding Value: An Updated List of Genes Involved in Maize Secondary Cell Wall Biosynthesis and Assembly. Maydica 2013, 58, 67–102. [Google Scholar]
  58. Ma, Y.; Han, Y.; Feng, X.; Gao, H.; Cao, B.; Song, L. Genome-Wide Identification of BAM (β-Amylase) Gene Family in Jujube (Ziziphus jujuba Mill.) and Expression in Response to Abiotic Stress. BMC Genom. 2022, 23, 438. [Google Scholar] [CrossRef]
  59. Wang, Q.; McArdle, P.; Wang, S.L.; Wilmington, R.L.; Xing, Z.; Greenwood, A.; Cotten, M.L.; Qazilbash, M.M.; Schniepp, H.C. Protein Secondary Structure in Spider Silk Nanofibrils. Nat. Commun. 2022, 13, 4329. [Google Scholar] [CrossRef]
  60. Wu, Z.; Fu, D.; Gao, X.; Zeng, Q.; Chen, X.; Wu, J.; Zhang, N. Characterization and Expression Profiles of the B-Box Gene Family during Plant Growth and under Low-Nitrogen Stress in Saccharum. BMC Genom. 2023, 24, 79. [Google Scholar] [CrossRef]
  61. Kinnaert, C.; Daugaard, M.; Nami, F.; Clausen, M.H. Chemical Synthesis of Oligosaccharides Related to the Cell Walls of Plants and Algae. Chem. Rev. 2017, 117, 11337–11405. [Google Scholar] [CrossRef]
  62. Herrmann, A.; Torii, K.U. Shouting out Loud: Signaling Modules in the Regulation of Stomatal Development. Plant Physiol. 2021, 185, 765–780. [Google Scholar] [CrossRef]
Figure 1. (a) The phylogenetic tree of the GT47 gene family in sorghum. (b) The phylogenetic tree of GT47 genes in Arabidopsis thaliana, barley (Hordeum vulgare), wheat (Triticum aestivum), Zea mays, and rice (Oryza sativa).
Figure 2. Subcellular localization of GT47 genes shown by heatmap.
Figure 3. Using the STRING database, we built a protein–protein interaction network to study the interactions between the sorghum GT47 genes. Proteins are shown by colored nodes, and the lines between them show the interactions between the proteins, as recorded by the database references.
Figure 4. Conserved domains of sorghum (S. bicolor) GT47 protein. Each domain is indicated by a colored box. The scale bar represents 800 amino acids.
Figure 5. Schematic diagram representing structures of GT47 genes of sorghum. Exons are represented by yellow boxes and introns by black lines. Intron phase numbers 0 and 1 are also displayed at the beginning of introns. All dimensions are accurate in this diagram.
Figure 6. Motif analysis of the GT47 gene family in sorghum. The diagram shows the distribution of three conserved motifs (Motif 1, Motif 2, and Motif 3) across the GT47 gene sequences, represented along nucleotide length (0–800).
Figure 7. Cis-acting elements in the promoter regions of GT47 genes. The cis-elements are represented by differently colored segments. The length of each GT47 sequence and the location of each element are drawn to scale. The scale bar indicates the length of the DNA sequence.
Figure 8. Conserved region (amino acid residues) sequence logo in (a): Sorghum bicolor, (b): Z. mays, and (c): Arabidopsis thaliana.
Figure 9. Expression profiles of GT47 gene members in internodes of sorghum (Sorghum bicolor). The internodes 1, 2, 3, and 4 are displayed at the bottom, while the genes are displayed on the right.
Table 1. Identified GT47 genes in sorghum (Sorghum bicolor).

| Transcript ID | Gene Name | Genomic Sequence (bp) | Transcript Sequence (bp) | CDS (bp) | Peptide Length (aa) |
|---|---|---|---|---|---|
| Sobic.003G405600 | Sobic-GT47-01 | 5270 | 2190 | 1299 | 433 |
| Sobic.003G234701 | Sobic-GT47-02 | 2153 | 1649 | 1515 | 505 |
| Sobic.003G360300 | Sobic-GT47-03 | 3779 | 1592 | 1281 | 427 |
| Sobic.003G331000 | Sobic-GT47-04 | 5064 | 8174 | 1500 | 500 |
| Sobic.003G410800 | Sobic-GT47-05 | 3277 | 2035 | 1263 | 421 |
| Sobic.003G410700 | Sobic-GT47-06 | 3323 | 1997 | 1254 | 418 |
| Sobic.009G220200 | Sobic-GT47-07 | 3483 | 1848 | 1251 | 417 |
| Sobic.009G220100 | Sobic-GT47-08 | 3076 | 1947 | 1248 | 416 |
| Sobic.008G077900 | Sobic-GT47-09 | 1732 | 1634 | 1572 | 524 |
| Sobic.008G021000 | Sobic-GT47-10 | 3525 | 2256 | 1488 | 496 |
| Sobic.008G138100 | Sobic-GT47-11 | 1761 | 1761 | 1761 | 587 |
| Sobic.004G070100 | Sobic-GT47-12 | 4516 | 1515 | 1212 | 404 |
| Sobic.007G139300 | Sobic-GT47-13 | 1862 | 1790 | 1572 | 524 |
| Sobic.001G506700 | Sobic-GT47-14 | 2415 | 2317 | 1539 | 513 |
| Sobic.001G387300 | Sobic-GT47-15 | 3189 | 2510 | 1389 | 463 |
| Sobic.001G506300 | Sobic-GT47-16 | 3025 | 3025 | 1851 | 617 |
| Sobic.001G506800 | Sobic-GT47-17 | 2627 | 2486 | 1590 | 530 |
| Sobic.001G506500 | Sobic-GT47-18 | 5851 | 5851 | 1641 | 547 |
| Sobic.001G538700 | Sobic-GT47-19 | 2739 | 2097 | 1290 | 430 |
| Sobic.001G486900 | Sobic-GT47-20 | 5228 | 2977 | 2352 | 784 |
| Sobic.001G303300 | Sobic-GT47-21 | 2200 | 2200 | 1353 | 451 |
| Sobic.001G229000 | Sobic-GT47-22 | 2330 | 2330 | 1842 | 614 |
| Sobic.001G506900 | Sobic-GT47-23 | 3007 | 2822 | 1524 | 514 |
| Sobic.001G506600 | Sobic-GT47-24 | 1887 | 1777 | 1563 | 521 |
| Sobic.001G541601 | Sobic-GT47-25 | 1229 | 1229 | 621 | 207 |
| Sobic.001G228900 | Sobic-GT47-26 | 1395 | 1380 | 1380 | 460 |
| Sobic.001G229100 | Sobic-GT47-27 | 2443 | 1815 | 1473 | 491 |
| Sobic.006G186200 | Sobic-GT47-28 | 2028 | 2028 | 1440 | 480 |
| Sobic.006G186100 | Sobic-GT47-29 | 2937 | 2770 | 1707 | 569 |
| Sobic.006G260900 | Sobic-GT47-30 | 2576 | 1615 | 1371 | 457 |
| Sobic.002G342300 | Sobic-GT47-31 | 6143 | 1744 | 1371 | 457 |
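The CDS and peptide lengths in Table 1 can be cross-checked against each other. The sketch below assumes, based only on the listed values and not on any stated rule, that the table follows the convention peptide length = CDS length / 3. The function names are hypothetical, and only a few rows are copied in for illustration.

```python
# Rows copied from Table 1: (gene name, CDS length in bp, peptide length in aa).
ROWS = [
    ("Sobic-GT47-01", 1299, 433),
    ("Sobic-GT47-20", 2352, 784),
    ("Sobic-GT47-23", 1524, 514),
    ("Sobic-GT47-25", 621, 207),
]


def expected_peptide_length(cds_bp: int) -> int:
    """Number of codons in the CDS (assumed convention: CDS length / 3)."""
    return cds_bp // 3


def inconsistent_rows(table):
    """Return gene names whose listed peptide length differs from CDS / 3."""
    return [
        name
        for name, cds_bp, peptide_aa in table
        if expected_peptide_length(cds_bp) != peptide_aa
    ]


if __name__ == "__main__":
    # Rows flagged here are candidates for re-checking against the source data.
    print(inconsistent_rows(ROWS))
```

A flagged row does not say which of the two values is wrong; it only marks an entry worth verifying against the genome annotation.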
Table 2. Identified GT47 genes in Z. mays, barley (H. vulgare), Arabidopsis thaliana, wheat (T. aestivum), and rice (O. sativa).

| Z. mays Transcript ID | Z. mays Gene Name | Barley (H. vulgare) Transcript ID | Barley Gene Name | Arabidopsis thaliana Transcript ID | Arabidopsis Gene Name | Wheat (T. aestivum) Transcript ID | Wheat Gene Name | Rice (O. sativa) Transcript ID | Rice Gene Name |
|---|---|---|---|---|---|---|---|---|---|
| Zm00001d021821 | Zm-GT47-01 | HanXRQChr10g0315941 | HOR-GT47-01 | AT2G32750 | AT-GT47-01 | Traes_2BS_53D9472F2 | Traes-GT47-01 | LOC_Os10g32110 | LOC-GT47-01 |
| Zm00001d019053 | Zm-GT47-02 | HanXRQChr10g0318411 | HOR-GT47-02 | AT2G32740 | AT-GT47-02 | Traes_3AL_10E90321E | Traes-GT47-02 | LOC_Os10g10080 | LOC-GT47-02 |
| Zm00001d026580 | Zm-GT47-03 | HanXRQChr10g0280221 | HOR-GT47-03 | AT2G28110 | AT-GT47-03 | Traes_3B_80B4E7C02 | Traes-GT47-03 | LOC_Os10g40559 | LOC-GT47-03 |
| Zm00001d026066 | Zm-GT47-04 | HanXRQChr10g0280211 | HOR-GT47-04 | AT4G32790 | AT-GT47-04 | Traes_3B_90535BAF7 | Traes-GT47-04 | LOC_Os10g32170 | LOC-GT47-04 |
| Zm00001d023291 | Zm-GT47-05 | HanXRQChr16g0498961 | HOR-GT47-05 | AT4G13990 | AT-GT47-05 | Traes_2DL_72B1DF5D9 | Traes-GT47-05 | LOC_Os10g32160 | LOC-GT47-05 |
| Zm00001d002609 | Zm-GT47-06 | HanXRQChr07g0187251 | HOR-GT47-06 | AT4G16745 | AT-GT47-06 | Traes_3B_D46D8B6E7 | Traes-GT47-06 | LOC_Os08g34020 | LOC-GT47-06 |
| Zm00001d028980 | Zm-GT47-07 | HanXRQChr07g0206321 | HOR-GT47-07 | AT4G38040 | AT-GT47-07 | Traes_3B_58EF227AB | Traes-GT47-07 | LOC_Os04g48480 | LOC-GT47-07 |
| Zm00001d027642 | Zm-GT47-08 | HanXRQChr09g0270861 | HOR-GT47-08 | AT1G74680 | AT-GT47-08 | Traes_4DS_1153D0950 | Traes-GT47-08 | LOC_Os04g57510 | LOC-GT47-08 |
| Zm00001d027639 | Zm-GT47-09 | HanXRQChr09g0257441 | HOR-GT47-09 | AT1G21480 | AT-GT47-09 | Traes_3AS_A5635015A | Traes-GT47-09 | LOC_Os07g37960 | LOC-GT47-09 |
| Zm00001d027638 | Zm-GT47-10 | HanXRQChr09g0258371 | HOR-GT47-10 | AT1G34270 | AT-GT47-10 | Traes_3DL_2DFEC0902 | Traes-GT47-10 | LOC_Os01g69220 | LOC-GT47-10 |
| Zm00001d029862 | Zm-GT47-11 | HanXRQChr09g0273791 | HOR-GT47-11 | AT1G63450 | AT-GT47-11 | Traes_2AL_735F7F6411 | Traes-GT47-11 | LOC_Os01g70200 | LOC-GT47-11 |
| Zm00001d032797 | Zm-GT47-12 | HanXRQChr13g0403291 | HOR-GT47-12 | AT1G67410 | AT-GT47-12 | Traes_4AL_59C2F5FC5 | Traes-GT47-12 | LOC_Os01g70190 | LOC-GT47-12 |
| Zm00001d027287 | Zm-GT47-13 | HanXRQChr13g0396241 | HOR-GT47-13 | AT3G45400 | AT-GT47-13 | Traes_5DS_BE64352A2 | Traes-GT47-13 | LOC_Os01g59630 | LOC-GT47-13 |
| Zm00001d032791 | Zm-GT47-14 | HanXRQChr13g0394301 | HOR-GT47-14 | AT3G07620 | AT-GT47-14 | Traes_2AL_ECEE1A203 | Traes-GT47-14 | LOC_Os01g45350 | LOC-GT47-14 |
| Zm00001d032173 | Zm-GT47-15 | HanXRQChr12g0385411 | HOR-GT47-15 | AT3G42180 | AT-GT47-15 | Traes_2BL_15045A787 | Traes-GT47-15 | LOC_Os01g01780 | LOC-GT47-15 |
| Zm00001d032795 | Zm-GT47-16 | HanXRQChr12g0375441 | HOR-GT47-16 | AT3G57630 | AT-GT47-16 | Traes_4DL_9C5232C95 | Traes-GT47-16 | LOC_Os03g20850 | LOC-GT47-16 |
| Zm00001d032794 | Zm-GT47-17 | HanXRQChr14g0449671 | HOR-GT47-17 | AT3G03650 | AT-GT47-17 | Traes_2DL_5663CE72D | Traes-GT47-17 | LOC_Os03g05110 | LOC-GT47-17 |
| Zm00001d027643 | Zm-GT47-18 | HanXRQChr14g0455031 | HOR-GT47-18 | AT5G20260 | AT-GT47-18 | Traes_4AS_BF84B58601 | Traes-GT47-18 | LOC_Os03g07820 | LOC-GT47-18 |
| Zm00001d038905 | Zm-GT47-19 | HanXRQChr14g0444231 | HOR-GT47-19 | AT5G11130 | AT-GT47-19 | Traes_4AL_B613811A6 | Traes-GT47-19 | LOC_Os03g08420 | LOC-GT47-19 |
| Zm00001d042333 | Zm-GT47-20 | HanXRQChr14g0461921 | HOR-GT47-20 | AT5G25310 | AT-GT47-20 | Traes_2AS_0CDF65B63 | Traes-GT47-20 | LOC_Os03g05060 | LOC-GT47-20 |
| Zm00001d043134 | Zm-GT47-21 | HanXRQChr01g0019561 | HOR-GT47-21 | AT5G03795 | AT-GT47-21 | Traes_2DS_A06773E29 | Traes-GT47-21 | LOC_Os03g05070 | LOC-GT47-21 |
| Zm00001d042820 | Zm-GT47-22 | HanXRQChr03g0088471 | HOR-GT47-22 | AT5G19670 | AT-GT47-22 | Traes_2BL_9DABE98FB | Traes-GT47-22 | | |
| Zm00001d044663 | Zm-GT47-23 | HanXRQChr15g0491231 | HOR-GT47-23 | AT5G44930 | AT-GT47-23 | Traes_4DL_115811E21 | Traes-GT47-23 | | |
| Zm00001d041181 | Zm-GT47-24 | HanXRQChr02g0038921 | HOR-GT47-24 | AT5G11610 | AT-GT47-24 | Traes_3AS_8F70D84D1 | Traes-GT47-24 | | |
| Zm00001d042281 | Zm-GT47-25 | HanXRQChr02g0033091 | HOR-GT47-25 | AT5G16890 | AT-GT47-25 | | | | |
| Zm00001d044094 | Zm-GT47-26 | HanXRQChr02g0049431 | HOR-GT47-26 | | | | | | |
| Zm00001d042276 | Zm-GT47-27 | HanXRQChr02g0052791 | HOR-GT47-27 | | | | | | |
| Zm00001d048298 | Zm-GT47-28 | | | | | | | | |
| Zm00001d048387 | Zm-GT47-29 | | | | | | | | |
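Table 3. Physiochemical properties of GT47 gene family in sorghum.

| Gene Name | Chromosome Number | Intron | Exon | Residues | ARW | TPNC | Amino Acid Length | AI | PI | MW (kDa) | GRAVY/PWY |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sobic.003G405600 | 3 | 3 | 4 | 432 | 112.1 | 15 | 432 | 84.28 | 9.1 | 48.44 | −0.166 |
| Sobic.003G234701 | 3 | 3 | 4 | 504 | 113.4 | 39.5 | 504 | 85.36 | 10.06 | 57.1281 | −0.267 |
| Sobic.003G360300 | 3 | 7 | 8 | 426 | 114 | 20.5 | 426 | 88.56 | 9.52 | 48.56 | −0.268 |
| Sobic.003G331000 | 3 | 7 | 8 | 499 | 122.8 | 14 | 499 | 91.32 | 9.01 | 56.28 | −0.188 |
| Sobic.003G410800 | 3 | 3 | 4 | 420 | 112.3 | 2.5 | 420 | 85.29 | 6.42 | 47.15 | −0.0133 |
| Sobic.003G410700 | 3 | 3 | 4 | 417 | 111 | 2 | 417 | 84.7 | 6.42 | 46.90 | −0.117 |
| Sobic.009G220200 | 9 | 3 | 4 | 416 | 113.4 | 3 | 416 | 85.6 | 6.47 | 47.18 | −0.142 |
| Sobic.009G220100 | 9 | 3 | 4 | 415 | 112 | 2 | 415 | 85.33 | 6.37 | 46.56 | −0.113 |
| Sobic.008G077900 | 8 | 1 | 2 | 523 | 112.1 | 3 | 523 | 72.98 | 5.78 | 58.62 | −0.279 |
| Sobic.008G021000 | 8 | 1 | 2 | 495 | 110.7 | 13.5 | 495 | 75.47 | 8.92 | 54.80 | −0.0338 |
| Sobic.008G138100 | 8 | 0 | 1 | 586 | 111.6 | 22 | 586 | 71.42 | 9.4 | 65.42 | −0.465 |
| Sobic.004G070100 | 4 | 4 | 5 | 403 | 114 | 15.5 | 403 | 87.84 | 9.15 | 45.95 | −0.168 |
| Sobic.007G139300 | 7 | 1 | 2 | 523 | 112.7 | 31.5 | 523 | 82.01 | 9.86 | 58.95 | −0.274 |
| Sobic.001G506700 | 1 | 1 | 2 | 512 | 111.4 | 27 | 512 | 80.45 | 9.66 | 57.03 | −0.139 |
| Sobic.001G387300 | 1 | 1 | 2 | 462 | 112.6 | 7 | 462 | 82.58 | 7.66 | 52.03 | −0.219 |
| Sobic.001G506300 | 1 | 0 | 1 | 616 | 113.2 | −6 | 616 | 67.47 | 5.67 | 69.75 | −0.499 |
| Sobic.001G506800 | 1 | 1 | 2 | 529 | 111.5 | 25 | 529 | 82.8 | 9.54 | 58.95 | −0.161 |
| Sobic.001G506500 | 1 | 0 | 1 | 546 | 113.6 | 16 | 546 | 74.87 | 9 | 62.023 | −0.412 |
| Sobic.001G538700 | 1 | 2 | 3 | 429 | 113.2 | 22.5 | 429 | 92.59 | 9.54 | 48.55 | −0.079 |
| Sobic.001G486900 | 1 | 13 | 14 | 783 | 111 | 34.5 | 783 | 74.48 | 6.41 | 88.60 | −0.37 |
| Sobic.001G303300 | 1 | 0 | 1 | 450 | 112.2 | 12.5 | 450 | 85.47 | 8.54 | 50.49 | −0.278 |
| Sobic.001G229000 | 1 | 0 | 1 | 613 | 110.2 | 0.5 | 613 | 69.77 | 6.1 | 67.55 | −0.471 |
| Sobic.001G506900 | 1 | 0 | 1 | 513 | 111.4 | 14.5 | 513 | 73.45 | 8.79 | 57.15 | −0.361 |
| Sobic.001G506600 | 1 | 1 | 2 | 520 | 112.6 | 15 | 520 | 75.81 | 8.69 | 58.53 | −0.306 |
| Sobic.001G541601 | 1 | 0 | 1 | 206 | 110.3 | 10.5 | 206 | 82.43 | 6.11 | 22.70 | −0.233 |
| Sobic.001G228900 | 1 | 1 | 2 | 459 | 111.4 | 11 | 450 | 76.36 | 8.83 | 51.11 | −0.121 |
| Sobic.001G229100 | 1 | 0 | 1 | 490 | 111.1 | 17 | 490 | 73.92 | 9.23 | 54.42 | −0.232 |
| Sobic.006G186200 | 6 | 0 | 1 | 479 | 110 | 15.5 | 479 | 88.52 | 9.21 | 52.70 | 0.042 |
| Sobic.006G186100 | 6 | 1 | 2 | 568 | 110.4 | 10.5 | 568 | 75.92 | 8.77 | 62.70 | −0.302 |
| Sobic.006G260900 | 6 | 2 | 3 | 456 | 113.9 | 14.5 | 456 | 87.87 | 9.1 | 51.94 | −0.299 |
| Sobic.002G342300 | 2 | 2 | 3 | 456 | 111.7 | 11.5 | 456 | 90.29 | 8.93 | 50.94 | −0.221 |

MW, molecular weight; ARW, average residue weight; PI, isoelectric point; GRAVY, grand average of hydropathicity; AI, aliphatic index.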
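As a brief illustration of the GRAVY column, the grand average of hydropathicity is conventionally the mean Kyte-Doolittle hydropathy value over all residues of a sequence, with negative values indicating an overall hydrophilic protein. The sketch below is a minimal reimplementation under that assumption. It is not the tool used in this study, the function names are hypothetical, and the example input is Motif 2 from Table 5 rather than a full-length protein.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues of the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def is_hydrophilic(sequence: str) -> bool:
    """A negative GRAVY score is read as overall hydrophilic."""
    return gravy(sequence) < 0


if __name__ == "__main__":
    motif_2 = "RPYWRRSGGRDHFFV"  # Motif 2 as listed in Table 5
    print(round(gravy(motif_2), 3), is_hydrophilic(motif_2))
```

The lookup raises a `KeyError` on non-standard residue codes, which is a deliberate choice here so that ambiguous symbols are not silently scored.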
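Table 4. Subcellular localization of GT47 gene family in sorghum.

| Gene ID | Chlo | Vacu | Mito | Plas | Nucl | E.R. | Cyto_plas | Cyto | Extr | Golg_pla | Golgi | Pero | Cyto-nucl | E.R._plas | Subcellular localization (Significant) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sobic-GT47-01 | 13 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Chloroplast, mitochondrial |
| Sobic-GT47-02 | 5 | 1 | 6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Mitochondrial |
| Sobic-GT47-03 | 6 | 0 | 4 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Mitochondrial |
| Sobic-GT47-04 | 6 | 1 | 4 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Plasma membrane, chloroplast |
| Sobic-GT47-05 | 0 | 2 | 2 | 4.5 | 0 | 3 | 3.5 | 1.5 | 1 | 0 | 0 | 0 | 0 | 0 | Plasma membrane |
| Sobic-GT47-06 | 0 | 2 | 0 | 4.5 | 0 | 3 | 0 | 0 | 0 | 5 | 4.5 | 0 | 0 | 0 | Mitochondrial |
| Sobic-GT47-07 | 1 | 4 | 0 | 2 | 1 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | Plasma membrane |
| Sobic-GT47-08 | 0 | 3 | 0 | 4.5 | 0 | 2 | 0 | 0 | 0 | 5 | 4.5 | 0 | 0 | 0 | Lysosomal |
| Sobic-GT47-09 | 0 | 3 | 0 | 3.5 | 0 | 4 | 0 | 0 | 0 | 0 | 3.5 | 0 | 0 | 0 | Chloroplast, plasma membrane |
| Sobic-GT47-10 | 6 | 5 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Chloroplast |
| Sobic-GT47-11 | 0 | 5 | 2 | 1 | 1.5 | 0 | 0 | 2.5 | 0 | 0 | 1 | 1 | 2.5 | 0 | Nuclear |
| Sobic-GT47-12 | 7 | 1 | 1 | 1 | 1 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Mitochondrial, plasma membrane |
| Sobic-GT47-13 | 1 | 0 | 1 | 0 | 0 | 5.5 | 0 | 5.5 | 0 | 0 | 0 | 0 | 3.5 | 3.5 | Mitochondrial, chloroplast |
| Sobic-GT47-14 | 6 | 2 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Mitochondrial, plasma membrane |
| Sobic-GT47-15 | 3 | 3 | 2 | 2 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Plasma membrane |
| Sobic-GT47-16 | 3 | 1 | 2 | 3 | 2 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | Chloroplast, cytoplasmic |
| Sobic-GT47-17 | 5 | 2 | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | Plasma membrane, mitochondrial |
| Sobic-GT47-18 | 6 | 0 | 4 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | Mitochondrial |
| Sobic-GT47-19 | 6 | 2 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Mitochondrial, plasma membrane |
| Sobic-GT47-20 | 0 | 7 | 1 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | Extracellular |
| Sobic-GT47-21 | 5 | 3 | 0 | 0 | 1 | 1 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | Mitochondrial, nuclear |
| Sobic-GT47-22 | 0 | 1 | 2 | 0 | 2 | 4 | 0 | 4 | 0 | 0 | 1 | 0 | 0 | 0 | Chloroplast |
| Sobic-GT47-23 | 4 | 1 | 4 | 2 | 1.5 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1.5 | 0 | Extracellular, mitochondrial |
| Sobic-GT47-24 | 9 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Mitochondrial |
| Sobic-GT47-25 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | Chloroplast, nuclear |
| Sobic-GT47-26 | 8 | 3 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Plasma membrane |
| Sobic-GT47-27 | 0 | 0 | 2 | 0 | 1 | 4 | 0 | 6 | 1 | 0 | 0 | 0 | 0 | 0 | Chloroplast |
| Sobic-GT47-28 | 3 | 2 | 7 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Plasma membrane |
| Sobic-GT47-29 | 1 | 3 | 0 | 10 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | Plasma membrane |
| Sobic-GT47-30 | 3 | 6 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Plasma membrane, mitochondrial |
| Sobic-GT47-31 | 2 | 1 | 0 | 8 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | Plasma membrane |

Vacu, vacuole; Cyto, cytoplasmic/cytosolic; Chlo, chloroplast; Extr, extracellular; Mito, mitochondria; Plas, plasma membrane; Nucl, nuclear; E.R., endoplasmic reticulum; Pero, peroxisome.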
Table 5. Description of motifs found in the GT47 gene family in sorghum (Sorghum bicolor).

| Motif No. | Sequence | Length | Description |
|---|---|---|---|
| 1 | GDTPTRRRLFDAIVAGCIPVI | 21 | Exostosin family |
| 2 | RPYWRRSGGRDHFFV | 15 | Exostosin family |
| 3 | YECRTNDPAEADAVFVPFYAG | 21 | Exostosin family |
Table 6. The secondary structure statistics of the GT47 proteins in sorghum.

| Protein | Alpha Helix (%) | Extended Strand (%) | Beta Turn (%) | Random Coil (%) |
|---|---|---|---|---|
| Sobic.003G405600 | 34.95 | 15.05 | 4.63 | 45.37 |
| Sobic.003G234701 | 33.93 | 12.10 | 4.76 | 49.21 |
| Sobic.003G360300 | 37.56 | 11.03 | 3.05 | 48.36 |
| Sobic.003G331000 | 34.67 | 14.43 | 3.81 | 47.09 |
| Sobic.003G410800 | 37.62 | 14.52 | 5.71 | 42.14 |
| Sobic.003G410700 | 32.85 | 15.35 | 5.04 | 46.76 |
| Sobic.009G220200 | 33.65 | 15.14 | 4.57 | 46.63 |
| Sobic.009G220100 | 34.94 | 15.90 | 5.30 | 43.86 |
| Sobic.008G077900 | 39.01 | 12.62 | 4.02 | 44.36 |
| Sobic.008G021000 | 29.09 | 15.56 | 4.85 | 50.51 |
| Sobic.008G138100 | 32.94 | 12.97 | 4.10 | 50.00 |
| Sobic.004G070100 | 36.72 | 13.65 | 3.97 | 45.66 |
| Sobic.007G139300 | 28.68 | 16.63 | 3.44 | 51.24 |
| Sobic.001G506700 | 35.94 | 14.26 | 4.10 | 45.70 |
| Sobic.001G387300 | 30.95 | 16.45 | 5.84 | 46.75 |
| Sobic.001G506300 | 37.66 | 10.23 | 4.71 | 47.40 |
| Sobic.001G506800 | 40.64 | 10.96 | 4.16 | 44.23 |
| Sobic.001G506500 | 37.73 | 13.37 | 4.03 | 44.87 |
| Sobic.001G538700 | 37.53 | 11.66 | 4.90 | 45.92 |
| Sobic.001G486900 | 25.42 | 11.37 | 3.70 | 59.51 |
| Sobic.001G303300 | 37.33 | 14.67 | 4.22 | 43.78 |
| Sobic.001G229000 | 34.42 | 9.79 | 5.38 | 50.41 |
| Sobic.001G506900 | 35.48 | 13.26 | 4.68 | 46.59 |
| Sobic.001G506600 | 39.23 | 12.50 | 3.08 | 45.19 |
| Sobic.001G541601 | 41.26 | 15.05 | 2.91 | 40.78 |
| Sobic.001G228900 | 36.17 | 12.42 | 3.27 | 48.15 |
| Sobic.001G229100 | 34.29 | 14.29 | 3.47 | 47.96 |
| Sobic.006G186200 | 38.83 | 13.36 | 4.59 | 43.22 |
| Sobic.006G186100 | 30.99 | 11.80 | 3.87 | 53.35 |
| Sobic.006G260900 | 34.21 | 13.16 | 5.04 | 47.59 |
| Sobic.002G342300 | 30.26 | 17.98 | 5.48 | 46.27 |
Table 7. Synonymous and non-synonymous substitution rates in GT47 genes.

| Sr. No. | Paralogous Gene Pair | Ka | Ks | Ka/Ks | Duplication Type | Selection Pressure | Time (MYA) |
|---|---|---|---|---|---|---|---|
| 1 | Sobic.003G410800 / Sobic.009G220100 | 0.042 | 0.508 | 0.083 | Segmental | Purifying | 1.65 × 10^−9 |
| 2 | Sobic.003G410700 / Sobic.009G220200 | 0.064 | 0.521 | 0.124 | Segmental | Purifying | 1.69 × 10^−9 |
| 3 | Sobic.003G234701 / Sobic.004G070100 | 0.579 | 2.376 | 0.244 | Segmental | Purifying | 7.92 × 10^−10 |
| 4 | Sobic.003G331000 / Sobic.002G342300 | 0.618 | 1.983 | 0.312 | Segmental | Purifying | 6.44 × 10^−9 |
| 5 | Sobic.008G021000 / Sobic.001G387300 | 0.239 | 1.230 | 0.195 | Segmental | Purifying | 4.00 × 10^−9 |
| 6 | Sobic.001G303300 / Sobic.001G541601 | 0.186 | 0.402 | 0.464 | Segmental | Purifying | 1.31 × 10^−9 |
| 7 | Sobic.006G186200 / Sobic.006G186100 | 0.302 | 0.445 | 0.678 | Tandem | Purifying | 1.45 × 10^−9 |
| 8 | Sobic.008G077900 / Sobic.008G138100 | 0.471 | 0.787 | 0.598 | Tandem | Purifying | 2.56 × 10^−9 |
| 9 | Sobic.001G506300 / Sobic.001G229000 | 0.391 | 1.531 | 0.256 | Tandem | Purifying | 4.98 × 10^−9 |
| 10 | Sobic.001G506700 / Sobic.001G506800 | 0.088 | 0.166 | 0.527 | Tandem | Purifying | 5.41 × 10^−10 |
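The Ka/Ks column and the selection labels in Table 7 follow from a simple ratio, sketched below. The classification rule (ratio below 1 as purifying, above 1 as positive) is the usual convention. The divergence-time helper uses the common approximation T = Ks / (2λ), and the rate λ = 6.5 × 10^−9 substitutions per synonymous site per year is an assumed value often used for grasses; it is not taken from this article, so the helper is not expected to reproduce the Time column as printed. All function names are hypothetical.

```python
# (gene 1, gene 2, Ka, Ks) for three pairs copied from Table 7.
PAIRS = [
    ("Sobic.003G410800", "Sobic.009G220100", 0.042, 0.508),
    ("Sobic.006G186200", "Sobic.006G186100", 0.302, 0.445),
    ("Sobic.001G506700", "Sobic.001G506800", 0.088, 0.166),
]

# ASSUMPTION: synonymous substitution rate per site per year (not from the article).
ASSUMED_RATE = 6.5e-9


def ka_ks(ka: float, ks: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    return ka / ks


def selection_pressure(ratio: float) -> str:
    """Conventional reading of the Ka/Ks ratio."""
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def divergence_time_mya(ks: float, rate: float = ASSUMED_RATE) -> float:
    """Approximate divergence time T = Ks / (2 * rate), in million years."""
    return ks / (2 * rate) / 1e6


if __name__ == "__main__":
    for gene_1, gene_2, ka, ks in PAIRS:
        ratio = ka_ks(ka, ks)
        print(gene_1, gene_2, round(ratio, 3), selection_pressure(ratio),
              round(divergence_time_mya(ks), 2))
```

For the three pairs shown, the recomputed ratios round to the Ka/Ks values listed in the table, and all fall below 1, consistent with the purifying-selection labels.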
Table 8. Cis-acting elements in promoter regions of GT47 genes.
Table 8. Cis-acting elements in promoter regions of GT47 genes.
| Cis-Element | Function |
|---|---|
| ABAE | Involved in abscisic acid responsiveness |
| IAAE | Involved in auxin responsiveness |
| SAE | Involved in salicylic acid responsiveness |
| GAE | Involved in gibberellin responsiveness |
| SARE | Involved in systemic acquired resistance |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
