Differential Evolution of α-Glucan Water Dikinase (GWD) in Plants

The alpha-glucan water dikinase (GWD) enzyme catalyzes starch phosphorylation, an integral step in transitory starch degradation. The high phosphate content in stored starch has great industrial value, due to its physio–chemical properties making it more versatile, although the phosphate content of stored starch varies depending on the botanical source. In this study, we used various computational approaches to gain insights into the evolution of the GWD protein in 48 plant species with possible roles in enzyme function and alteration of phosphate content in their stored starch. Our analyses identified deleterious mutations, particularly in the highly conserved 5 aromatic amino acid residues in the dual tandem carbohydrate binding modules (CBM-45) of GWD protein in C. zofingiensis, G. hirsutum, A. protothecoides, P. miliaceum, and C. reinhardtii. These findings will inform experimental designs for simultaneous repression of genes coding for GWD and the predicted interacting proteins to elucidate the role this enzyme plays in starch degradation. Our results reveal significant diversity in the evolution of GWD enzyme across plant species, which may be evolutionarily advantageous according to the varying needs for phosphorylated stored starch between plants and environments.


Introduction
The plastid-localized alpha-glucan water dikinase (GWD) gene is encoded by the nuclear genome and its products catalyzes reactions responsible for starch phosphorylation, an essential step in the de novo biosynthesis of this polysaccharide [1]. Generally, two plastidial isoforms of GWD; GWD1 (EC 2.7.9.4) and phosphoglucan water dikinase (PWD) or GWD3 (PWD; EC 2.7.9.5) have been identified in most plants [2]. The dikinase mechanism of GWD and PWD involve autophosphorylation of the catalytic histidine by the β-phosphate of ATP, and the transfer of β-phosphate from the stable phosphohistidine to either the C3 or C6 position of the glucosyl residue of starch, while the γ-phosphate is transferred to water [1,3]. Glucosyl residues in its C6 positions are phosphorylated by GWD1 [4] while pre-phosphorylated starch is further phosphorylated in its C3 positions by PWD [5,6]. Strong experimental evidence demonstrates that PWD acts downstream of GWD1 because repression of GWD1 activity leads to elimination of PWD activity [7]. Furthermore, GWD1 and PWD have been linked to starch degradation. While less is known about this phenomenon, the model for plastidial starch degradation starts with its phosphorylation, orchestrated by the activities of GWD1 and PWD [2]. In addition to the pervasive GWD isoforms, another isoform, GWD2, uniquely localized in the cytosol, has been identified [8]. Mainly because of its cytosolic localization, GWD2 has not been associated with transitory starch degradation [8].
GWD's importance is not limited to its role in the biosynthesis or degradation of starch, but also accounts for starch-bound phosphate (SBP) in industrial starches. High phosphate content is desirable in industrial starches because it prevents or minimizes the need for artificial increase of phosphate starch content, with costly chemicals that are less-friendly to the environment. Nevertheless, an artificial increase in starch phosphate is often necessary because SBP is species and tissue type-specific in planta. For example, Zea mays seeds from which most industrial starch is extracted have negligible SBP [9], whereas 0.1-0.5% of glucose residues are phosphorylated in Solanum tuberosum [10]. Biotechnological processes provide safer alternatives to artificial augmentation of starch phosphate, and their application have yielded tremendous results [11]. Overexpression of S. tuberosum GWD led to double the amount of SBP in Manihot esculenta [12], while a six-fold increase was reported in S. tuberosum when the two isoforms of the starch branching enzyme were simultaneously repressed [13], including an increase in SBP when engineered laforin (a human enzyme) was expressed by potato [11].
Therefore, varying levels of phosphorylated starch prompts the hypothesis that GWD has undergone differentiation in different plant species, thereby altering its catalytic activity or substrate binding capacity. Enzymatic evolution, allow organisms to cope with changing and adverse environmental conditions, by providing novel molecular machinery capable of diversifying biochemical reactions [14]. In fact, Thalman and Santelia [15], opined that plants mobilize starch to cope during abiotic stress, strengthening the argument for the species-based evolution of GWD during contrary environmental pressures. Such evolution occurs via incremental residue mutation; insertion or deletion (indel) of amino acid residues; gene fusion and fission; protein oligomerization and post-transitional modification [16], which often results in substrate and functional selectivity. It may also alter binding specificity between enzyme and its substrate, this contingent on the degree of structural modification [17]. Variants of some of the reported proteins (with the same tertiary structure and performing similar biological functions but with different amino acid sequences) have been linked to the evolutionary divergence of species [18]. However, amino acid residues needed for maintaining protein 3D structure and biological activity are usually conserved while amino acid substitutions are common in positions that are less critical for function. Orthologs may retain biological function and structure across species, while modifications in other species [19,20] may have resulted in loss [21,22] or gain of functions [23][24][25].
The characterization of specific enzymatic functions is time and resource-intensive [26]. There is a rising number of gene and protein sequences available in public databases, enabling rapid and economically feasible evolutionary and functional prediction studies [16]. In this study, we utilized a comprehensive computational approach to understand GWD protein sequence diversity in selected agronomical and model plant species, and its contribution to substrate specificity and function. The knowledge generated will expedite functional studies into the biology and biochemistry of starch phosphorylation and guide gene editing approaches to potentially generate high phosphate starch in agro-economically important crop species.

Sequence Curation and Multiple Sequence Alignment
We studied the molecular evolution of the GWD enzyme in the following plant divisions (phyla); rhodophytes, chlorophytes, lycophytes, bryophytes and magnoliophytes (angiosperms). GWD protein sequences were curated from the following databases; UniProtKB/Swiss-Prot, GenBank, and Phytozome. Their designated identities/accession numbers are shown on Supplementary All the GWD protein sequences were compiled into a single file in Jalview 2.11.0 [27], and saved in a FASTA format. We computed multiple sequence alignment on the phylogenetic platform [28], utilizing Multiple Alignment using the Fast Fourier Transform (MAFFT) [29], with the following settings: gap extend penalty (0.123); gap opening penalty (1.53). Accuracy of phylogenetic inference was enhanced by block mapping and gathering with entropy (BMGE) [30]. BMGE implements trimming to reduce artefacts, therefore, only phylogenetic informative regions from such trimmed outputs were used to reconstruct the phylogenetic tree. Maximum likelihood phylogenetic tree PhyML [31] was generated with the following settings; the model selection option is Smart Model Selection (SMS) method [32] and AIC scores. Node support was also determined using approximate likelihood ratio test with Shimodaira-Hasegawa-like estimate (SH-like aLRT) [33]. The phylogenetic tree was formatted using interactive Tree Of Life (iTOL) [34].

Comparative Analysis of GWD Protein Primary Structure
The physical and chemical properties of GWD orthologs were analyzed with ProtParam [35]. The following properties were calculated: aliphatic index, instability index, molecular weight and grand average of hydropathicity (GRAVY), as described [36]. Cysteine count in each of the GWD sequences and disulfide connectivity predictions was done according to the method of Ferre and Clote [37,38].

Prediction of Signal Sequence, Motif Scanning and Analysis of Domains
The N-terminal region of the multiple sequence alignment (MSA) of all the GWD amino acid sequences was scanned for the presence of carbohydrate binding module (CBM). Using the potato GWD sequence as reference, the two tandems of CBM45 were extracted from the MSA with Snapgene version 4.3.8 (San Diego, CA, United States of America) and formatted by Corel X8. The MSA was also scanned for redox-regulation motif (CFATC) as reported [39]. Prediction of the presence of transit signal peptides, subcellular localization and cleavage site was carried out using TargetP-2 [40]. Some of the result of TargetP-2 was compared with those obtained from Multiloc2, which uses MultiLoc2-LowRes (Plant), 5 localizations prediction methods [41].

Amino acid Substitution Prediction
To test the effect of substitution and deletion among the conserved five aromatic amino acids in the two CBM45 modules and the redox-regulation motif, we used Protein Variation Effect Analyser (PROVEAN) to predict mutation effects [42]. The potato GWD sequence was used as the query sequence and effects of point mutations in those positions were predicted. The substitution or deletion effect was scored deleterious (capable of altering enzyme function) if the score was 2.5 or higher or scored neutral (enzyme still able to perform its function) if it was lesser than 2.5.

Sequence Comparison and Developmental Pattern
GWD transcripts were found in all the plant phyla examined, i.e., rhodophytes, chlorophytes, lycophytes, bryophytes and angiosperms. The phylogenetic tree grouped all the plants in their respective phyla, angiosperms branched further into asterids and rosids. For instance, in Fabidae (a rosid family), F. vesca, M. domestica, C. melo, G. max, P. vulgaris and V. unguiculata were all grouped in the same clade. Our result showed that GWD enzyme in rhodophytes (P. umbilicalis and C. crispus) was most distant from the rest of plant phyla, forming an out-group ( Figure 1). The primary structure of GWD protein in rhodophytes may have been the closest to the ancestral GWD protein of other phyla due to shortness of its branch and absence of diverged clade. Conversely, A. protothecoides a chlorophyte may have diverged the most from ancestral protein of GWD, going by its branch length, followed by another chlorophyte; C. zofingiensis. Two highly similar GWD sequences were found for P. patens-GWDa and GWDb. GWD duplication, which occupies a distinctive clade and referred to as GWD2 occurred in A. trichopoda and the dicot families; malpighiales, malvales, citrus, brassicales and brassicaceae, but was not found in monocots or other dicots. The average branches from the root of GWD2 clade and its sub-clades are longer and more varied when compared with those of GWD1 in angiosperm group. However, slight sequence divergence existed between GWD1 of the angiosperm group and its paralog GWD2, as indicated by their close proximities to the root of their clades.

Assessment of Physio-Chemical Properties of GWD Protein Sequences
We examined the primary structure of GWD protein sequences for the plants under study to elucidate its function (Table 1). We report the average length of GWD1 to be 1422 aa while that of GWD2 is~1285 aa. The aliphatic index of GWD protein was moderately high, regardless of the protein subgroup, indicating moderate protein thermostability and solubility. The in vivo half-life of GWD1, measured by instability index, revealed a longer half-life (instability index < 40) in A. protothecoides  (14). The number of cysteines in GWD2 subgroup however was unusually higher among plants with this paralog (doubled for most of the plants and nearly tripled in B. oleracea-31). Furthermore, though the GWD1 of rhodophytes, chlorophytes and bryophytes had higher cysteine composition than angiosperms, some fluctuation was recorded; slight difference among rhodophytes; P. umbilicalis (18) and C. crispus (15), gave way to wide variation among chlorophytes; C. zofingiensis (18), A. protothecoides (10) and C. reinhardtii (24), with bryophytes following a similar trend.

Prediction of Transit Peptide, Motif and Analysis of Functional Domain
We scanned the GWD1 and GWD2 peptide sequences for presence of organelle-targeting signal sequences, expected to reside at the N-terminus. Our result with TargetP-2 show that signal sequences exist in the GWD1 of most of the plants ( Table 2); C. reinhartii, all monocots except M. acumulata and H. vulgare, solanecea, malpighiales, malvales, citrus, brassicales and all brassicaceae except M. maritima, predicted to be targeting chloroplast. However, no signal sequence was identified for any of the GWD2. Additionally, our analysis predicted signal sequence for P. patens GWDb, H. vulgare and M. maritima to be mitochondrial transit peptide (mTP). Furthermore, the predicted length of the signal sequence varied, with D. alata possessing the longest (between 96-97 amino acids) and H. vulgare with the shortest (16-17).

Carbohydrate Binding Module (CBM45) Tandem Domains
The existence of two tandem starch binding domains (SBD), known as carbohydrate binding module 45 (CBM45), on CaZy database [43], was previously reported [44,45]. We examined the presence of this motif, with emphasis on the five aromatic amino acid residues crucial for its function. In most of the phyla, the five aromatic amino acid residues were conserved in the two CBM45 tandems, designated CBM45-1 (Figure 2a) and CBM45-2 ( Figure 2b). However, some deletions and substitutions were observed. For instance, we found a complete deletion in the GWD1 of C. zofingiensis and G. hirsutum, and partial deletions in A. protothecoides, C. rabica, P. miliaceum, and the GWD2 of M. esculenta. The five aromatic amino acid residues were mostly conserved in CBM45-2 of the phyla regardless of protein subgroup (GWD1 or GWD2), except in T. cacao and P. miliaceum. Contrarily, almost all the five aromatic amino acids were conserved in CBM45-2 of A. protothecoides, except the second phenylalanine which was substituted by tyrosine (Y). Furthermore, we observed that the CFACT motif linked to redox regulation of GWD was present and conserved in most of the plants except in rhodophytes, lycophytes, most of the chlorophytes (Figure 3), and mostly in grasses (monocots).

Analysis of Amino Acid Mutation
The potato GWD sequence was used as the reference sequence in PROVEAN to predict whether the variant effect in plants where substitution of conserved amino acids and redox regulation motifs were observed is deleterious or neutral. We observed a total of 55 amino acid variants; 37 for the conserved tandem domains and 18 for the predicted redox regulation motifs. In the tandem domains, neutral mutation (F substituted with Y) was predicted only in one position for each of A. protothecoides and P. patens GWDa and b, M. polymorpha and S. magellanicum (Table 3)

Analysis of Amino Acid Mutation
The potato GWD sequence was used as the reference sequence in PROVEAN to predict whether the variant effect in plants where substitution of conserved amino acids and redox regulation motifs were observed is deleterious or neutral. We observed a total of 55 amino acid variants; 37 for the conserved tandem domains and 18 for the predicted redox regulation motifs. In the tandem domains, neutral mutation (F substituted with Y) was predicted only in one position for each of A. protothecoides and P. patens GWDa and b, M. polymorpha and S. magellanicum (Table 3). Deleterious mutation was predicted for the remaining 33 variants in the conserved tandem domains of the following plants; C. crispus, C. zofingiensis, G. hirsutum (GWD1), A. protothecoides, P. miliaceum, C. arabica, M. esculenta (GWD2), C. reinhardti and P. umbilicalis. Contrarily, mutations at redox regulation motif were neutral in most of the positions, except seven positions, involving three positions for C. crispus (C1079G, F1080L and A1081V), L. usitatissimum/GWD2 (F1080L) and P. umbilicalis (C1079G, F1080L and A1081V).

Discussion
We examined alpha-water dikinase (GWD) sequences to elucidate its evolution in five plant phyla utilizing various computational approaches. Our phylogenetic tree reveals that the GWD enzyme may have evolved different protein sequences, while maintaining core function of the ancestral gene. This might be due to the fact that the selection pressure that led to the evolution of these sequences ensured that ancestral enzyme function is maintained in descendant species [46]. Two copies of the GWD gene were also observed in some angiosperm families, suggesting a divergence in function between GWD1 and GWD2 paralogs of these species. A different functional evolution may exist regarding PWD, another paralog of GWD, as described [2], but not the focus of current study. In M. esculenta, GWD2 and the other GWD isoforms were implicated by Zhou et al. [45] in storage starch degradation, due to increased expressions during post-harvest physiological deterioration where its activity was linked to seed development in A. thaliana and not the degradation of transitory starch at night [8]. However, gain of GWD2 in several plant families points to another unique event during GWD evolution. A similar occurrence has been reported in some other plants with an isoform of starch synthase (SSIV) and starch branching enzyme (SBEI) in A. thaliana and B. rapa, and a third isoform of starch phosphorylase (PHO3) [47][48][49]. Our result demonstrates that GWD1 sequences of rhodophytes are basal first from the rest of the phyla, indicating it is the most ancestral compared to other phyla.
Evolutionary events like amino acid substitutions, deletions, insertions, domain and gene duplications observed in the MSA suggest that GWD enzymatic activities may have been further modified post-speciation, possibly in response to environmental selection pressure. A recent report showed that these changes are capable of causing minor functional (creeping evolution) or drastic change [16]. Out of the nine cysteine residues present in potato GWD, only two were experimentally shown to form reversible disulfide linkage [39], which activate and inactivate GWD. While it is unclear why the number of cysteine residues is remarkably higher in C. reinhardtii or all the GWD2 protein sub-groups, we hypothesize that the occurrence of more cysteine may not only contribute to the stability of its GWD protein structure, but may increase its ability to form more disulfide connections with other cellular molecules. In addition, abundance of conserved glycine residues at various positions in the C-terminus, possibly implies that this amino acid is functionally important [9].
The aliphatic index was generally high for all the plants, indicating GWD's thermal stability, potentially a requirement for maintaining a periodical soluble form. According to Ritte et al. [50], GWD partly exist in a soluble form in illuminated leaves, and thermostability is correlated with protein solubility [51]. While other factors such as protein aggregation propensity and folding rate may also contribute to solubility [52], high thermostability may prevent GWD structural decomposition during photosynthesis.
We detected a predicted localization signal in almost all angiosperms but almost none in lower plants, implying that during evolution, GWD1 gained the ability to be chloroplast localized from its ancestral state and it has been maintained. However, we found no signal sequence in GWD2, possibly a deletion for novel localization-based function in those plants. However, the enzyme may be transited by through complexing with other proteins targeting plastids or mitochondria. Both cTP and mTP had been postulated to share similar evolutionary mechanisms [53] and their current functions were suggested to emanate from selection pressure, leading to shuffling and streamlining of separate exons to form multiple domains. This may have been responsible for the high variation we observed in the predicted amino acid sequences, lengths and arrangement, as reported [54,55]. This location at the N-terminus in MSA was also the least conserved region, giving an indication of the intense selection pressure in the evolution of the current function of transit peptides. The absence of transit peptides in GWD2 in the plant species in which it is present supports the fact that they are localized in the cytosol [8]. Finally, recent studies have established that cTP and mTP contain specific motifs [56] and amino acid residues [57,58], with which they either interact with cytosolic complexes containing Hsp70 and Hsp90 (molecular chaperones that target transit peptides and guide the preprotein to the outer membranes of the organelles) or TOC-TIC (chloroplast)/TOM-TIM (mitochondria) protein import machinery [59,60]. Proline has been reported to play a crucial role in efficient protein translocation into the chloroplast through import channels. However, proline was not conserved at any common location in the predicted signal sequences, though it was abundant in all the plants whose GWD were predicted to bear cTP. Hence, our prediction might be helpful in identifying those motifs embedded in the respective transit peptides of some of the plants and such knowledge may be useful in elucidating GWD transit to localized sites in plants.
The conversion of starch into soluble sugar, a process known as amylolysis, is an essential process in planta. However, the semi-crystalline structure of starch poses a limitation to enzymes involved in its breakdown. For efficient amylolysis, such enzymes have evolved substrate-binding sites on the catalytic module [61] or specialized carbohydrate-binding-module (CBM), known as starch-binding domain (SBD) [62]. CBM is an auxiliary module (classified into several families) of about 40 to 200 amino acids with a distinctive fold with which it binds to carbohydrate and adjacent to the carbohydrate-active enzyme [63]. It exists as two tandem domains in most of the plants studied and a similar domain arrangement was found in the plastidial α-amylase AMY3 in A. thaliana (separated by 50 linker amino acids), whose functional role as starch (CBM) was inferred from knock-out studies involving phosphoglucan phosphatase (SEX4) [64]. The two modules were suggested to arise as a result of gene duplication (Mikkelsen et al., 2006), and may have led to alterations in the functionality of the conjoining catalytic domain [16]. For example, in a transgenic potato line in which CBM45-1 was missing, the shorter glucan chains were preferred substrates for phosphorylation [43]. However, the stored starch in the endosperm of Z. mays and H. vulgare has been reported to have negligible phosphate content, even though the two CBM45 were fully present in GWD sequences of both plants. We suspect that for both CBM45 tandem domains in Z. mays and H. vulgare, degenerative mutations may have occurred within the domains, leading to altered enzyme structure and function. However, the five aromatic amino acid residues in CBM45s were conserved across most of the plants and various mutations in these residues were highly deleterious. Mutation of these conserved residues may be due to a more intense selection pressure. These mutations may affect glucan substrate specificity or alter its folding or function. It is unclear how GWD1 enzyme in G. hirsutum and C. zofingiensis successfully perform its function because the CBM45-1 tandem was completely absent, and their absence was predicted to be deleterious. We surmise that this enzyme may have evolved effective use of CBM45-2 in performing its function, or it may form complexes with other proteins, thereby providing a robust alternative to the deleterious effects of the mutation. Predicted deleterious mutations may affect the starch binding ability of GWD. For plants in which mutations at GWD redox motif may have occurred, they may utilize another cysteine in the sequence or devise another means by which it redox regulate the enzyme.

Conclusions
In this study, we used various computational approaches to compare the GWD sequences of 48 plant species, with our results providing an insight into the evolutionary variation in GWD catalytic activity among plants. Deleterious mutations were identified for some plants at various positions of the five aromatic amino acids, which are highly conserved in tandems of CBM45 and vital for binding of the enzymes to starch. These mutations may be responsible for altered carbohydrate binding activity of GWD in plants, thereby affecting phosphorylation of transit and stored starch. This study has the potential to guide ongoing and future efforts towards producing modified starch in plants, targeted at reducing environmental pollution occasioned by the artificial modification of industrial starch phosphate content. Finally, this study has provided a broader understanding of the complexity involved in the evolution of GWD enzymes in various plants during speciation.

Declaration of Competing Interest
Authors declare we have no competing financial or personal interest.