The GASA Gene Family in Cacao (Theobroma cacao, Malvaceae): Genome-Wide Identification and Expression Analysis

The gibberellic acid-stimulated Arabidopsis (GASA/GAST) gene family is widely distributed in plants and is involved in various physiological and biological processes. GASA genes also confer resistance to abiotic and biotic stresses, including antimicrobial, antiviral, and antifungal activity. We aimed to characterize the GASA gene family and determine its roles in various physiological and biological processes in Theobroma cacao. Here, we report 17 tcGASA genes distributed on six chromosomes in T. cacao. The gene structure, promoter regions, protein structure and biochemical properties, expression, and phylogenetics of all tcGASAs were analyzed. Phylogenetic analyses divided tcGASA proteins into five groups. Among the 17 tcGASA genes, nine segmentally duplicated genes were identified; they formed four groups and clustered together in the phylogenetic tree. Differential expression analyses revealed that most tcGASA genes were highly expressed in the seeds (the edible part of cacao), implying a role in seed development. tcGASAs were also differentially expressed between tolerant and susceptible cacao cultivars, indicating a possible role in fungal resistance. Our findings provide new insight into the function, evolution, and regulation of GASA family genes in T. cacao and may suggest new target genes for developing fungus-resistant cacao varieties in breeding programs.


Introduction
Theobroma cacao L. belongs to the family Malvaceae [1]. It is an economically important tree grown in up to 50 countries in the humid tropics [2]. Its seeds, enclosed in pods, are used for chocolate production, confectionery, and cosmetics [3]. Because the plant is adapted to high-humidity areas, it is predisposed to various fungal diseases [4,5]. Pod rot, or black pod, is caused by Phytophthora species (P. megakarya, P. palmivora, and P. capsici) and leads to a 20-30% loss in yield and the death of 10% of trees [4]. Whole-genome sequencing is helping to uncover the genetic bases of responses to biotic and abiotic stresses [2]. The availability of a high-quality, chromosome-level genome assembly of Theobroma cacao [2,6] provides a resource for characterizing gene families and elucidating their roles in cacao development. However, to the best of our knowledge, only a few gene families, such as WRKY [7], sucrose synthase [8], stearoyl-acyl carrier protein desaturase [9], sucrose transporter [10], and NAC [11], have been characterized in Theobroma cacao.
The gibberellic acid-stimulated Arabidopsis (GASA/GAST) gene family is widely distributed in plants and performs various functions [12,13]. GAST1, identified in tomato, was the first member of the GASA family to be described [14]. GASA proteins comprise three domains: an N-terminal signal peptide of 18-29 amino acids, a hydrophilic and highly variable central region of 7-31 amino acids, and a conserved C-terminal domain of up to 60 amino acids that usually contains 12 cysteine residues [15-17]. The C-terminal domain is characteristic of all identified GASAs [12,15,16,18]. These cysteine-rich peptides play vital roles in various plant processes, including organ development [19], lateral root development [20], stem growth [21], cell division [22,23], fruit ripening and development [24], flowering time [12,18,21], seed development [13,22], and bud dormancy [25]. Detailed expression analyses of various tissues of Arabidopsis, tomato, rice, soybean, and apple showed tissue-specific expression of GASA genes [12,15,18]. For example, the tomato GASA genes Solyc11g011210, Solyc12g089300, and Solyc01g111075 showed high expression at the fruit-ripening stage, whereas Solyc03g113910, Solyc01g111075, Solyc06g069790, and Solyc12g042500 were highly expressed at the flowering stage [15]. Moreover, some studies have shown opposing effects of GASA proteins; for instance, AtGASA4 promotes flowering [22], whereas AtGASA5 has the opposite effect [21].
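The conserved, cysteine-rich C-terminal domain described above can be screened for with a simple heuristic. The sketch below is only an illustration, not the procedure used in this study (which relied on the CDD database, see Methods). It assumes the domain can be approximated by the last 60 residues of a protein and counts the cysteines there. The function names and the toy sequence are hypothetical.

```python
# Heuristic sketch (assumption): approximate the GASA C-terminal domain as the
# last `window` residues and check for the typical count of 12 cysteines.

def c_terminal_cysteines(protein: str, window: int = 60) -> int:
    """Count cysteine residues in the last `window` residues of a protein."""
    return protein.upper()[-window:].count("C")


def looks_like_gasa(protein: str, window: int = 60, expected: int = 12) -> bool:
    """Return True if the C-terminal window holds the expected number of cysteines."""
    return c_terminal_cysteines(protein, window) == expected


# Hypothetical toy sequence, not a real tcGASA protein.
toy = "M" + "A" * 40 + "CA" * 12 + "K" * 24
print(c_terminal_cysteines(toy), looks_like_gasa(toy))
```

A real screen would use a domain model (as CDD does) because the cysteines are not always confined to exactly the last 60 residues. The count is only a quick sanity check on candidate sequences.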
GASA genes also play crucial roles in responses to biotic, abiotic, and hormone-related stresses. For instance, overexpression of GmSN1, a member of the GASA gene family, enhances virus resistance in Arabidopsis and soybean [26]. Similarly, a high expression level of CcGASA4 was reported in citrus leaves after infection with Citrus tristeza virus [27]. Antimicrobial properties have also been reported for various GASA proteins [28-30]. In potato, GASA proteins with antifungal activity have been reported in almost all tissues, including roots, tubers, leaves, stems, stolons, axillary buds, and flowers [28-30]. Antifungal activity of GASA members has also been found in Arabidopsis, tomato, alfalfa, and jujube [16,31-33]. GASAs also confer resistance to abiotic stresses such as salt and drought [34]. In Arabidopsis, GASA4 and GASA6 are induced by growth hormones such as auxin, brassinosteroids (BR), gibberellic acid (GA), and cytokinin, and repressed by stress hormones including salicylic acid (SA), abscisic acid (ABA), and jasmonic acid (JA) [31]. The expression patterns and evolutionary relationships of GASA genes have been studied in Arabidopsis [16], apple [12], common wheat [35], grapevine [13], soybean [18], and potato [36].
Here, we aimed to characterize the GASA gene family and to provide expression data for this family, specifically in relation to fungal diseases. To the best of our knowledge, we are the first to report the genomic distribution, chemical properties, subcellular localization, and promoter cis-regulatory elements of GASAs in cacao. We also explored the roles of GASA genes under various abiotic and biotic stresses. This allowed us to identify GASAs that are highly expressed in response to infection by the fungus Phytophthora megakarya.

Identification of GASA Genes in the Genome of Theobroma cacao and Analyses for Conserved GASA Domain
We retrieved GASA protein sequences from The Arabidopsis Information Resource (TAIR10) database (ftp://ftp.arabidopsis.org) and used them as BLAST queries to identify GASA genes in the Theobroma cacao genome, with an E-value cutoff of 1e-10. GASA genes were identified in the latest version of the Theobroma cacao genome (Theobroma cacao Belizian Criollo B97-61/B2) [6]. For each gene, we retrieved the protein sequence, coding DNA sequence (CDS), genomic sequence, and promoter sequence (1500 bp upstream of the gene). The protein sequences were further analyzed for the presence of the GASA domain using the Conserved Domain Database (CDD) of the National Center for Biotechnology Information (NCBI), available online: https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi (accessed on 24 November 2020). All sequences containing the conserved GASA domain were selected for further analyses, and proteins with an absent or truncated domain were discarded.
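The two-step filter above (a BLAST E-value cutoff, then a domain check) can be sketched as follows. This is a minimal illustration under stated assumptions. It assumes BLASTP tabular output in the default 12-column `-outfmt 6` layout, where column 2 is the subject (cacao) ID and column 11 is the E-value. It does not run CDD; domain status is passed in as a set of IDs that are assumed to have passed the domain check. All IDs are hypothetical.

```python
# Sketch: candidate selection from BLAST tabular output (assumed -outfmt 6 layout).

E_VALUE_CUTOFF = 1e-10  # cutoff stated in the Methods


def blast_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return subject IDs with at least one hit whose E-value is <= cutoff."""
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue <= cutoff:
            hits.add(subject)
    return hits


def keep_with_domain(candidates, has_domain):
    """Keep only candidates confirmed (e.g., by CDD) to carry an intact GASA domain."""
    return sorted(c for c in candidates if c in has_domain)


# Hypothetical rows: query, subject, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore
rows = [
    "\t".join(["AtGASA1", "Tc_hypoA", "55.0", "90", "40", "1", "1", "90", "3", "92", "2e-25", "98.6"]),
    "\t".join(["AtGASA1", "Tc_hypoB", "30.0", "40", "28", "2", "10", "50", "7", "47", "0.003", "35.0"]),
]
print(keep_with_domain(blast_hits(rows), {"Tc_hypoA"}))
```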

Chromosome Mapping and Characterization of Physicochemical Properties
The position of each gene, including its chromosome number and position on the chromosome, was recorded. All genes were renamed according to their chromosome and position, as shown in Table 1. MapChart software [37], through Ensembl, was used to display the position of each tcGASA gene along its chromosome. Physicochemical properties, including protein length, molecular weight (MW), isoelectric point (pI), instability index, and grand average of hydropathy (GRAVY), were determined using the ExPASy tool [38]. The subcellular localization of GASA proteins was predicted using the BUSCA web server [39].
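GRAVY, one of the properties listed above, is the mean hydropathy value of all residues in a protein. The sketch below computes it with the Kyte-Doolittle hydropathy scale, which ExPASy ProtParam also uses for GRAVY. It is a minimal illustration and does not replace the ExPASy tool. The function name and the example peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: sum of residue hydropathy values / length.

    Raises KeyError for non-standard residues (e.g., X), which this sketch does not handle.
    """
    seq = protein.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Negative values indicate hydrophilic proteins; positive values indicate hydrophobic ones.
print(gravy("KRDE"), gravy("ILVF"))
```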

Gene Structure and Promoter Region Analyses
We analyzed the exon-intron structures of all tcGASA genes from their CDS and genomic sequences using the Gene Structure Display Server, available online: http://gsds.cbi.pku.edu.cn (accessed on 26 November 2020). PlantCARE [40] was used to identify cis-regulatory elements in the 1500 bp promoter regions.
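The promoter regions analyzed above are the 1500 bp upstream of each gene. Extracting them correctly depends on the strand: for a minus-strand gene, the upstream region lies beyond the gene's end coordinate and must be reverse-complemented. The sketch below assumes 1-based, inclusive, GFF-style coordinates and a chromosome sequence held in memory. The function names are hypothetical, and this is not the retrieval pipeline used in the study.

```python
# Sketch: strand-aware extraction of an upstream (promoter) region.
# Assumes 1-based inclusive coordinates, as in GFF3 annotations.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, gene_start: int, gene_end: int,
                    strand: str, length: int = 1500) -> str:
    """Return up to `length` bp upstream of a gene, written 5'->3' relative to the gene.

    The region is truncated if the gene lies near a chromosome end.
    """
    if strand == "+":
        begin = max(0, gene_start - 1 - length)  # 0-based slice start
        return chrom_seq[begin:gene_start - 1]
    if strand == "-":
        region = chrom_seq[gene_end:gene_end + length]  # bases after the gene end
        return reverse_complement(region)
    raise ValueError("strand must be '+' or '-'")


chrom = "AAAACCCCGGGGTTTT"  # hypothetical toy chromosome
print(upstream_region(chrom, 9, 12, "+", length=4))
print(upstream_region(chrom, 1, 8, "-", length=4))
```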

Prediction of Post-Translational Modifications of GASA Proteins
Phosphorylation sites in GASA proteins were predicted using the NetPhos 3.1 server [41], retaining sites with a potential score >0.5. N-glycosylation sites were predicted using the NetNGlyc 1.0 server [42] with default parameters.
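Applying the 0.5 potential cutoff and tallying sites by residue (serine, threonine, tyrosine) can be sketched as follows. This is a minimal illustration that assumes NetPhos predictions have already been parsed into hypothetical `(position, residue, score)` tuples. Parsing the actual NetPhos output is not shown. Because NetPhos can report several kinase-specific scores for the same position, the sketch counts each position once. This is an assumption about how sites are counted, not a description of the study's procedure.

```python
from collections import Counter

THRESHOLD = 0.5  # potential cutoff stated in the Methods


def count_phosphosites(predictions, threshold=THRESHOLD):
    """Count unique positions per residue type with a score above the threshold.

    predictions: iterable of (position, residue, score) tuples (hypothetical format).
    """
    seen = set()
    counts = Counter()
    for position, residue, score in predictions:
        key = (position, residue)
        if score > threshold and key not in seen:
            seen.add(key)
            counts[residue] += 1
    return counts


# Hypothetical predictions; position 5 is reported twice (two kinases).
preds = [(5, "S", 0.91), (5, "S", 0.72), (8, "T", 0.41), (12, "Y", 0.63)]
print(count_phosphosites(preds))
```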

Phylogenetic and Conserved Motif Analyses
The phylogenetic relationships of the T. cacao GASA genes were inferred together with GASA genes of six other species: Arabidopsis thaliana, Gossypium raimondii, Vitis vinifera, Oryza sativa, Brachypodium distachyon, and Zea mays. GASA genes of these species were identified using the same approach as for Theobroma cacao. Protein sequences of all species were aligned with Clustal Omega [43]. An unrooted neighbor-joining tree was constructed in MEGA X [44] and visualized with the Interactive Tree Of Life (iTOL) [45]. Conserved motifs in GASA proteins were identified using the MEME v5.3.0 server [46], searching for a maximum of five motifs with widths of 6 to 30 residues.
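A neighbor-joining tree is built from a matrix of pairwise distances between aligned sequences. The sketch below computes the simplest such distance, the uncorrected p-distance, from an alignment. p-distance is one of several distance models available in MEGA X, and the model used in this study is not stated here, so this is an assumption. Columns with a gap in either sequence are ignored, similar to pairwise deletion. Sequence names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


def distance_matrix(alignment):
    """alignment: dict name -> aligned sequence. Returns a symmetric nested dict."""
    names = list(alignment)
    matrix = {n: {n: 0.0} for n in names}
    for i, j in combinations(names, 2):
        matrix[i][j] = matrix[j][i] = p_distance(alignment[i], alignment[j])
    return matrix


aln = {"seqA": "MKCA-LV", "seqB": "MKCSQLV", "seqC": "MRCSQIV"}  # hypothetical
print(distance_matrix(aln)["seqA"])
```

The resulting matrix is the input that the neighbor-joining algorithm clusters. Substitution-corrected models (e.g., Poisson or JTT) change the distances but not this overall workflow.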

Gene Duplications and Estimation of Ka/Ks Values
Nucleotide sequence identity >85% between genes is considered a sign of duplication [47]. We therefore aligned the coding DNA sequences with Clustal Omega [43] and determined pairwise identities among the genes using Geneious R8.1 [48]. Gene duplication events relative to other species were determined using the MCScan v0.8 program [49] through the Plant Genome Duplication Database.
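The >85% identity criterion can be illustrated on a set of aligned coding sequences. Tools differ in how they define percent identity. This sketch assumes identity = identical non-gap columns / alignment columns, skipping columns that are gaps in both sequences. Geneious may use a different convention. Gene names and sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Identical non-gap positions / columns, ignoring columns gapped in both sequences."""
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    same = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * same / len(columns) if columns else 0.0


def candidate_duplicates(alignment, cutoff=85.0):
    """Return gene pairs whose nucleotide identity exceeds the cutoff."""
    return [(i, j) for i, j in combinations(sorted(alignment), 2)
            if percent_identity(alignment[i], alignment[j]) > cutoff]


aln = {"g1": "ACGTACGTAC", "g2": "ACGTACGTAA", "g3": "TTTTTTTTTT"}  # hypothetical
print(candidate_duplicates(aln))
```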

Three-Dimensional Protein Modeling and Molecular Docking
We built three-dimensional structures of GASA proteins with I-TASSER [53], which uses iterative template-based fragment assembly simulations, and selected and refined the best models with the 3D-refine program [54]. We then used P2Rank in the PrankWeb software [55] and the CASTp tool [56] to predict pockets and cavities in the refined structures. Finally, PyMOL [57] was used to visualize the results.

In Silico Expression Analysis of GASA Genes through RNA-seq Data
Publicly available RNA-seq data for cacao were used to assess the expression of GASA family members in multiple tissues and under various biotic and abiotic stimuli. RNA-seq data from cacao inoculated with Phytophthora megakarya for 0 h, 6 h, 24 h, and 72 h in the susceptible cultivar Nanay (NA32) and the fungus-resistant cultivar Scavina (SCA6) were downloaded from GEO DataSets under accession number GSE116041 [58]. The two cultivars were compared to determine differentially expressed genes and to identify genes specifically induced in the resistant cultivar SCA6. Expression values were log2 transformed to generate heatmaps with the TBtools package [59]. In addition, tissue-specific expression and expression under multiple abiotic stresses, including cold, osmotic, salt, drought, UV, wounding, and heat, were inferred from the Arabidopsis orthologs of tcGASAs (SAMEA5755003 and PRJEB33339).
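The log2 transformation used before drawing heatmaps can be sketched as below. The text does not say how zero values were handled. This sketch assumes a pseudocount of 1, i.e. log2(x + 1), a common choice that keeps unexpressed genes at 0 instead of negative infinity. The gene names and values are hypothetical.

```python
import math


def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every value in a gene -> list-of-values mapping."""
    return {gene: [math.log2(v + pseudocount) for v in values]
            for gene, values in matrix.items()}


# Hypothetical expression values at 0 h, 6 h, 24 h, and 72 h.
expr = {"geneX": [0, 1, 3, 7], "geneY": [15, 15, 31, 63]}
print(log2_transform(expr))
```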

Identification of GASA Genes and Their Distributions on Chromosomes within Genomes
We detected 17 GASA genes in the T. cacao genome, distributed on six of the ten chromosomes. The genes were named tcGASA01 to tcGASA17 according to their chromosomal distribution, starting from chromosome 1; genes on the same chromosome were numbered in order of their position from the start of the chromosome (Table 1 and Figure 1). Six genes were located on chromosome 8 and five on chromosome 4. Chromosomes 1 and 2 each contained two genes, whereas chromosomes 5 and 9 contained one gene each. These data show that tcGASA genes are unequally distributed in the cacao genome. The chromosomal location of each gene, including its start and end positions, is given in Table 1. The gene, protein, coding, and promoter sequences are provided in Table S1.
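The naming rule above (order by chromosome, then by position along the chromosome) amounts to a two-key sort. The sketch below is illustrative only. The gene IDs and coordinates are hypothetical, and chromosome numbers must be compared as integers so that, for example, chromosome 10 sorts after chromosome 9.

```python
def name_genes(genes, prefix="tcGASA"):
    """Assign sequential names ordered by (chromosome, start).

    genes: list of (gene_id, chromosome_number, start) with integer chromosome numbers.
    Returns a dict gene_id -> new name, e.g. tcGASA01.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    return {gene_id: f"{prefix}{i:02d}"
            for i, (gene_id, _chrom, _start) in enumerate(ordered, start=1)}


# Hypothetical IDs and coordinates.
genes = [("geneC", 4, 500), ("geneA", 1, 9000), ("geneB", 1, 100)]
print(name_genes(genes))
```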

Protein Length, Molecular Weight, and Isoelectric Point of tcGASA Proteins
In the current study, tcGASAs were characterized based on their physicochemical properties (Table 1). The identified tcGASA proteins were small, ranging in length from 88 (tcGASA13) to 320 (tcGASA05) amino acids, with molecular weights (MW) from 9.64 to 33.92 kDa. Except for tcGASA08 and tcGASA05, with molecular weights of 13.42 kDa and 33.92 kDa, respectively, all tcGASAs had an MW below 13 kDa. The isoelectric points were also similar among tcGASAs and indicate that the proteins are basic. Except for tcGASA11 (pI 6.67), all tcGASA proteins had isoelectric points above 8, ranging from 8.27 to 9.83.
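An isoelectric point is the pH at which a protein's net charge is zero. It can be found by bisection on a Henderson-Hasselbalch charge model. The sketch below assumes an EMBOSS-style pKa set. ExPASy ProtParam uses different (Bjellqvist) pK values, so this sketch will not reproduce the Table 1 values exactly. It only illustrates why lysine- and arginine-rich proteins, such as most tcGASAs, have a pI above 8. The function names and peptides are hypothetical.

```python
# Assumed pKa values (EMBOSS-style); ExPASy ProtParam uses a different set.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(protein: str, ph: float) -> float:
    """Approximate net charge at a given pH (Henderson-Hasselbalch)."""
    seq = protein.upper()
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    for aa in "KRH":
        positive += seq.count(aa) / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
    negative = 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in "DECY":
        negative += seq.count(aa) / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(protein: str, tol: float = 1e-4) -> float:
    """Bisection on pH 0-14; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(protein, mid) > 0:
            low = mid  # still positively charged, so the pI is higher
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("MKKLRRHKCGK"), 2))  # hypothetical basic peptide
```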

Analyses of Instability Index, GRAVY, and Subcellular Localization of tcGASA Proteins
The instability index estimates whether a protein is likely to be stable; in ExPASy ProtParam, proteins with an index below 40 are predicted to be stable. By this criterion, 4 tcGASAs (tcGASA03, tcGASA11, tcGASA13, and tcGASA15) were classified as stable and 13 as unstable (Table 1). A positive GRAVY value indicates a hydrophobic protein, whereas a negative value indicates a hydrophilic one. GRAVY values were negative for sixteen tcGASA proteins, ranging from −0.75 to −0.023, and positive (0.004) only for tcGASA12 (Table 1). These data indicate that most tcGASA proteins are hydrophilic. Subcellular localization can provide clues to protein function. BUSCA predicted extracellular localization for all tcGASA proteins except tcGASA08, which was predicted to localize to the plasma membrane (Table 1).

tcGASA Proteins 3D Structure Analyses and Post-Translational Modifications
The predicted 3D structures of all tcGASA proteins contained β sheets, α helices, random coils, and extended strands. Random coils were the most abundant and more extensive than α helices, whereas β sheets were the least abundant (Figure 2). In addition, the active sites of GASA proteins were predicted from their structures. Proline, cysteine, lysine, serine, and threonine were the residues most frequently predicted as binding sites across all candidate cacao GASA proteins (Figure 2). The tcGASAs differed in predicted 3D structure and pocket sites, suggesting that they may have diverse functions. We also predicted two post-translational modifications of tcGASAs: phosphorylation and glycosylation (Figure 3, Table S2). In total, 224 potential phosphorylation sites were predicted on serine, threonine, and tyrosine residues of tcGASA proteins: 92 on serine, 86 on threonine, and 46 on tyrosine. tcGASA05 had the most phosphorylation sites (57), whereas the other proteins had 9 to 14 sites each. Three proteins, tcGASA10, tcGASA15, and tcGASA17, were also predicted to carry a potential glycosylation site (Table S2).
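N-glycosylation predictors such as NetNGlyc evaluate asparagines that lie in the N-X-S/T sequon (X any residue except proline). The sketch below only finds candidate sequons with a regular expression. It does not reproduce NetNGlyc's neural-network scoring, so it will generally report more positions than are finally predicted. A lookahead is used so that overlapping sequons (e.g., NNS followed by NST) are both found. The function name and the peptide are hypothetical.

```python
import re

# N, then any residue except P, then S or T; the lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str):
    """Return 1-based positions of Asn residues that start an N-X-[ST] sequon (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


print(sequon_positions("MNGSANPTNNST"))  # hypothetical peptide
```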

Gain and Loss of Intron(s) and Conserved Motifs
We also determined the number of introns and exons in each gene and the motifs in each protein. We constructed a separate phylogeny of the tcGASA genes of T. cacao to assess whether genes that cluster together are similar in intron number and pattern, and in motif number and pattern. Each phylogenetic group is shown in a different color for clarity (Figure 5a). Analysis of the genomic sequences showed no intron in one gene, one intron in five genes, two introns in eleven genes, and three introns in one gene (Figure 5b). Genes that clustered together differed in intron number and exon-intron arrangement (Figure 5a,b). Five motifs were identified in tcGASAs. Four motifs (motifs 1-4) were present in all tcGASAs, whereas motif 5 was limited to Tc04v2_t016520 (tcGASA08), Tc05v2_t012230 (tcGASA10), and Tc09v2_t020100 (tcGASA17) (Figure 5c). Proteins that clustered together in the phylogenetic tree showed, to some extent, similar motif distributions (Figure 5a,c).

Duplications, Divergence, and Synteny among GASA Genes
We analyzed paralogous relationships among GASA genes within cacao and orthologous relationships with the GASA genes of the six other species listed in the Methods. Segmental duplications were found for nine GASA genes, which formed four groups: tcGASA02-tcGASA03-tcGASA13, tcGASA08-tcGASA09, tcGASA10-tcGASA17, and tcGASA14-tcGASA15 (Table 2; shown in matching colors in Figure 1). Analysis of non-synonymous (Ka) and synonymous (Ks) substitutions indicated strong purifying selection on these genes after duplication. Divergence-time estimates placed these duplication events between 50 and 204 MYA (Table 2). Synteny analysis with the GASAs of other species revealed high similarity and identified orthologous genes between T. cacao and the compared species (Figure 6). The 17 cacao GASA genes showed syntenic relationships with 7 and 11 orthologous genes in Arabidopsis and Gossypium raimondii, respectively (Figure 6a,b). Moreover, tcGASAs showed syntenic relationships with 7, 5, 4, and 7 GASA genes of Vitis vinifera, Oryza sativa, Brachypodium distachyon, and Zea mays, respectively (Figure 6c-f). Interestingly, the rice GASA Os05g0432200 and the Brachypodium GASA BRADI.2g24320v3 shared the most syntenic blocks with tcGASAs.
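The two quantities behind these conclusions can be sketched as follows. The Ka/Ks ratio classifies selection: below 1 indicates purifying selection, above 1 positive selection, and 1 neutrality. Divergence time is commonly estimated as T = Ks / (2λ), where λ is the synonymous substitution rate per site per year. The rate used for Table 2 is not stated in this section. The value below is a hypothetical placeholder, and the Ka and Ks inputs are illustrative, not taken from Table 2.

```python
def selection_mode(ka: float, ks: float) -> str:
    """Classify selection from the Ka/Ks ratio (non-synonymous / synonymous rate)."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def divergence_time_mya(ks: float, rate: float) -> float:
    """T = Ks / (2 * rate), converted from years to million years ago.

    `rate` is the synonymous substitution rate per site per year (an assumption).
    """
    return ks / (2 * rate) / 1e6


HYPOTHETICAL_RATE = 1.5e-8  # placeholder, not the rate used in this study
print(selection_mode(0.12, 0.9))
print(round(divergence_time_mya(1.5, HYPOTHETICAL_RATE), 1))
```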

Promoter Regions Analysis
Analysis of cis-regulatory elements in the promoter regions revealed binding sites for key transcription factors, comprising light-responsive elements (48.88%), hormone-responsive elements (25.40%), stress-related elements (19.06%), growth-response elements (5.40%), and DNA- and protein-binding sites (1.26%) (Figure 7a). Regulatory sites were found for hormones such as auxin, salicylic acid, abscisic acid, gibberellin, and methyl jasmonate (MeJA) (Figure 7b). Similarly, regulatory elements were identified for drought, elicitors, anaerobic induction, low temperature, and plant defense/stress (Figure 7c). Full details of each element, including its sequence and function, are provided in Table S3.
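The category percentages above come from counting elements across all promoters and dividing each category's count by the total. The sketch below illustrates that arithmetic with an exact-match scan for a small, assumed subset of PlantCARE core motifs. PlantCARE itself holds many variants per element and scans both strands, whereas this sketch scans only the given strand. The motif subset, category labels, and example sequence are hypothetical simplifications.

```python
from collections import Counter

# Illustrative subset of PlantCARE core sequences (assumed; not the full database).
MOTIFS = {
    "Box 4": ("ATTAAT", "light"),
    "ABRE": ("ACGTG", "hormone"),
    "TGACG-motif": ("TGACG", "hormone"),
    "CGTCA-motif": ("CGTCA", "hormone"),
    "LTR": ("CCGAAA", "stress"),
    "ARE": ("AAACCA", "stress"),
}


def count_overlapping(seq: str, motif: str) -> int:
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def category_percentages(promoters):
    """Percentage of all motif hits falling in each category, pooled over promoters."""
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        for motif, category in MOTIFS.values():
            counts[category] += count_overlapping(seq, motif)
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()} if total else {}


print(category_percentages(["ATTAATACGTGCCGAAA"]))  # hypothetical promoter fragment
```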

In-Silico Tissue-Specific Expression of tcGASA Genes
We evaluated the expression of tcGASA orthologs in various tissues to infer their roles in T. cacao (Figure 8a). tcGASA genes were differentially expressed across tissues. Five genes, tcGASA02, tcGASA03, tcGASA08, tcGASA09, and tcGASA13, were highly expressed in the beans (the edible part of cacao). In addition, tcGASA16 was highly expressed in leaves and whole seedlings. However, tcGASA02 and tcGASA03 were significantly downregulated in leaves compared with beans. The in-silico expression data also showed that tcGASA12 and tcGASA17 were only weakly induced in pistil tissue.

Expression Analyses of tcGASAs in Abiotic Stresses
We examined the roles of tcGASAs under various abiotic stresses by analyzing their Arabidopsis orthologs (Figure 8b). The orthologs of tcGASA01, tcGASA05, tcGASA12, and tcGASA14 were upregulated in response to wounding, and the orthologs of tcGASA05 were more highly expressed under cold stress. The orthologous Arabidopsis GASA genes were only weakly induced by drought and UV stress. The orthologs of tcGASA01 and tcGASA14 were highly upregulated under osmotic and salt stress, whereas those of tcGASA16 and tcGASA17 were downregulated.

Expression Analyses of tcGASAs in Biotic Stress (P. megakarya)
The role of tcGASAs against P. megakarya was assessed using RNA-seq data of cacao inoculated with P. megakarya for 0 h, 6 h, 24 h, and 72 h in the susceptible cultivar Nanay (NA32) and the fungus-resistant cultivar Scavina (SCA6) (Figure 9a,b). In the susceptible cultivar, tcGASA12 and tcGASA13 were upregulated as early responders 6 h after inoculation, and tcGASA12 was more highly expressed after 72 h (Figure 9a). Expression patterns differed in the resistant cultivar Scavina, where most tcGASAs were induced 72 h after inoculation with P. megakarya (Figure 9b). In particular, tcGASA03, tcGASA05, and tcGASA13 were more strongly upregulated 72 h after inoculation in Scavina (Figure 9b). Four genes, tcGASA01, tcGASA08, tcGASA09, and tcGASA15, were not expressed. They may be induced only under specific conditions or at specific stages of growth and development. Comparison of SCA6 and NA32 showed differential expression of tcGASAs under P. megakarya infection (Figure 10). In SCA6, tcGASA03, tcGASA05, tcGASA06, tcGASA16, and tcGASA17 showed high expression 24 h after inoculation, and tcGASA03, tcGASA04, and tcGASA13 were more highly expressed after 72 h, suggesting a possible role in fungal resistance. tcGASA05, tcGASA12, and tcGASA14 were also highly expressed in SCA6. However, because they were also highly expressed in NA32, they may not be involved in resistance to the fungus.
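The reasoning above compares induction in the two cultivars. Genes induced in SCA6 but not in NA32 are resistance candidates, while genes induced in both are less likely to be. That comparison can be sketched with log2 fold changes relative to 0 h. The pseudocount, the induction threshold of log2FC ≥ 1 (a two-fold increase), and the data layout are assumptions for illustration; they are not the criteria used in the study. Gene names and values are hypothetical.

```python
import math


def log2_fold_change(treated: float, baseline: float, pseudocount: float = 1.0) -> float:
    """log2((treated + pc) / (baseline + pc)); the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (baseline + pseudocount))


def resistant_specific(expr, threshold=1.0):
    """Return genes induced (log2FC >= threshold) in SCA6 but not in NA32.

    expr maps gene -> {"SCA6": (inoculated, zero_h), "NA32": (inoculated, zero_h)}.
    """
    hits = []
    for gene, cultivars in expr.items():
        fc_resistant = log2_fold_change(*cultivars["SCA6"])
        fc_susceptible = log2_fold_change(*cultivars["NA32"])
        if fc_resistant >= threshold and fc_susceptible < threshold:
            hits.append(gene)
    return sorted(hits)


# Hypothetical values: geneA is induced only in SCA6; geneB is induced in both cultivars.
expr = {
    "geneA": {"SCA6": (15, 3), "NA32": (3, 3)},
    "geneB": {"SCA6": (15, 3), "NA32": (15, 3)},
}
print(resistant_specific(expr))
```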

Discussion
In the current study, we identified 17 tcGASAs and analyzed their genomic distributions and chemical properties. RNA-seq data analyses suggested possible roles in bean development (the edible part of cacao), various abiotic stresses, and the biotic stress caused by P. megakarya.
Cacao is an economically important plant: its seeds are used for chocolate, and it is the main source of income for 40-50 million farmers [2,3]. Despite this importance, to the best of our knowledge, only five gene families have been studied in cacao: WRKY [7], sucrose synthase [8], stearoyl-acyl carrier protein desaturase [9], sucrose transporter [10], and NAC [11]. The role of the NAC family under abiotic and biotic stresses was not examined. As described in the Introduction, GASA genes function in plant development, functional regulation, and responses to biotic and abiotic stresses, and cacao production is affected by both biotic and abiotic stresses. In the current study, we therefore focused on the GASA gene family in cacao. We identified 17 GASA genes, which were unequally distributed on six of the ten chromosomes. The GASA proteins have low molecular weight, with a conserved GASA domain and a cysteine-rich domain. Our findings agree with previous studies showing that GASA genes are mostly present in low numbers, encode low-molecular-weight proteins, and are unequally distributed among chromosomes, as reported for rice (9 genes) [15], grapevine (14 genes) [13], Arabidopsis (15 genes) [12], and tomato (19 genes) [15]. However, higher numbers of GASA genes have also been reported, such as 26 in apple [12] and 37 in soybean [18]. Intron number varied among tcGASA genes that cluster together in the phylogeny. Intron loss and gain occur during the evolution of plant protein-coding genes and have also been reported in the GASA genes of other plant species [13,18,60-62].
The tcGASAs were predicted to be basic, hydrophilic, and mostly unstable proteins. However, we also identified four stable proteins: tcGASA03, tcGASA11, tcGASA13, and tcGASA15. Stability reflects the lifetime of proteins in relation to cellular enzymatic reactions [62]; hence, these four proteins may play extensive roles in various enzymatic activities. Subcellular localization data also provide insight into protein function [63]. Apart from tcGASA08, which is predicted to localize to the plasma membrane, all tcGASAs are predicted to localize to the extracellular space. Extracellular localization of GASA proteins has also been reported previously in several plants [15,18,21]. Localization of GASA proteins to the plasma membrane, cytoplasm, and nucleus has also been described. Such variation in subcellular localization may arise from factors such as protein-protein interactions and post-translational modifications [36,64]. Predicting the 3D structure and pocket sites of proteins can provide valuable information about protein function based on ligand-binding sites [65,66]. In the present study, cysteine, proline, lysine, leucine, serine, and threonine were frequently predicted as key binding residues in GASA protein structures. Among these, proline, serine, and leucine are known to be associated with responses to environmental stimuli [65,67].
Post-translational modifications are chemical modifications of proteins that diversify their structure and function, affecting subcellular localization, protein-protein interactions, and the allosteric regulation of enzyme activity [36,68,69]. We predicted phosphorylation sites in all tcGASAs, ranging in number from 7 to 57. Protein phosphorylation, catalyzed by various kinases, is also vital for cell signaling and for the regulation of many protein functions [70-72]. It would be of interest to further study the function and structure of tcGASA proteins using tandem mass spectrometry (MS/MS) and CRISPR/Cas9 genome editing, together with transcriptomic analysis of transgenic lines. Glycosylation is another abundant and varied modification that plays essential roles in the biological and physiological functions of living organisms [73]. We predicted N-glycosylation sites in tcGASA10, tcGASA14, and tcGASA16, suggesting that these tcGASAs may play significant roles in plant function and regulation.
We performed phylogenetic analyses of tcGASAs together with the GASAs of six other species, including Gossypium raimondii, a closely related species from the family Malvaceae. The analyses divided the GASA proteins into five groups, and tcGASAs were present in all five. Tc08v2_t01469 (tcGASA16) was the only cacao protein that grouped with Arabidopsis, whereas the other tcGASAs showed sister relationships with Gossypium GASAs. Cacao belongs to the basal group and Gossypium to the crown group of the family Malvaceae [74,75], yet the close phylogenetic relationships of their proteins support the family-level relationship of the two species. The motif and exon-intron analyses revealed variation among proteins that cluster together in the phylogeny. This suggests that GASAs in some groups have diverged during evolution, leading to variation in motifs and introns. Similar patterns have been observed for the GASAs of other species and reported for other gene families [13,18,76,77]. GASAs within a phylogenetic group showed different expression patterns, indicating that they are controlled by different regulatory systems. These findings also suggest that protein function depends on other processes besides close phylogenetic relationships; a similar observation was reported in Nicotiana tabacum L. [72]. However, some studies have also proposed that proteins closely related on a phylogenetic tree have similar functions [78,79].
Tandem and segmentally duplicated genes play important roles in evolution, domestication, functional regulation, and responses to biotic and abiotic stresses [80-84]. We identified segmental duplication of nine genes forming four groups: tcGASA02-tcGASA03-tcGASA13, tcGASA08-tcGASA09, tcGASA14-tcGASA15, and tcGASA10-tcGASA17. The members of each group fell in the same group of the phylogenetic tree. Similar results were reported in a genome-wide analysis of soybean GASAs [18]. Previous studies also proposed that segmentally duplicated genes show similar functions and stable expression [18,85]. In the current study, however, not every set of segmentally duplicated genes agreed with this finding. Under biotic stress from P. megakarya, the genes within tcGASA08-tcGASA09 and within tcGASA10-tcGASA17 showed similar expression. Neither tcGASA08 nor tcGASA09 was expressed, and tcGASA10 and tcGASA17 were both weakly regulated or downregulated. The other two groups, tcGASA02-tcGASA03-tcGASA13 and tcGASA14-tcGASA15, showed divergent expression, with some members downregulated and others upregulated (Figure 9). These findings suggest that segmentally duplicated genes may also perform different functions, and functional analysis of each duplicated gene would clarify their roles in physiological and biochemical processes. Moreover, our study, along with the previous report [18], suggests that genes within the same phylogenetic group are more likely to have undergone segmental duplication. A Ka/Ks ratio < 1 indicates purifying selection on GASA genes after duplication, as reported previously [13]. We likewise observed mostly purifying selection on tcGASAs, including the duplicated genes (Table 2).
Cis-acting regulatory elements control gene transcription in response to signals from independent transduction pathways activated under biotic and abiotic stresses [72,86]. In the promoter regions of tcGASAs, we observed several key cis-regulatory elements responsive to light, hormones, stresses, and growth. Elements for drought, anaerobic induction, low temperature, and plant defense were also present. The diversity of cis-regulatory elements in these promoter regions suggests that they regulate tcGASAs in different pathways in cacao. Further studies, for example using the CRISPR/Cas genome-editing system and T-DNA-based approaches, could shed light on the roles of these cis-regulatory elements.
Various strategies are used to develop cacao cultivars that are high-yielding and resistant to abiotic and biotic stresses [4]. Here, we studied the roles of GASA genes in cacao development and in protection against biotic and abiotic stresses. tcGASAs showed tissue-specific expression, suggesting roles in the development and functional regulation of cacao. Up to eight tcGASAs were highly expressed in the cacao bean, the part used for making chocolate. This high expression may reflect a conserved function of these genes in bean development and cacao flavor. Comparing tcGASA expression in beans of disease-resistant (bulk) and disease-susceptible (fine-flavor) cacao could be of interest [87,88]. GASA genes have also been implicated in grapevine seed development [13]. Further cloning-based studies could provide new insight into their roles in bean development. Expression analyses of the Arabidopsis orthologs also indicated roles for GASA genes in various abiotic stresses, including drought, which significantly affects cacao growth [89]. These genes may therefore also be useful for developing drought-resistant cultivars. Expression analyses likewise provide insight into gene function in response to biotic stresses [72]. Black pod disease, caused by the genus Phytophthora, causes losses of up to 20-25% (700,000 metric tons) of world cacao production annually, and in some regions Phytophthora causes losses of 30-90% of the crop [90]. Here, we explored the function of tcGASAs using RNA-seq data from plants challenged with the black pod pathogen P. megakarya. We observed highly expressed tcGASA genes at 24 h and 72 h after inoculation in the tolerant cultivar SCA6 compared with the susceptible cultivar NA32. These data indicate that tcGASAs respond to the fungus.
Complete characterization of these upregulated genes could therefore provide target genes for developing cultivars resistant to diseases caused by the genus Phytophthora. This would enhance cacao production to the benefit not only of cacao farmers but also of consumers, through high-quality, nutritious chocolate.
In conclusion, our study provides new insight into the identification, characterization, and expression of GASA genes in Theobroma cacao. Expression analyses suggested a role for GASA genes in seed development. Our findings show that tcGASAs are diverse in structure and regulatory systems, indicating that they are involved in various cellular pathways related to development and stress responses. Furthermore, our results indicate that GASA genes may be related to resistance against the fungus Phytophthora megakarya, which causes significant losses in cacao production each year. Our study may therefore help in generating cultivars resistant to fungi of the genus Phytophthora. As an in-silico study, the present work revealed many aspects of the structure, regulatory systems, and expression of GASA gene family members in cacao. However, it is necessary to evaluate the expression of these genes in different tissues of sensitive and fungus-resistant cacao clones. We also suggest using new techniques such as CRISPR/Cas genome editing to determine the functions and interactions of cacao GASAs.
Supplementary Materials: The following are available online at www.mdpi.com/2073-4395/11/7/1425/s1: Table S1: Complete details of each Theobroma cacao sequence analyzed in the current study, including protein sequences, coding sequences, genomic sequences, and promoter regions; Table S2: Phosphorylation and glycosylation sites in GASA proteins; Table S3: Cis-regulatory elements in the promoter regions of tcGASAs.

Data Availability Statement:
Publicly available datasets were analyzed in this study. All analyzed data are available in the main manuscript or in the Supplementary Materials.

Conflicts of Interest:
The authors declare no competing financial interests. Authors Ibrar Ahmed and Hafiz Muhammad Talha Malik were employed by the company Alpha Genomics Private Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.