Functional Annotation of Hypothetical Proteins Derived from Suppressive Subtraction Hybridization (SSH) Analysis Shows NPR1 (Non-Pathogenesis Related)-Like Activity

Fusarium wilt, incited by Fusarium oxysporum f. sp. cubense (FOC), is considered the most devastating banana disease. The present study uses suppressive subtraction hybridization (SSH) analysis to examine differential gene expression in the banana plant mediated through FOC and its interaction with the biocontrol agent Trichoderma asperellum (prr2). SSH analysis yielded a total of 300 clones. The resultant clones were sequenced and processed to obtain 22 contigs and 87 singleton sequences. BLAST2GO (Basic Local Alignment Search Tool 2 Gene Ontology) analysis was performed to assign known protein function. Initial functional annotation showed that contig 21 possesses p38-like endoribonuclease activity and duality in subcellular localization. To gain insights into its additional roles and precise functions, a sequential docking protocol was carried out to affirm its role in the defense pathway. Atomic contact energies revealed binding affinities in the order of miRNA > phytoalexins > polyubiquitin, emphasizing their role in the Musa defense pathway. Contig 21 and polyubiquitin showed an atomic contact energy value of −479.60 kJ/mol, and even higher atomic contact energies were observed for miRNA (−804.86, −482.28, −494.75 kJ/mol), demonstrating its high RNA-binding properties. The interfacial residues of contig 21 that interact with phytoalexins were identified as rigid (10) or non-rigid (2) based on Bi,N values and the B-factor per residue. Hence, based on these results, contig 21 was characterized as an NPR1 (non-pathogenesis-related protein) homolog that is involved in plant defense and systemic induced resistance.


Introduction
Fusarium wilt, also called Panama disease, is caused by the soilborne fungus Fusarium oxysporum f. sp. cubense (FOC). It hinders banana production worldwide [1]. The spread of the disease is attributed to the chlamydospores produced by FOC; when these spores contact susceptible lateral or feeder banana roots, they germinate and infection is established. After infection and multiplication, FOC occupies the xylem vessels. The infected banana plants show symptoms such as yellowing, wilting, and vascular discoloration. Fungicide control for combating FOC includes a corm injection protocol using carbendazim [2]. At this time, proteins termed hypothetical or uncharacterized are becoming increasingly abundant, owing to difficulties in annotation protocols. Moreover, deciphering the function of proteins, rather than merely their available structures, remains a practical challenge for researchers worldwide. Although chemical fungicides are increasingly being used in agriculture, the role of hypothetical proteins and their regulatory functions is still an underexploited arena. Nevertheless, chemical fungicide use has been regarded as toxic and harmful not only for the plant, but also for its surrounding environment [3]. Deleterious effects resulting from the usage of chemical fertilizers have now forced farmers and agricultural researchers to look for an exclusive and effective alternative to reduce chemical use in controlling FOC. Hence, the use of biocontrol agents in combating plant diseases has become an increasingly important research topic, and has been given a prominent role in ecofriendly agricultural crop protection strategies [4,5]. Trichoderma spp. are regarded as an exemplary form of biocontrol agent against plant diseases in comparison with chemical fertilizers [6]. Trichoderma spp. are ubiquitous and have been used for biocontrol on a wide variety of crop plants [7,8]. Over the last decade, Trichoderma-based agro-products, such as biopesticides and biofertilizers, have been increasingly used as ecologically useful alternatives in crop protection, and their application has been observed to have the potential to alleviate soil fertility issues and to increase crop productivity and yield [6,7]. Trichoderma has the potential to suppress the activity of multiple plant pathogens, including FOC of banana [9,10].
The advent of the omics era has resulted in the escalating accumulation of gene and protein sequences in public databases. The analysis and accurate functional annotation of the available sequences now pose huge problems to biologists. From the proteome perspective, the gap between available protein sequences and comprehensive structures establishes the need for bioinformatics approaches in elucidating the three-dimensional structures of proteins. In the present century, deciphering the function of a protein relies largely on computational tools that shorten laborious experimental protocols. A significant proportion of a genome is demarcated as 'hypothetical' and 'conserved hypothetical'. The former are proteins that are predicted from an open reading frame but lack experimental evidence of translation [11]. By contrast, conserved hypothetical proteins are proteins that are conserved across phylogenetic lineages but have no known definitive function [12]. The functional annotation of proteins in any genome, whether prokaryotic or eukaryotic, yields a considerable number of hypothetical proteins, which possess novel and uncharacterized functional properties [13]. Generally, almost half of the proteins in a genome are hypothetical proteins [14], stressing the need for manual annotation through computational tools. In the recent past, it has been demonstrated that hypothetical proteins have a strong association with evolutionary significance [15,16].
Suppressive subtraction hybridization (SSH) has been a very useful technique for the identification of differentially expressed genes, and reveals the activity, process, and function of a characteristic gene in complex pathways [17,18]. SSH has been applied to decipher plant-pathogen interactions in the banana plant for Fusarium wilt [19,20], Mycosphaerella eumusae [21], and Mycosphaerella fijiensis [22]. A considerable share of the sequences resulting from SSH are hypothetical proteins with unknown function, which thus necessitate manual annotation for further functional characterization. The present study is distinctive in that the computational model used can be rationalized by assigning roles within the plant defense mechanism of Musa. It has been extensively shown by many researchers that the plant-pathogen interaction pathway in Musa involves miRNA production, phytoalexin accumulation, and defense-related gene induction. These steps were taken into account for the elucidation and establishment of a functional role for hypothetical proteins derived from the SSH library of biocontrol-FOC interactions in banana. In the present study, an attempt is made to characterize functional roles for a hypothetical protein, contig 21, based on an in silico analysis. The assessment encompasses sequential docking for the interpretation of RNA-protein, protein-protein, and protein-ligand interactions. Interacting interfacial residues involved in protein-ligand interactions were categorized as rigid or non-rigid based on solvent accessibility. This analysis allowed the SSH-derived hypothetical protein contig 21 to be annotated for its function and involvement in the plant defense pathway. Post-translational modification analysis was also performed, in an effort to gain insights into additional functional roles.

Plant Materials and RNA Isolation
Banana cv. Grand Naine (AAA), grown in mud pots, was inoculated with the Fusarium wilt pathogen (Fusarium oxysporum f. sp. cubense) and challenged with Trichoderma asperellum. Antagonism assays were performed by streaking Petri plates to assess antifungal activity. Total RNA was isolated for differential analysis of treatment with FOC alone and T. asperellum alone.

SSH Library Construction
Suppression subtractive hybridization was performed using the PCR-Select cDNA subtraction kit (Clontech, Mountain View, CA, USA). Forward and reverse subtraction libraries were constructed using cDNA samples of control versus treatment. The subtracted cDNAs were subjected to two rounds of PCR to normalize and enrich cDNA populations. The PCR products were subcloned into a pGEM-T Easy cloning vector and transformed into Escherichia coli JM109 competent cells. Luria-Bertani medium with isopropyl β-D-1-thiogalactopyranoside (IPTG) and X-gal was used for screening recombinants. Single white colonies were picked and grown overnight at 37 °C. Glycerol stocks were prepared, and all the clones were stored at −80 °C until further use.

Physicochemical Characterization
Theoretical physicochemical parameters, such as molecular weight, isoelectric point, aliphatic index, instability index, and grand average of hydropathicity (GRAVY), were calculated using ExPASy's ProtParam server (http://web.expasy.org/protparam/) to predict protein stability. ProtScale (http://web.expasy.org/protscale/) was used for hydrophobicity plotting based on the Kyte and Doolittle method for analyzing permissive sites in the profile.
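As an illustration of two of the parameters named above, the following minimal sketch computes GRAVY, the aliphatic index, and a sliding-window hydropathy profile from a sequence. It is not the ProtParam or ProtScale implementation; the function names, the window size of 9, and the example sequence are assumptions introduced here for illustration only.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    counts = Counter(seq)

    def mole_percent(aa):
        return 100.0 * counts[aa] / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def hydropathy_profile(seq, window=9):
    """Sliding-window mean hydropathy, one value per full window."""
    seq = seq.upper()
    return [gravy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    example = "MKTLLVAGAVILSDE"  # hypothetical sequence, not contig 21
    print("GRAVY:", round(gravy(example), 3))
    print("Aliphatic index:", round(aliphatic_index(example), 2))
    print("Profile:", [round(v, 2) for v in hydropathy_profile(example)])
```

A positive GRAVY value points to an overall hydrophobic protein and a negative value to a hydrophilic one; the profile shows where along the sequence that character is concentrated.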

Homology Modeling of miRNA and Target Proteins
miRNAs in functionally annotated proteins were predicted from miRBase, and conformations in the secondary structure were assessed using the GeneBee server which, in turn, was used for 3D modeling (http://www.genebee.msu.su/services/rna2_reduced.html). Suitable templates were chosen using R3D BLAST and aligned using ClustalX. The template and alignment were submitted to the MODERNA server (http://iimcb.genesilico.pl/modernaserver/submit/model/) for 3D modeling of the miRNAs. Model building involves the addition of modifications, exchange of residues, copying of residues, the addition of loops, and removal of modifications.
For homology modeling of proteins, templates were selected using homology detection and structure prediction by HMM-HMM comparison (HHpred), employing PSI-BLAST (http://toolkit.tuebingen.mpg.de/). The availability of authentic structures in the Protein Data Bank (PDB) was cross-checked against the NCBI Entrez, SWISS-PROT, and SCOP databases. The templates chosen had an E-value of <3 and a similarity of more than 85%. The proteins were modeled using MODELLER 9.14 [31]. Secondary structure was predicted with the PSIPRED server (http://bioinf.cs.ucl.ac.uk/psipred/) [32].
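The template selection criteria stated above (E-value < 3 and similarity > 85%) can be expressed as a simple filter. The sketch below is illustrative only: the `TemplateHit` record, the `select_templates` helper, and the example hits are hypothetical and do not come from the study or from the HHpred output format.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    template_id: str      # hypothetical identifier, not a real PDB entry
    e_value: float
    similarity: float     # percent similarity to the query


def select_templates(hits, max_e_value=3.0, min_similarity=85.0):
    """Keep hits with E-value < max_e_value and similarity > min_similarity.

    Retained hits are ordered by ascending E-value, then by descending
    similarity, so the strongest candidate comes first.
    """
    kept = [h for h in hits
            if h.e_value < max_e_value and h.similarity > min_similarity]
    return sorted(kept, key=lambda h: (h.e_value, -h.similarity))


if __name__ == "__main__":
    # Invented hits for illustration only.
    hits = [
        TemplateHit("tmplA", 1e-30, 92.0),
        TemplateHit("tmplB", 5.0, 99.0),    # fails the E-value cutoff
        TemplateHit("tmplC", 1e-10, 80.0),  # fails the similarity cutoff
        TemplateHit("tmplD", 1e-5, 88.0),
    ]
    for hit in select_templates(hits):
        print(hit.template_id, hit.e_value, hit.similarity)
```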

Molecular Dynamics Simulation
The 3D molecule was minimized locally in vacuo, with the backbone of the helices constrained, in order to give a first optimization of the rough geometry derived from homology modeling, using the GROMOS96 implementation in SWISS-PDB VIEWER [35]. GROMOS96 is a molecular dynamics computer simulation package for the study of biomolecular structures. Hydrogens were initially added, and amino acids with more than 30% solvent-accessible surface were chosen for further simulations. Energy minimization parameters included 20 steps of steepest descent, with other conditions left at the defaults of the GROMOS96 package (http://www.gromos.net/).
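To make the idea of a fixed number of steepest descent steps concrete, the sketch below applies 20 such steps to a toy harmonic energy. It is a didactic example only and is not the GROMOS96 force field or its minimizer; the energy function, the fixed step size, and all names are assumptions chosen for illustration.

```python
def harmonic_energy(coords, equilibrium, k=1.0):
    """Toy energy: E = 0.5 * k * sum((x - x_eq)^2)."""
    return 0.5 * k * sum((x - x0) ** 2 for x, x0 in zip(coords, equilibrium))


def harmonic_gradient(coords, equilibrium, k=1.0):
    """Gradient of the toy energy with respect to each coordinate."""
    return [k * (x - x0) for x, x0 in zip(coords, equilibrium)]


def steepest_descent(coords, equilibrium, steps=20, step_size=0.1, k=1.0):
    """Move the coordinates against the gradient for a fixed number of steps."""
    coords = list(coords)
    for _ in range(steps):
        grad = harmonic_gradient(coords, equilibrium, k)
        coords = [x - step_size * g for x, g in zip(coords, grad)]
    return coords


if __name__ == "__main__":
    start = [2.0, -1.0]   # arbitrary starting geometry
    target = [0.0, 0.0]   # minimum of the toy energy
    end = steepest_descent(start, target)
    print("energy before:", harmonic_energy(start, target))
    print("energy after: ", harmonic_energy(end, target))
```

With only 20 steps the structure relaxes toward the minimum without reaching it, which matches the stated purpose of a first, rough optimization.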

Analysis of Pockets and Clefts
Pocket detection and occupancy in the modeled proteins were determined using CASTp (http://sts.bioe.uic.edu/castp/index.html?1ycs). The solvent accessible surface area (SASA) was determined with GETAREA (http://curie.utmb.edu/getarea.html) [36]. The atomic SASA covered by each cleft was calculated using a water probe radius of 1.4 Å, and the area/energy/residue was calculated. The dielectric constant was set to 80.0, and the Poisson-Boltzmann method of computation for 20 cycles was used for calculating the electrostatic potential in the SWISS-PDB viewer (https://spdbv.vital-it.ch/).

Preparation of Phytoalexins
ACD/ChemSketch (https://www.acdlabs.com/resources/freeware/chemsketch/) was used to create two-dimensional Chemical Markup Language (CML) files of phytoalexin structures. OPENBABEL GUI v2.4.1 (http://openbabel.org/wiki/Main_Page) was used to convert CML files to Protein Data Bank (PDB) files. The root-mean-square deviation (RMSD)-based energy minimization was performed in vacuo to give a first optimization of the rough structure. Hydrogens were initially added to a receptor molecule, and AMBER and Gasteiger charges were added to fix unusual bonds in the 3D structure, which utilizes CHARMM force field parameters. The 2D model was optimized, and its energy was minimized using the clean geometry option in ArgusLab 4.0 [37].

Docking and Analysis of the Interaction
PatchDock (https://bioinfo3d.cs.tau.ac.il/PatchDock/) was employed for RNA-protein docking and protein-protein docking. The atomic contact energy and occupied area of the docked complexes were analyzed. Molecular Docking Server was employed for protein-ligand docking (http://www.dockingserver.com/web). The geometry of interactions and the hydrogen-bond (HB) plot of interacting residues were calculated.
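Docked complexes are compared here by atomic contact energy (ACE), with more negative values read as more favorable binding. The sketch below ranks the ACE values reported in this study for contig 21; the labels "miRNA 1" to "miRNA 3" are assumptions that simply follow the order in which the values are listed, and the helper name is hypothetical.

```python
# ACE values (kJ/mol) reported in this study for contig 21 complexes.
# The miRNA labels follow the listing order and are an assumption.
ACE_KJ_PER_MOL = {
    "miRNA 1": -804.86,
    "miRNA 2": -482.28,
    "miRNA 3": -494.75,
    "polyubiquitin": -479.60,
}


def rank_by_ace(scores):
    """Return (partner, ACE) pairs from most to least negative ACE."""
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for partner, ace in rank_by_ace(ACE_KJ_PER_MOL):
        print(f"{partner:>14}: {ace:8.2f} kJ/mol")
```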

Prediction of Rigid/Non-Rigid Nature of Interacting Residues
The normalized B-factor for ligand-binding residues was predicted using the BFPred server, and individual B-factors were summarized statistically to describe the rigid/non-rigid nature of the ligand-binding residues. The normalized B-factor per residue ($B_{i,N}$) was calculated as $B_{i,N} = (B_i - \langle B_i \rangle)/\sigma$, where $B_i$ is the B-factor of residue $i$, $\langle B_i \rangle$ is the mean B-factor, and $\sigma$ is the standard deviation. Residues with $B_{i,N} \le 0.04$ were considered rigid, whereas those with $B_{i,N} \ge 0.04$ were regarded as non-rigid.
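The normalization and the rigid/non-rigid call can be sketched as follows, taking the normalized B-factor as the B-factor of a residue minus the mean B-factor, divided by the standard deviation. This is not the BFPred implementation. Two choices are assumptions made here: the population standard deviation is used, and a value exactly equal to the 0.04 threshold is assigned to the rigid class, since the stated criteria overlap at that point. The input values are invented.

```python
from statistics import mean, pstdev


def normalize_b_factors(b_factors):
    """Return (B_i - mean) / standard deviation for each residue."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("all B-factors are identical; cannot normalize")
    return [(b - mu) / sigma for b in b_factors]


def classify_residues(normalized, threshold=0.04):
    """Label each residue; ties at the threshold count as rigid (assumption)."""
    return ["rigid" if value <= threshold else "non-rigid"
            for value in normalized]


if __name__ == "__main__":
    raw = [10.0, 20.0, 30.0]  # invented B-factors, not from Table 4
    norm = normalize_b_factors(raw)
    for b, n, label in zip(raw, norm, classify_residues(norm)):
        print(f"B = {b:5.1f}  B_norm = {n:6.3f}  {label}")
```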

SSH Analysis
A total of 31 hypothetical proteins were chosen for functional annotation. Comparison of 28S rRNA amplification between subtracted and unsubtracted cDNAs, with amplification after 28 PCR cycles in the subtracted sample, confirmed successful subtraction.

Sequence Analysis
The obtained contigs and singleton sequences were separately clustered on the basis of unknown/uncharacterized or hypothetical proteins. BLASTX (Basic Local Alignment Search Tool) searches and homology detection by HHPRED, a free server for protein function and structure prediction, identified Musa-specific hypothetical proteins. Initial sequence analysis provided the basic insight that contig 21 could have potential functional attributes. BLASTX results for the hypothetical proteins obtained from differentially expressed genes in the biocontrol-FOC interaction reveal three contigs of relevance to the Musa genome (Table 1).

Homology Modeling and Sequential Docking
The sequences identified from SSH analysis yielded seven proteins that have functions in plant defense. The protein sequences comprise bZIP transcription factor, polyubiquitin, calmodulin-binding protein, endochitinase, isoflavone reductase, and mannose glucose-binding lectin. These proteins were modeled to obtain comprehensive three-dimensional structures (Figure 1), along with the hypothetical protein contig 21, which was functionally annotated as an NPR1 (non-pathogenesis-related protein) homolog. Stereochemical analysis of the protein showed 85.4% of residues in the favored region and the fewest residues in the outlier region (Figure 2). With this in view, the nucleotide-binding properties and protein interactions of contig 21 were assessed after miRNA prediction. In contig 21, three putative miRNAs were identified by miRBase, namely >+36-57 AUGGUGAAAUUUGCAAACACUC, >+298-319 GAAGGGAACUCGAUCUAUCUGA, and >−489-510 UUUGACGUUGGAGUCCAGUUC. The highest atomic contact energies (ACE) were observed for miRNA, followed by phytoalexins and finally polyubiquitin (Table 2), emphasizing contig 21's role in the Musa defense pathway. The three-dimensional structures of the miRNAs, predicted with unusual bond angles, show that they are single-stranded in nature (Figure 3). Given these RNA-binding properties, the phytoalexins anigorufone, cis-2,3-dihydro-2,3-dihydroxy-9-phenylphenalenone, 4-phenylphenalenone trenbolone, and 4′-ethoxyirenolone were docked onto contig 21.
The modeled phytoalexins have comprehensive structures and, upon binding, show distinct geometry of interactions (Table 3). Docked poses of the four phytoalexins show similar binding, which can be attributed to the same active site-binding pattern for all four phytoalexins docked onto contig 21 (Figure 4). The active site-interacting residues are Phe39, Asp113, Leu101, Met114, Trp66, Asp111, Leu52, Met36, Ala102, Arg63, Val37, and Ile34. Furthermore, the interacting residues were characterized as rigid/non-rigid based on their B-factor profiles. The Bi,N values of individual residues, shown pictorially, indicate that the increased binding affinity is due to the rigidity of the interacting residues (Figure 5 and Table 4). Ten rigid residues and two non-rigid residues show the efficiency of interaction.
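As a consistency check on the three putative miRNAs listed above, the sketch below compares each reported coordinate span with the length of the printed sequence. It assumes that the coordinates are 1-based and inclusive, which is an assumption made here and not stated in the text; the function names are hypothetical.

```python
# (strand, start, end, sequence) as listed in the text for contig 21.
PUTATIVE_MIRNAS = [
    ("+", 36, 57, "AUGGUGAAAUUUGCAAACACUC"),
    ("+", 298, 319, "GAAGGGAACUCGAUCUAUCUGA"),
    ("-", 489, 510, "UUUGACGUUGGAGUCCAGUUC"),
]


def span_length(start, end):
    """Length of a 1-based, inclusive coordinate span (assumption)."""
    return end - start + 1


def check_spans(records):
    """Return (strand, start, end, span, sequence length, match) per record."""
    report = []
    for strand, start, end, seq in records:
        span = span_length(start, end)
        report.append((strand, start, end, span, len(seq), span == len(seq)))
    return report


if __name__ == "__main__":
    for strand, start, end, span, length, ok in check_spans(PUTATIVE_MIRNAS):
        status = "consistent" if ok else "differs"
        print(f"{strand}{start}-{end}: span {span} nt, sequence {length} nt, {status}")
```

Under that assumption the first two entries agree with their spans, while the third printed sequence is one nucleotide shorter than its span. This may reflect a different coordinate convention or a character lost in typesetting, so the original miRBase output should be consulted before drawing any conclusion.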

Duality in Localization in Deciphering Contig 21 as NPR1 Homolog
Subcellular localization analysis shows contig 21 to be both nuclear and cytoplasmic. Further, no signal peptides or transmembrane domains were present. NPR1 occurs as an inactive oligomer in the cytoplasm. Based on gene ontology, the function of contig 21 was deciphered as endoribonuclease p38-like activity, which is involved in RNA interference, and the protein also possesses transferase activity. Hence, the present results indicate that contig 21 has dual localization, like NPR1, and functions as a regulator of transcription.

Ubiquitination Activities of Contig 21
Diversified ubiquitination patterns showed contig 21 to be an NPR1 homolog (Table 5). Post-translational modification mapping of contig 21 yielded significant differential patterns for ubiquitination alone, rather than for glycosylation, sumoylation, acetylation, and other modifications.

Discussion
Fusarium wilt management involving chemical fungicides is rigorous and not economically feasible. Alternative strategies and environmentally friendly protocols for sustainable agriculture of banana and other crops are urgently needed worldwide. The transcriptome refers to the comprehensive listing of transcripts along with their splice variants. SSH analysis has been adopted as an economical and straightforward means of identifying genes, including novel genes in a genome [38]. Bioinformatic analysis, coupled with SSH, has been utilized for the identification of differentially expressed immune-response genes in insects [39]. However, the combination of SSH analysis in banana with the functional annotation of a hypothetical protein renders the present study significant. PSI-BLAST analysis showed a biological role of contig 21 in banana. The recently published Musa genome of 523 Mb [40] has resulted in a considerable number of ESTs (expressed sequence tags).
The NPR1 oligomer-monomer transition involves cysteine residues at positions 82, 150, 155, 160, and 216 [41]. However, in many homologs alanine, instead of cysteine, is found at position 82. Duality in subcellular localization, viz., nucleus and cytoplasm, indicates that contig 21 could have potential implications for a similar monomer-oligomer transition. In contig 21, residue 82 is Ala, which is characteristic of NPR1. Leucine-rich repeats in the motif pattern of contig 21 also point to a function corresponding to NPR1. NPR1 natively possesses protein-protein interaction motifs, a zinc finger, and ankyrin repeat domains with DNA-binding capability [42]. Higher atomic contact energies were observed for miRNAs (−804.86, −482.28, −494.75 kJ/mol), followed by polyubiquitin (−479.60 kJ/mol), suggesting a role for contig 21 in ubiquitination. Roles for miRNAs in regulating genes important for plant defense have been reported [43]. The miRNA-binding capabilities of contig 21 indicate that it has regulatory significance in resistance to F. oxysporum. Differentially expressed miRNAs in tomato after F. oxysporum infection are involved in plant defense [44]. Nevertheless, in the case of banana, this has not been previously covered. Hotspots in the interface of protein complexes identify binding sites distinct from the rest of the surface [45]. Hence, assessment of the rigid/non-rigid nature of residues helps define the interaction sites and characterize hotspot residues. Redox changes lead to the conversion of the oligomer to a monomer, which translocates to the nucleus and interacts with TGA transcription factors [42,46].
The results suggest that the hypothetical protein, contig 21, is involved in cross-communication between the salicylic acid- and jasmonic acid-dependent defense signaling pathways, like NPR1 [47]. As the SSH analysis resulted in proteins that belong to plant defense, the presence of NPR1 characteristically determines whether Fusarium resistance is constitutively expressed in banana. Resistance to F. oxysporum, apart from the ethylene, jasmonic acid, and salicylic acid pathways, requires the NPR1 gene [48]. Thus, the functional annotation of contig 21 suggests that it functions as an NPR1 homolog with regulatory attributes. NPR1 has its role in plant defense through ubiquitination-mediated activity [49]. Overall, contig 21 has been deciphered as an NPR1 homolog based on SSH analysis and in silico perspectives.

Conclusions
Functional annotation of hypothetical proteins derived from the SSH library of banana after FOC-biocontrol interaction reveals contig 21 as an NPR1 homolog involved in systemic resistance. Its miRNA-binding properties, phytoalexin interaction, and ubiquitination affirm the role of contig 21 as an NPR1 homolog. However, additional in vitro and in vivo testing is needed to determine its definitive function.

Figure 3. Three-dimensional structures of modeled miRNAs with unusual bond angles.

Table 2. Binding energies of the docked complexes.

Table 3. The geometry of interactions between contig 21 and phytoalexins.

Table 4. B-factor values for interacting residues and prediction of rigid/non-rigid nature of interacting residues.