In Silico Characterization of the Secretome of the Fungal Pathogen Thielaviopsis punctulata, the Causal Agent of Date Palm Black Scorch Disease

The black scorch disease of date palm caused by Thielaviopsis punctulata is a serious threat to the cultivation and productivity of date palm in Arabian Peninsula. The virulence factors that contribute to pathogenicity of T. punctulata have not been identified yet. In the present study, using bioinformatics approach, secretory proteins of T. punctulata were identified and functionally characterized. A total of 197 putative secretory proteins were identified, of which 74 were identified as enzymes for carbohydrate degradation (CAZymes), 25 were proteases, and 47 were predicted as putative effectors. Within the CAZymes, 50 cell wall-degrading enzymes, potentially to degrade cell wall components such as cellulose, hemicellulose, lignin, and pectin, were identified. Of the 47 putative effectors, 34 possessed at least one functional domain. The secretome of T. punctulata was compared to the predicted secretome of five closely related species (T. musarum, T. ethacetica, T. euricoi, T. cerberus, and T. populi) and identified species specific CAZymes and putative effector genes in T. punctulata, providing a valuable resource for the research aimed at understanding the molecular mechanism underlying the pathogenicity of T. punctulata on Date palm.


Introduction
Fungal pathogens cause huge yield losses in agricultural crops and post-harvest products worldwide [1]. According to the Food and Agriculture Organization (FAO), an estimated $220 billion global economy is lost due to fungal disease every year. To prevent such losses, farmers use several fungicides, which is not only an ineffective method as the pathogens gain resistance against these chemicals quickly but is also very harmful to humans and the environment. Alternatively, genetic approaches, including the use of resistance genes, are considered safer and more durable. However, the selection pressure imposed by single resistance genes in host plants can force rapid evolutionary changes in pathogens that often lead to resistance breakdown. For example, Fusarium oxysporum f. sp. lycopersici, a wilt pathogen of tomato, has evolved multiple times as different races to evade host resistance when a cultivar with a new resistance gene have been introduced in the field [2]. Therefore, to achieve more durable resistance against fungal pathogens, a deep understanding of the virulence factors secreted by the pathogen and the resulting plant immune responses is inevitable [3].
For the successful penetration and colonization, pathogens have to overcome multiple layers of plant immunity [4]. The first layer of immunity in plants is triggered by pattern recognition receptors, which recognizes pathogen-associated molecular patterns, for example, chitin in fungal pathogens. This layer of immunity is called PAMP-triggered immunity

Sequence Information and Gene Prediction
The gene models of the draft genome sequence (NCBI accession: GCA_000968615.1) of the Thielaviopsis punctulata isolate CR-DP1_NODE_1 was used for the prediction of secretome. For the comparative analysis, the gene models were predicted for the closely related species, viz; Thielaviopsis musarum (NCBI accession: GCA_001513885.), Thielaviopsis cerberus (GCA_016859225.1), Thielaviopsis ethacetica (NCBI accession: GCA_001599055.1), Thielaviopsis euricoi (NCBI accession: GCA_001599615.1), Thielaviopsis populi (NCBI acces-sion: GCA_017591655.1). The gene models were predicted according to the method used by Wingfield et al. [30]. To infer the phylogenetic relationship among Thielaviopsis species, 50 shared orthologs were selected randomly and a concatenated alignment was made. The relationship was constructed by MEGA11 using the Maximum Likelihood method and JTT matrix-based model (based on 1000 bootstrap replications).

Prediction of the Secretome
We used a pipeline described previously to predict fungal secretome [31,32]. Briefly, SignalP (version 6.0) was used in combination with Phobius server [33,34]. The sequences, that were predicted to carry a signal peptide by both programs were selected for further screening. To exclude the transmembrane proteins, DeepTMHMM server was used [35]. The endoplasmic reticulum-targeting protein sequences were removed by scanning the sequences for PS00014 ER motif retention against the Prosite database with the ScanProsite web server [36]. The subcellular localization of the proteins was predicted using TargetP and WoLF PSORT servers [37,38]. The proteins harboring glycophosphatidylinositol anchor motifs were predicted using NetGPI (version 1.1) [39].

Annotation of Secretory Proteins
The refined secretome were scanned against Uniprot, PFAM, InterPro, and Gene3D to retrieve functional annotation of the predicted proteins [40][41][42][43]. The CAZy database and dbCAN web server were used to retrieve the annotation of carbohydrate-degrading enzymes [44,45]. For the effector prediction, the standalone software, EffectorP (version 3.0), combined with the manual inspection was used [46]. In addition, the BlASTP (E value < 1 × 10 −10 ) was used to search against the pathogen-host interaction database (PHI database) to find similarities to known effectors and virulence factors [47]. Proteolytic enzymes were identified using a BlastP search against MEROPS database.

Prediction of Gene Models and Orthologue Analysis
The draft genome sequence of the Thielaviopsis punctulata (NCBI accession: GCA_0009-68615.1) and the 5296 predicted gene models were used for the identification of secretome. In addition, for the comparative analysis, we have also predicted the genes and the encoded proteins for 5 closely related species of T. punctulata viz-T. musarum, T. ethacetica, T. euricoi, T. populi, and T. cerberus-by utilizing the draft genome of these species available publicly (Table 1) [30]. The gene models were predicted according to the method described previously [30]. The number of predicted proteins in each species is given in Table 1. The gene models predicted for the closely related species were given in the Supplementary Table  S1. The orthologue analysis using the whole proteome revealed that all six species form 7001 clusters, 3719 orthologous clusters (at least contains two species) and 3282 single-copy gene clusters (Figure 1, Supplementary Table S2). The number of singletons (The proteins which did not form any cluster) vary among the species. The highest number of singletons were found for T. cerberus (354), whereas the lowest were found for T. punctulata (50), which suggested that 94.4% of predicted proteins of T. punctulata had orthologue in other species. A phylogenetic tree was constructed using a concatenated alignment of 50 single copy orthologue proteins and revealed that T. cerberus was closely related to T. punctulata. The close genetic similarity between T. punctulata and T. cerberus was shown previously using internal transcribed spacer (ITS), β-tubulin, and transcription elongation factor 1-α DNA markers ( Figure 1B) [25].

Secretome Identification and Analysis
The methodology used to predict the secretome of T. punctulata is illustrated in Figure 2. Using a combination of SignalP and Phobius server, of the 5296 total proteins, 314 proteins were predicted to have a signal peptide at their N-terminal region. Among these 314 proteins, 63 transmembrane proteins were excluded, and the remaining 251 proteins were scanned for an endoplasmic reticulum (ER)-targeting signal to exclude the proteins that remain in the endoplasmic reticulum. Of the 251 proteins, 13 were predicted to have PS00014 ER motif and were excluded from further analysis. The remaining 238 proteins were predicted as "extra-cellularly localized" through TargetP and WoLF PSORT analysis. Next, of the 238 proteins, 41 proteins were predicted to harbor glycophosphatidylinositol anchor motifs using NetGPI (version 1.1), which likely represent surface proteins rather than secreted effectors and were excluded. This resulted in a list of 197 "refined secretome", which is 3.7% of the whole predicted proteome of T. punctulata (Table 1). Using this method, the secretome of T. musarum, T. ethacetica, T. euricoi, T. populi, and T. cerberus were also predicted. Overall, the number of "refined" secretome in all six species ranged from 150 (T. cerberus) to 215 (T. ethacetica) (Table 1, Figure 2B). J. Fungi 2023, 9, x FOR PEER REVIEW 5 of 20 PS00014 ER motif and were excluded from further analysis. The remaining 238 proteins were predicted as "extra-cellularly localized" through TargetP and WoLF PSORT analysis. Next, of the 238 proteins, 41 proteins were predicted to harbor glycophosphatidylinositol anchor motifs using NetGPI (version 1.1), which likely represent surface proteins rather than secreted effectors and were excluded. This resulted in a list of 197 "refined secretome", which is 3.7% of the whole predicted proteome of T. punctulata (Table 1). Using this method, the secretome of T. musarum, T. ethacetica, T. euricoi, T. populi, and T. cerberus were also predicted. Overall, the number of "refined" secretome in all six species ranged from 150 (T. cerberus) to 215 (T. ethacetica) (Table 1, Figure 2B).

Structural and Functional Characterization of Secretome
The length of 197 refined secretome of T. punctulata ranged from 78 aa to 1356 aa. Of these, 43% (86) of proteins had a length of 78 aa to 399 aa, which indicated that small secretory proteins were enriched in the secretome of T. punctulata ( Figure 2C).
The comparison of CAZymes in closely-related species revealed 59 different classes of CAZymes in all species, including T. punctulata ( Figure 4, Table 3, Supplementary Table  S3). The secretome of all these species were rich in secreted CAZymes families, especially those involved in plant cell wall degradation (PCWD). High numbers of secreted CAZymes involved with PCWD have also been found in the genomes of several Botryosphaeriaceae pathogens [58]. However, the occurrence of these classes varied among Thielaviopsis species. For example, only 11 classes of CAZymes (AA1, AA5, GH10, GH11, GH16, GH17, GH20, GH32, GH43, GH76, and PL1) were found common in the secretome of all species (Figure 4, Table 3, Supplementary Table S3). In addition, some classes of CAZymes were found to be specific for some species, and notably 10 classes of CAZymes (GH64, GH125, GH45, GH132, AA16, GH128, GT4, GH51, GH115, GH38) were found only in the secretome of T. punctulata ( Figure 4, Table 3, Supplementary Table S3), suggesting that they might have a species-specific role in black scorch disease of date palm.

Secreted Proteases
Several studies have shown that plant pathogenic fungi secrete proteases that degrade plant antimicrobial proteins and protease inhibitors (PIs) to facilitate virulence [59]. The BlastP search against MEROPS database resulted in the identification of 24 putative proteases from the 197 refined secretome, which were classified into several groups based on their catalytic residues (Supplementary Table S4). Among the proteases, serine proteases (13) were dominant, followed by metallo proteases (6), aspartic proteases (4), and carboxy protease (1). The serine proteases included S8, S10, and S28 families. Among these, S8 was found to be dominant (Supplementary Table S4). Members of metallo protease were further classified into M14, M28, M36, and M43 families based on their similarity to the known members from these families. Members of the same classes of proteases were also identified in the closely related species (Supplementary Table S4). Comparison to closely related species revealed that Serine protease S8 was prominent in T. unctulate, whereas S9 was completely absent ( Figure 5).

CAZyme Class Class Members
Thielaviopsis Species

Secreted Proteases
Several studies have shown that plant pathogenic fungi secrete proteases that degrade plant antimicrobial proteins and protease inhibitors (PIs) to facilitate virulence [59]. The BlastP search against MEROPS database resulted in the identification of 24 putative proteases from the 197 refined secretome, which were classified into several groups based on their catalytic residues (Supplementary Table S4). Among the proteases, serine proteases (13) were dominant, followed by metallo proteases (6), aspartic proteases (4), and carboxy protease (1). The serine proteases included S8, S10, and S28 families. Among these, S8 was found to be dominant (Supplementary Table S4). Members of metallo protease were further classified into M14, M28, M36, and M43 families based on their similarity to the known members from these families. Members of the same classes of proteases were also identified in the closely related species (Supplementary Table S4). Comparison to closely related species revealed that Serine protease S8 was prominent in T. 12unctulate, whereas S9 was completely absent ( Figure 5).

Putative Effector Proteins
EffectorP, combined with the manual inspection, was employed to identify the putative effector proteins with the following characteristics: a signal peptide for secretion, no trans-membrane domains, fairly small size, and cysteine-rich [9,31,60]. This analysis resulted in the identification of 47 proteins as putative "effector" candidates ( Figure 6). The length of these effector candidates varied from 78 to 392 Amino acids, of which 15 candidates were 100 to 200, 16 were 200 to 300, and 14 were 300 to 400 amino acids in length,

Putative Effector Proteins
EffectorP, combined with the manual inspection, was employed to identify the putative effector proteins with the following characteristics: a signal peptide for secretion, no trans-membrane domains, fairly small size, and cysteine-rich [9,31,60]. This analysis resulted in the identification of 47 proteins as putative "effector" candidates ( Figure 6). The length of these effector candidates varied from 78 to 392 Amino acids, of which 15 candidates were 100 to 200, 16 were 200 to 300, and 14 were 300 to 400 amino acids in length, respectively (Table 4). Two candidates (KKA29758.1 and KKA27537.1) were found with a length of less than 100 amino acids. The number of cysteine residues varied from 2 to 8 in the selected putative effectors, of which 70% were found with more than four cysteine residues (Table 4) Table 4). Orthologue analysis found that T. punctulata shared 24 effectors with other closely related species, of which 3 proteins were shared with all closely related species (T. musarum, T. euricoi, T. populi, T. ethacetica and T. cerberus) (Figure 7). Two pairs of duplicated effector proteins were identified in the secretome of T. punctulata (KKA28270.1 and KKA26687.1 and KKA27979.1 and KKA29712.1). These paralogs showed 100% identity to each other, indicating a recent duplication. Interestingly, 18 effectors were found to be specific to T. punctulata, and functional analysis of these effectors may reveal more insight into the pathogenicity of T, punctulata on Date palm (Table 4). 23, 9, x FOR PEER REVIEW 13 of 20 4). Orthologue analysis found that T. punctulata shared 24 effectors with other closely related species, of which 3 proteins were shared with all closely related species (T. musarum, T. euricoi, T. populi, T. ethacetica and T. cerberus) (Figure 7). Two pairs of duplicated effector proteins were identified in the secretome of T. punctulata (KKA28270.1 and KKA26687.1 and KKA27979.1 and KKA29712.1). These paralogs showed 100% identity to each other, indicating a recent duplication. Interestingly, 18 effectors were found to be specific to T. punctulata, and functional analysis of these effectors may reveal more insight into the pathogenicity of T, punctulata on Date palm (Table 4).

Putative Virulence Factors
To identify the homologs of pathogenicity-associated genes in other phytopathogens, we screened 197 refined secretome, including all the putative effectors against the PHI (Pathogen-host interactions) database [47]. The protein sequences in the PHI database are classified into different categories such as loss of pathogenicity, reduced virulence, unaffected pathogenicity, increased virulence, effector (plant avirulence determinant), lethal, enhanced antagonism, resistant to chemical, and sensitivity to chemicals based on the results of mutation experiments. For example, the "Loss of pathogenicity" group includes proteins for which the mutant strains fail to cause diseases in host compared to the wild type. Based on the PHI annotation, of the 197 secretomes, 49 had PHI homologue, including 27 CAZymes, 7 proteases, and 15 putative effectors (Supplementary Table S5). Of the 27 CAZymes, four were assigned as effector (plant_avirulence_determinant), including three lytic cellulose monooxygenases in AA9 CAZymes family (KKA27328.1, KKA28497.1 and KKA25992.1) and one endo-β-1,4-xylanase in GH11 family (KKA30007.1). Both these enzymes were reported to contribute to the virulence of several plant pathogenic fungi. The homologue of lytic cellulose monooxygenase in Magnaporthe oryzae (MoCDIP) induced cell death when it was expressed in rice plant cells [61]. Similarly, a lytic cellulose monooxygenase gene (PHEC27213) in Podosphaera xanthii was shown to suppress the chitin-triggered immunity in cucurbit host [62]. Endo-β-1,4-xylanase was also shown to involved in the pathogenicity of many fungal pathogens, including Verticillium dahlia, Ustilago maydis, Valsa mali, etc. [63][64][65]. In addition, 13 CAZymes were also assigned as "reduced virulence" according PHI database, which includes three Pectin lyases (PL), two

Putative Virulence Factors
To identify the homologs of pathogenicity-associated genes in other phytopathogens, we screened 197 refined secretome, including all the putative effectors against the PHI (Pathogen-host interactions) database [47]. The protein sequences in the PHI database are classified into different categories such as loss of pathogenicity, reduced virulence, unaffected pathogenicity, increased virulence, effector (plant avirulence determinant), lethal, enhanced antagonism, resistant to chemical, and sensitivity to chemicals based on the results of mutation experiments. For example, the "Loss of pathogenicity" group includes proteins for which the mutant strains fail to cause diseases in host compared to the wild type. Based on the PHI annotation, of the 197 secretomes, 49 had PHI homologue, including 27 CAZymes, 7 proteases, and 15 putative effectors (Supplementary Table S5). Of the 27 CAZymes, four were assigned as effector (plant_avirulence_determinant), including three lytic cellulose monooxygenases in AA9 CAZymes family (KKA27328.1, KKA28497.1 and KKA25992.1) and one endo-β-1,4-xylanase in GH11 family (KKA30007.1). Both these enzymes were reported to contribute to the virulence of several plant pathogenic fungi. The homologue of lytic cellulose monooxygenase in Magnaporthe oryzae (MoCDIP) induced cell death when it was expressed in rice plant cells [61]. Similarly, a lytic cellulose monooxygenase gene (PHEC27213) in Podosphaera xanthii was shown to suppress the chitin-triggered immunity in cucurbit host [62]. Endo-β-1,4-xylanase was also shown to involved in the pathogenicity of many fungal pathogens, including Verticillium dahlia, Ustilago maydis, Valsa mali, etc. [63][64][65]. In addition, 13 CAZymes were also assigned as "reduced virulence" according PHI database, which includes three Pectin lyases (PL), two laccases (AA1), two β-glucosidases (GH03), one peroxidase (AA2), Oxidase (AA5), endo-1,4-β-xylanase (GH10), exo-α-1,6-mannosidase (GH25), and licheninase (GH16). The contribution of these genes in virulence was demonstrated for many plant-pathogenic fungi [66][67][68][69][70]. Of the CAZymes, six were assigned as "Unaffected category" (Table S5). Among the 25 proteases identified, two metallo peptidases (KKA26166.1 and KKA27603.1) were assigned as "loss of pathogenicity" and one carboxy peptidase (KKA30601.1) was assigned as "reduced virulence" (Supplementary Table S5). The homologs of these two metallo peptidases were shown to require the pathogenicity of Fusarium oxysporum f. sp. lycopersici and Magnaporthe oryzae on their respective hosts [71,72]. In addition, two proteases were assigned as "unaffected pathogenicity" according to PHI annotation (Supplementary Table S5). From the putative effectors group, PHI partners were identified for 15 proteins, of which one was assigned as "loss of pathogenicity", which encodes an acetylglucosaminyl phosphatidylinositol deacetylase (KKA29596.1) (Supplementary Table S5). The homologue of this protein in Colletotrichum graminicola, the causal agent of maize anthracnose, was shown to require cell-wall integrity and pathogenicity [73]. Thirteen effectors were assigned as "reduced virulence", and homologs of these proteins were shown to be required for pathogenicity in many fungal species. Notably, the most enriched group of effectors in T. punctulata were Egh16-like proteins, and their homologue in PHI database (PHI:256) was shown to act as virulence factors in rice blast fungus Magnaporthe grisea [74]. The Egh16 family members were also shown to be involved in the virulence of many pathogenic filamentous fungi. For example, two Egh16-like factors in Erysiphe pisi, EpCSEP087 and EpCSP083, were found to be highly induced during early infection stages on pea, suggesting a critical role in appressorium penetration and pathogenesis [75].

Conclusions
In the current study, the potential secretory proteins of Thielaviopsis punctulata were extensively characterized using a well-designed bioinformatics approach. Within the secretome, 74 CAZymes, 25 proteases, and 47 putative effector proteins were identified. For the comparative analysis, the genes models for five closely related species of T. punctulata were predicted and the secretome of each species was also well-characterized. The comparative analysis revealed that all Thielaviopsis species studied possessed species-specific CAZymes families, putative effectors, and several putative virulence factors. The current study will be a valuable source for the studies aimed at understanding the pathogenicity mechanism not only in T. punctulata-date palm interaction, but also in other Thielaviopsis species and their respective hosts.

Conflicts of Interest:
The authors declare no conflict of interest.