Genome-Wide Characterization of Effector Protein-Encoding Genes in Sclerospora graminicola and Its Validation in Response to Pearl Millet Downy Mildew Disease Stress

Pearl millet [Pennisetum glaucum (L.) R. Br.] is an essential food crop for over ninety million people living in the drier parts of India and South Africa. Pearl millet production is severely hindered by numerous biotic stresses. Sclerospora graminicola causes downy mildew disease in pearl millet. Effectors are proteins secreted by several fungi and bacteria that manipulate host cell structure and function. This study aims to identify genes encoding effector proteins in the S. graminicola genome and to validate them through molecular techniques. In silico analyses were employed for candidate effector prediction. A total of 845 secretory transmembrane proteins were predicted, of which 35 proteins carrying the LxLFLAK (Leucine–any amino acid–Leucine–Phenylalanine–Leucine–Alanine–Lysine) motif were crinklers, 52 were RxLR (Arginine–any amino acid–Leucine–Arginine) proteins, and 17 were putative RxLR-dEER effector proteins. Gene validation analysis of the 17 RxLR-dEER effector protein-encoding genes was carried out, of which 5 genes were amplified on the gel. These novel gene sequences were submitted to NCBI. This study is the first report on the identification and characterization of effector genes in Sclerospora graminicola. This dataset will aid in the integration of effector classes that act independently, paving the way to investigate how pearl millet responds to effector protein interactions. These results will assist in identifying functional effector proteins through omics approaches and newer bioinformatics tools to protect pearl millet plants against downy mildew stress. Taken together, the identified effector protein-encoding functional genes can be utilized in screening oomycete downy mildew diseases in other crops across the globe.


Introduction
Effectors are proteins secreted by several fungi and bacteria that manipulate the host cell structure and function. They are reported to cause infection or induce defense responses in the host [1,2]. This contradictory nature of effectors has been encountered in many fungal and bacterial plant diseases [3,4]. Depending on where they are found inside the host plant, effectors are categorized into two types: cytoplasmic and apoplastic. Apoplastic effectors are released into the plant extracellular spaces, whereas cytoplasmic effectors are delivered into the plant cytoplasm via specialized pathogen structures such as haustoria and vesicles. The delivery methods of effectors in fungi and oomycetes

GeneMark-ES Suite
The draft genome of Sclerospora graminicola was retrieved from https://www.ncbi.nlm.nih.gov/bioproject/PRJNA325098/ (accessed on 27 August 2021) to predict genes and proteins using the GeneMark-ES suite. The ES and fungus flags were used with the GeneMark script to enable self-training and the branch point model, and genes were predicted with default parameters. The following criteria were used to include a sequence containing a gene in the test set: a. The gene must have an initiator codon ATG and canonical acceptor/donor sites; b. Intron/exon assembly must be supported by expressed sequence tag/complementary deoxyribonucleic acid (EST/cDNA) evidence [32,33]; c. The annotation does have to include substitute isoforms accompanied by EST/cDNA; d. There must be no overlap with any other annotated genes; e. Multiple-gene sequences are more suitable for precision evaluation [34,35].

Identification of Signal Peptides (SP) in the N-Terminal Region
SignalP 6.0 (services.healthtech.dtu.dk/services/SignalP-6.0/) (accessed on 23 March 2022) was used to detect signal peptides (SP) in the N-terminal region. The amino acid sequences were converted into FASTA format and pasted into the input box provided. The appropriate options were then selected, and the command line "signalpinput.fasta" was submitted. The results showed the predicted SP and the position of the cleavage site [36][37][38].

TargetP Server
The sequences obtained from SignalP 6.0 were evaluated by TargetP v1.1 (http://www.cbs.dtu.dk/services/TargetP/) (accessed on 1 December 2021) [39] for their subcellular location based on N-terminal pre-sequences (at least the first 130 amino acids of the N-terminus are required). The input data were in one-letter amino acid code; any other symbols were converted to X before processing, and the non-plant option was selected before submitting the input [40][41][42].

TMHMM v2.0
The input sequences, in FASTA format, comprised functional and secretory pathway proteins with signal peptides and were checked for the presence of transmembrane (TM) domains by TMHMM v2.0 (https://services.healthtech.dtu.dk/service.php?TMHMM-2.0) (accessed on 3 January 2022) [43][44][45]. Proteins with 0 TM domains and proteins with 1 TM domain (an N-terminal signal peptide) were combined to obtain the secretome of Sclerospora graminicola. Further, LxLFLAK and RxLR motifs were searched in the secretome proteins by pattern matching to identify crinkler (CRN) and RxLR proteins, and all results were cross-verified with the EffectorP 3.0 tool (http://effectorp.csiro.au/) (accessed on 27 April 2022) [23]. The protein sequences were mapped back to their respective gene sequences, and a similarity search was carried out using the Basic Local Alignment Search Tool (BLAST) of NCBI (https://blast.ncbi.nlm.nih.gov/) (accessed on 18 November 2022).
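The pattern-matching step can be illustrated with regular expressions. The following is a minimal Python sketch, not the pipeline used in this study: the patterns `L.LFLAK` and `R.LR` are literal readings of the motif names, and the function name `classify_secretome` and the toy sequences are hypothetical.

```python
import re

# Literal regular-expression readings of the two motifs; "." stands for any residue (x).
MOTIFS = {
    "CRN": re.compile(r"L.LFLAK"),
    "RxLR": re.compile(r"R.LR"),
}


def classify_secretome(proteins):
    """Map each protein ID to the sorted list of motif classes found in its sequence."""
    hits = {}
    for protein_id, sequence in proteins.items():
        found = sorted(name for name, pattern in MOTIFS.items() if pattern.search(sequence))
        if found:
            hits[protein_id] = found
    return hits


# Hypothetical toy sequences, for illustration only (not S. graminicola proteins).
toy_secretome = {
    "toy_crn": "MKLSVAALLAGTLQLFLAKPEDG",
    "toy_rxlr": "MRVSFLLAVAASTRSLRGEDEER",
    "toy_none": "MKTAYIAKQGGSPW",
}
print(classify_secretome(toy_secretome))
```

A plain motif scan of this kind only flags candidates; positional criteria relative to the signal peptide cleavage site and cross-checks with a dedicated predictor are still needed to narrow the list.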

Host and Pathogen
The downy mildew pathogen, Sclerospora graminicola, was isolated from a highly susceptible pearl millet host cultivar (7042S) grown in earthen pots (12-15 cm diameter) under greenhouse conditions. The pathogen was maintained on the same host throughout the experiment.

Extraction of RNA from Sclerospora graminicola and cDNA Synthesis
The genomic DNA of the host plant was isolated from susceptible pearl millet leaves, as described by Divya et al. [46]. Leaf samples were collected from Sclerospora graminicola-infected pearl millet plants, rinsed with water to remove dirt and dust, and cleaned with sterile tissue paper. Clean leaves were stapled onto a wet blotter disc and placed on the upper lid of a sterile Petri dish. The sterile Petri dish was filled with 15 mL of sterile water and incubated overnight at 18-20 °C. During the early hours, spores were collected in the lower lid of the Petri dish and centrifuged at 5000 rpm for 5 min. The zoospores were washed three times in sterile distilled water, and total RNA was extracted with the RNeasy Plant Micro Kit as per the manufacturer's instructions (Qiagen, Hilden, Germany). The purity of the total RNA isolated from S. graminicola was checked by the A260/A280 absorbance ratio on an ultraviolet-visible spectrophotometer (Cary 60 UV-Vis, Agilent). The cDNA synthesis was performed with RNA templates using oligo(dT)18 primers (ThermoFisher, Madison, WI, USA).

Primer Designing
For gene validation, PCR primers for a subset of predicted full-length RxLR-dEER coding genes were designed and synthesized (Sigma-Aldrich Chemicals Pvt. Ltd., Bangalore, India). The primers used in this study are listed in Table 1.
Briefly, primers covering the full length of each effector protein-encoding nucleotide sequence from the genome were designed manually by selecting a few nucleotides from the site of initiation and the site of termination. The nucleotides at the termination site were converted with a reverse complement tool. The melting temperature of each primer was calculated with the percent GC Oligocalc tool.
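The manual primer design described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' procedure: the primer length of 20 nt, the function names, and the use of the common "basic" melting temperature formulas (2(A+T) + 4(G+C) for oligos shorter than 14 nt, and 64.9 + 41 × (G+C − 16.4)/N otherwise) are all assumptions introduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def basic_tm(primer):
    """Basic melting temperature estimate in degrees Celsius.

    Assumption: 2(A+T) + 4(G+C) for oligos shorter than 14 nt,
    and 64.9 + 41 * (G+C - 16.4) / N for longer oligos.
    """
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    if len(primer) < 14:
        return 2.0 * at + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)


def design_primer_pair(cds, length=20):
    """Forward primer from the start of the coding sequence, reverse primer
    as the reverse complement of its last `length` nucleotides."""
    forward = cds[:length].upper()
    reverse = reverse_complement(cds[-length:])
    return forward, reverse


# Hypothetical toy coding sequence, for illustration only.
forward, reverse = design_primer_pair("ATGAAACCCGGGTTTTAA", length=6)
print(forward, reverse, gc_percent(forward), basic_tm(forward))
```

In practice the annealing temperature is tuned per primer pair, so the estimate above is only a starting point.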

PCR Amplification
Polymerase chain reaction (PCR) experiments were carried out on cDNA in a thermal cycler (C1000 Touch, part no. 1851148, Bio-Rad, Philadelphia, PA, USA). The procedure was repeated thrice for each isolate individually, and only the successfully amplified and reproducible segments were analyzed. DNA amplification was conducted in a 20 µL reaction mixture containing 0.2 mM of primer and dNTPs, 0.6 units of Taq polymerase (Bangalore Genei, Bengaluru, India), 10 mM Tris-HCl (pH 9.0), 1.5 mM magnesium chloride, 50 mM potassium chloride, and 50 ng of DNA. PCR cycling settings were as follows: initial denaturation at 94 °C for 4 min, followed by 40 cycles of 1 min at 94 °C, 1 min at the primer-specific annealing temperature (Table 1), and 2 min at 72 °C, with a final extension for 10 min at 72 °C. After bromophenol blue was added, the amplicons were electrophoresed at 60-65 V on a 1.5% agarose gel stained with ethidium bromide (EtBr) in 1× Tris-borate-ethylenediaminetetraacetic acid (TBE) buffer, pH 8.3 [47]. A 1 kb DNA Ladder (part no. G571A, Promega Corporation, Madison, WI, USA) was used as the molecular weight marker (m). The gel slab was removed and visualized under a molecular imager (Gel Doc XR+ imaging system, Bio-Rad).
The amplicons were excised from the gel with a sharp, sterile scalpel blade while the gel was illuminated on a UV transilluminator (70%). The excised gel fragments were placed in a clean, pre-weighed 2 mL microcentrifuge tube. The required amplicon was recovered from the agarose gel with the PureLink Quick Gel Extraction Kit (Cat. No. K210012, Invitrogen, Waltham, MA, USA) according to the manufacturer's protocol, and the eluted product was subjected to Sanger sequencing (3730 DNA Analyzer 48-Capillary Array). The results were BLAST analyzed in NCBI for homology with any RxLR effector protein-encoding genes. The amplified nucleotide sequences of the RxLR-dEER effectors obtained from direct sequencing were converted to their respective amino acid sequences using the Translate tool, and the sequences in the 5′ to 3′ frames that had no gaps were selected. Screening for the RxLR and dEER motifs was carried out manually, and the intrinsic disorder of the respective proteins was investigated based on the predicted output of the PONDR VL-XT tool [48,49].
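The translation and frame selection step can be sketched as follows. This is a minimal Python illustration, not the Translate tool itself: it assumes the standard genetic code, considers only the three forward (5′ to 3′) frames, and interprets "no gaps" as "no internal stop codon", which is an assumption made here. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), ordered T, C, A, G at each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, frame=0):
    """Translate one forward frame (0, 1, or 2); unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def uninterrupted_frames(dna):
    """Return {frame: protein} for forward frames with no internal stop codon.

    A single terminal stop codon is allowed and stripped from the result.
    """
    result = {}
    for frame in range(3):
        protein = translate(dna, frame)
        body = protein[:-1] if protein.endswith("*") else protein
        if body and "*" not in body:
            result[frame] = body
    return result


# Hypothetical toy sequence, for illustration only.
print(uninterrupted_frames("ATGTAAGCC"))
```

The selected protein sequences can then be scanned for the RxLR and dEER motifs, either by eye as in this study or with a simple pattern search.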

Secretome Mapping
Protein sequences were obtained from the GeneMark-ES software and used as inputs for SignalP 6.0 to identify secretory proteins. Signal peptides, identified by SignalP 6.0, were present in 935 protein sequences. Out of these 935 proteins, 911 were predicted by TargetP v1.1 to carry secretory pathway signal peptides. TMHMM v2.0 then predicted 845 secretory proteins, of which 803 proteins had 0 TM domains and 42 proteins had 1 TM domain (an N-terminal signal peptide). One of the requirements for classifying a protein as an effector is that it is secreted extracellularly through the N-terminal secretion signal [50,51]. LxLFLAK motifs were present in 35 proteins, and 152 proteins had RxLR motifs.

Annotation for Crinkler and RxLR Effectors
Leucine-any amino acid-Leucine-Phenylalanine-Leucine-Alanine-Lysine (LxLFLAK) motifs were present in 35 proteins, which were labeled as crinkler (CRN) effector proteins. Furthermore, RxLR proteins were filtered using in-house Perl scripts based on the criteria that the motif is present within 30-60 amino acids after the signal cleavage site and that the cleavage site is present within the first 30 amino acids [52,53]. This led to the identification of 69 RxLR motif-containing proteins, which were designated as RxLR effectors. These were further screened for dEER motifs, and 17 were identified based on the dEER motif they carried in the amino acid sequence after the RxLR motif (Figure 1).
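The positional filter can be sketched in Python as an illustration of the criteria. This is not a port of the in-house scripts, which are not shown in this article. The assumptions are: the RxLR motif must start 30 to 60 residues downstream of the cleavage site, the cleavage site must lie within the first 30 residues, and the dEER motif is read as the pattern `[DE]EER`; the function name and toy sequence are hypothetical.

```python
import re

RXLR = re.compile(r"R.LR")
DEER = re.compile(r"[DE]EER")  # assumption: one literal reading of "dEER"


def classify_rxlr(sequence, cleavage_site, max_cleavage=30, window=(30, 60)):
    """Classify a secreted protein as 'RxLR-dEER', 'RxLR', or None.

    cleavage_site is the number of residues in the predicted signal peptide,
    for example as reported by a signal peptide predictor.
    """
    if cleavage_site > max_cleavage:
        return None
    mature = sequence[cleavage_site:]
    for match in RXLR.finditer(mature):
        # 1-based position of the motif start within the mature protein
        position = match.start() + 1
        if window[0] <= position <= window[1]:
            # Look for a dEER motif anywhere downstream of this RxLR motif.
            if DEER.search(mature, match.end()):
                return "RxLR-dEER"
            return "RxLR"
    return None


# Hypothetical toy protein: 20-residue signal peptide, RxLR at mature position 35.
toy = "M" + "A" * 19 + "G" * 34 + "RSLR" + "G" * 10 + "DEER" + "G" * 5
print(classify_rxlr(toy, cleavage_site=20))
```

Other studies use looser or stricter windows and dEER definitions, so the candidate counts depend on these choices.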

Similarity Search Using NCBI BLAST Tool
The predicted crinkler and RxLR nucleotide sequences were subjected to an NCBI BLAST search. Out of 35 crinkler (CRN) genes, 9 had similarity with Phytophthora sojae strain P6479, 10 with Plasmopara halstedii, 3 with Phytophthora infestans T30-4, 3 with Lagenidium giganteum f. caninum, and 10 had no similarity with any of the genes in the database. Out of 52 RxLR genes, only one had similarity with Plasmopara halstedii, and the remaining 51 had no similarity; all 17 RxLR-dEER effectors likewise had no similarity in the NCBI database. However, the translated RxLR protein sequences showed similarity with other proteins found in the NCBI database (Supplementary Table S1; Supplementary Figures S1-S5).

Confirmation of the Presence of RxLR-dEER Effector Genes
Total RNA isolated from S. graminicola had an A260/A280 purity ratio of 1.80, measured on a Cary 60 UV-Vis ultraviolet-visible spectrophotometer (Agilent). The amplicons of all 17 RxLR-dEER protein-coding genes were subjected to 1.5% agarose gel electrophoresis along with host DNA (Pennisetum glaucum) and a 1 kb ladder. Interestingly, only 5 bands (6877_g, 60945_g, 8311_g, 35983_g, and 60741_g) were visible, at 885, 1248, 1254, 1410, and 1533 base pairs, respectively (Figure 2). These five genes were BLAST analyzed in the NCBI database for homology with any RxLR effectors. None of the five amplicons had significant similarity in the NCBI database. Hence, the amplified sequences were submitted to NCBI GenBank (Table 5).

Analysis of Overall Disorder Regions in RxLR-dEER Effector Proteins Using PONDR VL-XT
The nucleotide sequences of the five amplicons were translated to their respective amino acid sequences using the bioinformatics tool Translate, and the sequences in the 5′ to 3′ frames that had no gaps and at least one open reading frame (ORF) were selected. The disordered content in the predicted RxLR-dEER proteins ranged from 25.05% to 46.17% (Figure 3A-E), and the mean disorder content was 33.928% (Table 6). The sequence features and predicted domains of the five novel effector proteins are presented in Figure 4 and Table 7.
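The percentage of disordered residues can be derived from per-residue disorder scores as sketched below. This is an illustrative Python sketch, assuming that a residue is counted as disordered when its predicted score is at or above 0.5; the function name and the toy scores are hypothetical and are not the values behind Table 6.

```python
def percent_disorder(scores, threshold=0.5):
    """Percentage of residues whose predicted disorder score is at or above threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    disordered = sum(1 for score in scores if score >= threshold)
    return 100.0 * disordered / len(scores)


# Hypothetical per-residue scores for a four-residue toy protein.
toy_scores = [0.1, 0.6, 0.9, 0.4]
print(percent_disorder(toy_scores))
```

The mean disorder content across proteins is then the arithmetic mean of the per-protein percentages.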

Discussion
Pathogens release effector proteins into the plant apoplast or transport them into the host cytoplasm, where they inhibit defense responses or change host metabolism [51,54]. The oomycete cytoplasmic RxLR and crinkler (CRN) effector classes are well documented based on their modular structure [55]. CRNs are a deep-rooted family of effectors discovered in various oomycete species with different evolutionary relationships [15]. In the present study, we discovered 104 effector-encoding genes in S. graminicola. A similar study carried out by Muller et al. [56] found 844 putative effector genes in Blumeria graminis f. sp. tritici. Huang et al. [57] used bioinformatic prediction approaches to identify 316 candidate secreted effector proteins (CSEPs) in the complete genome of Fusarium sacchari. A total of 95 CSEPs, spanning 40 superfamilies and 18 domains, had known conserved structures, while another 91 CSEPs comprised 7 recognized motifs. A total of 14 of the 130 CSEPs with no known domains or motifs had 1 of 4 unique motifs. The roles of 163 CSEPs were investigated using a heterologous expression system in Nicotiana benthamiana. In N. benthamiana, seven CSEPs reduced BAX-triggered programmed cell death, while four caused cell death. The expression characteristics of these eleven CSEPs during F. sacchari infection revealed that they could be involved in the sugarcane-F. sacchari interaction. The genome sequence of B. graminis f. sp. tritici, the powdery mildew pathogen of wheat, revealed 7588 protein-coding genes in a genome of 180 Mb. The B. graminis f. sp. hordei genome has 5854 protein-coding genes [58]. A total of 660 and 620 secretory proteins were recognized in 2 individual races, 77 and 106, of Puccinia triticina, respectively [59]. Our analysis of S. graminicola predicted that 845 out of 79,754 proteins are secretory transmembrane proteins. Furthermore, out of these secretory proteins, 35 CRN effector and 69 RxLR effector proteins were identified.
This is the first report on the identification and characterization of effector genes in Sclerospora graminicola, a downy mildew agent affecting Pennisetum glaucum.
From the present work, it is evident that the genome of S. graminicola encodes only thirty-five crinkler (CRN) effector proteins. This might be because oomycetes are known to secrete CRNs without a traditional signal peptide via an unusual secretion pathway. Because CRNs are known to cause necrosis, which is unfavorable to biotrophy, the lower CRN level in downy mildew pathogens could indicate an adaptation to biotrophy [17]. Moreover, 152 proteins contained RxLR motifs, of which 69 had the RxLR motif within 30-60 amino acids after the signal cleavage site and the cleavage site within the first 30 amino acids [52] in S. graminicola. In the future, a better understanding of fungal effector function and the underlying mechanisms, together with the application of host-induced gene silencing technology to generate disease-resistant crops, could be an effective method for preventing and controlling plant diseases [60].
The pathogenicity factors of downy mildew pathogens with the N-terminal RxLR motif are the best understood. Purayannur et al. [17] reported 296 RxLR effectors in P. humuli. Similarly, Tyler et al. [61] discovered 350 RxLR effectors in the Phytophthora sojae and P. ramorum genomes. At least fifty downy mildew effectors have been discovered in Hyaloperonospora parasitica by Morgan and Kamoun [62]. Kamoun [63] reported 200 effectors in P. sojae, P. capsici, P. infestans, and P. ramorum. Baxter et al. [14] found 130 genes in H. arabidopsidis expressing putative effector proteins with RxLR-dEER motifs. In a different study, Cai et al. [64] reported that the virulence of the bacterial pathogen Aeromonas salmonicida is highly dependent on the effector protein gene hcp. In this study, the hypersensitive reaction was indicated by the development of necrotic areas on the resistant pearl millet callus within 2 h post inoculation, thus triggering defense signaling responses in the neighboring cells.
In this study, the RxLR-dEER protein 35893_g (OM365913) had the highest proportion of disordered residues (46.17%) in the secretome of S. graminicola. Similarly, a different investigation reported that P. sojae had an average of 63% disordered amino acid residues, as RxLR-dEER proteins have a unique amino acid makeup and are rich in disorder-promoting residues, and the disordered structure of the effectors may boost their pathogenic ability. Thus, these proteins could interact with plant proteins to imitate host defense signaling molecules and control plant physiological responses [19,65,66]. Parallel to the genomic investigations that paved the way for studying effector and pathogen genetics, transcriptome analyses of biotrophs have become popular; they have also improved the understanding of pathogens by combining transcriptome research with genome-slicing investigations that employ complementary DNA libraries. Because biotroph effector molecules are generally unique and have little resemblance to existing proteins, selecting candidates based solely on sequence becomes more difficult [67]. Hacquard et al. [68] discovered a multistep mode of action of candidate secretory effector proteins (CSEPs) in powdery mildew pathogenesis on barley and immunocompromised Arabidopsis, with a first wave of CSEP transcripts accumulating during host cell entry (12 h) and a second wave accumulating at the stage of haustorium formation (24 h). In wheat powdery mildew, a comparably high induction of CSEPs was seen during the haustorial stage [69].

Conclusions
Obligate biotrophs are among the fastest developing pathogens; they might use many mechanisms driving effector development or encode a high quantity of nontypically discharged effector-producing genes. Transcripts mixed with host nucleotide sequences may obscure effector detection, as these organisms tightly govern effector regulation and expression is analyzed in diseased plants. Effector proteins have extremely varied sequences, and almost no protein has a resemblance to identified effectors. In the present study, the GeneMark-ES suite, SignalP 6.0, TargetP, and TMHMM v2.0 could define the organism's entire secretome from the predicted proteome, and EffectorP 3.0 positively correlated with the results of the above tools. This study clearly shows that the genome of Sclerospora graminicola comprises two classes of effectors, RxLR and crinkler (CRN), of which five novel RxLR effector proteins with the dEER motif are documented in the NCBI database. Furthermore, the presence of intrinsic disorder in these proteins is a unique structural property of RxLR proteins. This is the first report to document the presence of CRN, RxLR, and RxLR-dEER effector proteins in the S. graminicola genome. Further study on the interaction of RxLR proteins with host plants would provide a new area for confirming the pathogenic or mimicking activity by which the protein circumvents the host immune response to cause the disease.

Data Availability Statement:
The data used to support the results of this paper are available in the NCBI repository under accession numbers OM135515, OM365911, OM365912, OM365913, and OM365914, accessible through the following link: https://www.ncbi.nlm.nih.gov/nucleotide/ (accessed on 27 August 2022). The Sclerospora graminicola genome available in NCBI GenBank was used for transcriptome mapping: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA325098/ (accessed on 27 August 2022).