Genome-Wide Identification of Sweet Orange WRKY Transcription Factors and Analysis of Their Expression in Response to Infection by Penicillium digitatum

WRKY transcription factors (TFs) play a vital role in plant stress signal transduction and regulate the expression of various stress resistance genes. Sweet orange (Citrus sinensis) accounts for a large proportion of the world’s citrus industry, which has high economic value, while Penicillium digitatum is a prime pathogenic causing postharvest rot of oranges. There are few reports on how CsWRKY TFs play their regulatory roles after P. digitatum infects the fruit. In this study, we performed genome-wide identification, classification, phylogenetic and conserved domain analysis of CsWRKY TFs, visualized the structure and chromosomal localization of the encoded genes, explored the expression pattern of each CsWRKY gene under P. digitatum stress by transcriptome data, and made the functional prediction of the related genes. This study provided insight into the characteristics of 47 CsWRKY TFs, which were divided into three subfamilies and eight subgroups. TFs coding genes were unevenly distributed on nine chromosomes. The visualized results of the intron-exon structure and domain are closely related to phylogeny, and widely distributed cis-regulatory elements on each gene played a global regulatory role in gene expression. The expansion of the CSWRKY TFs family was probably facilitated by twenty-one pairs of duplicated genes, and the results of Ka/Ks calculations indicated that this gene family was primarily subjected to purifying selection during evolution. Our transcriptome data showed that 95.7% of WRKY genes were involved in the transcriptional regulation of sweet orange in response to P. digitatum infection. We obtained 15 differentially expressed genes and used the reported function of AtWRKY genes as references. They may be involved in defense against P. digitatum and other pathogens, closely related to the stress responses during plant growth and development. Two interesting genes, CsWRKY2 and CsWRKY14, were expressed more than 60 times and could be used as excellent candidate genes in sweet orange genetic improvement. This study offers a theoretical basis for the response of CSWRKY TFs to P. digitatum infection and provides a vital reference for molecular breeding.


Introduction
In the world citrus production, the total yield of oranges can reach 75.459 million tons with a harvested area of 388.5 ha, with sweet orange (Citrus sinensis) accounting for the largest share that is the main orange juice processing variety with a high economic value (Food and Agriculture Organization of the United Nations, 2020). Recently, increased global warming has raised the risk of pathogen infection in orchards, and the various biotic and abiotic stresses cause high loss rates in sweet oranges [1], with Penicillium digitatum dispensable roles. However, there are few reports on the roles of CSWRKY TFs in the infection of fruits by P. digitatum. In this study, we performed genome-wide identification, classification, phylogenetic and conserved domain analysis of CsWRKY TFs, visualized the structure and chromosomal localization of the encoded genes, explored the expression pattern of each CsWRKY gene under P. digitatum stress by transcriptome data, and made the functional prediction of the related genes. These results can provide a theoretical basis for further understanding the role of CsWRKY TFs in pathogen stress and lay a foundation for future functional verification of excellent candidate genes.

Genome-Wide Identification of the CsWRKY TFs
Relevant data, such as C. sinensis genome (v3.0) sequences, protein sequences, and annotation information, were downloaded from the Citrus Pan-Genome to Breeding Database (CPBD, http://citrus.hzau.edu.cn/download.php, accessed on 25 June 2022). Combined with the WRKY Hidden Markov Model (PF03106, http://pfam.xfam.org/ accessed on 25 June 2022), a domain search of the orange genome-wide proteins was performed using the hmmsearch program to select protein sequences with E-value < 1 × 10 −5 . Secondly, homology matching of sweet orange whole protein sequences was performed using the Blastp program with 71 AtWRKY TF sequences as a reference, and those with E-value < 1 × 10 −5 were screened. Two screens were combined and de-duplicated. Online protein domain analysis tools SMART (http://smart.embl-heidelberg.de/, accessed on 3 July 2022) and CDD (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, accessed on 3 July 2022) were used to determine the presence of WRKY domains in the candidate sequences [26,27], then eliminating missing or incomplete ones, finally obtaining CsWRKY TF family members.

Calculation of Physicochemical Parameters of CsWRKY TFs
The numbers of amino acids, molecular weight, theoretical isoelectric point (proteinwide included), and instability coefficient of the CsWRKY TFs were calculated using the online analysis tool ProtParam (http://www.expasy.org/tools/protparam.html, accessed on 5 July 2022). We used the Plant-mPLoc tool (http://www.csbio.sjtu.edu.cn/bioinf/ Cell-PLoc-2/, accessed on 5 July 2022) to predict the subcellular localization of CsWRKY sequences. The gene length, coding region length, and G-C contents in 2000 bp upstream of the coding genes were counted by TBtools [28].

Classification and Phylogenetic Analysis of CsWRKY TFs
The AtWRKY and CsWRKY TFs sequences were subjected to multiple sequence alignment by ClustalX2 software (version 2.0), and the results were corrected, clipped, and embellished by Jalview software (version 2.11.2) to retain 11 amino acid sequences upstream of the heptad domain and three amino acid sequences downstream of the zinc finger domain. We integrated the results and submitted them to WebLogo (http://weblogo. berkeley.edu/logo.cgi, accessed on 11 July 2022) for sequence identification while referring to the AtWRKY TFs classification methods for categorizing [16].
The domains (65-69 cross amino acid sequences) and full-length amino acid sequences of CsWRKY and AtWRKY TFs were compared using ClustalX2. The maximum likelihood tree-building software IQ-TREE and Modelfinder were chosen to select the best model to construct the phylogenetic tree, with the bootstrap parameter set to 1000 [29]. The evolutionary tree was obtained from beautification with the online iTOL (https://itol.embl. de/, accessed on 13 July 2022).

Analysis of the CsWRKY Gene Structures and TF Domains
Exon-intron information of the encoding genes was obtained from the sweet orange genome annotation files. The CsWRKY TF domains were predicted and annotated using the online tool MEME (http://meme-suite.org, accessed on 17 July 2022), with the number set to 10 and other parameters to default values [30]. Finally, the results were plotted using TBtools integration.

Analysis of the Cis-Elements of the CsWRKY Genes
To predict the cis-acting elements in the promoter, we extracted the upstream 2000 bp base sequences of the CsWRKY genes by TBtools. The promoter analysis was performed using the online PlantCARE (http://plantpan.itps.ncku.edu.tw/index.html, accessed on 25 July 2022), and the results were finally collated and visualized [31].

Chromosomal Localization, Duplicate Gene Pairs and Ka/Ks Calculations for the CsWRKY Genes
Based on the sweet orange whole genome and annotation files, the TBtools were used to extract the coordinates, length information, and genome-wide gene density information of target CsWRKY genes on chromosomes. The MCScanX was used to generate the gene collinearity files [32]. Additionally, the NG methods calculated the nonsynonymous changes (Ka), synonymous changes (Ks), and Ka/ks ratio of duplicated gene pairs [33].

Strain Isolation, Identification and Fruit Biostress Treatment
The method of Costa et al. [34] was referred to as isolate Penicillium disease samples. The mature sweet orange fruits infected with Penicillium were collected from Xinping County, Yunnan Province, China. Potato dextrose agar (PDA) medium was selected for isolation and purification, followed by monosporial culture, and finally sent to the company for sequencing and identification. The representative P. digitatum strain xp3 was chosen and inoculated onto a PDA medium by a plate coating method and incubated in a constant temperature incubator at 25 • C for three days. Then, the culture dishes were rinsed twice with sterile water, and the rinsing solution was collected in a centrifuge tube and shaken well to obtain a spore suspension.
Surface disinfection of 62-67 mm mature, medium-sized fruits harvested from the same asexual line in Xinping County by treating them with 4% sodium hypochlorite solution. Then, four holes (4 mm length × 4 mm width × 3 mm depth) were punched on the equator of each fruit skin with a punching bear after autoclaving, 30 µL of spore suspension was injected into each well separately as the treatment group, while the treatment injected with an equal amount of autoclaved distilled water was used as the control, and five Bing tang oranges were treated in each treatment and control group [35]. The treated fruits were packed into sterile plastic boxes and incubated in an ultra-clean bench at 25 • C for seven days, respectively. A sample of the peel with a radius of about 3 cm was taken from each hole, quickly stored in liquid nitrogen, and sent to Bioyi Biotechnology Co., Ltd. Wuhan, China, for transcriptome sequencing. Three biological replicates were set up for both the treatment and control groups (see Figure 1).
Curr. Issues Mol. Biol. 2023, 3, FOR PEER REVIEW 5 Wuhan, China, for transcriptome sequencing. Three biological replicates were set up for both the treatment and control groups (see Figure 1).

Figure 1.
Sweet oranges infected with P. digitatum and controls. CK, control fruits; T, fruits infected with P. digitatum.

Expression Profiling of CsWRKY Genes under P. digitatum
To avoid many junctions and low-quality transcriptomic data, we used Trim-galore and Trimmomatic to de-join and filter low-quality data first and then used FastaQC to quality check the data to ensure that all transcriptomic data had a Q-value, more than thirty. The reads were mapped to the reference genome using hisat2. Finally, the reads were compared to the reference genome using the FeatureCounts toolkit of Rsubread software (version 2.12.2). The fragments of each gene quantified the expression of each gene. Heat maps of CsWRKY gene expression patterns were constructed using the R package Pheatmap based on log2 FPKM values [36]. Additionally, we used DESeq2 to analyze differential genes under biotic stress. Finally, genes with Fold Change ≥ 2.00 and p-value ≤ 0.05 were defined as significantly differentially expressed genes [37].

Identification of CsWRKY TF Members and Calculation of Physical and Chemical Parameters
The whole sweet orange genome was scanned using BLAST alignment and the Hidden Markov Model to remove redundant sequences and sequences with short or missing domains before being further analyzed for the presence of intact WRKY domains in candidate members using the Pfam and CDD databases. Forty-seven reliable CsWRKY TFs were randomly renamed CsWRKY1-CsWRKY47 (Supplementary Table S1). It was shown that the G-C content in the promoter had a strong influence on promoter strength. The theoretical isoelectric point was a vital physicochemical property that affected the structure and function of the protein and could be used as an evolutionary marker for promoters and TFs, respectively. Both accompanied the characteristics of long-term biological evolution and showed a decreasing trend in plant cells [38]. So the effects of both on WRKY gene promoter intensity were explored ( Table 1). The amino acid numbers and relative molecular weights of the WRKY TFs ranged from 176 (CsWRKY32) to 953 (CsWRKY29), 20.11 (CsWRKY11) to 106.56 (CsWRKY32) kDa, with mean molecular weights and mean isoelectric points of 45.23 KDa and 6.99, the instability coefficients larger than 40, subcellular localization showed all of them located in the nucleus.

Expression Profiling of CsWRKY Genes under P. digitatum
To avoid many junctions and low-quality transcriptomic data, we used Trim-galore and Trimmomatic to de-join and filter low-quality data first and then used FastaQC to quality check the data to ensure that all transcriptomic data had a Q-value, more than thirty. The reads were mapped to the reference genome using hisat2. Finally, the reads were compared to the reference genome using the FeatureCounts toolkit of Rsubread software (version 2.12.2). The fragments of each gene quantified the expression of each gene. Heat maps of CsWRKY gene expression patterns were constructed using the R package Pheatmap based on log2 FPKM values [36]. Additionally, we used DESeq2 to analyze differential genes under biotic stress. Finally, genes with Fold Change ≥ 2.00 and p-value ≤ 0.05 were defined as significantly differentially expressed genes [37].

Identification of CsWRKY TF Members and Calculation of Physical and Chemical Parameters
The whole sweet orange genome was scanned using BLAST alignment and the Hidden Markov Model to remove redundant sequences and sequences with short or missing domains before being further analyzed for the presence of intact WRKY domains in candidate members using the Pfam and CDD databases. Forty-seven reliable CsWRKY TFs were randomly renamed CsWRKY1-CsWRKY47 (Supplementary Table S1). It was shown that the G-C content in the promoter had a strong influence on promoter strength. The theoretical isoelectric point was a vital physicochemical property that affected the structure and function of the protein and could be used as an evolutionary marker for promoters and TFs, respectively. Both accompanied the characteristics of long-term biological evolution and showed a decreasing trend in plant cells [38]. So the effects of both on WRKY gene promoter intensity were explored ( Table 1). The amino acid numbers and relative molecular weights of the WRKY TFs ranged from 176 (CsWRKY32) to 953 (CsWRKY29), 20.11 (CsWRKY11) to 106.56 (CsWRKY32) kDa, with mean molecular weights and mean isoelectric points of 45.23 KDa and 6.99, the instability coefficients larger than 40, subcellular localization showed all of them located in the nucleus.
In terms of evolutionary progression, promoters were more ready to turn on TFs due to reduced G-C content, and they showed a trend toward less positive charge when TFs did not require a more positive one. Altruistic interaction suggested a mutually beneficial co-evolutionary pattern between promoters and TFs in plant cells [39]. By comparing the overall mean isoelectric point (pI = 7.51) of the corresponding whole proteome, the CsWRKY TFs showed a decreasing trend, and the G-C content of the promoter region also presented an overall decreasing trend, which indicated a strong correlation between the two in the CsWRKY TFs evolution process.

Classification and Phylogenetic Analysis of CsWRKY TFs
Generally speaking, two or more genes with similar sequences had high similarity in function and regulation and vice versa. We detected the taxonomic situation and evolutionary relationships of each CsWRKY TFs by performing multiple sequence comparisons and constructing phylogenetic trees for the structural domains of CsWRKY and AtWRKY TFs (Figures 2 and 3). Members of subfamily I contained two heptad domains and two C2H2-type zinc finger domains, members of subfamily II included a WRKY domain and a C2H2-type zinc finger domain, and members of subfamily III contained a heptapeptideconserved sequence and a C2HC-type zinc finger domain. Among them, CsWRKY20 from group IIc showed the heptad domain of WRKYGKK, and similar ones, such as WRKY-GRK and WSKYEQK, were also found in Populus trichocarpa [40], which showed that the heptad domains and zinc finger domains of CsWRKY TFs had an overall low degree of variation. The subfamilies I, II and III were further subdivided into eight subgroups by combining sequence similarities. Subfamily I contained nine CsWRKYs; subfamily II included 32 members (IIa3, IIb8, IIc10, IId5, IIe6), the largest; subfamily III had six members. At the evolutionary level, the CsWRKY and AtWRKY were evenly distributed on each branch, which indicated that they had similar evolutionary patterns. The N-terminal and C-terminal WRKY domains in subfamily I were clustered into different evolutionary parts, and the five subgroups in subfamily II were not all clustered into one. While IIa, IIb, IId, and IIe belonged to the same subgroup, respectively, which were close to each other and belonged to parallel evolutionary processes but indicated the evolutionary differences between discrepant subgroups. After comparing the two plots in Figure 3A,B, we confirmed that the evolutionary tree construction based on both domains and full-length protein sequences could accurately indicate taxonomic and phylogenetic relationships of CsWRKY TFs, achieving mutual validation support.

Classification and Phylogenetic Analysis of CsWRKY TFs
Generally speaking, two or more genes with similar sequences had high similarity in function and regulation and vice versa. We detected the taxonomic situation and evolutionary relationships of each CsWRKY TFs by performing multiple sequence comparisons and constructing phylogenetic trees for the structural domains of CsWRKY and AtWRKY TFs (Figures 2 and 3). Members of subfamily I contained two heptad domains and two C2H2-type zinc finger domains, members of subfamily II included a WRKY domain and a C2H2-type zinc finger domain, and members of subfamily III contained a heptapeptideconserved sequence and a C2HC-type zinc finger domain. Among them, CsWRKY20 from group IIc showed the heptad domain of WRKYGKK, and similar ones, such as WRKYGRK and WSKYEQK, were also found in Populus trichocarpa [40], which showed that the heptad domains and zinc finger domains of CsWRKY TFs had an overall low degree of variation. The subfamilies I, II and III were further subdivided into eight subgroups by combining sequence similarities. Subfamily I contained nine CsWRKYs; subfamily II included 32 members (IIa3, IIb8, IIc10, IId5, IIe6), the largest; subfamily III had six members. At the evolutionary level, the CsWRKY and AtWRKY were evenly distributed on each branch, which indicated that they had similar evolutionary patterns. The N-terminal and C-terminal WRKY domains in subfamily I were clustered into different evolutionary parts, and the five subgroups in subfamily II were not all clustered into one. While IIa, IIb, IId, and IIe belonged to the same subgroup, respectively, which were close to each other and belonged to parallel evolutionary processes but indicated the evolutionary differences between discrepant subgroups. After comparing the two plots in Figure 3A,B, we confirmed that the evolutionary tree construction based on both domains and full-length protein sequences could accurately indicate taxonomic and phylogenetic relationships of CsWRKY TFs, achieving mutual validation support.    The phylogenetic tree based on the full-length amino acid sequences of the CsWRKY and AtWRKY TFs was constructed using the maximum likelihood tree-building software IQ-TREE, with the optimal model JTTDCMut + F + R5. The calibration parameters bootstrap was repeated 1000 times and retouched by iTOL. Branches with bootstrap ≥ 70% were shown, unmarked branches represent bootstrap < 70%, and the outer Roman numerals indicate grouping.

Analysis of the CsWRKY Gene Structures and TF Domains
Highly conserved gene sequences are vital for the survival of this species, and the domains have crucial functions that cannot be altered and are the core of the genes, and they may encode some important proteins [41]. Based on the phylogenetic relationships, we performed conserved domain analysis and gene structure analysis of 47 CsWRKY TFs to construct an evolutionary tree based on the full-length amino acid sequences ( Figure  4). Ten motifs were mined through the web tool MEME, among which motifs 1, 2, and 3 were considered WRKY domains. Each CsWRKY contained 1 to 7 motifs, ranging from 14 to 50 bp in length, and all members except CsWRKY37 included motifs 1 and 2. Members of the same subfamily (subgroup) of CsWRKY TFs had similar motif composition and arrangement. Some motifs presented specifically or simultaneously in multiple evolutionary branches, e.g., members of the group I contained motifs 1, 2, 3, 4, and 10, and motifs 7 and 9 were specific to subgroup IIb. Motif 8 was only to subgroup IId, and motif 6 was presented in subgroups IIa, IIb, and IIe, related to the functional differences. We found that the coding sequences corresponding to the WRKY domain all contained an intron with a highly conserved position, but the significance of their existence was unclear. The phylogenetic tree based on the full-length amino acid sequences of the CsWRKY and AtWRKY TFs was constructed using the maximum likelihood tree-building software IQ-TREE, with the optimal model JTTDCMut + F + R5. The calibration parameters bootstrap was repeated 1000 times and retouched by iTOL. Branches with bootstrap ≥ 70% were shown, unmarked branches represent bootstrap < 70%, and the outer Roman numerals indicate grouping.

Analysis of the CsWRKY Gene Structures and TF Domains
Highly conserved gene sequences are vital for the survival of this species, and the domains have crucial functions that cannot be altered and are the core of the genes, and they may encode some important proteins [41]. Based on the phylogenetic relationships, we performed conserved domain analysis and gene structure analysis of 47 CsWRKY TFs to construct an evolutionary tree based on the full-length amino acid sequences ( Figure 4). Ten motifs were mined through the web tool MEME, among which motifs 1, 2, and 3 were considered WRKY domains. Each CsWRKY contained 1 to 7 motifs, ranging from 14 to 50 bp in length, and all members except CsWRKY37 included motifs 1 and 2. Members of the same subfamily (subgroup) of CsWRKY TFs had similar motif composition and arrangement. Some motifs presented specifically or simultaneously in multiple evolutionary branches, e.g., members of the group I contained motifs 1, 2, 3, 4, and 10, and motifs 7 and 9 were specific to subgroup IIb. Motif 8 was only to subgroup IId, and motif 6 was presented in subgroups IIa, IIb, and IIe, related to the functional differences. We found that the coding sequences corresponding to the WRKY domain all contained an intron with a highly conserved position, but the significance of their existence was unclear. To provide more insight into the relationship between the coding gene structure and phylogeny, the exon-intron of the CsWRKY genes was demonstrated by TBtools. The forty-seven CsWRKY genes contained 2 to 12 exons and 1 to 11 introns, with high variation in number. In another example, twenty-two genes had three exons, seven genes had four exons, nine genes had five exons, and one gene had twelve exons, with CsWRKY29 having the highest number of exons and introns, which to some extent led to the functional diversity. Genes within the same subfamily (subgroup) usually had a similar number of exons, e.g., members I and III mostly contained three exons and two introns, while the number varied widely among different subgroups.
Here, synthesizing the structural similarities and differences in TFs on the same or different branches and groupings, it can be seen that the CsWRKY genes structure and conserved domain results exhibit a strong correlation with phylogenetic relationships, which can support the reliability of the CsWRKY TF family classification.

Analysis of Cis-acting Elements of the CsWRKY Gene's Promoter
Plant cis-acting elements played a vital role in the global regulation of gene expression. After a cis-acting element analysis of the 2000 bp upstream region of forty-seven CsWRKY genes by the PlantCARE, 14 common cis-acting elements were mainly predicted To provide more insight into the relationship between the coding gene structure and phylogeny, the exon-intron of the CsWRKY genes was demonstrated by TBtools. The forty-seven CsWRKY genes contained 2 to 12 exons and 1 to 11 introns, with high variation in number. In another example, twenty-two genes had three exons, seven genes had four exons, nine genes had five exons, and one gene had twelve exons, with CsWRKY29 having the highest number of exons and introns, which to some extent led to the functional diversity. Genes within the same subfamily (subgroup) usually had a similar number of exons, e.g., members I and III mostly contained three exons and two introns, while the number varied widely among different subgroups.
Here, synthesizing the structural similarities and differences in TFs on the same or different branches and groupings, it can be seen that the CsWRKY genes structure and conserved domain results exhibit a strong correlation with phylogenetic relationships, which can support the reliability of the CsWRKY TF family classification.

Analysis of Cis-Acting Elements of the CsWRKY Gene's Promoter
Plant cis-acting elements played a vital role in the global regulation of gene expression. After a cis-acting element analysis of the 2000 bp upstream region of forty-seven CsWRKY genes by the PlantCARE, 14 common cis-acting elements were mainly predicted (Supplementary Table S2), as well an evolutionary tree was constructed based on the fulllength sequences of CsWRKY TFs ( Figure 5). Including the W-box binding site, seven regulatory elements involved in phytohormone-induced responses (ABRE, cis-acting element involved in the abscisic acid responsiveness; CGTCA-motif and TGACG-motif, cisacting regulatory element involved in the Me-JA-responsiveness; TCA-element, cis-acting element involved in salicylic acid responsiveness; P-box and GARE-motif, gibberellinresponsive element; TGA-box, part of an auxin-responsive element) and six stress-related promoter regulators (ARE, a cis-acting regulatory essential for the anaerobic induction; GT1motif, light responsive element; MBS, MYB binding site involved in drought-inducibility; LTR, cis-acting element involved in low-temperature responsiveness; GCN4_motif, cisregulatory element involved in endosperm expression; WUN-motif, wound-responsive element). All forty-seven CsWRKY genes contained 5 to 32 common acting elements, with the most frequent being ABRE and the least being WUN-motif, with up to 32 predicted ones in CsWRKY45. Table S2), as well an evolutionary tree was constructed based on the f length sequences of CsWRKY TFs ( Figure 5). Including the W-box binding site, seven r ulatory elements involved in phytohormone-induced responses (ABRE, cis-acting ment involved in the abscisic acid responsiveness; CGTCA-motif and TGACG-motif, acting regulatory element involved in the Me-JA-responsiveness; TCA-element, cis-act element involved in salicylic acid responsiveness; P-box and GARE-motif, gibberellin sponsive element; TGA-box, part of an auxin-responsive element) and six stress-rela promoter regulators (ARE, a cis-acting regulatory essential for the anaerobic inducti GT1-motif, light responsive element; MBS, MYB binding site involved in drought-indu bility; LTR, cis-acting element involved in low-temperature responsiveness; GCN4_mo cis-regulatory element involved in endosperm expression; WUN-motif, wound-resp sive element). All forty-seven CsWRKY genes contained 5 to 32 common acting eleme with the most frequent being ABRE and the least being WUN-motif, with up to 32 p dicted ones in CsWRKY45. Notably, forty-seven CsWRKY genes predicted 73 W-boxes, suggesting that spec binding of the WRKY domain to the W-box was essential for coordinated transcriptio Notably, forty-seven CsWRKY genes predicted 73 W-boxes, suggesting that specific binding of the WRKY domain to the W-box was essential for coordinated transcriptional activation. CsWRKY45 contains 18 regulatory elements associated with the Me-JA response. CsWRKY2 had ten regulatory factors related to abscisic acid response. Nearly half of the ratio is mainly to biological processes such as hormone signaling under specific adversity stresses.

Chromosome Distribution, Gene Density and Duplication, Collinearity Analysis, and Ka/Ks Calculation for CsWRKY Genes
To understand the distribution and genome-wide density of CsWRKY genes on sweet orange chromosomes, we localized 47 CsWRKY genes on nine chromosomes of the sweet orange genome by TBtools ( Figure 6). There were 4, 7, 2, 8, 10, 5, 7, 1, and 3 CsWRKY genes localized on chromosomes 1, 2, 3, 4, 5, 6, 7, 8, and 9, respectively, with an overall uneven distribution and no significant correlation between chromosome length and the number of CsWRKY genes. However, the whole genes were relatively evenly distributed on each chromosome.
. Issues Mol. Biol. 2023, 3, FOR PEER REVIEW of the ratio is mainly to biological processes such as hormone signaling under speci adversity stresses.

Chromosome Distribution, Gene Density and Duplication, Collinearity Analysis, and Ka/K Calculation for CsWRKY Genes
To understand the distribution and genome-wide density of CsWRKY genes on swe orange chromosomes, we localized 47 CsWRKY genes on nine chromosomes of the swe orange genome by TBtools ( Figure 6). There were 4, 7, 2, 8, 10, 5, 7, 1, and 3 CsWRKY gen localized on chromosomes 1, 2, 3, 4, 5, 6, 7, 8, and 9, respectively, with an overall unev distribution and no significant correlation between chromosome length and the numb of CsWRKY genes. However, the whole genes were relatively evenly distributed on ea chromosome. Figure 6. Chromosome distribution, gene density, gene duplication, and collinearity analysis CsWRKY genes. The gray line indicated all duplicated genes in the sweet orange genome, blue p sents duplicated WRKY gene pairs, the gene density was shown as a heat map, and line graph w increasing density from blue to gray to yellow, and chromosome names were shown on the outsi of each chromosome.
Genome-wide duplication events are a common phenomenon in biology and cu rently reported in many species. Forty-seven CsWRKY genes duplication events were a Figure 6. Chromosome distribution, gene density, gene duplication, and collinearity analysis for CsWRKY genes. The gray line indicated all duplicated genes in the sweet orange genome, blue presents duplicated WRKY gene pairs, the gene density was shown as a heat map, and line graph with increasing density from blue to gray to yellow, and chromosome names were shown on the outside of each chromosome.
Changes in amino acids maybe lead to radical changes in their function and conformation, resulting in an advantage or disadvantage of evolutionary selection [42]. To understand the evolutionary selection of the duplicated homologous CsWRKY genes subjected during evolution and whether this selection contributed to the organism's fitness, we calculated the ratio between nonsynonymous and synonymous substitutions (Ka/Ks) for twenty-one pairs of duplicated genes (Supplementary Table S3). It was shown that genes were subject to positive selection for the Ka/Ks ratio > 1, neutral selection for the ratio = 1, and purifying one for the ratio < 1 [27]. Twenty-one pairs of duplicated genes were identified, and 90.5% had Ka/Ks values < 1, suggesting that CsWRKY genes in sweet orange may have been subject to purifying selection pressure during speciation, while 9.5% of duplicated gene pairs had zero Ks, which may have resulted from single-base mutations.

Gene Ontology (GO) Enrichment Analysis
Genome sequencing data have shown that most genes with core biological functions are common to all eukaryotes. GO annotation classification system is subject to elaborate biological macromolecules with three major underlying classifications: cellular components, molecular function, and biological processes [43]. To further understand the expression of forty-seven CsWRKY genes, we performed GO annotation to analyze (Supplementary  Table S4). Forty-one of forty-seven CsWRKY genes were up to 387 GO terms, mainly including three categories, of which 87.86% were to biological processes. Since nearly 400 GO terms were done, we selected 41 GO-Level 3 for display (Figure 7). The molecular functional categories mainly included heterocyclic compound binding (GO:1901363) and DNA-binding transcription factor activity (GO:0003700). Intracellular anatomical structure (GO:0005622) and membrane-enclosed lumen (GO:0031974) were in the cellular components. The category of biological processes was mainly involved in the organic substance metabolic process (GO:0071704), regulation of the process (GO:0050789), nitrogen compound metabolic process (GO:0006807), cellular metabolic process (GO:0044237), and biosynthetic process (GO:0009058). Most disease-resistance genes interacted with the TFs to enhance disease resistan For example, Xa1 shear in vivo released an intracellular kinase domain that transloca to the nucleus and interacts with the OsWRKY62 TFs to enhance rice resistance to le blight [44]. Pita encoding product interacted with the expression product of AVR-Pita non-virulent gene of rice blast fungus, to trigger a disease-resistance response [45]. The disease resistance genes were to the essential functions of the molecule and processes lated to cells, tissues, organs, or living organisms, and gene product localization.

The transcript-Level Analysis of CsWRKY Genes under Biotic Stress
To investigate the potential role of CsWRKY genes under biotic stress, we analyz the transcriptome data of sweet oranges after five days of infection by P. digitatum a obtained the expression patterns of 47 CsWRKY genes in sweet oranges (Figure 8). W screened thirteen differentially expressed genes (value > 2), including CsWRKY CsWRKY5, CsWRKY8, CsWRKY11, CsWRKY14, CsWRKY20, CsWRKY23, CsWRKY2 CsWRKY28, CsWRKY31, CsWRKY37, CsWRKY45, and CsWRKY46, only CsWRKY23 wa down-regulated expression gene, while the rest were all up-regulated expression gen Notably, CsWRKY2 and CsWRKY14 expression were up-regulated 67 and 155-fold, all dicating that different copies of different genes had different expression patterns in swe orange under biotic stress. CsWRKYs were extensively involved in the growth and dev opment of sweet oranges in response to P. digitatum infection. Interestingly, CsWRKY Most disease-resistance genes interacted with the TFs to enhance disease resistance. For example, Xa1 shear in vivo released an intracellular kinase domain that translocates to the nucleus and interacts with the OsWRKY62 TFs to enhance rice resistance to leaf blight [44]. Pita encoding product interacted with the expression product of AVR-Pita, a non-virulent gene of rice blast fungus, to trigger a disease-resistance response [45]. These disease resistance genes were to the essential functions of the molecule and processes related to cells, tissues, organs, or living organisms, and gene product localization.

The Transcript-Level Analysis of CsWRKY Genes under Biotic Stress
To investigate the potential role of CsWRKY genes under biotic stress, we analyzed the transcriptome data of sweet oranges after five days of infection by P. digitatum and obtained the expression patterns of 47 CsWRKY genes in sweet oranges (Figure 8). We screened thirteen differentially expressed genes (value > 2), including CsWRKY2, CsWRKY5, CsWRKY8, CsWRKY11, CsWRKY14, CsWRKY20, CsWRKY23, CsWRKY27, CsWRKY28, CsWRKY31, CsWRKY37, CsWRKY45, and CsWRKY46, only CsWRKY23 was a down-regulated expression gene, while the rest were all up-regulated expression genes. Notably, CsWRKY2 and CsWRKY14 expression were up-regulated 67 and 155-fold, all indicating that different copies of different genes had different expression patterns in sweet orange under biotic stress. CsWRKYs were extensively involved in the growth and development of sweet oranges in response to P. digitatum infection. Interestingly, CsWRKY17 and CsWRKY32 did not express in ripe fruits but did significantly after pathogen infection treatment. In Arabidopsis, the functions of most WRKY TFs were verified and widely in plants' response to pathogen infections. We selected six highly expressed C representative genes for functional inference based on the evolutionary relation structural similarity between CsWRKY and AtWRKY TFs ( Table 2). Combined motif prediction results, we found that the N and C-terminal ends of the highly e differential genes contained a novel motif5 and motif6. Respectively, those prese promote rapid expression. The promoter region had abundant TGACG-motif, Wpromoter elements that responded to pathogen infection and bound to diseasegenes, and then activated their transcription in response to pathogen infection pression level of the silenced CsWRKY32 was altered after P. digitatum infectio vital motif4 was predicted in front of the W-box, a fungus-inducible responding Me-JA (CGTCA-motif), which to some extent conferred a structural basis for resi the pathogen. In Arabidopsis, the functions of most WRKY TFs were verified and widely involved in plants' response to pathogen infections. We selected six highly expressed CsWRKYs representative genes for functional inference based on the evolutionary relationship and structural similarity between CsWRKY and AtWRKY TFs ( Table 2). Combined with the motif prediction results, we found that the N and C-terminal ends of the highly expressed differential genes contained a novel motif5 and motif6. Respectively, those present might promote rapid expression. The promoter region had abundant TGACG-motif, W-box core promoter elements that responded to pathogen infection and bound to disease-resistant genes, and then activated their transcription in response to pathogen infection. The expression level of the silenced CsWRKY32 was altered after P. digitatum infection, and a vital motif4 was predicted in front of the W-box, a fungus-inducible responding element Me-JA (CGTCA-motif), which to some extent conferred a structural basis for resistance to the pathogen. Expression of AtWRKY33 could be induced by parasitic plants, avirulent, necrotrophic fungal pathogens (Alternaria alternata, Sclerotinia sclerotiorum), and PAMPs (flg22 polypeptide, chitin, oxidative stress).

CsWRKY32
AtWRKY3 I Positive regulation Nucleus AtWRKY3 gene expression was up-regulated significantly under salt and Me-JA stress, and AtWRKY3 deletion affected the ROS scavenging pathway and reduced stress tolerance. Overexpression of AtWRKY3 significantly increased plant susceptibility to bacterial pathogens [47].

CsWRKY11
AtWRKY45, AtWRKY72, AtWRKY22 IIc Positive regulation Nucleus AtWRKY22 mediated the senescence signaling pathway and regulated the expression of senescence-related genes. Under hypoxic (flooding) stress, the encoded genes were expressed rapidly and strongly to enhance resistance to P. syringae [48].

CsWRKY45
AtWRKY48 IIc Negative regulation Nucleus AtWRKY48 can be rapidly induced by P. syringae infection and specifically binding to the W-box sequence. JA and ET signaling pathways were negatively regulated in stress and pathogenesis-induced AtWRKY48 expression [49].

CsWRKY14 AtWRKY7
IId Negative regulation Nucleus AtWRKY7 bonds to the bZIP28 promoter via a W-box element. AtWRKY7 mutant accumulates more bZIP28 TFs in response to Flg22, a highly conserved polypeptide at the N-terminal end of the bacterial flagellin, and the higher the degree of mutation, the higher the resistance to P. syringae pathovar tomato pathogenic (Pst) [50].

CsWRKY37
AtWRKY70 III Negative regulation Nucleus AtWRKY70 was constitutively expressed at all leaf developmental stages, with the highest signal intensity in senescent leaves; single mutants showed more resistance to P. syringae, while double mutants were more sensitive [51].

Discussion
WRKY family is a prime class of transcription factors in plants involved in the plant response to biological and abiotic stress regulation. Meanwhile, many WRKY genes in the genomes of higher plants also indicate that they play a broad role in development and evolution. This class of genes has been studied in monocotyledons and dicots, mainly due to the increasingly mature genome-wide, transcriptome sequencing technologies, bioinformatics tools, and the mature research systems for identification, classification, and biological functions of model plants, such as A. thaliana and G. max, which lay the foundation for the study of the WRKY TFs family in non-model species. Previous studies on CsWRKY genes focused on the differences in expression patterns of stems, leaves, fruits, or roots at different developmental periods and the analysis of expression under abiotic stress such as drought, cold, and high salt, as well as mechanistic studies. However, systematic studies on WRKY TFs in sweet orange under P. digitatum stressing have rare reports.
In this study, 47 CsWRKY TFs were identified by the genome-wide analysis, which was close to the woody plants, such as Zanthoxylum bungeanum (38 ZbWRKY) [52] and Prunus dulcis (62 PdWRKY) [53]. The number differed significantly compared to herbaceous plants such as Chenopodium quinoa (92 CqWRKY) [54] and Oryza nivara (97 OnWRKY) [55], which indirectly indicated that monocotyledonous genomes might be more abundant than dicotyledonous genomes on the whole [39]. In similar studies, 51 CsWRKY TFs were identified by Ayadi, and nine members were not located on chromosomes [56], compared with the 47 TFs we identified located on nine chromosomes. Our identification results may be more accurate than Ayadi's study and can better present information about the members of the CsWRKY TF families. As we used the latest draft genome of sweet orange (Citrus sinensis v3.0), the method used was easier to understand, and the screening criteria were much more rigorous. Sweet oranges often grow in hilly and poorly established environments, though it does not contain many WRKY TFs, and we assume that the expression and transcriptional regulation of most WRKY genes is vital for responding to various stresses. Many small-scale gene duplications or doubling events have occasionally occurred in woody or herbaceous plants during their long evolutionary history, which can easily lead to unequal gene numbers in different species. The duplication of genes in these regions has little effect on conserved sequences and ensures the exercise of TFs function [2]. Various proteins interact with WRKY TFs, and the vital WRKY TF proteins may provide important insights into their regulation and mode of action, closely related to their physicochemical properties. The G and C content of the promoter regions and the isoelectric point of their coding proteins determine the timing, intensity, and location of regulated gene expression. Forty-seven CsWRKY gene promoters showed decreasing trends in G and C content and isoelectric points of TFs, consistent with the co-evolutionary pattern of promoters and transcription factors in the plant kingdom [38].
During the visualization of the CsWRKY domain, novel heptad domains such as WRKYGKK were identified, which may have resulted from single base mutations. It has been reported that single or multiple amino acid changes in tobacco reduce the affinity for binding to the W-box and even alter its function [57]. Phylogenetic relationships showed that forty-seven CsWRKY TFs were divided into eight subgroups, with the IIc subgroup having the most members and the IIa subgroup the least, which was verified in almost all reported WRKY TFs [8,10,35,53].
Members of the same subfamily on the phylogenetic tree had the same or similar intron-exon structure, domain type, and arrangement order [58]. We observed some unique or missing domains, introns, and exons in some groups. The CsWRKY TF families underwent intron-exon losing or incoming events during the evolutionary process, and these differences may be related to CsWRKY gene functional diversity [59]. In Arabidopsis, the AtWRKY27-positive regulator was essential for defense against dead body trophic B. cinerea [60]. The intron-exon structure of the highly expressed CsWRKY2 was highly similar to that of AtWRKY27. However, the domains were arranged in a different order, probably due to differences in the domain arrangement between various species at different evolutionary levels. The expression levels of AtWRKY11 and AtWRKY17 were up-regulated at different developmental stages under ABA, salt, and osmotic stress [61]. The highly expressed gene CsWRKY14 in sweet orange was highly homologous to the above genes, but they played different roles in response to various stresses, and whether the CsWRKY14 could be highly expressed under abiotic stress remained to be verified. It has been shown that the domains and gene structures exhibit strong correlations with phylogenetic relationships and biological functions. However, only the functional validation of very few members of TF families had been accomplished in most species, so the overall realization of evolutionary relationships and taxonomic studies by gene functions have not been reported in non-model species, which deserves further research.
WRKY TFs bond to target gene promoter W-box cis-acting elements and participate in autoregulation or cross-regulation by regulating multiple stress signaling pathways to activate, enhance, or repress the expression of target genes [62]. Seventy-three W-Box binding sites were identified from 47 CsWRKY TFs, even as many as 4 in CsWRKY7 in this paper. There were 22 CsWRKY TFs with multiple W-Box binding sites (value >2). However, for the 56 CsWRKYs with much W-boxes identification, the percentage was 33.11% in the tea trees [8]. In barley, the HvWRKY38 TF was associated with two adjacent W-boxes to activate the transcription of downstream target genes effectively [63], fully demonstrating the vital regulatory role of W-boxes in plant response to biotic or abiotic stress. When ABA and Me-JA accumulated to a certain level in plants, they would initiate the expression of disease-resistant defense genes, prompting a defense response and thus exhibiting strong resistance to disease [64]. We predicted abundant hormone-regulated elements (154 ABRE elements and 85 TGACG-motif73), suggesting many inducible promoters with cumulative effects on the expression of downstream genes in addition to the required ones such as W-box elements, which was similar to the predicted promoters 2000 bp upstream of WRKY genes in pineapple [11] and watermelon [26]. In this study, we found that CsWRKY2 contained many listed vital promoter elements, indicating that this gene responds to P. digitatum infection as a process in which multiple cis-acting genes interact in an integrated manner. The CsWRKY14 contained only ABRE, ARE, and W-box elements. It indicated a difference in the mechanism of action of the two genes in response to fungal infection. It is inferred that these promoter elements may respond to pathogen stress, drought stress, high-temperature stress, low-temperature stress, salt stress, and plant damage stress.
Genome-wide duplication events are a common phenomenon in biology; gene duplication events are a dominant force in the evolution of plant genomes and duplicated gene pairs are now introduced in many species providing the molecular basis for species adaptation to different environments and biodiversity [65]. Among the 47 CsWRKY genes, there were 21 sets of fragment duplication genes, three duplications (CsWRKY24, CsWRKY42, and CsWRKY43) were amplified from the CsWRKY8 gene alone, and the duplication events were very conserved, with the number of fragment duplications far exceeding the number of tandem duplications, indicating that fragment duplication events were one of the main drivers of the evolution of the CsWRKY TFs. We noted that among the 21 pairs of duplicated genes, most of them are distributed in regions with high gene density, and chromosome 5 had the highest number of duplicated genes and the most distributed CsWRKY members, speculating that there was rich genetic diversity on chromosome 5, which may have evolutionarily created and preserved a wide variety of variant types. The Ka/Ks ratios of 21 pairs of duplicated genes showed that 90.5% of the Ka/Ks values were < 1. It inferred that CsWRKY genes might have been subjected to purify selective pressure during speciation, which meant that in most cases, the WRKY genes were selected to eliminate harmful mutations and keep the protein unchanged while maintaining its original function.
In the results of phylogenetic tree analysis, different subgroups had some preferences in response to various biotic or abiotic stress, while members from the same evolutionary branch often had many similarities in stress functions. For example, some WRKY from subfamilies IIa [66] and III [51] were mainly involved in regulating leaf senescence in response to environmental stress, and some WRKY genes from subfamily I [46] and subgroup IIb [67] were involved in response to fungal and bacterial pathogens. Since highly expressed genes usually play a vital role in plant development, CsWRKY genes can be expressed strongly and rapidly under specific biotic stress. In the present study, CsWRKY2 from group I and CsWRKY14 from group IId were up-regulated significantly in response to P. digitatum infection, and CsWRKY14 from subgroup IId and CsWRKY46 from subgroup IIe were identified as differentially expressed genes, both likely involved in the defense response to P. digitatum infection. These highly and differentially expressed genes WRKY2, 5,8,11,14,20,23,27,28,31,37,45,46 were essential for sweet orange growth and were good candidates for future functional analysis. CsWRKY17 and CsWRKY32 were not expressed in mature fruits but did after P. digitatum infection, so the two genes were also listed as valuable candidates.
The function prediction employing AtWRKY genes with similar structures to fifteen CsWRKY genes as a reference showed that some genes might act individually in response to multiple stress, depending on the signaling pathways and many phytohormone signaling cascades in plant stress responses. CsWRKY11 and CsWRKY45, the only pair of duplicated genes among the differentially expressed genes, may be involved in leaf senescence and defense against Pseudomonas syringae responses. However, the difference in expression was nearly one-fold, which was consistent with Wendel's [68] genomic evolutionary pattern for polyploids, i.e., two duplicated genes in functional divergence may take a long time to occur but diversity in expression can start to occur within a short period of inches after the duplication.

Conclusions
In this study, 47 different CsWRKY TFs were identified and classified into eight subgroups (three subfamilies), and their expression patterns were analyzed. The exonintron structures and domains of the CsWRKY genes strongly supported the classification results. Fourteen induced-response and stress-response elements were also identified in the promoter regions. Expression profiles indicated that forty-five genes involve in resistance to fungus infection. Additionally, fifteen significantly differentially expressed genes were essential for sweet oranges in response to P. digitatum infection, and their response varied with the degree of stress. In summary, this study provides a solid basis for studying the regulatory mechanisms of WRKY TFs in response to pathogen infection in sweet orange, and the results will help provide insights for further exploration of the role of CsWRKY TFs in biotic stress response.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/cimb45020082/s1, Table S1: The information on the WRKY TFs family is in sweet orange; Table S2: Information on cis-elements detected in the promoter regions of the WRKY genes; Table S3: The Ka/Ks ratio of 21 pairs of tandem repeats in the sweet orange WRKY TFs family; Table S4: The GO enrichment analysis of the WRKY genes.