Composition and Codon Usage Pattern Results in Divergence of the Zinc Binuclear Cluster (Zn(II)2Cys6) Sequences among Ascomycetes Plant Pathogenic Fungi

Zinc binuclear cluster proteins (ZBC; Zn(II)2Cys6) are unique to the fungal kingdom and are associated with a range of functions, viz., the utilization of macromolecules, stress tolerance, and, most importantly, host–pathogen interactions, in which they impart virulence to the pathogen. Codon usage bias (CUB) is the non-uniform use of synonymous codons during translation, which arises from interactions among evolutionary forces. The Zn(II)2Cys6 coding sequences from nine plant pathogenic Ascomycetes species and the model yeast were analysed for compositional and codon usage bias patterns. Clustering analysis separated the Ascomycetes fungi into two clusters. Nucleotide composition and relative synonymous codon usage (RSCU) analyses indicated a GC bias in the plant pathogenic Ascomycetes fungi, in contrast to the model system S. cerevisiae, which tends to be AT-rich. Further, plant pathogenic Ascomycetes fungi belonging to cluster 2 showed a higher number of GC-rich high-frequency codons than those in cluster 1, whereas the high-frequency codons in S. cerevisiae were exclusively AT-rich. The current investigation also showed the combined effect of two evolutionary forces, viz., natural selection and compositional constraints, on the CUB of Zn(II)2Cys6 genes. The persistence of GC-rich codons of Zn(II)2Cys6 in Ascomycetes could facilitate the invasion process. The findings show the role of CUB and nucleotide composition in the evolutionary divergence of Ascomycetes plant pathogens and pave the way to target specific codons and sequences to modulate host–pathogen interactions through genome editing and functional genomics tools.


Introduction
Regulation of genes at different levels (transcriptional, post-transcriptional, translational, and post-translational) is critical for generating a functional product. Gene expression regulation is required for cellular differentiation, adaptation, development, and evolution [1,2]. The regulation of gene expression is mediated by protein molecules, termed transcription factors (TFs), that bind to specific DNA sequences and act as activators or repressors of gene expression. Whole genome sequencing has led to the identification of ~80 transcription factor families in almost 200 fungal species [3]. Zinc finger proteins are one of the largest groups of transcription factors present in eukaryotes, with diverse secondary structures and functional characteristics [4]. These transcription factors have been categorised into three classes, namely Cys2His2, Cys4, and C6 zinc finger proteins (Zn(II)2Cys6). C6 proteins comprise two zinc atoms bound to six cysteine residues and are termed zinc binuclear clusters (Zn(II)2Cys6). These Zn(II)2Cys6 proteins are found exclusively in fungi and are distributed in the ratio of 10:20:40 in chytrids-zygomycetes, basidiomycetes, and Ascomycetes, respectively [3]. Further, Zn(II)2Cys6 proteins have been extensively studied in S. cerevisiae, with Gal4p as the signature protein [4], as well as in Candida albicans [1], Tolypocladium guangdongense [5], and Aspergillus flavus [6]. A wide range of functions has been attributed to Zn(II)2Cys6 proteins, from primary to secondary metabolism, including major roles in fungal development and in imparting virulence to pathogenic fungi. It has been established that Zn(II)2Cys6 proteins are involved in carbon and nitrogen utilization, secondary metabolite biosynthesis, stress response [7], chromatin remodelling, melanin biosynthesis, sugar and amino acid metabolism, drug resistance [8], hyphal growth regulation, appressorium polarisation, etc. [9,10].
An M. oryzae strain carrying a mutant TPC1 (transcription factor for polarity control) showed delayed glycogen and lipid metabolism, along with impaired appressorium-mediated plant infection [11]. In Aspergillus oryzae, transcription of kojA and kojT is regulated by the Zn(II)2Cys6 protein KojR, which mediates the biosynthesis of kojic acid [12]. Similar orthologs were also identified in A. flavus, which, along with another gene cluster, regulate kojic acid biosynthesis [13]. In A. nidulans, almost 50 Zn(II)2Cys6 proteins have been identified with diverse roles, such as involvement in sterigmatocystin (ST) biosynthesis (AflR), amylolytic gene expression (AmyR), conidial maturation (VosA gene), and asexual and sexual development (zcfA gene, FluG) [14,15]. Mutation in Ume6, which is a Zn(II)2Cys6 protein, leads to defective hyphal extension in the host tissue, resulting in loss of virulence [16]. Two melanin biosynthetic genes, SCD1 and THR1, are positively regulated by a Zn(II)2Cys6 protein named Cmr1p in Colletotrichum lagenarium, which infects cucumbers [17].
The genetic code consists of a set of 64 codons. Excluding the three stop codons, the genetic code encodes 20 amino acids. Except for methionine and tryptophan, every amino acid is encoded by more than one codon. Codons that are translated into the same amino acid are referred to as synonymous codons. The usage of these synonymous codons is not uniform. The unequal or preferential use of synonymous codons leads to an inclination towards a specific set of codons, called codon usage bias (CUB) [18]. CUB is widespread and involves variations among (1) amino acid codons, (2) genes of a genome, and (3) genomes of different species [19,20]. CUB is based on the theory of mutation-selection-genetic drift, which states that evolutionary forces generate some adaptive and non-adaptive mutations that do not affect the primary protein structure; however, CUB studies in model organisms suggest that CUB exerts a marked influence on various transcriptional and translational processes [21]. The importance of CUB can be understood from the plethora of functions that it influences, viz., mRNA transcription [22], mRNA stability [23], tRNA pairing [24], translational speed [25], correct protein folding [26], ensuring full protein biosynthesis [27], etc.
CUB has now been termed an important evolutionary parameter determining the expression of genes. The availability of novel and advanced sequencing techniques and genomic information resulted in an extensive study of CUB in both prokaryotes and eukaryotes. CUB is affected by factors such as hydrophobicity, gene length, replication, gene function, and secondary protein structure. Evolutionary factors contributing to CUB are mutational pressure and natural and translational selection [28]. It has been reported in some extremely AT-/GC-rich prokaryotic organisms, such as Micrococcus luteus [29], Rickettsia prowazekii [30], and Borrelia burgdorferi [31], that compositional bias solely governs the observed codon usage variations.
In contrast, there have also been instances in organisms, such as Escherichia coli [32], Mycobacterium tuberculosis [33], Drosophila melanogaster [34], and Caenorhabditis elegans [35], where translational selection pressure has been the major factor shaping codon usage signatures in highly expressed genes. Furthermore, codon usage patterns in eubacterial and archaeal genomes have also been reported to be a combinatorial consequence of mutational constraint and natural selection for translation [36]. Based on the large-scale data generated for various organisms, the researchers concluded that CUB is the result of balanced interaction of both natural selection and mutation pressure [20].
Through various studies, it has been established that CUB modulates heterologous gene expression, improves protein production, and regulates the cell cycle [37], cell proliferation and differentiation [38], and stress regulation [39]. Arella et al. [40] reported that CUB influences cellular fitness, which could further govern microbial ecology. Additionally, it was shown that CUB controls host-pathogen interactions by allowing the host and pathogens to adapt to their specific environments [41]. Codon optimization achieved through CUB favours pathogen colonisation of the host. Host colonization requires the secretion of diverse and complex proteins, evasion of host-imposed defence mechanisms, and competition with other microbes [42]. The release of these secretory proteins is directly linked to translational efficiency, which is a crucial factor in synonymous codon usage patterns. Most of the studies conducted so far have focussed on the codon usage pattern at the whole genome level, especially in model systems and other species, viz., Caenorhabditis, Escherichia coli, Drosophila, Arabidopsis, yeast, Giardia lamblia, Entamoeba histolytica, Ustilago, Borrelia burgdorferi, Taenia saginata, A. flavus, A. nidulans, Saccharomyces cerevisiae, etc. [20,38,43-46]. However, there are hardly any studies comparing the CUB pattern of a gene family in plant-pathogenic fungi.
Ascomycetes have highly deleterious effects on plants. Approximately 60% of the top fungal pathogens belong to Ascomycetes [47] and are capable of causing 70-80% yield loss in crops [48]. M. oryzae alone is responsible for causing 10-30% yield loss in rice [49]. The Food and Agricultural Organization reported that around 25% of the global food crops were contaminated by mycotoxins. Further, the harmful effects of mycotoxins from plant pathogenic Ascomycetes species of Alternaria, Aspergillus, Fusarium, and Colletotrichum extend to humans and animals [50]. These mycotoxins disrupt cellular functions and kill organisms, including humans, birds, and animals [51]. Additionally, the mycotoxin produced by Fusarium is one of the top five mycotoxins affecting humans [52]. Ochratoxin A, the best-studied mycotoxin, is produced by several species of Aspergillus and Penicillium and is a common food contaminant, especially in cereals, pulses, nuts, fruits, vegetables, and stored products [53]. Therefore, in the current investigation, Zn(II)2Cys6 sequences unique to fungi were chosen to study the CUB patterns in nine Ascomycetes plant pathogenic fungi in relation to the model yeast system, in order to decipher the association between CUB and the evolutionary forces shaping the Ascomycetes systems.

Nucleotide Composition Analysis
The CDSs of the ascomycetous pathogenic fungi of cereals under study and of Saccharomyces cerevisiae were examined for nucleotide composition. Nucleotide composition analysis was performed for each CDS to quantify the frequencies of the four standard nucleotides (A, T, G, and C), the occurrence of each nucleotide at the third position of synonymous codons (A3, T3, G3, and C3), the total GC content (GC%), and the GC content at the first (GC1), second (GC2), and third (GC3) positions of a codon. The mean percent GC content at the first and second codon positions (GC12) was also calculated for each Zn(II)2Cys6 CDS.
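The composition measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the function name `composition` is hypothetical, and as a simplifying assumption the third-position counts (A3, T3, G3, C3, GC3) are taken over all codons rather than only over synonymous codons.

```python
from collections import Counter


def composition(cds: str) -> dict:
    """Return base frequencies and positional GC content of a CDS, in percent.

    Assumption: third-position values are computed over every codon, which
    is a simplification of the synonymous-codon-only definition.
    """
    seq = cds.upper().replace("U", "T")
    if not seq or len(seq) % 3 != 0:
        raise ValueError("CDS must be non-empty and a multiple of 3 long")
    n = len(seq)
    counts = Counter(seq)
    out = {base: 100.0 * counts[base] / n for base in "ATGC"}
    out["GC"] = out["G"] + out["C"]
    # GC content at codon positions 1, 2 and 3
    for pos in (1, 2, 3):
        bases = seq[pos - 1::3]
        out[f"GC{pos}"] = 100.0 * sum(b in "GC" for b in bases) / len(bases)
    out["GC12"] = (out["GC1"] + out["GC2"]) / 2
    # Frequency of each base at the third codon position
    third = Counter(seq[2::3])
    for base in "ATGC":
        out[f"{base}3"] = 100.0 * third[base] / (n // 3)
    return out
```

For example, `composition("ATGGCC")` reports GC1 = GC2 = 50 and GC3 = 100, so GC12 = 50.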

The Effective Number of Codons (ENC) and ENC Plot Analysis
The effective number of codons (ENC) accounts for the degeneracy level of each amino acid when estimating the total number of different codons used in a sequence. ENC therefore ranges from 20, when only one codon is used for each amino acid, to 61, when all synonymous codons are used with equal probability. Thus, ENC values are inversely related to codon usage bias. The ENC for the Zn(II)2Cys6 sequences was calculated as per Wright [48], as follows:

$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}$$

where $\bar{F}_n$ (n = 2, 3, 4, 6) is the mean of the F values for amino acids with n-fold degeneracy. The ENC plots were generated by plotting the ENC values against the GC3 values of the sequences. The ENC plot is mainly used to determine whether a gene's codon usage pattern is shaped by mutation or by selection pressure. ENC values lying on or around the expected GC3 curve suggest that codon choice is constrained by G + C mutation bias. If the ENC values are distributed considerably below the expected GC3 curve, this indicates the presence of selection effects on the sequences [55].
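As a hedged illustration of the ENC bounds and of the expected curve used in ENC plots, the sketch below evaluates Wright's formula, ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, from the mean homozygosities, together with the null expectation ENC = 2 + s + 29/(s² + (1 − s)²), where s is GC3 expressed as a fraction. The expected-curve formula is not given in the text; it is assumed here to be the standard curve. The function names are hypothetical.

```python
def enc_from_homozygosity(f2: float, f3: float, f4: float, f6: float) -> float:
    """Wright's ENC from the mean F of 2-, 3-, 4- and 6-fold amino acids."""
    return 2 + 9 / f2 + 1 / f3 + 5 / f4 + 3 / f6


def expected_enc(gc3: float) -> float:
    """Expected ENC under GC3 mutation bias alone (gc3 as a fraction, 0-1)."""
    s = gc3
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def relative_deviation(enc_observed: float, gc3: float) -> float:
    """Positive values mean the gene lies below the expected curve."""
    expected = expected_enc(gc3)
    return (expected - enc_observed) / expected
```

With one codon per amino acid every F equals 1 and the ENC is 20; with uniform usage F_n = 1/n and the ENC is 61, which reproduces the range stated above.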

Relative Synonymous Codon Usage Analysis
Relative synonymous codon usage (RSCU) is the ratio of the observed frequency of a codon to the frequency expected if all codons within the given synonymous codon family of an amino acid were used equally [56,57]. The RSCU of Zn(II)2Cys6 CDSs from all ten fungal species was calculated as per the following equation:

$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}$$

where $X_{ij}$ is the number of occurrences of the jth codon for the ith amino acid and $n_i$ is the number of synonymous codons for the ith amino acid.
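A minimal sketch of the RSCU calculation follows, assuming the standard genetic code and excluding stop codons, Met, and Trp (leaving 59 codons). RSCU is computed here as the observed count of a codon divided by the mean count of its synonymous family. The names `CODON_TABLE` and `rscu` are illustrative and not taken from the study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order (assumption: no alternative codes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Return RSCU for every codon of each amino acid present in the CDS."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Group the 59 degenerate sense codons by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        mean = total / len(family)  # expected count under equal usage
        for codon in family:
            values[codon] = counts[codon] / mean
    return values
```

For the toy input `"GCTGCTGCC"`, alanine is the only amino acid present and the four alanine RSCU values (8/3, 4/3, 0, 0) sum to the family size, 4.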

Intrinsic Codon Deviation Index
The intrinsic codon deviation index (ICDI) provides an estimate of codon bias, independent of the chi-square value, based on RSCU and the degeneracy of the amino acids in the sequence. ICDI is most helpful in estimating codon bias in species where optimal codons are unknown [58,59]. ICDI estimates range from 0, for equal usage of all codons, to 1, for one codon per amino acid. The ICDI estimates were calculated as per the following equation [38]:

$$\mathrm{ICDI} = \sum_{\alpha} F_{\alpha} S_{\alpha}$$

where $F_{\alpha}$ is the relative frequency of amino acid $\alpha$ and

$$S_{\alpha} = \frac{1}{k_{\alpha}(k_{\alpha}-1)} \sum_{c \in C_{\alpha}} (r_{\alpha c} - 1)^2 .$$

Here, $r_{\alpha c}$ is the RSCU of codon c and $k_{\alpha}$ is the degeneracy of amino acid $\alpha$.

Codon Adaptation Index
The codon adaptation index (CAI) quantifies the relative adaptiveness of the codons of a gene towards the codons favoured in highly expressed genes. A codon's relative adaptiveness (w) is calculated as the ratio of its usage to that of the most abundant codon for the same amino acid [56]. The CAI of the Zn(II)2Cys6 CDSs was therefore obtained through:

$$\mathrm{CAI} = \left( \prod_{k=1}^{n} w_k \right)^{1/n}$$

where n is the number of codons and $w_k = \mathrm{RSCU}_i / \mathrm{RSCU}_{\max}$. Here, $\mathrm{RSCU}_{\max}$ is the RSCU of the most abundant synonymous codon for the corresponding amino acid in a highly expressed reference gene set, and $\mathrm{RSCU}_i$ is the RSCU, in the same reference set, of the codon i found at position k. CAI ranges from 0 to 1 and gives a first indication of translation efficiency [60].
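The CAI is the geometric mean of the relative adaptiveness values w of the codons in a gene, where w is a codon's usage divided by that of the most used synonymous codon in a reference set. The sketch below illustrates this under stated assumptions: the reference is supplied as raw codon counts (a hypothetical input format), codons absent from the reference are given a pseudocount of 0.5 to avoid log(0), and Met, Trp, and stop codons are ignored. It is not the implementation used by the authors.

```python
import math
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(reference_counts: dict) -> dict:
    """w = codon count / count of the most used synonymous codon."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)
    w = {}
    for family in families.values():
        # Assumption: unseen codons get a pseudocount of 0.5.
        adjusted = {c: max(reference_counts.get(c, 0), 0.5) for c in family}
        top = max(adjusted.values())
        for codon, count in adjusted.items():
            w[codon] = count / top
    return w


def cai(cds: str, w: dict) -> float:
    """Geometric mean of w over the degenerate codons of the CDS."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))
```

Working in log space avoids numerical underflow for long genes, since the product of many values below 1 quickly approaches zero.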

Codon Bias Index (CBI)
The codon bias index (CBI) estimates the bias of the codon usage pattern of a coding sequence based on the extent to which preferred codons are used. The CBI values range from 0 to 1. A CBI value of 0 refers to a random choice of codons, whereas a CBI value of 1 indicates that the sequence mostly uses preferred codons. The CBI of the Zn(II)2Cys6 CDSs was calculated using the following equation [61]:

$$\mathrm{CBI} = \frac{N_o - N_r}{N_t - N_r}$$

where $N_o$ is the total number of occurrences of preferred codons in the coding sequence, $N_r$ is the number of preferred codons expected if all synonymous codons were used randomly, and $N_t$ is the total number of amino acids in the coding sequence that correspond to preferred codons.

Frequency of Optimal Codons (FoP)
The ratio of the number of optimal codons to the total number of codons (both optimal and non-optimal) provides the FoP index [32]. It is essential to understand that the FoP index is context- or species-dependent, as its value depends on the set of optimal codons defined for the particular species.

Synonymous Codon Usage Order (SCUO) Index
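The synonymous codon usage order (SCUO) index quantifies the departure from a uniform distribution as a normalised difference between the maximum and observed entropy [62]. The average SCUO index for entire coding sequences was calculated using:

$$\mathrm{SCUO} = \sum_{i} F_i\,\mathrm{SCUO}_i, \qquad \mathrm{SCUO}_i = \frac{H_i^{\max} - H_i}{H_i^{\max}}, \qquad H_i = -\sum_{j} p_{ij} \log p_{ij}$$

where j indexes the synonymous codons of the ith amino acid, $p_{ij}$ is the relative frequency of the jth codon among the codons of the ith amino acid, and $F_i$ is the relative frequency of the ith amino acid in the sequence. Here, $\mathrm{SCUO}_i$ is the SCUO for the ith amino acid in each sequence, and $H_i$ and $H_i^{\max}$ are the observed and maximum entropy for the ith amino acid in a sequence.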
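The entropy-based definition can be made concrete with a short sketch. It assumes the usual formulation in which the observed entropy of amino acid i is H_i = −Σ_j p_ij log p_ij over its synonymous codons, the maximum entropy is log n_i, and the per-amino-acid values are averaged with weights equal to each amino acid's share of the degenerate codons. The weighting and all names here are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def scuo(cds: str) -> float:
    """Weighted mean of (H_max - H) / H_max over degenerate amino acids."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)
    per_aa = []  # (codon count for the amino acid, normalised entropy gap)
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        probs = [counts[c] / total for c in family if counts[c] > 0]
        h_obs = -sum(p * math.log(p) for p in probs)
        h_max = math.log(len(family))
        per_aa.append((total, (h_max - h_obs) / h_max))
    grand_total = sum(t for t, _ in per_aa)
    if grand_total == 0:
        raise ValueError("no degenerate codons in sequence")
    return sum(t / grand_total * gap for t, gap in per_aa)
```

A sequence that uses a single codon per amino acid scores 1, and one that uses all synonymous codons equally scores 0.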

Codon Usage Similarity Index
The codon usage similarity index (COUSIN) compares the codon usage preferences of a query sequence with those of a reference and normalises the output against the null hypothesis of random codon usage. COUSIN can be computed as COUSIN18 or COUSIN59. COUSIN18 gives each of the 18 families of synonymous codons an equal contribution to the global index. In comparison, COUSIN59 weights the contribution of each family by the frequency of the corresponding amino acid in the query sequences. The COUSIN scores of the Zn(II)2Cys6 CDSs of all the fungal species were computed using the COUSIN software [63] (https://cousin.ird.fr (accessed on 15 May 2022)).

GRAVY and AROMA
The biochemical properties of the final hypothetical translated products, viz., hydropathicity and aromaticity, are associated with the codon bias of coding sequences. The grand average of hydropathicity (GRAVY) score was employed to estimate the hydropathy of each sequence. GRAVY is calculated as the arithmetic mean of the hydropathic indices of all amino acids in a hypothetical translated coding sequence product. Positive and negative GRAVY scores indicate the hydrophobic and hydrophilic nature of the protein, respectively [64]. The aromaticity score provides the frequency of aromatic amino acids (Phe, Tyr, and Trp) in the hypothetical translated coding sequence product [65].
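Both scores are simple averages over the translated product, as the following sketch shows. It assumes the Kyte-Doolittle hydropathy scale for GRAVY and takes an already translated protein sequence as input; the function names are illustrative.

```python
# Kyte-Doolittle hydropathy indices (assumed scale for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Mean hydropathy; positive is hydrophobic, negative is hydrophilic."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity(protein: str) -> float:
    """Fraction of aromatic residues (Phe, Tyr, Trp)."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(aa in "FYW" for aa in residues) / len(residues)
```

For instance, the dipeptide `"AR"` has a GRAVY of (1.8 − 4.5)/2 = −1.35, i.e., it is hydrophilic on this scale.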

PR2 and Neutrality Plots
PR2-bias plots were generated based on the principle of parity rule 2 (PR2), which states that, in the absence of selection and mutational pressure, the nucleotide bases follow the A = T and G = C rule (where A + T + G + C = 1) [66]. The A3/(A3 + T3) and G3/(G3 + C3) values of every Zn(II)2Cys6 CDS were calculated and used as the ordinate and abscissa, respectively, to visualise the relationship between purines (A and G) and pyrimidines (T and C) at the third codon position in the form of a PR2 bias plot. When A = T and G = C (PR2), the data points lie at the centre of the plot, where both coordinates are 0.5. Therefore, any deviation from the centre of the PR2 plot allows the bias introduced by mutation, selection, or both to be estimated. Significant deviation from the parity rule at the third codon position of four-codon amino acids mostly results from selective rather than mutational biases during evolution. Conversely, if the data points are distributed evenly around the centre of the plot, that is, if A and T (and likewise G and C) occur at equal frequencies at the third codon position, then the codon usage preference mainly results from mutation [66,67].
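The coordinates of a single CDS on the PR2 plot can be computed as sketched below. As a simplifying assumption, all third codon positions are counted rather than only those of four-codon amino acids; `pr2_point` is a hypothetical name.

```python
from collections import Counter


def pr2_point(cds: str) -> tuple:
    """Return (G3/(G3+C3), A3/(A3+T3)), i.e. (abscissa, ordinate).

    Assumption: every third codon position is counted, not only those of
    four-fold degenerate amino acids.
    """
    seq = cds.upper().replace("U", "T")
    third = Counter(seq[2::3])

    def ratio(a: str, b: str) -> float:
        total = third[a] + third[b]
        return third[a] / total if total else float("nan")

    return ratio("G", "C"), ratio("A", "T")
```

A point at (0.5, 0.5) satisfies PR2; a point in the lower left quadrant, such as the (0.0, 0.0) returned for `"GCCGCCGCT"`, indicates a preference for C over G and for T over A at the third position.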
The neutrality plots for the Zn(II)2Cys6 CDSs were generated by plotting the average of GC1 and GC2 (GC12) against GC3. The neutrality plots depict the effect of mutation-selection equilibrium in shaping the codon usage bias of sequences [68]. In neutrality plots, a regression slope of 0 suggests the absence of directional mutation pressure, or complete selective constraint. On the other hand, a slope of 1 indicates the same mutation pattern at GC12 and GC3, i.e., that complete neutrality was the main element in the evolutionary process [69].
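The regression slope of a neutrality plot can be obtained with ordinary least squares, as in this sketch using only the standard library. The function name is hypothetical; the inputs are per-gene GC3 and GC12 values in the same units (both percentages or both fractions).

```python
def neutrality_slope(gc3: list, gc12: list) -> tuple:
    """Least-squares fit of GC12 on GC3; returns (slope, intercept)."""
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equally long series with >= 2 points")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    if sxx == 0:
        raise ValueError("GC3 values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A slope of 0.1, for example, would be read as mutation pressure accounting for roughly 10% of the codon usage pattern, with the remainder attributed to selection and other constraints.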

Translational Selection Index (P2)
The translational selection index (P2) reflects the efficiency of codon-anticodon interactions and indicates translation efficiency when information on preferred codon sets is unavailable. The P2 values were calculated with the following formula: P2 = (WWC + SSU)/(WWY + SSY), where W = A or U, S = C or G, and Y = C or U [70]. A P2 value of more than 0.50 (P2 > 0.50) indicates translational selection acting on the given coding sequence.
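A minimal sketch of the P2 calculation is given below, assuming the usual form P2 = (WWC + SSU)/(WWY + SSY) and a DNA-alphabet CDS in which T stands for U. The function name is illustrative.

```python
def p2_index(cds: str) -> float:
    """P2 = (WWC + SSU) / (WWY + SSY), with W = A/T, S = C/G, Y = C/T."""
    seq = cds.upper().replace("U", "T")
    weak, strong = set("AT"), set("CG")
    wwc = ssu = wwy = ssy = 0
    for i in range(0, len(seq) - 2, 3):
        first, second, third = seq[i:i + 3]
        if third not in "CT":
            continue  # only pyrimidine-ending codons contribute
        if first in weak and second in weak:
            wwy += 1
            wwc += third == "C"
        elif first in strong and second in strong:
            ssy += 1
            ssu += third == "T"
    denominator = wwy + ssy
    if denominator == 0:
        raise ValueError("no WWY or SSY codons in sequence")
    return (wwc + ssu) / denominator
```

The toy sequence `"AACGGTAATGGC"` contains one WWC and one SSU codon among two WWY and two SSY codons, giving P2 = 0.5.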

Correlation and Principal Component Analysis
The association of nucleotide composition with the various codon bias parameters and the RSCU of the Zn(II)2Cys6 CDSs was investigated through correlation analysis employing SAS 9.2. Principal component analysis (PCA) was employed to examine the correlations between sequences and codons. After removing the three stop codons (UAA, UAG, and UGA) and the single codons for Trp (UGG) and Met (AUG) from every Zn(II)2Cys6 CDS, the data were represented as a 59-dimensional vector, where each dimension corresponded to the RSCU of one sense codon [18,71]. The PCA plots were generated with Origin 8.5 (OriginLab, Northampton, USA) software.

Nucleotide Composition Analysis
Detailed knowledge of the nucleotide composition of a coding sequence provides a basis for understanding the codon distribution across genes or species and its association with gene activity. Individual nucleotide composition, the frequency of nucleotides at the third position, and overall composition were studied for all ten target species. The frequency of nucleotide C (cytosine) was highest in all species, followed by A (adenine), G (guanine), then T (thymine). Out of the four nucleotides, cytosine was the most abundant, with an average value of 28.62 ± 4.04, followed by guanine (25.16 ± 2.91), adenine (24.47 ± 3.73), and thymine (21.75 ± 3.22). An overall analysis of GC and AT composition showed the predominance of GC-richness in Zn(II)2Cys6 coding sequences, with C. graminicola, G. tritici, P. oryzae, and V. dahliae showing a higher percentage of GC than the other species. Compared with the Ascomycetes group of fungi, S. cerevisiae showed high AT-richness (61.56%), with GC contributing only 38.44% (Table 2). The nucleotide type present at the third position of the codon has been known to be a key determinant of the amino acid; therefore, nucleotide composition at the third position was also critically investigated. Interestingly, cytosine was the most frequent nucleotide at the third position of the codon, followed by T, G, and A. G. tritici had the highest GC3%, followed by V. dahliae, C. graminicola, and P. oryzae. The nucleotide composition of the Zn(II)2Cys6 coding sequences of all the target species is given in the Supplementary Information (Tables S1-S10).

Relationship between Fungal Species via Clustering Analysis with Zn(II)2Cys6 Coding Sequence Parameters
The clustering analysis divided the target fungal species into two major groups, with S. cerevisiae as an outlier. The first branch contained the nine plant pathogenic fungal species belonging to Ascomycetes. This major branch was bifurcated into two clusters. The first cluster (cluster 1) contained five species, viz., B. maydis, B. oryzae, A. alternata, F. graminearum, and A. flavus, while the other (cluster 2) contained G. tritici, P. oryzae, C. graminicola, and V. dahliae. The separate positioning of S. cerevisiae from the other fungal species may be attributed to AT abundance in its Zn(II)2Cys6 sequences, in contrast to the GC-richness in the plant pathogenic species belonging to the Ascomycetes group (Figure 1). Further, the clustering of fungi was well correlated with the CUB indices: the species within each group showed similar values and results, as shown in later subsections. The number of over-represented GC-rich codons was greater in the group comprising G. tritici, whereas there were more AT-rich codons in the group comprising B. maydis.

Relative Synonymous Codon Usage Analysis
To gain insight into codon usage variation, RSCU analysis was conducted, and the codons were subsequently classified into groups based on their RSCU values: (1) codons with RSCU > 1.6 were considered overrepresented, i.e., strongly preferred; (2) codons with RSCU between 1 and 1.6 were considered to have a high usage frequency; (3) codons with RSCU between 0.6 and 1 were considered less frequently used; and (4) codons with RSCU < 0.6 were considered underrepresented. In all 10 species studied, the presence of A/T-rich and G/C-rich codons was close to 50%, i.e., either 29 or 30 out of 59 codons. Most high-frequency codons were GC-rich except in S. cerevisiae, where the preference was more toward A or T codons. Similarly, overrepresented codons were mostly GC-rich except in S. cerevisiae, where all six strongly preferred codons were A/T-rich (TTA, AGT, CAA, AAA, TGT, and AGA). G. tritici showed the maximum number of codons with RSCU values below 0.6 (23 A/T-rich codons) and above 1.6 (10 G/C-rich codons). Out of the nine Ascomycetes species, the four species showing the most overrepresented codons belonged to cluster 2, which is already known to be highly GC-rich (Table 3). Underrepresented codons were more often A/U-ended than G/C-ended (58 and 7, respectively, in total across all species). Further, the detailed RSCU study revealed that among the 59 codons, 10 codons (7 GC-rich codons: CTC, CTG, GTC, GAG, CGC, TGC, and GGC; 3 AT-rich codons: TTC, ATC, and AAG) were either overrepresented or of high usage in all the fungal species except S. cerevisiae, and 5 (4 GC-rich codons: AGC, GAC, GCC, and ACC; 1 AT-rich codon: TAC) were so in eight of the ten fungal species (Table S11). The complete list of RSCU values for each codon in each species is shown in Table S11 and Figure 2.
The sharing of the same set of GC-rich codons (CTG, GTC, GAG, and CGC) by all the fungal species belonging to different orders of Ascomycetes highlights the importance of these codons in determining the codon usage patterns of Zn(II)2Cys6 sequences. In addition, it reveals that Zn(II)2Cys6 genes have a greater preference for G/C-ended codons than for A/T-ended codons.
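The four RSCU classes used in this analysis can be expressed as a small helper. The thresholds (0.6, 1, and 1.6) come from the text; how values falling exactly on a boundary are assigned is an assumption made here, and the function name is hypothetical.

```python
def classify_rscu(value: float) -> str:
    """Assign an RSCU value to one of the four usage classes.

    Assumption: boundary values 1.6 and 1.0 fall into the lower class and
    0.6 into the "less frequent" class; the text does not specify this.
    """
    if value > 1.6:
        return "overrepresented"
    if value > 1.0:
        return "high frequency"
    if value >= 0.6:
        return "less frequent"
    return "underrepresented"


def count_classes(rscu_values: dict) -> dict:
    """Tally how many codons fall into each class."""
    tally = {}
    for value in rscu_values.values():
        label = classify_rscu(value)
        tally[label] = tally.get(label, 0) + 1
    return tally
```

Applied to a per-species RSCU table, `count_classes` yields the kind of tallies of overrepresented and underrepresented codons discussed above.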

ENC and ENC Plot
ENC is a parameter used to determine the degree of CUB in a given sequence. ENC values below 35 signify high codon preference, and values above 50 indicate random codon usage [72,73]. The average ENC values of the Zn(II)2Cys6 CDSs of the target fungal species ranged from 44.33 to 58.65, indicating weak to negligible codon bias. Further, only a few sequences, in C. graminicola (3), G. tritici (14), and V. dahliae (7), had ENC values below 35, indicating the predominance of random codon usage patterns (Tables S1-S10). An inverse association has been reported between the ENC value and gene expression, i.e., a low ENC value means stronger codon bias and higher gene expression, and vice versa [72,73]. Correlation coefficient analysis showed a negative correlation between ENC and GC3, with C. graminicola, G. tritici, V. dahliae, and P. oryzae (cluster 2) showing the strongest negative correlations among the fungal species.
As the GC content of a gene is an important determinant of ENC, an ENC plot was generated to understand the effect of GC3 on codon bias. If mutation is the sole factor responsible for codon bias, genes are distributed on or above the standard curve, which also signifies that the genes show no bias, whereas if codon bias is affected by selection, genes lie well below the standard curve [55,73]. Some of the Zn(II)2Cys6 sequences were present on or above the standard curve, which implied that compositional constraint was one of the essential factors dictating codon usage, as was evident from the ENC plots of the species in cluster 2, viz., C. graminicola, G. tritici, V. dahliae, and P. oryzae.
On the contrary, in A. alternata, A. flavus, B. maydis, B. oryzae, F. graminearum (cluster 1), and S. cerevisiae, the genes clustered slightly below the standard curve, suggesting that, in addition to compositional constraint, natural selection and other factors played a minor role in determining codon usage patterns (Figure 3). The result was in concordance with studies conducted on the whole genomes of the genus Ustilago, Epichloe festucae, Meloidogyne incognita, and A. alternata [46,74,75]. The presence of a GC3 distribution in the range of 0.4-0.9, with S. cerevisiae as an exception, further strengthened the idea of an effect of mutation pressure on codon usage. These findings were supported by the results of Kawabe and Miyashita [76], in which the GC3 distribution was a deciding factor between directional selection and mutational pressure.

Intrinsic Codon Deviation Index (ICDI)
ICDI is another measure of codon usage bias, with values from 0 to 1. Genes with an ICDI value between 0.3 and 0.5 are considered moderately expressed; an ICDI value below 0.3 signifies low codon bias and hence lower gene expression, whereas an ICDI value above 0.5 signifies higher codon bias and hence high gene expression. In the present study, the overall mean ICDI ranged from 0.06 to 0.26 (± 0.069), which suggested that the Zn(II)2Cys6 coding sequences have a low codon bias (Figure 4A). Although all the species showed low bias, a comparison of the two clusters showed higher values for cluster 2 than for cluster 1. This agrees with the ENC results, in which cluster 2 also showed relatively more bias than cluster 1. The results for S. cerevisiae were intermediate between the two clusters.

Codon Adaptation Index (CAI)
CAI is a measure of the adaptation of the synonymous codon usage of a gene with respect to a reference set of genes; in other words, it assesses the extent to which a gene uses the codons preferred in highly expressed genes [56]. CAI ranges from 0 to 1, where a value of 1 corresponds to a gene that uses only a specific set of codons, thereby indicating high codon usage bias. The mean CAI values for the Zn(II)2Cys6 sequences varied from 0.651 to 0.828 (± 0.059), with V. dahliae showing the minimum CAI and A. flavus showing the maximum CAI (Figure 4B). Based on the CAI values, it can be postulated that Zn(II)2Cys6 sequences are highly expressed, as CAI is directly associated with gene expressivity, gene expression levels, adaptation, and codon usage bias [73,77,78].
It has been suggested in many studies that transcription factors belong to the category of essential genes and are also highly expressed. The function of these genes is closely related to optimal codon composition, as it can cut down energy costs and make the gene biologically significant [44]. As stated previously, the zinc binuclear protein family belongs to the transcription factor category; thus, it can be inferred that zinc binuclear proteins are encoded by highly expressed genes, which is consistent with the high CAI values [4]. A strong negative correlation between CAI and ENC further supports the idea of increased gene expression. Simultaneously, a significant positive correlation was observed with GC and GC3 content (for GC, r = 0.55-0.88; for GC3, r = 0.59-0.95), which indicates that codons in Zn(II)2Cys6 sequences are GC-rich. This is consistent with the RSCU values, where codons with RSCU > 1 were mainly GC-rich, implying that the genes preferred optimal codons ending with cytosine and guanine over those ending with uracil and adenine. An exception to this observation was S. cerevisiae, which favoured AT-rich rather than GC-rich codons and showed a positive correlation with AT and AT3. The persistence of GC-rich codons facilitates pathogen invasion of the host system by promoting gene expression, and this richness of G and C is common in fungal genomes [46,79].

Codon Bias Index (CBI), FoP, SCUO, and COUSIN
Different fungal species had different CBI values; however, all of them indicated near-random usage of preferred and non-preferred codons. A. flavus had the lowest CBI value (0.053), whereas the maximum was for G. tritici (0.325), a member of cluster 2 (Figure 4C). The results suggest low usage of the codons preferred in highly expressed genes [54]. FoP (frequency of optimal codons) also measures the usage of preferred versus non-preferred codons. A value near 1 indicates the predominant use of preferred codons, whereas a value closer to 0 signifies that optimal codons appear rarely. In our case, FoP ranged from 0.356 to 0.536, which could be interpreted as a weak inclination toward optimal codons. However, the FoP value for cluster 2 was greater than for cluster 1 (Figure 4D). SCUO was calculated to determine codon bias, and the values were found to be close to 0, indicating low codon bias. The values were in the range of 0.046-0.189, with an average of 0.092 ± 0.048 (Figure 4E). The COUSIN index, another measure of bias, revealed weak to moderate codon bias: a value of 0 indicates equal usage of synonymous codons, a value of 1 indicates high codon usage preference, and values between 0 and 1 indicate weaker bias. For the present analysis, the value of the COUSIN index was between 0 and 1, i.e., 0.367-0.984 (Figure 4F,G). These CUB indices showed that, in comparison to cluster 1, cluster 2 had a higher degree of codon bias. The mean GRAVY value was negative (Figure 4I), indicating that the Zn(II)2Cys6 sequences were dominated by codons encoding hydrophilic amino acids.

PR2 Plot Analysis
Mutational force and natural selection are the two important factors shaping the CUB of coding sequences. Their influence on the CUB of the sequences was assessed by PR2 bias plot analyses. The PR2 bias plot analysis of Zn(II)2Cys6 sequences showed that most data points for the plant pathogenic Ascomycetes fungi fell in the lower left quadrant of the parity plot, showing that T and C were the nucleotides of choice in the target coding sequences. As these phytopathogens were GC-biased and the PR2 plot showed a bias towards T and C, a general bias towards C-ending codons was observed (Figure 5A-I). In contrast to the other fungi, the distribution for S. cerevisiae lay in the left and right lower quadrants of the PR2 plot, which showed selectivity towards T-ending codons (Figure 5J). These results are consistent with the nucleotide composition analysis, where cytosine was the predominant nucleotide in the Zn(II)2Cys6 sequences of the Ascomycetes group and T in those of S. cerevisiae. Because the data points deviate from the centre of the plot rather than occupying it, the observed CUB in Zn(II)2Cys6 sequences is evidently a function not only of mutation pressure but also of selection pressure. The same was evident in the case of A. alternata [68]. Further, the dual effect of natural selection and mutation pressure on the dispersal of codons from the centre of the PR2 plot was confirmed in the TP3 gene family [38], Zingiber officinale, and its associated fungal pathogens [79].
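How a single gene is placed on a PR2 plot can be sketched as follows. The sketch assumes the common convention of plotting G3/(G3 + C3) on the x-axis against A3/(A3 + T3) on the y-axis, restricted by default to fourfold degenerate codon families; whether the study applied that restriction is not stated here, so the `fourfold_only` flag and the function names are illustrative assumptions.

```python
# First two bases of the fourfold degenerate families in the standard code
# (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def pr2_point(cds, fourfold_only=True):
    """Return (G3/(G3+C3), A3/(A3+T3)) for one coding sequence."""
    cds = cds.upper().replace("U", "T")
    counts = dict.fromkeys("ACGT", 0)
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if fourfold_only and codon[:2] not in FOURFOLD_PREFIXES:
            continue
        if codon[2] in counts:  # skip ambiguous bases such as N
            counts[codon[2]] += 1
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"]
    if gc == 0 or at == 0:
        raise ValueError("not enough informative third positions")
    return counts["G"] / gc, counts["A"] / at


def pr2_quadrant(x, y):
    """Name the quadrant relative to the centre (0.5, 0.5) of the plot."""
    horizontal = "left" if x < 0.5 else "right"
    vertical = "lower" if y < 0.5 else "upper"
    return f"{vertical} {horizontal}"
```

With these axes, a point in the lower left quadrant means C exceeds G and T exceeds A at the third position, which matches the preference for T and C described above; the centre (0.5, 0.5) corresponds to A = T and G = C.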

Neutrality Plots Analysis
The neutrality plot elucidated the relationship between GC12 and GC3 to determine the influence of mutational pressure and natural selection on CUB (Figure 6). The neutrality plots for all the Ascomycetes species showed that the Zn(II)2Cys6 genes exhibited a wide range of GC3 values, from 48% to 92%, whereas for S. cerevisiae this range started from 33%, which was an indication of the effect of dual forces. The slope of the regression line was less than 1 for all the fungi, i.e., 0.031 (A. alternata), 0.088 (A. flavus), 0.118 (B. maydis), 0.082 (B. oryzae), 0.123 (C. graminicola), 0.098 (F. graminearum), 0.135 (G. tritici), 0.163 (P. oryzae), 0.223 (S. cerevisiae), and 0.055 (V. dahliae) (Figure 6), which meant that the effect of mutation pressure was 3.1, 8.8, 11.8, 8.2, 12.3, 9.8, 13.5, 16.3, 22.3, and 5.5%, respectively. These values indicate that codon bias was affected less by mutational pressure and more by natural selection. Further, there was no significant correlation between GC12 and GC3, which further confirmed the dominance of natural selection over mutational pressure [36,68].
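The step from regression slope to percentage effect of mutation pressure can be made explicit with a small sketch. It assumes an ordinary least-squares regression of GC12 on GC3 and reads the slope, multiplied by 100, as the share attributed to mutation pressure, as done in the paragraph above; the function names and the toy data are hypothetical.

```python
def neutrality_slope(gc3_values, gc12_values):
    """Least-squares slope of GC12 (y) regressed on GC3 (x)."""
    n = len(gc3_values)
    if n != len(gc12_values) or n < 2:
        raise ValueError("need two equally long series with at least 2 genes")
    mean_x = sum(gc3_values) / n
    mean_y = sum(gc12_values) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(gc3_values, gc12_values))
    return sxy / sxx


def pressure_shares(slope):
    """Return (mutation %, selection %) under the slope-as-share reading."""
    return 100 * slope, 100 * (1 - slope)
```

For example, the slope of 0.163 reported for P. oryzae corresponds to 16.3% for mutation pressure, leaving 83.7% attributed to natural selection and other factors under this reading.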

Translational Selection Index (P2)
The efficiency of codon-anticodon interaction was assessed by P2 analysis, where a value above 0.5 indicated a pronounced effect of translational selection on codon usage. The data revealed that for C. graminicola, G. tritici, P. oryzae, V. dahliae (cluster 2), and S. cerevisiae (Table 4), the values were less than 0.5, which meant that mutational pressure had more influence on the CUB of Zn(II)2Cys6 sequences in these species than in the others. This was consistent with the high GC and GC3 content, except for S. cerevisiae, which was AT-rich. In A. alternata and F. graminearum (cluster 1) (Table 4), the value was greater than 0.5, which gave a clear indication of the higher influence of translational selection. For A. flavus, B. maydis, and B. oryzae (cluster 1) (Table 4), P2 was equal to 0.5; however, this was the mean of the P2 values across all CDS. When the P2 values for each CDS of these species were analysed, a higher number of CDS had P2 > 0.5, which indicated that these species were more inclined toward translational selection (Table S12).
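A per-CDS P2 value of the kind summarised in Table S12 could be computed along the following lines. The sketch assumes the commonly cited definition P2 = (WWC + SSU)/(WWY + SSY), where W is A or U, S is C or G, and Y is C or U; this section does not restate the formula, so the definition and the function name `p2_index` should be treated as assumptions.

```python
def p2_index(cds):
    """P2 = (WWC + SSU) / (WWY + SSY), counted over complete codons."""
    cds = cds.upper().replace("U", "T")  # work in the DNA alphabet
    wwc = ssu = wwy = ssy = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        first_two, third = cds[i:i + 2], cds[i + 2]
        if third not in "CT":
            continue  # only pyrimidine-ending codons enter P2
        if all(base in "AT" for base in first_two):
            wwy += 1
            wwc += third == "C"
        elif all(base in "GC" for base in first_two):
            ssy += 1
            ssu += third == "T"
    if wwy + ssy == 0:
        raise ValueError("no WWY or SSY codons in sequence")
    return (wwc + ssu) / (wwy + ssy)
```

Averaging this value over all CDS of a species would give a species-level figure, while counting the CDS above and below 0.5 gives the per-gene breakdown discussed above.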

Principal Component Analysis
Principal component analysis was conducted to determine the trends in codon usage for Zn(II)2Cys6 sequences. Axis 1 and axis 2 were the major contributors to variance, followed by axes 3 and 4; the remaining axes accounted for less of the codon usage variation. Axis 1 accounted for the maximum variation, in the range of 12.06-36.01%. The contribution of both axes in each fungal species is listed in Table S13, which shows that, compared with F. graminearum, A. alternata, and A. flavus (cluster 1), axis 1 had a more pronounced effect in G. tritici, C. graminicola, and P. oryzae (cluster 2). Each coloured circle represents an individual Zn(II)2Cys6 gene, with each colour representing a fungal species. The circles lay across the four quadrants, mainly concentrated near the axes; there was also overlap among the fungal species (Figure 7). Despite differences in sequences, all the fungal species shared similar codon usage patterns to some extent. The scattering of some circles away from the axes may reflect the effect of other evolutionary forces, such as natural selection.

Correlation Analysis of CUB Indices
A detailed analysis was conducted to ascertain the relationship between the different CUB indices, which would help to understand the pattern of codon usage and the factors influencing it. It has already been established that CAI and ENC share a negative relationship. CAI correlated strongly with overall GC content, with the individual GC1, GC2, GC12, and GC3 components, and with FoP. The maximum r value for the GC3/CAI correlation was r = 0.97 *** for G. tritici, and that for the FoP/CAI correlation was r = 0.97 *** for C. graminicola. The relation between FoP and GC3 was also positive. CAI and FoP correlated positively with GC-rich indices; however, they showed a strong negative correlation with AT and AT3 (Tables S14-S23). All the fungal species showed a similar correlation pattern except for S. cerevisiae, which responded in the opposite manner to its Ascomycetes counterparts, i.e., CAI and FoP were positively correlated with AT and AT3 and negatively correlated with GC, GC1, GC2, and GC3. ENC showed a positive correlation with AT and AT3, whereas it had a negative correlation with CAI, FoP, GC, GC1, GC2, and GC3 for all nine Ascomycetes species, with varying r values (Tables S14-S23). For ENC, the results for S. cerevisiae were contrary to those found for the other fungal species. Analysis of the relationship of SCUO with other parameters showed a strong negative correlation with ENC, whereas it had a positive relation with the CAI, FoP, ICDI, and GC3 indices. The positive relation between SCUO and GC3 was more pronounced in the fungal species of cluster 2 than in those of cluster 1, signifying the role of mutational pressure on CUB in these species. The COUSIN indices exhibited a strong positive relation with CAI, SCUO, and GC3 and a negative relation with ENC and axis 1, implying that compositional constraints played a role in CUB determination.
The GRAVY and AROMA scores were also correlated with the other CUB indices, and these correlations varied among the ten species. Gene length showed a significant negative correlation with GRAVY in A. alternata, A. flavus, C. graminicola, and S. cerevisiae and a significant positive correlation with AROMA in A. alternata, A. flavus, B. maydis, G. tritici, P. oryzae, and V. dahliae; the associations in the other species were nonsignificant (Tables S14-S23). Axis 1 exhibited a considerable positive correlation with GRAVY for cluster 1, except A. flavus, whereas a negative correlation existed in P. oryzae and V. dahliae. On the other hand, no significant correlation was observed between axis 1 and AROMA or between axis 2 and either AROMA or GRAVY scores. AROMA had no significant relation to ENC or CAI. At the same time, hydrophobicity was positively correlated with ENC for cluster 1, except A. flavus, and negatively correlated with ENC in P. oryzae and V. dahliae; the opposite held for CAI. The results indicated that hydrophobic proteins had weaker codon bias in cluster 1 and stronger codon bias in species of cluster 2. ENC values showed a strong positive correlation with axis 1 (r = 0.67-0.97, p > 0.01) and equally strong negative correlation coefficients with CAI (r = −0.63 to −0.97, p > 0.01). Thus, it can be inferred that ENC may be one of the major factors in determining codon bias. The influence of CDS length on axis 1, CAI, and ENC could not be adequately determined, as the correlation was significant for some species and nonsignificant for others. However, based on the available information, gene length was positively correlated with CAI and negatively correlated with axis 1, with longer and more highly expressed genes occupying the right side of the first axis. This observation held for A. alternata, C. graminicola, F. graminearum, and S. cerevisiae, which showed significant values.
In conclusion, the significant correlations between AROMA, GRAVY, ENC, CAI, and axis 1 suggest an influential role of translational selection, especially in cluster 1 species, which is consistent with the translational selection index results.
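The r values quoted throughout this section can be reproduced from per-gene index values with a few lines of code. The sketch below assumes Pearson's product-moment correlation, which this section does not state explicitly; the function name `pearson_r` and the example values are hypothetical and are not taken from the supplementary tables.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene values: CAI rising while ENC falls.
cai_values = [0.60, 0.65, 0.72, 0.80]
enc_values = [55.0, 52.0, 47.0, 41.0]
r_cai_enc = pearson_r(cai_values, enc_values)  # strongly negative
```

A coefficient near −1 for CAI against ENC, as in this toy series, corresponds to the negative CAI/ENC relationship described above; significance testing would require an additional step that is not shown here.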

Discussion
The strong relation of CUB indices with GC3 was evident through CAI, ICDI, and FoP analyses, suggesting an important role of compositional constraint in determining codon bias. Codon usage is known to be shaped by nucleotide composition [80]. In our study, we found that cytosine was the most prominent of all the nucleotides (overall, as well as at the third position). All nine Ascomycetes plant pathogenic fungi exhibited a high level of GC% and GC3%, although the level of GC-richness varied, i.e., cluster 2 was more GC-rich than cluster 1 (Table 2). This implied that, depending on the recombination rate, GC heterogeneity and GC bias did influence CUB [81,82]. Furthermore, through RSCU analysis, we found that the most preferred and overrepresented codons were mainly GC-rich, particularly in cluster 2 (Table 3). All nine Ascomycetes shared ten highly preferred codons, of which seven were GC-rich (CTC, CTG, GTC, GAG, CGC, TGC, and GGC) and three were AT-rich (TTC, ATC, and AAG); this was direct evidence of the conservation of these codons during the course of evolution and of their importance for Zn(II)2Cys6 expression (Table S11). S. cerevisiae was found to be AT-rich, and this variation was responsible for keeping it out of the clusters of Ascomycetes. However, results from the neutrality plot, ENC plot, and PR2 plot showed different aspects of the story. The neutrality plot can be regarded as a tool to establish the influence of mutation pressure relative to natural selection. The ratio of mutational pressure to natural selection turned out to be 0.03 (A. alternata), 0.10 (A. flavus), 0.13 (B. maydis), 0.09 (B. oryzae), 0.14 (C. graminicola), 0.10 (F. graminearum), 0.15 (G. tritici), 0.19 (P. oryzae), 0.28 (S. cerevisiae), and 0.05 (V. dahliae) (Figure 6). The low ratios indicated that the codon usage pattern was driven more by natural selection and less by mutational forces.
CUB has been found to be predominantly a function of natural selection in several organisms, such as Calypogeiaceae, Marchantiophyta, and others [83,84]. The drift of codons away from the centre of the PR2 plots, rather than their concentration at the centre, can be further associated with the involvement of forces other than mutational pressure (Figure 5). ENC plots clarified the involvement of the evolutionary forces responsible for the bias: the occurrence of genes below the standard curve indicated the role of natural selection along with compositional constraint (cluster 1), whereas genes lying on or above the curve indicated an influential role of mutational pressure (cluster 2). P2 analysis data also showed a mutual role and, similar to the ENC plot, the two fungal clusters exhibited different levels of forces acting on them. Cluster 2 species mostly had P2 < 0.5 and cluster 1 species mostly had P2 > 0.5, indicating that CUB may vary for the same gene family across genomes. Overall, it can be inferred that codon bias results from dual forces, with the major impact exerted by natural selection. Our results are in accordance with various other studies on eukaryotes and prokaryotes [47,75,85]. Additionally, it has been suggested that CUB results from a mutual partnership between selection and mutation, which is balanced by various unknown forces at different levels of organisation [19,20].
A strong association of GC3 with SCUO confirmed the variation of codon usage orders among the fungal species and that GC3 was one of the key determinants of codon bias in these Zn(II) 2 Cys 6 proteins. As the GC biasness increases, the CUB also increases. The positive correlation between GC3, overall GC, and SCUO has also been reported in previous studies [86].
Critical analysis of all the CUB indices in the ten species highlighted the association between evolution and codon usage patterns. Clustering analysis resulted in the generation of two branches, with S. cerevisiae as an outlier. The first or major branch was divided into two clusters. Interestingly the results of the CUB analysis could also be divided into two parts which coincided well with the clustering pattern of species. For instance, GC-richness, CAI, FoP, and SCUO were greater in all four species of cluster 2, whereas incidences of translational selection with a higher P2 index were greater in all five species of cluster 1. The results can be justified by the argument that CUB is the result of collective actions of mutation, natural selection, and genetic drift, which shape the evolution of genomes [87]. CUB reflects the origin, mutation, and evolution of genes and can be used to determine the evolutionary pattern among genes, species, organisms, etc., as closely related organisms are expected to have similar CUB patterns [81,88]. The various species of fungi used in the present study are the causative agents of devastating plant diseases, such as leaf spot (A. alternata), southern corn leaf blight and stalk rot (B. maydis), blast (P. oryzae), etc. The Zn(II) 2 Cys 6 protein is one of the potential causative factors for inducing infection in a plant. It is an exciting target to study in terms of its codon usage in these fungi. Badet et al. [42] reported that optimal codons channel the adaptation and colonisation of parasites to their respective hosts. Less information is available on plant-fungal interactions; however, extensive work has been conducted on viruses and parasites based on the importance of codon optimisation for Zn(II) 2 Cys 6 . 
A myco-reovirus isolated from Cryphonectria parasitica (chestnut blight fungus), along with other myco-reoviruses, showed evidence of codon bias for XYG+XYC, suggesting that CUB in reoviruses and their respective hosts adapted during evolution [89]. The selection of optimal codons to adapt to host environments is an integral part of viral evolution, as is evident from various studies conducted on reoviruses, bacteriophages, mammalian viruses, plant viruses, etc. [89][90][91][92]. In nature, the sharing of synonymous CUB patterns by host plants and their respective pathogens could be an outcome of common mutational bias or natural selection driven by evolutionary forces. Similarity in CUB patterns was observed between dicot plants and the viruses infecting them [93]. The same kind of codon usage adaptation of pathogens toward plants was also observed in fungal systems. For instance, the interaction of Crocus sativus with Aspergillus fumigatus and Fusarium oxysporum, and of Z. officinale with A. flavus, A. niger, and F. oxysporum, showed similar CUB indices [79,94]. ENC and CAI are critical determinants of gene expression level, indicating the important role of CUB in deciding the expression of genes. Any adjustment in the CUB of genes associated with virulence will alter their expression and ultimately affect pathogenicity.
Colonisation of the host is mediated by the efficacy of degrading enzymes, which are a function of secretory proteins regulated by codon optimisation [42]. Any abnormality in these proteins negatively affects the ability to colonise the host [95]. Thus, codon optimisation under the influence of translational selection regulates the colonisation and infectivity level of virulence factors in homogeneous and heterogeneous host systems, and understanding the CUB pattern of Zn(II)2Cys6 would help to understand its role in causing plant infection. Furthermore, combining codon optimisation with modern synthetic biology would add further value. Codon optimisation gives an idea of an ideal gene that should function and be expressed according to a set of rules, while synthetic biology provides the platform to design that ideal gene. Synthetic biology is based on the concept of design-build-test-repeat, and bio-bricks form the basis of it [96][97][98]. The development of semi-synthetic artemisinin has been considered a breakthrough in the production of a potent anti-malarial drug [96]. Zn(II)2Cys6 gene codons serve as parts; any modulation and re-arrangement in them will affect amino acids (device), which will lead to the rewiring of the genetic circuit (system) and ultimately impact the organism. Synthetic biology has already been applied to the production of amino acids (glutamic acid, lysine, methionine, etc.) using different chassis organisms [99]. Recently, adding extra copies of six tRNA genes corresponding to the E. coli minor codons CGG, GGA, CUA, CCC, AGA/AGG, and AUA in the BL21 strain of E. coli resulted in an increased growth rate and higher expression of potential genes, subsequently enhancing the translation rate in comparison with non-modified strains [100].
Pathogens and hosts employ amino acids that are biosynthetically cost-effective, so that the saved energy can be channelled into imparting more virulence or resistance to the system. Substituting codons for amino acids in a way that lowers virulence in pathogens or increases resistance in plants could be used to achieve disease-free plants. Synthetic genes based on the codon optimisation parameters of a pathogen can be developed, which may help in imparting resistance to the plant. Alternatively, artificial genes may be constructed that could impair the virulence of pathogens.

Conclusions
The present study attempted to understand the codon usage pattern of the genes coding for the important Zn(II)2Cys6 transcription factor family, which is unique to fungi. The current study is the first of its kind, in which the fungal-specific Zn(II)2Cys6 gene family has been studied for codon usage bias among plant pathogenic Ascomycetes species in relation to the model fungus yeast. The current investigation showed the influence of codon usage bias and nucleotide composition on the divergence of the Zn(II)2Cys6 family between Ascomycetes and the model yeast system and within Ascomycetes species. Further, we found a combined influence of mutation pressure and natural and translational selection on the codon usage bias of the Zn(II)2Cys6 family. The study also identified common highly represented codons specific to Ascomycetes and to the model fungus S. cerevisiae. The CUB of Zn(II)2Cys6 sequences is directly relevant to the expression levels of genes. The preferred codons present in the genes could be targeted to decipher the molecular basis of the infection process and host-pathogen interactions through gene editing or knockouts.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jof8111134/s1, Table S1: The composition and various codon usage bias parameters of 133 Zn(II)2Cys6 sequences from A. alternata; Table S2: The composition and various codon usage bias parameters of 348 Zn(II)2Cys6 sequences from A. flavus; Table S3: The composition and various codon usage bias parameters of 192 Zn(II)2Cys6 sequences from B. maydis; Table S4: The composition and various codon usage bias parameters of 177 Zn(II)2Cys6 sequences from B. oryzae; Table S5: The composition and various codon usage bias parameters of 193 Zn(II)2Cys6 sequences from C. graminicola; Table S6: The composition and various codon usage bias parameters of 273 Zn(II)2Cys6 sequences from F. graminearum; Table S7: The composition and various codon usage bias parameters of 152 Zn(II)2Cys6 sequences from G. tritici; Table S8: The composition and various codon usage bias parameters of 158 Zn(II)2Cys6 sequences from P. oryzae; Table S9: The composition and various codon usage bias parameters of 55 Zn(II)2Cys6 sequences from S. cerevisiae; Table S10: The composition and various codon usage bias parameters of 135 Zn(II)2Cys6 sequences from V. dahliae; Table S11: Relative synonymous codon usage bias analyses of Zn(II)2Cys6 sequences from all ten target fungal species with emphasis on overrepresented and more frequently used codons; Table S12: The number and percentage of Zn(II)2Cys6 sequences from all ten target species showing translational selection index greater or lesser than 0.50; Table S13: The percentage of variations explained by axis 1 and axis 2 from principal component analysis of RSCU of Zn(II)2Cys6 sequences in target fungal species; Table S14: The correlation coefficients among the codon bias parameters and important compositional parameters of Zn(II)2Cys6 sequences from A. alternata; Table S15: The correlation coefficients among the codon bias parameters and important compositional parameters of Zn(II)2Cys6 sequences from A. flavus; Table S16: The correlation coefficients among the codon bias parameters and important compositional parameters of Zn(II)2Cys6 sequences from B. maydis; Table S17: The correlation coefficients among the codon bias parameters and important compositional parameters of Zn(II)2Cys6 sequences from B. oryzae; Table S18: The correlation coefficients among the codon bias parameters and important compositional parameters of Zn(II)2Cys6 sequences from C. graminicola; Table S19: The correlation coefficients among the codon bias parameters and important compositional parameters of Zn(II)2Cys6 sequences from F. graminearum; Table S20: The correlation coefficients among the codon bias parameters and important compositional parameters of Zn(II)2Cys6 sequences from G. tritici; Table S21: The correlation coefficients among the codon bias parameters and important compositional parameters of Zn(II)2Cys6 sequences from P. oryzae; Table S22: The correlation coefficients among the codon bias parameters and important compositional parameters of Zn(II)2Cys6 sequences from S. cerevisiae; Table S23: The correlation coefficients among the codon bias parameters and important compositional parameters of Zn(II)2Cys6 sequences from V. dahliae.
Data Availability Statement: All raw data included in the present investigation were mined from public databases. All analysed and supporting data are given in the Supplementary File.