Insight into Codon Utilization Pattern of Tumor Suppressor Gene EPB41L3 from Different Mammalian Species Indicates Dominant Role of Selection Force

Simple Summary
The present study analyzed the codon usage pattern of the tumor suppressor gene EPB41L3 in the human, brown rat, domesticated cattle, and Sumatran orangutan. Most amino acids are coded by more than one synonymous codon, but these codons are used in a biased manner. Codon usage bias results from multiple factors such as compositional properties, dinucleotide abundance, neutrality, parity, and the tRNA pool. Understanding codon bias is central to fields as diverse as molecular evolution, gene expressivity, protein translation, and protein folding. Studies of this kind are important for assessing the effects of various evolutionary forces on codon usage. The present study indicated that the selection force is dominant over the other forces shaping codon usage in the selected organisms.

Abstract
Uneven codon usage within genes as well as among genomes is a common phenomenon across organisms. It plays a significant role in the translational efficiency and evolution of a particular gene. EPB41L3 is a tumor suppressor protein-coding gene, and the present study analyzed its pattern of codon usage. The full-length sequences of the EPB41L3 gene for the human, brown rat, domesticated cattle, and Sumatran orangutan available at the NCBI were retrieved and used to analyze codon usage bias (CUB) patterns across the selected mammalian species. Compositional properties, dinucleotide abundance, and parity analysis showed the dominance of A and G, whilst relative synonymous codon usage (RSCU) analysis indicated the dominance of G/C-ending codons. The neutrality plot, drawn between GC12 and GC3 to weigh mutation pressure against natural selection, indicated the dominance of selection pressure (r = 0.926; p < 0.00001) over the three codon positions across the gene. This result is in concordance with the codon adaptation index analysis, the ENc-GC3 plot analysis, and the translational selection index (P2). Overall, selection pressure is the dominant pressure acting during the evolution of the EPB41L3 gene.


Introduction
Tumor suppressor genes (TSGs) keep in check the genes that are responsible for cell cycle progression. They also couple DNA damage to the cell cycle, so that a cell does not enter division until the damage is repaired; if the damage is irreparable, these genes steer the cell towards apoptosis. Inactivation of these genes removes or downregulates negative cues (inhibitory factors) of cell proliferation and contributes to abnormal cell growth and division, which leads to tumor development. The inhibitory molecules encoded by most anti-oncogenes (TSGs) inhibit cell division or survival. Examples of tumor suppressor proteins are retinoblastoma protein (RB1), tumor protein p53 (TP53), B-cell leukemia/lymphoma 2 (BCL2), breast cancer 2 (BRCA2), etc. Inactivated Rb is involved in carcinomas of the bladder, breast, and lung. TP53, INK4, and PTEN were found to be involved in lung cancer, prostate cancer, and melanoma. The adenomatous polyposis coli and MADR2/Smad2 genes are found altered in cases of colorectal cancer. Inactivation of the erythrocyte membrane protein band 4.1-like 3 (EPB41L3) gene through methylation is involved in breast cancer and renal clear cell carcinoma [1]. In esophageal squamous cell carcinoma (ESCC), a cancerous tumor [2], EPB41L3 expression is reduced. When the EPB41L3 protein was expressed in ESCC cells, inhibition of cellular proliferation, induction of apoptosis, and G2/M cell cycle arrest were observed via activation of caspase-3/8/9 and cyclin-dependent kinase 1/cyclin B1 signaling [3]. EPB41L3 also alters the function of arginine N-methyltransferase proteins such as PRMT3 and PRMT5.
EPB41L3, also known as 4.1B or DAL1/4.1B, was found to disrupt the development of brain, breast, and lung tumor cells [4]. Loss of EPB41L3 has been linked with the invasiveness and metastatic ability of non-small-cell lung cancer (NSCLC) cells [5]. For the EPB41L3 gene, methylation above 20% is termed hypermethylation. This hypermethylation was found to correlate with the progression and persistence of NSCLC and to indicate poor prognosis [6]. One of the distinctive aspects of the growth of brain tumors is the hypermethylation of TSGs [7]. The gene EPB41L3 is hypermethylated (29% hypermethylation) only in tumors, confirming its cancer-specific role [7]. It has been reported that EPB41L3 suppresses tumor metastasis and matrix metalloproteinase-2/9 activity in esophageal squamous cell carcinoma [3]. How EPB41L3 exerts its tumor-suppressive role is poorly understood; however, it has prognostic value. EPB41L3 is a protein-coding gene located on chromosome 18 (cytogenetic band 18p11.31). The encoded protein consists of 1087 amino acids and has a molecular mass of 121 kDa.
The study of codon usage patterns in a particular gene or genome is essential for grasping evolutionary features, identifying high-frequency, preferred, or underrepresented codon pairs, and investigating expression-linked codon usage patterns. Nucleotide composition plays a crucial role in shaping codon usage, and GC content is one of the key factors in the evolution of genomic structures [8]. Owing to the redundancy of the genetic code, most amino acids can be coded by synonymous codons that differ only at the third nucleotide position, so this position is a good indicator of the extent of synonymous codon usage bias. A change of the nucleotide at the third position of the codon does not change the amino acid; the codon changes, but its meaning in terms of the amino acid does not. The study of the third codon position therefore also indicates the role of mutational pressure in codon usage bias. The first and second nucleotide positions of the codon are equally important to study, since a change of base there also changes the amino acid. These two positions are indicative of selection forces. Many recent bioinformatics and experimental studies have revealed that the frequencies of synonymous codon usage vary between different genes within and across organisms [9] due to evolutionary forces such as mutational and selection pressure [10]. Recent advances in sequencing technologies and the availability of coding sequences (CDSs) in GenBank have enabled the comprehensive study of codon usage bias (CUB) indices of genes. The current study centers on a comprehensive analysis of the codon utilization trends of the EPB41L3 gene in five mammalian species, as codon usage varies among species [11], among closely related species and genes [12], and is predominantly related to gene function [13].
The prime motive of the study was to computationally examine the factors responsible for shaping codon usage in the EPB41L3 gene across the selected species through several methods, including relative synonymous codon usage (RSCU), effective number of codons (ENc), nucleotide composition, and phylogenetic analysis.
Our study provides valuable insight into the codon utilization trends of the EPB41L3 gene, which should enhance our understanding of the preferential utilization of codons among the mammalian species as well as the significance of evolutionary forces in shaping codon usage.

Sequence Data Retrieval
A total of 34 complete transcripts of the EPB41L3 gene from five mammalian species available at the NCBI (http://www.ncbi.nlm.nih.gov/GenBank; accessed on 2 October 2020) were retrieved as FASTA sequences and used. The dataset was curated carefully: only CDSs with a start and a stop codon and no ambiguous bases along their length were included, while partial CDSs and CDSs with internal stop codons were excluded. The 34 EPB41L3 transcripts that fulfilled these criteria came from five species of the class Mammalia, namely Homo sapiens (human), Rattus norvegicus (brown rat), Bos taurus (aurochs or domesticated cattle), Mus musculus (house mouse), and Pongo abelii (Sumatran orangutan), and cumulatively encompass 29,660 codons, which were counted with the help of the Codon Usage Database (https://www.kazusa.or.jp/codon/countcodon.html; accessed on 2 October 2020). The full attributes of the datasets examined in the present work are provided in Table S1.

Nucleotide Content Analysis
Major compositional properties of the EPB41L3 transcripts were computed: the overall GC content (guanine plus cytosine), GC1%, GC2%, and GC3% (GC content at the first, second, and third codon positions, respectively), the frequencies of A, T, G, and C, and the nucleotide frequencies at the third codon position (A3%, T3%, G3%, and C3%). In addition, average GC3 contents were considered to study the effect of base compositional bias [14].
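The position-specific GC values described above can be illustrated with a minimal sketch. The function name `gc_by_position` and the toy sequence are hypothetical, and the sketch assumes that the stop codon is counted like any other codon; the tools used in the study may treat it differently.

```python
def gc_by_position(cds: str) -> dict:
    """Return overall GC% and GC% at codon positions 1, 2, and 3 of a CDS."""
    seq = cds.upper().replace("U", "T")
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of three")

    def gc_percent(bases: str) -> float:
        return 100.0 * sum(base in "GC" for base in bases) / len(bases)

    return {
        "GC": gc_percent(seq),
        "GC1": gc_percent(seq[0::3]),  # first codon positions
        "GC2": gc_percent(seq[1::3]),  # second codon positions
        "GC3": gc_percent(seq[2::3]),  # third codon positions
    }


# Toy CDS (hypothetical, not an EPB41L3 sequence): ATG GCC AAG TGA
toy_composition = gc_by_position("ATGGCCAAGTGA")
print(toy_composition)
```

GC12, used later in the neutrality plot, would then be the mean of the GC1 and GC2 values.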

Relative Dinucleotide Abundance Analysis
Dinucleotide frequency has a significant role in codon usage and is often used to establish an association between dinucleotide bias and the CUB. Dinucleotide frequencies were calculated with the help of the EMBOSS explorer (http://emboss.bioinformatics.nl/cgi-bin/emboss/compseq; accessed on 3 October 2020). Dinucleotide frequency patterns help to understand both selection and mutational pressures [15]. An odds ratio (observed/expected dinucleotide frequency) below 0.78 is considered to signify underrepresentation, while a ratio above 1.23 signifies overrepresentation [16,17].
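As a rough illustration of the odds ratio, the sketch below divides the observed frequency of each dinucleotide by the product of the frequencies of its two bases. This is an assumed definition of the expected frequency, and counting overlapping dinucleotides is also an assumption; the EMBOSS compseq output used in the study may be computed differently. The function names are hypothetical.

```python
from collections import Counter


def dinucleotide_odds_ratios(sequence: str) -> dict:
    """Observed/expected ratio for every dinucleotide present in the sequence."""
    seq = sequence.upper().replace("T", "U")
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    base_freq = {base: n / len(seq) for base, n in Counter(seq).items()}
    # Overlapping dinucleotides: positions (0,1), (1,2), (2,3), ...
    pair_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total_pairs = len(seq) - 1
    return {
        pair: (count / total_pairs) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, count in pair_counts.items()
    }


def classify_odds_ratio(ratio: float, low: float = 0.78, high: float = 1.23) -> str:
    """Apply the under/overrepresentation thresholds quoted in the text."""
    if ratio < low:
        return "underrepresented"
    if ratio > high:
        return "overrepresented"
    return "within expected range"


toy_ratios = dinucleotide_odds_ratios("GAGAGAGA")  # hypothetical toy sequence
print(toy_ratios["GA"], classify_odds_ratio(toy_ratios["GA"]))
```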

Relative Synonymous Codon Usage Analysis
The RSCU values of all the 34 mRNA transcripts that included humans along with four other different mammalian species (Table 1) were calculated. The RSCU values were assessed using the CAIcal server [18]. Codons having the RSCU values less than 0.6 and more than 1.6 are known to be underrepresented and overrepresented respectively [19], whereas the values ranging in-between are considered unbiased. The values with more frequently used codons (RSCU > 1) corresponding to each amino acid of the gene transcripts among each species are highlighted in bold and asterisk-marked and the values highlighted in red were found overrepresented (RSCU > 1.6). Bold codons showed shared preferred synonymous codons across the envisaged mammalian species.
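For orientation, RSCU is commonly defined as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. The sketch below follows that definition with the standard genetic code; it is not the CAIcal implementation, and the names `rscu` and `classify_rscu` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds: str) -> dict:
    """RSCU per codon for the 18 amino acids with synonymous codons."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":  # skip stop codons, Met, and Trp
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in codons:
            values[codon] = counts[codon] * len(codons) / total
    return values


def classify_rscu(value: float) -> str:
    """Apply the 0.6 / 1.6 thresholds quoted in the text."""
    if value > 1.6:
        return "overrepresented"
    if value < 0.6:
        return "underrepresented"
    return "unbiased"


toy_rscu = rscu("GTGGTGGTGGTT")  # hypothetical toy CDS: three GTG, one GTT
print(toy_rscu["GTG"], classify_rscu(toy_rscu["GTG"]))
```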

Effective Number of Codons Analysis
Synonymous codons are two or more codons that code for the same amino acid (all amino acids except Met and Trp have them). Wright (1990) [20] introduced the ENc to quantify the bias in synonymous codon usage. ENc values range from 20 to 61: a value of 20 represents absolute bias, i.e., only one codon is used for each amino acid, while 61 indicates a lack of bias, i.e., all synonymous codons are used equally without any preference. An ENc value ≤35 is taken to represent significant codon bias [21]. The ENc values are associated with the translational efficiency of the respective genes [22] because of the utilization of the optimal codon among the synonymous codons [23]. The ENc values for all the transcript variants of the EPB41L3 gene across the species were calculated individually with the help of the CAIcal server (http://genomes.urv.es/CAIcal/; accessed on 2 October 2020).

Neutrality Plot
A neutrality plot was used to weigh the role of mutational force in the CUB against other evolutionary forces. The scatter plot was drawn with GC12 on the Y-axis and GC3 on the X-axis. A regression slope approaching 1 implies complete neutrality, i.e., mutational pressure dominates, whereas a slope approaching zero reflects a narrow GC3 distribution and indicates that selection pressure overcomes mutational pressure during the evolutionary process [24].
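The slope of the neutrality plot can be obtained by ordinary least squares, as in the minimal sketch below. The function name and the toy GC3/GC12 values are hypothetical and are not the study data; reading the slope as the share of mutational force and the remainder as the share of selection follows the interpretation used in this paper.

```python
def neutrality_regression(gc3, gc12):
    """Least-squares slope and intercept of GC12 (Y) regressed on GC3 (X)."""
    if len(gc3) != len(gc12) or len(gc3) < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(gc3) / len(gc3)
    mean_y = sum(gc12) / len(gc12)
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy values chosen so that the slope is 0.3 (hypothetical, for illustration only).
slope, intercept = neutrality_regression([50, 52, 54, 56], [48.0, 48.6, 49.2, 49.8])
mutation_share = 100 * slope          # relative neutrality, in percent
selection_share = 100 - mutation_share
print(round(slope, 3), round(mutation_share, 1), round(selection_share, 1))
```

With a slope of 0.302, the same arithmetic gives roughly 30.2% for mutational force and 69.8% for selection.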

Parity Rule Two Bias Plot Analysis
In a PR2 bias plot (a scatter diagram with four quadrants), the abscissa is the GC bias [G3/(G3 + C3)] and the ordinate is the AT bias [A3/(A3 + T3)], both at the third codon position [25,26]. The center of the plot (0.5, 0.5) indicates no bias between mutation and selection rates, while the extent of deviation from the center indicates the degree of bias [25,27].
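The two PR2 coordinates can be computed directly from third-position base counts, as sketched below. The function name `pr2_bias` and the toy sequence are hypothetical, and the sketch assumes that all codons are included; PR2 analyses are sometimes restricted to particular codon families, which this example does not attempt.

```python
def pr2_bias(cds: str):
    """Return (GC bias, AT bias) at the third codon position of a CDS."""
    seq = cds.upper().replace("U", "T")
    third = seq[2::3]  # every third base, starting at codon position 3
    a3, t3 = third.count("A"), third.count("T")
    g3, c3 = third.count("G"), third.count("C")
    if a3 + t3 == 0 or g3 + c3 == 0:
        raise ValueError("third positions must contain both A/T and G/C bases")
    return g3 / (g3 + c3), a3 / (a3 + t3)


# Toy CDS (hypothetical): AAA AAA AAT GGG GGC GGG -> third positions A, A, T, G, C, G
gc_bias, at_bias = pr2_bias("AAAAAAAATGGGGGCGGG")
print(gc_bias, at_bias)
```

A point at (0.5, 0.5) would mean A3 = T3 and G3 = C3; values above 0.5 on both axes indicate a purine excess at the third position.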

Codon Adaptation Index Analysis
CAI is an effective method to predict a gene's level of expression centered on how often a preferred codon is used. The value ranges from 0 to 1. CAI value 1 represents the highest relative adaptiveness. The higher the CAI value, the higher the gene expression potential and the CUB [15,28]. The CAI values were calculated for all the transcript variants of the EPB41L3 gene across the species individually with the help of the CAIcal server (http://genomes.urv.es/CAIcal/; accessed on 2 October 2020).
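In its usual formulation, the CAI is the geometric mean of the relative adaptiveness values (w) of the codons in a sequence, where w for a codon is its frequency in a reference set divided by the frequency of the most used synonymous codon. The sketch below assumes that formulation; the weight table is a hypothetical toy example, not the reference table of any of the studied species, and the function is not the CAIcal implementation.

```python
import math


def codon_adaptation_index(cds: str, weights: dict) -> float:
    """Geometric mean of relative adaptiveness over codons that have a weight."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Codons without a weight (e.g., Met, Trp, stop) are skipped.
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("no codon in the sequence has a reference weight")
    # Assumes every weight is strictly positive, otherwise log() fails.
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


# Hypothetical relative adaptiveness values for two amino acid families.
toy_weights = {"CTG": 1.0, "CTT": 0.25, "GTG": 1.0, "GTT": 0.25}
toy_cai = codon_adaptation_index("CTGCTTGTGGTT", toy_weights)
print(round(toy_cai, 3))
```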

ENc-GC3 Plot
To explore the synonymous codon utilization trend of the EPB41L3 gene under evolutionary forces, the ENc-GC3 plot (ENc on the ordinate, GC3 on the abscissa) was drawn. Mutation is the primary factor influencing codon utilization if the points fall on the expected curve, whereas selection is the leading force in configuring codon utilization if the points fall substantially below the expected curve [15,29].
The expected ENc values were calculated with the equation below, where s represents the frequency of GC3 [20]:

$$\mathrm{ENc}_{\mathrm{expected}} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$

The translational selection index (P2) was calculated as

$$P2 = \frac{WWC + SSU}{WWY + SSY}$$

where W = A or U, S = C or G, and Y = C or U (U corresponds to T in the DNA sequence). A P2 value of more than 0.5 reveals a bias towards translational selection [30].
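Both quantities are simple to evaluate once their inputs are known. The sketch below assumes Wright's standard expression for the expected ENc, 2 + s + 29/(s² + (1 − s)²), and the usual form of the P2 index, (WWC + SSU)/(WWY + SSY) with WWY = WWC + WWU and SSY = SSC + SSU. The function names and the toy input values are hypothetical.

```python
def expected_enc(s: float) -> float:
    """Expected ENc when codon usage depends only on GC3 (s as a fraction, 0 to 1)."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be a fraction between 0 and 1")
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def p2_index(wwc: float, ssu: float, wwu: float, ssc: float) -> float:
    """P2 = (WWC + SSU) / (WWY + SSY), with Y = C or U."""
    denominator = (wwc + wwu) + (ssc + ssu)
    if denominator == 0:
        raise ValueError("codon class totals must not all be zero")
    return (wwc + ssu) / denominator


curve_midpoint = expected_enc(0.5)            # top of the expected curve
toy_p2 = p2_index(wwc=1.2, ssu=0.9, wwu=0.8, ssc=1.1)  # hypothetical class totals
print(curve_midpoint, round(toy_p2, 3))
```

A transcript whose observed ENc lies well below `expected_enc` at its GC3 value falls under the curve, which is the pattern interpreted as selection in the ENc-GC3 plot.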

Abundance Analysis of tRNA
For a single amino acid, different tRNA isotypes in various species bind to different codons. It is hypothesized that the most preferred codons are recognized by most abundant isoacceptor tRNAs, indicating the role of selection pressure [29]. The tRNA frequencies of each mammalian species were retrieved using GtRNAdb (genomic tRNA database).

Nucleotide Composition in the EPB41L3 Gene Indicated G/C-Ending Codons Preference
In the current study, we explored 34 transcripts comprising 29,660 codons and 88,980 nucleotides of the EPB41L3 gene across five mammalian species. The compositional properties of the transcripts showed that the overall A% (29.95 ± 0.95%) was the highest, followed by G% (27.05 ± 0.40%) (Table S2). At the third codon position (A3, U3, G3, and C3), the overall G3% (29.65 ± 1.11%) was the highest, followed by A3% (25.45 ± 1.47%). This result suggests greater usage of the purines A and G than of the pyrimidines C and U among the codons of the EPB41L3 gene. However, the mean GC% of 50.62 ± 1.23% (Table 2) and AU% of 49.37 ± 1.23% (Table S2) revealed almost equal GC and AU contents. Further nucleotide composition analysis indicated that the mean GC3% and AU3% were 53.95 ± 2.27% (range 51.60 to 63.10%) and 46.05 ± 2.28% (range 36.86 to 48.37%), respectively (Table S2). Consequently, the overall nucleotide content supported a preference for G/C-ending codons over A/U-ending codons in the EPB41L3 gene among the selected species. The minimum, maximum, and mean values of the nucleobases, along with the nucleobases at the third codon position of the EPB41L3 gene, are presented graphically in Figure 1a,b.

Relative Dinucleotide Abundance Analysis Indicated GpA as the Most Abundant Dinucleotide Owing to Overall High GA Nucleotide Content
Dinucleotide composition is an efficient tool to predict bias as the compositions typically vary across species and are strongly symmetrical within a single genome [31]. Hence, often, the genome's odds ratio profile (as discussed above) pertains to its genomic characteristics [32]. In the present study, the most abundant dinucleotide was found to be GpA with an odds ratio of 1.524, reflecting high GA content in the EPB41L3 gene, whereas the dinucleotide UpA, with an odds ratio of 0.522, was the lowest and underrepresented (less than 0.78) (Table 3). Relative dinucleotide frequencies among the EPB41L3 transcripts are shown in Figure 2.

Table 3. The occurrence of the dinucleotide odds ratio in the EPB41L3 gene. The odds ratio was calculated by dividing each observed frequency by the expected frequency of dinucleotides. The odds ratio values highlighted with green and red fonts correspond to the maximum and minimum values for the GpA and UpA dinucleotides, respectively.

Relative Synonymous Codon Usage in the EPB41L3 Gene Revealed Preference of GpA-Ending Codons across the Selected Mammalian Species
Each codon's RSCU value was computed to assess how frequently G/C-ending codons are favored (Table 1). The average RSCU values of all the synonymous codons corresponding to 18 amino acids of the EPB41L3 gene were analyzed overall and for the individual mammalian species. The overall RSCU analysis of the EPB41L3 gene suggested that 31 codons were frequently used (RSCU > 1), among which G/C-ending codons (18) predominated over A/U-ending codons (13) (Table 1). Additionally, C-ending codons (10) were preferred over G-ending codons (8) in the EPB41L3 transcripts across the selected mammalian species. The findings also indicate a preference for GC content in the EPB41L3 gene in all five mammalian species, as is usual among eukaryotic genomes. With respect to the 31 most frequently used codons, the present study revealed that none of the UpA-ending codons were preferred (RSCU value below 1), whereas all the GpA-ending codons were preferred (RSCU value above 1) across the selected mammalian species. Moreover, among the most frequently used codons, 14 codons, CUG (leucine), AUC (isoleucine), GUG (valine), ACC (threonine), GCC (alanine), UAC (tyrosine), CAC (histidine), CAG (glutamine), AAC (asparagine), GAC (aspartic acid), GAG (glutamic acid), CGC, AGA (arginine), and GGG (glycine), were common to all the selected mammalian species, providing evidence of a shared codon preference. Closer inspection of the RSCU values (Table 1) showed that the codons CUG, GUG (except for Pongo abelii), CAG (except for Homo sapiens and Pongo abelii), and UCU (except for Homo sapiens, Bos taurus, and Pongo abelii) were overrepresented, while some codons such as AUC, UCC, ACC, and GCC (Bos taurus), CCA (Pongo abelii), UAC (Mus musculus), and AGA (Rattus norvegicus) were overrepresented only in individual species.
The matrix plot (Figure 3a) drawn using the average RSCU values of codons showed a remarkable difference in codon usage in the EPB41L3 gene. Furthermore, the heat map supported the same inference that G/C-ending codons were favored over A/U-ending codons among the synonymous codons of the EPB41L3 gene (Figure 3b). The maximum overall RSCU value was observed for the codon GUG (1.88; valine), followed by CUG (1.86; leucine), across the selected mammalian species. The RSCU values > 1 (more frequently used codons) are highlighted in bold and asterisk-marked, the shared preferred synonymous codons across the selected mammalian species are highlighted in bold, and the overrepresented codons are highlighted in red (Table 1).

Figure 3. (a) Matrix plot of the average RSCU values of codons in the EPB41L3 gene. The overrepresented (RSCU > 1.6), underrepresented (RSCU < 0.6), more frequently (RSCU > 1), and less frequently used codons (RSCU < 1) are shown. (b) Clustering of RSCU values of the EPB41L3 gene across the selected mammals. Heat map comparing the average RSCU value of a codon (rows) corresponding to the gene transcripts across the mammalian species (columns). The map indicates differing codon preferences within the gene itself (higher RSCU values with more frequent codon usage depicted in dark red and lower RSCU values depicted in dark blue).

Neutrality Plot Showed Dominance of Selection Pressure
To assess the relative strength of mutational and selection forces in determining the CUB of the EPB41L3 gene, neutrality analysis was performed [29]. A change at the third codon position usually results in a synonymous codon, that is, the corresponding amino acid is not changed, and thus the selection force does not contribute [15]. A strong positive correlation between the GC12 and GC3 indices implies the influence of mutational force on the codon utilization pattern [33]. Our result showed a high positive correlation (r = 0.926, p < 0.00001; GC12 versus GC3), indicating that mutational forces act across all codon positions. However, a regression slope < 0.5 indicates the influence of selection pressure [24]; in the present study, the regression slope was 0.302 (Figure 4), corresponding to a relative neutrality (mutational force) of 30.21% and a selection force of 69.79%, which demonstrates the dominant role of natural selection in shaping the codon usage of the EPB41L3 gene.

Parity Analysis Indicated Predilection for A/G over U/C at the Third Codon Position Owing to Selection Pressure
According to Chargaff's second parity rule (PR2), within a single DNA strand the amount of A equals that of T and the amount of C equals that of G [34]; the center of the plot (0.5, 0.5) therefore corresponds to no bias between mutation and selection rates [24].
The overall AT bias [A3/(A3 + T3)] was 0.552 and the GC bias [G3/(G3 + C3)] was 0.549 (Figure 5). In the parity plot, the values were not situated at the center. Such an unequal distribution might reflect the involvement of both mutational and selection forces in determining the bias [35]. In the EPB41L3 gene, purines were preferred over pyrimidines, as both PR2 values > 0.5 point towards a predilection for A/G over U/C at the third codon position [36], which supports the role of selection pressure.

Codon Adaptation Index Close to 1 Shows Better Adaptation
Furthermore, the codon utilization preferences of the EPB41L3 gene among the selected mammalian species were examined using the CAI. The CAI determines the degree of the translation selection force acting upon a gene and thus plays a role in the directional measure of the CUB [19]. The EPB41L3 transcripts' computed CAI values using the codon usage database (https://www.kazusa.or.jp/codon/, accessed on 2 October 2020) of the respective mammalian species are provided in Table S2. The mean CAI value for all the 34 CDSs of the EPB41L3 gene was found to be 0.77 ± 0.01 (Table 2). CAI values range between 0 and 1. Sequences having the CAI values approaching closer to 1 are found to be better suited for a certain host than those with the CAI closer to 0 [15]. The correlation analysis between various CUB indices is provided in Table 4a.

Mutational Force Plays a Minor Role in Configuring the CUB of the EPB41L3 Gene
To determine the extent to which mutation pressure shapes the codon utilization trends in the EPB41L3 gene, an ENc-GC3 plot was constructed. Wright (1990) [20] proposed that the points fall exactly on the expected curve if GC3s is the only factor driving SCU patterns. In our ENc-GC3 plot (Figure 6), all the points were found substantially below the expected curve, indicating that mutation is not the prime factor and that other evolutionary factors, including selection forces, contribute to configuring the CUB of the EPB41L3 gene [29,37]. A significant negative correlation (r = −0.7131, p < 0.001) was found between ENc and GC3s (Table 4b).

Relevance of Bias in the Use of Codons and Compositional Attributes
Regression analysis between the mean ENc and compositional attributes was performed to analyze the effects of selection pressure on CUB patterning of the EPB41L3 gene. The ENc is a nondirectional measure of the CUB and depends upon the composition of a gene [38]. The higher the ENc, the lower the CUB; conversely, a gene with an ENc value towards the lower end of the range consists largely of optimal codons and can be associated with elevated translational efficiency [39]. The mean ENc for the EPB41L3 transcripts was 57.66 ± 1.132 (Table 2), implying low CUB (ENc > 35). The ENc values of the selected mammalian species, with an average value of 57.66 (Figure 7), depicted a nearly similar (very low) CUB. The regression plot was drawn between the ENc and different CUB indices (Figure 8). The regression coefficient was positive for A, T, A3, T3, and CAI and negative for C, G, C3, G3, and GC3. Because a lower ENc means a stronger bias, the negative regression coefficients of the ENc with C, G, C3, G3, and GC3 imply that these attributes have a positive influence on the CUB.

Figure 6. ENc-GC3 plot (selection curve) analysis. ENc denotes the effective number of codons and GC3s denotes the GC content at the third synonymous codon position. The red curve represents the expected relationship when codon usage is determined only by the GC3 composition.


P2 Analysis Indicated High Expression Level of the EPB41L3 Gene among the Envisaged Species and Dominance of Translational Selection
The values of SSU, WWU, SSC, and WWC were computed using the RSCU values of their corresponding codons (Table 5). The overall P2 value of the EPB41L3 gene was 0.97 (Table 2), implying high translational efficiency of the gene. In every species, the EPB41L3 gene showed P2 > 0.5 (Table 5), indicating that translational selection has the dominant role over mutational force in shaping codon utilization patterns.
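The P2 calculation can be illustrated with a short sketch. It assumes the commonly used definition P2 = (WWC + SSU) / (WWC + WWU + SSC + SSU), where W is A or U and S is C or G; the function name `p2_index` and the toy input are hypothetical, and the values passed in may be codon counts or RSCU values.

```python
from itertools import product

WEAK = "AU"    # W: bases forming two hydrogen bonds
STRONG = "CG"  # S: bases forming three hydrogen bonds


def p2_index(usage: dict) -> float:
    """P2 = (WWC + SSU) / (WWC + WWU + SSC + SSU).

    `usage` maps codons to counts or RSCU values; T is treated as U.
    """
    norm = {}
    for codon, value in usage.items():
        key = codon.upper().replace("T", "U")
        norm[key] = norm.get(key, 0.0) + value

    def total(first_two: str, third: str) -> float:
        # Sum over every codon whose first two bases come from the given class.
        return sum(norm.get(a + b + third, 0.0)
                   for a, b in product(first_two, repeat=2))

    wwc, wwu = total(WEAK, "C"), total(WEAK, "U")
    ssc, ssu = total(STRONG, "C"), total(STRONG, "U")
    denominator = wwc + wwu + ssc + ssu
    if denominator == 0:
        raise ValueError("no WWY or SSY codons present")
    return (wwc + ssu) / denominator


if __name__ == "__main__":
    toy = {"AAC": 3, "AAT": 1, "GGU": 2, "GGC": 2}  # illustrative values only
    print(p2_index(toy))  # 0.625
```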

Codon Utilization Trends in the EPB41L3 Gene Harmonize with the Phylogeny of the Selected Species and the Homo sapiens EPB41L3 Gene Resembles the Pongo abelii EPB41L3 Gene
Phylogenetic analysis (Figure 9) of the EPB41L3 transcripts from the five selected mammalian species was performed using the neighbor-joining method under the principle of minimum evolution [40], based on K2P distances. The neighbor-joining tree showed that the codon utilization trends in the EPB41L3 transcripts are notably similar among closely related mammalian species. The EPB41L3 gene in Rattus norvegicus was similar to that in Mus musculus; likewise, the gene in Homo sapiens resembled that of Pongo abelii. Moreover, an earlier study inferred that genes with similar functions have a decisive role in shaping similar patterns of codon utilization, while species play a supporting role in determining further differences in the CUB for genes with similar functions [41].
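For reference, the K2P (Kimura two-parameter) distance between two aligned sequences can be sketched as below, using d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitions and transversions. The function name `k2p_distance` is hypothetical, and skipping gapped or ambiguous sites is an assumption made for this sketch rather than a description of the software used here.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    # Keep only aligned sites where both sequences have an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    transitions = sum(1 for a, b in pairs
                      if a != b and (a in PURINES) == (b in PURINES))
    # Transversion: purine <-> pyrimidine.
    transversions = sum(1 for a, b in pairs
                        if a != b and (a in PURINES) != (b in PURINES))
    p, q = transitions / n, transversions / n
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in ten sites
```

A matrix of such pairwise distances is the input that a neighbor-joining procedure would cluster.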

Abundance of tRNA Influences Gene Expression and the Indicated Codon Preference Does Not Correspond to the Most Abundant tRNA Pool
The tRNA pool (Table 6a-e) represents the frequencies of tRNA genes in human cells and in the cells of the four other mammalian species. Our study indicated that in the EPB41L3 gene, the most preferred codons shared across all five mammalian species were those for the amino acids Val, Asn, Asp, His, Gln, and Tyr; the respective preferred codons were GTG, AAC, GAC, CAC, CAG, and TAC. These six codon-anticodon positions (Val, Asn, Asp, His, Gln, and Tyr) correspond to the most abundant tRNA isotypes present in human cells. Rattus norvegicus and Mus musculus showed marked similarities in their preferred codon families, as did Homo sapiens and Pongo abelii. Moreover, when the synonymous optimal codons were compared with their respective tRNA anticodons in each of the five mammalian species individually, the following amino acids were found to have optimal codon-anticodon usage (that is, the most preferred codon had a highly abundant tRNA isotype), Trp and Met being excluded: Leu and Glu in Homo sapiens; Arg, Leu, Phe, Lys, and Cys in Rattus norvegicus; Gly, Arg, Phe, Lys, and Cys in Bos taurus; Pro, Ser, Arg, Leu, Lys, Glu, and Cys in Mus musculus; and Ser, Lys, and Cys in Pongo abelii. Overall, these outcomes supported much less adaptation between the codon usage preference for EPB41L3 and the tRNA pools of the cells of the mammalian species studied.


Table 6 (fragment). Amino acid | Most preferred codon in EPB41L3 | tRNA isotype in human cells (total count)
Gln (Q) | CAG | CTG (13)

Table 6 (fragment). Amino acid | Most preferred codon in EPB41L3 | tRNA isotype in Mus musculus cells (total count)
Ala

Discussion
The present study examined synonymous codon usage in the tumor suppressor protein-coding gene EPB41L3 among the five selected mammalian species. Nucleotide composition is an important factor in determining codon utilization in genes and genomes [42]. The GC content at the third codon position (GC3) is informative about codon usage bias; Shen et al. (2015) [26] suggested that genes with a high GC3 content cotranslate in specific spatial regions and might be involved in genome organization. Genes with a higher GC content present a greater number of targets for methylation [43], and the degree of methylation plays a significant role in altering the gene expression level [44]. In previous studies, hypermethylation of tumor suppressor genes has been linked with brain tumor progression. Furthermore, the EPB41L3 gene has been found hypermethylated (29% methylation) only in tumors, confirming its cancer-specific role [7]. In another study, inactivation of the EPB41L3 gene through methylation was involved in breast cancer and renal clear cell carcinoma [1]. Hypermethylation also plays an important role in the development and progression of NSCLC and worsens the prognosis [6]. For comparison, the AARS (alanyl-tRNA synthetase) gene, which belongs to the class II aminoacyl-tRNA synthetase family and functions in tRNA aminoacylation and gene expression, has an overall GC content (53.8%) higher than its AT content (46.2%) [45,46]. GATA2, which has a key role in hematopoietic development and in KRAS-driven non-small-cell lung cancer, has a GC content (65.2%) higher than its AT content (34.8%) [47,48]. In the present study, the average GC content of the EPB41L3 gene among the selected mammalian species was 50.63 ± 1.24%, close to half of the total nucleotide content, suggesting that a critical balance of methylation is important for the proper functioning of the EPB41L3 gene.
The set of dinucleotide odds ratios is specific to every genome and is often referred to as a genomic signature. Closely related organisms often exhibit more similar odds ratios than distantly related ones [49], and thus the ratios can help discriminate within and between species. Species-specific properties, including DNA modification, replication, and repair mechanisms, are reflected by the dinucleotide odds ratio [50]. In this study, relative dinucleotide abundance analysis (Table 3) revealed that the most abundant dinucleotide was GpA, with an odds ratio of 1.524, reflecting high GA dinucleotide content in the EPB41L3 gene, whereas UpA, with an odds ratio of 0.522, was the least abundant and was underrepresented (odds ratio below 0.78). In previous studies, mutations in breast cancers (31%) and in colorectal cancers (11%) were found at 5′-GpA-3′ sites (or at complementary 5′-TpC-3′ sites). UA (TA) is an obligatory component of various regulatory sequences of both prokaryotic and eukaryotic origin. Examples include the TATA box in prokaryotes, TATATA in yeast, and polyadenylation signals, i.e., AATAAA in higher eukaryotes; therefore, this dinucleotide is used in a restrictive manner to avoid inappropriate binding of regulatory elements [51], and the EPB41L3 gene is no exception.
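The odds ratio discussed above can be sketched as ρ(xy) = f(xy) / (f(x)·f(y)), where f denotes the observed frequency. The function name `dinucleotide_odds_ratios` is hypothetical, and counting overlapping dinucleotides along a single strand is an assumption of this sketch; only the 0.78 underrepresentation threshold is taken from the text.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq: str) -> dict:
    """Observed/expected ratio for every dinucleotide present in `seq`."""
    # Normalize to the RNA alphabet and drop anything that is not A, C, G or U.
    seq = "".join(b for b in seq.upper().replace("T", "U") if b in "ACGU")
    if len(seq) < 2:
        raise ValueError("sequence too short")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))  # overlapping pairs
    n_mono, n_di = len(seq), len(seq) - 1
    return {
        xy: (count / n_di) / ((mono[xy[0]] / n_mono) * (mono[xy[1]] / n_mono))
        for xy, count in di.items()
    }


if __name__ == "__main__":
    ratios = dinucleotide_odds_ratios("GAGAGAUU")  # toy sequence, not EPB41L3
    for pair, rho in sorted(ratios.items()):
        flag = "underrepresented" if rho < 0.78 else ""
        print(pair, round(rho, 3), flag)
```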
A remarkable difference in codon usage in the EPB41L3 gene was observed through the matrix plot (Figure 3a) using the average RSCU values of codons. The G/C-ending codons were predominantly preferred amongst the frequently used codons (RSCU > 1). The most preferred codons (RSCU > 1.6) ended with G/C in all the degenerate codon families across the selected mammalian species, except UCU for serine (Table 1). Among the 31 most frequently used codons, 18 ended with G/C, and two codons, GUG and CUG, which code for valine and leucine and had the highest average RSCU values of 1.88 and 1.86, respectively, confirmed a shared codon preference. Relative dinucleotide abundance analysis supported the preferred usage of GpA-ending codons over UpA-ending codons across the selected mammalian species.
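The RSCU values referred to above can be computed as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. The following is a minimal sketch assuming the standard genetic code; the function name `rscu` is hypothetical, and single-codon families (Met, Trp) and stop codons are excluded by choice.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1) in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """RSCU per codon: observed count / mean count within the synonymous family."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonymous alternatives
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


if __name__ == "__main__":
    toy = rscu("GTGGTGGTGGTC")  # four valine codons, illustrative only
    print(toy["GTG"], toy["GTC"], toy["GTT"])  # 3.0 1.0 0.0
```

Codons with RSCU > 1 are used more often than expected under equal usage, and RSCU > 1.6 corresponds to the overrepresentation cutoff mentioned in the text.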
According to Wright (1990) [20], the ENc measures the effect of the selection force on the codon utilization pattern of a gene. A higher ENc value reflects a lower bias in codon usage. Low codon usage bias, as the name indicates, corresponds to an almost equal use of the synonymous codons for their corresponding amino acids in CDSs [5]. Low CUB is advantageous for organisms that grow in other organisms or in cells with different codon choices [33]. The average ENc value for the EPB41L3 gene was 57.66 ± 1.132, which suggested that the CUB was low. Similar results were reported for genes of the Coronaviridae family, where an ENc close to 50 suggested a low mutational force in shaping the CUB [52]. The %GC3 value ranged between 51.60 and 63.10 with an overall value of 53.95 ± 2.27 (Table S2). A significant negative correlation (r = −0.7131, p < 0.001) was found between the ENc and GC3s (Table 4b). The regression plot of the ENc versus GC3 (Figure 8) had a negative regression coefficient, which implied a positive influence on the CUB [53]. In addition, the selection curve, a plot of the ENc against GC3s (Figure 6), suggests that the mutational force is not the only factor shaping codon usage and indicates that other factors, including natural selection, are associated with shaping codon usage in the EPB41L3 gene [29,37,45].
Different computational parameters were used to illustrate the role of mutational and/or selection pressure in codon utilization of the EPB41L3 gene. In this study, significant negative regression coefficients of the ENc against nucleobase contents (C, G, G3, C3, and GC3) were found (Figure 8), indicating a positive influence on codon usage bias. An earlier study on genes implicated in the CNS showed similar negative regression coefficients of the ENc against G, C, G3, and C3 [54].
Neutrality plot analysis depicts the relation of GC12 versus GC3; it is performed to assess the effect of mutational and/or selection pressure in configuring codon utilization patterns [29]. A mutation may occur without any known external pressure. If a mutation occurs at the third codon position, it usually results in a synonymous substitution, i.e., the corresponding amino acid is not changed and thus there is no contribution of selection, whilst mutations at the first or second position of the codon result in non-synonymous changes leading to changes in the amino acid [15]. In the present study, a notable positive correlation between GC12 and GC3 (r = 0.926, p < 0.00001) was found. Similar results were reported for structural and non-structural genes of the Coronaviridae family, where all the genes (E, M, N, S, ORF1a, and ORF8) had positive correlations between GC12 and GC3; all the structural genes except the S gene (correlation value above 0.6 with p < 0.05) had a greater influence of selection pressure over mutational forces on the CUB [52].
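The quantities behind a neutrality plot can be sketched as follows. The helper names `gc12_gc3` and `slope_and_r` are hypothetical, the toy coordinates are not data from this study, and GC3 is computed here over all codons of a sequence, a simplification of definitions that exclude Met, Trp, and stop codons. The slope of the regression of GC12 on GC3 is commonly read as the relative contribution of mutational pressure.

```python
import math


def gc12_gc3(cds: str) -> tuple:
    """Return (GC12, GC3) as fractions for one coding sequence."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    n = len(codons)

    def gc_at(position: int) -> float:
        return sum(codon[position] in "GC" for codon in codons) / n

    # GC12 is the mean of the GC fractions at codon positions 1 and 2.
    return (gc_at(0) + gc_at(1)) / 2.0, gc_at(2)


def slope_and_r(xs, ys) -> tuple:
    """Least-squares slope of y on x and Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Each sequence contributes one (GC3, GC12) point to the plot.
    print(gc12_gc3("GCGGCAATT"))
    gc3 = [0.50, 0.55, 0.60]    # illustrative x-values only
    gc12 = [0.46, 0.47, 0.49]   # illustrative y-values only
    print(slope_and_r(gc3, gc12))
```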
A study encompassing the yeast URA3 gene with a variable GC content (31%, 43%, and 63%) revealed that in the gene version with high GC%, the mutation rate was elevated, with both single-base substitutions and deletions present [55], which was attributed to DNA polymerase slippage. GC-rich genes also exhibit higher rates of mitotic and meiotic recombination, indicating an important role of the GC content in genome evolution. The average GC3 content is high in the EPB41L3 gene; hence, it is speculated that the gene is more prone to mutation. High GC content is supposed to facilitate more complex gene regulation [56]. CpG dinucleotides are prone to methylation, and the degree of methylation, together with other genomic factors, alters the gene expression level [44] and can ultimately result in new phenotypes [44,57]. The GC- and AT-rich domains in a gene display distinct chromatin conformations and histone modifications. Transcription of coding sequences slows down when the GC content in the transcription bubble is high [58]. The GC content and gene expression are highly correlated [59]; however, this relationship is debated in mammals. To further assess the effect of the GC content on methylation, translational selection was calculated. The average P2 value of the EPB41L3 gene was 0.97, indicating high translational efficiency and suggesting that methylation probably has no effect on gene expression.
According to the "genome hypothesis" [60], codon preference patterns are considered to be well conserved during the course of evolution. The pattern of synonymous codon usage differs between different kinds of organisms, whereas the choice of synonymous codons is similar across the genes of a particular genome. Furthermore, within an organism, codon choice is linked to organism-specific isoaccepting tRNAs [61,62]. We compared the most preferred codon families of the EPB41L3 gene with the tRNA pools of the selected mammalian species. According to the tRNA frequency tables (Table 6), the most preferred codons shared across all five mammalian species were those for the amino acids Val, Asn, Asp, His, Gln, and Tyr; the respective preferred codons were GTG, AAC, GAC, CAC, CAG, and TAC. Cluster analysis (neighbor-joining method) was performed based on the K2P distances of the CDSs of the EPB41L3 gene across the five selected mammalian species (Figure 9); the resulting clustering resembled the CUB pattern of the EPB41L3 gene across the five mammalian species. A study suggested that genes with similar functions have a decisive role in shaping similar patterns of codon utilization, while species play a supporting role in determining further differences in the CUB for genes with similar functions [12].
Moreover, when the synonymous optimal codons were compared with their respective tRNA anticodons in each of the five mammalian species individually, several amino acids showed similar codon preferences and were found to have optimal codon-anticodon usage (that is, the most preferred codon had a highly abundant tRNA isotype), Trp and Met being excluded: Leu for Homo sapiens, Rattus norvegicus, and Mus musculus; Cys and Lys for Rattus norvegicus, Bos taurus, Mus musculus, and Pongo abelii; and Ser for Rattus norvegicus, Mus musculus, and Pongo abelii. The tRNA choice has been found to affect the evolution of different codon choice patterns in early/late genes in viruses [63].
Overall, these outcomes supported much less adaptation between the codon usage preference for EPB41L3 and the tRNA pool corresponding to the envisaged mammalian species cells. However, a previous study conducted in the human genome by Comeron [64] observed that in a highly expressed gene, a combination of the most preferred codon and its respective most abundant tRNA gene was present.

Conclusions
In this study, the compositional properties and biases in the codon utilization trends of the EPB41L3 gene were investigated, as the quantity of protein expressed from coding sequences may vary remarkably owing to the distinct translational properties of different synonymous codons under evolutionary forces. In brief, our results indicated a fairly low CUB within the gene, as reflected by the high ENc value. Out of 31 frequently used codons, 18 ended with G/C, and two overrepresented codons (GUG and CUG) were identified across all the selected mammalian species. Dinucleotide odds ratio values reflected the high abundance of GpA in the EPB41L3 gene, whereas the abundance of the dinucleotide UpA was very low. Codon usage in the EPB41L3 transcripts among the species studied was significantly affected by the GC bias, primarily due to GC3. The P2 value (>0.5) indicated high translational efficiency of the EPB41L3 gene, which implies the presence of optimal codons and suggests that methylation probably has no effect on gene expression. A disproportionate distribution in the parity plot might reflect the involvement of both mutational and selection forces in determining the bias; moreover, PR2 > 0.5 indicated a preference for purines over pyrimidines at the third codon position, which confirms the influence of selection pressure in the EPB41L3 transcripts. Additionally, the neutrality plot, ENc-GC3, and parity analyses inferred the dominance of selection pressure over mutational pressure throughout the codon positions, suggesting that natural selection tends to be associated with regulating the specific constraints on codon usage patterning in the EPB41L3 gene. The negative regression coefficients of the ENc with C, G, C3, G3, and GC3 imply a positive influence on the CUB. The phylogenetic analysis and the heat map of the RSCU values showed that the codon usage pattern of EPB41L3 in Homo sapiens resembles that of Pongo abelii and that the pattern in Rattus norvegicus resembles that of Mus musculus.
Our study revealed that a specific gene with similar functions in closely related species shows a similar trend in codon usage, as observed in an earlier study of the serotonin receptor gene family [39]. Investigation of the preferred codons used by EPB41L3 and the corresponding tRNA pools of the species studied inferred that the EPB41L3 transcripts do not prefer codons from their corresponding suboptimal anticodon tRNA pool. Overall, the codon usage pattern analysis indicated that selection is the key factor influencing the pattern of codon utilization in the EPB41L3 gene.