Genome-Wide Identification, Evolutionary and Mutational Analysis of the Buffalo Sox Gene Family

Simple Summary The Sox gene family is a set of specific transcriptional factor (TF) proteins, with a very similar sequence compared to the sex-determining region Y (SRY), related high-mobility group (HMG) box genes found in mammals. The Sox gene family is involved in many important developmental processes, including sex determination. In the current study, an in silico analysis was performed to provide insights into the evolutionary importance, mutations and gene duplication events of the buffalo Sox gene family. Based on our analysis, we found that the HMG domain was highly conserved throughout the Sox gene family. Mutational analysis revealed twenty non-synonymous mutations with potential detrimental effects on physiological functions in buffalo. The current study concluded that the buffalo Sox gene family was highly conserved throughout evolution, and the non-synonymous mutations identified could potentially be valuable for the selective breeding of buffalo. Abstract The Sox gene family constitutes transcription factors with a conserved high mobility group box (HMG) that regulate a variety of developmental processes, including sex differentiation, neural, cartilage, and early embryonic development. In this study, we systematically analyzed and characterized the 20 Sox genes from the whole buffalo genome, using comparative genomic and evolutionary analyses. All the buffalo Sox genes were divided into nine sub-groups, and each gene had a specific number of exons and introns, which contributed to different gene structures. Molecular phylogeny revealed more sequence similarity of buffalo Sox genes with those of cattle. Furthermore, evolutionary analysis revealed that the HMG domain remained conserved in the all members of the Sox gene family. Similarly, all the genes are under strong purifying selection pressure; seven segmental duplications occurred from 9.65 to 21.41 million years ago (MYA), and four potential recombination breakpoints were also predicted. Mutational analysis revealed twenty non-synonymous mutations with potential effects on physiological functions, including embryonic development and cell differentiation in the buffalo. The present study provides insights into the genetic architecture of the Sox gene family in buffalo, highlights the significance of mutations, and provides their potential utility for marker-assisted selection for targeted genetic improvement in buffalo.

species, the Hidden Markov model (HMM) profile of the conserved HMG box domain protein (DNA binding protein) sequence from the Pfam database [29] was used to search for their predicted protein-coding variants, using a local BLASTP tool with an E values set at 10 −5 . Using the TBLASTN tool, the genomic pattern of Sox genes from these livestock species was further determined by comparing the query protein sequence with all six possible reading frames of the database.

Multiple Sequence Alignment and Phylogenetic Analysis of Buffalo Sox Genes
All Sox gene sequences were aligned using ClustalW with the default parameters, and the alignments were manually adjusted [30]. A phylogenetic tree of closely related species was constructed using MEGA 11 [31] by implementing neighbor joining (NJ) method and setting a bootstrap of 1000 replicates. The results of multiple sequence alignments and phylogeny were visualized and examined using iTOL v4.2.3 [32].

Gene Structure, Motif Pattern, and Conserved Domain Analysis
The buffalo Sox genes were mapped to their respective chromosomes by determining their chromosomal positions, as indicated by the buffalo whole genome sequence (Table S1). The nucleotide sequences associated with each identified Sox gene were obtained from the NCBI and Ensemble databases. Based on the alignments of their coding sequences and related genomic sequences, the exon-intron patterns of the Sox genes were identified. Multiple expectation maximization for motif elicitation (MEME) [33] and the conserved domain database were used [34] to identify the motif pattern, and to analyze the evolutionary conserved domain of the Sox gene family in buffalo. The minimum and maximum width of motifs were set to 6 and 50, respectively, and the number of motifs was set at 10. Then, using in-house scripts in the TBtools program, the final gene structure was presented and analyzed using the buffalo genome annotation file in the general feature format (GFF) [35].

Prediction of Physicochemical Properties
Physiochemical properties of Sox proteins, including protein size, molecular weight, instability index, aliphatic index, amino acid number, and isoelectric points, were determined using ExPASy [36]. The hydrophobicity and hydrophilicity of protein peptides were also calculated using the grand average of hydropathicity index (GRAVY) in the PROTPRAM tool [37].

Synteny Analysis and Chromosome Localization
Using genomic resources, the chromosomal localization of buffalo Sox genes was obtained. The genome annotation file was used as an input in MCScanX to map the actual locations of Sox genes on chromosomes [48]. Tbtools was used to generate a dual synteny plot to identify collinear Sox genes between buffalo and cattle. In addition, the frequency of gene duplication for the buffalo Sox gene family was evaluated by a pairwise alignment of homologous gene pairs of Sox genes, using MEGA11 v.11.0 and the MUSCLE program [31]. Furthermore, the ratio of nonsynonymous (Ka) to synonymous (Ks) nucleotide substitution rates was calculated to measure selective pressure, and pairwise combinations of genes were also determined using the ka/ks calculation tool (http://services.cbu.uib.no/tools/kaks) (assessed on 17 January 2023).

Selection Analysis of the Sox Gene Family
To measure the site-specific purifying and positive selection on the buffalo Sox gene family, the Selection server (http://selecton.tau.ac.il/ (accessed on 20 January 2023)), which applies a Bayesian inference strategy for evolutionary models, was used. The coding sequences (in FASTA format) of all buffalo Sox genes were analyzed in the SELECTION server [49] to detect evolutionary forces at single amino acid sites. The M8 evolutionary model, which allows positive selection to act on the protein, was implemented in this investigation. A fraction p0 of sites are selected from a beta distribution (specified in the interval 0), and a fraction p1(=1 − p0) of the sites are selected from an additional category ωs (specified as ≥1). Thus, sites selected from the beta distribution are those undergoing purifying selection, whereas sites selected from the ωs category are those undergoing neutral or positive selection [49].

Analysis of Recombination Breakpoints in the Sox Gene Family
To detect recombination breakpoints in multiple sequence aligned buffalo Sox genes, the Genetic Algorithm Recombination Detection (GARD) tool was used [50]. The approach is designed to look for evidence of segment-specific phylogenies from the beginning. The strategy looked for B or fewer breakpoints in the alignment if the maximum number of breakpoints (B, which can also be inferred) is known. It also inferred phylogenies for each prospective non-recombinant segment, and evaluated the goodness of a match-up using a relevant data criterion, such as the small subset Akaike information criterion (AIC) [51], which was derived from a maximum likelihood model fit with each component.

Genome-Wide Identification of Buffalo Sox Genes
Using the BLAST and HMMER tools, a total of 20 Sox genes were discovered in the buffalo genome, and classified into nine subgroups. A total of 80 non-redundant protein sequences encoded by 20 Sox genes were predicted from the whole genomes of four mammalian species, viz. goat, sheep, cattle, and buffalo ( Figure 1). All representative species have the same number of Sox genes in their genome.
number of breakpoints (B, which can also be inferred) is known. It also inferred phylogenies for each prospective non-recombinant segment, and evaluated the goodness of a match-up using a relevant data criterion, such as the small subset Akaike information criterion (AIC) [51], which was derived from a maximum likelihood model fit with each component.

Genome-Wide Identification of Buffalo Sox Genes
Using the BLAST and HMMER tools, a total of 20 Sox genes were discovered in the buffalo genome, and classified into nine subgroups. A total of 80 non-redundant protein sequences encoded by 20 Sox genes were predicted from the whole genomes of four mammalian species, viz. goat, sheep, cattle, and buffalo ( Figure 1). All representative species have the same number of Sox genes in their genome.

Phylogenetic Analysis of the Sox Gene Family
A phylogenetic analysis of Sox genes from four mammalian species was performed, and all identified genes were classified into nine primary subgroups. The SoxA subgroup contains the SRY gene; the SoxB2 subgroup contains Sox14 and Sox21 genes; SoxB1 subgroup contains Sox1, Sox2, and Sox3 genes; the SoxF subgroup contains Sox7, Sox17, and Sox18 genes; the SoxC subgroup contains Sox4, Sox11, and Sox12 genes; the SoxD

Phylogenetic Analysis of the Sox Gene Family
A phylogenetic analysis of Sox genes from four mammalian species was performed, and all identified genes were classified into nine primary subgroups. The SoxA subgroup contains the SRY gene; the SoxB2 subgroup contains Sox14 and Sox21 genes; SoxB1 subgroup contains Sox1, Sox2, and Sox3 genes; the SoxF subgroup contains Sox7, Sox17, and Sox18 genes; the SoxC subgroup contains Sox4, Sox11, and Sox12 genes; the SoxD subgroup contains Sox5, Sox6, and Sox13 genes; and the SoxG subgroup contains Sox15 gene. Most of the Sox genes in the buffalo Sox gene family showed more homology with cattle than with goats or sheep. The resulting dendrogram ( Figure 1) also showed that the buffalo Sox gene family was closely related to other representative mammalian species. The buffalo SRY gene shared sequence homology with cattle, while the goat SRY gene was more similar to sheep. Similarly, the Sox4, Sox9, Sox15, Sox17, Sox18, and Sox30 genes share similar sequence homology pattern in representative species.

Gene Structure Characterization of the Buffalo Sox Gene Family
The structural features of buffalo Sox genes, including gene structure (Figure 2), phylogenetic relationship ( Figure 3A), motif orientation ( Figure 3B), and conserved domain distribution ( Figure 3C), were analyzed. These analyses were performed with respect to the information obtained from the phylogenetic tree. The gene structure analysis revealed that the introns and UTRs structures differed substantially, and each buffalo Sox gene had a specific number of exons and introns ( Figure 2). Furthermore, conserved domain analysis illustrated the presence of the HMG domain in all Sox genes, whereas the SoxN domain was found only in the SoxE group. Furthermore, a total of 10 MEME conserved motifs were identified from buffalo Sox genes, of which three motifs (MEME-1, MEME-4, and MEME-9) had a higher number of amino acids (Table 1). Motif 1 was annotated as HMG domain, and motif 4 was annotated as SoxN, after the Pfam search ( Table 1). The peptide length in the Sox gene family ranged from 229 (SRY) to 870 (Sox6) amino acids ( Table 2).
gene. Most of the Sox genes in the buffalo Sox gene family showed more homology with cattle than with goats or sheep. The resulting dendrogram ( Figure 1) also showed that the buffalo Sox gene family was closely related to other representative mammalian species. The buffalo SRY gene shared sequence homology with cattle, while the goat SRY gene was more similar to sheep. Similarly, the Sox4, Sox9, Sox15, Sox17, Sox18, and Sox30 genes share similar sequence homology pattern in representative species.

Gene Structure Characterization of the Buffalo Sox Gene Family
The structural features of buffalo Sox genes, including gene structure (Figure 2), phylogenetic relationship ( Figure 3A), motif orientation ( Figure 3B), and conserved domain distribution ( Figure 3C), were analyzed. These analyses were performed with respect to the information obtained from the phylogenetic tree. The gene structure analysis revealed that the introns and UTRs structures differed substantially, and each buffalo Sox gene had a specific number of exons and introns ( Figure 2). Furthermore, conserved domain analysis illustrated the presence of the HMG domain in all Sox genes, whereas the SoxN domain was found only in the SoxE group. Furthermore, a total of 10 MEME conserved motifs were identified from buffalo Sox genes, of which three motifs (MEME-1, MEME-4, and MEME-9) had a higher number of amino acids (Table 1). Motif 1 was annotated as HMG domain, and motif 4 was annotated as SoxN, after the Pfam search ( Table 1). The peptide length in the Sox gene family ranged from 229 (SRY) to 870 (Sox6) amino acids ( Table 2).

Physiochemical Attributes of the Buffalo Sox Gene Family
The physicochemical properties of the buffalo Sox gene family were investigated, including its chromosomal distribution, exon number, amino acid length, molecular weight (Da), isoelectric point (pI), instability index (II), aliphatic index (AI), and grand average of hydropathicity index (GRAVY), as shown in Table 2. The molecular weight of the buffalo Sox gene family ranged from 25,180.29 to 96,885.84 (MW), and the isoelectric point (pI) value ranged from 4.95 and 9.90. All the Sox proteins behave as basic, except Sox5, Sox9, Sox10, Sox11, Sox12, Sox13, and Sox17, which exhibit acidic properties ( Table 2). Only four Sox proteins (Sox5, Sox13, Sox21, and Sox30) were thermostable, with aliphatic index values greater than 65. According to the instability index values, which were higher than 40, all members of the Sox gene family were highly unstable. Based on lower GRAVY values, all buffalo Sox proteins were hydrophilic ( Table 2).

Comparative Amino Acid Analysis of the Buffalo Sox Gene Family
All amino acid sequences of the buffalo Sox gene family were compared with cattle to check the amino acid variation. Five buffalo Sox genes, including Sox2, Sox3, Sox10, Sox12, and Sox14, had no amino acid substitutions, while eight buffalo Sox genes, including Sox1, Sox2, Sox3, Sox4, So6, Sox7, Sox9, and Sox18, had indels. Single insertions were observed at the 39, 27, 400, 287, 147, 388, and 298 positions of Sox1, Sox2, Sox4, Sox6, Sox7, Sox9, and Sox18 genes, respectively, while two insertions were observed in the Sox3 gene, one of which one was a 45 amino acid insertion and the other one was a single amino acid (Table 3).  Furthermore, a total of 80 amino acid substitutions were detected in buffalo Sox genes, where one single amino acid variation was found in the Sox1, Sox4, Sox11, and Sox21 genes, two variations were found in the Sox5, Sox9, Sox17, and Sox18 genes, four in Sox, and Sox15, and five and six amino acid substitutions were found in the Sox6 and Sox8 genes, respectively. Likewise, 11 mutations were detected in the Sox13 gene, 17 mutations in Sox30, and a total of 21 mutations were found in buffalo SRY gene ( Figures S1-S20). Furthermore, the functional effect of these mutations was evaluated using different software, and a total of 20 amino acid substitutions were identified in different Sox genes of buffalo, including Sox5 (P362S), Sox6 (L2P), Sox8 (Y443F), Sox13 (G5R, F20V, and F39L), Sox15 (T205P), Sox30 Animals 2023, 13, 2246 9 of 18 (P28A, C53W, T179R, Y605F, and E696D), and SRY (V63L, W64G, R69K, R70Q, D38K, W92G, A112S, and Q172H), which showed non-synonymous mutations that were expected to have an overall detrimental effect on protein structure and functions (Table S1). In contrast, the other amino acid substitutions had an overall synonymous effect, with no impact on protein structure and function.

Synteny Analysis and Chromosome Localization of the Sox Gene Family in Buffalo
In buffalo, all of the Sox gene isoforms were distributed on 11 chromosomes, while in cattle, they are present on 12 chromosomes. Additionally, in buffalo, most of the Sox genes were found near the terminal ends of the chromosomes (Figure 4). To better understand the evolutionary history, the gene duplication events of the buffalo Sox gene family were examined. The homologous gene pairs Sox17/Sox18, Sox8/Sox9, Sox30/Sox13, Sox5/Sox6, Sox21/Sox7, Sox11/Sox4, and Sox3/Sox1 were all found to be duplicated, and were considered to be segmental duplications (Table 4).
Sox21 genes, two variations were found in the Sox5, Sox9, Sox17, and Sox18 genes, four in Sox, and Sox15, and five and six amino acid substitutions were found in the Sox6 and Sox8 genes, respectively. Likewise, 11 mutations were detected in the Sox13 gene, 17 mutations in Sox30, and a total of 21 mutations were found in buffalo SRY gene (Figures S1-S20). Furthermore, the functional effect of these mutations was evaluated using different software, and a total of 20 amino acid substitutions were identified in different Sox genes of buffalo, including Sox5 (P362S), Sox6 (L2P), Sox8 (Y443F), Sox13 (G5R, F20V, and F39L), Sox15 (T205P), Sox30 (P28A, C53W, T179R, Y605F, and E696D), and SRY (V63L, W64G, R69K, R70Q, D38K, W92G, A112S, and Q172H), which showed non-synonymous mutations that were expected to have an overall detrimental effect on protein structure and functions (Table S1). In contrast, the other amino acid substitutions had an overall synonymous effect, with no impact on protein structure and function.

Synteny Analysis and Chromosome Localization of the Sox Gene Family in Buffalo
In buffalo, all of the Sox gene isoforms were distributed on 11 chromosomes, while in cattle, they are present on 12 chromosomes. Additionally, in buffalo, most of the Sox genes were found near the terminal ends of the chromosomes (Figure 4). To better understand the evolutionary history, the gene duplication events of the buffalo Sox gene family were examined. The homologous gene pairs Sox17/Sox18, Sox8/Sox9, Sox30/Sox13, Sox5/Sox6, Sox21/Sox7, Sox11/Sox4, and Sox3/Sox1 were all found to be duplicated, and were considered to be segmental duplications (Table 4). Additionally, for these homologous gene pairs, the ratio of nonsynonymous substitutions per nonsynonymous site (Ka) to synonymous substitutions per synonymous site (Ks) was determined ( Table 4). All seven gene pairs were under purifying selection pressure, with a Ka/Ks ratio <1. Similarly, four gene pairs (Sox17/Sox18, Sox11/Sox4, Sox3/Sox1 and Sox21/Sox7) were duplicated in the Tortonian age, between 7.246 to 11.63 mya (Table  4), after one gene pair (Sox8/Sox9) was duplicated in the Serravallian. In addition, two  Additionally, for these homologous gene pairs, the ratio of nonsynonymous substitutions per nonsynonymous site (Ka) to synonymous substitutions per synonymous site (Ks) was determined (Table 4). All seven gene pairs were under purifying selection pressure, with a Ka/Ks ratio <1. Similarly, four gene pairs (Sox17/Sox18, Sox11/Sox4, Sox3/Sox1 and Sox21/Sox7) were duplicated in the Tortonian age, between 7.246 to 11.63 mya (Table 4), after one gene pair (Sox8/Sox9) was duplicated in the Serravallian. In addition, two gene pairs (Sox5/Sox6 and Sox30/Sox13) were duplicated in Aquitanian stage, around 21 mya ago (Table 4).

Single Amino Acid Site Evolution
To analyze the site-wise selection of the buffalo Sox gene family, we evaluated a buffalo Sox gene data set using SELECTION server to determine the Ka and Ks ratio at each amino acid site. As shown in Figure 5, the Sox gene family is under purifying selection pressure. In our data set, we found out that 40-130 amino acids in the sequence were under strong purifying selection, as indicated by their ka/ks ratio. Note: According to the geological time scale, the Tortonian and Serravallian are two stages/ages of the Neogene period, which occurred approximately 23 to 2.6 million years ago. The Tortonian age is estimated to have spanned from approximately 11.6 to 7.2 million years ago, while the Serravallian age is estimated to have spanned from approximately 13.8 to 11.6 million years ago.

Single Amino Acid Site Evolution
To analyze the site-wise selection of the buffalo Sox gene family, we evaluated a buffalo Sox gene data set using SELECTION server to determine the Ka and Ks ratio at each amino acid site. As shown in Figure 5, the Sox gene family is under purifying selection pressure. In our data set, we found out that 40-130 amino acids in the sequence were under strong purifying selection, as indicated by their ka/ks ratio.

Recombination Analysis Using Genetic Algorithm Recombination Detection (GARD)
GARD found evidence of recombination breakpoints by examining the 10,492 models at a rate of 5.71 models per second. The alignment contained 2165 potential breakpoints, translating into a search space of 395,463,521,942,023 models with up to 5 breakpoints, of which 0.00% was explored by the genetic algorithm ( Figure S21). The modelaveraged support for the breakpoint sites was calculated by quantifying the model-averaged frequency of seeing a breakpoint at a particular site across all points in the alignment. It was based on the standardized Akaike weights of the models, for which the plots were congruent with the best-fitting model. By performing multiple breakpoint analyses using a genetic algorithm, the multiple sequence alignment of 20 Sox gene nucleotide sequences with 3117 sites, of which 2166 were variable, revealed four major recombination breakpoints at sites 880, 1046, 1308, and 2478 ( Figure 6).

Recombination Analysis Using Genetic Algorithm Recombination Detection (GARD)
GARD found evidence of recombination breakpoints by examining the 10,492 models at a rate of 5.71 models per second. The alignment contained 2165 potential breakpoints, translating into a search space of 395,463,521,942,023 models with up to 5 breakpoints, of which 0.00% was explored by the genetic algorithm ( Figure S21). The model-averaged support for the breakpoint sites was calculated by quantifying the model-averaged frequency of seeing a breakpoint at a particular site across all points in the alignment. It was based on the standardized Akaike weights of the models, for which the plots were congruent with the best-fitting model. By performing multiple breakpoint analyses using a genetic algorithm, the multiple sequence alignment of 20 Sox gene nucleotide sequences with 3117 sites, of which 2166 were variable, revealed four major recombination breakpoints at sites 880, 1046, 1308, and 2478 ( Figure 6).   Analyses of fragmented sequences indicated by GARD-discovered recombination breakpoints revealed phylogenetic segregation (Figure 7) across the different recombination fragments trees in the coordinate ranges 1-880 (Figure 7a), 881-1046 (Figure 7b

Discussion
Recent developments in high-throughput genome sequencing technologies, such as next-generation sequencing, have made it easier for researchers to find SNPs in the genome of an organism, and detect synonymous or non-synonymous mutations that may have a potential impact on the individual [52][53][54]. These SNPs have the potential to cause structural or functional abnormalities in the translated gene's final product [55,56]. Highthroughput sequencing technologies can help researchers to better understand the genetics of animals at a molecular level. For mammals, the study of candidate genes requires all available genetic resources to discover and identify those genes that have functional roles, like production ability, disease resistance, adaptation, and productivity, and also to study their interaction with other genes [57]. Comparative genomic analysis has helped researchers explore the genetics behind commercially important functional traits, by uncovering novel genes and their regulatory pathways that could have a significant impact on the progress of the buffalo industry [54].

Discussion
Recent developments in high-throughput genome sequencing technologies, such as next-generation sequencing, have made it easier for researchers to find SNPs in the genome of an organism, and detect synonymous or non-synonymous mutations that may have a potential impact on the individual [52][53][54]. These SNPs have the potential to cause structural or functional abnormalities in the translated gene's final product [55,56]. High-throughput sequencing technologies can help researchers to better understand the genetics of animals at a molecular level. For mammals, the study of candidate genes requires all available genetic resources to discover and identify those genes that have functional roles, like production ability, disease resistance, adaptation, and productivity, and also to study their interaction with other genes [57]. Comparative genomic analysis has helped researchers explore the genetics behind commercially important functional traits, by uncovering novel genes and their regulatory pathways that could have a significant impact on the progress of the buffalo industry [54].
In the present study, characterization of the Sox gene family in the buffalo genome revealed 20 Sox genes. Phylogenetic analysis indicated that most of the Sox genes had more similarity between "buffalo and cattle", and "sheep and goat" and were grouped into 9 subgroups. Previously, the analysis of Sox gene family revealed 19 members of Sox genes in chickens [14], 20 in mice and humans [16], and 27 in common carp [21]. Structural analysis of the buffalo Sox gene family was performed, and the top ten conserved motifs were analyzed, in which motif 1 and motif 5 were annotated as HMG and Sox-N domain after searching out in the Pfam database. Previously, it was shown that HMG (high mobility group) domain is a 70 amino acid long, highly conserved sequence in mammals, found in Sox gene family members [58].
The physicochemical properties of the buffalo Sox proteins exhibited identical properties in the same group, which is consistent with earlier research [59]. In vitro stability of proteins is indicated by an instability index (II) value of less than 40 [60]. In this study, all members of the Sox gene family were found to be highly unstable, based on higher instability index values. The high instability of all the Sox gene family members observed in this study may have significant implications for their biological function and potential applications in selective breeding. Aliphatic index values of higher than 65 for the proteins show a thermostable property [61]. Only Sox5, Sox13, Sox21, and Sox30 proteins exhibited thermostable properties in the present study. The GRAVY values for all Sox protein family members were found to be negative in the current study, indicating their hydrophilic nature. A negative GRAVY value indicates that the protein is hydrophilic in nature and has a higher solubility in aqueous solutions, with its subcellular location likely to be within the nucleus, indicating that it is not a membrane protein [62,63]. This information may have important implications for the function and stability of Sox proteins, which could be explored in future studies.
To acquire new genes or genetic polymorphisms, organisms employ gene duplication systems, such as retroposition, genome or chromosome duplication, and crossing over, which are essential for the evolution of physiological mechanisms [64]. Although the frequency of duplication is difficult to quantify, selective pressure and mutations with functional consequences are essential for the development of duplicated genetic variations [65]. In the present study, seven segmental duplication events were observed in the Sox gene family. The Ka/Ks ratio also showed that all duplicated gene pairs were found under purifying selection pressure (Ka/Ks ratio < 1) [66]. The results from the SELECTION server also justify that there is a strong purifying selection on the Sox gene family in the 40-130 amino acid region in buffalo, indicating that this is a highly conserved region. This strong purifying selection in the Sox gene family may have potential benefits in selective breeding for the improvement of desirable traits. It is important to note that a strong overall purifying selection may mask multiple codon sites under positive selection [67,68]. About 70% (14 out of 20) of the Sox genes were duplicated in the present study, indicating an important expansion of Sox gene family members in buffalo.
An exciting area of research in recent years has been the identification and measurement of evolutionary forces that have led to genetic diversity [69]. These techniques are now widely employed in the statistical toolkit for sequence analysis [70]. In this study, a recombination analysis using GARD revealed that there are four potential breakpoints in the nucleotide sequences of Sox genes. These breakpoints are the reason why the Sox gene family has diverse functions in different species. Strong evolutionary forces impose variations in the sequences, and this can lead to variable functions [50]. Comparing the AIC scores of the best-fitting GARD model (62,144.6), which allows for different topologies between segments, and the model (6515.7), which assumes the same tree for all the partitions inferred by GARD but allows for different branch lengths between partitions, leads to the conclusion that at least one of the breakpoints does indeed reflect a topological incongruence. It would be interesting to explore the reason for the topological incongruence, as it may be related to a biological or evolutionary process that created it, or the characteristics of the species tree [71]. Moreover, it should be examined in a methodological context, such as the extent to which error or uncertainty in phylogenetic inference affects the perception of topological incongruence between gene trees and the species tree [72].
Comparative genomics is a vast, integrated strategy that compares and investigates the biology of individual genomes to identify similarities and variations across genomes of different species [73][74][75]. Several members of the Sox gene family have been conserved between cattle and buffalo, including Sox2, Sox3, Sox10, Sox12, and Sox14. Comparative genomic studies of cattle and buffalo have shown up to 97% homology [76,77]. This high level of genomic homology indicates that cattle and buffalo are close on the evolutionary scale, so that cheap and fast solutions can be realized to transfer the latest genomic technologies between the species. Moreover, homology between these two species is important for understanding the variations in physiological traits and possible convergent evolutions of these species [54,78]. Mutations can be beneficial, by increasing overall fitness, or might be detrimental, by affecting the structural and functional properties of translated products [79,80]. By comparing the buffalo genome with the cattle genome, single amino acid substitutions in Sox genes were identified. Previously, researchers revealed 21 SNPs in the SRY coding sequence of Brahman crossbred cattle (produced by crossing with Belgian Blue and Wagyu bulls), in which they found 71% non-synonymous and 29% synonymous variations [81]. The transversion mutation at position T1707G changes the amino acid from phenylalanine to cystine [81].
In the present study, comparative analyses of the coding sequence of SRY gene between buffalo and other related mammals have shown the conservation of sequences in these species. The SRY gene is an important member of the Sox gene family, which is mainly involved in sex determination and testis differentiation [5,10,23,82]. Due to its susceptibility to recombination with the X chromosome in the meiotic XY pair, this gene is one of the mostconserved Y-specific genes during evolution in a variety of mammals [83]. Considering this property, researchers have used this gene as a molecular maker to find out the patrilineal phylogeny of Bovidae and other similar species [84]. The SRY genes of Chinese native cattle and yaks were cloned and sequenced, and the results showed less divergence of the coding region of SRY gene among these species [85]. In the present study, 20 amino acid changes with non-synonymous effects were observed between cattle and buffalo SRY coding regions, of which nine were present in the HMG region and eleven changes were detected outside HMG region, which is consistent with the previous investigation [86]. Further studies are warranted to reveal the physiological manifestation of these changes in the coding region of the SRY gene. Obviously, these variations in the buffalo SRY coding region have potential benefits in understanding the evolutionary relationship between cattle and buffalo, identifying functional differences, developing molecular markers, and improving animal productivity [54,78]. The SRY gene has shown copy number variations (CNVs) only in Vietnamese and Laotian buffalo, indicating its evolutionary significance. Moreover, the SRY gene is mainly associated with sex differentiation and male sexual development, and can potentially be used as a candidate marker for sperm quality and fertility in bulls, due to CNVs observed within and across species [87].
In the current study, the mutational effects were observed, which indicated eight nonsynonymous mutations in the SRY gene. Previously, mutations in different regions of the SRY gene were reported in patients of partial or complete gonadal dysgenesis [88][89][90][91][92][93][94][95][96][97]. Furthermore, other studies also highlighted the critical role of the Sox8 gene in human reproduction. They found that male infertility and a variety of abnormalities, including 46, XY DSD (disorder in sex development) and 46, XX POI (primary ovarian insufficiency), are caused by mutations in the Sox8 gene [98]. Similarly, it was also reported that patients with mental retardation or growth impairment with significant language disorder, behavioral issues, and mild dysmorphic traits have been shown to have Sox5 mutations. These findings suggest that Sox5 mutations may affect neuronal activity and/or development [99][100][101]. Thus, it can be inferred from previous investigations that the mutations found in the present study might have a significant functional role in sex differentiation and evolutionary processes. The single amino acid substitutions detected in the present study can also be used as a marker to detect genetic diversity between buffalo and other related species [102].

Conclusions
The present study identified 20 Sox genes from the buffalo genome. Molecular phylogenetic analysis revealed high structural and sequence similarity between cattle and buffalo Sox genes. Furthermore, evolutionary analysis revealed that the HMG domain is conserved in all members of Sox gene family. Similarly, all the genes are under strong purifying selection pressure, and seven segmental duplications that occurred from 9.65 to 21.41 MYA, as well as four potential recombination breakpoints, were also predicted. Mutational analysis revealed twenty non-synonymous mutations with potential detrimental effects on buffalo physiology. The current study concluded that the buffalo Sox gene family is highly conserved throughout evolution, and the non-synonymous mutations identified could potentially be valuable for the selective breeding of buffalo. However, further research is needed to determine the potential physiological consequences of the identified mutations in buffalo.

Supplementary Materials:
The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/ani13142246/s1, Figure S1: Comparative amino acid analyses of Sox1 genes in cattle, buffalo, sheep, and goats; Figure S2: Comparative amino acid analyses of Sox2 genes in cattle, buffalo, sheep, and goats; Figure S3: Comparative amino acid analyses of Sox3 genes in cattle, buffalo, sheep, and goats; Figure S4: Comparative amino acid analyses of Sox4 genes in cattle, buffalo, sheep, and goats; Figure S5: Comparative amino acid analyses of Sox5 genes in cattle, buffalo, sheep, and goats; Figure S6: Comparative amino acid analyses of Sox6 genes in cattle, buffalo, sheep, and goats; Figure S7: Comparative amino acid analyses of Sox7 genes in cattle, buffalo, sheep, and goats; Figure S8: Comparative amino acid analyses of Sox8 genes in cattle, buffalo, sheep, and goats; Figure S9: Comparative amino acid analyses of Sox9 genes in cattle, buffalo, sheep, and goats; Figure S10: Comparative amino acid analyses of Sox10 genes in cattle, buffalo, sheep, and goats; Figure S11: Comparative amino acid analyses of Sox11 genes in cattle, buffalo, sheep, and goats; Figure S12: Comparative amino acid analyses of Sox12 genes in cattle, buffalo, sheep, and goats; Figure S13: Comparative amino acid analyses of Sox3 genes in cattle, buffalo, sheep, and goats; Figure S14: Comparative amino acid analyses of Sox3 genes in cattle, buffalo, sheep, and goats; Figure S15: Comparative amino acid analyses of Sox15 genes in cattle, buffalo, sheep, and goats; Figure S16 Comparative amino acid analyses of Sox17 genes in cattle, buffalo, sheep, and goats; Figure S17: Comparative amino acid analyses of Sox18 genes in cattle, buffalo, sheep, and goats; Figure S18: Comparative amino acid analyses of Sox21 genes in cattle, buffalo, sheep, and goats; Figure S19: Comparative amino acid analyses of Sox30 genes in cattle, buffalo, sheep, and goats; Figure S20: Comparative amino acid analyses of SRY genes in cattle, buffalo, sheep, and goats; Figure S21: Analysis of GARD; Table S1: Buffalo and cattle Sox gene family with chromosome number and location; Table S2: Functional effects of mutations in Sox genes in buffalo.