Structural and Phylogenetic Analysis of CXCR4 Protein Reveals New Insights into Its Role in Emerging and Re-Emerging Diseases in Mammals

C-X-C chemokine receptor type 4 (CXCR4) is a G protein-coupled receptor that plays an essential role in immune system function and disease processes. Our study aims to conduct a comparative structural and phylogenetic analysis of the CXCR4 protein to gain insights into its role in emerging and re-emerging diseases that impact the health of mammals. In this study, we analyzed the evolution of CXCR4 genes across a wide range of mammalian species. The phylogenetic study showed species-specific evolutionary patterns. Our analysis revealed novel insights into the evolutionary history of CXCR4, including genetic changes that may have led to functional differences in the protein. This study also revealed that mammalian CXCR4 proteins share many characteristics with their structurally homologous human counterparts. We also examined the three-dimensional structure of CXCR4 and its interactions with other molecules in the cell. Our findings provide new insights into the genomic landscape of CXCR4 in the context of emerging and re-emerging diseases, which could inform the development of more effective treatments or prevention strategies. Overall, our study sheds light on the vital role of CXCR4 in mammalian health and disease, highlighting its potential as a therapeutic target for various diseases impacting human and animal health. These findings provide insight into human immunological disorders by indicating that chemokines in several mammalian species may have activities identical or similar to those in humans.


Introduction
Chemokines are a family of small proteins that play an essential role in the immune system by guiding the movement of immune cells to sites of infection or injury. Chemokine genes are distributed across mammalian genomes and have been studied extensively [19]. The formation of this multigene protein family was shaped by several evolutionary processes [20], including the birth and death of genes, the insertion and deletion of nucleotides, and nucleotide substitutions. Although several reports have offered varied evolutionary perspectives on these molecular messengers, tracing the structural and functional consequences of their evolution has lagged behind [21]. We focused on CXCR4, a receptor within the chemokine system that includes the neutrophil-activating chemokines (NACs), to gain information on the evolutionary characteristics of these molecules [22]. To analyze Growth-Related Oncogene (GRO) genes over a wide range of mammalian species, we used phylogeny, selection, and substitution rate analyses, as well as conservation scores, nucleotide alterations, and electrostatic surface potentials [23]. CXCR4 is found in many different taxa, including humans, mice, rats, and dogs. It is a G protein-coupled receptor involved in many physiological processes, including immune function, development, and cell migration [24]. The primary structure of CXCR4 is similar across taxa, with the human and mouse CXCR4 proteins sharing 93% amino acid identity. The human protein consists of 352 amino acids and has a predicted molecular weight of approximately 40 kDa [25]. However, there are some amino acid differences among taxa. For example, human CXCR4 has a histidine residue at position 281, whereas mouse CXCR4 has a tyrosine at this position.
In addition, dog CXCR4 has a serine residue at position 336, whereas the human and mouse proteins have an alanine at this position [26]. Despite these differences, the overall structure and function of CXCR4 are conserved across taxa, which makes it a valuable target for drug development and other therapeutic interventions [27].
Our study aims to conduct a comparative structural and phylogenetic analysis of the CXCR4 protein to gain insights into its role in emerging and re-emerging diseases that impact the health of mammals. Through these analyses, we seek new insights into the function of CXCR4 in disease susceptibility or resistance, which could lead to new treatments or prevention strategies, and aim to contribute to the current understanding of CXCR4 and its potential as a therapeutic target for diseases affecting mammalian health.

Retrieval of CXCR4 Sequences
The nucleotide and amino acid sequences of the CXCR4 gene from a given organism can be retrieved from publicly available databases such as NCBI, Ensembl, or UniProt [28]. We used the NCBI Public Archive (https://www.ncbi.nlm.nih.gov/, accessed on 5 March 2023) [29] to obtain the amino acid sequences of the human CXCR4 gene family and the PDB Public Archive (http://www.rcsb.org/public/pdb/, accessed on 5 March 2023) to obtain the protein's crystal structure [30]. We then used BLAST (Basic Local Alignment Search Tool) to retrieve CXCR4 sequences from other mammalian species. These sequences were used for phylogenetic, structural, and functional analyses.
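BLAST hits are commonly downloaded as FASTA before alignment. The minimal sketch below is not the pipeline used in this study: it parses FASTA text with the Python standard library and groups the records by organism. It assumes NCBI-style deflines that end with the organism name in square brackets (for example `[Homo sapiens]`); the function names and the toy records are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: NCBI-style deflines that end in "[Organism name]".
ORGANISM = re.compile(r"\[([^\[\]]+)\]\s*$")


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_by_organism(records):
    """Group (header, sequence) records by the bracketed organism name."""
    groups = defaultdict(list)
    for header, seq in records:
        match = ORGANISM.search(header)
        groups[match.group(1) if match else "unknown"].append((header, seq))
    return dict(groups)


# Toy input: hypothetical identifiers and made-up sequences, not CXCR4 data.
example = """>seq1 toy protein [Homo sapiens]
MKTAAV
LLGG
>seq2 toy protein [Mus musculus]
MKTAAI
"""
groups = group_by_organism(parse_fasta(example))
```

Here `groups` maps each organism name to its records, so one representative sequence per species can be chosen before building the multiple sequence alignment.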

Phylogenetic Analysis of the CXCR4 Gene
We obtained CXCR4 amino acid sequences from various mammalian species, including primates, rodents, carnivores, and ungulates, from publicly available databases such as NCBI and UniProt [29]. These sequences were aligned with Clustal Omega to create a multiple sequence alignment. Phylogenetic trees were constructed in MEGA (Molecular Evolutionary Genetics Analysis) version 10.0.5 [30]: we used the neighbor-joining method to build an initial tree and then evaluated the tree topology by maximum likelihood under the Whelan and Goldman (WAG) substitution model [31]. We performed 1000 bootstrap replications to assess the robustness of the tree topology. A species tree generated with TreeBeST served as the reference against which gene trees and other phylogenetic trees were assessed [32], because it represents the established evolutionary relationships among the species under study.
In contrast, gene trees may be affected by incomplete lineage sorting, duplication, and loss [32]. When analyzing the phylogenetic relationships of a specific gene or protein, it is important to compare the gene tree with the species tree to ensure that the gene has evolved in a manner consistent with the known evolutionary history of the species. The species tree can be used to test hypotheses regarding the gene's evolutionary history and to infer the direction and frequency of evolutionary events such as gene duplication, gene loss, and functional divergence [33].
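Both comparing a gene tree with the species tree and computing bootstrap support reduce to asking which clades a tree contains. The sketch below is an illustration under stated assumptions, not the TreeBeST, MEGA, or RAxML implementation: rooted trees are written as nested Python tuples, two trees are compared by counting clusters present in only one of them (a rooted, cluster-based form of the Robinson-Foulds distance), and bootstrap support is the percentage of replicate trees containing a clade. Rebuilding a tree from each resampled alignment is left to the phylogenetics program and is not shown; all function names and the toy trees are hypothetical.

```python
import random


def clades(tree):
    """Return the clusters (leaf sets) of a rooted tree written as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf (species or gene name)
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def cluster_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds-style distance: clusters present in only one tree."""
    return len(clades(tree_a) ^ clades(tree_b))


def resample_columns(alignment, rng):
    """One bootstrap pseudo-alignment: columns drawn with replacement."""
    length = len(alignment[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_support(clade, replicate_trees):
    """Percentage of replicate trees that contain the given clade."""
    target = frozenset(clade)
    hits = sum(target in clades(tree) for tree in replicate_trees)
    return 100.0 * hits / len(replicate_trees)


# Toy trees (illustrative only, not results of this study).
species_tree = ((("human", "chimpanzee"), "gorilla"), ("mouse", "rat"))
gene_tree = ((("human", "gorilla"), "chimpanzee"), ("mouse", "rat"))
```

For these toy trees, `cluster_distance(species_tree, gene_tree)` is 2, because each tree contains one cluster the other lacks ({human, chimpanzee} versus {human, gorilla}); a distance of 0 would mean the gene tree agrees with the species tree.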
The gene gain and loss tree was constructed using a maximum likelihood approach [34] in RAxML, which implements a fast and efficient algorithm for the maximum likelihood estimation of phylogenetic trees [35]. We first aligned the protein sequences of interest with the MUSCLE algorithm [36] and then used RAxML to estimate the phylogenetic tree from the multiple sequence alignment. The best-fit substitution model was selected using the Akaike Information Criterion (AIC) [37]. The robustness of the inferred tree was assessed with 1000 bootstrap replicates, which estimate the support for each branch [38]. Gene family information from the Ensembl database was used to map gene gain and loss events onto the tree. The HKY (Hasegawa-Kishino-Yano) model was used to estimate the evolutionary distances between gene family members [39]. The HKY model is a widely used substitution model that allows unequal base frequencies and different rates for transitions and transversions [40]. We used it to estimate nucleotide substitution rates and to infer the phylogenetic relationships among gene family members. Mapping the gene gain and loss events onto the tree then provided insights into the evolutionary history of the gene family [41].
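To make the HKY model and AIC-based model selection concrete, the sketch below builds the HKY85 instantaneous rate matrix from a transition/transversion ratio κ and equilibrium base frequencies, scales it to one expected substitution per unit time, and defines AIC = 2k − 2 ln L. It is an illustrative sketch, not the RAxML or MEGA code; the κ value and frequencies in the usage are made up, not estimates from this study.

```python
BASES = "ACGT"
# Transitions: purine <-> purine (A/G) and pyrimidine <-> pyrimidine (C/T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def hky_rate_matrix(kappa, freqs):
    """HKY85 rate matrix q[i][j], scaled to one expected substitution per unit time.

    kappa: transition/transversion rate ratio.
    freqs: dict of equilibrium base frequencies summing to 1.
    """
    q = {}
    for i in BASES:
        row = {}
        for j in BASES:
            if i != j:
                is_transition = frozenset((i, j)) in TRANSITIONS
                row[j] = freqs[j] * (kappa if is_transition else 1.0)
        row[i] = -sum(row.values())  # each row of a rate matrix sums to zero
        q[i] = row
    mean_rate = -sum(freqs[i] * q[i][i] for i in BASES)
    return {i: {j: q[i][j] / mean_rate for j in BASES} for i in BASES}


def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion; the model with the lowest value is preferred."""
    return 2 * n_parameters - 2 * log_likelihood


# Illustrative parameter values only.
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
q = hky_rate_matrix(2.0, freqs)
```

With κ = 1 and equal base frequencies, the matrix reduces to the Jukes-Cantor model, in which every off-diagonal rate equals 1/3 after scaling.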

Prediction and Validation of Human CXCR4 Structure
The crystal structure of human CXCR4 was obtained from the Protein Data Bank, and 3D structures of CXCR4 were then modeled using homology modeling approaches. We used Phyre2 [42], SWISS-MODEL [43], and I-TASSER [44], which runs multiple threading approaches after identifying templates in the PDB, to build 3D models of the target proteins. The energies of the modeled structures were then minimized using the conjugate gradient method and the Amber force field in UCSF Chimera 1.10.1 [45]. Because validating predicted protein structures remains challenging, we applied several structure validation approaches, including protein structure analysis (ProSA), to identify errors in experimental and theoretical protein structures generated by 3D modeling [46]. ProSA evaluates the atomic coordinates of a predicted structure, and the resulting z-score is used to assess model quality [47].
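The minimization in UCSF Chimera is not reproduced here. To illustrate only the conjugate gradient idea, the sketch below minimizes a toy quadratic energy E(x) = ½xᵀAx − bᵀx, whose gradient is Ax − b, so the minimum is the solution of Ax = b. The matrix A (symmetric positive definite) and vector b are made-up values. Real force-field energies are not quadratic and are minimized with nonlinear conjugate gradient variants.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(a, x):
    return [dot(row, x) for row in a]


def conjugate_gradient(a, b, tol=1e-12, max_iter=None):
    """Minimize E(x) = 0.5 x^T A x - b^T x for a symmetric positive-definite A."""
    x = [0.0] * len(b)
    r = [bi - ax for bi, ax in zip(b, matvec(a, x))]  # residual = -gradient of E
    p = list(r)  # the first search direction is steepest descent
    rs_old = dot(r, r)
    for _ in range(max_iter or 10 * len(b)):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)  # exact line search along p for a quadratic
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # The next direction is A-conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy 2-variable "energy"; the exact minimum is x = (1/11, 7/11).
energy_minimum = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

In exact arithmetic, conjugate gradients reach the minimum of an n-dimensional quadratic in at most n steps, which is why the method needs far fewer iterations than steepest descent alone.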

Disorder Analysis of Human CXCR4 Proteins
After 3D homology modeling of the human and mouse CXCR4 proteins, we sought to identify the disordered (unstructured) segments present in both [48]. Such regions are thought to contribute to protein instability, which may in turn contribute to disease. We used CSpritz version 1.2 (http://protein.bio.unipd.it/cspritz/, accessed on 5 March 2023) to predict the locations of disordered segments with high accuracy [49].
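Disorder predictors report per-residue results, which are usually summarized as disordered segments. The sketch below does not reproduce the CSpritz output format. It assumes a list of per-residue disorder scores between 0 and 1 and converts runs at or above a threshold into 1-based, inclusive segments; the threshold of 0.5 and minimum length of 5 are illustrative assumptions, not CSpritz settings.

```python
def disordered_segments(scores, threshold=0.5, min_length=5):
    """Return 1-based inclusive (start, end) runs whose scores are >= threshold.

    threshold and min_length are illustrative defaults, not CSpritz settings.
    """
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a disordered run begins
        elif start is not None:
            if i - start >= min_length:
                segments.append((start + 1, i))  # 1-based start, inclusive end
            start = None
    if start is not None and len(scores) - start >= min_length:
        segments.append((start + 1, len(scores)))  # run that reaches the C-terminus
    return segments
```

For example, six high-scoring residues followed by three low-scoring ones give the segment (1, 6), while a trailing run of two high-scoring residues is discarded as too short.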

Analysis of Human CXCR4 Protein Ligands and Domain
In proteomics, understanding the structural properties of a protein's functional units helps determine its function. We therefore predicted the protein-ligand interactions, ligand-binding residues, ligand-binding areas, and domains of CXCR4 using a range of bioinformatics tools, and categorized the ligands by functional similarity. Ligand-binding residues in the protein structures were predicted using a robust, template-based online 3D modeling tool [50], which is known for high-quality structural models that can be shared across multiple targets using a single remote template. We used additional programs, including the COACH server [51], to cross-check the accuracy of the predictions.
Additionally, I-TASSER was used to determine the number and positions of the binding sites [44]. COACH is a meta-server that predicts ligand-binding sites by combining two comparative methods, TM-SITE [44] and S-SITE, which search the BioLiP protein function database for known ligand-binding sites and use them to predict new ones [52]. We used the ligand-binding prediction tools of the FTSite server [53] to identify ligand-binding surfaces on the unbound form of the protein structure; its accuracy is reported to be comparable to that of experimental findings [54]. Because the method maps binding regions onto the protein surface, it is also referred to as "protein region mapping" [55]. Ligand module clustering was performed with the LPIcom web server (Singh et al., 2016; http://crdd.osdd.net/raghava/lpicom, accessed on 5 March 2023) to examine the relationships between ligands and amino acids in the target protein. LPIcom categorizes amino acid residues by their predicted interactions and binding motifs for a given ligand, which allowed us to predict residue-ligand interactions [56,57].
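When several servers predict binding residues, a simple way to cross-check them is to keep the residues on which the servers agree. The sketch below uses a plain vote count across tools. This is our illustrative assumption, not the scoring COACH uses internally, and the tool names and residue numbers in the usage are hypothetical.

```python
from collections import Counter


def consensus_binding_residues(predictions, min_votes=2):
    """Residue numbers predicted as ligand-binding by at least `min_votes` tools.

    predictions: mapping of tool name -> iterable of residue numbers.
    """
    votes = Counter()
    for residues in predictions.values():
        votes.update(set(residues))  # count each residue once per tool
    return sorted(res for res, n in votes.items() if n >= min_votes)


# Hypothetical predictions from three unnamed tools.
example = {"tool_a": [1, 2, 3], "tool_b": [2, 3, 4], "tool_c": [3, 5]}
```

With `min_votes=2`, only residues 2 and 3 are kept for `example`. Raising the threshold trades sensitivity for agreement between methods.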

Protein Interactions and Co-Expression Analysis
To determine the functional connections and information flow networks linking CXCR4 with its surrounding genes and other proteins in humans and mice, we investigated the protein-protein interactions of CXCR4. Protein-protein interactions affect many biological processes under various physiological conditions. The interaction analysis was carried out using STRING [58], and the network was visualized using the open-source Cytoscape software [59]. STRING takes into account both functional and physical associations between proteins [60].
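STRING summarizes a network by its node and edge counts, average node degree, and average local clustering coefficient. The sketch below shows how these quantities can be computed from an undirected edge list, for example one exported from STRING or Cytoscape. It is not STRING's code. Assumptions: only nodes that appear in at least one edge are counted, and nodes with fewer than two neighbours contribute a clustering coefficient of 0. The toy edge list is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def network_summary(edges):
    """Node count, edge count, average degree and average local clustering coefficient.

    edges: iterable of (protein_a, protein_b) pairs of an undirected network.
    Nodes with fewer than two neighbours contribute a clustering coefficient of 0.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops; duplicate edges collapse in the sets
            neighbours[a].add(b)
            neighbours[b].add(a)
    if not neighbours:
        raise ValueError("the network has no edges")
    n_nodes = len(neighbours)
    n_edges = sum(len(nbrs) for nbrs in neighbours.values()) // 2
    average_degree = 2 * n_edges / n_nodes  # each edge adds 1 to two degrees
    coefficients = []
    for nbrs in neighbours.values():
        k = len(nbrs)
        if k < 2:
            coefficients.append(0.0)
            continue
        # Count edges among this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
        coefficients.append(2 * links / (k * (k - 1)))
    return n_nodes, n_edges, average_degree, sum(coefficients) / n_nodes


# Toy network: a triangle A-B-C with a pendant node D attached to C.
summary = network_summary([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
```

For the toy network, the average degree is 2.0 and the average clustering coefficient is (1 + 1 + 1/3 + 0)/4 = 7/12.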

Consensus Sequence and Secondary Structure Prediction
To compare the biochemical and structural alignments of CXCR4, we used the freely available ENDscript 2 server [61], which gave us a better understanding of the structural alignment of CXCR4. The web tool recognizes structural features from the primary to the quaternary level and accepts Protein Data Bank (PDB) files as input [29]. Its results are split into several outputs, which can be viewed through a selection of structure visualization interfaces. The secondary structure of each target protein was also determined with the web-based PSIPRED server version 3.3 [62], which combines protein sequence and structural analysis in a single platform. PSIPRED first runs PSI-BLAST searches with the input protein sequence and uses the results for its prediction [63].
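A consensus sequence summarizes an alignment column by column. ENDscript's own consensus rules are not reproduced here. The sketch below is a simple majority-rule consensus in which the 50% threshold, the `X` marker for columns without a majority residue, and the toy sequences are illustrative assumptions.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-", min_fraction=0.5):
    """Majority-rule consensus of equal-length aligned sequences.

    Columns whose most common residue occurs in fewer than `min_fraction`
    of the sequences are reported as 'X'; all-gap columns stay as gaps.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            consensus.append(gap)
            continue
        residue, count = counts.most_common(1)[0]
        consensus.append(residue if count / len(column) >= min_fraction else "X")
    return "".join(consensus)
```

For the toy alignment `["MEGIS", "MEGIT", "MDGIS"]`, the consensus is `"MEGIS"`, because each column has a residue shared by at least two of the three sequences.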

Results
Our study aimed to investigate the evolutionary history and potential functional roles of CXCR4. This chemokine receptor protein plays a critical role in immune system function and is implicated in various diseases. To achieve this, we conducted a comparative structural and phylogenetic analysis of CXCR4 sequences from 30 mammalian species, including emerging and re-emerging disease hosts. Our results revealed several key insights into the evolution and potential functional roles of CXCR4. First, we found that CXCR4 is highly conserved across mammalian species, with most amino acid residues being conserved across all sequences. However, we also identified several positive selection sites, suggesting that CXCR4 has undergone adaptive evolution in response to various selective pressures. Second, our phylogenetic analysis revealed that CXCR4 has a complex evolutionary history, with multiple gene duplication and loss events occurring throughout mammalian evolution. Despite these events, we reconstructed the evolutionary relationships among CXCR4 sequences from different species and identified several distinct clades of CXCR4 sequences with potential functional significance. Finally, we conducted a structural analysis of CXCR4 based on its crystal structure, which allowed us to identify key functional domains and amino acid residues that are conserved across different species. We also identified several potential functional sites under positive selection or evolving convergently in different lineages.

Evolutionary Analysis of the CXCR4 Gene
We constructed gene trees for mammalian CXCR4 using maximum likelihood methods. Gene trees trace the evolutionary paths taken by members of a gene family that descended from a single common ancestor. Our analysis revealed that CXCR4 has a complex evolutionary history, with multiple gene duplication and loss events occurring throughout mammalian evolution. Despite these events, we reconstructed the evolutionary relationships among CXCR4 sequences from different species and identified several distinct clades of CXCR4 sequences with potential functional significance. In particular, CXCR4 sequences from primates and rodents form distinct clades, suggesting that these two groups have followed different evolutionary trajectories for CXCR4 (Figure 1). The gene tree analysis also revealed several instances of convergent evolution, in which amino acid changes occurred independently in different lineages. These convergent changes may indicate functional adaptation to similar selective pressures in different environments or hosts. Branch lengths in the tree are proportional to the amount of evolutionary change, and bootstrap values at the nodes indicate the support for each branching pattern. The tree is color-coded by major mammalian clade (primates, rodents, carnivores, and ungulates). Alongside the tree, an alignment of CXCR4 protein sequences from representative species highlights the key domains and motifs within the protein.
The gene gain and loss tree constructed using the Ensembl database and the HKY model revealed the evolutionary history of the CXCR4 gene family. The tree showed multiple gene gain and loss events, with some lineages undergoing expansions and contractions at different times. Specifically, the gene family underwent a series of gene duplications and losses throughout its evolution, with some duplications occurring in specific lineages such as primates and rodents. The tree also showed that the CXCR4 gene family has a conserved domain structure across species, suggesting functional conservation of the protein. Overall, the gene gain and loss tree provides insights into the evolutionary dynamics of the CXCR4 gene family and its functional significance in different organisms (Figure 2). By identifying gene duplication and loss events, we can gain insights into the functional diversification of the CXCR4 gene family and how it has contributed to the emergence of different CXCR4-related diseases in mammals. The phylogenetic study of the CXCR4 protein from several mammalian species revealed that humans cluster with gibbons, chimpanzees, bonobos, and gorillas (Figure 2).
Interestingly, the signal peptide region was the only region in which amino acid diversity was observed. The major secreted region of human CXCR4 was identical to that of mammalian CXCL12, suggesting that the two proteins share the same evolutionary origin (Figure 1). Humans, gibbons, chimpanzees, bonobos, and gorillas were grouped together on the CXCR4 phylogenetic tree (Figure 2). This clustering reflects their close evolutionary relationships, the degree of genetic similarity between their genomes, and their relatively recent common ancestry in geological time (Figure 2).

Structural Analysis of CXCR4 Protein
Homology modeling (HM) based on the crystal structures of human CXCR4 can be used to predict the 3D structures of mammalian CXCR4 proteins, which have conserved amino acid sequences. Human CXCR4 consisted of two α-helices, three anti-parallel β-sheets, and four loops, a structure quite similar to that of CXCR4 proteins in other animals. Human CXCR4 was found to share the CXCR4-binding sequence found in human CXCL12 (Figure 3). Two anti-parallel β-sheets, three extracellular loops, and seven transmembrane α-helices were found in the three-dimensional structures of both mouse and human CXCR4 (Figure 4). The active amino acid residues of mammalian CXCR4 lie in the protein's active domain. The CXCR4 protein sequence was conserved across multiple mammalian species, including humans. However, the semi-conserved amino acid residues in the structures of human and other mammalian CXCR4 proteins are not organized into any functional domains. These findings suggest that CXCR4 proteins in other mammalian species have binding relationships analogous to those in humans. In addition, analysis of the secondary structure of CXCR4 with PSIPRED showed that it contains various coils, helices, and strands, indicating a highly complex secondary structure (Figure 4). To examine and predict the secondary structure of human CXCR4, we also used the 3D structure modeling capabilities of the RaptorX web server. The β-strand is represented by a blue arrow, the coil by a yellow arrow, and the α-helix by a red arrow. Within three subunits, residues are depicted in a conformation that rotates them away from the active site (Figure 4).

Figure 2. The evolutionary history of the CXCR4 gene is shown by the gene gain/loss tree, which highlights the genes that have been added and subtracted over time.
The positional behavior of amino acids, at catalytic sites and elsewhere, is essential for protein interactions, which underpin a wide range of fundamental biological processes.
This means that some residues in a protein sequence are more functionally important, and more conserved, than others. Amino acid substitutions at sites that have been highly conserved throughout evolution are therefore more likely to be deleterious. To support an in-depth analysis of the predicted effects of high-risk SNPs, we estimated the evolutionary conservation of each residue of human CXCR4 using the ConSurf algorithm. A neural network predicts secondary structure with an estimated average accuracy above 72%, and neural networks with a 17-node window predict both secondary structure and solvent accessibility. The CXCR4 protein contains seven transmembrane helices (TM1-TM7) that span the plasma membrane. The N-terminus of the protein is located extracellularly, while the C-terminus is located intracellularly. The intracellular region also contains several domains, including the G protein-binding domain, which activates downstream signaling pathways upon ligand binding.
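ConSurf estimates per-residue evolutionary rates with the Rate4Site algorithm, which uses a phylogenetic tree, and that method is not reproduced here. As a simpler, swapped-in proxy for conservation, the sketch below computes the Shannon entropy of each alignment column. A value of 0 marks a position where all non-gap residues are identical, and higher values mark more variable positions. The function names and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the non-gap residues in one alignment column."""
    residues = [res for res in column if res != gap]
    if not residues:
        return float("nan")  # undefined for an all-gap column
    total = len(residues)
    return 0.0 - sum((n / total) * math.log2(n / total) for n in Counter(residues).values())


def conservation_profile(alignment, gap="-"):
    """Per-position entropy; 0.0 marks a fully conserved position."""
    return [column_entropy(column, gap) for column in zip(*alignment)]


# Toy alignment (not CXCR4 data).
profile = conservation_profile(["MKV", "MKL", "MRI", "MRV"])
```

Here the first position is fully conserved (entropy 0.0), the second has two equally frequent residues (1.0 bit), and the third has three different residues (1.5 bits). Unlike Rate4Site, entropy ignores the tree, so closely related sequences can make a site look more conserved than it is.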
To calculate the average pairwise sequence identity of the CXCR4 protein, we aligned all CXCR4 protein sequences from different species and calculated the percentage identity between each pair of sequences. The average identity was obtained by summing the pairwise identities and dividing by the total number of pairwise comparisons. A BLAST search was carried out against UniProtKB/SwissProt, and the hits were aligned with MaxHom to obtain homology information; the resulting multiple sequence alignment served as the input for the neural network. A pairwise sequence identity matrix is a square matrix giving the percentage of identical residues between each pair of sequences. The matrix is symmetric: element (i, j) is the identity between sequences i and j, and the diagonal elements, which compare each sequence with itself, equal 100%.
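The averaging procedure described above can be written directly as code. Percent identity has several conventions. This sketch assumes one of them, dividing the number of identical residues by the number of aligned columns where neither sequence has a gap, and then averages over all n(n − 1)/2 pairs. The function names and toy alignment are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Identical residues / columns where neither aligned sequence has a gap, as %."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def identity_matrix(alignment):
    """Symmetric matrix of pairwise percent identities; the diagonal is 100."""
    n = len(alignment)
    matrix = [[100.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = percent_identity(alignment[i], alignment[j])
    return matrix


def average_pairwise_identity(alignment):
    """Sum of pairwise identities divided by the number of pairs, n(n-1)/2."""
    pairs = list(combinations(alignment, 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (not CXCR4 data).
aln = ["ACDE", "ACDF", "AC-F"]
```

For `aln`, the pairwise identities are 75%, 66.7%, and 100%, so the average is about 80.6%. A different gap convention, such as dividing by the full alignment length, would give different numbers, so the convention should be reported alongside the matrix.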
Higher reliability index values indicate more reliable predictions. An expected accuracy of 70% does not mean that 70% of the predicted residues in a given protein are correct; this value is an average over many proteins, some of which are difficult to predict. The prediction accuracy for an individual protein may therefore be higher than 80% or lower than 60% (Figure 5). The pairwise sequence identity matrix was used to compare the similarity between different CXCR4 protein sequences: the higher the identity between two sequences, the more similar they are likely to be. However, high sequence identity does not necessarily mean that two sequences have the same function or structure; other factors, such as sequence length, sequence conservation, and the nature of amino acid substitutions, also affect how similar the sequences are. The RePROF method was used to predict the secondary structure of the CXCR4 proteins. RePROF combines multiple sequence alignment with profile-profile comparison to predict protein secondary structure with high accuracy. The RePROF analysis showed that CXCR4 proteins contain seven transmembrane helices, as expected for a G protein-coupled receptor, and located the intracellular and extracellular loops within the protein structure. RePROF also predicted solvent accessibility, the extent to which a residue is exposed to solvent, which is essential for understanding protein folding, stability, and interactions with other molecules (Figure 6). The predicted solvent accessibility of CXCR4 residues correlated well with the experimentally determined values.
Moreover, the analysis identified several residues that are highly exposed to solvent and may be involved in protein-protein interactions or ligand binding. Overall, the RePROF results provide important insights into the secondary structure of CXCR4 proteins, which is essential for understanding their function and their interactions with other proteins, and the accuracy of the method suggests that it could also be useful for predicting the secondary structure of other proteins (Figure 6). The topology of CXCR4 refers to the arrangement of its transmembrane helices and the orientation of the protein in the membrane. CXCR4 has a classic GPCR topology, with the N-terminus located extracellularly and the C-terminus located intracellularly. The seven transmembrane helices are arranged in a helical bundle, with extracellular and intracellular loops connecting them (Figure 6).

Functional Analysis
The functional analysis uncovered ten putative interacting partners of CXCR4 in the protein interaction network obtained from STRING (Figure 7). Together with CXCR4, these partners form an eleven-protein network that includes proteins involved in immunomodulation, organogenesis, hematopoiesis, and processes that are disrupted in cerebellar neuron migration. According to the STRING analysis, the protein-protein interaction (PPI) network consists of 11 nodes connected by 27 edges, compared with 17 expected edges. The average node degree was 4.91, meaning that each node interacted with about 4.91 other nodes on average. The average local clustering coefficient was 0.793, and the PPI enrichment p-value was 0.0113. The PPI network showed that CXCR4 interacts with ten other proteins with very high confidence. This enrichment indicates that these proteins have more interactions among themselves than would be expected for a random set of proteins of the same size and degree distribution drawn from the genome, suggesting that they form a group with at least some shared biological connections (Table 1).
Vaccines 2023, 11, x FOR PEER REVIEW 14 of 21
Figure 6. Structural features of the CXCR4 protein (motifs, interaction sites, DNA, RNA binding, topology, and conservation) were predicted using the Rost Lab's Reprof secondary structure predictor and accessibility predictor. An alignment of human CXCR4 sequences was used to make the prediction.
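The reported average node degree of the STRING network follows directly from its node and edge counts, because each undirected edge adds one to the degree of each of its two end nodes. With N = 11 nodes and E = 27 edges:

$$
\bar{k} = \frac{2E}{N} = \frac{2 \times 27}{11} = \frac{54}{11} \approx 4.91
$$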

Figure 7. CXCR4 protein-protein interaction analysis. The length of an interaction between two proteins is one way to measure how far apart the proteins are. The nodes represent the proteins, and the edges represent their interactions. The size of the nodes corresponds to the degree of connectivity, with larger nodes representing more highly connected proteins. The nodes' colors indicate the proteins' functional categories.

Discussion
The structural and phylogenetic analysis of the CXCR4 protein has provided new insights into its biological functions and its role in emerging and re-emerging diseases in mammals. CXCR4 is a G protein-coupled receptor that plays a crucial role in physiological processes such as the immune response, hematopoiesis, and stem cell migration. Analysis of the CXCR4 crystal structure has improved our understanding of how CXCR4 interacts with its ligands, such as the chemokine CXCL12, and how ligand binding induces conformational changes in the receptor that trigger downstream signaling. The information obtained from these analyses can guide the development of new therapeutics for diseases that involve CXCR4, particularly those caused by viral infections [64]. CXCR4-mediated signaling may also facilitate pathological changes in cancer, such as metastasis and aberrant blood vessel growth. As a co-receptor that mediates HIV entry into cells, CXCR4 also plays a crucial role in HIV infection [65]. In the phylogenetic analysis of CXCR4 from several mammalian species, humans clustered first with gibbons and gorillas and then with mice and rats (Figure 2). Interestingly, amino acid differences were observed only in the signal peptide region. The main secreted portion of the human CXCR4 protein was identical to the corresponding secreted portion of the mammalian CXCL12 protein (Figure 1). Studies on the evolution of the chemokine family indicate that chemokines descended from a single ancestor approximately 650 million years ago, underwent a series of duplication events, and are still evolving [66].
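The clustering pattern in Figure 2, with humans grouping first with gibbons and gorillas and then with mice and rats, is the kind of result a distance-based tree method produces. The tree-building method is not restated in this section, so the sketch that follows uses UPGMA, a simple average-linkage method, purely as an illustration; the pairwise distances are hypothetical values chosen for the example, not data from this study.

```python
from itertools import combinations


def upgma(labels, dist):
    """Return UPGMA merge steps as (cluster_a, cluster_b, distance) tuples.

    labels: taxon names.
    dist:   maps frozenset({a, b}) to the distance between taxa a and b.
    """
    clusters = [(name,) for name in labels]  # each cluster is a tuple of taxa
    d = {frozenset({(a,), (b,)}): dist[frozenset({a, b})]
         for a, b in combinations(labels, 2)}
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        ci, cj = tuple(pair)
        merges.append((ci, cj, d[pair]))
        merged = ci + cj
        clusters = [c for c in clusters if c not in pair]
        # Average linkage: size-weighted mean distance to each remaining cluster.
        for c in clusters:
            d[frozenset({merged, c})] = (
                len(ci) * d[frozenset({ci, c})] + len(cj) * d[frozenset({cj, c})]
            ) / (len(ci) + len(cj))
        d = {k: v for k, v in d.items() if not (k & pair)}  # drop stale entries
        clusters.append(merged)
    return merges


TAXA = ["human", "gorilla", "gibbon", "mouse", "rat"]
# Hypothetical distances for illustration only; NOT the data behind Figure 2.
PAIRS = {
    ("human", "gorilla"): 0.01, ("human", "gibbon"): 0.02, ("gorilla", "gibbon"): 0.02,
    ("mouse", "rat"): 0.03,
    ("human", "mouse"): 0.09, ("human", "rat"): 0.09,
    ("gorilla", "mouse"): 0.09, ("gorilla", "rat"): 0.09,
    ("gibbon", "mouse"): 0.09, ("gibbon", "rat"): 0.09,
}
DIST = {frozenset(pair): value for pair, value in PAIRS.items()}

for left, right, height in upgma(TAXA, DIST):
    print(sorted(left), "+", sorted(right), f"joined at {height:.3f}")
```

UPGMA assumes a roughly constant rate of evolution across lineages; neighbor-joining or maximum-likelihood methods relax that assumption and are more common for published trees.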
Based on studies of the syntenic organization of chemokine genes across several mammalian genomes, it has been proposed that tandem gene duplication is predominantly responsible for the expansion of the chemokine family. Various evolutionary events, such as gene birth and death, nucleotide insertions and deletions, and nucleotide substitutions, have also shaped this multigene protein family [67]. CXCR4 is a chemokine receptor belonging to the superfamily of seven-transmembrane G protein-coupled receptors (GPCRs). Structural analysis of CXCR4 has revealed several important features of the protein, including the arrangement of its transmembrane helices and the location of its ligand-binding sites [68]. The extracellular region of CXCR4 contains several conserved elements, including the N-terminus and the second extracellular loop, which are critical for ligand binding. The intracellular region contains the G protein-binding site, which is responsible for activating downstream signaling pathways upon ligand binding. Structural analysis has also revealed that CXCR4 can form homodimers, as well as heterodimers with other GPCRs, which may contribute to its function in various diseases [26].
GPCRs are signaling receptors activated by ligands, which can be either promiscuous or selective. When activated by an agonist, GPCRs undergo rapid phosphorylation within the C-terminal tail and the third intracellular loop [69]. CXCR4 has a total of 21 possible phosphorylation sites. Chemokines, which are small, low-molecular-weight proteins, mediate various cellular processes, such as development, leukocyte trafficking, angiogenesis, and the immune response, by activating and signaling through receptors such as CCR5 and CXCR4 [70]. The structure of CXCR4 reveals the hallmark core of GPCRs: a bundle of seven α-helices that cross the cell membrane back and forth. These helices are linked by loops exposed on both sides of the membrane, and the loops do much of the work of recognizing the chemokine and transmitting the signal into the cell. A cup-shaped depression on the extracellular face of the molecule forms the binding site [71]. To better understand how chemokines bind, crystal structures were determined with a large cyclic peptide bound in the binding pocket (PDB entry 3oe0) and with a small-molecule inhibitor bound (PDB entry 3odu), providing a starting point for the design of anti-HIV drugs [72]. The human CXCR4 protein was also found to contain the CXCR4-binding region found in the human CXCL12 protein (Figure 3). The three-dimensional structures of both mouse and human CXCR4 revealed seven transmembrane α-helices, two antiparallel β-sheets, and three extracellular loops (Figure 4). The functional domain of CXCR4 in mammals contains amino acid residues related to its biological functions. The CXCR4 protein sequence was conserved across multiple mammalian species, including humans, and the semi-conserved amino acid residues were also unchanged.
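Membrane-spanning α-helices such as the seven in CXCR4 are long runs of hydrophobic residues. A classic way to flag such stretches from sequence alone is a Kyte-Doolittle hydropathy plot: average residue hydropathy over a sliding window (commonly 19 residues) and mark windows whose mean exceeds about 1.6. The sketch that follows implements that heuristic only as an illustration of how hydrophobic membrane-spanning segments can be recognized; it is not the method used in this study, and the test sequence is a synthetic poly-Asp/poly-Leu toy, not a CXCR4 fragment.

```python
# Kyte-Doolittle hydropathy index.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq, window=19):
    """Mean hydropathy of every full-length window; index i is the window starting at residue i."""
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """1-based (start, end) spans covered by at least one window with mean >= threshold."""
    covered = [False] * len(seq)
    for i, score in enumerate(window_means(seq, window)):
        if score >= threshold:
            for j in range(i, i + window):
                covered[j] = True
    segments, start = [], None
    for idx, flag in enumerate(covered + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = idx
        elif not flag and start is not None:
            segments.append((start + 1, idx))
            start = None
    return segments


# Synthetic toy sequence: a 22-residue poly-Leu stretch flanked by poly-Asp.
toy = "D" * 15 + "L" * 22 + "D" * 15
print(hydrophobic_segments(toy))  # [(11, 42)]
```

Because every residue in a passing window is marked, the predicted boundaries extend a few residues into the polar flanks (the Leu run itself spans residues 16 to 37); dedicated topology predictors refine this considerably.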
Phylogenetic analysis of CXCR4 has revealed that the protein is highly conserved across species, with a sequence identity of over 95% between humans and mice. The conserved regions are primarily located in the transmembrane helices and the ligand-binding sites, indicating the importance of these regions in the function of the protein. However, there are also several non-conserved regions in CXCR4 that may play a role in its function in different species or in different diseases (Figure 4).
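Sequence identity and conserved positions are computed from a multiple sequence alignment. The sketch that follows shows one common convention, in which identity is the percentage of aligned, gap-free columns with matching residues; other tools divide by the full alignment length or by the shorter sequence, so reported percentages depend on the convention used. The three aligned sequences are hypothetical ten-residue toys, not CXCR4 sequences from this study.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be rows of the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


def conserved_columns(alignment):
    """1-based columns holding the same residue, with no gap, in every row."""
    return [i + 1 for i, column in enumerate(zip(*alignment))
            if "-" not in column and len(set(column)) == 1]


# Hypothetical aligned toy sequences, NOT CXCR4 data from this study.
ALIGNMENT = {
    "species_1": "MKTAYIAKQR",
    "species_2": "MKTAYLAKQR",
    "species_3": "MKSAY-AKQR",
}
rows = list(ALIGNMENT.values())
print(percent_identity(ALIGNMENT["species_1"], ALIGNMENT["species_2"]))  # 90.0
print(conserved_columns(rows))  # [1, 2, 4, 5, 7, 8, 9, 10]
```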
The amino terminus and the second extracellular loop (EC2) of CXCR2 have been shown to be essential for ligand recognition and receptor activation. In addition, a previous study of the CXCR2 EC2 found that a negatively charged residue, Asp199, is critical for controlling the rate of receptor internalization [73]. Further research has shown that single substitutions of the amino-terminal Asp9 of CXCR2, such as D9K and D9R, convert the receptor into a constitutively active form. These findings imply that charged residues may influence the stability and activation of the second extracellular loop of GPCRs [72].
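The CXCR2 substitutions above use the standard wild-type, position, mutant notation (D9K: Asp at position 9 replaced by Lys). A small sketch can parse such codes and report the change in side-chain charge, which makes explicit why D9K and D9R are charge reversals. The charge table is a simplification (Asp and Glu as -1, Lys and Arg as +1, His and all other residues as 0), and D9N is a hypothetical neutralizing substitution added only for contrast.

```python
import re

# Simplified side-chain charges near neutral pH; all other residues (including His) count as 0.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(code: str):
    """Split a code such as 'D9K' into ('D', 9, 'K')."""
    match = SUBSTITUTION.fullmatch(code)
    if match is None:
        raise ValueError(f"not a single-residue substitution code: {code!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def charge_change(code: str) -> int:
    """Side-chain charge of the mutant residue minus that of the wild-type residue."""
    wild_type, _, mutant = parse_substitution(code)
    return CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0)


# D9K and D9R are the CXCR2 substitutions cited in the text; D9N is a hypothetical contrast.
for code in ["D9K", "D9R", "D9N"]:
    wt, pos, mut = parse_substitution(code)
    print(f"{code}: {wt}{pos} -> {mut}, charge change {charge_change(code):+d}")
```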
Although the ways in which ligands bind to receptors vary, ligand-receptor interactions share certain similarities. Small ligands such as photons, biogenic amines, and nucleosides act within the transmembrane regions, whereas large molecules such as peptides and proteins bind to the extracellular loops [74]. Peptide ligands typically interact directly with the amino terminus and the extracellular loops, although some peptide ligands can interact with both the transmembrane domains and the extracellular loops [75]. Disorder analysis of human CXCR4 has revealed intrinsically disordered regions (IDRs) within the protein. IDRs are protein regions that do not adopt a fixed tertiary structure but instead exist in a dynamic, disordered state [76]. These regions are typically rich in polar and charged residues and play important roles in protein-protein interactions and signaling. Several studies have identified IDRs within CXCR4, particularly in the intracellular loops and the C-terminal tail [77]. These regions interact with intracellular signaling molecules, such as G proteins and β-arrestins, and are essential for CXCR4-mediated signaling.
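The link between disorder and a polar, charged composition can be illustrated with the charge-hydropathy criterion (Uversky plot), which predicts a sequence to be disordered when its mean normalized Kyte-Doolittle hydropathy ⟨H⟩ falls below the boundary ⟨H⟩ = (⟨R⟩ + 1.151) / 2.785, where ⟨R⟩ is the mean absolute net charge per residue. This is a swapped-in technique shown for illustration, not the disorder predictor used in this study; the sketch also simplifies the original rule by averaging hydropathy over the whole sequence rather than over short windows, and both test sequences are synthetic.

```python
# Kyte-Doolittle hydropathy index.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_normalized_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle value after rescaling the scale's range [-4.5, 4.5] to [0, 1]."""
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq: str) -> float:
    """Absolute net charge per residue, counting K/R as +1 and D/E as -1 (His ignored)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def predicted_disordered(seq: str) -> bool:
    """Charge-hydropathy rule: disordered if <H> lies below (<R> + 1.151) / 2.785."""
    h = mean_normalized_hydropathy(seq)
    r = mean_net_charge(seq)
    return h < (r + 1.151) / 2.785


# Synthetic sequences for illustration only.
print(predicted_disordered("EEKKSGSPEEKKSGSP"))  # True: polar and charged
print(predicted_disordered("LIVLIVLIVF"))        # False: hydrophobic
```

This rule classifies whole sequences, so locating individual IDRs such as the CXCR4 C-terminal tail requires per-residue disorder predictors.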
Moreover, mutations or deletions within the IDRs of CXCR4 have been associated with various diseases, including cancer and HIV infection [78]. Such mutations can disrupt protein-protein interactions and alter CXCR4 signaling, leading to pathological consequences. Disorder analysis of CXCR4 therefore provides important insights into the protein's structural and functional properties, particularly its interactions with other proteins and its downstream signaling pathways [79]. Understanding the role of IDRs within CXCR4 may have implications for developing new therapeutic strategies for diseases associated with CXCR4 dysfunction.
The nervous and immune systems both depend on intricate interactions among their many cell types to produce highly specialized responses to diverse environmental stimuli. This capability develops partly through experience-driven patterning and requires that information be stably retained as memory while plasticity, the capacity to respond adaptively to novel stimuli, is maintained. It is possible that the molecular solutions to these requirements originated in the nervous system and were later co-opted by the immune system to enable adaptive immunity [1]. CXCL12 and CXCR4 affect the development of the immune and nervous systems in similar ways. Beyond promoting the formation of differentiated functions, they also control the proliferation and survival of progenitor cells. After the germinal phase, CXCL12 regulates plasticity in both systems: it influences the storage of T cell and plasma cell memory and responsiveness to novel stimuli, and it modulates synaptic transmission in the CNS, especially in regions strongly associated with learning. All of these processes can be viewed as homeostatic roles that ensure correct cell positioning, identity, and function.
Functional analysis of the interacting proteins showed that they are involved in various cellular processes, including the immune response, signal transduction, and cell adhesion. Interestingly, several of the interacting proteins have also been implicated in cancer progression, suggesting that the CXCR4 protein-protein interaction network may be relevant to cancer biology [80]. The network analysis also identified several hub proteins that are highly connected to other proteins in the network; these hubs may play critical roles in regulating the CXCR4 network and represent potential targets for therapeutic intervention [81]. According to the STRING analysis, the PPI network consisted of 11 nodes connected by 27 edges, compared with 17 expected edges, with an average node degree of 4.91, an average local clustering coefficient of 0.793, and a PPI enrichment p-value of 0.0113. CXCR4 was predicted, with very high confidence, to interact with ten other proteins. By examining the structure of CXCR4 and its evolutionary relationships across mammalian species, we have gained a better understanding of its interactions with other proteins and ligands and of its potential as a therapeutic target. Overall, the structural and phylogenetic analysis of CXCR4 has shed new light on its biological functions and its role in various diseases, providing a basis for further research and the development of targeted therapeutics.
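The hub proteins and the average local clustering coefficient reported for the STRING network rest on two standard graph measures: node degree (the number of interaction partners) and the local clustering coefficient, the fraction of a node's neighbor pairs that are themselves connected, 2L / (k(k − 1)) for a node with k neighbors and L links among them. STRING reports the network-wide average of the latter (0.793 for the CXCR4 network). The sketch computes both measures on a small hypothetical network; the partner names P1 to P3 are placeholders, not the actual STRING interactors of CXCR4.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (a, b) edge pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def local_clustering(adjacency, node):
    """Fraction of the node's neighbor pairs that are themselves linked: 2L / (k(k - 1))."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # common convention for nodes with fewer than two neighbors
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adjacency):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


def top_hubs(adjacency, n=1):
    """Highest-degree nodes; ties broken alphabetically."""
    return sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))[:n]


# Hypothetical toy network; P1, P2, P3 are placeholders, not real CXCR4 partners.
EDGES = [("CXCR4", "P1"), ("CXCR4", "P2"), ("CXCR4", "P3"), ("P1", "P2"), ("P2", "P3")]
net = build_adjacency(EDGES)
print("hubs:", top_hubs(net, 2))                                      # ['CXCR4', 'P2']
print("CXCR4 clustering:", round(local_clustering(net, "CXCR4"), 3))  # 0.667
print("average clustering:", round(average_clustering(net), 3))       # 0.833
```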

Conclusions
In conclusion, our study provides new insights into the role of CXCR4 in emerging and re-emerging diseases that affect mammalian health. Our comparative structural and phylogenetic analysis identified novel features and variations in CXCR4 that may have functional implications in different species and disease contexts. Moreover, our analysis of the evolutionary history of CXCR4 revealed genetic changes that may have contributed to functional differences in the protein across species, potentially explaining variation in disease susceptibility and resistance. Overall, our study characterizes the genomic landscape of CXCR4 in the context of emerging and re-emerging diseases and suggests new avenues for research into the pathogenesis and treatment of these conditions. These findings may inform the development of targeted and personalized therapies for human and animal diseases and, ultimately, contribute to improved health outcomes for both. More broadly, combining comparative structural and phylogenetic analyses offers a more comprehensive understanding of CXCR4 and its evolution, which can help guide the development of new treatments for diseases associated with this protein.