Molecular Evolution of the Bactericidal/Permeability-Increasing Protein (BPIFA1) Regulating the Innate Immune Responses in Mammals

Bactericidal/permeability-increasing protein, a primary factor of the innate immune system of mammals, participates in natural immune protection against invading bacteria. BPIFA1 actively contributes to host defense via multiple mechanisms, such as antibacterial, surfactant, airway surface liquid control, and immunomodulatory activities. However, the evolutionary history of the BPIFA1 gene in mammals and the selective forces acting on it during adaptive evolution are poorly understood. This study compared the human BPIFA1 gene with that of other mammalian species to estimate the selective pressure exerted during adaptive evolution. To assess whether positive selection occurred, we employed several likelihood ratio tests (M1 vs. M2 and M7 vs. M8). The proportion of positively selected sites was significant, with a log-likelihood ratio value of 93.63 for the BPIFA1 protein. The Selecton server was used on the same dataset to reconfirm positive selection at specific sites with the Mechanistic-Empirical Combination model, thus providing additional evidence supporting the findings of positive selection. There was convincing evidence for positive selection signals in the BPIFA1 genes of mammalian species. We performed likelihood tests comparing various models based on dN/dS ratios to recognize specific codons under positive selection pressure. We identified positively selected sites in the LBP-BPI domain of BPIFA1 proteins in the mammalian genome, including a lipid-binding domain with a very high degree of selectivity for DPPC. BPIFA1 activates the upper airway's innate immune system in response to numerous genetic signals in the mammalian genome. These findings highlight evolutionary advancements in immunoregulatory effects that play a significant role in the antibacterial and antiviral defenses of mammalian species.


Introduction
Bactericidal permeability-increasing protein (BPI) is a highly effective antimicrobial protein that binds and neutralizes lipopolysaccharides released from the outer membrane of bacteria [1]. The BPI fold-containing family A member 1 (BPIFA1) gene is known to have effects on the local immune system, and these effects can potentially influence the growth and invasion of microorganisms [2]. One of the potential mechanisms that underlie this link population levels have made it possible to conduct a rapid and robust assessment of the evolutionary rates and adaptations that are driven by natural selection [20]. At both the phylogenetic and population levels, a substantial amount of work has been conducted to build inference methods. Furthermore, the increasing accessibility of protein structural and functional data has allowed researchers to examine the impact of structural and functional constraints on the evolution and adaptation of protein sequences [16]. Because of the limits imposed by their structures and their functions, the rates of evolution and adaptation are different for various proteins and sites within the same protein [19].
The bulk of a cell's functions is intricately intertwined with the regulatory networks of gene expression that enable organisms to tolerate higher infection levels or mitigate the effects of those infections [21]. Most components of cellular physiology are closely tied to these gene expression regulatory systems, which are frequently ancient evolutionary adaptations [22]. These mechanisms have drawn substantial research interest, largely in a limited set of model species for which genetic information is available [23]. However, little is known about how these systems evolved or how they adapted to diverse environmental settings. This study aims to investigate the evolutionary origins of the BPIFA1 gene, to describe its physicochemical features, and to apply comparative genomics to assess the gene in various mammalian species. We conducted in-depth comparative studies of the bactericidal/permeability-increasing protein (BPIFA1) gene, which regulates the innate immune response in mammals, to better understand how it works. Selective pressure may have had a significant effect on its adaptive evolution. In this study, we investigate the history of this gene in various vertebrate species, as well as how genetic diversity and natural selection have influenced the development of this gene family over time.

Sequence Retrieval and Analysis
The amino acid and coding nucleotide sequences of the BPIFA1 gene from 34 mammalian species, with human as the reference species, were collected from GenBank (https://www.ncbi.nlm.nih.gov/genbank, accessed on 20 September 2021) and aligned using the Clustal Omega tool in MEGA 6 software [24]. The maximum-likelihood method in MEGA 6 was used to generate the phylogenetic tree for the BPIFA1 gene, based on the evolutionary relationships among the sequences. Branch lengths were expressed as the average number of substitutions per site, and a bootstrap test with 1000 replicates was used to assess taxonomic clustering. The topology with the superior log-likelihood value was selected [25,26]. The species names and accession numbers used to study the BPIFA1 gene are provided in Supplementary Table S1.

Selection Analysis
Maximum-likelihood approaches were used to compare the dN/dS ratios (ω) at each codon site and thereby identify specific codons in mammalian BPIFA1 gene sequences that were subject to positive selection [27,28]. CODEML, as implemented in PAML [29], and the DATAMONKEY webserver (https://www.datamonkey.org, accessed on 29 September 2021) [30] were used for the analysis, and codons with substitution ratios considerably higher than 1 were designated as being under positive selection. The first step was to determine whether positive selection occurred by using a likelihood ratio test, which assesses the presence of sites with a dN/dS ratio greater than one. We contrasted a discrete (generic) model that allows such sites with a null model that prohibits sites with a value greater than 1 [31]. The test statistic, twice the log-likelihood difference (2∆l), was compared against a χ² distribution with df = 4. Under the null model (M7), ω follows a beta distribution bounded by the values 0 and 1. The alternative model (M8) adds a class of sites whose ω is estimated from the dataset and may be greater than 1 [27]. Analyses using fixed effect likelihood (FEL), single likelihood ancestor counting (SLAC), and random effect likelihood (REL) all found that the BPIFA1 gene was subject to positive selection when global values for synonymous and non-synonymous divergences at each site were compared [32].
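The likelihood ratio test described above can be illustrated with a minimal sketch. It is not the CODEML implementation: the two log-likelihoods are hypothetical values chosen only so that 2∆l equals the 93.63 reported for M7 vs. M8, and the closed-form χ² tail used here is valid only for even degrees of freedom (such as df = 2 or df = 4).

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{k<df/2} (x/2)^k / k!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2*delta_lnL, p-value) for a pair of nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods (not taken from the study's output files).
lnl_m7 = -5100.000  # null model, no sites with omega > 1
lnl_m8 = -5053.185  # alternative model, allows omega > 1
stat, p_value = likelihood_ratio_test(lnl_m7, lnl_m8, df=4)
print(f"2*delta_lnL = {stat:.2f}, p = {p_value:.3g}")
```

A small p-value means the null model without positively selected sites is rejected in favor of the alternative.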
The second stage used maximum-likelihood estimates to locate amino acid positions that were subject to positive selection during evolution. Bayes' theorem was used to calculate the posterior probability that each site belongs to the positively selected class. Positive selection was inferred at amino acid positions with posterior probabilities of 95% to 99% [33], that is, at residues with a high probability that ω was greater than one. The Swiss model and Phyre 2 (http://www.sbg.bio.ic.ec.k/phyre/html, accessed on 28 September 2021), both web-based applications, were used to display the locations of positively selected amino acids on protein structures [34]. We estimated the evolutionary conservation of nucleotide and amino acid positions in the protein using the ConSurf tool (http://consurftest.tau.ac.il, accessed on 28 September 2021), which is based on the phylogenetic relationships between sequences [35]. The codon alignment of BPIFA1 was examined in Selecton version 2.2 (http://secton.tau.ac.il, accessed on 28 September 2021), which estimates the ω ratio at each codon of the aligned sequences. These ratios were estimated using a Bayesian inference approach and evaluated through various likelihood tests to confirm positively selected codons [36]. The Selecton results are displayed in different colors that denote the different types of selection.
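The use of Bayes' theorem above can be sketched as follows. This is a simplified (naive empirical Bayes) illustration, not the Bayes empirical Bayes procedure in PAML, which additionally integrates over uncertainty in the model parameters. The class proportions and per-class site likelihoods are hypothetical.

```python
def site_class_posteriors(priors, site_likelihoods):
    """Posterior probability of each site class for one codon site.

    priors: estimated proportion of sites in each omega class.
    site_likelihoods: likelihood of the site's data under each class.
    """
    joint = [p * lik for p, lik in zip(priors, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical three-class model: omega < 1, omega = 1, omega > 1.
priors = [0.6, 0.3, 0.1]
site_likelihoods = [1e-6, 2e-6, 5e-5]
posteriors = site_class_posteriors(priors, site_likelihoods)

# A site is reported as positively selected only if the posterior of the
# omega > 1 class reaches the chosen cutoff (0.95 in the text).
is_selected = posteriors[2] >= 0.95
print(posteriors, is_selected)
```

In this toy case the site has most of its posterior mass in the ω > 1 class but still falls short of the 95% cutoff, so it would not be reported.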

Recombination Analysis
To find evidence of recombination, we used a likelihood-based model selection procedure that searches multiple sequence alignments for breakpoints and identifies putative recombinant sequences. This technique, the genetic algorithm for recombination detection (GARD), uses a genetic algorithm to search the alignment for recombination breakpoints. The GARD approach is simple to grasp, easily extensible, and highly parallelizable. Extensive simulation experiments have demonstrated that the method outperforms other current tools in almost all cases, particularly in accuracy. To investigate the evidence of recombination, the nucleotide sequences were first assessed to identify haplotypes (Na) and estimate the polymorphic sites (S), average number of nucleotide differences (K), and nucleotide diversity (π) using DnaSP 5.10 software [37]. Detection of breakpoints and assessment of recombinant signals in nucleotide sequences were performed using the online GARD tool of the Datamonkey webserver [38]. Additionally, screening sequences for recombination with GARD helps ensure that methods for identifying positive selection retain acceptable statistical properties.
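The summary statistics named above (haplotypes, S, K, and π) can be illustrated with a minimal sketch. It is not DnaSP's implementation: it assumes an ungapped alignment of equal-length sequences and uses a toy, hypothetical alignment.

```python
from itertools import combinations


def diversity_stats(seqs):
    """Return (haplotypes, S, K, pi) for an ungapped alignment.

    haplotypes: number of distinct sequences
    S: number of polymorphic (segregating) sites
    K: average number of pairwise nucleotide differences
    pi: K divided by the alignment length
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    haplotypes = len(set(seqs))
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    pairs = list(combinations(seqs, 2))
    total_diff = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    k = total_diff / len(pairs)
    return haplotypes, seg_sites, k, k / length


# Toy alignment (hypothetical sequences, not BPIFA1 data).
alignment = ["ATGCCA", "ATGCTA", "ATGTTA", "ATGCCA"]
n_hap, s, k, pi = diversity_stats(alignment)
print(n_hap, s, round(k, 4), round(pi, 4))
```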

Protein-Protein Interactions Analysis
Much interest has been directed toward investigating how protein-protein interactions (PPIs) are preserved from one species to another. Because the experimental identification and confirmation of interactome data face several hurdles, it is useful to transfer a PPI that has been demonstrated in one species to another species [39]. The STRING database is a free bioinformatics resource that describes how proteins interact with one another as part of several pathways. The number of edges connecting each protein node and betweenness values are used to identify intermediate nodes, which represent proteins that play important biological roles and are closely linked to one another. The network was created using STRING, and Cytoscape software (http://www.cytoscape.org, accessed on 29 September 2021) was used to display it [40]. By identifying the protein-protein interactions of BPIFA1 among immune proteins and performing co-expression analysis using STRING version 9.1 (http://www.string-db.org, accessed on 29 September 2021), we were able to further determine how BPIFA1 functions at the molecular level.
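As an illustration of the node-level statistics mentioned above, the following sketch computes the degree (number of connecting edges) and the local clustering coefficient of each node in a small undirected network. The edge list is a hypothetical toy example that borrows a few protein names for readability; it is not the STRING output for BPIFA1.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def local_clustering(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical edges for illustration only.
toy_edges = [("BPIFA1", "BPI"), ("BPIFA1", "LBP"),
             ("BPI", "LBP"), ("BPIFA1", "PLTP")]
adjacency = build_adjacency(toy_edges)
degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
clustering = {node: local_clustering(adjacency, node) for node in adjacency}
average_clustering = sum(clustering.values()) / len(clustering)
print(degree, round(average_clustering, 4))
```

Nodes with many edges act as hubs, and an average local clustering coefficient close to 1 indicates that the neighbors of most nodes are also connected to each other.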

Structural Analysis of BPIFA1 Protein
In this analysis, we built a three-dimensional model of the human BPIFA1 protein by homology modeling with online tools, namely the Swiss model (http://swissmodel.expasy.org, accessed on 29 September 2021) [41], I-TASSER [42], and Phyre2 (http://www.sbg.bio1.ic.ac.uk/phyre2/html, accessed on 29 September 2021) [43]. The conjugate gradient method and the Amber force field in UCSF Chimera 1.10.1 software were used to energy-minimize the assembled target protein. In addition, the ProSA webserver was used to evaluate the stereochemical quality of the predicted structure [22].

Results
The BPIFA1 protein sequences encoded in the mammalian genome were studied to determine the role of adaptive selection and evolution. The BPIFA1 protein is a key mediator of innate signaling against microbial infections by bacteria and fungi. Once the sequences had been combined in a multiple sequence alignment (MSA), they were used to create Bayesian phylogenetic trees and for further analyses. Initiating intracellular signaling cascades requires the activation of a set of genes identified in the respective mammalian species and a functional LBP-BPI domain. This lipid-binding domain has a very high degree of selectivity for the surfactant phospholipid dipalmitoylphosphatidylcholine (DPPC). The upper airway's innate immune system is activated in response to numerous genetic signals, such as increased non-synonymous substitution rates, significant homologous haplotypes, and an absence of genetic variation in BPIFA1 proteins, which indicates that these proteins have been favored by positive selection.

Molecular Evolution of BPIFA1 Gene
In this work, we searched for signs of adaptation in the BPIFA1 gene, ranging from weak to strong selection signals, during adaptive evolution in the mammalian genome. We determined the typical percentage of codons in the BPIFA1 gene undergoing adaptive evolution. Following the same procedure for each coding sequence, we calculated the average proportion of positively selected codons across all branches. Using BUSTED with synonymous rate variation on carefully chosen test branches of the BPIFA1 phylogeny, we tested for gene-wide episodic diversifying selection with a likelihood ratio test (LRT). We concluded that divergent selection occurred along the three examined lines of descent. Two test branches exhibited evidence of diversifying selection, suggesting that sites on these branches had been subject to this type of selection (Figure 1).
The average dN/dS ratios for BPIFA1 across all sites and lineages were greater than one. We therefore examined this protein for signatures of positive selection. The protein was found to have a conserved structure of amino acids, making it possible to be purified, and it had an omega (ω) value greater than 1. A log-likelihood test was performed on this protein, all of its sites were analyzed, and the substitution rate was calculated. To assess whether positive selection occurred, we used three pairs of likelihood models: M0 vs. M3, M1 vs. M2, and M7 vs. M8. The parameter estimates under M1 and M2 were compared, and the M2 value for this protein was found to be positive. The likelihood ratio tests were significant for the three model comparisons, with values of 422.86, 64.5, and 93.63, respectively (Table 1). To provide additional evidence for positive selection, we applied the Mechanistic-Empirical Combination model to specific sites using the Selecton server. This analysis showed that several sites had been subjected to selective pressure at various points during evolution (Figure 1). It also allowed us to estimate the degree to which this gene has been evolutionarily conserved. We found that the vast majority of the positively selected sites had been conserved throughout the mammalian clades. This was because the conserved amino acids accounted for most of the signals used for positive selection in the neural network's algorithm (Table 2).

Figure 1. Results of adaptive selection on 20 primate BPIFA1 sequences using the MEC model. The human protein was used as a reference. Positive selection is indicated by yellow and magenta, whereas purifying selection is represented by blue and green.
The codon model selection method evaluated 9113 different models. The best model (log(L) = −18,910, mBIC = 39,340.92) contained three rates. Compared to a single-rate model, in which all non-synonymous substitutions occur at the same rate, this model improved log(L) by 218.66 and mBIC by 398.33 points, as shown in Table 1. Each model in the credible set had an evidence ratio of at least 0.01 compared to the best model, meaning that it was within 9.21 mBIC units of the best model. Model averaging over this collection of models was used to estimate the rates (Figure 2). The evolutionary selection pattern on amino acid positions in the BPIFA1 protein was also assessed using codon model selection analysis, which showed that the substitution of amino acid sites occurred during adaptive evolution in the proteins. We found that the basic amino acid positions of the proteins exhibited adaptive evolution due to varying substitution rates. Based on the distribution of amino acid sites in BPIFA1, the maximum substitution rate was approximately 1.19, while the lowest was 0.14 (Figure 2).
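The correspondence between an evidence ratio of 0.01 and a difference of 9.21 mBIC units can be checked with a short sketch. It assumes the usual information-criterion convention that the evidence ratio of a model relative to the best one is exp(−Δ/2), where Δ is the difference in mBIC; the function names are illustrative.

```python
import math


def evidence_ratio(delta_mbic: float) -> float:
    """Evidence ratio of a model that is delta_mbic units worse than the best."""
    return math.exp(-delta_mbic / 2.0)


def credible_set_cutoff(min_ratio: float) -> float:
    """Largest mBIC difference that still gives at least min_ratio."""
    return -2.0 * math.log(min_ratio)


print(round(evidence_ratio(9.21), 4))       # close to 0.01
print(round(credible_set_cutoff(0.01), 2))  # close to 9.21
```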
Application of a genetic algorithm (GA) model to identify structural and evolutionary rate clusters from BPIFA1 protein alignments. Maximum-likelihood estimation was used to identify each cluster and GA was used to determine its rate.
Physiologically significant regions of a protein can be identified by contrasting the frequency of synonymous (Ks) and non-synonymous (Ka) substitutions in the protein. This provides the basis for inferring purifying selection and localized positive Darwinian selection. We used Selecton v. 2.2 (accessible at http://selecton-bioinfo-tau.ac.il, accessed on 29 September 2021), a web server that automatically calculates the ratio of Ka to Ks (ω) at each site in the protein. This ratio is displayed graphically at each site, with different colors representing the different types of selection (positive selection, purifying selection, and no selection). Selecton offers a collection of evolutionary models that can be used to test statistically whether a given protein has been subjected to positive selection, and it operates via a graphical user interface. The recently established mechanistic-empirical model takes the physical properties of the amino acids into account (Table 3).
The distribution of synonymous (α) and non-synonymous (β) substitution rates across sites estimated by the MEME model is shown in this summary table, where the percentage of branches with β > α is much higher than 0. The p-value was calculated using a mixture of χ² distributions. The Simes technique was used to control the false discovery rate under the strict neutral null and to generate the q-values (likely to be conservative).
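How q-values can be derived from per-site p-values is sketched below. The sketch assumes that the Simes-based control of the false discovery rate mentioned above corresponds to the Benjamini-Hochberg step-up procedure; the p-values are hypothetical and the exact computation performed by the web server may differ.

```python
def benjamini_hochberg(pvalues):
    """Return q-values (step-up adjusted p-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


# Hypothetical per-site p-values.
site_pvalues = [0.001, 0.02, 0.03, 0.5]
site_qvalues = benjamini_hochberg(site_pvalues)
print(site_qvalues)
```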

Adaptive Selection of BPIFA1 Gene
To determine the degree to which different mammalian species have adapted to their environments, we used multiple alignments of the coding sequences of the BPIFA1 gene from each of the 34 species. Selection tests can be employed individually or in combination, and the most common variety is the branch test. Lineage-specific selection was examined to identify distinct vertebrate lineages that were subject to selection pressure during evolution. Lineage-specific selection probabilities were calculated for each phylogenetic group using an adaptive branch-site random effects likelihood (aBS-REL) model. In addition, the aBS-REL technique was used to determine which lineages had been subjected to adaptive selection at different times in evolutionary history. When applied to mammalian lineages, the aBS-REL model confirmed that the BUSTED-predicted genes were under positive selection. The two approaches were therefore congruent in suggesting that selective pressure was acting on BPIFA1 genes in mammalian lineages (Table 4). In the phylogeny of the BPIFA1 gene, there was evidence of episodic diversifying selection in eight branches (Figure 3). In total, 63 branches were tested for diversifying selection. Significance was assessed using the likelihood ratio test with a p-value threshold of 0.05, after accounting for multiple testing.
This table reports a statistical summary of the models' fit to the data. Baseline MG94xREV refers to the MG94xREV baseline model that infers a single ω rate category per branch. The full adaptive model refers to the adaptive aBS-REL model, which infers an optimized number of ω rate categories per branch.
We examined the omega values using the SLAC, FUBAR, MEME, and FEL methods to locate indications of positive selection (Table 5). According to our findings, the BPIFA1 gene in mammalian clades has been subject to positive selection. Using the Bayesian method, which determines the posterior probability for each codon, we could detect which regions were being subjected to selective pressure. Sites with higher posterior probabilities are more likely to have undergone diversifying selection, which leads to higher rates of non-synonymous relative to synonymous substitution than at sites with lower probabilities (Table 2). Using BEB analysis, we found that several sites dispersed throughout the LBP-BPI domain of the bactericidal protein had been subject to positive selection with a high posterior probability of 95%. The PAML findings were re-examined by analyzing the same dataset on the Selecton server, which identified adaptive selection at specific sites within the protein and thus validated the existence of positive selection. The MEC model was applied to determine the substitution rates. The findings demonstrated that adaptive selection occurred at several locations in BPIFA1 (Table 5).
Figure 3. The aBS-REL models used to undertake selective analyses of vertebrate activating transcription factor genes. The length of each branch is divided into segments based on the percentage of sites that correspond to each class, and the color of each branch segment shows the relative relevance of the corresponding parameters. Sites along a branch can therefore be categorized according to the inferred β distribution. Thicker branches are categorized as having undergone either diversifying positive selection or diversifying negative selection, depending on whether the p value is less than 0.05 (adjusted for multiple testing), that is, on whether the null hypothesis is rejected.

Recombination Analysis
For the BPIFA1 gene, a recombination analysis was performed to find potential evolutionary links between genes. The analysis revealed three recombination events. Each of the recombinant sequences, including the major and minor parents, came from the BPIFA1 gene. We identified recombination breakpoints using GARD analysis. GARD inspected 5120 models at a rate of 30.30 models per second. The alignment's 759 possible breakpoints generated a search space of 72,874,879 models with up to three breakpoints, of which the genetic algorithm examined only 0.01%. With an evidence ratio of 100 or above, the multiple-tree model was preferred to the single-tree model, indicating that at least one of the breakpoints reflected a genuine topological incongruence. This was validated by comparing the AICc score of the best-fitting GARD model, which allowed for variable topologies across segments, with that of the model that assumed the same tree for all of the partitions determined by GARD but allowed varied branch lengths between partitions. Specifically, the AICc score of the best-fitting GARD model was 37,996.2, whereas the AICc score of the latter model was 37,996.2 (Figures 4 and 5).
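The size of the search space quoted above follows from counting the ways to place one, two, or three breakpoints among the 759 candidate positions. The sketch below reproduces that count and the share of models that were actually inspected; the function name is illustrative.

```python
import math


def breakpoint_models(candidate_sites: int, max_breakpoints: int) -> int:
    """Number of ways to choose between 1 and max_breakpoints breakpoints."""
    return sum(math.comb(candidate_sites, k)
               for k in range(1, max_breakpoints + 1))


search_space = breakpoint_models(759, 3)
examined_fraction = 5120 / search_space
runtime_seconds = 5120 / 30.30

print(search_space)                 # 72874879
print(f"{examined_fraction:.4%}")   # share of the search space inspected
print(f"{runtime_seconds:.0f} s")   # approximate search time at 30.30 models/s
```

The inspected share is about 0.007% of the search space, which rounds to the 0.01% stated in the text.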

Protein-Protein Interactions and Ligand Binding Analysis
We used the STRING database to search for proteins co-expressed with BPIFA1, identifying several pairs of protein-protein interactions. The network of proteins co-expressed with BPIFA1 contained 13 nodes and 35 edges. The edges of the PPI diagram are the lines that link the individual nodes (Figure 6). The average local clustering coefficient was 0.978, and the PPI enrichment p-value was 5.25 × 10^−12. The PPI network represented the BPIFA1 gene's interactions with other co-expressed immune genes. COX7B2, BPIFB6, BPIFB4, BPIFB2, BPIFB3, PLTP, CETP, BPI, LBP, and ODF2L were the 10 genes involved in the PPI network of BPIFA1 (Figure 6).
The BPIFB6, BPIFB4, BPIFB2, and BPIFB3 genes were the most significant because they are involved in biological signaling pathways, which play an essential role in innate immunity against bacterial infection. In addition, these genes are upregulated by BPIFA1, which is another reason they were considered so significant (Table 6). The molecular pathways essential in eradicating invading germs through membrane-disrupting activity comprised all related proteins with varied roles. Membrane-disrupting activity was necessary for the elimination of invading germs. Two crucial proteins in the mediation of signals in response to lipopolysaccharides are LPS-binding protein (LBP) and bactericidal permeability-increasing protein (BPI). They displayed a strong affinity for Lipid A, a component of LPS, and were strikingly similar to one another. Despite having similar structures, LBP and BPI perform distinctly different biological functions. For instance, LBP frequently binds to LPS and greatly facilitates the presentation of LPS to CD14+ cells, such as macrophages and monocytes, whereas BPI inhibits and lowers the bioactivity of LPS. These two proteins are both present in bacteria.

Ligands are critical components in the process of controlling the expression and activity of proteins. Intermolecular binding forces, such as ionic bonds, hydrogen bonds, hydrophobic interactions, and van der Waals forces, contribute to the ligand-binding process. Interactions between ligands and proteins alter the protein's three-dimensional structure. Because of these changes in the conformational state of the protein, some of the protein's functions may be either inhibited or activated. Therefore, we performed a protein-ligand binding interaction study using amino acid physicochemical characteristics to determine which residues interact with the ligand and which do not. To accomplish this, we used a website (http://crdd.osdd.net/raghava/lpicom, accessed on 18 October 2021) that calculates the fraction of residues that interact with a given ligand. Key residues, such as cysteine, glycine, alanine, lysine, aspartic acid, histidine, leucine, valine, arginine, tryptophan, serine, threonine, and tyrosine, were shown to interact with seven ligands (1BP1, BPH, XE, NEH, CLA, CU, and MG) and PC1.
Compared to the interaction with PC1, charged amino acids, especially essential amino acids, had a greater advantage when interacting with 1BP1, BPH, XE, NEH, CLA, CU, and MG (Figure 7). The small and polar amino acids that correlated with them were characterized in each of the three ligands.
We used two distinct approaches to predict complementary binding sites: the first compares binding-specific substructures (TM-SITE), while the second aligns sequence profiles (S-SITE). These techniques assessed the BPIFA1 protein against 500 non-redundant proteins that bind 814 organic, synthetic, and metal ion compounds. Starting from low-resolution predicted protein structures, the approaches successfully identified the binding residues of BPIFA1, achieving an average Matthews correlation coefficient (MCC) that was much higher. Additionally, the techniques identified ligands that bind to these residues (Table 7).
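For reference, the Matthews correlation coefficient used above to score binding-residue predictions can be computed from a confusion matrix as in the following sketch. The counts are hypothetical and are not taken from the TM-SITE or S-SITE output.

```python
import math


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary prediction.

    Returns 0.0 when any marginal total is zero and the coefficient is
    undefined, which is a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: residues predicted as binding vs. known binding residues.
mcc = matthews_cc(tp=12, tn=230, fp=6, fn=8)
print(round(mcc, 3))
```

The coefficient ranges from −1 to 1, where 1 is a perfect prediction and 0 is no better than chance, which makes it suitable when binding residues are far rarer than non-binding ones.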
Genes 2023, 13, x FOR PEER REVIEW 16 of 24
Figure 6. The protein-protein interaction (PPI) network for the BPIFA1 gene constructed using the online STRING database. The genes that are responsible for upregulation, downregulation, and neutral regulation are represented by red, blue, and green circles, respectively. The intensity of the interactions that take place between these genes is represented by the thickness of the lines that connect them. Mean values of a negative correlation coefficient are represented by solid edges, whereas mean values of a positive correlation coefficient are represented by dotted lines. Changes in the folding or stitching of proteins that take place after transcription are represented as nodes in the protein-protein interaction (PPI) network. Each node in the network represents the whole set of proteins that can be produced by a single copy of the protein-coding gene.
Table 6. Functional enrichment of biological processes in the human BPIFA1 protein network.
high-density lipoprotein particle remodeling 2 of 17 0.0052

Discussion
Heterogeneous environments offer settings in which populations undergoing divergent selection can differentiate into locally adapted subpopulations [44]. The interplay between selection and gene flow among populations, known as the migration-selection balance, determines the possibility of local adaptation and continued divergence. When selection is weaker than gene flow, gene flow tends to homogenize local genetic variability within populations. Conversely, if the selective pressure is greater than the homogenizing force of gene flow, genetic variants may accumulate and be retained at specific loci subject to strong divergent selection [45]. Alternatively, gene flow is limited by selection against immigrants with a poor genetic fit, which also paves the way for local adaptation [45,46]. Understanding population differences in the frequency of gene flow therefore requires considering the connection between gene flow and selection [46]. Under such circumstances, selection determines whether the population continues to evolve or diverges as a distinct group. The empirical Bayes approach calculated the LRT at each branch site and located the sites where diversifying selection may occur. Building on the empirical Bayes approach, the Fast, Unconstrained Bayesian Approximation (FUBAR) was applied to locate diversifying selection in the BPIFA1 gene. FUBAR allowed for site-to-site and branch-to-branch variation among codons and, together with MEME, was used to explore adaptive evolution at the gene level [25,32,47]. The episodic diversifying coding sites were found by SLAC with a p value of less than 0.01 (Table 1).
This model was used to estimate the synonymous and non-synonymous substitution rates, and coding sites whose non-synonymous substitution rate exceeded the synonymous rate were considered candidates for diversifying selection. In MEME, maximum-likelihood estimates for BPIFA1 codons 130, 167, 168, 190, 243, 265, and 289 were obtained (Table 2). Because their signals were non-significant, these codons were not identified as positively selected sites, which reflects the episodic character of natural selection. Selection that acted sporadically during brief intervals of adaptive evolution was masked by the predominance of purifying selection or neutral evolution. Consequently, these signs of adaptive evolution could not be detected through sensitivity testing for positive selection [48].
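For readers less familiar with the likelihood-ratio tests (LRTs) applied to nested codon models such as M1 vs. M2 and M7 vs. M8, the calculation can be sketched as follows. The log-likelihood values in the example are hypothetical placeholders, not results of this study, and the sketch assumes two degrees of freedom, as is conventional for these model pairs. With two degrees of freedom, the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood difference between alternative and null models."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square distribution with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """P value of the LRT, assuming the models differ by two free parameters."""
    return chi2_sf_df2(lrt_statistic(lnl_null, lnl_alt))


# Hypothetical log-likelihoods for a null model (e.g. M7) and an
# alternative model that allows positive selection (e.g. M8).
lnl_m7 = -1000.0
lnl_m8 = -990.0

statistic = lrt_statistic(lnl_m7, lnl_m8)
p_value = lrt_pvalue_df2(lnl_m7, lnl_m8)
reject_null = p_value < 0.01  # significance threshold used in the text
```

A significant test only indicates that the model allowing positive selection fits better; the individual sites are then identified separately, for example by empirical Bayes methods.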
We found seventeen positively selected sites using the PAML method, fifteen using the IFEL algorithm, and four using the FEL algorithm. The adaptive selection pressure on the codon sequences of the BPIFA1 gene was calculated using the MEC model, which identified seventy-four amino acids (Figure 1). A model of evolution that allows positive selection (M8) was used, revealing differences at the codon level. The MrBayes application on the Selecton server was previously used with an MCMC model to determine codon-level differences in the MAVS gene in mammals [49]. Based on the results of MAFFT protein alignments, previous studies have shown that the Ig domain remains in the MAVS coding sequences. These results suggest that alternative protein switches in regions under purifying selection are deleterious and thus unlikely to be maintained throughout evolution [50,51]. Sites for multiple evolutionary pathways were located using a multi-parameter rate distribution, a random effect model with a 95% confidence interval, and substantial Pr [β > α] values (Table 3). In the case of positive selection, the class rate weight was determined using a bivariate general discrete distribution for each coding site. Convergence of the MCMC model was demonstrated by the fact that the posterior mean estimates for BPIFA1 were close to the reduction factor value considered (Table 2); these values ranged from 0.95 to 0.99. For diversifying selection, only the coding sites with empirical Bayes factor (EBF) values of more than 50 were considered. The EBF values for each coding site evaluated for positive selection were calculated using the net effective sample size. Inferring the distribution of gene-specific selection parameters could improve the detection of selection across a large number of coding sites.
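One common way to express an empirical Bayes factor is as the ratio of the posterior odds to the prior odds that a site belongs to the positively selected class (β > α). The sketch below assumes this definition; the prior, the posterior probabilities, and the site labels are hypothetical, and only the threshold of 50 is taken from the text.

```python
def empirical_bayes_factor(posterior, prior):
    """Posterior odds divided by prior odds of positive selection at a site."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if not 0.0 <= posterior < 1.0:
        raise ValueError("posterior must lie in [0, 1)")
    posterior_odds = posterior / (1.0 - posterior)
    prior_odds = prior / (1.0 - prior)
    return posterior_odds / prior_odds


def select_sites(posteriors, prior, threshold=50.0):
    """Return the sorted labels of sites whose EBF exceeds the threshold."""
    return sorted(
        site
        for site, posterior in posteriors.items()
        if empirical_bayes_factor(posterior, prior) > threshold
    )


# Hypothetical posterior probabilities of positive selection for two sites,
# with an assumed prior probability of 0.1 for the positively selected class.
example_posteriors = {"site_A": 0.95, "site_B": 0.60}
selected = select_sites(example_posteriors, prior=0.1)
```

Under this definition, a site is retained only when the data raise the odds of positive selection at least fifty-fold relative to the prior, which is a deliberately conservative filter.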
The positively selected coding regions identified here provide significant evidence of diversifying selection in BPIFA1 genes along lineages that are currently under selection. Some mutations that initially appear to be neutral (and have no immediate impact on fitness) can be "permissive," allowing the protein to withstand later changes that would otherwise be harmful and cause phenotypic differences [52]. Through epistasis, such neutral mutations lay the foundation for later selection and adaptation, an idea that has recently attracted much attention and been offered as a way to reconcile neutral and selection models of evolution [53].
The substitution rate for the pair FWY and HKR was approximately 50%, the substitution rate for DENQ was 50%, and the substitution rate for ACGILMPSTV was 90%. The PPI network represented the interactions of the BPIFA1 protein with other co-expressed immune proteins. COX7B2, BPIFB6, BPIFB4, BPIFB2, BPIFB3, PLTP, CETP, BPI, LBP, and ODF2L were the ten genes that we identified as participating in these protein interactions (Figure 6). The BPIFB6, BPIFB4, BPIFB2, and BPIFB3 genes are the most significant because they are involved in biological signaling pathways that play an essential role in innate immunity against bacterial infection and because they are upregulated by BPIFA1 (Table 6). Interfaces contain clusters of conserved residues with an amino acid composition compatible with both the interface core (residues with the largest change in burial upon binding) and a conserved region [54], and hot regions arising from the clustering of hot spots correspond to tightly packed and conserved regions. Thus, interfaces are under evolutionary pressure to sustain current connections while averting unfavorable, non-specific interactions. Certain physicochemical features can be altered to reduce the likelihood that protein-protein interfaces form dysfunctional interactions [55]. In our investigation, the dN/dS values were greater than 1 for the positively selected codons presented in Table 1, which indicates that non-synonymous substitutions (dN) accumulated faster than synonymous substitutions (dS) at these sites. Such Darwinian selection, which favors novel variants and greater allelic polymorphism, operates as balancing or purifying selection [56], altering the protein structure and affecting the signaling pathway [57].
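The interpretation of the dN/dS ratio used throughout this discussion can be summarized in a few lines of Python. This is a minimal sketch with hypothetical rate values; the function name and the tolerance are assumptions introduced for illustration, and the sketch is not the procedure implemented in PAML, Selecton, or the HyPhy methods.

```python
def classify_codon(dn, ds, tol=1e-9):
    """Classify a codon by its dN/dS ratio.

    Returns (omega, label), where omega is None if dS is zero and the
    ratio is therefore undefined.
    """
    if ds <= 0:
        return None, "undefined"
    omega = dn / ds
    if omega > 1.0 + tol:
        label = "positive"    # non-synonymous changes outpace synonymous ones
    elif omega < 1.0 - tol:
        label = "purifying"   # non-synonymous changes are removed by selection
    else:
        label = "neutral"     # both kinds of change accumulate at the same rate
    return omega, label


# Hypothetical per-codon rate estimates (dN, dS).
example_rates = {
    "codon_1": (0.6, 0.2),
    "codon_2": (0.1, 0.5),
    "codon_3": (0.3, 0.3),
}
labels = {name: classify_codon(dn, ds)[1] for name, (dn, ds) in example_rates.items()}
```

In practice, a ratio above 1 at a single codon is not sufficient evidence by itself; the statistical tests described above are needed to decide whether the excess of non-synonymous change is significant.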
Although they originate from the same lineage, amino acid substitutions in the descendants of different species might have very different consequences [56,57]. This contrasts with the fact that their pedigree coincides with earlier submissions. The BPIFA1 genes chosen in this study provide some information for bioanalysis, which aims to select genes based on evolutionary time scales ranging from the most recent to longer-term periods. In addition, the fundamental evolutionary mechanism uncovered by recent research may be incompletely understood because the structural and functional features of a large number of proteins in the genome are unknown. The evolution and adaptation of protein-coding genes in Drosophila melanogaster were thoroughly examined to determine the most relevant determinants of evolution and adaptation at the level of protein-coding genes. This was accomplished by comparing D. melanogaster to closely related species and their own populations. Large-scale applications of bioinformatics and structural analysis were carried out by our team to ascertain the structural and functional features of proteins. Subsequently, we divided the residues into a variety of structural and functional sites using our categorization system. The rates of sequence evolution and adaptation were compared across a variety of proteins and locations, which enabled the identification of hotspots of adaptation across the whole genome. In addition, it has been demonstrated that fast-adaptive proteins interact with one another at rates higher than would be predicted by chance, which suggests that coadaptation is likely ubiquitous among fast-adaptive proteins.
Given their physical connections, the following mechanisms have the potential to contribute to coadaptation: (1) fast-adaptive proteins are often enriched in similar chemical activities and exposed to similar selection pressure, and (2) fast-adaptive proteins coevolve. Two different instances of adaptive evolution in PPIs were demonstrated in this research, which leads the authors to hypothesize that these physical interactions may have played a role in the coadaptation of fast-adaptive proteins in D. melanogaster. In addition, we showed that coadaptation may take place more generally than only between fast-adaptive proteins. The rate of adaptation is typically higher in proteins that interact with fast-adaptive proteins. Given that molecular interactions play a role in adaptive evolution, it is fair to anticipate that these interactions may also govern coadaptation at a more global level. It has been postulated that the coevolution of physical contacts is the mechanism responsible for the similar evolutionary rates observed in interacting proteins.

Conclusions
Our goal was to identify the selective pressures that have contributed to the development of the mammalian BPIFA1 system, the expression of which is modulated in a wide variety of diseases. The BPIFA1 protein rapidly evolved in response to selective pressure in the human lineage, and we were able to pinpoint the genetic selection determinants that account for its bactericidal activity. During its evolutionary history, positive selection may have had a crucial role in improving the virulence response to different stimuli, which could explain the observed diversity in the stability of the gene's function. Our findings provide a more comprehensive understanding of the evolutionary history of BPIFA1 genes, which will enhance the functional genomics analysis of pathogenicity in biological processes. These findings may also help to improve the understanding of disease prevention. Additionally, the study of these genes might facilitate the design of a method that could assist in identifying the various virulence proteins present in bacterial pathogens. Our findings lead us to hypothesize that constraints during the evolutionary process have played a key role in shaping the patterns we observed. As a result of these constraints, we were able to identify some numerical boundaries when we related characteristics such as protein length to complexes. The unique characteristics of proteins are intriguing because they may indicate unusual stressors or homeostatic adjustments that have enabled their presence in cells. Therefore, they are a promising choice for further research.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14010015/s1, Table S1: The species names and accession numbers used to study the BPIFA1 gene.
Data Availability Statement: All data relevant to this article shall be openly available to readers.