1. Introduction
The human immunodeficiency virus (HIV) remains one of the most significant global health challenges [1]. Despite advances in antiretroviral therapy (ART) targeting key stages of the viral replication cycle [2], infection persistence is largely driven by the emergence of drug-resistant viral mutations, which reduces the efficacy of existing treatment regimens [3,4]. This underscores the urgent need to discover new bioactive compounds and therapeutic strategies to overcome current protocol limitations.
The relevance of HIV research extends beyond developing drugs that target various stages of the viral cycle to identifying novel molecular targets and endogenous mechanisms of infection control. This necessity stems from the extreme complexity of virus–human immune system interactions [5], which significantly constrains the effectiveness of existing therapeutic strategies [6,7]. A promising direction involves identifying endogenous innate immunity factors with antiviral activity. Studies demonstrate that certain human proteins, such as the chemokine CCL3, can inhibit infection by binding chemokine co-receptors and competitively blocking viral entry [8]. Crucially, such inhibition targets cellular rather than viral components. Key advantages of these endogenous inhibitors include their preexistence in the human body, their partially understood physiological roles and safety profiles, and, for some, known mechanisms of interaction with potential target proteins. A prominent example is APOBEC3G, which inhibits HIV-1 replication by inducing lethal hypermutation in newly synthesized viral DNA, thereby blocking reverse transcription and integration processes [9]. However, the systematic identification of such endogenous inhibitors among broad panels of host proteins remains challenging.
Current approaches for interaction discovery include public databases that enable the reconstruction of large-scale protein–protein interaction (PPI) networks. Such network analysis helps identify key protein hubs associated with HIV pathogenesis for further in-depth study [10]. However, these networks often reflect functional associations or co-expression rather than direct physical interaction, potentially yielding false-positive results. Even for physical interactions, network analysis typically lacks details of the molecular mechanism, spatial characteristics (complex conformation and stoichiometry), binding affinity, and process dynamics. Identifying physical interactions involving direct atom-atom contact between molecules is critical, as they underpin most fundamental cellular processes: signal transduction, enzymatic catalysis, viral particle assembly, and function blocking. Therefore, establishing physical contact between host proteins and viral proteins and/or established cellular cofactors opens avenues for designing targeted low-molecular-weight inhibitors or peptidomimetics that can specifically disrupt this interaction, thereby interrupting the pathogen’s life cycle. Simultaneously, this research direction deepens understanding of molecular pathogenesis mechanisms by revealing key target proteins and regulatory pathways involved in disease development. However, conventional PPI networks typically cannot distinguish between direct physical interactions and indirect functional associations.
The comprehensive experimental determination of physical interactions across the full spectrum of potential host–pathogen protein pairs presents substantial practical challenges due to the exceptional resource intensity involved. Such large-scale experimental screening demands immense investments of time, specialized equipment, and materials, rendering exhaustive experimental approaches impractical for initial candidate discovery. In this context, preliminary computer modeling offers a powerful strategy to optimize the search process through in silico assessment. Modern computational methods can generate rationally justified priority target lists for subsequent experimental validation, thereby concentrating research efforts on the most promising candidates. Recent advances in structural bioinformatics, particularly the development of AlphaFold and related deep learning architectures [11], have revolutionized our ability to predict protein–protein interactions with unprecedented accuracy. These approaches enable detailed characterization of binding interface formation between viral and cellular proteins at atomic resolution, predicting the spatial architecture of complexes and identifying specific amino acid residues that dominate binding energy contributions. The resulting structural insights provide a critical foundation for multiple downstream applications, including virtual screening of low-molecular-weight compound libraries to identify candidates capable of sterically or allosterically disrupting pathogenic complex formation. However, the efficient processing and prioritization of hundreds of predicted complexes generated by these methods require specialized analytical pipelines. Thus, despite the power of computational modeling, a pressing need remains for integrated frameworks that can systematically evaluate and rank large volumes of predicted protein complexes to maximize research efficiency [12,13,14]. The primary aim of this study was to develop a computational pipeline for predicting protein–protein interactions and to apply it to identify human proteins capable of physically interacting with the viral glycoprotein gp120 and/or the major cellular receptor and co-receptors used by HIV (CD4, CCR5, CXCR4, CCR2).
2. Results
The study proceeded through several key stages: initial validation of protein structures, modeling of binary interactions, quantitative analysis of interfaces, and finally, a comparative ranking of candidates based on a composite metric. The results of each stage are detailed below.
To validate the modeling workflow, we first generated single-protein structures for each host receptor and the HIV gp120 glycoprotein (five single models of HCBGPs/HRP and one CCR5-Δ32 model). The parameters of these models are presented in Table 1. The obtained protein structures visually correspond to models available in protein databases, while pTM and RS parameter values indicate close approximation to the native structures of the analyzed proteins.
The subsequent stage involved modeling interactions between receptors/coreceptors and the HRP (5 models). Model parameters are presented in Table 2.
In line with the expected limitations of our simplified modeling approach (Section 4.2.4), the generated models for biologically established reference complexes did not achieve high interface confidence values (ipTM < 0.6 for all HCBGP-HRP pairs), despite their plausible visual appearance and correspondence with known interaction patterns (Figure 1). This consistent result across all reference pairs confirms that absolute ipTM scores are not reliable discriminators in this specific screening context. It thereby reinforces the validity of our decision to employ a comparative analysis framework based on the composite area metric, which normalizes predicted interactions against these internal reference benchmarks.
We generated interaction models for each background protein paired with every candidate protein, resulting in a total of 275 protein–protein interaction models. Model confidence results are presented in Appendix A. Among these, 68 models showed reliable pTM but unreliable RS, while 37 models demonstrated both reliable RS and pTM but unreliable ipTM. Two models (ADRA2C, FPR3) exhibited reliable RS but unreliable ipTM and pTM.
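The model count can be reproduced with a short sketch of the pairing step. It assumes that the five background proteins are the four host receptors plus gp120 and that the panel contains 55 candidates; the candidate identifiers and the job dictionary layout below are hypothetical placeholders, not the actual input format of any modeling tool.

```python
from itertools import product

# Assumption: the background set is the four host receptors plus gp120.
BACKGROUND = ["CD4", "CCR5", "CXCR4", "CCR2", "gp120"]

# Hypothetical placeholder identifiers standing in for the 55 candidates.
CANDIDATES = [f"candidate_{i:02d}" for i in range(1, 56)]


def build_pair_jobs(background, candidates):
    """Return one job description per (background, candidate) pair."""
    return [
        {"name": f"{bg}__{cand}", "chains": (bg, cand)}
        for bg, cand in product(background, candidates)
    ]


jobs = build_pair_jobs(BACKGROUND, CANDIDATES)
print(len(jobs))  # 5 background proteins x 55 candidates = 275
```

Enumerating the pairs up front keeps the bookkeeping explicit: every later table row can be traced back to exactly one job name.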
Despite satisfactory confidence metrics for several predicted models, physical interactions between these proteins in the presented conformations remain unlikely due to steric constraints and electrostatic incompatibility (Figure 2). While these artifacts represent known limitations of the simplified docking system, we intentionally retained all models in subsequent analysis to demonstrate the pipeline’s ability to handle diverse prediction scenarios. A visual assessment of model plausibility was performed, though this evaluation necessarily remains subjective; its results are provided in Appendix B.
Having established a set of complex models, we proceeded to quantitatively characterize the protein-protein interfaces. We analyzed atomic contacts, steric clashes, and hydrogen bonds to derive a normalized interaction score (I_n) for each candidate complex. This provided a biophysical characterization of the interaction interfaces, complementing the structural confidence metrics. The resulting contact data for the background genes are presented in Table 3.
Contact data for HICGP interactions with each background protein and their normalized metrics are presented in Appendix C, Table A3 and Table A4.
To integrate both the model confidence (RS) and the interface quality (I_n) into a single prioritization metric, we calculated the composite Normalized Interaction Area (A = RS × I_n). This approach allowed for the systematic ranking of all candidate interactions against our internal reference set (gp120-HCBGPs). The resulting landscape of interaction areas revealed distinct clusters and high-priority candidates (Figure 3, Table 4).
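As a minimal sketch of the composite metric, the snippet below multiplies the model confidence (RS) by the normalized interaction score for each complex and sorts the results. The class name, the partner labels `cand_A` and `cand_B`, and all numeric values are hypothetical illustrations, not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexModel:
    background: str  # background protein in the pair
    partner: str     # candidate (or reference) partner protein
    rs: float        # model confidence score (RS)
    i_n: float       # normalized interaction score

    @property
    def area(self) -> float:
        """Composite Normalized Interaction Area, A = RS x I_n."""
        return self.rs * self.i_n


def rank_by_area(models):
    """Return the models sorted from largest to smallest area."""
    return sorted(models, key=lambda m: m.area, reverse=True)


# Illustrative values only.
models = [
    ComplexModel("CCR5", "gp120", rs=0.80, i_n=0.50),
    ComplexModel("CCR5", "cand_A", rs=0.90, i_n=0.60),
    ComplexModel("CCR5", "cand_B", rs=0.70, i_n=0.30),
]
ranked = rank_by_area(models)
for m in ranked:
    print(f"{m.background}-{m.partner}: A = {m.area:.2f}")
```

Because the metric is a product, a complex scores highly only when both the confidence and the interface quality are high; a strong value in one cannot compensate for a near-zero value in the other.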
For standardization, interactions between HICGPs and background proteins were considered significant when their area values exceeded 95% of the area value for the corresponding background protein’s interaction with gp120 (excluding the gp120 interaction itself). Complete calculation data are presented in Appendix D and Appendix E.
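The thresholding rule can be illustrated with a short sketch. It assumes that, for one background protein, the area values of all partners (including the gp120 reference) are available in a dictionary; the function name, the partner labels, and the numbers are hypothetical.

```python
def select_significant(areas, reference_partner="gp120", fraction=0.95):
    """Return partners whose area exceeds `fraction` of the reference area.

    `areas` maps partner name -> area for a single background protein.
    The reference interaction itself is excluded from the result.
    """
    threshold = fraction * areas[reference_partner]
    return sorted(
        partner
        for partner, area in areas.items()
        if partner != reference_partner and area > threshold
    )


# Illustrative values for one background protein (not data from the study).
ccr5_areas = {"gp120": 0.40, "cand_A": 0.54, "cand_B": 0.21, "cand_C": 0.39}
hits = select_significant(ccr5_areas)
print(hits)  # cand_A and cand_C exceed 0.95 * 0.40 = 0.38
```

Note that the threshold is relative to each background protein's own gp120 reference, so a weak reference complex yields a permissive cutoff.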
Table 4 presents HICGPs with the most significant area values.
Significant proteins based on calculated interaction area with CCR5 were: CCL2, CCL25, CCL27, CCL8, CXCL12, CXCL13, CXCL2, CXCL3, and PNOC. Significant proteins based on calculated interaction area with CXCR4 were: CXCL12 and PNOC. Significant proteins based on calculated interaction area with CCR2 were: CCL2, CCL25, CCL8, CCR7, CXCL13, CXCL2, CXCL3, NPY1R, NPY5R, OPRK1, and PENK.
Analysis of the CD4–gp120 interaction reveals a limited number of atomic contacts and a small binding surface area. Consequently, the diagnostic value of this complex for comparative analysis is substantially reduced, as most investigated HICGPs demonstrate area values comparable to or exceeding that of the CD4–gp120 complex. Similarly, the area in the gp120–CCR5-Δ32 complex proved insufficient for use as a threshold to exclude HICGPs with low significance. When the interaction area of CCR5 with gp120 is used as the threshold, the candidate list is as follows: CCL27, CCR7, NPY1R, NPY5R, OPRK1, ACKR3, ADRA2C, CCR10, CCR9, CXCR3, CXCR5, CXCR6, FPR3, GPER1, HTR1D, HTR1F, HTR5A, OXER1, PTGDR2, S1PR2, SSTR3, TAS2R14, GPR18, and SST.
The primary proteins interacting with gp120 are receptors, a finding which is further supported by cluster analysis results presented in Table 5. A clear separation into receptor and ligand clusters is evident. Particular attention should be given to HICGP interaction models with HCBGPs/HRP that demonstrated area values exceeding established thresholds. No candidate proteins interacting with all background human proteins analyzed in this study were identified. HICGPs interacting with three HCBGPs (CCR5, CXCR4, CD4) were identified: PNOC and CXCL12. HICGPs interacting with one of the main coreceptors (CCR5 or CXCR4) and one or two other HCBGPs (CCR2 and/or CD4) were identified: CCL2, CCL8, CCL25, CCL27, CXCL13, CXCL3, and CXCL2. Interaction models are presented in files in the Supplementary Materials.
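The overlap between receptors can be tabulated directly from the per-receptor lists of significant candidates reported above for CCR5, CXCR4, and CCR2. The sketch below is only an illustration of that bookkeeping; the CD4 list is not enumerated in this section, so CD4 is omitted and the counts cover three receptors only.

```python
from collections import defaultdict

# Significant candidates per receptor, as listed in the text (CD4 omitted).
significant = {
    "CCR5": {"CCL2", "CCL25", "CCL27", "CCL8", "CXCL12",
             "CXCL13", "CXCL2", "CXCL3", "PNOC"},
    "CXCR4": {"CXCL12", "PNOC"},
    "CCR2": {"CCL2", "CCL25", "CCL8", "CCR7", "CXCL13", "CXCL2",
             "CXCL3", "NPY1R", "NPY5R", "OPRK1", "PENK"},
}


def receptors_per_candidate(significant):
    """Invert the mapping: candidate -> set of receptors it passed for."""
    hits = defaultdict(set)
    for receptor, candidates in significant.items():
        for candidate in candidates:
            hits[candidate].add(receptor)
    return dict(hits)


hits = receptors_per_candidate(significant)
multi = sorted(name for name, receptors in hits.items() if len(receptors) >= 2)
for name in multi:
    print(name, sorted(hits[name]))
```

With these three lists, CXCL12 and PNOC are the only candidates shared by CCR5 and CXCR4, which matches the pair highlighted in the text.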
Among HICGPs, only CCL27 showed area values above thresholds for interaction models with both CCR5 and gp120 (Figure 4). Interaction models are presented in files in the Supplementary Materials.
3. Discussion
3.1. Overview of the Computational Approach and Key Findings
[Table caption: Top-ranked HICGPs based on comparative Area analysis (values exceeding operational thresholds for prioritization are indicated in bold).]
This study aimed to conduct an in silico screen of a panel of 55 HICGPs to identify molecules potentially capable of modulating a key stage of the HIV life cycle, namely the interaction between viral glycoprotein gp120 and cellular receptors [16]. The application of the AlphaFold 3 algorithm enabled the reconstruction and comprehensive analysis of 275 molecular complexes, resulting in the identification of both expected and previously undescribed potential targets for therapeutic intervention in HIV infection.
3.2. Validation of the Pipeline: Separation of Ligands and Receptors
The clear separation of candidate proteins into distinct receptor-type and ligand-type clusters, as revealed by our k-means cluster analysis (Table 5), provides internal validation for our computational pipeline and prioritization strategy. This recapitulation of fundamental biological categories demonstrates that our comparative framework, based on the composite Area metric, effectively captures biologically relevant features of protein–protein interactions. The clear dichotomy suggests that the predicted interaction models respect basic biological principles, where ligands (such as chemokines and neuropeptides) and receptors occupy distinct functional and structural niches, even within the simplified in silico environment. This successful separation reinforces the biological plausibility of the top-ranking candidates identified by our screening approach and supports the robustness of our method in distinguishing between different modes of potential interaction with the HIV entry machinery.
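For readers unfamiliar with k-means, the following is a minimal, dependency-free sketch of the algorithm with k = 2. The feature vectors used for clustering in this study are not specified in this section, so the two-dimensional points below (for example, an area value against gp120 and one against a coreceptor) are purely hypothetical, and the deterministic farthest-point initialization is a simplification chosen for reproducibility.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    # Deterministic start: first point, then the point farthest
    # from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of the members of each cluster.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels, centroids


# Hypothetical feature vectors: three ligand-like and three receptor-like.
points = [(0.10, 0.80), (0.15, 0.75), (0.12, 0.90),
          (0.85, 0.20), (0.90, 0.10), (0.80, 0.25)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

In practice a library implementation with multiple random restarts would be preferable; the sketch only shows why well-separated groups end up in different clusters.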
3.3. Chemokine Ligands: Expected and Discordant Results
As anticipated based on their known biological functions, C-C family chemokines (CCL2, CCL8, CCL25, CCL27) and C-X-C family chemokines (CXCL12, CXCL13, CXCL2, CXCL3) demonstrated high interaction potential with their natural receptors CCR5 and CCR2 in our models. Although normalized contact parameters for most did not exceed those of the gp120-coreceptor complexes, their binding capacity suggests these chemokines may act as natural competitive antagonists, potentially blocking viral glycoprotein binding sites.
Of particular note is the case of chemokine CCL2. Our model predicts high-affinity binding to CCR5, suggesting a potential for direct competitive inhibition of viral entry. This finding, however, appears to contradict experimental studies reporting that CCL2 can enhance HIV replication in vivo and ex vivo [17,18]. We propose that this discrepancy underscores the distinction between a direct physical interaction, captured by our structural models, and a protein’s net biological effect within a complex physiological environment.
The primary physiological role of CCL2 is the chemotaxis of monocytes and other immune cells to sites of inflammation [19]. This recruitment significantly expands the pool of target cells (e.g., CD4+ T cells, macrophages) available for HIV infection, an indirect proviral effect that likely dominates the net outcome in many experimental and physiological contexts. Thus, the in silico prediction and experimental observations can be reconciled within a dual-activity framework: CCL2 may possess an intrinsic, direct antiviral potential via coreceptor blockade (as predicted by our model), which is masked in vivo by its potent, indirect proviral effect via target cell recruitment.
This case highlights a critical principle for interpreting computational screens: a predicted physical interaction signifies mechanistic potential, but the net biological outcome is determined by the broader cellular and systemic context [20].
Analysis revealed a marked disproportion: the relative number of ligands interacting with CXCR4 was substantially lower than with CCR5/CCR2. These findings suggest that, within the employed model, CC-type chemokines exhibit more pronounced inhibitory activity against viral utilization of the CCR5 co-receptor. The observed binding disparity is consistent with the hypothesis that an imbalance in available natural ligands could contribute to selective pressures influencing coreceptor switching and the emergence of CXCR4-tropic variants in later stages of infection [18,21].
3.4. Potential gp120 Interactions with Non-Canonical Receptors
Beyond the classical co-receptors, our modeling suggests the capacity of the viral glycoprotein gp120 for direct interaction with a broad spectrum of cellular membrane receptors. Although overall prediction reliability was moderate—as expected given gp120’s high conformational plasticity and the challenges of predicting its binding sites—our comparative analysis identified two significant subgroups of potential interactors.
The first group comprises chemokine superfamily receptors (CCR10, CCR7, CCR9, CXCR3, CXCR5, CXCR6), which demonstrated contact quantities with gp120 comparable to reference coreceptors. These findings align with publications suggesting that some of these receptors may serve as alternative or supplementary viral entry portals into specific cell types [22,23]. The specific targeting of this receptor class by gp120 may represent a viral adaptation to broaden cellular tropism.
The second, more diverse group consisted of various neuroreceptors and other membrane proteins (HRH4, HTR1D, HTR1F, HTR5A, NPY1R, NPY5R, OPRK1, OXER1, OXGR1, PTGDR2, S1PR2, S1PR3, SSTR1, SSTR3, SUCNR1). For several of these, existing data suggest possible associations with HIV-associated neuropathologies [24], making them promising targets for further investigation in the context of neuroinvasion and neuropathogenesis.
Somatostatin (SST) merits particular attention in this context, as its expression level has been reported to correlate with HIV progression [25,26], although earlier studies refuted its substantial role [27], indicating ambiguity in existing data that warrants further clarification.
The G-protein family (GNAI1, GNAI2, GNA13) exhibited high structural confidence values (pTM) but low contact quantities with coreceptors, which is expected since they typically interact with receptor cytoplasmic domains. Their potential influence on infection is likely mediated through complex intracellular signaling cascades and cannot be adequately assessed within our binary interaction modeling methodology [28].
3.5. Hypothesis-Generating Predictions for Neuropeptides
The high ranking of several neuropeptides (PNOC, NPY, PDYN, PENK) among the candidate interactors warrants specific discussion. Their prioritization should be interpreted with particular caution due to inherent methodological considerations. The small size and inherent structural flexibility of neuropeptides pose a particular challenge for reliable modeling using static docking approaches, potentially allowing for multiple conformations and leading to overestimated confidence in some binding poses. Indeed, some models showed potential binding to sterically inaccessible sites, such as intracellular domains. Therefore, while these candidates ranked highly in our screen, they should be classified as the most speculative predictions, serving primarily to generate hypotheses for rigorous experimental validation.
Notably, the literature analysis provides indirect support for the potential biological relevance of these systems in the context of HIV infection. For instance, anterior cingulate cortex samples from patients with HIV showed decreased PDYN (prodynorphin) gene mRNA levels alongside increased OPRK1 (kappa-opioid receptor) mRNA expression compared to controls [24]. We hypothesize that reduced PDYN expression may represent a compensatory mechanism aimed at limiting monocyte recruitment and mitigating neuroinflammatory processes, while enhanced OPRK1 expression might be associated with attempts to modulate proinflammatory signaling pathways. Furthermore, increased neuropeptide Y (NPY)-like immunoreactivity has been observed in the cerebrospinal fluid of patients with HIV, suggesting a potential link to HIV encephalopathy [29]. Thus, while the direct physical interactions predicted by our models remain highly speculative, the involved neuropeptide systems appear to be engaged in the host response to HIV infection, particularly within the nervous system.
3.6. Limitations of the Study
While our computational pipeline provides a systematic approach for prioritizing protein interactions, several limitations should be acknowledged. First, the low ipTM scores observed for biologically validated complexes are a direct consequence of our simplified binary modeling strategy, which traded atomic-level refinement for screening throughput. This inherent trade-off is why our comparative analysis framework, rather than absolute confidence scores, forms the core of the prioritization pipeline. Second, our models represent simplified binary interactions without key biophysical contexts such as explicit lipid membranes, gp120 glycosylation, or physiological ionic conditions. Third, the operational thresholds and confidence intervals used for candidate selection were derived from a comparative analysis with biologically verified reference interactions rather than from rigorous statistical distributions of null models. While this practical approach allowed for large-scale prioritization, it lacks a formal statistical foundation. Finally, and most importantly, all predictions—particularly those involving neuropeptides and novel receptor interactions—require experimental validation (e.g., SPR, BLI, cellular assays) before any firm biological conclusions can be drawn. We reiterate that the term “significant” throughout this manuscript refers specifically to candidates that surpassed our operational, comparative thresholds for prioritization within this computational screen. These thresholds provide a systematic ranking for guiding future research but do not constitute statistical or biological validation of the interactions.
3.7. Concluding Remarks
This in silico study demonstrates a generalizable computational pipeline for the prioritization of protein–protein interactions in host–pathogen systems. By applying this framework to HIV-1 attachment, we have systematically narrowed a broad panel of candidates to a focused set of high-priority targets. Our results not only recapitulate known biology, validating our approach, but also generate novel and sometimes unexpected hypotheses regarding viral engagement with chemokine and neuromodulatory systems. The predictions presented here, most notably the dual-binding candidate CCL27, establish a prioritized foundation for guiding future experimental efforts aimed at validating these interactions and exploring their therapeutic potential. This work underscores the power of integrated computational modeling to illuminate complex host–pathogen interaction landscapes.