Predicted Cellular Interactors of the Endogenous Retrovirus-K Integrase Enzyme

Integrase (IN) enzymes are found in all retroviruses and are crucial to the retroviral integration process. Many studies have revealed how exogenous IN enzymes, such as the human immunodeficiency virus (HIV) IN, contribute to altered cellular function. However, the same consideration has not been given to viral IN originating from symbionts within our own DNA. Endogenous retrovirus-K (ERVK) is pathologically associated with neurological and inflammatory diseases as well as several cancers. The ERVK IN interactome is unknown; this paper addresses how conserved the ERVK IN protein–protein interaction motifs are compared with those of other retroviral integrases. The ERVK IN protein sequence was analyzed using the Eukaryotic Linear Motif (ELM) database, and the results were compared with the ELMs of other betaretroviral INs and similar eukaryotic INs. A list of putative ERVK IN cellular protein interactors was curated from the ELM list and submitted for STRING analysis to generate an ERVK IN interactome. KEGG analysis was used to identify key pathways potentially influenced by ERVK IN. The analysis predicted that ERVK IN may interact with cellular proteins involved in the DNA damage response (DDR), cell cycle, immunity, inflammation, cell signaling, selective autophagy, and intracellular trafficking. The most prominent pathway identified was viral carcinogenesis, in addition to pathways for select cancers, neurological diseases, and diabetic complications. These findings suggest that ERVK IN may contribute to these pathologies through protein–protein interactions that alter key disease pathways.


Introduction
Viral proteins often usurp and alter cellular signaling pathways. For exogenous viruses, this tweaking of cellular function serves to enhance their replicative success through the modulation of pathways related to virion production, dissemination, cell survival, and immunity [1,2]. It is less clear how ever-present viral symbionts, such as endogenous retroviruses (ERVs), interact with the proteome of their hosts.

STRING Analysis and KEGG Pathways
To identify potential ERVK IN binding partners based on ELM motifs, the names of interacting proteins were curated from each ELM reference page. When only a general interaction domain was listed for a given ELM, that domain was searched in the InterPro database to curate a list of human proteins containing it. Based on the 48 ELMs identified in ERVK IN, a total of 213 putative human protein interaction partners were identified (Table A5). The list was submitted to STRING (STRING Consortium; version 11.0; 2020, https://string-db.org/, accessed on 16 May 2021) for network analysis. A full network analysis was performed with Experiments and Databases as the active interaction sources. Nodes represent submitted query proteins only, and edges represent interactions with a minimum interaction score of 0.9 (highest confidence). Query proteins not linked to the network were excluded from the analysis. A payload list was used to color hub proteins by cellular function. KEGG pathways associated with the network (E value < 1.0 × 10^−5) were presented in a heatmap generated with GraphPad Prism (version 9.1.1), and the full list of KEGG pathways is presented in Table A6.
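The edge-filtering step described above (retain query–query edges with an interaction score of at least 0.9, then drop query proteins left without any edge) can be sketched as follows. This is a minimal illustration assuming an edge list already exported from STRING as (protein A, protein B, score) tuples on a 0–1 scale; the helper name `filter_network` and all scores are hypothetical and are not taken from the study's data.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, float]


def filter_network(
    queries: Iterable[str], edges: Iterable[Edge], min_score: float = 0.9
) -> Tuple[Set[str], List[Edge]]:
    """Keep query-query edges scoring >= min_score and drop unlinked query proteins.

    Hypothetical helper for illustration; it is not part of STRING itself.
    """
    query_set = set(queries)
    seen = set()
    kept: List[Edge] = []
    for a, b, score in edges:
        # Only edges between two submitted query proteins are kept (no second shell).
        if a not in query_set or b not in query_set or score < min_score:
            continue
        key = tuple(sorted((a, b)))  # treat A-B and B-A as the same undirected edge
        if key in seen:
            continue
        seen.add(key)
        kept.append((key[0], key[1], score))
    # Query proteins with no surviving edge are excluded from the network.
    linked = {protein for a, b, _ in kept for protein in (a, b)}
    return linked, kept


# Hypothetical scores, for illustration only.
queries = ["ABL1", "YWHAE", "PRKDC", "PEX14", "PEX13", "MAPRE1"]
edges = [
    ("ABL1", "YWHAE", 0.95),
    ("YWHAE", "ABL1", 0.95),  # same pair listed in the reverse direction
    ("ABL1", "PRKDC", 0.92),
    ("PEX14", "PEX13", 0.99),
    ("MAPRE1", "ABL1", 0.40),  # below the 0.9 cutoff
]
nodes, kept = filter_network(queries, edges)
print(len(nodes), len(kept))  # 5 3
```

Deduplicating on the sorted protein pair matters because exported edge lists can list an undirected interaction in both directions, which would otherwise inflate the edge count.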

Characterization of Eukaryotic Linear Motifs in ERVK Integrase and Other Betaretroviral Integrases
To establish which exogenous and endogenous retroviruses contain integrase sequences most similar to ERVK IN, we performed BLASTp searches using the nr, mo, and tsa NCBI databases. As expected, the BLASTp search identified exogenous betaretroviruses, including multiple hits for mouse mammary tumor virus (MMTV), Mason-Pfizer monkey virus (M-PMV), enzootic nasal tumor virus (ENTV), and Jaagsiekte sheep retrovirus (JSRV) (Table A1).

Characterization of Eukaryotic Linear Motifs in ERVK Integrase and Other Endogenous Integrases
ERVK integrase-like sequences were found in boreoeutherians, including Euarchontoglires (primates, rodents, and pikas) and Laurasiatherians (ungulates), as well as in other clades, including Euteleostomi (birds) and Protostomes (worms, insects, and water fleas) (Tables A2–A4). Percent identity ranged from 26.43 to 83.77%, and E values ranged from 0.001 to 2.0 × 10^−127, indicating significant similarity to ERVK IN.
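Summarizing hit ranges such as those reported above (percent identity and E value) is straightforward if BLAST results are saved in BLAST+ tabular format (`-outfmt 6`), whose default 12 columns place percent identity third and the E value eleventh. The sketch below assumes that default layout; the helper name `summarize_hits`, the subject identifiers, and the row values are toy data chosen only to echo the reported extremes, not records from Tables A1–A4, and the 0.001 E-value cutoff is an assumption matching the least significant hit reported.

```python
from typing import Dict, Iterable, Optional


def summarize_hits(lines: Iterable[str], max_evalue: float = 1e-3) -> Optional[Dict]:
    """Summarize BLAST+ -outfmt 6 rows: hit count, identity range, E-value range."""
    identities, evalues = [], []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        pident, evalue = float(cols[2]), float(cols[10])  # default column order
        if evalue <= max_evalue:
            identities.append(pident)
            evalues.append(evalue)
    if not identities:
        return None
    return {
        "n_hits": len(identities),
        "identity_range": (min(identities), max(identities)),
        # reported from least to most significant, as in the text
        "evalue_range": (max(evalues), min(evalues)),
    }


# Toy rows (hypothetical subject IDs); columns follow the -outfmt 6 default.
rows = [
    "ERVK_IN\thit_A\t83.77\t280\t45\t1\t1\t280\t1\t280\t2.0e-127\t410",
    "ERVK_IN\thit_B\t26.43\t250\t180\t10\t5\t250\t3\t248\t0.001\t40.0",
    "ERVK_IN\thit_C\t22.00\t90\t70\t4\t10\t100\t12\t101\t0.5\t20.0",
]
print(summarize_hits(rows))
```

The third toy row is dropped by the E-value cutoff, so only two hits contribute to the reported ranges.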
The conservation of ELM motifs was apparent (Table 2, Figure 2), including DDR-related canonical 14-3-3 interaction motifs and WDR5 interaction motifs, cell signaling motifs associated with USP7 binding, IAP-binding motifs, STAT5 binding motifs, SH3 protein interaction motifs, and phosphorylation sites for CK proteins, GSK3, NEK2, polo-like kinases, and p38. Many LIR motifs for engaging Atg8 proteins during selective autophagy were also apparent. Finally, all INs displayed Pex14 binding motifs and the potential to interact with the µ-subunit of the adaptor protein complex. Additional ELMs and their motif frequencies in individual endogenous ERVK-like INs are listed in Table 2.
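Comparing motif content across integrases amounts to set operations on per-sequence motif lists: motifs present in every IN are conserved, while motifs present in ERVK IN and absent from all comparators are candidates for ERVK-specific interactions. The presence/absence sets below are hypothetical and simplified, loosely following statements in the text (Pex14 and adaptor protein µ-subunit binding in all INs; the inverted sumoylation motif in ERVK and MMTV only; LIR motifs absent from exogenous betaretroviral INs). They are not a reproduction of Table 2, and the helper name `compare_motifs` is illustrative.

```python
from typing import Dict, List, Set, Tuple


def compare_motifs(
    motif_sets: Dict[str, Set[str]], focal: str
) -> Tuple[Set[str], Set[str], Dict[str, List[str]]]:
    """Return (motifs in every IN, motifs unique to `focal`, which INs carry each motif)."""
    conserved = set.intersection(*motif_sets.values())
    others = set().union(*(s for name, s in motif_sets.items() if name != focal))
    unique = motif_sets[focal] - others
    carriers: Dict[str, List[str]] = {}
    for name in sorted(motif_sets):
        for motif in motif_sets[name]:
            carriers.setdefault(motif, []).append(name)
    return conserved, unique, carriers


# Hypothetical, simplified presence/absence sets (not Table 2).
motif_sets = {
    "ERVK": {"Pex14_binding", "AP_mu_binding", "SUMO_inverted", "LIR"},
    "MMTV": {"Pex14_binding", "AP_mu_binding", "SUMO_inverted"},
    "JSRV": {"Pex14_binding", "AP_mu_binding"},
}
conserved, unique, carriers = compare_motifs(motif_sets, "ERVK")
print(sorted(conserved), sorted(unique), carriers["SUMO_inverted"])
```

Working with presence/absence sets ignores motif frequency; counting sites per IN (as in Table 2) would need multisets or per-motif counts instead.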

ERVK Integrase Is Likely Post-Translationally Sumoylated
Of all the INs examined, only ERVK and MMTV contain a C-terminal inverted version (D/E-x-K-ψ, where ψ is a hydrophobic residue) of the canonical sumoylation motif [67]. Considering that sumoylation often causes re-localization of nuclear proteins, this modification may be related to ERVK IN nuclear positioning, association with chromatin, and ultimately successful integration of viral DNA [68,69].
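Distinguishing the canonical (ψ-K-x-E) and inverted (D/E-x-K-ψ) sumoylation consensus, and checking whether a hit lies near the C terminus, can be done with simple regular expressions. This is a simplified sketch, not the ELM definition of these motifs: we assume ψ is one of A, I, L, M, F, or V, and the 50-residue C-terminal window, the helper names, and the toy sequence are arbitrary choices for illustration.

```python
import re
from typing import List

HYDROPHOBIC = "AILMFV"  # assumed set for psi; ELM's own definition may differ
# Lookaheads let overlapping matches be reported.
CANONICAL_SUMO = re.compile(rf"(?=([{HYDROPHOBIC}]K.E))")  # psi-K-x-E
INVERTED_SUMO = re.compile(rf"(?=([DE].K[{HYDROPHOBIC}]))")  # D/E-x-K-psi


def motif_starts(seq: str, pattern: "re.Pattern") -> List[int]:
    """1-based start positions of (possibly overlapping) matches."""
    return [m.start() + 1 for m in pattern.finditer(seq)]


def c_terminal_starts(seq: str, pattern: "re.Pattern", window: int = 50) -> List[int]:
    """Subset of matches that start within the last `window` residues."""
    cutoff = len(seq) - window
    return [pos for pos in motif_starts(seq, pattern) if pos > cutoff]


# Toy sequence (hypothetical): canonical motif at the N terminus, inverted one at the C terminus.
toy = "LKAE" + "G" * 60 + "EAKV"
print(motif_starts(toy, CANONICAL_SUMO), c_terminal_starts(toy, INVERTED_SUMO))  # [1] [65]
```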

ERVK Integrase Exhibits Enhanced Interaction Potential with DDR Proteins
Phospho-Ser/Thr binding domain proteins are key hub proteins in cell cycle regulation and DDR; they include 14-3-3 proteins and proteins with WW domains, Polo-box domains, WD40 repeats, BRCA1 carboxy-terminal (BRCT) domains, and Forkhead-associated (FHA) domains [54], all of which are interacting domains of ELMs identified in ERVK IN (Tables 1 and 2, Figure 1). Additionally, ERVK IN contained five [ST]Q motifs, which are potential phosphorylation sites for PIKK proteins, such as the DDR-related proteins ATR, ATM, and DNA-PK and the multifunctional protein mTOR [70]. Compared with exogenous betaretroviral and endogenous ERVK-like integrases, ERVK IN displayed a greater number of DDR-related motifs: FHA domain protein interaction sites (6), PLK-1 phosphorylation sites (4), and PP1c docking motifs for target dephosphorylation (3) [50]. In contrast, ERVK IN contained fewer interaction sites for the WD40 repeat protein WDR5 than MMTV, ENTV, JSRV, and most other endogenous integrases (2 vs. 5–12 sites each). This suggests that ERVK IN has potentially shifted away from WDR5 interaction in favor of BRCA1 (or BRCT domain) interaction as a means of engaging the DDR pathway [54,71].
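Counting candidate PIKK sites amounts to scanning for [ST]Q dipeptides; the same scan with [ST]P finds the proline-directed sites discussed later for the Pin1 WW domain. The sketch below uses these simplified text-level patterns rather than ELM's full regular expressions and context filters, the pattern labels are our own, and the toy sequence is hypothetical, so counts from it say nothing about ERVK IN itself.

```python
import re
from typing import Dict, List

# Simplified patterns (assumption): ELM's real definitions add context and filters.
PATTERNS = {
    "PIKK_[ST]Q": re.compile(r"(?=[ST]Q)"),
    "Pin1_WW_[ST]P": re.compile(r"(?=[ST]P)"),
}


def scan(seq: str) -> Dict[str, List[int]]:
    """1-based positions of the phosphoacceptor S/T for each simplified motif."""
    return {name: [m.start() + 1 for m in pat.finditer(seq)] for name, pat in PATTERNS.items()}


toy = "MSQATQLLSP"  # hypothetical sequence
hits = scan(toy)
print({name: len(pos) for name, pos in hits.items()})  # {'PIKK_[ST]Q': 2, 'Pin1_WW_[ST]P': 1}
```

Reporting the position of the Ser/Thr rather than just a count keeps the output usable for later checks, such as whether sites cluster in the C-terminal portion of the protein.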

ERVK Interactome Reveals Association with a Diversity of Cellular Pathways
Based on ELMs identified in ERVK IN, a curated list of potential interacting proteins was generated and used to build an ERVK IN interactome network using STRING (Figure 3). The ERVK IN network contained 189 nodes and 692 edges (expected number of edges: 222), a significant PPI enrichment (p < 1.0 × 10^−16). Only query proteins are shown; second-shell interactors were not added. To illustrate key nodes and hub proteins, select network proteins were colored by function related to DDR, cell cycle, apoptosis, cell signaling, or cellular transport. A complete list of the KEGG pathways significantly associated with the network is presented in Table A6.
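To give a sense of how far 692 observed edges sit from the 222 expected, the sketch below computes an upper-tail probability under a Poisson model of the edge count. The Poisson assumption is ours and purely illustrative; we do not claim this is the exact test STRING uses for its PPI enrichment p-value. Working in log space avoids the overflow that a term such as 222^692 would otherwise cause.

```python
import math


def poisson_log_pmf(k: int, lam: float) -> float:
    """Natural log of the Poisson probability mass P(X = k)."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def poisson_upper_tail(observed: int, lam: float) -> float:
    """P(X >= observed) for X ~ Poisson(lam), with lam > 0."""
    if observed <= lam:
        # At or below the mean, use the complement of the lower tail.
        lower = sum(math.exp(poisson_log_pmf(k, lam)) for k in range(observed))
        return max(0.0, 1.0 - lower)
    # Above the mean each term is the previous one times lam / k, so the sum
    # is accumulated relative to the first term and converges quickly.
    rel_sum, rel_term, k = 1.0, 1.0, observed
    while rel_term > 1e-17 * rel_sum:
        k += 1
        rel_term *= lam / k
        rel_sum += rel_term
    return math.exp(poisson_log_pmf(observed, lam) + math.log(rel_sum))


p = poisson_upper_tail(692, 222.0)
print(p < 1e-16)  # True
```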

ERVK Integrase May Utilize Specific Cellular Transport Systems
The ERVK interactome contains proteins related to cellular transport. EB1 (MAPRE1) is an end-binding (EB) protein connected with both cell cycle and signaling pathways and is functionally associated with the regulation of microtubule dynamics [83]. Adaptor protein complex 2 (AP-2)-associated proteins (AP2M1 and CTTN) were identified, suggesting a role in cargo internalization via clathrin-mediated endocytosis and in actin dynamics [65,84]. Lastly, ERVK IN may interact with Pex14 and Pex13, which are not connected to the main network, to engage peroxisome import [63]. While these pathways were likely important for the ancestral exogenous ERVK to traverse the cell and mediate infection, it remains unclear how endogenous IN may interact with these systems.
ERVK IN is predicted to interact with cellular pathways involved in the cell cycle, cell signaling, immunity, and inflammation, as well as disease pathways associated with several cancers, the nervous system, and diabetes (Figure 4). KEGG pathways for several known ERVK-associated cancers were also identified, including lung cancer [85], myeloid leukemia [86], and hepatocellular carcinoma [87] (Figure 4). Glioma was also identified, although ERVK is downregulated in this condition [88]. Consistent with cellular transformation, cell cycle proteins were also over-represented in the pathway analysis; these relate specifically to the cyclin docking site ELM (DOC_CYCLIN_RxL_1) and the numerous FHA domain protein interaction sites (LIG_FHA_1 and LIG_FHA_2) in ERVK IN (Tables 1 and 2, Figures 1 and 2).
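Over-representation of a pathway, such as the cell cycle, among network proteins is commonly assessed with a one-sided hypergeometric test followed by a multiple-testing correction such as Benjamini–Hochberg. The sketch below implements both with the Python standard library; it illustrates the general technique under our own assumptions and is not a reimplementation of STRING's enrichment pipeline. The helper names and the small numbers in the usage line are a toy example.

```python
from math import comb
from typing import List, Sequence


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): n network proteins drawn from N background proteins, K in the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until this final division


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 4 of 20 background proteins drawn, 5 in the pathway, 3 overlap.
p = hypergeom_upper_tail(3, N=20, K=5, n=4)
print(round(p, 4), benjamini_hochberg([0.01, 0.04, 0.03]))
```

Keeping the binomial coefficients as Python integers avoids overflow for realistic proteome-sized backgrounds; only the final ratio is converted to a float.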

Diabetes
The role of ERVK in diabetes remains contentious [91][92][93][94]. However, network analysis suggests that the ERVK IN interactome is potentially linked to AGE-RAGE signaling in diabetic complications, insulin signaling, and insulin resistance (Figure 4).

Discussion
ERVK expression has been repeatedly associated with human disease states including cancer, neurological disease, and diabetes. By exploring the potential ERVK integrase interactome, we can postulate how this viral symbiont may contribute to disease pathogenesis via interaction with key proteins and pathways. Our analysis reveals that viral carcinogenesis and modulation of the DNA damage response are the most likely pathways to be pathologically associated with ERVK IN expression.
Retroviral integrase activity causes DNA lesions in the host genome as part of the proviral integration process [19]; therefore, interactions with DDR pathways are to be expected. Several DDR proteins have been shown to be essential for sealing the provirus into the host genome and for maintaining genome fidelity [19]. Yet, the impairment of select aspects of DDR has also been documented in exogenous retroviral infections, including HIV [95,96] and HTLV-1 [97,98]. This may be because NHEJ proteins also play an essential role in the innate immune recognition of retroviral cytosolic ssDNA intermediates and dsDNA pre-integration complexes [98,99]. Thus, retroviruses must balance the benefits and drawbacks of DDR outcomes through the engagement and modulation of specific proteins.
BRCT domain, FHA domain, and 14-3-3 proteins work in concert during the DNA damage response (reviewed in [100]). Many of these DDR proteins are also cellular targets of retroviruses and oncogenic viruses [98,101–104]. BRCA1 BRCT domains recognize phosphopeptides based on a pSXXF motif, but the XX residues and the surrounding amino acids also affect binding affinity and selectivity [105]. All the betaretroviral INs examined contained motifs predicted to interact with BRCT domains. However, only ERVK IN displayed a high-affinity (S.F.K) BRCA1 BRCT domain binding site; the only other similar ELM instance is found in the DDR protein Fanconi anemia group J protein (FANCJ/BACH1) [106]. It is also possible that dual anchoring onto ERVK IN using both the BRCT and FHA domains found in NBN or MDC1 may strengthen protein–protein interactions.
The utilization and evasion of 14-3-3 proteins are common among many viruses [104]. ERVK IN is unique in having a C-terminal RASTE motif, in addition to two other canonical arginine-containing phospho-motifs recognized by 14-3-3 proteins. Given that elevated expression of 14-3-3 proteins occurs in both cancers and neurodegenerative diseases [107,108], ERVK IN interaction with 14-3-3 protein members may be related to either modulation of the cell cycle and oncogenesis or regulation of protein aggregation, respectively. Deregulation of the interaction between 14-3-3 and RAF kinase can also lead to inappropriate downstream MAPK activity (associated with oncogenesis) [54,109] and may be an aspect to consider for the predicted ERVK IN network.
ABL1 appears to be a key hub protein linking DDR and downstream signaling cascades. Interestingly, DDR is known to be a rapid driver of ABL1 activation [110]. Ablation of ABL1 reduces retrovirus integration [111,112], while active ABL1 can turn on the HIV promoter independently of HIV Tat [113]. A putative interaction between ERVK IN and ABL1 may have been important for ERVK integration into germline cells, and it may additionally play a role in ERVK expression, particularly in neurodegenerative diseases displaying enhanced ABL1 activity [114,115].
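Identifying hub proteins such as ABL1 can be approximated by ranking nodes by degree and, as a tiebreaker, by how many distinct functional categories their neighbors span, which captures the idea of a protein bridging DDR and signaling modules. The edges and category labels below are hypothetical, the helper name `rank_hubs` is illustrative, and this simple degree-based ranking is our own choice; the study colored hubs by function in STRING rather than reporting a centrality score.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_hubs(
    edges: Iterable[Tuple[str, str]], category: Dict[str, str]
) -> List[Tuple[str, Tuple[int, int]]]:
    """Rank nodes by (degree, number of distinct neighbor categories), highest first."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    scores = {}
    for node, nbrs in neighbors.items():
        categories = {category[n] for n in nbrs if n in category}
        scores[node] = (len(nbrs), len(categories))
    # Sort by degree, then category span (both descending), then name for stable ties.
    return sorted(scores.items(), key=lambda item: (-item[1][0], -item[1][1], item[0]))


# Hypothetical toy network and function labels.
edges = [("ABL1", "ATM"), ("ABL1", "PRKDC"), ("ABL1", "MAPK14"),
         ("ATM", "PRKDC"), ("MAPK14", "YWHAE")]
category = {"ATM": "DDR", "PRKDC": "DDR", "MAPK14": "signaling", "YWHAE": "signaling"}
ranked = rank_hubs(edges, category)
print(ranked[0])  # ('ABL1', (3, 2))
```

Degree alone is the crudest centrality measure; betweenness centrality would more directly capture a node that connects otherwise separate modules, at the cost of more computation.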
DDR is intimately tied to the innate immune response, specifically NF-κB activation [116]. Considering ERVK's dependence on NF-κB to drive its own expression [11], it is conceivable that ERVK IN plays a role in preparing the host cell for viral transcription. 14-3-3 activity is key in driving ATM-TAK1-mediated NF-κB signaling during DDR [117,118]; thus, the predicted ERVK IN interaction with 14-3-3 (YWHAE) may be a mechanism to favor viral transcription. The MAPK p38 was also predicted to both phosphorylate and bind ERVK IN. This association may be linked to p38's regulation of inflammatory signaling, as well as its capacity to enhance the transcriptional activity of NF-κB p65 via modulation of the acetyltransferase activity of the coactivator p300 [119]. Sustained NF-κB activity is linked to oncogenesis [116] and ties into the strongest ERVK IN-linked KEGG pathway: viral carcinogenesis. However, enhanced ERVK IN-associated NF-κB signaling may also fit with inflammatory and neurodegenerative conditions.
ERVK IN stability and protein turnover are likely linked to its cellular protein partnerships. In the case of HIV, binding to select cellular proteins such as LEDGF/p75 and Ku70 prevents proteasomal degradation of integrase [120,121]. Similarly, c-Jun N-terminal kinase (JNK) phosphorylation of Ser57 in the core domain can make HIV IN a target for Pin1, thus enhancing its stability and activity [122,123]. In this study, the WW domain of Pin1 was predicted to be an interactor based on three [ST]P motifs in the C-terminal portion of ERVK IN. This raises the possibility that, similar to many other viral proteins [124], ERVK IN may be stabilized through Pin1 interaction. This interaction may underlie how elevated levels of ERVK IN are maintained and potentially drive pathology in select diseases, such as ALS and cancer.
Unlike the exogenous betaretroviral INs, only ERVK IN and some endogenous integrases contained canonical LIR motifs for binding Atg8 protein family members. Mammalian Atg8-like proteins include the LC3 and GABARAP families, which mediate selective autophagy and play essential roles in antiviral defense and innate immune signaling [125]. However, viruses often subvert autophagy to avoid viral protein clearance and repurpose Atg8 proteins and autophagosomal membranes for viral replication [125,126]. Considering the perturbations of autophagy in neurodegenerative disease [127,128], the interaction between ERVK IN and Atg8 proteins warrants further investigation.
Consistent with genomic instability profiles in cancer [129], ALS [130], and Alzheimer's disease [131], the ERVK interactome analysis identified each of these conditions as significant KEGG pathways. Despite differences in clinical presentation, the molecular underpinnings of cancer and neurodegenerative disease are remarkably similar and include alterations in DDR [129,130,132], 14-3-3 expression [133,134], p53 signaling [135,136], p38 signaling [137,138], and Wnt signaling [139,140], all of which are KEGG pathways enriched in the ERVK IN network. AGE-RAGE signaling was also identified as a potential pathway associated with the ERVK IN interactome. Not only is this pathway implicated in diabetic complications [141], but it also plays a role in the nuclear response to DNA damage [142], carcinogenesis [143], and inflammatory neurodegenerative diseases [144]. Collectively, our results point to ERVK IN driving a pattern of pathology that, depending on cellular context, may lead to carcinogenesis or neurodegeneration or contribute to diabetic complications. However, the engagement of DDR can also have beneficial impacts on lifespan extension, depending on tissue context and host genotype; thus, non-pathogenic effects of ERVK IN should also be considered [145,146].
Apart from the putative ERVK IN interaction partners, it is also important to consider which cellular proteins were not associated with the ERVK IN interactome. Interaction with LEDGF/p75 was not predicted, consistent with this partnership being limited to lentiviral integrases [147,148]. Another set of DDR proteins commonly found to impact retroviral integration and replication is the DNA-PK complex [99,149]. HIV integrase directly interacts with Ku70 [120]; while ERVK IN was predicted to be phosphorylated by DNA-PKcs (PRKDC), it contained no ELMs suggesting direct interaction with Ku80 or Ku70. Another apparent difference between HIV and ERVK lies in their use of EB proteins for microtubule trafficking: ERVK IN contains an SxIP motif that binds EBH domains, whereas the HIV capsid has EB-like motifs that interact with SxIP motifs in plus-end tracking proteins (+TIPs) [150]. These genus-specific distinctions are likely to emerge as important considerations for therapeutic targeting strategies and imply that pharmaceuticals geared toward HIV infection may not consistently translate for use in ERVK-associated disease.
Another consideration that stems from this study is the choice of animal models in ERVK research and their caveats. A diversity of animals outside the primate lineage host ERVK IN-like sequences, including rodents, ungulates, fish, and insects. Drosophila, a common model organism, also contains ERVK IN-like elements in its genome, specifically the LTR retrotransposons flea and Xanthias, as identified by FlyBase (Table A4). The transposable element Xanthias is known to be active in D. melanogaster [151,152], and it shares a degree of similarity with ERVK IN. The presence and activity of these elements are important factors to consider when designing and interpreting experiments in such models.
It is striking how little we understand of the impact that endogenous viral symbionts have on cellular function. Herein, we have predicted that ERVK IN may participate in the modulation of cellular pathways such as DDR, cell cycle regulation, and kinase signaling cascades by way of select protein interaction motifs. The main caveat of in silico predictions is the requirement for experimental validation; while research into ERVK IN is currently underway, this study suggests that a myriad of disease-related betaretroviral integrase interactions remain to be explored.

Informed Consent Statement: Not applicable.
Data Availability Statement: National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/, accessed on 12 July 2021) can be used to access sequences listed in the paper.

Acknowledgments:
The authors would like to thank Samuel Narvey, Megan Rempel and Alex Vandenakker for their peer review of the manuscript drafts.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Dedication: This study is dedicated to patients with ALS; we are working on it!
Appendix A

Table A5.
Category | ELM | STRING Predicted Interactor (Gene Name) | Network
Cleavage
Circle symbol: denotes a protein depicted in the network analysis in Figure 3.