Investigation and Functional Enrichment Analysis of the Human Host Interaction Network with Common Gram-Negative Respiratory Pathogens Predicts Possible Association with Lung Adenocarcinoma

Haemophilus influenzae (Hi), Moraxella catarrhalis (MorCa) and Pseudomonas aeruginosa (Psa) are three of the most common gram-negative bacteria responsible for human respiratory diseases. In this study, we used functional enrichment analysis (FEA) to characterize the human gene interaction network with these bacteria in order to elucidate the full spectrum of induced pathogenicity. The Host Pathogen Interaction Database (HPIDB 3.0) was used to identify the human proteins that interact with the three pathogens. FEA was performed via the ToppFun tool of the ToppGene Suite and the GeneCodis database so as to identify enriched gene ontologies (GO) of biological processes (BP), cellular components (CC) and diseases. In total, 11 human proteins were found to interact with the bacterial pathogens. FEA of BP GOs revealed associations with mitochondrial membrane permeability in the context of apoptotic pathways. FEA of CC GOs revealed associations with focal adhesion, cell junctions and exosomes. The most significantly enriched annotations in diseases and pathways were lung adenocarcinoma and cell cycle, respectively. Our results suggest that the Hi, MorCa and Psa pathogens could be related to the pathogenesis and/or progression of lung adenocarcinoma via the targeting of the epithelial cellular junctions and the subsequent deregulation of the cell adhesion and apoptotic pathways. These hypotheses should be experimentally validated.

Hi is responsible for CAP and exacerbations of COPD. The non-type b strains are linked to more invasive disease in CAP patients while non-typable strains (NTHi) are mainly responsible for COPD exacerbations. Following the introduction of the vaccine against Hi strain b (Hib), the incidence of infection due to other strains, such as type f (Hif) and NTHi, has increased [5]. More specifically, a study conducted in Europe from 2007 to 2014 reported a 3.3% annual increase in Hi invasive disease, with NTHi being responsible for 78% of all cases [6]. Moreover, a recent study in the US indicated a 16% increase during 2009-2015 compared with 2002-2008 [7].
MorCa is an opportunistic pathogen of the respiratory tract, commonly associated with otitis media in children and COPD exacerbations in adults [8]. It has been estimated that 10% of all exacerbations of COPD and 2-4 million exacerbations per year in the USA are attributed to MorCa [9]. Additionally, cases of MorCa and Hi co-infection have been reported, where it has been suggested that the two pathogens act synergistically, facilitating each other's pathogenicity and survival in the respiratory tract [10].
Psa is not a common cause of CAP, yet it inflicts more severe disease. It is mostly responsible for HAP, intensive care unit (ICU) acquired pneumonia and opportunistic infections. Further, it is one of the most common multi-drug resistant pathogens isolated in infectious respiratory specimens [4,11]. Psa has been associated with more severe clinical manifestations and higher mortality rates in patients with bronchiectasis [12].
Molecular pathways that mediate pathogenicity of gram-negative bacteria, especially Hi, MorCa and Psa, have been the topic of various studies. The Hi Lipoprotein H (lph) and Protein E have been shown to facilitate pathogen resistance to host immune response by inhibiting complement factor H and membrane attack complex (MAC), respectively [13,14]. The adhesive protein UspA1 of MorCa has been characterized as being important for the adhesion and internalization of pathogens in the epithelium [15]. Lastly, the Psa Exoenzyme S (ExoS) has been reported to bind to the Rho GTPases, thus contributing to serum resistance by blocking phagocytosis [16]. Despite these findings, there is still a lack of data regarding the broad spectrum of the host-pathogen gene interaction network (interactome) and the functional annotations associated with this network. Moreover, novel findings on pathogen synergy [10] and pathogen resistance to host defense mechanisms [17] further highlight the importance of a detailed analysis of the human-pathogen interactome. We therefore conducted an in silico investigation of this interactome using bioinformatics tools, and identified the enriched gene ontologies (GOs) of the functional annotations associated with the host's biological processes (BP), cellular components (CC) and associated diseases.

Materials and Methods
The Host Pathogen Interaction Database version 3.0 (HPIDB 3.0) was used to identify the human proteins interacting with the three pathogens of interest. HPIDB is a database that facilitates host pathogen interaction (HPI) prediction and analysis, collecting published, experimental molecular HPI data from 12 different databases [18]. The database allows searching based on protein sequence, keywords or homologous HPI. In this study, the search was performed using the Keyword tool, which offers four query options. We used the "Taxon name/Species" option and entered Haemophilus influenzae, Moraxella catarrhalis and Pseudomonas aeruginosa as keywords. The search returned both human and non-human proteins interacting with the three pathogens; only the human proteins were analyzed further.
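The host filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`pathogen`, `pathogen_protein`, `host_protein`, `host_taxon_id`) are hypothetical and do not reproduce the actual HPIDB export schema, and the non-human row is an invented placeholder. The three human rows use interactions named in this article, and 9606 is the NCBI Taxonomy identifier for Homo sapiens.

```python
HUMAN_TAXON_ID = 9606  # NCBI Taxonomy identifier for Homo sapiens

# Hypothetical records: the field names are assumptions, not the HPIDB schema,
# and the last row is an invented non-human placeholder (10090 = Mus musculus).
records = [
    {"pathogen": "Pseudomonas aeruginosa", "pathogen_protein": "ExoS",
     "host_protein": "RAC1", "host_taxon_id": 9606},
    {"pathogen": "Moraxella catarrhalis", "pathogen_protein": "UspA1",
     "host_protein": "CEAM1", "host_taxon_id": 9606},
    {"pathogen": "Haemophilus influenzae", "pathogen_protein": "Protein E",
     "host_protein": "VTN", "host_taxon_id": 9606},
    {"pathogen": "Pseudomonas aeruginosa", "pathogen_protein": "ExoS",
     "host_protein": "PLACEHOLDER_NONHUMAN", "host_taxon_id": 10090},
]


def human_host_proteins(rows, host_taxon_id=HUMAN_TAXON_ID):
    """Return the sorted, de-duplicated host proteins for one host taxon."""
    return sorted(
        {row["host_protein"] for row in rows
         if row["host_taxon_id"] == host_taxon_id}
    )


human_proteins = human_host_proteins(records)
print(human_proteins)
```

Filtering on a numeric taxon identifier rather than on a species name avoids mismatches caused by spelling or synonym differences between source databases.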
The identified HPIDB protein names were entered in the UniProt database in order to retrieve the names of the corresponding genes [19]. These gene names were then submitted to the ToppFun tool of the ToppGene Suite so as to perform Functional Enrichment Analysis (FEA) relative to BP, CC and diseases. ToppFun performs FEA of an input gene list based on transcriptome, proteome, regulome, ontologies, phenotype, pharmacome and bibliome, assuming the whole genome as background [20]. The p values were corrected for multiple testing with the Bonferroni method and with the false discovery rate (FDR) methods of Benjamini-Hochberg and Benjamini-Yekutieli to determine statistical significance.
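For readers unfamiliar with the three corrections named above, the following is a minimal pure-Python sketch of how adjusted p values are conventionally computed. It is not ToppFun's implementation, which is not described here; the function names and the example p values are hypothetical and chosen only for illustration.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def step_up_fdr(pvals, dependent=False):
    """Benjamini-Hochberg adjusted p values.

    With dependent=True the Benjamini-Yekutieli correction is applied, which
    multiplies by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m.
    """
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1)) if dependent else 1.0
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four annotations (illustrative only).
raw = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw)
bh = step_up_fdr(raw)
by = step_up_fdr(raw, dependent=True)
print(bonf, bh, by)
```

Bonferroni controls the family-wise error rate and is the most conservative of the three; Benjamini-Hochberg controls the FDR under independence or positive dependence, while Benjamini-Yekutieli remains valid under arbitrary dependence at the cost of power.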
In order to corroborate our findings, we performed the same analysis by querying the GeneCodis database. GeneCodis is a bioinformatics platform designed for FEA that integrates functional, regulatory or structural information, searches for frequent patterns in the space of annotations and estimates their statistical relevance [21]. The analysis was performed with respect to BP GOs, CC GOs and KEGG pathways. The hypergeometric p values retrieved from the analysis were adjusted using the FDR method of Benjamini-Hochberg. The enrichment significance cutoff level for the adjusted p value was 0.05 in both databases. All analyses were conducted in February 2020.
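The hypergeometric p value mentioned above is the probability of observing at least the given number of annotated genes in the input list when the list is drawn at random from the background. A minimal sketch using only the Python standard library follows; the function name and all counts in the example (background size, annotation size, overlap) are hypothetical and are not values from this study, and this is not GeneCodis's own code.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_input, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: genes in the background (e.g. the whole genome)
    n_annotated:  background genes carrying the annotation
    n_input:      genes in the input list
    n_overlap:    input genes carrying the annotation
    """
    total = comb(n_background, n_input)
    upper = min(n_annotated, n_input)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts for illustration only: an 11-gene input list,
# 7 of which fall in an annotation of 200 genes, against 20000 background genes.
p_value = hypergeom_enrichment_p(20000, 200, 11, 7)
ALPHA = 0.05  # cutoff used in the text, applied there to adjusted p values
print(p_value, p_value < ALPHA)
```

In a full analysis, one such p value is computed per annotation and the resulting vector is adjusted for multiple testing before being compared with the 0.05 cutoff.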

Identification of the Host-Pathogen Interactomes
In total, 11 human proteins were found to interact with the bacterial pathogens as evidenced by the HPIDB 3.0 analysis. The genes encoding those 11 proteins were the following: CFAH (complement factor H; gene symbol CFH), VTN (vitronectin), CEAM1 (carcinoembryonic antigen-related cell adhesion molecule 1; gene symbol CEACAM1), YWHAB (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein beta), YWHAE (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein epsilon), YWHAG (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein gamma), YWHAH (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein eta), YWHAQ (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein theta), YWHAZ (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein zeta), SFN (stratifin) and RAC1 (ras-related C3 botulinum toxin substrate 1). The seven genes from YWHAB through SFN are all members of the 14-3-3 family of protein kinase C inhibitors. The exact type of host-pathogen interactions along with the experimental methods used in each case are shown in Table 1. A graphical representation of the host-pathogen interactomes is displayed in Figure 1.
In the ToppFun analysis, in the context of BP GOs, the top five significantly enriched annotations pertained to protein insertion into the mitochondrial membrane and mitochondrial membrane permeability, both involved in apoptotic pathways. Interestingly, the genes related to these processes were found to be the seven genes of the 14-3-3 family in all five cases (Table 2). With respect to CC GOs, the top five annotations were related to membrane junctions and cell-substrate adhesion (Table 3). RAC1, CEACAM1 and all 14-3-3 family genes, except for SFN, were the predominant genes in these lists. Regarding diseases, the disorder in which the input genes were most over-represented was lung adenocarcinoma, followed by soft drusen and the bipolar and glomerular renal disorders (Table 4).
Remarkably, five out of the seven genes that were found to relate to adenocarcinoma were also over-represented in the BP and CC annotations, with small differences in each case. This finding is highlighted in Table 5.
In the GeneCodis analysis, in terms of BP GOs, the first two annotations were related to protein insertion into the mitochondrial membrane involved in apoptotic signaling pathway and to membrane organization. The over-represented genes in this category belonged to the 14-3-3 family, as in the case of ToppFun (Table 6). The annotations with the highest level of confidence concerning cellular components were the extracellular exosomes and focal adhesion. All 11 genes were correlated to the exosomes annotation, while YWHAQ, YWHAZ, YWHAG, YWHAE, YWHAB and RAC1 were over-represented in the focal adhesion annotation (Table 7). Finally, the analysis with respect to the KEGG pathways revealed an involvement in the regulation of the cell cycle and a strong correlation with viral carcinogenesis. The genes that were found to correlate with the former are the seven genes of the 14-3-3 protein family, while the ones predicted to associate with the latter were YWHAQ, YWHAZ, YWHAH, YWHAG, YWHAE, YWHAB and RAC1. The KEGG pathway predictions are shown in detail in Table 8. Of interest, five genes (RAC1, YWHAB, YWHAE, YWHAG and YWHAZ) were common to the lung adenocarcinoma and viral carcinogenesis ontologies, a finding which is presented in the form of a Venn diagram in Figure 2.
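The overlap between the two gene sets can be reproduced with a simple set intersection. The sketch below uses the viral carcinogenesis genes listed above and the seven lung adenocarcinoma genes named in the FEA interpretation section of this article (RAC1, CFH, SFN, YWHAB, YWHAE, YWHAG and YWHAZ); the variable names are illustrative.

```python
# Gene sets as reported in the text of this article.
lung_adenocarcinoma = {"RAC1", "CFH", "SFN", "YWHAB", "YWHAE", "YWHAG", "YWHAZ"}
viral_carcinogenesis = {"YWHAQ", "YWHAZ", "YWHAH", "YWHAG", "YWHAE", "YWHAB", "RAC1"}

# Regions of a two-set Venn diagram.
shared = sorted(lung_adenocarcinoma & viral_carcinogenesis)
only_lung = sorted(lung_adenocarcinoma - viral_carcinogenesis)
only_viral = sorted(viral_carcinogenesis - lung_adenocarcinoma)

print("shared:", shared)
print("lung adenocarcinoma only:", only_lung)
print("viral carcinogenesis only:", only_viral)
```

The intersection contains the five genes reported in the text, while CFH and SFN are specific to the lung adenocarcinoma annotation and YWHAH and YWHAQ to viral carcinogenesis.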

Biological Interpretation of the Human-Gram (−) Pathogens Interactome
In total, 11 human proteins were identified as interactors with the three gram-negative pathogens. It has been reported that the lipoprotein binding factor H of Hi interacts with the human CFAH and VTN proteins, whereas the surface adhesion protein E is associated with VTN only. These types of interactions have been suggested to mediate the inhibition of the indirect complement activation pathway, serum resistance and pathogen adhesion to alveolar epithelium [14,22]. Human CEAM1 was found to interact with the UspA1 adhesin of MorCa. The ability of UspA1 to bind to CEAM1 by a trimeric coil has already been described [23]. UspA1 is an adhesive molecule that facilitates pathogen invasion of the respiratory epithelium, and the adhesive properties of CEAM1 have also been reported [15,24]. Interaction of other gram-negative bacteria with CEAM1 has been found to mediate pathogen invasion; thus, this could also be the case for MorCa [24]. ExoS of Psa was associated with RAC1 and the members of the 14-3-3 protein family. ExoS has been shown to mediate RAC1 downregulation, thereby increasing Psa resistance to host cell defenses and blocking phagocytosis [16]. Furthermore, ExoS has been shown to bind all 14-3-3 family proteins, a step necessary for its activation [25,26]. This phenomenon is linked with increased Psa pathogenicity. Overall, these interactions could be interpreted as inducers of pathogen invasion and resistance, thereby facilitating pathogenesis.

Gene Involvement in Apoptotic Pathways, Cellular Junctions, Cell Cycle, Carcinogenesis and Lung Adenocarcinoma: FEA Interpretation
The ToppFun FEA, relative to diseases, predicted a possible association of RAC1, CFH and five of the 14-3-3 protein family genes, namely SFN, YWHAB, YWHAE, YWHAG and YWHAZ, with lung adenocarcinoma. The GeneCodis results were in line with this prediction and further revealed an association of several of the interactome genes with the cell cycle. These findings suggest that the genes of interest could be linked to pathways which mediate the deregulation of the cell cycle, thus inducing carcinogenesis and, in this case, lung adenocarcinoma.
Additionally, ToppFun FEA relative to BP GOs revealed an association of all the 14-3-3 protein family genes with apoptotic pathways, whereas the same genes, together with RAC1 and CEACAM1, were over-represented in the CC GO annotations. Interestingly, FEA results from both databases shared substantial similarities. In the BP GOs and CC GOs, the common annotations were the "positive regulation of protein insertion into mitochondrial membrane involved in apoptotic signaling pathway" and "focal adhesion", respectively. Of note, in each common annotation, the associated genes were the same in both ToppFun and GeneCodis.
It has been reported that RAC1 contributes to the re-assembly and maintenance of the adherens junctions, thus preserving the epithelial integrity and the endothelial barrier function [27,28]. Moreover, it has been demonstrated that the inhibition of RAC1 in pulmonary endothelial cells leads to leaky junctions, a finding which highlights its protective role in the endothelial barrier [29]. Regarding the members of the 14-3-3 protein family (YWHAB, YWHAE, YWHAG, YWHAZ) and their role in cell junction regulation, previous studies have provided evidence that the binding of these proteins to protein kinase C or to Connexin 43 inhibits the activity of gap junctions [30,31]. It has also been suggested that the same proteins regulate cell adhesion by interacting with integrin β1 [32]. When it comes to their involvement in apoptotic pathways, all 14-3-3 proteins have been found to bind the pro-apoptotic BCL family protein members BAD, BAX and BIN, inducing their inactivation and the inhibition of mitochondria-mediated apoptosis [33-36]. Our findings, in line with these reported data, link the aforementioned proteins to the regulation of intercellular junctions, cell adhesion and apoptosis. The functional interplay of junctional proteins with cell adhesion and apoptosis and the contribution of this interaction to the pathogenesis and/or progression of various types of cancer have been well documented [37,38]. Moreover, Psa has been reported to contribute to endothelial and epithelial barrier disruption. One of the Psa proteins, LasB metalloprotease, induces interruption of intercellular and cell-to-matrix junctions of endothelial cells [39]. Additionally, Psa caused reorganization of tight junctions and reduction of transepithelial resistance of bronchial epithelial cells in experiments conducted in vitro and in vivo [40]. These findings suggest that Psa could facilitate malignant behaviors such as transepithelial invasion and hematogenous metastasis.
The results of our study suggest that human infection with the Hi, MorCa and Psa gram-negative bacteria could also be related to the pathogenesis and/or progression of lung adenocarcinoma, possibly through the targeting of the epithelial cellular junctions and the subsequent deregulation of the cell adhesion and apoptotic pathways.
Regarding the enrichment of genes in the lung adenocarcinoma annotation, it has already been shown that RAC1 is important for lung cancer stem cell activity, and that its knockdown results in impaired proliferation, colony formation, adhesion, migration and invasion of human lung adenocarcinoma cells [41]. In tumor tissue from patients with lung adenocarcinoma and lung squamous cell carcinoma, RAC1 was overexpressed compared to normal tissue [42]. This finding was related to poor prognosis and high risk of lymphatic metastasis. In terms of viral carcinogenesis, RAC1 was found to facilitate HPV-8 related skin papilloma development [43]. On the other hand, the CFAH protein has been shown to increase the oncogenic action of adrenomedullin, a peptide that promotes tumor growth in various cancer types, including lung adenocarcinoma [44,45]. It has also been reported that complement factor H is overexpressed in non-small cell lung cancer, blocking complement action on cancer cells [44]. As for SFN, it has been demonstrated that it is overexpressed in invasive lung adenocarcinoma cell lines compared to in situ adenocarcinoma or normal lung tissue, and that it induces proliferation of cancer cells [46]. Moreover, the 14-3-3 protein family members and, more specifically, the YWHAG and YWHAZ proteins have been linked to tumorigenesis and induction of malignant phenotypes, including cell growth, migration and invasion [47-49]. The same proteins are also involved in the regulation of the cell cycle by interacting with the chk1 kinase on the one hand and by inhibiting the cdc25C protein on the other [50]. Infection with the gram-negative E. coli has been associated with tumor growth, progression and metastasis of non-small cell lung cancer, by promoting lipid synthesis through the TLR4/9 pathway and, also, by facilitating cancer stem cell properties through the TLR4/IL-33 pathway [51-54].
Overall, published data have suggested a link between the proteins identified in our study and cell cycle processes, carcinogenesis and malignant phenotypes.

Novelty, Weaknesses and Future Directions of Our Study
The novelty of our study lies in the previously unreported association of the MorCa/Psa/Hi-Human interactome with carcinogenesis and lung adenocarcinoma, as predicted by the in silico analysis tools. In addition to this, we have identified specific BP GOs, CC GOs and KEGG pathways that are targeted by the human proteins interacting with the gram-negative pathogens. Our findings are also indicative of an overlap between the aforementioned annotations, as the proteins that were found to associate with lung adenocarcinoma have also been related to the regulation of both the cell cycle and apoptotic pathways. Therefore, altered expression or function of these proteins due to the interaction with these pathogens could be followed by the disruption of the above processes, possibly leading to carcinogenesis. Additionally, these same proteins are also linked to cell junctions and adhesive components, which could be important for the adhesive, invasive and metastatic properties of a tumor. Altogether, this study not only predicts an association with lung adenocarcinoma but also suggests potential pathophysiological mechanisms that could lead to this clinical entity.
It should be mentioned that our in silico analysis has certain limitations, since our results will have to be experimentally verified so as to test the aforementioned hypotheses. Our findings, however, provide the basis for an experimental rationale to be followed in future clinical and translational studies with respect to the most common respiratory tract gram-negative pathogens in patients with lung adenocarcinoma and the identification of the relationship between infection and carcinogenesis. It is worth mentioning that, in order to verify experimental data availability, we also performed a search in the Gene Expression Omnibus (GEO) datasets, focusing on studies reporting on the association of gram-negative bacteria with lung cancer or lung adenocarcinoma. Our search did not yield any transcriptomic data; thus, future research is imperative in order to validate the bioinformatic predictions and further expand our knowledge of the possible involvement of gram-negative bacterial respiratory infection in lung carcinogenesis.
Author Contributions: L.-E.G. and A.-S.G.: data curation and analysis, investigation, and writing. C.H. and K.I.G.: review and editing. E.R.: conceptualization, investigation, validation, and review and editing. S.G.Z.: supervision, validation, and review and editing. All authors have read and agreed to the published version of the manuscript.