Proteomic Network Analysis of Bronchoalveolar Lavage Fluid in Ex-Smokers to Discover Implicated Protein Targets and Novel Drug Treatments for Chronic Obstructive Pulmonary Disease

Bronchoalveolar lavage of the epithelial lining fluid (BALF) can sample the profound changes in the airway lumen milieu prevalent in chronic obstructive pulmonary disease (COPD). We compared the BALF proteome of ex-smokers with moderate COPD who are not in exacerbation status to non-smoking healthy control subjects and applied proteome-scale translational bioinformatics approaches to identify potential therapeutic protein targets and drugs that modulate these proteins for the treatment of COPD. Proteomic profiles of BALF were obtained from (1) never-smoker control subjects with normal lung function (n = 10) or (2) individuals with stable moderate (GOLD stage 2, FEV1 50–80% predicted, FEV1/FVC < 0.70) COPD who were ex-smokers for at least 1 year (n = 10). After identifying potential crucial hub proteins, drug–proteome interaction signatures were ranked by the computational analysis of novel drug opportunities (CANDO) platform for multiscale therapeutic discovery to identify potentially repurposable drugs. Subsequently, a literature-based knowledge graph was utilized to rank combinations of drugs that most likely ameliorate inflammatory processes. Proteomic network analysis demonstrated that 233 of the >1800 proteins identified in the BALF were significantly differentially expressed in COPD versus control. Functional annotation of the differentially expressed proteins was used to detail canonical pathways containing the differential expressed proteins. Topological network analysis demonstrated that four putative proteins act as central node proteins in COPD. The drugs with the most similar interaction signatures to approved COPD drugs were extracted with the CANDO platform. The drugs identified using CANDO were subsequently analyzed using a knowledge-based technique to determine an optimal two-drug combination that had the most appropriate effect on the central node proteins. Network analysis of the BALF proteome identified critical targets that have critical roles in modulating COPD pathogenesis, for which we identified several drugs that could be repurposed to treat COPD using a multiscale shotgun drug discovery approach.

Although structural changes in the airways, parenchyma, and pulmonary vessels are typical in patients with COPD, the lower airways and the alveoli are the initial sites of the inflammatory process [17,18]. The inflammatory process initiated by smoking persists after cessation and is likely exaggerated by autoimmunity and infection [19,20]. Accurate and precise measurement of the molecular mediators in the airways should aid in rigorous analysis of their role in disease.
There has been a keen interest in understanding the genetic determinants of COPD, as the interaction between genes and the environment leads to protein expression, ultimately resulting in either healthy or disease states. However, genomic data alone does not predict protein abundance or activity; proteins are the ultimate participants in integrated biological processes that lead to healthy physiological function or pathology. Proteome-based analysis of bronchoalveolar lavage fluid (BALF) in COPD can identify tissue-specific markers of inflammation that can lead to understanding the mechanisms of COPD progression.
We sought to determine an unbiased proteome-based analysis of BALF in COPD under stable conditions (not in exacerbation status) to identify a broad series of molecules involved in COPD pathogenesis. A label-free proteomics mass spectroscopy method was utilized. The differentially expressed proteins were analyzed using multiple bioinformatics tools to identify critical pathways that were altered in these ex-smoker patients with COPD compared to never-smoker healthy controls and important "hub" proteins implicated in COPD etiology, and identify putative drug candidates that can be repurposed to treat COPD.
The raw proteomic data used in this manuscript were initially detailed in a previously published methodology manuscript using strict criteria (two peptide identification criteria for a protein, ≥1.5 folds change, and p-value < 0.05) to identify 423 individual proteins with 76 proteins expressed differently between COPD and controls [21]. In this analysis, we adopted a pragmatic approach to the same raw proteomic data (one peptide identification criterion, ≥1.5 folds change, and p-value < 0.05) that identified 1831 individual proteins and 233 differentially expressed proteins between the two groups. The latter, more practical approach provides important information for biomarker and therapeutic target discovery that may be utilized in future research to discover valuable interventions.

Study Population Characteristics
Characteristics for subjects included in the BALF study are shown in Table 1, with the only significant differences between the two groups being in tobacco smoke exposure and lung function. Pack-Years: the average number of packs of cigarettes smoked per week multiplied by the years the subject smoked cigarettes. *: The significant differences in lung function noted between the control subjects and COPD subjects are expected based on a priori selection of the cohort based on different lung function capacities.

BALF Proteome Characteristics
A total of 1831 unique proteins were identified in the BALF proteome. A total of 233 proteins (>1.5-fold absolute change, p-value < 0.05) had a significant differential expression in BALF samples from patients with COPD versus healthy ex-smokers; 138 proteins were decreased in COPD while 95 proteins were increased (Tables S2 and S3).

Bioinformatic Pathway Analysis of BALF Proteomic Data
The protein expression datasets were imported into IPA (Ingenuity Systems) and projected onto the relevant biological pathways; processes linked to the differentially expressed proteins were analyzed with the IPA's manually curated knowledge database. Of the 233 differentially expressed proteins, 217 matched the IPA-curated database and were analyzed. Sixteen pathways were noted to have several proteins associated with the differentially expressed BALF dataset (Table S6), including proteins implicated in cellular movement, cellular death and survival, cell morphology, immune cell trafficking, and cell cycle. Figures S1-S4 depict IPA networks of selected pathways with the highest number of differentially expressed proteins.

Computational Drug Prediction
One-hundred-and-thirty out of 233 BALF differentially expressed proteins were identified in the CANDO human protein library. This subset of proteins within the CANDO platform was used to predict 189 putative drug candidates with the most similar protein interaction signatures to the set of known drugs used to treat COPD ( Figure 1 and Table S7). Many of the drugs were corticosteroids; however, other putative drugs included tezacaftor [33], a recently developed drug to potentiate sodium channel activity to treat cystic fibrosis; two additional drugs predicted to treat COPD, gemfibrozil [34,35], and pioglitazone [35,36], are drugs currently used to treat hyperlipidemia and diabetes, respectively. A total of 233 proteins were identified as differentially expressed between COPD patients and healthy controls by mass spectrometry. Of these, 214 were represented in the Elsevier knowledge graph [37], with the remainder comprising specific immunoglobulin chain proteins. A query of the knowledge graph for documented regulatory interactions between these protein entities yielded 206 regulatory edges supported by 807 references (with a median of one reference per edge). One-hundred-and-twelve of the 214 identified proteins could not be connected to the broader network circuit by a documented interaction. The protein entities in this network were then assessed in terms of their importance as mediators of signal transfer based on their betweenness centrality ( Figure 2).

Network Topological Analysis
Four nodes representing proteins in the network stood out based on the normalized betweenness centrality values representing a greater than linear baseline increase from the next lower-ranking node: fibronectin, vimentin, intercellular adhesion molecule 1 (ICAM1), and galectin-3. These potentially key signaling mediators had a betweenness centrality of at least 25% of the maximum.
Analysis of the initial data reveals fibronectin and ICAM1 are reduced in COPD patients relative to healthy controls; thus, any candidate therapeutic should target an increase in their activity. The reverse is true for vimentin and galectin-3. We, therefore, sought drugs or combinations of drugs predicted to accomplish the appropriate activation or inhibition of the four most central nodes. Specifically, drugs that will promote central node proteins that were downregulated in the COPD cohort and inhibition of central node proteins that were overabundant in COPD. Therefore, the idealized drug vector constitutes interactions leading to desirable modulation of the central hub protein. CANDO identified 189 distinct drugs (Figure 1, Tables S6 and S7) with relevance for COPD; 39 of these represented in the Elsevier knowledge graph [37] were analyzed for their enrichment of the desired agonist and antagonist effects on the most central entities in the protein regulatory network. Highly enriched drugs or drug pairs are more likely than randomly selected drugs to exert appropriate inhibition or promotion of the most central proteins. Two single drugs (fluocinolone acetonide and dexrazoxane) and 57 two-drug combinations were significantly enriched. Fluocinolone acetonide and dexrazoxane appeared in 54% and 46% of all significantly enriched two-drug combinations, respectively, far greater than the other drugs appearing in these combinations ( Figure 3). Fluocinolone acetonide and dexrazoxane are the most enriched two-drug combination leading to an idealized drug vector.
. Figure 1. Putative drug candidates for treating COPD generated using the CANDO platform. A subset of 130 proteins from the CANDO human protein library was identified from 233 differen tially expressed proteins in the BALF. These 130 proteins were utilized to generate BALF-specifi interaction signatures for 2450 FDA-approved drugs via our in-house docking protocol BANDOCK (see methods). These drug-proteome interaction signatures were compared to 34 known drugs used to treat COPD to predict 189 of the most similar putative drug candidates. The 189 drugs are repre sented by colored circles, with the diameter of the circles decreasing with descending overall rank Drug name labels are depicted for the selection of the 189 drugs shown by the colored circles. Th horizontal axis plots the consensus score count or the number of times the particular drug is listed within the top 30 most similar drugs to those known to treat COPD based on interaction signatur similarity. The vertical axis plots the average of the cumulative ranks of the consensus scores for th putative drug. The overall rank of a putative drug is determined by initially sorting the drug by th consensus score, as noted above, and then additional sorting by the average rank. Many of the drug candidates were corticosteroids not used to treat COPD; however, other putative drugs included tezacaftor, a drug to potentiate sodium channel activity in the treatment of cystic fibrosis; two addi tional drugs predicted to treat COPD, gemfibrozil, and pioglitazone, are drugs currently used t treat hyperlipidemia and diabetes, respectively. This analysis indicates that the CANDO platform applied to the BALF proteome can generate putative drug candidates for COPD treatment.

Literature Informed Protein-Protein and Protein-Drug Interaction Network
A total of 233 proteins were identified as differentially expressed between COPD pa tients and healthy controls by mass spectrometry. Of these, 214 were represented in th Elsevier knowledge graph [37], with the remainder comprising specific immunoglobulin chain proteins. A query of the knowledge graph for documented regulatory interaction between these protein entities yielded 206 regulatory edges supported by 807 reference (with a median of one reference per edge). One-hundred-and-twelve of the 214 identified proteins could not be connected to the broader network circuit by a documented interac Figure 1. Putative drug candidates for treating COPD generated using the CANDO platform. A subset of 130 proteins from the CANDO human protein library was identified from 233 differentially expressed proteins in the BALF. These 130 proteins were utilized to generate BALF-specific interaction signatures for 2450 FDA-approved drugs via our in-house docking protocol BANDOCK (see methods). These drug-proteome interaction signatures were compared to 34 known drugs used to treat COPD to predict 189 of the most similar putative drug candidates. The 189 drugs are represented by colored circles, with the diameter of the circles decreasing with descending overall rank. Drug name labels are depicted for the selection of the 189 drugs shown by the colored circles. The horizontal axis plots the consensus score count or the number of times the particular drug is listed within the top 30 most similar drugs to those known to treat COPD based on interaction signature similarity. The vertical axis plots the average of the cumulative ranks of the consensus scores for the putative drug. The overall rank of a putative drug is determined by initially sorting the drug by the consensus score, as noted above, and then additional sorting by the average rank. Many of the drug candidates were corticosteroids not used to treat COPD; however, other putative drugs included tezacaftor, a drug to potentiate sodium channel activity in the treatment of cystic fibrosis; two additional drugs predicted to treat COPD, gemfibrozil, and pioglitazone, are drugs currently used to treat hyperlipidemia and diabetes, respectively. This analysis indicates that the CANDO platform applied to the BALF proteome can generate putative drug candidates for COPD treatment. BALF network centrality nodes ranked by betweenness centrality. Betweenness centrality quantitatively describes how a node (in this case, a differentially expressed protein in the BALF proteome) mediates the interaction between communities of neighboring nodes in the network. Shown are 44 network entities with betweenness centrality > 0.01, normalized to the maximum betweenness centrality present in the network. The betweenness centrality scores for all nodes were expressed as fractions of the maximum betweenness centrality present in the network. The (red and blue) colors indicate the needed effect (inhibition/induction) to restore these entities from COPD levels to the normal levels in healthy control subjects. The four nodes with ≥25% of the maximum betweenness centrality (fibronectin) with normalized betweenness centrality values representing a greater than the linear baseline increase from the next lower ranking node are fibronectin, vimentin, intercellular adhesion molecule1 (ICAM1), and galectin-3 (LGALS3). These potential key signaling mediators had a betweenness centrality of at least 25% of the maximum. Topological analysis of the interaction network regulatory interactions documented in the literature suggests that these proteins were central mediators of COPD [38][39][40][41][42][43]. Colors indicate the needed effect to restore these entities to the normal levels in healthy control subjects.

Figure 2.
BALF network centrality nodes ranked by betweenness centrality. Betweenness centrality quantitatively describes how a node (in this case, a differentially expressed protein in the BALF proteome) mediates the interaction between communities of neighboring nodes in the network. Shown are 44 network entities with betweenness centrality > 0.01, normalized to the maximum betweenness centrality present in the network. The betweenness centrality scores for all nodes were expressed as fractions of the maximum betweenness centrality present in the network. The (red and blue) colors indicate the needed effect (inhibition/induction) to restore these entities from COPD levels to the normal levels in healthy control subjects. The four nodes with ≥25% of the maximum betweenness centrality (fibronectin) with normalized betweenness centrality values representing a greater than the linear baseline increase from the next lower ranking node are fibronectin, vimentin, intercellular adhesion molecule1 (ICAM1), and galectin-3 (LGALS3). These potential key signaling mediators had a betweenness centrality of at least 25% of the maximum. Topological analysis of the interaction network regulatory interactions documented in the literature suggests that these proteins were central mediators of COPD [38][39][40][41][42][43]. Colors indicate the needed effect to restore these entities to the normal levels in healthy control subjects. crease in their activity. The reverse is true for vimentin and galectin-3. We, therefore, sought drugs or combinations of drugs predicted to accomplish the appropriate activation or inhibition of the four most central nodes. Specifically, drugs that will promote central node proteins that were downregulated in the COPD cohort and inhibition of central node proteins that were overabundant in COPD. Therefore, the idealized drug vector constitutes interactions leading to desirable modulation of the central hub protein. CANDO identified 189 distinct drugs (Figure 1, Tables S6 and S7) with relevance for COPD; 39 of these Figure 3. Drug frequency amongst idealized drug combinations predicted to modulate central proteins in COPD. The drugs initially identified by CANDO, which were predicted to activate or inhibit the four central nodes with the highest maximum betweenness centrality, are listed earlier ( Figure  2). Specifically the drugs are predicted to promote central node proteins which were downregulated in the COPD cohort and inhibit central node proteins (identified by network topological graph),  We additionally conducted a targeted query to assess the predicted effects of drugs commonly applied in pulmonary disease treatment on the most central proteins of this regulatory network ( Figure 4 and Table S8). While some of these drugs have been documented to have the desired effect on two of the central proteins, fibronectin or vimentin, all have been documented to have the opposite effect on at least one of the most central proteins. Therefore, they were not significantly enriched out of the set of all possible candidate drugs. Fluocinolone acetonide and dexrazoxane appeared in 54% and 46% of all significantly enriched twodrug combinations, respectively, which is far greater than other drugs appearing in these combinations. The combination of fluocinolone acetonide and dexrazoxane is the most enriched two-drug combination leading to an idealized drug vector that most likely reverses the protein levels of the four central nodes to levels found in healthy control subjects.

Discussion
Our investigation of the COPD BALF proteome utilizing novel bioinformatic techniques revealed significant differences in proteins involved in multiple biological processes, including lung-specific mechanisms, protease/anti-protease homeostasis, immunoregulation, and the extracellular matrix. Proteomic profiling of the complex pathways implicated in COPD provides broader physiological exploration not provided by studying one entity at a time. We identified several differentially expressed proteins in COPD versus controls that, based on a review of published literature, have not been previously implicated in COPD etiology. This preliminary analysis illustrates how our BALF proteomic analysis represents a powerful approach to elucidate COPD pathogenesis and identify novel biomarkers.
Employing the bioinformatics tool DAVID and IPA, putative pathway networks were constructed based on the differentially expressed proteins in the BALF proteome that implicated multiple transcription factor pathways and disparate biological processes, such as

Discussion
Our investigation of the COPD BALF proteome utilizing novel bioinformatic techniques revealed significant differences in proteins involved in multiple biological processes, including lung-specific mechanisms, protease/anti-protease homeostasis, immunoregulation, and the extracellular matrix. Proteomic profiling of the complex pathways implicated in COPD provides broader physiological exploration not provided by studying one entity at a time. We identified several differentially expressed proteins in COPD versus controls that, based on a review of published literature, have not been previously implicated in COPD etiology. This preliminary analysis illustrates how our BALF proteomic analysis represents a powerful approach to elucidate COPD pathogenesis and identify novel biomarkers.
Employing the bioinformatics tool DAVID and IPA, putative pathway networks were constructed based on the differentially expressed proteins in the BALF proteome that implicated multiple transcription factor pathways and disparate biological processes, such as extracellular space, proteolysis, extracellular matrix, cell adhesion, cytoskeleton, defense response, cell migration, and oxidation reduction.
The CANDO platform identified 189 drug candidates with similar protein interaction signatures based on the BALF proteome compared to known drugs used to treat COPD. However, while most putative drug and protein interactions are likely inhibitors, the Pharmaceuticals 2022, 15, 566 9 of 18 induction or inhibition of a target protein is indeterminable with solely the binding potential between drug and protein pairs. Topological analysis of the interaction network connecting 233 proteins differentially expressed in COPD through regulatory interactions documented in the literature suggested that ICAM1 [38] and galectin-3 [39] are important central mediators of inflammation. At the same time, both fibronectin [40,41] and vimentin [42] are central mediators of inflammation and fibrogenesis. CD44 [43] is one of the most abundant receptors in mesenchymal stem cells and has been implicated in cell migration in response to lung injury. This body of evidence corroborates the pathway-enrichment analysis results described above and points to fibrosis and innate inflammation as important processes governing the pathogenesis and progression of COPD. A literature knowledge-based query (Elsevier knowledge graph) of drugs with desired drug-target interactions (generated using CANDO) identified putative drugs, such as anti-neoplastic, anti-fibrotic drugs, and inflammation regulators that would restore key central proteins to the levels characteristic of healthy controls. Our results also suggest that currently utilized medications for COPD have disparate effects on the identified central node proteins that are key putative mediators of COPD pathogenesis and progression.
In contrast, the corticosteroid fluocinolone acetonide [44,45] and the cardioprotective agent dexrazoxane [46,47] were highly enriched for the desired effects on central network entities, both individually and in combination. Fluocinolone acetonide is a stronger potentiator than other corticosteroids of the TGF-β pathway [48], which is noted to be dysregulated in COPD [49], and fluocinolone acetonide may be more effective than comparable corticosteroids in improved homeostasis in that pathway. Dexrazoxane [46,47] is used to reduce cardiac toxicity associated with anthracycline-based chemotherapy agents by binding to iron and reducing reactive oxygen species; with oxidative stress as a significant factor in COPD pathogenesis [50], antioxidative therapy may be beneficial.
The documented actions of these immunomodulators were predicted here to substantially counteract the observed dysregulation of centrally connected proteins in COPD patients. The relatively high representation of immunomodulators among the candidate agents and the increased centrality of fibrosis-related proteins is consistent with the paradigm of airway remodeling as central to COPD pathology [51]. With additional data, this regulatory circuit could be used as a testbed for computational evaluation of these and other candidate drug effects using network topological methods [52].

Limitations and Strengths
Our approach does have some limitations. The variability in how much BALF is recovered from each aliquot of saline infused in to the lower airway in COPD vs. control subjects is inherent in most BALF proteomic analyses. However, the BALF proteins were normalized to albumin BALF concentrations to account for the variability. The examination of protein levels without accounting for post-translational modifications, such as phosphorylation, may neglect important differences in protein interactions and activity, despite no significant differences in protein levels. Also, the BALF samples were from subjects in the COPD group who were ex-smokers. This exclusion limits the generalizability of our findings, particularly in current smokers, since the acute effects of tobacco smoke were excluded in our study design.
However, we confined our analysis to ex-smokers with moderate COPD to obtain some uniformity of the COPD phenotype and to avoid the acute inflammatory effects of current smoking. Future work on proteomic profiles will inform us of the difference between such profiles in current smokers and different stages of COPD.

Comparison to Previously Published Studies
A sputum proteomics study endeavored to identify COPD severity biomarkers by employing 2D gel electrophoresis and revealed 15 proteins that were significantly differentially expressed between healthy smoker controls and subjects with GOLD stage II; subsequently, 9 of the 15 candidate proteins were validated with Western Blot. Of the nine candidate proteins validated with Western Blot, seven were statistically significantly different between groups, specifically albumin, alpha-2-HS glycoprotein, transthyretin, PSP94, apolipoprotein A1, lipocalin-1, and PLUNC [53]. Employing quantitative ELISA data normalized for protein content, the investigators identified apolipoprotein A1 and lipocalin-1 as statistically differentially expressed in COPD. Although apolipoprotein A1 and lipocalin-1 were identified in our study of the BALF proteome, the proteins were not significantly differentially expressed, likely due to the differences in expression in the different biocompartments of sputum vs. bronchoalveolar lumen.
A 2D differential gel electrophoresis study and subsequent mass spectroscopy were performed by Ohlmeier et al., which compared healthy smokers, non-smokers, and smokers with GOLD stage II COPD and revealed a different set of 15 proteins that were differentially expressed between the groups [54]. Of these proteins, polymeric immunoglobulin receptor levels in lung tissue and blood between the three groups were correlated with airflow obstruction.
In Lee et al., tumor-free lung tissue harvested from patients with lung cancer resection, when examined via 2D gel electrophoresis/MALDI-TOF-MS, revealed 8 proteins that were upregulated in subjects with COPD compared to non-smokers and 10 significantly differentially expressed proteins between subjects with COPD and smoking subjects without COPD [55]. Two of the identified proteins, matrix metalloprotease 13 (MMP13) and thioredoxin-like 2, were confirmed to be increased in COPD subjects with Western Blot and immunohistochemical staining, with MMP13 localized to the alveolar macrophage and type II pneumocytes and thioredoxin-like 2 found in the bronchial epithelium. Thioredoxinlike 2, which contains thioredoxin, was found in the BALF proteome but not significantly differentially expressed. However, MMP13 was not identified in our BALF study due to differences in study populations and variable biocompartments.

Methods
We analyzed the protein quantifications derived from the BALF of subjects with COPD and healthy ex-smoker control subjects via liquid chromatography and mass spectroscopy. We then used pathway analysis tools to identify the relevant cellular pathways associated with differentially expressed proteins quantified from the BALF analysis. We subsequently employed the Computational Analysis of Novel Drug Opportunities (CANDO) platform (https://github.com/ram-compbio/CANDO Accessed on 1 December 2021) to identify FDA-approved drugs that could be repurposed to COPD based on their putative interaction with the differentially expressed proteins. Using topological network analysis, we identified putative hub proteins that modulate the cellular pathways associated with COPD. Using the medical literature to predict the repurposed drugs effects on the most important hub protein, we created a refined list of drugs predicted to modulate the cellular pathway to impede COPD pathogenesis and generate proteomic interaction signatures for the compounds.

Recruitment of Subjects
BALF was obtained in a NHLBI-funded study of innate lung defense in COPD [56]. All procedures received approval from the Institutional Review Board (IRB), Veterans Affairs Western New York Healthcare System (WNY-VA), and strictly adhered to institutional guidelines.

Ethics Statement
This study is a sub-study of a larger group of patients with COPD and healthy controls to understand biological determinants of exacerbation frequency and was approved by the Institutional Review Boards of the Veterans Affairs Western New York Healthcare System and the University at Buffalo. The participants gave written consent to the study via an IRB-approved consent form.

Inclusion/Exclusion Criteria
The inclusion criteria and procedures for this study have been described previously and are provided in the supplementary material [55]. After informed consent, 116 volunteers were divided into three groups: (1) healthy non-smokers, (2) ex-smokers with COPD, and (3) active smokers with COPD who underwent bronchoscopy and bronchoalveolar lavage. The methodology for bronchoscopy, lavage, and sample processing is included in the supplementary material.
We selected BALF obtained from 10 ex-smokers with moderate COPD and 10 healthy non-smoking controls for proteomic analysis, respectively. To minimize variability due to the effects of acute smoking and disease severity, we confined this analysis to ex-smokers and moderate stage 2 disease per the Global Obstructive Lung Disease (GOLD) [56] criteria of the forced expiratory volume in 1 s (FEV 1 ); 50-80% predicted with the ratio of FEV 1 over forced vital capacity (FEV 1 /FVC) < 0.70. All ex-smokers had ceased smoking for at least 1 year.

Bronchoscopy and BALF Sample Preparation
The research bronchoscopy and BALF sample preparation were performed as described previously [57].

Protein Identification/Quantification
To investigate the soluble molecules in the epithelial lining fluid that may participate in COPD pathogenesis, unbiased proteomic analysis of BALF commenced without protein depletion or fractionation. Details of the methodology were published [21] and are also provided in the supplementary material.

Long Gradient Nano-RPLC/Mass Spectrometry
The complete separation of the complex peptide mixture utilized a nano-LC/nanospray setup [21]; the ion-current long gradient approach with mass spectrometry and subsequent protein identification was performed as described in Tu et al. [21,58,59]. All proteins identified with one or more peptide hits, fold change of ≥1.5, and p-value < 0.05 are included as part of the differentially expressed BALF proteome.

Manually Curated Pathway Analysis
Gene ontology, transcription factors, and expression locations were determined by uploading the protein expression dataset onto a web-based tool, the NIH's Database for Annotation, Visualization, and Integrated Discovery (DAVID) v6.7 (http://david.abcc.ncifcrf.gov/ Accessed on 21 March 2021) [22,23]. Biological networks were generated with Ingenuity Pathway Analysis (IPA, Ingenuity Systems), a web-based relational database and network generator. Proteins overrepresented in the uploaded datasets in biological networks, canonical pathways, and biological processes were identified.

Literature Informed Protein-Protein and Protein-Drug Interaction Network
In addition to annotating differentially expressed proteins with the manually curated pathways cataloged in IPA, a network of protein-protein interactions was created using known regulatory relationships extracted from published scientific literature using the MedScan text-mining engine [60] as well as protein-drug interactions cataloged in the Reaxsys medicinal chemistry database (Elsevier, Amsterdam). These are embedded in the broader Elsevier Knowledge Graph database [37] and were accessed via the Pathway Studio interface (Elsevier, Amsterdam, The Netherlands) [61].

Shotgun Multiscale Drug Discovery Platform
We used the CANDO platform [62][63][64][65][66][67][68][69] to predict drugs that can be repurposed to treat stable COPD. In CANDO, a compound/drug is potentially repurposable for an indication when it is found to have similar binding interactions with a specific proteome or library of proteins as a drug with known approval for the indication of interest.
This study calculated the interaction scores between 2450 United States Federal Drug Administration (FDA)-approved drugs from the CANDO version 2.3 compound library and a curated human library of 8385 proteins, including 5316 solved X-ray crystallography structures and 3069 computed protein structures modeled by I-TASSER [70,71]. The interaction scores were calculated using the bioanalytic docking (BANDOCK) protocol in the CANDO, which utilizes predicted binding site information and chemical similarity to determine an interaction score that is a surrogate for the likelihood of interaction between a compound and protein [62]. Binding sites were predicted for all human proteins using COACH [72], which uses the consensus of three complementary methods utilizing structure and sequence information to find similarities to solved structures in the Protein Data Bank (PDB) [73,74]. For each binding site predicted by COACH, a confidence score (PScore) and an associated co-crystallized ligand are output. The ligand is then compared to the query compound/drug using chemical fingerprinting methods, which enumerates the presence or absence of molecular substructures on the compound/drug. The Sorensen-Dice coefficient [75,76] was used for the protein-ligand and compound/drug fingerprints (CScore). The BANDOCK interaction score outputted for each compound-protein pair is the product of the Pscore and the Cscore.
For this analysis, we focused on the differentially expressed proteins in the BALF proteome (as described) and drugs used to treat COPD ("MESH:D029424") (Table S1). We selected proteins in the CANDO human protein library represented in the differentially expressed BALF proteome. We then used the CANDO platform to predict the top drug candidates that could be repurposed to treat COPD based on the compound-proteome interaction signature similarity to drugs currently approved/used to treat stable COPD. The protocol iterates through 34 known drugs used to treat stable COPD, counting the number of times drugs not associated with COPD show up in the top 30 most similar compounds to the known treatments, then outputs the consensus predictions ranked by the number of times each compound appeared across all top 30 lists. The similarity between a given drug and all other drugs in the library is determined by comparing their proteomic interaction signatures using the cosine similarity metric. Compounds with greater similarity scores rank stronger than those with low similarity. Thereby, drugs that were most similar (in terms of interaction signatures) to multiple drugs used to treat COPD will be ranked highest.

Network Topological Analysis
Although not a complete descriptor, the topological location and aspects of the connectivity linking a node to a broader biological network can inform the node's function in mediating network behavior. Among the measures of a node's importance or centrality, betweenness centrality has been used to describe how a node might serve as an important mediator of information flow in a regulatory network. In this work, C b (n) for each node n of a network was calculated using the Brandes algorithm [77]. The betweenness centrality of a node n reflects the amount of control that this node exerts over the interaction between communities of neighboring nodes in the network [78] and can be computed as follows: where s and t are the source and target nodes in the network different from n, σ s,t denotes the number of shortest paths from all s to all t, and σ s,t (n) is the number of shortest paths from s to t that must pass through node n. Here, unweighted betweenness centralities were calculated for each node in the literature-informed protein-protein network. The betweenness centrality scores for all nodes were expressed as fractions of the maximum betweenness centrality present in the network. All calculations were conducted in R version 4.0.2 (Vienna, Austria) [79].

Literature Based Drug Enrichment Analysis
Using putative drugs ranked by CANDO and further analyzed via the Elsevier knowledge graph [37], a drug enrichment analysis was performed to predict which drugs can most closely mimic an idealized intervention against the hub proteins identified in the network topological analysis. Drugs are represented as vectors with a length equal to the empirically derived number of protein entities in the network model. Each index value is listed as 0 if there is no interaction between the drug and the corresponding model entity, a 1 if the drug promotes that entity, or a-1 if the drug inhibits that entity. Next, the cosine similarity, S c , between each drug vector and the idealized intervention vector is calculated [80]. Cosine similarity is calculated as: where D is the drug vector and M is the idealized intervention. Higher S c indicates a closer match between the drug vector and the idealized vector. An S c of 1 means the two vectors are identical, and −1 indicates that the two are exactly opposed. For multidrug combinations, the net-effect of the individual drug vectors is calculated as: where n is the total number of drugs in the combination, D i is the vector corresponding to the ith drug, and sgn is the sign function. The cosine similarity of the net-effect vector and idealized vector is then calculated. The statistical significance of these enrichment scores is determined empirically from an estimated null distribution of cosine similarities. This null distribution uses a set of model-relevant background drugs for which each interacts with at least one entity in the network. All CANDO drugs of interest were included in the background. Empirical p-values are estimated asp where r is the number of null S c values greater than the observed S c , and n is the total number of null S c values.

Statistical Analysis
Statistical analysis was performed with SPSS/19. Demographic values were depicted as mean ± SEM.

Conclusions
In summary, our work provides a valuable pipeline for identifying many proteins associated with COPD pathogenesis that illustrate the complexity of the development of this disease and identify putative therapeutic treatment options using cutting-edge bioinformatics approaches. Identifying differentially expressed proteins will form the basis for future mechanistic studies of critical pathways and novel treatment discovery. The validation of our proposed therapeutic approach in animal models and human pilot studies are important next steps.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/ph15050566/s1, Figure S1: IPA Network 1: Cellular Movement, Inflammatory Response, Cardiovascular System Development and Function. Figure S2: IPA network 2: cell death and survival, drug metabolism, small molecule biochemistry. Figure S3: IPA network 9: cellular movement, hematological system development and function, immune cell trafficking. Figure S4: IPA network 11: cell cycle, visual system development and function, hair and skin development and function. Figure S5: IPA network putative upstream regulators. Table S1: Drugs commonly prescribed for stable COPD. Curated drugs currently indicated for COPD ("MESH:D029424") from the Comparative Toxicogenomics Database2 and manually curated to include drugs used to treat stable COPD in the United States. Table S2: Unique proteins upregulated in BALF (n = 95), Differentially expressed proteins with at least 1.5x fold change increase in the BALF proteome in COPD versus control cohort samples. Table S3: Unique proteins downregulated in BALF (n = 138). Differentially expressed proteins with at least 1.5x fold change decrease in the BALF proteome in COPD versus control cohort samples. Table S4: See David_FuncAnnotClustering_BALF.csv for DAVID functional annotation clustering file. Table S5: Transcription factors associated with binding sites on genes from differentially expressed proteins in BALF. Transcription factors that have association with binding sites on genes from differentially expressed proteins in BALF as noted by DAVID [ref]. Table S6: Top functional networks of differentially expressed molecules in the BALF proteome. The top biological functions associated with molecular pathways imputed with IPA that are significantly associated differentially expressed molecules measured in the BALF proteome. Red represents upregulated, and green represents downregulated proteins. The networks are collections of interconnected molecules assembled by a network algorithm. Each connection represents known relationships between the molecules, found in the Ingenuity Knowledge Base. The score is the degree of relevance of network eligible molecules to the BALF dataset. The score takes into account the number of network eligible molecules in the network and its size, as well as the total number of network eligible molecules analyzed and the total number of molecules in the Ingenuity Knowledge Base that could potentially be included in networks. The network score is based on the hypergeometric distribution and is calculated with the right-tailed Fisher's Exact Test: Score=-log(Fisher's Exact test result). Focus Molecules are the number of proteins identified in the BALF proteome that is found in the network. Table S7: Computational drug prediction CANDO Table S8: Predicted interactions of drugs treating respiratory diseases and central node proteins. Data Availability Statement: The source code and data used to produce the CANDO results and analyses presented in this manuscript are available from Github repository: https://github.com/ ram-compbio/CANDO.

Acknowledgments:
The authors would like to thank Catherine Wrona for her technical support.

Conflicts of Interest:
The authors declare no conflict of interest. The opinions and assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the Department of Defense. To obtain the raw proteomic data on a DVD media, please contact Jun Qu, junqu@buffalo.edu.

Abbreviations
BALF bronchoalveolar lavage fluid BANDOCK bioanalytical docking CANDO computational analysis of novel drug opportunities COPD chronic obstructive pulmonary disease DAVID database for annotation visualization and integrated discovery GO gene ontology IPA ingenuity pathway analysis LTQ Orbitrap linear ion trap combined with an orbitrap analyzer mass spectrometer