Investigation of Precise Molecular Mechanistic Action of Tobacco-Associated Carcinogen ‘NNK’ Induced Carcinogenesis: A System Biology Approach

Cancer is the second deadliest disease listed by the WHO. One of the major causes of cancer disease is tobacco and consumption possibly due to its main component, 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK). A plethora of studies have been conducted in the past aiming to decipher the association of NNK with other diseases. However, it is strongly linked with cancer development. Despite these studies, a clear molecular mechanism and the impact of NNK on various system-level networks is not known. In the present study, system biology tools were employed to understand the key regulatory mechanisms and the perturbations that will happen in the cellular processes due to NNK. To investigate the system level influence of the carcinogen, NNK rewired protein–protein interaction network (PPIN) was generated from 544 reported proteins drawn out from 1317 articles retrieved from PubMed. The noise was removed from PPIN by the method of modulation. Gene ontology (GO) enrichment was performed on the seed proteins extracted from various modules to find the most affected pathways by the genes/proteins. For the modulation, Molecular COmplex DEtection (MCODE) was used to generate 19 modules containing 115 seed proteins. Further, scrutiny of the targeted biomolecules was done by the graph theory and molecular docking. GO enrichment analysis revealed that mostly cell cycle regulatory proteins were affected by NNK.


Introduction
Cancer is one of the major non-communicable diseases [1] and is accountable for millions of deaths per year worldwide. According to the World Health Organization (WHO), cancer is the second major cause of morbidity, with an estimate of 9.6 billion deaths in 2018 [2]. Cancer is a multistage process caused by aberrations in the cellular processes. Cancer is not only caused by mutation in any single gene but also by the accumulation of mutations in multiple genes, a phenomenon described as 'oncogene addiction' [3]. According to the WHO, there are mainly three reasons that lead to these aberrations, with tobacco consumption heading the list, which is single-handedly responsible for around 22% of deaths by cancer globally [4].
NNK is not just responsible for causing cancer but also holds serious implications in other diseases as well. Earlier studies have shown that NNK has significant impact on steatohepatitis [8], Alzheimer's disease [9], and tuberculosis [10], for example.
In this study, attempts have been made to exploit the system biology approach for an investigation of the overall impact of tobacco-generated carcinogen NNK on molecular systems.
Systems biology is an interdisciplinary field, which is a combination of mathematics, computer science, and biology [11]. Systems biology holds importance as it helps in getting a holistic view of the connections of biomolecules. It provides an anti-reductionist approach towards the involvement of different biomolecular components in a variety of biological systems. It focuses on the development of interactome of the affected targets and then analyzing it by using mathematical models [12]. Graph theory is the core assessment tool for the topological analysis of the interactome that aims to identify hub biomolecular targets based on their clustering coefficients, degrees, and betweenness centralities. Whereas, clustering and gene ontology (GO) enrichment analysis categorize the biomolecules on the basis of functions to procure more promising insight into complex networks. This study aims to find the most probable biomolecular targets of tobacco-associated carcinogen NNK along with its interactions and associated pathways that get perturbated by various cellular mechanisms and lead to cancer development. The most probable key targets of NNK are identified on the basis of their bottleneck scores and based on their thermodynamic interactions with NNK calculated by molecular docking simulations.

Materials and Methods
The full methodology scheme is mentioned in Figure 1. when they form a complex. Docking simulation also explores the thermodynamic stability of the complexes by providing information regarding the binding energy and inhibition constant (Ki) value and helps in finding the best binding modes or orientations of a ligand with its biomolecule. The docking parameters used were based on the studies published by [22]. Autodock 4.0 MGL suite [23] was used for docking simulations. The simulations were performed on AMD E1-6015 APU processor, CPU 1.4 GHz and 4 GB RAM of Hewlett-Packard (HP) machine.

Construction and Visualization of Protein-Protein Interaction Network
In total, 544 biomolecular targets were found to be affected by NNK in approximately 1320 studies using PubMed. A protein-protein interaction network was developed using the STRING database version 10.5 [13]. The network was evidence-based and developed with the highest confidence level score of 0.9, having 50 interactors in the first as well as second shell.

Protein-Protein Interaction Network (PPIN) Analysis
The Cytoscape Software (version 3.6.1) program [14] was used for the analysis of the protein-protein interaction network (PPIN) to generate protein interaction networks. Network analyzer, an in-built plugin of Cytoscape, was used to analyze the topological properties of an NNK modulated PPIN [15]. The topological properties of any network provide a deep insight into complex biological networks [16,17]. The topological analysis also helps in reducing noise in the data and offers reliable information regarding the network [18]. Node properties, like degree distribution, shortest path length, average clustering coefficient, betweenness centrality, and closeness centrality, were also analyzed.

Protein Interaction Network Modular Analysis and Pathway Enrichment
Clusters or modules are closely connected nodes in a network that come together and form a dense sub-network [19]. The analysis of clusters or modules helps in attaining detailed information about PPIN. Molecular COmplex DEtection (MCODE) is a plug-in available in the Cytoscape software program, which was used for the cluster analysis. The clusters are scored on the basis of size and density-a high score means a big and dense cluster-while gene ontology (GO) serves the purpose of validating the cluster that belongs to a specific function [20]. Thus, the enrichment of clusters helps in enriching the pathways by providing an additional number of external genes that are not present in the dataset. GO analysis provides detailed information about the biological process underneath that cluster. For GO functional enrichment analysis, ClueGO (version 2.5.1) plug-in of Cytoscape software was used [21]. The analysis was done using a threshold p value < 0.05. A two-sided hypergeometric test was used for the statistical analysis along with the Bonferroni correction method, in case applied.

Molecular Docking Analysis
Molecular docking is one of the most preferred methods to find the orientation of two molecules when they form a complex. Docking simulation also explores the thermodynamic stability of the complexes by providing information regarding the binding energy and inhibition constant (K i ) value and helps in finding the best binding modes or orientations of a ligand with its biomolecule. The docking parameters used were based on the studies published by [22]. Autodock 4.0 MGL suite [23] was used for docking simulations. The simulations were performed on AMD E1-6015 APU processor, CPU 1.4 GHz and 4 GB RAM of Hewlett-Packard (HP) machine.

Construction of the Network
In total, 544 biomolecular targets were found to be affected by NNK interaction through a literature survey. A protein-protein interaction network (PPIN) was developed using the STRING database version 10.5. The network developed at the 0.9 confidence level score and 50-50 interactors in the first and second shell comprised of 534 nodes and 2909 edges. The average node degree was 10.09 and average local clustering coefficient was 0.501. The PPI enrichment p value was <1 × 10 −16 . Figure 2 represents the NNK rewired protein-protein interaction network having 534 nodes and 2909 edges. Figure 3A,B depicts the number of biomolecular targets involved in various processes and pathways, respectively.

Construction of the Network
In total, 544 biomolecular targets were found to be affected by NNK interaction through a literature survey. A protein-protein interaction network (PPIN) was developed using the STRING database version 10.5. The network developed at the 0.9 confidence level score and 50-50 interactors in the first and second shell comprised of 534 nodes and 2909 edges. The average node degree was 10.09 and average local clustering coefficient was 0.501. The PPI enrichment p value was <1 × 10 −16 . Figure 2 represents the NNK rewired protein-protein interaction network having 534 nodes and 2909 edges. Figure 3A and Figure 3B depicts the number of biomolecular targets involved in various processes and pathways, respectively.

Topological Properties of the Network
The protein-protein interaction network developed was further analyzed using Cytoscape software. The topological properties of the PPIN calculated with the help of Network Analyzer plug-in were the shortest path length average, neighborhood connectivity distribution, clustering coefficient, node degree distribution, betweenness centrality, closeness centrality, etc. The shortest path length in any network depicts the shortest communication mode between two nodes. Figure 4 shows a graphical representation of path length distribution 2971. This means that at the 2971-unit path length, the information is being passed on at the highest frequency. The degree of a node describes the connectivity of a node with other nodes; it is the total number of links, which are either reaching or starting from that node [24]. The node degree distribution ( Figure 5) is one of the most important topological properties of a network. It fits a power law that indicates the presence of hubs in the network [25]. The nodes, which lie close to the power line and have a higher degree, can be considered as the hubs. The average neighborhood connectivity distribution stands for the average number of neighbors, which was observed as 15,940 in this study ( Figure 6). Figure 6 shows the average connections of each node with its neighbors. The above parameter helps in understanding the density of a network. The clustering coefficient of a network depicts the tendency of a graph to be divided into clusters [26]. The local clustering coefficient is the number of edges around a particular node, whereas the average clustering coefficient is the clustering coefficient of the whole network [25], and in this study, the average clustering coefficient in this network was found to be 0.597 ( Figure 7).

Topological Properties of the Network
The protein-protein interaction network developed was further analyzed using Cytoscape software. The topological properties of the PPIN calculated with the help of Network Analyzer plugin were the shortest path length average, neighborhood connectivity distribution, clustering coefficient, node degree distribution, betweenness centrality, closeness centrality, etc. The shortest path length in any network depicts the shortest communication mode between two nodes. Figure 4 shows a graphical representation of path length distribution 2971. This means that at the 2971-unit path length, the information is being passed on at the highest frequency. The degree of a node describes the connectivity of a node with other nodes; it is the total number of links, which are either reaching or starting from that node [24]. The node degree distribution ( Figure 5) is one of the most important topological properties of a network. It fits a power law that indicates the presence of hubs in the network [25]. The nodes, which lie close to the power line and have a higher degree, can be considered as the hubs. The average neighborhood connectivity distribution stands for the average number of neighbors, which was observed as 15,940 in this study ( Figure 6). Figure 6 shows the average connections of each node with its neighbors. The above parameter helps in understanding the density of a network. The clustering coefficient of a network depicts the tendency of a graph to be divided into clusters [26]. The local clustering coefficient is the number of edges around a particular node, whereas the average clustering coefficient is the clustering coefficient of the whole network [25], and in this study, the average clustering coefficient in this network was found to be 0.597 (Figure 7).

Topological Properties of the Network
The protein-protein interaction network developed was further analyzed using Cytoscape software. The topological properties of the PPIN calculated with the help of Network Analyzer plugin were the shortest path length average, neighborhood connectivity distribution, clustering coefficient, node degree distribution, betweenness centrality, closeness centrality, etc. The shortest path length in any network depicts the shortest communication mode between two nodes. Figure 4 shows a graphical representation of path length distribution 2971. This means that at the 2971-unit path length, the information is being passed on at the highest frequency. The degree of a node describes the connectivity of a node with other nodes; it is the total number of links, which are either reaching or starting from that node [24]. The node degree distribution ( Figure 5) is one of the most important topological properties of a network. It fits a power law that indicates the presence of hubs in the network [25]. The nodes, which lie close to the power line and have a higher degree, can be considered as the hubs. The average neighborhood connectivity distribution stands for the average number of neighbors, which was observed as 15,940 in this study ( Figure 6). Figure 6 shows the average connections of each node with its neighbors. The above parameter helps in understanding the density of a network. The clustering coefficient of a network depicts the tendency of a graph to be divided into clusters [26]. The local clustering coefficient is the number of edges around a particular node, whereas the average clustering coefficient is the clustering coefficient of the whole network [25], and in this study, the average clustering coefficient in this network was found to be 0.597 ( Figure 7).

Clustering and GO Enrichment Analysis
For modular analysis and pathway enrichment, the MCODE plug-in of the Cytoscape was used. Modules or clusters were created from the network and were scored on the basis of their size and density. A high score depicts a denser and tighter cluster. Formation of clusters also helps in the reduction of the noise and getting a better understanding of the genes involved in the clusters. Figure  8 shows the clusters generated by MCODE. The nodes in red color represent the seed proteins, and the yellow nodes are the connectors. The seed proteins are the proteins that were reported earlier to get either upregulated or downregulated by the action of NNK and connector proteins are the proteins that are associated with the seed proteins in the transfer of information, but are not reported to have any direct relation with NNK. The seed proteins were checked in all the clusters and are presented in bold in Table 1, while the remaining proteins (non-highlighted) are the connectors. The cluster that was ranked first had the highest score of 29,862, with 30 nodes and 433 edges. Moreover, during analysis, it was found that cluster 1 had 19 seed proteins and 11 connectors and the overall analysis of the entire cluster found 115 seeds and 88 connectors.
The seed proteins were identified in each cluster. Thereafter, the PPIN was generated using the STRING database with 0.9 confidence level, where 50 interactors in the first and another 50 in the second shell were recorded. The PPIN generated now was further enriched using the ClueGO plug-in.

Clustering and GO Enrichment Analysis
For modular analysis and pathway enrichment, the MCODE plug-in of the Cytoscape was used. Modules or clusters were created from the network and were scored on the basis of their size and density. A high score depicts a denser and tighter cluster. Formation of clusters also helps in the reduction of the noise and getting a better understanding of the genes involved in the clusters. Figure  8 shows the clusters generated by MCODE. The nodes in red color represent the seed proteins, and the yellow nodes are the connectors. The seed proteins are the proteins that were reported earlier to get either upregulated or downregulated by the action of NNK and connector proteins are the proteins that are associated with the seed proteins in the transfer of information, but are not reported to have any direct relation with NNK. The seed proteins were checked in all the clusters and are presented in bold in Table 1, while the remaining proteins (non-highlighted) are the connectors. The cluster that was ranked first had the highest score of 29,862, with 30 nodes and 433 edges. Moreover, during analysis, it was found that cluster 1 had 19 seed proteins and 11 connectors and the overall analysis of the entire cluster found 115 seeds and 88 connectors.
The seed proteins were identified in each cluster. Thereafter, the PPIN was generated using the STRING database with 0.9 confidence level, where 50 interactors in the first and another 50 in the second shell were recorded. The PPIN generated now was further enriched using the ClueGO plug-in.

Clustering and GO Enrichment Analysis
For modular analysis and pathway enrichment, the MCODE plug-in of the Cytoscape was used. Modules or clusters were created from the network and were scored on the basis of their size and density. A high score depicts a denser and tighter cluster. Formation of clusters also helps in the reduction of the noise and getting a better understanding of the genes involved in the clusters. Figure 8 shows the clusters generated by MCODE. The nodes in red color represent the seed proteins, and the yellow nodes are the connectors. The seed proteins are the proteins that were reported earlier to get either upregulated or downregulated by the action of NNK and connector proteins are the proteins that are associated with the seed proteins in the transfer of information, but are not reported to have any direct relation with NNK. The seed proteins were checked in all the clusters and are presented in bold in Table 1, while the remaining proteins (non-highlighted) are the connectors. The cluster that was ranked first had the highest score of 29,862, with 30 nodes and 433 edges. Moreover, during analysis, it was found that cluster 1 had 19 seed proteins and 11 connectors and the overall analysis of the entire cluster found 115 seeds and 88 connectors.  13  3  3  3  3  0  MAP3K5, TRAF1, BIRC3   14  3  3  3  3  0  PSMC3IP, MND1, DMC1   15  3  3  3  3  0  OIP5, MIS18A, NPM1   16  3  3  3  3  0  TNFRSF11A, TNFRSF12A, TNFSF11   17  3  3  3  3  0  RHOB, RHOC, CDC25C   18  3  3  3  3  0  SDCBP, PYCARD, CPPED1   19  3  3  3  3 Nodes in "bold text" represent the seed proteins and the nodes in "normal" text represent the connector proteins.
The seed proteins were identified in each cluster. Thereafter, the PPIN was generated using the STRING database with 0.9 confidence level, where 50 interactors in the first and another 50 in the second shell were recorded. The PPIN generated now was further enriched using the ClueGO plug-in. Genes 2019, 10, 564 9 of 21

PIN Construction and Topological and GO Analysis of Final Selected Seed Proteins
After the modularization process, 115 seed proteins were obtained, and used to further create a PPIN ( Figure 9) with 100 connectors using the highest confidence level score of 0.9. The PPIN generated had 213 nodes and 2509 edges, with an average node degree 23.6 and average clustering coefficient of 0.761. The PPI enrichment p-value was less than 1 × 10 −16 .

PIN Construction and Topological and GO Analysis of Final Selected Seed Proteins
After the modularization process, 115 seed proteins were obtained, and used to further create a PPIN ( Figure 9) with 100 connectors using the highest confidence level score of 0.9. The PPIN generated had 213 nodes and 2509 edges, with an average node degree 23.6 and average clustering coefficient of 0.761. The PPI enrichment p-value was less than 1 × 10 −16 . The network generated was analyzed using Cytohubba, a plug-in of Cytoscape [27]. Moreover, cytohubba analysis of each node of the network was performed and scored based on various parameters, like the degree, closeness centrality, clustering coefficients, betweenness, bottleneck, and stress. The proteins were first sorted on the basis of the clustering coefficient and then on the basis of bottleneck. The proteins with a clustering coefficient less than 0.5 were selected. The proteins having a clustering coefficient more than 0.5 were rejected as this depicts that these proteins were highly clustered and have no further spaces for the attachment of other molecules. The nodes with a clustering coefficient less than 0.5 and high degrees were found to be highly significant. A high degree and clustering coefficient less than 0.5 depict that the nodes are important in various connecting networks and they also have binding spaces available on their surface for other molecules to bind to. Once the nodes were sorted on the basis of clustering coefficients, the next most important parameter was the bottleneck. The proteins with high bottleneck were considered the most critical proteins in any PPIN. The median of the bottleneck scores of selected proteins was calculated which came out to be 3. All the proteins with bottleneck more than or equal to 2 were finally selected. Table 2 enlists the final selected proteins, which were sorted on the basis of bottleneck, with clustering coefficients less than 0.5. CHEK1, showing the highest bottleneck score of 29, was ranked in first position followed by TP53 with a bottleneck of 27. These were also analyzed with ClueGO for the pathway enrichment analysis. Figure 10 shows the pie chart representation of the enriched pathways in the form of groups. In Figure 10, the cell cycle is occupying the maximum area (43.64%) of the pie chart. From this, we can say that the cell cycle is the most enriched pathway, having the maximum number of biomolecules (a detailed graph displaying the sub-pathways in each group along with the number of genes present in it and the proteins involved in each group is enlisted in Table 3, also depicted in Supplementary Figure 1). Figure 11 shows that the maximum number of genes are associated with the cell cycle followed by the cellular macromolecule metabolic process. The network generated was analyzed using Cytohubba, a plug-in of Cytoscape [27]. Moreover, cytohubba analysis of each node of the network was performed and scored based on various parameters, like the degree, closeness centrality, clustering coefficients, betweenness, bottleneck, and stress. The proteins were first sorted on the basis of the clustering coefficient and then on the basis of bottleneck. The proteins with a clustering coefficient less than 0.5 were selected. The proteins having a clustering coefficient more than 0.5 were rejected as this depicts that these proteins were highly clustered and have no further spaces for the attachment of other molecules. The nodes with a clustering coefficient less than 0.5 and high degrees were found to be highly significant. A high degree and clustering coefficient less than 0.5 depict that the nodes are important in various connecting networks and they also have binding spaces available on their surface for other molecules to bind to. Once the nodes were sorted on the basis of clustering coefficients, the next most important parameter was the bottleneck. The proteins with high bottleneck were considered the most critical proteins in any PPIN. The median of the bottleneck scores of selected proteins was calculated which came out to be 3. All the proteins with bottleneck more than or equal to 2 were finally selected. Table 2 enlists the final selected proteins, which were sorted on the basis of bottleneck, with clustering coefficients less than 0.5. CHEK1, showing the highest bottleneck score of 29, was ranked in first position followed by TP53 with a bottleneck of 27. These were also analyzed with ClueGO for the pathway enrichment analysis. Figure 10 shows the pie chart representation of the enriched pathways in the form of groups. In Figure 10, the cell cycle is occupying the maximum area (43.64%) of the pie chart. From this, we can say that the cell cycle is the most enriched pathway, having the maximum number of biomolecules (a detailed graph displaying the sub-pathways in each group along with the number of genes present in it and the proteins involved in each group is enlisted in Table 3, also depicted in Supplementary Figure S1). Figure 11 shows that the maximum number of genes are associated with the cell cycle followed by the cellular macromolecule metabolic process.       Figure 11. Representation of the pathways enriched and the total number of genes associated with them.  Table 4. Figure 12 shows the top three interactions of NNK with its target biomolecules, namely CDK7, CCNA1, and CDKN1B. Table 5 shows the key biomolecular targets of NNK and their role in cell cycle regulation. Figure 11. Representation of the pathways enriched and the total number of genes associated with them.  Table 4. Figure 12 shows the top three interactions of NNK with its target biomolecules, namely CDK7, CCNA1, and CDKN1B. Table 5 shows the key biomolecular targets of NNK and their role in cell cycle regulation.        Table 5. Key proteins and their roles in cell cycle regulation.

CDK7
Catalytic subunit of CAK complex which activates the cyclin-associated kinases CDK1, CDK2, CDK4, and CDK6 by threonine phosphorylation, thus regulating cell cycle progression [28,29] 2. CCNA1 Controls the cell cycle at the G1/S (start) and G2/M (mitosis) transitions [30] 3. CDKN1B Important regulator of cell cycle progression. Acts either as an inhibitor or an activator of cyclin type D-CDK4 complexes depending on its phosphorylation state and/or stoichometry.
[31]  Table 5. Key proteins and their roles in cell cycle regulation.

CDK7
Catalytic subunit of CAK complex which activates the cyclin-associated kinases CDK1, CDK2, CDK4, and CDK6 by threonine phosphorylation, thus regulating cell cycle progression [28,29] 2. CCNA1 Controls the cell cycle at the G1/S (start) and G2/M (mitosis) transitions [30] 3. CDKN1B Important regulator of cell cycle progression. Acts either as an inhibitor or an activator of cyclin type D-CDK4 complexes depending on its phosphorylation state and/or stoichometry. [31] 4. CASP8 Responsible for the positive and negative regulation of apoptosis and inflammation [32] 5. CHEK2 required for checkpoint-mediated cell cycle arrest in response to p53 defects [33] 6. PLK1 Regulates entry and exit from mitosis and cytokinesis. Regulates DNA replication [34] 7. BID induces apoptosis on interaction with BCl2 family proteins [35] 8. HSP90AA1 regulates the function of ATM in sensing and repairing the DNA damages [36] 9. BRCA1 Has role in DNA damage repair, cell cycle control and transcriptional regulation [37] 10. CDK1 Promotes G2-M transition and regulates G1 progress and G1-S transition via association with multiple interphase cyclins. [38] 11. CDK2 Essential for transition of cell cycle from G1 to S phase and then from S to G2 phase. [39,40] 12. CCNB1 Essential for the control of the cell cycle at the G2/M (mitosis) transition [41] 13. CHEK1 required for smooth cellular proliferation by inducing the degradation of cdc25a [42] 14.

Discussion
Cancer is a global inflammable problem, which pathologically occurs due to the accumulation of mutations in one or many genes, a phenomenon known as "oncogene addiction". The loss of fidelity during the replication of DNA or the repair of damaged DNA leads to a cell becoming cancerous. Tobacco consumption has a major influence on these aberrations due to its main component, NNK, which has been strongly related with various cancers, mainly lung cancer [53,54]. NNK binds with various proteins involved in several cellular processes and leads to cancer. Hence, it is vital to fully understand the precise mechanism of the disease development caused by NNK and the interactome analysis of biomolecular target genes/proteins for attaining an overview of the highly complex functional interdependency of these targets for the efficient performance of the cell.
Systems biology is an interdisciplinary field that facilitates an understanding of complex biological network systems. These networks can be metabolic networks, gene regulation network, cell signaling network, or protein-protein interaction network [55]. With the help of system biology, we can analyze huge real-time networks precisely. Significantly, the huge amount of data from research being conducted globally were collected and statistically analyzed to uncover the functions of individual genes and proteins. The results may then be integrated to provide higher-level information [56]. Mathematical modeling and computational simulations help in understanding the internal dynamics of the system or the process and thus help in predicting the future of the process [57]. The networks created represent the genes or proteins with the nodes and their relations or physical connections with the edges [58]. The biological systems are immensely huge and complex networks depicting how one protein is connected with other proteins and how a slight impact on a gene or protein will affect the whole interactome [59]. These highly complicated and tightly packed networks, which govern all biological processes, are also referred to as protein-protein interaction networks (PPINs). All these graphs and networks work on the principle of graph theory. In the current study, we focused on NNK rewired PPIN to identify its bio-molecular targets that regulate the various cellular processes and cause cancer.
To construct PPIN, a literature survey was performed on PubMed database using keywords, like NNK, and around 544 genes extracted from 1317 research articles. The STRING database was used for the generation of the PPIN as it gives freedom to the users to choose the active interaction sources for their network. For this study, we used experimental data sources, like BIND, IntAct, etc., and pathway database interaction sources, like KEGG and reactome. The network generated had 534 nodes, including 100 connectors, at the highest confidence level score of 0.9. It contained 2909 edges that depicted that all the reported nodes were well connected with each other, which were further confirmed by the topological parameter, i.e., the average node degree, which was found to be 10.9. The degree of a node gives the information about the connection of the node with other nodes or simply the number of edges that either enter a node or leave it. All the biological systems are undirected and scale-free so the node degree depicts the average connectivity of each node with other nodes present in the network. The identification of key nodes in any network is not possible just by calculating its degree, hence various other parameters require evaluation, like the betweenness centrality, closeness centrality, and bottleneck scores [60]. Nodes with clustering coefficients less than 0.5 are key nodes because lesser clustering coefficients depict that the protein still have space for interaction with other molecules. The average neighborhood connectivity details provide the information about the density of the network. The shortest path length is important as it gives details about the minimum number of the edges between the two nodes and how fast information can pass on from one node to another node.
The network generated was then divided into modules. Modularization is an important step as it helps in reducing the noise of the data. On modularization, 19 clusters were obtained and each cluster contained seed proteins. Finally, 115 seed proteins were procured from the initial list of 534 proteins. Again, a PPIN was generated using these 115 seed proteins and 100 connectors at the first and second shells, and we obtained 213 nodes and 2509 edges. In addition, 100 connectors were done in order to include some more relevant proteins, which were possibly missed in the above procedure. The topological analysis was done on the basis of clustering coefficient and bottleneck scores. We used the bottleneck score as our final selection criteria as bottlenecks are the significant proteins, which have great functional and dynamic significance in a network [61]. To find the statistically significant cut-off for the proteins to be selected as key targets, the median was calculated on the bottleneck score, which came out to be 2. The enrichment analysis was done to find the pathways and processes that are enriched in the process of the PPIN generation. On the GO enrichment analysis, it was found that most of the seed proteins fall under the pathways that are related to cell cycle regulation and these results were also reflected on the final selected seed proteins that had a bottleneck score of 2 or more than 2. The cell cycle is a tightly regulated and a precisely timed event that is divided into phases and is monitored by checkpoints [62]. It is a highly orchestrated event, which ensures the integrity and stability of the genome. NNK binds to various receptors and hampers various pathways, leading to the cell becoming cancerous [63]. It not only binds with the receptors but also targets many important genes and proteins, like CDKs and cyclins, that are crucial for cell cycle processes [6]. Table 3 and Figure 10, Supplementary Figure S1, and Figure 11 clearly depict that most of the pathways that were enriched fall under cell cycle regulation at different phases.
Finally, docking studies were performed for the target proteins using NNK as a ligand. Molecular docking is the key tool to check whether the binding of ligand and protein is thermodynamically possible or not. The binding energies of these docking results further helped in refining the targets for NNK. The major targets of NNK found were CDK7, CCNA1, CDKN1B, CASP8, CHEK2, PLK1, BID, HSP90AA1, BRCA1, CDK1, CDK2, CCNB1, CHEK1, RPA2, ATM, CDK4, TFDP1, TP53, RB1, and RPA1, with their binding energies ranging from −5.93 to −3.67 Kcal/Mol. PYCARD and QXOS1 were excluded from this screening because of their low degree and 0 clustering coefficient. In addition to the above key proteins, our present study also revealed five connector proteins, which can be proven as putative biomolecular targets of NNK. These proteins are CHEK2, CCNA1, BID, RPA1, and RPA2. From these docking results, we can easily conclude that a majority of the key targets of NNK belong to the cell cycle regulatory proteome. CDKs and cyclins are the key regulatory proteins of the cell cycle that control the various phases of the cell. Different CDK/cyclin complexes are required for a smooth transition of the cell cycle. Other proteins, like ATM, RB1, and TP53, may act as tumor suppressor proteins and signaling proteins for DNA damage. Table 5 enlists the final key biomolecular targets of NNK and their role in the cell cycle.

Conclusions
In conclusion, a NNK rewired PPIN, with the help of various systems biology tools, was explored for the identification of potential targets involved in tobacco induced cancer development. The biomolecules that were suspected for being the most probable targets of NNK were further screened using functional enrichment techniques and then using molecular docking techniques. The results of this study show that the maximum proteins that are the most probable targets of NNK are involved in cell cycle regulation. With these, we can conclude that NNK has a major impact on the biomolecules of the cell cycle regulatory proteome. The present study opens future possibilities to identify new biomarkers for the cancers associated with NNK. This can also help in the pre-symptomatic diagnosis of diseases and in the development of precision medicines.