Weighted Gene Co-Expression Network Analysis Identifies Key Modules and Hub Genes Associated with Mycobacterial Infection of Human Macrophages

Tuberculosis (TB) is still a leading cause of death worldwide. Treatments remain unsatisfactory due to an incomplete understanding of the underlying host–pathogen interactions during infection. In the present study, weighted gene co-expression network analysis (WGCNA) was conducted to identify key macrophage modules and hub genes associated with mycobacterial infection. WGCNA was performed combining our own transcriptomic results using Mycobacterium aurum-infected human monocytic macrophages (THP1) with publicly accessible datasets obtained from three types of macrophages infected with seven different mycobacterial strains in various one-to-one combinations. A hierarchical clustering tree of 11,533 genes was built from 198 samples, and 47 distinct modules were revealed. We identified a module, consisting of 226 genes, which represented the common response of host macrophages to different mycobacterial infections that showed significant enrichment in innate immune stimulation, bacterial pattern recognition, and leukocyte chemotaxis. Moreover, by network analysis applied to the 74 genes with the best correlation with mycobacteria infection, we identified the top 10 hub-connecting genes: NAMPT, IRAK2, SOCS3, PTGS2, CCL20, IL1B, ZC3H12A, ABTB2, GFPT2, and ELOVL7. Interestingly, apart from the well-known Toll-like receptor and inflammation-associated genes, other genes may serve as novel TB diagnosis markers and potential therapeutic targets.


Introduction
Tuberculosis (TB) is still a global threat and one of the leading infectious diseases according to the World Health Organisation, causing 1.5 million deaths and 10 million new cases in 2018 [1]. TB infection begins when the causal bacilli, Mycobacterium tuberculosis (Mtb), reach the alveolar air sacs of the lungs, where they invade and replicate within macrophages [2,3]. Granulomas are formed when macrophages, T lymphocytes, B lymphocytes, and fibroblasts aggregate, surrounding the infected macrophages. The granuloma prevents dissemination of the mycobacteria but provides a local environment for the infection of immune cells [3]. The formation of granulomas may also suppress host immune responses, as macrophages and dendritic cells in the granulomas are unable to present antigen to lymphocytes [4]. In addition to active disease, Mtb may also cause latent infection [5]. Epidemiology studies indicate that 90% of those infected with TB are

Construction of Weighted Co-Expression Network
To learn about the immune response to infection by different mycobacteria, we investigated the in vitro gene regulatory response of macrophages to infection using multiple Mtb strains and related mycobacterial species (Table 1).  Table S1 for a list of all accession numbers corresponding to each Bioproject.
Specifically, we collected RNAseq data of human macrophage infected and/or stimulated with the following mycobacteria: Mtb H37Rv (pathogenic and slow growing), the common strain often used in laboratory experiments; Mtb GC1237, which is a clinical isolate of the virulent Beijing family; Mycobacterium smegmatis, which is fast-growing and non-pathogenic; M. bovis bacillus Calmette-Guerin (BCG), which is slow growing and an attenuated Mycobacterium bovis strain used for vaccinations; MAB-R and MAB-S, which are the rough and smooth morphotypes of the fast-growing M. abscessus; and heat-killed Mtb H37Rv. These previously published transcriptome studies were combined with our own data using M. aurum (PRJNA575195). The detailed features of mycobacteria are listed in Table 2. To construct the gene co-expression networks, all data were downloaded from the Gene Expression Omnibus (GEO) database, including our own unpublished experimental data, as detailed in Table 1 and Table S1. Raw data were preprocessed identically using R for background correction and normalisation. A total of 11,533 genes (Appendix A (Additional File 1)) for 198 samples were used to build a WGCNA. First, genes and conditions were clustered using the flashClust function (see Figure S1, where information on infection status, infection time, or infection type/different bacterial strains is indicated). Selection of the soft-thresholding power is an important step when constructing a WGCNA [22]. We performed the analysis of network topology for thresholding powers from 1 to 50 and identified the relatively balanced scale independence and mean connectivity of the WGCNA. As shown in Figure 1, power value 30, which was the lowest power for the scale-free topology fit index on 0.85, was selected to produce a hierarchical clustering tree (dendrogram) of 11,533 genes. From the WGCNA analysis, we identified 47 distinct modules with mergeCutHeight of 0.3 ( Figure 2). data, as detailed in Table 1 and Table S1. Raw data were preprocessed identically using R for background correction and normalisation. A total of 11,533 genes (Appendix A (Additional File 1)) for 198 samples were used to build a WGCNA. First, genes and conditions were clustered using the flashClust function (see Figure S1, where information on infection status, infection time, or infection type/different bacterial strains is indicated). Selection of the soft-thresholding power is an important step when constructing a WGCNA [22]. We performed the analysis of network topology for thresholding powers from 1 to 50 and identified the relatively balanced scale independence and mean connectivity of the WGCNA. As shown in Figure 1, power value 30, which was the lowest power for the scale-free topology fit index on 0.85, was selected to produce a hierarchical clustering tree (dendrogram) of 11,533 genes. From the WGCNA analysis, we identified 47 distinct modules with mergeCutHeight of 0.3 ( Figure 2).

Correlation between Modules and Identification of Key Modules
In WGCNA, modules are defined as clusters of highly interconnected genes; genes within the same cluster have high correlation coefficients. The module eigengene (ME) is representative of the corresponding module's gene expression profile correlated with a defined trait (infection status). We analysed the interaction relationships of the 47 modules and plotted the network heatmap ( Figure S2). The results showed that each module was

Correlation between Modules and Identification of Key Modules
In WGCNA, modules are defined as clusters of highly interconnected genes; genes within the same cluster have high correlation coefficients. The module eigengene (ME) is representative of the corresponding module's gene expression profile correlated with a defined trait (infection status). We analysed the interaction relationships of the 47 modules and plotted the network heatmap ( Figure S2). The results showed that each module was an independent validation from the others, which indicated a high level of independence among the modules and relative independence of gene expression within each module. Moreover, we calculated eigengenes and clustered them according to their correlations. The results demonstrated that the 47 modules were mainly divided into two clusters ( Figure S3). As shown in Figure 3, we found that several modules were significantly correlated with the infection status (significance value < 0.001), which was defined as presence or absence of mycobacterial infection. Among them, the grey60 module showed the highest correlation (r value) compared with the other modules. Therefore, grey60 was considered as the most relevant differentially expressed gene module across these diverse mycobacteria and variety of host macrophage cells exposed to infection. Figure 4 illustrates the correlation between module membership and gene significance in the grey60 module. The results indicated that genes in the grey60 module likely play an important role in responding to mycobacterial infection in macrophages.

Functional Enrichment and Identification of Hub Genes
Following this observation, we inspected in detail the composition of the grey60 module. The module is correlated with mycobacterial infection status and contains 226 genes, the expression of each gene is listed in Appendix B (Additional File 2). Appling Kyoto Encyclopedia of Genes and Genome (KEGG) and Gene Ontology (GO) enrichment analysis, we found these 226 genes significantly enriched into pathways such as Cytokine-

Functional Enrichment and Identification of Hub Genes
Following this observation, we inspected in detail the composition of the grey60 module. The module is correlated with mycobacterial infection status and contains 226 genes, the expression of each gene is listed in Appendix B (Additional File 2). Appling Kyoto Encyclopedia of Genes and Genome (KEGG) and Gene Ontology (GO) enrichment analysis, we found these 226 genes significantly enriched into pathways such as Cytokinecytokine receptor interaction, mitogen-activated protein kinase (MAPK) signalling, toumor necrosis factor (TNF) signalling, phosphatidylinositol 3-kinase (PI3K)-protein kinase B (Akt) signalling, interleukin 17 (IL-17) signalling, and biological processes such as response to lipopolysaccharides, response to molecule of bacterial origin, and positive regulation of peptidyl-tyrosine phosphorylation, respectively (full list included in Appendix C (Additional File 3)). A heatmap of the weighted relative expression of each gene in the grey60 module ( Figure 5) revealed that many module genes were strongly expressed after mycobacteria infection compared with the uninfected control groups. Among the three macrophage controls, blood monocyte-derived macrophage (BMDM) had the lowest expression of these genes, followed by the THP1 and human alveolar macrophage (HAM) cells. Among the different mycobacterial infections, Mtb GC1237 and M. smegmatis caused the highest upregulation of these genes and were followed by BCG, Mtb H37Rv, MABS, MABR, heat inactivated Mtb H37Rv, and M. aurum. macrophage controls, blood monocyte-derived macrophage (BMDM) had the lowe pression of these genes, followed by the THP1 and human alveolar macrophage (H cells. Among the different mycobacterial infections, Mtb GC1237 and M. smegmatis ca the highest upregulation of these genes and were followed by BCG, Mtb H37Rv, M MABR, heat inactivated Mtb H37Rv, and M. aurum. Out of the 226 genes from the grey60 module, 74 genes showed the best linear c lation with mycobacteria infection (Figure 4, see the full gene list in Appendix D (A tional File 4)). Thus, those 74 genes were used as candidates for hub gene identific analysis by applying the NetworkAnalyst platform to map the protein-protein interac and visualised them as networks in Cytoscape ( Figure 6). The first 10 genes in the g module according to their interaction degree, connected to most of the other genes w the module, were highlighted as main hub genes: nicotinamide phosphoribosyltran ase (NAMPT), interleukin 1 receptor associated kinase 2 (IRAK2), suppressor of cyt signalling 3 (SOCS3), prostaglandin-endoperoxide synthase 2 (PTGS2), C-C motif ch kine ligand 20 (CCL20), interleukin 1 beta (IL1B), zinc finger CCCH-type containing (ZC3H12A), ankyrin repeat and BTB domain containing 2 (ABTB2), glutamine fruct Out of the 226 genes from the grey60 module, 74 genes showed the best linear correlation with mycobacteria infection (Figure 4, see the full gene list in Appendix D (Additional File 4)). Thus, those 74 genes were used as candidates for hub gene identification analysis by applying the NetworkAnalyst platform to map the protein-protein interactions and visualised them as networks in Cytoscape ( Figure 6). The first 10 genes in the grey60 module according to their interaction degree, connected to most of the other genes within the module, were highlighted as main hub genes: nicotinamide phosphoribosyltransferase (NAMPT), interleukin 1 receptor associated kinase 2 (IRAK2), suppressor of cytokine signalling 3 (SOCS3), prostaglandin-endoperoxide synthase 2 (PTGS2), C-C motif chemokine ligand 20 (CCL20), interleukin 1 beta (IL1B), zinc finger CCCH-type containing 12A (ZC3H12A), ankyrin repeat and BTB domain containing 2 (ABTB2), glutamine fructose 6 phosphate transaminase 2 (GFPT2), and elongation of very long chain fatty acids protein 7 (ELOVL7). We also performed enrichment analysis to explore the biological processes (BP) and pathways in which the key module was involved using the 74 genes that showed better linear correlation within the grey60 module. GO enrichment of BP was conducted by R programme clusterProfiler, as detailed in Figure 7. The findings are consistent with the results using all 226 genes in the grey60 module (Appendix C (Additional File 3)). The results highlight that genes in the grey60 module were enriched in innate immune stimulation responses, such as pathogen-associated molecular patterns (PAMPs), cytokine secretion, and immune cell migration, all of which were positively correlated with mycobacterial infection. 7 (ELOVL7). We also performed enrichment analysis to explore the biological processes (BP) and pathways in which the key module was involved using the 74 genes that showed better linear correlation within the grey60 module. GO enrichment of BP was conducted by R programme clusterProfiler, as detailed in Figure 7. The findings are consistent with the results using all 226 genes in the grey60 module (Appendix C (Additional File 3)). The results highlight that genes in the grey60 module were enriched in innate immune stimulation responses, such as pathogen-associated molecular patterns (PAMPs), cytokine secretion, and immune cell migration, all of which were positively correlated with mycobacterial infection.

Discussion
The interaction between macrophage and bacterium is central to the immunopathology of mycobacterial diseases; a deeper understanding of this interplay will identify new

Discussion
The interaction between macrophage and bacterium is central to the immunopathology of mycobacterial diseases; a deeper understanding of this interplay will identify new treatment strategies. Here, we aimed to address the question of whether different mycobacteria similarly modulate the host response of macrophages. Direct comparison of the differentially expressed genes (DEGs) of macrophages infected with different mycobacteria identified from different individual studies was not appropriate to meet this goal, as the analyses were performed differently by each group and, more fundamentally, DEG analysis neglects any key but minor changes in gene expression. For example, when we first explored the transcriptional change of THP1-derived macrophages challenged with M. aurum by DEG analysis, we only identified 27 differentially expressed genes in comparison with uninfected control macrophages (Appendix E (Additional File 5)). The most up-regulated and down-regulated genes were TMEM189-UBE2V1 and RPS18, respectively. TMEM189-UBE2V1 is related to the IL-1 signalling pathway [23,24]; however, standard DEG analysis did not identify any further IL-1 related genes. To note, M. aurum shares a high similarity with Mtb at the genomic level [25], and most of the genes reported to be associated with drug resistance are common [26].
To compare between published macrophage transcriptome signatures and to overcome the DEG analysis drawback of neglecting minor changes in gene expression, we applied WGCNA to identify a core set of macrophage mycobacterial-responsive genes, contrasting the RNA profiles of macrophages responding to different mycobacterial infections including seven mycobacterial exposures and three types of macrophages. These mycobacteria are known to differ in pathogenicity, growth rate, morphology, and cell wall lipid composition ( Table 2). The selected mycobacterial species and macrophage types represent most of the working models that have been widely used for Mtb infection characterisation [3,15,27].
In our study, by WGCNA analysis of the 198 transcriptome samples, we can categorise 11,533 genes into 47 modules according to their expression pattern (Figure 2). Among the defined modules, we identified a module (grey60), consisting of 226 genes, which strongly correlates with mycobacterial adaptations to infection (Figure 4). The grey60 module is mostly enriched with genes from biological processes such as response to pathogen-associated molecules, cytokine secretion, and chemoattractant release (Figure 7). Specifically in the module, we find genes induced in response to infection such as IL1B, TNF, IL6, IL12B, NFKBIA, JUN, and MAP3K8, which are related to Toll-like receptor signalling (Appendix B (Additional File 2)). To note, IL1B, TNF, IL1A, IL6, IL23A, IL12B, PLK3, and IRAK2 were already reported as Mtb-responsive genes [27][28][29][30][31][32]. Next, by protein interaction network analysis, we identified a set of hub genes that represent the general response of macrophage infected with different mycobacteria, the top 10, connecting the genes in the module together, were NAMPT, IRAK2, SOCS3, PTGS2, CCL20, IL1B, ZC3H12A, ABTB2, GFPT2, and ELOVL7. Of them, IRAK2 and IL1B are well known to be related to Toll-like receptor signalling pathways [33]; SOCS3, PTGS2, CCL20, and IL1B are linked to TNF signalling pathways [34]; ZC3H12A and PTGS2 are involved in cellular inflammatory responses [35]; and GFPT2, NAMPT, ABTB2, and ELOVL7 are related to cell cycle dysregulation, tissue damage, and autophagy [36]. IFN-γ, IP-10, CRP, TNF-α, CCL4, IL1β, and TLR4 have been associated with high accuracy to TB [37,38]. Genes involved in NAD+ biosynthesis, for example, NAMPT, were strongly upregulated during Mtb infection [39]. The current data emphasise the previously suggested key role of the IFN pathway in the macrophage response to Mtb and successful outcome of infection [39,40]. Interestingly, some of these genes are involved in nucleic acid or lipid turnover and are also activated during other diseases, such as autoimmunity or cancer [36,[40][41][42][43][44][45]. Our results back up very recent studies of Mtb modulation of host cell metabolism pathways [39]. Moreover, we note here the role of cell migration factors in TB pathogenesis, besides cell response to bacteria, as the genes in grey60 split into the two directions (Figure 7). In addition, interferon-γ release assay (IGRA), based on IFN-γ response, is widely used in TB diagnosis [46]. Except for the well-known TB markers such as IRAK2, SOCS3, PTGS2, CCL20, and IL1B, the other hub genes we identified here, NAMPT, ZC3H12A, ABTB2, GFPT2, and ELOVL7, may be useful as new biomarkers for TB diagnosis and as therapeutic targets for host-directed strategies. The present results based on the host metabolism can provide a complementary approach to other studies strictly focused on the mycobacterial response to available drugs in the market [47][48][49][50][51]. Hopefully, the dual sequencing of host-pathogen models [52] will help to unravel the key genes associated to both virulence and host response to infection and set the path to a better tailored drug design.
We conclude here that different mycobacterial infection models can trigger a common core set of genes related to macrophage activation and migration, which is separate from macrophage stimulation with heat-killed Mtb. However, we cannot disregard the inherent differences between mycobacteria strains and macrophage type, as shown in Figure 6. Control uninfected BMDM had the lowest expression of this gene module, HAM had the highest level of expression of these genes, whereas THP1 showed a medium profile. The three studied macrophage types include two primary macrophage cells: blood monocytederived macrophages and alveolar macrophages, together with an immortalised monocytelike cell line (THP1). Alveolar macrophages are natural mature macrophages resident in the lung and are among the first immune cells to encounter invading mycobacteria, while blood monocytes and THP1 cells need to be activated to become macrophages. We cannot disregard the key differences between immortalized THP1 cells and their considered physiological counterparts. For example, THP-1 cells, when compared with monocytes, are far less responsive to some pathogen recognition patterns [53]. Therefore, our results emphasise once more the importance of selecting the appropriate cell infection model to suit each study. Comparing the different mycobacteria strains, Mtb H37Rv, Mtb GC1237, M. bovis BCG, M. smegmatis, MABS, and MABR showed similar induction of genes in the grey60 module, while infection with M. aurum or stimulation with heat-inactivated Mtb H37Rv resulted in a lower magnitude of upregulation of these genes. The two species, M. aurum and M. smegmatis, have been regarded as ideal surrogates for mimicking tuberculosis in drug development, both demanding a shorter time of culture and less stringent biosafety laboratory conditions [25,54].The differential regulation of the gene module identified here by all mycobacteria highlights the potentiality of some alternative infection models to live Mtb infection. (NCTC #88081201) were performed as previously described; triplicate biological replicates were performed [55]. After 24 h infection, uninfected and infected macrophage cells were washed 3 times with prewarmed PBS and collected by cell scrapper for RNA extraction. Total RNA was extracted using mirVana TM miRNA Isolation Kit as described by the manufacturer (Ambion, Life Technologies, AM1560). RNA purity was determined by spectrophotometry, and RNA integrity was analysed using Agilent 2100 Bioanalyzer and calculated as an RNA integrity number. Following RNA extraction, RNA sequencing libraries were prepared according to protocols provided by Illumina. 50 bp-long single-end sequencing was carried out in an Illumina HiSeq2000 sequencer with a depth of >20 million reads per sample at the CRG genomics (Centre for Genomic Regulation, Barcelona) facility. The raw transcriptomic data were uploaded in NCBI Sequence Read Archive: PRJNA575195. After sequencing, quality assessment of reads was carried out using FastQC [56] to assess the distribution of Phred quality scores and mean percentage GC content across each read. Adapters were trimmed using Trimmomatic [57]. Trimmed reads were aligned to the latest human genome assembly (version GRCh38) using Hisat2 [58]. Aligned reads were converted and sorted in the SAM file format using Samtools, providing the sorted BAM file. Stringtie was used to count the number of reads mapping to the GTF file of the annotated genome using BAM file as input [59]. Following, DEseq2 was used to reveal differently expressed genes (DEGs) using the read count as input file, and the results of all statistical tests were adjusted for multiple testing (0.05) [60].

Data Collection and Sample Processing for WGCNA
Mycobacterial macrophage infection gene expression studies were reviewed and filtered based on the inclusion-exclusion criteria of the following: (1) human macrophage; (2) at least 2 h post infection time points; (3) minimum sample size of 3 control/3 experimental. After filtering, 6 genome-wide expression studies were selected, including our own transcriptome of M. aurum-infected macrophages from this study, PRJNA575195. As shown in Table 1, these datasets involved three different macrophage cell types (BMDM, THP1, HAM) infected with 8 different mycobacteria (Mtb-H37Rv, BCG, MAB-S, MAB-R, M. smegmatis, Mtb-GC1237, M. aurum, and heat-inactivated Mtb-H37Rv). All data were generated using a whole genome platform (RNAseq). In total, 198 samples were analysed, as summarised in Table 1 and detailed in full in Table S1. Ensembl IDs from RNAseq were converted into gene symbols and merged gene expression into two groups. For sequencing data, genes with less than 10 average counts per million were filtered out of the analysis. Then, the expression data were transformed using log2 transformation to normalise the data [61]. The well-established ComBat procedures in SVA packages were performed to reduce the potential study-specific batch effect [62]. Four samples were detected as outliers and filtered out using flashClust [63] (cluster dendrogram detailed in Figure S1).

Construction of Weighted Gene Co-Expressed Networks
We performed WGCNA to identify the gene modules of interest from the assembled dataset using R package WGCNA [18]. The standard WGCNA procedure generates a squared adjacency matrix, between genes, based on their correlation [64]. In the present study, the absolute value of the Pearson correlation between the expression profiles of all candidate genes was determined for the 11532 most varying non-redundant transcripts, which was transformed into a connection strength measure by using a power function (connection strength (i,j) = |correlation(i,j)|ˆβ). The scale-free topology criterion was applied to determine the lowest soft threshold power guaranteeing a scale-free topology fit (R 2 > 0.85) [18]. To group genes with coherent expression profiles into modules, we used the WGCNA R packages average lineage hierarchical clustering to measure the dissimilarity by topological overlapping. The co-expression modules were constructed using the automatic network construction function blockwiseModules with the following settings: power, 30; degree of similarity, 0.75; minModuleSize, 30; and TOMType, signed. All other parameters of the modules were set to default [18]. Gene significance (GS) was defined as the log10 transformation of P-value (GS = lgP) in the linear regression between gene expression and sample information. In addition, module significance (MS) was defined as the average GS for all the genes in a module. Modules with significance less than 0.0001 and an absolute value of correlation coefficient higher than 0.5 were recognised as significantly associated modules.

Functional Enrichment of Recurrence-Associated Modules
The significantly associated modules were applied to GO and KEGG databases for function and pathway enrichment analyses [65]. P-Value corrected by Benjamin-Hochberg FDR less than 0.05 was the threshold used to discriminate significant terms.

Identification of Hub Genes in Key Modules
Hub genes that are highly interconnected with module nodes were considered as functionally significant. A protein-protein interaction network was constructed using NetworkAnalyzer [66] for the significantly associated module. The intramodular connectivity was calculated to identify hub genes (qvalue < 0.01).

Conclusions
In this work, we compared the transcriptomic responses of different types of macrophages infected with seven different mycobacterial strains: four that are pathogenic and survive intracellularly and three that are non-pathogenic surrogates. By weighted gene co-expression network analysis (WGCNA) of 198 sample profiles, we identified a specific 226-gene module enriched with genes related to bacterial unique pattern recognition, cytokine secretion, and leukocyte chemotaxis. We also defined by protein-protein network analysis a core set of genes that may be used as novel diagnosis markers and therapeutic targets. We also observed that the magnitude of expression of this immune cell response signature differed between the mycobacteria evaluated. Results highlight the close connection between the host immune system and pathogenicity, and suggest new targets for host-directed therapies.
Supplementary Materials: The following are available online at https://www.mdpi.com/2079-638 2/10/2/97/s1, Table S1. Data information. Figure S1. Dendrogram of all samples. The leaves of the tree correspond to each sample, clustering expression ratios from 11,533 genes. The first colour band underneath the tree indicates which samples appear to be outlying (the outliers, coloured in red, were excluded from the following analysis), the second colour band represents the infection time.  Figure S2. Interaction relationship analysis of co-expressed genes. Different colours of horizontal axis and vertical axis represent different gene modules. The brightness of yellow to red of the intersects represent the degree of connectivity between different modules. There was no significant difference in interactions among different modules, indicating a high-scale independence degree among these modules. Figure S3. Cluster dendrogram of module eigengenes.

Conflicts of Interest:
The authors declare that they have no competing interests.