Consistent Biomarkers and Related Pathogenesis Underlying Asthma Revealed by Systems Biology Approach

Asthma is a common chronic airway disease worldwide. Due to its clinical and genetic heterogeneity, the cellular and molecular processes in asthma are highly complex and relatively unknown. To discover novel biomarkers and the molecular mechanisms underlying asthma, several studies have been conducted by focusing on gene expression patterns in epithelium through microarray analysis. However, few robust specific biomarkers were identified and some inconsistent results were observed. Therefore, it is imperative to conduct a robust analysis to solve these problems. Herein, an integrated gene expression analysis of ten independent, publicly available microarray data of bronchial epithelial cells from 348 asthmatic patients and 208 healthy controls was performed. As a result, 78 up- and 75 down-regulated genes were identified in bronchial epithelium of asthmatics. Comprehensive functional enrichment and pathway analysis revealed that response to chemical stimulus, extracellular region, pathways in cancer, and arachidonic acid metabolism were the four most significantly enriched terms. In the protein-protein interaction network, three main communities associated with cytoskeleton, response to lipid, and regulation of response to stimulus were established, and the most highly ranked 6 hub genes (up-regulated CD44, KRT6A, CEACAM5, SERPINB2, and down-regulated LTF and MUC5B) were identified and should be considered as new biomarkers. Pathway cross-talk analysis highlights that signaling pathways mediated by IL-4/13 and transcription factor HIF-1α and FOXA1 play crucial roles in the pathogenesis of asthma. Interestingly, three chemicals, polyphenol catechin, antibiotic lomefloxacin, and natural alkaloid boldine, were predicted and may be potential drugs for asthma treatment. Taken together, our findings shed new light on the common molecular pathogenesis mechanisms of asthma and provide theoretical support for further clinical therapeutic studies.


Introduction
Asthma is a chronic inflammatory disease of the airways, caused by genetic or environmental and lifestyle factors, which is characterized by airway hyper-responsiveness (AHR), inflammation, and variable airflow obstruction [1]. Despite recent progress in the development of anti-asthmatic medication, asthma is still a major public health problem in the world. Its prevalence, morbidity, and mortality are still increasing [2]. According to the Global Asthma Report 2018, asthma affects over 339

Ten Independent Datasets Meeting the Criteria in This Study
In the study, 191 microarray datasets were obtained by keyword searching (as of 20th December 2018). After filtering based on dataset inclusion and exclusion criteria, 10 asthma-related microarray datasets from six disparate platforms (Affymetrix: HG-U95Av2, HG-U133A, HG-U133_Plus_2, HT_HG-U133_ Plus_PM, and Agilent: 4 × 44K G4112F and SurePrint G3 Human GE v3 8 × 60K) were selected. The detailed information of these datasets and sample descriptions is shown in Table 1 and Supplementary Material S1, respectively. To compile expression data for data integrating and subsequent investigation, each dataset was preprocessed, normalized, and then integrated for further analysis following the steps illustrated in Figure 1.

Sample Quality Control and Microarray Data Preprocessing
After the quality control, 1 (GSE470), 2 (GSE4302), 2 (GSE18965), 11 (GSE41861), 4 (GSE63142), 3 (GSE64913), 7 (GSE89809), and 4 (GSE104468) samples did not meet the cut-off criteria in array quality metrics assessment and were excluded for subsequent analysis; the QC reports are presented in Supplementary Material S2. Thus, a total of 556 samples were used for further analysis, containing 348 asthmatic patients and 208 healthy subjects. Through microarray data preprocessing, 10 gene expression matrices (genes in rows and samples in columns) were generated. For these matrices, all gene names were replaced by Entrez gene identifiers (IDs).

Data Integration and Batch Correction
Based on the results of the data preprocessing, probe annotations, and gene matching, a total of 8324 genes from 556 samples shared by six microarray platforms were selected (Supplementary Figure  S1). After data integrating, batch effects were corrected using the ComBat algorithm. The clustering dendrogram showed that each dataset was clearly separated from the others before the batch correction ( Figure 2A). After the batch correction, the samples from all datasets were well intermixed ( Figure 2B), suggesting that the batch-adjusted data were suitable for further analysis. 1 A and H represent asthmatics and healthy controls, respectively.
To compile expression data for data integrating and subsequent investigation, each dataset was preprocessed, normalized, and then integrated for further analysis following the steps illustrated in Figure 1. The workflow of microarray data integration and subsequent analyses in this study. Quality control of each dataset was manually checked, and then 10 preprocessed and independent qualifying datasets were merged into a large dataset (designated as Merged DS) for further analysis. Batch effects were removed by using the algorithm of batch effect removal. Differentially expressed genes (DEGs) between the asthmatics and healthy controls were identified based on expression fold change > 1.2 and false discovery rate (FDR) < 0.05. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)-based pathways were subsequently enriched and followed by protein-protein Figure 1. The workflow of microarray data integration and subsequent analyses in this study. Quality control of each dataset was manually checked, and then 10 preprocessed and independent qualifying datasets were merged into a large dataset (designated as Merged DS) for further analysis. Batch effects were removed by using the algorithm of batch effect removal. Differentially expressed genes (DEGs) between the asthmatics and healthy controls were identified based on expression fold change > 1.2 and false discovery rate (FDR) < 0.05. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)-based pathways were subsequently enriched and followed by protein-protein interaction (PPI) network construction, prediction of potential transcription factors (TFs) and microRNAs, chromosomal localization, and potential candidate chemical prediction for asthma treatment. The partial contents in this workflow were cited from published papers [18][19][20][21]. interaction (PPI) network construction, prediction of potential transcription factors (TFs) and microRNAs, chromosomal localization, and potential candidate chemical prediction for asthma treatment. The partial contents in this workflow were cited from published papers [18][19][20][21].

Sample Quality Control and Microarray Data Preprocessing
After the quality control, 1 (GSE470), 2 (GSE4302), 2 (GSE18965), 11 (GSE41861), 4 (GSE63142), 3 (GSE64913), 7 (GSE89809), and 4 (GSE104468) samples did not meet the cut-off criteria in array quality metrics assessment and were excluded for subsequent analysis; the QC reports are presented in Supplementary Material S2. Thus, a total of 556 samples were used for further analysis, containing 348 asthmatic patients and 208 healthy subjects. Through microarray data preprocessing, 10 gene expression matrices (genes in rows and samples in columns) were generated. For these matrices, all gene names were replaced by Entrez gene identifiers (IDs).

Data Integration and Batch Correction
Based on the results of the data preprocessing, probe annotations, and gene matching, a total of 8324 genes from 556 samples shared by six microarray platforms were selected (Supplementary Figure S1). After data integrating, batch effects were corrected using the ComBat algorithm. The clustering dendrogram showed that each dataset was clearly separated from the others before the batch correction ( Figure 2A). After the batch correction, the samples from all datasets were well intermixed ( Figure 2B), suggesting that the batch-adjusted data were suitable for further analysis.

Identification of DEGs Involved in Pathogenesis of Asthma
Based on feature selection, we identified 153 differentially expressed genes including 78 upregulated and 75 down-regulated genes across the integrated dataset with the fold change > 1.2 and FDR < 0.05 (Table 2 and Supplementary Material S3). To identify genes most closely related to asthma, more stringent criteria (fold change > 1.5 and FDR < 0.05) were applied and only 8 up-regulated, 2 down-regulated genes were obtained. The up-regulated genes included CEACAM5

Gene Ontology Analysis of DEGs
Based on DAVID analysis, a total of 12 GO functional terms were enriched with adjusted p < 0.01 The up-and down-regulated genes are marked in red and green, respectively. Genes with fold change more than 1.5-fold are labeled by gene symbols.

KEGG Pathway Analysis of DEGs
Based on KEGG pathway mapping, DEGs were significantly enriched in 12 pathways (FDR < 0.05) involved in cancer, arachidonic acid metabolism, linoleic acid metabolism, calcium signaling, aldosterone-regulated sodium reabsorption, and so on (Table 4). It is worth noting that 7 down-and 3 up-regulated genes involved in cancer were also enriched (FDR = 3.24 × 10 −5 , Z-score = −1.26), which is consistent with the previous report that asthma status was associated with decreased risk of aggressive bladder cancer [22]. Moreover, the pathway associated with bladder cancer was also enriched (FDR = 1.17 × 10 −2 ) with a suppressed trend (Z-score = −0.58). Arachidonic acid metabolism was the second most significant pathway (FDR = 1.42 × 10 −4 ). Its abnormal metabolism has been observed in asthma [23,24]. Dysregulated linoleic acid metabolism (FDR = 7.69 × 10 −3 ) was the third enriched pathway. Woods et al. proposed that fatty acid levels were associated with the risk of asthma in young adults [25]. Black et al. argued that increasing dietary intake of n-6 linoleic acid had resulted in increased arachidonic acid and PGE2 production, with a consequent increase in the likelihood of asthma [26]. Our results also found the calcium signaling pathway was significantly suppressed in asthma (FDR = 1.17 × 10 −2 , Z-score = −1.34). An abnormal calcium signaling pathway has been linked with many diseases. Mahn et al. reported that the dysregulation of Ca 2+ homeostasis was probably related to abnormal asthmatic phenotype [27]. Our result supports the fact that peroxisome proliferator-activated receptors (PPAR) are associated with obstructive lung disease [28]. Thus, PPAR may also be a potential target of asthma. Interestingly, a novel pathway named Hematopoietic cell lineage was significantly enriched in this study, suggesting some components in this pathway may be associated with asthma pathogenesis. the production of several pro-inflammatory factors in mast cells [32]. Furthermore, the increased expression of IL-13 promoted the sialylation of MUC4β N-glycans by ST6GAL1 and inhibited the proliferation of damaged epithelial cells in asthmatic patients [33]. The damaged epithelial cells can produce fibroblast-promoting growth factors, which aggravated the airway remodeling response [34]. Similarly, 24 potential regulatory microRNAs were identified (FDR < 0.05) (Supplementary Material S5). A previous study showed that the expression level of microRNA-181a was higher in the acute asthma patients compared to the healthy controls at the beginning of asthma, then dropped to the control level when there were no new airway stimuli. Therefore, microRNA-181a is a potential pro-inflammatory factor in asthma [35]. Mohamed et al. reported that the overexpression of microRNA-26a would increase hypertrophy of human airway smooth muscle cells and promote airway remodeling by inhibition of GSK-3β [36].

Effect of Chromosomal Position on the Expression of DEGs
For many chromosomal loci, gene mutation and dysregulation of gene expression often result in many diseases including prostate cancer [37] and breast cancer [38]. Genome-wide linkage analysis of asthma and airway responsiveness showed that chromosome 12q24.31 contained a locus that was crucial in intermediate phenotype for asthma in a Hispanic population of Costa Rica [39]. Subsequently, Ferreira et al. found that chromosome 11q13.5 locus was significantly associated with the risk of allergic sensitization, in turn, increasing the risk of subsequent development of asthma in Australia [40]. In this study, gene position enrichment analysis of 153 DEGs revealed that 9 asthma-related genes were located on chromosome 3 ( Figure 5). Among them, 5 up-regulated genes (CSTA, GATA2, CPA3, P2RY14, and AADAC) and 1 down-regulated gene (HEG1) were located in q21 region (FDR = 9.65 × 10 −5 ), while the remaining 3 down-regulated genes (LTF, ITPR1, BHLHE40) were located out of q21 region on chromosome 3. Our results indicated that chr3q21 was a new locus associated with asthma. FOXOs were involved in multiple cellular processes, such as stress resistance, cell cycle, apoptosis, and metabolism [29]. In asthmatic patients, the increased expression of AGR2 and RUNX2 would lead to excessive secretion of mucin [30,31]. GATA2 positively regulated the expression of IL-33 receptor (ST2), which stimulated the production of several pro-inflammatory factors in mast cells [32]. Furthermore, the increased expression of IL-13 promoted the sialylation of MUC4β N-glycans by ST6GAL1 and inhibited the proliferation of damaged epithelial cells in asthmatic patients [33]. The damaged epithelial cells can produce fibroblast-promoting growth factors, which aggravated the airway remodeling response [34]. Similarly, 24 potential regulatory microRNAs were identified (FDR < 0.05) (Supplementary Material S5). A previous study showed that the expression level of microRNA-181a was higher in the acute asthma patients compared to the healthy controls at the beginning of asthma, then dropped to the control level when there were no new airway stimuli. Therefore, microRNA-181a is a potential pro-inflammatory factor in asthma [35]. Mohamed et al. reported that the overexpression of microRNA-26a would increase hypertrophy of human airway smooth muscle cells and promote airway remodeling by inhibition of GSK-3β [36].

Effect of Chromosomal Position on the Expression of DEGs
For many chromosomal loci, gene mutation and dysregulation of gene expression often result in many diseases including prostate cancer [37] and breast cancer [38]. Genome-wide linkage analysis of asthma and airway responsiveness showed that chromosome 12q24.31 contained a locus that was crucial in intermediate phenotype for asthma in a Hispanic population of Costa Rica [39]. Subsequently, Ferreira et al. found that chromosome 11q13.5 locus was significantly associated with the risk of allergic sensitization, in turn, increasing the risk of subsequent development of asthma in Australia [40]. In this study, gene position enrichment analysis of 153 DEGs revealed that 9 asthmarelated genes were located on chromosome 3 ( Figure 5). Among them, 5 up-regulated genes (CSTA, GATA2, CPA3, P2RY14, and AADAC) and 1 down-regulated gene (HEG1) were located in q21 region (FDR = 9.65 × 10 −5 ), while the remaining 3 down-regulated genes (LTF, ITPR1, BHLHE40) were located out of q21 region on chromosome 3. Our results indicated that chr3q21 was a new locus associated with asthma.

Identification of Hub Genes Based on PPI Network Construction
To better understand the mechanism of asthma and identify the crucial hub genes among the DEGs, PPI network was constructed using network analysis, which enables the analysis of proteinprotein interaction for multiple genes using NetworkAnalyst [41]. To categorize gene functions, genes in this network were divided into three groups: hub genes highly interacting with other neighbor genes (CD44, KRT6A, LTF, SERPINB2, MUC5B, and CEACAM5), bridge genes connecting different communities (e.g., MUC5AC, CLCA1, POSTN, SP1, and EGFR) and leaf-node genes with one neighbor (e.g., CPA3, BAG1, MNF1, and APP), as shown in Figure 6.

Identification of Hub Genes Based on PPI Network Construction
To better understand the mechanism of asthma and identify the crucial hub genes among the DEGs, PPI network was constructed using network analysis, which enables the analysis of protein-protein interaction for multiple genes using NetworkAnalyst [41]. To categorize gene functions, genes in this network were divided into three groups: hub genes highly interacting with other neighbor genes (CD44, KRT6A, LTF, SERPINB2, MUC5B, and CEACAM5), bridge genes connecting different communities (e.g., MUC5AC, CLCA1, POSTN, SP1, and EGFR) and leaf-node genes with one neighbor (e.g., CPA3, BAG1, MNF1, and APP), as shown in Figure 6. Within this network, CD44 encoding a cell surface adhesion receptor was the most highly ranked hub gene (degree = 59). A previous study showed that CD44 was significantly up-regulated in the bronchial epithelium in asthmatics compared with the controls [42]. Higher expression of CD44 in blood eosinophils could regulate the recruitment and function of leukocytes in asthmatics and are considered as a marker of bronchial asthma [43]. KRT6A are a member of type II epithelial keratins, which are intermediate filament-forming proteins that provide mechanical support and fulfil a variety of additional functions in epithelial cells. The presence of KRT6A was associated with pachyonychia congenital [44], as well as renal carcinoma [45] and breast cancer progression [46]. Genome-wide expression profiling with linkage analysis revealed that the transcription level of KRT6A significantly increased in peripheral blood mononuclear cells (PBMCs), airway brushing cells (ABCs), and bronchioalveolar lavage (BAL) when asthmatics were exposed to cockroach allergen [47]. In line with this, a recent proteomic study of asthma also found that KRT6A was up-regulated in the blood of asthmatics [48]. Therefore, KRT6A was considered as an important marker for asthma. Lactotransferrin (LTF), also called lactoferrin, is an iron-binding glycoprotein and servers as an immune-modulator and anti-inflammatory factor. Transcriptome analysis showed that LTF was significantly up-regulated during the development of asthma [49]. LTF could reduce pollen-induced airway inflammation in a mouse model of allergic asthma [50]. Immunoglobulin receptor CEACAM5, also known as CD66e, is a member of the carcinoembryonic antigen (CEA) family and involved in cell signaling, cell proliferation, cell repair responses, and the maintenance of the intact bronchial Within this network, CD44 encoding a cell surface adhesion receptor was the most highly ranked hub gene (degree = 59). A previous study showed that CD44 was significantly up-regulated in the bronchial epithelium in asthmatics compared with the controls [42]. Higher expression of CD44 in blood eosinophils could regulate the recruitment and function of leukocytes in asthmatics and are considered as a marker of bronchial asthma [43]. KRT6A are a member of type II epithelial keratins, which are intermediate filament-forming proteins that provide mechanical support and fulfil a variety of additional functions in epithelial cells. The presence of KRT6A was associated with pachyonychia congenital [44], as well as renal carcinoma [45] and breast cancer progression [46]. Genome-wide expression profiling with linkage analysis revealed that the transcription level of KRT6A significantly increased in peripheral blood mononuclear cells (PBMCs), airway brushing cells (ABCs), and bronchioalveolar lavage (BAL) when asthmatics were exposed to cockroach allergen [47]. In line with this, a recent proteomic study of asthma also found that KRT6A was up-regulated in the blood of asthmatics [48]. Therefore, KRT6A was considered as an important marker for asthma. Lactotransferrin (LTF), also called lactoferrin, is an iron-binding glycoprotein and servers as an immune-modulator and anti-inflammatory factor. Transcriptome analysis showed that LTF was significantly up-regulated during the development of asthma [49]. LTF could reduce pollen-induced airway inflammation in a mouse model of allergic asthma [50]. Immunoglobulin receptor CEACAM5, also known as CD66e, is a member of the carcinoembryonic antigen (CEA) family and involved in cell signaling, cell proliferation, cell repair responses, and the maintenance of the intact bronchial epithelium [51]. A previous study showed that the expression of CEACAM5 was increased in smoking and non-smoking severe asthma [52]. Consistent with this, the up-regulation of CEACAM5 was also observed in the epithelium in severe neutrophilic asthma [53].
Within bridge genes, MUC5AC is a glycoprotein and significantly increased when the respiratory tract was exposed to external stimulus [54]. High expression of MUC5AC was observed in bronchial epithelial cells of patients with severe asthma [55]. A recent study showed that IL-13 response genes (SERPINB2, POSTN, and CLCA1), protease genes (CPA3 and TPSAB1), and a nitric oxide synthase gene (NOS2) were significantly up-regulated in the epithelium of mild asthma [11].
To provide a comprehensive PPI network structure, the well-established network community recognition algorithm InfoMap [56] was applied and three distinct communities (marked with royal blue, dark turquoise or magenta areas) were obtained (p < 0.001). Combined with gene enrichment analysis, we found that they were associated with the regulation of response to stimulus, cytoskeleton, and response to lipid, respectively ( Figure 6). This result suggested that the occurrence of asthma may be associated with the disturbance of lipid and abnormal cytoskeleton when epithelium cells were exposed to the stimulus.

Identification of Candidate Small Molecules
Using the 10 DEGs with fold change > 1.5 as seeds, 20 potential asthma-related chemical molecules with |score| > 0.6 and p < 0.05 were screened using a Connectivity Map (CMap)-based systems approach. Among them, 17 molecules probably promote the progression of asthma, such as Prestwick-1082, ricinine, and milrinone. On the contrary, 3 molecules have potential therapeutic effects, including catechin, lomefloxacin, and boldine (Table 5). Catechins were one kind of polyphenols and showed various biological and pharmacological activities in antioxidative [57], anti-carcinogenic [58], and antiallergic effects [59]. Catechins from green tea could significantly inhibit the migration of inflammatory cells by suppressing MMP-9 expression and ROS generation in endothelial cells in a murine model of asthma [60]. Recently, Patel et al. confirmed that catechins from Acacia catechu showed inhibitory effects on ovalbumin-induced allergic asthma by inhibiting the activity of the histidine decarboxylase enzyme, and the author suggested that catechin can be used for the treatment of allergic inflammatory disease in humans [61].
Grassi et al. found that antibiotic lomefloxacin can treat acute exacerbations of chronic bronchitis [62]. Using the signalome screening method, Todd et al. reported that lomefloxacin showed significant effects on multiple pro-inflammatory signaling pathways that are constitutively activated in the auto-inflammatory disease TNF receptor [63]. However, no study has been performed concerning the usage of lomefloxacin for asthma treatment.
Boldine is a natural alkaloid from the leaves and bark of the Chilean boldo tree, Peumus boldus, and had anti-inflammatory [64] and antioxidant properties [65]. However, potential therapeutic effects on asthma have not been reported.
Taken together, the compounds catechin, lomefloxacin, and boldine may be potential drugs for the treatment of asthma caused by inflammation and allergy. However, further experiments will be needed to confirm the meta-analysis results.

Crosstalk Pathway of Asthma
To further explore the molecular mechanism of disease pathogenesis, functional enrichment analysis based on the BIOCARTA, REACTOME, and Pathway Interaction Database was performed by using the ToppGene Suite with the threshold of BH-FDR < 0.05.
Six categories were enriched, including 1) genes encoding ECM and ECM-associated proteins (M5889); 2) ECM-affiliated proteins, ECM-regulators and secreted factors (M5885); 3) HIF-1α transcription factor network; 4) FOXA1 transcription factor network; 5) termination of O-glycan biosynthesis, and 6) interleukin-4 and 13 signaling (Table 6). Interestingly, two asthma-associated pathways (M5889 and M5885) were enriched from the BIOCARTA database. Changes observed during airway remodeling in chronic asthmatic patients include excessive extracellular matrix (ECM) production and collagen deposition, increased airway smooth muscle mass, and mucus hypersecretion [66]. Abnormal deposition of ECM protein and/or ECM-associated proteins causes airway stiffening and narrowing, and differences in ECM protein expression may represent a specific asthma phenotype [67]. Our results strongly support that the accumulation of ECMs is essential for the development and progression of airway remodeling in asthma. Some components of ECMs may be novel therapeutic targets in the treatment of airway diseases by suppressing airway remodeling and inflammation.  Cytokine receptors binding to IL-4 activate the JAK-STAT signaling pathway that regulates the expression of downstream genes MMP1, ALOX15, CD36, VEGFA, FOS, NOS2, CLCA1, HIF-1α, TIMP1, and FN1 [72]. All target genes, except for HIF-1α, were identified in this study. Therefore, targeting this pathway through the inhibition of activating IL13 and IL4, and their receptors, and other pathway components should have therapeutic effects on asthma.

Microarray Gene Expression Data Acquisition
Publicly available microarray gene expression datasets for asthma were retrieved from the Gene Expression Omnibus Database (GEO) (http://www.ncbi.nlm.nih.gov/geo/) using the keyword "asthma". The raw datasets were manually checked and only those that met the following criteria were included for subsequent analysis: 1) gene expression profiling in asthmatics and controls, 2) cell type: airway epithelial cell, but not nasal epithelium, 3) gene expression data were generated by a single-channel microarray platform (Affymetrix or Agilent chips), 4) availability of raw CEL or TXT files, and 5) samples with detailed descriptions.

Quality Control and Individual Microarray Dataset Preprocessing
Quality control (QC) analysis was performed for each microarray data using the arrayQualityMetrics package in R. Datasets were excluded from subsequent analysis when they failed to pass any one of the following assessments: 1) pairwise distances between arrays, 2) boxplots; 3) MA plots [73].
Briefly, the raw Affymetrix CEL files were normalized using the robust multi-array averaging (RMA) approach in the Affy package in Bioconductor Project with the following parameters: background correction, normalization and summarization [74], and returning log2 transformed intensities. Raw Agilent microarray data were normalized using the limma package, following In addition, hypoxia-induced HIF-1α could increase the expression of MUC5AC via NF-kB-mediated signaling pathway in asthma. Knocking down HIF-1α by siRNA decreased MUC5AC expression under hypoxia even in IL-1β-treated cells. The above results suggest that HIF-1α may be a therapeutic target for asthma [71].
Cytokine receptors binding to IL-4 activate the JAK-STAT signaling pathway that regulates the expression of downstream genes MMP1, ALOX15, CD36, VEGFA, FOS, NOS2, CLCA1, HIF-1α, TIMP1, and FN1 [72]. All target genes, except for HIF-1α, were identified in this study. Therefore, targeting this pathway through the inhibition of activating IL13 and IL4, and their receptors, and other pathway components should have therapeutic effects on asthma.

Microarray Gene Expression Data Acquisition
Publicly available microarray gene expression datasets for asthma were retrieved from the Gene Expression Omnibus Database (GEO) (http://www.ncbi.nlm.nih.gov/geo/) using the keyword "asthma". The raw datasets were manually checked and only those that met the following criteria were included for subsequent analysis: 1) gene expression profiling in asthmatics and controls, 2) cell type: airway epithelial cell, but not nasal epithelium, 3) gene expression data were generated by a single-channel microarray platform (Affymetrix or Agilent chips), 4) availability of raw CEL or TXT files, and 5) samples with detailed descriptions.

Quality Control and Individual Microarray Dataset Preprocessing
Quality control (QC) analysis was performed for each microarray data using the arrayQuality Metrics package in R. Datasets were excluded from subsequent analysis when they failed to pass any one of the following assessments: 1) pairwise distances between arrays, 2) boxplots; 3) MA plots [73].
Briefly, the raw Affymetrix CEL files were normalized using the robust multi-array averaging (RMA) approach in the Affy package in Bioconductor Project with the following parameters: background correction, normalization and summarization [74], and returning log2 transformed intensities. Raw Agilent microarray data were normalized using the limma package, following background correction using the normexp method, quantile normalization and log2 transformation [75]. All analyses were performed with Bioconductor and R [76].

Data Integration and Batch Effect Removal
In this study, each dataset from all studies was preprocessed separately and then combined into one large dataset for further analysis following the steps illustrated in Figure 1. Probe identifiers from different microarray data were converted into Entrez gene IDs. If more than one probe mapped to a gene, the probe with the largest interquartile range (IQR) was selected as described by Letellier et al. [77], while a probe that mapped to multiple genes was excluded from further analysis.
The direct data integration (DDI) strategy presented in our previous publication [78] was applied to combine the multiple datasets in this study. Specifically, the shared genes among all microarray platforms used in this study were extracted, and then the multiple normalized datasets were merged into a large dataset (i.e., gene expression matrix) for shared genes.
To reduce potential study-specific batch effects, the preprocessed and normalized individual dataset was subjected to correction using the ComBat algorithm, which used the empirical Bayes method to adjust the extreme expression ratios and stabilized gene variances across all other genes and protected their reference from artifacts in the data [79]. The results of data integration and batch effect removal were visualized using clustering dendrograms, created by R and ggtree package [80].

Identification of Differentially Expressed Genes
Differentially expressed genes (DEGs) in integrated datasets were identified using limma package [75]. The linear fit, empirical Bayes (eBayes) statistics, and a false discovery rate (FDR) correction for all data were conducted using lmFit function. Genes with fold change > 1.2 and FDR < 0.05 were considered as DEGs between asthmatics and healthy controls. In addition, fold change > 1.5 and FDR < 0.05 were provided as well, hence we could identify genes which were more closely related to asthma.

Gene Ontology and Pathway Enrichment Analysis
To study the function of DEGs, enrichment of gene ontology (GO) was performed using DAVID (https://david.ncifcrf.gov/). The gene list was uploaded and analyzed using the annotation clustering for biological processes (BP), cellular components (CC), and molecular functions (MF). GO terms were considered to be significant when adjusted p < 0.01. The enrichment results were visualized using GOplot package [81]. Additionally, the enrichment analysis of Kyoto Encyclopedia of Genes and Genomes (KEGG)-based pathway was carried out using the GSEA online tool (http://www. broadinstitute.org/gsea/msigdb/index.jsp) [82]; KEGG pathways were considered to be significant when FDR < 0.05.
To identify up-or down-regulated terms based on DEGs, the Z-score proposed by Walter was adopted and calculated using the following formula [81]: where N up , N down represent the number of up-or down-regulated genes between asthmatics and healthy controls, respectively. The count is the number of DEGs belonging to this term. To investigate the biological pathway most relevant to asthma, functional pathway enrichment analyses of DEGs were performed using ToppGene Suite [83].

Protein-Protein Interaction Network Construction and Community Detection
To explore potential protein-protein interaction associated with asthma, DEGs with fold change > 1.5 were mapped to the PPI data using NetworkAnalyst [41] and its built-in IMEx Interactome, as well as the STRING online database [84]. The PPI network was visualized using Cytoscape V3.5.1 [85].
The asthma-relevant hub-genes were screened using the node degrees calculated in Cytoscape, and the detection of communities in the PPI network was performed with the InfoMap algorithm [56].

Target Gene Prediction of Key Transcription Factors and Regulatory MicroRNAs
Analysis of key transcription factors and their putative target genes was carried out using the GSEA online tool [82]. The statistical significance was determined using hypergeometric distribution and followed by Benjamini-Hochberg multiple testing. Significantly enriched TFs and microRNAs were considered when FDR < 0.05.

Chromosome Position Effect on Gene Expression
To study whether chromosome position affected gene expression patterns, all DEGs were mapped and enriched on chromosomes using the GSEA online tool. The mapping results (FDR < 0.01) were visualized using the karyoploteR package [86].

Connection of DEGs and Small Chemical Molecules
The Connectivity Map was a database of gene expression patterns from cultured human cells treated with bioactive small chemical molecules, in which 6100 groups of small molecule interference experiments and 7056 corresponding gene-expression profiles were stored [87]. CMap analysis was helpful in identifying bioactive molecules resulting in similar or adverse gene expression patterns. DEGs with fold change > 1.5 were subjected to connective mapping analysis using the PharmacoGx package [88], and the enrichment score (ES) was calculated. A molecule was considered beneficial for asthma when ES < −0.5, while it was harmful when ES > 0.5.

Conclusions
To investigate the characteristics of gene expression profiles of bronchial epithelium from asthmatic patients, data integration and systematic bioinformatics approaches were applied in this study. A total of 78 significantly up-regulated and 75 down-regulated genes were identified. Further analyses revealed that 10 differentially expressed genes are possibly instrumental in asthma, including up-regulated CEACAM5, CLCA1, POSTN, CPA3, SERPINB2, KRT6A, CD44, MUC5AC, and down-regulated LTF and MUC5B. Especially, the present study suggests that CD44, KRT6A, LTF, SERPINB2, MUC5B, and CEACAM5 could serve as hub genes in asthma-relevant PPI network and could further act as the biomarkers of asthma. Furthermore, the cross-talking of the IL-4/13 signaling pathway and networks intermediated by transcription factor HIF-1α and FOXA1 play a crucial role in the pathogenesis of asthma. Drug candidate prediction showed that compounds catechin, lomefloxacin, and boldine may be potential drugs for the treatment of asthma caused by inflammation and allergy. Our results elucidated the possible pathogenesis mechanisms of bronchial asthma and also provided several targets for further investigation.