Bioinformatics and Experimental Analyses Reveal NFIC as an Upstream Transcriptional Regulator for Ischemic Cardiomyopathy

Ischemic cardiomyopathy (ICM) caused by coronary artery disease always leads to myocardial infarction and heart failure. Identification of novel transcriptional regulators in ICM is an effective method to establish new diagnostic and therapeutic strategies. In this study, we used two RNA-seq datasets and one microarray dataset from different studies, including 25 ICM and 21 non-failing control (NF) samples of human left ventricle tissues for further analysis. In total, 208 differentially expressed genes (DEGs) were found by combining two RNA-seq datasets with batch effects removed. GO and KEGG analyses of DEGs indicated that the response to wounding, positive regulation of smooth muscle contraction, chromatin, PI3K-Akt signaling pathway, and transporters pathways are involved in ICM. Simple Enrichment Analysis found that NFIC-binding motifs are enriched in promoter regions of downregulated genes. The Gene Importance Calculator further proved that NFIC is vital. NFIC and its downstream genes were verified in the validating microarray dataset. Meanwhile, in rat cardiomyocyte cell line H9C2 cells, two genes (Tspan1 and Hopx) were confirmed, which decreased significantly along with knocking down Nfic expression. In conclusion, NFIC participates in the ICM process by regulating TSPAN1 and HOPX. NFIC and its downstream genes may be marker genes and potential diagnostic and therapeutic targets for ICM.


Introduction
Ischemic heart disease is among the most common causes of morbidity and mortality worldwide [1]. The histological damage associated with scar tissue and the nonreciprocal loss of cardiomyocytes eventually leads to functional depression-heart failure [2]. The currently available therapeutic strategies such as cardiovascular drugs, interventional surgery, and coronary artery bypass grafting, which improve symptoms, cannot fundamentally reverse the condition. Thus, new therapeutic strategies need to be established.
Previous studies have reported that important transcription factors have considerable influence in ICM on account of their special binding motif. For example, the delivery of transcription factor GATA4 locally to the infarct border zone demonstrates increased cardiac function and less negative remodeling [3]. Another transcription factor, Foxo3a, activates the expression of the apoptosis repressor with a caspase recruitment domain by directly binding to its promoter; thus, maintaining calcium homeostasis and inhibiting cardiac apoptosis caused by ICM [4]. NFIC is a transcription factor, a member of the NFI family. The initial study has found that the NFI family is a transcription factor family necessary for adenovirus DNA replication [5], which consists of four members in humans and most vertebrates: NFIA, NFIB, NFIC, and NFIX [6]. They share a highly conserved DNA-binding motif at their N-termini, and therefore bind to a common DNA sequence [7,8]. Studies have shown that NFIA, NFIB, and NFIX play important roles in different organs [9][10][11]. In previous studies, NFIC is regarded as a key regulator of tooth development. NFIC deficiency severely disrupts odontoblast differentiation, leading to the formation of aberrant odontoblasts in the early stage of root formation [12]. Accumulative evidence has also revealed that NFIC is involved in regulating adipocyte and osteoblast differentiation [13,14]. However, the role of NFIC in the heart is unclear.
The rapidly developing integrated bioinformatics analyses have become a powerful tool for predicting disease-associated genes, disease subtypes, and disease treatment. Due to different criteria regarding experimental conditions and sample selection between different datasets, the gene expression levels always show diversity in different datasets. Meta-analysis has been widely used to overcome inconsistent findings among different studies [15,16]. A study identified two potential marker genes (CDC5L and DDX46) of pulmonary arterial hypertension by meta-analysis and animal model validation [17]. Three upstream transcriptional regulators and multiple novel dysregulated genes involved in the development of ischemic cardiomyopathy and its associated cardiovascular diseases were found by RNA-seq meta-analysis [18]. Transcription factors that play important roles in diseases are usually found by animal models and experimental methods, while bioinformatics analyses are also effective ways to search motifs.
In our study, we combined two RNA-seq datasets with the batch effects removed. In total, 118 downregulated and 90 upregulated genes were found and Kyoto Encyclopedia of Genes (KEGG) and Gene Ontology (GO) analyses were also performed to investigate the underlying mechanisms. Simple Enrichment Analysis indicated that NFIC-binding motifs are enriched in promoter regions of downregulated genes in ICM and the essentiality of NFIC was further proved with the Gene Importance Calculator. NFIC and its downstream genes were verified in validating the microarray dataset and two of them were confirmed in Nfic knockdown H9C2 cells. Our study may point to potential targets for the treatment of ICM.

Data Selection
The RNA-seq datasets GSE120852, GSE55296 were downloaded from Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/ (accessed on 25 October 2021)). GSE120852 includes heart tissues from non-failing controls (n = 5) and ICM patients (n = 5); GSE55296 includes heart tissues from non-failing controls (n = 10) and ICM patients (n = 13). We downloaded the expression data of 18 ICM heart tissues and 15 non-failing controls from GSE120852 and GSE55296 expression profiling for further analysis. The microarray dataset GSE1869 includes heart tissues from non-failing controls (n = 6) and ICM patients (n = 7). We downloaded the Series Matrix File for further verification. Detailed information about the datasets can be found in Tables S1-S3. The workflow for bioinformatics analysis in our study is illustrated (Figure 1).

Data Preprocessing and DEGs Screening
A pipeline of HISAT2 [19], Samtools [20], and featureCounts [21] was used for aligning the trimmed reads to the human reference genome (GRCh38) and quantifying gene expression. Only uniquely mapped reads were used for expression quantification. The GSE120852 and GSE55296 datasets were merged with batch effects removed using the "limma" package [22]. According to the workflow, the merged dataset served as a training dataset to identify important genes while the GSE1869 dataset was used for assessment. The effect of inter-sample correction was demonstrated using PCA plots which were per-formed on the training matrices before and after the removal of the inter-batch effect with the "BioLadder" (https://www.bioladder.cn/web/ (accessed on 1 March 2022)). DEGs were screened using the "DESeq2" package [23] in R software (version 4.1.2) with the cutoff indicated in figure legends.

Data Preprocessing and DEGs Screening
A pipeline of HISAT2 [19], Samtools [20], and featureCounts [21] was used for aligning the trimmed reads to the human reference genome (GRCh38) and quantifying gene expression. Only uniquely mapped reads were used for expression quantification. The GSE120852 and GSE55296 datasets were merged with batch effects removed using the "limma" package [22]. According to the workflow, the merged dataset served as a training dataset to identify important genes while the GSE1869 dataset was used for assessment. The effect of inter-sample correction was demonstrated using PCA plots which were performed on the training matrices before and after the removal of the interbatch effect with the "BioLadder" (https://www.bioladder.cn/web/ (accessed on 1 March 2022)). DEGs were screened using the "DESeq2" package [23] in R software (version 4.1.2) with the cutoff indicated in figure legends.

Transcription Factor Binding Site Analysis
Genes that were significantly downregulated in ICM compared with NF were subjected to Transcription Factor Binding Site (TFBS) enrichment analysis using the Simple Enrichment Analysis (https://meme-suite.org/meme/tools/sea (accessed on 16 November 2021)) [26]. Predicted TFBS were identified in gene regions 5000 bp upstream of the transcription start site (TSS) for all downregulated genes. The promoter regions

Transcription Factor Binding Site Analysis
Genes that were significantly downregulated in ICM compared with NF were subjected to Transcription Factor Binding Site (TFBS) enrichment analysis using the Simple Enrichment Analysis (https://meme-suite.org/meme/tools/sea (accessed on 16 November 2021)) [26]. Predicted TFBS were identified in gene regions 5000 bp upstream of the transcription start site (TSS) for all downregulated genes. The promoter regions were exported by "biomaRt" package. NFIC motif profiles were downloaded from JASPAR TFBS database (https://jaspar.genereg.net/ (accessed on 16 November 2021)) [27].

Construction of Lentiviral Vectors and Cell Culture
To knockdown NFIC expression, Nfic-specific short hairpin RNA (shRNA)-expressing constructs (shNfic) were designed. The shRNA sequences targeting Nfic are listed in Table S7. The shRNA was cloned into pLKO.1 vector and empty vector was used as the control. pMD2.G and psPAX2 were transfected together with the pLKO.1 vector as the helper vectors. The ratio of the pLKO.1 vector, psPAX2, and PMD2.G is 4:3:1. They were transfected into HEK293T cells using Genetwin transfection reagent. H9C2s were purchased from China Nation Collection of Authenticated Cell Cultures. Cells were cultured in Dulbecco's modified Eagle's medium (DMEM) (Hyclone, Logan, UT, USA) supplemented with 10% fetal bovine serum (FBS) (ExCell, Suzhou, China), 100 IU/mL penicillin, and 100 µg/mL streptomycin (GENOM, Hangzhou, China). These cells were cultured at 37 • C in a humidified incubator under 5% CO 2 . Cultured H9C2s were infected with the indicated lentiviral vectors after three cell passages. After 24 h, the culture medium was replaced by DMEM supplemented with 10% FBS. The transduced cells were then supplied with 2 µg/mL puromycin 2 days after transduction to clear the nontransduced cells.

Quantitative Real-Time PCR (qRT-PCR)
Total RNA was extracted from the cultured cells by using TRIZol reagent (Tsingke, Beijing, China). The cDNA was synthesized with 2 µg of RNA using the 4×EZscript Reverse Transcription Mix II Kit (EZBioscience, Roseville, MN, USA). Then, each 96-well plate well was mixed with 6.5 µL of cDNA (diluted at 1:50), 7.5 µL of qPCR SYBR Mix (Tsingke, Beijing, China), and 1 µL of primers (10 mM). The primers used in the study are listed in Table S6.

Statistical Analysis
Expression of hub genes between the two groups was performed by unpaired t-tests using GraphPad Prism (version 8.3.0, GraphPad Software). p value < 0.05 was assigned as significance.

Data Preprocessing and DEGs Screening
Two RNA-seq datasets (GSE120852 [28], GSE55296 [29]), including 18 ICM and 15 NF samples, were combined with batch effects eliminated. In total, 208 differentially expressed genes (DEGs), including 118 downregulated and 90 upregulated genes, were identified in a meta-analysis using the strict cutoff (p value < 0.05, |(log 2 FC)| ≥ 1) (Table S4). Next, the batch effects were eliminated, the distributions of sample row counts before adjustment for batch effects and the distributions of DEGs after adjustment are illustrated in the PCA plots ( Figure 2A,B). A volcano plot and DEGs heatmaps are shown in Figure 2C,D. The top 50 DEGs ordered by p value are presented in Table 1. Due to the variation in the human samples, we also used the loose cutoff (p value < 0.05, |(log 2 FC)| ≥ 0.7) to screen DEGs, including 201 upregulated and 255 downregulated genes (Table S5).

GO and KEGG Analyses of Downregulated Genes
GO and KEGG analyses were performed by DAVID [30] for both upregulated and downregulated genes screened by the strict cutoff. The results of DAVID GO and KEGG analyses of the upregulated genes are shown in Figure S2. The upregulated genes were involved in the inflammatory response, extracellular region, transmembrane signaling receptor activity, antigen processing, and presentation pathways. More genes are downregulated in all DEGs. Additionally, we are more interested in downregulated transcription activators in which downstream genes would be also downregulated. In the DAVID GO analyses of all 118 downregulated genes, the most enriched biological process (BP) terms were associated with response to wounding, positive regulation of smooth muscle contraction, and positive regulation of transcription from RNA polymerase II promoter. The most enriched terms for cellular components (CC) were mainly associated with post synapse and extracellular space. The most enriched molecular function (MF) terms were associated with glycine: sodium symporter activity, transcription factor activity, sequence-specific DNA binding and RNA polymerase II core promoter proximal region sequence-specific DNA binding ( Figure 3A-C). In the DAVID KEGG analysis, the downregulated genes were enriched in the PI3K-Akt signaling pathway ( Figure 3D).
Genes 2022, 13, x FOR PEER REVIEW 5 of 18 in the human samples, we also used the loose cutoff (p value < 0.05, | (log2FC) | ≥ 0.7) to screen DEGs, including 201 upregulated and 255 downregulated genes (Table S5). DEGs. NF, non-failing heart tissue; ICM, heart tissue of ischemic cardiomyopathy patient; DEGs, differentially expressed genes; IBNS-DEGs, important-but-not-such-significantly-differentially expressed genes.  The DAVID GO and KEGG analysis for pathway enrichment analyses totally depend on the category count numbers of the gene list and one gene set, ignoring the different importance of every single gene. The weighted enrichment analysis (WEAT) method can overcome the issue of treating every gene equally to the conventional method [25]. The results of WEAT GO and KEGG analyses of the upregulated genes screened by the strict cutoff are shown in Figure S2. In the WEAT GO analyses of downregulated genes screened by the strict cutoff, the most enriched biological process (BP) terms were associated with negative regulation of the apoptotic signaling pathway, peripheral nervous system development, and response to wounding. The most enriched terms for cellular components (CC) were mainly associated with post synapse and chromatin. The most enriched molecular function (MF) terms were associated with serine-type endopeptidase activity, diuretic hormone activity and DNA-binding transcription activator activity, RNA polymerase II-specific. (Figure 3E-G). In the WEAT KEGG analysis, the downregulated genes screened by the strict cutoff were enriched in Retinol metabolism, ABC transporters, and PI3K-Akt signaling pathway ( Figure 3H).
KRT10-AS1 0.0008554 0.855157383 Up DAVID GO and KEGG analyses were also performed on DEGs screened by loose cutoff. The upregulated genes were enriched in inflammatory response, extracellular region, antigen processing, and presentation pathways, whereas the downregulated genes were involved in pathways such as positive regulation of smooth muscle contraction, extracellular space, transcription factor activity, sequence-specific DNA binding, and the PI3K-Akt signaling pathway ( Figure S1). The enrichment results of DEGs screened by loose cutoff are similar to the DEGs screened by strict cutoff. Hence, we used the DEGs screened by the strict cutoff to undertake the following analysis.

Enrichment Analysis of NFIC-Binding Motif on Downregulated Promoter Sequences
In order to find out which transcription factor regulates the downregulated genes in ICM, previously obtained 118 downregulated genes from full DEGs were used to identify motifs significantly enriched in the promoter regions (−5000 bp relative to the TSS) via Simple Enrichment Analysis (SEA). Due to the limitation of the "biomaRt" package, some promoter regions could not be found and some regions were duplicated when extracting the promoter regions with it. Finally, 107 effective promoter regions were obtained. We acquired 220 motifs with p value < 0.005 through SEA. However, the expression of transcription factors with significantly enriched motifs may not decrease. In order to select downregulated important transcription factors, we set a loose cutoff as log 2 FC < −0.3 and p value < 0.06 to include more downregulated genes. Then we obtained four overlapping transcription factors (KLF6, KLF1, KLF7, and NFIC) with both enrichment in the downregulated gene promoters and decreased expression, shown in Table 2. However, these four transcription factors were not included in the DEGs list. A recent study pointed out that important-but-not-such-significantly-differentially expressed genes (IBNS-DEGs) may play important roles in the given biological process although it is not taken as candidate genes which can be recognized by Gene Importance Calculator (GIC) [31]. Thus, we calculated these four genes' GIC scores, shown in Table 2. Interestingly, among the four transcription factors, NFIC was the most important one with the highest GIC score ( Table 2). Summarizing the results of SEA and GIC analysis, NFIC would be an important upstream regulator of downregulated genes in ICM.

Enrichment Analysis of NFIC-Binding Motif on Downregulated Promoter Sequences
In order to find out which transcription factor regulates the downregulated genes in ICM, previously obtained 118 downregulated genes from full DEGs were used to identify motifs significantly enriched in the promoter regions (−5000 bp relative to the TSS) via Simple Enrichment Analysis (SEA). Due to the limitation of the "biomaRt" package, some promoter regions could not be found and some regions were duplicated when extracting the promoter regions with it. Finally, 107 effective promoter regions were obtained. We acquired 220 motifs with p value < 0.005 through SEA. However, the expression of transcription factors with significantly enriched motifs may not decrease. In order to select downregulated important transcription factors, we set a loose cutoff as log2FC < −0.3 and p value < 0.06 to include more downregulated genes. Then we obtained four overlapping transcription factors (KLF6, KLF1, KLF7, and NFIC) with both enrichment in the downregulated gene promoters and decreased expression, shown in Table 2. However, these four transcription factors were not included in the DEGs list. A recent study pointed out that important-but-not-such-significantly-differentially expressed genes (IBNS-DEGs) may play important roles in the given biological process although it is not taken as candidate genes which can be recognized by Gene Importance Calculator (GIC) [31]. Thus, we calculated these four genes' GIC scores, shown in Table 2. Interestingly, among the four transcription factors, NFIC was the most important one with the highest GIC score ( Table 2). Summarizing the results of SEA and GIC analysis, NFIC would be an important upstream regulator of downregulated genes in ICM. an alternate name for the motif that may be provided in the motif database file; 2 : the optimal enrichment p value of the motif according to the statistical test; 3 : the number of primary sequences matching the motif/the number of primary sequences (the percentage of primary sequences matching the motif); 4 : the number of control sequences matching the motif/the number of control sequences (the percentage of control sequences matching the motif); 5 : the relative enrichment ratio of the motif in the primary vs. control sequences; 6 : a sequence is said to match the motif if some position within it has a match score greater than or equal to the optimal threshold (Score Threshold), this is the score threshold used by SEA to determine the values of "TP" and "FP", SEA uses the hold-out sequence set to determine the score threshold unless there are too few sequences in the input. 7 : represents the importance of the gene, higher GIC score represents more importance [31].

Verification of Predicted Genes with NFIC-Binding Motif by Validating Microarray Dataset and Nfic Knockdown Cardiomyocytes
Among the 118 downregulated genes, 33 of them were predicted to have a NFICbinding motif in SEA motif analysis (

Enrichment Analysis of NFIC-Binding Motif on Downregulated Promoter Sequences
In order to find out which transcription factor regulates the downregulated genes in ICM, previously obtained 118 downregulated genes from full DEGs were used to identify motifs significantly enriched in the promoter regions (−5000 bp relative to the TSS) via Simple Enrichment Analysis (SEA). Due to the limitation of the "biomaRt" package, some promoter regions could not be found and some regions were duplicated when extracting the promoter regions with it. Finally, 107 effective promoter regions were obtained. We acquired 220 motifs with p value < 0.005 through SEA. However, the expression of transcription factors with significantly enriched motifs may not decrease. In order to select downregulated important transcription factors, we set a loose cutoff as log2FC < −0.3 and p value < 0.06 to include more downregulated genes. Then we obtained four overlapping transcription factors (KLF6, KLF1, KLF7, and NFIC) with both enrichment in the downregulated gene promoters and decreased expression, shown in Table 2. However, these four transcription factors were not included in the DEGs list. A recent study pointed out that important-but-not-such-significantly-differentially expressed genes (IBNS-DEGs) may play important roles in the given biological process although it is not taken as candidate genes which can be recognized by Gene Importance Calculator (GIC) [31]. Thus, we calculated these four genes' GIC scores, shown in Table 2. Interestingly, among the four transcription factors, NFIC was the most important one with the highest GIC score ( Table 2). Summarizing the results of SEA and GIC analysis, NFIC would be an important upstream regulator of downregulated genes in ICM. an alternate name for the motif that may be provided in the motif database file; 2 : the optimal enrichment p value of the motif according to the statistical test; 3 : the number of primary sequences matching the motif/the number of primary sequences (the percentage of primary sequences matching the motif); 4 : the number of control sequences matching the motif/the number of control sequences (the percentage of control sequences matching the motif); 5 : the relative enrichment ratio of the motif in the primary vs. control sequences; 6 : a sequence is said to match the motif if some position within it has a match score greater than or equal to the optimal threshold (Score Threshold), this is the score threshold used by SEA to determine the values of "TP" and "FP", SEA uses the hold-out sequence set to determine the score threshold unless there are too few sequences in the input. 7 : represents the importance of the gene, higher GIC score represents more importance [31].

Verification of Predicted Genes with NFIC-Binding Motif by Validating Microarray Dataset and Nfic Knockdown Cardiomyocytes
Among the 118 downregulated genes, 33 of them were predicted to have a NFICbinding motif in SEA motif analysis (

Enrichment Analysis of NFIC-Binding Motif on Downregulated Promoter Sequences
In order to find out which transcription factor regulates the downregulated genes in ICM, previously obtained 118 downregulated genes from full DEGs were used to identify motifs significantly enriched in the promoter regions (−5000 bp relative to the TSS) via Simple Enrichment Analysis (SEA). Due to the limitation of the "biomaRt" package, some promoter regions could not be found and some regions were duplicated when extracting the promoter regions with it. Finally, 107 effective promoter regions were obtained. We acquired 220 motifs with p value < 0.005 through SEA. However, the expression of transcription factors with significantly enriched motifs may not decrease. In order to select downregulated important transcription factors, we set a loose cutoff as log2FC < −0.3 and p value < 0.06 to include more downregulated genes. Then we obtained four overlapping transcription factors (KLF6, KLF1, KLF7, and NFIC) with both enrichment in the downregulated gene promoters and decreased expression, shown in Table 2. However, these four transcription factors were not included in the DEGs list. A recent study pointed out that important-but-not-such-significantly-differentially expressed genes (IBNS-DEGs) may play important roles in the given biological process although it is not taken as candidate genes which can be recognized by Gene Importance Calculator (GIC) [31]. Thus, we calculated these four genes' GIC scores, shown in Table 2. Interestingly, among the four transcription factors, NFIC was the most important one with the highest GIC score ( Table 2). Summarizing the results of SEA and GIC analysis, NFIC would be an important upstream regulator of downregulated genes in ICM. an alternate name for the motif that may be provided in the motif database file; 2 : the optimal enrichment p value of the motif according to the statistical test; 3 : the number of primary sequences matching the motif/the number of primary sequences (the percentage of primary sequences matching the motif); 4 : the number of control sequences matching the motif/the number of control sequences (the percentage of control sequences matching the motif); 5 : the relative enrichment ratio of the motif in the primary vs. control sequences; 6 : a sequence is said to match the motif if some position within it has a match score greater than or equal to the optimal threshold (Score Threshold), this is the score threshold used by SEA to determine the values of "TP" and "FP", SEA uses the hold-out sequence set to determine the score threshold unless there are too few sequences in the input. 7 : represents the importance of the gene, higher GIC score represents more importance [31].

Verification of Predicted Genes with NFIC-Binding Motif by Validating Microarray Dataset and Nfic Knockdown Cardiomyocytes
Among the 118 downregulated genes, 33 of them were predicted to have a NFICbinding motif in SEA motif analysis (

Enrichment Analysis of NFIC-Binding Motif on Downregulated Promoter Sequences
In order to find out which transcription factor regulates the downregulated genes in ICM, previously obtained 118 downregulated genes from full DEGs were used to identify motifs significantly enriched in the promoter regions (−5000 bp relative to the TSS) via Simple Enrichment Analysis (SEA). Due to the limitation of the "biomaRt" package, some promoter regions could not be found and some regions were duplicated when extracting the promoter regions with it. Finally, 107 effective promoter regions were obtained. We acquired 220 motifs with p value < 0.005 through SEA. However, the expression of transcription factors with significantly enriched motifs may not decrease. In order to select downregulated important transcription factors, we set a loose cutoff as log2FC < −0.3 and p value < 0.06 to include more downregulated genes. Then we obtained four overlapping transcription factors (KLF6, KLF1, KLF7, and NFIC) with both enrichment in the downregulated gene promoters and decreased expression, shown in Table 2. However, these four transcription factors were not included in the DEGs list. A recent study pointed out that important-but-not-such-significantly-differentially expressed genes (IBNS-DEGs) may play important roles in the given biological process although it is not taken as candidate genes which can be recognized by Gene Importance Calculator (GIC) [31]. Thus, we calculated these four genes' GIC scores, shown in Table 2. Interestingly, among the four transcription factors, NFIC was the most important one with the highest GIC score ( Table 2). Summarizing the results of SEA and GIC analysis, NFIC would be an important upstream regulator of downregulated genes in ICM. an alternate name for the motif that may be provided in the motif database file; 2 : the optimal enrichment p value of the motif according to the statistical test; 3 : the number of primary sequences matching the motif/the number of primary sequences (the percentage of primary sequences matching the motif); 4 : the number of control sequences matching the motif/the number of control sequences (the percentage of control sequences matching the motif); 5 : the relative enrichment ratio of the motif in the primary vs. control sequences; 6 : a sequence is said to match the motif if some position within it has a match score greater than or equal to the optimal threshold (Score Threshold), this is the score threshold used by SEA to determine the values of "TP" and "FP", SEA uses the hold-out sequence set to determine the score threshold unless there are too few sequences in the input. 7 : represents the importance of the gene, higher GIC score represents more importance [31].

Verification of Predicted Genes with NFIC-Binding Motif by Validating Microarray Dataset and Nfic Knockdown Cardiomyocytes
Among the 118 downregulated genes, 33 of them were predicted to have a NFICbinding motif in SEA motif analysis (Table 3). In the validating dataset GSE1869, the expression of NFIC was significantly downregulated in ICM samples, and expressions of an alternate name for the motif that may be provided in the motif database file; 2 : the optimal enrichment p value of the motif according to the statistical test; 3 : the number of primary sequences matching the motif/the number of primary sequences (the percentage of primary sequences matching the motif); 4 : the number of control sequences matching the motif/the number of control sequences (the percentage of control sequences matching the motif); 5 : the relative enrichment ratio of the motif in the primary vs. control sequences; 6 : a sequence is said to match the motif if some position within it has a match score greater than or equal to the optimal threshold (Score Threshold), this is the score threshold used by SEA to determine the values of "TP" and "FP", SEA uses the hold-out sequence set to determine the score threshold unless there are too few sequences in the input. 7 : represents the importance of the gene, higher GIC score represents more importance [31].

Verification of Predicted Genes with NFIC-Binding Motif by Validating Microarray Dataset and Nfic Knockdown Cardiomyocytes
Among the 118 downregulated genes, 33 of them were predicted to have a NFICbinding motif in SEA motif analysis (Table 3). In the validating dataset GSE1869, the expression of NFIC was significantly downregulated in ICM samples, and expressions of 21 predicted downstream genes were found whereas the other 12 genes were not recorded. Among the 21 genes, TSPAN1, HOPX, MPP3, LAD1, KCNE4, AGXT, SPHK1, and FOSB were significantly downregulated ( Figure 4A,B). In parallel, we performed cellular experiments to cross-validate our predication. We decreased the Nfic expression by lentivirus-mediated transgene expression of two Nfic-specific shRNA (shRNA#1 and shRNA#2) in cultured H9C2. qRT-PCR was performed to measure the expression levels of Nfic mRNA. Accordingly, mRNA expression level of Nfic was lower in H9C2 treated with Nfic shRNA than H9C2 treated with empty pLKO.1 vector. The knockdown efficiency was better using Nfic shRNA#1 demonstrated by the mRNA and protein expression levels ( Figure 4C-E). Thus, the mRNA expression levels of all 33 predicted genes with NFIC-binding motif were measured in H9C2 treated with Nfic shRNA#1. Among them, two genes, Tspan1 (tetraspanin1) and Hopx (HOP homeobox) were significantly downregulated ( Figure 4F), whereas expression levels of others were not consistent with prediction or too low to be detected ( Figure S3 and Table S8).

Discussion
In this study, we performed an integrated bioinformatics analysis of clinical ICM RNA-seq data. A total of 208 DEGs including 118 downregulated and 90 upregulated genes were identified by RNA-seq meta-analysis. Simple Enrichment Analysis and GIC were performed to identify one key transcription factor NFIC. Downstream genes were verified by quantitative real-time PCR experiments.
Among the top 50 significant DEGs (Table 1), several genes, such as ERBB3, RCAN1, and NLRX1 were previously reported to be related with ICM [32][33][34][35][36]. ERRB3, erb-b2 receptor tyrosine kinase 3, has been reported as a protective factor which inhibits death mechanisms activated by redox stress and supports an involvement of this receptor in the pro-survival responses after MI/R injury [32]. The activation of NRG-1/ErbB3 signaling significantly decreased inflammation and improved LV function in remote ischemic conditioning rats [33]. Moreover, cardiomyocytes depleted of RCAN1 were more sensitive to simulated MI/R and the calcineurin/Rcan1-signaling cascade could act as a potential therapeutic target through which to benefit from innate circadian changes in cardiac protection without disrupting core circadian oscillations that are essential to cardiovascular, metabolic, and mental health [34]. NLRX1, NOD-like receptors family member X1, is significantly downregulated following intestinal MI/R injury [35], and ablation of the mitochondrial NLRX1 exerts a detrimental effect on acute cardiac infarction induced by a prolonged ischemia-reperfusion episode and activates potential MI/R injury mechanisms contributing to increased MI/R injury, related to elevated energy metabolism and diminished Akt [36].
Due to different criteria regarding experimental conditions, sample selection between different datasets, the gene expression levels always show diversity in different datasets. RNA-seq meta-analysis overcomes the above limitations well. In the merged datasets, we identified 208 overlapped DEGs. We are more interested in the downregulated genes and the upregulated genes remain to be studied. GO and KEGG analyses were performed by DAVID and WEAT. In general, the genes in one input gene list and one gene set are equally treated in gene functional enrichment analysis which would lead to biased results because genes are actually different in essentiality. WEAT provides a weighted gene functional enrichment analysis method according to the above truth that different genes have different essentiality in one biological process or an organ [25]. In WEAT analysis, we added different weights to our DEGs according to their essentiality in the heart. In the results of DAVID and WEAT BP analyses, there are same terms such as response to wounding, cell death, and positive regulation of transcription from the RNA polymerase II promoter, indicating that these processes have played an important role in the development of ICM. Terms such as positive regulation of smooth muscle contraction and lipid phosphorylation are only enriched in WEAT BP analysis. Additionally, it is reasonable that the movement and metabolism of muscle is damaged during the process of ICM. Thus, WEAT gives expression to the importance of essentiality scores in gene functional analysis. The PI3K-Akt signaling pathway is enriched in both DAVID and WEAT KEGG analyses. Previous studies have shown that the activation of the P13K/Akt signaling pathway enhances the cardioprotective effect of progranulin in the rat model of acute MI/R injury [37]. In our study, several genes (DDIT4, MYC, CSF3, TNC, EIF4E1B, PCK2, and BDNF) belonging to the P13K/Akt signaling pathway were downregulated in ICM, indicating the potential diagnostic and therapeutic targets of ICM. The WEAT KEGG analysis listed other pathways such as "Retinol metabolism" and "Glycine, serine and threonine metabolism" which predicted the latent metabolic characteristics of ICM.
Many studies have reported genes related to ICM. For example, F13A1 and its protein product factor XIII-A (FXIII-A) is a member of the transglutaminase family of enzymes, named for its important role in blood coagulation [38]. The genetically determined FXIII-A level acts as an independent predictor of early prognosis after acute myocardial infarction (AMI) and the mutant of FXIII-A seems associated with lower AMI risk, which makes it a marker gene of AMI [39,40]. Moreover, it has a direct proangiogenic effect on endothelial cells in vitro due to the possible role of FXIII in inhibiting antiangiogenic TSP1 gene expression [41,42]. Additionally, modulation of FXIII activity by therapy influences myocardial healing in the mouse model of coronary ligation [42]. Few studies point out the regulators which regulate the ICM-related genes. Dysregulated transcription factors and their downstream gene expression signatures indicate the molecular mechanisms underlying disease processes. Therefore, finding such novel regulators aids in the development of new diagnostic and therapeutic applications. Transcription activators are cellular regulators that can upregulate the transcription level of genes by binding to specific short sequences of DNA in the promoters or enhancers of target genes. Many studies have reported that downregulated transcription activators play important roles in various diseases. Reduced expression of transcription factor FOXP1 and its downstream gene p21 Waf1/Cip1 was found to be a contributing factor in Huntington's disease [43]. The expression of transcription activator FOXO4 is decreased significantly in most gastric cancer tissues and in various human gastric cancer cell lines, suggesting that it may serve as a potential therapeutic target for gastric cancer [44]. By the inspiration of these studies, we hope to find the key downregulated transcription activators and their downstream targets of ICM. Hence, we are more interested in the downregulated genes and transcription activators. NFIC was regarded as a key regulator of tooth development initially. It is expressed in odontoblasts of crown and root and the deletion of Nfic leads to short tooth roots [45]. However, the investigation of NFIC in the heart is still very limited. Liu Q et al. found that the addition of 13-cis-retinoic acid during cardiomyocyte differentiation would increase the binding of NFIC to DNA when studying the toxic mechanism of 13-cis-retinoic-acid-induced early cardiac differentiation and development [46]. Another member of the NFI family, NFIX, has been reported as a transcriptional switch from embryonic to fetal myogenesis [47] and directly represses the Myostatin promoter; thus, controlling the proper timing of satellite cell differentiation and muscle regeneration [48]. Similar to NFIX, NFIC binds and regulates downstream genes through the binding motif. It is likely that NFIC will bind and regulate the genes which have the NFIC binding motif in their promoter regions. In our study, we found that NFIC is highly expressed in the heart, but the role of NFIC in the heart is unclear. NFIC is a transcription factor which positively regulates the expression of downstream genes. In general, disease-related transcription factors are usually found in DEGs. However, the cutoff of DEGs is a little strict which may leave out some important genes. A recent study pointed out that important-but-not-such-significantly-differentially expressed genes (IBNS-DEGs) may play important roles in the given biological process although it is not taken as candidate genes which can be recognized by the Gene Importance Calculator [31]. Therefore, we adjusted the cutoff to search the key regulator. Our study performed the motif research combining Simple Enrichment Analysis and the Gene Importance Calculator. SEA showed that the NFIC-binding motif is significantly enriched in the promoter region of downregulated genes in ICM and GIC determined the high importance of NFIC. In summary, we proved that NFIC is an upstream regulator of downregulated genes in clinical ICM and plays an important role in many physiological and pathological processes in the heart.
The Kruppel-like Factor (KLF) family is a subfamily of the zinc-finger class of DNAbinding transcriptional regulators. There are several published reports describing the role of KLFs in the cardiovascular system. KLF6 was reported to cooperate with the related GC box binding protein Sp1 to regulate the expression of endoglin and related members of the TGF-β signaling complex in vascular repair [49]. KLF15 is an inhibitor of cardiac hypertrophy in which overexpression in neonatal rat ventricular cardiomyocytes inhibits cell size, protein synthesis, and hypertrophy-related gene expression [50]. Our study indicated that KLFs are important transcription factors and may regulate some momentous genes in ICM, but their function in ICM needs further study.
Many genes with the NFIC-binding motif were found through motif enrichment research but only two genes were verified, which showed the importance of these two genes. TSPAN1 (tetraspanin 1), a member of the tetraspanin (TSPAN/TM4SF) superfamily, was reported to be involved in many fundamental biological processes, such as cell proliferation, adhesion, and migration [51]. However, the role of TSPAN1 in the heart is unclear yet. Our study found that TSPAN1 is downregulated in ICM owing to the key transcription factor NFIC, indicating that it may be involved in the proliferation of cardiomyocytes and the remolding of myocardium in ICM. HOPX is a homeodomain protein which lacks residues for binding DNA. It was shown to be necessary for normal myocardial development in mice and zebrafish through studies with loss-of-function mutations [52]. Additionally, it contacts DNA indirectly by serum response factor or other DNA-binding proteins and represses genes through recruiting histone deacetylases [53].
There are some limitations in our study: (1) the accuracy of disease assessment and prediction can be improved if the sample size is increased and genetic information is completed; (2) verification in our study is not enough. For example, the luciferase reporter assay, ChIP assay, and in vivo animal experiments were in need of further study to verify the relationship between NFIC and ICM.

Conclusions
In summary, by integrating and analyzing different datasets, and through the cell experiment validation, one key regulator NFIC and its two downstream genes (TSPAN1 and HOPX) of ICM, were identified. Through GO and KEGG analyses, the response to wounding, positive regulation of smooth muscle contraction, chromatin, PI3K-Akt signaling pathway, and transporters pathways may be related to ICM. Together, our findings provide new perspectives for the understanding of NFIC and its downstream genes in the pathogenesis of ICM.