Comparative Analysis of Public RNA-Sequencing Data from Human Intestinal Enteroid (HIEs) Infected with Enteric RNA Viruses Identifies Universal and Virus-Specific Epithelial Responses

Acute gastroenteritis (AGE) has a significant disease burden on society. Noroviruses, rotaviruses, and astroviruses are important viral causes of AGE but are relatively understudied enteric pathogens. Recent developments in novel biomimetic human models of enteric disease are opening new possibilities for studying human-specific host–microbe interactions. Human intestinal enteroids (HIE), which are epithelium-only intestinal organoids derived from stem cells isolated from human intestinal biopsy tissues, have been successfully used to culture representative norovirus, rotavirus, and astrovirus strains. Previous studies investigated host–virus interactions at the intestinal epithelial interface by individually profiling the epithelial transcriptional response to a member of each virus family by RNA sequencing (RNA-seq). Despite differences in the tissue origin, enteric virus used, and hours post infection at which RNA was collected in each data set, the uniform analysis of publicly available datasets identified a conserved epithelial response to virus infection focused around “type I interferon production” and interferon-stimulated genes. Additionally, transcriptional changes specific to only one or two of the enteric viruses were also identified. This study can guide future explorations into common and unique aspects of the host response to virus infections in the human intestinal epithelium and demonstrates the promise of comparative RNA-seq analysis, even if performed under different experimental conditions, to discover universal and virus-specific genes and pathways responsible for antiviral host defense.


Introduction
Acute gastroenteritis (AGE) is a major source of illness globally. It is defined as the inflammation of the mucus membranes of the intestinal tract accompanied by rapid onset diarrhea, nausea, vomiting, and abdominal pain. Five major virus families have been identified as etiological agents of viral gastroenteritis: noroviruses, sapoviruses (both single-stranded positive-sense RNA viruses belonging to the Caliciviridae family), rotaviruses (double-stranded RNA viruses belonging to the Reoviridae family), adenoviruses (double-stranded DNA viruses belonging to the Adenoviridae family), and astroviruses (single-stranded RNA viruses belonging to the family Astroviridae) [1]. Only 20-30% of AGE cases in the United States have a specific causal virus identified [2]. Globally, noroviruses have the highest prevalence across all age groups, causing approximately one fifth of all AGE cases [3]. Rotaviruses are the leading cause of AGE in children < 5 years of age despite the availability of a vaccine [4]. Human astroviruses typically cause diarrhea in children < 2 years of age but extraintestinal disease, including meningitis/encephalitis, is observed in immunocompromised individuals [5]. A full understanding of viral AGE intestinal pathogenesis has been challenging due to genetic diversity within each viral family and due to the lack of suitable experimental models resembling the complexity of the human intestinal epithelium.
Human intestinal enteroids (HIEs) are patient-derived endoderm-only 3D structures composed of heterogeneous cell populations recapitulating intestinal tissues in vivo, providing a more faithful experimental model than immortalized and transformed cells. A large degree of epithelial cellular diversity is observed in HIEs: Columnar intestinal epithelial, stem/progenitor, enteroendocrine, and secretory cells can be observed depending on the culture conditions [6]. Spheroid structures can also be transitioned into 2D monolayers to facilitate host-microbe interaction studies in physiologic asymmetric oxygen conditions [7]. Furthermore, HIEs have been used to study the pathogenesis of several enteric viruses including difficult-to-cultivate ones [8,9].
HIEs provide a uniquely complex system to study host-virus interactions at the intestinal epithelial interface and have been used to profile the epithelial transcriptional response to viral causes of AGE by RNA sequencing (RNA-seq) in several studies. Nonclassic human astrovirus (VA1 strain) upon infection of duodenal HIEs triggered type I and type III interferon (IFN) signaling [10]. Ileal HIEs infected with human norovirus (GII.4 strain) showed high enrichment in STAT1 and STAT2 binding sites, suggesting JAK-STAT signaling pathway activation due to type I IFN signaling [11]. Jejunal HIEs when infected with rotavirus (strain Ito) showed a predominant and conserved type III IFN response [12]. Integrating the conclusions from these studies that explore the pathogenesis of astrovirus [10], norovirus [11], and rotavirus [12] in HIEs is challenging since there are technical differences between studies. Each study uses HIEs grown from different tissue donors and varies the time interval at which the host response to the virus was evaluated. Furthermore, different library construction protocols can result in different sequence coverage, complexity, and evenness [13]. Despite the limitations to the comparative analysis of RNA-seq datasets generated from different studies, a re-analysis of public datasets can provide novel insights into a scientific question as long as researchers take the biological context into consideration in which each dataset was generated [14].
Here, our goal was to re-analyze three publicly available datasets from HIEs infected with three representative viruses that cause AGE to gain further insights into shared and unique transcriptional changes in the intestinal epithelium upon virus infection from a qualitative assessment standpoint.

Cell Lines and Virus Infection
Each public RNA-seq dataset was generated in a different institution with variations in the experimental protocol used for viral infection, which we briefly summarize here. To generate the RNA-seq dataset from duodenal HIEs infected with astrovirus, HIEs were seeded in 48-well plates as 2D monolayers in complete L-WRN medium (GMCF+) and infected with astrovirus strain VA1 at an MOI of 1 based on genome copies per cell for 1 h at 37 • C or mock. Total RNA was extracted for sequencing at 24 h post-infection (hpi) [10]. Ileal HIE monolayers grown in 48-well plates were incubated for 2 h at 37 • C with stool filtrates containing a patient-derived GII.4 human norovirus strain (~1 × 10 6 viral RNA copies) or mock-treated. Samples were collected at 48 hpi before sequencing [11]. For the RNA-seq dataset generated from rotavirus-infected jejunal HIEs, spheroid (3D) HIEs were grown in complete media with growth factors (CMGF+) and differentiated for 3-4 days before infection. Then HIEs were infected with human rotavirus (HRV) strain Ito and sequenced at 6 hpi [12].

Selection of GEO and EMBL-EBI Data
The sequencing reads obtained from RNA-seq experiments in each one of the studies re-analyzed were obtained from the Gene Expression Omnibus (GEO) or the European Bioinformatics Institute (EMBL-EBI) ArrayExpress collection. FASTQ files from a total of 12 HIEs samples (6 mock-treaded and 6 virus-infected) grown from two patients (TI006 and TI365) were obtained for the norovirus-infected ileal HIEs study from the GEO database (accession number GSE117911) [11]. FASTQ files for the rotavirus-infected jejunal HIEs study (accession number GSE90796) were also obtained from the GEO database. A total of 4 HIEs samples (2 mock and 2 treated) obtained from HIEs grown from two patients (J2 and J11) were deposited under this accession number [12]. Finally, for the astrovirus-infected duodenal HIE study, FASTQ files were obtained from EMBL-EBI ArrayExpress collection for a total of 6 HIE samples (3 mock-treated and 3 virus-infected) grown from one patient (D124) [10]. The metadata associated with each one of the samples used in this study is listed in Supplemental Table S1.

Differential Expression Analysis
Transcript abundances from pseudoalignments (provided in transcripts-per-million) were generated for each RNA-seq dataset using Kallisto [15] and used to generate matrices with estimated gene counts for each one of the RNA-seq datasets re-analyzed. We took into account whether the publicly available RNA-seq data was generated from paired-end or single-end Illumina sequencing.
Estimated gene count matrices were then generated for each RNA-seq dataset from transcript abundances using the package tximport [16]. Subsequently, we independently filtered out weakly expressed genes by calculating a similarity index among biological replicates using the HTSFilter method which seeks to identify whether genes tend to either have normalized counts less than or equal to the cutoff value in all samples (filtered genes) or greater than the cutoff value in all samples (non-filtered genes) [17]. The set of genes used for differential expression analysis for each RNA-seq dataset consisted then of nonweakly expressed genes that had an annotation in the Entrez Molecular Sequence Database System [18] and a symbol in HUGO Gene Nomenclature Committee (HGNC) [19].
Filtered estimated gene count matrices generated for each RNA-seq dataset were normalized (median of ratios) with the R package DESeq2 version 1.32.0 (https://bioconductor. org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) [20] and subjected to differential expression analysis using the alternative shrinkage estimator ashr [21] to control for false discovery rates (FDR), and effect sizes. A cutoff of an actual foldchange of at least 1.5 (Log 2 FC > 0.58) with a false discovery rate (FDR) cutoff of 1% (adjusted p-value < 0.01) was used to determine whether a gene was differentially expressed. Volcano plots were generated to display the list of DEG for each RNA-seq dataset using the R package EnhancedVolcano [22]. Additionally, we evaluated the association of DEGs with type I, II, III IFN responses using the Interferome database (http://www.interferome.org/interferome/home.jspx last accessed on 30 March 2021).
Estimated gene count matrices were also used for the exploratory analysis of the dimensionality of the data [23]. Dimensionality reduction was done via two techniques: Principal Component Analysis (PCA) and T-Distributed Stochastic Neighbouring Entities (t-SNE). Each RNA-seq dataset was explored individually to determine the variance within each dataset due to tissue origin of HIEs and treatment modality (mock-treatment versus viral infection). PCA plots and t-SNE plots were generated using the R packages factoextra and Rtsne available at https://rpkgs.datanovia.com/factoextra/index.html and https: //github.com/jkrijthe/Rtsne accessed on 30 March 2021, respectively.

Over-Representation Analysis
Over-representation analysis was used to determine whether known biological processes (BP) were enriched in a list of differentially expressed genes (DEGs) for each one of the RNA-seq datasets analyzed. The package clusterProfiler [24] was used for over- representation analysis with strict false discovery rate (FDR) adjusted p-value cutoff of <0.001 using the Benjamini-Hochberg (BH) step-up procedure. To determine whether any BP categories were enriched, the gene ratio (number of DEGs relative to gene ontology [GO term]/total number of DEGs) and background ratio (number of genes relative to GO term/total number of DEGs and non-DEGs) are calculated and then compared. Visualization of the DEG assigned to each BP category was done using the R package Complex-Heatmap version 2.6.2 [25] available at https://github.com/jokergoo/ComplexHeatmap.

Transcription Factor Enrichment Analysis (TFEA)
TFEA using the R package TFEA.ChIP [26] was used to identify transcription factors (TF) responsible fo the co-regulation of differentially expressed genes (DEGs) shared across RNA-seq datasets as well as for DEGs identified uniquely in a single dataset. TFEA.ChIP predicts transcription factor binding sites (TFBS) proximal to a given set of genes (DEGs) based on experimentally determined protein-DNA binding profiles from chromatin immunoprecipitation (ChIP) sequencing (ChIP-seq). A total of 1122 ChIP-seq datasets covering 333 different TFs generated by the ENCODE Consortium [27], as well as collected from the Gene Expression Omnibus (GEO), comprise the TFBS database used by TFEA.ChIP. The analysis of the association of TFBS consisted of comparing how many targets of a given TF are in two lists of genes of interest (DEGs versus non-DEGs). A contingency matrix was generated for each ChIP-seq in the TFBS database counting the number of genes it interacts with in the DEGs list and non-DEGs list, with a subsequent Fisher's exact test applied to each contingency matrix to determine if the difference in the distribution is statistically significant. The results were subsequently summarized by TF. Gene Set Enrichment Analysis (GSEA) is then performed to test whether ChIP-seq datasets belonging to the same TF are significantly enriched or depleted as a group. Enrichment Score (ES) values reflect the degree to which a TF is over-represented at the top (up-regulated) or bottom (down-regulated) of a ranked list. The enriched TF ranking as well as the number of ChIP-seq datasets contributing to the TF enrichment were subsequently graphed with ggplot2 in R.

Code Availability
The source code used for analysis of the datasets and to generate the figures in this study are available at: https://gitlab.com/rjcieza/HIEs-GI-viruses-comparativetranscriptomic.git, accessed on 30 March 2021. The datasets that have been used in this study are publicly available from the GEO and EMBL-EBI as indicated above.

Samples Across Datasets Showed Divergence Due to Tissue Origin of the HIEs Line and Viral Infection
The RNA-seq datasets from each of the studies that we analyzed [10][11][12] were generated from HIEs grown from small intestinal biopsy tissue. However, in each study, a different section of the small intestine from a different donor was used, specifically the terminal ileum (TI006 and TI365 lines) for norovirus infection [11], jejunum (J2 and J11 lines) for rotavirus infection [12], and duodenum (D124) for astrovirus infection [10] (Supplemental Table S1). Hence, it was not surprising that the main source of variance when looking at the three datasets together was not the treatment (mock-treated versus virus-infected) but rather the segment of the small intestine used to derive HIEs ( Figure 1A). In the datasets where more than one HIE line was used, the driver of variance still was the HIE line used in the experiment (although that is linked with the institution where the RNA-seq experiment was done). Nevertheless, when looking at each dataset individually and comparing between samples infected or not, consistent differences can be observed ( Figure 1B-D). Dimensional reduction via principal component analysis (PCA) confirmed these results and validated that the primary source of variance in the three datasets was the tissue origin of the small intestine used to derive HIEs (Supplemental Figure S1). HIE line used in the experiment (although that is linked with the institution where the RNA-seq experiment was done). Nevertheless, when looking at each dataset individually and comparing between samples infected or not, consistent differences can be observed ( Figure 1B-D). Dimensional reduction via principal component analysis (PCA) confirmed these results and validated that the primary source of variance in the three datasets was the tissue origin of the small intestine used to derive HIEs (Supplemental Figure S1). Figure 1. Samples across the three datasets showed divergence due to tissue origin of the HIE line but due to viral infection when looking at each dataset individually. Nonlinear dimensionality reduction with t-SNE of the 16 samples belonging to the three datasets were analyzed. Samples are colored by the segment of the small intestine that was used to derive HIE and each one of the datasets corresponds to a different small intestinal segment (duodenum, ileum and jejunum). The t-SNE plot shows that clustering is primarily driven by the tissue origin of HIE (A). Nonlinear dimensionality reduction with t-SNE for each individual dataset (6, 12 and 4 samples for astrovirus-infected (B), norovirus-infected (C), and rotavirus-infected (D) HIEs datasets, respectively) shows clustering of samples due to treatment (mock-treated versus virusinfected) as well as the HIE line since in the norovirus-infected and rotavirus-infected HIE datasets more than one HIE line were used. Plots were generated with Rtsne and perplexity values were chosen for each plot after stability.

Most Non-Weakly Expressed Genes are Shared Across Datasets Including a Number of Innate Immune Response Genes
We next focused on identifying differences and similarities in HIE transcriptional changes upon enteric viral infection between the three RNA-seq datasets analyzed. After filtering we considered 11,824, 12,917, and 12,416 genes from astrovirus-infected duodenal HIEs, norovirus-infected ileal HIEs and rotavirus-infected jejunal HIEs, respectively. Samples across the three datasets showed divergence due to tissue origin of the HIE line but due to viral infection when looking at each dataset individually. Nonlinear dimensionality reduction with t-SNE of the 16 samples belonging to the three datasets were analyzed. Samples are colored by the segment of the small intestine that was used to derive HIE and each one of the datasets corresponds to a different small intestinal segment (duodenum, ileum and jejunum). The t-SNE plot shows that clustering is primarily driven by the tissue origin of HIE (A). Nonlinear dimensionality reduction with t-SNE for each individual dataset (6, 12 and 4 samples for astrovirus-infected (B), norovirus-infected (C), and rotavirus-infected (D) HIEs datasets, respectively) shows clustering of samples due to treatment (mock-treated versus virus-infected) as well as the HIE line since in the norovirus-infected and rotavirus-infected HIE datasets more than one HIE line were used. Plots were generated with Rtsne and perplexity values were chosen for each plot after stability.

Most Non-Weakly Expressed Genes are Shared Across Datasets Including a Number of Innate Immune Response Genes
We next focused on identifying differences and similarities in HIE transcriptional changes upon enteric viral infection between the three RNA-seq datasets analyzed. After filtering we considered 11,824, 12,917, and 12,416 genes from astrovirus-infected duodenal HIEs, norovirus-infected ileal HIEs and rotavirus-infected jejunal HIEs, respectively. The union for the three RNA-seq datasets consisted of 13,581 genes with 82.92% of these (11,262 genes) present in all the datasets (Figure 2A). Among the non-weakly expressed genes detected in all RNA-seq datasets were several IFN receptor genes (IFNAR, IFNGR, and IFNLR) as well as IFN-stimulated genes. Other key players of the innate immune response, such as the genes associated with Toll-like receptor signaling (TLR1-3) ( Figure 2B) and chemokines secreted in response to IFN, including CXCL10 and CXCL1 (data not shown), were also detected in all the datasets. Several of the selected shared genes annotated in the Entrez Molecular Sequence Database System as part of IFN responses showed estimated gene counts above 1000 (Log 2 values > 10). No genes coding for IFNs were found in the list of shared genes from all RNA-seq datasets. However, IFN-λ (IFNL1-3) was uniquely identified in rotavirus-infected jejunal HIEs, while IFN-ε (IFNE) was partially shared across two of the RNA-seq datasets (Supplemental Table S2). Biologically, IFNs clearly play a critical role in limiting astrovirus [10] and norovirus infections [28], but limitations in sensitivity due to sequencing depth of RNA sequencing [29], may account for this discrepancy in detection. Taken together, the overall expression patterns across the different RNA-seq datasets identified a large number of shared genes across the three studies.
The union for the three RNA-seq datasets consisted of 13,581 genes with 82.92% of these (11,262 genes) present in all the datasets (Figure 2A). Among the non-weakly expressed genes detected in all RNA-seq datasets were several IFN receptor genes (IFNAR, IFNGR, and IFNLR) as well as IFN-stimulated genes. Other key players of the innate immune response, such as the genes associated with Toll-like receptor signaling (TLR1-3) ( Figure  2B) and chemokines secreted in response to IFN, including CXCL10 and CXCL1 (data not shown), were also detected in all the datasets. Several of the selected shared genes annotated in the Entrez Molecular Sequence Database System as part of IFN responses showed estimated gene counts above 1000 (Log2 values > 10). No genes coding for IFNs were found in the list of shared genes from all RNA-seq datasets. However, IFN-λ (IFNL1-3) was uniquely identified in rotavirus-infected jejunal HIEs, while IFN-ε (IFNE) was partially shared across two of the RNA-seq datasets (Supplemental Table S2). Biologically, IFNs clearly play a critical role in limiting astrovirus [10] and norovirus infections [28], but limitations in sensitivity due to sequencing depth of RNA sequencing [29], may account for this discrepancy in detection. Taken together, the overall expression patterns across the different RNA-seq datasets identified a large number of shared genes across the three studies.

Differential Expression Analysis Showed That HIE Responses Against Astrovirus, Norovirus and Rotavirus Is Primarily Characterized by an Up-Regulation of Interferon-Stimulated Genes (ISGs)
To determine the differentially expressed genes (DEGs) between infected versus mock controls in each RNA-seq dataset, we re-analyzed each dataset individually for differential expression with the package DeSeq2 and compared the list of DEGs we obtained in this analysis with those previously published [10][11][12]. For each individual RNA-seq dataset, DEGs were identified among non-weakly expressed genes (Table 1) based on the criteria of an actual fold-change of at least 1.5 (Log 2 FC > 0.58) with a false discovery rate (FDR) cutoff of 1% (adjusted p-value < 0.01) using an Empirical Bayes (EB) approach with the R package ashr [21]. Relatively stringent FDR cutoffs were used to focus only on the strongest biological effects upon virus infection in each of the datasets. Table 1. Differential expression analysis summary. Differential analysis of RNA-seq data was performed on the three published RNA-seq datasets. Listed are the total number non-weakly expressed genes that were analyzed in each dataset to determine differentially expressed genes (DEG). In the astrovirus-infected duodenal HIE dataset, 42 DEGs were identified out of a total of 11,824 non-weakly expressed genes, the smallest number of DEGs of the three RNA-seq datasets analyzed and all were up-regulated genes ( Figure 3A). When re-analyzing the norovirus-infected ileal-HIEs dataset, we evaluated the sequencing data from both HIE lines (TI006 and TI365) together to explore universal patterns of host-response against norovirus GII.4. Additionally, sequencing data for both HIE lines were generated within the same institution for the same study and both HIE lines were grown from ileal biopsies. We found that of a total of 12,917 non-weakly expressed genes, 109 were DEGs, which were all up-regulated ( Figure 3B). Finally, in the RNA-seq dataset generated from rotavirusinfected jejunal HIEs, two jejunal HIE lines (J2 and J11) were used. In this study, out of a total of 12,416 non-weakly expressed genes, 75 were DEGs with 71 up-regulated and 4 down-regulated genes ( Figure 3C).

Differential Expression Analysis Showed That HIE Responses Against Astrovirus, Norovirus and Rotavirus Is Primarily Characterized by an Up-Regulation of Interferon-Stimulated Genes (ISGs)
To determine the differentially expressed genes (DEGs) between infected versus mock controls in each RNA-seq dataset, we re-analyzed each dataset individually for differential expression with the package DeSeq2 and compared the list of DEGs we obtained in this analysis with those previously published [10][11][12]. For each individual RNA-seq dataset, DEGs were identified among non-weakly expressed genes (Table 1) based on the criteria of an actual fold-change of at least 1.5 (Log2 FC > 0.58) with a false discovery rate (FDR) cutoff of 1% (adjusted p-value < 0.01) using an Empirical Bayes (EB) approach with the R package ashr [21]. Relatively stringent FDR cutoffs were used to focus only on the strongest biological effects upon virus infection in each of the datasets. Table 1. Differential expression analysis summary. Differential analysis of RNA-seq data was performed on the three published RNA-seq datasets. Listed are the total number non-weakly expressed genes that were analyzed in each dataset to determine differentially expressed genes (DEG). In the astrovirus-infected duodenal HIE dataset, 42 DEGs were identified out of a total of 11,824 non-weakly expressed genes, the smallest number of DEGs of the three RNA-seq datasets analyzed and all were up-regulated genes ( Figure 3A). When re-analyzing the norovirus-infected ileal-HIEs dataset, we evaluated the sequencing data from both HIE lines (TI006 and TI365) together to explore universal patterns of host-response against norovirus GII.4. Additionally, sequencing data for both HIE lines were generated within the same institution for the same study and both HIE lines were grown from ileal biopsies. We found that of a total of 12,917 non-weakly expressed genes, 109 were DEGs, which were all up-regulated ( Figure 3B). Finally, in the RNA-seq dataset generated from rotavirus-infected jejunal HIEs, two jejunal HIE lines (J2 and J11) were used. In this study, out of a total of 12,416 non-weakly expressed genes, 75 were DEGs with 71 up-regulated and 4 down-regulated genes ( Figure 3C).  . Differential expression analysis in HIEs infected with astrovirus, norovirus and rotavirus. Differentially expressed genes (DEGs) for the three re-analyzed RNA-seq datasets are shown. DEGs with Log 2 fold-change (FC) > 0.58 when compared to mock-treated samples and an FDR of less than 1% are shown red, while genes with an FDR of less than 1% (adjusted p-value < 0.01) that were below the Log 2 FC threshold are shown in blue. FDR was calculated using an Empirical Bayes (EB) approach with the R package ashr. DEGs were identified in astrovirus-infected duodenal-HIEs (A), norovirus-infected ileal-HIEs (B) and rotavirus-infected jejunal-HIEs (C) using the package DESeq2 and plotted using the package EnhanceVolcano in R.

Differential Expression Analysis Summary
A large degree of similarity was found in the list of DEGs identified between our analysis and those previously published [10][11][12]. Any differences in the number of DEGs detected between our and published analysis are likely due to different filtering methods applied to the matrices containing the estimated gene counts and different Log 2 FC cutoff.

Antiviral Defense and IFN Signaling Represent a Conserved Response of the Epithelium to Viral Infection
To identify shared DEGs, we compared the list of DEGs for each individual RNA-seq dataset. There were 33 DEGs common to all three datasets, suggesting that they are part of a shared response against viruses by the intestinal epithelium ( Figure 4A). Twentyfive out of the 33 common DEGs are annotated as part of the IFN response in the Entrez Molecular Sequence Database system ( Figure 4B). Of these, a number of IFN-stimulated genes (ISGs) were identified as part of the universal HIEs response to the three viruses, including IFN-induced protein with Tetratricopeptide repeats (IFIT1, IFIT2 and IFIT3) as well as the signal transducer and activator of transcription proteins 1 and 2 (STAT1 and STAT2). Shared DEGs were also evaluated in the Interferome database, which revealed that 16 out of the 33 DEGs shared across RNA-seq datasets were associated with type I, II, III IFN responses (Table 2). Hence, the set of shared DEGs across the three datasets was enriched in genes associated with IFN signaling. However, within the shared DEGs, virus-specific induction patterns were also observed. For example, we observed that the ISGs 2 -5 -oligoadenylate synthetase-like (OASL) and Interferon Induced Protein 44 (IFI44), as well as the Solute Carrier Family 15 Member 3 (SLC15A3), which potentiates MAVS-and STING-mediated IFN production [30], showed a stronger up-regulation in the norovirusinfected ileal HIEs compared to astrovirus-and rotavirus-infected ones ( Figure 4B). On the other hand, IFIT1 showed greater up-regulation in the rotavirus dataset versus either the norovirus or astrovirus datasets. In addition, 30 genes were differentially expressed in only two out of the three datasets ( Figure 4C). These partially shared DEGs included for example C-X-C Motif Chemokine Ligand 11 (CXCL11) and Bone Marrow Stromal Cell Antigen 2 (BST2, also called Tetherin), which was up-regulated in response to norovirus and astrovirus but not rotavirus infection. A different pattern was observed for example for the transcription factors IFN Regulated Factor 7 and 9 (IRF7, IRF9), which were differentially expressed in the astrovirus and rotavirus but not norovirus datasets, and the rotavirus and norovirus but not astrovirus datasets, respectively.
In addition to specific genes, we also wanted to explore which biological processes were over-represented in the list of DEGs identified in each RNA-seq dataset. Towards that end, we performed a gene ontology (GO) over-representation analysis comparing biological processes (BP). The dot plots highlight the top ten over-represented biological process (BP) categories identified in the list of DEGs for each dataset ( Figure 5A-C). 18, 26 and 23 over-represented BP categories were found in astrovirus-infected, norovirusinfected and rotavirus-infected HIEs, respectively, using a FDR cutoff of 0.1% (adjusted p value < 0.001) with the Benjamini-Hochberg (BH) step-up procedure (Table 3). Ten overrepresented BP categories were shared across all RNA-seq datasets ( Figure 5D). The top two over-represented BP categories were the same in all three datasets ("response to virus" [GO:0009615] and "defense response to virus" [GO:0051607]) with a gene ratio between 0.4 and 0.6, showing that close to half of the DEG identified for each dataset fall into these two categories independently of the infecting virus. The next group of over-represented BP categories contained processes associated with a type I IFN response ("regulation of type I interferon production" [GO:0032479] and "type I interferon production" [GO:0032606]) with gene ratios close to 0.2. The total number of over-represented BP categories was similar across the three RNA-seq datasets, even though the number of DEGs identified in the astrovirus-infected duodenal-HIEs was much lower than in the other two datasets (Table 3). categories was similar across the three RNA-seq datasets, even though the number of DEGs identified in the astrovirus-infected duodenal-HIEs was much lower than in the other two datasets (Table 3).         Overall, a shared set of DEGs informing a shared set of biological processes (BP) were identified in the three RNA-seq datasets. This conserved, predominant response of the intestinal epithelium to viral infection is mediated by antiviral defense and IFN signaling pathways.

Distinct DEGs and Biological Processes (BP) were Also Noted for Each Enteric Virus
In addition to shared or partially shared DEGs, we also identified 2, 47, and 18 DEGs uniquely identified in astrovirus-infected, norovirus-infected, and rotavirus-infected HIEs, respectively ( Figure 4A, Supplemental Table S3). In the dataset generated from astrovirusinfected duodenal HIEs, two DEGs (IFI27 and REV3L) not overlapping with the other datasets were identified. Interferon Alpha Inducible Protein 27 (IFI27) expression sensitizes cells to apoptotic stimuli [31], which may help the intestinal epithelium to repel astrovirusinfected cells. The dysregulation of REV3L the catalytic subunit of DNA polymerase ζ engages the innate immune response with potential antiviral consequences [32]. However, additional studies are needed to test these hypotheses. A total of 47 DEGs were found exclusively in norovirus-infected ileal HIE, including IFN-induced and -stimulated genes IFI35 and IFIT5 as well as the innate immune receptor toll-like receptor (TLR) 3 and several tripartite motif containing (TRIM) genes (TRIM 14, 21, 25, 34 and 56). Among the 18 DEGs found exclusively in the rotavirus-infected jejunal HIE dataset were IFN-λ (IFNL1, IFNL2 and IFNL3).
In line with unique DEGs, there were 1, 4, and 10 over-represented BP that were uniquely identified in each one of the three datasets from astrovirus, rotavirus and norovirus infections, respectively ( Figure 5D). However, the majority of these BP categories showed low gene ratios (gene ratio < 0.1). In astrovirus-infected duodenal HIEs "regulation of defense response to virus" [GO:0050688] was the only uniquely over-represented BP category for this dataset (Table 4). However, it was driven by genes not uniquely differentially expressed in this dataset (IFIT1, HERC5, DDX60, DHX58, and STAT1) ( Figure 6). A likely explanation is that the combination of these genes might play a more relevant part of the host response triggered in the intestinal epithelium against astrovirus than norovirus or rotavirus. The chemokine CCL5 and IFN-λ (IFNL1-3) were important for 3 out of the 4 BP over-represented uniquely in rotavirus-infected HIEs (Table 4). These genes were exclusively differentially expressed as part of the HIEs response to rotavirus ( Figure 6A). Norovirus infection upregulated the most unique BP categories with 8 out of 10 over-represented BPs including genes encoding TRIM family proteins (Table 4). Table 4. Over-representation analysis of biological processes (BP) with the package ClusterProfiler in R was used to identify uniquely over-represented BPs for each dataset. A false discovery rate (FDR) adjusted p-value cutoff of <0.001 using the Benjamini-Hochberg (BH) step-up procedure was used to determine significant over-represented BP categories. BP categories unique to each dataset are indicated including the gene ontology (GO) term ID and description as well as the genes assigned to the GO term ID.  IFITM3,IFIT1,IFITM1,LGALS9,TRIM22,EIF2AK2,TRIM5,TRIM34,  TRIM25,TRIM14,TRIM21 norovirus-infected ileal GO:0043123 positive regulation of I-kappaB kinase/NF-kappaB signaling 0.10 10 0. 00006  BST2,LGALS9,TRIM22,TRIM5,IFIT5,TRIM34,TRIM25,TLR3,  TRIM14,TRIM21   norovirus-infected  ileal  GO:0046718  viral entry into host cell  0.09  9  0.00001  IFITM3,IFITM1,LGALS9,TRIM22,TRIM5,TRIM34,TRIM25,  TRIM14,TRIM21 norovirus-infected ileal GO:0051092 positive regulation of NF-kappaB transcription factor activity 0.09 9 0.00007

Uniquely Over-Represented Biological Processes (BPs) for each Dataset Analyzed
LGALS9 ,TRIM22,EIF2AK2,TRIM5,TRIM34,TRIM25,TLR3,  TRIM14,  In addition, we analyzed the TFBS enriched in the non-shared DEGs, i.e., those identified in a single RNA-seq dataset. This analysis was performed on the 47 and 18 DEGs Figure 6. DEG expression patterns in shared over-represented biological processes (BP) in the three RNA-seq datasets after gene ontology (GO) over-representation analysis. Heatmap-like functional classification displays in which differentially expressed genes (DEGs) were assigned to the BP categories "response to virus" (A), "type I interferon production" (B) and "regulation of response to cytokine stimulus" (C). For each dataset, genes that were not differentially expressed for the indicated RNA-seq dataset are indicated in grey. Genes annotated in the Entrez Molecular Sequence Database System as part of IFN responses are highlighted in red. Genes with a symbol containing IFI (IFN-inducible) or ISG (IFN-stimulated genes) are in orange, while genes coding for IFN proteins are in green. Log 2 fold-change (FC) values within the range of each dataset are shown. Over-represented BP categories were identified with the package clusterProfiler and heatmaps generated with the package ComplexHeatmap in R. The sampling time point in hours post-infection (h.p.i) at which each RNA-seq dataset was generated is indicated. While the top shared over-represented BP categories were the same in the three RNA-seq datasets ( Figure 5), we noted virus-specific differences in the expression pattern of DEGs assigned to each category ( Figure 6). For example, of the 55 genes included in the "response to virus" BP category, 23 were expressed in all datasets, while 13 were differentially expressed in two datasets, and the remaining 19 genes were only differentially expressed in one dataset ( Figure 6A). Of note, while the number of DEGs assigned to the BP category "response to virus" in the norovirus-infected HIEs dataset was the largest, the gene ratio was the smallest compared to the other two datasets ( Figure 5A-C). This apparent discrepancy is due to the gene ratio being a function of the number of DEGs detected for each dataset. Thus, with the denominator in the norovirus dataset being larger, the ratio ultimately is smaller. A similar pattern of shared, partially shared, and unique DEGs was also observed in other over-represented BPs, including type I IFN production ( Figure 6B) and regulation of response to cytokine stimulus ( Figure 6C).
In summary, shared and unique DEGs inform similar biological processes that are triggered in HIEs as part of an intestinal epithelial response to viral infection.

Transcription Factor Enrichment Analysis (TFEA) Revealed That the Conserved Intestinal Epithelial Response against the Three Enteric Viruses is Driven by STAT1/STAT2, IRF1, and NFkB
After identifying unique and shared DEGs, we next wanted to identify transcription factor binding sites (TFBS) that were enriched in the list of shared and partially shared DEGs across the three RNA-seq datasets. Our rationale was that these DEGs are likely regulated by common transcription factors (TFs) or TFs from the same family to drive conserved responses of HIEs against viral infection. A total of 63 DEGs (shared or partially shared across datasets, Figure 4A) were subjected to Transcription Factor Enrichment Analysis (TFEA). ChIP-seq datasets corresponding to 11 TFs were significantly enriched (FDR-adjusted p value < 0.05) in the list of DEGs when compared to non-differentially expressed background genes. Among the top enriched TFBS were ones for STAT1, STAT2, IRF1 and RELA ( Figure 7A). Type I IFNs signal through the heterotrimeric factor complex ISGF3 made off phosphorylated STAT1/STAT2 and IRF9 [33]. Thus, the finding of enriched STAT1 and STAT2 TF binding sites is consistent with Type I IFN responses, one of the GO over-represented pathways for all RNA-seq datasets. IRF1 mediates part of HIEs conserved response to these three viruses in the intestinal epithelium. IRF1, a member of the IFN response factor (IRF) family, can activate an antiviral response against a wide range of viruses [34] via the stimulation of cytokines, chemokines and ISGs. It is a potent anti-norovirus effector against human (HNoV) [35] and murine norovirus (MNoV) [36], as well as a target of the rotavirus non-structural protein 1 (NSP1) [37]. A role for IRF1 in controlling astrovirus infection has not been described to date. Furthermore, the NF-κB subunit RELA regulates a predominantly pro-inflammatory set of genes in the course of RIG-I-like receptors (RLR) antiviral response [38]. This includes the target of RELA regulation CXCL11, which was found in the list of DEGs in the astrovirus-infected and norovirus-infected HIEs, further validating the analysis.
In addition, we analyzed the TFBS enriched in the non-shared DEGs, i.e., those identified in a single RNA-seq dataset. This analysis was performed on the 47 and 18 DEGs identified uniquely in norovirus-and rotavirus-infected HIEs, respectively. The number of DEGs identified in the astrovirus-infected HIEs (i.e., 2 DEGs; Figure 4A) was too small for TFEA. We found that TF gene targets identified across ChIP-seq datasets corresponding to 10 and 12 TF ( Figure 7B,C) were significantly enriched in the list of unique DEGs identified in norovirus-and rotavirus-infected HIEs, respectively. The top enriched TFBS in the norovirus-infected HIEs dataset were STAT1, STAT2, and IRF1 and largely overlapped with the enriched TFBS in the shared DEGs for all datasets ( Figure 7B). Interferon Regulatory Factor 2 (IRF2) was also present and it regulates basal and induced expression of TLR3 in response to IFN stimulation [39]. This is consistent with the finding that TLR3 was differentially expressed only in norovirus-infected HIEs. Thus, TLR3 activation and signaling appears to be a unique host-response mechanism observed upon norovirus infection in the intestinal epithelium that could be experimentally investigated in the future. Finally, in the list of DEGs identified only in rotavirus-infected HIEs, the top enriched TFBS were CTCF, HDAC2, KDM4A, SPI1, and NR3C1 ( Figure 7C). None of these TFBS were enriched in the list of DEGs shared for all viruses or for the norovirus dataset alone, suggesting a unique response of the intestinal epithelium against rotavirus. CCCTC-binding factor (CTCF) is involved in the control of viral gene transcription of several DNA viruses [40] but its potential role during infection by rotavirus, a double-stranded RNA virus, is unknown. This finding might be further explored, especially since there was a strong enrichment of several CTCF ChIP-seq datasets (129 datasets) in the list of DEGs uniquely identified in rotavirus-infected HIEs.
identified in norovirus-and rotavirus-infected HIEs, respectively. The top enriched TFBS in the norovirus-infected HIEs dataset were STAT1, STAT2, and IRF1 and largely overlapped with the enriched TFBS in the shared DEGs for all datasets ( Figure 7B). Interferon Regulatory Factor 2 (IRF2) was also present and it regulates basal and induced expression of TLR3 in response to IFN stimulation [39]. This is consistent with the finding that TLR3 was differentially expressed only in norovirus-infected HIEs. Thus, TLR3 activation and signaling appears to be a unique host-response mechanism observed upon norovirus infection in the intestinal epithelium that could be experimentally investigated in the future. Finally, in the list of DEGs identified only in rotavirus-infected HIEs, the top enriched TFBS were CTCF, HDAC2, KDM4A, SPI1, and NR3C1 ( Figure 7C). None of these TFBS were enriched in the list of DEGs shared for all viruses or for the norovirus dataset alone, suggesting a unique response of the intestinal epithelium against rotavirus. CCCTC-binding factor (CTCF) is involved in the control of viral gene transcription of several DNA viruses [40] but its potential role during infection by rotavirus, a double-stranded RNA virus, is unknown. This finding might be further explored, especially since there was a strong enrichment of several CTCF ChIP-seq datasets (129 datasets) in the list of DEGs uniquely identified in rotavirus-infected HIEs.

Discussion
The goal of this study was to identify universal and unique patterns of the intestinal epithelial response to enteric viral infection to contribute to a better understanding of viral pathogenesis. Our meta-analysis draws from studies done at different centers, using HIEs from both different regions in the gut and distinct donors. This biological variability emphasizes the robustness of our ultimate finding, which was that IFNs and ISGs are central to the innate immune response to viral infections within the gut.

Discussion
The goal of this study was to identify universal and unique patterns of the intestinal epithelial response to enteric viral infection to contribute to a better understanding of viral pathogenesis. Our meta-analysis draws from studies done at different centers, using HIEs from both different regions in the gut and distinct donors. This biological variability emphasizes the robustness of our ultimate finding, which was that IFNs and ISGs are central to the innate immune response to viral infections within the gut.
Our analysis was aware of the caveats presented from combining distinct RNAseq datasets: (a) sequenced at different institutions, (b) obtained from HIEs grown from different sections of the small intestine (jejunum, duodenum, and ileum), and (c) collected at different times post-virus infection (6,24, and 48 hpi with rotavirus, astrovirus and norovirus, respectively). These differences were noticeable when the data was subjected to dimensionality reduction, where t-Distributed Stochastic Neighbor Embedding (t-SNE) showed that the primary factor driving the differences between the three RNA-seq datasets was likely the anatomical origin of the biopsy used to generate HIEs (Figure 1). Despite the variance across datasets due to the biopsy source used to grow HIEs, 89.92% of non-weakly expressed genes were identified in all three datasets (11,262 out of 13,581 genes). This was particularly useful for downstream differential analysis for each one of the RNA-seq datasets, since DEGs were identified from mostly similar lists of genes.
Several over-represented biological processes were part of type I IFN production, signaling, and response ( Figure 5) and thus are part of the critical defense response in the intestinal epithelium to enteric viruses. This multigene family encodes 13 partially homologous IFNα subtypes, IFNβ and several other IFN gene products that are less well characterized [31]. However, in this study we were not able to detect either IFNα or IFNβ in any of the RNA-seq datasets. Only IFNε (IFNE) was detectable in the datasets generated from astrovirus-infected duodenal HIEs and norovirus-infected ileal HIEs, but IFNE was not differentially expressed. This is in contrast to our previously published research, where we found that colonic and ileal HIEs showed detectable IFNβ gene expression in response to astrovirus infection by quantitative PCR [10]. This highlights a caveat of transcriptomic analysis, whereby the findings are influenced by the depth of sequencing required to pick up, in this case, these specific IFN genes. For example, mammalian reovirus infected T84 intestinal cells and enterovirus 71-infected HT29 cells express lower levels of IFNβ transcripts compared to IFNλ [41,42]. As such the level of IFNα and/or IFNβ production in the HIEs could have been below the limit of detection of RNA-sequencing. Similarly, we observed that a type III IFN response was a significant part of the epithelial response to rotavirus, with IFNλ (IFNL1, IFNL2, and IFNL3) being differentially expressed, consistent to what the authors originally reported [12]. Changes in IFNλ however were not subjected to differential expression analysis since these genes had very low relative expression levels and were thus filtered out by our stringent analysis criteria. Thus, a greater sequencing depth would have been required to pick up the IFNλ genes since our research on astroviruses [10] was able to detect IFNλ by quantitative PCR in colonic and ileal HIEs at 24 h post-infection. Nevertheless, it is also possible that the production of IFNλ by HIEs in response to rotavirus might be of a larger magnitude than in response to norovirus or astrovirus infections. Comparative infections with the three viruses in the same HIE line would be required to test this possibility in the future.
Type I and III IFN production and signaling typically occurs in response to stimulation of pattern recognition receptors (PRRs), including toll-like receptors (TLRs). We found TLR3 to be the only TLR differentially expressed and only in ileal HIEs in response to norovirus (≈fold-change of 2). While TLR1, 2 and 3 were detectable across all datasets, they were only differentially expressed in response to norovirus infection. TLR3 is primarily present in the endosomal compartment and senses double-stranded (ds) RNA [42,43]. DsRNA is a RNA virus replication intermediate present during infection with all three viruses. Thus, TLR3 could conceivably function in the antiviral response to all three viruses. However, TLR3 may only be differentially expressed in the norovirus-infected HIEs dataset due to the timing of sample collection (48 h post-infection) since an amplification of the baseline expression level of TLR3 during viral infection of intestinal epithelial cells is part of the ISG response [42]. It is also conceivable that TLR3 might play a more unique role in the epithelial response to norovirus rather than astrovirus and rotavirus. There is precedent from murine norovirus (MNoV) that TLR3 contributes to controlling infection in vivo [44]. Furthermore, a number of ISGs, including IFN-stimulated gene 15 (ISG15), a central player in the host antiviral response [45], and the E3 ubiquitin-protein ligase HERC5, were strongly up-regulated in all datasets and classified as part of the type I IFN production biological process.
In addition to the shared enrichment of DEGs in the type I IFN and response to virus pathways, a variation in the DEGs between the three viruses was also observed. The factors resulting in this variance are likely multi-fold and represent a balance of experimental differences, including in sampling timepoints, and inherent differences in the biology of these three viruses, including their ability to antagonize innate immune responses. For example, the host response to viral infection is dependent on the expression kinetics of type I and III IFNs, among others. The expression of IFNβ and IFNλ mRNA peaked at 16 and 24 hpi, respectively, in intestinal enteroids infected with reovirus [41], consistent with the notion that type III IFN responses are generally of lower magnitude, slower kinetics and longer duration compared to type I IFN responses [46]. Thus, sampling of the IFN responses in the three datasets occurred at different points of the IFN signaling cascade with the astrovirus-infected HIEs being sampled around the peak of IFN expression (i.e., 24 hpi), while rotavirus-infected HIEs were sampled prior (i.e., 6 hpi) and norovirusinfected HIEs sampled after (i.e., 48 hpi) the peak in expression. Thus, the enriched TFBS unique to rotavirus infection could reflect events occurring prior to IFN induction, while the similarities in TFBS enriched during norovirus infection and the shared TFBS are consistent with downstream signaling from the type I and III IFN receptors. Similarly, TLR3 was uniquely identified in the norovirus dataset at 48 hpi, consistent with a previously identified IRF1/type III IFN-mediated positive feedback loop of TLR3 expression that exhibited a delayed peak in TLR3 expression (between 2-3 days pi) relative to IFNλ and IRF1 (1 dpi) in enterovirus EV71-infected intestines [42]. However, these host-driven dynamics are counterbalanced by virus-encoded proteins since pathogenic viruses have evolved multiple strategies to overcome innate host defenses and establish successful infection. In case of rotavirus, several strategies to disrupt host defense responses have been identified. These include degradation of the mitochondrial antiviral signaling protein (MAVS) by the RNA capping enzyme VP3 [47] to limit viral sensing, degradation of type I, II, III IFN receptors [48], and nonstructural protein NSP1-mediated degradation of IRF3, IRF5, and IRF7 [37,49] or inhibition of NFkB signaling via degradation of the β-transducin repeat-containing protein (β-TrCP) [50][51][52]. Much less is known about human norovirus antagonism. A recent study did not observe differences in STAT1-deficient vs wild-type HIE infected with the same genotype used in the RNA-seq study (GII.4), while a different genotype (GII.3) was sensitive, suggesting GII.4 HNoV limits IFN signaling [28]. While the human norovirus p48 (NS1/2) and p22 (NS4) proteins interfere with intracellular protein trafficking when overexpressed in cells [53], whether they block IFN secretion during infection remains to be explored. Whether human astroviruses employ mechanisms to antagonize IFN responses remains unknown. Thus, the interplay between viral and host factors likely drives differences in differential expression between the various virus-infected HIEs datasets.
Overall, while a comparative analysis of dissimilar RNA-seq datasets from the literature has limitations, our processing pipeline and filters (removal of weakly-expressed genes, higher FDR cutoffs) limited these caveats and allowed us to draw several conclusions. Specifically, we identified key shared players across all datasets, which are part of the response to virus (e.g., IF44, IF44L, STAT1, STAT2, MX1, and MX2), demonstrating that HIEs employ a generic antiviral response. However, this is augmented by more tailored responses depending on the virus (e.g., IFI27 in response to astrovirus, and TLR3 in response to norovirus). We hope this comparative study provides a useful resource to the scientific community that stimulates further explorations into common and unique aspects of the host response to virus infections in the human intestinal epithelium. These studies should be complemented by comparative studies between different viral pathogens that cause gastroenteritis in the same host genetic background.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/v13061059/s1, Figure S1: Samples across the three datasets showed divergence due to tissue origin of the HIE line and due to viral infection when looking at each dataset individually, Table S1: Sequence read archive (SRA) list of all samples analyzed in this study, Table S2: Nonweakly expressed genes shared in all RNA-seq datasets and partially shared across datasets, Table S3: Uniquely differentially expressed genes (DEGs) for each RNA-seq dataset analyzed. Data Availability Statement: The datasets that have been used in this study are publicly available from the GEO and EMBL-EBI. The source code used for analysis of the datasets and to generate the figures in this study are available at: https://gitlab.com/rjcieza/HIEs-GI-viruses-comparativetranscriptomic.git accessed on 30 March 2021.