A Multi-mRNA Prognostic Signature for Anti-TNFα Therapy Response in Patients with Inflammatory Bowel Disease

Suraj Sakaram; Yehudit Hasin-Brumshtein; Purvesh Khatri; Yudong D. He; Timothy E. Sweeney

doi:10.3390/diagnostics11101902

¹ Inflammatix, Inc., 863 Mitten Rd., Suite 104, Burlingame, CA 94010, USA

² Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, Palo Alto, CA 94305, USA

³ Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA

* Authors to whom correspondence should be addressed.

Diagnostics 2021, 11(10), 1902; https://doi.org/10.3390/diagnostics11101902

This article belongs to the Section Pathology and Molecular Diagnostics

Abstract

Background: Anti-TNF-alpha (anti-TNFα) therapies have transformed the care and management of inflammatory bowel disease (IBD). However, they are expensive and ineffective in greater than 50% of patients, and they increase the risk of infections, liver issues, arthritis, and lymphoma. With 1.6 million Americans suffering from IBD and global prevalence on the rise, there is a critical unmet need in the use of anti-TNFα therapies: a test for the likelihood of therapy response. Here, as a proof-of-concept, we present a multi-mRNA signature for predicting response to anti-TNFα treatment to improve the efficacy and cost-to-benefit ratio of these biologics. Methods: We surveyed public data repositories and curated four transcriptomic datasets (n = 136) from colonic and ileal mucosal biopsies of IBD patients (pretreatment) who were subjected to anti-TNFα therapy and subsequently adjudicated for response. We applied a multicohort analysis with a leave-one-study-out (LOSO) approach, MetaIntegrator, to identify significant differentially expressed (DE) genes between responders and non-responders and then used a greedy forward search to identify a parsimonious gene signature. We then calculated an anti-TNFα response (ATR) score based on this parsimonious gene signature to predict responder status and assessed discriminatory performance via an area-under-receiver operating-characteristic curve (AUROC). Results: We identified 324 significant DE genes between responders and non-responders. The greedy forward search yielded seven genes that robustly distinguish anti-TNFα responders from non-responders, with an AUROC of 0.88 (95% CI: 0.70–1). The Youden index yielded a mean sensitivity of 91%, mean specificity of 76%, and mean accuracy of 86%. Conclusions: Our findings suggest that there is a robust transcriptomic signature for predicting anti-TNFα response in mucosal biopsies from IBD patients prior to treatment initiation. 
This seven-gene signature should be further investigated for its potential to be translated into a predictive test for clinical use.

Keywords:

mRNA prognostic; anti-TNFα therapy; IBD; multicohort analysis

1. Introduction

Inflammatory bowel disease (IBD) is a chronic inflammation of the digestive system and includes at least two main types: Crohn’s disease (CD) and ulcerative colitis (UC). CD and UC patients often present with abdominal pain and diarrhea, and rectal bleeding occurs frequently in UC patients [1]. The age-standardized prevalence of IBD increased from 79.5 per 100,000 in 1990 to 84.2 per 100,000 in 2017, corresponding to 6.8 million cases of IBD globally [2]. While the US has the highest age-standardized prevalence rate (464.5 per 100,000), there is also an alarming rise in prevalence in low/middle-income countries [2].

The introduction of anti-TNFα therapies (drugs that inhibit the interaction of TNFα with its receptors) has revolutionized the management of IBD [3]. The five drugs (Etanercept, Infliximab, Adalimumab, Certolizumab, and Golimumab) account for more than $25 billion in annual sales globally, making them one of the highest-revenue classes of drugs on the market [4]. However, anti-TNFα administration and usage remain suboptimal for achieving positive health outcomes for several reasons. First, a substantial percentage of patients fail to achieve a therapeutic response. Approximately 10 to 30% of patients do not respond to the initial treatment (primary non-responders); of those who respond initially, 23 to 46% become non-responders over time (secondary non-responders) [5]. Second, anti-TNFα therapies carry an increased risk of infections, most notably reactivation of tuberculosis, as well as liver problems, arthritis, and lymphoma [3]. Third, since the treatment choice and administration are empirical, multiple different drugs are often tried in sequence, resulting in high costs and morbidity [6,7,8]. For example, in a recent study, direct healthcare expenses increased markedly after initiation of anti-TNFα therapy, from approximately $5500 to $45,000 in the first year alone, and exceeded $200,000 over a span of 5 years [9].

To date, there is no clear predictive factor of response or loss of response to anti-TNFα therapies [10]. Although several studies have identified single biomarker predictors of response, none has translated well for clinical practice [11,12,13,14]. Overcoming this gap in the administration of anti-TNFα therapies is an important next step in improving their utility and reducing overall healthcare costs, morbidity, and mortality.

Several studies have previously used gene expression profiles to predict response to anti-TNFα therapies [13,15,16]. However, almost all of these studies used a single homogeneous cohort with a relatively small sample size that does not represent the clinical and biological heterogeneity of patients with IBD, which can arise from sample source (colon vs. ileal biopsy), age, disease duration, and disease status (flare vs. remission). This lack of biological and clinical heterogeneity, in turn, reduces the generalizability of findings. Using a multicohort analysis framework [17], we have repeatedly demonstrated that leveraging biologically, clinically, and technically heterogeneous cohorts identifies a more robust and generalizable gene signature than using a single homogeneous cohort [18,19,20,21,22,23,24,25,26,27,28], and that such a signature can be translated into a diagnostic test for use in clinics [29,30,31,32].

Here, we hypothesized that a multicohort analysis of transcriptomic data from baseline (pretreatment) intestinal mucosal biopsies from patients with IBD across heterogeneous datasets would identify a robust generalizable gene expression signature predictive of a patient’s response to anti-TNFα therapy. To test this hypothesis, we performed a multicohort analysis of four publicly available gene-expression datasets and identified a seven-gene signature that robustly distinguishes anti-TNFα responders from non-responders prior to initiation of treatment.

2. Methods

2.1. Dataset Search and Curation

We systematically searched (June 2019) for clinical studies on anti-TNFα therapy response in two public data repositories, NCBI GEO and EBI ArrayExpress, using the following search terms: infliximab, Remicade, adalimumab, Humira, certolizumab, Cimzia, golimumab, Simponi, etanercept, and Enbrel. We then excluded studies if they (1) did not directly pertain to anti-TNFα therapy in IBD, including its subtypes (UC and CD), (2) did not have a clinical adjudication for response, or (3) did not contain transcriptomic data from baseline (pretreatment) samples. Altogether, we identified 5 datasets that passed our inclusion criteria: GSE12251, GSE16879, GSE14580, GSE23597, and E-MTAB-7604 [13,15,16,33].

2.2. Sample Curation and Clinical Response Adjudication

Two datasets, GSE12251 and GSE23597, contained samples collected from the ACT1 trial, raising the possibility that samples overlap between the two datasets. To check for overlap, we compared the raw gene-expression data of the samples across both datasets and found that 23 samples matched exactly, confirming the presence of overlapping samples. These 23 samples made up the entirety of GSE12251 and were a subset of the samples in GSE23597. Therefore, we removed GSE12251 from our analysis altogether. GSE14580 comprises active UC patients and is a subset of GSE16879 (matching GSM sample IDs in both datasets); therefore, overlapping samples were removed from GSE16879 to prevent duplicates. The remaining samples in GSE16879 were derived from a CD cohort in which 19 patient biopsies were taken from the colon (CDc) and 18 from the ileum (CDi). In the CDi group, 8 patients were responders and 10 were non-responders [15]. GSE23597 had response outcome adjudicated at weeks 8 and 30 post-Infliximab treatment [33]. For our analysis, we used the week-8 timepoint for adjudication to be consistent with the timepoints for response assessment used in the other datasets. We used the clinical response definition for anti-TNFα therapy described in each original study (Supplementary Materials Table S1) and assigned each sample a binary response label (responder = 1; non-responder = 0).
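The overlap check described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the dictionary layout, and the sample IDs in the usage note are hypothetical, and the sketch assumes that both datasets list probes in the same order so that identical raw vectors indicate the same sample.

```python
def find_exact_matches(dataset_a, dataset_b):
    """Return (sample_a, sample_b) pairs whose raw expression vectors are identical.

    Each dataset is a dict mapping a sample ID to a sequence of raw probe
    intensities; probes are assumed to be in the same order in both datasets.
    """
    # Index dataset A by the full expression vector.
    index = {}
    for sample_id, values in dataset_a.items():
        index.setdefault(tuple(values), []).append(sample_id)

    # Look up every sample of dataset B in that index.
    matches = []
    for sample_id, values in dataset_b.items():
        for other in index.get(tuple(values), []):
            matches.append((other, sample_id))
    return matches
```

For example, calling `find_exact_matches({"A1": [1.0, 2.0], "A2": [4.0, 5.0]}, {"B1": [4.0, 5.0]})` pairs `A2` with `B1`. Exact equality is only meaningful for raw, unnormalized values; normalized data would need a tolerance-based comparison.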

2.3. Gene-Expression Normalization

Microarray: The 3 microarray datasets used the Affymetrix Human Genome U133 Plus 2.0 Array (GEO platform accession: GPL570) for profiling. Thus, to remove platform-specific technical variation, we processed samples from all microarray cohorts in one batch. Specifically, we downloaded the original data files (.CEL) and normalized all data by using the Robust Multichip Average (RMA) method from the affy R package (version 1.63.1, REF) in conjunction with a custom CDF from BrainArray, HGU133Plus2_Hs_ENTREZG (version 23.0.0, ENTREZG).

RNA-seq: One study, E-MTAB-7604, used RNA-Seq for transcriptome profiling. For this study, we downloaded the raw data (fastq files) from ArrayExpress. We used our previously described pipeline to process the data [34]. Briefly, we used FASTQC to assess multiple QC metrics and Cutadapt [35] to trim adapter sequences and 3 bases on the 3′ end of the reads. We used STAR aligner (version 2.7.3a) to map the reads to the human reference genome and transcriptome (versions GRCh38 and GENCODE v32 primary assembly GTF, respectively) [36,37]. We used STAR quantification option to sum the mapped reads across Ensembl transcript IDs, which were then translated to Entrez gene IDs with AnnotationDbi from Bioconductor [38]. All 44 samples passed standard QC metrics, and the resulting counts matrix (20,460 Entrez genes by 44 samples) was used in subsequent data-normalization and -processing steps.

Voom transform: Low-expressed genes were filtered by using the following cutoff: max counts per million (CPM) less than 5 across all 44 samples. Normalization factors were obtained by using Trimmed Mean of M values (TMM) method (edgeR package version 3.28.0) [39,40]. The voom method (limma R package version 3.41.18) was then used to transform counts into normalized log2-CPM [41]. This method transformed the data to make them amenable for multicohort analysis with microarray datasets.
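As a rough illustration of the filtering and transformation steps, the following Python sketch removes genes whose maximum CPM is below 5 and computes a voom-style log2-CPM. It is a simplification under stated assumptions: TMM normalization factors and voom's precision weights are omitted, library sizes are plain column sums, and all function names and the data layout are hypothetical.

```python
import math

def filter_low_expressed(matrix, min_max_cpm=5.0):
    """Drop genes whose maximum CPM across all samples is below min_max_cpm.

    matrix: dict mapping gene ID -> list of raw counts (one per sample).
    Returns the filtered matrix and the library sizes (column sums computed
    before filtering).
    """
    n_samples = len(next(iter(matrix.values())))
    lib_sizes = [sum(row[j] for row in matrix.values()) for j in range(n_samples)]
    kept = {}
    for gene, row in matrix.items():
        max_cpm = max(row[j] / lib_sizes[j] * 1e6 for j in range(n_samples))
        if max_cpm >= min_max_cpm:
            kept[gene] = row
    return kept, lib_sizes

def log2_cpm(row, lib_sizes):
    """log2-CPM with the offsets used by voom (0.5 on counts, 1 on library size).

    Assumption: no TMM scaling is applied to the library sizes here.
    """
    return [math.log2((c + 0.5) / (size + 1.0) * 1e6)
            for c, size in zip(row, lib_sizes)]
```

In the real pipeline the effective library size would be the column sum multiplied by the TMM normalization factor, which this sketch leaves out for brevity.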

2.4. Inter-Dataset Co-Normalization

We used COmbat CO-Normalization Using conTrols (COCONUT) to co-normalize samples across platforms [31]. COCONUT (COCONUT R package version 1.0.2) uses healthy controls (HCs) to remove batch effects under the assumption that HCs from different cohorts represent the same distribution. Briefly, HCs from each platform undergo ComBat co-normalization without covariates [42]. The derived cohort-specific normalization factors are then applied to the diseased samples in that cohort. To co-normalize microarray and RNA-Seq expression data, we used pooled healthy controls from the available datasets to co-normalize across platforms.
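The idea of deriving correction factors from healthy controls and applying them to diseased samples can be sketched for a single gene as follows. This is deliberately not ComBat or COCONUT: it is a plain control-anchored location and scale adjustment without empirical Bayes shrinkage, and the function name and data layout are hypothetical.

```python
from statistics import mean, stdev

def control_anchored_adjust(cohorts):
    """Align one gene across cohorts using healthy controls only.

    cohorts: dict mapping cohort name -> {"controls": [...], "cases": [...]}.
    Per-cohort mean and SD are estimated from controls, then applied to
    both controls and cases. Simplified stand-in for COCONUT (no empirical
    Bayes shrinkage of the per-cohort parameters).
    """
    stats = {name: (mean(c["controls"]), stdev(c["controls"]))
             for name, c in cohorts.items()}
    # Common target distribution: average of the per-cohort control parameters.
    target_mean = mean(m for m, _ in stats.values())
    target_sd = mean(s for _, s in stats.values())

    adjusted = {}
    for name, c in cohorts.items():
        m, s = stats[name]
        rescale = lambda x: (x - m) / s * target_sd + target_mean
        adjusted[name] = {"controls": [rescale(x) for x in c["controls"]],
                          "cases": [rescale(x) for x in c["cases"]]}
    return adjusted
```

After adjustment the controls of every cohort share the same mean and standard deviation, while differences between cases and controls within a cohort are preserved in standardized units.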

2.5. Leave-One-Study-Out (LOSO) Multicohort Analysis

We performed a LOSO multicohort analysis with k studies: we held out one study, performed a multicohort analysis on the remaining k − 1 cohorts, and repeated this k times in a round-robin fashion so that a different study was held out each time. In each round, we calculated the effect size (Hedges’ g) for each gene between anti-TNFα responders and non-responders within each study and summarized it across datasets, using the DerSimonian and Laird random-effects model to obtain a pooled (summary) effect size [17]. We calculated effect size correlations between each pair of datasets to assess dataset similarity and the potential for a shared signal. A p-value based on the standard normal distribution was calculated for the pooled effect size, with a Benjamini–Hochberg false discovery rate (FDR) correction for multiple hypothesis testing (q-value). We considered only genes that were significant across all LOSO rounds. We applied a q-value threshold of 10% and an absolute effect size threshold of 0.8, where needed, to obtain a set of significant differentially expressed (DE) genes.
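The statistical steps of one LOSO round can be sketched in Python using the textbook formulas for Hedges' g, the DerSimonian and Laird estimator, and the Benjamini–Hochberg adjustment. This is an illustrative sketch, not the MetaIntegrator implementation; the function names and the `study_effects` layout are assumptions made for the example.

```python
import math
from statistics import mean, variance

def hedges_g(responders, non_responders):
    """Bias-corrected standardized mean difference and its variance."""
    n1, n0 = len(responders), len(non_responders)
    pooled_var = ((n1 - 1) * variance(responders)
                  + (n0 - 1) * variance(non_responders)) / (n1 + n0 - 2)
    d = (mean(responders) - mean(non_responders)) / math.sqrt(pooled_var)
    j = 1.0 - 3.0 / (4.0 * (n1 + n0) - 9.0)  # small-sample correction factor
    var_d = (n1 + n0) / (n1 * n0) + d * d / (2.0 * (n1 + n0))
    return j * d, j * j * var_d

def dersimonian_laird(effects, variances):
    """Random-effects pooled effect, standard error, tau^2, two-sided p-value."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    p = math.erfc(abs(pooled / se) / math.sqrt(2.0))  # standard normal, two-sided
    return pooled, se, tau2, p

def benjamini_hochberg(p_values):
    """FDR-adjusted q-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def loso_pooled(study_effects):
    """For one gene, pool all studies except the held-out one, once per study.

    study_effects: dict mapping study name -> (hedges_g, variance).
    """
    rounds = {}
    for held_out in study_effects:
        rest = [e for name, e in study_effects.items() if name != held_out]
        rounds[held_out] = dersimonian_laird([g for g, _ in rest],
                                             [v for _, v in rest])
    return rounds
```

A gene would then be retained only if, in every round, its q-value is below 0.1 and the absolute pooled effect size exceeds 0.8, mirroring the thresholds stated above.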

2.6. Anti-TNFα Response (ATR) Score and Performance Metrics

We used the following formula to calculate an anti-TNFα response (ATR) score for each sample:

$$\mathrm{ATR\ score} = \mathrm{zscore}\left(\mathrm{GeoMean}(pos) - \mathrm{GeoMean}(neg) \times \frac{N_{pos}}{N_{neg}}\right)$$

where GeoMean(pos) and GeoMean(neg) are the geometric means of the expression of all positive (overexpressed in responders) and negative (underexpressed in responders) genes, respectively, and Npos and Nneg are the counts of positive and negative genes, respectively. In the case of 0 positive genes, the formula collapses to the GeoMean(neg) term with a negative sign, scaled via zscore. We used the ATR score in conjunction with the ground-truth response adjudication to test the class discriminatory power of a gene set, using the area under the receiver operating characteristic curve (AUROC) as our primary metric. We used the trapezoidal method to calculate AUROCs and generated a smoothed pooled ROC curve with weighted standard deviation, using the Kester and Buntinx method [43]. We determined an optimal cut-point to obtain the sensitivity and specificity for each ROC curve, using the Youden method (cutpointr R package version 1.1.0).
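The score and the performance metrics can be illustrated with a short Python sketch. It follows the formula as written above; the handling of an empty negative gene set, the use of the sample standard deviation for the z-score, and all function names are assumptions of this example rather than details taken from the paper or from the cutpointr package.

```python
from statistics import geometric_mean, mean, stdev

def raw_atr(sample, pos_genes, neg_genes):
    """GeoMean(pos) - GeoMean(neg) * (Npos / Nneg) for one sample.

    sample: dict mapping gene -> expression (values must be positive).
    """
    if not pos_genes:
        return -geometric_mean([sample[g] for g in neg_genes])
    pos = geometric_mean([sample[g] for g in pos_genes])
    if not neg_genes:  # assumption: this case is not specified in the text
        return pos
    neg = geometric_mean([sample[g] for g in neg_genes])
    return pos - neg * len(pos_genes) / len(neg_genes)

def atr_scores(samples, pos_genes, neg_genes):
    """z-score the raw scores across the samples of one dataset."""
    raw = [raw_atr(s, pos_genes, neg_genes) for s in samples]
    mu, sd = mean(raw), stdev(raw)
    return [(r - mu) / sd for r in raw]

def roc_points(scores, labels):
    """(FPR, TPR, threshold) triples, sweeping the threshold from high to low."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0, float("inf"))]
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == thr:  # group tied scores
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos, thr))
    return points

def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]))

def youden_cutpoint(points):
    """Threshold maximizing sensitivity + specificity - 1."""
    fpr, tpr, thr = max(points[1:], key=lambda p: p[1] - p[0])
    return thr, tpr, 1.0 - fpr  # threshold, sensitivity, specificity
```

Samples scoring at or above the returned threshold are called responders. If several thresholds tie on the Youden index, this sketch returns the first one encountered, which may differ from the tie-breaking used by other implementations.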

2.7. Parsimonious Anti-TNFα Response Gene Signature

To identify a minimal set of genes with robust discriminatory performance (weighted AUROC) despite heterogeneity, we used a greedy forward search algorithm [31,44]. Briefly, starting with a set of DE genes, an ATR score, samples’ response adjudications, and a stopping threshold (0.1), the forward search computes the ATR score for each gene individually and chooses the gene with the highest weighted AUROC across datasets. In subsequent iterations, each one of the remaining genes is added to the model, one at a time, whereby the gene which provides the greatest increase in weighted AUROC is retained. Once the iterative increase in weighted AUROC falls below the stopping threshold (i.e., the addition of any gene from the list no longer increases the total weighted AUROC by more than the threshold), the forward search terminates, resulting in the final gene list. We defined the weighted AUROC as the sum of each dataset’s AUROC multiplied by its number of samples.
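The forward search described above can be sketched in Python as follows. The search loop and the weighted AUROC (sum of each dataset's AUROC times its sample count) follow the text; the per-sample score is a simplified signed mean used as a stand-in for the ATR score, and the function names, the `direction` map (+1 for positive genes, -1 for negative genes), and the dataset layout are hypothetical.

```python
def auroc(scores, labels):
    """AUROC by pairwise comparison (ties count one half).

    This equals the trapezoidal area under the empirical ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def weighted_auroc(genes, datasets, direction):
    """Sum over datasets of AUROC multiplied by the number of samples."""
    total = 0.0
    for ds in datasets:
        n = len(ds["labels"])
        # Simplified stand-in for the ATR score: signed mean expression.
        scores = [sum(direction[g] * ds["expr"][g][i] for g in genes) / len(genes)
                  for i in range(n)]
        total += auroc(scores, ds["labels"]) * n
    return total

def greedy_forward_search(candidates, datasets, direction, threshold=0.1):
    """Add one gene at a time while the weighted AUROC gain exceeds threshold."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        score, gene = max(
            ((weighted_auroc(selected + [g], datasets, direction), g)
             for g in remaining),
            key=lambda t: t[0],
        )
        # The first gene is always taken; later genes must beat the threshold.
        if selected and score - best <= threshold:
            break
        selected.append(gene)
        remaining.remove(gene)
        best = score
    return selected, best
```

Because the search is greedy, it evaluates on the order of the number of candidates per iteration rather than every subset, at the cost of possibly missing a better combination that no single-gene addition reveals.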

2.8. Pathway Analysis

We used Gene Set Enrichment Analysis (GSEA) [45] to explore the biological relevance of the differentially expressed genes identified by the multicohort analysis. Specifically, we tested the significance of over-representation of these genes in Gene Ontology (GO) terms, including biological process (BP), molecular function (MF), and cellular component (CC). The human transcriptome reference was used as background, and the p-values from the hypergeometric test were adjusted by using the Benjamini–Hochberg method.
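The hypergeometric over-representation test for a single GO term can be written in a few lines of Python. This is a generic sketch of the test, not the code used in the study; the function and parameter names are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_background: genes in the background (reference transcriptome)
    n_in_term:    background genes annotated to the GO term
    n_selected:   genes in the DE gene list
    n_overlap:    DE genes annotated to the GO term
    """
    upper = min(n_in_term, n_selected)
    favorable = sum(
        comb(n_in_term, i) * comb(n_background - n_in_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_background, n_selected)
```

The resulting p-values, one per GO term, would then be adjusted together with the Benjamini–Hochberg procedure.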

3. Results

3.1. Data Collection, Curation, and Preprocessing

We chose to integrate multiple independent gene-expression datasets that collectively represent the biological, clinical, and technical heterogeneity observed in the real-world patient population to identify a robust generalizable gene signature for anti-TNFα therapy response in IBD [17,24,25,27]. We surveyed NCBI GEO and EBI ArrayExpress for whole-transcriptome datasets from patients with IBD who were subjected to anti-TNFα therapies and that met the inclusion criteria (Table 1 and Methods). Collectively, these datasets included patients from multisite global studies, such as the ACT1 trial (biological heterogeneity), with a wide range of disease severity (clinical heterogeneity) and profiled by using different high-throughput platforms (technical heterogeneity). Overall, we identified four gene-expression datasets comprising 136 mucosal biopsy samples (71 responders and 65 non-responders), for which 15,116 genes were measured across all datasets.

Table 1. Mucosal biopsy datasets used for multicohort analysis. Responders and non-responders were labeled based on cohort’s annotation criteria, as described in Methods.

3.2. Multicohort Analysis Identified 324 Significant Differentially Expressed (DE) mRNAs between Responders and Non-Responders

In order to assess the potential for obtaining signal, we considered dataset similarity based on gene-effect size correlations. We found strong positive correlations (>0.5) between GSE14580 and GSE16879, as well as GSE23597 and E-MTAB-7604 (Supplementary Materials Figure S1). This indicated that there is potential for a generalizable response signal. To obtain a baseline signal of response, we performed a LOSO multicohort analysis, using all four datasets. We identified 324 differentially expressed mRNAs (58 overexpressed and 266 underexpressed) in responders, as compared to non-responders, with absolute pooled summary effect size >0.8 and false discovery rate (FDR) <10% (Figure 1a and Supplementary Materials Table S2). Notably, E-MTAB-7604, an RNA-Seq dataset, stood out in comparison with the three microarray datasets whereby gene effect sizes are visibly varied; this is very likely due to technical heterogeneity in how gene expression is measured across the two platforms. Importantly, when considering gene effect sizes across datasets in a pooled fashion (Figure 1a, top row), we obtained point estimates for the effect size of genes that represent the underlying transcriptional differences between responders and non-responders that may exist in the overall IBD patient population.

Figure 1. Multicohort analysis of IBD cohorts reveals 324 significant DE genes. (a) Heatmap of 324 DE genes’ effect sizes sorted by pooled summary effect size. Genes were selected by |pooled summary effect size| > 0.8, FDR < 10% in a LOSO multicohort analysis between anti-TNFα responders vs. non-responders in 4 individual datasets. (b) Thirty top-ranked significantly enriched GO terms revealed by the gene-set enrichment of the 324 DE genes. GeneRatio on the x-axis represents the number of genes in our gene set within a pathway (size of points) out of the total number of genes of that pathway. The adjusted p-value of enrichment of our gene set in each pathway is shown by the color of points.

The GSEA of these 324 genes showed that the most over-represented pathways included neutrophil activation, neutrophil degranulation, and neutrophil-mediated immunity, as well as leukocyte migration. The Gene Ontology analysis of the 324 genes found that they are enriched for the regulation of inflammatory response as the major predictive factor for responsiveness to anti-TNFα therapy, consistent with previous studies (Figure 1b) [46,47,48].

3.3. A Parsimonious Seven-Gene Signature Suitable for Clinical Utility

The list of 324 DE genes was not optimized for discriminatory performance and was ill-suited for translation to clinical practice. Hence, we used a greedy forward search to identify a parsimonious discriminatory gene set, yielding seven genes. Of the seven, three genes (WNK2, OCRL, and ASB7) were overexpressed and four (PCBP3, AMPD2, FAM155A, and IL13RA2) were underexpressed in responders (Figure 2a and Supplementary Materials Table S2, highlighted in red). Using this seven-gene signature, we computed an ATR score for each sample across all datasets (Methods). The ATR scores of responders were significantly higher than those of non-responders across all datasets (p < 0.05; Figure 2b). Overall, the seven-gene signature had robust discriminatory performance across all datasets, with a pooled AUROC of 0.88 (range from 0.80 to 0.97; Figure 2c). We used the Youden index to determine an optimal cut-point that maximizes the signature’s differentiating ability, and this yielded a mean sensitivity of 91%, mean specificity of 76%, and mean accuracy of 86% (Supplementary Materials Table S3). Based on the pooled ROC (Figure 2c), we estimated the performance of the ATR score for rule-out and rule-in scenarios. Specifically, for a rule-out scenario with sensitivity fixed at 95%, the ATR score has a specificity of 50%; alternatively, for a rule-in scenario with specificity fixed at 90%, the ATR score has a sensitivity of 70%.

Figure 2. Effect sizes and discriminatory performance of the 7-gene signature. (a) Forest plots for random-effects-model estimates of effect size of the 7-gene signature derived from greedy forward search, comparing anti-TNFα responders vs. non-responders (box size is inversely proportional to standard error of effect size; whiskers represent upper and lower confidence intervals). (b) Violin plots of ATR scores based on the 7-gene signature in responders vs. non-responders (p < 0.05). (c) ROC curves shown for discriminatory performance of 7-gene signature in discovery datasets obtained with LOSO approach. The dotted line denotes 0.5 AUC line (random guessing). The gray shaded area denotes confidence band around pooled ROC curve (black line).

4. Discussion

Although anti-TNFα therapies offer a powerful way to manage the progression and treatment of IBD, they are expensive, ineffective in more than 50% of patients, and increase the risk of infections, liver problems, arthritis, and lymphoma [3,5]. With global prevalence of IBD on the rise, a prognostic for anti-TNFα therapy response is crucial to addressing the challenges associated with clinical use of potent immunomodulators [2]. To date, no studies have identified a set of biomarkers that have translated to clinical practice [10,11,12,13,14,49,50,51].

Our goal was to address this unmet global healthcare need by identifying a clear predictive gene signature of response to anti-TNFα therapy, in the hope that it would greatly improve the efficacy and cost-to-benefit ratio of these biologics. We used our established multicohort analysis framework to analyze four mucosal biopsy datasets curated from the public domain and identified 324 genes that were differentially expressed between anti-TNFα responders and non-responders prior to treatment initiation, irrespective of biological, clinical, and technical heterogeneity between datasets due to sample source (colon or ileal), disease pathology (UC or CD), disease status (remission or flare), age, and sex. From this broad set of genes, we used a greedy forward search algorithm to downselect a parsimonious set of genes that have the potential to translate well into a clinically useful response signature. Specifically, our seven-gene signature (WNK2, OCRL, ASB7, PCBP3, AMPD2, FAM155A, and IL13RA2) had a robust pooled AUROC performance of 0.88 across all datasets, demonstrating the feasibility of using a multi-mRNA signature for an anti-TNFα response prognostic test. Interestingly, IL13RA2 has been previously identified as an underexpressed marker in baseline (pretreatment) mucosal biopsies of patients who had endoscopic remission after being subjected to anti-TNFα therapy [13]. The other six genes have not been investigated in the context of anti-TNFα response in IBD and may point to as yet undiscovered molecular pathophysiology. Irrespective of biological context, in clinical practice, a prognostic test with this level of performance would be useful in multiple clinical applications. We envision use cases for rule-in and rule-out tests with the ATR score. For a rule-out case with sensitivity fixed at 95%, the ATR score has a specificity of 50%. Alternatively, for a rule-in case with specificity fixed at 90%, the ATR score has a sensitivity of 70%.

While our results show that the seven-gene signature has the potential to translate into a clinically actionable prognostic test, a limitation of our analysis is that we included all gene-expression datasets published to date. As a result, it was not possible to hold out datasets for independent validation at this stage. However, we have previously shown in a methodological study that three to five datasets are sufficient to find reproducible differentially expressed genes, if in fact there is signal between the two classes [52]. It is clear from the literature that roughly one-third of patients on anti-TNFα therapies respond, while two-thirds do not. Moreover, previous studies have sought to identify underlying transcriptional differences in anti-TNFα-naïve IBD patients that predispose some patients to having a therapeutic response. In lieu of independent validation, we applied a stringent method of biomarker discovery via a LOSO round-robin multicohort analysis. This ensures that no one dataset drives the biomarker discovery process. Moreover, it has been shown to lend itself to a more generalizable signal that would be reproducible in external cohorts [31]. However, we recognize that an independent validation cohort may exhibit characteristic variations that are not captured in our discovery datasets, such that the seven-gene signature would suffer from a loss in performance. Our framework is designed to account for this eventuality; by incorporating more sources of variation from forthcoming datasets while holding out data, our methodology enables iterative refinement and improvement in performance, allowing us to progress towards the goal of developing a clinically actionable test.

We believe that our seven-gene signature will have a significant impact on the use of anti-TNFα therapies. There is no generalizable prognostic test to predict response to anti-TNFα therapy across the heterogeneous patient population. When translated as a companion diagnostic test, the seven-gene signature would identify patients who are likely to benefit from biologic therapy (rule-in), while identifying patients unlikely to respond (rule-out), enabling the clinician to identify other treatment modalities for them substantially sooner. In other words, our seven-gene signature would aid the clinicians’ treatment decision by increasing the percentage of patients more likely to improve from anti-TNFα therapy, while minimizing the number of patients who would not have a response and, consequently, reducing the adverse side effects that diminish quality of life. Collectively, our seven-gene signature would significantly reduce healthcare costs, morbidity, and mortality.

To summarize, we report several important findings in this work. First, despite a relatively low sample size and a limited number of datasets, we demonstrated that our multicohort analysis framework with an LOSO approach identified 324 DE genes between anti-TNFα responders and non-responders, despite biological, clinical, and technical heterogeneity between datasets. Second, we further illustrated that, from this broad set of genes, we converged to a parsimonious seven-gene signature, using greedy forward search, and achieved a pooled performance AUROC of 0.88. We expect this signature to be validated in prospective patient cohorts, as this would enable us to develop a clinically actionable predictive test.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/diagnostics11101902/s1.

Author Contributions

T.E.S., Y.D.H. and P.K. designed the study; S.S., Y.H.-B. and Y.D.H. performed bioinformatics analysis; S.S. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

Research reported in this publication was supported in part by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health under Award Number R43DK127578. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data for the datasets used in our analysis can be accessed on GEO and ArrayExpress, under their respective study IDs.

Conflicts of Interest

S.S., Y.H.-B., Y.D.H. and T.E.S. are employees of and shareholders in Inflammatix, Inc., which has filed a provisional patent concerning the findings herein. P.K. reports being a shareholder and a consultant to Inflammatix, Inc.

References

1. Meeting, A.S.; Course, P.; Antonio, S.; Loftus, E.V.; Clinic, M. Progress in the Diagnosis and Treatment of Inflammatory Bowel Disease. Gastroenterol. Hepatol. 2011, 7, 1–4.
2. Alatab, S.; Sepanlou, S.G.; Ikuta, K.; Vahedi, H.; Bisignano, C.; Safiri, S.; Sadeghi, A.; Nixon, M.R.; Abdoli, A.; Abolhassani, H.; et al. The Global, Regional, and National Burden of Inflammatory Bowel Disease in 195 Countries and Territories, 1990–2017: A Systematic Analysis for the Global Burden of Disease Study 2017. Lancet Gastroenterol. Hepatol. 2020, 5, 17–30.
3. Lis, K.; Kuzawińska, O.; Bałkowiec-Iskra, E. Tumor Necrosis Factor Inhibitors – State of Knowledge. Arch. Med. Sci. AMS 2014, 10, 1175–1185.
4. Herman, B. Abbvie-Humira-2018-Sales-20-Billion-E4039176-Baeb-44ff-B4fe-1b63005283b9 @ Www.Axios.Com. Axios n.d. Available online: https://www.axios.com/abbvie-humira-2018-sales-20-billion-e4039176-baeb-44ff-b4fe-1b63005283b9.html (accessed on 19 November 2019).
5. Roda, G.; Jharap, B.; Neeraj, N.; Colombel, J.F. Loss of Response to Anti-TNFs: Definition, Epidemiology, and Management. Clin. Transl. Gastroen. 2016, 7, e135-5.
6. Sandborn, W.J.; Abreu, M.T.; D’Haens, G.; Colombel, J.F.; Vermeire, S.; Mitchev, K.; Jamoul, C.; Fedorak, R.N.; Spehlmann, M.E.; Wolf, D.C.; et al. Certolizumab Pegol in Patients with Moderate to Severe Crohn’s Disease and Secondary Failure to Infliximab. Clin. Gastroenterol. H 2010, 8, 688–695.e2.
7. Sandborn, W.J.; Rutgeerts, P.; Enns, R.; Hanauer, S.B.; Colombel, J.F.; Panaccione, R.; D’Haens, G.; Li, J.; Rosenfeld, M.R.; Kent, J.D.; et al. Adalimumab Induction Therapy for Crohn Disease Previously Treated with Infliximab: A Randomized Trial. Ann. Intern Med. 2007, 146, 829–838.
8. Yarur, A.J.; Rubin, D.T. Therapeutic Drug Monitoring of Anti-Tumor Necrosis Factor Agents in Patients with Inflammatory Bowel Diseases. Inflamm. Bowel Dis. 2015, 21, 1709–1718.
9. Targownik, L.E.; Benchimol, E.I.; Witt, J.; Bernstein, C.N.; Singh, H.; Lix, L.; Tennakoon, A.; Zubieta, A.A.; Coward, S.; Jones, J.; et al. The Effect of Initiation of Anti-TNF Therapy on the Subsequent Direct Health Care Costs of Inflammatory Bowel Disease. Inflamm. Bowel Dis. 2019, 25, 1718–1728.
10. Macaluso, F.S.; Sapienza, C.; Ventimiglia, M.; Renna, S.; Rizzuto, G.; Orlando, R.; Pisa, M.D.; Affronti, M.; Orlando, E.; Cottone, M.; et al. The Addition of an Immunosuppressant After Loss of Response to Anti-TNFα Monotherapy in Inflammatory Bowel Disease: A 2-Year Study. Inflamm. Bowel Dis. 2018, 24, 394–401.
11. Dubinsky, M.C.; Mei, L.; Friedman, M.; Dhere, T.; Haritunians, T.; Hakonarson, H.; Kim, C.; Glessner, J.; Targan, S.R.; McGovern, D.P.; et al. Genome Wide Association (GWA) Predictors of Anti-TNFα Therapeutic Responsiveness in Pediatric Inflammatory Bowel Disease. Inflamm. Bowel Dis. 2010, 16, 1357–1366.
12. Khor, B.; Gardet, A.; Xavier, R.J. Genetics and Pathogenesis of Inflammatory Bowel Disease. Nature 2011, 474, 307–317.
13. Verstockt, B.; Verstockt, S.; Dehairs, J.; Ballet, V.; Blevi, H.; Wollants, W.-J.; Breynaert, C.; Assche, G.V.; Vermeire, S.; Ferrante, M. Low TREM1 Expression in Whole Blood Predicts Anti-TNF Response in Inflammatory Bowel Disease. Ebiomedicine 2019, 40, 733–742.
14. Gaujoux, R.; Starosvetsky, E.; Maimon, N.; Vallania, F.; Bar-Yoseph, H.; Pressman, S.; Weisshof, R.; Goren, I.; Rabinowitz, K.; Waterman, M.; et al. Cell-Centred Meta-Analysis Reveals Baseline Predictors of Anti-TNFα Non-Response in Biopsy and Blood of Patients with IBD. Gut 2019, 68, 604–614.
15. Arijs, I.; Hertogh, G.D.; Lemaire, K.; Quintens, R.; Lommel, L.V.; Steen, K.V.; Leemans, P.; Cleynen, I.; Assche, G.V.; Vermeire, S.; et al. Mucosal Gene Expression of Antimicrobial Peptides in Inflammatory Bowel Disease before and after First Infliximab Treatment. PLoS ONE 2009, 4, e7984.
16. Arijs, I.; Li, K.; Toedter, G.; Quintens, R.; Lommel, L.V.; Steen, K.V.; Leemans, P.; Hertogh, G.D.; Lemaire, K.; Ferrante, M.; et al. Mucosal Gene Signatures to Predict Response to Infliximab in Patients with Ulcerative Colitis. Gut 2009, 58, 1612–1619.
17. Haynes, W.A.; Vallania, F.; Liu, C.; Bongen, E.; Tomczak, A.; Andres-Terrè, M.; Lofgren, S.; Tam, A.; Deisseroth, C.A.; Li, M.D.; et al. Empowering Multi-Cohort Gene Expression Analysis to Increase Reproducibility. Biocomput 2017, 0, 144–153.
18. Chowdhury, R.R.; Vallania, F.; Yang, Q.; Angel, C.J.L.; Darboe, F.; Penn-Nicholson, A.; Rozot, V.; Nemes, E.; Malherbe, S.T.; Ronacher, K.; et al. A Multi-Cohort Study of the Immune Factors Associated with M. Tuberculosis Infection Outcomes. Nature 2018, 560, 644–648.
Schultz, B.J.; Sweeney, T.; DeBaun, M.R.; Remmel, M.; Midic, U.; Khatri, P.; Gardner, M.J. Pilot Study of a Novel Serum MRNA Gene Panel for Diagnosis of Acute Septic Arthritis. World J. Orthop. 2019, 10, 424–433. [Google Scholar] [CrossRef]
Warsinske, H.; Vashisht, R.; Khatri, P. Host-Response-Based Gene Signatures for Tuberculosis Diagnosis: A Systematic Comparison of 16 Signatures. PLoS Med. 2019, 16, e1002786. [Google Scholar] [CrossRef] [Green Version]
Avey, S.; Cheung, F.; Fermin, D.; Frelinger, J.; Gaujoux, R.; Gottardo, R.; Khatri, P.; Kleinstein, S.H.; Kotliarov, Y.; Meng, H.; et al. Multicohort Analysis Reveals Baseline Transcriptional Predictors of Influenza Vaccination Responses. Sci. Immunol. 2017, 2, eaal4656. [Google Scholar] [CrossRef] [Green Version]
Robinson, M.; Sweeney, T.E.; Barouch-Bentov, R.; Sahoo, M.K.; Kalesinskas, L.; Vallania, F.; Sanz, A.M.; Ortiz-Lasso, E.; Albornoz, L.L.; Rosso, F.; et al. A 20-Gene Set Predictive of Progression to Severe Dengue. Cell Rep. 2019, 26, 1104–1111.e4. [Google Scholar] [CrossRef] [Green Version]
Sweeney, T.E.; Braviak, L.; Tato, C.M.; Khatri, P. Genome-Wide Expression for Diagnosis of Pulmonary Tuberculosis: A Multicohort Analysis. Lancet Respir. Med. 2016, 4, 213–224. [Google Scholar] [CrossRef] [Green Version]
Lofgren, S.; Hinchcliff, M.; Carns, M.; Wood, T.; Aren, K.; Arroyo, E.; Cheung, P.; Kuo, A.; Valenzuela, A.; Haemel, A.; et al. Integrated, Multicohort Analysis of Systemic Sclerosis Identifies Robust Transcriptional Signature of Disease Severity. JCI Insight 2016, 1, e89073. [Google Scholar] [CrossRef]
Li, M.D.; Burns, T.C.; Morgan, A.A.; Khatri, P. Integrated Multi-Cohort Transcriptional Meta-Analysis of Neurodegenerative Diseases. Acta Neuropathol. Commun. 2014, 2, 1–23. [Google Scholar] [CrossRef] [PubMed]
Andres-Terre, M.; McGuire, H.M.; Pouliot, Y.; Bongen, E.; Sweeney, T.E.; Tato, C.M.; Khatri, P. Integrated, Multi-Cohort Analysis Identifies Conserved Transcriptional Signatures across Multiple Respiratory Viruses. Immunity 2015, 43, 1199–1211. [Google Scholar] [CrossRef] [Green Version]
Haynes, W.A.; Haddon, D.J.; Diep, V.K.; Khatri, A.; Bongen, E.; Yiu, G.; Balboni, I.; Bolen, C.R.; Mao, R.; Utz, P.J.; et al. Integrated, Multicohort Analysis Reveals Unified Signature of Systemic Lupus Erythematosus. JCI Insight 2020, 5, e122312. [Google Scholar] [CrossRef] [Green Version]
Khatri, P.; Roedder, S.; Kimura, N.; Vusser, K.D.; Morgan, A.A.; Gong, Y.; Fischbein, M.P.; Robbins, R.C.; Naesens, M.; Butte, A.J.; et al. A Common Rejection Module (CRM) for Acute Rejection across Multiple Organs Identifies Novel Therapeutics for Organ Transplantation. J. Exp. Med. 2013, 210, 2205–2221. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Södersten, E.; Ongarello, S.; Mantsoki, A.; Wyss, R.; Persing, D.H.; Banderby, S.; Meuzelaar, L.S.; Prieto, J.; Gnanashanmugam, D.; Khatri, P.; et al. Diagnostic Accuracy Study of a Novel Blood-Based Assay for Identification of Tuberculosis in People Living with HIV. J. Clin. Microbiol. 2021, 59, e01643-20. [Google Scholar] [CrossRef] [PubMed]
Mayhew, M.B.; Buturovic, L.; Luethy, R.; Midic, U.; Moore, A.R.; Roque, J.A.; Shaller, B.D.; Asuni, T.; Rawling, D.; Remmel, M.; et al. A Generalizable 29-MRNA Neural-Network Classifier for Acute Bacterial and Viral Infections. Nat. Commun. 2020, 11, 1177. [Google Scholar] [CrossRef] [Green Version]
Sweeney, T.E.; Wong, H.R.; Khatri, P. Robust Classification of Bacterial and Viral Infections via Integrated Host Gene Expression Diagnostics. Sci. Transl. Med. 2016, 8, 346ra91. [Google Scholar] [CrossRef] [Green Version]
Buturovic, L.; Zheng, H.; Tang, B.; Lai, K.; Kuan, W.S.; Gillett, M.; Santram, R.; Shojaei, M.; Almansa, R.; Nieto, J.Á.; et al. A 6-MRNA Host Response Whole-Blood Classifier Trained Using Patients with Non-COVID-19 Viral Infections Accurately Predicts Severity of COVID-19. Medrxiv 2020. [Google Scholar] [CrossRef]
Toedter, G.; Li, K.; Marano, C.; Ma, K.; Sague, S.; Huang, C.C.; Song, X.Y.; Rutgeerts, P.; Baribaud, F. Gene Expression Profiling and Response Signatures Associated with Differential Responses to Infliximab Treatment in Ulcerative Colitis. Am. J. Gastroenterol. 2011, 106, 1272–1280. [Google Scholar] [CrossRef]
Thair, S.A.; He, Y.D.; Hasin-Brumshtein, Y.; Sakaram, S.; Pandya, R.; Toh, J.; Rawling, D.; Remmel, M.; Coyle, S.; Dalekos, G.N.; et al. Transcriptomic Similarities and Differences in Host Response between SARS-CoV-2 and Other Viral Infections. Iscience 2021, 24, 101947. [Google Scholar] [CrossRef]
Martin, M. Cutadapt Removes Adapter Sequences from High-Throughput Sequencing Reads. Embnet. J. 2011, 17, 10–12. [Google Scholar] [CrossRef]
Dobin, A.; Davis, C.A.; Schlesinger, F.; Drenkow, J.; Zaleski, C.; Jha, S.; Batut, P.; Chaisson, M.; Gingeras, T.R. STAR: Ultrafast Universal RNA-Seq Aligner. Bioinformatics 2013, 29, 15–21. [Google Scholar] [CrossRef] [PubMed]
Frankish, A.; Diekhans, M.; Ferreira, A.-M.; Johnson, R.; Jungreis, I.; Loveland, J.; Mudge, J.M.; Sisu, C.; Wright, J.; Armstrong, J.; et al. GENCODE Reference Annotation for the Human and Mouse Genomes. Nucleic Acids Res. 2018, 47, gky955. [Google Scholar] [CrossRef] [Green Version]
Pagès, H.; Carlson, M.; Falcon, S.; Li, N. AnnotationDbi: Manipulation of SQLite-Based Annotations in Bioconductor. Available online: https://bioconductor.org/packages/release/bioc/html/AnnotationDbi.html (accessed on 20 April 2021).
Robinson, M.D.; Oshlack, A. A Scaling Normalization Method for Differential Expression Analysis of RNA-Seq Data. Genome Biol. 2010, 11, R25. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Robinson, M.D.; McCarthy, D.J.; Smyth, G.K. EdgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data. Bioinformatics 2010, 26, 139–140. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Ritchie, M.E.; Phipson, B.; Wu, D.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. Limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies. Nucleic Acids Res. 2015, 43, e47. [Google Scholar] [CrossRef]
Johnson, W.E.; Li, C.; Rabinovic, A. Adjusting Batch Effects in Microarray Expression Data Using Empirical Bayes Methods. Biostatistics 2007, 8, 118–127. [Google Scholar] [CrossRef]
Kester, A.D.M.; Buntinx, F. Meta-Analysis of ROC Curves. Med. Decis. Mak. 2000, 20, 430–439. [Google Scholar] [CrossRef]
Sweeney, T.E.; Shidham, A.; Wong, H.R.; Khatri, P. A Comprehensive Time-Course-Based Multicohort Analysis of Sepsis and Sterile Inflammation Reveals a Robust Diagnostic Gene Set. Sci. Transl. Med. 2015, 7, 287ra71. [Google Scholar] [CrossRef] [Green Version]
Subramanian, A.; Tamayo, P.; Mootha, V.K.; Mukherjee, S.; Ebert, B.L.; Gillette, M.A.; Paulovich, A.; Pomeroy, S.L.; Golub, T.R.; Lander, E.S.; et al. Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles. Proc. Natl. Acad. Sci. USA 2005, 102, 15545–15550. [Google Scholar] [CrossRef] [Green Version]
Zhang, C.; Shu, W.; Zhou, G.; Lin, J.; Chu, F.; Wu, H.; Liu, Z. Anti-TNF-α Therapy Suppresses Proinflammatory Activities of Mucosal Neutrophils in Inflammatory Bowel Disease. Mediat. Inflamm. 2018, 2018, 1–12. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Pavlidis, S.; Monast, C.; Loza, M.J.; Branigan, P.; Chung, K.F.; Adcock, I.M.; Guo, Y.; Rowe, A.; Baribaud, F. I_MDS: An Inflammatory Bowel Disease Molecular Activity Score to Classify Patients with Differing Disease-Driving Pathways and Therapeutic Response to Anti-TNF Treatment. PLoS Comput. Biol. 2019, 15, e1006951. [Google Scholar] [CrossRef]
Dinallo, V.; Marafini, I.; Fusco, D.D.; Laudisi, F.; Franzè, E.; Grazia, A.D.; Figliuzzi, M.M.; Caprioli, F.; Stolfi, C.; Monteleone, I.; et al. Neutrophil Extracellular Traps Sustain Inflammatory Signals in Ulcerative Colitis. J. Crohn’s Colitis 2019, 13, 772–784. [Google Scholar] [CrossRef] [PubMed]
Gisbert, J.P.; Chaparro, M. Predictors of Primary Response to Biologic Treatment [Anti-TNF, Vedolizumab, and Ustekinumab] in Patients with Inflammatory Bowel Disease: From Basic Science to Clinical Practice. J. Crohn’s Colitis 2020, 14, 694–709. [Google Scholar] [CrossRef] [PubMed]
Atreya, R.; Neurath, M.F.; Siegmund, B. Personalizing Treatment in IBD: Hype or Reality in 2020? Can We Predict Response to Anti-TNF? Front. Med. 2020, 7, 517. [Google Scholar] [CrossRef] [PubMed]
West, N.R.; Hegazy, A.N.; Owens, B.M.J.; Bullers, S.J.; Linggi, B.; Buonocore, S.; Coccia, M.; Görtz, D.; This, S.; Stockenhuber, K.; et al. Oncostatin M Drives Intestinal Inflammation in Mice and Its Abundance Predicts Response to Tumor Necrosis Factor-Neutralizing Therapy in Patients with Inflammatory Bowel Disease. Nat. Med. 2017, 23, 579–589. [Google Scholar] [CrossRef] [PubMed]
Sweeney, T.E.; Haynes, W.A.; Vallania, F.; Ioannidis, J.P.; Khatri, P. Methods to Increase Reproducibility in Differential Gene Expression via Meta-Analysis. Nucleic Acids Res. 2017, 45, 1–14. [Google Scholar] [CrossRef] [Green Version]

Figure 1. Multicohort analysis of IBD cohorts reveals 324 significant DE genes. (a) Heatmap of the effect sizes of the 324 DE genes, sorted by pooled summary effect size. Genes were selected by |pooled summary effect size| > 0.8 and FDR < 10% in a LOSO multicohort analysis of anti-TNFα responders vs. non-responders across the 4 individual datasets. (b) Thirty top-ranked significantly enriched GO terms revealed by gene-set enrichment of the 324 DE genes. GeneRatio on the x-axis represents the number of genes in our gene set within a pathway (size of points) out of the total number of genes in that pathway. The adjusted p-value of enrichment of our gene set in each pathway is shown by the color of the points.
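
The selection rule stated in the Figure 1 caption can be illustrated with a minimal sketch. It assumes that a pooled summary effect size and an FDR value have already been computed for each gene; the gene names, the numbers, and the helper `select_de_genes` are hypothetical and are not taken from the study or from MetaIntegrator.

```python
from dataclasses import dataclass

# Thresholds quoted in the Figure 1 caption.
EFFECT_SIZE_CUTOFF = 0.8   # |pooled summary effect size| must exceed this
FDR_CUTOFF = 0.10          # FDR must be below 10%


@dataclass(frozen=True)
class GeneResult:
    gene: str                  # placeholder gene label (hypothetical)
    pooled_effect_size: float  # pooled summary effect size across cohorts
    fdr: float                 # false discovery rate for the pooled effect


def select_de_genes(results, es_cutoff=EFFECT_SIZE_CUTOFF, fdr_cutoff=FDR_CUTOFF):
    """Keep genes passing both cutoffs, sorted by pooled effect size."""
    kept = [
        r for r in results
        if abs(r.pooled_effect_size) > es_cutoff and r.fdr < fdr_cutoff
    ]
    return sorted(kept, key=lambda r: r.pooled_effect_size)


# Illustrative values only; these are not results from the paper.
example = [
    GeneResult("GENE_A", 1.20, 0.01),    # passes both cutoffs
    GeneResult("GENE_B", -0.95, 0.04),   # passes both cutoffs (negative effect)
    GeneResult("GENE_C", 0.85, 0.20),    # fails the FDR cutoff
    GeneResult("GENE_D", 0.50, 0.001),   # fails the effect-size cutoff
    GeneResult("GENE_E", -0.80, 0.01),   # exactly at the cutoff, so excluded
]

selected = select_de_genes(example)
print([r.gene for r in selected])  # ['GENE_B', 'GENE_A']
```

Both inequalities are strict here, matching the caption as written; whether the original analysis treats values exactly at a threshold as passing is not stated.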

Figure 2. Effect sizes and discriminatory performance of the 7-gene signature. (a) Forest plots of random-effects-model estimates of effect size for the 7-gene signature derived from the greedy forward search, comparing anti-TNFα responders vs. non-responders (box size is inversely proportional to the standard error of the effect size; whiskers span the lower and upper confidence limits). (b) Violin plots of ATR scores based on the 7-gene signature in responders vs. non-responders (p < 0.05). (c) ROC curves showing the discriminatory performance of the 7-gene signature in the discovery datasets, obtained with the LOSO approach. The dotted line denotes the AUC = 0.5 line (random guessing). The gray shaded area denotes the confidence band around the pooled ROC curve (black line).
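
To make the quantities in panel (c) concrete, the sketch below computes an AUROC and a Youden-index cutoff for a single cohort from a handful of scores. It is an illustration under stated assumptions, not the pooled-ROC procedure used for the figure: the scores are invented, the convention that higher scores indicate responders is assumed, and the function names are hypothetical.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (rank-based definition)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def youden_cutoff(pos_scores, neg_scores):
    """Return (J, threshold, sensitivity, specificity) maximizing
    J = sensitivity + specificity - 1; a sample is called positive when
    its score is >= threshold. The lowest maximizing threshold is kept."""
    best = None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        sens = sum(s >= t for s in pos_scores) / len(pos_scores)
        spec = sum(s < t for s in neg_scores) / len(neg_scores)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best


# Invented scores for illustration; assumed convention: higher = responder.
responders = [2.1, 1.4, 0.9, 0.3]
non_responders = [0.5, -0.2, -0.8, -1.1]

print(auroc(responders, non_responders))          # 0.9375
print(youden_cutoff(responders, non_responders))  # (0.75, 0.3, 1.0, 0.75)
```

An AUROC of 0.5 corresponds to the dotted random-guessing line in panel (c); identical score distributions in the two groups would give exactly that value under this definition.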

Table 1. Mucosal biopsy datasets used for the multicohort analysis. Responders and non-responders were labeled according to each cohort’s annotation criteria, as described in Methods.

Accession	Author	Center	Platform	Disease	Anti-TNFα	Responder	Non-Responder	Total
EMTAB7604	Verstockt	University Hospital Leuven	Illumina HiSeq 4000	IBD	Adalimumab/Infliximab	19	25	44
GSE14580	Arijs	University Hospital Leuven	GPL570	UC	Infliximab	8	16	24
GSE16879	Arijs	University Hospital Leuven	GPL570	CD	Infliximab	20	17	37
GSE23597	Toedter	Multicenter ACT1	GPL570	UC	Infliximab	24	7	31
Total	3 Authors	>2 centers	2 platforms	2 major subtypes	2 anti-TNFα therapies	71	65	136

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
