An Easy and Reliable Strategy for Making Type I Interferon Signature Analysis Comparable among Research Centers

Interferon-stimulated genes (ISGs) are a set of genes whose transcription is induced by interferon (IFN). The measure of the expression of ISGs enables calculating an IFN score, which gives an indirect estimate of the exposition of cells to IFN-mediated inflammation. The measure of the IFN score is proposed for the screening of monogenic interferonopathies, like the Aicardi-Goutières syndrome, or to stratify subjects with systemic lupus erythematosus to receive IFN-targeted treatments. Apart from these scenarios, there is no agreement on the diagnostic value of the score in distinguishing IFN-related disorders from diseases dominated by other types of cytokines. Since the IFN score is currently measured in several research hospitals, merging experiences could help define the potential of scoring IFN inflammation in clinical practice. However, the IFN score calculated at different laboratories may be hardly comparable due to the distinct sets of IFN-stimulated genes assessed and to different controls used for data normalization. We developed a reliable approach to minimize the inter-laboratory variability, thereby providing shared strategies for the IFN signature analysis and allowing different centers to compare data and merge their experiences.


Introduction
Type I interferon (IFN) production is part of the innate immune response to viruses or intracellular bacteria, which is triggered by the sensing of pathogen-associated nucleic acids [1]. Even though the identification of IFNs dates back to the 50s-60s [2], the description of a group of mendelian disorders with dysregulated IFN-mediated inflammation has only recently shed light on the fine regulation of the production and action of these cytokines [3]. Of note, this new group of disorders, known as type I interferonopathies, displays significant phenotypic overlaps with both systemic lupus erythematosus (SLE) and congenital viral infections of the TORCH (Toxoplasmosis, Rubella, Cytomegalovirus, Herpes simplex) and HIV (human immunodeficiency virus) groups [3,4].
Type I interferonopathies are marked by the hyper-expression of a set of genes (IFN-stimulated genes, ISGs) in inflamed tissue and often in peripheral blood, leading to the definition of the so-called

Subjects
On wet IFN signature analysis was assessed by quantitative PCR (qPCR) in ten young-aged healthy subjects (Dataset A, ten out of eleven individuals, five males and five females).
To establish whether data from qPCR and RNAseq analysis were comparable, IFN signature on wet (by qPCR) and in silico (by RNAseq) analysis was performed in twenty subjects with inflammatory diseases, such as systemic lupus erythematosus (SLE), interferonopathies or inflammatory bowel diseases (IBD), and patients' relatives, recruited at our center (Dataset E). A brief description of patients' clinical diagnosis is displayed in Table S1.
To increase healthy donors numerosity, in silico IFN signature investigation (by RNAseq) has been performed in twenty healthy individuals, collected at our center Dataset A (four out of eleven individuals), and selected from different whole blood RNA-sequencing (RNAseq) open-access web-based datasets: we sorted another three datasets, while considering exclusively healthy control samples, accessible at ArrayExpress (accession number E-MTAB-5735, Dataset B, five individuals) and at the Gene Expression Omnibus (GEO) (accession number GSE112057, Dataset C, nine individuals; GSE90081, Dataset D, two individuals). Specification about gender was not available for all the samples. However, sex has been easily inferred by expression analysis of the sex-specific genes RPS4Y1 and USP9Y.
The dataset composition is shown in Table 1. Table 1. Dataset composition: subjects, size, methods and purposes.

Datasets Subjects n Total (F/M) Method (n) Purpose
A-Data of from our center; Accession: # Healthy donors 11 (5/6) qPCR (10) RNAseq (3) To test the variability of expression of the six ISGs in a healthy donor small group available at out center RNAseq (1) To increase the healthy donor group size, to improve the power of the variability measurement B-Accession: E-MTAB-5735 Healthy donors 5 (2/3) RNAseq (5) To increase the healthy donor group size, to improve the power of the variability measurement C-Accession: GSE112057 Healthy donors 12 (6/6) RNAseq (9) To increase the healthy donor group size, to improve the power of the variability measurement D-Accession: GSE90081 Healthy donors 12 (12/0) RNAseq (2) To increase the healthy donor group size, to improve the power of the variability measurement E-Patients and patient's relatives recruited at our center Patients 20 (9/11) qPCR (20) RNAseq (20) To compare IFN signature results between qPCR and RNAseq analyses # data not present in open-access web-based datasets.

Sample Collection, RNA Isolation and cDNA Preparation
Peripheral blood was collected in PAXgene Blood RNA Tubes (PreAnalytiX, Hombrechtikon, Switzerland) and, after two-hours incubation at room temperature, tubes were frozen at −20 • C until processing. Total RNA was extracted with PAXgene Blood RNA Kit (PreAnalytiX, Switzerland), following the manufacturer's instructions, and quantified with NanoDrop Spectrophotometer (Thermo Fisher, Waltham, MA, USA). RNA integrity was checked using an Agilent Technologies 2100 Bioanalyzer.
Up to 1 µg of total RNA was retro-transcribed using SensiFAST cDNA Synthesis Kit (Bioline, London, UK).

IFN Signature Analysis
The expression of six IFN-stimulated genes was assessed by qPCR using AB 7500 Real Time PCR System (Applied Biosystems, Waltham, MA, USA), TaqMan Gene Expression Master Mix (Applied Biosystems, USA) and UPL Probes (Roche, Basel, Switzerland) for IFI27, IFI44L, IFIT1, ISG15, RSAD2, and SIGLEC1. Using AB 7500 Real Time PCR software, each target quantity was normalized with the expression level of HPRT1 and G6PD, and the relative quantification (RQ) was conducted relating to a "calibrator" sample (mix of ten healthy controls, Dataset A) using the 2 −∆∆Ct method [25]. The median fold change of the six genes was used to calculate the IFN score for each patient.

RNAseq Analysis
Transcriptome sequencing was performed using the TruSeq Stranded mRNA Sample Preparation kit (Illumina, San Diego, CA, USA) and sequenced on a NovaSeq 6000 platform (Illumina, San Diego, CA, USA), generating 2X100 bp paired-end reads (30 million reads per sample) in twenty subjects from Dataset E (patients and patients' relatives) and four out of eleven controls from Dataset A.
Data of patients with autoinflammatory diseases and three healthy individuals of our dataset were normalized and analyzed for differentially expressed genes by DESeq2 [28]. From the result table, we only considered the ISGs and evaluated their relative fold changes on each patient compared to the set of controls.
To assess the ISGs expression variability within the group of twenty healthy subjects, shortlisted from the datasets described above, we determined the expression values for each gene, normalized by Fragments Per Kilobase per Million mapped reads (FPKM) method with edgeR (rpkm function) [29,30], using the values from the "Length" column, in the featureCounts' output, for the calculation.
Principal component analysis (PCA), useful for data visualization, was conducted with DESeq2, to define the overall variability between samples.

Statistical Analyses
Considering that each of the six genes measured was expressed on a different scale, we decided to calculate the sample size based on the coefficient of variation, instead than the mean and the standard deviation. We further hypothesized that different runs did not increase the variation in comparing the samples, assuming the only origin of variability to be represented by the subjects' heterogeneity.
To determine the statistical power for data obtained by qPCR and RNAseq, we computed the noncentrality parameter (λ) using GPower 3.1.9.2. software [31,32], with a generic two-tailed t-test, given α = 0.05, β = 0.2, and degrees of freedom equal N-1. If the noncentrality parameter under these conditions (reference value, "λref") resulted in being lower than the one calculated on our samples, we considered the sample size as appropriate.
GraphPad Prism 6 software was employed for χ 2 contingency analysis; p-values <0.05 were considered significant.
To identify the appropriate sample size for variability assessment, we computed λ for increasing numerosity (up to forty) using GPower 3.1.9.2. (Heinrich-Heine University Düsseldorf, Germany), and determined a "plateau value" by an exponential decay function (GraphPad Prism 6 software, La Jolla California USA).

Variability Assessment in IFN-Stimulated Genes Expression in Healthy Controls (Dataset A) Analyzed by qPCR
The variability of expression of the six ISGs (IFI27, IFI44L, IFIT1, ISG15, RSAD2, SIGLEC1) was assessed in ten healthy controls processed by on wet qPCR analysis. Five out of six genes showed low variability coefficients and noncentrality parameters (λ) that fulfilled the analysis criteria (as described in Materials and Methods, Section 2.5) ( Table 2). Only IFI44L did not comply with the analysis parameters, presenting higher variability and a lower λ than the reference value (λref) ( Figure 1)   38.99 λr e f : 3 . 1 5 SD: standard deviation; λref: noncentrality parameter (λ) calculated based on α = 0.05, β = 0.2, and degrees of freedom equal N-1. Sample size is considered as appropriate when λ computed on each gene is higher than λref. The value below λref is highlighted in bold. Sample size is considered as appropriate when λ computed on each gene is higher than λref.
Thus, ten healthy controls could not be considered an appropriate sample size to represent an ideal healthy population, in which the physiologically floating expression values of the ISGs present acceptable variability. For this reason, we should increase the numerosity of healthy controls to obtain a suitable pool in which the gene expression variability is minimized.

IFN-Stimulated Genes Expression Evaluated by qPCR or RNAseq Analysis Are Comparable
To improve the power of the variability measurement, we decided to take advantage of RNAseq open-access web-based data, as an easy source to increase the number of healthy subjects to calculate ISGs interindividual differences. This choice came from comparisons between the relative ISGs fold change assessed in the same twenty subjects (Dataset E) by both qPCR and RNAseq analysis, selecting the same set of three out of eleven healthy controls from Dataset A, to normalize data for both techniques.
Some subjects showed different relative expression values for the same gene calculated by on wet qPCR and in silico RNAseq, but the overall results of the IFN signatures (IFN scores) were extremely consistent between the two techniques for each individual (χ 2 contingency analysis p-value = 0.405, not significant). The comparability of IFN scores is easily explained, considering that these values represent the median of the six relative ISGs quantifications, and they broadly reflect the overexpression status in the analyzed sample (Table 3). Thus, the two methods provided the same trend in gene expression in subjects presenting low, intermediate, and high IFN signatures, as indicated by the three representative graphs in Figure 2 (Subject n.9: low IFN signature; Subject n.14: intermediate IFN signature; Subject n.11: high IFN signature). Sample size is considered as appropriate when λ computed on each gene is higher than λref.
Thus, ten healthy controls could not be considered an appropriate sample size to represent an ideal healthy population, in which the physiologically floating expression values of the ISGs present acceptable variability. For this reason, we should increase the numerosity of healthy controls to obtain a suitable pool in which the gene expression variability is minimized.

IFN-Stimulated Genes Expression Evaluated by qPCR or RNAseq Analysis Are Comparable
To improve the power of the variability measurement, we decided to take advantage of RNAseq open-access web-based data, as an easy source to increase the number of healthy subjects to calculate ISGs interindividual differences. This choice came from comparisons between the relative ISGs fold change assessed in the same twenty subjects (Dataset E) by both qPCR and RNAseq analysis, selecting the same set of three out of eleven healthy controls from Dataset A, to normalize data for both techniques.
Some subjects showed different relative expression values for the same gene calculated by on wet qPCR and in silico RNAseq, but the overall results of the IFN signatures (IFN scores) were extremely consistent between the two techniques for each individual (χ 2 contingency analysis p-value = 0.405, not significant). The comparability of IFN scores is easily explained, considering that these values represent the median of the six relative ISGs quantifications, and they broadly reflect the overexpression status in the analyzed sample (Table 3). Thus, the two methods provided the same trend in gene expression in subjects presenting low, intermediate, and high IFN signatures, as indicated by the three representative graphs in Figure 2 (Subject n.9: low IFN signature; Subject n.14: intermediate IFN signature; Subject n.11: high IFN signature).

Preliminary Analysis for Sample Selection and Variability Assessment in IFN-Stimulated Genes Expression Analyzed by RNAseq
Given the comparability of data between qPCR and RNAseq, we exploited the availability of large open-access web repositories containing RNAseq data to calculate the proper sample size for the assessment of the IFN score with an acceptable inter-laboratory variability.
To select the appropriate sample size, we firstly calculated the noncentrality parameter (λ) on a numerosity up to forty individuals using GPower software. Then, we determine the plateau value of the exponential decay function of the previously computed λ values. The provided plateau (3.01) corresponds to λ for sample size n = 15 (Figure 3), leading us to consider fifteen subjects as an appropriate sample size. For more experimental strength, we considered both n = 15 and n = 20 in the following analyses. We did not further increase the sample size over twenty subjects, whereas exceeding this number might bring difficulties in term of donors' collection.

Preliminary Analysis for Sample Selection and Variability Assessment in IFN-Stimulated Genes Expression Analyzed by RNAseq
Given the comparability of data between qPCR and RNAseq, we exploited the availability of large open-access web repositories containing RNAseq data to calculate the proper sample size for the assessment of the IFN score with an acceptable inter-laboratory variability.
To select the appropriate sample size, we firstly calculated the noncentrality parameter (λ) on a numerosity up to forty individuals using GPower software. Then, we determine the plateau value of the exponential decay function of the previously computed λ values. The provided plateau (3.01) corresponds to λ for sample size n = 15 (Figure 3), leading us to consider fifteen subjects as an appropriate sample size. For more experimental strength, we considered both n = 15 and n = 20 in the following analyses. We did not further increase the sample size over twenty subjects, whereas exceeding this number might bring difficulties in term of donors' collection. We performed a principal component analysis (PCA), a data visualization analysis, to evaluate the total expression variation of the six ISGs and to define the most homogeneous set of twenty healthy individuals. This investigation allows the detection of possible rare outliers with the highest variance that might not be considered as suitable controls to study IFN signature.
RNAseq records have been chosen considering the presence of similar features such as blood collection type, RNA extraction protocol and library selection, to reduce as much as possible the technical procedure variability ( Table 3).
As a first attempt, we investigated all the RNAseq samples of healthy donors from Dataset A (data from our center, n = 4/11), Dataset B (E-MTAB-5735, n = 5) and Dataset C (GSE112057, n = 12), twenty-one specimens in total. Figure 4a displays the PCA results showing the overall ISGs expression variability between individuals. The analysis exhibited a higher variance in three out of twenty-one subjects: we thus decided not to include these three samples in further studies, collecting eighteen samples that were suitable for our purpose. To get the proper numerosity (n = 20), we examine Dataset D (GSE90081, n = 12) and we ran the same analysis again, obtaining a satisfactory level of variation, without outliers, between datasets and among individuals. From these preliminary observations, we randomly chose two out of twelve samples from Dataset D, combining them with data previously selected. Again, we observed an acceptable gene expression variability among our final twenty-controls-sized group (Table 4), as shown in Figure 4b. We performed a principal component analysis (PCA), a data visualization analysis, to evaluate the total expression variation of the six ISGs and to define the most homogeneous set of twenty healthy individuals. This investigation allows the detection of possible rare outliers with the highest variance that might not be considered as suitable controls to study IFN signature.
RNAseq records have been chosen considering the presence of similar features such as blood collection type, RNA extraction protocol and library selection, to reduce as much as possible the technical procedure variability ( Table 3).
As a first attempt, we investigated all the RNAseq samples of healthy donors from Dataset A (data from our center, n = 4/11), Dataset B (E-MTAB-5735, n = 5) and Dataset C (GSE112057, n = 12), twenty-one specimens in total. Figure 4a displays the PCA results showing the overall ISGs expression variability between individuals. The analysis exhibited a higher variance in three out of twenty-one subjects: we thus decided not to include these three samples in further studies, collecting eighteen samples that were suitable for our purpose. To get the proper numerosity (n = 20), we examine Dataset D (GSE90081, n = 12) and we ran the same analysis again, obtaining a satisfactory level of variation, without outliers, between datasets and among individuals. From these preliminary observations, we randomly chose two out of twelve samples from Dataset D, combining them with data previously selected. Again, we observed an acceptable gene expression variability among our final twenty-controls-sized group (Table 4), as shown in Figure 4b.   We calculated the λref for larger samples of healthy controls (fifteen and twenty subjects) using GPower software, and the variability of the ISGs expression in fifteen and twenty healthy controls  We calculated the λref for larger samples of healthy controls (fifteen and twenty subjects) using GPower software, and the variability of the ISGs expression in fifteen and twenty healthy controls processed by in silico RNAseq analysis. Table 5 shows that all the variability coefficients are considerably low and all λ calculated fulfilled the analysis parameters ( Figure 5) (expression values of single gene for each control are displayed in Table S2. Thus, we can hypothesize that pooling together samples from fifteen or twenty healthy subjects could also be considered a proper sample size in qPCR analyses. processed by in silico RNAseq analysis. Table 5 shows that all the variability coefficients are considerably low and all λ calculated fulfilled the analysis parameters ( Figure 5) (expression values of single gene for each control are displayed in Table S2. Thus, we can hypothesize that pooling together samples from fifteen or twenty healthy subjects could also be considered a proper sample size in qPCR analyses. SD: standard deviation; λref: noncentrality parameter calculated based on α = 0.05, β = 0.2, and degrees of freedom equal N-1. Sample size is considered as appropriate when λ computed on each gene is higher than λref.

Pooling Twenty Subjects Could Be Considered an Optimal Strategy to Minimize Gene Expression Variability among Healthy Controls for on Wet IFN Signature Analysis
We checked whether the values of mean, SD and variability coefficient obtained by on wet qPCR on ten controls met the λ criteria considering n = 15 and n = 20 subjects as already calculated, assuming that the variability coefficient does not change as the sample size increases (Table 6).

Pooling Twenty Subjects Could Be Considered an Optimal Strategy to Minimize Gene Expression Variability among Healthy Controls for on Wet IFN Signature Analysis
We checked whether the values of mean, SD and variability coefficient obtained by on wet qPCR on ten controls met the λ criteria considering n = 15 and n = 20 subjects as already calculated, assuming that the variability coefficient does not change as the sample size increases (Table 6). Mean, standard deviation (SD) and variability coefficient values previously determined on ten subjects are reported in the table. λref: noncentrality parameter (λ) calculated based on α = 0.05, β = 0.2, and degrees of freedom equal N-1. Sample size is considered appropriate when λ computed on each gene is higher than λref.
The analysis provides quite good results for both sample sizes tested, even if IFI44L showed a λ (3.08) very close to λref (3.01). For this reason, we can consider more adequate to increase the numerosity to twenty subjects, a still easy-to-gather number of healthy donors. Thus, we could assert that twenty healthy controls could be considered a suitable sample size for IFN signature analysis performed by qPCR, as predicted by in silico RNAseq ( Figure 6).  Mean, standard deviation (SD) and variability coefficient values previously determined on ten subjects are reported in the table. λref: noncentrality parameter (λ) calculated based on α = 0.05, β = 0.2, and degrees of freedom equal N-1. Sample size is considered appropriate when λ computed on each gene is higher than λref.
The analysis provides quite good results for both sample sizes tested, even if IFI44L showed a λ (3.08) very close to λref (3.01). For this reason, we can consider more adequate to increase the numerosity to twenty subjects, a still easy-to-gather number of healthy donors. Thus, we could assert that twenty healthy controls could be considered a suitable sample size for IFN signature analysis performed by qPCR, as predicted by in silico RNAseq ( Figure 6).

Discussion
The clinical employment of the IFN signature is strictly related to the screening of pathological conditions characterized by type I IFN dysregulation [24]. However, several studies have been carried out to associate the IFN-related inflammation with specific clinical or laboratory features of rheumatologic conditions, like systemic lupus erythematosus, primary antiphospholipid syndrome, Sjögren syndrome, rheumatoid arthritis, autoimmune myositis and systemic sclerosis [18,20,[33][34][35][36][37][38]. Some Authors proposed that the assessment of IFN inflammation may help identify subgroups of patients with a better response to specific treatments, as B cell targeted therapies [39][40][41][42]. Moreover, since most anti-inflammatory or immunomodulatory agents have only a weak effect on IFN inflammation, the calculation of the IFN score may also serve to guide targeted therapy approaches with novel drugs like Janus Kinase inhibitors [10].
However, there is no consensus on a shared and validated method to classify different inflammatory conditions by transcriptome analysis. Crow and collaborators used pooled cDNA from healthy donors as calibrator for qPCR, and after assessing a large number of healthy controls and patients with Aicardi-Goutières Syndrome calculated a fixed cut-off value of normality suitable for the screening of interferonopathies [6]. However, this cut-off has been validated to facilitate the detection of monogenic interferonopathies and not to assess IFN inflammation in other conditions.

Discussion
The clinical employment of the IFN signature is strictly related to the screening of pathological conditions characterized by type I IFN dysregulation [24]. However, several studies have been carried out to associate the IFN-related inflammation with specific clinical or laboratory features of rheumatologic conditions, like systemic lupus erythematosus, primary antiphospholipid syndrome, Sjögren syndrome, rheumatoid arthritis, autoimmune myositis and systemic sclerosis [18,20,[33][34][35][36][37][38]. Some Authors proposed that the assessment of IFN inflammation may help identify subgroups of patients with a better response to specific treatments, as B cell targeted therapies [39][40][41][42]. Moreover, since most anti-inflammatory or immunomodulatory agents have only a weak effect on IFN inflammation, the calculation of the IFN score may also serve to guide targeted therapy approaches with novel drugs like Janus Kinase inhibitors [10].
However, there is no consensus on a shared and validated method to classify different inflammatory conditions by transcriptome analysis. Crow and collaborators used pooled cDNA from healthy donors as calibrator for qPCR, and after assessing a large number of healthy controls and patients with Aicardi-Goutières Syndrome calculated a fixed cut-off value of normality suitable for the screening of interferonopathies [6]. However, this cut-off has been validated to facilitate the detection of monogenic interferonopathies and not to assess IFN inflammation in other conditions. Moreover, the reference to locally pooled control cDNA may make it difficult to compare results obtained in distinct centers.
Even though multiple ISG panels have been described either in peripheral blood cells or affected tissues, many laboratories set their assays by analyzing a minimal set of 5-6 targeted-genes, usually including the set proposed by Crow et al. This set has been applied to thousands of analyses and its potential for screening of monogenic interferonopathies is established. After the normalization of results on at least two housekeeping genes, the main source of variability limiting interlaboratory comparison of data consists in the use of different controls for data normalization. Indeed, this is a remarkable problem, considering that the potential role of IFN signature analysis in clinical practice can be defined only by sharing data among research centers, comparing or merging case series. Of course, the best option for the future should rely on the development of industrially manufactured kits validated for In Vitro Diagnostics (IVD) and usable worldwide with the same reference values, not only in the most advanced research areas. Conversely, only the analysis and comparison of data available at various centers can tell industries whether the development of such diagnostic kits is worthwhile or not. Thus, we focused on a strategy that could be immediately applied in biomedical laboratories to facilitate the sharing of experience and minimize the inter-laboratory variability.
We investigated if using a pool of biological samples from healthy controls could solve the data normalization issue, which is the main source of inter-laboratory variability. We proposed that this strategy could "equalize" the differences in gene expression that physiologically occur among individuals.
For this purpose, we evaluated if any pool of healthy controls more than a given number n of subjects could be suitable to level differences, through an approach based on laboratory (qPCR), bioinformatic (RNAseq), and statistical data integration analyses.
Starting from the ISGs expression assessment in peripheral blood from ten healthy subjects by on wet qPCR analysis, we found that this sample size is not suitable for equalizing the variable expression levels of all the six ISGs in healthy volunteers. To find the appropriate number of samples to be pooled together with low enough variability, we further investigated if public data from RNAseq can be exploited to expand our analysis. We thus compared relative ISGs fold change values calculated by qPCR and RNAseq analysis for the same subjects with the same calibrator, showing that the two methods yield comparable results with very low variability between each other. Previous studies compared gene expression measurements generated by in silico RNAseq and on wet qPCR assays, showing consistent results between the two methods for most genes, with the only exception of some genes generally characterized by small size and low expression levels. None of the ISGs is included in the list of genes with inconsistent estimation of expression according to the Authors [43]. Our results confirmed that RNAseq and qPCR generate consistent results in the assessment of ISGs expression levels. Thus, based on the literature and on our preliminary results, we considered RNAseq as a good asset to increase the number of healthy subjects on which to calculate interindividual differences in ISGs expression, exploiting both RNAseq samples available at our center and open-access web-based data.
The results of our study support the choice of pooling twenty healthy controls for the normalization of the assay, allowing to express results as relative to a "standard set of controls". Of note, we validated this sample size only for the selected set of six ISGs in peripheral blood cells. The expression of other genes included in larger panels, may present higher variability among donors and between different affected tissues. However, the same procedure that we have described can also be used to define the optimal sample size for other transcription profiles, for interferonopathies or for other rheumatologic disorders. The performance of the proposed twenty-controls-sized standard could be improved by analyzing all the donors separately before pooling, and by performing cluster variance analysis, which can enable excluding rare donors with outlier variance.