Influenza viruses cause worldwide outbreaks of mild to severe respiratory disease and are a constant public health concern. They are genetically diverse and are divided into four types (A, B, C, and D), which differ in their genetic organization, host species, and clinical and epidemiological characteristics. During seasonal epidemics, influenza B viruses (IBV) and influenza A/H3N2 and A/H1N1 viruses continuously co-circulate and cause a significant disease burden [1]. Influenza A virus (IAV) has a broad species tropism, whereas IBV exists predominantly within the human population [2].
The single-stranded, negative-sense viral RNA (vRNA) genome comprises eight segments, each named for the major protein it encodes (PB2, PB1, PA, HA, NP, NA, M, NS), which range in length from 2.3 kb to 0.9 kb. Packaging of all eight vRNA segments is required for production of fully infectious virus particles [3] and occurs en route to the plasma membrane [5]. Evidence suggests that assembly is a selective process [7], which results in a distinct pattern of seven vRNA segments arranged in a circle around a single central one [10]. The precise mechanism of selective assembly is still unknown, but direct RNA-RNA interactions have been proposed to mediate formation of the vRNA supra-molecule [10].
Each influenza vRNA is bound by multiple nucleoprotein (NP) molecules and one heterotrimeric RNA polymerase complex composed of PB2, PB1, and PA to form a viral ribonucleoprotein (vRNP) complex. NP is the most abundant protein in the vRNP complex and provides a scaffold for the vRNA, facilitating formation of the double-helical vRNP structure, which results from NP binding to RNA and NP self-oligomerization [15]. Prevailing schematics of the viral genome depict a uniform string of NP along the length of the vRNA segment. However, this classical architecture would make segment-specific RNA-RNA interactions difficult to form. Because the classical viral genomic structure was constructed from EM images [18], which provide no information regarding the RNA structure, we proposed that NP could associate with vRNA in a non-random and non-uniform manner. We previously used high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) to map NP association with vRNA for all eight segments of two H1N1 strains [20] and found that NP bound to RNA in a highly reproducible, non-uniform, non-random manner. Recently, another group using a similar technique (PAR-CLIP) also found that NP does not bind randomly and uniformly to H1N1 vRNA [21]. Together, these studies support a new architecture of the influenza viral genome.
In this study, we performed HITS-CLIP to map NP association with vRNA for all eight segments of a seasonal H3N2 strain and an IBV strain. These viruses were chosen because they co-circulate with seasonal H1N1 viruses and have the potential to reassort in nature. We demonstrate that the NP-vRNA binding profiles of H3N2 show some correlation with those of H1N1, whereas the NP-binding profiles of IAV and IBV strains show no correlation. Our studies demonstrate that the non-uniform, non-random NP-binding pattern is consistent across the human seasonal influenza viruses tested and is a conserved feature of the influenza genomic architecture.
2. Materials and Methods
2.1. Cells and Viruses
Madin-Darby canine kidney (MDCK) cells were maintained in Eagle’s Minimum Essential Medium (EMEM, Sigma-Aldrich, St. Louis, MO, USA) supplemented with 10% fetal bovine serum (FBS, Hyclone, Logan, UT, USA), 2 mM L-glutamine (Gibco, Waltham, MA, USA), and 1% penicillin/streptomycin (Gibco). A/Panama/07/1999 (H3N2) virus was a generous gift from Dr. Zhiping Ye (Center for Biologics Evaluation and Research, FDA) through the WHO Global Influenza Surveillance and Response System. The influenza B virus, B/Texas/02/2013 (Victoria lineage), cell-derived, FR-1302, was obtained through the Influenza Reagent Resource, Influenza Division, WHO Collaborating Center for Surveillance, Epidemiology and Control of Influenza, Centers for Disease Control and Prevention, Atlanta, GA, USA.
2.2. HITS-CLIP and Deep Sequencing Data Analysis
HITS-CLIP was performed as described previously [20]. Two confluent T175 flasks of MDCK cells were washed with phosphate-buffered saline and infected with the indicated virus at a dilution of 1:10,000 in serum-free EMEM containing TPCK-trypsin (Worthington Biochemicals, Berkshire, UK). At 96 h post-infection, 40 mL of culture medium containing ~10⁷ infectious virus particles per mL was harvested, and cellular debris was pelleted by centrifugation at 2000× g for 20 min. The clarified supernatant was UV irradiated at 254 nm (400 mJ/cm² followed by 200 mJ/cm²) and ultracentrifuged at 200,000× g for 2 h on a 30% sucrose-NTE (100 mM NaCl, 10 mM Tris pH 7.4, 1 mM EDTA) cushion. Concentrated virions were resuspended in PXL buffer (1× PBS, 1% NP40, 0.5% deoxycholate, 0.1% SDS) and treated with DNase and RNase, followed by a partial RNase A (0.25, 0.025, and 0.0025 μg) digestion for 5 min at 37 °C. Virus lysate was incubated with Dynabeads Protein G conjugated to mouse monoclonal antibody MAB8251 (Millipore, Burlington, MA, USA) for A/Panama/07/1999 or mouse monoclonal antibody ab2074 (Abcam, Cambridge, MA, USA) for B/Texas/02/2013 to immunoprecipitate influenza NP. Ligation of 5′ and 3′ adaptors, the RT reaction, and first-round PCR amplification were performed before preparing Illumina-compatible deep sequencing libraries using the NEBNext Ultra DNA Library Prep Kit (NEB, Ipswich, MA, USA). Deep sequencing was carried out on Illumina’s NextSeq platform, and the data were deposited in the Sequence Read Archive under accession number SRP158158. Reads were mapped with the NovoAlign alignment program to reference genomes available from the NCBI database. Sequencing results were visualized with the Integrative Genomics Viewer [22].
2.3. Pearson Correlation Analysis
HITS-CLIP data were normalized to the highest peak within each segment (excluding super peaks). The normalized read depth at each nucleotide position was compared between H1N1, H3N2, and IBV strains using Prism 6 software (GraphPad, La Jolla, CA, USA). Pearson correlation coefficients and corresponding p-values were determined between influenza virus strains for each segment. In general, Pearson correlation coefficients (r) range from −1 to 1, where r ≥ 0.7 indicates a high positive correlation, 0.5 ≤ r < 0.7 a moderate positive correlation, 0.3 ≤ r < 0.5 a low positive correlation, and r < 0.3 a negligible correlation [23]. To compare IAV and IBV vRNA segments of different lengths, the sequences of the two strains were aligned using a multiple sequence aligner, and gaps in the alignment were corrected by padding them with the preceding value in the bedGraph file.
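The normalization, correlation, and gap-padding steps described above can be illustrated with a minimal Python sketch. This is not the Prism 6 workflow used in the study. The function names (`normalize_to_max`, `pearson_r`, `classify_r`, `pad_gaps`) are hypothetical, super-peak exclusion and p-value calculation are omitted, and the choice to assign 0.0 to a gap that precedes any aligned base is an assumption.

```python
import math

def normalize_to_max(depth):
    """Scale a per-nucleotide read-depth profile so its highest value is 1.0.
    Assumption: any super peaks have already been removed from `depth`."""
    peak = max(depth)
    if peak == 0:
        return [0.0 for _ in depth]
    return [d / peak for d in depth]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

def classify_r(r):
    """Apply the thresholds quoted in the text to a correlation coefficient."""
    if r >= 0.7:
        return "high"
    if r >= 0.5:
        return "moderate"
    if r >= 0.3:
        return "low"
    return "negligible"

def pad_gaps(aligned_seq, depth):
    """Spread a depth profile over alignment columns.
    Gap columns ('-') reuse the preceding depth value; a leading gap gets 0.0
    (an assumption, since no preceding value exists)."""
    values = iter(depth)
    last = 0.0
    padded = []
    for column in aligned_seq:
        if column != "-":
            last = next(values)
        padded.append(last)
    return padded
```

For two segments of different lengths, each profile would first be passed through `pad_gaps` with its own row of the alignment, so that both padded profiles have one value per alignment column before `pearson_r` is applied.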
2.4. Peak Analysis
Predicted NP binding sites were determined using the peak-finding algorithm MACS [24]. For each HITS-CLIP experiment, the p-score that performed best at calling NP peaks was chosen. Non-peak regions were defined as all regions not called a peak by MACS, including regions with HITS-CLIP read coverage below 5% of the maximum peak height of each experiment. Mean peak widths were calculated from the coordinates obtained from the MACS analysis, omitting apparent double peaks that the algorithm called as a single peak.
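As a rough illustration of how peak coordinates translate into mean peak widths and non-peak regions, the sketch below works on plain (start, end) tuples. It assumes half-open, 0-based coordinates and treats non-peak regions simply as the complement of the called peaks. Both function names are hypothetical, and nothing here calls MACS itself.

```python
def mean_peak_width(peaks, skip=()):
    """Mean width of called peaks.
    `peaks` is a list of (start, end) tuples; `skip` holds the indices of
    apparent double peaks to leave out, chosen by manual inspection."""
    widths = [end - start for i, (start, end) in enumerate(peaks) if i not in skip]
    if not widths:
        raise ValueError("no peaks left after omitting double peaks")
    return sum(widths) / len(widths)

def non_peak_regions(peaks, segment_length):
    """Intervals of a segment that are not covered by any called peak."""
    regions = []
    cursor = 0
    for start, end in sorted(peaks):
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < segment_length:
        regions.append((cursor, segment_length))
    return regions
```

For example, with peaks at (10, 30), (50, 90), and (120, 140) on a 150-nucleotide segment, omitting the middle peak as a double peak gives a mean width of 20 nucleotides.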
Nucleotide composition of peaks was determined using the coordinates provided by the MACS software. The sequences were extracted and the percentages of A, U, G, and C were calculated relative to the length of each peak. Nucleotide compositions of non-peak regions were calculated in the same manner. The nucleotide percentages for each peak and non-peak were graphed as a scatter plot to illustrate the observed variation. A two-way ANOVA using Prism 6 software (GraphPad) identified a highly significant difference in G and U content between peaks and non-peaks (p < 0.0001). No statistically significant difference was observed in the percentages of A or C bases.
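The per-region nucleotide percentages can be computed along the following lines. This is a minimal sketch with hypothetical function names. It assumes half-open, 0-based coordinates and converts T to U so that cDNA-style reference sequences are handled. The ANOVA step is not reproduced.

```python
def nucleotide_percentages(seq):
    """Percentage of A, U, G, and C in a sequence, relative to its length."""
    seq = seq.upper().replace("T", "U")
    if not seq:
        raise ValueError("empty sequence")
    return {base: 100.0 * seq.count(base) / len(seq) for base in "AUGC"}

def composition_by_region(sequence, regions):
    """Nucleotide percentages for each (start, end) region of a segment."""
    return [nucleotide_percentages(sequence[start:end]) for start, end in regions]
```

Running `composition_by_region` once with peak coordinates and once with non-peak coordinates yields the two sets of percentages that would then be compared statistically.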
2.5. Continuous Wavelet Transforms
To examine potential periodicity in the NP-vRNA binding patterns, continuous one-dimensional wavelet transforms (CWT) were performed on NP-vRNA binding profiles using the “Analytical Morlet” wavelet package within the MATLAB wavelet toolbox. The sampling rate was defined as 1 per nucleotide and the resulting wavelet analysis was mapped using filled contour plots with a nucleotide repetition length as the “y-axis” frequency parameter.
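To make the idea of a wavelet-based periodicity scan concrete, the sketch below evaluates a Morlet wavelet transform directly in pure Python. It is not the MATLAB toolbox implementation used in the study. The central frequency `OMEGA0 = 6`, the scale-to-period conversion, the truncation of the wavelet at four scale units, and the function names are all assumptions made for illustration.

```python
import cmath
import math

OMEGA0 = 6.0  # assumed central frequency of the Morlet wavelet

def period_to_scale(period):
    """Wavelet scale whose equivalent Fourier period is `period` nucleotides."""
    return period * (OMEGA0 + math.sqrt(2.0 + OMEGA0 ** 2)) / (4.0 * math.pi)

def morlet_power(signal, periods):
    """Mean wavelet power at each candidate repeat length.
    `signal` is sampled once per nucleotide; returns {period: mean |W|^2}."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [value - mean for value in signal]
    power = {}
    for period in periods:
        scale = period_to_scale(period)
        norm = math.pi ** -0.25 / math.sqrt(scale)
        total = 0.0
        for b in range(n):  # position of the wavelet centre
            coeff = 0j
            for t in range(n):
                eta = (t - b) / scale
                if abs(eta) > 4.0:  # wavelet is negligible beyond this range
                    continue
                # complex conjugate of the Morlet wavelet evaluated at eta
                wavelet = norm * cmath.exp(-1j * OMEGA0 * eta) * math.exp(-0.5 * eta * eta)
                coeff += centered[t] * wavelet
            total += abs(coeff) ** 2
        power[period] = total / n
    return power
```

A profile with a true repeat every 20 nucleotides would show its highest mean power at the 20-nucleotide period, whereas a non-periodic profile shows no single dominant repeat length. The per-position coefficients, rather than their mean, are what a filled contour plot would display.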
In this study, we show that seasonal H3N2 and IBV strains display a non-uniform, non-random NP-vRNA binding pattern (Figure 1 and Figure 2), similar to that of H1N1 strains [20]. Overall, all strains tested had a similar number of peaks (Figure 3D), and peaks tended to be more G-rich and U-poor than non-peaks. These observations provide the first evidence for a nucleotide bias in NP-vRNA binding. In addition, we demonstrate that the spacing of NP peaks along a vRNA segment does not follow a periodic pattern (Figure 4). Lastly, we compared the NP-binding profiles of three IAV strains (two H1N1 and one H3N2) and observed low correlation between strains when all eight segments were concatenated. Interestingly, the Pearson correlation of NP-binding patterns between individual segments revealed that some segments were more similar than others, which may be related to sequence similarity or may imply a similar relationship during influenza viral RNA assembly.
Efficient incorporation of all eight segments requires segment-specific packaging signals that are known to reside at the 3′ and 5′ ends of each vRNA segment [33]. Specific regions within the coding sequence of vRNA segments have also been identified using in vitro binding assays [10]. Many of the regions responsible for vRNA-vRNA interaction in the avian H5N2 (A/Finch/England/205/91) virus were located within the coding region [11]. However, these studies were performed on naked RNA without NP. Given the distinct pattern of NP association with the vRNA segments, the presence of NP might alter which regions of a vRNA segment are available for base pairing. A recent study using vRNPs from an H3N2 (A/Udorn/307/72) virus found that a region of segment 2 (PB1) between nucleotides 1776 and 2070 interacted with segment 6 (NA), which drives co-segregation of these two segments in the virus [43]. In our NP-vRNA binding profile of H3N2, segment 2 (PB1) contains two NP-free regions and, interestingly, one of these lies within nucleotides 1776–2070 (Figure 1B, red arrowhead). It is therefore an intriguing possibility that NP-vRNA binding patterns may provide insight into potential vRNA-vRNA interactions.
Segmentation of the influenza virus genome facilitates exchange of vRNPs between co-infecting strains and provides an evolutionary advantage to the virus. Within the human population, H3N2 and H1N1 viruses continuously co-circulate during seasonal epidemics and have previously reassorted to generate a novel H1N2 virus strain [44] that caused a small outbreak during the 2001 to 2002 season. IBV also co-circulates but is evolutionarily divergent, and intertypic reassortment of vRNA segments has never been observed in nature or in tissue culture [45]. Interestingly, there is a genetic bias during reassortment: not all 256 (2⁸, from two parental choices for each of the eight vRNPs) possible virus combinations are observed in experimental reassortment studies [46]. Mutations introduced by the error-prone influenza polymerase, as well as reassortment, can promote genetic diversity. However, most reassortment events are deleterious due to protein incompatibility [47]. Taken together with evidence for intersegmental interactions within the vRNA coding regions, NP binding to vRNA may play a role in potential reassortment events. Variation in NP density along vRNA segments may aid the coordinated, orderly assembly of all eight vRNP segments. Conserved NP-binding patterns between segments from different strains, as we observed in Table 1, may also provide insight into the reassortment potential of certain vRNA segments. Further investigation is needed to determine whether segments with highly similar NP-binding patterns are interchangeable and more likely to be packaged with segments from different strains in a coinfected cell. Expanding on the notion that NP-binding patterns may contribute to vRNA assembly and reassortment, the striking differences between IAV and IBV NP-vRNA binding profiles may help explain why IAV and IBV are incapable of reassorting with each other.
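The count of 256 possible genotypes mentioned in this paragraph follows from two parental choices for each of eight segments. The short enumeration below makes that arithmetic explicit. The parent labels and function name are placeholders, and the sketch says nothing about which combinations are viable.

```python
from itertools import product

SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")

def genotype_combinations(parents=("parent_1", "parent_2")):
    """Every assignment of a parental origin to each of the eight segments."""
    return [dict(zip(SEGMENTS, origins))
            for origins in product(parents, repeat=len(SEGMENTS))]

combinations = genotype_combinations()
# Genotypes that mix segments from both parents (the two parental
# genotypes themselves are excluded).
reassortants = [g for g in combinations if len(set(g.values())) > 1]
print(len(combinations), len(reassortants))
```

Of the 2⁸ = 256 combinations, two are the unchanged parental genotypes, which leaves 254 possible reassortant genotypes.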
We demonstrate that the NP-vRNA binding profiles of IAV and IBV are non-uniform, which highlights conservation of the new architecture recently proposed by us and others [20]. IAV and IBV encode homologous proteins, which differ in size, and both contain noncoding regions that serve as promoters for replication and transcription [48]. For example, biochemically purified IAV and IBV NP proteins are very similar [49], although the N-terminal domain of IBV NP is significantly longer than that of IAV NP and shares little homology [30]. Interestingly, we observed that IAV and IBV NP proteins have a similar nucleotide bias in RNA binding, with preferential association with G-rich and U-poor regions compared to non-peak regions (Figure 3). It is still unclear what drives NP enrichment at specific sites on the vRNA, and further studies are needed to examine the contribution of RNA sequence and structure to NP binding.