Influenza viruses present a significant threat to public health, as highlighted by the recent introduction of the swine-derived pandemic strain, H1N1/09 [
38,
39,
40], into human populations. Influenza virus pandemics have often been initiated by the introduction of viruses from non-human sources, followed by adaptation among humans through human-to-human transmission. One important issue in studies of viral genomes, particularly those of the influenza virus, is to predict the changes in genomic sequence that are likely to occur in the near future [
41,
42]. In this review, we explain our recent finding [
13,
43] that BLSOM can predict the directional change of influenza A genome sequences after invasion into human populations from non-human sources, at least in a specific aspect, and, therefore, can be used to survey systematically for non-human strains that are potentially hazardous when introduced into human populations.
5.1. Host-Dependent Clustering of Influenza Virus Genome Sequences
Influenza A and B virus genomes are composed of eight segments, each of which encodes primarily one or two proteins. When only strains with sequences available for all eight segments are included, genomic sequences are available for approximately 14,000 strains of influenza viruses in the Influenza Virus Resource [
44] at NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD, USA). Oligonucleotide frequencies in the eight segments are summed up for each strain, and we have constructed di-, tri- and tetra-nucleotide BLSOMs (abbreviated as Di, Tri and Tetra) for all influenza A and B strains (
Figure 4). The direct target of natural selection is a virion containing a full set of the eight segments, so this genome-level BLSOM analysis should provide valuable, novel information about the genome characteristics of individual strains. The influenza virus possesses a negative-sense single-stranded RNA genome, but the sequences registered in the databank correspond to the coding strand, and these are analyzed in this study; the frequencies of a pair of complementary oligonucleotides are not added together, because the complementary strands have clearly different biological functions. To consider the results in terms of the RNA genome itself, A must be exchanged with U and C with G.
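The counting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the conversion of T to U and the skipping of windows with ambiguous bases are assumptions made here for convenience.

```python
from collections import Counter
from itertools import product


def oligo_counts(segments, k):
    """Count overlapping k-mers, summed over all segments of one strain."""
    counts = Counter()
    for seq in segments:
        # Assumption: databank sequences use T; write them in the RNA alphabet.
        seq = seq.upper().replace("T", "U")
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGU"):  # skip windows with ambiguous bases
                counts[kmer] += 1
    return counts


def frequency_vector(segments, k):
    """Return normalized frequencies for all 4**k k-mers of one strain.

    Complementary k-mers are kept as separate entries (not merged).
    """
    counts = oligo_counts(segments, k)
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Base-wise exchange A<->U and C<->G, as stated in the text.
# Note: this only exchanges bases; it does not reverse the reading direction.
EXCHANGE = str.maketrans("ACGU", "UGCA")


def to_genomic_sense(kmer):
    """Apply the A<->U and C<->G exchange to one oligonucleotide."""
    return kmer.translate(EXCHANGE)
```

With `k` set to 2, 3 or 4, `frequency_vector` yields the 16-, 64- or 256-dimensional input corresponding to Di, Tri or Tetra; the BLSOM training itself is not shown.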
Figure 4.
BLSOMs for influenza virus genome sequences. (A) Di, Tri and Tetra. Di-, tri- and tetra-nucleotide BLSOMs have been constructed for 14,447 strains of influenza A and B virus. Lattice points containing sequences from strains isolated from more than one host are indicated in black, and those containing sequences from one host are indicated in a color: avian, red; human, green; swine, blue; equine, yellow; bat, gray; and influenza B, light blue; (B) Human subtype. On each BLSOM in A, individual human virus subtypes are specified in a color representing the subtype: H1N1, brown; H1N1/09, dark green; H2N2, gray; H3N2, blue; H5N1, red; H7N9, purple; other H7 and H9, pink; other human subtype, green; and territories of other hosts are achromatic; and (C) Avian subtype. On each BLSOM in A, individual avian virus subtypes are specified in a color representing the subtype: H5N1, gray; H7N2, light blue; H7N3, blue; H9N2, brown; other avian subtype, red; and territories of other hosts are achromatic.

In
Figure 4A, lattice points containing virus strains isolated from one host species are indicated in a color representing the host, and those containing strains isolated from more than one host are in black. Though only oligonucleotide compositions have been given during the BLSOM calculation, viral sequences are clustered (self-organized) according to host; for details, see our original papers [
13,
43,
45]. This host-dependent clustering of viral sequences should reflect the host dependency of viral growth. Viruses are inevitably dependent on many host factors for their growth (e.g., pools of nucleotides, amino acids and tRNAs) and, at the same time, have to escape from antiviral host mechanisms, such as antibodies, cytotoxic T-cells, interferons and RNA interference [
46,
47,
48,
49].
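The coloring rule used for Figure 4A can be expressed as a short sketch. The host colors are taken from the figure caption; the function name and the input layout (a mapping from lattice coordinates to the host labels of the strains assigned there) are assumptions for illustration only.

```python
# Host colors as given in the caption of Figure 4A.
HOST_COLORS = {
    "avian": "red",
    "human": "green",
    "swine": "blue",
    "equine": "yellow",
    "bat": "gray",
    "influenza B": "light blue",
}


def color_lattice(assignments):
    """Map each occupied lattice point to a display color.

    assignments: dict mapping a lattice point (x, y) to the list of host
    labels of the strains classified into that point (hypothetical layout).
    Points with strains from a single host get that host's color; points
    with strains from more than one host are black; empty points are omitted.
    """
    colors = {}
    for point, hosts in assignments.items():
        unique_hosts = set(hosts)
        if not unique_hosts:
            continue
        if len(unique_hosts) == 1:
            colors[point] = HOST_COLORS[next(iter(unique_hosts))]
        else:
            colors[point] = "black"
    return colors
```

Because host labels are applied only after the map has been computed, a separation of colors on the map reflects clustering that arose from oligonucleotide composition alone.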
In
Figure 4B, lattice points that contain human influenza A viruses of one subtype are specified with one color representing the subtype on each BLSOM. Among the 14,000 strains analyzed, approximately 3000 strains correspond to the pandemic H1N1 strain (H1N1/09) (dark green in
Figure 4B), which began spreading as a pandemic among humans around April 2009. Although it is originally derived from avian strains, it was transmitted to humans via swine after multiple reassortments of genome segments. Interestingly, on BLSOM, these strains are located apart from seasonal human H1N1 or H3N2 strains (brown or blue in
Figure 4B) and surrounded by avian and swine territories (red and blue in
Figure 4A and blank in
Figure 4B), indicating that these H1N1/09 strains are not yet optimally adapted to growth in human cellular environments.
In contrast to H1N1/09 (dark green), most human H5N1 strains (red in
Figure 4B and mainly black in
Figure 4A) and human H7N9 strains (purple in
Figure 4B and mainly black in
Figure 4A) are rather scattered within the avian territory (red in
Figure 4A and blank in
Figure 4B). This is consistent with the view that the human H5N1 and H7N9 strains have jumped to humans, but have not yet been able to spread from human to human [
48,
50], and therefore, they have characteristics of avian viruses. These strains are more separated from the human seasonal flu territories than H1N1/09 strains, and this difference from H1N1/09 may relate to their lower infectivity in human populations. It should also be noted that a very minor portion of avian H5N1 and H9N2 strains (gray or brown in
Figure 4C) are located near the human territories, showing that the oligonucleotide composition of these strains is more similar to that of human strains than is the composition of the human H5N1 and H7N9 strains already known; these strains will be discussed later in connection with potentially hazardous strains; for details, see Iwasaki
et al. [
43]. There are also avian strains that are scattered primarily within the swine territories (red in
Figure 4C). These avian strains have most likely jumped from swine to birds, because they have the swine-type oligonucleotide composition.
Influenza B strains (light blue in
Figure 4A and blank in
Figure 4B), which have repeatedly caused epidemics only among humans, form a territory more distant from the avian territory than other human seasonal strains.
5.2. Retrospective Time Series Changes Visualized for Human Viruses
The prediction of genomic sequence changes in the near future is one important issue for bioinformatics studies of influenza viruses. Invader viruses will change their genome sequences under a balance between stochastic processes of mutation and selective forces derived from various constraints, including those from a new host. In other words, a certain level of change may be predictable after invasion into a new host, at least in regard to specific aspects. To examine possible directional changes, we next visualize retrospective time series changes of human seasonal H1N1 and H3N2 strains on Tetra (
Figure 5A). Human seasonal H1N1 and H3N2 strains isolated in a specific time period are indicated in brown and blue, respectively; other human strains are left in green, and strains isolated from other hosts are left achromatic. Seasonal human strains isolated in a very early period of their pandemic (“1930–1957” for H1N1 and “1968–1974” for H3N2) are located near the avian territory (achromatic in
Figure 4B and red in
Figure 4A), and pandemic descendants isolated in later periods moved apart from the avian territory, showing time-series directional changes [
43].
Figure 5B similarly visualizes time series changes of H1N1/09 strains on Tetra; strains isolated in a specific time period are indicated in pink. The major portion of the strains isolated in April 2009 are located in the vicinity of avian and swine territories, but those isolated after 2009 are primarily located near the human seasonal flu territory and, thus, apart from the avian territory; for details, see Iwasaki
et al. [
43,
45]. These directional changes in the oligonucleotide composition for human pandemic strains have been confirmed by a numerical analysis of the multidimensional vectorial data [
45].
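The numerical analysis cited above is not described in this section. As a purely illustrative sketch of how a directional change could be quantified, the code below measures the Euclidean distance between the mean oligonucleotide-frequency vector of the strains from each isolation period and the mean vector of a reference group (for example, avian strains). The function names, the choice of Euclidean distance and the use of centroids are all assumptions made here, not the method of the cited work.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_by_period(strains_by_period, reference_vectors):
    """Distance from each period's centroid to the reference centroid.

    strains_by_period: dict mapping a period label to a list of
    frequency vectors (hypothetical layout, e.g. 256 values for Tetra).
    reference_vectors: list of frequency vectors of the reference group.
    """
    reference = centroid(reference_vectors)
    return {
        period: euclidean(centroid(vectors), reference)
        for period, vectors in strains_by_period.items()
    }
```

Under these assumptions, a distance from the avian centroid that grows steadily from earlier to later periods would correspond to the movement away from the avian territory seen on the map.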
Figure 5.
Chronological changes observed for seasonal human flu subtypes. (A) On Tetra listed in Figure 4A, seasonal human H1N1 and H3N2 strains isolated in different periods are separately marked in brown and blue, respectively; and (B) H1N1/09 strains that are isolated in different periods are separately marked in pink.
Taken together, BLSOM can visualize any category of strain that is of interest to experimental and medical research groups, e.g., strains isolated in a specific time period and/or in a specific geographical area, along with the characteristics of their oligonucleotide composition. This ability of BLSOM is particularly useful for efficiently surveying the vast number of isolated strains for potentially hazardous ones.
5.4. A Strategy for Finding Potentially Hazardous Strains
Another important issue for bioinformatics studies of influenza viruses is the search for strains that will become hazardous in the near future. Our previous papers have proposed a novel strategy for finding potentially hazardous avian strains [
43,
45]. In the present review, we introduce a similar, but slightly modified strategy. Specifically, we suppose here that the seasonal human H2N2, H3N2 and H1N1/09 strains isolated at a very early stage (i.e., those isolated in 1957, 1968 and April 2009, respectively) may have characteristics that potentially prepare them for efficient human-to-human transmission, i.e., a significant level of human-type preference. We thus focus on specific tetranucleotides whose occurrences in these specified human strains are distinct from those in most avian strains (
Table 1).
Table 1.
Diagnostic tetranucleotides preferred in human strains isolated in the very early period.
Class | Diagnostic Tetranucleotides |
---|---|
High | AAGU, ACUA, ACUU, AUGA, AUUA, AUUU, CUAA, CUUU, GCCG, GGCC, UAAG, UAUC, UCAU, UUAA, UUAU |
Low | ACCG, ACGC, AUCU, CUCA, CUGA, GAGC, GAGG, GAUC, GCAG, GCUG, GGAG |
Table 2 lists avian strains with a high level of the abovementioned human-type preference. Two H1N1 strains isolated from turkeys in Ontario in 2009 have these preferences for 18 and 17 tetranucleotides out of a total of 26 diagnostic tetranucleotides listed in
Table 1; these are designated here as scores of 18 and 17 points, respectively. Phylogenetic tree analysis has shown these avian strains to be H1N1/09 strains transmitted from humans to birds [
52]. Two H1N1 strains isolated from turkeys in the U.S. scored 17 points and are located within a swine territory near the border with the human territory, indicating swine-to-bird transmission. These findings support the suitability of the choice of the diagnostic tetranucleotides. Importantly, an H4N2 strain isolated from a Pekin duck in California also has a very high score, comparable to those of the abovementioned human-to-bird transmitted strains, though the H4N2 subtype has not caused epidemics among humans. If avian strains with characteristics similar to this Pekin duck strain invade humans, they may achieve human-to-human transmission with a significant probability and, therefore, have a high risk potential. H4N8, H3N8, H5N2 and H6N2 strains isolated from various birds in various places also have relatively high scores, although these subtypes also have not caused epidemics in human populations (listed in bold in
Table 2). In contrast, all known human H5N1 strains, which have not caused epidemics in human populations, have low scores (≤5); an avian H5N1 strain isolated from a chicken in West Bengal has a higher score (seven points) than all known human H5N1 strains, indicating that this avian H5N1 strain may have a higher possibility of human-to-human transmission than the known human H5N1 strains. By combining mutually independent bioinformatics methods, we can develop a strategy for efficient and systematic surveillance of potentially hazardous strains that have a high probability of causing new pandemics among humans in the near future.
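The point score used in this section can be illustrated with a short sketch. The tetranucleotide lists are those of Table 1; the scoring rule itself is an assumption, since the exact criterion is given only in the cited paper. Here a strain earns one point for each "High" tetranucleotide whose occurrence exceeds a supplied cutoff and one point for each "Low" tetranucleotide whose occurrence falls below a supplied cutoff. The function name and the cutoff dictionaries are hypothetical.

```python
# Diagnostic tetranucleotides from Table 1.
HIGH = ["AAGU", "ACUA", "ACUU", "AUGA", "AUUA",
        "AUUU", "CUAA", "CUUU", "GCCG", "GGCC",
        "UAAG", "UAUC", "UCAU", "UUAA", "UUAU"]
LOW = ["ACCG", "ACGC", "AUCU", "CUCA", "CUGA",
       "GAGC", "GAGG", "GAUC", "GCAG", "GCUG", "GGAG"]


def human_type_score(freqs, high_cutoff, low_cutoff):
    """Count diagnostic tetranucleotides showing a human-type preference.

    freqs: dict mapping tetranucleotide to its occurrence in one strain.
    high_cutoff / low_cutoff: dicts giving, per tetranucleotide, the value
    a strain must exceed (High class) or fall below (Low class).
    Assumption: these cutoffs would be derived from the avian distribution.
    """
    score = 0
    for tetra in HIGH:
        if freqs[tetra] > high_cutoff[tetra]:
            score += 1
    for tetra in LOW:
        if freqs[tetra] < low_cutoff[tetra]:
            score += 1
    return score
```

The maximum possible score is 26, the total number of diagnostic tetranucleotides; ranking avian strains by this score produces a list of the kind shown in Table 2.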
Table 2.
Potentially hazardous avian strains.
Point | Subtype | Year | Strain |
---|---|---|---|
18 | H1N1 | 2009 | A/turkey/Ontario/FAV110 |
18 | H1N1 | 1988 | A/turkey/NC/19762 |
18 | H1N1 | 1988 | A/turkey/NC/17026 |
17 | H1N1 | 2009 | A/turkey/Ontario/FAV114-17 |
17 | H1N1 | 1992 | A/turkey/IA/21089-3 |
17 | H1N1 | 1991 | A/chicken/PA/35154 |
15 | H1N1 | 1980 | A/turkey/Kansas/4880 |
14 | H4N2 | 2006 | A/Pekin duck/California/P30 |
11 | H3N8 | 1987 | A/duck/LA/17G |
11 | H3N2 | 2011 | A/turkey/Ontario/FAV-9 |
11 | H1N2 | 2001 | A/duck/NC/91347 |
10 | H3N2 | 2011 | A/turkey/Ontario/FAV-10 |
9 | H6N2 | 2004 | A/chicken/CA/S0403106 |
9 | H6N2 | 2002 | A/wild duck/Shantou/867 |
9 | H5N2 | 2012 | A/chicken/Taiwan/A1997 |
9 | H5N2 | 2002 | A/chicken/Guatemala/194573 |
We first calculated the average occurrence of each tetranucleotide in human H2N2, H3N2 and H1N1/09 strains isolated in the very early stage of their pandemics and then selected the tetranucleotides, for which each of the abovementioned three averages was higher or lower than the occurrences of more than 80% of the avian strains. This selection was based on the assumption that a limited portion of avian strains may have a human-type preference for some tetranucleotides. Fifteen higher and 11 lower cases of the diagnostic tetranucleotides were thus selected; for details, see Iwasaki
et al. [
43].
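The selection rule described above can be sketched as follows. This is one reading of the criterion, written under stated assumptions: a tetranucleotide is placed in the "High" class when each of the three early-pandemic averages exceeds the occurrence in more than 80% of the avian strains, and in the "Low" class when each average is below the occurrence in more than 80% of the avian strains. The function names and input layout are hypothetical.

```python
def fraction_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


def fraction_above(values, threshold):
    """Fraction of values strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


def select_diagnostic(human_means, avian_freqs, fraction=0.8):
    """Select tetranucleotides with a human-type preference.

    human_means: list of dicts, one per early-pandemic group (H2N2, H3N2,
    H1N1/09), mapping tetranucleotide to its average occurrence.
    avian_freqs: list of dicts, one per avian strain, with the same keys.
    Returns (high, low) lists of selected tetranucleotides.
    """
    high, low = [], []
    for tetra in human_means[0]:
        avian_values = [strain[tetra] for strain in avian_freqs]
        if all(fraction_below(avian_values, g[tetra]) > fraction
               for g in human_means):
            high.append(tetra)
        elif all(fraction_above(avian_values, g[tetra]) > fraction
                 for g in human_means):
            low.append(tetra)
    return high, low
```

Requiring all three averages to pass the test, rather than their pooled mean, keeps only tetranucleotides whose preference is shared by every early-pandemic group.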
The avian strains that are thought to have been transferred directly from humans or swine are indicated in italics. Strains of other avian subtypes that have not caused pandemics among humans and, thus, are suspected of being potentially hazardous are indicated in bold.