Morphologic and Genetic Characterization of Ilheus Virus, a Potential Emergent Flavivirus in the Americas

Ilheus virus (ILHV) is a mosquito-borne flavivirus circulating throughout Central and South America and the Caribbean. It has been detected in several mosquito genera, including Aedes and Culex, and birds are thought to be its primary amplifying and reservoir hosts. Here, we describe the genomic and morphologic characterization of ten ILHV strains. Our analyses revealed high conservation of both the 5′ and 3′ untranslated regions but considerable divergence within the open reading frame. We also showed that ILHV displays a typical flavivirus structural and genomic organization. Our work lays the foundation for subsequent ILHV studies to better understand its transmission cycles, pathogenicity, and emergence potential.


Extraction of Viral RNA
Viral RNA was extracted from 140 µL of cell culture supernatant using the QIAamp RNA mini kit (Qiagen, Hilden, Germany) and eluted in 50 µL of RNase/DNase- and protease-free water (Ambion, Austin, TX, USA).

Next-Generation Sequencing
Next-generation sequencing was performed on virus stocks with the passage histories described in Table 1. Viral RNA (~0.9 µg) was fragmented by incubation at 94 °C for eight minutes in 19.5 µL of fragmentation buffer (Illumina, San Diego, CA, USA). A sequencing library was prepared from the sample RNA using a TruSeq RNA v2 kit (Illumina) following the manufacturer's protocol. Samples were sequenced on a HiSeq 1500 (Illumina) using the 2 × 50 paired-end protocol, except for ZCM 228 and BeH 7445, which were sequenced on a NextSeq 550 (Illumina) in the 2 × 75 paired-end format. Reads in FASTQ format were quality-filtered, and adapter sequences were removed, using Trimmomatic [47]. In all samples, host reads were filtered out before de novo assembly. The de novo assembler ABySS [48] was used to assemble the reads into contigs using several different sets of reads and k values from 20 to 40. The longest contigs were selected, and reads were mapped back to them using bowtie2 [49] and visualized with the Integrative Genomics Viewer [50] to verify that the assembled contigs were correct. A total of 17.4, 12.7, 13.0, 13.7, 9.9, 9.2, 13.3, 16.
Viruses 2023, 15, 195
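The quality-filtering step can be illustrated with a simplified sliding-window trim. This is a minimal sketch in the spirit of Trimmomatic's SLIDINGWINDOW step, not its actual implementation: it assumes Phred+33 quality encoding, and the window size (4) and mean-quality threshold (15) are hypothetical defaults, not the settings used in this study.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        threshold: float = 15.0) -> tuple[str, str]:
    """Truncate a read at the start of the first window whose mean Phred
    score falls below `threshold` (simplified; hypothetical defaults)."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality)
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < threshold:
            return seq[:start], quality[:start]
    return seq, quality  # no low-quality window found


if __name__ == "__main__":
    # Toy read: six high-quality bases ('I' = Q40) followed by four poor ones ('#' = Q2).
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

Cutting at the start of the failing window is deliberately conservative; real trimmers refine the cut point within the window.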

Rapid Amplification of cDNA Ends (RACE)
The genomic termini of the Original, 331, FSE 0800, PE 20545, and ZCM 228 ILHV strains were determined using the FirstChoice RLM-RACE kit (Invitrogen, Vilnius, Lithuania). RACE was performed on the ZCM 228 strain at the passage history described in Table 1, and on the Original, 331, FSE 0800, and PE 20545 strains at the passage histories described in Table 1 plus one additional passage in Vero cells. Self-ligated RNAs were amplified using GoTaq (Promega, Madison, WI, USA) with the virus-specific primers Ilheus_3_Outer (5′-AAGTGTGGAACAGGGTCTGG-3′) in the forward orientation and Ilheus_5_Outer (5′-CTCTCCGTGGTGAGGAATGT-3′) in the reverse orientation. The ~1400 nt amplicons were purified using the Zymoclean Gel DNA Recovery Kit (Zymo Research, Irvine, CA, USA) prior to sequencing with the Ilheus_3_Inner (5′-CTGGGTTACCAAAGCCGTTA-3′) primer in the forward orientation and the Ilheus_5_Inner (5′-GCATGGTGGTCAGTTCCTTT-3′) primer in the reverse orientation. Sanger dideoxy sequencing was performed using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Austin, TX, USA) on a 3500 Genetic Analyzer (Applied Biosystems, Austin, TX, USA).
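As a quick sanity check on the RACE primers listed above, the following sketch computes GC content, a rough melting temperature by the Wallace rule (2 °C per A/T plus 4 °C per G/C), and the reverse complement. The Wallace rule is an assumption chosen for simplicity; it is only a first approximation for short oligonucleotides and is not necessarily how these primers were designed.

```python
# Primer sequences copied from the RACE methods text.
PRIMERS = {
    "Ilheus_3_Outer": "AAGTGTGGAACAGGGTCTGG",
    "Ilheus_5_Outer": "CTCTCCGTGGTGAGGAATGT",
    "Ilheus_3_Inner": "CTGGGTTACCAAAGCCGTTA",
    "Ilheus_5_Inner": "GCATGGTGGTCAGTTCCTTT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse (5'->3' on the opposite strand)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough Tm in deg C: 2*(A+T) + 4*(G+C). Not a nearest-neighbour estimate."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: GC={gc_fraction(seq):.2f}, Tm~{wallace_tm(seq)} C, "
              f"revcomp={reverse_complement(seq)}")
```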

Transmission Electron Microscopy
For ultrastructural analysis, Vero and C6/36 cells infected for 3 days with the Original, H 2944, and ZPC 804 strains (passage histories in Table 1) were fixed for at least 1 h in a mixture of 2.5% formaldehyde (prepared from paraformaldehyde powder) and 0.1% glutaraldehyde in 0.05 M cacodylate buffer (pH 7.3) supplemented with 0.03% picric acid and 0.03% CaCl2. The monolayers were washed in 0.1 M cacodylate buffer, and the cells were scraped off and processed further as pellets. The pellets were postfixed in 1% OsO4 in 0.1 M cacodylate buffer (pH 7.3) for 1 h, washed with distilled water, and stained en bloc with 2% aqueous uranyl acetate for 20 min at 60 °C. The pellets were then dehydrated in ethanol, processed through propylene oxide, and embedded in Poly/Bed 812 (Polysciences, Warrington, PA, USA). Ultrathin sections were cut on a Leica EM UC7 ultramicrotome (Leica Microsystems, Buffalo Grove, IL, USA), stained with lead citrate, and examined with a JEM-1400 transmission electron microscope (JEOL USA, Inc., Peabody, MA, USA) at 80 kV.
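The fixative above is specified by final concentrations; the amounts to combine follow from C1V1 = C2V2 (and, for a % w/v solution, grams per 100 mL). In this minimal sketch the 10 mL batch size and the stock concentrations (25% glutaraldehyde, 0.2 M cacodylate) are hypothetical and not taken from the protocol, and the paraformaldehyde mass assumes complete depolymerization to formaldehyde.

```python
def stock_volume(final_conc: float, stock_conc: float, final_volume: float) -> float:
    """Solve C1 * V1 = C2 * V2 for V1, the volume of stock to add.
    Both concentrations must use the same unit (%, M, ...)."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume / stock_conc


def grams_for_percent_wv(percent: float, volume_ml: float) -> float:
    """Mass (g) of solute for a % (w/v) solution, i.e. grams per 100 mL."""
    return percent * volume_ml / 100.0


if __name__ == "__main__":
    batch_ml = 10.0  # hypothetical batch size
    # 2.5% formaldehyde, assuming 1 g paraformaldehyde yields 1 g formaldehyde
    print("paraformaldehyde (g):", grams_for_percent_wv(2.5, batch_ml))
    # 0.1% glutaraldehyde from a hypothetical 25% stock
    print("25% glutaraldehyde (mL):", stock_volume(0.1, 25.0, batch_ml))
    # 0.05 M cacodylate from a hypothetical 0.2 M stock
    print("0.2 M cacodylate (mL):", stock_volume(0.05, 0.2, batch_ml))
```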

Genome Annotation
RNA structure prediction was performed in mFold [51], and manual annotations were added with Biorender.com. The 5′ UTR structures were trimmed to include the entire 5′ UTR and the first 17 nucleotides of the capsid gene to complete the stem-loop B (SLB) structure. The 3′ UTR structures were trimmed to begin immediately after the TAA stop codon, and mFold was restricted to a maximum distance of 80 nucleotides between paired bases. Repeat sequences in the 3′ UTR were identified using Unipro UGENE v41.0 [52]. Transmembrane domains (TMDs) were predicted on the basis of alignment with TMDs identified in other flaviviruses [53][54][55][56][57][58][59][60]. Protein cleavage sites were identified by alignment with the previously deposited and annotated ILHV Original GenBank sequence NC_009028.2 [61] and by analysis with SignalP 4.1 [62]. N-linked glycosylation sites were predicted with NetNGlyc 1.0 [63]; sites with a jury agreement of 9/9 are reported unless otherwise noted. Phosphorylation sites were predicted with NetPhos 3.1 [64,65]; sites with a score of ≥0.8 are reported. Cysteine bridge formation was predicted on the basis of alignment with cysteine bridges that have been experimentally verified in other flaviviruses. Glycosylation, phosphorylation, and cysteine bridge predictions were performed on the Original strain of ILHV.
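mFold predicts structures by free-energy minimization, which is far too involved to reproduce here. To show only how an 80-nucleotide maximum pairing distance restricts the allowed base pairs, the sketch below swaps in a much simpler model, Nussinov base-pair maximization, with the distance cap as a parameter. The minimum hairpin loop of 3 nt and the G-U wobble rule are assumptions of this toy model, and its output is not comparable to mFold free energies.

```python
def can_pair(a: str, b: str) -> bool:
    """Watson-Crick (G-C, A-U) and G-U wobble pairs."""
    return {a, b} in ({"G", "C"}, {"A", "U"}, {"G", "U"})


def max_pairs(seq: str, max_span: int = 80, min_loop: int = 3) -> int:
    """Nussinov base-pair maximization. A pair (i, j) is allowed only if
    j - i > min_loop (room for a hairpin loop) and j - i <= max_span."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for seq[i..j]; entries with j <= i stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if min_loop < span <= max_span and can_pair(seq[i], seq[j]):
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent halves
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    hairpin = "GGGAAACCC"
    print(max_pairs(hairpin))              # all three G-C pairs allowed
    print(max_pairs(hairpin, max_span=4))  # only the innermost pair fits
```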

Phylogenetic Analysis
The evolutionary history was inferred using the maximum likelihood method and the general time reversible (GTR) model [72]. The tree with the highest log likelihood (−29,698.16) is shown. Initial trees for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the maximum composite likelihood (MCL) approach and then selecting the topology with the superior log likelihood value. A discrete Gamma distribution with 5 categories (+G, parameter = 0.9858) was used to model evolutionary rate differences among sites, and the rate variation model allowed some sites to be evolutionarily invariable (+I, 57.03% of sites). The tree was drawn to scale, with branch lengths measured in substitutions per site. The analysis involved 14 nucleotide sequences (13 ILHV sequences and Rocio virus (ROCV) as the outgroup used to root the tree). The included codon positions were 1st + 2nd + 3rd + noncoding. There were a total of 10,278 positions in the final dataset. Evolutionary analyses were conducted in MEGA11 [73,74].
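MEGA estimates the starting distance matrix with the maximum composite likelihood approach, which is not reproduced here. As a simpler illustration of what a model-corrected pairwise distance is, the sketch computes the Jukes-Cantor (JC69) distance, the simplest special case of the GTR family (equal base frequencies and exchange rates), from an uncorrected p-distance. Ignoring gapped columns is an assumption of this sketch.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites over aligned positions, skipping gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor (1969) corrected distance in substitutions per site:
    d = -3/4 * ln(1 - 4/3 * p)."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    # Toy aligned sequences differing at 1 of 10 sites.
    print(p_distance("ACGTACGTAC", "ACGTACGTAA"), jc69_distance("ACGTACGTAC", "ACGTACGTAA"))
```

The corrected distance always exceeds the raw proportion of differences, because it accounts for multiple substitutions at the same site.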

Virus Morphology
In ultrathin sections of infected Vero and C6/36 cells, several structures related to the flavivirus replication cycle were consistently observed (Figure 2A-H): convoluted membranes (asterisks in Figure 2A-C,E); smooth membrane structures (SMS) within endoplasmic reticulum cisterns (solid triangles in Figure 2A-C,E,H), which are considered sites of viral RNA processing; and immature virus particles within the cisterns of the granular endoplasmic reticulum (solid arrows in Figure 2A-H). Virus particles were ~45 nm in diameter, and all of the described ILHV structures were typical of the genus Flavivirus.

Phylogenetic Analysis
Here, we present the first phylogenetic analysis of ILHV strains based on 13 complete open reading frame sequences, 10 of which were determined by NGS using the resources of the WRCEVA. The strains were isolated between 1944 and 2017, mostly in Brazil, Venezuela, Ecuador, and Peru, and represent diverse localities throughout South America; ROCV was used as an outgroup to root the ILHV tree (Figure 3). Although approximately 30 ILHV nucleotide sequences are available in GenBank, most are partial NS5 gene sequences, which limits comprehensive analyses of ILHV phylogeography and spatiotemporal transmission dynamics. Despite the limited number of complete ORF sequences available, the analysis revealed considerable genetic diversity, consistent with the ongoing divergence of ILHV across its diverse geographic distribution (Figure 3).

Genomic Characterization
The ILHV genome is a single-stranded, positive-sense RNA of approximately 10.8 kilobases (kb). A single open reading frame (ORF) of 10,278 nucleotides (Table 2) is flanked by untranslated regions (UTRs) at the 5′ and 3′ ends. The 5′ UTR is 92-93 nt long and bears a type I 5′ cap, while the 387-388 nt 3′ UTR lacks the classical polyadenylation site [75,76].
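The reported lengths can be cross-checked with simple arithmetic. This sketch assumes that the 10,278-nt ORF count includes the stop codon; under that assumption the ORF encodes 10,278 / 3 − 1 = 3425 amino acids, matching the polyprotein length reported for the ORF, and the UTRs plus ORF sum to 10,757-10,759 nt, in line with the ~10.8 kb genome. The bounds combine the shortest and longest UTRs and need not correspond to any single strain.

```python
# Lengths (nt) reported in the text; UTR ranges are (min, max) across strains.
UTR5_NT = (92, 93)
ORF_NT = 10_278
UTR3_NT = (387, 388)


def genome_length_bounds() -> tuple[int, int]:
    """Lower and upper bounds on genome length: 5' UTR + ORF + 3' UTR."""
    return (UTR5_NT[0] + ORF_NT + UTR3_NT[0],
            UTR5_NT[1] + ORF_NT + UTR3_NT[1])


def encoded_residues(orf_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of `orf_nt` nucleotides."""
    if orf_nt % 3:
        raise ValueError("ORF length is not a whole number of codons")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons  # the stop codon adds no residue


if __name__ == "__main__":
    print(genome_length_bounds())    # roughly 10.8 kb
    print(encoded_residues(ORF_NT))  # compare with the 3425-aa polyprotein
```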

5′ and 3′ UTRs
The 5′ UTR is 92-93 nucleotides long and highly conserved, with only two points of variation between strains. The two oldest strains, Original and 331, have two adenines beginning at nucleotide 18, compared with the three adenines observed in all other strains (Table 2). The three Venezuelan strains have a cytosine at nucleotide 51, whereas all other ILHV strains have a uracil at the equivalent position. Both changes are located in loop structures and have minimal impact on the free energy of the predicted 5′ UTR structure (Figure 4).
The 3′ UTR of ILHV is 387-388 nucleotides long and was predicted to contain the SL-II, DB1, DB2, and 3′ SL structural elements (Figure 5) [77]. The ILHV 3′ UTR also contains the conserved sequence 1 (CS1), conserved sequence 2 (CS2), conserved sequence 3 (CS3), and repeated CS2 (RCS2) elements of the flavivirus 3′ UTR, but lacks the repeated CS3 (RCS3) element characteristic of the JEV group and the YF-R1, YF-R2, and YF-R3 elements characteristic of YFV [78,79]. Of the 387-388 nucleotides, only eight vary between strains; here and in all remaining descriptions, variable nucleotides are numbered according to their positions in the Original genome. Two variable positions are unique to the oldest strains, 331 and Original: C10,503U and A10,644G. One variable position is unique to the Peruvian isolates H 2944, PE 20545, and PE 163615: A10,596G. Three variable positions are unique to the Venezuelan isolates ZPC 659, ZPC 804, and ZCM 228: C10,372U, A10,383del, and A10,577G. One variable position, C10,564U, is shared only by the three Venezuelan isolates and the Ecuadorian isolate FSE 0800, and one, A10,647G, is unique to strain ZCM 228. Strain BeH 7455 matches the consensus at all eight variable positions of the 3′ UTR. The predicted initial free energy ranged from −136.0 kcal/mol for the Original and 331 strains to −145.3 kcal/mol for the Peruvian isolates.
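Finding variable positions like those listed above amounts to scanning alignment columns for disagreement and then translating columns into coordinates of a reference genome. The minimal sketch below does both; the strain names and toy sequences are hypothetical, '-' is assumed to mark a deletion, and ties in the consensus are broken by first occurrence.

```python
from typing import Optional


def variable_sites(alignment: dict[str, str]) -> dict[int, dict[str, str]]:
    """Return {1-based column: {strain: base}} for columns where strains differ."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    (length,) = lengths
    sites = {}
    for col in range(length):
        column = {name: seq[col] for name, seq in alignment.items()}
        if len(set(column.values())) > 1:
            sites[col + 1] = column
    return sites


def consensus_base(column: dict[str, str]) -> str:
    """Most frequent base in a column (first one wins on ties)."""
    bases = list(column.values())
    return max(bases, key=bases.count)


def reference_coordinate(ref_aligned: str, column: int) -> Optional[int]:
    """Map a 1-based alignment column to the ungapped reference coordinate,
    or None if the reference has a gap in that column."""
    if ref_aligned[column - 1] == "-":
        return None
    return len(ref_aligned[:column].replace("-", ""))


if __name__ == "__main__":
    toy = {"strain_A": "ACGUAA", "strain_B": "ACGUAA", "strain_C": "ACCU-A"}
    for col, bases in variable_sites(toy).items():
        print(col, reference_coordinate(toy["strain_A"], col), consensus_base(bases), bases)
```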

Open Reading Frame
The ILHV ORF encodes a polyprotein of 3425 amino acids, with 99.3-100% sequence identity among strains (Table 2, Supplementary Figure S1). The polyprotein comprises the three structural proteins (capsid (C), pre-membrane/mature membrane (prM/M), and envelope (E)) toward its amino terminus and the seven nonstructural proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5) toward its carboxy terminus (Figure 6). As in previously characterized flaviviruses, cleavage was predicted to be achieved by a combination of host and viral proteases [80].
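The individual protein lengths given in the following sections can be checked against the 3425-aa polyprotein. The prM length is not stated in this text, so the sketch treats it as the remainder; this relies on the assumption that the proteins tile the polyprotein in the usual flavivirus order with no gaps or overlaps (C including its anchor, NS4A including 2K). Under that assumption the remainder is 167 aa, and the derived boundaries are illustrative rather than reported values.

```python
# Lengths (aa) stated in the protein sections; NS4A includes 2K.
# prM is omitted because its length is not stated in this text.
STATED_LENGTHS = {"C": 119, "E": 501, "NS1": 353, "NS2A": 227, "NS2B": 131,
                  "NS3": 618, "NS4A": 149, "NS4B": 255, "NS5": 905}
POLYPROTEIN_AA = 3425
ORDER = ["C", "prM", "E", "NS1", "NS2A", "NS2B", "NS3", "NS4A", "NS4B", "NS5"]


def unaccounted_residues(lengths: dict[str, int], total: int) -> int:
    """Residues of the polyprotein not covered by the given proteins."""
    return total - sum(lengths.values())


def boundaries(lengths: dict[str, int]) -> dict[str, tuple[int, int]]:
    """1-based (start, end) of each protein, assuming the proteins tile
    the polyprotein in ORDER with no gaps or overlaps."""
    start, out = 1, {}
    for name in ORDER:
        end = start + lengths[name] - 1
        out[name] = (start, end)
        start = end + 1
    return out


if __name__ == "__main__":
    implied_prm = unaccounted_residues(STATED_LENGTHS, POLYPROTEIN_AA)
    print("length implied for prM:", implied_prm)
    for name, (s, e) in boundaries({**STATED_LENGTHS, "prM": implied_prm}).items():
        print(f"{name}: {s}-{e}")
```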

Capsid
The ILHV capsid (C) protein was found to be 119 amino acids long: the 102 amino-terminal residues form the virion C protein, and the 17 carboxy-terminal residues form a transmembrane domain (TMD) that anchors C to the membrane. The C protein contains five sites that vary between ILHV strains (Supplementary Figure S1). Residue 13 is a threonine in all strains but ZPC 804, in which it is an alanine, and residue 33 is a leucine in all strains but Original and 331, in which it is a phenylalanine. The remaining variable residues (106, 114, and 116) are located within the anchor portion of C. Interestingly, residue N72 of the ILHV C was predicted to be glycosylated. Glycosylation of the C protein has not been reported for other flaviviruses; however, a comparable in silico prediction hints at possible N72 glycosylation in ROCV (Supplementary Figure S2), albeit with lower confidence. Seven serine residues (2, 22, 24, 38, 70, 93, and 103) and one threonine (74) were predicted to be phosphorylated. All predicted phosphorylation sites except S103 are located in the cytoplasmic region of C; S103 lies immediately carboxy-terminal to the predicted NS3 cleavage site and is the first residue of the anchor portion of C.
[…] (Supplementary Figure S1). The N15 residue of ILHV prM was predicted in silico to be glycosylated. This residue is known to be glycosylated in WNV and JEV, where it has been associated with receptor usage, tropism, virion assembly, and pathogenesis [81][82][83]. Other flaviviruses, such as DENV, YFV, and Zika virus (ZIKV), are also known to have glycosylated prM proteins [84][85][86]. Three serine residues (5, 27, and 102), four threonine residues (49, 109, 114, and 117), and three tyrosine residues (51, 75, and 134) were predicted to be phosphorylated.
Half of the predicted phosphorylation sites (S102, T109, T114, T117, and Y134) are located within the M portion of prM; of these, all but Y134 are located in the predicted ectodomain of M. Three disulfide bridges have been experimentally verified in the crystal structure of DENV-2 [87]; all six cysteines are conserved across the nine flaviviruses considered here and were predicted to form disulfide bridges between C34-C69, C45-C81, and C53-C67 of ILHV (Supplementary Figure S2).
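The reporting rule from the methods (NetPhos score ≥0.8) combined with the capsid boundaries described above (virion C residues 1-102, anchor residues 103-119) can be expressed as a small filter. This sketch does not parse real NetPhos output: the `Site` record and the scores in the example are hypothetical and serve only to show the thresholding and region labeling.

```python
from typing import NamedTuple


class Site(NamedTuple):
    residue: str   # one-letter code, e.g. "S"
    position: int  # 1-based position within the capsid protein
    score: float   # predictor score in [0, 1] (hypothetical values below)


VIRION_C_END = 102  # residues 1-102 form the virion C protein
ANCHOR_END = 119    # residues 103-119 form the membrane anchor


def report_sites(sites: list[Site], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Keep sites scoring >= threshold and label them by capsid region."""
    kept = []
    for s in sites:
        if s.score < threshold:
            continue
        if s.position <= VIRION_C_END:
            region = "virion C"
        elif s.position <= ANCHOR_END:
            region = "anchor"
        else:
            raise ValueError(f"position {s.position} is outside the capsid")
        kept.append((f"{s.residue}{s.position}", region))
    return kept


if __name__ == "__main__":
    example = [Site("S", 2, 0.95), Site("S", 50, 0.40), Site("S", 103, 0.81)]
    print(report_sites(example))
```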

Envelope
The ILHV envelope (E) protein was found to be 501 amino acids long and to contain six variable residues (Supplementary Figure S1). Three of these variants each occur in a single strain: residue 157 is a glycine in BeH 7455 and an alanine in all other strains, residue 306 is an arginine in ZCM 228 and a lysine in all other strains, and residue 388 is an arginine in strain 331 and a glutamine in all other strains. The other three vary by strain origin: residue 147 is an isoleucine in the two oldest strains (Original and 331) and a threonine in all other strains, residue 367 is a lysine in the three oldest strains (Original, 331, and BeH 7455) and an asparagine in all other strains, and residue 390 is a serine in the Peruvian strains (H 2944, PE 20545, and PE 163615) and an asparagine in all other strains. The N154 residue of the ILHV E was predicted in silico to be glycosylated. This glycosylation site is widespread among flaviviruses but is not universal at either the genus or species level; it has been shown to modulate neuroinvasion, receptor binding, particle assembly, mosquito midgut invasion, and immunogenicity [82][83][88][89][90][91]. Ten serine residues (68, 69, 95, 232, 238, 364, 365, 368, 402, and 482), seven threonine residues (55, 115, 126, 251, 314, 318, and 350), and three tyrosine residues (96, 176, and 329) were predicted to be phosphorylated. Six disulfide bridges have been experimentally verified in WNV [92] and DENV-2 [93]; all twelve cysteines are conserved across the nine flaviviruses considered here and were predicted to form disulfide bridges between C3-C80, C60-C121, C74-C105, C92-C116, C190-C288, and C305-C336 of ILHV (Supplementary Figure S2).

NS1
NS1, a nonstructural protein often linked to RNA replication and immune modulation, exists intracellularly as monomers and dimers and is secreted as a hexamer [94,95]. The ILHV NS1 was found to be 353 amino acids long and to contain six variable residues (Supplementary Figure S1).
Three of the variable residues are found exclusively in Venezuelan strains: residue 54 is an isoleucine in ZCM 228 and a valine in all other strains, residue 146 is a leucine in ZPC 659 and ZPC 804 and a serine in all other strains, and residue 245 is a threonine in ZCM 228, ZPC 659, and ZPC 804 and an isoleucine in all other strains. The Peruvian strains (H 2944, PE 20545, and PE 163615) contain an arginine at residue 261 where all other strains contain a lysine. The two oldest strains (331 and Original) contain a glutamic acid at residue 328 where all other strains contain an aspartic acid. Strain BeH 7455 contains a serine at residue 293 where all other strains contain a glycine. No N-linked glycosylation sites were predicted with high confidence in silico; the N207 site, which is highly conserved among flaviviruses and has been linked to pathogenicity, was predicted in silico not to be glycosylated [96,97]. Eight serine residues (49, 140, 178, 204, 209, 252, 298, and 305) and four threonine residues (38, 105, 165, and 303) were predicted to be phosphorylated. Multiple lines of evidence have confirmed the six disulfide bonds of flavivirus NS1. Crystallography has been used to identify the cysteine pairs of WNV and DENV-2 as 1-2, 3-4, 5-6, 7-12, 8-9, and 10-11 [98]. These results were in slight contrast to the previously reported DENV-2 arrangement of 1-2, 3-4, 5-6, 7-12, 8-10, and 9-11, the last two pairs being determined with tandem mass spectrometry [99]. This experimental discrepancy was likely caused by the difficulty in resolving the ninth, tenth, and eleventh cysteines due to their proximity in the primary structure of NS1 (CCKNC in ILHV and ROCV and CCRSC in other mosquito-borne flaviviruses [98]). 
All twelve cysteines were found to be conserved across all nine flaviviruses considered here and were predicted to form disulfide bridges between C4-C15, C55-C143, C179-C223, C281-C330, and either C229-C313 and C314-C317 or C229-C314 and C313-C317 of ILHV (Supplementary Figure S2).

NS2A
The flavivirus NS2A protein binds the 3′ UTR during RNA replication and is necessary for particle assembly [59,100,101]. In ILHV, the NS2A protein was found to be 227 amino acids long and to contain three variable residues: residue 24 is an arginine in the three Brazilian strains (331, Original, and BeH 7455) and a lysine in all other strains, residue 72 is an isoleucine in ZCM 228 and a valine in all other strains, and residue 204 is an alanine in the two oldest strains (331 and Original) and a valine in all other strains (Supplementary Figure S1). No asparagine residues were predicted in silico to be glycosylated with high confidence. Phosphorylation was predicted at two serine residues (70 and 189) and two threonine residues (29 and 92). There are no conserved cysteine residues in the NS2A protein of flaviviruses (Supplementary Figure S2), and disulfide bonds within NS2A have not been reported or experimentally verified in other flaviviruses.

NS2B
NS2B is an essential co-factor for the NS3 protease [102,103]. The ILHV NS2B protein was found to be 131 amino acids long, with only a single point of variation: residue 60 is a valine in the three Venezuelan strains (ZCM 228, ZPC 659, and ZPC 804) and an isoleucine in all other strains (Supplementary Figure S1). No glycosylation sites were predicted in the ILHV NS2B, nor have any been reported for other flaviviruses. Three serine residues (61, 72, and 81) were predicted to be phosphorylated, as was the threonine at residue 125. There are no conserved cysteine residues in the NS2B protein of flaviviruses (Supplementary Figure S2), and disulfide bonds within NS2B have not been reported or experimentally verified in other flaviviruses.

NS3
The flavivirus NS3 protein is critical to viral replication, serving both as a serine protease that cleaves the viral polyprotein and as an RNA helicase and RNA triphosphatase for genome replication [104][105][106]. In ILHV, NS3 was found to be 618 amino acids long and highly conserved: only strain ZPC 804 varies, at a single residue, possessing a valine at residue 30 where all other strains possess an isoleucine (Supplementary Figure S1). Residue N66 was predicted in silico to be glycosylated; however, the proline at residue 67 occupies the X position of the N-X-S/T N-linked glycosylation motif, which makes this prediction questionable, and glycosylation of NS3 has not been reported in other flaviviruses. Phosphorylation was predicted at nine serine residues (71, 253, 302, 390, 393, 426, 468, 547, and 609), ten threonine residues (175, 180, 190, 245, 267, 272, 303, 318, 377, and 479), and two tyrosine residues (472 and 555). The cysteines C262 and C563 of ILHV are fully conserved among the nine considered flaviviruses, and C375 is conserved among all but YFV (Supplementary Figure S2). However, no disulfide bonds were noted in the X-ray crystal structure of the ILHV NS3 helicase domain (residues 177-618) [107].
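The N-X-S/T reasoning above can be made concrete with a motif scan. NetNGlyc itself uses trained neural networks, so this sketch is not a substitute for it: it only finds canonical sequons (N, then any residue except proline, then S or T) and separately flags asparagines that would match except for a proline at X, as with N66/P67 in ILHV NS3. The example peptide is a toy sequence, not ILHV.

```python
import re

# N-X-S/T with X != P; the lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")
PROLINE_BLOCKED = re.compile(r"N(?=P[ST])")


def sequon_positions(protein: str) -> list[int]:
    """1-based positions of asparagines in canonical N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def proline_blocked(protein: str) -> list[int]:
    """1-based positions of asparagines in N-P-S/T, which is rarely glycosylated."""
    return [m.start() + 1 for m in PROLINE_BLOCKED.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "ANPTAANGS"  # hypothetical peptide
    print("sequons:", sequon_positions(toy))
    print("blocked by proline:", proline_blocked(toy))
```

A sequon is necessary but not sufficient for glycosylation, which is why motif hits are only candidates for experimental confirmation.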

NS4A and 2K
The flavivirus NS4A and 2K are membrane-bound proteins involved in membrane remodeling, the RNA replication complex, and NS3 protease activity [55][108][109][110]. The ILHV NS4A protein was found to be 149 amino acids long, of which the 23 carboxy-terminal amino acids are cleaved to form the 2K protein. The mature NS4A protein varies at only a single position, residue 69, which is a threonine in strain 331 and an alanine in all other strains (Supplementary Figure S1). One additional residue, position 17 in 2K (position 143 in the immature NS4A protein), is a valine in the Peruvian strains (H 2944, PE 20545, and PE 163615) and a leucine in all other strains. Glycosylation was not predicted, nor has it been reported for other flaviviruses. Only a single phosphorylation site was predicted: a tyrosine at residue 15 of NS4A. There are no conserved cysteine residues in the NS4A or 2K proteins of flaviviruses (Supplementary Figure S2), and disulfide bonds have not been reported or experimentally verified in these proteins in other flaviviruses.
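The two numbering schemes used above (immature NS4A versus mature NS4A and 2K) are related by a fixed offset: the mature NS4A is 149 − 23 = 126 residues, so precursor position 143 is 2K position 143 − 126 = 17. A minimal sketch of that conversion, using only the lengths stated in the text:

```python
NS4A_PRECURSOR_LEN = 149  # immature NS4A, including 2K
TWO_K_LEN = 23            # carboxy-terminal residues cleaved to form 2K
MATURE_NS4A_LEN = NS4A_PRECURSOR_LEN - TWO_K_LEN  # 126


def precursor_to_region(pos: int) -> tuple[str, int]:
    """Convert a 1-based immature-NS4A position to (protein, position)."""
    if not 1 <= pos <= NS4A_PRECURSOR_LEN:
        raise ValueError("position outside the NS4A precursor")
    if pos <= MATURE_NS4A_LEN:
        return "NS4A", pos
    return "2K", pos - MATURE_NS4A_LEN


if __name__ == "__main__":
    print(precursor_to_region(69))   # variable residue of mature NS4A
    print(precursor_to_region(143))  # variable residue reported in 2K
```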

NS4B
Flavivirus NS4B is a component of the RNA replication complex and suppresses several host immune responses [111][112][113][114]. The ILHV NS4B protein was found to be 255 amino acids long with six variable residues (Supplementary Figure S1). Three of these variable residues are unique to the Venezuelan strains (ZCM 228, ZPC 659, and ZPC 804): residue 15 is an arginine in the Venezuelan strains and a lysine in the other strains, residue 22 is a histidine in the Venezuelan strains and an aspartic acid in the other strains, and residue 83 is an asparagine in the Venezuelan strains and a serine in the other strains. Two residues are unique to the Peruvian strains (H 2944, PE 20545, and PE 163615) and the single Ecuadorian strain (FSE 0800): residue 20 is a threonine in these four strains and a serine in the other strains, and residue 30 is a histidine in these four strains and a glutamine in the other strains. Residue 164 is an isoleucine in ZPC 804 and a threonine in all other strains. No glycosylation site was predicted with high confidence for ILHV NS4B: the N219 residue was identified as a potential glycosylation site with only weak confidence, and the equivalent position in DENV-2, although also predicted in silico to be glycosylated, was experimentally shown not to be glycosylated [114,115]. Phosphorylation was predicted at two serine residues (19 and 20) and two threonine residues (8 and 9). ILHV possesses three cysteine residues in its NS4B protein: C102, C182, and C227. Although none of these cysteines is universally conserved among the nine considered flaviviruses (Supplementary Figure S2), two cysteines at equivalent positions in DENV (C99 and C178) showed NMR chemical shifts suggestive of possible disulfide bond formation [116].

NS5
The NS5 protein of flaviviruses is critical to RNA replication, with both RNA-dependent RNA polymerase and RNA guanylyltransferase activity [117][118][119]. The ILHV NS5 protein was found to be 905 amino acids long and to vary at 11 residues (Supplementary Figure S1). The two oldest strains (331 and Original) were found to vary at four of these residues: residue 72 is an arginine in the oldest strains and a lysine in the other strains, residue 567 is a lysine in the oldest strains and a glutamine in the other strains, residue 619 is an alanine in the oldest strains and a valine in the other strains, and residue 886 is a cysteine in the oldest strains and a tyrosine in the other strains. Residue 843 was found to vary in the three oldest strains (331, Original, and BeH 7455), which possess an isoleucine where the other strains possess a leucine. Residue 424 is an isoleucine in the Peruvian strains (H 2944, PE 20545, and PE 163615) and an arginine in the other strains. Strain ZPC 659 was found to vary at two unique positions: residues 200 and 546 are both threonines in ZPC 659 and alanines in the other strains. Strain FSE 0800 was found to vary at two unique positions: residue 790 is an asparagine in FSE 0800 and an aspartic acid in the other strains, and residue 827 is a tyrosine in FSE 0800 and a histidine in the other strains. Residue 206 is an isoleucine in strain BeH 7445 and a valine in the other strains. Residue N213 may be glycosylated according to in silico analysis; however, the prediction had only weak confidence, and glycosylation has not been reported for NS5 in other flaviviruses. Eighteen serines (46, 128, 153, 214, 271, 320, 389, 500, 504, 524, 596, 640, 660, 665, 745, 748, 751, and 836), ten threonines (59, 93, 161, 396, 422, 544, 573, 695, 794, and 895), and one tyrosine (883) were predicted to be phosphorylated. The phosphorylation of NS5 has been associated with binding to NS3 in the replication complex [120].
The ILHV NS5 protein possesses nineteen cysteine residues, eight of which (C395, C448, C451, C669, C713, C732, C757, and C784) are conserved in all nine flaviviruses considered here (Supplementary Figure S2). However, no disulfide bridges have been observed in the crystal structures of full-length NS5 from DENV-2, DENV-3, JEV, or ZIKV, despite the presence of 10 to 16 cysteines in these proteins [121][122][123][124][125].
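The strain-by-strain comparisons reported above for NS4B and NS5 reduce to one operation on a multiple sequence alignment: find every column where not all strains carry the same residue, and record which strains carry which residue. The sketch below illustrates that bookkeeping. It assumes the protein sequences are already aligned to equal length (alignment itself is out of scope), and it treats gap characters as ordinary states. The function name `variable_sites` and the strain names and sequences in the example are hypothetical; they are not ILHV data and not the authors' pipeline.

```python
# Hypothetical helper: list variable columns in a pre-aligned set of protein sequences.
# Gap characters ("-") are treated like any other residue state.


def variable_sites(aligned: dict[str, str]) -> dict[int, dict[str, list[str]]]:
    """Map each variable 1-based column to {residue: [strains carrying it]}."""
    if not aligned:
        return {}
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites: dict[int, dict[str, list[str]]] = {}
    for i in range(length):
        groups: dict[str, list[str]] = {}
        for strain, seq in aligned.items():
            groups.setdefault(seq[i], []).append(strain)
        if len(groups) > 1:  # more than one residue state means the column is variable
            sites[i + 1] = groups
    return sites


if __name__ == "__main__":
    toy = {"strain_A": "MKRLV", "strain_B": "MKKLV", "strain_C": "MKKLI"}
    for pos, groups in variable_sites(toy).items():
        print(pos, groups)
```

Grouping strains by residue, instead of only flagging the column, makes patterns like "unique to the Venezuelan strains" or "shared by the oldest strains" directly readable from the output.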

Discussion
Despite first being characterized in 1946 [1,2], ILHV remains an understudied virus. Severe and fatal human disease associated with ILHV infection is sporadic [11][12][13][14][15]34,46,[126][127][128], and no epidemic or epizootic outbreaks of ILHV have been reported. However, the introduction of WNV to the United States in 1999 [66] and of ZIKV to Brazil in 2015 [129] demonstrated that flaviviruses can expand rapidly into previously naïve regions and that such circulation can be associated with significant disease and potentially emergent pathologies [130,131]. Given how frequently ILHV is found, by either virus isolation or detection of viral RNA, in mosquitoes, birds, humans, and other potential host species [3][4][5][6][7][8][9][10][11][12][13][14][15][16][17], as well as the prevalence of ILHV-reactive antibodies in serological surveys [1,4,10,…], the epidemic potential of ILHV warrants consideration. It is therefore prudent to characterize ILHV now, so that baseline knowledge and tools are in place should ILHV ever emerge as a more widespread threat to human health.
The ten new full-length genomes deposited in GenBank as part of this characterization substantially increase the ILHV sequence diversity available for analysis, and the temporal and geographic spread of these strains makes them a particularly valuable addition. The genomic organization, cellular organization, and structure of ILHV are typical of flaviviruses, with no major deviations. However, a few features are worth noting. The 3′ UTR of ILHV was predicted to possess the CS1, CS2, CS3, and RCS2 conserved sequence elements and to lack the RCS3, YF-R1, YF-R2, and YF-R3 elements. Having CS3 but lacking RCS3 places ILHV between the JEV group, which contains both elements, and the DENV group, which contains neither [132]. The potential glycosylation of residue N72 in the capsid protein is also noteworthy, given the similar prediction for ROCV and the absence of reported capsid glycosylation in other flaviviruses. It should be emphasized that our genomic characterization of ILHV relied on in silico predictions, which must be verified experimentally (e.g., by structural studies such as crystallography). Nevertheless, this work lays the foundation for future study of ILHV and highlights the unique role of collections such as the WRCEVA in characterizing understudied pathogens with the potential for emergence.