Structural and Functional Properties of the Capsid Protein of Dengue and Related Flavivirus

Dengue, West Nile and Zika, closely related viruses of the Flaviviridae family, are an increasing global threat, due to the expansion of their mosquito vectors. They share a very similar viral particle, with an outer lipid bilayer containing two viral proteins and, within it, the nucleocapsid core. This core is composed of the viral RNA complexed with multiple copies of the capsid protein, a crucial structural protein that mediates not only viral assembly but also encapsidation, by interacting with host lipid systems. The capsid is a homodimeric protein that contains a disordered N-terminal region, an intermediate flexible fold section and a very stable conserved fold region. Since a better understanding of its structure can shed light on its biological activity, we first compared and analyzed relevant mosquito-borne Flavivirus capsid protein sequences and their predicted structures. Then, we studied the alternative conformations enabled by the N-terminal region. Finally, using dengue virus capsid protein as the main model, we correlated the protein size, thermal stability and function with its structure/dynamics features. The findings suggest that the capsid protein interaction with host lipid systems leads to minor allosteric changes that may modulate the specific binding of the protein to the viral RNA. Such a mechanism can be targeted in future drug development strategies, namely by using improved versions of pep14-23, a dengue virus capsid protein peptide inhibitor previously developed by us. Such knowledge can yield promising advances against Zika, dengue and closely related Flavivirus.


Introduction
Viral hemorrhagic fever is a global problem, with most cases due to dengue virus (DENV), which causes over 390 million infections per year worldwide and is a major socio-economic burden, mainly for tropical and subtropical developing countries [1]. A working vaccine was registered in Mexico in December 2015, approved for official use in some endemic regions of Latin America and Asia and, as of October 2018, also in Europe [2][3][4]. However, this vaccine is not 100% effective against all

Analysis of Amino Acid Sequence Conservation Among Flavivirus C proteins
A phylogenetic analysis of the Flavivirus C protein and polyprotein amino acid sequences reveals whether the C protein is an indicator of phylogenetic similarity (Figure 1). C proteins of Spondweni group viruses, i.e., ZIKV, Spondweni virus (SPOV) and Kedougou virus (KEDV), cluster together, being the most similar to DENV (Figure 1a). Another cluster corresponds to mosquito-borne encephalitis-causing Flavivirus: Saint Louis encephalitis (SLEV), WNV, WNV serotype Kunjin (WNV-K), Alfuy (ALFV), Murray Valley encephalitis (MVEV), Usutu (USUV) and Japanese encephalitis (JEV) viruses. The Flavivirus polyprotein sequences show similar clusters (Figure 1b). As such, the C protein is a good indicator of viral genetic similarity. Thus, we investigated the C protein amino acid sequences, seeking common patterns relevant to biological activity. The amino acid sequences of the Flavivirus C proteins identified above were analyzed in the context of the three main regions identified in the DENV C sequence, i.e., the conserved fold region, the flexible fold region and the N-terminal IDP region (Figure 2). This was done for all mosquito-borne Flavivirus relevant for human diseases (Figure 2a), as well as for the four main DENV C serotypes (Figure 2b). For this, the C protein amino acid sequences of the 16 mosquito-borne Flavivirus and the 4 DENV serotypes were jointly aligned. In agreement with previous work [12,14], five conserved motifs are found in the mosquito-borne Flavivirus C proteins and deserve attention, namely: the N-terminal conserved 13 hNML+R 18 ; 40 GXGP 43 in loop L1-2; 44 h+hhLAhhAFF+F 56 in the α2 helix; 68 RW 69 of the α3 helix; and, finally, the 84 F++-h 88 motif from α4 (with 'h', '+' and '-' representing hydrophobic, positively charged and negatively charged residues, respectively). Between residues 70 and 100, other motifs, not previously reported and containing hydrophobic and positively charged residues, are visible.
Moreover, the amino acid residues G and P, which can break the continuity of α-helices, are conserved at specific positions of the protein, especially in the disordered N-terminal and the flexible fold regions (Figure 2c). Charged residues are also conserved at specific locations, mostly in the conserved fold region and especially after position 95 (Figure 2d). Overall, the disordered N-terminal plus flexible fold regions, when compared with the conserved fold region, have on average 10 versus 4 G and P residues (Figure 2c, green), 10 versus 15 K and R residues (Figure 2d, blue), and 1 versus 2 D and E residues (Figure 2d, magenta).

Figure 2 caption (partially recovered). (a) Flavivirus C proteins are 55% conserved, with residues being considered conserved if, in a given position, more than 15 are equal (red) or stereochemically similar (black). (b) Conservation between DENV serotypes is 80%, with the same criteria as in (a). (c) Structure-breaking residues G and P (green). (d) Charged residues: dark blue for positively charged residues (K and R), light blue for H, and magenta for negatively charged residues (D or E). (e) Overall conserved regions of Flavivirus C proteins: the disordered N-terminal and the conserved fold are clearly conserved in terms of charged and G/P amino acids. In contrast, the flexible fold region allows higher variability. Thus, its main role seems to be to connect the disordered N-terminal and the conserved fold regions, and to enable alternative conformations. DENV C serotype 2 is highlighted in blue. Amino acid residues are numbered according to the consensus, coinciding with DENV-2 residue numbers. The full designation of each virus is given in the abbreviations section.
Several motifs in the Flavivirus C protein sequences can be identified. These represent the main sections of the protein, conserved during evolution as these must be crucial to protein function ( Figure 2e). The N-terminal region, although disordered, is highly conserved, in terms of charged amino acid and G/P residues. The flexible fold section allows greater variability, in line with previous reports by us and others, suggesting that it can adopt several conformations [15].
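The conservation criterion used for Figure 2 (a position counts as conserved when more than a given number of aligned sequences share a residue) can be illustrated with a minimal Python sketch. It only counts identical residues; the stereochemical-similarity rule is not modeled, and the function name, the gap handling and the toy alignment are assumptions made for illustration, not the procedure actually used here.

```python
from collections import Counter


def conserved_positions(alignment, min_count):
    """Return 0-based columns where one residue occurs more than min_count times.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    Only identity is counted; stereochemical similarity is NOT modeled.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        # Ignore gaps when counting residues in this column.
        column = [seq[col] for seq in alignment if seq[col] != "-"]
        if not column:
            continue
        _, count = Counter(column).most_common(1)[0]
        if count > min_count:
            conserved.append(col)
    return conserved


# Toy alignment (hypothetical, NOT real Flavivirus sequences).
toy_alignment = ["MKRA", "MKRG", "MRLA", "MK-S"]
toy_conserved = conserved_positions(toy_alignment, min_count=2)
print(toy_conserved)
```

For the 20-sequence alignment described in the text, the threshold would be `min_count=15`.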

Analysis of the Flavivirus C Protein Sequences Hydrophobicity and Secondary Structure Propensity
Hydrophobicity and α-helical propensity predictions were performed as previously reported [15], using the Kyte-Doolittle [26] and the Deleage-Roux [27] scales on the ProtScale server, respectively, for the 16 mosquito-borne Flavivirus C proteins analyzed (Figure 3). The hydrophobicity scale ranges from −4.5, for highly polar (hydrophilic) amino acid residues, to 4.5, for highly hydrophobic amino acid residues [26]. Therefore, when plotting the average values for each amino acid residue of the Flavivirus C sequences, negative local minima and positive local maxima indicate, respectively, hydrophilic and hydrophobic regions (Figure 3a,b). All proteins display a similar profile, even in the N-terminal and flexible fold regions, despite the slightly higher amino acid residue variability there (Figure 2). The α0 domain, homologous to pep14-23, is amphipathic, with average values near 0. In the flexible fold region, which is mostly amphipathic too, there is a peak of hydrophobicity between residues 30 and 40, possibly explaining its intermediate structure/dynamics behavior [13,14]. Some peaks of hydrophobicity are observed in the α3 and α4 domains, with the most hydrophobic domain being α2, as expected from the sequence analysis (Figure 2) and from the literature [12,14,18]. For α-helical predictions, secondary structure is highly probable above a threshold of 1.0 [27]. The secondary structure predictions for Flavivirus C proteins correlate well with the known secondary structure of DENV C (Figure 2e) [12]. Such agreement supports the concept of a transient α0 occurring for these proteins, as hypothesized earlier [15]. Roughly between positions 12 and 20, there is a disordered region with a high tendency to acquire α-helical secondary structure. Importantly, the values of the predictions are similar and the same tendencies are found in all proteins, with peaks and valleys co-localizing (Figure 3).
Along with data from the last subsection, these results strengthen the idea that Flavivirus C proteins have similar structure and dynamics properties.
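As an illustration of how a hydropathy profile of the kind plotted in Figure 3 is built, the following Python sketch averages Kyte-Doolittle values over a sliding window. The window size of 9 and the function name are assumptions for illustration (the window used in this work is not stated in this section), and the input sequences in the usage lines are toy strings, not C protein sequences.

```python
# Kyte-Doolittle hydropathy values (from -4.5, most hydrophilic, to 4.5, most hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean Kyte-Doolittle value for every full window along the sequence.

    The window size is an assumption; element i of the result covers
    residues i .. i + window - 1 (0-based).
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Toy inputs (hypothetical): an all-isoleucine and an all-arginine stretch.
print(hydropathy_profile("IIIII", window=5))
print(hydropathy_profile("RRRR", window=3))
```

Positive stretches in the resulting profile correspond to hydrophobic regions and negative stretches to hydrophilic ones, as described above.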

Analysis of the Flavivirus C Protein Tertiary Structure Propensity
The tertiary structure of Flavivirus C proteins was then investigated, complementing the α-helical predictions, to help understand the role(s) of the disordered N-terminal region. Following previous work [15], I-TASSER [28][29][30] was used to predict tertiary structures for the 16 closely related mosquito-borne Flavivirus C proteins (Figure 4). Eighty monomer conformations were obtained (several for each sequence) and superimposed with the DENV C homodimer partial structure deposited at the Protein Data Bank (PDB) and obtained via nuclear magnetic resonance (NMR) spectroscopy (PDB ID: 1R6R). Notably, DENV [12,16], WNV [31] and ZIKV [25] C proteins form homodimers, stabilized by hydrophobic and electrostatic interactions involving their conserved fold region [12][13][14]25,[31][32][33]. Since this is the most conserved region of the Flavivirus C protein sequences (Figure 2), a homodimer is not only a stable conformational arrangement, but also likely to occur. As 28 conformers had more than 5 backbone clashes with the other monomer when superimposed in a homodimer structure (not allowing a viable homodimer), those conformers were discarded (Table 1). The remaining 52 Flavivirus C protein conformational models were analyzed while superimposed with the DENV C homodimer (PDB ID: 1R6R, model 21 [12]). These were then grouped into four clusters by visual inspection of their similarity (Figure 4).

Figure 4. Flavivirus C protein tertiary structure predictions, organized into four conformational clusters. The Flavivirus C protein conformations predicted by I-TASSER are superimposed with the DENV C experimental homodimer structure (black). Amino acid residues of the N-terminal region in α-helix conformation are in blue, the other α-helices in red and the loops in gray. From the 80 conformers, 52 can be clustered by similarity of conformations, from cluster A to D. Clusters A, B and C have the α1 helix in the DENV C experimentally determined conformation (PDB ID: 1R6R [12]). In cluster D, the α1 helix is in the West Nile Virus (WNV) C and ZIKV C conformation (PDB IDs: 1SFK [31] and 5YGH [25], respectively). The closed autoinhibitory conformation of cluster C seems the most probable, having the highest number of models. Although unlikely given their transient unstable nature, N-terminal IDP regions may interact with each other. Table 1 specifies each cluster composition.
Most sequences have a conformer in each cluster (Figure 1 and Table 1). In cluster A, some N-terminal amino acid residues are close to α4-α4 and may interact with RNA, namely the positively charged residues. Cluster B has the most scattered conformers, with the N-terminal region at the "top", not interacting with other protein regions, resembling a transition between more ordered states. In cluster C, the N-terminal region is in an autoinhibitory conformation, blocking the access to the α1-α2-α2'-α1' region, as previously suggested by us for DENV C [15]. Eighteen conformer models are predicted in this closed conformation with at least one model from most of the C proteins tested (except JEV C and ZIKV C; see Table 1). Therefore, it can occur in most Flavivirus C proteins. As for the cluster D conformation, the α1 helix is in the conformation of the WNV [14,31] and ZIKV [25] C experimental structures, an arrangement not previously reported for DENV C [15]. This closed conformation also involves the N-terminal region and the α1 domain, and partially blocks the α2-α2 hydrophobic cleft (or totally blocks it, when both monomers are in the same conformation). Importantly, both clusters C and D are closed conformations, supporting the autoinhibition hypothesis.

Table 1. Distribution of the I-TASSER predicted models through the four clusters.

[Table 1 columns: Protein; Cluster A; Cluster B; Cluster C; Cluster D; Excluded]

Dimers with A or B conformers in one monomer allow any of the conformers (A to D) on the other monomer. The C conformer permits neither C-C homoconformers (i.e., both monomers in the same conformation) nor the C-D and D-C heteroconformers. In contrast, D-D homoconformers are allowed, similarly to the conformation that WNV C adopts in the crystal form [31]. Moreover, to go from cluster A to cluster C or D, the N-terminal region should pass through cluster B. These constraints suggest a path for transitions between conformations, discussed ahead. Overall, the autoinhibition hypothesis proposed for DENV C [15] is supported and such a conformation can occur in other Flavivirus C proteins.
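The compatibility rules just described can be written down as a small Python sketch that enumerates which monomer-monomer conformer pairs are allowed in a homodimer. The names `FORBIDDEN_PAIRS` and `is_allowed_dimer` are illustrative assumptions; only the three forbidden pairs stated in the text are encoded, and the transition path through cluster B is not modeled.

```python
from itertools import product

CONFORMERS = ("A", "B", "C", "D")

# Pairs stated in the text as not viable: C-C, C-D and D-C.
FORBIDDEN_PAIRS = {("C", "C"), ("C", "D"), ("D", "C")}


def is_allowed_dimer(first, second):
    """True if the two monomer conformers can co-exist in one homodimer."""
    return (first, second) not in FORBIDDEN_PAIRS


# Enumerate all ordered pairs and keep the allowed ones.
allowed_dimers = [
    pair for pair in product(CONFORMERS, repeat=2) if is_allowed_dimer(*pair)
]
print(allowed_dimers)
```

Of the 16 ordered pairs, 13 remain, including the D-D homoconformer and every pair in which one monomer is in the A or B conformation.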

Analysis of Dengue Virus (DENV) C Protein Rotational Correlation Time
Given the close similarities between Flavivirus C proteins (Figures 1-4), DENV C can be used as a general model for them. Hence, we proceeded to determine the DENV C overall rotational correlation time (τc), taking advantage of the intrinsic fluorescence of the tryptophan residue in position 69 (W69). Our computational data support three main structure/dynamics regions, including a disordered N-terminal region, which would increase the expected apparent size of the protein (as it would not be globular and folded), a property detectable by such an approach. For molecules in aqueous solution at room temperature, fluorescence lifetimes are usually in the ns timescale, and the fluorescence decays are sensitive to the anisotropy of the fluorophore, which depends on its τc (see Equations (1)-(8), describing these relations, in the Methods section [34,35]). Thus, the time-resolved fluorescence decay of DENV C W69 and the corresponding anisotropy decay were determined, both at pH 6.0 and 7.5 (Figure 5).

Table 2. Fitting parameters of DENV C time-resolved fluorescence anisotropy data analysis. Parameters obtained from fitting Equations (5) and (8) to the data of Figure 5. Values are average (±% standard error, SE). * Statistically significant differences (p < 0.05) between the values obtained for the two pH values tested.
Time-resolved fluorescence anisotropy decays at both pH values are similar (Figure 5b,d). Fluorescence lifetime components (τ1, τ2 and τ3) were obtained from the intensity decays (Equations (2)-(6)) [34,35], with a triple-exponential function retrieving the best fit (Figure 5a,c). Fitting the data retrieves similar values (Table 2) for τ1, τ2 and τ3, and for the corresponding weights (the pre-exponential factors α1, α2 and α3, respectively). For an accurate calculation of τc, the condition τc < 3 × τ3 must hold [34,35]. Since τ3 values were ~6.4 ns (with a significant weight α3 of ~0.42), this means that, at both pH values, we could measure τc values up to a limit of ~19 ns. In both pH conditions, the τc measured was 16.4 ± 0.5 ns at 22 °C, within the limit and higher than expected for a purely globular protein of DENV C size, as predicted [13].
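The measurability condition quoted above is simple arithmetic and can be checked with a short Python sketch. The function name is an illustrative assumption; the numeric inputs (τ3 of about 6.4 ns and the measured τc of 16.4 ns) are the values given in the text.

```python
def max_measurable_tau_c(longest_lifetime_ns, factor=3.0):
    """Upper limit for a reliable tau_c, from the condition tau_c < factor * tau_3."""
    return factor * longest_lifetime_ns


tau_3_ns = 6.4            # longest fluorescence lifetime component (from the text)
measured_tau_c_ns = 16.4  # measured rotational correlation time at 22 degrees C

limit_ns = max_measurable_tau_c(tau_3_ns)
within_limit = measured_tau_c_ns < limit_ns
print(f"limit = {limit_ns:.1f} ns, measured value within limit: {within_limit}")
```

With these inputs the limit is about 19 ns, so the measured 16.4 ns falls inside the range in which τc can be determined reliably.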
Rossi et al. [36] correlated the τc of 16 globular proteins at 20 °C with their molecular weight (MW, in kDa), based on NMR data, leading to the relation τc ≈ 0.6 MW. Assuming DENV C to be a 23.5 kDa fully globular homodimer and correcting for the temperature (T) and viscosity (η) [37], the predicted τc is 12.0 ns. However, the correlation time must be slightly higher, as the protein is partially unfolded and disordered (in the N-terminal region). Jones et al. [16] measured a τc of 13 ns at 27 °C, by NMR, which, with the corrections from Equation (10) [37], corresponds to 13.4 ns at 25 °C. Given the DENV C size, this implies that the protein is not globular, in line with current knowledge of DENV C structure and dynamics [12][13][14][15][16]. Fluorescence anisotropy supports an even more open and partially disordered DENV C structure, given the τc value of 15.2 ± 0.5 ns at 25 °C (Table 3), in line with the in silico data (Figures 1-4).

Table 3. Comparing DENV C τc values (τc at 25 °C in H2O were calculated using Equation (10)).
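The two steps of this comparison, the empirical estimate for a globular protein and the temperature/viscosity rescaling, can be sketched in Python. Equation (10) is not reproduced in this section, so the rescaling below assumes the standard Stokes-Einstein-Debye proportionality (τc proportional to η/T); the function names and the approximate water viscosities are assumptions for illustration, and the output may differ slightly from the values in Table 3.

```python
# Approximate dynamic viscosity of water in mPa*s (assumed handbook values).
WATER_VISCOSITY_MPA_S = {20.0: 1.002, 22.0: 0.955, 25.0: 0.890, 27.0: 0.851}


def rossi_tau_c_ns(mw_kda):
    """Empirical tau_c (ns) of a globular protein at 20 degrees C: tau_c ~ 0.6 * MW."""
    return 0.6 * mw_kda


def rescale_tau_c(tau_c_ns, t_from_c, t_to_c, viscosity=WATER_VISCOSITY_MPA_S):
    """Rescale tau_c between temperatures, assuming tau_c is proportional to eta / T.

    This Stokes-Einstein-Debye scaling is an assumption standing in for
    Equation (10), which is not shown in this section.
    """
    eta_ratio = viscosity[t_to_c] / viscosity[t_from_c]
    kelvin_ratio = (t_from_c + 273.15) / (t_to_c + 273.15)
    return tau_c_ns * eta_ratio * kelvin_ratio


globular_estimate_20c = rossi_tau_c_ns(23.5)            # 23.5 kDa homodimer
anisotropy_at_25c = rescale_tau_c(16.4, 22.0, 25.0)     # measured at 22 degrees C
print(f"globular estimate at 20 C: {globular_estimate_20c:.1f} ns")
print(f"anisotropy value rescaled to 25 C: {anisotropy_at_25c:.1f} ns")
```

Because both the viscosity and the temperature terms shrink τc on warming, a value measured at 22 °C becomes smaller when expressed at 25 °C, which is the direction of the correction reported above.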

Analysis of DENV C Conformational Stability
Circular dichroism (CD) spectroscopy was used to study DENV C secondary structure, via its thermal denaturation in solution from 0 to 96 °C, at pH 6.0 and 7.5 (2 °C steps, Figure 6). At both pH values, the α-helical structure is partially lost upon increasing temperature (Figure 6a,b). However, even at 96 °C, the protein does not become completely random coil, as seen from the spectrum shape and its high ellipticity at 222 nm (Figure 6c). Plotting the mean residue molar ellipticity at 222 nm, [θ], as a function of temperature, T, reveals a transition at ~70 °C at both pH values (Figure 6c).

Figure 6 caption (partially recovered). [...] (Equations (20), (22), (24) and (28)). Vertical dashed lines represent the experimentally observed Tm, colored according to pH. Error bars represent SD, from three independent experiments. Residuals are shown below the graph, being lower than the SD.

DENV C does not display a typical unfolding profile, as the denaturation curves do not reach a flat plateau. Still, ellipticity data were successfully fitted to a denaturation curve (Figure 6c), assuming a homodimer with one-step denaturation [32]. Briefly, Equation (21) was combined with Equations (20), (22) and (24) and fitted to the data. This yields the thermodynamic parameters of DENV C unfolding, namely the melting temperature (Tm; Table 4). A small but consistent variation of the CD spectra between 0 and 40 °C is observable, implying: (i) a conformational equilibrium with temperature and/or (ii) some flexibility of the structure and/or (iii) a transition between alternative conformations. This temperature range covers the physiological conditions of both mosquitoes (20 to 40 °C, depending on the environment) and humans (36 to 40 °C). DENV C can continuously transition between conformations as temperature varies, in line with the previously hypothesized conformational equilibrium [15]. As temperature increases, the disordered conformations become more abundant, but only a partial loss of structure is seen.
This indicates that the C protein conserved region is thermodynamically stable. Similar observations are expected for other Flavivirus C proteins.

Table 4 caption (partially recovered). [...] (Equations (20), (22), (24) and (28)) to the data. Tm is the experimentally observed melting temperature (represented by the vertical lines in Figure 6c). Estimations are average ± SE. There were no significant variations between the two pH values tested (p < 0.05).
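To make the "homodimer with one-step denaturation" assumption concrete, the following Python sketch computes the unfolded fraction for the textbook equilibrium N2 ⇌ 2U. Equations (20)-(28) are not reproduced in this section, so this is a generic mass-action model and not necessarily the exact parameterization fitted here; the symbols `k_u` (unfolding equilibrium constant) and `p_total` (total monomer concentration), the function names and the toy concentration are assumptions.

```python
import math


def fraction_unfolded(k_u, p_total):
    """Unfolded monomer fraction for a one-step dimer unfolding, N2 <=> 2U.

    k_u     = [U]^2 / [N2], in molar units (assumed definition).
    p_total = total monomer concentration, 2*[N2] + [U], in molar units.
    Mass action gives 2*p_total*f^2 + k_u*f - k_u = 0; the positive root is returned.
    """
    if k_u < 0 or p_total <= 0:
        raise ValueError("k_u must be >= 0 and p_total must be > 0")
    if k_u == 0:
        return 0.0
    return (-k_u + math.sqrt(k_u ** 2 + 8.0 * p_total * k_u)) / (4.0 * p_total)


def observed_signal(f_unfolded, folded_signal, unfolded_signal):
    """Population-weighted signal (e.g. ellipticity at 222 nm) for a two-state mix."""
    return folded_signal + f_unfolded * (unfolded_signal - folded_signal)


# Toy concentration (hypothetical): 10 micromolar monomer.
toy_p_total = 1e-5
midpoint_fraction = fraction_unfolded(k_u=toy_p_total, p_total=toy_p_total)
print(midpoint_fraction)
```

In this model the midpoint of the transition is reached when `k_u` equals the total monomer concentration, so the apparent melting temperature of a dimer depends on protein concentration, unlike in a monomeric two-state model.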

Discussion
Flavivirus C proteins are known to have similar sequences and structure [12][13][14][15][16]25,31]. Here, we go further by examining common features at different structural levels, complemented with data on DENV C size and thermodynamic stability. The phylogenetic analysis of the C proteins and the polyproteins (Figure 1) shows that the former is a marker of Flavivirus evolution. There are several conserved motifs, highlighted in previous studies with 16 Flavivirus [12,14]. The work is now expanded to include the four DENV serotypes (Figure 2). When these 20 Flavivirus C amino acid sequences, with between 96 and 107 amino acid residues each, are jointly analyzed, it is clear that 55% of the residues are conserved or stereochemically similar (Figure 2a). About 80% of amino acid residues are equal or similar and, thus, conserved among the four DENV C serotypes (Figure 2b). Of the five major conserved motifs, four are known to be involved in dimer stabilization [14]: the 40 GXGP 43 motif at loop L1-2, which marks the transition from the flexible to the conserved fold region [14]; the 68 RW 69 motif at α3, which forms a hydrophobic pocket that accommodates the W69 side chain, involving residues from α2, α3 and α4 [12,32]; and the 44 h+hhLAhhAFF+F 56 and 84 F++-h 88 motifs, from the α2 and α4 helices respectively, which maintain the homodimer structure both via the α2-α2 hydrophobic interaction and via the salt bridges of residues [RK] 45 [12,14,32]. Flavivirus C proteins must have similarly sized secondary structure domains, since G/P are in the same positions and these amino acid residues tend to break the secondary structure (Figure 2c). Charged residues are also conserved (Figure 2d), which makes sense, as charges would promote the interaction of the C protein with the negatively charged host lipid systems [12,14,[20][21][22] and the viral RNA [12].
C proteins have a common homodimer conserved fold region (roughly, residues 45-100), as observed for DENV, WNV and ZIKV C structures [12,14,25,31]. Conserved motifs are summarized in Figure 2e.
The above explains the C proteins similar hydrophobic and α-helix propensities (Figure 3). The conserved motif 13 hNML+R 18 , at the N-terminal region, and the α2-α2 hydrophobic cleft are of particular importance for DENV C interaction with LDs and VLDL [14,[20][21][22]38]. Mutations in specific residues of DENV C α2-α2 and α4-α4 also impair RNA binding. Likewise, ZIKV C also accumulates on LDs surface, with specific mutations on this protein disrupting the association [25]. ZIKV C also binds single-stranded and double-stranded RNAs [25], with, as for DENV C, the high positively charged residues density prompting the binding to LDs and RNA [12,39,40]. Given the match at the level of N-terminal α-helical propensity and α2-α2 hydrophobicity (Figure 3), the C proteins may all be self-regulated by an autoinhibition mechanism, as proposed for DENV C [15].
The autoinhibition hypothesis is corroborated by the quaternary structure analysis (Figure 4; Table 1). Two clusters, C and D, are autoinhibited conformations. Importantly, the cluster D α1 aligns with WNV C [14,31] and ZIKV C [25]. Moreover, if two monomers are in a D conformation (D-D homoconformer), the dimer α2-α2 region is totally inaccessible. Cluster C allows neither a C-C homoconformer nor a C-D heteroconformer, which restricts the simultaneous transitions that are possible between A, B, C and D in the homodimer. The interaction between N-terminal regions within a dimer may be considered. Nonetheless, the disordered nature and high density of positively charged amino acid residues will mostly favor repulsion between these IDP regions.
It is important to look at the clusters (Figure 4), while considering the number of positively charged residues (Figure 2) in the disordered N-terminal and flexible fold (10 K and R residues) versus those in the conserved fold (15 K and R). The charge distribution in some arrangements implies that the disordered N-terminal is at least in theory able to bind the viral RNA [39,40]. Such binding would be governed by the N-terminal region cationic amino acid residues [41,42]. Here, the structure predictions reveal that, indeed, the first 12 N-terminal residues can locate near α4-α4 Cluster A (Figure 4), the most likely RNA binding site [12,39,40]. Furthermore, binding to RNA via the C-terminal α4-α4 interface may be favored by a previous or simultaneous interaction of the protein with host LDs via the N-terminal region and α2-α2 interface. Access to α2-α2 (controlled by the N-terminal region) would modulate the interaction (Figure 4) and, thus, viral assembly. In agreement, the binding of the related hepatitis C virus core protein (homologous to DENV C) to host LDs is what enables efficient viral assembly [43]. Thus, the C protein disordered N-terminal would be critical to protein function, enabling crucial structural and functional roles.
To evaluate this, we used DENV C as a model system, measuring its τc value by time-resolved fluorescence anisotropy (Figure 5) and its thermal stability by CD spectroscopy (Figure 6), at pH 6.0 and 7.5 (within the usual pH range of its biological microenvironment). A similar τc, 15.2 ± 0.5 ns, is obtained at both pH values (Figure 5; Tables 2 and 3), in line with previous work [13]. DENV C maintains its homodimer structure and dynamics behavior between pH 6.0 and 7.5. The τc value and the corresponding apparent size are higher than expected, due to the disordered nature of the N-terminal region.
Regarding DENV C thermodynamic stability (Figure 6, Table 4), the protein Tm is ~70 °C at both pH values. These denaturation parameters are in line with those of other authors, as a chemically synthesized DENV C 21-100 fragment (without most of the disordered N-terminal region) displays a Tm of 71.6 °C [32]. The high thermal stability of DENV C in physiological conditions is likely due to the large hydrophobic area that is shared by the two monomers [12], but also to the W69 stabilizing interactions and, as experimentally observed [32], the formation of salt bridges (residues K45 and R55 with E87). As structure/dynamics properties are conserved among Flavivirus C proteins (Figures 2-4), these observations can probably be generalized to all these proteins.
These findings must also be considered in light of the biologically relevant interactions of DENV C with LDs [22] and RNA (Figure 7). The DENV C experimental structure [12] contains three distinct structural regions [13]: a disordered N-terminal region (from the N-terminus up to residue R22), a flexible fold (residues V23 to L44, where α-helix 1 is located) and a conserved fold with helices α2, α3 and α4, containing the R68 and W69 amino acid residues, highly conserved among Flavivirus [12]. R68 terminates the α3 helix, with its side chain pointing to the protein interior [12]. W69 is located at the DENV C α4-α4 interface, having a crucial role in the structural stabilization of the dimer [12]. Along with dimer structural stability, these interactions enable allosteric communication and movements between the more hydrophobic section of DENV C (the α2-α2' dimer interface) and its remaining sections, namely the α4-α4 region. Figure 7 displays this, in the context of the biologically relevant interactions of the C protein, as they are understood on the basis of recent studies [12][13][14][15]18,[20][21][22][23][24].
Looking further, it is important to consider that the binding of DENV C to host LDs is mediated by both the N-terminal IDP region and the α2-α2 interface [14]. V51 of α2 is affected by the interaction with LDs and stabilizes the dimer by contacting α3 (I65). Another interaction via salt bridges, between α2 (K45 and R55) and α4 (E87), stabilizes the homodimer (Figure 7a). The C protein binding to host LDs, which affects the α2-α2' interface, can lead to changes in the α4-α4 structural arrangement (Figure 7b). To investigate this, we searched for similar proteins. An RNA-binding protein with a two-helix domain similar to DENV C α4-α4 was identified (Figure 7c): influenza A non-structural protein 1 (NS1, PDB ID: 2ZKO [44]). Influenza NS1 has interesting features: it accumulates in the nuclei of host cells after being translocated by importin α and β, and works as a viral immuno-suppressor by weakening the host cell gene expression [45]. DENV C was also reported to have an importin α-like motif in the N-terminal region [15,46]. Targets that interact with importin α and are transported to the nucleus normally contain a nuclear localization sequence (NLS), consisting of a motif of at least 2 consecutive positively charged residues [47][48][49][50][51]. Some of these proteins contain 2 NLS motifs, with at least 8 (up to 40 or even more) residues in between, designated as a bipartite NLS motif [49][50][51]. Strikingly, Flavivirus C proteins have three motifs of two consecutive cationic residues in the N-terminal region and α1 domain, which could form a bipartite NLS. One bipartite NLS can be formed by the cationic residues before position 10 and those at positions 17 and 18, with a spacer of 7 to 13 residues. The other possible bipartite NLS may be formed by the residues at positions 17 and 18 and those at positions 31 and 32, with 9 to 12 spacer residues. Possible bipartite NLS are also seen in the conserved fold region, but its static nature precludes activity as an NLS.
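The bipartite NLS pattern described above (two clusters of at least two consecutive K/R residues separated by a spacer) can be searched for with a short Python sketch. The function name, the default spacer limits and the toy sequence are assumptions for illustration; the toy string is not a Flavivirus C protein sequence, and a match only flags a candidate motif, not a functional NLS.

```python
import re

# A cluster of at least two consecutive positively charged residues (K or R).
BASIC_CLUSTER = re.compile(r"[KR]{2,}")


def find_bipartite_nls(sequence, min_spacer=8, max_spacer=40):
    """Return (first_cluster_start, second_cluster_start, spacer_length) tuples.

    Positions are 1-based. The spacer is the number of residues between the
    end of the first basic cluster and the start of the second one.
    """
    clusters = [(m.start(), m.end()) for m in BASIC_CLUSTER.finditer(sequence.upper())]
    candidates = []
    for index, (start_1, end_1) in enumerate(clusters):
        for start_2, _ in clusters[index + 1:]:
            spacer = start_2 - end_1
            if min_spacer <= spacer <= max_spacer:
                candidates.append((start_1 + 1, start_2 + 1, spacer))
    return candidates


# Toy sequence (hypothetical): KK at positions 2-3, RR at positions 14-15.
toy_sequence = "A" + "KK" + "A" * 10 + "RR" + "A"
print(find_bipartite_nls(toy_sequence))
```

Lowering `min_spacer` to 7 would cover the shorter spacers mentioned above for the motifs before position 10 and at positions 17 and 18.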
If DENV C binds to importin α, it may act as a cargo protein to be transported to the nucleus. This could explain why DENV C has been found in the nucleus of DENV-infected cells [46,52,53]. DENV C may also directly bind importin β, given the similarities between the N-terminal region of DENV C and importin α [49]. This may allow it to disrupt the normal nuclear import/export system in DENV-infected cells. The conformational plasticity of the N-terminal and flexible fold regions is certainly compatible with interactions with importin(s). As the hypothesized bipartite NLS are conserved among Flavivirus C proteins, this may occur in other Flavivirus.
The C protein may act as an immuno-suppressor, similarly to influenza NS1, by interacting with importin α and/or importin β. Ivermectin, a specific inhibitor of importin α/β-mediated nuclear import, is able to inhibit HIV-1 and DENV replication [54]. The mechanism of DENV inhibition might involve the C protein, specifically its N-terminal IDP region, which is similar to the disordered N-terminal region of importin α [15]. Moreover, influenza NS1 can counteract the RNA-activated protein kinase (PKR)-mediated antiviral response through a direct interaction with PKR [55]. Besides, influenza NS1 blocks interferon (IFN) regulatory factor 3 activation, which in turn prevents the induction of IFN-related genes [56]. DENV inhibits the IFN signaling pathway in a similar manner [57]. Through the dsRNA-binding ability of its N-terminal region, influenza NS1 inhibits the nuclear export of mRNAs and modulates pre-mRNA splicing, suppressing the antiviral response [44]. Similarities between DENV C and influenza NS1 also extend to the latter's ability to bind RNA (Figure 7c). Recognition of dsRNA is made by the influenza NS1 RNA-binding domain, which forms a homodimer [44]. Afterwards, a slight change in the R38-R38' orientation anchors the dsRNA to the protein through a hydrogen bond network [44]. One of the main functions of influenza NS1 binding to RNA is sequestering dsRNA from the 2'-5' oligo(A) synthetase [58]. We propose that, as with influenza NS1, a small conformational change in the DENV C α4-α4 interface occurs after the contact of its α2-α2 interface with LDs, modulated by transitions between alternative N-terminal "open" and "closed" conformations. Binding to LDs requires an open conformation (Figure 7d) and decreases the conformational variability and entropy of the C protein, which triggers the allosteric movements affecting the C-terminal α4-α4' interface.
As with influenza NS1, the Flavivirus C protein would remain in the same overall fold, but a small opening of α4-α4 would facilitate its binding to RNA.

Figure 7 caption (partially recovered). [...] [44]). (d) DENV C schematically bound to a LD and to RNA. DENV C amino acid residues affected by the binding to LDs are colored yellow, while a key internal salt bridge is shown in cyan. DENV C binding to host LDs may enable allosteric rearrangements (eventually involving the salt bridge), allowing a small conformational change in the α4 side chains, namely those of the positively charged residues, prompting stable RNA-C protein binding.
The C-terminal is likely to be the crucial section for RNA binding given its similarity with influenza NS1 (Figure 7). Nevertheless, the N-terminal conformers must also be considered in the context of RNA binding (Figure 4). The A and D conformers allow RNA to be bound to the α4-α4 interface and, simultaneously, to the N-terminal cationic amino acid residues. A-A and D-D conformations result in the possible binding of a single continuous portion of RNA to both the C-terminal α4-α4 and the N-terminal IDP region, making the RNA more tightly bound. Moreover, the A-B', B-B' and B-C' conformations would enable the protein to bind two distinct sections of the RNA, one bound to α4-α4 and another to the N-terminal regions. That arrangement may allow to further compact the viral RNA. The N-terminal IDP region putative binding to RNA should not be disregarded given its positive net charge (+7). It compares very well with the C-terminal α-helical region net charge (+8 for a monomer, +16 for α4-α4 dimer interface). Both may thus bind RNA due to, mostly, electrostatic forces. This IDP region can thus provide multi-functionality by several modes of binding and different ligands, enabled by alternative conformations. It must be stressed that this is not unlikely. Viral proteins tend to have IDP regions that increase their biological activity [59][60][61]. In a proteome as small as that of flaviviruses (10 proteins), IDP regions augment the number of ligands with which it can interact. Less structure often means more function. This is an increasingly hot topic of recent research, leading to design of algorithms to identify these regions [62,63]. Further analysis will help understand the interaction between DENV C and its ligands.
To conclude, the data imply a common structure and functions for mosquito-borne Flavivirus C proteins. Moreover, studying DENV C rotational diffusion and thermodynamics reveals a stable protein due to the conserved fold maintaining the homodimer structure. These findings apply to other Flavivirus C proteins, supporting a common mechanism for their biological activity. Such understanding of this key protein structure and dynamics properties may contribute to the future development of C protein-targeted drugs to impair dengue virus and other Flavivirus infections.
Statistical comparison of the disordered N-terminal plus flexible fold regions with the conserved fold region of Flavivirus C proteins, for G and P content, as well as charged amino acid residues, was performed via a paired t-test, using GraphPad Prism v5 software. p-values were always lower than 0.001.
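As an illustration of the paired comparison described above, the following minimal sketch computes the paired t statistic in plain Python. It is not the GraphPad Prism procedure itself; the function name and the numbers are hypothetical, and the p-value (which requires the Student t distribution) is deliberately left out.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a paired t-test.

    sample_a[i] and sample_b[i] must refer to the same protein
    (e.g. a property measured in two regions of the same sequence).
    """
    if len(sample_a) != len(sample_b) or len(sample_a) < 2:
        raise ValueError("paired samples need equal length of at least 2")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical percentages, NOT data from this work.
flexible_region = [12.0, 15.0, 11.0, 14.0, 13.0]
conserved_region = [6.0, 7.0, 6.5, 8.0, 5.5]
t_value, dof = paired_t_statistic(flexible_region, conserved_region)
print(f"t = {t_value:.3f}, degrees of freedom = {dof}")
```

The statistic is then compared with the Student t distribution with the returned degrees of freedom to obtain the p-value.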
Predictions of hydrophobicity and α-helix propensity were done using the ProtScale server (http://web.expasy.org/protscale/) [26,27]. Tertiary structure predictions were performed via the I-TASSER server (http://zhanglab.ccmb.med.umich.edu/I-TASSER/) [28][29][30], following previous approaches [15]. Briefly, Flavivirus C protein sequences from our previous work were employed [14]. DENV and WNV (serotype Kunjin) C structures were excluded from the templates used for the tertiary structure prediction. The ZIKV C protein structure was also not included, as it had not yet been determined when the modeling was conducted. This avoids a bias towards known homologous protein structures. Five I-TASSER models were obtained for each C protein sequence. These were superimposed with the DENV C experimental structure (PDB ID 1R6R, model 21) [12] after root-mean-square deviation (RMSD) minimization in UCSF Chimera v1.9 software [68]. Clusters were formed based on the visual similarity between predictions. The number of N-terminal amino acid residues with backbone clashes with the other monomer backbone was calculated for each model. In our previous work [15], a DENV C predicted structure was excluded from further analysis if it had 6 clashes or more, as it would not be viable as a homodimer. Here, applying the same criterion, we excluded models with more than 5 clashes (28 models rejected). These would preclude homodimer formation and, thus, were not considered in the cluster analysis (Table 1, excluded models column).
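The exclusion rule above (models with more than 5 clashing N-terminal residues are rejected) can be sketched as a simple filter. The `Model` class, its field names and the example entries are hypothetical; the clash counts themselves would come from the structural analysis, which is not reproduced here.

```python
from dataclasses import dataclass

MAX_CLASHES = 5  # models with more than 5 clashing residues are rejected


@dataclass
class Model:
    name: str                 # hypothetical label for a predicted structure
    n_terminal_clashes: int   # N-terminal residues clashing with the other monomer


def split_models(models, max_clashes=MAX_CLASHES):
    """Separate models kept for cluster analysis from rejected ones."""
    kept = [m for m in models if m.n_terminal_clashes <= max_clashes]
    rejected = [m for m in models if m.n_terminal_clashes > max_clashes]
    return kept, rejected


# Hypothetical clash counts, NOT values from this work.
example = [Model("model_1", 0), Model("model_2", 5), Model("model_3", 6)]
kept, rejected = split_models(example)
print([m.name for m in kept], [m.name for m in rejected])
```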

Structure Comparison Between DENV C and Influenza NS1
Protein structure coordinates were extracted from the Protein Data Bank (PDB, www.pdb.org). PDB identification codes are specified after each protein name. The protein structures were superimposed with the MatchMaker tool of UCSF Chimera 1.13.1, and the superposition was then carefully inspected visually. Next, using the Match-Align tool of UCSF Chimera, which returns a sequence alignment based on the superimposed regions and thus takes the structure superimposition into account, we identified the residues simultaneously similar in structure and sequence. Protein structure figures were obtained using UCSF Chimera version 1.13.1 [68].

DENV C Recombinant Protein Production and Purification
Recombinant DENV C protein expression and purification was conducted based on previous approaches [13]. We used a pET-21a plasmid containing the DENV serotype 2 strain New Guinea C capsid protein gene (encoding amino acid residues 1-100) [69]. The protein was expressed in Escherichia coli C41 and C43 bacteria grown in lysogeny broth (LB) medium. The only differences in the purification protocol are the omission of the ammonium sulfate precipitation step and the addition of a size exclusion chromatography step (with Sephadex S200) after the heparin affinity column chromatography, using AKTA chromatography equipment. The C protein was purified in 55 mM KH2PO4, pH 6.0, 550 mM KCl. DENV C protein purified fractions were concentrated with Amicon Ultra-4 Centrifugal Filters of 3 or 10 kDa nominal cut-off, from Millipore (Billerica, MA, USA). Concentrated protein samples were stored at −80 °C. The quality of the protein samples was assessed by SDS-PAGE and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) analysis. Very low degradation was observed, with the highest peak consistent with the expected mass of the protein monomer (11765 Da).

Time-Resolved Fluorescence Anisotropy
Time-resolved fluorescence spectroscopy measurements were performed in a Life Spec II equipment with an EPLED-280 pulsed excitation light-emitting diode (LED) of 275 nm (Edinburgh Instruments, Livingston, UK), acquiring the emission at 350 nm. DENV C (monomer) concentration was 20 µM in 50 mM KH2PO4, 200 mM KCl, pH 6.0 or pH 7.5, with 550 µL total volume, in 0.5 cm × 0.5 cm quartz cuvettes. The instrument response function, IRF(t), was obtained with the same settings, except that the emission was acquired at 280 nm, using a solution of polylatex beads of 60 nm diameter diluted in Milli-Q water. Measurements were performed at 22 °C. Time-resolved fluorescence intensity measurements with picosecond resolution were obtained by the time-correlated single-photon timing (TCSPT) methodology [35]. Measurements were performed at constant time, with 15 min per decay, acquiring 2048 time points in a 50 ns window. Four intensity decays, I(t), were acquired in each condition, with excitation/emission polarizers, respectively, at vertical/vertical positions, $I_{VV}(t)$, vertical/horizontal positions, $I_{VH}(t)$, horizontal/vertical positions, $I_{HV}(t)$, and horizontal/horizontal positions, $I_{HH}(t)$. The instrumental G-factor was calculated as [35]:

$$G = \frac{\int I_{HV}(t)\,dt}{\int I_{HH}(t)\,dt} \qquad (1)$$

The G-factor value obtained was 1.61. The intensity decay with the emission polarizer at the magic angle (~54.7°, with respect to the vertical excitation polarizer), $I_m(t)$, avoids the effects of anisotropy. It can be calculated easily [35]:

$$I_m(t) = I_{VV}(t) + 2\,G\,I_{VH}(t) \qquad (2)$$

with $I_{VV}(t)$ and $I_{VH}(t)$ depending on the time-resolved fluorescence anisotropy, r(t), as:

$$I_{VV}(t) = \frac{I_m(t)}{3}\,\left[1 + 2\,r(t)\right] \qquad (3)$$

$$I_{VH}(t) = \frac{I_m(t)}{3\,G}\,\left[1 - r(t)\right] \qquad (4)$$

Thus, $I_m(t)$ was used to obtain the fluorescence lifetime components, $\tau_i$, and the respective amplitudes, $\alpha_i$, for the DENV C W69. $I_m(t)$ was described by a sum of three exponential terms:

$$I_m(t) = \sum_{i=1}^{3} \alpha_i \exp\left(-t/\tau_i\right) \qquad (5)$$

where the index i represents each component of the fluorescence decay.
For the fitting to the data, $\alpha_i$ and $\tau_i$ values were obtained by iteratively convoluting $I_m(t)$ with the IRF(t):

$$I_m^{calc}(t) = I_m(t) \otimes IRF(t) \qquad (6)$$

and fitting $I_m^{calc}(t)$ to the experimental data, $I_m^{exp}(t)$, using a non-linear least squares regression method. The usual statistical criteria, namely a reduced $\chi^2$ value below 1.3 and a random distribution of weighted residuals, were used to evaluate the goodness of the fits [35]. Data analysis was performed using the TRFA Data Processing Package v1.4 (Scientific Software Technologies Centre, Belarusian State University, Minsk, Belarus), which automatically calculates the standard error (SE) for each fitted parameter [35].
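The convolution step used in this kind of fit can be illustrated with a minimal sketch: a three-exponential model decay is convolved with a discretized IRF, and a reduced chi-square is computed against the measured counts. This is not the TRFA Data Processing Package; the function names, the Poisson weighting (variance equal to the counts, a common choice for photon counting that is assumed here) and all numerical values are hypothetical.

```python
import math


def multi_exponential(t, amplitudes, lifetimes):
    """Model decay: sum_i alpha_i * exp(-t / tau_i)."""
    return sum(a * math.exp(-t / tau) for a, tau in zip(amplitudes, lifetimes))


def convolve_with_irf(model, irf):
    """Discrete convolution of the model decay with the IRF.

    The result is truncated to the length of the model decay.
    """
    result = []
    for k in range(len(model)):
        upper = min(k, len(irf) - 1)
        result.append(sum(irf[j] * model[k - j] for j in range(upper + 1)))
    return result


def reduced_chi_square(measured, calculated, n_fitted):
    """Reduced chi-square assuming Poisson noise; empty channels are skipped."""
    terms = [(m - c) ** 2 / m for m, c in zip(measured, calculated) if m > 0]
    return sum(terms) / (len(terms) - n_fitted)


# Hypothetical parameters, NOT fitted values from this work.
dt = 50.0 / 2048                        # ns per channel (50 ns window, 2048 points)
times = [i * dt for i in range(256)]    # shortened window to keep the example fast
amplitudes = [0.5, 0.3, 0.2]
lifetimes = [0.5, 2.0, 5.0]             # ns
model = [multi_exponential(t, amplitudes, lifetimes) for t in times]
irf = [0.0, 0.25, 0.5, 0.25]            # hypothetical normalized IRF
calculated = convolve_with_irf(model, irf)
```

In an actual fit, the amplitudes and lifetimes would be varied by a non-linear least squares routine until the reduced chi-square and the residuals meet the acceptance criteria.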
The time-resolved fluorescence anisotropy, r(t), is calculated via $I_{VV}(t)$, $I_{VH}(t)$ and G:

$$r(t) = \frac{I_{VV}(t) - G\,I_{VH}(t)}{I_{VV}(t) + 2\,G\,I_{VH}(t)} \qquad (7)$$

In this case, the obtained r(t) can be fitted to a single exponential decay [35]:

$$r(t) = r_0 \exp\left(-t/\tau_c\right) \qquad (8)$$

where $r_0$ is the anisotropy when t→0 and $\tau_c$ is the rotational correlation time. The r(t) decays were globally analyzed in the TRFA Data Processing Package v1.4, maintaining the previously obtained $\alpha_i$ and $\tau_i$ values constant, and convoluting Equations (3) and (4) with the respective IRF(t), analogously to the analysis of $I_m(t)$, using Equation (8).
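As a minimal sketch of the anisotropy calculation, the code below assumes the standard textbook relations G = ΣI_HV/ΣI_HH, I_m = I_VV + 2·G·I_VH and r = (I_VV − G·I_VH)/(I_VV + 2·G·I_VH). It builds synthetic polarized decays from a known single-exponential r(t) and checks that r(t) is recovered. Only G = 1.61 comes from the text; the lifetime, r0 and τc values are hypothetical, and no IRF convolution or noise is included.

```python
import math


def g_factor(i_hv, i_hh):
    """Instrumental G-factor from the integrated HV and HH decays."""
    return sum(i_hv) / sum(i_hh)


def magic_angle_decay(i_vv, i_vh, g):
    """Anisotropy-free total intensity: I_VV + 2 * G * I_VH."""
    return [vv + 2.0 * g * vh for vv, vh in zip(i_vv, i_vh)]


def anisotropy_decay(i_vv, i_vh, g):
    """r(t) = (I_VV - G * I_VH) / (I_VV + 2 * G * I_VH), channel by channel."""
    total = magic_angle_decay(i_vv, i_vh, g)
    return [(vv - g * vh) / tot for vv, vh, tot in zip(i_vv, i_vh, total)]


# Synthetic example with hypothetical parameters (not results from this work).
G = 1.61                    # value reported in the text
r0, tau_c = 0.2, 8.0        # hypothetical limiting anisotropy and correlation time (ns)
tau = 3.0                   # hypothetical single fluorescence lifetime (ns)
times = [0.1 * k for k in range(200)]
i_m = [math.exp(-t / tau) for t in times]
r_true = [r0 * math.exp(-t / tau_c) for t in times]
i_vv = [m * (1.0 + 2.0 * r) / 3.0 for m, r in zip(i_m, r_true)]
i_vh = [m * (1.0 - r) / (3.0 * G) for m, r in zip(i_m, r_true)]

r_recovered = anisotropy_decay(i_vv, i_vh, G)
print(max(abs(a - b) for a, b in zip(r_recovered, r_true)))
```

With real data, channels with very low counts would need to be excluded or weighted, since the denominator approaches zero in the tail of the decay.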

Rotational Correlation Time Corrections
The $\tau_c$ of a molecule in solution is related to the solution viscosity, η, the molecular hydrodynamic volume, V, the Boltzmann constant, $k_B$, and the absolute temperature, T, as [35,70]:

$$\tau_c = \frac{\eta\,V}{k_B\,T} \qquad (9)$$

Based on Equation (9), $\tau_c$ can be corrected for different temperatures, considering that the molecular volume does not change significantly in a small temperature interval (±5 °C; i.e., V and $k_B$ are constants), using [70]:

$$\tau_{c,b} = \tau_{c,a}\,\frac{\eta_b\,T_a}{\eta_a\,T_b} \qquad (10)$$

where the indexes 'a' and 'b' represent different conditions of T and η, taking into account the variation of η with T [37]. The η values were assumed to be those of pure H2O, or of 10% D2O in the case of the corrections for the NMR-based values (those from the literature). Table 5 shows the values employed in the calculations [37].

Table 5. Values for η employed in this work, derived from the references and Equations above.
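The temperature correction can be sketched as follows, assuming the Stokes-Einstein-Debye form τc = ηV/(k_B·T), so that τc at condition 'b' equals τc at condition 'a' multiplied by (η_b/η_a)·(T_a/T_b). The starting τc is hypothetical, and the viscosities are approximate textbook values for pure water, not the entries of Table 5.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K


def rotational_correlation_time(eta, volume, temperature):
    """tau_c = eta * V / (k_B * T), in SI units (Pa s, m^3, K -> s)."""
    return eta * volume / (K_B * temperature)


def hydrodynamic_volume(tau_c, eta, temperature):
    """Invert the same relation to estimate V from a measured tau_c."""
    return tau_c * K_B * temperature / eta


def correct_tau_c(tau_a, eta_a, temp_a, eta_b, temp_b):
    """Transfer tau_c from condition 'a' to 'b', assuming V is unchanged."""
    return tau_a * (eta_b / eta_a) * (temp_a / temp_b)


# Hypothetical example: a tau_c of 10 ns at 25 C converted to 20 C.
tau_25 = 10e-9                          # s, hypothetical value
eta_25, eta_20 = 0.890e-3, 1.002e-3     # Pa s, approximate values for pure water
tau_20 = correct_tau_c(tau_25, eta_25, 298.15, eta_20, 293.15)
print(f"tau_c at 20 C: {tau_20 * 1e9:.2f} ns")
```

The same volume is obtained from either condition, which is the assumption that makes the correction valid only over a small temperature interval.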

Temperature Denaturation Measurements via Circular Dichroism (CD) Spectroscopy
Circular dichroism spectroscopy measurements were carried out in a JASCO J-815 (Tokyo, Japan), using 0.1 cm path length quartz cuvettes, data pitch of 0.5 nm, velocity of 200 nm/min, data integration time (DIT) of 1 s and performing 3 accumulations. Spectra were acquired in the far UV region, between 200 and 260 nm, with 1 nm bandwidth. The temperature was controlled by a JASCO PTC-423S/15 Peltier equipment. It was varied between 0 and 96 °C, in steps of 2 °C, increasing at a rate of 8 °C/min and waiting 100 s after crossing 5 times the target temperature, T. Then, the system was allowed at least 120 s to equilibrate (sufficient time for a stable CD signal). Before and after denaturation, spectra were acquired at 25 °C, to determine the reversibility of thermal denaturation. DENV C monomer concentration was 20 µM in 50 mM KH2PO4, 200 mM KCl, pH 6.0 or pH 7.5, with 220 µL of total volume. Spectra were smoothed through the means-movement method (using 7 points) and normalized to mean residue molar ellipticity, [θ] (in deg cm² dmol⁻¹ Res⁻¹).
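The smoothing and normalization steps can be sketched as below. The moving average is a plain 7-point centered mean with shrinking windows at the edges (the instrument software may treat the edges differently), and the normalization uses the common conversion [θ] = θ(mdeg) / (10 · c · l · n), with c in mol/L, l in cm and n the number of residues. The raw signal of −10 mdeg is hypothetical, and taking n = 100 for the 1-100 construct is an assumption.

```python
def moving_average(values, window=7):
    """Centered moving average; the window shrinks near the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half):min(len(values), i + half + 1)]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert raw ellipticity (mdeg) to deg cm^2 dmol^-1 per residue."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


# Hypothetical raw signal of -10 mdeg under the conditions given in the text
# (20 uM monomer, 0.1 cm path length); n_residues = 100 is an assumption.
mre = mean_residue_ellipticity(-10.0, 20e-6, 0.1, 100)
print(f"[theta] = {mre:.0f} deg cm^2 dmol^-1 Res^-1")
```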
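For the CD temperature denaturation data treatment, we assumed a dimer to monomer denaturation model [71][72][73] in which the folded dimer, $F_2$, separates into unfolded monomers, U, in a single step described by reaction R1:

$$F_2 \rightleftharpoons 2\,U \qquad (\text{R1})$$

In this system, the total protein concentration, $[P_m]$, in monomer equivalents, is described as:

$$[P_m] = 2\,[F_2] + [U] \qquad (11)$$

Hereafter, concentrations are treated as dimensionless, being divided by the standard concentration of 1 M, in order to be at standard thermodynamic conditions. The fractions of monomer in the folded, $f_F$, and unfolded, $f_U$, states are calculated by [71,72]:

$$f_F = \frac{2\,[F_2]}{[P_m]} \qquad (12)$$

$$f_U = \frac{[U]}{[P_m]} \qquad (13)$$

$$f_F + f_U = 1 \qquad (14)$$

and the concentrations of folded dimer and unfolded monomer can be written in terms of $f_U$:

$$[F_2] = \frac{[P_m]\,(1 - f_U)}{2} \qquad (15)$$

$$[U] = f_U\,[P_m] \qquad (16)$$

Then, the equilibrium constant, $K_{eq}$, of R1 is defined in terms of [U] and $[F_2]$, or $f_U$ and $[P_m]$:

$$K_{eq} = \frac{[U]^2}{[F_2]} = \frac{2\,[P_m]\,f_U^2}{1 - f_U} \qquad (17)$$

which can be solved in order to $f_U$, with the only solution in which $f_U \in [0; 1]$ being:

$$f_U = \frac{\sqrt{K_{eq}^2 + 8\,K_{eq}\,[P_m]} - K_{eq}}{4\,[P_m]} \qquad (18)$$

The CD signal at a given temperature, $[\theta]_T$, is the sum of the contributions of the folded and unfolded states, weighted by their fractions:

$$[\theta]_T = f_F\,[\theta]_{T,F} + f_U\,[\theta]_{T,U} \qquad (19)$$

where $[\theta]_{T,F}$ and $[\theta]_{T,U}$ have a variation with T described here by a straight line, with slope $m_i$ and intercept $b_i$ (i can be F or U) [72,74]:

$$[\theta]_{T,i} = m_i\,T + b_i \qquad (20)$$

Equation (19) can be re-written to evidence $f_U$, which is then substituted by Equation (18):

$$[\theta]_T = [\theta]_{T,F} + \left([\theta]_{T,U} - [\theta]_{T,F}\right) f_U \qquad (21)$$

$K_{eq}$ can also be described by the standard Gibbs free-energy, $\Delta G^\circ$, of the reaction R1:

$$K_{eq} = \exp\left(-\frac{\Delta G^\circ}{R\,T}\right) \qquad (22)$$

where R is the ideal gas constant and T is the absolute temperature. The $\Delta G^\circ$ function used to fit the data contains both the enthalpic, $\Delta H^\circ$, and entropic, $\Delta S^\circ$, variations with temperature, which take into account the enthalpy variation at the reference temperature $T^\circ_m$, $\Delta H^\circ_{T^\circ_m}$, and the heat capacity variation, $\Delta C^\circ_p$:

$$\Delta G^\circ = \Delta H^\circ_{T^\circ_m}\left(1 - \frac{T}{T^\circ_m}\right) + \Delta C^\circ_p\left[T - T^\circ_m - T\,\ln\left(\frac{T}{T^\circ_m}\right)\right] \qquad (23)$$

In our data, $\Delta C^\circ_p$ was statistically equal to 0 and, thus, Equation (23) can be simplified to:

$$\Delta G^\circ = \Delta H^\circ_{T^\circ_m}\left(1 - \frac{T}{T^\circ_m}\right) \qquad (24)$$

Then, Equation (21) was combined with Equations (20), (22) and (24), and fitted to the data using GraphPad Prism v5 software, via the non-linear least squares method, to extract both $\Delta H^\circ_{T^\circ_m}$ and $T^\circ_m$. Interestingly, for a dimer to monomer denaturation, $K_{eq}$ depends on $[P_m]$ and, consequently, $\Delta G^\circ$ is not zero at the melting temperature, $T_m$ (the temperature at which $f_U = 0.5$): from Equation (17), $K_{eq} = [P_m]$ at that point, whereas $T^\circ_m$ is the temperature at which $\Delta G^\circ = 0$. The SE of $T_m$ was based on the percentual SE value of $T^\circ_m$.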
Values obtained for both pH conditions were statistically evaluated via F-tests to compare two possible fits, one assuming a given parameter as being different for the distinct data sets, and another assuming that parameter to be equal between data sets (while maintaining the other parameters different). No statistically significant difference was observed (significance threshold of p < 0.05).
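The dimer to monomer model described in this section can be sketched numerically as follows. The code assumes K_eq = 2·[P_m]·f_U²/(1 − f_U) for the reaction F2 ⇌ 2U, K_eq = exp(−ΔG°/RT), and the simplified ΔG° = ΔH°·(1 − T/T°m) obtained when the heat capacity change is zero. The function names, the ΔH° and T°m values and the baseline parameters are hypothetical, and the sketch evaluates the model rather than fitting it.

```python
import math

R_GAS = 8.314462618  # ideal gas constant, J mol^-1 K^-1


def delta_g_std(temp, dh_ref, tm_ref):
    """Standard Gibbs energy with zero heat capacity change."""
    return dh_ref * (1.0 - temp / tm_ref)


def equilibrium_constant(temp, dh_ref, tm_ref):
    return math.exp(-delta_g_std(temp, dh_ref, tm_ref) / (R_GAS * temp))


def fraction_unfolded(k_eq, p_m):
    """Root of 2*P*f^2 + K*f - K = 0 lying in [0, 1].

    Written as 2K / (K + sqrt(K^2 + 8KP)), which is algebraically equal to
    (sqrt(K^2 + 8KP) - K) / (4P) but avoids cancellation when K is large.
    """
    if k_eq == 0.0:
        return 0.0
    return 2.0 * k_eq / (k_eq + math.sqrt(k_eq * k_eq + 8.0 * k_eq * p_m))


def apparent_tm(dh_ref, tm_ref, p_m):
    """Temperature at which f_U = 0.5, i.e. K_eq = [P_m]."""
    return dh_ref / (dh_ref / tm_ref - R_GAS * math.log(p_m))


def ellipticity(temp, dh_ref, tm_ref, p_m, folded_line, unfolded_line):
    """Model CD signal; each baseline is a (slope, intercept) pair."""
    f_u = fraction_unfolded(equilibrium_constant(temp, dh_ref, tm_ref), p_m)
    theta_f = folded_line[0] * temp + folded_line[1]
    theta_u = unfolded_line[0] * temp + unfolded_line[1]
    return theta_f + (theta_u - theta_f) * f_u


# Hypothetical parameters, NOT fitted values from this work.
P_M = 20e-6                  # 20 uM monomer, divided by the 1 M standard state
DH_REF, TM_REF = 300e3, 400.0
tm = apparent_tm(DH_REF, TM_REF, P_M)
print(f"apparent Tm = {tm:.1f} K")
```

Because [P_m] is far below the 1 M standard state, the apparent T_m falls below T°m, which illustrates the concentration dependence noted in the text.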

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.