Looking at the Pathogenesis of the Rabies Lyssavirus Strain Pasteur Vaccins through a Prism of the Disorder-Based Bioinformatics

Rabies is a neurological disease that causes between 40,000 and 70,000 deaths every year. Once a rabies patient has become symptomatic, there is no effective treatment for the illness, and in unvaccinated individuals, the case-fatality rate of rabies is close to 100%. French scientists Louis Pasteur and Émile Roux developed the first vaccine for rabies in 1885. If administered before the virus reaches the brain, the modern rabies vaccine imparts long-lasting immunity to the virus and saves more than 250,000 people every year. However, the rabies virus can suppress the host’s immune response once it has entered the cells of the brain, making death likely. This study aimed to use disorder-based proteomics and bioinformatics to determine the potential impact that intrinsically disordered protein regions (IDPRs) in the proteome of the rabies virus might have on the infectivity and lethality of the disease. This study used the proteome of the Rabies lyssavirus (RABV) strain Pasteur Vaccins (PV), one of the best-understood strains due to its use in the first rabies vaccine, as a model. The data reported in this study are in line with the hypothesis that high levels of intrinsic disorder in the phosphoprotein (P-protein) and nucleoprotein (N-protein) allow them to participate in the creation of Negri bodies and might help this virus to suppress the antiviral immune response in the host cells. Additionally, the study suggests that there could be a link between disorder in the matrix (M) protein and the modulation of viral transcription. The disordered regions in the M-protein might also play a role in initiating viral budding within the cell. Furthermore, we assessed the prevalence of functional disorder in a set of 37 host proteins directly involved in the interaction with the RABV proteins. We hope that these new insights will aid in the development of treatments for rabies that are effective after infection.


Introduction
Rabies lyssavirus is a bullet-shaped, negative-sense, unsegmented, single-stranded RNA virus of the Rhabdoviridae family. There are 10 viruses in the Rabies serogroup, but most are not pathogenic to humans. Rabies lyssavirus and Australian bat lyssavirus are the only two rhabdoviruses that have been known to cause disease in humans [1]. The rabies virus (RABV) is a zoonotic neurotropic virus that causes fatal neurological symptoms in almost all mammals and is spread through the bite of an infected mammal. Rabies causes between 40,000 and 70,000 deaths every year worldwide. Once a rabies patient has become symptomatic, there is no effective treatment for the illness. In fact, in unvaccinated individuals, the case-fatality rate of rabies is close to 100%.
Figure 1. In mature RABV, the nucleoprotein, phosphoprotein, and viral polymerase envelop the genomic RNA in a structure known as the ribonucleocapsid (RNP). The matrix protein surrounds the RNP and determines the shape of the virus. The matrix protein also anchors the glycoprotein to the envelope [10] (original source of the image: Philippe Le Mercier, SIB Swiss Institute of Bioinformatics).
Structure-less intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are commonly found in various proteomes [34][35][36][37][38], where they are involved in regulation, signaling, and control pathways [27,30,32,39,40,41], thereby possessing functions that complement the functional repertoire of traditional ordered proteins [42][43][44][45][46][47]. IDPs/IDPRs are often involved in various human diseases [48,49]. They have complex mosaic structures and show remarkable multi-level spatiotemporal heterogeneity, existing as dynamic conformational ensembles [25,27,31,40,42,50,51] in which different parts of a protein can be (dis)ordered to different degrees [40,52,53]. Importantly, these differently (dis)ordered pieces of the protein structural mosaic might have well-defined and specific functions [53].
Therefore, IDPs/IDPRs are structurally and functionally heterogeneous complex systems whose functionality is described in terms of the protein structure-function continuum [53,54], in which the structural and functional diversification of a protein is defined by several factors that enable a single gene to encode a set of distinct protein molecules, known as proteoforms [55]. This is achieved at several levels by altering the chemical structure of the proteinaceous product(s) of a given gene: via allelic variations at the DNA level (such as single or multiple point mutations, indels, and SNPs), via alternative splicing and other pre-translational mechanisms affecting mRNA, via a broad arsenal of post-translational modifications (PTMs) of a polypeptide chain, through the presence of intrinsic disorder, or through structural alterations induced by functioning [54].
Importantly, based on computational analyses of the abundance of intrinsic disorder in various organisms, it has been concluded that the proteomes of viruses have the largest variability in the content of disordered residues in comparison with all other kingdoms of life [37,38]. The abundance and functional importance of IDPs/IDPRs have been systematically investigated for human papillomaviruses (HPVs) [56,57], human immunodeficiency virus 1 (HIV-1) [58], human hepatitis C virus (HCV) [59,60], Dengue virus [61], rotavirus [62], human respiratory syncytial virus [63], Zika virus [64,65], Chikungunya virus [66], Alkhurma virus (ALKV) [67], Japanese encephalitis virus [68], and SARS-CoV-2, human SARS, and bat SARS-like coronaviruses [69]. These studies suggested that the presence of IDPRs in viral proteins is crucial for their functionality and represents an important means of the overall enhancement of viral propagation during the virus life cycle. To the best of our knowledge, there is no similar analysis of the disorder status of the RABV proteome. The aim of this study was to fill this gap by conducting a comprehensive computational analysis of the penetrance of intrinsic disorder in the proteins of the Rabies lyssavirus strain Pasteur Vaccins proteome.

Materials and Methods
The Universal Protein Resource (UniProt) is an annotated database of protein sequences [70]. Its entries are either manually annotated with information extracted from the literature and curator-evaluated computational analysis (Swiss-Prot) or automatically annotated (TrEMBL). The proteins of the Rabies lyssavirus strain Pasteur Vaccins proteome are Swiss-Prot (reviewed) entries. Each entry contains information including, but not limited to, the taxonomy, molecular function, the biological processes involved, subcellular location, potential modifications, pathology, interactions, and structures. Of these, the amino acid sequences and the structures are the most valuable resources for studying intrinsic disorder in the virus. A set of 37 human proteins involved in interactions with the RABV proteins was assembled through a literature search, with the majority of the data retrieved from [71]. Amino acid sequences and basic disorder-related features of these proteins are provided in Supplementary Figures S1-S5.
The amino acid sequences can be analyzed through various methods to identify regions in the protein that have a predisposition towards intrinsic disorder. Comparing these regions with the known functions of the protein then allows an analysis of its disorder-based functionality. Structural information is important for mapping the predicted disordered regions onto the protein. Because intrinsically disordered regions are flexible, they often lack interpretable electron density and are therefore missing from crystal structures. Regions of the protein that are absent from the crystal structure can thus be indicative of intrinsic disorder [72].
For this study, amino acid FASTA sequences for the Rabies lyssavirus PV proteome were gathered from UniProt. The sequences were run through a series of disorder prediction tools to estimate the intrinsic disorder of each residue, and these predictions were averaged to form an overall per-residue prediction of intrinsic disorder. Table 1 summarizes these predictions, including the UniProt entry ID of each protein analyzed, the protein length, and the length of the longest disordered region within the protein. Overall disorder was calculated from the proportion of residues falling within regions of high disorder. The FASTA sequences used in this study, taken from UniProt [70], are reproduced in the Supplementary Materials, where regions of high intrinsic disorder (intrinsic disorder of more than 30%) are shown in bold underlined text. The per-residue intrinsic disorder propensity of each protein was calculated by averaging the outputs of the PONDR® VL3, PONDR® VSL2, PONDR® VLXT, PONDR® FIT, IUPred_Short, and IUPred_Long predictors. The Predictor of Natural Disordered Regions (PONDR®) family comprises tools that analyze an amino acid FASTA sequence on a per-residue basis to predict regions of intrinsic disorder. This study made use of the PONDR® VLXT, PONDR® VL3, PONDR® VSL2, and PONDR® FIT predictors from the PONDR family, as well as IUPred_Short and IUPred_Long [73][74][75]. PONDR® VLXT applies three different neural networks: one for each terminus of the sequence and one for its internal region. Each network is trained on a specific dataset that contains only the amino acid residues present in that region, and the final prediction combines the outputs of the three networks.
Transitions between the prediction networks are handled by averaging the predictions in a short region of overlap at the boundary between them. PONDR® VLXT is useful for predicting short regions of disorder but underestimates the occurrence of long disordered regions. PONDR® VL3 runs the sequence through ten neural networks and selects the final prediction by simple majority vote. This predictor is known to be useful for predicting longer regions of intrinsic disorder. PONDR® VSL2 combines neural network predictors for short and long disordered regions. The networks are trained using sequences of specific lengths, and the final prediction is a weighted average of the predictions for each length. Because it accounts for both short and long disordered regions, it is considered the most accurate of the three [73][74][75].
IUPred is based on the assumption that globular, structured proteins form a larger number of favorable inter-residue interactions than disordered proteins do, which gives them lower (more negative) estimated free energies. IUPred uses this biophysics-based approach to estimate disorder by calculating the pairwise interaction energy along the sequence [74,75]. IUPred_Long predicts global structural disorder, i.e., disordered regions of more than 30 consecutive residues. IUPred_Short is useful for predicting short disordered regions, such as those corresponding to missing residues in the X-ray structure of a largely globular protein. ANCHOR2 is used to predict context-dependent intrinsic disorder. Such disorder may occur when the binding region of an IDPR interacts specifically with a globular protein; when bound, these regions adopt an ordered structure. Context-dependent disorder may also arise from a change in the redox state, so that such regions may change their disorder depending on their subcellular localization. For all query proteins, the presence of such context-dependent disordered regions, disorder-based binding regions, and molecular recognition features (MoRFs), i.e., disordered regions that fold upon interaction with partners, was analyzed by the ANCHOR algorithm [76,77].
The outputs of the per-residue predictors were averaged, and proteins were grouped based on their percentages of predicted intrinsically disordered residues (PPIDR) using accepted classification criteria [87]. Proteins with an average content of intrinsically disordered residues below 10% are considered ordered or mostly ordered. Proteins containing between 10% and 30% predicted disordered residues are considered moderately disordered. Proteins containing more than 30% predicted disordered residues are considered highly disordered [87].
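The averaging and classification steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the score lists are invented toy values, the 0.5 per-residue cut-off is the conventional disorder threshold, and counting a value of exactly 30% as moderate is our own choice.

```python
from statistics import mean

# Hypothetical per-residue scores for a six-residue toy sequence (0 = ordered,
# 1 = disordered). Real values would be parsed from each predictor's output.
SCORES = {
    "PONDR_VL3":    [0.9, 0.8, 0.4, 0.2, 0.6, 0.7],
    "PONDR_VSL2":   [0.8, 0.7, 0.5, 0.3, 0.7, 0.9],
    "PONDR_VLXT":   [0.7, 0.6, 0.3, 0.1, 0.4, 0.8],
    "PONDR_FIT":    [0.8, 0.7, 0.4, 0.2, 0.5, 0.8],
    "IUPred_short": [0.6, 0.5, 0.2, 0.1, 0.4, 0.7],
    "IUPred_long":  [0.7, 0.6, 0.3, 0.2, 0.5, 0.8],
}


def consensus(scores):
    """Mean disorder score of each residue across all predictors."""
    if len({len(v) for v in scores.values()}) != 1:
        raise ValueError("all predictors must score the same sequence")
    return [mean(column) for column in zip(*scores.values())]


def ppidr(avg_scores, threshold=0.5):
    """Percentage of residues whose mean score exceeds the threshold."""
    return 100.0 * sum(s > threshold for s in avg_scores) / len(avg_scores)


def disorder_class(percent):
    """Bin a PPIDR value with the 10% and 30% cut-offs."""
    if percent < 10:
        return "ordered or mostly ordered"
    if percent <= 30:  # assumption: a value of exactly 30% is counted as moderate
        return "moderately disordered"
    return "highly disordered"


avg = consensus(SCORES)
value = ppidr(avg)
print([round(s, 2) for s in avg], round(value, 1), disorder_class(value))
```

For example, a protein with 20% predicted disordered residues would fall into the moderately disordered group, whereas this toy sequence (four of six residues above 0.5) is classed as highly disordered.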
Complementary disorder evaluations, together with important disorder-related functional information, were retrieved from the D2P2 database (http://d2p2.pro/, accessed on 15 August 2022) [88], a database of predicted disorder for a large library of proteins from completely sequenced genomes. The D2P2 database uses the outputs of IUPred [74,82], PONDR® VLXT [79], PrDOS [89], PONDR® VSL2B [81], PV2 [88], and ESpritz [90]. The visual console of D2P2 displays nine colored bars representing the locations of disordered regions, as predicted by these different disorder predictors. In the middle of the D2P2 plots, the blue-green-white bar shows the predicted disorder agreement among the nine disorder predictors (IUPred in its short and long modes, PONDR® VLXT, PONDR® VSL2B, PrDOS, PV2, and three versions of ESpritz), with blue and green parts corresponding to disordered regions by consensus. Above the disorder consensus bar are two lines with colored and numbered bars that show the positions of the predicted (mostly structured) SCOP domains [91,92] identified using the SUPERFAMILY predictor [93]. The yellow zigzagged bar shows the location of the predicted disorder-based binding sites (MoRF regions) identified by the ANCHOR algorithm [76], whereas differently colored circles at the bottom of the plot show the locations of various PTMs assigned using the outputs of the PhosphoSitePlus platform [94], a comprehensive resource for experimentally determined post-translational modifications.
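The idea of a consensus bar, i.e., how many predictors agree that a residue is disordered, can be illustrated with a short sketch. The binary calls below are invented toy values, and the 75% agreement cut-off is a hypothetical choice made for illustration; it is not D2P2's documented rule.

```python
# Hypothetical binary calls (1 = disordered, 0 = ordered) from nine predictors
# for a five-residue toy sequence; these are not real D2P2 outputs.
CALLS = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
]


def agreement(calls):
    """Fraction of predictors that call each residue disordered."""
    n = len(calls)
    return [sum(column) / n for column in zip(*calls)]


def consensus_disordered(calls, min_fraction=0.75):
    """1-based positions where at least min_fraction of predictors agree (cut-off assumed)."""
    return [i for i, f in enumerate(agreement(calls), start=1) if f >= min_fraction]


print(consensus_disordered(CALLS))
```

Raising `min_fraction` makes the consensus stricter; with these toy calls, residue 2 (7 of 9 predictors) drops out at a cut-off of 0.8.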
Information on the interactability of human proteins that interact with RABV proteins was retrieved using the Search Tool for the Retrieval of Interacting Genes (STRING, http://string-db.org/, accessed on 15 August 2022). STRING generates a network of protein-protein interactions (PPIs) based on predicted and experimentally validated information on the interaction partners of a protein of interest [95]. In the corresponding network, the nodes correspond to proteins, whereas the edges show predicted or known functional associations. Seven types of evidence are used to build the network, indicated by differently colored lines: green for neighborhood evidence, red for fusion evidence, purple for experimental evidence, blue for co-occurrence evidence, light blue for database evidence, yellow for text-mining evidence, and black for coexpression evidence [95]. In this study, STRING was utilized in three different modes: to create PPI networks centered on individual human proteins, to generate the internal PPI network among the human proteins involved in interactions with the RABV proteins, and to build a PPI network centered on the entire set.
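The difference between the internal network and the set-centered network can be sketched as two simple edge filters. This is a simplified illustration, not STRING's algorithm: the edge list is invented for the example (PARTNER_A and PARTNER_B are hypothetical placeholders), and the four-protein query set is a toy stand-in for the 37 human proteins.

```python
from collections import Counter

# Illustrative edge list only; real nodes, edges, and evidence channels would be
# exported from STRING. PARTNER_A and PARTNER_B are hypothetical placeholders.
EDGES = [
    ("STAT1", "STAT2"),
    ("DYNLL1", "DYNLL2"),
    ("STAT1", "PARTNER_A"),
    ("DYNLL2", "PARTNER_B"),
    ("PARTNER_A", "PARTNER_B"),
]
# Toy stand-in for the set of 37 human proteins that interact with RABV proteins.
QUERY_SET = {"STAT1", "STAT2", "DYNLL1", "DYNLL2"}


def internal_network(edges, query):
    """Interactions in which both partners belong to the query set."""
    return [(a, b) for a, b in edges if a in query and b in query]


def set_centered_network(edges, query):
    """Interactions in which at least one partner belongs to the query set."""
    return [(a, b) for a, b in edges if a in query or b in query]


def degrees(edges):
    """Number of interactions per protein (node degree)."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


print(internal_network(EDGES, QUERY_SET))
print(degrees(set_centered_network(EDGES, QUERY_SET)))
```

The internal network keeps only links among the query proteins, whereas the set-centered view also retains their outside partners, which is why it grows faster as the query set expands.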
All computer-generated structural models of the RABV proteins analyzed in this study were generated using SWISS-MODEL [97], hosted on the ExPASy server. The 3D structural models of human proteins that interact with the RABV proteins were generated by AlphaFold [98].

Predicted Disorder of the P-Protein and Its Suggested Functional Consequences
The P-protein of RABV, a 297-residue-long catalytic polymerase cofactor and regulatory protein that plays an important role in viral transcription and replication, was shown to display a significant amount of intrinsic disorder (see Figure 2). In fact, based on the PONDR® VSL2 outputs, roughly 49% of the protein residues were predicted to be intrinsically disordered (i.e., to have disorder scores exceeding the 0.5 threshold). Furthermore, almost 46% of its residues were predicted to be flexible (i.e., to possess disorder scores ranging from 0.15 to 0.5), indicating that about 95% of this protein is expected to be either disordered or structurally flexible. Figure 2A also shows that the P-protein contains two long intrinsically disordered regions, IDD1 (residues 33-89) and IDD2 (residues , that flank an oligomerization domain, and a short disordered/flexible region (residues 242-252) located within the mostly ordered C-terminal domain (PCTD, residues 201-297). Negri bodies, the cytoplasmic viral inclusion bodies in which viral replication takes place, were shown to form via the interaction of the P-protein oligomerization domain, IDD2, and the PCTD [99] with the intrinsically disordered regions of the N-protein, whereas the N-terminal part and IDD1 of the P-protein are dispensable [99,100].
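The three-way split used above (disordered above 0.5, flexible from 0.15 to 0.5, ordered below 0.15) can be expressed as a small function. This is a sketch under assumptions: the ten scores are invented toy values rather than the real PONDR® VSL2 profile of the P-protein, and both boundaries of the flexible range are treated as inclusive.

```python
def residue_classes(scores, flexible_min=0.15, disorder_min=0.5):
    """Percentages of disordered (> disorder_min), flexible
    (flexible_min to disorder_min, inclusive) and ordered (< flexible_min) residues."""
    n = len(scores)
    disordered = sum(s > disorder_min for s in scores)
    flexible = sum(flexible_min <= s <= disorder_min for s in scores)
    ordered = n - disordered - flexible
    return {
        "disordered": 100.0 * disordered / n,
        "flexible": 100.0 * flexible / n,
        "ordered": 100.0 * ordered / n,
    }


# Hypothetical scores for ten residues (not the real P-protein profile).
toy = [0.9, 0.6, 0.3, 0.2, 0.1, 0.05, 0.55, 0.45, 0.7, 0.12]
print(residue_classes(toy))
```

Summing the first two percentages gives the share of residues expected to be either disordered or flexible; for the P-protein, 49% + 46% yields the roughly 95% quoted above.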
High levels of intrinsic disorder in the RABV P-protein are further evidenced by the analysis of its X-ray crystal structure (PDB ID: 3OA1). In fact, although the 69-297 fragment of this protein was used in the crystallization experiments, the structure was eventually solved for less than half of this construct (residues 192-297), with the entire N-terminal half (residues 69-191) representing a region with missing electron density (i.e., highly flexible or disordered region). Furthermore, even within the solved structure of the C-terminal domain of the P-protein, some short regions were not modeled or incompletely modeled as well (residues 220-221, 231, 272-273, and 296-297) (PDB ID: 3OA1).
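The statement that less than half of the crystallized construct was resolved is simple residue arithmetic, sketched below. The ranges come from the text; ignoring the short internal gaps (residues 220-221, 231, 272-273, and 296-297) is a simplifying assumption, because some of them are only incompletely modeled rather than absent.

```python
def seg_len(start, end):
    """Number of residues in an inclusive range."""
    return end - start + 1


def coverage(construct, modeled):
    """Fraction of a crystallized construct that is present in the model."""
    return sum(seg_len(*seg) for seg in modeled) / seg_len(*construct)


construct = (69, 297)   # fragment used in the crystallization experiments (3OA1)
modeled = [(192, 297)]  # solved C-terminal part; short internal gaps ignored here
missing_n_terminal = seg_len(69, 191)

print(round(coverage(construct, modeled), 3), missing_n_terminal)
```

Here 106 of the 229 residues in the construct are modeled (about 46%), and the 123-residue N-terminal segment accounts for the missing electron density.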
Importantly, as per the manually asserted information inferred from sequence similarity and available in the UniProt database (https://www.uniprot.org/uniprotkb/P06747/entry, accessed on 15 August 2022), most of the IDD2 region of the RABV P-protein is expected to be engaged in binding to cytoplasmic dynein light chains 1 and 2 (DYNLL1 and DYNLL2, see below). Furthermore, ANCHOR analysis [76,77] suggested that the P-protein contains three potential disorder-based binding sites, known as molecular recognition features (MoRFs), which are disordered regions expected to fold upon interaction with specific partners, thereby driving protein-protein interactions. These are residues 34-90, 124-136, and 166-190. Therefore, these segments (shown as gray-shaded areas in Figure 2A) might correspond to ligand-binding sites of the RABV P-protein (see below).
Additionally, the high disorder and structural flexibility content of the RABV P-protein suggests that the mechanism by which this protein suppresses the type-I-interferon-mediated immune response may rely on its disordered (or flexible) regions for interaction with cellular STAT proteins. In fact, although the RABV P-protein binding site responsible for the STAT1/2 interaction is located within the ordered C-terminal domain (CTD, residues 186-297 [16,17,101,102,103,104]), the residues that make the greatest contribution to this interaction and form the so-called W-hole (C261, W265, and M287) were predicted to be flexible by at least one of the disorder predictors used in this study. The highest level of structural flexibility is expected for residue M287, which is 100% conserved among most lyssavirus P-proteins [16] and shows a mean disorder score of 0.31 ± 0.12 (ranging from 0.16 to 0.5, as per the outputs of IUPred_Short and PONDR® VSL2, respectively; see Figure 2A).
Similarly, the positive patch (residues K211, K212, K214, and R260), which is 100% conserved in the lyssavirus P-proteins and known to be responsible for interaction with the N-protein [16], is predicted to be flexible/disordered as well (see Figure 2A). This again suggests that structural flexibility may play a role in the interactions of the ordered CTD with partner proteins, including STAT1 and STAT2. It has been pointed out that, because of these CTD-driven interactions of the RABV P-protein (CTD, residues 186-297 [101][102][103][104]) with host STAT proteins, the P-protein represents the major interferon antagonist of the lyssavirus, thereby affecting the type-I interferon (IFNα/β)-mediated innate immune response [16]. The interaction of the RABV P-protein with STATs has also been reported to be crucial for the development of lethal rabies [16].
Another level of structural and functional complexity of this protein is given by the fact that it has multiple isoforms generated by alternative translation initiation. Alternative initiation generates isoforms P2, P3, P4, and P5, which differ from the canonical isoform P1 by lacking N-terminal residues 1-19, 1-52, 1-68, and 1-82, respectively. An obvious consequence of this truncation is the elimination of the first MoRF region of P1, suggesting that these isoforms might differ in their interactability. Curiously, although P3, P4, and P5 have all lost an N-terminal MoRF as expected, P2 was predicted to behave differently: despite missing the N-terminal residues predicted to be the MoRF in P1, this isoform gained three new N-terminal MoRFs (residues 1-14, 19-27, and 30-38).
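Because each isoform numbers its residues from its own first methionine, comparing MoRFs across isoforms requires converting coordinates. The sketch below does only this bookkeeping, assuming that each isoform is an exact N-terminal truncation of P1 by the number of residues listed above. It does not predict MoRFs: whether a retained segment is still recognized as a MoRF depends on rerunning ANCHOR on each isoform's sequence, since the predictions depend on sequence context.

```python
# Residues missing from the N-terminus of each isoform relative to P1 (from the text).
# Assumption: each isoform is an exact N-terminal truncation of P1.
N_TRUNCATION = {"P1": 0, "P2": 19, "P3": 52, "P4": 68, "P5": 82}


def to_p1_numbering(isoform, start, end):
    """Convert an isoform-numbered residue range into P1 numbering."""
    shift = N_TRUNCATION[isoform]
    return start + shift, end + shift


def retained_in_isoform(isoform, start, end):
    """Part of a P1-numbered range still present in the isoform (P1 numbering), or None."""
    first_present = N_TRUNCATION[isoform] + 1
    if end < first_present:
        return None
    return max(start, first_present), end


P1_MORF_1 = (34, 90)  # first ANCHOR-predicted MoRF of P1
for iso in ("P2", "P3", "P4", "P5"):
    print(iso, retained_in_isoform(iso, *P1_MORF_1))
```

Applied to P2, the conversion places its MoRFs at residues 1-14, 19-27, and 30-38 at P1 residues 20-33, 38-46, and 49-57, respectively.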
The functional diversity of the RABV P-protein is further increased by the phosphorylation of serine residues S63 and S64 by an unknown kinase (denoted rabies virus protein kinase (RVPK)) and of residues S162, S210, and S271 by protein kinase C (PKC) [105], all located within IDPRs. A recent study revealed that phosphorylation of the P3 isoform of the RABV P-protein at position S210 resulted in a significant reduction in the nuclear localization of P3, which is modulated by its microtubule (MT) binding/bundling [106].
Therefore, owing to the presence of regions with high intrinsic disorder content, several MoRFs, several phosphorylation sites, and the use of alternative initiation, the P-protein can serve many roles within the virus [10]. Furthermore, the P-protein isoforms were shown to differ in nucleocytoplasmic localization and microtubule (MT) association, mediated by several functional motifs, including the nuclear localization sequence (NLS, residues 211-214) and the N- and C-terminally located nuclear export sequences (N-NES and C-NES, residues 49-58 and 223-232, respectively) [107][108][109]. For example, shorter isoforms (P3 to P5) lacking the N-terminally located NES are more nuclear and are capable of binding and bundling MTs [107]. As per the outputs of PONDR® VSL2, N-NES and NLS are located within IDPRs (residues 33-89 and 208-216, respectively; see Figure 2A). Figure 2A shows that intrinsic disorder is unevenly distributed within the P-protein sequence: it is preferentially concentrated in the N-terminal and central regions (residues 1-200), whereas the C-terminal domain is predicted to possess a more ordered structure. In agreement with the computational evaluation of the intrinsic disorder predisposition of this protein, a crystal structure was solved for residues 192-295 (see Figure 2B) (PDB ID: 3OA1). Curiously, as already indicated, although a much longer fragment of the P-protein (residues 69-297) was used in the crystallization experiments, a very significant part of this polypeptide was not observed in the resulting structure, representing regions of missing electron density, i.e., regions whose high conformational flexibility prevents them from being resolved in the crystal.
Since the P-protein participates in the formation of Negri bodies [99] and can bind PML bodies [110], we also checked the liquid-liquid phase separation (LLPS) potential of this protein using FuzDrop. This analysis revealed that the longest isoform (P1) of the P-protein is characterized by pLLPS = 0.5276 and contains a droplet-promoting region (DPR, residues 134-184) located within the long central IDPR of this protein (residues . A recent systematic analysis revealed that this region indeed plays a crucial role in the formation of viral Negri body (NB)-like structures in infected cells [100]. Furthermore, it was shown that the deletion of residues 151-181 did not affect the ability of the RABV P-protein to form NB-like structures, indicating that only the amino-terminal part of IDD2 (residues 132-150) is required for this process [100].

Disorder of the M-Protein and Its Suggested Functional Consequences
The M-protein is a 202-residue-long protein that plays a crucial role in the assembly and budding of the virion and completely covers the ribonucleoprotein coil to keep it in a condensed, bullet-shaped form (see Figure 1). It was found to be highly disordered as well, with 43% of this protein being composed of IDPRs (see Figure 3A). This suggests that the interactions of the M-protein with RelAp43 and other proteins of the NF-κB pathway [23] may suppress the antiviral response by exploiting some advantages of intrinsic disorder. This is in line with the results of several studies on the potential roles of intrinsic disorder in proteins that form the shells of several viruses (such as SARS-CoV-2, MERS-CoV, SARS-CoV, other CoVs, Nipah, Zika, HIV, and retroviruses) in viral transmission, immune evasion, and virulence [112][113][114][115][116][117][118][119][120]. This intrinsic disorder may also increase the flexibility of the M-protein and thereby aid in the regulation of virus budding. The M-protein has been shown to create vesicles without any interaction with other viral proteins, suggesting that its flexibility allows it to induce budding by itself [3,13]. Figure 3A presents the disorder profile of the MATRX_RABVP protein and shows that the middle of the protein, spanning roughly residues 50 to 175, displays, on average, low disorder content and likely represents the structural domain of the protein, which nonetheless includes some disordered/flexible regions.
Curiously, region 115-151, which is essential for binding to the glycoprotein (as per manually asserted information inferred from sequence similarity and available in the UniProt database; see https://www.uniprot.org/uniprotkb/P08671/entry, accessed on 15 August 2022), includes an IDPR (residues 129-141, as per PONDR® VLXT, or residues 130-137, as per PONDR® VSL2), indicating that the intrinsic disorder (or structural flexibility) of this region can contribute to its interactability. Furthermore, the M-protein contains a PPxY motif (residues 35-38), which is commonly found in viral proteins capable of manipulating the autophagic machinery to prevent the autophagic degradation of viruses [121]. This PPxY motif is included in the long N-terminal IDPR (residues 1-48) (see Figure 3A).
Although no structural information is currently available for the M-protein of RABV, we used SWISS-MODEL (https://swissmodel.expasy.org/, accessed on 15 August 2022) to create homology models for this protein. Figure 3C shows a model for the 30-202 fragment of the RABV M-protein that was created using the structure of the Lagos bat virus matrix protein (PDB ID: 2W2S; [122]; UniProt ID: Q6JAM6) as a template, which shares 76.73% sequence identity with the query M-protein. A comparison of Figure 3A,B illustrates the remarkable similarity between the per-residue intrinsic disorder predispositions of the Lagos bat virus matrix protein and the RABV M-protein, thereby providing disorder-based validation of the choice of the Lagos bat virus matrix protein as a template.
Finally, FuzDrop analysis indicated that although the M-protein shows a low probability of spontaneous LLPS (pLLPS = 0.2220), this protein contains one C-terminally located DPR (residues 199-202), which is included in the IDPR (residues 182-202), indicating that the M-protein can act as a droplet client. FuzDrop also identified regions 16-29, 121-131, and 172-190 as regions with context-dependent interactions [111].

Disorder of the N-Protein and Its Suggested Functional Consequences
The 450-residue-long N-protein is a viral RNA-binding protein that encapsulates the genome at a ratio of one N-protein per nine ribonucleotides. The long C-terminal IDPR and several shorter IDPRs make up roughly 31% of the protein (see Figure 4A). The N-protein, the most abundantly transcribed viral protein during infection [123], has been shown to encapsulate the viral genomic RNA to protect it from nucleases and to form a complex with the P-protein during replication [124]. The RABV P-protein binds to the RNA-free N0-protein through its N-terminus [100,125], which is predicted to be highly intrinsically disordered (see Figure 2A), whereas the N-terminal part of the P-protein CTD is responsible for its interaction with the RNA-bound N-protein (see below). The complex formed during replication, called a Negri body, is an inclusion in the host's cytoplasm that is formed from interactions of the highly disordered C-terminal region of the N-protein with the intrinsically disordered regions of the P-protein [127]. The complex prevents non-specific RNA binding and the phosphorylation of the RNA [6]. The N-protein is predicted to have one MoRF (residues 406-411) located within the long C-terminal IDPR, which also includes a phosphoserine at position 389.
A large region in the middle of the protein with little disorder suggests that the protein serves a largely struc- The complex formed during replication, called a Negri body, is an inclusion in the host's cytoplasm that is formed from interactions of the highly disordered C-terminal region of the N-protein with the intrinsically disordered regions of the P-protein [127]. The complex prevents non-specific RNA binding and the phosphorylation of the RNA [6]. The N-protein is predicted to have one MoRF (residues 406-411) located within the long C-terminal IDPR, which also includes a phosphoserine at position 389. A large region in the middle of the protein with little disorder suggests that the protein serves a largely structural purpose. The N-protein also functions to prevent the activation of the antiviral innate immune response receptor RIG-I-mediated antiviral response [19,20]. Although the actual information on the molecular mechanisms of this suppression is currently unavailable, it is tempting to hypothesize that the MoRF-containing IDPR found in the N-protein may serve to aid in suppressing the antiviral response. Figure 4A shows the results of the disorder prediction for the NCAP_RABVP protein. The crystal structure of the N-protein from the RABV strain ERA (which is 99.11% identical to the N-protein from the RABV strain PV) in complex with RNA was solved (PDB ID: 2GTT; [126]). Figure 4B shows that in this RABV nucleoprotein-RNA complex, the N-proteins are organized in an undecameric ring. In such a complex, the two core domains of the nucleoprotein clamp around the RNA at their interface and shield it from the environment [126]. Polymerization of the nucleoprotein is achieved by domain exchange between protomers, with flexible hinges allowing nucleocapsid formation. 
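The IDPRs discussed here are contiguous stretches of residues predicted to be disordered. As a minimal sketch (not the exact pipeline behind Figure 4A), the following Python function extracts such stretches from a per-residue disorder profile. It assumes that a residue counts as disordered when its score is above 0.5, and the `min_length` parameter for ignoring very short runs is a hypothetical addition; residues are numbered from 1 to match the ranges quoted in the text.

```python
from typing import List, Optional, Sequence, Tuple

def find_idprs(scores: Sequence[float], threshold: float = 0.5,
               min_length: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) runs with score > threshold.

    `threshold` and `min_length` are illustrative assumptions, not the
    exact criteria used to define IDPRs in this study.
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos  # a disordered run begins here
        elif start is not None:
            if pos - start >= min_length:  # the run covers start..pos-1
                regions.append((start, pos - 1))
            start = None
    # Close a run that extends all the way to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions

# Toy profile (hypothetical values, not real predictor output)
profile = [0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.55, 0.3]
print(find_idprs(profile))                # [(1, 2), (5, 7)]
print(find_idprs(profile, min_length=3))  # [(5, 7)]
```

Real predictors such as PONDR® VSL2 return one score per residue, so the same function applies directly to their output once it is loaded as a list of floats.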
The high structural flexibility and pliability of the C-terminal region of the protein, which is predicted to be highly intrinsically disordered, likely allow it to cover bound RNA more tightly. As a result, the nucleoprotein and the RNP core can adopt different conformations at different stages of the viral cycle [127]. This important observation is illustrated in Figure 4C, which shows the structure of the N-protein protomer and demonstrates the presence of two "arms" (residues 6-28 and 349-414). Importantly, the C-terminal arm contains a MoRF. In addition to these functional arms, each protomer has several regions of missing electron density (residues 1-5, 104-117, 186-188, 374-397, and 449-450). Furthermore, according to the FuzDrop analysis, the N-protein contains four regions with context-dependent interactions (residues 105-115, 284-318, 367-396, and 398-411), which overlap, include, or are included in disordered/flexible regions of this protein (residues 103-111, 294-303, 361-400, and 403-428) (see Figure 4A).
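Statements such as "overlap, include, or are included in" disordered regions reduce to simple interval arithmetic on residue ranges. The Python sketch below checks them for the N-protein ranges quoted above and for the M-protein DPR (residues 199-202) inside its IDPR (residues 182-202). It treats every range as 1-based and inclusive; the helper names are hypothetical and do not correspond to any FuzDrop output format.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # 1-based, inclusive residue range

def overlaps(a: Region, b: Region) -> bool:
    """True if the two inclusive ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]

def contains(outer: Region, inner: Region) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def unmatched(query: List[Region], reference: List[Region]) -> List[Region]:
    """Query regions that share no residue with any reference region."""
    return [q for q in query if not any(overlaps(q, r) for r in reference)]

# N-protein ranges as quoted in the text
context_dependent = [(105, 115), (284, 318), (367, 396), (398, 411)]
disordered = [(103, 111), (294, 303), (361, 400), (403, 428)]

print(unmatched(context_dependent, disordered))  # [] -> every region overlaps
print(contains((361, 400), (367, 396)))          # True
print(contains((182, 202), (199, 202)))          # True (M-protein DPR in IDPR)
```

Overlap is the weakest of the three relations, so any pair that satisfies "include" or "is included in" also passes `overlaps`.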
Although, based on the FuzDrop analysis, the N-protein is expected to have a low probability of spontaneous LLPS (pLLPS = 0.1405) and does not include any DPRs, this RNA-binding protein is invariantly present in Negri bodies (NBs) [100]. The involvement of the N-protein in NB biogenesis is likely linked to its ability to bind both viral RNA and the P-protein. In fact, no NBs were found when limiting concentrations of either of these proteins were expressed in model experiments [100]. Furthermore, even when the RABV P- and N-proteins were expressed alone (i.e., without viral RNA), they were capable of forming NB-like structures [100]. These observations indicate that in the resulting N-P inclusions, the RABV N-protein is likely bound to cellular RNAs, forming N-RNA complexes similar to viral nucleocapsids [125]. Different regions of the P-protein are known to be used in the interaction with the N-protein: the disordered N-terminal domain interacts with the RNA-free N⁰-protein, whereas the P-protein CTD binds to the RNA-associated N-protein [100,125]. As already indicated, the positive patch within the N-terminal region of the P-protein CTD that is responsible for the interaction with the RNA-associated N-protein contains flexible/disordered residues, further emphasizing the potential role of structural disorder/flexibility in NB biogenesis.

Disorder of the G-Protein and Its Suggested Functional Consequences
The G-protein is a 524-residue-long type I transmembrane protein with a long extravirion region (residues 20-459), a transmembrane helix (residues 460-480), and an intravirion domain (residues 481-524). Being located on the surface of RABV particles, the G-protein controls receptor binding and the release of the viral ribonucleoprotein (RNP) into the cytoplasm via pH-dependent membrane fusion, thereby playing a crucial role in cell entry and in vivo spread [128]. Furthermore, it was shown that the G-protein (in particular, its ectodomain) accumulates adaptive mutations that improve the release of infectious viral particles [129].
The G-protein is synthesized as a precursor with an N-terminal signal peptide (residues 1-19), which is removed during the maturation of this protein. Similar to the envelope proteins of other viruses, the RABV G-protein forms homotrimers on the surface of the virion that are responsible for the attachment of the virus to host cellular receptors, such as the muscular form of the nicotinic acetylcholine receptor (nAChR), the neuronal cell adhesion molecule (NCAM), and the p75 neurotrophin receptor (p75NTR). There are approximately 400 such trimeric spikes, which are tightly arranged on the surface of the virus. The C-terminal domain of the G-protein (residues 258-505) is essential for trimer stability [8]. Figure 5A shows that the G-protein is mostly ordered and contains relatively few IDPRs, which is a characteristic feature of the spike glycoproteins of many other viruses. The most prominent predicted disordered region is the C-terminal IDPR (residues 486-524; see Figure 5A), which corresponds to the intravirion domain of this protein (https://www.uniprot.org/uniprotkb/P08667/entry, accessed on 15 August 2022) that is engaged in the interaction with the matrix protein [13]. Several flexible regions located within the extravirion domain help mediate the interaction of the G-protein with surface molecules of the host cell [24]. There are three glycosylation sites in this protein (asparagine residues 56, 266, and 338) and a C-terminally located lipidation site, S-palmitoyl cysteine 479.
There is currently no structural information for the G-protein from the RABV strain PV. Therefore, we used SWISS-MODEL to create a homology model for this protein. Figure 5C shows a model for the 20-424 fragment of this protein that was generated using the known structure of the G-protein from the RABV strain CVS-11 (PDB ID: 6LGW [130]; UniProt ID: O92284), which shares 91.48% sequence identity with the query G-protein, as a template. This structure is characterized by a highly elongated form and the presence of a C-terminal "arm" (residues 400-416). In the original structure of the G-protein from the RABV strain CVS-11, there are several regions of missing electron density (residues 21-24, 95-101, 133-143, 202-204, and 414-429).
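Sequence identity values such as the 91.48% quoted above depend on how identity is defined. A minimal Python sketch of one common definition follows: identical positions divided by the number of aligned columns in which neither sequence has a gap. This is an assumption for illustration; it is not necessarily the exact formula SWISS-MODEL applies, and the toy sequences are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two equal-length aligned sequences ('-' = gap).

    Identity is computed over columns where neither sequence has a gap;
    other conventions (e.g., dividing by the shorter sequence length) exist
    and give different numbers for gapped alignments.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only gap-free columns of the pairwise alignment.
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)

# Hypothetical toy alignment: 5 identical residues out of 6 gap-free columns
print(round(percent_identity("MKT-LLV", "MKSALLV"), 2))  # 83.33
```

Because the denominator changes with the convention, identity percentages reported by different tools for the same pair of sequences are not always directly comparable.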
Furthermore, the authors of that study [130] present a structure for another form of this protein (PDB ID: 6LGX), which shows a different pattern of missing electron density regions (residues 21-25, 91-105, 131-147, 274-294, and 427-463). Since these two structures were resolved under different conditions (at pH 6.5 in complex with the neutralizing antibody 523-11 (PDB ID: 6LGW) and at pH 8.0 in free form (PDB ID: 6LGX)), these observations suggest that the structure of this G-protein is noticeably sensitive to environmental conditions. The authors pointed out that the basic-to-acidic pH change results in large re-orientations of the three domains found in the G-protein from the RABV strain CVS-11, leading to concomitant domain-linker reconstructions that switch from a bent hairpin conformation into an extended conformation [130]. These low-pH-induced structural transitions within the domain-linker region are related to the functionality of the G-protein, playing important roles in G-protein-mediated membrane fusion [130]. Figure 5A,B show that the G-proteins from the RABV strains PV and CVS-11 have very similar disorder profiles, suggesting that the observations made for the structural peculiarities of the RABV G-protein from the CVS-11 strain also apply to the G-protein from the RABV strain PV.
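The similarity of the PV and CVS-11 disorder profiles is judged here by visual comparison of Figure 5A,B. If one wanted to put a number on such similarity, a simple option is the Pearson correlation plus the mean absolute difference between the two per-residue profiles. The Python sketch below is our assumption of how that could be done, not the method used in this study; it also assumes the profiles have already been mapped onto a common alignment (same length, no gaps), and the toy values are hypothetical.

```python
import math
from typing import Sequence, Tuple

def compare_profiles(p: Sequence[float], q: Sequence[float]) -> Tuple[float, float]:
    """Pearson r and mean absolute difference of two aligned disorder profiles.

    Assumes one score per aligned position in each profile (equal length).
    """
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("profiles must be aligned and contain >= 2 positions")
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    # Root sums of squared deviations (the 1/n factors cancel in r).
    norm_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    norm_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("correlation is undefined for a constant profile")
    r = cov / (norm_p * norm_q)
    mad = sum(abs(a - b) for a, b in zip(p, q)) / n
    return r, mad

# Hypothetical aligned profiles for two closely related strains
strain_1 = [0.10, 0.20, 0.80, 0.90]
strain_2 = [0.15, 0.25, 0.75, 0.95]
r, mad = compare_profiles(strain_1, strain_2)
print(f"Pearson r = {r:.3f}, mean |difference| = {mad:.3f}")
```

Correlation captures whether peaks and valleys coincide, while the mean absolute difference captures whether their heights agree, so the two numbers are complementary.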
Similar to the N-protein, the glycoprotein of RABV is characterized by a low probability of spontaneous LLPS (pLLPS = 0.1351) and does not include any DPRs but contains five regions with context-dependent interactions (residues 18-29, 218-223, 229-239, 434-441, and 502-508), which overlap, include, or are included in the disordered/flexible regions of this protein (residues 214-225, 422-442, and 486-524) (see Figure 5A).

Disorder of the L-Protein and Its Suggested Functional Consequences
With an amino acid sequence of 2124 residues, the L-protein is the longest protein in the RABV proteome. This protein is an RNA-directed RNA polymerase (RdRp) that catalyzes the transcription of viral mRNAs as well as their capping and polyadenylation. It has several functional regions, such as an RdRp catalytic domain (residues 611-799), a mononegavirus-type SAM-dependent 2′-O-MTase domain (residues 1674-1871), and a MoRF (residues 1631-1638); the latter two lie within the C-terminal region (residues 1562-2127), which is involved in the interaction with the P-protein and contains a disorder-based interaction site. In addition to interacting with the P-protein, the L-protein may form homodimers. Although the L-protein includes many disordered or flexible regions, its overall intrinsic disorder level is relatively low (see Figure 6A).
The interplay between ordered and disordered features in this protein may reflect its role as a low-fidelity viral polymerase, since intrinsic disorder and structural flexibility likely contribute to the lower fidelity of the polymerase, which results in a high mutation rate and therefore greater flexibility in the ability of the virus to adapt to host defenses [7]. The polymerase activity of the L-protein depends on the identity of residues upstream of the protein, as well as the identity of C-terminal residues [131]. Figure 6C shows the structural model for the L-protein from the RABV strain PV built using the cryo-EM structure of the large structural protein from the RABV strain SAD B19 (sequence identity: 98.68%) complexed with a fragment of the P-protein (PDB ID: 6UEB, [132]; UniProt ID: P16289) as a template. Due to their high sequence similarity, the disorder profiles of the L-proteins from RABV strains SAD B19 and PV are almost identical (cf. Figure 6A,B). Since residues 1-27 of the L-protein from the RABV strain SAD B19 form a region of missing electron density in this cryo-EM structure of the L-P complex, this N-terminal region is likely disordered in the L-protein from the RABV strain PV as well.

Host Interactors of the N-Protein
Very few host proteins were shown to act as physical partners of the RABV N-protein.

Host Interactors of the L-Protein
The only human partner of the L-protein is DYNLL1, which is shared with the P-protein [71]. Supplementary Figures S1-S5, respectively. The corresponding analyses were conducted using a set of computational tools, such as RIDAO, STRING, D2P2, and AlphaFold. This revealed a very high level of disorder in the majority of the proteins from this dataset, with the entire set being characterized by a mean PPIDR of 41.6 ± 20.9% (as evaluated using the outputs of the PONDR® VSL2 predictor, which is one of the most accurate stand-alone disorder predictors [142,143]).
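PPIDR is the percentage of a protein's residues predicted to be disordered. The Python sketch below computes it from a per-residue profile and then summarizes a dataset as mean ± standard deviation, as in the 41.6 ± 20.9% figure above. Two assumptions are made: a residue counts as disordered when its score is above 0.5, and the spread is the sample standard deviation (n - 1 denominator), since the text does not state which one was used. The protein names and scores are hypothetical.

```python
import statistics
from typing import Sequence, Tuple

def ppidr(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Percent of residues whose disorder score is above `threshold`."""
    if len(scores) == 0:
        raise ValueError("empty disorder profile")
    n_disordered = sum(1 for s in scores if s > threshold)
    return 100.0 * n_disordered / len(scores)

def mean_and_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical per-residue profiles for three toy proteins
proteins = {
    "toy_A": [0.2, 0.6, 0.51, 0.5],      # 2 of 4 residues above 0.5
    "toy_B": [0.1, 0.1, 0.1, 0.9, 0.9],  # 2 of 5
    "toy_C": [0.7, 0.8, 0.9],            # 3 of 3
}
values = [ppidr(p) for p in proteins.values()]
mean, sd = mean_and_sd(values)
print(f"mean PPIDR = {mean:.1f} ± {sd:.1f}%")
```

Note that a score of exactly 0.5 is counted as ordered here; switching the comparison to `>=` would change PPIDR for proteins with scores sitting on the cutoff.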
Supplementary Figures S1-S5 show that all of these proteins contain multiple IDPRs of various lengths. Many proteins contain multiple MoRFs, and almost all human proteins in this dataset are densely decorated by a multitude of different PTMs. These observations suggest that intrinsic disorder in these proteins might be related to their functionality, playing a role in their binding promiscuity, as evidenced by the dense PPI networks centered on these proteins.
These observations are further illustrated in Figure 7, which presents the global intrinsic disorder characteristics of 37 human proteins. In fact, Figure 7A shows that no protein in this dataset could be classified as mostly ordered, whereas 25 proteins (67.6%) are expected to be mostly disordered (i.e., their PPIDR exceeds 30%). This classification is commonly used in the field to group proteins based on their PPIDR values [87]: proteins with PPIDR < 10% are considered ordered or mostly ordered; proteins with 10% ≤ PPIDR < 30% are considered moderately disordered; and proteins with PPIDR ≥ 30% are considered highly disordered [87]. , and PFDN1 (O60925; 50%)) exceed 50%, indicating that 35.1% of human proteins interacting with RABV are expected to be extremely disordered. These levels of disorder in human RABV interactors are comparable to those observed in the entire human proteome, where out of 20,317 proteins, 12,363 proteins (60.8%) and 7590 (37.3%) are characterized by PPIDR ≥ 30% and PPIDR ≥ 50%, respectively.
Figure 7B shows the ΔCH-ΔCDF plot (a combined binary predictor of intrinsic disorder), which confirms the global prevalence of intrinsic disorder in the 37 human proteins interacting with the RABV proteins. The ΔCH-ΔCDF plot also makes it possible to evaluate the flavors of intrinsic disorder. Figure 7B shows that quadrant Q1 (bottom right corner) contains 22 proteins that are predicted to be ordered by both predictors; quadrant Q2 (bottom left corner) includes 10 proteins, which are predicted to be ordered/compact by the CH-plot and disordered by CDF (i.e., it contains either molten globular proteins, which are compact but lack unique 3D structures, or hybrid proteins containing comparable levels of ordered and disordered residues); and quadrant Q3 (top left corner) includes 4 highly disordered proteins (native coils or native pre-molten globules), which are predicted to be disordered by both predictors.
Finally, one protein in quadrant Q4 (top right corner) is predicted to be disordered by the CH-plot and ordered by the CDF-plot.
Therefore, 15 human proteins that interact with the RABV proteins are predicted to contain very noticeable levels of disorder (i.e., they are located outside quadrant Q1). This correlates well with the results shown in Figure 7A, where 13 proteins are located within the dark-red segment.
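The three-tier PPIDR classification used for Figure 7A can be written as a small function. The Python sketch below applies the cutoffs stated in the text (< 10%, 10% to < 30%, ≥ 30%) and reproduces the percentages quoted for the 37-protein dataset from their counts; the helper names and the example PPIDR values passed to `tally` are hypothetical.

```python
from typing import Dict, Iterable

def disorder_class(ppidr: float) -> str:
    """Three-tier PPIDR classification using the cutoffs quoted in the text."""
    if ppidr < 10.0:
        return "mostly ordered"
    if ppidr < 30.0:
        return "moderately disordered"
    return "highly disordered"

def share(count: int, total: int) -> float:
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)

def tally(ppidrs: Iterable[float]) -> Dict[str, int]:
    """Count how many PPIDR values fall into each class."""
    counts: Dict[str, int] = {}
    for value in ppidrs:
        label = disorder_class(value)
        counts[label] = counts.get(label, 0) + 1
    return counts

print(share(25, 37))  # 67.6 (25 of 37 proteins with PPIDR >= 30%)
print(share(13, 37))  # 35.1 (13 of 37 proteins with PPIDR > 50%)
print(tally([5.0, 12.0, 45.0, 60.0]))  # hypothetical PPIDR values
```

The boundaries are half-open, so a protein with PPIDR of exactly 10% or 30% falls into the more disordered class, matching the "≤" and "≥" signs in the stated classification.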
The apparent discrepancies between the data shown in Figure 7A,B are rooted in fundamental differences between the tools used for these analyses: Figure 7A represents the outputs of a per-residue predictor, whereas Figure 7B reports data generated by so-called binary predictors, i.e., tools that classify query proteins as mostly ordered or mostly disordered. Naturally, mostly ordered proteins might contain noticeable levels of disordered residues, whereas mostly disordered proteins might possess noticeable levels of ordered residues.
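The quadrant assignment in the ΔCH-ΔCDF plot is a binary decision on each axis. The Python sketch below encodes it under an assumed sign convention inferred from the quadrant layout described for Figure 7B: ΔCH > 0 (top half) means disordered by the CH-plot, and ΔCDF ≥ 0 (right half) means ordered by CDF. Points lying exactly on a boundary are placed on the "ordered" side, which is an arbitrary choice. The function name is hypothetical, and the quadrant counts are the ones reported in the text.

```python
from collections import Counter

def ch_cdf_quadrant(delta_ch: float, delta_cdf: float) -> str:
    """Assign a CH-CDF quadrant under the assumed sign convention."""
    disordered_by_ch = delta_ch > 0   # above the CH boundary (top half)
    ordered_by_cdf = delta_cdf >= 0   # right half of the plot
    if not disordered_by_ch and ordered_by_cdf:
        return "Q1"  # ordered by both predictors (bottom right)
    if not disordered_by_ch:
        return "Q2"  # compact by CH, disordered by CDF (bottom left)
    if not ordered_by_cdf:
        return "Q3"  # disordered by both predictors (top left)
    return "Q4"      # disordered by CH, ordered by CDF (top right)

# Quadrant counts reported for the 37 human RABV interactors
counts = Counter({"Q1": 22, "Q2": 10, "Q3": 4, "Q4": 1})
outside_q1 = sum(counts.values()) - counts["Q1"]
print(sum(counts.values()), outside_q1)  # 37 15
```

Counting everything outside Q1 in this way gives the 15 proteins with noticeable predicted disorder mentioned above.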
Not surprisingly, all 37 proteins in the analyzed set were found to form a rather dense PPI network (see Figure 8A), in which, on average, each protein interacts with more than 11 partners.
In this intraset PPI network, the five most significantly enriched biological processes were viral process (GO:0016032; p = 3.49
Figure 7. (A) … i.e., residues with disorder scores above 0.5. Color blocks indicate regions in which proteins are mostly ordered (blue and light blue), moderately disordered (pink and light pink), or mostly disordered (red). If the two parameters agree, the corresponding part of the background is dark (blue or pink), whereas light blue and light pink reflect areas in which only one of these criteria applies. (B) Charge-hydropathy and cumulative distribution function (CH-CDF) plot. The Y-coordinate is calculated as the distance of the corresponding protein from the boundary in the CH plot. The X-coordinate is calculated as the average distance of the corresponding protein's CDF curve from the CDF boundary. The quadrant in which the protein is located determines its classification. Q1, protein predicted to be ordered by the CH-plot and CDF. Q2, protein predicted to be ordered by the CH-plot and disordered by the CDF-plot. Q3, protein predicted to be disordered by the CH-plot and CDF. Q4, protein predicted to be disordered by the CH-plot and ordered by CDF.
Figure 8. (A) To include all proteins in the network, a low confidence of 0.15 was used as the minimum required interaction score in this case. This network includes 36 proteins linked by 204 interactions. The resulting average node degree of this network is 11.3, and its average local clustering coefficient (which defines how close its neighbors are to being a complete clique; the local clustering coefficient is equal to 1 if every neighbor connected to a given node Ni is also connected to every other node within the neighborhood, and it is equal to 0 if no node that is connected to a given node Ni connects to any other node that is connected to Ni) is 0.578. Since the expected number of edges in a network of the same size for proteins randomly selected from the human proteome is 128, this network is characterized by a PPI enrichment p-value of 4.7 × 10⁻¹⁰. (B) The STRING-generated PPI network centered on human proteins interacting with the RABV proteins. Note that the number of interactors in STRING is limited to 500. This network, generated with a high confidence score of 0.7, includes 536 proteins connected by 11,358 interactions. The average node degree and average local clustering coefficient of this PPI are 42.4 and 0.631, respectively; its PPI enrichment p-value is <10⁻¹⁶.
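The Figure 8 caption defines the average node degree and the local clustering coefficient. The following minimal Python sketch computes both on a toy undirected graph; it is not STRING's implementation, and the helper names and toy graph are hypothetical. The local clustering coefficient is undefined for nodes with fewer than two neighbors, and this sketch sets it to 0 for them, which is one common convention. The `average_degree` helper reproduces the reported values from the node and edge counts: 2 × 204 / 36 ≈ 11.3 and 2 × 11,358 / 536 ≈ 42.4.

```python
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets

def average_degree(n_nodes: int, n_edges: int) -> float:
    """Each undirected edge adds 1 to the degree of both of its endpoints."""
    return 2.0 * n_edges / n_nodes

def local_clustering(g: Graph, node: str) -> float:
    """Fraction of neighbor pairs of `node` that are themselves linked."""
    neighbors = g[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for k < 2; set to 0 by convention here
    links = sum(1 for u, v in combinations(neighbors, 2) if v in g[u])
    return links / (k * (k - 1) / 2)

def average_clustering(g: Graph) -> float:
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(g, n) for n in g) / len(g)

# Hypothetical toy network: triangle A-B-C plus a pendant node D on C
toy: Graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(round(average_degree(36, 204), 1))     # 11.3
print(round(average_degree(536, 11358), 1))  # 42.4
print(round(average_clustering(toy), 3))     # 0.583
```

In the toy graph, A and B sit in a complete triangle (coefficient 1), C has one linked pair among its three neighbor pairs (1/3), and D has a single neighbor (0), giving an average of 7/12.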
Although a detailed description of the potential roles of intrinsic disorder in the functionality of human proteins shown to interact with the RABV proteins is outside the scope of this study, it is clear that all of the proteins analyzed here (i.e., RABV N-, L-, P-, M-, and G-proteins and 37 human proteins) contain very significant levels of intrinsic disorder. This is further illustrated in Figure 9, showing experimentally identified and validated interactions between the five RABV proteins, G (glycoprotein), N (nucleoprotein), L (RNA-dependent polymerase), P (phosphoprotein), and M (matrix protein), and 37 human proteins. This diagram clearly shows that most of the proteins (viral and human) in this network are "red" (highly disordered), and there are no "blue" (mostly ordered) proteins, suggesting the importance of intrinsic disorder for RABV infection.

Conclusions
All Rabies lyssavirus PV proteins contain IDPRs, most of which are expected to aid in the flexibility of the virus and its ability to evade host antiviral defenses. All human proteins found to be RABV interactors also contain high levels of intrinsic disorder, with most of these proteins being highly disordered. This disorder-centric layer of complexity in RABV and its interaction with host proteins adds a new angle to the search for potential targets for anti-rabies drugs. Once the virus has infected a host cell, there are virtually no effective treatments available that can be used to destroy the virus. Although the modern RABV vaccine has largely eradicated the virus in developed countries, many people around the world are unable to seek treatment until the virus has already crossed the blood-brain barrier. Without prompt vaccination after exposure, the RABV-related mortality rate is close to 100%. The advent of bioinformatics approaches in a clinical setting has inspired many developments that have led to the creation of new drugs. It is likely that targeting regions of intrinsic disorder within the viral proteins or host proteins that directly interact with the RABV proteins will help in creating novel drugs that can target the virus once it has infected the central nervous system.