In-Silico Approaches for Molecular Characterization and Structure-Based Functional Annotation of the Matrix Protein from Nipah henipavirus

Abstract: Nipah henipavirus is an emerging RNA virus that threatens global health because of its high fatality rate. The Nipah virus has caused several disease outbreaks in South and Southeast Asia. The matrix protein of Nipah henipavirus plays a crucial role in linking the viral envelope to the virus core, a connection that is critical for virus assembly. Bioinformatics tools can improve our understanding of the protein through analyses of its structure and function. This study aims to provide structural and functional annotations for the matrix protein. Using in silico approaches, the analysis characterizes the protein’s physicochemical properties, three-dimensional structure, and functional annotation. The in silico analysis indicated the protein’s hydrophilic nature and an alpha (α) helix-dominated secondary structure. The protein’s tertiary structure model is generally consistent according to several quality evaluation approaches. The functional annotation indicated that the protein is a structural protein that connects the viral envelope to the virus core and is necessary for virus assembly. This study highlights the importance of the matrix protein as a functional protein needed by Nipah henipavirus.


Introduction
Nipah henipavirus is a bat-borne virus that can infect humans and other animals [1,2]. Nipah virus infection is a zoonotic disease that is spread from animals to humans. It can also be spread through contaminated food or from person to person. It causes a variety of symptoms in infected individuals, ranging from asymptomatic (subclinical) infection to acute respiratory sickness and deadly encephalitis [3][4][5]. The Nipah virus is a member of the Paramyxoviridae family and the genus Henipavirus along with the Hendra virus, which has also caused disease outbreaks [6,7]. The Nipah virus genome is a single (nonsegmented) negative-sense, single-stranded RNA with a length of over 18 kb, which is far longer than that of other paramyxoviruses [8,9]. The Nipah virus was initially detected in pigs and pig farmers in Peninsular Malaysia in 1998 [9]. Infection outbreaks of the Nipah virus have been documented in Malaysia, Singapore, Bangladesh, and India [6]. The highest rates of death from Nipah virus infection have been reported in Bangladesh, where outbreaks are most common in the winter [10,11]. The consumption of fruits or fruit products (such as raw date palm juice) contaminated with urine or saliva from infected fruit bats was the most likely cause of infection in later outbreaks in Bangladesh and India [11]. About 700 human cases of the Nipah virus had been reported as of May 2018, with 50 to 75 percent of those affected dying [12]. In the Indian state of Kerala, an epidemic of the disease led to 17 deaths in May 2018 [13].
Studying proteins with bioinformatics tools allows researchers to assess their three-dimensional structural conformation, classify new domains, explore pathways to gain a better understanding of evolutionary relationships, uncover more clusters, and assign roles to the proteins [14][15][16]. This knowledge can also be used to develop successful pharmacological methods and aid in the development of new drugs to treat a wide range of diseases [17][18][19]. This study characterized the secondary and tertiary structural features of the matrix protein that are associated with its structure-function relationships. The selected protein can be used as a potential target for the design of protein-based drugs and vaccine candidates to limit viral infection.

Protein Selection and Sequence Retrieval
The amino acid (aa) sequence of the matrix protein found in Nipah henipavirus was obtained in FASTA format from the NCBI database [20].
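As an illustration of handling a sequence retrieved in FASTA format, the following minimal sketch parses FASTA text and reports the sequence length. The function name `parse_fasta` and the toy record are hypothetical; the toy sequence is not the real QBQ56721.1 sequence.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical toy record for illustration only (NOT the QBQ56721.1 sequence).
example = """>toy_protein illustrative record
MKTAYIAKQR
QISFVKSHFS
"""

for header, seq in parse_fasta(example).items():
    # Print the identifier (first word of the header) and the sequence length.
    print(header.split()[0], len(seq))
```

For a real analysis, the text passed to `parse_fasta` would be the FASTA file downloaded from the NCBI record.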

Physicochemical Characterization of the Selected Protein
The amino acid composition, instability index, aliphatic index, GRAVY (grand average of hydropathicity, a measure of the hydrophobicity or hydrophilicity of a protein), and extinction coefficients were all computed using the ProtParam tool of the ExPASy server [21]. The theoretical isoelectric point (pI) of the QBQ56721.1 protein was also computed using SMS Suite (v.2.0) [22].

Functional Annotation of the Selected Protein
The conserved domain in the protein QBQ56721.1 was predicted using the NCBI platform's CD-Search tool [23,24]. The ScanProsite tool of the ExPASy server [25] and the Pfam tool [26] were used to determine protein motifs. The evolutionary relationships of the protein QBQ56721.1 were assessed using the SuperFamily program [27].

Secondary Structural Properties and Assessment
The self-optimized prediction method with alignment (SOPMA) was used to predict secondary structure elements [28,29]. The secondary structure was also predicted using the PSIPRED (v.4.0) algorithm [30].
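Secondary structure predictors of this kind typically return one state letter per residue. The sketch below shows how such a per-residue string could be summarized as percentages. It assumes the four-state letter codes h (alpha helix), e (extended strand), t (beta turn), and c (random coil); the function name and the toy state string are hypothetical and are not output from this study.

```python
from collections import Counter


def state_percentages(states):
    """Return the percentage of residues assigned to each state letter."""
    counts = Counter(states)
    total = len(states)
    return {state: 100.0 * n / total for state, n in counts.items()}


# Hypothetical per-residue prediction string (illustration only).
toy_states = "cchhhhhhhhcceeeecctt"

for state, pct in sorted(state_percentages(toy_states).items()):
    print(f"{state}: {pct:.1f}%")
```

A helix-dominated protein would show the largest percentage for the h state.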

Three-Dimensional Structure Prediction and Validation of the Selected Protein
The three-dimensional (tertiary) structure was predicted using HHpred [32] together with Modeller [31]. The most suitable template (HHpred ID: 6BK6 A) was chosen for building the tertiary structure, with a probability, an E-value, a number of aligned columns (Cols), and a target length of 100, 2.4 × 10^-116, 342, and 372, respectively. The PROCHECK tool of the SAVES (v.6.0) server [33,34] was used to generate the Ramachandran plot and validate the predicted tertiary structure.

Protein Sequence Retrieval
The amino acid (aa) sequence of the Nipah henipavirus protein (QBQ56721.1) was obtained from the NCBI database. The 352-amino-acid-long protein sequence was used to model the tertiary structure of the protein QBQ56721.1. Table 1 provides additional information on the protein (QBQ56721.1).

Identification of the Physicochemical Properties of the Protein
The amino acid sequence of QBQ56721.1 from Nipah henipavirus was obtained in FASTA format and used as the query sequence for computing physicochemical parameters. The protein is predicted to be stable because its instability index is 30.59 (less than 40.00) [35]. The theoretical isoelectric point (pI 9.31, 9.65 *) indicates that the protein is basic [36][37][38]. The molecular weight, aliphatic index, instability index, and GRAVY are 39,847.16 Da, 89.69, 30.59, and −0.212, respectively (Table 2) [39][40][41]. The high aliphatic index of 89.69 indicates increased thermostability over a wide temperature range, which is a favorable factor [42,43]. The GRAVY value of −0.212 indicates the protein's hydrophilic character and, hence, a greater potential for interaction with water [44,45].
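To make these indices concrete, the following minimal sketch computes GRAVY as the mean Kyte-Doolittle hydropathy value and the aliphatic index from the mole percentages of Ala, Val, Ile, and Leu, then applies the two thresholds used above (GRAVY below 0, instability index below 40). It is an illustration under stated assumptions and not the ProtParam implementation; the function names are hypothetical, and the instability index is taken as a given input rather than computed.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percentage of each residue."""
    n = len(seq)

    def mole_pct(aa):
        return 100.0 * seq.count(aa) / n

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


def interpret(gravy_value, instability_index):
    """Apply the thresholds described in the text."""
    return {
        "hydrophilic": gravy_value < 0,                  # negative GRAVY
        "predicted_stable": instability_index < 40.0,    # instability index < 40
    }


# Values reported in the text for QBQ56721.1.
print(interpret(-0.212, 30.59))
```

With the full protein sequence as input, `gravy` and `aliphatic_index` could be compared against the values in Table 2.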

Functional Annotation Prediction of the Selected Protein
The NCBI CD-Search tool identifies conserved domains in protein sequences. CD-Search uses RPS-BLAST to compare a query sequence against position-specific scoring matrices derived from conserved domain (CD) alignments in the Conserved Domain Database (CDD). The CD-Search tool identified a conserved domain in the protein QBQ56721.1 as a viral matrix protein (matrix, accession no. pfam00661). Viral matrix proteins are structural proteins that connect the viral envelope and the virus core [46,47]. The matrix protein plays an important role in virus assembly and in linking the viral envelope with the virus core. Such proteins are found in Morbillivirus, Paramyxovirus, and Pneumovirus [47,48]. The Pfam tool also predicted a motif at positions 16-349 (Pfam ID: PF00661; viral matrix protein; E-value of 1.7 × 10^-146). Protein motifs are small regions of a three-dimensional protein structure or amino acid sequence that are shared by multiple proteins [48,49]. Motifs are distinct regions of a protein structure that may or may not be defined by a distinct chemical or biological function [48][50][51][52][53].
CD-Search also confirmed the presence of the viral matrix protein domain at positions 17-348. The viral matrix protein (CDD no. pfam00661) is the only member of the superfamily cl02918. A protein superfamily is a group of proteins made up of one or more protein families [46,53,54]. The set of all superfamilies must partition the set of all protein sequences or subsequences defined by the relationships among protein families, and each superfamily must be closed under transitivity [54]. The SuperFamily tool predicted that the protein QBQ56721.1 (Figure 1) is closely related to the matrix superfamily (E-value of 0.0).
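The idea of position-specific scoring that underlies the domain search described above can be illustrated with a toy example. The sketch below slides a small scoring matrix along a sequence and reports the best-scoring window. It is a simplified illustration and not the RPS-BLAST algorithm; the function name, the toy matrix, the toy sequence, and the background penalty are all hypothetical.

```python
def best_window(sequence, pssm, background=-1.0):
    """Slide a position-specific scoring matrix along a sequence.

    `pssm` is a list with one dict per column mapping residues to scores;
    residues missing from a column receive the `background` score.
    Returns (0-based start index, score) of the best-scoring window.
    """
    width = len(pssm)
    best_score, best_start = float("-inf"), None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(aa, background) for col, aa in zip(pssm, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_start, best_score


# Hypothetical three-column matrix favouring the pattern M-K-T.
toy_pssm = [
    {"M": 2.0, "L": 1.0},
    {"K": 2.0, "R": 1.5},
    {"T": 2.0, "S": 1.0},
]

print(best_window("AAMKTAA", toy_pssm))
```

Real domain searches use much wider matrices built from multiple sequence alignments and convert scores into E-values, which this sketch does not attempt.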

Tertiary-Structure Prediction and Validation of the Protein
The target sequence of QBQ56721.1 in FASTA format was submitted to the HHpred template selection tool, and the most suitable template (6BK6 A) was selected with a probability of 100%, an E-value of 2.4 × 10^-116, 342 aligned columns (Cols), and a target length of 372. The tertiary structure model built by Modeller was then saved in PDB format (Figure 3). The Ramachandran plot generated by PROCHECK (Figure 4) was used to assess the matrix protein's tertiary structure. It revealed that 92.4 percent of the total residues (342) were in the core regions (A, B, L), 6.3 percent were in the additional allowed regions (a, b, l, p), and 0.7 percent were in the generously allowed regions (~a, ~b, ~l, ~p). The number of non-glycine and non-proline residues was 301, the number of end-residues (excluding Gly and Pro) was 1, and the numbers of glycine and proline residues were 27 and 13, respectively, out of 342 total residues (Table 5). Verify3D, a tertiary-structure evaluation tool, was also used and showed that the predicted tertiary structure passed the evaluation.
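A Ramachandran plot places each residue according to its backbone dihedral angles phi and psi. The sketch below shows how one such dihedral angle can be computed from four atom positions: phi uses C(i-1), N, CA, and C, and psi uses N, CA, C, and N(i+1). It is a minimal illustration and not PROCHECK's implementation; the function names and the test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalize the central bond vector.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = tuple(a - _dot(b0, b1) * b for a, b in zip(b0, b1))
    w = tuple(a - _dot(b2, b1) * b for a, b in zip(b2, b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates chosen so that the angle is easy to verify by hand.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))  # 90.0
```

Applying `dihedral` to the backbone atoms of every residue in the modeled PDB file would give the (phi, psi) pairs that a Ramachandran plot assigns to core, allowed, or disallowed regions.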

Conclusions
Understanding protein structure is vital for describing how proteins function, and the matrix protein is critical for virus assembly because it links the viral envelope to the virus core. This research reveals the protein's fundamental features, such as its hydrophilic nature and functional annotation, in relation to its tertiary structure. The outcomes of this study demonstrate the usefulness of the bioinformatics methodologies used here and the scope for future research on the matrix protein. The selected protein's secondary and tertiary structures illustrate the structure-function relationships of the matrix protein. This research should improve our understanding of Nipah virus pathophysiology and support the development of promising protein-based drugs and vaccine candidates to combat Nipah virus infection.