Comprehensive Linear Epitope Prediction System for Host Specificity in Nodaviridae

Background: Nodaviridae infection is one of the leading causes of death in commercial fish. Although many vaccines against this virus family have been developed, their efficacies are relatively low. Nodaviridae are categorized into three subfamilies: alphanodavirus (infects insects), betanodavirus (infects fish), and gammanodavirus (infects prawns). These three subfamilies possess host-specific characteristics that could be used to identify effective linear epitopes (LEs). Methodology: A multi-expert system using five existing LE prediction servers was established to obtain initial LE candidates. Based on the different clustered pathogen groups, both conserved and exclusive LEs among the Nodaviridae family could be identified. The absence of documented cross infection among the different host species of the Nodaviridae family was exploited to re-evaluate the impact of LE prediction. The surface structural characteristics of the identified conserved and unique LEs were confirmed through 3D structural analysis, and concepts of surface patches were proposed to analyze the spatial characteristics and physicochemical propensities of the predicted segments. In addition, an intelligent classifier based on the Immune Epitope Database (IEDB) dataset was utilized to review the predicted segments, and enzyme-linked immunosorbent assays (ELISAs) were performed to identify host-specific LEs. Principal findings: We predicted 29 LEs for Nodaviridae. The analysis of the surface patches showed common tendencies regarding shape, curvedness, and PH features for the predicted LEs. Among them, five predicted exclusive LEs for fish species were selected and synthesized, and their antigenic features were examined by ELISA. Conclusion: Five identified LEs possessed antigenicity and host specificity for grouper fish. We demonstrate that the proposed method provides an effective approach for in silico LE prediction prior to vaccine development and is especially powerful for analyzing antigen sequences with exclusive features among clustered antigen groups.


Introduction
Nodaviridae infection is a common cause of death in marine animals and insects, and the virus family is classified into several genera according to host specificity. To date, various vaccines have been developed for aquaculture, including recombinant proteins, synthetic peptides, inactivated virions, DNA vaccines, and virus-like particles. However, the efficacy of these vaccines remains unsatisfactory. Therefore, a more effective immunization strategy and comprehensive vaccine development against these viruses are important. Although MrNV does not cause death in adult prawns, adults might serve as carriers that transmit the virus, which is known to cause 100% mortality in larval and post-larval prawns [4].
Immunologists have developed an integrated method for vaccine development based on the analysis of the protein sequences and the structures of target viruses [22]. For example, the major capsid protein (MCP) can be assembled into virus-like particles (VLPs). B cells play an important role in the immune system. Immunoglobulin is secreted as an antibody, with the same antigen specificity, by terminally differentiated B cells. Membrane-bound immunoglobulin on the surface of B cells serves as a receptor for antigens and is known as the B-cell receptor (BCR). Each B cell is associated with a unique receptor specificity. When a BCR binds to a cognate antigen, the B cell is stimulated to proliferate, which generates plasma cells and memory B cells [23]. Antigens are typically too large to bind to any receptor as a whole. Hence, partial antigen segments located on surface areas, called epitopes, are recognized by specific antibodies. Epitopes are generally divided into two categories: linear epitopes (LEs), where a stretch of continuous amino acids is sufficient for binding, and conformational epitopes (CEs), consisting of key amino acid residues that are brought together by protein folding [24]. CE prediction requires antigenic structures, mostly those of MCPs. The MCP structure must also be resolved before host-specific structural and conformational analyses can be carried out. However, only a few protein structures have been resolved for the Nodaviridae. We therefore applied both empirical approaches (Phyre2 [25] and I-TASSER [26]) and a state-of-the-art computational method (RoseTTAFold [27]) to predict structures. We focused on LE prediction because information on the corresponding antigenic structures is scarce. The ideal predicted peptides should effectively elicit antibodies from specific hosts that recognize antigens and provide protection against infections [28].
Therefore, the peptides selected for vaccine design should ideally be conserved across different stages of the pathogen and possess binding affinity for the major populations of specific hosts. In this report, we present an integrated computational system incorporating a multi-expert voting mechanism and host-specific and surface structural analytics for LE prediction. Five existing epitope prediction tools including LBTOPE [29], BepiPred [30], BCPreds [23], ABCPred [31], and LEPS [32] were applied, and three important features were considered, including length constraint, physicochemical characteristics, and host-specific features. Candidates were screened based on these features from the five selected prediction systems, using different approaches and databases. The predicted epitopes were analyzed through surface structural characteristics and experimentally verified.

Materials and Methods
Host species and their corresponding major capsid protein sequences were retrieved from NCBI [33] and UniProt [34]. Due to the characteristics of Nodaviridae, the P-domain section was extracted from the full sequences for specific trials. In addition, because the MCP structures of certain species in Nodaviridae have not yet been resolved, we applied Phyre2, I-TASSER, and RoseTTAFold to predict their three-dimensional models. The resolved Nodaviridae structures (.pdb files) were retrieved from RCSB. Species names were acquired from the ICTV taxonomy. In total, 3 genera and 18 species of Nodaviridae were collected. The genera, tentative species, and specific infected hosts are listed in Table 1.

Multi-Expert Voting Mechanism-Based LE Prediction
Our system is a metamodel that ensembles results from several existing prediction servers. We integrated five LE prediction tools (LBTOPE, BepiPred, BCPREDS, ABCPred, and LEPS) into our voting mechanism to predict LEs for each representative antigenic sequence. Each virus group was analyzed using the five tools, each of which labels every antigenic residue as an epitope or a non-epitope residue. Each residue in a query antigen sequence therefore possesses a corresponding score from the five different prediction tools, and a higher score represents a higher possibility of the residue being an epitope. For comparison of two subfamilies, if a continuous segment is predicted as an epitope segment within a subfamily, the system checks whether aligned segments from the other subfamily are also predicted as epitopes; if so, the system classifies this segment as a conserved epitope in both subfamilies, otherwise it is identified as a unique epitope. The observed variations in sequence or structural alignments could be associated with host specificity. The exclusive epitope segments within the same clustered antigen subfamily may play important roles in binding with antibodies present within the same group of host species. Multiple sequence alignments were performed using T-Coffee [35]. We also applied the protein structure prediction method RoseTTAFold to predict the P-domain of betanodavirus. To compare the homology-based and computational methods, we chose the P-domain section of the resolved structure "4RFU" and a structure lacking the P-domain, "3JBM", as targets for protein structure prediction.
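The voting and conserved/unique classification described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each tool's scores have already been converted to per-residue epitope calls in alignment-column coordinates, and the function names and the thresholds `min_votes`, `min_len`, and `min_overlap` are hypothetical choices that the text does not specify.

```python
from typing import Dict, List, Tuple


def vote_residues(predictions: Dict[str, List[bool]], min_votes: int = 3) -> List[bool]:
    """Mark a residue as an epitope residue when at least min_votes tools call it."""
    lengths = {len(calls) for calls in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all tools must score the same number of residues")
    length = lengths.pop()
    return [
        sum(calls[i] for calls in predictions.values()) >= min_votes
        for i in range(length)
    ]


def extract_segments(mask: List[bool], min_len: int = 8) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges of consecutive epitope residues
    that are at least min_len residues long."""
    segments = []
    start = None
    # A trailing False closes a run that reaches the end of the sequence.
    for i, flag in enumerate(list(mask) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


def classify_segments(
    segments_a: List[Tuple[int, int]],
    mask_b: List[bool],
    min_overlap: float = 0.5,
) -> Dict[Tuple[int, int], str]:
    """Label each segment of group A as 'conserved' when enough of its aligned
    columns are also epitope residues in group B, otherwise 'unique'."""
    labels = {}
    for start, end in segments_a:
        shared = sum(mask_b[start:end])
        ratio = shared / (end - start)
        labels[(start, end)] = "conserved" if ratio >= min_overlap else "unique"
    return labels
```

A majority threshold (three of five tools) is only one plausible reading of "voting"; a weighted sum of the raw scores would fit the description equally well.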

Validation Method
In addition to the multi-expert voting mechanism prediction model, we proposed an additional complete sequence search (CSS) and a validation model consisting of a variety of propensity scales for enhanced evaluation. CSS applies BLAST tools (BLASTp-short) to search for all existing known and experimentally proven antigen peptides from the largest IEDB database. If experimentally proven epitopes in the IEDB could be matched by the predicted segments, there would be a higher possibility that the predicted LEs are genuine epitopes. Furthermore, we statistically analyzed the predicted LEs according to their residue contents and physicochemical properties for a reinforced classifier design. These features are introduced as follows.
Amino acid pairs (AAPs) were generated by scanning the peptides using a window of two residues and calculating the frequency of occurrence of each AAP. In total, 400 AAPs are possible. If a query peptide contains AAPs that also occur in true epitopes, there is a greater chance that the peptide is an epitope. The SVMtrip_16AA [36] dataset contains two subsets: positive (LE) and negative (non-LE). We calculated the occurrence frequencies of AAPs in these two subsets to indicate the tendencies of epitopes and non-epitopes. Here, we follow a previous study [37] to determine the frequency of occurrence of each pair. $f^{+}_{AAP_i}$ and $f^{-}_{AAP_i}$ are the occurrence frequencies of a given $AAP_i$ in the epitope and non-epitope sets, respectively. $N^{+}_{AAP_i}$ and $N^{-}_{AAP_i}$ are the numbers of occurrences of the $i$th of the 400 possible AAPs in the epitopes and non-epitopes. Finally, $Total^{+}_{AAP}$ and $Total^{-}_{AAP}$ are the total numbers of AAPs in the epitope set and the non-epitope set. The differences between the two subsets can be interpreted as a likelihood ratio and normalized as an AAP antigenicity scale, $Norm(R_{AAP})$, by Equation (1).
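As an illustration of how such a scale can be computed, the sketch below derives the pair frequencies from the counts defined above, takes the logarithm of their ratio, and rescales the result. Equation (1) is not reproduced in this text, so the min-max normalization to [−1, 1] and the `pseudocount` (added to avoid division by zero for unseen pairs) are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Iterable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_pairs(peptides: Iterable[str]) -> Counter:
    """Count overlapping amino acid pairs (window of two residues)."""
    counts = Counter()
    for peptide in peptides:
        for i in range(len(peptide) - 1):
            counts[peptide[i:i + 2]] += 1
    return counts


def aap_scale(
    epitopes: Iterable[str],
    non_epitopes: Iterable[str],
    pseudocount: float = 1.0,  # assumption: smoothing for unseen pairs
) -> Dict[str, float]:
    """Log ratio of pair frequencies in epitopes vs. non-epitopes,
    min-max normalized to [-1, 1] (assumed normalization)."""
    pos = count_pairs(epitopes)
    neg = count_pairs(non_epitopes)
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 AAPs
    total_pos = sum(pos[p] for p in pairs) + pseudocount * len(pairs)
    total_neg = sum(neg[p] for p in pairs) + pseudocount * len(pairs)
    ratio = {}
    for p in pairs:
        f_pos = (pos[p] + pseudocount) / total_pos  # f+ = N+ / Total+
        f_neg = (neg[p] + pseudocount) / total_neg  # f- = N- / Total-
        ratio[p] = math.log(f_pos / f_neg)
    lo, hi = min(ratio.values()), max(ratio.values())
    if hi == lo:
        return {p: 0.0 for p in pairs}
    return {p: 2.0 * (r - lo) / (hi - lo) - 1.0 for p, r in ratio.items()}


def score_peptide(peptide: str, scale: Dict[str, float]) -> float:
    """Average AAP scale value over all pairs of a query peptide."""
    values = [scale[peptide[i:i + 2]] for i in range(len(peptide) - 1)
              if peptide[i:i + 2] in scale]
    return sum(values) / len(values) if values else 0.0
```

With real training sets, a positive peptide score would indicate that the peptide is enriched in pairs seen more often in epitopes than in non-epitopes.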
The position-specific scoring matrix (PSSM) is a commonly used representation of sequence motifs. A PSSM is a position weight matrix (PWM) that can distinguish evolved sequences and genuine binding sites among similar sequences [38]. First, given an alignment result X, a position frequency matrix (PFM) is created with one row for each of the 20 amino acids and one column for each position in the aligned sequences; each entry counts the occurrences of that amino acid at that position. Second, a position probability matrix (PPM) is created by dividing the occurrence counts by the number of sequences. Given a set of N aligned sequences of length L residues (alignment result X, size L × N), the values of the corresponding PPM (matrix M) can be calculated using Equation (2), where I(X_{i,j} = k) is an indicator function and I(a = k) is 1 if a = k and 0 otherwise. Additionally, we applied b = 1/K as a background model (expected probability), where K is the total number of amino acid types (i.e., K = 20). The PPM can be converted to a PWM using Equation (3). A sequence can then be converted to an information content (IC) parameter by looking up, at each position, the PWM value of the amino acid found there.
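The PFM, PPM, and PWM steps can be sketched as below. Equations (2) and (3) are not reproduced in this text, so the sketch assumes the standard forms: the PPM entry is the fraction of sequences carrying amino acid k at a position, and the PWM entry is the base-2 logarithm of that probability divided by the uniform background b = 1/20. The function names and the optional `pseudocount` are hypothetical.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_probability_matrix(
    alignment: List[str], pseudocount: float = 0.0
) -> List[Dict[str, float]]:
    """One dict per alignment column: amino acid -> occurrence probability.
    Gap characters are not counted as amino acids."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    n_seqs = len(alignment)
    denom = n_seqs + pseudocount * len(AMINO_ACIDS)
    ppm = []
    for j in range(len(alignment[0])):
        column = [seq[j] for seq in alignment]
        ppm.append(
            {aa: (column.count(aa) + pseudocount) / denom for aa in AMINO_ACIDS}
        )
    return ppm


def position_weight_matrix(
    ppm: List[Dict[str, float]], background: float = 1.0 / 20.0
) -> List[Dict[str, float]]:
    """Log-odds of each probability against a uniform background (assumed log2)."""
    return [
        {aa: (math.log2(p / background) if p > 0 else float("-inf"))
         for aa, p in column.items()}
        for column in ppm
    ]


def score_sequence(sequence: str, pwm: List[Dict[str, float]]) -> float:
    """Sum the PWM values of the residues found at each position."""
    if len(sequence) != len(pwm):
        raise ValueError("sequence length must match the number of PWM columns")
    return sum(pwm[j][aa] for j, aa in enumerate(sequence))
```

Without a pseudocount, any amino acid never observed in a column receives a weight of negative infinity, so a small positive `pseudocount` is usually preferable for scoring unseen sequences.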
For the propensity scale, sequences were scanned using a sliding window consisting of the central residue i and its neighboring residues (i ± ½ × window size). We assigned a value of 7 to the window size parameter [39]. We also applied four physicochemical scales extracted from the ProtParam tool for reinforced evaluation [40], including hydrophobicity [41], flexibility [42], surface accessibility [43], and polarizability [44]. In addition, we adopted the "surface patch" strategy to describe the local spatial context of each residue in the predicted epitopes. Commonly, a surface patch consists of a central residue and some spatially adjacent surface residues and is classified as an epitope patch or a non-epitope patch according to the state assigned to the central residue [45]. In this study, the surface patch consisted of the residues of a predicted epitope, and the middle residue of the peptide sequence was taken as the central residue. To gain insight into the common structural contents or physicochemical characteristics of the predicted LEs, the surface patch was evaluated based on several known features. To measure the spatial features of the adjacent residues, we considered whether the distance between adjacent residues and the central residue could impact antigenicity and calculated the average distance within each surface patch. Furthermore, the contributions of interior and surface residues were also taken into consideration. If the relative accessible surface area (RASA) calculated by the DSSP [46] program was greater than 5%, the residue was considered to be a "surface residue" [47]. Subsequently, the ratio of the number of surface residues to the number of residues in the interior of the peptide was calculated. Because exposing non-polar residues to water is unfavorable, hydrophobic residues are usually located in the protein interior; therefore, hydrophilic and surface-exposed amino acids are preferable.
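The sliding-window averaging and the surface-to-interior ratio can be illustrated with the short sketch below. It assumes that windows are truncated at the sequence termini and that RASA values are given as fractions (so 5% is 0.05); both choices, the function names, and the toy scale in the usage note are hypothetical and not taken from the cited scales.

```python
from typing import Dict, List


def windowed_propensity(
    sequence: str, scale: Dict[str, float], window: int = 7
) -> List[float]:
    """Average a per-residue propensity scale over a window centred on each
    residue; windows are truncated at the termini (assumption)."""
    half = window // 2
    scores = []
    for i in range(len(sequence)):
        lo = max(0, i - half)
        hi = min(len(sequence), i + half + 1)
        values = [scale[aa] for aa in sequence[lo:hi]]
        scores.append(sum(values) / len(values))
    return scores


def surface_ratio(rasa_values: List[float], threshold: float = 0.05) -> float:
    """Ratio of surface residues (RASA > threshold) to interior residues.
    Returns infinity when the peptide has no interior residue."""
    surface = sum(1 for rasa in rasa_values if rasa > threshold)
    interior = len(rasa_values) - surface
    return float("inf") if interior == 0 else surface / interior
```

For example, `windowed_propensity("GSTQLDIA", toy_scale)` with a made-up `toy_scale` dictionary mapping each amino acid to a number returns one smoothed value per residue, and `surface_ratio([0.5, 0.01, 0.2, 0.03])` compares two surface residues with two interior ones.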
The values of half-sphere exposure (HSE), which is widely used in protein structural analysis and provides relatively more geometric information than other measurements, were also calculated in this study. A larger HSE value indicates that the Cα of the central residue is closer to other Cα atoms [48]. Finally, the residue depth from a Cα atom to the protein surface for query residues was also considered. Both the HSE and the residue depth were obtained using MSMS [49] and Biopython's Bio.PDB package. In this phase, a support vector machine (SVM) classifier with an "RBF" kernel was applied to train the prediction model. B-cell epitope datasets were taken from SVMTrip_20AA [36], Chen's database, Epitopia [50], and BepiPred-2.0 [30]. In total, 6969 epitopes and 6962 random peptides were collected. Datasets for evaluation were obtained from IEDB and UniProt and contain 20,335 experimentally validated epitopes and 20,161 randomly selected peptides. We also calculated two properties of surface shape, the shape index and curvedness, to evaluate the predicted epitopes from resolved protein structures. The shape index (S_i) is a number ranging from −1 to 1 that describes the shape of the local surface at any given point and is independent of the scale of the surface, whereas curvedness measures how strongly the surface is curved: the larger the curvedness, the more curved the surface. Points with positive shape index values represent a convex dome, and negative values represent a concave cup. We converted epitope patches from .pdb files to a point-cloud format with MSMS and applied PyMesh [51] to calculate the mean and Gaussian curvatures at each vertex of the model. The shape index and curvedness were calculated by Equations (4) and (5); at any point on a surface, there are two directions in which the curvature reaches a maximum K_max and a minimum K_min [52,53].
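Equations (4) and (5) are not reproduced in this text. The sketch below therefore assumes the widely used Koenderink-style definitions, S = (2/π)·arctan((K_max + K_min)/(K_max − K_min)) and C = sqrt((K_max² + K_min²)/2), and recovers the principal curvatures from the mean curvature H and Gaussian curvature K as H ± sqrt(H² − K). It also assumes a sign convention in which positive curvature means convex, which depends on the orientation of the surface normals; the function names are hypothetical.

```python
import math
from typing import Tuple


def principal_curvatures(mean_curv: float, gauss_curv: float) -> Tuple[float, float]:
    """Return (k_max, k_min) from mean curvature H and Gaussian curvature K."""
    # Clamp small negative values caused by numerical noise on meshes.
    disc = max(mean_curv * mean_curv - gauss_curv, 0.0)
    root = math.sqrt(disc)
    return mean_curv + root, mean_curv - root


def shape_index(k_max: float, k_min: float) -> float:
    """Shape index in [-1, 1]; +1 is a dome and -1 is a cup under the assumed
    sign convention. atan2 handles k_max == k_min without division by zero;
    a perfectly flat point (both curvatures zero) returns 0 although the
    shape index is formally undefined there."""
    return (2.0 / math.pi) * math.atan2(k_max + k_min, k_max - k_min)


def curvedness(k_max: float, k_min: float) -> float:
    """Magnitude of the local curvature, independent of the shape type."""
    return math.sqrt((k_max * k_max + k_min * k_min) / 2.0)
```

Applied per vertex to the mean and Gaussian curvatures of a surface mesh, these functions yield the histograms of shape index values that are later summarized for the predicted epitopes.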

Biological Assays
According to the constructed prediction systems and in silico validation principles, the predicted exclusive LEs and reference segments for betanodaviruses were chemically synthesized by PepPower™ Peptide Synthesis Technology (GenScript, Piscataway, NJ, USA). After synthesis, the epitopes were used as antigens for antigenicity tests. All peptide samples were processed in triplicate in the immunoassays. An enzyme-linked immunosorbent assay (ELISA) was used to validate the antigenicity of the predicted LEs. This immunological analysis is very sensitive and highly specific for the detection and quantification of substances such as antibodies, antigens, and other proteins. The antigen-containing samples were coated on polystyrene 96-well microplates and incubated with coating buffer (0.2 M sodium carbonate/bicarbonate, pH 9.4) until they were adsorbed onto the surface of the microplate. The coating buffer immobilizes antigens, which leads to maximal adsorption on the microplate surface and optimizes interactions with the detection antibody. The hydrophobic sites were exposed after the antigens were adsorbed onto the microplates. Blocking was used to fill the interspaces with bovine serum albumin (BSA), non-fat milk powder, or casein to prevent nonspecific binding. Washing steps were required to eliminate unbound and excess components that might interfere with the assay. First, 10 µg of antigens (synthesized peptides) were applied to a 96-well microplate, incubated for 1 h at RT, and blocked with wash buffer (PBST buffer, 1× phosphate-buffered saline with 0.1% Tween 20) to remove non-specific antigens. Next, the wells were treated with the rabbit pre-immune antibody and rabbit post-immune antibody (rabbit anti-NNV capsid protein antibody), incubated for 1 h, and washed 3-5 times.
Finally, a secondary antibody (goat anti-rabbit IgG (H+L) antibody) conjugated with alkaline phosphatase (AP, Jackson ImmunoResearch, West Grove, PA, USA) was hybridized as the detection antibody. After hybridization, the microplate was again washed 3-5 times with wash buffer to remove non-specific antibodies, the substrate pNPP (para-nitrophenylphosphate, ThermoFisher) was added for AP detection, and the plate was read at 405 nm using an ELISA reader. The ELISA results before and after immunization were compared and analyzed using GraphPad Prism (version 5.0; GraphPad Software, Inc.) and statistically evaluated by t-test for each synthesized peptide (n = 3, p < 0.05).

Prediction Results between Alphanodavirus and Betanodavirus
There was no conserved LE between alphanodavirus and betanodavirus clusters, but eight exclusive LEs were found for the alphanodavirus subfamily and two exclusive LEs for betanodavirus. The sequence similarity between alphanodavirus (subunit particles, PDB:1NOV) and betanodavirus (S-domain; PDB 4RFT) is 22.67%. None of the predicted LEs for alphanodavirus could be found in the experimentally verified database IEDB, while the two predicted LEs for betanodavirus could be matched with existing reports from IEDB. In addition, a previous study reported and validated the true epitope segment of "BFNNV 261~272 :RPLSIDYSLGTGD" using biological experiments [19]. Through in silico scanning of IEDB, our prediction system increases the opportunity to predict genuine epitopes.
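The idea of matching predicted segments against experimentally verified epitopes can be illustrated with a much simpler stand-in for the BLASTp-short search used in the CSS step: reporting the longest exact substring shared by a predicted segment and each known epitope. This is not BLAST and ignores substitutions and gaps; the function names and the `min_shared` cut-off are hypothetical.

```python
from typing import List, Tuple


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous run of residues shared by two peptides
    (dynamic programming over suffix-match lengths)."""
    best_len = 0
    best_end = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


def match_known_epitopes(
    predicted: str, known_epitopes: List[str], min_shared: int = 5
) -> List[Tuple[str, str]]:
    """Return (known epitope, shared substring) pairs whose exact overlap
    with the predicted segment is at least min_shared residues long."""
    hits = []
    for epitope in known_epitopes:
        shared = longest_common_substring(predicted, epitope)
        if len(shared) >= min_shared:
            hits.append((epitope, shared))
    return hits
```

For instance, comparing the predicted segment GSTQLDIA with the reference peptide DIAPDGAVFQ from this study gives the shared tripeptide DIA, which passes only when `min_shared` is lowered to 3.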

Prediction Results between Betanodavirus and Gammanodavirus
In the second trial, the predicted LEs located within the P-domain of betanodavirus and gammanodavirus were compared. In addition, P-domain segments from the grouper-infecting betanodavirus subfamily were predicted separately for detailed comparison. The sequence similarity of the P-domain of betanodavirus (PDB:4RFU) and gammanodavirus (PDB:5YKV) was 27.18%, and the root-mean-square deviation (RMSD) between the two structures was 3.418 Å. No conserved epitopes were found in these two clusters. In contrast, four unique LEs in the betanodavirus group and two unique LEs in the gammanodavirus group were detected. The results of the four predicted LEs for grouper-infecting betanodavirus are shown in Table 2. Most of the predicted LEs were exactly or partially identical to the predicted LEs of the whole betanodavirus subfamily. For example, BFNNV 283~295 :KKVAGNVGTPAGW was also predicted in the grouper-infecting betanodavirus subfamily trial (DGNNV 283~295 :KKFAGNAGTPAGW); BFNNV 302~322 :DNFNKTFTQGVAYYSDAQPRQ in the betanodavirus trial was split into BFNNV 302~309 :DNFNKTFT and BFNNV 314~322 :AYYSDAQPRQ in grouper-infecting virus trials (DGNNV 302~309 :DNFNKTFT; DGNNV 314~322 :AYYSDEQPRQ). In Figure 1a, we selected one subunit (PDB:4RFU) and one chain of this subunit to visualize the predicted epitopes in cartoon style. Each epitope is represented by a distinct color code. Identical or partially duplicated epitopes in the betanodavirus and grouper-infecting betanodavirus trials are colored the same. In Figure 1b, we also marked the residues with the following color codes: yellow (highly variable regions), pink (positively charged residues), and blue (negatively charged residues). In summary, the predicted epitopes for betanodavirus contained charged residues and were well conserved within highly variable regions.
There were four predicted LEs for betanodavirus; among them, three segments were considered epitope candidates and were synthesized for biological experiments. BFNNV244-251:GSTQLDIA is the only unique epitope in the betanodavirus. We synthesized the predicted LEs of betanodavirus that overlapped among the different trials (Table 2). Through comparison, we selected exclusive epitopes for betanodavirus (BFNNV 244-251 :GSTQLDIA, BFNNV 261~272 :RPLSIDYSLGTGDV, BFNNV 283~295 :KKVAGNVGTPAGW, and BFNNV 300~321 :LWDNFNKTFTQGVAYYSDAQP) and grouper-infecting betanodavirus (DGNNV 221-238 :PIMTQGSLYNDSLSTNDF and BFNNV221-238:PILTLGPLYNDSLAANDF). For comparison with previous research results, we synthesized EFNNV/GNNV 249-258 :DIAPDGAVFQ as a reference for antigenicity comparison with our predicted LEs [54]. An ELISA was performed to identify the host specificity of NNV for grouper species. The results revealed that a significant change occurred between pre- and post-immunization. These predicted LEs reflect a strong antigenic response in grouper species. In Figure 2, enzyme-linked immunosorbent assays (ELISAs) were performed to identify host-specific LEs. Synthetic peptides (10 µg) or coating buffer were coated on a 96-well microplate. All peptides were labeled with primary antibody (rabbit anti-NNV coat protein antibody) and secondary antibody (goat anti-rabbit IgG (H+L) conjugated with alkaline phosphatase). Detection was performed at 405 nm after adding the alkaline phosphatase substrate. The x-axis represents the comparison of antibodies against NNV before and after immunization. The y-axis indicates the absorbance value at 405 nm. The responses of the synthetic peptides for betanodavirus (BFNNV_CP244-251, 261-272, 283-295, and 302-322) are shown in (a-d). The response of the synthetic peptide for grouper-infecting betanodavirus (DGNNV_CP221-238) is shown in (e). The reference control EFNNV_CP249-258 was based on a previous report and is shown in (f).
Comparing pre- and post-immunization samples, the ELISA results showed that each candidate peptide was antigenic against the rabbit anti-NNV CP antibody after immunization, with significant differences (n = 3, p < 0.05) (Supplementary Tables).
Figure 2. Enzyme-linked immunosorbent assays (ELISAs) were performed to identify host-specific LEs. (a-d) Four betanodavirus exclusive epitopes; (e) grouper-infecting virus exclusive epitope; (f) a reference peptide. The symbol "-" represents the case of pre-immunization (without immunization), and "+" represents after immunization (rabbit antibody was immunized by NNV capsid protein). All ELISA values were further analyzed by GraphPad Prism 8.0 software with t-test (n = 3, * p < 0.05). Detailed experimental data can be found in Supplementary Tables S1 and S2.

Grouper-infecting betanodavirus
In Figure 3a, we applied traditional empirical approaches (Phyre2 and I-TASSER) and a state-of-the-art computational method (RoseTTAFold) to predict the P-domain of betanodavirus "4RFU" and "3JBM". The RMSD values of the predicted models are: Phyre2, 0.3; I-TASSER, 0.5; RoseTTAFold, 1.07. It is surprising that the traditional homology methods achieved a lower RMSD, i.e., a better prediction. In Figure 3b, the predicted epitopes are shown by structurally aligning the resolved structure "4RFU" with the RoseTTAFold-predicted "3JBM". Only 3JBM49~65:LSIDYSLGTGDV/4RFU49~65:LSIDYSLGTGDV and 3JBM117~201:VCTRVO/4RFU117~201:VCTRDSX show a difference in substructure, and the mapped epitopes do not show obvious differences.
For gammanodavirus subfamily analysis, two conserved epitopes, MrNV 257~276 :YNADTIGNWVPPTELKQTYT and MrNV353~360:AVDPKPYQ, are colored in both cartoon-style and space-filled 3D structural models and shown in Figure 4a. Figure 4b shows the alignment results from the self-developed multiple structure alignment system (AIR system). The charged residues and hypervariable regions of gammanodavirus and the sequence and structural alignment results of PvNV (PDB:5YKZ) and MrNV (PDB:5YKV) are shown. The RMSD of these two structures was 1.607 Å, despite their relatively low sequence similarity. It is worth noting that PvNV353~360:ASKKQTTG and MrNV353~360:AVDPKPYQ were located in regions of high variability and exhibited symmetrical structures in three dimensions. The β-strand peptide located at the C-terminus of the P-domain containing the last 26 amino acids (MrNV346~371:LVTDYQGAVDPKPYQYRIIRAIVGNN) was related to infectivity. MrNV353~360:AVDPKPYQ and DGNNV261~272:RPLSIDYSLGTGDV from betanodavirus showed similar properties (β-strand, charged, mostly protrusion shaped).
In Figure 5, the structural and sequence alignments of two types of NNV P-domain sequences (PDB:4RFU and 5YKV) are shown with secondary structure annotations. It can be observed that although the P-domains of the two structures share low similarity (RMSD: 3.418 Å), they show close homology, with similar compositions of secondary structures (β-strands).

Prediction Results between Alphanodavirus and Gammanodavirus
In the third trial, one conserved LE, five unique LEs for alphanodavirus, and two unique LEs for gammanodavirus were identified. The sequence similarity between alphanodavirus (subunit particles; PDB:1NOV) and gammanodavirus (S-domain; PDB:6AB6) was 21.28%. The five unique LEs for alphanodavirus were Nov 67-81 :DFSTDPGKGIPDKFQ, Nov 137-147 :PGFDQLFGTSAT, Nov 176-186 :AGSIQVYKIPL, Nov 258~268 :PPANVTNAQAS, and Nov 325~338 :ARESPANDEYALAA, and the two unique LEs for gammanodavirus were MrNV 41~49 :VAKPTVAP and MrNV 110~135 :SQFWERYRWHKAAVRYVPAVPNTLAC. Because alphanodavirus lacks the P-domain, the whole subunit structure and the predicted epitopes are visualized and colored in Figure 6. As shown in Figure 6a, the NNV S-domains of the three subfamilies, alphanodavirus (1NOV), betanodavirus (4RFT), and gammanodavirus (6AB6), were compared by sequence alignment with secondary structure annotations from DSSP. It can be observed that the core structure of the S-domain is mainly composed of beta-turn elements and that the three S-domains possess similar secondary and tertiary structure compositions. Therefore, the S-domain of NNV was relatively stable compared with the P-domain during evolution. The subunit (PDB:1NOV), trimer particle model, and total predicted epitopes are shown in Figure 6b. In Figure 6c, three types of NNV (subunit particles: PDB:1NOV and 6AB6; S-domain: 4RFT) were structurally aligned. The RMSD between the three structures was 2.761. In addition, four types of alphanodaviruses (Pav:1F8V, BBV:2BBV, FHV:4RFT, and Nov:1NOV) were structurally aligned; the RMSD between the four alphanodavirus S-domain structures was 1.195. Compared with both gammanodaviruses and betanodaviruses, alphanodaviruses possess two unique protrusion structural segments (NOV 187~225 :KQVLNSYSQTVATVPPTNLAQNTIAIDGLEALDALPNNN and Nov 258~281 :PPANVTNAQASMFTNLTFSGARYT), which are well conserved in the alphanodavirus genus. It was also observed that the predicted epitopes NOV 200~226 :VPPTNLAQNTIAIDGLEALDALPNNNY and NOV 258~281 :PPANVTNAQAS were located at the unique protrusion structure shown in Figure 6c.

Physicochemical Characteristics of Predicted Residues
The amino acid index was obtained by calculating the occurrence frequency, which is defined as the number of predicted epitope residues divided by the overall residues and surface residues. Glycine (G) and alanine (A) accounted for the two highest ratios of predicted epitopes. In contrast, histidine (H) and glutamine (Q) are less likely to be considered epitope residues. The charge states of the predicted residues were further examined, as shown in Figures 1, 3 and 5. Most of the peptides were located in highly variable regions and contained positively or negatively charged residues. Furthermore, the peptides were transformed into a point-cloud data structure and their corresponding mean curvature, Gaussian curvature, and shape index were analyzed, as shown in Figure 7. Most shape indices of the predicted epitopes were between −1 and −0.75 (spherical cup to rut) as a receptor.
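One possible reading of the occurrence frequency described above is sketched below: for each amino acid, the number of its occurrences inside predicted epitopes is divided by its number of occurrences in a reference set (either all residues or only surface residues). The exact denominator used in the study is not spelled out here, so this interpretation and the function name are assumptions.

```python
from collections import Counter
from typing import Dict


def epitope_residue_ratio(epitope_residues: str, reference_residues: str) -> Dict[str, float]:
    """Per-amino-acid ratio of occurrences in predicted epitopes to
    occurrences in a reference residue set (all or surface residues)."""
    in_epitopes = Counter(epitope_residues)
    in_reference = Counter(reference_residues)
    return {
        aa: in_epitopes[aa] / count
        for aa, count in in_reference.items()
        if count > 0
    }
```

Passing the concatenated predicted segments as `epitope_residues` and either the full sequences or only the surface-exposed residues as `reference_residues` yields the two variants of the index.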
Figure 7. Shape index of predicted epitopes from betanodavirus and gammanodavirus. The surfaces of the predicted epitopes were extracted from betanodavirus (4RFT) and gammanodavirus (5YKV and 5YKZ) with MSMS. PyMesh was then applied to calculate the Gaussian and mean curvatures for each vertex and to further calculate the shape index (SI). Each histogram is represented using the corresponding color of the peptide from the sequence alignment. (a) Epitopes of betanodaviruses; (b) epitopes of gammanodavirus.

Discussion
A comprehensive LE prediction system for host-specific antigens has been proposed. For group feature detection, the antigen sequences were clustered before being imported into the proposed system. For example, the Nodaviridae family can be categorized into three different subfamilies: alphanodavirus, betanodavirus, and gammanodavirus. In this study, we applied different combinations of existing resources to predict the conserved and unique LEs. Antigen sequences of each subfamily were analyzed using a multi-expert voting mechanism, and multiple structural alignments were performed to confirm the conserved and unique characteristics. At each position of the multiple sequence alignment, the consensus voting module selected epitope candidate residues by accumulating the votes of five established LE prediction tools. Beyond individually voted residues, runs of consecutive epitope residues had to reach a minimum length to be usable in subsequent experimental design. All LE candidates for different host-specific groups were cross-identified. In addition, to increase the success rate of the eventual vaccine design, we emphasized the surface structural characteristics of the predicted epitopes. Antigens without resolved structures can therefore be analyzed by applying structure prediction systems to obtain corresponding virtual structures. The predicted epitope residues were reconfirmed for their surface conditions and aligned using structural alignment tools for revalidation. Based on the alignment results, the predicted conserved and/or unique LEs for different subfamilies were sequentially and structurally distinguished. The proposed system was compared with an existing, well-known prediction system: we selected ABCPreds and used a betanodavirus antigen sequence as a test case.
In this case, ABCPreds predicted 10 epitopes for one of the betanodavirus sequences, covering 86% of the query antigen sequence. Conversely, our system predicted only four significant epitopes, and the total length of the predicted epitopes was 38.7% of the query antigen sequence. A unique aspect of the proposed system is that, in addition to achieving accurate prediction, it provides host-specific LE prediction as long as the antigen sequences can be clustered in advance. Through biological experiments, three of the predicted epitopes were initially validated as accurate epitopes with strong antigenicity responses. Our system utilizes host-specific features to predict effective epitopes for biologists, and the developed multi-expert voting mechanism-based LE prediction system can successfully predict LEs with significant antigenic specificity. In the antigenicity assay, we found that there were many overlapping LEs in the betanodavirus subfamily. To further compare the differences among these predicted LEs, five predicted LEs were selected, synthesized, and analyzed by ELISA. In a recent report, an epitope of EFNNV/GNNV 249~258 :DIAPDGAVFQ was shown to be effective in the giant grouper (Epinephelus lanceolatus); therefore, this epitope was selected as a reference for antigenicity analysis. As revealed by the ELISA tests, all predicted LEs displayed high antigenicity for the orange-spotted grouper. Interestingly, the last three amino acids in the predicted LE BFNNV 244~251 :GSTQLDIA were the first three residues in EF/GNNV 249~258 :DIAPDGAVFQ. This suggests that these residues might play an important role in grouper immunity. The predicted LEs were approximately 8-22 residues in length; the majority of the residues were located at the N-terminus of the capsid protein and were characterized by adequate antigenic properties and host specificity. Therefore, we hypothesized that the predicted epitopes were involved in the adaptive immunity of groupers.
However, further investigation via in vivo analysis is required to confirm this hypothesis. In conclusion, this prediction system based on host-specific characteristics provides important and exclusive information to fish immunologists for developing fish vaccines in an effective and efficient manner.
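The consensus voting step and the sequence-coverage comparison discussed above can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the function names, the vote threshold (`min_votes=3`), the minimum segment length (`min_length=8`), and the toy predictions are all assumptions chosen for demonstration.

```python
from typing import List, Sequence, Tuple


def consensus_epitopes(
    predictions: Sequence[Sequence[bool]],
    min_votes: int = 3,    # assumed threshold, not taken from the study
    min_length: int = 8,   # assumed minimum segment length
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) segments supported by enough tools.

    predictions: one boolean list per prediction tool, all of the same length;
        True marks a residue that the tool predicts as part of an epitope.
    """
    if min_votes < 1:
        raise ValueError("min_votes must be at least 1")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all prediction tracks must have the same length")

    # Accumulate the votes of all tools at each position.
    votes = [sum(1 for p in predictions if p[i]) for i in range(length)]

    segments = []
    start = None
    # A trailing 0 acts as a sentinel that closes a segment ending at the last residue.
    for i, v in enumerate(votes + [0]):
        if v >= min_votes and start is None:
            start = i
        elif v < min_votes and start is not None:
            if i - start >= min_length:
                segments.append((start + 1, i))
            start = None
    return segments


def coverage(segments: Sequence[Tuple[int, int]], sequence_length: int) -> float:
    """Fraction of the sequence covered by the predicted segments."""
    return sum(end - start + 1 for start, end in segments) / sequence_length


# Toy example with five hypothetical tools on a 20-residue sequence.
tool_1 = [False] * 2 + [True] * 10 + [False] * 8
tool_2 = [False] * 2 + [True] * 10 + [False] * 8
tool_3 = [False] * 3 + [True] * 9 + [False] * 8
tool_4 = [False] * 20
tool_5 = [False] * 15 + [True] * 5
toy_segments = consensus_epitopes([tool_1, tool_2, tool_3, tool_4, tool_5])
toy_coverage = coverage(toy_segments, 20)
```

Raising `min_votes` or `min_length` yields fewer and shorter predicted segments and therefore lower coverage, which is the kind of trade-off reflected in the comparison between a single predictor and the voting-based system.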
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v14071357/s1, Table S1: Raw data of the ELISA assay (all peptide samples were tested by ELISA, and each sample was coated on a 96-well microplate in triplicate; Nos. 1, 2, and 3 are the pre-immunization triplicates, and Nos. 4, 5, and 6 are the post-immunization triplicates); Table S2: All experimental data were analyzed by multiple t-tests (Holm-Šídák method).