Quantitative Proteomics Comparison of Total Expressed Proteomes of Anisakis simplex Sensu Stricto, A. pegreffii, and Their Hybrid Genotype

The total proteomes of Anisakis simplex s.s., A. pegreffii and their hybrid genotype were compared by quantitative proteomics (iTRAQ approach), which considers the level of expressed proteins. The comparison was made by means of two independent experiments comprising four biological replicates of A. simplex and two each of A. pegreffii and the hybrid between both species. A total of 1811 and 1976 proteins, respectively, were identified in the two experiments using public databases. One hundred ninety-six proteins were found to be significantly differentially expressed, and their relationships with the nematodes’ biological replicates were estimated by a multidimensional statistical approach. The pairwise Log2 ratio comparisons among them were treated statistically in order to convert them into discrete character states. Principal component analysis (PCA) confirms the validity of the method. This comparison selected thirty-seven proteins as discriminant taxonomic biomarkers among A. simplex, A. pegreffii and their hybrid genotype; 19 of these biomarkers, encoded by ten loci, are specific allergens of Anisakis (Ani s7, Ani s8, Ani s12, and Ani s14) and another (Ancylostoma secreted protein) is a common nematode venom allergen. The remaining markers comprise four unknown or non-characterized proteins, five different proteins (leucine-rich repeat domain) related to innate immunity, four proteolytic proteins (metalloendopeptidases), a lipase, a mitochondrial translocase protein, a neurotransmitter, a thyroxine transporter, and a structural collagen protein. The proposed methodology (proteomic and statistical) solidly characterizes a set of proteins that are suitable candidates for the new targeted proteomics.


Taxonomic Identification of Nematodes and Selection of Specimens for Proteins Extraction
Nematodes were collected during a general survey carried out in the Mercamadrid Central Fish Market on the incidence of these nematodes in commercial fish species. All specimens were captured in the FAO 27 major fishing area. A total of 235 pools of L3 larvae were randomly obtained from eight different fish hosts. Parasites were removed following published procedures [21,36,37]. The larval pools were rinsed in 0.9% saline solution and placed in an antibiotic-antimycotic solution (80 mg gentamycin sulphate, 0.625 mg amphotericin B, 10,000 IU penicillin G, 10 mg streptomycin sulphate, 4.5 mL of Hank's saline solution, made up to a volume of 10 mL with bi-distilled water; Sigma Aldrich, St. Louis, MO, USA). After 40 min in the disinfectant solution, the larval pools were rinsed in bi-distilled water for 1 h. From each larval pool (one per fish host), all specimens were isolated, and for each of these L3 larvae the caudal part was used for DNA extraction and PCR amplification for species and hybrid identification; this caudal part and the rest of the body (used for the proteomics experiment) were stored separately at −80 °C until required. Anisakis identification was performed following the taxonomic criteria of D'Amelio et al., 2000 [11] and Abollo et al., 2003 [14] using the ITS1 region of the nuclear ribosomal DNA (rDNA).

Selection of Specimens for Proteomics
We selected completely independent biological replicates of nematodes for the proteomics experiments. For A. simplex s.s. and A. pegreffii, pure "specimen populations" were independently selected from six Merluccius merluccius hosts. In the case of A. simplex, four pools of larvae, each containing 10 individuals, were prepared as four biological replicates (A. simplex-1, A. simplex-2, A. simplex-3, and A. simplex-4). For A. pegreffii, two independent biological replicates, each with 10 individuals, were prepared (A. pegreffii-1, A. pegreffii-2). Hybrid genotype specimens, however, were selected from different hosts (Micromesistius poutassou, Merluccius merluccius, Conger conger, Lepidorhombus boscii, Lophius budegassa, Lophius piscatorius, Scomber scombrus, Thunnus thynnus) in which one or both species and their hybrids were present, because no pure hybrid specimen populations were found. The hybrids can differ depending on the contribution of the maternal and paternal genomes (A. simplex s.s. or A. pegreffii). We used voucher material previously selected on the basis of mitochondrial DNA sequences, which differentiate the two types of hybrids [26] (data in DB Anisakis, "COIImarkers," www.anisakis.mncn.csic.es and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA316941), considering each type as an independent biological replicate (Hybrid-1 and Hybrid-2) according to its phylogenetic position (resemblance to A. simplex or A. pegreffii). A mix of six different maternal hybrid specimens was used for each replicate: for Hybrid-1, six specimens whose COII clusters with A. simplex (Vouchers 48.50, 82.19, 92.26, 99.39, 178.6, 187.31), and for Hybrid-2, six specimens whose COII clusters with A. pegreffii (Vouchers 45.9, 90.5, 115.18, 119.36, 122.4, 176.22). This guarantees the independence of the two hybrid biological replicates.

Protein Digestion and Tagging with iTRAQ-4-plex ® Reagent
For digestion, 40 µg of protein from each condition/population was reduced with 2 µL of 50 mM Tris(2-carboxyethyl)phosphine (TCEP, SCIEX), pH 8.0, at 37 °C for 60 min, followed by 2 µL of 200 mM cysteine-blocking reagent (Pierce MMTS, methyl methanethiosulfonate; Thermo Fisher Scientific, Waltham, MA, USA) for 10 min at room temperature. Samples were diluted to a urea concentration of 1 M with 25 mM TEAB. Digestions were initiated by adding sequencing-grade modified trypsin (Sigma-Aldrich, St. Louis, MO, USA) to each sample at a ratio of 1:20 (w/w), and the samples were then incubated at 37 °C overnight on a shaker. Sample digests were evaporated to dryness.
Digested samples were labelled at room temperature for 2 h with the iTRAQ Reagent Multi-plex kit (SCIEX, Foster City, CA, USA) according to the manufacturer's instructions. The iTRAQ labelling was performed separately and in parallel in two four-plex designs using tags 114, 115, 116, and 117 (Supplementary File 1, Labelling Scheme). To reinforce the criterion of independent comparison, the tags were assigned to A. simplex-1, A. pegreffii-1, A. simplex-2, and Hybrid-1 in iTRAQ2, and to A. simplex-3, A. pegreffii-2, A. simplex-4, and Hybrid-2 in iTRAQ3. After labelling, the samples were pooled, dried and desalted using a SEP-PAK C18 Cartridge (Waters). Finally, the cleaned tryptic peptides were evaporated to dryness and stored at −20 °C until analysis.
Data acquisition was performed with a TripleTOF 5600 System (SCIEX, Foster City, CA, USA) using an ionspray voltage floating (ISVF) of 2300 V, curtain gas (CUR) 35 L/h, interface heater temperature (IHT) 150 °C, ion source gas 1 (GS1) 25 L/h, and declustering potential (DP) 150 V. All data were acquired in information-dependent acquisition (IDA) mode with Analyst TF 1.7 software (SCIEX, Foster City, CA, USA). For the IDA parameters, a 0.25 s MS survey scan in the mass range of 350-1250 Da was followed by 30 MS/MS scans of 150 ms in the mass range of 100-1800. Switching criteria were set to ions with a mass-to-charge ratio (m/z) greater than 350 and smaller than 1250, a charge state of 2-5, and an abundance threshold of more than 90 counts (cps). Former target ions were excluded for 20 s. The IDA rolling collision energy (CE) parameters script was used to control the CE automatically. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE [39] partner repository with the dataset identifier PXD019289 and 10.6019/PXD019289.
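The switching criteria above can be read as a simple filter applied to the ions seen in each survey scan. The following is a minimal sketch of that logic, not the instrument's actual implementation: the `Precursor` class, the function name, the intensity-ranked selection of at most 30 ions, and the matching of excluded ions by m/z rounded to two decimals are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precursor:
    mz: float         # mass-to-charge ratio of the ion
    charge: int       # charge state
    intensity: float  # abundance in counts per second (cps)


def select_precursors(candidates, recently_fragmented, now_s,
                      max_per_cycle=30, exclusion_s=20.0):
    """Return the ions that pass the switching criteria, most intense first.

    recently_fragmented maps a rounded m/z (hypothetical matching rule)
    to the time in seconds at which that ion was last fragmented.
    """
    eligible = []
    for ion in candidates:
        if not (350.0 < ion.mz < 1250.0):   # m/z window
            continue
        if not (2 <= ion.charge <= 5):      # charge state 2-5
            continue
        if ion.intensity <= 90.0:           # more than 90 cps
            continue
        last = recently_fragmented.get(round(ion.mz, 2))
        if last is not None and now_s - last < exclusion_s:
            continue                        # still excluded (20 s)
        eligible.append(ion)
    # Assumption: the most abundant ions are fragmented first.
    eligible.sort(key=lambda ion: ion.intensity, reverse=True)
    return eligible[:max_per_cycle]
```

In this sketch an ion fragmented 5 s earlier is skipped, while the same ion becomes eligible again once 20 s have elapsed.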

Data Analysis
The mass spectrometry data obtained for the pooled samples were processed using PeakView ® 1.5.1 Software (SCIEX, Foster City, CA, USA) and exported as mgf files, which were searched against a combined protein database that included A. simplex, A. pegreffii and A. simplex x A. pegreffii (hybrid) protein sequences from Anisakis DB (www.anisakis.mncn.csic.es) (containing 252,756 protein-coding genes, including their corresponding reversed entries), using the Mascot Server v. 2.6.1 (Matrix Science, London, UK). The search parameters were: enzyme, trypsin; allowed missed cleavages, 2; fixed modifications, iTRAQ4plex (N-term and K) and beta-methylthiolation of cysteine; variable modifications, oxidation of methionine, acetyl (Protein N-term), pyrrolidone from E, and pyrrolidone from Q. Peptide mass tolerance was set at ±25 ppm for precursors and 0.05 Da for fragment masses. The confidence interval for protein identification was set to ≥95% (p < 0.05), although a threshold for correct identification of individual peptides was established above the 1% false discovery rate (FDR). Only proteins with at least two quantified peptides were considered in the quantitation statistical model. A 5% quantitation FDR threshold was used to consider proteins as differentially expressed.
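To make two of the search settings concrete, the sketch below converts the ±25 ppm precursor tolerance into an absolute mass window and applies the two-peptide minimum for quantitation. It only illustrates the arithmetic; the function names and the input dictionary are hypothetical, and none of this reproduces Mascot's internal behaviour.

```python
def ppm_window(mass, ppm=25.0):
    """Return the (low, high) bounds of a +/- ppm window around a mass."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def within_tolerance(observed, theoretical, ppm=25.0):
    """True if the observed mass lies within +/- ppm of the theoretical one."""
    return abs(observed - theoretical) <= theoretical * ppm / 1e6


def quantifiable(peptides_per_protein, min_peptides=2):
    """Keep accessions that have at least min_peptides quantified peptides."""
    return {acc for acc, n in peptides_per_protein.items() if n >= min_peptides}
```

For a precursor of 800 Da, for example, a 25 ppm tolerance corresponds to a window of about ±0.02 Da.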
A comparison of shared and independent proteins in the iTRAQ-2 and iTRAQ-3 experiments was performed, both for the total identified proteins and for those that met the 5% quantitation FDR threshold, by means of Venn diagrams using Venny software (CSIC, Madrid, Spain) [40].
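The comparison behind a two-set Venn diagram amounts to three set operations on the protein accessions of each experiment. A minimal sketch, assuming that proteins are matched by identical accession strings (the accessions in the usage note are invented placeholders, and this is not the Venny tool itself):

```python
def venn_partition(experiment_a, experiment_b):
    """Split two collections of accessions into the three Venn regions."""
    a, b = set(experiment_a), set(experiment_b)
    return {
        "only_a": a - b,   # found only in the first experiment
        "shared": a & b,   # found in both experiments
        "only_b": b - a,   # found only in the second experiment
    }
```

Calling `venn_partition(["p1", "p2", "p3"], ["p2", "p3", "p4"])` would place `p2` and `p3` in the shared region and one accession in each exclusive region.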

Coding Protein Regulation Values
Selection of the proteins that can be considered taxonomic markers was done by coding the quantified protein expression from the iTRAQ experiments into significant discrete states (0, 1). The coding (0, 1) was based on the significance level of the Log 2 ratios of the iTRAQ comparison experiments, according to the significance criterion for continuous characters [41,42], such that 0 is a down-regulated and 1 an up-regulated protein [35]. Only proteins that were identified in all biological replicates and showed some significant values were accepted. When the expression values of a protein did not follow the same tendency in both experiments, the coding considers both possibilities (up- or down-regulated) and the protein is coded with a question mark (?). Selected proteins were submitted to a functional protein association network analysis with STRING software [43].
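The coding rule can be sketched as a small function that combines the tendency observed in the two experiments. This is an illustration under stated assumptions: the ratios passed in are taken to be already significant, a positive Log 2 ratio is read as up-regulation relative to the A. simplex reference, and a ratio of exactly zero is treated as ambiguous. The function name is hypothetical.

```python
def code_state(log2_itraq2, log2_itraq3):
    """Code one protein in one entity as '1', '0' or '?'.

    '1' = up-regulated in both experiments, '0' = down-regulated in both,
    '?' = the two experiments do not follow the same tendency.
    """
    if log2_itraq2 > 0 and log2_itraq3 > 0:
        return "1"
    if log2_itraq2 < 0 and log2_itraq3 < 0:
        return "0"
    return "?"
```

Applied to every protein and every entity, this yields the kind of 0/1/? matrix on which the later steps operate.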

Statistical Comparison and Validation Experiments
In order to compare and assess the biologically independent pools of the two quantitative experiments, iTRAQ2 (A. simplex-1, A. pegreffii-1, A. simplex-2, and Hybrid-1) and iTRAQ3 (A. simplex-3, A. pegreffii-2, A. simplex-4 and Hybrid-2), correspondence analysis was performed for the proteins whose Log 2 ratio values were obtained in both experiments according to the established value thresholds. The final set of coded proteins was assessed by means of principal component analysis (PCA) to establish the overall resemblance of A. simplex, A. pegreffii, and their hybrids. For both multivariate exploratory methods, the Statistica v.6 programme (Tulsa, OK, USA) was used [44].
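As an illustration of how the percentage of explained variability is obtained in a PCA, the sketch below runs a covariance-based PCA through a singular value decomposition with NumPy. It is not the Statistica procedure used in the study; the choice of NumPy, the centring without scaling, and the layout of the input (rows as proteins, columns as the compared entities) are assumptions.

```python
import numpy as np


def pca_explained_variance(matrix):
    """Return (percent variance per component, component loadings).

    matrix: 2-D array-like, rows = observations (e.g. proteins),
    columns = variables (e.g. the compared entities).
    """
    x = np.asarray(matrix, dtype=float)
    centered = x - x.mean(axis=0)               # centre each column
    _, singular_values, components = np.linalg.svd(centered,
                                                   full_matrices=False)
    variances = singular_values ** 2            # proportional to eigenvalues
    total = variances.sum()
    if total == 0:
        raise ValueError("matrix has no variance to decompose")
    return 100.0 * variances / total, components
```

With three variables there are at most three components, so their percentages necessarily add up to 100%.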

Results
The biological samples were compared by means of two different iTRAQ experiments (iTRAQ2 and iTRAQ3) following the labelling scheme of Supplementary File 1. All comparisons were performed using an independent A. simplex replicate as a reference. Consequently, for iTRAQ2 the comparisons were: A. pegreffii-1 vs. A. simplex-1, Hybrid-1 vs. A. simplex-2, and A. simplex-2 vs. A. simplex-1. For iTRAQ3, the comparisons were: A. pegreffii-2 vs. A. simplex-3, Hybrid-2 vs. A. simplex-4, and A. simplex-4 vs. A. simplex-3. In total, 1811 and 1976 proteins, respectively, were identified in the two iTRAQ experiments (Supplementary File 1); the two experiments share 1423 proteins (Figure 1A).
The criterion for identifying differential regulation was at least two peptides showing a 95% significance level (p < 0.05) for differentially regulated proteins at 1% FDR at the peptide quantitation level, measured by the Log 2 ratios of the relative protein/peptide abundances of all independent biological pools compared with their respective references (A. simplex-1 and A. simplex-2; A. simplex-3 and A. simplex-4). One hundred ninety-six proteins met this criterion (Supplementary File 1, summary_quant_iTRAQ2-3): 154 proteins in iTRAQ2 and 162 proteins in iTRAQ3, with 120 significant proteins shared by both experiments (Figure 1B). These proteins were assigned according to their accession number in the published transcriptome [25]; their description, metabolic and biological function and homology with other nematodes, when available, are included in the Supplementary Materials (Supplementary File 1, Functional Analysis).
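The reported counts can be checked with the inclusion-exclusion rule for two sets: the number of distinct proteins equals the sum of both experiments minus the shared ones. A minimal sketch (the function name is hypothetical):

```python
def union_size(n_first, n_second, n_shared):
    """Number of distinct items across two sets, given their overlap."""
    if n_shared > min(n_first, n_second):
        raise ValueError("overlap cannot exceed either set")
    return n_first + n_second - n_shared


# Significant proteins: 154 (iTRAQ2) + 162 (iTRAQ3) - 120 shared
print(union_size(154, 162, 120))     # 196
# Identified proteins: 1811 + 1976 - 1423 shared
print(union_size(1811, 1976, 1423))  # 2364
```

The first result reproduces the 196 significant proteins stated in the text; the second is simply the arithmetic consequence of the reported totals, assuming identifications are matched by accession.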
Correspondence analysis of the raw data for the 196 proteins is possible only for the 120 common proteins (Figure 2), because only these have values in both experiments and replicates. This analysis confirms the biological independence of the selected pools of L3 individuals by detecting the variation in protein expression. The first three axes explain 88.689% of the total variability (41.30%, 33.22%, and 14.17%, respectively). Two sets of proteins split the factorial space: those whose expression is related to A. simplex and those whose expression is more closely related to A. pegreffii-Hybrid (Figure 2A,B). Protein expression would be more variable in A. pegreffii than in A. simplex, and the hybrid would resemble A. pegreffii more closely because they share a similar expression level in more common proteins than with A. simplex.
However, among these 120 common proteins there were 24 whose level of expression is so variable (up-regulated or down-regulated) in the reference biological replicates that they cannot be considered good taxonomic markers. In fact, given that A. pegreffii and the hybrid genotypes were each compared twice, separately, with two replicates of A. simplex (A. simplex-1 and A. simplex-2; A. simplex-3 and A. simplex-4), the reference ratio of Log 2 for both A. simplex replicates (As2/As1 and As4/As3) approximates 1 (i.e., the ratio obtained when A. simplex is compared with itself), defining the tendency of down- or up-regulation of proteins. In conclusion, of the initial 196 proteins that showed significant FDR values, those not considered for conversion to binary states were the ones that lacked values in some of the replicates or for which the reference Log 2 ratios (A. simplex-2/A. simplex-1 or A. simplex-4/A. simplex-3) were significant, indicating that some of the replicates deviated from the value of 1. This is because both A. simplex pools are from the same species, and the considered set of proteins has to be more similar in its expression. Consequently, 96 proteins were chosen for binary conversion of their expression level.
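The exclusion step described above can be sketched as a filter over per-protein records: a protein is kept only if it has a value in every comparison and neither reference comparison (A. simplex vs. A. simplex) is significant. The record layout and key names below are hypothetical and chosen only for illustration.

```python
def select_for_binary_coding(records):
    """Return the accessions that qualify for conversion to binary states.

    Each record is a dict with (hypothetical) keys:
      'accession'             -> str
      'log2_ratios'           -> {comparison name: float or None}
      'reference_significant' -> {'As2/As1': bool, 'As4/As3': bool}
    """
    kept = []
    for rec in records:
        has_all_values = all(v is not None
                             for v in rec["log2_ratios"].values())
        reference_stable = not any(rec["reference_significant"].values())
        if has_all_values and reference_stable:
            kept.append(rec["accession"])
    return kept
```

In the study this kind of filtering takes the 120 common proteins down to the 96 retained for binary conversion.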
The correspondence analysis for these 96 proteins (Figure 3) increases the explained variation to 95.5% for the first three axes (64.64%, 24.01%, and 10.85%, respectively) compared with the former correspondence analysis (Figure 2). This means that most of the detected variation could be explained by the selected proteins, which form three different groups, each directly related to one of the considered taxonomic units (A. simplex, A. pegreffii, and hybrid). Both species and their common hybrids are also clearly distinguished and split, narrowing the distance in the factorial space for the A. simplex biological replicates and also for the hybrid biological replicates (Figure 3A,B). The influence of the sets of proteins is reflected according to their level of expression. In this step, we chose those that discriminate according to a significant statistical criterion.
The overall level of significance of expression for these proteins was obtained by considering the three Log 2 ratios (A. pegreffii/A. simplex, A. hybrid/A. simplex, and A. simplex/A. simplex) and their relation to the average (x) and standard deviation (δx) of the ratios, including the biological replicates and the two experiments. Accordingly, the level of significance is established as 2δx if δx ≤ x, or as δx if δx ≥ x (gap-coding criterion of Archie [41]) (Table 1). This allows the significantly regulated proteins, treated as continuous characters, to be converted into discrete or binary states (1, up-regulated protein; 0, down-regulated protein) [42,45,46].
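The threshold rule stated above can be written out directly. The sketch below follows the rule literally as given in the text (2δx when δx ≤ x, otherwise δx); the function names are hypothetical, the use of the sample standard deviation is an assumption, and whether the mean should be taken in absolute value is not specified in the text and is therefore left as written.

```python
from statistics import mean, stdev


def gap_threshold(avg, sd):
    """Significance threshold: 2*sd when sd <= avg, otherwise sd."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    return 2 * sd if sd <= avg else sd


def threshold_from_ratios(log2_ratios):
    """Compute the threshold from a list of Log 2 ratios.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    return gap_threshold(mean(log2_ratios), stdev(log2_ratios))
```

For example, an average of 1.0 with a standard deviation of 0.4 gives a threshold of 0.8, whereas an average of 0.2 with a standard deviation of 0.5 gives 0.5.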
Principal component analysis (Figure 4) of the converted binary data (Table 1) confirms that the hybrid genotype resembles A. pegreffii more than A. simplex. The percentage of explanation is 100%: 38.60% for the first factor, 31.60% for the second factor, and 29.80% for the third factor. This demonstrates that the chosen proteins are good taxonomical markers to differentiate among the studied species and their hybrid. The factorial space is mainly formed by the proteins that show significant discriminant values according to the Archie statistical criteria [42,43] of Table 1 (Table S1).
Table 1. Ninety-six selected candidates among the differential proteins for 5% false discovery rate (FDR) at the quantitative level of significance, and their codification in binary states according to the statistical criterion of Archie [42].

[Figure 4, panels (A) and (B): principal component analysis of the converted binary data; see Supplementary Table S1. The percentage of explanation of total variability is 100% for three factors (axes).]
[Figure 5: Caenorhabditis elegans orthologs and the STRING [43] protein names are included in Table 1.]
The interaction network of the 96 selected proteins shows a very weak connection among them (Figure 5) when the confidence limit is set to 0.7. The interaction analysis was performed using the Caenorhabditis elegans protein database of STRING. The C. elegans orthologs and their protein names are given in Tables 1 and 2. Forty-four nodes with 12 edges (expected 3) were detected; however, most of the nodes are considered "orphans."
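The weakness of the network follows from simple counting: each edge joins at most two nodes, so 12 edges can connect at most 24 of the 44 nodes, and at least 20 nodes must remain unconnected. A minimal sketch of both the bound and the identification of "orphan" nodes in an edge list (function names and the toy labels in the usage note are hypothetical, not STRING output):

```python
def min_isolated(n_nodes, n_edges):
    """Lower bound on unconnected nodes: each edge touches at most 2 nodes."""
    return max(0, n_nodes - 2 * n_edges)


def isolated_nodes(nodes, edges):
    """Return the nodes that do not appear in any edge."""
    connected = {node for edge in edges for node in edge}
    return set(nodes) - connected


print(min_isolated(44, 12))  # 20
```

For a toy network with nodes `a` to `e` and edges `(a, b)` and `(b, c)`, `isolated_nodes` returns `d` and `e`.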
Binary conversion showed that 59 proteins are not informative, because they present the same state (1 or 0) in the three assays or show ambiguity for at least one entity (the state can be 0 or 1), denoted (?) in Table 1. In total, 37 proteins are clearly significant and are candidates to be considered protein taxonomical markers (Table 2). Only 12 of these taxonomical markers are included in the functional analysis; among them, the structural protein dpy-5 (cuticle collagen) is the central node of a cluster including two other protein markers (col-170 and nas-15) and dpy-18 (procollagen-proline 4-dioxygenase activity). None of the proteins in the interacting cluster formed by mif-1 (Tranitheryn5), T28F4.5 (Bm-DAP-1 identical), and mif-2 (cold-shock DNA-binding domain) can be considered taxonomical markers based on the statistical criteria. The last marker (nas-13, predicted zinc metalloase) is weakly included in a linear cluster with eef-1A.1, rla-1, and cct-1.
Table 2. Differentially expressed proteins selected for their biomarker value after comparison and conversion into binary states. For description and characteristics, the protein sequences were subjected to BLAST analysis against the UniProt and National Center for Biotechnology Information (NCBI) databases (1, up-regulated; 0, down-regulated; ?, ambiguous).

Discussion
Global proteomics analysis in anisakid nematodes is a very promising discipline [32,47], with applications ranging from clinical to diagnostic tools. There are approximately 20,000 genes in nematodes; however, it is estimated that there are hundreds of thousands of protein isoforms, most of them modified at the post-transcriptional stage. This complexity (proteins and isoforms) requires a very reliable methodology, which in our case is provided by the iTRAQ approach, allowing separation, quantification and identification in the same experiment. Comparing A. simplex, A. pegreffii, and their hybrid by means of two independent experiments is the basis of what is defined as the biomarker discovery phase in a global quantitative proteomic analysis [26]. Because quantification is very exact, the data are amenable to statistical analysis. We have assessed the independence of the compared biological pools through multifactorial statistical methods, which order the total variability of the factorial space and also define the importance of the proteins. The biological independence of both experiments, comparing four biological replicates of A. simplex and two biological replicates each of A. pegreffii and their common hybrids, has been assessed by means of multifactorial correspondence analysis (Figures 2 and 3). Lastly, the selected significant proteins (Table 1) that meet the criteria for identification (p < 0.05 FDR) were also submitted to a direct test in order to convert the quantitative data of both experiments into qualitative data reflecting the tendency of protein expression (up- or down-regulated) as binary data, as was formerly proposed [35].
Although the hybrid genotype has 50% of each parental species (Hybrid-1 from the A. simplex cluster; Hybrid-2 from the A. pegreffii cluster), the selected protein markers share a common expression pattern (Figure 3). With these results, it is also possible to compare by means of similarity clustering or by sequential taxonomic analysis in order to assign the parental species. Principal component analysis (Figure 4) confirms the binary conversion and stresses the contribution of each protein marker to the formation of the factorial space (Supplementary Materials, Table S1). Accordingly, the hybrid genotype shares 19 protein states with A. pegreffii, while only 4 protein states are shared with A. simplex. A. pegreffii and A. simplex share five protein states. This confirms that, although the hybrid is separated from the other two, it shares more proteomic similarities with A. pegreffii than with A. simplex, as was demonstrated when allergenic proteins were compared [22].
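Counting shared protein states between entities, as in the comparison above, can be sketched as a pairwise tally over the binary matrix. The data layout and function name are hypothetical, and ambiguous states (?) are assumed not to count as shared; the states in the usage note are invented placeholders rather than values from Table 2.

```python
from itertools import combinations


def shared_state_counts(states):
    """Count identical unambiguous states for every pair of entities.

    states: {entity name: {accession: '0', '1' or '?'}}
    Returns {(entity_a, entity_b): number of shared states}, with each
    pair ordered alphabetically.
    """
    counts = {}
    for a, b in combinations(sorted(states), 2):
        common = states[a].keys() & states[b].keys()
        counts[(a, b)] = sum(
            1 for acc in common
            if states[a][acc] == states[b][acc] and states[a][acc] != "?"
        )
    return counts
```

A tally of this kind is the simplest input for the similarity clustering mentioned in the text.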
The main characteristics (including the biological and biochemical processes in which they are involved) of the 37 proposed protein markers, and the differences among the studied taxonomic entities, are summarized in Table 2. These proteins are encoded in 33 different loci. Lipase class 3 family protein may function as a lipase (GO:0006629; catalysis of the hydrolysis of ester bonds of insoluble substrates such as triglycerides); these proteins are widely distributed in animals, plants and prokaryotes. It is up-regulated in A. pegreffii compared with A. simplex and their hybrid genotype. The class 3 family proteins are distantly related to other lipase families [48].
The leucine-rich repeat-containing domain shows equally significant differences in five loci. This domain is evolutionarily conserved in many proteins associated with innate immunity in plants, invertebrates and vertebrates, serving as a first line of defense [49]. It is up-regulated for the five transcripts in A. simplex, while it is down-regulated in A. pegreffii and the hybrid genotype, although one of the transcripts appears as unambiguous for A. pegreffii in Table 1 (ANAP13823-2TR). Transthyretin 46 is a protein known as a thyroid hormone-binding protein (a probable thyroxine transporter) and is considered an extracellular or secreted protein [50]; it has also been sequenced in nematodes (Toxocara canis and Caenorhabditis elegans), and it is up-regulated in A. simplex and the hybrid genotype. Zinc metalloase nas-13, zinc metalloase nas-15, and briggsae CBR-NAS-13 are endopeptidases (GO:0008270, GO:0004222; proteolytic peptidases) [51]. The endopeptidases are a very large family of proteolytic proteins present across the whole evolutionary tree. This would explain how the four transcripts (from four different loci) have a different representation (protein isoforms) in the three studied entities.
The allergenic proteins Ani s12 [52] and Ani s14 [53] have been demonstrated to be major allergens [54], although no functional or biological processes are known for them. Ani s12 is up-regulated in A. pegreffii and the hybrid genotype, while it is down-regulated in A. simplex. Ani s14, however, is down-regulated in both species while it is up-regulated in their hybrid. SXP RAL-2 family 2 isoform 1 has been demonstrated to be the heat-stable allergen Ani s8. Within this family of proteins there is another allergen of A. simplex (Ani s5) [55]. Ani s8 shares high homology with Ani s5, and IgE cross-reactivity between both allergens has been shown [56]. This allergen, codified in three different transcripts from two loci, is up-regulated in A. simplex and down-regulated in A. pegreffii and the hybrid genotype. UA3-recognized partial is the Anisakis simplex UA3-recognized allergen Ani s7 [57]. This allergen shows a significant expression state in four transcripts from two different loci (one and three transcripts, respectively). When both loci are considered, the protein is down-regulated in A. simplex and up-regulated in the hybrid genotypes, while for A. pegreffii it is up-regulated for one locus and ambiguous for the other. Allergen Ani s10 [58] is down-regulated in A. pegreffii and up-regulated in A. simplex, while both expression states occur in the hybrids. Anisakis simplex Anis11L1 mRNA for Ani s11-like protein precursor, complete cds, the precursor of Ani s11 [59], is up-regulated only in A. simplex.
Ancylostoma secreted protein, known as venom allergen protein, is differentially expressed through five transcripts from five different loci; it is up-regulated in A. pegreffii and the hybrid genotype. This type of protein is very important for plant and animal nematodes. These proteins are secreted during several stages of parasitism as a mix of proteins that includes a structurally conserved group of venom allergen-like proteins (VALs) [60], which cause damage to host tissue, and they are considered important proteins in host-parasite relationships. This protein is considered homologous to venom allergen 5.02, codified by two different loci.
Mitochondrial import inner membrane translocase subunit TIM44 (GO:0030150) is up-regulated in the hybrid genotype and down-regulated in both species. It interacts with the mtHsp70 chaperone protein (GO:0051087) and is a member of the complex found in the inner mitochondrial membrane [61]. Carboxylic ester hydrolase is a member of the cholinesterase family, whose function is to act as a neurotransmitter [62] in the membrane. It is up-regulated in A. pegreffii and down-regulated in A. simplex, and in the hybrid genotype it can be present in both states. The DUF4440 domain-containing protein is an uncharacterized, hypothetical protein with high homology with LOC101164525 (Oryzias latipes), and hypothetical protein ASU_00001 has high homology with protein R102.1 from C. elegans; both proteins were sequenced for the transcriptomic comparison of the three entities [25]. Finally, there are differences in the expression of the structural protein cuticle collagen dpy-5 (GO:0042302) (down-regulated in A. pegreffii, up-regulated in the hybrid genotype and ambiguous in A. simplex); this protein encodes a cuticle procollagen and is responsible for the final body phenotype [63]. There are also four uncharacterized proteins (numbers 75, 97, 135, and 141) which are nevertheless statistically very representative.
The proteomics approach presented here characterizes the level of protein expression in order to obtain robust protein markers that can be used both as a taxonomical and as a clinical tool, owing to the capacity of proteomics technology to process a high number of samples rapidly. The proposed method is very suitable for continuous values with reliable confidence, such as those obtained by iTRAQ analysis [35]. These data are clear when statistically significant differences from exact and precise proteomics can be recorded and applied in a binary matrix [29,34], as is the case in this study.
In conclusion, a total of 1811 and 1976 proteins were identified in the two iTRAQ experiments comparing the proteomes of A. simplex, A. pegreffii, and their hybrid genotype. Of those, 37 proteins are robust candidate markers according to the confidence limits imposed by the iTRAQ methodology. The statistical conversion of continuous characters (Log2 ratios of the relative protein/peptide abundances) to binary states allows us to recognize and simplify the statistical significance of the protein regulation level (up-regulated or down-regulated). The proposal of these proteins as candidate biomarkers is based on the quantitative identification of proteins combined with a statistical approach (multidimensional and classical parametric) used to reinforce the selection; however, a good marker requires further development in several predictive tests. The taxonomy of the genus Anisakis is solidly founded, avoiding environmental and fish host effects in species diagnosis. Consequently, a confirmatory analysis demonstrating that the protein selection, which met the defined expression criteria, coincides with the molecular taxonomy will allow us to develop high-throughput methodologies applied to clinical, epidemiological, and food technology fields. The development of proteomic technologies has greatly accelerated the confirmation of potential proteins as biomarkers. Quantitation using multiple-reaction monitoring mass spectrometry (MRM-MS) in combination with isotope-labeled internal standards has driven what is known as targeted proteomics [64], focused on targeted acquisition and targeted data analysis applying mass spectrometry [65]. Accordingly, the experimental validation of the taxonomical biomarkers proposed in this study remains to be developed, closing the canonical approach for biomarker detection using proteomics [66].
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/8/913/s1, Table S1. Contribution of proteins to the factors according to the Log 2 ratio, and correlation of the studied nematodes with the factors of the principal component analysis (Figure 4); Supplementary File 1. Parameters of the total (1811 and 1976) identified proteins and their accessions according to Anisakis DB (www.anisakis.mncn.csic.es), including summary results for the 196 significant proteins, their functional analysis, the labelling scheme, and the results of the independent experiments iTRAQ-2 and iTRAQ-3.