The Evolution, Gene Expression Profile, and Secretion of Digestive Peptidases in Lepidoptera Species

Serine peptidases (SPs) are responsible for most primary protein digestion in Lepidoptera species. An expansion of the number of genes encoding trypsin and chymotrypsin enzymes and the ability to upregulate the expression of some of these genes in response to peptidase inhibitor (PI) ingestion have been associated with the adaptation of Noctuidae moths to herbivory. To investigate whether these gene family expansion events are common to other Lepidoptera groups, we searched for all genes encoding putative trypsin and chymotrypsin enzymes in 23 publicly available genomes from this taxon. Phylogenetic analysis showed that several gene family expansion events may have occurred in the taxon’s evolutionary history and that these events gave rise to a very diverse group of enzymes, including proteins lacking the canonical SP catalytic triad. The expression profile of these enzymes along the midgut and the secretion mechanisms by which these enzymes enter the luminal content were also analyzed in Spodoptera frugiperda larvae using RNA-seq and proteomics. These results support the proposal of a midgut countercurrent flux responsible for the direction of these proteins to the anterior portion of the midgut and show that these enzymes reach the midgut lumen via both exocytosis and microapocrine secretion mechanisms.


Introduction
In insects, expansion events in gene families encoding digestive enzymes including cathepsins L [1,2] and D [3], lysozymes [4], and trypsins [5] have been associated with the adaptation of diverse taxonomic groups to their food sources. The role of these duplication events in the adaptation of these insects is hypothesized to be related to dosage amplification selection and the diversification of the expression and biochemical properties of the expanded genes. In herbivorous Lepidoptera species, the expansion of the number of genes encoding serine peptidases (SPs) has been mainly associated with adaptation to overcome the effects of host serine peptidase inhibitors (PIs) [6].
Lepidopteran larvae rely on serine peptidases (mainly trypsin and chymotrypsin enzymes) for primary protein digestion [7]. These enzymes are well suited to the alkaline midgut of these insects but also make them susceptible to dietary serine peptidase inhibitor (PI) ingestion. PIs improve plant defenses against insects by acting as pseudosubstrates and thereby limiting the effects of SPs on proteolysis [8]. Polyphagous Spodoptera frugiperda (Lepidoptera, Noctuidae) larvae exhibit a large set of genes encoding trypsin and chymotrypsin enzymes in their genomes (14 chymotrypsin and nine trypsin genes were identified in a previous assembly of the species transcriptome), which are differentially transcribed when the larvae are exposed to PIs [9,10]. In contrast, the less-generalist Diatraea saccharalis (Lepidoptera, Crambidae) larvae have a smaller set of genes encoding these enzymes (nine chymotrypsin and four trypsin genes were identified in the species transcriptome), and none of these genes are upregulated in response to soybean PI ingestion [6]. Spodoptera frugiperda larvae are able to physiologically adapt to soybean peptidase inhibitor ingestion [11], whereas Diatraea saccharalis larvae are significantly affected by these inhibitors [12]. These results show that Lepidoptera species rely on different strategies to overcome the deleterious effects of PIs, but in at least some of these species, SP gene expansion events must play an important role in adaptation to their host plants.
Approximately one-third of peptidases are serine peptidases. SP enzymes are typically composed of a catalytic Ser/His/Asp triad and function via a mechanism in which serine acts as a nucleophile, histidine acts as a general acid base and aspartate aids in histidine positioning and charge stabilization. However, enzymes exhibiting SP folds with noncanonical triads (such as Ser/His/Glu, Ser/His/His, Ser/Glu/Asp, or Gly/His/Asp) and dyads (Ser/Lys and Ser/His) have also been described [13]. These innovations have been shown to be useful in some cases; for example, the Ser/Glu/Asp triad is believed to allow the peptidase to function in low-pH environments [13]. In Arthropoda, serine peptidase homologs have been found to play important roles such as acting as antimicrobial molecules [14], collaborating in the activation of prophenoloxidases [15], and mediating cell adhesion [16]. Nonenzyme proteins can regulate their functional enzyme homologs, as observed for PLA2 inhibitors stabilizing the inactive form of phospholipase A2 [17] and SOD1 copper chaperones necessary for the full activity of superoxide dismutase [18]. For these pairs of proteins, the nonenzyme protein is hypothesized to have evolved from its enzyme counterpart [19]. However, the evolutionary history and role of most SP protein homologs in insects remain poorly understood.
Serine peptidases are synthesized as inactive zymogen precursors that must undergo an activation process. In this process, an N-terminal pro-peptide region of variable length is removed by a second peptidase enzyme that cleaves the precursor between a basic amino acid (lysine or arginine) and an isoleucine residue. In Lepidoptera, trypsin and chymotrypsin activation is believed to be performed by trypsins, as the activation region of both enzymes includes a high-affinity trypsin-binding site [5].
In the Lepidoptera species S. frugiperda, digestive trypsins are secreted in the anterior midgut region and bound to the membrane via a microapocrine mechanism in which small vesicles bud from microvilli [20]. Another pathway for anchoring SPs to the membrane was identified on the basis of the presence of detergent-resistant domains in midgut microvilli, where these domains likely act as a mechanism to recruit proteins for microapocrine secretion [21]. In the posterior region, these enzymes are soluble and secreted via exocytosis [20].
Trypsin and chymotrypsin enzymes present decreasing activity along the lepidopteran larval midgut, an effect that can be eliminated by disrupting the peritrophic membrane in calcofluor-treated larvae [22–24]. Thus, it seems that the compartmentalization achieved by the peritrophic membrane, together with water counterfluxes in the ectoperitrophic space (outside the PM), is responsible for recycling digestive enzymes in the midgut of these animals.
To better understand how the evolutionary history of trypsins and chymotrypsins is correlated with Lepidoptera species diversification, we searched for genes encoding these proteins in all available Lepidoptera genomes. Phylogenetic analysis revealed that several expansion events may have occurred in these genes during the evolutionary history of the order and also highlighted the emergence of several noncanonical peptidases over time. Moreover, we analyzed the expression profile of these genes along the midgut and the secretion mechanisms by which they reach the luminal space in Spodoptera frugiperda larvae via RNA-seq and proteomic analyses of different midgut regions.

Lepidopteran Trypsin and Chymotrypsin Gene Evolution
To understand how the genes encoding trypsins and chymotrypsins evolved during Lepidoptera species diversification and in which taxonomic groups gene family expansion events may have occurred, we searched for all protein sequences related to these enzymes in 23 Lepidoptera genomes with public gene annotation datasets. These sequences were then subjected to phylogenetic and substrate-binding site analyses (see the analyzed sequence positions in Figure 1A). Branches with good support predicted for each phylogenetic tree (designated T1 to T18 for trypsins and C1 to C23 for chymotrypsins) were individually analyzed to facilitate the identification of gene family expansion events and substitutions in the amino acid composition of the enzyme substrate-binding sites (Figures S1 and S2).
For trypsins, the largest numbers of gene copies are observed in the genomes of species from the Noctuidae (28 to 42), Plutellidae (37), and Sphingidae (35) moth families (Tables S1,S2). The number of genes encoding chymotrypsins is highest in the genomes of species from the moth families Plutellidae (71) and Noctuidae (32 to 58) and the butterfly family Pieridae (61) (Tables S3,S4). However, the number of genes encoding both trypsins and chymotrypsins is variable among Lepidoptera genomes, even between species within the same family, as observed for Ostrinia furnacalis and Chilo suppressalis from the Crambidae family.
The first six branches of both trees (T1-T6 and C1-C6) are included in a major branch together with homologous sequences from Diptera and/or Homo sapiens that were retrieved from the UniProt database (Figures S1 and S2). Several gene family expansion events can be inferred on other branches, such as the trypsin T7, T10, T15, T17, and T18 branches. These branches include multiple gene copies that likely originated from several duplication events, which probably occurred at different times during the evolutionary history of Lepidoptera species.
The trypsin and chymotrypsin similarity groups are highly variable in their substrate-binding sites (Figures S3 and S4). The only highly conserved sites are those related to the catalytic triad, disulfide bond-forming residues, and amino acids located at the S1 subsite. This high variability is also observed among the enzyme activation sites and pro-peptide regions (data not shown). Notably, the mammalian DDDD motif in the trypsinogen activation site, which shows high affinity for the activating enzyme enterokinase, is not observed in any of the trypsin groups.

Trypsin and Chymotrypsin Genes: Expression and Secretion Pathways in the Spodoptera frugiperda Midgut
Protein sequences classified as trypsin and chymotrypsin sequences were used to search for homologous proteins in the Spodoptera frugiperda transcriptome. S. frugiperda transcripts were analyzed according to their expression profile in three midgut regions (anterior (AM), middle (MM), and posterior (PM)) and the carcass (CAR, larvae without the midgut) (Figure 1B).
In agreement with our classification, all S. frugiperda genes exhibited higher expression (transcripts per million (TPM)) values in at least one midgut tissue than in the carcass (Tables 1,2). Notably, most trypsin and chymotrypsin genes were most highly expressed in the middle midgut (MM). Moreover, with only one exception, the proteomics data showed that all sequences chosen for phylogenetic analyses were present in the midgut and not in the other analyzed tissues. Only group T4 seemed to be composed of nondigestive trypsins. Sf_186.1 was highly abundant in hemolymph cells in contrast to the midgut fractions, and the low levels observed only in microapocrine vesicles could be due to hemolymph contamination.
Protein sequencing was effective, and approximately 50% of the sampled sequences included in Tables 1 and 2 were identified by this technique (14 out of 26 from Table 1 and 19 out of 36 from Table 2). In addition, most of the sequences identified by proteomics were sequences with higher expression values. Using proteomic data, we determined the secretion mechanism that is likely used by each protein. The enzymes that were observed in microapocrine vesicles and reached the PM and/or endoperitrophic contents were considered to be secreted via a microapocrine mechanism. By exclusion, the protein was considered to present an exocytic mechanism if it was not present in these vesicles but reached the PM and/or its contents. Finally, we did not consider a protein to be present in an evaluated sample when its quantitative value was less than 10% across all fractions because minor cross-contamination is intrinsic to sample handling for obtaining the different midgut fractions.
Notes to Tables 1 and 2. AM, MM, and PM: anterior, middle, and posterior portions of the midgut, respectively; CAR: carcass (body without the midgut). ‡ Normalized spectral abundance factor varying from 0 to 1,000,000. * The value represents quantitatively less than 10% of the observed protein in relation to its identification across all analyzed samples. ECS, endoperitrophic content supernatant; EMM, enterocyte microvillar membrane; EF, ectoperitrophic fluid; FB, fat body; HS, hemolymph soluble proteins; mvMP, microapocrine vesicle membrane proteins; mvSP, microapocrine vesicle soluble proteins; PMW, peritrophic membrane washing; PMS, peritrophic membrane supernatant. EC, exocytosis; HC, hemolymph cells; MA, microapocrine secretion; DRM, protein found in detergent-resistant domains present in the microvillar membrane [21]; E, empirical observation of a GPI anchor [21]; P, positive for GPI anchor prediction using PredGPI [25].
Table 1 (trypsins). Protein sequences with substitutions in the catalytic triad: a, S195F; b, S195A (with best BLAST hits in the tree: Sfrr_SFRICE004673-PA and Slit_XP_022821660.1, respectively, both with the same amino acid replacements in the catalytic triad as their S. frugiperda homologs).
Table 2 (chymotrypsins). Protein sequences with substitutions in the catalytic triad: a, H57Y; b, H57S and S195V; c, S195T (with best BLAST hits in the tree: Sfrr_SFRICE019800-PA, Sfrr_SFRICE005241-PA, and Sfrr_SFRICE000209-PA, respectively, all with the same amino acid replacements in the catalytic triad as their S. frugiperda homologs). A GPI anchor was not predicted for any chymotrypsin.
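The assignment rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run for Tables 1 and 2: the function names are hypothetical, the input is assumed to be a mapping from fraction abbreviation (as in the table notes) to a quantitative value for one protein, and the 10% cutoff is interpreted as a share of the protein's summed signal across all fractions.

```python
# Hypothetical sketch of the secretion-route assignment rule.
# Assumption: `fraction_values` maps fraction abbreviations (mvMP, mvSP,
# PMW, PMS, ECS, ...) to a quantitative value for a single protein.

VESICLE_FRACTIONS = ("mvMP", "mvSP")        # microapocrine vesicle fractions
LUMINAL_FRACTIONS = ("PMW", "PMS", "ECS")   # peritrophic membrane and its contents


def is_present(fraction_values, fraction, threshold=0.10):
    """Return True if `fraction` holds at least `threshold` of the total signal."""
    total = sum(fraction_values.values())
    if total == 0:
        return False
    return fraction_values.get(fraction, 0.0) / total >= threshold


def secretion_route(fraction_values):
    """Return 'MA' (microapocrine), 'EC' (exocytosis), or None if not luminal."""
    in_vesicles = any(is_present(fraction_values, f) for f in VESICLE_FRACTIONS)
    in_lumen = any(is_present(fraction_values, f) for f in LUMINAL_FRACTIONS)
    if not in_lumen:
        return None  # the rule only applies to proteins that reach the PM or its contents
    return "MA" if in_vesicles else "EC"
```

In this sketch, a protein with 5% of its signal in vesicle fractions and the remainder in the endoperitrophic content would be assigned to exocytosis, because the vesicle signal falls below the presence cutoff.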
Some of the proteins that were secreted in microapocrine vesicles could also be found in detergent-resistant domains, which is apparently a way to recruit these molecules for release into the lumen. Trypsins were not observed via this route, but three putative chymotrypsins and two SP homologs were enriched in these domains [21]. Two of these putative chymotrypsins are shown in Table 2. The other sequences did not meet the criteria for chymotrypsin identification used in the present work (see Material and Methods section).
Proteins on the same branch tend to follow the same secretory route (Tables 1,2). Exceptions to this pattern are observed in clades T18 (Figure S1) and C1 (Figure S2). Among the six sequences in group C21, only the low-abundance protein Sf_11251.1 was assigned to an exocytic route, whereas all other proteins seemed to be secreted through a microapocrine mechanism. The absence of Sf_11251.1 in microapocrine fractions could be due to its low abundance. Exocytosis seems to be the route used by the majority of both trypsins (70%) and chymotrypsins (63%). Chymotrypsins are also secreted in detergent-resistant membranes present in microapocrine vesicles through an unclear anchoring mechanism.
Other protein sequences from the S. frugiperda transcriptome predicted to include a trypsin domain (according to InterProScan analysis, IPR001254), but not classified as trypsins or chymotrypsins according to our criteria, are presented in Table S5. Some of these proteins were detected by proteomics analysis in the midgut lumen and are encoded by genes that are more highly expressed in the midgut than in the carcass. These proteins may represent unsampled digestive trypsin and chymotrypsin groups.

Noncanonical Peptidases and Their Putative Role in Lepidoptera Species
Several sequences with high similarity to trypsin and chymotrypsin sequences but with noncanonical amino acids occupying at least one of the positions constituting the catalytic triad (His57, Asp102, and Ser195) were observed. These sequences are highlighted in red in both phylogenetic trees (Figures S1 and S2).
Branches composed only of sequences exhibiting substitutions in the catalytic triad from different species were observed in both trees, showing that some of these sequences may have originated from duplications in ancient ancestral species and been preserved in these genomes to the present day.
Two genes encoding trypsin homologs with substitutions of the nucleophilic serine (S195F and S195A) and three genes encoding chymotrypsin homologs with substitutions of serine and histidine residues (H57Y, H57S, S195V, and S195T) were expressed in the S. frugiperda midgut. These substitutions were also present in the most similar sequences of these proteins in the enzyme phylogenetic trees. For example, the trypsin Sf_10598.1 transcript encodes a protein with an S195F substitution (i.e., it exhibits a phenylalanine residue instead of the catalytic serine residue). This amino acid replacement is also present in the most similar sequence to this protein in the trypsin phylogenetic tree (Sfrr_SFRICE004673-PA) (Figure S1). According to the phylogenetic analysis, this substitution may have occurred in a sequence of the ancestral Noctuidae species and been preserved in S. frugiperda, S. litura, H. virescens, and T. ni orthologous sequences. This gene is highly expressed in S. frugiperda midgut tissues (PM > AM > MM) and was identified by proteomics as being present in the peritrophic membrane and endoperitrophic space contents, showing that it may be an important protein in S. frugiperda digestion.
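The substitution labels used here (for example, S195F) follow directly from comparing the residues found at the three triad positions with the canonical His57, Asp102, and Ser195. A minimal Python sketch of that comparison is given below; the function name is hypothetical, and it assumes the residues at the three positions have already been extracted from an alignment numbered according to bovine chymotrypsinogen.

```python
# Canonical SP catalytic triad, chymotrypsinogen numbering (from the text).
CANONICAL_TRIAD = {57: "H", 102: "D", 195: "S"}


def triad_substitutions(residues):
    """List substitutions such as 'S195F' for a sequence.

    `residues` maps each triad position (57, 102, 195) to the one-letter
    residue observed at that position (an assumed, pre-extracted input).
    """
    substitutions = []
    for position in sorted(CANONICAL_TRIAD):
        expected = CANONICAL_TRIAD[position]
        observed = residues[position]
        if observed != expected:
            substitutions.append(f"{expected}{position}{observed}")
    return substitutions
```

An empty list corresponds to a canonical triad; any non-empty result flags the sequence as an SP homolog in the sense used in this section.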

Trypsin and Chymotrypsin Gene Families are Highly Expanded in Almost All Lepidoptera Groups
In this work, we searched for genes encoding trypsins and chymotrypsins in all available Lepidoptera genomes with protein annotation and analyzed their evolutionary history and sequence divergence. Phylogenetic analysis showed that gene family expansion events may have occurred in all studied lepidopteran taxonomic groups and at different times during the evolutionary history of these insects. Gene duplication events involving these gene families seem to be very common in Lepidoptera species, and many of these copies have survived in these genomes for several million years (Figures S1 and S2).
Our previous study highlighted the presence of four lepidopteran trypsin groups, designated L-VII, L-VIII, L-IX, and L-X, which are exclusively composed of sequences from the Noctuidae family and exhibited unusual characteristics in their substrate-binding sites. These groups were hypothesized to have resulted from a specific lineage expansion event originating in Noctuidae family-specific ancestors in the recent past. In this work, these four groups of sequences were merged into group T18, which also included sequences from the Noctuidae family. However, the present analyses based on complete genome datasets from five Noctuidae species (and two Spodoptera frugiperda lineages) showed that these sequences may be a result of duplication events that occurred only in more recent ancestral species. This conclusion is based on the fact that the genome of T. ni (from the Plusiinae subfamily) exhibits five sequences in the T18 group, but all of them are more similar to each other than to the sequences from subfamilies Heliothinae (H. virescens and H. armigera) and Amphipyrinae (S. litura and S. frugiperda) and are probably derived from gene duplication events that occurred after the speciation event that gave rise to these lineages. The groups previously reported to have resulted from Noctuidae lineage-specific expansion events may in fact have resulted from several duplication events that occurred in the common ancestral species of Heliothinae and Amphipyrinae and in its descendant species. Finally, substrate-binding site characteristics that were considered to be exclusive to this Noctuidae group are also reported in other sequences from different trypsin groups and families ( Figures S3 and S4).
Finally, based on the estimated average time for the silencing of one copy of a duplicated gene pair of approximately 4 million years [26], many of the duplicated copies observed in our trees may have already been silenced or will be in the years to come.

Trypsin and Chymotrypsin Genes are Mainly Expressed in the Middle Midgut, and the Proteins are Mainly Secreted by Exocytic and Microapocrine Vesicles
Trypsin and chymotrypsin enzymes exhibit decreasing activities along the midgut of S. frugiperda larvae [22]. However, according to our RNA-seq analysis, the genes encoding these proteins are mainly expressed in the middle midgut of this insect. One explanation for these findings is that the water counterflux in the ectoperitrophic space (outside PM) [22] must carry the newly synthesized proteins in the direction of the anterior midgut (see the schematic model of this mechanism in Figure 1B).
Biochemical and immunohistochemical data indicated that trypsin is secreted bound to the membrane via a microapocrine mechanism in the anterior midgut, whereas in the posterior region, trypsin is soluble and is observed in exocytic vesicles [20]. A further study suggested that detergent-resistant domains present in midgut microvilli could act as recruiters for protein secretion in microapocrine vesicles [21]. Although we did not obtain data about different regions along the midgut in our proteomic analyses, both microapocrine and exocytosis mechanisms can be proposed for different trypsins, as shown in Table 1. In addition, we observed trypsins in membrane fractions, confirming previous biochemical data.
It has been proposed that membrane-bound trypsins are anchored by a hydrophobic peptide [20]. However, the transmembrane prediction was only correct for a few low-abundance sequences, and none of these sequences were identified by mass spectrometry. The presence of a GPI anchor was empirically determined for Sf_2549.1, which exhibits low expression [21], and predicted for the highly abundant Sf_225.1. In both cited studies, phospholipase C treatment was used to solubilize GPI-anchored enzymes. Although Jordão and collaborators [20] did not observe differences between the trypsin activities of control and enzymatically treated samples, in another study, in which a highly sensitive mass spectrometer was used, five serine endopeptidase homologs were shown to be differentially released from midgut microvilli after phospholipase C incubation [21]. Thus, it seems likely that a third mechanism, such as lipidation, could explain how SPs with neither a transmembrane prediction nor a GPI anchor are bound to the membrane. This newly proposed mechanism could explain the trypsin activity observed in membrane fractions as well as how some proteins are associated with detergent-resistant domains (Table 1, Sf_290.1 and Sf_7342.1). Analytical efforts aimed at addressing this subject and the type of enzymes involved should be considered for future research in the field. In particular, the prediction tools for lipidation do not seem to be effective for insect sequences, which require the development of mass spectrometry protocols for these specific targets.
Proteomic analyses confirmed that at a minimum, approximately 50% of the S. frugiperda proteins selected in this work as putative trypsin and chymotrypsin sequences may be considered digestive enzymes, as they were found in the midgut contents (ECS) with a single exception (Sf_186.1). In groups T7, T18, C1, C20, C21, and C23, most or all of the proteins have been sequenced, suggesting an important role of these enzymes and their closely related forms in the digestive process in S. frugiperda. Note that even products of putatively recent gene duplication events are translated into proteins and may be actively involved in digestion. In contrast, T11, a group with an average transcription expression level, was not identified via proteomic analysis. In addition, the presence of Sf_1782.1 chymotrypsin associated with the fat body seems to be an artifact, as it was also detected in all midgut fractions.
In general, the secretory route (exocytic or microapocrine secretion) is common to both trypsins and chymotrypsins from the branches of the corresponding trees, with the exception of trypsin T18. We could not identify common signatures within enzyme sequences associated with the type of secretory route.

Trypsin- and Chymotrypsin-Homologous Sequences must Play an Important Role in Lepidopteran Evolution
We found several SP-homologous protein sequences with high similarity to trypsins and chymotrypsins, but with changes in at least one of the residues occupying the positions of the enzyme catalytic triad (Figures S1 and S2). SP homologs were found in all Lepidoptera genomes, and some of them were grouped on branches composed of closely related species, showing that these sequences may have originated from a duplication event of an SP gene in a common ancestor. In addition, the residue changes were maintained in these genes upon lineage speciation. This conservation over time and the presence of these enzymes in S. frugiperda tissues, as observed by RNA-seq and proteomics analyses (Tables 1,2), show that these enzymes might play important roles in these insects.
Inactive enzyme homologs have been widely found in different Metazoan species where they play different roles, particularly in regulatory processes [27]. Enzyme catalytic site residues play an essential role in maintaining the activity of enzymes toward their canonical substrates, and must therefore be subjected to high purifying selective pressure. The high frequency of SP homologs observed in our work may be related to the large number of gene expansion events observed during trypsin and chymotrypsin gene family evolution, which may lead to newly generated variants.
The putative role of these SP homologs may be linked to the regulation of other SP enzymes, antimicrobial activities and/or the inactivation of plant PIs. These roles must be further elucidated.

Chymotrypsin and Trypsin Sequence Identification
Protein sequence datasets obtained from Lepidoptera genome sequencing projects were retrieved from the NCBI Genome [28], BIPAA (https://bipaa.genouest.org/sp/spodoptera_frugiperda_pub/), and OrthoDB databases [29]. Quantitative measures for the assessment of the gene set completeness of the 23 analyzed genomes according to BUSCO results [30] are shown in Figure S5. For genes with transcript isoforms, only the longest protein sequence was retained for further analysis.
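The isoform filter in the last step can be illustrated with a short Python sketch. It is not the script used in this work: the function name and the input layout (tuples of gene identifier, protein identifier, and sequence) are assumptions made for the example, and ties in length are resolved here by keeping the first isoform encountered.

```python
def longest_isoform_per_gene(records):
    """Keep only the longest protein isoform for each gene.

    `records` is an iterable of (gene_id, protein_id, sequence) tuples
    (a hypothetical input layout). Returns {gene_id: (protein_id, sequence)}.
    """
    best = {}
    for gene_id, protein_id, sequence in records:
        current = best.get(gene_id)
        # Strictly longer replaces the stored isoform; equal length keeps the first.
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (protein_id, sequence)
    return best
```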
For all genomes, protein sequences were annotated using InterProScan software [31]. All sequences annotated as containing a trypsin domain (IPR001254) were selected for further analysis (see workflow in Figure S6). These sequences were grouped according to their sequence similarity using the MAFFT online service with the UPGMA hierarchical clustering method [32]. Insect and Homo sapiens sequences previously annotated as encoding serine peptidase proteins were also included in the tree prediction to guide the selection of the branches that included putative trypsin and chymotrypsin sequences.
Putative trypsin and chymotrypsin sequences were further classified according to the amino acid residue occupying position 189 (numbered according to bovine chymotrypsinogen, UniProt ID: P00766, here and throughout the text). Based on this criterion, sequences with an aspartate at position 189 were selected as putative trypsins, and the remaining sequences were selected as putative chymotrypsins. These two groups of sequences were further filtered via a neighbor-joining clustering method [32] and according to their best hit annotation in BLASTp searches against the RefSeq protein database. The predicted tree branches were classified according to their domain (predicted by InterProScan) and functional (predicted by RefSeq) annotation, and sequences within branches composed mostly of proteins annotated as nondigestive peptidases were removed from subsequent analyses. The classification and the number of sequences discarded for each insect genome in this step are provided in Table S6.
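The position-189 criterion can be sketched in Python as follows. This is an illustrative reading of the rule rather than the pipeline used here: both function names are hypothetical, and the sketch assumes a pairwise or multiple alignment in which the reference row is the bovine chymotrypsinogen sequence, so that counting its non-gap characters recovers the reference numbering.

```python
def residue_at_reference_position(ref_aligned, query_aligned, ref_pos):
    """Return the query character aligned to residue `ref_pos` of the reference.

    Both strings are rows of the same alignment ('-' marks a gap);
    `ref_pos` is 1-based and counts only non-gap reference characters.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned rows must have the same length")
    count = 0
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char != "-":
            count += 1
            if count == ref_pos:
                return query_char
    raise ValueError("reference position lies beyond the reference sequence")


def classify_by_position_189(residue):
    """Aspartate at position 189 -> putative trypsin; anything else -> putative chymotrypsin."""
    return "putative trypsin" if residue == "D" else "putative chymotrypsin"
```

How a gap in the query at this column should be handled is not specified in the text; in this sketch it simply falls into the "remaining sequences" class.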

Gene Phylogenetic Tree Inference
Trypsin and chymotrypsin protein sequences were individually aligned using MAFFT software [33] and submitted for phylogenetic tree prediction using RAxML [34]. A total of 100 ML (maximum likelihood) and 100 bootstrap searches were carried out using the best protein model automatically determined by RAxML by using the PROTGAMMAAUTO option. The resultant trees were visualized and annotated using FigTree software.
Branches with good support predicted for both enzyme trees were individually analyzed. These analyses included the number of genes per species genome and their consensus amino acids at positions related to the enzyme substrate-binding site (described in Figure 1A and based on studies by Bode et al. [35] and Hedstrom et al. [36]).

Spodoptera frugiperda Expression Analyses
Transcript sequencing analysis for Spodoptera frugiperda was conducted using the HiSeq2500 platform (Illumina/Solexa) and the paired-end strategy (2 × 100 bp). Three biological replicates, including ten insects each, divided into three midgut regions (anterior, middle, and posterior) and the carcass (larvae without the midgut), were used in the analysis. Total RNA was extracted using TRIzol (Invitrogen, Carlsbad, CA, USA) following the manufacturer's instructions. Transcriptome assembly and RNA-seq analysis were conducted according to the method described by Dias et al. [37].

Midgut Fractionation and Proteomic Analyses
Figure 1B shows a schematic representation of the fractions analyzed by proteomics. Midgut microvillar membranes and microapocrine vesicles were obtained as previously described using pools of 25 last-instar larvae [21,38]. The ectoperitrophic fluid is equivalent to the last supernatant obtained from midgut washing for microapocrine vesicle isolation. The peritrophic membrane and the endoperitrophic content were separated after dissection. Then, the PM was washed twice in 100 µL of two different solutions of 25 mM ammonium bicarbonate buffer, pH 7.5 (proteomics buffer), for 1 min at 3000 rpm. The solutions obtained from 25 PMs were pooled and centrifuged at 600× g for 20 min at 4 °C. The supernatant was collected and used as a peritrophic membrane washing (PMW) sample source. The remaining PM samples were homogenized in proteomics buffer using a Potter-Elvehjem device, followed by centrifugation for 20 min at 600× g and 4 °C, and the supernatant was collected for proteomic analyses. The isolated endoperitrophic contents of 25 animals were subjected to the same subsequent procedure as the PM remnants prior to analyses.
Hemolymph (20 µL) was collected from a single animal after a small puncture was introduced near its third leg. Quickly thereafter, the sample was subjected to centrifugation at 16,100× g for 5 min at 4 °C. The supernatant was collected, and the pellet was resuspended in 20 µL of deionized water. Both fractions were immediately frozen at −80 °C. Fat bodies from the same larvae were then dissected and homogenized in deionized water using a Potter-Elvehjem device.
Due to the different nature of the protein sources, two different protocols were used for protein digestion. In-gel and in-solution digestion were performed as previously described using 30 or 15 µg of protein, respectively [21]. Each gel lane was cut into 2 mm bands, which were pooled into five samples for analysis, each comprising six consecutive bands starting from the top of the gel. Two biological replicates of each midgut fraction, of hemolymph, and of fat body were analyzed after digestion by the two protocols.
Analyses were conducted in a Q-Exactive Plus instrument (Thermo Scientific) equipped with a nanoAcquity UPLC System (Waters). Tryptic digests were resuspended in solution A (0.1% formic acid, 1% acetonitrile). The samples were first trapped in a nanoACQUITY UPLC Symmetry™ C18 Trap Column (100 Å, 5 µm, 180 µm × 200 mm, Waters) and then separated in a Peptide BEH C18 nanoACQUITY Column (130 Å, 1.7 µm, 150 µm × 100 mm, Waters). Trapping was performed for 5 min at a constant flow rate of 3 µL/min in solution A. Peptides were separated through a gradient from 100% solution A to 40% solution B (0.1% formic acid, 99% acetonitrile) over 90 min at a constant flow rate of 250 nL/min. The column was subsequently cleaned for 10 min with 95% solution B. The Q-Exactive Plus MS was operated in positive-ion mode with data-dependent acquisition. MS spectra were acquired in the Orbitrap with an automatic gain control (AGC) target value of 10 × 10⁶ ions and a maximum fill time of 80 ms, and the full-scan event was performed over the mass range of m/z 380 to 1800 at a resolution of 70,000. Ions selected for MS/MS were then dynamically excluded for 30 s. For MS2 scans, higher-energy collisional dissociation (HCD) fragmentation was used, and the spectra were acquired at a resolution of 20,000; MS2 scans were acquired for the 10 most abundant ions, with an activation time of 0.1 ms over a mass range of m/z 100 to 2000. The minimal signal intensity required to trigger MS/MS fragmentation was 2.5 × 10⁵. Raw files were accessed using Xcalibur software (Thermo Scientific).
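The precursor selection logic of the data-dependent acquisition described above can be illustrated with a minimal Python sketch. It is not the instrument's control software: the `Peak` class, the function `select_precursors`, and the 10 ppm window used to match excluded ions are assumptions made for illustration only, whereas the top-10 limit, the 2.5 × 10⁵ intensity threshold, the 30 s dynamic exclusion, and the m/z 380 to 1800 range are taken from the text. Charge-state and isotope handling are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """One MS1 peak (hypothetical container used only in this sketch)."""
    mz: float
    intensity: float


def select_precursors(peaks, scan_time_s, excluded, top_n=10,
                      min_intensity=2.5e5, exclusion_s=30.0,
                      mz_range=(380.0, 1800.0), tol_ppm=10.0):
    """Pick up to `top_n` precursors from one MS1 scan.

    `excluded` maps an excluded m/z to the time (s) at which its exclusion
    expires; it is updated in place. `tol_ppm` is an assumed matching window.
    """
    # Forget exclusions that have expired by the time of this scan.
    for mz in [m for m, t in excluded.items() if t <= scan_time_s]:
        del excluded[mz]

    def is_excluded(mz):
        return any(abs(mz - ex) / ex * 1e6 <= tol_ppm for ex in excluded)

    # Keep peaks inside the scan range, above threshold, and not excluded.
    candidates = [
        p for p in peaks
        if mz_range[0] <= p.mz <= mz_range[1]
        and p.intensity >= min_intensity
        and not is_excluded(p.mz)
    ]
    # Most abundant ions first, truncated to the top-N limit.
    candidates.sort(key=lambda p: p.intensity, reverse=True)
    chosen = candidates[:top_n]

    # Selected ions are dynamically excluded for `exclusion_s` seconds.
    for p in chosen:
        excluded[p.mz] = scan_time_s + exclusion_s
    return chosen
```

Calling the function on consecutive scans with the same `excluded` dictionary reproduces the intended behavior: an ion fragmented in one scan is skipped in later scans until its 30 s exclusion has elapsed.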
Mascot generic format (MGF) files were generated using msConvert [39] with default parameters. The abovementioned transcriptome database was used for protein identification. The searches in Mascot (Matrix Science) allowed mass tolerances of 10 ppm for MS1 and 0.1 Da for MS2. Cysteine carbamidomethylation was set as a fixed modification, whereas methionine oxidation and asparagine deamidation were set as variable modifications. Up to two missed trypsin cleavages were allowed. The search results for the replicates of each midgut fraction, hemolymph, or fat body tissue obtained via both protocols were loaded as a single experiment into Scaffold 4 [40] for comparisons between them using multidimensional protein identification technology. Across the overall experiment, a protein was considered positively identified only when at least two of its peptides were sequenced. Under these conditions, a false discovery rate of 0.2% was observed. Quantification was performed using the normalized spectral abundance factor [41] in Scaffold.
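For reference, the normalized spectral abundance factor (NSAF) is usually defined as the spectral count of a protein divided by its length, normalized by the sum of that ratio over all proteins in the sample. The following Python sketch illustrates this standard definition together with the two-peptide identification criterion. It is not Scaffold's implementation, which may apply additional normalization; the function names and the input layout are assumptions made for this example.

```python
def filter_identified(peptide_counts, min_peptides=2):
    """Return the set of proteins with at least `min_peptides` sequenced peptides.

    `peptide_counts` maps a protein name to its number of sequenced peptides
    (hypothetical input layout).
    """
    return {name for name, n in peptide_counts.items() if n >= min_peptides}


def nsaf(proteins):
    """Compute NSAF for one sample.

    `proteins` maps a protein name to a (spectral_count, length) tuple, where
    length is the number of amino acid residues.
    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    """
    # Spectral abundance factor: spectral count corrected for protein length.
    saf = {name: spc / length for name, (spc, length) in proteins.items()}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were assigned to any protein")
    # Normalizing by the total makes values comparable across samples.
    return {name: value / total for name, value in saf.items()}


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    peptides = {"protA": 3, "protB": 2, "protC": 1}
    counts = {"protA": (10, 100), "protB": (30, 300), "protC": (5, 250)}
    kept = filter_identified(peptides)
    values = nsaf({name: counts[name] for name in kept})
    for name in sorted(values):
        print(name, round(values[name], 3))
```

Because NSAF values sum to 1 within a sample, they express relative rather than absolute abundance.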

Conclusions
We showed that the trypsin and chymotrypsin gene families are expanded in almost all Lepidoptera groups and that the large numbers of genes in these families are probably the result of duplication events that occurred at different times during the evolutionary history of Lepidoptera. We also identified a large number of genes encoding proteins with high similarity to trypsins and chymotrypsins but with an amino acid replacement at one or more positions of the enzyme's catalytic triad. Some of these proteins were confirmed to occur in different regions of S. frugiperda larvae by RNA-seq and proteomics analysis, suggesting a functional role.
Finally, we demonstrated the relevance of a recycling mechanism allowing newly synthesized enzymes to be carried toward the anterior midgut region in S. frugiperda. In addition, we propose that these enzymes are secreted via both exocytosis and microapocrine secretion mechanisms.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Figure S1: Maximum likelihood phylogenetic tree based on putative trypsin protein sequences, Figure S2: Maximum likelihood phylogenetic tree based on putative chymotrypsin protein sequences, Figure S3: Trypsin-binding site consensus for each similarity group, Figure S4: Chymotrypsin-binding site consensus for each similarity group, Figure S5: BUSCO assessment results for the 23 Lepidoptera species protein datasets, Figure S6: Overview of the bioinformatic workflow used in the annotation of putative trypsin and chymotrypsin protein sequences, Table S1: Distribution of the number of genes related to each trypsin similarity group by genome, Table S2: Description of the trypsin protein sequences grouped in each similarity group, Table S3: Distribution of the number of genes related to each chymotrypsin similarity group by genome, Table S4: Description of the chymotrypsin protein sequences grouped in each similarity group, Table S5: Other sequences with trypsin domains in the S. frugiperda transcriptome, Table S6: Number of sequences with an identified trypsin domain that were not classified as trypsin or chymotrypsin sequences for each analyzed genome.