The Set of Serine Peptidases of the Tenebrio molitor Beetle: Transcriptomic Analysis on Different Developmental Stages

Serine peptidases (SPs) of the chymotrypsin S1A subfamily are an extensive group of enzymes found in all animal organisms, including insects. Here, we provide an analysis of SPs in the yellow mealworm Tenebrio molitor transcriptome and genome datasets and profile their expression patterns at various stages of ontogeny. A total of 269 SP-related sequences were identified, including 137 with conserved catalytic triad residues, while 125 others lacking conservation were proposed as non-active serine peptidase homologs (SPHs). Seven deduced sequences exhibit a complex domain organization with two or three peptidase units (domains), each predicted as either active or non-active. The largest group of 84 SPs and 102 SPHs had no regulatory domains in the propeptide, and the majority of them were expressed only in the feeding life stages, larvae and adults, presumably playing an important role in digestion. The remaining 53 SPs and 23 SPHs had different regulatory domains and showed constitutive or upregulated expression at the egg and/or pupal stages, presumably participating in the regulation of various physiological processes. The majority of polypeptidases were mainly expressed at the pupal and adult stages. The data obtained expand our knowledge of SPs/SPHs and provide the basis for further studies of the functions of proteins from the S1A subfamily in T. molitor.

The development of high-throughput sequencing technologies has led to the appearance of high-quality genome assemblies for the whole-genome investigation of SP/SPH genes, performed for model insects, as well as species of great agricultural and medical importance. Genome-wide analyses in beetles (Coleoptera) identified 125 SP/SPH genes in Rhyzopertha dominica (Bostrichidae) [25]. From the first coleopteran sequenced genome of the red flour beetle Tribolium castaneum (Tenebrionidae), 177 genes coding for SPs/SPHs were identified [12,26]. For another tenebrionid, the yellow mealworm Tenebrio molitor, 38 SP/SPH transcripts were previously identified in the larval gut [27], two of which corresponded to the major digestive trypsin and chymotrypsin studied using biochemical approaches [28,29]. Later, 48 SP/SPH transcripts were identified in the larval gut during the study of Cry3A intoxication in T. molitor [30]. Analyzing trypsin-like SPs/SPHs in transcriptome datasets from different stages of the T. molitor life cycle, we have previously de novo assembled 54 trypsins and five trypsin-like SPHs [31]. We also characterized recombinant preparations of an SP, SerP38, and an SPH, SerPH122, expressed in the Komagataella kurtzmanii system [32,33]. Recent work by Wu and coauthors [34] provided information on 200 T. molitor genes, including 112 SPs and 88 SPHs, and used transcriptome datasets together with RT-PCR analysis to profile the expression of SP-related genes at various developmental stages and in various tissues.
Here, we present the extended and corrected dataset of putative T. molitor SP/SPH cDNAs obtained from genome and transcriptome datasets. We have identified several groups of deduced proteins based on the composition of their active site and predicted specificity, analyzed evolutionary relationships, and evaluated differential expression along the life cycle. Finally, sets of SP-related genes involved in digestion, embryonic development, metamorphosis, and innate immunity were predicted, providing valuable information for further physiological, biochemical, and phylogenetic studies of tenebrionid pests. These data are of particular interest because T. molitor is the first insect approved by the European Food Safety Authority as a novel food in specific conditions and uses, testifying to its growing relevance and potential [35].

Results
2.1. General Characteristics of T. molitor Predicted SPs/SPHs of the S1A Subfamily

2.1.1. Identified Set of Peptidase-like Sequences
Analysis of the total T. molitor transcriptome assembly and of transcriptomes from different developmental stages, coupled with verification of sequences in three new whole genome assemblies (GCA_027725215.1; GCA_014282415.3; GCA_907166875.3), revealed a total of 269 mRNA sequences encoding putative SPs and SPHs. Of these, 137 were transcripts of active SPs with a conserved catalytic triad of amino acid residues in the active center (H57, D102, S195), whereas 125 sequences having one or more substitutions in the catalytic triad were SPHs. In addition, there were seven sequences of polypeptidases (polyserases in humans according to [36]) containing two or three tandem peptidase domains, SP and/or SPH, translated from a single ORF as an integral part of the same polypeptide chain.
Bioinformatics analysis allowed us to discover 69 new sequences, and the structure of another 23 sequences previously available [27,31,34] was revised and reannotated.
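The SP/SPH distinction described above reduces to a check of three residues. The following is a minimal sketch of that rule, not the pipeline used in this study; the function names and the two example inputs are hypothetical, and residues are assumed to be already mapped to chymotrypsin numbering.

```python
# Canonical catalytic triad in chymotrypsin numbering, as stated in the text.
CATALYTIC_TRIAD = {57: "H", 102: "D", 195: "S"}


def triad_substitutions(residues):
    """List substitutions (e.g. 'S195G') for a {position: residue} mapping."""
    return [
        f"{expected}{position}{residues[position]}"
        for position, expected in sorted(CATALYTIC_TRIAD.items())
        if residues[position] != expected
    ]


def classify(residues):
    """Return 'SP' if the triad is fully conserved, otherwise 'SPH'."""
    return "SPH" if triad_substitutions(residues) else "SP"


# Hypothetical inputs for illustration only (not sequences from the dataset).
examples = {
    "seq_active": {57: "H", 102: "D", 195: "S"},
    "seq_homolog": {57: "Q", 102: "D", 195: "G"},
}
for name, residues in examples.items():
    print(name, classify(residues), triad_substitutions(residues))
```

A sequence with any single substitution is labeled SPH here, which matches the definition "one or more substitutions in the catalytic triad".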

Annotation of Predicted Protein Sequences of T. molitor SPs
The sequences of active SPs were analyzed by the composition of the S1 substrate-binding subsite, where three amino acid residues in positions 189, 216, and 226 reflect to a large extent the specificity of the peptidase [9]. We identified trypsins as SPs with a conserved set of amino acid residues in the S1 subsite, D189, G216, G226 (DGG), which places a negative charge at the base of the S1 pocket and ensures specificity for basic residues (R/K) at the P1 position of the substrate [37]. Those with A, T, or S at positions 216 or 226 instead of G, while keeping the negatively charged D at the bottom (DGA; DGT; DSG; DAT), were tentatively named trypsin-like, although their specificity is questionable due to larger side chains located at the pocket walls. Predicted peptidases lacking the negative charge at the base of the S1 pocket were defined as chymotrypsin- or elastase-like according to the residues that occupy the wall positions 216 and 226. Those with small amino acid residues (SGS; SGA; GGS; GAS; GSG; SSG), including sequences with a negative charge in the pocket wall (GGD), characteristic of insects [38], were predicted as chymotrypsin-like, for which specificity towards large aromatic (F, Y, W) or mid-size aliphatic (L) side chains in the P1 position is generally accepted. In putative elastase-like SPs, by contrast, wall position 216 is occupied by bulky hydrophobic residues (SVS; GVS; GVN; GIS; GFS; GYS), which generally provides a platform for interaction with small hydrophobic residues at P1. A group of non-annotated peptidases with an unusual S1 subsite was also established, for which specificity could not be reasonably predicted from sequence analysis. The most numerous SPs were trypsins, with 64 sequences. Other groups included 10 trypsin-like peptidases, 30 chymotrypsin-like peptidases, 18 elastase-like peptidases, and 15 non-annotated peptidases.
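The annotation scheme above can be illustrated as a simple lookup keyed on the three S1 residues (positions 189, 216, 226). This is a sketch under the assumption that only the motifs listed in this paragraph are recognized; the function name is hypothetical, and anything not listed falls into the non-annotated group.

```python
# S1 subsite motifs (positions 189, 216, 226) exactly as listed in the text.
S1_CLASSES = {
    "trypsin": {"DGG"},
    "trypsin-like": {"DGA", "DGT", "DSG", "DAT"},
    "chymotrypsin-like": {"SGS", "SGA", "GGS", "GAS", "GSG", "SSG", "GGD"},
    "elastase-like": {"SVS", "GVS", "GVN", "GIS", "GFS", "GYS"},
}


def annotate_s1(subsite):
    """Map a three-letter S1 motif to a predicted specificity class."""
    motif = subsite.upper()
    if len(motif) != 3:
        raise ValueError("expected residues at positions 189, 216 and 226")
    for label, motifs in S1_CLASSES.items():
        if motif in motifs:
            return label
    # Unusual combinations are left without a specificity prediction.
    return "non-annotated"


for motif in ("DGG", "DGA", "GGD", "GIS", "RGV"):
    print(motif, "->", annotate_s1(motif))
```

A lookup table was chosen over residue-property rules because the text defines the classes by enumerated motifs; extending it to further motifs would require the authors' criteria.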

Domain Organization
To propose the functional role of T. molitor SPs/SPHs, their domain organization was studied. The vast majority of the sequences were presented as preproenzymes. The predomain, or N-terminal signal peptide responsible for the secretory pathway, was found in 262 sequences out of the 269 studied. Eighty-three sequences contained one or more regulatory domains in the propeptide structure responsible for various physiological functions in the insect. Namely, these were 53 sequences out of 137 SPs with the classical catalytic triad, 23 sequences out of 125 SPHs, and all seven sequences of polypeptidases. Thirteen peptidases had a transmembrane domain (TM). Among them, seven had a TM at the N-terminus and six at the C-terminus. Most sequences of mature enzymes without a prodomain contained 225-260 amino acid residues.
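The counts reported so far can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text; the dictionary layout and variable names are illustrative.

```python
# Sequence counts as reported in the text.
counts = {
    "SP": {"total": 137, "with_regulatory": 53},
    "SPH": {"total": 125, "with_regulatory": 23},
    "polypeptidase": {"total": 7, "with_regulatory": 7},
}

total = sum(c["total"] for c in counts.values())
with_regulatory = sum(c["with_regulatory"] for c in counts.values())
without_regulatory = {
    name: c["total"] - c["with_regulatory"] for name, c in counts.items()
}

# Specificity classes of the active SPs.
sp_classes = {
    "trypsin": 64,
    "trypsin-like": 10,
    "chymotrypsin-like": 30,
    "elastase-like": 18,
    "non-annotated": 15,
}

print(total)                     # 269
print(with_regulatory)           # 83
print(without_regulatory)        # {'SP': 84, 'SPH': 102, 'polypeptidase': 0}
print(sum(sp_classes.values()))  # 137
```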

Trypsins and Trypsin-like Peptidases
In the T. molitor transcriptome dataset, transcripts coding for putative trypsin-related proteins constituted the most numerous group: 64 trypsin sequences and 10 trypsin-like sequences. Sequence analysis revealed that 39 trypsins and 6 trypsin-like sequences were mosaic, containing a variety of non-catalytic regulatory domains in the propeptide. Only 25 trypsins and 4 trypsin-like peptidases had no regulatory regions in the propeptide, although 4 trypsins had a transmembrane region at the C-terminal end of the sequence (Table 1, Figure 2).

Most of the SPs without regulatory domains are probably activated by trypsins, since 24 out of 25 sequences demonstrate a conserved cleavage (activation) site with R or K residues at the carboxyl side of the scissile bond (P1) and hydrophobic branched V or I at P1′, which is indispensable for stabilization of the new active conformation by hydrogen bonding to D194, the residue preceding the catalytic S195 [39]. Non-tryptic activation (processing) of the proenzyme is proposed for only a single trypsin, SerP135, with a G residue at P1 of the scissile bond, and a single trypsin-like peptidase, SerP105, with an L residue at P1, both from the group of SPs without regulatory domains. In the group of trypsins and trypsin-like T. molitor SPs with regulatory domains, 16 sequences have mainly hydrophobic residues at the C-terminus of the propeptide, which do not match the specificity of trypsin, and are presumably activated by other peptidases. It should be noted that, in contrast to their mammalian counterparts, none of the T. molitor trypsins contain a consensus motif for recognition and cleavage by enteropeptidase (DDDDK#) [40], suggesting an alternative regulation of zymogen conversion into active enzymes in the insect midgut lumen.
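The reasoning about activation sites can be sketched as a check of the two residues flanking the scissile bond. The example below assumes the site is written with "#" marking the cleaved bond, as in the motif notation used in this text; the function name and returned keys are hypothetical, and the rule covers only the residue criteria stated above (R/K at P1, V/I at P1′).

```python
def describe_activation_site(site):
    """Inspect the P1 and P1' residues of a site written as e.g. 'R#I'."""
    left, _, right = site.upper().partition("#")
    p1 = left[-1:]         # residue on the carboxyl side of the scissile bond
    p1_prime = right[:1]   # first residue of the mature peptidase
    return {
        "P1": p1,
        "P1_prime": p1_prime,
        # Trypsin cleaves after basic residues.
        "tryptic_P1": p1 in {"R", "K"},
        # Branched hydrophobic residue expected at the new N-terminus.
        "canonical_P1_prime": p1_prime in {"V", "I"},
    }


for site in ("R#I", "K#V", "G#I", "L#I"):
    info = describe_activation_site(site)
    print(site, info["tryptic_P1"], info["canonical_P1_prime"])
```

Under this rule, sites with G or L at P1 are flagged as non-tryptic, in line with the cases of SerP135 and SerP105 described above.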
Among the 45 mosaic sequences with one or more regulatory regions in the propeptide, clip domains of several different types represent the most abundant non-catalytic structural unit of these trypsin-related proteins. A total of 35 clip domain trypsins were identified, including 12 with clip-B, 12 with clip-C, and 11 with clip-D type domains, assigned according to the classification provided earlier [41]. Among the 10 sequences of trypsin-like peptidases, which had substitutions in the structure of the S1 subsite (7 with DGA, and single DAT, DGT, and DSG) (Table 1), 4 of the 6 sequences with regulatory regions had clip domains (1 with clip-B and 3 with clip-C) and 2 had the CUB domain (CUB, IPR000859) (Figure 2). The remaining four mosaic sequences of true trypsins contained chitin-binding modules (CBM, IPR002557), low-density lipoprotein receptor type A repeats (LDL, IPR002172), scavenger receptor cysteine-rich domain (SRCR, IPR017448), thrombospondin type 1 repeats (TSP, IPR000884), Frizzled domain (Fz, IPR020067), Pan/Apple domain (PAN, IPR003609), and a domain in Complement 1r/s, Uegf and Bmp1 (CUB, IPR000859).
The isoelectric point (pI) of true trypsins and trypsin-like SPs varied over a wide pH range from 4.3 to 9.5 pH units, suggesting possible involvement of these SPs in different physiological processes.

Chymotrypsin-like Peptidases
The thirty insect chymotrypsin-like peptidases are quite diverse in the configuration of amino acid residues at positions 189, 216, and 226, which are essential to ensure primary substrate specificity. None of them had the residue configuration of the classical vertebrate A-type chymotrypsin P00766 (S189, G216, G226) (Table 2). The bottom of the S1 specificity pocket (sequence position 189) was mostly occupied by G residues, as well as by the classical S in five sequences, A in three, and T in a single sequence. In 20 peptidases where G was present at position 189, an S residue was detected in wall positions 216 or 226, and in two sequences (SerP71 and SerP303), an A residue was detected, as in bovine chymotrypsin B P00767 (S189, G216, A226). Two sequences, SerP16 and SerP69, resembled bovine chymotrypsin-like elastase 2a Q29461 (S189, G216, S226). SerP69 was previously purified and was similar in substrate specificity to chymotrypsins, but did not hydrolyze short substrates containing up to two amino acid residues [27,29], which is typical for insect chymotrypsins [42].
Table legend: SignalP, signal peptide; Mm mature, molecular mass of the mature protein; pI, isoelectric point of the mature protein; SerP, serine peptidase. Regulatory domains: LDL, low-density lipoprotein receptor (IPR002172); Sushi, Sushi domain (IPR000436). The amino acid residues after which the propeptide is cleaved are highlighted in bold.
Ten peptidases with a charged residue in the wall of the S1 specificity pocket (GGD, GSD, AGD, GAD) represent another insect-specific group of chymotrypsins and, according to the available biochemical data, display preferential hydrolysis of chymotrypsin substrates [38,43,44]. However, the presence of a negatively charged residue at position 226 of the S1 pocket may provide additional specificity for basic side chains at P1 of the substrate due to differences in the overall structure of the S1 pocket, as was described for the crab collagenases brachyurins [45,46].
Most of the 30 chymotrypsin-like sequences identified in T. molitor represented SPs without regulatory domains, with the exception of a single mosaic peptidase (SerP449) with four LDL domains and one Sushi (IPR000436) domain in the propeptide (Figure 3), which was proposed as a putative ortholog of M. sexta HP14 (modular SP, MSP) [17]. For most of these chymotrypsin-like SPs, a conserved propeptide cleavage site was predicted (R#I), suggesting the involvement of trypsins in activation. Alternatively, cleavage at the proposed unique site (H#I) may provide a strictly specific activation (SerP16), or other chymotrypsin- or elastase-like SPs may perform cleavage at the L#I site, as in the case of SerP449. Most remarkable was the absence of a canonical activation cleavage site in SerP586, which suggests an alternative mechanism of activation at the L#K site. Most of the chymotrypsin-like SPs had a pI in the acidic region, from 3.8 to 5.3 pH units. Two SPs (SerP101 and SerP276) had a neutral pI, and only SerP69 had an alkaline pI of 8.8.

Elastase-like Peptidases
A group of 18 predicted T. molitor SP sequences with bulky hydrophobic residues (mostly V or I) at wall position 216 of the S1 binding subsite were annotated as elastase-like enzymes (Table 3). This position is considered a key determinant of the specificity of vertebrate elastases and ensures hydrolysis after small amino acid residues at position P1: A, V, and, less commonly, L [47]. The other wall position, 226, of the S1 specificity pocket was occupied by an S residue, with the exception of two proteins with residues N (SerP94) and A (SerP472), and the bottom position 189 was also occupied by small residues: G, S, and one A (SerP156). Larger residues were found only at position 216 in three predicted enzymes: T in SerP185, F in SerP155, and Y in SerP85. The two latter enzymes are of special interest, as their substrate-binding pocket should theoretically be shallower than that of other T. molitor elastases. Unfortunately, no vertebrate peptidases with a similar residue configuration of the S1 pocket have been described that would allow further speculation about their specificity. Elastases with two bulky residues in key positions of the specificity pocket, like bovine pancreatic elastase 1 (A189/V216/T226, Q28153), were absent in T. molitor, so it can be assumed that in the majority of insect elastases, the substrate-binding subsite is less occluded compared to that of pancreatic elastases 1 of vertebrates. Another interesting feature of the studied elastases was the presence of I at position 216: five SPs had the triad GIS in the S1 subsite, which is typical only for representatives of the Tenebrionidae family.
None of the elastase-like enzymes had regulatory regions in the propeptide (Figure 3). For most of the sequences (16 out of 18), a conserved propeptide cleavage site (R#I) suggests the involvement of trypsins in activation (Table 3). For only two SPs (SerP94 and SerP120), cleavage at a unique site (H#I) suggests a specific processing pathway. The majority of elastase-like SPs had a pI in the acidic region, from 4.0 to 4.9 pH units. A single SP, SerP74, had an alkaline pI of 8.6, while vertebrate elastases 1 and 2 are mostly cationic or neutral [48].

Non-Annotated Serine Peptidases
A heterogeneous array of sequences whose specificity remains obscure due to a non-typical combination of primary specificity determinant residues was tentatively grouped as non-annotated SPs until biochemical data become available or closely related orthologs are found and characterized. A total of 15 sequences were attributed to this group, showing the most diverse configurations of residues 189, 216, and 226 (AAT, GAT, GGK, QGS, RGV, VAD) (Table 4). The propeptide cleavage site in this group of sequences is variable, including R, K, L, and I at the C-terminus of the propeptide. Most non-annotated peptidases had a neutral or alkaline pI.
For seven sequences, regulatory regions were identified in the propeptide (Figure 3), including the GD (gastrulation defective, IPR031986) domain confirmed in five related peptidases, which are putative orthologs of D. melanogaster gastrulation defective involved in the establishment of dorsoventral embryonic polarity [49]. SerP1040 had a Sushi domain, and SerP355 had four LDL domains and one Sushi domain. SerP416 had a C-terminal TM domain.

Serine Peptidase Homologs
Serine peptidase homologs are SP-related proteins whose functional role is still poorly understood. Although sharing an SP-like domain and fold, they contain one or more substitutions in the catalytic triad residues, suggesting partial or complete loss of catalytic activity, which may be compensated by new functions (such as regulation, inhibition, and immune modulation) mediated through an alternative exosite [50]. In total, 125 SPH sequences with various substitutions of the catalytic triad H57, D102, S195 were identified in T. molitor (Table S1). In the catalytic position H57, only 42 proteins had H, and the most common substitution was H57Q in 55 SPHs. At position D102, only 13 substitutions were observed, while S195 was retained in 24 SPHs. In the remaining proteins, S195 was replaced by G (26), T (21), N (11), L (10), V (9), or I (7), and in 1-4 sequences each by A, M, D, E, K, R, Y, or F.
Most SPHs had a signal peptide (that is, they are secreted proteins) and are presumably processed by trypsin. In addition, a significant group of proproteins with an unconventional type of processing was also identified, and in some cases, it was even difficult to identify the sequence of the processing site, which is highly conserved in SPs. Most SPHs were anionic proteins with a pI of 4-5 pH units. However, a significant proportion of homologs, mainly SPHs with regulatory domains in the propeptide, had a neutral or alkaline pI. Most of the SPHs (102 sequences) had no regulatory regions in the propeptide, while 21 of the remaining 23 sequences possessed an array of clip domains of the A, B, and C types (Figure 4). Two homologs (SerPH570 and SerPH364) were proposed to be associated with the plasma membrane via a type-II transmembrane motif. Their prolonged extracellular region included an array of domains such as the characteristic juxtamembrane SEA (Sperm protein, Enterokinase, and Agrin domain, IPR000082) or Frizzled domains, as well as modules for protein-protein interaction including LDL, EGF (laminin/Epidermal Growth Factor-like domain, IPR002049), and SRCR.

Polypeptidases
We identified seven T. molitor polypeptidase transcripts that encoded putative proteins comprising two to three tandemly arranged peptidase domains, with regulatory regions located upstream of each peptidase unit, most often represented by two Sushi domains (Figure 4, Table 5). Four of these proteins contained two peptidase-like domains, of which the first (N-terminal) was a chymotrypsin-like SP, while the second (C-terminal) was an SPH. Another related polypeptidase (pSerPH608) contained two SPH domains, and pSerP614 comprised one chymotrypsin-like and two SPH domains. For all six of these secreted proteins, a conserved activation site (L#I) was predicted upstream of each SP/SPH domain. A single transcript encoded a membrane-anchored protein (pSerP1050) containing a trypsin domain and an unusual SPH domain, with seven LDL regulatory regions in total. Based on data on "polyserases", the human polypeptidases, it can be assumed that upon activation, peptidase domains may be linked to each other by interdomain disulfide bonds [51]. It was also proposed that SPH domains of secreted polyserases would act as dominant negative binding proteins, modulating the function of the first active SP domain. The same proteolytic mechanism can be proposed for T. molitor polypeptidases that resemble human polyserases.

Phylogenetic Analysis of SPs and SPHs in T. molitor
Phylogenetic analysis of the 269 SP-related sequences identified in T. molitor showed that they were clustered into two major groups, A and B (Figure 5). Group A (164 sequences), with nine major branches identified (A1-A9), included both SPs and SPHs without regulatory domains in the propeptide. The A1 clade mainly consisted of trypsins, including the major digestive trypsin SerP1 (see Section 2.9.3), with only a few sequences proposed as chymotrypsin-like and non-annotated peptidases. Clade A2 included putative trypsins and a single homolog (SerPH43) with a carboxy-terminal hydrophobic extension that resembles the corresponding region of the vertebrate peptidases prostasin and testisin, which are post-translationally modified via a glycosylphosphatidylinositol (GPI) linkage responsible for cell-surface association of these SPs [52,53]. Additionally, two SPs with an extended hydrophobic C-terminus from clades A9 (SerP423) and B4 (SerP416) also likely represent distinct GPI-anchored enzymes with unknown specificity. Here, for the first time, we present a group of putative insect analogs of vertebrate regulatory GPI-anchored SPs, of which prostasin also shares a trypsin-like specificity [54]. In some SP sequences, the hydrophobic regions were longer, and they were confidently predicted as TM by programs such as Phobius (SerP125, SerP84, SerP76, SerP416), while in other sequences, these regions were predicted only by the TM DOCK program (SerP48, SerP104, SerPH43) or had rather low probability. The A3 and A4 clades included predominantly chymotrypsin-like peptidases and related homologs. Chymotrypsin-related sequences from clade A4 represent another insect-specific group containing the acidic residue D226 located on the wall of the S1 pocket (see Section 2.3), but displaying chymotrypsin specificity [38,43,44], in contrast to crab homologs with the same S1 subsite triad [45,46], which efficiently hydrolyze both trypsin and chymotrypsin substrates. It is interesting to note that most of the related homologs from clades A3 and A4 also shared acidic (D or E) residues at position 226 of their primary specificity pocket (Table S1). Clades A5, A6, and A7 included numerous SPHs, which likely evolved by multiple duplication events. All 18 predicted elastase-like SPs were scattered among the 87 SPHs, which, similar to the elastases, mostly shared large aliphatic residues (V/I) at position 216 of their S1 binding pocket. In clade A7, there were also four chymotrypsin-like SPs, one of which, SerP69, was the major digestive chymotrypsin, had an S1 binding subsite (SGS) similar to that of bovine chymotrypsin-like elastase 2a Q29461, and was biochemically shown to lack the ability to cleave short peptide substrates [27,29], in contrast to another digestive chymotrypsin-like enzyme, SerP38 from clade A4 [44]. Clade A8 contained putative chymotrypsin-like SPs mainly with the GGS primary specificity determinant. The A9 clade also included chymotrypsin-like SPs, but with the GSG structure of the S1 binding subsite, as well as the trypsin SerP6 and the unusual non-annotated peptidase SerP423; all these SPs were characterized by an acidic pI.
Group B contained 105 sequences, most of which possessed one or more regulatory domains in the extended propeptide. Clip domains represent the most abundant non-catalytic structural units, predicted for 60 of these sequences and divided into four major groups (clip-A, -B, -C, and -D) based on clip sequence similarity [41]. Fifteen clip-A proteins, exclusively represented by non-active SPHs, were clustered together into a single clade B5, including prophenoloxidase (pPO)-activating factor II, PPAF II (SerPH415) [55]. The clip-A domain folds as an irregular β-sheet [56], which is likely characteristic of all of these related SPHs. Clip-B and clip-C proteins from clades B3 and B2, respectively, mainly represented by trypsins and a few SPHs, likely shared a more typical clip domain fold composed of an antiparallel distorted β-sheet flanked by two α-helices [57]. It is established that clip-C SPs activate terminal clip-B peptidases of the extracellular immune signaling pathway, which cleave the effector molecules pPO or procytokine proSpätzle [41]. In T. molitor, these peptidases were identified [58], and the clip-C trypsin (SerP228) named Tm-SAE is in clade B2. Clip-C SerP228 activates the terminal clip-B trypsin Tm-SPE (SerP183) from clade B3, which in turn activates pPO and its inactive cofactor SerPH415 (clade B5), or proSpätzle in the Toll signaling pathway [4]. It must be noted that one clip-B SP from clade B3 (SerP275) contained two clip-B domains. Clip-D trypsins, mainly located in clade B9, possessed a propeptide highly variable in length and sequence (108-548 aa), often including prolonged disordered regions downstream of the N-terminal clip domain. A clip-D peptidase, HP1 of M. sexta, is proposed as an unusual component of immunity associated with the signaling pathway [59].
The B1 clade included two trypsin-like peptidases with the CUB domain in the propeptide. Shown to be involved in protein-protein interaction, CUB domain(s) are characteristic of an array of chymotrypsin family SPs such as mammalian complement subcomponents (C1r/C1s), enterokinase, and matriptase. Although CUB domain SPs are confirmed to be essential for a diverse range of functions from immune regulation to digestion, development, and morphogenesis in vertebrates [60,61], their role in insects still needs further research. A highly supported clade B6 contained peptidases with Sushi domains, including the majority of polypeptidases and the chymotrypsin-like modular SP Tm-MSP (SerP448) that initiates proteolytic signaling cascades activating the clip-C trypsin Tm-SAE (SerP228) from clade B2 [4,62]. Clade B7 contained five peptidases with the gastrulation defective (GD) domain. In the D. melanogaster embryo, GD SP participates in the developmental Toll signaling pathway [63]. Clade B8 included sequences of long SP-related proteins with a highly variable set of regulatory domains in the propeptides, such as Tequila (SerP55), Corin (SerP285), Nudel (pSerP1050), TSP (SerP11), and the membrane-associated homologs SerPH364 and SEA (SerPH570). Clade B10 contained predominantly trypsins expressed at low levels at the stages of embryogenesis and metamorphosis (see Sections 2.9.1 and 2.9.2). Interestingly, in a tree constructed using only peptidase domain sequences without prepropeptides (Figure S1), major branches are retained with minor variations, including a clade containing the peptidases with the longest propeptides (B8).
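The activation relationships described for the T. molitor immune cascade can be written down as a small directed graph. The sketch below encodes only the steps stated in the text (Tm-MSP activates Tm-SAE, Tm-SAE activates Tm-SPE, and Tm-SPE acts on pPO, SerPH415, or proSpätzle); the dictionary and the function name are illustrative, and no kinetic or regulatory detail is implied.

```python
from collections import deque

# Activation steps as described in the text: activator -> targets.
ACTIVATES = {
    "Tm-MSP (SerP448)": ["Tm-SAE (SerP228)"],
    "Tm-SAE (SerP228)": ["Tm-SPE (SerP183)"],
    "Tm-SPE (SerP183)": ["pPO", "SerPH415", "proSpätzle"],
}


def downstream(start):
    """Return all components reachable from `start`, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for target in ACTIVATES.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(downstream("Tm-MSP (SerP448)"))
print(downstream("Tm-SAE (SerP228)"))
```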

Expression Profiling of SP and SPH Genes in Different Life Stages of T. molitor
To infer the functional role of the described diversity of SPs/SPHs in various physiological processes, we analyzed the expression patterns of their transcripts at different stages of the T. molitor life cycle, including eggs, larvae of the II instar, larvae of the IV instar, early and late pupae, and male/female adults. Data for the most highly expressed transcripts at the egg, pupal, and feeding (larval and adult) stages are presented in Tables 6-8, respectively, while the expression data for all transcripts are shown as heatmaps in Figure 6, where they are combined into six groups. Group 1: SPs without regulatory domains, expressed at the feeding stages of larvae and adults; group 2: SPs without regulatory domains, expressed in eggs and pupae; group 3: SPHs without regulatory domains, expressed at the feeding stages; group 4: SPHs without regulatory domains, expressed in eggs and pupae; group 5: SPs/SPHs with clip domains; group 6: SPs/SPHs with other regulatory domains.

Figure 6 legend (continued): Clip-C, light blue; Clip-D, grey-blue; Sushi, green; GD, red; MSP, blue-green; peptidases with several regulatory domains, dark blue. Life cycle stages: E, egg; LII, second instar larvae; LIV, fourth instar larvae; EP, early pupa; LP, late pupa; M, male; F, female.
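The six-group scheme used for the heatmaps can be written as a small decision rule. The Python sketch below only illustrates that logic and is not the procedure used to build Figure 6: the function name, the argument names, and the boolean flag for feeding-stage expression are hypothetical, and giving clip domains precedence over other regulatory domains is an assumption for sequences that might carry both.

```python
# Illustrative sketch only; names and the clip-first precedence are assumptions.
CLIP_DOMAINS = {"clip-A", "clip-B", "clip-C", "clip-D"}


def assign_group(is_active, regulatory_domains, feeding_stage_expression):
    """Return the heatmap group (1-6) for one SP/SPH transcript.

    is_active: True for an SP with an intact catalytic triad, False for an SPH.
    regulatory_domains: iterable of domain names found in the propeptide.
    feeding_stage_expression: True if the transcript is expressed at the
        feeding stages (larvae, adults), False if in eggs and pupae.
    """
    domains = set(regulatory_domains)
    if domains & CLIP_DOMAINS:
        return 5  # SPs/SPHs with clip domains
    if domains:
        return 6  # SPs/SPHs with other regulatory domains
    if is_active:
        return 1 if feeding_stage_expression else 2
    return 3 if feeding_stage_expression else 4


# Examples taken from transcripts named in the text.
print(assign_group(True, [], True))           # SerP1, digestive trypsin
print(assign_group(True, [], False))          # SerP35, early-pupa trypsin
print(assign_group(False, [], True))          # SerPH122, feeding-stage SPH
print(assign_group(True, ["clip-C"], True))   # SerP282
print(assign_group(True, ["CUB"], True))      # SerP178
```

Note that groups 5 and 6 are defined by domain content alone, so a transcript such as SerP282 stays in group 5 even though it is expressed only at the feeding stages.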

Embryonic Stage: Eggs
Most of the SPs/SPHs with relatively high mRNA expression levels in the embryonic stage were regulatory proteins, as they contained regulatory clip and GD domains (Table 6, Figure 6(5a,g,6b)). The maximum level of expression in eggs was observed for the clip-A SPHs SerPH236 and SerPH235, with lower levels at other stages. Transcripts with egg-specific expression showed slightly lower expression levels. These included the clip-B trypsins SerP166 and SerP116 and the SPH SerPH203, as well as the clip-A SPH SerPH165. Clip-C SPs with moderate expression (SerP145 and SerP61), as well as SPs with a GD domain and rather low expression (SerP550 and SerP442), demonstrated constitutive expression across most of the stages, with predominance in eggs.
Transcripts without identified regulatory domains in the propeptide and with rather low expression levels (Table 6, Figure 6(2a,c)), as well as two SPs with GD domains (SerP466 and SerP454) and the clip-A SerPH389, also demonstrated constitutive expression including the egg stage, but with increased levels in late pupae and IV instar larvae. Within this group, three trypsins (SerP28, SerP22, and SerP5) had extended propeptides without known regulatory regions, which may indicate the presence of regulatory domains that have not yet been identified and, accordingly, specific functions that have not yet been defined. The only SP in this group with a short propeptide without regulatory domains was the putative elastase SerP156, which could be involved in hydrolytic functions in the egg, such as vitellin hydrolysis.

Metamorphosis: Early and Late Pupae
Most of the highly expressed SP/SPH transcripts at the pupal stages, as at the egg stage, contained regulatory domains, and the majority of them were SPHs with a clip-A domain (Table 7, Figure 6(5d)). In general, SP/SPH transcripts were expressed at both pupal stages, but the levels of expression were higher in late pupae, and most of the transcripts were also expressed at varying levels across the entire life cycle. The exceptions were the transcript of the anionic trypsin SerP35 (Table 7, Figure 6(2a)) with a short propeptide, which was specific to the early pupal stage, and clip-A SerPH78 (Table 7, Figure 6(5g)), which was expressed predominantly in early pupae. The highest levels of expression in early pupae were observed for the transcripts of the homologs SerPH164, with a clip-A domain, and SerPH1034 (Table 7, Figure 6(4a)), without regulatory domains, which were upregulated in late pupae. Noticeable levels of expression were also observed here for transcripts of the SerPH364 homolog and the SerP55 Tequila peptidase (Table 7, Figure 6(6b)), both with a large number of regulatory domains in the propeptide.
In late pupae, in contrast to early pupae, trypsin SerP28 (Table 7, Figure 6(2a)) had the highest level of expression, together with two peptidases with regulatory domains, SerP247 (Table 7, Figure 6(5d)) and SerP466 (Table 7, Figure 6(6b)). The latter belonged to the non-annotated peptidases, had a GD regulatory domain, and was also actively expressed at the egg stage. The only transcript that was actively expressed at the late pupal stage but not in early pupae belonged to the single elastase-like SerP156 (Table 7, Figure 6(2a)) without a regulatory domain. Peptidases of this type without regulatory domains, such as SerP156 and the early-pupa-specific SerP35, may be involved in tissue remodeling at particular pupal stages. Interestingly, SerP156 and trypsin SerP28 were among the highly expressed peptidases at the egg stage, and their transcripts were also upregulated at larval stages IV and II, respectively.

Feeding Stages: Larvae and Imago (Adults)
The largest part of the SP/SPH transcripts was expressed at the feeding stages, larvae (II and IV instars) and adults (females and males) (Table 8, Figure 6(1,3)), whereas at the developmental stages, eggs and pupae, their genes were practically silent, which most likely indicates the involvement of these SPs/SPHs in the digestive process. This involvement is also supported by the high level of expression of these transcripts in the larval gut transcriptome (Table 8). Almost all these transcripts coded for preproenzymes with a small propeptide without regulatory regions. In most cases, they were processed by trypsin after the C-terminal R of the propeptide. The highest levels of expression were observed for active SPs (Table 8, Figure 6(1)), although highly expressed transcripts at feeding stages were also present in the large group of SPHs (Table 8, Figure 6(3)).
Among the 61 transcripts of SPs with the classical catalytic triad HDS expressed at one or more feeding stages (Figure 6(1)), several subgroups with similar expression profiles could be distinguished: subgroup 1a, SPs with a high level of transcript expression at all feeding stages; 1b, SPs with a high level of expression at the IV instar larval and imago stages; 1c, SPs expressed only at adult stages; 1d, SPs with a high level of transcript expression mainly in IV instar larvae.
Subgroup 1a contained the most highly expressed transcripts of digestive SPs (Figure 6(1a)). The majority of them (10) encoded chymotrypsin-like SPs, including the earlier characterized major digestive chymotrypsin SerP69 with an extended binding site [29]; two transcripts encoded trypsins, including the major digestive trypsin SerP1 [28]; and two were elastase-encoding transcripts (SerP85, SerP288). The transcript of chymotrypsin-like SerP108 was characterized by an extremely high level of expression at the early larval stage (Table 8). A similar expression profile was demonstrated by chymotrypsin-like SerP314 and trypsin SerP16. All SPs from subgroup 1a had a pI in the acidic region, with the exception of the major trypsin SerP1 and chymotrypsin SerP69 (Sections 2.2 and 2.3).
Transcripts of SPs from subgroup 1b, expressed in IV instar larvae and adults (Figure 6(1b)), encoded five putative elastase-like SPs, three chymotrypsin-like SPs, and three trypsins. The most highly expressed were two elastase-like peptidases, SerP41 and SerP185, and chymotrypsin-like SerP246. The majority of SPs from subgroup 1b also had a pI in the acidic region, with the exception of elastase-like SerP74 and trypsin SerP30 (Sections 2.2 and 2.4). Another trypsin, SerP125, had a C-terminal TM domain.
Transcripts from subgroup 1c encoded SPs expressed predominantly at adult stages. About a third of the group (five) were non-annotated SPs due to an atypical set of amino acid residues in the S1 subsite (Figure 6(1c), Table 4). The subgroup also included two chymotrypsin-like SPs, three elastase-like SPs, three trypsins, and one trypsin-like SP. All these transcripts had a moderate level of expression, with maximum values for the non-annotated SerP462 (S1 binding subsite TSF). Interestingly, all non-annotated SPs had an alkaline or neutral pI (Section 2.5), while all the other SPs were anionic.
Most of the transcripts from subgroup 1d coded for SPs expressed predominantly in IV instar larvae (Figure 6(1d)). The subgroup included 10 chymotrypsin-like SPs, 6 elastase-like SPs, 5 trypsins, and one non-annotated SP. The maximum level was observed for chymotrypsin-like SerP38, which has an unusual S1 binding subsite GGD but exhibits substrate specificity typical of chymotrypsins (Table 8) [44]. Another transcript with a high level of expression encoded trypsin SerP209. The remaining transcripts had a moderate or low level of expression. All SPs, including the non-annotated one, had a pI in the acidic region. Three trypsins (SerP76, SerP84, SerP104) with low levels of transcript expression had a C-terminal TM domain (Section 2.2).
It must be noted that we found two peptidases with regulatory domains expressed only at the feeding stages: trypsin SerP282 with clip-C domain and trypsin-like SerP178 with a CUB domain (Figure 6(5e,6c)).
Thus, group 1 of 61 SPs (Figure 6(1)) was related to digestion, since their transcripts were expressed predominantly at feeding stages, and it included the majority of identified chymotrypsin-like, elastase-like, and non-annotated SPs without regulatory domains. At the same time, only about half of the trypsins without regulatory domains have a similar connection with digestion. The general trend of increasing expression of digestive SPs from early to late larval instars, documented previously [65,66], was confirmed here for transcripts encoding the major digestive SPs of T. molitor larvae. Only a few SP transcripts were predominantly expressed at the early larval stage, including three SPs from subgroup 1a: the major SerP108, SerP314, and SerP16 (Table 8, Figure 6(1a)).
In addition to transcripts of active SPs with the classical catalytic center, 95 transcripts coding for SPHs were predominantly expressed at feeding stages (Table 8, Figure 6(3)), and most of them can be associated with digestive function. The majority of these SPHs, like the SPs expressed at feeding stages, had a small propeptide without regulatory domains and were processed to the mature form by trypsin. The majority of the SPH transcripts were significantly upregulated at the IV larval instar, and the most highly expressed are summarized in Table 8. Almost all SPH transcripts were also detected in adults, although at lower levels, and only about a quarter of the transcripts were also expressed in II instar larvae. Two SPH transcripts (SerPH393 and SerPH485) had a significant level of expression only at the adult stage (Figure 6(3a)), while no transcripts specific to II instar larvae were identified. Note that the highly expressed SPHs (Table 8) include SerPH122 and SerPH245, which have a conservative Ser/Thr substitution in the active center, in contrast to the radical replacements in the other SPHs. Characterization of recombinant SerPH122 showed that this homolog had low but reliably detectable proteolytic activity towards chymotrypsin and trypsin chromogenic peptide substrates [33].
The exact role of SPHs is still poorly understood. Nevertheless, whole genome microarray analysis of T. castaneum larvae revealed that the transcripts of ten SPH genes were upregulated more than 5-fold as compensation for the effects of dietary inhibitors of cysteine and serine peptidases [67]. Also, given the role of clip-A SerPH415 (PPAF-II) in the activation of pPO [55] mentioned in Section 2.8, it may be speculated that the above-described major SPHs induced in the feeding stages are somehow involved in the activation of luminal digestive SPs.

Constitutively Expressed SP-Related Proteins of T. molitor
Another important group of transcripts included SPs/SPHs that were expressed at several or all stages of the beetle life cycle and presumably participate in important physiological processes such as immune defense, adhesion, regulation of development, and metabolism. Most of the SPs/SPHs with a sufficiently high level of expression at all or most of the life cycle stages had regulatory regions in the sequence structure (Figure 6(5d-g,6b)), and only about one third of the transcripts lacked regulatory domains (Figure 6(2,4)), the majority of which were active SPs.
SPs/SPHs expressed at all stages included those with a clip domain of different subtypes (clip-A, clip-B, clip-C, and clip-D) (Figure 6(5d-g)). The majority of peptidases with the Sushi domain were also expressed at all or most stages of the life cycle. They included polypeptidases and MSP-like SPs (chymotrypsin-like SerP449 and non-annotated SerP355) containing a Sushi domain and four LDL domains (Figure 6(6b,c)). Peptidases containing the GD domain had similar expression profiles. Four out of five of them (SerP442, SerP454, SerP466, SerP550) were expressed at all stages of development, with higher levels at the egg and pupal stages, and SerP466 was also highly expressed in adult females. Trypsins with a complex multidomain structure were also expressed at most stages of the life cycle. These peptidases included SerP55 (Tequila), SerP285 (Corin), and SerP11 (TSP).

Discussion
SP-related proteins of the S1A family identified in the T. molitor transcriptome comprise 269 sequences, of which 137 were identified as active SPs with classical catalytic residues and 125 were annotated as putative non-active SPHs that possess one or more substitutions in the catalytic triad. Seven deduced sequences containing several SP/SPH domains were putative polypeptidases, whose physiological role remains generally unknown. T. molitor SPs/SPHs of the S1A chymotrypsin family occupy an intermediate position among insects in terms of the number of identified sequences. A comparable number of SP-related sequences (257) was described for D. melanogaster (Diptera: Brachycera) [12], whereas in the mosquitoes A. aegypti and A. gambiae (Diptera: Nematocera), genome-wide analysis identified 369 and 337 SP-related sequences, respectively [12,15]. A significantly lower number, with only 44 identified sequences of putative SPs/SPHs, was described in A. mellifera (Hymenoptera) [23].
In T. molitor, 84 SPs and 102 SPHs without regulatory domains constitute the largest group of SP-related proteins. Transcripts of 61 SPs were expressed only in the feeding life stages; 24 of them were highly expressed in the larval gut and presumably play an important role in digestion. Similar quantitative data were previously obtained for other insects, including larvae of D. melanogaster (53 gut peptidases, of which 35 were highly expressed) [12], A. gambiae (63 and 27, respectively) [14], and M. sexta (61 and 35, respectively) [18]. However, even closely related insects show functional differences in the general set of digestive SPs; for example, the most highly expressed SP in T. molitor is trypsin SerP1, whereas in T. castaneum it is chymotrypsin XP_970603.1, although their major digestive cysteine peptidases are orthologs with 74% identity [68]. At the same time, there is a close link between the primary structure of certain digestive SPs and their functions. Accordingly, a comparison of two orthologous pairs of T. molitor and T. castaneum chymotrypsin-like digestive SPs, SerP38 and CBC01177 (pair I) and SerP88 and CBC01166 (pair II), shows that pair I was expressed at the larval and adult stages, while pair II was expressed only in the larval gut [69].
The remaining 23 transcripts of SPs without regulatory domains showed constitutive or specific expression at certain stages of T. molitor development. The physiological role of most of these SPs requires further study, but it can be assumed that SPs showing high expression at the egg stage participate in the hydrolysis of storage proteins, as was previously shown for B. mori [1], while the SPs expressed at the pupal stages of T. molitor may be involved in the breakdown of larval structures during metamorphosis.
In addition to SPs, the largest group of 95 T. molitor SPH sequences lacking regulatory regions was also expressed predominantly during feeding stages. The physiological role of SPHs is still poorly understood; however, those that are highly expressed during the feeding stages (9 out of 95) may play a regulatory role related to digestive peptidase activation or to interaction with substrates or inhibitors in the midgut lumen. It was shown that some of the homologs are able to bind substrates and even to hydrolyze them at a low rate [33,50].
Another group of SP-related proteins included 53 SPs and 23 SPHs with regulatory domains, such as different clips, LDL, SRCR, TSP, and others. Although their expression levels were significantly lower than those of the gut digestive peptidases, most of them demonstrated constitutive expression throughout the entire life cycle, while specific SPs and SPHs with various regulatory domains demonstrated increased expression at the egg or pupal stages. Among these sequences, clip SPs/SPHs were the most numerous. Of the 60 SPs/SPHs with a clip domain that we identified in T. molitor, 16 belonged to the clip-A type (all SPHs), 16 to clip-B (13 SPs and 3 SPHs), 17 to clip-C (15 SPs and 2 SPHs), and 11 to clip-D (all SPs). The total of 60 clip SPs/SPHs is close to the 54 sequences identified in the closely related T. castaneum [12,34], whereas about twice as many clip SPs/SPHs, including a distinct subtype of clip-E SPs, were identified in the mosquitoes A. aegypti and A. gambiae [3,14]. According to the available data, SPs/SPHs with clip domains are non-digestive and are present in the hemolymph of insects and other arthropods. They play an important role in the regulation of various physiological processes in insects, such as innate immune responses leading to the activation of pPO necessary for melanization, activation of the Toll-dependent signaling pathway leading to the synthesis of antimicrobial peptides [41], and regulation of the dorsal-ventral pattern in D. melanogaster embryos [70]; they also regulate the coagulation cascade during hemolymph clotting in crabs [71].
The majority of T. molitor clip-containing transcripts were expressed at all or most stages of ontogeny, but three of them were specific to the egg stage (SerP116, SerP166, and SerPH203), while at the pupal stage only increased expression of constitutively expressed clip transcripts was observed. The egg stage specificity of SerP116 and SerP166 was also described by Wu et al. [34] using RT-PCR analysis. The only experimental data on the specific roles of clip SPs/SPHs in T. molitor came from B.L. Lee's laboratory, where the extracellular larval activation cascade of the Toll receptor and pPO was characterized in detail [4,55,58,62]. The proteolytic part of the cascade starts with SerP449 (MSP), which has multiple regulatory domains and activates the downstream proSerP228 with a clip-C domain (proSAE), which in turn activates proSerP183 (proSPE) with a clip-B domain, involved in proSpätzle or pPO activation; processing of pPO additionally requires activation of the clip-A homolog proSerPH415.
The remaining, smaller part of T. molitor SPs/SPHs had different regulatory domains. Transcripts of SPs with a GD domain were expressed constitutively throughout the entire T. molitor life cycle, including eggs and pupae, and all of them were from the non-annotated group of SPs. Similar peptidases with the GD domain have been well studied in D. melanogaster, but only for the egg stage [49,63,70]. The stable constitutive mRNA expression of these peptidases in T. molitor transcriptomes indicates their possible participation in a wide range of physiological processes in addition to the expected involvement in the cascades forming embryonic polarity during egg development. The transcript of the large SP Tequila (SerP55), which has a variety of regulatory domains, was upregulated during the T. molitor pupal and adult stages; in D. melanogaster, this SP was found throughout development and participates in the immune response [72].
One of the most interesting groups in T. molitor was the polypeptidases, mainly expressed at the pupal and adult stages. Six of them comprise two or three SP/SPH domains and several Sushi domains (Sushi(2)-SP(H)-Sushi(2)-SPH(-Sushi-SPH)). A similar domain architecture, including several peptidase domains and several Sushi domains, is found in the peptidase SP14 of T. castaneum [12]. In A. gambiae, several polypeptidases with a slightly different structure were identified (SP(H)-SPH-clipE-SPH) [14]. In addition, the polypeptidase Nudel (pSerP1050) was found in T. molitor; it contained two peptidase domains, a trypsin domain and an SPH domain, together with LDL domains. Similar Nudel peptidases (LDL(2)-SP-LDL(2)-SPH-LDL(3)) were identified in many insects [12,14,21]. In the D. melanogaster embryo, Nudel initiates the peptidase cascade involved in dorsal-ventral patterning [70]. Thus, complex polypeptidases have been found in insects, but further study is required to accurately identify the structure and functions of such proteins.
The great diversity and abundance of serine peptidases of the chymotrypsin S1A family in various insects provide great opportunities for a more detailed study of insects important for agriculture and/or medicine, and for a fundamental understanding of their physiology. We hope that our study will allow scientists to move in this direction.

pling due to read depth. Reads were assembled into 130,559 contigs, with 36,463 contigs longer than 1 kb.

SP/SPH Identification in the Transcriptomes
BLAST [78] was used to identify ORFs homologous to those encoding SP/SPH. The sequence of human trypsin 2 (UniProt AC P07478) was used as the initial query, and the T. molitor SPs/SPHs identified from different groups were then used as queries to search for new sequences. Multiple sequence alignment with BioEdit (v.7.0.5) [79] was used to refine and build consensus sequences; in the case of SNPs, the amino acid chosen was the one with the highest percentage, provided it exceeded 50% of the total. ORFs that were grouped into blocks with identity of at least 95% and that overlapped with another block by at least 10 amino acid residues were considered to belong to a unique peptidase. The resulting sequences were compared with those available in three newly sequenced T. molitor genome versions (PRJNA820846: GCA_027725215.1; PRJNA579236: GCA_014282415.3; PRJEB44755: GCA_907166875.3) [80,81].
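The majority rule for variable positions can be illustrated with a short Python sketch. It is not the BioEdit workflow itself: the function names are hypothetical, the input is assumed to be a set of already aligned, equal-length sequences, and emitting "X" when no residue exceeds 50% is an assumption, since the text does not say how such positions were handled.

```python
from collections import Counter


def consensus_residue(column, min_fraction=0.5):
    """Return the most frequent residue of an alignment column.

    The residue is accepted only if it accounts for more than
    min_fraction of the column; otherwise "X" is returned
    (an assumption of this sketch, not stated in the text).
    """
    residue, count = Counter(column).most_common(1)[0]
    return residue if count / len(column) > min_fraction else "X"


def consensus_sequence(aligned):
    """Build a consensus from equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(consensus_residue(col) for col in zip(*aligned))


# Toy alignment with two variable positions (invented data).
reads = ["IVGGS", "IVGGS", "IVGAS", "IVGGT"]
print(consensus_sequence(reads))
```

Gap characters are treated like any other symbol here, so a real alignment would need an explicit decision on how gaps count towards the 50% threshold.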

Expression Profiling of SP and SPH at Different Developmental Stages
The expression values were calculated for assembled and refined sequences of complete peptidase mRNAs obtained from T. molitor transcriptomes and genomes (Section 4.3). To obtain expression values for peptidase mRNAs as normalized reads per kilobase per million mapped reads (RPKM) [92], a custom script based on tBLASTn was used, counting each multiread as one unit. RPKM values in biological repeats were averaged for each stage of the life cycle. The transcript of eukaryotic translation factor 3 subunit B (NCBI ID: CAH1377306) was used as a housekeeping protein. Hierarchically clustered gradient heat maps of log2(RPKM+1) values were plotted using TBtools [93]. A Kruskal-Wallis test [94] was conducted among the life stages (df = 5), calculated from total RPKM values on the Statistics Kingdom webserver (https://www.statskingdom.com/index.html) (accessed on 25 March 2024) [95]. The resulting p-values were adjusted using the Benjamini and Hochberg approach [64].
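The arithmetic behind these steps can be sketched in Python. This is a minimal illustration, not the custom script used in the study: it assumes the standard RPKM definition (reads × 10^9 / (transcript length in bp × total mapped reads)) and the usual step-up form of the Benjamini-Hochberg adjustment, and all function names and input numbers are hypothetical. Read counting with tBLASTn and the Kruskal-Wallis test are outside the sketch.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def mean_rpkm(replicate_values):
    """Average RPKM over the biological repeats of one life stage."""
    return sum(replicate_values) / len(replicate_values)


def log2_rpkm(value):
    """Heatmap transform log2(RPKM + 1); maps zero expression to 0."""
    return math.log2(value + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented numbers: 500 reads on a 1000 bp transcript, 10 million mapped reads.
value = rpkm(500, 1000, 10_000_000)
print(value, log2_rpkm(value))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

The +1 pseudocount keeps transcripts with zero reads on the heatmap scale instead of producing an undefined logarithm.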

Conclusions
Serine peptidases (SPs) and homologs (SPHs) of the S1A family constitute a very diverse family of mostly secreted proteins involved in a variety of processes, including digestion as well as the regulation of development and innate immunity. A thorough analysis of several transcriptomes and two newly sequenced genomes of T. molitor allowed us to update the available information and identify 269 SPs and SPHs in this insect, performing sequence analysis and annotation, constructing phylogenetic relationships, and evaluating expression patterns across the entire life cycle. For 122 SPs, putative trypsin-, chymotrypsin-, and elastase-like specificities were predicted from the S1 binding subsite sequence analysis, while for 15 non-annotated SPs the specificity remains obscure due to peculiarities of their S1 subsite structure. All studied SP-related sequences of T. molitor were grouped according to the organization of their propeptide region. The largest group of 84 SPs and 102 SPHs had no regulatory domains, while the remaining 53 SPs and 23 SPHs had different regulatory domains in the propeptide. Transcripts of 61 SPs without regulatory domains were expressed only in the feeding life stages and are likely involved in digestion. The remaining 23 transcripts of SPs without regulatory domains showed mostly constitutive expression, while those upregulated at the egg and pupal stages may be involved in the hydrolysis of storage proteins and in the breakdown of larval structures during metamorphosis, respectively. In addition to SPs, the largest group of 95 T. molitor SPH sequences lacking regulatory regions was also expressed predominantly during feeding stages, and their physiological role is presumably related to the digestive process; in particular, it may involve interaction with substrates or inhibitors in the midgut lumen.
The group of SPs and SPHs with regulatory domains contained four types of clip domains (A-D) as well as GD, Sushi, LDL, SEA, PAN, FZ, TSP, EGF, CUB, SRCR, and CBM domains in the propeptide. Transcripts of the majority of these proteins were expressed constitutively throughout the entire life cycle of T. molitor, while some of them were specific to the egg stage and/or upregulated at the pupal stage. For most of these regulatory SP/SPH transcripts, a significantly lower expression level was documented than for the above-described transcripts associated with digestive functions. One of the most interesting groups in T. molitor was the seven polypeptidases, mainly expressed at the pupal and adult stages. Most of them comprise two or three SP/SPH domains and several Sushi domains. Similar complex polypeptidases were identified in a few insect species, but this group of proteins requires further study in order to accurately identify their structure and functions. The data obtained provide valuable information for further studies on the biological functions of the diverse S1A peptidase family in insects.

Figure 1. Total number of SP and SPH genes found in sequenced genomes of insects from different orders. Data on SP are shaded in blue, data on SPH are in yellow, and undifferentiated data on the sum of SP/SPH genes are shaded in green.


Figure 3. Domain organization of 30 chymotrypsin-like peptidases, 18 elastase-like peptidases, and 15 non-annotated peptidases of T. molitor. Regulatory domains are marked with different shapes and colors. Description of domains: SignalP, signal peptide; TM, transmembrane domain; LDL, Low-Density Lipoprotein receptor class A repeat; Sushi, Sushi domain; GD, Gastrulation Defective domain.


Figure 5. Phylogenetic analysis of 269 SPs and SPHs of T. molitor. Complete protein sequences were aligned using MAFFT. The phylogenetic tree was built in the IQTREE service. Peptidases in the tree are divided into two groups: (A) (red), SP and SPH without regulatory domains; (B) (blue), SP and SPH with regulatory domains (including polypeptidases). For the interpretation of the colors of the identifiers, see the legend above.

Figure 6. Heatmaps of the stage-specific expression patterns of 269 SP/SPH transcripts of T. molitor. Hierarchical clustering of RPKM values was used to compare the relative expression levels of transcripts from the transcriptomes of different T. molitor life stages, differentiated into 6 distinct groups. Groups 1-4: SP/SPH without regulatory domains in the propeptide; groups 5-6: SP/SPH with regulatory domains.

Table 1. Domain organization and key structural features of 64 trypsins and 10 trypsin-like SPs of T. molitor.

Table 2. Domain organization and key structural features of 30 chymotrypsin-like SPs of T. molitor.

Table 3. Domain organization and key structural features of 18 elastase-like SPs of T. molitor.

Table 4. Domain organization and key structural features of 15 non-annotated SPs of T. molitor.

Table 5. Domain organization and key structural features of seven polypeptidases of T. molitor.

Table 6. T. molitor SP/SPH transcripts with the highest expression levels at the egg stage compared to other stages.

Table 7. T. molitor SP/SPH transcripts with the highest expression levels at the early and late pupal stages compared to other stages.

Table 8. T. molitor SP/SPH transcripts with the highest expression levels at the feeding stages compared to other stages and in the IV instar larval gut. Bold font indicates the RPKM values for IV instar larvae and the IV instar larval gut. Replaced amino acids in the catalytic triad of the SPHs are shaded.