Genome-Wide Identification, Evolutionary Analysis, and Expression Patterns of Cathepsin Superfamily in Black Rockfish (Sebastes schlegelii) following Aeromonas salmonicida Infection

Cathepsins are lysosomal cysteine proteases belonging to the papain family and play crucial roles in intracellular protein degradation/turnover, hormone maturation, antigen processing, and immune responses. In the present study, 18 cathepsins were systematically identified from the fish S. schlegelii genome. Phylogenetic analysis indicated that cathepsin superfamilies are categorized into eleven major clusters. Synteny and genome organization analysis revealed that whole-genome duplication led to the expansion of S. schlegelii cathepsins. Evolutionary rate analyses indicated that the lowest Ka/Ks ratios were observed in CTSBa (0.13) and CTSBb (0.14), and the highest Ka/Ks ratios were observed in CTSZa (1.97) and CTSZb (1.75). In addition, cathepsins were ubiquitously expressed in all examined tissues, with high expression levels observed in the gill, intestine, head kidney, and spleen. Additionally, most cathepsins were differentially expressed in the head kidney, gill, spleen, and liver following Aeromonas salmonicida infection, and their expression signatures showed tissue-specific and time-dependent patterns. Finally, protein–protein interaction network (PPI) analyses revealed that cathepsins are closely related to a few immune-related genes, such as interleukins, chemokines, and TLR genes. These results are expected to be valuable for comparative immunological studies and provide insights for further functional characterization of cathepsins in fish species.


Introduction
Lysosomal proteases exert vital roles in several highly regulated life processes in both prokaryotes and eukaryotes, including cell division, fertilization, vitellogenesis, differentiation, tissue remodeling, antigen processing, and apoptosis [1][2][3]. The lysosome is a membrane-bound cytoplasmic organelle that serves as a major degradative compartment in eukaryotic cells [4]. Both endogenous and exogenous macromolecules can be delivered to lysosomes through biosynthetic and endocytic pathways, respectively. In addition, lysosomes can degrade proteins transported from the cytosol [3]. Therefore, the lysosome plays an important role in normal metabolism to maintain cellular homeostasis. Four catalytic groups of proteolytic enzymes (serine, cysteine, aspartic, and metalloproteases) have been recognized on the basis of the reaction nucleophile in the catalytic site of the enzyme [5]. The degradative function of the lysosome is carried out by more than 50 acid-dependent hydrolases (e.g., proteases, lipases, and glycosidases) contained within its lumen, which has an acidic pH in the range of 4.6-5.0 [6].
Cysteine proteases of the papain superfamily, which are regularly referred to as lysosomal cathepsins, comprise a large number of enzymes [7]. On the one hand, cathepsins can be divided into three subgroups, namely, cysteine proteases (cathepsins B, C, F, H, K, L, O, S, T, U, V, W, and X), serine proteases (cathepsins A and G), and aspartic proteases (cathepsins D and E), according to the amino acid residues in their active sites [8,9]. Among

Phylogenetic and Synteny Analysis of S. schlegelii Cathepsins
To validate the identification of the cathepsins, as well as explore the evolutionary relationship among different organisms, phylogenetic analysis was performed ( Figure 3). Overall, the phylogenetic relationship of cathepsins is relatively conserved, supporting the annotation of S. schlegelii cathepsins ( Figure 3). Most of the cathepsins were clustered into clades with their respective counterparts of flounder, medaka (Oryzias latipes), fugu (Takifugu rubripes), and turbot (Scophthalmus maximus) and then clustered with other teleost and tetrapod species, with strong bootstrap support. The only exception was CTSLb, which was clustered with CTSS and CTSZ of zebrafish (Danio rerio) and then grouped into the clade of CTSLb of zebrafish. In addition, cathepsin genes were divided into eleven major subgroups: CTSA, CTSB, CTSC, CTSD, CTSF, CTSH, CTSL, CTSK, CTSO, CTSS, and CTSZ. Of the eighteen cathepsins, a number of them are highly related duplicates. For example, CTSA, CTSB, CTSD, CTSH, CTSL, CTSS, and CTSZ have two copies ( Figure 3).
Mar. Drugs 2022, 20, x FOR PEER REVIEW 6 of 23 Figure 3. Phylogenetic analysis of S. schlegelii cathepsin genes. The phylogenetic tree was constructed based on the amino acid sequences of cathepsin of vertebrates using the neighborjoining method in MEGA 6. Gaps/missing data treatment was analyzed using all sites, and the phylogenetic tree was evaluated with 1000 bootstrap replications. The bootstrap values are indicated by numbers at the nodes. The red circles indicate the S. schlegelii cathepsin genes characterized in the present study, and green triangles indicate zebrafish cathepsin genes. Though phylogenetic relationships provided strong support for the identification of most cathepsins, syntenic analyses were performed to provide additional evidence ( Figure 4). On the one hand, the synteny analyses provided strong evidence for the identification of S. schlegelii CTSLb, whose phylogenetic relationship was obscure. The conserved flanking genes included parkin co-regulated gene (pacrg), quaking-a (qkia), (ythdf2), arfgap with sh3 domain, ankyrin repeat and PH 3 (asap3), connector enhancer of kinase suppressor of ras 1 (cnksr1), and grainyhead-like 3 (grhl3) among black rockfish, zebrafish, and catfish ( Figure 4M). On the other hand, the eighteen S. schlegelii cathepsins identified in this study were all well clarified and annotated. For instance, the neighboring genes of CTSBa, including fanconi anemia complementation group M (fancm), set and mynd Domain Containing 2a (smyd2a), German-Chinese forum on computational catalysis 2 (gcfc2), transcription factor B2, mitochondrial (tfb2m), potassium channel tetramerization domain-containing 3 (kctd3), and estrogen-related receptor gamma (esrrg), were conserved Mar. Drugs 2022, 20, 504 6 of 21 among black rockfish, zebrafish, and channel catfish ( Figure 4C). The neighboring genes of CTSF, including mitochondrial ribosomal protein L18 (mrpl18), transient receptor potential vanilloid 1 (trpt1), heterogeneous nuclear ribonucleoprotein H1 (hnrnph1), run and fyve domain-containing 1 (rufy1), spermatogenesis-associated gene 4 (spata4), platelet-derived growth factor C (pdgfc), and Translocase of outer mitochondrial membrane 5 (tomm5), were conserved among black rockfish, zebrafish, and catfish ( Figure 4H). The adjacent gene (rassf9) downstream of CTSAb in zebrafish and channel catfish was also observed downstream of the CTSAb gene in black rockfish. Moreover, the synteny analysis supported the annotations of duplicated genes, including CTSA, CTSB, CTSD, CTSH, CTSL, CTSS, and CTSZ. In detail, CTSA, CTSB, CTSD, CTSH, CTSL, CTSS, and CTSZ were all duplicated by whole-genome duplication.   Table S1.

Evolutionary Rate Analysis
To determine whether cathepsins evolved under different selection pressures among the selected teleost species, the values of Ka, Ks, and their ratio for each homolog from nine selected teleost species were compared. As shown in Figure 7, the values of Ka among these cathepsins ranged from 0.15 to 0.86, and the values of Ks varied from 0.16 to 1.41. In addition, cathepsins were grouped into two categories according to their ratio of Ka to Ks. The average values of Ka/Ks of CTSC, CTSDa, CTSF, CTSHa, CTSHb, CTSO, CTSSa, CTSSb, CTSZa, and CTSZb were greater than one, indicating that these genes were under positive selection. Among them, the highest Ka/Ks ratios were observed in CTSZa and CTSZb, with values of 1.97 and 1.75, respectively. The Ka/Ks ratios of other cathepsins were less than one, revealing that they underwent efficient purifying selection. The lowest Ka/Ks ratios were observed in CTSBa (0.13) and CTSBb (0.14). Mar. Drugs 2022, 20, x FOR PEER REVIEW 9 of 23

Evolutionary Rate Analysis
To determine whether cathepsins evolved under different selection pressures among the selected teleost species, the values of Ka, Ks, and their ratio for each homolog from nine selected teleost species were compared. As shown in Figure 7, the values of Ka among these cathepsins ranged from 0.15 to 0.86, and the values of Ks varied from 0.16 to 1.41. In addition, cathepsins were grouped into two categories according to their ratio of Ka to Ks. The average values of Ka/Ks of CTSC, CTSDa, CTSF, CTSHa, CTSHb, CTSO, CTSSa, CTSSb, CTSZa, and CTSZb were greater than one, indicating that these genes were under positive selection. Among them, the highest Ka/Ks ratios were observed in CTSZa and CTSZb, with values of 1.97 and 1.75, respectively. The Ka/Ks ratios of other cathepsins were less than one, revealing that they underwent efficient purifying selection. The lowest Ka/Ks ratios were observed in CTSBa (0.13) and CTSBb (0.14).

Evolutionary Rate Analysis
To determine whether cathepsins evolved under different selection pressures among the selected teleost species, the values of Ka, Ks, and their ratio for each homolog from nine selected teleost species were compared. As shown in Figure 7, the values of Ka among these cathepsins ranged from 0.15 to 0.86, and the values of Ks varied from 0.16 to 1.41. In addition, cathepsins were grouped into two categories according to their ratio of Ka to Ks. The average values of Ka/Ks of CTSC, CTSDa, CTSF, CTSHa, CTSHb, CTSO, CTSSa, CTSSb, CTSZa, and CTSZb were greater than one, indicating that these genes were under positive selection. Among them, the highest Ka/Ks ratios were observed in CTSZa and CTSZb, with values of 1.97 and 1.75, respectively. The Ka/Ks ratios of other cathepsins were less than one, revealing that they underwent efficient purifying selection. The lowest Ka/Ks ratios were observed in CTSBa (0.13) and CTSBb (0.14).

Expression Profiles of Cathepsins after A. salmonicida Infection
The expression patterns of S. schlegelii cathepsins were detected in nine healthy tissues, including the blood, brain, gill, intestine, liver, spleen, head kidney, skin, and muscle. Among the tested tissues, the cathepsins were expressed at comparatively low levels in the blood, and then we used the expression level in the blood as the baseline for S. schlegelii cathepsins. Overall, S. schlegelii cathepsins were ubiquitously expressed in all of the examined tissues with distinct expression patterns ( Figure 8). In detail, the highest expression levels of most S. schlegelii cathepsins were observed in the gill, head kidney, and spleen, while the lowest expression levels were found in the blood, muscle, and liver, with moderate expression levels observed in the brain, intestine, and skin ( Figure 8).

PPI Network Construction of Cathepsins
A PPI network analysis was conducted to reveal potential interacting proteins and immune-related signal transduction pathways. On the one hand, cathepsins were linked within families (Figure 13). For instance, as shown in Figure 13A, CTSAa was predicted to interact with CTSBa, CTSD, and CTSZ, indicating potential connectivity within S. schlegelii cathepsin families. Similar PPI patterns were also observed for other members, such as CTSAb, CTSBa, CTSBb, CTSC, CTSDa, CTSHa, CTSZa, and CTSZb. In addition, a few immune-related genes were also observed to show high connectivity to S. schlegelii

Discussion
In the present study, we systematically identified a complete repertoire of eighteen cathepsin genes in the S. schlegelii genome and transcriptome datasets. Firstly, multiple alignments of the cathepsin superfamily were performed based on the obtained

Discussion
In the present study, we systematically identified a complete repertoire of eighteen cathepsin genes in the S. schlegelii genome and transcriptome datasets. Firstly, multiple alignments of the cathepsin superfamily were performed based on the obtained sequences, revealing that most of the common features were present, including the high-level conservation of the glutamine oxyanion hole (except for cathepsins Aa, Ab, Da, and Db) and cysteine, histidine (except for cathepsins Aa, Ab, Da, Db, and Hb), and asparagine active site residues (except for cathepsins Aa, Da, Db, and Hb), which play important roles in the formation and stabilization of the catalytic sites of activating enzymes. These results are identical to the characteristics of the cathepsin family [38]. Secondly, based on the S. schlegelii genome and transcriptome datasets, we performed gene structure, phylogenetic, syntenic, and evolutionary rate analyses and determined their expression patterns after infection with A. salmonicida. These results are expected to be valuable for comparative immunological studies and suggest that members of the cathepsin family in S. schlegelii play vital roles in immune responses to bacterial infection in teleost species.
By comparing the number of cathepsins identified in S. schlegelii and other teleost species, combined with the evolutionary rate analysis, several interesting phenomena in the evolution process were observed. On the one hand, a few members (cathepsins A, D, H, and Z) were duplicated through whole-genome duplications. Admittedly, the cathepsin superfamily is comparatively conserved, but there are also unique differences in various taxonomic species, suggesting that the function and evolution mechanism of the cathepsin superfamily need to be further studied in different teleost species. On the other hand, the lowest Ka/Ks ratios were observed in CTSBa (0.13) and CTSBb (0.14), and the highest Ka/Ks ratios were observed in CTSZa and CTSZb, with values of 1.97 and 1.75, respectively. The Ka/Ks ratios reveal that they underwent efficient purifying selection.
The immune system of fish is very different from that of mammals. Fish lack bone marrow and lymph nodes and instead use the kidney as the main lymphoid organ [39]. In addition, the major lymphoid tissues in teleost fishes are the kidney, thymus, spleen, and mucosa-associated lymphoid tissues, which contain lymphocytes, macrophages, and various types of granulocytes [40]. In this study, we explored the expression profiles of eighteen cathepsin genes in S. schlegelii. On the one hand, all cathepsins were expressed constitutively in various tested tissues, suggesting that they may play multifunctional roles in healthy S. schlegelii. Among them, the highest expression level was observed in the spleen, followed by the gill, intestine, head kidney, and muscle. The kidney and spleen are important hematopoietic and lymphoid organs that are closely related to the immune response in fish [41]. Notably, cathepsins F, La, and Hb were highly expressed in muscle (Figure 8). A previous study reported that using antibodies directed against chum salmon cathepsins B and L showed that these enzymes were derived from muscleinvading phagocytes [42]. Hence, the differential expression of cathepsins F, La, and Hb in S. schlegelii muscle may be related to muscle degradation in fish. On the other hand, SsCTSK was widely expressed in all of the examined tissues, with higher expression levels observed in the spleen, head kidney, gill, and skin, consistent with previous studies in turbot [23,43,44]. There are two copies of CTSH in S. schlegelii. CTSHa was expressed at a high level in the intestine, indicating that CTSHa might be involved in intestinal immune responses [45]. In contrast, CTSHb was highly expressed in muscle. Interestingly, CTSH was highly expressed in the intestine and differentially expressed in muscle at the same time in rock bream [46]. A high expression level of CTSBb was observed in the intestine, which is consistent with the highest expression level of CTSB in the intestine of miiuy croaker (Miichthys miiuy) [38], whereas CTSBa showed a high expression level in the spleen, consistent with the high expression level of CTSB in the spleen of Nile tilapia (Oreochromis niloticus) [47]. The difference in cathepsin expression in different aquatic animals may be related to the different species, living environments, and physiological statuses of individuals [48].
In order to explore the immune roles of cathepsins, the expression profiles of cathepsins following bacterial infection were characterized in the head kidney, spleen, liver, and gill. After A. salmonicida infection, most cathepsins were significantly differentially expressed, and obvious tissue-specific and time-dependent expression patterns were observed, suggesting their involvement in response to bacterial infection. In general, the expression patterns of cathepsins were similar among the gill, head kidney, and liver, exhibiting a general up-regulation trend at most time points after A. salmonicida infection (Figures 9-11). However, a few cathepsin members, such as CTSBb, CTSF, CTSHb, CTSO, CTSSb, and CTSZb, were observed to be down-regulated at 48 h and 72 h in the spleen (Figure 12). Notably, CTSHa was expressed at high levels in the intestine, spleen, head kidney, and gills of healthy rockfish, and it was also observed to be expressed at high levels in the spleen, head kidney, and gills following A. salmonicida infection, indicating the potentially vital role of CTSHa in response to bacterial infection. It was reported that CTSA was significantly up-regulated in gill following V. anguillarum infection in turbot [49], which is consistent with our result that CTSAa and CTSAb were up-regulated the most in the gill compared with other tissues after bacterial infection. Similar to what was reported in flounder and turbot, CTSB showed the strongest up-regulation, and the up-regulation of CTSBa in mucosal surfaces could be a result of the inflammatory response against invading pathogens [50,51].
Cysteine proteases, which are the largest group, are synthesized as inactive precursors, and the typical domain architectures include a signal peptide, a propeptide, and a mature peptide. The characteristic ERFNIN motif of endopeptidase cathepsins was found in the propeptide regions of CTSBa, CTSBb, CTSC, CTSF, CTSHa, CTSHb, CTSK, CTSLa, CTSLb, CTSO, CTSSa, CTSSb, CTSZa, and CTSZb. A previous study reported that the ERFNIN signature sequence in CTSH might participate in the scaffold of a helical structure in the proregion, which was significant for its specific inhibition [52]. The mature form of CTSZ has an ECD motif (Glu-Cys-Asp), which might play an important role in integrin-mediated signal transduction [53]. In addition, CTSAa and CTSAb of S. schlegelii are representative members of serine proteases. Previous research has demonstrated that CTSA has both catalytic and protective functions [54]. Besides its enzymic functions, cathepsin A plays an important role in the protection of lysosomal β-galactosidase and neuraminidase against intralysosomal proteolysis by forming a macromolecular complex [55,56]. Moreover, CTSDa and CTSDb of S. schlegelii are representative members of aspartic proteases. Previous studies reported that CTSD prefers an α-helical folding of its substrate for proteolytic degradation, and the activity of CTSD may be crucial in determining which peptide products arise from antigen processing and then serve as T-cell epitopes. This information is relevant to the design of synthetic immunogens and vaccines that aim at T-cell activation [57]. In general, cathepsins play physiological roles in protein degradation/turnover (CTSB, CTSL, and CTSH), bone resorption (CTSK), proenzyme activation (CTSB and CTSC), antigen presentation/processing (CTSF, CTSH, CTSL, CTSS, and CTSV), epidermal homeostasis (CTSL), and hormone maturation (cathepsins B and L) [58].
PPI analyses were conducted to further reveal the regulatory mechanisms of S. schlegelii cathepsins, which are essential for predicting activities for most proteins. As shown in Figure 13, a number of immune-related genes were observed to show high connectivity to cathepsins, including interleukin genes (IL12RB2), TLR (TLR3), TOLL (toll8), CD genes (CD74 and CD74a), mmp (MMP13b and MMP20a), mhc (MHC2A, MHC2B, MHC2BL, MHC2DAB, and MHC2DCB), bcl (BCL2, BCL2b, and BCL2L1), irf (IRF6) (EIF2B1), chemokines (ccr12.2), etc. CCR12a and CCR12b were both differentially expressed after A. salmonicida infection in S. schlegelii, suggesting their key roles in immune regulation in fish [59,60]. TLR3-mediated signaling is important for host defense against RNA viruses [61]. Infection with the IHN virus and the time in culture both appeared to enhance MHC class II expression in rainbow trout [62]. In mammals, cell-mediated cytotoxicity is largely mediated by MHC class I molecules, but peptide-loaded MHC class II molecules can also stimulate cytotoxicity by T cells [63,64]. More experimental evidences are required to discover their immune response mechanisms.

Gene Identification
In order to identify cathepsins in turbot, all available sequences of cathepsins from zebrafish, channel catfish, flounder, medaka, tilapia (Oreochromis niloticus), fugu, frog (Xenopus laevis), chicken (Gallus gallus), mouse (Mus musculus), and human (Homo sapiens) were retrieved from the NCBI (http://www.ncbi.nlm.nih.gov/, accessed on 17 September 2021), Ensemble (http://www.ensembl.org, accessed on 17 September 2021), UniProt (http://www.uniprot.org/, accessed on 17 September 2021), and ZFIN (http://zfin.org/, accessed on 17 September 2021) datasets. These sequences were used as queries to search against the S. schlegelii genome [65] and transcriptome datasets [66] using the TBLASTN program with an E-value cutoff of 1 × 10 −5 . Then, Clustal Omega (http:// www.ebi.ac.uk/ Tools/msa/clustalo/, accessed on 25 September 2021) was used to eliminate duplicates in the initial sequence pool, and each sequence was confirmed to be a unique gene according to its location on the genome. After that, the ORF (opening reading frame) finder (http:// www.ncbi.nlm.nih.govgorf/gorf.html, accessed on 27 September 2021) was used to predict coding sequences, which were further validated by BLASTP against NCBI non-redundant (nr) protein database. Lastly, the Fgenesh program of Molquest software (Softberry Int, version 2.4, Mount Kisco, NY, USA) was used to predict the genes from the retrieved genomic chromosome sequences [67]. Combining the Fgenesh predictions with the blast results captured the complete set of cathepsins' sequences.

Sequence Analysis
To explore the characteristics of cathepsins, a series of bioinformatic analyses were conducted. Firstly, the lengths of mRNA and amino acid sequences, molecular weights (MWs), and isoelectric points (PIs) of cathepsins were calculated using ProtParam (https:// web.expasy.org/protparam/, accessed on 8 October 2021) [68]. ProtComp 9.0 (http://www. softberry.com, accessed on 12 October 2021) was used to predict the subcellular localizations of cathepsins. In addition, the functional domains (motif patterns) were predicted using the simple modular architecture research tool (SMART; http://smart.emblheidelberg.de/, accessed on 15 October 2021) and further endorsed by conserved domains predicted through BLASTP. The number of exons was extracted from the annotation results of the S. schlegelii genome reference [66], and the Gene Structure Display Server 2.0 (GSDS, http: //gsds.gao-lab.org/, accessed on 26 October 2021) was used to draw the gene structures of cathepsins. The mappings of cathepsins on chromosomes were analyzed by TBtools. The presumed 3D protein structural model was established using PHYRE2 Protein Fold Recognition Server (Phyre2 server) (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi? id=index, accessed on 5 November 2021) [69]. To determine the PPI network of expressed proteins, amino acid sequences of cathepsins were blasted against C. semilaevis or D. rerio by using STRING software 11.0. Representation of the protein-protein network was analyzed at a confidence score of 0.40 from text mining, experiments, databases, co-expression, and neighborhood sources [70].

Phylogenetic Analysis
Phylogenetic analysis was conducted using the amino acid sequences of cathepsins, along with several representative vertebrates, including human, mouse, chicken, fugu, zebrafish, channel catfish, flounder, rainbow trout (Onchorynchus mykiss), tongue sole (Cynoglossus semilaevis), Atlantic salmon (Salmo salar), and turbot. Alignment of multiple amino acid sequences with default parameters was conducted by MUSCLE (multiple sequence comparison by log-expectation) [71]. Then, the phylogenetic trees of cathepsins were constructed in the MEGA 6 program. In detail, the phylogenetic tree of cathepsin was reconstructed with JTT (Jones-Taylor-Thornton) + G (gamma distribution for modeling rate heterogeneity) + I (invariant sites) + F (Freqs) model, with a proportion of invariable sites of 0.01 and Gamma shape parameter of 2.27. Bootstrapping with 1000 replications was performed to test the phylogenetic tree, and gaps/missing data treatment was analyzed using all sites. Finally, EvolView (http://www.evolgenius.info/evolview, accessed on 15 November 2021) (http://www.evolgenius.info/evolview, accessed on 18 November 2021) was used to annotate the phylogenetic trees.

Synteny and Schematic Genomic Organization Analysis
Syntenic analysis was conducted to provide additional evidence for the identification of cathepsins based on the comparison of neighboring genes of cathepsins with those of zebrafish and channel catfish (or turbot). The flanking genes of S. schlegelii cathepsins were predicted from S. schlegelii genomic chromosomes using the Fgenesh program in Molquest software [67], and the identified amino acid sequences were annotated by running BLASTP against NCBI non-redundant (nr) database and UniProt Knowledgebase (UniProtKB). Meanwhile, the conserved syntenic blocks of cathepsins and their flanking genes in zebrafish and channel catfish (or turbot) were obtained from the NCBI database. Zebrafish is a model organism and has the most updated nomenclature system (http://zfin.org/, accessed on 25 November 2021), so we named cathepsins after zebrafish whenever possible based on orthologs, which were determined by phylogenetic and syntenic analysis.

Evolutionary Rate Analyses
There are two indicators of base substitution frequency: synonymous sites (Ks) and non-synonymous substitution sites (Ka). Their ratios (Ka/Ks) are used to measure the evolutionary rates of genes. When the ratio is >1, it indicates that these genes underwent positive selection, while when the ratio is <1, it indicates that these genes underwent purifying selection (<1) during the evolutionary process. To illustrate the evolutionary rates of cathepsins among the teleost species, the blast method was used to extract the homologs or regions containing cathepsins in available species, including zebrafish, channel catfish, turbot, fugu, medaka, and tilapia. Subsequently, the nucleic acid sequences of cathepsins identified from the selected teleost species were imported into the MEGA software (version: 6.0, Center for Evolutionary Medicine and Informatics, Tampe, FL, USA) and then aligned according to codons with the parameters set as: gap opening penalty: 10; gap extension penalty: 0.2; delay divergent cutoff: 30%. Finally, DnaSP ver. 5 was used to calculate the ratios of Ka/Ks [72].

Sample Collection of Healthy S. schlegelii
In order to characterize the basal expression patterns of cathepsins in S. schlegelii, nine tissues of healthy S. schlegelii were collected. Experimental S. schlegelii adults were obtained from a local fish farm in Qingdao, Shandong Province, with an average length of 15 ± 2 cm. Before sample collection, fish were reared in the laboratory and acclimatized in a flow-through system filtered with seawater for one week without feeding. The DO, pH, and temperature were 5.2 ± 0.35 mg/L, 8.2 ± 0.01, and 22.03 ± 0.58 • C, respectively. Thirty fish were anesthetized with tricaine methanesulfonate (MS-222, 100 mg/L) in seawater. Then, nine tissues, namely, the blood, brain, head kidney, skin, gill, muscle, spleen, intestine, and liver, were collected, flash-frozen in liquid nitrogen, and then stored at −80 • C in an ultra-low freezer until the preparation of RNA.

Bacterial Challenge and Sample Collection
In order to characterize the immune response of cathepsins to bacterial infection, A. salmonicida was selected to conduct the injection challenge. The experimental fish were the same as mentioned above. Thereafter, a pre-challenge in S. schlegelii was performed. Subsequently, confirmed A. salmonicida isolated from symptomatic fish was cultured in LB medium at 28 • C overnight at 180 rpm/min. The aquaria, with 30 fish in each aquarium, were randomly assigned to 6 h, 24 h, 48 h, and 72 h post-injection groups. Then, the experi-mental fish were intraperitoneally injected with 0.1 mL A. salmonicida at a concentration of 1 × 10 7 CFU/mL. At the same time, the fish in the control group were injected with an equal amount of physiological saline. During the challenge, the fish were collected after being anesthetized with MS-222 (100 mg/L). Subsequently, the tissues (head kidney, liver, spleen, and gill) were collected after injection at different time points (6 h, 24 h, 48 h, and 72 h) and were designated as AS_6 h, AS_24 h, AS_48 h, and AS_72 h, respectively. Each group had three replicates, and each replicate included five random individuals. All samples were flash-frozen in liquid nitrogen and then stored in an ultra-low freezer at −80 • C until RNA extraction.

Total RNA Extraction and Real-Time PCR Analysis
Before RNA extraction, a mortar and pestle were used to homogenize samples under liquid nitrogen. Using Trizol Reagent (Invitrogen, Carlsbad, CA, USA), the total RNA was extracted according to the supplied instructions. Then, the integrities of RNAs were detected on 1% agarose gel. Then, all RNA samples were run on the gel to ensure that no genomic DNA existed. Using Nanodrop 2000 (Thermo electronic North America LLC, Boston, FL, USA), the RNA quality and quantity of each sample were determined. All extracted samples had an A260/280 ratio greater than 1.8. A NanoPhotometer spectrophotometer (IMPLEN, Westlake Village, CA, USA) was used to check RNA concentration.
Quantitative real-time PCR (qPCR) was conducted to examine the expression patterns of cathepsins in different tissues (head kidney, liver, spleen, and gill) following A. salmonicida challenges. Gene-specific primers were designed using PrimerQuest (https: //sg.idtdna.com/PrimerQuest/Home, accessed on 20 December 2021) based on the gene sequences of cathepsin ( Table 2). The β-actin gene was used as an internal control for normalization of the expression levels ( Table 2). First-strand cDNA (500 ng RNA per 10 µL reaction) was synthesized according to the manufacturer's protocol, and the PrimeScript™ RT reagent Kit (Takara, Otsu, Japan) was used to synthesize first-strand cDNA (500 ng RNA per 10 µL reaction). After that, qPCR was performed on a CFX96 real-time PCR detection system (Bio-Rad Laboratories, Hercules, CA, USA) following the manufacturer's instructions. The reactions were performed in 20 µL volumes containing 2 µL of the diluted template cDNA, 0.6 µL of each primer, 10 µL of SYBR ® Premix Ex TaqTM II (TliRNaseH Plus), and 6.8 µL of RNA-free water. The thermal cycling profile was performed as follows. In order to verify the specificity of the amplicons, the PCR reaction mixture was denatured at 95 • C for 30 s, followed by 40 cycles of 95 • C for 30 s, 58 • C for 30 s, followed by dissociation curve analysis, 5 s at 65 • C, and then up to 95 • C at a rate of 0.1 • C/s increment. A non-template control was performed as a negative control for each gene, and each reaction was confirmed by repeating the qPCR analysis in triplicate. Finally, the 2 −∆∆Ct method was used to calculate the relative gene expression fold changes [73].

Conclusions
In summary, genome-wide identification of eighteen cathepsins was performed in S. schlegelii. Gene structure, phylogenetic, synteny, genome organization, and evolution rate analyses allowed annotations of these genes, and the results provide insights into their evolutionary dynamics. Furthermore, cathepsins were ubiquitously expressed in nine examined tissues, with high expression levels observed in the head kidney, spleen, liver, and gill. Meanwhile, most cathepsins were differentially expressed in four examined tissues after A. salmonicida infection, suggesting their participation both in homeostasis and in immune responses to bacterial infection. Finally, PPI analyses indicate that cathepsins interact with a few immune-related genes, such as interleukin, chemokine, MHCII, and TLR genes. These results indicated the vital roles of cathepsins in teleost immunity, and further studies are needed to explore the immune function and mechanism of cathepsins in teleost host immune activities.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/md20080504/s1. Table S1: Abbreviations of gene names used in synteny analysis; Table S2: Abbreviations and gene IDs of gene names used in the phylogenetic tree in Figure 3; Table S3: Abbreviations and gene IDs of gene names used in PPI; Table S4: Abbreviations of gene names used in PPI.