Bioevaluation of Pheretima vulgaris Antithrombotic Extract, PvQ, and Isolation, Identification of Six Novel PvQ-Derived Fibrinolytic Proteases

Thrombosis seriously endangers human health and carries a high rate of mortality and disability. However, current thrombolytic drugs (such as recombinant tissue plasminogen activator) and oral anticoagulants (such as dabigatran and rivaroxaban) are reported to carry a risk of major or life-threatening bleeding, such as intracranial hemorrhage or massive gastrointestinal bleeding, for which only non-specific antidotes are available. In contrast, lumbrokinase is highly specific for fibrin as a substrate and does not cause excessive bleeding. It can dissolve fibrin directly or convert plasminogen to plasmin by inducing endogenous t-PA activity, thereby dissolving fibrin clots. Searching for potential new therapeutic molecules from earthworms is therefore worthwhile. In this study, we first collected a strongly fibrinolytic extract (PvQ) from the total protein of Pheretima vulgaris using an AKTA pure protein purification system; its fibrinolytic activity was verified by the fibrin plate assay and the zebrafish thrombotic model of vascular damage. Furthermore, in a cell culture model of human umbilical vein endothelial cells (HUVECs), PvQ promoted the secretion of tissue-type plasminogen activator (t-PA), which indicates that it also has an indirect thrombolytic effect. Subsequently, extensive chromatographic techniques were applied to reveal the material basis of the extract. Six novel earthworm fibrinolytic enzymes were obtained from PvQ, and the primary sequences of these functional proteins were determined by LC-MS/MS-transcriptome cross-identification and the Edman degradation assay. The secondary structures of the six fibrinolytic enzymes were determined by circular dichroism spectroscopy, and their three-dimensional structures were predicted with MODELLER 9.23 based on multi-template modelling. In addition, the six genes encoding these clot-dissolving proteins were cloned from P. vulgaris by RT-PCR amplification, which further confirmed the accuracy of the primary sequence identifications and laid the foundation for subsequent heterologous expression.


Introduction
Thrombosis is a common underlying factor in cerebro-cardiovascular diseases, such as stroke, heart attack, and pulmonary embolism [1,2]. Thrombotic diseases present a significant risk of mortality and disability and seriously affect public health [3]. Dissolving the thrombus is essential in the treatment of thrombotic diseases. However, the drugs currently used for thrombolysis exhibit various side effects, the most dangerous being life-threatening bleeding [4]. Since bioprospecting introduced the promise of additional sources of bioeffective products with fewer side effects, it has become possible to explore complementary and alternative therapies. As an alternative therapy, medicinal animal preparations from leeches, snakes, and ...

Determination of Fibrinolytic Activity
As evident from the formation of a clear lysis zone on the fibrin plate (Figure 2), PvQ degraded fibrin into soluble peptides, which demonstrates its fibrinolytic activity. After incubation for 18 h, the specific activity [18] of PvQ relative to lumbrokinase was calculated from the measured zone diameters, as shown in Table 1. The specific activity of PvQ (247 U/µg) was about twice that of lumbrokinase (115 U/µg). This indicates that PvQ possesses excellent and long-acting fibrin-hydrolysis activity in vitro. Experiments were repeated on three dishes for each sample.

Evaluation of Thrombolytic Activity In Vivo
The antithrombotic effects of PvQ in vivo were further evaluated using the zebrafish thrombus model of epinephrine hydrochloride-induced vascular endothelial injury [19,20]. After administration, the heart rate and arterial pulse frequency of the zebrafish increased, which accelerated blood circulation in the veins, and thrombosis developed rapidly in the tail vein. Successful modelling therefore produced not only venous thrombosis but also an increased heart rate and arterial pulse. The tail vein thrombosis was then directly observed and photographed under a microscope, as shown in Figure 3. Zebrafish in the normal group (A) had no thrombosis, zebrafish in the model group (B) formed severe thrombosis, and the staining intensity of tail vein erythrocytes in the PvQ group (C) was significantly reduced, indicating that most of the blood clots were hydrolyzed by PvQ. Furthermore, quantitative analysis of the staining intensity of red blood cells in the tail vein showed that PvQ has a significant in vivo fibrinolytic effect, with an inhibitory rate of 67.3% at a dosage of 10 ng/fish (Table 2, Figure 4).

Figure 4. The staining intensity of tail vein erythrocytes. Data are expressed as mean ± SD (n = 18). * p ≤ 0.05, ** p ≤ 0.01 compared to the normal group; ## p ≤ 0.01 compared to the model group.

Determination of t-PA Content
Tissue-type plasminogen activator (t-PA) is one of the proteases secreted by vascular endothelial cells and converts the proenzyme plasminogen to plasmin. Increasing the release of t-PA can promote the dissolution of the thrombus. Additionally, t-PA plays an indirect role in thrombolysis and is even given as a drug to relieve ischemia. To verify whether PvQ can promote the release of t-PA as reported in the literature [21], we first conducted a 2-(2-methoxy-4-nitrophenyl)-3-(4-nitrophenyl)-5-(2,4-disulfonic acid benzene)-2H-tetrazolium monosodium salt (CCK-8) assay to examine cell viability in the different treatment groups, and then performed an ELISA assay. As displayed in Figure 5a, the viability of HUVECs was not affected when the concentration of PvQ was lower than 40 µg/mL, whereas a distinct decrease was observed at 60 µg/mL. Therefore, the maximum treatment concentration was set at 40 µg/mL. The ELISA results revealed that the amount of t-PA was significantly upregulated (Figure 5b) at PvQ concentrations not greater than 20 µg/mL. Unexpectedly, t-PA levels declined as the drug concentration increased further. In our view, this may be because PvQ, as a basic serine protease, has trypsin-like effects, so that a high dose of PvQ hydrolyzed t-PA [22]. A definitive explanation of this phenomenon requires further experimentation. In any case, the level of t-PA did increase in the presence of low concentrations of PvQ. Moreover, earthworm fibrinolytic proteins are administered orally in traditional use, so the plasma concentration of the active proteases may not reach a high level, and a therapeutic effect rather than hydrolysis of t-PA would be expected.

Purification of Functional Proteins from PvQ
The freeze-dried PvQ was redissolved in 20 mM Tris-HCl buffer (pH 7.8) and loaded onto a HiTrap Q HP column. Three fractions (Fr.1-Fr.3) possessing fibrinolytic activity were collected (Figure 6), and the SDS-PAGE profile of the fractions showed impurities. Fr.1-Fr.3 were then successively loaded onto a HiPrep Phenyl FF column. After all chromatographic peak fractions had been collected, the fibrin plate assay was performed to verify their activity. The results showed that Fr.1, Fr.2 and Fr.3 contained 1, 2 and 3 active peaks, respectively (Fr.1a; Fr.2a, Fr.2b; Fr.3a-Fr.3c). Gel filtration chromatography (Superdex 75 Increase) was then carried out on all of these active fractions. Finally, six novel P. vulgaris-derived fibrinolytic proteins, named PvI-PvVI, were obtained from PvQ, with molecular weights ranging from 20 to 35 kDa. Most of the proteins were more than 98% pure as estimated by SDS-PAGE with Coomassie brilliant blue staining (Figure 7a). Under approximately the same concentration conditions (0.23 mg/mL, determined by the bicinchoninic acid method), all six proteases showed significant activity in the fibrin plate assay. The rank order of the fibrinolytic activity of the six isoenzymes on the fibrin plates is PvII > PvV > PvIV > PvVI > PvIII > PvI (Figure 7b).

The Bottom-Up Proteomics and N-Terminal Sequence Analysis
The bottom-up proteomic strategy is widely used and relies on exhaustive, usually tryptic, digestion of proteins prefractionated by polyacrylamide gel electrophoresis. In the present study, six trypsin-digested samples were injected into the LC-MS/MS system and the results were analyzed using Byonic™ against the local database of P. vulgaris. The retrieval results for the six proteins are shown in Table 3; all proteins have more than two unique peptides. Next, we identified the PTH-amino acids released at each step of Edman degradation. The N-terminal sequences of the first 10 amino acid residues of the proteases purified from PvQ are shown in Table 4, and the chromatograms are shown in the Supplementary Materials, Figures S1-S8. The actual amino acid sequence of each protein was then selected from the candidate sequences by combining the Byonic™ results with N-terminal sequencing through the Edman degradation assay. The primary sequences of the mature peptides are shown in the Supplementary Materials, Tables S1-S6. Additionally, the six isoenzymes share a sequence similarity of 56.4% and have the same catalytic triad as trypsin (His57, Asp102 and Ser195), indicating that the functional proteins belong to the trypsin family (Figure 8).

Table 4. N-terminal amino acid sequences of PvI-PvVI.

Secondary-Structure Prediction
Circular dichroism (CD) spectroscopy is a widely used technique for the study of protein structure. Analysis with the BeStSel algorithm [23] showed that the secondary structure of the proteins covers various forms, including α-helix, β-sheet, turn and random coil (Table 5). As shown in Figure 9, the secondary-structure composition of all six proteases follows a similar trend, with strands and random coil accounting for the largest proportion. Additionally, the high proportion of unordered secondary structure in earthworm fibrinolytic proteases could be the reason for the broad substrate affinity and stability of the enzymes [13].

Tertiary-Structure Prediction
MODELLER 9.23 was used to predict the three-dimensional models of the six proteases. BLASTp searches before modelling showed that the purified proteases had high similarity to the homologous templates in the PDB database, ranging from 36% to 76%. According to the principle of minimum energy, the optimal model was selected from the ten models generated by MODELLER 9.23, and loop refinement was performed to obtain the final optimized model (Figure 10). Furthermore, RAMPAGE software was used to verify the validity of the six selected models, and their Ramachandran plots are shown in Figure 11. For each protein, the percentage of residues in the most favored and allowed regions is above 95%. Therefore, the modelling predictions carry a high level of confidence.

The resulting homology models showed that each of the six proteases contains two β-sheet barrel-like subdomains connected by three trans-domain straps. The overall structure is very similar to that of the trypsin family. As shown in Figure 10, PvI contains two six-stranded barrels, whereas in PvII-PvVI the N-terminal barrel contains six antiparallel β-strands and the C-terminal barrel contains seven. The active site cleft and the catalytic residues are located at the junction of the two barrels, with the active site cleft perpendicular to the junction. Moreover, the S1 pocket is walled by several residues, indicated in Figure 8 with red asterisks. As the 3D models of PvI-PvVI in Figure 12 show, the histidine residue is located at the top right of the S1 pocket; the Gly (Asp)-Ser segment runs from the bottom to the upper right; and Thr (Val)-Cys runs from the top to the bottom. In addition to these segments, Pro-Val lies in the center and Tyr closes the bottom of the S1 pocket.
The predicted structures indicate that PvI, PvIV and PvV possess the essential S1 specificity determinants characteristic of elastase, but the amino acid residues at the entrance of their pockets differ from those of elastase: PvI (Val, Ser), PvIV (Gly, Ser) and PvV (Gly, Ser). This might create a wider space at the entrance of the substrate-binding pocket than in elastase [24]. In addition, the S1 pockets of PvII and PvIII are similar to that of chymotrypsin. PvVI, however, has a substrate-binding pocket that is mostly identical to that of trypsin, and both possess an Asp residue at the bottom of the pocket. Therefore, PvVI may bind Lys and Arg residues particularly well [25].

Gene Cloning of Fibrinolytic Proteins
Total RNA was obtained with a commercial kit, and agarose gel electrophoresis was used to assess RNA integrity. Figure 13a shows tight 18S ribosomal RNA bands, indicating good-quality intact RNA; the OD260/280 ratio is 1.89. Next, cDNA was generated by RT-PCR using total RNA as a template. The six genes encoding the mature peptide sequences were cloned in succession from the cDNA using specific primers, as shown in Figure 13b. The resulting DNA fragments were inserted into a pCE2 TA/Blunt-Zero vector, and the sequencing results of the six genes are shown in Figure 14. The open reading frames (ORFs) of PvI-PvVI contain 238, 225, 225, 240, 242, and 239 codons, respectively, including a stop codon.
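The codon counts above can be cross-checked against the molecular weights estimated by SDS-PAGE (20 to 35 kDa). The following is a minimal sketch of that arithmetic, not part of the original analysis: the function name `orf_summary` is hypothetical, and the mass estimate assumes the common rule of thumb of roughly 110 Da per amino acid residue, so it is only a rough consistency check.

```python
# Codon counts (including the stop codon) as reported for PvI-PvVI.
ORF_CODONS = {"PvI": 238, "PvII": 225, "PvIII": 225,
              "PvIV": 240, "PvV": 242, "PvVI": 239}

# ASSUMPTION: average residue mass of ~110 Da (rule of thumb, not from the paper).
AVERAGE_RESIDUE_DA = 110


def orf_summary(codons_including_stop):
    """Return ORF length in nucleotides, residue count, and a rough mass in kDa."""
    residues = codons_including_stop - 1      # the stop codon encodes no residue
    nucleotides = codons_including_stop * 3   # three nucleotides per codon
    approx_kda = residues * AVERAGE_RESIDUE_DA / 1000
    return {"nucleotides": nucleotides,
            "residues": residues,
            "approx_kda": approx_kda}


summaries = {name: orf_summary(n) for name, n in ORF_CODONS.items()}

for name, info in summaries.items():
    print(name, info)
```

Under this assumption, every mature peptide falls within the 20 to 35 kDa range reported for the purified proteins.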

Additionally, the full-length sequences of these six genes were cloned based on transcriptome data from previous studies. Based on the N-terminal amino acid sequences, we found that the proteases might consist of two regions: a mature peptide sequence and a pro-region sequence upstream of the mature sequence, as previously reported [26,27]. The active form (the mature protein) of PvI-PvVI starts with isoleucine or valine rather than methionine, which implies that the mature polypeptides may be produced through post-translational modification. The primary sequences of the proteases were further verified by cloning the genes. These six fibrinolytic genes (Supplementary Materials, Tables S7-S12) are reported here for the first time, providing a basis for the subsequent heterologous expression of such earthworm-derived fibrinolytic proteins.

Discussion
Although many fibrinolytic agents have been developed and used for clinical purposes, thromboembolic diseases remain the leading cause of adult morbidity and mortality in the world [3]. In recent years, increasing attention has been paid to the development of effective agents for clinical application from various unusual animal species [2]. A group of fibrinolytic enzymes secreted by the alimentary tract of earthworms has shown excellent potential in the clinical treatment of blood clotting diseases [28]. However, the study of medicinal animals has always been a difficult problem in traditional medicine, which has led to insufficient research on their bioactive constituents and clinical application.
In this research, we established a time-saving purification process for enriching an active extract, designated PvQ, from P. vulgaris, a medicinal animal listed in the Chinese Pharmacopoeia. In the in vitro study, the fibrin plate assay was performed to verify its fibrinolytic activity. During fibrin-agarose coagulation, thrombin converts fibrinogen to fibrin monomers, which connect to form fibrin bundles. This process is very similar to blood coagulation [29]. The area of the clear hydrolyzed zone on the fibrin plate indicated that PvQ had greater bioactivity than lumbrokinase; this difference may result from the different earthworm species used. Additionally, PvQ exhibited an excellent in vivo thrombolytic effect in the thrombotic zebrafish model. Because of the low yield, its antithrombotic and thrombolytic effects in mammalian models have not yet been evaluated. Furthermore, the proteases are able not only to hydrolyze fibrin but also to activate proenzymes such as plasminogen [21,30]. In the present study, the ELISA assay indicated that PvQ can promote the secretion of t-PA in HUVECs. This result agrees with the traditional use of earthworm protein extract for the prevention of thrombotic diseases. Given the complexity of endothelial cells, more cell model experiments and in vivo experiments are needed to demonstrate the potential activity of PvQ.
Various studies show that earthworm-derived lumbrokinase has proteolytic activity and has been used to treat cardiovascular diseases [31] since ancient times. However, research on the material basis of its efficacy is relatively insufficient, and the identification of protein structures remains a major challenge. Therefore, the present study was conducted to isolate and identify fibrinolytic enzymes from P. vulgaris for use in the treatment of thrombotic diseases. Six novel P. vulgaris-derived fibrinolytic proteases were obtained from the active extract, PvQ. Their primary sequences were identified through LC-MS/MS-transcriptome cross-identification and the Edman degradation assay. Analysis of conserved sequence motifs showed that the six proteases have the same catalytic triad as the trypsin family. Circular dichroism spectroscopy and homology modelling were used to predict their spatial structures. The predicted structures indicate that PvI, PvIV and PvV possess the typical S1 pocket of elastase-like proteases, which suggests that their S1 pockets should prefer only P1 residues with small, hydrophobic side chains. In addition, the S1 pockets of PvII and PvIII are similar to that of chymotrypsin. However, comparison of the substrate-binding pockets of PvII and PvIII with that of chymotrypsin shows that Gly216 and Gly226 at the entrance of the chymotrypsin pocket are replaced by Gly and Ser in PvII and PvIII. These extra side chains make the entrance narrower than that of chymotrypsin. The backbone of the S1 pocket in PvVI is readily superimposable on that of trypsin, and the Asp residue at the bottom of the pocket enhances the substrate specificity of PvVI.
In short, structural research is important for understanding protein functions, biological mechanisms and protein interactions, and therefore for future work in biological medicine and pharmaceutics. Growing protein crystals will thus be an important task in subsequent studies.
Although the monomers can be separated by conventional purification methods, doing so is time-consuming and difficult. Different purification procedures also yield final products that vary in composition, so the level of fibrinolytic activity may vary as well. Successful expression of recombinant proteases therefore provides a new approach to obtaining a single component with fibrinolytic activity [32][33][34]. To date, 42 earthworm fibrinolytic gene sequences are publicly available at NCBI GenBank, including sequences from Eisenia foetida [35] and Lumbricus rubellus [36], among others. A few of these genes have been successfully expressed and characterized in E. coli [35][36][37] and the yeast Pichia pastoris [33,38,39]. However, the majority of studies have reported that, for undetermined reasons, recombinant proteins are either not expressed or do not exhibit fibrinolytic activity. Additional studies are required to clarify why some genes can be expressed in cell culture systems yet yield products without fibrinolytic activity. In the current study, six genes encoding fibrinolytic proteins derived from P. vulgaris are reported for the first time. The Pichia pastoris expression system is being used to explore the expression of the target proteins, but the results remain unpredictable.
Although earthworm fibrinolytic proteases have some advantages in pharmaceutical applications, several problems remain to be solved. First, for an oral drug, it is not clear how earthworm fibrinolytic proteases pass through the intestinal mucosa and are transported into the blood circulation. Second, it is unclear whether earthworm fibrinolytic proteases produce plasmin-related complications when they directly dissolve fibrinogen and fibrin. Third, the molecular mechanism of their thrombolytic effect in vivo remains unclear. Therefore, it is necessary to produce the fibrinolytic monomers and study their pharmacokinetics and toxicology in future research. Molecular structure is the basis of the catalytic mechanism of an enzyme and determines its substrate specificity, so studying the spatial structures of these bioactive proteins will contribute to a better understanding of the fibrinolysis mechanism. In conclusion, biophysical studies are necessary to elucidate the molecular biological characteristics and therapeutic applications of these proteases.

Material
Fresh P. vulgaris was collected from Shanghai, China. A voucher specimen, code Pv-Dsy 2020, was deposited at the School of Chinese Materia Medica, Beijing University of Chinese Medicine. The crude extract was obtained according to the laboratory preparation technology. Lumbrokinase (140650-201804), fibrinogen (140607-202042), and thrombin (140605-201927) were all acquired from the Chinese Food and Drug Testing Institute. High-glucose Dulbecco's modified Eagle medium (DMEM), fetal bovine serum (FBS), penicillin and streptomycin were purchased from Gibco (Thermo Fisher Scientific Co., Waltham, MA, USA). A human t-plasminogen activator ELISA kit (KE00180) was purchased from Wuhansanying Biotechnology Co. Ltd. (Proteintech, Wuhan, China). Other reagents were of analytical grade or above.

Enrichment of the Fibrinolytic Extract, PvQ
The lyophilized crude extract powder (200 mg) was dissolved in 50 mL of 20 mM Tris-HCl buffer (pH 7.8), filtered through a 0.45 µm membrane filter, and loaded on a prepacked HiTrap Q HP column (GE Healthcare, Chicago, IL, USA) equilibrated with the same buffer. After loading, the adsorbed proteins were eluted with a stepwise gradient of 0.1-1 M NaCl in the Tris-HCl buffer at a flow rate of 5.0 mL/min on the AKTA pure. At the end of elution, all fractions were collected, and a Fibg-TT assay was carried out to evaluate their bioactivity. According to the results of the Fibg-TT assay, the anticoagulant fractions were selected and desalted with centrifugal filters (MWCO of 3 kD, Millipore, Germany). To facilitate the subsequent experiments, all target fractions were freeze-dried.

Fibrin Plate Assay
The fibrin-hydrolysis activity of PvQ was measured by the fibrin plate assay [40,41], with a slight modification of the amount of sample. Fibrinogen solution (2 mg/mL) was prepared in 8 mL of normal saline, then mixed with 20 mL of 0.5% agarose solution. Then, 200 µL of 40 BP/mL thrombin solution was distributed into each sterile Petri dish before the 28 mL of fibrinogen and agarose mixed solution was added. After the agarose solution solidified, samples and serial concentrations of lumbrokinase (24,000, 16,000, 12,000, 8000, 6000, 4000 U/mL) were added to the dish. The plates were then incubated at 37 °C for 18 h. Two perpendicular diameters (in the X and Y directions) of each hydrolyzed clear zone were measured, and the hydrolytic activity of PvQ was then calculated.
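The calculation step can be sketched as follows. This is a minimal illustration only, assuming a log-log linear standard curve between lumbrokinase activity and the product of the two perpendicular diameters; the text does not state the regression form actually used. The function names and the diameter values are hypothetical placeholders, not measured data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def standard_curve(standards):
    """Fit log10(activity) against log10(d_x * d_y).

    standards: iterable of (activity_U_per_mL, d_x_mm, d_y_mm) tuples.
    The log-log form is an assumption, not taken from the text.
    """
    standards = list(standards)
    xs = [math.log10(dx * dy) for _, dx, dy in standards]
    ys = [math.log10(activity) for activity, _, _ in standards]
    return fit_line(xs, ys)

def estimate_activity(curve, d_x_mm, d_y_mm):
    """Read a sample's activity (U/mL) off the fitted standard curve."""
    slope, intercept = curve
    return 10 ** (slope * math.log10(d_x_mm * d_y_mm) + intercept)

# Concentrations are those listed in the text; diameters are placeholders.
PLACEHOLDER_STANDARDS = [
    (24000, 21.0, 21.2),
    (16000, 19.4, 19.5),
    (12000, 18.1, 18.3),
    (8000, 16.6, 16.8),
    (6000, 15.5, 15.6),
    (4000, 14.0, 14.2),
]

if __name__ == "__main__":
    curve = standard_curve(PLACEHOLDER_STANDARDS)
    print(round(estimate_activity(curve, 17.0, 17.2)))
```

Using the product of the two diameters rather than a single diameter makes the estimate less sensitive to slightly elliptical clear zones.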

The Effect of PvQ on Zebrafish Thrombotic Model of Vascular Damage
Wild-type (AB) zebrafish were purchased from the China Zebrafish Resource Center and bred in a zebrafish circulating aquaculture system (ESEN, China) according to Mustafa et al. [42]. Larval zebrafish (5 days post-fertilization) with normal development were selected under a stereomicroscope and randomly placed in 6-well plates containing 3 mL of culture medium, with 10 fish in each well. The zebrafish in the blank group were cultured normally, and the other groups were treated with epinephrine hydrochloride solution at a final concentration of 30 µM for 16 h in the dark. At the end of the experiment, the fibrinolytic effect of PvQ (10 ng/fish) on zebrafish was evaluated by o-anisidine staining; tail vein thrombosis and the reduction in intensity of erythrocyte staining were taken as indicators.
According to the pre-experiment results, the experiment was divided into the model group, the drug administration group, and the blank group, and all samples were placed in a constant-temperature biochemical incubator at 28.5 °C. The experiment was repeated three times with larval zebrafish from different parents. After staining, zebrafish in each group were photographed with a stereomicroscope, and the staining intensity of red blood cells in the tail was quantified using Image-Pro Plus 6 processing software. The inhibition ratio was expressed as the ratio of the staining intensity of the PvQ group to that of the model group; before calculation, the background value of each group was subtracted [43].
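One reading of this calculation is sketched below, assuming the ratio is taken between background-corrected mean intensities of the two groups. The function names are hypothetical, and the exact formula used in [43] may differ.

```python
def corrected_intensity(raw_intensity, background):
    """Subtract the image background from a raw staining intensity."""
    return raw_intensity - background

def staining_ratio(pvq_raw, pvq_background, model_raw, model_background):
    """Ratio of background-corrected PvQ intensity to model intensity."""
    model = corrected_intensity(model_raw, model_background)
    if model <= 0:
        raise ValueError("model-group intensity must exceed its background")
    return corrected_intensity(pvq_raw, pvq_background) / model
```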

The Determination of t-PA Content
The human umbilical vein endothelial cell line (No. 1101HUM-PUMC000437) was obtained from the Cell Resource Center of Peking Union Medical College (Beijing, China). HUVECs were cultured in DMEM basal medium supplemented with 10% fetal bovine serum (FBS), 100 IU/mL of penicillin, and 100 µg/mL of streptomycin. The cells were maintained in standard conditions (37 °C, 5% CO2, saturated humidity) and the culture medium was renewed every three days. Moreover, subculture was performed at a ratio of 1:3 when the confluence of cells reached about 90%.
The CCK-8 assay kit (Vazyme, Nanjing, China) was used to determine cell viability. HUVECs were seeded in 96-well microplates (Corning, NY, USA) at a density of 10^5 cells/mL in a total volume of 100 µL and cultured for 24 h. The seeded cells were then treated with PvQ at different concentrations (ranging from 10 µg/mL to 60 µg/mL, dissolved in DMEM) and incubated for 24 h. Subsequently, 10 µL of CCK-8 solution was added to each well and incubated at 37 °C for another 2 h in the dark. The OD values were measured with a microplate reader (BioTek, Winooski, VT, USA) at 450 nm and cell viability was calculated. On this basis, the maximum treatment concentration was defined as the highest concentration at which cell viability remained greater than 95% after 24 h of incubation. Under the same conditions, the concentration of t-PA in the cell culture supernatants of each group was assessed using a human t-plasminogen activator ELISA kit according to the manufacturer's instructions.
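The viability calculation and the choice of the maximum treatment concentration can be illustrated as follows. This sketch assumes the commonly used CCK-8 formula, viability (%) = (OD_treated - OD_blank) / (OD_control - OD_blank) × 100, which the text does not spell out; the function names are hypothetical.

```python
def viability_percent(od_treated, od_control, od_blank):
    """Cell viability (%) from OD450 readings (assumed standard formula)."""
    return (od_treated - od_blank) / (od_control - od_blank) * 100.0

def max_safe_concentration(viability_by_conc, threshold=95.0):
    """Highest tested concentration whose viability exceeds the threshold.

    viability_by_conc: dict mapping concentration (ug/mL) to viability (%).
    Returns None when no tested concentration passes.
    """
    passing = [c for c, v in viability_by_conc.items() if v > threshold]
    return max(passing) if passing else None
```

For example, with viabilities of 101%, 99%, 96%, and 90% at 10, 20, 40, and 60 µg/mL (placeholder values), the function returns 40.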

Purification of Fibrinolytic Proteases (PvI-PvVI) from PvQ
PvQ is a fibrinolytically active enriched fraction containing several active protein monomers. To clarify the effective substances of PvQ more thoroughly, its proteases were studied systematically in this project, and a series of chromatographic techniques were used for purification [12].
First, the lyophilized PvQ powder was redissolved in 20 mM Tris-HCl buffer (pH 7.8) and purified on a HiTrap Q HP column (GE Healthcare, Chicago, IL, USA) eluted with a NaCl gradient from 0 to 20 mM at a flow rate of 5.0 mL/min. The fractions were pooled and assayed on the fibrin plate to track the active protein peak. The eluted active fractions were then applied to a HiPrep Phenyl FF column (GE Healthcare, Chicago, IL, USA) and eluted with a stepwise gradient of ammonium sulfate (from 1 to 0.5 M) in 20 mM Tris-HCl buffer (pH 7.8). Subsequently, size exclusion chromatography on a Superdex 75 Increase gel filtration column (GE Healthcare, Chicago, IL, USA) was carried out for each active HIC fraction on an AKTA pure protein purification system with 20 mM Tris-HCl buffer containing 20 mM NaCl at a flow rate of 0.5 mL/min. All columns were used repeatedly until SDS-PAGE showed single bands. Active fractions were pooled, desalted, and lyophilized. Six protein monomers, named PvI-PvVI, were identified by bottom-up proteomics using LC-MS/MS, and the acquired data were compared with a local database constructed from the transcriptome results of P. vulgaris. Briefly, the six samples were run on a 12% polyacrylamide gel and the corresponding bands were excised. The six bands were digested in-gel with trypsin and separated using an Ultimate 3000 RSLC (Thermo, Waltham, MA, USA) with an Acclaim PepMap C18 column (1.9 µm, 100 Å, 150 µm i.d. × 150 mm). The loaded sample was gradient eluted from buffer A (0.1% formic acid in water) to buffer B (0.1% formic acid in acetonitrile) at 600 nL/min for a total of 66 min and analyzed by a Q Exactive™ hybrid Quadrupole-Orbitrap™ mass spectrometer with a spray voltage of 2.2 kV and a capillary temperature of 320 °C.
The raw mass spectrometric data were then searched against the local P. vulgaris database using Byonic [44] to obtain the protein identifications. The nucleotide sequences corresponding to the identified proteins were also acquired.

N-Terminal Sequence Analysis
In this study, the Edman degradation assay was used to determine the natural N-terminal sequences of the six proteins. The experimental procedure is presented in Figure 15. More specifically, the protein bands on the SDS-PAGE gel were electroblotted onto a polyvinylidene fluoride (PVDF) membrane. The protein on the membrane was reacted with phenyl isothiocyanate (PITC) in the PPSQ-33A automatic protein sequencer (Shimadzu, Kyoto, Japan) to produce a phenylthiohydantoin amino acid (PTH-AA). Afterwards, the PTH-AA was separated by HPLC and identified by comparison with the chromatogram of a mixture of 19 PTH-AA standards. The remaining protein was treated repeatedly in the same way to release the successive PTH-AAs in turn. In this experiment, the number of cycles was set to 10 and raw data were analyzed with the PPSQ-30 Data Processing software.
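As an illustration of how a ten-residue Edman read could be combined with a transcriptome-deduced sequence, the sketch below locates the N-terminal tag in a deduced protein sequence and returns the portion starting at that position. This is an assumed workflow, not a procedure described in the text; the function names and the example sequences are hypothetical.

```python
def locate_mature_start(deduced_protein, n_terminal_tag):
    """Return the 0-based index where the Edman tag begins, or -1 if absent."""
    return deduced_protein.find(n_terminal_tag)

def mature_sequence(deduced_protein, n_terminal_tag):
    """Return the deduced sequence from the Edman-determined N-terminus on."""
    start = locate_mature_start(deduced_protein, n_terminal_tag)
    if start < 0:
        raise ValueError("N-terminal tag not found in deduced sequence")
    return deduced_protein[start:]

# Hypothetical sequences for illustration only (not from the study).
EXAMPLE_DEDUCED = "MKLLVLAVLCAASA" + "IVGGIEARPY" + "EFPWQVSVRR"
EXAMPLE_TAG = "IVGGIEARPY"  # a ten-residue read, matching the 10 cycles

if __name__ == "__main__":
    print(locate_mature_start(EXAMPLE_DEDUCED, EXAMPLE_TAG))
    print(mature_sequence(EXAMPLE_DEDUCED, EXAMPLE_TAG))
```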

Secondary Structure Determination
The secondary structure of the six purified proteins was determined by circular dichroism (CD) spectroscopy. CD spectra were recorded with about 0.2 mg/mL of purified protein in a 1.0 mm quartz cell. A CD spectrometer (JASCO 715, Tokyo, Japan) was set to a measurement range of 180-250 nm at a scanning speed of 50 nm/min. The bandwidth was 1.0 nm, and the data pitch was 0.1 nm. Deionized water was used as the solvent and to record the baseline. Further analysis of the results was carried out using BeSTSel, a non-commercial web server (http://bestsel.elte.hu) (accessed on 14 April 2021). This method not only estimates α-helix content more accurately than previous methods but also provides detailed information on the β-sheets, overcoming the challenge posed by the large spectral and structural diversity of β-sheets [23,45].

Three-Dimensional Protein Structure Prediction
The three-dimensional structures of proteins are important for analyzing protein function. Obtaining protein crystals has long been a major challenge, and homology modelling is a quick and reliable method for predicting protein structure when X-ray crystallography or nuclear magnetic resonance (NMR) is not feasible. In this experiment, we predicted the tertiary structures of the six novel purified fibrinolytic proteins using MODELLER 9.23 with multi-template modelling. First, protein homology searches were performed against the Protein Data Bank (PDB) using the Blastp tool to obtain homologous templates. Subsequently, multi-template modelling was carried out based on the alignment results and the structural information of the homologous proteins. Then, the optimal model was selected by DOPE scoring in combination with the Molpdf score, and loop refinement was performed to pick the model with the lowest energy. Furthermore, the RAMPAGE server (http://www-cryst.bioc.cam.ac.uk/rampage) (accessed on 16 March 2021) was used for the validation of the predicted structures [46,47].
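The model selection step can be sketched as follows. The text says only that DOPE was used "in combination with" the Molpdf score, so ranking by lowest DOPE with Molpdf as a tie-breaker is an assumption made here for illustration. The scores and model names are placeholders, and the snippet does not call MODELLER itself.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    molpdf: float  # MODELLER objective function value; lower is better
    dope: float    # DOPE assessment score; lower (more negative) is better

def pick_best(candidates):
    """Pick the candidate with the lowest DOPE, then the lowest molpdf.

    This ranking rule is an assumption, not the authors' stated criterion.
    """
    return min(candidates, key=lambda c: (c.dope, c.molpdf))

# Placeholder scores for illustration only.
EXAMPLE_MODELS = [
    Candidate("model_1.pdb", 1520.4, -24310.2),
    Candidate("model_2.pdb", 1498.7, -24655.9),
    Candidate("model_3.pdb", 1475.1, -24655.9),
]

if __name__ == "__main__":
    print(pick_best(EXAMPLE_MODELS).name)
```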

Cloning of the Genes Encoding the Six Proteins
Total RNA was extracted from a living earthworm using the Mollusc RNA kit R6875 (Omega Bio-tek, USA). Then, reverse transcription-polymerase chain reaction (RT-PCR) was conducted according to the protocol of the HiScript III 1st Strand cDNA Synthesis kit (Vazyme, Nanjing, China) with the oligo(dT)20 primer. Based on the nucleotide sequences in the transcriptome of P. vulgaris, six pairs of primers (Table 6) were designed for PCR amplification from cDNA with the Phanta Max Super-Fidelity DNA Polymerase kit (Vazyme, Nanjing, China). PCR was performed with 3 µL of cDNA template, 1 µL of DNA polymerase, 1 µL of dNTP mix, and 2 µL of each primer in a 50 µL reaction mixture. The PCR protocol was as follows: pre-denaturation for 3 min at 95 °C, followed by 35 cycles of 15 s at 95 °C, 15 s at the annealing temperature of the respective primer pair (Tm), and 1 min at 72 °C. The PCR product was purified with the Gel/PCR extraction kit (Biomiga, San Diego, CA, USA) and subcloned into a pCE2 TA/Blunt-Zero vector at 37 °C using the 5 min TA/Blunt-Zero Cloning kit (Vazyme, Nanjing, China). The recombinant plasmids were transformed into E. coli DH5α and sequenced. The sequencing results were compared with the sequences in the transcriptome to further verify the primary sequences of the purified proteins.
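The reaction setup and cycling program above imply some simple arithmetic, sketched here. The remaining volume is assumed to be reaction buffer and water, which the text does not itemize, and the cycling time excludes ramping and any final extension, which are not described. The function names are hypothetical.

```python
def remaining_volume_ul(total_ul, components_ul):
    """Volume left for buffer and water after the listed components."""
    used = sum(components_ul.values())
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return total_ul - used

def cycling_time_s(initial_denaturation_s, cycles, step_durations_s):
    """Programmed hold time in seconds, ignoring ramp times."""
    return initial_denaturation_s + cycles * sum(step_durations_s)

# Volumes and times as given in the text.
COMPONENTS_UL = {
    "cDNA template": 3,
    "DNA polymerase": 1,
    "dNTP mix": 1,
    "forward primer": 2,
    "reverse primer": 2,
}

if __name__ == "__main__":
    print(remaining_volume_ul(50, COMPONENTS_UL))      # 41 (uL)
    print(cycling_time_s(3 * 60, 35, [15, 15, 60]))    # 3330 (s), i.e. 55.5 min
```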

Statistical Analysis
All statistical analyses were performed by one-way ANOVA using SPSS 20.0; results are presented as mean ± SD. A p-value below 0.05 was considered statistically significant.
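For readers without SPSS, the quantities involved can be reproduced with a short script. The sketch below computes the mean ± SD of a group and the one-way ANOVA F statistic using only the Python standard library; it is an illustration, not the authors' analysis. Converting F to a p-value requires the F distribution, for which a library routine such as `scipy.stats.f_oneway` could be used instead.

```python
from statistics import mean, stdev

def summarize(group):
    """Return (mean, sample standard deviation) of one group."""
    return mean(group), stdev(group)

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```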

Conclusions
In this paper, a bioactive protein fraction, PvQ, was isolated from P. vulgaris using AKTA pure and exhibited excellent in vitro and in vivo activity. To clarify the effective substances of PvQ more thoroughly, its fibrinolytic constituents were studied systematically using extensive column chromatography techniques. To date, six novel bioactive proteases have been obtained, and their primary structures were identified through mass spectrometry in combination with RNA-seq. Moreover, their spatial structures were characterized by CD spectroscopy and homology modelling. The six genes encoding the purified proteases were cloned in succession, which lays the foundation for subsequent heterologous expression and enriches the fibrinolytic gene data from P. vulgaris.

Supplementary Materials:
The following are available online, Figure S1: Chromatography of 19 phenylthiohydantoin amino acid mixtures for PvI, Figure S2