Comparative Genomic Analysis of a Thermophilic Protease-Producing Strain Geobacillus stearothermophilus H6

The genus Geobacillus comprises widely distributed thermophilic Gram-positive bacteria whose ability to withstand high temperatures makes them suitable for various applications in biotechnology and industrial production. Geobacillus stearothermophilus H6 is an extremely thermophilic Geobacillus strain isolated from hyperthermophilic compost at 80 °C. Through whole-genome sequencing and genome annotation, the gene functions of G. stearothermophilus H6 were predicted and its thermophilic enzymes were mined. The G. stearothermophilus H6 draft genome consisted of 3,054,993 bp, with a GC content of 51.66%, and it was predicted to contain 3750 coding genes. The analysis showed that strain H6 contained a variety of enzyme-coding genes, including protease, glycoside hydrolase, xylanase, amylase and lipase genes. A skimmed milk plate experiment showed that G. stearothermophilus H6 could produce extracellular protease that functioned at 60 °C, and the genome predictions included 18 secreted proteases with signal peptides. By analyzing the genome sequence of the strain, a protease gene, gs-sp1, was identified. The gene sequence was analyzed, and the protease was successfully expressed heterologously in Escherichia coli. These results could provide a theoretical basis for the development and application of industrial strains.


Introduction
Thermophilic microorganisms are microbes that can grow at 41-122 °C, with an optimal growth temperature of 45-80 °C. Thermophilic environments include volcanoes, geothermal areas (terrestrial, underground and marine hot springs), compost, oil reservoirs and other extreme high-temperature sites on Earth [1]. Many thermophilic microorganisms play an important role in biotechnology and have major commercial applications in industrial production [2]. For example, they can produce a variety of thermostable enzymes [3], generate biofuels by degrading agricultural wastes [4], and they show a special leaching capacity for certain minerals [5] and a bioremediation capacity [6].
Geobacillus was separated from Bacillus as a new genus in 2001 by Nazina et al. [7]. It is mainly composed of aerobic or facultatively anaerobic bacteria that can form endospores and is a typical thermophilic microbial group [8]. The genus is widely distributed and is found in high-temperature environments such as oil fields, hot springs, volcanic vents, dairy plants, food-processing facilities and compost [9]. The ability of Geobacillus to grow at high temperatures makes it suitable for various applications in biotechnology and industrial production. Cripps et al. used metabolic engineering to transform two thermophilic Geobacillus strains to

Sample Collection, Strain Isolation and Culture
The samples used in this study were hyperthermophilic composting soil samples from Beijing. To isolate thermophilic bacteria, LB and R2A liquid culture media were used: 5 g of soil sample was added to 50 mL of liquid culture medium and incubated in a water bath at 80 °C for 48 h. Then, 500 µL of enrichment solution was added to 50 mL of fresh liquid medium for further culture in an 80 °C water bath, and the enrichment solution was collected three times. The enrichment solution was diluted to 10^-1, 10^-2, 10^-3 and 10^-4 with ddH2O, and the dilutions were spread on the corresponding agar plates and cultured in an 80 °C incubator. A colony was selected for 16S rRNA sequencing, and the bacteria were preserved. The strain was isolated from an R2A agar plate.

Genomic DNA Extraction and Sequencing of G. stearothermophilus H6
G. stearothermophilus H6 was cultured in LB medium at 60 °C overnight. The bacterial solution was centrifuged at 4000 r/min at 4 °C for 10 min, the supernatant was discarded, and the bacterial cells were collected. Total bacterial DNA was extracted according to the operating instructions of a bacterial genomic DNA isolation kit (Mei5 Biotechnology Co. Ltd., Beijing, China), and the concentration and quality of the DNA were assessed with a NanoDrop 2500 system (OD260/OD280 = 1.8-2.0, ≥10 µg). The total DNA of the extracted samples was stored on dry ice and sent to Biomarker Technologies for sequencing.
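The acceptance criteria quoted above (OD260/OD280 between 1.8 and 2.0 and at least 10 µg of DNA) can be written as a simple check. This is only a minimal sketch: the function name and arguments are hypothetical and are not part of the NanoDrop software or of the authors' workflow.

```python
def dna_passes_qc(a260: float, a280: float, total_ug: float,
                  ratio_range=(1.8, 2.0), min_total_ug=10.0) -> bool:
    """Return True if a DNA sample meets the purity and yield thresholds.

    The default thresholds mirror the values quoted in the text; the
    function itself is an illustrative assumption.
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    ratio = a260 / a280          # purity ratio OD260/OD280
    low, high = ratio_range
    return low <= ratio <= high and total_ug >= min_total_ug
```

A sample with a ratio outside the range, or with less than 10 µg of DNA, would be rejected by this check.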

Phylogenetic Tree and Comparative Genomic Analysis of 16S rRNA of G. stearothermophilus H6
Genomic DNA was extracted and purified with a commercial bacterial genomic DNA isolation kit. The 16S rRNA gene was amplified with the universal bacterial primers 27F and 1492R. Preliminary sequence analysis of the 16S rRNA gene was conducted using the NCBI database, and strains with high homology were selected for phylogenetic analysis. The phylogenetic tree was constructed with MEGA 6.0 [22] using the maximum likelihood (ML) method, and branch support was assessed with 1000 bootstrap replicates.
Five homologous strains were selected, their basic information was compared with that of G. stearothermophilus H6, and the average nucleotide identity (ANI) was calculated (www.ezbiocloud.net/tools/ani (accessed on 15 October 2022)). Using the mauveAligner algorithm of Mauve 2.4.0 [23], the whole-genome sequence of G. stearothermophilus H6 was analyzed for collinearity against the whole-genome sequences of closely related reference strains.
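The idea behind the 1000 bootstrap replicates can be illustrated with a short sketch. Each replicate is built by resampling the columns of the multiple sequence alignment with replacement; a tree would then be inferred from every replicate, and the support for a branch is the percentage of replicate trees that contain it. The code below shows only the resampling step. It is not MEGA's implementation, and the function names and the toy alignment in the usage note are hypothetical.

```python
import random

def bootstrap_columns(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Return a list of resampled alignments (tree inference is not shown)."""
    rng = random.Random(seed)
    return [bootstrap_columns(alignment, rng) for _ in range(n_replicates)]
```

For example, `bootstrap_replicates(["ACGT", "ACGA", "TCGA"])` returns 1000 resampled alignments, each with three sequences of length four.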

Detection of Protease Activity of G. stearothermophilus H6
A G. stearothermophilus H6 bacterial solution (2.5 µL) that had been cultured to the middle logarithmic phase was spotted onto a skim milk plate, and the results were compared with those for Bacillus velezensis, Bacillus subtilis and E. coli BL21 (DE3). The four strains were cultured at both 37 °C and 60 °C. After different culture times, the transparent circles that appeared were observed to make a preliminary assessment of the protease production ability of the strains.

Whole-Genome Sequencing and Analysis
The raw genome data were filtered, and reads longer than 2 kb were retained. Hifiasm v0.16.0 [24] was used to assemble the filtered reads. Circlator v1.5.5 was used to circularize the assembly and adjust the starting site. Pilon v1.22 was used to further correct errors using second-generation sequencing data, and a genome with higher accuracy was obtained for subsequent analysis. Prodigal v2.6.3 [25] was used to predict coding sequences (CDSs) in the genome of the strain, and the genome information obtained by assembly and prediction, such as information on tRNAs, rRNAs, repeat sequences, GC contents and gene functions, was used to draw a circular genome map with Circos v0.66 [26].
Repeat sequences, rRNAs, tRNAs, CRISPRs and gene islands in the genome were predicted with software. Gene function annotation was mainly based on protein sequence comparison: the predicted gene sequences were compared with eggNOG, KEGG, Swiss-Prot, TrEMBL, Nr, GO, Pfam and other databases to obtain gene function annotation results.
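The GC content and GC skew tracks of a circular genome map are usually computed in windows along the sequence. The following is a minimal sketch of that calculation, assuming the common definitions GC content = (G + C) / (A + C + G + T) and GC skew = (G - C) / (G + C). The window and step sizes are arbitrary parameters chosen for illustration, not the values used by the authors, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    acgt = g + c + seq.count("A") + seq.count("T")
    return 100.0 * (g + c) / acgt if acgt else 0.0

def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); positive where G exceeds C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0

def sliding_windows(seq: str, window: int, step: int):
    """Yield (start, GC content, GC skew) for each full window."""
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), gc_skew(chunk)
```

Windows whose GC content lies above or below the genome-wide average (51.66% for strain H6) correspond to the two colors of a GC content track, and the sign of the skew gives the two colors of a GC skew track.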

Analysis of G. stearothermophilus H6 Protease
General databases such as eggNOG, KEGG, Swiss-Prot, TrEMBL, Nr, GO, and Pfam were used to predict the distribution of proteases in strain G. stearothermophilus H6, and the software SignalP v4.0 [27] was used to predict protein signal peptides and specific signal peptide excision sites for further analysis. At the same time, TMHMM v2.0 [28] was used to predict the transmembrane domains of the protease. The similarity of the protease gene between G. stearothermophilus H6 and the other two homologous strains was compared with gene sequences in the NCBI database.

Heterologous Expression of the G. stearothermophilus H6 Protease Gene in E. coli
After screening the protease genes of G. stearothermophilus H6, the GE001730 gene was selected and named gs-sp1. The thermophilic protease gene fragment was then inserted into the pET22b vector digested with HindIII and XhoI using seamless cloning and recombination technology.
PCR was used to amplify the gene (95 °C for 5 min, 1 cycle; 95 °C for 30 s, 58 °C for 30 s and 72 °C for 30 s, 33 cycles; 72 °C for 10 min, 1 cycle) using the following primers: F (5′-tcgagctccgtcgacaagcttATCTTCCCTCATATGAGTATAGGA-3′) and R (5′-gtggtggtggtggtgctcgagGCGCTGCAGCAGTTGCTC-3′). The PCR products were purified with a gel recovery kit (Mei5 Biotechnology Co. Ltd., Beijing, China). The PCR products were cloned into the expression plasmid pET-22b using the ClonExpress Ultra One Step Cloning Kit (Vazyme, Nanjing, China) and then transformed into E. coli BL21 (DE3) cells for further screening and verification.
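The two primers can be inspected programmatically. The sketch below assumes the usual convention for seamless cloning primers, namely that the lowercase 5′ part is the overlap with the linearized vector and the uppercase 3′ part anneals to the gene; the text does not state this explicitly. The function name is hypothetical.

```python
FORWARD = "tcgagctccgtcgacaagcttATCTTCCCTCATATGAGTATAGGA"
REVERSE = "gtggtggtggtggtgctcgagGCGCTGCAGCAGTTGCTC"

def split_primer(primer: str):
    """Split a primer into its lowercase 5' overlap and uppercase 3' part.

    Assumes the convention lowercase = vector overlap, uppercase =
    gene-specific annealing region.
    """
    n_overlap = 0
    while n_overlap < len(primer) and primer[n_overlap].islower():
        n_overlap += 1
    overlap, annealing = primer[:n_overlap], primer[n_overlap:]
    if not annealing.isupper():
        raise ValueError("expected a lowercase overlap followed by an uppercase annealing region")
    return overlap, annealing

for name, primer in (("F", FORWARD), ("R", REVERSE)):
    overlap, annealing = split_primer(primer)
    print(name, len(overlap), overlap, len(annealing), annealing)
```

Both overlaps are 21 nt long. The forward overlap ends with aagctt and the reverse overlap ends with ctcgag, which are the HindIII and XhoI recognition sequences, consistent with the vector digestion described above.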
The constructed E. coli BL21/pET22b/gs-sp1 strain was selected on ampicillin plates and cultured overnight at 37 °C. A single colony was picked and cultured overnight at 220 rpm in 20 mL of liquid medium at 37 °C. Then, 2% of the volume was transferred to 20 mL of LB medium, and culture was continued until reaching OD600 = 0.5-0.7. Thereafter, 0.1 mM IPTG was used to induce protein expression at 37 °C for 4 h, and a 5 µL aliquot was spotted on a skim milk plate. The wild-type E. coli BL21 and E. coli BL21/pET22b strains were used as controls, and the transparent circles produced by the bacteria were observed.

Phylogenetic Analysis of G. stearothermophilus H6 16S rRNA
To clarify the taxonomic position of G. stearothermophilus H6, 16S rRNA analysis was used. G. stearothermophilus H6 is a thermophilic bacterium obtained from hyperthermophilic composting soil by culture in R2A medium and screening at 80 °C. According to 16S rRNA gene sequence analysis, the strain belongs to the genus Geobacillus and shows the highest homology with G. stearothermophilus B5; therefore, the strain was named G. stearothermophilus H6. The 16S rRNA gene of G. stearothermophilus H6 presented the highest similarity with G. stearothermophilus B5 (99.97%), G. stearothermophilus D1 (99.97%), G. stearothermophilus DSM 458 (99.93%), G. stearothermophilus DG-1 (99.93%), G. stearothermophilus IFO 12550 (99.79%) and G. stearothermophilus 10 (99.78%). A phylogenetic analysis of the 16S rRNA gene with a tree constructed by the maximum likelihood (ML) method showed that G. stearothermophilus H6 was a member of the genus Geobacillus (Figure 1).

Figure 1. This tree shows the phylogenetic relationships between G. stearothermophilus H6 and closely related species. The GenBank accession number of each 16S rRNA gene sequence is shown in parentheses. Bootstrap percentages are based on 1000 replicates; values above 70% indicate that a branch is reliable, and values below 50% are not displayed.
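Similarity values such as 99.97% are percent identities between two aligned 16S rRNA gene sequences. A minimal sketch of that calculation is given below. It assumes that the sequences are already aligned and that columns containing a gap in either sequence are ignored; alignment tools differ in how they treat gaps, so this is an illustrative convention and not necessarily the one behind the reported values. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned, gap-free columns of two sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared
```

For a 16S rRNA gene of roughly 1500 bp, a single mismatch lowers the identity by about 0.07 percentage points, which is why values for strains of the same species differ only in the second decimal place.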

Comparative Genome Analysis of G. stearothermophilus H6
To compare G. stearothermophilus H6 with its five most closely related strains (G. stearothermophilus DSM 458, G. stearothermophilus 10, G. stearothermophilus DG-1, G. stearothermophilus D1 and G. stearothermophilus B5), the genomic characteristics of the six strains were compiled, and the results are shown in Table 1. The average nucleotide identity (ANI) value indicates the similarity between the sequences of the conserved regions of two genomes and allows the genetic relationships between them to be analyzed. According to the whole-genome information of these six strains, the ANI with the genome of G. stearothermophilus D1 was the highest (97.65%), showing good homology (Table 1). The genome sizes and GC contents of the six strains were similar, with genome sizes ranging from 2.97 to 3.65 Mb and GC contents from 51.66% to 52.6%.
Based on the 16S rRNA phylogenetic tree, the genomes of G. stearothermophilus B5 and G. stearothermophilus 10, which are closely related to G. stearothermophilus H6, were selected for synteny analysis. Mauve 2.4.0 was used to perform the genome synteny analysis and to check quickly whether large-segment sequence rearrangements existed between genomes. Blocks with similar colors represent highly homologous regions of the two genomes. Figure 2 shows that G. stearothermophilus H6 and G. stearothermophilus 10 had poor synteny, with many gene rearrangements, such as insertions, deletions, inversions and translocations, between them. For example, compared with G. stearothermophilus 10, there was a gene deletion at 1,177,537-1,516,081 bp in G. stearothermophilus H6, and an inversion occurred at 1,604,138-1,649,959 bp in G. stearothermophilus H6. G. stearothermophilus H6 presented good synteny with G. stearothermophilus B5, but there were also some gene rearrangements between them, such as deletions and inversions. For example, compared with G. stearothermophilus B5, a gene inversion occurred at 1,786,619-1,868,252 bp in G. stearothermophilus H6, and a deletion occurred at 2,464,988-2,718,550 bp in G. stearothermophilus H6 (Figure 2).

Detection of Protease Activity in G. stearothermophilus H6
The protease hydrolysis activity of G. stearothermophilus H6 was observed by the skimmed milk plate method and compared with that of B. velezensis, B. subtilis and E. coli BL21 (DE3). The four strains were cultured at both 37 °C and 60 °C, and the results showed that G. stearothermophilus H6 could produce transparent circles that became larger with increasing culture time (Figure 3). When cultured at 37 °C, B. velezensis, B. subtilis and G. stearothermophilus H6 produced transparent circles, with B. velezensis showing the strongest degradation. When cultured at 60 °C, G. stearothermophilus H6 produced the largest transparent circle, and the other three strains did not produce a transparent circle. Thus, G. stearothermophilus H6 produces extracellular proteases that remain active at high temperature.

Overview of the Genome Assembly and Whole Genome of G. stearothermophilus H6
Because G. stearothermophilus H6 tolerates high temperatures, its whole genome was sequenced to further explore the coding genes associated with this tolerance. Gene prediction was carried out with Prodigal v2.6.3, and a complete genome map was obtained through assembly. The genome sequence of G. stearothermophilus H6 was 3,054,993 bp in size, with an average GC content of 51.66%. It was predicted that there were 3750 coding genes (Table 2). Based on the genome information obtained by assembly and prediction, such as information on tRNAs, rRNAs, repeat sequences, GC contents and gene functions, Circos v0.66 was used to obtain the circular genome map (Figure 4).
Figure 4. Circular genome map of G. stearothermophilus H6. [...] The fourth circle shows repeat sequences; the fifth circle shows tRNAs (blue) and rRNAs (purple); the sixth circle shows the GC content. Straw-yellow regions have a GC content higher than the genome average, and blue regions have a GC content lower than the genome average; the higher the peak, the greater the difference from the average GC content. The innermost circle shows the GC skew: dark grey marks regions where the G content is greater than the C content, and red marks regions where the C content is greater than the G content.
The amino acid sequences of G. stearothermophilus H6 were compared with the Nr database, and the corresponding species information was obtained from the annotation database. BLAST searches of the protein sequences against the Nr database identified the most similar sequences, and the annotation of each best match was assigned to the corresponding gene in the genome. A total of 3699 genes were annotated.
BLAST comparisons of the protein-encoding gene sequences of the whole genome were also performed against the eggNOG database, which is frequently used to classify and annotate the genes of newly sequenced genomes. The annotation and classification information was assigned to the corresponding gene sequences of the sequenced genome. A total of 3186 genes were annotated in this database.
The amino acid sequences of G. stearothermophilus H6 were subjected to BLAST searches against the KEGG database, which collects biological pathways together with information on diseases, drugs and chemical substances. A total of 1798 genes were annotated in the KEGG database.
The prediction results were also annotated in the GO database. Genes were classified under the three highest-level functional nodes: cellular component, molecular function and biological process. A total of 2686 genes were annotated in this database (Figure 5).

Analysis of G. stearothermophilus H6 Protease
The predicted gene sequences were compared with eggNOG, GO, KEGG, Nr, Pfam, Swiss-Prot, TrEMBL and other general databases to obtain gene functional annotation results. Approximately 141 proteases were predicted, which accounted for approximately 3% of the total encoded proteins. The predicted proteases mainly consisted of serine proteases and metalloproteinases, with a few cysteine proteases, aspartic proteases and threonine proteases, accounting for 19%, 27%, 1%, 2% and 1% of the predicted proteases, respectively. The other proteases could not be characterized (Table 3). Secretory proteins are proteins secreted from the cells of living microorganisms. Through the prediction and analysis of signal peptides and secretory proteins in the genome, approximately 161 proteins with signal peptides, which could form secretory proteins, were identified. The predicted proteases included 18 secretory proteins with signal peptides, and the serine proteases and metalloproteinases included 5 secretory proteins with signal peptides.
The 18 secreted proteases were analyzed and compared with the proteases of the G. stearothermophilus B5 and G. stearothermophilus 10 genomes. Among them, GE000377 had no predicted homologous protein in G. stearothermophilus B5 but showed high homology with a protein of G. stearothermophilus 10 (99.24%); GE003445 had no predicted homologous protein in G. stearothermophilus 10 but had a homologous protein in G. stearothermophilus B5 (76.00%). The homologous proteins encoded by the GE002130, GE003446, GE003450 and GE003532 genes in the two homologous strains exhibited low similarity. These proteins may be encoded by genes unique to G. stearothermophilus H6. Among the 18 proteases with signal peptides, most of the signal peptides were predicted to be removed by the type I signal peptidase SP (Sec/SPI), and only the signal peptides of the GE000438 and GE002405 genes were predicted to be removed by the type II signal peptidase LIPO (Sec/SPII) (Table 4).
The prediction and analysis of transmembrane helix structures in the genome indicated that approximately 841 proteins had transmembrane helix structures. Among these proteins, 46 proteases had transmembrane helix structures, while serine proteases, metalloproteinases and aspartic proteases had 12, 16 and 3 transmembrane helix structures, respectively.

Construction of GS-SP1 Protein Expression Vector and Verification of Secreted Proteases
The GS-SP1 protease gene was cloned into the pET22b expression vector, which contains the pelB signal peptide upstream of the multiple cloning site. Then, the constructed pET22b/gs-sp1 plasmid was transformed into E. coli BL21 (DE3) (Figure 6a), and wild-type E. coli BL21 and E. coli BL21/pET22b were used as controls. Five microliters of bacterial liquid culture was spotted onto a skimmed milk plate, and culture was performed at 37 °C for 24 h to observe transparent circle development. The results showed that E. coli BL21/pET22b/gs-sp1 could produce a transparent circle when induced by 0.1 mM IPTG, while the other two strains could not (Figure 6b). These results indicated that the GS-SP1 protein showed protease activity, and further investigation of this protein will be important for the exploitation of thermophilic proteases.

Discussion
In this study, we isolated a strain of Geobacillus from hyperthermophilic compost and named it G. stearothermophilus H6. Thereafter, 16S rRNA sequence analysis and comparisons showed that the strain presented the highest consistency with G. stearothermophilus B5 (99.97%). The skimmed milk plate experiment showed that G. stearothermophilus H6 could produce extracellular proteases with enzyme activity at high temperature. Whole-genome sequencing and genome annotation analysis revealed that G. stearothermophilus H6 produces a variety of enzymes with biotechnological significance, such as proteases, amylases and lipases. Thus, it may be an important source of thermophilic enzymes and has important research value.
Geobacillus is a genus of thermophilic Gram-positive bacteria belonging to the Bacillaceae, including denitrifying bacteria, facultative anaerobes and obligate aerobes, which can grow at 45-80 °C [29]. The members of the genus can form endospores, which can spread through global atmospheric circulation [30], and are widely distributed in environments such as soil, hot springs, dairy plants and other food processing plants [8]. The chromosomes and plasmids of Geobacillus species exhibit significant genetic diversity. Bezuidt et al. [31] analyzed the pangenome of 29 genome sequences of Geobacillus spp. and found that the core genome was relatively small and mainly consisted of Bacillus-related genes, indicating that these bacteria originated from an ancestor of Bacillus. The pangenome also contained a large dispensable genome, which showed that Geobacillus spp. can achieve extensive genomic diversity through horizontal gene transfer, the key mechanism whereby Geobacillus spp. adapt to different environmental niches. For example, G. stearothermophilus obtained the lac operon through horizontal gene transfer, enabling it to survive in dairy products [32]. This feature provides a new way to produce thermostable enzymes for industrial use through thermoadaptation-directed enzyme evolution, thus expanding the biotechnological application of Geobacillus spp. For example, G. kaustophilus HTA426 producing thermostable variants of rRNA methyltransferase was generated through thermal-adaptation-directed evolution [33]. G. stearothermophilus H6 shows potential as a host for whole-cell applications and as a biological tool in evolutionary engineering.
The characteristics and distribution of proteases in the G. stearothermophilus H6 genome indicate that its proteases consist of serine proteases, metalloproteinases, cysteine proteases and aspartic proteases, in proportions similar to those of proteases with PDB entries across all Bacillus genomes [34,35]. The exploration of the proteases of this strain may support the discovery of new proteases with various potential industrial applications. By analyzing the proteases of the G. stearothermophilus H6 genome, we screened 18 proteases with signal peptides, selected the gs-sp1 gene for heterologous expression, and successfully expressed the protease in E. coli BL21. The gs-sp1 protease showed high homology with its counterparts in the homologous strains G. stearothermophilus B5 and G. stearothermophilus 10. In addition, G. stearothermophilus H6 had unique protease genes, such as the GE003532 gene, which had low similarity among homologous strains. The G. stearothermophilus H6 genome also contains a variety of other enzymes that may have high thermal stability and broad application prospects in biotechnology. The ability of thermophilic Geobacillus microorganisms to grow at high temperatures makes them a valuable resource for the development of new biotechnological applications [36]. They can be a source of many thermophilic enzymes, such as proteases, xylanases, amylases and lipases, and can be used for the synthesis of biofuels, such as bioethanol, isobutanol, biogas and biodiesel [37][38][39].
Currently, many species of Geobacillus are used to produce thermophilic enzymes, either naturally or through genetic engineering. Thermophilic enzymes are mainly used in biotechnology [40], including the food, detergent, leather and medical industries [41]. The proteases isolated from Geobacillus sp. are extremely heat-resistant and can be used to improve the biodegradation of sewage sludge [42]. The optimum conditions for Geobacillus sp. YMTC 1049 to produce serine protease are 85 °C and pH 7.5 [43]. Because reserves of natural fossil fuels are decreasing, biofuels are needed as alternative energy sources [8].
Geobacillus is used to biodegrade agricultural and industrial residues such as beet, soybean, barley, sugarcane, corn, sorghum and other biomass and to produce biofuels through modern processes [44]. G. stearothermophilus has been employed to produce bioethanol using sucrose as a carbon source at approximately 70 °C, and the product yield is the same as that of yeast [45]. When Geobacillus strain AT1 was added to methanogenic sludge, it effectively improved biogas production owing to its protease activity [46].
G. stearothermophilus is an important species of Geobacillus that can be employed as a source of various thermophilic enzymes and is widely used in a variety of biotechnology industries. The thermostable α-amylase produced by G. stearothermophilus SR74 can be used in the papermaking, food and other industries [15]. G. stearothermophilus strain RM is used for the mass production of α-glucosidase at high temperature [47]. G. stearothermophilus PS11 can produce a thermophilic lipase that is stable under high-temperature and alkaline conditions and is used for the production of biodiesel [17]. A protease cloned from G. stearothermophilus strain B-1172 has been used in the detergent industry and many other industries owing to its catalytic domain and good activity [20]. G. stearothermophilus H6, isolated from hyperthermophilic compost, can produce a protease with good activity at high temperature, which has broad application prospects in biotechnology.