Article

β-Glucosidases: In Silico Analysis of Physicochemical Properties and Domain Architecture Diversity Revealed by Metagenomic Technology

1
Institute of Biology, Vietnam Academy of Science and Technology (VAST), 18-Hoang Quoc Viet, Cau Giay, Hanoi 10072, Vietnam
2
School of Biotechnology, Graduate University of Science and Technology, Vietnam Academy of Science and Technology, 18-Hoang Quoc Viet, Cau Giay, Hanoi 10072, Vietnam
3
Faculty of Biology, University of Science, Vietnam National University, 334 Nguyen Trai, Hanoi 11406, Vietnam
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Diversity 2025, 17(11), 804; https://doi.org/10.3390/d17110804
Submission received: 9 September 2025 / Revised: 6 November 2025 / Accepted: 10 November 2025 / Published: 20 November 2025
(This article belongs to the Section Microbial Diversity and Culture Collections)

Abstract

β-Glucosidases, ubiquitous enzymes that contribute significantly to several industries, were previously identified as diverse in bacterial metagenomes from Vietnamese native goat rumens, wood humus from Cuc Phuong National Forest, and termite gut. In this study, we systematically analyzed their sequence diversity, domain architectures, domain arrangements, physicochemical properties, and producers in relation to their structures, restricting the analysis to sequences with conserved catalytic domains. A total of 833 β-glucosidase sequences were categorized into three families: GH1, GH16, and GH3, forming 30 distinct domain architectures with variable isoelectric points, alkaline scores, and melting temperatures across ecological niches. GH1 enzymes exhibited the lowest architectural diversity, whereas GH16 enzymes were frequently associated with carbohydrate-binding module 4 (CBM4) and CBM12 domains. Over 90% of GH3 enzymes contained fibronectin type III (FN3) and accessory domains such as PA14, CBM6, Big_2, or ExoP, with some harboring secondary catalytic domains. Most goat rumen β-glucosidases originated from cellulosome-producing bacteria. A recombinant β-glucosidase GH3-31 expressed in E. coli exhibited optimal activity at 40 °C (lower than the predicted Tm of 49.8 °C) and pH 5.5 (near the predicted pI of 5.61), with a Km of 1.37 ± 0.08 mM and a Vmax of 43.17 ± 0.6 U/mg. Its activity was enhanced by Tween 20, Tween 80, Triton X-100, and CTAB. These findings provide a comprehensive resource for β-glucosidase engineering and application-oriented screening.

1. Introduction

β-Glucosidase (EC 3.2.1.21, BGC) belongs to the large glycosyl hydrolase (GH) family of enzymes, which hydrolyze the glycosidic bonds of terminal non-reducing β-D-glucosyl residues. These enzymes act on a wide range of substrates, including cellobiose and other short-chain oligosaccharides, glycoconjugates such as glucosides, 1-O-glycosyl esters, and glycosidically bound volatiles, thereby releasing glucose and high-value chemicals and compounds [1,2]. In industry, β-glucosidase is a key enzyme in the final step of cellulose degradation within the cellulase complex, releasing glucose in biomass conversion, bioethanol production, and biorefinery processes. In the food and beverage industry, these enzymes are employed to hydrolyze β-glucosidic flavor precursors in plants to enhance flavor, reduce bitterness, and release aroma compounds [1,3,4]. In the pharmaceutical and medical fields, β-glucosidases participate in the synthesis of bioactive glycosides, the production of aglycones with therapeutic properties such as antioxidant, anti-inflammatory, and anticancer activities, and the improvement of drug solubility [3,5]. For example, β-glucosidases have been applied to convert ginsenoside precursors effectively and economically into minor ginsenosides, a group of bioactive compounds from Panax ginseng [6]. Through this activity, bacterial β-glucosidases contribute to natural carbon cycling and enhance nutrient availability in soils as well as in the digestive systems of humans, animals, and insects. Owing to their wide range of applications, particularly in key industries, β-glucosidases have attracted increasing research interest for application in both environmental sustainability and economic development. In fact, β-glucosidases represented a significant portion of the global enzyme market, accounting for 11% of the total in 2021 [7], with a value of 1042 million USD. This value is projected to reach 1412.5 million USD in 2025 and 2591.7 million USD in 2033 [8]. However, some limitations, including low stability, narrow substrate specificity, high production costs, and glucose sensitivity, remain barriers to the industrial application of these enzymes [7,9]. Moreover, most β-glucosidases function optimally at 40–70 °C and pH 4.5–5.0, whereas industrial processes often require more extreme conditions, such as biofuel production at temperatures above 80 °C or flavor enhancement in fruit juices at acidic pH 2.8–3.8 [10]. Therefore, the discovery of novel enzymes from diverse microbial sources, particularly gene pools from non-cultivated microbiota (accounting for over 99% of the total microbial population [11]), as well as natural enzyme structures for enzyme engineering, is essential to meet industrial demands.
Based on amino acid sequence similarity and conserved motifs, β-glucosidases listed in the Carbohydrate Active Enzyme database (CAZy, http://www.cazy.org/, accessed on 25 August 2025) are categorized into 12 GH families: GH1-GH5, GH16, GH30, GH39, GH116, GH131, GH175, and GH180. Among these, GH1 and GH3 β-glucosidases are the most prevalent and well studied, and both families contain glucose-tolerant members [12]. GH1 β-glucosidases offer several advantages for industrial applications, as they can hydrolyze a broad range of β-glucosidic linkages (β-1,2; β-1,3; β-1,4; β-1,6) in diverse plant-derived substrates and often display glucose tolerance, making them the most extensively characterized [1,2,4,5,6,13,14]. Structural studies show that GH1 β-glucosidases have a relatively simple architecture, consisting solely of the conserved GH1 catalytic domain that forms a narrow and deep substrate-binding pocket [10], without any accessory domains [15,16]. This structural simplicity, however, may restrict the size of substrates that can be accommodated. In contrast, GH3 β-glucosidases exhibit diverse domain architectures. They typically contain a fibronectin type III (FN3) domain downstream of the GH3C domain [17] and are often collocated with carbohydrate-binding modules (CBMs) such as CBM2, CBM6, CBM9, and CBM11 [18]. These accessory domains enhance the affinity of the enzymes for their substrates and thereby improve hydrolytic efficiency. Nonetheless, most GH3 β-glucosidases are glucose-sensitive and unstable [12], with a few rare exceptions [19].
Over the last two decades, metagenomics has emerged as one of the most powerful approaches for discovering not only novel β-glucosidases but also numerous other valuable proteins and enzymes [20]. For example, a glucose-tolerant GH3 β-glucosidase identified from soda lake metagenomic libraries was stimulated by 300 mM glucose, tolerated a high salt concentration of 1 M, and functioned optimally at pH 8.5, making it valuable for biomass degradation [21]. Other metagenome-derived β-glucosidases, such as those from sheep rumen [22] and hot springs [23], have been characterized for desirable properties, including glucose tolerance or multi-functionality [24,25].
With the aim of mining novel enzymes for biomass conversion, our previous studies employed Illumina shotgun sequencing to analyze bacterial metagenomes extracted from the gut of lower termite Coptotermes gestroi; rumens of native Co and Bach Thao goats fed a lignocellulose-rich diet; as well as humus derived from fungal-degraded woods in Cuc Phuong National Forest. The sequencing generated datasets of 5.4 Gb from termite gut [26], 8.4 Gb from goat rumens [27,28], 48.7 Gb from goat rumens [29], and 51.8 Gb from humus [30]. Briefly, in the metagenomic dataset obtained from the termite gut, low-quality reads (containing at least three “N” nucleotides) accounted for 1.64% of the total, while clean reads represented 96.7% of the raw reads. Approximately 40.1% of the reads were mapped to contigs of at least 500 bp, and a total of 125,431 genes were predicted, of which 37,545 were complete genes. About 80% of the predicted genes were annotated as bacterial in origin [26]. From the 8.4 Gb goat rumen dataset, 143 β-glucosidase-encoding sequences were previously analyzed for modular structure. However, due to database limitations, the diversity of domain architectures was underestimated [17], thus the sequences were excluded from this study. In the metagenomic datasets derived from bacteria in goat rumen (48.7 Gb) and humus (51.8 Gb), sequencing quality was high, with Q30 values of 94.59% and 94.49%, respectively. The mapping rates of reads to contigs were 64.22% and 62.81%, while bacterial genes accounted for 98.07% and 99.69%, respectively [29,30]. Based on KEGG annotations (set E-value of less than 10⁻⁵), a total of 503 genes from humus [30], 211 genes from termite gut [26], and 961 genes from goat rumen [29] were identified as β-glucosidase. While earlier studies provided overview pictures of lignocellulases in these metagenomic datasets, detailed analyses specifically focused on β-glucosidases were lacking.
For instance, β-glucosidases from wood humus were only briefly categorized into six main types—GH3-FN3, GH1, GH3, GH43, GH3-Exop_C and CE3 [30]—without in-depth analysis of their domain architectures, structural arrangements, or bacterial origins at genus level. Similarly, the domain architectures of 961 β-glucosidases from goat rumen were not fully characterized [29]. Across datasets, bacterial communities producing β-glucosidases were not resolved at the genus level, nor were their associations with GH families or domain architectures clarified to highlight potential enzyme producers. Furthermore, physicochemical properties predicted in silico were not comprehensively compiled or systematically analyzed. A particular limitation of prior analyses was the inclusion of incomplete genes because gene prediction often failed to capture full-length sequences.
In this study, all the genes were carefully validated to ensure completeness, with reliable catalytic domains confirmed before further analysis. Comprehensive investigations were conducted to generate a detailed overview of β-glucosidases—including GH families, domain architectures, domain rearrangements, pI, alkaline score, melting temperature (Tm), and producer taxonomy—in goat rumen, termite gut, and wood humus. These findings provide a systematic foundation for future research in enzyme engineering and β-glucosidase-producing bacterial screening for industrial applications.

2. Materials and Methods

2.1. Materials

The β-glucosidase datasets consisted of 503 sequences from wood humus [30], 211 genes from termite gut [26], and 961 genes from goat rumen [29]. Complete enzyme sequences were collected from these datasets by excluding sequences shorter than 200 amino acids or lacking conserved β-glucosidase catalytic domains, as examined by BLASTP 3.21 (https://www.ncbi.nlm.nih.gov/Structure/cdd/, accessed on 10 August 2025), Pfam version 38.0 (http://pfam.xfam.org/search, accessed on 14 August 2025), and HMM 3.0 (https://www.ebi.ac.uk/Tools/hmmer, accessed on 14 August 2025). A total of 522 sequences from the goat rumen dataset, 285 sequences from the forest humus dataset, and 26 sequences from termite gut were selected for in silico analysis of their physicochemical properties and modular architecture. All 833 β-glucosidase sequences from the three datasets are described in Table S1.
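The selection rule described above (a minimum length of 200 amino acids plus at least one conserved catalytic domain) can be illustrated with a minimal sketch. The `Candidate` record, the domain labels, and the example identifiers are hypothetical and serve only to show the logic; the actual screening relied on the domain search tools cited above.

```python
from dataclasses import dataclass

MIN_LENGTH = 200  # amino acids; threshold stated in Section 2.1
# Assumed labels for catalytic domains; real tool output may name them differently.
CATALYTIC_DOMAINS = {"GH1", "GH16", "GH3N", "GH3C"}


@dataclass
class Candidate:
    seq_id: str        # hypothetical identifier
    sequence: str      # amino acid sequence
    domains: tuple     # domain labels reported by a domain search


def is_complete(candidate: Candidate) -> bool:
    """Keep sequences of at least 200 residues that carry a catalytic domain."""
    long_enough = len(candidate.sequence) >= MIN_LENGTH
    has_catalytic = any(d in CATALYTIC_DOMAINS for d in candidate.domains)
    return long_enough and has_catalytic


def filter_complete(candidates):
    """Return only the candidates that pass both criteria."""
    return [c for c in candidates if is_complete(c)]


# Toy input: one sequence passes, one is too short, one lacks a catalytic domain.
examples = [
    Candidate("seq_a", "M" * 450, ("GH3N", "GH3C", "FN3")),
    Candidate("seq_b", "M" * 150, ("GH1",)),
    Candidate("seq_c", "M" * 300, ("FN3",)),
]
kept_ids = [c.seq_id for c in filter_complete(examples)]
```

In this toy input only `seq_a` is retained, mirroring how the 1675 annotated genes were reduced to 833 validated sequences.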

2.2. Phylogenetic Analysis of β-Glucosidase Sequences

The amino acid sequences of GH1, GH16, and GH3 β-glucosidases in the three datasets were separately aligned by GH family using MUSCLE with default parameters (https://www.ebi.ac.uk/jdispatcher/msa/muscle, accessed on 15 August 2025). All sequences were successfully aligned, and a .mas file was generated without missing data. The alignment file was converted from .mas to .meg format for phylogenetic tree analysis in MEGA5.2. Phylogenetic trees for GH1, GH16, and GH3 β-glucosidases were constructed using the neighbor-joining algorithms implemented in MEGA5.2 software with the maximum likelihood method. The reliability of branching was assessed by bootstrap resampling with 1000 pseudo-replicates, and all bootstrap values (up to 100%) were displayed to confirm the statistical robustness of the phylogenetic analysis. All sequence names were included. Phylogenetic trees of GH1 and GH16 β-glucosidases were visualized in circle format, with distance scales set to 1.0 for GH1 and 0.5 for GH16. A phylogenetic tree of GH3 β-glucosidases—representing the most diverse and abundant group—was visualized in linearized format, fitted to the display, and annotated with representative sequence names to facilitate interpretation of sequence sites related to derivative environmental sources.

2.3. In Silico Analysis of Physicochemical Properties and Domain Architectures of β-Glucosidases

To predict the probability that an enzyme is acidic or alkaline from its amino acid sequence, the AcalPred software (http://lin-group.cn/server/AcalPred, accessed on 16 August 2025) [31,32] was employed, which evaluates the input sequence against its training data. AcalPred applies a feature selection approach to extract informative features, which are then used to construct a prediction model based on the support vector machine (SVM) algorithm. The acidic and alkaline scores sum to one: an alkaline score of 1 indicates a 100% probability that the enzyme is alkaline, whereas an alkaline score of 0 indicates a 0% probability. AcalPred does not predict the specific pH range of enzyme activity; it only estimates whether the enzyme is more likely to function under alkaline or acidic conditions. The isoelectric point (pI) of the enzymes was calculated using the Sequence Manipulation Suite version 2 provided by the University of Alberta (https://www.bioinformatics.org/sms2/protein_iep.html; accessed on 17 August 2025) with monoisotopic resolution. Protein thermostability was predicted as the melting temperature (Tm), estimated by TBI deepStabP 1.1.0 (https://csb-deepstabp.bio.rptu.de/; accessed on 17 August 2025) with parameters set for enzymes in lysate and synthesized at 22 °C (enzymes from wood humus and termites) or 37 °C (enzymes from goat rumen). Functional domains within the enzyme sequences were identified using Pfam version 38.0 (http://pfam.xfam.org/search, accessed on 14 August 2025), HMM 3.0 (https://www.ebi.ac.uk/Tools/hmmer, accessed on 14 August 2025) [33], and the CAZy database [34], with all threshold E-values below 10⁻⁵.
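To make the pI calculation concrete, the following minimal sketch estimates a pI by finding the pH at which the Henderson–Hasselbalch net charge of a sequence is zero. The pKa values are an illustrative set chosen for this example and are an assumption; the web tool used in this study may apply different values, so results will not necessarily match Table S3.

```python
# Illustrative side-chain and terminal pKa values (assumed, not the tool's own set).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    positive = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    negative = 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            positive += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            negative += 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH 0-14; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged, so pI lies above mid
        else:
            high = mid
    return round((low + high) / 2.0, 2)


acidic_pi = isoelectric_point("DDDDEEEE")  # acidic toy peptide
basic_pi = isoelectric_point("KKKKRRRR")   # basic toy peptide
```

Because the AcalPred acidic and alkaline scores sum to one, the acidic score of an enzyme is simply one minus its alkaline score; for BGC-GH3-31 (alkaline score 0.85) this gives 0.15.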

2.4. Taxonomic Assignment

To identify the origins of the β-glucosidases, the gene datasets from the metagenomic DNA data were searched against the NCBI non-redundant protein (NR) database using the BLASTx version 2.12.0 algorithm with an E-value of less than 10⁻⁵ [26,29,30]. The protein datasets deduced from genes coding for β-glucosidase were retrieved and integrated with taxonomic outputs. The Metagenome Analyzer program (MEGAN, version 4.6), which provides an integrated platform suitable for analyzing metagenomic datasets of moderate size [35], was employed for taxonomic assignment based on the NCBI taxonomy, using the lowest common ancestor (LCA) algorithm. The LCA algorithm assigns proteins to taxa such that the taxonomic level of the assignment reflects the level of sequence conservation. A phylogenetic tree was constructed to illustrate the genetic relationships of the bacterial enzymes from humus, termite gut, and goat rumen. Taxonomic level correlation was visualized using Krona 2.8.1 (https://sourceforge.net/projects/krona/, accessed on 23 August 2025).
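The idea behind the LCA assignment can be sketched as follows. This is a simplified illustration that takes the lineages of all significant hits for one protein and returns their deepest shared prefix; MEGAN additionally applies score-based filters that are not modeled here, and the function name and example lineages are chosen for this sketch only.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of root-to-leaf taxonomic lineages."""
    if not lineages:
        return ()
    shared = []
    # zip stops at the shortest lineage, so partial lineages are handled.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return tuple(shared)


# Hits to two different genera of the same order are assigned at order level.
hits = [
    ("Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
     "Prevotellaceae", "Prevotella"),
    ("Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
     "Bacteroidaceae", "Bacteroides"),
]
assignment = lowest_common_ancestor(hits)
```

A protein whose hits all fall within one genus keeps a genus-level assignment, whereas a highly conserved protein with hits across phyla is assigned only at a high taxonomic rank.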

2.5. Purification and Characterization of β-Glucosidase GH3-31

With the dual aims of (1) producing a bifunctional enzyme for application and (2) testing the enzyme properties predicted in silico, the gene GL0036730 (3780 nucleotides, derived from goat rumen, Table S2) encoding a β-glucosidase GH3N-GH3C-FN3-GH31 (abbreviated as BGC-GH3-31, 1259 amino acids, Figure S1A,B) was codon optimized, artificially synthesized (Table S2), and cloned into the NcoI-XhoI sites of pET22b (+) for expression in E. coli BL21 Rosetta [36]. Regarding the first aim, BGC-GH3-31 contains two catalytic domains, GH3 and GH31, which theoretically enable it to act on both α- and β-glycosides in diverse carbohydrate substrates that have not yet been investigated. This domain architecture is novel, unique within the datasets, and has not been observed elsewhere. Regarding the second aim, in addition to this enzyme, another GH3 β-glucosidase (sequence code GL0050362 from humus) was also recombinantly produced in E. coli [37] and functionally characterized [38] for comparison. BGC-GH3-31 was predicted in silico to have a Tm of 49.8 °C, an alkaline score of 0.85, and a pI of 5.61. For BGC-GH3-31 expression, recombinant E. coli cells were cultured in LB medium (0.5% yeast extract, 1% peptone, 1% NaCl) and induced with 0.1 mM IPTG (isopropyl β-D-1-thiogalactopyranoside) at 37 °C for 5 h. The cells were harvested by centrifugation at 6000 rpm for 10 min at 4 °C and re-suspended in water to an optical density at 600 nm (OD600) of 10.
For enzyme purification, the cells were disrupted by sonication on ice (10 pulses, 30 s each at 100 W with 20 s intermission). The soluble fraction was collected after centrifugation at 13,000 rpm for 10 min at 4 °C. The expressed soluble proteins were applied to a 5 mL Ni-charged Sepharose Fast Flow column (HisTrap; GE Healthcare, Chicago, IL, USA) pre-equilibrated with 10 column volumes (CV) of 20 mM PBS buffer, pH 7.0 (274 mM NaCl, 5 mM KCl, 20 mM Na2HPO4·H2O, 3.5 mM KH2PO4), containing 20 mM imidazole. Contaminant proteins were removed by washing with 5 CV of the equilibration buffer followed by 10 CV of the same buffer containing 50 mM imidazole. The target enzyme was eluted with PBS buffer containing 300 mM imidazole. Elution fractions (2 mL each) were collected and analyzed by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) on a 12.6% polyacrylamide gel. The purified protein solution was supplemented with 10% glycerol and desalted using 50 kDa cutoff dialysis tubing (Spectra/Por, Repligen, Waltham, MA, USA) against 20 mM PBS buffer, pH 7.0, containing 5% glycerol. The purity of recombinant BGC-GH3-31 was assessed by SDS-PAGE and analyzed with Quantity One Software 4.6.8 (Bio-Rad, Hercules, CA, USA). Enzyme concentration was determined by the Bradford method [39]. The desalted enzyme was subsequently used for determination of β-glucosidase activity and characterization of its properties.
β-Glucosidase activity was measured as previously described [40]. Briefly, the assay mixture (200 µL) contained 20 µL of pure enzyme and 180 µL of 5 mM p-nitrophenyl-β-glucopyranoside (pNPG) (Sigma Chemical Co., Ronkonkoma, NY, USA) in 10 mM PBS pH 7. Negative controls were prepared by replacing the enzyme with PBS buffer. Reactions were carried out at 37 °C for 30 min then stopped by adding 800 µL of 0.2 mM Na2CO3 with thorough mixing. The released p-nitrophenol (pNP) was measured at 405–410 nm using p-nitrophenol (pNP, 1048, Sigma, St. Louis, MO, USA) as the standard. One unit of β-glucosidase activity was defined as the amount of enzyme required to hydrolyze pNPG to release 1 µmol of pNP per minute under the assay conditions. All measurements were performed in triplicate.
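The unit definition above translates into a short calculation, sketched here under stated assumptions: the pNP standard curve is taken to be linear, and the slope, absorbance, and protein concentration used in the example are hypothetical numbers for illustration, not measurements from this study.

```python
def pnp_umol_from_absorbance(absorbance, slope, intercept=0.0):
    """Convert absorbance to µmol pNP via a linear standard curve A = slope * µmol + intercept."""
    return (absorbance - intercept) / slope


def activity_units(pnp_umol, minutes):
    """One unit (U) releases 1 µmol of pNP per minute under the assay conditions."""
    return pnp_umol / minutes


def specific_activity(units, enzyme_volume_ml, protein_mg_per_ml):
    """Specific activity in U per mg of protein added to the assay."""
    return units / (enzyme_volume_ml * protein_mg_per_ml)


# Hypothetical example: blank-corrected A405 of 0.60, standard-curve slope of
# 2.0 absorbance units per µmol, 30 min reaction, 0.020 mL enzyme at 0.05 mg/mL.
released = pnp_umol_from_absorbance(0.60, slope=2.0)   # µmol pNP
units = activity_units(released, minutes=30)            # U in the assay
u_per_mg = specific_activity(units, 0.020, 0.05)        # U/mg
```

The negative control (enzyme replaced by PBS) would be subtracted from the sample absorbance before this conversion.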
The effects of incubation time, pH, temperature, glucose, various ions, and detergents on enzyme activity were evaluated. Incubation time was tested from 5 to 40 min at 5 min intervals; pH was varied from 3.5 to 8.0 at 0.5-unit intervals; temperature was tested at 30, 37, 40, 45, and 50 °C; and glucose dependence was assessed in the range of 0–50 mM. PBS buffer was used for pH profiling, with adjustments made using HCl or NaOH to the desired pH. Thermal stability was determined by pre-incubating the enzyme at 25 °C, 35 °C, and 40 °C at the optimal pH (pH 5.5) for 1–4 h; samples were withdrawn every hour to measure residual activity as described above. The effects of 10 mM metal ions (Na+, K+, Cu2+, Fe2+, Zn2+, Ni2+, Mg2+, Mn2+, Ca2+), urea, NaN3, EDTA, 2-mercaptoethanol, and 1% detergents (Tween 20, Tween 80, Triton X-100, CTAB, and SDS) on enzyme activity were also investigated. Substrate specificity of the enzyme was determined using 0.5% birchwood xylan (Sigma, Taufkirchen, Germany), 0.5% carboxymethyl cellulose (Sigma Chemical Co., Ronkonkoma, NY, USA), 0.1% pNP-β-D-xylopyranoside (Sigma, Ronkonkoma, NY, USA), and 2% filter paper in PBS buffer (pH 5.5) at 40 °C for 15 min, as previously described [29]. Enzyme kinetics were determined using pNPG concentrations ranging from 0.5 to 18 mM under optimal assay conditions.
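As an illustration of how Km and Vmax can be estimated from such a substrate series, the sketch below fits the Michaelis–Menten equation through its Hanes–Woolf linearization (S/v = S/Vmax + Km/Vmax). The fitting procedure is an assumption, since the method used in this study is not stated here, and the rates are synthetic, noise-free values generated from the reported parameters rather than experimental data.

```python
def michaelis_menten(substrate, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * substrate / (km + substrate)


def fit_hanes_woolf(substrates, rates):
    """Least-squares line through (S, S/v); slope = 1/Vmax, intercept = Km/Vmax."""
    xs = list(substrates)
    ys = [s / v for s, v in zip(substrates, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Hypothetical pNPG concentrations (mM) within the 0.5-18 mM range of the assay.
concentrations = [0.5, 1.0, 2.0, 4.0, 8.0, 12.0, 18.0]
# Synthetic rates (U/mg) from the reported Vmax = 43.17 and Km = 1.37 mM.
synthetic_rates = [michaelis_menten(s, 43.17, 1.37) for s in concentrations]
vmax_fit, km_fit = fit_hanes_woolf(concentrations, synthetic_rates)
```

With real, noisy measurements a nonlinear least-squares fit of the untransformed equation is generally preferred, because linearizations distort the error structure.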

2.6. Statistical Analysis

Statistical analyses were performed using SPSS version 20.0 (IBM, Armonk, NY, USA) to compare datasets of pI, Tm, and alkaline values among GH families and environmental sources including termite gut, goat rumen, and humus. Since sample sizes and variances between groups (assessed by the F-test) were unequal, the non-parametric Mann–Whitney U test was applied to compare differences between two groups. Correlations among pI, Tm, alkaline values, and amino acid sequence length were examined using the Pearson correlation coefficient and the χ2 test. Statistical significance was determined at p ≤ 0.05.
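For readers unfamiliar with the Mann–Whitney U test, the following minimal sketch computes the U statistic with average ranks for ties and a normal-approximation Z value. It omits the tie correction of the variance, so values may differ slightly from those reported by SPSS; the input values are toy numbers, not data from Table S3.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, Z): the smaller U statistic and its normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    u_b = n1 * n2 - u_a
    u = min(u_a, u_b)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # no tie correction
    return u, (u - mean_u) / sd_u


u_stat, z_value = mann_whitney_u([1, 2, 3], [4, 5, 6])      # fully separated groups
u_ties, _ = mann_whitney_u([1, 2], [2, 3])                  # one tied value
```

Because the smaller of the two U values is used, Z is zero or negative, which matches the sign convention of the Z values reported in Section 3.2.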

3. Results

3.1. Diversity of β-Glucosidase Sequences from Goat Rumen, Wood Humus, and Termite Gut

Although β-glucosidases are distributed across many GH families, including GH1-GH5, GH16, GH30, GH39, GH116, GH131, GH175, and GH180 (CAZy database, accessed on 25 August 2025), GH1 and GH3 are the most prevalent, occurring in archaea, bacteria, and eukaryotes. In this study, a total of 833 β-glucosidase sequences (Table S1) were classified into three GH families (GH1, GH16, and GH3) based on consistent annotations from the KEGG, CAZy, and HMM databases (Table S3). Among these, GH3 was the dominant family in both goat rumen and wood humus, accounting for 87.6% (458 sequences) and 73.9% (210 sequences) of 522 and 285 sequences, respectively. GH1 was the second most abundant family in humus (24.6%) but the least abundant family in goat rumen (5.5%). The 26 sequences from termite gut belonged to GH1 (61.5%) and GH3 (38.5%) (Table 1 and Table S3).
Phylogenetic analysis revealed high sequence diversity within each GH family. However, sequences derived from the same source exhibited greater homology and were grouped into distinct clusters in the phylogenetic trees (Figure 1).
In the GH1 tree (115 sequences) and the GH16 tree (40 sequences), sequences from goat rumen showed the greatest evolutionary variability, often forming basal lineages. GH1 sequences clustered into two major branches, one of which was further subdivided into two sub-branches (Figure 1A). GH16 sequences also formed two branches, but only one underwent extensive diversification (Figure 1B). GH3, comprising 678 sequences, clustered into two main branches: one containing a limited number of sequences and the other subdividing into two large branches, each containing diverse sequences from both humus and goat rumen (Figure 1C).
Although the phylogenetic trees of GH1, GH16, and GH3 exhibited different evolutionary patterns, sequences originating from the same source consistently grouped in corresponding clusters. This may reflect the presence of diverse bacterial populations producing β-glucosidase in each environment.

3.2. Physicochemical Properties of Bacterial β-Glucosidases from Wood Humus, Termite Gut and Goat Rumen

The isoelectric point (pI) of an enzyme is the pH at which its net surface charge is zero, and it plays a critical role in enzyme structure, function, and biotechnological applications. At the pI, enzyme solubility decreases, which can affect substrate binding, catalytic efficiency, and overall activity. Enzymes typically exhibit maximal activity at pH values near, but not identical to, their pI. For these reasons, the pI is an important property to investigate. Moreover, the pI can be readily calculated from the enzyme’s amino acid sequence.
The pI of β-glucosidases from goat rumen ranged from 4.00 to 9.65, while those from wood humus ranged from 4.38 to 9.67, and those from termite gut ranged from 4.44 to 6.41 (Table S3), with corresponding skewness values of 1.872, 0.167, and 0.311, respectively. The mean and median pI values of enzymes from wood humus were the highest (6.8) and differed significantly from those from goat rumen, which had an average pI of 5.5 (Mann–Whitney U = 27,655, Z = −14.77, p < 0.001), and from termite gut, with an average pI of 5.4 (Mann–Whitney U = 1233, Z = −5.63, p < 0.001). The mean pI values of enzymes from termite gut and goat rumen did not differ significantly (p = 0.54). The pI values of enzymes also varied by GH family. The mean pI of GH1 was the highest (6.2), differing significantly from GH3 (mean pI = 5.9; Mann–Whitney U = 31,520, Z = −3.3, p = 0.001) and from GH16 (mean pI = 5.3; Mann–Whitney U = 1036, Z = −5.2, p < 0.001). The mean pI value of GH3 also differed significantly from that of GH16 (Mann–Whitney U = 7826, Z = −4.5, p < 0.001). Within a given source, enzymes belonging to GH3 and GH1 showed similar pI values, whereas GH16 enzymes tended to have lower values. For example, in the goat rumen dataset, the mean pI of GH16 was 5.2, significantly lower than that of GH1 (5.6; Mann–Whitney U = 251, Z = −3.5, p = 0.001) and GH3 (5.5; Mann–Whitney U = 4756, Z = −4.0, p < 0.001) (Figure 2A).
The alkaline scores of enzymes from goat rumen and wood humus were relatively high, with median values of 0.8 and mean values of 0.7, significantly greater than those from termite gut (median 0.3; Mann–Whitney U = 2758 and 5113; Z = −2.2 and −2.1, respectively; p = 0.03), and also varied among GH families (Figure 2B). GH3 enzymes exhibited significantly higher alkaline scores (mean = 0.7) than GH1 enzymes (mean = 0.5; Mann–Whitney U = 22,405, Z = −7.3, p < 0.001). All GH16 enzymes were predicted to be acidic, with median and mean alkaline scores below 0.2, significantly lower than those of GH1 (Mann–Whitney U = 1116, Z = −4.8, p < 0.001) and GH3 (Mann–Whitney U = 2266, Z = −8.9, p < 0.001) (Figure 2B). However, in the metagenome datasets derived from goat rumen and humus samples, only 33 and 14 enzymes, respectively, were predicted to have an alkaline score > 0.99, indicating that the proportion of enzymes confidently predicted to function under alkaline conditions was low. The pI and alkaline score were independent variables and showed no correlation.
The Tm value is a fundamental characteristic of an enzyme, primarily determined by its amino acid sequence. At the melting temperature, half of the enzyme molecules are denatured, meaning that Tm directly reflects the enzyme’s thermostability. Sequence-based prediction of the Tm values of β-glucosidases in these datasets showed that the majority of enzymes from all three sources had Tm values of around 40–50 °C (Table S3). The mean Tm values of enzymes from goat rumen (47.7 °C), humus (47.0 °C), and termite gut (48.4 °C) did not differ significantly. Three enzymes from goat rumen and one from humus exhibited higher thermal stability, with Tm values of 60–70 °C or above 70 °C (Figure 2C,D). The Tm values of enzymes were independent of enzyme size. For example, two GH3 enzymes from goat rumen composed of 2253 and 2075 amino acids (AAs) were predicted to melt at 45 and 46 °C, respectively (Figure 2D). In contrast, the three enzymes with Tm values above 60 °C had smaller sizes of approximately 300–500 AAs (Figure 2D, Table S3). These thermally stable enzymes were also predicted to function in acidic environments, with alkaline scores of 0.1–0.3 (Figure 2E, Table S3).

3.3. Domain Architectures of β-Glucosidases from Goat Rumen, Wood Humus and Termite Gut

A total of 30 modular structures were identified among the 833 β-glucosidases belonging to three families (GH1, GH16, and GH3). GH1 exhibited the lowest diversity of domain architectures, consisting only of a catalytic domain with or without a signal peptide at the N-terminus (Table 1). Remarkably, GH1 β-glucosidases derived from goat rumen and termite gut lacked a signal peptide, suggesting that these enzymes were either intracellular or part of a cellulosome complex. In wood humus, 13 of the 70 GH1 enzymes (18.6%) contained a signal peptide, which would facilitate secretion. A signal peptide was detected in 65.0% of GH16 enzymes and 44.7% of GH3 enzymes (Table 1, Figure 3). The number of GH16 β-glucosidases detected in goat rumen was higher than that in wood humus.
Notably, a significant proportion of these enzymes (55.6%) contained CBM4 domains at the C-terminus. A GH16 enzyme from the goat rumen was also identified as a component of cellulosomal cellulase—a complex for effective degradation of celluloses—characterized by the presence of a dockerin_1 domain at the N-terminus after a signal peptide. In addition, one sequence from the humus dataset was found to harbor both a CBM32 and Por secretion system domain at the C-terminus (Table 1, Figure 3).
The most complex, diverse, and sophisticated architectures were observed in GH3 β-glucosidases, represented by 22 different modular structures (Table 1). Similar to GH16, 95% of GH3 enzymes from goat rumen, 98% of GH3 enzymes from wood humus, and 90% of GH3 enzymes from termite gut harbored at least one substrate-binding domain, namely FN3, exopC, PA14, CBM6, or Big2. Notably, the FN3 domain was consistently located downstream of GH3C, appearing in 89.5% of GH3 enzymes from goat rumen, 95.7% of GH3 enzymes from wood humus, and 90.0% from termite gut. The catalytic region of GH3 consists of two conserved domains, GH3N and GH3C, named according to their positions at the N- or C-terminus and arranged as GH3N-GH3C. However, in 17.9% of GH3 β-glucosidases from goat rumen and 1% from wood humus, GH3C was located at the N-terminus, generating a domain arrangement of GH3C-GH3N (Table 1, Figure 3). In some cases, a signal peptide was also found at the N-terminus of the enzymes. When GH3C was positioned at the N-terminus, the FN3 domain was located between GH3C and GH3N (Figure 3).
Although PA14 and CBM6 are independent functional domains, they were normally inserted within GH3C domains. A small proportion of GH3 β-glucosidases from goat rumen also contained a second catalytic domain, such as GH5 (cellulase), GH31, CE8 (pectinesterase), or GH43, with or without an upstream Big2 domain.
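The domain arrangements described in this section can be checked programmatically once each enzyme is written as an ordered list of domains. The sketch below is illustrative only: it assumes architectures are encoded as hyphen-separated strings read from the N- to the C-terminus, and the function names are hypothetical.

```python
def parse_architecture(architecture):
    """Split a hyphen-separated architecture string into ordered domain labels."""
    return [label.strip() for label in architecture.split("-") if label.strip()]


def gh3_arrangement(domains):
    """Classify the catalytic region as GH3N-GH3C or GH3C-GH3N (None if incomplete)."""
    if "GH3N" not in domains or "GH3C" not in domains:
        return None
    if domains.index("GH3N") < domains.index("GH3C"):
        return "GH3N-GH3C"
    return "GH3C-GH3N"


def fn3_downstream_of_gh3c(domains):
    """True when an FN3 domain occurs anywhere after the first GH3C domain."""
    if "GH3C" not in domains:
        return False
    return "FN3" in domains[domains.index("GH3C") + 1:]


# Architecture of BGC-GH3-31 as given in Section 2.5, and the rearranged form
# in which FN3 lies between GH3C and GH3N.
canonical = parse_architecture("GH3N-GH3C-FN3-GH31")
rearranged = parse_architecture("GH3C-FN3-GH3N")
```

Both example architectures place FN3 downstream of GH3C, but only the second has the rearranged GH3C-GH3N catalytic region.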

3.4. Diversity of Bacteria Producing β-Glucosidase

Bacterial communities producing β-glucosidases in goat rumen and termite gut were dominated by Bacteroidetes and Firmicutes (Figure 4A,C, Table S3), whereas in wood humus the dominant groups were Bacteroidetes, Proteobacteria, and Actinobacteria (Figure 4B, Table S3). Across all datasets, the most dominant producers of GH3 enzymes were Bacteroidetes, accounting for 49% of total genes in goat rumen, 43% in wood humus, and 23% in termite gut (Figure 4A–C). Although Bacteroidetes was abundant in goat rumen and termite gut, GH1 enzymes in these environments originated mainly from Firmicutes (Figure 4A,C). By contrast, in wood humus, Bacteroidetes contributed a significant proportion of GH1 enzymes (Figure 4B). These differences likely reflect variation at the genus and species level.
In goat rumen, the genus Prevotella contributed a substantial proportion (21%), producing mainly GH3 enzymes, followed by Bacteroides (Figure 4A,D, Table S3), whereas these genera were absent from the wood humus and termite gut communities (Figure 4B,C, Table S3). In wood humus, GH3-producing Bacteroidetes included Chryseobacterium, Flavobacterium, Pedobacter, Mucilaginibacter, and Arachidicoccus (Figure 4B, Table S3). Among Firmicutes, abundant GH3-producing bacteria included Butyrivibrio and Ruminococcus in goat rumen and Pseudomonas, Clostridia, and Pseudolactococcus in termite gut (Figure 4C). In humus, in addition to Sphingomonas (4%) and Stenotrophomonas (2%), a significant fraction of GH3-producing Proteobacteria could not be classified at the genus level (Figure 4B,F). Ruminococcus was the most abundant GH16 producer in goat rumen. GH1 β-glucosidases were produced by a small proportion of Butyrivibrio (1%), whereas in humus, diverse genera such as Enterobacter, Novosphingobium, and Pedobacter were the main GH1 producers (Figure 4, Table S3).
In terms of domain architecture, GH3N-GH3C enzymes were produced by a wide range of bacteria in both goat rumen and wood humus, whereas GH3C-GH3N enzymes were specifically produced by members of the order Clostridiales, including Butyrivibrio, Pseudobutyrivibrio, Eisenbergiella, Roseburia, and Lachnoclostridium (Figure 4E). Enzymes containing the GH3-PA14 domain in goat rumen were produced by Prevotella, while in wood humus they were produced by a diverse set of bacteria, including Chryseobacterium (20%), Flavobacterium (5%), Dyella (5%), Sulfitobacter (5%), and Bacteroidales (5%) (Figure 4G).

3.5. Purification and Characterization of β-Glucosidase GH3-31

β-Glucosidase GH3-31 (137 kDa) was overexpressed in E. coli in soluble form [36]. In this study, the enzyme was purified by His-tag affinity chromatography (Figure 5A). Washing with PBS buffer containing 50 mM imidazole removed a significant amount of protein contaminants. In the elution fractions E1 and E3, BGC-GH3-31 appeared relatively pure; however, in fraction E3, where the protein concentration was higher, some contaminating proteins smaller than 70 kDa were observed on the polyacrylamide gel (Figure 5A). Increasing the imidazole concentration in the washing buffer (70 mM, 100 mM) did not eliminate the contaminants but instead led to the release of the target protein. The β-glucosidase GH3-31 was unstable.
Docking analysis of the β-glucosidase GH3-31 model with some small substrates indicated the enzyme structure was stabilized in the presence of glycerol (Figure S1C). Therefore, after purification, glycerol was added to the enzyme solution at a final concentration of 10%. The enzyme was then desalted against PBS buffer containing 5% glycerol using a dialysis membrane with a 50 kDa molecular weight cut-off. After dialysis, the contaminants were partly removed from the purified BGC-GH3-31. On average, the enzyme purity was 96% (Figure 5B).
To evaluate the substrate specificity of the β-glucosidase GH3-31, the activity of the enzyme was tested against five substrates: pNPG, pNPX, birchwood xylan, carboxymethyl cellulose, and filter paper. Of these, only pNPG was hydrolyzed by the enzyme. This result confirmed that the enzyme annotated as a β-glucosidase by in silico analysis indeed exhibited β-glucosidase activity. The second catalytic domain, GH31, is predicted to hydrolyze α-1,4-glycosidic bonds in starch, maltose, or xylose. However, starch degradation has not been detected. It is also possible that GH31 contributes to enhancing the overall catalytic efficiency of the enzyme during the reaction process, but this hypothesis requires further investigation. pNPG was therefore selected for further characterization of enzyme BGC-GH3-31. Reaction time analysis revealed that the highest activity was achieved after 15 min of incubation (Figure 5C).
According to AcalPred (http://lin-group.cn/server/AcalPred, accessed on 16 August 2025) and the Tm predictor deepStabP 1.1.0, the β-glucosidase had a predicted Tm of 49.8 °C, pI of 5.61, and an alkaline score of 0.85 (suggesting activity at neutral to mildly alkaline pH). Experimental results demonstrated that the recombinant β-glucosidase was not an alkaline enzyme, because the optimal pH for enzyme action was 5.0–5.5. This optimal pH was close to the enzyme’s pI. At pH 6.5, the enzyme retained 41.5% of its activity compared with that at the optimal pH (Figure 5D).
The optimal temperature for enzyme activity was 40 °C. Enzyme activity declined rapidly when the temperature increased from 45 °C to 50 °C. At 45 °C, 59.5% of enzyme activity was retained, but this dropped to 38.1% at 50 °C (Figure 5E). Overall, the enzyme was assessed as unstable. Even in the presence of glycerol, nearly 50% of activity was lost during incubation at 25, 35, and 40 °C. After 2 h at 40 °C, the enzyme was completely inactivated, and after 4 h at 25 °C, 90% of activity was lost (Figure 5F). An optimal temperature close to the Tm suggests that at the optimal temperature (40 °C), thermal motion likely loosens interdomain contacts, allowing the two catalytic domains to move more freely, thereby enhancing enzyme–substrate interactions and allowing the enzyme to function more efficiently. At temperatures below 40 °C, the catalytic region of the enzyme may be obscured, leading to reduced activity. In aqueous solution without substrate, these interdomain contacts may remain rigid, promoting aggregation and resulting in a rapid loss of enzymatic activity. Highly purified proteins are particularly susceptible to such aggregation if suitable stabilizing agents are not included. Many factors have been considered to enhance enzyme stability, such as pH, mechanical stress (e.g., shaking), and the use of metal ions that can increase enzyme activity; however, we have not yet identified a suitable method.
The recombinant enzyme was tolerant to K+, Mg2+, Na+, Ca2+, urea, EDTA, and 2-mercaptoethanol at a concentration of 10 mM. It was activated by the detergents Tween 20, Tween 80, Triton X-100, and CTAB at 1%, but its activity decreased in the presence of Cu2+, Fe2+, Zn2+, Ni2+, and NaN3 at 10 mM and was completely abolished by SDS at 1% (Figure 5G). These results indicated that the enzyme structure was unstable, being sensitive to temperature and to agents that disrupt non-covalent bonds. However, the effects were concentration-dependent. NaCl concentrations ranging from 0 to 50 mM did not affect enzyme activity, but concentrations of 100 mM or higher caused a sharp reduction (Figure 5H). Triton X-100 at concentrations of 0.3–3 mM enhanced enzyme activity, whereas higher concentrations reduced it (Figure 5I). Tween and Triton X-100 are surfactants that play important roles in preventing aggregation of hydrophobic patches, particularly between the two domains, thereby improving enzyme solubility and enhancing catalytic activity. This provides additional evidence that enzyme aggregation likely leads to a rapid loss of activity in aqueous solution.
β-Glucosidases are typically inhibited by the accumulation of glucose, the product released during catalysis. BGC-GH3-31 was strongly inhibited by glucose: at 2 mM, 48% of its activity was lost (Figure 5J). Enzyme activity decreased progressively with increasing glucose concentrations, and at 50 mM glucose, activity was completely abolished (Figure 5J). Product inhibition may therefore be one reason for the rapid decline in enzyme activity. Under optimal reaction conditions (PBS buffer at pH 5.5 containing 1% Triton X-100, 40 °C, 15 min), the kinetic parameters of BGC-GH3-31 toward pNPG were determined, giving a Km of 1.37 ± 0.08 mM and a Vmax of 43.17 ± 0.6 U/mg.
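To illustrate what the reported Km and Vmax imply, the sketch below evaluates the standard Michaelis–Menten rate law, v = Vmax·[S]/(Km + [S]), at a few pNPG concentrations. It assumes simple Michaelis–Menten behavior and does not model the glucose inhibition described above; the function name and the example substrate concentrations are illustrative assumptions, while the two kinetic constants are the values reported for BGC-GH3-31.

```python
# Kinetic constants reported for BGC-GH3-31 toward pNPG.
KM_MM = 1.37           # Michaelis constant, mM
VMAX_U_PER_MG = 43.17  # maximum velocity, U/mg


def michaelis_menten_rate(substrate_mm: float,
                          km_mm: float = KM_MM,
                          vmax: float = VMAX_U_PER_MG) -> float:
    """Return the initial rate (U/mg) at a given substrate concentration (mM).

    Assumes uninhibited Michaelis-Menten kinetics; product (glucose)
    inhibition is deliberately not included in this sketch.
    """
    if substrate_mm < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_mm / (km_mm + substrate_mm)


# Example concentrations (mM) chosen for illustration only.
for s in (0.5, KM_MM, 5.0, 20.0):
    rate = michaelis_menten_rate(s)
    print(f"[S] = {s:5.2f} mM -> v = {rate:6.2f} U/mg "
          f"({100 * rate / VMAX_U_PER_MG:.0f}% of Vmax)")
```

By definition, the rate at [S] = Km is half of Vmax, so a pNPG concentration of 1.37 mM corresponds to roughly 21.6 U/mg under these assumptions.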

4. Discussion

β-Glucosidases are essential enzymes catalyzing the hydrolysis of glycosidic bonds in carbohydrates to release glucose. These enzymes play a critical role in biomass degradation, biofuel production, and various biotechnological applications. β-Glucosidases exhibit remarkable sequence diversity, reflecting their widespread occurrence across bacteria, fungi, plants, and animals. While conserved catalytic motifs ensure the hydrolysis of β-glycosidic bonds, sequence variability enables adaptation to diverse ecological environments. In this study, sequence evolution appeared to be shaped by the bacterial community in specific habitats; thus, within a given GH family or phylogenetic sub-branch, β-glucosidase sequences from the same environments (e.g., wood humus, termite gut or goat rumen) tend to cluster together (Figure 1).
β-Glucosidases have been classified by GH family, substrate specificity, or response to glucose. Bacterial β-glucosidases display extensive sequence diversity, spanning multiple GH families (GH1-GH5, GH16, GH30, GH39, GH116, GH131, GH175, and GH180 according to the CAZy database, accessed on 25 August 2025), with GH1 and GH3 being predominant. In this study, GH3 and GH1 β-glucosidases accounted for 93.1, 98.6, and 100% of β-glucosidase sequences in goat rumen, wood humus, and termite gut, respectively. Normally, GH16 β-glucosidases are less abundant in nature. Remarkably, GH16 β-glucosidases represented the second most abundant family in the goat rumen, contributing 6.9% of the sequences, higher than GH1, which accounted for 5.5%. These enzymes frequently contained CBM4/CBM32 domains and a dockerin-1 module, indicating their integration into cellulosomes. A significant proportion of GH16 β-glucosidases originated from the genus Ruminococcus (Figure 2). Cellulosomes are multi-enzyme complexes specialized for efficient plant cell wall degradation in goat rumen and are also found in the digestive tract of other ruminants such as cattle and sheep [41]. They are assembled from numerous enzymes linked together by dockerin–cohesin interactions and anchored to a non-catalytic scaffoldin. Within the cellulosome, enzymes act synergistically to degrade complex polysaccharides, thereby enhancing energy acquisition for the host [42]. The presence of CBM4 in GH16 β-glucosidases increases enzyme affinity for substrates, thus enhancing the overall catalytic activity of the system. CBM4 is well known as a prevalent module in lignocellulases, particularly in cellulosomal cellulases from bacteria such as Clostridium thermocellum and Ruminiclostridium cellulolyticum [43]. Notably, GH16 enzymes were acidic (the median alkaline score was 0.07 in goat rumen and 0.12 in wood humus) while GH1 and GH3 enzymes were not, so GH16 enzymes may be valuable to fruit-processing industries.
GH1 enzymes are typically single-domain and highly glucose-tolerant [13,14]. This property makes GH1 enzymes advantageous for application. In general, based on their sensitivity to glucose, β-glucosidases are divided into four classes: (i) strongly inhibited by low concentrations of glucose; (ii) glucose-tolerant; (iii) stimulated by low but inhibited by high glucose concentrations; (iv) uninhibited at high glucose concentrations [12]. Most GH3 β-glucosidases are strongly inhibited by low glucose concentrations, with the exception of the enzyme from Mucor circinelloides [44]. In contrast, the majority of GH1 β-glucosidases are glucose-tolerant, although some GH1 enzymes are also inhibited by glucose [12]. Furthermore, GH1 β-glucosidases are predominant in the group of enzymes that are stimulated by low glucose concentrations but inhibited at high concentrations, depending on the glucose-binding site within the enzyme's active site [45]. In this study, the GH1 β-glucosidase identified from the goat rumen exhibited a simple domain structure without accessory modules and originated from Butyrivibrio. This genus also produced approximately 4% of GH3 β-glucosidases, especially those with the GH3C–GH3N architecture. A previous study reported Prevotella as a crucial genus for lignocellulose degradation in goat rumen [29]. In addition to Prevotella, our analysis of β-glucosidase domain architecture highlighted the significant roles of Ruminococcus and Butyrivibrio in cellulose degradation within the rumen ecosystem. In contrast, GH1 β-glucosidases in wood humus were produced by diverse genera such as Enterobacter, Novosphingobium, and Pedobacter. These genera are not known to encode cellulosomal systems, reflecting a fundamental difference between lignocellulose-degrading bacterial communities in ruminants and those in other environments.
GH3 β-glucosidases exhibited highly diverse domain architectures; however, more than 90% of these enzymes contain accessory domains such as FN3, PA14, CBM6, Big_2, or ExoP (Table 1). These domains expand substrate-binding capacity and can influence enzyme stability [14,46,47]. The Big_2 domain is a structural motif characterized by an immunoglobulin-like (Ig-like) fold, often collocated with GH9 endo-glucanases [17]. In this study, the presence of different domain architectures did not correlate with the Tm values or alkaline stability of the enzymes, despite their predicted origins from diverse sources. In goat rumen, a strongly cellulose-degrading environment with diverse β-glucosidases, some GH3 enzymes were found to have second catalytic domains such as CE8, GH5, and GH31.
In addition to the diversity of catalytic domains, enzyme families, and domain architectures, the physicochemical properties of β-glucosidases (including pH, Tm, and pI values) have been major subjects of investigation. These properties reflect enzyme adaptation to specific ecological niches. For example, the goat rumen maintains a temperature of 38–41 °C and a pH of 5.9–6.3, depending on diet [48,49]. Accordingly, the Tm values of GH1, GH3, and GH16 β-glucosidases from the goat rumen were higher than those from wood humus and termite gut environments, where the ambient temperature is typically below 37 °C. The pI values of rumen-derived enzymes ranged from 4.8 to 5.4, lower than those of enzymes from wood humus (5.7–6.7), which corresponds to an environmental pH of 6.9–7.3 [30]. Notably, the pI values of all enzymes were lower than the pH of their respective environments, although they varied among GH families.
Predicted active pH values, based on alkaline scores, further revealed that all GH16 β-glucosidases were acidic. GH16 β-glucosidases were the most acidic followed by GH1. Since GH16 β-glucosidases represent a rare subset of GH16 cellulases, they have been only infrequently reported. By contrast, GH3 enzymes from wood humus and goat rumen were annotated to function at neutral to alkaline pH (Figure 2). Previous studies have shown that bacterial GH1 β-glucosidases generally favor neutral to slightly acidic conditions [5,50,51], consistent with our findings that GH1 enzymes exhibit alkaline scores ranging from 0.2 to 0.8. Importantly, GH1 β-glucosidases with glucose tolerance present significant application advantages in the feed industry compared with GH3 β-glucosidases [5].
β-Glucosidase GH3-31 was successfully expressed in soluble form in E. coli Rosetta [36]. In this study, the recombinant enzyme was purified to 96% purity. Compared with the in silico predictions, the enzyme exhibited an optimal temperature of 40 °C, against a predicted Tm of 49.8 °C. Its optimal pH was 5.0–5.5, close to the predicted pI of 5.61, although the alkaline score was 0.85. For comparison, a GH3 β-glucosidase (sequence code GL0050362) from the humus metagenomic dataset was selected for recombinant production in E. coli using the pET22b (+) vector [37]. That enzyme consists of 849 amino acids and was predicted to have a pI of 6.1, an alkaline score of 0.5, and a Tm of 52.9 °C [38]. It exhibited the highest activity at 37 °C and pH 6.0 and remained stable at 37 °C for 12 h. Approximately 50% of its activity was lost after incubation at 45 °C for 3 h or 50 °C for 1 h [38]. Consistent with the findings of this study, its optimal pH was close to its predicted pI and corresponded well with the calculated alkaline score of 0.5. In addition, in a previous study, a bacterial endo-xylanase from the goat rumen with a predicted pI of 6.27 and Tm of 49.8 °C was overexpressed in E. coli and exhibited an optimal pH of 5.5–6.5 (close to the pI) and an optimal temperature of 40–50 °C (corresponding to the predicted melting temperature) [29]. Talley and Alexov compared the experimentally determined optimal pH with the pI of 67 different enzymes; they showed that 44.8% of enzymes had an optimal pH at least one unit higher than the pI, 28.4% had an optimal pH close to the pI (within −1 to 1 unit), and 26.9% had an optimal pH at least one unit lower than the pI [52]. Taken together, these results further support the view that in silico predictions of Tm, pI, and alkaline scores provide valuable guidance for sequence-based screening of enzymes for desired properties.
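The three-way comparison of optimal pH against pI can be expressed as a small classification rule. The sketch below applies the one-unit window quoted above from Talley and Alexov to the three enzymes discussed in this paragraph; the function name, the use of range midpoints for enzymes with a reported pH range, and the handling of values exactly one unit away are assumptions made for illustration, not part of the cited analysis.

```python
def classify_ph_vs_pi(optimal_ph: float, pi: float, window: float = 1.0) -> str:
    """Compare an enzyme's optimal pH with its isoelectric point.

    Returns "higher" if optimal pH is at least `window` units above the pI,
    "lower" if it is at least `window` units below, and "close" otherwise.
    Treating exactly +/- `window` as outside the "close" bin is an assumption.
    """
    diff = optimal_ph - pi
    if diff >= window:
        return "higher"
    if diff <= -window:
        return "lower"
    return "close"


# (name, optimal pH, predicted pI); ranges are reduced to their midpoints,
# which is a simplification introduced here.
EXAMPLES = [
    ("GH3-31 (goat rumen)", (5.0 + 5.5) / 2, 5.61),
    ("GH3 GL0050362 (humus)", 6.0, 6.1),
    ("endo-xylanase (goat rumen)", (5.5 + 6.5) / 2, 6.27),
]

for name, optimal_ph, pi in EXAMPLES:
    print(f"{name}: pH_opt - pI = {optimal_ph - pi:+.2f} -> "
          f"{classify_ph_vs_pi(optimal_ph, pi)}")
```

Under these assumptions all three enzymes fall in the "close" bin, which matches the qualitative statement in the text but rests on only three data points.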
β-Glucosidase GH3-31 was unstable, sensitive to SDS but activated in a concentration-dependent manner by the detergents Tween and Triton X-100, and inhibited by low concentrations of glucose and by Ni2+, Zn2+, Fe2+, and Cu2+. BGC-GH3-31 exhibited a Km of 1.37 ± 0.08 mM and a Vmax of 43.17 ± 0.6 U/mg toward pNPG, within the range of Km values reported for other prevalent β-glucosidases, from 0.31 mM to 1.87 mM [53]. For example, the β-glucosidase from Clostridium saccharolyticus has a Km of 0.67 mM [54]. The glucose-tolerant enzyme Bgl1A shows Km and Vmax values of 0.39 mM and 18 µmol/min, respectively, while the glucose-tolerant β-glucosidase PersiBGL1 has Km and Vmax of 1.25 mM and 1.55 mM/min, respectively [22]. In addition, an enzyme derived from an unculturable bacterium exhibited Km and Vmax values of 0.39 mM and 50.7 U/mg, respectively; an enzyme originating from Daldinia eschscholzii exhibited Km and Vmax values of 1.52 mM and 3.21 U/mg, respectively; and an enzyme from Scytalidium thermophilum displayed Km and Vmax values of 0.29 mM and 13.27 U/mg, respectively [40].
Study Limitations and Future Research Directions.
Although the manuscript has analyzed the sequence diversity and several characteristics of β-glucosidases derived from three microbial systems (termite gut, goat rumen, and humus), we acknowledge that this study still has some limitations.
Firstly, the metagenomic DNA dataset from termite gut bacteria is relatively small (5.4 Gb of clean reads) with low coverage, and only about 29.9% of genes are complete. Therefore, the number of β-glucosidase-encoding genes identified from termite samples may not be fully representative. Despite the limited dataset (26 complete gene sequences), the results revealed that termite gut bacteria contain a high proportion of GH1 β-glucosidases (61.5% of total genes), whereas GH3 β-glucosidases predominate in goat rumen and humus. Given the importance of GH1 β-glucosidases, further in-depth studies on enzymes from this microbial system are needed to provide a more precise scientific basis for future applications.
Secondly, the validation of enzyme properties predicted by in silico tools was conducted on only a small subset of enzymes: two β-glucosidases and one xylanase, of which one β-glucosidase and the xylanase had already been expressed and characterized in earlier reports. While these enzymes exhibited optimal pH values close to their respective pI values and optimal temperatures roughly corresponding to their predicted melting temperatures, the predictive accuracy of these tools remains to be confirmed. Further evaluation across a larger set of enzymes is needed to establish more robust and generalizable conclusions.
Thirdly, despite considerable efforts, we have not yet identified conditions to enhance enzyme stability, even though the enzyme exhibits a high reaction rate. The role of the GH31 region has not yet been elucidated. These issues warrant further investigation to fully exploit the enzyme’s potential advantages.

5. Conclusions

β-Glucosidases from goat rumen, wood humus, and termite gut exhibited diverse sequences and domain architectures. A total of 833 β-glucosidase sequences were categorized into three families, GH1, GH16, and GH3, forming 30 domain architectures with distinct isoelectric points, alkaline scores, and melting temperatures across ecological niches. GH3 was the most dominant family in both goat rumen and wood humus (87.6% and 73.9%, respectively), encompassing 22 domain architectures, whereas GH1 displayed minimal diversity with only two architectures. The enzymes’ pI, alkaline scores, domain arrangements, and accessory domains reflected family-specific characteristics, with GH16 enzymes generally acidic and GH3 enzymes more alkaline. A recombinant β-glucosidase GH3-31 expressed in E. coli exhibited optimal activity at 40 °C (below the predicted Tm of 49.8 °C) and pH 5.5 (near the predicted pI of 5.61), with a Km of 1.37 ± 0.08 mM and a Vmax of 43.17 ± 0.6 U/mg. Its activity was stimulated by Tween 20, Tween 80, Triton X-100, and CTAB, was tolerant to K+, Mg2+, Na+, Ca2+, urea, EDTA, and 2-mercaptoethanol, and was inhibited by glucose. This study provides insights for enzyme engineering, structural modification, enzyme screening orientation, and the isolation of potential β-glucosidase-producing bacteria for industrial applications.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/d17110804/s1, Figure S1: The structure of beta-glucosidase GH3-31 based on sequence analysis. (A) Domain architecture and arrangement detected by Pfam; (B) The tertiary structure of the β-glucosidase was predicted by Swiss model; (C) Structure of glycerol binding site of the enzyme. Table S1: A total of 833 beta-glucosidase sequences were identified from bacterial metagenomes of goat rumen, wood humus and termite gut; Table S2: Sequences of original and optimized beta-glucosidase GH3-31; Table S3: Physiochemical properties, GH families and original taxonomy of 833 beta-glucosidase from bacterial metagenomes of goat rumen, wood humus and termite gut.

Author Contributions

Conceptualization, T.H.D. and T.Q.N.; methodology, T.H.D., T.K.D. and T.Q.N.; software, T.H.D., N.G.L., N.T.D. and H.D.N.; validation, T.H.D., H.D.N. and T.K.D.; formal analysis, T.H.D. and T.Q.N.; investigation, T.Q.N.; resources, T.Q.N. and T.K.D.; data curation, T.H.D. and T.Q.N.; writing—original draft preparation, T.Q.N. and T.H.D.; writing—review and editing, T.H.D., T.Q.N. and T.K.D.; visualization, T.H.D.; supervision, T.H.D. and N.H.T.; project administration, T.H.D.; funding acquisition, T.H.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Vietnam Academy of Science and Technology, grant number NVCC08.05/24-25.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Acknowledgments

We thank the National Key Laboratory of Gene Technology, Institute of Biotechnology, VAST, Vietnam for use of their facilities.

Conflicts of Interest

We certify that there is no conflict of interest with any financial organization regarding the material discussed in the manuscript. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
GH: Glycosyl hydrolase
CAZy: Carbohydrate-Active enZYmes
KEGG: Kyoto Encyclopedia of Genes and Genomes
FN3: fibronectin type III
CBM: carbohydrate-binding module
pI: isoelectric point
Tm: melting temperature
MEGAN: The Metagenome Analyzer program
NR: NCBI non-redundant protein
BGC-GH3-31: Beta-glucosidase carried domains GH3 and GH31

References

  1. Godse, R.; Fernandes, J.M.; Kulkarni, R. Characterization of β-Glucosidase Activity of a Lactiplantibacillus plantarum 6-Phospho-β-Glucosidase. Appl. Microbiol. Biotechnol. 2025, 109, 86.
  2. Paventi, G.; Di Martino, C.; Coppola, F.; Iorizzo, M. β-Glucosidase Activity of Lactiplantibacillus plantarum: A Key Player in Food Fermentation and Human Health. Foods 2025, 14, 1451.
  3. Ketudat Cairns, J.R.; Esen, A. β-Glucosidases. Cell Mol. Life Sci. 2010, 67, 3389–3405.
  4. Muradova, M.; Proskura, A.; Canon, F.; Aleksandrova, I.; Schwartz, M.; Heydel, J.M.; Baranenko, D.; Nadtochii, L.; Neiers, F. Unlocking Flavor Potential Using Microbial β-Glucosidases in Food Processing. Foods 2023, 12, 4484.
  5. Cao, H.; Zhang, Y.; Shi, P.; Ma, R.; Yang, H.; Xia, W.; Cui, Y.; Luo, H.; Bai, Y.; Yao, B. A Highly Glucose-Tolerant GH1 β-Glucosidase with Greater Conversion Rate of Soybean Isoflavones in Monogastric Animals. J. Ind. Microbiol. Biotechnol. 2018, 45, 369–378.
  6. Tran, T.N.A.; Son, J.S.; Awais, M.; Ko, J.H.; Yang, D.C.; Jung, S.K. β-Glucosidase and Its Application in Bioconversion of Ginsenosides in Panax ginseng. Bioengineering 2023, 10, 484.
  7. Magwaza, B.; Amobonye, A.; Pillai, S. Microbial β-Glucosidases: Recent Advances and Applications. Biochimie 2024, 225, 49–67.
  8. Ashwini, M. Beta Glucosidase Market Report 2025 (Global Edition). 2025. Available online: https://www.cognitivemarketresearch.com/beta-glucosidase-market-report (accessed on 9 November 2025).
  9. Singh, N.; Sithole, B.; Kumar, A.; Govinden, R. A Glucose Tolerant β-Glucosidase from a Newly Isolated Neofusicoccum parvum Strain F7: Production, Purification, and Characterization. Sci. Rep. 2023, 13, 5134.
  10. Ouyang, B.; Wang, G.; Zhang, N.; Zuo, J.; Huang, Y.; Zhao, X. Recent Advances in β-Glucosidase Sequence and Structure Engineering: A Brief Review. Molecules 2023, 28, 4990.
  11. Solden, L.; Lloyd, K.; Wrighton, K. The Bright Side of Microbial Dark Matter: Lessons Learned from the Uncultivated Majority. Curr. Opin. Microbiol. 2016, 31, 217–226.
  12. Salgado, J.C.S.; Meleiro, L.P.; Carli, S.; Ward, R.J. Glucose Tolerant and Glucose Stimulated β-Glucosidases – A Review. Bioresour. Technol. 2018, 267, 704–713.
  13. Kaçıran, A.; Şahinkaya, M.; Çolak, D.N.; Zada, N.S.; Kaçağan, M.; Güler, H.İ.; Saygın, H.; Beldüz, A.O. Biochemical Characterization of a Novel, Glucose-Tolerant β-Glucosidase from Jiangella ureilytica KC603, and Determination of Resveratrol Production Capacity from Polydatin. Appl. Biochem. Biotechnol. 2025, 197, 5104–5130.
  14. Tiwari, R.; Kumar, K.; Singh, S.; Nain, L.; Shukla, P. Molecular Detection and Environment-Specific Diversity of Glycosyl Hydrolase Family 1 β-Glucosidase in Different Habitats. Front. Microbiol. 2016, 7, 1597.
  15. Mao, G.; Song, M.; Li, H.; Lin, J.; Wang, K.; Liu, Q.; Su, Z.; Zhang, H.; Su, L.; Xie, H.; et al. Biochemical and Structural Characterization of a Highly Glucose-Tolerant β-Glucosidase from the Termite Reticulitermes perilucifugus. Int. J. Mol. Sci. 2025, 26, 3118.
  16. Dong, S.; Liu, Y.J.; Zhou, H.; Xiao, Y.; Xu, J.; Cui, Q.; Wang, X.; Feng, Y. Structural Insight into a GH1 β-Glucosidase from the Oleaginous Microalga, Nannochloropsis Oceanica. Int. J. Biol. Macromol. 2021, 170, 196–206.
  17. Nguyen, K.H.V.; Dao, T.K.; Nguyen, H.D.; Nguyen, K.H.; Nguyen, T.Q.; Nguyen, T.T.; Nguyen, T.M.P.; Truong, N.H.; Do, T.H. Some Characters of Bacterial Cellulases in Goats’ Rumen Elucidated by Metagenomic DNA Analysis and the Role of Fibronectin 3 Module for Endoglucanase Function. Anim. Biosci. 2021, 34, 867–879.
  18. Sidar, A.; Voshol, G.P.; Arentshorst, M.; Ram, A.F.J.; Vijgenboom, E.; Punt, P.J. Deciphering Domain Structures of Aspergillus and Streptomyces GH3-β-Glucosidases: A Screening System for Enzyme Engineering and Biotechnological Applications. BMC Res. Notes 2024, 17, 257.
  19. He, Y.; Wang, C.; Jiao, R.; Ni, Q.; Wang, Y.; Gao, Q.; Zhang, Y.; Xu, G. Biochemical Characterization of a Novel Glucose-Tolerant GH3 β-Glucosidase (Bgl1973) from Leifsonia Sp. ZF2019. Appl. Microbiol. Biotechnol. 2022, 106, 5063–5079.
  20. Georgakis, N.; Premetis, G.E.; Pantiora, P.; Varotsou, C.; Bodourian, C.S.; Labrou, N.E. The Impact of Metagenomic Analysis on the Discovery of Novel Endolysins. Appl. Microbiol. Biotechnol. 2025, 109, 126.
  21. Jeilu, O.; Alexandersson, E.; Johansson, E.; Simachew, A.; Gessesse, A. A Novel GH3-β-Glucosidase from Soda Lake Metagenomic Libraries with Desirable Properties for Biomass Degradation. Sci. Rep. 2024, 14, 10012.
  22. Ariaeenejad, S.; Nooshi-Nedamani, S.; Rahban, M.; Kavousi, K.; Pirbalooti, A.G.; Mirghaderi, S.; Mohammadi, M.; Mirzaei, M.; Salekdeh, G.H. A Novel High Glucose-Tolerant β-Glucosidase: Targeted Computational Approach for Metagenomic Screening. Front. Bioeng. Biotechnol. 2020, 8, 813.
  23. Kaushal, G.; Rai, A.K.; Singh, S.P. A Novel β-Glucosidase from a Hot-Spring Metagenome Shows Elevated Thermal Stability and Tolerance to Glucose and Ethanol. Enzym. Microb. Technol. 2021, 145, 109764.
  24. Mai, Z.; Su, H.; Zhang, S. Characterization of a Metagenome-Derived β-Glucosidase and Its Application in Conversion of Polydatin to Resveratrol. Catalysts 2016, 6, 35.
  25. Matsuzawa, T.; Watanabe, M.; Nakamichi, Y.; Akita, H.; Yaoi, K. Crystal Structure of Metagenomic β-Glycosidase MeBglD2 in Complex with Various Saccharides. Appl. Microbiol. Biotechnol. 2022, 106, 4539–4551.
  26. Do, T.H.; Nguyen, T.T.; Nguyen, T.N.; Le, Q.G.; Nguyen, C.; Kimura, K.; Truong, N.H. Mining Biomass-Degrading Genes through Illumina-Based de Novo Sequencing and Metagenomic Analysis of Free-Living Bacteria in the Gut of the Lower Termite Coptotermes gestroi Harvested in Vietnam. J. Biosci. Bioeng. 2014, 118, 665–671.
  27. Do, T.H.; Dao, T.K.; Nguyen, K.H.V.; Le, N.G.; Nguyen, T.M.P.; Le, T.L.; Phung, T.N.; van Straalen, N.M.; Roelofs, D.; Truong, N.H. Metagenomic Analysis of Bacterial Community Structure and Diversity of Lignocellulolytic Bacteria in Vietnamese Native Goat Rumen. Asian-Australas. J. Anim. Sci. 2018, 31, 738–747.
  28. Do, T.H.; Le, N.G.; Dao, T.K.; Nguyen, T.M.P.; Le, T.L.; Luu, H.L.; Nguyen, K.H.V.; Nguyen, V.L.; Le, L.A.; Phung, T.N.; et al. Metagenomic Insights into Lignocellulose-Degrading Genes through Illumina-Based de Novo Sequencing of the Microbiome in Vietnamese Native Goats’ Rumen. J. Gen. Appl. Microbiol. 2018, 64, 108–116.
  29. Dao, T.K.; Do, T.H.; Le, N.G.; Nguyen, H.D.; Nguyen, T.Q.; Le, T.T.H.; Truong, N.H. Understanding the Role of Prevotella Genus in the Digestion of Lignocellulose and Other Substrates in Vietnamese Native Goats’ Rumen by Metagenomic Deep Sequencing. Animals 2021, 11, 3257.
  30. Le, T.T.H.; Nguyen, T.B.; Nguyen, H.D.; Nguyen, H.D.; Le, N.G.; Dao, T.K.; Nguyen, T.Q.; Do, T.H.; Truong, N.H. De Novo Metagenomic Analysis of Microbial Community Contributing in Lignocellulose Degradation in Humus Samples Harvested from Cuc Phuong Tropical Forest in Vietnam. Diversity 2022, 14, 220.
  31. Lin, H.; Chen, W.; Ding, H. AcalPred: A Sequence-Based Tool for Discriminating between Acidic and Alkaline Enzymes. PLoS ONE 2013, 8, e75726.
  32. Nguyen, K.; Nguyen, T.; Truong, N.; Do, T. Application of Bioinformatic Tools for Prediction of Active pH and Temperature Stability of Endoglucanases Based on Coding Sequences from Metagenomic DNA Data. Biol. Forum-Int. J. 2019, 11, 14–20.
  33. Lim, S.; Seo, J.; Choi, H.; Yoon, D.; Nam, J.; Kim, H.; Cho, S.; Chang, J. Metagenome Analysis of Protein Domain Collocation within Cellulase Genes of Goat Rumen Microbes. Asian-Australas. J. Anim. Sci. 2013, 26, 1144–1151.
  34. Lombard, V.; Henrissat, B.; Garron, M.-L. CAZac: An Activity Descriptor for Carbohydrate-Active Enzymes. Nucleic Acids Res. 2025, 53, D625–D633.
  35. Huson, D.H.; Auch, A.F.; Qi, J.; Schuster, S.C. MEGAN Analysis of Metagenomic Data. Genome Res. 2007, 17, 377–386.
  36. Nguyen, T.Q.; Do, T.H.; Nguyen, T.K.L.; Nguyen, H.D.; Truong, N.H. Expression of Beta-Glucosidase Mined from Metagenomic DNA Data of Bacteria in Vietnamese Goats’ Rumen in Eschrichia coli System. Acad. J. Biol. 2022, 44, 43–52.
  37. Binh, N.T.; Quy, N.T.; Huyen, D.T.; Hong, L.T.T.; Hai, T.N. Selection of Optimal Culture Conditions for Expression of Recombinant Beta-Glucosidase in Escherichia coli. Vietnam J. Biotechnol. 2022, 20, 425–433.
  38. Binh, N.T.; Quy, N.T.; Hong, L.T.T.; Hai, T.N. Purification and Characterization of a Recombinant Beta-Glucosidase in Escherichia coli. Vietnam J. Biotechnol. 2022, 20, 599–607.
  39. Bradford, M.M. A Rapid and Sensitive Method for the Quantitation of Microgram Quantities of Protein Utilizing the Principle of Protein-Dye Binding. Anal. Biochem. 1976, 72, 248–254.
  40. Fang, S.; Chang, J.; Lee, Y.S.; Guo, W.; Choi, Y.L.; Zhou, Y. Cloning and Characterization of a New Broad specific β-Glucosidase from Lactococcus sp. FSJ4. World J. Microbiol. Biotechnol. 2014, 30, 213–223.
  41. Qi, J.; Zhang, M.; Chen, C.; Feng, Y.; Xuan, J. Cellulosome Systems in the Digestive Tract: Underexplored Enzymatic Machine for Lignocellulose Bioconversion. Catalysts 2025, 15, 387.
  42. Bule, P.; Pires, V.M.R.; Alves, V.D.; Carvalho, A.L.; Prates, J.A.M.; Ferreira, L.M.A.; Smith, S.P.; Gilbert, H.J.; Noach, I.; Bayer, E.A.; et al. Higher Order Scaffoldin Assembly in Ruminococcus Flavefaciens Cellulosome Is Coordinated by a Discrete Cohesin-Dockerin Interaction. Sci. Rep. 2018, 8, 6987.
  43. Alahuhta, M.; Xu, Q.; Bomble, Y.J.; Brunecky, R.; Adney, W.S.; Ding, S.Y.; Himmel, M.E.; Lunin, V.V. The Unique Binding Mode of Cellulosomal CBM4 from Clostridium thermocellum Cellobiohydrolase A. J. Mol. Biol. 2010, 402, 374–387. [Google Scholar] [CrossRef]
  44. Huang, Y.; Busk, P.K.; Grell, M.N.; Zhao, H.; Lange, L. Identification of a β-Glucosidase from the Mucor Circinelloides Genome by Peptide Pattern Recognition. Enzym. Microb. Technol. 2014, 67, 47–52. [Google Scholar] [CrossRef]
  45. Yang, Y.; Zhang, X.; Yin, Q.; Fang, W.; Fang, Z.; Wang, X.; Zhang, X.; Xiao, Y. A Mechanism of Glucose Tolerance and Stimulation of GH1 β-Glucosidases. Sci. Rep. 2015, 5, 17296. [Google Scholar] [CrossRef] [PubMed]
  46. Correia, M.A.S.; Pires, V.M.R.; Gilbert, H.J.; Bolam, D.N.; Fernandes, V.O.; Alves, V.D.; Prates, J.A.M.; Ferreira, L.M.A.; Fontes, C.M.G.A. Family 6 Carbohydrate-Binding Modules Display Multiple Beta1,3-Linked Glucan-Specific Binding Interfaces. FEMS Microbiol. Lett. 2009, 300, 48–57. [Google Scholar] [CrossRef] [PubMed]
  47. Rigden, D.J.; Mello, L.V.; Galperin, M.Y. The PA14 Domain, a Conserved All-β Domain in Bacterial Toxins, Enzymes, Adhesins and Signaling Molecules. Trends Biochem. Sci. 2004, 29, 335–339. [Google Scholar] [CrossRef] [PubMed]
  48. Castro-Costa, A.; Salama, A.A.K.; Moll, X.; Aguiló, J.; Caja, G. Using Wireless Rumen Sensors for Evaluating the Effects of Diet and Ambient Temperature in Nonlactating Dairy Goats. J. Dairy. Sci. 2015, 98, 4646–4658. [Google Scholar] [CrossRef]
  49. Mamuad, L.L.; Lee, S.S.; Lee, S.S. Recent Insight and Future Techniques to Enhance Rumen Fermentation in Dairy Goats. Asian-Australas. J. Anim. Sci. 2019, 32, 1321–1330. [Google Scholar] [CrossRef]
  50. Huang, Y.Y.; Lv, Z.H.; Zheng, H.Z.; Zhu, Q.; Liu, M.T.; Sang, P.; Wang, F.; Zhu, D.; Xian, W.D.; Yin, Y.R. Characterization of a Thermophilic and Glucose-Tolerant GH1 β-Glucosidase from Hot Springs and Its Prospective Application in Corn Stover Degradation. Front. Microbiol. 2023, 14, 1286682. [Google Scholar] [CrossRef]
  51. Zhu, Q.; Huang, Y.; Yang, Z.; Wu, X.; Zhu, Q.; Zheng, H.; Zhu, D.; Lv, Z.; Yin, Y. A Recombinant Thermophilic and Glucose-Tolerant GH1 β-Glucosidase Derived from Hehua Hot Spring. Molecules 2024, 29, 1017. [Google Scholar] [CrossRef]
  52. Talley, K.; Alexov, E. On the pH-Optimum of Activity and Stability of Proteins. Proteins 2010, 78, 2699–2706. [Google Scholar] [CrossRef] [PubMed]
  53. Krisch, J.; Takó, M.; Papp, T.; Vágvölgyi, C. Characteristics and Potential Use of β-Glucosidases from Zygomycetes. In Current Research; Technology and Education Topics in Applied Microbiology and Microbial Biotechnology; Formatex Research Center: Badajoz, Spain, 2010; pp. 891–896. ISBN 978-84-614-6195-0. [Google Scholar]
  54. Hong, M.R.; Kim, Y.S.; Park, C.S.; Lee, J.K.; Kim, Y.S.; Oh, D.K. Characterization of a Recombinant Beta-Glucosidase from the Thermophilic Bacterium Caldicellulosiruptor Saccharolyticus. J. Biosci. Bioeng. 2009, 108, 36–40. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Analysis of phylogenetic relationships of β-glucosidase sequences obtained from metagenomic DNA data of bacteria in goat rumen, termite gut, and fungal-degraded wood humus, using the maximum likelihood method with 1000 bootstrap replications. (A) Phylogenetic relationships of 115 β-glucosidase sequences belonging to GH1. (B) Phylogenetic relationships of 40 β-glucosidase sequences belonging to GH16. (C) Phylogenetic relationships of 678 β-glucosidase sequences belonging to GH3. Sequence names in the phylogenetic trees include the GH family of their sequences followed by the original data source and accession codes; specifically, “G” denotes sequences derived from goat rumen; “H” denotes sequences derived from humus, and “T” denotes sequences derived from termite gut. The numbers in the figures indicate the bootstrap values with a maximum value of 100. Stars denote the positions of the distance scale. The distance scale in the linearized GH3 β-glucosidase tree is shown as a ruler at the bottom of the tree. Two representatives of divergent distances are displayed in cyan.
Figure 2. The differences in physicochemical properties of bacterial β-glucosidase families from humus, termite gut, and goat rumen. (A) Differences in pI values among enzyme families and sources. (B) Differences in alkaline values among β-glucosidase families and sources. (C) Differences in Tm values among β-glucosidase families and sources. (D) Distribution of enzymes based on amino acid sequence length and Tm values. (E) Distribution of enzymes based on Tm values and alkaline values. Numbers next to the median lines represent median values. Circles (○) denote outliers with values beyond 1.5 × IQR (interquartile range) but within 3 × IQR. Stars (*) in (A–C) denote outliers with values beyond 3 × IQR from the nearest quartile.
Figure 3. Domain organization of β-glucosidases GH1, GH16, and GH3 derived from bacteria in goat rumen, wood humus, and termite gut. Sequence names include the original data source and accession codes; specifically, “G” denotes sequences derived from goat rumen, “H” denotes sequences derived from humus, and “T” denotes sequences derived from termite gut. GH: glycosyl hydrolase family; GH3N: N-terminal domain of GH3; GH3C: C-terminal domain of GH3; CBM: carbohydrate binding module; FN3: fibronectin 3; SigP: signal peptide; Big2: bacterial Ig-like domain; Dock: Dockerin_1 domain; PorSec: Por secretion system C-terminal sorting domain; ExopC: substrate binding domain at C-terminus of enzyme; PA14: protective antigen with carbohydrate-binding function; CE8: Pectinesterase domain.
Figure 4. Diversity of β-glucosidase-producing bacteria, classified by enzyme family and domain architecture, in goat rumen, wood humus, and termite gut. (A) Distribution of bacteria producing GH1, GH16, and GH3 enzymes in goat rumen. (B) Distribution of bacteria producing GH1, GH16, and GH3 enzymes in wood humus. (C) Distribution of bacteria producing GH1 and GH3 enzymes in termite gut. (D) Distribution of bacteria producing GH3 enzymes with different domain architectures in goat rumen. (E) Distribution of bacteria producing GH3 enzymes with the domain architecture GH3C-GH3N in goat rumen. (F) Distribution of bacteria producing GH3 enzymes with different domain architectures in wood humus. (G) Distribution of bacteria producing GH3 enzymes with the domain architecture GH3-PA14 in wood humus.
Figure 5. Analysis of BGC-GH3-31 purification and characterization of recombinant β-glucosidase GH3-31. (A) SDS-PAGE analysis of protein fractions during BGC-GH3-31 purification on 12.6% polyacrylamide gel. (B) SDS-PAGE analysis of desalted BGC-GH3-31 at different protein amounts on 12.6% polyacrylamide gel. (C) Effect of reaction time on BGC-GH3-31 activity. (D) Effect of pH on BGC-GH3-31 activity. (E) Effect of temperature on BGC-GH3-31 activity. (F) Effect of incubation time at different temperatures on BGC-GH3-31 activity. (G) Effect of selected chemicals and detergents on BGC-GH3-31 activity. (H) Effect of NaCl concentration on BGC-GH3-31 activity. (I) Effect of Triton X-100 concentration on BGC-GH3-31 activity. (J) Effect of glucose concentration on BGC-GH3-31 activity. M: Standard protein marker (Fermentas); TS: Total soluble proteins; F: Flow-through proteins; W1: Wash fraction with equilibration buffer; W2: Wash fraction with PBS buffer containing 50 mM imidazole; E1–E3: Eluted fractions with PBS buffer containing 300 mM imidazole.
Table 1. Functional domains of β-glucosidases derived from bacteria in goat rumen, wood humus, and termite gut.
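| No | Modular Structure | Goat | Humus | Termite | Total |
|---|---|---|---|---|---|
| | Total β-glucosidase GH1 | 29 | 70 | 16 | 115 |
| 1 | GH1 | 29 | 57 | 16 | 102 |
| 2 | Sig-GH1 | | 13 | | 13 |
| | Total β-glucosidase GH16 | 35 | 5 | 0 | 40 |
| 1 | GH16 | 6 | 1 | 0 | 7 |
| 2 | SigP-GH16 | 9 | 3 | | 12 |
| 3 | GH16-CBM4 | 6 | | | 6 |
| 4 | SigP-GH16-CBM4 | 13 | | | 13 |
| 5 | SigP-Dockerin-GH16_CBM4-CBM4 | 1 | | | 1 |
| 6 | GH16-CBM32-Por_sec_tail | | 1 | | 1 |
| | Total β-glucosidase GH3 | 458 | 210 | 10 | 678 |
| 1 | SigP-GH3C-FN3-GH3N | 7 | | | 7 |
| 2 | GH3C-FN3-GH3N | 72 | 2 | | 74 |
| 3 | GH3C-GH3N | 3 | | | 3 |
| 4 | GH3N-GH3C-FN3-Lactamase | 1 | | | 1 |
| 5 | GH3N-GH3C-FN3 | 174 | 69 | 2 | 245 |
| 6 | GH3N-GH3C | 25 | 3 | | 28 |
| 7 | GH3N-GH3C-ExopC | | 2 | | 2 |
| 8 | SigP-GH3N-GH3C-FN3-GH5 | 1 | | | 1 |
| 9 | SigP-GH3N-GH3C-FN3-GH31 | 1 | | | 1 |
| 10 | SigP-GH3N-GH3C-FN3 | 121 | 110 | 4 | 235 |
| 11 | SigP-GH3N-GH3C | 19 | 4 | 1 | 24 |
| 12 | GH3N-GH3C:PA14:GH3C-FN3 | 8 | 7 | 3 | 18 |
| 13 | GH3N-GH3C-FN3-PA14 | 1 | | | 1 |
| 14 | SigP-GH3N-GH3C:PA14:GH3C-FN3 | 8 | 13 | | 21 |
| 15 | SigP-DUF-GH3N-GH3C:PA14:GH3C-FN3 | 1 | | | 1 |
| 16 | GH3N-GH3C:PA14:GH3C | 1 | | | 1 |
| 17 | GH3N-GH3C:CBM6:GH3C-FN3-CE8 | 1 | | | 1 |
| 18 | GH3N-GH3C:CBM6:GH3C-FN3-Big2-GH43 | 1 | | | 1 |
| 19 | SigP-GH3N-GH3C:CBM6:GH3C-FN3-CE8 | 1 | | | 1 |
| 20 | SigP-GH3N-GH3C:CBM6:GH3C-FN3-Big2-GH43 | 1 | | | 1 |
| 21 | SigP-GH3N-GH3C:CBM6:GH3C-FN3-GH43 | 2 | | | 2 |
| 22 | SigP-GH3N-GH3C:CBM6:GH3C-FN3 | 9 | | | 9 |
| | Total | 522 | 285 | 26 | 833 |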
GH: glycosyl hydrolase family; GH3N: N-terminal domain of GH3; GH3C: C-terminal domain of GH3; CBM: carbohydrate binding module; FN3: fibronectin 3; SigP: signal peptide; Big2: bacterial Ig-like domain; Dockerin: Dockerin_1; Por_sec_tail: Por secretion system C-terminal sorting domain; ExopC: substrate binding domain at C-terminus of enzyme; PA14: protective antigen with carbohydrate-binding function; CE8: Pectinesterase. “-” separates two functional domains; “:” denotes a domain inserted within a functional domain.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Nguyen, T.Q.; Do, T.H.; Le, N.G.; Nguyen, H.D.; Dao, T.K.; Dinh, N.T.; Truong, N.H. β-Glucosidases: In Silico Analysis of Physicochemical Properties and Domain Architecture Diversity Revealed by Metagenomic Technology. Diversity 2025, 17, 804. https://doi.org/10.3390/d17110804