1. Introduction
Processed foods constitute a significant portion of human diets worldwide, accounting for an estimated 50–90% of caloric intake in some populations [1]. The prevalence of industrially processed food has exacerbated environmental challenges by generating large quantities of agro-industrial waste [2]. One notable example is cheese whey, a lactose-rich byproduct of the dairy industry. In many regions, whey is disposed of into the environment, where its high organic content causes pollution of water bodies. Although some whey is repurposed (e.g., for protein concentrates and animal feed), in developing countries such as Chile it is still often discarded, posing a serious environmental burden [3]. Reducing such waste aligns with circular economy and bioeconomy principles, which aim to transform biological waste streams into high-value products [4,5]. Implementing a circular bioeconomy, however, requires efficient biotechnological solutions to convert low-value substrates like lactose into value-added compounds.
Lactose valorization, the conversion of lactose into higher-value products, has drawn considerable interest as a strategy to mitigate whey waste. A particularly attractive target product is D-tagatose, a keto-hexose sugar that is nearly as sweet as sucrose but with ~40% fewer calories and a very low glycemic index. D-Tagatose also exhibits prebiotic properties, making it a promising functional sweetener for foods and pharmaceuticals [6]. An enzymatic pathway can be employed to convert lactose into tagatose. In the first step, β-galactosidase (β-gal, EC 3.2.1.23) hydrolyzes lactose into D-glucose and D-galactose. This step is already used industrially to produce lactose-free dairy products and galactooligosaccharides (GOS) from whey lactose. In the second step, L-arabinose isomerase (L-AI, EC 5.3.1.4) isomerizes the released D-galactose into D-tagatose. L-AIs are actively being researched for tagatose production processes [6]. Both enzymatic steps add value to the whey stream: β-galactosidases facilitate lactose removal and GOS synthesis, while L-AIs enable the generation of a low-calorie sweetener (tagatose) with potential health benefits.
Structurally, the Escherichia coli β-galactosidase (LacZ) is a well-characterized model enzyme. It is a homotetramer of ~465 kDa, with each ~116 kDa monomer contributing to the active site at subunit interfaces. The active site pocket contains key residues that coordinate lactose and catalyze its hydrolysis via a classic double-displacement (ping-pong) mechanism. Two glutamic acid residues act in tandem, one as a general acid to protonate the glycosidic oxygen and another as a nucleophile to form a galactosyl-enzyme intermediate, facilitating breakdown of the lactose into monosaccharides [7]. These catalytic glutamates and surrounding residues are conserved among β-galactosidases from various species [8]. Lactose binding is further stabilized by multiple hydrogen bonds (in E. coli LacZ, interactions involve Asn99, His391, Glu461, Gln537, etc.) and hydrophobic stacking with active site tryptophans, as well as coordination to essential metal ions [7]. E. coli LacZ requires Mg2+ for maximal activity, likely because the metal ion helps stabilize the transition state and coordinate reactive water molecules; removal of Mg2+ significantly impairs catalysis [9]. In addition, a Na+ ion is known to bind near the lactose 6-hydroxyl in E. coli LacZ, enhancing activity, though sodium is not universally required in all β-galactosidases [7]. Many bacterial β-galactosidases are metalloenzymes to varying degrees; for example, some require divalent cations (Mg2+ or Mn2+) for optimal activity, and some possess structural metal sites that influence stability [10]. Industrial applications of β-galactosidases leverage these properties to achieve high activity under process conditions, for instance using thermostable or cold-active enzymes depending on the desired operation temperature.
L-Arabinose isomerase (L-AI), in contrast, typically functions as a homotetramer or hexamer of ~500–600 kDa total. L-AIs catalyze the reversible isomerization of D-galactose to D-tagatose (as well as L-arabinose to L-ribulose in their native context) via an ene-diol mechanism. The active site of L-AI usually contains two catalytic acidic residues (Glu/Asp) and two histidines, arranged to facilitate proton transfer and aldose-ketose isomerization. Many L-AIs are metalloenzymes, requiring a divalent metal cofactor (often Mn2+) bound in the active site for activity; the metal ion helps polarize the substrate and stabilize the enediolate intermediate [6]. Optimal activity of L-AIs is often at slightly acidic to neutral pH (around 6–7.5) and moderate temperatures, though there is variability among species. D-Tagatose formation by L-AI is typically equilibrium-limited (equilibrium favors galactose ~3:1 over tagatose at 30–60 °C), so yields around 30–50% can be achieved at high substrate concentrations or with equilibrium-shift strategies (e.g., continuous product removal). Recent engineering efforts have also produced metal-independent L-AIs, which have mutations in the metal-binding sites allowing them to catalyze isomerization without added metal ions [11].
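The equilibrium ratio quoted above translates directly into a maximum single-pass conversion. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes a simple two-species galactose/tagatose equilibrium with no side reactions and no product removal.

```python
def equilibrium_conversion(galactose_to_tagatose_ratio: float) -> float:
    """Fraction of the hexose pool present as D-tagatose at equilibrium.

    A galactose:tagatose ratio of r:1 means tagatose is 1 / (r + 1) of the pool.
    """
    if galactose_to_tagatose_ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / (galactose_to_tagatose_ratio + 1.0)


def equilibrium_concentrations(initial_galactose_mM: float, ratio: float):
    """Return (galactose_mM, tagatose_mM) at equilibrium for a closed batch."""
    tagatose = initial_galactose_mM * equilibrium_conversion(ratio)
    return initial_galactose_mM - tagatose, tagatose


if __name__ == "__main__":
    # Ratio taken from the text (~3:1 in favor of galactose).
    print(f"{equilibrium_conversion(3.0):.0%}")  # prints 25%
```

Under these assumptions a 3:1 ratio corresponds to 25% conversion, so conversions in the 30–50% range imply either a less unfavorable ratio (1:1 would give 50%) or an equilibrium-shift strategy such as product removal.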
Cold-active enzymes offer significant advantages, as they not only catalyze reactions at low temperatures but also maintain high activity and reaction rates by lowering the activation energy required. They are characterized by their higher specific activity at low temperatures and a relatively low optimum catalysis temperature [12]. This facilitates the economical production of enzymes through high-yield, high-productivity, intensive fermentation processes [13].
In the search for these types of enzymes, psychrophilic or psychrotolerant microorganisms have been investigated. Some relevant enzymes derived from psychrophilic bacteria include one from Alkalilactibacillus ikkense, which maintains 60% of its activity at 0 °C and also shows transgalactosylation activity [14], and one from Arthrobacter sp., with a peak activity of 42% at 10 °C [15]. Similarly, Rahnella inusitata has shown 41–62% hydrolysis activity at temperatures of 4–15 °C [16]. Among psychrotolerant bacteria, active L-AIs have been identified, such as one from Arthrobacter sp., with a maximum activity of 60% for isomerizing D-galactose at 30 °C, without the need for divalent ions [17], and one from Shewanella sp., which shows maximum activity between 15 and 35 °C, with a moderate need for divalent cations, achieving 16% D-galactose isomerization at 4 °C and 34% at 35 °C [18]. Finally, an L-AI from Pseudoalteromonas haloplanktis was found, which showed activity at 40 °C, although without specificity for the substrate D-galactose [19].
To advance lactose waste valorization, in this work we investigate two key enzymes from a novel bacterial source: β-galactosidase (for lactose hydrolysis) and L-arabinose isomerase (for tagatose production). Our focus was on an Antarctic isolate, strain L47, which was identified as a Gram-negative Ewingella americana from genomic analyses. Notably, the draft genome of L47 revealed one L-arabinose isomerase gene and three distinct β-galactosidase genes, an unusual multiplicity for a single isolate (most bacteria have only one or two β-gal genes). This raised the question of whether one of these β-galactosidases specializes in lactose degradation. We hypothesized that only one of the three β-gal enzymes is primarily responsible for lactose hydrolysis in L47. Therefore, the objectives of this study were: (i) to identify which of the three L47 β-galactosidases has the highest lactose-degrading activity, by combining bioinformatic predictions with heterologous expression and enzymatic assays, and (ii) to characterize the activity of the L47 L-arabinose isomerase for D-tagatose production. We employed a combination of genomic analysis, structural modeling, and in vitro experiments (enzyme expression, purification, and kinetic assays) to achieve these goals. Our integrated approach allowed us to correlate the enzymes’ structures with their function, and to pinpoint the most promising enzyme candidates for a lactose-to-tagatose conversion process.
3. Discussion
In this study, we combined genomic, structural, and biochemical approaches to evaluate the lactose-to-tagatose conversion potential of the Antarctic isolate L47. Structural models are discussed exclusively in terms of sequence conservation and putative active-site architecture of the native enzymes, as the recombinant β-galactosidases were obtained predominantly as inclusion bodies. Therefore, any structural interpretation is limited to comparative and predictive insights rather than direct confirmation of native quaternary organization.
Genomic annotation of strain L47 revealed an operon-like organization for the araA, bgaA, bglY, and lacZ loci, with putative −35/−10 motifs compatible with σ70-dependent promoters, suggesting basal transcription driven by the housekeeping RNA polymerase–RpoD (σ70). However, the presence and arrangement of nearby transcriptional regulators indicate that expression is primarily inducible and governed by carbon source availability, consistent with classical operon models [28]. The ara region exhibits a canonical L-arabinose-inducible architecture consistent with AraC-dependent activation [29], whereas the rafB–bgaA module couples transport and lactose hydrolysis and is flanked by an AraC-type regulator, supporting positive inducible control [30,31]. In contrast, the bglY–gan region integrates hydrolytic functions and transport with LacI-type regulators, consistent with a unit specialized in utilization of more complex galactosides, while the lacI–lacZ locus resembles a lac-type inducible system responsive to β-galactosides [32]. Collectively, these features support a regulatory network in which basal σ70 transcription is fine-tuned by dedicated regulators according to carbon source availability, consistent with the metabolic adaptability expected for environmental strains.
A central finding of this work is that, among the three β-galactosidases encoded in the L47 genome, BgaA displayed significant lactose-hydrolyzing activity, whereas the classical LacZ enzyme and the second GH42 β-galactosidase BglY showed little or no detectable activity under the experimental conditions tested. This result was initially unexpected, as in silico analyses predicted LacZ to exhibit the strongest lactose binding affinity, followed by BgaA, with BglY ranking lowest. Ultimately, the experimental results partially aligned with these predictions: BglY was indeed the least effective enzyme, while BgaA outperformed it. The main discrepancy arose from LacZ, which could not be obtained in an active form despite its favorable predicted binding properties, underscoring the importance of expression context and protein aggregation behavior in determining functional enzyme availability.
The differential activity observed between BgaA and BglY is particularly intriguing, given that both enzymes belong to the GH42 family and originate from the same organism. Several factors may explain this divergence. First, intrinsic differences in folding propensity and aggregation behavior likely play a major role. Although both enzymes accumulated as inclusion bodies in E. coli, BgaA-derived inclusion bodies consistently retained detectable catalytic activity, whereas BglY-derived aggregates did not. This observation suggests that BgaA inclusion bodies may contain a higher fraction of correctly folded or catalytically competent protein, a phenomenon increasingly reported for recombinant enzymes produced as inclusion bodies [33,34,35].
A second important distinction between BgaA and BglY relates to structural zinc dependence. Sequence and structural analyses indicate that BglY retains a canonical zinc-binding loop, likely involved in stabilizing the active-site architecture under native conditions [27]. In contrast, BgaA lacks key cysteine residues required for Zn2+ coordination, rendering it effectively zinc independent. Previous studies have demonstrated that Zn2+ supplementation can significantly enhance the activity of certain GH42 β-galactosidases when zinc is limiting [27]. In the present work, Zn2+ was not included in the activity assays, which may have disproportionately affected BglY functionality. Conversely, the apparent Zn2+ independence of BgaA likely confers greater robustness under heterologous expression conditions, allowing it to retain activity even within aggregated states.
Differences at the active-site level may further contribute to the observed behavior. Structural modeling suggested modest substitutions in BgaA relative to BglY, including His→Trp and Gln→Ala changes within the substrate-binding pocket. While these alterations could slightly affect substrate affinity, they do not appear to abolish catalytic function, as evidenced by the measured kinetic parameters for BgaA (Km ≈ 17 mM for o-NPG). Such substitutions may even enhance hydrophobic packing or confer greater conformational flexibility, partially compensating for the loss of specific hydrogen bonds. In contrast, the inactivity of BglY is more plausibly attributed to structural instability and aggregation-related constraints rather than to active-site substitutions alone.
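To put the reported Km in context, the Michaelis-Menten relation can be used to estimate how far the enzyme is from saturation at a given substrate concentration. The sketch below is illustrative only: it assumes simple Michaelis-Menten behavior without inhibition, the function name is hypothetical, and the 11 mM value is the o-NPG concentration stated in Section 4.4.

```python
def fractional_saturation(substrate_mM: float, km_mM: float) -> float:
    """Return v / Vmax for a Michaelis-Menten enzyme: S / (Km + S)."""
    if substrate_mM < 0 or km_mM <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_mM / (km_mM + substrate_mM)


KM_ONPG_MM = 17.0    # apparent Km for o-NPG reported in the text
ASSAY_ONPG_MM = 11.0  # initial o-NPG concentration used in the activity assay

if __name__ == "__main__":
    # 11 / (17 + 11) is roughly 0.39, i.e. the assay runs below half-saturation.
    print(round(fractional_saturation(ASSAY_ONPG_MM, KM_ONPG_MM), 2))
```

Under these assumptions the standard assay samples roughly 39% of Vmax, so activities measured at 11 mM o-NPG would underestimate the maximal rate of BgaA.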
From a physiological perspective, it is also conceivable that BgaA and BglY fulfill distinct roles in the native L47 organism. Many bacteria encode multiple β-galactosidases with nonredundant functions, differing in substrate specificity, cellular localization, or induction conditions [31]. BgaA may represent the primary lactose-hydrolyzing enzyme expressed under lactose-rich conditions, whereas BglY could be induced under alternative environmental cues or act on different galactoside substrates. Without in vivo expression data, this remains speculative; however, our biochemical results strongly suggest that BgaA is the physiologically relevant lactose-active β-galactosidase in L47.
The case of LacZ further illustrates the gap between predicted enzymatic potential and practical applicability. Despite its favorable predicted lactose-binding affinity and the well-established efficiency of LacZ-type enzymes, the L47 LacZ could not be obtained in an active form. Large, multimeric enzymes such as LacZ are notoriously prone to aggregation and inclusion body formation during recombinant expression, even in optimized bacterial hosts. This limitation highlights a critical consideration for industrial enzyme selection: catalytic superiority in theory is insufficient if the enzyme cannot be produced in a functional and scalable form. In this context, BgaA emerges as the most promising β-galactosidase from L47, not due to superior intrinsic kinetics, but because it retains measurable activity when produced as inclusion bodies.
The observation that catalytically active enzymes can be recovered directly from washed inclusion bodies aligns with a growing body of literature redefining inclusion bodies as functional protein aggregates rather than inactive waste products [34,35,36]. Inclusion bodies have been shown to act as mechanically stable, protease-resistant reservoirs of enzymatic activity and, in some cases, as naturally immobilized biocatalysts suitable for industrial applications [37]. In this study, the use of mild washing steps with nonionic detergents allowed removal of cellular contaminants without disrupting the catalytic competence of BgaA-associated aggregates, supporting this modern view of inclusion bodies as usable biocatalytic materials.
With BgaA identified as the principal lactose-hydrolyzing enzyme, a conceptual lactose-to-tagatose bioconversion cascade can be envisioned using BgaA in combination with L-arabinose isomerase (AraA). While BgaA efficiently generates galactose from lactose, the isomerization of galactose to tagatose by AraA represents the main bottleneck of the process. Under the conditions tested, AraA yielded approximately 18% tagatose from 100 mM galactose, well below both the theoretical equilibrium and the conversion levels required for industrial feasibility. Similar limitations have been reported for other L-AIs, where high substrate concentrations (>500 mM) and optimized reaction conditions are required to approach equilibrium yields [11].
The limited performance of AraA suggests that future improvements should focus on enzyme and process optimization rather than lactose hydrolysis itself. Strategies may include protein engineering to enhance AraA stability and catalytic efficiency, immobilization approaches to reduce enzyme precipitation, or the use of alternative L-AIs with higher intrinsic activity or reduced metal dependence [6,11]. Notably, recent studies have demonstrated improved tagatose production using multi-enzyme cascades combining robust β-galactosidases with thermostable or metal-independent L-AIs [38], highlighting promising directions for future development.
Finally, the temperature and pH profiles of the L47 enzymes are noteworthy in the context of industrial flexibility. As a psychrotolerant isolate, L47 produces enzymes capable of retaining activity across a broad temperature range, from near-freezing conditions to moderately elevated temperatures. Crude extract assays revealed sustained β-galactosidase activity even at 4 °C and up to approximately 60 °C (Figure S1F), a desirable trait for processes requiring adaptability to variable operating conditions. Likewise, the slightly acidic pH optimum of AraA (~6.0) may facilitate integration with β-galactosidase-catalyzed lactose hydrolysis, enabling operation at a compromise pH that preserves acceptable activity for both enzymatic steps.
4. Materials and Methods
4.1. Selection of Strain of Interest
The 28 isolates from Antarctica, stored at −80 °C, were reactivated in Luria-Bertani (LB) liquid medium at 15 °C. Refresher cultures were performed under aerobic conditions, with orbital shaking at 120 rpm in a SKIR-601 system (Shin Saeng, Seoul, Republic of Korea) for 5 days. They were also cultured on LB agar at 15 °C for the same period. In isolates that showed growth on LB medium, the ability to utilize L-arabinose as the sole source of carbon and energy was evaluated. For this purpose, the microorganisms were cultured in M9 minimal medium [39] supplemented with L-arabinose, under aerobic conditions, in Erlenmeyer flasks incubated at 15 °C with shaking at 120 rpm. The ability to utilize L-arabinose as a sole carbon source was verified after 7 days of culture, determining cell growth by turbidimetry (OD600nm).
Similarly, the growth of strains that grew in LB medium was evaluated in M9 minimal mineral medium supplemented with lactose (2% w/v). Those strains selected for their ability to grow using lactose as the sole carbon source were plated on agar plates containing M9 minimal mineral medium supplemented with lactose (2% w/v) and 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal). The plates were incubated at 20 °C for 4 days or until blue colonies appeared.
4.2. Identification of Microorganisms by 16S rRNA Sequencing
For isolates that used L-arabinose as the sole carbon source, partial amplification of the 16S ribosomal RNA gene was performed by polymerase chain reaction (PCR) using the universal primers 27F 5′-AGAGTTTGATC(A/C)TGGCTCAG-3′ and 1492R 5′-TACGG(C/T)TACCTTGTTACGACTT-3′. The amplicon was purified using the Wizard® SV Gel and PCR Clean-Up System (Promega©) and sequenced by capillary electrophoresis through the MacroGen Inc. sequencing service with the primers 785F 5′-GGATTAGATACCCTGGTA-3′ and 907R 5′-CCGTCAATTC(A/C)TTT(A/G)AGTTT-3′. The resulting sequences were compared against the National Center for Biotechnology Information (NCBI) database using BLASTn, which allowed the identification of 13 bacterial isolates to the genus level. Once the strains were identified, their reference genomes deposited at the NCBI were searched for putative L-AI coding sequences.
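The primers above contain degenerate positions written as (A/C), (C/T), or (A/G), meaning that each primer is in practice a mixture of sequences. As an illustration, the following Python sketch expands that notation into the individual oligonucleotides; the function name is hypothetical and the parser only handles the parenthesized slash notation used in this section.

```python
import itertools
import re


def expand_degenerate(primer: str) -> list[str]:
    """Expand '(A/C)'-style degenerate positions into every concrete sequence."""
    # The capturing group makes re.split keep the degenerate tokens in the result.
    parts = re.split(r"(\([ACGT](?:/[ACGT])+\))", primer)
    choices = []
    for part in parts:
        if part.startswith("("):
            choices.append(part[1:-1].split("/"))  # e.g. "(A/C)" -> ["A", "C"]
        elif part:
            choices.append([part])                 # fixed stretch of bases
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    for seq in expand_degenerate("AGAGTTTGATC(A/C)TGGCTCAG"):  # primer 27F
        print(seq)
```

For 27F this yields two 20-mers, and for 907R (two degenerate positions) four, which is useful when checking primer binding sites against a genome in silico.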
4.3. Evaluation of D-Tagatose and L-Ribulose Production Using Cell Extracts from Isolates
In L47 and R61 isolates, whose genomes were found to contain putative genes for L-AI, the capacity of their cell extracts to catalyze the synthesis of L-ribulose and D-tagatose from L-arabinose and D-galactose, respectively, was evaluated.
Both isolates were grown in minimal medium M9 supplemented with L-arabinose to stimulate L-AI expression. The strains were cultured until reaching an OD600nm of 1. Biomass was recovered by centrifugation at 4080 RCF and 4 °C for 15 min using a Hettich Universal 320R centrifuge (Hettich GmbH & Co. KG, Tuttlingen, Germany). The cell pellet was resuspended in lysis buffer (50 mM Tris-HCl, pH 7, 0.1 mM phenylmethylsulfonyl fluoride (PMSF)) and lysed by cavitation using a Sonic-650WT-V2 instrument (MRC Ltd., Holon, Israel). The ultrasonic cavitation process was carried out for 30 min at 85% amplitude, with 2 s pulses followed by 3 s rest periods. Cell debris was removed by centrifugation at 4080 RCF and 4 °C for 15 min. The supernatant (cell extract) was recovered and characterized for total protein content using the Bradford method [40], with bovine serum albumin (BSA) as the standard. The arabinose and galactose isomerization activity present in the cell extracts was determined by quantifying L-ribulose or D-tagatose production from L-arabinose or D-galactose, respectively. As a first approach to verify the activity of the L-AI enzyme in the cell extracts, an initial reaction was carried out in an LXC thermoregulated water bath (PolyScience, Niles, IL, USA) for both isolates, at a temperature of 40 °C, using an initial substrate concentration of 300 mM, 5 mg/mL of the cell extract, 1 mM of each cofactor (CoCl2 and MnCl2), and 50 mM Tris-HCl buffer, pH 7. The reactions were conducted under shaking at 300 rpm in a total volume of 2 mL. Samples were taken at 0, 1, 2, 3, 5, and 7 days, and the reaction was stopped by incubating the samples at 99 °C for 2 min [18]. Once the presence of L-AI activity in the cell extract was confirmed, the effect of pH was evaluated by testing pH 5 and 6 (citrate buffer) and 7 and 8 (Tris-HCl), as well as the effect of temperature (4 to 60 °C) and the presence of cofactors (CoCl2 and MnCl2) on enzyme activity. The effect of these parameters was assessed by independently varying each variable. The substrates (D-galactose or L-arabinose) and products (D-tagatose or L-ribulose) were determined by high-performance liquid chromatography (HPLC) as described later in the carbohydrate analysis section.
4.4. Determination of β-Galactosidase Activity by o-NPG Hydrolysis
Enzyme activity in the extracts was quantified spectrophotometrically through the hydrolysis reaction of o-NPG. This reaction releases o-nitrophenol (o-NP) as a product, a compound that imparts a yellow color to the sample and exhibits absorbance at 420 nm. The optimum pH for the obtained enzymes was determined by quantifying activity within a pH range of 4 to 8. Cofactor requirements were verified by adding 0.1 mM MnCl2 and 0.1 mM CaCl2 during the activity quantification assay. All assays were performed in triplicate using an initial o-NPG concentration of 11 mM. For the purposes of this work, an international unit of β-D-galactosidase is defined as the amount of enzyme capable of hydrolyzing 1 µmol of o-NPG per minute at 20 °C.
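The unit definition above can be turned into a calculation from the measured rate of absorbance increase at 420 nm via the Beer-Lambert law. The sketch below is a hedged illustration, not the procedure used in this work: the function names are hypothetical, and the molar absorption coefficient of o-NP is left as a required input because it depends on pH and assay conditions and is not stated in this section. The numeric values in the example are placeholders.

```python
def activity_iu_per_ml(
    delta_abs_per_min: float,
    epsilon_mM_cm: float,
    path_cm: float,
    reaction_volume_ml: float,
    enzyme_volume_ml: float,
    dilution: float = 1.0,
) -> float:
    """Return IU per mL of enzyme sample (1 IU = 1 umol o-NP released per min)."""
    # Beer-Lambert: rate in mM/min = (dA/dt) / (epsilon * path length)
    rate_mM_per_min = delta_abs_per_min / (epsilon_mM_cm * path_cm)
    # 1 mM equals 1 umol/mL, so mM/min * mL gives umol/min in the reaction
    umol_per_min = rate_mM_per_min * reaction_volume_ml
    return umol_per_min * dilution / enzyme_volume_ml


def specific_activity(iu_per_ml: float, protein_mg_per_ml: float) -> float:
    """Return IU per mg of protein (protein from, e.g., a Bradford assay)."""
    return iu_per_ml / protein_mg_per_ml


if __name__ == "__main__":
    # Placeholder inputs: epsilon must be determined for the actual assay pH.
    iu = activity_iu_per_ml(0.09, epsilon_mM_cm=4.5, path_cm=1.0,
                            reaction_volume_ml=1.0, enzyme_volume_ml=0.05)
    print(iu, specific_activity(iu, protein_mg_per_ml=2.0))
```

Keeping the absorption coefficient explicit avoids silently reusing a literature value measured under different buffer conditions.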
4.5. Genomic DNA Isolation and Gene Identification
Genomic DNA from strain L47 was isolated using the Wizard® Genomic DNA Purification Kit (Promega, Madison, WI, USA). The purified L47 genomic DNA was quantified at 260 nm on a TECAN Infinite M200 Pro multiplate reader using a NanoQuant plate (Tecan Group Ltd., Männedorf, Switzerland), and its purity was assessed by means of the A260/280 ratio. This DNA was sent for sequencing and analysis to the commercial service Plasmidsaurus (Eugene, OR, USA). Sequencing was performed using Oxford Nanopore long-read technology. Subsequent analysis, including assembly, annotation, and identification of genes of interest, was carried out by the technical team of the same service, and the sequence corresponding to the araA gene was identified in the genome using the SnapGene program (version 8.0.3).
4.6. Genome Relatedness (ANI/dDDH) and In Silico Operon/Promoter Prediction
Taxonomic classification of Isolate L47 was initially determined using Average Nucleotide Identity (ANI) analysis against the genomes of the GTDB database (release 226), as implemented in GTDB-tk software (version 2.4.1) [41]. Subsequently, a genus-level ANI analysis was performed. This involved downloading seven related genomes from the GTDB database, selected on the basis of the genus assigned to Isolate L47, and calculating pairwise ANI values using fastANI software (version 1.34) [42]. These pairwise ANI values were then used to generate a heatmap visualization in R with the ComplexHeatmap package [43]. Finally, digital DNA-DNA hybridization (dDDH) was calculated between the Isolate L47 genome and the prokaryotic genome exhibiting the highest ANI value, using the online web tool GGDC 3.0 [44].
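As an illustration of the step between the pairwise ANI calculation and the heatmap, the Python sketch below reads fastANI-style tab-separated output (query, reference, ANI, mapped fragments, total fragments) into a symmetric matrix. It is a minimal sketch under stated assumptions: the function names and file labels are hypothetical, reciprocal values are simply averaged, and pairs that fastANI did not report are left as NaN. The heatmap in this work was produced in R, not with this code.

```python
import csv
import io


def read_fastani(handle) -> dict:
    """Parse tab-separated fastANI output into {(query, reference): ANI}."""
    ani = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        ani[(row[0], row[1])] = float(row[2])
    return ani


def symmetric_matrix(ani: dict):
    """Return (names, matrix) with reciprocal ANI values averaged."""
    names = sorted({name for pair in ani for name in pair})
    matrix = []
    for a in names:
        line = []
        for b in names:
            values = [ani[k] for k in ((a, b), (b, a)) if k in ani]
            line.append(sum(values) / len(values) if values else float("nan"))
        matrix.append(line)
    return names, matrix


if __name__ == "__main__":
    # Hypothetical two-genome example; real input would be the fastANI output file.
    example = "L47.fna\tref1.fna\t98.5\t1500\t1600\nref1.fna\tL47.fna\t98.3\t1490\t1580\n"
    names, matrix = symmetric_matrix(read_fastani(io.StringIO(example)))
    print(names, matrix)
```

Averaging the two directions is one simple convention; keeping both values, or the minimum, would be equally defensible depending on how the heatmap is meant to be read.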
Operons carrying the target genes were predicted with Operon-mapper (web version, 2021), which infers transcriptional units based on gene orientation and intergenic distances. The L47 genome was annotated with Prokka v1.14.6 [45] to confirm gene context and to identify putative sigma factor genes by homology/domain detection. Operon coordinates and orientations were validated by BLASTn using BLAST+ v2.12.0 [46]. Up to 500 bp upstream of each predicted operon were extracted and analyzed with BPROM (Softberry Inc., Mount Kisco, NY, USA) to identify −10/−35 motifs, estimate transcription start sites, and obtain LDF scores consistent with σ70-dependent promoters [47].
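Extracting the upstream window requires handling strand orientation, since for an operon on the reverse strand the promoter lies after the feature in forward-strand coordinates. The following Python sketch shows one way this could be done; it is an assumption-laden illustration (hypothetical function names, 0-based half-open coordinates, and a contig treated as linear rather than circular), not the script used in this study.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str,
                    length: int = 500) -> str:
    """Return up to `length` bp upstream of a feature, read 5'->3' on its strand.

    `start` and `end` are 0-based, half-open forward-strand coordinates.
    The window is truncated at the contig ends (no wrap-around).
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    if strand == "-":
        return reverse_complement(genome[end:end + length])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy = "AAAACCCCGGGGTTTT"
    print(upstream_region(toy, 8, 12, "+", length=4))  # CCCC
    print(upstream_region(toy, 8, 12, "-", length=4))  # AAAA
```

The returned sequence is always oriented in the direction of transcription, which is the orientation a promoter-prediction tool would typically expect as input.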
4.7. Microscopic and Genomic Characterization of the Isolate L47
Considering the L-AI enzymatic activity characteristics observed in cell extracts, in terms of pH, temperature, and cofactor dependence, the Ewingella sp. isolate was selected for further study. Therefore, a partial characterization of this isolate was undertaken, given the lack of detailed information on its morphology and genomics in the literature.
As a first step, morphological characterization was performed using Gram staining [48]. In addition, the strain was characterized by scanning electron microscopy (SEM) at the Electron Microscopy and Microanalysis Laboratory of the University of Chile. For sample preparation, 1 mL of a logarithmic-phase culture (OD600nm ≈ 1) was taken and washed twice with 0.01 M sodium cacodylate buffer, pH 7, containing 0.15 M NaCl, and incubated for 15 min. The cells were then fixed by incubating them for 2 h at room temperature in a 1% v/v glutaraldehyde solution prepared in the same buffer (0.01 M sodium cacodylate, pH 7; 0.15 M NaCl). The samples were then washed again twice with the same buffer for 15 min. Finally, the samples were dehydrated in an ascending series of ethanol (30, 50, 70, 80, 90 and 100%), coated with a thin gold film and observed in a variable pressure EVO MA10 scanning electron microscope (Carl Zeiss Microscopy GmbH, Oberkochen, Germany) at 50 Pa and a voltage of 15 kV.
4.8. Primer Design, Gene Amplification by PCR, and Construction of Expression Vectors
To design the primers, the annotated genes of interest were located in the sequenced genome of the Ewingella americana isolate identified in the previous section. Once each sequence was confirmed, gene-specific primers were generated for cloning into the pET-21b(+) or pET101/D-TOPO vector, as shown in Table 2.
PCR reactions were performed on a Multigene™ OptiMax thermal cycler (Labnet Houston, TX, USA). Amplifications were conducted in a 50 µL volume containing the isolate’s genomic DNA, engineered primers, deoxyribonucleoside triphosphates (dNTPs), GoTaq® Q5 polymerase, and GoTaq® Q5 polymerase working buffer (Applied Biosystems, Foster City, CA, USA). The PCR reaction was carried out according to the following program: an initial denaturation at 95 °C for 2 min, followed by 30 cycles of 95 °C for 30 s, 63 °C for 30 s, and 72 °C for 50 s, and a final extension at 72 °C for 5 min. The PCR product was visualized by electrophoresis on a 1% w/v agarose gel in pH 8 Tris-Acetate-EDTA buffer (TAE). SYBR® Safe (Thermo Fisher Scientific, Waltham, MA, USA) was used for DNA staining and visualization on the transilluminator.
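The thermal program described above can be written down as a small data structure, which makes it easy to check the total hold time before booking the thermal cycler. This Python sketch is purely illustrative: the class and function names are hypothetical, the step labels are the conventional ones for each temperature, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# Values taken from the program described in the text.
INITIAL = Step("initial denaturation", 95, 120)
CYCLE = (
    Step("denaturation", 95, 30),
    Step("annealing", 63, 30),
    Step("extension", 72, 50),
)
FINAL = Step("final extension", 72, 300)
N_CYCLES = 30


def hold_time_seconds(initial: Step, cycle, final: Step, n_cycles: int) -> int:
    """Sum of programmed hold times, excluding temperature ramps."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


if __name__ == "__main__":
    total = hold_time_seconds(INITIAL, CYCLE, FINAL, N_CYCLES)
    print(total, "s =", total / 60, "min")  # 3720 s = 62.0 min
```

The actual run will be somewhat longer than 62 min because of the ramp times, which depend on the instrument.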
The PCR product was purified using the Wizard® PCR Preps purification system (Promega©) and digested with the restriction enzymes. Simultaneously, the pET-21b(+) vector was linearized using the same restriction enzymes. The digestion product and the linearized vector were combined at a 1:5 or 1:3 ratio (vector:PCR product) and incubated overnight at 16 °C in the presence of T4 DNA ligase (New England Biolabs, Ipswich, MA, USA) and its corresponding ligation buffer (50 mM Tris-HCl pH 7; 10 mM MgCl2; 1 mM DTT; and 1 mM ATP), resulting in the recombinant pET-21b(+)-L47genes vector. Subsequently, chemically competent E. coli DH5α or TOP10 cells were transformed with the respective pET-21b(+) construct by heat shock. The transformed cells were grown in LB medium supplemented with ampicillin (50 mg/mL). Transformed colonies were verified with universal primers for the T7 region of the vector. Recombinant plasmids were then purified from the confirmed colonies and amplified with the gene-specific primers (Table 2) to confirm the presence of the insert. Finally, the insert was sequenced by capillary electrophoresis at the sequencing service of the Pontificia Universidad Católica de Chile (FONDEQUIP EQM150077) using an Applied Biosystems ABI PRISM 3500 XL instrument to verify its directionality and integrity.
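When a ligation ratio such as 1:3 (vector:insert) is understood as a molar ratio, the mass of insert needed follows from the relative lengths of the two fragments. The sketch below illustrates that standard calculation; it assumes the ratios in the text are molar (the text does not say), and the fragment sizes and vector mass in the example are hypothetical placeholders rather than values from this work.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_to_vector_ratio: float) -> float:
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    For double-stranded DNA, moles scale with mass / length, so
    insert_ng = vector_ng * (insert_bp / vector_bp) * ratio.
    """
    if min(vector_ng, vector_bp, insert_bp, insert_to_vector_ratio) <= 0:
        raise ValueError("all arguments must be positive")
    return vector_ng * (insert_bp / vector_bp) * insert_to_vector_ratio


if __name__ == "__main__":
    # Hypothetical example: 50 ng of a 5000 bp vector and a 1500 bp insert.
    for ratio in (3, 5):
        print(ratio, insert_mass_ng(50.0, 5000, 1500, ratio))  # 45.0 and 75.0 ng
```

Because the insert is shorter than the vector, a 3:1 molar excess corresponds to less insert mass than vector mass in this example, which is easy to get wrong when pipetting by mass alone.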
For the pET101/D-TOPO Champion™ vector, the forward primers were designed with a CACC sequence at the 5′ end, which is required for the directional cloning method, and the stop codon was removed from the reverse primer. In this case, the purified PCR product of the bglY gene was ligated directly with the vector in a 1:3 ratio following the manufacturer’s protocol, incubating the reaction for 30 min at room temperature.
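The two primer rules described above (a 5′ CACC overhang on the forward primer, and a reverse primer that omits the stop codon so the C-terminal tag stays in frame) can be sketched in a few lines of Python. This is a simplified illustration: the function names and the fixed annealing length are hypothetical, no melting-temperature or secondary-structure checks are made, and the primers actually used are those listed in Table 2.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def directional_topo_primers(cds: str, anneal_len: int = 20):
    """Return (forward, reverse) primers for a coding sequence.

    forward: 'CACC' + first `anneal_len` bases of the CDS
    reverse: reverse complement of the last `anneal_len` bases before the stop codon
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    body = cds[:-3] if cds[-3:] in STOP_CODONS else cds  # drop stop codon if present
    if len(body) < anneal_len:
        raise ValueError("CDS shorter than the requested annealing length")
    forward = "CACC" + body[:anneal_len]
    reverse = reverse_complement(body[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    # Toy CDS for illustration only (not a gene from strain L47).
    print(directional_topo_primers("ATGAAACCCGGGTTTTAA", anneal_len=9))
```

In practice the annealing length would be adjusted per gene so that both primers have similar melting temperatures.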
4.9. Gene Cloning and Heterologous Expression
For E. coli expression, the genes were subcloned into expression plasmids (Table 3). Specifically, bglY was cloned into pET-101/D-TOPO (Invitrogen, Carlsbad, CA, USA) to add a C-terminal 6×His-tag, and araA, bgaA and lacZ were cloned into pET-21a(+) (Novagen, Madison, WI, USA), which also provides a C-terminal His-tag. These constructs (pET101/bglY, pET101/araA, pET21/bgaA, pET21/lacZ) were transformed into chemically competent E. coli (DH5α or TOP10 for plasmid propagation). Verified plasmids were then transformed into expression strains: E. coli BL21 (DE3) for the β-galactosidases and E. coli Rosetta (DE3) for araA (Rosetta provides tRNAs for rare codons, aiding expression of the araA gene, which had several rare codons).
Expression of recombinant enzymes in E. coli was induced with isopropyl β-D-1-thiogalactopyranoside (IPTG). For araA in Rosetta (DE3), cultures were grown at 37 °C to OD600nm ~0.6, then induced with 0.5 or 1 mM IPTG and incubated overnight at 37 °C to enhance soluble yield. For the β-galactosidases in BL21 (DE3), induction was similarly done at OD600nm ~0.6 with 1 mM IPTG, followed by overnight expression at 18 °C. In all cases, uninduced control cultures were maintained in parallel. After expression, cells were harvested by centrifugation and resuspended in lysis buffer (20 mM Tris–HCl, pH 7 or 8, 300 mM NaCl, and 1 mM PMSF). Cells were lysed by sonication on ice.
For AraA, most of the protein was found in the soluble fraction after centrifugation (15,000× g, 30 min, 4 °C). For the β-galactosidases (BgaA, BglY, and LacZ), a large portion of each recombinant protein was found in insoluble inclusion bodies. We therefore prepared both soluble extracts and inclusion body fractions for purification, as described below.
4.10. Expression and Recovery of AraA and Inclusion Body Preparation of β-Galactosidases
All target enzymes were engineered with a His-tag to facilitate purification by nickel-affinity chromatography.
For L-arabinose isomerase (AraA), the protein was produced in soluble form in E. coli Rosetta (DE3). The clarified supernatant from cell lysate was loaded onto a HisTrap HP Ni–NTA column (GE Healthcare, Chicago, IL, USA) pre-equilibrated with binding buffer (20 mM Tris–HCl, 300 mM NaCl, 10 mM imidazole, pH 8.0). After washing the column with buffer containing 50 mM imidazole to remove non-specific proteins, AraA was eluted using 250 mM imidazole. Eluted fractions were pooled and dialyzed into storage buffer (50 mM HEPES, 100 mM NaCl, pH 7.5). The final preparation showed high purity on SDS-PAGE and yielded approximately 10 mg/L of culture.
In contrast, expression of the β-galactosidases (BgaA, BglY, LacZ) in soluble form was negligible, so these enzymes were processed from inclusion bodies. After harvesting 0.5–1 L cultures, cell pellets were lysed by sonication and the insoluble fractions (inclusion body pellets) were collected by centrifugation. These pellets underwent a stepwise washing protocol to remove contaminants and facilitate subsequent solubilization. In the first wash, pellets were resuspended in 100 mM Tris–HCl buffer (pH 7.0) containing 0.5% (v/v) Triton X-100, incubated at 25 °C for 20 min, and centrifuged at 9000× g (4 °C, 10 min). In the second wash, the pellet was resuspended in the same buffer supplemented with DNase I (25 µg/mL), incubated for 20 min at 25 °C, and centrifuged under the same conditions. Finally, two additional washes were performed using Tris buffer alone (no detergent or enzyme) to eliminate residual Triton X-100 and nuclease. These sequential washes yielded clean inclusion body preparations that could be stored or used directly in enzyme assays. The resulting protein yields from washed inclusion bodies were estimated at ~3–5 mg/L for BgaA and BglY, and ~1–2 mg/L for LacZ.
All purification and processing steps were evaluated by SDS-PAGE on 12% polyacrylamide gels stained with Coomassie Brilliant Blue R-250. Protein concentration was determined using the Bradford assay (Bio-Rad Laboratories, Hercules, CA, USA), with bovine serum albumin as a standard. The processed proteins were then used for enzymatic assays and activity screening.
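As an illustration of how a Bradford reading is converted to a protein concentration, the sketch below fits a straight line to BSA standards and inverts it for an unknown sample. The standard concentrations, absorbance values, and function names are hypothetical; the assay is assumed to be read at 595 nm and to be linear over the range used.

```python
# Minimal sketch of a Bradford standard curve, assuming a linear response.
# The BSA standards and absorbances below are hypothetical example values.


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def protein_conc(a595, slope, intercept, dilution=1.0):
    """Convert an A595 reading to mg/mL using the fitted standard curve."""
    return (a595 - intercept) / slope * dilution


if __name__ == "__main__":
    bsa_mg_ml = [0.0, 0.125, 0.25, 0.5, 0.75, 1.0]  # hypothetical standards
    a595 = [0.05, 0.11, 0.18, 0.30, 0.43, 0.55]     # hypothetical readings
    slope, intercept = fit_line(bsa_mg_ml, a595)
    print("sample (mg/mL):", protein_conc(0.25, slope, intercept, dilution=10))
```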
4.11. Enzymatic Activity Assays
L-arabinose isomerase (AraA) activity assays: AraA activity was assessed by measuring the production of D-tagatose (or L-ribulose) from galactose (or arabinose) via cysteine–sulfuric acid or cysteine–carbazole assays, and by high-performance liquid chromatography (HPLC) for endpoint yield. For kinetic parameters, we focused on the native L-arabinose to L-ribulose reaction, which is easier to monitor continuously. Initial reaction rates were measured at 30 °C in 50 mM HEPES buffer (pH 7.5) by varying the L-arabinose concentration (50–600 mM) in the presence of 1 mM MnCl2, which was selected based on preliminary metal-dependence assays showing maximal activation compared to Co2+ or Mg2+. We used cysteine and sulfuric acid to derivatize the L-ribulose produced (the derivative gives a distinct absorbance at 548 nm) and measured absorbance over time. Michaelis–Menten parameters (Vmax, Km) for AraA were obtained by nonlinear regression of the initial velocity data. Additionally, to evaluate tagatose production, purified AraA (~0.2 mg/mL) was incubated with 100 mM D-galactose in 50 mM HEPES, pH 7.5, with 1 mM Mn2+ at 30 °C. After 48 h, the reaction mixture was analyzed by HPLC (Rezex RCM monosaccharide column) with refractive index detection to determine the concentration of D-tagatose produced. We also tested AraA activity across different pH values (pH 5.5 to 8.5) and with alternative metal ions (Co2+, Mg2+, or no metal, in comparison to Mn2+) to assess its cofactor requirements and stability, as detailed in the Supplementary Materials.
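The nonlinear regression of initial velocities onto the Michaelis–Menten equation, v = Vmax·[S]/(Km + [S]), can be sketched as follows. This is not the fitting software used in the study, which the text does not name. It is a dependency-free illustration that exploits the fact that Vmax has a closed-form least-squares solution for any fixed Km, and then searches Km by golden-section search on a log scale, assuming the error profile has a single minimum. The data values and function names are hypothetical.

```python
# Minimal sketch of a least-squares Michaelis-Menten fit in pure Python.
# Assumes the sum of squared errors has a single minimum in Km.
import math


def best_vmax(s_vals, v_vals, km):
    """Closed-form least-squares Vmax for a fixed Km."""
    f = [s / (km + s) for s in s_vals]
    return sum(fi * vi for fi, vi in zip(f, v_vals)) / sum(fi * fi for fi in f)


def profile_sse(s_vals, v_vals, km):
    """Sum of squared residuals at a given Km, with Vmax optimized out."""
    vmax = best_vmax(s_vals, v_vals, km)
    return sum((v - vmax * s / (km + s)) ** 2 for s, v in zip(s_vals, v_vals))


def fit_michaelis_menten(s_vals, v_vals, km_lo=1e-3, km_hi=None, tol=1e-10):
    """Return (Vmax, Km) minimizing the squared error of v = Vmax*S/(Km+S)."""
    if km_hi is None:
        km_hi = 100.0 * max(s_vals)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = math.log(km_lo), math.log(km_hi)  # search on log(Km)
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if profile_sse(s_vals, v_vals, math.exp(c)) < profile_sse(s_vals, v_vals, math.exp(d)):
            hi = d
        else:
            lo = c
    km = math.exp((lo + hi) / 2.0)
    return best_vmax(s_vals, v_vals, km), km


if __name__ == "__main__":
    substrate_mM = [50, 100, 200, 300, 400, 600]  # spans the 50-600 mM range
    rates = [1.9, 3.4, 5.1, 5.9, 6.8, 7.4]        # hypothetical velocities
    vmax, km = fit_michaelis_menten(substrate_mM, rates)
    print("Vmax =", vmax, "Km (mM) =", km)
```

Fitting the untransformed data in this way avoids the error distortion introduced by linearizations such as the Lineweaver–Burk plot.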
β-galactosidase activity assays: The enzymatic activity of inclusion bodies obtained from transformed E. coli BL21 (DE3) strains was evaluated spectrophotometrically via the hydrolysis of o-nitrophenyl-β-D-galactopyranoside (o-NPG), which releases o-nitrophenol (o-NP), a yellow compound that absorbs at 420 nm. Assays were performed using washed inclusion bodies, as these aggregates retain catalytically active protein and avoid the potential loss of activity associated with chaotropic solubilization and refolding procedures. Mn2+ (0.1 mM) was included as a functional divalent cation cofactor, as it can substitute for Mg2+ in β-galactosidase catalysis, while Ca2+ (1 mM) was added to support potential structural stabilization, particularly for GH42 family enzymes. Preliminary assays comparing metal conditions are shown in Supplementary Figure S5. The reaction mixture contained 100 µL of the inclusion body suspension, 0.1 mM MnCl2, 1 mM CaCl2, 45 mM o-NPG, and 100 mM Tris-HCl buffer (pH 4–7). The reaction was carried out for 90 min.
The kinetic parameters of BgaA were determined with the same o-NPG hydrolysis assay, measuring absorbance at 420 nm as described above. The reaction mixture contained 100 µL of inclusion body suspension, 1 mM MnCl2, o-NPG at 10–60 mM in 5 mM increments (10, 15, 20, 25, 30, 35, 40, 45, 50, 55, and 60 mM), and 100 mM Tris-HCl buffer at pH 6. The reaction was carried out for 90 min.
Measurements were made on a TECAN Infinite M200 Pro multimode plate reader using a NanoQuant plate (Tecan Group Ltd., Männedorf, Switzerland). All assays were performed in triplicate. For the purposes of this work, one international unit (U) of β-galactosidase is defined as the amount of enzyme that hydrolyzes 1 µmol of o-NPG per minute at room temperature.
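The unit definition above can be turned into a short calculation using the Beer–Lambert law. The sketch below is illustrative only: the extinction coefficient, path length, reaction volume, and absorbance readings are hypothetical placeholders (the extinction coefficient of o-NP depends strongly on pH and must be determined for the actual assay conditions), and the calculation assumes product formation is linear over the reaction time.

```python
# Minimal sketch: convert an endpoint A420 reading into enzyme units
# (1 U = 1 umol o-NPG hydrolyzed per minute). All numeric inputs in the
# example are hypothetical placeholders, not values from this study.


def onp_released_umol(a420, blank_a420, epsilon_mM_cm, path_cm, volume_mL):
    """Amount of o-NP formed, from Beer-Lambert: c = A / (epsilon * l)."""
    conc_mM = (a420 - blank_a420) / (epsilon_mM_cm * path_cm)
    return conc_mM * volume_mL  # mM * mL = umol


def activity_units(a420, blank_a420, epsilon_mM_cm, path_cm, volume_mL, minutes):
    """Enzyme units in the well, assuming a linear time course."""
    umol = onp_released_umol(a420, blank_a420, epsilon_mM_cm, path_cm, volume_mL)
    return umol / minutes


if __name__ == "__main__":
    units = activity_units(
        a420=0.95,
        blank_a420=0.05,
        epsilon_mM_cm=4.5,  # placeholder; measure under the assay pH
        path_cm=1.0,        # placeholder path length
        volume_mL=0.2,      # placeholder reaction volume
        minutes=90,
    )
    print("activity (U):", units)
```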
4.12. Computational Sequence and Structural Analysis
Homology modeling: Three-dimensional models of the L47 enzymes (LacZ, BgaA, BglY, AraA) were constructed to analyze their structural features. For each enzyme, we identified suitable template structures via BLAST and HHsearch against the Protein Data Bank [49]. The best templates were: E. coli LacZ (PDB ID: 1JYN, 50% identity [7]) for L47 LacZ; a Bacillus GH42 β-galactosidase (PDB ID: 3TTS, 55% identity [26]) for L47 BglY; an intracellular Geobacillus GH42 β-galactosidase (GanB, PDB ID: 4OIF, 48% identity [50]) for L47 BgaA; and E. coli L-arabinose isomerase (PDB ID: 2AJT, 70% identity [25]) for L47 AraA. Models were built using the AlphaFold Server [51] and MODELLER 10.4 [52], generating 100 candidate structures for each enzyme. Models were ranked by their DOPE score [53] and checked for stereochemical quality and fold accuracy using tools such as VERIFY3D and PROCHECK [54]. The top-ranking model for each enzyme was selected for further analysis. Figures of protein structures were prepared with PyMOL v2.5 and VMD v1.9 [55].
Substrate docking: Lactose was docked into the β-galactosidase models (LacZ, BgaA, BglY) and D-galactose into the AraA model to examine substrate binding modes. We used AutoDock Vina [56] to generate plausible substrate poses in the enzyme active sites. For each β-galactosidase, the search space was defined around the catalytic cleft containing the two key glutamates. For AraA, docking of galactose considered both closed-ring and open-chain forms. Top-scoring poses were inspected for consistency with known binding in homologous structures (e.g., lactose in E. coli LacZ, arabinose in E. coli AraA). The best pose for each enzyme–substrate pair was used as the starting conformation for simulations.
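One way to define a search space around the two catalytic glutamates is to center a box on their coordinates and pad it on every side. The sketch below shows that geometry and writes the result using AutoDock Vina's configuration keys (center_x/y/z, size_x/y/z, exhaustiveness). The coordinates, the padding, and the exhaustiveness value are hypothetical; the box dimensions actually used in this work are not stated in the text.

```python
# Minimal sketch: build a Vina search box around selected active-site atoms.
# Coordinates, padding, and exhaustiveness are hypothetical example values.


def docking_box(coords, padding=8.0):
    """Return (center, size) of an axis-aligned box enclosing coords + padding."""
    axes = list(zip(*coords))  # [(x1, x2, ...), (y1, ...), (z1, ...)]
    center = tuple((max(a) + min(a)) / 2.0 for a in axes)
    size = tuple((max(a) - min(a)) + 2.0 * padding for a in axes)
    return center, size


def vina_config(center, size, exhaustiveness=8):
    """Format the box as AutoDock Vina configuration-file lines."""
    lines = []
    for axis, value in zip("xyz", center):
        lines.append("center_%s = %.3f" % (axis, value))
    for axis, value in zip("xyz", size):
        lines.append("size_%s = %.3f" % (axis, value))
    lines.append("exhaustiveness = %d" % exhaustiveness)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical carboxylate positions of the two catalytic glutamates (in Å)
    glu_atoms = [(10.0, 20.0, 30.0), (14.0, 22.0, 30.0)]
    center, size = docking_box(glu_atoms, padding=8.0)
    print(vina_config(center, size))
```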
Molecular dynamics (MD) simulations: To refine the docking poses and assess stability, we carried out MD simulations (100 ns each) of the enzyme–substrate complexes. Each complex was placed in a cubic water box with 150 mM NaCl. Simulations were performed with the NAMD 2.14 engine [57] using the CHARMM36 force field for proteins and carbohydrates. After energy minimization and equilibration (with gradual relaxation of position restraints on the protein), production runs of 100 ns were conducted in the NPT ensemble at 300 K. Coordinates were saved every 2 ps. We monitored the root-mean-square deviation (RMSD) of backbone atoms over time to ensure the systems reached equilibrium and computed the root-mean-square fluctuation (RMSF) for key active-site residues to evaluate flexibility.
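The two trajectory metrics named above can be written out explicitly. The sketch below is a plain-Python illustration, not the analysis tool used in the study: it assumes the frames have already been superposed onto a common reference and represents each frame as a list of (x, y, z) tuples. It also shows the frame count implied by the stated settings (100 ns saved every 2 ps). The toy coordinates are hypothetical.

```python
# Minimal sketch of RMSD and RMSF, assuming pre-aligned frames given as
# lists of (x, y, z) tuples. The toy coordinates are hypothetical.
import math


def rmsd(frame, reference):
    """Root-mean-square deviation of one frame from a reference structure."""
    sq = sum((a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(reference))


def rmsf(frames):
    """Per-atom root-mean-square fluctuation about the time-averaged position."""
    n_frames = len(frames)
    result = []
    for i in range(len(frames[0])):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msf = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


def n_saved_frames(length_ns, save_interval_ps):
    """Number of coordinate frames written over a production run."""
    return int(round(length_ns * 1000.0 / save_interval_ps))


if __name__ == "__main__":
    frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
              [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]
    print("RMSD of frame 2 vs frame 1:", rmsd(frames[1], frames[0]))
    print("RMSF per atom:", rmsf(frames))
    print("frames in a 100 ns run saved every 2 ps:", n_saved_frames(100, 2))
```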
Binding free energy calculations: To estimate relative binding affinities of lactose/galactose to the enzymes, we performed endpoint free energy calculations using the Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) method [58]. From each 100 ns MD trajectory, we extracted snapshots (every 10 ns in the equilibrated phase) and calculated the ΔG of binding for the substrate to the enzyme using the GB model and surface area term for solvation. The average ΔGbind and standard deviation were obtained for each enzyme–substrate complex. While absolute values from MM/GBSA have limited accuracy, the comparative rankings provide insight into which enzyme is predicted to bind the substrate most tightly. For the free energy calculations, 3000 frames were taken from the simulation for LacZ, 1500 frames for both BgaA and BglY, and 2500 frames for AraA.
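For reference, the endpoint estimate described above is commonly written as follows. This is the general textbook form of MM/GBSA, given here as a sketch. Whether a single-trajectory protocol was used, and the omission of a conformational entropy term, are assumptions not stated in the text, as is the use of the sample (N − 1) standard deviation. N denotes the number of frames analyzed for a given complex.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &\approx \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{enzyme}} \rangle - \langle G_{\mathrm{substrate}} \rangle, \\
G &= E_{\mathrm{MM}} + G_{\mathrm{GB}} + G_{\mathrm{SA}}, \\
\overline{\Delta G}_{\mathrm{bind}} &= \frac{1}{N} \sum_{k=1}^{N} \Delta G_{\mathrm{bind}}^{(k)},
\qquad
s = \sqrt{\frac{1}{N-1} \sum_{k=1}^{N}
  \left( \Delta G_{\mathrm{bind}}^{(k)} - \overline{\Delta G}_{\mathrm{bind}} \right)^{2}}
\end{align}
```

Here E_MM is the gas-phase molecular mechanics energy, G_GB the polar solvation term from the generalized Born model, and G_SA the nonpolar term proportional to solvent-accessible surface area.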