Mining Thermophile Photosynthesis Genes: A Synthetic Operon Expressing Chloroflexota Species Reaction Center Genes in Rhodobacter sphaeroides

Rehman, Yasir; Kim, Younghoon; Tong, Michelle; Blaby, Ian K.; Blaby-Haas, Crysten E.; Beatty, J. Thomas

doi:10.3390/biom15111529

by Yasir Rehman ¹,², Younghoon Kim ¹, Michelle Tong ¹, Ian K. Blaby ³,⁴, Crysten E. Blaby-Haas ³,⁵,* and J. Thomas Beatty ¹,*

¹ Department of Microbiology & Immunology, The University of British Columbia, Vancouver, BC V6T 1Z3, Canada

² Department of Life Sciences, School of Science, University of Management and Technology, Lahore 54770, Pakistan

³ U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

⁴ Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

⁵ The Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

* Authors to whom correspondence should be addressed.

Biomolecules 2025, 15(11), 1529; https://doi.org/10.3390/biom15111529

Submission received: 19 September 2025 / Revised: 16 October 2025 / Accepted: 24 October 2025 / Published: 30 October 2025

(This article belongs to the Special Issue New Insights into the Membranes of Anoxygenic Phototrophic Bacteria)

Abstract

Photosynthesis is the foundation of the vast majority of life systems, and is therefore the most important bioenergetic process on earth. The greatest diversity of photosynthetic systems is found in microorganisms. However, our understanding of the biophysical and biochemical processes that transduce light into chemical energy is derived from a relatively small subset of proteins from microbes that are amenable to cultivation, in contrast to the huge number of predicted proteins that catalyze the initial photochemical reactions deposited in databases, such as from metagenomics. We describe the use of a Rhodobacter sphaeroides laboratory strain for the expression of heterologous photosynthesis genes to demonstrate the feasibility of mining this resource, focusing on hot spring Chloroflexota gene sequences. Using a synthetic operon of genes, we produced a photochemically active complex of reaction center proteins in our biological system. We also present bioinformatic analyses of anoxygenic type II reaction center sequences from metagenomic samples collected from hot (42–90 °C) springs available through the JGI IMG database, to generate a resource of diverse sequences that are potentially adapted to photosynthesis at such temperatures. These data provide a view into the natural diversity of anoxygenic photosynthesis, through a lens focused on high-temperature environments. The approach we took to express such genes can be applied for potential biotechnology purposes as well as for studies of fundamental catalytic properties of these heretofore inaccessible protein complexes.

Keywords: Chloroflexota; photosynthetic reaction center; heterologous expression; anoxygenic; hot springs; bioinformatic database

1. Introduction

Prokaryotes in the phylum Chloroflexota (formerly Chloroflexi) are metabolically very diverse, and have been found in abundance in extreme environments ranging from freshwater hot springs to the sediments of the sea floor [1]. Chloroflexota species have been used as sources of enzymes with unusual catalytic properties and tolerance of temperature or pH extremes, for example, for biotechnology applications [2].

Apart from one organism that has a type I reaction center (RC) [3], at present most photosynthetic Chloroflexota contain a type II RC, exemplified by the thermophiles Chloroflexus aurantiacus and Roseiflexus castenholzii, and perform anoxygenic photosynthesis utilizing an RC that contains bacteriochlorophyll (BChl) a and resembles the widely conserved RC of purple phototrophic bacteria such as Rhodobacter species [4,5]. In fact, all bacterial type II RCs are structurally very similar in terms of the two chlorophyll-binding proteins (PufL and PufM) that are universally present. The RC is embedded in the cell membrane by virtue of transmembrane segments of PufL and PufM, typically five alpha helices in each. The purple bacterial RC also contains a third protein, the H protein, which is absent from Chloroflexota RCs. The H protein stabilizes the RC, and it is unknown how the C. aurantiacus and R. castenholzii RCs, which lack an H protein, are inherently resistant to the relatively high temperatures of environments in which these organisms are found. As shown in Figure 1, the cofactors present in type II RCs consist of six molecules of BChl and bacteriopheophytin (BPhe), two quinones (Q_A and Q_B) and an iron atom (except for C. aurantiacus, which has a manganese atom instead of iron) [6,7]. The 3-D organization of these cofactors is very similar in diverse species, although the ratio of BPhe:BChl may vary from 2:4 to 3:3, and the quinones may be solely ubiquinone or menaquinone, or one of each, depending on the species [8]. The C. aurantiacus and R. castenholzii RCs contain 3 BPhes and 3 BChls, and menaquinone is the sole type of quinone.

Light energy excites a special pair of BChls (called P) in the RC, and electrons are transferred from P to a quinone (Q_B) that leaves the RC after being reduced by acquisition of two electrons and two protons. The reduced quinone diffuses to a cytochrome complex where it is oxidized, while an oxidized quinone takes its place in the RC for another cycle of light-driven electron and proton transfer reactions.

The C. aurantiacus RC was isolated in 1983 [11], and since then has been studied biochemically and spectroscopically. Electron microscopy has shown that the RC is surrounded by a ring of light-harvesting proteins [12] that together are called the core complex, similar to the high resolution cryo-EM structures of core complexes from many species of purple phototrophic bacteria and R. castenholzii [13]. Although the C. aurantiacus and R. castenholzii RC-LH core 3-D structures are known at a high resolution [7,9,14], and a number of spectroscopic tools have been used to study the function of this and the C. aurantiacus core complex [15,16,17,18], there is no genetic system for expressing mutant or engineered genes of these simple, thermostable RCs. There are basic scientific questions, such as the reasons for thermal stability, that could be addressed using a genetic approach. There are also potential biotechnical applications for thermostable RCs that harvest solar energy for electrical power. Although the thermophilic cyanobacterium Thermosynechococcus has been used in such applications [19,20], RCs from anoxygenic phototrophs remain largely unexploited in this connection, which would be enhanced by the use of genetic techniques. Furthermore, the number of Chloroflexota and related photosynthesis gene sequences is burgeoning and has far outstripped the number of organisms that have been cultivated. If these genes could be expressed in a tractable host that provides an appropriate membrane system, BChl, and other cofactors needed for assembly of a functional RC, it would greatly expand the diversity of such photosynthetic complexes that could be used to address fundamental scientific questions and exploited in biotechnology applications.

A major obstacle to the use of Escherichia coli or another chemotrophic organism as a host for heterologous expression of photosynthesis genes is that it is not yet possible to obtain a high level of chlorophyll production in such species [21]. We circumvented this obstacle by using an engineered strain of the purple phototrophic bacterium Rhodobacter sphaeroides to express the Chloroflexota bacterium L.E.CH.39_1 RC gene sequences found in a hot spring metagenome [22]. We show that these previously uncharacterized Chloroflexota genes were expressed in the R. sphaeroides host, using a synthetic, codon-optimized operon, resulting in an RC with an absorption spectrum that resembles that of C. aurantiacus and R. castenholzii. This RC was catalytically functional on the basis of light flash-induced oxidation of the P BChls. These results show that this system for heterologous expression of photosynthesis genes from uncultivated organisms has the potential for greatly expanding the diversity of naturally evolved RCs that can be studied, as material for future scientific investigations and biotechnological applications.

As such, we also searched the Joint Genome Institute’s (JGI) Integrated Microbial Genomes and Microbiomes (IMG/M) database for metagenomic-derived sequences encoding homologues of the RC proteins PufM and PufL that were obtained from aquatic environments in the temperature range of 42–90 °C. In addition to identifying putative RC proteins from yet-to-be isolated hot spring bacteria, these comprehensive sequence analyses highlight the existence of highly divergent clades of PufL/PufM proteins encoded by Chloroflexota genomes and related environmental samples. Evolutionarily distinct hot spring metagenomic-derived RC proteins more closely related to proteins from Pseudomonadota were also found and represented the largest number of unique RC protein sequences identified in this analysis. In addition to providing a broad phylogenetic examination of the evolution of PufL/PufM, this analysis represents a resource that can be exploited to readily identify genes from organisms that are likely to be capable of performing anoxygenic photosynthesis in this temperature range.

2. Materials and Methods

2.1. Strains, Growth Conditions and Plasmids

The E. coli strain DH5α was used for cloning, and strain S17-1 [23] was used for conjugation of plasmids into R. sphaeroides. E. coli BL21(DE3) [24] was used for isopropyl β-D-1-thiogalactopyranoside (IPTG) induction of recombinant protein production for purification. R. sphaeroides strain RCx^R [25] containing plasmid pDJ-MK [26] was used as the host strain for the plasmid expressing the Chloroflexota RC genes, and cultures were grown at 30 °C in RLB medium [25] supplemented with tetracycline-HCl (0.5 µg/mL) and kanamycin sulfate (10 µg/mL). Plasmid-containing E. coli strains were grown at 37 °C in Luria–Bertani (LB) medium [27] supplemented with antibiotics at the following concentrations in µg/mL: tetracycline, 10; kanamycin sulfate, 50.

Photosynthesis gene sequences of Chloroflexota bacterium L.E.CH.39_1 [22] (NCBI accession number JACAEO010000223) encoding the RC were modified to match the codon usage of Rhodobacter sphaeroides and interspaced with ribosome binding site sequences designed for optimal expression as a synthetic operon [28,29]. Designed sequences were synthesized (Twist Bioscience, South San Francisco, CA, USA) and assembled using Gibson assembly (NEB HiFi DNA Assembly, New England Biolabs, Ipswich, MA, USA) into the expression plasmid pIND4 (linearized by NcoI + HindIII digestion), which contains an IPTG-inducible promoter and a kanamycin resistance marker [30], yielding plasmid pCFRC. A 6-His tag was incorporated immediately after the N-terminal methionine of the PufM protein. The DNA sequence of this operon is given in Supplementary Table S1, and a map of the recombinant plasmid pCFRC that expresses the Chloroflexota RC genes is shown in Supplementary Figure S1.
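The design steps described above (codon replacement, His-tag insertion after the initial methionine, and joining open reading frames with ribosome binding sites) can be illustrated with a minimal Python sketch. This is not the design software used for pCFRC: the codon table below is an illustrative GC-rich choice rather than measured R. sphaeroides codon frequencies, the ribosome binding site is a placeholder string, and the peptide sequences are toy examples.

```python
# Illustrative "preferred codon" table with GC-rich choices.
# ASSUMPTION: these are NOT measured R. sphaeroides codon frequencies.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "TCG", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTC",
}
STOP_CODON = "TGA"
# ASSUMPTION: placeholder ribosome binding site plus spacer, for illustration only.
HYPOTHETICAL_RBS = "AGGAGG" + "ACAACC"


def add_n_terminal_his_tag(protein: str, n_his: int = 6) -> str:
    """Insert n_his histidines immediately after the N-terminal methionine."""
    if not protein.startswith("M"):
        raise ValueError("protein must start with methionine")
    return "M" + "H" * n_his + protein[1:]


def back_translate(protein: str) -> str:
    """Encode each residue with its preferred codon and append a stop codon."""
    return "".join(PREFERRED_CODON[aa] for aa in protein) + STOP_CODON


def build_operon(proteins: list[str]) -> str:
    """Concatenate ORFs, each preceded by the placeholder RBS."""
    return "".join(HYPOTHETICAL_RBS + back_translate(p) for p in proteins)


# Toy peptides standing in for PufL and PufM (not the real sequences).
toy_puf_l = "MALLS"
toy_puf_m = add_n_terminal_his_tag("MSEYQ")
operon = build_operon([toy_puf_l, toy_puf_m])
print(toy_puf_m)
print(operon)
```

A real design would also screen the result for unwanted restriction sites and secondary structure around each ribosome binding site, which this sketch does not attempt.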

2.2. Protein Purification, SDS-PAGE, and Spectroscopy

The RC was purified essentially as described previously [26], and summarized in the following text. Cells from induced cultures were collected by centrifugation and suspended in resuspension buffer (50 mM Tris-HCl, 150 mM NaCl, 0.5 mM MgCl₂, pH 8.0), disrupted in a French press cell, and centrifuged at 17,600× g for 15 min at 4 °C to pellet unbroken cells. The resultant supernatant liquid was ultracentrifuged at 424,000× g for 15 min at 4 °C to pellet membranes, which were solubilized by the addition of n-dodecyl maltoside to 1%, followed by ultracentrifugation at 424,000× g for 15 min. Solubilized proteins in the supernatant liquid were purified by binding to Ni²⁺-NTA agarose beads, washed with resuspension buffer containing 5 mM imidazole until the A₂₈₀ of the eluate decreased to less than 0.05, after which the RC was eluted with 150 mM imidazole in resuspension buffer.

Proteins were separated in SDS-PAGE using 4% stacking and 12% resolving gels. Samples were mixed with 5× loading buffer and heated at the appropriate temperature for 10 min before loading. Gels were run at 150–200 V for ~45 min or until the dye front reached the bottom of the resolving gel. For total protein visualization, gels were stained overnight with Coomassie brilliant blue and destained the following day.

For measurement of redox-induced changes in absorption, RC samples were reduced by the addition of an L-ascorbic acid solution to a final concentration of 10 mM, and oxidized by the addition of ferricyanide to 10 mM concentration. Photobleaching of RC samples was performed using a 100 ms flash of illumination at 860 nm, while measuring the absorption at 865 nm as previously described [26].

2.3. Bioinformatic Tools and Data

Molecular graphics and analyses were performed with UCSF ChimeraX, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from National Institutes of Health R01-GM129325 and the Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases.

Publicly available metagenomic bins, metagenomic-derived protein sequences, and associated search tools were accessed through the Joint Genome Institute’s IMG/MER (the login version of IMG/M) (April 2025) [31]. The UniProt database (version: 2025_01) [32] was accessed through the EFI-EST tool [33,34] using the Protein Family Addition Option with PF00124; to reduce the number of nodes, the UniRef90 [35] option was selected. Sequence Similarity Networks (SSNs) were visualized in Cytoscape (version 3.10.3) with the Prefuse Force Directed OpenCL layout without edge weights [36]. InterProScan [37] was used to determine the sequence coordinates of PF00124 domains [38]. Phylogenetic trees were built using MAFFT [39] for multiple sequence alignments and FastTreeMP [40] with default parameters on the CIPRES Science Gateway [41]. The resulting trees were visualized and annotated in iTOL [42]. Multiple sequence alignments were manually edited in Jalview [43] to remove fragments and poorly aligned regions; sequence redundancy in the edited alignment was reduced based on 95% sequence identity using the Remove Redundancy tool in Jalview, i.e., if the percentage identity between the aligned positions of any two sequences exceeds 95%, the shorter sequence is discarded. Protein sequences that remained after this filtering step are referred to as “unique”. Gene neighborhoods for selected proteins from UniProt were visualized with EFI-GNT [34]. Data and information for SSNs, the alignment, the tree, and retrieved gene neighborhood information are available in Tables S2–S4.
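The redundancy rule described above (discard the shorter of any two sequences whose aligned positions exceed 95% identity) can be sketched as follows. This is an approximation written for illustration, not Jalview's implementation: it assumes that "aligned positions" means columns in which both sequences have a residue, and it applies the rule greedily from the longest sequence to the shortest. The sequence names and sequences are toy data.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity over columns where both aligned sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def ungapped_length(seq: str) -> int:
    """Number of residues in an aligned sequence, ignoring gap characters."""
    return len(seq) - seq.count("-")


def remove_redundancy(aligned: dict[str, str], threshold: float = 95.0) -> list[str]:
    """Return names of sequences kept after discarding redundant shorter ones."""
    # Visit the longest sequences first so the shorter member of a pair is dropped.
    order = sorted(aligned, key=lambda n: ungapped_length(aligned[n]), reverse=True)
    kept: list[str] = []
    for name in order:
        if all(percent_identity(aligned[name], aligned[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy alignment: a full-length sequence, a fragment of it, and a divergent homolog.
toy_alignment = {
    "full":      "MKTAYIAKQRMKTAYIAKQR",
    "fragment":  "MKTAYIAKQRMK--------",
    "divergent": "MKTAYLLKQRMHTAYIGKQR",
}
unique = remove_redundancy(toy_alignment)
print(unique)
```

In this toy case the fragment is identical to the full-length sequence over its aligned positions and is discarded, while the divergent homolog (80% identity) is retained as "unique".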

2.3.1. Curation of Hot Spring Metagenomes

Identification of hot spring metagenomic bins involved a search of public IMG metadata for a user-supplied sample collection temperature above 42 °C associated with datasets labeled as “Thermal springs: Hot (42–90 °C)”. Not all samples in this Ecosystem Subtype were collected from waters in that temperature range, and those samples were excluded from this analysis (64 assemblies with a curated collection temperature below 42 °C). For missing metadata, we curated sample collection temperatures from the associated published studies, resulting in a conservative set of metagenomics-derived sequences isolated from samples of at least 42 °C (3070 assemblies were missing a recorded collection temperature although they are annotated as being collected from “Thermal springs: Hot (42–90 °C)”). The resulting assemblies were then searched for predicted protein sequences similar to PufL or PufM (described in the next section). This approach reduced the number of metagenomic bins from 3641 to 233 with a confirmed collection temperature between 42 °C and 90 °C and encoding a PufL or PufM sequence (Table S5). These 233 metagenomic samples are referred to as “hot spring metagenomic bins”, and proteins from these bins as “hot spring proteins”. These assemblies were largely derived from the United States (225 assemblies), with most originating from Yellowstone National Park (207 assemblies). The resulting curated data are associated with the studies in references [44,45,46,47,48,49,50,51]. For UniProt- or NCBI-derived protein sequences, we collected the isolation source from the corresponding GenBank record.
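The curation logic above can be summarized as a simple filter. The sketch below is illustrative only: the record fields (`recorded_temp_c`, `curated_temp_c`, `has_puf_hit`) and the example records are hypothetical and do not correspond to actual IMG metadata field names, and the temperature bounds are treated as inclusive by assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assembly:
    assembly_id: str
    recorded_temp_c: Optional[float]  # user-supplied metadata; may be missing
    curated_temp_c: Optional[float]   # temperature curated from the published study
    has_puf_hit: bool                 # encodes a PufL or PufM homolog


def collection_temp(a: Assembly) -> Optional[float]:
    """Prefer the recorded temperature; fall back to the curated one."""
    if a.recorded_temp_c is not None:
        return a.recorded_temp_c
    return a.curated_temp_c


def select_hot_spring_bins(assemblies, low: float = 42.0, high: float = 90.0):
    """Keep assemblies with a confirmed temperature in range and a Puf hit."""
    selected = []
    for a in assemblies:
        t = collection_temp(a)
        if t is not None and low <= t <= high and a.has_puf_hit:
            selected.append(a.assembly_id)
    return selected


# Hypothetical records covering the cases described in the text.
toy = [
    Assembly("bin_A", 65.0, None, True),   # recorded temperature in range
    Assembly("bin_B", None, 55.0, True),   # missing metadata, curated from a study
    Assembly("bin_C", 38.0, None, True),   # below 42 °C: excluded
    Assembly("bin_D", None, None, True),   # temperature cannot be confirmed: excluded
    Assembly("bin_E", 70.0, None, False),  # no PufL/PufM homolog: excluded
]
kept = select_hot_spring_bins(toy)
print(kept)
```

Excluding records whose temperature cannot be confirmed is what makes the resulting set conservative.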

2.3.2. Identification of Reaction Center Proteins PufL and PufM

We searched the predicted protein sequences derived from the hot spring metagenomic bins using PF00124 and a blastp search with PufL and PufM from Chloroflexota bacterium L.E.CH.39_1 and the fused PufL-PufM protein from Roseiflexus sp. 629F.7. The blastp search was used to identify homologous proteins that do not have a significant match to PF00124, which can be true for related but highly divergent sequences. The blastp search identified 9 sequences from hot spring metagenomic bins that are not recognized as belonging to PF00124, for a total of 29,685 protein sequences. An additional 51,363 UniProt sequences were identified using PF00124. Neither the blastp search nor the PF00124 search distinguishes PufM and PufL from the homologous D1 (PsbA) and D2 (PsbD) photosystem II (PSII) reaction center proteins from cyanobacteria, algae and land plants. To distinguish PufL and PufM from PsbA and PsbD, we generated an SSN with an alignment score of 90, which clearly delineates clusters containing PsbA or PsbD from clusters containing PufL or PufM (Figure S2).

To further analyze PufL and PufM, additional sequence analyses were performed. For each node in a Puf cluster (Figure S2), the member with the longest sequence was kept for downstream analysis. PufL and PufM are predicted to be encoded by a single ORF in some Chloroflexota genomes [47,52]. There is evidence that the mature form of these gene products is separate PufL and PufM proteins [9], but it is unknown at what point the fused polypeptide is processed or whether the RNA is processed. Nevertheless, a PufL-PufM fusion protein is computationally predicted during genome annotation in these cases. Therefore, we used InterProScan and alignments to PF00124 to identify the sequence coordinates for individual domains in both single- and double-domain proteins. This resulted in the identification of 164 PufL-PufM fusions from hot spring metagenomic bins and 20 representative fusions from UniProt. In Tables S2 and S3, domains from a fusion protein are labeled with “N” or “C”. The sequences corresponding to each domain (either from a fusion or non-fusion protein) were then used to regenerate the SSN, using alignment scores of 50 or 80. Lower alignment scores result in more edges connecting nodes, while higher alignment scores result in fewer edges. Use of multiple alignment scores allows the visualization of similarity between representative protein sequences at different thresholds. For each network, a node represents a cluster of sequences that share 95% sequence identity calculated by EFI-EST. Representative sequences and cluster members can be found in Table S2. Representative sequences from each node in the SSN generated with an alignment score of 80 were used to generate a multiple sequence alignment and phylogenetic tree (described in more detail in the previous section). PsbA1, PsbA2, and PsbD1 from Acaryochloris marina were used to root the tree in iTOL.
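The effect of the alignment score threshold on network structure can be illustrated with a small sketch. It is not the EFI-EST implementation: it assumes that an edge is drawn when a pairwise alignment score is greater than or equal to the threshold, and the node names and scores are invented for illustration.

```python
def build_ssn(nodes, pair_scores, threshold):
    """Return an adjacency map keeping only edges with score >= threshold."""
    graph = {n: set() for n in nodes}
    for (a, b), score in pair_scores.items():
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_clusters(graph):
    """Find connected components with an iterative depth-first search."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def summarize(nodes, pair_scores, threshold):
    """Return (edge count, sorted sizes of multi-node clusters, singleton count)."""
    graph = build_ssn(nodes, pair_scores, threshold)
    n_edges = sum(len(neighbors) for neighbors in graph.values()) // 2
    clusters = connected_clusters(graph)
    sizes = sorted(len(c) for c in clusters if len(c) > 1)
    singletons = sum(1 for c in clusters if len(c) == 1)
    return n_edges, sizes, singletons


# Hypothetical nodes and pairwise alignment scores.
toy_nodes = ["L_a", "L_b", "M_a", "M_b", "fragment"]
toy_scores = {
    ("L_a", "L_b"): 120, ("M_a", "M_b"): 95,
    ("L_a", "M_a"): 60, ("L_b", "M_b"): 55,
    ("fragment", "L_a"): 20,
}
for cutoff in (50, 80):
    print(cutoff, summarize(toy_nodes, toy_scores, cutoff))
```

With these toy scores, the lower threshold joins the PufL-like and PufM-like nodes into one cluster, the higher threshold separates them into two, and the low-scoring fragment remains an edgeless node at both thresholds.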

Each subsequent analysis reduced the number of non-redundant protein sequences. Protein sequences were considered redundant if they shared 95% sequence identity or higher to one another. With metagenomic-derived proteins, using sequence identity to cluster sequences can be potentially misleading, as two proteins would be considered to be less than 95% identical if one sequence is shorter by 5%. Therefore, the phylogenetic reconstruction represents a more conservative estimate of uniqueness, as the multiple sequence alignment was edited to remove short sequences and poorly aligned regions before reducing the number of representative sequences based on sequence identity. As a result, the 1716 protein sequences representing each connected node in the PufLM SSN were reduced to 995 representative sequences in the phylogenetic tree.
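The length effect can be made concrete with a small worked example. It assumes, for illustration only, that the clustering identity is normalized by the length of the longer sequence; the exact normalization used by the clustering tool is not stated here. For a shorter sequence of length $L_s$ that matches a longer sequence of length $L_l$ at every aligned position:

$$\text{identity}_{\max} = \frac{L_s}{L_l}, \qquad \frac{280}{300} \approx 93.3\% < 95\%$$

Under this assumption, a hypothetical 280-residue fragment of a 300-residue protein would be counted as a separate sequence even with no mismatches, whereas identity computed over the aligned positions only would be 100%.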

Following SSN construction, nodes without connecting edges were not retained in the subsequent analysis. In Table S2, these are listed as singletons under “cluster”. Typically, these nodes represent small incomplete protein sequences, i.e., fragments. However, the recently isolated PufL and PufM sequences in Vulcanimicrobium alpinum [53] were represented by edgeless nodes in the PF00124 SSN and were not included in downstream analyses. Therefore, to ensure that removal of node singletons did not inadvertently remove divergent full-length hot spring proteins, we performed a blastp search against the predicted proteins from each hot spring metagenomic bin using PufL from V. alpinum; we did not identify any similar hot spring proteins that are not already represented in the phylogenetic reconstruction.

3. Results

3.1. Design, Construction and Expression of Chloroflexota Synthetic pufLM Operon Encoding the RC

The genome of Chloroflexota L.E.CH.39_1 (hereafter referred to as Chloroflexota sp.) currently is in 459 contigs [22], with the RC pufL and pufM genes adjacent to each other on one contig. The amino acid sequence identities in full-length (Needleman-Wunsch) alignments between the proteins encoded by puf gene homologues in Chloroflexota sp. and C. aurantiacus are: pufL, 85%; pufM, 83%. For expression from the plasmid vector in the R. sphaeroides host, an operon (Figure 2) was synthesized with codons changed to the relatively high GC codons most frequently found in R. sphaeroides (see Section 2). An AlphaFold model [54] predicted the N-termini of PufL and PufM to be located in the cytoplasm (Figure 3). A 6×-His tag was incorporated at the N-terminus of the PufM protein to facilitate purification by metal ligand affinity chromatography. The operon was inserted into the IPTG-inducible expression plasmid pIND4, yielding plasmid pCFRC, which was introduced into the R. sphaeroides strain RCx^R (pDJ-MK) that lacks all puf genes encoding the RC and expresses menaquinone biosynthesis genes carried on a second plasmid [26].
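For readers unfamiliar with full-length (global) alignment, the sketch below shows a minimal Needleman-Wunsch implementation and a percent identity calculation. It is a teaching example under stated assumptions, not the procedure used to obtain the values above: it uses a simple match/mismatch score and a linear gap penalty instead of a substitution matrix with affine gaps, it defines identity as matches divided by alignment length (other denominators are also in use), and the peptide sequences are toy data.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with linear gap penalty; returns the two aligned strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def alignment_identity(x: str, y: str) -> float:
    """Percent of alignment columns with identical residues (gaps never match)."""
    matches = sum(p == q and p != "-" for p, q in zip(x, y))
    return 100.0 * matches / len(x)


# Toy peptides differing by a single deleted residue.
aligned_a, aligned_b = needleman_wunsch("MKTAYIAK", "MKTAIAK")
print(aligned_a)
print(aligned_b)
print(alignment_identity(aligned_a, aligned_b))
```

Because the alignment spans both sequences end to end, terminal length differences lower the reported identity, unlike a local alignment.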

3.2. SDS-PAGE, Absorption Spectra and Catalytic Activity of the RC Complex Produced by the pufLM Operon

The SDS-PAGE analysis of the RC proteins was complicated by their aberrant mobility and aggregation, which were affected by the temperature of sample treatment (Figure 4). Increasing the temperature of sample heating from 35 to 80 °C prior to loading the gel resulted in a decrease in intensity of the faster-migrating bands, and the appearance of bands near the top of the gel. The masses of PufL and 6His-PufM are 35.1 and 35.5 kD; therefore, these proteins would migrate as a single band in the gel system employed. Furthermore, hydrophobic proteins such as PufL and PufM often migrate faster than would be indicated by their mass. Therefore, we suggest that the broad band at approximately the 27 kD position represents the PufL and 6His-PufM proteins, whereas the slower-migrating bands represent aggregations of these proteins.

After affinity chromatography purification, the RC complex yielded an absorption spectrum with three peaks in the 700 to 900 nm range (the Qy bands), which is characteristic of anoxygenic Type II RCs (Figure 5). In this case, we attribute these peaks to the presence of the P BChls (~860 nm), accessory BChl (~810 nm), and BPhes (~760 nm). In the visible (Qx) region of the spectrum the peak near 535 nm is attributed to BPhe, and the peak near 600 nm to BChl. The relative peak heights in both the Qy and Qx regions of the spectrum are consistent with the presence of three BChls and three BPhes per RC, as has been shown for the RCs of C. aurantiacus and R. castenholzii [11,52]. Figure 6 shows the ferricyanide-oxidized minus ascorbate-reduced absorption spectrum of the Chloroflexota sp. RC. The bleaching of the 860 nm peak after oxidation confirms the assignment of this band to the P BChls, and shows that they are potentially capable of initiating electron transfer to the RC quinones as diagrammed in Figure 1. To determine whether this Chloroflexota sp. RC is fully catalytically active (i.e., capable of photon-driven Q_B reduction as shown in Figure 1), we specifically excited the P BChls of the RC using a flash of 860 nm light, and followed the kinetics of bleaching and recovery at 865 nm. The rate of recovery at 865 nm reflects the rate of the back reaction of electron transfer from the quinones to P⁺, which differs depending on whether Q_B is present. As shown in Figure 7, there was a rapid decrease in absorbance at 865 nm after the flash, followed by a relatively slow recovery that started to plateau after about 5 s. These data show that this RC is fully functional in terms of photon-driven electron transfer from the P BChls to the second of the two quinones, Q_B. This interpretation is based on the rate of recovery, which in the R. sphaeroides RC takes about 1 s from Q_B⁻, whereas recombination from Q_A⁻ takes only about 0.1 s [55]. The R. castenholzii and C. aurantiacus RCs similarly have fast recombination rates from Q_A⁻, on the order of 0.04 to 0.06 s [15,56]. Therefore, the kinetics of recovery that we observed for this Chloroflexota RC are closer to the values for recovery by electron transfer to P⁺ from Q_B⁻ than from Q_A⁻ in homologous RCs, indicating that this RC is replete in cofactors and fully functional.
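The reasoning about recovery times can be illustrated numerically. The sketch below assumes, as a simplification, that the recovery of the 865 nm bleach follows a single exponential and that the approximate times quoted above (about 0.1 s for Q_A⁻ and about 1 s for Q_B⁻) can be treated as exponential time constants; neither assumption comes from the measurements themselves, and real traces are often multi-exponential.

```python
import math


def fraction_recovered(t_s: float, tau_s: float) -> float:
    """Fraction of the bleach recovered at time t for a single exponential decay."""
    return 1.0 - math.exp(-t_s / tau_s)


# ASSUMPTION: quoted recombination times are used as time constants (tau).
candidates = {"recombination from Q_A- (tau ~ 0.1 s)": 0.1,
              "recombination from Q_B- (tau ~ 1 s)": 1.0}

for label, tau in candidates.items():
    for t in (0.5, 1.0, 5.0):
        print(f"{label}: t = {t} s, recovered = {fraction_recovered(t, tau):.3f}")
```

Under these assumptions, recombination from Q_A⁻ would be essentially complete within half a second, whereas recombination from Q_B⁻ would still be in progress after one second and would approach a plateau only after several seconds, which is the pattern the text describes for the observed trace.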

3.3. Bioinformatic Analyses of RC Proteins

Given the success with heterologous expression of metagenomic-derived Chloroflexota sp. proteins to study functions encoded by non-cultivable bacteria, we next aimed to generate a resource of diverse metagenomic sequences potentially adapted to anoxygenic photosynthesis at extreme temperatures. We searched publicly available metagenomic samples collected from hot (42–90 °C) springs available through the JGI IMG database and focused on homologs of PufL and PufM. Identified sequences were clustered to reduce the number of identical and highly similar sequences and were compared to homologous proteins available in the UniProt database, which were also clustered to reduce the number of identical and highly similar sequences. Based on an iterative approach that combines SSNs (Figure 8 and Figure S2) with phylogenetic reconstruction (Figure 9), we identified 386 unique PufL homologs, 577 unique PufM homologs, 14 N-terminal unique PufL-domains, and 18 C-terminal unique PufM-domains. (See Section 2 for a discussion on identification of proteins and defining uniqueness.) These homologs and related sequences are grouped into 8 clusters in the network and 8 corresponding clades in the tree: 4 for PufL and 4 for PufM (Figure 8 and Figure 9). The N- and C-terminal domains were predicted from genes encoding PufL-PufM fusions. There is evidence that the mature form of these gene products is separate PufL and PufM proteins [9], but it is unknown at what point the fused polypeptide is processed (or whether the RNA is processed). To avoid potential artifacts in the analyses, only the sequence corresponding to the domain PF00124 was used for comparisons, and “N_” is used to distinguish the PufL-like region from these fusions and “C_” is used to distinguish the PufM-like region.
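As a consistency check on these counts (simple arithmetic on the numbers given in the text, assuming the four categories do not overlap):

$$386 + 577 + 14 + 18 = 995$$

This total matches the 995 representative sequences in the phylogenetic tree reported in Section 2.3.2.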

Based on a sub-sampled gene neighborhood analysis, photosynthesis operons typically contain genes for both PufL and PufM (Figure 10 and Table S2). This trend holds true even for sequences from clusters PufL_4 and PufM_7 where the photosynthesis genes are separated into two distinct genomic locations, one containing adjacent genes for PufL and PufM and the second locus containing adjacent genes for PufA, PufB and PufC (Figure 10). Therefore, the unequal number of non-redundant PufL and PufM proteins is likely due to a combination of a small bias in the number of PufM homologs initially identified (~25% more PufM than PufL) and a larger bias in the similarity among PufM homologs compared to among PufL homologs. As seen in Figure 8A, the main PufM cluster (cluster 5) is less compact than the main PufL cluster (cluster 1), which reflects a larger proportion of low alignment scores between PufM homologs, i.e., fewer edges between PufM nodes. The extent of divergence between PufM homologs compared to PufL homologs is also apparent in the phylogenetic tree (Figure 9).

As reflected in the SSN and phylogenetic tree, we identified 4 distinct clusters/clades for both PufL and PufM homologs (Figure 8 and Figure 9). The largest clusters (clusters PufL_1 and PufM_5) are dominated by sequences from Pseudomonadota, while the other clusters (PufL_2, PufL_3, PufL_4, PufM_6, PufM_7, and PufM_8) are dominated by sequences from Chloroflexota and the hot spring samples (Figure 8B). The N-terminal PufL region and C-terminal PufM region from the fused gene products unique to Chloroflexota are found exclusively in clusters 2 and 6, respectively. We also identified three PufM homologs with a C-terminal OmpA-like domain in PufM cluster 5 (Figure 10E). The hypothesis that this OmpA-like domain may function in photosynthesis was further supported by the identification of analogous fusions where PufC homologs contain an OmpA-like N-terminal domain (Figure 10E). However, as these Puf-OmpA fusions are relatively rare, we cannot rule out the possibility of inaccurate gene models. In the case of genes encoding the fused PufL and PufM, a single ORF is a conserved characteristic for the proteins in clusters 2 and 6. Those sequences in either cluster 2 or 6 with only a single domain identified appear to be fragments.

Based on our iterative approach that combined SSNs and phylogenetic reconstruction, we identified 30 PufL sequences, 29 PufM sequences, and 2 C-terminal domains that are unique and represented by an IMG hot spring sequence (Table S6). As the UniProt database is not an exhaustive collection of predicted proteins, we searched NCBI’s GenBank using a threshold of 95% identity to determine if similar proteins have been detected in deposited sequencing projects. This analysis suggests 15 unique hot spring sequences in PufL_1, 3 in PufL_3, 1 in PufL_4, 17 in PufM_5, 1 in PufM_7, and 1 in PufM_8 (Figure 9 and Table S6). The closest homologs of these proteins are also often from hot spring samples. Exceptions do occur, such as three Puf sequences from IMG identified from the Seven Mile Hole Area in Yellowstone National Park with closest matches (83–85% identity) to a Pseudomonadota bacterium sequenced from plastic particles (plastisphere) in the Pacific Ocean, and two IMG Puf sequences from Empress Pool OF2, Yellowstone National Park, with closest matches (88% identity) to sequences from plastic particles in the Atlantic Ocean [57].

4. Discussion

In this paper we describe how the genetically tractable alphaproteobacterium R. sphaeroides was used as a chassis for the expression of distantly related Chloroflexota photosynthesis genes mined from hot spring metagenomic samples. Although Chloroflexota sp. has never been cultivated or microscopically visualized, the RC proteins are 82–85% identical in sequence alignments to those of C. aurantiacus, whereas the corresponding identities between those of R. castenholzii and the Chloroflexota sp. are 49% (PufL) and 48% (PufM). These relative similarities are implicit in the SSNs and phylogenetic tree generated in our bioinformatic analyses. Therefore, the photosynthetic apparatus of Chloroflexota sp. appears to be physiologically similar to that of C. aurantiacus, although there is no evidence of chlorosome-related proteins encoded in the contigs. Our assumption that the Chloroflexota sp. is a thermophile and contains a thermostable RC is supported by a heat denaturation experiment in which it was found that there was little change in the RC accessory BChl peak after 15 min at 50 °C (Figure S4). This is similar to the RC of C. aurantiacus [58,59] and in contrast to the RC of the mesophile R. sphaeroides, which loses about 90% of this peak after 15 min at 50 °C [58,60]. Attempts to obtain anaerobic phototrophic growth of R. sphaeroides strain RCx^R [25] containing plasmids pDJ-MK [26] and pCFRC were unsuccessful, which could be due to an inability of the R. sphaeroides cytochrome bc₁ complex, which oxidizes ubiquinol [6], to oxidize menaquinol produced by the Chloroflexota sp. RC.

Our bioinformatic analysis of PufL and PufM sequences shows that hot spring sequences almost exclusively cluster with sequences from Chloroflexota, supporting previous analyses indicating that Chloroflexota are abundant in hot spring microbial samples [38]. Not only are they abundant, but the Chloroflexota proteins can each be assigned to one of three ancient lineages supported by the sequence analyses and the distinct operonic structure of photosynthesis genes. Intriguingly, a conservative approach that takes into account the sequence identity of aligned positions in the multiple sequence alignment suggests that a greater number of unique hot spring sequences (i.e., less than 95% identity to proteins deposited in UniProt) are related to proteins from Pseudomonadota. As such, even though Chloroflexota PufL and PufM sequences dominate the hot spring samples and are derived from distinct and ancient lineages, Pseudomonadota-like metagenomic-derived sequences represent a broader swath of sequence diversity.

The overwhelming majority of diverse and unique PufL and PufM sequences have yet to be experimentally characterized. This lack of data reflects a combination of the limited availability of genetically tractable model systems, the inability to cultivate a large number of the anoxygenic phototrophs detected with DNA sequencing, and rapidly growing collections of predicted protein sequences that are impractical to characterize experimentally. Here, we present a resource of unique and diverse PufL and PufM sequences for selecting candidate sequences, and an experimental approach to study these proteins through heterologous expression. Combined with the decreasing cost of DNA synthesis, these resources can enable the study and genetic modification of novel photosynthesis proteins from thermophiles that may never be amenable to cultivation as pure cultures in the laboratory.

5. Conclusions

We describe a general method to express anoxygenic type II RC genes from uncultivated hot spring organisms in the mesophile R. sphaeroides, to allow for the study of uncharacterized proteins, and present results using sequences derived from the metagenome of an otherwise uncharacterized Chloroflexota sp. to show proof of principle. This approach grants access to the well-developed genetic tools available for application in R. sphaeroides for a variety of purposes. It remains to be seen whether obstacles exist with genes from other species, as in our initial attempt to express the fused pufLM genes of R. castenholzii. Nevertheless, it is possible that with some species additional genes encoding the entire RC-LH core and cytochromes could be functionally expressed, to recreate a thermophilic light-driven system for transmembrane translocation of protons. Our bioinformatic study makes the point that there is a potential treasure trove available for exploitation, and provides a focused collection of sequences directly relevant to the wet lab work we describe, i.e., curation and analysis to aid in the selection of such sequences for study by the community of scientists interested in photosynthesis.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom15111529/s1, Figure S1: Genetic map of plasmid pCFRC; Figure S2: Identification of PufL and PufM proteins encoded by metagenomic samples from hot springs; Figure S3: Representative taxa for each node in Figure 8; Figure S4: Change in absorbance of the RC accessory BChl peak at 810 nm as a function of time of incubation at the temperatures indicated; Table S1: DNA sequence of operon; Table S2: Information associated with nodes in PufL/PufM sequence similarity network; Table S3: Information associated with phylogenetic tree; Table S4: Gene neighborhood information retrieved with EFI-GNT for proteins in UniProt; Table S5: Metadata associated with metagenomic samples included in analysis; Table S6: Unique metagenomic PufL and PufM sequences.

Author Contributions

Conceptualization, J.T.B., I.K.B. and C.E.B.-H.; methodology, J.T.B., I.K.B. and C.E.B.-H.; investigation, Y.R., Y.K. and M.T.; resources, J.T.B., I.K.B. and C.E.B.-H.; data curation, C.E.B.-H.; writing—original draft preparation, J.T.B. and C.E.B.-H.; writing—review and editing, Y.R., Y.K., M.T. and I.K.B.; supervision, J.T.B. and I.K.B.; funding acquisition, J.T.B. and C.E.B.-H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NSERC grants RGPIN 2018–03898 and 2025–04928 to J.T.B.; work at the Molecular Foundry was supported by the Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 (C.E.B.-H.); work conducted by the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy operated under Contract No. DE-AC02-05CH11231 (C.E.B.-H. and I.K.B.); Y.R. was supported by a grant from the Higher Education Commission (Pakistan) 3-1/PDFP/HEC/2022(B-3)/2339/02.

Institutional Review Board Statement

This study did not require ethical approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Acknowledgments

Molecular graphics images were prepared using UCSF ChimeraX [UCSF ChimeraX: Tools for structure building and analysis. Meng EC, Goddard TD, Pettersen EF, Couch GS, Pearson ZJ, Morris JH, Ferrin TE. Protein Sci. 2023 Nov;32(11):e4792], developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from National Institutes of Health R01-GM129325 and the Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases. DNA synthesis and expression constructs were provided by the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, supported by the Office of Science of the U.S. Department of Energy operated under Contract No. DE-AC02-05CH11231 (proposal:510024, doi.org/10.46936/10.25585/60008867).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

RC(s)	Reaction center(s)
BChl(s)	Bacteriochlorophyll(s)
BPhe(s)	Bacteriopheophytin(s)
P	Primary donor or special pair of BChls in the RC
Q_A	Quinone within the RC
Q_B	Quinone within the RC
sp.	Species
ms	millisecond
nm	nanometer
NTA	nitrilotriacetic acid
JGI	Joint Genome Institute
IMG/M	Integrated Microbial Genomes and Microbiomes
SSN(s)	Sequence similarity network(s)

References

1. Wiegand, S.; Sobol, M.; Schnepp-Pesch, L.K.; Yan, G.; Iqbal, S.; Vollmers, J.; Muller, J.A.; Kaster, A.K. Taxonomic re-classification and expansion of the phylum Chloroflexota based on over 5000 genomes and metagenome-assembled genomes. Microorganisms 2023, 11, 2612.
2. Freches, A.; Fradinho, J.C. The biotechnological potential of the Chloroflexota phylum. Appl. Environ. Microbiol. 2024, 90, e0175623.
3. Tsuji, J.M.; Shaw, N.A.; Nagashima, S.; Venkiteswaran, J.J.; Schiff, S.L.; Watanabe, T.; Fukui, M.; Hanada, S.; Tank, M.; Neufeld, J.D. Anoxygenic phototroph of the Chloroflexota uses a type I reaction centre. Nature 2024, 627, 915–922.
4. Blankenship, R.E. Molecular Mechanisms of Photosynthesis; Blackwell Science: Oxford, UK, 2002.
5. Liu, L.N.; Bracun, L.; Li, M. Structural diversity and modularity of photosynthetic RC-LH1 complexes. Trends Microbiol. 2024, 32, 38–52.
6. Kirmaier, C.; Blankenship, R.E.; Holten, D. Formation and decay of radical-pair state P⁺I⁻ in Chloroflexus aurantiacus reaction centers. Biochim. Biophys. Acta 1986, 850, 275–285.
7. Huang, G.Q.; Dong, S.S.; Ma, L.; Li, L.; Ju, J.X.; Wang, M.J.; Zhang, J.P.; Sui, S.F.; Qin, X.C. Cryo-EM structure of a minimal reaction center-light-harvesting complex from the phototrophic bacterium Chloroflexus aurantiacus. J. Integr. Plant Biol. 2025, 67, 967–978.
8. Gardiner, A.T.; Nguyen-Phan, T.C.; Cogdell, R.J. A comparative look at structural variation among RC-LH1 ‘Core’ complexes present in anoxygenic phototrophic bacteria. Photosynth. Res. 2020, 145, 83–96.
9. Xin, Y.; Shi, Y.; Niu, T.; Wang, Q.; Niu, W.; Huang, X.; Ding, W.; Yang, L.; Blankenship, R.E.; Xu, X.; et al. Cryo-EM structure of the RC-LH core complex from an early branching photosynthetic prokaryote. Nat. Commun. 2018, 9, 1568.
10. Meng, E.C.; Goddard, T.D.; Pettersen, E.F.; Couch, G.S.; Pearson, Z.J.; Morris, J.H.; Ferrin, T.E. UCSF ChimeraX: Tools for structure building and analysis. Protein Sci. 2023, 32, e4792.
11. Pierson, B.K.; Thornber, J.P. Isolation and spectral characterization of photochemical reaction centers from the thermophilic green bacterium Chloroflexus aurantiacus strain J-10-f1. Proc. Natl. Acad. Sci. USA 1983, 80, 80–84.
12. Bina, D.; Gardian, Z.; Vacha, F.; Litvin, R. Supramolecular organization of photosynthetic membrane proteins in the chlorosome-containing bacterium Chloroflexus aurantiacus. Photosynth. Res. 2014, 122, 13–21.
13. Swainsbury, D.J.K.; Qian, P.; Hitchcock, A.; Hunter, C.N. The structure and assembly of reaction centre-light-harvesting 1 complexes in photosynthetic bacteria. Biosci. Rep. 2023, 43, BSR20220089.
14. Qi, C.H.; Wang, G.L.; Wang, F.F.; Xin, Y.Y.; Zou, M.J.; Madigan, M.T.; Wang-Otomo, Z.Y.; Ma, F.; Yu, L.J. New insights on the photocomplex of Roseiflexus castenholzii revealed from comparisons of native and carotenoid-depleted complexes. J. Biol. Chem. 2023, 299, 105057.
15. Collins, A.M.; Kirmaier, C.; Holten, D.; Blankenship, R.E. Kinetics and energetics of electron transfer in reaction centers of the photosynthetic bacterium Roseiflexus castenholzii. Biochim. Biophys. Acta 2011, 1807, 262–269.
16. Wang, X.P.; Yu, B.Y.; Qi, C.H.; Wang, G.L.; Zou, M.J.; Zhang, C.F.; Yu, L.J.; Ma, F. Energy Transfer and Exciton Relaxation in B880-B800-RC Complex through Two-Dimensional Electronic Spectroscopy. J. Phys. Chem. Lett. 2024, 15, 3619–3626.
17. Xin, Y.Y.; Lin, S.; Blankenship, R.E. Femtosecond spectroscopy of the primary charge separation in reaction centers of Chloroflexus aurantiacus with selective excitation in the Q and soret bands. J. Phys. Chem. A 2007, 111, 9367–9373.
18. Zabelin, A.A.; Kovalev, V.B.; Khristin, A.M.; Khatypov, R.A.; Shkuropatov, A.Y. Primary charge separation in Chloroflexus aurantiacus reaction centers at room temperature: Ultrafast transient absorption measurements on Q_A-depleted preparations with native and chemically modified bacteriopheophytin composition. Photosynth. Res. 2025, 163, 1–15.
19. Kato, M.; Cardona, T.; Rutherford, A.W.; Reisner, E. Covalent immobilization of oriented photosystem II on a nanostructured electrode for solar water oxidation. J. Am. Chem. Soc. 2013, 135, 10610–10613.
20. Rasul, F.; You, D.W.; Jiang, Y.; Liu, X.J.; Daroch, M. Thermophilic cyanobacteria-exciting, yet challenging biotechnological chassis. Appl. Microbiol. Biotechnol. 2024, 108, 270.
21. Chen, G.E.; Hunter, C.N. Engineering chlorophyll, bacteriochlorophyll, and carotenoid biosynthetic pathways in Escherichia coli. ACS Synth. Biol. 2023, 12, 2236–2244.
22. Levy-Booth, D.J.; Hashimi, A.; Roccor, R.; Liu, L.Y.; Renneckar, S.; Eltis, L.D.; Mohn, W.W. Genomics and metatranscriptomics of biogeochemical cycling and degradation of lignin-derived aromatic compounds in thermal swamp sediment. ISME J. 2021, 15, 879–893.
23. Simon, R.; Priefer, U.; Pühler, A. A broad host range mobilization system for in vivo genetic engineering: Transposon mutagenesis in Gram-negative bacteria. Nat. Biotechnol. 1983, 1, 784–791.
24. Studier, F.W.; Moffatt, B.A. Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J. Mol. Biol. 1986, 189, 113–130.
25. Jun, D.; Saer, R.G.; Madden, J.D.; Beatty, J.T. Use of new strains of Rhodobacter sphaeroides and a modified simple culture medium to increase yield and facilitate purification of the reaction centre. Photosynth. Res. 2014, 120, 197–205.
26. Jun, D.; Richardson-Sanchez, T.; Mahey, A.; Murphy, M.E.P.; Fernandez, R.C.; Beatty, J.T. Introduction of the menaquinone biosynthetic pathway into Rhodobacter sphaeroides and de novo synthesis of menaquinone for incorporation into heterologously expressed integral membrane proteins. ACS Synth. Biol. 2020, 9, 1190–1200.
27. Sambrook, J.; Fritsch, E.F.; Maniatis, T. Molecular Cloning: A Laboratory Manual, 2nd ed.; Cold Spring Harbor Laboratory Press: Woodbury, NY, USA, 1989.
28. Oberortner, E.; Cheng, J.F.; Hillson, N.J.; Deutsch, S. Streamlining the design-to-build transition with build-optimization software tools. ACS Synth. Biol. 2017, 6, 485–496.
29. Salis, H.M.; Mirsky, E.A.; Voigt, C.A. Automated design of synthetic ribosome binding sites to control protein expression. Nat. Biotechnol. 2009, 27, 946–950.
30. Ind, A.C.; Porter, S.L.; Brown, M.T.; Byles, E.D.; de Beyer, J.A.; Godfrey, S.A.; Armitage, J.P. Inducible-expression plasmid for Rhodobacter sphaeroides and Paracoccus denitrificans. Appl. Environ. Microbiol. 2009, 75, 6613–6615.
31. Chen, I.M.A.; Chu, K.; Palaniappan, K.; Ratner, A.; Huang, J.H.; Huntemann, M.; Hajek, P.; Ritter, S.J.; Webb, C.; Wu, D.Y.; et al. The IMG/M data management and analysis system v.7: Content updates and new features. Nucleic Acids Res. 2023, 51, D723–D732.
32. Bateman, A.; Martin, M.J.; Orchard, S.; Magrane, M.; Adesina, A.; Ahmad, S.; Bowler-Barnett, E.H.; Bye-A-Jee, H.; Carpentier, D.; Denny, P.; et al. UniProt: The universal protein knowledgebase in 2025. Nucleic Acids Res. 2024, 52, D609–D617.
33. Oberg, N.; Zallot, R.; Gerlt, J.A. EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) web resource for genomic enzymology tools. J. Mol. Biol. 2023, 435, 168018.
34. Zallot, R.; Oberg, N.; Gerlt, J.A. The EFI web resource for genomic enzymology tools: Leveraging protein, genome, and metagenome databases to discover novel enzymes and metabolic pathways. Biochemistry 2019, 58, 4169–4182.
35. Suzek, B.E.; Wang, Y.Q.; Huang, H.Z.; McGarvey, P.B.; Wu, C.H.; Consortium, U. UniRef clusters: A comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 2015, 31, 926–932.
36. Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N.S.; Wang, J.T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13, 2498–2504.
37. Jones, P.; Binns, D.; Chang, H.Y.; Fraser, M.; Li, W.Z.; McAnulla, C.; McWilliam, H.; Maslen, J.; Mitchell, A.; Nuka, G.; et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics 2014, 30, 1236–1240.
38. Paysan-Lafosse, T.; Andreeva, A.; Blum, M.; Chuguransky, S.R.; Grego, T.; Pinto, B.L.; Salazar, G.A.; Bileschi, M.L.; Llinares-López, F.; Meng-Papaxanthos, L.; et al. The Pfam protein families database: Embracing AI/ML. Nucleic Acids Res. 2024, 53, D523–D534.
39. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780.
40. Price, M.N.; Dehal, P.S.; Arkin, A.P. FastTree 2-approximately maximum-likelihood trees for large alignments. PLoS ONE 2010, 5, e9490.
41. Miller, M.A.; Pfeiffer, W.; Schwartz, T. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. In Proceedings of the 2010 Gateway Computing Environments Workshop (GCE), New Orleans, LA, USA, 14 November 2010; pp. 1–8.
42. Letunic, I.; Bork, P. Interactive Tree of Life (iTOL) v6: Recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. 2024, 52, W78–W82.
43. Waterhouse, A.M.; Procter, J.B.; Martin, D.M.A.; Clamp, M.; Barton, G.J. Jalview Version 2-a multiple sequence alignment editor and analysis workbench. Bioinformatics 2009, 25, 1189–1191.
44. Shelton, A.N.; Yu, F.B.; Grossman, A.R.; Bhaya, D. Abundant and active community members respond to diel cycles in hot spring phototrophic mats. ISME J. 2025, 19, wraf001.
45. Klatt, C.G.; Inskeep, W.P.; Herrgard, M.J.; Jay, Z.J.; Rusch, D.B.; Tringe, S.G.; Parenteau, M.N.; Ward, D.M.; Boomer, S.M.; Bryant, D.A.; et al. Community structure and function of high-temperature chlorophototrophic microbial mats inhabiting diverse geothermal environments. Front. Microbiol. 2013, 4, 106.
46. Klatt, C.G.; Liu, Z.F.; Ludwig, M.; Kühl, M.; Jensen, S.I.; Bryant, D.A.; Ward, D.M. Temporal metatranscriptomic patterning in phototrophic Chloroflexi inhabiting a microbial mat in a geothermal spring. ISME J. 2013, 7, 1775–1789.
47. Klatt, C.G.; Wood, J.M.; Rusch, D.B.; Bateson, M.M.; Hamamura, N.; Heidelberg, J.F.; Grossman, A.R.; Bhaya, D.; Cohan, F.M.; Kühl, M.; et al. Community ecology of hot spring cyanobacterial mats: Predominant populations and their functional potential. ISME J. 2011, 5, 1262–1278.
48. Lynes, M.M.; Krukenberg, V.; Jay, Z.J.; Kohtz, A.J.; Gobrogge, C.A.; Spietz, R.L.; Hatzenpichler, R. Diversity and function of methyl-coenzyme M reductase-encoding archaea in Yellowstone hot springs revealed by metagenomics and mesocosm experiments. ISME Commun. 2023, 3, 22.
49. Jarett, J.K.; Dzunková, M.; Schulz, F.; Roux, S.; Paez-Espino, D.; Eloe-Fadrosh, E.; Jungbluth, S.P.; Ivanova, N.; Spear, J.R.; Carr, S.A.; et al. Insights into the dynamics between viruses and their hosts in a hot spring microbial mat. ISME J. 2020, 14, 2527–2541.
50. Inskeep, W.P.; Jay, Z.J.; Tringe, S.G.; Herrgård, M.J.; Rusch, D.B.; Co, Y.M.P.S. The YNP metagenome project: Environmental parameters responsible for microbial distribution in the Yellowstone geothermal ecosystem. Front. Microbiol. 2013, 4, 67.
51. Bowers, R.M.; Nayfach, S.; Schulz, F.; Jungbluth, S.P.; Ruhl, I.A.; Sheremet, A.; Lee, J.; Goudeau, D.; Eloe-Fadrosh, E.A.; Stepanauskas, R.; et al. Dissecting the dominant hot spring microbial populations based on community-wide sampling at single-cell genomic resolution. ISME J. 2022, 16, 1337–1347.
52. Yamada, M.; Zhang, H.; Hanada, S.; Nagashima, K.V.P.; Shimada, K.; Matsuura, K. Structural and spectroscopic properties of a reaction center complex from the chlorosome-lacking filamentous anoxygenic phototrophic bacterium Roseiflexus castenholzii. J. Bacteriol. 2005, 187, 1702–1709.
53. Yabe, S.; Muto, K.; Abe, K.; Yokota, A.; Staudigel, H.; Tebo, B.M. Vulcanimicrobium alpinus gen. nov. sp. nov., the first cultivated representative of the candidate phylum “Eremiobacterota”, is a metabolically versatile aerobic anoxygenic phototroph. ISME Commun. 2022, 2, 102.
54. Evans, R.; O’Neill, M.; Pritzel, A.; Antropova, N.; Senior, A.; Green, T.; Žídek, A.; Bates, R.; Blackwell, S.; Yim, J.; et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv 2022.
55. Feher, G.; Allen, J.P.; Okamura, M.Y.; Rees, D.C. Structure and function of bacterial photosynthetic reaction centers. Nature 1989, 339, 111–116.
56. Venturoli, G.; Trotta, M.; Feick, R.; Melandri, B.A.; Zannoni, D. Temperature-dependence of charge recombination from the P⁺Qa^- and P⁺Qb^- states in photosynthetic reaction centers isolated from the thermophilic bacterium Chloroflexus-aurantiacus. Eur. J. Biochem. 1991, 202, 625–634.
57. Lips, S.; Schmitt-Jansen, M.; Borchert, E. Metagenomic analyses of the plastisphere reveals a common functional potential across oceans. bioRxiv 2024.
58. Pierson, B.K.; Thornber, J.P.; Seftor, R.E.B. Partial purification, subunit structure and thermal stability of the photochemical reaction center of the thermophilic green bacterium Chloroflexus aurantiacus. Biochim. Biophys. Acta 1983, 723, 322–326.
59. Nozawa, T.; Madigan, M.T. Temperature and solvent effects on reaction centers from Chloroflexus aurantiacus and Chromatium tepidum. J. Biochem. 1991, 110, 588–594.
60. Watson, A.J.; Hughes, A.V.; Fyfe, P.K.; Wakeham, M.C.; Holden-Dye, K.; Heathcote, P.; Jones, M.R. On the role of basic residues in adapting the reaction centre–LH1 complex for growth at elevated temperatures in purple bacteria. Photosynth. Res. 2005, 86, 81–100.

Figure 1. Spatial organization of RC cofactors. Molecules have been simplified and phytyl tails removed for clarity. Green spheres represent Mg²⁺ and the red sphere Fe²⁺. Top center, BChl dimer, P; top left, BPhe; top right, BChl; center left and right, BPhe; bottom right, menaquinone Q_A; bottom center, iron atom; bottom left, menaquinone Q_B. Arrows show the pathway of electron transfer. Image is based on the R. castenholzii structure 5YQ7.pdb [9]. Image created using ChimeraX [10].

Figure 2. Representation of synthetic operon of Chloroflexota sp. RC genes pufL and pufM. Operon name is given on the left, arrows represent coding regions.

Figure 3. AlphaFold model of Chloroflexota sp. RC proteins. PufL is in blue, PufM in pink. The periplasmic side is at the top, and the cytoplasmic side at the bottom, with the vertical α-helices traversing the cytoplasmic membrane. Image obtained using ChimeraX [10].

Figure 4. SDS-PAGE of Chloroflexota sp. RC showing effects of sample treatment temperature on protein migration. MW, molecular weight ladder in kDa; numbers above lanes give the temperature (°C) of sample treatment prior to gel loading.

Figure 5. Absorption spectrum of the Chloroflexota sp. RC.

Figure 6. Oxidized minus reduced Chloroflexota sp. RC absorbance spectrum.

Figure 7. Kinetics of recovery of Chloroflexota sp. RC absorbance at 865 nm after a flash of 860 nm illumination.

Figure 8. Identification of PufL and PufM proteins encoded by metagenomic samples from hot springs: (A) Sequence similarity network of PufL-like and PufM-like proteins using an alignment score of 80. Clusters 1–4 represent PufL-like proteins based on sequence similarity to characterized PufL sequences; clusters 5 through 8 represent PufM-like proteins based on sequence similarity to characterized PufM sequences. Nodes are colored by class according to the node key, colored red for hot spring proteins identified in IMG, or colored gray for proteins from UniProt with no assigned taxonomic rank. Square nodes represent N-terminal PufL-like domains, and triangles are used to indicate those nodes representing C-terminal PufM-like domains from a fusion protein. (B) The same network as in panel (A) except that an alignment score of 50 was used to draw edges. Nodes are colored the same as in panel (A). (C) Number of nodes represented by the indicated taxa as a percent for each cluster. Minor taxa are not shown. A graphical representation of all taxa can be found in Figure S3.

Figure 9. Phylogenetic reconstruction of PufL and PufM homologs. Outer ring annotations are colored according to the color key. Leaves representing PufL and PufM from Chloroflexota bacterium L.E.CH.39_1 (C) or characterized proteins in SwissProt are labeled. Av, Allochromatium vinosum; Ca, Chloroflexus aurantiacus; Rc, Rhodobacter capsulatus; Rg, Rubrivivax gelatinosus. To help identify the leaves representing IMG thermal spring sequences, red bars are used for phylum, class, and in the IMG “hot” annotation ring. IMG hot spring proteins with less than 95% identity to another protein in GenBank are labeled with a small red square on the outermost ring of the circle. PufL-PufM fusions were found exclusively in the clades representing cluster 2 (contains the PufL-like domain) and cluster 6 (contains the PufM-like domain). In those cases where two domains were not identified for proteins in these clusters, some of those proteins are fragments (annotated with a zigzag line), the sequence encoding PufM is at the 5′ most end of the contig (A0A0P9DKP2), or the gene model appears to be incorrect (A0A7W1TQK3), suggesting that the full fusion was not captured in these cases. The tree was rooted at the outgroup (PsbA1, PsbA2, and PsbD1 from Acaryochloris marina). Group # means group number.

Figure 10. Gene neighborhoods containing puf genes: (A) neighborhoods containing genes encoding proteins from PufL cluster 1 and PufM cluster 5; (B) neighborhoods containing genes encoding proteins from PufL cluster 4 and PufM cluster 7. In these genomes, pufC is in a separate neighborhood; (C) neighborhoods containing genes encoding proteins from PufL cluster 3 and PufM cluster 8; (D) neighborhoods containing genes encoding fusion proteins that have domains from PufL cluster 2 and PufM cluster 6; (E) examples of gene neighborhoods encoding PufM or PufC proteins with a C-terminal OmpA-like domain (dark purple). For all panels, genes encoding homologous proteins are colored the same, and unlabeled genes are transparent. A vertical black line is used to indicate the end of a contig. The corresponding phylum for each bacterium is given to the left, according to the color key. Gray background shading is used to emphasize the distinct arrangements in each cluster of genes encoding PufL, PufM, and PufC.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
