Homologs of Phycobilisome Abundance Regulator PsoR Are Widespread across Cyanobacteria

Abstract: During chromatic acclimation (CA), cyanobacteria undergo shifts in their physiology and metabolism in response to changes in their light environment. Various forms of CA, which involves the tuning of light-harvesting accessory complexes known as phycobilisomes (PBS) in response to distinct wavelengths of light, have been recognized. Recently, a negative regulator of PBS abundance, PsoR, about which little was known, was identified. We used sequence analyses and bioinformatics to predict the role of PsoR in cyanobacteria and PBS regulation and to examine its presence in a diverse range of cyanobacteria. PsoR has sequence similarities to the β-CASP family of proteins involved in DNA and RNA processing. PsoR is a putative nuclease widespread across Cyanobacteria, with over 700 homologs identified. Promoter analysis suggested that psoR is co-transcribed with the upstream gene tcpA. Multiple transcription factors involved in global gene regulation and stress responses were predicted to bind to the psoR-tcpA promoter. The predicted protein-protein interactions with PsoR homologs included proteins involved in DNA and RNA metabolism, as well as a phycocyanin-associated protein predicted to interact with PsoR from Fremyella diplosiphon (FdPsoR). The widespread presence of PsoR homologs in Cyanobacteria and their ties to DNA- and RNA-metabolizing proteins indicated a potentially unique role for PsoR in CA and PBS abundance regulation.


Introduction
Cyanobacteria are oxygenic phototrophs that are responsible for 20-30% of global carbon fixation and the majority of global nitrogen fixation [1,2]. These prokaryotes serve as model organisms for studying photosynthesis and photoacclimation because of their fast growth and the tools available for the genetic manipulation of many species. In addition to research on the mechanisms governing their growth and fitness in natural contexts, there has been ongoing research into their use as a microbial chassis in the biotech industry for bioremediation and for the production of biofuels, biopharmaceuticals, food supplements, and biofertilizers [3,4]. Given the growing interest in this ancient lineage of bacteria, it is important to fully understand the genes and mechanisms that control their photosynthetic potential, productivity, and adaptive responses.
Cyanobacteria have evolved to survive in a wide range of ecological niches, including freshwater and marine environments, as well as in arid deserts, arctic tundra, and hot springs [5-7]. The dynamic conditions in these environments present various challenges for cyanobacteria, including nutrient limitations and fluctuating light levels, which impact organismal growth and fitness. To deal with variable photoenvironments, cyanobacteria have evolved a mechanism called chromatic acclimation (CA), which allows them to [...] potential role for the protein in regulating PBS abundance in response to external light cues and/or stress.

Sequence Analysis and Protein Modeling
The 554-amino-acid sequence of PsoR from F. diplosiphon was obtained from the National Center for Biotechnology Information (NCBI) database (GenBank ID: EKE97789.1) and analyzed using the Phyre2 Protein Fold Recognition Server [37] to predict the protein structure, including a search for conserved domains, sequence features, and prediction(s) of putative functions.

Sequence Homology and Phylogeny
The protein sequence of PsoR from F. diplosiphon was run through the position-specific iterated BLAST (PSI-BLAST) algorithm on the NCBI website [38]. Three successive iterations were performed to refine the results. Multiple sequence alignments of the obtained sequences were generated through the MEGA-7 program using MUSCLE [39]. A phylogenetic tree was created using the maximum likelihood method and the Jones-Taylor-Thornton matrix-based model in MEGA-X [40]. The log likelihood was −746644.20. An unrelated β-CASP protein from Gloeobacter violaceus (Gene ID: WP_011141021.1) was used to root the phylogenetic tree.
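For readers who prefer to repeat the search locally rather than on the NCBI website, a three-iteration PSI-BLAST run can be approximated with the BLAST+ `psiblast` program. The sketch below only assembles the command line and does not run it. The file names `fdpsor.fasta` and `psor_hits.tsv` and the local `nr` database are hypothetical placeholders, not values from this study, and the web search may have used different settings.

```python
import shlex


def build_psiblast_command(query_fasta, database, iterations=3,
                           out_path="psor_hits.tsv"):
    """Return the argument list for a local BLAST+ psiblast run.

    All file and database names are caller-supplied placeholders.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return [
        "psiblast",
        "-query", query_fasta,               # FASTA file with the query protein
        "-db", database,                     # name of a local protein database
        "-num_iterations", str(iterations),  # number of PSI-BLAST rounds
        "-outfmt", "6",                      # tabular output
        "-out", out_path,                    # where the hit table is written
    ]


# Hypothetical inputs: a FASTA file holding EKE97789.1 and a local nr copy.
cmd = build_psiblast_command("fdpsor.fasta", "nr")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` once BLAST+ and a protein database are installed.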

Promoter Analyses
The BPROM program [41] was used to analyze the potential promoter and transcription factor binding sites associated with the F. diplosiphon psoR gene sequence as well as the upstream and downstream areas flanking it. The analyzed sequence was retrieved from the NCBI database (accession ID: DQ286230.1).
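To illustrate what a sequence-based promoter search looks for, the following minimal sketch scans a DNA string for a −35-like hexamer and a −10-like hexamer separated by a 15-19 bp spacer. It is not BPROM's algorithm, which applies its own scoring. The consensus hexamers TTGACA and TATAAT are the classical Escherichia coli σ70 values, used here as an assumption, and the toy sequence is invented for the example rather than taken from DQ286230.1.

```python
MINUS35 = "TTGACA"  # classical E. coli sigma-70 -35 consensus (assumed here)
MINUS10 = "TATAAT"  # classical E. coli sigma-70 -10 consensus (assumed here)


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_promoters(seq, max_mismatch=2, spacers=range(15, 20)):
    """Return (pos35, box35, pos10, box10, spacer, total_mismatches) tuples.

    Positions are 0-based offsets into seq. Each box may differ from its
    consensus by at most max_mismatch bases.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        box35 = seq[i:i + 6]
        m35 = mismatches(box35, MINUS35)
        if m35 > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box10 = seq[j:j + 6]
            if len(box10) < 6:
                break  # ran off the end of the sequence
            m10 = mismatches(box10, MINUS10)
            if m10 <= max_mismatch:
                hits.append((i, box35, j, box10, spacer, m35 + m10))
    return hits


# Invented toy sequence with a perfect -35 box, a 17 bp spacer, and a -10 box.
toy = "GGCC" + "TTGACA" + "GCGCGCGCGCGCGCGCG" + "TATAAT" + "GGCCGG"
exact_hits = scan_promoters(toy, max_mismatch=0)
print(exact_hits)
```

Raising `max_mismatch` makes the scan more permissive at the cost of more false positives, which is one reason dedicated tools rely on weighted scoring instead of simple mismatch counts.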

Predicted Protein-Protein Interactions
We used the STRING program to search for potential proteins that may interact with PsoR from F. diplosiphon (FdPsoR) and its homologs [42]. The PsoR protein sequence (GenBank ID: EKE97789.1) from F. diplosiphon and homologs from Synechocystis sp. PCC 6803 (GenBank ID: sll0514) and Synechocystis sp. PCC 7002 (GenBank ID: ACA99583.1) were used.
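STRING can also be queried programmatically through its REST interface. The sketch below builds an `interaction_partners` request URL without contacting the server. It is an illustration only: the analysis described here may have been run through the web interface, and the species identifier 1148 (the NCBI taxonomy ID for Synechocystis sp. PCC 6803) and the use of `sll0514` as a STRING-resolvable identifier are assumptions.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"


def interaction_partners_url(identifier, species, limit=10,
                             output_format="tsv"):
    """Build a STRING 'interaction_partners' request URL (no network call)."""
    query = urlencode({
        "identifiers": identifier,  # protein name or locus tag to look up
        "species": species,         # NCBI taxonomy ID of the organism
        "limit": limit,             # maximum number of partners to return
    })
    return f"{STRING_API}/{output_format}/interaction_partners?{query}"


# Assumed identifier and taxonomy ID; adjust to whatever STRING resolves.
url = interaction_partners_url("sll0514", species=1148)
print(url)
```

Fetching the URL, for example with `urllib.request.urlopen`, would return a tab-separated table of predicted partners and their scores.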

PsoR Is a Putative β-CASP Domain-Containing Ribonuclease
Using the Phyre2 server to predict the structure of FdPsoR, we found matches with β-CASP domain-containing proteins within the metallo-β-lactamase fold superfamily (Table 1). Based on the solved structures of known β-CASP proteins, Phyre2 was able to predict the overall structure of PsoR with 82% of residues modeled at >90% confidence (Figure 1). The β-lactamase superfamily contains proteins that act on a wide range of substrates, including DNA and RNA [43]. β-CASP domain-containing proteins are a subgroup that typically displays endo- and exonuclease activity on DNA and RNA substrates. These proteins also play a role in DNA repair and pre-mRNA maturation [44]. Although both DNA- and RNA-processing β-CASP nucleases have been found in eukaryotes, only β-CASP homologs acting on RNA have been found in bacteria and archaea to date [45]. The enzymes in this family include the Saccharomyces cerevisiae 3′-processing endonuclease Ysh1 and the cleavage and polyadenylation specificity factor MTH1203 from Methanothermobacter thermautotrophicus [46,47].
The β-lactamase and β-CASP domain regions of FdPsoR span residues 20-430 (Figure 2). A C-terminal region followed from residue 431 to residue 554, and the protein contained a small N-terminal region comprising residues 1 to 19. β-CASP enzymes within prokaryotes typically function as dimers, with the C-terminal regions involved in dimerization [45,47-49]. Considering the 123-amino-acid C-terminal region of FdPsoR, this region may serve to dimerize or facilitate the interaction of PsoR with other proteins in vivo, although in this region, there were no sequence or structural similarities with putative interaction domains of other β-CASP proteins.
Figure 1. [...] [46,47]; 82% of the FdPsoR residues were modeled at >90% confidence. The protein is rainbow-colored: red (N-terminus) to blue (C-terminus). This figure was generated by Alicia Layer.
Figure 2. Schematic of FdPsoR protein domains. The full-length PsoR sequence from Fremyella diplosiphon was analyzed using BLASTp [50]. Amino acids 19-430 of FdPsoR have sequence similarity to the Ysh1 superfamily of proteins (shown as a blue-labeled box).
The β-CASP family of proteins has seven conserved residues that are involved in substrate binding and hydrolysis [44]. FdPsoR contains all seven conserved residues associated with enzymatic activity (Figure 3). The motifs contain aspartic acids, glutamic acids, and histidines, which are involved in binding metal ions, particularly zinc and magnesium (Table 2) [51].
Figure 3. [...] Orange marks indicate areas where point mutations in Ysh1 resulted in a loss of endonuclease activity [51].
Table 2. The seven conserved motifs of β-CASP proteins present in FdPsoR and Ysh1. Motifs 1-4 are found in the metallo-β-lactamase superfamily of proteins involved in the hydrolysis of different substrates. In addition to these four motifs, motifs A, B, and C are found in β-CASP domain-containing proteins with nucleolytic activity on DNA and RNA substrates. In particular, the motifs bind metal ions such as zinc and magnesium to form the catalytic center of the protein, where cleavage occurs.
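Conserved metal-binding motifs of this kind can be located in a protein sequence with a simple pattern search. The sketch below looks for the histidine- and aspartate-rich signature HxHxDH, which is widely described as motif 2 of the metallo-β-lactamase fold. That pattern is an assumption drawn from the general literature rather than from Table 2, and the toy sequence is invented for illustration; it is not the FdPsoR sequence.

```python
import re

# Assumed metallo-beta-lactamase motif 2 signature: H-x-H-x-D-H.
MOTIF2 = re.compile(r"H.H.DH")


def find_motif2(protein):
    """Return (1-based start position, matched residues) for each match."""
    return [(m.start() + 1, m.group())
            for m in MOTIF2.finditer(protein.upper())]


# Invented toy sequence, not a real protein.
toy_protein = "MKLTVLGAHSHADHIGGLPALA"
motif_hits = find_motif2(toy_protein)
print(motif_hits)
```

A regular expression only tests for the presence of the residues. Whether they are positioned to coordinate metal ions still has to be judged from an alignment or a structural model such as the Phyre2 prediction.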

PsoR Is Widespread throughout the Cyanobacteria Phylum
A PSI-BLAST search resulted in the identification of 798 homologs found within the cyanobacteria phylum. Species containing a PsoR homolog had only one copy. No homologs were found outside the cyanobacteria. PsoR homologs were notably absent from red algae, another group of photosynthetic organisms containing PBSs.
Homologs were found in cyanobacteria in all orders except Gloeobacter, the oldest extant group of cyanobacteria, which lacks thylakoid membranes and has atypical PBS structures [52,53]. A PsoR phylogenetic tree was generated and rooted using an unrelated β-CASP protein found in Gloeobacter (Figure 4A). PsoR homologs generally grouped according to order and family clades; PsoR from F. diplosiphon grouped with other homologs found in Nostocales species. The overall tree structure resembled that of a previous phylogenetic tree based on cyanobacterial genomes [54], suggesting that PsoR was vertically inherited. The majority of homologs (742 of 799) were the closest in size to FdPsoR, and the entire group ranged from 86 to 684 bp. These homologs contained one to three conserved motifs of β-CASP proteins. The shortest homologs in this group were not grouped in a single clade but were interspersed among the larger homologs. Despite their small size, these homologs were retained in the analysis because they may represent truncated forms of PsoR. In total, 57 homologs, ranging in size from 344 to 918 bp, grouped in the phylogenetic tree closest to the outgroup and were found predominantly in Oscillatoriales (Figure 4B).
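The split between the two groups of homologs can be expressed as percentages with a few lines of arithmetic. The sketch below uses only the two counts given in this paragraph (742 and 57); the group labels are hypothetical names chosen for the example.

```python
def size_class_shares(counts):
    """Return each group's share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts from the text; the labels are illustrative only.
homolog_counts = {"FdPsoR-like": 742, "outgroup-proximal": 57}
total_homologs = sum(homolog_counts.values())
shares = size_class_shares(homolog_counts)
print(total_homologs, shares)
```

On these numbers, roughly 93% of the sequences in the tree fall in the FdPsoR-like group and about 7% in the branch closest to the outgroup.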

Promoter Analysis for psoR in Fremyella diplosiphon
To determine the locations of potential promoter regions for psoR from F. diplosiphon and of possible transcriptional regulatory elements, a sequence-based promoter analysis was performed. Using the BPROM program from the Softberry website, three putative promoter regions and several transcription factor binding sites were located. Two of these promoter regions were upstream of a gene named tcpA, which is found upstream of psoR, while the third promoter region was located within tcpA itself (Figure 5). In addition to the −10 and −35 promoter regions, 13 transcription factor binding sites were found (Table 3). All but one of the factor binding sites were predicted to co-localize with one of the three promoter sites, and Ihf was predicted to bind two promoter regions; ihf and fis encode global regulators that transcriptionally control hundreds of genes in response to environmental stimuli [55]. Another transcription factor binding site for OmpR was identified. Of note, the response regulator RcaC, which is involved in the regulation of CCA, is a member of the OmpR/PhoB family of DNA-binding proteins [33]. lexA encodes a transcriptional repressor involved in salt stress responses [56].
Figure 4. [...] [38]. The identified PsoR homologs were aligned through the MEGA-7 program using MUSCLE [39]. (A) A phylogenetic tree was created using the maximum likelihood method and Jones-Taylor-Thornton matrix-based model in MEGA-X [40]. The log likelihood was −746644.20. An unrelated β-CASP protein from Gloeobacter violaceus [...]
Figure 5. Predicted promoter and transcriptional factor binding sites for psoR shown with upstream gene tcpA. The BPROM program [41] was used to analyze the potential promoter and transcription factor binding sites associated with the F. diplosiphon tcpA-psoR genomic region. The sequence analyzed was retrieved from the NCBI database (accession ID: DQ286230.1). In the figure, −10 and −35 promoter sites are indicated by arrowheads, while the predicted transcription factor binding sites are indicated by color-coded symbols.
Table 3. List of potential promoter sites and transcription factor binding sites upstream of psoR in Fremyella diplosiphon. The sequence containing the upstream region of psoR had three predicted promoter regions. Potential transcription factor binding sites were also found, with binding sequences and locations indicated. Bp = base pairs; TF = transcription factor.

Protein-Protein Interactions of FdPsoR and Synechocystis PsoR Homologs
To predict whether FdPsoR and its homologs interact with other proteins in vivo, the STRING program was used to predict protein-protein interactions [42]. For FdPsoR, 10 potential interactors were determined (Figure 6). The majority of the proteins predicted to interact with FdPsoR were involved in DNA and RNA metabolism (Table 4). Of note, given the potential role of PsoR in regulating PBS abundance, one of the proteins identified as a potential interacting partner for FdPsoR was EKF03351.1, a phycocyanin-associated protein.
Figure 6. Protein-protein interaction network for FdPsoR. A protein-protein interaction network was generated using STRING [42]. The PsoR protein sequence (GenBank ID: EKE97789.1) from F. diplosiphon was used as a query. Edges represent functional associations, and differently colored lines represent different types of interactions: green line, neighborhood of genes; blue line, co-occurrence across species; purple line, experimental evidence; yellow-green line, text mining; light blue line, database documentation; black line, co-expression data.
Two homologs of FdPsoR, Sll0514 from Synechocystis sp. PCC 6803 and ACA99583.1 from Synechocystis sp. PCC 7002, were also submitted to STRING. Ten proteins were predicted to interact with Sll0514 (Figure 7, Table 5), and nine proteins were predicted to interact with ACA99583.1 (Figure 8, Table 6). All three PsoR homologs were predicted to interact with Gpm, the 2,3-bisphosphoglycerate-independent phosphoglycerate mutase involved in glucose metabolism, and the components of SbcC/D, a heterodimer that cleaves DNA hairpins [57,58]. Both Synechocystis homologs were predicted to interact with PolA, which is DNA polymerase I [59].
Figure 7. Protein-protein interaction network for PsoR homolog Sll0514 in Synechocystis sp. PCC 6803. A protein-protein interaction network was generated using STRING [42]. The Sll0514 protein sequence (GenBank ID: sll0514) from Synechocystis sp. PCC 6803 was used as a query. Edges represent functional associations, with differently colored lines representing different types of interactions: red line, fusion of genes; green line, neighborhood of genes; blue line, co-occurrence across species; purple line, experimental evidence; yellow-green line, text mining; light blue line, database documentation; black line, co-expression data.
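Comparing predicted partners across homologs amounts to set arithmetic. The sketch below encodes only the partners named explicitly in the text, so each set is deliberately incomplete (the full lists are in Tables 4-6), and treating SbcC and SbcD as separate entries is an assumption about how the SbcC/D components would be listed.

```python
from collections import Counter

# Partial partner sets: only proteins named in the text are included.
partners = {
    "FdPsoR": {"Gpm", "SbcC", "SbcD", "EKF03351.1"},
    "Sll0514": {"Gpm", "SbcC", "SbcD", "PolA"},
    "ACA99583.1": {"Gpm", "SbcC", "SbcD", "PolA"},
}


def shared_by_all(partner_sets):
    """Return the partners predicted for every query protein."""
    return set.intersection(*partner_sets.values())


def partner_frequency(partner_sets):
    """Count how many query proteins each partner was predicted for."""
    counts = Counter()
    for names in partner_sets.values():
        counts.update(names)
    return counts


common = shared_by_all(partners)
frequency = partner_frequency(partners)
print(sorted(common), frequency["PolA"])
```

With the complete tables loaded in place of these partial sets, the same two functions would list every partner shared by all three queries and every partner unique to one of them.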

Discussion
Considering the critical role of light energy in photosynthesis, mechanisms that allow photosynthetic organisms, such as cyanobacteria, to perceive and respond to fluctuating light environments are vital for these organisms to survive and thrive. In cyanobacteria, CCA involves a network of photoreceptors, effectors, and gene operons to fine-tune pigmentation, cell morphology, filament morphology, and metabolism in response primarily to red light (RL) and green light (GL). The psoR gene has been reported to encode a protein that negatively regulates PBS abundance [36]. Although significant insights have been gained into the regulation of structural genes needed for PBS synthesis, particularly in response to light cues during CA, the regulation of PBS abundance has not been extensively examined in cyanobacteria at the mechanistic level. To understand the roles of PsoR in CCA and cyanobacteria, we assessed its potential structure and function and searched for homologs across the cyanobacterial phylum to determine when this gene may have arisen in these organisms.
The putative function of PsoR was investigated using the Phyre2 program. The protein encoded by psoR contains a predicted β-CASP domain, including a collection of conserved motifs/residues involved in the processing of DNA and RNA substrates. The β-CASP family of enzymes belongs to the metallo-β-lactamase superfamily, which includes proteins that process a wide variety of substrates in organisms. β-CASP proteins are found in all three domains of life, and they play a predominant role in pre-mRNA processing, although some β-CASP enzymes are involved in DNA repair [44,45,60]. The conserved motifs of PsoR are involved in metal binding, typically zinc ions, and in stabilizing nucleic acid substrates for cleavage. Because PsoR has a β-CASP domain and all prokaryotic β-CASP enzymes identified thus far are ribonucleases, the role of PsoR in cyanobacteria may be the regulation of mRNA, and it may specifically regulate genes involved in PBS abundance. Other β-CASP proteins tend to function as dimers or within larger protein complexes, usually binding through the C-terminal region of the β-CASP domain or a separate C-terminal region [45,47-49]. Given the presence of a 123-residue C-terminal region in FdPsoR, PsoR may also function as a dimer in vivo or may work through interactions with other proteins.
Homologs of PsoR from F. diplosiphon were found across the Cyanobacteria phylum, and only Gloeobacter lacked a homolog. Gloeobacter are an ancient clade of cyanobacteria that lack thylakoid membranes and have atypical PBSs. Instead of rods grouped around a hemidiscoidal core, Gloeobacter PBSs are grouped in bundles of parallel rod-shaped structures attached to cytoplasmic membranes [52,53]. Gloeobacter also lacks the genes psbY, psbZ, and psb27, which encode subunits of photosystem II [53,61]. Homologs of PsoR were not found in red algae or other organisms, which suggests that PsoR homologs arose sometime after branching off from Gloeobacter, perhaps around the time that more complex PBSs evolved. Although red algae have PBSs similar to those of cyanobacteria, the lack of PsoR homologs may indicate a different enzyme or system for regulating PBS abundance in these distinct organisms.
The range of sizes found among PsoR homologs is notable. As the smallest homologs did not form a distinct branch but were intermingled with homologs closer in size to FdPsoR, these shorter homologs may be truncated forms of PsoR (Figure 4A). Whether they retain the functionality of PsoR needs to be explored experimentally. The 57 homologs that formed their own branch closest to the Gloeobacter outgroup included the longest sequences found in our analysis, at up to 918 bp (Figure 4B). Predominantly of the Oscillatoriales order, they could be extended forms of PsoR.
Three predicted promoter start sites and multiple transcription factor binding sites preceded the psoR gene; however, one of the two regions that also lie upstream of the neighboring tcpA gene may be the more likely start site of transcription (Figure 5). Previous work established that TcpA, or tetracontapeptide A, is a 40-amino-acid peptide encoded by a gene that appears to be co-transcribed with psoR and may also play a role in PBS abundance regulation [36]. As tcpA was reported to be found upstream of psoR in every cyanobacterial genome searched by Cobley et al. [36], the presence of these two promoter regions provides further evidence that the two genes are co-transcribed in cyanobacteria where they are both found, and they may even work together in some way to regulate PBS abundance in these bacteria.
The predicted protein-protein interactions pointed to partners involved in gene and metabolic regulation. Since FdPsoR is a putative β-CASP nuclease, it is possible that FdPsoR interacts with other proteins involved in regulating CCA and metabolism in response to environmental stimuli. Of note, a phycocyanin (PC)-associated protein was among the proteins predicted to interact with FdPsoR.
Light fluctuations in terrestrial and aquatic environments can occur frequently, in which the light quality and quantity are altered by factors such as cloud coverage and shading by other objects and organisms. The ability to perceive and respond to changing light conditions is important for cyanobacteria to maximize productivity and minimize damage caused by excess light in these dynamic conditions. Due to the important role PBSs play in CA and photosynthesis, understanding how PBSs undergo remodeling or how their abundance is tuned to external cues could extend our understanding of how cyanobacteria survive and thrive in fluctuating light environments. Although significant insights into the light quality-dependent regulation of the pigment content have been obtained during 50 years of research on CCA, many aspects of PBS regulation remain to be elucidated. Here, PsoR has been reported to be widespread across cyanobacteria, where it is likely to play a critical role in cellular PBS regulation. Understanding how PsoR functions as a β-CASP protein and its role in CCA and cyanobacteria can help lead to a better understanding of how photosynthetic organisms fine-tune their photosynthetic machinery in response to changes in their light environment. This knowledge may contribute to improvements in using cyanobacteria for industrial and pharmaceutical purposes.