Identification of Specific Variations in a Non-Motile Strain of Cyanobacterium Synechocystis sp. PCC 6803 Originated from ATCC 27184 by Whole Genome Resequencing

The cyanobacterium Synechocystis sp. PCC 6803 is a widely used model organism in basic research and biofuel biotechnology. Here, we report the genomic sequence of the chromosome and seven plasmids of GT-G, a glucose-tolerant, non-motile strain derived from ATCC 27184 and in use in Guangzhou. Through high-throughput genome re-sequencing and verification by Sanger sequencing, eight novel variants were identified in its chromosome and plasmids. These eight novel variants, especially the five non-silent mutations, might affect the phenotype of the GT-G strain, for example through the truncated Sll1895 and Slr0322 proteins. These re-sequencing data provide background information for further research and applications based on the GT-G strain, and also provide evidence for studying the evolution and divergence of Synechocystis 6803 globally.


Introduction
As the first photosynthetic organism to be sequenced, and with its high transformation competency, the freshwater cyanobacterium Synechocystis sp. PCC 6803 is one of the most widely used model organisms for research in photosynthesis and stress response, as well as for biotechnological applications in biofuel production [1][2][3][4][5]. The original Berkeley strain of Synechocystis sp. PCC 6803 was isolated from freshwater in California [6] and deposited in the Pasteur Culture Collection as strain PCC 6803 and in the American Type Culture Collection as strain ATCC 27184. A glucose-tolerant (GT) strain was isolated from ATCC 27184 and designated the Williams GT strain [7], from which the GT-Kazusa strain was later derived. The chromosome sequence of GT-Kazusa was published as the first Synechocystis sp. PCC 6803 genomic sequence [8,9]. In recent years, several other strains of Synechocystis sp. PCC 6803 have been sequenced with high-throughput techniques and reported worldwide [10][11][12][13]. Apart from database errors, unique sequence variations were identified in the GT-S, GT-I, PCC-P (positive phototactic), PCC-N (negative phototactic), and PCC-M (Moscow, Russia) strains, as well as in GT-O1 and GT-O2 from New Zealand. Strain-specific mutations have been suggested to be responsible for phenotypic variation, for example in pilus biosynthesis and motility. Such widespread genomic variation implies that novel mutations may exist between and within research labs. Recent genomic analyses of stress-evolved Synechocystis sp. PCC 6803 strains have also revealed interesting information on adaptive evolution and stress response under high temperature or low pH [14,15].
In our lab, a designated wild-type strain of Synechocystis sp. PCC 6803 derived from ATCC 27184 has been used for mutant construction to analyze signal transduction in stress response [16][17][18][19]. It is glucose-tolerant [17], but its genomic background had not been defined. We therefore re-sequenced and analyzed our own wild-type strain, GT-G (Guangzhou, China), to provide reference information for future research and to clarify its phylogenetic relationships with the other sequenced strains. Our results not only provide background information for further research and applications based on the GT-G strain, but also provide evidence for studying the evolution and divergence of Synechocystis 6803 globally.

Overview
The glucose-tolerant strain derived from ATCC 27184 and maintained under routine laboratory culture conditions in our lab was designated GT-G (Guangzhou) and subjected to genomic re-sequencing. More than 8 million short reads (101 bp per read) were obtained from the Illumina HiSeq 2000 sequencing platform, about 808 Mb of high-quality data in total. This represents more than 200-fold coverage of the 3.96 Mb Synechocystis 6803 genome (chromosome plus plasmids). Using the BWA [20] and VarScan [21,22] software, reads were mapped to the reference sequences of the GT-Kazusa chromosome and plasmids, the genomic sequences were constructed, and putative variants were identified. SNPs and indels were identified, while no large structural variation was detected. The putative variants were then verified by Sanger sequencing of the corresponding PCR products. No false-positive variant was found. The genome sequence of GT-G was deposited in the GenBank database under the accession number CP012832.
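The coverage figure follows directly from the numbers quoted above. A minimal sketch of that arithmetic, assuming the rounded values given in the text (the variable and function names are illustrative only):

```python
# Figures quoted in the text; values are rounded as reported.
READ_LENGTH_BP = 101       # bp per read
TOTAL_BASES = 808e6        # about 808 Mb of high-quality data
GENOME_SIZE_BP = 3.96e6    # chromosome plus plasmids


def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome length."""
    return total_bases / genome_size


# Number of reads implied by the data volume and the read length.
approx_reads = TOTAL_BASES / READ_LENGTH_BP
coverage = mean_coverage(TOTAL_BASES, GENOME_SIZE_BP)

print(f"implied reads: about {approx_reads / 1e6:.1f} million")
print(f"mean coverage: about {coverage:.0f}-fold")
```

This gives a mean depth only; actual per-position depth varies along the genome and is not reported here.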
In total, 40 SNPs and indels were identified and verified in the GT-G strain, 34 in the chromosome and six in the plasmids (Table 1). Among these, 32 variations had been reported previously, including the 21 database errors of GT-Kazusa [10] (Table 2). Excluding the database errors, of the 19 mutations in GT-G, 10 are shared with PCC-M, nine with PCC-N and PCC-P, six with GT-O1 and GT-O2, five with GT-I, and three with GT-S [10][11][12][13].
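The counts in this paragraph can be cross-checked against each other. A short sketch using only the numbers stated above (variable names are illustrative):

```python
# Variant counts as stated in the text.
chromosome_variants = 34
plasmid_variants = 6
previously_reported = 32   # includes the GT-Kazusa database errors
database_errors = 21

total_variants = chromosome_variants + plasmid_variants
novel_variants = total_variants - previously_reported
true_mutations = total_variants - database_errors

print(f"total verified variants: {total_variants}")
print(f"novel (GT-G specific) variants: {novel_variants}")
print(f"mutations after excluding database errors: {true_mutations}")
```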

Chromosome Variations Shared with Other Strains
Mutation #1 implies that the 102 bp deletion in slr1084 is specific to the GT-Kazusa and GT-S strains (Table 2) [11,12]. Mutation #2 implies that GT-G diverged from ATCC 27184 before the 154 bp deletion upstream of and within slr2031 appeared. However, GT-G shares with the other glucose-tolerant, non-motile strains the 1 bp insertion in the sll1574/5 (spkA) gene, as checked and confirmed by PCR (Supplemental Table S1). The spkA gene is essential for motility and pilus biosynthesis [23,24], and its mutation might partly explain the lack of motility in GT strains. Mutation #3 occurs in the non-coding region between the infA and adk genes, 12 bp upstream of the transcriptional start site of infA [25]. This variation was also identified in the PCC-P, PCC-N, and PCC-M strains [11,12]. It changes the putative −10 element from "TGTGAT" to "TATGAT" and might therefore affect transcription of infA, which encodes translation initiation factor IF-1.
It has been reported that re-sequencing and mapping might fail to detect large indels and instead report SNPs in the target region [11,13]. In this study, however, several large indels were successfully called by mapping and confirmed by PCR and Sanger sequencing. Three 1.2 kb deletions in GT-G relative to the reference indicate that the ISY203b (#6), ISY203e (#11), and ISY203g (#34) transposase insertions are absent from GT-G, suggesting that they are specific to GT-Kazusa and/or GT-S [10][11][12].
The 1 bp deletion in slr0162 (pilC, #14) is common to all the reported PCC and GT strains except GT-Kazusa [10], which suggests that the 1 bp insertion in slr0162 is specific to GT-Kazusa (Table 2). This insertion causes a frameshift in the pilC gene and results in a truncated PilC protein, which might contribute to the lack of motility in GT-Kazusa [26]. Four novel chromosomal variants unique to the GT-G strain were identified and verified (#8, #15, #16, and #17); they are discussed in detail later. Two SNPs, mutations #27 and #29, are shared between the GT-G strain and all PCC strains, suggesting a close relationship. They result in a silent mutation in slr0302, which encodes a PleD-like protein, and an amino acid change in ssr1176, which encodes the putative transposase ISY100v3, respectively.
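To make the change in the putative −10 element concrete, the sketch below locates the substituted base and counts mismatches against the canonical bacterial −10 hexamer TATAAT. Using that consensus as the yardstick for this promoter is an assumption made here for illustration, not an analysis from this study, and the function names are hypothetical.

```python
def differences(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, ref base, alt base) for each mismatch."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have equal length")
    return [(i + 1, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def mismatch_count(seq: str, consensus: str) -> int:
    """Number of positions at which seq differs from the consensus."""
    return len(differences(consensus, seq))


KAZUSA_ELEMENT = "TGTGAT"   # putative -10 element in the GT-Kazusa reference
GT_G_ELEMENT = "TATGAT"     # same element in GT-G (mutation #3)
CONSENSUS = "TATAAT"        # assumption: canonical sigma70-type -10 hexamer

substitutions = differences(KAZUSA_ELEMENT, GT_G_ELEMENT)
kazusa_mismatches = mismatch_count(KAZUSA_ELEMENT, CONSENSUS)
gt_g_mismatches = mismatch_count(GT_G_ELEMENT, CONSENSUS)

print(substitutions)
print(kazusa_mismatches, gt_g_mismatches)
```

Under this assumption the GT-G element lies one mismatch closer to the consensus than the reference element, which fits the suggestion that the SNP could alter infA transcription. Whether it does would need experimental confirmation.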

Variations in Plasmids
The sequencing data cover all seven plasmids and identify six mutations in three plasmids (#35-#40), all of which were verified by Sanger sequencing of PCR products. Of the six plasmid mutations, four are unique to the GT-G strain (#36, #37, #38, #40) and are discussed in the next section. The 1.2 kb deletion in plasmid pSYSM (#35) indicates that the ISY203j transposase is missing in GT-G, as was also reported for PCC-M [12]. The SNP in ssr6089 of plasmid pSYSX (#39) results in an N37S change in the hypothetical protein and is shared with the GT-O1 and GT-O2 strains [13].

Novel Variations in GT-G
Among the eight GT-G-specific mutations identified here, five are SNPs, two are deletions, and one is an insertion, all of which are located in open reading frames. Three SNPs (#15, #37, and #38) are silent mutations, while the other mutations cause an amino acid change or a frameshift.
The 1 bp insertion in the sll1895 gene (#8) causes a frameshift and results in a truncated Sll1895 protein (Figure 1a). The 696-amino-acid Sll1895 protein in GT-Kazusa is predicted to contain several functional domains: FHA (forkhead-associated domain for phosphopeptide recognition), GGDEF (diguanylate cyclase domain), and EAL (candidate diguanylate phosphodiesterase domain). It was suggested to contribute to signal transduction on the basis of its conserved domains [27], and the Sll1895 protein was found to be upregulated by hexane in a proteomic analysis [28]. The truncated, 377-amino-acid Sll1895 in the GT-G strain lacks the EAL domain and part of the GGDEF domain, which may render the protein non-functional (Figure 1a). A novel large deletion was also found in GT-G: a 981 bp deletion in the middle of the slr0322 gene (#17), which removes 327 amino acids from within the 1095-amino-acid Slr0322 protein (Figure 1b). Slr0322 in GT-Kazusa is a putative two-component hybrid sensor and regulator designated Hik43, consisting of a histidine kinase domain and two response regulator domains, at the N and C termini respectively [3]. It was also designated PilL-C/CheA because it is homologous to the C-terminal region of CheA-like proteins and is essential for motility, thick pili biosynthesis, and transformation competency [29]. Slr0322 in the GT-Kazusa strain contains an ATPase domain and a CheW-like domain between the kinase domain and the response regulator domain, but both are lost in the GT-G strain because of the deletion (Figure 1b). In addition, an SNP in slr0322 (#16) leads to a P280T substitution in the histidine kinase domain of this protein. Such functionally adverse mutations might affect the GT-G phenotype. We therefore examined its surface structure under a transmission electron microscope and its motility under lateral illumination.
Electron micrographs of negatively stained GT-G cells showed a lack of pili, and no phototactic movement of GT-G colonies was observed under lateral illumination (Figure 2). These phenotypes may be attributed to both the 1 bp insertion in the spkA gene and the mutations in slr0322. Further research is needed to characterize the impact of the individual mutations in the GT-G strain.
Apart from the two silent SNPs (#37, #38), the plasmid variants unique to GT-G are SNP #36 in plasmid pSYSX, which is predicted to cause an I64M change in the unknown protein Slr6004, and a 1 bp deletion in pCB2.4 (#40), which causes a frameshift in the gene encoding the hypothetical protein MYO_820.
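The classification of the eight GT-G-specific variants described in this section can be summarized programmatically. The sketch below encodes only what the text states (variant number, type, length change, and whether the change is silent). The frame rule, that an indel whose length is a multiple of three keeps the reading frame, is the standard one, and the names are illustrative.

```python
from collections import Counter

# (variant number, type, length change in bp, silent?) as described in the text.
NOVEL_VARIANTS = [
    (8, "insertion", +1, False),     # sll1895, truncated protein
    (15, "SNP", 0, True),
    (16, "SNP", 0, False),           # slr0322, P280T
    (17, "deletion", -981, False),   # slr0322, internal deletion
    (36, "SNP", 0, False),           # Slr6004, I64M
    (37, "SNP", 0, True),
    (38, "SNP", 0, True),
    (40, "deletion", -1, False),     # MYO_820
]


def frame_effect(length_change: int) -> str:
    """Classify a variant by its effect on the reading frame."""
    if length_change == 0:
        return "substitution"
    return "in-frame" if length_change % 3 == 0 else "frameshift"


type_counts = Counter(kind for _, kind, _, _ in NOVEL_VARIANTS)
non_silent = [num for num, _, _, silent in NOVEL_VARIANTS if not silent]
effects = {num: frame_effect(delta) for num, _, delta, _ in NOVEL_VARIANTS}

# The 981 bp deletion removes a whole number of codons from Slr0322.
residues_removed = 981 // 3
residues_remaining = 1095 - residues_removed

print(dict(type_counts))
print(non_silent)
print(effects[8], effects[17], effects[40])
print(residues_removed, residues_remaining)
```

The 981 bp deletion is in-frame, which is why it removes 327 residues without truncating the remainder of Slr0322, whereas the 1 bp indels (#8, #40) shift the frame.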

Phylogenetic Relationships
Among the Synechocystis strains sequenced and reported so far, GT-G is most similar in genomic sequence to the PCC-M strain, with which it shares nine chromosomal variants and one plasmid variant, although the two strains differ in motility (Table 2, Figure 2). Based on our results and the published data, the phylogenetic relationships among the various sequenced strains of Synechocystis sp. PCC 6803 are summarized in Supplemental Table S1 and visualized in Figure 3. The GT-G strain can grow on glucose and cannot move towards light, both of which are characteristic of GT strains. GT-G shares with the other GT strains the 1 bp insertion in sll1574/5 (spkA), a gene critical for motility and pilus biosynthesis [23,24]. However, it does not carry the 154 bp deletion upstream of and within slr2031, which distinguishes it from the other GT strains. GT-G shares with the PCC strains the SNPs in ssr1176 and slr0302 and the SNP upstream of infA, which suggests that GT-G may be the strain closest to the point at which the PCC and GT strains split [11][12][13].
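As a rough illustration of the similarity ranking, the sketch below orders the other sequenced strains by the number of mutations they share with GT-G, using the counts given in the Overview. Counting shared variants is only a crude proxy for relatedness and is not the method used to build Figure 3. The dictionary and variable names are illustrative.

```python
# Number of GT-G mutations shared with each strain (database errors excluded),
# as stated in the Overview.
SHARED_WITH_GT_G = {
    "PCC-M": 10,
    "PCC-N": 9,
    "PCC-P": 9,
    "GT-O1": 6,
    "GT-O2": 6,
    "GT-I": 5,
    "GT-S": 3,
}

# Sort by descending shared count, breaking ties alphabetically.
ranked = sorted(SHARED_WITH_GT_G.items(), key=lambda item: (-item[1], item[0]))
closest_strain = ranked[0][0]

for strain, shared in ranked:
    print(f"{strain}: {shared} shared mutations")
```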

Strain and DNA Extraction
The GT-G strain of Synechocystis sp. PCC 6803 was derived from ATCC 27184. It was cultured in BG11 medium with 20 mM HEPES-NaOH (pH 7.5) at 29 °C and illuminated at 30 μE·m⁻²·s⁻¹.
Cells in mid-logarithmic phase (OD730 = 1.0) were harvested by centrifugation at 5000 × g for 5 min. Total DNA was extracted using an extraction kit (Dongsheng, Guangzhou, China) according to the manufacturer's instructions.

Mutation Verification
All putative SNPs and indels were verified by Sanger sequencing of PCR products covering the variation sites. Annotation information was obtained from Cyanobase [31]. Previously reported database errors that were not called by the software were also checked by Sanger sequencing.

Electron Microscopy and Motility Assay
Electron microscopy and the phototactic assay were performed as previously described [29]. Briefly, cell surface structures were examined under a transmission electron microscope (1200EX, JEOL, Tokyo, Japan) after staining with 0.8% (w/v) phosphotungstic acid (pH 7.0). Phototactic movement was observed on 0.8% (w/v) agar under lateral illumination.

Conclusions
Re-sequencing of the GT-G strain of Synechocystis 6803 identified eight novel variants, several of which are likely to affect gene function. The mutations found in the GT-G strain indicate that it diverged from ATCC 27184 after the 1 bp insertion in spkA and before the 154 bp deletion upstream of and within slr2031. Agreement with the previously reported database errors and the successful verification of all variants by Sanger sequencing indicate that re-sequencing at about 200-fold coverage is effective and reliable. Our data highlight the specific variants in the GT-G strain derived from ATCC 27184 and provide background information for future research based on this strain. They also provide further evidence for studying the evolution and divergence of Synechocystis 6803 globally.