Methods:
Sixteen saxitoxin genes from Cylindrospermopsis raciborskii T3 (sxtB, C, D, E, G, H, I, M, N, P, Q, R, S, T, U and sxtV) and the sxtA1 and sxtA2 fragments of the same organism [1] were BLASTed against four databases: (1) CAMERA all metagenomic sequence reads (tBLASTn), (2) CAMERA all metagenomic 454 reads (tBLASTn), (3) CAMERA all prokaryotic genomes (tBLASTn) and (4) NCBInr (BLASTp). The CAMERA database [2] can be accessed at http://camera.calit2.net/index.php. For each search, either the top 50 hits or the hits with an expectation value (E) lower than 1.0E-20 were retained, whichever set was smaller. Hits from the three CAMERA databases were converted from their native Excel file format to FASTA using the Galaxy server [3]. The results for sxtH and sxtT were combined because the hits for these two dioxygenases clustered together and yielded comparable phylogenetic inferences. For each gene, the hits from the four sequence databases were compiled and aligned with MAFFT using the G-INS-i algorithm under default settings [4]. The alignments were then edited manually in MacClade v4.0 [5]. Ambiguous sites were excluded and duplicate sequences removed. In addition, any taxon with more than 70% missing amino acid data was excluded from the analysis. A preliminary maximum likelihood inference with RAxML version 7.0.3 [6] was used to identify and remove phylogenetically distant taxa from the alignments. The remaining sequences were realigned as described above, and the optimal substitution model (WAG) was selected as the model with the best Akaike Information Criterion (AIC) score in ProtTest [7].
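The hit-selection, taxon-filtering and model-selection rules above can be written down compactly. The Python sketch below is illustrative only and is not the pipeline actually used (BLAST, Galaxy, MacClade, ProtTest): it assumes hits have already been parsed into (subject ID, E-value) pairs and the alignment into a name-to-sequence dictionary, it treats '-', '?' and 'X' as missing data (an assumption), and the log-likelihoods and parameter counts in the AIC example are hypothetical. Because the top-50 list and the E < 1.0E-20 list are both prefixes of the same ranked hit list, "whichever set was smaller" reduces to the first 50 significant hits.

```python
from typing import Dict, List, Tuple

# Assumed hit format: (subject_id, e_value) pairs parsed from a BLAST report.
Hit = Tuple[str, float]

# Characters treated as missing data in an aligned protein sequence (an assumption).
MISSING = set("-?X")


def select_hits(hits: List[Hit], max_hits: int = 50, e_cutoff: float = 1e-20) -> List[Hit]:
    """Return the top `max_hits` hits or those with E < e_cutoff, whichever set is smaller."""
    ranked = sorted(hits, key=lambda hit: hit[1])  # lowest E-value first
    significant = [hit for hit in ranked if hit[1] < e_cutoff]
    # Both candidate sets are prefixes of `ranked`, so the smaller one is their overlap.
    return significant[:max_hits]


def filter_alignment(alignment: Dict[str, str], max_missing: float = 0.70) -> Dict[str, str]:
    """Drop taxa with more than `max_missing` missing data, then drop exact duplicate sequences."""
    kept: Dict[str, str] = {}
    seen = set()
    for name, seq in alignment.items():
        seq_upper = seq.upper()
        missing_fraction = sum(ch in MISSING for ch in seq_upper) / len(seq_upper)
        if missing_fraction > max_missing or seq_upper in seen:
            continue  # too sparse, or identical to a sequence already kept
        seen.add(seq_upper)
        kept[name] = seq
    return kept


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: Dict[str, Tuple[float, int]]) -> str:
    """Pick the model name whose (ln L, k) pair gives the lowest AIC."""
    return min(fits, key=lambda model: aic(*fits[model]))


if __name__ == "__main__":
    hits = [("a", 1e-50), ("b", 1e-30), ("c", 1e-10)]
    print(select_hits(hits))  # [('a', 1e-50), ('b', 1e-30)]
    aln = {"t1": "MKV-LA", "t2": "------", "t3": "MKV-LA", "t4": "MKI-LA"}
    print(list(filter_alignment(aln)))  # ['t1', 't4']
    # Hypothetical scores; ProtTest's real parameter counts differ.
    print(best_model({"WAG": (-1000.0, 1), "JTT": (-1010.0, 1)}))  # WAG
```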
Maximum likelihood (ML) trees were then inferred with RAxML as above, using the optimal evolutionary model. Topology searches were performed with 100 separate heuristic searches from random starting trees, and bootstrap analyses used 100 pseudoreplicates with the same evolutionary model as the initial search. Trees were inferred in RAxML under PROTMIX: an initial inference used the PROTCAT approximation (optimization of individual per-site substitution rates, which are then classified into 25 rate categories), after which the PROTGAMMA model with 4 discrete GAMMA rate categories was used to evaluate the final tree topology, so that stable likelihood values were obtained [6].
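The bootstrap values shown on the nodes of Supplementary Figures 1 to 4 come from resampling alignment columns. RAxML generates its pseudoreplicates internally, so the Python sketch below only illustrates the idea: each replicate draws alignment columns with replacement, and a node's support is the percentage of replicate trees that recover the same group. As simplifying assumptions, each replicate tree is reduced to the set of clades it contains, and the seed and toy alignment are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One pseudoreplicate: resample alignment columns with replacement (same length)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every taxon, so homology is preserved.
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n: int = 100, seed: int = 12345) -> List[Dict[str, str]]:
    """Generate `n` pseudoreplicate alignments from a fixed (hypothetical) seed."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n)]


def clade_support(clade: FrozenSet[str], replicate_trees: List[Set[FrozenSet[str]]]) -> float:
    """Percentage of replicate trees (each given as its set of clades) that contain `clade`."""
    hits = sum(clade in tree for tree in replicate_trees)
    return 100.0 * hits / len(replicate_trees)


if __name__ == "__main__":
    aln = {"x": "ABCD", "y": "EFGH"}
    reps = bootstrap_replicates(aln, n=3)
    print(reps[0])
    ab = frozenset({"A", "B"})
    trees = [{ab}, {ab}, {frozenset({"A", "C"})}, {ab}]
    print(clade_support(ab, trees))  # 75.0
```

In the figures only support values above 50% are shown, which corresponds to a node appearing in more than half of the 100 replicate trees.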
For MEGAN analyses, we BLASTed (BLASTn) the rRNA gene sequences of Gymnodinium catenatum strain GCCW991 (GenBank accession DQ779989) and Alexandrium fundyense strain CCMP1719 (GenBank accession DQ444290) against the same databases as the saxitoxin genes (see above), subsequently BLASTed the resulting hits against the non-redundant database using the Bioportal at the University of Oslo, and analysed the results using MEGAN version 3.7.4 [8]. Default parameters were used, except for the LCA parameter Min Support, which was reduced from 5 to 4.
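MEGAN assigns each read to the lowest common ancestor (LCA) of the taxa it hits, and the Min Support parameter controls which taxa are reported. The simplified Python sketch below is our reading of that scheme, not MEGAN's code: it ignores MEGAN's other LCA parameters (such as Min Score and Top Percent), and it assumes that the reads on a taxon with fewer than Min Support reads are passed up to its parent. The taxonomy and read-to-taxon mappings are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical toy taxonomy (child -> parent); the root maps to None.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Eukaryota": "root",
    "Bacteria": "root",
    "Dinophyceae": "Eukaryota",
    "Alexandrium": "Dinophyceae",
    "Gymnodinium": "Dinophyceae",
}

# Hypothetical reads, each mapped to the taxa of its significant BLAST hits.
READS: Dict[str, set] = {
    **{f"r{i}": {"Alexandrium"} for i in range(1, 6)},
    "r6": {"Alexandrium", "Gymnodinium"},
    "r7": {"Alexandrium", "Gymnodinium"},
    "r8": {"Gymnodinium"},
    "r9": {"Alexandrium", "Bacteria"},
}


def lineage(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Path from the root down to `taxon`."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def lca(taxa: Iterable[str], parent: Dict[str, Optional[str]]) -> str:
    """Lowest common ancestor: the deepest node shared by all root-to-taxon paths."""
    paths = [lineage(t, parent) for t in taxa]
    common = paths[0][0]  # the root is always shared
    for nodes in zip(*paths):
        if any(node != nodes[0] for node in nodes):
            break
        common = nodes[0]
    return common


def assign_reads(reads, parent, min_support: int = 4) -> Dict[str, int]:
    """Assign each read to the LCA of its hits, then push under-supported counts to the parent."""
    counts = Counter(lca(taxa, parent) for taxa in reads.values())
    depth = {t: len(lineage(t, parent)) for t in parent}
    # Deepest level first, so counts pushed up are re-checked at the next level.
    for level in range(max(depth.values()), 1, -1):
        for taxon in [t for t in counts if depth[t] == level]:
            if counts[taxon] < min_support:
                counts[parent[taxon]] += counts.pop(taxon)
    return dict(counts)


if __name__ == "__main__":
    print(assign_reads(READS, PARENT, min_support=4))  # {'Alexandrium': 5, 'root': 4}
```

With min_support=1 every LCA assignment is kept as-is; raising the threshold moves sparsely supported reads towards the root without changing the total read count.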
All model estimation and phylogenetic analyses were performed on the freely available Bioportal at the University of Oslo (http://www.bioportal.uio.no/).
Supplementary Figure 1. sxtA1 protein phylogeny. The tree was constructed from the closest hits in the CAMERA all metagenomic sequence reads, all metagenomic 454 reads and all prokaryotic genomes databases and in NCBInr, and was reconstructed by maximum likelihood (RAxML). Numbers on the internal nodes are bootstrap values (shown where > 50%).
Supplementary Figure 2. sxtA2 protein phylogeny. The tree was constructed from the closest hits in the CAMERA all metagenomic sequence reads, all metagenomic 454 reads and all prokaryotic genomes databases and in NCBInr, and was reconstructed by maximum likelihood (RAxML). Numbers on the internal nodes are bootstrap values (shown where > 50%).
Supplementary Figure 3. sxtG protein phylogeny. The tree was constructed from the closest hits in the CAMERA all metagenomic sequence reads, all metagenomic 454 reads and all prokaryotic genomes databases and in NCBInr, and was reconstructed by maximum likelihood (RAxML). Numbers on the internal nodes are bootstrap values (shown where > 50%).
Supplementary Figure 4. sxtI protein phylogeny. The tree was constructed from the closest hits in the CAMERA all metagenomic sequence reads, all metagenomic 454 reads and all prokaryotic genomes databases and in NCBInr, and was reconstructed by maximum likelihood (RAxML). Numbers on the internal nodes are bootstrap values (shown where > 50%).
Supplementary Figure 5. Gymnodinium. Taxonomic classification of 400 metagenomic reads from the CAMERA database with the highest sequence similarity to the rRNA sequence of Gymnodinium catenatum strain GCCW991 (GenBank accession DQ779989). Numbers after node labels indicate the total number of reads summarized in the node, whereas the size of the red circles indicates the number of reads assigned directly to that node.
Supplementary Figure 6. Alexandrium. Taxonomic classification of 400 metagenomic reads from the CAMERA database with the highest sequence similarity to the rRNA sequence of Alexandrium fundyense strain CCMP1719 (GenBank accession DQ444290). Numbers after node labels indicate the total number of reads summarized in the node, whereas the size of the red circles indicates the number of reads assigned directly to that node.
References:
1. Kellmann, R.; Mihali, T. K.; Jeon, Y. J.; Pickford, R.; Pomati, F.; Neilan, B. A., Biosynthetic intermediate analysis and functional homology reveal a saxitoxin gene cluster in cyanobacteria. Appl. Environ. Microbiol. 2008, 74, (13), 4044-4053.
2. Seshadri, R.; Kravitz, S. A.; Smarr, L.; Gilna, P.; Frazier, M., CAMERA: a community resource for metagenomics. PLoS Biol. 2007, 5, (3), e75.
3. Taylor, J.; Schenck, I.; Blankenberg, D.; Nekrutenko, A., Using Galaxy to perform large-scale interactive data analyses. Curr. Protoc. Bioinformatics 2007, Chapter 10, Unit 10.5.
4. Katoh, K.; Kuma, K.; Toh, H.; Miyata, T., MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005, 33, (2), 511-518.
5. Maddison, W.; Maddison, D., MacClade, 3rd ed.; Sinauer Associates: 1992.
6. Stamatakis, A., RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 2006, 22, (21), 2688-2690.
7. Abascal, F.; Zardoya, R.; Posada, D., ProtTest: selection of best-fit models of protein evolution. Bioinformatics 2005, 21, (9), 2104-2105.
8. Huson, D. H.; Auch, A. F.; Qi, J.; Schuster, S. C., MEGAN analysis of metagenomic data. Genome Res. 2007, 17, (3), 377-386.