Data Employed in the Construction of a Composite Protein Database for Proteogenomic Analyses of Cephalopods Salivary Apparatus
Abstract
1. Summary
2. Data Description
2.1. Dataset Description
2.2. Tables
3. Materials and Methods
3.1. Database Construction for Proteogenomic Analyses
3.1.1. Database A: Protein Sequences from Fingerhut et al. (2018)
3.1.2. Database B: Antimicrobial Peptides (AMPs)
3.1.3. Database C: Proteins Identified with Proteome Discoverer
- STEP 1: Sample preparation and LC–MS/MS analysis
- STEP 2: Protein identification using Proteome Discoverer
3.1.4. Databases D and E: Proteins Identified from the de novo Transcriptome Assemblies of Cephalopods’ PSGs
- STEP 1: Search and de novo assembly of cephalopods’ PSGs transcriptomes
- STEP 2: Database D—proteins identified by TransDecoder
- STEP 3: Database E—proteins identified by the six-frame translation tool
3.1.5. Database F: O. vulgaris Proteins Identified by the Six-Frame Translation Tool
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Almeida, D.; Domínguez-Pérez, D.; Matos, A.; Agüero-Chapin, G.; Osório, H.; Vasconcelos, V.; Campos, A.; Antunes, A. Putative antimicrobial peptides of the posterior salivary glands from the cephalopod Octopus vulgaris revealed by exploring a composite protein database. Antibiotics 2020, 9, 757.
- Fingerhut, L.C.H.W.; Strugnell, J.M.; Faou, P.; Labiaga, Á.R.; Zhang, J.; Cooke, I.R. Shotgun Proteomics Analysis of Saliva and Salivary Gland Tissue from the Common Octopus Octopus vulgaris. J. Proteome Res. 2018, 17, 3866–3876.
- Aguilera-Mendoza, L.; Marrero-Ponce, Y.; Tellez-Ibarra, R.; Llorente-Quesada, M.T.; Salgado, J.; Barigye, S.J.; Liu, J. Overlap and diversity in antimicrobial peptide databases: Compiling a non-redundant set of sequences. Bioinformatics 2015, 31, 2553–2559.
- Proteomics Toolkit (Protk). Available online: https://github.com/iracooke/protk (accessed on 14 April 2019).
- Wiśniewski, J.R.; Zougman, A.; Nagaraj, N.; Mann, M. Universal sample preparation method for proteome analysis. Nat. Methods 2009, 6, 359–362.
- Bateman, A. UniProt: A worldwide hub of protein knowledge. Nucleic Acids Res. 2019, 47, D506–D515.
- Haas, B.J.; Papanicolaou, A.; Yassour, M.; Grabherr, M.; Blood, P.D.; Bowden, J.; Couger, M.B.; Eccles, D.; Li, B.; Lieber, M.; et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013, 8, 1494–1512.
- Sequence Read Archive of National Center for Biotechnology Information. Available online: https://www.ncbi.nlm.nih.gov/sra/?term=Cephalopoda (accessed on 26 October 2018).
- Sequence Set Browser from National Center for Biotechnology Information. Available online: https://www.ncbi.nlm.nih.gov/Traces/wgs/?page=1&view=TSA&search=Cephalopoda (accessed on 26 October 2018).
- Ruder, T.; Sunagar, K.; Undheim, E.A.B.; Ali, S.A.; Wai, T.-C.; Low, D.H.W.; Jackson, T.N.W.; King, G.F.; Antunes, A.; Fry, B.G. Molecular Phylogeny and Evolution of the Proteins Encoded by Coleoid (Cuttlefish, Octopus, and Squid) Posterior Venom Glands. J. Mol. Evol. 2013, 76, 192–204.
- European Nucleotide Archive. Available online: https://www.ebi.ac.uk/ena (accessed on 16 November 2018).
- CLC Genomics Workbench 11.0.1. Available online: https://www.qiagenbioinformatics.com/ (accessed on 16 November 2018).
- Geneious. Available online: https://www.geneious.com (accessed on 16 November 2018).
- DB Browser for SQLite. Available online: https://sqlitebrowser.org/ (accessed on 16 November 2018).
| Dataset Name | File Name | File Type | DOI |
|---|---|---|---|
| Dataset_1 | All_Databases_5950827_sequences | FASTA | 10.17632/df8w8dct3b.1 |
| | Database_A_19087_sequences | FASTA | |
| | Database_B_16990_sequences | FASTA | |
| | Database_C_2427_sequences | FASTA | |
| | Database_D_84778_sequences | FASTA | |
| | Database_E_5106635_sequences | FASTA | |
| | Database_F_720910_sequences | FASTA | |
| Dataset_2 | DA_summary_Proteome_Discoverer_ISD | XLSX | 10.17632/hrydnjz937.1 |
| | DA_summary_Proteome_Discoverer_FASP | XLSX | |
| Dataset_3 | 272704_contigs_from_16_cephalopods_PSGs_transcriptome_assemblies | FASTA | 10.17632/fjnnjv6nnn.1 |
| | SRR680047_assembly | FASTA | |
| | SRR684167_assembly | FASTA | |
| | SRR684223_assembly | FASTA | |
| | SRR725597_assembly | FASTA | |
| | SRR725779_assembly | FASTA | |
| | SRR725780_assembly | FASTA | |
| | SRR725935_assembly | FASTA | |
| | SRR725936_assembly | FASTA | |
| | SRR725937_assembly | FASTA | |
| | SRR725938_assembly | FASTA | |
| | SRR2047107_assembly | FASTA | |
| | SRR3105321_assembly | FASTA | |
| | SRR3105558_assembly | FASTA | |
| | SRR5204441_assembly | FASTA | |
| | SRR5204442_assembly | FASTA | |
| | SRR6349992_assembly | FASTA | |
| | Table_S1 | XLSX | |
| Dataset_4 | SRR680047_assembly.fasta.transdecoder.pep | FASTA | 10.17632/h94v3bk4j6.1 |
| | SRR684167_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR684223_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR725597_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR725779_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR725780_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR725935_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR725936_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR725937_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR725938_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR2047107_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR3105321_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR3105558_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR5204441_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR5204442_assembly.fasta.transdecoder.pep | FASTA | |
| | SRR6349992_assembly.fasta.transdecoder.pep | FASTA | |
| Dataset_5 | cases | CSV | 10.17632/p6vnj6ssrf.1 |
| | transcripts | CSV | |
| | DB | DB | |
| | SQL_command | TXT | |
| | 187926_contigs_not_included_in_Database_D | CSV | |
| | 187926_contigs_not_included_in_Database_D | FASTA | |
| | a sixframe.rb | RB | |
| | six-frame_translation_of_187926_contigs_not_included_in_Database_D | FASTA | |
| Dataset_6 | cases | CSV | 10.17632/x73ff3n744.1 |
| | transcripts | CSV | |
| | DB1 | DB | |
| | SQL_command1 | TXT | |
| | 31661_contigs_not_included_in_Database_A | CSV | |
| | 31661_contigs_not_included_in_Database_A | FASTA | |
| | a sixframe.rb | RB | |
| | six-frame_translation_of_31661_contigs_not_included_in_Database_A | FASTA | |
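
Dataset_5 and Dataset_6 each pair a sixframe.rb script with a six-frame translation FASTA output. As a rough illustration of what six-frame translation does, the Python sketch below translates a nucleotide sequence in the three forward and three reverse-complement reading frames using the standard genetic code. It is not the sixframe.rb script, whose options and output format are not described here; the function names, the frame labels (`+1` to `-3`), and the choice to render codons containing ambiguous bases as `X` are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from the first base; unknown codons become 'X' (assumption)."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping hypothetical frame labels to translations."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
for label, protein in frames.items():
    print(label, protein)
```

Stop codons are kept as `*` rather than used to split the translation into separate ORFs; a real pipeline would typically split on stops and apply a minimum length, which this sketch leaves out.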
| Instrument Platform (Library Layout) | Species | CLC Genomics Workbench de novo Assembly (a) | | | | | | | | | TransDecoder Analysis (a,b) | | Six-Frame Translation Tool Analysis (a,c) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | SRA Run Accession (d) | Number of Reads | Matched (e) | Contig Count | Contig Average Length | Reads Mapped in Pairs (f) | Reads Mapped in Broken Pairs (g) | N50 (h) | N75 (i) | # of Contigs Analyzed (j) | # of Proteins Identified (k) | # of Contigs Analyzed (l) | # of ORFs Identified (m) |
| Illumina (paired) | Sepia officinalis (female) | SRR5204441 | 34,623,104 | 31,510,916 | 47,489 | 686 | 23,187,508 | 8,323,408 | 1005 | 425 | 47,489 | 14,583 | 32,906 | 870,077 |
| | Sepia officinalis (male) | SRR5204442 | 21,428,980 | 18,038,146 | 40,778 | 675 | 14,141,858 | 3,896,288 | 929 | 426 | 40,778 | 14,056 | 26,722 | 691,205 |
| | Callistoctopus minor | SRR6349992 | 69,681,384 | 52,377,156 | 58,327 | 703 | 39,695,532 | 12,681,624 | 1072 | 440 | 58,327 | 15,365 | 42,962 | 1,164,790 |
| | Hapalochlaena maculosa | SRR3105558 | 16,128,360 | 13,948,566 | 36,755 | 636 | 12,399,458 | 1,549,108 | 832 | 410 | 36,755 | 13,695 | 23,060 | 580,147 |
| | Octopus kaurna | SRR3105321 | 46,268,294 | 40,764,402 | 33,936 | 584 | 37,224,454 | 3,539,948 | 718 | 379 | 33,936 | 10,965 | 22,971 | 572,048 |
| | Octopus bimaculoides | SRR2047107 | 71,186,024 | 65,629,243 | 50,286 | 875 | 58,627,142 | 7,002,101 | 1606 | 582 | 50,286 | 14,267 | 36,019 | 1,145,961 |
| LS454 (single) | Abdopus aculeatus | SRR680047 | 33,464 | 21,627 | 774 | 526 | N.A. | N.A. | 529 | 411 | 774 | 331 | 443 | 11,133 |
| | Hapalochlaena maculosa | SRR725938 | 55,955 | 49,003 | 528 | 475 | N.A. | N.A. | 494 | 378 | 528 | 154 | 374 | 9310 |
| | Loliolus noctiluca | SRR725597 | 72,031 | 67,299 | 200 | 552 | N.A. | N.A. | 545 | 436 | 200 | 93 | 107 | 2724 |
| | Octopus cyanea | SRR725937 | 55,039 | 40,899 | 964 | 503 | N.A. | N.A. | 521 | 396 | 964 | 352 | 612 | 15,328 |
| | Pareledone turqueti | SRR725936 | 64,419 | 60,295 | 231 | 500 | N.A. | N.A. | 522 | 404 | 231 | 101 | 130 | 3024 |
| | Octopus kaurna | SRR684223 | 61,953 | 55,831 | 491 | 497 | N.A. | N.A. | 497 | 394 | 491 | 164 | 327 | 7985 |
| | Sepia latimanus | SRR725779 | 49,960 | 42,657 | 434 | 461 | N.A. | N.A. | 459 | 361 | 434 | 83 | 351 | 8693 |
| | Adelieledone polymorpha | SRR684167 | 71,506 | 69,025 | 116 | 528 | N.A. | N.A. | 474 | 397 | 116 | 37 | 79 | 1847 |
| | Sepia pharaonis | SRR725935 | 45,677 | 36,088 | 492 | 489 | N.A. | N.A. | 480 | 395 | 492 | 166 | 326 | 7756 |
| | Sepioteuthis australis | SRR725780 | 68,851 | 60,037 | 903 | 562 | N.A. | N.A. | 563 | 448 | 903 | 366 | 537 | 14,607 |
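
The per-run counts in this table can be cross-checked against the sequence counts embedded in the dataset file names (for example `Database_D_84778_sequences` and `187926_contigs_not_included_in_Database_D`). The Python sketch below transcribes the relevant columns by hand and recomputes the totals. The variable names are illustrative, and the observation that each run's six-frame contig count equals its contig count minus its TransDecoder protein count is a pattern in the numbers, not a description of the authors' selection procedure.

```python
# Values transcribed from the assembly table:
# (SRA run, contig count, TransDecoder proteins, six-frame contigs, six-frame ORFs)
RUNS = [
    ("SRR5204441", 47489, 14583, 32906, 870077),
    ("SRR5204442", 40778, 14056, 26722, 691205),
    ("SRR6349992", 58327, 15365, 42962, 1164790),
    ("SRR3105558", 36755, 13695, 23060, 580147),
    ("SRR3105321", 33936, 10965, 22971, 572048),
    ("SRR2047107", 50286, 14267, 36019, 1145961),
    ("SRR680047", 774, 331, 443, 11133),
    ("SRR725938", 528, 154, 374, 9310),
    ("SRR725597", 200, 93, 107, 2724),
    ("SRR725937", 964, 352, 612, 15328),
    ("SRR725936", 231, 101, 130, 3024),
    ("SRR684223", 491, 164, 327, 7985),
    ("SRR725779", 434, 83, 351, 8693),
    ("SRR684167", 116, 37, 79, 1847),
    ("SRR725935", 492, 166, 326, 7756),
    ("SRR725780", 903, 366, 537, 14607),
]

# Sequence counts taken from the Database_A..F file names in Dataset_1.
DATABASE_SIZES = {
    "A": 19087, "B": 16990, "C": 2427,
    "D": 84778, "E": 5106635, "F": 720910,
}


def column_total(index):
    """Sum one numeric column of RUNS across all 16 assemblies."""
    return sum(row[index] for row in RUNS)


# Runs where contig count minus proteins does not match the six-frame input.
mismatched_runs = [
    run for run, contigs, proteins, sf_contigs, _ in RUNS
    if contigs - proteins != sf_contigs
]

total_contigs = column_total(1)
total_proteins = column_total(2)
total_six_frame_contigs = column_total(3)
total_orfs = column_total(4)
combined_database_size = sum(DATABASE_SIZES.values())

print("runs with inconsistent counts:", mismatched_runs)
print("contigs:", total_contigs)
print("TransDecoder proteins:", total_proteins)
print("six-frame contigs:", total_six_frame_contigs)
print("six-frame ORFs:", total_orfs)
print("all databases:", combined_database_size)
```

The totals agree with the counts in the file names: 272,704 contigs, 84,778 TransDecoder proteins (Database D), 187,926 contigs passed to six-frame translation, 5,106,635 ORFs (Database E), and 5,950,827 sequences across Databases A to F.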
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
How to Cite
Almeida, D.; Domínguez-Pérez, D.; Matos, A.; Agüero-Chapin, G.; Castaño, Y.; Vasconcelos, V.; Campos, A.; Antunes, A. Data Employed in the Construction of a Composite Protein Database for Proteogenomic Analyses of Cephalopods Salivary Apparatus. Data 2020, 5, 110. https://doi.org/10.3390/data5040110