Article

An Investigation of Alternatives to Transform Protein Sequence Databases to a Columnar Index Schema

1 Line of Business Life Science, Adesso Schweiz AG, 8048 Zürich, Switzerland
2 Bioprocess Engineering, University of Magdeburg, 39106 Magdeburg, Germany
3 Databases and Software Engineering, University of Magdeburg, 39106 Magdeburg, Germany
* Author to whom correspondence should be addressed.
Academic Editor: Antonello Rizzi
Algorithms 2021, 14(2), 59; https://doi.org/10.3390/a14020059
Received: 9 January 2021 / Revised: 5 February 2021 / Accepted: 8 February 2021 / Published: 11 February 2021
(This article belongs to the Special Issue Biological Knowledge Discovery from Big Data)
Mass spectrometers enable the identification of proteins in biological samples, which leads to biomarkers for biological process parameters and diseases. However, bioinformatic evaluation of mass spectrometer data requires a standardized workflow and a system that stores the protein sequences. Owing to their standardization and maturity, relational systems are a great fit for storing protein sequences. Hence, in this work, we present a schema for distributed column-based database management systems that uses a column-oriented index to store sequence data. To achieve high storage performance, it was necessary to choose a well-performing strategy for transforming the protein sequence data from the FASTA format to the new schema. Therefore, we applied an in-memory map, an HDDmap, a database engine, and an extended radix tree and evaluated their performance. The results show that our proposed extended radix tree performs best regarding memory consumption and runtime. Hence, the radix tree is a suitable data structure for transforming protein sequences into the indexed schema.
Keywords: trie; radix tree; storage system; sequence data; proteomics; mass spectrometry
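
To make the central idea concrete, the following is a minimal sketch of how FASTA records might be loaded into a plain radix tree (compressed trie) keyed by protein sequence, so that identical sequences are stored once and shared prefixes are collapsed into single edges. It is an illustration only and not the extended radix tree evaluated in the article, whose extensions are not described on this page. All names (`parse_fasta`, `RadixNode`, `RadixTree`, `build_index`) are hypothetical, and the sample records are made up.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


class RadixNode:
    def __init__(self) -> None:
        # Maps the first character of an edge label to (label, child).
        self.edges: Dict[str, Tuple[str, "RadixNode"]] = {}
        # Headers of all records whose sequence ends exactly at this node.
        self.headers: List[str] = []


class RadixTree:
    def __init__(self) -> None:
        self.root = RadixNode()

    def insert(self, key: str, header: str) -> None:
        node = self.root
        while key:
            edge = node.edges.get(key[0])
            if edge is None:
                # No edge shares a prefix with the rest of the key: add a leaf.
                leaf = RadixNode()
                node.edges[key[0]] = (key, leaf)
                node, key = leaf, ""
                continue
            label, child = edge
            # Length of the common prefix of the edge label and the key.
            n = 0
            while n < len(label) and n < len(key) and label[n] == key[n]:
                n += 1
            if n < len(label):
                # The key diverges inside the edge: split the edge at n.
                mid = RadixNode()
                mid.edges[label[n]] = (label[n:], child)
                node.edges[key[0]] = (label[:n], mid)
                child = mid
            node, key = child, key[n:]
        node.headers.append(header)

    def lookup(self, key: str) -> List[str]:
        """Return the headers stored for an exact sequence match."""
        node = self.root
        while key:
            edge = node.edges.get(key[0])
            if edge is None or not key.startswith(edge[0]):
                return []
            label, node = edge
            key = key[len(label):]
        return list(node.headers)


def build_index(fasta_text: str) -> RadixTree:
    """Insert every FASTA record into a fresh radix tree."""
    tree = RadixTree()
    for header, sequence in parse_fasta(fasta_text):
        tree.insert(sequence, header)
    return tree


# Made-up sample: P1 and P2 share one sequence, P3 shares only a prefix.
SAMPLE = ">P1\nMKT\nAYI\n>P2\nMKTAYI\n>P3\nMKTL\n"
tree = build_index(SAMPLE)
```

With the sample above, `tree.lookup("MKTAYI")` returns `['P1', 'P2']`, while `tree.lookup("MKT")` returns an empty list because no record ends at that shared prefix. Keeping a list of headers per node is one simple way to handle duplicate sequences; a real transformation into a columnar schema would additionally need to emit rows from the tree, which this sketch leaves out.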

MDPI and ACS Style

Zoun, R.; Schallert, K.; Broneske, D.; Trifonova, I.; Chen, X.; Heyer, R.; Benndorf, D.; Saake, G. An Investigation of Alternatives to Transform Protein Sequence Databases to a Columnar Index Schema. Algorithms 2021, 14, 59. https://doi.org/10.3390/a14020059

AMA Style

Zoun R, Schallert K, Broneske D, Trifonova I, Chen X, Heyer R, Benndorf D, Saake G. An Investigation of Alternatives to Transform Protein Sequence Databases to a Columnar Index Schema. Algorithms. 2021; 14(2):59. https://doi.org/10.3390/a14020059

Chicago/Turabian Style

Zoun, Roman, Kay Schallert, David Broneske, Ivayla Trifonova, Xiao Chen, Robert Heyer, Dirk Benndorf, and Gunter Saake. 2021. "An Investigation of Alternatives to Transform Protein Sequence Databases to a Columnar Index Schema" Algorithms 14, no. 2: 59. https://doi.org/10.3390/a14020059
