Application of Machine Learning Models in Error and Variant Detection in High-Variation Genomics Datasets†
Abstract
Analysis, error detection, and variant calling are difficult for metagenomics datasets, datasets of complex polyploid genomes, and other high-variation genomics datasets, because sequencing errors are hard to distinguish from biological variation. Confirming base candidates by their high frequency of occurrence is no longer a reliable measure, because of natural variation and the presence of rare bases. The paper discusses an approach that applies machine learning models to classify bases as either errors or rare variants, after potential error candidates are preselected with a weighted frequency measure that uses inter-sequence pairwise similarity to focus on unexpected variation. Different similarity measures are used to account for different types of datasets. Four machine learning models are implemented and tested.
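As a rough illustration of the preselection idea, the sketch below weights each sequence's vote on a base by its similarity to the sequence under inspection, so that a base which disagrees with otherwise similar sequences stands out. It is not the paper's formula. The function names (`identity`, `weighted_frequency`, `error_candidates`), the use of plain positional identity as the similarity measure, and the `threshold` value are all assumptions made for illustration, and the input is assumed to be a set of already aligned, equal-length sequences.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions where two equal-length sequences agree.

    Stand-in similarity measure (an assumption, not taken from the paper).
    """
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def weighted_frequency(seqs: List[str], i: int, col: int) -> float:
    """Similarity-weighted share of the other sequences that carry the same
    base as sequence i at column col."""
    total = 0.0
    agree = 0.0
    for j, other in enumerate(seqs):
        if j == i:
            continue
        w = identity(seqs[i], other)  # similar sequences get a larger vote
        total += w
        if other[col] == seqs[i][col]:
            agree += w
    return agree / total if total > 0 else 0.0


def error_candidates(seqs: List[str], threshold: float = 0.2) -> List[Tuple[int, int]]:
    """Return (sequence index, column) pairs whose weighted frequency falls
    below the threshold (hypothetical cut-off)."""
    return [
        (i, col)
        for i, s in enumerate(seqs)
        for col in range(len(s))
        if weighted_frequency(seqs, i, col) < threshold
    ]


reads = ["ACGT", "ACGT", "ACGT", "ACGA"]
candidates = error_candidates(reads)
```

In a pipeline like the one the abstract describes, positions flagged this way would then be passed to a classifier that decides between error and rare variant. That step is omitted here.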
Krachunov, M.; Nisheva, M.; Vassilev, D. Application of Machine Learning Models in Error and Variant Detection in High-Variation Genomics Datasets. Computers 2017, 6, 29.
Krachunov M, Nisheva M, Vassilev D. Application of Machine Learning Models in Error and Variant Detection in High-Variation Genomics Datasets. Computers. 2017; 6(4):29.
Krachunov, Milko; Nisheva, Maria; Vassilev, Dimitar. 2017. "Application of Machine Learning Models in Error and Variant Detection in High-Variation Genomics Datasets." Computers 6, no. 4: 29.
Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.