Metagenomic analysis, which reveals the composition and functional repertoire of complex microbial communities, typically relies on large amounts of sequence data. Numerous analysis strategies and computational tools are available for processing such data. Fully integrated, automated analysis pipelines such as MG-RAST or MEGAN6 are user-friendly but are not designed to integrate specific knowledge of the biological system under study. To facilitate the use of such knowledge, we introduce a modular, adaptable analysis pipeline that combines existing tools. We applied the novel pipeline to simulated mock data sets focusing on anaerobic digestion microbiomes and compared the results to those obtained with established automated analysis pipelines. We find that the analysis strategy and the choice of tools and parameters have a strong effect on the inferred taxonomic community composition, but not on the inferred functional profile. Including prior knowledge can decrease computational costs while improving the accuracy of results. While automated off-the-shelf analysis pipelines are easy to apply and require no knowledge of the microbial system under study, custom-made pipelines require more preparation time and bioinformatics expertise. Our modular, flexible pipeline minimizes this extra effort: it can be adapted to different scenarios and can take available knowledge of the microbial system under study into account.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.