1. Introduction
Because DNA is stable and ubiquitous in the environment, contamination of laboratory consumables and reagents with DNA is nearly inevitable. This is of particular concern with highly sensitive analysis methods, such as polymerase chain reaction (PCR) assays and next-generation sequencing. Nucleic acid contamination has been a particular problem in studies of human microbiomes [1,2,3,4,5,6]. As a result, controversy has arisen as to whether some tissues, such as the human placenta, are sterile, as was previously assumed, or harbor low-level bacterial communities [1,3]. This has led to the coining of the term “kitome” for the contaminating bacterial sequences that originate from laboratory consumables and nucleic acid isolation kits [3,5]. It has also been reported that PCR materials can be sources of DNA contamination [1,7], which was demonstrated by treating PCR master mixes with DNases that specifically target double-stranded DNA and that have recently become commercially available [7,8]. Currently, the best practice when examining samples expected to have a low bacterial burden is to include a variety of negative control samples so that sequences likely due to contamination can be excluded [2,3,6]. Despite supposedly widespread knowledge of the contamination problem, only a minority of microbiome papers report using methodology to control for it, and most studies lack any negative controls [9]. However, including additional control reactions in short-read sequencing protocols and analysis is resource intensive and can require significant additional computational time. This makes arguing in favor of these controls difficult, especially for researchers with fewer resources available to them.
While many researchers have moved away from amplicon-based sequencing in favor of metagenomic approaches to determine the makeup of the microbiome, PCR remains an important step in the characterization of the microbial communities of low-burden samples, and amplicon-based approaches still have many advantages [10,11].
A variety of bioinformatic tools are available and in development to control for bacterial contamination [12,13,14]. However, many of these tools are overly blunt and only remove samples of low abundance, or assume that any sequence that is present in a negative control must be a contaminant in an experimental reaction [13]. Thus, there is still a need for experimental validation and human oversight of these filtering processes. These tools need further development, and they still function best when used in conjunction with thorough negative sampling.
Here, we present data that show that multiple commercially available PCR enzymes and their components are contaminated with different dominant bacterial DNAs, and that this contaminating DNA can be detected with endpoint PCR and Sanger sequencing. Our data indicate that DNA contamination should be examined regardless of the molecular biology reagents used, and that this can be carried out rapidly, for minimal cost, and without short-read sequencing, which makes the large number of control reactions needed feasible for labs with less access to resources. These contaminants can then be excluded or otherwise addressed in further analysis of the microbiome data.
2. Materials and Methods
2.1. PCR
Nine PCR enzymes (1–9) were obtained from five manufacturers; the identities of these enzymes and manufacturers can be found in Supplementary Table S1. Reactions were carried out according to the manufacturers’ recommendations. To test for bacterial DNA contamination, two sets of reactions were performed per enzyme. First, E. coli DNA was used as a template to confirm that the primer set detects bacterial DNA and to ensure that reactions were running as expected. Second, to determine whether contamination was present, reactions were run with water in place of DNA template. Primers were obtained from Invitrogen. Positive control reactions contained DNA extracted from overnight cultures of E. coli isolated from a human fecal sample, a generous gift from Dr. Julie In. Primers and reaction conditions were adapted from published studies [15,16]. PCRs were prepared using aseptic technique under laminar flow, in hoods that are used only for PCR preparation. Information regarding the enzyme manufacturer, deoxynucleotide triphosphate (dNTP) mix aliquot, water aliquot used, and whether the enzyme was premixed is listed in Table 1. All reactions were performed using Invitrogen RT-PCR grade water except for enzyme 9, which came with its own molecular-grade water. All PCR cycling conditions can be found in Table 2, Table 3, Table 4, Table 5 and Table 6. Primer sequences can be found in Table 7.
2.2. Gel Electrophoresis
Five μL of PCR product was combined with 1 μL of 6X gel loading dye and then separated by gel electrophoresis in a 1% agarose gel using the Owl system from ThermoFisher. Gels were cast using premixed SYBRsafe 0.5% solution (ThermoFisher, Waltham, MA, USA) and run in 1X TBE buffer. Gel images were developed using ultraviolet light in a UVP UVsolo touch system (Analytikjena, Jena, Germany).
2.3. Size Selection of Contaminating Bands for Sequencing
Samples were mixed with 10X loading dye (ThermoFisher) and separated by electrophoresis in the EGel system (ThermoFisher) using clone well 0.8% and Size Select 2% agarose gels according to the manufacturer’s directions (ThermoFisher). Bands were selected for sequencing at 500 bases, the expected size of the V3-4 region of the bacterial 16S rRNA. One enzyme generated bands at 1000 bases, which were also collected. Bands found at approximately 100 bases were collected if present. Samples were collected in nuclease-free water.
2.4. Sanger Sequencing of Samples
Samples were submitted to GENEWIZ (Azenta, Burlington, MA, USA). Samples were sequenced in separate forward and reverse reactions using the same primers from the PCRs.
2.5. Informatics Analysis of Sample Sequences
Chromatogram files, the raw trace files used to inspect Sanger sequencing results, were used for quality control. Read ends were trimmed to remove long runs of unidentified bases, and the first and last included bases had to have a quality score greater than or equal to 20. After trimming, the sequences were used to generate a phylogenetic tree using Mega 11 software [17]. The maximum likelihood algorithm was used, and 1000 bootstrap replicates were performed. The tree file was visualized using the Interactive Tree of Life (ITOL) for color coding and labeling [18]. Trimmed sequences were used to query the NCBI GenBank database via megablast, searching for highly similar nucleotide sequences. The top three hits from each search were identified, and the genus (and species, if present) of each hit was recorded along with the percent coverage and percent identity for each read. The similarity between samples was calculated as percent identity using Clustal Omega [19].
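The end-trimming rule described above can be sketched programmatically. This is a minimal illustration in Python, assuming the read is available as a base string with one Phred quality score per base; the function name and the example read are hypothetical, and the sketch does not reproduce the chromatogram-based inspection actually used in this study.

```python
from typing import List, Tuple

MIN_QUALITY = 20  # threshold stated in the text


def trim_read(
    bases: str,
    quals: List[int],
    min_quality: int = MIN_QUALITY,
) -> Tuple[str, List[int]]:
    """Trim both ends until the first and last kept bases are called (not N)
    and have a quality score of at least min_quality.

    Interior bases are left untouched, whatever their quality.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")

    def acceptable(i: int) -> bool:
        return bases[i].upper() != "N" and quals[i] >= min_quality

    start, end = 0, len(bases)
    while start < end and not acceptable(start):
        start += 1
    while end > start and not acceptable(end - 1):
        end -= 1
    return bases[start:end], quals[start:end]


# Hypothetical read: unidentified, low-quality bases at both ends.
trimmed_bases, trimmed_quals = trim_read("NNACGTNN", [5, 5, 30, 10, 40, 25, 8, 3])
print(trimmed_bases, trimmed_quals)
```

Note that only the ends are constrained: a low-quality base inside the retained region is kept, which matches the rule as stated but may be more permissive than a sliding-window trimmer.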
3. Results
3.1. Seven of Nine Commercially Available PCR Enzymes Were Found to Have Bacterial DNA Contamination
Commercially available DNA polymerases and their reaction components were purchased from five manufacturers, and nine polymerase products were tested for bacterial DNA contamination in triplicate using E. coli DNA as a template or water as a negative control. One of the polymerases was premixed, and all kits provided all necessary components except the dNTP solution, which was shared between the reactions as indicated in Table 1. The polymerases were grouped by similar conditions, and the reactions were run in those groups in a thermocycler. The PCR products were then visualized under UV light after gel electrophoresis. Seven of the nine polymerases produced bands in water control reactions that were equivalent in size to the V3-4 region of the 16S rRNA (Figure 1), as seen by comparison with positive control reactions containing template DNA isolated from overnight cultures of E. coli (Figure 1). These data indicate that bacterial DNA contamination is present in seven of the nine commercially available products. Enzyme and manufacturer identities are found in Supplementary Table S1.
These results are not due to contamination from our laboratory, as two of the enzymes (2 and 9) do not have visible contamination after PCR. Enzyme 1 also showed little contamination, with only a faint band in the third technical replicate of the water reaction, indicating a small amount of DNA contamination. The enzymes that display contamination were prepared at different times and run in different reactions in the thermocycler, reducing the likelihood of cross-contamination between reactions.
While DNA gels have become less popular as an endpoint for the detection of DNA, they are a highly sensitive assay. A DNA band can be visualized at a total DNA concentration as low as 2 ng/μL. Because PCR amplification is exponential, this amount of DNA can be generated from sub-picogram amounts of DNA in the initial reaction. Thus, blank lanes in the gel indicate samples with little or no DNA that could be amplified by the selected primer set. DNA that was present but could not be amplified to a concentration visible in a gel would not be apparent here.
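The arithmetic behind this sensitivity argument can be sketched with an idealized exponential model. The Python example below is illustrative only: the 2 ng/μL detection limit comes from the text, while the 25 μL reaction volume, the 0.5 pg of amplifiable starting template, and the per-cycle efficiency values are assumptions, and the model ignores the plateau phase of real reactions.

```python
import math


def cycles_needed(
    start_mass_pg: float,
    target_conc_ng_per_ul: float,
    reaction_volume_ul: float,
    efficiency: float = 1.0,
) -> int:
    """Cycles required for amplifiable template to reach a target concentration.

    Assumes ideal exponential growth: mass multiplies by (1 + efficiency) per cycle.
    """
    target_mass_pg = target_conc_ng_per_ul * reaction_volume_ul * 1000.0  # ng -> pg
    fold_increase = target_mass_pg / start_mass_pg
    return math.ceil(math.log(fold_increase) / math.log(1.0 + efficiency))


# Assumed values: 0.5 pg of template in a 25 uL reaction; 2 ng/uL is the gel limit from the text.
perfect = cycles_needed(0.5, 2.0, 25.0)                    # doubling every cycle
realistic = cycles_needed(0.5, 2.0, 25.0, efficiency=0.9)  # 90% efficiency, an assumption
print(perfect, realistic)
```

Under these assumptions a 100,000-fold increase is required, which falls well within the cycle counts of a typical endpoint PCR, consistent with the statement that sub-picogram contamination can yield a visible band.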
It is possible that reads would be detectable during a next-generation sequencing run, as contamination has been reported at levels of hundreds of reads per lane [5]. However, that would open the possibility that the contamination occurred during library preparation, or during chip loading and sequencing itself, which would be out of our control. For a truly comprehensive approach, size selection of PCR products could be performed for the approximately 500 base-pair contaminants, even in reactions that do not have visible DNA on an agarose gel. These selected products could then be submitted for Sanger sequencing. However, this is unlikely to generate useful sequencing data, as the ideal amount of DNA for a Sanger read is roughly equivalent to the limit of detection in an agarose gel [20,21].
3.2. DNA Contamination Derived from Different Bacterial Sources
PCR products from these reactions were submitted for Sanger sequencing using either the forward or reverse primer from the PCR. Each contaminating band of the expected size of the V3-4 regions (500 bp) was selected, as were bands of 100 bp if they were present and distinct after electrophoresis. Enzyme 8 also produced bands of 1000 bp in its positive control reactions, and these were selected as well. The chromatograms were used to trim low-quality ends, and any sequences that did not result in usable reads were excluded. These sequences were then used to determine (a) if there was an association between the contaminating DNA and the polymerases used in each reaction and (b) the most likely identity of the contaminating bacteria.
By generating phylogenetic trees using the maximum likelihood algorithm, we found an association between the polymerase used and the resulting sequence of the contaminating bands. Sequences from most of the polymerases clustered closely with other sequences from the same polymerase (Figure 2 and Figure 3). The exception is enzyme 7, whose sequences are interspersed with those of enzymes 3 and 5. Percent identity between sequences, as calculated by Clustal Omega, can be found in Table 8. These calculations indicate that while certain enzymes and reactions have highly similar sequences (for example, enzyme 8’s positive reactions), the sequencing results are highly diverse across the different enzymes. The most likely contaminating bacteria were determined by sequence identity using NCBI BLAST. The likely identities of these bacteria can be found in Supplementary Table S2.
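For readers unfamiliar with the percent identity measure, the following Python sketch computes it for a single pair of already-aligned sequences. It assumes one common definition (identical positions divided by aligned positions where neither sequence has a gap); the exact definition used by Clustal Omega may differ, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of matching bases over columns where neither sequence has a gap.

    Both inputs must come from the same alignment and so have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are skipped under this assumed definition
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments: one mismatch and one gap column.
identity = percent_identity("ACGT-A", "ACCTTA")
print(identity)
```

Values near 100 between reads from the same enzyme and markedly lower values between enzymes would correspond to the pattern described for Table 8.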
This grouping of polymerases, together with the distinct sequences generated from each contaminant band, indicates that these bacterial DNA contaminants were intrinsic to the polymerase reaction components themselves and were not introduced in our own workflow. If the bacterial contamination had been introduced by our handling of the samples, we would expect to see more similar sequences across all the polymerases, without an association between sequences and individual polymerases. Additionally, if the contamination had been introduced by us, we would not have expected to see the wide variance in bacterial species and genera found via BLAST search. Furthermore, contamination of our own reagents or the introduction of contamination during reaction setup would have led to every polymerase reaction being contaminated, but two of the polymerases, enzymes 2 and 9, did not contain apparent bacterial DNA.
4. Discussion
It has been previously demonstrated that bacterial DNA contamination can be found in extraction kits, laboratory consumables, and even in PCR reagents [1,2,3,4,5,6,7]. Here, we have extended these previous findings by examining a large cross-section of commercially available PCR enzymes, whereas previous work has mostly characterized one enzyme at a time. We have also made a point of detecting and characterizing these contaminants using only endpoint PCR and Sanger sequencing. These methods are attainable even for labs with minimal resources, for whom additional short-read sequencing and analysis would be cost- or time-prohibitive.
Here, we demonstrate not only that DNA contamination is widespread in PCR reagents, but also that the dominant bacteria in these reagents vary. This variation indicates that it cannot be assumed that a lab will always have the same contaminating sequences, as these will change with the PCR enzyme selected. We suggest that research groups maintain a single type and lot of enzyme for a set of related experiments. If the enzyme or lot is switched for any reason, a new set of control experiments should be performed to ensure an accurate picture of the possible bacterial contaminants. This is illustrated in these experiments by the difference in contamination profiles of enzyme 1. In previous experiments in our lab, a different lot of this enzyme produced bright bands in all its negative controls of equal size to the V3-4 region of the 16S rRNA (unpublished data). In these experiments, however, only one of the three reactions has a faint contamination band, approaching the limit of detection of an agarose gel. This shows how widely the contamination of an enzyme can vary between lots. We recommend that PCR enzymes be purchased by specific lot, and that a single lot be maintained for any experiments that will be directly compared to one another.
Importantly, our data show that this contamination of bacterial DNA can be detected with endpoint PCR and Sanger sequencing, two methods which are rapid and accessible for many labs compared to the more expensive and resource-intensive qRT-PCR and short-read sequencing. Microbiome research remains dominated by groups in well-resourced countries, often far from the locations where samples are collected, especially in humans. For these labs, the additional expense and resources required for sequencing negative control reactions is inconsequential. However, many labs are tightly constrained by budgets and may be limited to only being able to sequence a minimum of experimental samples and controls. Our work here is valuable to show that, in these situations, endpoint PCR and Sanger sequencing can be used as valid substitutes for additional short-read sequencing runs.
Together, the data presented here reinforce how important it is to carefully control for bacterial contamination regardless of the analysis methods used. As stated above, bacterial DNA contamination is an ongoing issue in the study of the microbiome, and it is a particular problem in samples that are likely to have a low bacterial burden, as contamination is likely to make up a higher proportion of the DNA present. Even though this is a known issue in the field, it remains underappreciated and, in some cases, uncontrolled for. Many potential sources of contamination should be examined, such as DNA extraction kits, plastic ware, and, as shown here, the PCR reagents used for amplification of target sequences. We recommend that whenever a low bacterial burden tissue or environmental sample is to be examined for bacterial DNA, one of the included control reactions should control for potential contamination of PCR reagents. This control can be used in one of two ways: (1) if the selected PCR enzyme must be used, then the contaminants can be identified and controlled for during analysis; or (2) multiple enzymes and enzyme lots can be tested, and only those that are found to be free of detectable DNA can be selected.
Supplementary Materials
The following supporting information can be downloaded at:
https://www.mdpi.com/article/10.3390/microorganisms13040732/s1, Table S1. Table containing the key for the enzyme and manufacturers used, Table S2. Table containing predicted bacterial identities from blast search, Table S3. Table containing the accession numbers of the sequencing information in the NCBI GenBank database.
Author Contributions
Conceptualization, A.M.S. and S.B.B.; methodology, A.M.S.; validation, A.M.S.; formal analysis, A.M.S.; investigation, A.M.S.; resources, S.B.B.; writing—original draft, A.M.S.; writing—review and editing, A.M.S. and S.B.B.; visualization, A.M.S. and S.B.B.; supervision, S.B.B.; project administration, S.B.B.; funding acquisition, A.M.S. and S.B.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research is supported by an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health under grant number P20GM103451 (SBB) and a National Institutes of Health grant K12 GM088021 (AMS).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
All sequence information is available for download and inspection from the NCBI GenBank database at the accession numbers found in Supplementary Table S3.
Acknowledgments
We thank Julie In for providing E. coli samples for the positive controls used in this manuscript.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
PCR | polymerase chain reaction |
qRT-PCR | quantitative real-time polymerase chain reaction |
dNTP | deoxynucleotide triphosphate |
References
1. de Goffau, M.C.; Lager, S.; Sovio, U.; Gaccioli, F.; Cook, E.; Peacock, S.J.; Parkhill, J.; Charnock-Jones, D.S.; Smith, G.C.S. Human Placenta Has No Microbiome but Can Contain Potential Pathogens. Nature 2019, 572, 329–334.
2. Aggarwal, D.; Rajan, D.; Bellis, K.L.; Betteridge, E.; Brennan, J.; de Sousa, C.; Carriage Study Team; Parkhill, J.; Peacock, S.J.; de Goffau, M.C.; et al. Optimization of High-Throughput 16S rRNA Gene Amplicon Sequencing: An Assessment of PCR Pooling, Mastermix Use and Contamination. Microb. Genom. 2023, 9, 001115.
3. Olomu, I.N.; Pena-Cortes, L.C.; Long, R.A.; Vyas, A.; Krichevskiy, O.; Luellwitz, R.; Singh, P.; Mulks, M.H. Elimination of “Kitome” and “Splashome” Contamination Results in Lack of Detection of a Unique Placental Microbiome. BMC Microbiol. 2020, 20, 157.
4. Glassing, A.; Dowd, S.E.; Galandiuk, S.; Davis, B.; Chiondini, R.J. Inherent Bacterial DNA Contamination of Extraction and Sequencing Reagents May Affect Interpretation of Microbiota in Low Bacterial Biomass Samples. Gut Pathog. 2016, 8, 24.
5. Salter, S.J.; Cox, M.J.; Turek, E.M.; Calus, S.T.; Cookson, W.O.; Moffatt, M.F.; Turner, P.; Parkhill, J.; Loman, N.J.; Walker, A.W. Reagent and Laboratory Contamination Can Critically Impact Sequence-Based Microbiome Analyses. BMC Biol. 2014, 12, 87.
6. Drengenes, C.; Wiker, H.G.; Kalananthan, T.; Nordeide, E.; Eagan, T.M.L.; Nielsen, R. Laboratory Contamination in Airway Microbiome Studies. BMC Microbiol. 2019, 19, 187.
7. Stinson, L.F.; Keelan, J.A.; Payne, M.S. Identification and Removal of Contaminating Microbial DNA from PCR Reagents: Impact on Low-Biomass Microbiome Analyses. Lett. Appl. Microbiol. 2019, 68, 2–8.
8. Nilsen, I.W.; Øverbø, K.; Havdalen, L.J.; Elde, M.; Gjellesvik, D.R.; Lanes, O. The Enzyme and the cDNA Sequence of a Thermolabile and Double-Strand Specific Dnase from Northern Shrimps (Pandalus Borealis). PLoS ONE 2010, 5, e10295.
9. Harrison, J.G.; Randolph, G.D.; Buerkle, C.A. Characterizing Microbiomes via Sequencing of Marker Loci: Techniques To Improve Throughput, Account for Cross-Contamination, and Reduce Cost. mSystems 2021, 6, e0029421.
10. Liu, Y.-X.; Qin, Y.; Chen, T.; Lu, M.; Qian, X.; Guo, X.; Bai, Y. A Practical Guide to Amplicon and Metagenomic Analysis of Microbiome Data. Protein Cell 2021, 12, 315–330.
11. Peterson, D.; Bonham, K.S.; Rowland, S.; Pattanayak, C.W.; RESONANCE Consortium; Klepac-Ceraj, V.; Deoni, S.C.L.; D’Sa, V.; Bruchhage, M.; Volpe, A.; et al. Comparative Analysis of 16s rRNA Gene and Metagenome Sequencing in Pediatric Gut Microbiomes. Front. Microbiol. 2021, 12, 670336.
12. Piro, V.C.; Renard, B.Y. Contamination Detection and Microbiome Exploration with Grimer. GigaScience 2023, 12, giad017.
13. Austin, G.I.; Park, H.; Meydan, Y.; Seeram, D.; Sezin, T.; Yue, C.L.; Firek, B.A.; Morowitz, M.J.; Banfield, J.F.; Christiano, A.M.; et al. Contamination Source Modeling with Scrub Improves Cancer Phenotype Prediction from Microbiome Data. Nat. Biotechnol. 2023, 41, 1820–1828.
14. Hülpüsch, C.; Rauer, L.; Nussbaumer, T.; Schwierzeck, V.; Bhattacharyya, M.; Erhart, V.; Traidl-Hoffmann, C.; Reiger, M.; Neumann, A.U. Benchmarking Microbiem—A User-Friendly Tool for Decontamination of Microbiome Sequencing Data. BMC Biol. 2023, 21, 269.
15. Fadeev, E.; Cardozo-Mino, M.G.; Rapp, J.Z.; Bienhold, C.; Salter, I.; Salman-Carvalho, V.; Molari, M.; Tegetmeyer, H.E.; Buttigieg, P.L.; Boetius, A. Comparison of Two 16S rRNA Primers (V3–V4 And V4–V5) for Studies of Arctic Microbial Communities. Front. Microbiol. 2021, 12, 670336.
16. Drengenes, C.; Eagan, T.M.L.; Haaland, I.; Wiker, H.G.; Nielsen, R. Exploring Protocol Bias in Airway Microbiome Studies: One versus Two PCR Steps and 16s rRNA Gene Region V3 V4 Versus V4. BMC Genom. 2021, 22, 3.
17. Tamura, K.; Stecher, G.; Kumar, S. Mega11: Molecular Evolutionary Genetics Analysis Version 11. Mol. Biol. Evol. 2021, 38, 3022–3027.
18. Letunic, I.; Bork, P. Interactive Tree Of Life (iTOL) V5: An Online Tool for Phylogenetic Tree Display and Annotation. Nucleic Acids Res. 2021, 49, W293–W296.
19. Madeira, F.; Pearce, M.; Tivey, A.R.N.; Basutkar, P.; Lee, J.; Edbali, O.; Madhusoodanan, N.; Kolesnikov, A.; Lopez, R. Search and Sequence Analysis Tools Services from Embl-Ebi in 2022. Nucleic Acids Res. 2022, 50, W276–W279.
20. Crossley, B.M.; Bai, J.; Glaser, A.; Maes, R.; Porter, E.; Killian, M.L.; Clement, T.; Toohey-Kurth, K. Guidelines for Sanger Sequencing and Molecular Assay Monitoring. J. Vet. Diagn. Investig. 2020, 32, 767–775.
21. Technical Notes-Sample Submission Guidelines-Resources-GENEWIZ. Available online: https://www.genewiz.com/en/Public/Resources/Sample-Submission-Guidelines/Sanger-Sequencing-Sample-Submission-Guidelines/Technical-Notes (accessed on 30 September 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).