A Two-Step PCR Protocol Enabling Flexible Primer Choice and High Sequencing Yield for Illumina MiSeq Meta-Barcoding

High-throughput amplicon sequencing that primarily targets the 16S ribosomal DNA (rDNA) (for bacteria and archaea) and the Internal Transcribed Spacer rDNA (for fungi) has facilitated microbial community discovery across diverse environments. A three-step PCR that utilizes flexible primer choices to construct the library for Illumina amplicon sequencing has been applied to several studies in forest and agricultural systems. The three-step PCR protocol, while producing high-quality reads, often yields a large number (up to 46%) of reads that cannot be assigned to a specific sample according to their barcodes. Here, we improve this technique through an optimized two-step PCR protocol. We tested and compared the improved two-step PCR meta-barcoding protocol against the three-step PCR protocol using four different primer pairs (fungal ITS: ITS1F-ITS2 and ITS1F-ITS4, and bacterial 16S: 515F-806R and 341F-806R). We demonstrate that the sequence quantity and recovery rate were significantly improved with the two-step PCR approach (fourfold more read counts per sample; determined reads ≈90% per run) while retaining high read quality (Q30 > 80%). Given that synthetic barcodes are incorporated independently from any specific primers, this two-step PCR protocol can be broadly adapted to different genomic regions and organisms of scientific interest.


Introduction
The advancement of high-throughput sequencing has transformed our ability to explore microbial diversity across different environments. The overall sequence read output in combination with the ability to multiplex samples (i.e., pooling PCR products generated from many samples together in one sequencing effort and assigning the reads back to the original samples) makes MiSeq amplicon sequencing more cost-effective than other approaches. However, various technical difficulties may occur during library preparation, leading to low read quality or low clustering and read recovery rate [1]. To optimize sequencing quality, the manufacturer recommends adding PhiX, short DNA fragments derived from a well-characterized bacteriophage genome, upon sequencing (Nicolas Devos, personal communication). In cases with 10% PhiX addition, 10% of unassigned reads are expected. If more than 10% unassigned reads are generated during sequencing, it is likely due to mistagging or unsuccessful tagging of barcodes to the intermediate PCR products during library construction. Many existing protocols utilize a one-step PCR for DNA amplicon sequencing [2][3][4]. In this one-step PCR approach, a barcode is linked to a primer that targets a specific genomic region [3]. However, when a new primer set is tested, new primers with barcodes are required (e.g., 96-384, each with ca. 60 bp synthetic oligonucleotides), which is costly. Compared to one-step PCR, the multi-step
Figure 1. Examples of sample sources tested in this study, including (A) roots and soils from a cotton field, (B,C) roots from pine forests, (D) leaves of grass, and (E,F) soil cores collected from a grass field.

Overview of Three-Step and Two-Step PCR Library Construction
To compare the sequence outcomes of two-step PCR (2P) and three-step PCR (3P) [5], we sequenced the 16S and ITS rDNA gene regions. We compared the microbial communities (bacteria, archaea, and/or fungi) from diverse environmental samples including soil and leaf samples from the forest and agricultural lands as well as root samples of pine trees (Figure 1). The references of each primer set tested are provided in Supplementary  Table S1. The methods used for DNA extraction from plant and soil samples were applied according to Liao et al., 2014 [11], and Beule et al., 2019 [8], respectively.
Briefly, the 3P protocol started with a template enrichment step with the amplification of a targeted genomic DNA region using organism/genomic region-specific primers, generating PCR product 3P_1st. Product 3P_1st was then supplied as the DNA template for the next PCR, which used organism/genomic region-specific primers with Illumina sequencing primers attached, producing product 3P_2nd. The 3P_2nd product was used as a DNA template for the final PCR step that PCR-ligates Illumina adaptors to both ends and adds a 10-bp barcode to the 3′ end of the targeted region, resulting in PCR product 3P_3rd (Tables 1 and 2) [5]. To assess the effect of magnetic bead clean-up, a "3P+cleanup" (Table 1) protocol that only differs from 3P by including a clean-up step before the final PCR reaction was tested as well.

Table 1. Cross comparison of the steps applied for two-step PCR (2P), three-step PCR (3P), and three-step PCR with bead clean-up (3P+cleanup) protocols. Forward Gene Region-Specific Primer (FGRSP, e.g., ITS1F, 341F, and 515F) and Reverse Gene Region-Specific Primer (RGRSP, e.g., ITS2, ITS4, and 806R) can be replaced by forward and reverse primers of interest, respectively. x = barcode ID. Each of the 2P steps (Step 1-7) were described in detail in Figure 2.
Step Included or Not Step Included or Not Beads clean-up (2nd PCR product clean-up) Step 5 Yes Yes PCR product evaluation and multiplex Step 6 Yes Yes
The overall workflow of 2P is illustrated in Figure 2. The core steps of 2P include the isolation of the total genomic DNA from the samples (Step 1), followed by an initial PCR amplification using a pair of organism/genomic region-specific primers with Illumina sequencing primers attached (Step 2) to generate the PCR product 2P_1st (Figure 3). The 2P_1st product was cleaned up (Step 3) with a size-selection magnetic bead system (AMPure XP, Beckman Coulter, Inc., Indianapolis, IN, USA) to generate product C_2P_1st. C_2P_1st was used as the starting DNA template for the second PCR (Step 4). The second PCR further amplified product C_2P_1st with a universal primer set, adding Illumina adaptors and a 10-bp barcode that enabled the bioinformatic separation of reads derived from individual samples (Figure 3). The PCR product (2P_2nd) generated from the second PCR was purified with magnetic beads to generate C_2P_2nd (Step 5). The C_2P_2nd product was examined with gel electrophoresis to ensure the correct size of the amplicons. The DNA concentration was measured, and the samples were normalized and pooled (Step 6). The pooled DNA library was submitted to the sequencing facility for sequencing (Step 7). The demultiplexing step was carried out using Illumina bcl2fastq Conversion Software v2 with one base error allowance for samples sequenced at Duke University Center for Genomic and Computational Biology. The detailed 2P protocol is illustrated in the next section. The sequences of the oligonucleotides required for 2P are provided in Supplementary Tables S1-S3.
Agronomy 2021, 11, x FOR PEER REVIEW 5 of 19
Step 2 include the "gene region-specific primer (GRSP)" (colored in green; e.g., ITS1F and ITS4); the "linker" region (orange), which contains two base pairs; the "frame shift" regions (pink), which are one to six randomized nucleotides (F1-F6); and the sequencing primers (blue) specific to the sequencing platform selected (e.g., Illumina). In Step 3, the forward primer (PCR_F) added includes sequencing primers and Illumina primers (purple). The reverse primer (PCR_R_bc_(X), X = barcode ID) has an additional 10 bp barcode region (yellow) that is used for the recognition of read originality. PCR product clean-up was carried out after each PCR reaction (Steps 3 and 5). The oligonucleotides and sequences of Read1_seq and Read2_seq are provided to the sequencing facility upon sample submission. Synthetic oligonucleotide sequences are provided in Supplementary Tables S1-S3. Thermocycler programs are illustrated in Figure 3.
The major differences between the 2P protocol and the original 3P protocol [5] include (Tables 1 and 2): (1) a decrease in the number of PCR steps from three to two by eliminating the enrichment step while maintaining the same total number of PCR cycles (30 cycles); (2) the use of a steadily decreasing annealing temperature (touchdown approach) during PCR instead of a constant temperature for primer annealing, improving primer specificity and annealing and reducing primer dimers; (3) a decrease in the amount of DNA template added to the final PCR; and (4) the implementation of a magnetic bead clean-up step before the final PCR reaction.
To evaluate the performance of the 2P vs. 3P protocols, we first compared the lengths of PCR products based on the same DNA extractions to check if barcodes/adaptors were successfully added (dataset 1). Second, we compared the sequencing results of samples prepared with three protocols, including 2P, 3P, and 3P+cleanup (sample number = 7 for each protocol, dataset 2) (Table 1, Supplementary Table S4). The libraries were based on exactly the same DNA extractions with the same primer set (515F-806R) and pooled with equal moles of PCR products for sequencing. The differences in the sequencing results (i.e., read number and quality) across the three protocols were evaluated. We then compared the critical components of the sequencing reports from 10 independent MiSeq runs, of which 5 runs were prepared with the 2P protocol and the other 5 with the 3P protocol (dataset 3, Supplementary Table S5). All 10 MiSeq runs were conducted with a single sequencing provider (Duke University Center for Genomic and Computational Biology) using the same platform (Illumina MiSeq v3 300PE). For both the 2P and 3P libraries, we included various sample types (different soil or plant samples) representing heterogeneous DNA extractions targeting bacteria/archaea or fungi. While the heterogeneous nature of the samples tested could bring about inconsistency, their diverse contents also offer a valuable opportunity to evaluate the consistency across sample types and individual runs. To evaluate the consistency of sequencing quality and the percentage of undetermined reads across independent runs, we performed a Welch two-sample t-test (or a Wilcoxon rank-sum test for non-normally distributed data) and a Levene test for homogeneity of variance to assess the differences between the 2P and the 3P protocols and the variation within the same protocol, respectively. All of the statistical tests were conducted in R [12].
To validate that the effects of 2P vs. 3P are consistent across sequencing facilities and different microbial groups, we independently replicated the 2P vs. 3P protocols and sequencing at Michigan State University (MSU), targeting fungi and bacteria from agricultural soils and from small mammal scat (dataset 4, Supplementary Method S1, Table S6). We compared the read number, richness, and taxonomic composition of bacteria and fungi based on the same DNA extractions. Finally, to evaluate the scale of increase/decrease of sequencing performance on the same sample, we calculated the ratio between 2P and 3P for read number and read quality generated by Duke and MSU sequencing facilities.
The Shapiro-Wilk test was applied to assess the normality of the data. If the data followed a normal distribution, an ANOVA or Welch two-sample t-test was conducted. A post hoc Tukey HSD test was then performed for the significant outcomes evaluated using ANOVA. If data normality was rejected, a Kruskal-Wallis or Wilcoxon rank-sum test was carried out. When a Kruskal-Wallis test was significant, pair-wise comparisons were performed using the Wilcoxon rank-sum test with FDR correction for multiple comparisons. Raw reads of datasets 2 and 3 were deposited at the Sequence Read Archive of NCBI (Bioproject: PRJNA736330).
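The test-selection logic and the FDR correction described above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code: the function names `choose_test` and `benjamini_hochberg` are hypothetical, the mapping from group count to test (two groups versus more than two) is an assumption, and the Benjamini-Hochberg procedure is assumed to be the FDR method, since the text does not name it.

```python
def choose_test(is_normal, n_groups):
    """Pick a test following the decision path described in the text.

    Assumption: two groups use the two-sample tests, and more than
    two groups use the omnibus tests followed by post hoc comparisons.
    """
    if n_groups < 2:
        raise ValueError("at least two groups are needed")
    if is_normal:
        return "Welch t-test" if n_groups == 2 else "ANOVA + Tukey HSD"
    if n_groups == 2:
        return "Wilcoxon rank-sum"
    return "Kruskal-Wallis + pairwise Wilcoxon (FDR)"


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
adjusted_p = benjamini_hochberg([0.01, 0.04, 0.03])
```

Each adjusted value is the raw p-value scaled by the number of comparisons divided by its rank, capped so that the adjusted values never decrease as the raw p-values increase.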

Detailed Workflow of Two-Step PCR (2P) Protocol
The recommended steps for 2P are listed here. This protocol has been tested on soil and root materials (Figure 1).

Sample Collection and Processing
Samples collected from the field need to be immediately stored at 4 °C in the field. Sample processing (e.g., soil sieving and root picking) must be carried out within 24 h, and the samples are subsequently stored in a −20 °C or −80 °C freezer for longer sample preservation.

DNA Extraction
Soil DNAs are extracted with Qiagen PowerSoil kit. Depending on the soil type, a different amount of soil might be needed. Generally, 0.2-0.5 g of soil is recommended [8,10]. Root DNAs are extracted following a CTAB-based protocol described in Liao et al., 2014 [11]. Typically, more than three root tips or clusters from the Pinus species are recommended to obtain adequate DNA.

Methods (Optional) to Prevent PCR Inhibitors in Extracted DNA Affecting the Following PCR Steps
For DNA extractions from the root, the DNA extraction may be cleaned up with AMPure XP beads (Beckman Coulter, Inc., Indianapolis, IN, USA). This is an optional step to reduce PCR inhibitors that are co-extracted from root tissues. The ratio of the volume of beads to DNA extraction is 1:1. The clean-up protocol is performed according to Step 3. Loading equal volumes of the substrate for DNA extraction and normalizing DNA concentration across sample types, such as soils or root tissue, helps with generating even libraries. Still, it is recommended that PCR on a series of DNA concentrations (1-20 ng/µL, determined by Thermo Scientific™ NanoDrop or Qubit™) be performed first on a few samples to determine the optimal DNA concentration before bulk sample processing. In special cases when excessive PCR inhibitors (i.e., substrates in the DNA extraction that inhibit the PCR reaction) or low levels of targeted organisms are present in the DNA extraction, diluting or adding more original DNA into the PCR could be tested [13].

PCR Reagent Preparation
Use the DNA extraction from Step 1, and follow the PCR recipe 2P_1st in Table 3.

PCR Amplification
Place the PCR tubes on a thermocycler, and use the PCR program 2P_1st setting illustrated in Figure 3. Briefly, this step amplifies specific genomic regions with a pair of forward (e.g., fungi: ITS1F; bacteria: 515F and 341F) and reverse (e.g., fungi: ITS2 and ITS4; bacteria: 806R) gene region-specific primers (FGRSP and RGRSP) that include frame-shift features (Figure 2, sequences provided in Supplementary Table S1). The steadily decreasing annealing temperature of the "touchdown" approach implemented in the 2P_1st thermocycler program facilitates the primer annealing efficiency for a wide range of primer sequences. This step yields PCR product 2P_1st.
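To make the touchdown idea concrete, the sketch below generates a per-cycle annealing temperature schedule. The actual 2P_1st program is the one given in Figure 3; every number here (start temperature, step size, cycle counts) is a hypothetical placeholder, and `touchdown_schedule` is an illustrative helper rather than part of the protocol.

```python
def touchdown_schedule(start_temp_c, step_c, touchdown_cycles, total_cycles):
    """Return the annealing temperature (in °C) for each PCR cycle.

    The temperature drops by `step_c` per cycle during the touchdown
    phase and is then held at the last touchdown temperature.
    """
    if touchdown_cycles < 1 or total_cycles < touchdown_cycles:
        raise ValueError("need 1 <= touchdown_cycles <= total_cycles")
    temps = []
    for cycle in range(total_cycles):
        # Number of temperature drops applied so far, capped at the
        # end of the touchdown phase.
        drops = min(cycle, touchdown_cycles - 1)
        temps.append(round(start_temp_c - drops * step_c, 2))
    return temps


# Hypothetical values for illustration only; see Figure 3 for the
# thermocycler program actually used in the protocol.
schedule = touchdown_schedule(
    start_temp_c=60.0, step_c=0.5, touchdown_cycles=10, total_cycles=15
)
```

Starting above the expected annealing optimum favors specific primer binding in the early cycles, while the later, cooler cycles amplify the products already formed.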

First PCR Product Clean-Up
The PCR purification step is performed using the solid-phase reversible immobilization (SPRI) bead size selection system [14] that removes remaining dNTPs, primers, primer-dimers, and salt.

AMPure Bead Preparation
Take the Beckman Coulter™ Agencourt AMPure XP beads out of the refrigerator and warm them to room temperature for approximately 30 min before use. Mix the AMPure XP beads well with the solution by gently vortexing or shaking until the beads are fully suspended in the solution (i.e., the solution becomes homogeneously brown).

Mix AMPure Beads and PCR Products
Add 12.5 µL of AMPure XP beads into each PCR tube (the ratio of volume of beads to the PCR product is 1:1), and pipet at least 10 times to fully mix the PCR product and beads.

Separate Solution versus AMPure Beads
Place the PCR tube onto the magnetic stand (e.g., Agencourt SPRIPlate 96R Super Magnet Plate), and wait until the tube becomes clear (about 5 min). Take out 10 µL, and discard the liquid.

Ethanol Wash
Add 200 µL of 80% ethanol; be careful not to disturb the pellet. Discard all liquid. Let the tubes air dry for 15 min to allow for evaporation of the adhered ethanol. Remove the PCR tube from the magnetic stand, add 12.5 µL water into the PCR tube, mix well, and incubate at room temperature for 2 min.

AMPure Bead Separation and Removal
Place the tube on the magnetic stand, and wait until the tube becomes clear. Move 10 µL supernatant to a new tube. The cleaned-up PCR product of this step is C_2P_1st.

PCR Reagent Preparation
Use the cleaned PCR product from Step 3 (C_2P_1st) as the DNA template. Follow the PCR recipe in Table 3. The sequences of primers are provided in Supplementary Table S2.

PCR Amplification
Place the PCR tubes on a thermocycler, and use the PCR program 2P_2nd setting illustrated in Figure 3. This step adds an Illumina adaptor to the forward end and another adaptor with a barcode region to the reverse end of the DNA template. As the primers used herein are long (forward ≈56 bp and reverse ≈60 bp), the touchdown annealing temperature setting is optimized so that the primers adhere to the template DNA. The PCR product is 2P_2nd.

AMPure Beads Preparation
Take Beckman Coulter™ Agencourt AMPure XP beads from the refrigerator 30 min prior to use. Shake and vortex the AMPure XP beads to mix and spin them from the sides of the wells.

Mix AMPure Beads and PCR Products
Add 25 µL of AMPure XP beads into each PCR tube (the ratio of volume of beads to PCR product is 1:1), and pipet at least 10 times to fully mix the PCR product and beads.

Separate Solution versus AMPure Beads
Place the PCR tube onto the magnetic stand (e.g., Agencourt SPRIPlate 96R Super Magnet Plate), and wait until the tube becomes clear. Take out 48 µL, and discard the liquid.

Ethanol Wash
Add 200 µL of 80% ethanol; be careful not to disturb the pellet. Discard all liquid. Let the tubes air dry for 15 min to evaporate the adhered ethanol.

Beads and DNA Resuspension
Remove the PCR tube from the magnetic stand, add 25 µL water into the PCR tube, mix well, and incubate at room temperature for 2 min.

AMPure Bead Separation and Removal
Place the tube on the magnetic stand, and wait until the tube becomes clear. Move 23.5 µL of the supernatant to a new tube. This step generates the product C_2P_2nd.

PCR Product Evaluation and Multiplex
Each C_2P_2nd needs to be examined for amplicon size and DNA concentration.

Gel Preparation and Electrophoresis
Prepare a 1% gel using electrophoresis-grade agarose in 1× TAE (Tris-Acetate-EDTA) buffer and an appropriate fluorescent dye (e.g., SYBR-safe) for DNA examination. After the gel solidifies, load 5 µL of C_2P_2nd (mixed with 6× loading dye) for every sample into separate wells of the gel. A 100 bp DNA ladder should also be loaded as a reference for DNA amplicon lengths.

Gel Examination
The gel is then examined under UV or blue light box to determine the size of DNA fragments and their integrity. The gel examination can also be applied to examine the potential issues with primer dimers. Primer dimers with sizes below 200 bp may or may not be sufficiently removed by the bead clean-up during Step 5.

DNA Quantity and Purity Assessment
Evaluating C_2P_2nd products with a spectrophotometer (e.g., NanoDrop, Thermo Scientific™, Waltham, MA, USA) is recommended before multiplexing. This step assesses the DNA concentration and detects potential contaminants, including ethanol and polysaccharides [13]. Ethanol and polysaccharides could interfere with concentration measurements and downstream PCR performance. Due to the high absorbance of such contaminants at 230 nm, a DNA extraction contaminated with ethanol or polysaccharides would have an A260/A230 ratio lower than 2.0 [15]. DNA concentration can be further checked with fluorometric quantification methods, such as the Qubit fluorometer (Thermo Scientific™, Waltham, MA, USA).

Sample Multiplexing
Based on the quantification report of every C_2P_2nd sample using NanoDrop or Qubit (Step 6.3), each amplicon sample is normalized to a 10 nM concentration in pure water prior to pooling. Approximately 200-300 pooled samples are prepared for a run of MiSeq 300 PE sequencing.
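The normalization to 10 nM requires converting a mass concentration (ng/µL) into a molar one. A minimal sketch follows, assuming the common approximation of 660 g/mol per base pair of double-stranded DNA; the helper names and the example values are hypothetical and not taken from the protocol.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def ng_per_ul_to_nM(conc_ng_per_ul, amplicon_bp):
    """Convert a dsDNA concentration from ng/µL to nM."""
    if amplicon_bp <= 0:
        raise ValueError("amplicon length must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * amplicon_bp)


def dilution_volumes(stock_nM, target_nM, final_ul):
    """Return (sample µL, water µL) to reach target_nM in final_ul.

    Uses C1 * V1 = C2 * V2. Raises if the stock is too dilute.
    """
    if stock_nM < target_nM:
        raise ValueError("stock is already below the target concentration")
    sample_ul = target_nM * final_ul / stock_nM
    return sample_ul, final_ul - sample_ul


# Hypothetical example: a 500 bp amplicon measured at 10 ng/µL,
# diluted to 10 nM in a final volume of 20 µL.
stock = ng_per_ul_to_nM(10.0, 500)
sample_ul, water_ul = dilution_volumes(stock, target_nM=10.0, final_ul=20.0)
```

Because the conversion depends on amplicon length, libraries built with different primer pairs (for example ITS1F-ITS4 versus 515F-806R) need their own length estimate before pooling at equal molarity.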

Sample Submission to the Sequencing Facility
In addition to the pooled samples, two sequencing primers are needed for sequencing: the Read1_seq sequencing primer and the Read2_seq sequencing primer. The Read1_seq and Read2_seq primers allow for the initiation of sequencing on the Illumina platform. Users need to provide the pooled product from Step 6 and the custom Read1_seq Illumina sequencing primer (100 µM concentration) (Figure 2, Supplementary Table S3) [5] to the sequencing facility. In general, the Read2_seq primer is available at the sequencing facility, and submitting it is likely not necessary.

Submission Condition Inquiry
It is recommended to reach out to the sequencing facility prior to sample submission for primer design, multiplexing strategy, and sequencing depth requirements. Depending on fragment size, 10-15 picomolar (for 300-400 bp) to 25 picomolar (for 600-700 bp) can be loaded. The sequencing performed in this study was based on the platform Illumina MiSeq Paired-End 300 bp. Many sequencing facilities also assist with the demultiplexing process (i.e., assigning the reads to the sample they belong to). If so, provide the barcode list associated with individual samples (Supplementary Table S2).
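For readers who demultiplex themselves, the idea of assigning reads by a 10-bp barcode with a one-base error allowance can be sketched as follows. This is an illustration of the matching rule only and does not reproduce how bcl2fastq works internally; the barcode sequences and sample names are hypothetical, and the real barcodes are those listed in Supplementary Table S2.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_barcode, barcode_to_sample, max_mismatches=1):
    """Return the sample for a read barcode, or None if undetermined.

    A read is assigned only when exactly one known barcode lies
    within `max_mismatches` substitutions of the observed barcode.
    """
    hits = [
        sample
        for barcode, sample in barcode_to_sample.items()
        if len(barcode) == len(read_barcode)
        and hamming(barcode, read_barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 10-bp barcodes, for illustration only.
barcodes = {"ACGTACGTAC": "soil_01", "TTGCAAGGCC": "root_01"}

exact = assign_sample("ACGTACGTAC", barcodes)         # perfect match
one_off = assign_sample("ACGTACGTAA", barcodes)       # one substitution
undetermined = assign_sample("GGGGGGGGGG", barcodes)  # no close barcode
```

Reads that match no barcode, or more than one, end up in the undetermined pool, which is the quantity compared between the 2P and 3P protocols.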

Hardware Requirements
- Thermocycler: any thermocycler allowing temperature to decrease per cycle should work.
- Magnetic stands: the PCR purification step is essential to remove remaining dNTPs and primer dimers that might be present in PCR products. To perform purification with paramagnetic beads, a 96-well magnetic plate is essential.
- Nucleotide spectrophotometer or fluorometer: to obtain an accurate concentration of DNA across the cleaned-up 2P_2nd product, a spectrophotometer or fluorometer is required. In addition to reporting the DNA concentration, the spectrophotometer also reveals common contaminants (e.g., ethanol and phenolic compounds) in the PCR product. The fluorometer, on the other hand, is believed to provide a more accurate estimation of DNA concentration.
- Gel electrophoresis system: horizontal electrophoresis system.
- Gel imaging system: gel documentation system.

Results and Discussion
In this study, the efficiency of the PCR enrichment process is defined as successful amplification of the targeted templates. When the efficiency is high, the final PCR products should yield adequate PCR fragments of desired lengths. A decrease in PCR efficiency can often be attributed to critical issues such as (1) primers binding non-specifically to undesired genomic regions and, thus, failing to amplify the targeted DNA fragment; (2) primers carried over from the previous amplification reactions (instead of the newly added primer set) being consumed in the current PCR cycles; and (3) insufficient primer-template annealing or low template concentration leading to inadequate PCR products. The second issue may lead to high proportions of undetermined reads in sequencing results [5]. To resolve these issues, the PCR efficiencies of the 3P [5] and 2P (this study) protocols were compared and their sequence outcomes were evaluated. Specifically, we examined whether the Illumina adaptors and barcodes were successfully attached to the amplified DNA regions during PCR library construction. We also compared the recovery rate and quality of the reads generated from the two protocols. Our results indicate that the 2P protocol, which implements a "touchdown" approach, an additional bead clean-up step, and a lowered DNA input for the final PCR, largely improved the PCR efficiency. Such an improvement was further confirmed by receiving fewer undetermined reads in the sequencing data.

Evaluation of PCR Efficiency by Assessing the Amplicon Length
We evaluated the length of PCR products generated from 2P_1st and 2P_2nd. The successful attachment of the Illumina adaptors yields an increase of 63 bp for the second PCR-end products on top of the first PCR-end products. The 63 bp increase includes the addition of Illumina adaptors at the 5′ and 3′ ends of the PCR products (forward = 29 bp; reverse = 24 bp) plus a barcode at the reverse end (10 bp) (Figure 4). In contrast, incorrect PCR amplification occurs from primers carried over from the 2P_1st step instead of the newly added primers at the 2P_2nd step. Such unexpected primer usage may lead to a smaller increment in the overall length of the targeted community in final PCR products (3P_3rd and 2P_2nd). The DNA fragments of the PCR products amplified for the two samples at each step were around 650 to 1000 bp in length (shown by the black bands in Figure 4A). Figure 4A shows that, compared to the PCR products in 2P_1st, the 2P_2nd products increased in size by approximately 63 bp, suggesting that the 2P approach successfully amplified the desired template and attached the barcode.
Figure 4. PCR product lengths detected by Fragment Analyzer™ measurements. PCR libraries were prepared with two-step (2P) and three-step (3P) protocols from the same DNA extractions (one soil and one root). PCR was performed with a fungus-specific primer set ITS1F and ITS4 (dataset 1). PCR products resulting from each of the three PCR steps of protocol 3P are referred to as 3P_1st, 3P_2nd, and 3P_3rd (Table 2). Similarly, the PCR steps of protocol 2P are referred to as 2P_1st and 2P_2nd (Table 2 and Figure 2). (A) Simulated gel image with DNA fragments shown as black bands. The Y-axis corresponds to the size of the fragment. The DNA fragments for both the soil and root samples demonstrated an increase in PCR product size from 2P_1st to 2P_2nd. Bands at 35 and 1500 bp are size standards; bands at ca. 66 bp are primer dimers. (B) An overlay of the DNA size spectrums generated from each PCR step using a soil and a root sample. The X-axis indicates the size of the DNA fragment measured. The Y-axis corresponds to the intensity of the Relative Fluorescence Unit (RFU) viewed as a proxy for fragment abundance at the given size. The root sample yielded PCR products with peaks at 845 bp (2P_1st), 900 bp (2P_2nd), and 861 bp (3P_3rd product). The PCR product of a soil sample yielded PCR product size peaks such as 849 bp (2P_1st), 911 bp (2P_2nd), and 875 bp (3P_3rd). The DNA standards of 35 and 1500 bp are shown. 2P = two-step PCR protocol; 3P = three-step PCR protocol.
The Fragment Analyzer was further applied to obtain a fine resolution for the fragment size generated from the PCR products of the 2P_1st, 2P_2nd, and 3P_3rd steps (Figure 4B). The amounts of PCR end-products for the 3P_1st and 3P_2nd steps (with only 10 PCR cycles each) were below the detectable range of the Fragment Analyzer and are therefore not presented here. As shown in Figure 4B, the quantity of PCR end-products in each step is presented as relative fluorescence units (RFU) (Y-axis). The amplified DNA fragments are shown as the peaks ranging between ca. 600 and 1000 bp (highlighted in yellow) (Figure 4B). With a soil sample, the highest peak (amplified with the ITS1F and ITS4 primer set) shifted from 849 bp (2P_1st, in black) to 911 bp (2P_2nd, in blue), while 3P_3rd was only at 875 bp (in red). Similarly, in a root sample, the highest peak shifted from 845 bp (2P_1st, in black) to 900 bp (2P_2nd, in blue) but was only 861 bp for 3P_3rd (in red). A higher quantity of PCR end-products detected for 2P_2nd compared to 3P_3rd indicates that the 2P protocol had a higher PCR efficiency of adding Illumina adapters and barcodes. The barcode sequence serves as the sample-specific tag for each sequence read. Successful attachment of the barcodes during 2P amplification is critical to assign the reads to the individual sample in the demultiplexing process.
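The size-shift argument can be checked with simple arithmetic. The sketch below uses only the adaptor, barcode, and peak sizes quoted in the text and, as the text does, treats the 2P_1st peak as the baseline for both protocols; the variable and function names are illustrative.

```python
# Lengths added in the final PCR, as stated in the text (bp).
ADAPTOR_FORWARD_BP = 29
ADAPTOR_REVERSE_BP = 24
BARCODE_BP = 10
expected_shift = ADAPTOR_FORWARD_BP + ADAPTOR_REVERSE_BP + BARCODE_BP

# Highest Fragment Analyzer peaks reported for each product (bp).
peaks = {
    "soil": {"2P_1st": 849, "2P_2nd": 911, "3P_3rd": 875},
    "root": {"2P_1st": 845, "2P_2nd": 900, "3P_3rd": 861},
}


def shift_from_first_pcr(sample, product):
    """Peak size of `product` minus the 2P_1st peak of the same sample."""
    return peaks[sample][product] - peaks[sample]["2P_1st"]


for sample in peaks:
    for product in ("2P_2nd", "3P_3rd"):
        observed = shift_from_first_pcr(sample, product)
        print(f"{sample} {product}: +{observed} bp (expected +{expected_shift} bp)")
```

The 2P_2nd peaks move by 62 bp (soil) and 55 bp (root), close to the expected 63 bp, whereas the 3P_3rd peaks sit only 26 bp and 16 bp above the same baseline, consistent with incomplete adaptor and barcode attachment in 3P.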

The Effect of PCR Library Protocols (2P, 3P, and 3P+Cleanup) on the Quantity and Quality of Sequence Reads
We investigated the sequencing results of seven soil samples generated using the 2P, 3P, and 3P+cleanup protocols (Table 1). To understand the effect of the magnetic bead clean-up between PCR steps on sequencing results, we first compared the 3P+cleanup protocol to the original 3P protocol. Compared to the 3P protocol, 3P+cleanup includes an extra step to clean up the intermediate PCR product (3P_2nd) before using it as the DNA template to generate the final PCR product (3P_3rd) (Table 2). The 3P+cleanup protocol yielded a higher read number (Wilcoxon test, FDR < 0.001) with better read quality (Wilcoxon test, FDR < 0.01) (Figure 5A,B) compared to 3P, suggesting that the clean-up step improved read quantity and quality. After confirming the beneficial effect of the bead clean-up step, we then compared 3P with the 2P protocol, which includes the bead clean-up step but uses a simpler library preparation procedure (Tables 1 and 2). According to the sequencing reports, 2P yielded a significantly higher number of reads per sample than 3P (Wilcoxon test, FDR < 0.001) (Figure 5A). Read quality was similar between 2P and 3P (Figure 5B). The 3P+cleanup protocol yielded a significantly higher read quality than both 2P and 3P. However, because the 2P protocol significantly increased read quantity while retaining the read quality of 3P, 2P is considered an improved protocol overall.

Across Run Comparison for the Proportion of Undetermined Barcoded Sequences Generated with 2P vs. 3P Approaches
To evaluate the quality of sequences generated from 2P and 3P across independent sequencing runs, the critical components of MiSeq sequencing reports were compared, including the percentage of undetermined reads (i.e., reads that could not be assigned to a specific sample) and the average read quality (Phred score) (Supplementary Table S5). Independent Illumina MiSeq runs were compared between 2P (5 runs) and 3P (5 runs). The percentage of undetermined reads was lower for 2P (8.19% ± 1.22%) than for 3P (22.8% ± 25.5%) (Wilcoxon test, p = 0.09) (Figure 5C). The 2P protocol yielded a more consistent percentage of undetermined reads, with significantly less variation between runs than 3P (Levene test, p = 0.05). The average read quality was similar for 2P (32.5 ± 2.04) and 3P (31.8 ± 2.50) (Welch two-sample t-test, p = 0.61) (Figure 5D) and showed similar degrees of variation across samples (Levene test, p = 0.77).
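As an illustration of how the per-run summary above (mean ± standard deviation of the percentage of undetermined reads) can be derived from sequencing reports, the following minimal sketch uses only the Python standard library. The read counts are hypothetical placeholders, not the values in Supplementary Table S5, and the function names are introduced here for illustration only.

```python
from statistics import mean, stdev


def undetermined_percent(total_reads, undetermined_reads):
    """Percentage of reads in one run that could not be assigned to a sample."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * undetermined_reads / total_reads


def summarize(percentages):
    """Return (mean, sample standard deviation) across independent runs."""
    return mean(percentages), stdev(percentages)


# Hypothetical (total, undetermined) read counts per run; NOT the study's data.
example_runs = [(1_000_000, 80_000), (1_200_000, 90_000), (900_000, 81_000)]

percentages = [undetermined_percent(t, u) for t, u in example_runs]
run_mean, run_sd = summarize(percentages)
print(f"undetermined reads: {run_mean:.2f}% ± {run_sd:.2f}%")
```

The significance tests reported in the text (Wilcoxon, Levene, Welch) are not reproduced here; they would typically be run in a statistics package on the same per-run percentages.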
The higher number of sequence reads generated per sample (Figure 5A) combined with the lower percentage of undetermined reads per run (Figure 5C) indicates that the barcodes were properly attached to amplicons generated with the 2P protocol. This major improvement in 2P consistently reduces the chance of discarding valuable reads simply because their sample of origin is uncertain. The high read quality score in both 2P and 3P reflects a low error rate. Taken together, the 2P protocol allows more reads to be retained for downstream analysis and assessment (Figure 2). Compared to 3P, the 2P protocol yields a slightly, though not significantly, higher average read quality across runs (Figure 5D). Read quality varies among samples for both 2P and 3P. We suspect that this variation is sample-specific, but further examination is required to discern the specific cause(s).
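To make the notion of "undetermined" reads concrete, the sketch below assigns index reads to samples by exact barcode match and counts everything else as undetermined. It is a simplified illustration under stated assumptions: the barcodes, sample names, and the exact-match rule are hypothetical, and production demultiplexing software usually also tolerates a limited number of mismatches.

```python
from collections import Counter


def demultiplex(index_reads, barcode_to_sample):
    """Count reads per sample using exact barcode matching.

    Index reads that match no barcode in the sample sheet are counted
    under the key "Undetermined".
    """
    counts = Counter()
    for index_seq in index_reads:
        counts[barcode_to_sample.get(index_seq, "Undetermined")] += 1
    return counts


# Hypothetical sample sheet and index reads, for illustration only.
sample_sheet = {"ACGTACGT": "soil_1", "TGCATGCA": "root_1"}
index_reads = ["ACGTACGT", "TGCATGCA", "ACGTACGT", "NNNNNNNN", "ACGTTCGT"]

counts = demultiplex(index_reads, sample_sheet)
undetermined_fraction = counts["Undetermined"] / len(index_reads)
print(dict(counts), f"undetermined fraction = {undetermined_fraction:.2f}")
```

In this toy input, a read whose barcode failed to attach or was mis-tagged (here, the last two index reads) cannot be traced to a sample, which is the failure mode the 2P protocol is intended to reduce.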

Evaluation of 2P and 3P Performance for Libraries Targeting Different Taxonomic Groups
To test whether the positive impact of 2P over 3P is consistent regardless of the taxonomic group each library targets, libraries targeting bacteria and fungi were compared between the 3P and 2P protocols. In addition to evaluating the read number, we further compared the ecological inferences, including alpha-diversity (i.e., Operational Taxonomic Unit (OTU) number) and the taxonomic composition recovered by each protocol. The bioinformatic pipeline is described in Supplementary Method S1. For bacteria, 2P-prepared samples consistently yielded a higher read number and higher alpha-diversity than 3P-prepared samples from the same DNA extraction (Welch two-sample t-test, p < 0.01) (Figure 6). For fungi, the read number and alpha-diversity of libraries prepared by 2P were also higher than those prepared by 3P, but the increases were not statistically significant (Figure 6). In addition to quality assessments, the number of chimeras removed from samples sequenced at MSU was compared between the two amplification methods. For ITS samples, the number of chimeras was the same regardless of amplification method (41). The number of chimeras removed from 16S samples was higher in samples amplified with the 2P protocol than with the 3P protocol (1072 vs. 915). However, this was consistent with the greater number of OTUs and reads produced by the 2P protocol.
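Alpha-diversity is used here in the sense of OTU number, i.e., the count of OTUs observed in a sample. The following minimal sketch shows that calculation on a hypothetical OTU count table; the OTU identifiers and counts are invented for illustration, and the code does not reproduce the pipeline in Supplementary Method S1.

```python
def otu_richness(otu_counts):
    """Observed OTU number: OTUs with at least one read in the sample."""
    return sum(1 for count in otu_counts.values() if count > 0)


# Hypothetical read counts per OTU for one DNA extraction prepared two ways.
sample_3p = {"OTU_1": 120, "OTU_2": 0, "OTU_3": 15, "OTU_4": 0}
sample_2p = {"OTU_1": 480, "OTU_2": 9, "OTU_3": 61, "OTU_4": 0}

richness_3p = otu_richness(sample_3p)
richness_2p = otu_richness(sample_2p)
print(f"3P: {richness_3p} OTUs, 2P: {richness_2p} OTUs")
```

Because observed OTU number tends to rise with sequencing depth, a higher read count per sample can by itself contribute to higher richness, which is consistent with the pattern described for reads, OTUs, and chimeras above.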

2P and 3P Comparison between Sequencing Facilities
To confirm that the 2P protocol improvement is consistent regardless of the sequencing facility, sequencing results independently generated by the sequencing facilities at Duke University and Michigan State University (MSU) were compared. Samples prepared with 2P yielded a higher read number than those prepared with 3P at both institutions, with increases ranging from 1.19-fold (fungi_MSU) to 4.74-fold (bacteria_Duke) (Figure 7). Samples prepared with 2P had slightly lower sequencing quality (0.99-fold in all cases) compared to 3P at both sequencing facilities (Figure 7).

Touchdown Technique in Improving Multi-Step PCR for Next-Generation Amplicon Sequencing
The touchdown PCR technique has been widely used in molecular applications to address incorrect primer binding [19,20]. Most PCR thermocycler programs use a single annealing temperature chosen to optimize primer binding to the template. The higher the annealing temperature, the higher the specificity of primer binding. However, a high annealing temperature and high specificity come at the cost of a low binding rate, which can result in insufficient PCR amplification. Finding the temperature that best balances annealing specificity and efficiency often requires trial and error and is time-consuming for individual primer sets or samples. Touchdown PCR takes advantage of exponential DNA amplification: the annealing temperature starts high and is lowered over successive cycles, so that the early, highly specific cycles produce correct amplicons, which then serve as abundant templates for the later cycles. In the presented approach, each PCR program (2P_1st and 2P_2nd) spans a wide range of annealing temperatures, allowing a single thermocycler program to be applied to diverse primer sets and samples with different conditions. The MiSeq next-generation sequencing platform typically requires long oligonucleotides that include primers, adaptors, and barcodes for amplicon sequencing. Such long oligonucleotides complicate the annealing or ligation steps. Therefore, the touchdown technique for amplicon sequencing that we provide here is a promising modification to existing microbiome protocols. While the current 2P protocol is optimized for the Illumina MiSeq sequencing platform, it could be adapted to other sequencing platforms (Oxford Nanopore, PacBio) with minor adjustments.
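The stepwise decrease in annealing temperature can be sketched as a simple per-cycle schedule. The example below is illustrative only: the starting temperature, final temperature, decrement, and cycle number are assumed values chosen for demonstration and are not the settings of the 2P_1st or 2P_2nd programs, which are specified in Table 2.

```python
def touchdown_schedule(start_temp, end_temp, step, total_cycles):
    """Annealing temperature (in °C) for each PCR cycle.

    The temperature starts at start_temp, drops by `step` after every
    cycle until it reaches end_temp, and is then held at end_temp for
    the remaining cycles.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if start_temp < end_temp:
        raise ValueError("start_temp must not be below end_temp")
    temps = []
    current = start_temp
    for _ in range(total_cycles):
        temps.append(round(current, 2))
        current = max(end_temp, current - step)
    return temps


# Assumed example values (hypothetical, not taken from the protocol).
schedule = touchdown_schedule(start_temp=65.0, end_temp=55.0, step=1.0,
                              total_cycles=15)
for cycle, temp in enumerate(schedule, start=1):
    print(f"cycle {cycle:2d}: anneal at {temp:.1f} °C")
```

With these assumed values, the first cycles anneal under the most stringent conditions, and the temperature reaches its floor partway through the program, after which the remaining cycles amplify at the more permissive temperature.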

Conclusions
The improved 2P protocol described herein was demonstrated to be suitable for studying soil and plant microbiomes in agricultural and forest ecosystems. The simplified PCR protocol yields a superior read recovery rate; thus, we recommend switching from the 3P to the 2P protocol. To date, this protocol has only been tested on microbes (i.e., bacteria and fungi), yet the flexible primer, adaptor, and barcode attachment system is transferable to other target taxonomic groups or markers, which may benefit enterprises and future applications. For instance, this method is amenable to the examination of microfauna (e.g., nematodes in soil), plant identification in mixed crop products, viruses, and general environmental metabarcoding.