SNP Genotyping with Target Amplicon Sequencing Using a Multiplexed Primer Panel and Its Application to Genomic Prediction in Japanese Cedar, Cryptomeria japonica (L.f.) D.Don

: Along with progress in sequencing technology and accumulating knowledge of genome and gene sequences, molecular breeding techniques have been developed for predicting the genetic potential of individual genotypes and for selecting superior individuals. For Japanese cedar ( Cryptomeria japonica (L.f.) D.Don), which is the most common coniferous species in Japanese forestry, we constructed a custom primer panel for target amplicon sequencing in order to simultaneously determine 3034 informative single nucleotide polymorphisms (SNPs). We performed primary evaluation of the custom primer panel with actual sequencing and in silico PCR. Genotyped SNPs had a distribution over almost the entire region of the C. japonica linkage map and veriﬁed the high reproducibility of genotype calls compared to SNPs obtained by genotyping arrays. Genotyping was performed for 576 individuals of the F 1 population, and genomic prediction models were constructed for growth and wood property-related traits using the genotypes. Amplicon sequencing with the custom primer panel enables e ﬃ cient obtaining genotype data in order to perform genomic prediction, manage clones, and advance forest tree breeding.


Introduction
Molecular breeding techniques for plants that predict phenotypes from individual genotypes have been developed to shorten the breeding period compared to conventional methods [1,2]. With advances using a large number of genome-wide DNA markers to predict genomic estimated breeding values [3], genomic selection (GS) has been performed in many plant species (reviewed by Lin et al. [4] and Desta and Ortiz [5]) since it has become possible to construct genomic and transcriptomic information with next-generation DNA sequencing (NGS) and to prepare genome-wide DNA markers with various genotyping techniques [6].
as an effective way of cutting genotyping cost [37]. In particular, analysis cost per individual is an important consideration when performing genotyping on large numbers of individuals. Prediction accuracies using SNPs selected based on the results of GWAS were similar to those using all SNPs for several combinations of traits and populations [37]. Pre-selection SNPs could be crucial for improving the quality of genomic predictions [38]. An efficient SNP genotyping system is required to verify the practicality of these SNPs and to apply them to actual breeding populations.
However, there are not many choices for medium-scale (up to several thousand loci) genotyping methods that can be redesigned flexibly and applied to GS in conifers. It is necessary to construct a platform for medium-scale high reproducibility genotyping to perform genomic prediction in C. japonica with high reliability. AmpliSeq (Thermo Fisher Scientific Inc., Waltham, MA, USA), which is an NGS-based genotyping method using multiplexed primer solutions for targeted amplicon sequencing, can enable amplification of up to 6000 amplicons simultaneously with ultra-high multiplex PCR and the construction of a targeted amplicon sequencing library in 10 h. This method has been used for studying inherited cancer in humans (e.g., [39]) and flowering time in soybean [40].
In this study, we constructed a medium-scale SNP genotyping system for C. japonica. We adopted an AmpliSeq custom primer panel (Thermo Fisher Scientific Inc.) as the platform and performed primary evaluation of the custom primer panel with actual sequencing and variant calling and with in silico PCR. We also examined the applicability of the custom primer panel for genomic prediction in a F 1 population of C. japonica plus trees.

Primer Panel Design
Here, we selected 3034 target SNPs based on allelic effects of the SNPs on growth-(height and diameter at breast height, or DBH), wood property-(wood stiffness and wood density), and reproductive-(male fecundity) related traits as suggested by GWAS results [37] and their comprehensive distribution on the linkage map [36]. Primers for targeted amplicon sequencing on Ion Torrent platforms were designed with the online application program, AmpliSeq designer (https://www.ampliseq.com/login/ login.action) with EST sequences containing target SNPs in C. japonica as reported by Mishima et al. [36] and bed files describing the position of target SNPs on contigs. A total of 3031 EST sequences and 3034 target SNPs (four SNPs were located in one sequence) were analyzed. A total of 3004 primer pairs for 99.0% of the target SNPs were designed in a multiplexed 2 × Ion AmpliSeq Primer Pool (Table S1). Primer pairs to amplify target sequences for the remaining 30 SNPs could not be designed.

Panel Evaluation Via Actual Genotyping
We used a total of 16 clones of C. japonica for primary evaluation of the custom primer panel; 11 of the 16 clones were genotyped with the Axiom custom genotyping array (Thermo Fisher Scientific Inc.) designed by Mishima et al. [36], and five clones had an unknown SNP genotype. Current year needles were collected and stored at −20 • C until DNA extraction. For DNA extraction, about 50 mg of frozen needles were transferred into 2.0 mL microtubes and ground with liquid nitrogen and beads in a Shake Master Auto Ver. 2.0 (Bio Medical Science, Tokyo, Japan). DNA was extracted from the pellet with a plant DNA extraction kit (Qiagen Inc., Hilden, Germany), and DNA was quantified with the Qubit 3.0 Fluorometer and the Qubit dsDNA BR Assay Kit (Thermo Fisher Scientific Inc.), according to the manufacturer's instructions.
Libraries for amplicon sequencing were constructed with the AmpliSeq Library Kit v2.0 (Thermo Fisher Scientific Inc.) using the following protocol. For multiplex-PCR amplification, 5 ng DNA of each sample was amplified with the custom primer pool (3004 primer pairs) per reaction. Each reaction mix contained 2 µL of 5 × Ion AmpliSeq HiFi Master Mix, 5 µL of 2 × Ion AmpliSeq Primer Pool, and 5 ng of DNA, and it was brought to a volume of 10 µL with nuclease-free water. The reaction mix was heated for 2 min at 99 • C for enzyme activation, followed by 13 two-step cycles at 99 • C for 15 s and at 60 • C for 8 min, and ending with a holding period at 10 • C. Following the PCR, 1 µL of FuPa enzyme regents per sample was added to the reaction mix, the reaction mix was incubated at 50 • C for 10 min and at 55 • C for 10 min to digest the primers of the amplicons, and the mixture was incubated at 60 • C for 20 min to inactivate the enzyme. To enable library multiplexing on a single semiconductor chip, 2 µL of Switch Solution, 1 µL of diluted unique barcode adapter mix containing Ion Xpress Barcode and Ion P1 Adapters at standard volumes, and 1 µL of DNA ligase were added to the digested reaction mix, and the reaction mix was incubated at 22 • C for 30 min, followed by ligase inactivation for 5 min at 68 • C and 5 min at 72 • C. The adapter-ligated AmpliSeq library was purified using 22.5 µL of Agencourt AMPure XP Reagent (Beckman Coulter Inc., Brea, CA, USA), followed by washing with 75 µL of 70% ethanol twice. After the magnetic beads were dry, the AmpliSeq library was dissolved in 20 µL of Tris-EDTA (TE) buffer.
AmpliSeq libraries were evaluated for size distribution with a Bioanalyzer 2100 and High Sensitivity DNA Kit (Agilent Technologies Inc., Santa Clara, CA, USA), and the quantity was evaluated with a Qubit 3.0 Fluorometer and Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific Inc.). The libraries identified by Ion Xpress Barcodes (Thermo Fisher Scientific Inc.) were multiplexed into a group of 16 samples for sequencing with an Ion Gene Studio S5 semiconductor sequencer and an Ion 520 semiconductor chip (Thermo Fisher Scientific Inc.). Emulsion PCR, emulsion breaking, and enrichment for the template preparation of ion sphere particles were performed with the Ion Chef and the Ion 520 and 530 Kit Chef (Thermo Fisher Scientific Inc.), according to the manufacturer's instructions. Following preparation of the semiconductor tip, sequencing was performed with an Ion Gene Studio S5 semiconductor sequencer using the Ion 520 Chip (Thermo Fisher Scientific Inc.), according to the manufacturer's instructions. The sequenced data were mapped to the reference gene sequence of C. japonica on a Torrent Server. Plugins, variantCaller v5.10.0.18 (Thermo Fisher Scientific Inc.) and reformatGBSCov v3.1 (Thermo Fisher Scientific Inc.) were used to construct a genotype table for the sequencing.

Data Processing and Visualization
Read depth per amplicon, read quality score, number of variants per site, GC ratio, and marker position within the genotyping with the AmpliSeq custom primer panel were summarized for each amplicon and drawn with the linkage position of the published C. japonica linkage map.
Genotyping efficiency for the primer panel was calculated for the 16 clones of C. japonica. Linkage positions of the SNPs on the published SNP linkage map [36] were drawn with R 3.6.0 [41]. The SNPs were distinguished by color on the linkage map at an 80% genotype call rate threshold. In addition, the SNP genotypes of the 11 of 16 clones were genotyped with the Axiom custom array (Axiom_Cj_70K_ver. 1.0; 73,274 SNPs; Gene Expression Omnibus Dataset (GEO): GSE95616; [36]) and compared with the genotyped results. The reproducibility of alleles was summarized and graphed on the R platform [41].

In Silico Panel Evaluation
The designed custom primer panel was tested with an in silico PCR program, Simulate_PCR version 1.2 ( [42], https://sourceforge.net/projects/simulatepcr/) under a Perl 5.16.3 environment. The fasta files containing the 3031 EST sequences used to design the AmpliSeq custom primer panel, and the 34,731 EST sequences for the reference gene [36] of C. japonica were used for the input template reference file. The default settings and the options regarding included amplicon sizes were as follows: -minlen 50 and -maxlen 1000. The numbers of expected PCR products were summarized as follows: (a) amplified products in in silico PCR, (b) amplicon size ≤300 bp in in silico PCR, (c) correct pairing (intended pair of forward and reverse primers), (d) correct amplicon (annealing to correct contig with expected amplicon size), (e) off-target amplification (unintended primers annealing to the wrong contig), (f) unintended amplicon size (shorter or longer than expected amplicon), (g) mismatched pairing (unintended pair of forward and reverse primers), and (h) others (possible missing detection; amplicon size ≤300 bp and off target, unintended amplicon size, or mismatched pairing primers).

Genotyping a F 1 Population
We used a total of 576 individuals of a F 1 population, consisting of 547 individuals from eight half-diallelic F 1 populations constructed with outcrossing between 32 plus trees originating from the Tokai breeding area (Table S2)  AmpliSeq libraries were constructed with an AmpliSeq Library Kit v2.0 (Thermo Fisher Scientific Inc.) as described above. Then, 22.5 µL of Agencourt AMPure XP Reagent (Beckman Coulter Inc.) was added with epMotion 96 (Eppendorf, Hamburg, Germany), and the magnetic beads were washed with 50 µL ethanol three times using HydroFlex (Tecan Group Ltd., Männedorf, Zürich, Switzerland) to purify the adapter-ligated AmpliSeq library. After the magnetic beads were dry, the AmpliSeq library was dissolved in 20 µL of TE buffer.
The purified AmpliSeq libraries were evaluated for size distribution with the Bioanalyzer 2100 and the High Sensitivity DNA Kit (Agilent Technologies Inc.) with quantification with the Qubit 3.0 fluorometer (Thermo Fisher Scientific Inc.). The libraries were distinguished by Ion Xpress Barcodes (Thermo Fisher Scientific Inc.), and 96 samples were multiplexed for sequencing with the Ion Gene Studio S5 semiconductor sequencer and a total of six Ion 540 semiconductor chips (Thermo Fisher Scientific Inc.). The sequenced data were mapped to the reference gene sequence of C. japonica on the Torrent Server. Plugins, variantCaller v5.10.0.18 and reformatGBSCov v3.1 were used to construct genotype tables for each run. All six sets of genotype data were collected in one genotype table.

Phenotypic Data
Phenotypic data of growth-and wood property-related traits were obtained by the following methods. Tree height, diameter at breast height (DBH), stress wave velocity (SWV), and pilodyn penetration depth (PP) were measured on trees that were 18 years old before harvesting. Tree height was measured with a Vertex III ultrasound instrument system (Haglöf, Västernorrland, Sweden), and DBH was measured at 1.3 m above ground with diameter calipers. The stress-wave propagation time of the stem was measured using a stress-wave timer (TreeSonic; Fakopp, Agfalva, Hungary). Briefly, the start sensor and the stop sensor were set on the stem from 0.7 to 1.7 m above the ground, and the stress-wave propagation time was measured five times from two directions at right angles to the direction of the slope, and SWV was determined by dividing the distance between the sensors by the mean value of the stress-wave propagation time. PP was measured using a Pilodyn 6J Forest (PROCEQ, Zurich, Switzerland) with a 2.5 mm diameter pin without removing the bark from the same directions used to obtain the stress-wave propagation time. After cutting the trees, logs from 1.0 to 2.5 m above the ground were collected for measuring the dynamic Young's modulus (DMOE). DMOE was measured using a portable FFT analyzer AD-3527 (A&D, Tokyo, Japan), following the tapping methods described by Sobue et al. [43]. After measuring DMOE, discs (about 40 mm in thickness) and short logs (about 400 mm in length) were collected from the butt end of the logs used to measure DMOE.
Square specimens were prepared for determining basic density (BD) at every fifth annual ring from the pith. The BD calculated from oven-dried weight was divided by the green volume, which was measured by the water displacement method [44]. BD_1, BD_2, and BD_3 were defined as the BD of the segments containing the 1st to 5th annual rings, 6th to 10th annual rings, and 11th to 15th annual rings, respectively. BD was not measured outside the 16th annual ring. The average BD of the whole disc (BD_means) was estimated by the weighted average method using the area of segments BD_1, BD_2, and BD_3.
The heart wood color was measured using a color meter CR-300 (Minolta, Tokyo, Japan). Bark to bark radial boards (30 mm thickness) were prepared from the short logs. The surface was previously smoothed by a belt sander under air-dried conditions. Measurements were expressed in the L*a*b* color space. L* indicates lightness, a* indicates the red-green axis, and b* indicates the yellow-blue axis.
The average values for five scattered points within each heartwood sample were used for analysis.
To determine the modulus of elasticity (MOE) and modulus of rupture (MOR), static bending tests were conducted according to the Japanese Industrial Standard (JIS) Z 2101-2009 [45] using bark-to-bark radial boards (30 mm thickness) prepared from the short logs. The boards were air-dried under laboratory conditions. A small clear specimen (20 (R) × 20 (T) × 320 (L) mm) was prepared from each board. All specimens were prepared at the same radial position: cross-section centered on the 4th annual ring from the pith. Static bending tests were conducted using a universal testing machine MSC-5/500-2 (Tokyo Testing Machine, Tokyo, Japan). Load was applied to the center of the radial surface of the specimen at 5 mm/min over a span of 280 mm. Data regarding load and deflection were recorded using a personal computer. After static bending tests, a small block was collected from each specimen for measuring the air-dried density and moisture content of the small clear specimen.
The microfibril angle of the S 2 layer in latewood tracheid (MFA) was measured as the angle of the slit-like pit aperture of boarded pits in latewood tracheids [46][47][48]. Using a sliding microtome (ROM-710, Yamatokohki, Saitama, Japan) and small clear specimens after the static bending test, tangential sections of 20 µm in thickness containing latewood of the 4th annual ring from the pith were obtained. These sections were stained with 1% safranine and then dehydrated using graded ethanol. The dehydrated sections were dipped into xylene and mounted on slides with bioleit (Okenshoji, Tokyo, Japan). Photomicrographs of tangential sections were taken using a light microscope CX-41 (Olympus Corporation, Tokyo, Japan) equipped with a digital camera E-300 (Olympus Corporation, Tokyo, Japan). The angle of the slit-pit aperture in the bordered pits of latewood tracheids to longitudinal direction was measured as MFA using ImageJ (National Institute of Health, Bethesda, MD, USA). Thirty tracheids were measured in each tree.
For the phenotypic data of each individual, a spatial autocorrelation error was adjusted with the breedR package [49] of the R platform [41]. Coordinates of individuals in the plantation site and values for each trait were used to calculate the spatial autocorrelation error, and each error was subtracted from the raw value to calculate the adjusted value ( Figure S1).

Genomic Prediction Within the F 1 Population
We performed the genomic prediction for the F 1 population and each trait using two methods: genomic best linear unbiased prediction (GBLUP) and Random Forest (RF). GBLUP and RF were performed using the "kin.BLUP" function of the rrBLUP package v 4.6.1 [50] and "randomForest" function of the randomForest package v 4.6-14 [51] of the R platform [41], respectively. For the methods of GBLUP and RF, raw trait and adjusted trait values with special autocorrelation errors were used independently to construct the genomic prediction model. Prediction accuracy was estimated using Pearson's correlation coefficient between the phenotypic value and the genomic prediction value obtained from the validation dataset in the 10-fold cross-validation. The correlation coefficients from the 10-time replications in the 10-fold cross-validations were summarized.

Primary Evaluation of the Multiplex Primer Panel
In the primary Ion S5 sequence run with an Ion 520 tip, a total of 1.32 Gb corresponding to 6.30 M reads was generated. The mean, median, and mode of the total sequence reads were 210, 228, and 235 bp, respectively, and 95.3% of obtained reads (6.00 out of 6.30 M reads) were successfully aligned to the reference sequence ( Figure S2).
SNPs with a call rate of more than 80% were distributed over 11 linkage groups covering the entire previously constructed linkage map of C. japonica [36] (Figure 1). SNPs with a call rate of less than 80% were scattered over the linkage map (Figure 1), and there were some open areas where SNP markers were not originally designed because of the low degree of polymorphism among clones.  The average number of called SNPs was 1990, corresponding to 68.5% of the SNP call rate through the genotyping for 16 clones of C. japonica. The obtained SNPs covered the linkage map (total of 1492.8 cm covering 11 linkage groups) of a previous study [36] with a mean distance between adjacent SNPs of 0.75 cm per SNP. For 11 clones, which were also previously genotyped with the Axiom arrays in a previous study [36] out of 16 clones, there were variations in the SNP call rate (Figure 3a) that are probably due to the low read depth (Table S3). The average SNP call rate of the 11 clones was 99.4% when genotyped with the Axiom arrays ( Figure 3b). The comparison between the two genotyping platforms, i.e., Axiom and AmpliSeq, shows a genotype call rate of target SNPs with a custom primer panel for AmpliSeq that was less than that with Axiom (Figure 3c), although the average for the ratio consensus per genotyped SNP reached 94.9% (Figure 3d). Therefore, most of the called SNPs with the custom primer panel of AmpliSeq were consistent with results obtained with the custom genotyping array of Axiom, indicating the high reproducibility of these two genotyping systems.  [36]. SNPs that are 80% or more genotyped, less than 80% genotyped, and ungenotyped with the custom primer panel are shown in deep green, light green, and gray, respectively. Read depth on the mapped loci and relative read quality for amplicon sequencing varied among the loci (Figure 2a). Alignment of amplicons to the reference sequence with variant calls shows that novel variants aside from the targeted SNPs were detected with more than one variant per amplicon ( Figure 2c) and with 18 variants per locus in the upper part of linkage group (LG) 1, 20 at the lower part of LG5, and 19 at the intermediate part of LG9 (Figure 2c). GC ratios of the sequenced amplicons were not largely biased (Figure 2d). The GC ratio and read depth were not correlated in this analysis.
The average number of called SNPs was 1990, corresponding to 68.5% of the SNP call rate through the genotyping for 16 clones of C. japonica. The obtained SNPs covered the linkage map (total of 1492.8 cm covering 11 linkage groups) of a previous study [36] with a mean distance between adjacent SNPs of 0.75 cm per SNP. For 11 clones, which were also previously genotyped with the Axiom arrays in a previous study [36] out of 16 clones, there were variations in the SNP call rate (Figure 3a) that are probably due to the low read depth (Table S3). The average SNP call rate of the 11 clones was 99.4% when genotyped with the Axiom arrays (Figure 3b). The comparison between the two genotyping platforms, i.e., Axiom and AmpliSeq, shows a genotype call rate of target SNPs with a custom primer panel for AmpliSeq that was less than that with Axiom (Figure 3c), although the average for the ratio consensus per genotyped SNP reached 94.9% (Figure 3d). Therefore, most of the called SNPs with the custom primer panel of AmpliSeq were consistent with results obtained with the custom genotyping array of Axiom, indicating the high reproducibility of these two genotyping systems.   Of the set of 3031 EST sequences that were used to design the custom primer panel and were used as the template reference for in silico panel evaluations, 3004 amplicons were synthesized by in silico PCR (Table 1), and all of the intended SNP genotyping was obtained. The total number of amplified products by in silico PCR was 3157, as some unintended additional amplicons were produced. Out of 3157 amplicons, 3052 amplicons were intended PCR products generated with the correct primer pairs, and 105 amplicons were unintended PCR products generated with wrong primer pairings (Table 1). Among the amplicons generated with correct primer pairings, six were off-target amplicons due to unintended primers annealing to the wrong contigs, and 42 were of unexpected size. The remaining 63 PCR products had an amplicon size ≤300 bp due to the wrong annealing position, presumably resulting in missing alleles in the process of genotyping (Table 1). In contrast, when all 34,731 EST sequences [36] were used as a template reference for in silico PCR, 2747 out of 3004 targeted amplicons were amplified with in silico PCR ( Table 1). The total number of amplified products by in silico PCR was 3395, consisting of 3265 PCR products amplified by the intended primer pairs and 130 amplified by the wrong primer pairs (Table 1). Among the PCR products with correct primer pairing, 340 products were off target amplicons and 178 were of unintended amplicon size ( Table 1). The remaining 504 PCR products had an amplicon size ≤300 bp (Table 1). When sequence data used for designing the custom primer panel were used for the template reference of the in silico PCR, all the targeted SNPs were genotyped, although some additional unintended amplification occurred. When different sequence data (in this case, 34,731 contigs) were applied to the template reference for in silico PCR, the proportion of the targeted amplification decreased to 2747 amplicons (80.9%), suggesting the redundancy of gene sequences in the C. japonica genome.

Genotyping of the F 1 Population
Ion S5 sequencing runs with the six Ion 540 tips generated a total of 62.2 Gb corresponding to 347.73 M reads. The mean, median, and mode of the total sequence reads were 178, 202, and 233 bp, respectively. Of these, 321.91 M reads (92.6%) were aligned to the reference sequence. Alignment accuracy of the reads to the reference sequence was high as well as the results of the primary evaluation of the custom primer panel.
Through genotyping 576 individuals of the F 1 population, the average number of read counts per sample was 563,800 ± 242,311 (mean ± SD) and the average number of called SNPs was 1963 ± 153, giving an average SNP call rate of 64.7% (Figure 4a). The remaining SNPs (34.3%) were not genotyped. The relationship between the acquired read count per sample and the proportion of genotype call rate per sample was examined, and the proportion of the SNP call rate was saturated when the read count per sample reached more than 250,000 reads (Figure 4b), suggesting that a sufficient number of reads was obtained for most genotyping samples. accuracy of the reads to the reference sequence was high as well as the results of the primary evaluation of the custom primer panel.
Through genotyping 576 individuals of the F1 population, the average number of read counts per sample was 563,800 ± 242,311 (mean ± SD) and the average number of called SNPs was 1963 ± 153, giving an average SNP call rate of 64.7% (Figure 4a). The remaining SNPs (34.3%) were not genotyped. The relationship between the acquired read count per sample and the proportion of genotype call rate per sample was examined, and the proportion of the SNP call rate was saturated when the read count per sample reached more than 250,000 reads (Figure 4b), suggesting that a sufficient number of reads was obtained for most genotyping samples.

Construction of the Genomic Prediction Models with the F1 Population
The prediction accuracy ranged from 0.166 to 0.555 depending on the type of data applied to the prediction, examined traits, and the models used for the genomic prediction (Table 2). Prediction accuracy was improved for all traits when the data were adjusted with spatial autocorrelation ( Table  2, Figures 5 and S3). In the models constructed with GBLUP, the prediction accuracy ranged from 0.166 to 0.454 when using raw trait values, but it was 0.185 to 0.544 for the adjusted trait values ( Figure  S3). In models constructed with RF, the prediction accuracy ranged from 0.176 to 0.450 for the raw trait values and from 0.195 to 0.555 for the adjusted trait values (Table 2, Figure 5). The prediction accuracy for traits related to wood properties (e.g., PP) was higher than for growth-related traits (e.g., height) (Table 2, Figure 5). The prediction accuracy showed a greater range of variation for the growth-related traits (0.197-0.418 for height and 0.236-0.408 for DBH) than for those of wood properties (0.445-0.487 for SWV, 0.456-0.555 for PP, and 0.409-0.544 for BD_means). For many wood properties (SWV, PP, DMOE, BD_2, BD_3, and BD_means), the prediction accuracies were higher

Construction of the Genomic Prediction Models with the F 1 Population
The prediction accuracy ranged from 0.166 to 0.555 depending on the type of data applied to the prediction, examined traits, and the models used for the genomic prediction (Table 2). Prediction accuracy was improved for all traits when the data were adjusted with spatial autocorrelation (Table 2, Figure 5 and Figure S3). In the models constructed with GBLUP, the prediction accuracy ranged from 0.166 to 0.454 when using raw trait values, but it was 0.185 to 0.544 for the adjusted trait values ( Figure  S3). In models constructed with RF, the prediction accuracy ranged from 0.176 to 0.450 for the raw trait values and from 0.195 to 0.555 for the adjusted trait values (Table 2, Figure 5). The prediction accuracy for traits related to wood properties (e.g., PP) was higher than for growth-related traits (e.g., height) (Table 2, Figure 5). The prediction accuracy showed a greater range of variation for the growth-related traits (0.197-0.418 for height and 0.236-0.408 for DBH) than for those of wood properties (0.445-0.487 for SWV, 0.456-0.555 for PP, and 0.409-0.544 for BD_means). For many wood properties (SWV, PP, DMOE, BD_2, BD_3, and BD_means), the prediction accuracies were higher than 0.40 regardless of the applied models (Table 2), although prediction accuracies for heart wood color (L*, a*, and b*) were lower than 0.30 (Table 2). For DBH, DMOE, MOE, MOR, MFA, BD_1, BD_2, BD_3, and BD_means, the prediction accuracies were higher for the model with GBLUP than in the model with RF, but for tree height, L*, a*, b*, SWV, and PP, prediction accuracy was higher in the model with RF than in the model with GBLUP (Table 2). Forests 2020, 11, x FOR PEER REVIEW 11 of 18 than 0.40 regardless of the applied models (Table 2), although prediction accuracies for heart wood color (L*, a*, and b*) were lower than 0.30 (Table 2). For DBH, DMOE, MOE, MOR, MFA, BD_1, BD_2, BD_3, and BD_means, the prediction accuracies were higher for the model with GBLUP than in the model with RF, but for tree height, L*, a*, b*, SWV, and PP, prediction accuracy was higher in the model with RF than in the model with GBLUP (Table 2).

Discussion
In this study, we constructed and evaluated a multiplexed custom primer panel for amplicon sequencing in order to perform genomic prediction in C. japonica. We verified the high reproducibility of genotype calls by comparing results for two different methods: the custom genotyping array (Axiom) and the massive amplicon sequencing based genotyping (AmpliSeq). Genotyped SNPs by these two methods were in consensus for almost 2000 of the 3034 targeted SNPs (94.9%). Genotyped SNPs were distributed over the entire linkage map (1492.8 cm covering 11 LGs) of C. japonica without regional bias, as presented in a previous study [36] with a mean distance between adjacent SNPs of 0.75 cm per SNP. An unbiased distribution of the dense marker is essential for performing an accurate estimation of breeding value [3]. We also conducted genomic prediction with the F 1 population using the genotypes acquired with the custom primer panel, and we therefore created the first platform for middle-scale genotyping with amplicon sequencing to archive genomic prediction in C. japonica.
A comparison with other genotyping platforms indicates the usefulness and availability of the custom primer panel for targeted amplicon sequencing. Amplicon sequencing with a custom primer panel is characterized by the high reproducibility of genotype calls and short processing time. In routine genotyping for breeding, NGS-based techniques need to meet several criteria, e.g., short processing time until interpretation of genotyping results for selection, limited requirements for DNA, sufficient read depth to accurately detect variants [40], and completeness of the genotype call. In lodgepole pine (Pinus contorta Douglas) and white spruce (Picea glauca (Moench) Voss), 17,765 and 17,845 SNPs were obtained for the pine and spruce, respectively, through GBS [52], and those were greater than the currently targeted SNP numbers by AmpliSeq. However, other crops have shown that genotyping with GBS shows a low completeness of SNPs called in those experiments, especially regarding low read depth [53]. Low completion rates of the total detected variants among samples leads to an increase in the number of genotypes that are treated as dominant markers rather than co-dominant markers. Stochastic molecular reactions at the condensation stage of genomic DNA at the step of constructing the sequencing library, such as cleavage by restriction enzymes in RAD-seq or binding less specific primers on short SSR sequences in PCR in MIG-seq, may produce different null alleles between experiments. These genotyping methods are suitable for mutation extraction, e.g., SNP discovery and/or experiments that do not assume repetition. In fact, Ueno et al. [54] detected hundreds of thousands of candidate SNPs in SNP discovery with RNA-Seq and RAD-Seq in C. japonica, although when analyzing the mapping population using them, they used the Fluidigm (South San Francisco, CA, USA) SNPType assay with 129 candidate SNPs and showed that 75 SNPs, representing 58.1%, were available as markers [54]. On the other hand, genotyping with AmpliSeq is suitable for breeding applications such as genomic selection with effective SNPs, owing to its high reproducibility among experiments.
In amplicon sequencing with the custom primer panels used in this study, around 34.3% of 3034 target SNPs were not detected (Figure 3a), even though primers for the target amplicon were designed and included in the custom primer panel. The relationship between acquired read count per sample and genotype call rate per sample shows a saturation curve (Figure 4), and it is suggested that a sufficient number of reads is obtained from most of the samples for genotyping. The most likely cause of why more than 30% of the target SNPs were not genotyped is that the amplicons were not synthesized as designed in the first PCR step of library construction. One of the following may prevent the successful synthesis of amplicons: (1) interception of the primers by other homologous gene loci, (2) synthesis of chimerical products, and (3) insertion of large introns in the genomic sequence. In the present study, we employed partial sequences of expressed genes [36] as a reference to design the custom primer panel. For the first scenario, duplicated gene loci, which were not included among the applied 3031 sequences, may interrupt binding to the specific loci of the designed amplicons. An in silico-based evaluation of the custom primer panel suggests the possibility that designed primers would anneal to off-targeted gene sequences (Table 1). For the second scenario, chimeric PCR products synthesized between different pairs of forward and reverse primers also consumed primers and seemed to have a low alignment ratio to the reference, although the primary evaluation of the custom primer panel showed a high alignment ratio against the reference ( Figure S2). The third scenario is also likely to be an obstacle to sequencing and sequence alignments when designing a custom primer panel that does not target genome sequencing, because the expressed gene sequence after RNA splicing by spliceosomes does not include introns in the genomic sequence. However, the size distribution of the product lengths of the synthesized library was matched to the requirements of sequencing, and excess product length was not observed in the synthesized library. Therefore, the first scenario is the most likely cause to prevent the successful synthesis of amplicons. These considerations suggest that re-designing the custom primer panel would be necessary to improve the genotyping efficiency. Using redundant gene sequences or, if possible, genome sequences as a reference, it would be possible to construct a more accurate panel for genotyping in C. japonica.
In the F 1 population of C. japonica, although results depend on traits and applied models, we confirmed that moderate accuracies (>0.5) were obtained for some wood properties in the genomic prediction modeling with the SNP genotypes ( Table 2). Among the traits, genomic prediction accuracies for wood properties (e.g., SWV, PP, and basic densities) were higher than for growth traits (height and DBH). The ranges of prediction accuracies were more variable in growth traits (0.197-0.418 for height, and 0.236-0.408 for DBH) than in the wood property-related traits (0.445-0.487 for SWV, 0.456-0.555 for PP, and 0.409-0.544 for BD_means). Previous studies suggest that trait heritability is an important factor for the accuracy of genomic prediction [6]. The ranges of broad-sense heritability, which were previously reported for the associated traits in this study, were as follows: 0.37-0.72 for height [55,56], 0.21-0.52 for DBH [55][56][57], 0.65 for wood stiffness [57], and 0.78-0.88 for wood density [55,56]. In addition, higher prediction accuracies were observed for each trait when the trait values were adjusted by the spatial autocorrelation residuals that were employed for genomic predictions. This suggests that the F 1 individuals at the plantation site are affected by local micro-environmental factors for both growth traits and wood properties; the growth traits were more sensitive to environmental heterogeneity than the wood property-related traits. Furthermore, this suggests that accurate individual phenotyping is important for accurate genomic prediction modeling.
In previous genomic prediction studies performed by Hiraoka et al. [37], prediction accuracy differed among populations and unrelated plus trees of C. japonica; the prediction accuracies in the Kyushu population were generally the highest, followed in order by those in North Kanto and South Kanto populations [37]. For example, prediction accuracies in DBH were 0.523, 0.299, and 0.033 in Kyushu, North Kanto, and South Kanto populations, respectively, when the prediction model was constructed based on all 32,036 SNPs [37]. In this study, the F 1 population was mostly constructed by artificial crossings between the plus trees originating from the South Kanto population. Although the prediction model was constructed based on around 2000 SNPs, prediction accuracies in the F 1 population were higher than the results of trait prediction modeling in unrelated plus trees in the South Kanto population. Genomic relationships arising from population structures could influence the prediction accuracy as well as linkage disequilibrium [58], and the genetic structure of the F 1 population may show improved prediction accuracies compared to unrelated plus trees. Extended linkage disequilibrium in C. japonica [59] may have a positive effect on increased prediction accuracies. In addition, large numbers of individuals (n = 576) may positively affect prediction accuracies because the number of F 1 individuals used in modeling was greater than that of the unrelated plus trees (n = 159 in the South Kanto population [37]). In a genomic prediction study in Pinus pinaster Aiton using 661 individuals and 2500 markers [14], the prediction accuracy for tree height and DBH was 0.47 and 0.43, respectively, which was comparable to the prediction accuracy in the present study. In Eucalyptus, the prediction accuracy for basic wood density was 0.67 in genomic prediction with 768 individuals and 24,806 SNPs [60]. Therefore, the population structure and the number of individuals may affect prediction accuracy. Assuming that the medium-scale genotyping is a prerequisite, it is necessary to increase the number of individuals in order to improve the prediction accuracies.
In this study, we performed genomic prediction for the F 1 population, which was mostly constructed by artificial crossing between plus trees originating from the South Kanto population with AmpliSeq, a medium-scale genotyping platform. Genomic estimated breeding values generated by prediction models with moderate accuracies may be usable as a threshold for selecting individuals that have not been virtually phenotyped as candidate or superior trees. We consider that the GS procedure with the medium-scale genotyping applied in this study is more practical than a large-scale genotyping such as Axiom when adapted to a large number of individuals. In a simulation study of GS in C. japonica, Iwata et al. [8] showed that GS breeding with model updating based on a realistic number of markers (e.g., one in every 1 cm) outperformed phenotypic selection breeding over 60-year periods, even for a low-heritability polygenic trait. In the present study, we developed and used a comparable number of markers (0.75 cm intervals) as that assumed by Iwata et al. [8] for the genomic prediction. This indicates that the breeding scheme proposed by Iwata et al. [8] is one option for future genome-based breeding in this species. GS breeding with model updating particularly for traits with high heritability such as wood properties may obtain higher genetic gain than for a low-heritability polygenic trait. In-house genotyping using the custom primer panel allows for flexible model improvement.
Further trials of genomic prediction in progenies of other populations, e.g., North Kanto and Kyusyu populations, would make it possible to verify the effectiveness of these prediction models. In addition, an application of models constructed for first-generation plus trees would be required and useful for predictions in subsequent generations and further applications of GS in C. japonica breeding.

Conclusions
In this study, we constructed a custom primer panel for amplicon sequencing for C. japonica, and we evaluated the custom primer panel with actual sequencing and with in silico PCR. Although genotyping efficiency could be improved through redesign of the custom panel, based on the trials for the genotyping and the genomic prediction modeling in the F 1 population, we demonstrated that the custom panel is useful for genomic prediction in C. japonica. In addition, we showed that prediction accuracy was improved when considering special autocorrelation errors arising from environmental heterogeneities. Since models are considered to be constructed for the first-generation plus trees and would also be useful for predictions in subsequent generations, further considerations are necessary for applying genomic prediction to C. japonica. Amplicon sequencing with the custom panel enables us to obtain genotype data efficiently in order to perform genomic prediction, to manage clones, and to advance forest tree breeding.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/11/9/898/s1, Figure S1: Example of the effect of trait-value adjustment with special autocorrelation error for tree height, Figure  S2: Alignment of amplicon sequences obtained by a primary evaluation of the custom primer panel for C. japonica, Figure S3: Relationships between actual and predicted trait value for tree height (H, a and b) and for pyridine penetration rate (PP, c and d) with best linear unbiased prediction (GBLUP), Table S1: List for the probe ID in the Axiom custom genotyping array, corresponding amplicon name, contig, SNP position, and primer sequence in the AmpliSeq custom panel, Table S2: Contents of half diallelic F 1 populations, Table S3: Summary statistics in the primary evaluation for the custom primer panel.