A Novel Strategy for Constructing an Integrated Linkage Map in an F1 Hybrid Population of Populus deltoides and Populus simonii

The genetic linkage maps of the traditional F2 population in inbred lines were estimated from the frequency of recombination events in both parents, providing full genetic information for genetic and genomic studies. However, in outbred forest trees, it is almost impossible to generate the F2 population because of their high heterozygosity and long generation times. We proposed a novel strategy to construct an integrated genetic linkage map that contained both parental recombination information, with restriction-site-associated DNA sequencing (RADSeq) data in an F1 hybrid population of Populus deltoides and Populus simonii. We selected a large number of specific RAD tags to construct the linkage map, each of which contained two SNPs, one heterozygous only in the female parent and the other heterozygous only in the male. Consequently, the integrated map contained a total of 1154 RAD tags and 19 linkage groups, with a total length of 5255.49 cM and an average genetic distance of 4.63 cM. Meanwhile, the two parent-specific linkage maps were also constructed with SNPs that were heterozygous in one parent and homozygous in the other. We found that the integrated linkage map was more consensus with the genomic sequences of P. simonii and P. deltoides. Additionally, the likelihood of the marker order in each linkage group of the integrated map was greater than that in both parental maps. The integrated linkage map was more accurate than the parent-specific linkage maps constructed in the same F1 hybrid population, providing a powerful genetic resource for identifying the quantitative trait loci (QTLs) with dominant effects, assembling genomic sequences, and performing comparative genomics in related Populus species. More importantly, this novel strategy can be used in other outbred species to build an integrated linkage map.


Introduction
Genetic linkage maps display the linear orders of groups of DNA markers with genetic distances between them, playing a crucial role in identifying the quantitative trait loci (QTLs), assembling genomic sequences, and performing comparative genomics [1][2][3][4]. For experimental plants and animals in which inbred lines are available, the simplest populations for constructing linkage maps are the backcross (BC) and F 2 intercross between two inbred lines [5]. Mather (1938) showed that the F 2 populations provided more recombination information than the backcross because two, rather than one, recombinant gametes are possibly sampled per individual [6]. Thus, when the sample sizes are the same, the linkage map constructed with an F 2 population is more accurate than that with a backcross, in the aspects of the genetic distances between the adjacent markers and marker orders in each linkage group. Without any doubt, accurate linkage maps are more powerful for locating QTLs and anchoring chromosome-scale sequences [1,7]. In addition, linkage maps that contain both parental recombinant information allow for the effective identification of QTLs with both additive and dominant effects [8,9].
However, for outbred species, especially in forest trees, it is almost impossible to derive the F 2 populations as in the inbred lines, due to their long generation times and high heterozygosity [10,11]. Therefore, an F 1 hybrid population was usually established by crossing two diverged individuals as a mapping population, and the so-called "pseudo-testcross" strategy was applied for linkage mapping, leading to two independent parental linkage maps [1,[12][13][14][15]. Each of the two linkage maps contains recombinant information only from one parent. Although efforts have been made to construct an integrated linkage map in an outbred F 1 hybrid population [16][17][18][19], successful cases similar to the F 2 maps in inbred lines have rarely been reported to date. The main reason is that the overwhelming majority of molecular markers segregate in the ratio of 1:1 in the F 1 progeny, at which one parent is heterozygous and the other homozygous, but there were no sufficient markers segregating as bridges in both parents for generating a complete linkage map [1,20]. Meanwhile, many studies have been performed by combining multiple linkage maps into an integrated map for saturation purposes or for comparative genomics [10,[21][22][23]. However, these maps could not have the full characteristics of the F 2 linkage maps in inbred lines.
With the advances in next-generation sequencing technologies, a large number of single nucleotide polymorphisms (SNPs) can be obtained in a fast and cheap way across a hybrid population for genetic mapping. These SNPs provide opportunities for mining special markers to perform various genetic studies. In the current study, we proposed a novel strategy to construct an integrated linkage map that contained both parental recombinant information between the adjacent markers in an F 1 hybrid population of P. deltoides and P. simonii. In our previous studies [20,24], a large number of the hybrids were sequenced with the restriction-site-associated DNA sequencing (RAD-seq) technology. The SNP genotypes of the two parents and some hybrids were identified by aligning their paired-end (PE) reads to the reference genome of P. simonii [25]. We chose a large number of specific RAD tags to construct the integrated map, each of which contained two SNPs, one heterozygous only in the female parent and the other heterozygous only in the male ( Figure 1A), to construct an integrated linkage map of P. deltoides and P. simonii. Such a linkage map differed from those constructed with the "pseudo-testcross" strategy in forest trees, providing full genetic information for locating quantitative trait loci (QTLs), assembling genomic sequences, and performing comparative genomics. Furthermore, the proposed strategy could be applied to other outbred species for constructing an integrated linkage map with an F 1 hybrid population.

Figure 1.
Recombination fraction between two RAD tags: (A) each RAD tag contains two SNPs, one segregating in the type of ab×aa and the other in aa×ab; (B) the genotype counts at SNPs 1 and 3 in the progeny are used to estimate the female recombination fraction, while the counts at SNPs 2 and 4 are for estimating the male recombination fraction; (C) under the assumption of the SNP linkage phases in coupling, the formulas are given for calculating the recombination fractions of the female (rf), the male (rm), and the combination of both (r).

Plant Material and Sequencing Data
We randomly selected 257 trees in an F1 hybrid population for constructing the parental integrated linkage map. The F1 population was derived by crossing a female P. deltoides and a male P. simonii in the springs of 2009 and 2011, as described in our previous studies [20,24]. The selected individual trees and their two parents were sequenced with RAD-seq technology in 2013 and 2016. The read data for each tree had a genomic coverage of >2x, and its accession number is listed in Table S1. All the read data are available in the NCBI SRA database (http://www.ncbi.nlm.nih.gov/Traces/sra, accessed on 1 May 2022).

SNP Genotyping
First, the PE read data of each tree was filtered to produce high-quality (HQ) data with the NGS QC toolkit (v2.3.3) [26]. Next, we aligned the HQ reads of each individual to the reference sequence of P. simonii [25] to call its SNP genotypes using the software packages of BWA (v0.7.17) [27], SAMtools (v1.9), and BCFtools (v1.9) [28]. The whole SNP calling procedures can be described in detail as follows: (1) The HQ reads of each tree were mapped to the reference genome to generate a sequence alignment/map (SAM) file with the BWA mem command; (2) each SAM file was converted into a BAM file and then was sorted and indexed with SAMtools; (3) each sorted BAM file was converted into a BCF file using BCFtools with its parameters set as follows: "bcftools mpileup -Obuzv -a AD,INFO/AD -f"; (4) for each parent, a VCF file was obtained from its BCF file with the command as "bcftools call-m -v -f gq"; (5) the SNPs of each parent were extracted from their

Plant Material and Sequencing Data
We randomly selected 257 trees in an F 1 hybrid population for constructing the parental integrated linkage map. The F 1 population was derived by crossing a female P. deltoides and a male P. simonii in the springs of 2009 and 2011, as described in our previous studies [20,24]. The selected individual trees and their two parents were sequenced with RAD-seq technology in 2013 and 2016. The read data for each tree had a genomic coverage of >2x, and its accession number is listed in Table S1. All the read data are available in the NCBI SRA database (http://www.ncbi.nlm.nih.gov/Traces/sra, accessed on 1 May 2022).

SNP Genotyping
First, the PE read data of each tree was filtered to produce high-quality (HQ) data with the NGS QC toolkit (v2.3.3) [26]. Next, we aligned the HQ reads of each individual to the reference sequence of P. simonii [25] to call its SNP genotypes using the software packages of BWA (v0.7.17) [27], SAMtools (v1.9), and BCFtools (v1.9) [28]. The whole SNP calling procedures can be described in detail as follows: (1) The HQ reads of each tree were mapped to the reference genome to generate a sequence alignment/map (SAM) file with the BWA mem command; (2) each SAM file was converted into a BAM file and then was sorted and indexed with SAMtools; (3) each sorted BAM file was converted into a BCF file using BCFtools with its parameters set as follows: "bcftools mpileup -Obuzv -a AD,INFO/AD -f"; (4) for each parent, a VCF file was obtained from its BCF file with the command as "bcftools call-m -v -f gq"; (5) the SNPs of each parent were extracted from their VCF files and merged into a site list file, in which the allele read coverage depth (DP) of each genotype at each SNP site was ≥3, and the corresponding genotyping quality (GQ) was >30; (6) for each progeny, a VCF file was derived from its BCF file according to the site list file obtained above, with the command as "bcftools call-m -v -f gq; (7) the SNP genotypes for all individuals were extracted from their VCF files such that the allele DP ≥ 3 and GQ > 30, finally leading to a SNP dataset.
A chi-square test was performed for each SNP to check whether it follows the Mendelian segregation ratio. If an SNP seriously deviated from the Mendelian ratio (p < 0.01) or had ≥20% missing genotypes, it was removed from the dataset.

Integrated Linkage Map Construction
As a novel strategy, we selected the RAD tags (<1000 bp) that contain two SNPs to construct an integrated linkage map of P. deltoides and P. simonii. One of the SNPs on each RAD tag segregates in the type of ab×aa and the other in aa×ab, where the first two letters denote the SNP genotype of the female P. deltoides, and the last two indicate the genotype of the male P. simonii ( Figure 1A). These RAD tags allow for performing a two-point linkage analysis of the female or male species or even both ( Figure 1B,C). First, we used the SNPs of ab×aa to construct the female linkage map with the JoinMap 4.1 software [29]. Apparently, not only the RAD tags but also the SNPs of aa×ab were classified into the same linkage groups with the same orders as the SNPs of ab×aa in the female map. Second, the two genotype datasets of ab×aa and aa×ab were merged into one dataset for constructing an integrated map of the two parents after a transformation procedure. In each female linkage group map, when two adjacent SNPs of ab×aa were in the repulsion phase, the progeny genotype of "ab" at the second SNP was transformed into "aa" and the genotype of "aa" into "ab" (Figure 2). It is easily verified that the same female recombination fraction for the two SNPs can be calculated from the transformed data [16]. Similarly, data transformation along the order of RAD tags was also performed for the SNPs of aa×ab in the same linkage group. We can see that each of the two transformed datasets is like those from the traditional BC population, and so is the merged dataset. The merged dataset for each linkage group was used to construct the integrated group map with the MapMaker software [5]. Third, the linkage maps of RAD tags were plotted in a WMF format using Kosambi's mapping function [30] and FsLinkageMap. Finally, the maps were further changed into a PDF format using the tool Mayura Draw (http://www.mayura.com, accessed on 5 July 2022). allele "a" is transformed into "b" and "b" into "a". Correspondingly, the genotype of "aa" is transformed into "ab" and "ab" into "aa" in the progeny.

Selected RAD Tags
After filtering the RAD-seq data, an average of 4.61 Gb HQ data were obtained to call (B) at the second SNP site in the maternal parent, allele "a" is transformed into "b" and "b" into "a". Correspondingly, the genotype of "aa" is transformed into "ab" and "ab" into "aa" in the progeny.
After the above procedures, the female linkage map and an integrated linkage map were generated. Meanwhile, the male linkage map was obtained by just using the SNPs of aa×ab with JoinMap. In order to investigate the performance of the three linkage maps, we compared their collinearities with a reference sequence. Additionally, for a linkage group, the linear orders of the three linkage maps were compared by calculating the likelihoods with the transformed and merged genotype data using MapMaker.

Selected RAD Tags
After filtering the RAD-seq data, an average of 4.61 Gb HQ data were obtained to call the SNP genotypes for each individual. A total of 48,938 SNPs were detected and genotyped in the F 1 hybrid population by aligning the HQ reads to the reference genome of P. simonii. Like the results in our previous studies [1,20], most SNPs were segregated in the types of ab×aa and aa×ab, amounting to 29,466 and 18,863, respectively. From the two types of 1:1 segregated SNPs, a total of 1154 RAD tags were obtained for constructing an integrated linkage between the two parents. Each RAD tag had two SNPs within a range of <1 Kb, one of which was segregated in the type of ab×aa and the other in aa×ab ( Figure 1A). Moreover, the distance between two adjacent RAD tags was required to be ≥100 Kb so that the crossover events between them can be frequently identified.

Integrated Linkage Map of P. deltoides and P. simonii
First, the female parental linkage map of P. deltoides was built with the SNP dataset of ab×aa, which contained 19 linkage groups under the logarithm of odds (LOD) thresholds ranging from 5 to 17 and thus perfectly matched the karyotype of Populus ( Figure S1). Correspondingly, the RAD tags were also classified into 19 linkage groups. For each linkage group of RAD tags, the SNP genotype datasets of ab×aa and aa×ab were transformed and merged to construct the integrated group map, where the RAD tags were denoted as "RT" plus the serial number along the reference genome of P. simonii (Figure 3). As a result, the total length of the integrated linkage map was 5255.49 cM, with an average genetic distance of 4.63 cM between the adjacent RAD tags and a linkage group length ranging from 164.42 cM for chromosome 19 to 694.79 for chromosome 1 (Table 1).    Besides the construction of the female parent linkage map, we also used the SNP dataset of aa×ab to construct the male parent linkage map ( Figure S2). For the two parental linkage maps and the integrated linkage map, the number of markers and the length of each LG are presented in Table 1. The numbers of SNPs or RAD tags within the same LG were equal for the three linkage maps. Except for LGs 8 and 12, most female LG lengths were consistently greater than those of the male species. Moreover, the total length of the female map reached 5607.34 cM, much greater than that of the male (4752.21 cM), indicating that the recombinant events were more frequent in the female P. deltoides than in the male P. simonii. From the integrated map, it is easily found that each LG length was between those in the female and male linkage maps.

Advantage of the Integrated Map over the Two Parental Maps
In order to investigate the advantage of the integrated linkage map over the two parental linkage maps, we first compared their consensuses with the genomic sequences of P. simonii [25] and P. deltoides [31]. We found that a total of 68 local regions on the integrated linkage map were discordant with the P. simonii genomic sequence, which contained~15% (173) of all the RAD tags (File S1; Table 2). However, there were more discordant regions (DRs) on each parent linkage map, with 119 and 88 DRs containing~31% (361) and~21% (239) markers on the female and male maps, respectively. Moreover, the number of DRs within LGs in the integrated map was largely less than or equal to the numbers in both the female map and the male map, except for LGs 3, 4, 15, and 16. Meanwhile, the numbers of markers in the DRs within the LGs basically had the same situation. To compare the consensuses with the genome of P. deltoides "I-69" among the three linkage maps, the RAD tags were mapped to this genomic sequence. Consequently, a total of 1134 RAD tags were successfully aligned to the corresponding chromosomes, except for 20 markers mapped to different chromosomes or scaffolds (File S2). The DRs and the internal markers in the integrated, female, and male linkage maps were counted for each LG and are listed in Table 3. Similarly, we found that the number of DRs in most integrated LGs was less than or equal to the corresponding numbers in each of the two parental maps, except for LGs 3, 4, and 15. Additionally, the number of markers contained in the DRs showed the same feature as that of the DRs. In total, there were 77 DRs with 197 markers in the integrated map, while there were more DRs with more markers in the female and male maps (Table 3). Table 3. The number of discordant regions with marker numbers in brackets in each linkage group of the two parent linkage maps and the integrated linkage map. The marker order of the region is discordant with the genomic sequence of P. deltoides "I-69".
LG In addition to the comparison of consensuses with genomic sequences, we also investigated the statistical characteristics of the marker orders within LGs in the three linkage maps. The likelihood of the marker order in each LG was calculated based on the transformed and merged data of ab×aa and aa×ab with the MapMaker software. The result showed that the likelihood of each LG in the integrated map was greater than that in both the female and male maps ( Table 4), indicating that, from a statistical point of view, the marker orders are more accurate in the integrated map than in the two parent maps.

Discussion
Throughout time, genetic statisticians have tried to use fully informative markers as bridges for constructing an integrated map in a full-sib family in outbred species, especially in forest trees [16][17][18]. Here, the fully informative markers include those that segregate into the types of ab×ab and ab×cd in a mapping population. However, only a small number of fully informative SNPs were identified in the F 1 hybrid population of P. deltoides and P. simonii in our previous linkage mapping studies, so they were not sufficient to build an integrated linkage map but led to two parent-specific linkage maps [1,20,24]. In the current study, we successfully obtained an integrated genetic linkage map of the two parents in the same population, using a novel strategy of selecting specific RAD tags from a large number of SNPs across the population. Besides the selection of the RAD tags, we proposed a method to transform and merge the two genotype datasets of ab×aa and aa×ab into one dataset, just like in a traditional BC population, allowing for the use of the famous software MapMaker for linkage mapping. The integrated linkage map not only has some advantages over the two parental linkage maps but also possesses superiority in comparative genomics and QTL mapping.
Compared with the two parent-specific linkage maps, the integrated linkage map was more consensus with the genomic sequences of P. simonii [25] and P. deltoides [31]. Although the available genomic sequences have some shortcomings due to the complexity of genomic assembling, such consensus reflects a certain degree of authenticity in the order of contigs because the genomic assemblies and genetic maps were obtained using independent technologies. In the process of genomic assembling, the long reads of the two parents obtained with third-generation sequencing technologies were first assembled into contigs. Then, these contigs were further anchored into chromosome-level sequences with the chromosome conformation capture (Hi-C) technology. However, the Hi-C technology is not entirely perfect, as it could produce some errors in a chromosome-level assembly such as artificial inversions and misjoined contigs [32,33]. On the other hand, the order of markers on linkage maps may not be absolutely correct. The reason is that ordering a large number of markers in a linkage group is a difficult, scientific problem, which could produce erroneous orders in local regions [1]. Nevertheless, the consensus between the orders of contigs and genetic markers could validate each other to make sure they are as correct as possible.
In theory, the integrated linkage map provides a powerful genetic resource for identifying QTLs, with additive and dominant effects in the F 1 hybrid population of P. deltoides and P. simonii. However, there are no software packages available to directly implement the analysis of QTL mapping. Therefore, new algorithms should be developed to calculate the genetic parameters for a specific position on the integrated map, similar to the traditional QTL mapping methods such as interval mapping (IM) [2] and composite interval mapping (CIM) [34]. Unlike in the F 2 population from inbred lines, the two haplotypes of an individual can be discriminated from its genotype with the RAD tags. For this reason, the probability of an assumed QTL genotype conditional on the flanking RAD tags could be easily derived, and thus the likelihood of the QTL effects could be established. After that, the calculation for the parameters can be implemented through the expectation-maximization (EM) algorithm [35]. We expect to finish such a statistical analysis in a subsequent study and to detect QTLs with dominant effects for important growth traits in related Populus species.

Conclusions
In this study, a novel strategy was proposed and successfully applied to construct an integrated linkage map for P. deltoides and P. simonii with next-generation sequencing data. However, the conventional linkage mapping methods for such a hybrid population led to two parent-specific linkage maps, each lacking the full recombinant information of both parents. The integrated linkage map proved to be more accurate than the two parentspecific linkage maps, providing a powerful genetic resource for identifying quantitative trait loci (QTLs) with dominant effects, assembling genomic sequences, and performing comparative genomics in Populus. Most importantly, this novel strategy can be used in other outbred species for constructing an integrated genetic linkage map.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/genes13101731/s1, Table S1: The RAD-seq data information for the 2 parents and 257 progeny in the F 1 hybrid population of P. deltoides and P. simonii; Figure S1: The genetic linkage map of the female parent P. deltoides; Figure S2: The genetic linkage map of the male parent P. simonii; File S1: Discordant regions (highlighted) in the 19 LGs of the integrated, female, and male linkage maps, compared with the genomic sequence of P. simonii; File S2: Discordant regions (highlighted) in the 19 LGs of the integrated, female, and male linkage maps, compared with the genomic sequence of P. deltoides "I-69".   Table S1.