Sequencing of Supernumerary Chromosomes of Red Fox and Raccoon Dog Confirms a Non-Random Gene Acquisition by B Chromosomes

B chromosomes (Bs) represent a variable addition to the main karyotype in some lineages of animals and plants. Bs accumulate through non-Mendelian inheritance and become widespread in populations. Despite the presence of multiple genes, most Bs lack specific phenotypic effects, although their influence on host genome epigenetic status and gene expression are recorded. Previously, using sequencing of isolated Bs of ruminants and rodents, we demonstrated that Bs originate as segmental duplications of specific genomic regions, and subsequently experience pseudogenization and repeat accumulation. Here, we used a similar approach to characterize Bs of the red fox (Vulpes vulpes L.) and the Chinese raccoon dog (Nyctereutes procyonoides procyonoides Gray). We confirm the previous findings of the KIT gene on Bs of both species, but demostrate an independent origin of Bs in these species, with two reused regions. Comparison of gene ensembles in Bs of canids, ruminants, and rodents once again indicates enrichment with cell-cycle genes, development-related genes, and genes functioning in the neuron synapse. The presence of B-chromosomal copies of genes involved in cell-cycle regulation and tissue differentiation may indicate importance of these genes for B chromosome establishment.


Introduction
B chromosomes (or Bs) were first described over a century ago [1] and since then they have been found in a multitude of animal and plant species. The name reflects the fact that the Bs represent a variable addition to the normal (or A) chromosome set. Unlike other types of supernumerary
Samples were prepared for sequencing with Nextera or TruSeq v.3 library preparation kits (Illumina, San Diego, CA, USA). Paired-end sequencing was performed using Illumina MiSeq (Table S1). Sequencing data are available from NCBI short read archive (SRA) under accession PRJNA477942.
Genomic regions of chromosomes were identified using dopseq_pipeline of DOPseq_analyzer v.1.0 [21], which includes primer and adapter trimming with cutadapt v.1.8.3 [28], alignment to the dog (CanFam3.1) genome with BWA-MEM v.0.7.15 [29], removal of human contamination by comparative alignment to human (hg19) genome, filtering of the alignment (min_mapq = 20, min_len = 20), and genome classification based on mean distance between consecutive merged mapped read positions, which is expected to be lower for regions present on the sampled chromosome (see Table S2 for statistics). Only chromosomes of the dog genome assembly were considered at this point, while unassigned scaffolds were discarded. Automated target region detection was complemented by visual inspection of rainfall-style plots representing the distribution of distances between mapped reads along the reference chromosomes (Supplementary File S2).
Copy-number analysis was performed using genomic reads (PRJNA378561) generated during the fox genome assembly project [30]. We established a fibroblast cell culture from this individual and estimated the Bs number to be stable and equal to three. We microdissected all three Bs from a single metaphase plate. The resulting DNA samples were amplified, sequenced and used in this study as VVUB3, 4, and 6. Fox genomic reads were aligned to CanFam3.1 genome with BWA-MEM, and copy number estimation was performed with CNVnator v.3.3.0 [31]. The changes in read coverage corresponding to duplications and deletions were estimated with a sliding window size of 1000 bp, as a lower window size (200 bp) revealed a high number of false positive deletions resulting from uneven coverage for between-species read mapping. B chromosomal copy number was estimated from normalized read depth reported by CNVnator (Table S3).
As a part of comparative gene content study, we re-analyzed Siberian roe deer (Capreolus pygargus Pall.) and grey brocket deer (Mazama gouazoubira G. Fischer) Bs sequencing data [21] with dopseq_pipeline, using cattle reference genome UMD3.1 instead of Baylor Btau_4.6.1 and BWA-MEM instead of Bowtie2 with "very-sensitive-local" profile, as in the original publication [21]. Other parameters were set as described above for canid chromosomes.
Sets of genes found in Bs of six mammalian species were obtained from the Ensembl Genes 92 [32] database using R biomaRt package [33]. First, B-chromosomal gene sets were obtained for the original reference genomes: dog for red fox and Chinese raccoon dog, cattle for Siberian roe deer and grey brocket deer, mouse for field mice Apodemus flavicollis Melchior and Apodemus peninsulae Thomas. Then, information on gene homology was added, so that every gene was supplemented (if possible) with identifiers in human, mouse, dog, and cattle genomes (Supplementary File S3). Functional enrichment analysis was performed using DAVID GO v.6.8 [34,35] for the selections of genes representing all or one-to-one (i.e., single copy) orthologs in the human genome (Supplementary File S4). Functional clustering was performed only for categories that annotated at least 80% of genes in the gene sets. Default background datasets of human genes were used for each reference.

Sequencing of Fox and Raccoon Dog B Chromosomes and Autosomes
To ascertain the adequacy of our approach for chromosomal region detection we included samples of autosomes: flow-sorted Chinese raccoon dog (Nyctereutes procyonoides procyonoides Gray, NPP) chromosome 6 (NPP6) and microdissected red fox (Vulpes vulpes L., VVU) chromosome 3 (VVU3). In both cases, detected regions are in good agreement with comparative cytogenetics data [20,23,24]: NPP6 is homologous to the entire dog (Canis lupus familiaris L., CFA) chromosome 3 (CFA3) and the distal portion of CFA13; VVU3 is homologous to entire chromosomes CFA6, CFA34, and CFA36. For NPP6, an additional 160 kbp region of CFA16 was found, suggesting a putative duplication or translocation. This result indicates the higher resolution of our method compared to comparative cytogenetics, but requires further validation. Slight depletion of reads in some genomic regions was observed on autosomes of both species: In a region of NPP6 corresponding to CFA3:56.4-62.8 Mbp and in a region of VVU3 corresponding to CFA6:55.8 Mbp up to chromosome end for VVU3 (Supplementary File S2). These changes were insufficient to be treated as deletions, but reflected a trend of under-representation of certain regions in isolated chromosome sequencing data [21].
Four samples of fox Bs were sequenced, including one sorted (VVUB2) [23] and three microdissected from a single metaphase plate (VVUB3, 5 and 6). Comparison of sequenced Bs to the dog genome sequence revealed 14 regions comprising 7.7 Mbp (Table 1). Three samples, including both sorted and microdissected Bs (VVUB2, 3 and 5), were in perfect agreement. A reduced set of regions was identified in the sample VVUB6. The seven regions previously detected in fox Bs by BAC clone mapping [20] were recovered successfully. Significant deletions lacking read coverage were observed in two regions: CFA13:34 Mbp and CFA22:24-25 Mbp. For the Chinese raccoon dog, a total of eight samples of Bs were analyzed: One sorted (NPPB1), and others microdissected (NPPB2-8) ( Table 2). Here, the highest number of regions (27, total size 8.5 Mbp) was recovered from the sorted sample only, while microdissected samples demonstrated only a subset of these regions-in contrast to fox Bs, where both chromosome isolation methods resulted in similar sets of regions. We recovered only three of five regions previously identified on Chinese raccoon dog Bs by BAC clone mapping [20]. The region corresponding to CFA13:34 Mbp claimed to be present on both fox and raccoon dog Bs was discovered only in fox Bs, while the region on CFA29:4 Mbp (specific to Chinese raccoon dog) was not identified in any samples. Due to the larger size of raccoon dog Bs, in addition to whole-chromosome probes (NPPB2 and 3), we were able to dissect proximal (NPPB5 and 6), middle (NPPB7), and distal (NPPB4 and 8) portions of the B, hoping to recover gene order on the B. However, this approach had only limited success: Differences between same-portion samples (NPPB5 vs. 6, 4 vs. 8, see Table 2) were higher than between samples of different chromosomal portions from the same experiment (NPPB4 vs. 5, 6 vs. 7 vs. 8), and many regions were present throughout the whole B. This finding is in line with previous BAC-clone mapping results, which yielded repetitive locations of several genes throughout the NPPB [20].
We found two overlapping regions on Bs of the fox and raccoon dog: CFA13:47 Mbp with KIT protooncogene (Figure 1), which was the first gene to be discovered in Bs of mammals [18], and CFA32:13-15 Mbp, with no genes in the subregion present in both species. A similar region reuse pattern involving regions without any genes was previously observed for Bs in two field mouse species [22]. Several regions identified on Bs of the red fox and raccoon dog are located in close proximity in the dog genome, e.g.,  Table 2. Genomic regions of Chinese raccoon dog (Nyctereutes procyonoides procyonoides Gray, NPP) Bs. Region coordinates are given according to dog (canFam3) genome assembly. NPPB1-flow-sorted Bs, NPPB2 and 3-whole microdissected Bs, NPPB4 and 8-microdissected distal part of Bs, NPPB5 and 6-microdissected proximal part of Bs, NPPB7-microdissected middle part of B, BAC-BAC clone mapping data from [20]. +-regions with >5 read positions (recovered automatically).~-regions with <5 read positions (recovered manually).  Table 2. Genomic regions of Chinese raccoon dog (Nyctereutes procyonoides procyonoides Gray, NPP) Bs. Region coordinates are given according to dog (canFam3) genome assembly. NPPB1-flow-sorted Bs, NPPB2 and 3-whole microdissected Bs, NPPB4 and 8-microdissected distal part of Bs, NPPB5 and 6-microdissected proximal part of Bs, NPPB7-microdissected middle part of B, BAC-BAC clone mapping data from [20]. +-regions with >5 read positions (recovered automatically). ~regions with <5 read positions (recovered manually).

Fox Whole-Genome Sequencing Data
Whole-genome sequencing (WGS) allows the discovery of B-chromosomal regions, using coverage-based copy number analysis of individuals with B-chromosomes (B+) and comparison to individuals without Bs (B−) [10]. The regions with increased copy number only in the B+ genome are treated as present on Bs.
Here, we took advantage of the WGS data available for the fox individual whose fibroblast cell cultures were used for isolation of Bs VVUB3, 5 and 6. We aligned its reads to the dog genome and called copy number variants, anticipating finding a higher coverage in B-specific regions. We were able to avoid the WGS for the B− individual by focusing on the regions predicted with isolated B chromosome sequencing. The overall agreement between whole-genome copy number and isolated chromosome sequencing results was striking-all 14 regions were consistently recovered by both methods (Table S3). The median difference between breakpoint coordinates predicted by the two methods was about 700 bp, or less than one window size (1000 bp)-the limit of copy number calling resolution. Compared to the isolated chromosome sequencing, the copy-number dataset included multiple internal deletions, which presumably reflect unevenly reduced mapping efficiency for the cross-species read alignment. As a result, the total size of the B chromosomal regions detected by copy number analysis (6.2 Mbp, Table S3) was still significantly lower than for isolated chromosome sequencing (7.4 Mbp).
Estimates of normalized read depth allowed us to calculate the copy number for each of the 14 regions present on Bs and compare it to the number of Bs of the sampled individual (n = 3, Figure 2). In agreement with BAC clone mapping results [18][19][20], some of the regions were amplified, e.g., six copies of KIT region (CFA13:47 Mbp) were present on Bs, which implies two copies per B chromosome. The copy number varied not only between the regions but also within some regions, suggesting a partial amplification of these regions in Bs. The most striking example was observed in the region corresponding to CFA31:2-4 Mbp; the number of additional copies for different parts of this region varied from zero (putative deletion) to 20 (presumably seven copies per B chromosome). Existence of copy numbers lower than three together with the absence of these regions from chromosome VVUB6 suggest that B heterogeneity within an individual exists ( Table 1, Table S3). Taken together, these observations suggest that a complex mixture of duplications and deletions occurred since the origin of Bs in red fox, which resulted in significant heterogeneity among Bs-even within a single individual.

Fox Whole-Genome Sequencing Data
Whole-genome sequencing (WGS) allows the discovery of B-chromosomal regions, using coverage-based copy number analysis of individuals with B-chromosomes (B+) and comparison to individuals without Bs (B−) [10]. The regions with increased copy number only in the B+ genome are treated as present on Bs.
Here, we took advantage of the WGS data available for the fox individual whose fibroblast cell cultures were used for isolation of Bs VVUB3, 5 and 6. We aligned its reads to the dog genome and called copy number variants, anticipating finding a higher coverage in B-specific regions. We were able to avoid the WGS for the B− individual by focusing on the regions predicted with isolated B chromosome sequencing. The overall agreement between whole-genome copy number and isolated chromosome sequencing results was striking-all 14 regions were consistently recovered by both methods (Table S3). The median difference between breakpoint coordinates predicted by the two methods was about 700 bp, or less than one window size (1000 bp)-the limit of copy number calling resolution. Compared to the isolated chromosome sequencing, the copy-number dataset included multiple internal deletions, which presumably reflect unevenly reduced mapping efficiency for the cross-species read alignment. As a result, the total size of the B chromosomal regions detected by copy number analysis (6.2 Mbp, Table S3) was still significantly lower than for isolated chromosome sequencing (7.4 Mbp).
Estimates of normalized read depth allowed us to calculate the copy number for each of the 14 regions present on Bs and compare it to the number of Bs of the sampled individual (n = 3, Figure 2). In agreement with BAC clone mapping results [18][19][20], some of the regions were amplified, e.g., six copies of KIT region (CFA13:47 Mbp) were present on Bs, which implies two copies per B chromosome. The copy number varied not only between the regions but also within some regions, suggesting a partial amplification of these regions in Bs. The most striking example was observed in the region corresponding to CFA31:2-4 Mbp; the number of additional copies for different parts of this region varied from zero (putative deletion) to 20 (presumably seven copies per B chromosome). Existence of copy numbers lower than three together with the absence of these regions from chromosome VVUB6 suggest that B heterogeneity within an individual exists ( Table 1, Table S3). Taken together, these observations suggest that a complex mixture of duplications and deletions occurred since the origin of Bs in red fox, which resulted in significant heterogeneity among Bseven within a single individual.  Region parts with different copy numbers counted separately. Regions with copy number below three were lost from some of Bs, while regions with copy number above three were amplified within Bs. X-number of additional copies, Y-counts of regions.

Genetic Content of B Chromosomes in Deer Revisited
B chromosomes of two deer species, Siberian roe deer (C. pygargus Pall., CPYB) and grey brocket deer (M. gouazoubira G. Fischer, MGOB) were previously sequenced and analyzed using an earlier version of the chromosome region identification pipeline [21]. Here, we repeatedly identified target region using another version of the cattle (Bos taurus L., BTA) reference genome assembly (UMD3.1) and a more sensitive alignment algorithm. For both samples, this resulted in a significant increase in B region numbers.
In roe deer Bs, five regions were identified in addition to the two discovered previously (Table S4). The largest added region corresponded to BTAX:148 Mbp (292 kbp in size). It is interesting that the alignment to UMD3.1 assembly with Bowtie2 also revealed this region. Thus, this region became detectable due to the significant improvement of the sex chromosome assembly, rather than due to increased aligner sensitivity. The total size of B chromosomal regions increased from 1.96 Mbp to 2.36 Mbp.
In grey brocket deer, four regions were detected in addition to 25 identified previously (Table S5), and the size was increased from 9.31 Mbp to 10.46 Mbp. Among the newly discovered regions was the retrogene, RASA1, for which only exons were located in Bs. Similarly processed retrogenes were previously reported for Bs of fish [10].
In both species, false positive signals arose at stretches of telomeric (BTA20:72 Mbp) and centromeric (BTSAT4 at BTA19:0.4 Mbp, BTA:61 Mbp) repeats. The telomeric repeats are universal among mammals, and the observation of the same centromeric repeat in deer and cattle is in line with the previous report [36].

Comparison of B Chromosome Gene Content in Six Species of Mammals
To identify common features of Bs in various lineages of mammals, we compared genetic content of Bs across six species: red fox (VVUB), Chinese raccoon dog (NPPB), Siberian roe deer (CPYB), grey brocket deer (MGOB), and two field mouse species, yellow-necked mouse (A. flavicollis Melchior, AFLB) and Korean field mouse (A. peninsulae Thomas, APEB) [22]. For each species, we extracted Ensembl gene predictions overlapping the B chromosomal regions. The number of genes identified with such a method was higher than in both original studies, as other gene types, apart from protein-coding ones, were included in the analysis. Data on homologous human genes were also added from the Ensembl database (Table 3, Supplementary File S3). Table 3. B chromosome genes statistics. Genes retrieved from the Ensembl 92 database. HSA hom-number of genes with homologs in human (Homo sapiens L.), HSA 1-to-1-number of genes with one-to-one (i.e., single copy) orthologs in human. Next, we ran functional annotation and clustering for human one-to-one homologs using the DAVID GO web interface (Table 4, Supplementary File S4). Unexpectedly, we found the top cluster to be associated with neuron synapses and cell junctions (enrichment score 2.15). Genes from this cluster were present on Bs of all species, except for the Siberian roe deer. Table 4. Selected B chromosome gene functions. Neuron synapse and cell cycle proteins retrieved from DAVID Gene ontology (GO) clustering results, while differentiation and proliferation proteins are listed based on keyword match in the GO database.

Sample
The cluster of cell division machinery and cell cycle control functions (enrichment score 1.26) included genes found in Bs of the grey brocket deer, both mouse species, and the red fox. A related cluster was associated with microtubules and centrosomes (enrichment score 0.94).
The B chromosomal gene set was also highly enriched with genes bound by developmental transcription factors (enrichment score 2.09). Still, genes involved in cell differentiation and proliferation did not form a significant cluster, although genes involved in these processes were found on Bs in all species studied. Although not automatically classified, TNNI3K, the first gene identified in Siberian roe deer Bs and the only confirmed example of transcribed B chromosomal gene in mammals [37], was included in this list based on the report of its role in cardiomyocyte differentiation from embryonic stem cells [38].
Multiple clusters were associated with specific intracellular processes: A cluster of genes found in all sampled Bs was associated with protein phosphorylation (enrichment score 1.14). The clusters with related protein domains included PH-like or Pleckstrin homology-like (phosphatidylinositol binding, enrichment score 0.97), SH or Src homology (phosphotyrosine binding, enrichment score 0.83), C2 domain (potentially phospholipid binding, found in Ca 2+ -dependent channels, enrichment score 1.57), and immunoglobulin-like domains (enrichment score 1.59).
Dot-like Bs of red fox were one of the first to be described among mammals [44,45]. During the famous experiment on fox domestication [46], Bs were found to be replicating late [47] and somatically mosaic with accumulation in gametes [48]. In meiosis, Bs form uni-, bi-and trivalents and occasionally associate (but do not conjugate) with sex chromosomes [49]. Furthermore, meiotic recombination between Bs was hypothesized based on mismatch repair protein visualization [50]. Fox Bs are characterized by lower DNA methylation [51] and central location in interphase nuclei [52].
The revolutionary finding of the first unique protein-coding gene, namely protooncogene KIT, on Bs of mammals was made with the fox [18]. Further studies provided additional data on breakpoint margins [19] and identified six additional regions apart from the KIT region on red fox Bs [20]. The current data entirely confirmed these findings and allowed the identification of seven additional regions. Using a combination of isolated chromosome and whole-genome sequencing data, we demonstrated a complex pattern of DNA amplification on Bs: Many regions are present in single copy, while some are duplicated, amplified to a higher copy number, or deleted-partially or completely. Both methods suggest that one of three Bs in the studied individual (VVUB6) was partially degenerated in comparison to the other two (VVUB3 and 5).
In raccoon dogs, Bs were described in three subspecies: Chinese (N. p. procyonoides, NPP) [53], Japanese (N. p. viverrinus, NPV) [54], and Korean (N. p. koreensis) [55]. Multiple studies were made for Chinese and Japanese subspecies, which differ in the number of Robertsonian translocations and by B morphology, and supposedly represent independent species [56]. In contrast to the fox Bs, which are the smallest karyotype elements comparable only to Y chromosomes, the raccoon dog Bs are similar in size to medium (NPP) or small (NPV) autosomes [24,56]. In both subspecies, somatic mosaicism and variable B chromosome morphology were recorded [57][58][59][60]. NPP Bs form biand multivalents in meiosis [61], have higher level of DNA methylation [51] and are located at the periphery of the interphase nucleus [52]. Interstitial telomeric sites are found along the arms of Bs of both subspecies [62]. 28S rRNA gene clusters were found on NPP Bs [59].
Protooncogene KIT was discovered in both NPP and NPV Bs [18]. In NPP, B chromosomal copies of KIT were found to be enriched with sequence variants, but not transcribed [63]. Using PCR mapping it was shown that the KIT region in NPP Bs also includes the 5 part of neighboring KDR gene [19]. Further high-density BAC clone mapping experiment identified four additional regions on NPP Bs [20], but the current data did not detect two of these regions: CFA13:34 Mbp and CFA29:41 Mbp. Both of these demonstrated only one pair of BAC clone hybridization signals per three Bs in the individual studied, and thus might represent either variable additional regions, as described in A. flavicollis Bs [22], or mapping artifacts, e.g., due to specific repeat accumulation. For the remaining three regions, the current dataset agrees well with BAC clone mapping. For example, KIT region location in the distal subarm is confirmed by both methods. Extensive amplification of genes found on NPP Bs is supported by three facts: recurrent identification of the same regions in different B subarms; significant increase of B total size (~50-100 Mbp as suggested by similar autosome sizes) in comparison to 8.5 Mbp unique chromosomal regions identified; earlier evidence of repetitive BAC clone localizations [18][19][20].
A relationship between Bs of raccoon dog subspecies remains unknown. The onset of Bs in the common ancestor could have predated the set of Robertsonian translocations observed in the Japanese raccoon dog. However, the only unique gene shown to be present in NPV Bs is KIT. This is the most frequently reused B chromosomal gene, which makes an independent origin of Bs also possible. Sequencing of the Japanese raccoon dog Bs could help elucidate a common or independent origin of Bs in NPP and NPV.
Genetic content of Bs is far from random. Previous studies suggested that Bs often harbor genes involved in cell division and early development. Two common B drive mechanisms found in animals imply interventions into these two processes: Directed (non)disjunction towards ovaries in female meiosis and mitotic accumulation in the germ line, as opposed to somatic tissues. Genes related to cell cycle and tissue differentiation were reported in non-mammalian lineages: morphogene ihhb (Indian hedgehog b) on Bs of cichlid Lithochromis rubripinnis [15]; several cell division associated genes in another cichlid Astatotilapia latifiscata [10]; a set of expressed pseudogenes including kinesins, argonaute and other regulatory genes in rye (Secale cereale) [16,64]; five genes related to cell division (CIP2A, KIF20A, CKAP2, CAP-G and MYCB2) in a grasshopper Eyprepocnemis plorans [17].
Similar tendencies were observed in mammals, both in terms of gene function enrichment and reuses (genes encoding cell-cycle related signaling protein kinases-KIT, RET and VRK1, as well as splicing regulatory KHDRBS3). The only species with Bs seemingly lacking genes related to cell cycle or early development is the Siberian roe deer, in which the first gene described and shown to be transcribed, TNNI3K, [37] has been recently associated with cardiovascular cell differentiation [38].
Neuron synapse and signaling represents another interesting function highly enriched in mammalian Bs. These genes can be found on Bs of all species studied, except for the roe deer. Examples of genes from this category also occur in Bs of cichlid Astatotilapia latifiscata [10]. This functional category remains quite unexplained by the current model of B chromosome evolution.
Five single copy genes were found to be reused in the Bs of different species. Protooncogene KIT was the first gene identified on the Bs in mammals, and it is currently found in Bs of three species: red fox, Chinese raccoon dog, and grey brocket deer. It encodes a protein kinase controlling the cell cycle, and plays an important role in development. Another protooncogene encoding protein kinase, RET, is found on Bs of Chinese raccoon dog and grey brocket deer. A third previously known gene reuse example is VRK1, a gene which also encodes a protein kinase involved in cell cycle control. This gene is found in Bs of two field mice species, which most likely acquired the supernumerary chromosomes independently [22].
Two novel examples of gene reuse were identified in this study: • KHDRBS3 (as known as T-STAR and SLM-2) is found on Bs of the grey brocket deer and the Korean field mouse. It encodes an RNA-binding signal transduction protein involved in alternative splicing regulation expressed in the brain and gonads. Mutations in this gene are associated with a neurological disorder (child absence epilepsy [65]), and affect the progression of various cancers. • MYOF was found on Bs in the Chinese raccoon dog and the yellow-necked mouse. It encodes myoferin, a plasma-associated protein involved in Ca 2+ -channel formation, important in myoblast functioning [66] and involved in EGF-induced cell migration in breast cancer [67].
Two factors seemingly affect the onset of Bs. Genomic instability context, that is, propensity to chromosome rearrangements, is important, as it provides the source material for B chromosome formation. However, the fixation of Bs within a population may in fact depend on their own genes. Here, we revealed patterns of functional preferences and gene reuse in six mammalian species. These trends are, in general, compatible with the B chromosomal gene findings in other lineages. Further studies are needed to understand the exact molecular mechanisms that Bs used to "hack" cell division and differentiation processes while escaping from selection pressure.