Genetic Investigation of Aral Wild Common Carp Populations ( Cyprinus carpio ) Using ddRAD Sequencing

: Common carp ( Cyprinus carpio ) is a widespread freshwater fish species of the Cyprinidae family, one of the largest and most diverse fish families. The natural habitats of C. carpio extend from Western Europe to South-East Asia. Common carp has remained an economically important fish species in aquaculture for many centuries and its production nowadays exceeds 4 million tons worldwide and continues to grow. The taxonomy of C. carpio is complicated, since this species is usually distinguished in two, three, and even four distinct subspecies. In the present study, we used ddRAD-sequencing to genotype 30 specimens from five wild common carp populations from the Ponto-Caspian, Balkhash-Ile, and Aral Sea geographical regions. It is demonstrated that they differ at the population level according to F-statistics analysis. At the same time, the subspecies status of C. carpio aralensis has not yet been confirmed. We found several loci that can be used as a discriminant for Aral and Ponto-Caspian wild common carp populations. It is suggested that Aral carp ( C. carpio aralensis ), which inhabits Balkhash-Ile and Aral Sea basins, is related to Ponto-Caspian or European carp ( C. carpio carpio ). Moreover, Aral carp might be the ancestor for European carp subspecies. Our results can be used to develop population-specific, high-density SNP marker panels, allowing the trade control of common carp production in the Eurasian Economic Union.


Introduction
The common carp (Cyprinus carpio) is a widespread well-known fish species, with many breeds and strains that have been developed by humans for more than a thousandyear history of its aquaculture. Common carp live in natural freshwater habitats from Western Europe to South-Eastern Asia and Japan, forming wild and feral populations [1,2].
The most common view suggests that the common carp's wild ancestor originated in the former realm of the Paratethys sea several million years ago and then dispersed into Eastern Asia and Europe [3]. The taxonomy of common carp is complicated but this species is usually distinguished in at least two distinct subspecies: Eastern Asian (C. carpio haematopterus), and Ponto-Caspian, or European (C. carpio carpio), according to morphology, microsatellite diversity, mitochondrial, and whole genomic data [3][4][5][6]. Other scientists divide common carp into three distinct subspecies of C. c. carpio, C. c. haematopterus, and Vietnamese common carp (C. c. viridiviolaceus) [7].
Some authors described distinct subspecies of common carp, e.g., Aral common carp (C. carpio aralensis), inhabiting the Aral Sea and other internal drainage basins of Central Asia [8,9]. The Aral Sea is an almost endorheic drying lake lying between Kazakhstan and Uzbekistan, fed by the Amu Darya and Syr Daria rivers. This oncse deep lake with about 66,000 sq. km-one of the largest lakes in the world-has lost more than 80% of its water volume within the last six decades and continues to split into saline residual water bodies with different biological and hydrophysical characteristics [10,11]. Due to the Aral Sea disaster and the concomitant destruction of natural habitats, the biodiversity of fish and invertebrate fauna dramatically reduced until the disappearance of certain species, or a sharp reduction in their numbers, including the Aral carp [10]. Even though common carp is one of the most numerous fish species in Kazakhstan, the wild carp population in the Aral Sea basin became vulnerable and is still decreasing due to overfishing and drying out the Aral Sea.
Kazakhstan ichthyologists describe two common carp subspecies in the natural reservoirs of Kazakhstan: C. carpio carpio and C. carpio aralensis. Kazakhstan's Ponto-Caspian wild carp inhabit the Ural-Caspian basin (Caspian Sea, Ural River, and Shalkar lake). The Aral carp was described in the Aral-Syr Darya basin with the Syr Darya, Sarysu, Chu rivers, and Kamystybas, Akchatau, and Bugun lakes on the west, and Balkhash-Ile basin with Alakol Lake on east Kazakhstan. Moreover, the Aral carp area is limited by the Syr Darya river basin (Uzbekistan) and Issyk-Kul Lake (Kyrgyzstan) to the south [8]. Different ecological forms of Aral carp have been described, which evolved in diverse natural habitats [8,9].
The Balkhash-Ile common carp populations formed in the 1930s after wild Aral carp was introduced to the Balkhash-Ile basin (Balkhash, Alakol, Zaysan lakes, and Nura River) from the Chu River. Later the Aral carp has been introduced to the Ishim and Tobol rivers and Irtysh-Karaganda Canal [9]. Nowadays, these freshwater reservoirs are used for commercial fishing of common carp.
Subsequent genetic studies, based on mitochondrial and microsatellite DNA analyses, did not confirm the separate subspecies status of C. c. aralensis, and showed the genetic similarity between Aral carp specimens and Ponto-Caspian wild common carp [12][13][14] . In this research, we used restriction site associated DNA sequencing (RAD-sequencing), which is one of the modern methods for genotype analysis with the advantages of next-generation sequencing (NGS) technology for population-wide studies at a relatively low cost [15]. A few modifications of this method have been developed to date, including ddRAD sequencing , which allows large-scale sample multiplexing because it contains a four-index sequence incorporation step [16] .
The aim of this study was to clarify the taxonomic status of the Aral carp, which remains unclear and continues to be contentious for ichthyologists. The state-of-the-art approach of high-throughput SNP genotyping of local populations can shed light on the taxonomic status of this economically important fish species.

Fish Sampling, DNA Extraction and Sequencing
Tissue samples of 30 individuals from five different wild populations of common carp (Ponto-Caspian, Balkhash-Alakol, and Aral Sea geographical regions) ( Figure 1) were received from fish tissue collections of the "Russian Federal Research Institute of Fisheries and Oceanography" (VNIRO, Russia) and "Fisheries Research and Production Center" (FRPC, Almaty, Kazakhstan). The number of specimens, population names, and sampling locations are shown in Table 1.
Genomic DNA was isolated from ethanol preserved fish fin clips by proteinase K digestion at 50 °C for 16-20 h, followed by purification through phenol-chloroform extraction, ethanol precipitation, and resuspension in sterile ddH2O [17]. Purified DNA was quantified using a Qubit 2.0 fluorimeter (Invitrogen, USA), and DNA integrity was assessed by agarose gel electrophoresis. The library preparation protocol followed the general principles of the quaddRAD approach [16] with some modifications. Genomic DNA was double digested with MspI/PstI restriction endonucleases (NEB, USA) in the presence of ligase and oligonucleotide adapters with 6 bp inner index sequences and 4 random bases to remove PCR duplicates and then amplified with outer 8 bp indexed primers. Agarose gel size selection was used to reduce the genome fraction for further DNA sequencing. An S2 flow cell of Illumina Novaseq6000 genome analyzer (Illumina, USA) with paired-end reads (2 × 150 bp length) was used for sequencing the ddRAD libraries. Sampling localities according to the color legend and natural water reservoirs for European and Aral carp. Ponto-Caspian region: p1-Danube river; p2-Dniester river; p3-Dnieper river; p4-Don river; p5-Kuban' river; p6-Volga river; p7-Ural river; p8-Emba river. Aral-Syrdarya region: a1-North Aral Sea and Kamystybas lakes; a2-South Aral Sea; a3-Sarykamysh Lake; a4-Amu Darya river; a5-Zarafshan river; a6-Syr Darya river; a7-Sary Su river; a8-Chu river. Balkhash-Alakol region: b1-Lake Balkhash; b2-Ile river; b3-Alakol lakes. I-Lake Issyk-Kul; z-Lake Zaysan.

Data Analysis
Raw Illumina reads obtained from ddRAD libraries sequencing were processed with the Stacks package v 2.41 [18]. The clone_filter module of Stacks was used for PCR duplicates removal. The process_radtags function was used for demultiplexing the dual indexed reads, and to remove erroneous and low-quality reads (parameters: −c −q). Then they were mapped to the reference genome of common carp (Wuyuan Hebao red carp, strain HB00, GenBank RefSeq assembly accession: GCA_004011595.1) using the BOWTIE2 software [19] with "very sensitive" parameter. The mapped data in SAM format were converted to binary (BAM) format, sorted, and then indexed by SAMTOOLS v 0.1.19 [20]. Since common carp is a tetraploid species [21], we filtered the resulting BAM files to leave only reads that were uniquely mapped to the reference genome, so as to avoid the problem of mismapping paralogous reads. Only reads with mapping quality values above 10 were selected for further analysis. SNP calling was conducted using BCFTOOLS v 1.9 [20] with minimum base quality of 30 (min-BQ parameter). The VCF file, obtained with BCFTOOLS, was filtered by genotyping quality parameter to discard loci with mean quality below 30. The filtered VCF was loaded into R statistic environment (www.r-project.org) by the vcfR package [22]. SNP loci were converted into the GENLIGHT format of the ADEGENET R package [23] . Discriminant analysis of carp populations was conducted in ADEGENET by the discriminant analysis of principal components (DAPC) method. We also used the SNPRELATE R package to calculate F-statistics and statistical confidence of population differentiation [24] . Population structure analysis of common carp populations was performed with ADMIXTURE v 1.3.0 [25] with default -cv = 5 cross-validation parameter, and the number of ancestral populations (K) = 3 in 100 bootstraps . Population heterozygosity was determined using the is.het () function from the vcfR package [22]. The reference allele count was calculated as the reference allele count averaged over all loci and all specimens from each population-the data were taken based on the diploid genotype. To determine whether discriminating loci are located within protein-coding genes (PCGs) or not, we extracted DNA sequences around SNP loci with 1000 nucleotide flanks using the BEDTOOLS software utility [26] and compared them to zebrafish (Danio rerio) amino acid sequences (D. rerio peptide database v. GRCz11) using the BLASTX tool [27] . The D. rerio peptide IDs with the best BLASTX scores for the common carp genes were converted to their gene IDs (Table 2). Diversity analysis of the common carp populations was conducted using the gl.report.diversity () function from the dartR package [28] with previously recommended parameters [29].

Results
A total of 71.120.328 Illumina reads with 150 nucleotides in length were obtained from 30 carp specimens, and 67.781.165 (95.3%) of them were mapped to the common carp reference genome after the quality filtering. These data were used for the genotyping analysis. At the same time, the usage of uniquely mapped reads did not affect our main conclusions, despite the fact that approximately 12% of the total number of filtered reads remained (results not shown). The sequencing and mapping statistics for each DNA library are shown in Table S1. In total, 20.179 polymorphic loci for subsequent analyses remained after SNP calling and loci quality filtering.
Population structure analysis of the samples revealed small differences between Aral and Alakol populations within the Aral-Alakol group and between the Volga and Ural populations within the Volga-Ural group (Figure 2a). This result is expected since Aral carp was introduced into Alakol Lake. As the Volga and the Ural carp populations are geographically close to each other, it is not surprising that they are closely related. Moreover, heterozygosity in the common carp populations studied was approximately the same (Table S2). Diversity analysis also demonstrated that the allelic diversity in the Azov population has a highest level, while the Ural population has lowest level. The other populations (Aral, Alakol and Volga) have approximately equal diversity levels ( Figure S1).
We conducted a discriminant analysis of principal component (DAPC) of the five carp populations to estimate the differences between them. DAPC analysis revealed discrimination between joint Aral-Alakol and Volga-Ural populations, as well as the Azov sea population. However, it did not reveal any differences within the Aral-Alakol sample as well as in the Volga-Ural population sample (Figure 2b). It is clearly shown that the Aral-Alakol carp combined sample differs from both the Volga-Ural and the Azov carp population samples, but not more than the Volga-Ural sample differs from Azov. We combined Aral with Alakol and Volga with Ural populations to consider them as single populations due to their close relatedness for measuring genetic divergence in Aral-Alakol and Volga-Ural combined populations. Fst between the joint Aral population and Volga-Ural populations was 0.0228. While there is no "official consensus" about Fst value, traditionally, an Fst value less than 0.05 is considered low [30]. Thus, the low genetic differentiation between Aral-Alakol and Volga-Ural populations is additional evidence that calls into question the subspecies status of wild Aral carp. Fst values was between the Azov and Aral common carp populations (0.06) and between the Azov and Volga carp (0.069) showed much greater genetic differentiation. The statistical support for Fst values was quite high (p-value < 0.00001).
We also conducted discriminant analyses of the principal component between Volga-Ural and Aral-Alakol groups and estimated allele contribution in population differentiation to find out SNP markers which can help to distinguish these populations. Although the population groups are distinguishable, the contribution of each allele is meager-maximum discrimination power is 0.6%, and the mean power is 0.1%. Therefore, the analysis of more than two hundred top discriminating loci is required for clearly distinguishing these two groups.
The proportion of PCGs among discriminating loci was slightly higher than among all the loci studied. Thus, 9402 of the 20,409 selected loci were PCGs (46.6%). Moreover, 12 of the 16 top differentiating loci were PCGs.

Discussion
Our results show the population subdivision between the Aral carp (C. carpio aralensis) groups from the Aral Sea and acclimatized Aral carp in Alakol Lake and European carp (C. carpio carpio) from the Ural, Volga, and Azov Sea regions. The discriminant analysis of the main component (DAPC) revealed minor differences between the Aral-Alakol and Volga-Ural, and Azov populations, with low Fst value (0.02) between populations. The relative position of all samples in the DAPC diagram suggests that if the analyzed samples are equidistant, then the Aral carp may slightly differ from the Ponto-Caspian carp at the sub-populational or even populational level. The controversial molecular data obtained in the present research correlate with the ichthyological data on various ecological forms of wild carp. This also consistent with the diversity analysis obtained by Shannon's entropy index score ( Figure S1). There are at least four ecologically and phenotypically distinct forms of Aral carp described in the Aral Sea basin: delta carp (lake-river), semi-anadrome carp, marine carp, and reed carp. The latter has a sub-form-the "dwarf carp" with small body size and low growth rates compared to other reed carp. The distinct feature is that it reaches maturity at a weight of 200-300 g regardless of natural or artificial habitat [9].
At the same time population analysis showed no significant differences between populations. The genetic equidistance of the analyzed samples is possibly related to the common origin of the ancestral carp form and further dispersal from one refugium.
The microevolutionary transformations resulting from repeated marine transgressions of the Paratethys Sea allowed the ancestral carp form to settle in the lake-river systems of the Central Asia region and the proto-Caspian Sea. It also required adaptation to the habitat and acquisition of morphological features of modern C. carpio carpio and C. carpio aralensis in periods of isolation.
As noted earlier, the aboriginal ichthyofauna of this geographical region was formed as a result of a long evolutionary process and the movement of fish from various faunistic complexes since the end of the Tertiary period. At that time, this region was almost completely covered with desalinated waters and served as one of the centers of active speciation of cyprinids [9,31].
It should also be noted that the absence of genetic differences between the Aral and Alakol carp is a consequence of their acclimatization from the Aral Sea to Lake Alakol in the 1930s-1960s.
We showed that most of the discriminating loci are located in protein-coding genes. This fact indicates that these mutations may be functional, as they are related to biological functions and traits that can cause interpopulation differences. The most powerful discriminating loci are presented in Table 2. Most of them affect metabolic, immune, and signaling pathways and are possibly related to physiological adaptations to different environmental conditions. Together with our recently published genotyping data for domestic strains of common carp [32,33], the present results can be used to develop population-specific SNP marker panels, allowing the trade control of common carp production in the Eurasian Economic Union.