Genomic Signatures of Freshwater Adaptation in Pacific Herring (Clupea pallasii)

Pacific herring (Clupea pallasii) is an essential target of commercial fishing in the North Pacific Ocean. Previous studies have suggested the existence of marine and lake ecological forms of this species within its range. The lake ecological form of herring has a shortened life cycle, spending the winter and spawning in brackish waters near the shoreline without long migrations for feeding; it also has a relatively smaller body size than the marine form. Genetic-based studies have shown that brackish water Pacific herring not only can be distinguished as a separate lake ecological form but possibly has its genetic legacy. Here, as part of an ongoing study, using ddRAD-sequencing data for marine and lake ecological forms from a total of 54 individuals and methods of comparative bioinformatics, we describe genomic signatures of freshwater adaptivity in Pacific herring. In total, 253 genes containing discriminating SNPs were found, and part of those genes was organized into genome clusters, also known as “genomic islands of divergence”. Moreover, the Tajima’s D test showed that these loci are under directional selection in the lake populations of the Pacific herring. Yet, most discriminating loci between the lake and marine ecological forms of Pacific herring do not intersect (by gene name) with those in other known marine fish species with known freshwater/brackish populations. However, some are associated with the same physiological trait—osmoregulation.

Significant advances have been achieved during in-depth studies of freshwater adaptation of the three-spined stickleback (Gasterosteus aculeatus), the well-established model for modern evolutionary biology [8]. Genetic comparisons in this species have been intensively studied in the migratory and nonmigratory ecotypes. Such studies are based on genomic data [9][10][11][12][13] and gene expression of protein-coding genes [14] or microRNAs [15]. These studies showed that the genes responsible for ion transmembrane transport play an essential role in adaptation to the freshwater environment. Moreover, these genes are under positive selection [16]. In many cases, genomic regions responsible for adaptation to the freshwater environments are grouped into compact genomic islands of divergence [4,10,11]. Atlantic herring (Clupea harengus) also has interpopulation genetic differentiation, which is related to different salinity levels and is associated with haplotype blocks, often spanning multiple genes and maintained by balancing selection [17].
The commercial importance of the Pacific herring (C. pallasii) and its trophic significance for other species in marine ecosystems prompt many studies aimed at its biology, reproduction, diversity, and genetic differentiation [18][19][20][21][22][23][24]. This species has marine and lake ecological forms. The marine ecological form of C. pallasii performs long feeding migrations and spends wintertime in the upper part of the deep sea. In addition, it uses to spawn in large sea bays. Conversely, the lake ecological form prefers brackish lagoon-type lakes and small bays to spawn and spend the winter [25]. Recent genetic-based studies showed the level of genetic differentiation between populations from the Russian part of the North Pacific Ocean and the Kara Sea; moreover, lake populations from the Kamchatka Peninsula and Sakhalin Island separated from the marine populations of Pacific herring [21,26]. At the same time, it has been suggested that the lake ecological form has its genetic component associated with freshwater adaptation [26].
In this study, we present the analysis of a double digestion restriction site-associated DNA (ddRAD) sequencing dataset of 54 individuals of Pacific herring, which was previously determined as lake (15 specimens) and marine (39 specimens) ecological forms. Genotype analysis of marine and lake ecological forms allows us to identify the genomic signatures of freshwater adaptation in Pacific herring and carry out the comparative analysis of freshwater adaptation loci with its congener-Atlantic herring.
The comparison of such discriminating loci in the genomes of marine fish species with known freshwater adaptation ability will determine the genetic mechanisms and directions of adaptation to the different osmotic conditions, particularly during the conquering of new freshwater ecological niches. It is assumed that other environmental factors related to the new habitat are also changing in addition to osmotic conditions. Another critical topic considered in this study is the representation of new data on how rigidly determined the evolutionary mechanisms of adaptation to an environmental condition and the habitats of different species. From a general point of view, it is assumed that such mechanisms may differ significantly in evolutionarily distant species in comparison with close evolutionary species [27]. Nevertheless, it involves common genes or gene categories that underlie adaptive traits. The comparative analysis of such adaptation-associated loci sheds light on the species-specific genomic mechanisms in higher vertebrates.

Pacific Herring Genomic Dataset Description
The sampling, DNA extraction, ddRAD library preparation and DNA sequencing were previously described [26]. Fifty-four individuals of two ecological forms of Pacific herring from the Northwest Pacific and the Kara Sea were involved in the present study.
Specimens from twelve wild populations of Pacific herring were combined as two separate marine (39 specimens) and lake (15 specimens) ecological forms and used in a comparative analysis aimed at describing genomic signatures of freshwater adaptation (Table 1; Figure 1). The average salinity and temperature in the sampling regions in late spring-early summer are presented in Table 2.   Table 1.   Table 1.
Stacks package (v2.41) and its clone_filter module were used for PCR duplicate removal. In addition, the process_radtags module from the same package was used for dual index demultiplexing and additional quality filtration [40]. The output reads from the Stacks package were mapped against the Atlantic herring reference genome (Ch_v2.0.2, https://www.ncbi.nlm.nih.gov/assembly/GCF_900700415.2 (accessed on 4 July 2022)) using Bowtie2 under the "very-sensitive" parameter set [41]. The BAM files that were obtained using the Samtools (v1.7) [42] and uploaded to BCFtools (v1.9) [42] for SNP-calling. Only SNPs with coverage higher than 1000× for all 54 Pacific herring individuals were used in the subsequent analyses.
The filtered VCF file was loaded into the R statistical environment (v3.4.4) for discriminant analysis. The VCF file was then converted into genlight format and used for discriminant analysis of principal components (DAPC) in the adegenet R package (v2.1.3) [43].
The VCF file was also converted into internal plink format using PLINK (v1.9) package under the "make-bed" parameter [44]. To find the loci in which allele frequencies differ in marine and lake populations of Pacific herring, we used loci association analysis using the "assoc" command, as in the case-control association study, with Fisher's p-value determination. Loci with p-value less than 1 × 10 −6 were considered to discriminate between marine and lake Pacific herring ecological forms. Gene names with discriminating SNP loci were determined using coordinates of genes from the Atlantic herring reference genome (Ch_v2.0.2), SNPs coordinates, and bedtools software package (v2.27.1) under the "intersect" command [45].
The gene list, containing discriminating SNP loci was analyzed against the reference group of genes using the ShinyGO (v0.76) web service [46]. The genes with a single discriminating SNP in the filtered VCF dataset were also included in the comparison. A sliding window method from the ShinyGO tool was implicated in the genomic region prediction process with the SNPs that discriminate marine and lake ecological forms of Pacific herring. The window size was equal to two megabase pairs (Mbp). The hypergeometric test was used to determine if the genes were significantly overrepresented. The FDR cutoff for each window was equal to 0.001.
The neutrality test was performed by Tajima's D method [47] using the Tajima command of the VCF-kit package [48]. The test was performed by evaluating the Tajima value in sliding windows with a length of 10,000 bp. The distribution of the Tajima D test values for frames containing discriminating genes was compared with those containing the remaining genes using the t.test function of the R environment. Analyses were carried out separately for the VCF file, which contained all Pacific herring specimen SNPs data, and for a VCF file which had only lake Pacific herring specimen SNPs data.

Marine and Lake Ecological Form of Pacific Herring Dataset Description and Mapping Statistics
The total number of reads generated for 54 lake and marine specimens of Pacific herring was 216,433,546 (NCBI accession numbers are presented in Table 1). DNA reads were mapped to the reference genome of Atlantic herring (Ch_v2.0.2, https://www.ncbi. nlm.nih.gov/assembly/GCF_900700415.2 (accessed on 4 July 2022)) after adapter trimming and quality filtration. From 59.91 to 88.48% of reads per ddRAD library were mapped to the reference (Table S1). Only SNPs with coverage higher than 1000× (p < 0.05) were used after the SNPs calling.

Distribution of Discriminating Loci between Marine and Lake Ecological Forms of Pacific Herring
A total of 192,433 SNP loci from the 26 chromosomes of the Atlantic herring genome were involved in the PLINK analysis. The coverage of chromosomes by DNA reads was approximately uniform, and the number of SNP loci per chromosome was nearly proportional to the length of each chromosome. However, the distribution of discriminating loci between marine and lake ecological forms of Pacific herring was extraordinarily uneven. Most were found on the seventh, twelfth, and twentieth chromosomes (Figure 2).
We also conducted discriminant analyses of the principal component between marine and lake ecological forms of Pacific herring. To explore differences, we combined SNPs datasets from Ainskoe, Nerpiche, and Bolshoy Vilyuy lake populations in one lake group and the remaining in the marine group. As a result, the sample density along the discriminant function separates two ecological groups (Figure 3).

Genomic Divergence Islands between Marine and Lake Ecological Forms of Pacific Herring
SNPs localization in gene bodies was carried out by the intersection of their coordinates with the Atlantic herring reference genome GFF file (Ch_v2.0.2, https://www.ncbi.nlm. nih.gov/assembly/GCF_900700415.2 (accessed on 4 July 2022)). In total, we found SNPs in 10,649 genes of the reference genome of Atlantic herring. Interestingly, those SNP loci, which discriminate marine and lake ecological forms of Pacific herring, were observed only in 253 genes (Table S2). Moreover, 86 discriminating SNP loci are represented in 6 genes-dnai1, LOC105891580 ral guanine nucleotide dissociation stimulator, dlg2 channel-associated protein of synapse-110, dcc netrin 1 receptor, thoc2, and sptan1, which locate in chromosomes 7, 8, 12, and 20.

Distribution of Discriminating Loci between Marine and Lake Ecological Forms of Pacific Herring
A total of 192,433 SNP loci from the 26 chromosomes of the Atlantic herring genome were involved in the PLINK analysis. The coverage of chromosomes by DNA reads was approximately uniform, and the number of SNP loci per chromosome was nearly proportional to the length of each chromosome. However, the distribution of discriminating loci between marine and lake ecological forms of Pacific herring was extraordinarily uneven. Most were found on the seventh, twelfth, and twentieth chromosomes (Figure 2). We also conducted discriminant analyses of the principal component between marine and lake ecological forms of Pacific herring. To explore differences, we combined SNPs datasets from Ainskoe, Nerpiche, and Bolshoy Vilyuy lake populations in one lake group and the remaining in the marine group. As a result, the sample density along the discriminant function separates two ecological groups (Figure 3).

Genomic Divergence Islands between Marine and Lake Ecological Forms of Pacific Herring
SNPs localization in gene bodies was carried out by the intersection of their coordinates with the Atlantic herring reference genome GFF file (Ch_v2.0.2, https://www.ncbi.nlm.nih.gov/assembly/GCF_900700415.2 (accessed on 04 July 2022)). In total, we found SNPs in 10,649 genes of the reference genome of Atlantic herring. Interestingly, those SNP loci, which discriminate marine and lake ecological forms of Pacific herring, were observed only in 253 genes (Table S2). Moreover, 86 discriminating SNP loci are represented in 6 genes-dnai1, LOC105891580 ral guanine nucleotide dissociation stimulator, dlg2 channel-associated protein of synapse-110, dcc netrin 1 receptor, thoc2, and sptan1, which locate in chromosomes 7, 8, 12, and 20.
Remarkably, chromosomes 7, 8, and 20 contain many discriminating SNP loci ( Table  S2). The same results were shown by analyzing the list of genes using the ShinyGO web service [46]. Furthermore, a significant number of discriminating SNP loci are localized in chromosomes 6, 7, 12, and 20 and in certain chromosome regions, that were called "ge- Remarkably, chromosomes 7, 8, and 20 contain many discriminating SNP loci (Table S2). The same results were shown by analyzing the list of genes using the ShinyGO web service [46]. Furthermore, a significant number of discriminating SNP loci are localized in chromosomes 6, 7, 12, and 20 and in certain chromosome regions, that were called "genomic islands of divergence" in the previous studies [11,49]. Figure 4 clearly shows how the discriminating SNP loci between the marine and lake ecological forms of Pacific herring are grouped into genomic divergence islands.  Gene ontology (GO) analysis was carried out for the list of 253 genes that contain discriminating SNP loci, using the ShinyGO (v0.76) web service. GO analysis combined 17 of them in neuron development category, which contains 221 gene (Fold Enrichment = 3.2; FDR = 0.033).
Not all discriminating SNP loci between ecological forms of Pacific herring were involved in the analysis since the ddRAD sequencing method does not cover the whole genome and reduces the complexity of the analysis by subsampling only at specific genomic sites defined by restriction enzymes [50].
In total, 133 from 253 genes, which contain discriminating SNP loci between ecological forms of Pacific herring, theare located in genomic divergence islands. Interestingly, these parts of chromosomes 6, 7, 12, and 20 contain 254 genes. Thus, the ddRAD sequencing method used in this study allowed us to determine at least half of them.
Recently, Velotta and colleagues summarized genomic mechanisms of adaptation to different salinity environment conditions in eight different fish species; they described the specific categories of genes that play a crucial role in such adaptation [16]. We found a significant number of these gene categories among our discriminating genes. Strikingly, we speculate that they were possibly involved in freshwater adaptation in Pacific herring. Notably, it has been shown that osmoregulation is influenced by insulin-like growth factor 1 (IGF-1), which receptor and associated binding proteins aid in its transport. This hormone is implicated in the proliferation and differentiation of ionocytes of fish gills [16]. The insulin-like growth factor 2 mRNA binding protein 1 (igf2bp1) and insulin receptor substrate 2b (irs2b) genes are described in the present study as discriminating between marine and lake ecological forms of Pacific herring. We suppose these genes can be involved in the proliferation of gill ionocytes, which are responsible for osmoregulation [51]. In addition, fibroblast growth factors (Fgf) are also represented among discriminating genes. Fibroblasts are implicated in forming connective tissue in animals, which is also necessary to protect body tissues from the changing environment. Moreover, it has been shown that Fgf underlies phenotypical adaptations of fish [52]. Transmembrane channel genes are important for freshwater adaptation. Previously, ATPase pump, passive ion cotransporter and transient receptor potential (TRP) channel genes were described as part of molecular Not all discriminating SNP loci between ecological forms of Pacific herring were involved in the analysis since the ddRAD sequencing method does not cover the whole genome and reduces the complexity of the analysis by subsampling only at specific genomic sites defined by restriction enzymes [50].
In total, 133 from 253 genes, which contain discriminating SNP loci between ecological forms of Pacific herring, theare located in genomic divergence islands. Interestingly, these parts of chromosomes 6, 7, 12, and 20 contain 254 genes. Thus, the ddRAD sequencing method used in this study allowed us to determine at least half of them.
Recently, Velotta and colleagues summarized genomic mechanisms of adaptation to different salinity environment conditions in eight different fish species; they described the specific categories of genes that play a crucial role in such adaptation [16]. We found a significant number of these gene categories among our discriminating genes. Strikingly, we speculate that they were possibly involved in freshwater adaptation in Pacific herring. Notably, it has been shown that osmoregulation is influenced by insulin-like growth factor 1 (IGF-1), which receptor and associated binding proteins aid in its transport. This hormone is implicated in the proliferation and differentiation of ionocytes of fish gills [16]. The insulin-like growth factor 2 mRNA binding protein 1 (igf2bp1) and insulin receptor substrate 2b (irs2b) genes are described in the present study as discriminating between marine and lake ecological forms of Pacific herring. We suppose these genes can be involved in the proliferation of gill ionocytes, which are responsible for osmoregulation [51]. In addition, fibroblast growth factors (Fgf) are also represented among discriminating genes. Fibroblasts are implicated in forming connective tissue in animals, which is also necessary to protect body tissues from the changing environment. Moreover, it has been shown that Fgf underlies phenotypical adaptations of fish [52]. Transmembrane channel genes are important for freshwater adaptation. Previously, ATPase pump, passive ion cotransporter and transient receptor potential (TRP) channel genes were described as part of molecular pathways related to response to the changing of osmotic conditions [16,53]. In the case of the Pacific herring, a minimum of six genes (abcc8, trpm1a, trpc4b, cacna1bb, sclt1, and piezo1) from this category are revealed as discriminating between marine and lake ecological forms. Moreover, seven genes involved in transmembrane transporter activity (slc4a1b, slc39a10, slc15a4, slc20a1a, slc25a25b, slc45a2, and slc16a5a) and three genes involved in calcium homeostasis (cabp1, camk2a, and cacna1bb) were also described. Genome epigenetic modifications and non-coding RNAs play a significant role in the rapid adaptation to changes in the salinity of the environment in fish [15,[54][55][56][57]. Table S2 represents several such genes, including piwil1 (piwi-like1 RNA-mediated gene).
The discriminating genes were tested for neutrality using Tajima s D statistical test. As a result, we obtained weakly positive average Tajima s D values for both groups of genes -discriminating (0.2044859) and the set of the other genes (0.3041617) with a high degree of significance (p-value = 9.042 × 10 −5 ) for the 54 Pacific herring specimens. However, Tajima s D values became sharply negative when we tested only specimens from the lake ecological form. The significantly negative values of Tajima s D indicate the genes with an excess of low-frequency variation, which is consistent with the fact that the populations might have experienced an expansion after a recent bottleneck, or the genes targeted are under positive selective pressure. Moreover, the difference in the averages of Tajima s D values for both groups of genes -discriminating (-40.30702) and the whole set of genes (-26.99115) in lake populations have a high degree of significance (p-value = 1.185 × 10 −10 ). The most logical interpretation of this result is that in addition to the population demographic effect, which can be seen by the Tajima s D for the whole gene set (-26.99), there is an additional effect of directional selection (-40.3), which is closely related to discriminating genes of the individuals from lake populations of pacific herring.

Discussion
Our previous studies allowed us to describe the genetic differentiation between the Pacific herring populations in the Russian part of its range using mitochondrial and nuclear genome markers [21,26]. Moreover, the comparative analysis of the genetic differentiation between the combined three lakes on one side and nine marine populations of Pacific herring on the other showed that lake form populations differ from the marine ones [26]. The origin of lake populations across the Russian Far East is still unclear. However, they most likely arose independently due to the significant geographical distance between Sakhalin Island and the Kamchatka Peninsula habitats. Nevertheless, Pacific herring individuals who inhabit fresh-/brackish water in different regions are clustered together based on genetic data [26]. Perhaps, such differences are explained by their ongoing contact with marine individuals, which can spread "freshwater" alleles between lake populations across Russian Far East.
In the present study, even though the relatively small number of gene categories most likely participating in the freshwater adaptation in Pacific herring, we succeeded in showing that the function of most of them are related to osmotic regulation. Moreover, in other teleost species, several of these genes have previously been correlated to the adaptation dynamic toward different salinity conditions [16]. In addition, among discriminative genes between marine and lake forms of Pacific herring, we have found a significant number of regulatory genes, which suggests that rearrangement of regulatory pathways with salinityrelated environment changes is a way to quick adaption. Previously, it has been shown on the three-spined stickleback model where at least half of the genes from the genomic divergence islands have regulatory functions [10,11].
Non-coding RNAs are also implicated significantly in the freshwater adaptation of teleost fish by gene expression regulation. Moreover, the differentiation in the microRNA expression occurs in both cases when fish transfer from the marine to the freshwater environment and vice versa-physiological (quick) response or during the time-long