Population Genomics of an Anadromous Hilsa Shad Tenualosa ilisha Species across Its Diverse Migratory Habitats: Discrimination by Fine-Scale Local Adaptation

The migration of anadromous fish in heterogenic environments unceasingly imposes a selective pressure that results in genetic variation for local adaptation. However, discrimination of anadromous fish populations by fine-scale local adaptation is challenging because of their high rate of gene flow, highly connected divergent population, and large population size. Recent advances in next-generation sequencing (NGS) have expanded the prospects of defining the weakly structured population of anadromous fish. Therefore, we used NGS-based restriction site-associated DNA (NextRAD) techniques on 300 individuals of an anadromous Hilsa shad (Tenualosa ilisha) species, collected from nine strategic habitats, across their diverse migratory habitats, which include sea, estuary, and different freshwater rivers. The NextRAD technique successfully identified 15,453 single nucleotide polymorphism (SNP) loci. Outlier tests using the FST OutFLANK and pcadapt approaches identified 74 and 449 SNPs (49 SNPs being common), respectively, as putative adaptive loci under a divergent selection process. Our results, based on the different cluster analyses of these putatively adaptive loci, suggested that local adaptation has divided the Hilsa shad population into two genetically structured clusters, in which marine and estuarine collection sites were dominated by individuals of one genetic cluster and different riverine collection sites were dominated by individuals of another genetic cluster. The phylogenetic analysis revealed that all the riverine populations of Hilsa shad were further subdivided into the north-western riverine (turbid freshwater) and the north-eastern riverine (clear freshwater) ecotypes. Among all of the putatively adaptive loci, only 36 loci were observed to be in the coding region, and the encoded genes might be associated with important biological functions related to the local adaptation of Hilsa shad. In summary, our study provides both neutral and adaptive contexts for the observed genetic divergence of Hilsa shad and, consequently, resolves the previous inconclusive findings on their population genetic structure across their diverse migratory habitats. Moreover, the study has clearly demonstrated that NextRAD sequencing is an innovative approach to explore how dispersal and local adaptation can shape genetic divergence of non-model anadromous fish that intersect diverse migratory habitats during their life-history stages.


Introduction
The Hilsa shad Tenualosa ilisha (subfamily Alosinae) is an anadromous and important trans-boundary fish that lives in the Bay of Bengal and migrates to the upstream rivers of Bangladesh (86% share), India (8% share), and Myanmar (4% share) for feeding, breeding, and the nursing of offspring [1,2]. Since 2016, the annual production of Hilsa shad has increased by more than 0.5 million tons, and the species alone has been contributing approximately 44% to the captured fish production in Bangladesh, which accounts for 10.5% of the total annual fish production, representing approximately 1% of the total gross domestic product of the country [3]. While the production of Hilsa shad has increased in recent years, the majority of the increased production comes from the sea and estuaries [3]. However, the production of the upstream migratory rivers remained stable or even decreased due to the disruption of migratory routes by heavy siltation, loss of spawning, feeding, and nursing grounds, the indiscriminate catching of juveniles, and an increased fishing of adults [1]. Moreover, migrations to the upstream rivers during the breeding season are largely influenced by the climatic conditions, particularly the southwest monsoon, which results in the flooding of the major rivers in Bangladesh, India, and Myanmar. All of these natural and human-induced activities together put the upstream Hilsa shad fishery at high risk [2]. As an anadromous fish, Hilsa shad migrates in heterogenic environmental conditions across complex aquatic systems, which primarily differ in terms of salinity, turbidity, and other ecological factors. Fishes fundamentally acclimate to the changing environmental conditions by expressing particular adaptive phenotypes, as short-term responses, while eventually adapting to the prevailing environmental conditions by changing their genetic components in response to the new local selective pressure [4]. Therefore, it is the major concern for the fishery managers and scientists along the Bay of Bengal regions to understand and predict how the Hilsa shad populations respond to the imposing selective pressure that results in a genetic variation among populations.
While Hilsa shad is largely anadromous, it has also been indicated that the fish may follow an amphidromous migratory pattern, because both immature and mature fish often migrate between marine and freshwaters not only for breeding, but also for feeding purposes [2]. In addition to the anadromous type, recent studies have also reported two other ecotypes-a fluvial potamodromous type and a marine type, along the Bay of Bengal regions [5][6][7]. The potamodromous stocks remain in the river systems and complete their life cycle within freshwater habitats, while the marine ecotype inhabits nearshore coastal and sea habitats and relies on downstream estuarine waters for spawning [5][6][7]. Therefore, understanding the degree of genetic erraticism among the different ecotypes is crucial for the sustainable exploitation, conservation, and management of the Hilsa shad population. However, the genetic structure and the divergence patterns of the Hilsa shad populations among the different migratory ecotypes have remained obscure until today.
The discrimination of populations, as management units, needs to be overtly implemented for the sustainable exploitation of the anadromous fish species. However, most of the anadromous fish species are distributed across diverse habitat types, causing natural selection to play an important role in promoting genetic discrimination and local adaptation [8]. As an anadromous fish, Hilsa shad populations also have a very weak genetic discrimination across their migratory habitats due to their high rates of gene flow, highly connected divergent population, and large effective population size [8][9][10]. Moreover, population boundaries are often elusive in Hilsa shad, and a lack of obvious stock boundaries often confounds the discrimination of spatial management units and the degree of connectivity among different populations [11,12]. Therefore, it is necessary to investigate the genetic structure of Hilsa shad by a combined analysis of both neutral and adaptive loci [10,13]. However, only neutral genetic markers have been extensively used in the Hilsa shad in recent decades. The limited number of neutral markers used, and their insufficient power to discriminate fine-scale genetic structures due to the recent divergence in their large population size, has caused the results concerning the structuring pattern of Hilsa shad populations to be inconclusive [14][15][16][17][18][19]. Moreover, assessing the fish population structure by the simultaneous use of both neutral and adaptive markers is yet to be common along the Bay of Bengal regions [20,21].
The recent development of next-generation sequencing (NGS) techniques has expanded the prospects of providing insights into the molecular basis of local adaptation for fish with a low genetic differentiation. This technique allows for the genotyping of thousands of genome-wide single nucleotide polymorphism (SNP) loci for ecologically important fish, without any reference genomes, to discover a large number of genetic markers [22]. The discovery of a large number of genetic markers not only affords a higher statistical power in discriminating the populations, but also facilitates the exploration of the putative non-neutral signatures regulating the underlying adaptive traits [23,24], induced either by natural [25] or anthropogenic factors [26,27]. Moreover, they provide a better solution, where traditional neutral markers fail, for distinguishing genetically weak structured populations with clear population boundaries [28][29][30][31][32]. At present, restriction site-associated DNA sequencing (RAD-seq) is widely used to genotype thousands of SNPs in multiple individuals of anadromous and other marine fish species at a relatively low cost [33,34]. Moreover, the NextRAD method has also recently gained popularity in overcoming the restriction fragment length bias of RAD-seq using Nextera library preparation and selective PCR primers to consistently amplify genome-wide loci between samples [35]. Various outlier tests have proven to be a popular means for the discovery of the putatively adaptive markers of highly dispersive anadromous fish species [8,12]. As Hilsa shad have a very weak genetic discrimination across their heterogenous migratory habitats, it represents an ideal candidate for identifying a putatively adaptive panel of SNP loci, generated by a genome scan using the NextRAD method and outlier tests.
The degree of genetic discrimination among the Hilsa shad populations, available in different migratory routes, has remained obscure until today. Therefore, the aims of the present study are to discriminate the fine-scale genetic divergence of Hilsa shad populations and to reconcile previous inconclusive findings on the population genetic structure across their diverse migratory habitats. In this study, we employed the NextRAD sequencing technique to genotype 15,453 SNP loci for 300 individuals of Hilsa shad, collected from nine strategic habitats, including sea, estuary, and different rivers in Bangladesh. The different spatial analyses, using the entire 15,453 SNP dataset, revealed a very weak genetic discrimination among the different population of Hilsa shad. Therefore, we applied both the F ST OutFLANK and pcadapt approaches to ascertain a set of neutral and adaptive genetic markers to better understand whether there is a neutral and/or adaptive genetic variation associated with their distribution in heterogenic environments. Finally, we conducted a homology search to align the NextRAD sequence tags of the adaptive loci to the known genes in public databases in order to elucidate the homologous functions of these genes, which might be associated with the divergence local adaptation of Hilsa shad.

Habitats of T. ilisha in the Complex Ecosystem of Bangladesh
In the geographical region of Bangladesh, Hilsa shad mainly inhabits the Bay of Bengal (sea), including the lower regions of the estuary (Meghna estuary), but it migrates to the upstream rivers, mainly the Ganges-Brahmaputra-Meghna (GBM) river systems, during the spawning season and returns to its original habitat, after spawning, in adulthood. Jamuna River (also known as the Brahmaputra) joins the Padma River (also known as the Ganges) in Goaland, in the Rajbari district of Bangladesh, and the combined flow of the Padma and Jamuna Rivers flow downwards, forming the Padma River. The Meghna, coming from the northeast region of the country, joins with the Padma further downstream in Chandpur and flows downwards to the south, forming the Meghna River, which finally meets the Bay of Bengal through the formation of a large estuary, named the Meghna Estuary ( Figure 1). According to the geography of Bangladesh, the Padma-Jamuna Rivers are considered to be the northwestern rivers, which together discharge about 1 billion tons of sediment annually, placing the Padma-Jamuna in the top three rivers in the world [36]. Therefore, the western rivers are considered to be turbid rivers, with suspended sediment concentrations of about 190-1600 mg/l in the Padma According to the geography of Bangladesh, the Padma-Jamuna Rivers are considered to be the northwestern rivers, which together discharge about 1 billion tons of sediment annually, placing the Padma-Jamuna in the top three rivers in the world [36]. Therefore, the western rivers are considered to be turbid rivers, with suspended sediment concentrations of about 190-1600 mg/l in the Padma and 220-1400 mg/l in the Jamuna [37]. The Meghna River is formed inside Bangladesh, in the Kishoreganj District, by the joining of the Surma and the Kushiyara Rivers, both of which originate in the hilly regions of Eastern India. Various tributaries of the Surma and Kushiyara flow through a vast natural depression, called the haor basin (haor is a wetland ecosystem of 80,000 sq·km in Northeastern Bangladesh, which is surrounded by the hill ranges of India), where waters settle the sediment loads, before a confluence, named the Meghna. Due to these characteristics, the Meghna River and its tributaries are considered to be clear water rivers, and their suspended sediment concentrations are about 20-750 mg/l [37], but the higher range is only observed occasionally during periods of heavy rainfall. The salinity varies from 30.3-33.4 ppt in the Bay of Bengal, 2.5 to 25.6 ppt in the Meghna Estuary, and <1.0 ppt in the different freshwater river systems of Bangladesh [38].

Sample Collection of T. ilisha
Three hundred individuals of T. ilisha were collected from nine strategic habitats across the migratory routes of Hilsa shad, including sea, estuary, and major freshwater river systems in Bangladesh ( Figure 1, Table 1). For the present study, 30 individuals were collected from each sampling site, while 60 individuals were collected from the upper Padma River (UPR). Fin clips of Hilsa shad samples were preserved in absolute ethanol, prior to DNA extraction. DNA isolations were performed using the Promega DNA purification system (Promega, Madison, WI, USA), according to the manufacturers' protocol. DNA quantifications were conducted using the real-time PCR fluorescence measurements of double stranded DNA and the Quant-it kit (Life Technologies, Foster City, CA). All samples were collected under a Bangladesh government permit and in accordance with the animal care protocol (CVASU20180828), as approved by the Chattagram Veterinary and Animal Sciences University's Animal Care and Biosafety Committee.

NextRAD Genotyping
In this method, NextRAD genotyping-by-sequencing libraries were prepared at SNPsaurus (SNPsaurus, LLC, Eugene, OR, USA) by fragmenting the genomic DNA [39,40] with the Nextera DNA Flex Library Prep Kit (llumina, San Diego, CA, US), which also ligated short adapter sequences to the ends of the fragments. To adjust the quality of the DNA and increase the fragment size, the Nextera reaction was scaled at 10 ng of genomic DNA, as an input in the fragmentation reaction, although 20 ng of genomic DNA was also used to compensate for the amount of degraded DNA in the samples. The fragmented DNA was then amplified for 25 cycles at 72 • C, with one of the primers matching the adapter and extending 8 nucleotides into the genomic DNA, with the selective sequence, GTGTAGAG. Thus, only fragments starting with a sequence that could be hybridized by the selective sequence of the primer were efficiently amplified. The NextRAD libraries were sequenced on an Illumina HiSeq 4000, with eight lanes of 150 bp reads (University of Oregon, Eugene, OR, USA). The resulting fragments were fixed at the selective end and had different lengths, depending on the initial Nextera fragmentation. Therefore, a careful size selection of the library was not required, as the amplified DNA from a particular locus was present at many different sizes.
The remaining loci were then aligned to each other to identify allelic loci and collapse allelic haplotypes into a single representative. All reads were mapped to the reference, with an alignment identity threshold of 0.95, using bbmap (BBMap tools). Genotype calling was conducted using callvariants (BBMap tools) (callvariants.shlist=ref_shadupsplit_rm.txt.align_samplesout =shadupsplit_total.vcfref=h_GCA_004329175.1_ BAU_Tilisha_2.1_genomic.fna ploidy=2 multisample=t rarity=0.05 minallelefraction=0.05 usebias=f ow=t nopassdot=f minedistmax=5 minedist=5 minavgmapq=15 minreadmapq=15 minstrandratio=0.0 strandedcov=t). The vcf was filtered to remove alleles with a population frequency of <3%. Loci that were heterozygous in all samples or had more than 2 alleles in a sample (suggesting collapsed paralogs) were removed. The absence of artifacts was checked by counting the SNPs at each read nucleotide position and determining that the SNP number did not increase with the reduction of the base quality at the end of the read.
The original shadstringent.vcf file contained data for 26,718 SNPs loci, existing within a catalog of 92,721 consensus NextRAD tagged sequences of 150 bases each. A column containing a unique ID for each SNP locus was added to the unfiltered vcf file, using a custom perl script to remove the redundancy among loci. Further filtering steps included the removal of complex SNPs with more than two alleles, an overall minor allele frequency of less than 5%, and a completeness of data among samples of less than 80%. Likewise, samples containing a completeness of data of less than 80% among the remaining loci, a minimum quality score lower than 30, and a mean depth per genotype lower than 20 were removed from the dataset. Additionally, where there were multiple SNPs located in a single NextRAD sequence tag, only the first cataloged SNP was kept in the dataset so as to avoid the possibility of a single NextRAD locus having a disproportionate effect on the analyses. The unique IDs for the SNP loci that passed quality control standards were transferred to a whitelist, which was used to filter the original vcf file using a custom perl script. After all of the filtering steps were complete, a total of 15,453 individual SNP loci remained in the dataset.

Clustering Analyses
The initial clustering analyses, through several spatial structuring programs using all 15,453 putative SNPs loci, revealed extremely low to no genetic structuring among all collected individuals from different environments. Therefore, we identified outlier SNPs using the adegenet v2.0.1 R package [41] and two different approaches, OutFLANK [42] and pcadapt version 3.0.2 [43,44]. OUTFLANK calculates the neutral distribution of Fst values and then uses this distribution to assign q-values to each locus to detect adaptive loci that are putatively influenced by selection [42]. On the other hand, pcadapt performs a principal component analysis and computes the p-values of each locus to detect adaptive loci [43,44]. Default parameters were used for OUTFLANK and pcadapt analysis, and the "number_of_samples" parameter was set to 9 (a number equal to the sampled collections). To control the false positive, we set the F ST threshold value to 0.05 in the OUTFLANK approach and the false discovery rate (FDR) threshold value to 0.05 in the pcadapt approach. A neutral loci dataset and an adaptive loci dataset were created from the output of these two approaches for all downstream analyses. The GenoDive version 3.0 (Universiteit van Amsterdam, Amsterdam, The Netherlands) was used to conduct analysis of molecular variance (AMOVA) analyses on both datasets [45]. The GenoDive was also used for significance testing of pairwise F ST to determine the genetic differences between collection sites for the neutral and outlier datasets using the default settings, with the samples grouped by collection site. An isolation by distance analysis, using the outlier dataset, was conducted in the "adegenet" R package, using a pairwise distance matrix for all collection sites. Significance testing for the isolation by distance analysis was conducted through a Mantel test, using an Edward's genetic distance matrix and a physical distance matrix between the collection sites, with 9999 iterations (also using the "adegenet" package). In addition, each adaptive locus was subjected to a BLASTn search of all sequences in the NCBI non-redundant database (word size = 11; mismatch scores = 2 and −3; and maximum e-value = 1 × 10 −5 ). Principal component analysis (PCA) was conducted using the adegenet R package (Universite´de Lyon, UMR 5558, France) and both the neutral and outlier datasets. The Bayesian clustering method, implemented in the STRUCTURE software v. 2.3.4 (Standford University, Stanford, CA, USA), was used to genetically assign individuals to clusters [46]. Simulations were run for 100,000 steps, following a burn-in period of 100,000 steps, considering values of K (number of clusters) from one to 15, with 10 replications for each value of K. The analysis was performed using an admixture, correlated allele frequencies, and no prior information on the sampling location or morphological species. For each individual, the program identifies the fraction of the genome that belongs to each one of the clusters. The rate of change in the log likelihood between successive K values was also estimated [47]. The calculations were performed using STRUCTURE HARVESTER [48]. The clusters of the estimated population structure were visualized using CLUMPAK [49]. Clustering analysis, implemented with the discriminate analysis of principal components (DAPCs) and the outlier dataset, was also conducted, using "adegenet" to reveal possible genetic clustering among samples, without grouping by collection site.

Data Accessibility
Our raw data for 300 individuals were submitted to the DDBJ (https://ddbj.nig.ac.jp), with the DRA accession number: DRA009185. The data is publicly available from 24 December 2019.

NextRAD Sequencing and Outlier Analysis
In this study, we sequenced 100,000 loci at a depth of 20× using the NextRAD-genotyping technique, implemented on an Illumina HiSeq 4000 platform, for each individual of 300 T. ilisha fish, which, on average, gave two million reads of 150 bp per individual. The sequence analysis revealed that all of the 300 individual fish had a completeness of the genotyping data of more than 80%; therefore, all of the T. ilisha samples were considered for further downstream analyses. Out of total 46,307 SNP loci, within a catalog of 94,738 consensus sequences, only 26,718 SNP loci remained, after the indels and SNP sites with a minor allele frequency of less than 5% had been removed from the original dataset. Further quality filtering by variant calling analysis resulted in 15,453 putative SNPs loci from all 300 individuals, and these were finally used for the fine-scale population genetic structuring of T. ilisha.
The initial clustering analyses using the 15,453 putative SNPs loci revealed very little genetic differentiation among different collection sites. Therefore, we applied both the F ST OutFLANK and pcadapt approaches to identify the putative adaptive loci of T. ilisha under local selection pressure. Out of the 15,453 polymorphic SNPs loci, 74 SNPs loci were identified as outliers and putatively under positive selection by F ST OutFLANK, while 449 SNPs loci were detected as candidate outlier loci by the pcadapt approach ( Figure 2). Among this putative panel of SNPs loci, 49 loci were identified as being common between the two approaches ( Figure 3). The remaining 15,379 and 15,004 SNPs were considered as putatively neutral loci by the F ST OutFLANK and pcadapt approaches, respectively.

Demographic Inferences from FST Statistics and AMOVA Analysis
A pairwise FST comparison showed distinct values when only the outlier or neutral SNPs were used ( Table 2). An apparent difference was also observed between the pairwise FST comparisons for the outlier SNPs, identified by both the FST OutFLANK and pcadapt approaches. FST estimation, based on putatively neutral SNPs, was low but significant, ranging between 0.001 and 0.004 (p ≤ 0.001). As expected, the outlier SNPs detected by the FST OutFLANK and pcadapt approaches showed higher significant FST values than the neutral SNPs, with a range of 0.020 to 0.327 (p ≤ 0.001) and 0.005 to 0.073 (p ≤ 0.001), respectively. In both approaches, higher FST values were observed between the Meghna estuary (ME) and upper Jamuna river (UJR) (0.327, p ≤ 0.001; 0.073, p ≤ 0.001) and deep sea (DS) and UJR (0.315, p ≤ 0.001; 0.070, p ≤ 0.001) collection sites. As anticipated, non-significant and lower FST values were observed between the collection sites from the two locations of sea, the sea and estuary, and the lower and upper portion of the same rivers and/or their tributaries.

Demographic Inferences from FST Statistics and AMOVA Analysis
A pairwise FST comparison showed distinct values when only the outlier or neutral SNPs were used ( Table 2). An apparent difference was also observed between the pairwise FST comparisons for the outlier SNPs, identified by both the FST OutFLANK and pcadapt approaches. FST estimation, based on putatively neutral SNPs, was low but significant, ranging between 0.001 and 0.004 (p ≤ 0.001). As expected, the outlier SNPs detected by the FST OutFLANK and pcadapt approaches showed higher significant FST values than the neutral SNPs, with a range of 0.020 to 0.327 (p ≤ 0.001) and 0.005 to 0.073 (p ≤ 0.001), respectively. In both approaches, higher FST values were observed between the Meghna estuary (ME) and upper Jamuna river (UJR) (0.327, p ≤ 0.001; 0.073, p ≤ 0.001) and deep sea (DS) and UJR (0.315, p ≤ 0.001; 0.070, p ≤ 0.001) collection sites. As anticipated, non-significant and lower FST values were observed between the collection sites from the two locations of sea, the sea and estuary, and the lower and upper portion of the same rivers and/or their tributaries.

Demographic Inferences from F ST Statistics and AMOVA Analysis
A pairwise F ST comparison showed distinct values when only the outlier or neutral SNPs were used ( Table 2). An apparent difference was also observed between the pairwise F ST comparisons for the outlier SNPs, identified by both the F ST OutFLANK and pcadapt approaches. F ST estimation, based on putatively neutral SNPs, was low but significant, ranging between 0.001 and 0.004 (p ≤ 0.001). As expected, the outlier SNPs detected by the F ST OutFLANK and pcadapt approaches showed higher significant F ST values than the neutral SNPs, with a range of 0.020 to 0.327 (p ≤ 0.001) and 0.005 to 0.073 (p ≤ 0.001), respectively. In both approaches, higher F ST values were observed between the Meghna estuary (ME) and upper Jamuna river (UJR) (0.327, p ≤ 0.001; 0.073, p ≤ 0.001) and deep sea (DS) and UJR (0.315, p ≤ 0.001; 0.070, p ≤ 0.001) collection sites. As anticipated, non-significant and lower F ST values were observed between the collection sites from the two locations of sea, the sea and estuary, and the lower and upper portion of the same rivers and/or their tributaries.  For the putative adaptive panel of SNP loci, the AMOVA analysis yielded overall F ST values of 0.097 and 0.015 for the F ST OutFLANK and pcadapt approaches (p < 0.001), respectively (Table 3). Interestingly, hierarchical AMOVA, based on the adaptive loci identified by the F ST OutFLANK and pcadapt approaches, revealed a high divergence between individuals (103.1% and 101.7%) but no differentiation among individuals (−12.8% and −3.2%). A substantially low but significant divergence (p < 0.001) was observed (9.7% and 1.5%) among the different collection sites of Hilsa shad. Similarly, an overall F ST value of 0.002 was observed in the putative panel of neutral SNP loci identified by both approaches (p < 0.001). Consistent with the putatively adaptive loci, the AMOVA analysis of the neutral loci for both approaches demonstrated a 100.5% variation between individuals, no variation (−0.7%) among individuals, and a substantially low (0.2%) but significant variation (p < 0.001) among the different collection sites of Hilsa shad (Table 3).

Genetic Structure and Recruitment Assignment Based on Different Spatial Analyses
We have also observed substantial differences in genetic structure patterns, inferred from several spatial analyses based on the outlier and neutral SNPs datasets. The principal components analysis (PCA) of the outlier SNPs, using both the F ST OutFLANK and pcadapt approaches, showed two well-separated clusters, A and B. Sea and estuarine (salty and brackish water) collection sites were dominated by individuals of cluster A, while different riverine (freshwater) collection sites were dominated by individuals of Hilsa shad of cluster B (Figure 4). However, no clustering was found from the PCA analysis of the neutral panel of the SNP datasets.

Genetic Structure and Recruitment Assignment Based on Different Spatial Analyses
We have also observed substantial differences in genetic structure patterns, inferred from several spatial analyses based on the outlier and neutral SNPs datasets. The principal components analysis (PCA) of the outlier SNPs, using both the FST OutFLANK and pcadapt approaches, showed two well-separated clusters, A and B. Sea and estuarine (salty and brackish water) collection sites were dominated by individuals of cluster A, while different riverine (freshwater) collection sites were dominated by individuals of Hilsa shad of cluster B (Figure 4). However, no clustering was found from the PCA analysis of the neutral panel of the SNP datasets. Similarly, the discriminant analysis of principal components (DAPCs) scatterplot on the putatively adaptive SNPs identified by the pcadapt approach also revealed two genetic clusters ( Figure 5B). However, DAPCs analysis for the putatively adaptive SNPs identified by the FST Similarly, the discriminant analysis of principal components (DAPCs) scatterplot on the putatively adaptive SNPs identified by the pcadapt approach also revealed two genetic clusters ( Figure 5B). However, DAPCs analysis for the putatively adaptive SNPs identified by the F ST OutFLANK approach demonstrated nine clusters at a finer-scale level under two major clusters, as revealed for the outlier datasets identified by the pcadapt approach ( Figure 5A). Furthermore, individual assignment data of Hilsa shad by DAPCs analysis for the putatively adaptive SNPs identified by the pcadapt approach were presented in percentages of the two DAPCs clusters according to their different collection sites ( Figure 6). Mixed ancestry patterns of the Hilsa shad individuals were detected in the DAPCs grouping of each collection site, except for the upper Jamuna river (UJR). Consistent with the previous results, DAPCs grouping also demonstrated that marine and brackish water collection sites (DS, SS, and ME) were mostly dominated by the individuals of Group 1, while different riverine collection sites (MR, SKR, LJR, LPR, and UPR) were mostly dominated by the individuals of the Group 2 ( Figure 6). OutFLANK approach demonstrated nine clusters at a finer-scale level under two major clusters, as revealed for the outlier datasets identified by the pcadapt approach ( Figure 5A). Furthermore, individual assignment data of Hilsa shad by DAPCs analysis for the putatively adaptive SNPs identified by the pcadapt approach were presented in percentages of the two DAPCs clusters according to their different collection sites ( Figure 6). Mixed ancestry patterns of the Hilsa shad individuals were detected in the DAPCs grouping of each collection site, except for the upper Jamuna river (UJR). Consistent with the previous results, DAPCs grouping also demonstrated that marine and brackish water collection sites (DS, SS, and ME) were mostly dominated by the individuals of Group 1, while different riverine collection sites (MR, SKR, LJR, LPR, and UPR) were mostly dominated by the individuals of the Group 2 ( Figure 6).   OutFLANK approach demonstrated nine clusters at a finer-scale level under two major clusters, as revealed for the outlier datasets identified by the pcadapt approach ( Figure 5A). Furthermore, individual assignment data of Hilsa shad by DAPCs analysis for the putatively adaptive SNPs identified by the pcadapt approach were presented in percentages of the two DAPCs clusters according to their different collection sites ( Figure 6). Mixed ancestry patterns of the Hilsa shad individuals were detected in the DAPCs grouping of each collection site, except for the upper Jamuna river (UJR). Consistent with the previous results, DAPCs grouping also demonstrated that marine and brackish water collection sites (DS, SS, and ME) were mostly dominated by the individuals of Group 1, while different riverine collection sites (MR, SKR, LJR, LPR, and UPR) were mostly dominated by the individuals of the Group 2 ( Figure 6).   The multivariate methods were in complete concordance with the results obtained using the Bayesian clustering approach, implemented in STRUCTURE. Our STRUCTURE results, with an optimal value of K = 2, were supported by highest log-likelihood values, where K = 2, which was followed by their exponential decline as the number of groups increased. No evidence of a population structure was shown in the neutral loci datasets, as the genetically homogenous group may suggest that T. ilisha is composed of a single panmictic population ( Figure 7B,D). In contrast, a clear population subdivision, with a very small degree of admixture, was observed in the outlier datasets ( Figure 7A,C).
The multivariate methods were in complete concordance with the results obtained using the Bayesian clustering approach, implemented in STRUCTURE. Our STRUCTURE results, with an optimal value of K = 2, were supported by highest log-likelihood values, where K = 2, which was followed by their exponential decline as the number of groups increased. No evidence of a population structure was shown in the neutral loci datasets, as the genetically homogenous group may suggest that T. ilisha is composed of a single panmictic population ( Figure 7B,D). In contrast, a clear population subdivision, with a very small degree of admixture, was observed in the outlier datasets ( Figure 7A,C). These clustering patterns were further confirmed by neighbor-joining (NJ) trees, reconstructed based on Nei's genetic distances (Figure 8). The NJ trees results consistently revealed that a divergent local adaptation in different migratory habitats divided Hilsa shad population into two genetically structured ecotypes: (1) Marine and estuarine, and (2) riverine groups. Interestingly, the riverine T. ilisha collection sites were further separated into two sub-clusters: The north-western riverine (turbid freshwater) and the north-eastern riverine (clear freshwater) ecotypes. These clustering patterns were further confirmed by neighbor-joining (NJ) trees, reconstructed based on Nei's genetic distances (Figure 8). The NJ trees results consistently revealed that a divergent local adaptation in different migratory habitats divided Hilsa shad population into two genetically structured ecotypes: (1) Marine and estuarine, and (2) riverine groups. Interestingly, the riverine T. ilisha collection sites were further separated into two sub-clusters: The north-western riverine (turbid freshwater) and the north-eastern riverine (clear freshwater) ecotypes.
A Mantel test, based on the outlier loci datasets, identified by both F ST OutFLANK and pcadapt approaches, revealed a strong and significant relationship between the genetic and geographic distance, with R 2 = 0.4871 (p < 0.01) and R 2 = 0.4369 (p < 0.01), respectively. Analyses of the neutral dataset indicated a high genetic connectivity and weak differentiation between the populations by geographical distance (Figure 9). A Mantel test, based on the outlier loci datasets, identified by both FST OutFLANK and pcadapt approaches, revealed a strong and significant relationship between the genetic and geographic distance, with R 2 = 0.4871 (p < 0.01) and R 2 = 0.4369 (p < 0.01), respectively. Analyses of the neutral dataset indicated a high genetic connectivity and weak differentiation between the populations by geographical distance (Figure 9).

Gene Function of the Outlier Loci
The BLAST search of the SNP flanking sequences showed that 35 out of 449 (7.8%) outlier loci identified by the pcadapt approach and 11 out of 74 (13.9%) outlier loci identified by the FST OutFLANK approach were observed to be in the protein coding region, and their gene functions are depicted in Table 4. Among these putatively adaptive protein coding loci, five loci (SCED01086260.1:268, SCED01010228.1:165, SCED01066042.1:124, SCED01005892.1:34793, and SCED01010418.1:19123) that encode the genes (nrxn3b, TRIM67, Grik2, GABBR1, and Herc1) are mostly involved in the different neuronal activity, and important for neural communication and responsible to control a range of behavioral phenotypes ( were found to encodes genes which are mostly responsible for the metabolic process, particularly in glucose, glycogen, and lipid metabolism. Another two outlier loci (SCED01000696.1: 30199 and SCED01008207.1: 36816) are associated with two genes, namely, KDM6A and UBR3, which play a direct role in growth and embryonic development processes. Other loci encode the genes mainly involved in different biological processes, such as cell regulation and biosynthesis, membrane transport, mitotic cell development, metabolic process, signal transmission, and other functions ( Table 4). The unknown function of other outlier loci did not meet the significant alignment with publicly available sequences.

Gene Function of the Outlier Loci
The BLAST search of the SNP flanking sequences showed that 35 out of 449 (7.8%) outlier loci identified by the pcadapt approach and 11 out of 74 (13.9%) outlier loci identified by the F ST OutFLANK approach were observed to be in the protein coding region, and their gene functions are depicted in Table 4. Among these putatively adaptive protein coding loci, five loci (SCED01086260.1:268, SCED01010228.1:165, SCED01066042.1:124, SCED01005892.1:34793, and SCED01010418.1:19123) that encode the genes (nrxn3b, TRIM67, Grik2, GABBR1, and Herc1) are mostly involved in the different neuronal activity, and important for neural communication and responsible to control a range of behavioral phenotypes ( Table 4). The majority of the outlier loci (SCED01072826.1:68, SCED01077141.1:127, SCED01094462.1:170, SCED01001444.1:144867, SCED01011615.1:5090, SCED01010819.1:27239, SCED01000975.1:25910, SCED01000852.1:137973) were found to encodes genes which are mostly responsible for the metabolic process, particularly in glucose, glycogen, and lipid metabolism. Another two outlier loci (SCED01000696.1:30199 and SCED01008207.1:36816) are associated with two genes, namely, KDM6A and UBR3, which play a direct role in growth and embryonic development processes. Other loci encode the genes mainly involved in different biological processes, such as cell regulation and biosynthesis, membrane transport, mitotic cell development, metabolic process, signal transmission, and other functions ( Table 4). The unknown function of other outlier loci did not meet the significant alignment with publicly available sequences.

Discussion
Revealing the spatial and temporal dispersal of fish stocks is important for the systematic monitoring and management of fish populations. This study applied NextRAD-based data to assign Hilsa shad individuals to their respective origin populations to explore the variation in stock components and catch compositions. Loci with a high resolving power and potentially important local adaptation to the environment were detected and validated. We observed a remarkable differentiation between the individuals of marine and estuarine collection sites in the south and freshwater riverine collection sites in the north of Bangladesh, with a slight mixing between them, confirming the general validity of the stock designations. In contrast to traditional markers for detecting genetic differentiation, the genomic approach identifies the traces of selection that shape population divergence [50]. Genetic differentiation, under selective force, is not unprecedented, given that analysis based on markers under selection provides more structuring, compared to neutral markers [29,51]. This type of analysis is more promising for populations with a large effective size and minimal genetic drift effect [52], where a high level of gene flow may not be efficient to dilute the differentiation effects driven by adaptive markers [30]. The higher level of assignment success of NextRAD sequencing is likely due to larger number of loci and also outlier loci with a high discriminatory power.
We observed a contrasting magnitude and contrasting patterns of genetic variation in both neutral and outlier loci across multivariate and individual-based analyses (PCA, DAPCs, STRUCTURE, AMOVA, and NJ phylogenetic). The neutral set of markers showed an extremely low to no genetic structuring between all collection sites individuals from the different environments. Pairwise F ST estimates, based on the neutral dataset, indicates that a large fraction of the genome is undifferentiated among the collection sites, with 19.4% of estimates being <0.001 (p ≤ 0.001) ( Table 2). As expected, it is not unpredicted to observe a panmictic structure in all the other analyses, when the entire dataset is considered. In contrast, our analyses based on the outlier dataset provided evidence of a significant genetic differentiation between the populations of the freshwater and marine environments. Our isolation by distance (IBD) results also suggested that the genetic structure demonstrates the effect of dispersion and selection for the outlier dataset, and these factors should be considered when delineating conservation and management units [50]. The outlier loci suggest adaptive responses to potential environmental factors, such as salinity, which vary on regional scales, while neutral loci are influenced by genetic drift [52]. Given that the divergences revealed by the outlier dataset are discordant with the neutral loci, these loci should be driven by the effects of selection and not by the isolation alone [53].
The lack of genetic structure in the neutral loci indicates that all Hilsa shad individuals from different collection sites are connected via high levels of gene flow between the marine water and freshwater environments, which is consistent with previous studies using allozyme [14,15], restriction fragment length polymorphism (RFLP) [16,17], random amplification of polymorphic DNA (RAPD) markers [54][55][56][57], and mitochondrial DNA cytochrome b gene nucleotide sequencing [18,19]. The genetic divergence depicted by the outlier loci dataset suggests that dispersion may likely happen in the foraging ground in marine and brackish water habitats, allowing for a substantial gene flow among distant locations. The shallow sea (SS), deep sea (DS) and Meghna estuary (ME) populations are genetically homogenized, based on clustering patterns in multivariate (PCA, DAPCs) and individual analyses (STRUCTURE, NJ phylogenetics). It is also possible that physical barriers between the different environments are semipermeable, allowing for constraint dispersal, while the environmental contrasts between these habitats reinforce those barriers through selection [53]. The expansions of founder populations may also occur after isolation, which further induces adaptation, if colonizing individuals carry beneficial mutated genes [58].
Our results concerning the outlier loci also indicate that the local conditions are sufficiently different to prevent the genetic homogenization of the Hilsa shad individuals collected from the different geographical regions through the selection process [10]. We observed that the Hilsa shad from the deep sea (DS) and upper Jamuna river (UJR) collection sites in the outlier loci dataset, as detected by F ST OutFLANK (F ST = 0.315, p ≤ 0.001), was an order magnitude greater than those in the putatively neutral loci dataset (F ST = 0.003, p ≤ 0.001). While it is difficult to distinguish between differentiations caused by selection or drift processes [53], it is noteworthy that these processes are mutually inclusive and likely act in concert on populations found around the region, given its complex anadromous life cycle. The presence of two major clusters of Hilsa shad in Bangladesh was substantiated by these analyses of the outlier datasets, which could not be revealed in previous studies utilizing mainly neutral markers. In this study, different cluster analyses revealed that Hilsa shad individuals collected from nine geographical locations are divided into two genetic clusters, in which one cluster is dominated by the marine (deep sea, DS; shallow sea, SS) and estuarine (Meghna estuary, ME) water individuals and another cluster is dominated by the different freshwater riverine (MR, Meghna river; SKR, Surma-Kushiara river; LPR, lower Padma river; UPR, upper Padma river; LJR, lower Jamuna river; UJR, upper Jamuna river) individuals. The distinction between these two major clusters indicates that the gene flow among them is restricted due to a limited dispersal and selection processes, which operate at various scales throughout the heterogeneous environment [53] and likely influences their divergence. More elaborately, such discrimination of Hilsa shad individuals from different geographical locations divided into two genetic clusters can be interpreted by two different ways. Firstly, the genetic discrimination may correspond to the alternative multi-locus genotypes that are maintained by strong divergent selection pressures in a spatially heterogeneous environment in which many factors could impose selection pressures, and those selection pressures could act at one or more life history stages [53]. Secondly, the observed two genetic clusters may correspond to the intense spawning site fidelity for two geographically isolated spawning grounds, one in the freshwater habitats and another one in the brackish water estuary as reported by the previous studies [5][6][7].
The STRUCTURE analysis of the outlier loci demonstrated that individuals from the various locations were assigned either entirely to one subpopulation or entirely to the other subpopulation, indicating that the admixture between these two genetically distinct ecotypes is very limited and may also be reproductively isolated from each other (see the Figure 7). The Hilsa shad individuals collected from the upper stream sites, such as UJR, is highly differentiated from those of the marine individuals collected from DS, with a low possibility that the adult Hilsa spawns in the freshwater environment of the north of Bangladesh and migrates to the deep sea. In our previous study, we showed that the juveniles of Hilsa shad (locally known as Jatka) of a particular habitat return to their respective natal rivers (the rivers where they were born/nursed) for spawning as adults [59]. It is also possible that the marine subtype only inhabits the marine environment and relies on the estuarine waters in the Meghna estuary (ME) for spawning, without migrating to the freshwater system [5][6][7]. Another possibility is that the freshwater fish, also known as potamodromous subtypes, rely on the freshwater rivers for spawning and remain and complete their life cycle within the freshwater, without migrating to the sea [5][6][7]. However, according to the pairwise F ST values (Table 2), SS is likely the main foraging ground and acts as the main hub for gene flow exchange. This is evident from the relatively homogenized genetic structure between the Hilsa shad from the SS collection site with other freshwater collection sites in the lower part of the river system (MR, SKR, LJR, LPR), with an FST range of 0.050 (p ≤ 0.01) to 0.127 (p ≤ 0.001). Accordingly, the Hilsa shad individuals from similar tributaries share similar structuring patterns, which evidently supports the isolation by distance (IBD) concept and the ecologically adaptive behavior of the Hilsa shad. This species spends most of its lifetime in the shallow sea (SS), becomes sexually mature, and migrates to its respective natal spawning grounds in the upper river systems [60]. This migration behavior is compatible with the movements of Hilsa shad individuals at different stages between reproductive areas (LJR, LPR, MR, UJR, UPR, and SKR) and foraging grounds (ME, SS, and DS), which essentially shapes the population structure and is involved in the adaptive divergence of Hilsa shad, like those in other migratory fish species [61]. It is most likely that the selective constraints and local adaptation in the natal spawning areas overcame the genetically homogenizing effects of the neutral loci due to the sufficient gene flow exchange between the populations.
The NJ phylogenetic trees, based on the outlier loci detected by the two approaches, further subdivided the major riverine Hilsa shad clusters into two discrete sub-clusters: The north-eastern and north-western riverine groups. Therefore, the such structuring into two sub-clusters, which mirrors the spatial structuring that is maintained by environmental differences, such as the turbidity and salinity levels in the marine or brackish, muddy/turbid freshwater (north-western riverine) and clear freshwater (north-eastern riverine) habitats. Due to the effect of the different environmental conditions, the fishes of the north-western riverine habitats are of a bright silvery color, with a thicker structure, while the fishes of the north-eastern riverine are slimmer, a bit darker and elongated [62]. Previous meristic and morphometric studies have also revealed some morphological differences among the Hilsa stocks from different environments [63]. Owing to the distinctive differences, the Hilsa stocks were divided into the north-western riverine's "broad type" and the north-eastern riverine's "slender type" [64,65]. The differential seasonal spawning and fecundity level displayed by these morphotypes-the "broad type" during the late monsoon and "slender type" during the monsoon-may have also contributed to the limited natal and breeding dispersal, leading to the divergence of the freshwater Hilsa shad populations [66]. Besides the environmental differences among the collection sites, the clustering pattern of north-eastern and north-western riverine groups may also be related to the spatial autocorrelation of genotype frequencies and observed the IBD pattern of the Hilsa shad individuals [67,68].
The ability to detect a population divergence at the above scales for an anadromous species, like the Hilsa shad, may also depend on the number of outlier loci used. In this study, we employed two independent detection methods for the outliers, notably the F ST OutFLANK and pcadapt approaches, which successfully identified 74 and 449 SNPs, respectively ( Figure 2). While utilizing multiple methods for outlier loci detection, followed by intersecting the results, limits the discovery of candidate adaptive markers, which represent a more conservative approach that reduces Type I errors due to false positives [69]. All of the above-mentioned multivariate and individual-based analyses have shown that a reduction in the number of outlier loci greatly increased the signal of divergence, which divided the Hilsa shad individual from different collection sites into two ecotypes. Notably, the F ST OutFLANK dataset outperformed pcadapt in defining the clustering pattern, with a greater confidence level. The identification of 474 putatively adaptive loci using these two different outlier tests supports the notion that they are under selective differentiation across the sampled locations, although some of the observed patterns may still be affected by IBD [67,68]. Given the migratory nature of Hilsa shad, as an anadromous species, this clearer separation substantiated the notion that the genetic divergences between the Hilsa populations are caused by both IBD and dispersion processes.
The outlier loci also provided a selection of genomic regions associated with the ecotype divergence over recent temporal scales [67]. Of the 74 and 449 SNPs detected by the F ST OutFLANK and pcadapt approaches, respectively, only 35 and 11 SNPs were successfully annotated through BLAST analysis, while a total of 10 outlier loci with known functions were shared by both approaches ( Figure 3B). Only 6.8% of the outliers detected by both methods were mapped to the known sequence in the public database, which; however, demonstrated the limitation of RAD sequencing in identifying candidate genes without a reference genome [70]. Previous studies have also reported that local adaptation in fish populations is related to the presence of linked alleles, associated with life history traits [71], which is important in defining the structuring pattern of Hilsa shad. By focusing on genes related to environmental conditions, we found that about five prominent genes (nrxn3b, TRIM67, Grik2, GABBR1, and Herc1) involved in the genetic differences are those related to neuronal development ( Table 4). The potential role of the neuron system suggests that adaptive differences may be behavioral in nature. In support of this, nrxn3b, TRIM67, Grik2, GABBR1, and Herc1 are thought to be important for spatial memory, muscle function, and sensory motor development in vertebrates [72][73][74]. We hypothesize that the learning and memory storage ability of Hilsa shad may assist in their migratory behavior by allowing them to recognize their natal spawning ground for breeding, when they become sexually mature. It also seems to be highly possible that habitat differences promote adaptive divergence among Hilsa populations, particularly at early developmental stages. Our BLAST results show that the outlier loci (SCED01000696.1:30199 and SCED01008207.1:36816) are associated with two genes, namely, KDM6A and UBR3, which play a direct role in growth and embryonic development processes. These genes are regulated by various environmental factors, including salinity, oxygen, temperature, latitude, depth and habitat, during the development of an animal species [39]. Furthermore, many genes (SCED01077141.1:127, SCED01094462.1:170, SCED01011615.1:5090, SCED01000975.1:25910, etc.) that encode putatively adaptive loci are involved in metabolism (particularly in glucose, glycogen, and lipid metabolism) to provide sufficient energy during migration. The identified set of genes and multiple processes that are significantly associated with the migratory behavior of Hilsa in this study will be extremely useful in generating hypotheses in future targeted research into unraveling the mechanisms of selection.

Conclusions
The present study has provided an unprecedented solution to the stock structure of anadromous Hilsa shad populations across nationwide geographical gradients to obtain their special distribution pattern in Bangladesh. This study has also identified isolation by distance patterns, and the NextRAD sequencing has revealed a highly significant discrimination of Hilsa shad populations. Different spatial analyses subdivided Hilsa shad individuals collected from nine strategic geographical locations into two main clusters, one cluster was mostly dominated by the southern marine-estuarine individuals, while another one was mostly dominated by the northern riverine groups, although somewhat heterogenous mixture of both genetic stocks were evident in most of the collection sites due to their anadromous natures. Moreover, our results have confirmed that the northern riverine Hilsa shad population is further sub-clustered into the north-eastern (clear freshwater) and north-western (turbid freshwater) riverine groups. Besides the environmental differences, these genetic discrimination of Hilsa shad population might also be related to spatial autocorrelation of genotype frequencies and the IBD pattern. Our study provides both a neutral and adaptive context for the observed genetic divergence of Hilsa shad and, consequently, reconciles the previous inconclusive findings on their population genetic structure across their diverse migratory habitats. In the context of fisheries management, our results suggest that it is important to maintain the genetic diversity of Hilsa shad in each habitat, as the fishes are capable of a high level of gene flow within the population but a lower variation across most of their migratory distribution range. Local adaptation due to the imposing selective pressure has a pivotal role in maintaining high frequencies of particular genetic traits in each heterogenic habitat. Therefore, special attention should be paid to the management of the upstream riverine Hilsa shad population, given that the selective pressures that maintain the adaptive genetic variations in these habitats are at a high risk due to increased fishing pressure and adverse climatic conditions. Ultimately, our study has clearly demonstrated that NextRAD sequencing is an innovative approach to exploring how dispersal and local adaptation can shape the genetic divergence of non-model anadromous fish that are found in diverse migratory habitats during their life-history stages.  Acknowledgments: This work was undertaken as a part of the CGIAR Research Program on Fish Agri-Food Systems (FISH). We would like to thank the Chittagong Veterinary and Animal Science University (CVASU) and University Malaysia Terengganu (UMT) for their participation in this collaborative research. Special thanks are due to John Benzie, WorldFish HQ, Penang, Malaysia for his interest in this research and valuable suggestions. We also wish to express our gratitude to the research associates and research assistants of the ECOFISH-BD Project for their efforts in the collection of Hilsa samples. The authors alone are responsible for the opinions expressed in this article.

Conflicts of Interest:
The authors declare no conflicts of interest.