Investigating Genetic Diversity and Genomic Signatures of Hatchery-Induced Evolution in Gilthead Seabream ( Sparus aurata ) Populations

: The identiﬁcation of the genetic basis of domestication in ﬁsh species is of timely importance for the aquaculture industry in order to increase productivity, quality, and the welfare of farmed ﬁsh. The goal of this study is to investigate the largely unknown aquaculture-induced evolution in gilthead seabream, which is one of the most important farmed ﬁsh in the Mediterranean region. We used a panel of 1159 genome-wide SNPs, and genotyped 956 ﬁsh from 23 wild populations of Mediterranean-wide distribution and 362 farmed ﬁsh from ﬁve Greek hatcheries. We assessed the genetic diversity of the sampled populations and contrasted the results of four different approaches of outlier detection methods. We recognized one very strong candidate and two good candidate SNPs with evidence for aquaculture-induced evolution in gilthead seabream. The annotation of these SNPs revealed neighboring genes with biological roles from stress tolerance and disease resistance to sexual maturation that may explain our observations. In conclusion, we demonstrate that the genome of gilthead seabream, despite the fact that the species is often suggested to be in the early stages of the domestication process, shows evidence of aquaculture-induced evolution. We report on a list of genes that may explain our observations and that may be investigated further. We anticipate that our ﬁndings will stimulate additional research with the use of SNP panels of higher density that can elucidate the genomic architecture of domestication in this species of high aquacultural interest.


Introduction
Understanding the genetic basis of domestication is a fascinating and multifaceted topic of relevance to both basic and applied research and is of timely significance to the aquaculture industry. Studies of ancient DNA in animals and plants domesticated thousands of years ago have helped to answer longstanding questions on when, where, and how many times each of these species has been domesticated [1][2][3][4]. Domesticated species also provide excellent evolutionary experiments for studying genotype-phenotype maps [4,5] and for understanding the genetic basis of successful growth in captivity under stressful environmental conditions, which most often involve high growth rates, high population densities, and resistance to pathogens [6,7]. In aquaculture, such knowledge may help with the application of genomic selection practices over traditional breeding programs [8]. Furthermore, assuming a degree of parallel and convergent evolution, for example, as has been observed in plant domestications [9], the study of fish genomic adaptations in captive environments holds the promise of assisting efforts towards the domestication of new fish populations and species. Notably, the vast majority of newly farmed organisms over the last few decades have been aquatic animals, encompassing several different fish species and a variety of crustaceans and mollusks [10,11].
Aquaculture is a relatively young industry with comparably large gaps of knowledge on the genetic basis of domestication for most farmed fish [12]. Unlike most terrestrial animals and plants, fish farming and the breeding of aquatic organisms started relatively recently; thus, most species used in aquaculture are considered to be in the early stages of their domestication process [10,13], although evidence of domestication in fish has been demonstrated as early as within a few generations [7,14,15]. Recently, significant effort has been devoted to whole-genome sequencing of farmed aquatic species and to the production of genetic tools that will allow researchers to recognize genetic variations that are associated with key traits of interest to aquaculture, such as growth, stress, and pathogen resistance [16][17][18]. Nevertheless, given the polygenic nature of most traits, the typical small effective population size of cultured populations, and the early stage of domestication for aquaculture species, the identification of single loci with large effects is most often observed [19,20]. Growing attention is thus being paid to methods of detecting polygenic selection and of identifying intermediate and smaller effect variants or rare alleles in order to prioritize genomic regions for the next phase of breeding programs and to identify the genetic architecture that shapes the desired phenotypes in important aquaculture species [21][22][23][24][25].
The gilthead seabream (Sparus aurata) is amongst the most important farmed fish in the Mediterranean region, with little knowledge on its genetic basis of domestication. The production of farmed gilthead seabream has increased more than seven-fold over the past two decades [26,27]. Its domestication started relatively recently, in the 1980s, and most production is currently derived from intensive farming with Turkey and Greece as the largest producers [28,29]. The first commercial breeding programs for gilthead seabream have been reported almost 20 years ago [30,31]. As of 2015, eight companies had active selective breeding programs of gilthead seabream, four of which were from Greece [32]. Detailed information on these programs is difficult to obtain due to restrictive company policies. Janssen et al. reported that most of these companies had family-based selection programs for one to five generations, and that selected traits involved growth performance, morphology, disease resistance, and feed efficiency [32]. A 5-10% increase in the growth rate per generation has been reported in gilthead seabream that were selected for three generations [33]. Thus, the average selection response in bodyweight at harvest for gilthead seabream is expected to be similar to that of other fish species [34].
The gilthead seabream represents an intriguing case of farmed fish in which domestication may be significant but masked by high levels of inbreeding. Inbreeding depression may be particularly high in farmed gilthead seabream due to its protandric hermaphroditism, which lowers the effective population size of cultured populations, and its mass spawning behavior, which may quickly erode genetic variations due to highly unequal parental contributions [28,[35][36][37]. To date, little evidence exists on the genetic signatures of domestication, if any, in gilthead seabream. Some early comparisons between six wild and five farmed populations across the Mediterranean using 16 allozyme and six microsatellite markers, as well as DNA sequences of a part of the mitochondrial control region, provided evidence of genetic structure in the wild populations and of genetic divergence due to genetic drift in the farmed populations [38]. Šegvić-Bubić et al. combined data from nine microsatellite markers and 19 morphological characteristics to suggest some degree of genetic and morphological differentiation between wild and farmed gilthead seabream populations in the Adriatic Sea region [39]. More recently, the analysis of 1159 single-nucleotide polymorphism (SNP) markers dispersed genome-wide from 23 wild gilthead seabream populations of Mediterranean-wide distribution, has helped to clarify the biogeographic structure of the species [40]. Three distinct genetic clusters of distinct geographic origin were found, namely the Atlantic (ATL), West Mediterranean (WMED) and East Mediterranean clusters, with the latter cluster further subdivided into an Ionian/Adriatic (ION) and an Aegean (AEG) genetic cluster [40]. Effectively using the same set of SNP markers on cultured gilthead seabream from five Greek hatcheries, researchers in [41] identified three genetic clusters with a high degree of differentiation between the cultured and wild populations. The intriguing question thus arises as to whether the hatchery divergence is primarily driven by domestication selection or genetic drift. A scenario of selection should reveal the parallel divergence of the same SNPs between wild and hatchery populations, as seen in Atlantic salmon [42], whereas the divergence should be randomly distributed among SNP loci in the genetic drift scenario. In this study, we aimed to investigate this question by performing a series of genetic outlier tests between the four wild and three cultured previously identified genetic clusters using the same samples as in Maroso et al. and Gkagkavouzis [40,41]. Throughout this work, the terms "cultured" and "farmed" populations or individuals are used interchangeably and have the same meaning.

Sampling and SNP Dataset
A total of 956 wild individuals from 23 different locations and 362 farmed individuals from 5 Greek hatcheries were analyzed ( Table 1). The species is not protected in any of the sampling areas and all samples came from commercially fished animals or were provided by the hatcheries; therefore, no specific permission or approval was required for this study. Tissues (fin clip or muscle) were preserved in 95% ethanol right after sampling. Total DNA was extracted using a column-based kit (Invisorb Spin Tissue Mini Kit, Invitek, STRATEC Biomedical, Birkenfeld, Germany) and a salt precipitation method, using SSTNE buffer, for the samples that failed with the kit.
Multiple ddRAD libraries were prepared, including up to 144 samples each, splitting samples from the same population into different libraries to avoid confounding biases. More details on library preparation and bioinformatic analysis up to SNP calling are described in [40]. Briefly, reads with one or more uncalled bases were filtered out, as well as reads with 11 or more consecutive bases with an average quality score less than 20 (1% error rate). For samples sequenced on multiple lanes, reads were combined into a single file before processing. The Stacks 1.3 [43,44] de novo pipeline was used to cluster reads into consensus tags and call high-quality SNPs. SNPs were filtered out if scored in less than 80% of the analyzed samples and when the minor allele frequency (MAF) was lower than 0.5%. Accordingly, samples were filtered to retain only those genotyped at more than 80% of the markers.

Genetic Diversity and Population Structure
GenAlEx 6.502 [45] was used to calculate expected (H e ) and observed (H o ) heterozygosity as well as the fixation index (F IS ). F ST matrices were calculated with Arlequin 3.5.2.1 [46] using 50,000 permutations to test for significance. To summarize and visualize the genetic relationships among groups: the model-based clustering method, implemented in Structure 2.3.4 [47], was run using different k values and replicates of each k value and using the sampling location as a priori information. The analysis was run with k ranging from one to ten for the wild populations and from one to five for the farmed populations, each repeated five times to allow the evaluation of the likelihood of different simulated numbers of ancestral clusters. Burn in (BI) was set to 50,000 and the number of iterations (IT) to 100,000. Results from different runs were collated and the most likely k values were detected using the Evanno's method, implemented in Structure Harvester [48]. A further Structure run was carried out with the most likely k value, using 100,000 burn-in cycles and 300,000 iterations. A joint analysis of both the wild and the farmed populations was run using the program fastSTRUCTURE v.1.0 [49] for reasons of computational efficiency. Similar to the Structure analysis, fastSTRUCTURE was run with the default parameters for k values ranging from one to ten, with five repetitions for cross-validation (CV). Results were processed with StructureSelector [50] with the optimal k value selected as the one that minimized the CV error. The structure plot was then visualized using Distruct [51] from CLUMPAK [52].

Outlier Analyses
Genome-wide genotyping offers increased power for finding genetic regions potentially under natural selection. These loci can be then used to correlate genetic and phenotypic traits selected in a particular environment. Given that there is no single method that can confidently detect genomic outliers in all experimental designs, e.g., [53], we employed various methods to detect outliers detected under different assumptions and models: two Bayesian approaches implemented in BayeScan 2.1 [54][55][56] and BayeScEnv [57], which, additionally to BayeScan, incorporates environmental information in the form of 'environmental differentiation' and two methods based on the f-statistic, implemented in Arlequin [46] and Lositan [58]. A combination of different methods is strongly advised to obtain reliable information from the data [59]. Outlier detection analyses were carried out with samples grouped into distinct genetic clusters, as suggested by our clustering analysis. Specifically, pairwise comparisons of every wild cluster with every farmed cluster were performed using Arlequin, using a hierarchical island model, and Lositan. To address concerns associated with multiple test comparisons in Arlequin and Lositan, we obtained an FDR estimate for the list of significant SNPs, detected across all pairwise comparisons using the R package "qvalue" [60]. Q-value is a widely used multiple testing criterion that describes the proportion of false positives expected within a set of significant fea-tures [61]. To further minimize the possibility of false positives in Arlequin and Lositan, we considered only SNPs that were found to be significant in at least two of the pairwise comparisons performed for each genetic cluster of farmed fish. This strategy also allowed us to minimize the possibility of population effects in the wild samples. For BayeScan analysis, all the clusters were analyzed in common, whereas in BayeScEnv the wild/farmed origin was set as an environmental parameter in each cluster before their common analysis. All four programs were run with default parameters and outlier panels were defined using a threshold of p-value/FDR < 0.05.

SNP Annotation
In order to investigate the functional effects of the studied SNPs, we relied on sequence similarity local blastn searches using sequenced regions flanking to the SNPs. As a search database for the blastn searches we used the latest genome assembly of the species (GenBank Assembly: GCA_900880675.1 fSpaAur1.1; date of download: 28 April 2021). The maximum permitted e-value was set to 1E-10. Finding a unique and confident blast hit for the vast majority of the SNPs proved trivial, given that in most cases the length of the adjacent sequence information was sufficient (501 bp for 968 of the SNPs and then between 495 bp and 90 bp for the remaining 189 SNPs-mean = 149 bp). In the end, a genomic location was confidently assigned to most of the SNPs (Supplementary File S1).
In addition to their genomic location, for significant SNP outliers we also retrieved information on neighboring genes, which, due to hitchhiking, background selection or both, e.g., [62], may also have a role in the observed patterns of divergent selection for the particular SNP. Some reports suggest that causative genes can be up to 2 Mbps away from trait-associated SNPs [63]; however, we do not consider this to be the case in our study given that such observations were made largely on large-effect SNPs identified from genome-wide association studies with high-density SNP panels [63]. Here, we inspected genes within a window of 20 Kbps in either direction, which is much closer to what is considered common practice, e.g., [63][64][65].
To investigate common functional themes of associated genes, that is, of the genes found within 20 Kbps from the significant SNP outliers, we relied on semantic similarity measures of Gene Ontology (GO) annotations. Semantic similarity measures assess the degree of relatedness between Gene Ontologies by means of similarity in the meaning of their annotations [66], thus allowing additional-to-enrichment ways to assist in their interpretation from noise reduction and multidimensional scaling to graph-based visualizations [67,68]. GO terms for our list of studied genes were retrieved using the official GO annotations for zebrafish (Danio rerio) orthologs, given that zebrafish is a genetic model for teleosts [69]. The GO annotations for zebrafish were downloaded from the Gene Ontology consortium (www.geneontology.org; release: 1 September 2021) and zebrafish ortholog IDs were identified using the gene names at the UniProt database (www.uniprot.org, accessed on 2 September 2021). GO semantic similarity was calculated by means of the SimRel measure with default allowed similarity = 0.7 using the REVIGO program [70]. Multidimensional scaling was then applied to the similarity matrix of GO terms as implemented by REVIGO followed by K-means clustering to decide on clusters of semantically similar GO terms. K-means clustering was performed using the K-means method from the Scikit-learn module in Python [71] for K numbers from 1 to 10 with the final number of K decided using the Elbow method using distortion, calculated as the average of squared distances from the cluster centers of the respective clusters for each K value. The analysis was focused on the Biological Process category of GO terms as we wanted to focus on the "larger processes or biological programs accomplished by multiple molecular activities" (http://geneontology.org/docs/ontology-documentation/, accessed on 1 September 2021) instead of specific molecular functions and cellular components. A Jupyter notebook with the dataset and the accompanying Python script for the K-means analysis is provided in the Supplementary Materials (Supplementary File S2).

Genetic Diversity and Population Genetic Structure
After filtering the raw reads and the initial number of called SNPs, as described in detail by Maroso et al. [40], 956 wild gilthead seabreams were consistently genotyped at 1159 high-quality SNPs. In this sample dataset, 362 farmed individuals were added with their genotypes, acquired in the context of the Aquatrace project. Of the overall data set, two non-common genotyped loci were removed (8727_39, 13938_26) and the final dataset consisted of 1318 gilthead seabreams genotyped at 1157 SNPs, which were used for the subsequent analyses.
For the wild populations, the mean observed and expected heterozygosity values were 0.14 (range 0.12-0.15) and 0.15 (range 0.14-0.16), respectively (Table 1). These parameters tended to be lower in the Atlantic populations. The inbreeding index varied from 0.03 to 0.10 and averaged 0.07 (Table 1). No general trend was found for this parameter when comparing Atlantic and Mediterranean samples. For the farmed populations, mean observed and expected heterozygosity were also 0.14 (range 0.13-0.15) and 0.15, respectively, which were not significantly different from the wild populations. The inbreeding index varied from 0.05 to 0.09 and averaged at 0.07, the same value as in the wild populations. This is possibly due to the implementation of breeding programs and circulation of favorable breeders among fisheries, as the first key step of such programs is to address the problem of inbreeding amongst breeders by restocking (S.P. Pers. Comm; [32]). The F ST values among hatcheries and the clusters of wild populations ranged between 2.4% and 4.9% and all the pairwise calculations were statistically significant (p ≤ 0.005) (Supplementary File S3).
The genetic structure results, obtained with the SNP datasets for the wild populations, are described extensively in [40]. Overall, the results suggested a subdivision of wild gilthead seabream into four major genetic clusters: Atlantic (ATL), West Mediterranean (WMED), Ionian/Adriatic seas (ION), and the Aegean Sea (AEG), with the strongest differentiation being between Atlantic and Mediterranean samples (Figure 1). Similarly, for genetic structure analysis of the farmed populations, Evanno's method suggested a subdivision into three genetic clusters as being the most likely result, and all runs at this number of ancestral groupings showed the same clustering pattern. Farmed samples showed a high degree of admixture with the structure's Bayesian analysis, with the FA1 and FA3 hatcheries' samples assigned into the same cluster, the FA2 and FA5 hatcheries' samples clustered together, and the FA4 hatchery's samples assigned to their own individual cluster (Figure 2). The joint analysis supported a clear distinction between wild and cultured populations (Figure 3) for the optimal k = 7 (Supplementary File S4).    File S8). The overlap of outliers across method implementations revealed one SNP detected with all methods, namely, the SNP 901_49, whereas two additional SNPs, the SNP 8901_12 and SNP 12232_7, were detected by means of BayeScEnv, BayeScan, and Lositan ( Figure 4). Figure 5 shows the semantic similarity space by means of multidimensional scaling for the GO Biological Process of the associated genes to the three SNPs. K-means clustering and the described elbow method suggested three clusters of GO terms with the label of the cluster decided for one of the GO terms closest to each cluster center.  . Scatter plot of the GO terms for biological process (n = 33) for the associated genes, summarized by multidimensional scaling on the semantic similarity matrix. Each point represents a GO term for a biological process, and its position in the scaled space of semantic similarities depicts its similarity of meaning relative to the other GO terms. Colors correspond to K-means clusters for an optimal K-value = 3. Labels correspond to GO terms that are nearest to each cluster center. A full list of the description of the GO terms and the Python code used to create this figure is provided in the form of a Jupyter notebook (Supplementary File S2).

Assessing the Significance of Candidate SNPs for Domestication in Gilthead Seabream
This study represents the first genome-wide investigation of the genetic basis of domestication in gilthead seabream. By screening 1159 SNPs of genome-wide distribution from five Greek farmed and 23 wild populations of Mediterranean-wide distribution, we report on a distinct genetic makeup of the farmed populations compared to the wild ones (Figure 3), and we provide evidence for three candidate SNPs with signatures of parallel divergent selection in multiple farmed gilthead seabream populations. To minimize the possibility of false positives, we used conservative criteria for the detection of candidate SNPs. On one hand, candidate SNPs were significant across three different genomic outlier detection methods, namely, first, a Bayesian approach as implemented in the programs BayeScan and BayScEnv, second, an F ST approach that uses a hierarchical island model as implemented in the program Arlequin, and third, an F ST approach that uses a stepwise mutation model as implemented in the program Lositan. Comparative studies have shown different strengths and weaknesses for individual genomic outlier detection methods, often depending on the confounding effects of the demographic history, structure, and drift of the populations, e.g., [53,72,73]. Hence, genomic outliers proposed by multiple methods that assume different demographic models are more likely to be true positives than any individual method given that we have no knowledge about the demographic history of the studied populations [53,74]. On the other hand, candidate SNPs were significant in different pairwise comparisons between farmed and wild genetic clusters recognized by the programs Structure (Figures 1 and 2) and fastSTRUCTURE (Figure 3). In this way, we controlled for the confounding factors of population structure in our studied samples and of unknown management practices such as broodstock supplementation with wild fish and the exchange of farmed fish [28]. Importantly, with this approach we also managed to report on parallel patterns of divergence across multiple farmed and wild population comparisons. We investigated SNPs that were significant outliers in multiple farmed populations when contrasted with our best-known wild gilthead seabream genetic variation in the Mediterranean region for the 1159 studied SNPs (Figure 1) [40]. The fact that our farmed samples were only of Greek origin might be seen as a limitation of this study. Nevertheless, it is not uncommon for gilthead seabream to be exchanged between European farms (A.T. Pers. Comm.); in fact, it is known that gilthead seabream broodstock fish are exchanged between the hatcheries grouped under the same clusters according to structure analysis (FA1-FA3 and FA2-FA5) (A.T. Pers. Comm.). It is thus possible that our findings are valid across a wider range of gilthead seabream farms in the Mediterranean region; however, this remains to be tested. The fact that we detected three candidate SNPs for domestication in gilthead seabream by screening a relatively low number of 1159 random genomic SNPs suggests potentially strong divergent selection for domestication that, for example, medium-to-high-density SNP screening approaches may be able to elucidate further. To this end, Peñaloza et al. recently reported on the development of a 15K SNP chip for gilthead seabream that may be ideally suited for this purpose [75]. Our analysis of the semantic similarity of the genes within 20 Kbps from the three candidate SNPs also managed to recognize some common functional patterns ( Figure 5). Below, we discuss in detail the potential biological significance of each of the recognized candidate SNPs in the domestication process of gilthead seabream.

Genes Associated with the SNP 901_49
Of the three reported candidate SNPs, the one labeled as 901_49 was found to be significant in all four method implementations and thus represents perhaps the strongest candidate SNP for divergent selection in the farmed gilthead seabream populations (Figure 4). Furthermore, its allele frequencies showed a marked decrease from an average of 39.79% (±4.39%) in the wild populations to 10.70% (±5.65%) in the farmed ones for one of the alleles ( Figure 6). Figure 6. Plot of the allele frequencies of the SNP_901_49 at the studied clusters in which a marked change in frequencies has been observed between farmed (in green) and wild (in red) clusters of gilthead seabream populations. Error bars denote standard deviations, whereas circles and diamonds represent the two different alleles. The information of which populations are grouped in each cluster id can be found in Table 1 The SNP was mapped on the chromosome 21 (position: 7694060) of the published gilthead seabream genome, and it is found within the coding region of the UROD gene. The SNP, however, was found to represent a synonymous mutation and is not expected to have any obvious effect on the sequence and function of the uroporphyrinogen decarboxylase protein. In addition to the UROD gene, the following genes were found within 20 Kbps in either direction of the SNP: ZSWIM5, LYN, and PKIA, which encode respectively for the proteins zinc finger SWIM-type containing 5, tyrosine-protein kinase Lyn, and cAMPdependent protein kinase inhibitor alpha-like.
Regarding the function of these four genes, the uroporphyrinogen decarboxylase protein is involved in heme biosynthesis [76], which is an important pathway in fish adaptation to novel environmental conditions and stress, for example, in salinity adaptation in European flounder (Platichthys flesus) [77] and in Atlantic salmon (Salmo salar) infected with sea lice [78]. Decreased hematocrit levels in cultured gilthead seabream are not uncommon due to bacterial infections [79] or under stressful environmental conditions such as low temperatures [80,81]. Regarding the ZSWIM5 gene, its protein contains an evolutionary conserved domain that may interact with DNA, RNA, and proteins, which suggests a potentially major regulatory role [82,83]. Chang et al. provided evidence of an important role for the protein in mice during early development [84], whereas Micheletti et al. identified in Chinook salmon (Oncorhynchus tshawytscha) a related gene, ZSWIM7, associated with sex differences [85]. Whether this evidence indicates a potential unknown confounding factor in relation to sex or perhaps the rate of sexual maturation in the protandrous gilthead seabream remains to be investigated. Concerning the LYN gene, its protein is a member of subfamily B of Src kinases [86], and has been assigned roles in hematopoiesis and immune function [87], in spermatogenesis [88], and as a key regulator of the stress-activated protein kinase pathway in response to diverse environmental stimuli [89,90]. These roles have been somewhat validated in several fish species, for example, the tyrosine-protein kinase Lyn transcript has been found expressed in the spleen tissue of the grass carp (Ctenopharyngodon idella) [91] and was found to be markedly unregulated during hyperosmotic stress in the fish Gillichthys mirabilis [92]. Arguably, immune function and stress resistance are paramount for the health of any farmed fish species, including gilthead seabream [93,94], and these biological roles may explain the divergent selection observed at the adjacent SNP 901_49 in the farmed gilthead seabream. Notably, "innate immune response" was a GO term that described one of the three clusters of GO terms in our analysis of semantic similarities ( Figure 5). Regarding the PKIA gene, its protein belongs to the family of cAMP-dependent protein kinase (PKA) inhibitors, which in several fish species have been found to regulate the growth hormone (GH) and the gonadotropin-releasing hormones (GnRH) [95]. In many fish raised in captivity, including gilthead seabream, GnRH has been found to be responsible for the regulation of ovulation and spawning [96][97][98]. GH is furthermore a key hormone in fish during exposure to stress, helping the fish adapt to a wide range of environmental stressors such as water temperature and salinity changes [99]. Altogether, we provided several lines of functional evidence as to why the genomic region surrounding SNP 901_49 at chromosome 21 of the gilthead seabream may be an intriguing topic of further research to understand the genetic basis of domestication in the species.

Genes Associated with SNP 8901_12 and SNP 12232_7
The other two reported candidate SNPs, labeled 8901_12 and 12232_7, showed similar behavior at the four methods used to detect divergent selection in the farmed gilthead seabream populations. Both SNPs were found to be significant in the BayeScEnv, BayeScan, and Lositan analyses, but not in Arlequin (Figure 4). In Arlequin, both SNPs were within the top 50 most significant SNPs (p-value < 5%) in all three genetic clusters of farmed fish, but only when compared with the ATL cluster of wild populations (Supplementary File S5). This observation suggests population-specific effects, although methodological limitations cannot be excluded given that we did not get similar observations with the other three methods (Supplementary Files S6-S8). Furthermore, in Lositan for SNP 12232_7, two of the four comparisons with the ATL cluster were not found to be significant (Supplementary File S7), which indicates that the population effect detected with Arlequin may not be very strong. However, a shortcoming of the method itself that we did not investigate further, such as under the conditions described in [73], cannot be excluded. Furthermore, the allele frequency for the less frequent allele showed for SNP_8901_12 an increase from zero in the farmed populations to 6.13% (±1.14%) in the wild ones, and for SNP 12232_7 an increase from zero in the farmed populations to 5.59% (±3.21%) in the wild ones. These changes in frequencies were thus much smaller than those observed with SNP 901_49. Nevertheless, we report on SNP 901_12 and SNP 12232_7 as loci with correlative evidence in the domestication of gilthead seabream that future research will investigate further. The SNP 8901_12 was mapped at chromosome 21 (position: 15127025), the same chromosome at which the aforementioned SNP 901_49 was found. The SNP 8901_12 is located within the gene ZZZ3, which encodes for the ZZ-type zinc finger-containing protein 3, at about the middle of its ca. 16 Kbps gene region. However, it is located at an intronic region, between exon 7 and exon 8 of the ZZZ3 gene, and thus it is not expected to have an effect on the sequence and function of the produced protein. Within 20 Kbps in either direction of the SNP, two more genes were found, namely, the genes USP33 and AK5, which encode for the proteins ubiquitin carboxyl-terminal hydrolase 33 and adenylate kinase isoenzyme 5, respectively. The SNP 12232_7 was mapped at chromosome 12 (position: 25666771) of the gilthead seabream genome. It is found within the gene PPIL1, which encodes for the protein peptidyl-prolyl cis-trans isomerase-like 1, but SNP 12232_7 is also located at an intronic region between exons 4 and 5.
The PPIL1 gene produces an evolutionary conserved peptide sequence from humans to zebrafish and fission yeast [100]. It has peptidyl prolyl isomerase activity (PPIase) [100,101] and also participates in the spliceosome that removes introns from pre-mRNA [102]. As a function, PPIase activity has been implicated in the environmental stress response of a wide range of organisms from water draught in plants, e.g., [103], to heat stress in Escherichia coli [104]. Alternative splicing, although not fully yet understood, has been suggested to play an important role in the evolutionary adaptation of eukaryotes such as temperature adaptation in Drosophila [105]. In fish, a comparative analysis of orthologous transcripts of four species of the Percidae family has recognized the PPIL1 transcript amongst those fast-evolving and positively selected genes, by means of the dN/dS ratio, with a potential role in temperature adaptation [106]. A similar stress coping role may be assigned to the ZZZ3 gene. It belongs to the zinc finger superfamily of proteins, which are considered master regulators of coping with environmental stress, especially in plants [107,108], but also in animals [109,110]. The USP33 gene functions primarily as an ubiquitin-specific protease (USP) and thus regulates cellular protein traffic [111] and modulates interactions that regulate autophagy and immune responses to various stressors [112]. In fish, for example, the transcription of USP33 was found to be significantly altered in zebrafish after exposure to toxic nanomaterials [113], whereas in the large yellow croaker (Larimichthys crocea) a GWAS analysis listed USP33 amongst those candidate genes associated with acute heat tolerance in this fish species [114]. It should also be noted that "ubiquitindependent catabolic process" was a GO term that described one of the three clusters of GO terms in our analysis of semantic similarities ( Figure 5). The AK5 gene encodes an enzyme with an evolutionarily conserved role in energy metabolism that in fish has been involved in various aspects of temperature acclimation from sperm motility in brown trout (Salmo trutta), burbot (Lota lota), and the European grayling (Thymallus thymallus) [115] to the muscles of rainbow trout (Oncorhynchus mykiss) and bastard halibut (Paralichthys olivaceus) [116,117]. Overall, we provide several lines of evidence on the biological role of neighboring genes that may explain the observed parallel divergence of the reported SNPs in the farmed gilthead seabream. These genes and genomic regions may be targets of future studies that will investigate further the domestication process in this fish species.

Conclusions
In conclusion, we here present the first genome-wide evidence of the genetic basis of domestication in the commercially important fish gilthead seabream, using 1159 randomly identified SNPs of genome-wide distribution. The three reported robust outlier SNPs were found to be significant outliers in multiple genetic clusters of farmed fish populations when contrasted with different genetic clusters of wild populations. This finding supports a scenario of parallel signatures of adaptation of gilthead seabream to the farmed conditions, which remains to be investigated further, for instance, with the use of higher-density SNPs that have been recently developed for gilthead seabream [75]. We also reported on the biological role and function of multiple genes closely linked to the outlier SNPs that may explain the observed divergence, and provide targets for future research in the domestication of gilthead seabream. Notably, two of the outlier SNPs, namely, SNP 901_49 and SNP 8901_12, were mapped onto the same chromosome 21, which drew some attention as to the genetic variation of this chromosome with regard to the domestication process. Amongst the associated genes, a common biological pattern has to do with coping in stressful conditions, for example, the associated UROD, LYN, PKIA, ZZZ3, USP33, and AK5 genes have been assigned such a role, which growing in hatchery environments may explain successful growth in high population densities and resistance to pathogens. This was also evident in the analysis of the semantic similarity of the GO biological process of the associated genes, where clusters of GO terms associated with the "innate immune response" and with the "ubiquitin-dependent protein catabolic process" were identified ( Figure 5). It is well known that ubiquitin-dependent pathways play major roles in modulating the effects of environmental stress at the molecular level [118]. We acknowledge that these pathways are also involved in several processes other than stress, such as in the cell cycle and proliferation, and thus more research is needed in this direction. Some possible links were also found with genes that are important for growth and maturation, such as the gene PKIA, which may also explain the selection for high growth rates in the hatchery environment.  Provided support in the form of salary for author AC but did not have any additional role in the study design, data collection or analysis; the decision to publish; or the preparation of the manuscript. The specific role of this author is articulated in the 'Author Contributions' section.

Institutional Review Board Statement: Not applicable.
Data Availability Statement: Demultiplexed raw reads files of the wild and farmed samples are available from the NCBI's SRA database (BioProject accession numbers PRJNA643702 and PRJNA771202, respectively).