Integrative Transcriptome, Genome and Quantitative Trait Loci Resources Identify Single Nucleotide Polymorphisms in Candidate Genes for Growth Traits in Turbot

Growth traits represent a main goal in aquaculture breeding programs and may be related to adaptive variation in wild fisheries. Integrating quantitative trait loci (QTL) mapping and next generation sequencing can greatly help to identify variation in candidate genes, which can result in marker-assisted selection and better genetic structure information. Turbot is a commercially important flatfish in Europe and China, with available genomic information on QTLs and genome mapping. Muscle and liver RNA-seq from 18 individuals was carried out to obtain gene sequences and markers functionally related to growth, resulting in a total of 20,447 genes and 85,344 single nucleotide polymorphisms (SNPs). Many growth-related genes and SNPs were identified and placed in the turbot genome and genetic map to explore their co-localization with growth-QTL markers. Forty-five SNPs on growth-related genes were selected based on QTL co-localization and relevant function for growth traits. Forty-three SNPs were technically feasible and validated in a wild Atlantic population, where 91% were polymorphic. The integration of functional and structural genomic resources in turbot provides a practical approach for QTL mining in this species. Validated SNPs represent a useful set of growth-related gene markers for future association, functional and population studies in this flatfish species.


Introduction
Genetic dissection of polygenic traits is key for animal breeding and evolutionary genetics. Genomic studies offer the possibility of unraveling a huge amount of genetic polymorphisms useful for searching selection signatures in populations and for association studies with productive traits, such as growth or disease resistance, which could eventually be applied in marker-assisted selection (MAS) programs. The application of MAS has enabled increasing productivity in livestock previously subjected to traditional phenotype-based selection [1]. A more effective selection can be performed the closer the marker is to the DNA element responsible for phenotypic differences, or even better, if the marker itself causes that difference. Markers within genes physiologically related to the trait of interest may potentially allow for a more straightforward selection [2]. For example, an intronic single nucleotide polymorphism (SNP) in the chicken growth hormone gene was associated with a 16.9% increase in body weight [3], while microsatellite variation in the growth hormone receptor promoter in cattle could account for 6.9% production improvement (Hale et al., 2000) [4]. Moreover, the identification of gene-associated markers has been applied for detecting adaptive variation and genetic structure in wild and farmed populations of commercially exploited species, which is particularly of interest in order to define conservation and management units in marine fisheries [5,6].
These strategies require information on the target genes and also the discovery of suitable markers, a difficult task in non-model species with limited genomic resources. Methodologies using genomic information from related species allowed us to identify SNP markers associated with growth and reproduction in fish [7,8]. However, new generation sequencing has greatly facilitated marker identification in non-model species. In this regard, transcriptome sequencing using RNA-seq technology allows SNP identification at an affordable cost [9,10]. SNPs are abundant markers along the genomes (e.g.,~1 SNP/100 bp in fish [11]), suitable for assisted selection on a wide range of genomic scales relying on the linkage disequilibrium patterns. In addition, SNP variation may have functional meaning, involving not only non-synonymous amino acid substitutions but also affecting regulatory elements (transcription factor and microRNA binding sites, or splicing recognition sites) [12].
Growth is one of the most important traits in animal breeding and a target in most genetic breeding programs, also in aquaculture species [13]. While most terrestrial livestock such as cattle, pigs or chickens have been strongly selected for commercial traits, only around 10% of aquaculture species have undergone breeding programs [14]. Genomic technologies have increased our knowledge on the molecular basis of growth traits in fish [15]. Different genomic studies have been carried out to detect quantitative trait loci (QTL) for growth traits in finfish and identified candidate genes to be used in marker-assisted selection programs [16][17][18][19][20][21]. Nevertheless, the genetic effects and regulatory pathways underlying the growth rate are still poorly understood in teleosts as compared to mammals [22,23].
Turbot is a marine flatfish of commercial importance in Europe and China [24,25], whose culture in aquaculture facilities is a mature industry. The genomic resources available include a genetic map [26,27] and a recently assembled genome sequence [28]. Previous studies in turbot found several QTLs and associated markers for growth traits, major goals for selective breeding programs in this species [29,30]. In addition, genome scanning has been used for detecting loci involved in adaptive variation of turbot populations, some of them presumptively related to growth performance [31,32]. The integrative connection among these resources may have implications for fisheries management and genetic improvement in this species.
The main aim of this study was the detection and validation of SNP markers in turbot located in candidate genes involved in growth and associated with growth-related QTLs in this species. We used Illumina RNA-seq to obtain muscle and liver transcriptomes from control and fasted specimens enriched in growth regulation pathways, and used this information to identify growth-related genes and associated SNP markers. Integrating (positioning) our new transcriptome data with previous genetic map markers in the draft turbot genome sequence enabled us to locate the candidate transcripts and compare their positions with growth QTL markers reported in the turbot genetic map. SNPs located in 45 genes were selected based on their co-localization with QTLs and functional relevance for growth. These SNPs were validated in a turbot population and will be useful to perform association analysis with phenotypes for growth traits both in farmed broodstock and natural populations for fisheries management.

RNA-Seq and SNP Calling
A total of 19,326,140 muscle and 17,466,901 liver reads were obtained from Illumina sequencing, which enriched the transcriptomic background for turbot, useful for functional and comparative genomics in flatfish and teleosts. After trimming, 89.6% of the reads remained and were aligned against the draft turbot genome [28]. Muscle and liver transcriptomes were successfully assembled from the aligned reads, identifying 19,147 and 15,928 genes, respectively, and a total of 20,447 genes ( Table 1). The figures found in our study are comparable to those obtained using genome-guided alignment for Takifugu rubripes swim bladder (17,249 genes) and gill (16,836 genes), which together accounted for 19,388 different genes [33]. Genomes are critical tools for obtaining realistic transcriptomes since, at present, de novo assemblers usually produce a higher number of genes with lower mean length and N50 (length for which the collection of all genes of that length or longer contains at least half of the sum of the length of all the genes). For example, de novo assembly of the ray-finned fish Coilia nasus liver transcriptome resulted in 65,129 genes with a mean size of 607 bp and N50 of 813 bp [34]. Muscle and liver RNA-seq reads were used for SNP screening. A total of 93,558 polymorphisms were detected and 85,344 of them were SNPs (91.2%), similar to human estimates where SNPs account for 90% of genome variation [35]. The number of SNPs identified by next generation sequencing (NGS) technologies varies greatly among studies, probably due to different experimental designs (number of samples and tissues, sequencing depth, read length, genome availability, and SNP calling parameters, among others), but comparisons between transcriptome and genome data have been proven to improve experimental validation of SNPs [9,33,36,37]. As expected from previous studies in fish, most of the SNPs found in our work corresponded to A/G and C/T, each transition (ts) representing 28%, while transversion (tv) frequencies were below 13% ( Figure 1). The ts/tv ratio of 1.346 is very similar to that previously reported in turbot based on 866 predicted SNPs obtained from two 454 Roche runs [38], and also consistent with the values found in common carp (Cyprinus carpio) (1.31) [39] and gilthead seabream (Sparus aurata) (1.37) [40]. Values were slightly higher in channel (1.56; Ictalurus punctatus) and blue (1.68; Ictalurus furcatus) catfish [37], but lower in other species, such as chum (Oncorhynchus keta) and sockeye (Oncorhynchus nerka) salmon (around 1) [41], close to the ratio estimated from intron regions in C. carpio (1.05) [39]. Still, these values are far from those in the human exome (3.2) [42] or in the whole nuclear genome (2.1) [43]. Besides the particular sampling scenario of each study, discrepancies may also be related to different selective pressures in the species under study [39]. Both muscle and liver transcriptomes were scanned for a large set of genes (~200) involved in growth regulation previously reported in fish [16,18,22,31,32]. We found 160 growth-related genes in muscle and 125 in liver, which account for a total of 174 different genes involved in growth processes. Furthermore, different isoforms were identified for some genes, totaling 325 different transcripts (1.86 isoforms per gene). Although in fish species the growth hormone (gh) has been described in liver [44], and the neuropeptide Y (npy) both in liver and muscle [45], we did not detect muscle or liver expression of these important growth-related hormones, nor did we find liver or muscle expression of prolactin (prl), consistent with the results in Takifugu rubripes, where prl was only found in the pituitary gland [46]. Hence, future growth-related studies should consider including brain as a target tissue.
All growth-related transcripts found in this study were located in the turbot genome and anchored to the genetic map using common markers [27,28]. Sixty-four genes were located in the vicinity of a previously reported growth QTL marker [29,30], within the same scaffold ( Figure 2; Table  S1, Supporting Information), and at an average distance of 1.9 Mb (~4 cM, assuming a relationship between genetic and physical distance of 0.5 Mb/cM [47]). The pooled population sample used for RNA sequencing in this study allowed the detection of SNPs for all these growth-related genes, which were selected for subsequent analyses. The number of SNPs per gene ranged from 1 to 25, with a mean of 7.81 ± 6.00.
We selected 45 out of 176 growth-related genes (those found in our transcriptome from an initial list of ~200) based on their relevance in growth-related processes in fish and vertebrates, their co-localization with growth QTL markers in turbot and the availability of technically suitable SNPs in the transcriptome (Table 2; Figure 2; Table S1, Supplementary Materials). According to Gene Ontology (GO) term classification, these genes were involved in different growth-related processes such as cell signaling, proliferation and growth (36%), energy metabolism (31%), muscle growth and development (26%) or cartilage and bone formation (7%) ( Table S2, Supplementary Materials). We focused on the relationship between growth QTLs and candidate genes as a way to tackle the genetic basis underlying phenotypic effects, but we also focused on the gene relevance in growth-related processes [16,22] (Table 2). Previous QTL screening was performed with a limited amount of markers and families at early hatchery or at on-growing culture phases [29,30], so we would expect to find other genomic regions responsible for growth differences at different ages and under variable family or population genetic backgrounds. The selection included candidate genes associated with growth traits in different vertebrate and fish species, such as insulin growth factors 1 and 2 (igf1 and igf2) and leptin receptor (lpr) [48][49][50], or myogenic regulatory factors such as myogenin (myog) associated with weight in pigs and chickens [51,52] and with indeterminate growth in fish [53]. Members of the transforming growth factor-beta (tgfb) pathway, which regulates cell growth and differentiation, Both muscle and liver transcriptomes were scanned for a large set of genes (~200) involved in growth regulation previously reported in fish [16,18,22,31,32]. We found 160 growth-related genes in muscle and 125 in liver, which account for a total of 174 different genes involved in growth processes. Furthermore, different isoforms were identified for some genes, totaling 325 different transcripts (1.86 isoforms per gene). Although in fish species the growth hormone (gh) has been described in liver [44], and the neuropeptide Y (npy) both in liver and muscle [45], we did not detect muscle or liver expression of these important growth-related hormones, nor did we find liver or muscle expression of prolactin (prl), consistent with the results in Takifugu rubripes, where prl was only found in the pituitary gland [46]. Hence, future growth-related studies should consider including brain as a target tissue.
All growth-related transcripts found in this study were located in the turbot genome and anchored to the genetic map using common markers [27,28]. Sixty-four genes were located in the vicinity of a previously reported growth QTL marker [29,30], within the same scaffold ( Figure 2; Table S1, Supporting Information), and at an average distance of 1.9 Mb (~4 cM, assuming a relationship between genetic and physical distance of 0.5 Mb/cM [47]). The pooled population sample used for RNA sequencing in this study allowed the detection of SNPs for all these growth-related genes, which were selected for subsequent analyses. The number of SNPs per gene ranged from 1 to 25, with a mean of 7.81˘6.00.
We selected 45 out of 176 growth-related genes (those found in our transcriptome from an initial list of~200) based on their relevance in growth-related processes in fish and vertebrates, their co-localization with growth QTL markers in turbot and the availability of technically suitable SNPs in the transcriptome (Table 2; Figure 2; Table S1, Supplementary Materials). According to Gene Ontology (GO) term classification, these genes were involved in different growth-related processes such as cell signaling, proliferation and growth (36%), energy metabolism (31%), muscle growth and development (26%) or cartilage and bone formation (7%) ( Table S2, Supplementary Materials). We focused on the relationship between growth QTLs and candidate genes as a way to tackle the genetic basis underlying phenotypic effects, but we also focused on the gene relevance in growth-related processes [16,22] (Table 2). Previous QTL screening was performed with a limited amount of markers and families at early hatchery or at on-growing culture phases [29,30], so we would expect to find other genomic regions responsible for growth differences at different ages and under variable family or population genetic backgrounds. The selection included candidate genes associated with growth traits in different vertebrate and fish species, such as insulin growth factors 1 and 2 (igf1 and igf2) and leptin receptor (lpr) [48][49][50], or myogenic regulatory factors such as myogenin (myog) associated with weight in pigs and chickens [51,52] and with indeterminate growth in fish [53]. Members of the transforming growth factor-beta (tgfb) pathway, which regulates cell growth and differentiation, were also included, since they were associated with growth and reproduction in vertebrates [54][55][56][57], such as, for example, myostatin 1 (mstn1). Null mutations in this negative regulator gene of muscle development in mammals lead to 30% increased growth in mice or to a "double muscle" phenotype in cattle [58,59]. SNP variation in the 3 1 and 5 1 untranslated regions (UTR) of myostatin was also associated with differences in growth traits in bighead carp (Hypophthalmichthys nobilis) [60] and Atlantic salmon (Salmo salar) [61], respectively. Other relevant genes selected were the parvalbumin 1 (pvalb1), where microsatellite variation at the 3 1 UTR was associated with weight in Asian seabass (Lates calcarifer) [62] and growth hormone receptor 2 (ghr2), which has been associated with growth in cattle and chicken [4,63]  were also included, since they were associated with growth and reproduction in vertebrates [54][55][56][57], such as, for example, myostatin 1 (mstn1). Null mutations in this negative regulator gene of muscle development in mammals lead to 30% increased growth in mice or to a "double muscle" phenotype in cattle [58,59]. SNP variation in the 3′ and 5′ untranslated regions (UTR) of myostatin was also associated with differences in growth traits in bighead carp (Hypophthalmichthys nobilis) [60] and Atlantic salmon (Salmo salar) [61], respectively. Other relevant genes selected were the parvalbumin 1 (pvalb1), where microsatellite variation at the 3′ UTR was associated with weight in Asian seabass (Lates calcarifer) [62] and growth hormone receptor 2 (ghr2), which has been associated with growth in cattle and chicken [4,63] and tilapia (Oreochromis niloticus, O. aureus and O. mossambicus) [23].  [26]. Reported growth-related markers [29,30,32] are shown in blue, either associated with growth traits (qBW, body weight; qL, length; qK, Fulton factor) or located within the confidence interval (CI) of growth QTLs. ( †) Outlier markers proposed to be under selection in turbot [32].  [26]. Reported growth-related markers [29,30,32] are shown in blue, either associated with growth traits (qBW, body weight; qL, length; qK, Fulton factor) or located within the confidence interval (CI) of growth QTLs. ( †) Outlier markers proposed to be under selection in turbot [32].  One SNP was selected for each gene among those available. Sixty percent of the 45 SNPs selected were located within the UTR (1:2 ratio in 5' and 3' UTR) and 31% within coding regions. Among the latter, five and nine SNPs were non-synonymous and synonymous mutations, respectively, whereas the others were located within introns of putative alternative transcript variants, as the alignment with orthologous genes from model fish suggested. These data could also be explained by the sequencing of immature transcripts, although intron retention seems to be a general phenomenon playing a role in the regulation of gene expression [64]. Non-synonymous mutations can directly alter the protein structure and function, since they cause substitutions in the amino acid sequence. Variation in the 5' UTR can affect transcription factor binding sites, thus altering gene expression, while 3' UTRs contain microRNA binding targets and can influence translation efficiency and mRNA stability [65]. Finally, some microRNA genes and consensus motifs required for correct splicing are found in introns [12,66]. Indeed, an intronic SNP in the growth hormone-Releasing hormone (ghrh) was associated with growth differences in Arctic charr (Salvelinus alpinus) [67].

SNP Variation
Sequenom assays were designed for the selected 45 SNPs into two multiplex reactions (Table S3, Supplementary Materials) for validation and genetic diversity estimation. Two SNPs were not technically feasible (acss3, fbxo32) and four were monomorphic in the tested population (fgf6b, ghr2, got1a, myod). Thus, a total of 43 SNPs were genotyped (95.6%) and 39 (86.7%) were finally polymorphic enough for diversity analysis in the Cantabrian population sample assayed (Table 3). Monomorphism at relevant growth-related loci in this wild population may be due to genetic differences between the Cantabric population and the samples used for RNA-seq SNP discovery, coming from a breeding strain of Atlantic origin. Accordingly, all feasible growth-related loci may be useful for further studies in other genetically divergent turbot populations of wild or farm origin [31,32]. Interestingly, some genic SNPs in this study were located close to outlier microsatellite and SNP markers proposed to be under divergent selection in turbot (e.g., SmaUSC-E7 at LG6; SmaUSC149 at LG15 or Sma-E167 at LG22; Figure 2) [32]. Gene-associated markers have become highly valuable tools for ascertaining origin assignment and detecting fine genetic structuration in fish [5,6]. Our panel of growth-related SNP markers will be a particularly useful tool to evaluate genetic differentiation patterns among wild Atlantic and Baltic populations with respect to farmed broodstock selected for growth in order to confirm changes in growth as an adaptation to differences in temperature and salinity conditions, which has implications for improving turbot aquaculture [32].
Among polymorphic SNPs, unbiased gene diversity (He) estimates ranged from 0.058 at igf1, lum, pklr and tgfbr1 to 0.508 at lepr (mean: 0.304˘0.004). Minimum allele frequency (MAF) ranged from 0.029 (igf1, lum, pklr and tgfbr1) to 0.500 in lepr (mean: 0.221˘0.004). These values are in the range of those previously described in turbot [38,68] and other fish [39,69]. No departures from Hardy-Weinberg equilibrium (HW) were detected (α = 0.05; Table 3). F IS values per locus and globally were not significantly different from zero at p < 0.05 and showed a mean value of´0.032˘0.0003 (Table 3), congruent with conformance to HW in the Atlantic turbot population when tested simultaneously for all loci (p = 1). Significant linkage disequilibrium (LD) at p < 0.05 was detected between 37 pairs of SNPs out of the total number of pairs of loci tested in the population under study (741 G-tests), which is close to the 5% expected by chance. Seven out of these 37 significant tests involved pairs of closely linked loci (19%), whereas the other cases involved markers in different linkage groups (LG) (mostly LG5, LG6, LG10 and LG16; Table S4). These cases can be related to type I errors, although epistatic interactions between loci located in different QTL regions cannot be discarded, as reported in fish for growth traits [17]. Only two significant LD tests were retained after Bonferroni correction (p = 0.00007), corresponding to two pairs of closely linked loci, tnnc2-actc1 and mstn1-igfbp2, as predicted by the mapping of genomic scaffolds (Table S1, Supplementary Materials; Figure 2). Some closely linked genes derived from mapping predictions ( Figure 2) were prioritized in this study to increase the probability of detecting informative SNPs within relevant QTL regions in different genetic backgrounds, particularly important for genes with low polymorphism. In summary, the set of polymorphic gene SNPs developed and validated in this study represents useful tools to be used in further population and family association studies in this species. The validation rate based on integrated RNA-seq and structural genomic resources in turbot (~90%) was much higher than in other recent RNA-seq strategies in fish: 54% (26/48) in Takifugu rubripes from swimbladder RNA-seq [33]; 56.9% in rainbow trout from muscle RNA-seq [9]. Although the number of SNPs and the true SNP ratio in RNA-seq may vary due to technical parameters, this technique allows us to obtain a much higher amount of SNPs than previous technologies. SNP calling based on 454 previous transcriptome runs in turbot only detected 866 SNPs, although the efficiency was rather good (79.3%) [38]. In cod, a different method involving PCR in 71 DNA fragments achieved a 37.1% success [7]. Overall, RNA-seq performed very well for obtaining gene-targeted SNPs at affordable costs, and it is particularly efficient if integrated with available structural genomic resources, as in our study. More importantly, we could focus on candidate genes within target QTL regions for traits of productive or evolutionary interest by integrating RNA-seq with physical and genetic mapping data for SNP identification.

Towards a High Density SNP Array in Turbot
Although the aim of this work was to generate an affordable panel of gene-linked SNPs co-localizing with QTL for growth association in a large amount of fish with different genetic backgrounds, the resources presented here can be further exploited at a larger genomic scale. The total number of SNPs generated in this study can be used for designing a more extensive SNP panel as a highly informative tool for studying the architecture of quantitative traits and for increasing the efficiency of selective breeding. Our aim would be the design of a large turbot SNP array based on this information and previous reports [38,70], and on the huge SNP amount obtained in an ongoing restriction site-associated DNA (RAD) sequencing project (>10 k SNPs; unpublished data), which could integrate growth-, reproduction-and immune-related gene-linked SNPs. Although large SNP panels are routinely employed in cattle or pigs for genomic selection, high density genotyping in a large number of individuals is still expensive for aquaculture. However, the availability of new cost-effective genotyping technologies has rendered the first high density genome-wide association studies (GWAS) in aquaculture species [71][72][73]. Still, high density SNP panels are scarce, and currently have been reported only in two oyster species, the Pacific oyster and the European flat oyster [74], and in several fish, Atlantic salmon [75], catfish [76], common carp [77] and rainbow trout [78]. More aquaculture high density SNP arrays will be available in incoming years and the SNPs reported here will surely contribute to this goal in turbot.

Sampling for RNA Sequencing
Turbot juveniles from six unrelated families coming from a breeding strain of Atlantic origin were reared in tanks at water temperature of 18˝C at the facilities of CETGA (Aquaculture cluster of Galicia; Ribeira, Spain). Animals were divided in two groups, one fed daily and the other one subjected to nutritional stress by fasting. This study was performed on muscle and liver because these organs play a key role in the regulation of somatic growth in fish, the first one also representing the main edible part of fish [22,79]. In order to obtain the widest gene expression range and SNP variation among individuals as possible, muscle and liver samples were obtained from muscle and liver tissues of treated and control animals at 15

RNA Sequencing
RNA extraction was performed using the RNeasy mini kit (Qiagen) with DNase treatment following manufacturer's instructions. RNA quality and quantity were evaluated in a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA) and in a NanoDrop ® ND-1000 spectrophotometer (NanoDrop Technologies Inc., Wilmington, NC, USA), respectively. Muscle and liver pooled samples were barcoded and sequenced on an Illumina HiSeq 2000 (100 bp paired-end) using standard protocols. Sequencing output quality was assessed using FastQC v0.11.2 [80]. Quality filtering and removal of residual adaptor sequences was conducted on read pairs using Trimmomatic v0.32 [81]. Specifically, residual Illumina-specific adaptors were clipped from the reads, leading and trailing bases with a Phred score less than 15 were removed and the read trimmed if a sliding window average Phred score over four bases was less than 20. Only reads where both pairs had a length greater than 36 bp post-filtering were retained. The recently assembled turbot genome [28] was used as a reference for read mapping. The genome sequence spanned 544 Mb and consisted of 16,493 scaffolds with N50 of 4.3 Mb (European Nucleotide Archive (ENA) project: PRJEB11743). Eighty percent of the genome assembly (156 scaffolds) was anchored to linkage groups (LG) of the turbot genetic map [27], enabling integrative and comparative mapping. Filtered reads were mapped to the genome using Tophat2 v2.0.12 [82] that leverages the short read aligner Bowtie2 v2.2.3 [83] with a maximum intron length of 20 kb. Cufflinks v2.2.1 [84] was used to build gene transfer format (GTF) files and cufflinks gffread utility was employed to obtain final Fasta files for muscle and liver transcriptomes.

Growth-Related Turbot Sequences
Growth-related genes were selected based on gene ontology (GO) criteria using Blast2GO [85], and previous reports on growth traits in fish [16,18,22,32] and their sequences retrieved from phylogenetically close model teleosts (stickleback, medaka, tetraodon) in Ensembl release 75 [86]. These sequences were used to scan both muscle and liver transcriptomes for orthologous turbot sequences using local BLAST (e-value < E´1 0 ) [77]. Once obtained, turbot growth-related sequences were blasted against the NCBI [87] non-redundant protein database to confirm their identity. Turbot gene sequences were also blasted against the turbot genome assembly to obtain their genomic position (unique significant hits with respect to specific genomic scaffolds).

Co-Localization of Candidate Genes with Growth QTLs
We checked for co-localization of the turbot growth-related genes in the genetic map with previously described growth QTL markers [29,30] also placed in the genome [28] using local BLAST (e-value < E´1 0 ) [88]. Genes which were placed in the same scaffold as growth QTL markers, preferably with a gene-marker distance below 1 Mbp, were considered as candidates to explain growth associations. Since no associated markers were reported for some growth QTLs (LG 6, 10, 14,15,22,23), the closest markers to the estimated QTL position in the genetic map were used as references to predict gene co-localization. For those genes placed on genomic scaffolds not anchored to any turbot LG, comparative mapping against the most informative model fish for turbot (Gasterosteus aculeatus and Oryzias latipes) [26] was used to infer their syntenic position in the turbot genetic map (LG2, LG10, LG15, LG16, LG22).

SNP Calling
SNP positions within the aligned reads were identified using the pileup function in SAMtools utilities v0.1.19 [89] with a Phredd base quality ě20, sample read depth ě10 and minor allele count ě3. Reads from the two libraries (muscle and liver) were combined to increase coverage and confidence of SNP calling. SNPs located in turbot growth genes were inspected manually in the aligned reads using Tablet [90] to avoid genes with low minor allele frequencies, and their position within genes was investigated (UTR vs. coding regions) to select SNPs with higher chances of being biologically relevant.

SNP Genotyping and Validation
A total of 45 SNPs located in 45 different genes were selected based on previous specified criteria (functional relevance, co-localization with growth QTLs and SNP calling). DNA was extracted from a piece of caudal fin using standard phenol-chloroform procedures [91]. SNPs were validated and genotyped with the MassARRAY platform (Sequenom, San Diego, CA, USA) following the protocols and recommendations provided by the manufacturer. Briefly, the technique consists of an initial locus-specific polymerase chain reaction (PCR), followed by single-base extension using mass-modified dideoxynucleotide terminators of an oligonucleotide primer that anneals immediately upstream of the SNP [92,93]. The distinct mass of the extended primer identifies the SNP allele. MALDI-TOF mass spectrometry analysis in an Autoflex spectrometer was used for allele scoring. The 45 SNPs were combined in two multiplex reactions of 29 and 16 SNPs. SNPs were classified based on manual inspection as "failed assays" (in case that the majority of genotypes could not be scored and/or the samples did not cluster well according to genotype), and feasible SNPs (markers with proper and reliable genotypes), either monomorphic or polymorphic. All SNPs were genotyped in a population sample of 34 individuals from a wild Cantabrian population, different from the samples used for RNA-seq SNP detection (see above). The Cantabrian population had been previously used as reference for immune-related marker validation in turbot [38]. Genetic diversity (unbiased expected heterozygosity, He) and minimum allele frequency (MAF) were estimated using FSTAT 2.9.3 [94]. Departures from Hardy-Weinberg equilibrium (HW) were tested using GENEPOP 4.0 [95,96]. Linkage disequilibrium (association between genotypes at pairs of loci) was tested using the log-likelihood ratio G-statistic implemented in FSTAT 2.9.3. Conformance to HW was checked using the complete enumeration method implemented in GENEPOP 4.0, because all loci were biallelic. The fixation index FIS per locus was estimated using GENEPOP 4.0 and their significance using FSTAT 2.9.3. Bonferroni correction was applied when multiple tests were performed [97].

Conclusions
RNA-seq is an efficient technique to develop markers for candidate gene association studies related to targeted biological processes. A high number of SNPs were identified in the liver and muscle transcriptomes of turbot, which revealed enrichment in transcripts involved in growth regulation, and allowed for detection of genetic variation in several relevant growth-related genes of fish and vertebrate. Integrating genomic and linkage mapping resources allowed us to place the candidate genes in the turbot genetic map and to check for co-localization with growth QTLs in this species. SNP markers for 43 genes were validated in a turbot population, providing useful tools for fine mapping within QTL regions, population genomics studies, and functional and association analysis with growth phenotypes in turbot.