First Whole Genome Sequencing Data of Six Greek Sheep Breeds

Tsoureki, Antiopi; Tsiolas, George; Kyritsi, Maria; Pavlou, Eleftherios; Argiriou, Anagnostis; Michailidou, Sofia

doi:10.3390/data10050075

Open Access Data Descriptor


Antiopi Tsoureki ¹, George Tsiolas ¹,², Maria Kyritsi ¹, Eleftherios Pavlou ¹, Anagnostis Argiriou ¹,³ and Sofia Michailidou ¹,*

¹ Institute of Applied Biosciences, Centre for Research and Technology Hellas, 57001 Thessaloniki, Greece

² Reframe Food Astiki Mi Kerdoskopiki Etairia, 57001 Thessaloniki, Greece

³ Department of Food Science and Nutrition, University of the Aegean, 81400 Myrina, Greece

* Author to whom correspondence should be addressed.

Data 2025, 10(5), 75; https://doi.org/10.3390/data10050075

Submission received: 5 April 2025 / Revised: 9 May 2025 / Accepted: 13 May 2025 / Published: 14 May 2025


Abstract

Sheep farming is a common agricultural practice in Greece, and many of the country's sheep populations belong to local Greek breeds. However, their genetic makeup remains largely unexplored, and little is known about their genetic variability. Here, we provide the first whole genome sequencing (WGS) data for six Greek sheep breeds, namely the Chios, Kalarritiko, Karagouniko, Lesvos, Serres, and Thraki breeds. Variant discovery analysis of the data identified 23,526,500 high-quality variants. The high average variant depth (148.7X ± 28.3) and the low Single Nucleotide Polymorphism (SNP) density (1 variant per 111 bases) of the callset indicate high data quality. The vast majority of the variants (97.46%) were located in non-coding regions, while a small percentage (1.32%) was located in exonic regions. The overall transition to transversion (Ti/Tv) ratio (2.449) and heterozygous to non-reference homozygous (Het/Hom) ratio (1.49) further support the callset's high quality. As the first WGS data for these six Greek sheep breeds, this dataset provides valuable information to the Greek agricultural sector for designing and implementing targeted breeding schemes, for traceability purposes, and for improving the overall performance and sustainability of the sector.

Dataset: The raw WGS data have been deposited to the Sequence Read Archive (SRA), under the BioProject ID PRJNA1246525.

Dataset License: CC BY

Keywords:

sheep; Ovis aries; whole genome sequencing (WGS); Greek breeds

1. Summary

Livestock farming is one of the most common agricultural practices in Greece, mainly involving small ruminants, in particular sheep and goats. According to Eurostat [1], Greece has the third largest sheep flock in the European Union (EU), comprising 7.86 million animals. There are more than 20 indigenous Greek breeds, which have evolved over time through processes such as geographical isolation, selection and crossbreeding, genetic drift, and transhumance [2]. All breeds are reared mainly for milk (and secondarily for meat), most of which is used to produce Protected Designation of Origin (PDO) and Protected Geographical Indication (PGI) dairy products. Phenotypically, these breeds are quite distinct and well adapted to their local environments.

Despite the importance of sheep farming for the Greek economy and biodiversity [3], little is known about the genetic background of Greek breeds. Existing studies of Greek sheep genetics have typically examined either a small number of genetic loci [3,4,5,6] or, at a larger scale, thousands of SNPs obtained through genotyping microarrays [7,8].

Microarrays are a cost-effective approach for studying the genetic composition of a population and performing genome-wide association studies (GWAS); however, they can only report on variants represented on the chip. Thus, much valuable information is lost, especially when studying local breeds reared in specific or geographically restricted areas. Consequently, whole genome sequencing (WGS) is expected to become the preferred method for genetic studies, given its advantages, which include more comprehensive detection of variation across the entire genome, the ability to identify both known and novel variants, and continuously decreasing sequencing costs [9]. Despite these advantages, and although the sheep reference genome has been publicly available since 2014 [10], no WGS data have been generated so far for any of the Greek sheep breeds.

In this work, we report the first WGS data for six Greek sheep breeds: "Chios", "Kalarritiko", "Karagouniko", "Lesvos", "Serres", and "Thraki". We also describe the bioinformatics methodology used to obtain a high-quality variant callset. These data constitute the starting point of a nationwide database of Greek sheep genomic resources, which can be used in a wide range of applications, from population genetic studies to targeted breeding schemes and traceability. More specifically, the genetic data reported here enable comprehensive, genome-wide analyses of Greek sheep breeds, which can reveal genetic variation patterns that could not be detected with the limited data obtained in previous research. These patterns can be used in various applications, including the selection of the highest-performing animals in a flock, and quality control and adulteration detection in sheep dairy and meat products. Additionally, stakeholders in the agricultural sector can leverage results based on WGS data to develop new targeted breeding schemes, or enhance existing ones, in order to improve the breeds reported in our work, and the national flock in general, in terms of productivity, disease resistance, and resilience and adaptation to climate change.

Overall, our data provide an invaluable genetic resource that can be used for policymaking and development of management practices towards the conservation of livestock biodiversity, as well as the advancement and sustainability enhancement of the livestock sector in Greece.

2. Data Description

2.1. Sequencing Data

Sequencing produced a total of 597.2 Gb of data in FASTQ format for the six samples, ranging from 89.9 Gb to 111.7 Gb per sample. This corresponded to an average sample coverage of 36.9X (±2.9X), with a minimum of 33.3X (Karagouniko) and a maximum of 41.4X (Chios). This high coverage reflects the large number of reads produced: in total, 1,990,723,918 raw paired-end reads (2 × 150 bases) were generated for the six samples, with an average of 331,787,320 (±25,690,580) reads per sample (from 299,574,996 for Karagouniko to 372,243,261 for Chios). On average, 93.4% (±0.3%) of the raw reads had a Q-score above 30. After quality filtering, 97.11% of the reads were retained, giving a total of 1,933,201,279 filtered reads (average: 322,200,213 ± 25,280,391; minimum: 290,336,550 for Karagouniko; maximum: 361,941,831 for Chios), with an average length of 133 (±1.2) bases and an average of 94.8% (±0.2%) of reads with a Q-score above 30. The alignment rate against the reference genome exceeded 99.8% for all samples, confirming the high quality of the data (Table 1).
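The summary figures above can be re-derived from Table 1. The sketch below is illustrative, not the authors' code, and rests on two stated assumptions: each counted read is treated as a 2 × 150-base read pair (this is what reproduces the reported Gb yields), and the genome size of 2.7 Gb is a hypothetical value back-calculated from the reported yields and depths rather than a figure given in the text. Means and standard deviations use the sample SD, which reproduces the reported ± values.

```python
from statistics import mean, stdev

# Per-sample counts copied from Table 1. Each count is treated as a read PAIR
# (2 x 150 bases), which is what reproduces the reported Gb yields.
RAW_PAIRS = {
    "Chios": 372_243_261, "Kalarritiko": 317_860_409, "Karagouniko": 299_574_996,
    "Lesvos": 318_636_312, "Serres": 334_823_282, "Thraki": 347_585_658,
}
FILTERED_PAIRS = {
    "Chios": 361_941_831, "Kalarritiko": 308_121_356, "Karagouniko": 290_336_550,
    "Lesvos": 309_470_068, "Serres": 326_127_757, "Thraki": 337_203_717,
}
DEPTH_X = {
    "Chios": 41.4, "Kalarritiko": 35.3, "Karagouniko": 33.3,
    "Lesvos": 35.4, "Serres": 37.2, "Thraki": 38.6,
}

READ_LENGTH = 150            # bases per mate
MATES_PER_PAIR = 2           # paired-end, 300-cycle run
ASSUMED_GENOME_SIZE = 2.7e9  # hypothetical: back-calculated, not stated in the text


def yield_gb(pairs: int) -> float:
    """Sequencing yield in gigabases for a number of read pairs."""
    return pairs * READ_LENGTH * MATES_PER_PAIR / 1e9


def mean_coverage(pairs: int, genome_size: float = ASSUMED_GENOME_SIZE) -> float:
    """Lander-Waterman mean coverage C = N * L / G."""
    return pairs * READ_LENGTH * MATES_PER_PAIR / genome_size


total_raw = sum(RAW_PAIRS.values())
total_filtered = sum(FILTERED_PAIRS.values())
retention_pct = 100 * total_filtered / total_raw

print(f"Total yield: {yield_gb(total_raw):.1f} Gb")
print(f"Raw reads per sample: {mean(RAW_PAIRS.values()):,.0f} ± {stdev(RAW_PAIRS.values()):,.0f}")
print(f"Retained after filtering: {retention_pct:.2f}%")
print(f"Depth: {mean(DEPTH_X.values()):.1f}X ± {stdev(DEPTH_X.values()):.1f}X")
for name, pairs in RAW_PAIRS.items():
    print(f"{name}: {yield_gb(pairs):.1f} Gb, ~{mean_coverage(pairs):.1f}X")
```

With the assumed 2.7 Gb genome size, the per-sample coverage estimates round to the depths listed in Table 1, which is a useful consistency check on the read-pair interpretation.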

2.2. Variants’ Quality

Variant calling and genotyping identified 33,802,500 raw variants. After Variant Quality Score Recalibration (VQSR) and further filtering, 10,276,000 variants were discarded. The final high-quality callset comprised 23,526,500 variants, of which 21,174,015 were Single Nucleotide Polymorphisms (SNPs), 1,084,638 were insertions (INs), and 1,267,847 were deletions (DELs). These variants were located mainly on the autosomes and the X chromosome, while a very small percentage (<0.32%) was located on unplaced scaffolds (Figure 1A). Despite the small number of samples included in the analysis, the high sequencing depth, together with the joint-calling approach, allowed variants to be called robustly at an average depth across all samples of 148.7X (±28.3X) (Figure 1B).
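How a biallelic VCF record falls into the SNP, insertion, or deletion class can be illustrated by comparing the lengths of its REF and ALT alleles. The sketch below is a simplified illustration, not the tool logic used in the study; the toy records are hypothetical, and only the totals in the bookkeeping part come from the text.

```python
from collections import Counter


def classify(ref: str, alt: str) -> str:
    """Classify a biallelic VCF record by comparing REF and ALT allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "INS"
    if len(alt) < len(ref):
        return "DEL"
    return "OTHER"  # equal-length multi-base substitutions (MNPs)


# Hypothetical toy (REF, ALT) pairs, not taken from the dataset.
toy_records = [("A", "G"), ("C", "T"), ("T", "TAC"), ("GCA", "G"), ("G", "A")]
toy_counts = Counter(classify(ref, alt) for ref, alt in toy_records)
print(toy_counts)

# Bookkeeping with the totals reported in the text.
raw_variants, discarded = 33_802_500, 10_276_000
by_type = {"SNP": 21_174_015, "INS": 1_084_638, "DEL": 1_267_847}
final_callset = raw_variants - discarded
print(final_callset, sum(by_type.values()))
for kind, n in by_type.items():
    print(f"{kind}: {100 * n / final_callset:.2f}%")
```

The reported counts are internally consistent: raw minus discarded variants equals the sum of SNPs, insertions, and deletions, and SNPs make up about 90% of the final callset.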

SNP density was considerably lower than the threshold of 1 variant per 10 bases proposed for minimizing false-positive calls [11]. In particular, the overall SNP density in our callset was 1 variant per 111 bases, indicating a low likelihood of false-positive variants in the final callset. Moreover, the SNP density of our callset was similar to or lower than that reported in other studies [12,13,14]. Differences in SNP density can be attributed to biological factors, such as genomic differences between wild sheep populations, sheep landraces, and improved sheep breeds, as well as to methodological limitations and the technical parameters used for variant calling and filtering. This also applied to our dataset, since different threshold values resulted in different SNP densities (Table S1). Specifically, because only one individual per breed, and a small total number of individuals, were included in our dataset, we opted to err on the side of sensitivity rather than specificity, in order to capture as many true positive variants as possible. This choice may have inflated the number of variants reported in our callset, and further validation is required before these variants are used in downstream applications. Suggested approaches for improving the callset's confidence and minimizing false positives include: including more samples in the dataset, both per breed and in total; employing a separate, additional variant caller to confirm HaplotypeCaller's results [15]; using high-confidence, validated variants as reference variants during variant filtering, combined with more stringent filtering thresholds; and experimentally validating representative variants through PCR or genotyping microarrays.

In our callset, the mean SNP density per 1 Kb window in the autosomes ranged from 8.4 ± 5.9 variants (chromosome 13) to 10.3 ± 6.6 variants (chromosome 25). Greater variability was observed for the remaining sequences, with SNP density values ranging from 0.2 ± 0.6 to 18.8 ± 5.3 (Table S2). This variability in SNP density may reflect differences in mutation rate among genomic regions, which are associated with regional characteristics such as chromatin structure, nucleosome positioning, base composition, gene density, and recombination rate [16,17]. Moreover, sample missingness was zero for all samples, further confirming the callset's high quality.
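Windowed SNP density, as summarized per chromosome or scaffold in Table S2, can be sketched as counting variants in non-overlapping 1 Kb bins and reporting the mean and SD of those counts. This is a minimal illustration under an assumption the text does not state: only windows containing at least one variant are averaged (including empty windows would lower the mean). The toy positions and sequence names are hypothetical. For orientation, a mean of about 9 variants per 1 Kb window corresponds to roughly 1 variant per 111 bases.

```python
from collections import Counter
from statistics import mean, stdev

WINDOW = 1_000  # 1 Kb windows, as in the text


def window_counts(positions_by_chrom, window=WINDOW):
    """Count variants in non-overlapping windows, per chromosome or scaffold.

    Assumption: only windows holding at least one variant are kept; counting
    empty windows as zeros would lower the mean.
    """
    result = {}
    for chrom, positions in positions_by_chrom.items():
        bins = Counter((pos - 1) // window for pos in positions)  # VCF POS is 1-based
        result[chrom] = [bins[b] for b in sorted(bins)]
    return result


def summarize(counts):
    """Mean and SD of per-window counts; SD is undefined for a single window."""
    sd = stdev(counts) if len(counts) > 1 else None
    return mean(counts), sd


def bases_per_variant(mean_per_window, window=WINDOW):
    """Express a mean windowed count as '1 variant per N bases'."""
    return window / mean_per_window


# Hypothetical toy positions, not taken from the dataset.
toy = {
    "chr_a": [5, 120, 480, 999, 1_001, 1_500, 2_700],
    "scaffold_b": [10, 20],
}
for chrom, counts in window_counts(toy).items():
    m, sd = summarize(counts)
    print(chrom, counts, round(m, 2), sd, f"1 per {bases_per_variant(m):.0f} bases")
```

The single-window case for `scaffold_b` mirrors the note in Table S2: when all variants of a scaffold fall in one window, no standard deviation can be computed.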

2.3. Variants’ Annotation

In the final callset, the vast majority of variants were located in non-coding regions, primarily in intronic (61.27%) and intergenic (26.11%) regions, as well as upstream (4.99%) and downstream (5.09%) of genes. In contrast, only a small fraction of variants (1.32%) was detected in exonic regions. The transition to transversion (Ti/Tv) ratio, a quality indicator for SNP calling [11], was 2.449 for our callset, ranging from 2.448 to 2.452 across the individual samples representing the six Greek sheep breeds (Table 2). These values are comparable to those reported in previous studies of various sheep breeds [18,19,20]. In addition, the heterozygous to non-reference homozygous (Het/Hom) SNP ratio, another indicator of SNP quality [11], ranged from 1.15 to 1.74 (1.49 ± 0.21) across the samples (Table 2), in accordance with previously published research [19].
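The two quality ratios can be made concrete with a short sketch. Transitions are purine-purine (A/G) or pyrimidine-pyrimidine (C/T) substitutions, and all other single-base substitutions are transversions; the Het/Hom ratio compares heterozygous calls with homozygous non-reference calls for a sample. This is an illustration of the definitions, not the VCFtools implementation; the toy SNPs and genotypes are hypothetical, while the per-sample Het/Hom values are taken from Table 2.

```python
from statistics import mean, stdev

TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine-purine, pyrimidine-pyrimidine


def is_transition(ref: str, alt: str) -> bool:
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def ti_tv(snps):
    """Ti/Tv ratio for a list of biallelic (REF, ALT) single-base substitutions."""
    ti = sum(is_transition(ref, alt) for ref, alt in snps)
    tv = len(snps) - ti
    return ti / tv


def het_hom(genotypes):
    """Heterozygous : non-reference homozygous ratio for one sample's GT strings.

    Missing ('./.') and homozygous-reference ('0/0') calls are skipped.
    """
    het = hom_alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles or set(alleles) == {"0"}:
            continue
        if len(set(alleles)) > 1:
            het += 1
        else:
            hom_alt += 1
    return het / hom_alt


# Hypothetical toy inputs, not taken from the dataset.
toy_snps = [("A", "G"), ("G", "A"), ("C", "T"), ("T", "C"), ("A", "C")]
toy_gts = ["0/1", "0|1", "1/0", "1/1", "0/0", "./."]
print(ti_tv(toy_snps), het_hom(toy_gts))

# Per-sample Het/Hom ratios from Table 2, summarized as in the text.
HET_HOM = {"Chios": 1.40, "Kalarritiko": 1.65, "Karagouniko": 1.15,
           "Lesvos": 1.74, "Serres": 1.48, "Thraki": 1.51}
print(f"{mean(HET_HOM.values()):.2f} ± {stdev(HET_HOM.values()):.2f}")
```

Summarizing the Table 2 values with the sample SD reproduces the 1.49 ± 0.21 reported above.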

3. Methods

3.1. Sample Collection and DNA Extraction

Samples were collected from six animals belonging to six different Greek sheep breeds, namely "Chios", "Kalarritiko", "Karagouniko", "Lesvos", "Serres", and "Thraki". The animals were selected based on their origin and phenotypic characteristics, which identified them as purebred representatives of their respective breeds. Blood samples were collected from the jugular vein into tubes containing EDTA as an anticoagulant and stored at −20 °C until further processing. DNA was extracted from 200 μL of whole blood using the NucleoSpin Blood QuickPure kit (MACHEREY-NAGEL, Düren, Germany), according to the manufacturer's instructions. Isolated DNA was quantified on a Qubit 4 Fluorometer using the Qubit™ dsDNA BR Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA), and its quality was assessed through 1% agarose gel electrophoresis.

3.2. Library Construction and Sequencing

Whole genome libraries were constructed using the Illumina® Nextera DNA Flex kit (Illumina Inc., San Diego, CA, USA), according to the manufacturer's instructions. Libraries were purified with AMPure XP Beads (Beckman Coulter, Brea, CA, USA). Library concentration was measured on a Qubit 4 Fluorometer with the Qubit™ dsDNA BR Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA), and library size and quality were assessed on a 5200 Fragment Analyzer system (Agilent Technologies Inc., Santa Clara, CA, USA) using the DNF-915-K0500 kit. For molarity calculation, libraries were quantified on a Rotor-Gene Q real-time PCR system (Qiagen, Hilden, Germany) using the KAPA Library Quantification kit for Illumina sequencing platforms (KAPA BIOSYSTEMS, Wilmington, MA, USA). Sequencing was performed on a NovaSeq 6000 platform (Illumina Inc., San Diego, CA, USA) using the NovaSeq 6000 S2 Reagent Kit v1.5 (300 cycles).
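The relation between a library's mass concentration, its mean fragment size, and its molarity can be illustrated with the common dsDNA conversion using an average mass of 660 g/mol per base pair. Note that the study determined molarity by qPCR with the KAPA kit; this mass-based conversion is shown only as a general illustration, and the concentration, fragment size, and target values below are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair, a commonly used average for dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert dsDNA mass concentration to molarity.

    ng/uL equals mg/L, so M = (conc * 1e-3 g/L) / (660 * bp g/mol);
    multiplying by 1e9 to get nM gives conc * 1e6 / (660 * bp).
    """
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def dilution_volume_ul(stock_nM: float, target_nM: float, final_volume_ul: float) -> float:
    """Stock volume needed for a target concentration (C1 * V1 = C2 * V2)."""
    if target_nM > stock_nM:
        raise ValueError("target concentration exceeds stock concentration")
    return target_nM * final_volume_ul / stock_nM


# Hypothetical example values, not measurements from this study.
stock = ng_per_ul_to_nM(10.0, 500)
print(f"{stock:.2f} nM")
print(f"{dilution_volume_ul(stock, 2.0, 100.0):.2f} uL of library per 100 uL at 2 nM")
```

The example shows why fragment size matters: the same 10 ng/μL library is about 30 nM at 500 bp but would be half that at 1000 bp.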

3.3. Data Preprocessing and Variant Discovery

Raw paired-end sequences were quality checked using FastQC (v0.11.7) [21] and MultiQC (v1.11) [22]. Low-quality reads (Q-score < 34), adapter sequences, unidentified nucleotides (N), very short sequences (length < 20 bases), and poly-G sequences were removed using TrimGalore (v0.6.7) [23]. Quality-filtered sequences were aligned against the Ovis aries reference genome ARS-UI_Ramb_v3.0 (GenBank assembly accession number: GCA_016772045.2), and duplicate sequences were marked, with the "fq2bam" function of Clara Parabricks v4.4.0 (NVIDIA, Santa Clara, CA, USA) [24]. The UQ tag was added to the alignments using the "SetNmMdAndUqTags" function of Picard (v3.3.0) [25]. Subsequently, Base Quality Score Recalibration (BQSR) was performed with Parabricks functions to correct for systematic technical errors in base quality scores. Specifically, the BQSR recalibration model was built with the "bqsr" function, using the variants available for the O. aries genome in the ENSEMBL (release 113) [26] and EVA (release 6) [27] databases as known sites, and was applied to the data using the "applybqsr" function. These steps were repeated until convergence was achieved. Next, the Parabricks implementation of HaplotypeCaller [15] was used to calculate genotype likelihoods and produce a GVCF file for each sample. The individual GVCF files were imported into a GenomicsDB workspace using the "GenomicsDBImport" tool of GATK (v4.6.1.0) [28], and joint genotyping was performed on all samples with the "GenotypeGVCFs" tool, producing a single VCF file containing the raw variants.
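The final joint-genotyping step can be sketched as two GATK command lines: GenomicsDBImport consolidates the per-sample GVCFs into a workspace, and GenotypeGVCFs reads that workspace through a `gendb://` URI to emit one multi-sample VCF. This is a minimal dry-run sketch, not the authors' pipeline script. All file names, the workspace name, and the interval list are hypothetical; only the sample names come from the text, and the upstream Parabricks and Picard steps are not reproduced.

```python
import shlex
from pathlib import Path


def genomicsdb_import_cmd(gvcfs, workspace, intervals):
    """Build a GATK GenomicsDBImport command that merges per-sample GVCFs."""
    cmd = ["gatk", "GenomicsDBImport",
           "--genomicsdb-workspace-path", str(workspace),
           "-L", str(intervals)]
    for gvcf in gvcfs:
        cmd += ["-V", str(gvcf)]
    return cmd


def genotype_gvcfs_cmd(reference, workspace, out_vcf):
    """Build a GATK GenotypeGVCFs command that joint-genotypes the workspace."""
    return ["gatk", "GenotypeGVCFs",
            "-R", str(reference),
            "-V", f"gendb://{workspace}",
            "-O", str(out_vcf)]


# Hypothetical file names; only the sample names come from the text.
SAMPLES = ["Chios", "Kalarritiko", "Karagouniko", "Lesvos", "Serres", "Thraki"]
gvcfs = [Path(f"{s}.g.vcf.gz") for s in SAMPLES]
workspace = Path("sheep_genomicsdb")

steps = [
    genomicsdb_import_cmd(gvcfs, workspace, "intervals.list"),
    genotype_gvcfs_cmd("ARS-UI_Ramb_v3.0.fa", workspace, "raw_variants.vcf.gz"),
]
for cmd in steps:
    print(shlex.join(cmd))  # dry run: print instead of subprocess.run(cmd, check=True)
```

Building the commands as argument lists keeps them safe to pass to `subprocess.run` without shell quoting issues; GenomicsDBImport expects the workspace directory not to exist yet.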

3.4. Variant Filtering

The variants in the raw callset were filtered through VQSR. The VQSR model was built with the "VariantRecalibrator" tool from GATK, using a custom training and truth set. To obtain this set, the raw variants of our callset underwent stringent filtering to produce a high-confidence variant set. In particular, SNPs with QUAL < 100.0, DP < 100.0, DP > 600.0, QD < 4.0, FS > 3.0, MQ < 55.0, SOR > 3.0, MQRankSum < −2.0, or ReadPosRankSum < −2.0 or > 2.0 were excluded, along with INDELs that did not pass the same filters, except for the MQ filter, which is not applicable to this type of variant. This process resulted in a high-confidence set of 19,962,997 variants, which was used as both the training and the truth set for VQSR. In addition, the sheep variants available in the ENSEMBL (release 113) and EVA (release 6) databases were used as known sets for VQSR. The VQSR model was applied to the raw callset with the "ApplyVQSR" tool. After VQSR, variants falling outside the 99.0% truth sensitivity threshold were removed from the callset. Further filtering removed monomorphic and multiallelic variants, variants with a coverage greater than 350X across all samples, and INDELs longer than 50 bp. Variant missingness was calculated using VCFtools (v0.1.16) [29], and variants with missingness > 0.1 were excluded.
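The hard filters used to build the training and truth set, and the post-VQSR site filters, can be expressed as simple predicates over a parsed VCF record. The sketch below encodes the thresholds stated in the text; everything else is an assumption for illustration: a missing annotation (for example, MQRankSum at a site with no heterozygous carriers) is treated as not failing, "monomorphic" is taken to mean that no sample carries the ALT allele, and records are assumed to be parsed already into plain Python values.

```python
from typing import Mapping, Sequence


def _below(info: Mapping[str, float], key: str, threshold: float) -> bool:
    value = info.get(key)
    return value is not None and value < threshold  # missing annotation -> not failed


def _above(info: Mapping[str, float], key: str, threshold: float) -> bool:
    value = info.get(key)
    return value is not None and value > threshold


def fails_training_filters(qual: float, info: Mapping[str, float], is_snp: bool) -> bool:
    """Thresholds listed in the text for building the VQSR training/truth set."""
    failed = (
        qual < 100.0
        or _below(info, "DP", 100.0) or _above(info, "DP", 600.0)
        or _below(info, "QD", 4.0)
        or _above(info, "FS", 3.0)
        or _above(info, "SOR", 3.0)
        or _below(info, "MQRankSum", -2.0)
        or _below(info, "ReadPosRankSum", -2.0) or _above(info, "ReadPosRankSum", 2.0)
    )
    # The MQ filter is applied to SNPs only, as stated in the text.
    return failed or (is_snp and _below(info, "MQ", 55.0))


def missingness(genotypes: Sequence[str]) -> float:
    """Fraction of samples whose GT contains a missing allele ('.')."""
    missing = sum("." in gt.replace("|", "/").split("/") for gt in genotypes)
    return missing / len(genotypes)


def passes_post_vqsr(ref: str, alts: Sequence[str], genotypes: Sequence[str],
                     total_dp: int) -> bool:
    """Post-VQSR filters described in the text, applied to one site."""
    if len(alts) != 1:                      # multiallelic
        return False
    alleles = [a for gt in genotypes for a in gt.replace("|", "/").split("/")]
    if "1" not in alleles:                  # monomorphic (assumed: no ALT carrier)
        return False
    if total_dp > 350:                      # combined depth across all samples
        return False
    if abs(len(alts[0]) - len(ref)) > 50:   # INDEL longer than 50 bp
        return False
    return missingness(genotypes) <= 0.1


site_gts = ["0/1", "1/1", "0/0", "0|1", "0/0", "0/1"]  # hypothetical genotypes
print(passes_post_vqsr("A", ["G"], site_gts, total_dp=180))
```

With six samples, a single missing genotype already gives a missingness of 1/6 (about 0.17), so the > 0.1 threshold effectively retains only sites genotyped in all six animals.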

The variants in the final callset were annotated using SnpEff (v5.2f) [30]. To assess variant quality, variant depth, SNP density in 1 Kb windows, and the Ti/Tv ratio were calculated using VCFtools (v0.1.16). Statistical calculations and visualizations were performed in the R programming language (v4.3.3) [31] using the ggplot2 package (v3.5.1) [32].

3.5. Data Availability

The data generated and presented in this study are openly available at the Sequence Read Archive (SRA), under BioProject ID PRJNA1246525.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/data10050075/s1, Table S1: Different thresholds tested for the creation of the Variant Quality Score Recalibration (VQSR) training and truth set; Table S2: SNP density per chromosome or scaffold for the final callset. In the first column the chromosome number or scaffold ID is presented and in the second column the mean (± s.d.) SNP density in 1 Kb windows is presented for each chromosome and scaffold. In scaffolds for which no standard deviation is reported, all the variants were located in the same 1 Kb window, rendering the calculation of standard deviation infeasible.

Author Contributions

Conceptualization, S.M. and A.A.; methodology, A.T.; software, A.T.; validation, A.T.; formal analysis, A.T.; investigation, A.T., E.P., M.K., G.T. and S.M.; resources, E.P. and A.A.; data curation, A.T.; writing—original draft preparation, A.T.; writing—review and editing, E.P., M.K., G.T., A.A. and S.M.; visualization, A.T.; supervision, A.A. and S.M.; project administration, A.A. and S.M.; funding acquisition, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Rural Development Policy Program 2014–2020 and the European Innovation Partnership for Agricultural Productivity and Sustainability (EIP-AGRI) for the project Gen-Sheep Milk—“Utilization of genetic material of breeds and farming systems to create products with increased added value from sheep’s milk”, grant agreement Μ16ΣΥΝ2-00069.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw WGS data are openly available at NCBI’s Sequence Read Archive, under BioProject ID PRJNA1246525.

Acknowledgments

The computational resources were granted with the support of GRNET. We thank PCG International for their support on IT services. All individuals included in this section have consented to the acknowledgement.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. EUROSTAT. Available online: https://ec.europa.eu/eurostat/ (accessed on 31 March 2025).
2. Ligda, C.; Altarayrah, J.; Georgoudis, A. Genetic Analysis of Greek Sheep Breeds Using Microsatellite Markers for Setting Conservation Priorities. Small Rumin. Res. 2009, 83, 42–48.
3. Sossidou, E.; Ligda, C.; Mastranestasis, I.; Tsiokos, D.; Samartzi, F. Sheep and Goat Farming in Greece: Implications and Challenges for the Sustainable Development of Less Favoured Areas. Sci. Pap. Anim. Sci. Biotechnol. 2013, 46, 446–449.
4. Loukovitis, D.; Siasiou, A.; Mitsopoulos, I.; Lymberopoulos, A.G.; Laga, V.; Chatziplis, D. Genetic Diversity of Greek Sheep Breeds and Transhumant Populations Utilizing Microsatellite Markers. Small Rumin. Res. 2016, 136, 238–242.
5. Billinis, C.; Psychas, V.; Leontides, L.; Spyrou, V.; Argyroudis, S.; Vlemmas, I.; Leontides, S.; Sklaviadis, T.; Papadopoulos, O. Prion Protein Gene Polymorphisms in Healthy and Scrapie-Affected Sheep in Greece. J. Gen. Virol. 2004, 85, 547–554.
6. Triantaphyllopoulos, K.A.; Koutsouli, P.; Kandris, A.; Papachristou, D.; Markopoulou, K.E.; Mataragka, A.; Massouras, T.; Bizelis, I. Effect of β-Lactoglobulin Gene Polymorphism, Lactation Stage and Breed on Milk Traits in Chios and Karagouniko Sheep Breeds. Ann. Anim. Sci. 2017, 17, 371–384.
7. Michailidou, S.; Tsangaris, G.; Fthenakis, G.C.; Tzora, A.; Skoufos, I.; Karkabounas, S.C.; Banos, G.; Argiriou, A.; Arsenos, G. Genomic Diversity and Population Structure of Three Autochthonous Greek Sheep Breeds Assessed with Genome-Wide DNA Arrays. Mol. Genet. Genom. 2018, 293, 753–768.
8. Georgatou, S.; Papachristou, D.; Medugorac, I.; Laliotis, G.; Kassinis, N.; Bizelis, I.; Koutsouli, P. Phenotypic traits, diversity levels and genetic relationships of Cretan sheep breeds. Agrofor Int. J. 2024, 9, 68–76.
9. Uffelmann, E.; Huang, Q.Q.; Munung, N.S.; de Vries, J.; Okada, Y.; Martin, A.R.; Martin, H.C.; Lappalainen, T.; Posthuma, D. Genome-Wide Association Studies. Nat. Rev. Methods Primers 2021, 1, 59.
10. Jiang, Y.; Xie, M.; Chen, W.; Talbot, R.; Maddox, J.F.; Faraut, T.; Wu, C.; Muzny, D.M.; Li, Y.; Zhang, W.; et al. The Sheep Genome Illuminates Biology of the Rumen and Lipid Metabolism. Science 2014, 344, 1168–1173.
11. Guo, Y.; Ye, F.; Sheng, Q.; Clark, T.; Samuels, D.C. Three-Stage Quality Control Strategies for DNA Re-Sequencing Data. Brief. Bioinform. 2013, 15, 879–889.
12. Lv, F.H.; Cao, Y.H.; Liu, G.J.; Luo, L.Y.; Lu, R.; Liu, M.J.; Li, W.R.; Zhou, P.; Wang, X.H.; Shen, M.; et al. Whole-Genome Resequencing of Worldwide Wild and Domestic Sheep Elucidates Genetic Diversity, Introgression, and Agronomically Important Loci. Mol. Biol. Evol. 2022, 39, msab353.
13. Sun, X.; Guo, J.; Li, R.; Zhang, H.; Zhang, Y.; Liu, G.E.; Emu, Q.; Zhang, H. Whole-Genome Resequencing Reveals Genetic Diversity and Wool Trait-Related Genes in Liangshan Semi-Fine-Wool Sheep. Animals 2024, 14, 444.
14. Li, X.; Yang, J.; Shen, M.; Xie, X.L.; Liu, G.J.; Xu, Y.X.; Lv, F.H.; Yang, H.; Yang, Y.L.; Liu, C.B.; et al. Whole-Genome Resequencing of Wild and Domestic Sheep Identifies Genes Associated with Morphological and Agronomic Traits. Nat. Commun. 2020, 11, 2815.
15. Poplin, R.; Ruano-Rubio, V.; DePristo, M.A.; Fennell, T.J.; Carneiro, M.O.; Van der Auwera, G.A.; Kling, D.E.; Gauthier, L.D.; Levy-Moonshine, A.; Roazen, D.; et al. Scaling Accurate Genetic Variant Discovery to Tens of Thousands of Samples. bioRxiv 2018, preprint.
16. Baer, C.F.; Miyamoto, M.M.; Denver, D.R. Mutation Rate Variation in Multicellular Eukaryotes: Causes and Consequences. Nat. Rev. Genet. 2007, 8, 619–631.
17. Nishant, K.T.; Singh, N.D.; Alani, E. Genomic Mutation Rates: What High-Throughput Methods Can Tell Us. BioEssays 2009, 31, 912–920.
18. Tian, D.; Han, B.; Li, X.; Liu, D.; Zhou, B.; Zhao, C.; Zhang, N.; Wang, L.; Pei, Q.; Zhao, K. Genetic Diversity and Selection of Tibetan Sheep Breeds Revealed by Whole-Genome Resequencing. Anim. Biosci. 2023, 36, 991–1002.
19. Amane, A.; Belay, G.; Tijjani, A.; Dessie, T.; Musa, H.H.; Hanotte, O. Genome-Wide Genetic Diversity and Population Structure of Local Sudanese Sheep Populations Revealed by Whole-Genome Sequencing. Diversity 2022, 14, 895.
20. Yi, W.; Hu, M.; Shi, L.; Li, T.; Bai, C.; Sun, F.; Ma, H.; Zhao, Z.; Yan, S. Whole Genome Sequencing Identified Genomic Diversity and Candidated Genes Associated with Economic Traits in Northeasern Merino in China. Front. Genet. 2024, 15, 1302222.
21. Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data. Available online: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (accessed on 6 February 2025).
22. Ewels, P.; Magnusson, M.; Lundin, S.; Käller, M. MultiQC: Summarize Analysis Results for Multiple Tools and Samples in a Single Report. Bioinformatics 2016, 32, 3047–3048.
23. Krueger, F.; James, F.; Ewels, P.; Afyounian, E.; Schuster-Boeckler, B. TrimGalore. Available online: https://github.com/FelixKrueger/TrimGalore (accessed on 7 February 2025).
24. NVIDIA Clara™ Parabricks 2025. Available online: https://www.nvidia.com/en-us/clara/genomics/ (accessed on 11 February 2025).
25. Broad Institute. Picard Toolkit. 2019. Available online: https://github.com/broadinstitute/picard (accessed on 17 February 2025).
26. Dyer, S.C.; Austine-Orimoloye, O.; Azov, A.G.; Barba, M.; Barnes, I.; Barrera-Enriquez, V.P.; Becker, A.; Bennett, R.; Beracochea, M.; Berry, A.; et al. Ensembl 2025. Nucleic Acids Res. 2025, 53, 948–957.
27. European Variation Archive (EVA) Database. Available online: https://www.ebi.ac.uk/eva/ (accessed on 20 March 2025).
28. Van der Auwera, G.A.; O'Connor, B.D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra, 1st ed.; O'Reilly Media, Inc.: Sebastopol, CA, USA, 2020.
29. Danecek, P.; Auton, A.; Abecasis, G.; Albers, C.A.; Banks, E.; DePristo, M.A.; Handsaker, R.E.; Lunter, G.; Marth, G.T.; Sherry, S.T.; et al. The Variant Call Format and VCFtools. Bioinformatics 2011, 27, 2156–2158.
30. Cingolani, P.; Platts, A.; Wang, L.L.; Coon, M.; Nguyen, T.; Wang, L.; Land, S.J.; Lu, X.; Ruden, D.M. A Program for Annotating and Predicting the Effects of Single Nucleotide Polymorphisms, SnpEff: SNPs in the Genome of Drosophila Melanogaster Strain W1118. Fly 2012, 6, 80–92.
31. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
32. Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer: New York, NY, USA, 2016; ISBN 978-3-319-24277-4. Available online: https://ggplot2.tidyverse.org (accessed on 7 March 2025).

Figure 1. Variant distribution and depth for the final callset. (A) Variant distribution in the chromosomes and the unplaced scaffolds in the Ovis aries genome. Numbers 1–26: autosomal chromosomes 1–26, S: sex chromosomes, U: unplaced scaffolds. (B) Distribution of variants’ depth across all samples for the final callset.

Table 1. Read metrics for the six samples. The sequencing depth, the number of raw and filtered reads, the respective percentage of reads with Q-score above 30, and the percentage of reads aligned against the reference genome for each sample are presented.

Sample	Sequencing Depth (X)	No. of Raw Reads	Raw Reads with Q-Score > 30 (%)	No. of Filtered Reads	Filtered Reads with Q-Score > 30 (%)	Alignment Rate (%)
Chios	41.4	372,243,261	93.49	361,941,831	94.80	99.86
Kalarritiko	35.3	317,860,409	93.18	308,121,356	94.57	99.80
Karagouniko	33.3	299,574,996	93.26	290,336,550	94.72	99.86
Lesvos	35.4	318,636,312	93.44	309,470,068	94.78	99.87
Serres	37.2	334,823,282	93.92	326,127,757	95.17	99.85
Thraki	38.6	347,585,658	93.37	337,203,717	94.75	99.85

Table 2. Variant annotation metrics for the six samples. The transition to transversion (Ti/Tv) ratio and the heterozygous to non-reference homozygous (Het/Hom) SNP ratio for each sample are presented.

Sample	Ti/Tv Ratio	Het/Hom Ratio
Chios	2.450	1.40
Kalarritiko	2.448	1.65
Karagouniko	2.450	1.15
Lesvos	2.448	1.74
Serres	2.448	1.48
Thraki	2.452	1.51

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
