Looking in the Scaffold 22 Hotspot for Differentially Regulated Genes Genomic Sequence Variation in Romanian Blueberry Cultivars

: Since its domestication about a century ago in North America, highbush blueberry ( Vac-cinium corymbosum L.) has gained appreciation by consumers worldwide


Introduction
Highbush blueberries (V.corymbosum), a perennial species belonging to the Ericaceae family, is native to North America and was domesticated at the end of 19th century in New England and Florida [1].Over time, blueberry cultivation spread to the rest of the USA until the 1940s, to Europe in the 1970s, to Australia, New Zeeland and South America in the 1980s, and to China in the 2000s [2].The species is now cultivated worldwide.In Romania, highbush blueberries were introduced in 1968 in Arges , county [3].Subsequently, a breeding program was established in 1982 for highbush blueberries at the Research Institute for Fruit Growing Pites , ti, resulting in the creation of nine Romanian varieties: 'Safir' and 'Azur' (1998), 'Augusta' (1999), 'Simultan' and 'Delicia' (2001), 'Lax' and 'Compact ' (2002), and 'Vital' and 'Prod ' (2008) [4].Moreover, after establishing a highbush blueberry collection and organizing yearly tasting sessions since 2016 with more than 60 blueberry varieties, the University of Agronomic Sciences and Veterinary Medicine of Bucharest started its own blueberry breeding program in 2021 [5,6].
One of the main reasons for cultivating highbush blueberries is that the fruits are considered a superfood, having nutraceutical qualities [7].Blueberries contain high quantities of fiber, vitamins, minerals, secondary metabolites, and polyphenols such as anthocyanins and flavonols [8].Polyphenols have been proven to have a positive role in the prevention of diseases like cancer, diabetes, cardiovascular disease, etc., [9][10][11][12].
In light of the growing demand for blueberries worldwide, the everchanging demand by consumers for different fruit colors, taste, flavors, together with climate change and the need for varieties resistant to biotic and abiotic stress, the pressure for creating new varieties resistant to environmental factors that will also satisfy the customers' demands is ever-increasing [13][14][15][16].One way to hasten the process of creating novel varieties is to employ molecular biology techniques such as the next-generation sequencing (NGS) of whole genomes [17][18][19][20].Once a genome for a species has been fully sequenced and annotated, it serves as a reference genome.The resequencing of other genotypes from the same species leads to the detection of variations at the molecular level.The variations translate into phenotypic differences such as resistance to various environmental stress factors, and fruit-quality traits [21][22][23].
By 2012, Pasaniuc et al. [24] demonstrated that genome-wide associated studies (GWAS) with ultra-low coverage combined with genotype calling, using reference genomes from the 1000 Genomes Project [25], can use larger sample sizes at the same price as GWAS combined with SNP arrays, although the approach is limited to common variants [24].Low-coverage resequencing has been used in crop species, such as wheat and passion fruit, and medicinal plants, such as milkweed, balloon flower and bonnet bellflower, not only to assess genetic diversity, and also to identify SNPs for use as markers for various traits of interest [26][27][28][29][30].
Although the price for whole-genome sequencing has decreased considerably, allowing for the sequencing of non-model plants [31][32][33], the question of which level of coverage depth is needed to obtain results useful for plant breeding for a particular species is still open.
The present study aimed to assess the genetic variability of eight V. corymbosum genotypes based on single-nucleotide polymorphism (SNP), insertion/deletion (InDel), structural variations (SV) and copy number variation (CNV) data from genome-level resequencing.It also aimed to determine if the whole-genome resequencing of V. corymbosum varieties with an average of 10× attempted coverage will yield enough data to breed new blueberry varieties with increased desirable fruit qualities.In this regard, the first ~10 Mb from scaffold 22, because it appears as a hotspot of genetic variation for both SNPs and InDels, was analyzed for the presence of genes differentially regulated throughout fruit growth and development.

Plant Material
In this study, seven Romanian blueberry varieties, 'Prod', 'Vital', 'Azur', 'Simultan', 'Delicia', 'Compact', and 'Safir', and a foreign variety, 'Bluecrop', were used.The blueberry varieties were cultivated in the orchard collection of the Horticulture Faculty, University of Agronomic Sciences and Veterinary Medicine of Bucharest, Romania.The origin and several characteristics of the Romanian varieties are presented in Table 1 [34].

DNA Extraction
Genomic DNA was extracted from young leaves using the Innupure C16 extraction system (Analitik Jena GmbH, Jena, Germany) and the InnuPREP Plant DNA I kit (Analitik Jena GmbH, Jena, Germany), following the manufacturer's instructions.Briefly, in a preliminary external lysis step, after grinding it to powder using liquid nitrogen, the plant material was homogenized with the SLS lysis solution, proteinase K, and RNase A solution.Thereafter, the extraction proceeded with the automated DNA extraction.DNA was quantified using a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA).
SNP and InDel variations were detected with SAMtools software [36] with the 'mpileup -m 2 -F 0.002 -d 1000' parameter; SVs were detected using the BreakDancer software [37]; CNVs were detected using the CNVnator software [38] with the '-call 100' parameter.All variations were annotated using ANNOVAR software [39].To reduce the error rate in the SNP and InDel detection, the results were filtered with the following filter conditions: first, the number of support reads for each SNP/InDel was higher than 4; second, the mapping quality of each SNP/InDel was higher than 20.Structural variations were filtered by removing those with less than 2 supporting pair-end reads.
The SNPs and InDels were located in the following regions within the genomes: upstream (within 1 kb upstream away from transcription start site of the gene), exonic (in the exonic region), intronic (in the intronic region), splicing (in the splicing site, within a 2 bp range of the intron/exon boundary), downstream (within 1 kb downstream away from the transcription termination site of the gene region), upstream/downstream (within the less than 2 kb intergenic region, which is in 1 kb downstream or upstream of the genes), intergenic (within the more than 2 kb intergenic region), and others (in other regions).The exonic regions were further defined as nonsynonymous (single-nucleotide mutation with changes in the amino acid sequence), synonymous (single-nucleotide mutation without changing the amino acid sequence), stop gain (a nonsynonymous SNP that leads to the introduction of a stop codon at the variant site), and stop loss (a nonsynonymous SNP that leads to the removal of the stop codon at the variant site).

Scaffold 22 Sequence Analysis
The comparison of Draper genome's scaffold 22 with the V. myrtillus isolate NK2018 v1.0 genome sequence was conducted with the Synteny Viewer tool on the GDV website that used the Tripal Synteny Viewer developed by the Fei Bioinformatics Lab from the Boyce Thompson Institute at Cornell University, and MCScanX software [40].
Expression data from the Draper v1.0 scaffold 22, the first ~10 Mb, were obtained with the Expression Heatmap tool on the GDV website (vaccinium.org),accessed on 11 September 2023, based on transcriptome data published by Colle et al. [41].
Gene location, InterPro, and Go term data were acquired with the 'Gene and transcript search' tool for each of the differentially regulated genes in the first ~10 Mb of the Draper v1.0 scaffold 22 sequence.
The analysis of the differentially regulated genes located in the first ~10 Mb of the scaffold 22 sequences of the eight genomes wholly sequenced in the present study was conducted using Genome workbench software, version 3.9.0[42].

Mapping with the Reference Genome
The mapping rates, average depth, and percentage of 1 x and 4 x coverage depth are presented in Table 2.The mapping rates, calculated as a ratio of the reference genome assembly mapped reads to the total sequenced clean reads, had values of ~97%, indicating high similarity between the genotypes under study and the reference genome.Average depths varied between 3.24 x for 'Bluecrop' and 3.98 x for 'Vital'.Coverage at least 1× varied between 65.24% for 'Bluecrop' and 70.19% for 'Vital', whereas coverage at least 4× varied between 19.25% for 'Simultan' and 28.61% for 'Vital'.Mapped reads: The number of clean reads mapped to the reference assembly, including both single-end reads and reads in pairs; Total reads: Total number of effective reads in clean data; Mapping rate: The ratio of the reference genome assembly mapped reads to the total sequenced clean reads; Average depth: The average depth of mapped reads at each site, calculated by the total number of bases in the mapped reads, divided by the size of the assembled genome; Coverage at least 1X: The percentage of the assembled genome with more than one read at each site; Coverage at least 4X: The percentage of the assembled genome with ≥4X coverage at each site.

SNP Distribution and Mutation Frequency
SNPs were distributed across the genomes, within all regions defined (upstream, exonic, intronic, splicing, downstream, upstream/downstream, intergenic, and others (Supplementary Table S2)).The number of SNPs ranged between 794,931 in 'Delicia' and 1,228,096 in 'Vital'.The highest number of SNPs in all the defined genome regions was observed in 'Vital', whereas the lowest number was observed in 'Delicia', except for synonymous and non-synonymous exonic SNPs, where the lowest number of SNPs was present in 'Bluecrop'.The ratio between the number of transitions and the number of transversions (ts/tv) was over 1.9 in all genotypes, with the lowest ratio being observed in 'Delicia', 1.956, and the highest, 1.981, in 'Simultan'.
The distribution of SNP mutation types is depicted in Figure 1.When looking at the mutation spectrum, the SNP type C:G>T:A presented the highest mutation fraction for all genotypes studied, followed by T:A>C:G, T:A>A:T, C:G>A:T, T:A>G:C and C:G>G:C.The distribution of SNP mutation types is depicted in Figure 1.When looking at the mutation spectrum, the SNP type C:G>T:A presented the highest mutation fraction for all genotypes studied, followed by T:A>C:G, T:A>A:T, C:G>A:T, T:A>G:C and C:G>G:C.S3).Like the SNPs, the highest number of InDels was observed in the 'Vital' variety for all defined regions.However, the lowest numbers of InDels varied among the genotypes, depending on the defined region.For instance, the 'Bluecrop' variety had the lowest number of InDels within the exonic regions, except for stop loss, where  S3).Like the SNPs, the highest number of InDels was observed in the 'Vital' variety for all defined regions.However, the lowest numbers of InDels varied among the genotypes, depending on the defined region.For instance, the 'Bluecrop' variety had the lowest number of InDels within the exonic regions, except for stop loss, where the lowest number of InDels was observed in the 'Simultan' variety.Outside of exonic regions, 'Delicia' presented the lowest number of InDels, except for the splicing region, where the lowest number of InDels was again observed in the 'Simultan' variety.
Within the coding sequences, the highest number of InDels was noted for the 1 bp insertion/deletion (~27%) and the decrease was inversely proportional with the InDel length (Figure 2).As expected, percentages of sequences with multiples of 3 bp were higher than the other sequences, since they do not cause frameshifts, and consequently, radically change the sequence of the translated polypeptide.The percentages of InDels with lengths above 15 bp were below 1%.

Insertions/Deletions Distribution
Similar to the SNPs, InDels were distributed in all defined regions of the genomes (Supplementary Table S3).Like the SNPs, the highest number of InDels was observed in the 'Vital' variety for all defined regions.However, the lowest numbers of InDels varied among the genotypes, depending on the defined region.For instance, the 'Bluecrop' variety had the lowest number of InDels within the exonic regions, except for stop loss, where the lowest number of InDels was observed in the 'Simultan' variety.Outside of exonic regions, 'Delicia' presented the lowest number of InDels, except for the splicing region, where the lowest number of InDels was again observed in the 'Simultan' variety.
Within the coding sequences, the highest number of InDels was noted for the 1 bp insertion/deletion (~27%) and the decrease was inversely proportional with the InDel length (Figure 2).As expected, percentages of sequences with multiples of 3 bp were higher than the other sequences, since they do not cause frameshifts, and consequently, radically change the sequence of the translated polypeptide.The percentages of InDels with lengths above 15 bp were below 1%.The highest number of InDels observed in the genomes was again noticed for the 1 bp insertion/deletion (~38%) and decreased similarly with the InDels within the coding sequence (Figure 3).The highest number of InDels observed in the genomes was again noticed for the 1 bp insertion/deletion (~38%) and decreased similarly with the InDels within the coding sequence (Figure 3).When looking at the SNP and InDel densities within the first 24 scaffolds, hotspots were observed within the same region of the scaffold for all genotypes studied (Figure 4).The widest hotspot region was observed in the first 10 Mb of the scaffold 22.When looking at the SNP and InDel densities within the first 24 scaffolds, hotspots were observed within the same region of the scaffold for all genotypes studied (Figure 4).The widest hotspot region was observed in the first 10 Mb of the scaffold 22.The distribution of the structural variations within the genomes is presented in Supplementary Table S4.Except for downstream, upstream/downstream, and intergenic regions (in the 'Safir' variety), the highest number of SVs in the genome regions was noted for the 'Vital' variety, whereas the lowest number of SVs was observed for the 'Delicia' variety.When considering SV types, the 'Vital' variety had the highest number of deletions, inversions, and inter-chromosomal translocations, while the highest number of intra-chromosomal translocations was observed for the 'Safir' variety.On the other hand, the lowest number of deletions, inversions, and inter-chromosomal translocations was noticed for the 'Delicia' variety and the lowest number of intra-chromosomal translocations was observed for the 'Simultan' variety.When looking at the SV types, the highest percentage of SVs was observed for the inter-chromosomal translocations, followed by deletions, inter-chromosomal translocations, inversions, and insertions (Figure 5).For each variety, on the y axis the scaffolds are aligned in a size-decreasing order, with scaffold 1 on top, and scaffold 24 on bottom.

Structural Variations Detection and Annotation
The distribution of the structural variations within the genomes is presented in Supplementary Table S4.Except for downstream, upstream/downstream, and intergenic regions (in the 'Safir' variety), the highest number of SVs in the genome regions was noted for the 'Vital' variety, whereas the lowest number of SVs was observed for the 'Delicia' variety.When considering SV types, the 'Vital' variety had the highest number of deletions, inversions, and inter-chromosomal translocations, while the highest number of intra-chromosomal translocations was observed for the 'Safir' variety.On the other hand, the lowest number of deletions, inversions, and inter-chromosomal translocations was noticed for the 'Delicia' variety and the lowest number of intra-chromosomal translocations was observed for the 'Simultan' variety.When looking at the SV types, the highest percentage of SVs was observed for the inter-chromosomal translocations, followed by deletions, inter-chromosomal translocations, inversions, and insertions (Figure 5).Structural variations with lengths above 1200 bp accounted for more than 40% of total SVs and were followed by SVs with lengths of 200-300 bp (~17%).The lowest percentages were observed for SVs with lengths of 0-200 bp (~0.6-1%) (Figure 6).Structural variations with lengths above 1200 bp accounted for more than 40% of total SVs and were followed by SVs with lengths of 200-300 bp (~17%).The lowest percentages were observed for SVs with lengths of 0-200 bp (~0.6-1%) (Figure 6).For each variety, on the y axis the scaffolds are aligned in a size-decreasing order, with scaffold 1 on top, and scaffold 24 on bottom.

Structural Variations Detection and Annotation
The distribution of the structural variations within the genomes is presented in Supplementary Table S4.Except for downstream, upstream/downstream, and intergenic regions (in the 'Safir' variety), the highest number of SVs in the genome regions was noted for the 'Vital' variety, whereas the lowest number of SVs was observed for the 'Delicia' variety.When considering SV types, the 'Vital' variety had the highest number of deletions, inversions, and inter-chromosomal translocations, while the highest number of intra-chromosomal translocations was observed for the 'Safir' variety.On the other hand, the lowest number of deletions, inversions, and inter-chromosomal translocations was noticed for the 'Delicia' variety and the lowest number of intra-chromosomal translocations was observed for the 'Simultan' variety.When looking at the SV types, the highest percentage of SVs was observed for the inter-chromosomal translocations, followed by deletions, inter-chromosomal translocations, inversions, and insertions (Figure 5).Structural variations with lengths above 1200 bp accounted for more than 40% of total SVs and were followed by SVs with lengths of 200-300 bp (~17%).The lowest percentages were observed for SVs with lengths of 0-200 bp (~0.6-1%) (Figure 6).

Copy Number Variations Detection and Annotation
The highest number of CNVs was observed for the 'Bluecrop' variety in the intronic, downstream, and upstream/downstream genomic regions, for the 'Vital' variety in the exonic and intergenic regions of the genomes, and in the 'Safir' variety in the upstream genomic regions.The number of duplications, between 20,878 ('Delicia') and 52,280 ('Vital'), was far higher than the number of deletions, with no deletions observed for the 'Prod', 'Vital', 'Simultan', 'Safir' and 'Bluecrop' varieties, and 9 deletions for 'Compact' variety, 16 for 'Delicia' variety and 29 for the 'Azur' variety (Supplementary Table S5).

Scaffold 22 Sequence Analysis
The block bivcdB0989 contains the scaffold 22 hotspot (V.corymbosum cv.Draper v1.0 genome sequence, VaccDscaff22_Vaccinium_corymbosum_Draper_v1: 7771-10,416,940) and matches the V. myrtillus isolate NK2018 v1.0 genome sequence Chr12_Vaccinium_myrtillus_NK2018_v1: 27,664-9,425,466.The genes present in both genomes within this block are shown in Supplementary Table S6, a number of 673 genes being present in the V. corymbosum sequence.
Using the Expression Heatmap tool on the GDV website (vaccinium.org)accessed on 11 September 2023, the expression data were obtained for these genes under the V. corymbosum cv.'Draper' tissue during fruit development, FPKM analysis, with expression data available for 459 genes (Supplementary Table S7).Data available on the GDV website were submitted by Colle et al. [41] and were collected from the following tissue samples: flower bud, flower at anthesis, leaf day, leaf night, young shoot, leaves treated with methyl jasmonate for 1 h, 8 h and 24 h, fruit development (fruit at petal fall, green fruit, pink fruit, ripe fruit), and salt-treated and untreated roots.
For the fruit development data, out of the 459 genes, 11 were upregulated more than 10-fold during fruit development (Table 3, Supplementary Table S8) and 50 were downregulated more than 10-fold (Table 4, Supplementary Table S9).

Downregulated Genes Fold Decrease
When looking at the functions of the encoded upregulated genes, three have a proteinbinding function, one has a heme-binding function, one has a nuclear-acid-binding function, and two have a serine-type carboxypeptidase activity.The rest of these genes do not have a GO (gene ontology) term.
Next, the sequences of these genes were analyzed in the sequenced blueberry varieties.Five of the genes, augustus-gene-4.24,processed-gene-12.3,augustus-gene-56.35,augustus-gene-89.30,and augustus-gene-89.31,were not completely covered by sequencing for any of the varieties analyzed.Augustus-gene-93.19 had full coverage for varieties 'Vital', 'Azur', 'Delicia', 'Compact', 'Safir', and 'Bluecrop', whereas augustus-gene-104.25 had full coverage only for the 'Vital' variety.Interestingly, the processed-gene-45.12,which was fully covered by sequencing for all varieties, had only one deletion for 'Safir' variety, a three-nucleotide sequence, GAC, at an SSR site: (GAC) 3 instead of (GAC) 4 , as opposed to the rest of the genes, which had numerous SNPs, including ones not fully covered by sequencing.Processedgene-0.20,which was also fully sequenced on all varieties, presented high variation within the first 60 nucleotides in the CDS sequence, between varieties, and within the variety at an SSR locus: (CCA) 3 to (CCA) 9 , resulting in a variation in the number of histidine residues translated.
In the case of the downregulated genes, out of the 50, 21 genes were below 1 x coverage by sequencing, 5 genes had full coverage for all varieties studied, and the rest had full coverage for at least one variety.The putative functions of the downregulated genes varied from nucleic-acid-binding to protein-binding, ATP-binding, iron-ion-binding functions, etc., (Supplementary Table S9).
VaccDscaff22-processed-gene-8.5 and VaccDscaff22-processed-gene-49.1 genes are both encoding proteins with glycosyl transferase activity.VaccDscaff22-augustus-gene-23.30encodes a nucleic acid binding protein.VaccDscaff22-processed-gene-63.2 encodes a protein with a putative role in protein ubiquitination (RING/U-box/RING-Ubox_PUB-like protein).VaccDscaff22-augustus-gene-71.34 gene is encoding a putative protein-binding protein and contains a glutathione S-transferase (GST) domain.VaccDscaff22-augustus-gene-76.24 gene does not have a GO term, but InterPro predicts that it encodes a bifunctional inhibitor/lipidtransfer protein/seed storage 2S albumin with a structural domain consisting of 4-helices with a folded-leaf topology and forms a right-handed superhelix domain present in plant lipid-transfer proteins, proteinase/alpha-amylase inhibitors, and seed storage proteins [43].In addition to the numerous SNPs present in this gene for all genotypes studied here, the 'Delicia' variety has a deletion of five nucleotides (As) within an intron.

Discussion
Next-generation sequencing techniques have made it possible to generate a tremendous amount of data from a single sequencing experiment that can be used in countless future studies.At the moment, the V. corymbosum genome has been sequenced at the scaffold level.The first to be wholly sequenced by shotgun sequencing was the diploid variety W8520, resulting in a draft genome sequence that was used to mine SSRs [44].Thereafter, the highbush blueberry tetraploid variety, Draper, was sequenced at the scaffold level [41].Lastly, data from the Vaccinium Pangenome Project were made available on the Genomic Database for the Vaccinium website (GDV, https://www.vaccinium.org/,accessed on 11 September 2023), on 22 V. corymbosum sequenced genomes at the scaffold level [45].The present study offers an analysis of variability in eight V. corymbosum genomes, investigating SNPs, InDels, SVs, and CNVs.It also 'zooms in' to one of the SNP and InDel hotspots, the first ~10 Mb of the 22nd scaffold, to identify potential genes of interest involved in fruit development that present variability at the nucleotide level among the varieties presented here.
When looking at the SNP data, predictably, the highest number of SNP types were transitions (C:G>T:A and T:A>C:G), a phenomenon observed before [46], due to mutations resulting from the deamination of methylated cytosine residues [47].Although the effective number of SNPs and InDels originate from just ~19-28% of the genome, it still indicates high numbers of mutations in regions of interest such as exonic regions.For instance, the number of non-synonymous SNPs varied between 29,881 for the 'Bluecrop' variety and 45,914 for the 'Vital' variety.Structural variations (SVs) varied between ~9700 and ~15,500 per genotype; however, it is possible that the number is higher, as high-throughput short-read sequencing does not characterize easily repetitive regions, and therefore might miss structural variations [48].
Both SNPs and InDels were detected in all defined genome regions, with hotspots with a higher density of SNPs and InDels, as observed before in another study [46].These hotspots are usually located towards the ends of the chromosomes, most probably due to their higher recombination frequency [49,50].
To date, V. corymbosum genome has been wholly sequenced at the scaffold level [44] and annotated by Gupta et al. [51].In the present study, the large hotspot identified on scaffold 22, within the first 10 Mb (Figure 4), was chosen to be studied in more detail.Using the Synteny Viewer tool on the GDV website (vaccinium.org,accessed on 11 September 2023), the scaffold 22 from the Draper genome, sequenced in 2019 by Colle et al. [41], was compared to Vaccinium myrtillus L. isolate NK2018 v1.0 genome sequence.V. myrtillus has been wholly sequenced at the chromosome level [52], and the comparison yielded the location of scaffold 22 at the chromosome level for V. corymbosum (Figure 7).The comparison identified chromosome 12 as the putative location of this scaffold.From the 11 upregulated genes, one gene wholly covered by sequencing for all varieties studied, VaccDscaff22-processed-gene-0.20, that encodes a S-adenosyl-l-methionine-dependent methyltransferase (SAM-MTase), contains SSR sites within its coding region.Sadenosyl-l-methionine-dependent methyltransferases catalyze various steps in ethylene and polyamine biosynthesis pathways, with both having roles in fruit ripening [53].For instance, in a previous study looking for the genetic basis of several blueberry fruit traits, a gene encoding a SAM-MTase from the scaffold00012 (CUFF.1480.1)was linked to fruit firmness [54].Another study characterized a SAM-MTase that catalyzes the biosynthesis of a key aroma compound in strawberry fruits [55].If differences in this gene at the nucleotide level prove to translate into notable phenotypic differences in fruits, a molecular marker developed based on the SSR site present at the beginning of the coding sequence may be used to select for the desirable trait in breeding programs.
Of the 50 downregulated genes, 5 genes were covered by sequencing for all varieties studied here, and they encode two glycosyl transferases, a RING (Really Interesting New Gene) finger protein, a GST protein, and a bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin.
In plants, glycosyl transferases are involved in cell wall polysaccharides biosynthesis, in adding N-linked glycans to glycoproteins, but also in flavonoids biosynthesis [56].It stands to reason that one or both genes encoding glycosyl transferases (VaccDscaff22- The analysis of the hotspot region from the scaffold 22 revealed which of the differentially regulated genes during fruit growth and development presented differences in the varieties sequenced in the present study.These genes should be studied in the future to check if they could be linked to various traits related to fruit quality. From the 11 upregulated genes, one gene wholly covered by sequencing for all varieties studied, VaccDscaff22-processed-gene-0.20, that encodes a S-adenosyl-l-methioninedependent methyltransferase (SAM-MTase), contains SSR sites within its coding region.S-adenosyl-l-methionine-dependent methyltransferases catalyze various steps in ethylene and polyamine biosynthesis pathways, with both having roles in fruit ripening [53].For instance, in a previous study looking for the genetic basis of several blueberry fruit traits, a gene encoding a SAM-MTase from the scaffold00012 (CUFF.1480.1)was linked to fruit firmness [54].Another study characterized a SAM-MTase that catalyzes the biosynthesis of a key aroma compound in strawberry fruits [55].If differences in this gene at the nucleotide level prove to translate into notable phenotypic differences in fruits, a molecular marker developed based on the SSR site present at the beginning of the coding sequence may be used to select for the desirable trait in breeding programs.
Of the 50 downregulated genes, 5 genes were covered by sequencing for all varieties studied here, and they encode two glycosyl transferases, a RING (Really Interesting New Gene) finger protein, a GST protein, and a bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin.
In plants, glycosyl transferases are involved in cell wall polysaccharides biosynthesis, in adding N-linked glycans to glycoproteins, but also in flavonoids biosynthesis [56].It stands to reason that one or both genes encoding glycosyl transferases (VaccDscaff22processed-gene-8.5 and VaccDscaff22-processed-gene-49.1) may have a role in anthocyanin biosynthesis during fruit ripening.
Plant RING finger proteins have been shown to play a role in stress resistance, both biotic and abiotic, signal transduction, and plant growth and development, including fruit development [57].
Even though many of the genes differentially regulated throughout fruit growth and development were below at least 1× coverage by sequencing, two upregulated and five downregulated genes that vary at the nucleotide level were covered at least 1× by sequencing for all the genotypes.
To further exemplify the potential of this type of preliminary low-coverage sequencing study, we chose one of the differentially expressed genes, VaccDscaff22-processed-gene-49.1, to be analyzed in further detail (Supplementary Table S10).This gene putatively encodes a glucosyl/glucuronosyl transferase.The minimum read depth, used as a filtering criterion for SNP detection, may be as low as three [65].Here, a minimum of five supporting reads was used to detect an SNP.Within this gene's coding region, 44 SNPs were identified, out of which 28 SNPs are silent and 16 SNPs (either homozygous or heterozygous) result in missense mutations (Figure 8).For each of the missense mutations, there are at least two varieties that are different at the SNP position, except for the SNP at the position 4,966,356, where only two varieties present heterozygous SNPs.In a subsequent study, the expression of this gene will be analyzed throughout fruit ripening in the genotypes studied here to see if there are any correlations between the DNA sequences, mRNA expression levels, and fruit traits of interest.processed-gene-8.5 and VaccDscaff22-processed-gene-49.1) may have a role in anthocyanin biosynthesis during fruit ripening.Plant RING finger proteins have been shown to play a role in stress resistance, both biotic and abiotic, signal transduction, and plant growth and development, including fruit development [57].
Even though many of the genes differentially regulated throughout fruit growth and development were below at least 1× coverage by sequencing, two upregulated and five downregulated genes that vary at the nucleotide level were covered at least 1× by sequencing for all the genotypes.
To further exemplify the potential of this type of preliminary low-coverage sequencing study, we chose one of the differentially expressed genes, VaccDscaff22-processed-gene-49.1, to be analyzed in further detail (Supplementary Table S10).This gene putatively encodes a glucosyl/glucuronosyl transferase.The minimum read depth, used as a filtering criterion for SNP detection, may be as low as three [65].Here, a minimum of five supporting reads was used to detect an SNP.Within this gene's coding region, 44 SNPs were identified, out of which 28 SNPs are silent and 16 SNPs (either homozygous or heterozygous) result in missense mutations (Figure 8).For each of the missense mutations, there are at least two varieties that are different at the SNP position, except for the SNP at the position 4,966,356, where only two varieties present heterozygous SNPs.In a subsequent study, the expression of this gene will be analyzed throughout fruit ripening in the genotypes studied here to see if there are any correlations between the DNA sequences, mRNA expression levels, and fruit traits of interest.Furthermore, many of the genes analyzed were fully covered by sequencing for at least two genotypes, making a comparison at the nucleotide level possible, as well as the identification of SNPs and InDels.These genes are encoding for nucleic-acid-binding proteins (Zn finger protein, MYB-like DNA-binding protein, homeobox-leucine zipper protein, RNA-dependent RNA polymerase), protein-binding proteins (leucine-rich repeats (LRR) proteins, pentatricopeptide repeat protein, ARM repeat protein, iron-ion-binding proteins, polygalacturonase, hydrolase, and chloroplastic protein).Furthermore, many of the genes analyzed were fully covered by sequencing for at least two genotypes, making a comparison at the nucleotide level possible, as well as the identification of SNPs and InDels.These genes are encoding for nucleic-acid-binding proteins (Zn finger protein, MYB-like DNA-binding protein, homeobox-leucine zipper protein, RNA-dependent RNA polymerase), protein-binding proteins (leucine-rich repeats (LRR) proteins, pentatricopeptide repeat protein, ARM repeat protein, iron-ion-binding proteins, polygalacturonase, hydrolase, and chloroplastic protein).
In addition to stress response, LRR proteins are also involved in fruit development in apple, strawberry [78], and peanuts [79].Pentatricopeptide repeat (PPR) proteins play roles in fruit development in watermelon [80], kiwifruit [81], pepper [82,83], and maize [84], etc.The roles of polygalacturonases and hydrolases in fruit development began to be investigated more than two decades ago [85][86][87][88][89]. Therefore, full coverage for at least two genotypes for a differentially regulated gene revealed enough differences to choose candidate fruit development genes for further study.For genes that may prove to be of interest and have small gaps, Sanger sequencing using specific primers bordering the gaps may be an alternative solution to complete the missing data.
Therefore, all these data demonstrate that even at just 10× attempted average coverage by sequencing, this study offers a treasure trove of information, providing a plethora of preliminary data to be analysed and raising more questions to be answered in future studies.

Conclusions
Consumers preferences for blueberry fruit traits such as fruit color, size, firmness, taste, and flavor shape the goals of breeding programs.Whole-genome resequencing of the genotypes analyzed in the present study led to the identification of multiple genes putatively involved in fruit ripening that are showing high variability among the varieties, in one hotspot, and in one scaffold, and one criterion (fruit growth and development).If this criterion is applied in an in-depth study at the level of the whole genome, additional genes that vary among the genotypes are bound to be found.Moreover, additional criteria can be applied if looking for other traits of interest, such as resistance to stress, either biotic or abiotic.In this respect, despite the low-medium coverage depth of sequencing, the results presented in the current work indicate that a 10× attempted coverage depth is useful if sequencing a high number of genotypes of V. corymbosum.Thus, with a relatively low budget, this low-coverage sequencing offers reliable preliminary data for further use in subsequent studies engaged in discovering novel genes linked to useful traits.The genes identified here will be analyzed further in future studies for their effect on fruit-quality traits.If these genes are found to be linked to desirable fruit traits, these should definitely be selected in blueberry breeding programs.

Figure 1 .
Figure 1.SNP mutation type distribution.3.1.4.Insertions/Deletions Distribution Similar to the SNPs, InDels were distributed in all defined regions of the genomes (Supplementary TableS3).Like the SNPs, the highest number of InDels was observed in the 'Vital' variety for all defined regions.However, the lowest numbers of InDels varied among the genotypes, depending on the defined region.For instance, the 'Bluecrop' variety had the lowest number of InDels within the exonic regions, except for stop loss, where

Figure 4 .
Figure 4. SNP and InDel densities within the scaffolds 1-24 (VaccDscaff1-VaccDscaff24).For each variety, on the y axis the scaffolds are aligned in a size-decreasing order, with scaffold 1 on top, and scaffold 24 on bottom.

Figure 4 .
Figure 4. SNP and InDel densities within the scaffolds 1-24 (VaccDscaff1-VaccDscaff24).For each variety, on the y axis the scaffolds are aligned in a size-decreasing order, with scaffold 1 on top, and scaffold 24 on bottom.

Figure 5 .
Figure 5. SV type distribution within the genome.

Figure 5 .
Figure 5. SV type distribution within the genome.

Figure 4 .
Figure 4. SNP and InDel densities within the scaffolds 1-24 (VaccDscaff1-VaccDscaff24).For each variety, on the y axis the scaffolds are aligned in a size-decreasing order, with scaffold 1 on top, and scaffold 24 on bottom.

Figure 5 .
Figure 5. SV type distribution within the genome.

Figure 6 .
Figure 6.Structural variations length distribution within the genomes.

Figure 7 .
Figure 7. Syntenic blocks between the V. corymbosum scaffold 22 and V. myrtillus genomes.The circular layout was obtained by using the Tripal Synteny Viewer from the GDV (https://www.vaccinium.org/accessed on 11 September 2023) website.

Figure 8 .
Figure 8. Snapshot of alignment of the VaccDscaff22-processed-gene-49.1 gene DNA sequences for the eight V. corymbosum genotypes in the Genome Workbench software.

Figure 8 .
Figure 8. Snapshot of alignment of the VaccDscaff22-processed-gene-49.1 gene DNA sequences for the eight V. corymbosum genotypes in the Genome Workbench software.

Table 1 .
Characteristics of Romanian blueberry varieties.

Table 2 .
Statistics of mapping rate, average depth, and coverage.

Table 3 .
Upregulated genes during fruit development.