Polymorphism of Saccharomyces cerevisiae Strains in DNA Metabolism Genes

Baker’s yeast, S. cerevisiae, is an excellent model organism exploited for molecular genetic studies of the mechanisms of genome stability in eukaryotes. Genetic peculiarities of commonly used yeast strains impact the processes of DNA replication, repair, and recombination (RRR). We compared the genomic DNA sequence variation of the five strains that are intensively used for RRR studies. We used yeast next-generation sequencing data to detect the extent and significance of variation in 183 RRR genes. We present a detailed analysis of the differences that were found even in closely related strains. Polymorphisms of common yeast strains should be considered when interpreting the outcomes of genome stability studies, especially in cases of discrepancies between laboratories describing the same phenomena.


Introduction
Molecular biology and genetic studies in the simple eukaryote Saccharomyces cerevisiae have had, and will continue to have, an immense impact on the DNA replication, repair, and recombination (RRR) field [1,2]. Yeast strains that were collected and stored by Emil Hansen in the 1880s were introduced into laboratory practice by Öjvind Winge 50 years later. These strains diverged into several groups of modern yeast strains [3,4]. The strains have similar genetic content but differ by multiple single nucleotide changes, Ty1 element distributions, and structural variations [5,6]. Evidence has accumulated that some variations affect DNA maintenance genes and might be critical for the performance of this machinery. One of the best-known classic examples is the finding that a widely used control strain, W303, carries a mutated RAD5 gene, allele rad5-535 [7]. Rad5 is a multifunctional helicase/ubiquitin ligase. The mutation alters the conserved nucleotide-binding motif 535-GXGKT to 535-RXGKT (the first residue of the motif is changed from G to R). Strains with the rad5-535 allele are slightly sensitive to the mutagen MMS and show altered genetic interactions with soh2 mutants affecting the RNA polymerase mediator complex. Rad5 is implicated in template switching during replication [8] and in translesion DNA synthesis (TLS) [9,10]. Polymorphism in RAD5 might therefore be a significant modifier of these processes. Strain yJF1, whose derivatives are used to produce proteins for in vitro reconstitution of the replication fork, is a relative of W303 and possesses this allele [11]. Variation between strains is a common reason for different rates of replication origin activation [12]. The yeast genome results from an ancestral genome duplication [13], and many genes have paralogs that are under less stringent selection and evolve rapidly. Examples are the RNR genes, whose paralogs have remained at various stages of diversification. As a result, some strains with a deletion of RNR1, the gene encoding a major subunit of RNR, are inviable (e.g., W303), but some are viable (BY4741/S288C) [14,15]. Another example is a polymorphism of strains explaining critical parameters connecting telomere length and longevity [16].
Before the next-generation sequencing (NGS) era, discrepancies in the results of DNA repair studies were attributed to unknown variations in the genetic background of the strains used. In one extreme case, it was proposed that unknown genetic differences may affect the interpretation of genetic experiments defining the arrangement of DNA polymerases at the fork [17,18]. Many classic and new yeast strains used for studies of DNA metabolism and genome stability have now been sequenced, and the genetic causes of phenotypic differences have been uncovered [19,20]. Here, we analyzed the differences in significant RRR genes among several common laboratory strains and the LAN series, which we have used to study the peculiarities of mutagenesis in diploids by a base analog and APOBEC deaminases, with an emphasis on kataegis and the hypermutable fraction of yeast cells [21–23]. The LAN series are close derivatives of the CG379 strain, which has been used for pioneering studies of the consequences of defective DNA polymerase proofreading, defects of mismatch repair (MMR), or their combination on mutation rates [24–29]. A variant of the CG379 strain called E134 [29,30] is one of the primary strains for modeling mutator phenotypes of cancer-associated mutations [31,32].

Characteristics of Strains Examined
For our comparison, we analyzed the genome of BY4742 (MATα his3-∆1 leu2-∆0 lys2-∆0 ura3-∆0), the closest relative of the first sequenced yeast strain, S288C. BY4742 and its sibling of the opposite mating type, BY4741, are widely used for various genetic studies, including genome instability analysis, because of the availability of the library of systematic deletions [33,34]. Its genealogy is described in [35]. W303-1A and W303-1B are MATa and MATα haploids, respectively, with the same genotype (ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100), originating from the W303 diploid parent [36]. W303 is likely closely related to S288C, although its history is not well defined [37]. Less related is SK1, a homothallic strain (MATa/MATα [HO can1 gal2 cup]) [38,39] that is widely used for studies of yeast metabolism and meiosis [40,41]. It is quite distant from S288C [42]. The known parts of the ancestry of LAN201 and LAN211 are summarized in Figure 1. They are derived from CG379 by a series of integration-excisions, one-step gene replacements, mating type switching by an HO-containing plasmid, and crosses.

DNA Sequence Variations in Genomes of the Yeast Strains
We compared whole genome sequencing data of the five strains (data sources are provided in Materials and Methods, Section 4.1, [22,33]) by the methodology described in Sections 4.2 and 4.3. The results, expressed as differences from the canonical S288C genome, are summarized in Table 1, left half, and illustrated in Figure 2A. The strain closest to S288C is BY4742, which has only 54 variants leading to amino acid changes, with a few variants per chromosome distributed relatively evenly (there is not a single chromosome without changes, Figure 2). The W303 and LAN strains have more differences from S288C, and the three strains are quite similar in the number of variants (around 2000). However, the pattern of mutation distribution in W303 differs from that in the LANs in many cases. The most evident examples are large blocks with around 1000 mutations per Mb at different locations in chromosomes II, VII, and XI (Figure 2). LAN201 and its autodiploid LAN211 are virtually identical, both in the pattern of mutation localization and in the number of mutations. Nevertheless, the transformation by the HO-containing plasmid and diploidization (Figure 1) were not wholly benign, and a few changes accumulated. In the following paragraphs, we refer to these strains under the umbrella name "LAN". SK1 has eight-fold more differences (Table 1). This is vividly demonstrated by a complete change of color in the panel describing this strain in Figure 2A. Figure 2B illustrates the considerable distance between SK1 and the group of the other strains.
Figure 1. [...] [43] and its derivative K65-3D (brackets in genotype are used to indicate the homozygous diploid genotype of homothallic HO strain) [44] on one side, and S288c [4] and its derivative DBY746, on the other [45]. Dr. Craig Giroux (NIEHS, USA) performed a series of backcrosses to find segregants with the desired markers. This yielded CG379, the basic strain for creating DNA polymerase mutations [24]. The lys2 allele with Tn5 insertion was introduced as described in [46], ura3-52 was replaced by the ura3-4 allele, and the strain was made diploid by HO endonuclease. Tetrad dissection gave segregant 1B-D770 with a changed mating type [30]. Next, ura3-4 was converted back to URA3 to create LAN201 [22], which gave rise to diploid LAN211 by auto-diploidization with the assistance of an HO-bearing plasmid.

DNA Sequence Variations in Genes of the RRR Panel
Next, we compiled a list of 183 genes relevant to genome stability (arranged alphabetically with a short annotation in Supplementary Table S1). The panel includes genes encoding DNA polymerases and accessory subunits; other proteins involved in DNA replication, repair, and recombination; nucleotide metabolism; chromatin remodeling; cell cycle; checkpoint; and others. We understand that the list might need to be updated because so many processes in the cell are intertwined, but we believe that we selected most of the important genes. Approximately one-third of the genes in our panel are essential for vegetative growth. Only five genes do not harbor non-synonymous or other significant changes: NTG1, RFA1, RIM1, UBC13, and HAM1. All other genes possess non-synonymous changes that are predicted (by methods described in Section 4.2) to exert a moderate to high impact on protein function, although these changes are primarily found in the genome of SK1 (Table 1, right half). The analysis of alterations in the genes of the panel revealed only 3 changes in BY4742, around 100 changes in W303 and the LANs, and more than 600 changes in SK1, averaging around 4 per gene. The information on the positions and types of changes is summarized in Supplementary Table S2 for RRR genes and in Supplementary Table S3 for all genes. Variations occur in both essential and non-essential genes.
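The tallies described above can be illustrated with a short sketch. It assumes that annotated variants are available as (strain, gene, predicted impact) records and that impact classes are labeled LOW, MODERATE, and HIGH; the record layout, the labels, the function names, and the toy rows are all assumptions made for illustration and are not taken from the Supplementary Tables or from the pipeline of Section 4.2.

```python
from collections import Counter

# Hypothetical annotated variant records: (strain, gene, predicted_impact).
# Toy rows for illustration only; they do not reproduce the paper's data.
toy_variants = [
    ("BY4742", "DPB2", "MODERATE"),
    ("W303", "DPB2", "MODERATE"),
    ("W303", "RAD5", "MODERATE"),
    ("SK1", "POL32", "MODERATE"),
    ("SK1", "POL32", "LOW"),
    ("SK1", "RDH54", "HIGH"),
]

# A small stand-in for the 183-gene RRR panel.
toy_panel = {"DPB2", "RAD5", "POL32", "RDH54", "NTG1"}


def count_significant(records, panel, keep=("MODERATE", "HIGH")):
    """Count, per strain, variants in panel genes whose impact class is kept."""
    counts = Counter()
    for strain, gene, impact in records:
        if gene in panel and impact in keep:
            counts[strain] += 1
    return counts


def unchanged_genes(records, panel, keep=("MODERATE", "HIGH")):
    """Return panel genes with no kept-impact variant in any strain."""
    hit = {gene for _, gene, impact in records if impact in keep}
    return sorted(set(panel) - hit)


def mean_per_gene(total_changes, n_genes):
    """Average number of changes per panel gene."""
    return total_changes / n_genes


counts = count_significant(toy_variants, toy_panel)
untouched = unchanged_genes(toy_variants, toy_panel)
```

With real data, `mean_per_gene` applied to a strain's total and the panel size of 183 gives the per-gene average quoted in the text.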

DNA Sequence Variations in DNA Polymerase-Related Genes
DNA replication is a major characteristic of life. Variation in the genes for the replication apparatus determines the properties of organisms. Due to our long-term interest in replication [47,48], we focused on polymorphisms of the DNA polymerase-related genes in the five strains (Figure 3). Non-synonymous changes and multiple mutations were prevalent in all the selected genes in SK1, irrespective of essentiality. In contrast, BY4742 possessed almost no differences from S288C; the only affected essential genes were PSF3, encoding a component of the GINS complex necessary for initiation of replication [49] (although its multiple changes do not lead to amino acid changes in the protein; Figure 3B, Supplementary Table S2), and DPB2, encoding the second subunit of pol ε [50]. Both genes are related to leading DNA strand replication. The LANs and W303 possess an intermediate number of variations. Table 2 lists the detected amino acid changes. Not all of the multiple changes listed in Figure 3 cause amino acid substitutions, but they can alter a gene's transcription or mRNA stability (see details of these changes in Supplementary Table S2). The changes in the LAN strains are inherited from CG379, because all of those listed in Table 2 are also present in another direct descendant of CG379, strain ySR128 (https://www.ncbi.nlm.nih.gov/nuccore/?term=ySR128, accessed on 29 March 2023), which is extensively used for studies of the mutagenic effects of APOBEC deaminases and other agents preferentially damaging ssDNA [51–53]. W303 possessed the same amino acid change in Dpb2 as BY4742 and, in addition, a change in Rad5. This finding serves as an internal control for the accuracy of our analysis because the change of Rad5 in this strain was one of the examples of how yeast polymorphisms can affect parameters of genetic processes (Introduction, [7]). No amino acid changes were found in Dpb3, Psf1-3, Pol3, and Pol30, the proteins responsible for bulk replication of the yeast genome. However, another essential component of pol ε, Dpb2, varied in all strains. Conversely, the conditionally essential Pol32 was stable in BY4742 and W303 but harbored several substitutions in LAN and SK1. Some of these substitutions were shared, suggesting common ancestry.
Figure 3. DNA polymerases and accessory protein-encoding genes: differences from S288C. Essential genes are highlighted in light blue. One gene, POL32, is highlighted in yellow because yeast with a deletion of the gene is viable at 30 °C but non-viable at 13 °C (cold sensitivity) [54]. Deleting the POL32 ortholog in the fission yeast S. pombe (cdc27) is also lethal [55]. Triangles indicate two variants of the corresponding types (color-coded) in the respective strain/gene combination. Multiple changes include upstream and downstream alterations (5′- and 3′-UTRs), synonymous changes, and missense mutations. (A) SNPs and indels: all changes, including promoter, transcription start, and termination zone sequences. (B) SNPs and indels in coding regions.

Genetic Specifics of LAN Strains
We were particularly interested in the status of RRR genes in LAN strains because of the extensive work performed by us and others with these strains or their close relatives (Introduction). Figure 4 illustrates the main differences of these strains from S288C and the other strains (details of the analysis results are in Supplementary Table S2). The figure shows which genes are changed in the LAN strains and whether the same genes are polymorphic in other strains. Only a few genes that are variable in LANs are also variable in BY4742 and W303. In contrast, most genes that are variable in LANs are also variable in SK1. The ubiquitous variants across the panel are in two non-essential genes, RDH54, involved in recombination, and HSM3, a chaperone involved in DNA mismatch repair, and in one essential gene, DPB2, discussed in Section 2.4. LAN strains carry variants in all genome stability pathways; the most prominent are in genes participating in chromatin remodeling, DSB repair, DNA polymerases, and cell cycle control. The significance of these variations remains to be determined, but the results of our study might help with interpreting and comparing the results of studies performed on different strain backgrounds.
Figure 4. [...] The analysis was performed as in Figure 3. Triangles indicate two variants of the corresponding types (color-coded) in the respective strain/gene combination.
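The comparison of variable genes between strains amounts to simple set operations. The sketch below assumes that each strain's set of variable panel genes is already known; the gene sets shown are a small illustrative subset built from genes named in the text, and the function names are hypothetical. It is not the analysis that produced Figure 4.

```python
# Illustrative subsets of variable genes per strain; not the full results
# of Supplementary Table S2.
toy_variable_genes = {
    "LAN": {"RDH54", "HSM3", "DPB2", "POL32"},
    "BY4742": {"RDH54", "HSM3", "DPB2", "PSF3"},
    "W303": {"RDH54", "HSM3", "DPB2", "RAD5"},
    "SK1": {"RDH54", "HSM3", "DPB2", "POL32", "CKS1"},
}


def shared_fraction(reference_genes, other_genes):
    """Fraction of the reference strain's variable genes also variable in the other strain."""
    ref = set(reference_genes)
    if not ref:
        return 0.0
    return len(ref & set(other_genes)) / len(ref)


def ubiquitous_genes(genes_by_strain):
    """Genes that are variable in every strain of the comparison."""
    sets = [set(genes) for genes in genes_by_strain.values()]
    return sorted(set.intersection(*sets)) if sets else []


lan_vs_sk1 = shared_fraction(toy_variable_genes["LAN"], toy_variable_genes["SK1"])
lan_vs_w303 = shared_fraction(toy_variable_genes["LAN"], toy_variable_genes["W303"])
everywhere = ubiquitous_genes(toy_variable_genes)
```

In this toy subset, the LAN genes overlap fully with SK1 and only partly with W303, which mirrors the qualitative pattern described above.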

Discussion
The genetic background of yeast strains can play an insidious role in the analysis of genome instability. In addition to the numerous known examples of how a single nucleotide change in the genome completely changes the phenotype of the gene under study (some of them are listed in the Introduction), we have our own story. Our present examination of genomes was instigated in part by our recent genome-wide analysis of yeast clones that evolved under stress imposed by a deletion of the catalytically active N-terminal half of Pol2, which was performed in LAN201 and E134 strain derivatives [57]. The deletion created in haploid strains confers a severe growth defect, but the poorly growing cells yield healthy colonies with time. NGS analysis of the DNA of these fast growers revealed many different genomic variants and recurrent mutations in the essential CDC28 gene encoding the catalytic subunit of cyclin-dependent kinase (CDK), a master regulator of the cell cycle. Genetic analysis confirmed that single nucleotide changes at several sites of the gene acquired by the healthy growers (changes L62F, L84V, L86H, and I236N) rescue the slow growth phenotype. These changes are located on the surfaces of Cdc28 that interact with cyclins and with the Cks1 subunit of CDK. Thus, combining deleterious and benign mutations in two essential genes might provide growth advantages. We did not see any significant changes in the CDC28 gene in the examined strains. However, in the SK1 strain, CKS1 has several mutations leading to non-synonymous changes that are predicted to have a moderate impact on protein function (Supplementary Table S2). The effect of these mutations may be similar to the effect of the CDC28 alleles because they affect the same CDK complex.
Our work revealed a wide diversity among yeast strains commonly used to analyze genome stability. The genetic consequences of only a few variants have been examined. Many more studies using structural modeling and genetic and biochemical approaches are needed. Our study will serve as a guidebook for such endeavors and help to better interpret past and future genetic findings.

Sources of Raw Data
The NCBI sources and references to genomic data are in Table 3.

Other Methods
SNP-based phylogenetic trees were reconstructed from whole genome sequence data using REALPHY (v.1.13) with default parameters by aligning the reads to the S288C reference sequence [66]. The Interactive Tree Of Life (https://itol.embl.de) [67] was used to display the phylogenetic tree.
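As a rough illustration of the idea behind SNP-based strain comparison, the sketch below counts pairwise differences between sequences aligned to a common reference. This is only a simple distance calculation on made-up toy sequences, not the REALPHY procedure, which infers trees from a reference-based alignment; the function names and sequences are hypothetical.

```python
from itertools import combinations

# Toy aligned SNP columns per strain; invented for illustration only.
toy_alignment = {
    "S288C": "ACGTACGT",
    "BY4742": "ACGTACGA",
    "SK1": "TCGAACTA",
}


def snp_distance(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def pairwise_distances(alignment):
    """Map each unordered strain pair (names sorted) to its SNP distance."""
    return {
        (s1, s2): snp_distance(alignment[s1], alignment[s2])
        for s1, s2 in combinations(sorted(alignment), 2)
    }


distances = pairwise_distances(toy_alignment)
```

A distance matrix of this kind could feed a simple clustering method, whereas the trees in this work come from REALPHY as stated above.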
The number of SNPs within a sliding window of 1 Mb was calculated and the SNP density plot was generated with CMplot [61]. Genomic information was visualized with CoMut [68].
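The windowed SNP count can be sketched as follows. This is a minimal Python illustration, assuming 1-based SNP positions on a single chromosome; CMplot is an R package and its internal binning may differ, so the function name, the `step` parameter, and the half-open window convention here are assumptions.

```python
from bisect import bisect_right


def snp_density(positions, chrom_length, window=1_000_000, step=1_000_000):
    """Count SNPs in windows along one chromosome.

    Each window covers positions in (start, end], with 1-based SNP coordinates.
    Returns a list of (start, end, count) tuples. Setting step smaller than
    window yields overlapping (sliding) windows.
    """
    pos = sorted(positions)
    windows = []
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        # Number of positions <= end minus number of positions <= start.
        count = bisect_right(pos, end) - bisect_right(pos, start)
        windows.append((start, end, count))
        start += step
    return windows


# Toy positions on a hypothetical 1.5 Mb chromosome.
toy_positions = [10, 500_000, 1_000_000, 1_000_001, 1_400_000]
toy_density = snp_density(toy_positions, chrom_length=1_500_000)
```

Dividing each count by the window length in Mb gives a density comparable to the "mutations per Mb" figures used in the text.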
A summary of all genomic differences (not only in the RRR panel) in our five strains is provided in Supplementary Table S3. Results for genes that we might have inadvertently not covered above can be found in this table.

Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms24097795/s1.