Simple Sequence Repeat Markers Reveal Genetic Diversity and Population Structure of Bolivian Wild and Cultivated Tomatoes (Solanum lycopersicum L.)

The western part of South America is a centre of diversity for tomatoes, but genetic diversity studies are lacking for parts of that region, including Bolivia. We used 11 simple sequence repeat (SSR) markers (including seven novel markers) to evaluate genetic diversity and population structure of 28 accessions (four modern cultivars, four advanced lines, nine landraces, 11 wild populations), and to compare their genetic variation against phenotypic traits, geographical origin and altitude. In total, 33 alleles were detected across all loci, with 2–5 alleles per locus. The top three informative SSRs were SLM6-11, LE20592 and TomSatX11-1, with polymorphism information content (PIC) of 0.65, 0.55 and 0.49, respectively. The genetic diversity of Bolivian tomatoes was low, as shown by mean expected heterozygosity (He) of 0.07. Analysis of molecular variance (AMOVA) revealed that 77.3% of the total variation was due to variation between accessions. Significant genetic differentiation was found for geographical origin, cultivation status, fruit shape, fruit size and growth type, each explaining 16–23% of the total variation. Unweighted Pair Group Method with Arithmetic Mean (UPGMA) tree and principal coordinate analysis (PCoA) scatter plot both revealed differentiation between accessions with determinate flowers and accessions with indeterminate flowers, regardless of cultivation status. The genetic profiles of the accessions suggest that the Bolivian tomato gene pool comprises both strictly self-pollinating and open-pollinating genotypes.


Introduction
Bolivia, Chile, Ecuador, the Galapagos Islands and Peru together constitute the centre of origin and distribution of tomatoes (S. lycopersicum L.) [1], although early domestication of tomatoes occurred in both the Andean region and Mexico [2]. Wild tomato relatives are still distributed naturally in the Andean region [3] and may harbour an untapped diversity of genes, which might be useful in diversifying the current cultivated tomato gene pool. In South America, the first domestication, affecting weight and fruit shape, occurred in Ecuador and Peru [4]. In Bolivia, tomato relatives grow in a wide range of ecosystems [5] and public Bolivian institutions are responsible for utilisation, conservation and characterisation of Bolivian core germplasm [6,7]. Some progress has been made in classifying the genetic diversity of Bolivian tomatoes by evaluation and comparison of quality traits among selected accessions. For example, there has been an evaluation and characterisation of 28 tomato accessions from Bolivian core germplasm [8] and a genetic study of 31 accessions from various subspecies of S. lycopersicum [9]. A wide range of phenotypic differences has been reported for Bolivian germplasm, despite a lack of   Seeds of the 28 accessions were planted in 4.5-L plastic trays filled with nutrient-rich soil (0.08 L per plug) in a greenhouse at the Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU). The emerging seedlings were grown under day/night temperature of ±24 °C/19 °C, 16 h of light and 60% relative humidity until sampling of leaf tissue for DNA extraction at one month after planting. Young leaf tissue was sampled separately from 10 individual plants of all accessions except one, which was represented by nine plants (279 samples in total). The fresh leaf tissue taken from each plant was placed in a separate 2-mL Eppendorf tube containing two glass beads with diameter 3 mm, and immediately dipped in liquid nitrogen. The frozen samples were stored at −80 °C until DNA extraction.
The frozen leaf samples were homogenised for 1 min at 30 Hz in a Mixer Mill (MM400-Retsch GmbH, Haan, Germany). Then 400 μL of CTAB-based buffer (0.1 M Tris-HCl, 20 mM EDTA, 1.4 M NaCl, 2% CTAB, pH 7.5) were added to each sample, followed by incubation for 15 min at 56 °C and centrifugation (using Eppendorf 5427 R, Hamburg, Germany) at 9000× g for 3 min. A 200 μL subsample of the supernatant of each sample was transferred to a 96-well plate and DNA extraction was performed with a QIAamp DNA kit, using a QIAcube HT extraction robot (Qiagen, Hilden, Germany). The integrity of the extracted DNA was evaluated by agarose gel electrophoresis (1.5% w/v) and the quantity was determined using a NanoDrop spectrophotometer (ND-1000, Saveen Seeds of the 28 accessions were planted in 4.5-L plastic trays filled with nutrientrich soil (0.08 L per plug) in a greenhouse at the Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU). The emerging seedlings were grown under day/night temperature of ±24 • C/19 • C, 16 h of light and 60% relative humidity until sampling of leaf tissue for DNA extraction at one month after planting. Young leaf tissue was sampled separately from 10 individual plants of all accessions except one, which was represented by nine plants (279 samples in total). The fresh leaf tissue taken from each plant was placed in a separate 2-mL Eppendorf tube containing two glass beads with diameter 3 mm, and immediately dipped in liquid nitrogen. The frozen samples were stored at −80 • C until DNA extraction.
The frozen leaf samples were homogenised for 1 min at 30 Hz in a Mixer Mill (MM400-Retsch GmbH, Haan, Germany). Then 400 µL of CTAB-based buffer (0.1 M Tris-HCl, 20 mM EDTA, 1.4 M NaCl, 2% CTAB, pH 7.5) were added to each sample, followed by incubation for 15 min at 56 • C and centrifugation (using Eppendorf 5427 R, Hamburg, Germany) at 9000× g for 3 min. A 200 µL subsample of the supernatant of each sample was transferred to a 96-well plate and DNA extraction was performed with a QIAamp DNA kit, using a QIAcube HT extraction robot (Qiagen, Hilden, Germany). The integrity of the extracted DNA was evaluated by agarose gel electrophoresis (1.5% w/v) and the quantity was determined using a NanoDrop spectrophotometer (ND-1000, Saveen Werner, Sweden). All extracted DNA samples were kept at −20 • C until polymerase chain reaction (PCR) analysis, while the working solutions (5 ng/µL) were kept at 4 • C for up to 24 h.

Identification of SSRs in the Tomato Genome and Primer Design
The genome of the S. lycopersicum cultivar 'Heinz 1706' (SL3.0 reference Annotation Release 103; (https://www.ncbi.nlm.nih.gov/assembly/GCF_000188115.4 (accessed on 27 July 2022)) was used to identify SSRs in the tomato genome, which were then utilised as new genomic analysis resources for tomato. First, genomic regions of 400 bp to 1200 bp, representing the 12 tomato chromosomes, were randomly sampled. These sequences were searched for identification of dinucleotide and trinucleotide repeat motifs, using WebSat, internet-based software developed for SSR identification [17]. Sequences containing the target SSRs were then further screened based on the suitability of the SSR positions in the sequences for primer design. Next, these sequences were compared against the tomato reference genome at the National Center for Biotechnology Information (NCBI) database, to identify those that are single copy (unique) in the tomato genome, using the Basic Local Alignment Search Tool (BLAST). This resulted in selection of 22 single-copy sequences (two sequences per chromosome) and two highly similar sequences (corresponding to TomSat9-2a and TomSat-2b SSR loci, see Supplementary Tables S1 and S2. The Primer3 program [18][19][20], targeting these SSRs, was used for primer design. Ten genotypes were selected from the different tomato accessions for a first testing of the newly designed primer-pairs in amplifying the target SSR loci under optimised PCR conditions (described below). Twelve of the primer-pairs amplified extra fragments in addition to the target loci, and were therefore excluded, while the remaining primer-pairs amplified only their target loci. To confirm that they matched the target sequences, the amplified products of the 12 primer-pairs were purified and sequenced. Thereafter, the PCR products were purified using E.Z.N.A. Cycle Pure Kit V-spin (Omega Bio-tek; Norcross, GA, USA) and 2 µL of 10 µM sequencing primer and 15 µL of 1 ng/µL purified PCR product for each sample were mixed and sent to Eurofins Genomics Sequencing GmbH (Anzinger Str. 7, 85560 Ebersberg, Germany), where Sanger sequencing was conducted. Each amplified product was sequenced with both the forward and reverse primers used for the PCR. The DNA sequences of the PCR products were then aligned with their corresponding reference sequences using CLUSTAL X version [21], which confirmed amplification of the target loci.
In parallel with the development of the new SSR markers, 40 primer-pairs previously reported to amplify polymorphic SSR loci [22][23][24][25] were screened to determine the quality of their amplified products under optimised PCR reaction conditions. Four of these primerpairs (a-d in Table 2) were selected for use in this study, together with the 12 newly developed primer-pairs (Table S1).

PCR Amplification and Electrophoresis
The 16 selected primer-pairs were used to amplify the target loci of 279 individual genotypes from the 28 tomato accessions. The 5 -end of the forward primers was labelled with either 6-FAM or HEX fluorescent dye (Sigma-Aldrich, St. Louis, USA), for detection of amplified products during capillary electrophoresis. A GCTTCT hexamer was added to the 3 -end of the reverse primers (PIG tailing) to prevent the Taq polymerase from adding non-template sequences to the PCR products, as described in Ballard et al. [26]. The PCR reaction solution was prepared for each sample using the following reagents from Thermo Fisher Scientific GmbH (Waltham, MA, USA) (V.A. Graciuno 8. LT-02241 Vilnius, Lithuania): 2.5 µL Dream Taq buffer (KCl, (NH 4 ) 2 SO 4 and 20 mM MgCl 2 ), 0.3 µL dNTPs (25 mM), 7.5 µL of each forward and reverse primer (10 µM), 0.2 µL DreamTaq DNA Polymerase (5 U/µL), and 5 µL of 5 ng/µL DNA template. A negative control reaction, replacing the DNA template with sterile Millipore water, was also included.
The PCR analysis was carried out using a Bio-Rad thermal cycler S1000 (Hercules, CA, USA) with the following cycling parameters: initial denaturation for 3 min at 95 • C, followed by 35 cycles of denaturation for 30 s at 94 • C, annealing for 40 s at 3-5 • C below the primer's melting temperature and primer extension for 40 s at 72 • C, and a final primer extension for 20 min at 72 • C. After each PCR run, electrophoresis was performed using 1.5% agarose containing GelRed on randomly selected amplified products using a 50 bp Gene Ruler ladder (Thermo Fisher Scientific GmbH, Dreieich, Germany) as size standard, followed by scanning with a BioDoc-It Imaging System (Upland, CA, USA). This led to exclusion of four of the 12 newly developed SSRs, because a significant number of samples failed to be amplified. Hence, the amplified products of 12 SSR loci were used in capillary electrophoresis, as described below.
The PCR products of the 12 SSR loci were multiplexed into four panels following the criteria described in Geleta et al. [27], except that each PCR product was diluted 1:10 in Millipore ultrapure water before multiplexing. This was followed by mixing each multiplexed PCR product (0.5 µL) with Hi-Di™ Formamide (9 µL) (Thermo Fisher Scientific, Waltham, MA, USA) and Size Standard Gene Scan-600 LIZ (9.7 µL) (Thermo Fisher Scientific, Austin, TX, USA), heating at 96 • C for 3 min and cooling on ice. Multiplexed PCR products were then separated by capillary electrophoresis as described in Andersson et al. [28], using an Applied Biosystems 3500 Genetic Analyzer (Thermo Fisher Scientific, Waltham, MA, USA), at the Department of Plant Breeding, SLU, Sweden.

Data Analysis
The capillary electrophoresis step was followed by peak identification using Gene-Marker version 2.7.0 (SoftGenetics, LLC, State College, PA, USA) software with the default settings [29]. Each peak was regarded as an allele and its size was determined using the GS600 size standard. Among the 12 SSR loci, locus TomSatX1-1 was monomorphic across the 28 accessions studied, and hence was excluded from the final data analysis. The alleles of each sample for the remaining 11 polymorphic SSR loci ( Table 2) were exported to Excel and converted to genotypic data for subsequent statistical analysis.
Various genetic diversity parameters were estimated for each locus across all accessions and for each accession across all loci. POPGENE version 1.32 [30] was used to determine observed number of alleles (Na), effective number of alleles (Ne) and percentage of polymorphic loci (%PL). GeneAlEx 6.41 [31] was used to determine number of private alleles (NPL), number of locally common alleles (NLCA), expected heterozygosity (He), observed heterozygosity (Ho), Shannon information index (I), genetic differentiation (G ST ) and fixation index (F). Polymorphism information content (PIC) of each locus was calculated as described in Botstein et al. [32].
Analysis of molecular variance (AMOVA) was conducted to determine the variance within and between accessions and the variance at higher hierarchal level, using Arlequin version 3.5.2.2 [33]. Matrices of pairwise F ST and average pairwise differences between and within accessions were used to generate graphs using a series of R scripts within Rcmd, a console version of the R statistical package, by triggering the command button added to Arlequin version 3.5.2.2 toolbar.
Nei's unbiased genetic distance between the 28 accessions was calculated using Ge-neAlEx 6.41 and then used as input data for Unweighted Pair Group Method with Arithmetic Mean (UPGMA)-based cluster analysis using MEGA7 software, where the optimal tree with the sum of branch length −2.13486148 is shown [34,35]. The genetic distance value was also used for principal coordinate analysis (PCoA) using GeneAlEx 6.41 to visualise the genetic relationship between the accessions. Bayesian statistics-based population structure analysis was conducted using STRUCTURE version 2.3.4 [36], based on the admixture model implementing 100,000 burn-in periods and 200,000 Markov chain Monte Carlo chain iterations for K (number of genetic populations) ranging from two to 15 (with 10 independent runs at each K). The optimum K was then determined using STRUCTURESELECTOR [37], a statistical program based on the STRUCTURE output, following the ∆K approach [38]. A β version of CLUMPACK [39] integrated into the STRUCTURESELECTOR was used to visualise the population structure after the optimal K was determined.

SSR Markers
The total number of alleles recorded across the 11 SSR loci was 33, with the number of alleles per locus varying from two to five ( Table 2). Five of the 11 SSR loci had only two alleles per locus. SLM6-11 and TomSat11-1 were the most polymorphic loci, with five alleles each. The average number of alleles observed per population (Na) for each locus varied from 1.04 (for TomSatX2-2 and TomSatX7-2) to 1.57 (for SLM6-11), with an overall mean of 1.21. The effective number of alleles (Ne) ranged from 1.02 (for TomSatX2-2) to 1.305 (for SLM6-11), with an overall mean of 1.12 ( Table 2). The polymorphism information content (PIC) of the loci ranged from 0.05 (for TomSatX2-2) to 0.65 (for SLM6-11), with an overall mean of 0.29. The most informative locus among the 11 loci was SLM6-11 (PIC = 0.65), followed by TomSatX11-1 (PIC = 0.49). Observed heterozygosity (Ho) at locus level varied from zero (for SLM6-11 and TomSatX2-2) to 0.20 (for LE20592), with an average value across the loci of 0.05. Similarly, expected heterozygosity (He) varied from 0.011 (for TomSatX2-2) to 0.178 (for SLM6-11), with a mean of 0.07. The corresponding values for unbiased expected heterozygosity (uHe) for these loci were 0.01 and 0.20, respectively. The SSR loci showed wide variation in their fixation index values, with minimum, maximum and mean values of −0.45, 1.0 and 0.4 for F IS , 0.62, 1.0 and 0.87 for F IT , and 0.72, 0.90 and 0.80 for F ST . Estimated genetic differentiation (G ST ) at locus level varied from 0.67 (for SLR20) to 0.89 (for TomSatX11-1), with a mean of 0.77, and was highly significant at all loci (p = 0.001) ( Table 2). Table 2. Total number of alleles (TNA), number of different alleles (Na), effective number of alleles (Ne), polymorphism information content (PIC), observed heterozygosity (Ho), expected heterozygosity (He), unbiased expected heterozygosity (uHe), fixation indices (F IS , F IT and F ST ), and population differentiation (G ST , an analogue of F ST adjusted for bias) and its p-value (P(G ST )), for each SSR locus.

Genetic Diversity of the Accessions
The genetic diversity of each accession was estimated using several parameters (Table 3) (Table 3). Of the 28 accessions, 79% had alleles shared by ≤25% of the accessions (NLCA ≤ 25%), whereas all accessions had alleles shared by ≤50% of the accessions (NLCA ≤ 50%). The highest NLCA ≤ 25% value (0.46) was recorded for accession BOL-8288-HT, while the high-est NLCA ≤ 50% value (0.64) was recorded for two accessions ('HT-25' and BOL-8288-HT) ( Table 3).  Among the 28 accessions, six did not have polymorphic loci (PPL = 0), as indicated above, whereas less than 10% of the loci were polymorphic in six other accessions (PPL < 0.1) ( Table 3). The advanced breeding line 'HT-25' was the only accession with more than 50% polymorphic loci (PPL = 0.55). The second highest PPL value (0.45) was recorded for accessions BOL-8288-HT and BOL-8348-HT, both of which are wild. The mean PPL value for the accessions was 0.19, indicating that only 19% of the loci were polymorphic on average.
The genetic diversity of each accession was estimated by Shannon's I and He (gene diversity). In addition to the six accessions that did not show genetic variation (see above), 14 accessions had very low genetic variation, with I and He values below 0.14 and 0.10, respectively (Table 3). Four accessions, 'HT-25' (advanced breeding line), BOL-8226-HT (cultivated), BOL-8348-HT (wild) and BOL-8288-HT (wild), had relatively high genetic variation, with I and He values above 0.20 and 0.14, respectively (Table 3). For example, accession BOL-8288-HT (wild) had I, He and uHe values of 0.29, 0.21 and 0.25, respectively. The mean values of I, He and uHe for the 28 accessions were 0.10, 0.07 and 0.07, respectively. The vast majority of the accessions (86%) had observed heterozygosity (Ho) below 0.10, including those that were totally homozygous (Ho = 0). 'Shanty' (a commercial cultivar from Israel) was the most heterozygous accession, followed by Bolivian advanced breeding line 'HT-25', 'Lia' (a commercial cultivar from Israel) and 'Huichol' (a commercial cultivar from Thailand), with Ho values of 0.27, 0.24, 0.18 and 0.18, respectively ( Table 3). The fixation index (F) of the accessions that have polymorphic loci varied from −1.00 (the lowest possible value) in accession 'Lia', Shanty and 'HT-36' to 1.0 (the highest possible value) in six Bolivian accessions, including one advanced breeding line, one wild and four cultivated accessions (Table 3).

Analysis of Molecular Variance (AMOVA)
AMOVA was performed using 1000 permutations at both the accession and higher hierarchical levels ( Table 4). The results revealed that 77.3% of the total variation was attributable to variation between accessions (F ST = 0.773, p < 0.001), while 22.7% was attributable to variation within accessions, of which 7.1% was accounted for by variation among individuals within accessions and 15.6% by variation within individuals. Table 4. Results of analysis of molecular variance (AMOVA) based on 1000 permutations without grouping the accessions and on grouping them according to geographical region of origin, altitude, cultivation status, fruit shape, fruit colour, fruit size and growth type. The AMOVA analysis at higher hierarchical level was performed by grouping the accessions using seven different criteria: geographical region of origin (La Paz and other regions (Cochabamba, Sucre, Santa Cruz and Beni)); altitude (<500 m above sea level (masl), 950-1200 masl, 1450-1750 masl, and 1850-2250 masl); cultivation status (cultivated and wild); fruit shape (cylindrical, round, slightly flattened); fruit colour (red and yellow); fruit size (intermediate and very small); and growth type (determinate, semi-determinate and indeterminate). Accessions from La Paz differed significantly from those of the other regions in Bolivia, and region of origin accounted for 21.7% of the total variation revealed by the markers used (p < 0.001). Cultivated and wild accessions also differed significantly, with cultivation status accounting for 17.9% of the total variation (p < 0.001) ( Table 4).

Source of Variation
In addition, there were significant genetic differentiations into fruit shape groups, fruit size groups and growth type groups, accounting for 16.8%, 22.5% and 22.0% of the total variation, respectively. However, no significant differences were recorded for altitude groups and fruit colour groups (Table 4).
Pairwise F ST analysis of the 28 accessions revealed significant differentiation (p < 0.05) among the vast majority (94%) of the pairs of accessions (Figure 3). The three non-Bolivian accessions ('Lia', 'Shanty' and 'Huichol') showed significant differentiation from the Bolivian accessions. The differentiation among the four Bolivian advanced breeding lines was also significant (p < 0.05) (Figure 3). 'Rio Grande', a widely cultivated variety in Bolivia, showed significant differentiation from all other accessions except 'HT-23' and BOL-8224-HT. 'Rio Grande' and BOL-8224-HT showed no significant differentiation from each other, having an F ST value close to zero (marked with a purple asterisk in Figure 3

Cluster Analysis and Principal Coordinate Analysis
The UPGMA cluster analysis based on Nei's unbiased genetic distance revealed various genetic relationships among the 28 tomato accessions, including four clusters and two solitary accessions (  . Heatmap displaying average number of pairwise differences of the 28 accessions, estimated using a number of different alleles as a distance method: average number of pairwise differences between the accessions (PiXY; above diagonal), average number of pairwise differences within the corresponding accession (PiX; diagonal); and corrected average pairwise difference (PiXY − (PiX + PiY)/2; below diagonal), also referred to as Nei's distance (d).

Cluster Analysis and Principal Coordinate Analysis
The UPGMA cluster analysis based on Nei's unbiased genetic distance revealed various genetic relationships among the 28 tomato accessions, including four clusters and two solitary accessions (  The Nei's unbiased genetic distance-based principal coordinate analysis (PCoA) further revealed the genetic relationship between the 28 tomato accessions (Figure 6). The The Nei's unbiased genetic distance-based principal coordinate analysis (PCoA) further revealed the genetic relationship between the 28 tomato accessions ( Figure 6). The first two principal coordinates (PCoA) together explained 70% of the total variation, with PCoA1 explaining 52.3% and PCoA2 17.2%. The six accessions in cluster 1 of the UPGMA tree ( Figure 5) formed two small clusters (highlighted in sky blue and green in Figure 6). The two accessions in cluster 2 of the UPGMA tree (BOL-8281-HT and BOL-8288-HT) were separated along PCoA2. Three of the five accessions in cluster 5 of the UPGMA tree ('HT-23', 'Rio Grande' and BOL-8224-HT) were separated along PCoA1, forming a group highlighted in pink in Figure 6. The remaining two accessions in cluster 5 (BOL-8222-HT and BOL-8223-HT), the solitary accessions BOL-8282-HT and BOL-8340-HT, and all accessions in cluster 6 of the UPGMA tree formed the group highlighted in yellow in Figure 6.
first two principal coordinates (PCoA) together explained 70% of the total variation, with PCoA1 explaining 52.3% and PCoA2 17.2%. The six accessions in cluster 1 of the UPGMA tree ( Figure 5) formed two small clusters (highlighted in sky blue and green in Figure 6). The two accessions in cluster 2 of the UPGMA tree (BOL-8281-HT and BOL-8288-HT) were separated along PCoA2. Three of the five accessions in cluster 5 of the UPGMA tree ('HT-23', 'Rio Grande' and BOL-8224-HT) were separated along PCoA1, forming a group highlighted in pink in Figure 6. The remaining two accessions in cluster 5 (BOL-8222-HT and BOL-8223-HT), the solitary accessions BOL-8282-HT and BOL-8340-HT, and all accessions in cluster 6 of the UPGMA tree formed the group highlighted in yellow in Figure 6. Figure 6. Principal coordinate analysis (PCoA) bi-plot, generated based on Nei's unbiased genetic distance, demonstrating the relationship between the 28 tomato accessions, with the first two principal coordinates (PCoA1 and PCoA2) explaining 70% of the total variation. Accessions with the same label colour belong to the same region within Bolivia, or to the same country.

Population Structure Analysis
Analysis of admixture model-based population structure using the STRUCTURE and STRUCTURESELECTOR programs revealed that three genetic clusters (K) was the optimal number, according to the method of Evanno et al. [38] (Supplementary Figure S1). This suggests that the 279 individuals from the 28 accessions analysed in this study originated from three genetic populations. The graphical illustration of the population structure of the 28 accessions clearly showed that most were admixed, albeit to varying degrees. BOL-8335-HT and BOL-8249-HT were the only accessions that were not admixed ( Figure 7). All accessions that formed cluster 1 in the UPGMA tree ( Figure 5  . Principal coordinate analysis (PCoA) bi-plot, generated based on Nei's unbiased genetic distance, demonstrating the relationship between the 28 tomato accessions, with the first two principal coordinates (PCoA1 and PCoA2) explaining 70% of the total variation. Accessions with the same label colour belong to the same region within Bolivia, or to the same country.

Population Structure Analysis
Analysis of admixture model-based population structure using the STRUCTURE and STRUCTURESELECTOR programs revealed that three genetic clusters (K) was the optimal number, according to the method of Evanno et al. [38] (Supplementary Figure S1). This suggests that the 279 individuals from the 28 accessions analysed in this study originated from three genetic populations. The graphical illustration of the population structure of the 28 accessions clearly showed that most were admixed, albeit to varying degrees. BOL-8335-HT and BOL-8249-HT were the only accessions that were not admixed (Figure 7). All accessions that formed cluster 1 in the UPGMA tree ( Figure 5) were represented by deeppurple-dominated bars except BOL-8348-HT, which appeared to show high admixture. Similarly, all accessions that formed cluster 6 in the UPGMA tree were represented by blue-dominated bars, except accessions BOL-8225-HT and BOL-8226-HT (Figure 7). Other highly admixed accessions were BOL-8225-HT, BOL-8282-HT and BOL-8288-HT. Among highly admixed accessions, BOL-8282-HT was the only one with significant proportions of alleles from the three genetic clusters represented by the three different colours in Figure 7. In general, the results of cluster, PCoA and population structure analyses were in good agreement regarding the genetic relationships between the accessions studied.
Genes 2022, 13, x FOR PEER REVIEW 18 of 27 with significant proportions of alleles from the three genetic clusters represented by the three different colours in Figure 7. In general, the results of cluster, PCoA and population structure analyses were in good agreement regarding the genetic relationships between the accessions studied.

Discussion
Our analysis of 28 tomato accessions revealed a low level of diversity persisting in exanimated Bolivian tomatoes, despite the fact that Bolivia is part of the centre of diversity [40] or origin [1,41] of tomatoes. The diversity identified in the analysis was mainly between accessions, with factors such as geographical region of origin, cultivation status, fruit shape, fruit size and growth type clearly dividing the tomatoes into groups that were genotypically differentiable by the markers used. Thus, the markers developed within this study and their relationship to the various parameters considered provide opportunities to select parents for crossbreeding in tomato breeding programmes, which is an important measure to generate new Bolivian cultivars [42]. The present study also contributed novel knowledge on the genetic diversity and population structure of Bolivian tomatoes, as previous studies have only included Bolivian accessions of cherry tomatoes [43] and its wild relatives S. lycopersicum var. ceraciforme and S. neorickii [9]. This novel knowledge increases understanding of genetic relationships among Bolivian tomato germplasm and the domestication processes of tomatoes.

The SSR Markers in Revealing Tomato Genetic Diversity
Use of SSR loci previously employed to study the genetic diversity of tomatoes revealed a considerably lower number of alleles for the Bolivian tomatoes investigated here than reported for tomatoes from other countries. For instance, only three alleles were detected at the SLR20 locus among the 28 tomato varieties studied here, whereas Korir et al. (2014) reported five alleles among 42 tomato varieties from China and Kenya at this locus [25]. Further, Gonias et al. (2019) reported eight alleles at this locus in their study of 107 tomato accessions, including cultivars and landraces from Greece and international hybrids [44]. At the SLM6-11 locus, five alleles were observed in this study, whereas six alleles were reported by Geethanjali et al. (2010) for 16 tomato accessions [24]. Smulders et al. (1997) identified seven alleles at the LE20592 locus for 10 tomato accessions encompassing seven S. lycopersicum cultivars and three wild Solanum species [22], but in the present study only four alleles were recorded at this locus. Some of these previous studies analysed a higher number of accessions than in the present study, while other studies analysed fewer accessions, but a higher number of alleles was reported in all cases. In light of this, Bolivian tomatoes can be considered to have low allelic diversity.
Among the 11 SSR loci studied, SLM6-11 was the most informative, with a PIC value of 0.65, corresponding to the value reported in a previous study [24]. Hence, for population genetics analysis of tomato genetic resources, this locus should be prioritised, along with LE20592 (PIC = 0.55) and TomSatX11-1 (PIC = 0.49), the latter developed in the present study. As in a previous study [45], which reported PIC values ranging from 0.06 to 0.60 (mean 0.31) for inbred tomato lines from different countries, the PIC value in this study ranged from 0.05 to 0.65, with a mean of 0.29. However, higher PIC value ranges and mean values have been reported in other studies, e.g., 0.62-0.85 (mean 0.74) for diverse tomato varieties of modern, landrace and hybrid type using SSR markers [44], 0.42-0.87 (mean 0.69) for different tomato species using SSR markers and 0.17-0.74 (mean) 0.45 for tomato varieties from different countries [25,46]. Despite the fact that the PIC values are directly related to the choice of SSRs, the results of the present study indicate that the genetic diversity of Bolivian tomatoes is relatively low. However, the genetic differentiation between the accessions was highly significant, as shown by the high F ST and G ST values at each SSR locus. Highly significant differentiation between tomato accessions has been reported previously for local landraces from southern Italy and contemporary tomato varieties [47], and for tomato landraces from Cyprus, France and Greece [48].
Tomato is predominantly a self-pollinating species [49,50], and for such species observed heterozygosity (Ho) should normally be lower than expected heterozygosity (He) at a neutral polymorphic locus. In line with expectations, the He values were higher than the Ho values for most of the SSR loci analysed in this study. However, Ho exceeded He for the SSR22, LE20592 and TomSatX7-1 loci, indicating that these loci might be linked to genes under balancing selection, which favours heterozygosity [51]. The He values for the different accessions in the present study were generally low (mean 0.07), whereas higher values (0.17-0.71) have been reported in previous studies [2,52,53]. The significant number of open-and self-pollinated accessions included in the present study might be the reason for the low average He values found, as a higher level of heterozygosity can be expected for hybrids [54] than for open-or self-pollinated accessions [55]. Based on available information, only seven ('Lia', 'Shanty', 'Huichol', and the four advanced breeding lines) of the 28 accessions were hybrids, while the others were open-or self-pollinated. Correspondingly, the commercial hybrid cultivars 'Lia', 'Shanty', and 'Huichol' had higher Ho values (0.27, 0.18, and 0.18, respectively) than the other accessions (Ho < 0.1), while the open/self-pollinated group had Ho values that ranged from zero to 0.05. As expected, Ho was lower or equal to He for the open/self-pollinated group.
For the hybrid group, Ho exceeded He except for 'HT-23' and 'HT-37', with both having higher He than Ho and high positive fixation index (F). Steps taken following hybridisation could explain the differences seen in F and He/Ho ratio. One possibility is that 'HT-23' and 'HT-37' have undergone a series of self-pollination events after the hybridisation event, resulting in homozygosity at the majority of their loci and positive F values. In contrast, the two Israeli commercial cultivars ('Lia' and 'Shanty') and the Bolivian hybrid 'HT-36' are most likely F1 hybrids, since they possess maximum levels of heterozygosity (F = −1), while 'Huichol', a cultivar from Thailand, and the Bolivian hybrid 'HT-25' may have been reproduced through open pollination after the hybridisation event took place. The 20 Bolivian accessions can be classified into three subgroups based on their expected heterozygosity, which measures the genetic diversity within the accessions: those with He = 0 (inbred), those with He = 0.01-0.08 (extremely low diversity), and those with He = 0.11 = 0.21 (low to medium diversity). Shannon's I values can also be used to discern these subgroups. However, in terms of phenotypic characteristics and geographical region of origin, each of these groups was found to be generally diverse. For example, the six inbred accessions differ in fruit colour and shape, as well as altitude and geographical region of origin of the germplasm. That group also comprised both cultivated and wild types. Additionally, some of those with very small fruits appeared to be inbred, while others had He values as high as 0.21. Hence, neither geography nor phenotypic characteristics sufficiently explained the genetic variation within the accessions.
The results of the present analysis indicated differences in reproductive mechanisms among both cultivated and wild Bolivian tomatoes. The accessions BOL-8223-HT, BOL-8281-HT, BOL-8284-HT, BOL-8328-HT, BOL-8330-HT and BOL-8340-HT are genetically uniform, and are characterised by very small red fruits except for BOL-8223-HT (which has very small yellow fruits). Their lack of within-accession genetic variation may indicate that they have cleistogamous flowers that prevent pollen movement, resulting in strict inbreeding. Crossbreeding such inbred accessions can be advantageous, since they will produce genetically suitable F1 hybrids that may be superior to their parents in terms of desirable traits. It is possible, for example, to crossbreed BOL-8223-HT (cultivated) with BOL-8330-HT (wild), since they are genetically distinct, as revealed by our cluster, PCoA and pairwise F ST analyses, while they show some differences in their phenotypic characteristics.
The accessions BOL-8225-HT, BOL-8282-HT, BOL-8290-HT, BOL-8316-HT, BOL-8322-HT) and 'HT-23' have similar characteristics to the above group except that they have genetic variation within accessions. However, the high fixation index values obtained here (F = 1) indicate that these accessions are strictly self-pollinating types, a reproductive mechanism determined in previous studies to be dominant in tomatoes [2,49]. In view of the fact that they are cultivated types with the exception of BOL-8290-HT, the low level of genetic variation within these accessions might be due to unintentional gene flow in the form of seeds. ing small round fruits. They differ in fruit colour (red and yellow, respectively) and flowering habit (indeterminate and semi-determinate, respectively) and in the altitude and geographical location of the sampling site. Further characterisation may lead to identification of genotypes with desirable characteristics that can be incorporated into elite cultivars through crossbreeding. The present study clearly indicated that none of the specific geographical regions or altitude ranges within Bolivia can be considered a hotspot for the genetic diversity of Bolivian tomatoes.

AMOVA
The analysis of molecular variance (AMOVA) results for the 28 S. lycopersicum accessions revealed significant variation both between accessions (77.3%) and within accessions (22.7%), which is in line with the characteristics of species that are predominantly self-pollinating. A previous study [56] that evaluated two wild tomato species from the Galapagos Islands (S. cheesmaniae and S. galapagense) also revealed much higher variation between accessions (>90% of the total genetic variation) than within accessions. In fact, these two wild species are strict inbreeders, unlike some open-pollinating S. lycopersicum accessions analysed in the present study. However, higher variation within accessions (accounting for 29% and 36% of the total variation, respectively) has previously been reported for S. lycopersicum and S. pimpinellifolium, both being self-compatible [50]. In contrast, in the outcrossing S. peruvianum, only 32.2% of the total variation is between accessions [50]. A possible conclusion from the above discussion is that the reproductive mechanism of a species can have a profound impact on population differentiation.
Based on the highly significant genetic variation found between the accessions and the significant differentiation between over 90% of accession pairs, crossbreeding between genotypes bearing desirable agronomic and fruit characteristics may prove to be the most effective approach for cultivar development. It should be noted that in the present study, hierarchical AMOVA revealed significant differences between groups based on a variety of factors, such as region of origin, domestication/breeding status, fruit shape, fruit size and flowering habit. Around 20% of the total genetic variation differentiated accession groups from La Paz versus other regions in Bolivia, pointing to the importance of isolation by distance and geographical barriers in population differentiation. This significant divergence can also be explained partly by the fact that most of the accessions from La Paz are wild populations with small fruits and semi-determinate flowers, while most of the accessions from other regions are cultivated types. This differentiation level is comparable with that of S. chesmaniae, but four times higher than that of S. galapagense from different regions of the Galapagos Islands [56]. On the other hand, the lack of genetic differentiation among altitude groups despite the wide range of altitude of their sampling sites (227-2858 masl) indicates that tomatoes can adapt to altitude. The AMOVA analysis did not significantly differentiate tomatoes with red and yellow skin colour.
Although cultivated Bolivian tomatoes differed significantly from their wild counterpart, that differentiation explained only 18% of the total variation. The majority of the cultivated tomatoes still bear small fruits, similar to wild types, indicating that little has been accomplished in terms of selection-based improvement in fruit size. Significant variations were observed between fruit shape groups, fruit size groups and flowering habit groups, accounting for 16.8%, 22.5% and 22.0% of the total variation, respectively. These results are not surprising, since the commercial cultivars and advanced breeding lines have predominantly cylindrical and round fruits, while the majority of the wild accessions have somewhat flattened fruits. Additionally, commercial cultivars and advanced breeding lines have larger fruits and determinate flowers, while wild and landrace accessions have small fruits and indeterminate/semi-determinate flowers.
Since determinate growth habit and larger fruits are desired characteristics in tomato cultivars, resulting in synchronous maturity that facilitates harvesting and higher fruit yields, these traits have been the target of domestication and breeding programmes [57,58]. Therefore, the significant differences observed here between flowering habit groups and fruit size groups relate to domestication status. The lack of significant differentiation between the fruit colour groups can be explained by the fact that both cultivated and wild accessions possess a large proportion of red fruits, but yellow fruits are also found in both cultivated and wild groups. Tomato skin colour is a phenotypic quality trait that is known to be regulated by several genes, including phytoene synthase 1 (PSY1), phytoene desaturase (PDS), 15-cis-zeta-carotene isomerase (ZISO) and DE-ETIOLATED 1 (DET1) [59]. Due to the fact that the SSRs used in this study are not linked to genes that regulate fruit colour, which is needed to differentiate this trait [60], the results indicate that cultivated tomatoes with different fruit colours have rather similar genetic backgrounds, as is also likely to be the case for wild tomatoes.

Cluster, Principal Coordinate and Population Structure Analyses
There was significant genetic differentiation between cultivated and wild accessions, with the exception of BOL-8282-HT and BOL-8284-HT, as visualised by the heatmaps of pairwise comparisons, UPGMA tree, PCoA scatter plot and STRUCTURE graph. Thus, use of wild accessions in crossbreeding with cultivated accessions of tomatoes provides the potential to develop cultivars that have favourable fruit characteristics and are welladapted to Bolivian agroecosystems. It should be noted, however, that there are higher genetic similarities between the Bolivian cultivated tomatoes and the foreign commercial cultivars evaluated here than between Bolivian cultivated and wild tomatoes. This is an excellent example of how domestication has shaped crop evolution through the selection of germplasm for multiple desirable traits that are generally categorised as "domestication syndrome" traits [61]. Furthermore, the determinate and indeterminate types of tomatoes were found to be clearly separated from each other regardless of their cultivation status, unlike the semi-determinate types. Thus, indeterminate cultivated accessions clustered with indeterminate wild accessions, whereas determinate wild accessions clustered with determinate cultivated accessions. This suggests that the substantial genetic differentiation between determinate and indeterminate varieties predates tomato domestication.
Among the cultivated accessions, BOL-8284-HT was the most genetically distinct, exhibiting an indeterminate flowering habit and very small, slightly flattened red fruits. BOL-8335-HT and BOL-8349-HT were the most genetically distinct wild accessions, both exhibiting a semi-determinate flowering habit and producing very small red fruits. Consequently, these accessions could be valuable for crossbreeding to facilitate the development of superior genotypes through genetic recombination for further breeding.
It is noteworthy that some closely clustered accessions, e.g., BOL-8328-HT vs. BOL-8330-HT, 'Rio Grande' vs. 'HT-23' and 'HT-36' vs. 'HT-37', have similar fruit characteristics and flowering habits, suggesting the presence of genetically similar accessions that could be considered "duplicates" in the Bolivian ex situ conserved tomato germplasm. In contrast, some closely clustered accessions (high genetic similarity) such as BOL-8316-HT and 'HT-37' exhibited different fruit characteristics and flowering habit. Thus, grouping the accessions based on their phenotypic characteristics, followed by genotypic characterisation based upon single nucleotide polymorphisms (SNPs) and SSRs, would allow the creation of a core collection comprising distinct accessions.
In previous population genetics studies, different approaches have been used to determine how populations are genetically structured [36,[62][63][64]. The Bayesian modelbased population structure analysis assumes that populations are defined by the frequencies of alleles at multiple loci [36]. Using this method, each genotype within a predefined population is assigned to a cluster or, if the genotype is found to be admixed, to more than one cluster. Using this approach, we found that the majority of the 28 accessions studied and their individual genotypes exhibited varying degrees of genetic admixture, suggesting significant gene flow between the different groups. Such genetic admixture may result from natural gene flow in wild habitats and agroecosystems and from intentional crossbreeding to produce cultivars that meet desired characteristics such as fruit quality [57].
The commercial cultivars and most of the Bolivian advanced breeding lines displayed highly similar population structures to wild accessions collected from La Paz region, implying that the core germplasm collected in La Paz plays a significant role in tomato improvement programmes in Bolivia. To our knowledge, only one previous study has examined the genetic diversity of Bolivian core germplasm, using 31 accessions of S. neorickii, S. chmielews, S. lycopersicum ceraciforme and S. lycopersicum spp. [9]. That study reported genetic distance in collections between La Paz and other regions. In the present study, not all wild and landraces accessions from Bolivian core germplasm were used, as only accessions classified as S lycopersicum L. were included. The best-represented region was La Paz, which had a significant number of accessions in the core collection. Nonetheless, this study provides valuable novel information on genetic diversity and genetic structure for potential use in breeding programmes. Most of the alleles found in the cultivated accessions derived from two of the three genetic populations identified. Alleles of these genetic populations appear to be widely distributed in non-Bolivian and Bolivian accessions, since they are well represented in these accessions. The accessions that formed cluster 1 in the UPGMA tree, shown as sky blue and green clusters in the PCoA plot in Figure 6 (clearly differentiated along PCoA1) were represented by deep purple bars in the optimal genetic structure plot in Figure 7, except for BOL-8348-HT which apparently had high genetic admixture. The results presented here are partly consistent with those of another study in which tomato wild relatives and landraces showed less admixture than market-oriented cultivars [65]. As a whole, the results of the cluster, PCoA and population structure analyses agreed very well, clearly demonstrating genetic relationships between the accessions studied and the pattern of their genetic variation.

Conclusions
The number of alleles detected in this SSR-based study on Bolivian tomato accessions ranged from two to five, indicating low allelic diversity of examined Bolivian accessions. TomSatX11-1 proved to be the most informative of the newly developed SSR markers in this study and should be prioritised for population genetics analysis of tomatoes, together with other highly informative markers such as SLM6-11 and LE20592. While Bolivia lies within the centre of diversity and origin of tomatoes, explored Bolivian tomatoes have generally low genetic variation within each accession. However, there is highly significant genetic differentiation between the accessions, explaining approximately 75% of the total genetic variation. There is also significant genetic variation between wild and cultivated tomatoes and between tomatoes with different geographical origin, fruit shape, fruit size and flowering habits. However, there is no significant genetic difference between tomatoes from different altitude ranges or tomatoes with different fruit colours. The genetic differentiation between tomatoes with determinate and indeterminate flowers may predate tomato domestication. Tomatoes have genetically determined mechanisms that contribute to either cleistogamous flowers, generating genetically uniform genotypes, here  Table S1. Simple sequences repeat microsatellites (SSRs) developed in the present study. Table S2. Final selection of simple sequences repeat microsatellites (SSRs) for the present analysis. Figure S1. Delta-K plot with a maximum value at 3 (∆K = 3). Data Availability Statement: Data is stored at the SLU server and available by contact with the first author.