First Detection and Identification of Southern Tomato Virus Infecting Tomatoes in Oklahoma with Complete Genome Characterization and Insights into Global Genetic Diversity

Salil Jindal; Akhtar Ali

doi:10.3390/v17091193


Department of Biological Science, Oxley College of Health & Natural Sciences, The University of Tulsa, Tulsa, OK 74104, USA

* Author to whom correspondence should be addressed.

Viruses 2025, 17(9), 1193; https://doi.org/10.3390/v17091193

This article belongs to the Special Issue Plant Viral Pathogens: Innovations in Detection, Genetic Diversity, and Evolutionary Dynamics


Abstract

Southern tomato virus (STV; Amalgavirus lycopersici) is a persistent virus infecting tomato crops globally. This study identified new STV isolates from Oklahoma and analyzed their evolutionary relationship to global STV isolates. Phylogenetic analyses of complete genomes and individual genes grouped STV isolates into two distinct clades, independent of geographic origin or host. Notably, the Oklahoma isolates formed a cluster separate from previously reported isolates from the United States of America (USA). Coalescent analysis suggested that the most recent common ancestor of the STV fusion protein gene emerged around 135 years ago. Genetic diversity among STV isolates was low, with slightly more variability in the RNA-dependent RNA polymerase (RdRp) gene than in the p42 gene. Both genes showed strong purifying selection. No significant recombination events were detected across complete genomes. Protein disorder prediction revealed that the p42 protein, particularly its C-terminal region, displayed higher disorder, indicating a possible role in host interactions and viral adaptability. These findings deepen our understanding of STV’s evolution and highlight the need for ongoing surveillance and broader genomic sampling.

Keywords:

high-throughput sequencing; phylogenetics; molecular evolution; genetic diversity; recombination; purifying selection; protein disorder; southern tomato virus

1. Introduction

Tomato (Solanum lycopersicum L.) is among the most widely cultivated solanaceous vegetables and originated in western South America [1]. It is consumed both fresh and in processed forms such as soups, sauces, and ketchup [2]. In addition to its culinary uses, tomato is a valuable source of secondary metabolites with antioxidant properties and potential anti-cancer benefits [3]. In the early 2020s, global tomato production was reported at approximately 186.82 million tons, cultivated on over 5 million hectares with an average productivity of 36.97 tons/hectare [4]. Despite its economic and nutritional importance, tomato production is significantly hindered by the crop’s high susceptibility to viral diseases, which can cause yield losses of 70 to 95% and thereby affect market availability [5].

Southern tomato virus (STV; Amalgavirus lycopersici) belongs to the family Amalgaviridae. It was initially identified in symptomatic tomato plants exhibiting stunting, fruit discoloration, and size reduction in California and Mississippi (USA) and in southwestern Mexico [6]. Over the past decade, STV has been reported in tomato crops across several countries, including Albania [7], Bangladesh [8], China [9,10,11], Colombia [12], France [13], Germany [14], Greece [15], Italy [16], Korea [17], Panama [18], Pakistan [11], Serbia [19], Spain [20], Turkey [21], and the United Kingdom [22]. The STV genome consists of a monopartite double-stranded RNA (dsRNA) molecule approximately 3.5 kb in length, containing two overlapping open reading frames (ORFs). ORF1 encodes a putative coat protein (p42), while ORF2 encodes a fusion protein with RNA-dependent RNA polymerase (RdRp) activity that is expressed via a +1 ribosomal frameshift. STV is efficiently transmitted through seeds but not mechanically or by grafting [6]. Notably, STV is classified as a cryptic plant virus, often causing no visible symptoms in singly infected plants [11].
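To make the +1 frameshift concrete, here is a minimal toy sketch. It is not a model of the actual STV slippery site. The sequence `SEQ` is invented for illustration, and the function `translate` and its `shift_after` parameter are hypothetical names.

The ribosome reads ORF1 in frame 0. When a shift is requested, it slips forward one nucleotide after a given number of codons. The resulting product therefore carries the ORF1 N-terminus fused to downstream sequence read in the +1 frame (ORF2).

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases T, C, A, G)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical toy sequence (DNA alphabet), NOT an STV sequence:
# frame 0 reads ATG GCT AAA TTT TAG (ORF1 ends at TAG);
# after a +1 slip at nucleotide 12, the +1 frame reads AGG CTG GCG TAA.
SEQ = "ATGGCTAAATTTTAGGCTGGCGTAA"


def translate(seq, shift_after=None):
    """Translate seq from position 0 until a stop codon.

    If shift_after is given, the ribosome slips forward by one nucleotide
    (+1 frameshift) once that many codons have been read, then continues
    in the new frame. This is a toy model, not a biophysical one.
    """
    protein = []
    pos = 0
    codons_read = 0
    shifted = False
    while pos + 3 <= len(seq):
        if shift_after is not None and not shifted and codons_read == shift_after:
            pos += 1  # +1 slip: skip one nucleotide and switch reading frame
            shifted = True
            continue
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":  # stop codon terminates translation
            break
        protein.append(amino_acid)
        pos += 3
        codons_read += 1
    return "".join(protein)


print(translate(SEQ))                 # MAKF (ORF1 product only)
print(translate(SEQ, shift_after=4))  # MAKFRLA (fusion product)
```

Without the slip, translation stops at the ORF1 stop codon. With the slip, the ribosome bypasses that stop and keeps reading in the +1 frame, which is why the frameshift product is a fusion protein.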

The cryptic nature of STV makes it difficult to elucidate its genetic diversity and evolutionary dynamics, factors that are critical for understanding its epidemiology [23]. This ambiguity also hampers the development of reliable diagnostic tools and the implementation of effective disease management strategies [24]. Previous investigations of STV population dynamics have been limited, focusing either on a small number of complete genome sequences [11,18,21] or only on the putative coat protein gene [25]. In contrast, the present study uses over 100 STV isolates available in the GenBank database to conduct a comprehensive analysis of the virus’s genetic diversity and evolutionary patterns. Additionally, this study reports, for the first time, the detection and characterization of STV isolates from tomatoes in Oklahoma. The extensive analysis conducted here offers new insights into STV population structure. Furthermore, this study applies, for the first time, Bayesian evolutionary analysis using the BEAST platform and protein disorder prediction to explore the evolutionary and functional characteristics of STV in greater depth.

2. Materials and Methods

2.1. Sample Collection and Total RNA Extraction

In 2023, tomato leaf samples exhibiting typical virus-like symptoms were collected from five plants (designated K71 to K75) in Cherokee County, Oklahoma. Total RNA was extracted using the TRI-Reagent protocol (Molecular Research Center, Cincinnati, OH, USA) and quantified with a NanoDrop spectrophotometer (Thermo Fisher Scientific; Wilmington, DE, USA).

2.2. High Throughput Sequencing and RT-PCR Confirmation

Total RNA extracted from the five samples was combined into two pooled composite samples. RNA from samples K71 and K72 was normalized and pooled to generate composite sample OK1, while RNA from samples K73, K74, and K75 was pooled to create composite sample OK2. These pooled samples were subjected to high-throughput sequencing (HTS) using the NextSeq 500/550 High-Output Kit v2.5 (Illumina, San Diego, CA, USA) at the Genomics Facility, Oklahoma State University (Stillwater, OK, USA). Paired-end reads generated from HTS were trimmed and assembled de novo using CLC Genomics Workbench v22.0.1 (Qiagen, CLC bio, Aarhus, Denmark). The resulting assembled sequences were analyzed using BLASTn and BLASTx (NCBI, Bethesda, MD, USA) against the non-redundant GenBank database (NCBI, Bethesda, MD, USA) for sequence identification.

Total RNA from individual samples was reverse transcribed into complementary DNA (cDNA) using random hexamers and MMLV reverse transcriptase, following the manufacturer’s protocols (Genscript, Piscataway, NJ, USA). Reverse transcription PCR (RT-PCR) was performed in a 20 µL reaction containing 1 µL of cDNA template, Taq DNA polymerase (Genscript, Piscataway, NJ, USA), 10× Taq buffer, 2 mM dNTPs, RNase A, and sequence-specific primers (Table 1). Amplified PCR products were visualized on a 1% TBE agarose gel and purified using the EZNA Cycle Pure Kit (Omega Bio-Tek, Norcross, GA, USA). Purified products were then Sanger sequenced by the Eurofins sequencing service (Eurofins-MWG, Louisville, KY, USA).

Table 1. Primers used for the diagnosis of southern tomato virus (STV) using reverse transcription polymerase chain reaction (RT-PCR) assays.

2.3. Multiple Sequence Alignment and Phylogenetic Analysis

Multiple sequence alignments were generated for 108 complete STV genomes (two sequenced in this study and 106 retrieved from GenBank, last accessed on 24 June 2025) using the MUSCLE algorithm implemented in MEGA version 12 [26]. Alignments were also generated for the RdRp (108 sequences), fusion protein (108 sequences), and p42 (130 sequences) genes. All alignments were pre-processed prior to downstream analysis. The best-fit nucleotide substitution model for each dataset was selected as the model with the lowest Bayesian Information Criterion (BIC) score in MEGA v12. For the complete genome, the Hasegawa–Kishino–Yano (HKY) model with gamma-distributed rate variation among sites and a proportion of invariant sites (HKY+G+I) was selected. For the p42, RdRp, and fusion protein datasets, the HKY model with a proportion of invariant sites (HKY+I) was selected. Phylogenetic trees were constructed using the maximum likelihood (ML) method in MEGA v12 with 1000 bootstrap replicates. Bootstrap values exceeding 70% were considered statistically significant.
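The lowest-BIC selection rule can be illustrated with a short sketch, using BIC = k·ln(n) − 2·ln(L). Several inputs are assumptions:

- The log-likelihoods below are hypothetical, not the values MEGA reported for STV.
- The parameter counts assume 108 sequences (2 × 108 − 3 = 213 branch lengths) plus four HKY parameters (κ and three free base frequencies), with one extra parameter each for +I and +G.
- The sample size n is taken as the alignment length.

MEGA’s own bookkeeping of parameters and sample size may differ.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: BIC = k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical maximum log-likelihoods; NOT the values obtained for STV.
# Parameter counts assume 213 branch lengths (2 * 108 - 3) + 4 HKY parameters,
# plus one parameter each for +I (invariant sites) and +G (gamma shape).
candidates = {
    "HKY": (-10250.0, 217),
    "HKY+I": (-10180.0, 218),
    "HKY+G+I": (-10179.0, 219),
}
n_sites = 3500  # assumption: sample size = alignment length (~STV genome length)

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:8s} BIC = {score:.2f}")
print("Selected model:", best)
```

BIC charges ln(n) per parameter. An extra rate-heterogeneity parameter is therefore retained only if it raises ln L by more than ln(n)/2, which is about 4.1 for n = 3500. In this toy case, +G improves ln L by only 1, so HKY+I is selected.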

2.4. Analysis of Genetic Diversity and Population Structure of STV

Nucleotide diversity (π), defined as the average number of nucleotide differences per site between pairs of sequences, was estimated for complete STV genomes using DnaSP v6.12.03 [27]. A sliding window analysis was conducted across the entire genome with a window size of 100 nucleotides and a step size of 10 nucleotides. Additionally, population genetic parameters were calculated for the complete genomes and for the p42, RdRp, and fusion protein gene regions. These parameters included:

- number of haplotypes (H) and haplotype diversity (Hd);
- number of segregating sites (S) and total number of mutations (Eta);
- average number of nucleotide differences (k);
- Watterson’s theta (θ) per sequence and per site;
- the neutrality tests Tajima’s D and Fu and Li’s D* and F*.
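A minimal sketch of π and the 100-nt/10-nt sliding window follows, assuming pre-aligned sequences given as plain strings. Alignment columns containing gaps or ambiguous bases are excluded. This complete-deletion convention is chosen here for simplicity and may differ from the DnaSP settings actually used. The function names are illustrative, not DnaSP’s.

```python
from itertools import combinations

VALID = set("ACGT")


def nucleotide_diversity(seqs):
    """Pi: mean pairwise nucleotide differences per site.

    Only alignment columns where every sequence has A, C, G or T are used
    (complete deletion of gapped/ambiguous columns, an assumption).
    """
    columns = [i for i in range(len(seqs[0])) if all(s[i] in VALID for s in seqs)]
    if not columns:
        return float("nan")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a[i] != b[i] for i in columns) for a, b in pairs)
    return differences / len(pairs) / len(columns)


def sliding_window_pi(seqs, window=100, step=10):
    """(0-based window midpoint, pi) for every full window along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


toy = ["AAAA", "CAAA", "ACAA", "AACA"]  # hypothetical, not STV data
print(nucleotide_diversity(toy))  # 9 differences / 6 pairs / 4 sites = 0.375
```

Plotting the midpoints against π from `sliding_window_pi` gives the familiar diversity profile along the genome.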

Genetic differentiation between the two populations delineated by phylogenetic analysis, Clade I and Clade II, was evaluated for the RdRp and p42 genes using the K_s*, K_st*, Z*, and S_nn statistics. Statistical significance was determined by permutation testing with 1000 replicates. To further quantify genetic divergence and gene flow, the fixation index (Fst) and the number of migrants per generation (Nm) were calculated. Fst values range from 0 (no differentiation) to 1 (complete differentiation), with values exceeding 0.33 indicating limited genetic exchange and pronounced population structure [28,29,30]. Nm estimates effective gene flow between populations. Values below 1 suggest restricted migration and a greater influence of genetic drift, whereas values above 1 imply sufficient gene flow to counteract drift-driven differentiation [31].
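Hudson’s S_nn asks, for each sequence, what fraction of its nearest neighbours (smallest pairwise distance) belong to the same population, and then averages that fraction over all sequences. Values near 1 indicate strongly differentiated populations. Values are around 0.5 when two equal-sized samples come from one panmictic population.

The sketch below is an illustration, not DnaSP’s implementation. It assumes raw Hamming distances on aligned sequences and a simple label-permutation test, and the function names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def snn(dist, labels):
    """Hudson's Snn from a distance matrix and per-sequence population labels."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        d_min = min(dist[i][j] for j in others)
        nearest = [j for j in others if dist[i][j] == d_min]  # ties share weight
        same = sum(labels[j] == labels[i] for j in nearest)
        total += same / len(nearest)
    return total / n


def snn_test(seqs, labels, n_perm=1000, seed=1):
    """Observed Snn and permutation p-value (share of shuffles with Snn >= observed)."""
    dist = [[hamming(a, b) for b in seqs] for a in seqs]
    observed = snn(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between sequence and label
        if snn(dist, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical toy data: two well-separated groups.
seqs = ["AAAA", "AAAC", "CCCC", "CCCA"]
print(snn_test(seqs, ["I", "I", "II", "II"]))
```

With only four toy sequences the permutation p-value cannot be small. With 108 genomes split into two clades, an S_nn near 1 becomes very hard to reach by chance.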

2.5. Recombination Detection and Pairwise Identity Analysis

To detect potential recombination events in STV, two complementary approaches were used. First, the alignment of 108 complete genome sequences was examined using the Recombination Detection Program (RDP v4.101) [32]. Seven detection methods (RDP, GENECONV, Chimaera, MaxChi, BootScan, SiScan, and 3Seq) were applied under default settings, with BootScan and SiScan also used as secondary scanning methods. The only change from the defaults was that sequences were treated as linear. Recombination events identified by at least four of these methods were considered statistically significant.

To further investigate recombination breakpoints, the Genetic Algorithm for Recombination Detection (GARD), available via the Datamonkey web server (https://www.datamonkey.org/ (accessed on 29 June 2025)) [33], was used. Additionally, pairwise identity analyses of the complete genome sequences, as well as the p42, RdRp, and fusion protein gene regions, were conducted using the Sequence Demarcation Tool (SDT) version 1.2 [34].
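Pairwise identity in SDT is reported as 1 − M/N. Here N is the number of aligned positions at which neither sequence has a gap, and M is the number of mismatches among those positions.

The sketch below follows that definition but makes two assumptions. First, the pairs are already aligned (SDT performs its own pairwise alignments). Second, the function names and toy sequences are hypothetical.

```python
def sdt_identity(a, b, gap="-"):
    """Pairwise identity 1 - M/N for two already-aligned sequences.

    N counts aligned positions where neither sequence has a gap; M counts
    mismatches among those positions.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no gap-free aligned positions")
    mismatches = sum(x != y for x, y in columns)
    return 1.0 - mismatches / len(columns)


def identity_matrix(seqs):
    """All-against-all identities for a {name: aligned_sequence} mapping."""
    return {p: {q: sdt_identity(seqs[p], seqs[q]) for q in seqs} for p in seqs}


# Hypothetical toy isolates, not STV sequences.
toy = {"isolate_A": "ACGTACGT", "isolate_B": "ACGAACGT", "isolate_C": "AC-TACGA"}
for name, row in identity_matrix(toy).items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

A matrix like this, colour-coded, is what the SDT identity heatmaps display.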

2.6. Bayesian Evolutionary Analysis Sampling Trees (BEAST) Analysis

To estimate the divergence times and nucleotide substitution rates of the STV fusion protein, RdRp, and p42 genes, Bayesian evolutionary analysis was conducted using BEAST v2.5.2 (Bayesian Evolutionary Analysis Sampling Trees) [35]. Sequences lacking collection dates were excluded, resulting in datasets of 104 sequences for the fusion protein and RdRp genes and 110 sequences for p42. The best-fit substitution model for each dataset was determined based on the lowest BIC score in MEGA v12. In BEAUti v2.5.2, the input settings were:

- tip dates derived from collection years in the NCBI metadata;
- the HKY substitution model;
- a relaxed log-normal molecular clock;
- the Coalescent Bayesian Skyline tree prior with default settings.

Two independent Markov chain Monte Carlo (MCMC) runs of 200 million generations each were performed, sampling every 20,000 generations. The 95% highest posterior density (HPD) interval was used to assess statistical significance. Trace files from both MCMC runs were examined in Tracer v1.7.2 [36] to assess convergence and effective sample size. Posterior samples from the independent runs were combined using LogCombiner v2.5.2 after discarding the first 10% of each run as burn-in. A maximum clade credibility (MCC) tree was then generated using TreeAnnotator v2.5.2. Final phylogenetic trees were visualized using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/ (accessed on 10 July 2025)).
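Burn-in removal, pooling of runs, and the 95% HPD interval can be sketched in a few lines. A sample-based HPD is simply the shortest interval that contains the requested fraction of posterior samples.

This sketch does not reproduce LogCombiner or Tracer internals. The helper names are hypothetical, and the "TMRCA" samples are invented.

```python
def combine_runs(runs, burnin=0.10):
    """Drop the first `burnin` fraction of each run, then pool the remainder."""
    pooled = []
    for run in runs:
        pooled.extend(run[int(len(run) * burnin):])
    return pooled


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (sample-based HPD)."""
    s = sorted(samples)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two samples")
    k = max(2, min(n, round(mass * n)))  # number of samples inside the interval
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


# Hypothetical right-skewed "TMRCA" samples (not BEAST output from this study).
run1 = [50, 60, 70, 80, 90, 100, 120, 140, 180, 240]
run2 = [55, 65, 75, 85, 95, 110, 130, 160, 200, 260]
pooled = combine_runs([run1, run2])
print(len(pooled), hpd_interval(pooled, mass=0.8))
```

With 200 million generations sampled every 20,000 generations, each run contributes about 10,000 samples. Roughly 9,000 of these remain after the 10% burn-in. For skewed posteriors such as TMRCA, the HPD interval is asymmetric around the mean, as with the 56.9 to 241.4 year interval reported here.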

2.7. Protein Disorder Analysis

Intrinsically disordered protein regions have been widely predicted and experimentally validated in plant RNA viruses [37,38,39,40,41]. The likelihood of intrinsic disorder in the p42, RdRp, and fusion proteins of the STV isolates obtained in this study was predicted using the IUPred2A webserver (http://iupred2a.elte.hu (accessed on 2 July 2025)) [42]. The predictions were compared with those for the designated STV reference genome sequence (NC_011591). The distribution of disordered residues was further analyzed by comparing the N-terminal and C-terminal regions of each protein. IUPred2A estimates disorder propensity from the total pairwise interaction energy, which it estimates from amino acid composition. To evaluate differences in the proportion of disordered residues between the study isolates and the reference genome, the non-parametric Wilcoxon rank-sum test was performed in R at a significance level of α = 0.05.
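The per-protein summaries can be sketched from IUPred-style per-residue scores. The sketch makes two assumptions. First, it follows the common convention that a residue scoring above 0.5 is disordered. Second, it splits each protein at its midpoint into N- and C-terminal halves; the boundary used in this study is not stated, so that split is hypothetical.

```python
def disorder_summary(scores, threshold=0.5):
    """Summarise per-residue disorder scores (one score per residue, in order).

    Residues scoring above `threshold` are counted as disordered. The protein
    is split at its midpoint into N- and C-terminal halves (an assumption).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("empty score list")
    disordered = [i for i, s in enumerate(scores) if s > threshold]
    mid = n // 2  # hypothetical N/C boundary
    n_term = sum(1 for i in disordered if i < mid)
    c_term = len(disordered) - n_term
    total = len(disordered)

    def share(part):
        # Percentage of all disordered residues falling in one region.
        return 100.0 * part / total if total else 0.0

    return {
        "percent_disordered": 100.0 * total / n,
        "percent_of_disordered_in_N": share(n_term),
        "percent_of_disordered_in_C": share(c_term),
    }


# Hypothetical scores for a 10-residue toy protein with one disordered C-terminal residue.
print(disorder_summary([0.1] * 8 + [0.7, 0.2]))
```

Per-isolate percentages produced this way are the values that a rank-sum test would then compare between the study isolates and the reference.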

2.8. Analysis of Selective Pressure

To assess the selection pressure acting on the RdRp (n = 108), fusion protein (n = 108), and p42 (n = 130) genes of STV, analyses were conducted using the Datamonkey webserver [35]. Three methods were employed: Fast Unconstrained Bayesian AppRoximation (FUBAR) [43], Single-Likelihood Ancestor Counting (SLAC), and Mixed Effects Model of Evolution (MEME) [44].

3. Results

3.1. Genome Assembly of STV Isolates and RT-PCR Confirmation

High-throughput sequencing (HTS) of composite samples OK1 and OK2 yielded 51,084,045 and 56,242,802 trimmed reads, respectively, with average read lengths of 129.6 bp and 133.86 bp. BLASTn and BLASTx searches of the de novo assembled contigs against the NCBI non-redundant GenBank database identified two new STV genome sequences from tomatoes in Oklahoma: GenBank accession PV786594 (isolate OK1) and PV786595 (isolate OK2). Both shared over 99% identity with the reference isolate MK948545.

To confirm the presence of STV in individual tomato samples, RT-PCR was performed using total RNA and STV-specific primers targeting conserved genomic regions. The expected amplification products were obtained from all five symptomatic samples, whereas no amplification occurred in the healthy control samples (Figures S1 and S2). PCR products were then purified and Sanger sequenced. The resulting sequences shared over 99% identity with known STV isolates, confirming STV infection in the tested samples.

3.2. Phylogenetic Analysis

Based on complete genome sequences, the 108 STV isolates clustered into two major phylogenetic clades, designated Clade I and Clade II (Figure 1A). Most isolates (79) grouped into Clade I and originated from diverse regions:

- Asia: Bangladesh, China, Israel, Japan, Pakistan, South Korea, Thailand, Vietnam, and Turkey;
- Europe: France, Serbia, Slovenia, Spain, United Kingdom, and Turkey;
- North America: Canada, Dominican Republic, Mexico, Panama, and USA;
- South America: Brazil and Colombia;
- Oceania: Fiji.

In contrast, Clade II comprised fewer isolates (29), including those from Albania, Slovenia, and Switzerland, together with some isolates from China, Germany, Serbia, and South Korea. Notably, the two STV isolates characterized in this study grouped within Clade II, whereas previously reported STV isolates from the USA were placed in Clade I.

Figure 1. Evolutionary analysis of southern tomato virus (STV) based on 108 complete genome sequences. (A) Maximum likelihood phylogenetic tree (1000 bootstrap replicates) based on complete genome nucleotide sequences of STV. (B) Pairwise nucleotide identity heatmap of complete STV genomes. STV isolates highlighted in red or marked with an asterisk (*) were identified in this study.

The phylogenetic analysis of the fusion protein gene also produced a tree topology congruent with that of the complete genomes (Figure S3), with the newly characterized isolates again placed in Clade II, in contrast to previously reported USA isolates, which remained in Clade I.

Similarly, the phylogenetic tree based on the RdRp gene (Figure S4) showed a topology comparable to that of the complete genome tree. All isolates in Clade II retained their placement, with the exception of OL741992 from Slovenia, which grouped in Clade I in the RdRp-based tree. The Oklahoma isolates from this study continued to cluster within Clade II.

The phylogenetic tree constructed from the p42 gene (Figure S5) largely mirrored the topology of the complete genome tree, with one exception: accession OR725027 from China, which clustered in Clade II in the complete genome tree but grouped in Clade I in the p42-based analysis. The isolates reported in this study consistently clustered in Clade II.

Pairwise nucleotide identity analysis performed using SDT for the complete genomes (Figure 1B), fusion protein (Figure S6), RdRp (Figure S7), and p42 (Figure S8) sequences further supported the clustering patterns observed in the corresponding phylogenetic analyses.

3.3. Genetic Diversity, Genetic Differentiation, and Migration in STV Populations

To characterize the genetic variability of STV, a comprehensive population genetic analysis was performed using complete genome and gene-specific datasets. Analysis of 108 complete STV genomes revealed a relatively low mutation rate (Eta = 6.05%) but high haplotype diversity (H_d = 0.979) and a moderate nucleotide diversity (π) of 0.00741. Neutrality tests yielded statistically significant negative values for Fu and Li’s D and F tests, whereas Tajima’s D was negative but not statistically significant (Table 2). This pattern suggests population expansion or purifying selection.
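Tajima’s D contrasts two estimates of θ: the mean pairwise difference k and Watterson’s S/a1. An excess of rare variants pulls k below S/a1 and makes D negative, which is the pattern reported here.

The sketch below implements Tajima’s standard formula on hypothetical gap-free aligned sequences. It uses segregating sites S rather than the total number of mutations (Eta). DnaSP can use either, so values computed this way may differ slightly from Table 2.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for gap-free aligned sequences of equal length."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(seqs[0])
    # Segregating sites: columns with more than one nucleotide state.
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # k: mean number of pairwise differences per sequence pair.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = s / a1  # Watterson's theta per sequence
    return (k - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignments (not STV data).
rare_variants = ["AAAA", "CAAA", "ACAA", "AACA"]   # three singletons
intermediate = ["AAAA", "AAAA", "CCCC", "CCCC"]    # variants at 50% frequency
print(round(tajimas_d(rare_variants), 3))   # negative
print(round(tajimas_d(intermediate), 3))    # positive
```

A negative D that fails to reach significance, as observed here for the complete genomes, means the excess of rare variants is present but modest relative to the test’s variance.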

Table 2. Analysis of genetic diversity within the complete genomes and individual genes of southern tomato virus (STV) isolates.

Among individual genes, RdRp exhibited the highest nucleotide diversity (π = 0.00764). It also showed greater haplotype diversity and higher numbers of haplotypes, mutations, and segregating sites than p42. Neutrality tests for p42 mirrored those of the complete genomes, showing significant negative values for Fu and Li’s D and F tests and a non-significant negative value for Tajima’s D. For RdRp, Fu and Li’s D was significantly negative, while Fu and Li’s F and Tajima’s D were not significant. The fusion protein gene had a nucleotide diversity of π = 0.00741, and all neutrality tests returned negative values, with Fu and Li’s D and F tests being statistically significant. Interestingly, the results for the fusion protein gene were identical to those obtained for the complete STV genomes.

For RdRp, K_s* was 1.68705, K_st* 0.30733, and Z* 7.16782, indicating clear genetic differentiation between the two phylogroups, which was further supported by a significant S_nn value of 1.00000. Similarly, for p42, K_s* 1.07057, K_st* 0.33989, and Z* 7.63219 indicated strong differentiation, with S_nn 0.99231 confirming the result. The high Fst values (0.81010 for RdRp and 0.80377 for p42) and low migration rates (Nm = 0.06) further support the pronounced genetic divergence between these phylogroups.
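The reported Fst and Nm values can be cross-checked with Wright’s island-model approximation Fst ≈ 1/(1 + 4Nm), solved for Nm. The estimator actually used to obtain Nm is not stated. This diploid-style convention is an assumption; haploid formulations use 2Nm instead and would give roughly twice the value.

```python
def nm_from_fst(fst):
    """Migrants per generation under Wright's island model,
    obtained by solving Fst = 1 / (1 + 4 * Nm) for Nm."""
    if not 0 < fst <= 1:
        raise ValueError("Fst must be in (0, 1]")
    return (1.0 - fst) / (4.0 * fst)


reported_fst = {"RdRp": 0.81010, "p42": 0.80377}  # values from the Results
for gene, fst in reported_fst.items():
    print(f"{gene}: Fst = {fst:.5f} -> Nm = {nm_from_fst(fst):.3f}")
```

Both values round to Nm = 0.06, matching the reported migration rate. Under this approximation, Nm falls below 1 once Fst exceeds 0.2.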

3.4. Recombination in the STV Population

Recombination detection based on the complete genome alignment revealed four potential recombination events, identified in RDP v4.101 by the MaxChi, Chimaera, SiScan, and 3Seq methods.

- The first event involved accession OR725027 (China) as the recombinant, with accessions PQ492143 (Fiji) and KT438549 (China) identified as the major and minor parents, respectively. It was supported by three methods (MaxChi, SiScan, and 3Seq), with a significant 3Seq p-value of 2.766 × 10⁻⁴.
- The second event, supported by two methods (MaxChi and 3Seq), was found in accession OL471984 (Slovenia). OL471992 (Slovenia) was the major parent and PV786595 (OK isolate, this study) the minor parent (p = 1.739 × 10⁻³, 3Seq).
- The third event, detected by MaxChi alone, involved accession KT438549 (China), with an unknown major parent and PV786595 (OK isolate, this study) as the minor parent.
- The fourth event was identified by SiScan, Chimaera, and 3Seq, with OK309713 (Turkey) as the recombinant, PQ429143 (Fiji) as the minor parent, and OK309721 (Turkey) as the major parent (p = 4.423 × 10⁻², Chimaera).

RDP flagged the second and fourth events as possibly attributable to other evolutionary processes. None of these events was considered significant, as none was detected by at least four methods.
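Applying the Methods’ significance rule (support from at least four detection methods) to the four candidate events is a simple filter. The method sets below are transcribed from the text, and the dictionary keys are illustrative labels.

```python
MIN_METHODS = 4  # significance rule stated in the Methods

# Supporting methods per candidate event, transcribed from the Results.
candidate_events = {
    "event 1 (OR725027)": {"MaxChi", "SiScan", "3Seq"},
    "event 2 (OL471984)": {"MaxChi", "3Seq"},
    "event 3 (KT438549)": {"MaxChi"},
    "event 4 (OK309713)": {"SiScan", "Chimaera", "3Seq"},
}

significant = [event for event, methods in candidate_events.items()
               if len(methods) >= MIN_METHODS]

for event, methods in candidate_events.items():
    status = "significant" if event in significant else "not significant"
    print(f"{event}: {len(methods)} method(s) -> {status}")
print("Significant events:", significant)
```

The best-supported events reach only three methods, one short of the threshold.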

Additionally, GARD analysis detected a single recombination breakpoint in the complete genome alignment, near nucleotide position 1540. The ΔcAIC values were 65.4360 relative to the null model and 343.678 relative to the single-tree multiple-partition model.

3.5. Bayesian Phylogenetic Analysis and Substitution Rate

Bayesian phylogenetic analyses were conducted to estimate the evolutionary timescale and substitution rates of STV genes. All effective sample sizes (ESS) exceeded 200, indicating adequate sampling and convergence of parameters. The maximum clade credibility (MCC) tree (Figure 2A) derived from the fusion protein gene (n = 104) placed the time to the most recent common ancestor (TMRCA) approximately 135 years ago (circa 1889). The 95% highest posterior density (HPD) interval spanned 56.9 to 241.4 years ago (Figure 2B). The mean substitution rate was 6.514 × 10⁻⁵ substitutions per site per year (Table 3).
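ESS measures how many independent draws an autocorrelated MCMC trace is worth: ESS = N/τ, where τ = 1 + 2 Σ ρ_k is the integrated autocorrelation time.

The minimal sketch below uses a simple rule that truncates the sum at the first non-positive autocorrelation. Tracer uses its own estimator, so the numbers will not match it exactly. The traces are hypothetical.

```python
def effective_sample_size(trace):
    """Estimate ESS = N / tau for one MCMC trace.

    tau = 1 + 2 * sum(rho_k), where rho_k is the lag-k autocorrelation; the sum
    stops at the first lag whose autocorrelation is not positive (a simple
    truncation rule chosen for this sketch).
    """
    n = len(trace)
    mean = sum(trace) / n
    dev = [x - mean for x in trace]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("ESS is undefined for a constant trace")
    tau = 1.0
    for lag in range(1, n):
        rho = sum(dev[t] * dev[t + lag] for t in range(n - lag)) / (n * var)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau


# A trace that flips value every step has negative lag-1 autocorrelation, so
# the sum is truncated at once; a trace stuck in one state for half the run
# is heavily autocorrelated and is worth only a few independent samples.
print(effective_sample_size([1.0, -1.0] * 50))         # 100.0
print(effective_sample_size([0.0] * 50 + [1.0] * 50))  # about 3
```

Because ESS shrinks with autocorrelation, long runs and sparse sampling (here every 20,000 generations) are what push every parameter above the conventional 200 threshold.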

Figure 2. Bayesian analysis of 104 fusion protein nucleotide sequences of southern tomato virus (STV). (A) Time-scaled Bayesian maximum clade credibility tree constructed from fusion protein sequences of STV isolates. (B) Posterior distribution of tree height estimated by BEAST analysis of fusion protein gene sequences of STV isolates; blue indicates samples within the 95% posterior probability range and yellow indicates samples outside it. Branches highlighted in red indicate STV isolates obtained in the present study.

Table 3. Estimates of nucleotide substitution rate and the age of diversity for southern tomato virus (STV) isolates.

Similarly, the RdRp gene exhibited a mean substitution rate of 4.867 × 10⁻⁵ substitutions per site per year (Table 3), and its TMRCA was estimated at 188.3 years ago (circa 1836) (Figure S9).

In contrast, the p42 gene showed a higher mean substitution rate of 1.184 × 10⁻⁴ substitutions per site per year (Table 3) and a more recent TMRCA of approximately 58.5 years ago (circa 1965) (Figure S10). The MCC trees for the fusion protein, RdRp (Figure S11), and p42 (Figure S12) genes were largely consistent with the corresponding ML trees, differing only in the placement of a few isolates.

3.6. Comparison of Intrinsically Disordered Regions in STV Proteins

Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions are known to facilitate protein–protein interactions (PPIs), including complex formation, and to function in transcription and translation [45,46]. Analysis of STV proteins using IUPred2A revealed that p42 had the highest proportion of disordered amino acid residues (23.34%) (Figure S13), followed by RdRp (3.81%) (Figure S14) and the fusion protein (2.92%) (Figure S15). In all three proteins, most disordered residues were concentrated in the C-terminal region:

- p42: 98.86% C-terminal, 1.14% N-terminal;
- RdRp: 93.10% C-terminal, 6.90% N-terminal;
- fusion protein: 87.10% C-terminal, 12.90% N-terminal.

The Wilcoxon rank-sum test indicated no significant differences (α = 0.05) in the percentage of disordered residues between the study isolates and the reference genome.

3.7. Selective Pressure Analysis

For p42, FUBAR identified evidence of pervasive positive (diversifying) selection at four codon sites and pervasive negative (purifying) selection at twenty codon sites, using a posterior probability threshold of 0.9. For RdRp, eight codon sites showed evidence of positive selection, while twenty-two sites were under purifying selection. For the fusion protein, thirty-four sites were under purifying selection and four sites were under diversifying selection. These results suggest that, although purifying selection is the dominant evolutionary force acting on these genes, a subset of codon positions may be experiencing adaptive evolutionary pressure.
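FUBAR reports two posterior probabilities for every codon: that the nonsynonymous rate β exceeds the synonymous rate α (positive selection), and that α exceeds β (purifying selection). Classifying sites at the 0.9 threshold used here reduces to a filter.

The posterior values in the example are hypothetical. The input format is an assumption, not Datamonkey’s output schema.

```python
def classify_sites(posteriors, threshold=0.9):
    """Split codon sites by FUBAR-style posterior probabilities.

    posteriors maps codon position -> (P(beta > alpha), P(alpha > beta)),
    where beta and alpha are the nonsynonymous and synonymous rates.
    """
    positive = sorted(c for c, (p_pos, _) in posteriors.items() if p_pos > threshold)
    negative = sorted(c for c, (_, p_neg) in posteriors.items() if p_neg > threshold)
    return positive, negative


# Hypothetical posteriors for four codons (not values from this study).
example = {12: (0.95, 0.01), 40: (0.10, 0.97), 77: (0.50, 0.40), 105: (0.02, 0.93)}
positive, negative = classify_sites(example)
print("diversifying:", positive)  # [12]
print("purifying:", negative)     # [40, 105]
```

Codons such as 77, with neither posterior above 0.9, are left unclassified. Such codons are typically the majority of sites in each gene.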

SLAC analysis revealed no evidence of pervasive positive (diversifying) selection in the RdRp or p42 genes at a significance level of 0.1. However, it detected pervasive negative (purifying) selection at sixteen codon sites in the RdRp gene and at six codon sites in the p42 gene. In contrast, the fusion protein gene showed one site under positive selection and twenty-four sites under purifying selection. MEME analysis identified evidence of episodic diversifying selection at a single site in each of the p42, RdRp, and fusion protein genes, based on the likelihood ratio test with a significance threshold of p ≤ 0.1.

4. Discussion

Viruses are characterized by high evolutionary rates, enabling rapid mutation and contributing to the extensive genetic diversity observed in viral genomes. These evolutionary imprints can be traced through analyses of viral evolutionary dynamics. In tomato, the number of known viral pathogens has expanded significantly in recent years, with more than 312 viruses reported so far [47]. STV is one of the persistent RNA viruses of tomato and is transmitted vertically through seeds rather than mechanically or by grafting. Although no vector has been confirmed, earlier studies have hypothesized a potential role of insect vectors in STV transmission [6].

Interestingly, STV has been reported to have mutualistic effects in singly infected tomato plants, improving certain plant growth parameters [48]. However, in co-infections with other viruses, STV is associated with more severe disease symptoms [11,49].

In this study, we performed a comprehensive analysis of STV isolates identified for the first time in Oklahoma, alongside publicly available global isolates. Analyses included assessment of genomic variation, recombination, nucleotide substitution rates, disordered protein regions, and phylogenetic relationships. Phylogenetic trees constructed from the complete genome and individual genes (p42, RdRp, and the fusion protein) consistently revealed two primary clades, independent of host or geographic origin. These findings align with previous studies [11,18,21,25]. However, the Oklahoma isolates clustered within Clade II (based on the complete genome and individual genes), whereas previously reported U.S. isolates grouped in Clade I. This divergence may reflect the introduction of contaminated tomato seeds through global trade, as hypothesized for tomato brown rugose fruit virus (ToBRFV) in the Netherlands [50]. Alternatively, it may result from local viral evolution over time, driven by the error-prone RdRp, which lacks proofreading capability [51]. Adaptation to an unidentified vector, if one is involved in transmission, could also contribute to genetic divergence, as speculated previously [6]. The presence of two distinct STV subpopulations among Slovenian isolates, as observed in our study, supports the hypothesis that distinct viral lineages co-circulate within a single geographic region.

Bayesian coalescent analysis of the fusion protein gene estimated that the most recent common ancestor of STV existed around 135 years ago (circa 1889), with a 95% highest posterior density (HPD) interval ranging from 56.8 to 241.4 years. This timeframe predates the first reported case of “tomato decline” in California’s Imperial Valley in 1984, where virus-like symptoms such as yellowing, decline, and poor fruit set were observed [52]. Given STV’s cryptic nature and its frequent detection in asymptomatic infections [6,11,49,53], it is plausible that the virus remained undetected for decades. STV is frequently detected in mixed infections, particularly since the advent of high-throughput sequencing technologies [8,10,11,12,14,15,16,18,19,21,22,54,55]. STV can cause tomato yellow stunt disease (ToYSD) [6,13,14,17,20,24] in association with other viruses, including tomato yellow leaf curl virus (TYLCV), tomato chlorosis virus (ToCV), tomato infectious chlorosis virus (TICV), pepino mosaic virus (PepMV), cucumber mosaic virus (CMV), tomato mosaic virus (ToMV), and tomato spotted wilt virus (TSWV). STV interacts with CMV and PepMV in mixed infections, enhancing their pathogenic effects [49]. Mixed infections involving unrelated viruses within a host can lead to the emergence of new diseases or enhance the pathogenicity of the co-infecting viruses [56,57]. Because mixed infections are common under natural conditions, interactions between pathogenic and persistent plant viruses warrant further in-depth study [18].

BEAST analysis based on RdRp and p42 estimated TMRCAs of approximately 188.3 and 58.5 years ago, respectively. The evolutionary history of viruses in the family Amalgaviridae suggests an important role for recombination, with evidence pointing to gene flow between double-stranded RNA viruses (e.g., Partitiviridae) and negative-strand RNA viruses (e.g., Phlebovirus and Tenuivirus) [58]. Further phylogenomic studies that include representative taxa from these families may offer deeper insight into the genomic architecture and inter-viral gene exchanges that have shaped amalgavirus evolution. The estimated substitution rate for the STV RdRp gene was 4.866 × 10⁻⁵ substitutions per site per year. Research on substitution rates in dsRNA viruses is limited [59]; however, experimental studies on the dsRNA bacteriophage φ6 have reported mutation rates on the order of 10⁻⁴ [60]. These findings suggest that dsRNA viruses may exhibit lower substitution rates, particularly in plant hosts, where virus evolution tends to proceed more slowly than in animal-infecting viruses [59]. Further comparative research is needed to elucidate the evolutionary constraints and dynamics specific to plant dsRNA viruses.

Population structure analysis revealed relatively low genetic diversity among STV isolates. The p42 gene exhibited lower nucleotide diversity (π = 0.00550) than RdRp (π = 0.00764), which also showed greater haplotype diversity and a higher number of segregating sites. These results are consistent with earlier reports [11,18,21,25]. Future studies could investigate how individual open reading frames (ORFs) contribute to overall viral diversity. This approach has been used successfully for the coat protein gene of soybean mosaic virus [61] and the VPg gene of potato virus Y [62].

Selection pressure analyses using FUBAR and SLAC revealed that purifying selection is the predominant evolutionary force acting on STV genes. FUBAR detected limited positive selection (four codons each in the fusion protein and p42 genes), while identifying twenty-one and thirty-five codons under purifying selection in p42 and the fusion protein, respectively. SLAC analysis corroborated these findings, detecting codons under purifying selection (twenty-five in the fusion protein and six in p42) and only one site under positive selection, in the fusion protein. MEME analysis further identified one site in the p42 gene and two sites in the fusion protein gene under episodic positive selection, suggesting lineage-specific adaptive evolution. Fu and Li’s D and F tests gave significantly negative values for the complete genome and the individual genes, except for Fu and Li’s F test for RdRp, which was not significant (Table 2). Such negative values are consistent with purifying (negative) selection, although recent population expansion can produce a similar signature. Tajima’s D values were negative but not statistically significant, possibly because transient opposing selection forces can reduce the power of the test [63]. These findings partly agree with previous studies; for example, earlier work [21,25] reported similar patterns for the p42 and fusion protein genes. Our estimates of Tajima’s D differed from those reported previously [11], potentially owing to the larger dataset used in this study.
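Tajima’s D compares two estimates of the population mutation parameter. The first is the mean number of pairwise differences (k). The second is Watterson’s θ, which divides the number of segregating sites by a₁ = Σ_{i=1}^{n−1} 1/i. An excess of rare variants, as expected under purifying selection or population expansion, makes θ exceed k and drives D negative. The sketch below implements Tajima’s (1989) formula. Following the “Theta” columns of Table 2, it assumes that the total number of mutations (Eta) is used in place of the number of segregating sites. With the p42 values (k = 6.23280, Eta = 69, n = 130), it gives θ ≈ 12.682 and D ≈ −1.605, close to the tabulated values. It is an illustrative reimplementation, not the DnaSP code used in the study.

```python
import math


def harmonic_sums(n: int) -> tuple[float, float]:
    """Return a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1 .. n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    return a1, a2


def watterson_theta(segregating: int, n: int) -> float:
    """Watterson's theta per sequence: segregating / a1."""
    a1, _ = harmonic_sums(n)
    return segregating / a1


def tajimas_d(k: float, segregating: int, n: int) -> float:
    """Tajima's (1989) D from the mean pairwise differences k, the number of
    segregating sites (Eta here, by assumption) and the sample size n."""
    if n < 4 or segregating <= 0:
        # The variance term is zero for n < 4, so D is undefined there.
        raise ValueError("need n >= 4 and at least one segregating site")
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * segregating + e2 * segregating * (segregating - 1)
    return (k - segregating / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # p42 row of Table 2: k = 6.23280, Eta = 69, n = 130
    print(round(watterson_theta(69, 130), 3), round(tajimas_d(6.23280, 69, 130), 3))
```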

Population genetic differentiation analyses of both the RdRp and p42 genes provided strong evidence of substantial divergence between the Clade I and Clade II phylogroups. The genetic differentiation statistics K_s*, K_st*, and Z* were highly significant for both genes (p < 0.001), indicating that these groups are genetically distinct. For the RdRp gene, the significant K_s* (1.687) and Z* (7.168) values, together with the maximum S_nn value (1.000), indicate complete genetic segregation between the phylogroups: every sequence’s nearest neighbors belonged to its own phylogroup. Similarly, the p42 gene showed strong differentiation, with K_st* = 0.340, Z* = 7.632, and S_nn = 0.992. Fst values exceeding 0.33, the conventional threshold for infrequent gene flow, for both genes indicate that a substantial proportion of the genetic variation is distributed between phylogroups, reinforcing the presence of strong population structure. In addition, the low effective number of migrants implies minimal gene flow, insufficient to prevent the accumulation of genetic divergence. Such restricted migration and strong differentiation are consistent with long-term evolutionary separation and may reflect ecological or host-driven barriers that limit genetic exchange. These results are consistent with those of an earlier study [21], which also reported significant K_s*, K_st*, and Z* values between the two lineages, with Fst exceeding 0.33 and an absolute migration rate below 1.
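Two of the quantities above have simple definitions that help with interpretation. Hudson’s S_nn averages, over all sequences, the fraction of each sequence’s nearest neighbor(s) that come from the same group. A value of 1.000 therefore means that every sequence’s closest relative lies in its own phylogroup. The Fst threshold of 0.33 corresponds to Nm = 1 under the haploid island-model approximation Nm = (1 − Fst)/(2Fst), which is often applied to viral sequence data. Solving (1 − Fst)/(2Fst) = 1 gives Fst = 1/3. The sketch below is illustrative only. It uses uncorrected Hamming distances, assumes aligned gap-free sequences, and its function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def snn(seqs: list[str], groups: list[str]) -> float:
    """Hudson's (2000) nearest-neighbour statistic S_nn.

    For each sequence j, X_j is the fraction of its nearest neighbours (all
    other sequences at the minimum distance from j) that share j's group;
    S_nn is the mean of X_j over all sequences.
    """
    if len(seqs) != len(groups) or len(seqs) < 2:
        raise ValueError("need at least two sequences, each with a group label")
    total = 0.0
    for j, sj in enumerate(seqs):
        dists = [(hamming(sj, sk), k) for k, sk in enumerate(seqs) if k != j]
        d_min = min(d for d, _ in dists)
        nearest = [k for d, k in dists if d == d_min]
        same = sum(groups[k] == groups[j] for k in nearest)
        total += same / len(nearest)
    return total / len(seqs)


def nm_haploid(fst: float) -> float:
    """Effective number of migrants, Nm = (1 - Fst) / (2 * Fst), haploid model."""
    if not 0 < fst <= 1:
        raise ValueError("Fst must lie in (0, 1]")
    return (1 - fst) / (2 * fst)


if __name__ == "__main__":
    seqs = ["AAAA", "AAAT", "TTTT", "TTTA"]  # hypothetical sequences
    print(snn(seqs, ["I", "I", "II", "II"]))  # two perfectly segregated groups
    print(nm_haploid(1 / 3), nm_haploid(0.4))
```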

Recombination is considered a major driver of genomic variation in plant RNA viruses [64]. No recombination events were identified in the complete STV genomes in this study, consistent with previous findings [18,21,25]. In contrast, an earlier study [11] identified three recombination events among 44 STV isolates, which may reflect differences in the threshold criteria applied during analysis.

Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) lack a stable tertiary (three-dimensional) structure. Owing to their high plasticity and flexibility [67,68], they play critical roles in biological functions such as cell signaling, cell regulation, survival, differentiation, proliferation, and apoptosis [65,66]. In viruses, IDPRs facilitate adaptability, immune evasion, and the regulation of replication [69,70,71]. Our analysis revealed that p42 contained the highest proportion of disordered residues (23.34%), compared with RdRp (3.81%) and the fusion protein (2.92%), with disordered residues located primarily in the C-terminal regions. IDPRs are known to tolerate high mutation rates, often resulting in functional polymorphism and adaptive potential [37,72]. This is consistent with findings from other RNA viruses, such as Nodamura virus (NoV), in which IDPRs play a central role in environmental adaptability [69].
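Percentages such as the 23.34% reported for p42 come from thresholding per-residue disorder scores. The sketch below assumes IUPred-style scores in [0, 1] and the commonly used cutoff of 0.5, so residues scoring above it count as disordered. The exact predictor settings used in this study may differ, and the function names and score profile are hypothetical. The sketch returns the percentage of disordered residues and the contiguous disordered segments, which makes it easy to check whether disorder clusters at the C-terminus.

```python
def disorder_summary(scores: list[float],
                     threshold: float = 0.5) -> tuple[float, list[tuple[int, int]]]:
    """Return (percent of residues with score > threshold,
    list of 1-based inclusive (start, end) disordered segments)."""
    if not scores:
        raise ValueError("empty score profile")
    flags = [s > threshold for s in scores]
    percent = 100.0 * sum(flags) / len(flags)
    segments: list[tuple[int, int]] = []
    start = None
    for pos, disordered in enumerate(flags, start=1):
        if disordered and start is None:
            start = pos  # a new disordered segment begins here
        elif not disordered and start is not None:
            segments.append((start, pos - 1))  # segment ended at the previous residue
            start = None
    if start is not None:  # segment runs to the C-terminus
        segments.append((start, len(flags)))
    return percent, segments


def c_terminal_fraction(scores: list[float], window: int, threshold: float = 0.5) -> float:
    """Fraction of all disordered residues that fall in the last `window` residues."""
    if window <= 0:
        raise ValueError("window must be positive")
    flags = [s > threshold for s in scores]
    total = sum(flags)
    return sum(flags[-window:]) / total if total else 0.0


if __name__ == "__main__":
    profile = [0.1, 0.2, 0.6, 0.7, 0.3, 0.8, 0.9]  # hypothetical per-residue scores
    print(disorder_summary(profile), c_terminal_fraction(profile, window=2))
```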

5. Conclusions

This study presents the first report of STV in Oklahoma and provides a comprehensive analysis of its genetic diversity and evolutionary characteristics. The phylogenetic placement of the Oklahoma isolates in Clade II, distinct from other U.S. isolates, suggests either introduction via international seed trade or local evolutionary divergence. BEAST analysis of the fusion protein gene estimated the TMRCA of STV at approximately 135 years ago. Protein disorder predictions indicated high disorder in the C-terminal region of p42, suggesting a possible role in host interactions. Among the genes analyzed, RdRp exhibited higher genetic variability than p42, and all genes appear to be evolving predominantly under purifying selection. These findings emphasize the need for ongoing genomic surveillance and highlight the importance of molecular tools for understanding the epidemiology, evolution, and management of persistent plant viruses such as STV.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v17091193/s1, Figure S1: Agarose gel electrophoresis of PCR products amplified using the STV1 primer pair. Lane 1: molecular weight marker (100 bp Plus); lanes 2–6: tomato samples K71–K75; lane 7: healthy tomato control; lane 8: negative control (water); Figure S2: Agarose gel electrophoresis of PCR products amplified using the STV2 primer pair. Lane 1: molecular weight marker (100 bp Plus); lanes 2–6: tomato samples K71–K75; lane 7: healthy tomato control; lane 8: negative control (water); Figure S3: Maximum likelihood phylogenetic tree of 108 STV fusion protein gene nucleotide sequences with 1000 bootstrap replicates; Figure S4: Maximum likelihood phylogenetic tree of 108 STV RdRp gene nucleotide sequences with 1000 bootstrap replicates; Figure S5: Maximum likelihood phylogenetic tree of 130 STV p42 gene nucleotide sequences with 1000 bootstrap replicates; Figure S6: Nucleotide pairwise identity heatmap of fusion protein gene sequences (STV isolates highlighted in red or marked with an asterisk (*) were identified in this study); Figure S7: Nucleotide pairwise identity heatmap of RdRp gene sequences (STV isolates highlighted in red or marked with an asterisk (*) were identified in this study); Figure S8: Nucleotide pairwise identity heatmap of p42 gene sequences (STV isolates highlighted in red or marked with an asterisk (*) were identified in this study); Figure S9: Posterior distribution of tree height inferred from BEAST analysis of 104 RdRp gene sequences of STV isolates; Figure S10: Posterior distribution of tree height inferred from BEAST analysis of 110 p42 gene sequences of STV isolates; Figure S11: Time-scaled Bayesian maximum clade credibility tree based on RdRp sequences of STV (branches highlighted in red indicate STV isolates obtained in the present study); Figure S12: Time-scaled Bayesian maximum clade credibility tree based on p42 sequences of STV (branches highlighted in red indicate STV isolates obtained in the present study); Figure S13: Distribution of ordered and disordered amino acid residues in the p42 protein of southern tomato virus (STV) isolates (A) PV786594-OK1, (B) PV786595-OK2, and (C) NC_011591; Figure S14: Distribution of ordered and disordered amino acid residues in the RdRp protein of STV isolates (A) PV786594-OK1, (B) PV786595-OK2, and (C) NC_011591; Figure S15: Distribution of ordered and disordered amino acid residues in the fusion protein of STV isolates (A) PV786594-OK1, (B) PV786595-OK2, and (C) NC_011591.

Author Contributions

A.A. conceived the study, contributed to rewriting parts of the manuscript, and edited and proofread multiple drafts. S.J. confirmed the observations in both field and laboratory settings, performed all analyses, wrote the first draft, revised the manuscript multiple times, and organized the literature review and citations. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the US Department of Agriculture (USDA), grant number 0409018700.

Data Availability Statement

The STV genomes obtained in this study have been submitted to GenBank under accession numbers PV786594 (OK1) and PV786595 (OK2) and are publicly available at https://www.ncbi.nlm.nih.gov/genbank (accessed on 29 June 2025).

Acknowledgments

The authors would like to thank the Department of Biological Science and The University of Tulsa for their support and resources throughout the course of this study.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Panno, S.; Davino, S.; Caruso, A.G.; Bertacca, S.; Crnogorac, A.; Mandić, A.; Noris, E.; Matić, S. A Review of the Most Common and Economically Important Diseases That Undermine the Cultivation of Tomato Crop in the Mediterranean Basin. Agronomy 2021, 11, 2188.
2. Harvey, M.; Quilley, S.; Beynon, H. Exploring the Tomato: Transformations in Nature, Economy and Society; Edward Elgar: Cheltenham, UK, 2002.
3. Raiola, A.; Rigano, M.M.; Calafiore, R.; Frusciante, L.; Barone, A. Enhancing the Health-Promoting Effects of Tomato Fruit for Biofortified Food. Mediat. Inflamm. 2014, 2014, 139873.
4. Food and Agriculture Organization of the United Nations (FAO). FAOSTAT Statistical Database. 2024. Available online: https://www.fao.org/faostat/en/#home (accessed on 21 March 2025).
5. Rashid, T.S.; Sijam, K.; Awla, H.K.; Saud, H.M.; Kadir, J. Pathogenicity Assay and Molecular Identification of Fungi and Bacteria Associated with Diseases of Tomato in Malaysia. Am. J. Plant Sci. 2016, 7, 949–957.
6. Sabanadzovic, S.; Valverde, R.A.; Brown, J.K.; Martin, R.R.; Tzanetakis, I.E. Southern Tomato Virus: The Link Between the Families Totiviridae and Partitiviridae. Virus Res. 2009, 140, 130–137.
7. Magdalena, C.; Slimen, A.B.; Mitri, E.; Orges, C.; Frasheri, D.; Merkuri, J.; Parrella, G.; Elbeaino, T. Molecular Detection and Characterization of Viruses Infecting Greenhouse-Grown Tomatoes in Albania. Phytopathol. Mediterr. 2025, 64, 77–86.
8. Padmanabhan, C.; Zheng, Y.; Li, R.; Fei, Z.; Ling, K.-S. Complete Genome Sequence of Southern Tomato Virus Naturally Infecting Tomatoes in Bangladesh. Genome Announc. 2015, 3, e01522-15.
9. Padmanabhan, C.; Zheng, Y.; Li, R.; Sun, S.-E.; Zhang, D.; Liu, Y.; Fei, Z.; Ling, K.-S. Complete Genome Sequence of Southern Tomato Virus Identified in China Using Next-Generation Sequencing. Genome Announc. 2015, 3, e01226-15.
10. Xu, C.; Sun, X.; Taylor, A.; Jiao, C.; Xu, Y.; Cai, X.; Wang, X.; Ge, C.; Pan, G.; Wang, Q.; et al. Diversity, Distribution, and Evolution of Tomato Viruses in China Uncovered by Small RNA Sequencing. J. Virol. 2017, 91, 10–1128.
11. Hussain, M.D.; Farooq, T.; Chen, X.; Jiang, T.; Zang, L.; Shakeel, M.T.; Zhou, T. Molecular Detection of Southern Tomato Amalgavirus Prevalent in Tomatoes and Its Genomic Characterization with Global Evolutionary Dynamics. Viruses 2022, 14, 2481.
12. Bados, J.P.; Gómez, J.; Pérez, A.; Salazar, J.; Marín, M. Molecular Detection of Three RNA Viruses in Tomato (Solanum Lycopersicum L.) Leaf Tissue and Seeds in Antioquia (Colombia). Rev. Colomb. De Cienc. Hortícolas 2025, 19, e18657.
13. Candresse, T.; Marais, A.; Faure, C. First Report of Southern Tomato Virus on Tomatoes in Southwest France. Plant Dis. 2013, 97, 1124.
14. Gaafar, Y.; Lüddecke, P.; Heidler, C.; Hartrick, J.; Sieg-Müller, A.; Hübert, C.; Wichura, A.; Ziebell, H. First Report of Southern Tomato Virus in German Tomatoes. New Dis. Rep. 2019, 40, 1.
15. James, A.; Andronis, C.; Kryovrysanaki, N.; Goumenaki, E.; Kalantidis, K.; Katsarou, K. First Report of Southern Tomato Virus from Tomato (Solanum Lycopersicum) in Greece. Plant Dis. 2023, 107, 237.
16. Iacono, G.; Hernandez-Llopis, D.; Alfaro-Fernandez, A.; Davino, M.; Font, M.; Panno, S.; Galipienso, L.; Rubio, L.; Davino, S. First Report of Southern Tomato Virus in Tomato Crops in Italy. New Dis. Rep. 2015, 32, 27.
17. Oh, J.; Lee, H.-K.; Park, C.-Y.; Yeom, Y.-A.; Min, H.-G.; Yang, H.-J.; Jeong, R.-D.; Kim, H.; Moon, J.-S.; Lee, S.-H. First Report of Southern Tomato Virus in Tomato (Solanum Lycopersicum) in Korea. Plant Dis. 2018, 102, 1467.
18. Galipienso, L.; Elvira-González, L.; Velasco, L.; Herrera-Vásquez, J.Á.; Rubio, L. Detection of Persistent Viruses by High-Throughput Sequencing in Tomato and Pepper from Panama: Phylogenetic and Evolutionary Studies. Plants 2021, 10, 2295.
19. Vučurović, A.; Kutnjak, D.; Mehle, N.; Stanković, I.; Pecman, A.; Bulajić, A.; Krstić, B.; Ravnikar, M. Detection of Four New Tomato Viruses in Serbia Using Post Hoc High-Throughput Sequencing Analysis of Samples from a Large-Scale Field Survey. Plant Dis. 2021, 105, 2325–2332.
20. Verbeek, M.; Dullemans, A.; Espino, A.; Botella, M.; Alfaro-Fernández, A.; Font, M. First Report of Southern Tomato Virus in Tomato in the Canary Islands, Spain. J. Plant Pathol. 2015, 97, 392.
21. Randa-Zelyüt, F.; Fox, A.; Karanfil, A. Population Genetic Dynamics of Southern Tomato Virus from Turkey. J. Plant Pathol. 2023, 105, 211–224.
22. Harju, V.; Skelton, A.; Lazenby, M.; Rimmer, T.; Buxton-Kirk, A.; Fowkes, A.; Forde, S.; Ward, R.; Frew, L.; Barker, R.; et al. First Detection, Symptoms and Management of Southern Tomato Virus in the United Kingdom. New Dis. Rep. 2021, 43, e12014.
23. Grenfell, B.T.; Pybus, O.G.; Gog, J.R.; Wood, J.L.; Daly, J.M.; Mumford, J.A.; Holmes, E.C. Unifying the Epidemiological and Evolutionary Dynamics of Pathogens. Science 2004, 303, 327–332.
24. Acosta-Leal, R.; Duffy, S.; Xiong, Z.; Hammond, R.; Elena, S.F. Advances in Plant Virus Evolution: Translating Evolutionary Insights into Better Disease Management. Phytopathology 2011, 101, 1136–1148.
25. Elvira-González, L.; Rubio, L.; Galipienso, L. Geographically Distant Isolates of the Persistent Southern Tomato Virus (STV) Show Very Low Genetic Diversity in the Putative Coat Protein Gene. Virus Genes 2020, 56, 668–672.
26. Kumar, S.; Stecher, G.; Suleski, M.; Sanderford, M.; Sharma, S.; Tamura, K. MEGA12: Molecular Evolutionary Genetic Analysis Version 12 for Adaptive and Green Computing. Mol. Biol. Evol. 2024, 41, msae263.
27. Rozas, J.; Ferrer-Mata, A.; Sánchez-DelBarrio, J.C.; Guirao-Rico, S.; Librado, P.; Ramos-Onsins, S.E.; Sánchez-Gracia, A. DnaSP 6: DNA Sequence Polymorphism Analysis of Large Data Sets. Mol. Biol. Evol. 2017, 34, 3299–3302.
28. Hudson, R.R.; Boos, D.D.; Kaplan, N.L. A Statistical Test for Detecting Geographic Subdivision. Mol. Biol. Evol. 1992, 9, 138–151.
29. Hudson, R.R. A New Statistic for Detecting Genetic Differentiation. Genetics 2000, 155, 2011–2014.
30. Rousset, F. Genetic Differentiation and Estimation of Gene Flow from f-Statistics Under Isolation by Distance. Genetics 1997, 145, 1219–1228.
31. Jin, W.; Zhang, Y.; Su, X.; Xie, Z.; Wang, R.; Du, Z.; Wang, Y.; Qiu, Y. Genetic Diversity Analysis of Lychnis Mottle Virus and First Identification of Angelica Sinensis Infection. Heliyon 2023, 9, e17006.
32. Martin, D.P.; Murrell, B.; Golden, M.; Khoosal, A.; Muhire, B. RDP4: Detection and Analysis of Recombination Patterns in Virus Genomes. Virus Evol. 2015, 1, vev003.
33. Weaver, S.; Shank, S.D.; Spielman, S.J.; Li, M.; Muse, S.V.; Kosakovsky Pond, S.L. Datamonkey 2.0: A Modern Web Application for Characterizing Selective and Other Evolutionary Processes. Mol. Biol. Evol. 2018, 35, 773–777.
34. Muhire, B.M.; Varsani, A.; Martin, D.P. SDT: A Virus Classification Tool Based on Pairwise Sequence Alignment and Identity Calculation. PLoS ONE 2014, 9, e108277.
35. Bouckaert, R.; Vaughan, T.G.; Barido-Sottani, J.; Duchêne, S.; Fourment, M.; Gavryushkina, A.; Heled, J.; Jones, G.; Kühnert, D.; De Maio, N.; et al. BEAST 2.5: An Advanced Software Platform for Bayesian Evolutionary Analysis. PLoS Comput. Biol. 2019, 15, e1006650.
36. Rambaut, A.; Drummond, A.J.; Xie, D.; Baele, G.; Suchard, M.A. Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Syst. Biol. 2018, 67, 901–904.
37. Charon, J.; Theil, S.; Nicaise, V.; Michon, T. Protein Intrinsic Disorder Within the Potyvirus Genus: From Proteome-Wide Analysis to Functional Annotation. Mol. Biosyst. 2016, 12, 634–652.
38. Charon, J.; Barra, A.; Walter, J.; Millot, P.; Hébrard, E.; Moury, B.; Michon, T. First Experimental Assessment of Protein Intrinsic Disorder Involvement in an RNA Virus Natural Adaptive Process. Mol. Biol. Evol. 2018, 35, 38–49.
39. Walter, J.; Charon, J.; Hu, Y.; Lachat, J.; Leger, T.; Lafforgue, G.; Barra, A.; Michon, T. Comparative Analysis of Mutational Robustness of the Intrinsically Disordered Viral Protein VPg and of Its Interactor eIF4E. PLoS ONE 2019, 14, e0211725.
40. Byrne, M.; Kashyap, A.; Esquirol, L.; Ranson, N.; Sainsbury, F. The Structure of a Plant-Specific Partitivirus Capsid Reveals a Unique Coat Protein Domain Architecture with an Intrinsically Disordered Protrusion. Commun. Biol. 2021, 4, 1155.
41. LaTourrette, K.; Holste, N.M.; Garcia-Ruiz, H. Polerovirus Genomic Variation. Virus Evol. 2021, 7, veab102.
42. Mészáros, B.; Erdős, G.; Dosztányi, Z. IUPred2A: Context-Dependent Prediction of Protein Disorder as a Function of Redox State and Protein Binding. Nucleic Acids Res. 2018, 46, W329–W337.
43. Murrell, B.; Moola, S.; Mabona, A.; Weighill, T.; Sheward, D.; Kosakovsky Pond, S.L.; Scheffler, K. FUBAR: A Fast, Unconstrained Bayesian Approximation for Inferring Selection. Mol. Biol. Evol. 2013, 30, 1196–1205.
44. Murrell, B.; Wertheim, J.O.; Moola, S.; Weighill, T.; Scheffler, K.; Kosakovsky Pond, S.L. Detecting Individual Sites Subject to Episodic Diversifying Selection. PLoS Genet. 2012, 8, e1002764.
45. Uversky, V.N. What Does It Mean to Be Natively Unfolded? Eur. J. Biochem. 2002, 269, 2–12.
46. Szilagyi, A.; Györffy, D.; Zavodszky, P. The Twilight Zone Between Protein Order and Disorder. Biophys. J. 2008, 95, 1612–1626.
47. Rivarez, M.P.S.; Vučurović, A.; Mehle, N.; Ravnikar, M.; Kutnjak, D. Global Advances in Tomato Virome Research: Current Status and the Impact of High-Throughput Sequencing. Front. Microbiol. 2021, 12, 671925.
48. Fukuhara, T.; Tabara, M.; Koiwa, H.; Takahashi, H. Effect of Asymptomatic Infection with Southern Tomato Virus on Tomato Plants. Arch. Virol. 2020, 165, 11–20.
49. Elvira Gonzalez, L.; Peiró, R.; Rubio, L.; Galipienso, L. Persistent Southern Tomato Virus (STV) Interacts with Cucumber Mosaic and/or Pepino Mosaic Virus in Mixed-Infections Modifying Plant Symptoms, Viral Titer and Small RNA Accumulation. Microorganisms 2021, 9, 689.
50. Van De Vossenberg, B.T.; Visser, M.; Bruinsma, M.; Koenraadt, H.M.; Westenberg, M.; Botermans, M. Real-Time Tracking of Tomato Brown Rugose Fruit Virus (ToBRFV) Outbreaks in the Netherlands Using Nextstrain. PLoS ONE 2020, 15, e0234671.
51. Steinhauer, D.A.; Domingo, E.; Holland, J.J. Lack of Evidence for Proofreading Mechanisms Associated with an RNA Virus Polymerase. Gene 1992, 122, 281–288.
52. Laemmlen, F.; Van Maren, A.; Endo, R.; Valverde, R. A Tomato Decline of Unknown Etiology in Imperial Valley, CA. Phytopathology 1985, 75, 1287.
53. Alcalá-Briseño, R.I.; Coşkan, S.; Londoño, M.A.; Polston, J.E. Genome Sequence of Southern Tomato Virus in Asymptomatic Tomato “Sweet Hearts”. Genome Announc. 2017, 5, 10–1128.
54. Dias, N.P.; Hu, R.; Hale, F.A.; Hansen, Z.R.; Wszelaki, A.; Domier, L.L.; Hajimorad, M. Viromes of Field-Grown Tomatoes and Peppers in Tennessee Revealed by RNA Sequencing Followed by Bioinformatic Analysis. Plant Health Prog. 2023, 24, 207–213.
55. Ma, Y.; Marais, A.; Lefebvre, M.; Faure, C.; Candresse, T. Metagenomic Analysis of Virome Cross-Talk Between Cultivated Solanum Lycopersicum and Wild Solanum Nigrum. Virology 2020, 540, 38–44.
56. Murphy, J.F.; Bowen, K.L. Synergistic Disease in Pepper Caused by the Mixed Infection of Cucumber Mosaic Virus and Pepper Mottle Virus. Phytopathology 2006, 96, 240–247.
57. Syller, J. Facilitative and Antagonistic Interactions Between Plant Viruses in Mixed Infections. Mol. Plant Pathol. 2012, 13, 204–216.
58. Krupovic, M.; Dolja, V.V.; Koonin, E.V. Plant Viruses of the Amalgaviridae Family Evolved via Recombination Between Viruses with Double-Stranded and Negative-Strand RNA Genomes. Biol. Direct 2015, 10, 12.
59. Duffy, S.; Shackelton, L.A.; Holmes, E.C. Rates of Evolutionary Change in Viruses: Patterns and Determinants. Nat. Rev. Genet. 2008, 9, 267–276.
60. Chao, L.; Rang, C.U.; Wong, L.E. Distribution of Spontaneous Mutants and Inferences about the Replication Mode of the RNA Bacteriophage φ6. J. Virol. 2002, 76, 3276–3281.
61. Choi, H.; Jo, Y.; Chung, H.; Choi, S.Y.; Kim, S.-M.; Hong, J.-S.; Lee, B.C.; Cho, W.K. Phylogenetic and Phylodynamic Analyses of Soybean Mosaic Virus Using 305 Coat Protein Gene Sequences. Plants 2022, 11, 3256.
62. Mao, Y.; Sun, X.; Shen, J.; Gao, F.; Qiu, G.; Wang, T.; Nie, X.; Zhang, W.; Gao, Y.; Bai, Y. Molecular Evolutionary Analysis of Potato Virus Y Infecting Potato Based on the VPg Gene. Front. Microbiol. 2019, 10, 1708.
63. Zhai, W.; Nielsen, R.; Slatkin, M. An Investigation of the Statistical Power of Neutrality Tests Based on Comparative and Population Genetic Data. Mol. Biol. Evol. 2009, 26, 273–283.
64. Sztuba-Solińska, J.; Urbanowicz, A.; Figlerowicz, M.; Bujarski, J.J. RNA-RNA Recombination in Plant Virus Replication and Evolution. Annu. Rev. Phytopathol. 2011, 49, 415–443.
65. Kozlowski, L.P.; Bujnicki, J.M. MetaDisorder: A Meta-Server for the Prediction of Intrinsic Disorder in Proteins. BMC Bioinform. 2012, 13, 111.
66. Katuwawala, A.; Oldfield, C.J.; Kurgan, L. Accuracy of Protein-Level Disorder Predictions. Brief. Bioinform. 2020, 21, 1509–1522.
67. Wright, P.E.; Dyson, H.J. Intrinsically Unstructured Proteins: Re-Assessing the Protein Structure-Function Paradigm. J. Mol. Biol. 1999, 293, 321–331.
68. Uversky, V.N. Intrinsically Disordered Proteins and Their “Mysterious” (Meta)Physics. Front. Phys. 2019, 7, 10.
69. Gitlin, L.; Hagai, T.; LaBarbera, A.; Solovey, M.; Andino, R. Rapid Evolution of Virus Sequences in Intrinsically Disordered Protein Regions. PLoS Pathog. 2014, 10, e1004529.
70. Xue, B.; Blocquel, D.; Habchi, J.; Uversky, A.V.; Kurgan, L.; Uversky, V.N.; Longhi, S. Structural Disorder in Viral Proteins. Chem. Rev. 2014, 114, 6880–6911.
71. Mishra, P.M.; Verma, N.C.; Rao, C.; Uversky, V.N.; Nandi, C.K. Intrinsically Disordered Proteins of Viruses: Involvement in the Mechanism of Cell Regulation and Pathogenesis. Prog. Mol. Biol. Transl. Sci. 2020, 174, 1–78.
72. Lafforgue, G.; Michon, T.; Charon, J. Analysis of the Contribution of Intrinsic Disorder in Shaping Potyvirus Genetic Diversity. Viruses 2022, 14, 1959.

Figure 1. Evolutionary analysis of southern tomato virus (STV) based on 108 complete genome sequences. (A) Maximum likelihood phylogenetic tree (1000 bootstrap replicates) based on the nucleotide sequences of complete STV genomes; (B) nucleotide pairwise identity heatmap of complete STV genomes. STV isolates highlighted in red or marked with an asterisk (*) were identified in this study.

Figure 2. Bayesian analysis of 104 fusion protein nucleotide sequences of southern tomato virus (STV). (A) Time-scaled Bayesian maximum clade credibility tree constructed from fusion protein sequences of STV isolates; branches highlighted in red indicate STV isolates obtained in the present study. (B) Posterior distribution of tree height estimated by BEAST analysis of fusion protein gene sequences of STV isolates; blue indicates samples within the 95% HPD interval and yellow indicates samples outside it.

Table 1. Primers used for the diagnosis of southern tomato virus (STV) using reverse transcription polymerase chain reaction (RT-PCR) assays.

Primer Name	Sequence (5′-3′)	Annealing Temperature (°C)	Amplicon Size (bp)
STV 1F	GCGAGAGCGATAAATTTAGTAAGCTAC	53	673
STV 1R	TTGACAATCTTACGCTGCAGATCAG	53	673
STV 2F	GAGAAGAGGACACTGCAGTACAA	54	503
STV 2R	GTAGATATCCTCCATCAGACTCT	54	503

Table 2. Analysis of genetic diversity within the complete genomes and individual genes of southern tomato virus (STV) isolates.

Gene	No. of Sequences	Total No. of Nucleotide Sites	No. of Nucleotide Sites ^a	S	H	H_d	k	π	Eta	Theta (Per Sequence)	Theta (Per Site)	Tajima’s D	Fu and Li’s D	Fu and Li’s F
Complete genome	108	3475	3190	184	70	0.979	23.64123	0.00741	193	36.72895	0.01151	−1.18568 ^NS	−3.22915 *	−2.79709 *
p42	130	1134	1134	67	47	0.898	6.23280	0.00550	69	12.68173	0.01118	−1.60530 ^NS	−3.21261 *	−3.02698 *
RdRp	108	2289	2289	129	60	0.960	17.48165	0.00764	138	26.26215	0.01147	−1.10373 ^NS	−2.62916 *	−2.35996 ^NS
Fusion protein	108	3190	3190	184	70	0.979	23.64123	0.00741	193	36.72895	0.01151	−1.18568 ^NS	−3.22915 *	−2.79709 *

^a Number of nucleotide sites excluding gaps and missing data. S, number of segregating (polymorphic) sites; H, number of haplotypes; H_d, haplotype diversity; k, average number of nucleotide differences; π, nucleotide diversity; Eta, total number of mutations; *, statistically significant (p ≤ 0.05); NS, not statistically significant (0.10 > p > 0.05).

Table 3. Estimates of the nucleotide substitution rate and the time to the most recent common ancestor (TMRCA) for southern tomato virus (STV) isolates.

Gene	Date Range	Mean Substitution Rate (Subs/Site/Year)	95% HPD ^a	TMRCA (Years Ago) ^b	95% HPD ^a
Fusion protein	2005–2024	6.514 × 10⁻⁵	3.4575 × 10⁻⁵–9.6661 × 10⁻⁵	135.137	56.8965–241.3992
p42	2005–2024	1.184 × 10⁻⁴	5.7164 × 10⁻⁵–1.8674 × 10⁻⁴	58.546	23.4486–102.1011
RdRp	2005–2024	4.866 × 10⁻⁵	2.0639 × 10⁻⁵–7.7596 × 10⁻⁵	188.263	64.1967–368.4334

^a 95% highest posterior density (HPD) interval. ^b Time to the most recent common ancestor (TMRCA), in years ago.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
