Discovery of Targetable Epitopes in Tomato Chlorosis Virus Through Comparative Genomics and Structural Modeling

Choi, Bae Young; Kim, Jaewook

doi:10.3390/sci7030088


by Bae Young Choi ¹ and Jaewook Kim ²,*

¹ School of Liberal Arts and Sciences, Korea National University of Transportation, Chungju 27469, Republic of Korea

² Department of Biology Education, Korea National University of Education, Cheongju 28173, Republic of Korea

* Author to whom correspondence should be addressed.

Sci 2025, 7(3), 88; https://doi.org/10.3390/sci7030088

Submission received: 23 April 2025 / Revised: 26 May 2025 / Accepted: 25 June 2025 / Published: 1 July 2025


Abstract

Tomato chlorosis virus (ToCV) is a highly infectious plant virus that poses a significant threat to members of the Solanaceae family worldwide. Despite its widespread impact, effective control remains challenging because the virus is transmitted by whitefly vectors. To facilitate early detection and potential therapeutic intervention, this study aimed to identify diagnostic epitopes through a comprehensive bioinformatics approach combining comparative genomics and artificial intelligence-based structural modeling. We analyzed forty-four complete ToCV genomes to identify highly conserved regions and uncovered an orphan clade, indicating evolutionary divergence. Subcellular localization and transmembrane domain predictions revealed viral proteins with extracellularly exposed peptide regions. Structural modeling using AlphaFold3 further supported the stability and accessibility of these domains. By integrating these findings with epitope prediction algorithms, this study identified four highly promising epitope candidates. These epitopes provide a strong foundation for the development of antibody-based diagnostic kits or antiviral therapeutics targeting ToCV.

Keywords:

Tomato chlorosis virus (ToCV); comparative genomic analysis; structure analysis; epitope

1. Introduction

Tomato chlorosis virus (ToCV), a member of the genus Crinivirus, causes severe pathological symptoms in a wide range of host plants, particularly among species in the Solanaceae family. First identified in the USA in the mid-1990s, ToCV has spread to more than thirty-five countries worldwide [1]. Infected plants exhibit symptoms such as yellowing, thickening, and bronzing of leaves with floral abortion [2], leading to significant yield loss. As an emerging viral pathogen, ToCV seriously threatens the cultivation of economically important crops, such as pepper, tomato, and cucumber.

The genome of ToCV has a bipartite structure consisting of two RNA segments, RNA1 and RNA2 [1,3,4,5]. These segments encode a total of thirteen open reading frames (ORFs), four on RNA1 and nine on RNA2. Among these, ORF5, ORF7, ORF9, and ORF10 have been shown to contribute to the formation of the viral coat [1]. ORF1a and ORF1b encode replication-associated proteins, while ORF2, ORF9, and ORF10 function as gene-silencing suppressors and are associated with whitefly transmission [1,6]. To date, 44 complete ToCV genome sequences have been assembled. A previous study suggested a putative origin for Korean isolates based on comparative genomic analysis [7]. However, no comprehensive global comparative genomic analysis has been conducted, nor have putative diagnostic targets been systematically identified.

ToCV is primarily transmitted by insect vectors, with Bemisia tabaci identified as the major species responsible for its spread [8]; this mode of transmission contributes to its high transmissibility and makes the virus difficult to control in nature. Specifically, the virus is transmitted in a semi-persistent manner, whereby it can remain within the vector for several days and be transmitted multiple times. In some cases, as few as 50 vectors per plant have been sufficient to achieve near-complete infection within 30 days [2]. While B. tabaci is the principal vector, several weed species have also been reported to act as alternative reservoirs (virus sources) for ToCV [8]. At present, management strategies are largely restricted to physical methods that prevent vector access and limit virus spread among cultivated plants.

Early detection of viral pathogens is critical for effective disease management, as demonstrated during the COVID-19 pandemic, when timely diagnostics played a pivotal role in containment efforts [9,10]. Among various detection strategies, antibody-mediated methods stand out for their accuracy and ease of application in field settings. ToCV detection has previously been attempted with RT-PCR and Vis–NIR spectroscopy [11,12], but epitopes that could serve as targets for antibody-based ToCV diagnostics have not yet been addressed. In this study, we aimed to identify the first epitope candidates for ToCV. We analyzed forty-four complete ToCV genome sequences obtained from public databases. Through comprehensive multiple sequence alignment and phylogenetic analysis, we identified both highly variable and conserved genomic regions across the viral population. Structural analysis of the encoded proteins using AlphaFold3 enabled us to assess their conformational stability and the positioning of their extracellular domains. By integrating these results, we identified four promising epitope candidates located on ToCV proteins. These epitopes hold potential for the development of diagnostic tools and therapeutic agents aimed at managing ToCV infections.

2. Materials and Methods

2.1. Collection of Whole Genomes

Genome sequences of Tomato chlorosis virus (ToCV) isolates were retrieved from the National Center for Biotechnology Information (NCBI). A total of forty-four genome sequences from various geographical origins were collected for further analysis (Table 1).

2.2. Comparative Genomic Analysis and Phylogenetic Analysis

To analyze the phylogenetic relationships among all ToCV genomes, we concatenated the RNA1 and RNA2 sequences of each isolate. All concatenated sequences were then aligned using MAFFT with default settings [13], and the aligned sequences were trimmed with trimAl using its automated setting [14]. A maximum-likelihood phylogenetic tree was inferred from the trimmed alignment using IQ-TREE with default settings [15]. To build a time-calibrated tree using a molecular-clock approach, we performed a BEAST analysis with the following settings: tip dates were set to the collection dates; a gamma site model was used with a GTR substitution model; a relaxed log-normal clock model was applied with the normalize option; a coalescent exponential population prior was used; and the MCMC chain length was set to 10,000,000 [16,17,18,19]. To assess nucleotide conservation at every genomic position, Jalview version 2.11.4.0 was used, and its quality values were taken as conservation scores [20].
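The concatenation and per-position scoring steps can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the study's pipeline: `concatenate_segments` and `column_conservation` are hypothetical helpers, sequences are assumed to be held in plain dictionaries keyed by isolate name, and the score used here is a simple identity measure (the percentage of all sequences carrying the most frequent non-gap base in a column), not Jalview's quality algorithm.

```python
from collections import Counter


def concatenate_segments(rna1: dict[str, str], rna2: dict[str, str]) -> dict[str, str]:
    """Join RNA1 and RNA2 end to end for every isolate present in both inputs."""
    shared = sorted(rna1.keys() & rna2.keys())
    return {isolate: rna1[isolate] + rna2[isolate] for isolate in shared}


def column_conservation(aligned: list[str]) -> list[float]:
    """Per-column score: % of all sequences carrying the most frequent non-gap base.

    Gap characters ('-') never count as agreement, so gap-rich columns
    receive low scores.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(width):
        bases = Counter(seq[col].upper() for seq in aligned if seq[col] != "-")
        top = bases.most_common(1)[0][1] if bases else 0
        scores.append(100.0 * top / len(aligned))
    return scores


if __name__ == "__main__":
    print(concatenate_segments({"iso1": "ACG"}, {"iso1": "TTA"}))  # {'iso1': 'ACGTTA'}
    print(column_conservation(["ACGT-", "ACGTA", "ACCTA", "ACGTA"]))  # [100.0, 100.0, 75.0, 100.0, 75.0]
```

Dividing by the total number of sequences, rather than by the non-gap count, is a design choice that keeps indel-rich columns from looking falsely conserved.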

2.3. Cellular Localization Prediction

To predict the cellular localization of the viral proteins during infection, Virus-mPLoc was applied to all ToCV proteins [21]. To predict whether ToCV proteins expressed in plant cells would be localized to intracellular or extracellular compartments, OutCyte 1.0 and OutCyte 2.0 were applied, and the localization with the highest predicted probability was used [22]. For further prediction of transmembrane domains (TMDs), TMHMM and Phobius were applied to the ToCV proteins [23,24]. Putative TMDs and extracellularly localized peptide regions were identified by comparing the TMHMM and Phobius outputs: a localization was considered reliable only when both tools agreed. For TMD localization, we defined the amino acid residues predicted by both tools as the probable TMD. The predictions were visualized with TBtools [25].
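The TMHMM/Phobius consensus rule can be sketched as a per-residue comparison. The sketch assumes both predictions have already been converted into per-residue label strings of equal length using `i` (inside), `M` (membrane), and `o` (outside); parsing each tool's native output is not shown, and the function names are hypothetical.

```python
def consensus_topology(tmhmm: str, phobius: str) -> str:
    """Keep a residue's label only where both predictors agree; '.' marks disagreement."""
    if len(tmhmm) != len(phobius):
        raise ValueError("topology strings must cover the same protein length")
    return "".join(a if a == b else "." for a, b in zip(tmhmm, phobius))


def segments(labels: str, state: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of `state` in a label string."""
    runs = []
    start = None
    for pos, label in enumerate(labels, start=1):
        if label == state and start is None:
            start = pos  # a run of the requested state begins
        elif label != state and start is not None:
            runs.append((start, pos - 1))  # the run ended on the previous residue
            start = None
    if start is not None:
        runs.append((start, len(labels)))  # run extends to the C-terminus
    return runs


if __name__ == "__main__":
    cons = consensus_topology("iiiiMMMMMoooo", "iiiMMMMMMoooo")
    print(cons)                 # iii.MMMMMoooo
    print(segments(cons, "M"))  # [(5, 9)]
    print(segments(cons, "o"))  # [(10, 13)]
```

Residue 4 is called inside by one tool and membrane by the other, so it drops out of both the consensus TMD and the consensus intracellular segment, matching the "common residues only" rule described above.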

2.4. Structure Assessment and Epitope Prediction

To identify exposed peptide regions, we performed 3D protein-structure prediction with AlphaFold3 [26]. The AlphaFold3 models were also used to identify structurally stable peptide regions. ChimeraX was then used to visualize the protein structures, with the predicted TMDs and extracellularly localized peptide regions highlighted [27].
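How "structurally stable" was quantified is not stated above; one common proxy is AlphaFold's per-residue confidence score (pLDDT). The sketch below, an illustration rather than the study's procedure, averages pLDDT over a residue span. It assumes the model is available as PDB-format text with pLDDT stored in the B-factor column (columns 61-66), as AlphaFold models conventionally do; AlphaFold3 writes mmCIF by default, so a format conversion (for example, saving as PDB in ChimeraX) is assumed. The function names and the default chain identifier are hypothetical.

```python
def ca_plddt(pdb_text: str, chain: str = "A") -> dict[int, float]:
    """Map residue number -> pLDDT read from the B-factor of each C-alpha ATOM record."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22, residue number 23-26, B-factor 61-66.
        if (
            line.startswith("ATOM")
            and len(line) >= 66
            and line[12:16].strip() == "CA"
            and line[21] == chain
        ):
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def mean_plddt(scores: dict[int, float], start: int, end: int) -> float:
    """Average pLDDT over residues start..end (inclusive) that are present in the model."""
    values = [scores[i] for i in range(start, end + 1) if i in scores]
    if not values:
        raise ValueError("no modelled residues in the requested span")
    return sum(values) / len(values)
```

In practice the converted model file would be read into a string, passed to `ca_plddt`, and each candidate peptide's start and end positions passed to `mean_plddt`. A widely used rule of thumb treats pLDDT of 70 or above as confident; that threshold is a convention, not a value reported in this study.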

To predict probable epitope peptides, EpiScan was applied to all ToCV proteins without the NetChop proteasomal cleavage prediction option and the HLA-A option [28]. Epitope peptides with an ESP value lower than 0.2 were selected for further analysis. To identify putative off-targets, BLASTp searches of the selected epitopes were run against the nr database with a seed (word) length of 2 [29].
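The score-threshold and off-target steps can be sketched as follows. The `Candidate` record, its field names, and any peptide strings used with it are hypothetical, and `blastp_args` only assembles a command line from standard BLAST+ options (`-task blastp-short`, `-word_size 2`, `-remote`); it does not run the search, and the exact invocation used in the study is not reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    protein: str   # accession of the source protein
    start: int     # 1-based start position of the peptide
    peptide: str   # peptide sequence
    esp: float     # predictor score; values below the 0.2 cut-off are kept


def select_by_esp(candidates: list[Candidate], threshold: float = 0.2) -> list[Candidate]:
    """Keep candidates scoring strictly below `threshold`, lowest score first."""
    return sorted((c for c in candidates if c.esp < threshold), key=lambda c: c.esp)


def write_fasta(candidates: list[Candidate]) -> str:
    """Render peptides as FASTA text for a BLASTp query file."""
    return "".join(f">{c.protein}_{c.start}\n{c.peptide}\n" for c in candidates)


def blastp_args(query_fasta: str, out_tsv: str) -> list[str]:
    """Assemble (but do not run) a short-peptide BLASTp search against remote nr."""
    return [
        "blastp", "-task", "blastp-short",
        "-query", query_fasta,
        "-db", "nr", "-remote",
        "-word_size", "2",   # seed length of 2, as described above
        "-outfmt", "6",      # tabular output
        "-out", out_tsv,
    ]
```

If BLAST+ is installed, `subprocess.run(blastp_args("epitopes.fasta", "hits.tsv"), check=True)` would launch the search; `blastp-short` is the BLAST+ task intended for short peptide queries.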

3. Results

3.1. Taiwanese Isolates Form the Most Distinct Clade Among ToCV Populations

To reveal the phylogenetic relationships among the 44 ToCV isolates, multiple sequence alignment was performed, followed by maximum-likelihood phylogenetic analysis (Table 1 and Figure 1A). The sequence-based phylogeny revealed a partial correlation between phylogenetic clustering and the geographical origins of the isolates (Figure 1A). For instance, all isolates originating from Brazil clustered together, suggesting that they share a common ancestor (Figure 1A). However, the overall tree topology was complex, with some isolates, such as those from Pakistan and Brazil, forming a monophyletic group despite their distinct geographical origins. To improve resolution and account for temporal divergence, we further constructed a time-calibrated phylogenetic tree using a molecular clock model implemented in BEAST (Figure 1B). Compared with the sequence-based tree, the molecular clock analysis showed a stronger correlation between phylogenetic structure and geographical origin. Interestingly, the Taiwanese isolates formed a distinct, deeply diverged monophyletic clade, suggesting an early divergence from other global ToCV populations (Figure 1B).
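Whether a set of isolates forms a monophyletic clade can be checked directly on a rooted tree: the set is monophyletic if some node's descendant tips are exactly that set. The sketch represents a tree as nested Python tuples with tip labels as strings; the labels are hypothetical, and parsing Newick or BEAST output is not shown. Monophyly is only defined once a tree is rooted: a BEAST clock tree is rooted by construction, whereas an IQ-TREE maximum-likelihood tree needs a root (for example, an outgroup) before this check is meaningful.

```python
from collections.abc import Iterator
from typing import Union

# A rooted tree as nested tuples: a str is a tip label, a tuple holds child subtrees.
Tree = Union[str, tuple]


def tip_set(node: Tree) -> frozenset:
    """All tip labels descending from `node` (a tip returns itself)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tip_set(child) for child in node))


def clades(node: Tree) -> Iterator[frozenset]:
    """Yield the descendant tip set of every node, starting at the root."""
    yield tip_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree: Tree, group) -> bool:
    """True if some node's descendant tips are exactly `group`."""
    target = frozenset(group)
    return any(clade == target for clade in clades(tree))


if __name__ == "__main__":
    # Hypothetical labels: TW = Taiwan, BR = Brazil, PK = Pakistan, KR = Korea.
    tree = ((("TW1", "TW2"), "TW3"), (("BR1", "BR2"), ("PK1", "KR1")))
    print(is_monophyletic(tree, {"TW1", "TW2", "TW3"}))  # True
    print(is_monophyletic(tree, {"BR1", "PK1"}))         # False
```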

3.2. Coding Regions of the ToCV Genome Exhibit High Sequence Conservation

To discover targetable epitopes for a virus, it is crucial to identify genomic regions with high sequence conservation that are less prone to mutation. We assessed sequence conservation across the entire ToCV genome by performing multiple sequence alignment and calculating position-specific conservation scores among the 44 isolates (Figure 2). Overall, most nucleotide variability was observed in the intergenic regions, whereas the coding regions exhibited relatively high sequence stability (Figure 2). Specifically, seven regions within the genic portion of RNA1 showed a conservation score below 50%, whereas only one such region was identified in RNA2 (Figure 2). Among these regions, ORF6, which encodes a viral coat protein, contained the lowest conservation score, as low as 2.27% (Figure 2B). Despite these few low-conservation sites, most coding sequences displayed high stability, with conservation scores above 90% at the nucleotide level. These results suggest that the highly conserved genic regions of the ToCV genome, except ORF6, are promising candidates for designing robust and specific diagnostic epitopes.
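Counting low-conservation regions requires mapping alignment columns back to the coordinates of a reference isolate (Florida1 in Figure 2) and merging adjacent low-scoring positions. A minimal sketch, assuming per-column scores on a 0-100 scale and the reference's gapped row from the alignment; the 50% cut-off follows the text, while the function names and the ORF-overlap rule are assumptions.

```python
from typing import Optional


def reference_positions(aligned_ref: str) -> list[Optional[int]]:
    """1-based reference coordinate for each alignment column (None at reference gaps)."""
    coords: list[Optional[int]] = []
    pos = 0
    for base in aligned_ref:
        if base == "-":
            coords.append(None)
        else:
            pos += 1
            coords.append(pos)
    return coords


def low_regions(scores: list[float], aligned_ref: str,
                threshold: float = 50.0) -> list[tuple[int, int]]:
    """Merge reference positions scoring below `threshold` into (start, end) runs."""
    if len(scores) != len(aligned_ref):
        raise ValueError("one score per alignment column is required")
    runs: list[list[int]] = []
    for coord, score in zip(reference_positions(aligned_ref), scores):
        if coord is None or score >= threshold:
            continue  # skip reference gaps and sufficiently conserved columns
        if runs and coord == runs[-1][1] + 1:
            runs[-1][1] = coord  # extend the current run
        else:
            runs.append([coord, coord])
    return [(start, end) for start, end in runs]


def overlapping(runs: list[tuple[int, int]],
                orfs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Keep runs that overlap at least one ORF given as (start, end) reference coordinates."""
    return [r for r in runs if any(r[0] <= end and r[1] >= start for start, end in orfs)]
```

Columns where the reference has a gap have no reference coordinate, so they are skipped; low-scoring reference positions on either side of such a column are therefore reported as one run.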

3.3. Six ToCV Proteins Are Predicted to Contain Peptide Regions Exposed on the Outer Side of Plant Cells

To identify protein targets that are accessible for diagnostic purposes, we predicted the subcellular localization of ToCV proteins (Figure 3). Assuming the context of a ToCV-infected plant cell, Virus-mPLoc was applied to predict whether each viral protein localizes to the outer side of the plant cell (Figure 3A). Among the ToCV proteins, YP_293697.1, YP_293700.1, and YP_293706.1 were predicted to localize to the host cell membrane. All proteins except YP_293698.1 and YP_293706.1 were predicted to localize within the host cytoplasm (Figure 3A). Although most ToCV proteins were predicted to be intracellular, we hypothesized that some might be secreted or partially exposed via the host’s secretory system. To address this possibility, we applied OutCyte versions 1.0 and 2.0 to predict viral protein secretion from the host cell (Figure 3B). Although the predictions varied between the two pipelines, most proteins were predicted to localize to the outer side of the cell or to be secreted. To further support these predictions, we analyzed transmembrane domains (TMDs) and extracellular regions using TMHMM and Phobius (Figure 3C). Based on the consensus between the two tools, YP_293694.1, YP_293697.1, and YP_293706.1 were identified as possessing TMDs with extracellularly exposed regions. Meanwhile, YP_293699.1, YP_293700.1, and YP_293704.1 were predicted to be secreted into the extracellular space without TMDs (Figure 3C,D). Interestingly, YP_293698.1 also possessed a TMD, although no localization was predicted for it by Virus-mPLoc (Figure 3A,C). Based on these combined analyses, six proteins (YP_293694.1, YP_293697.1, YP_293699.1, YP_293700.1, YP_293704.1, and YP_293706.1) were identified as possessing extracellularly exposed peptide sequences. These proteins represent promising targets for epitope-based ToCV diagnostics.
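The two routes by which a protein qualified above (a consensus TMD with an exposed segment, or predicted secretion without a TMD) can be written as a small decision rule. This is a hypothetical encoding of the text, not the authors' code; the `Evidence` fields are assumptions, and only the two protein sets listed in this section are taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    consensus_tmd: bool      # TMHMM and Phobius agree on at least one TMD
    consensus_outside: bool  # ...and on at least one extracellular segment
    secreted: bool           # OutCyte (1.0 or 2.0) predicts secretion


def has_exposed_region(ev: Evidence) -> bool:
    """Hypothetical rule: membrane-anchored with an exposed segment, or secreted without a TMD."""
    membrane_route = ev.consensus_tmd and ev.consensus_outside
    secretion_route = ev.secreted and not ev.consensus_tmd
    return membrane_route or secretion_route


# Protein sets as reported in the text; their union is the six-protein shortlist.
MEMBRANE_ROUTE = {"YP_293694.1", "YP_293697.1", "YP_293706.1"}
SECRETION_ROUTE = {"YP_293699.1", "YP_293700.1", "YP_293704.1"}
SHORTLIST = sorted(MEMBRANE_ROUTE | SECRETION_ROUTE)
```

Because the two routes are mutually exclusive with respect to the TMD flag, no protein is counted twice, which is consistent with the two reported sets being disjoint.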

3.4. Four Highly Probable Epitopes Are Identified

To identify the most promising epitopes for ToCV, we focused on the six viral proteins predicted to contain extracellularly exposed regions (Figure 3). These proteins were subjected to epitope prediction using EpiScan with default settings, resulting in thousands of candidate peptide sequences. Among these, twelve sequences were initially predicted as highly probable epitopes based on their scoring profiles. To further narrow down the candidates, we applied four key selection criteria: (1) localization within extracellular or exposed peptide regions, (2) structural stability, (3) high genetic conservation across ToCV isolates, and (4) abundant expression in the host system. To evaluate structural attributes, we performed 3D structure prediction for the six target proteins using AlphaFold3 (Figure 4). Among the twelve candidates, only the four epitopes listed in Table 2 resided within structurally stable extracellular regions and had no homologous hits in the nr database (Figure 4). In addition, all four epitopes were located in genetically conserved regions with minimal sequence variability across isolates (Figure 2). To assess their accessibility to antibody binding, the four selected epitopes were highlighted on their corresponding 3D protein structures (Figure S1). This visualization suggested that the selected epitopes are surface-accessible rather than buried within the protein core. Taken together, our integrative approach combining subcellular localization, structural modeling, and sequence conservation analysis enabled the identification of four viable epitope candidates suitable for diagnostic or therapeutic strategies targeting ToCV.
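The candidate-filtering step amounts to a conjunction of per-epitope checks. In the sketch below, the `Epitope` record, its fields, and the thresholds (mean pLDDT of at least 70 and minimum column conservation of at least 90%) are illustrative assumptions, not values reported by the study; the expression-level criterion is omitted because no measurement for it is described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epitope:
    name: str
    extracellular: bool       # lies within a consensus extracellular segment
    mean_plddt: float         # mean model confidence over the peptide span
    min_conservation: float   # lowest column conservation (%) over the span
    nr_hits: int              # non-ToCV BLASTp hits in nr


def passes(e: Epitope, plddt_min: float = 70.0, cons_min: float = 90.0) -> bool:
    """Every criterion must hold; thresholds are illustrative assumptions."""
    return (
        e.extracellular
        and e.mean_plddt >= plddt_min
        and e.min_conservation >= cons_min
        and e.nr_hits == 0
    )


def shortlist(candidates: list[Epitope], **thresholds: float) -> list[str]:
    """Names of candidates passing every criterion, in input order."""
    return [e.name for e in candidates if passes(e, **thresholds)]
```

Keeping the thresholds as keyword arguments makes it easy to see how sensitive the shortlist is to each cut-off, for example by rerunning with a lower `plddt_min`.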

4. Discussion

Using the 44 ToCV isolate genomes, we examined their phylogenetic relationships through both sequence-based and molecular-clock-based analyses (Figure 1). The sequence-based phylogenetic tree showed weak correlation with the geographical origins of the isolates, even when possible historical geographic distributions were considered (Figure 1A) [30]. In contrast, the molecular-clock-based reconstruction more closely reflected the geographic patterns of the isolates (Figure 1B). However, any discussion linking phylogenetic structure to geographic origin must be approached cautiously, as it presumes that sampling locations reflect the true sites of viral origin. A key factor complicating the interpretation of ToCV evolutionary origins is the role of the insect vectors, whiteflies, in virus transmission. Bemisia tabaci, the primary whitefly vector of ToCV, is distributed globally [31,32]. Although B. tabaci can move within and between plants from early developmental stages, the documented movement distances are not sufficient to explain global dispersal [33]. However, whiteflies can travel over 2000 m [34], and their global presence likely facilitates the international spread of ToCV. Recent studies have shown that B. tabaci and other Bemisia species are widely distributed, and attempts to delineate geographically isolated populations within specific regions have generally been unsuccessful [32]. This complexity implies that either phylogenetic model (sequence-based or molecular-clock-based) could be valid, as viral gene flow is mediated by the mobility of the insect vector rather than constrained by strict geographic isolation. The Taiwanese isolates, which form a distinct monophyletic cluster in the molecular-clock-based tree (Figure 1B), may represent an orphan lineage of ToCV. However, without stronger evidence linking isolate location to viral origin, this finding should not be interpreted as confirmation of the geographical origin of the virus.

Previous studies have reported the subcellular localization of several ToCV proteins in host plant systems [35,36,37,38]. For example, YP_293696.1 (P22) has been observed in both the cytoplasm and nucleus of host cells [35]. YP_293706.1 (P7) localizes to the cell membrane [36]. YP_293705.1 (P27) has been found in the cytoplasm and nucleus of Nicotiana benthamiana epidermal cells [38]. YP_293702.1 (P9) co-localizes with Lhca4 in the cytoplasm and nucleus [37]. These four experimentally confirmed localizations are consistent with our predictions, lending support to our subcellular localization analyses. Building on this, our computational predictions suggest that 11 of the 13 ToCV proteins are predominantly localized to intracellular compartments (Figure 3A–C). However, several proteins were also predicted to contain peptide segments exposed to the extracellular environment (Figure 3C). This dual localization pattern could indicate that some ToCV proteins exist in both intracellular and extracellular compartments simultaneously, possibly in distinct subpopulations or during different stages of infection. Alternatively, transient localization shifts may occur as part of virus–host interaction dynamics. These possibilities highlight the need for further experimental validation to fully understand the localization and functional roles of these viral proteins.

ToCV belongs to the Closteroviridae family, whose members are predominantly phloem-limited viruses [39]. Evidence from closely related species suggests that ToCV virions are primarily localized within phloem and companion cells [40]. This localization pattern has significant implications for the development of diagnostic tools. Specifically, targeting the leaf veins, where phloem tissues are concentrated, would increase the likelihood of detecting the virus using the epitope-based methods proposed in this study. Single-cell-level analysis has shown that mesophyll cells constitute approximately 75% of the total leaf cell population, whereas phloem cells represent a relatively minor fraction [41]. Despite their limited proportion, phloem and companion cells are anatomically well distributed throughout the tomato leaf, especially along the vascular bundles. Therefore, sampling leaf disks that include visible veins would enrich phloem tissue and improve the accuracy and sensitivity of diagnostic assays. Furthermore, for potential therapeutic applications such as the delivery of antibody-based antiviral agents, targeting the vascular-rich regions of the leaf would also be essential. Once infected plants are diagnosed, our approach could also allow them to be rogued more quickly, which may be especially beneficial for tree species. Our approach may also be applicable to the whitefly vector B. tabaci; however, the expression patterns of ToCV within B. tabaci remain unknown and should be addressed in future studies.

When considering these epitopes for practical use, the type of antibody must first be determined, for example: 1. Traditional antibodies produced by immunizing other organisms. 2. Phage display-derived monoclonal antibodies. 3. VHH antibodies, which contain only a single variable heavy-chain domain (VHH) and lack light-chain variable regions. In our case, traditional antibodies or phage display-derived antibodies could be considered, since high affinity and high specificity are required while compatibility with the human body is not. For research purposes, traditional antibody production would be appropriate, whereas phage display-derived antibodies might be considered for mass production. In future studies, antibodies from both production methods should be evaluated by ELISA or dot immunobinding assay (DIBA) for research and practical use.

5. Conclusions

This study identified the first epitope candidates for ToCV, an emerging threat to the cultivation of Solanaceae crops. By analyzing 44 complete genomes of ToCV isolates, we uncovered the phylogenetic relationships among global isolates and identified the Taiwanese strains as a distinct, early-diverging clade. Through multiple sequence alignment, we pinpointed highly conserved genomic regions suitable for diagnostic targeting. Subcellular localization predictions allowed us to identify viral proteins with extracellularly exposed regions, making them accessible targets for antibody binding. By integrating sequence conservation, structural analysis, and epitope prediction, we identified four highly promising epitope candidates. These findings lay the groundwork for developing antibody-based diagnostic kits and potential antiviral agents against ToCV. Our integrative approach also provides a framework for epitope discovery in other plant viruses.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/sci7030088/s1, Figure S1: Structural validation of identified epitopes using surface visualization of 3D protein models. In 3D structure, red-colored regions indicate extracellular regions, white-colored regions indicate transmembrane regions, eggshell-colored regions indicate intracellular regions, and magenta-colored regions indicate the epitopes identified in this study. (A,B). ToCV-EPI1 mapped to YP_293694.1. A. Overall structural view showing the location of ToCV-EPI1. B. Enlarged view detailing the position of ToCV-EPI1. (C,D). ToCV-EPI2 and ToCV-EPI3 mapped to YP_293699.1. C. Overall structural view showing the location of ToCV-EPI2 and ToCV-EPI3. D. Enlarged view detailing the position of ToCV-EPI2 and ToCV-EPI3. (E,F). ToCV-EPI4 mapped to YP_293704.1. E. Overall structural view showing the location of ToCV-EPI4. F. Enlarged view detailing the position of ToCV-EPI4.

Author Contributions

B.Y.C. and J.K., methodology; J.K., formal analysis; B.Y.C., investigation; J.K., data curation; J.K., writing—original draft preparation; B.Y.C. and J.K., writing—review and editing; J.K., visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the 2024 New Professor Research Grant funded by Korea National University of Education.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Fiallo-Olivé, E.; Navas-Castillo, J. Tomato chlorosis virus, an emergent plant virus still expanding its geographical and host ranges. Mol. Plant Pathol. 2019, 20, 1307–1320.
2. Fortes, I.M.; Moriones, E.; Navas-Castillo, J. Tomato chlorosis virus in pepper: Prevalence in commercial crops in southeastern Spain and symptomatology under experimental conditions. Plant Pathol. 2012, 61, 994–1001.
3. Lozano, G.; Moriones, E.; Navas-Castillo, J. Complete sequence of the RNA1 of a European isolate of tomato chlorosis virus. Arch. Virol. 2007, 152, 839–841.
4. Lozano, G.; Moriones, E.; Navas-Castillo, J. Complete nucleotide sequence of the RNA2 of the crinivirus tomato chlorosis virus. Arch. Virol. 2006, 151, 581–587.
5. Wintermantel, W.M.; Wisler, G.C.; Anchieta, A.G.; Liu, H.-Y.; Karasev, A.V.; Tzanetakis, I.E. The complete nucleotide sequence and genome organization of Tomato chlorosis virus. Arch. Virol. 2005, 150, 2287–2298.
6. Chen, A.Y.S.; Walker, G.P.; Carter, D.; Ng, J.C.K. A virus capsid component mediates virion retention and transmission by its insect vector. Proc. Natl. Acad. Sci. USA 2011, 108, 16777–16782.
7. Lee, Y.J.; Kil, E.J.; Kwak, H.R.; Kim, M.; Seo, J.K.; Lee, S.; Choi, H.S. Phylogenetic Characterization of Tomato chlorosis virus Population in Korea: Evidence of Reassortment between Isolates from Different Origins. Plant Pathol. J. 2018, 34, 199–207.
8. Orfanidou, C.G.; Pappi, P.G.; Efthimiou, K.E.; Katis, N.I.; Maliogka, V.I. Transmission of Tomato chlorosis virus (ToCV) by Bemisia tabaci Biotype Q and Evaluation of Four Weed Species as Viral Sources. Plant Dis. 2016, 100, 2043–2049.
9. Snopkowska, L.; Sara, W.; Maschio, D.; Henriquez-Camacho, C.; Moreno, C.V. Biomarkers for SARS-CoV-2 infection. A narrative review. Front. Med. 2025, 12, 1563998.
10. Joshi, K.M.; Salve, S.; Dhanwade, D.; Chavhan, M.; Jagtap, S.; Shinde, M.; Holkar, R.; Patil, R.; Chabukswar, V. Advancing protein biosensors: Redefining detection through innovations in materials, mechanisms, and applications for precision medicine and global diagnostics. RSC Adv. 2025, 15, 11523–11536.
11. Tu, L.; Wu, S.; Gan, S.; Zhao, W.; Li, S.; Cheng, Z.; Zhou, Y.; Zhu, Y.; Ji, Y. A simplified RT-PCR assay for the simultaneous detection of tomato chlorosis virus and tomato yellow leaf curl virus in tomato. J. Virol. Methods 2022, 299, 114282.
12. Morellos, A.; Tziotzios, G.; Orfanidou, C.; Pantazi, X.E.; Sarantaris, C.; Maliogka, V.; Alexandridis, T.K.; Moshou, D. Non-Destructive Early Detection and Quantitative Severity Stage Classification of Tomato Chlorosis Virus (ToCV) Infection in Young Tomato Plants Using Vis–NIR Spectroscopy. Remote Sens. 2020, 12, 1920.
13. Katoh, K.; Misawa, K.; Kuma, K.; Miyata, T. MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002, 30, 3059–3066.
14. Capella-Gutierrez, S.; Silla-Martinez, J.M.; Gabaldon, T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 2009, 25, 1972–1973.
15. Nguyen, L.T.; Schmidt, H.A.; von Haeseler, A.; Minh, B.Q. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 2015, 32, 268–274.
16. Drummond, A.J.; Nicholls, G.K.; Rodrigo, A.G.; Solomon, W. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics 2002, 161, 1307–1320.
17. Barba-Montoya, J.; Tao, Q.; Kumar, S. Using a GTR+Γ substitution model for dating sequence divergence when stationarity and time-reversibility assumptions are violated. Bioinformatics 2020, 36 (Suppl. S2), i884–i894.
18. Bouckaert, R.; Vaughan, T.G.; Barido-Sottani, J.; Duchêne, S.; Fourment, M.; Gavryushkina, A.; Heled, J.; Jones, G.; Kühnert, D.; De Maio, N.; et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 2019, 15, e1006650.
19. Drummond, A.J.; Ho, S.Y.W.; Phillips, M.J.; Rambaut, A. Relaxed Phylogenetics and Dating with Confidence. PLoS Biol. 2006, 4, e88.
20. Waterhouse, A.M.; Procter, J.B.; Martin, D.M.A.; Clamp, M.; Barton, G.J. Jalview Version 2—A multiple sequence alignment editor and analysis workbench. Bioinformatics 2009, 25, 1189–1191.
21. Shen, H.B.; Chou, K.C. Virus-mPLoc: A fusion classifier for viral protein subcellular location prediction by incorporating multiple sites. J. Biomol. Struct. Dyn. 2010, 28, 175–186.
22. Zhao, L.; Poschmann, G.; Waldera-Lupa, D.; Rafiee, N.; Kollmann, M.; Stühler, K. OutCyte: A novel tool for predicting unconventional protein secretion. Sci. Rep. 2019, 9, 19448.
23. Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E.L. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol. 2001, 305, 567–580.
24. Käll, L.; Krogh, A.; Sonnhammer, E.L.L. A Combined Transmembrane Topology and Signal Peptide Prediction Method. J. Mol. Biol. 2004, 338, 1027–1036.
25. Chen, C.; Chen, H.; Zhang, Y.; Thomas, H.R.; Frank, M.H.; He, Y.; Xia, R. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol. Plant 2020, 13, 1194–1202.
26. Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
27. Meng, E.C.; Goddard, T.D.; Pettersen, E.F.; Couch, G.S.; Pearson, Z.J.; Morris, J.H.; Ferrin, T.E. UCSF ChimeraX: Tools for structure building and analysis. Protein Sci. 2023, 32, e4792.
28. Wang, C.; Wang, J.; Song, W.; Luo, G.; Jiang, T. EpiScan: Accurate high-throughput mapping of antibody-specific epitopes using sequence information. Npj Syst. Biol. Appl. 2024, 10, 101.
29. Camacho, C.; Coulouris, G.; Avagyan, V.; Ma, N.; Papadopoulos, J.; Bealer, K.; Madden, T.L. BLAST+: Architecture and applications. BMC Bioinform. 2009, 10, 421.
30. Yang, F.; Leng, C.; Shen, X.; Bagas, L.; Zhang, L.; Jepson, G. Editorial: Evolution of tectonic structures and mineralisation in orogens and their margins. Front. Earth Sci. 2024, 12, 1371835.
31. Kanakala, S.; Ghanim, M. Global genetic diversity and geographical distribution of Bemisia tabaci and its bacterial endosymbionts. PLoS ONE 2019, 14, e0213946.
32. Crossley, M.S.; Snyder, W.E. What Is the Spatial Extent of a Bemisia tabaci Population? Insects 2020, 11, 813.
33. Summers, C.G.; Newton, A.S.; Estrada, D. Intraplant and Interplant Movement of Bemisia argentifolii (Homoptera: Aleyrodidae) Crawlers. Environ. Entomol. 1996, 25, 1360–1364.
34. Byrne, D.N. Migration and dispersal by the sweet potato whitefly, Bemisia tabaci. Agric. For. Meteorol. 1999, 97, 309–316.
35. Shang, K.; Xiao, L.; Zhang, X.; Zang, L.; Zhao, D.; Wang, C.; Wang, X.; Zhou, T.; Zhu, C.; Zhu, X. Tomato chlorosis virus p22 interacts with NbBAG5 to inhibit autophagy and regulate virus infection. Mol. Plant Pathol. 2023, 24, 425–435.
36. Feng, L.; Luo, X.; Huang, L.; Zhang, Y.; Li, F.; Li, S.; Zhang, Z.; Yang, X.; Wang, X.; OuYang, X.; et al. A viral protein activates the MAPK pathway to promote viral infection by downregulating callose deposition in plants. Nat. Commun. 2024, 15, 10548.
37. Shi, X.; Yue, H.; Wei, Y.; Preisser, E.L.; Wang, P.; Du, J.; Xia, J.; Li, K.; Yang, X.; Chen, J.; et al. Neophytadiene, a Plant Specialized Metabolite, Mediates the Virus-Vector-Plant Tripartite Interactions. Adv. Sci. 2025, 12, 2416891.
38. Sun, X.; Zang, L.; Liu, X.; Jiang, S.; Zhang, X.; Zhao, D.; Shang, K.; Zhou, T.; Zhu, C.; Zhu, X. Interactions of Tomato Chlorosis Virus p27 Protein with Tomato Catalase Are Involved in Viral Infection. Viruses 2023, 15, 990.
39. Navas-Hermosilla, E.; Fiallo-Olivé, E.; Navas-Castillo, J. Infectious Clones of Tomato Chlorosis Virus: Toward Increasing Efficiency by Introducing the Hepatitis Delta Virus Ribozyme. Front. Microbiol. 2021, 12, 693457.
40. Kiss, Z.A.; Medina, V.; Falk, B. Crinivirus replication and host interactions. Front. Microbiol. 2013, 4, 99.
41. Kim, J.Y.; Symeonidi, E.; Pang, T.Y.; Denyer, T.; Weidauer, D.; Bezrutczyk, M.; Miras, M.; Zöllner, N.; Hartwig, T.; Wudick, M.M.; et al. Distinct identities of leaf phloem cells revealed by single cell transcriptomics. Plant Cell 2021, 33, 511–530.

Figure 1. Phylogenetic and evolutionary relationships of 44 ToCV isolates. (A,B) Node tips are color-coded according to the geographic origin or collection site of each isolate, as shown in (B). (A) Phylogenetic tree based on sequence homology. Sequences were aligned with MAFFT, poorly aligned positions were trimmed with trimAl, and the tree was inferred with IQ-TREE based solely on sequence homology. (B) Phylogenetic tree constructed with BEAST under a molecular-clock model. The horizontal axis indicates the hypothetical evolutionary time of divergence.
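The alignment, trimming and tree-inference steps behind Figure 1A can be sketched as a small Python driver. This is a hypothetical sketch: the caption does not report the authors' parameters, so the MAFFT `--auto`, trimAl `-automated1` and IQ-TREE 2 `-m MFP -B 1000` settings, the `iqtree2` executable name, and all file names are assumptions. `build_pipeline` only assembles the command lines; `run_pipeline` executes them and requires the three tools on `PATH`.

```python
import shlex
import subprocess
from pathlib import Path


def build_pipeline(genomes_fasta: str, workdir: str = "tocv_phylo") -> list[list[str]]:
    """Assemble MAFFT -> trimAl -> IQ-TREE 2 command lines without running them.

    All flags are illustrative defaults (assumptions), not the study's reported settings.
    """
    out = Path(workdir)
    aligned = out / "aligned.fasta"
    trimmed = out / "trimmed.fasta"
    return [
        # MAFFT writes the alignment to stdout; --auto lets it choose a strategy.
        ["mafft", "--auto", genomes_fasta],
        # trimAl drops poorly aligned columns; -automated1 is its heuristic mode.
        ["trimal", "-in", str(aligned), "-out", str(trimmed), "-automated1"],
        # IQ-TREE 2: ModelFinder model selection plus 1000 ultrafast bootstraps.
        ["iqtree2", "-s", str(trimmed), "-m", "MFP", "-B", "1000"],
    ]


def run_pipeline(genomes_fasta: str, workdir: str = "tocv_phylo") -> None:
    """Execute the pipeline; requires mafft, trimal and iqtree2 on PATH."""
    out = Path(workdir)
    out.mkdir(parents=True, exist_ok=True)
    mafft_cmd, trimal_cmd, iqtree_cmd = build_pipeline(genomes_fasta, workdir)
    # Redirect MAFFT's stdout into the file that trimAl reads next.
    with open(out / "aligned.fasta", "w") as handle:
        subprocess.run(mafft_cmd, stdout=handle, check=True)
    subprocess.run(trimal_cmd, check=True)
    subprocess.run(iqtree_cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_pipeline("tocv_genomes.fasta"):
        print(shlex.join(cmd))
```

Separating command assembly from execution keeps the parameter choices inspectable (and testable) before any long-running job is launched.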

Figure 2. Nucleotide conservation across the whole ToCV genome. Conservation scores were calculated for each nucleotide position from a multiple sequence alignment of the 44 ToCV isolates. For display as a track, the Florida1 isolate was used as the reference genome for coordinate mapping. The x-axis represents the physical position along the genome, and the y-axis represents the conservation score at each position. Genic regions were annotated based on the Florida1 isolate, with start and end sites marked at their relative positions. White boxes mark the functional domains of each protein detected by InterProScan. (A) Nucleotide conservation of the RNA1 segment. (B) Nucleotide conservation of the RNA2 segment.
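The caption does not define how the per-position conservation score was computed. As one plausible stand-in (an assumption, not necessarily the authors' metric), the sketch below scores each alignment column as one minus its normalized Shannon entropy, down-weighted by the fraction of non-gap characters, and keeps only columns where the reference (here Florida1) has a nucleotide, so that scores map onto reference coordinates. The toy alignment and every isolate name other than Florida1 are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column: str, alphabet: str = "ACGT") -> float:
    """Score one alignment column in [0, 1]: 1 - normalized Shannon entropy, weighted by occupancy.

    Gaps and ambiguity codes are excluded from the entropy but lower the occupancy weight.
    """
    residues = [c for c in column.upper().replace("U", "T") if c in alphabet]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    occupancy = total / len(column)
    return (1.0 - entropy / math.log2(len(alphabet))) * occupancy


def conservation_track(alignment: dict[str, str], reference: str) -> list[tuple[int, float]]:
    """Return (1-based reference position, score) for every column where the reference has a base."""
    seqs = list(alignment.values())
    ref_seq = alignment[reference]
    track, ref_pos = [], 0
    for i, ref_char in enumerate(ref_seq):
        if ref_char == "-":
            continue  # insertion relative to the reference: no reference coordinate
        ref_pos += 1
        column = "".join(seq[i] for seq in seqs)
        track.append((ref_pos, column_conservation(column)))
    return track


# Hypothetical toy alignment; real input would be the 44-isolate RNA1 or RNA2 alignment.
toy = {
    "Florida1": "ACG-T",
    "isolate_2": "ACGAT",
    "isolate_3": "ATG-T",
    "isolate_4": "ACGAA",
}
print(conservation_track(toy, "Florida1"))
```

The occupancy weight prevents a column that is mostly gaps from scoring as fully conserved just because its few non-gap bases agree.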

Figure 3. Predicted subcellular localization of ToCV proteins in plant cells. (A–C) Subcellular localization of ToCV proteins was predicted with multiple tools, assuming the context of a ToCV-infected plant cell. (A) Heatmap of Virus-mPLoc predictions for each ToCV protein. Yellow shading indicates predicted localization to a given cellular compartment. (B,C) UPS, unconventional protein secretion; Signal peptide, presence of a signal peptide in the protein sequence; Transmembrane, presence of a transmembrane domain (orientation not specified). (B) OutCyte predictions for ToCV proteins. OutCyte 1.0 results are shown as bar plots of the most probable localization; OutCyte 2.0 results are shown as a heatmap in which the color intensity and the value in each box indicate the probability of localization to each compartment. (C) Transmembrane domain (TMD) predictions shown as a bubble plot; bubble size and color represent the number of predicted TMDs per protein. (D) Summary of the six ToCV proteins predicted to have extracellularly exposed regions in ToCV-infected plant cells. Colored boxes indicate the predicted subcellular localization of individual peptide segments.
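Figure 3C and 3D reduce per-residue topology predictions to a TMD count and a list of extracellular segments. Assuming the topology is available as a per-residue label string in the TMHMM-style convention (`i` inside, `M` membrane, `o` outside; other predictors use different letters, so this encoding is an assumption), both summaries can be derived as in the sketch below. The sequence and topology are hypothetical.

```python
import re


def label_runs(topology: str, label: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of a single-character label."""
    return [(m.start() + 1, m.end()) for m in re.finditer(re.escape(label) + "+", topology)]


def summarize_topology(sequence: str, topology: str) -> dict:
    """Count TMDs and extract extracellular ('o') peptide segments."""
    if len(sequence) != len(topology):
        raise ValueError("sequence and topology must have the same length")
    tmds = label_runs(topology, "M")
    exposed = [(s, e, sequence[s - 1:e]) for s, e in label_runs(topology, "o")]
    return {"n_tmd": len(tmds), "tmd_spans": tmds, "extracellular": exposed}


# Hypothetical two-pass membrane protein with one extracellular loop.
seq = "ACDEFGHIKLMNP"
topo = "iiMMMoooMMMii"
print(summarize_topology(seq, topo))
```

Because `m.end()` is an exclusive 0-based index, it doubles as the inclusive 1-based end coordinate, which keeps segment boundaries consistent with conventional residue numbering.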

Figure 4. Structural assessment of the six extracellularly localized ToCV proteins revealed peptide regions suitable for epitope development. (A–L) AlphaFold3-predicted structures of the six proteins, each paired with the expected position error plot for that prediction. In the 3D structures, red indicates extracellularly localized regions, white indicates transmembrane regions, and eggshell indicates intracellular regions. For each protein, the first panel shows the predicted 3D structure and the second shows the expected position error plot: (A,B) YP_293694.1; (C,D) YP_293697.1; (E,F) YP_293699.1; (G,H) YP_293700.1; (I,J) YP_293704.1; (K,L) YP_293706.1. (M) Schematic summary of this study. The purple arrow indicates the stepwise workflow used to identify candidate epitopes.
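One way to make the structural check in Figure 4 quantitative is to average the expected position error (AlphaFold's predicted aligned error, PAE) over residue pairs inside each extracellular segment and keep segments below a cutoff. This is a hypothetical filter: the 5 Å cutoff, the function names, and the assumption that the PAE matrix has already been loaded as a square list of lists (in Å, 0-based residue indices) are illustrative and not taken from the study.

```python
def mean_segment_pae(pae: list[list[float]], start: int, end: int) -> float:
    """Mean PAE over all residue pairs within a 1-based inclusive segment."""
    idx = range(start - 1, end)
    values = [pae[i][j] for i in idx for j in idx]
    return sum(values) / len(values)


def confident_segments(
    pae: list[list[float]],
    segments: list[tuple[int, int]],
    max_mean_pae: float = 5.0,
) -> list[tuple[int, int]]:
    """Keep (start, end) segments whose mean intra-segment PAE is at or below the cutoff (Å)."""
    return [(s, e) for s, e in segments if mean_segment_pae(pae, s, e) <= max_mean_pae]


# Hypothetical 4-residue PAE matrix: residues 1-2 are confidently placed, 3-4 are not.
pae = [
    [0.0, 2.0, 10.0, 12.0],
    [2.0, 0.0, 11.0, 13.0],
    [10.0, 11.0, 0.0, 14.0],
    [12.0, 13.0, 14.0, 0.0],
]
print(confident_segments(pae, [(1, 2), (3, 4)]))
```

A low intra-segment PAE indicates that the residues of a segment are confidently placed relative to one another; on its own it does not establish that the segment is surface-accessible, which is why it would complement, not replace, the localization evidence in Figure 3.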

Table 1. Metadata of the collected ToCV genomes.

Isolate	Geographic location	Host	RNA1 accession	RNA2 accession	Collection date
2.5	Spain, Malaga	Solanum lycopersicum	KJ200304.1	KJ200305.1	2010
AT80/99	Spain, Malaga	Solanum lycopersicum	DQ983480.1	DQ136146.1	1999
AT80/99-IC	Spain, Malaga	Solanum lycopersicum	KJ740256.1	KJ740257.1	2014
BR	Brazil, Sumaré, São Paulo	Capsicum annuum	MT279194.1	MT279195.1	2019
DSMZ PV-1242	Greece	Solanum lycopersicum	ON398512.1	ON398513.1	2018-10-02
Gr-535	Greece	not applicable	EU284745.1	EU284744.1	2008
HP	Republic of Korea	Solanum lycopersicum	KP114530.1	KP114537.1	2013-01-01
HS	Republic of Korea	Solanum lycopersicum	KP137098.1	KP137099.1	2013-01-01
IS17	Republic of Korea	Solanum lycopersicum	KP114535.1	KP114525.1	2013-01-01
IS29	Republic of Korea	Solanum lycopersicum	KP114538.1	KP114529.1	2013-01-01
JJ	Republic of Korea	Solanum lycopersicum	KP137100.1	KP137101.1	2013-01-01
JJ3	Republic of Korea	not applicable	KP114532.1	KP114533.1	2013-01-01
JJ5	Republic of Korea	Solanum lycopersicum	KP114527.1	KP114534.1	2013-01-01
JN1	Republic of Korea	not applicable	KP114531.1	KP114536.1	2013-01-01
JN2	Republic of Korea	Solanum lycopersicum	MG813909.1	MG813910.1	2013
Kas	Turkey	Solanum lycopersicum	KY419526.1	KY419528.1	2015-02-03
Merkez	Turkey	Solanum lycopersicum	KY419525.1	KY419527.1	2015-02-03
MM8	Spain, Malaga	Capsicum annuum	KJ200306.1	KJ200307.1	2005
NS	Republic of Korea	Solanum lycopersicum	MG813908.1	MG813911.1	2013
Oleo	Brazil	Cucumis sativus	MN172419.1	MN172420.1	2019-02-01
PAK1	Pakistan	Solanum lycopersicum	MN869004.1	MN869006.1	2019-02-13
PAK2	Pakistan	Solanum lycopersicum	MN869005.1	MN869007.1	2019-02-14
Pl-1-2	Spain, Malaga	Solanum lycopersicum	KJ200308.1	KJ200309.1	1997
SA33	Brazil, Santo Amaro da Imperatriz, Santa Catarina	Solanum lycopersicum	OR283247.1	OR283248.1	2020-08-01
SDSG	China	Solanum lycopersicum	KC709509.1	KC709510.1	2012-10-01
SDTA	China	Solanum lycopersicum	OR246919.1	OR246920.1	2019-01-09
TN11	Taiwan	Solanum lycopersicum	MF795556.1	MF795557.1	1998
ToC-Br2	Brazil, São Paulo	Solanum lycopersicum	JQ952600.1	JQ952601.1	2006
ToCR-186	South Africa	Solanum lycopersicum	KY471129.1	KY471130.1	2015-02-15
ToCV_tomato_BR	Brazil	Solanum lycopersicum	MT673878.1	MT673879.1	2019
ToCV-BJ	China	Solanum lycopersicum	KC887998.1	KC887999.1	2007
ToCV-PAP-JJ6	Republic of Korea, Jinju	Capsicum annuum	OR865222.1	OR865223.1	2023-05-01
ToCV-SH-SIA	China	not applicable	MW490611.1	MW490612.1	2019-11-01
ToCV-SH-SIPPE	China	not applicable	MW490609.1	MW490610.1	2019-11-01
ToCV-SH-SNU	China	not applicable	MW490607.1	MW490608.1	2019-11-01
WSE	Egypt, Cairo, ARC	Solanum lycopersicum	KY618796.1	KY618797.1	2022-05-18
XS	Taiwan, Taichung, Xinshe	Solanum lycopersicum	KP114526.1	KP114528.1	2011-12-19
YG	Republic of Korea	Solanum lycopersicum	LC711108.1	LC711109.1	2020
Mie_1806	Japan	Solanum lycopersicum	LC528262.1	LC528263.1	2018
Florida1	USA, Florida	Solanum lycopersicum	NC_007340.1	NC_007341.1	1989
Florida2	USA, Florida	Solanum lycopersicum	AY903447.1	AY903448.1	1989
FERA_160205	Slovenia	Solanum lycopersicum	KY810786.1	KY810787.1	2016
TCG	Japan, Tochigi	Solanum lycopersicum	LC711106.1	LC711107.1	2008
KGS RNA	Japan, Kagoshima	Solanum lycopersicum	LC791014.1	LC794715.1	2022-05-18

Table 2. List of highly probable epitopes for ToCV.

Peptide ID	9-mer Peptide	Target Protein
ToCV_EPI1	YMLESVKNV	YP_293694.1 (1a polyprotein)
ToCV_EPI2	YTIDGILEL	YP_293699.1 (HSP70)
ToCV_EPI3	KITDLNVSV	YP_293699.1 (HSP70)
ToCV_EPI4	TMNSVVQYV	YP_293704.1 (Minor coat protein)
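The 9-mer candidates in Table 2 can be cross-checked against the isolate panel in Table 1. The sketch below enumerates every 9-mer lying entirely within an exposed segment (a candidate list that an epitope predictor could then score) and reports the fraction of isolate protein sequences carrying a given peptide exactly. It is an illustrative helper, not the epitope prediction algorithm used in the study; the toy isolate sequences are hypothetical, and only the peptide YMLESVKNV is taken from Table 2.

```python
def kmers_in_segments(
    sequence: str, segments: list[tuple[int, int]], k: int = 9
) -> list[tuple[int, str]]:
    """Return (1-based start, peptide) for every k-mer fully inside a 1-based inclusive segment."""
    peptides = []
    for start, end in segments:
        # Last valid start is end - k + 1; segments shorter than k yield nothing.
        for pos in range(start, end - k + 2):
            peptides.append((pos, sequence[pos - 1:pos - 1 + k]))
    return peptides


def isolate_coverage(peptide: str, isolate_proteins: dict[str, str]) -> float:
    """Fraction of isolates whose protein sequence contains the peptide as an exact substring."""
    hits = sum(peptide in seq for seq in isolate_proteins.values())
    return hits / len(isolate_proteins)


# Hypothetical protein fragments from three isolates; the third carries one substitution (E->D).
isolates = {
    "isolate_A": "AAYMLESVKNVAA",
    "isolate_B": "GGYMLESVKNVG",
    "isolate_C": "GGYMLDSVKNVG",
}
print(kmers_in_segments(isolates["isolate_A"], [(3, 12)]))
print(isolate_coverage("YMLESVKNV", isolates))
```

Exact substring matching is deliberately strict: a single substitution, as in the third toy isolate, counts as a miss, which is the conservative choice when a diagnostic antibody must recognize every circulating variant.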

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Choi, B.Y.; Kim, J. Discovery of Targetable Epitopes in Tomato Chlorosis Virus Through Comparative Genomics and Structural Modeling. Sci 2025, 7, 88. https://doi.org/10.3390/sci7030088

AMA Style

Choi BY, Kim J. Discovery of Targetable Epitopes in Tomato Chlorosis Virus Through Comparative Genomics and Structural Modeling. Sci. 2025; 7(3):88. https://doi.org/10.3390/sci7030088

Chicago/Turabian Style

Choi, Bae Young, and Jaewook Kim. 2025. "Discovery of Targetable Epitopes in Tomato Chlorosis Virus Through Comparative Genomics and Structural Modeling" Sci 7, no. 3: 88. https://doi.org/10.3390/sci7030088

APA Style

Choi, B. Y., & Kim, J. (2025). Discovery of Targetable Epitopes in Tomato Chlorosis Virus Through Comparative Genomics and Structural Modeling. Sci, 7(3), 88. https://doi.org/10.3390/sci7030088
