1. Introduction
High-throughput sequencing (HTS) has revolutionized plant virus research by enabling unbiased detection of mixed infections, low-titer pathogens and previously unknown viruses directly from plant tissues [1]. HTS-based “virome” approaches bypass the need for virus isolation or prior serological and PCR knowledge, and have revealed that many crops harbor complex communities of RNA and DNA viruses, persistent viruses and defective or satellite RNAs [2]. However, the analytical power of HTS critically depends on how sequencing libraries are prepared [3]. Plant total RNA extracts are dominated by ribosomal RNAs and plastid transcripts, while many plant viruses lack poly(A) tails, possess highly structured genomes or replicate as multipartite DNA molecules, all of which can strongly influence their recovery in different library types [4]. As a result, methodological choices made upstream of sequencing can profoundly alter which viral taxa are detectable, how completely their genomes are assembled and how accurately their abundance is estimated [5].
Two main strategies are widely used for plant transcriptome and virome profiling. Ribosomal RNA-depleted total RNA sequencing aims to remove cytosolic and organellar rRNAs while retaining both polyadenylated and non-polyadenylated transcripts, including viral RNAs, viral replicative intermediates and a broad range of host non-coding RNAs [6]. In contrast, poly(A)-selected mRNA sequencing enriches for polyadenylated transcripts, greatly increasing the proportion of host coding sequences but inevitably depleting non-polyadenylated viral genomes and many DNA-virus transcripts [7]. Despite the frequent use of both approaches in plant virome studies, direct experimental comparisons of their performance on the same field samples remain scarce, and most published work has relied on a single library type, making it difficult to disentangle true biological patterns from methodological artefacts.
Pepper (Capsicum annuum) and garlic (Allium sativum) are high-value crops in Korea that are chronically exposed to a diverse ensemble of viruses [8,9]. Peppers are commonly infected by acute RNA viruses such as tomato spotted wilt orthotospovirus (TSWV) and broad bean wilt virus 2 (BBWV2), as well as persistent agents including hot pepper endornavirus (HPEV) and cryptic viruses such as pepper cryptic virus 2 (PCV2) [8,10]. Garlic, propagated vegetatively, accumulates long-term infections by multiple allexiviruses (garlic virus A, B, C, D, E and X, and shallot virus X (ShVX)) and carlaviruses such as garlic common latent virus and garlic latent virus (GLV), frequently in mixed combinations that reduce bulb yield and quality [11]. Previous metatranscriptomic surveys in these crops have catalogued many of these viruses and suggested substantial intra-species diversity, but they have seldom examined how library preparation biases virome profiles, nor have they systematically linked virome composition to genome-scale phylogenies.
Field-collected tissues also present additional interpretive challenges. Heavily infected leaves and cloves may contain not only highly abundant epidemic viruses but also low-titer, persistent or latent viruses (some potentially interacting synergistically with dominant pathogens), as well as endophytic fungi and bacteria, arthropod-derived sequences and environmental contaminants from soil, irrigation water or handling [12,13]. Distinguishing true host–virus associations from background noise demands workflows that not only maximize viral recovery but also support robust evolutionary and taxonomic analyses.
In this study, pepper leaves from Anseong and Jincheon and garlic cloves from Hoengseong were collected from commercial fields in Korea to compare rRNA-depleted total RNA and poly(A)-selected mRNA libraries for plant virome analysis. The aims were to (i) quantify how these library types differ in mapping efficiency and host transcript composition, (ii) assess how they reshape apparent viral diversity, genome coverage and within-sample variants in pepper and garlic viromes, and (iii) obtain near-complete viral genomes for phylogenetic placement and species demarcation, with emphasis on allexiviruses in garlic. This integrated approach provides guidance for library selection in plant virome studies and led to the identification of a novel allexivirus species, Garlic virus J (GarVJ), infecting garlic in Korea.
2. Results
2.1. Field Sampling and Library Construction
Visual inspection of field samples revealed distinct virus-like symptoms in pepper and garlic plants collected for comparative virome analysis (Figure 1). Pepper leaves from Anseong showed typical tomato spotted wilt virus–like symptoms, including chlorotic and necrotic spots on a severely mottled lamina (Figure 1A), whereas Jincheon peppers displayed milder mosaics and chlorotic patches consistent with mixed infections by diverse RNA viruses rather than a single dominant pathogen (Figure 1B). The garlic sample from Hoengseong was collected as an intact bulb with attached roots to include all belowground tissues potentially harboring viruses for subsequent sequencing (Figure 1C,D).
To evaluate mRNA enrichment versus rRNA depletion for plant virome profiling, three field samples (Anseong pepper leaves, Jincheon pepper leaves and Hoengseong garlic cloves) were each processed with two library preparation strategies, yielding six RNA-seq libraries in total (Table 1). For each tissue, an rRNA-depleted total RNA library (TR) was prepared to capture both polyadenylated and non-polyadenylated transcripts, including viral genomes and non-coding RNAs, and a poly(A)-selected mRNA library (MR) was generated to enrich host mRNAs and allow comparison of viral recovery between methods; all libraries were sequenced as paired-end reads.
2.2. Read Mapping and Host Transcript Composition
First, the two library types were compared with respect to host gene expression. For this analysis, reference datasets of coding DNA sequences (CDSs), chloroplast-encoded mRNAs and mitochondrial mRNAs were compiled for both pepper and garlic from available genome assemblies. Mapping statistics showed that the impact of library preparation on host read recovery varied among samples (Figure 2). In Anseong pepper, poly(A) selection clearly enhanced mapping efficiency, with mapped reads increasing from 51.7% in the TR library (P-TR-AS) to 71.5% in the MR library (P-MR-AS) (Figure 2A). In contrast, the Jincheon MR library (P-MR-JC) mapped less efficiently than its TR counterpart (33.6% vs. 51.2%), suggesting location- or sample-specific differences in transcript composition. For Hoengseong garlic, mapping rates also increased after poly(A) selection, from 34.0% in G-TR-HS to 43.0% in G-MR-HS. Nuclear CDSs accounted for 44.7–100.0% of mapped reads across tissues and library types, and the organellar contribution depended strongly on library type (Figure 2B). Chloroplast mRNAs represented about half of mapped reads in pepper TR libraries (51.6% in P-TR-AS and 50.9% in P-TR-JC) but were absent from all MR libraries, indicating that poly(A) enrichment efficiently eliminates cp-encoded transcripts. Mitochondrial mRNAs remained at low abundance in every dataset (≤5.0%), so that nuclear CDS-derived reads made up essentially all mapped reads in the MR libraries.
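The direction and size of the mapping-rate shifts described above can be summarized with a few lines of arithmetic. The following is a minimal sketch that uses only the percentages quoted in this section; the dictionary layout and the function name `mapping_shift` are illustrative assumptions, not part of the study's pipeline.

```python
# Mapped-read percentages quoted in Section 2.2
# (TR = rRNA-depleted total RNA, MR = poly(A)-selected mRNA).
mapping_rates = {
    "Anseong pepper": {"TR": 51.7, "MR": 71.5},
    "Jincheon pepper": {"TR": 51.2, "MR": 33.6},
    "Hoengseong garlic": {"TR": 34.0, "MR": 43.0},
}


def mapping_shift(rates):
    """Return MR minus TR mapping rate per sample, in percentage points."""
    return {
        sample: round(values["MR"] - values["TR"], 1)
        for sample, values in rates.items()
    }


shifts = mapping_shift(mapping_rates)
for sample, delta in shifts.items():
    direction = "higher" if delta > 0 else "lower"
    print(f"{sample}: MR mapping rate is {abs(delta):.1f} points {direction} than TR")
```

Expressing the change in percentage points rather than as a ratio keeps the three samples directly comparable and makes the opposite sign of the Jincheon sample obvious.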
2.3. Taxonomic Composition of Non-Host Contigs and Virus Genome Assemblies
Unmapped reads from all libraries were assembled de novo with MEGAHIT and taxonomically classified by BLASTx against the NCBI nr database followed by MEGAN6 binning (Figure 3; Table 2 and Table 3). In pepper libraries, most contigs derived from Viridiplantae, reaching 29,436–47,720 contigs and 11.8–25.3 million reads per library (e.g., 42,141 contigs and 11,832,565 reads in P-MR-AS; 45,647 contigs and 19,472,608 reads in P-TR-JC), whereas bacterial, fungal and animal assignments were rare (Table 2). Actinomycetia and Proteobacteria each contributed at most 12–14 contigs and ≤15,359 reads, fungal taxa such as Alternaria, unclassified Fungi and Ustilaginomycotina occurred in only a few hundred to ~1400 reads, and animal sequences were limited to five Pterygota contigs (217 reads) in P-MR-JC, consistent with trace insect material. By contrast, virus-assigned contigs, though few in number (5–56 per library), were supported by large read counts, most notably in P-MR-JC, which contained 34 viral contigs underpinned by 16,735,748 reads, indicating a strong viral signal in this pepper sample (Figure 3C–F; Table 2).
Garlic libraries showed a similar dominance of plant sequences but with more pronounced fungal and bacterial components (
Figure 3;
Table 3). Viridiplantae contigs reached 110,892 contigs and 18,321,394 reads in G-TR-HS and 40,503 contigs and 14,498,540 reads in G-MR-HS, confirming that most non-host reads still originated from garlic transcripts. Fusarium was the most prominent non-plant taxon, with 86 contigs and 25,249 reads in G-TR-HS and 216 contigs and 9452 reads in G-MR-HS, while several bacterial groups (Brachybacterium, Corynebacteriales, Enterobacterales, Staphylococcus, Streptococcus) contributed dozens of contigs and up to 13,545 reads (Streptococcus) in G-TR-HS but were largely depleted in G-MR-HS (
Table 3). Animal-derived contigs assigned to Fragariocoptes, Mus and Neoptera were detected at low levels (≤28 contigs and 3047 reads per taxon), consistent with minor environmental or handling contamination rather than true microbiome members. Viral contigs were present in both garlic libraries, with 67 contigs and 1,736,549 reads in G-TR-HS and 55 contigs and 2,932,116 reads in G-MR-HS, underscoring a robust garlic virome signal in the non-host read fraction (
Figure 3C–F;
Table 3).
Assembly of non-host reads yielded 29,475–47,766 contigs in pepper libraries and substantially more in garlic, particularly G-TR-HS with 111,263 contigs, suggesting greater sequence diversity or stronger divergence from the reference genome in this sample (
Figure 3A). Total non-host mapped reads ranged from 11.8–42.0 million in pepper and 17.4–22.5 million in garlic, with P-TR-JC and G-TR-HS showing the deepest non-host sequencing (
Figure 3B). Across libraries, most assigned contigs and reads mapped to Viridiplantae, but distinct fungal signatures (Alternaria/Ustilaginomycotina in pepper versus Fusarium in garlic) and variable bacterial and viral loads highlighted host- and site-specific microbiome and virome features (
Figure 3C–E;
Table 2 and
Table 3). Viral reads were particularly enriched in P-MR-JC (16,735,748 reads; 29.1% of total), followed by P-TR-JC, G-TR-HS and G-MR-HS (6.2–7.7%), whereas P-TR-AS exhibited a comparatively low viral burden (2.8% of total reads), indicating substantial differences in virome activity among pepper tissues and between pepper and garlic hosts (
Figure 3F).
Across the six libraries, a diverse set of viral genomes and segments was successfully reconstructed and assigned accession numbers, confirming robust virome recovery from both pepper and garlic samples (Table S1). In the Anseong pepper total RNA library (P-TR-AS), near-complete TSWV RNA M and RNA L segments were assembled and deposited as PX769119 and PX769120, respectively.
In Jincheon pepper, both library types contributed complementary viral genomes. The mRNA library (P-MR-JC) yielded multiple BBWV2 RNA1 and RNA2 sequences (PX769109–PX769111, PX769116, PX769117), together with MDV C1 alphasatellite and genomic components DNA-U2 and DNA-M (PX769121, PX769124–PX769127), and complete pepper cryptic virus 2 (PCV2) RNA1 and RNA2 segments (PX769128, PX769129). The paired total RNA library (P-TR-JC) provided additional BBWV2 RNA1/RNA2 variants (PX769112–PX769115, PX769118), multiple MDV C1 alphasatellite and DNA-S, DNA-R, DNA-U4 components (PX769122, PX769123, PX769132–PX769140), and full genomes of HPEV and PCV2 (PX769130, PX769131, PX769141), underscoring the broader segment and species recovery achieved with total RNA-seq.
For Hoengseong garlic, assembled sequences were dominated by garlic-infecting viruses. The mRNA library (G-MR-HS) produced complete or nearly complete genomes of garlic virus A, C, D and J, along with BPMV RNA2, which were deposited as PX769142, PX769145, PX769147, PX769148, PX769154, PX769155 and MZ422782.1. The total RNA library (G-TR-HS) further expanded this set, yielding additional genomes of garlic virus A, B, C, D and J and both BPMV RNA1 and RNA2, as well as pepper mild mottle virus (PMMoV), with accessions PX769143–PX769157.
Taken together, these assembled accessions show that rRNA-depleted total RNA-seq consistently recovered more viral species, genomic components and sequence variants than mRNA-seq, particularly for multipartite DNA viruses (MDV) and low-abundance RNA viruses (PCV2, HPEV, PMMoV), whereas mRNA-seq still captured the dominant polyadenylated RNA viruses such as BBWV2 and the major garlic viruses.
2.4. Comparative TSWV-Dominated Virome Analysis of Anseong Pepper Leaves by Total RNA-Seq and mRNA-Seq
Ribo-depleted total RNA-seq and poly(A)-selected mRNA-seq libraries from Anseong pepper leaves showed striking differences in the recovery of TSWV (Figure 4). In the TR library (P-TR-AS), RNA segments S, M and L were each assembled into one to three contigs, whereas the MR library (P-MR-AS) produced more numerous but fragmented contigs, especially for RNA L (15 contigs), indicating less complete viral genome reconstruction under poly(A) selection (Table S2 and Figure 4A). Coverage profiles confirmed that P-TR-AS was heavily enriched for viral RNA, with mean depths of 14,164 (S), 11,878 (M) and 1814 (L), compared with only 49, 10 and 3 in P-MR-AS for the corresponding segments (Figure 4B). Consistently, total TSWV reads in P-TR-AS (1,178,201) far exceeded those in P-MR-AS (2217), demonstrating that ribo-depleted total RNA-seq is markedly more effective than poly(A)-selected mRNA-seq for virome profiling of TSWV-infected pepper leaves from Anseong (Figure 4C).
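The size of the TR versus MR gap can be checked directly from the numbers quoted above. The sketch below is illustrative only: the variable and function names are assumptions, and the ratios are raw (not normalized for library size, which is not given in this paragraph).

```python
# TSWV read counts and mean segment depths quoted in Section 2.4.
tswv_reads = {"P-TR-AS": 1_178_201, "P-MR-AS": 2_217}
mean_depth = {  # segment: (depth in P-TR-AS, depth in P-MR-AS)
    "S": (14_164, 49),
    "M": (11_878, 10),
    "L": (1_814, 3),
}


def fold_difference(tr_value, mr_value):
    """Raw TR/MR ratio; raises if the MR value is zero."""
    if mr_value == 0:
        raise ValueError("MR value is zero; fold difference is undefined")
    return tr_value / mr_value


read_fold = fold_difference(tswv_reads["P-TR-AS"], tswv_reads["P-MR-AS"])
depth_fold = {seg: fold_difference(tr, mr) for seg, (tr, mr) in mean_depth.items()}

print(f"TSWV reads, TR/MR: {read_fold:.0f}-fold")
for seg, fold in depth_fold.items():
    print(f"RNA {seg} mean depth, TR/MR: {fold:.0f}-fold")
```

A library-size-normalized measure (for example, reads per million) would be the fairer comparison where total read counts per library are available.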
Phylogenetic analysis of near-complete TSWV RNA L and M sequences further showed that the Anseong isolate belongs to a locally circulating Korean lineage (
Table S4 and
Figure 5). In the RNA L tree, TSWV_AS formed a strongly supported clade with Korean accession MW293974.1, clearly separated from a diverse set of foreign reference sequences (
Figure 5A). A similar pattern was observed for RNA M, where TSWV_AS clustered tightly with MW293975.1 and additional Korean isolates (KU179564.1, KU179572.1), again forming a monophyletic group distinct from other global strains (
Figure 5B). Together, these results indicate that the severe TSWV infection revealed by total RNA-seq in Anseong peppers is caused by a typical Korean TSWV variant rather than a highly divergent or newly introduced strain.
2.5. Comparative Virome and Phylogenetic Analysis of Jincheon Pepper Leaves
Poly(A)-selected mRNA-seq and ribo-depleted total RNA-seq libraries from Jincheon pepper leaves revealed a complex virome whose apparent composition strongly depended on library preparation (
Figure 6). Both libraries contained CMV, Broad bean wilt virus 2 (BBWV2), Milk vetch dwarf virus (MDV) plus its C1 alphasatellite, Hot pepper endornavirus (HPEV) and Pepper cryptic virus 2 (PCV2), but their relative contributions differed markedly. In the poly(A)-selected library P-MR-JC, BBWV2 RNA1 and RNA2 were assembled into three and seven contigs, respectively, with extremely high mean coverages (176,961 and 113,331) and read counts (10,869,064 and 5,834,342), indicating that BBWV2 overwhelmingly dominated the mRNA-enriched virome (
Table S3 and
Figure 6A–C). In contrast, the TR library P-TR-JC produced more numerous and typically longer contigs for CMV and multiple MDV components, and several MDV segments (DNA-R, DNA-M, DNA-U2 and DNA-S) reached higher coverages than in P-MR-JC, showing that total RNA-seq better captures the multipartite DNA virus and its alphasatellite (
Figure 6A–C). HPEV and PCV2 were detected in both libraries but displayed higher segment coverages and read counts in P-TR-JC, consistent with improved recovery of non-polyadenylated or low-abundance persistent viruses in ribo-depleted libraries (
Figure 6C). Overall, P-MR-JC amassed 16,735,748 viral reads, almost all from BBWV2, whereas P-TR-JC contained 3,000,669 viral reads distributed more evenly among BBWV2, MDV, HPEV and PCV2, underscoring how the library preparation method reshapes the apparent Jincheon pepper virome (
Figure 6D,E).
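The statement that BBWV2 accounts for almost all viral reads in P-MR-JC follows from the read counts given in this section. A minimal sketch of that arithmetic, with illustrative variable names that are not taken from the study:

```python
# Read counts quoted in Section 2.5.
bbwv2_rna1_reads = 10_869_064   # P-MR-JC, BBWV2 RNA1
bbwv2_rna2_reads = 5_834_342    # P-MR-JC, BBWV2 RNA2
viral_reads_mr = 16_735_748     # all viral reads in P-MR-JC
viral_reads_tr = 3_000_669      # all viral reads in P-TR-JC

bbwv2_reads = bbwv2_rna1_reads + bbwv2_rna2_reads
bbwv2_share = bbwv2_reads / viral_reads_mr      # fraction of MR viral reads from BBWV2
non_bbwv2_reads = viral_reads_mr - bbwv2_reads  # reads left for all other viruses
mr_to_tr_ratio = viral_reads_mr / viral_reads_tr

print(f"BBWV2 share of P-MR-JC viral reads: {bbwv2_share:.1%}")
print(f"Non-BBWV2 viral reads in P-MR-JC: {non_bbwv2_reads:,}")
print(f"Viral reads, MR/TR: {mr_to_tr_ratio:.2f}x")
```

The small remainder after subtracting BBWV2 is what all other viruses share in the poly(A)-selected library, which is why their coverage is comparatively low there.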
For evolutionary analysis, only complete or nearly complete viral genome segments spanning the relevant open reading frames were included in maximum-likelihood phylogenies (Figure 7 and Figure 8). Within BBWV2, multiple sequence variants recovered from the same Jincheon sample (originating from both TR and MR libraries) clustered together for RNA1 and RNA2, forming a well-supported lineage within the broader BBWV2 population and indicating local diversification of a single strain group (Figure 7A,B). The Jincheon HPEV genome grouped with Korean and other Asian endornavirus isolates, while PCV2 RNA1 and RNA2 variants from both library types formed coherent clades with a small subset of reference sequences, supporting the presence of a stable PCV2 lineage infecting peppers at this site (Figure 7C–E).
Phylogenetic trees of MDV components and the C1 alphasatellite further highlighted within-sample genetic diversity and method-specific recovery (
Figure 8). For each segment (C1, DNA-M, DNA-R, DNA-S, DNA-U2 and DNA-U4), multiple Jincheon MDV variants were resolved, with some contigs derived exclusively from either the TR or MR library, reflecting differences in coverage and assembly between the two preparations (
Figure 8A–F). Nonetheless, all Jincheon MDV sequences grouped into closely related sublineages with a limited set of reference isolates, showing no evidence of highly divergent branches that might suggest inter-segment recombination. This indicates that a relatively homogeneous MDV strain complex, represented by several closely related sequence variants, is circulating in Jincheon pepper fields.
2.6. Virome Composition and Allexivirus Speciation in Hoengseong Garlic Sample
Metatranscriptomic sequencing of Hoengseong garlic cloves with ribo-depleted total RNA (G-TR-HS) and poly(A)-selected mRNA (G-MR-HS) libraries revealed dense mixed infections dominated by allexiviruses and carlaviruses (Figure 9). Both libraries contained numerous contigs assigned to garlic viruses A–E (GarVA–GarVE), Shallot virus X (ShVX) and the carlaviruses GarVD and GarB, with additional low-abundance RNA viruses including CMV, CGMMV, TSWV, SMV, BPMV, PMMoV and PVX. Viral RNA segments generally reached higher coverage in the TR library for low-abundance viruses such as SMV, BPMV and PVX, whereas the major garlic viruses GarA, GarC, ShVX, GarE, GarVD and GarB showed consistently high coverages in both libraries, and together GarC, GarVD and GarB accounted for more than half of all viral reads. Total viral read counts (2,932,116 in G-MR-HS and 1,736,549 in G-TR-HS) confirmed that Hoengseong cloves harbor heavy mixed infections and that both library types robustly recover the core garlic virus community (Table S4 and Figure 9).
To place these viruses in an evolutionary context, maximum-likelihood trees were inferred from complete or nearly complete genome sequences covering all predicted ORFs of GarVA–GarVD and GarVJ (Figure 10). Hoengseong isolates of GarVA, GarVB, GarVC and GarVD formed compact, strongly supported clades with contemporary reference strains, indicating that these infections are caused by typical, globally distributed garlic virus lineages. In contrast, GarVJ formed a distinct, well-supported branch within the genus Allexivirus, separate from recognized species. Species demarcation followed the ICTV criteria for the family Alphaflexiviridae, to which allexiviruses belong, applied here as <80% amino acid identity across all major ORFs. GarVJ showed 62.3% replicase identity with Shallot virus X (closest match) and 64.3–71.3% identity for the TGB1, TGB2, CP and NABP proteins with Garlic virus C homologues, all below the species threshold. These values, combined with phylogenetic separation and full-genome RNA-seq/Sanger validation, support the designation of GarVJ as a new allexivirus species (Figure 10).
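The demarcation logic described above reduces to two steps: computing percent amino acid identity from a pairwise alignment, and checking every major ORF against the threshold. The sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, the toy alignment is invented for demonstration, identity is computed over gap-free columns only (other conventions exist), and the per-protein breakdown of the 64.3–71.3% range is not given in the text, so only its endpoints are used.

```python
SPECIES_THRESHOLD = 80.0  # percent amino acid identity, as stated in the text


def pairwise_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def all_below_threshold(identities, threshold=SPECIES_THRESHOLD):
    """True if every reported identity is strictly below the species threshold."""
    return all(value < threshold for value in identities.values())


# Identities reported for GarVJ; the range endpoints are not assigned to
# specific proteins because the text gives only the range.
reported = {
    "replicase vs ShVX": 62.3,
    "TGB1/TGB2/CP/NABP vs GarVC (lowest)": 64.3,
    "TGB1/TGB2/CP/NABP vs GarVC (highest)": 71.3,
}

toy_identity = pairwise_identity("MKT-AL", "MRTQAL")  # invented toy alignment
print(f"Toy alignment identity: {toy_identity:.1f}%")
print("All reported identities below threshold:", all_below_threshold(reported))
```

Requiring every ORF to fall below the threshold is the conservative reading of the criterion as stated here; a real analysis would compute identities from curated alignments of the full protein sequences.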
Finally, PMMoV and BPMV sequences detected in Hoengseong garlic grouped within established lineages previously described from non-garlic hosts (Figure 11). PMMoV genomes clustered with AB126003.1-like strains, and BPMV RNA1/RNA2 sequences grouped with common-bean isolates, with no garlic-specific subclades, supporting the interpretation that these reads reflect environmental or laboratory contamination rather than true garlic-infecting viruses (Figure 11).
3. Discussion
This study provides a side-by-side comparison of rRNA-depleted total RNA-seq and poly(A)-selected mRNA-seq for plant virome analysis under realistic field conditions in pepper and garlic. Poly(A) selection increases the fraction of reads mapping to host nuclear CDSs and effectively removes chloroplast transcripts, in line with the strong depletion of plastid reads in oligo(dT)-selected relative to rRNA-depleted libraries reported in the ChloroSeq study of Arabidopsis chloroplast transcriptomes under heat stress [14]. However, this approach greatly reduces the recovery of many viral RNAs, particularly those of non-polyadenylated or low-abundance viruses. These methodological biases arise from three key biological factors: many plant viruses lack poly(A) tails; field samples contain latent viruses at much lower levels than dominant pathogens; and multipartite viruses require recovery of all genomic segments for complete assembly. A single library method can therefore miss a large share of virome diversity (over 50% in some cases), and this study demonstrates the complementary recovery patterns of total RNA-seq and mRNA-seq.
In Anseong pepper, for example, total RNA-seq recovered over 500-fold more TSWV reads and far higher segment coverage than mRNA-seq, even though both libraries were prepared from the same tissue. This is consistent with the molecular architecture of TSWV transcripts: TSWV is a segmented negative-strand RNA virus whose mRNAs generally lack canonical poly(A) tails and instead carry structured AU-rich 3′ untranslated regions that act as translation enhancers and functionally mimic a poly(A) tail without being polyadenylated by host poly(A) polymerase [15]. This result emphasizes that total RNA-seq is preferable when the aim is quantitative virome profiling or full-length genome reconstruction of RNA viruses such as TSWV, particularly those that are non-polyadenylated or present at low abundance.
At the same time, the Jincheon pepper and Hoengseong garlic datasets show that mRNA-seq can strongly distort the apparent virome composition. In Jincheon pepper, the poly(A)-selected library was dominated almost entirely by BBWV2, whereas the paired total RNA library also revealed MDV, HPEV and PCV2. Similarly, in garlic, both library types recovered the major allexiviruses and carlaviruses, but total RNA-seq additionally detected RNA viruses such as SMV, BPMV and PVX and often produced longer contigs. SMV, BPMV and PVX appeared only at low levels in the Hoengseong garlic total RNA-seq data, and several clues suggest they do not represent true garlic infections.
None of these viruses is known to infect garlic; in our recent work SMV and BPMV were associated with soybean, common bean and peanut, and PVX is routinely handled in the laboratory as an infectious clone [16,17,18]. All three are positive-sense ssRNA viruses with 3′ poly(A) tails [19,20,21], so they are readily captured by sequencing, but in this dataset their reads were sparse and uneven compared with the bona fide garlic allexiviruses and carlaviruses. Taken together, these patterns indicate that SMV, BPMV and PVX were most likely introduced as low-level laboratory contaminants, probably via shared pipettes, centrifuges or grinding tools, rather than reflecting active field infections in garlic.
MDV in the Jincheon pepper sample illustrates in more detail how multipartite nanoviruses complicate virome reconstruction from mRNA-seq alone. MDV possesses multiple circular single-stranded DNA segments of approximately 1 kb plus an associated C1 alphasatellite, and each segment can follow its own evolutionary trajectory and accumulation pattern within a mixed infection [
22]. In the phylogenetic trees for the C1 alphasatellite and the DNA-M, DNA-R, DNA-S, DNA-U2 and DNA-U4 segments, MDV sequences from this study (highlighted in red) cluster into at least two to three distinct clades for several components, rather than forming a single monophyletic group, which indicates the coexistence of closely related MDV sequence variants consistent with a quasispecies-like population structure [
23].
Importantly, different MDV segments and variants were not recovered uniformly across library types. Some components, including particular DNA-M, DNA-S and DNA-U2 variants, were assembled only from rRNA-depleted total RNA libraries, whereas other segments and variants (for example, specific DNA-R and DNA-U4 lineages) were obtained exclusively or predominantly from poly(A)-selected mRNA-seq, despite originating from the same plant. This segment- and variant-specific recovery pattern suggests that transcription, RNA processing or coverage of individual MDV components differs between library preparations, so that reliance on a single method could lead to incomplete or biased reconstruction of the MDV genome set and underestimation of its within-host genetic diversity. Consequently, the MDV data underscore that total RNA-seq and mRNA-seq are complementary for multipartite DNA viruses and that combining both library types is crucial to capture the full repertoire of genomic segments and quasispecies variants present in complex field infections. In this dataset, total RNA-seq libraries recovered greater quasispecies diversity and revealed segment-specific evolutionary patterns, whereas mRNA-seq libraries alone produced more uniform strain complexes that underestimated within-host variation.
The comparative phylogenetic analyses also show how library choice affects evolutionary interpretation. For several viruses, including BBWV2 and MDV, multiple sequence variants were detected in the same field sample, and some variants were assembled only from total RNA libraries while others appeared only in mRNA-seq assemblies. When the analysis was restricted to complete or nearly complete ORF-spanning segments, however, all Jincheon BBWV2 and MDV sequences clustered into coherent, well-supported lineages within each species, indicating locally circulating strain complexes rather than recent introductions of highly divergent variants [
24,
25]. Similar patterns were seen for HPEV and PCV2, whose Jincheon isolates grouped with previously reported Asian or Korean strains [
8,
25]. These observations suggest that, once adequate coverage is achieved, both library types can provide genomes suitable for robust phylogenetic placement, but total RNA-seq gives a better chance of recovering the full range of variants present in mixed field infections.
In Hoengseong garlic, integrated analysis of virome composition and phylogeny revealed a virome dominated by typical GarVA–GarVD and GarVJ-like allexiviruses and carlaviruses, complemented by lower-abundance RNA viruses. Importantly, full-genome reconstruction and Sanger confirmation of GarVJ, combined with pairwise identity values below the International Committee on Taxonomy of Viruses (ICTV) species thresholds for all major proteins, support its recognition as a novel Allexivirus species. This underscores the value of combining high-depth metatranscriptomics with targeted validation for taxonomic decisions and demonstrates that heavily infected vegetative organs remain a rich source of new virus diversity even in well-studied crops. The GarVJ case also shows how total RNA-seq enables species identification within complex viromes when distinct phylogenetic positioning, amino acid identities below 80% across major ORFs and Sanger validation are assessed together against ICTV species demarcation criteria. This integrated approach serves as a model for virome-based taxonomic discoveries.
Detection of PMMoV sequences in garlic highlights an important issue in virome studies: low-level contamination from environmental or laboratory sources can be sequenced alongside true host-associated viruses. We could not re-test these 2020 field samples. However, four lines of evidence support calling PMMoV, SMV, BPMV and PVX contaminants. First, none of these viruses is known to infect garlic naturally. Second, their sequences fall within non-garlic virus lineages rather than forming garlic-specific clades. Third, they show very low read counts and uneven coverage compared with authentic garlic viruses. Finally, PVX infectious clones were routinely handled in the same laboratory facility. Together, these observations strongly indicate contamination rather than true garlic infections. In many environmental surveys, PMMoV clusters with sequences from wastewater, surface waters or non-garlic plant hosts, forming lineages that track movement through food chains and sewage rather than adaptation to new plant hosts [26]. This example shows that virome data must be interpreted using both ecology and phylogeny, and that true infections must be carefully distinguished from contaminants before proposing new host–virus relationships.
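The four lines of evidence listed above can be expressed as a simple rule-based screen. The following is a hypothetical sketch, not the procedure used in this study: the class, the field names and the `min_lines=3` cut-off are all assumptions chosen for illustration, and the example values only paraphrase what the text states (laboratory handling is stated for PVX only).

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One virus detection with the four contamination criteria as flags."""
    virus: str
    known_to_infect_host: bool
    clusters_with_non_host_lineages: bool
    sparse_uneven_coverage: bool
    handled_in_same_lab: bool


def contamination_evidence(det):
    """Count how many of the four lines of evidence point to contamination."""
    return sum([
        not det.known_to_infect_host,
        det.clusters_with_non_host_lineages,
        det.sparse_uneven_coverage,
        det.handled_in_same_lab,
    ])


def is_suspect(det, min_lines=3):
    """Flag a detection when at least `min_lines` criteria are met (assumed cut-off)."""
    return contamination_evidence(det) >= min_lines


# Illustrative entries paraphrasing the text.
pvx = Detection("PVX", False, True, True, True)
pmmov = Detection("PMMoV", False, True, True, False)
garvc = Detection("GarVC", True, False, False, False)

for det in (pvx, pmmov, garvc):
    status = "suspect contaminant" if is_suspect(det) else "retained"
    print(f"{det.virus}: {contamination_evidence(det)}/4 lines of evidence -> {status}")
```

A flag of this kind is only a triage step; as the text notes, re-testing the original material is the decisive check when samples are available.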
This study has a key experimental design limitation: only one plant per location was analyzed. Pepper samples consisted of pooled leaves from single plants at Anseong and Jincheon, while the garlic virome was derived from a single bulb from one Hoengseong plant. This n = 1 biological replication represents a weakness that restricts the generalizability of our findings, particularly regarding species prevalence and mixed-infection patterns. The well-known plant-to-plant heterogeneity of garlic virus complexes means the allexivirus assemblage detected here may not represent the broader Hoengseong population. Similarly, MDV quasispecies and co-infection patterns in Jincheon pepper reflect single-plant characteristics rather than site-wide prevalence. This design prioritized paired library method comparisons using representative field samples. Comprehensive pepper virome patterns across Korean fields were established in our prior work analyzing 15 Capsicum annuum cultivars [8].
We recommend five strategies for improved plant virome studies. First, prioritize rRNA-depleted total RNA-seq for comprehensive virus detection. Second, employ dual-library approaches when studying mixed infections or multipartite viruses. Third, quantify rRNA depletion efficiency for each sample. Fourth, implement bioinformatics filtering to remove laboratory contaminants. Fifth, confirm low-titer virus detections with targeted validation. These approaches will reduce methodological bias and improve biological interpretation of field viromes.
Overall, this study shows that rRNA-depleted total RNA-seq gives a broader picture of plant viromes, especially for non-polyadenylated RNA viruses and multipartite DNA viruses. In contrast, mRNA-seq remains useful for profiling host gene expression and for detecting highly expressed polyadenylated viruses. When resources permit, combining both library types is likely the best approach, but total RNA-seq should be prioritized when the main aim is exhaustive virus discovery and genome-level characterization.