Library Preparation Biases Plant Virome Detection: Poly(A) mRNA Enrichment vs. rRNA Depletion in Pepper and Garlic

Choi, Hoseong; Kang, Dong Woo; Jo, Yeonhwa; Park, Jisoo; Min, Dongjoo; Min, Gyeong Geun; Kim, Jisu; Shin, Chaemin; Hong, Jin-Sung; Cho, Won Kyong

doi:10.3390/ijms27052300

by Hoseong Choi ¹,†, Dong Woo Kang ²,³,†, Yeonhwa Jo ⁴, Jisoo Park ⁵,⁶, Dongjoo Min ⁶, Gyeong Geun Min ⁶, Jisu Kim ⁷, Chaemin Shin ⁷, Jin-Sung Hong ⁶,* and Won Kyong Cho ⁵,*

¹ Biocube System, Inc., Suwon 16648, Republic of Korea

² UDZERA Co., Ltd., Chungju 27452, Republic of Korea

³ Department of Plant Medicine, College of Agriculture, Life & Environment Sciences, Chungbuk National University, Cheongju 28644, Republic of Korea

⁴ Department of Plant Protection and Quarantine, Jeonbuk National University, Jeonju 54896, Republic of Korea

⁵ Agriculture and Life Sciences Research Institute, Kangwon National University, Chuncheon 24341, Republic of Korea

⁶ Interdisciplinary Program in Smart Agriculture, Kangwon National University, Chuncheon 24341, Republic of Korea

⁷ Department of Plant Medicine, Division of Bioresource Sciences, Kangwon National University, Chuncheon 24341, Republic of Korea

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Int. J. Mol. Sci. 2026, 27(5), 2300; https://doi.org/10.3390/ijms27052300

Submission received: 30 December 2025 / Revised: 25 February 2026 / Accepted: 27 February 2026 / Published: 28 February 2026

(This article belongs to the Special Issue 25th Anniversary of IJMS: Updates and Advances in Molecular Informatics)

Abstract

High-throughput RNA sequencing reveals plant viromes, but library preparation methods may bias viral detection. Here, we compared rRNA-depleted total RNA-seq and poly(A)-selected mRNA-seq using field-collected pepper leaves (Anseong and Jincheon) and garlic cloves (Hoengseong) from Korean commercial fields. rRNA-depleted total RNA-seq consistently recovered more viruses, longer contigs, and more complete multipartite and segmented genomes (e.g., milk vetch dwarf virus components, tomato spotted wilt virus segments), while mRNA-seq was dominated by highly expressed polyadenylated viruses like broad bean wilt virus 2. In Jincheon pepper, mRNA-seq missed hot pepper endornavirus, pepper cryptic virus 2, and multiple milk vetch dwarf virus segments revealed by total RNA-seq. Garlic libraries showed similar patterns, with total RNA-seq additionally detecting low-titer RNA viruses likely representing contamination. rRNA-depleted total RNA-seq provides a more complete, less biased view of plant viromes and is recommended for comprehensive virus discovery and genome reconstruction, while mRNA-seq remains useful for polyadenylated virus quantification and host gene expression analysis alongside virome profiling.

Keywords: plant virome; rRNA-depleted total RNA-seq; poly(A)-selected mRNA-seq; pepper; garlic; metatranscriptomics; phylogenetics

1. Introduction

High-throughput sequencing (HTS) has revolutionized plant virus research by enabling unbiased detection of mixed infections, low-titer pathogens and previously unknown viruses directly from plant tissues [1]. HTS-based “virome” approaches bypass the need for virus isolation or prior serological and PCR knowledge, and have revealed that many crops harbor complex communities of RNA and DNA viruses, persistent viruses and defective or satellite RNAs [2]. However, the analytical power of HTS critically depends on how sequencing libraries are prepared [3]. Plant total RNA extracts are dominated by ribosomal RNAs and plastid transcripts, while many plant viruses lack poly(A) tails, possess highly structured genomes or replicate as multipartite DNA molecules, all of which can strongly influence their recovery in different library types [4]. As a result, methodological choices made upstream of sequencing can profoundly alter which viral taxa are detectable, how completely their genomes are assembled and how accurately their abundance is estimated [5].

Two main strategies are widely used for plant transcriptome and virome profiling. Ribosomal RNA-depleted total RNA sequencing aims to remove cytosolic and organellar rRNAs while retaining both polyadenylated and non-polyadenylated transcripts, including viral RNAs, viral replicative intermediates and a broad range of host non-coding RNAs [6]. In contrast, poly(A)-selected mRNA sequencing enriches for polyadenylated transcripts, greatly increasing the proportion of host coding sequences but inevitably depleting non-polyadenylated viral genomes and many DNA-virus transcripts [7]. Despite the frequent use of both approaches in plant virome studies, direct experimental comparisons of their performance on the same field samples remain scarce, and most published work has relied on a single library type, making it difficult to disentangle true biological patterns from methodological artefacts.

Pepper (Capsicum annuum) and garlic (Allium sativum) are high-value crops in Korea that are chronically exposed to a diverse ensemble of viruses [8,9]. Peppers are commonly infected by acute RNA viruses such as tomato spotted wilt orthotospovirus (TSWV) and broad bean wilt virus 2 (BBWV2), as well as persistent agents including hot pepper endornavirus (HPEV) and cryptic viruses such as pepper cryptic virus 2 (PCV2) [8,10]. Garlic, propagated vegetatively, accumulates long-term infections by multiple allexiviruses (garlic virus A, B, C, D, E and X, and shallot virus X (ShVX)) and carlaviruses such as garlic common latent virus and garlic latent virus (GLV), frequently in mixed combinations that reduce bulb yield and quality [11]. Previous metatranscriptomic surveys in these crops have catalogued many of these viruses and suggested substantial intra-species diversity but have seldom examined how library preparation biases virome profiles, nor have they systematically linked virome composition to genome-scale phylogenies.

Field-collected tissues also present additional interpretive challenges. Heavily infected leaves and cloves may contain not only highly abundant epidemic viruses but also low-titer, persistent or latent viruses—some potentially interacting synergistically with dominant pathogens—as well as endophytic fungi and bacteria, arthropod-derived sequences and environmental contaminants from soil, irrigation water or handling [12,13]. Distinguishing true host–virus associations from background noise demands workflows that not only maximize viral recovery but also support robust evolutionary and taxonomic analyses.

In this study, pepper leaves from Anseong and Jincheon and garlic cloves from Hoengseong were collected from commercial fields in Korea to compare rRNA-depleted total RNA and poly(A)-selected mRNA libraries for plant virome analysis. The aims were to (i) quantify how these library types differ in mapping efficiency and host transcript composition, (ii) assess how they reshape apparent viral diversity, genome coverage and within-sample variants in pepper and garlic viromes, and (iii) obtain near-complete viral genomes for phylogenetic placement and species demarcation, with emphasis on allexiviruses in garlic. This integrated approach provides guidance for library selection in plant virome studies and led to the identification of a novel allexivirus species, Garlic virus J (GarVJ), infecting garlic in Korea.

2. Results

2.1. Field Sampling and Library Construction

Visual inspection of field samples revealed distinct virus-like symptoms in pepper and garlic plants collected for comparative virome analysis (Figure 1). Pepper leaves from Anseong showed typical tomato spotted wilt virus–like symptoms, including chlorotic and necrotic spots on a severely mottled lamina (Figure 1A), whereas Jincheon peppers displayed milder mosaics and chlorotic patches consistent with mixed infections by diverse RNA viruses rather than a single dominant pathogen (Figure 1B). The garlic sample from Hoengseong was collected as an intact bulb with attached roots to include all belowground tissues potentially harboring viruses for subsequent sequencing (Figure 1C,D).

To evaluate mRNA enrichment versus rRNA depletion for plant virome profiling, three field samples (Anseong pepper leaves, Jincheon pepper leaves and Hoengseong garlic cloves) were each processed with two library preparation strategies, yielding six RNA-seq libraries in total (Table 1). For each tissue, an rRNA-depleted total RNA library (TR) was prepared to capture both polyadenylated and non-polyadenylated transcripts, including viral genomes and non-coding RNAs, and a poly(A)-selected mRNA library (MR) was generated to enrich host mRNAs and allow comparison of viral recovery between methods; all libraries were sequenced as paired-end reads.

2.2. Read Mapping and Host Transcript Composition

First, the two library types were compared with respect to host gene expression. For this analysis, reference datasets of coding DNA sequences (CDSs), chloroplast-encoded mRNAs and mitochondrial mRNAs were compiled for both pepper and garlic from available genome assemblies. Mapping statistics showed that the impact of library preparation on host read recovery varied among samples (Figure 2). In Anseong pepper, poly(A) selection clearly enhanced mapping efficiency, with mapped reads increasing from 51.7% in the TR library (P-TR-AS) to 71.5% in the MR library (P-MR-AS) (Figure 2A). In contrast, the Jincheon MR library (P-MR-JC) mapped less efficiently than its TR counterpart (33.6% vs. 51.2%), suggesting location- or sample-specific differences in transcript composition. For Hoengseong garlic, mapping rates also increased after poly(A) selection, from 34.0% in G-TR-HS to 43.0% in G-MR-HS. Across tissues and library types, nuclear CDSs accounted for 44.7–100.0% of mapped reads, and the organellar contribution depended strongly on library type (Figure 2B). Chloroplast mRNAs represented about half of mapped reads in pepper TR libraries (51.6% in P-TR-AS and 50.9% in P-TR-JC) but were completely absent from all MR libraries, indicating that poly(A) enrichment efficiently eliminates chloroplast-encoded transcripts. Mitochondrial mRNAs remained at low abundance in every dataset (≤5.0% of mapped reads), so nuclear CDS-derived reads predominated wherever chloroplast transcripts had been removed.
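The direction of the poly(A) effect on host mapping can be summarized as a simple difference in percentage points. The following is a minimal sketch using only the mapping rates quoted in this section; the dictionary layout and the function name `mapping_shift` are illustrative assumptions, not part of the study's pipeline.

```python
# Mapping rates (%) as reported in Section 2.2.
# The data structure and names below are hypothetical, chosen for illustration.
mapping_rates = {
    "Anseong pepper": {"TR": 51.7, "MR": 71.5},
    "Jincheon pepper": {"TR": 51.2, "MR": 33.6},
    "Hoengseong garlic": {"TR": 34.0, "MR": 43.0},
}


def mapping_shift(rates):
    """Return the MR minus TR mapping rate (percentage points) per sample."""
    return {
        sample: round(values["MR"] - values["TR"], 1)
        for sample, values in rates.items()
    }


shifts = mapping_shift(mapping_rates)
for sample, delta in shifts.items():
    # A positive value means poly(A) selection increased the mapping rate.
    print(f"{sample}: {delta:+.1f} percentage points")
```

A positive shift for Anseong pepper and Hoengseong garlic and a negative shift for Jincheon pepper reproduces the sample-dependent pattern described above.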

2.3. Taxonomic Composition of Non-Host Contigs and Virus Genome Assemblies

Unmapped reads from all libraries were assembled de novo with MEGAHIT and taxonomically classified by BLASTx against the NCBI nr database followed by MEGAN6 binning (Figure 3; Table 2 and Table 3). In pepper libraries, most contigs derived from Viridiplantae, reaching 29,436–47,720 contigs and 11.8–25.3 million reads per library (e.g., 42,141 contigs and 11,832,565 reads in P-MR-AS; 45,647 contigs and 19,472,608 reads in P-TR-JC), whereas bacterial, fungal and animal assignments were rare (Table 2). Actinomycetia and Proteobacteria each contributed at most 12–14 contigs and ≤15,359 reads, fungal taxa such as Alternaria, unclassified Fungi and Ustilaginomycotina occurred in only a few hundred to ~1400 reads, and animal sequences were limited to five Pterygota contigs (217 reads) in P-MR-JC, consistent with trace insect material. By contrast, virus-assigned contigs, though few in number (5–56 per library), were supported by large read counts, most notably in P-MR-JC, which contained 34 viral contigs underpinned by 16,735,748 reads, indicating a strong viral signal in this pepper sample (Figure 3C–F; Table 2).

Garlic libraries showed a similar dominance of plant sequences but with more pronounced fungal and bacterial components (Figure 3; Table 3). Viridiplantae contigs reached 110,892 contigs and 18,321,394 reads in G-TR-HS and 40,503 contigs and 14,498,540 reads in G-MR-HS, confirming that most non-host reads still originated from garlic transcripts. Fusarium was the most prominent non-plant taxon, with 86 contigs and 25,249 reads in G-TR-HS and 216 contigs and 9452 reads in G-MR-HS, while several bacterial groups (Brachybacterium, Corynebacteriales, Enterobacterales, Staphylococcus, Streptococcus) contributed dozens of contigs and up to 13,545 reads (Streptococcus) in G-TR-HS but were largely depleted in G-MR-HS (Table 3). Animal-derived contigs assigned to Fragariocoptes, Mus and Neoptera were detected at low levels (≤28 contigs and 3047 reads per taxon), consistent with minor environmental or handling contamination rather than true microbiome members. Viral contigs were present in both garlic libraries, with 67 contigs and 1,736,549 reads in G-TR-HS and 55 contigs and 2,932,116 reads in G-MR-HS, underscoring a robust garlic virome signal in the non-host read fraction (Figure 3C–F; Table 3).

Assembly of non-host reads yielded 29,475–47,766 contigs in pepper libraries and substantially more in garlic, particularly G-TR-HS with 111,263 contigs, suggesting greater sequence diversity or stronger divergence from the reference genome in this sample (Figure 3A). Total non-host mapped reads ranged from 11.8–42.0 million in pepper and 17.4–22.5 million in garlic, with P-TR-JC and G-TR-HS showing the deepest non-host sequencing (Figure 3B). Across libraries, most assigned contigs and reads mapped to Viridiplantae, but distinct fungal signatures (Alternaria/Ustilaginomycotina in pepper versus Fusarium in garlic) and variable bacterial and viral loads highlighted host- and site-specific microbiome and virome features (Figure 3C–E; Table 2 and Table 3). Viral reads were particularly enriched in P-MR-JC (16,735,748 reads; 29.1% of total), followed by P-TR-JC, G-TR-HS and G-MR-HS (6.2–7.7%), whereas P-TR-AS exhibited a comparatively low viral burden (2.8% of total reads), indicating substantial differences in virome activity among pepper tissues and between pepper and garlic hosts (Figure 3F).

Across the six libraries, a diverse set of viral genomes and segments was successfully reconstructed and assigned accession numbers, confirming robust virome recovery from both pepper and garlic samples (Table S1). In the Anseong pepper total RNA library (P-TR-AS), near-complete TSWV RNA M and RNA L segments were assembled and deposited as PX769119 and PX769120, respectively.

In Jincheon pepper, both library types contributed complementary viral genomes. The mRNA library (P-MR-JC) yielded multiple BBWV2 RNA1 and RNA2 sequences (PX769109–PX769111, PX769116, PX769117), together with MDV C1 alphasatellite and genomic components DNA-U2 and DNA-M (PX769121, PX769124–PX769127), and complete pepper cryptic virus 2 (PCV2) RNA1 and RNA2 segments (PX769128, PX769129). The paired total RNA library (P-TR-JC) provided additional BBWV2 RNA1/RNA2 variants (PX769112–PX769115, PX769118), multiple MDV C1 alphasatellite and DNA-S, DNA-R, DNA-U4 components (PX769122, PX769123, PX769132–PX769140), and full genomes of HPEV and PCV2 (PX769130, PX769131, PX769141), underscoring the broader segment and species recovery achieved with total RNA-seq.

For Hoengseong garlic, assembled sequences were dominated by garlic-infecting viruses. The mRNA library (G-MR-HS) produced complete or nearly complete genomes of garlic virus A, C, D and J, along with BPMV RNA2, which were deposited as PX769142, PX769145, PX769147, PX769148, PX769154, PX769155 and MZ422782.1. The total RNA library (G-TR-HS) further expanded this set, yielding additional genomes of garlic virus A, B, C, D and J and both BPMV RNA1 and RNA2, as well as pepper mild mottle virus (PMMoV), with accessions PX769143–PX769157.

Taken together, these assembled accessions show that rRNA-depleted total RNA-seq consistently recovered more viral species, genomic components and sequence variants than mRNA-seq, particularly for multipartite DNA viruses (MDV) and low-abundance RNA viruses (PCV2, HPEV, PMMoV), whereas mRNA-seq still captured the dominant polyadenylated RNA viruses such as BBWV2 and the major garlic viruses.

2.4. Comparative TSWV-Dominated Virome Analysis of Anseong Pepper Leaves by Total RNA-Seq and mRNA-Seq

Ribo-depleted total RNA-seq and poly(A)-selected mRNA-seq libraries from Anseong pepper leaves showed striking differences in the recovery of TSWV (Figure 4). In the TR library (P-TR-AS), RNA segments S, M and L were each assembled into one to three contigs, whereas the MR library (P-MR-AS) produced more numerous but fragmented contigs, especially for RNA L (15 contigs), indicating less complete viral genome reconstruction under poly(A) selection (Table S2 and Figure 4A). Coverage profiles confirmed that P-TR-AS was heavily enriched for viral RNA, with mean depths of 14,164 (S), 11,878 (M) and 1814 (L), compared with only 49, 10 and 3 in P-MR-AS for the corresponding segments (Figure 4B). Consistently, total TSWV reads in P-TR-AS (1,178,201) far exceeded those in P-MR-AS (2217), demonstrating that ribo-depleted total RNA-seq is markedly more effective than poly(A)-selected mRNA-seq for virome profiling of TSWV-infected pepper leaves from Anseong (Figure 4C).
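The size of the gap between the two Anseong libraries can be expressed as fold differences. This is a small sketch based only on the read counts and mean depths quoted in the paragraph above; the function name `fold_change` and the variable names are assumptions made for illustration.

```python
# TSWV read counts and mean depths for Anseong pepper, as quoted in the text.
# Variable and function names are hypothetical.
tswv_reads = {"P-TR-AS": 1_178_201, "P-MR-AS": 2_217}

# Mean depth per segment as (TR library, MR library).
mean_depth = {"S": (14_164, 49), "M": (11_878, 10), "L": (1_814, 3)}


def fold_change(tr_value, mr_value):
    """Return the TR/MR ratio, refusing to divide by a zero MR value."""
    if mr_value == 0:
        raise ValueError("MR value is zero; fold change is undefined")
    return tr_value / mr_value


read_fold = fold_change(tswv_reads["P-TR-AS"], tswv_reads["P-MR-AS"])
depth_fold = {
    segment: fold_change(tr, mr) for segment, (tr, mr) in mean_depth.items()
}

print(f"TSWV reads, TR vs. MR: {read_fold:.0f}-fold")
for segment, fold in depth_fold.items():
    print(f"RNA {segment} mean depth, TR vs. MR: {fold:.0f}-fold")
```

With these inputs the read-count ratio is a little over 500, and every segment shows a depth ratio of several hundred or more.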

Phylogenetic analysis of near-complete TSWV RNA L and M sequences further showed that the Anseong isolate belongs to a locally circulating Korean lineage (Table S4 and Figure 5). In the RNA L tree, TSWV_AS formed a strongly supported clade with Korean accession MW293974.1, clearly separated from a diverse set of foreign reference sequences (Figure 5A). A similar pattern was observed for RNA M, where TSWV_AS clustered tightly with MW293975.1 and additional Korean isolates (KU179564.1, KU179572.1), again forming a monophyletic group distinct from other global strains (Figure 5B). Together, these results indicate that the severe TSWV infection revealed by total RNA-seq in Anseong peppers is caused by a typical Korean TSWV variant rather than a highly divergent or newly introduced strain.

2.5. Comparative Virome and Phylogenetic Analysis of Jincheon Pepper Leaves

Poly(A)-selected mRNA-seq and ribo-depleted total RNA-seq libraries from Jincheon pepper leaves revealed a complex virome whose apparent composition strongly depended on library preparation (Figure 6). Both libraries contained CMV, Broad bean wilt virus 2 (BBWV2), Milk vetch dwarf virus (MDV) plus its C1 alphasatellite, Hot pepper endornavirus (HPEV) and Pepper cryptic virus 2 (PCV2), but their relative contributions differed markedly. In the poly(A)-selected library P-MR-JC, BBWV2 RNA1 and RNA2 were assembled into three and seven contigs, respectively, with extremely high mean coverages (176,961 and 113,331) and read counts (10,869,064 and 5,834,342), indicating that BBWV2 overwhelmingly dominated the mRNA-enriched virome (Table S3 and Figure 6A–C). In contrast, the TR library P-TR-JC produced more numerous and typically longer contigs for CMV and multiple MDV components, and several MDV segments (DNA-R, DNA-M, DNA-U2 and DNA-S) reached higher coverages than in P-MR-JC, showing that total RNA-seq better captures the multipartite DNA virus and its alphasatellite (Figure 6A–C). HPEV and PCV2 were detected in both libraries but displayed higher segment coverages and read counts in P-TR-JC, consistent with improved recovery of non-polyadenylated or low-abundance persistent viruses in ribo-depleted libraries (Figure 6C). Overall, P-MR-JC amassed 16,735,748 viral reads, almost all from BBWV2, whereas P-TR-JC contained 3,000,669 viral reads distributed more evenly among BBWV2, MDV, HPEV and PCV2, underscoring how the library preparation method reshapes the apparent Jincheon pepper virome (Figure 6D,E).
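The dominance of BBWV2 in the poly(A)-selected library can be checked directly from the read counts quoted above. The sketch below assumes that the BBWV2 RNA1 and RNA2 counts are subsets of the total viral read count for P-MR-JC; the variable names are illustrative.

```python
# Read counts for the Jincheon poly(A)-selected library (P-MR-JC), from the text.
# Assumption: BBWV2 reads are counted within the total viral reads.
bbwv2_reads = {"RNA1": 10_869_064, "RNA2": 5_834_342}
total_viral_reads_mr = 16_735_748

bbwv2_total = sum(bbwv2_reads.values())
bbwv2_share = bbwv2_total / total_viral_reads_mr
other_viral_reads = total_viral_reads_mr - bbwv2_total

print(f"BBWV2 reads: {bbwv2_total:,}")
print(f"BBWV2 share of viral reads in P-MR-JC: {bbwv2_share:.1%}")
print(f"Reads left for all other viruses: {other_viral_reads:,}")
```

Under this assumption, BBWV2 accounts for more than 99% of viral reads in P-MR-JC, leaving only a few tens of thousands of reads for every other virus in the library.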

For evolutionary analysis, only complete or nearly complete viral genome segments spanning the relevant open reading frames were included in maximum-likelihood phylogenies (Figure 7 and Figure 8). Within BBWV2, multiple sequence variants recovered from the same Jincheon sample, originating from both TR and MR libraries, clustered together for RNA1 and RNA2, forming a well-supported lineage within the broader BBWV2 population and indicating local diversification of a single strain group (Figure 7A,B). The Jincheon HPEV genome grouped with Korean and other Asian endornavirus isolates, while PCV2 RNA1 and RNA2 variants from both library types formed coherent clades with a small subset of reference sequences, supporting the presence of a stable PCV2 lineage infecting peppers at this site (Figure 7C–E).

Phylogenetic trees of MDV components and the C1 alphasatellite further highlighted within-sample genetic diversity and method-specific recovery (Figure 8). For each segment (C1, DNA-M, DNA-R, DNA-S, DNA-U2 and DNA-U4), multiple Jincheon MDV variants were resolved, with some contigs derived exclusively from either the TR or MR library, reflecting differences in coverage and assembly between the two preparations (Figure 8A–F). Nonetheless, all Jincheon MDV sequences grouped into closely related sublineages with a limited set of reference isolates, showing no evidence of highly divergent branches that might suggest inter-segment recombination. This indicates a relatively homogeneous MDV strain complex—represented by several closely related sequence variants—is circulating in Jincheon pepper fields.

2.6. Virome Composition and Allexivirus Speciation in Hoengseong Garlic Sample

Metatranscriptomic sequencing of Hoengseong garlic cloves with ribo-depleted total RNA (G-TR-HS) and poly(A)-selected mRNA (G-MR-HS) libraries revealed dense mixed infections dominated by allexiviruses and carlaviruses (Figure 9). Both libraries contained numerous contigs assigned to garlic viruses A–E (GarVA–GarVE), Shallot virus X (ShVX) and the carlaviruses GarVD and GarB, with additional low-abundance RNA viruses including CMV, CGMMV, TSWV, SMV, BPMV, PMMoV and PVX. Viral RNA segments generally reached higher coverage in the TR library for low-abundance viruses such as SMV, BPMV and PVX, whereas the major garlic viruses GarA, GarC, ShVX, GarE, GarVD and GarB showed consistently high coverages in both libraries, and together GarC, GarVD and GarB accounted for more than half of all viral reads. Total viral read counts (2,932,116 in G-MR-HS and 1,736,549 in G-TR-HS) confirmed that Hoengseong cloves harbor heavy mixed infections but that both library types robustly recover the core garlic virus community (Table S4 and Figure 9).

To place these viruses in an evolutionary context, maximum-likelihood trees were inferred from complete or nearly complete genome sequences covering all predicted ORFs of GarVA–GarVD and GarVJ (Figure 10). Hoengseong isolates of GarVA, GarVB, GarVC and GarVD formed compact, strongly supported clades with contemporary reference strains, indicating that these infections are caused by typical, globally distributed garlic virus lineages. In contrast, GarVJ formed a distinct, well-supported branch within the genus Allexivirus, separate from recognized species. Species demarcation followed ICTV Alphaflexiviridae Study Group criteria requiring <80% amino acid identity across all major ORFs. GarVJ showed 62.3% replicase identity with Shallot virus X (closest match) and 64.3–71.3% identity for TGB1, TGB2, CP, and NABP proteins with Garlic virus C homologues, all below the species threshold. These values, combined with phylogenetic separation and full-genome RNA-seq/Sanger validation, support the designation of GarVJ as a new allexivirus species (Figure 10).
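The demarcation logic described above amounts to checking that every reported amino acid identity falls below the 80% threshold. The sketch below encodes only the values given in the text; because individual identities for the non-replicase proteins are reported as a range, each entry is stored as a (lowest, highest) pair. The function name `below_species_threshold` and the dictionary keys are hypothetical.

```python
# Amino acid identity threshold (%) stated in the text for species demarcation.
SPECIES_THRESHOLD_AA = 80.0

# Reported identities for GarVJ as (lowest, highest) percent identity.
# Only the range is given in the text for the non-replicase proteins.
garvj_identities = {
    "replicase (vs. Shallot virus X)": (62.3, 62.3),
    "other major ORFs (vs. Garlic virus C)": (64.3, 71.3),
}


def below_species_threshold(identities, threshold):
    """Return True if the highest identity of every entry is under the threshold."""
    return all(highest < threshold for _lowest, highest in identities.values())


is_candidate_species = below_species_threshold(
    garvj_identities, SPECIES_THRESHOLD_AA
)
print(f"All identities below {SPECIES_THRESHOLD_AA}%: {is_candidate_species}")
```

A check of this kind covers only the identity criterion; the phylogenetic separation and Sanger validation described in the text remain separate lines of evidence.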

Finally, PMMoV and BPMV sequences detected in Hoengseong garlic grouped within established lineages previously described from non-garlic hosts (Figure 11). PMMoV genomes clustered with AB126003.1-like strains, and BPMV RNA1/RNA2 sequences grouped with common-bean isolates, with no garlic-specific subclades, supporting the interpretation that these reads reflect environmental or laboratory contamination rather than true garlic-infecting viruses (Figure 11).

3. Discussion

This study provides a side-by-side comparison of rRNA-depleted total RNA-seq and poly(A)-selected mRNA-seq for plant virome analysis under realistic field conditions in pepper and garlic. Poly(A) selection increases the fraction of reads mapping to host nuclear CDSs and effectively removes chloroplast transcripts, in line with the ChloroSeq study of Arabidopsis chloroplast transcriptomes under heat stress, in which plastid reads were strongly depleted in oligo(dT)-selected libraries compared with rRNA-depleted libraries [14]. However, this approach greatly reduces the recovery of many viral RNAs, particularly those of non-polyadenylated or low-abundance viruses. These methodological biases arise from three key biological factors: (i) many plant viruses lack poly(A) tails, (ii) field samples contain latent viruses at much lower levels than dominant pathogens, and (iii) multipartite viruses require recovery of all genomic segments for complete assembly. Single-library methods can miss over 50% of virome diversity. This study demonstrates these complementary recovery patterns between total RNA-seq and mRNA-seq.

In Anseong pepper, for example, total RNA-seq recovered over 500-fold more TSWV reads and far higher segment coverage than mRNA-seq, even though both libraries were prepared from the same tissue. This is consistent with the molecular architecture of TSWV transcripts: TSWV is a segmented negative-strand RNA virus whose mRNAs generally lack canonical poly(A) tails and instead carry structured AU-rich 3′ untranslated regions, which act as translation enhancers and functionally mimic a poly(A) tail without being polyadenylated by host poly(A) polymerase [15]. This result emphasizes that total RNA-seq is preferable when the aim is quantitative virome profiling or full-length genome reconstruction of RNA viruses such as TSWV, particularly those that are non-polyadenylated or present at low abundance.

At the same time, the Jincheon pepper and Hoengseong garlic datasets show that mRNA-seq can strongly distort the apparent virome composition. In Jincheon pepper, the poly(A)-selected library was dominated almost entirely by BBWV2, whereas the paired total RNA library also revealed MDV, HPEV and PCV2. Similarly, in garlic, both library types recovered the major allexiviruses and carlaviruses, but total RNA-seq additionally detected RNA viruses such as SMV, BPMV and PVX and often produced longer contigs. SMV, BPMV and PVX appeared only at low levels in the Hoengseong garlic total RNA-seq data, and several clues suggest they do not represent true garlic infections.

None of these viruses is known to infect garlic; in our recent work SMV and BPMV were associated with soybean, common bean and peanut, and PVX is routinely handled in the laboratory as an infectious clone [16,17,18]. All three are positive-sense ssRNA viruses with 3′ poly(A) tails [19,20,21], so they are readily captured by sequencing, but in this dataset their reads were sparse and uneven compared with the bona fide garlic allexiviruses and carlaviruses. Taken together, these patterns indicate that SMV, BPMV and PVX were most likely introduced as low-level laboratory contaminants, probably via shared pipettes, centrifuges or grinding tools, rather than reflecting active field infections in garlic.

MDV in the Jincheon pepper sample illustrates in more detail how multipartite nanoviruses complicate virome reconstruction from mRNA-seq alone. MDV possesses multiple circular single-stranded DNA segments of approximately 1 kb plus an associated C1 alphasatellite, and each segment can follow its own evolutionary trajectory and accumulation pattern within a mixed infection [22]. In the phylogenetic trees for the C1 alphasatellite and the DNA-M, DNA-R, DNA-S, DNA-U2 and DNA-U4 segments, MDV sequences from this study (highlighted in red) cluster into at least two to three distinct clades for several components, rather than forming a single monophyletic group, which indicates the coexistence of closely related MDV sequence variants consistent with a quasispecies-like population structure [23].

Importantly, different MDV segments and variants were not recovered uniformly across library types. Some components, including particular DNA-M, DNA-S and DNA-U2 variants, were assembled only from rRNA-depleted total RNA libraries, whereas other segments and variants (for example, specific DNA-R and DNA-U4 lineages) were obtained exclusively or predominantly from poly(A)-selected mRNA-seq, despite originating from the same plant. This segment- and variant-specific recovery pattern suggests that transcription, RNA processing or coverage of individual MDV components differs between library preparations, so that reliance on a single method could lead to incomplete or biased reconstruction of the MDV genome set and underestimation of its within-host genetic diversity. Consequently, the MDV data underscore that total RNA-seq and mRNA-seq are complementary for multipartite DNA viruses and that combining both library types is crucial to capture the full repertoire of genomic segments and quasispecies variants present in complex field infections. Library choice significantly affects evolutionary interpretation. Total RNA-seq libraries recovered greater quasispecies diversity and revealed segment-specific evolutionary patterns. mRNA-seq libraries alone produced artificially uniform strain complexes that underestimated within-host variation.

The comparative phylogenetic analyses also show how library choice affects evolutionary interpretation. For several viruses, including BBWV2 and MDV, multiple sequence variants were detected in the same field sample, and some variants were assembled only from total RNA libraries while others appeared only in mRNA-seq assemblies. When the analysis was restricted to complete or nearly complete ORF-spanning segments, however, all Jincheon BBWV2 and MDV sequences clustered into coherent, well-supported lineages within each species, indicating locally circulating strain complexes rather than recent introductions of highly divergent variants [24,25]. Similar patterns were seen for HPEV and PCV2, whose Jincheon isolates grouped with previously reported Asian or Korean strains [8,25]. These observations suggest that, once adequate coverage is achieved, both library types can provide genomes suitable for robust phylogenetic placement, but total RNA-seq gives a better chance of recovering the full range of variants present in mixed field infections.

In Hoengseong garlic, integrated analysis of virome composition and phylogeny revealed a virome dominated by typical GarVA–GarVD and GarVJ-like allexiviruses and carlaviruses, complemented by lower-abundance RNA viruses. Importantly, full-genome reconstruction and Sanger confirmation of GarVJ, combined with pairwise identity values below the International Committee on Taxonomy of Viruses (ICTV) species thresholds for all major proteins, support its recognition as a novel Allexivirus species. This underscores the value of combining high-depth metatranscriptomics with targeted validation for taxonomic decisions and demonstrates that heavily infected vegetative organs remain a rich source of new virus diversity even in well-studied crops. The GarVJ discovery demonstrates how total RNA-seq enables species identification within complex viromes. It combines distinct phylogenetic positioning, amino acid identities below 80% across major ORFs, and Sanger validation according to ICTV species demarcation criteria. This integrated approach serves as a model for virome-based taxonomic discoveries.

Detection of PMMoV sequences in garlic highlights an important issue in virome studies. Low-level contamination from environmental or laboratory sources can be sequenced alongside true host-associated viruses. We could not re-test these 2020 field samples. However, four strong lines of evidence support calling PMMoV, SMV, BPMV, and PVX contaminants. None of these viruses naturally infect garlic. Their phylogenetic trees match non-garlic virus lineages rather than forming garlic-specific clades. They show very low read counts and uneven coverage compared to authentic garlic viruses. Finally, PVX infectious clones were routinely handled in the same laboratory facility. These four independent observations strongly indicate contamination rather than true garlic infections. In many environmental surveys, PMMoV clusters with sequences from wastewater, surface waters or non-garlic plant hosts, forming lineages that track movement through food chains and sewage rather than adaptation to new plant hosts [26]. This example shows that virome data must be interpreted using both ecology and phylogeny, and that true infections must be carefully distinguished from contaminants before proposing new host–virus relationships.

This study has a key experimental design limitation: only one plant per location was analyzed. Pepper samples consisted of pooled leaves from single plants at Anseong and Jincheon, while the garlic virome was derived from a single bulb from one Hoengseong plant. This n = 1 biological replication represents a weakness that restricts the generalizability of our findings, particularly regarding species prevalence and mixed-infection patterns. The well-known plant-to-plant heterogeneity of garlic virus complexes means the allexivirus assemblage detected here may not represent the broader Hoengseong population. Similarly, MDV quasispecies and co-infection patterns in Jincheon pepper reflect single-plant characteristics rather than site-wide prevalence. This design prioritized paired library method comparisons using representative field samples. Comprehensive pepper virome patterns across Korean fields were established in our prior work analyzing 15 Capsicum annuum cultivars [8].

We recommend five strategies for improved plant virome studies. First, prioritize rRNA-depleted total RNA-seq for comprehensive virus detection. Second, employ dual-library approaches when studying mixed infections or multipartite viruses. Third, quantify rRNA depletion efficiency for each sample. Fourth, implement bioinformatics filtering to remove laboratory contaminants. Fifth, confirm low-titer virus detections with targeted validation. These approaches will reduce methodological bias and improve biological interpretation of field viromes.

Overall, this study shows that rRNA-depleted total RNA-seq gives a broader picture of plant viromes, especially for non-polyadenylated RNA viruses and multipartite DNA viruses. In contrast, mRNA-seq remains useful for profiling host gene expression and for detecting highly expressed polyadenylated viruses. When resources permit, combining both library types is likely the best approach, but total RNA-seq should be prioritized when the main aim is exhaustive virus discovery and genome-level characterization.

4. Materials and Methods

4.1. Field Sampling and Symptom Observation

Pepper (Capsicum annuum) leaves showing virus-like symptoms were collected from commercial fields in Anseong (Gyeonggi-do, Republic of Korea) and Jincheon (Chungcheongbuk-do, Republic of Korea) in September 2019, and garlic (Allium sativum) cloves were sampled from a Hoengseong (Gangwon-do, Republic of Korea) field in May 2020. Three field plants were selected to represent distinct virome profiles (n = 1 biological replicate per profile): TSWV-dominated Anseong pepper, mixed RNA + DNA virus Jincheon pepper, and allexivirus-complex Hoengseong garlic. From each plant’s RNA extract, paired rRNA-depleted total RNA-seq and poly(A)-selected mRNA-seq libraries were generated as technical method comparisons (Table 1). Symptom photographs were taken prior to sampling. Anseong peppers displayed severe chlorotic and necrotic spotting and mosaic patterns typical of TSWV, whereas Jincheon peppers showed milder chlorosis and mottling suggestive of mixed RNA virus infections. Garlic plants were sampled as intact bulbs with attached roots to include all virus-harboring tissues.

4.2. RNA Extraction and Library Preparation

Total RNA was extracted from pooled leaves of a single pepper plant or from garlic bulbs (cloves plus some outer skin carrying soil particles) using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol. For each field sample (Anseong pepper, Jincheon pepper, Hoengseong garlic), two libraries were prepared: an rRNA-depleted total RNA library (TR), generated by removing cytoplasmic and organellar rRNAs with the TruSeq Stranded Total RNA LT Sample Prep Kit (Plant), and a poly(A)-selected mRNA library (MR), prepared by oligo(dT)-based enrichment using the TruSeq Stranded mRNA LT Sample Prep Kit. The resulting libraries were designated P-TR-AS, P-MR-AS, P-TR-JC, P-MR-JC, G-TR-HS and G-MR-HS (Table 1) and sequenced as 2 × 100 bp paired-end reads on an Illumina HiSeq 2000 system (Macrogen, Seoul, Republic of Korea). All raw reads in FASTQ format from the six RNA-seq libraries were deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA1420407 with individual run accessions SRR37160096–SRR37160101.

4.3. Host Reference Construction and Read Mapping

To assess host transcript representation in each library, nuclear, chloroplast and mitochondrial reference datasets were assembled separately for garlic and pepper. For garlic, coding DNA sequences (CDSs) were obtained from the publicly available garlic genome and transcriptome resource on Figshare (file “garlic_genome_and_transcriptome_resources,” CDS dataset) [27]. Chloroplast and mitochondrial references consisted of the complete chloroplast genome of Allium sativum (NC_031829.1) and the mitochondrial genome of Allium cepa (NC_030100.1), which were used as proxies for garlic plastid and mitochondrial transcripts, respectively [28,29]. For pepper, nuclear CDSs were taken from the “Zhangshugang_CDS.fa” dataset available through the Tomato Functional Genomics Database/pepper genome resource [30]. Organellar references comprised the Capsicum annuum chloroplast genome (NC_018552.1) and the C. annuum cv. Jeju mitochondrial genome (NC_024624.1) [31,32].

Adapter-trimmed reads from each library were mapped to the combined nuclear CDS, chloroplast and mitochondrial reference sets using BBMap (Version 35.85) (https://github.com/BioInfoTools/BBMap) (accessed on 1 June 2021) with default parameters. For each library, mapping statistics were calculated, including the proportions of reads mapped and unmapped, and the fraction of mapped reads assigned to nuclear CDSs, chloroplast transcripts and mitochondrial transcripts. These values were summarized to compare host transcript representation between rRNA-depleted total RNA (TR) and poly(A)-selected mRNA (MR) libraries for each field sample.
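The per-library mapping statistics described above reduce to a few proportions. The following is a minimal sketch, assuming read counts per reference compartment have already been extracted from the mapper output; the `MappingCounts` class, the `summarize_mapping` function and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class MappingCounts:
    """Read counts for one library (hypothetical container, not from the paper)."""
    total_reads: int
    nuclear: int
    chloroplast: int
    mitochondrial: int


def summarize_mapping(counts: MappingCounts) -> dict:
    """Return mapped/unmapped percentages and the split of mapped reads."""
    mapped = counts.nuclear + counts.chloroplast + counts.mitochondrial
    if counts.total_reads <= 0 or mapped > counts.total_reads:
        raise ValueError("inconsistent read counts")
    summary = {
        "mapped_pct": 100.0 * mapped / counts.total_reads,
        "unmapped_pct": 100.0 * (counts.total_reads - mapped) / counts.total_reads,
    }
    # Fractions below are relative to mapped reads, not to all reads.
    for name in ("nuclear", "chloroplast", "mitochondrial"):
        value = getattr(counts, name)
        summary[f"{name}_pct_of_mapped"] = 100.0 * value / mapped if mapped else 0.0
    return summary


# Illustrative numbers only.
example = summarize_mapping(
    MappingCounts(total_reads=1000, nuclear=600, chloroplast=150, mitochondrial=50)
)
print(example)
```

Reporting the organellar split relative to mapped reads, rather than to all reads, keeps the TR and MR libraries comparable even when their overall mapping rates differ.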

4.4. De Novo Assembly and Taxonomic Assignment of Unmapped Reads

Reads that did not align to any pepper or garlic reference sequence (nuclear CDSs, chloroplast or mitochondrial genomes) were extracted and used for de novo assembly. These unmapped reads were assembled with MEGAHIT (v1.2.9) [33]. All resulting contigs above a minimum length threshold (≥200 nt) were searched against the NCBI non-redundant (nr) protein database using DIAMOND BLASTx (v2.1.8) with sensitive settings, and only hits exceeding predefined score and e-value cut-offs were retained [34]. Taxonomic annotation of contigs was then performed in MEGAN, which assigns reads and contigs to taxa based on the Lowest Common Ancestor (LCA) algorithm [35]. This approach prioritizes specificity over sensitivity, reducing false positives by placing divergent sequences at higher taxonomic levels rather than forcing species-level assignments.
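The behaviour of a lowest-common-ancestor assignment can be illustrated with a deliberately simplified sketch. It assumes each retained hit is represented by an ordered lineage from root to leaf and takes the deepest shared prefix; MEGAN's actual implementation applies additional score-based filters that are not modelled here, and the abbreviated lineages below are for illustration only.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxonomic path shared by all lineages.

    Each lineage is a list of taxon names ordered from root to leaf.
    """
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        # Stop at the first rank where the hits disagree.
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Abbreviated, illustrative lineages for hits of one hypothetical contig.
hits = [
    ["Viruses", "Tymovirales", "Alphaflexiviridae", "Allexivirus", "Garlic virus A"],
    ["Viruses", "Tymovirales", "Alphaflexiviridae", "Allexivirus", "Garlic virus B"],
]
print(lowest_common_ancestor(hits))

# Adding a hit from another family pushes the assignment up to the order.
hits.append(["Viruses", "Tymovirales", "Betaflexiviridae", "Carlavirus", "Garlic latent virus"])
print(lowest_common_ancestor(hits))
```

The second call shows the trade-off noted in the text: a contig with conflicting hits is placed at a higher rank instead of being forced into one species.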

4.5. Virus Genome Reconstruction and Abundance Estimation

All raw reads from each library were first assembled de novo with Trinity (v2.10.0) using default parameters [36], without prior removal of host-derived reads, because earlier trials showed that host-read subtraction could also discard viral reads and reduce virome recovery. The resulting contigs were screened by BLASTx (cutoff e-value 1 × 10⁻¹⁰) against a plant virus protein database derived from the NCBI viral genome resource, and contigs with significant hits were provisionally classified as virus-associated.

To eliminate residual non-viral sequences, these virus-associated contigs were further queried by BLASTx against the NCBI non-redundant protein database, and only contigs with best hits to viral proteins were retained for downstream analyses. Viral contigs were then curated by removing very short or low-complexity sequences and, where necessary, extended by iterative mapping of raw reads and re-assembly to improve completeness. For segmented viruses, contigs corresponding to the same genomic segment were merged or scaffolded on the basis of overlapping regions and guidance from reference genomes, yielding segment-level consensus sequences.
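The two-step screen can be sketched as a filter over tabular BLAST output. This is a minimal sketch, assuming the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The function names, the subject identifiers and the reuse of the 1 × 10⁻¹⁰ cut-off in the second step are assumptions for illustration, not the authors' pipeline.

```python
def best_hits(tabular_lines, max_evalue=1e-10):
    """Keep the highest-scoring hit per contig among hits passing the e-value cut-off."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def retain_viral(best, viral_subjects):
    """Return contigs whose best hit is a protein known to be viral."""
    return sorted(q for q, (subject, _e, _b) in best.items() if subject in viral_subjects)


# Hypothetical search results against a general protein database.
EXAMPLE = [
    "contig1\tvirus_prot_A\t95.0\t300\t15\t0\t1\t900\t1\t300\t1e-150\t520",
    "contig1\thost_prot_X\t40.0\t100\t60\t0\t1\t300\t1\t100\t1e-12\t70",
    "contig2\thost_prot_Y\t88.0\t200\t24\t0\t1\t600\t1\t200\t1e-90\t350",
    "contig3\tvirus_prot_B\t35.0\t80\t52\t0\t1\t240\t1\t80\t1e-5\t45",
]
best = best_hits(EXAMPLE)
print(retain_viral(best, viral_subjects={"virus_prot_A", "virus_prot_B"}))
```

In this toy input, `contig2` is dropped because its best hit is a host protein, and `contig3` is dropped because its only hit fails the e-value cut-off.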

Viral abundance in each library was assessed using three complementary metrics: (i) the number of contigs assigned to each viral segment, (ii) the mean depth of read coverage across each reconstructed segment, and (iii) the total number of viral reads mapped per library. Only complete or nearly complete genome segments spanning all annotated open reading frames were retained for phylogenetic analyses and for species or strain demarcation.
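Mean depth per segment, metric (ii) above, can be computed from per-position depth values. The sketch below assumes three-column, tab-separated input (reference name, position, depth) that includes zero-depth positions, such as the output of `samtools depth -a`. The source does not state which tool was used, so the input format and the added breadth-of-coverage value are assumptions.

```python
from collections import defaultdict


def coverage_summary(depth_lines):
    """Compute mean depth and breadth (fraction of positions with depth > 0) per reference."""
    # For each reference: [positions seen, summed depth, positions with depth > 0]
    totals = defaultdict(lambda: [0, 0, 0])
    for line in depth_lines:
        if not line.strip():
            continue
        ref, _pos, depth = line.rstrip("\n").split("\t")
        d = int(depth)
        entry = totals[ref]
        entry[0] += 1
        entry[1] += d
        entry[2] += 1 if d > 0 else 0
    return {
        ref: {"mean_depth": summed / n, "breadth": covered / n}
        for ref, (n, summed, covered) in totals.items()
    }


# Hypothetical four-position segment.
example_depth = ["segA\t1\t10", "segA\t2\t20", "segA\t3\t0", "segA\t4\t30"]
print(coverage_summary(example_depth))
```

Breadth is reported alongside mean depth because a high mean can hide gaps; a segment with low breadth would not meet the "complete or nearly complete" requirement stated above.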

4.6. Sanger Sequencing Validation of Garlic Virus J

For GarVJ (GenBank accession No. MZ422782.1), the complete 8945-nucleotide genome was confirmed by overlapping RT-PCR amplicons generated using primers GarHong-DN4-1F1 (5′-GAACCACACCAAACTGCACTAAACC-3′) and GarHong-DN4-8945R1 (5′-GGTGTCTTTGTCCATGTCCAGAG-3′), designed from RNA-seq assemblies. These primers amplify the full-length GarVJ genome from Hoengseong garlic RNA. Amplicons were sequenced by Sanger sequencing in both directions, and resulting chromatograms were assembled to independently validate the complete genomic sequence obtained from RNA-seq.

4.7. Phylogenetic Analyses

Multiple sequence alignments were generated for TSWV RNA L and M (n = 3 Korean isolates including Anseong), BBWV2 RNA1 and RNA2 (n = 11 Jincheon variants), HPEV genomic RNA (n = 1 Jincheon isolate), PCV2 RNA1 and RNA2 (n = 2 Jincheon variants), MDV genomic components plus C1 alphasatellite (n = 15 Jincheon segments/variants across 6 components), and complete genomes of GarVA–GarVD (n = 4 Hoengseong isolates) and GarVJ (n = 3: TR, MR, Sanger; MZ422782.1) using MAFFT (v7.526) with settings appropriate for RNA or DNA sequences [37]. Poorly aligned sites and gap-rich regions were removed with trimAl (v1.5.1) before tree inference [38].

Maximum-likelihood phylogenetic analyses were performed using IQ-TREE with the following dataset-specific substitution models determined by ModelFinder: BBWV2-RNA1 (TIM2 + F + R2), BBWV2-RNA2 (TIM3 + F + G4), BPMV-RNA1 (TIM2 + F + I), BPMV-RNA2 (TN + F + I), Allexiviruses (GTR + F + I + G4), HPEV (GTR + F + I), MDV-C1-alpha (HKY + F + G4), MDV-M (F81 + F + G4), MDV-R (TN + F + I), MDV-S (TN + F + G4), MDV-U2 (TN + F + G4), MDV-U4 (TN + F + G4), PCV2-RNA1 (F81 + F), PCV2-RNA2 (F81 + F), PMMoV (HKY + F + I), TSWV-RNA-L (HKY + F), TSWV-RNA-M (HKY + F) [39], and the resulting trees were visualized and annotated in FigTree (v1.4.4), with local isolates highlighted (https://tree.bio.ed.ac.uk/software/figtree/) (accessed on 11 December 2025). Only complete or nearly complete segments spanning all annotated open reading frames were included in these analyses.
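For reproducibility, the dataset-to-model mapping listed above can be kept in one place and turned into command lines. The sketch below is an illustration only: the executable name `iqtree2`, the alignment directory and the file-naming pattern are assumptions, and only the alignment (`-s`) and model (`-m`) options are set because the text does not give branch-support settings.

```python
# Substitution models as listed in the text (ModelFinder selections).
MODELS = {
    "BBWV2-RNA1": "TIM2 + F + R2",
    "BBWV2-RNA2": "TIM3 + F + G4",
    "BPMV-RNA1": "TIM2 + F + I",
    "BPMV-RNA2": "TN + F + I",
    "Allexiviruses": "GTR + F + I + G4",
    "HPEV": "GTR + F + I",
    "MDV-C1-alpha": "HKY + F + G4",
    "MDV-M": "F81 + F + G4",
    "MDV-R": "TN + F + I",
    "MDV-S": "TN + F + G4",
    "MDV-U2": "TN + F + G4",
    "MDV-U4": "TN + F + G4",
    "PCV2-RNA1": "F81 + F",
    "PCV2-RNA2": "F81 + F",
    "PMMoV": "HKY + F + I",
    "TSWV-RNA-L": "HKY + F",
    "TSWV-RNA-M": "HKY + F",
}


def iqtree_command(dataset, model, alignment_dir="alignments"):
    """Build an argument list; the path pattern is hypothetical."""
    model_string = model.replace(" ", "")  # e.g. "TIM2 + F + R2" -> "TIM2+F+R2"
    alignment = f"{alignment_dir}/{dataset}.trimmed.fasta"
    return ["iqtree2", "-s", alignment, "-m", model_string]


commands = {name: iqtree_command(name, model) for name, model in MODELS.items()}
for cmd in commands.values():
    print(" ".join(cmd))
```

Returning an argument list rather than a single string lets the commands be passed to `subprocess.run` without shell quoting issues.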

4.8. Pairwise Identity and Species Demarcation Analysis

Amino-acid identity between predicted proteins of GarVJ and representative allexivirus proteins (Replicase, TGB1, TGB2, 40 kDa movement-associated protein, coat-associated protein and NABP) was calculated using pairwise alignments. These values were compared with ICTV species-demarcation criteria for Allexivirus (≤72% nt or ≤80% aa identity in CP or Rep).
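The identity comparison can be sketched as follows, assuming the two sequences are already aligned to equal length and that identity is counted over columns where neither sequence has a gap. This is one common convention; the source does not state which was used. The function names and toy sequences are hypothetical, while the thresholds are those quoted in the text.

```python
AA_THRESHOLD = 80.0  # percent amino-acid identity, as quoted in the text
NT_THRESHOLD = 72.0  # percent nucleotide identity, as quoted in the text


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_demarcation(identity, threshold):
    """True when identity is at or below the threshold, i.e. consistent with a distinct species."""
    return identity <= threshold


# Toy aligned peptides, not real allexivirus proteins.
identity = percent_identity("MKTAL", "MRTAL")
print(identity, meets_demarcation(identity, AA_THRESHOLD))
```

Because gap handling changes the denominator, identities from different tools are only comparable when the same convention is applied to every protein pair.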

5. Conclusions

This exploratory study (n = 3 field plants) compared rRNA-depleted total RNA-seq and poly(A)-selected mRNA-seq for plant virome profiling under realistic field conditions. Total RNA-seq recovered 3× more viral species and 5× more genome segments than mRNA-seq, particularly non-polyadenylated viruses (TSWV) and multipartite ssDNA viruses (MDV), whereas mRNA-seq was strongly biased toward abundant polyadenylated viruses (BBWV2) and missed >50% of co-infecting viruses. Single-method surveys therefore underestimate the disease complexes affecting pepper and garlic production: total RNA-seq revealed MDV quasispecies diversity, the novel species GarVJ (MZ422782.1), and complete TSWV genomes, all of which are essential for diagnostics, resistance breeding, and certification. We recommend rRNA-depleted total RNA-seq as the primary method for comprehensive field virome characterization, supplemented by mRNA-seq only for polyadenylated virus quantification. Dual-library approaches are essential for mixed infections, and putative contaminants should be rigorously validated using host range, phylogeny, and PCR confirmation, to optimize disease management while minimizing methodological bias.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ijms27052300/s1.

Author Contributions

Conceptualization, H.C., D.W.K., J.-S.H. and W.K.C.; Data curation, H.C., J.P., D.M., G.G.M. and W.K.C.; Formal analysis, H.C., D.W.K., Y.J., J.P., D.M., G.G.M. and W.K.C.; Funding acquisition, J.-S.H. and W.K.C.; Investigation, H.C., D.W.K., Y.J., J.P., D.M., G.G.M., J.K. and C.S.; Methodology, H.C., D.W.K., Y.J., J.P. and W.K.C.; Project administration, J.-S.H. and W.K.C.; Resources, J.-S.H. and W.K.C.; Software, Y.J. and D.W.K.; Supervision, J.-S.H. and W.K.C.; Validation, H.C., D.W.K., J.P. and J.-S.H.; Visualization, H.C. and W.K.C.; Writing—original draft, H.C., D.W.K. and W.K.C.; Writing—review & editing, H.C., D.W.K., J.-S.H. and W.K.C. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Research Program for Agricultural Science & Technology Development (Project No. RS-2025-02273065), National Institute of Agricultural Sciences, Rural Development Administration, Republic of Korea. The funders had no role in study design, data collection and analysis, the decision to publish, or the preparation of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All raw reads in FASTQ format from the six RNA-seq libraries were deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA1420407 with individual run accessions SRR37160096–SRR37160101. The nucleotide sequences of the 50 viral genomes and genome segments generated in this study have been deposited in NCBI GenBank under accession numbers PX769109–PX769157 and MZ422782.1, with individual accessions summarized in Table S1.

Acknowledgments

We thank Jin Kyong Cho, Hyang Sook Kim and Mi Kyong Kim for their excellent assistance with the cultivation and harvest of garlic plants used in this study.

Conflicts of Interest

Hoseong Choi was employed by Biocube System, Inc., Suwon 16648, Republic of Korea. Dong Woo Kang was employed by UDZERA Co., Ltd., Chungju 27452, Republic of Korea. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Maclot, F.; Candresse, T.; Filloux, D.; Malmstrom, C.M.; Roumagnac, P.; van Der Vlugt, R.; Massart, S. Illuminating an ecological blackbox: Using high throughput sequencing to characterize the plant virome across scales. Front. Microbiol. 2020, 11, 578064.
2. Villamor, D.; Ho, T.; Al Rwahnih, M.; Martin, R.; Tzanetakis, I. High throughput sequencing for plant virus detection and discovery. Phytopathology 2019, 109, 716–725.
3. Lee, J.-Y. The principles and applications of high-throughput sequencing technologies. Dev. Reprod. 2023, 27, 9.
4. Geng, G.; Wang, D.; Liu, Z.; Wang, Y.; Zhu, M.; Cao, X.; Yu, C.; Yuan, X. Translation of plant RNA viruses. Viruses 2021, 13, 2499.
5. Houldcroft, C.J.; Beale, M.A.; Breuer, J. Clinical and biological insights from viral genome sequencing. Nat. Rev. Microbiol. 2017, 15, 183–192.
6. Marston, D.A.; McElhinney, L.M.; Ellis, R.J.; Horton, D.L.; Wise, E.L.; Leech, S.L.; David, D.; de Lamballerie, X.; Fooks, A.R. Next generation sequencing of viral RNA genomes. BMC Genom. 2013, 14, 444.
7. Chen, W.; Jia, Q.; Song, Y.; Fu, H.; Wei, G.; Ni, T. Alternative polyadenylation: Methods, findings, and impacts. Genom. Proteom. Bioinform. 2017, 15, 287–300.
8. Jo, Y.; Choi, H.; Lee, J.H.; Moh, S.H.; Cho, W.K. Viromes of 15 pepper (Capsicum annuum L.) cultivars. Int. J. Mol. Sci. 2022, 23, 10507.
9. Jo, Y.; Kim, K.-H.; Cho, W.K. First report of cucumber mosaic virus infecting garlic (Allium sativum L.) in Korea. J. Plant Pathol. 2021, 103, 1063–1064.
10. Kwon, S.-J.; Cho, Y.-E.; Kwon, O.-H.; Kang, H.-G.; Seo, J.-K. Resistance-breaking tomato spotted wilt virus variant that recently occurred in pepper in South Korea is a genetic reassortant. Plant Dis. 2021, 105, 2771–2775.
11. Fajardo, T.V.; Nishijima, M.; Buso, J.A.; Torres, A.C.; Ávila, A.C.; Resende, R.O. Garlic viral complex: Identification of potyviruses and carlavirus in central Brazil. Fitopatol. Bras. 2001, 26, 619–626.
12. Jo, Y.; Back, C.-G.; Kim, K.-H.; Chu, H.; Lee, J.H.; Moh, S.H.; Cho, W.K. Using RNA-sequencing data to examine tissue-specific garlic microbiomes. Int. J. Mol. Sci. 2021, 22, 6791.
13. Jo, Y.; Back, C.-G.; Kim, K.-H.; Chu, H.; Lee, J.H.; Moh, S.H.; Cho, W.K. Comparative study of metagenomics and metatranscriptomics to reveal microbiomes in overwintering pepper fruits. Int. J. Mol. Sci. 2021, 22, 6202.
14. Castandet, B.; Hotto, A.M.; Strickler, S.R.; Stern, D.B. ChloroSeq, an optimized chloroplast RNA-Seq bioinformatic pipeline, reveals remodeling of the organellar transcriptome under heat stress. G3 Genes Genomes Genet. 2016, 6, 2817–2827.
15. Geerts-Dimitriadou, C.; Lu, Y.-Y.; Geertsema, C.; Goldbach, R.; Kormelink, R. Analysis of the Tomato spotted wilt virus ambisense S RNA-encoded hairpin structure in translation. PLoS ONE 2012, 7, e31013.
16. Jo, Y.; Yoon, Y.N.; Jang, Y.-W.; Choi, H.; Lee, Y.-H.; Kim, S.-M.; Choi, S.Y.; Lee, B.C.; Cho, W.K. Soybean viromes in the Republic of Korea revealed by RT-PCR and next-generation sequencing. Microorganisms 2020, 8, 1777.
17. Jo, Y.; Choi, H.; Choi, S.Y.; Kim, S.-M.; Choi, Y.M.; Lee, B.C.; Cho, W.K. Complete genome sequence of bean pod mottle virus identified from common bean (Phaseolus vulgaris). Microbiol. Soc. Korea 2020, 56, 404–406.
18. Choi, H.; Cho, W.K.; Kim, K.-H. Two homologous host proteins interact with potato virus X RNAs and CPs and affect viral replication and movement. Sci. Rep. 2016, 6, 28743.
19. Hajimorad, M.; Domier, L.; Tolin, S.; Whitham, S.; Saghai Maroof, M. Soybean mosaic virus: A successful potyvirus with a wide distribution but restricted natural host range. Mol. Plant Pathol. 2018, 19, 1563–1579.
20. Smith, C.M.; Gedling, C.R.; Wiebe, K.F.; Cassone, B.J. A sweet story: Bean pod mottle virus transmission dynamics by Mexican bean beetles (Epilachna varivestis). Genome Biol. Evol. 2017, 9, 714–725.
21. Islam, M.S.; Ahammed, M.A.; Akhter, F.; Rahman, M.; Molla, M.M.H. Whole genome sequencing and molecular detection of potato virus X in Bangladesh. PLoS ONE 2025, 20, e0322935.
22. Sano, Y.; Wada, M.; Hashimoto, Y.; Matsumoto, T.; Kojima, M. Sequences of ten circular ssDNA components associated with the milk vetch dwarf virus genome. J. Gen. Virol. 1998, 79, 3111–3118.
23. Grigoras, I.; Timchenko, T.; Grande-Pérez, A.; Katul, L.; Vetten, H.-J.; Gronenborn, B. High variability and rapid evolution of a nanovirus. J. Virol. 2010, 84, 9105–9117.
24. Kwak, H.-R.; Kim, M.-K.; Nam, M.; Kim, J.-S.; Kim, K.-H.; Cha, B.; Choi, H.-S. Genetic compositions of Broad bean wilt virus 2 infecting red pepper in Korea. Plant Pathol. J. 2013, 29, 274.
25. Lal, A.; Qureshi, M.A.; Son, M.-C.; Lee, S.; Kil, E.-J. Construction and Segmental Reconstitution of Full-Length Infectious Clones of Milk Vetch Dwarf Virus. Viruses 2025, 17, 1213.
26. Morgan, D. Pepper Mild Mottle Virus as an Indicator of Fecal Pollution Along an Urban Stretch of the Chattahoochee River in Atlanta, GA, 2014. Ph.D. Thesis, ScholarWorks@ Georgia State University, Atlanta, GA, USA, 2016.
27. Sun, X.; Zhu, S.; Li, N.; Cheng, Y.; Zhao, J.; Qiao, X.; Lu, L.; Liu, S.; Wang, Y.; Liu, C. A chromosome-level genome assembly of garlic (Allium sativum) provides insights into genome evolution and allicin biosynthesis. Mol. Plant 2020, 13, 1328–1339.
28. Filyushin, M.A.; Beletsky, A.V.; Mazur, A.M.; Kochieva, E.Z. The complete plastid genome sequence of garlic Allium sativum L. Mitochondrial DNA Part B 2016, 1, 831–832.
29. Kim, B.; Kim, K.; Yang, T.-J.; Kim, S. Completion of the mitochondrial genome sequence of onion (Allium cepa L.) containing the CMS-S male-sterile cytoplasm and identification of an independent event of the ccmF N gene split. Curr. Genet. 2016, 62, 873–885.
30. Liu, F.; Zhao, J.; Sun, H.; Xiong, C.; Sun, X.; Wang, X.; Wang, Z.; Jarret, R.; Wang, J.; Tang, B. Genomes of cultivated and wild Capsicum species provide insights into pepper domestication and population differentiation. Nat. Commun. 2023, 14, 5487.
31. Jo, Y.D.; Park, J.; Kim, J.; Song, W.; Hur, C.-G.; Lee, Y.-H.; Kang, B.-C. Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome. Plant Cell Rep. 2011, 30, 217–229.
32. Jo, Y.D.; Choi, Y.; Kim, D.-H.; Kim, B.-D.; Kang, B.-C. Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing. BMC Genom. 2014, 15, 561.
33. Li, D.; Liu, C.-M.; Luo, R.; Sadakane, K.; Lam, T.-W. MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 2015, 31, 1674–1676.
34. Buchfink, B.; Xie, C.; Huson, D.H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 2015, 12, 59–60.
35. Huson, D.H.; Beier, S.; Flade, I.; Górska, A.; El-Hadidi, M.; Mitra, S.; Ruscheweyh, H.-J.; Tappu, R. MEGAN community edition-interactive exploration and analysis of large-scale microbiome sequencing data. PLoS Comput. Biol. 2016, 12, e1004957.
36. Haas, B.J.; Papanicolaou, A.; Yassour, M.; Grabherr, M.; Blood, P.D.; Bowden, J.; Couger, M.B.; Eccles, D.; Li, B.; Lieber, M. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013, 8, 1494–1512.
37. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780.
38. Capella-Gutiérrez, S.; Silla-Martínez, J.M.; Gabaldón, T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 2009, 25, 1972–1973.
39. Minh, B.Q.; Schmidt, H.A.; Chernomor, O.; Schrempf, D.; Woodhams, M.D.; Von Haeseler, A.; Lanfear, R. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 2020, 37, 1530–1534, Correction in Mol. Biol. Evol. 2020, 37, 2461.

Figure 1. Field symptoms of pepper and garlic plants sampled for comparative virome analysis. Images show disease phenotypes observed at the time of collection. (A) Anseong pepper leaves displaying severe chlorotic spotting and mottling (P-TR-AS/P-MR-AS libraries). (B) Jincheon pepper leaves showing mosaic and chlorotic patches (P-TR-JC/P-MR-JC libraries). (C) Hoengseong garlic bulb with attached roots (G-TR-HS/G-MR-HS libraries). (D) Hoengseong garlic plant exhibiting mild yellowing and streaking on leaves.

Figure 2. Mapping efficiency and composition of host-derived reads in ribo-depleted total RNA (TR) and poly(A)-selected mRNA (MR) libraries from pepper leaves and garlic cloves. (A) Proportion of mapped (yellow) and unmapped (gray) reads in each library. (B) Distribution of mapped reads among chloroplast mRNA (cp-mRNA, green), mitochondrial mRNA (mt-mRNA, orange) and nuclear coding sequences (CDSs, blue).

Figure 3. Taxonomic composition of contigs assembled from non-host (unmapped) reads in pepper and garlic libraries. (A,B) Numbers of non-host contigs and mapped reads for pepper (P-TR-AS, P-MR-AS, P-TR-JC, P-MR-JC) and garlic (G-TR-HS, G-MR-HS) libraries. (C–E) Assignment of these contigs and reads to plant, animal, bacterial, fungal and viral taxa, showing that most non-host sequences are plant-derived with sample-specific fungal and viral enrichments. (F) Proportion of viral reads in each library, highlighting a strong viral signal in P-MR-JC and moderate viral contributions in P-TR-JC, G-TR-HS and G-MR-HS.

Figure 4. Identification of TSWV from Anseong pepper leaves using ribo-depleted total RNA-seq versus poly(A)-selected mRNA-seq. (A) Numbers of contigs assembled for each TSWV genomic RNA segment (S, M and L) in P-TR-AS and P-MR-AS libraries. (B) Mean coverage of the three segments in each library. (C) Total TSWV reads mapped per library.

Figure 5. Phylogenetic relationships of TSWV RNA L and M segments from Anseong pepper. (A) Maximum-likelihood tree based on complete RNA L sequences. (B) Maximum-likelihood tree of complete RNA M sequences. Scale bars represent nucleotide substitutions per site. Bootstrap values ≥50% are indicated at nodes.

Figure 6. Virome composition of Jincheon pepper leaves in poly(A)-selected mRNA (P-MR-JC) and ribo-depleted total RNA (P-TR-JC) libraries. (A) Numbers of contigs for each detected virus or genomic segment, including CMV, BBWV2, MDV genomic components plus C1 alphasatellite, HPEV and PCV2. (B) Segment coverages for CMV, BBWV2 and MDV. (C) Coverage of HPEV and PCV2 RNA segments. (D) Proportion of viral reads assigned to each major virus. (E) Total viral reads per library.

Figure 7. Phylogenetic relationships of BBWV2, HPEV and PCV2 isolates from Jincheon pepper. (A) Maximum-likelihood tree of complete BBWV2 RNA1 sequences. (B) Maximum-likelihood tree of BBWV2 RNA2 sequences. (C) Maximum-likelihood tree of HPEV genomic sequences. (D) Maximum-likelihood analysis of PCV2 RNA1 sequences. (E) Maximum-likelihood tree of PCV2 RNA2 sequences. Scale bars represent nucleotide substitutions per site, and bootstrap values (>50) are indicated at the corresponding nodes.

Figure 8. Phylogenetic relationships of MDV components and associated C1 alphasatellite from Jincheon pepper. (A) Maximum-likelihood tree of MDV C1 alphasatellite sequences. (B) Maximum-likelihood tree of MDV DNA-M sequences. (C) Phylogenetic tree of MDV DNA-R sequences. (D) Maximum-likelihood tree of MDV DNA-S sequences. (E) Maximum-likelihood analysis of MDV DNA-U2 sequences. (F) Maximum-likelihood tree of MDV DNA-U4 sequences. Scale bars represent nucleotide substitutions per site, and bootstrap values (>50) are shown at the corresponding nodes.

Figure 9. Virome profiles of Hoengseong garlic cloves in poly(A)-selected mRNA (G-MR-HS) and ribo-depleted total RNA (G-TR-HS) libraries. (A) Numbers of contigs for each detected virus, including CMV, CGMMV, TSWV (RNA M), SMV, garlic viruses A–E (GarA–GarE), BPMV (RNA1 and RNA2), PMMV, PVX, GarVD and GarB. (B) Mean coverage of RNA virus genome segments. (C) Coverage of major garlic-infecting viruses (GarA, GarC, ShVX, GarE, GarVD and GarB). (D) Proportions of viral reads assigned to individual viruses. (E) Total viral reads per library.

Figure 10. Phylogenetic relationships of garlic viruses A, B, C, D and J (GarVA–GarVD and GarVJ) detected in Hoengseong garlic cloves. Maximum-likelihood trees based on complete or near-complete genome sequences. Scale bar represents nucleotide substitutions per site, and bootstrap values (>50) are indicated at the corresponding nodes.

Figure 11. Phylogenetic relationships of PMMoV and BPMV sequences detected in Hoengseong garlic, interpreted as contaminants rather than true garlic-infecting viruses. (A) Maximum-likelihood tree based on complete PMMoV genomic sequences. (B) Maximum-likelihood analysis of BPMV RNA1. (C) Maximum-likelihood tree of BPMV RNA2 sequences. Scale bars represent nucleotide substitutions per site, and bootstrap values (>50) are indicated at the corresponding nodes.

Table 1. Overview of rRNA-depleted total RNA-seq and poly(A)-selected mRNA-seq libraries from pepper leaves and garlic cloves collected in South Korea.

Sample Name	Plant	Tissue	Region	Collection Date	Library Preparation Method	Accession No.
P-TR-AS	Pepper	Leaves	Anseong	September 2019	total RNA-seq (ribo-depleted)	SRR37160101
P-MR-AS	Pepper	Leaves	Anseong	September 2019	mRNA-seq (poly(A) selection)	SRR37160100
P-TR-JC	Pepper	Leaves	Jincheon	September 2019	total RNA-seq (ribo-depleted)	SRR37160099
P-MR-JC	Pepper	Leaves	Jincheon	September 2019	mRNA-seq (poly(A) selection)	SRR37160098
G-TR-HS	Garlic	Cloves	Hoengseong	May 2020	total RNA-seq (ribo-depleted)	SRR37160097
G-MR-HS	Garlic	Cloves	Hoengseong	May 2020	mRNA-seq (poly(A) selection)	SRR37160096

Table 2. Taxonomic assignment of contigs derived from unmapped reads in pepper RNA-seq libraries after BLASTx searches against the NCBI non-redundant protein database.

Group	Taxon	P-TR-AS contigs	P-TR-AS reads	P-MR-AS contigs	P-MR-AS reads	P-TR-JC contigs	P-TR-JC reads	P-MR-JC contigs	P-MR-JC reads
Plants	Viridiplantae	47,720	17,261,913	42,141	11,832,565	45,647	19,472,608	29,436	25,287,808
Animals	Pterygota	0	0	0	0	0	0	5	217
Bacteria	Actinomycetia	12	3116	0	0	0	0	0	0
Bacteria	Proteobacteria	13	6657	0	0	14	15,359	0	0
Fungi	Alternaria	16	479	12	307	0	0	0	0
Fungi	Fungi	0	0	0	0	8	1409	0	0
Fungi	Ustilaginomycotina	0	0	16	1451	0	0	0	0
Viruses	Viruses	5	1,178,201	22	2217	56	3,000,669	34	16,735,748
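As a worked example, the viral share of taxonomically assigned non-host reads can be derived from Table 2. The sketch assumes that the second column of each library pair holds read counts and the first holds contig counts. The denominator used here (all assigned reads in the table) is an assumption and may differ from the one used for Figure 3F.

```python
# Read counts per taxon for each pepper library, copied from Table 2
# (zero entries omitted).
READS = {
    "P-TR-AS": {
        "Viridiplantae": 17_261_913,
        "Actinomycetia": 3_116,
        "Proteobacteria": 6_657,
        "Alternaria": 479,
        "Viruses": 1_178_201,
    },
    "P-MR-AS": {
        "Viridiplantae": 11_832_565,
        "Alternaria": 307,
        "Ustilaginomycotina": 1_451,
        "Viruses": 2_217,
    },
    "P-TR-JC": {
        "Viridiplantae": 19_472_608,
        "Proteobacteria": 15_359,
        "Fungi": 1_409,
        "Viruses": 3_000_669,
    },
    "P-MR-JC": {
        "Viridiplantae": 25_287_808,
        "Pterygota": 217,
        "Viruses": 16_735_748,
    },
}


def viral_percentage(taxon_reads):
    """Percentage of assigned reads that fall under 'Viruses'."""
    total = sum(taxon_reads.values())
    if total == 0:
        raise ValueError("no assigned reads")
    return 100.0 * taxon_reads.get("Viruses", 0) / total


percentages = {lib: viral_percentage(counts) for lib, counts in READS.items()}
for lib, pct in sorted(percentages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lib}: {pct:.2f}% viral")
```

Under this denominator, P-MR-JC comes out at roughly 40% viral reads, P-TR-JC at about 13%, P-TR-AS at about 6%, and P-MR-AS well below 0.1%.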

Table 3. Taxonomic assignment of contigs derived from unmapped reads in garlic RNA-seq libraries after BLASTx searches against the NCBI non-redundant protein database.

Group	Taxon	G-TR-HS contigs	G-TR-HS reads	G-MR-HS contigs	G-MR-HS reads
Plants	Viridiplantae	110,892	18,321,394	40,503	14,498,540
Animals	Fragariocoptes	10	2541	28	3047
Animals	Mus	0	0	5	194
Animals	Neoptera	0	0	3	161
Bacteria	Brachybacterium	75	11,069	0	0
Bacteria	Corynebacteriales	7	747	0	0
Bacteria	Enterobacterales	7	505	0	0
Bacteria	Staphylococcus	10	2882	0	0
Bacteria	Streptococcus	109	13,545	5	231
Fungi	Fusarium	86	25,249	216	9452
Viruses	Viruses	67	1,736,549	55	2,932,116

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

