Novel Viral Sequences in a Patient with Cryptogenic Liver Cirrhosis Revealed by Serum Virome Sequencing

Zhang, Xiaoan; Fan, Ida X.; Xu, Yanjuan; Rule, Jody; Tse, Long Ping Victor; Pourkarim, Mahmoud Reza; Lee, William M.; Di Bisceglie, Adrian M.; Fan, Xiaofeng

doi:10.3390/v17060812

by Xiaoan Zhang ¹,², Ida X. Fan ³, Yanjuan Xu ¹, Jody Rule ⁴, Long Ping Victor Tse ³, Mahmoud Reza Pourkarim ⁵, William M. Lee ⁴, Adrian M. Di Bisceglie ¹,⁶ and Xiaofeng Fan ¹,⁶,*

¹ Division of Gastroenterology & Hepatology, Department of Internal Medicine, Saint Louis University School of Medicine, St. Louis, MO 63104, USA

² The Third Affiliated Hospital of Zhengzhou University, Zhengzhou 450001, China

³ Department of Microbiology & Immunology, Saint Louis University School of Medicine, St. Louis, MO 63104, USA

⁴ Division of Digestive and Liver Diseases, Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA

⁵ Laboratory for Clinical and Epidemiological Virology, Rega Institute, Department of Microbiology, Immunology and Transplantation, KU Leuven, 3000 Leuven, Belgium

⁶ Saint Louis University Liver Center, Saint Louis University School of Medicine, St. Louis, MO 63104, USA

* Author to whom correspondence should be addressed.

Viruses 2025, 17(6), 812; https://doi.org/10.3390/v17060812

Submission received: 16 May 2025 / Revised: 30 May 2025 / Accepted: 31 May 2025 / Published: 3 June 2025

(This article belongs to the Section Human Virology and Viral Diseases)

Abstract

Clinical studies indicate that the etiology of liver disease is unknown in 5% to 30% of patients. A long-standing hypothesis is the existence of unknown viruses beyond hepatitis A through E viruses. We conducted serum virome sequencing in nine patients with cryptogenic liver disease and identified eight contigs that could not be annotated. One was determined to be a contaminant, while two of seven contigs from an individual (Patient 3) were validated by reverse transcription polymerase chain reaction (RT-PCR) and Sanger sequencing. The possibility of contamination was excluded through PCR with templates extracted using different methods from samples taken at different time points. One of the contigs, Seq260, was characterized as negative-sense single-stranded DNA via enzymatic digestion and genome walking. Digital-droplet PCR revealed the copy number of Seq260 to be low: 343 copies/mL. Seq260-based nested PCR screening was negative in 200 blood donors and 225 patients with liver disease with/without known etiologies. None of the seven contigs from Patient 3 mapped to any of 118,731 viral metagenomic datasets. In conclusion, we discovered seven unknown contigs from a patient with cryptogenic liver cirrhosis. These sequences are likely from a novel human virus with a negative-sense, linear single-stranded DNA genome.

Keywords:

etiology; hepatitis virus; cryptogenic liver disease; cryptogenic cirrhosis; virome

1. Introduction

Understanding the etiology of human disease is essential for effective prevention, diagnosis, and treatment. Many cases of liver disease have a clear etiology, for example, hepatitis virus infection, genetic abnormality, and immunological or metabolic disorder [1]. However, the underlying cause is reportedly unknown in 5% to 30% of patients with various types of liver disease, including hepatitis [2], cirrhosis [3], hepatocellular carcinoma [4], and acute liver failure [5]. A retrospective study of 135,191 cases reported that about 7% of patients who underwent liver transplantation were “cryptogenic”, that is, without known etiology [6]. These observations revive a long-standing hypothesis that unknown hepatotropic viruses exist beyond hepatitis A through E viruses. Ongoing efforts to identify new hepatitis viruses have resulted in the discovery of multiple previously unknown viruses, such as the anelloviruses [7], which include SEN virus [8]; human pegivirus type 1 (HPgV-1, formerly called GB virus C or hepatitis G virus) [9,10]; human pegivirus type 2 (HPgV-2; also known as human hepegivirus 1, HHpgV-1) [11,12]; and the human circular double-stranded DNA (dsDNA) virus KIs_V [13]. Causal links with liver disease or other human diseases have not been established for any of these viruses, which are now considered commensal viruses within the human virome [14]. However, the sequential discovery of these diverse viruses implies a considerable abundance of unknown viruses within the human virome, which is largely unexplored [15]. The present study reports the discovery of novel virus-like sequences in a patient with cryptogenic liver cirrhosis.

2. Materials and Methods

2.1. Study Population and Patient Samples

We used two patient cohorts: one for viral discovery and one for screening. The discovery cohort consisted of nine patients with cryptogenic liver disease (Table 1). Infection with known hepatitis viruses had been excluded in these patients by routine laboratory tests. Serum samples were collected from these patients and archived in the Saint Louis University Liver Center Sample Repository. Written informed consent was obtained from each patient prior to sample collection; no treatment was given prior to sample collection. Patient 3 (Table 1), the sole patient with a positive finding in the current study, visited the Saint Louis University Hospital in the middle of 1994 with a complaint of abdominal discomfort. Physical examination, lab tests, and imaging did not reveal any obvious abnormality, except for a moderately elevated serum alanine aminotransferase (107 U/L). Thus, a diagnosis of cryptogenic hepatitis was made. The patient attended regular follow-up visits but did not receive any medical interventions and experienced rapid disease progression, reaching the stage of cryptogenic cirrhosis by the end of 1995. Liver transplantation was carried out in early 1996, but the patient died of liver complications 6 months later. There was no recorded history of drug use, alcohol consumption, or previous surgery.

Samples from the screening cohort were categorized into three groups (Table 1): First, serum samples from patients with known etiologies, including hepatitis C virus (HCV) infection (n = 100) and hepatitis B virus (HBV) infection (n = 10), were accessed from our Liver Center Sample Repository. Second, serum samples from 200 blood donors were obtained from American Red Cross-National Testing Laboratory in Saint Louis as a gift to support biomedical research. The third group comprised samples from the National Institute of Diabetes and Digestive and Kidney Diseases Central Repository (NIDDK-CR), which archives patient specimens from NIDDK-sponsored clinical studies. We obtained serum samples from three completed clinical trials: the acute liver failure study group (ALFSG) (n = 50), the adult-to-adult living-donor liver transplantation cohort study (A2ALL) (n = 40), and the nonalcoholic fatty liver disease (NAFLD) adult (n = 24) (request numbers #23456, #23105, and #23744, respectively). The ALFSG samples were from patients who lacked known etiologies [5]. The A2ALL is a multi-center clinical trial that was conducted between 1998 and 2003 and aimed to determine whether adult-to-adult living-donor liver transplant has significant benefits for patients compared with deceased-donor liver transplantation [16]. All samples that we obtained from the A2ALL were collected from patients without explicit etiologies. The NAFLD adult samples were from patients with diagnosed cryptogenic cirrhosis at the time of recruitment [17]. All serum samples from the NIDDK-CR were coded prior to shipment to our lab. The entire research protocol for the use of patient samples was reviewed and approved by the Saint Louis University Institutional Review Board (IRB protocol: SLU10592).

2.2. Serum Virome Sequencing

Serum virome sequencing was performed as previously described [18,19]. Briefly, total RNA was extracted from 140 μL of serum and eluted into 60 μL Tris buffer (pH 8.5) using the QIAamp Viral RNA Mini kit (Qiagen, Germantown, MD, USA). According to the manufacturer’s instructions, the kit extracts both DNA and RNA larger than 200 bp. For reverse transcription (RT), 10.6 μL of extracted RNA was mixed with 9.4 μL RT matrix consisting of 1× SuperScript III buffer, 10 mM dithiothreitol (DTT), 80 μM of primer C28 (Table 2), exonuclease-resistant random pentamer primers with the 5′ end blocked by C18 spacer [20], 1 mM dNTPs (Life Technologies, Carlsbad, CA, USA), 20 U of RNasin ribonuclease inhibitor (Promega, Madison, WI, USA), and 200 U of SuperScript III reverse transcriptase (Life Technologies). The reaction was incubated at 37 °C for 30 min, 50 °C for 30 min, and inactivated by incubation at 70 °C for 15 min. A 4 μL aliquot of RT was used for template-dependent Multiple Displacement Amplification (tMDA) in a 40 μL reaction volume, consisting of 1× phi29 DNA polymerase buffer, 1 mM dNTPs, 80 μM primer C28 (as used in the RT), and 20 units of phi29 DNA polymerase (New England Biolabs, Ipswich, MA, USA). The reaction was incubated at 28 °C for 14 h and then terminated by heating at 65 °C for 15 min. After purification using the QIAamp DNA mini kit (Qiagen, Germantown, MD, USA), the product of RT-tMDA was subjected to library construction with the Nextera XT DNA Sample Preparation kit (Illumina, San Diego, CA, USA), followed by sequencing on the Illumina NextSeq 500 platform (1× 250-bp single-end reads, mid-output mode) at MOgene (St. Louis, MO, USA).

2.3. Viral Categorization and Discovery

Raw sequence reads in FASTQ format from each sample were filtered using PRINSEQ (v0.20) for quality control with the following parameters: read length ≥70 nt, mean read quality score ≥25, low-complexity filtering with DUST score ≤7, ambiguous bases ≤1%, and removal of all types of duplicates [21]. We used the Bowtie 2 mapper (version 2.5.3) to map quality-filtered reads onto viral reference sequences from the National Center for Biotechnology Information (NCBI) (18,677 complete viral genomes downloaded in March 2025) [22,23]. After viral mapping, reads were filtered by subtractive mapping against the two non-template controls; human sequences (GRCh38 build); NCBI microbial reference sequences for bacteria, archaea, fungi, and protists (downloaded in March 2025) [23]; and microbial reference genomes from the Human Microbiome Project [24]. The remaining reads were de novo assembled using SPAdes (version 3.13.0), a short-read assembler [25]. Assembled contigs were labeled in PRINSEQ and combined to generate a new dataset, which was then filtered in CD-HIT based on 90% nucleotide similarity [26], followed by similarity-based annotation using NCBI BLASTN against the NCBI collection of nucleotide sequences (database “nt”) with a conservative E-value setting of 1 × 10⁻⁵. Contigs with no BLASTN hits were translated in all six reading frames and searched using BLASTP against the NCBI nonredundant protein database (“nr”) with the E-value setting of 1 × 10⁻⁵. Contigs without BLASTP hits were evaluated for remote homology by Profile Hidden Markov Model (HMM) analysis in HMMER (v3.2.1) [27] with the HMM profiles built from NCBI viral RefSeq except for phage (vFam) [28], phage [29], and the collection of protein families (Pfam) (24,076 entries in version 37.2) [30]. Final contigs without any hits throughout all analyses were considered to be unknown sequences.
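
The read-level thresholds above can be illustrated with a minimal Python sketch. It is not PRINSEQ and does not reproduce its implementation; the function names are hypothetical, the DUST low-complexity filter is omitted, and only exact-sequence duplicates are handled, whereas the study removed all duplicate types.

```python
from statistics import mean

def passes_qc(seq, quals, min_len=70, min_mean_q=25.0, max_n_frac=0.01):
    """Apply the length, mean-quality, and ambiguous-base thresholds.

    seq   : read sequence as a string
    quals : per-base Phred scores as a list of integers
    """
    if len(seq) < min_len:                    # read length >= 70 nt
        return False
    if mean(quals) < min_mean_q:              # mean read quality >= 25
        return False
    n_frac = seq.upper().count("N") / len(seq)
    return n_frac <= max_n_frac               # ambiguous bases <= 1%

def drop_exact_duplicates(reads):
    """Keep the first occurrence of each identical sequence.

    reads is an iterable of (name, seq, quals) tuples. This covers only
    exact duplicates, one of several duplicate types.
    """
    seen = set()
    kept = []
    for name, seq, quals in reads:
        if seq not in seen:
            seen.add(seq)
            kept.append((name, seq, quals))
    return kept
```

A read is retained only if `passes_qc` returns `True`; deduplication would then be applied to the retained set.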

2.4. Validation of Unknown Sequences

To exclude the possibility of contamination from reagents or experimental pipelines, three unknown sequences were selected for validation by RT-PCR directly from the corresponding serum. Briefly, 10 μL of total RNA was used for RT in a 20 μL reaction volume, as described above, except that sequence-specific primer R1 was used (Table 2). A 5 μL aliquot of the RT product was used for the first round of PCR in a 50 μL reaction including 1× Q5 polymerase buffer, 0.8 mM dNTPs, 0.4 μM each of primers R1 and F1 (Table 2), and 1 U of Q5 DNA polymerase (New England Biolabs, Ipswich, MA, USA). Cycle parameters were 94 °C for 2 min, followed by 5 cycles of 94 °C for 1 min, 60 °C for 1 min, and 72 °C for 1 min, then 25 cycles in which the annealing temperature was reduced to 50 °C (referred to as the “touchdown protocol”), and a final 7-min incubation at 72 °C. A 2 μL aliquot of the first-round PCR product was used for the second round of PCR using the same cycle parameters and primers F2 and R2 (Table 2). The product was gel-purified and subjected to Sanger sequencing. After validation, PCR was repeated with and without RT using freshly prepared templates, including total serum RNA re-extracted using the QIAamp Viral RNA Mini kit (Qiagen, Germantown, MD, USA), Qiagen column-flushed water, and total DNA extracted with the Apostle MiniMax High Efficiency Cell-Free DNA Isolation Kit (Apostle, Pleasanton, CA, USA).
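
For readers who want the cycling schedule in a machine-readable form, the sketch below encodes the parameters stated above as a simple Python structure. The function names and the tuple layout are illustrative assumptions and do not correspond to any thermocycler software; the run time counts hold times only and ignores ramping.

```python
def touchdown_program(high_anneal_c=60, low_anneal_c=50,
                      high_cycles=5, low_cycles=25):
    """Return the program as (label, repeats, steps) tuples.

    Each step is (temperature in deg C, hold time in seconds).
    """
    def cycle(anneal_c):
        # denaturation, annealing, extension: 1 min each
        return [(94, 60), (anneal_c, 60), (72, 60)]

    return [
        ("initial denaturation", 1, [(94, 120)]),
        ("cycles at 60 C annealing", high_cycles, cycle(high_anneal_c)),
        ("cycles at 50 C annealing", low_cycles, cycle(low_anneal_c)),
        ("final extension", 1, [(72, 420)]),
    ]

def amplification_cycles(program):
    """Count repeats of the three-step denature/anneal/extend blocks."""
    return sum(repeats for _, repeats, steps in program if len(steps) == 3)

def hold_time_minutes(program):
    """Sum of hold times in minutes (ramping between steps is ignored)."""
    seconds = sum(repeats * sum(t for _, t in steps)
                  for _, repeats, steps in program)
    return seconds / 60
```

Under these parameters one round comprises 30 amplification cycles, so the two rounds of nested PCR give 60 cycles in total.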

2.5. Machine-Learning Analysis for the Origin of Unknown Sequences

All unknown sequences were evaluated for the likelihood of viral origin using VirFinder, based on the different k-mer signatures of viral and bacterial genomes [31]. Analyses were conducted under two models: Firstly, the VF.modEPV_k8.rda model implemented in VirFinder was used to distinguish bacteria and/or archaea from prokaryotic and/or eukaryotic viruses. This model was trained with 5800 viral genomes from the NCBI viral RefSeq [31]. The second model was trained to separate 142,809 newly assembled human phage genomes from 2206 complete genomes of known hepatitis viruses from GenBank, comprising 73, 1054, 1040, 7, and 32 genomes from hepatitis A, B, C, D, and E viruses, respectively. Based on the trained models, the program extracted k-mer features from query sequences to generate scores ranging from 0 to 1, with 1 representing the highest likelihood of a viral sequence. Scores were compared with the distribution of scores from the training sequences to compute statistical significance. Contigs with a low score and high p-value (>0.05) were considered unlikely to be of viral origin.
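
To make the notion of a k-mer signature concrete, the following Python sketch computes normalized k-mer frequencies for a sequence. It is an illustration only and not VirFinder code: the function name is hypothetical, k = 8 is inferred from the model name VF.modEPV_k8.rda, and the trained classifier that turns these features into a score and p-value is not reproduced.

```python
from collections import Counter

def kmer_frequencies(seq, k=8):
    """Return {k-mer: relative frequency} for all unambiguous k-mers.

    Windows containing characters other than A, C, G, T are skipped.
    An empty dict is returned when no valid window exists.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

A classifier trained on such frequency vectors from two classes of genomes (for example, phage versus hepatitis viruses) can then assign a query contig a score between 0 and 1.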

2.6. Genome Walking and Strand Attribute Determination

We selected one unknown sequence, namely Seq260, for both experiments. Genome walking consisted of four steps: primer extension (PE), intramolecular circularization, rolling-circle amplification (RCA), and PCR with Sanger sequencing. Briefly, for PE at the 5′ end, 5 µL of extracted DNA was used in a 50 µL reaction consisting of 0.4 µM primer PAWR (Table 2), 1× Q5 reaction buffer, and 1.6 U Q5 DNA polymerase. After incubation at 94 °C for 2 min, the mixture was subjected to 30 cycles of the touchdown protocol described above. The PE product was purified into 10 µL of elution buffer using the MinElute PCR Purification Kit (Qiagen, Germantown, MD, USA), and 8 µL of the purified PE product was used for intramolecular ligation in a 20 µL reaction containing 1× CircLigase II buffer, 2.5 mM MnCl₂, 1 M betaine, and 150 U CircLigase II ssDNA Ligase (Lucigen, Middleton, WI, USA). The reaction was incubated at 60 °C for 3 h, then inactivated at 80 °C for 10 min, and finally purified using the MinElute PCR Purification Kit (Qiagen, Germantown, MD, USA). The total purified ligation product (10 µL) was used for RCA with 20 U phi29 DNA polymerase (New England Biolabs, Ipswich, MA, USA) and 0.4 µM each of primers AWF7bp and AWR7bp (Table 2). These 7 nt primers were selected from all 7 nt strings of the region of Seq260 extended by primer PAWR, with 1 nt overlap, based on GC content, melting temperature, and the number of perfect matches in the human reference genome (GRCh38), as described previously [32]. The use of 7 nt rather than 6 nt primers in RCA aimed to reduce the number of their perfect matches in the human genome. After incubation at 30 °C for 16 h, the RCA reaction was inactivated at 65 °C for 10 min and then purified into 30 µL of elution buffer using the QIAamp DNA Mini Kit (Qiagen, Germantown, MD, USA). 
A 5 µL aliquot of purified RCA product was subjected to 60 cycles of nested PCR as described above, except that primers AWF1 and AWR1 were used in the first round of PCR, and primers AWF2 and AWR2 were used in the second round of PCR (Table 2). The resulting PCR product was gel-purified and used for Sanger sequencing. The entire procedure was repeated to perform genome walking at the 3′ end with the use of different primers (Table 2). The RCA step in genome walking at the 3′ end used primers BWF7bp and AWR7bp. To evaluate whether Seq260 was single- or double-stranded, 12 µL of extracted DNA was digested with 20 U of the type II restriction enzyme NruI (New England Biolabs, Ipswich, MA, USA) in a 20 µL reaction volume at 37 °C for 1 h. A 3 µL aliquot of this reaction without inactivation was directly used as the input of nested PCR, after which the product was visualized on agarose gel. The same procedures were applied to the total DNA extracted from a mock serum sample consisting of 1 × 10⁶ copies/mL synthetic double-stranded gBlocks Seq260 (Integrated DNA Technologies, Coralville, IA, USA).
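
The enumeration of candidate 7 nt RCA primers can be sketched in Python as follows. This is an illustration under stated assumptions, not the selection procedure of reference [32]: the function names are hypothetical, a window sliding 1 nt at a time is assumed, melting temperature is approximated with the Wallace rule (2 °C per A/T and 4 °C per G/C, a rough estimate for short oligonucleotides), and counting perfect matches in the human genome is left out.

```python
def gc_fraction(oligo):
    """Fraction of G and C bases in the oligonucleotide."""
    return sum(base in "GC" for base in oligo) / len(oligo)

def wallace_tm(oligo):
    """Approximate melting temperature (deg C) by the Wallace rule."""
    at = sum(base in "AT" for base in oligo)
    gc = sum(base in "GC" for base in oligo)
    return 2 * at + 4 * gc

def candidate_primers(region, k=7):
    """List (start, k-mer, GC fraction, Tm) for every k-mer window.

    The window advances 1 nt at a time (an assumption about the
    sliding scheme).
    """
    region = region.upper()
    return [
        (i, region[i:i + k],
         gc_fraction(region[i:i + k]),
         wallace_tm(region[i:i + k]))
        for i in range(len(region) - k + 1)
    ]
```

Candidates would then be ranked by these properties together with their match counts in the human genome, which requires a genome index and is outside this sketch.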

2.7. Determination of Copy Numbers of Seq260 Using Digital Droplet PCR

The digital-droplet PCR (ddPCR) mixture was prepared in a semi-skirted 96-well plate (Bio-Rad, Hercules, CA, USA). Each well contained 20 µL reaction volume, including 1× ddPCR supermix for probes (no dUTP), 0.5 µM each of primers 260F and 260R, 0.25 µM of probe 260P (Table 2), and the DNA inputs. After the sealing with a pierceable foil heat seal (Bio-Rad, Hercules, CA, USA) at 180 °C for 5 s, the plate was applied to a Bio-Rad QX200 Automated Droplet Generator (Bio-Rad, Hercules, CA, USA) to generate at least 10,000 droplets. Droplets were transferred to a Bio-Rad PX1 PCR Plate, and standard PCR was run in the Bio-Rad C1000 Thermal Cycler (Bio-Rad, Hercules, CA, USA) with the following program: 10 min at 95 °C for enzyme activation, followed by 40 cycles of 94 °C for 30 s and 60 °C for 1 min, and then deactivation at 98 °C for 10 min. After PCR, the plate was submitted to Bio-Rad QX200 Droplet Reader (Bio-Rad, Hercules, CA, USA), and data were analyzed using Bio-Rad QX Manager 1.2 software (Bio-Rad, Hercules, CA, USA). The input in the ddPCR included an aliquot of 2.5 ng of extracted DNA from the Seq260-positive sample, 10 fg gBlocks Seq260 used as the positive control, and 2.5 ng each of extracted DNA from the mock serum samples, prepared in a 1:10 serial dilution to contain different copy numbers of gBlocks Seq260 in the serum from a healthy blood donor.

2.8. Detection of Seq260 in the Screened Cohort

Total DNA was extracted from serum samples in the screened cohort using Apostle MiniMax High Efficiency Cell-Free DNA Isolation Kit (Apostle, Pleasanton, CA, USA). An aliquot of 2.5 ng of extracted DNA was used as the input for the nested PCR with primers designed based on Seq260 (Table 2). The product of the PCR was visualized on agarose gel. Suspected amplicons were gel-purified and subjected to Sanger sequencing.

2.9. In Silico Screening of Unknown Sequences

All unknown sequences were examined for possible existence in previously published next-generation sequencing (NGS) data. First, we searched the National Center for Biotechnology Information sequence read archive (SRA) portal using the keywords “virome”, “viral metagenomics”, “serum”, or “plasma”. Available NGS datasets were downloaded and directly used for mapping with the gapped mappers Bowtie 2 or BWA [33]. Second, we screened whole-genome sequencing (WGS) data from the BioMe Biobank at Mount Sinai, which is a community cohort study with 10,178 participants in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) initiative [34]. These data are archived in the Cloud as compressed reference-oriented alignment maps (CRAMs) and were analyzed with a memory-optimized instance type r5a.12xlarge (48 vCPU and 384 GB memory) in Amazon Web Services. After downloading with the NCBI SRA Toolkit (version 3.1.0) (option “type all” for CRAM format) [35], data were processed in SAMTools to extract unmapped reads (using the option “-f 4”) [36], followed by mapping with Bowtie 2. For additional analyses of unmapped reads at the amino acid level, we applied DIAMOND BLASTX, which is a faster algorithm than NCBI BLASTX [37]. Finally, we screened non-human reads of metagenomics and metatranscriptomics data from nine children with acute hepatitis of unknown etiology [38] (kindly provided by Sarah Buddle, Dr. Sofia Morfopoulou, and Professor Judith Breuer at University College London). The in silico screening protocol for unknown sequences was independently reviewed and approved by the Saint Louis University Institutional Review Board (IRB protocol: SLU34248).
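
The extraction of unmapped reads relies on the SAM FLAG field, in which bit 0x4 marks a read as unmapped. The Python sketch below shows the same selection on plain-text SAM records. It is an illustration only: the function names are hypothetical, the study used SAMTools on CRAM files, and real CRAM input would need a dedicated reader.

```python
UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped

def is_unmapped(flag):
    """True if the unmapped bit is set in the SAM FLAG value."""
    return bool(flag & UNMAPPED)

def unmapped_records(sam_lines):
    """Yield (read name, sequence) for unmapped reads in SAM text.

    Header lines start with '@' and are skipped. Columns follow the
    SAM layout: QNAME is field 1, FLAG is field 2, SEQ is field 10.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if is_unmapped(int(fields[1])):
            yield fields[0], fields[9]
```

For example, a FLAG of 77 (paired, read unmapped, mate unmapped, first in pair) is selected, whereas a FLAG of 99 (a properly paired, mapped read) is not.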

3. Results

3.1. Discovery of Novel Virus-like Sequences in a Patient with Cryptogenic Liver Cirrhosis

Serum virome sequencing generated an average of 4.89 ± 0.34 million reads per case. Viral categorization revealed the existence of known human viruses, including anelloviruses and phages. As expected, no known hepatitis viruses were detected in any of the nine patients in the discovery cohort. A total of 28,243 reads remained after subtractive mapping and were assembled into 5370 contigs. Subsequent annotation identified only eight contigs as unknown sequences: one from Patient 1 and the other seven from Patient 3 (Supplementary Table S1). These eight contigs had no hits in any similarity-based searches at either the nucleotide or amino acid level. Machine-learning analysis revealed contig xx01_23 from Patient 1 to have a score of around 0.5 in both models, which is not indicative of viral origin. However, all seven contigs from Patient 3 had scores above 0.9, regardless of the model applied (Figure 1). Interestingly, the score of crAssphage dropped significantly in the phage/hepatitis virus model, suggesting that k-mer frequency differs between phage and hepatitis virus genomes.

3.2. Seq260 (Contig xx03_260) from Patient 3 Is Not a Contaminant

Three contigs (xx01_23 from Patient 1; and xx03_101 and xx03_260 from Patient 3) were selected for validation from the corresponding serum samples. No amplicon was detected in contig xx01_23-based RT-PCR, despite repeated experiments. The two contigs from Patient 3 were validated using RT-PCR on extracted nucleic acids from this patient. Sanger sequencing of the gel-purified PCR product revealed 100% identity to the assembled contigs (Figure 2). Contig xx03_260 was denoted Seq260 and used in subsequent experiments. No amplification was achieved by repeating Seq260-based RT-PCR using Qiagen column-flushed water (Figure 3). Furthermore, a product of the same size was detected following PCR without RT, suggesting that Seq260 is a DNA rather than an RNA sequence (Figure 3). We also detected Seq260 in total serum DNA extracted using the Apostle MiniMax High Efficiency Cell-Free DNA Isolation Kit (Figure 3). The Apostle kit that we used is a bead-based serum DNA extraction kit, free of any kind of silica (Fan and Apostle, communication). Finally, two serum samples from Patient 3 had been collected at separate time points prior to liver transplantation, and Seq260 was positive in both (Figure 3).

3.3. A Net Extension of 103 nt at the 3′ End of Seq260 by Genome Walking

Genome walking at the 3′ end of Seq260 resulted in the detection of an amplicon with a size of ~220 bp. Sanger sequencing confirmed a net extension of 103 nt at the 3′ end (Figure 4). However, genome walking at the 5′ end of Seq260 failed despite repeated attempts. The additional 103 nt from genome walking at the 3′ end brings Seq260 to 490 nt in length; however, the sequence still cannot be annotated. In addition, PCR using the NruI-digested template (the recognition site is in the middle of Seq260) exhibited no signs of reduced PCR amplification efficiency compared with the mock sample containing synthetic double-stranded gBlocks Seq260 (Figure 5).

3.4. Seq260 Is Not Detected in Other Patients and Next-Generation Sequencing Data

The readouts of Seq260-based ddPCR became saturated in the mock serum samples with concentrations ≥1 × 10⁷ copies/mL, as indicated by the lack of negative droplets (Supplementary Figure S1). We evaluated our ddPCR to have a sensitivity of 100 copies/mL (Figure 6). However, reproducible quantitation, shown as comparable numbers of positive droplets among three technical replicates, was observed only in the mock samples with a concentration of no less than 1 × 10³ copies/mL (Figure 6). Seq260 was found to be present at a concentration of 343 copies/mL in serum from Patient 3 (Figure 6), a low titer that necessitated the use of nested PCR to enable visualization of the PCR product on agarose gel. Despite the use of nested PCR, Seq260 was detected only in samples from Patient 3 among a total of 433 serum samples from both cohorts. Amplicons larger than the expected size of 237 bp were observed in four of the 433 (0.9%) serum samples. However, Sanger sequencing revealed these to be human sequences, indicating non-specific amplification. For the in silico screen, keyword searches returned NGS data from 118,731 bio-samples in 617 bio-projects, about 78.3 TB in size. Of the total 118,731 bio-samples, 92,058 (77.5%) were from The Environmental Determinants of Diabetes in the Young (TEDDY) study [39]. Contig xx01_23 from Patient 1 was mapped with ≥1 read in 14 bio-samples from five bio-projects (Supplementary Table S2). In contrast, none of the seven contigs from Patient 3 had hits in any of the 118,731 NGS datasets. Non-human reads accounted for 1.82 ± 1.44% of the 10,178 WGS datasets in the TOPMed BioMe Biobank (185 TB in CRAM format), none of which mapped onto the eight contigs in the present study. Diamond BLASTX analysis of the eight contigs also revealed no meaningful hits on these reads under the definition of an E-value ≤1 × 10⁻⁵. 
Finally, mapping and Diamond BLASTX analysis showed the lack of hits of all unknown sequences on non-human reads from samples in children with cryptogenic hepatitis.
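
The saturation noted above for ddPCR follows from the Poisson correction that is standard in droplet digital PCR: the mean number of copies per droplet is estimated from the fraction of negative droplets, so the estimate is undefined once no negative droplets remain. The Python sketch below illustrates this. The function names are hypothetical, the droplet volume of 0.85 nL is an assumed nominal value rather than one stated in this study, and conversion to copies/mL of serum (which depends on extraction and elution volumes) is not attempted.

```python
import math

def copies_per_droplet(positive, total):
    """Poisson estimate of mean target copies per droplet.

    lambda = -ln(negative / total). Raises ValueError when there are
    no negative droplets, i.e. when the assay is saturated.
    """
    negative = total - positive
    if negative <= 0:
        raise ValueError("no negative droplets: readout is saturated")
    return -math.log(negative / total)

def copies_per_ul_reaction(positive, total, droplet_nl=0.85):
    """Copies per microliter of reaction mix.

    droplet_nl is the assumed droplet volume in nanoliters.
    """
    return copies_per_droplet(positive, total) / (droplet_nl * 1e-3)
```

With very few positive droplets the estimate is close to the simple ratio of positive to total droplets, which is why low-titer samples show greater variation among technical replicates.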

4. Discussion

The advent of NGS technology has greatly advanced viral discovery. However, contamination is a major issue for NGS-based approaches [40], especially when working with low-biomass samples like patient serum. The presence of contaminants can lead to misinterpretation of the data [41]. The current study identified eight unknown contigs from virome sequencing data. We were unable to validate contig xx01_23 from Patient 1 directly from the serum sample. Machine-learning analysis indicated that this contig was not a viral sequence. In addition, the contig was detected in multiple NGS data associated with various clinical phenotypes, even in the tick virome [42,43,44,45]. Taken together, these findings lead us to conclude that contig xx01_23 is likely a contaminant. In contrast, we have obtained multiple lines of evidence to suggest at least two of seven unknown contigs, namely xx03_260 (Seq260) and xx03_101, from Patient 3 are authentic sequences. Direct detection of Seq260 in serum samples obtained at two time points indicates that Seq260 is unlikely to be a contaminant from phlebotomy, reagents, or any steps of the experimental protocol. Furthermore, the strong support from machine-learning analysis indicates that these seven unknown contigs are viral in origin, likely from a single unknown human eukaryotic DNA virus.

The success of genome walking at the 3′ but not the 5′ end indicates that Seq260 may have only one strand. All primers in genome walking were designed based on a positive-sense orientation of Seq260 in accordance with the coding direction of its largest open reading frame (ORF) of 127 amino acids. Successful genome walking at the 3′ end but not the 5′ end therefore suggests a negative-sense orientation of Seq260. In addition, most type II restriction enzymes preferentially cleave dsDNA over single-stranded DNA (ssDNA) [46]. Digestion with the type II restriction enzyme NruI targeting the middle of Seq260 did not reduce the amplification signal, further supporting the sequence being single-stranded DNA. Finally, all seven unknown contigs from Patient 3 were assembled from 42 of 4,101,707 (0.001%) reads from the RT-tMDA product. The phi29 DNA polymerase used in tMDA is well known to favor a circular over a linear DNA template [47]. For example, human anelloviruses are ubiquitous commensal viruses with circular ssDNA genomes, usually present in serum at very low loads (about 1 × 10² to 1 × 10³ copies/mL) [48]. Virome sequencing using RT-tMDA can readily detect these viruses, which account for an average of ≥5% of reads, in serum samples from patients and blood donors [18,19]. Thus, the extremely low read count in NGS is more consistent with a linear than a circular template, suggesting that the putative virus in Patient 3 is likely to have a negative-sense linear ssDNA genome.

This putative virus was not detected in the screening cohort or in archived NGS data relevant to host virome studies. Notably, our screening cohort included patients with different liver disorders of unknown etiologies. Similarly, in silico screening covered NGS data of patients with cryptogenic liver disease, such as acute liver failure (SRA accession number: PRJNA389455) [49], post-transfusion hepatitis (SRA accession number: PRJNA217527) [41], and unexplained acute hepatitis in children [38]. However, a negative result in experimental and in silico screening does not necessarily mean that the putative virus is absent in these subjects, for a number of reasons. First, the proportion of patients with liver disease and no clear etiology is small, but the absolute number of patients with cryptogenic liver disease is large due to a big patient reservoir. Thus, the screening cohort in the present study is relatively small. Second, screening PCR was based on Seq260, which appears to be a coding domain. Mutations in coding domains are common in ssDNA viruses. In the case of human anelloviruses, PCR based on the initially discovered 500 nt coding DNA fragment indicated the prevalence of these viruses to be 3.3%. However, when PCR was carried out targeting the non-coding region, the viruses were found to be ubiquitous in the human population [50]. Finally, the very small size of most human viral genomes results in overwhelming domination of host nucleic acids in human samples in terms of molecular weight, leading to extremely low viral output in NGS-based virome sequencing data, which remains an unsolved technical issue [51]. For this reason, complete genome sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) using NGS is recommended by the World Health Organization (WHO) only in cases where the Ct value in real-time PCR is below 30 [52]. 
A well-known human linear ssDNA virus is parvovirus B19, which can package either the negative or positive DNA strands of its 5.6 kb genome into viral particles [53]. By analogy, it is plausible that our putative virus has a similar genome size. The small genome and potentially low viral load, as seen in Patient 3, could explain the negative result of in silico screening of virome sequencing data. Taken together, a cautious conclusion from our data is that the putative virus has a very low population frequency.

The present study has some limitations that should be acknowledged. The total volume of serum samples available at two time points from Patient 3 was only 0.7 mL. The limited volume did not allow for further experimentation to characterize the putative virus. In addition, only serum samples were available from patients with cryptogenic liver disease, including Patient 3. This precluded investigation of the putative virus directly in liver tissue. We are, therefore, unable to draw definitive conclusions with regard to etiological associations between the putative virus and cryptogenic liver disease. However, the data presented here lead us to conclude with confidence that we have discovered seven unknown sequences in a serum sample from a patient with cryptogenic liver cirrhosis. These unknown sequences are likely from a novel human virus with a negative-sense linear ssDNA genome. Further work is needed to conduct extended screening in general and target populations. Once additional subjects carrying the putative virus are identified, it will be feasible to determine the complete genome sequence, allowing comprehensive genome annotation and downstream experimentation, such as serological studies. Finally, our study also highlights a challenge in investigating unannotated sequences from viral metagenomics data, since they are likely a mixture of contaminants and real viral elements. The methods presented in the current study should be helpful for the development of generalized pipelines to unveil the nature of this so-called viral dark matter [15].

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/v17060812/s1, Table S1. Summary of eight contigs that cannot be annotated from virome analysis; Table S2. Screen of contig xx01_23 in virome sequencing data via read mapping.

Author Contributions

Conceptualization, X.F.; data curation, L.P.V.T. and X.F.; formal analysis, X.Z., I.X.F., L.P.V.T. and M.R.P.; funding acquisition, X.F.; investigation, X.Z. and Y.X.; methodology, X.Z. and X.F.; project administration, X.F.; resources, J.R., M.R.P., W.M.L., A.M.D.B. and X.F.; supervision, X.F.; validation, Y.X.; visualization, X.Z.; writing—original draft, X.F.; writing—review and editing, X.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the US National Institutes of Health (NIH) grant R21AI175438.

Institutional Review Board Statement

The study protocol using patient samples conformed to the ethical guidelines of the Declaration of Helsinki and was approved by the Saint Louis University Institutional Review Board (IRB protocol: SLU10592). The in silico data screening protocol was independently reviewed and approved by the Institutional Review Board (IRB protocol: SLU34248).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Raw Illumina sequence data after the removal of human sequences are available upon request made to the corresponding author. A total of eight unannotated contigs are reported in the Supplementary Materials (Table S1). The two contigs validated in the current study were deposited in GenBank under the accession numbers MW468091 and PV170683 for contigs xx03_260 (Seq260) and xx03_101, respectively.

Acknowledgments

The current study involves the use of patient specimens from three completed clinical trials, including acute liver failure study group, the adult-to-adult living-donor liver transplantation cohort study, and the nonalcoholic fatty liver disease adult. These three studies were conducted by study investigators and supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). In the current study, the patient specimens from the three studies were supplied by NIDDK Central Repository (NIDDK-CR). This manuscript was not prepared under the auspices of these studies and does not necessarily reflect the opinions or views of the original clinical trials, NIDDK-CR, or NIDDK.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

RT-PCR, reverse transcription and polymerase chain reaction; ALF, acute liver failure; HCV, hepatitis C virus; HBV, hepatitis B virus; dsDNA, double-stranded DNA; NIDDK-CR, the National Institute of Diabetes and Digestive and Kidney Diseases Central Repository; ALFSG, acute liver failure study group; A2ALL, adult-to-adult living-donor liver transplantation cohort study; NAFLD, nonalcoholic fatty liver disease; tMDA, template-dependent multiple displacement amplification; NCBI, National Center for Biotechnology Information; HMM, Hidden Markov Model; PE, primer extension; RCA, rolling-circle amplification; ddPCR, digital droplet PCR; NGS, next-generation sequencing; SRA, sequence read archive; WGS, whole-genome sequencing; NHLBI, National Heart, Lung, and Blood Institute; TOPMed, Trans-Omics for Precision Medicine; ssDNA, single-stranded DNA; WHO, World Health Organization; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.

References

1. Gan, C.; Yuan, Y.; Shen, H.; Gao, J.; Kong, X.; Che, Z.; Guo, Y.; Wang, H.; Dong, E.; Xiao, J. Liver diseases: Epidemiology, causes, trends and predictions. Signal Transduct. Target Ther. 2025, 10, 33.
2. Rane, S.V.; Jain, S.; Debnath, P.; Deshmukh, R.; Nair, S.; Chandnani, S.; Kamat, R.; Rathi, P. A comparative study of uncomplicated acute non-A-E hepatitis with acute viral hepatitis and acute onset autoimmune hepatitis. Indian J. Gastroenterol. 2024, 43, 443–451.
3. Vaz, J.; Eriksson, B.; Strömberg, U.; Buchebner, D.; Midlöv, P. Incidence, aetiology and related comorbidities of cirrhosis: A Swedish population-based cohort study. BMC Gastroenterol. 2020, 20, 84.
4. Nagaoki, Y.; Hyogo, H.; Ando, Y.; Kosaka, Y.; Uchikawa, S.; Nishida, Y.; Teraoka, Y.; Morio, K.; Fujino, H.; Ono, A.; et al. Increasing incidence of non-HBV- and non-HCV-related hepatocellular carcinoma: Single-institution 20-year study. BMC Gastroenterol. 2021, 21, 306.
5. Ganger, D.R.; Rule, J.; Rakela, J.; Bass, N.; Reuben, A.; Stravitz, R.T.; Sussman, N.; Larson, A.M.; James, L.; Chiu, C.; et al. Acute liver failure of indeterminate etiology: A comprehensive systematic approach by an expert committee to establish causality. Am. J. Gastroenterol. 2018, 113, 1319.
6. Mercado-Irizarry, A.; Torres, E.A. Cryptogenic cirrhosis: Current knowledge and future directions. Clin. Liver Dis. 2016, 7, 69–72.
7. Nishizawa, T.; Okamoto, H.; Konishi, K.; Yoshizawa, H.; Miyakawa, Y.; Mayumi, M. A novel DNA virus (TTV) associated with elevated transaminase levels in posttransfusion hepatitis of unknown etiology. Biochem. Biophys. Res. Commun. 1997, 241, 92–97.
8. Akiba, J.; Umemura, T.; Alter, H.J.; Kojiro, M.; Tabor, E. SEN virus: Epidemiology and characteristics of a transfusion-transmitted virus. Transfusion 2005, 45, 1084–1088.
9. Linnen, J.; Wages, J., Jr.; Zhang-Keck, Z.Y.; Fry, K.E.; Krawczynski, K.Z.; Alter, H.; Koonin, E.; Gallagher, M.; Alter, M.; Hadziyannis, S.; et al. Molecular cloning and disease association of hepatitis G virus: A transfusion-transmissible agent. Science 1996, 271, 505–508.
10. Simons, J.N.; Leary, T.P.; Dawson, G.J.; Pilot-Matias, T.J.; Muerhoff, A.S.; Schlauder, G.G.; Desai, S.M.; Mushahwar, I.K. Isolation of novel virus-like sequences associated with human hepatitis. Nat. Med. 1995, 1, 564–569.
11. Berg, M.G.; Lee, D.; Coller, K.; Frankel, M.; Aronsohn, A.; Cheng, K.; Forberg, K.; Marcinkus, M.; Naccache, S.N.; Dawson, G.; et al. Discovery of a novel human pegivirus in blood associated with hepatitis C virus co-infection. PLoS Pathog. 2015, 11, e1005325.
12. Kapoor, A.; Kumar, A.; Simmonds, P.; Bhuva, N.; Singh Chauhan, L.; Lee, B.; Sall, A.A.; Jin, Z.; Morse, S.S.; Shaz, B.; et al. Virome analysis of transfusion recipients reveals a novel human virus that shares genomic features with hepaciviruses and pegiviruses. mBio 2015, 6, e01466-15.
13. Satoh, K.; Iwata-Takakura, A.; Osada, N.; Yoshikawa, A.; Hoshi, Y.; Miyakawa, K.; Gotanda, Y.; Satake, M.; Tadokoro, K.; Mizoguchi, H. Novel DNA sequence isolated from blood donors with high transaminase levels. Hepatol. Res. 2011, 41, 971–981.
14. Liang, G.; Bushman, F.D. The human virome: Assembly, composition and host interactions. Nat. Rev. Microbiol. 2021, 19, 514–527.
15. Santiago-Rodriguez, T.M.; Hollister, E.B. Unraveling the viral dark matter through viral metagenomics. Front. Immunol. 2022, 13, 1005107.
16. Abecassis, M.M.; Fisher, R.A.; Olthoff, K.M.; Freise, C.E.; Rodrigo, D.R.; Samstein, B.; Kam, I.; Merion, R.M.; A2ALL Study Group. Complications of living donor hepatic lobectomy--a comprehensive report. Am. J. Transplant. 2012, 12, 1208–1217.
17. Neuschwander-Tetri, B.A.; Clark, J.M.; Bass, N.M.; Van Natta, M.L.; Unalp-Arida, A.; Tonascia, J.; Zein, C.O.; Brunt, E.M.; Kleiner, D.E.; McCullough, A.J.; et al. Clinical, laboratory and histological associations in adults with nonalcoholic fatty liver disease. Hepatology 2010, 52, 913–924.
18. Li, G.; Zhou, Z.; Yao, L.; Xu, Y.; Wang, L.; Fan, X. Full annotation of serum virome in Chinese blood donors with elevated alanine aminotransferase levels. Transfusion 2019, 59, 3177–3185.
19. Ren, Y.; Xu, Y.; Lee, W.M.; Di Bisceglie, A.M.; Fan, X. In-depth serum virome analysis in patients with acute liver failure with indeterminate etiology. Arch. Virol. 2020, 165, 127–135.
20. Wang, W.; Ren, Y.; Lu, Y.; Xu, Y.; Crosby, S.D.; Di Bisceglie, A.M.; Fan, X. Template-dependent multiple displacement amplification for profiling human circulating RNA. Biotechniques 2017, 63, 21–27.
21. Schmieder, R.; Edwards, R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 2011, 27, 863–864.
22. Langmead, B.; Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 2012, 9, 357–359.
23. O’Leary, N.A.; Wright, M.W.; Brister, J.R.; Ciufo, S.; Haddad, D.; McVeigh, R.; Rajput, B.; Robbertse, B.; Smith-White, B.; Ako-Adjei, D.; et al. Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016, 44, D733–D745.
24. Lloyd-Price, J.; Mahurkar, A.; Rahnavard, G.; Crabtree, J.; Orvis, J.; Hall, A.B.; Brady, A.; Creasy, H.H.; McCracken, C.; Giglio, M.G.; et al. Strains, functions and dynamics in the expanded Human Microbiome Project. Nature 2017, 550, 61–66.
25. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.; Pham, S.; Prjibelski, A.D.; et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012, 19, 455–477.
26. Niu, B.; Fu, L.; Sun, S.; Li, W. Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinform. 2010, 11, 187.
27. Johnson, L.S.; Eddy, S.R.; Portugaly, E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinform. 2010, 11, 431.
28. Skewes-Cox, P.; Sharpton, T.J.; Pollard, K.S.; DeRisi, J.L. Profile hidden Markov models for the detection of viruses within metagenomic sequence data. PLoS ONE 2014, 9, e105067.
29. Grazziotin, A.L.; Koonin, E.V.; Kristensen, D.M. Prokaryotic Virus Orthologous Groups (pVOGs): A resource for comparative genomics and protein family annotation. Nucleic Acids Res. 2017, 45, D491–D498.
30. Mistry, J.; Chuguransky, S.; Williams, L.; Qureshi, M.; Salazar, G.A.; Sonnhammer, E.L.L.; Tosatto, S.C.E.; Paladin, L.; Raj, S.; Richardson, L.J.; et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 2021, 49, D412–D419.
31. Ren, J.; Ahlgren, N.A.; Lu, Y.Y.; Fuhrman, J.A.; Sun, F. VirFinder: A novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome 2017, 5, 69.
32. Peng, P.; Xu, Y.; Aurora, R.; Di Bisceglie, A.M.; Fan, X. Within-host quantitation of anellovirus genome complexity from clinical samples. J. Virol. Methods 2022, 302, 114493.
33. Li, H.; Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25, 1754–1760.
34. Taliun, D.; Harris, D.N.; Kessler, M.D.; Carlson, J.; Szpiech, Z.A.; Torres, R.; Taliun, S.A.G.; Corvelo, A.; Gogarten, S.M.; Kang, H.M.; et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 2021, 590, 290–299.
35. The SRA Toolkit Development Team. Available online: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software (accessed on 1 May 2023).
36. Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25, 2078–2079.
37. Buchfink, B.; Xie, C.; Huson, D.H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 2015, 12, 59–60.
38. Morfopoulou, S.; Buddle, S.; Torres Montaguth, O.E.; Atkinson, L.; Guerra-Assunção, J.A.; Moradi Marjaneh, M.; Zennezini Chiozzi, R.; Storey, N.; Campos, L.; Hutchinson, J.C.; et al. Genomic investigations of unexplained acute hepatitis in children. Nature 2023, 617, 564–573.
39. Vehik, K.; Lynch, K.F.; Wong, M.C.; Tian, X.; Ross, M.C.; Gibbs, R.A.; Ajami, N.J.; Petrosino, J.F.; Rewers, M.; Toppari, J.; et al. Prospective virome analyses in young children at increased genetic risk for type 1 diabetes. Nat. Med. 2019, 25, 1865–1872.
40. Eisenhofer, R.; Minich, J.J.; Marotz, C.; Cooper, A.; Knight, R.; Weyrich, L.S. Contamination in low microbial biomass microbiome studies: Issues and recommendations. Trends Microbiol. 2019, 27, 105–117.
41. Naccache, S.N.; Greninger, A.L.; Lee, D.; Coffey, L.L.; Phan, T.; Rein-Weston, A.; Aronsohn, A.; Hackett, J., Jr.; Delwart, E.L.; Chiu, C.Y. The perils of pathogen discovery: Origin of a novel parvovirus-like hybrid genome traced to nucleic acid extraction spin columns. J. Virol. 2013, 87, 11966–11977.
42. Anani, H.; Destras, G.; Bulteau, S.; Castain, L.; Semanas, Q.; Burfin, G.; Petrier, M.; Martin, F.P.; Poulain, C.; Dickson, R.P.; et al. Lung Virome Convergence Precedes Hospital-Acquired Pneumonia in Intubated Critically Ill Patients. Preprint 2024. Available online: https://ssrn.com/abstract=5012218 (accessed on 1 October 2024).
43. Bal, A.; Pichon, M.; Picard, C.; Casalegno, J.S.; Valette, M.; Schuffenecker, I.; Billard, L.; Vallet, S.; Vilchez, G.; Cheynet, V.; et al. Quality control implementation for universal characterization of DNA and RNA viruses in clinical respiratory samples using single metagenomic next-generation sequencing workflow. BMC Infect. Dis. 2018, 18, 537.
44. Stegmüller, S.; Fraefel, C.; Kubacki, J. Genome sequence of Alongshan virus from Ixodes ricinus ticks collected in Switzerland. Microbiol. Resour. Announc. 2023, 12, e0128722.
45. Cordey, S.; Laubscher, F.; Hartley, M.A.; Junier, T.; Keitel, K.; Docquier, M.; Guex, N.; Iseli, C.; Vieille, G.; Le Mercier, P.; et al. Blood virosphere in febrile Tanzanian children. Emerg. Microbes Infect. 2021, 10, 982–993.
46. Pingoud, A.; Jeltsch, A. Structure and function of type II restriction endonucleases. Nucleic Acids Res. 2001, 29, 3705–3727.
47. Dean, F.B.; Nelson, J.R.; Giesler, T.L.; Lasken, R.S. Rapid amplification of plasmid and phage DNA using Phi 29 DNA polymerase and multiply-primed rolling circle amplification. Genome Res. 2001, 11, 1095–1099.
48. Brani, P.; Manzoor, H.Z.; Spezia, P.G.; Vigezzi, A.; Ietto, G.; Dalla Gasperina, D.; Minosse, C.; Bosi, A.; Giaroni, C.; Carcano, G.; et al. Torque Teno Virus: Lights and shades. Viruses 2025, 17, 334.
49. Somasekar, S.; Lee, D.; Rule, J.; Naccache, S.N.; Stone, M.; Busch, M.P.; Sanders, C.; Lee, W.M.; Chiu, C.Y. Viral surveillance in serum samples from patients with acute liver failure by metagenomic next-generation sequencing. Clin. Infect. Dis. 2017, 65, 1477–1485.
50. Kaczorowska, J.; van der Hoek, L. Human anelloviruses: Diverse, omnipresent and commensal members of the virome. FEMS Microbiol. Rev. 2020, 44, 305–313.
51. Houldcroft, C.J.; Beale, M.A.; Breuer, J. Clinical and biological insights from viral genome sequencing. Nat. Rev. Microbiol. 2017, 15, 183–192.
52. Rotondo, J.C.; Martini, F.; Maritati, M.; Caselli, E.; Gallenga, C.E.; Guarino, M.; De Giorgio, R.; Mazziotta, C.; Tramarin, M.L.; Badiale, G.; et al. Advanced molecular and immunological diagnostic methods to detect SARS-CoV-2 infection. Microorganisms 2022, 10, 1193.
53. Qiu, J.; Söderlund-Venermo, M.; Young, N.S. Human parvoviruses. Clin. Microbiol. Rev. 2017, 30, 43–113.

Figure 1. A machine-learning analysis of eight unknown contigs as viral sequences. The crAssphage genome (GenBank accession number NC_024711) was used as input to evaluate the performance of the two trained models applied in analyses. Eight unknown contigs were found to have very similar p-values with both models so that only a single p-value was shown for contig xx01_23 from Patient 1. The p-values of the seven unknown contigs from Patient 3 were all ≤ 0.005.

Figure 2. Results of Sanger sequencing of the amplicons from contigs xx03_260 (A) and xx03_101 (B). Each amplicon was sequenced in both directions. Priming sites for sequencing are indicated.

Figure 3. Summarized (RT) PCR results for samples from Patient 3. Serum samples were collected from Patient 3 at two timepoints. M, 50 bp DNA ladder (NEB).

Figure 4. Results of genome walking at the 3′ end of Seq260. The Sanger sequencing map is shown for the domain covering the novel 103 nt sequence that was extended with genome walking. Two junction sites between the known Seq260 and newly extended sequences are indicated.

Figure 5. PCR amplification of the putative viral genome with and without NruI digestion.

Figure 6. Quantitation of Seq260 copy number in serum. Amplitude outputs of digital-droplet PCR (ddPCR) from Patient 3 (bottom right) and the mock serum samples with concentrations of gBlocks Seq260 from 100 to 1 × 10⁸ copies/mL. Each mock serum sample was set up with three technical replicates.

Table 1. The list of patients used in the current study. Note that the diagnosis in the cohort for viral discovery is based on the record of patient’s last visit at Saint Louis University Hospital. Asterisk indicates that these patients have unknown etiologies from the parent studies. Abbreviations: CLD, cryptogenic liver disease; OLT, orthotopic liver transplantation; SLULC, Saint Louis University Liver Center; ARC, American Red Cross; NIDDK-CR, the National Institute of Diabetes and Digestive and Kidney Diseases Central Repository; NA, not available.

Cohort for Viral Discovery
Patient #	Sex	Age	Diagnosis (CLD)	Outcome	Specimen	Source
1	F	64	Cirrhosis	NA	Serum	SLULC
2	F	73	Hepatitis	NA	Serum	SLULC
3	F	60	Cirrhosis	OLT, deceased	Serum	SLULC
4	F	42	Cirrhosis	NA	Serum	SLULC
5	F	68	Cirrhosis	deceased	Serum	SLULC
6	M	40	Cirrhosis	NA	Serum	SLULC
7	M	59	Cirrhosis	OLT, deceased	Serum	SLULC
8	M	65	Cirrhosis	NA	Serum	SLULC
9	M	54	Hepatitis	NA	Serum	SLULC
Cohort for PCR Screening
Group #	Diagnosis	Number	Specimen	Source
1	Chronic HCV infection	100	Serum	SLULC
1	Chronic HBV infection	10	Serum	SLULC
2	Blood donors	200	Serum	ARC
3	Acute liver failure	50 *	Serum	NIDDK-CR
3	Liver transplantation	40 *	Serum	NIDDK-CR
3	Cirrhosis	25 *	Serum	NIDDK-CR

Table 2. The list of primers used in the current study. All primers were designed according to the orientation of the contigs that could generate the largest open reading frame (ORF), starting with any sense codon in a positive-sense manner, as predicted by the NIH ORFfinder. Asterisks denote phosphorothioate bonds, which make primers resistant to the exonuclease activity of phi29 DNA polymerase. Primers PAWR and PBWF had their 5′ ends modified by phosphorylation. Primer C28 had its 5′ end blocked by a C18 spacer. All primers and probes were synthesized by Integrated DNA Technologies (Newark, NJ). Abbreviations: NA, not applicable; FAM, 6-carboxyfluorescein; BHQ1, black hole quencher 1.

Application	Target		Primer Name	Polarity	Sequence (5′→3′)	Amplicon Size
RT-PCR/PCR Validation	Contig xx03_260 (Seq260)		F1	Forward	tccttgatgcaagccattg	237 bp
			R1	Reverse	gcgggataccaacaacaac
			F2	Forward	atgtcactggcatccttcttc
			R2	Reverse	taccaacaacaacccaacc
	Contig xx03_101		F1	Forward	gatggtgtccccactacagc	305 bp
			R1	Reverse	acaactcacgaccaggaacc
			F2	Forward	tttaagcagtggtatgccggt
			R2	Reverse	accatgttggtaattgccgga
	Contig xx01_23		F1	Forward	cgatcaagtactctcgccga	237 bp
			R1	Reverse	gccatcacatgcatcaggaa
			F2	Forward	gtactctcgccgatacgtct
			R2	Reverse	agcatcaaccgaaaagccag
Genome walking	Seq260	5′ end	PAWR	Reverse	phos-catcgactggaagtggttgg	NA
			AWF1	Forward	gaccgcacttcatccacatg
			AWR1	Reverse	atgcttcattgacatcctcatc
			AWF2	Forward	cgcgacctatcgttaccaac
			AWR2	Reverse	cagaagaaggatgccagtgac
			AWF7bp	Forward	taatccg
			AWR7bp	Reverse	ggatacc
		3′ end	PBWF	Forward	phos-ggcatccttcttctgttacctc	NA
			BWF1	Forward	cctatcgttaccaaccacttcc
			BWR1	Reverse	atgaagtgcggtcatcgac
			BWF2	Forward	aggtatcgcggttgtctgag
			BWR2	Reverse	atgcttcattgacatcctcatc
			BWF7bp	Forward	ctaactc
ddPCR	Seq260		260F	Forward	cagacagattacgatgaggatgt	100 bp
			260R	Reverse	ggtaacgataggtcgcgatatt
			260P	Reverse	FAM-tcgatgaccgcacttcatccacat-BHQ1
MDA			C28	NA	/5Sp18/nnnnn	20 kb

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
