Transcriptome from Paired Samples Improves the Power of Comprehensive COVID-19 Host-Viral Characterization

Previous transcriptome profiling studies showed significantly upregulated genes and altered biological pathways in acute COVID-19. However, changes in the transcriptional signatures during a defined time frame are not yet examined and described. The aims of this study included viral metagenomics and evaluation of the total expression in time-matched and tissue-matched paired COVID-19 samples with the analysis of the host splicing profile to reveal potential therapeutic targets. Prospective analysis of paired nasopharyngeal swabs (NPS) and blood (BL) samples from 18 COVID-19 patients with acute and resolved infection performed using Kallisto, Suppa2, Centrifuge, EdgeR, PantherDB, and L1000CDS2 tools. In NPS, we discovered 6 genes with changed splicing and 40 differentially expressed genes (DEG) that yielded 88 altered pathways. Blood samples yielded 15 alternatively spliced genes. Although the unpaired DEG analysis failed, pairing identified 78 genes and 242 altered pathways with meaningful clinical interpretation and new candidate drug combinations with up to 65% overlap. Metagenomics analyses showed SARS-CoV-2 dominance during and after the acute infection, with a significant reduction in NPS (0.008 vs. 0.002, p = 0.019). Even though both NPS and BL give meaningful insights into expression changes, this is the first demonstration of how the power of blood analysis is vastly maximized by pairing. The obtained results essentially showed that pairing is a determinant between a failed and a comprehensive study. Finally, the bioinformatics results prove to be a comprehensive tool for full-action insights, drug development, and infectious disease research when designed properly.


Introduction
During the current coronavirus disease-19 (COVID-19) pandemic, over 768 million infected people were registered, of which even 6.9 million ended fatally, https://www.who.int/ (accessed 18 June 2023).Accomplishing scientific achievements, such as pathogen identification and vaccine synthesis, were priority goals due to the heavy burden on health systems and the economy.Still, further efforts are necessary to understand the complex interactions between the host and severe acute respiratory coronavirus 2 (SARS-CoV-2).
Although COVID-19 apparently shares numerous clinical manifestations with other common respiratory viral infections, a significant peculiarity of SARS-CoV-2 infection has been observed over time [1].Firstly, many studies have proven that SARS-CoV-2 has the capacity to infect different human cell types managing to successfully evade the host immune response [2].Key features of the innate immune functions disturbance include unrecognizable viral replication, dysregulation of interferon response and/or signaling pathways, exhaustion and reduction of lymphocytes, particularly CD4+ T, CD8+ T, and natural killer (NK) cells, and finally, development of hyperinflammation and even cytokine storm [3][4][5][6].All listed and described features could lead to multiple-organ dysfunction syndrome (MODS) and acute respiratory distress syndrome (ARDS), a major cause of death in COVID-19 [7].However, while the evolution of SARS-CoV-2 continuously leads to the generation of new variants and subvariants, and the rate of infections is simultaneously increasing, the spectrum of clinical manifestations and risk factors for adverse events are also changing [8].
The integrity of the nasal microbiome and virome may also be disturbed by an exogenous infective stressor such as SARS-CoV-2, leading to upregulation of the host innate immune system, activation of dormant pathogens, and establishment of microbial nasal mucosal host gene patterns [3,9].To date, knowledge about these interactions is lacking not only in the field of COVID-19 but also in other infectious diseases.Due to limited methodological possibilities of targeted sequencing on cultures, next-generation sequencing (NGS)-based methods have opened up a wide range of possibilities for understanding metatranscriptomics.However, NGS still could not be widely used because of the complex and time-consuming execution, cost prohibition, demanding quality and sample volume, and difficulties in implementation to accurately and comprehensively describe the entire virome and microbiome together with the host response [10].Therefore, the optimization of new methods and computational workflow to simultaneously characterize virome, microbiome, and host genome directly from a variety of clinical samples is expanding [10][11][12].
As a primary site of infection pathogenesis, local nasal expression is more COVID-19dependent and specific.So far, transcriptome profiling studies have shown significantly upregulated genes and biological pathways altered during acute infection, such as proinflammatory cytokines, chemokines, enzymes in neutrophil-mediated immunity, and several IFN-stimulated genes [13][14][15][16].However, data showing experimental validation, potential diagnostic use, and detailed characterization of variations with age, sex, disease period, and/or severity are generally scarce [17].On the other hand, a systemic expression derived from whole blood could be more relevant to reflect the virus-induced host immune system imbalance.Thus, previous research conducted on blood samples from severe COVID-19 patients showed altered gene expression associated with inflammatory and hypercoagulability pathways, as well as elevated neutrophil activity and expression of coagulation and platelet function genes [18,19].
It could be observed that previous research has not yet examined and described changes in the transcriptional signatures during a defined time frame.This, in particular, implies considering paired samples related to acute and resolved infection because otherwise, the interpretation of human expression may vary [20].Moving toward a more comprehensive analysis of the host-viral relationship, it is of vital interest that the aims of this study include evaluating the expression in paired COVID-19 samples across different relevant tissues.The benefit of pairing the samples and eliminating inter-individual variation is quantified whenever possible.Moreover, this study analyzes the host splicing profile showing potential therapeutic targets.

SARS-CoV-2 Dominates the Metagenomics Findings in Both Case and Control Cohorts
Metagenomic alignment of 36 nasopharyngeal swab samples yielded a total of 3772 species, out of which 2699 had unique mappings.After the analysis of unique reads, only 3 species that belonged to different assemblies of SARS-CoV-2 and SARS-CoV were statistically significantly different between cases and controls (Ensembl IDs 9000092, 9000103, 694009).Considering all reads (unique and multi-mapped), no individual species showed statistically different abundances.Still, the cases' mean log-transformed value was bigger (0.008 vs. 0.002, p = 0.019), indicating a significant reduction in the viral RNA abundance at the control stage.Summation was attempted for all the phages and multiple bacterial taxa, but no significant differences were achieved.
The same analysis was performed on 12 pairs of blood samples yielding a total of 1794 species, out of which 1162 had unique mappings.There were no statistically significant differences in individual or summed categories for all or unique reads.

DEG Analysis in Blood Rescued by Pairing
DEG analysis for NPS and BL samples was repeated with both paired and independent designs, yielding a different result.The independent design of blood samples gave only two genes that were statistically significant, so this analysis was deemed unsuccessful and was not further processed.Conversely, paired design for BL samples yielded 78 significant DEGs with only 3 (3.8%)downregulated (Table 1).NPS samples yielded a comparable number of significant genes regardless of pairing-53 genes in an unpaired (Supplementary Table S1) mode and 40 genes in paired mode (Table 2), with an overlap of 23 genes.Of those 23 upregulated genes, the majority were interferon-regulated genes.Among prominent ones are CXCL10, a marker of acute viral infection, ISG15, both an extracellular cytokine and an intracellular protein modifier, IFI6 that may be involved in the regulation of cell apoptosis and IFI27, a proven predictor for COVID-19 outcomes.The upregulation was found in major histocompatibility complex genes, HLA-DR and HLA-A.Downregulation was noticed in only 2 genes in the unpaired design.The asymmetry in the regulation direction can be explained by the significant immune activation across the transcriptome resulting mostly in upregulation.Based on protein-protein interactions, there are three clusters of genes without interconnections (Figure 1).The largest cluster consists of 19 histone genes with abundant mutual connections between the members.The immunity-related cluster has 5 genes-IFI27, IFI44, LY6E, EPSTI1, and SIGLEC1.The smallest cluster has only two interacting genes-IGJ and TNFRSF17.

Pathway Analysis Confirms Known Disease Aspects and Reveals New Patterns
For the three successful design-tissue combinations (blood paired, nasopharyngeal paired, nasopharyngeal independent), we obtained significant results of pathway analysis for three ontologies-cellular components, biological processes, and molecular functions.The potentially asymmetric upregulation at the individual gene level is validated at this step.Molecular function pathways for BL samples (Table 3) show clinically expected downregulation in olfactory receptor activity, odorant binding, oxygen and heme binding, etc.The expected upregulation is mostly found in immune pathways (immunoglobulin receptor binding and antigen binding) and RNA metabolism (RNA binding and structural constituent of ribosomes), reflecting the infection mechanism of an RNA virus.Analysis of NPS with the same ontology gives reduced olfactory and G-protein coupled receptor activity, with the immune component shown as increased cytokine receptor binding (paired design) and antigen binding (unpaired design).The other tissue-ontology combinations show analogous upregulation involving multiple cell structures (Supplementary Tables S2-S9).
Based on protein-protein interactions, there are three clusters of genes without interconnections (Figure 1).The largest cluster consists of 19 histone genes with abundant mutual connections between the members.The immunity-related cluster has 5 genes-IFI27, IFI44, LY6E, EPSTI1, and SIGLEC1.The smallest cluster has only two interacting genes-IGJ and TNFRSF17.String database protein-protein interactions.The nodes represent genes, while the line width represents the strength of interaction.There are three clusters of genes without interconnections-nineteen histone genes, an immunity-related cluster with five genes, and the smallest cluster with two interacting genes.String database protein-protein interactions.The nodes represent genes, while the line width represents the strength of interaction.There are three clusters of genes without interconnections-nineteen histone genes, an immunity-related cluster with five genes, and the smallest cluster with two interacting genes.

Non-Histone Gene Signatures Are Successfully Negated by L1000 Perturbagens
Comparison of the gene expression patterns with known perturbagens in L1000 gave the top-ranking substance BML-259 a score of 0.24.After analysis of the individual overlaps of substances, almost none of them included genes from the histone family.Because our programmatic annotation failed to recognize 14 of 20 histone genes (HIST*) in our list, we suspected a versioning annotation problem and repeated the analysis without the histone genes.The top-ranking substance remained the same, but the score increased to 0.47 (Table 4), and the best combination of BRD-K12184916 and Nocodazole had a score of 0.65 (Table 5).Nocodazole ranked 34th on the single substance list but appeared in 8 out of 12 top-ranking pairs and included the gene IFI27 with the lowest p-value.Clustered visualization shows that most single substances primarily affect the genes from the immunoglobulin superfamily (IG*) (Figure 2).

Alternative Splicing Changed at Both Event and Isoform Level
AS analysis events and isoforms in the independent design for two types of tissues are shown in Table 6.NPS samples show three genes with differentially spliced transcripts and three genes with differentially spliced events, with DCUN1D3 appearing in both groups.BL samples show thirteen genes with differentially spliced transcripts and three genes with differentially spliced events without mutual overlap.There were no overlapping genes between NPS and BL samples.

Discussion
To date, published studies have focused only on local nasal or systemic gene expression in acute COVID-19.It was unknown if there were any changes when the acute illness resolved.Taking this hypothesis into account, we examined gene expression profiles in paired designed samples of total RNA in acute and resolved SARS-CoV-2 infection to assess complete characterization and time-related differences of host-viral relationship in COVID-19 patients.The pathways that were mostly enriched were largely ones that control infection and inflammation, smell function, as well as a major portion of the receptor signaling and oxygen binding pathways.Besides identified biomarkers, this study revealed new drug co-targeting pathways.
Considering that the blood transcriptome reflects signals from different tissues and host biochemical processes, dispersion in RNA quantification could weak and mask many of those tissue-specific signals [21].Moreover, individual variability influences the capacity to identify the processes primarily located in the blood, such as immune response pathways [17].Thus, a larger sample size and additional standardizations in biological sampling are required.In our study, the blood samples would yield no usable results after removing the failed samples.On the other hand, nasopharyngeal swabs seem relatively robust towards the individual variability-the smaller set of genes is expressed in differentiated tissues such as nasal mucosa, thus increasing the number of transcripts per gene and the power of the study.
Using sample pairing design in this study increased the number of significant genes from 2 to 78, enabling a meaningful pathways analysis and functional/clinical validation of the project.Although this design represents a simple method for increasing power when interpersonal variability is large, genomic studies rarely take advantage of it when it comes to transcriptome analysis.Single-cell sequencing could be done at multiple time points to characterize cells in both time and space, but we strive to measure the improvement when applied in bulk sequencing for the characterization of systemic effects [22].The pairing of samples does require repeated visits, patient tracking, and follow-up.Additionally, it is easier to find healthy controls for infectious diseases characterized by a complete resolution, while chronic infections with pathogen persistence or disease progression do not allow sampling due to the lack of a clear healthy control phase [23].Regardless of the impossibility of obtaining RNA biomarkers of a specific disease, it is still possible to sample the RNA at two distinct time points and evaluate biomarkers of disease progression instead of the disease itself.Therefore, we emphasize that the gain in statistical power could be utilized in multiple fields and clinical problems.
The enrichment analysis has shown down-regulation of olfactory receptor, G proteincoupled receptor activity, and up-regulation of cytokine receptor binding in the nasal epithelium of paired samples, of which the first two were also down-regulated in paired blood samples.Olfactory dysfunction is a common symptom experienced by almost 53% of COVID-19 patients and affects approximately 7% of the COVID-19 convalescents [24,25].There are several proposed mechanisms for SARS-CoV-2-related hyposmia/anosmia, one of them being T-cell-mediated inflammation persistent in the olfactory epithelium and the associated decline in the number of olfactory sensory neurons (OSNs) [26][27][28].The exact mechanism of OSN decline is still not clear.The cilia of the OSNs have G-protein-coupled olfactory receptors involved in the transduction of extracellular signals through second messenger cascades controlled by heterotrimeric guanine nucleotide-binding proteins [29].It is tempting to speculate that found down-regulation of olfactory receptor and G proteincoupled receptor activity may initiate the olfactory dysfunction in COVID-19 patients.Further, persistent viral infection could also drive ongoing damage to OSNs [30].Indeed, the evidence for active SARS-CoV-2 infection was found in all of our nasopharyngeal swabs, although the cases' mean log-transformed value was bigger (0.008 vs. 0.002, p = 0.019), indicating a significant reduction in the viral RNA abundance at the control stage.
IFI27, interferon alpha inducible protein 27, transcription has recently been a proven predictor for COVID-19 outcomes [35].A higher level of IFI27 with a lower level of DCUN1D3 is said to increase the risk for COVID-19 [36].Earlier, it was also known as a biomarker for discrimination between influenza and bacteria in patients with suspected respiratory infection [37].In our study, regardless of the type of sample, this was the strongest upregulated gene in acute COVID-19.These data, together with previously published studies, suggest that prognostic biomarkers targeting the family of IFI27 genes could potentially replace conventional diagnostic tools [35].It could be of particular importance in future virus pandemics because IFI27 expression appears to be specific to viral illness [38].Moreover, as the kinetics of IFI27 expression are poorly understood, the unresolved dilemma reported by other researchers was whether serial measurement of IFI27 expression would be more informative than a single time point measurement [35].In order to resolve those dilemmas, genomic patterns from double-paired samples were, for the first time, comprehensively analyzed in our work.In this way, our results confirmed the hypotheses of previous authors.
Viruses manipulate cell cycle progression to generate resources and conditions favorable for viral production.Still, the effect of SARS-CoV-2 on cell cycle progression remains largely unknown [39].Therefore, the identification of novel drug targets is an essential puzzle of comprehensive research.In the present study, we aimed to identify gene expression patterns with both single and co-targeted highly potent therapeutics.The L1000 perturbagens analysis showed BML-259, a potent cyclin-dependent kinase 5 (CDK5) inhibitor, as the top-ranking drug for acute COVID-19.As CDKs are key regulators of cell cycle progression, they represent promising therapeutic goals for cancer and neurodegenerative diseases.CDK has also been the target of various viral infections such as HIV, Herpes simplex virus, Zika, and Hepatitis B viruses, where its expression is altered in the affected cell [40].Interestingly, when it comes to potential COVID-19 treatment, most literature data have been focused on CDK2, but not CDK5, identified by our study [40,41].
The greatest significance of our survey is that it reveals new co-targeting pathways of high potential for the success of synergistic drug action of nocodazole and BRD-K12184916.Even 65% of genes of acute COVID-19 patients are included in described co-targeted pathways: microtubule cytoskeleton organization and condition of hypoxia.These gene co-targets could serve as a promising step toward the identification of additional drug options, especially because some of them are already in use in other indications.The cytoskeleton, in particular, microtubules, have an essential role in cell mitosis, organization of cytoplasm, and controlling of cell movement, cell signaling, and trafficking of organelles.Their disruption leads to cell cycle arrest and loss of cellular architecture [42].Therefore, various microtubule inhibitors, especially those that lead to cell arrest and apoptosis, such as nocodazole, have been used in the treatment of malignancies to synchronize cell proliferation.Moreover, there are indications that this mechanism could also be used for the purpose of antiviral action in human infections caused by West Nile virus, Cytomegalovirus, and even SARS-CoV-2 [43][44][45].Inhibitors of the PI3K/AKT/mTOR signaling pathway, such as dactolisib, have potential antineoplastic activity targeting tumor cell apoptosis and growth inhibition in PI3K/mTOR-overexpressing tumor cells.In addition, those inhibitors have been shown as novel candidates to treat pathologic hypoxia that occurs in most human solid tumors [46].Finally, even without these relatively new anti-hypoxia theories, those inhibitors have also been investigated in COVID-19.As they could upregulate IFN-induced antiviral responses, research was focused on reducing the COVID-19 severity and use in COVID-19 post-exposure prophylaxis in elderly patients [47,48].
The main limitation of the study was the sample size and the lack of clinical metadata that would enable further clinical correlations and stratification with transcriptomic biomarkers.The sample size was sufficient for the paired analysis performed without additional covariates but would be insufficient for investigating a subsample defined with the additional clinical data.As a consequence, even with strong statistical signals arising from the transcription of individual genes such as IFI27, we were unable to extrapolate their prognostic value to a broader clinical context.On the other hand, the study managed to unify the molecular mechanisms regardless of the differences in the clinical presentation of the disease, which could be considered a result that overcomes the described limitations.
Metagenomics analyses from the total RNA are susceptible to bias from existing fragments and show positive results even when conventional testing methods give a negative signal.However, the coexistence of other RNA viruses was not shown during the active disease or in the convalescence period.
Designing genomic research on a relatively novel subject harbors difficulty in formally estimating the required sample size and the proper design since maximizing the study power increases the probability of its success.Nasopharyngeal swabs are viable targets for both host and viral expression studies.Although they do not require sample pairing, those samples give limited insight into the systemic events.On the other hand, the analysis of blood samples benefits immensely when a sample pairing strategy is applied.In particular, the unpaired expression analysis performed in our research essentially failed to pinpoint significant changes, but the pairing successfully elucidated different pathways as well as the potential therapy targets with high concordance to clinical insights.Thus, the obtained results essentially showed that pairing is a determinant between a failed and a comprehensive study.

COVID-19 Patients and Samples
We performed a prospective analysis of nasopharyngeal swabs (NPS) and blood (BL) samples collected from 18 COVID-19 patients infected between December 2020 and April 2021.The patients were all recruited in the acute phase of COVID-19, during the 7-day period from the onset of the disease.Among 18 patients, 12 were hospitalized but with different disease severity, and 6 non-hospitalized individuals were tested due to mild symptoms and suspicion of COVID-19.Hospitalized patients were treated at the University Hospital Center Dr Dragisa Misovic, Belgrade, and the Serbian Institute of Occupational Health, Belgrade.Patients with mild symptoms were tested on request in the Laboratory of Molecular Microbiology, Institute for Biocides and Medical Ecology, Belgrade.
Both NPS and BL samples were collected from every participant immediately after inclusion in the study, and they were defined as "cases."Their samples were paired during the second sampling (nasopharyngeal swab and blood) after resolving the SARS-CoV-2 infection (2-3 weeks after initial recruiting and sampling), and then they were defined as "controls".At the moment of the first sampling, according to the set inclusion criteria, nasopharyngeal swabs all tested positive for SARS-CoV-2 RNA by Real-Time PCR.On the other hand, paired nasopharyngeal swabs taken after resolving the acute infection all tested negative for SARS-CoV-2 RNA by the same methodology.

Sequencing
Commercial sequencing was carried out at Novogene Bioinformatics Technology Co., Ltd., in Beijing, China.Messenger RNA was purified from total RNA using poly-T oligoattached magnetic beads after rRNA removal (Illumina, Kit Cat.No. 20020597).After fragmentation, the first strand of cDNA was synthesized using random hexamer primers, followed by the second strand of cDNA synthesis (Illumina, Kit Cat.No. 20020597).The library was ready after end repair, A-tailing, adapter ligation, size selection, amplification, and purification.Paired-end sequencing of 150 bases each was performed on Illumina NovaSeq 6000.QC was done according to the sequencing provider's specification, and 6 samples of blood with their pairs were removed from the blood batch of samples.The total number of paired samples was 18 pairs of nasopharyngeal swabs and 12 samples of blood.The average base quality Phred score was above 30 at all locations in all samples.

Bioinformatic Processing
Both nasopharyngeal and blood samples are quantified in an identical manner using Kallisto 0.44 [49] for pseudo-alignment to Gencode v35 transcriptome [50].The internal probabilistic model assigns reads that do not have a transcript-level unique mapping with maximum likelihood providing abundance estimated (transcripts-per-million). Raw

Figure 1 .
Figure 1.String database protein-protein interactions.The nodes represent genes, while the line width represents the strength of interaction.There are three clusters of genes without interconnections-nineteen histone genes, an immunity-related cluster with five genes, and the smallest cluster with two interacting genes.

Figure 1 .
Figure 1.String database protein-protein interactions.The nodes represent genes, while the line width represents the strength of interaction.There are three clusters of genes without interconnections-nineteen histone genes, an immunity-related cluster with five genes, and the smallest cluster with two interacting genes.

Figure 2 .
Figure 2. Hierarchically clustered heatmap of overlaps for individual perturbagens.Most substances match the genes from the immunoglobulin superfamily.Red genes are upregulated, and they are paired with blue perturbagen signatures of opposite sign.Opposite logic is valid for the downregulated PDZK1IP1 gene.

Figure 2 .
Figure 2. Hierarchically clustered heatmap of overlaps for individual perturbagens.Most substances match the genes from the immunoglobulin superfamily.Red genes are upregulated, and they are paired with blue perturbagen signatures of opposite sign.Opposite logic is valid for the downregulated PDZK1IP1 gene.

4. 2 .
RNA Extraction and Real-Time PCR RNA isolation and SARS-CoV-2 detection was performed in the Laboratory of Molecular Microbiology, Institute for Biocides and Medical Ecology, Belgrade.RNA extraction from 200 µL of whole blood or 500 µL of viral transport medium, in which nasopharyngeal swabs were placed, was carried out using RNeasy Mini Kit (Qiagen, Kit Cat.No. 74104).Real-time PCR to confirm SARS-CoV-2 positivity was routinely performed from 5 µL of eluted RNA using GeneFinderTM COVID-19 Plus RealAmp Kit (OSANG Healthcare Co., Seongnam, Republic of Korea) following the manufacturer's protocol.Quant StudioTM 5 Real-Time PCR Instrument (Thermo Fisher Scientific, Waltham, MA, USA) was used for the amplification and detection of viral RNA.All eluates were stored at −80 • C until shipment to the other laboratory for the NGS analysis.

Table 1 .
Results of paired DEG analysis in blood (BL) showing genes with FDR corrected significant p-value.

Table 2 .
Results of paired DEG analysis in nasopharyngeal swabs (NPS) showing genes with FDR corrected significant p-value.

Table 3 .
Molecular function pathways for blood (BL) samples.

Table 4 .
Predicted overlap of L1000 genes with individual perturbagen signatures of the opposite direction.All the differentially expressed genes are included in the analysis.Full table with links is given in Supplementary TableS10.

Table 5 .
Predicted overlap of L1000 genes with signatures of gene pairs of the opposite direction.As opposed to the previous table, the histone genes are excluded from the comparison.The highest overlap, as represented by the score, is 64.71%.

Table 4 .
Predicted overlap of L1000 genes with individual perturbagen signatures of the opposite direction.All the differentially expressed genes are included in the analysis.Full table with links is given in Supplementary TableS10.

Table 6 .
Overview of alternatively spliced events and isoform across nasopharyngeal swabs (NPS) and blood (BL) samples.