Insights into the Transcriptome of Human Cytomegalovirus: A Comprehensive Review

Human cytomegalovirus (HCMV) is a widespread pathogen that poses significant risks to immunocompromised individuals. Its genome spans over 230 kbp and potentially encodes over 200 open reading frames. The HCMV transcriptome consists of various types of RNAs, including messenger RNAs (mRNAs), long non-coding RNAs (lncRNAs), circular RNAs (circRNAs), and microRNAs (miRNAs), and insights into their biological functions are emerging. HCMV mRNAs are involved in crucial viral processes, such as viral replication, transcription, and translation regulation, as well as immune modulation and other effects on host cells. Four lncRNAs (RNA1.2, RNA2.7, RNA4.9, and RNA5.0) have been identified in HCMV; they play important roles in infection, such as bypassing acute antiviral responses, promoting cell movement and viral spread, and maintaining HCMV latency. CircRNAs have attracted attention for their diverse biological functions, including associations with different diseases, acting as microRNA sponges, regulating parental gene expression, and serving as translation templates. HCMV also encodes miRNAs that play critical roles in silencing human genes, among other functions. This review gives an overview of human cytomegalovirus and current research on the HCMV transcriptome during lytic and latent infection.


Introduction
Human cytomegalovirus (HCMV) is a double-stranded DNA virus of the beta-herpesvirus subfamily that infects 40% to 60% of individuals in industrialized countries and up to 100% in developing countries. It is transmitted through body fluids, blood transfusion, and organ transplantation. Infection is mostly asymptomatic in immunocompetent individuals, but HCMV remains latent after primary infection and can reactivate during pregnancy or in individuals with cancer, transplanted organs, AIDS, or other immune deficiencies, leading to severe disease in the lung, liver, colon, eye, or brain, such as pneumonitis, hepatitis, colitis, and CMV retinitis [1]. In addition, congenital CMV (cCMV) infection is a leading cause of birth defects, with approximately 10% of infants with cCMV displaying central nervous system (CNS) impairments [2].
The mature HCMV virion contains a large, linear, double-stranded DNA genome tightly packaged within a capsid, which is surrounded by a tegument layer and an envelope [3]. The HCMV genome is approximately 230 to 240 kbp in length. While the functions of some HCMV genes remain unknown, significant progress has been made in identifying the functions of genes related to the different stages of infection [8]. In this review, we provide an overview of current research on the four classes of HCMV transcripts and their respective roles and functions.

HCMV Messenger RNA (mRNA)
HCMV was first discovered in 1881 [9]. Early sequencing and annotation of the HCMV laboratory strain AD169 identified around 208 ORFs [10] (Figure 2), but subsequent re-evaluations estimated the number of protein-coding sequences at 164 to 220 [11][12][13][14][15][16]. Among these, 45 ORFs were found to be essential for viral replication in fibroblasts, while 107 were deemed nonessential [16]. More recently, ribosome profiling identified over 400 newly translated ORFs, bringing the total to over 750, with many transcripts containing multiple translationally active ORFs [17]. Subsequently, a comprehensive analysis reported 248 transcription start sites, 116 transcription termination sites, and 80 splicing events within the HCMV genome, and identified and annotated 291 previously undescribed or only partially annotated transcript isoforms. Most of these transcripts contain multiple translationally active ORFs [15], adding to the complexity of HCMV gene expression and regulation.

Figure 2. The ORFs are color-coded according to the growth properties of their corresponding virus gene deletion mutants.
The expression of HCMV genes is temporally regulated and is divided into immediate early (IE), early (E), and late (L) gene expression [18] (Figure 3). IE genes encode regulatory trans-acting factors, E gene expression requires the de novo expression of IE genes, and late gene expression occurs after the onset of viral DNA replication [19,20]. Because of the complexity of the HCMV genome, the roles and protein-coding potential of many ORFs remain largely unknown and warrant further investigation. Among the ORFs characterized so far, major functions include roles in viral replication and translation regulation.
Research has identified over 30 ORFs that are vital for viral replication. For example, de novo synthesis of pUL21A promotes viral DNA synthesis, which is required for the late accumulation of IE transcripts and the establishment of productive infection [21,22] (Figure 2). The UL123-encoded 72-kDa IE1 protein promotes viral replication and transcription by antagonizing histone deacetylation, whereas pUL76 has a dominant-negative effect on replication [23]. Other HCMV genes are involved in regulating translation. The HCMV gpUL4 mRNA contains a 22-codon upstream open reading frame (uORF2) whose product represses downstream translation by blocking translation termination, causing ribosomes to stall on the mRNA [24,25]. HCMV pUL38 preserves mTORC1 kinase activity, which promotes translation initiation [26], and pUL38 and pUL69 support translation by antagonizing the mTOR target 4EBP1 [27]. PTRS1 enhances translation through both PKR-dependent and PKR-independent mechanisms, limiting the host's antiviral response [28].
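The uORF2 example in the gpUL4 mRNA shows how a transcript can carry a short translatable ORF upstream of its main coding sequence. As a rough illustration of how candidate uORFs can be located in a sequence, the Python sketch below scans a transcript for ATG-initiated ORFs that begin upstream of the annotated main start codon and reads each in frame to its first stop codon. This is an assumption-laden sketch, not the ribosome-profiling approach used in [17]: the function name `find_uorfs` is hypothetical, only ATG starts are considered (ribosome profiling can also detect non-ATG starts), and codon counts include the start codon but exclude the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna: str, main_start: int) -> list[tuple[int, int, int]]:
    """Return candidate uORFs as (start, end, n_codons) tuples.

    Illustrative assumptions: `mrna` is the transcript sequence (DNA or RNA
    alphabet), `main_start` is the 0-based index of the main start codon, a
    uORF is any ATG upstream of `main_start` read in frame to the first stop
    codon, and `n_codons` counts the ATG but not the stop codon.
    """
    seq = mrna.upper().replace("U", "T")
    uorfs = []
    for i in range(main_start):
        if seq[i:i + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG until a stop codon.
        for j in range(i, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                uorfs.append((i, j + 3, (j - i) // 3))
                break
        # An ATG with no in-frame stop before the transcript ends is skipped.
    return uorfs


if __name__ == "__main__":
    # Hypothetical toy transcript: ATG-AAA-TAG uORF, main ORF starts at index 14.
    print(find_uorfs("GCCATGAAATAGCCATGGGGTAA", main_start=14))  # prints [(3, 12, 2)]
```

Because the end coordinate includes the stop codon, a uORF that overlaps the main start codon (as an out-of-frame ATG might) is still reported, which is useful when examining leaders where uORF and main ORF overlap.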

HCMV Long Non-Coding RNAs (lncRNAs)
Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that do not encode proteins. In HCMV, four main lncRNAs, namely RNA1.2, RNA2.7, RNA4.9, and RNA5.0, account for over 50% of the poly(A)+ viral transcriptome in all infection states (Table 2). Among these, RNA1.2, RNA2.7, and RNA4.9 have been found to play important roles in the HCMV life cycle, particularly during lytic replication [5]. However, further investigation is still required to fully understand the functions of HCMV lncRNAs. Here, we summarize existing research on the four HCMV lncRNAs and their potential contributions to viral replication.

RNA1.2
RNA1.2 is among the earliest HCMV transcripts to be discovered and accounts for approximately 7.9% of viral poly(A) RNA transcription [12]. Although many of its functions are still unknown, RNA1.2 likely does not play a major role in the main processes of virus production, such as entry, genome replication, virion assembly, and egress. However, research indicates that RNA1.2 modulates the expression of multiple cellular genes and facilitates evasion of acute antiviral responses. For example, RNA1.2 downregulates TPRG1L, which in turn blocks NF-κB and suppresses both the expression and secretion of proinflammatory mediators such as IL-6 [43]. Further investigation of RNA1.2 could therefore contribute to the development of treatments for IL-6-associated illnesses. In addition, the RNA1.2 locus generates multiple natural antisense transcripts (NATs) during late infection. Although these NATs appear to regulate sense-strand expression, further research is needed to determine their importance in regulating the RNA1.2 gene [44].

RNA2.7
RNA2.7 is the most abundant HCMV lncRNA, accounting for approximately 29% of the total poly(A)+ viral transcriptome [5]. Extensive research indicates that this lncRNA plays an important role during infection, particularly in promoting the movement and detachment of infected cells late in infection [517]. Specifically, RNA2.7 facilitates cell movement and viral spread by stabilizing mRNAs rich in A and U nucleotides, and it regulates a large number of cellular genes late in lytic infection, many of which are associated with cell movement [517]. RNA2.7 has also been shown to increase cell-to-cell viral transmission, likely because of its role in facilitating cell movement [517]. Moreover, RNA2.7 may be involved in processes related to latency or reactivation, such as cellular transcription and cell-cycle progression, and it boosts viral replication by reducing the host's response to infection through repression of Pol II Ser2 phosphorylation [42]. Although some functions of RNA2.7 have been revealed, as with the other HCMV lncRNAs, further research is required to fully understand its importance and the mechanisms by which it influences the viral life cycle.

RNA4.9
Unlike the other HCMV lncRNAs, RNA4.9 localizes to the viral replication compartment, where it is transcribed during early infection; its levels increase as infection progresses [192]. A notable feature of RNA4.9 is its ability to form RNA-DNA hybrids (R-loops) through its G+C-rich region. This interaction may be involved in the initiation of DNA replication [518], as reduced RNA4.9 expression correlates with decreased viral DNA replication, suggesting that RNA4.9 plays a direct role in viral DNA replication and growth [192]. RNA4.9 may also be involved in HCMV latency: it associates with polycomb repressive complex 2 (PRC2) [519], which regulates viral latency in herpes simplex virus (HSV) [191], raising the possibility that RNA4.9 plays a similar role in HCMV. Determining whether virus mutants that do not express RNA4.9 fail to establish and maintain latency could clarify the role of RNA4.9 in HCMV latency [520].

RNA5.0
RNA5.0 is a stable intron expressed during HCMV infection; it is transcribed by RNA polymerase II and has a high adenine and thymine (A+T) content [5,521]. Its expression is much lower than that of the other HCMV lncRNAs, accounting for approximately 0.1% of the total viral transcriptome, and because it lacks a poly(A) tail it is absent from the poly(A)+ viral transcriptome [5]. RNA5.0 is primarily localized in the nucleus during infection and lacks potential protein-coding ORFs [521]. Although its exact functions remain largely unknown, research suggests that, unlike RNA1.2, RNA2.7, and RNA4.9, RNA5.0 may not be necessary for lytic replication or for the maintenance of latent reservoirs [5]. RNA5.0 may nonetheless play a role in activating transcription, regulating gene silencing, or influencing HCMV latency, or it could serve a function, such as immune evasion, that is important during infection of the host but not in cultured cells [521]. Given the limited knowledge of its precise functions, further investigation is needed to fully elucidate the role of RNA5.0 in HCMV infection.

HCMV Circular RNAs (circRNAs)
Circular RNAs (circRNAs) are a class of RNA molecules formed through back-splicing, resulting in covalently closed loops that lack a 5′ cap and a 3′ poly(A) tail [522]. Because of their circular structure, circRNAs are more resistant than linear RNAs to exoribonucleases such as RNase R [522]. CircRNAs have been identified in many cell types and have been associated with various diseases, indicating that they have important biological functions. They can act as microRNA (miRNA) sponges, regulate parental gene expression, and even serve as translation templates [522,523]. CircRNAs have also been identified in cells infected with DNA viruses, such as Epstein-Barr virus (EBV) [524][525][526][527], Kaposi's sarcoma-associated herpesvirus (KSHV) [525,528,529,530], and human papillomaviruses (HPVs) [531], and with RNA viruses, such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [532] and murine hepatitis virus (MHV) [533]. This suggests that circRNAs may play important roles in viral life cycles and infection.
In our previous study, we bioinformatically predicted 704 candidate circRNAs encoded by the HCMV TB40/E strain and 230 encoded by the HCMV HAN strain (Figure 4) [6], and experimentally confirmed 324 back-splice junctions (BSJs) from three HCMV strains: Towne, TB40/E, and Toledo. A recent study by Deng et al. also experimentally confirmed 629 HCMV circRNAs from the HAN strain [534]. Importantly, we found 12 circRNAs whose 60-bp BSJ sequences aligned over more than 40 bp between the HAN and TB40/E strains and that were also expressed in several cell types, suggesting that these circRNAs are conserved by selection and may play important roles (Table 3). Functional analysis of HCMV circRNAs in a competitive endogenous RNA co-regulatory network shows that they participate in a complex, multifaceted interaction network. CircRNAs are thus an important component of the HCMV transcriptome, and further mutagenesis studies of HCMV circRNA biogenesis may reveal their roles in viral replication, latency, reactivation, and interactions with host cells.
For Table 3, sequences of about 60 nt around the back-splice junctions of the HAN and TB40/E strain circRNAs were compared using blastn (BLAST) [6].
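To make the BSJ comparison more concrete, the Python sketch below shows one way a junction sequence of about 60 nt can be assembled from a circRNA's genomic coordinates: back-splicing joins the 3′ end of the circularized segment to its 5′ end, so the junction reads as the last 30 nt of the segment followed by the first 30 nt. The coordinate convention, the function names (`bsj_sequence`, `longest_shared_stretch`), and the scoring are illustrative assumptions. In particular, the longest exact shared stretch is a crude, ungapped stand-in for the blastn alignment used in [6], not a reimplementation of it.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bsj_sequence(genome: str, start: int, end: int,
                 strand: str = "+", flank: int = 30) -> str:
    """Sequence spanning the back-splice junction (BSJ) of a circRNA.

    Assumes 0-based, half-open genomic coordinates [start, end) for the
    circularized segment. The junction is the segment's last `flank` nt
    joined to its first `flank` nt; for a minus-strand circRNA it is the
    reverse complement of that plus-strand junction.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not 0 <= start < end <= len(genome):
        raise ValueError("coordinates out of range")
    if end - start < flank:
        raise ValueError("circularized segment shorter than flank")
    g = genome.upper()
    junction = g[end - flank:end] + g[start:start + flank]
    return junction if strand == "+" else reverse_complement(junction)


def longest_shared_stretch(a: str, b: str) -> int:
    """Length of the longest exact common substring (dynamic programming).

    A crude, ungapped stand-in for a BLAST alignment length.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    toy = "TTTTAACCGGTATTTT"  # hypothetical toy sequence, not HCMV
    print(bsj_sequence(toy, 4, 12, flank=2))               # TAAA
    print(bsj_sequence(toy, 4, 12, strand="-", flank=2))   # TTTA
    print(longest_shared_stretch("ACGTACGT", "TTACGTAA"))  # 5
```

With real data, one would build a 60-nt junction for a circRNA in each strain and, for example, flag pairs where the shared stretch exceeds 40 nt; that threshold only echoes the text above, and an actual analysis would use a proper aligner such as blastn.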

HCMV microRNA (miRNA)
HCMV microRNAs (miRNAs) are small non-coding RNAs of approximately 22 nucleotides that are distributed throughout the HCMV genome. They account for around 80% of total HCMV reads obtained from deep sequencing [7,535]. HCMV encodes 17 known mature miRNAs from 11 precursors (Table 4). In addition, recent research has identified 10 new HCMV miRNAs, 4 from known precursors and 6 from new precursors, bringing the total number of mature miRNAs to 22 from 13 different precursors [20,535,536]. The high expression of HCMV miRNAs suggests that they play an important biological role during infection [535].
Viral miRNAs of other herpesviruses, such as EBV and HSV, promote the establishment and maintenance of latency, and HCMV miRNAs may have similar functions [537]. Because miRNAs are non-immunogenic and can target multiple cellular and viral transcripts, they provide an effective means for HCMV to manipulate viral gene expression and cellular signaling pathways during both lytic and latent infection [538]. By targeting numerous cellular genes and modulating host signaling pathways, HCMV miRNAs contribute to viral survival and replication [7]. HCMV miRNAs can also silence human genes involved in various physiological processes and attenuate the expression of immediate early (IE) proteins, which are vital for lytic replication. Overall, miRNAs are an important component of the HCMV transcriptome.
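The targeting described above generally relies on partial base pairing between a miRNA and its target mRNA, most often through the miRNA "seed" region near its 5′ end. As a minimal sketch of that general principle (not a method from the HCMV studies cited here), the Python below derives the site complementary to miRNA nucleotides 2 to 8, one common seed definition, and scans a 3′ UTR for exact matches. The function names and both sequences are hypothetical, and real target prediction also weighs site type, conservation, and sequence context.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_site(mirna: str) -> str:
    """Target site (5'->3', DNA alphabet) complementary to miRNA nt 2-8.

    Nucleotides are numbered from the miRNA 5' end, starting at 1.
    """
    seed = mirna.upper().replace("U", "T")[1:8]  # nucleotides 2-8
    # Antiparallel pairing: the mRNA site is the reverse complement of the seed.
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_matches(utr: str, mirna: str) -> list[int]:
    """0-based start positions of exact seed-site matches in a 3' UTR."""
    site = seed_site(mirna)
    target = utr.upper().replace("U", "T")
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical sequences for illustration only; not real HCMV data.
    mirna = "AUCGAUCCGUAAGCUAGCUAGC"
    print(seed_site(mirna))                              # GGATCGA
    print(find_seed_matches("GGATCGAUUGGATCGA", mirna))  # [0, 9]
```

Because a 7-nt site is short, exact seed matches occur often by chance in long UTRs, which is one reason a single miRNA can plausibly target many transcripts and why match counting alone is a weak predictor of functional targeting.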
Research also suggests that HCMV miRNAs may be involved in the development and progression of human diseases [7]. For instance, HCMV miR-US33-5p was found to influence apoptosis of human aortic vascular smooth muscle cells (HA-VSMCs) and was more abundant in the plasma of patients with acute aortic dissection (AAD) [539]. HCMV miRNAs might therefore contribute to the pathogenesis of certain human diseases, suggesting new possibilities for treatment. Understanding the functions of HCMV miRNAs not only provides insights into how the virus operates but also opens new avenues for therapeutic strategies against HCMV-associated diseases, and further research may reveal novel targets for managing HCMV infections and related conditions. The miRNAs in Table 4 were downloaded from miRBase, a searchable database of published miRNA sequences and annotation [540].

HCMV Gene Expression during a State of Latency
HCMV establishes latency primarily in early myeloid lineage cells [541,542], such as CD14+ monocytes and CD34+ hematopoietic progenitor cells [543]. The latent HCMV transcriptome is very challenging to define, in part because latently infected cells are scarce and a suitable model is lacking. The fate of the virus is determined by the type of cell infected: infection of fibroblasts leads to the production of infectious progeny virus, whereas infection of myeloid progenitor cells leads to latency, which is characterized by maintenance of the viral genome in the absence of active virus infection or replication. The molecular mechanisms governing viral latency are poorly understood. The HCMV transcriptome during latency has been reported to differ qualitatively from the lytic transcription profile [544]. Studies using a virus gene-specific microarray have identified latency-associated genes in HCMV-infected myeloid progenitor cells [545,546], and nested PCR has identified several viral genes with distinct transcriptional profiles during latency [544,547]. Transcriptomic profiling of HCMV-infected CD34+ cells and CD14+ monocytes identified around 20 genes associated with latent viral infection [191]. Moreover, single-cell transcriptomic profiling of latently infected monocytes revealed cellular heterogeneity in the response to latent infection [548,549].
A number of genes, including UL138 and LUNA, are expressed during latent infection [191,550]. Other genes, such as UL144, the IE1 region, UL111A, US28, and the noncoding RNAs RNA4.9 and RNA2.7, are expressed during both the lytic and latent phases [322,546,551,552]. It has been hypothesized that lytic genes are expressed during an early phase of latency and then repressed over time [553]. Research further suggests that viral DNA is heterochromatinized to repress gene transcription during latency. Some studies have shown that signaling through platelet-derived growth factor receptor (PDGFR), epidermal growth factor receptor (EGFR), and PI3K, together with downregulation of IE1/2 expression, upregulation of UL138, and perturbation of cytokine expression, promotes viral latency [554][555][556]. In addition to viral genes, HCMV-encoded miRNAs, including miR-UL148D and miR-UL112-1, have important roles in the establishment of latency. Another miRNA, has-miR-s200, was also found to play an important role in HCMV latency [557].
The major immediate early promoter (MIEP) has been reported to be the master regulator of latency in infected cells. In latently infected cells, the MIEP is heterochromatinized, suggesting a latency-specific [18,558,559]. Accumulating evidence suggests that the latent HCMV transcriptome is heterogeneous and remains poorly defined. Moreover, the exact cause of viral transcriptional repression during latency is unclear, and the other viral and cellular factors involved in establishing latency remain to be identified to better understand this complex process.

Conclusions
Human cytomegalovirus (HCMV) infection varies greatly depending on the individual's immune status. While the virus remains latent and asymptomatic in many healthy individuals, it poses a significant health risk for immunocompromised people, such as transplant recipients and HIV patients, and for infants with congenital infections. Research on the HCMV transcriptome, including mRNAs, lncRNAs, circRNAs, and miRNAs, has provided valuable insights into the complex interactions between the virus and its host. These different types of RNAs have diverse and overlapping functions in HCMV infection, contributing to many aspects of the viral life cycle, including replication, latency, reactivation, immune regulation, protein coding, and cell movement. Despite significant progress, parts of the HCMV transcriptome have not yet been fully investigated. Further research on HCMV pathogenesis and its transcriptome may improve our understanding of human cytomegalovirus and lead to more effective treatments for HCMV-related diseases, especially for immunocompromised patients and infants at risk of congenital infection.