Deciphering the Potential Coding of Human Cytomegalovirus: New Predicted Transmembrane Proteome
Int. J. Mol. Sci. 2022, 23, 2768

CMV is a major cause of morbidity and mortality in immunocompromised individuals, who would benefit from the availability of a vaccine. Despite the efforts made during the last decade, no CMV vaccine is available. An ideal CMV vaccine should elicit a broad immune response against multiple viral antigens, including proteins involved in virus-cell interaction and entry. However, the therapeutic use of neutralizing antibodies targeting glycoproteins involved in viral entry has achieved only partial protection against infection. In this scenario, a better understanding of the CMV proteome potentially involved in viral entry may provide novel candidates for inclusion in new vaccine designs. In this study, we explored the CMV genome for proteins with putative transmembrane domains in order to identify new potential viral envelope proteins. We performed an in silico analysis of the genome sequences of nine different CMV strains to predict the transmembrane domains of the encoded proteins. We identified 77 proteins with transmembrane domains, 39 of which were present in all the strains and were highly conserved. Among the core proteins, 17, such as UL10, UL139 or US33A, have no ascribed function and may be good candidates for further mechanistic studies.


Introduction
Human cytomegalovirus (CMV) is a large enveloped betaherpesvirus with a worldwide seroprevalence ranging from 45% to 100% in the general population, depending on socio-economic factors [1,2]. Although CMV generally causes asymptomatic infections in immunocompetent individuals, it is a major cause of morbidity and mortality in immunocompromised individuals, such as organ transplant recipients and people with AIDS, and in congenital infection [3][4][5][6][7].
The CMV genome (236 kb) consists of a unique long (UL) and a unique short (US) region flanked by inverted repeats. CMV gene expression follows a temporal cascade in which immediate-early genes are expressed first, followed by early, early-late and late transcripts [8]. In addition to the 165 canonical ORFs [9,10], the CMV genome encodes alternative transcripts and uses non-canonical translation initiation sites [11,12]. Furthermore, CMV encodes a large number of genes, many of them of unknown function, that are probably involved in key processes during host-cell interaction [13].
CMV is able to infect a wide range of cell types, including fibroblasts, endothelial cells, epithelial cells and myeloid lineage cells, among others [14,15]. CMV is a highly complex virus with multiple proteins embedded in the viral envelope, including at least four distinct types of covalently linked glycoprotein complexes required for CMV infectivity: the gCI complex (gB dimer), the gCII complex (gM/gN), the gCIII complex (trimeric gH/gL/gO) and the pentameric complex (gH/gL/UL128-131) [16,17]. The therapeutic use of neutralizing antibodies targeting glycoproteins that mediate viral entry has achieved only partial protection against CMV infection [18][19][20][21][22][23]. One possible explanation is that other proteins are also involved in viral entry and might also need to be targeted in order to elicit complete protection against infection. An ideal vaccine against CMV infection should elicit a broad immune response, including both neutralizing antibody and T-cell responses, against multiple viral antigens, including proteins involved in virus-cell interaction and entry [24,25]; such a vaccine may be more effective than the vaccines tested previously [1,26–28]. Indeed, despite the efforts made during the last decade, no CMV vaccine is yet available [18,27,29,30]. Thus, understanding the complete repertoire of CMV proteins involved in cell entry may also help to determine the neutralizing response necessary to block infection and may provide novel candidates for inclusion in new vaccine designs.
In this study, we explored the CMV genome for proteins with putative transmembrane domains in order to identify new potential viral envelope proteins that may be involved in virus-cell interaction during infection. Such proteins may therefore be potential targets for neutralizing antibodies in the development of novel therapeutics and preventive measures targeting viral entry and cell-to-cell spread. To this end, we performed an in silico analysis of the genome sequences of nine different CMV strains to identify proteins with predicted transmembrane domains. To further characterize the identified proteins, we performed an exhaustive systematic review of the literature and a sequence homology analysis against known proteins from other organisms. Our work highlights the need to explore new experimental and computational approaches to identify and characterize the CMV proteome.

Identification of Putative Transmembrane Proteins
To determine the transmembrane regions of the proteins encoded by the CMV genome, the genomes of nine different CMV strains, including both clinical and laboratory strains (Table 1), were analyzed using three different bioinformatic methods: Phobius, PureseqTM and TMHMM. The methodological approach is summarized in Figure 1. CMV is known to accumulate mutations quite rapidly during passaging in cell culture [31]. To test how representative these nine selected CMV strains are of the 335 CMV genomes available in GenBank, the 56190 ORFs of those genomes were aligned with the ORFs in our CMV dataset. Both the median percentage identity and the median breadth coverage (overlapping distance) were 100%, and the aligned ORFs represented 99.95% of the total ORFs from Human betaherpesvirus 5 in the NCBI database (Figure S1).
Based on this first analysis, we identified 94 proteins with potential transmembrane domains (Figures 2 and S2). Seventeen of them were excluded from further analysis (Figure S2) for the following reasons. Proteins UL74, UL115, UL47, UL49, UL76, US22, UL77, UL105, UL122 and UL89 have previously been described as not being part of membrane structures [32][33][34][35][36][37][38][39][40][41]: UL47, UL49, UL76 and US22 are known to be part of the tegument, UL77 is located in the capsid, and UL105, UL122 and UL89 are found in the nucleus of host cells [32][33][34][35][36][37][38][39]. In addition, UL4, UL22A and UL116, which were predicted to have one transmembrane domain, were discarded because the predicted transmembrane domain corresponded to the signal peptide sequence [42][43][44]. Finally, RL13TRL14 and US33 (TB40-E_UNC strain) and ORFL27C and ORFL49W.IORF1 (AD169-BAC20) were each found in only one of the studied CMV strains and were not considered for further analysis.
Figure 1. Schematic representation of the applied workflow. Fasta format protein sequences from nine CMV genomes were analyzed in parallel to predict transmembrane domains and to create an entire set of genes from all strains (pangenome). Transmembrane topology was studied following three different approaches: PureseqTM, Phobius and TMHMM, under default parameters. Predicted transmembrane proteins were compared with orthologous proteins identified by BLAST with the whole Mantis database for the prediction of functional annotation. Proteins that were common to all nine genome datasets formed the core protein set, and functions were annotated accordingly for each transmembrane protein.
For further characterization of the 77 remaining proteins with predicted transmembrane (TM) domains, a systematic review was performed to search for any previously published information. For each protein, the location, the ascribed function and the number of predicted TM domains are included in Table S1. The number of predicted TM regions found for each ORF, in each of the nine strains and with each bioinformatic tool, is shown in Figure 2. Of the 77 proteins analyzed, 33 (42.86%) had only one TM domain, 23 (29.87%) had between 1 and 2 TM domains, 6 (7.79%) had between 1 and 3, and 15 (19.48%) had between 5 and 8 TM domains. None of the analyzed proteins had four TM domains.
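Range-based categories such as "between 1 and 2 TM domains" can be produced by collapsing the counts predicted for each protein (across tools and strains) into a min-max label and then tallying the labels. The sketch below assumes those counts have already been collected into plain Python lists; the example proteins and counts are hypothetical, while the category sizes in the last lines are the ones reported above, so the printed percentages simply re-derive them.

```python
from collections import Counter


def tm_range(counts):
    """Collapse the TM-domain counts predicted for one protein into a label:
    '1' if all predictions agree, otherwise 'min-max' (e.g. '1-2')."""
    lo, hi = min(counts), max(counts)
    return str(lo) if lo == hi else f"{lo}-{hi}"


def category_percentages(category_counts):
    """Convert {category: number_of_proteins} into {category: percent of total}."""
    total = sum(category_counts.values())
    return {cat: round(100 * n / total, 2) for cat, n in category_counts.items()}


# Hypothetical counts from (PureseqTM, Phobius, TMHMM), for illustration only.
example = {"protA": [1, 1, 1], "protB": [1, 2, 1], "protC": [7, 7, 6]}
labels = Counter(tm_range(c) for c in example.values())
print(dict(labels))  # {'1': 1, '1-2': 1, '6-7': 1}

# Category sizes reported in the text for the 77 proteins.
reported = {"1": 33, "1-2": 23, "1-3": 6, "5-8": 15}
print(category_percentages(reported))  # {'1': 42.86, '1-2': 29.87, '1-3': 7.79, '5-8': 19.48}
```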
Twenty of the 77 proteins (UL2, UL6, UL9, UL14, UL15A, UL74A, UL120, UL121, UL140, UL148C, UL148D, US13, US15, US19, US29, US30, US33A, RL8A, RL9A and RL10) have no previously described function, 13 (UL1, UL5, UL8, UL10, UL20, UL42, UL78, UL124, UL139, UL147, US34A, RL12 and RL13) have been partially studied, 1 (UL41A) has previously been shown not to code for a protein [10], and the remaining 43 proteins have a previously described function (Table 2).
Table 2. CMV predicted transmembrane proteins indicating the cellular localization based on biotool Uniprop, the ascribed functions based on a bibliographic search and the number of predicted domains using the three different tools. (*) indicates unknown or non-verified function.
The number of predicted domains differed for some of the studied proteins depending on the method used. The results obtained with TMHMM were the most divergent of the three methods. In contrast, a group of proteins encoded by UL33, UL78, the unique short (US) region genes US12-US21, and US27 and US28 were predicted to have more than five TM regions by all three methods. Indeed, the TM regions of some of these proteins, such as the members of the US12 family and the proteins homologous to the chemokine receptor family of G protein-coupled receptors (GPCRs), US27 and US28, have been described previously, supporting our results [108,109,121].
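One simple way to quantify which tool diverges most from the others (as observed here for TMHMM) is to compare each tool's count with the per-protein median of the three tools and sum the absolute deviations. This is an illustrative sketch, not the procedure used in the study; the tool names match the text, but the protein names and counts are hypothetical.

```python
from statistics import median


def tool_divergence(predictions):
    """Sum, per tool, the absolute deviation of its TM count from the
    per-protein median across all tools.

    predictions: {protein: {tool_name: tm_count}}
    """
    totals = {}
    for per_tool in predictions.values():
        consensus = median(per_tool.values())
        for tool, count in per_tool.items():
            totals[tool] = totals.get(tool, 0) + abs(count - consensus)
    return totals


# Hypothetical counts, for illustration only.
example = {
    "protA": {"PureseqTM": 1, "Phobius": 1, "TMHMM": 2},
    "protB": {"PureseqTM": 7, "Phobius": 7, "TMHMM": 6},
    "protC": {"PureseqTM": 1, "Phobius": 1, "TMHMM": 1},
}
scores = tool_divergence(example)
print(scores)                       # {'PureseqTM': 0, 'Phobius': 0, 'TMHMM': 2}
print(max(scores, key=scores.get))  # TMHMM
```

With three tools the median is simply the middle value, so a single tool that disagrees with the other two accumulates the whole deviation.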
A validation experiment was carried out using UL2 and UL124, two of the identified proteins with unknown function, as examples. The ORFs encoding these two proteins were cloned into a eukaryotic expression plasmid that included a Myc tag sequence at the 5′ end of the cloned products. After transfection of the HEK 293T mammalian cell line, plasma membrane proteins were extracted, and the cytoplasmic (C) and plasma membrane (PM) protein fractions were tested by Western blot using an anti-Myc antibody. As shown in Figure S3A, both the UL2 and UL124 proteins were detected only in the PM fractions, confirming their membrane localization. A loading control using stain-free technology is shown in Figure S3B.

Homology Analysis of the Predicted Transmembrane Proteins
In addition to the exhaustive systematic review of the literature, sequence homology with known proteins from other organisms was analyzed using Mantis software (Table S1). This analysis revealed homologies for two of the proteins with unknown function. UL139 showed some homology (e-value = 5.1 × 10⁻²⁸) with proteins involved in cell adhesion, while UL15A showed some homology (e-value = 1.53 × 10⁻⁴) with a biotin permease protein. The UL15A ORF was identified in all nine CMV strains analyzed, whereas UL139 was only present in the TR strain.
In addition, the Mantis analysis revealed an association of UL1 with carcinoembryonic antigen-related cell adhesion molecules, which are cell adhesion receptors of the immunoglobulin-like superfamily. Mantis also identified UL78 as a seven-transmembrane receptor of the rhodopsin family. Mantis proposed that UL147 is involved in immune response and chemokine activity, and US33A appears to have a von Willebrand A (VWA) domain. However, US33A was present only in the Towne, Toledo, TR and VR7863 strains.

Sequence Similarities among Strains: Core Proteins
Given the differences among strains, we searched for the genes with predicted TM domains that were present in all the studied CMV genomes. Of the 77 initially predicted TM proteins, 39 (50.65%) met this criterion and were designated as the core TM proteome (Figures 2 and 3). The 39 proteins, grouped according to their function, are shown in Figure 3A, together with the number of predicted TM domains in each group. No association was found between the number of TM domains and the function of each group.
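The core TM proteome is, in set terms, the intersection of the per-strain sets of genes with predicted TM domains, and the rest of the pangenome (the union of all sets) is the accessory part. A minimal sketch, assuming each strain's TM genes are available as a Python set of gene names; the three strains and gene lists below are an illustrative subset only, not the full per-strain lists from this study.

```python
def core_and_accessory(strain_genes):
    """Split a pangenome into core genes (present in every strain) and
    accessory genes (present in at least one, but not all, strains).

    strain_genes: {strain_name: set of gene names with predicted TM domains}
    """
    gene_sets = list(strain_genes.values())
    if not gene_sets:
        raise ValueError("need at least one strain")
    pangenome = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return core, pangenome - core


# Illustrative subset only (not the complete gene lists from the study).
example = {
    "AD169": {"UL33", "UL78", "US28"},
    "Towne": {"UL33", "UL78", "US28", "US33A"},
    "TR":    {"UL33", "UL78", "US28", "US33A", "UL139"},
}
core, accessory = core_and_accessory(example)
print(sorted(core))       # ['UL33', 'UL78', 'US28']
print(sorted(accessory))  # ['UL139', 'US33A']
```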
Most of the 39 core proteins were highly conserved among all nine strains, with a high percentage of sequence similarity (94.33 ± 7.3%), except for RL12 (48.31 ± 24.17%; Figures 4 and S4). The similarity matrix for each protein in each strain is shown in Figure S4, and a heatmap of the similarity of the core proteins in all strains compared with AD169 as the reference strain is shown in Figure 4. As expected, the AD169-derived BAC was almost identical to the AD169 strain. Although sequence similarity was high overall (average above 94%), the core protein sequences of the Merlin, Towne and TR strains differed the most. In addition, most of the core proteins with unknown function (highlighted in red in Figure 4) tended to have lower similarity values than the core proteins with known functions.
In addition, to test to what extent the core proteins are found across the CMV population, the proteins of the 335 genomes available in the database were aligned with blastp against all proteins in our pangenome core. All genomes contained proteins related to those in our set, in different proportions, and a high number of genomes contained all 39 core proteins, indicating that our pangenome could be extended to all annotated genomes (Figure S1B).
Figure 4. Percentage identity heatmap among the 39 core proteins from all the studied strains using AD169 as a reference strain. Color scales range from red (0% identity) to white (100% identity). Strains and genes were clustered following a hierarchical clustering method (HCL). Genes were clustered from top to bottom of the figure based on their similarity within the indicated strains. Core proteins with unknown functions are highlighted in red.


CMV Gene Families with Transmembrane Domains
The genes encoded by the CMV genome are grouped into several gene families [122], five of which are important families that include genes with TM domains. We therefore analyzed which of the 39 identified core proteins belong to these families.
The RL11 family shares the RL11D central domain, which includes three conserved residues (one tryptophan and two cysteines) and several potential N-linked glycosylation sites associated with immunomodulatory properties. The RL11 family consists of 14 genes (RL5A, RL6, RL11-13, UL1 and UL4-11), which encode proteins with transmembrane domains except for RL5A and RL6 [51]; most of them have no known specific function [123]. Our analysis identified three genes of this family (UL5, UL8 and UL10) that have not been well characterized, although they have been suggested to be involved in the viral cycle and in immunomodulation (Table S1) [42,45,52].
The US12 family includes the US12-US21 genes [102]. US12, US14, US18 and US20 have been shown to be involved in the inhibition of natural killer cells [107]. US16 has been shown to interact with UL130, participating in CMV tropism [104]. US21 encodes a viroporin that modulates calcium homeostasis and protects cells against apoptosis [108]. US18 and US20 have been described to play a role in cell tropism by mediating viral replication in specific cell types, while US13, US15 and US19 have no described function or are poorly characterized [102,107].
The GPCR-7 TM family includes four genes coding for proteins with seven predicted TM domains: US27, US28, UL33 and UL78. This family includes G protein-coupled surface receptors with an important role in immunomodulation, which transmit an intracellular signal upon binding an extracellular ligand [109,124]. Within this family, only UL78 (a protein from the unknown core proposed in this work) remains uncharacterized.
UL120 and UL121 are proposed to form a family, and both were identified in our unknown core analysis. However, little is known about their function, and further studies are needed to shed light on their biological relevance [125].

Discussion
The lack of knowledge about a substantial part of the genes encoded by the CMV genome, the variability between laboratory and clinical strains and the complex modulation of the host immune system by the virus have probably limited the design of new preventive and prophylactic measures against CMV infection [1,122,126]. CMV envelope proteins that interact with target cells during entry may be considered possible targets for new treatments and vaccines against CMV, since blocking a combination of these proteins may block infection of the different target cells [24]. However, of the 165 proteins potentially encoded by the CMV genome, only around 60 have been functionally characterized [12,112,127,128]. In this sense, some authors have reported associations between γ marker, human leukocyte antigens and killer immunoglobulin-like receptors and the natural course of CMV infection [129][130][131][132].
We performed the present study using an in silico analysis of the genome sequences of nine CMV strains (representative of the 335 CMV genomes available in GenBank) with three different bioinformatic tools to identify proteins with putative TM domains, and thus new potential envelope proteins that may help to better understand how CMV interacts with host cells and with the host immune system.
As a result, we gained knowledge about the performance of the three bioinformatic tools used in this study. While Phobius and TMHMM are well-established methods for studying TM regions (sequence-based approaches that use hidden Markov models), PureseqTM is a newer alternative based on a machine learning algorithm (Deep-CNF), which has been shown to increase the number of results obtained. These tools have already been used to study membrane proteins in other organisms such as humans, Plasmodium falciparum and E. coli [133][134][135]. Although these tools provide a quick in silico approach to analyzing the genome of a given organism, their results are not always accurate. We observed some variability in the number of predicted TM domains depending on the method used. These discordances could be explained by differences in the algorithms and threshold values: by default, each method uses its own algorithm and thresholds, so the same region may be scored slightly differently and fall above the threshold of one method but below that of another. The signal peptide analysis was also important to exclude false positives, since the signal peptides of type I TM proteins are usually hydrophobic and are often predicted to be TM domains. In addition, sequence variability among CMV strains, with a significant number of polymorphisms in genes such as those encoding gN and UL21, may also explain this variability [56,136,137]. These results highlight the importance of applying different bioinformatic methods when predicting domains in silico.
Evolutionarily related proteins are commonly referred to as homologues, and very close homologues often have similar functions [138]. Homology-based methods have previously been used in different scenarios, such as the identification of oncogenes from retrovirus proteins, cancer metastasis, herpesviruses and Hepadnaviridae [139], among others. Using a homology-based method, we proposed functions for 23 previously characterized CMV proteins that were consistent with their known functions, supporting the performance of the method. We were also able to propose putative functions for several CMV proteins of unknown function, such as UL1, UL15A, UL139, UL78, UL147, US13 and US33A. UL15A was proposed to be a biotin transport system permease protein, while UL139 could be involved in cell adhesion. Furthermore, based on the homologies, UL1 may be involved in virus-cell adhesion and potentially in modulating the host immune system. Most carcinoembryonic antigen-related cell adhesion molecules modulate general cellular processes such as proliferation, motility and apoptosis, as well as cell-cell interactions, and some of them bind pathogens, enhancing the pathogens' capacity to colonize the host [140,141]. Our analysis identified UL78 as a member of the rhodopsin family, a large group of evolutionarily related cell surface receptors that detect molecules outside the cell and activate cellular responses [142]. Given the functional homology among the other members of the family, it is plausible that UL78 exhibits similar functions and may have an immunomodulatory role [113], although its ligand is still unknown. UL147 emerged with a potential role in immune response and chemokine activity, likely due to its homology with UL146, which is already characterized [90,143].
Finally, US33A, which is present in the Towne, Toledo, TR and VR7863 strains, showed a von Willebrand A (VWA) domain, which is well characterized to be involved in cell adhesion to extracellular matrix proteins and integrin receptors [144]; US33A could therefore also be involved in signaling.

Transmembrane Region Analysis
The nine CMV genome sequences (AD169, Towne, Toledo, TR, VR7863, TB40-E_UNC, HANSCTR4, AD169-BAC20 and Merlin; Table 1) were obtained from the NCBI Nucleotide database. To test how representative these nine strains are of the CMV genomes available in GenBank, the ORFs of the 335 available CMV genomes (56217 ORFs, annotated in the NCBI database as human betaherpesvirus 5 complete genomes) were downloaded. Sequences were aligned using blastp with default parameters. Additionally, the complete set of ORFs coding for CMV proteins was aligned with the ORFs of the protein core in the pangenome. BLAST results were filtered for percentage sequence identity ≥75% and breadth coverage ≥90%.
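The identity and coverage filter described above can be sketched as a small parser over BLAST+ tabular output. This sketch assumes the results were written with a custom column list (`-outfmt "6 qseqid sseqid pident length qlen"`) and approximates breadth coverage as alignment length divided by query length, capped at 100%; the study does not state its exact coverage definition (BLAST+ also offers a `qcovhsp` column), so treat both as assumptions. The identifiers in the example are made up.

```python
import csv
import io
from statistics import median

# Assumed BLAST+ tabular layout, e.g. produced with:
#   blastp ... -outfmt "6 qseqid sseqid pident length qlen"
COLUMNS = ("qseqid", "sseqid", "pident", "length", "qlen")


def filter_hits(tsv_text, min_identity=75.0, min_coverage=90.0):
    """Keep hits with percent identity >= min_identity and breadth coverage
    >= min_coverage. Coverage is approximated as alignment length / query
    length (capped at 100%), an assumption rather than the study's definition."""
    kept = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, row))
        identity = float(rec["pident"])
        coverage = min(100.0, 100.0 * int(rec["length"]) / int(rec["qlen"]))
        if identity >= min_identity and coverage >= min_coverage:
            kept.append((rec["qseqid"], rec["sseqid"], identity, coverage))
    return kept


def median_identity_and_coverage(kept):
    """Median percent identity and median coverage of the retained hits."""
    return median(h[2] for h in kept), median(h[3] for h in kept)


# Hypothetical BLAST output with made-up identifiers.
example = (
    "ORF_a\tref_1\t100.0\t300\t300\n"
    "ORF_b\tref_2\t98.5\t290\t300\n"
    "ORF_c\tref_3\t60.0\t300\t300\n"
)
hits = filter_hits(example)
print(len(hits))  # 2 (ORF_c fails the identity threshold)
print(median_identity_and_coverage(hits))
```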
These nine genome sequences were analyzed to predict transmembrane domains within their open reading frames (ORFs). The analysis was carried out with default parameters using three different bioinformatic approaches: PureseqTM (v1.2) [145], Phobius (v1.01, Stockholm Bioinformatics Center, Sweden) [146] and TMHMM (DTU Health Tech, Lyngby, Denmark) (v1.1) [147]. Phobius and TMHMM are sequence-based methods built on hidden Markov models (HMMs), while PureseqTM adds an extra layer of prediction based on deep learning. All methods were expected to give similar outputs for the same set of analyzed ORFs, with differences arising from their predictive thresholds. The tool SignalP 6.0 (DTU Health Tech, Lyngby, Denmark) was used for signal peptide analysis [148].
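The signal peptide check (used above to discard UL4, UL22A and UL116) amounts to removing predicted TM segments that overlap a predicted signal peptide. The sketch below assumes TMHMM 2.0-style short output (tab-separated `len=`, `ExpAA=`, `First60=`, `PredHel=`, `Topology=` fields); the output of the TMHMM version used in the study may differ. It also assumes the signal peptide end position has already been extracted from SignalP 6.0. The sequence line and positions are hypothetical.

```python
import re


def parse_tmhmm_short(line):
    """Parse one line of TMHMM 2.0-style short output (assumed format) into
    (sequence id, list of (start, end) TM segments, 1-based inclusive)."""
    fields = line.rstrip("\n").split("\t")
    info = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    topology = info.get("Topology", "")
    # Topology strings look like 'o5-24i120-142o': i/o = inside/outside,
    # and each 'start-end' pair is one predicted helix.
    segments = [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", topology)]
    return fields[0], segments


def drop_signal_peptide_overlaps(segments, sp_end):
    """Discard TM segments that start within a signal peptide spanning residues
    1..sp_end (e.g. from SignalP 6.0); sp_end=None means no signal peptide."""
    if sp_end is None:
        return list(segments)
    return [(start, end) for start, end in segments if start > sp_end]


# Hypothetical TMHMM line and signal peptide end position.
line = "ORF_x\tlen=180\tExpAA=40.10\tFirst60=20.30\tPredHel=2\tTopology=o5-24i120-142o"
seq_id, segments = parse_tmhmm_short(line)
print(seq_id, segments)                            # ORF_x [(5, 24), (120, 142)]
print(drop_signal_peptide_overlaps(segments, 22))  # [(120, 142)]
```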

Functional Annotation of the Transmembrane ORFs
Predicted TM proteins were analyzed under default parameters using Mantis v1.1.1 [149], a stand-alone protein function annotation tool that uses HMMER or Diamond to match sequences against multiple reference datasets and produce high-quality, consensus-driven protein annotations. This analysis uses information from the following protein function databases and derives a consensus result: KOfam [150], Pfam [151], eggNOG [152], NCBI protein family models (NPFM) [153] and TIGRFAMs [154]. In parallel, we searched for sequence homology in the RefSeq database using BLASTp (v2.9.0) [155] (e-value < 10⁻³) to find orthologous proteins. Orthologous proteins are proteins found in other species that maintain the same or a closely related function as the studied protein; in this way, proteins with unknown functions can be tentatively linked to the function of their orthologs.
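A simple way to turn BLASTp results into one candidate ortholog per query is to apply the e-value cutoff (< 10⁻³) and keep the best remaining hit. The sketch below is a plain best-hit filter, not how Mantis builds its consensus annotation; it assumes rows of (query, subject, e-value, bit score), and the identifiers and bit scores are hypothetical. A single best hit is weak evidence of orthology; a reciprocal best-hit check would be a stricter alternative.

```python
def best_hits(rows, max_evalue=1e-3):
    """Keep, for each query, the single best BLASTp hit below an e-value cutoff.

    rows: iterable of (qseqid, sseqid, evalue, bitscore), as strings or numbers.
    'Best' = lowest e-value, with ties broken by the higher bit score.
    """
    best = {}
    for query, subject, evalue, bitscore in rows:
        evalue, bitscore = float(evalue), float(bitscore)
        if evalue >= max_evalue:
            continue  # fails the significance cutoff
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical rows (made-up identifiers and bit scores).
rows = [
    ("prot_1", "hit_A", "5.1e-28", "110"),
    ("prot_1", "hit_B", "1e-5", "60"),
    ("prot_2", "hit_C", "1.53e-4", "45"),
    ("prot_3", "hit_D", "0.01", "30"),
]
result = best_hits(rows)
print(sorted(result))  # ['prot_1', 'prot_2'] (prot_3 has no hit below the cutoff)
```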
An additional systematic review was performed to retrieve related articles published until July 2021 in the PubMed database (https://pubmed.ncbi.nlm.nih.gov/, accessed on 15 December 2021). For each of the genes studied, articles were identified using the following search terms: "HHV-5" AND or "CMV".

Similarity Analysis
Core proteins from all nine CMV strains were aligned using Clustal Omega (v1.2.1) [158] with the --percent-id and --full parameters to obtain identity matrices. Thirty-nine matrices were obtained, and the results were plotted as individual heatmaps using heatmap.2 from the gplots R package (v3.1.1). In parallel, a percentage identity heatmap comparing the 39 core proteins of AD169 with those of the other studied strains was plotted using the same parameters.
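To make the identity matrices concrete, the sketch below computes pairwise percent identity from an existing multiple alignment, counting only columns where neither sequence has a gap. This is one common convention and an assumption here: Clustal Omega's `--percent-id` output may use a different denominator. The aligned sequences are made up and only reuse strain names from the text as labels.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length,
    computed over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    same = sum(1 for x, y in compared if x.upper() == y.upper())
    return 100.0 * same / len(compared)


def identity_matrix(alignment):
    """alignment: {strain: aligned_sequence}. Returns {(s1, s2): percent identity}."""
    names = list(alignment)
    return {(p, q): percent_identity(alignment[p], alignment[q]) for p in names for q in names}


# Made-up aligned fragments for three strains.
aln = {"AD169": "MKT-LLV", "Merlin": "MKTALIV", "Towne": "MRT-LLV"}
matrix = identity_matrix(aln)
for other in ("Merlin", "Towne"):
    print(other, round(matrix[("AD169", other)], 2))  # 83.33 for both
```

The row for AD169 in such a matrix corresponds to one row of a reference-based heatmap like the one described for Figure 4.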

Protein Analysis
UL2 and UL124 ORFs were amplified by PCR using the CMV BADrUL131-Y4 BAC as a template, with specific primers (Table S2) and Phusion DNA polymerase (Thermo Scientific). PCR products and the pcDNA™3.1/myc-His(-) (5.5 kb) vector (Invitrogen) were digested with the appropriate restriction enzymes (FastDigest enzymes, Thermo Scientific, Waltham, MA, USA), ligated (Ligase, Thermo Scientific, Waltham, MA, USA) and transformed into XL10-Gold chemically competent E. coli cells. The constructed plasmids were verified by sequencing (Table S2). The pcDNA™3.1/myc-His (Thermo Scientific, Waltham, MA, USA) constructs containing the UL2 and UL124 ORFs were transfected into the HEK 293T human cell line using the CaCl₂ transfection method. Briefly, the day before transfection, 1,500,000 HEK 293T cells were seeded in a 10-cm plate, and transfections were carried out the next day at approximately 70-80% cell confluence. Four hours before transfection, fresh medium was added to the cells. For the transfection, 750 µL of CaCl₂ was mixed with 40 µg of the DNA construct. Then, 750 µL of HBS saline buffer (140 mM NaCl, 1.5 mM Na₂HPO₄, 50 mM HEPES) was added, and the mixture was incubated for 15 min at room temperature. The mixture was then added dropwise to each plate containing the 293T cell monolayer and gently mixed. Transfected cells were incubated for 48 h at 37 °C with 5% CO₂; 24 h post-transfection, fresh medium was added to the cells. Transfected cells were pelleted and treated with the Mem-PER™ Plus Membrane Protein Extraction Kit (Thermo Scientific, Waltham, MA, USA) according to the manufacturer's instructions. Cytoplasmic and plasma membrane protein fractions were obtained and quantified by the Bradford assay. Ten µg of each protein lysate was separated on a 4-20% gradient pre-cast SDS gel (BioRad, Hercules, CA, USA) and transferred to 0.2 µm nitrocellulose membranes.
Detection of the expressed proteins was performed using an anti-Myc monoclonal antibody (MA1-21316, Invitrogen, Waltham, MA, USA) diluted 1:1000 in blocking buffer (1X PBS + 0.1% Tween 20 + 5% skim milk) and incubated at 4 °C overnight, followed by a secondary horseradish peroxidase (HRP)-labeled anti-mouse IgG antibody (diluted 1:2000; 05/2019, Cell Signaling, Danvers, MA, USA). Stain-free technology (BioRad, Hercules, CA, USA) applied to the acrylamide gel was used in parallel as a loading control.

Conclusions
Our results highlight the utility of these bioinformatic tools for gaining knowledge about previously uncharacterized proteins, which may be useful for selecting potential targets of the immune system. Our work also suggests that differences among strains may be crucial for CMV tropism, replication, latency or evasion of the host immune response [13]. Among the 77 identified proteins with predicted TM domains, only 39 (designated as the core TM proteome) were shared by all analyzed strains; most of them were highly conserved, which may be clinically relevant for the design of new therapeutic and preventive measures. Because this group of proteins was highly conserved in all nine strains analyzed, it may help overcome the limitation posed by the variability among laboratory strains and clinical isolates, facilitating the extrapolation of the results.
In addition, studying the role of the previously uncharacterized proteins may provide novel candidates for new preventive or prophylactic measures against CMV infection. Furthermore, if their location in the viral envelope is confirmed, they could be used as viral antigens for developing a vaccine or monoclonal antibodies. Noteworthy candidates include UL139 and US33A, which seem to be involved in adhesion to the host cell, and UL10, whose membrane localization has been demonstrated but whose function has not been fully characterized [52].
In conclusion, using a comprehensive in silico analysis, we predicted CMV proteins with TM domains that could be of interest because of their possible role in virus-cell interaction and entry. Our approach proved useful for identifying new potential candidates for a more rational design of new treatment targets and vaccines, as well as for increasing our knowledge of CMV.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.