Viral and Host Genetic and Epigenetic Biomarkers Related to SARS-CoV-2 Cell Entry, Infection Rate, and Disease Severity

Simple Summary: COVID-19 emerged as a new disease with quick transmission and a high mortality rate at the end of 2019, caused by SARS-CoV-2. Common features of the coronavirus family helped resolve the structural and entry mechanism characteristics of SARS-CoV-2. Still, rapid mutagenesis leads to the fast evolution of the virus and the emergence of new strains that differ in infectivity, morbidity, and mortality. Besides differences in the viral genome, genetic variability in the host defense and immune systems may also play a role in the outcome of virus–host interactions. Furthermore, epigenetic mechanisms, including miRNA gene silencing and DNA methylation, may also influence the outcomes and may themselves be heavily influenced by SARS-CoV-2. Molecular biomarkers are intensively investigated as potential predictive and prognostic biomarkers of the disease course and treatment response. We reviewed new data regarding the mechanisms behind fast virus mutagenesis, infectivity, and potential human genetic and epigenetic characteristics that may lead to a more severe or lethal outcome of the disease.
Abstract: The rapid spread of the COVID-19 outbreak led to a global pandemic, declared in March 2020. The common features of the coronavirus family helped to resolve the structural characteristics and entry mechanism of SARS-CoV-2. However, rapid mutagenesis leads to the emergence of new strains that may have different reproduction rates or infectivity and may impact the course and severity of the disease. Host-related factors may also play a role in the susceptibility to infection as well as in the severity and outcomes of COVID-19. We have performed a literature and database search to summarize potential viral and host-related genomic and epigenomic biomarkers, such as genetic variability, miRNA, and DNA methylation in the molecular pathway of SARS-CoV-2 entry into the host cell, that may be related to COVID-19 susceptibility and severity.
Bioinformatics tools may help to predict the effect of mutations in the spike protein on binding to the ACE2 receptor and on the infectivity of the strain. SARS-CoV-2 may also target several transcription factors and tumour suppressor genes, thus influencing the expression of different host genes and affecting cell signalling. In addition, the virus may interfere with RNA expression in host cells by exploiting endogenous miRNA and its own viral RNA. Our analysis showed that numerous human miRNAs may form duplexes with different coding and non-coding regions of viral RNA. Polymorphisms in human genes responsible for viral entry and replication, as well as in molecular damage response and inflammatory pathways, may also contribute to disease prognosis and outcome. Gene ontology analysis shows that proteins encoded by such polymorphic genes are highly interconnected in the regulation of defense response. Thus, virus- and host-related genetic and epigenetic biomarkers may help to predict the course of the disease and the response to treatment.


Introduction
COVID-19 emerged as a new disease with quick transmission and a high mortality rate at the end of 2019, caused by SARS-CoV-2. In March 2020, the pandemic was declared. Since then, more than 200,000 scientific papers have been published on COVID-19.
SARS-CoV-2 enters the cell through binding of the spike S1 protein to the ACE2 receptor [1,2] (Figure 1). TMPRSS2 and furin are necessary for proteolytic activation of SARS-CoV-2, since furin and TMPRSS2 inhibitors were shown to block SARS-CoV-2 [3]. TMPRSS2 cleaves S1/S2 at the cleavage site in SARS-CoV-2, a process requiring pre-cleavage by furin [4,5]. Once inside the lysosome, the viral envelope is degraded by lysosomal proteases. RNA-dependent RNA polymerase (RdRp) is the first protein to be translated. With the help of RdRp, a negative strand of RNA is produced that serves as a template for the multiplication of viral RNA. It was suggested that SARS-CoV-2 RNAs could be reverse transcribed and integrated into the human genome [6]. The large CpG island (18 CpG sites) in region 151-368 could contribute to regulating the expression of the viral template at the appropriate time for virus reactivation (UCSC Genome Browser, CpG islands, http://genome.ucsc.edu/ (accessed 15 December 2021)).
Figure 1. Upon binding, the protease cleaves the S1 protein. S1 interacts with different cell membrane proteins, which causes the membrane to envelop the virus, accelerating endocytosis. The endosome goes through all phases to the lysosome, where RNA is released from the endosome and enters the cytoplasm. At all stages, polymorphisms may influence virus entry and replication. When viral RNA is released into the cytoplasm, miRNA effects step in. Viral offensive mechanisms counter the cell's defense efforts, and the virus continues to replicate. Human DNA methylation effects should be considered throughout the process.
From the cytoplasm, where proteins are translated, proteins are transported into the endoplasmic reticulum (ER), where they are post-translationally modified and transported towards the Golgi apparatus (GA) via the intermediate compartment (IC) [7]. In the IC, the virion is assembled by wrapping virion proteins around the N protein bound to viral RNA. The assembled viral particles may be exocytosed from the cell [2,8]. SARS-CoV-2 was detected in multiple organs of a COVID-19 patient who had died of multiorgan failure. Besides the respiratory system (e.g., lungs and trachea), it also infected the kidneys, small intestines, pancreas, blood vessels, and other tissues, such as sweat glands and vascular endothelial cells in the skin [9].
SARS-CoV-2 replicates more actively and effectively in human lung tissues than SARS-CoV; the higher viral load found was likely due to ongoing immune evasion mechanisms or defective viral clearance [10]. Mutations in the S1 protein or other regions involved in binding and entry into the human cell have been associated with the infectivity of different strains of SARS-CoV-2.
Patients infected with SARS-CoV-2 may have no symptoms or may develop a critical illness. According to the NIH, five categories of severity exist: (1) asymptomatic or pre-symptomatic infection, with a positive test and no symptoms; (2) mild illness with symptoms (fever, cough, sore throat, malaise, headache, muscle pain, nausea, vomiting, diarrhoea, and loss of taste and smell) but without shortness of breath, dyspnoea, or abnormal chest imaging; (3) moderate illness with lower respiratory disease and oxygen saturation ≥ 94%; (4) severe illness with oxygen saturation under 94%, a ratio of arterial partial pressure of oxygen to fraction of inspired oxygen < 300 mm Hg, a respiratory rate > 30 breaths/min, or lung infiltrates > 50%; (5) critical illness with respiratory failure, septic shock, and/or multiple organ dysfunction [11].
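The five categories above can be read as a decision rule evaluated from the most to the least severe condition. The following Python sketch is only an illustration of that reading; the `Findings` container, its field names, and the `classify_severity` function are hypothetical and are not part of the NIH guidance or of this review, and the sketch is not intended for clinical use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Findings:
    """Hypothetical container for the clinical findings named in the text."""
    positive_test: bool
    has_symptoms: bool = False
    lower_respiratory_disease: bool = False
    spo2_percent: Optional[float] = None              # oxygen saturation
    pao2_fio2_mmhg: Optional[float] = None            # PaO2/FiO2 ratio
    resp_rate_per_min: Optional[float] = None         # breaths per minute
    lung_infiltrates_percent: Optional[float] = None
    critical_event: bool = False  # respiratory failure, septic shock, organ dysfunction


def classify_severity(f: Findings) -> str:
    """Return the severity category, checking the most severe criteria first."""
    if not f.positive_test:
        return "not infected"
    if f.critical_event:
        return "critical"
    # Any single criterion is sufficient for "severe"; missing values are skipped.
    severe = (
        (f.spo2_percent is not None and f.spo2_percent < 94)
        or (f.pao2_fio2_mmhg is not None and f.pao2_fio2_mmhg < 300)
        or (f.resp_rate_per_min is not None and f.resp_rate_per_min > 30)
        or (f.lung_infiltrates_percent is not None and f.lung_infiltrates_percent > 50)
    )
    if severe:
        return "severe"
    # Assumption: with no severe criterion met, lower respiratory disease is "moderate".
    if f.lower_respiratory_disease:
        return "moderate"
    if f.has_symptoms:
        return "mild"
    return "asymptomatic or pre-symptomatic"
```

Checking the criteria from critical downwards keeps the categories mutually exclusive, so a patient who meets both the moderate and the severe description is assigned the more severe one.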
A genetic predisposition to viral infection or disease progression has been proposed. Biomarkers can alert medical doctors to a patient's susceptibility or resistance to SARS-CoV-2. Human biomarkers can be used to detect and predict severe or life-threatening COVID-19. Disease severity might thus be foreseen even before infection with SARS-CoV-2.
In this review, we have performed a literature and database search to summarize potential viral and host-related genomic and epigenomic biomarkers, such as genetic variability, miRNA, and DNA methylation in the molecular pathway of SARS-CoV-2 entry into the host cell, that may be related to COVID-19 susceptibility and severity.

Virus Strains
Viral mutations and recombination gave birth to new strains that may have different reproduction rates or infectivity and may impact the course and severity of the disease.
SARS-CoV-2 variants were named after letters of the Greek alphabet, and the designation is based on the positions and number of mutations. There is some disagreement regarding which mutations belong to specific strain groups, probably because different mutations evolved and spread on different continents and in different countries. Mutations labeled with * are present only in some strains [15]. The Alpha variant has the mutations E484K*, D614G, delH69V70, and N501Y. Next, the Beta strain, characterized by the E484K, D614G, A701V, N501Y, L242_244L, and K417N mutations, and the Gamma strain, with the mutations E484K, D614G, K417T, N501Y, and T20, evolved, and both outcompeted the wild-type strain [16]. The Delta variant, with the mutations D614G, L452R, P681R, and T478K, emerged in April 2021. In July 2021, the Delta variant outcompeted all other strains [16]. Iota has the mutations A701V*, E484K*, L452R*, and D614G; Epsilon has L452R and D614G; Eta has E484K, D614G, and delH69V70; Kappa has E484Q, D614G, L452R, and P681R [18].
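The mutation lists above lend themselves to simple set operations, for example to find which mutations are shared by all listed variants. The sketch below is a minimal illustration that transcribes the mutations exactly as written in this section; the dictionary layout, the split into `CORE` and `OPTIONAL` (for mutations marked with *), and the function name are assumptions made for the example.

```python
# Spike mutations per variant, transcribed from the text above.
# CORE holds mutations listed without *, OPTIONAL those marked with *.
CORE = {
    "Alpha": {"D614G", "delH69V70", "N501Y"},
    "Beta": {"E484K", "D614G", "A701V", "N501Y", "L242_244L", "K417N"},
    "Gamma": {"E484K", "D614G", "K417T", "N501Y", "T20"},
    "Delta": {"D614G", "L452R", "P681R", "T478K"},
    "Iota": {"D614G"},
    "Epsilon": {"L452R", "D614G"},
    "Eta": {"E484K", "D614G", "delH69V70"},
    "Kappa": {"E484Q", "D614G", "L452R", "P681R"},
}
OPTIONAL = {
    "Alpha": {"E484K"},
    "Iota": {"A701V", "E484K", "L452R"},
}


def variants_with(mutation: str, include_optional: bool = False) -> list:
    """Return the sorted names of variants that carry the given mutation."""
    names = []
    for name, core in CORE.items():
        carried = core | OPTIONAL.get(name, set()) if include_optional else core
        if mutation in carried:
            names.append(name)
    return sorted(names)


# Mutations present in every listed variant.
shared_by_all = set.intersection(*CORE.values())

print(shared_by_all)             # {'D614G'}
print(variants_with("L452R"))    # ['Delta', 'Epsilon', 'Kappa']
```

Keeping the starred mutations in a separate mapping makes it explicit whether a query should count mutations that are reported only in some isolates.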
A new Omicron variant (B.1.1.529) emerged at the end of November 2021 [19]. It is classified as a variant of concern (next to Beta, Gamma, and Delta) and holds 33 spike protein mutations, many of which were found in the Alpha and Delta strains [19].

Virus Mutation Positions and Their Influence on SARS-CoV-2 Disease Development
SARS-CoV, which has a similar structure and RNA sequence to SARS-CoV-2, had an estimated mutation rate of approximately 0.80-2.38 × 10⁻³ nucleotide substitutions per site per year, with non-synonymous and synonymous substitution rates of approximately 1.16-3.30 × 10⁻³ and 1.67-4.67 × 10⁻³ per site per year, respectively, which is similar to other RNA viruses [20]. The large CoV RNA genome allows modification by introducing "non-lethal" mutations and recombination, leading to an increased probability of intraspecies variability, interspecies "host jump", and the emergence of novel CoVs [20]. SARS-CoV-2 has a higher fidelity in its transcription and replication process than other single-stranded RNA viruses because it has a proofreading mechanism, regulated by NSP14. However, despite this mechanism, the mutation rate is very high [21].
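To give a sense of scale, the per-site rate can be converted into an expected number of substitutions per genome per year by multiplying by the genome length. The sketch below assumes a genome of about 30,000 nucleotides, a round figure that is not stated in this review; the function name is likewise only illustrative.

```python
# Assumption: a coronavirus genome of roughly 30,000 nucleotides (round figure,
# not taken from the text).
GENOME_LENGTH_NT = 30_000


def substitutions_per_genome_per_year(rate_per_site_per_year: float,
                                      genome_length_nt: int = GENOME_LENGTH_NT) -> float:
    """Expected substitutions per genome per year = per-site rate x genome length."""
    return rate_per_site_per_year * genome_length_nt


# Lower and upper bounds of the estimated SARS-CoV rate quoted above [20].
low = substitutions_per_genome_per_year(0.80e-3)
high = substitutions_per_genome_per_year(2.38e-3)
print(f"{low:.1f} to {high:.1f} substitutions per genome per year")
```

Under this assumption the quoted range corresponds to roughly 24 to 71 substitutions per genome per year.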
The mutation rate of SARS-CoV-2 is so high that it may impact diagnostic test accuracy [21]. In summary, the target spike and other SARS-CoV-2 proteins have numerous mutations. In total, 13,402 single mutations were found among 31,421 virus isolates, many of them located in coding regions currently used for COVID-19 diagnostic tests [21].
Information regarding 3D structure and mutagenesis is getting more accurate, which helps in drug development. Fast mutagenesis helps the virus evolve much faster than the human defense system can adapt to.

Virus-Host Interactions Affecting Viral Replication and Transcription
Viral infection triggers several mechanisms that are both virus- and host-dependent. On the one hand, viral replication affects transcription factors that promote viral replication; on the other hand, the host defense mechanism tries to activate factors to stop the virus from replicating itself.

One CpG island with 18 CpG sites was detected at the start of the viral RNA (UCSC Genome Browser). This site could affect the replication process and the translation of proteins encoded at the start of the RNA. If SARS-CoV-2 sequences integrate into human DNA, this CpG island could influence when and how virus reactivation would start.
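CpG islands are commonly defined by sequence statistics: a window of at least about 200 nucleotides with a GC content of at least 50% and an observed-to-expected CpG ratio above 0.6. The sketch below computes these statistics for an arbitrary sequence. It is a generic illustration of the widely used criteria, not a reproduction of the UCSC track computation, and the function name and the toy sequence in the usage note are hypothetical.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, CpG count, GC content, and observed/expected CpG ratio."""
    s = seq.upper().replace("U", "T")  # treat RNA as its DNA equivalent
    n = len(s)
    c_count = s.count("C")
    g_count = s.count("G")
    cpg_count = s.count("CG")  # CG dinucleotides cannot overlap each other
    gc_content = (c_count + g_count) / n if n else 0.0
    # Expected CpG count if C and G were placed independently along the sequence.
    expected = (c_count * g_count) / n if n else 0.0
    obs_exp = cpg_count / expected if expected else 0.0
    return {
        "length": n,
        "cpg_count": cpg_count,
        "gc_content": gc_content,
        "obs_exp": obs_exp,
    }


def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the commonly used length, GC content, and obs/exp thresholds."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_length
            and stats["gc_content"] >= min_gc
            and stats["obs_exp"] > min_obs_exp)
```

For the toy sequence `"ACGCGTTA"`, `cpg_stats` reports two CpG sites, a GC content of 0.5, and an observed/expected ratio of 4.0, although the sequence is far too short to qualify as an island.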
Interestingly, only the intron variant of ACE2 and two synonymous variants of TMPRSS2 are expected to influence SARS-CoV-2 entry (Table 1). Polymorphisms in the ADAM17 gene were suggested to have a role in the outcome of the disease [37]. It was proposed that the higher frequency of the ACE D allele contributed to higher numbers of infected patients per million and a higher mortality rate in the Asian population [66].
Vitamin D deficiency was also associated with a more severe course of COVID-19. Polymorphisms in genes that may lead to this condition, such as the vitamin D transporter (GC), vitamin D receptor (VDR), and NAD synthase (NADSYN1) genes, were associated with the critical condition [65].
Blood type and HLA type may also contribute to susceptibility to COVID-19 and to its severity. Increased susceptibility to SARS-CoV-2 was found in patients with blood group A (ABO; groups A, B, and O), the e4e4 genotype of APOE (alleles e3 and e4), HLA B, DRB1, DQB1, and DRB1 alleles, the minor allele of the 3p21.31 region, and the novel missense variants GOLGA8B rs200975425 and RIMBP3 rs200584390 [48].
Polymorphisms in genes that are directly or indirectly involved in the immune defense system contribute in different ways to the infectivity, severity, and mortality of the SARS-CoV-2 disease (Table 1).
Several additional polymorphisms were associated with COVID-19 (Table 1).
Gene ontology analysis (WebGestalt) (Supplementary Materials (Supplementary File S1)) shows that proteins encoded by the genes listed in Table 1 are highly interconnected in the regulation of defense response (p = 4.0406 × 10⁻¹⁰, enrichment ratio = 9.1490), mostly through regulation of interferon alpha and beta (p = 5.1619 × 10⁻¹⁰, enrichment ratio = 277.89). They are mostly localized on the cellular membrane (p = 5.4530 × 10⁻⁶, enrichment ratio = 4.3801) or endosomal membranes (p = 1.0358 × 10⁻⁴, enrichment ratio = 7.6820), where they combat virus entry with the help of signal receptor activity (p = 6.8370 × 10⁻⁴, enrichment ratio = 3.4852) and exopeptidase activity (p = 1.0791 × 10⁻³, enrichment ratio = 14.740). Data were accessed as described in Supplementary Materials (Supplementary File S1). Genetic polymorphisms may change cell defense parameters and contribute to differences in susceptibility and later disease severity.
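Enrichment ratios and p-values of this kind typically come from an over-representation analysis, in which the number of listed genes annotated to a category is compared with the number expected by chance. The sketch below shows the usual calculation with a hypergeometric test. It is a generic illustration and not the exact WebGestalt procedure (which also applies multiple-testing correction); the function name and the counts in the usage note are hypothetical.

```python
from math import comb


def over_representation(overlap: int, list_size: int,
                        category_size: int, background_size: int):
    """Return (enrichment_ratio, p_value) for one gene ontology category.

    overlap         genes from the input list annotated to the category
    list_size       number of genes in the input list
    category_size   number of background genes annotated to the category
    background_size number of genes in the reference (background) set
    """
    # Expected overlap if the input list were a random draw from the background.
    expected = list_size * category_size / background_size
    enrichment_ratio = overlap / expected

    # Hypergeometric upper tail: probability of an overlap at least this large.
    max_overlap = min(list_size, category_size)
    favourable = sum(
        comb(category_size, k) * comb(background_size - category_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    p_value = favourable / comb(background_size, list_size)
    return enrichment_ratio, p_value
```

With hypothetical counts of 2 overlapping genes in a list of 4, a category of 5 genes, and a background of 20 genes, the expected overlap is 1, so the enrichment ratio is 2 and the p-value is 1205/4845, about 0.25.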

Engineered ACE2 Mutations May Predict Virus-Host Interactions
Because ACE2 is the main interaction partner of the spike protein, it has been thoroughly investigated for mutations that increase or decrease its interaction with the SARS-CoV-2 spike protein. These studies predicted an important role for mutations in regions 19-42, 69-92, and 324-330 (Table 2). The human ACE2 receptor with the Y27, L330, and L386 triple mutation showed the highest increase in interaction with the RBD of the SARS-CoV-2 spike protein (Uniprot database section: "Mutagenesis"). The opposite effect was observed for the ACE2 D355N mutation both in vitro and in vivo [70]. The ACE2 D355A mutation has a similar effect when interacting with the spike protein of SARS-CoV-1 (Uniprot database section: "Mutagenesis"). These mutations are located in or close to the spike binding site (Figure 4).
The diversity of polymorphisms in genes with different functions indicates that virus replication is affected at all stages, from entry, to transcription, to exocytosis (Figure 1). Knowledge of a patient's genetic background may support an informed choice of treatment.

Changes in mRNA Expression: miRNA "Silencing" Interference
siRNAs are small RNAs that interfere with translation by binding to mRNA. miRNAs have a similar role and regulate gene expression in the cytoplasm. They are also involved in transcriptional gene regulation and alternative splicing [71]. miRNA binding slows down the replication of viral RNA and the translation of viral proteins. Genetic variations may change either the miRNA or the target site sequence and thus change the expression pattern of several genes.
The virus may interfere with RNA expression in host cells by exploiting endogenous miRNA and its viral RNA.
The virus attacks the cell through miRNA and subdues it for faster viral replication. Published data show that viral RNA interacts extensively with human miRNAs to change the expression of important defense molecules [29,72,73]. Our analysis showed that numerous human miRNAs may form duplexes with different coding and non-coding regions of viral RNA. These interactions could disturb the replication of viral RNA and/or the translation of important viral proteins. RNA polymerase and ribosomes require additional energy to remove duplexes during RNA duplication or protein synthesis.
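A common first step in predicting miRNA-RNA duplexes is to search the target for sites complementary to the miRNA seed (nucleotides 2 to 8 from the 5' end). The sketch below implements only this seed search with strict Watson-Crick pairing. It is not the analysis pipeline used in this review, it ignores G:U wobble pairs and duplex free energy, and the sequences in the usage note are made up for illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (Watson-Crick only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def seed_sites(mirna: str, target: str) -> list:
    """Return 0-based start positions in target that pair with the miRNA seed.

    The seed is taken as miRNA nucleotides 2-8 (a 7-mer). Because the duplex is
    antiparallel, the matching site in the target is the seed's reverse complement.
    """
    seed = mirna.upper()[1:8]
    site = reverse_complement(seed)
    target = target.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions


# Hypothetical toy sequences, not real miRNA or viral sequences.
toy_mirna = "AACGGUCAUCCGAUUAGCUAAG"
toy_target = "GGAUUGACCGUCCAUAGUGACCGUAA"
print(seed_sites(toy_mirna, toy_target))  # [4, 17]
```

Seed matches only nominate candidate sites; a full prediction would also score pairing outside the seed and the thermodynamic stability of the duplex.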

Changes in DNA Methylation Profile
Coronaviruses can delay pathogen recognition and block interferon-stimulated genes [76]. Several known viral proteins associated with viral pathogenesis are controlled epigenetically [76]. Viruses such as Epstein-Barr virus and SARS-CoV-2 can demethylate the syncytin 1 and 2 genes, resulting in the augmentation of gene transcription [76]. Hypomethylation of ACE2 coupled with demethylation of interferon- and cytokine-regulated genes and an enhanced NF-κB axis have been shown to contribute to SARS-CoV-2 disease severity [76]. Critically ill COVID-19 patients had hypermethylation of IFN-related genes and hypomethylation of inflammatory genes [76].
An epigenome-wide COVID-19 study reported 51 CpG sites with different methylation profiles between moderate and severe cases [77]. After thorough analysis, 44 CpGs were marked as important, as follows: 15 CpG sites were located in human genomic regions with no currently described gene sequence; 6 CpG sites were associated with non-coding RNA; 23 CpG sites were located within 20 known coding genes. In 17 out of 20 coding genes (85%), the presence of hyper-methylation was significantly associated with transcript down-regulation; 7 out of 20 were effectors of interferon signalling, as follows: AIM2, HLA-C, IFI44L, CXCR2, KIFAP3, SGMS1, and VIM [77]. Two CpG methylation sites each were found in PM20D1, AIM2, and HLA-C, which accounts for 23 sites in 20 genes [77].
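Differential methylation between two groups is often summarized per CpG site as the difference in mean beta value (the methylated fraction, between 0 and 1). The sketch below illustrates that comparison. It is not the statistical procedure of the cited study; the 0.1 threshold, the function names, and the probe names and values in the usage note are all hypothetical.

```python
from statistics import mean


def delta_beta(moderate: dict, severe: dict) -> dict:
    """Mean beta in severe cases minus mean beta in moderate cases, per CpG site."""
    return {
        cpg: mean(severe[cpg]) - mean(moderate[cpg])
        for cpg in moderate
        if cpg in severe
    }


def classify_site(delta: float, threshold: float = 0.1) -> str:
    """Label a site by the direction of the methylation difference.

    The threshold of 0.1 is an illustrative assumption, not a value from the text.
    """
    if delta >= threshold:
        return "hypermethylated in severe"
    if delta <= -threshold:
        return "hypomethylated in severe"
    return "no clear difference"


# Hypothetical beta values for three made-up probes.
moderate_betas = {"cg_a": [0.20, 0.30, 0.25], "cg_b": [0.80, 0.70, 0.75], "cg_c": [0.50, 0.52, 0.48]}
severe_betas = {"cg_a": [0.50, 0.60, 0.55], "cg_b": [0.40, 0.50, 0.45], "cg_c": [0.51, 0.50, 0.49]}

labels = {cpg: classify_site(d) for cpg, d in delta_beta(moderate_betas, severe_betas).items()}
print(labels)
```

A real epigenome-wide analysis would add a statistical test per site and a correction for multiple testing before calling a site differentially methylated.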
Methylation of DNA may influence virus replication, and vice versa; the infection may start to change the methylation profile of patients. Better knowledge of epigenetic markers could perhaps help predict the course of the disease.

Extracellular Vesicles as Biomarkers
Extracellular vesicles (EVs), such as exosomes and microvesicles, could be used as biomarkers if sufficiently up- or down-regulated in COVID-19 patients. Several lipid molecules, GM3, and sphingomyelins were enriched in exosomes, while diacylglycerol levels were decreased [78].
EV research is new, and it is expected that several new diagnostic methods will be developed, especially in virology and immunology.

Conclusions
This review described viral and host parameters that may influence SARS-CoV-2 cell entry and infectivity, along with factors that influence the susceptibility to and severity of the COVID-19 disease. Virus mutations, strains, changes in transcriptome, miRNA "silencing" interference, methylation profiles (epigenetics), and individual polymorphisms were reviewed.
COVID-19 appeared at the end of 2019, and several new strains have already been discovered. A new strain can outcompete other strains within a few months. New mutations in the RBD domain can turn the virus into more infectious strains that cause disease with more severe symptoms. Sequencing analysis shows that beneficial mutations remain in the viral sequence by outcompeting the wild type. Virus replication may also be influenced by host genetic variability during all phases, from entry and transcription to the final stage. Several human polymorphisms, miRNAs, and methylation profiles were associated with different susceptibility, severity, and even mortality of the disease. In the future, complete genome sequence information may support a professional, personalized medicine approach to diagnosing and treating COVID-19.
Knowledge of genetic and epigenetic biomarkers may help predict the course of the disease and the response to treatment. Research of extracellular vesicles is also on the rise, so new COVID-19 biomarkers are anticipated due to differences in vesicle composition.
Data Availability Statement: All the data are included in the paper; any additional data is available from the authors upon request.