Knowledge of SARS-CoV-2 Epitopes and Population HLA Types Is Important in the Design of COVID-19 Vaccines

The COVID-19 pandemic has caused extensive loss of life and economic hardship. In response, infectious disease experts and vaccine developers promptly brought forth candidate vaccines, some of which have been included in the World Health Organization’s Emergency Use Listing. Despite the genetic diversity of populations worldwide, the vaccines developed thus far are generic in nature and intended for use worldwide. Differences in the human leukocyte antigen (HLA) among populations, variation in T cell epitopes, and the propensity of SARS-CoV-2 to mutate leave room for improvement of the vaccines. Here, we discuss the implications of COVID-19 vaccination and SARS-CoV-2 infection by taking into consideration SARS-CoV-2 mutations, T cell epitopes, risk factors, and current platforms of candidate vaccines, based on the HLA types that are commonly present in the Peninsular Malaysia Chinese, Indian, and Malay populations. The HLA types associated with protection against and susceptibility to severe SARS-CoV-2 infection were identified from reported case-control and cohort studies. The relevance of including non-spike SARS-CoV-2 proteins in future COVID-19 vaccines is also highlighted. This review is meant to encourage researchers to investigate the possible relationships between the HLA haplotype and the SARS-CoV-2 strains circulating in different populations.


Introduction
The COVID-19 pandemic has impacted our lives not only physically and mentally, but also economically. To date, approximately 580 million confirmed cases of COVID-19 and 6.4 million deaths have been recorded globally. The impact is no less devastating in Malaysia: with over 5000 confirmed cases daily, the total number of confirmed COVID-19 cases is approaching 5 million with almost 36,000 deaths [1].
The SARS-CoV-2 virus is the etiological agent responsible for several pneumonia-like cases that began in Wuhan, China [2]. Because Wuhan is the hub of transportation and industry for central China, the outbreak, which started in early November or December 2019, rapidly spread to become a pandemic [3]. As with other viruses that are transmitted through direct, indirect, or close contact with respiratory secretions or droplets from infected people [4], the spread of this Betacoronavirus was greatly facilitated by international air travel [4]. The enveloped SARS-CoV-2 virus bears a large (approximately 30 kb) single-stranded, positive-sense RNA genome consisting of up to 14 open reading frames (ORFs) that are translated into the spike (S) protein, matrix (M) protein, envelope (E) protein, nucleocapsid (N) protein, and about 16 non-structural proteins (nsps) [5]. Like other RNA viruses, SARS-CoV-2 accumulates genomic mutations as it replicates, and these are subject to natural selection [6]. A number of mutations augment the ability of the virus to replicate and to evade the host immune responses [6].
With the growing number of cases and the emergence of new SARS-CoV-2 mutants, infectious disease experts, epidemiologists, and public health officers have worked relentlessly to control the spread of the infection and, at the same time, to deduce the consequences of SARS-CoV-2 mutations. Within a year of the start of the COVID-19 pandemic, vaccines had been manufactured and administered to millions around the world. The exact mechanism by which SARS-CoV-2 causes severe COVID-19 disease, however, is still not known. Here, we look at the potential importance of the human leukocyte antigen (HLA) in COVID-19 [7], focusing on the multi-ethnic Malaysian population.
The major histocompatibility complex (MHC) system, or HLA complex in humans, is located on the short arm of chromosome 6 (6p21.3) [8]. Linked HLA genes (HLA-A, -B, -C, -DR, -DQ, -DP) are normally inherited en bloc from each parent, without recombination, as an HLA haplotype transmitted on a single parental chromosome [9]. In keeping with their essential functions in self-recognition, in eliciting the immune response to an antigenic stimulus, and in regulating cellular and humoral immunity, HLA class I antigens (HLA-A, -B, and -C) are expressed on the surface of platelets and of all nucleated cells (except those of the central nervous system) [10], while HLA class II antigens (HLA-DR, -DP, and -DQ) are expressed on antigen-presenting cells (APCs) [10]. These highly polymorphic HLA loci are involved in antigen presentation to CD8+ T cells (HLA class I), natural killer cells, and CD4+ T cells (HLA class II) [11].
The fate of the SARS-CoV-2 virus and the outcome of the infection are highly dependent on the efficiency of one's immune system, particularly T cell immunity. Because HLA haplotypes occur at different frequencies in different populations, the efficiency of SARS-CoV-2 viral clearance and, in turn, disease progression are speculated to vary. Studies of SARS-CoV-2 and HLA have focused on the involvement of cytotoxic CD8+ T and helper CD4+ T lymphocytes, as their responses are vital for initial viral clearance, the development of immunologic memory, and eventually for orchestrating the adaptive immune responses [12]. In this report, we explore the potential repercussions of SARS-CoV-2 infection based on the HLA allele frequencies in the Malaysian population, highlighting the HLAs that could contribute to protection against or exacerbation of SARS-CoV-2 infection.

SARS-CoV-2 Specific T Cell Epitopes
The search for potential vaccine targets has led to numerous studies to decipher the T cell epitopes that can evoke MHC-I and MHC-II responses. In Table 1, we present the distribution of SARS-CoV-2 T cell epitopes as predicted from a combination of cohort (unexposed and convalescent individuals), bioinformatics, and mathematical modeling studies. Presentation of multiple SARS-CoV-2 epitopes is deemed critical in the induction of vaccine-based and natural infection immunity [13][14][15][16]. Detection of post-infectious T cell immunity is feasible with SARS-CoV-2-specific peptides even in seronegative convalescent individuals [13,17]. In the absence of antibody responses, specific T cell responses were observed in seronegative convalescent donors but not in unexposed donors, emphasizing the activation of T cell immunity upon infection. SARS-CoV-2-specific CD4+ T cells appear to be more important than HLA class I T cell epitopes in evoking persistent and robust immune responses [13]. CD4+ T cells recognize multiple dominant HLA-DR T cell epitopes [13]. The SARS-CoV-2 M protein was recognized by specific CD4+ T cells in COVID-19 cases [15]. The scarcity of high-quality class II epitopes from the M protein is attributed to its small size [18]. Although class II epitopes are found across the SARS-CoV-2 genome, it appears that highly expressed proteins are preferred by memory CD4+ T cells [19].

Table 1. Distribution of CD4+ and CD8+ epitopes based on SARS-CoV-2 proteins.

Columns: No.; Protein(s) (and their respective numbers of epitopes); Subset; Ref.
These findings coincided with different subsets of SARS-CoV-2-specific T cells. Cytotoxic CD4+ T cells might not be a major contributor to SARS-CoV-2 clearance since, unlike in influenza virus infections, CD107a+ CD4+ T cells (of cytotoxic potential) are scarcely detected [20]. Incorporating non-spike proteins such as the N, M, and ORF proteins in future vaccine designs may be beneficial, as central memory and effector memory CD8+ T cells were identified in response to those proteins [20]. Ferretti et al. acknowledged in their study that next-generation vaccines incorporating shared SARS-CoV-2 epitopes residing outside the spike protein will not only be independent of mutational variation but will also be better at eliciting SARS-CoV-2-specific CD8+ T cell immunity [16].
Heterologous immunity in SARS-CoV-2 infection is characterized by pre-existing T cell responses against SARS-CoV-2 peptides [13]. The immunity is cross-reactive with common cold coronaviruses in 81% of unexposed individuals [13]. Mateus et al. (2020) also demonstrated the capability of SARS-CoV-2-specific memory CD4+ T cells to cross-react with corresponding sequences (~67% homology) from the commonly circulating common cold human coronaviruses (HCoV)-OC43, -229E, -NL63, and -HKU1 [19]. However, this cross-reactivity seems to occur in one direction only and not vice versa [19]. Although still highly speculative, pre-existing HCoV memory CD4+ T cells are thought to influence the magnitude of SARS-CoV-2 infection [19]. Although the magnitude of T cell responses is not associated with disease severity, severely ill patients possibly lack pre-existing SARS-CoV-2 T cells. This is suggested by lower recognition rates of SARS-CoV-2 T cell epitopes in individuals with more severe COVID-19 symptoms compared to non-hospitalized patients with high antibody titers [13].
While bioinformatics- and mathematical modeling-type studies have limitations of their own [21][22][23][24], cohort and case-control studies also come with some drawbacks [13][14][15][19][20][25]. The cohort and case-control studies discussed in this review are largely affected by the number of donors. A meticulous comparison of mild and severe cases is not feasible without taking diverse T cell receptors, peptide-MHC affinities, and antigen sensitivities for different epitopes into consideration [13][14][15][19][20][25]. These factors deserve to be addressed in future studies. Different techniques applied to determine IFN-γ-producing SARS-CoV-2-specific T cell responses yield contrasting results. This drawback stems from discrepancies between detection methods, as demonstrated by peptide-stimulated activation-induced marker (AIM) assays versus the ELISpot and intracellular cytokine staining (ICS) assays used in a recent phase I human immunogenicity trial of a recombinant adenovirus type-5-vectored COVID-19 vaccine [14,26]. Although both approaches are valid, their functional relevance differs.
The geographical regions from which the studied donors were recruited also influence cross-reactive responses, as different coronaviruses (in both humans and animals) circulate in different populations [27][28][29][30]. Pre-existing T cells exhibiting cross-reactivity do not necessarily imply previous coronavirus infections; they could potentially be primed by other microbes too [31]. Thus, further detailed investigations based on this factor are necessary. Furthermore, the cohort studies focused on T cell responses in PBMCs instead of on memory T cells at the site of infection, which most likely contribute to effective protection, as observed in influenza virus infection [20].
The HLA class I and II loci identified in SARS-CoV-2 infection and epitope recognition, gathered from the referred studies [15][19][20][21][22][25][32][33][34][35][36][37][38][39][40] and based on the Malaysian population, are presented in Figures 1 and 2. We extracted the HLA phenotype frequencies shown in Figures 1 and 2 only from "gold-standard" data sets comprising Peninsular Malaysia Chinese (n = 194) [32], Indian (n = 271) [33], and Malay (n = 951) [34] populations, as available at allelefrequencies.net. The data set of HLA phenotype frequencies in the Malaysian population is notably insufficient, as it covers only the HLA-A, -B, -C, -DRB1, and -DQB1 loci. This limitation should prompt further investigation of Malaysian HLA phenotype frequencies, which are vital not only for infectious diseases but for other diseases too. The HLA phenotype frequencies in these three main ethnicities in Malaysia clearly differ to some extent. For example, HLA-DRB1*11:04 and -DQB1*04:01 are distinctly absent in Malaysia's Indian population [33].
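As an illustration of how a phenotype frequency (the fraction of individuals carrying at least one copy of an allele) relates to an allele frequency, the following minimal Python sketch applies the standard Hardy-Weinberg relationship. The Hardy-Weinberg assumption, the function names, and the example values are ours for illustration only; they are not taken from the cited data sets, and real populations may deviate from this idealized model.

```python
import math


def phenotype_frequency(allele_frequency: float) -> float:
    """Fraction of individuals carrying at least one copy of an allele.

    Assumes Hardy-Weinberg equilibrium (an idealization): a person lacks
    the allele only if both chromosomes lack it, so pf = 1 - (1 - af)^2.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must be between 0 and 1")
    return 1.0 - (1.0 - allele_frequency) ** 2


def allele_frequency_from_phenotype(pf: float) -> float:
    """Inverse of phenotype_frequency under the same assumption."""
    if not 0.0 <= pf <= 1.0:
        raise ValueError("phenotype frequency must be between 0 and 1")
    return 1.0 - math.sqrt(1.0 - pf)


if __name__ == "__main__":
    # Hypothetical allele frequencies, NOT values from references [32]-[34].
    for af in (0.01, 0.05, 0.10, 0.25):
        print(f"allele frequency {af:.2f} -> "
              f"phenotype frequency {phenotype_frequency(af):.4f}")
```

Under this assumption, an allele with a frequency of 0.10 would be carried by roughly 19% of individuals, which is why phenotype frequencies are always at least as large as the corresponding allele frequencies.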
The discrepancies between studies (case-control and/or cohort) on the association of HLA class I and II phenotypes with protection against, and susceptibility to and/or severity of, SARS-CoV-2 infection are predicted to be greatly influenced by the population being studied and the alignment of the SARS-CoV-2 epitopes used in the studies. For instance, HLA-DRB1*03:01 is associated with both protection against and susceptibility to SARS-CoV-2 infection in donors recruited from Oxford, the United Kingdom, and from the Italian Bone Marrow Donor Registry (when occurring as the haplotype HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g), respectively [20,35]. In addition, HLA-DRB1*03:01 is also associated with protection against SARS-CoV-1 infection in Taiwan's healthcare workers [40].
The outcomes of studies revolving around HLA-C*07:01 point to the association of this particular phenotype with both protection against and susceptibility to severe SARS-CoV-2 infection [35,36]. When it occurs as the haplotype HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g, which is more frequent in the southern region of Italy, it is associated with protection against SARS-CoV-2 infection, whereas the haplotype HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g, which is more frequent in the northern region of Italy, has the opposite association [35]. In another study involving a cohort from the Cagliari population (the southern region of Sardinia Island in Italy), the extended haplotype of HLA-C*07:01 is anticipated to provide protection against SARS-CoV-2 infection [36]. Taking all of the discussed studies together, an investigation of the relationship between HLAs and SARS-CoV-2 infection should ideally begin with the SARS-CoV-2 sequences circulating in Malaysia and with larger data sets covering more diverse HLA phenotypes in the Malaysian population; most important of all is to identify the haplotype itself. From the two Italian studies, we can conclude that the outcome of SARS-CoV-2 infection is highly dependent on the polymorphism of HLA-B*08:01, -DRB1*03:01, and -C*07:01 in particular, together with their combination with other alleles as a haplotype [35,36].
In the case of ACE2, conditions such as hypertension, diabetes, cardiovascular disease, and chronic obstructive pulmonary disease (COPD) are also described as risk factors [89][90][91][92][93]. Because ACE2 is a receptor for SARS-CoV-2 and is widely distributed in the lung, adipose, and endothelial tissues, COVID-19 disease progression is associated with the above-mentioned co-morbidities [94,95]. The use of angiotensin-converting enzyme inhibitor (ACEI) and angiotensin-receptor blocker (ARB) therapy was debated early in the COVID-19 pandemic, since antihypertensive medications can modulate the expression of the ACE2 protein [96]. Setting aside ACE2 polymorphism and its genetic association with hypertension traits, which will (hopefully) be examined in larger cohorts in future studies, continuing ACEI/ARB therapy in patients is, for now, considered safe [96].
The impact of SARS-CoV-2 infection on diabetic patients is also being observed. The differential expression of the ACE2 protein between non-diabetic and diabetic subjects is reflected in their lung tissue, liver, and pancreas. The expression of ACE2 is higher in the bronchial and alveolar tissue [97], pancreatic islets [98], and liver [99] of subjects with diabetes than in those of normal subjects. The ramifications of increased ACE2 expression in these three organs of diabetic patients for SARS-CoV-2 infection and COVID-19 pathology warrant further investigation. Among the other diseases hypothesized to be co-morbidities and contributing factors to the exacerbation of COVID-19 clinical outcomes, COPD is also associated with increased mortality rates in COVID-19 patients [100]. Together with smoking behavior, COPD poses a greater risk of progression towards severe COVID-19 outcomes [101][102][103].
Although previous vaccinations, especially the Bacille Calmette-Guérin (BCG) vaccine, have featured extensively in SARS-CoV-2-related studies, researchers have not produced sound evidence to justify the use of BCG for the prevention of COVID-19 [104]. The emergence of new SARS-CoV-2 variants as a result of mutation has led the public to question the efficacy of recently developed vaccines. A theoretical study by Agerer et al. (2021) demonstrated that nonsynonymous point mutations in SARS-CoV-2 MHC-I-restricted epitopes enable the virus to evade CD8+ T cell surveillance [105]. Toyoshima et al. found that the ORF1ab 4715L and S protein 614G variants are strongly correlated with fatality rates [76]. However worrisome the impact of SARS-CoV-2 mutations on mortality rate is, the most controversial mutations, such as the ones involving spike D614G, P323L, and N501Y, have so far been linked only to higher viral load and younger patient age, enhanced SARS-CoV-2 transmission capacity, and improved viral fitness in different geographical regions [106,107]. All things considered, our previous remark on including the least-mutated, non-spike T cell epitopes in SARS-CoV-2 vaccine development still stands.
Conventionally, two doses are required (60% of the candidate vaccines), given at an interval of 14, 21, or 28 days in adherence to the prime-boost regimen, while 19% of the candidate vaccines require only one dose and 2% require three doses administered on days 0, 28, and 56 [108]. Most of the candidate vaccines are administered intramuscularly (76%), whereas 5% are intradermal and 3% are subcutaneous [108]. Currently, at the dawn of their pre-clinical development, two potential candidate vaccines are deemed convenient for vaccination, if successful, as they are to be administered intranasally (Coroflu) [109] and as a skin patch (PittCoVac) [110]. Recently, the WHO added five vaccines, BNT162 (Pfizer, USA), mRNA-1273 (Moderna, Cambridge, MA, USA), AZD-1222 (SK BIO, Seongnam, Korea), ChAdOx1_nCoV-19 (Serum Institute of India, Pune, India), and Ad26.COV2.S (Johnson & Johnson, New Brunswick, NJ, USA), to its Emergency Use Listing (EUL) for the COVID-19 pandemic, a procedure that was also used during the West Africa Ebola outbreak (2014-2016) [111]. More candidate vaccines listed in the WHO EUL/PQ (prequalification) evaluation process are still awaiting a decision.
Similar to other countries, Malaysia started its National COVID-19 Immunisation Programme at the end of February 2021, immediately after acquiring five COVID-19 vaccines: BNT162 (Pfizer, New York, NY, USA), AZD-1222 (AstraZeneca, London, UK), CoronaVac (Sinovac, Beijing, China), Ad5-nCoV (CanSinoBIO, Tianjin, China), and Sputnik V (Gamaleya Research Institute of Epidemiology and Microbiology, Moscow, Russia) [112]. A total of 66.7 million doses were intended to cover 109.65% of the country's population in three phases [112]. Mild to moderate, albeit short-lasting, COVID-19 vaccine side effects were anticipated and have been reported. Rare adverse events were also reported, regardless of whether they were related to the COVID-19 vaccines. As of March 2021, 7 cases of blood clots in multiple blood vessels (disseminated intravascular coagulation, DIC) and 18 cases of cerebral venous sinus thrombosis (CVST) had been reviewed by the European Medicines Agency (EMA)'s Pharmacovigilance Risk Assessment Committee (PRAC), out of 20 million vaccinated people in the United Kingdom and European Economic Area [113]. However, a causal link between these events and AstraZeneca's COVID-19 vaccine is still not proven, and its benefits still outweigh the risks, although these events deserve further investigation [113].
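To put the reviewed case counts into perspective, the short Python sketch below converts them into crude reporting rates per million vaccinated people. The counts (7 DIC cases, 18 CVST cases, 20 million vaccinated people) are those quoted above; the function and variable names are hypothetical, and the resulting figures are simple proportions of reviewed reports, not incidence estimates or evidence of causality.

```python
def rate_per_million(cases: int, population: int) -> float:
    """Crude number of reported cases per one million people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 1_000_000


# Figures quoted in the text for the PRAC review (as of March 2021).
VACCINATED = 20_000_000
DIC_CASES = 7
CVST_CASES = 18

if __name__ == "__main__":
    dic = rate_per_million(DIC_CASES, VACCINATED)
    cvst = rate_per_million(CVST_CASES, VACCINATED)
    combined = rate_per_million(DIC_CASES + CVST_CASES, VACCINATED)
    print(f"DIC:      {dic:.2f} reported cases per million vaccinated")
    print(f"CVST:     {cvst:.2f} reported cases per million vaccinated")
    print(f"Combined: {combined:.2f} reported cases per million vaccinated")
```

With these inputs, the combined figure is about 1.25 reviewed cases per million vaccinated people (roughly 0.35 for DIC and 0.90 for CVST), which illustrates how rare the reviewed events were relative to the number of people vaccinated.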
Given our genetic make-up, the genetic risk factors that might contribute to severe COVID-19 disease and severe post-vaccination side effects should also be considered. Examples include hypertension traits in human chromosomes and Factor V Leiden, the most common form of inherited thrombophilia, whose heterozygosity rate is highest in Europe [114][115][116][117]. Before denouncing COVID-19 vaccines, it is wiser to consider all plausible explanations that originate from the risk factors around us.

Conclusions
Considering that most of the vaccines were developed with the spike protein, and that this protein is prone to mutation, there is always room for improvement. By considering the SARS-CoV-2 T cell epitopes and the HLAs that are associated with protection against, and/or with susceptibility to or severity of, SARS-CoV-2 infection, future COVID-19 vaccines are expected to be designed around non-spike proteins such as the N, M, and ORF proteins. As highlighted in this perspective review, investigating the polymorphism of HLA-B*08:01, -DRB1*03:01, and -C*07:01 in a population is crucial, as the outcomes of COVID-19 disease and vaccination are highly dependent on these alleles. Together with potential pre-existing cross-reactive T cell immunity to SARS-CoV-2, such future vaccines could provide better COVID-19 clinical outcomes and influence the herd immunity predicted by epidemiological models.