Immunoinformatics Identification of the Conserved and Cross-Reactive T-Cell Epitopes of SARS-CoV-2 with Human Common Cold Coronaviruses, SARS-CoV, MERS-CoV and Live Attenuated Vaccines Presented by HLA Alleles of Indonesian Population

Reports on T-cell cross-reactivity against SARS-CoV-2 epitopes in unexposed individuals have been linked with prior exposure to the human common cold coronaviruses (HCCCs). Several studies suggested that cross-reactive T-cells response to live attenuated vaccines (LAVs) such as BCG (Bacillus Calmette–Guérin), OPV (Oral Polio Vaccine), and MMR (measles, mumps, and rubella) can limit the development and severity of COVID-19. This study aims to identify potential cross-reactivity between SARS-CoV-2, HCCCs, and LAVs in the context of T-cell epitopes peptides presented by HLA (Human Leukocyte Antigen) alleles of the Indonesian population. SARS-CoV-2 derived T-cell epitopes were predicted using immunoinformatics tools and assessed for their conservancy, variability, and population coverage. Two fully conserved epitopes with 100% similarity and nine heterologous epitopes with identical T-cell receptor (TCR) contact residues were identified from the ORF1ab fragment of SARS-CoV-2 and all HCCCs. Cross-reactive epitopes from various proteins of SARS-CoV-2 and LAVs were also identified (15 epitopes from BCG, 7 epitopes from MMR, but none from OPV). A majority of the identified epitopes were observed to belong to ORF1ab, further suggesting the vital role of ORF1ab in the coronaviruses family and suggesting it as a candidate for a potential universal coronavirus vaccine that protects against severe disease by inducing cell mediated immunity.


Introduction
Severe acute respiratory syndrome coronavirus (SARS-CoV-2) is a new variant of coronavirus that caused an ongoing pandemic of coronavirus disease 2019 (COVID-19) [1]. As of May 2022, approximately 519 million cases and 6.3 million deaths had been reported globally [2]. A variety of studies and research have been undertaken into both preventive and curative measures against the SARS-CoV-2 virus. SARS-CoV-2 is an RNA virus from betacoronavirus of Coronaviridae family. Its genome is a positive single-stranded RNA with a length of 26 to 32 kb, which encodes several structural and nonstructural proteins. SARS-CoV-2 is characterized by having spike proteins on the surface of the enveloped The protein sequences of coronaviruses were primarily retrieved from NCBI (https: //www.ncbi.nlm.nih.gov) on 14 september 2021. The reference numbers of each sequence are as follows: NC_002645.1 (229E), NC_006577.2 (HKU1), NC_005831.2 (NL63), NC_006213.1 (OC43), and NC_045512.2 (SARS-CoV-2) [24]. From each sequence, the encoded proteins present in all strains, namely ORF1ab, S, E, M, and N proteins, were selected for the subsequent analysis. The reference sequence of SARS-CoV-2 (NC_045512.2) was primarily used for T-cell epitope prediction.
For additional analysis performed in this study, ORF1ab sequences of SARS-CoV (NC_004718.3), MERS-CoV (NC_019843.3) were retrieved through NCBI [24]. ORF1ab sequences of SARS-CoV-2 VOCs Alpha (B.1.1.7), Beta (B.1.351), Delta (B.1.617.2), and Gamma (P.1) were obtained from the previous study conducted by Gustiananda et al. (2021) [25] and used for the analysis. The sequence for omicron (B. 1.1.529) variants was retrieved through NCBI on 13 February 2022 from Asian countries only. The included sequence was filtered through several criteria, such as a complete sequence of 7093-7096 amino acids in length with no ambiguous character and human as the host. The FASTA file of the sequences of SARS-CoV-2 VOCs used in this study can be found in File S1 of Supplementary Materials.

HLA Alleles Predominant in Indonesia Identification
The HLA alleles predominant in the Indonesian population were identified through the Allele Frequency Net Database (http://allelefrequencies.net/) [26] which was accessed on 16 september 2021. The run was done using 'HLA Classical Allele Freq Search' options, with the country parameters set to 'Indonesia'. The HLA Class I A, HLA Class I B, and HLA Class II DRB1 alleles with frequencies of more than 5% were chosen for further analysis.
The selected 9-mer CTL epitopes were then checked for their immunogenicity using the immunogenicity tool provided by IEDB (http://tools.iedb.org/immunogenicity/) which was accessed on 21 April 2022 [28]. Since the TCR contact residue of CTL epitopes is located in positions 3-8, residue numbers 1, 2, and C-terminal were selected to be masked in the available option. Peptides with positive immunogenicity were selected for further analysis.

Helper T Lymphocyte (HTL) Epitopes Prediction
Prediction of peptide binding to HLA Class II DRB1 was conducted using netMHCIIpan 4.0 tools (https://services.healthtech.dtu.dk/service.php?NetMHCIIpan-4.0) which was accessed on 3 November 2021 [29]. Several parameters, including peptide length (15-mer), the threshold for strong binder (1%), and the threshold for weak binder (5%), were set accordingly. In addition, the strong binder epitopes, denoted by 'SB,' were selected for further analysis. Like the CTL epitopes, promiscuous epitopes that bind to different HLA alleles were selected for further analysis.
The selected 15-mer HTL epitopes were checked for their IFN-γ inducing ability through the IFNepitope server (http://crdd.osdd.net/raghava/ifnepitope/predict.php) which was accessed on 29 April 2022 [30]. The HTL must be able to induce IFN-γ to ensure induction of an appropriate immune response. The algorithm chosen for the prediction is the Support Vector Machine (SVM) based, while the model is left as default as 'IFNgamma vs. Non-IFN-gamma.' Peptides with positive IFN-γ scores were selected for further analysis.

IEDB Epitopes Validation
Experimentally known epitopes from different studies are cataloged and available to be checked through the Immune Epitope Database (IEDB) (http://www.iedb.org) which was accessed on 17 May 2022 [31]. The short-listed epitopes, both CTL from step 2.3 and HTL from step 2.4, were checked against the database as validation measures. The epitope source was specified as 'SARS-CoV-2', 'Human' as the host, and antigen according to each generated epitope (ORF1ab, S, E, M, and N). Peptides reported and validated in IEDB were usually prioritized to be included in subsequent studies.

Conservancy and Cross-Reactivity Analysis of SARS-CoV-2 T-Cell Epitopes with HCCCs
In order to analyze the cross-reactivity of SARS-CoV-2 with other human CCCs, the conservancy analysis was done using the IEDB epitope conservancy analysis tool (http: //tools.iedb.org/conservancy/) which was accessed on 9 March 2022 [32]. The respective epitopes from each identified protein were run against the four human CCCs. As a result, the epitopes that share 100% similarity across all strains were identified. Information on the 100% conserved epitopes, such as location in the gene, proteins, and domains in all five Additional conservancy analysis was performed against the fully conserved epitopes. For more defined and complete results, the conserved epitopes were again run against SARS-CoV, MERS-CoV, previous, and currently circulating SARS-CoV-2 VOCs. Information regarding the epitopes in different viruses was also gathered and compared.
Epitopes that possess different residues in non-critical positions still have the possibility of binding to the same HLA alleles, provided that the residues fulfill the HLA binding motif. The epitopes that have identical TCR contact residue might still be recognized by the same T-cell clone. The TCR contact residues of CTL epitopes are located in residue number 3-8, while HTL epitopes are located in residue number 2, 3, 5, 7, and 8. Therefore, epitopes that possessed identical TCR contact residues across five different viruses during the conservancy run were identified. The sequence identity cutoff value in IEDB epitope conservancy analysis tool was first set to 67% to ensure optimal recognition and binding [33]. Following that, epitopes with identical TCR contact residue were sorted. The identified epitopes were then subjected to another NetCTLpan 1.1 analysis (https:// services.healthtech.dtu.dk/service.php?NetCTLpan-1.1) for CTL epitopes and NetMHCI-Ipan 4.0 analysis (https://services.healthtech.dtu.dk/service.php?NetMHCIIpan-4.0) for HTL epitopes [27,29]. Both servers were accessed on 2 May 2022. The parameters for each prediction were identical to steps 2.3 and 2.4. This step was conducted to ensure that the residue difference in the sequence indeed did not affect the epitopes' binding ability to the HLA alleles.

Cross-Reactivity Analysis of SARS-CoV-2 T-Cell Epitopes with OPV, BCG, and MMR
The cross-reactivity analysis was done by using the Blastp (Protein BLAST) from the NCBI server (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins) which was accessed on 23 April 2022 to identify similar amino acid sequences between SARS-CoV-2 T-cells epitopes with the epitope sequences from OPV, BCG, and MMR. The epitopes sequences were inputted into the "Query sequence" box. Meanwhile, the taxon ID from each live attenuated vaccine (OPV, BCG, MMR) was inputted into the "Organism" box. The taxon ID used in the epitope cross reactivity analysis are as follows: Mycobacterium tuberculosis variant bovis (1765), Human poliovirus 1 (12080), Human poliovirus 2 (12083), Human poliovirus 3 (12086), Measles virus strain Schwarz (132487), Rubella virus (11041), and Mumps orthorubulavirus (2560602). All of the options in the "exclude" section were selected. Some settings in the "Algorithm parameters" section were adjusted, such as the "Expect threshold" being 30000, the "Word size" being 2, and the "Matrix" being changed to PAM30; the "Compositional adjustments" was set to no adjustments, and the option in the "Filter" section was clicked.

Cross-Reactivity of Predicted Epitopes with Human Peptides
Blastp analysis (http://blast.ncbi.nlm.nih.gov/ Blast.cgi) was conducted to find SARS-CoV-2 epitopes that match with human peptides. The website was accessed on 2 May 2022. The method was adapted from previous studies [22,25]. The Blastp algorithm parameter was set as follows: expect threshold 30000, word size 2, matrix PAM30, gap cost set to existence = 9 and extension = 1; the compositional parameter was set to no adjustment, while the low complexity filter was disabled and automatically adjusted for short input sequences. The results from Blastp analysis were transferred into Microsoft Excel and were screened for the peptides which shared at least contiguously 7 identical amino acid residues with the human peptides with no gap and no mismatches residue.

Population Coverage Analysis
In order to assess whether the cross-reactive and short-listed epitopes are prevalent in Indonesia, population coverage analysis was performed through the IEDB population coverage analysis tools (http://tools.iedb.org/population/) which was accessed on 25 July 2022 [34]. The epitopes and corresponding HLA alleles that bind were inputted. Then, several parameters, including the population (Indonesia) and calculation options (Class I separate, Class II separate, and Class I and II combined), were set accordingly. The population coverage was then further analyzed.

Epitope Selection for Universal Coronavirus Vaccine Construct
Following different analyses, epitopes with attractive traits to be included in the coronavirus vaccine construct were further sorted. Criteria included conservancy, promiscuity, immunogenicity, IFNγ score, population coverage, no similarity with human peptides and IEDB reported peptides.

HLA Allele Frequencies of Indonesian Population
Through the allele frequencies database, a total of 21 alleles from both HLA Class I and II with frequencies equal to or more than 5% were identified and used for the epitopes prediction ( Figure 1). Out of 21, 15 of them belonged to HLA Class I alleles where six belonged to Class A and nine belonged to Class B. The highest frequencies from HLA Class I alleles were found to be HLA-A*24:07, with average frequencies of 23.1% coming from three different Indonesian populations. However, the highest frequency allele was located among the remaining six alleles that belonged to the HLA Class II DRB1 alleles. The HLA-DRB1*12:02 followed by HLA-DRB1*15:02 recorded an average frequency of 34.3% and 28.32%, coming from eight and six different Indonesian populations, respectively, making them the two highest frequency alleles of the HLA in the Indonesian population identified in this study.
acid residues with the human peptides with no gap and no mismatches residue.

Population Coverage Analysis
In order to assess whether the cross-reactive and short-listed epitopes are prevalent in Indonesia, population coverage analysis was performed through the IEDB population coverage analysis tools (http://tools.iedb.org/population/) which was accessed on 25 July 2022 [34]. The epitopes and corresponding HLA alleles that bind were inputted. Then, several parameters, including the population (Indonesia) and calculation options (Class I separate, Class II separate, and Class I and II combined), were set accordingly. The population coverage was then further analyzed.

Epitope Selection for Universal Coronavirus Vaccine Construct
Following different analyses, epitopes with attractive traits to be included in the coronavirus vaccine construct were further sorted. Criteria included conservancy, promiscuity, immunogenicity, IFNγ score, population coverage, no similarity with human peptides and IEDB reported peptides.

HLA Allele Frequencies of Indonesian Population
Through the allele frequencies database, a total of 21 alleles from both HLA Class I and II with frequencies equal to or more than 5% were identified and used for the epitopes prediction ( Figure 1). Out of 21, 15 of them belonged to HLA Class I alleles where six belonged to Class A and nine belonged to Class B. The highest frequencies from HLA Class I alleles were found to be HLA-A*24:07, with average frequencies of 23.1% coming from three different Indonesian populations. However, the highest frequency allele was located among the remaining six alleles that belonged to the HLA Class II DRB1 alleles. The HLA-DRB1*12:02 followed by HLA-DRB1*15:02 recorded an average frequency of 34.3% and 28.32%, coming from eight and six different Indonesian populations, respectively, making them the two highest frequency alleles of the HLA in the Indonesian population identified in this study.

Prediction of CTL Epitopes from SARS-CoV-2 Proteins
The prediction of CTL epitopes from SARS-CoV-2 protein with the HLA Class I alleles (%Rank <1%) resulted in 1599 bindings combined from 5 different SARS-CoV-2 proteins. In the process, it was found that HLA-B*15:21 and HLA-A*11:01 showed the most significant number of bindings, at 136 and 133 epitopes, respectively. Overall, as shown in Figure 2, each HLA Class I allele is predicted to bind to a large number of epitopes; the least records a total of 75 bindings, by HLA-B*44:03. Moreover, around 80% of the bindings from each Viruses 2022, 14, 2328 6 of 27 allele were contributed by ORF1ab epitopes (Figure 2; blue color), followed by spike (S), membrane (M), nucleocapsid (N), and envelope (E) protein.

Prediction of CTL Epitopes from SARS-CoV-2 Proteins
The prediction of CTL epitopes from SARS-CoV-2 protein with the HLA Class I alleles (%Rank <1%) resulted in 1599 bindings combined from 5 different SARS-CoV-2 proteins. In the process, it was found that HLA-B*15:21 and HLA-A*11:01 showed the most significant number of bindings, at 136 and 133 epitopes, respectively. Overall, as shown in Figure 2, each HLA Class I allele is predicted to bind to a large number of epitopes; the least records a total of 75 bindings, by HLA-B*44:03. Moreover, around 80% of the bindings from each allele were contributed by ORF1ab epitopes (Figure 2; blue color), followed by spike (S), membrane (M), nucleocapsid (N), and envelope (E) protein. From the NetCTLpan 1.1 prediction, promiscuous peptides from SARS-CoV-2 (ORF1ab, S, E, M, and N) were further sorted and checked for their immunogenicity and population coverage. Promiscuous peptides here are defined as peptides that can bind with at least two different HLA Class I alleles. Out of 1599 predicted peptides, 414 of them are promiscuous. Highly promiscuous peptides that bind to at least five different HLA Class I alleles are listed in Table 1, while the complete list of peptides identified in this study can be found in Supplementary Materials (Table S1). Most of the peptides were observed to belong to the ORF1ab. With starting residue number 1767, the VMYMCTLSY peptide belonging to ORF1ab binds to 9 different peptides (HLA-  From the NetCTLpan 1.1 prediction, promiscuous peptides from SARS-CoV-2 (ORF1ab, S, E, M, and N) were further sorted and checked for their immunogenicity and population coverage. Promiscuous peptides here are defined as peptides that can bind with at least two different HLA Class I alleles. Out of 1599 predicted peptides, 414 of them are promiscuous. Highly promiscuous peptides that bind to at least five different HLA Class I alleles are listed in Table 1, while the complete list of peptides identified in this study can be found in Supplementary Materials (Table S1). Most of the peptides were observed to belong to the ORF1ab. With starting residue number 1767, the VMYMCTLSY peptide belonging to ORF1ab binds to 9 different peptides (HLA-       Besides binding affinity to the HLA molecules, immunogenicity is an essential determinant in peptide recognition by TCR. Highly immunogenic CTL epitopes with a score >0.35 are listed in Table 2. A total of 14 highly immunogenic CTL epitopes (score >0.35) were identified along with their HLA Class I that binds, population coverage, and previously reported IEDB validation. The DYVYNPFMI epitope from ORF1ab recorded the highest immunogenicity score of 0.56221, bound to two HLA Class I alleles, and covered 58.77% of the Indonesian population. The binding with HLA-A*24:02 was also experimentally proven by HLA ligand assay and reported in IEDB. a The binding between the peptide and HLA has been experimentally proven by T-cell assay and reported in IEDB; b The binding between the peptide and HLA has been experimentally proven by HLA ligand assay and reported in IEDB; c The binding between the peptide and HLA has been experimentally proven by both T-cell assay and HLA ligand assay and reported in IEDB.

Prediction of HTL Epitopes from SARS-CoV-2 Proteins
Despite the number of HLA Class II alleles being almost one-third the number of HLA Class I alleles, they record a nearly similar total number of predicted epitope bindings ( Figure 3). The HLA-DRB1*15:02 allele holds the highest record, with a total of 299 HTL epitopes predicted to bind to it. In contrast, the HLA-DRB1*11:01 recorded a total of 170 bindings, the lowest record in HLA Class II alleles. All of the HLA Class II DRB1 included in this prediction binds to many more peptides when compared to HLA Class I. Moreover, consistent with the HLA Class I, ORF1ab is still responsible for most of the bindings (Figure 3; blue color), approximately 80% of the total. This is followed, in order, by spike, nucleocapsid, membrane, and envelope protein. a The binding between the peptide and HLA has been experimentally proven by T-cell assay and reported in IEDB; b The binding between the peptide and HLA has been experimentally proven by HLA ligand assay and reported in IEDB; c The binding between the peptide and HLA has been experimentally proven by both T-cell assay and HLA ligand assay and reported in IEDB.

Prediction of HTL Epitopes from SARS-CoV-2 Proteins
Despite the number of HLA Class II alleles being almost one-third the number of HLA Class I alleles, they record a nearly similar total number of predicted epitope bindings ( Figure 3). The HLA-DRB1*15:02 allele holds the highest record, with a total of 299 HTL epitopes predicted to bind to it. In contrast, the HLA-DRB1*11:01 recorded a total of 170 bindings, the lowest record in HLA Class II alleles. All of the HLA Class II DRB1 included in this prediction binds to many more peptides when compared to HLA Class I. Moreover, consistent with the HLA Class I, ORF1ab is still responsible for most of the bindings (Figure 3; blue color), approximately 80% of the total. This is followed, in order, by spike, nucleocapsid, membrane, and envelope protein. Following the NetMHCIIpan 4.0 prediction, promiscuous HTL peptides from SARS-CoV-2 (ORF1ab, S, E, M, and N) were also sorted and further checked for their IFN-γ score and population coverage. Similar to class I, promiscuous peptides here are defined as peptides that can bind to at least two different HLA Class II alleles. Out of 1422 predicted peptides, 414 of them are found to be promiscuous. The highly promiscuous peptides that bind to at least 3 out of 6 HLA Class II alleles are listed in Table 3, while the complete list of HTL peptides identified in this study can be found in Supplementary Materials (Table  S2)  Following the NetMHCIIpan 4.0 prediction, promiscuous HTL peptides from SARS-CoV-2 (ORF1ab, S, E, M, and N) were also sorted and further checked for their IFN-γ score and population coverage. Similar to class I, promiscuous peptides here are defined as peptides that can bind to at least two different HLA Class II alleles. Out of 1422 predicted peptides, 414 of them are found to be promiscuous. The highly promiscuous peptides that bind to at least 3 out of 6 HLA Class II alleles are listed in Table 3, while the complete list of HTL peptides identified in this study can be found in Supplementary Materials (Table S2). With starting residue number 6751, the 15-mer peptides of LDDFVEIIKSQDLSV belonging to ORF1ab bind to all six different HLA Class II alleles (HLA-DRB1*07:01, HLA-DRB1*11:01, HLA-DRB1*12:02, HLA-DRB1*15:01, HLA-DRB1*15:02, HLA-DRB1*16:02), making it the most promiscuous HTL peptide that covers 95.26% of the Indonesian population. The binding between the peptides and HLA alleles has also been experimentally proven by T-cell assay and reported in IEDB.    Apart from binding affinity, IFN-γ is an important cytokine produced by HTL to trigger immune responses. Therefore, HTL epitopes with good IFN-γ inducing ability reflected as IFN-γ score are more likely to produce immune responses. Thus, 15 top 15-mer HTL epitopes with the highest IFN-γ score are listed in Table 4, along with information on HLA Class II that binds, population coverage, and previous IEDB reports. The complete list can also be found in Supplementary Materials (Table S2). The highest IFN-γ score was IAQFAPSASAFFGMS epitope from ORF9 that binds to DRB1_0701 with a score of 1.06167. The binding has also been experimentally proven by B-cell assay and reported in IEDB. The binding between the peptide and HLA has been experimentally proven by T-cell assay and reported in IEDB; b The binding between the peptide and HLA has been experimentally proven by B-cell assay and reported in IEDB.

Identification of Conserved Epitopes in SARS-CoV-2 and Human CCCs
Cross-reactivity analysis was performed through the IEDB Conservancy Analysis Tools to identify conserved epitopes throughout the SARS-CoV-2 and four other human CCCs. The analysis was performed using five proteins (ORF1ab, S, E, M, and N) conserved in all five viruses against HLA Class I and II alleles. It was found that only two 9-mer epitopes were conserved in all five viruses (Table 5). Both epitopes belong to the ORF1ab, specifically within the region of nsp12, which encodes for the RNA dependent RNA polymerase (RdRp) enzyme of coronaviruses. The two epitopes bind to a total of 4 different HLA Class I alleles, with a population coverage of 13.07%. Moreover, specifically for the 'SLAIDAYPL' epitope, its binding with HLA-A*02:01 has been experimentally proven by both the T-cell assay and HLA ligand assay reported in IEDB.
Additional analysis was performed with the two identified conserved epitopes. The conserved epitopes were run against the SARS-CoV-2 current and previously circulating VOCs. The current VOCs include delta and omicron variants with a total of 1157 and 884 isolates, respectively. The previously circulating VOCs include alpha, beta, and gamma variants with a total of 158, 374, and 9 isolates, respectively. The sequences of SARS-CoV-2 variants can be found in Supplementary Materials. From the conservancy run, it was found that all VOCs of SARS-CoV-2 possessed the two conserved epitope sequences (Table 6). These were slightly different in position but still belonging to the same SARS-CoV-like RdRp region and domain. The results show that the epitopes are fully conserved across SARS-CoV-2 variants despite different mutations that are responsible for emergence of different variants.  In addition to the VOCs, the epitopes were also run against the SARS-CoV and MERS-CoV from the same betacoronavirus family (Table 7). It was found that the SLAIDAYPL epitopes were conserved in both SARS-CoV and MERS-CoV. The HEFCSQHTM epitope was conserved 100% in SARS-CoV but not in MERS-CoV. One amino acid difference in the non-critical ninth residues was found in MERS-CoV, a change from methionine (M) to leucine (L). Despite the difference, the epitope from MERS-CoV still binds to the same HLA allele as SARS-CoV-2. Together with the SARS-CoV and MERS-CoV, the conservancy analysis indicated that the two identified epitopes are conserved across all coronaviruses that have infected humans. Table 7. Conservancy analysis of conserved Epitopes in SARS-CoV-2 against SARS-CoV and MERS-CoV which belong to the same Betacoronavirus family. The different residue is underlined.

Epitopes with Identical TCR Contact Residue between SARS-CoV-2 and Human CCC
Besides the fully conserved epitopes, epitopes with identical TCR contact residues between SARS-CoV-2 and four human CCCs were also identified. As previously indicated, epitopes with residue differences may still experience binding to the same HLA if the residues still fulfill the HLA binding motifs. The peptide which has an identical TCR contact residue might be recognized by the same T-cell clone. This study specifically used a sequence similarity cutoff value of 67% to ensure optimal recognition and binding [33]. Moreover, to ensure that the epitopes from human CCC with differing residues still indeed bind to the same HLA alleles with the original epitopes from SARS-CoV-2 reference sequence, another round of netCTLpan 1.1 analysis for 9-mer epitopes and netMHCIIpan 4.0 analysis for 15-mer epitopes was conducted.
During the process, a total of four 9-mer peptides ( Table 8) and five 15-mer peptides (Table 9) from ORF1ab were identified. Almost all of them possess some residue difference in the non-TCR contact residue. Overall, however, more residue differences were observed in the 15-mer HTL epitopes. Data showed that not all listed peptides turn out to bind to the same alleles where the peptide from SARS-CoV-2 reference sequence binds, which could be due to the differences in the anchor residue of the peptide. The identified 9 peptides were found to mainly originate from nsp12, nsp13, and nsp14 of ORF1ab, which is suggested to play an essential role in the viral survival for it to retain some degree of conservancy across all five viruses. In addition, some of the bindings between the epitopes and HLA have also been reported in IEDB and validated through HLA ligand assay or B cell assay. Table 8. 9-mer epitopes originated from OR1ab with identical TCR contact residues between SARS-CoV-2 and Human CCC identified through IEDB Conservancy Analysis Tools, along with information on HLA that binds, population coverage, position and previous IEDB reports on the binding. The TCR contact residue of HLA Class I is located in position 3-8 (marked by underlined). The residue difference across the sequence is indicated with red color text.  a The binding between the peptide and HLA has been experimentally proven by HLA ligand assay and reported in IEDB. b Still binds to exactly the same allele with SARS-CoV-2 original epitope sequence according to netCTLpan with %Rank less than 1%. c Still binds to at least one same allele with SARS-CoV-2 original epitope sequence according to netCTLpan with %Rank less than 1%. Table 9. 15-mer epitopes (9-mer core epitope is highlighted in bold) originated from ORF1ab with identical TCR contact residue between SARS-CoV-2 and Human CCC identified through IEDB Conservancy Analysis Tools, along with information on HLA that binds, population coverage, position and previous IEDB report on the binding. The TCR contact residue (highlighted in bold and underlined) of HLA Class II is located in position 2, 3, 5, 7 and 8 within the 9 mer core. The residue difference across the sequence is indicated with red color font.   [6280][6281][6282][6283][6284][6285][6286][6287][6288][6289][6290][6291][6292][6293][6294] a The binding between the peptide and HLA has been experimentally proven by B cell assay (ELISA) and reported in IEDB. b Still binds to exactly the same allele with SARS-CoV-2 original epitope sequence according to netMHCIIpan and exhibiting strong binding level with %Rank less than 1%. c Still binds to at least one same allele with SARS-CoV-2 original epitope sequence according to netMHCIIpan and exhibiting strong binding level with %Rank less than 1%.

Cross-Reactive T-Cell Epitopes between SARS-CoV-2 and BCG
Potential cross-reactive CD8+ and CD4+ T-cells epitopes were discovered utilizing Blastp analysis and reinspected using NetCTLpan 1.1 and NetMHCIIpan-4.0 correspondingly. In total, the BCG vaccine contains 14 similar sequences as SARS-CoV-2 in ORF2, ORF3a, ORF4, ORF5, ORF6, ORF7b, ORF8, and ORF9 regions (Table 10). These similar sequences are not completely identical but have 2 to 4 amino acid substitutions. Despite that, these 14 epitopes are still able to bind to various HLA Class I alleles, especially HLA-A*02:01 and HLA-A*02:03. Moreover, several cross-reactive epitopes from BCG are also capable of binding different HLA alleles than the SARS-CoV-2 epitopes. Additionally, a few cross-reactive BCG epitopes also demonstrate stronger binding to HLA than SARS-CoV-2 epitopes according to the HLA binding affinity score. Table 10. Cross-reactive CTL epitopes between SARS-CoV2 and BCG predicted by Blastp. The amino acid substitution is indicated by underlining and TCR contact residue (residue 3-8) is indicated by bold font. n.b. indicates that peptide was predicted as a non-binder to HLA allele. Besides cross-reactive CTL epitopes, one cross-reactive HTL epitope was also identified through Blastp analysis. The cross-reactive HTL epitope from BCG is not identical to the SARS-CoV-2 epitope in the ORF7a region but has 2 amino acid substitutions in its core epitope. In spite of its difference, the cross-reactive HTL epitope is still able to bind HLA-DRB1*11:01 more strongly than SARS-CoV-2. In contrast, the cross-reactive HTL epitope has much lower Indonesian population coverage compared to the SARS-CoV-2 epitope (Table 11). Table 11. Cross-reactive HTL epitopes between SARS-CoV2 and BCG. The amino acid substitution is indicated by underlining. The core epitope is highlighted in red color and bold font indicating the TCR contact residue.

Cross-Reactive T-Cell Epitopes between SARS-CoV-2 and MMR
Similarly to BCG, several cross-reactive CTL epitopes were identified from the MMR vaccine, which is further divided into measles, mumps, and rubella. There were 7 similar sequences discovered from MMR compared with SARS-CoV-2 sequences in ORF2, ORF4, ORF7a, ORF7b, and ORF9 regions. These cross-reactive epitopes are able to bind to either one or two HLA Class I alleles, notably HLA-A*02:01 and HLA-A*02:03. However, only three cross-reactive epitopes from MMR demonstrate stronger HLA binding than SARS-CoV-2 epitopes, indicated by a lower HLA binding affinity score (Table 12).

Potential Epitope Set for Coronavirus Universal Vaccine Construct and Its Population Coverage
The conserved epitope from SARS-CoV-2 and HCCs identified in the preceding steps might be helpful in the universal vaccine construction. However, in regard to epitope selection for vaccines specifically, several additional criteria have to be considered, such as immunogenicity, IFN-γ inducing ability, population coverage, similarity with human peptides, and many more. Therefore, the selected epitopes were further characterized for the similarity with the human peptides and the entire epitope set was calculated for population coverage (Table 13).
Combining all the epitopes into a vaccine construct will result in 99.58% projected population coverage for Indonesia, with an average number of 4.25 epitope hits/HLA    Table 10 for Indonesian, South East Asia, Europe, and the world population.

Discussion
T-cell responses to SARS-CoV-2 infection have been garnering attention lately due to potentially playing a vital role in measuring disease responses and progression, predicting disease recurrence, and control strategies [35]. Understanding T-cell responses can provide greater insights primarily on immune responses towards the disease, thus pro-  Table 10 for Indonesian, South East Asia, Europe, and the world population.

Discussion
T-cell responses to SARS-CoV-2 infection have been garnering attention lately due to potentially playing a vital role in measuring disease responses and progression, predicting disease recurrence, and control strategies [35]. Understanding T-cell responses can provide greater insights primarily on immune responses towards the disease, thus provide deeper understanding of its pathogenesis, vaccine development, responses, evaluation, disease severity, and much more [36]. There is strong evidence on T-cell cross-reactivity as the cause behind pre-existing immunity toward SARS-CoV-2 in the naïve population [8,9]. These cross-reactive immune responses reportedly correlate with shared homology between SARS-CoV-2 and the circulating HCoVs. Prior exposure to the human CCCs likely mediates the priming of SARS-CoV-2 responsive T-cells in the unexposed population [33,35,37,38].
In addition to the homologous epitopes, nine heterologous epitopes with identical TCR contact residue were also identified. They are all represented by at least one major Indonesian HLA allele and some of their bindings have been reported in IEDB. This result showed that cross-reactivity could indeed potentially arise due to sequence homology shared between the SARS-CoV-2 and human CCC. Previous studies demonstrated that T-cells, especially CD4+ T-cells that recognize human CCCs, exhibited cross-reactivity towards the homologous epitopes of SARS-CoV-2 [33]. It is important to note that the small number of epitopes identified in this study was most likely due to the analysis being performed across all four human CCC at once instead of towards each human CCC virus individually. More extensive epitope sequence similarity would be expected if the latter exercise was performed. A previous study identified as many as 13 fully conserved epitopes and 17 heterologous epitopes of SARS-CoV-2-derived epitopes when only compared to the betacoronavirus (HCoV-OC43 and HCoV-HKU1) [40]. Other studies have also consistently explored and reported a larger number of bindings when they conduct the analysis separately. Collectively, the data support the idea that cross-reactivity occurrence happens due to the shared homology of the HCoVs [39,41].
A further look into the fully conserved epitopes also showed that they are conserved in the previous and currently circulating SARS-CoV-2 VOCs, SARS-CoV and MERS-CoV. This result shows that there are low chances of these epitopes being responsible for the difference in disease outcomes and severity, given that their presence across the SARS-CoV-2 variants, SARS-CoV, MERS-CoV, and human CCCs is consistent. While data is still lacking to draw a conclusive claim, the information on the HLA alleles that bind to these conserved epitopes could possibly help in understanding different disease outcomes across the coronavirus family. The region where the conserved epitopes originated might play an essential role in the coronavirus family, for they are conserved across all the HCoVs.
Out of five proteins (ORF1ab, S, E, M, and N) used during the analysis, all of the identified shared epitopes in this study belong to the ORF1ab. This may indicate that ORF1ab possessed a high degree of conservancy in the coronavirus family. This finding is also aligned with our previous study which validated it through entropy analysis [25] and studies done by others using genome-wide screening technology [42]. Besides the shared epitopes, ORF1ab also ranks highest in the number of predicted epitopes when compared to the rest of the proteins such as S, E, M, and N. This is understandable given the number of epitopes highly correlated with the gene length, as ORF1ab length is almost two-thirds of the whole genome [40]. This data was corroborated by others which revealed that ORF1ab contains the highest number of experimentally proven immunodominant epitopes [42] and highlighted the role of ORF1ab as the region with the most cross-reactive T-cells that can elicit responses in individuals exposed to other HCoVs but naive to SARS-CoV-2 infection [38].
Moreover, ORF1ab also has a primary function as a replicase enzyme that is vital for viral survival [43]. The ORF1ab is further cleaved into 16 nsps, where several of them have an integral function during the viral life cycle [44]. This study specifically identified epitopes belonging to the nsp12, nsp13, and nsp14 regions. The nsp12 encodes for RdRp protein responsible for viral replication, transcription, and overall survival. The nsp13 belongs to the helicase superfamily, which possesses both helicase and NTPase activity [45]. They primarily function to unwind the DNA or RNA strands during replication; they also support and interact with other replication-transcription complexes [46]. Lastly, the nsp14 encodes for proteins necessary for proofreading ability, evading the host's antiviral responses, and ensuring mRNA stability and translation [47]. The essential functions of each region in SARS-CoV-2 further support the idea of them being conserved across the whole coronavirus family. Previous studies have reported that nsp12-nsp14 are among the most conserved and accessible sites in the entire SARS-CoV-2 proteome, which is aligned with this study's results [45,46]. As a conserved region, they also become an attractive target for intervention such as drug targets or vaccines [47].
The role of live attenuated vaccines such as BCG, OPV and MMR in attenuating COVID-19 diseases has been hypothesized [48] and investigated in some clinical trials [49,50]. In addition, a case-control study was also performed which investigated the MMR vaccination's probable protective effect against SARS-CoV-2 [51]. Although most clinical and case-control studies focus on trained innate immunity, some studies showed that these LAVs also contain cross-reactive T-cells epitopes similar to epitopes of SARS-CoV-2 that may generate adaptive immunological memory [7,16,17]. These cross-reactive T-cells epitopes might be one of the main factors contributing to protective effects from LAVs apart from the trained innate immune system. The Blastp analysis conducted in this study discovered that BCG and MMR contain 15 and 7 similar yet not identical sequences with SARS-CoV-2, respectively. Despite the fact that these sequences have 2 to 4 amino acid substitution, these sequences are potentially able to induce cross-reactive CTL against SARS-CoV-2. Additionally, one sequence discovered from BCG also has the ability to give rise to cross-reactive HLT against SARS-CoV-2. However, the dissimilarity in terms of which HLAs bind to the cross-reactive epitopes suggests that not all people will benefit from the presence of cross-reactive epitopes. Hence, only people who have certain HLA alleles will be able to respond to the epitopes and gain benefit from the heterologous immunity.
In contrast to other studies [13,14] the Blastp analysis did not find any cross-reactive CTL and HTL epitopes from OPV specific to the Indonesian population. One of the factors that might lead to no cross-reactive epitopes between OPV and SARS-CoV-2 is the genome nature of poliovirus. Compared to other viruses, the genomic RNA of the poliovirus is only approximately 7500 nucleotides in length. Hence, the poliovirus is classified as a small virus in comparison to BCG and MMR, resulting in lower possibilities of similarity matches in Blastp analysis. However, the result of this study does not imply that OPV does not have beneficial cross-protection effects on the Indonesian population. There is still the possibility that OPV provides nonspecific protection through trained innate immunity, such as Toll Like Receptor (TLR) stimulation, notably the TLR3/TLR7, against SARS-CoV-2 which eventually leads to the early activation of innate immunity, such as macrophages and dendritic cells [52] as well as the potential cross-reactivity of the humoral immune responses [15].
Regarding the association of the T-cell cross-reactivity with the clinical outcomes, there are limitations on the conclusions that can be drawn from immunoinformatics analyses. The ideas discussed here require investigation and validation through laboratory experiments. Several studies, for instance, proved that cross-reactive memory T-cells indeed result in protection by measuring the IL-2 and IFN-γ secretion through fluorescence-linked immunospot (FLISpot) assay on PBMC [40]. An experiment using an epitope-mapping and flow cytometry activation-induced marker assay suggested that the presence of crossreactive T-cells might be responsible for the different clinical outcomes and severity in COVID-19 patients [33].
It is still difficult and too early for now to conclude whether this cross-reactivity indeed results in cross-protection or instead results in enhanced pathogenicity. Apart from T-cell reactivity, the association of HLA alleles with COVID-19 severity has also been reported [53]. HLA genetic background is considered one of the contributing factors to different SARS-CoV-2 infection outcomes (transmissibility, susceptibility, progression, and severity), as it is directly related to an individual's immunogenetic variation [21]. Each allele possesses a different level of binding affinity. This binding affinity, as mentioned previously, will directly affect and produce different immune responses such as neutralizing antibodies or pro-inflammatory cytokines. In general, most of the HLA Class II alleles are less significant when compared to HLA Class I in regard to severity [54].
Out of 21 HLA alleles prevalent in Indonesia used in this study (Figure 1), only some have been further analyzed for their contribution towards severity in different populations (not Indonesian). For example, HLA-A*11:01 was proven by different studies to be highly correlated with severe disease and outcomes in the European and Chinese populations [21,54]. One study implied that individuals who possessed HLA-A*11:01 and HLA-A*24:02 might produce better T-cell immune responses when compared to HLA*02:01 [55]. HLA-A*02:01 instead is associated with an increased risk of infection due to its lower ability to present SARS-CoV-2 antigen in multinational populations. Lastly, the DRB1*12:01 are found to display protective effects against SARS-CoV-2, while the DRB1*15:01 are found to be increased in asymptomatic patients in the European and Chinese population [20,54]. Findings on different populations may or may not be reflected in the Indonesian population. However, it can be seen that differences in the genetic profile and geographic background will affect the outcomes of infection along with other factors such as age, gender, ABO blood groups, and other comorbid conditions [20,41].
The COVID-19 pandemic has once again shown the ability of viruses from the coronavirus family to cause massive outbreaks [56,57]. Studies have highlighted SARS-CoV-2 potential to evolve, survive, and reemerge, owing to its efficient host-switching and genetic recombination ability [58,59]. The immediate example that can be witnessed is the emergence of numerous SARS-CoV-2 variants with different transmissibility and immune evasion ability, which influences the treatment efficiency, including for vaccines. These factors stress the need for a universal coronavirus vaccine in order to minimize disease burden or even prevent the next coronavirus outbreak [56].
This study suggested some epitopes that might be helpful in the universal coronavirus vaccine construction (Table 13). Based on a series of analyses conducted in this study, ORF1ab was specifically identified as an attractive candidate for a universal vaccine. Epitopes belonging to the ORF1ab family have some features of high conservancy across the coronavirus family, promiscuousness that covers a large population, immunodominance, and good IFN-γ inducing ability; these make them very promising as possible targets for universal coronavirus vaccines. Several studies that primarily utilized ORF1ab for their multi-epitope vaccines design have also professed interest in and highlighted ORF1ab qualities as a vaccine target [25,60]. The presence of pre-existing cross-reactive memory T-cells recognizing epitopes from the early proteins that were made by the infected cells such as the polymerase proteins have been associated with the abortive infection which is beneficial in protecting individual from SARS-CoV-2 infection [61]. This supports ORF1ab as a candidate for a universal coronavirus vaccine and shows the benefit of cross-reactive T-cells. Yet there are still reports of contradictory findings where it is believed exposure to human common cold coronavirus does not primarily explain the cross-reactivity [62]. Debates over the exact cause and mechanism are still circulating. Pre-existing immunity may result in either protection or increased susceptibility toward SARS-CoV-2 [24]. Regardless of different opinions and arguments, understanding T-cell responses and reactivity is believed to be the key to understanding the disease responses to ensure suitable and effective control strategies are implemented [35,36].
To date, no studies have conducted extensive research on the universal coronavirus vaccine. However, some essential points can be learned from the attempt to make a universal influenza vaccine [63]. It is vital to ensure that a universal vaccine offers broad and durable protection that is effective towards at least three-fourths of viruses from the family, despite the mutations that may arise over time [64,65]. It also must be suitable for all age groups and possess good immunogenicity, low allergenicity, and low homology with the human peptides [64]. A better understanding of the infection itself will also help improve and speed up the vaccine design process, assessing the vaccine responses and evaluation [64]. All in all, it may take a long time to actually produce an effective and safe universal vaccine, but scientists believe the idea of a universal coronavirus vaccine is possible with further exploration through research and analysis [56,57,65].

Conclusions
Through immunoinformatics approaches, two fully conserved SARS-CoV-2 derived epitopes with 100% sequence similarity shared by all four human CCCs, along with additional nine heterologous epitopes with identical TCR contact residues and cross-reactive epitopes with the live attenuated vaccines were identified. Each epitope was found to be represented by at least one major Indonesian HLA allele, covering a considerable population with good binding affinity. These findings overall support the idea that T-cell cross reactivity occurrence in SARS-CoV-2 naïve individuals could arise from shared sequence homology between SARS-CoV-2 and human CCCs. Additional analysis performed with SARS-CoV, MERS-CoV, and SARS-CoV-2 VOCs also indicated that the conserved epitopes are not responsible for differential outcomes in regards to the resulting signs, symptoms, and severity, as their presence is consistent across all assessed isolates. Moreover, with most predicted conserved epitopes belonging to the ORF1ab, this study also implies the vital role of ORF1ab in the coronavirus family. This further suggests its potential as a candidate for the universal coronavirus vaccine target.
The present study is limited to in-silico analysis which cannot confirm matters concerning T-cell reactivity and HLA association with clinical outcomes and severity. Therefore, further validation using laboratory experiments that allow measurement of immune system key player (i.e., IL-2, IFN-γ) levels might provide greater insights to resolve the issues raised.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/v14112328/s1. Table S1: Full list of predicted 9-mer CTL epitopes from five different SARS-CoV-2 proteins (ORF1ab, S, E, M, and N) with %Rank less than 1% identified from NetCTLpan 1.1 along with their immunogenicity score. Table S2. Full list of predicted 15-mer HTL epitopes from five different SARS-CoV-2 proteins (ORF1ab, S, E, M, and N) with %Rank less than 1% identified from NetMHCpan 4.0 along with their IFN-γ score. File S1 contains the FASTA files of the ORF1ab sequences from SARS-CoV-2 VOCs.