Genomic Transcriptome Benefits and Potential Harms of COVID-19 Vaccines Indicated from Optimized Genomic Biomarkers

COVID-19 vaccines can be the tugboats for preventing SARS-CoV-2 infections when they are practical and, more importantly, free of adverse effects. In reality, however, they may have short-term or long-term impacts on COVID-19-related diseases and may even trigger the formation of new variants of SARS-CoV-2. Using published data and a set of optimized-performance COVID-19 genomic biomarkers (MND1, CDC6, ZNF282), we study the benefits and adverse effects of the BNT162b2 vaccine. We found that the vaccine lowered the expression values of MND1 and CDC6 while raising the expression values of ZNF282 in individuals who are SARS-CoV-2 naïve, which is expected and consistent with the biological equivalence between the COVID-19 disease and the genomic signature patterns established in the literature. However, we also found that COVID-19-convalescent octogenarians responded in the opposite direction: the vaccine raised the expression values of MND1 and CDC6 and lowered the expression values of ZNF282. Such adverse effects raise serious concerns about whether, and when, COVID-19-convalescent individuals should take the current vaccine. These findings are new at the genomic level and can provide insights into developing next-generation vaccines, antiviral drugs, and pandemic management guidance.

Antibody responses to SARS-CoV-2 and its variants have been commonly used to evaluate the efficacy of vaccines. We recently identified a set of critical genes that can perfectly or nearly perfectly classify individuals into their respective groups and further cluster the COVID-19-infected individuals into subgroups. Such high performance led to the identification of reliable biomarkers of COVID-19 disease using blood-sample data [25,26] and of SARS-CoV-2 using NP (nasopharyngeal)/OP (oropharyngeal) swab PCR-sample data [27]. In this paper, we use these high-performance gene biomarkers to evaluate the genomic benefits and adverse effects of the BNT162b2 vaccine in COVID-19-convalescent octogenarians and SARS-CoV-2-naïve individuals. Here, a genomic benefit means that the vaccine increases (decreases) a gene's expression value when a higher (lower) expression value of that gene lowers an individual's SARS-CoV-2 infection risk; an adverse effect means a change in the opposite direction.
The contributions of this paper are three-fold: (1) identifying the genomic benefits and adverse effects (in terms of MND1 (meiotic nuclear divisions 1), CDC6 (cell division cycle 6), and ZNF282 (zinc finger protein 282)) of the COVID-19 vaccine BNT162b2 in two heterogeneous populations; (2) pointing to a new direction of COVID-19 studies, i.e., gene-gene interactions; (3) pointing to a potential target of next-generation vaccines, antiviral drugs, treatments, and management. In (1), we first found that the vaccine lowered the expression values of MND1 and CDC6 while raising the expression values of ZNF282 in individuals who are SARS-CoV-2 naïve, which is as expected and consistent with the biological equivalence between the COVID-19 disease and the genomic signature patterns established in the literature [26]. However, we also found that COVID-19-convalescent octogenarians responded to the vaccine in the opposite direction: the vaccine raised the expression values of MND1 and CDC6 and lowered the expression values of ZNF282. Such adverse effects raise serious concerns about whether, and when, COVID-19-convalescent individuals should take the current vaccine.
The remaining part of the paper is organized as follows. First, Section 2 briefly reviews the study methodology. Next, Section 3 reports the data sources, analysis results, and interpretations. Finally, Section 4 concludes the study with discussions. Supplementary Materials contain real data and computer outputs.

Method
We use two methods to conduct our comparative studies. The first is a graphical method (popular in biological and medical research) that involves no statistical computation: it directly plots gene expression level changes over time after vaccination. Here, the original gene expression level changes represent clinical information, such as antibody responses. Such a direct method reveals the biological and clinical information related to heterogeneous populations and their genomic effects. The second method applies the max-linear competing logistic regression classifier to classify BNT162b2 vaccine responses as coming from COVID-19-convalescent individuals or from SARS-CoV-2-naïve individuals. This method is very different from classical statistical and modern machine learning methods, e.g., random forest, deep learning, and support vector machine. In addition, it has shown enhanced interpretability, consistency, and robustness of results in our earlier work [25-31] on COVID-19 and on the biomarkers of several types of cancers.
The genes selected in this new study were chosen from the genes identified in our earlier work [25][26][27], which led to perfect or nearly perfect performance. As a result, they can be used as reliable biomarkers, which will be further justified in this new comparative study.
The GSE190747 data contain (1) 16 female COVID-19-convalescent octogenarians and their gene expression values collected at days 0, 1, and 7; and (2) 14 COVID-19-naïve individuals (ages from 26 to 72) and their gene expression values collected at days 0, 10, 11, 34, 35, 40, 41, and 48.
We here briefly report our earlier results [26] that established the existence of genomic signature patterns and COVID-19 subtypes and the mathematical and biological equivalence of the disease and the signature patterns. That work used max-linear competing logistic regression models to establish component classifiers CF-i and the combined max classifier CFmax. Tables 1 and 2 appeared in our earlier work [26]. In the tables, the classifier CF-I(TPM) is defined as −0.3303 + 3.4152 × KIAA1614 + 0.2177 × MND1 − 0.1248 × SMG1, using 0.5 as the threshold on the risk probability computed from the logistic regression function. Other classifiers are defined similarly. For all mathematical equations, algorithms, and interpretations, we refer to Zhang [26].
Table 2. Performance of individual classifiers and combined max-competing classifiers using blood-sample data GSE152418 to classify COVID-19-infected and healthy controls into their respective groups. The meaning of CF-i is the same as in Table 1. Raw stands for raw counts.
For the GSE157103 data, we also found that a combination of CDC6 and ZNF282 can lead to 97.62% accuracy (98% sensitivity, 96.15% specificity), with the following classifier: 1.7615 + 6.8226 × CDC6 − 1.1556 × ZNF282; a combination of CDC6, ZNF282, and CEP72 (centrosomal protein 72) can lead to 98.41% accuracy (99% sensitivity, 96.15% specificity), with the following additional classifier: −1.9944 …
As the pandemic is now dominated by Omicron variants, especially BA.5, linking the genes identified earlier for other variants to Omicron variants will provide better genomic knowledge of COVID-19 diseases.
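To make the classifier definitions above concrete, the following minimal sketch evaluates the two linear classifiers quoted in the text and converts their scores into risk probabilities with the logistic function and the 0.5 threshold. The function names, the expression values, and the form of the combined max classifier (the logistic function applied to the largest component score) are assumptions made for illustration; the authoritative definitions are in [26].

```python
import math

# Coefficients as quoted in the text. The expression values used further
# down are hypothetical and serve only to illustrate the arithmetic.
CF_I_TPM = {"intercept": -0.3303, "KIAA1614": 3.4152, "MND1": 0.2177, "SMG1": -0.1248}
CF_CDC6_ZNF282 = {"intercept": 1.7615, "CDC6": 6.8226, "ZNF282": -1.1556}


def linear_score(coefs, expression):
    """Intercept plus the weighted sum of the gene expression values."""
    return coefs["intercept"] + sum(
        weight * expression[gene]
        for gene, weight in coefs.items()
        if gene != "intercept"
    )


def risk_probability(score):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


def max_competing_probability(component_coefs, expression):
    """Assumed form of CFmax: logistic of the largest component score."""
    return risk_probability(
        max(linear_score(coefs, expression) for coefs in component_coefs)
    )


def classify(probability, threshold=0.5):
    """Label a sample using the 0.5 threshold mentioned in the text."""
    return "COVID-19" if probability > threshold else "control"


# Hypothetical sample: low CDC6 and high ZNF282 expression.
hypothetical_sample = {"CDC6": 0.1, "ZNF282": 3.0}
p = risk_probability(linear_score(CF_CDC6_ZNF282, hypothetical_sample))
print(f"risk probability = {p:.3f}, label = {classify(p)}")
```

The signs of the quoted coefficients carry the interpretation: CDC6 has a positive coefficient, so higher CDC6 expression raises the computed risk, while the negative coefficient on ZNF282 means higher ZNF282 expression lowers it.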
We found that the genes identified from GSE157103 and GSE152418 again led to 100% accuracy with a SARS-CoV-2 Omicron variant BA.1 cohort study GSE201530 [37]. The following new Table 3 reports the outcomes.
The data from GSE201530 contain four groups: unvaccinated/no prior infection, vaccinated/no prior infection, unvaccinated/prior infection, and vaccinated/prior infection. Comparing Table 3 with Tables 1 and 2, we can see different patterns in the fitted coefficients associated with the chosen genes. This is not surprising, as the COVID-19 patients in the two previous cohorts (GSE157103 and GSE152418) were first-time infections and had no vaccinations, i.e., GSE157103 and GSE152418 involve different group comparisons from GSE201530. An essential feature of Tables 1 and 2 is that the signs and strengths of the fitted coefficients are interpretable, i.e., they tell how expression level changes of the biomarker genes affect the risk of COVID-19 infection and their functional effects. Nevertheless, the genes identified in our earlier work still lead to 100% accuracy in Table 3, which shows that these genes contain information related to SARS-CoV-2 variants, including Omicron BA.1, and likely BA.5 (once the data are available to check).
Table 3. Performance of individual classifiers and combined max-competing classifiers using blood-sample data GSE201530 to classify COVID-19-infected and healthy controls into their respective groups. The meaning of CF-i is the same as in Table 1.
In addition to the NP/OP swab PCR-sample data in Table 4 [35], we studied another NP/OP swab PCR-sample data set, GSE152075, to obtain the new Table 5. As discussed in our earlier work [27], the genes in Tables 4 and 5, with their transcriptional response and functional effects on SARS-CoV-2, differ significantly from the genes in Tables 1-3, with their functional signature patterns to COVID-19 antibodies. This can be interpreted as the former being the point of a phenomenon and the latter being the essence of the disease.
Such significant findings can help explore the causal and pathological clues between SARS-CoV-2 and COVID-19 disease and fight against the disease with more targeted vaccines, antiviral drugs, and therapies. Putting Tables 1-5 together serves as a starting point for our new comparative vaccine efficacy study in the subsequent sections.
Given the perfect performance of the genes ABCB6 (ATP Binding Cassette Subfamily B Member 6 (Langereis Blood Group)), KIAA1614, MND1, SMG1 (nonsense-mediated mRNA decay associated PI3K related kinase), and RIPK3 (Receptor Interacting Serine/Threonine Kinase 3) in Tables 1-3 using blood-sample data and the nearly perfect performance of the genes CDC6, ZNF282, and CEP72, these genes can be used as reliable biomarkers for COVID-19 diseases (blood samples). On the other hand, ATP6V1B2 (ATPase H+ Transporting V1 Subunit B2) and IFI27 (Interferon Alpha Inducible Protein 27) have central roles in SARS-CoV-2 heterogeneous populations, which was discussed in our earlier work [27] and is further confirmed in the new Table 5 using NP/OP swab PCR samples. Therefore, considering the functions of these genes discussed in our earlier work [26,27], we focus on the genes MND1, SMG1, CDC6, ZNF282, CEP72, ATP6V1B2, and IFI27 in this study using the data of BNT162b2 vaccine efficacy [32].

The Clinic Evidence Directly Observed Using Graphical Approach and Results
In this section, we directly plot gene expression change responses to the BNT162b2 vaccine. Using the genes identified in our earlier work and in the last section as COVID-19 biomarkers, we plot the MND1, SMG1, CDC6, ZNF282, CEP72, ATP6V1B2, and IFI27 responses in Figures 1, 2, 3, 4, 5, 6, and 7, respectively. Figures 1-7 clearly show that COVID-19-convalescent individuals and COVID-19-naïve individuals have entirely different BNT162b2 vaccine responses. Our earlier work [27] used a total of fourteen cohort studies (including different platforms, different ethnicities, different geographical regions, breakthrough infections, and Omicron variants) with 1481 samples to justify the results. So far, we have not seen any other research in the literature with nearly perfect performance. With such comprehensive studies and conclusive outcomes, it may be safe to say that the genes identified in our earlier work are representative, and that the gene-gene interaction heterogeneity between SARS-CoV-2 and COVID-19 does exist. Using the results from our earlier work [26,27] and those in Section 3.1, we can see that the higher the expression values of ZNF282, CEP72, and ATP6V1B2, the lower the risk of an individual being COVID-19 positive; and the lower the expression values of MND1, CDC6, and IFI27, the lower the risk of an individual being COVID-19 positive. Clearly, COVID-19-naïve individuals have improved and expected vaccine responses, i.e., the BNT162b2 vaccine can help prevent SARS-CoV-2 infections in COVID-19-naïve individuals.
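The direction rule stated above (a change toward the expression level associated with lower risk is a benefit, and a change away from it is an adverse effect) can be written as a small lookup. This is a minimal sketch: the function name and the example expression values are hypothetical, and SMG1 is left out because its effect is described as depending on its combination with other genes.

```python
# Direction of expression change associated with LOWER COVID-19 risk, as
# stated in the text: +1 means higher expression lowers risk, -1 means
# lower expression lowers risk.
PROTECTIVE_DIRECTION = {
    "ZNF282": +1,
    "CEP72": +1,
    "ATP6V1B2": +1,
    "MND1": -1,
    "CDC6": -1,
    "IFI27": -1,
}


def vaccine_effect(gene, baseline, post_vaccine):
    """Label an expression change as 'benefit', 'adverse', or 'no change'.

    baseline and post_vaccine are expression values before and after
    vaccination (hypothetical inputs in the examples below).
    """
    change = post_vaccine - baseline
    if change == 0:
        return "no change"
    moved = 1 if change > 0 else -1
    return "benefit" if moved == PROTECTIVE_DIRECTION[gene] else "adverse"


# Hypothetical values mimicking the two response patterns described in the text.
print(vaccine_effect("MND1", baseline=5.0, post_vaccine=4.0))    # expression lowered
print(vaccine_effect("ZNF282", baseline=2.0, post_vaccine=1.5))  # expression lowered
```

A lookup of this kind only records the direction of a change; it says nothing about whether the change is large enough to matter, which is what the fitted classifiers address.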
Vaccines 2022, 10, x FOR PEER REVIEW

Note that at time zero, except for SMG1 and CEP72, the two groups are comparable in terms of their expression values. However, COVID-19-convalescent octogenarians showed adverse effects with the other genes, i.e., the BNT162b2 vaccine could increase the risk of breakthrough SARS-CoV-2 infections in this group of individuals. Figure 2 shows different patterns of SMG1 expression level changes. In our earlier work [25,26], we found that this mRNA decay-associated gene can be either helpful or harmful depending on its combined effects with other genes. The right panel also shows that the vaccine can be either helpful or harmful depending on the time elapsed since vaccination.

Separability between COVID-19-Naïve Individuals and COVID-19-Convalescent Octogenarians Using the High-Performance Biomarkers
The figures in Section 3.2 showed the significant difference between the two heterogeneous populations using reliable high-performance biomarkers. In this section, we use the seven genes to separate the two BNT162b2 vaccine populations. We fit the max-linear competing logistic classifier model to the data and obtained Table 6 using only four genes: MND1, SMG1, CEP72, and ATP6V1B2. In the table, sensitivity refers to the COVID-19-naïve population, and specificity refers to the COVID-19-convalescent population. We have an overall accuracy of 88.70%, with 89.71% sensitivity and 87.23% specificity. Such performance clearly shows that these two populations have very different responses to the BNT162b2 vaccine, which supports the direct findings in Section 3.2.
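For readers who want to see how the three reported figures relate, the sketch below computes accuracy, sensitivity, and specificity from a confusion matrix, with the COVID-19-naïve population treated as the positive class as in the text. The counts are hypothetical: they are not reported in this paper and were chosen only because they round to the stated percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts.

    Positive class: COVID-19-naive samples (sensitivity).
    Negative class: COVID-19-convalescent samples (specificity).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# HYPOTHETICAL counts, not taken from the source: 61 of 68 naive samples and
# 41 of 47 convalescent samples classified correctly.
acc, sens, spec = classification_metrics(tp=61, fn=7, tn=41, fp=6)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

Accuracy is the sample-weighted average of sensitivity and specificity, which is why it lies between the two.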

Discussions
The graphical results in Section 3.2 were direct illustrations, without any statistical calculation or modeling, of gene expression level changes (clinical information) over time. They show biological phenomena and patterns. Section 3.3 used only four genes to study the significant differences between the two heterogeneous populations, with an overall accuracy of 88.70%. These results show that the genes used in this study contain basic genomic information, i.e., their responses to the BNT162b2 vaccine. The significant differences between the responses of the two heterogeneous populations are apparent and convincing.
Our approach to studying the efficacy of vaccines is completely different from existing studies in the literature, which measured antibody responses, immune responses, durations of neutralizing antibody responses, immune boosting, etc. This paper, at the genomic level, used high-performance biomarkers to study genomic (gene) responses to the BNT162b2 vaccine in two heterogeneous populations. The chosen genes (biomarkers) led to 100% accuracy in our earlier work [26] and in the new Table 3 in this paper. These biomarkers characterize the COVID-19 disease at the genomic level, the results are interpretable, and hence they provide genomic information on the disease's pathological relationship to genes and their functional effects. Given that the knowledge of COVID-19 is still limited, it may be safe to say that knowing the genomic relationship between the disease and the high-performance and reliable biomarkers can lead to better vaccine practice and development and antiviral drug development than the current knowledge of SARS-CoV-2. The current vaccines have not evaluated the responses of the biomarker genes and their functional effects in their formulas. There is a potential that the vaccines can make the disease even worse, as they can result in adverse effects on gene expressions and their joint functions with other genes. Our new results on gene-gene interaction effects at the genomic level reveal a new direction for next-generation vaccine development, such that vaccines can prevent infection more efficiently. Using the flu vaccine as an analogous example, it has been reported that it is just 15% effective this year. One can say that this is due to new flu variants. However, it may instead be because the intrinsic gene-gene interactions and their functional effects have not been utilized in the vaccine formulas.
Our earlier work [26] proved that the existence of COVID-19 genomic signature patterns with a single-digit number of genes determines the recurrence of COVID-19 disease; in particular, recurrence (breakthrough infection) can occur in COVID-19-convalescent individuals with a higher probability, as these individuals' COVID-19 genomic signature had been observed in their genetic system. Our results in Section 3.2 raise serious concerns about allowing COVID-19-convalescent octogenarians to take the BNT162b2 vaccine. A potential risk is that these individuals may suffer breakthrough infections. The logic is that the vaccine may have adverse effects in these individuals and shift the critical gene expression levels above or below the threshold values associated with being healthy.
A key question is when they can take a vaccine after recovering from a COVID-19 infection.
Our earlier work [27] hypothesized that MND1 and CDC6 could be responsible for the virus (genetic segment) replication and mutation, and that their combined effects can lead to new variants. In contrast, ZNF282 can be hypothesized to be a repair agent against SARS-CoV-2. Therefore, these genes demand further studies, aiming to make vaccines and antiviral drugs that lower or control the expression levels of MND1 and CDC6 but boost the expression levels of ZNF282. In addition, given that ZNF282 is a zinc-finger protein gene, it is worth looking further into whether or not zinc supplements can be helpful.