Association between Mutations in Papain-like Protease (PLpro) of SARS-CoV-2 with COVID-19 Clinical Outcomes

Papain-like protease (PLpro) is important for the replication and transcription of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This study aimed to reveal the PLpro mutations associated with the clinical outcomes of patients. Due to the importance of the S protein in the pathogenicity of SARS-CoV-2, the mutation of the S protein was also analyzed in this study. After downloading the data from the Global Initiative on Sharing Avian Influenza Data (GISAID) database, samples were divided into two groups on the basis of patient status, namely, recovered and dead groups. This study performed a univariate analysis and further explored the association of mutations with patient outcomes through multivariate logistic regression analysis. A total of 138,492 samples were used for analysis. The patients had a mean age of 43.66 ± 21.56 years, and 51.3% of them were female. Multivariate logistic regression results showed that, compared with men, women had a lower risk of dying from coronavirus disease 2019 (COVID-19) (OR = 0.687, 95%CI: 0.638–0.740). Compared with patients aged 17 years and younger, patients aged 18–64 years (OR = 2.864, 95%CI: 1.982–4.139) and patients over 65 years old (OR = 19.135, 95%CI: 13.280–27.572) had a higher risk of death after infection. Compared with the wild type, P78L (OR = 5.185, 95%CI: 2.763–9.730) and K233Q (OR = 5.154, 95%CI: 1.442–18.416) in PLpro were associated with an increased risk of death. A synergistic interaction existed between age and mutations A146D and P78L. The results of the multivariate logistic regression analysis of the data on vaccinated patients demonstrated that, compared with the wild type, the P78L (OR = 3.376, 95%CI: 2.040–5.585) mutation was associated with an increased risk of death. In conclusion, compared with the wild-type PLpro protein, the P78L and K233Q mutations may increase the risk of death in infected individuals. In addition, a synergistic effect existed between age and P78L and K233Q that increased the risk of death in older patients.


Introduction
Coronavirus disease 2019 (COVID- 19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has triggered a worldwide pandemic and spread to more than 200 countries since it was first reported in 2019. Data from the World Health Organization (WHO) show that as of 31 May 2022 at 5:09 pm Central European Time (CET), 526,558,033 cases and 6,287,117 deaths have been confirmed globally [1]. SARS-CoV-2 has a positive and single-stranded RNA genome that is approximately 30,000 bases in length [2,3] and that contains multiple open reading frames (ORF). ORF1a and ORF1b can encode continuous polypeptides, which can generate 16 non-structural proteins (NSPs) after cleavage [4].

Sociodemographic Characteristics of Patients
We accessed GISAID (https://www.gisaid.org/) [18] on 18 January 2022 and downloaded patient status metadata uploaded between 1 January 2021 and 31 December 2021. We obtained 166,339 entries after the first filtering, with filters including "Host = Human," "Complete," "High Coverage," "Exclude Low Coverage," "Patient Status," and "Complete on Collection Date." We included entries with defined gender, age, and patient status and containing mutations located in NSP3 (n = 138,492) after a second filter.
Of the 138,492 entries uploaded by laboratories in 117 countries on six continents, 54.7% (75,707/138,492) were from Europe and 25.3% (34,989/138,492) were from North America.
Out of the 138,492 patients, 135,479 recovered and 3013 died. A total of 51.3% of the cases in the dataset were female. Patients aged 18-64 years represented the largest proportion (69.7%) of the cases, with an average age of 43.66 ± 21.56 for all cases. Univariate analysis revealed that women had a lower risk of death than men. Patients aged 18-64 years and patients aged 65 years and older had a higher risk of death than patients aged 17 years and younger (Table 1).

Mutations
On average, each sample contained 3.26 mutations in the NSP3 protein, with an average of 3.27 mutations in the "Recovered" group and an average of 2.76 in the "Dead" group.
Univariate analysis showed that compared with wild-type SARS-CoV-2, mutations P78L, K233Q, or K93N in PLpro may increase the risk of death in patients, whereas the mutation E162D may reduce the risk of death. However, mutations identified by previous studies, such as Spike_D614G and NSP12_P323L, that may affect the clinical outcomes of patients were not statistically significant in the univariate analysis performed in this study. Among the fifty-four mutations in the S protein, all of them except for D614G, T716I, V70del, H69del, S982A, D1118H, A570D, V1264L, L5F, A243del, and S98F may be related to the clinical outcomes of patients. The detailed results are shown in Table 2.  Note: Chi-squared test was used to compare group differences. p < 0.05 was considered statistically significant and is highlighted in bold. a Continuity correction. b Since the correlation coefficient of Spike_Y160F and Spike_V159L was > 0.999 and was statistically significant (p < 0.001), the two variables were combined.

Age-Subgroup Univariate Analysis
The univariate analysis of PLpro mutations in different age subgroups yielded inconsistent results. Pairwise comparisons through the Breslow-Day test with adjusted statistical significance levels indicated that only the OR values of the K233Q, K93N, and E162D mutations were homogeneous among the three age groups.
The A146D mutation may be associated with an increased risk of mortality in patients aged 18-64 years. In all age groups, the P78L, K93N, and K233Q mutations may be associated with an increased risk of death. Although the E162D mutation had a low frequency, it may be associated with a reduced risk of death in patients. The results of the detailed age-subgroup univariate analysis are shown in Table 3.

Multivariate Logistic Regression Analysis
Gender, age, mutations with p < 0.1 in univariate analysis, and mutations identified by previous studies that may affect the clinical outcomes of patients, such as Spike_D614G and NSP12_P323L, were included in the multiple logistic regression analysis. The detailed results are shown in Table A1. Note: Chi-squared test or Fisher's exact test was used to compare group differences. Breslow-Day test was used for the homogeneity test of the odds ratio. p < 0.05 was considered statistically significant and is highlighted in bold. a Continuity correction. b Fisher's exact test. c A single zero cell existed in the 2 × 2 table. * At the 0.05 level, the difference in the odds ratio was statistically significant compared with patients aged ≤ 17 years. ** At the 0.05 level, the difference in the odds ratio was statistically significant compared with patients aged 18 to 64.
Given the heterogeneity of the OR values of A146D, P78L, and K233Q among different age groups, we also constructed a logistic regression model that included interaction terms. The difference between the two models was statistically significant (p < 0.001), and the inclusion of interaction terms was reasonable.
The results of logistic regression including interaction terms revealed that, with other factors being equal, the risk of death in female patients was 0.687 (95%CI: 0.638-0.740) times that in males. When infected with the wild-type virus, patients aged 18 Table 4.  The results of additive interaction demonstrated the existence of synergistic interactions between age and mutations A146D or P78L. The risk of death in patients aged 18-64 years infected with the virus with the A146D mutation was 3.571 times as high as the sum of the risks in patients exposed to only a single risk factor. The risk of death in patients aged ≥ 65 years infected with the A146D mutant virus was 3.388 times higher than that in patients exposed to only a single risk factor combined. The risk of death in patients aged 18-64 years infected with the virus with the P78L mutation was 2.137 times as high as the sum of the risks in patients exposed to only a single risk factor. Patients over the age of 65 infected with the P78L mutant virus had a 2.761-times higher risk of death than those exposed to only a single risk factor combined. The detailed results are provided in Table 5. Note: The 95% confidence intervals for relative excess risk of interaction (RERI) and attributable proportion due to interaction (AP) do not include 0, and the 95% confidence intervals for the synergy index (S) do not include 1, which means that there is an additive interaction. The synergy index was used as a summary measure of additive interaction.

Analysis of Vaccinated Patient Data
Of the 138,492 entries, 3569 were reported to have been vaccinated. Among the vaccinated patients, 3456 recovered and 113 died. We performed univariate and multivariate analyses on the data on vaccinated patients with breakthrough infections. The results of the univariate analysis are shown in Table 6.    Variables with p < 0.1 in univariate analysis and mutations previously found to be potentially associated with patient clinical outcomes were included in further logistic regression. The multivariate logistic regression results showed that women had a lower risk of death than men (OR = 0.589, 95%CI: 0.389-0.893). The P78L mutation may increase the risk of death by 3.376 (95%CI: 2.040-5.585) times compared with the wild type ( Table 7). The latter result is similar to the logistic regression results of the total metadata. In addition, the Spike_D950N mutation may increase the risk of death by 6.123 (95%CI: 1.147-32.677) times compared with the wild type, whereas the P681R and V1264L mutations in the S protein may reduce the risk of death by 0.045 (95%CI: 0.005-0.387) and 0.118 (95%CI: 0.015-0.922) times, respectively, compared with the wild type.

Discussion
Studies conducted early in the pandemic highlighted similar substitution rates for most genes in SARS-CoV-2. For example, the replacement rate of ORF1ab and Spike is approximately 3.5 × 10 −4 per site per year [19]. The PLpro coding sequence of interest in this study was located in ORF1ab. Since the start of the pandemic, many mutation sites have been found, and some of them have high mutation frequencies. Considering the possible co-occurrence of other mutations in the Spike protein, which is widely recognized to affect patient outcomes, our study identified several mutations in PLpro that may affect the risk of death in patients.
The results of multivariate logistic regression on 138,492 items showed that, compared with the reference sequence, the K233Q and P78L mutations were associated with an increased risk of death in patients. The mutations identified in this study that may affect the patient's risk of death have occurred in variants previously considered to be variants of concern (VOC) [20], such as the P78L mutation in the Delta VOC and the K233Q mutation in the Gamma VOC. Available evidence suggests that the Beta, Delta, and Gamma VOCs significantly increase the risk of death in patients compared with wild-type SARS-CoV-2 [21][22][23]. These mutations may explain some of the pathogenicity changes in VOCs. No dual P78L and K233Q mutations in PLpro were detected in VOCs, and even among the 138,492 entries included in this study, the frequency of double mutations was less than 0.1%, and triple mutations of the above mutations were absent.
Whether the mutation of SARS-CoV-2 affects clinical outcomes is an issue of wide concern. A recent study found that the frequencies of the D614G mutation in the S protein and the P323L mutation in NSP12 were positively correlated with patient mortality [14]. In this study, in addition to the mutation located in PLpro, we included other mutations that may be related to the clinical outcomes of patients in the multivariate logistic regression model. The results of this study revealed that Spike_D614G, Spike_E484K, Spike_N501Y, and NSP12_P323L mutations, which have been shown to be associated with outcomes, were not statistically different between the "Recovery" group and the "Death" group, demonstrating that these mutations were balanced in the two groups. Furthermore, the larger sample size of this study (n = 138,492) compared to the above studies enhances its representativeness.
The results of additive interaction analysis indicated the existence of a significant synergistic interaction between age grouping and the A146D and P78L mutations. The risk of death in patients exposed to the factors of older age and infection with the mutated virus was higher than the sum of the risks in patients exposed to only a single factor. In 1976, Rothman developed the sufficient-component casual model, which may be used to explain the variation in the effects of viral mutations with patient age. In addition to SARS-CoV-2 mutation and patient age, other unrecognized factors that affect patient outcomes exist. Thus, further research is needed to reveal possible complementary etiologies and prevent death outcomes.
Moreover, given that studies have shown that existing immunity can reduce the risk of death after breakthrough infection in patients [24,25], immunity is one of the important factors affecting the risk of death in patients. Therefore, in this study, we selected items related to patients who reported having been vaccinated against COVID-19 for further analysis to corroborate our previous findings. Similar to the results obtained from the analysis of 138,492 items in this study, the results of multivariate analysis revealed that the P78L mutation may increase the risk of death. These findings also confirmed the reliability of our results.
Although our study focused on the pathogenicity of SARS-CoV-2 and revealed mutations in PLpro that may be associated with the clinical outcomes of patients, their underlying mechanisms were not elucidated. PLpro has been widely accepted to modulate immune responses by affecting ubiquitination in host cells [5,7]. Notably, position 233 of PLpro (position 1795 of replica polyprotein 1ab) has been identified as one of the ubiquitination sites of the SARS-CoV-2 protein [5,26]. Therefore, the K233Q mutation in PLpro may suppress host immune responses by regulating the ubiquitination of important proteins, thereby affecting the clinical outcomes of patients. However, whether the 78th position of PLpro has special biological functions remains unclear. Further exploration of the functional, structural, and biological changes associated with the P78L and K233Q mutations in PLpro would be meaningful to reveal the mechanisms that affect the risk of death in patients.
In addition, the data used in this study did not meet random sampling requirements. Therefore, sampling bias may exist. However, we still found mutations associated with clinical outcomes in the PLpro gene of SARS-CoV-2. Our findings could provide evidence for early responses to mutations that could lead to clinically fatal outcomes.

Statistical Analysis
Categorical variables were described as frequencies (percentages). Chi-squared test or Fisher's exact test were used to compare categorical variables. Given that age differences can lead to significant differences in the risk of death after infection with SARS-CoV-2, agesubgroup univariate analysis was performed to explore whether the effects of mutations differed by age. Variables with p < 0.1 in univariate analysis and mutations found in previous studies that may affect the clinical outcomes of patients were further included in multivariate logistic regression to explore the effect of patient gender, age, and mutation on mortality. The R package "epiR" was used to calculate the indices of additive interaction: the relative excess risk (RERI), the attributable proportion (AP), and the synergy index (S). The synergy index was used as a summary measure of additive interaction [27]. In all statistical analyses, p-values less than 0.05 were considered statistically significant. IBM SPSS statistics 25.0 software, R Statistical Software 4.0.1, and RStudio 1.4.1717 were utilized for statistical analysis.

Conclusions
Compared with the wild type, the P78L and K233Q mutations in PLpro increased the risk of death in infected individuals. A synergistic effect existed between age and P78L and A146D. This effect increased the risk of death in older patients.