Adverse Effects of COVID-19 Vaccination: Machine Learning and Statistical Approach to Identify and Classify Incidences of Morbidity and Postvaccination Reactogenicity

Good vaccine safety and reliability are essential for successfully countering infectious disease spread. A small but significant number of adverse reactions to COVID-19 vaccines have been reported. Here, we aim to identify possible common factors in such adverse reactions to enable strategies that reduce the incidence of such reactions by using patient data to classify and characterise those at risk. We examined patient medical histories and data documenting postvaccination effects and outcomes. The data analyses were conducted using a range of statistical approaches followed by a series of machine learning classification algorithms. In most cases, a group of similar features was significantly associated with poor patient reactions. These included patient prior illnesses, admission to hospitals and SARS-CoV-2 reinfection. The analyses indicated that patient age, gender, taking other medications, type-2 diabetes, hypertension, allergic history and heart disease are the most significant pre-existing factors associated with the risk of poor outcome. In addition, long duration of hospital treatments, dyspnoea, various kinds of pain, headache, cough, asthenia, and physical disability were the most significant clinical predictors. The machine learning classifiers that are trained with medical history were also able to predict patients with complication-free vaccination and have an accuracy score above 90%. Our study identifies profiles of individuals that may need extra monitoring and care (e.g., vaccination at a location with access to comprehensive clinical support) to reduce negative outcomes through classification approaches.


Introduction
The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) variants give rise to COVID-19, the pandemic disease that has caused a massive public health emergency worldwide since the first reports in December 2019 in Wuhan, China [1,2]. SARS-CoV-2 virus is genetically related to a number of coronaviruses found in bat species, and its genetic [...] medication administered to a very large population, necessitating close surveillance to detect any evidence of direct or indirect effects. While the number of adverse reactions to COVID-19 vaccination is extremely small relative to the number of people vaccinated, these cases cannot be overlooked, as they give important information for predicting and ameliorating adverse reactions and poor outcomes. Statistical and ML analysis [25] can play a role in characterizing the factors involved. We have, therefore, analyzed patient data to clarify the common causes of such reactions. We employed statistical analysis and trained ML models to identify the individuals most at risk of vaccine complications. If the causes of adverse effects of a vaccine are identified and eliminated, and patients identified as at risk of complications are vaccinated in a safe medical environment, the development of serious conditions could be prevented and anaphylaxis or other conditions could be treated rapidly, making COVID-19 vaccination safer.
The main objectives of this study are as follows:
• To identify the most significant features of a patient's past medical history that are associated with adverse effects of COVID-19 vaccination;
• To find the most significant patient symptoms that can predict the patient's need for hospitalization for treatment after COVID-19 vaccination;
• In cases of death recorded after COVID-19 vaccination, to find the contributing causes of death;
• To identify and classify, using machine learning methods, those patients who are at high medical risk of severe adverse reactions after COVID-19 vaccination and may need extra precautions.

Methods
In this study, we considered data on patients vaccinated against COVID-19, including their past medical history and their postvaccination effects and outcomes, and analyzed the data by applying statistical methods and machine learning models. We also quantified feature importance values to rank the features after model training.

Data Collection
In this study, we initially used a raw dataset of vaccinated patients in the USA that contains various kinds of vaccine-related information. The dataset was collected from the Vaccine Adverse Event Reporting System (VAERS) and covers individuals who reported adverse reactions after vaccination between December 2020 and 16 February 2022 [26]; from it, a subset of 102,577 individuals was randomly chosen for further analysis. It contains information including COVID-19 vaccination status and the reactions to different sicknesses after vaccination. However, any non-COVID-19 information was omitted from our current study. In this dataset, the total number of collected reports for the most frequently used mRNA COVID-19 vaccines was 72,147. VAERS collected patient information on age, gender, comorbidity history, allergic history, and birth defect information after vaccination, vaccination date, date of reaction onset, hospitalization information after onset, death event, recovery status, and laboratory test information after onset. Additionally, the dataset contains information about vaccine dose, days to onset, medical history, allergic history, type-2 diabetes status, and the list of medical histories and reactions shown in Tables 1 and 2. All of this information was included in the dataset obtained from VAERS and was used in this study.
This dataset has several limitations, but VAERS is a warning system that prompts further inquiry; it can therefore be helpful for analyzing these effects for monitoring purposes. Any affected individual can report to VAERS using either an online platform or a paper form. Experts in vaccination safety analyze all reports of significant adverse events submitted to VAERS. Such events include permanent impairment, hospitalization or an extended hospital stay, life-threatening disease, birth defects, and death. Because the events are self-reported, some reports may contain incomplete, coincidental, erroneous, or unverifiable information. We employed three outcome indicators for vaccinated individuals, namely hospitalization, SARS-CoV-2 positivity, and death; these instances are critical and will hopefully be monitored by VAERS specialists.

Data Processing
Before applying statistical methods and machine learning models, we preprocessed the dataset, including the use of feature extraction and feature engineering. After discussing this dataset with expert clinicians, we first constructed a designated list of features, e.g., symptom names and comorbidities. Then, we generated a keyword list with the help of those clinicians and applied string matching algorithms to prepare the dataset with features that included symptoms, aftereffects, and comorbidities.
Applying string matching and keyword selection techniques, we extracted the patient medical history, such as pre-existing noncommunicable and communicable diseases, which included hypertension, diabetes, chronic obstructive pulmonary disease (COPD), kidney disease, depression, and asthma (details are shown in Table 1). We also included the reported adverse reactions, including symptoms and signs such as cough, high temperature, fatigue, fever, pyrexia, nausea, facial paralysis, and vomiting (details are shown in Table 2). We thus obtained a processed dataset with 86 attributes and 72,147 entities.
In the data processing step, especially in feature extraction, we took several factors into account. Initially, we extracted and transformed values from the raw textual dataset [26]. For example, the 'gender' field had three types of values: 'M' for male, 'F' for female, and 'U' for unknown gender. In the 'died' and 'disabled' fields, we treated 'Y' as yes and everything else as no; in the 'prior vaccine' field, an entry naming a vaccine was treated as yes and the rest as no. In the 'allergic history' field, we treated any mentioned allergic effect as a positive case of allergic history, and treated null values, the values 'no', 'none', 'NA', and 'no known allergic effects', and other negatively worded text as negative cases. In the 'History' column of the raw dataset, however, the patients' coexisting conditions were given as free text. We extracted each patient's medical history items separately: we selected keywords for each feature, matched them against the text to find the corresponding medical history, and retained the 27 most frequent individual medical histories. The raw dataset [26] also included a separate file containing the patients' adverse reactions as symptoms, keyed by 'VAERS ID', from which we separated out each of the 56 most frequent reactions; 56.22% of these developed within 24 h. The dataset comprised three files: the first for patients' demographics and medical history, the second for patients' reactions, and the third for vaccine information. We merged the files on the primary key 'VAERS ID'. Finally, we removed the data of all patients who had not received a COVID-19 vaccine.
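The field encoding and keyword matching described above can be sketched as follows. This is a minimal illustration and not the authors' code. The keyword lists, the raw column names (`DIED`, `DISABLE`, `SEX`, `ALLERGIES`, `HISTORY`), and the helper names are assumptions made for the example; the clinician-curated keyword lists used in the study are not reproduced in the text.

```python
import re

# Hypothetical keyword lists; the study's clinician-curated lists are not given.
KEYWORDS = {
    "hypertension": ["hypertension", "high blood pressure", "htn"],
    "type2_diabetes": ["type 2 diabetes", "type ii diabetes", "t2dm"],
    "asthma": ["asthma"],
}

# Free-text values treated as "no allergic history" (taken from the description).
NEGATIVE_ALLERGY = {"", "no", "none", "na", "no known allergic effects"}


def extract_history(text):
    """Return a 0/1 flag per condition using whole-word keyword matching."""
    text = (text or "").lower()
    flags = {}
    for feature, words in KEYWORDS.items():
        pattern = r"\b(" + "|".join(re.escape(w) for w in words) + r")\b"
        flags[feature] = int(re.search(pattern, text) is not None)
    return flags


def encode_record(raw):
    """Encode one raw report (a dict of strings) into model-ready features."""
    allergy = (raw.get("ALLERGIES") or "").strip().lower()
    row = {
        "died": int(raw.get("DIED") == "Y"),        # 'Y' is yes, anything else is no
        "disabled": int(raw.get("DISABLE") == "Y"),
        "allergic_history": int(allergy not in NEGATIVE_ALLERGY),
        "sex": raw.get("SEX", "U"),                 # 'M', 'F', or 'U'
    }
    row.update(extract_history(raw.get("HISTORY")))
    return row


# Made-up report used only to exercise the functions.
example = encode_record({
    "DIED": "", "DISABLE": "Y", "SEX": "F",
    "ALLERGIES": "None", "HISTORY": "HTN; Type 2 diabetes, on metformin",
})
```

Plain keyword matching of this kind also flags negated mentions such as "no hypertension", so a real pipeline would need extra handling for negation, which this sketch leaves out.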
We partitioned our dataset into two parts. The first part contained the patient medical history, and the second part consisted of the patient adverse reactions after vaccination (details of the workflow are shown in Figure 1). After vaccination, some patients died shortly after developing symptoms, some were re-infected with COVID-19, and some showed adverse reactions severe enough to require admission to hospital for treatment. For this reason, we considered three different target variables for the analysis of patient comorbidities and reactions, all of which were observed after vaccination. The first is "death status" (2348 patients died and the others were alive), the second is "SARS-CoV-2 test status" (13,546 were infected with COVID-19 and the others were not), and the third is "hospital admission status" (11,266 individuals had severe reactions and were all hospitalized; the others were not hospitalized).
Figure 1. Schematic diagram of the overall workflow, including data processing, data division, analysis using statistical and machine learning methods, and, finally, performance evaluation and identification of significant features.
Figures 2 and 3 show the Pearson's correlation heat-maps for the patient medical history and reactions, respectively. Furthermore, for the machine learning algorithms, we performed some additional data processing steps. For the age field, approximately 2.27% of the data were missing and were imputed with the mean value. Before each train-test split of the dataset, we standardized the dataset to zero mean and unit standard deviation [27]. Among all 72,147 individuals vaccinated against COVID-19, we first considered the 2348 patients who died and the 69,799 (72,147 − 2348) who were alive, and completed the experiments.
In the second phase, we repeated the experiment with a new set of data that included 13,546 reinfected COVID-19 cases and 58,601 non-reinfected controls; in this instance, the control group consisted of individuals whose COVID-19 status was negative. Finally, the 11,266 individuals with severe reactions (all of whom were hospitalized) and the others (who were not hospitalized) constitute the final sample for this study. In all of the experiments, we treated the extracted attributes as independent variables and performed statistical and machine learning analyses.
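The two numeric preprocessing steps mentioned in this section, mean imputation of missing ages and standardization to zero mean and unit standard deviation, can be illustrated with a short sketch. The ages below are made up, and the function names are hypothetical; the study's own implementation is not shown in the text.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


ages = [34, None, 71, 52, 63]       # made-up ages with one missing entry
ages_filled = impute_mean(ages)     # the gap becomes the mean of 34, 71, 52, 63
ages_scaled = standardize(ages_filled)
```

A common alternative is to compute the mean and standard deviation on the training split only and reuse them on the test split, which keeps information from the test data out of the preprocessing.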

Statistical and Machine Learning Approaches
We used statistical and machine learning approaches to find the significant features. Machine learning models are also capable of distinguishing between the various groups of patients. For the categorical variables, we used the chi-squared test to compute the corresponding p values and considered a feature with p < 0.05 to be significantly associated with the target. Since age is discrete numerical data, we used the Mann-Whitney U test to compare it between two populations.
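As a rough illustration of the two tests named above, the sketch below computes a Pearson chi-squared test for a 2×2 table and a Mann-Whitney U test using only the standard library. It makes simplifying assumptions that may differ from the study's software: no continuity correction, and a normal approximation without tie correction for the U test. The counts and ages are invented.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 d.o.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    return stat, erfc(sqrt(stat / 2))


def mann_whitney_u(x, y):
    """U statistic of sample x and a two-sided p value (normal approximation)."""
    pooled = sorted(x + y)
    # Average rank of each distinct value, so tied values share a rank.
    rank = {v: (2 * pooled.index(v) + pooled.count(v) + 1) / 2 for v in set(pooled)}
    n1, n2 = len(x), len(y)
    u = sum(rank[v] for v in x) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, erfc(abs(z) / sqrt(2))


# Invented counts: rows = hypertension yes/no, columns = died/survived.
chi2, p_chi2 = chi2_2x2(30, 70, 10, 90)

# Invented ages of patients who died versus patients who survived.
u, p_u = mann_whitney_u([70, 82, 65, 90], [30, 45, 50, 28, 61])
```

With samples as small as the invented ones here, an exact test would be preferable to the normal approximation; the approximation is more appropriate for group sizes like those in the study.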
We also performed descriptive statistical analysis to calculate the percentages and mean values of the features. For the machine learning analysis, we used a range of models: decision tree (DT) and random forest (RF), which are tree-based algorithms; support vector machine (SVM), which is kernel-based; and three boosting algorithms, namely gradient boosting machine (GBM), extreme gradient boosting machine (XGB) and light gradient boosting machine (LGBM) [28]. We selected these supervised machine learning algorithms for classification because of their excellent performance and quick execution [29]. Classifiers based on max-voting, averaging, and weighted averaging serve as basic ensemble learning approaches, while stacking, bagging, and boosting serve as advanced ensemble learning approaches. These techniques are highly efficient and easy to debug [30].
In the model training phase, the machine learning algorithms were configured with the following parameters. In the decision tree algorithm, the random state was set to 42, the minimum number of samples required for a split was two, and 'gini' was used as the criterion. Random forest used the same settings as the decision tree, with a minimum of two samples per split. SVM used a radial basis function ('RBF') kernel. In GBM, the learning rate was 0.1 and the criterion was 'friedman_mse'. The learning rate of LGBM was 0.05, with a bagging fraction of 0.8 and a bagging frequency of 5. The XGB algorithm used a tree-based booster with a maximum depth of six and a learning rate of 0.1.
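For readers who want to reproduce a comparable setup, the settings listed above can be collected in a single configuration mapping. The values come from the text; the keyword names are assumed to follow the usual scikit-learn, LightGBM, and XGBoost spellings, and the text does not name the libraries the authors used. Reusing the decision tree's criterion and random state for random forest is one reading of "the same settings".

```python
# Hyperparameters as stated in the text; key names are assumed library spellings.
MODEL_PARAMS = {
    "DT":   {"criterion": "gini", "min_samples_split": 2, "random_state": 42},
    # Assumption: RF reuses the DT criterion and random state.
    "RF":   {"criterion": "gini", "min_samples_split": 2, "random_state": 42},
    "SVM":  {"kernel": "rbf"},
    "GBM":  {"learning_rate": 0.1, "criterion": "friedman_mse"},
    "LGBM": {"learning_rate": 0.05, "bagging_fraction": 0.8, "bagging_freq": 5},
    "XGB":  {"booster": "gbtree", "max_depth": 6, "learning_rate": 0.1},
}
```

With scikit-learn installed, an entry can be unpacked into the matching estimator, for example `DecisionTreeClassifier(**MODEL_PARAMS["DT"])`; all parameters not listed would keep their library defaults.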
To evaluate the machine learning models, we used a set of metrics: accuracy, precision, recall, F1-score, area under the receiver operating characteristic (ROC) curve (AUC), and log-loss. To find the associative parameters, we calculated the feature importance values for every machine learning model. The coefficient value of each feature represents its contribution, learned during model training, to assigning an unknown instance to a class. The metrics are explained as follows:
• Accuracy: For binary classification, accuracy is determined from the true and false positive and negative counts, as the proportion of all predictions that are correct [31,32].
• Recall: When we need to determine how many of the actual positives are correctly predicted, recall is another suitable assessment metric [31][32][33].
• F1-Score: The F1-score balances the classifier's precision and recall. The F1-score, which is the harmonic mean of precision and recall, is a value between 0 and 1 [31,32].
• AUC: The area under the ROC curve, or AUC, shows how well the probabilities of the positive class are separated from the probabilities of the negative class. The ROC curve plots the true positive rate (TPR), the proportion of actual positives that are predicted as positive, against the false positive rate [31][32][33].
• Log-loss: Log-loss is the main probability-based classification metric. Raw log-loss values are hard to interpret in isolation, but they are a good way to compare models [31,32].
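The standard definitions of these metrics can be written out in a few lines of Python. The sketch below is illustrative only: the function name, the 0.5 decision threshold, and the example labels and probabilities are assumptions, and AUC is computed through its rank interpretation (the chance that a randomly chosen positive receives a higher score than a randomly chosen negative).

```python
from math import log


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute six metrics from true 0/1 labels and predicted probabilities.

    Assumes both classes are present, at least one positive prediction,
    and probabilities strictly between 0 and 1.
    """
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # identical to the true positive rate
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "auc": wins / (len(pos) * len(neg)),
        "log_loss": -sum(t * log(p) + (1 - t) * log(1 - p)
                         for t, p in zip(y_true, y_prob)) / len(y_true),
    }


# Invented labels and predicted probabilities.
metrics = binary_metrics([1, 1, 0, 0, 1, 0], [0.9, 0.4, 0.2, 0.6, 0.8, 0.1])
```

Lower log-loss is better, while the other five metrics are better when closer to 1.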

Results
In this study, we analyzed two different types of factors in two separate analyses and then correlated the results. The first type comprises features of the medical history of patients who showed reactions after vaccination; the second comprises the reactions themselves, i.e., the symptoms that arose after vaccination.

Distribution of Patient Medical History Features and Reactions
In this section, we describe the percentage of each significant factor of patient medical history and reactions. The medical history factors are shown in Table 1. Although the average age of the individuals was 47.5 years, the average age in the cases of fatality and hospitalization was 71.47 and 62.49 years, respectively. Thus, there is a clear difference in age between the patient groups. The largest group, 35.58%, had received the first dose; for the second and third doses, the figures were 23.62% and 25.65%, respectively. In our study, there were approximately twice as many female participants as male participants, and almost half of them were recorded as regularly taking other medications. A history of allergies (including various kinds of allergic events, not only anaphylaxis) was a frequently observed factor, present in approximately 1 in 5 of the total cases and close to 1 in 4 of the fatality cases. In the hospitalized patient group, those with a history of allergies made up 1 in 4. In contrast, the proportion was lower among SARS-CoV-2 positive patients, at 1 in 5. Other common factors associated with significant patient reactions included prior vaccine, type-2 diabetes, hypertension, thyroid disorder, and asthma, which each accounted for around 5%, while the remaining factors each accounted for 1%-3%.
The reactions of patients are shown in Table 2. Chills and nausea accounted for 23.73% and 15.05% of cases, respectively. In addition, patient disability, headache and dyspnoea each accounted for around 10% of the total cases observed. The next most frequent adverse reactions, including pain in the extremity, pyrexia, fatigue, different kinds of pain, and dizziness, fell mainly in the range of 5% to 8%, with the incidence of other maladies below 5%. The lowest counts were for anaphylactic reaction (0.32%) and cardiac arrest (0.35%).

Finding Significant Associations between Patient Medical History Factors and Post-Vaccination Adverse Reactions Using Statistical Analyses
Using two different statistical tests (the chi-squared test for categorical variables and the Mann-Whitney U test for the age variable), we identified the most strongly associated and significant parameters among the patients' medical history factors (including pre-existing diseases and other discomforts), and identified the adverse reactions or symptoms that may have predisposed patients to the development of severe health conditions, even fatality. In this analysis, we considered parameters with p < 0.05 to be significant. The target variables in our statistical analyses were death, SARS-CoV-2 positive status, and hospital admission status. The results are shown in Figures 4 and 5. For a clearer view, we calculated the negative base-10 logarithm of each p value and plotted it in the corresponding figures. Bar length indicates the significance level.
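The transformation used for the bar lengths is simple to state in code. In the sketch below the p values are invented for illustration and do not come from the study; on this scale the p < 0.05 cut-off corresponds to a bar length of about 1.30.

```python
from math import log10

# Invented p values, for illustration only.
p_values = {"age": 1e-12, "COPD": 0.0004, "asthma": 0.31}

# Bar length = negative base-10 logarithm of the p value.
bar_length = {name: -log10(p) for name, p in p_values.items()}

# Significance cut-off on the same scale: -log10(0.05) is roughly 1.30.
cutoff = -log10(0.05)

# Features passing the cut-off, longest bar (most significant) first.
significant = sorted(
    (name for name in p_values if bar_length[name] > cutoff),
    key=bar_length.get,
    reverse=True,
)
```

The logarithmic scale makes very small p values, which would be indistinguishable on a linear axis, directly comparable as bar lengths.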
In terms of patients' medical histories, age, gender, COPD, hypertension, hyperlipidemia, kidney disease, heart disease and type-2 diabetes were the most significant features among all the target groups. For patients' death status, dementia was also found to be significant. The significant parameters most commonly shared by the died, hospitalized and SARS-CoV-2 positive patients were age, gender, COPD and hyperlipidemia. However, asthma, COVID-19 positive history, migraine, and high cholesterol were not found to be significant in any of the groups. The detailed results of this analysis are shown in Figure 4, and the values are given in Supplementary Table S7. In this figure, the bar lengths are proportional to the negative logarithm of the p values and thus indicate significance, i.e., a longer bar is more significant.
We also performed a similar analysis for the dataset of patient adverse reactions and identified a list of significantly associated symptoms (top 30), which are shown in Figure 5, with the values given in Supplementary Table S8. In this case, each of the three target variables was also treated as an independent variable whenever it was not itself the target of the analysis. Dyspnoea, hospital stay duration in days, intensive care, cough, and pain in extremity were common factors for all three target variables. When patient mortality was the target variable, dyspnoea, hospital stay days, intensive care, cough, and disability were found to be the most significant. Dyspnoea, hospital stay days, intensive care, cough, and disability were likewise significant for hospitalization status, whereas dyspnoea, cough, hospital stay days, intensive care, pruritus, rash, urticaria, and erythema were significant for SARS-CoV-2 positive status.

Classification of Patients Using Machine Learning Algorithms
In our machine learning analysis, we first considered the patient medical histories as the independent features, and the patient death, SARS-CoV-2 test positive, and hospital admission statuses as the dependent features. We then repeated the analysis with the patient reactions after vaccination, so that both the medical history and the reactions were considered. Initially, we trained our models and evaluated their performance on the test data by calculating a range of metrics including accuracy, precision, recall, F1-score, ROC-AUC, and log-loss, which are shown in Table 3; the ROC curves for the patient medical history are shown in panels A, B, and C of Figure 6, respectively. The results indicate that when the target variable was the patient's death status, RF performed the best across all metrics, achieving the highest scores (1.0) and the lowest log-loss value (0.16). Other algorithms such as LGBM, DT, and XGB achieved an accuracy score of 0.99, whereas SVM and GBM achieved scores of 0.95 and 0.94, respectively. In terms of the other metrics, the performances of all the algorithms were close to their accuracy scores. In addition, all of the methods achieved log-loss values close to 2 percent. The SVM and GBM performances were nevertheless encouraging, with accuracy at or above 94%. Similar observations were made when we considered the SARS-CoV-2 test result as the target variable: RF outperformed the other competing methods, with an accuracy score of 0.96, while the other models' performances were also competitive except for SVM, whose scores were almost consistently below 0.80 and whose log-loss was also higher than the others (i.e., above 7%).
Finally, for the target variable hospital admission status, RF and DT achieved the highest accuracy, at 0.98, and all the other models performed almost equally, although their scores were somewhat below those of the previous two scenarios. Next, we considered patient postvaccination adverse reactions as the independent features, with the same target variables as before. The results indicating model predictive performance are shown in Table 4, and the ROC curves are shown in panels D, E, and F of Figure 6, respectively. All the classifiers demonstrated broadly similar performances, with scores greater than 0.80 in all the evaluation metrics and log-loss below 3.50%. However, when different target variables were set for the classification tasks after training with the patient adverse reactions, the best-performing classifiers (in terms of accuracy) differed as well. For the patient death status, RF yielded a score of 1. Moreover, for the SARS-CoV-2 test status and hospital admission status, RF scored 1.0, and LGBM, DT, and XGB each yielded 0.99.

Feature Importance Analysis for Finding Significant Features Using Machine Learning Classifiers
After model training, we calculated the coefficient values for each of the features and ranked the features by importance with regard to the corresponding target variables. First, we calculated the feature importance scores of each feature for the individual machine learning classifiers (excluding SVM, since feature importance cannot be obtained directly with the 'RBF' kernel), and then normalized the values to bring them onto the same scale using the quantile normalization technique [34]. We then averaged the quantile-normalized values across classifiers, as shown in Figures 7 and 8. A longer bar indicates a higher-ranked feature.
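The ranking procedure can be sketched in plain Python, under the assumption that quantile normalization is applied across classifiers (one column of importance scores per classifier) and that the normalized scores are then averaged per feature. The importance values below are invented, and ties within a column are broken by input order rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize a dict of {model: importances}, same feature order in each."""
    n = len(next(iter(columns.values())))
    sorted_cols = [sorted(col) for col in columns.values()]
    # Mean of the values holding each rank position across all models.
    rank_means = [sum(col[i] for col in sorted_cols) / len(sorted_cols)
                  for i in range(n)]
    normalized = {}
    for model, col in columns.items():
        order = sorted(range(n), key=lambda i: col[i])   # smallest to largest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]                  # replace value by rank mean
        normalized[model] = out
    return normalized


# Invented importance scores for three features from three classifiers.
features = ["age", "sex", "hypertension"]
importances = {
    "RF": [0.6, 0.1, 0.3],
    "DT": [0.5, 0.3, 0.2],
    "GBM": [0.7, 0.2, 0.1],
}

normalized = quantile_normalize(importances)
average = {
    name: sum(col[i] for col in normalized.values()) / len(normalized)
    for i, name in enumerate(features)
}
ranking = sorted(features, key=average.get, reverse=True)
```

After normalization every classifier's scores share the same distribution, so no single classifier with unusually large raw importances dominates the averaged ranking.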
In the case of patient past medical histories, the identified features are shown in Figure 7, with the values given in Supplementary Tables S1-S3; patient age, gender, and taking other medicine showed significant importance for all target variables. For the target variables death status and hospital admission status, hypertension and COPD were also important attributes. Allergic history and prior vaccine were important for both SARS-CoV-2 positive status and hospital admission status. Figure 8 shows the feature importance of the top 30 patient postvaccination adverse reactions or symptoms, and the coefficient values of each feature are given in Supplementary Tables S4-S6. For the first target variable (i.e., patient mortality status), the most important features identified were hospital treatment duration, severe pain, urticaria, headache, cough, dizziness, fatigue and rash. For the second target variable (i.e., SARS-CoV-2 test status), the significant features were similar to those for the first target variable, with dyspnoea newly identified as an important factor. Finally, for the third target variable, hospital admission, the significant features identified were hospital treatment duration, fatigue, headache, dizziness, rash, and dyspnoea.

Discussion
Vaccination is a well-accepted and reliable approach to preventing infectious diseases [35], and historically it has proved to be one of the most effective strategies for controlling epidemics and pandemics, such as the SARS-CoV-2 outbreak [36]. All vaccines result in at least a small number of patients who show postvaccination side effects [37]. Identifying patients who are likely to show postvaccination adverse reactions is a challenging task. Some patients can experience a rapid-onset reaction [23] requiring treatment at a hospital or clinic, and even with rapidly administered care, the condition can be fatal [24]. Clearly, better prediction of the risk of adverse reactions is highly desirable. If a model can distinguish between those whose health conditions pose a high risk and those whose conditions do not, hospital administrators will be better able to provide adequate health care services. This research could therefore be helpful in such cases.
The purpose of this research was to determine the key indicators of susceptibility to adverse COVID-19 vaccination effects and to identify the key symptoms that point to the cause or causes of the adverse conditions, including classifying a patient as being at high risk or needing special care after COVID-19 vaccination. We found a list of the most significant features that support our hypothesis; all of them are commonly found in all target groups. The most significant demographic factors are patient age and gender; the most strongly associated coexisting conditions are taking other medicine, hypertension, allergic history, type-2 diabetes, and abnormal blood pressure; and the most significant negative effects experienced postvaccination are long hospital treatment, pain, urticaria, headache, cough, dizziness, and rash.
Furthermore, some postvaccination symptoms are commonly found in anonymous cases, but few of these are likely to be responsible for patients' severe conditions due to vaccination. The most severe side effects identified are hospital treatment duration, headache, pyrexia, dyspnoea, chills, fatigue, different kinds of pain, and dizziness. The Centers for Disease Control and Prevention (CDC) reported headache, fatigue, soreness at the injection site, fever, and myalgias [38,39], which are similar to these early observations. Patient allergic history is commonly associated with adverse effects of many drugs and vaccines [34], and this has also been reported for COVID-19 [40][41][42][43][44]; allergy-related reactions are found to a significant degree in every data group used in our study. Patient age is another important aspect: the mortality rate in persons of advanced age is higher than in younger patients, and previous studies have made similar findings [24]. There are reports indicating that allergic history may be a significant issue for COVID-19 vaccination [45,46]. In addition, our study indicates that patients taking significantly immunosuppressive medications [47] are at elevated risk of adverse reactions, as are those who are already SARS-CoV-2 positive at vaccination. Our research also suggests that other pre-existing conditions such as COPD, hypertension, hyperlipidemia, kidney disease, type-2 diabetes, and heart disease, as well as a history of allergic responses, could be associated with the development of severe vaccine reactions. We also identified a range of other factors linked to significant patient reactions that require hospital treatment and may also be associated with patient mortality. In addition, the early findings demonstrate that some persons with chronic conditions, aging populations, or racial/ethnic minority populations require various types of vaccinations [39].
Machine learning models are widely acknowledged as capable of identifying factors associated with morbidity and mortality and of using those factors to predict patient outcomes [21]. We identified machine learning models that performed well on our datasets and identified significant parameters related to vaccination-associated symptoms. The models achieved good accuracy, precision, recall, and F1 scores as well as low log-loss values, indicating strong classification and decision-making. In our analysis, several models performed particularly well, with high scores on the evaluation metrics. In the medical history dataset, RF achieved 1.0 accuracy for the mortality and SARS-CoV-2 positive cases, and RF and DT achieved 0.98 accuracy for the hospitalized cases; in the adverse effects dataset, RF achieved 1.0 accuracy for the mortality, SARS-CoV-2 positive, and hospitalized cases. Thus, based on the exhaustive comparison of various factors using supervised machine learning models, this analysis may identify significant factors for clinicians, indicating parameters valuable for patient stratification. In sum, the use of the machine learning models presented here to assess the likelihood that a patient is at risk of developing a severe reaction post-vaccination could be of great utility.
A limitation of this study is the availability of datasets. If datasets for all other types of vaccines can be collected, it will be possible to reach a more definitive conclusion. Nevertheless, this work represents a groundbreaking contribution to the study of COVID-19 vaccination reactions.
Postvaccination adverse effects could be decreased if at-risk individuals can be identified based on the patient medical history, which this study confirms by experimenting with a set of validation datasets. Though vaccination may not be directly responsible for patients' severe illness or death, we may need very careful observation of identified at-risk patients including access to ICU facilities.

Conclusions
The results indicate that patient medical histories are strongly related to the incidence of patient adverse reactions, some of which are associated with severe disease and even death. Moreover, a set of significant side effects also develops as postvaccination symptoms. It is therefore important to identify possible causes of the adverse effects. Once recognized, the factors identified can be taken into account by clinicians and enable improvements in care.
Based on our analyses, the patients at greatest risk of adverse reaction after vaccination include those of advanced age (60 years or more); risk is also associated with gender, COPD, hypertension, allergic conditions, taking other medications (notably immunosuppressive medications), and a history of type-2 diabetes or heart disease. Moreover, the study revealed that a set of postvaccination symptoms, namely hospital stay duration, pyrexia, headache, dyspnoea, chills, fatigue, different kinds of pain, dizziness, rash, and physical disability, are most associated with severe reactions.
Using statistical and machine learning analysis, we have found factors in patient medical histories that are associated with a risk of adverse patient reactions occurring in the postvaccination period. That the independent analyses identified a common group of severe after-effects suggests that these outcomes are reliable.
Although our analysis reveals significant findings regarding the risk of COVID-19 vaccination effects, there are a few limitations that need further research effort. We have used a comparatively small amount of patient data collected from a specific region of the USA, which included those receiving the mRNA-based vaccines only. Therefore, for making a generalized decision, it is important to have a rigorous analysis with a larger population size and cover more vaccine types. Nevertheless, we hope that the result of this research will play a significant role for policymakers in considering the distribution of vaccines as well as identifying patients who may be vulnerable to adverse reactions.
The efficacy and safety of COVID-19 vaccines to date have been excellent considering they were so rapidly developed, but minor after-effects of an administered vaccine might be expected and some extreme allergic or other responses may infrequently happen. Although the possibility of postvaccination adverse effects is not always a reason to avoid vaccines (especially given the serious consequences of COVID-19 in many vulnerable groups), new information about adverse reaction risk that our study provides could be an important consideration in clinical considerations about how (or whether) to administer a COVID-19 vaccine to a possibly at-risk individual, as well as determining the need for extra monitoring and care at the point of vaccination.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/healthcare11010031/s1, Table S1: Coefficient values of the patient's medical history after machine learning model training for target variable Died. Table S2: Coefficient values of the patient's medical history after machine learning model training for target variable SARS-CoV-2 Positive. Table S3: Coefficient values of the patient's medical history after machine learning model training for target variable Hospitalized. Table S4: Coefficient values of the patient's reactions after machine learning model training for target variable Died. Table S5: Coefficient values of the patient's reactions after machine learning model training for target variable SARS-CoV-2 Positive. Table S6: Coefficient values of the patient's reactions after machine learning model training for target variable Hospitalized. Table S7: p-values for the patient's medical history of statistical analysis. Table S8: p-values for the patient's reaction of statistical analysis.