Modeling COVID-19 Vaccine Adverse Effects with a Visualized Knowledge Graph Database

In this study, we utilized ontology and machine learning methods to analyze the current results on vaccine adverse events. With the VAERS (Vaccine Adverse Event Reporting System) Database, the side effects of COVID-19 vaccines are summarized, and a relational/graph database was implemented for further applications and analysis. The adverse effects of COVID-19 vaccines up to March 2022 were utilized in the study. With the built network of the adverse effects of COVID-19 vaccines, the API can help provide a visualized interface for patients, healthcare providers and healthcare officers to quickly find the information of a certain patient and the potential relationships of side effects of a certain vaccine. In the meantime, the model was further applied to predict the key feature symptoms that contribute to hospitalization and treatment following receipt of a COVID-19 vaccine and the performance was evaluated with a confusion matrix method. Overall, our study built a user-friendly visualized interface of the side effects of vaccines and provided insight on potential adverse effects with ontology and machine learning approaches. The interface and methods can be expanded to all FDA (Food and Drug Administration)-approved vaccines.


Introduction
Adverse effects are an important concern for vaccines and are closely monitored and evaluated each year [1]. In December 2020, the FDA issued Emergency Use Authorizations (EUAs) for two mRNA-based COVID-19 vaccines (BNT162b2 from Pfizer-BioNTech and mRNA-1273 from Moderna) as 2-dose series, and in February 2021, the FDA issued another EUA for one viral vector-based COVID-19 vaccine (JNJ-78436735 from Johnson & Johnson). Since then, the local and systemic adverse reactions after receipt of these COVID-19 vaccines have been summarized and studied across the world. WHO revised the regulation for safety and effectiveness in May 2022 to help healthcare professionals explain the oversight of COVID-19 vaccines [2]. Across the European Union (EU), a suitable pharmacovigilance system has to be in place to gather and report data on adverse reactions during the COVID-19 vaccine campaigns [3][4][5]. In the United States, VAERS functions as the reporting and monitoring system to summarize the adverse events of COVID-19 vaccines. As of July 2022, 599 million doses of COVID-19 vaccines had been administered in the US. Though COVID-19 vaccines have been proven effective in preventing complications of COVID-19, including pneumonia, acute respiratory distress syndrome (ARDS), multi-organ failure, septic shock and death, 12,775 preliminary reports of death (0.0023%) among people who received a COVID-19 vaccine were submitted to VAERS [6]. Adverse effects after COVID-19 vaccination include but are not limited to fatigue, muscle pain, headache, chills, injection site reaction, joint pain, fever, sore throat and allergic reaction. Allergic reaction or anaphylaxis was reported in 0.2% of participants after full vaccination [7]. Though serious adverse effects are rare, the general adverse effects cannot be overlooked [8,9].
This indicates that continued monitoring and assessment of the adverse effects of COVID-19 vaccines are required to further improve our current understanding of safety and decision-making in the implementation of vaccination.
Some previous works focused on the prediction of vaccine outcomes and adverse effects [10,11]. Gonzalez-Dia's group has provided a machine learning method for data processing, feature selection, algorithm selection and testing for the prediction of vaccine-induced immunity [12]. Another group classified post-COVID-19 vaccination reactogenicity with decision tree and random forest methods to find the features that lead to hospitalization and patient death [13]. One previous approach utilized an ontology method to support an operable COVID-19 context for automatic governance of bioethics processes [14]. While efforts have been made to predict COVID-19 vaccine side effects, a visualized interface is still needed so that healthcare providers, patients and healthcare officers can easily access the information. Therefore, our goal is to utilize ontology and machine learning methods to analyze COVID-19 vaccine adverse events and build a visualized system to provide insight into the adverse events data and the analyzed results.
In this study, we used the VAERS data on COVID-19 vaccines to first build a visualized, user-friendly interface for healthcare providers, patients, vaccine manufacturers and government officers. While some security concerns and suggestions have been proposed for storing COVID-19 patient data in different NoSQL DBMSs (database management systems) [15], we used the graph database neo4j [16] to store the processed data for our system, as it is open source and widely used in software with complex relationships. Further, no security concerns arise because patient data are de-identified in VAERS. Ontology was utilized to set the nodes and relations in the network with neo4j software [16]. In the second part of this study, we demonstrated several applications of our proposed system, including querying, finding similarities and classification. Some previous work has focused on leveraging machine learning models on VAERS data to predict death risk [17]. We also utilized this ontology network and the information on the adverse effects to build a machine learning model for the prediction of key feature symptoms that lead to hospitalization and treatments after COVID-19 vaccination. A node2vec method [18] was utilized to identify the symptoms most predictive of patients' hospitalization outcomes, and the model was evaluated with a confusion matrix method. The final model provides the probability that a patient needs hospitalization.

Dataset
Three datasets (VAERSDATA, VAERSSYMPTOMS, VAERSVAX) were extracted from VAERS [1], which is an open-source national reporting system for safety problems of licensed vaccines (Supplementary Table S1). The total vaccination data in the US were extracted from VAERS for 2021-2022 (up to March 2022), and the data were filtered to adverse events associated with vaccinations (the majority of records concern the COVID-19 vaccines from Pfizer-BioNTech, Moderna and Janssen). The unique VAERS ID was used per patient to identify multiple events of a single person. The variables, including age, sex, vaccine manufacturer, current illness, disability status, medication usage, allergic history, pre-existing conditions, etc., were extracted and encoded to be utilized in the knowledge graph.

System Overview
A knowledge graph-based system to store and analyze the data was developed. The four major components of our system are shown in Figure 1: knowledge graph design, knowledge graph creation, machine learning and application components. The knowledge graph was designed based on the VAERS data structure and potential use cases from the application. In knowledge graph creation, we preprocess the data, both structured and unstructured, to create knowledge graph nodes, properties and relationships and store them in the neo4j GraphDB. There is a machine learning component to help with the whole process, for example, to extract entities from unstructured data, compute node embeddings and build machine learning models for the application, etc.
With the vaccine side effect knowledge stored, three tasks were applied for application: (1) querying: use cypher query to search the database; (2) similarity: find similarity among the nodes with the help of node embeddings; and (3) classification: classify the needs of hospitalized vaccinated patients with side effects using machine learning methods. We will cover these components in detail in the following sections.

Data Preprocessing
For computational reasons, we sampled ~10k side effect records. In the future, the methodology can be applied to all the data. Three types of nodes are created: vaccine nodes, symptom nodes and patient nodes. The information on vaccines and symptoms is added as edges connecting patient nodes to vaccine nodes and symptom nodes. For the unstructured text fields, including OTHER_MEDS, CUR_ILL, HISTORY and SYMPTOM_TEXT, we first cleaned the text of each node to remove invalid entries, including "na", "none", "unknown", "n/a", "no", "nothing", "non3", "unk", "no.", "none.", etc., using regex matches on the lower-cased text, since these entries occur with high frequency in those fields and provide no information.
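The placeholder cleaning described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `PLACEHOLDERS` tuple only covers the strings quoted in the text (the original list ends with "etc."), and the function name `clean_free_text` is hypothetical.

```python
import re
from typing import Optional

# Placeholder strings quoted in the text. The source list ends with "etc.",
# so this tuple is an assumed, incomplete subset.
PLACEHOLDERS = ("na", "none", "unknown", "n/a", "no", "nothing", "non3", "unk")

# Match a field that consists only of a placeholder, optionally followed by a
# period (this also covers the "no." and "none." variants).
_PLACEHOLDER_RE = re.compile(
    r"^\s*(?:" + "|".join(re.escape(p) for p in PLACEHOLDERS) + r")\.?\s*$"
)


def clean_free_text(value: Optional[str]) -> Optional[str]:
    """Return the stripped text, or None when the field carries no information."""
    if value is None:
        return None
    text = value.strip()
    # Lower-case before matching, as described in the text.
    if not text or _PLACEHOLDER_RE.match(text.lower()):
        return None
    return text
```

Anchoring the pattern to the whole field keeps informative entries such as "no known allergies" intact while dropping a bare "No." or "N/A".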
To process the allergy, disease and medical history data, we used NLP (Natural Language Processing) methods to extract structured entities and add them to our knowledge graph. Specifically, an open-source pre-trained medical NER (named-entity recognition) model was applied: we used the pre-trained "en_core_sci_md" model from the Python ScispaCy package [19] to link the texts to entities. The RxNorm and Unified Medical Language System (UMLS) ontologies served as the linking targets. The information from the OTHER_MEDS field was extracted using the RxNorm linker with UMLS CUIs (Concept Unique Identifiers). Historical condition relations were extracted using the UMLS linker from CUR_ILL, HISTORY and SYMPTOM_TEXT, restricted to disease/symptom entities because the text is noisy. The disease/symptom entities are determined by the semantic type TUI T047 in UMLS. The allergy histories were also extracted from the ALLERGIES field using the UMLS linker.
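The semantic-type restriction can be sketched as a small filter over linked entities. The function below is an illustrative assumption rather than the authors' implementation: it expects objects shaped like a ScispaCy-processed document and its entity linker (`doc.ents`, `ent._.kb_ents`, `linker.kb.cui_to_entity`), and it is exercised here with stand-in objects and made-up CUIs so that it runs without downloading any model.

```python
from types import SimpleNamespace
from typing import List, Tuple

DISEASE_TUI = "T047"  # UMLS semantic type "Disease or Syndrome"


def disease_entities(doc, linker, min_score: float = 0.0) -> List[Tuple[str, str]]:
    """Collect (mention text, CUI) pairs whose linked concept has TUI T047."""
    found = []
    for ent in doc.ents:
        # Each candidate is a (CUI, similarity score) pair.
        for cui, score in ent._.kb_ents:
            if score < min_score:
                continue
            concept = linker.kb.cui_to_entity[cui]
            if DISEASE_TUI in concept.types:
                found.append((ent.text, cui))
                break  # keep the first qualifying concept per mention
    return found


# With ScispaCy installed, `doc` and `linker` would come from something like:
#   nlp = spacy.load("en_core_sci_md")
#   nlp.add_pipe("scispacy_linker", config={"linker_name": "umls"})
#   linker = nlp.get_pipe("scispacy_linker")
#   doc = nlp("history of hypertension")

# Stand-ins that mimic only the attributes used above; the CUIs are placeholders.
def _fake_entity(text, candidates):
    return SimpleNamespace(text=text, _=SimpleNamespace(kb_ents=candidates))


fake_linker = SimpleNamespace(kb=SimpleNamespace(cui_to_entity={
    "CUI_A": SimpleNamespace(types=["T047"]),  # pretend: a disease concept
    "CUI_B": SimpleNamespace(types=["T121"]),  # pretend: a drug concept
}))
fake_doc = SimpleNamespace(ents=[
    _fake_entity("hypertension", [("CUI_A", 0.97)]),
    _fake_entity("aspirin", [("CUI_B", 0.99)]),
])
demo = disease_entities(fake_doc, fake_linker)
```

The `min_score` threshold is an added knob, not something the text specifies; it is one way to trade recall for precision on noisy free text.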

Knowledge Graph
The dynamic knowledge graph is processed with neo4j software for the user interface. A free version of neo4j AuraDB was utilized, which is a cloud-based GraphDB, to collaborate on building and exploring the graph data, and this allows 50,000 nodes as a maximum. Though this limits the full nodes to be processed, we managed to evenly split the nodes across the timeline to better present the overall performance of the model. Both the nodes and edges processed in the previous step are imported into the neo4j, and a dynamic interaction user interface was built. See the abstraction of node and edge details in Supplementary Table S4.
Besides the properties shown in Supplementary Table S4, we also store the additional preprocessed features in the knowledge graph as properties. For example, "feature_STATE" holds the "STATE" feature of patients with NA instances replaced by "UNKNOWN", so nodes without a "STATE" property have "feature_STATE" set to "UNKNOWN". These features are used for downstream applications, for example, building machine learning models, and they are dynamic features, so they can be updated frequently based on the use cases.
We also compute node2vec embeddings [18] for each node in the graph with the Neo4j Graph Data Science library and calculate the top cosine similarities among the nodes. The embeddings are saved as a property of each node, and the cosine similarity edges are created. The embeddings and similarity edges are also dynamic and can be updated as often as needed. We experimented with 300-dimensional node embeddings, with return factor p = 0.25 and in-out factor q = 4, to keep the random walks more concentrated.
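To see why p = 0.25 and q = 4 concentrate the walks, the node2vec second-order transition rule can be written out directly. The sketch below assumes an unweighted, undirected graph stored as an adjacency dictionary; the function name and the toy graph are hypothetical and independent of the Neo4j implementation.

```python
from typing import Dict, Set


def node2vec_step_probs(
    graph: Dict[str, Set[str]], prev: str, curr: str, p: float, q: float
) -> Dict[str, float]:
    """Transition probabilities out of `curr`, given the walk arrived from `prev`."""
    weights = {}
    for nxt in graph[curr]:
        if nxt == prev:
            weights[nxt] = 1.0 / p  # return to the previous node
        elif nxt in graph[prev]:
            weights[nxt] = 1.0      # stay within the neighborhood of prev
        else:
            weights[nxt] = 1.0 / q  # move outward, away from prev
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


# Toy undirected graph with hypothetical node names.
toy_graph = {
    "a": {"v", "b"},
    "b": {"a", "v"},
    "c": {"v"},
    "v": {"a", "b", "c"},
}

# The walk has just moved from "a" to "v".
probs = node2vec_step_probs(toy_graph, prev="a", curr="v", p=0.25, q=4.0)
```

With these settings the unnormalized weights are 4 for returning to "a", 1 for the shared neighbor "b" and 0.25 for the outward node "c", so most of the probability mass stays near the starting neighborhood.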

Application
With the knowledge graph, we came up with three types of applications: querying, similarity and classification. For querying, we analyzed the statistics about COVID-19 vaccine, symptoms and historical drug/condition/allergy connections by searching in the GraphDB we built using the cypher queries. For similarity, after we computed the node embeddings and top cosine similarities of each node, we also used cypher queries to explore the similarity patterns among the COVID-19 vaccine and other entities.
For classification, we tested how well hospitalization of a patient after receipt of a COVID-19 vaccine can be predicted from the reported symptoms. We use cypher to pull the data from the GraphDB and split the data into an 80% training set and a 20% test set. In the training set, there are 7354 records without hospitalization (label 0) and 638 with hospitalization (label 1). In the test set, there are 1846 records without hospitalization (label 0) and 152 with hospitalization (label 1). Given the small size of the data and the imbalanced labels, the final performance of the model is evaluated with a confusion matrix, f1 score and precision-recall AUC. We experimented with several model architectures, including logistic regression and boosted trees. We use grid search with cross-validation to find the best parameters, tuning on the positive-label f1 score. All the models use the patient features "SEX", "STATE", "AGE_YRS" and "NUMDAYS". Features "SEX" and "STATE" are one-hot encoded, and "AGE_YRS" and "NUMDAYS" are standardized. In addition, symptom features, embedding features and combined features were added (Table 1).
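The patient feature encoding can be illustrated with a small standard-library sketch. The records below are invented for illustration, and the helper names are hypothetical; the paper does not state which library performed the encoding.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str]) -> List[List[int]]:
    """One-hot encode categorical values; columns follow sorted category order."""
    categories = sorted(set(values))
    return [[1 if value == cat else 0 for cat in categories] for value in values]


def standardize(values: Sequence[float]) -> List[float]:
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def build_matrix(rows: Sequence[Dict[str, object]]) -> List[List[float]]:
    """Concatenate one-hot SEX and STATE with standardized AGE_YRS and NUMDAYS."""
    sex = one_hot([str(r["SEX"]) for r in rows])
    state = one_hot([str(r["STATE"]) for r in rows])
    age = standardize([float(r["AGE_YRS"]) for r in rows])
    days = standardize([float(r["NUMDAYS"]) for r in rows])
    return [sex[i] + state[i] + [age[i], days[i]] for i in range(len(rows))]


# Hypothetical patient records, invented for illustration only.
records = [
    {"SEX": "F", "STATE": "CA", "AGE_YRS": 34.0, "NUMDAYS": 1.0},
    {"SEX": "M", "STATE": "UNKNOWN", "AGE_YRS": 61.0, "NUMDAYS": 0.0},
    {"SEX": "F", "STATE": "NY", "AGE_YRS": 47.0, "NUMDAYS": 3.0},
]
features = build_matrix(records)
```

In a real pipeline the category list and the mean and standard deviation would be computed on the training split only and then reused on the test split, to avoid leaking test information.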

Preprocessing of the Data to Generate Knowledge Graph
The three datasets for 2021 and 2022 in the VAERS database were extracted and preprocessed using Python to generate the knowledge graph. In total, 13,670 vaccination side-effect records were extracted, and the top distributions by vaccine type are presented in Table 2. This is not the final representation in the graph, since one patient may have multiple side-effect records, but it gives some insight into the source data distribution after preprocessing. COVID-19 vaccines account for the most side-effect records in these two years' reports, over 50 times more than the other vaccines. Though this may be partially due to the high number of COVID-19 vaccinations, it also indicates the importance of studying the adverse effects of COVID-19 vaccines.
In the next step, we distinguished the individual vaccines using VAX_NAME. The results are consistent with the previous finding: the top three vaccines are the COVID-19 vaccines from Pfizer, Moderna and Janssen, with adverse effect counts around 20-50 times higher than the rest of the vaccines (Table 3). The counts may also partially correlate with the number of vaccinations, as Pfizer's and Moderna's side-effect reports are more numerous than Janssen's, while in recent reports, more adverse effects are found for the Janssen COVID-19 vaccine. We also analyzed the side effects by vaccine manufacturer. As shown in Table 4, the results again point to the top three COVID-19 vaccine manufacturers, which reflects the high number of COVID-19 vaccinations and underlines the need to study and analyze these side effects. Furthermore, the top symptoms of the side effects are shown in Tables 5-8. Common side effects are identified, including headache, fatigue, pyrexia, etc. (Table 5). The treatments are summarized in Table 6, which lists the most common drugs used to treat adverse effects. We also analyzed the most common diseases and allergies in the reports (Tables 7 and 8). COVID-19 is the second most common disease, slightly behind hypertensive disease. This indicates that the effectiveness and side effects of COVID-19 vaccines should be closely monitored.

Creation of the GraphDB with Dynamic Interface
In the second part of the study, we utilized the preprocessed dataset to build a user-friendly visualized knowledge graph database. Due to the node limit (50,000) of the free version of neo4j, we focused on the COVID-19 adverse effects records in 2021 and 2022 and randomly selected 5,000 records for the presentation. With a paid version of neo4j, this method can be applied to present all the adverse effects of FDA-approved vaccines.
As shown in Figure 2A, four types of node labels are built in the GraphDB (patient, symptom, UMLS and vaccine). Each patient and each vaccine is set as an individual node. The reports are extracted, and the information is built into the graph as edges, which include allergic_to, consine_similarity, had_condition, has_symptom, took_vaccine, used_drug, etc. The property keys are set based on the patient information, vaccination information, hospitalization record, symptoms and treatment information in the dataset. These include age_yrs, CUI, died, diable, L_threat, numdays, onset_date, recvdate, sex and state for each Patient node; symptom and symptomversion for each Symptom node; and vaers_id, vax_date, vax_type, vax_manu and vax_name, etc. for each Vaccine node. All the nodes are assigned a name property, which is displayed in the neo4j interface. For Patient nodes, the name is the patient id, and for Symptom and Vaccine nodes, the name property is the same as the symptom and vax_name property, respectively (Figure 2B). Moreover, we also kept the labels and features as dynamic properties for machine learning modeling, including the binary label class_HOSPITAL, the node embedding embeddingNode2vec, and processed features, such as the one-hot encoded feature_SEX, feature_STATE, etc.
The graph database was built based on the previously discussed settings. As shown in Figure 3, the interface is user-friendly, and the record of each patient and each vaccine can be easily visualized. The example shows the subgraph of Patient 1117806 (Patient node in the center of the graph), who has been reported to have taken the PFIZER\BIONTECH COVID-19 Vaccine (Vaccine node in purple) and developed symptoms including Blood Pressure Increased, Flushing, Nausea, Dizziness and Pain in Extremity (blue Symptom nodes with HAS_SYMPTOM edges to that patient). The connections and correlations can also be directly monitored with the neo4j database. This improves on the current presentation of the dataset in VAERS and other databases. Compared with traditional datasets, with this approach, all the adverse effects records of vaccines can be imported into a dynamic visualized graph database, and the relationships of each record can be directly monitored.

Application of the GraphDB for Prediction
Besides the advantage of a user-friendly visualized interface and the presentation of all the records in one platform, the graph database built with neo4j can be directly utilized for searching, data analysis, and training machine learning or deep learning models. In the following part, we tested one application using the graph database built with COVID-19 vaccine adverse effects data.

Querying
With the vaccine adverse effect knowledge graph in hand, we can use the cypher query to search for the connections between vaccines and other entities easily. Some example results from the cypher query are shown in Tables 9-12.
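As an illustration of such a query, the sketch below counts the most frequent symptoms among patients who took a given vaccine. It is written under assumptions: the label and relationship casing (`Patient`, `Vaccine`, `Symptom`, `TOOK_VACCINE`, `HAS_SYMPTOM`), the edge directions and the function name are guesses based on the schema described earlier, and `driver` is expected to behave like a driver object from the official neo4j Python package.

```python
from typing import Any, Dict, List

# Assumed schema: (Patient)-[:TOOK_VACCINE]->(Vaccine), (Patient)-[:HAS_SYMPTOM]->(Symptom).
TOP_SYMPTOMS_QUERY = """
MATCH (p:Patient)-[:TOOK_VACCINE]->(v:Vaccine {name: $vax_name})
MATCH (p)-[:HAS_SYMPTOM]->(s:Symptom)
RETURN s.name AS symptom, count(DISTINCT p) AS patients
ORDER BY patients DESC
LIMIT $limit
"""


def top_symptoms(driver, vax_name: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Run the query and return one dict per row (symptom name, patient count)."""
    with driver.session() as session:
        # Query parameters are passed separately instead of being formatted
        # into the query string.
        result = session.run(TOP_SYMPTOMS_QUERY, vax_name=vax_name, limit=limit)
        return [record.data() for record in result]


# Possible usage with the neo4j package installed (connection details are placeholders):
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("neo4j+s://<instance>", auth=("neo4j", "<password>"))
#   rows = top_symptoms(driver, "<vaccine name as stored in the graph>")
```

Passing the vaccine name as a parameter keeps the query text constant, which lets the database reuse the query plan and avoids quoting problems in names.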

Similarity
Using the trained node2vec embeddings, we can explore the knowledge graph by finding potential connections about vaccines via the similarity score. Since we have saved both embeddings and the top cosine similarities in the GraphDB, we can also query the similarity result on the fly. Tables 13 and 14 show some similarity results on COVID-19 vaccines.
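The similarity lookup amounts to ranking nodes by the cosine similarity of their embeddings. A minimal sketch, using made-up two-dimensional vectors and hypothetical node names in place of the 300-dimensional node2vec embeddings:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(
    embeddings: Dict[str, Sequence[float]], query: str, k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k nodes most similar to `query`, best first."""
    scores = [
        (name, cosine(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Made-up embeddings for illustration only.
toy_embeddings = {
    "vaccine_x": [1.0, 0.0],
    "symptom_a": [0.9, 0.1],
    "symptom_b": [0.0, 1.0],
    "symptom_c": [-1.0, 0.0],
}
neighbors = top_k_similar(toy_embeddings, "vaccine_x", k=2)
```

Storing only the top-k pairs as edges, as the text describes, avoids materializing the full pairwise similarity matrix in the database.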

Classification
A machine learning model was trained with the graph database to predict the potential hospitalization and death of a patient who received a COVID-19 vaccine and showed certain symptoms. The model results are shown in Table 15. From the table, we see that models trained on embedding-only features do not give good results. Our hypothesis is that the embeddings do not learn useful information for this task. The embedding method is unsupervised, so the learned representations are not tuned to the downstream task. To improve the results, further experiments could use the embeddings as the first layer of the representations and train neural network-based models on the labels so that they are fine-tuned toward the downstream task.

Evaluation of the Prediction Model
To evaluate the performance of predicting hospitalization from the symptoms of patients after receipt of COVID-19 vaccines, the data were randomly split into training and testing groups (Figure 4). The testing group was predicted with the model trained on the training group, and the predictions were compared with the true labels of the testing group. The f1 score was used to report performance because it takes both precision and recall into consideration and, in this case, both are important. Our model for the prediction of hospitalization with symptoms has an f1 score of 0.67 in the testing group for positive predictions. The f1 scores of Eysha Saad's death risk prediction models on the original VAERS data are between 0.64 and 0.7 using CNN, LSTM and BiLSTM [17]. Though our model addresses the hospitalization task, the simple XGBoost model we built reaches a similar level of performance. This indicates that a prediction model using the knowledge graph can be useful in the analysis of the potential hospitalization and death rate of patients after receipt of COVID-19 vaccines.
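For reference, the positive-label precision, recall and f1 score follow directly from the confusion matrix counts. The sketch below uses a small invented set of labels and predictions, not the study's data, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = hospitalized)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 1 and pred == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def positive_class_scores(counts: Dict[str, int]) -> Dict[str, float]:
    """Precision, recall and f1 for the positive label."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented labels and predictions for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
scores = positive_class_scores(counts)
```

Because true negatives do not enter the f1 score, it is less flattering than accuracy on imbalanced data such as this, where most records are not hospitalizations.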

Discussion
Our study provides a general method to present all the recorded side effects of vaccines approved by the FDA. One previous study hosted on the Eureka Research Platform collected data from 19,586 registered participants from 26 March 2020 to 19 May 2021 and reported the percentage rate of COVID-19 vaccine adverse effects by participants' characteristics [7]. In contrast, our study proposes a knowledge graph-based system to analyze COVID-19 vaccine side effects. With the user-friendly interface, users can visualize the side effects and the underlying data.
In addition, we demonstrated examples of building machine learning models to predict hospitalization after COVID-19 vaccination using the system. To model the relationships among the nodes, we calculated node2vec network embeddings for all of the nodes. Node2vec is a random walk-based method that is able to capture higher-order relationships. However, it does not differentiate between node or relationship types. Heterogeneous network embeddings and embeddings over temporal graphs could serve as better knowledge graph representations in this respect [20,21]. With limited datasets, a better evaluation of the model would require introducing real patient data into the system and comparing the model's output with suggestions from healthcare providers, toward the goal of making vaccination suggestions for a given patient. A study in Malaysia established the incidence of adverse COVID-19 drug effects based on data provided by Sungai Buloh Hospital [22]. This indicates the potential of machine learning models in supporting COVID-19 treatments in the future. In this project, we had limited resources to obtain suggestions from healthcare experts. As a proposal for the evaluation step, it would be more precise to evaluate the precision, recall and f1 score by comparing the model's predictions with doctors' suggestions for those patients.
Even though COVID-19 vaccines can effectively reduce serious illness, hospitalization and death rates, there are still vaccine-hesitant patients who will not take vaccines for themselves or their families, and a low rate of COVID-19 vaccine acceptability was observed in some other countries [23,24]. The public acceptability rate was reported at only 37.4% in Jordan [23]. With the machine learning model built, a better understanding of the safety of vaccines could be provided; the chance of a patient having adverse effects after vaccination could be predicted, and corresponding recommendations could be given to patients or parents on whether they or their children should take the vaccine.
Another part of the future work is to improve the automatic import of adverse effects into the interface. In our case, findings on COVID-19 vaccines are reported quickly, and with software engineering methods, the model can report the latest analysis results from up-to-date data. Further, if more patient data become available, for instance, through access to more medical history data, we can infer the relationship between medical history and vaccine adverse effects. This can be achieved by crosslinking different databases using ontologies. With the machine learning model built, the chance of a patient having adverse effects after vaccination can then be predicted, and corresponding recommendations can be provided to patients on whether they should take the vaccine.
In summary, our study provided a general method to summarize and present COVID-19 vaccine adverse events using a knowledge graph. This enables patients, healthcare professionals and government officials to quickly visualize the side effects. Though the study may contribute to better regulation and monitoring of COVID-19 vaccines, due to limited resources and datasets, future work is required to correlate medical history and vaccine adverse effects with the crosslinking of different databases using ontologies.
Author Contributions: Z.L., X.G. and C.L. contributed to conceptualization, methodology, analysis and writing. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Informed Consent Statement: Not applicable.
Data Availability Statement: The dataset used in the current study will be made available on request.