The 2011–2020 Trends of Data-Driven Approaches in Medical Informatics for Active Pharmacovigilance

Abstract: Pharmacovigilance, the scientific discipline pertaining to drug safety, has been studied extensively and continues to progress. In this field, medical informatics techniques and their interpretation play important roles, and appropriate approaches are required. In this study, we investigated and analyzed the trends in pharmacovigilance systems, focusing on the data collection, detection, assessment, and monitoring processes. We used PubMed to collect papers on pharmacovigilance published over the past 10 years and analyzed a total of 40 significant papers to determine the characteristics of the databases and data analysis methods used to identify drug safety indicators. Through a systematic review, we identified the difficulties of standardizing data and terminology and of establishing an adverse drug reaction (ADR) evaluation system in pharmacovigilance, along with their implications. We found that appropriate methods and guidelines for active pharmacovigilance using medical big data are still needed and should continue to be developed.


Introduction
Pharmacovigilance is the pharmacological science pertaining to the collection, detection, assessment, monitoring, and prevention of adverse events related to drug safety [1]. Adverse drug reactions (ADRs) are mostly caused by the pharmacological action of drugs and by factors such as drug-drug and drug-food interactions, medication errors, allergies, and metabolism [2,3]. ADRs are a leading cause of death in the United States, causing more deaths than lung disease, diabetes, HIV/AIDS, and pneumonia [4,5]. It is therefore important to identify all possible ADRs through pharmacovigilance [3].
Even when the efficacy and safety of a drug have been verified in clinical trials, post-marketing surveillance may still be necessary because clinical trials have inherent limitations [6]. In practice, prescribing and dosing errors can occur, and medication adherence can be poor. Moreover, patients with chronic diseases may take a drug for life, whereas most clinical trials run only for a fixed period, so ADRs that appear only after long-term administration can go undetected. Post-marketing surveillance, and therefore pharmacovigilance, is thus becoming increasingly important [6].
Pharmacovigilance draws on various big data sources, including spontaneous reporting systems (SRSs), the medical literature, electronic health records (EHRs), and social media [7][8][9]. It comprises two types of systems: passive surveillance and active surveillance [10,11]. Passive surveillance relies on spontaneous reports from medical personnel and patients and suffers from severe underreporting: fewer than 1% of ADRs are reported [12]. In active surveillance, databases can be constructed from EHRs, which contain detailed patient information [13]. Active surveillance can also be used to identify new drug safety signals or to verify the signals identified through passive surveillance [11]. From a medical informatics perspective, immediate ADR monitoring, improved efficacy, and exploration using various natural language processing (NLP) technologies are indispensable [14,15]. Therefore, a systematic medical informatics approach is required to apply appropriate techniques to pharmacovigilance systems.
Pharmacovigilance, a dynamic discipline, has evolved significantly since the 1972 World Health Organization (WHO) technical report [16]. Since then, many studies directly related to patient safety have been conducted, and many related review papers have been published.
This work reviews pharmacovigilance studies conducted over the last 10 years and investigates their overall trends. The main results and limitations of these studies are summarized by process, from collection to monitoring. Finally, we emphasize the role and necessity of medical informatics in active pharmacovigilance and in pharmacovigilance platforms.

Published Trends
We used PubMed, the United States National Library of Medicine (NLM) database of publications in the life sciences and biomedical research, to examine the overall trends in pharmacovigilance research [17]. We extracted papers published from 1 January 2011 to 31 December 2020 using keywords covering "Pharmacovigilance", "Adverse drug reaction", and the pharmacovigilance systems ("Collection", "Detection", "Assessment", "Monitoring"), as shown in Table 1. In total, 3322 related papers, comprising journal articles, meta-analyses, reviews, systematic reviews, and observational studies, were published over this 10-year period. We screened the papers by title and abstract and then manually excluded those with an improper study design, as shown in Figure 1. Finally, 10 papers were selected for each of collection, detection, assessment, and monitoring.
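The search step can be reproduced programmatically through NCBI's E-utilities. The sketch below only builds an esearch request for the same publication window; it sends nothing over the network. The keyword list is hypothetical (the actual terms are those in Table 1), and restricting each phrase to the `[All Fields]` tag is our assumption about how "all fields" was applied. Title/abstract screening remains a manual step.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_params(keywords, mindate="2011/01/01", maxdate="2020/12/31",
                        field="All Fields", retmax=10000):
    """Build esearch parameters that OR the keywords within a publication-date window."""
    # Quote each keyword so multi-word terms are searched as phrases.
    term = " OR ".join(f'"{kw}"[{field}]' for kw in keywords)
    return {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,    # PubMed esearch returns at most 10,000 IDs per request
        "retmode": "json",
    }


def build_search_url(params):
    """Encode the parameters into a GET URL (the request is not sent here)."""
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical keywords for illustration only; see Table 1 for the actual terms.
keywords = ["pharmacovigilance", "adverse drug reaction", "signal detection"]
params = build_search_params(keywords)
print(build_search_url(params))
```

Keeping parameter construction separate from URL encoding makes it easy to log exactly which query and date window produced a given result set.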

Pharmacovigilance Systems
We summarized and discussed the trends observed for each pharmacovigilance system, which collects, detects, assesses, and monitors adverse events related to drug safety [1]. To give an overall picture of these systems, we describe the representative databases and the detection methods applied to them, the standard terminologies used in assessment, and the activities performed in monitoring. The keywords and representative terms for each system are shown in Figure 2.

Data Collection
Collection is a key component of an active surveillance system; it accesses and extracts data from pharmacovigilance-related databases. Choosing appropriate data is important for each study, because significant drug safety indicators are identified from these databases. Drug safety indicators can be found in various databases, such as EHRs, claims data, registries, spontaneous reports, and the literature [18]. Based on studies that used these databases, we classified the databases into nine categories: EHRs, SRSs, structured product labeling (SPL), drug information databases, claims databases, genetics and biochemical databases, bibliographic databases, and social media data. The classification criteria were whether the data came from clinical institutions, spontaneous reports, or heterogeneous literature sources, and how the overall composition of the drugs was analyzed. We selected and reviewed 10 papers [19][20][21][22][23][24][25][26][27][28] that investigated a diverse range of databases and summarized the data, objectives, and methods used in each study (Table 2).

Bihan et al. [29] published the most recent review article on different pharmacovigilance databases, including VigiBase of the WHO [30], EudraVigilance of the European Medicines Agency (EMA) [31], the FDA Adverse Event Reporting System (FAERS) of the United States Food and Drug Administration (FDA) [32], and the French pharmacovigilance database (FPDB) [33]. Databases are also available for nationally managed systems, clinical institution data, drug information, and the literature; examples include Sentinel [34], Exploring and Understanding Adverse Drug Reactions (EU-ADR) [35], EHRs [36], DrugBank [37], and MEDLINE [38].
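Because these databases use different schemas, collection typically begins by mapping records into one shared structure before any analysis. A minimal sketch, assuming two hypothetical export formats: the column names in `FIELD_MAPS` are invented for illustration and do not correspond to the actual layouts of FAERS, VigiBase, or any EHR system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugEventRecord:
    """Minimal common schema for one drug-event observation, whatever its source."""
    source: str   # e.g. "SRS" or "EHR"
    case_id: str
    drug: str
    event: str


# Hypothetical source-specific column names (illustration only).
FIELD_MAPS = {
    "SRS": {"case_id": "report_id", "drug": "suspect_drug", "event": "reaction"},
    "EHR": {"case_id": "patient_id", "drug": "prescribed_drug", "event": "diagnosis"},
}


def harmonize(raw, source):
    """Rename source-specific fields and normalize drug/event strings."""
    fmap = FIELD_MAPS[source]
    return DrugEventRecord(
        source=source,
        case_id=str(raw[fmap["case_id"]]),
        drug=raw[fmap["drug"]].strip().lower(),
        event=raw[fmap["event"]].strip().lower(),
    )


def collect(batches):
    """Harmonize (source, rows) batches and drop exact duplicates, keeping first-seen order."""
    seen, out = set(), []
    for source, rows in batches:
        for raw in rows:
            rec = harmonize(raw, source)
            if rec not in seen:
                seen.add(rec)
                out.append(rec)
    return out


srs_rows = [
    {"report_id": 1, "suspect_drug": "Drug A ", "reaction": "Rash"},
    {"report_id": 1, "suspect_drug": "drug a", "reaction": "rash"},  # same case, re-entered
]
ehr_rows = [{"patient_id": "P7", "prescribed_drug": "Drug B", "diagnosis": "Hepatitis"}]

records = collect([("SRS", srs_rows), ("EHR", ehr_rows)])
for r in records:
    print(r)
```

Real deduplication of spontaneous reports is considerably harder than exact matching, since the same case may be reported several times with slightly different details; this sketch only removes records that become identical after normalization.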

ADR Detection for Pharmacovigilance
In detection, advanced methods are applied to the databases identified during collection to find drug safety indicators. These indicators can be used to detect drug-related problems, typically ADRs, adverse drug events (ADEs), or drug-drug interactions [39]. To find indicators in EHR data, measurement results or prescription histories (including patient information) can be used together with drug information or label information from other registries. The appropriate method depends on the database and can be selected by comparing the performance of candidate methods. We therefore grouped the detection methods into five types to relate databases to methods in the tables: statistical analysis (descriptive analysis, cohort study, computational analysis, and disproportionality analysis), text mining, NLP, machine learning, and deep learning. Statistical analysis dominated early studies; NLP and machine learning techniques were adopted later as computing technology advanced.
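Disproportionality analysis, one of the statistical approaches listed above, compares how often an event is reported with a given drug against how often it is reported with all other drugs. The sketch below computes two widely used measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), with approximate 95% confidence intervals from a 2x2 table of report counts. The counts are hypothetical, and the zero-cell handling (raising an error) is a simplification.

```python
import math


def disproportionality(a, b, c, d, z=1.96):
    """PRR and ROR with approximate 95% CIs from a 2x2 table of spontaneous reports.

      a: target drug, target event     b: target drug, other events
      c: other drugs, target event     d: other drugs, other events
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction or skip this pair")
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Large-sample standard errors of log(PRR) and log(ROR).
    se_log_prr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def ci(estimate, se):
        # Symmetric interval on the log scale, back-transformed to the ratio scale.
        return estimate * math.exp(-z * se), estimate * math.exp(z * se)

    return {"PRR": prr, "PRR_CI": ci(prr, se_log_prr),
            "ROR": ror, "ROR_CI": ci(ror, se_log_ror)}


# Hypothetical counts for one drug-event pair.
result = disproportionality(a=20, b=980, c=100, d=98900)
print(result)
```

With these counts the event accounts for 2% of the drug's reports versus about 0.1% of reports for all other drugs, giving a PRR of 19.8. Signalling thresholds differ between systems (a lower confidence bound above 1 is a common screening criterion), and a flagged pair is only a hypothesis for assessment, not a confirmed ADR.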
As shown in Table 3 [40][41][42][43][44][45][46][47][48][49], some studies examined drug-related correlations using computational methods [42,47] and survival analysis [40,46] together with statistical approaches. These studies compared drug-exposed and non-exposed groups. As the first step of their detection algorithms or methods, most studies preprocessed the data using data mining or NLP [44,47,49]. With the active development of artificial intelligence methods for big data analysis, various machine learning [44,45,47] and deep learning [48] techniques have been applied to pharmacovigilance, and more studies have integrated and analyzed two or more databases rather than a single database [40,42,43,45,47,48,49]. Potential drug safety indicators can thus be detected with a variety of methodologies and approaches.
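Survival analysis of exposed and non-exposed groups asks how quickly an ADR appears in each group over follow-up. A minimal Kaplan-Meier sketch in plain Python is shown below; the follow-up times and event flags are hypothetical. An actual study would compare the curves formally (for example with a log-rank test or a Cox model, as provided by libraries such as lifelines) and adjust for confounders.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of ADR-free survival.

    times:  follow-up time per patient (e.g. days since first prescription)
    events: 1 if the ADR occurred at that time, 0 if the patient was censored
    Returns [(time, survival)] at each time where at least one ADR occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        n_event = n_exit = 0
        # Group ties: everyone leaving the risk set at time t, by ADR or censoring.
        while i < len(data) and data[i][0] == t:
            n_event += data[i][1]
            n_exit += 1
            i += 1
        if n_event:
            # Patients censored at t still count as at risk for ADRs at t.
            survival *= 1 - n_event / at_risk
            curve.append((t, survival))
        at_risk -= n_exit
    return curve


# Hypothetical follow-up data for two small groups.
exposed = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
unexposed = kaplan_meier([4, 6, 7, 9, 10], [0, 1, 0, 0, 1])
print("exposed:  ", exposed)
print("unexposed:", unexposed)
```

The exposed group's ADR-free survival drops faster, which is the pattern a survival-based detection study would then test for statistical significance.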

Assessment for ADR
Drug safety indicators must be evaluated through clinical or scientific interpretation, and a systematic strategy is required to build an evaluation system and expand the reference standards. Reference sets commonly used in pharmacovigilance include the Side Effect Resource (SIDER) [50], the Observational Medical Outcomes Partnership (OMOP) [51], and EU-ADR [35]. These reference sets compile known drug-related information and provide positive and negative controls for evaluating drug safety indicators and ADRs. Because pharmacovigilance uses different terminology systems (e.g., the Medical Dictionary for Regulatory Activities (MedDRA) and the WHO Adverse Reactions Terminology (WHO-ART)), terms must be unified and mapped between systems to develop reference standards. Table 4 summarizes the related papers and their properties [52][53][54][55][56][57][58][59][60][61]. Some studies examined signals for specific conditions in more detail, such as pancreatitis [52] and liver injury [53,54,58], using EHRs [36] or observational data, which contain more patient information than other databases. In addition, studies that proposed reference standards for identifying drug-induced ADRs analyzed the literature using bibliographic databases [53,54,56,57]. In some studies, experts determined the signals and produced the results manually [53]. In particular, Oosterhuis et al. [60] developed a causality documentation tool (CausDOC) that combines algorithms and expert judgment, providing nine relevant structured questions for assessing the causality of ADRs.
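Term unification usually means applying a mapping file and routing whatever does not map to an expert. The sketch below illustrates that workflow under stated assumptions: `SOURCE_TO_TARGET` is a hypothetical excerpt whose entries are illustrative placeholders, not taken from an official WHO-ART to MedDRA mapping, and the only normalization performed is case folding and whitespace collapsing.

```python
def _norm(term):
    """Case-fold and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


# Hypothetical excerpt of a source-to-target terminology mapping (placeholders only).
SOURCE_TO_TARGET = {
    "rash": "Rash",
    "liver function abnormal": "Hepatic function abnormal",
    "hepatitis": "Hepatitis",
}


def unify_terms(terms, mapping):
    """Map source terms to target terms; collect unmapped ones for expert review."""
    lookup = {_norm(src): tgt for src, tgt in mapping.items()}
    mapped, unmapped = [], []
    for term in terms:
        target = lookup.get(_norm(term))
        if target is None:
            unmapped.append(term)   # left for manual mapping by a clinician or pharmacist
        else:
            mapped.append(target)
    return mapped, unmapped


mapped, unmapped = unify_terms(["Rash", "  Liver  function abnormal", "jaundice"],
                               SOURCE_TO_TARGET)
print(mapped, unmapped)
```

Returning the unmapped terms explicitly, rather than silently dropping them, reflects the point that expert review remains part of building a reference standard.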

ADR Monitoring for Patient Safety
Monitoring involves the continuous follow-up and safety management of a patient's condition and is arguably the ultimate purpose of pharmacovigilance. Table 5 lists the studies on monitoring [62][63][64][65][66][67][68][69][70][71] in chronological order. Early studies were conducted as large-scale projects [62,66]; for example, from 2009 to 2013, Pal et al. [66] carried out the Monitoring Medicines (MM) Project, which involved 11 consortium partners. Mobile applications (e.g., MedWatcher) have also been developed to improve spontaneous reporting by patients and the management of reports with patient information [68].
With the spread of social networking services (SNS), SNS data (e.g., Twitter and Facebook) have been analyzed as additional study data [69,70]. As shown in Table 5, recent studies use SRSs and social media data as their main data sources, with EHR data or ADE databases as supplemental data [68][69][70][71]. Beyond these trends, patient-generated data can be collected through health care services or wearable devices, which calls for more diverse monitoring methods and protocols.
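A common first pass over social media data is to flag posts that mention both a drug and a possible ADR. The sketch below does this with simple lexicon matching; the drug names and ADR terms are hypothetical placeholders. Studies of SNS data generally go further, using curated ADR lexicons, handling negation ("no headache") and slang, and often training NLP models, none of which this sketch attempts.

```python
import re

# Hypothetical lexicons for illustration only.
DRUGS = {"drug a", "drug b"}
ADR_TERMS = {"headache", "rash", "nausea", "dizzy"}


def _find(terms, text):
    """Return the lexicon terms that occur in text as whole words or phrases."""
    return sorted(t for t in terms if re.search(rf"\b{re.escape(t)}\b", text))


def flag_posts(posts):
    """Yield (post, drugs, adrs) for posts mentioning at least one drug and one ADR term."""
    for post in posts:
        text = post.lower()
        drugs, adrs = _find(DRUGS, text), _find(ADR_TERMS, text)
        if drugs and adrs:
            yield post, drugs, adrs


posts = [
    "Started Drug A last week and now I get a headache every morning",
    "Drug B works great for me",
    "Feeling dizzy and got a rash since switching to drug b",
]
for post, drugs, adrs in flag_posts(posts):
    print(drugs, adrs, "|", post)
```

Flagged posts would then be linked to SRS or EHR data for confirmation, in line with the supplemental role those sources play in the studies above.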

Discussion
Over the past 10 years, the number of pharmacovigilance studies has increased steadily, and these studies have led to appropriate drug use regulations and guidance. Our literature review found that the main concerns in pharmacovigilance are the difficulties of standardizing data and terminology and of establishing an ADR evaluation system. Databases and related websites provide mapping files or materials for terminology; however, even with expert guidance, preprocessing the data for practical analysis is time-consuming, and in many cases a clinician or pharmacist has to process the data manually. In this context, reference standards can serve as ground truth for evaluating ADR signals [53,65,70].
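Using a reference standard as ground truth amounts to scoring a method's flagged drug-event pairs against known positive and negative controls. A minimal sketch, assuming the reference set is available as two sets of (drug, event) pairs; all pairs below are hypothetical. Pairs outside the reference set cannot be scored and are ignored, which is one reason the coverage of reference standards matters.

```python
def evaluate_against_reference(flagged, positives, negatives):
    """Score detected signals against positive and negative controls.

    flagged:   set of (drug, event) pairs the method flagged as signals
    positives: reference pairs known to be true ADRs (positive controls)
    negatives: reference pairs believed not to be associated (negative controls)
    """
    tp = len(flagged & positives)
    fn = len(positives - flagged)
    fp = len(flagged & negatives)
    tn = len(negatives - flagged)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


# Hypothetical reference set and detection output.
positives = {("drug a", "rash"), ("drug a", "hepatitis"), ("drug b", "pancreatitis")}
negatives = {("drug a", "fracture"), ("drug b", "rash"),
             ("drug b", "cataract"), ("drug c", "rash")}
flagged = {("drug a", "rash"), ("drug b", "pancreatitis"),
           ("drug b", "rash"), ("drug c", "insomnia")}  # last pair is unscored

metrics = evaluate_against_reference(flagged, positives, negatives)
print(metrics)
```

Here two of three positive controls are recovered and one negative control is wrongly flagged, giving a sensitivity of 2/3 and a specificity of 0.75.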
Many studies have developed new detection algorithms and detected drug safety indicators, but their results rarely translate into actual drug safety management. In addition, studies using patient data are largely limited to single-institution data, underscoring the need for multicenter studies [72][73][74]. Thus, a platform that can comprehensively manage and systematically access all pharmacovigilance systems is required. By organizing collection, detection, assessment, and monitoring into a new standard protocol, such a platform can contribute to practical drug safety management.
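When each institution in a multicenter study reports its own ratio estimate, the results can be combined without sharing patient-level data. The sketch below uses inverse-variance fixed-effect pooling on the log scale, assuming each site reports a ratio (e.g. an ROR) with a 95% CI that is symmetric on the log scale; the site values are hypothetical. Where sites are clearly heterogeneous, a random-effects model (e.g. DerSimonian-Laird) would usually be preferred.

```python
import math


def pool_fixed_effect(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of site-level ratio estimates (e.g. RORs).

    estimates: list of (ratio, lower95, upper95) tuples, one per site.
    Each site's standard error of log(ratio) is recovered from its CI width.
    Returns (pooled ratio, lower95, upper95).
    """
    weighted_sum = total_weight = 0.0
    for ratio, lower, upper in estimates:
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1 / se ** 2            # more precise sites get more weight
        weighted_sum += weight * math.log(ratio)
        total_weight += weight
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return (math.exp(pooled),
            math.exp(pooled - z * pooled_se),
            math.exp(pooled + z * pooled_se))


# Hypothetical ROR estimates from three participating institutions.
sites = [(1.8, 1.1, 2.9), (2.4, 1.5, 3.8), (1.5, 0.8, 2.8)]
print(pool_fixed_effect(sites))
```

Pooling narrows the confidence interval relative to any single site, which is how multicenter results can strengthen or weaken a signal that one institution alone could not resolve.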
For collection, the platform should support converting and structuring data into a common data model (CDM) format. Clinical CDMs, as well as CDMs for medical images and medical devices, can be used in pharmacovigilance. As shown in Figure 3, data marts are also required for each study design. Through the pharmacovigilance platform, anonymized data and signal detection results can be shared with registered researchers or institutions. Multicenter research and meta-analysis of the results can then verify the validity of the signals. A monitoring app linked to the platform would allow patient conditions to be monitored and prescribed drugs to be managed to ensure patient safety.
In this study, we examined the roles and importance of medical informatics in pharmacovigilance. We summarized the characteristics and results of various pharmacovigilance studies and identified methodological trends. Overall, a detailed approach is required to build a system that can integrate and analyze big data containing rich information. In active pharmacovigilance, the application of data-driven approaches is expanding gradually; nevertheless, further research from the perspective of medical informatics is required.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest with respect to the research, authorship, and/or publication of this article.