Adverse drug reactions (ADRs) are among the leading causes of morbidity and mortality in public health. Research indicates that deaths and hospitalizations due to ADRs number in the millions (up to 5% of hospitalizations, 28% of emergency treatments, and 5% of deaths), and the associated costs are approximately 75 billion dollars annually [1
]. Post-marketing drug safety monitoring is therefore essential for pharmacovigilance. Regulatory agencies (e.g., the Food and Drug Administration (FDA) in the United States) establish and support spontaneous reporting systems (SRSs) to monitor drug safety on an ongoing basis. Patients and healthcare providers can report suspected ADRs through these surveillance systems. However, biased and underreported events limit the effectiveness of these systems, which are estimated to capture only approximately 10% of ADRs [4
Social media, especially health-related social networks (e.g., DailyStrength (http://www.dailystrength.org) and AskaPatient (https://www.askapatient.com/)), enable both patients and nursing staff to share and obtain comments regarding drug safety. Patient drug reviews on social media are a potential and timely source for ADR identification [5
]. User reviews contain sentiment information (i.e., positive, negative, or neutral expressions) that provides important features for ADR identification [7
], and sentiment features can marginally improve ADR detection in health-related forum reviews [8
In this study, based on the intuition that patient reviews describing ADRs express negative sentiment, we aim to recognize ADRs through sentiment classification, which is commonly used for ADR identification in social media reviews [9
]. Current sentiment classification methods are typically divided into three categories: (1) lexicon-based methods, (2) traditional machine learning methods, and (3) deep learning methods. Lexicon-based methods use string matching to map detected terms to predefined adverse drug event lexicons [10
]. However, lexicon matching cannot easily distinguish whether a drug-related event is an ADR or an indication for a medication. In addition, the characteristics of social media language (e.g., informal and vernacular wording, abbreviations, symbols, misspellings, and irregular grammar) further limit the precision of lexicon matching in ADR identification.
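The two weaknesses of lexicon matching described above can be illustrated with a minimal sketch. The lexicon, the example reviews, and the function name `lexicon_match` are hypothetical and chosen only for illustration; real adverse drug event lexicons are far larger.

```python
import re

# Hypothetical toy lexicon (an assumption, not a real ADR resource).
ADR_LEXICON = {"headache", "nausea", "insomnia", "dizziness"}

def lexicon_match(review, lexicon):
    """Return the lexicon terms that occur as whole words in the review."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [tok for tok in tokens if tok in lexicon]

adr_review = "This drug gave me a terrible headache and nausea."
indication_review = "I take this drug for my headache and it works great."
misspelled_review = "Got a bad headake after the second dose."

print(lexicon_match(adr_review, ADR_LEXICON))         # ['headache', 'nausea']
print(lexicon_match(indication_review, ADR_LEXICON))  # ['headache']
print(lexicon_match(misspelled_review, ADR_LEXICON))  # []
```

The second review produces the same kind of match as the first even though "headache" is the indication rather than an ADR, and the misspelled review produces no match at all.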
Traditional machine learning classifiers (e.g., conditional random fields (CRFs)) [12
] combine knowledge bases with sentiment-related text features. However, the fixed-width window mechanism of CRFs considers only the target word and its neighbouring words within the scope of the input; therefore, important information associated with more distant words may be excluded.
Deep learning models (e.g., convolutional neural networks (CNNs)) [14
] may overcome this limitation of CRFs. Hierarchical CNNs specialize in extracting position-invariant features. Given the nature of social media user reviews, a sentence may express a positive sentiment overall even though it contains phrases with a negative sentiment (e.g., “don’t” and “miss”). Thus, the long short-term memory (LSTM) network (a class of recurrent neural networks (RNNs)) [17
] with a sequential architecture can be used to correctly process long sentences. The LSTM's memory mechanism, which is well suited to tagging tasks, uses a hidden state to remember previous labeling decisions when labeling the current token. However, LSTM does not perform well at key-phrase recognition in the sentiment classification of social media text [19
Furthermore, a deep learning model is an end-to-end model that allows the computer to learn sentiment features automatically, thereby reducing the complexity and incompleteness of manual feature extraction. However, a successful deep learning model depends on large-scale labeled data, and obtaining massive labeled training data manually is time-consuming and expensive. The lack of large-scale labeled data has become a bottleneck for deep learning in ADR identification-related research [20
To reduce the limitations of deep learning, researchers mine information from user-generated data (e.g., sentiment ratings, tweets, reviews, and emoticons), which is helpful in training sentiment classifiers. However, the labeling behaviour, in which users assign predefined labels to each review, is arbitrary and follows no uniform standard. Such labeled data are noisy (e.g., a high score attached to a negative review) and are called weakly labeled data [21
]. Noise in weakly labeled data lowers the accuracy of a classification model trained on it [22
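To make the notion of weakly labeled data concrete, the sketch below derives sentiment labels from user ratings and counts how many disagree with a manual label. The 1 to 5 rating scale, the cut-offs in `rating_to_weak_label`, and the toy reviews are assumptions made for illustration only.

```python
def rating_to_weak_label(rating):
    """Map a 1-5 user rating to a sentiment label (hypothetical cut-offs)."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"

# Toy reviews: (text, user rating, label a human annotator might assign).
reviews = [
    ("Severe nausea and dizziness, had to stop.", 1, "negative"),
    ("Works fine, no problems so far.", 5, "positive"),
    ("Constant headaches since I started it.", 5, "negative"),  # noisy rating
]

weak_labels = [rating_to_weak_label(rating) for _, rating, _ in reviews]
noisy = [text for text, rating, gold in reviews
         if rating_to_weak_label(rating) != gold]
noise_rate = len(noisy) / len(reviews)

print(weak_labels)  # ['negative', 'positive', 'positive']
print(noisy)        # ['Constant headaches since I started it.']
```

The third review shows the kind of noise described above: a high score attached to a review whose text is negative.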
In this work, we propose a deep learning framework for the sentiment classification of drug reviews. The framework utilizes a weakly supervised mechanism (WSM) that uses weakly labeled data to pre-train the model parameters and then uses manually labeled data to fine-tune them. First, we leverage a large quantity of weakly labeled data to pre-train a deep neural network so that it captures the sentiment distribution of the drug reviews. Second, we use a small quantity of labeled data to fine-tune the network and learn the target prediction function. In contrast, previous training methods based on weakly labeled data usually learn the target prediction function directly, so noise in the data can degrade that function. CNNs are better at classifying sentences with simple syntactic structure, whereas LSTMs can capture long-distance dependencies in reviews and are better at "understanding" the semantics of a sentence as a whole. Through the training framework of "weakly supervised pre-training + supervised fine-tuning", the influence of noise on model training is reduced, and a large amount of useful information in the weakly labeled data is better "remembered" by the deep model. The time efficiency of CNN, LSTM, and CNN_LSTM does not differ much on our small datasets. Our method performs well in ADR recognition.
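The two-stage schedule of "weakly supervised pre-training + supervised fine-tuning" can be sketched as follows. This is not the proposed network: a bag-of-words logistic regression written in plain Python stands in for the deep model, and the toy data, learning rates, and epoch counts are assumptions. Only the order of the two training stages mirrors the framework.

```python
import math

def tokenize(text):
    return text.lower().split()

def predict_proba(weights, text):
    """Probability that a review is negative under a bag-of-words logistic model."""
    score = sum(weights.get(tok, 0.0) for tok in tokenize(text))
    return 1.0 / (1.0 + math.exp(-score))

def train(weights, data, lr, epochs):
    """Plain SGD on the logistic loss; updates `weights` in place."""
    for _ in range(epochs):
        for text, label in data:
            error = label - predict_proba(weights, text)
            for tok in tokenize(text):
                weights[tok] = weights.get(tok, 0.0) + lr * error
    return weights

# Stage 1 data: weakly labeled reviews (1 = negative); the last label is noisy.
weak_data = [
    ("terrible nausea and headache", 1),
    ("awful dizziness could not sleep", 1),
    ("works great no problems", 0),
    ("great relief feeling better", 0),
    ("awful headache every day", 0),  # negative text, but a positive rating
]
# Stage 2 data: a few manually labeled reviews.
clean_data = [
    ("awful headache and nausea", 1),
    ("feeling great", 0),
]

weights = {}
train(weights, weak_data, lr=0.5, epochs=20)    # weakly supervised pre-training
pretrained = dict(weights)                      # snapshot of the initialization
train(weights, clean_data, lr=0.1, epochs=20)   # supervised fine-tuning

probe = "awful headache and nausea"
print(predict_proba(pretrained, probe), predict_proba(weights, probe))
```

Fine-tuning starts from the pre-trained weights rather than from zero, so information learned from the weak labels is kept for words the clean data never mention, while the clean labels correct the weights that the noisy example pushed in the wrong direction.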
We propose a model that applies the WSM and combines the strengths of the CNN and bi-directional long short-term memory (Bi-LSTM) [23
] (named WSM-CNN-LSTM) to complete the sentiment classification task of ADR reviews. The WSM-CNN-LSTM model includes two parts. First, the CNN employs a convolutional layer to learn and extract salient features at different scales within the drug reviews. Then, the Bi-LSTM captures past and future information through its forward and backward networks, respectively, and uses the sentence sequence information to compose features sequentially and output the regression results.
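The data flow of the two parts can be sketched in plain Python under strong simplifications. A plain tanh recurrence stands in for the LSTM cell, the weights are random rather than trained, and all sizes, window widths, and names (`conv1d`, `rnn_pass`, `cnn_birnn_score`) are hypothetical. The sketch shows only how convolution features at several scales feed a forward and a backward pass whose final states produce one regression score.

```python
import math
import random

random.seed(0)
EMBED, FILTERS, HIDDEN = 4, 3, 2
WIDTHS = (2, 3)  # convolution window sizes (an assumption)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def conv1d(seq, kernel, width):
    """Slide a window of `width` tokens over `seq`; one ReLU feature vector per position."""
    padded = seq + [[0.0] * EMBED] * (width - 1)  # right-pad to keep the length
    out = []
    for i in range(len(seq)):
        window = [v for tok in padded[i:i + width] for v in tok]
        out.append([max(0.0, sum(w * v for w, v in zip(row, window)))
                    for row in kernel])
    return out

def rnn_pass(seq, w_in, w_rec):
    """Plain tanh recurrence (stand-in for an LSTM cell); returns the final state."""
    h = [0.0] * HIDDEN
    for x in seq:
        h = [math.tanh(sum(a * b for a, b in zip(w_in[j], x)) +
                       sum(a * b for a, b in zip(w_rec[j], h)))
             for j in range(HIDDEN)]
    return h

def cnn_birnn_score(embedded, params):
    # 1) Convolution part: concatenate features from every window width per position.
    per_width = [conv1d(embedded, params["kernels"][w], w) for w in WIDTHS]
    feats = [sum((pw[i] for pw in per_width), []) for i in range(len(embedded))]
    # 2) Bidirectional part: read the features left-to-right and right-to-left.
    fwd = rnn_pass(feats, params["fwd_in"], params["fwd_rec"])
    bwd = rnn_pass(feats[::-1], params["bwd_in"], params["bwd_rec"])
    # 3) Regression output from the concatenated final states.
    return sum(w * h for w, h in zip(params["out"], fwd + bwd))

params = {
    "kernels": {w: rand_matrix(FILTERS, w * EMBED) for w in WIDTHS},
    "fwd_in": rand_matrix(HIDDEN, FILTERS * len(WIDTHS)),
    "fwd_rec": rand_matrix(HIDDEN, HIDDEN),
    "bwd_in": rand_matrix(HIDDEN, FILTERS * len(WIDTHS)),
    "bwd_rec": rand_matrix(HIDDEN, HIDDEN),
    "out": [random.uniform(-0.5, 0.5) for _ in range(2 * HIDDEN)],
}
review = rand_matrix(6, EMBED)  # six tokens with random stand-in embeddings
score = cnn_birnn_score(review, params)
print(score)
```

In a real implementation the recurrence would be an LSTM with gates, the embeddings would come from the review text, and all parameters would be learned by the two-stage training rather than drawn at random.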
To effectively train the WSM-CNN-LSTM model, we collect a weakly labeled dataset of 61,263 drug review comments from the AskaPatient.com forum to pre-train a deep neural network. Additionally, a manually labeled dataset containing 11,083 comments is used to fine-tune the network to learn the target prediction function. Extensive experiments are designed and implemented to validate the effectiveness of the WSM-CNN-LSTM model.
In this work, our contributions are as follows:
We propose a novel method that uses a WSM for the sentiment analysis of ADR reviews to avoid the need for a large amount of manually labeled data. The WSM greatly reduces the influence of noise in the weakly labeled data on the model. To our knowledge, this is the first such work on health forums, particularly in the field of drug review sentiment analysis.
We propose a novel architecture named WSM-CNN-LSTM to complete the task of ADR identification. Experiments with this model show that a stand-alone CNN performs poorly on the long texts typical of most drug reviews, while adding forward and backward recurrent networks dramatically improves classification performance.
We validate through experiments that the WSM-CNN-LSTM model achieves superior performance in ADR identification. In these experiments, a large amount of weakly labeled data is used to pre-train a deep neural network, and a small quantity of labeled data is used to fine-tune the network and learn the target prediction function. Our proposed training method avoids training the target prediction function directly on weakly labeled data, which partly reduces the influence of noise on the prediction function.
This paper is organized as follows. The weakly supervised multi-channel CNN-LSTM model proposed in this paper is introduced in Section 2. In Section 3, the experimental process and results are discussed. Finally, Section 4 concludes the paper and presents directions for future work.