1. Introduction
COVID-19 has greatly affected several areas of our lives, including the environment, profit, mental health, and public transportation. In 2020, the global economy contracted by 3%, a loss of around USD 9 trillion. Only 34% of enrolled students worldwide were able to benefit from physical or proper education, and various business sectors experienced a sharp decline in productivity and employment [1]. The paper titled “The Mental Health Consequences of COVID-19 and Physical Distancing” stated that COVID-19 has had short- and long-term consequences for people’s mental health and wellbeing. The authors also stated that, just as severe acute respiratory syndrome (SARS) was associated with post-traumatic stress disorder (PTSD), stress, and psychological distress among patients and medical personnel, COVID-19 appears to have triggered anxiety, depression, loneliness, violence, and substance abuse. Furthermore, because schools have been closed, there is a real possibility of child abuse [2].
COVID-19 has also caused an unprecedented decline in public transportation demand and revenue, driven by government orders and by personal choices as people refrained from traveling to minimize the risk of contracting the virus. Thus, there has been a dramatic impact on lifestyle and travel worldwide, ranging from the decline of air travel services to the increase in demand for online communication [3]. India is suffering from a huge increase in positive COVID-19 cases, and there is fear of mass destruction and casualties. Because the virus is contagious, infected persons are isolated from their families, and the fear of losing family members is a further consequence of COVID-19 [4].
Many areas of our lives have been affected by the COVID-19 pandemic. Fortunately, vaccines are now available to provide immunity against the virus. In the ASEAN region, the Philippines was the last country to receive its COVID-19 vaccine shipment [5]. A total of 600,000 doses of China’s Sinovac, which came from WHO’s COVAX initiative and a donation from the People’s Republic of China, arrived in the Philippines on 28 February 2021 [6]. A total of 525,600 doses of the Oxford–AstraZeneca vaccine arrived 12 days later [7], with 400,000 more doses of China’s Sinovac arriving 12 days after that [8].
On 1 March 2021, almost a year into the pandemic, the Philippines began its vaccination program, starting with health professionals [9]. The Philippine Department of Health devised its vaccine rollout plan titled “The Philippine National Deployment and Vaccination Plan for COVID-19 Vaccines” [10], also known as “ResBakuna”. The document sets out the priority list of those eligible to receive the vaccine, which is shown in Table 1.
Because the government has devised a vaccination rollout plan, Filipinos are encouraged to be vaccinated to acquire immunity against the virus. However, individuals can freely choose whether or not to be vaccinated. In this context, the goal of this study was to analyze the sentiment of Filipinos towards COVID-19 vaccines as expressed on the social networking site Twitter and to classify their tweets into positive, neutral, and negative polarities. The results of this study can help the Philippine government make informed decisions about fund allocation, vaccination provision, and the strategic scheduling of its vaccination rollout plans. The proposed method can be applied to English and Tagalog tweets to classify them according to their polarity, and it can be reused in similar studies. Previous studies successfully utilized Twitter data regarding local airlines and COVID-19 in the Philippines. However, because COVID-19 vaccination began only recently, there have been no studies on this issue. Accordingly, the researchers used all the tweets posted in the first month of the vaccination program. The main contributions of this study can be summarized as follows:
It automatically labels the polarity of both English and Filipino language tweets.
It reports the sentiments of Filipinos towards COVID-19 vaccines.
The government can use this study as a tool to make wise decisions regarding the vaccination program.
The proposed model can continuously analyze incoming tweets to monitor any updates or changes in the attitudes of Filipinos towards COVID-19 vaccines.
2. Related Literature
Researchers now commonly analyze social media posts to draw conclusions or make predictions. Social media is a great platform for expressing sentiments, views, and opinions [11]. Twitter is one of the most widely used social media platforms in the world, where users can post anything on their minds. Twitter has over 100 million active users [12], and the number of tweets posted every day can reach 500 million [13]. Twitter allows people to express themselves genuinely and in a timely manner. This differs from traditional face-to-face interviews, where the interviewee’s responses may be affected by the nervousness that live communication with the interviewer brings on [14]. When users are alone, they can express themselves more genuinely, which is why Twitter is a good platform for analyzing true public sentiment [14]. Because users can freely share their location, comments, opinions, and feelings, albeit within a 280-character limit, Twitter is suitable for studies that require opinion analysis. Moreover, because Twitter’s application programming interface (API) and database access are available to the public, data collection is easier [14].
Natural language processing (NLP) is used to retrieve information from a given text [11]. It is a process by which a computer extracts meaning from sentences written by humans. NLP can be used in text mining, language translation, and automated question answering [15], such as the chatbots that businesses use to handle simultaneous customer queries. Sentiment analysis, also known as opinion mining, is the computational study of opinions, sentiments, and emotions conveyed in given words or sentences [16]. In this study, the researchers preprocessed tweets using NLP techniques and classified the sentiment they expressed into positive, neutral, and negative polarities.
Different frameworks for sentiment analysis using Twitter data have been proposed. One of the methods is the attention-based bidirectional CNN–RNN deep model (ABCDM). This framework utilizes two independent bidirectional long short-term memory (LSTM) and gated recurrent unit (GRU) layers to extract past and future contexts by taking into consideration temporal information flow in both directions, which achieves state-of-the-art results for both short and long reviews [17].
Another method is the bidirectional emotional recurrent unit (BiERU) for conversational sentiment analysis, which extracts the sentiment of each message in a text conversation. This method proposes a fast, compressed framework based on emotional recurrent units with fewer parameters. The BiERU model is party-independent and thus suitable to be integrated in multiparty conversations without the need for adjustments [18].
Sentiment analysis is also useful in accurately detecting traffic accidents by utilizing social media posts and NLP techniques, as shown in the study by Ali et al. titled “Traffic Accident Detection and Condition Analysis Based on Social Networking Data”. The authors used ontology and latent Dirichlet allocation (OLDA) for topic labeling to extract traffic-related posts and discard other topics. They also trained a bidirectional long short-term memory (Bi-LSTM) network with softmax regression to classify texts according to their polarity, with all the necessary reports organized to be sent to the police station and emergency management office for immediate action [19].
In the medical field, sentiment analysis is also used to analyze social media posts about drug reviews or side effects, together with the patient’s medical records, to recommend personalized diabetes and blood pressure (BP) healthcare treatment. The study by Ali et al. titled “An Intelligent Healthcare Monitoring Framework Using Wearable Sensors and Social Networking Data” used Bi-LSTM with ontologies to classify diabetes, BP, mental health, and side effects of medicine, as well as Hadoop MapReduce with machine learning to reduce the size of data about patient treatments [20]. This healthcare monitoring framework promotes timely monitoring of the health condition of diabetes and BP patients before it worsens.
RapidMiner (RM) is a data science software platform that provides data preprocessing techniques, machine learning algorithms, and model-building operators [21]. Many NLP techniques are available in RM, such as case transformation, tokenization, stemming, and stop-word removal, which preprocess text so that meaningful relationships between words can be identified and the meaning of a sentence determined.
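To make the preprocessing steps named above concrete, the following is a minimal Python sketch of case transformation, tokenization, stop-word removal, and stemming. It is an illustration only and does not reproduce the RapidMiner operators used in this study: the stop-word list, the suffix list, and the function names `toy_stem` and `preprocess` are all hypothetical, and the stemmer is a deliberately crude stand-in for a real one.

```python
import re

# Hypothetical, deliberately tiny stop-word list mixing English and Tagalog.
STOP_WORDS = {"the", "a", "is", "to", "and", "of", "ang", "ng", "sa", "na"}

# Hypothetical suffixes for a toy English stemmer; real stemmers are far more careful.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(token: str) -> str:
    """Strip at most one listed suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Apply the four preprocessing steps in order and return the tokens."""
    lowered = text.lower()                               # case transformation
    tokens = re.findall(r"[a-z]+", lowered)              # tokenization on letter runs
    kept = [t for t in tokens if t not in STOP_WORDS]    # stop-word removal
    return [toy_stem(t) for t in kept]                   # stemming


tokens = preprocess("The vaccines ARE working, salamat sa bakuna!")
print(tokens)
```

A production pipeline would need language-aware stop-word lists and stemming for Tagalog, as well as handling for mentions, hashtags, and URLs, none of which this sketch attempts.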
Naïve Bayes is commonly used to solve classification problems. It has also been noted that Naïve Bayes determines the true polarity of a given sentence accurately, even on unbalanced datasets [22]. Moreover, this model is a high-bias, low-variance classifier that works well even on a small dataset [21]. The name Naïve Bayes has two parts. “Naïve” refers to the method’s assumption that the occurrence of a certain feature is independent of the occurrence of other features. Thus, each feature contributes individually to classification without dependence on other features. “Bayes” refers to Bayes’ theorem [23], which the classifier uses to calculate the probability of an event in a series of steps that are discussed in Section 3.4 of this paper [24].
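As an illustration of the independence assumption described above, the following is a minimal Python sketch of a Naïve Bayes text classifier. It scores each class c for a tweet with tokens w1, ..., wn as log P(c) plus the sum of log P(wi | c), so every token contributes separately. The multinomial variant and add-one (Laplace) smoothing are assumptions made for this sketch, not details confirmed for the RapidMiner model used in this study, and the class name, method names, and training tweets are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, documents, labels):
        # Count documents per class (for the prior) and tokens per class (for likelihoods).
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.token_counts[label].update(tokens)
        self.vocabulary = {t for counts in self.token_counts.values() for t in counts}
        return self

    def log_scores(self, tokens):
        # Unnormalized log posterior per class: log P(c) + sum_i log P(w_i | c).
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)
            total_tokens = sum(self.token_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # tokens never seen in training are ignored
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (total_tokens + vocab_size))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical, already-preprocessed training tweets (not data from the study).
train_docs = [
    ["vaccine", "safe", "thankful"],
    ["salamat", "bakuna", "safe"],
    ["vaccine", "scary", "side", "effects"],
    ["ayoko", "bakuna", "scary"],
    ["vaccine", "schedule", "announced"],
    ["bakuna", "schedule", "bukas"],
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["bakuna", "safe"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a token that never appeared with a class from forcing that class's probability to zero.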
A previous study used Twitter data from the Philippines to examine sentiment towards local airlines [15]. That study successfully determined the attitude of Filipinos towards the country’s local airlines by applying NLP techniques and comparing three classifier algorithms. Samonte et al. used Naïve Bayes, support vector machine, and random forest to develop a model, and the results showed that Naïve Bayes yielded the highest accuracy (66.67%) in determining the true polarity of tweets. Inspired by this study, Abisado et al. [22] conducted research on the Twitter sentiments of Filipinos during the COVID-19 pandemic using a multinomial Naïve Bayes classifier, which revealed that 52% of Filipinos had a positive attitude and 48% had a negative attitude towards the pandemic; the classifier model yielded 72.17% accuracy.
Several studies have attempted to determine the polarity of given texts through sentiment analysis, especially regarding the COVID-19 pandemic. This study focused on determining the stance of Filipinos regarding vaccination. A classifier model was developed using the Naïve Bayes classification algorithm to classify the sentiments expressed in tweets relating to COVID-19 vaccines into positive, neutral, and negative polarities. The classifier analyzed tweets written in both English and Tagalog, the two languages most commonly used by Filipinos to express their sentiments. The results and methodologies from previous studies [15,22] were used to study this critical issue. The findings of this study will help the Philippine government make wise decisions in allocating funds and devising vaccination rollout plans. A comparison of tweet classification results in the Philippines using RM is listed in Table 2, including the authors, classifier algorithm, and the results obtained.
The researchers drew on the study by Samonte et al. titled “Sentiment and Opinion Analysis on Twitter about Local Airlines”, which compared Naïve Bayes, support vector machine, and random forest in developing a classifier model using RM to recognize the true polarity of tweets and concluded that Naïve Bayes was the most accurate of the three (66.67%) [15]. The same study was cited by Abisado et al. in “Philippine Twitter Sentiments during COVID-19 Pandemic Using Multinomial Naïve Bayes”, whose classifier yielded an accuracy of 72.17%. Using RM and the Naïve Bayes classifier algorithm, the method proposed in this study obtained 81.77% accuracy, the highest of the three studies.