Analyzing User Digital Emotions from a Holy versus Non-Pilgrimage City in Saudi Arabia on Twitter Platform

Abstract: Humans are the product of what society and their environment condition them into being. People living in metropolitan cities have a very fast-paced life and are constantly exposed to different situations. A social media platform enables individuals to express their emotions and sentiments and thus acts as a reservoir for the digital emotion footprints of its users. This study proposes that the user data available on Twitter has the potential to showcase the contrasting emotions of people residing in a pilgrimage city versus those residing in other, non-pilgrimage areas. We collected the Arabic geolocated tweets of users living in Mecca (holy city) and Riyadh (non-pilgrimage city). The user emotions were classified on the basis of Plutchik's eight basic emotion categories: Fear, Anger, Sadness, Joy, Surprise, Disgust, Trust, and Anticipation. A new bilingual dictionary, AEELex (Arabic English Emotion Lexicon), was designed to determine emotions derived from user tweets. AEELex has been validated against commonly known and popular lexicons. An emotion analysis revealed that people living in Mecca expressed more positivity than those residing in Riyadh. Anticipation was the most expressed emotion in both places; however, a larger proportion of users living in Mecca fell under this category. The proposed analysis is an initial attempt toward studying the emotional and behavioral differences between users living in different cities of Saudi Arabia. This study has several other important applications. First, the emotion-based study could contribute to the development of a machine learning-based model for predicting depression in netizens. Second, behavioral patterns mined from the text could benefit efforts to identify the regional location of a particular user.


Introduction
Emotions are a vital source of information to assess the overall psychological wellbeing and health of individuals. Emotion plays an important role in the overall behavior and reactions of an individual in different circumstances. Findings have shown that an individual's response to natural surroundings has an impact on their parasympathetic nervous system. The surroundings also have a positive impact on a person's emotional state and psychological activity level [1]. Thus, people living in varying regions will have a different emotional state. The traditional methods for studying these individual states involved questionnaires, interviews, or in-person observations. However, such studies to date generally have been marked by small sample sizes and involved labor-intensive efforts in reaching out to individuals, leading to inconsistency and inaccuracy in the inferences obtained. In the present-day world, social media sites such as Twitter, LinkedIn, and Facebook are increasingly becoming platforms for individuals to express their emotions, opinions, and ideas. These days people are more expressive on such platforms, as they can fully express themselves without revealing their real identities or endure the stigma of being judged. Thus, digital footprints allow social media platforms to serve as a rich source of information to investigate the emotional dimensions, sentiments, and psychological behavior of individuals. These data, accumulated over time, have led to granular emotion studies for every user. For example in one of the studies, tweets were used to analyze anti-LGBT campaigns in Indonesia [2]. In another work, Twitter messages were used to analyze dialogue during the Austrian presidential elections in 2016, concluding that the winning candidate sent tweets that were neutral in comparison to his opponent, whose messages were based on emotions [3].
Saudi Arabia is a country that is transforming itself into a social media powerhouse. Of its total population of 34.54 million, 25 million are active social media users, and of these, 20.03 million are Twitter users [4]. With a majority of the country's population being avid Twitter users, we can use the data they share to understand how user environments affect their overall mental wellbeing and behavior. To gauge the emotional atmospheres of residents, we studied the tweets shared by users in Saudi Arabia hailing from different backgrounds. In particular, we studied the tweets of users from a holy city or place of pilgrimage i.e., Mecca, and a non-pilgrimage metropolitan city i.e., Riyadh. This work sought to show the potential of social media data shared by users in uncovering the impact of users' surroundings on their mental health. Another goal was to show how their emotions vary across different social environments.
At present, most of the related work on identifying emotions on social media has focused on assessing user sentiments. Approaches such as supervised deep learning, semisupervised methods, natural language processing, and lexicon-based analysis have been used to detect the sentiment polarity of social media data [5][6][7]. Due to the increased interest in analyzing social media conversations, a new field of emotion analysis has emerged [8]. In the current literature, studies have been carried out to understand and detect different forms of expression such as sarcasm [9], rumors [10], depression [11][12][13], and suicidal tendencies [14]. However, the existing literature in this field has key limitations. Given the growing emphasis on analyzing user emotions, there is a need to focus on the basic emotion categories, which act as a foundation for effectively understanding emotional behaviors. Deriving the emotions of users rather than their sentiment polarities (positive and negative) will enable us to investigate the digital emotion footprint of users. Therefore, to gain deeper insight into specific emotion dimensions and categories, we focused on the basic emotion categories proposed by Plutchik [15]: namely, fear, trust, surprise, anger, sadness, joy, disgust, and anticipation. We designed a bilingual (English and Arabic) emotion lexicon named AEELex based on the Moodbook [16] and the NRC Word-Emotion Lexicon (Emolex) [17]. We extended these dictionaries to the Arabic language, along with new words from user tweets, to form our bilingual dictionary. The main contributions of our work are as follows:


- The geolocated tweets from two cities in Saudi Arabia have been extracted using the Twitter API (the dataset will be made available with the algorithms for researchers working in this domain).
- AEELex, a bilingual emotion lexicon, has been designed to extract the different emotions of users in two different (pilgrimage and non-pilgrimage) cities.
- The potential of user tweets to harness user emotions has been studied.
- The different emotion-expressing behaviors of netizens across two different cities have been analyzed and compared.
- The most common emotion-based vocabulary terms (both Arabic and English) belonging to the eight emotion categories from both user groups have been identified.
The rest of this paper is organized as follows. Section 2 presents related work. Section 3 describes the approach we adopted for our study. Sections 4-6 describe the data collection, lexicon design, and lexicon comparison and validation respectively. Section 7 presents a data analysis. Section 8 is a discussion on the inferences from the study. Section 9 contains the conclusions of the paper, outlining potential research directions and the contributions of this study.

Related Work
Social media data have been used in many applications across different domains for various purposes, including detecting humor in social media slang used on platforms such as Weibo, the largest Chinese social network [18], or analyzing the spatial patterns of emotions expressed by users visiting theme parks such as Disney World within the dimensions of high and low arousal or pleasure and displeasure [19]. Several studies have been performed to identify user satisfaction based on positive emotions such as happiness and pleasure [20]. User emotions also vary by region, and this variation is reflected in their social media accounts. One such study showed how the emotions and sentiments of users toward climate change vary between countries such as the U.K. and Spain. It showed that Twitter users from the U.K. talked less negatively about the weather than Spanish citizens and had anticipation as their predominant emotion [21]. In [16], users' emotions were analyzed in a region in conflict, Kashmir, and a region marked by the absence of conflict, Delhi, two union territories in India. The authors showed that people residing in conflict-impacted regions have a negative attitude and express negative emotions such as anger, fear, and sadness in their Facebook posts. The present study likewise analyzes users' emotions across two different cities in Saudi Arabia. Furthermore, to bridge the gap between different nationals and tourists residing in a given region, and for more accurate analytics, we used data in the native language of the country, i.e., Arabic.
Most of the work performed with tweets from Saudi Arabia has focused on sentiment and semantic analysis [22,23]. Words were labeled based on their polarity. In [24] SANA, a multi-dialect, multi-genre, and multilingual lexicon, was created based on text from the Egyptian chat room on Yahoo Maktoob. The words here were labeled based on their positive, negative, and neutral content. In [25], a study was performed on social media data collected from YouTube, Facebook, Al-Saha Al-Siyasia, and Al Arabiya based on the 2009 and 2011 floods in Jeddah. The study showed that people on YouTube posted more emotional comments than those on Facebook. The users expressed emotions such as sadness for the loss of lives and anger toward authorities. Saudi Mood [25] is a real-time tool for visualizing the emotions of users on Twitter, covering happy, sad, angry, scared, and surprised emotions only. The authors in [26] manually created a dataset of Arabic emotions using tweets, but the tweets were filtered from Egypt geolocation only. The emotions in this case were placed into the categories of happiness, joy, sadness, anger, disgust, fear, and surprise. Additionally, the data were categorized on the basis of only 5879 tweets. It has been observed that there are very few studies based on the Plutchik [15] emotion categories in the Arabic language. Most of the work in this discipline has been conducted using either sentiment or emotion analysis. However, the studies on emotion analysis have been limited either by their size or the emotion categories they assessed. This study is a maiden attempt to analyze the difference in emotions of users residing in a pilgrimage area versus those in a non-holy city. An analysis of this kind can assist in designing a machine learning-based system to predict the mental state of a user and the location of a tweet/post.

Approach Synopsis
Our analysis involved four important steps, as shown in Figure 1. In particular, we performed an analysis of the emotional behavior of people belonging to two different regions. The emotional behavior in this context could be defined as the choice of emotion words used by individuals to express their feelings and opinions.

Data Collection
Data were collected through the filtered real-time streaming API [27] provided by Twitter. User tweets were collected based on their location and language, excluding retweets. The data were harvested from selected Twitter profiles over a span of 1.5 months, with an average of 20 tweets collected per user. The collected data did not take into account how active the users were. The tweet language was Arabic. In particular, we focused on Saudi dialects including Najdi Arabic, Gulf Arabic, etc. The users' data were collected from two metro cities of the Kingdom of Saudi Arabia, i.e., Riyadh and Mecca. The regions were selected using the bounding box option, which allows geo-tagged user tweets to be collected based on a user-defined bounding box area. The geolocation coordinates of Riyadh (45.04, 24.01, 47.91, 26.33) and Mecca (38.66, 18.1, 43.67, 23.97) were obtained using the Klokantech bounding box tool [28]. The choice of these cities helped us study the contrasting emotions of people belonging to a place of pilgrimage and a city within the same country that is not a place of pilgrimage. Table 1 shows the statistics of the data collected.
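The bounding-box selection described above can be illustrated with a minimal sketch. It assumes the four numbers of each box are ordered as (west longitude, south latitude, east longitude, north latitude), which is consistent with the coordinates reported for the two cities; the names `BoundingBox`, `CITY_BOXES`, and `assign_city` are hypothetical and are not taken from the authors' code.

```python
from typing import NamedTuple, Optional


class BoundingBox(NamedTuple):
    west: float   # minimum longitude
    south: float  # minimum latitude
    east: float   # maximum longitude
    north: float  # maximum latitude

    def contains(self, lon: float, lat: float) -> bool:
        """Return True if the point lies inside the box (edges included)."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


# Coordinates as reported in the text; the field order is an assumption.
CITY_BOXES = {
    "Riyadh": BoundingBox(45.04, 24.01, 47.91, 26.33),
    "Mecca": BoundingBox(38.66, 18.1, 43.67, 23.97),
}


def assign_city(lon: float, lat: float) -> Optional[str]:
    """Map a tweet's geotag to one of the two study regions, or None."""
    for city, box in CITY_BOXES.items():
        if box.contains(lon, lat):
            return city
    return None
```

A geotag that falls in neither box is returned as `None`, so such tweets can simply be dropped before analysis.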

Arabic English Emotion Lexicon (AEELex) Design
AEELex is a bilingual emotion lexicon for extracting user emotions. It covers both the Arabic and English languages. This section discusses the steps for data preparation and the design of the proposed lexicon.

Data Preparation
At this stage, the data were prepared to suit the needs of the experiments that would be carried out for performing analysis. We collected more than 85K tweets. Without being preprocessed, these tweets were highly redundant and contained a large amount of repetitive information, such as multiple tweets by the same users, etc. To address this issue, the geolocated tweets from Saudi Arabia were collected and cleaned using the following steps.
1. Exclude redundant tweets: During this step, all of the redundant tweets were removed from the data.
2. Remove diacritics: The script of the Arabic language has many diacritical marks, which modify the pronunciation of the language [29]; they are listed in Table 2 [30]. These diacritics are mostly used in classical Arabic texts such as the Quran and Hadith, in quotes from them, in poetry and children's literature, or when a word has several different meanings depending on its diacritics. However, adult native Arabic speakers can easily distinguish a word's meaning from its context. Generally, these diacritics are considered noise that needs to be removed. Therefore, in this work, all the diacritics mentioned in Table 2 were removed.
3. Remove repeating characters: At this step, we checked our data, and if a character was repeated several times in succession within a word, the repeated characters were removed. For example, in the word "مبرووووووووك" the character "و" is repeated several times; we removed the repetitions and rewrote the word as "مبروك", meaning congratulations.
4. Remove punctuation marks: All punctuation marks were removed.
5. Remove non-Arabic words: Because the focus of this study was on Arabic tweets, only Arabic words were kept.
6. Keep only unique user IDs while preserving their tweets: Within the data, many of the users had posted more than one tweet. Therefore, only unique users were retained, and their respective tweets were merged and preserved for analysis.
The schema for the final data that were used for the experiments is shown in Table 3. Table 4 shows statistics on the data after preprocessing. Furthermore, to make our data ready for analysis, the Natural Language Toolkit (NLTK) was used to tokenize the tweets and remove Arabic stop words. Additionally, neutral tweets were omitted, and only tweets showing positive or negative polarity were used.
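The cleaning steps above can be sketched with the Python standard library. This is only an illustration under stated assumptions, not the authors' implementation: the diacritic set is taken to be the Unicode range U+064B to U+0652 plus the tatweel (the actual list is the one in Table 2), only runs of three or more identical characters are collapsed so that legitimately doubled letters (as in "الله") survive, and the function names are hypothetical. Stop-word removal with NLTK is left out.

```python
import re

# Assumed diacritic set: tashkeel marks U+064B..U+0652 plus tatweel U+0640.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Runs of 3+ identical characters (threshold is an assumption of this sketch).
REPEATS = re.compile(r"(.)\1{2,}")
# Maximal runs of Arabic letters; everything else acts as a separator.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def clean_tweet(text: str) -> list[str]:
    """Strip diacritics, keep Arabic words only, collapse elongated letters."""
    text = DIACRITICS.sub("", text)
    tokens = ARABIC_WORD.findall(text)  # drops punctuation and non-Arabic words
    return [REPEATS.sub(r"\1", tok) for tok in tokens]


def merge_by_user(rows):
    """Drop duplicate (user_id, text) pairs and merge each user's tokens."""
    seen = set()
    merged = {}
    for user_id, text in rows:
        if (user_id, text) in seen:
            continue  # redundant tweet
        seen.add((user_id, text))
        merged.setdefault(user_id, []).extend(clean_tweet(text))
    return merged
```

For example, `clean_tweet("مبرووووووووك!!! hello")` keeps only the Arabic word and collapses the elongated letter.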

Lexicon Design
The lexicon-based analysis is one of the main approaches used in sentiment analysis [31]. It includes identifying the sentiments of users based on the semantic orientation of their words [32]. This method relies on a dictionary-based approach comprising a list of words according to their semantic orientation. There are different methods of creating such dictionaries, including automatic [33] and manual [34].
For the current study, a bilingual emotion dictionary comprising emotions expressed in the English and Arabic languages was created. This dictionary is named "AEELex" and is based on the eight emotions proposed by Plutchik [15]: fear, anger, sadness, joy, surprise, disgust, trust, and anticipation. Apart from the words belonging to the eight mood categories, additional words were manually added under the categories of positive and negative emotions. We developed our dictionary based on the Moodbook and the NRC Word-Emotion Lexicon (Emolex). Both Emolex and Moodbook also tag words according to Plutchik's eight emotions along with the two categories of semantic polarity (positive and negative). However, neither dictionary categorized the moods according to our context. Moreover, these dictionaries are based on words in the English language only. Therefore, we extended these dictionaries to the Arabic language, along with new words, to form our bilingual dictionary, "AEELex". This dictionary was developed using more than 19,000 tweets written in the Arabic language. Table 5 shows an example of a few mood words in the AEELex dictionary.
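A lexicon-based lookup of this kind can be sketched as follows. The dictionary below is a toy stand-in for AEELex, populated only with a handful of word and category pairs that appear as examples elsewhere in this paper; the real lexicon's format is not described here, so the data structure and the name `emotion_counts` are assumptions.

```python
from collections import Counter

# Toy excerpt: word -> emotion categories (illustrative, not the real AEELex).
TOY_LEXICON = {
    "عبادة": ("anticipation",),
    "حب": ("joy",),
    "ورد": ("joy",),
    "تعبان": ("sadness",),
    "الم": ("sadness",),
    "جريمه": ("anger",),
    "جوع": ("anger",),
    "عيب": ("disgust",),
    "تافه": ("disgust",),
    "خوف": ("fear",),
}


def emotion_counts(tokens, lexicon=TOY_LEXICON):
    """Count emotion-category hits for a list of preprocessed tokens."""
    counts = Counter()
    for tok in tokens:
        for emotion in lexicon.get(tok, ()):  # unknown words contribute nothing
            counts[emotion] += 1
    return counts
```

Mapping each word to a tuple of categories leaves room for words that carry more than one emotion, which Emolex-style lexicons allow.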

Lexicon Comparison and Validation
We compared our bilingual lexicon with two recently proposed emotion lexicons: an Arabic language lexicon proposed by M. Saad [35] and an English language lexicon, Moodbook. Table 6 highlights the characteristics of these two lexicons along with the proposed AEELex. M. Saad's Arabic language lexicon comprises six categories of emotions: namely, Fear, Anger, Sadness, Joy, Surprise, and Disgust. AEELex has two additional categories, i.e., Trust and Anticipation. Furthermore, AEELex has a comparatively larger number of emotion words than the former. Therefore, the proposed lexicon provides a more fine-grained study of emotions and can cover many tweets in which the existing lexicons failed to find any emotions. We performed a comparative analysis by identifying the emotion categories of 1000 Arabic tweets collected from two cities in Saudi Arabia, applying both AEELex (Arabic version) and the lexicon proposed by M. Saad to them. The results showed that both lexicons were able to identify all of their emotion categories. However, M. Saad's lexicon was not able to mine emotions from ~60 percent of the tweets, whereas AEELex failed on only 17 percent of them. M. Saad's lexicon combined several emotions into a single category, which led to a loss of nuance for emotions such as Joy, Trust, and Anticipation, all of which were merged into one category (Joy). For example, the emotion word "عبادة" (worship) was placed under the joy category by M. Saad, whereas AEELex classified the same word under the anticipation category.
Furthermore, we compared AEELex with an English language lexicon, Moodbook, by analyzing 1000 tweets in the English language collected from Saudi Arabia. Both lexicons have eight emotion categories and were able to identify all the categories in the user tweets. However, it was observed that AEELex outperformed Moodbook by being able to identify a greater number of emotions in the tweets.
Thus, we can conclude that AEELex is efficient as a bilingual dictionary in identifying more emotions from text in both English and Arabic.
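The coverage comparison reported in this section (the share of tweets from which a lexicon cannot mine any emotion) can be expressed as a small function. This is a sketch under the assumption that a tweet counts as "not mined" when none of its tokens occurs in the lexicon; `unmatched_share` and the toy lexicons in the usage lines are hypothetical.

```python
def unmatched_share(tweets, lexicon):
    """Fraction of tokenized tweets with no token found in the lexicon."""
    if not tweets:
        return 0.0
    missed = sum(
        1 for tokens in tweets if not any(tok in lexicon for tok in tokens)
    )
    return missed / len(tweets)


# Toy usage: a smaller lexicon leaves more tweets without any emotion word.
sample = [["خوف"], ["حب"], ["جوع"], ["hello"]]
small_lexicon = {"خوف"}
large_lexicon = {"خوف", "حب", "جوع"}
small_miss = unmatched_share(sample, small_lexicon)
large_miss = unmatched_share(sample, large_lexicon)
```

Applied to the same 1000-tweet sample, a function like this would yield the kind of figures quoted above for the two Arabic lexicons.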

Data Analysis
We carried out a study to identify differences in the emotional state of Twitter users residing in Riyadh and Mecca. We aggregated all the tweets of users and analyzed them using AEELex. The analysis was performed in two phases. The first phase showed that tweets had the potential to harness their users' emotions. In the second phase, we highlighted differences in the emotions shared by people residing in a holy city (Mecca) versus a metro city (Riyadh) in Saudi Arabia.

Phase 1: Sentiment Polarity Analysis (Positivity and Negativity)
The objective of this phase was to determine whether the user tweets revealed the peace and positivity experienced by residents living in a holy place. Therefore, we compared the amount of positivity and negativity in user tweets. AEELex contains a list of positive and negative emotion words. Joy, Trust, and Anticipation contain words belonging to the positive category, whereas Fear, Sadness, Anger, and Disgust comprise words falling under the negative category. Words belonging to the Surprise category were classified based on the context. For each region, we calculated the positivity and negativity among the users with Equations (1) and (2), respectively:

$$Pos_r = \frac{W_{pos}(r)}{W_{total}(r)} \qquad (1)$$

$$Neg_r = \frac{W_{neg}(r)}{W_{total}(r)} \qquad (2)$$

where $Pos_r$ and $Neg_r$ are the positivity and negativity of a region $r$, respectively. $W_{pos}(r)$ and $W_{neg}(r)$ are the number of positive and negative emotion words of a region, respectively, under the eight emotion categories, and $W_{total}(r)$ is the total number of emotion words extracted from the tweets of the users belonging to that region.
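The positivity and negativity of a region, as defined above, can be computed from per-category emotion word counts. The sketch below assumes that every emotion word is either positive or negative, so the total is the sum of the two; Surprise words, which the text says were classified by context, are passed in as already-split counts. The function and parameter names are hypothetical.

```python
POSITIVE = {"joy", "trust", "anticipation"}
NEGATIVE = {"fear", "sadness", "anger", "disgust"}


def polarity_shares(counts, surprise_pos=0, surprise_neg=0):
    """Return (positivity, negativity) for one region.

    counts: mapping of emotion category -> number of emotion words.
    surprise_pos / surprise_neg: Surprise words judged positive or
    negative from context (how that judgment is made is not modeled here).
    """
    pos = sum(counts.get(c, 0) for c in POSITIVE) + surprise_pos
    neg = sum(counts.get(c, 0) for c in NEGATIVE) + surprise_neg
    total = pos + neg
    if total == 0:
        return 0.0, 0.0  # no emotion words found for this region
    return pos / total, neg / total
```

With toy counts such as `{"joy": 3, "anticipation": 5, "trust": 1, "sadness": 1}`, nine of the ten emotion words are positive.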
Furthermore, the plots in Figure 2 show the proportion of positive and negative feelings expressed by people residing in the cities of Mecca and Riyadh. The plots show that users from both regions expressed positive emotions much more than negative ones. However, our experiments showed that users from Mecca expressed a higher degree of positivity (~90%) than users residing in Riyadh (~70%). This may be attributed to the fact that people residing in holy places are regularly exposed to religious remembrance and rituals that inculcate a shared feeling of inner harmony due to spiritual enlightenment and inner satisfaction. Conversely, people residing in a metropolitan city such as Riyadh live a fast, competitive, stressful, and busy life. The positivity of Mecca residents could be seen from their tweets about God and friendship, such as "ما شاء الله عليك بس الله يحميك" (As God wills, but God will protect you), "بالتوفيق يا صديقي" (Good luck my friend), "الله ييسر امرك" (May Allah make it easy for you), etc. The tweets from Riyadh usually talked about competitive affairs of day-to-day life, such as "اليوم الجو عليل" (The weather today is fine) and "ايی آره مال منم" (What kind of money is this?). Hence, from this study, we could perceive that user tweets have the potential to show the variable emotions of residents of different areas and consequently the emotional wellbeing of their users. This work thus proffers the analysis of user tweets as a method of perceiving the mental state of a person.

Phase 2: Plutchik's Emotion-Based Analysis
This phase investigated how the emotions of users changed across the two regions. The varying emotions of the users were observed across different mood categories based on Plutchik's emotion wheel, as shown in Figure 3. Figure 4 illustrates how the proportion of user moods differed across the two regions. For the most expressed emotion, Anticipation, the users from Mecca were ahead (50%) of those from Riyadh (around 40%). The users in Mecca tended to use words such as "الله" meaning "God suffices" in their tweets: for example, "استقبال الضيوف الشيخ عبدالله الغنيم يرحمه الله" (Reception of guest Sheikh Abdullah Al-Ghunaim, may God have mercy on him). As shown in the plot, users from both regions experienced Joy, but a higher proportion of the users in Mecca, approximately 20%, expressed this emotion. In this city, the users use words such as "ورد" (flower), "حب" (love), etc. in their tweets: for example, "انا بس تخيلت انه ورد ازرق روقت" (I just imagined that it was blue roses), expressing positivity and happiness, "انا اقدر الصداقات العميقه جميله مليانه حب" (I appreciate deep, beautiful friendships full of love), etc.
For emotions falling in the negative categories, Sadness (حزن) was the one most expressed. Users from both regions used words such as "تعبان" (tired), "الم" (pain), etc. to express sad emotions; for example, tweets such as "تعبان شكلها فره التيك توك قبل النوم" (I am tired, it seems that because I spent a long time on Tik Tok before bed), " " (Don't always bear the pain, pass by, be the bad person for once) have been seen on user profiles. Anger (غضب) was the second most expressed emotion in this category, articulated with words such as "جريمه" (crime) and "جوع" (hungry) in tweets such as "السرقة جريمة" (stealing is a crime) and "جوع آخر الليل ماندري وش نسوي فيه" (Late night hunger, what do we do?), followed by Disgust (اشمئزاز), expressed with words such as "عيب" (shameful), "تافه" (petty), etc., in tweets such as "محتشمه لاترسموها زي كذا عيب عيب" (Shame, shameful, don't draw it like this) and "تافهة كان يمكنني فعل اكثر ما تترقب" (Petty, I could have done more than you expect). The proportion of users falling into these negative categories was also much higher in Riyadh than in Mecca. Persistently negative feelings inculcated in a population can lead to psychological issues such as bipolar disorders, depression, etc. It has been reported that psychological ailments such as drug use, depression, and anxiety-based disorders are among the leading causes of disability in Saudi Arabia [30]. Furthermore, according to a study [36], drugs, petrol, lighter fluids, and tobacco were considered to be easily accessible to young generations in the city of Riyadh. This could be one of the reasons why the people of Riyadh show more negative and depressive emotions in their tweets than people in Mecca.

Discussions
In this paper, we made use of social media platforms such as Twitter to study the emotions of people residing in a holy city versus those residing in a typical metropolitan center in the same country. Twitter was used as a choice of social media in this study, as this work was focused on the extraction of sentiments (emotions) from text. The Twitter network appeared to be one of the main sources of text through which users express their feeling in just 280 characters. Other social networking platforms such as Facebook hold other user content (audios, videos, images in almost equal ratios) as well. Although the inclusion of all the different types of media content was beyond the scope of this study, it could be used in our future work. To study differences in user emotions between the two cities, we performed various experiments and conducted a comparative study based on users in these cities. To perform this comparative study, we considered tweets posted by users in Mecca (pilgrimage region) and Riyadh (typical metropolitan city).
Our findings indicate that people residing in Mecca and Riyadh have significant differences in their emotional states. The entire study was divided into two phases. In the first phase, we concluded that tweets posted by users have the potential to reveal their expressed emotions and thus their mental states. People from holy areas expressed much more positivity than citizens from a typical city. The hectic and stressful life of the city affects people negatively, leading to comparatively more psychological disorders and negative feelings such as anger, disgust, and sadness expressed in words such as "عيب" (shameful), "الم" (pain), or "تافه" (petty).
Conversely, people residing in religious places such as Mecca, one of the holiest cities in the world for Muslims, are much more at ease and at peace with their life. They tend to turn to God (Allah) when in need and as a result shut out their negative emotions. Hence, they are much happier and have a positive outlook on life. They tend to use spiritual words such as "الله" (Allah), "الحمد" (alhamdulillah, meaning thanks to Allah), and "يارب" (ya Rabb, meaning O Lord), revealing their emotional connection with God.
In the second phase, we performed a fine-grained analysis of user emotions and studied the differences in emotions shared by people residing in the two cities. We observed that words in the positive categories such as Joy and Anticipation were used by a higher proportion of people in Mecca (~90%) compared to those in Riyadh (~70%). In order to perform a more fine-grained analysis, we created word clouds of the words extracted from the tweets of people residing in Mecca and Riyadh, as shown in Figures 5 and 6, respectively. For a better understanding of user emotions, the top five words in each of the emotion categories expressed by users in the two cities are presented in Table 7. It can be seen that some of the words used are common in both cities, such as "خوف" (meaning fear) and "خلاص" (salvation/finish), and rank at the topmost position in their corresponding categories, i.e., fear and anger, respectively.

Data Availability Statement: The link to the data repository related to this project can be made available upon request. If required, it will be made publicly available on the GitHub profile (https://bit.ly/3jku4En, accessed on 2 July 23 2021) under the title of the paper.

Conflicts of Interest:
The authors declare no conflict of interest.