MULDASA: Multifactor Lexical Sentiment Analysis of Social-Media Content in Nonstandard Arabic Social Media

The semantic complexity of Arabic vocabulary and the shortage of available techniques and skills for capturing emotions from Arabic text hinder Arabic sentiment analysis (ASA). Evaluating Arabic dialects that do not follow a conventional linguistic framework, such as modern standard Arabic (MSA), complicates an already difficult task. Here, we define a novel lexical sentiment analysis approach for studying Arabic-language tweets (TTs) from specialized digital media platforms. Many elements, comprising emoji, intensifiers, negations, and other nonstandard expressions such as supplications, proverbs, and interjections, are incorporated into the MULDASA algorithm to enhance the precision of opinion classification. Root words in a multidialectal sentiment lexicon (LX) are associated with emotions found in the content under study via a simple stemming procedure. Furthermore, a feature–sentiment correlation procedure is incorporated into the proposed technique to exclude expressed viewpoints that are irrelevant to the area of concern. As part of our research into Saudi Arabian employability, we compiled a large sample of TTs in 6 different Arabic dialects. The results show that this sentiment categorization method is useful, and that using all of the factors listed above improves the classification of people's feelings: the classification accuracy of the proposed algorithm improved from 83.84% to 89.80%. Our approach also outperformed two existing research projects that employed a lexical approach for the sentiment analysis of Saudi dialects.


Introduction
Throughout Arab nations, social networking has become immensely popular in recent years as a means of publicly conveying one's views on a variety of topics. It is estimated that there are more than 11 million active Twitter accounts in the Arab region, and Saudi Arabia has the most active users, with around 2.6 million. As an increasing number of individuals turn to social media sites to convey personal views and seek advice, demand for social media analysis intensifies. Experts in sentiment analysis are particularly interested in assessing the community mood and spotting the latest trends in this area. Sentiment analysis of Arabic social networking sites is particularly difficult since it must confront Arabic's complicated semantics, a difficulty amplified by online posts written in nonstandardized Arabic dialects that may not follow conventional linguistic forms [1]. Using acronyms and idioms to work around Twitter's 280-character limit is a common tactic for those who want to quickly express their point [2,3]. The use of abbreviations on Twitter sometimes leads to ambiguity in the interpretation of TTs, and these TTs also frequently involve misspelled words and casual language rules. In this research, the sentiment analysis of particular topics in social-network data is used to understand how views are disseminated, helping social research analysts track the influence of social challenges on individuals' attitudes, emotions, and emerging trends.
The majority of sentiment analysis techniques are lexical (linguistic) or machine-learning (ML; statistical) in nature. ML algorithms, including support vector machines (SVM), naïve Bayes (NB), and decision trees (DT), are used to analyze retrieved textual features. These approaches are trained on text that has been prelabeled with sentiment polarity. Lexical sentiment analysis, in contrast, requires the creation of a robust sentiment LX for the sentiment classification domain, which includes a particular set of words having predefined polarities. Once these terms are identified in a text, the statistical or semantic scoring of these terms and their placement can be used to estimate the polarity of an entire paragraph. Tremendous human effort is required to manually gather opinion words to create a strong LX in lexical techniques. Techniques for studying modern standard Arabic (MSA) and dialectal Arabic (DA) appear in the literature. Studies on ML techniques use a variety of different ML methodologies [4][5][6][7]. Sentiment analysis of MSA using the lexical technique was investigated in existing work [8][9][10][11]. Dialect sentiment analysis, on the other hand, has received minimal research attention. It is difficult to conduct sentiment analysis on Saudi Twitter posts, since there is no gold-labeled dataset or vocabulary that incorporates all Saudi Arabian dialects, including Hejazi (western area), Najdi (central area), Shamali (northern area), Janubi (southern area), and Sharqawi (eastern region). This poses a significant barrier to the study of Saudi Arabia's TTs. Dialects such as Hejazi and Najdi differ considerably even within the same country. This shows how diverse Arabic dialects are and is a significant hurdle to NLP efforts in Saudi Arabia. For example, MSA's nafitha (window) is taqa in the Hejazi dialect, but it is shobak in the Najdi dialect [12][13][14][15].
The Arabic language is morphologically rich, with a significant amount of information about syntactic units and relations expressed at the word level. In Arabic, a single word may have numerous distinct surface forms. Furthermore, most Arabic names are derived from Arabic adjectives and are frequently mistaken for sentiment words. Arabic words with the same root might have contradictory emotional orientations due to the usage of diacritics and rich morphology. This presents a substantial issue when stemming is used to determine the polarity of feelings. The absence of capital letters in Arabic, which would normally be used to distinguish proper names, and the tendency to repeat letters in writing to indicate moods are also regarded as problems in analyzing the Arabic language. Using Twitter TTs from Saudi Arabia as a case study, we introduce a new approach for dialectal ASA. The suggested method creates a comprehensive multisentiment LX of Saudi dialects and uses a light stemming process to map sentiment words in the text to the matching base term in the multidialectal sentiment LX. Numerous factors, including emoji, intensifiers, negations, and other nonstandard expressions, are taken into account when classifying TTs according to their emotional content using the suggested technique. Additionally, a feature-sentiment correlation technique is used to weed out sentiments that do not bear on the discussed topic. In several experiments, the suggested technique was more accurate at sentiment analysis and addressed the shortcomings of prior research.
The remainder of the paper is organized as follows: we provide a literature review in Section 2 regarding the state of the art of the area under study, our method for multifactorial lexical sentiment analysis is discussed in Section 3, the findings of our sentiment analysis technique are examined in Section 4, and Section 5 provides conclusions and ideas for further study. The study's acronyms are listed in Table 1 to benefit the reader's interpretation.

Related Work
Arabic is particularly rich morphologically when it comes to the syntactic features and relationships between words. English has less morphological variation, and sentiment analysis can thus be successfully developed at the sentiment word level. However, because Arabic has several different morphological patterns for a single word, directly applying lexical features to sentiment analysis systems results in data sparsity; for example, the word "good" جيد is a root with several forms, such as جيده (feminine singular), جيدات (feminine plural), جيدتين (feminine dual), جيدان (masculine dual), and جيدين (masculine plural). To further complicate matters, the majority of Arabic given names and surnames originated from Arabic adjectives, which can be mistaken for sentiment words; for example, the Arabic name Saead سعيد can also be the adjective "happy".
Lastly, in Arabic, the root of a positive (POS) term usually has a positive meaning, whereas the root of a negative (NEG) term has a negative one. However, some words with opposite sentiment polarity can share the same three-letter root; for example, the words "discrimination" تمييز tamyiz (NEG) and "excellent" إمتياز iimtiaz (POS) have opposite emotional orientations but the same Arabic root, ميز miz [16]. There are a number of studies about ASA, and this section focuses primarily on ML and lexical techniques.

ML Approaches
In [17], four-level polarity was identified by mining comments on local e-newspapers. A total of 815 Arabic comments were divided into 620 comments for the training dataset and 195 comments for the test dataset, resulting in validity of 85%. Natural language processing tasks, including machine translation, text categorization, and sentiment analysis, require an enriched corpus to meet precision and quality standards, as the authors in [18] reported. In total, 1000 comments from The Voice Facebook account and 1000 from the Al Arabiya Facebook news page were incorporated into the corpus. The authors used Facebook to build the corpus in order to examine dialectal Arabic. Corpora labeled with three polarities (NEG, neutral, and POS) have also been employed in publications on sentiment analysis and cinematic purchase prediction. Part-of-speech taggers, tokenizers, vocalizers, and stemmers were used to build the corpus. Conventional labeling, interannotator agreement (IAA), and classifiers such as DT, SVM, NB, and k-nearest neighbors (KNN) were all used to determine the polarity of the text. However, the sentiment classification algorithms presented by the authors are not well suited to capturing negation, which has a large effect on sentiment. The preprocessing phase also did not eliminate irrelevant statements.
NB and SVM classifiers were used to analyze comments in Moroccan dialectal Arabic and MSA collected from Facebook. A step that classifies text as MSA or dialectal Arabic was added before sentiment analysis to boost reliability. Light stemming was the main focus of preprocessing, and numerous light-stemming methods were applied to MSA and the dialects. Two features, TF-IDF and N-grams, were used to classify the data. The SVM classifier achieved an accuracy of 81%, while the NB classifier achieved an accuracy of 78%. Nevertheless, this strategy is challenging for large samples because it must be determined during the light-stemming step whether each text was written in MSA or in dialect. The discriminative multinomial naïve Bayes (DMNB) classification model, supported by several text preprocessing approaches, including normalization, N-gram tokenization, and TF-IDF, was enhanced by [19]. A public Twitter corpus of 2000 Arabic TTs categorized as POS or NEG and a fivefold cross-validation procedure were used in that research. Compared with other corpus-based sentiment analysis methodologies, the study's results showed an improvement. Although the author suggests feature selection as a direction for future research, the report provides minimal information on the classification features. The Jordanian dialect was the topic of [20]. Three polarities were assigned to the dataset: POS, NEG, and neutral. In preprocessing, TTs were cleaned, normalized, and tokenized, and stop words were removed. NB and SVM were used in that study as two supervised classifiers. SVM achieved an accuracy rate of 82.1%. However, there is still room for improvement in light stemming and rooting for dialectal Arabic.

Lexical Approaches
In an effort to improve current lexical approaches for ASA, a novel lexical technique was suggested in [21]. The LX was constructed in four stages. In the first step, the authors selected 300 root words from the SentiStrength site; in the second step, they added synonyms to the LX. In the third step, a term-intensity weighting mechanism was applied to the LX to check whether any terms had been omitted even after the first two phases. The fourth step expanded the vocabulary by including terms from other Arabic dialects. Sentiment analysis was then conducted using this LX and the basic lexical technique, determining the text's polarity without taking negation or intensification into consideration. Precision was measured at 70.05% using multiple LX expansion stages, but the approach only covered MSA and did not handle any dialects. One of the three systems presented by [22] was built on top of an improved form of an older lexical technique that could accommodate contextual polarity shifters such as negation and intensification. Using these additional factors, the authors achieved an accuracy rate of 91.75%. The authors in [23] found that the LX-based approach was frequently used for unlabeled data: the data are labeled and their polarity is estimated using sentiment lexica. Sentiment terms and phrases from the LX can be used to gauge the tone of a piece of writing (such as a review). The authors in [24] performed LX-based sentiment classification for Arabic Twitter datasets on the Syrian civil conflict and related issues. Arabic TTs represented as a "bag of words" (BOW) were evaluated as NEG or POS by looking up their sentiment words in an Arabic sentiment dictionary. That work did not analyze dialectal Arabic or other factors that may influence sentiment analysis performance, including intensification and negation.
As the surveyed research shows, ASA makes effective use of both lexical and ML methodologies. We applied a text-based lexical strategy and evaluated the impact of numerous factors, such as intensity, light stemming, negation, and emoji, on classification accuracy.

Multifactor Lexical Dialectical Arabic Sentiment Analysis (MULDASA)
Lexical sentiment analysis uses two sentiment LXs (POS and NEG) to match sentiment words in TTs. Sentiment terms are counted in the text to determine the overall polarity of a TT. The most common way to label a TT is to follow a set of rules: if the POS terms in a TT outnumber the NEG terms, the TT is considered to be POS, and vice versa [25]. To improve the precision of the sentiment analysis procedure, we considered a range of additional factors: emoji, intensifiers, negations, and special expressions such as supplications, proverbs, and interjections. We call this approach multifactor.
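The counting rule above can be illustrated with a minimal Python sketch. The two toy lexicons use English stand-ins for the Arabic LX entries, and the function name `count_polarity` and the handling of ties are assumptions made for illustration; the source does not state how ties are resolved.

```python
# Hypothetical toy lexicons: English stand-ins for the Arabic POS/NEG LXs.
POS_LEXICON = {"good", "excellent", "happy"}
NEG_LEXICON = {"bad", "ruin", "tired"}


def count_polarity(tokens):
    """Label a tweet by comparing the number of POS and NEG lexicon hits."""
    pos_hits = sum(1 for t in tokens if t in POS_LEXICON)
    neg_hits = sum(1 for t in tokens if t in NEG_LEXICON)
    if pos_hits > neg_hits:
        return "POS"
    if neg_hits > pos_hits:
        return "NEG"
    # Assumption: ties (including no hits at all) are left undecided.
    return "UNDECIDED"


example_label = count_polarity("the service is good and excellent but bad".split())
```

This baseline ignores word order entirely, which is why negation and intensification need separate handling.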

Building Sentiment Analysis Corpus
We used the corpus that we created in our previous work on dialectal Arabic stemming [26] as the basis for building our gold-labeled dataset. Around 40,500 TTs were gathered from various hashtags and accounts. Before NLP techniques were applied, these TTs were lexically normalized. Following normalization, a gold-standard corpus was created by 7 human annotators, who manually annotated 7000 TTs and labeled the polarity of each TT with its corresponding sentiment (POS or NEG), as shown in Table 2. Twitter's API was used to gather TTs on the basis of two factors: first, the location of the user (Saudi Arabia), and second, the dialect (such as Hejazi and Najdi). An equal number of TTs were acquired using the user's location; this process was sometimes complicated because some users restricted access to their location. TTs were collected using hashtags relevant to the unemployment problem domain, such as السعوديه_للسعوديين (Saudi Arabia for Saudis) and تكامل سوق الاتصالات (telecoms market integration), and went through the following steps: data collection, preprocessing, normalization, light stemming, and annotation. Full details on the volume of TTs in our corpus are shown in Table 2.

Domain Analysis and Feature Extraction
Our lexical method is based on a thorough examination of knowledge about the problem domain. This analysis is crucial because it allows us to extract domain features and link them with the expressed sentiments. Domain knowledge encompasses information about a domain's environment, important concepts, synonyms, ground facts, the relationships between these objects, and external relationships that connect them to concepts from other domains [27]. Interactions with important stakeholders (e.g., people and authorities) and communication channels (Twitter posts) are included in our modeled knowledge of the labor problem domain, as shown in Figure 1. Figure 1 defines the domain features and the relationships between them that are used to link these features with the expressed sentiments.

Construction of Arabic Sentiment LX
This work develops a new domain-specific LX for dialectal Arabic. As previously stated, the Arabic language is composed of MSA and a variety of regional dialects that are often used in everyday conversation. Arabs from various regions and nations frequently write their TTs in their local dialects, and Saudi Arabia in particular features six distinct dialects. We addressed this issue by incorporating vocabulary from several Saudi dialects, including Hejazi and Najdi from the western and central regions, Shamali from the north, Janubi from the south, and Sharqawi from the east. The LX used in this study was compiled both manually and automatically by linguists and by native speakers of Arabic and of the Saudi dialects. Considerable effort was devoted to manually creating and expanding the vocabulary of the dialects: since colloquial Arabic lacks a consistent vocabulary, the participation of native speakers of diverse dialects is crucial to the development of the LX. Annotators were chosen from a demographic group that uses social media often (e.g., aged between 23 and 45). All annotators followed the same instructions and rules, including not permitting bias (such as religious, cultural, or societal beliefs) to affect their work. After the annotation procedure, Cohen's kappa coefficient [28] was used to evaluate annotation reliability. This is a statistical measure of inter-rater agreement on qualitative (categorical) items. Because it accounts for agreement occurring by chance, it is regarded as a more reliable indicator than a simple percentage of agreement. The weighted kappa was 0.816, suggesting reliable annotations. The accepted degree of agreement was estimated to be 91.74% [29]. To construct the vocabulary, thousands of sentiment expressions were collected from a variety of sources. As a starting point, 1130 MSA sentiment terms were taken from the work of Azmi and Alzanin [17].
For each MSA sentiment word, we created a list of MSA synonyms and dialectal equivalents. The Saudi dialects covered included Hejazi, Qassmi, Najdi, Janubi, and Shamali. Three annotators assigned each term to one of four polarity levels: highly POS (+1), POS (0.5), NEG (−0.5), or highly NEG (−1), as shown in Table 3.
The sentiment vocabulary was thereby expanded from 1130 words to a total of 16,500 words. In this paper, we propose multi-intensity sentiment LXs and different matching techniques to perform sentiment analysis, using the following sentiment scores (SC; intensity):
• Highly POS: SC = +1
• POS (P): SC = 0.5
• NEG: SC = −0.5
• Highly NEG: SC = −1
The corpus was manually evaluated by 7 annotators, who labeled the polarity of every TT with its sentiment (POS or NEG), which reflects the exact value of the sentiments of the TTs in the corpus. We adopted this two-way polarity after the annotators' comments showed that, with respect to controversial issues, most people express critical and strong opinions, so the annotators rarely assigned a neutral label to a TT. This conclusion is corroborated by the comprehensive survey of ASA by [30], where the authors referred to the task of distinguishing POS from NEG opinions as binary sentiment analysis (BSA).

Feature-Sentiment Association
If we use an association window (the words on either side of the target word) to look for sentiments associated with a concept identified as important in the domain during domain modeling, we can exclude opinions that are not related to the given topic. Conventional part-of-speech-based association techniques, including those proposed by [6] and [31], cannot be effectively deployed for feature-sentiment identification in dialectal Arabic since it lacks the grammatical rules of MSA. Table 4 illustrates how our feature-sentiment association method works. Using our dialectal Arabic light-stemming algorithm [26], word stemming was performed on the original TT (first row in Table 4) to find all POS/NEG sentiments (good, excellent, bad) and target domain features (second row). Then, using a two-word window around the sentiment (two words before and two words after it), neighboring (associated) domain features were identified (third row). A two-word window is sufficient for TTs because their sentences are short.
In the example in Table 4, the sentiments تعب (tired) and خرب (ruin) were taken into account since they were associated with the domain's semantic features. The sentiment جمال (beautiful) was considered nonrelevant and was excluded because it refers to جو (weather), which is not a domain feature.
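The two-word association window can be sketched in Python as follows. This is only an illustration under stated assumptions: the stems are English stand-ins for the output of the light stemmer, the domain-feature set is invented for the example, and the function name `associated_sentiments` is hypothetical.

```python
def associated_sentiments(stems, sentiment_terms, domain_features, window=2):
    """Keep only sentiment stems that have a domain feature within `window` words."""
    kept = []
    for i, stem in enumerate(stems):
        if stem not in sentiment_terms:
            continue
        lo = max(0, i - window)
        hi = min(len(stems), i + window + 1)
        # Words on either side of the sentiment, excluding the sentiment itself.
        neighbours = stems[lo:i] + stems[i + 1:hi]
        if any(n in domain_features for n in neighbours):
            kept.append((i, stem))
    return kept


# Assumed example: "tired" is next to the domain feature "job", whereas
# "beautiful" only describes "weather", which is not a domain feature.
stems = ["job", "tired", "me", "and", "weather", "beautiful"]
relevant = associated_sentiments(
    stems,
    sentiment_terms={"tired", "beautiful"},
    domain_features={"job", "salary"},
)
```

Only the sentiments returned by this filter would then contribute to the score of the TT.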

Computing Sentiment SC
The LX was searched for each term by using the term-matching approach. If a term matched an LX entry, the entry's SC was assigned to the term. The steps given in Algorithm 1 are required to arrive at a final SC.
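Algorithm 1 is not reproduced in this text, so the following Python sketch is only one plausible reading of the term-matching step: each stem found in the LX contributes its SC, and the sign of the sum gives the label. The lexicon entries are English stand-ins using the four intensity levels (±0.5, ±1), and the names `tweet_score` and `label` are hypothetical.

```python
# Hypothetical multi-intensity lexicon with English stand-ins for Arabic stems.
LEXICON = {"excellent": 1.0, "good": 0.5, "bad": -0.5, "ruin": -1.0}


def tweet_score(stems, lexicon=LEXICON):
    """Sum the SC of every stem that exactly matches an LX entry."""
    return sum(lexicon.get(stem, 0.0) for stem in stems)


def label(score):
    """Map a total score to a polarity label (ties are an assumption)."""
    if score > 0:
        return "POS"
    if score < 0:
        return "NEG"
    return "UNDECIDED"


total = tweet_score(["good", "excellent", "bad"])  # 0.5 + 1.0 - 0.5
```

Summing graded scores rather than counting hits lets a single highly NEG term outweigh a mildly POS one.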

Strategies to Enhance the Basic Sentiment Analysis Approach
To enhance the precision of our sentiment analysis process, we tested a variety of strategies including intensification, negation, special phrases, and emoji.

Negations
It is essential for sentiment analysis to identify negation words because they can affect the entire context and orientation of an opinion. The authors in [32] proposed an analysis of negation for ASA that takes into account two grammatical structures: أدوات النصب and أدوات الجزم. Two categories of negation particles were established on the basis of these structures, which identify five key negation particles in Arabic: lan "لن", maa "ما", lam "لم", laa "لا", and laysa "ليس". However, their proposed method applies simple grammatical rules that switch polarity only if negation particles follow the sentiment terms, and it failed in some cases to determine the effect of negation. An ML-based sentiment analyzer for Arabic Facebook news pages was presented in [33]. Its authors classified negations using several ML techniques, but only five MSA negations were studied, and dialectal negations were not analyzed. An extensive rule-based approach was needed in our investigation to deal with the complicated forms of Arabic, especially with regard to negated sentiment expressions. The negation words used throughout Saudi dialects were manually compiled, resulting in a list of 45 words, such as مش، مو، ماني، مارح، محد، معاد (msh, mw, mani, marih, mahadun, mueadin).
Negation can reverse the polarity of a sentiment, so handling it is important for sentiment analysis. Consider a statement such as "not happy": the sentiment word is POS, but the negated phrase has a NEG connotation. It is therefore critical to take negation into account when calculating the sentiment SC. To detect negation, a window of terms around the sentiment word in a TT must be examined.
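A minimal Python sketch of window-based negation handling is given below. It rests on several assumptions that the source does not specify: the window covers the two tokens before the sentiment word, a negator simply flips the sign of the score, and English stand-ins replace the 45 Saudi negation words. The function name `score_with_negation` is hypothetical.

```python
# English stand-ins for the manually compiled list of dialectal negation words.
NEGATORS = {"not", "no", "never"}


def score_with_negation(tokens, lexicon, window=2):
    """Sum lexicon scores, flipping the sign when a negator precedes the term."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue
        score = lexicon[tok]
        # Assumption: only the `window` tokens before the sentiment are checked.
        preceding = tokens[max(0, i - window):i]
        if any(p in NEGATORS for p in preceding):
            score = -score
        total += score
    return total


negated = score_with_negation(["i", "am", "not", "happy"], {"happy": 0.5})
```

A rule this simple does not handle exception constructions or free word order, which is consistent with the difficulties reported for negation in the experiments.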

Determining Sentiment Intensity
The majority of existing research treats the ASA problem as a binary, or 2-class (POS or NEG sentiment), classification problem. From this perspective, terms, statements, or records with varying intensities must be grouped into two categories, that is, classified as either POS or NEG sentiment [34]. However, this is not the case in real life, where the polarity spectrum of emotions extends from very NEG through NEG, neutral, and POS to very POS. In addition, experts believe that modeling intensity at the word level is critical for improving the performance of NLP solutions, particularly in question answering and contextual inference [35]. As a result, scholars proposed a multiplicative effect obtained by pairing an intensifier (a support word) such as "very" with a polarity adjective such as "good" or "bad". This can aid in determining distinct sentiment values for terms such as "very good", "good", "bad", and "very bad".
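The multiplicative effect of an intensifier can be sketched in Python as follows. The factor of 2.0, the two-word window on either side of the sentiment word, and the clipping of each term to the range [−1, 1] are all assumptions chosen so that "very good" maps from 0.5 to 1.0; the function name `intensified_score` is hypothetical.

```python
# Assumed multiplier: maps a 0.5 term to 1.0 when an intensifier is nearby.
INTENSIFIERS = {"very": 2.0}


def intensified_score(tokens, lexicon, window=2):
    """Sum lexicon scores, scaling a term when an intensifier is within the window."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue
        score = lexicon[tok]
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for neighbour in tokens[lo:i] + tokens[i + 1:hi]:
            if neighbour in INTENSIFIERS:
                score *= INTENSIFIERS[neighbour]
                break  # apply at most one intensifier per sentiment term
        # Keep each term inside the lexicon's own scale.
        total += max(-1.0, min(1.0, score))
    return total


toy_lexicon = {"good": 0.5, "bad": -0.5}
very_good = intensified_score(["very", "good"], toy_lexicon)
```

With this toy lexicon, "very good", "good", "bad", and "very bad" receive four distinct values instead of two.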
To our knowledge, no investigations of the effect of intensifiers on sentiment polarity have been conducted for Saudi dialects. Due to the absence of an intensification vocabulary in the literature, intensification terms for Saudi dialects were manually collected by native-speaker linguists for this analysis. Approximately 33 Saudi intensification words were gathered; Table 5 lists some examples. We used these intensification words to analyze sentiment intensity. We used a window over the words of a TT to extract the words preceding and following each sentiment word, since an intensifier, as demonstrated in Table 6, is not always adjacent to the sentiment word in the dialectal Arabic used on social sites. Algorithm 3 gives the TT-SC computation with intensification.
Emoji are tiny digital graphics that can be used to convey a variety of different kinds of information on social networking platforms [36]. Emoji have seen a massive boost in popularity in recent years, especially on Twitter, a microblogging platform. An emoji can express emotion more effectively than a word or phrase can because it does not depend on language or a specific context. As a consequence, the identification and classification of emoji are important for the development of effective sentiment analysis systems. Few studies [37] have taken the use of emoji in ASA into account. The authors in [38] found that sentiment analysis of microblogs, a challenging undertaking especially when dealing with dialects, could be improved by using novel nonverbal features rather than NLP techniques. In their analysis, they advocated the use of multiple ML methods and 969 emoji features. According to the findings of their experiments, the suggested emoji-based features worked well in accurately identifying sentiment polarity.
In this work, the lexical technique and nonverbal features are combined to perform ASA, so that emoji usage and its impact on sentiment analysis can be assessed. Emoji are used in both POS and NEG contexts. We considered emoji that convey four distinct sentiment classes: very POS (VP), POS (P), NEG (N), and very NEG (VN). The experiment used a collection of emoji compiled by [39] that includes 592 different symbols. Human annotators reviewed the emoji collection and manually assigned a polarity class and score to every emoji, using the scores VP = 1.0, P = 0.5, N = −0.5, and VN = −1. Every emoji was then given an SC on the basis of how close it was to the mean of the annotators' scores. With a kappa (K) value of 0.85, the annotators' final agreement was rated at 91.2%. Tables 7 and 8 provide a breakdown of the emoji that fall under the VP, P, N, and VN categories, and an illustration.
Supplications, interjections, and proverbs are among the numerous distinctive expressions in dialectal Arabic that convey emotions, and they play a significant part in sentiment classification. Interpreting these special expressions competently, precisely, and transparently was critical to enhancing the ASA system developed in this research. New methods were developed in the proposed work to handle supplications, interjections, and proverbs in dialectal Arabic in order to retrieve sentiment from TTs as correctly as possible.
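For the emoji scoring step described earlier in this section, one possible reading of "how close it was to the mean of annotator scores" is to average the annotators' scores and snap the mean to the nearest of the four class scores. The Python sketch below illustrates that reading only; the function name `emoji_class` and the tie-breaking behavior are assumptions.

```python
from statistics import mean

# Class scores as stated in the text: VP = 1.0, P = 0.5, N = -0.5, VN = -1.
CLASS_SCORES = {"VP": 1.0, "P": 0.5, "N": -0.5, "VN": -1.0}


def emoji_class(annotator_scores):
    """Return the (class, score) pair whose score is nearest the annotators' mean."""
    avg = mean(annotator_scores)
    # Assumption: on an exact tie, the first class in CLASS_SCORES order wins.
    return min(CLASS_SCORES.items(), key=lambda item: abs(item[1] - avg))


# Three hypothetical annotators rate the same emoji.
assigned = emoji_class([1.0, 1.0, 0.5])
```

The resulting score can then be added to the TT score in the same way as the score of a lexicon term.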

a. Supplications
Supplications are commonly used by Arabs, particularly Saudis, in everyday life, and their social-media posts reflect this behavior as well. Supplication can be used to express both POS and NEG feelings, according to linguistic scholars [40]. Although supplications are frequently employed on social networking sites to communicate both POS and NEG thoughts, we are aware of just a few studies that address them in the ASA context [41]. Over 32% of the TTs in our corpus involved either POS or NEG supplications, demonstrating the significance of supplications in determining sentiment. There are numerous types of supplication, including POS wishes (shown in Table 9) and NEG wishes (shown in Table 10), such as الله يبهدلهم.

We compiled a list of common supplications from a variety of sources, including the Qur'an and quotations. Supplications were detected in TTs if they included either of the terms الله or اللهم, as described in Table 11. Various supplications gathered from the ALkalem attayeb site [42], in addition to a few other supplications, were manually included.

Table 11. Sample of a usual collection of supplications.
POS Sentiment Supplication | NEG Sentiment Supplication
الله يوفقك (God help you) | حسبي الله ونعم الوكيل (God is my suffice and the best deputy)
بارك الله فيك (God bless you) | أعوذ بالله (I seek refuge in God)

b. Proverbs
Proverbs are brief summaries of popular experience and understanding. A proverbial statement is a traditional saying that is passed down through oral culture and is closely related to a proverb. Idiomatic statements are similar structures, and differentiating them from proverbial expressions can be challenging. The meaning of proverbial and idiomatic expressions often cannot be derived from the literal words of the sentence. Proverbs and proverbial statements are also classified as idioms by certain academics [43]. The examination of proverbs is considered in this work because people use them to communicate their emotions about a topic when posting about it. Proverbs were manually gathered, giving 200 proverbs in various Saudi dialects. Tables 12 and 13 illustrate instances of POS and NEG proverbs, and a TT containing a proverb. One example is من صبر نال ("person who is patient will be the winner").

This section discussed in detail our lexical multifactor sentiment analysis, which is based on a thorough assessment of problem-domain knowledge. We presented the construction of our Arabic sentiment LX and then described methods for calculating sentiment scores, together with strategies that strengthen the basic sentiment analysis: light Arabic stemming and morphological analysis, negation, sentiment intensity, emoji, and special linguistic expressions that affect sentiment.

Analysis of Experimental Findings
We describe and evaluate the outcomes of our experiments in this section. Experiments using lexical and multifactor sentiment analyses were conducted to assess the performance of the suggested algorithm, which included the consideration of emoticons, intensifiers, and negations. We adopted the usual text classification measures of precision (P), recall (R), accuracy (Acc), and F measure (F1) to compare the alternative techniques. The F measure, the harmonic mean of precision and recall, is used to assess overall system effectiveness [45].
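For reference, the four measures can be computed from confusion-matrix counts as in the Python sketch below. These are the standard definitions; the function name `classification_metrics` and the example counts are illustrative and are not taken from Table 15.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Illustrative counts only.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Because F1 depends only on precision and recall, a method can show high accuracy and a much lower F1 when one class is handled poorly.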
Classification performance and the average F-SC over the NEG and POS categories are shown in Table 15 and Figure 2, respectively. The experiments were run using a gold-labeled copy of our original corpus. We tested a variety of factors that could reflect sentiment polarity or strength, as previously noted. Table 15 shows that the strategy incorporating all factors (LX-based baseline + light stemming + polarity + negations + emoji + intensification words) achieved the best classification results, with an accuracy of 89.80% and an F-SC of 86.32%, improvements of roughly 5% and 9%, respectively, over the baseline. Table 15 further demonstrates that the LX-based baseline had a high classification accuracy of 84.34% and F-SC of 76.47% because of, first, the knowledge-based strategy that enabled the collection of domain-specific features, and second, the effective development of the Saudi dialect LXs (see Table 16).
Emoji, on the other hand, yielded lower accuracy than the baseline method, with an F-SC of 48.70% and an accuracy rate of 82.63%. The use of emoji to convey a sarcastic message may have affected the quality of the classification in some cases. Integrating the LX- and polarity-based classifications achieved a classification accuracy of 88.94% and an F-SC of 82.14%, as shown in Table 15. Combining the LX-based technique with special phrases resulted in an accuracy rate of 85.39% and an F-SC of 76.99%. With light stemming combined with the LX-based technique, classification accuracy and F-SC were 88.99% and 81.16%, respectively.
Negation is a more complicated task: specific rules are required to handle most negation phrases and to avoid misclassification caused by their inconsistent use. The complexity of the Arabic language and of negation in particular necessitated in-depth linguistic analysis and semantic synthesis. The negation results of the LX-based method were hindered by two problems. One was the use of special particles, including the exception particles of Arabic, which are common in TTs (e.g., لن يفلح الا المثابر, "nobody is successful unless they work extremely hard"); in this case, we needed to analyze the complete sentence and take the exception particle into account to determine the correct sentiment. The other problem was the free word order of Arabic sentences, which caused the negation and the sentiment term to be mismatched. According to the data, the accuracy of negation with the LX-based approach was 79.53%, and the F-score was 57.70%. Consequently, all factors except negation and emoji improved the accuracy rate. The highest classification accuracy was obtained by combining all the factors.
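The interaction between negation and exception particles can be illustrated with a small sketch. It is not the MULDASA rule set: the particle lists, the toy LX, the three-token window, and the rule that an exception particle anywhere in the sentence cancels the polarity flip are all simplifying assumptions made for illustration.

```python
# Illustrative subsets only; the actual particle lists and LX are assumptions here.
NEGATION_PARTICLES = {"لن", "لا", "لم", "ليس"}
EXCEPTION_PARTICLES = {"الا", "إلا"}
TOY_LEXICON = {"يفلح": 1, "ينجح": 1, "يفشل": -1}

def score_with_negation(tokens, lexicon=TOY_LEXICON, window=3):
    """Sum LX polarities, flipping a term when a negation particle precedes it.

    Assumption: if the sentence contains an exception particle, the negation
    is treated as restricted by the exception and the flip is not applied.
    """
    has_exception = any(t in EXCEPTION_PARTICLES for t in tokens)
    total = 0
    for i, tok in enumerate(tokens):
        polarity = lexicon.get(tok, 0)
        if polarity == 0:
            continue  # not a sentiment term in the toy LX
        preceding = tokens[max(0, i - window):i]
        negated = any(t in NEGATION_PARTICLES for t in preceding)
        if negated and not has_exception:
            polarity = -polarity
        total += polarity
    return total

# The example sentence from the text, split on whitespace.
print(score_with_negation("لن يفلح الا المثابر".split()))
# The same negated verb without the exception particle.
print(score_with_negation("لن يفلح".split()))
```

Because the sketch only looks at particles that precede the sentiment term, it does not address the second problem noted above: with free word order, the negation particle and the term it modifies may not be adjacent or in the expected order.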

Evaluation against Similar Work on Dialectal ASA
Our LX-based method relies primarily on incorporating problem-domain knowledge into the sentiment analysis process. It is therefore useful to assess the applicability of our approach to other problem domains as well. We compared our technique with two LX-based techniques for Saudi dialects that have been presented in the Journal of Information Science.
The first dataset with which we conducted our experiments was that of [46]. The authors selected hashtags covering a variety of social issues in Saudi Arabia, including #qyadt_26_aktubar ("on 26 October, women will be driving") and #almhtsbwn_lldywan_mjdda ("sheikhs returned to the ruler to debate the issue of women driving"). The dataset contains 1103 annotated Arabic TTs and is not accessible to the general public; however, the authors made the data available to us for this study. Systematic sentiment analysis was used to classify the TTs into one of two categories on the basis of their polarity (POS or NEG). The researchers translated SentiWordNet into Arabic and used it to extract sentiment terms for their Arabic sentiment LX, which they then supplemented with a collection of their own terms. Their sentiment LX contains 1500 words (1000 NEG and 500 POS). The degree to which an entire TT is POS or NEG is gauged from the number of POS words it contains. They used regular expressions to build a negation phrase analyzer, which takes unannotated TTs and returns annotated TTs. A few of the returned TTs, however, still lacked a sentiment annotation, because their semantic method is limited to terms that express sentiment: the classifier cannot correctly categorize a TT whose sentiment terms are missing from the LX. They obtained their best results when light stemming and negation were considered. That study reported an accuracy of 67.60%, an F-score of 78.24%, a precision of 91.74%, and a recall of 67.43%.
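The limitation of a purely count-based LX classifier can be sketched as follows. This is not the implementation of [46]: the four-word toy LX, the tokenizing regular expression, and the decision rule that compares POS and NEG hit counts are assumptions chosen only to show why a TT without any LX term remains unannotated.

```python
import re

# Toy LX entries, invented for illustration.
POSITIVE = {"ممتاز", "جميل"}
NEGATIVE = {"سيء", "فاشل"}

# \w+ matches runs of Unicode word characters, including Arabic letters.
TOKEN_RE = re.compile(r"\w+")

def classify_by_counts(text):
    """Return "POS" or "NEG" from LX hit counts, or None when undecidable."""
    tokens = TOKEN_RE.findall(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "POS"
    if neg > pos:
        return "NEG"
    # No LX hit at all, or a tie: the TT stays unannotated.
    return None

print(classify_by_counts("هذا العمل ممتاز"))
print(classify_by_counts("ذهبت الى السوق"))
```

The second call returns `None` because none of its tokens appears in the toy LX, which mirrors the failure case described above for TTs whose sentiment terms are missing from the LX.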
The second dataset was collected by the authors of [22], whose findings were published in Procedia Computer Science. In total, 14,806 TTs were manually annotated by selected annotators. The AraSenti-Tweet corpus is available online and is partitioned into training and testing datasets. To generate the sentiment LX "AraSenti-Trans", they used the MADAMIRA tool and retrieved 131,342 words from the TT datasets. Negation was handled by compiling a list of all negation particles found in the TTs and checking whether a TT contained one of them. It was not taken into account while drawing conclusions. That study reported a precision of 78.38% and a recall of 78.15%.
Our LX-based strategy outperformed the other two, with a 10% improvement in precision over the Adayel corpus and a 2% improvement in F-score over the Al-Twairesh technique. Our lexical analysis draws on many factors, including intensifiers, negations, proverbs, and interjections, as well as a full multi-intensity LX for the Saudi dialect, and the favorable results of our technique may be due to these features. This comparative evaluation suggests that our technique can be applied to other problem domains.

Conclusions and Future Work
Numerous factors contribute to the effectiveness of sentiment analysis, and this research proposed an approach for dialectal Arabic that takes them into account. The presented study extended a sentiment LX with comprehensive coverage of multidialectal sentiment terms for the target problem domain of unemployment. We employed a feature-sentiment association technique to filter out opinions that were not relevant to the problem domain, and an effective light-stemming approach to match the expressed sentiments with the corresponding root word in the multidialectal LX.
To evaluate the accuracy of our multi-intensity lexical sentiment analysis technique, we performed experiments. The findings showed that, in conjunction with light stemming, jointly considering emoji, intensification, negation, supplication, and interjection improved the proposed algorithm's sentiment classification of TTs. The algorithm's classification accuracy and F-score improved from 83.84% to 89.80% and from 73.47% to 84.70%, respectively. We compared our approach with two existing studies, and the results showed that MULDASA outperformed both.
Our future work involves investigating the integration of our lexical algorithm with ML techniques in a hybrid approach that could further improve overall classification accuracy. This is particularly useful for problem domains where it is difficult to identify beforehand a comprehensive set of domain key ideas (features) that could be associated with the sentiment LX. An example of such a domain is "hate speech", where the feature set and the expressed sentiments cover a wider pool of terminology and change dynamically. We also plan to explore the effectiveness of the proposed approach on neutral text.