Semantic Partitioning and Machine Learning in Sentiment Analysis

Abstract: This paper investigates sentiment analysis of Arabic tweets written in the Jordanian dialect. A new dataset was collected during the coronavirus disease (COVID-19) pandemic. We present two models, the Traditional Arabic Language (TAL) model and the Semantic Partitioning Arabic Language (SPAL) model, to predict the polarity of the collected tweets by invoking several well-known classifiers. Numerous Arabic features, such as lexical features, writing style features, grammatical features, and emotional features, were extracted and used to analyze and classify the collected tweets semantically. In the SPAL model, the original dataset was partitioned by exploiting the hidden semantic relationships between tweets before invoking the various classifiers. The experimentation reveals that the overall performance of the SPAL model surpasses that of the TAL model, owing to the idea of semantically partitioning the collected dataset.


Introduction
Social networks, nowadays, are just like beating hearts: people cannot live without them. Social networks affect various fields, such as health, marketing, politics, business, and management. Scientists and researchers mine for hidden knowledge amongst the vast amount of content posted via Twitter, Instagram, or Facebook to facilitate decision making [1]. Twitter, with 330 million users [2], served as a fertile data source for this study. Twitter allows users to share their opinions in short messages, with a maximum of 280 characters [3].
Sentiment Analysis (SA) is a vital technique used to gain insight into human opinions, emotions, and attitudes regarding particular topics in specific written languages [4][5][6]. SA is one of the most actively researched fields in Natural Language Processing (NLP) [7], and it is involved in data mining and text mining studies [8]. Further details about SA applications and challenges can be found in [9,10]. The influence of social media has increased throughout the years, directly raising the importance of this field [11]. SA helps provide insight into whether society is positively or negatively impacted by an international or national event [12].
SA for English text has been extensively studied and investigated using public datasets [13,14]. On the other hand, SA for other languages, such as Arabic, has received very little attention [4,15]. The Arabic language is the sixth official language of the United Nations [16]. Twenty-seven countries use Arabic as a primary language, and approximately 422 million people worldwide speak it [4]. However, the Arabic language is still at a beginning stage in the NLP field due to insufficient resources and tools [11,17]. This presents a vast challenge for researchers in the field, given Arabic's complex structure, long history, different cultures, and dialects [17][18][19]. In general, the Arabic language is categorized into three different types: (1) Modern Standard Arabic (MSA); (2) Classical Arabic (CA); and (3) Dialectal (colloquial) Arabic, such as the Jordanian dialect studied here.

• Research question 1: What are the performances of the various machine-learning classifiers on the collected tweets, utilizing the various extracted features?
• Research question 2: What is the impact of applying semantic partitioning to the collected data prior to invoking machine-learning classifiers?
• Research question 3: What are the reactions of Jordanians toward government precautionary measures during the COVID-19 pandemic?
We proposed and implemented two models to address these questions: the Traditional Arabic Language (TAL) model and the Semantic Partitioning Arabic Language (SPAL) model. The inputs were tweets posted by individual Jordanian nationals, gathered via a Tweet Collector Tool (TCT). For both models, we started with tokenization and feature extraction on the collected and processed tweets to generate a CSV file with the same number of extracted features. Afterwards, both models integrated semantic analysis with various machine-learning classifiers to classify the tweets as positive or negative. The main difference between the two models is whether the classifiers are invoked on the entire dataset, as in TAL, or on mutually exclusive ("disjoint") subsets, as in SPAL. It is worth noting that the partitioning process inside SPAL depends on the hidden semantic meaning in the Jordanian dialect tweets. Therefore, the main objective of this paper was to design and implement trustworthy models to categorize Jordanian opinions, whether they were for or against governmental actions during the COVID-19 pandemic. Many experiments were conducted to measure the performance of both models using various well-known classifiers, including Support Vector Machine (SVM), Naïve Bayes (NB), J48, Multi-Layer Perceptron (MLP), and Logistic Regression (LR).

This paper is organized as follows: Section 2 illustrates the informal and algorithmic expressions of the newly proposed models. Section 3 discusses the experimental results on the real collected datasets. Finally, the conclusion and future work are presented in Section 4.
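Assuming the five classifiers map to their common scikit-learn counterparts (a stand-in for illustration, not the authors' exact setup; the feature matrix, labels, and partition ids below are toy data), the TAL and SPAL evaluation loops can be sketched as:

```python
# Sketch of the TAL vs. SPAL evaluation loops (illustrative stand-ins).
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier      # stand-in for J48
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((200, 10))          # toy feature matrix (extracted tweet features)
y = rng.integers(0, 2, 200)        # toy polarity labels: 0 = negative, 1 = positive
groups = rng.integers(0, 3, 200)   # toy semantic-partition ids (SPAL_1..SPAL_3)

classifiers = {"SVM": SVC(), "NB": GaussianNB(), "J48": DecisionTreeClassifier(),
               "MLP": MLPClassifier(max_iter=500), "LR": LogisticRegression(max_iter=500)}

def tal_accuracy(clf):
    # TAL: one classifier over the entire dataset (70/30 split).
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=1)
    return accuracy_score(yte, clf.fit(Xtr, ytr).predict(Xte))

def spal_accuracy(clf_factory):
    # SPAL: an independent sub-model per semantic subset; predictions pooled.
    y_true, y_pred = [], []
    for g in np.unique(groups):
        Xg, yg = X[groups == g], y[groups == g]
        Xtr, Xte, ytr, yte = train_test_split(Xg, yg, test_size=0.3, random_state=1)
        y_pred.extend(clf_factory().fit(Xtr, ytr).predict(Xte))
        y_true.extend(yte)
    return accuracy_score(y_true, y_pred)
```

A factory (rather than a shared estimator) is passed to `spal_accuracy` so each semantic subset trains a fresh, independent sub-model, mirroring the SPAL design.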

Proposed Methodology
It is common to invoke machine-learning models to classify collected tweets as positive or negative, but doing so directly is routine work. The novelty of this work lies in incorporating semantic analysis of the Jordanian dialect and applying semantic partitioning to the collected tweets to enhance the overall performance of the generated model. The following subsections present the Traditional Arabic Language (TAL) model and the Semantic Partitioning Arabic Language (SPAL) model for the Jordanian dialect.

Traditional Arabic Language (TAL) Model
The TAL model is implemented as shown in Figure 1. The input of our model is the collected Jordanian dialect tweets, which were processed using two computational strategies: textual processing and feature extraction, explained later. The classification stage was then applied to determine the numbers of positive and negative tweets. Finally, the overall performance of the model was measured, as listed in Algorithm 1. For the sake of building a sentiment dataset for the Jordanian dialect and modern Arabic text, we considered the Arabic content published on Twitter. The dataset was collected from 1 March 2020 to 21 May 2020, during the COVID-19 pandemic. We collected the dataset using a specially developed tool called the Tweet Collector Tool (TCT) to search for specific hashtags (examples: (" # ‫كورونا‬ ", "#kuruna", "#Corona"); (" # ‫الحكومة‬ _ ‫كورونا‬ ", "#alhukumatu_kuruna", "#Corona government"); (" # ‫الشعب‬ _ ‫كورونا‬ ", "#alshaebi_kuruna", "#Corona people"); (" # ‫ارتفاع‬ _ ‫االسعار‬ ", "#artifaei_alasear", "#high prices"); (" # ‫تصريحات‬ _ ‫وزير‬ _ ‫االعالم‬ ", "#tasrihat_wzir_alaelam", "#statements by the Minister of Information"); (" # ‫المؤتمر‬ _ ‫الصحفي‬ ", "#almutamir_alsahufii", "#Press Conference")). The collected dataset, called "TAL", consists of 2000 randomly selected tweets with equal positive and negative labels, divided among three regions of Jordan (north, middle, and south), as shown in Table 1. This dataset consists of two columns: the collected tweets and the polarity. The polarity of the extracted tweets was assigned manually as positive or negative by three experts in Arabic linguistics. In cases of disagreement over a tweet's polarity, the final annotation was decided by majority vote.
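The majority-vote annotation step described above can be sketched as follows (a minimal illustration; the function name and label strings are our own):

```python
from collections import Counter

def majority_label(annotations):
    """Resolve a tweet's polarity from several annotators by majority vote,
    as done here with three Arabic-linguistics experts."""
    counts = Counter(annotations)
    label, _ = counts.most_common(1)[0]
    return label

# Example: two annotators say "negative", one says "positive".
majority_label(["negative", "positive", "negative"])  # -> "negative"
```

With three annotators and two polarity classes, a strict majority always exists, so no tie-breaking rule is needed.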
Algorithm 1: The TAL model.

3. (TrainingSet, TestingSet) = Validation_Split(InSet).
4. ClassifierModel = Classifier(ClassifierName, TrainingSet).
5. ConfusionMatrix = CalculateConfusionMatrix(ClassifierModel, TestingSet).
6. (POutSet, NOutSet, Accuracy, Precision, Recall, F_Score) = CalculateAccuracy(ConfusionMatrix).
", "#almutamir_alsahufii", "# Press Conference")). The collected dataset size was equal to 2000 randomly selected tweets with equal positive and negative labels divided among three regions (north, middle, and south in Jordan), called "TAL", as shown in Table 1. This dataset consists of two columns: the collected tweets and the polarity. The polarity for the extracted tweets is assigned manually into positive and negative. This was achieved by three experts in Arabic linguistics. In case of a disagreement, in regards to tweets that were polarized differently, the final decision of the annotation was taken based on the majority chosen.   (With this epidemic, we learned from the Corona virus that our normal day was never an ordinary day, but it was a blessing from the blessing Dealing with tweets as a whole unit is confusing and ambiguous due to the poorly structured Arabic text. Therefore, there is an urgent need to extract and analyze the tweets to reduce the number of resources needed for processing, and simultaneously preserve important irrelevant features. It is considered a predominant step in SA. It begins with the initial collected dataset to build derived values, which are optimally used in the learning stage. Consequently, this leads to a better understanding of the problem domain and a more precise interpretation for the original dataset. In our study, the extraction process was achieved by implementing java code that tokenized the received data from the Twitter API. The tokenization was used to separate the text into smaller units called words, phrases, numbers, non-Arabic words, single characters, and punctuation marks. Unfortunately, the obtained data were not homogeneous; therefore, a normalization procedure was conducted to standardize the obtained data, in order to avoid Tashhkeel (diacritics), Tatweel (repeated letter), and contextual letter representation. 
For example; In the Arabic language, most letters have contextual letter representations, such as Alef ( Dealing with tweets as a whole unit is confusing and ambiguous due to the poorly structured Arabic text. Therefore, there is an urgent need to extract and analyze the tweets to reduce the number of resources needed for processing, and simultaneously preserve important irrelevant features. It is considered a predominant step in SA. It begins with the initial collected dataset to build derived values, which are optimally used in the learning stage. Consequently, this leads to a better understanding of the problem domain and a more precise interpretation for the original dataset. In our study, the extraction process was achieved by implementing java code that tokenized the received data from the Twitter API. The tokenization was used to separate the text into smaller units called words, phrases, numbers, non-Arabic words, single characters, and punctuation marks. Unfortunately, the obtained data were not homogeneous; therefore, a normalization procedure was conducted to standardize the obtained data, in order to avoid Tashhkeel (diacritics), Tatweel (repeated letter), and contextual letter representation. For example; In the Arabic language, most letters have contextual letter representations, such as Alef (  1. Lexicon features: it focuses on the word-character structure and emphasizes its effect on the results by computing the number of words and the number of characters per tweet. Moreover, the number of words by length, varying from five characters to ten characters, were counted. 2. Writing style features: the writing style is affected by user mood and the user style.
Some users used numerical digits when writing certain Arabic letters, while others used special characters and symbols to represent their feelings. Moreover, some users used punctuation, which altered the tweet contents. Therefore, it is paramount to count the number of numerical digits, special characters, symbols, delimiters, and punctuation per tweet [50].
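The tokenization and normalization steps described above can be sketched as follows (assumed rules and Unicode ranges; the paper's Java implementation may differ):

```python
import re

TASHKEEL = re.compile(r"[\u0617-\u061A\u064B-\u0652]")  # Arabic diacritic marks
TATWEEL = re.compile(r"\u0640+")                         # elongation (Tatweel) runs
ALEF = re.compile(r"[\u0622\u0623\u0625]")               # Alef variants: آ أ إ

def normalize(text):
    text = TASHKEEL.sub("", text)    # remove Tashkeel (diacritics)
    text = TATWEEL.sub("", text)     # remove Tatweel elongation
    text = ALEF.sub("\u0627", text)  # unify Alef variants to bare Alef (ا)
    return text

def tokenize(text):
    # Split on whitespace and common (Arabic and Latin) punctuation.
    return [t for t in re.split(r"[\s\.,!?؟،;:]+", text) if t]
```

Normalizing before tokenization ensures that, for example, an elongated word and its plain spelling map to the same token.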

1. Lexicon features: these focus on the word-character structure and emphasize its effect on the results by computing the number of words and the number of characters per tweet. Moreover, the numbers of words by length, varying from five to ten characters, were counted.

2. Writing style features: the writing style is affected by the user's mood and style. Some users used numerical digits when writing certain Arabic letters, while others used special characters and symbols to represent their feelings. Moreover, some users used punctuation, which altered the tweet contents. Therefore, it is paramount to count the numbers of numerical digits, special characters, symbols, delimiters, and punctuation marks per tweet [50].

3. Grammatical features: many researchers have utilized grammatical features to understand the language. In our study, we analyzed 11 grammatical rules exercised in the Arabic tweets: Kan and its sisters, Enna and its sisters, question tools, exception tools, the five verbs, the five nouns, plural words, the imperative clause, the Nidaa clause, Eljar letters, and Eatf letters.

4. Emotional features: we focused on SA to mine the emotional statuses of the tweet users. Polarity and emotion were identified in the words inside the tweets. They are categorized as positive words, negative words, combinations of positive words, and combinations of negative words, as shown in Table 2.

A vitally important part of evaluating the various models on the collected tweets is separating the tweets into training and testing datasets. The sizes of the training and testing sets are equal to 70% (1400 tweets) and 30% (600 tweets) of the original dataset, respectively.
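As an illustration of the lexicon and writing-style counts above, a minimal feature extractor might look like this (the field names are hypothetical, not the paper's):

```python
import re

def lexical_and_style_features(tweet):
    """Sketch of the lexicon and writing-style feature counts described
    above; returns a dict of per-tweet counts."""
    words = tweet.split()
    feats = {
        "n_words": len(words),                               # lexicon features
        "n_chars": len(tweet),
        "n_digits": sum(c.isdigit() for c in tweet),         # writing-style features
        "n_punct": len(re.findall(r"[\.,!?؟،;:]", tweet)),
    }
    # Count words by length, from five to ten characters.
    for L in range(5, 11):
        feats[f"n_words_len_{L}"] = sum(len(w) == L for w in words)
    return feats
```

Computing one such dict per tweet and writing the rows out yields the CSV file of extracted features that both models consume.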
The classification process is defined to recognize the tweets and separate them into positive and negative categories. There are many predictive machine-learning models with high accuracy and powerful features invoked in several applications. The following models are used in this paper:

Support Vector Machine (SVM): one of the simplest and most effective supervised learning models. It is highly recommended for classification problems due to its ability to increase predictive correctness by avoiding over-fitting to the data [51]. It is based on finding the best hyperplane in multidimensional space to minimize errors [52].
Naïve Bayes (NB): a probabilistic model based on Bayes' theorem, which presumes that each feature makes an independent and equal contribution to the target class [53]. It is a fast, accurate, and easy classification model suitable for large datasets [54].
J48: a decision tree that partitions the input space of the dataset into mutually exclusive regions, each of which is assigned a label to characterize its data points [55]. Building a decision tree follows an agreed iterative approach: the algorithm partitions the dataset based on the most informative attribute, and the attribute with the maximum gain ratio is selected as the splitting attribute [56]. Generally, decision tree classification models have many advantages, such as being easy to interpret and achieving accuracy comparable to other classification models [57].
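J48's choice of splitting attribute by maximum gain ratio can be illustrated with a small re-implementation (ours, not WEKA's):

```python
from math import log2
from collections import Counter, defaultdict

def entropy(labels):
    # Shannon entropy of a label sequence, in bits.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, attribute_values):
    """Gain ratio of a categorical attribute: information gain of the split
    divided by the split's intrinsic information."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(attribute_values, labels):
        groups[v].append(y)
    split_entropy = sum((len(g) / n) * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - split_entropy
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info else 0.0
```

At each node, J48 evaluates `gain_ratio` for every candidate attribute and splits on the one with the highest value.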
Logistic Regression (LR): a statistical predictive model that uses observations of one or more independent variables to estimate the probability of the dependent variable. It is commonly used to solve problems in different applications [58]. It is recommended in the presence of multi-collinearity and for high-dimensional datasets [59]. LR is easy to implement, and its results are easy to interpret [60].
Multi-Layer Perceptron (MLP): a feed-forward artificial neural network used to predict and classify labels in different applications. It generally consists of at least three layers, building relationships between the input and output layers in order to compute the required patterns. Each layer consists of a set of neurons with a set of adaptive weights [61].
Performance evaluation metrics are used to evaluate the overall performance of the various models. In general, the metrics compare the polarity of the tweets to the predicted class (positive or negative). Accuracy, recall, precision, and F-score are computed from the confusion matrix in order to measure the power of the predictive models. Further details about the definition and equation of each metric can be found in [62].
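The metrics named above follow directly from the binary confusion matrix; a minimal sketch, with "positive" as the target class:

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-score from a binary confusion
    matrix (tp/fp/fn/tn counts), using the standard definitions."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score
```

For example, a model with 40 true positives, 10 false positives, 10 false negatives, and 40 true negatives scores 0.8 on all four metrics.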

Semantic Partitioning Arabic Language (SPAL) Model
This paper concentrates on applying the partitioning concept to the original set of tweets after the computational strategies procedure and before invoking the classification model. The partitioning process is applied neither randomly nor blindly; it depends on the hidden semantic meaning that exists between the Jordanian dialect tweets. We believe that utilizing the semantic meaning between tweets will drastically increase the overall accuracy of the model.
The key idea of collecting tweets is to understand Jordanian reactions toward the government measures implemented during the COVID-19 pandemic. This study is founded on the following three domains: coronavirus, government, and people (Jordanian nationals), as shown in Figure 3. The semantic correlations among these domains can be viewed as follows: coronavirus and government, coronavirus and people, and, finally, government and people. Each case is derived by taking all possible semantic correlations that exist in the Jordanian tweets according to the predefined domains, where D represents the number of studied domains, which is equal to 3, and C represents the number of correlated selected domains, which is equal to 2. Therefore,

$$\binom{D}{C} = \binom{3}{2} = \frac{3!}{2! \cdot 1!} = 3,$$

so three mutually exclusive subsets can be generated from the originally collected dataset:

$$S = S_1 \cup S_2 \cup S_3, \qquad S_i \cap S_j = \emptyset \ (i \neq j),$$

where S denotes the original set of collected tweets and S_i denotes a subset of the original set.

Data 2021, 6, 67

Consequently, another copy of the "TAL" dataset was analyzed to partition the collected tweets into three semantic subsets. The collected tweets were manually classified into three mutually exclusive subsets based on their semantic meaning. This process was completed by seven experts in Arabic linguistics. In cases of agreement, the tweet was stored in the corresponding subset; otherwise, the final classification was decided by majority vote. Each subset was stored in a separate dataset. The first subset contains the tweets that belong to governmental responses in Jordan during the COVID-19 pandemic, called the "SPAL_1" dataset, as shown in Table 3. The second subset contains the tweets that belong to people's reactions during the COVID-19 pandemic, called the "SPAL_2" dataset, as shown in Table 4.
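The count of correlated domain pairs, and the requirement that the resulting subsets partition the original set, can be checked directly (the subset sizes below are toy values, not the paper's actual counts):

```python
from math import comb

D, C = 3, 2                 # three domains, correlated in pairs
n_subsets = comb(D, C)      # C(3, 2) = 3 semantic subsets

# The subsets must be mutually exclusive and together cover the original set S
# (toy ids for the 2000 collected tweets).
S = set(range(2000))
subsets = [set(range(0, 700)), set(range(700, 1400)), set(range(1400, 2000))]
assert len(subsets) == n_subsets
assert set.union(*subsets) == S                  # the subsets cover S
assert sum(len(s) for s in subsets) == len(S)    # no overlaps between subsets
```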
The last subset contains the tweets that belong to the interaction between the government and people during the COVID-19 pandemic, called the "SPAL_3" dataset, as shown in Table 5. Tables 3-5 give examples of tweets in the "SPAL_1", "SPAL_2", and "SPAL_3" datasets, respectively; for instance, one "SPAL_2" example translates as: "With this epidemic, we learned from the Corona virus that our normal day was never an ordinary day, but it was a blessing from the blessings of God Almighty."

Negative; (mae hadha alwaba' tuealamuna min fayrus kurna 'iina yawmana aleadia makan abdaan yawm eadi, 'iinama kan niematan min nieam allah subhanah wataealaa); (With this epidemic, we learned from the Corona virus that our normal day was never an ordinary day, but it was a blessing from the blessings of God Almighty)

Table 5. Examples of tweets in the "SPAL_3" dataset.

The train-test splitting process was independently applied to every subset to estimate the performance of the general model. The training part of each subset was used to build the classification sub-model independently. This entailed independent testing for each sub-model by using its testing part. Each sub-model produced its confusion matrix. All of the generated confusion matrices per sub-model were merged to produce the general confusion matrix for the chosen classification model. Later, the performance of the whole model was evaluated by measuring F-score, recall, accuracy, and precision on the general confusion matrix. Finally, Algorithm 2 presents the proposed SPAL model, which is visually shown in Figure 4.

The most important step in our newly proposed SPAL model involved merging the produced confusion matrix of each sub-model (i) into a general confusion matrix for the whole classification model, as shown in Figure 5.

FPi: the number of tweets that are positively classified and their actual polarity is negative for the sub-model i.
The general confusion matrix is calculated by summing the corresponding values of the confusion matrices of all sub-models, as follows: TP = Σi TPi, FP = Σi FPi, FN = Σi FNi, and TN = Σi TNi.

Data 2021, 6, 67
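To make the merging step concrete, the following sketch sums the per-sub-model confusion matrices element-wise and evaluates the resulting general matrix; the counts used are hypothetical, not the paper's actual results.

```python
def merge_confusion_matrices(matrices):
    """Sum per-sub-model (TP, FP, FN, TN) tuples into one general matrix."""
    tp = sum(m[0] for m in matrices)
    fp = sum(m[1] for m in matrices)
    fn = sum(m[2] for m in matrices)
    tn = sum(m[3] for m in matrices)
    return tp, fp, fn, tn

def evaluate(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-score from the general matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

# Hypothetical confusion matrices for the SPAL_1, SPAL_2, and SPAL_3 sub-models:
sub_matrices = [(50, 10, 8, 52), (40, 12, 9, 39), (45, 11, 10, 44)]
tp, fp, fn, tn = merge_confusion_matrices(sub_matrices)
acc, prec, rec, f1 = evaluate(tp, fp, fn, tn)
```

Because the merge is a plain element-wise sum, each sub-model can be trained and tested independently before its counts are folded into the general matrix.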

Experimental Results and Discussion
In this paper, all experiments were conducted using Weka software, an open-source workbench that provides plenty of machine learning algorithms and visualization tools for data mining. It is one of the most widely used tools in data science, particularly in NLP [63].
All running experiments were set to use 10-fold cross-validation and a tolerance value of 0.001. All 30 previously extracted features were used as test attributes in building each model. Different classifiers were invoked in the proposed models: SVM, NB, J48, MLP, and LR. The sequential minimal optimization (SMO) algorithm was invoked in SVM with a C value of 1.0, an epsilon value of 1 × 10^−12, and a PolyKernel with exponent E = 1.0 and cache size C = 250007. In J48, the confidence threshold for pruning was set to 0.25, and the minimum number of instances per leaf was set to 2. The LR was run using the default settings. Finally, the MLP is a shallow deep learning model, constructed using one hidden layer with 12 neurons and the sigmoid activation function.
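The 10-fold evaluation protocol can be illustrated with a minimal pure-Python sketch of k-fold splitting: each fold serves once as the test set while the remaining nine folds train the model. (The experiments themselves used Weka's built-in cross-validation; this sketch only shows the splitting logic.)

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Distribute any remainder so fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Example: 2000 tweets and 10 folds -> each test fold holds 200 tweets.
folds = list(k_fold_indices(2000, k=10))
```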
The performance evaluations for the TAL and SPAL models are illustrated in Tables 6 and 7. In Table 6, the SPAL model is compared against the TAL model with respect to the accuracy metric using different classifiers. The highest accuracy was obtained by the J48 classifier, with values of 70.83% and 71.73% in the TAL and SPAL models, respectively. On the other hand, the NB classifier scored the lowest accuracy, equal to 58.33% in the TAL model and 59.18% in the SPAL model. The J48 classifier is based on the decision tree, while the NB classifier is based on probability [53,55]. Moreover, J48 iteratively partitions the dataset using the most informative attribute (the splitting attribute with the maximum gain ratio) [55][56][57], while the NB classifier assumes that no correlation exists between the different test attributes and that each attribute has an equal impact on the tweet's polarity [53,54]. The J48 is commonly utilized for classification tasks such as emotion recognition from text and Twitter text categorization, although it is rarely used for sentiment prediction [64]. The highest accuracy value obtained by invoking the J48 classifier in both models results from a 20% contribution of the emotional features among the other extracted features. These results answer the first research question addressed in the present study. Figure 6 shows the number of correctly classified and misclassified tweets for the TAL and SPAL models on different classifiers. This is evidence that partitioning the collected data into a number of disjoint subsets increases the number of correctly classified tweets. The partitioning philosophy is applied neither randomly nor regionally; the partitioning step in the SPAL model properly employs the most important domains that the collected datasets rely on.
Therefore, rather than processing all tweets at once as in the TAL model, the proposed SPAL model partitions the collected tweets into a number of mutually exclusive subsets by utilizing the hidden semantic meaning among the tweets. The improvement in accuracy using the SPAL model reached up to 3.99% in SVM, 1.46% in NB, 1.27% in J48, 4.22% in MLP, and, finally, 1.07% in LR compared to the TAL model. It is clearly noticeable that the improvement in accuracy using the SVM and MLP classifiers is higher than that of the other classifiers. The reason behind this phenomenon is the power of merging machine learning classifiers with semantic partitioning in the SPAL model, where such classifiers use a nonlinear activation function that helps capture the complex features in the hidden layers [58]. On the other hand, the integration of the SPAL model with the MLP classifier in particular scores the highest improvement with respect to the accuracy metric. This is because the SPAL model uses semantic partitioning to investigate the hidden correlation between tweets, while the MLP has a strong associative memory and prediction capability after training [65]. This illustrates the power of the SPAL model with respect to the TAL model. These results answer the second research question addressed in this study. Table 7 presents the recall, precision, and F-score metrics for the TAL and SPAL models. It is well known that the recall and precision metrics present completely different perspectives of the mentioned models; these metrics conflict with each other. Therefore, there is a need for a fair index (called the F-score) that considers both of them simultaneously. An F-score is considered perfect when it approaches one, while the model is a total failure when it approaches zero.
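The reported percentages are consistent with a relative improvement computed from the two models' accuracies in Table 6. A short sketch (the formula is ours; the accuracy values are those reported for J48 and NB):

```python
def improvement(tal_value, spal_value):
    """Relative improvement of SPAL over TAL, as a percentage."""
    return (spal_value - tal_value) / tal_value * 100

# Accuracies reported in Table 6:
j48_gain = improvement(70.83, 71.73)  # ~1.27%, matching the reported J48 figure
nb_gain = improvement(58.33, 59.18)   # ~1.46%, matching the reported NB figure
```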
It is clearly noted that the SPAL model scored a higher weighted-average F-score than the TAL model on the various classifiers, with improvements that reached up to 8.70% in J48, 7.48% in SVM, 4.59% in NB, 6.74% in MLP, and 5.94% in LR. To the best of our knowledge, a good F-score value indicates a minimum number of false positives and a minimum number of false negatives. Consequently, the model correctly identifies real threats without being disturbed by false alarms [9,66]. The SPAL model superiorly competes with the TAL model in predicting the polarity of Jordanian dialect tweets based on the extracted lexical, writing style, grammatical, and emotional features. Utilizing the hidden semantic meaning among the tweets helps enhance the general performance of the used model. The main advantage of the SPAL model is that it polarizes the dataset with expressively better performance, especially on the accuracy and F-score metrics. Another advantage of the proposed model is that such a strategy is applicable on multi-processor machines, particularly shared-memory systems (where there is no need to plan the communication of data between different processors). Moreover, the memory caches will be used efficiently because each subset is small enough to be stored in cache, and the partitioning can then be exploited without accessing the slower main memory.
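Since the F-score is the harmonic mean of precision and recall, it rewards a model only when both are high. The illustrative values below (not taken from the paper) show how a model with unbalanced precision and recall is penalized even though its arithmetic mean is unchanged:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

balanced = f_score(0.70, 0.70)    # 0.70: equals the arithmetic mean
unbalanced = f_score(0.95, 0.45)  # ~0.61: penalized, same arithmetic mean of 0.70
```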
Finally, the third research question in this study, which focuses on the reaction of Jordanian citizens regarding governmental procedures and precautions during the COVID-19 pandemic, is addressed in Figure 7. Jordan is divided into three main regions: north, middle, and south. Statistically, the 2000 tweets were divided as follows: 667 tweets in the northern region, 667 tweets in the middle region, and the remaining 666 tweets in the southern region. It is worth noting that the middle region had the highest percentage of negative tweets, 66.27%, regarding governmental actions and decisions during the COVID-19 pandemic, while the southern region registered the highest percentage of positive tweets, 64.71%. This reflects the fact that the middle region encompasses the capital city of Jordan (Amman), which has the most governmental and private companies, as well as institutes, that suffered economically from the lockdown and curfew actions.


Conclusions
In this paper, we considered the SA problem for the dialectical Arabic language. We collected 2000 tweets written in Jordan during the COVID-19 pandemic. Herein, we proposed two models to predict the polarity of the collected tweets by invoking the SVM, NB, J48, MLP, and LR classifiers. The extraction of different Arabic features and the utilization of the hidden semantic meaning when imposing the partitioning enhance the overall performance of the SPAL model. Our experimental results show an improvement in accuracy of up to 4.22% for the SPAL model compared to the TAL model when the MLP classifier was invoked. Moreover, the improvement in the weighted F-score, a fair index that balances the conflict between recall and precision, reached up to 8.70% for the SPAL model over the TAL model with the J48 classifier.
The benefits of adopting the proposed model can be itemized from scientific and practical perspectives. From the scientific perspective, the proposed SPAL model polarizes the dataset with better performance, especially on the accuracy and F-score metrics. This is accomplished by discovering the massive amount of knowledge, and its hidden relations, present in social networks. Therefore, it is considered a promising tool to enrich the dialectical Arabic SA field. Another advantage of the proposed SPAL model is that such a strategy is applicable on multi-processor machines, particularly shared-memory systems (where there is no need to plan data communication between different processors). Moreover, the memory caches will be used efficiently because the subset size is small enough to be stored in the cache, and the partitioning can then be achieved without accessing the slower main memory. Practically, it is well known that social media is a fertile environment, full of comments, emotions, thoughts, and opinions regarding common events and topics in society [67]. Integrating the SPAL model with social media will help organizations, companies, and governments mine the hidden metadata in order to improve the quality of their services to end-users and to sustain human satisfaction [1]. This could not be achieved without detecting the changes in human opinions.
The limitations of the proposed SPAL model can be described on the following fronts: the partitioning of the collected tweets was performed manually, not automatically; the polarity was also manually assigned to the collected tweets; and, finally, the accuracy of the various invoked classifiers depends on the nature of the data, the number of extracted features, and their types.
We foresee numerous avenues for future work. First, we propose redesigning the SPAL model in a parallel form; to the best of our knowledge, the design of a fast parallel Arabic SA model has not yet been explored. Another avenue is to study whether the SPAL model can be designed hierarchically to implement the semantic partitioning. Moreover, an investigation is required to perform the semantic partitioning automatically by using artificial intelligence techniques, such as an interactive associative classifier [68] or dependency-based information [69], to deduce the hidden semantic relations in the collected tweets. A final avenue for future work is to investigate the performance of the SPAL model when employing different partitioning techniques, such as random, statistical, or graphical partitioning approaches.
Author Contributions: Conceptualization, E.F.; methodology, E.F. and S.I.; software, E.F. and S.I.; formal analysis, E.F.; investigation, E.F. and S.I.; data curation, S.I.; writing-original draft preparation, E.F. and S.I.; writing-review and editing, E.F. and S.I. All authors have read and agreed to the published version of the manuscript.