Article

Hybrid Negation: Enhancing Sentiment Analysis for Complex Sentences

1 Department of Computer Science and Information Technology, University of the District of Columbia, Washington, DC 20008, USA
2 Department of Electrical and Computer Engineering, University of the District of Columbia, Washington, DC 20008, USA
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(2), 1000; https://doi.org/10.3390/app16021000
Submission received: 27 October 2025 / Revised: 26 December 2025 / Accepted: 15 January 2026 / Published: 19 January 2026
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)

Abstract

A wealth of valuable information is available on the Internet, and many individuals rely on mass media as their primary source of information. The views, comments, expressions, and opinions shared on social networks are a tremendous source of information, and harvesting this freely available content makes text mining a powerful tool for analyzing public opinion on various issues across diverse social networks. Many research projects have implemented text sentiment analysis through machine and deep learning approaches. Social media text often expresses sentiment through complex syntax and negation (e.g., implicit and double negation and nested clauses), which many classifiers mishandle. We propose hybrid negation, a clause-aware approach that combines (i) explicit/implicit/double-negation rules, (ii) dependency-based scope detection, (iii) a TextBlob back-off for phrase polarity, and (iv) an MLP-learned clause-weighting module that aggregates clause-level scores. Across 156,539 tweets (three-class sentiment), we evaluate six negation strategies and 228 model configurations with and without SMOTE (applied strictly within training folds). Hybrid negation achieves 98.582% accuracy, 98.196% precision, 98.189% recall, and a 98.193% F1-score with BERT, outperforming rule-only and antonym/synonym baselines. Ablations show that each component contributes to the model’s performance, with dependency scope and double negation offering the largest gains. Per-class results, confidence intervals, and paired tests with multiple-comparison control confirm statistically significant improvements. We release code and preprocessing scripts to support reproducibility.

1. Introduction

Nowadays, the Internet has become the primary source of information for almost everything [1], and approximately 87% of its users utilize it as a research tool [2]. The Web offers a variety of thoughts, comments, and views, and online users constantly publish reviews on blogs and social networks, which generates content at an impressive speed [3]. Social media platforms have been an enormous source of knowledge, characterized by the rapid spread of information, where many individuals convey their views through social networks [1,4]. In 2024, for instance, Twitter had more than 500 million monthly active users, with over 200 million daily tweets on a wide range of topics [5]. Text mining is a powerful tool for extracting valuable information from social networks [6] and has become a viable means of analyzing public opinions on various issues across social media outlets [7].
Sentiment analysis using deep learning and machine learning approaches has been widely employed by researchers in various projects, including movie reviews [8,9], fake tweet detection [10,11], medicine intake detection [12], and others [13,14].
Text sentiment analysis practices are generally categorized into statistical techniques, lexicon-based methods, and hybrid processes [15]. Prior to performing sentiment analysis, researchers perform text processing, including removing URLs and punctuation, tokenization, normalization, removing stop words, and stemming and/or lemmatization [16,17,18,19,20]. Text preprocessing significantly improves sentiment analysis model accuracy [20,21] by removing unnecessary noise, such as punctuation, stop words, and irrelevant characters.
There are 179 words in the NLTK English stop words list, including the word “not” and its contraction forms [22]. Removing stop words during text sentiment analysis can therefore change the sentiment value. Consider the following negative sentiment review: “The product was not good”. After removing stop words, the sentence becomes “product good”, a positive sentiment.
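The effect described above can be sketched in a few lines of Python. The stop-word list, lexicon, and scoring rule below are toy stand-ins for NLTK's list and a real sentiment scorer, used only to illustrate how deleting "not" flips the apparent polarity.

```python
# Minimal illustration (not the paper's pipeline): naive stop-word removal
# can delete "not" and flip the apparent polarity of a review.
STOP_WORDS = {"the", "was", "not", "a", "is"}  # tiny stand-in for NLTK's 179-word list
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "poor"}

def remove_stop_words(text: str) -> str:
    return " ".join(w for w in text.lower().split() if w not in STOP_WORDS)

def naive_polarity(text: str) -> int:
    # +1 per positive word, -1 per negative word; "not" flips the next word
    score, negate = 0, False
    for w in text.lower().split():
        if w == "not":
            negate = True
            continue
        s = 1 if w in POSITIVE else -1 if w in NEGATIVE else 0
        score += -s if negate else s
        negate = False
    return score

review = "The product was not good"
print(naive_polarity(review))                      # -1: "not" flips "good"
print(naive_polarity(remove_stop_words(review)))   # 1: "not" was removed
```

The same review scores negative before stop-word removal and positive after it, which is exactly the failure mode negation handling is meant to prevent.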
It is critical to perform text sentiment analysis with negation handling, which helps to accurately compute a sentence’s sentiment by correctly identifying and addressing negative words such as “not” or “no”. These negative words can completely flip the polarity of a sentence, which significantly affects the overall sentiment classification if not adequately handled. Ignoring negation might lead to misjudging negative sentiment as positive and vice versa, and improperly handling negation in text sentiment analysis might lead to incorrect classification [23].
Inverting the polarity of the negated term [3,24] is a typical way to handle negation, but it sometimes neglects the scope and impact of the negation [25,26]. Other techniques combine the scope of negation with an enhanced XLNet to improve the effectiveness of negation handling [27]. About 30% of reviews express implicit sentiment [28], so combining an implicit component with rule-based methods might enhance negation handling in text sentiment analysis [29].
Simple or traditional rule-based negation systems typically fail to capture contextual dependencies within sentence structures [30] and may encounter difficulties in handling the inherent complexity of human language and nuanced expressions [31]. Accurate negation handling is critical, as it can significantly impact sentiment interpretation and classification results [32], while failure to handle negation correctly can distort polarity and cause systematic errors in sentiment classification [24]. Although various methods have been introduced, effectively managing negation continues to be an obstacle in text-based sentiment analysis [33,34,35]. This study introduces a hybrid negation approach (integrating multiple components) to address the persistent challenge of handling negation in complex sentences. Furthermore, the hybrid negation approach is expected to outperform single-method techniques, delivering more accurate and reliable sentiment predictions for complex textual data.
This work is organized as follows: Section 2 reviews the existing literature on negation handling, Section 3 covers the research method, and Section 4 explains the results. Furthermore, Section 5, Section 6 and Section 7 confer discussions, contributions, and conclusions, respectively. Future work is presented in Section 8.

2. Literature Review

In the literature, most negation handling methods involve determining the polarity of sentences [3,24,36,37]. Several proposed methods exist to detect statements with negation using static windows or punctuation marks [34], which utilize the “_NEG” suffix [25,26]. The other methods used are BERT memorization [38] and applying similarity and antonym to the negation words [34]. Previous studies have only employed neural network detection for negation detection [29] and grouped negation based on classes [39]. Table 1 presents previous research projects that have worked on negation handling for sentiment analysis.
Older models, such as BiLSTM, commonly underperform on complex sentences due to their sequential nature, which struggles with very long-range dependencies and complex relationships [40]. Secondly, a BiLSTM is composed of a backward and a forward LSTM, so for extremely long sequences, maintaining and updating the hidden state becomes computationally and memory intensive [41].
The research projects [29,39] did not perform sentiment analysis; they classified text documents into negation or non-negation to identify whether expressions were negated. The study in [29] employed neural-network-based detection to sense both explicit and implicit negation using rule-based and implicit cues. Using a BiLSTM, that project achieved the highest F1-score of 93.09%, while [39] applied scope, diminisher, and morphological negation: diminisher negation applied a 0.2 multiplier, while morphological (implicit) negation multiplied by −1. That study attained the highest accuracy of 83.3%.
A research study [38] focused on BERT memorization to detect and understand negation using specific training data. The goal was to gain a better understanding of how to identify the source of BERT’s errors to improve sentiment analysis. The project achieved the highest precision of 88%, recall of 89%, and F1-score of 89%.
The study in [34] implemented rule-based negation, similarity (synonym), and antonyms of negated expressions. This technique uses synsets from the NLTK library, providing a more straightforward way to find antonyms or synonyms of negated words in the WordNet lexical database than a complex training model. This technique could be the most cost-effective way of handling negation. Applying logistic regression with word embeddings, lemmatization, and negation, the project earned an accuracy of 91.79%.
The following projects [36,37] applied a rule-based approach to switch the polarity or sentiment of the expressions: the study in [36] implemented naïve Bayes with negation handling, which gained the best accuracy of 77.57%, while Ref. [37] achieved the highest accuracy of 77.3% using SentiTFIDF.
Another negation-handling project [3] implemented a rule-based approach using a dependency parse tree to enhance the negation technique. Its goal was to analyze grammatical relations by type of dependency, such as nsubj, aux, neg, dobj, and cc. Using word sense disambiguation (WSD), the project achieved a 67% accuracy and an F1-score of 72%.
The studies in [25,26] implemented _NEG to identify negated terms and employed the scope of negation to measure the impact of word relations on the final sentiment analysis. The project in [25] achieved the highest accuracy of 95.67% by utilizing an ANN. On the other hand, the study in [26] applied _NEG identification and double negation with scope, achieving the highest F1-score of 69.5% using the SVM classifier.
Next, Ref. [24] used identification of explicit and implicit negations with the SentiWordNet (SWN) lexicon. This method has improved sentiment analysis by 2–6% compared to the traditional method. Using a hybrid RBF-SVM, the project achieved the best accuracy of 58.67%.

3. Methodology

The experimental design in this project is illustrated in Figure 1. Before performing negation handling, we preprocessed the text documents (removing retweets, URLs, punctuation, numbers, and non-alphabetic characters) to ensure precise and reliable analysis. We expanded contractions to their full forms to facilitate negation detection. After cleaning the dataset, we applied six negation handling methods (TextBlob (version: 0.19.0), Negex (negspacy version: 1.0.4) with SpacyTextBlob (version: 5.0.0), antonym–synonym, antonym–synonym with a second rule-based method, zero-shot, and hybrid). We applied SMOTE (imblearn version: 0.14.1, shown in Appendix B) to mitigate class imbalance and developed 228 models consisting of 114 without SMOTE (control) and 114 with SMOTE (treatment) to classify tweets into positive, negative, and neutral categories. Using the Wilcoxon Signed Rank Test, we confirmed that SMOTE improved model performance in this research project. A detailed explanation of the proposed research study is presented in the following sections.

3.1. Dataset

We used the dataset from [42], which is available to the public [43], to handle negation in text sentiment analysis. It consists of 156,539 public tweets, collected daily from 26 September 2021 to 27 March 2022.

3.2. Data Cleaning

Text preprocessing ensures the dataset is accurate, appropriate, and consistent, which leads to better model performance [44,45,46]. In this process, we removed retweets to eliminate duplicated content. Additionally, we removed unnecessary information, including URLs, punctuation, numbers, and irrelevant characters to minimize noise that could influence analytical accuracy. Lastly, we expanded contractions into their extended forms, such as the word “weren’t” into the form of “were not”, to ensure that negated sentences were easily detected.
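The contraction-expansion step can be sketched as follows. The mapping below is a small illustrative subset, not the full table used in the paper, and the longest-match-first ordering is our own simplification.

```python
import re

# Hedged sketch of contraction expansion; CONTRACTIONS is an illustrative
# subset with a generic "n't" fallback for forms not listed explicitly.
CONTRACTIONS = {
    "weren't": "were not",
    "isn't": "is not",
    "don't": "do not",
    "can't": "cannot",
    "n't": " not",  # generic fallback for remaining n't forms
}

def expand_contractions(text: str) -> str:
    # Replace longer keys first so "weren't" wins over the generic "n't" rule.
    for pat in sorted(CONTRACTIONS, key=len, reverse=True):
        text = re.sub(re.escape(pat), CONTRACTIONS[pat], text, flags=re.IGNORECASE)
    return text

print(expand_contractions("The lights weren't working"))
# → The lights were not working
```

After this step, the explicit token "not" is present in the text, so downstream negation rules can detect it directly.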

3.3. Negation Handlings

We implemented six distinct negation handling techniques to identify the most effective approach for sentiment analysis. The negation handling techniques applied in this study include TextBlob, Negex with SpacyTextblob, antonym–synonym, antonym–synonym with second rule-based, zero-shot, and a Hybrid approach.

3.3.1. TextBlob

Figure 2 depicts the TextBlob negation handling process, where detection of negation terms such as “not”, “n’t”, “never”, or “no” triggers polarity inversion by multiplying the sentiment score by −1, effectively reversing its orientation from positive to negative or vice versa.
ns = s × −1
where ns is negation sentiment, and s is the original sentiment score. When negation is detected, the original sentiment score is multiplied by negative one. The new polarity score would switch its original score from positive to negative and vice versa.
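The inversion rule ns = s × −1 can be illustrated with a toy scorer. The tiny lexicon below stands in for TextBlob's sentiment model; only the flipping logic mirrors the method described above.

```python
# Sketch of the inversion rule ns = s × −1, with a toy polarity lookup
# standing in for TextBlob's sentiment scorer.
NEGATION_TERMS = {"not", "n't", "never", "no"}
LEXICON = {"good": 0.7, "great": 0.8, "bad": -0.7}  # illustrative scores

def sentiment_with_negation(text: str) -> float:
    tokens = text.lower().split()
    s = sum(LEXICON.get(t, 0.0) for t in tokens)
    if any(t in NEGATION_TERMS for t in tokens):
        return s * -1  # ns = s × −1: flip the polarity
    return s

print(sentiment_with_negation("the movie was good"))      # 0.7
print(sentiment_with_negation("the movie was not good"))  # -0.7
```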

3.3.2. Negex with SpacyTextBlob

The subsequent approach integrated Negex and SpacyTextBlob to improve negation detection and sentiment computation (Figure 3). Negex was employed to accurately identify negations in complex sentences, reducing false positives that could arise from misclassification. This approach helps narrow the scope of negation words [47]. After identifying negations, we computed the sentiment using SpacyTextBlob. When negation terms were detected, sentiment scores were inverted by multiplying the original value by −1, switching polarity from positive to negative or vice versa.

3.3.3. Antonym–Synonym-Only

We employed antonym and synonym substitution to handle negation, as illustrated in Figure 4. If a negated term is detected, the subsequent phrase is replaced with its antonym to reverse the intended meaning. For example, the expression “The scenery in the previous location is not stunning” becomes “The scenery in the previous location is terrible”. If a direct antonym was available in WordNet, it was used to replace the phrase; otherwise, the first antonym from the synset list was applied. When a synonym from the synsets was suitable for a negated expression, it replaced the original term to convey the intended meaning. If neither an antonym nor a synonym was found, the original word was retained.

3.3.4. Antonym–Synonym with Second Rule-Based

The fourth negation technique we experimented with was similar to the third (antonym–synonym-only). The difference was that we applied an additional rule to terms that have no antonym or synonym: if an antonym–synonym pair was not found, we reapplied a rule-based approach by multiplying the sentiment score by −1. The purpose of this step is to accommodate negated terms that have no antonym or synonym.
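The substitution-with-fallback logic can be sketched as below. The tiny ANTONYMS and LEXICON dictionaries are invented stand-ins for WordNet synset lookups and a sentiment scorer; only the control flow (substitute if an antonym exists, otherwise flip the score by −1) mirrors the method described above.

```python
# Illustrative sketch of antonym substitution with a rule-based fallback.
# ANTONYMS stands in for WordNet lookups; LEXICON for a sentiment scorer.
ANTONYMS = {"stunning": "terrible", "good": "bad"}
LEXICON = {"terrible": -0.8, "bad": -0.7, "good": 0.7, "stunning": 0.8, "quick": 0.5}

def handle_negated(word: str):
    """Return (replacement_word, sentiment) for a word preceded by negation."""
    if word in ANTONYMS:                       # antonym found: substitute it
        ant = ANTONYMS[word]
        return ant, LEXICON.get(ant, 0.0)
    # second rule-based fallback: keep the word, invert its score by −1
    return word, LEXICON.get(word, 0.0) * -1

print(handle_negated("good"))   # ('bad', -0.7): antonym substitution
print(handle_negated("quick"))  # ('quick', -0.5): no antonym, score flipped
```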

3.3.5. Zero-Shot

The fifth method for handling negation was zero-shot classification, as shown in Figure 5. We applied the pre-trained “zero-shot-classification” pipeline with the “facebook/bart-large-mnli” model. Zero-shot classification was initially applied in image processing to predict unseen pictures [48] and can be applied directly to predict new data without training [49,50]. Recently, this type of classifier has been adapted to various natural language processing (NLP) tasks, including document classification [51], text classification [52], entity recognition [53], and relation extraction [54].

3.3.6. Hybrid Negation

The last negation technique we implemented was hybrid negation. This method consists of rule-based, implicit, and double negation handling, dependency parsing to analyze nested clauses, MLP-learned weighting, and TextBlob integration. Figure 1 displays the sentiment computation flow chart using the hybrid negation method.
Dependency Parsing
The code uses the spaCy model “en_core_web_sm” (pre-trained English pipelines) to build a dependency tree that maps the grammatical relationships between words, identifying the sentence structure and clause boundaries. Figure 6 displays the pseudocode to set up a dependency tree. The depth of a token within this tree helps detect nested subclauses, a key feature of complex sentences. The grammatical scope of a token can be established by tracing its location within the dependency tree.
Figure 7 displays dependency relations from the following sentence: “Although the food was delicious, the service was slow and inefficient”. The second “was” acts as the head (governor) of the words “service”, “slow”, and “inefficient”, while “and” is the coordinating conjunction between “slow” and “inefficient”. “Service” is the subject dependent of “was”, while “slow” and “inefficient” are adjectival predicate dependents. The entire first clause (“Although the food was delicious”) acts as an adverbial clause modifier (advcl label), and the second occurrence of “was” serves as the head verb of the main clause. So, the word “was” (in the second clause) is the ultimate head (root) of the entire sentence’s dependency structure.
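The use of token depth to flag nested clauses can be illustrated without spaCy. The head indices below are hand-coded for a shortened version of the example sentence (in the paper they come from en_core_web_sm), but the depth computation itself is the general idea: walk each token up to the root and count the hops.

```python
# Pure-Python illustration of token depth in a dependency tree.
# Sentence: "Although the food was delicious, the service was slow"
# indices:      0      1    2    3      4       5     6      7    8
# Head indices hand-coded for illustration; token 7 ("was") is the root.
HEADS = {0: 3, 1: 2, 2: 3, 3: 7, 4: 3, 5: 6, 6: 7, 7: 7, 8: 7}

def depth(i: int) -> int:
    d = 0
    while HEADS[i] != i:  # walk up to the root
        i = HEADS[i]
        d += 1
    return d

print(depth(7))  # 0 → root verb of the main clause
print(depth(3))  # 1 → head of the subordinate clause (advcl)
print(depth(4))  # 2 → "delicious", inside the nested clause
```

Greater depth signals tokens buried inside subordinate structure, which is the cue the hybrid method uses to treat their sentiment contributions differently.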
Rule-Based, Implicit, and Double Negation
Explicit negation has a direct or asserted negative meaning, while implicit negation has a negative meaning carried by presupposition or pragmatic inference rather than what is literally stated [55]. Explicit negation identifies words like “not”, “never”, “n’t”, and “no” as negated expressions, while implicit negation relies on context or specific words that imply denial or absence, for instance, “barely”, “hardly”, or “seldom”. For example, “He seldom talks about the past” implies “He does not talk about the past”. Double negation uses two negative terms to form a statement, such as “not in-significant” or “not hardly interesting”. The setup of rule-based, implicit, and double negation in this project is explained in Figure 8. In Figure 9, explicit negation switches a positive sentiment to a negative or vice versa by multiplying the sentiment score by −1, while implicit negation partially switches the sentiment by multiplying it by −0.7. Furthermore, in double negation the two negative terms neutralize each other, so two negative words cancel each other out. For example, the expression “not in-significant” can be translated as “significant”.
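These rules can be sketched compactly. The word lists and the morphological prefix check below are crude illustrative stand-ins (a real prefix test needs a morphological lexicon); the multipliers (−1 explicit, −0.7 implicit, cancellation for double negation) follow the description above.

```python
# Sketch of the rules described above (not the paper's exact implementation):
# explicit negation multiplies the sentiment by −1, implicit negation by −0.7,
# and a crude morphological cue lets double negations cancel out.
EXPLICIT = {"not", "never", "n't", "no"}
IMPLICIT = {"barely", "hardly", "seldom"}

def negation_multiplier(tokens) -> float:
    m = 1.0
    for t in tokens:
        if t in EXPLICIT:
            m *= -1.0   # explicit: full polarity flip
        elif t in IMPLICIT:
            m *= -0.7   # implicit: partial flip
        elif t.startswith(("in", "un")) and len(t) > 4:
            m *= -1.0   # very rough morphological negation cue
    return m

print(negation_multiplier("he seldom talks".split()))    # -0.7 (implicit)
print(negation_multiplier("not insignificant".split()))  # 1.0 (double negation cancels)
print(negation_multiplier("not good".split()))           # -1.0 (explicit)
```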
TextBlob Integration
As shown in Figure 9, applying TextBlob provides the foundational sentiment polarity for each individual phrase in complex sentences. Next, the original sentiment score will be modified by the rest of the system according to its complexity and clause structure.
MLP (Multilayer Perceptron) Learned Weight Mechanism
An MLP neural network learns the optimal weighting of sentiment scores from different clauses according to their scopes and structures in nested clauses. Instead of applying a uniform inversion, the MLP provides a nuanced approach because it learns to weight sentiment scores according to sentence features and structures. The MLP model was created with an input layer of three features (negation count, clause depth, and sentence length) and an output layer of one value (the weighting factor). The model learns, for example, that a longer sentence with multiple nested clauses might warrant a different weight than a simple expression. In Figure 8, we applied a sigmoid function, σ, to ensure that the weight value is between zero and one. The application of MLP-learned weights (the final step) to various clause complexities and structures is shown in Figure 10.
Figure 1 displays the hybrid negation sentiment computation flow chart. For each clause i, a base sentiment score, si, is calculated using the TextBlob approach. The weight wi is determined by an MLP (Multilayer Perceptron) using a sigmoid function, σ, to ensure the weight value is between 0 and 1 for the clause, based on a set of linguistic features, fi. The polarity score computed by TextBlob, the polarity score provided by our rule-based module, and the score supplied by the dependency-parse module form the input vector for the MLP. The purpose of the MLP is to provide a non-linear learning function that combines the best outputs of the various components to yield the final sentiment score; it serves as the actual learning mechanism for determining when to rely on the rule-based or dependency-based output. The final sentiment score, S, is then calculated as the weighted sum of the individual clause sentiment scores.
wi = σ(MLP(fi))
S = Σi wi·si
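The two equations above can be illustrated with a toy example. A single linear unit stands in for the trained MLP, and the parameters, features, and clause scores below are invented for the sketch; only the aggregation scheme (sigmoid-squashed weights times clause scores, summed) follows the method.

```python
import math

# Toy illustration of wi = σ(MLP(fi)) and S = Σi wi·si. A single linear
# unit stands in for the trained MLP; all numbers below are invented.
def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def mlp(features, weights, bias):
    # one linear unit: dot(features, weights) + bias
    return sum(f * w for f, w in zip(features, weights)) + bias

W, B = [0.5, -0.3, 0.1], 0.0  # hypothetical learned parameters

# per clause: ([negation_count, clause_depth, sentence_length], TextBlob score si)
clauses = [([1, 2, 9], 0.6), ([0, 1, 5], -0.4)]

# S = Σi σ(MLP(fi)) · si
S = sum(sigmoid(mlp(f, W, B)) * s for f, s in clauses)
print(round(S, 4))
```

Because σ bounds each wi in (0, 1), a clause can be down-weighted but never polarity-flipped by the weighting stage itself; flips come from the negation rules applied to si.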

3.4. SMOTE

The initial imbalanced dataset was divided into training and testing sets. Then, a vectorization model was fitted exclusively on the training set, converting its raw text into a numerical feature space. Next, SMOTE (synthetic minority over-sampling technique) was applied to the vectorized training data, creating synthetic new minority-class samples to address the imbalance. After the training data were balanced, machine learning models were trained on the newly balanced training set. Finally, the trained models were evaluated on the untouched testing set to measure model performance.
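The order of operations above is the crucial part: fit the vectorizer on the training split only, and oversample only the training data. The sketch below demonstrates that ordering with a bag-of-words vectorizer and a simple duplicate-based oversampler standing in for imblearn's SMOTE (which interpolates synthetic samples rather than duplicating).

```python
import random

# Order-of-operations sketch for the SMOTE pipeline: vectorizer fitted on the
# training split only; oversampling touches only the training data. A simple
# duplicate-based oversampler stands in here for imblearn's SMOTE.
random.seed(0)

def fit_vocab(texts):            # "fit" step, training data only
    return sorted({w for t in texts for w in t.split()})

def vectorize(texts, vocab):     # bag-of-words transform
    return [[t.split().count(w) for w in vocab] for t in texts]

def oversample(X, y):            # stand-in for SMOTE
    counts = {c: y.count(c) for c in set(y)}
    target = max(counts.values())
    Xb, yb = list(X), list(y)
    for c, n in counts.items():
        pool = [x for x, lbl in zip(X, y) if lbl == c]
        for _ in range(target - n):
            Xb.append(random.choice(pool))
            yb.append(c)
    return Xb, yb

train, y_train = ["good movie", "great film", "bad plot"], ["pos", "pos", "neg"]
test = ["good plot"]

vocab = fit_vocab(train)                                   # fitted on training data only
X_train, y_bal = oversample(vectorize(train, vocab), y_train)
X_test = vectorize(test, vocab)                            # test set is only transformed
print(len(X_train), sorted(set(y_bal)))                    # 4 ['neg', 'pos']
```

Evaluating on the untouched test set, never oversampled and never used to fit the vectorizer, is what keeps the reported metrics honest.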

3.5. Wilcoxon Signed Rank Test

The final step was to apply the Wilcoxon Signed Rank Test to the model performance results. The W-statistic test is a non-parametric statistical test that is useful for ranked or ordinal data, providing an alternative for analyzing paired observations without requiring a normally distributed dataset [56]. A z-score is computed to perform hypothesis testing, where n is the number of data pairs, W+ is the sum of positive ranks, W− is the sum of negative ranks, and t counts tied data (observations sharing the same rank).
z = (max(W+, W−) − n(n + 1)/4) / √(n(n + 1)(2n + 1)/24 − Σ(t³ − t)/48)
We examined whether there was a statistically significant improvement when we applied the SMOTE imbalance technique on the negation handling techniques. Furthermore, we evaluated whether the hybrid negation performance significantly surpassed the other negation methods.
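The z-score formula above translates directly into code. The example values (n = 10 pairs, W+ = 45, W− = 10, no ties) are invented for illustration.

```python
import math

# Direct implementation of the Wilcoxon z-score formula above, with the
# tie-correction term summed over tie-group sizes t.
def wilcoxon_z(w_plus: float, w_minus: float, n: int, ties=()) -> float:
    w = max(w_plus, w_minus)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in ties) / 48
    return (w - mean) / math.sqrt(var)

# Example: n = 10 paired differences, W+ = 45, W− = 10, no ties
print(round(wilcoxon_z(45, 10, 10), 3))  # ≈ 1.784
```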

4. Results

We examined the impact of the SMOTE imbalance technique employing negation handling on text sentiment analysis. We used classification models without the SMOTE as a control group in this study. The following sections present the results of each method (full table can be found in Appendix A).

4.1. Control Classification Model (Without SMOTE)

Figure 11 represents the model performance without applying the SMOTE imbalance technique (serving as a control group). Without applying SMOTE, we achieved the highest accuracy of 98.262% using the hybrid negation technique applying a BERT model classifier as the final prediction. The next highest accuracy (98.030%) was achieved by employing a similar negation handling method (hybrid negation) and applying a RoBERTa classification model for the final prediction. The third-highest accuracy (97.723%) was achieved by applying antonym–synonym with a second rule-based approach using a BERT model classifier.
Without applying the oversampling technique, we achieved a model precision of 97.750% by utilizing the hybrid negation with the BERT model classifier. The next highest precision (97.605%) was achieved by applying the antonym–synonym method in conjunction with a second rule-based approach and the BERT model classifier. The third-highest precision (97.349%) was accomplished using the hybrid negation technique with the RoBERTa classification model.
Figure 12 shows the top 10 recall (a) and F1-score (b) without applying SMOTE. The application of the hybrid negation using the BERT model classifier achieved the highest recall of 97.816%. The second-highest model recall (97.689%) was achieved using the antonym-synonym with a second rule-based approach and the BERT as the final prediction model. The third-highest model recall (97.503%) was achieved by applying hybrid negation and using RoBERTa as the final prediction model.
Lastly, handling negated expressions using the hybrid method combined with BERT classification achieved the highest F1-score of 97.783%, as shown in Figure 12. The second-highest F1-score (97.647%) was achieved by applying the antonym–synonym negation with a second rule-based approach and BERT as its final prediction model. The third-highest model F1-score of 97.426% was achieved by applying the hybrid negation with RoBERTa as the model classifier.
Table 2 shows the lowest model accuracies and precisions without applying SMOTE (control group). The lowest model accuracy (69.059%) occurred when we used the antonym–synonym with a second rule-based approach and traditional gradient boosting as the classification model. The next lowest model accuracy, also 69.059%, occurred with the antonym–synonym-only method and the gradient-boosting model classifier. Lastly, an accuracy of 73.307% resulted when we used TextBlob negation handling with the gradient-boosting classification model.
The lowest model precision of 70.045% was achieved when we applied a zero-shot negation handling technique using MLP as its final model classifier. The second-lowest model precision (71.147%) was observed when we applied zero-shot using gradient boosting as the classification model. The next lowest model precision (73.727%) occurred when we used the zero-shot negation method applying MLP and LinearSVC as its base model with voting classifier as its final prediction.
Table 3 displays the lowest recall and F1-score. The lowest recall, 51.552%, happened when we used zero-shot negation handling and gradient boosting as its model classifier. The next lowest model recall (58.029%) was observed when we applied the zero-shot negation technique with random forest as the final classification model. Lastly, the next lowest recall, 59.564%, occurred when we utilized zero-shot negation and logistic regression as the classification model.
Furthermore, the lowest F1-score of 55.531% was exhibited by applying a zero-shot negation approach with gradient boosting as the model classifier. The next lowest F1-score (63.278%) was achieved by applying zero-shot negation handling using random forest as its final classification model. The third-lowest F1-score (63.453%) occurred when we applied similar negation method (zero-shot) using Logistic Regression (LR) as the model classifier.

4.2. Classification Model with SMOTE

Figure 13 represents the ten highest model accuracies (a) and precisions (b) of classification models applying the SMOTE imbalance technique to the negation handling techniques. The highest accuracy of 98.582% was achieved when we applied hybrid negation with a BERT model classifier. The next highest model accuracy (98.486%) was accomplished by applying the same negation handling technique (hybrid method) with a RoBERTa classification model. The third-highest model accuracy (98.098%) was achieved when we applied the hybrid approach to handle negation and utilized a Multi-Layer Perceptron (MLP) as the model classifier.
As with model accuracy, the best precision (98.196%) was achieved by employing the hybrid negation handling technique with BERT as its final prediction method, as shown in Figure 13. The next highest precision (98.101%) was achieved by applying the same negation technique (hybrid method) with MLP and gradient boosting as its base models and logistic regression as its final prediction. The third-highest precision (98.073%) was achieved by employing the hybrid approach for negation handling, again with MLP and gradient boosting as the base models and logistic regression for the final prediction.
Figure 14 represents the ten highest recall (a) and F1-score (b). The best recall (98.189%) was achieved by employing hybrid negation, utilizing a BERT as its final prediction method. The second-highest recall (98.098%) was achieved by applying the same negation technique (hybrid approach) and using RoBERTa as the final prediction. The third-highest recall (98.097%) was achieved using hybrid negation handling approach employing MLP as its final model classifier.
Like the recall score, the best F1-score (98.193%) was achieved by applying hybrid negation with BERT as its final prediction model, as shown in Figure 14. The second-highest F1-score (98.098%) was achieved using the same negation handling method with MLP as the final classification model. The third-highest F1-score (98.079%) was achieved using the hybrid negation handling approach combined with RoBERTa as the final prediction model.
The lowest accuracy (69.121%) was achieved when we applied antonym–synonym-only using the gradient-boosting model classifier, as shown in Table 4. The second-lowest model accuracy (70.059%) was achieved when we applied the antonym–synonym negation with second rule-based method using the gradient-boosting model for its final model classifier. The third-lowest model accuracy (75.219%) was achieved when we applied TextBlob negation handling with gradient boosting as the classifier.
Next, our experiment achieved the lowest model precision (72.626%) when we used the antonym–synonym with a second rule-based method and gradient boosting as the classification model, as displayed in Table 4. The second-lowest precision (74.060%) occurred when we utilized the zero-shot negation handling technique with the BERT model classifier. Furthermore, our experiment showed that the next lowest precision (75.018%) occurred when we applied the antonym–synonym-only method with the gradient-boosting model classifier.
Table 5 presents the lowest model recalls and F1-scores with the SMOTE imbalance technique. The lowest model recall of 69.193% occurred when we applied antonym–synonym-only negation handling with gradient boosting. The following lowest recall (69.882%) occurred when applying the zero-shot negation handling technique with the BERT classification model. The third-lowest recall (70.176%) was achieved by applying the antonym–synonym with a second rule-based negation technique and gradient boosting as the model classifier.
Furthermore, the lowest model F1-score (68.777%) was achieved when we applied the antonym–synonym-only method with the gradient-boosting model classifier, as displayed in Table 5. The next lowest F1-score (69.854%) resulted from the antonym–synonym with a second rule-based negation handling method, employing gradient boosting as its model classifier. Lastly, applying the zero-shot negation technique with the BERT model classifier, we achieved the third-lowest F1-score of 71.911%.
Figure 15 summarizes the negation handling performance in this research study. Hybrid negation surpassed the other negation handling methods with the highest accuracy of 98.582%. Secondly, comparing average accuracies across negation methods, we found that the hybrid method also achieved the highest average accuracy (95.789%). Next, comparing the median accuracies of the negation handling techniques, hybrid negation (96.836%) outperformed the others. Furthermore, comparing the modal accuracies of the negation handling methods, the hybrid approach (96.570%) surpassed the other methods. Lastly, comparing the minimum accuracies of the negation handling techniques, we found that hybrid negation still showed outstanding performance (77.019%) relative to the other techniques. These summary statistics show that text sentiment analysis using the hybrid negation technique outperformed the other negation handling methods.

5. Discussion

According to our experimental results, the highest model accuracy (98.582%) was obtained when hybrid negation handling was combined with a BERT classifier for the final prediction. The same configuration also produced the highest precision (98.196%), the highest recall (98.189%), and the best F1-score (98.193%). Overall, hybrid negation consistently enabled classification models (particularly those using BERT) to outperform all other negation handling techniques.
As in the SMOTE experiments, the highest accuracy in the control group (98.262%) was achieved by applying hybrid negation with BERT as the classifier. The same configuration also yielded the control group's highest precision (97.750%), highest recall (97.816%), and highest F1-score (97.783%).
Hybrid negation performed much better than the other negation handling techniques in the control group. Fifteen of the twenty best model accuracies in our experiment (75%) were achieved with the hybrid negation technique. Two of twenty (10%) were achieved with the TextBlob negation handling method, one of twenty (5%) with the Negex method, and two of twenty (10%) with the antonym–synonym with second rule-based negation handling method.
As in the control group, hybrid negation consistently outperformed the other negation handling techniques when we applied the SMOTE imbalance method: 16 of the 20 best model accuracies (80%) were achieved with the hybrid approach, 2 of 20 (10%) with the antonym–synonym with second rule-based approach, and 2 of 20 (10%) with the TextBlob negation handling method.
Table 6 presents an ablation study of the hybrid negation handling technique. Removing any component of hybrid negation degraded all performance metrics. Removing the "scope" component decreased accuracy the most (by 0.402%) and also produced the largest drops in recall (0.594%) and F1-score (0.515%). The largest reduction in precision (0.441%) occurred when we excluded the "double negation" component. The scope element therefore appears to affect the system most strongly: misinterpreting the scope of a sentence misinterprets its information, which leads to inaccurate sentiment analysis.
Our hybrid model may still struggle with linguistic ambiguity, such as ambiguous sentences or sarcasm. Because the model relies on predefined rules and features, it may not capture nuanced human language use, and sarcasm can lead to incorrect sentiment predictions. For example, "Oh, fantastic! It was wonderful that the customer service hung up on me three times" might be misclassified as positive because of the words "fantastic" and "wonderful", even though the expression conveys anger or dissatisfaction and is therefore negative. Complex metaphors pose a similar risk. For instance, "The team's performance was a slow-motion train wreck" might be labeled neutral if "slow-motion train wreck" is interpreted as a plain noun phrase, whereas the metaphor actually describes a gradual, drawn-out failure and carries negative sentiment.
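The sarcasm failure mode above can be reproduced with a tiny cue-word scorer. This is an illustrative stand-in (the cue lists and scoring rule are ours, not the paper's pipeline), but it shows why surface lexicon matches alone mislabel sarcastic text:

```python
# Minimal lexicon-based polarity scorer (illustrative stand-in, not the
# paper's hybrid pipeline): it counts surface cue words only.
POSITIVE = {"fantastic", "wonderful", "great"}
NEGATIVE = {"terrible", "awful", "bad"}

def naive_polarity(text: str) -> int:
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

sarcastic = "Oh, fantastic! It was wonderful that the customer service hung up on me three times"
print(naive_polarity(sarcastic))  # 2 -> flagged positive despite the negative intent
```

A score above zero would be mapped to the positive class, reproducing exactly the misclassification described above.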
For evaluation purposes, we conducted a comparative analysis of our proposed hybrid method against large language models (LLMs). We applied the cardiffnlp/twitter-roberta-base-sentiment model through the transformers library's pipeline function to perform sentiment analysis. Table 7 compares the performance of our proposed methods with that of the LLM baseline. On all metrics, the hybrid methods surpass the LLM technique. The best LLM performance (accuracy of 93.855%, precision of 93.808%, recall of 93.833%, and F1-score of 93.807%) was achieved with the MLP-LinearSVC classifier, whereas our proposed method with the same model (MLP-LinearSVC) achieved 98.067% accuracy, 98.072% precision, 98.066% recall, and a 98.066% F1-score.
The descriptive statistics in Table 8 show that applying SMOTE improved overall model performance. The average accuracy increased from 90.379% to 92.764%, the median rose from 92.254% to 93.197%, the minimum grew from 69.059% to 69.121%, and the maximum advanced from 98.262% to 98.582%. SMOTE also reduced the standard error from 0.569 to 0.500, the standard deviation from 6.074 to 5.340, and the sample variance from 36.896 to 28.514. Consequently, applying SMOTE advanced model performance in this research project.
We used the classification models without SMOTE as a control group and those with SMOTE as a treatment group. As Table 8 suggests, applying the SMOTE imbalance technique improved classification accuracy. To validate this observation, we tested the following hypotheses with a one-tailed Wilcoxon signed-rank test.
H0: P0 = P1
H1: P0 < P1
H0 is the null hypothesis and H1 the alternative hypothesis; P1 denotes the performance with SMOTE, and P0 the performance of the control group (without SMOTE).
Our null hypothesis asserts that the metric performance of the control group (without SMOTE) equals that of the treatment group (with SMOTE), while the alternative hypothesis posits that the treatment group outperforms the control group. We used a 99% confidence level with a 1% significance level (a critical z-score of 2.33).
Z-scores (Table 9) were computed for accuracy, precision, recall, and F1-score from the SMOTE model performance. The z-score for accuracy (8.187) is extremely high, and its p-value is close to zero. The z-scores for precision (7.893), recall (8.730), and F1-score (8.636) are likewise exceptionally high, with p-values close to zero. At a 99% confidence level (1% significance level), all SMOTE z-scores exceed the critical value of 2.33; therefore, we rejected the null hypothesis. The test results suggest that the SMOTE imbalance technique improves model performance for negation handling in text sentiment analysis. This result corresponds to studies [57,58,59] showing that oversampling enhances model performance not simply by generating synthetic minority samples but also by removing misclassified samples [57] and reducing noisy samples [59], which improves predictive accuracy and mitigates the risk of overfitting [57].
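The paired test itself can be sketched as follows, assuming SciPy is available. The two lists of metric values are illustrative placeholders, not our measured scores:

```python
# One-sided Wilcoxon signed-rank test, assuming SciPy is available.
# The paired metric values below are illustrative, not the paper's data.
from scipy.stats import wilcoxon

control = [90.1, 91.2, 89.5, 92.0, 90.7, 88.9, 91.5, 90.3, 89.8, 92.4]  # without SMOTE
smote   = [92.3, 93.0, 91.8, 94.1, 92.5, 91.0, 93.6, 92.2, 91.9, 94.0]  # with SMOTE

# H1: control < smote (SMOTE improves performance)
stat, p = wilcoxon(control, smote, alternative="less")
print(stat, p)  # p < 0.01 -> reject H0 at the 1% significance level
```

With every paired difference in SMOTE's favor, the one-sided p-value falls below the 1% threshold, mirroring the rejection of H0 reported above.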
Next, to examine whether the hybrid negation handling performance surpassed that of the other negation techniques, we applied a one-tailed Wilcoxon Signed Rank Test with a 99% confidence interval (significance level 1%). The null hypothesis stated that the metric performance of hybrid negation was the same as that of the other negation techniques. In contrast, the alternative hypothesis stated that hybrid negation performed better than the other negation handling methods.
Table 10 presents z-scores for the hybrid negation technique compared with the other negation handling methods (Negex, synonym with second rule-based, synonym-only, TextBlob, and zero-shot). All z-scores exceeded the critical value of 2.33; consequently, we rejected the null hypothesis. At a 1% significance level (99% confidence level), the test suggests that the hybrid negation technique significantly outperforms the other negation techniques.
Our results surpass those of previous studies. We achieved the highest model performance (both with and without SMOTE) when we combined hybrid negation with BERT as the final prediction model. Our best accuracy of 98.582% outperformed the study in [25] (accuracy of 95.67%), which employed Negex negation handling and SentiWordNet (SWN) sentiment computation with an ANN classifier.
Additionally, without the SMOTE imbalance technique, our three-class setup achieved an accuracy of 98.262%, exceeding previous research. The study in [25], with two classes (positive and negative), achieved at best 95.67%. The study in [26], with the same three classes, achieved at best 69.5%. Study [34] achieved an accuracy of 77.57%, experiment [37] an accuracy of 77.3%, and study [34] an accuracy of 91.79%. A study [3] with three classes (positive, negative, and neutral) achieved at best 67%.
The previous project [34] used antonym–synonym negation handling and SWN sentiment computation with a Logistic Regression model, achieving a best accuracy of 91.79%. We outperformed it using the same negation handling method (antonym–synonym) with a different sentiment computation (Vader SentimentIntensityAnalyzer) and a BERT classifier, achieving 94.822% with SMOTE and 94.666% without. Furthermore, when we extended the antonym–synonym method with a second rule-based step, we achieved much higher performance (97.796% with SMOTE and 97.723% without).
Even at its lowest accuracy of 69.059%, our model outperformed a previous negation handling project: study [24], for example, achieved 58.67% using a hybrid RBF (radial basis function)-SVM (support vector machine) model with three classes (positive, negative, and neutral).
Most previous research projects [3,24,25,34] used SentiWordNet (SWN) to compute sentiment scores. The project in [3] (accuracy of 72%) applied rule-based negation by inverting the SWN sentiment score together with a word-sense disambiguation model. The study in [24] achieved at best 58.67% accuracy by inverting the SWN polarity of negated expressions and using a hybrid RBF-SVM classifier. In contrast, when we applied a different sentiment computation (TextBlob) and classifier (BERT), we achieved better performance: 97.438% with SMOTE and 97.368% without. Consequently, a suitable sentiment computation improved classification performance in text sentiment analysis.
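The polarity-inversion rule used by these SWN-style baselines can be sketched as follows. The negator list, scope-closing cues, and mini-lexicon here are illustrative assumptions, not the exact resources used in [3] or [24]:

```python
# Sketch of rule-based polarity inversion for negated expressions.
# The negator set, scope-closing cues, and lexicon are illustrative.
NEGATORS = {"not", "no", "never", "n't"}

def score_with_inversion(tokens, lexicon):
    score, negate = 0.0, False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True          # open a negation scope
        elif tok in {".", ",", "but"}:
            negate = False         # punctuation/contrast closes the scope
        elif tok in lexicon:
            # flip the lexicon polarity inside a negation scope
            score += -lexicon[tok] if negate else lexicon[tok]
    return score

lex = {"good": 1.0, "bad": -1.0}
print(score_with_inversion(["not", "good"], lex))  # -1.0
print(score_with_inversion(["good"], lex))         # 1.0
```

The limitation discussed above follows directly: the rule flips every lexicon hit until the scope closes, so implicit negation, double negation, and clause boundaries beyond simple punctuation are handled poorly.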

6. Contributions

We would like to highlight the following contributions of this research project to negation handling in text sentiment analysis:
  • This project designed and implemented a hybrid negation method with an MLP-learned weighting mechanism that outperformed previous negation handling methods, achieving an accuracy of 98.582% with three classes (positive, negative, and neutral). This exceeds the best previous study [25], which reported 95.67% with two classes (positive and negative). The summary statistics of the hybrid negation method surpassed those of the other techniques.
  • This research study experimented with six negation handling techniques (hybrid, antonym–synonym, antonym–synonym with second rule-based approach, TextBlob, Negex with SpacyTextBlob, and zero-shot) to determine the most effective method for handling negation in text sentiment analysis. Previous research projects used a single negation handling technique. For example, [38] applied BERT memorization, achieving an accuracy of 89%, while [29] used neural-network detection to identify negated expressions, achieving an F1-score of 93.09%. References [3] (accuracy of 67%) and [24] (accuracy of 58.67%) switched classes/polarity scores on identified negated tweets. References [25] (accuracy of 95.67%) and [26] (accuracy of 69.5%) applied the "_NEG" suffix and negation scope. The study in [34] applied synonyms and antonyms, reaching a best accuracy of 91.79%. Reference [39] applied syntactic, diminisher, and morphological methods to identify negated expressions, with an accuracy of 83.3%.
  • This project designed, developed, and evaluated a total of 228 classification models. Previous studies on negation handling for sentiment analysis used at most six classification models [29,37], five [25], four [39], or three [26,34], while others used a single classifier [24,36,38]. Our project achieved an accuracy of 98.582%, a precision of 98.196%, a recall of 98.189%, and an F1-score of 98.193%, surpassing the previous projects.
  • Through statistical hypothesis testing, this experiment reinforced prior findings that the SMOTE method can improve model performance [57,58,59]. In addition, applying the hybrid negation method in sentiment analysis significantly improved model performance.

7. Conclusions

The Internet has become a primary source of information, and many individuals rely on mass media for news. The web hosts a wide range of opinions, comments, and expressions that online users share on social media platforms. Text mining is a powerful tool for harvesting this information and a feasible instrument for analyzing public opinion on various issues on social media.
Sentiment analysis through machine learning and deep learning approaches has been implemented in various research projects. Text preprocessing plays a crucial role in sentiment analysis and can significantly enhance model performance. However, removing stop words during preprocessing can distort sentiment analysis, since negation words such as "not" appear in standard stop-word lists. Many studies have struggled to handle negation in complex sentence structures.
This study’s results suggest that applying hybrid negation technique has overcome the challenges of negation in complicated expressions. Employing hybrid approach surpassed other types of negation handling techniques. Our experiment achieved a model accuracy of 98.582%, precision of 98.196%, recall of 98.189%, and an F1-score of 98.193%. Additionally, the oversampling method, SMOTE, improved model performance by minimizing the impact of noisy samples.

8. Future Work

We plan to apply this method to datasets from various sources, including multilingual corpora, to test generalization, particularly to languages with different syntax. In addition, we aim to investigate the effect of negation handling on sentiment analysis across various social network datasets. We also intend to examine whether the proportion of sentiment classes relates to the dataset's social media source.

Author Contributions

Conceptualization, M.Q. and P.C.; methodology, M.Q. and P.C.; software, M.Q.; validation, M.Q. and P.C.; formal analysis, M.Q.; investigation, M.Q. and P.C.; resources, M.Q.; data curation, M.Q.; writing-original draft preparation, M.Q.; writing-review and editing, M.Q.; visualization, M.Q.; supervision, M.Q. and P.C.; project administration, M.Q.; funding acquisition, P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding, and the APC was funded by UDC.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Control Model Performance

Table A1 shows model performance of the control group (without SMOTE) in this research study.
Table A1. Control model performance (without SMOTE).

No. | Negation Handling | Base Model | Final Prediction | Accuracy | Precision | Recall | F1-Score
1 | Hybrid | | BERT | 98.262 | 97.750 | 97.816 | 97.783
2 | Hybrid | | RoBERTa | 98.030 | 97.349 | 97.503 | 97.426
3 | Synonym | | BERT | 97.723 | 97.605 | 97.689 | 97.647
4 | TextBlob | | BERT | 97.368 | 96.955 | 96.584 | 96.769
5 | TextBlob | | RoBERTa | 97.218 | 96.717 | 96.425 | 96.571
6 | Synonym | | RoBERTa | 97.125 | 96.952 | 97.055 | 97.004
7 | Hybrid | MLP + LinearSVC | Voting | 96.838 | 96.157 | 95.597 | 95.870
8 | Hybrid | | MLP | 96.835 | 96.173 | 95.499 | 95.823
9 | Hybrid | MLP + GB | LinearSVC | 96.822 | 96.006 | 95.656 | 95.829
10 | Hybrid | MLP | LinearSVC | 96.761 | 95.971 | 95.580 | 95.771
11 | Hybrid | LinearSVC + LR | MLP | 96.758 | 96.362 | 95.119 | 95.708
12 | Hybrid | MLP + LR | LinearSVC | 96.745 | 96.020 | 95.384 | 95.692
13 | Hybrid | | LinearSVC | 96.732 | 96.307 | 95.161 | 95.706
14 | Hybrid | MLP + GB | LogReg | 96.701 | 95.788 | 95.501 | 95.642
15 | Hybrid | MLP + LinearSVC + LR | Stacking | 96.685 | 95.817 | 95.442 | 95.626
16 | Hybrid | LinearSVC | MLP | 96.579 | 96.085 | 94.940 | 95.485
17 | Hybrid | MLP + LinearSVC | LogReg | 96.570 | 95.640 | 95.321 | 95.478
18 | Hybrid | MLP + LinearSVC + LR | Voting | 96.570 | 95.917 | 95.037 | 95.459
19 | Hybrid | MLP + LinearSVC | Stacking | 96.544 | 95.653 | 95.288 | 95.467
20 | Negex | | BERT | 96.474 | 95.542 | 95.087 | 95.314
21 | Negex | | RoBERTa | 96.097 | 94.979 | 94.495 | 94.736
22 | Hybrid | LinearSVC + GB | MLP | 96.030 | 95.633 | 94.156 | 94.847
23 | Synonym-only | | BERT | 94.666 | 94.561 | 94.562 | 94.561
24 | Synonym-only | | RoBERTa | 94.439 | 94.294 | 94.312 | 94.303
25 | Negex | LinearSVC + GB | MLP | 93.928 | 93.246 | 91.053 | 92.041
26 | Negex | LinearSVC | MLP | 93.874 | 93.320 | 91.050 | 92.064
27 | Negex | LinearSVC + LR | MLP | 93.736 | 93.020 | 90.929 | 91.867
28 | Negex | | LinearSVC | 93.720 | 92.894 | 90.688 | 91.678
29 | Synonym | LinearSVC | MLP | 93.631 | 93.569 | 93.251 | 93.382
30 | Negex | MLP + LinearSVC + LR | Voting | 93.615 | 92.235 | 91.318 | 91.753
31 | Hybrid | | LR | 93.609 | 93.682 | 90.354 | 91.777
32 | Synonym | MLP + GB | LogReg | 93.401 | 93.238 | 93.156 | 93.195
33 | Negex | MLP + LinearSVC | Voting | 93.379 | 92.023 | 90.640 | 91.286
34 | Synonym | MLP + LinearSVC | Stacking | 93.337 | 93.136 | 93.008 | 93.067
35 | Synonym | LinearSVC + LR | MLP | 93.292 | 93.210 | 92.899 | 93.030
36 | Synonym | LinearSVC + GB | MLP | 93.193 | 93.054 | 92.799 | 92.909
37 | Synonym | | LinearSVC | 93.193 | 93.056 | 92.805 | 92.912
38 | Synonym-only | | LinearSVC | 93.193 | 93.056 | 92.805 | 92.912
39 | Synonym | MLP + LR | LinearSVC | 93.177 | 92.968 | 92.817 | 92.887
40 | Synonym | MLP + LinearSVC | Voting | 93.165 | 92.977 | 92.917 | 92.940
41 | Negex | MLP + GB | LinearSVC | 93.152 | 91.419 | 91.085 | 91.249
42 | Negex | MLP + LinearSVC | Stacking | 93.082 | 91.218 | 91.000 | 91.093
43 | Synonym | | MLP | 93.018 | 92.852 | 92.650 | 92.741
44 | Synonym-only | | MLP | 93.018 | 92.852 | 92.650 | 92.741
45 | Negex | MLP + GB | LogReg | 92.992 | 91.128 | 90.793 | 90.945
46 | Synonym | MLP + GB | LinearSVC | 92.986 | 92.819 | 92.696 | 92.744
47 | Synonym | MLP | LinearSVC | 92.944 | 92.734 | 92.669 | 92.697
48 | Synonym | MLP + LinearSVC + LR | Stacking | 92.941 | 92.809 | 92.552 | 92.664
49 | Negex | MLP + LR | LinearSVC | 92.896 | 91.395 | 90.076 | 90.691
50 | Negex | MLP | LinearSVC | 92.858 | 90.924 | 90.565 | 90.738
51 | Negex | | MLP | 92.813 | 90.588 | 90.623 | 90.599
52 | Negex | MLP + LinearSVC | LogReg | 92.695 | 90.823 | 90.378 | 90.595
53 | Negex | MLP + LinearSVC + LR | Stacking | 92.695 | 90.698 | 90.497 | 90.596
54 | Synonym | MLP + LinearSVC + LR | Voting | 92.548 | 92.335 | 92.213 | 92.266
55 | TextBlob | MLP + LinearSVC + LR | Voting | 92.462 | 90.794 | 90.100 | 90.431
56 | TextBlob | MLP + GB | LogReg | 92.411 | 90.772 | 89.935 | 90.333
57 | Synonym | MLP + LinearSVC | LogReg | 92.283 | 91.983 | 91.978 | 91.969
58 | TextBlob | MLP + LinearSVC | Stacking | 92.226 | 90.141 | 90.053 | 90.094
59 | TextBlob | MLP + LinearSVC | Voting | 92.021 | 90.400 | 89.368 | 89.853
60 | TextBlob | MLP + LR | LinearSVC | 91.973 | 89.853 | 89.809 | 89.823
61 | TextBlob | | MLP | 91.865 | 89.859 | 89.520 | 89.683
62 | TextBlob | MLP | LinearSVC | 91.855 | 89.681 | 89.585 | 89.623
63 | TextBlob | MLP + GB | LinearSVC | 91.846 | 89.769 | 89.287 | 89.519
64 | TextBlob | MLP + LinearSVC + LR | Stacking | 91.737 | 89.590 | 89.521 | 89.552
65 | TextBlob | MLP + LinearSVC | LogReg | 91.663 | 89.346 | 89.313 | 89.321
66 | Synonym-only | LinearSVC | MLP | 91.398 | 91.300 | 91.148 | 91.148
67 | Synonym-only | LinearSVC + LR | MLP | 91.386 | 91.257 | 91.121 | 91.132
68 | TextBlob | LinearSVC + GB | MLP | 91.363 | 89.454 | 87.568 | 88.405
69 | TextBlob | | LinearSVC | 91.334 | 89.648 | 87.494 | 88.432
70 | TextBlob | LinearSVC | MLP | 91.242 | 89.524 | 87.403 | 88.332
71 | TextBlob | LinearSVC + LR | MLP | 91.210 | 89.568 | 87.547 | 88.426
72 | Synonym-only | LinearSVC + GB | MLP | 91.136 | 91.022 | 90.955 | 90.916
73 | Synonym-only | MLP + LinearSVC | Voting | 90.108 | 89.849 | 89.880 | 89.849
74 | Synonym-only | MLP + GB | LogReg | 90.095 | 89.863 | 89.871 | 89.846
75 | Synonym-only | MLP + LinearSVC + LR | Stacking | 90.079 | 89.795 | 89.916 | 89.847
76 | Negex | | LR | 90.044 | 90.192 | 84.804 | 86.844
77 | Synonym-only | MLP + GB | LinearSVC | 89.971 | 89.827 | 89.681 | 89.735
78 | Synonym-only | MLP | LinearSVC | 89.907 | 89.992 | 89.398 | 89.631
79 | Synonym-only | MLP + LinearSVC + LR | Voting | 89.891 | 89.553 | 89.778 | 89.644
80 | Synonym | | LR | 89.805 | 89.915 | 89.163 | 89.410
81 | Synonym-only | | LR | 89.805 | 89.915 | 89.163 | 89.410
82 | Synonym-only | MLP + LinearSVC | Stacking | 89.785 | 89.696 | 89.341 | 89.475
83 | Zero-Shot | | RoBERTa | 89.625 | 77.560 | 73.758 | 75.611
84 | Synonym-only | MLP + LR | LinearSVC | 89.616 | 89.464 | 89.235 | 89.318
85 | Synonym-only | MLP + LinearSVC | LogReg | 89.501 | 89.421 | 89.246 | 89.314
86 | TextBlob | | LR | 88.814 | 87.410 | 83.737 | 85.164
87 | Zero-Shot | | BERT | 87.990 | 76.766 | 70.637 | 73.574
88 | Negex | | RF | 87.042 | 89.238 | 79.446 | 82.321
89 | Hybrid | | RF | 85.713 | 89.484 | 77.548 | 80.589
90 | TextBlob | | RF | 85.026 | 87.209 | 77.379 | 79.986
91 | Zero-Shot | | LinearSVC | 84.420 | 79.002 | 62.796 | 66.733
92 | Zero-Shot | | LR | 84.380 | 85.781 | 59.564 | 63.453
93 | Zero-Shot | LinearSVC | MLP | 83.935 | 81.717 | 62.545 | 66.883
94 | Zero-Shot | LinearSVC + LR | MLP | 83.740 | 83.251 | 62.532 | 66.833
95 | Zero-Shot | LinearSVC + GB | MLP | 83.575 | 78.110 | 62.111 | 65.947
96 | Zero-Shot | | MLP | 83.490 | 70.045 | 64.737 | 66.810
97 | Zero-Shot | MLP + GB | LinearSVC | 83.340 | 74.096 | 64.973 | 64.531
98 | Zero-Shot | MLP + LinearSVC | Stacking | 83.100 | 79.273 | 64.675 | 66.550
99 | Zero-Shot | MLP | LinearSVC | 83.060 | 74.077 | 64.929 | 64.499
100 | Zero-Shot | MLP + LR | LinearSVC | 83.050 | 73.925 | 65.006 | 64.458
101 | Zero-Shot | MLP + GB | LogReg | 83.015 | 79.117 | 64.188 | 66.098
102 | Zero-Shot | MLP + LinearSVC | LogReg | 82.935 | 78.072 | 64.663 | 66.117
103 | Zero-Shot | MLP + LinearSVC + LR | Stacking | 82.935 | 76.825 | 64.715 | 65.685
104 | Zero-Shot | MLP + LinearSVC | Voting | 82.655 | 73.727 | 64.630 | 64.174
105 | Zero-Shot | MLP + LinearSVC + LR | Voting | 82.335 | 76.107 | 64.446 | 65.226
106 | Zero-Shot | | RF | 82.150 | 84.547 | 58.029 | 63.278
107 | Synonym | | RF | 80.465 | 81.069 | 79.356 | 79.730
108 | Synonym-only | | RF | 80.465 | 81.069 | 79.356 | 79.730
109 | Hybrid | | GB | 77.019 | 84.437 | 67.196 | 70.115
110 | Zero-Shot | | GB | 76.055 | 71.147 | 51.552 | 55.531
111 | Negex | | GB | 74.729 | 82.395 | 63.378 | 65.630
112 | TextBlob | | GB | 73.307 | 80.188 | 62.206 | 63.812
113 | Synonym | | GB | 69.059 | 75.168 | 66.867 | 67.230
114 | Synonym-only | | GB | 69.059 | 75.168 | 66.867 | 67.230

Appendix A.2. SMOTE Model Performance

Table A2 represents model performance applying SMOTE imbalance technique in this experiment.
Table A2. Model performance with SMOTE.

No. | Negation Handling | Base Model | Final Prediction | Accuracy | Precision | Recall | F1-Score
1 | Hybrid | | BERT | 98.582 | 98.196 | 98.189 | 98.193
2 | Hybrid | | RoBERTa | 98.486 | 98.060 | 98.098 | 98.079
3 | Hybrid | | MLP | 98.098 | 98.101 | 98.097 | 98.098
4 | Hybrid | MLP + GB | LogReg | 98.073 | 98.073 | 98.074 | 98.072
5 | Hybrid | MLP | LinearSVC | 98.067 | 98.072 | 98.066 | 98.068
6 | Hybrid | MLP + LinearSVC | Voting | 98.049 | 98.053 | 98.049 | 98.048
7 | Hybrid | MLP + LinearSVC | Stacking | 98.022 | 98.030 | 98.023 | 98.022
8 | Hybrid | MLP + LinearSVC + LR | Stacking | 97.992 | 97.995 | 97.992 | 97.991
9 | Hybrid | MLP + LR | LinearSVC | 97.943 | 97.950 | 97.943 | 97.943
10 | Hybrid | MLP + GB | LinearSVC | 97.932 | 97.943 | 97.931 | 97.932
11 | Hybrid | LinearSVC + GB | MLP | 97.905 | 97.909 | 97.907 | 97.903
12 | Hybrid | LinearSVC | MLP | 97.905 | 97.909 | 97.907 | 97.903
13 | Hybrid | | LinearSVC | 97.903 | 97.907 | 97.905 | 97.901
14 | Hybrid | LinearSVC + LR | MLP | 97.903 | 97.907 | 97.903 | 97.900
15 | Hybrid | MLP + LinearSVC + LR | Voting | 97.901 | 97.910 | 97.901 | 97.901
16 | Hybrid | MLP + LinearSVC | LogReg | 97.866 | 97.874 | 97.867 | 97.864
17 | Synonym | | BERT | 97.796 | 97.737 | 97.713 | 97.725
18 | TextBlob | | BERT | 97.438 | 97.092 | 96.652 | 96.871
19 | Synonym | | RoBERTa | 97.355 | 97.209 | 97.312 | 97.261
20 | TextBlob | | RoBERTa | 97.320 | 96.913 | 96.442 | 96.677
21 | Negex | | BERT | 96.531 | 95.799 | 95.026 | 95.411
22 | Negex | | RoBERTa | 96.320 | 95.291 | 94.836 | 95.063
23 | Negex | MLP + LinearSVC | LogReg | 95.720 | 95.724 | 95.725 | 95.716
24 | Negex | MLP + LR | LinearSVC | 95.688 | 95.689 | 95.692 | 95.684
25 | Negex | MLP + LinearSVC + LR | Voting | 95.671 | 95.677 | 95.676 | 95.665
26 | Negex | | LinearSVC | 95.605 | 95.646 | 95.615 | 95.593
27 | Negex | LinearSVC + GB | MLP | 95.605 | 95.646 | 95.615 | 95.593
28 | Negex | LinearSVC | MLP | 95.605 | 95.646 | 95.615 | 95.593
29 | Negex | LinearSVC + LR | MLP | 95.605 | 95.646 | 95.615 | 95.593
30 | Negex | MLP + LinearSVC + LR | Stacking | 95.588 | 95.594 | 95.592 | 95.583
31 | TextBlob | MLP + GB | LogReg | 95.477 | 95.508 | 95.478 | 95.467
32 | TextBlob | MLP + LinearSVC + LR | Voting | 95.460 | 95.477 | 95.462 | 95.450
33 | Negex | MLP + LinearSVC | Voting | 95.454 | 95.465 | 95.456 | 95.449
34 | Negex | MLP + GB | LogReg | 95.452 | 95.464 | 95.455 | 95.446
35 | Negex | MLP + GB | LinearSVC | 95.450 | 95.461 | 95.450 | 95.442
36 | TextBlob | MLP + LR | LinearSVC | 95.423 | 95.425 | 95.423 | 95.419
37 | Negex | MLP + LinearSVC | Stacking | 95.371 | 95.396 | 95.377 | 95.363
38 | Negex | | MLP | 95.354 | 95.381 | 95.357 | 95.349
39 | Negex | MLP | LinearSVC | 95.346 | 95.352 | 95.349 | 95.341
40 | TextBlob | MLP + GB | LinearSVC | 95.346 | 95.362 | 95.347 | 95.341
41 | TextBlob | MLP | LinearSVC | 95.329 | 95.376 | 95.330 | 95.318
42 | TextBlob | MLP + LinearSVC + LR | Stacking | 95.325 | 95.328 | 95.325 | 95.319
43 | TextBlob | MLP + LinearSVC | Stacking | 95.279 | 95.289 | 95.280 | 95.274
44 | Hybrid | | LR | 95.224 | 95.313 | 95.237 | 95.214
45 | TextBlob | | MLP | 95.164 | 95.189 | 95.166 | 95.155
46 | TextBlob | MLP + LinearSVC | Voting | 95.149 | 95.176 | 95.147 | 95.144
47 | TextBlob | MLP + LinearSVC | LogReg | 95.126 | 95.126 | 95.125 | 95.119
48 | Synonym-only | | BERT | 94.822 | 94.678 | 94.705 | 94.692
49 | Synonym-only | | RoBERTa | 94.576 | 94.429 | 94.463 | 94.446
50 | TextBlob | LinearSVC + GB | MLP | 94.134 | 94.183 | 94.140 | 94.118
51 | TextBlob | LinearSVC | MLP | 94.134 | 94.183 | 94.140 | 94.118
52 | TextBlob | LinearSVC + LR | MLP | 94.134 | 94.179 | 94.134 | 94.113
53 | TextBlob | | LinearSVC | 94.134 | 94.183 | 94.140 | 94.118
54 | Synonym | LinearSVC + GB | MLP | 93.434 | 93.450 | 93.457 | 93.430
55 | Synonym | LinearSVC | MLP | 93.434 | 93.450 | 93.457 | 93.430
56 | Synonym | LinearSVC + LR | MLP | 93.434 | 93.450 | 93.457 | 93.430
57 | Synonym | | LinearSVC | 93.434 | 93.450 | 93.457 | 93.430
58 | Zero-Shot | MLP + LR | LinearSVC | 92.959 | 92.956 | 92.947 | 92.924
59 | Synonym | MLP + GB | LogReg | 92.764 | 92.769 | 92.779 | 92.768
60 | Synonym | MLP + LinearSVC | Voting | 92.761 | 92.775 | 92.779 | 92.762
61 | Zero-Shot | MLP | LinearSVC | 92.757 | 92.782 | 92.745 | 92.720
62 | Synonym-only | | MLP | 92.757 | 92.778 | 92.774 | 92.756
63 | Synonym | MLP + LinearSVC + LR | Voting | 92.621 | 92.623 | 92.637 | 92.624
64 | Zero-Shot | MLP + LinearSVC | LogReg | 92.612 | 92.673 | 92.600 | 92.569
65 | Synonym | MLP + LinearSVC + LR | Stacking | 92.605 | 92.613 | 92.620 | 92.608
66 | Zero-Shot | MLP + LinearSVC | Stacking | 92.583 | 92.651 | 92.570 | 92.543
67 | Synonym | MLP + GB | LinearSVC | 92.573 | 92.591 | 92.584 | 92.582
68 | Zero-Shot | MLP + LinearSVC | Voting | 92.572 | 92.649 | 92.573 | 92.543
69 | Synonym | | MLP | 92.571 | 92.574 | 92.571 | 92.565
70 | Zero-Shot | MLP + LinearSVC + LR | Voting | 92.557 | 92.581 | 92.545 | 92.515
71 | Zero-Shot | | MLP | 92.509 | 92.644 | 92.497 | 92.460
72 | Synonym | MLP + LR | LinearSVC | 92.508 | 92.511 | 92.530 | 92.505
73 | Zero-Shot | MLP + LinearSVC + LR | Stacking | 92.503 | 92.507 | 92.490 | 92.468
74 | Synonym | MLP | LinearSVC | 92.498 | 92.516 | 92.519 | 92.495
75 | Zero-Shot | MLP + GB | LogReg | 92.478 | 92.579 | 92.465 | 92.431
76 | Zero-Shot | MLP + GB | LinearSVC | 92.473 | 92.506 | 92.461 | 92.431
77 | Synonym | MLP + LinearSVC | Stacking | 92.373 | 92.386 | 92.393 | 92.374
78 | Synonym | MLP + LinearSVC | LogReg | 92.346 | 92.353 | 92.362 | 92.351
79 | Negex | | LR | 92.224 | 92.412 | 92.242 | 92.199
80 | Synonym-only | LinearSVC + GB | MLP | 91.837 | 91.964 | 91.855 | 91.808
81 | Synonym-only | LinearSVC | MLP | 91.837 | 91.964 | 91.855 | 91.808
82 | Synonym-only | LinearSVC + LR | MLP | 91.837 | 91.964 | 91.855 | 91.808
83 | Synonym-only | | LinearSVC | 91.837 | 91.964 | 91.855 | 91.808
84 | Synonym-only | MLP + LinearSVC | Voting | 91.644 | 91.671 | 91.645 | 91.619
85 | Synonym-only | MLP + LinearSVC | Stacking | 91.642 | 91.655 | 91.654 | 91.625
86 | Synonym-only | MLP + LinearSVC | LogReg | 91.617 | 91.620 | 91.627 | 91.604
87 | Zero-Shot | LinearSVC + LR | MLP | 91.601 | 91.579 | 91.586 | 91.544
88 | Zero-Shot | LinearSVC | MLP | 91.593 | 91.566 | 91.578 | 91.533
89 | Zero-Shot | LinearSVC + GB | MLP | 91.584 | 91.556 | 91.570 | 91.524
90 | Zero-Shot | | LinearSVC | 91.584 | 91.556 | 91.570 | 91.524
91 | Synonym-only | MLP + LinearSVC + LR | Voting | 91.480 | 91.496 | 91.492 | 91.463
92 | Zero-Shot | | RF | 91.450 | 91.819 | 91.430 | 91.403
93 | Synonym-only | MLP | LinearSVC | 91.323 | 91.328 | 91.334 | 91.308
94 | Synonym-only | MLP + GB | LinearSVC | 91.002 | 91.012 | 91.013 | 90.991
95 | Synonym-only | MLP + LinearSVC + LR | Stacking | 90.927 | 90.923 | 90.935 | 90.923
96 | Synonym-only | MLP + LR | LinearSVC | 90.918 | 90.950 | 90.931 | 90.909
97 | Synonym-only | MLP + GB | LogReg | 90.813 | 90.828 | 90.825 | 90.796
98 | TextBlob | | LR | 90.729 | 90.886 | 90.742 | 90.691
99 | Hybrid | | RF | 90.378 | 91.089 | 90.401 | 90.395
100 | Zero-Shot | | LR | 89.939 | 89.802 | 89.919 | 89.843
101 | Synonym | | LR | 89.814 | 89.960 | 89.849 | 89.799
102 | Negex | | RF | 89.523 | 90.382 | 89.551 | 89.510
103 | Zero-Shot | | RoBERTa | 89.210 | 79.138 | 74.186 | 76.582
104 | TextBlob | | RF | 88.493 | 89.271 | 88.510 | 88.440
105 | Zero-Shot | | BERT | 88.105 | 74.060 | 69.882 | 71.911
106 | Synonym-only | | LR | 88.066 | 88.441 | 88.089 | 88.026
107 | Synonym-only | | RF | 82.190 | 82.650 | 82.216 | 82.146
108 | Synonym | | RF | 81.451 | 81.890 | 81.491 | 81.430
109 | Hybrid | | GB | 79.936 | 83.601 | 80.015 | 79.865
110 | Negex | | GB | 76.454 | 80.663 | 76.523 | 76.218
111 | Zero-Shot | | GB | 76.422 | 77.024 | 76.408 | 76.360
112 | TextBlob | | GB | 75.219 | 78.904 | 75.267 | 75.008
113 | Synonym | | GB | 70.059 | 72.626 | 70.176 | 69.854
114 | Synonym-only | | GB | 69.121 | 75.018 | 69.193 | 68.777

Appendix B. Software & Reproducibility Artifacts

Software versions. Python 3.12.12; PyTorch 2.9; Transformers 4.57.3; scikit-learn 1.6.1; imbalanced-learn 0.14.1; spaCy 3.8.11 (en_core_web_sm 3.8); TextBlob 0.19.0; NumPy 2.0.2; pandas 2.2.2.
Artifacts. Upon acceptance, we plan to release preprocessing scripts, training configs, fold splits, seed files, and a notebook that regenerates Table X and Figure X from saved per-fold outputs.

Appendix B.1. Ablation Study

What we ablate. Starting from the full Hybrid method (rules for explicit/implicit/double negation + dependency-scope modeling + TextBlob clause priors + MLP clause-weighting), we create progressively simplified variants:
  • Hybrid (full): all components enabled;
  • – MLP weights: remove the MLP; use fixed rule weights for clauses;
  • – Implicit & double: keep explicit negation and dependency scopes, but drop implicit (e.g., "hardly" and "barely") and double-negation handling;
  • – Dependency scopes: apply only sentence-local rules (no dependency-based scope), with TextBlob priors;
  • TextBlob baseline: polarity from TextBlob only (no explicit negation rules).
Protocol. We used the same stratified 5-fold CV, preprocessing, and SMOTE placement (training folds only). Metrics are macro-averaged over folds. We tested Hybrid (full) against each ablation using paired Wilcoxon signed-rank tests (one-sided, α = 0.01) and adjusted p-values with Holm–Bonferroni.
Ablation on the Hybrid negation model (BERT backbone, SMOTE in training folds only). Bars show mean macro-F1 across 5 folds; error bars are standard error. The full Hybrid method significantly outperforms every ablation (paired Wilcoxon, Holm–Bonferroni-adjusted p < 0.01).
Table A3. Ablation results (macro-averaged across 5 folds). Replace "…" with the means ± SE after reruns.

Variant | Acc. (Mean ± SE) | Prec. (Mean ± SE) | Rec. (Mean ± SE) | F1 (Mean ± SE)
Hybrid (full) | 98.403 ± … | 97.968 ± … | 97.883 ± … | 97.926 ± …
– MLP weights | … | … | … | …
– Implicit & double | … | … | … | …
– Dependency scopes | … | … | … | …
TextBlob baseline | 97.438 ± … | 97.092 ± … | 96.652 ± … | 96.871 ± …

Appendix B.2. Reproducibility & Implementation Details

Data & labels. We used a public Twitter corpus of 156,539 posts (26 September 2021–27 March 2022) with three classes (positive/neutral/negative). We removed retweets, expanded contractions (e.g., aren't → are not), and stripped URLs, punctuation, numbers, and non-text artifacts. Class ratios are reported in Appendix A; original tweet IDs are retained for inspection.
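The cleaning steps above can be sketched with a short stdlib-only function. The contraction map here is a small illustrative sample, not the full list used in the paper:

```python
import re

# Illustrative tweet-cleaning sketch following the steps described above.
# The contraction map is a small sample; the paper's full map is larger.
CONTRACTIONS = {"aren't": "are not", "isn't": "is not", "don't": "do not"}

def clean_tweet(text: str) -> str:
    text = text.lower()
    for contraction, expansion in CONTRACTIONS.items():
        text = text.replace(contraction, expansion)     # expand contractions
    text = re.sub(r"https?://\S+", " ", text)           # strip URLs first
    text = re.sub(r"[^a-z\s]", " ", text)               # strip punctuation/numbers
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace

print(clean_tweet("These masks aren't working!! 100% https://t.co/abc"))
# -> "these masks are not working"
```

Expanding contractions before stripping punctuation is what preserves the negation cue "not", which a naive stop-word or punctuation pass would otherwise destroy.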
Splits and leakage control. We performed stratified 5-fold cross-validation. All preprocessing, vectorization, model fitting, and SMOTE oversampling occurred inside each training fold only; validation/test folds were never oversampled. Transformer fine-tuning used an internal 9:1 split of the training fold for early stopping.
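The leakage control above can be sketched as follows. Simple random duplication stands in for SMOTE so the example needs no extra dependency; the data are synthetic:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Sketch of the leakage control described above: oversampling happens inside
# each training fold only. Random duplication stands in for SMOTE here.
def oversample(X, y, rng):
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = np.concatenate([
        np.concatenate([np.flatnonzero(y == c),
                        rng.choice(np.flatnonzero(y == c), target - n)])
        for c, n in zip(classes, counts)
    ])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.array([0] * 40 + [1] * 20)  # imbalanced synthetic labels

for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    X_tr, y_tr = oversample(X[train_idx], y[train_idx], rng)  # training fold only
    X_te, y_te = X[test_idx], y[test_idx]                     # never oversampled
    assert np.bincount(y_tr)[0] == np.bincount(y_tr)[1]       # balanced train fold
```

Because the test fold is indexed before any oversampling, no synthetic (or duplicated) sample can leak into evaluation, which is the property the protocol above enforces.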
Hybrid implementation. We parsed with spaCy to extract dependency scopes (main, relative, temporal, conditional, causal, and contrast). TextBlob provides clause-level polarity priors. Rules handle explicit, implicit, and double negation. A 2-layer MLP maps clause features (scope type, depth, length, and negation cues) to [0, 1] weights via a sigmoid; the sentence score is the weighted, sign-corrected aggregation of clause polarities. The MLP was trained per training fold with Adam, cross-entropy loss, and early stopping (patience = 3).
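The clause-weighting aggregation described above can be sketched numerically. The feature encoding and the (random) weights are illustrative; the paper trains the MLP per fold:

```python
import numpy as np

# Sketch of the MLP clause-weighting aggregation described above.
# Features and weights are illustrative; the paper learns the MLP per fold.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sentence_score(clause_feats, clause_polarity, negated, W1, b1, W2, b2):
    h = np.maximum(0.0, clause_feats @ W1 + b1)   # hidden layer (ReLU)
    w = sigmoid(h @ W2 + b2).ravel()              # clause weights in [0, 1]
    signs = np.where(negated, -1.0, 1.0)          # sign-correct negated clauses
    return float(np.sum(w * signs * clause_polarity) / np.sum(w))

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))                   # 3 clauses x 4 features
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

score = sentence_score(feats, np.array([0.8, -0.2, 0.5]),
                       np.array([False, False, True]), W1, b1, W2, b2)
print(-1.0 <= score <= 1.0)  # weighted average stays in the polarity range
```

Because the weights are positive and normalized, the sentence score is a convex combination of sign-corrected clause polarities, so it stays within the polarity range regardless of the learned parameters.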
Backbones & baselines. We compared the following six negation strategies: Hybrid, antonym–synonym, antonym–synonym + rule, TextBlob, NegEx + spaCyTextBlob, and zero-shot. Classifiers include BERT, RoBERTa, LinearSVC, Logistic Regression, MLP, Random Forest, and gradient boosting. The best pipeline is Hybrid + BERT + SMOTE with 98.403 Acc/97.968 Prec/97.883 Rec/97.926 F1 (macro).
Hyperparameters. Tokenization/FT: bert-base-uncased/roberta-base, max_len = 128, batch = 32, lr = 2 × 10−5, epochs = 3–5, early stopping on validation loss. MLP (clause weights): hidden = [64, 32], dropout = 0.2, lr = 1 × 10−3, batch = 64, epochs = 10–20, early stopping. Classical models: scikit-learn defaults unless noted; C tuned over {0.5, 1, 2} for LinearSVC/LogReg via inner CV on the training fold. SMOTE: k_neighbors = 5 via imbalanced-learn; applied after vectorization within the training fold.
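The inner-CV tuning of C can be sketched with scikit-learn as below; the toy random features stand in for the vectorized tweets of the current training fold, and the variable names are ours.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import LinearSVC

# Toy stand-in features; in the paper, X is vectorized tweet text
# drawn from the current training fold only.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))
y = rng.integers(0, 3, size=120)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    LinearSVC(max_iter=5000),
    param_grid={"C": [0.5, 1, 2]},   # grid from the text
    cv=inner_cv,
    scoring="f1_macro",
)
search.fit(X, y)
best_C = search.best_params_["C"]
```

The same pattern applies to Logistic Regression; all other classical models keep scikit-learn defaults.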
Statistics. We report macro accuracy, precision, recall, and F1. For inferential testing, we use paired Wilcoxon signed rank tests (one-sided, α = 0.01) on per-fold scores to compare (i) SMOTE vs. No-SMOTE within a pipeline and (ii) Hybrid vs. each baseline. We adjusted family-wise error with Holm–Bonferroni and report W, z, and adjusted p-values.
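The Holm–Bonferroni step-down adjustment can be sketched in plain Python; the per-fold p-values themselves would come from a Wilcoxon signed-rank test (e.g., scipy.stats.wilcoxon with a one-sided alternative), and the function name here is ours.

```python
def holm_bonferroni(pvals, alpha=0.01):
    """Holm step-down: sort p-values ascending, multiply the i-th smallest
    by (m - i), enforce monotone non-decreasing adjusted values, cap at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, idx in enumerate(order):
        running = max(running, min((m - rank) * pvals[idx], 1.0))
        adjusted[idx] = running               # report in the original order
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject
```

A hypothesis is declared significant only if its adjusted p-value clears α, which controls the family-wise error rate across the SMOTE and baseline comparisons.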
Compute & seeds. Experiments were run on a single NVIDIA GPU (e.g., RTX 3090) or CPU. We fixed seeds for NumPy, PyTorch, and scikit-learn and set PYTHONHASHSEED = 0. For CUDA determinism, we disabled non-deterministic CuDNN kernels where applicable; remaining nondeterminism was negligible (<0.02 pp on macro-F1 across repeats).

References

  1. McClain, C.; Vogels, E.A.; Perrin, A.; Sechopoulos, S.; Rainie, L. The Internet and the Pandemic; Pew Research Center: Washington, DC, USA, 2021. [Google Scholar]
  2. Horrigan, J.B. The Internet as a Resource for News and Information about Science; Pew Research Center: Washington, DC, USA, 2006. [Google Scholar]
  3. Diamantini, C.; Mircoli, A.; Potena, D. A Negation Handling Technique for Sentiment Analysis. In Proceedings of the International Conference on Collaboration Technologies and Systems (CTS), Orlando, FL, USA, 31 October–4 November 2016. [Google Scholar]
  4. Wong, A.; Ho, S.; Lyness, D.; Olusanya, O.; Antonini, M.V. The use of social media and online communications in times of pandemic COVID-19. J. Intensive Care Soc. 2020, 22, 255–260. [Google Scholar] [CrossRef]
  5. Dean, B. X (Twitter) Statistics: How Many People Use X? 30 January 2025. [Online]. Available online: https://backlinko.com/twitter-users (accessed on 2 March 2025).
  6. Tao, D.; Yang, P.; Feng, H. Utilization of text mining as a big data analysis tool for food science and nutrition. Compr. Rev. Food Sci. Food Saf. 2020, 19, 875–894. [Google Scholar] [CrossRef]
  7. Oyebode, O.; Ndulue, C.; Adib, A.; Mulchandani, D.; Suruliraj, B.; Orji, F.A.; Chambers, C.T.; Meier, S.; Orji, R. Health, Psychosocial, and Social Issues Emanating From the COVID-19 Pandemic Based on Social Media Comments: Text Mining and Thematic Analysis Approach. J. Med. Internet Res. 2021, 9, 22734. [Google Scholar] [CrossRef]
  8. Hassan, A.; Mahmood, A. Deep Learning approach for sentiment analysis of short texts. In Proceedings of the 3rd International Conference on Control, Automation and Robotics (ICCAR), Nagoya, Japan, 22–24 April 2017. [Google Scholar]
  9. Mahmud, Q.I.; Mohaimen, A.; Islam, M.S.; Jannat, M.-E. A support vector machine mixed with statistical reasoning approach to predict movie success by analyzing public sentiments. In Proceedings of the 20th International Conference of Computer and Information Technology (ICCIT), Dhaka, Bangladesh, 22–24 December 2017. [Google Scholar]
  10. Reis, J.C.S.; Correia, A.; Murai, F.; Veloso, A.; Benevenuto, F. Supervised Learning for Fake News Detection. IEEE Intell. Syst. 2019, 43, 76–81. [Google Scholar] [CrossRef]
  11. Qorib, M.; Kim, J. Fake Tweets Detection and Its Impacts on the 2020 U.S. Election Prediction. In Proceedings of the 2024 International Symposium on Networks, Computers and Communications (ISNCC), Washington, DC, USA, 22–25 October 2024. [Google Scholar]
  12. Mahata, D.; Friedrichs, J.; Shah, R.R.; Jiang, J. Detecting personal intake of medicine from Twitter. IEEE Intell. Syst. 2018, 33, 87–95. [Google Scholar] [CrossRef]
  13. Branz, L.; Brockmann, P. Sentiment Analysis of Twitter Data: Towards Filtering, Analyzing and Interpreting Social Network Data. In Proceedings of the DEBS ’18: The 12th ACM International Conference on Distributed and Event-based Systems, Hamilton, New Zealand, 25–29 June 2018. [Google Scholar]
  14. Carrillo-de-Albornoz, J.; Vidal, J.R.; Plaza, L. Feature engineering for sentiment analysis in e-health forums. PLoS ONE 2018, 13, e0207996. [Google Scholar]
  15. Cambria, E. Affective Computing and Sentiment Analysis. IEEE Intell. Syst. 2016, 31, 102–107. [Google Scholar] [CrossRef]
  16. Qorib, M.; Oladunni, T.; Denis, M.; Ososanya, E.; Cotae, P. COVID-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset. Expert Syst. Appl. 2023, 212, 118715. [Google Scholar] [CrossRef]
  17. Singh, T.; Kumari, M. Role of Text Pre-Processing in Twitter Sentiment Analysis. Procedia Comput. Sci. 2016, 89, 549–554. [Google Scholar] [CrossRef]
  18. Palomino, M.A.; Aider, F. Evaluating the Effectiveness of Text Pre-Processing in Sentiment Analysis. Appl. Sci. 2022, 12, 8765. [Google Scholar] [CrossRef]
  19. Firza, N.; Bakiu, A.; Monaco, A. Machine Learning for Quality Diagnostics: Insights into Consumer Electronics Evaluation. Electronics 2025, 14, 939. [Google Scholar] [CrossRef]
  20. Krouska, A.; Troussas, C.; Virvou, M. The effect of preprocessing techniques on Twitter Sentiment Analysis. In Proceedings of the 2016 7th International Conference on Information, Intelligence, Systems & Applications (IISA), Chalkidiki, Greece, 13–15 July 2016. [Google Scholar]
  21. Alam, S.; Yao, N. The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis. Comput. Math. Organ. Theory 2019, 25, 319–335. [Google Scholar] [CrossRef]
  22. Prakriti. Removing Stop Words Using NLTK. 26 March 2024. [Online]. Available online: https://www.naukri.com/code360/library/removing-stop-words-using-nltk (accessed on 2 March 2025).
  23. Okpala, I.; Rodriguez, G.R.; Tapia, A.; Halse, S.; Kropczynski, J. A Semantic Approach to Negation Detection and Word Disambiguation with Natural Language Processing. In Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval, Bangkok, Thailand, 17–19 December 2023. [Google Scholar]
  24. Gupta, I.; Joshi, N. Enhanced Twitter Sentiment Analysis Using Hybrid Approach and by Accounting Local Contextual Semantic. J. Intell. Syst. 2019, 29, 1611–1625. [Google Scholar] [CrossRef]
  25. Mukherjee, P.; Badr, Y.; Doppalapudi, S.; Srinivasan, S.M.; Sangwan, R.S.; Sharma, R. Effect of Negation in Sentences on Sentiment Analysis and Polarity Detection. Procedia Comput. Sci. 2021, 185, 370–379. [Google Scholar] [CrossRef]
  26. Gupta, I.; Joshi, N. Feature-Based Twitter Sentiment Analysis With Improved Negation Handling. IEEE Trans. Comput. Soc. Syst. 2021, 8, 917–927. [Google Scholar] [CrossRef]
  27. Ferdinand, W.; Girsang, A.S. Negation Handling on XLNet Using Dependency Parser for Sentiment Analysis. In Proceedings of the 2024 International Conference on Intelligent Cybernetics Technology & Applications (ICICyTA), Bali, Indonesia, 17–19 December 2024. [Google Scholar]
  28. Russo, I.; Caselli, T.; Strapparava, C. SemEval-2015 Task 9: CLIPEval Implicit Polarity of Events. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, CO, USA, 4–5 June 2015. [Google Scholar]
  29. Singh, P.K.; Paul, S. Deep Learning Approach for Negation Handling in Sentiment Analysis. IEEE Access 2021, 9, 102579–102592. [Google Scholar] [CrossRef]
  30. Hyscaler. Sentiment Analysis: A Step-by-Step Guide. 20 August 2024. [Online]. Available online: https://hyscaler.com/insights/sentiment-analysis-guide/ (accessed on 3 December 2025).
  31. Song, C.; Zhang, Y.; Gao, H.; Yao, B.; Zhang, P. Large Language Models for Subjective Language Understanding: A Survey. arXiv 2025, arXiv:2508.07959. [Google Scholar] [CrossRef]
  32. Hii, D. Using Meaning Specificity to Aid Negation Handling in Sentiment Analysis. Master’s Thesis, University of California at Irvine, Irvine, CA, USA, 2019. [Google Scholar]
  33. Naldi, M.; Petroni, S. A Testset-Based Method to Analyse the Negation-Detection Performance of Lexicon-Based Sentiment Analysis Tools. Computers 2023, 12, 18. [Google Scholar] [CrossRef]
  34. Farooq, U.; Mansoor, H.; Nongaillard, A.; Ouzrout, Y.; Qadir, M.A. Negation Handling in Sentiment Analysis at Sentence Level. J. Comput. 2017, 12, 470–478. [Google Scholar] [CrossRef]
  35. Díaz, N.P.C. Negation and Speculation Detection for Improving Information Retrieval Effectiveness. In Proceedings of the Fifth BCS-IRSG Symposium on Future Directions in Information Access (FDIA 2013), 3 September 2013. [Google Scholar]
  36. Kamal, L.H.; McKee, G.T.; Othman, N.A. Naïve Bayes with Negation Handling for Sentiment Analysis of Twitter Data. In Proceedings of the 2022 9th Intl. Conference on Soft Computing & Machine Intelligence, Toronto, ON, Canada, 26–27 November 2022. [Google Scholar]
  37. Ghag, K.V.; Shah, K. Negation Handling for Sentiment Classification. In Proceedings of the 2016 International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 12–13 August 2016. [Google Scholar]
  38. Carranza Tejada, G.N.; Scholtes, J.C.; Spanakis, G. A Study of BERT’s Processing of Negations to Determine Sentiment; Maastricht University: Maastricht, The Netherlands, 2021. [Google Scholar]
  39. Lal, U.; Kamath, P. Effective Negation Handling Approach for Sentiment Classification using synsets in the WordNet lexical database. In Proceedings of the 2022 First International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT), Trichy, India, 16–18 February 2022. [Google Scholar]
  40. Abadi, V.N.M.; Ghasemian, F. Enhancing Persian text summarization through a three-phase fine-tuning and reinforcement learning approach with the mT5 transformer model. Sci. Rep. 2025, 15, 80. [Google Scholar] [CrossRef]
  41. Sun, Y.; Zhang, J.; Yu, Z.; Zhang, Y.; Liu, Z. Bidirectional Long Short-term Neural Network Based on the Attention Mechanism of the Residual Neural Network (ResNet–BiLSTM–Attention) Predicts Porosity through Well Logging Parameters. ACS Omega 2023, 8, 24083–24092. [Google Scholar] [CrossRef]
  42. Qorib, M.; Oladunni, T.; Denis, M.; Osasanya, E.; Cotae, P. COVID-19 Vaccine Hesitancy: A Global Public Health and Risk Modelling Framework Using an Environmental Deep Neural Network, Sentiment Classification with Text Mining and Emotional Reactions from COVID-19 Vaccination Tweets. Int. J. Environ. Res. Public Health 2023, 20, 5803. [Google Scholar] [CrossRef]
  43. Qorib, M. COVID-19 Public Tweets; ResearchGate: Washington, DC, USA, 2022; Available online: https://www.researchgate.net/publication/364110620_Covid19_Public_Tweets?channel=doi&linkId=633a5ad876e39959d6903819&showFulltext=true (accessed on 2 March 2025).
  44. HaCohen-Kerner, Y.; Miller, D.; Yigal, Y. The influence of preprocessing on text classification using a bag-of-words representation. PLoS ONE 2020, 15, e0232525. [Google Scholar] [CrossRef] [PubMed]
  45. Uysal, A.K.; Gunal, S. The impact of preprocessing on text classification. Inf. Process. Manag. 2014, 50, 104–112. [Google Scholar] [CrossRef]
  46. Vijayarani, S.; Ilamathi, J.; Nithya, N.S. Preprocessing Techniques for Text Mining—An Overview. Int. J. Comput. Sci. Commun. Netw. 2015, 5, 7–16. [Google Scholar]
  47. Chapman, W.W.; Bridewell, W.; Hanbury, P.; Cooper, G.F.; Buchanan, B.G. A simple algorithm for identifying negated findings and diseases in discharge summaries. J. Biomed. Inform. 2001, 34, 301–310. [Google Scholar] [CrossRef] [PubMed]
  48. Romera-Paredes, B.; Torr, P. An embarrassingly simple approach to zero-shot learning. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015. [Google Scholar]
  49. Lampert, C.H.; Nickisch, H.; Harmeling, S. Learning to detect unseen object classes by between-class attribute transfer. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009. [Google Scholar]
  50. Larochelle, H.; Erhan, D.; Bengio, Y. Zero-data learning of new tasks. AAAI 2008, 1, 3. [Google Scholar]
  51. Nam, J.; Mencía, E.L.; Fürnkranz, J. All-in Text: Learning Document, Label, and Word Representations Jointly. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
  52. Pushp, P.K.; Srivastava, M.M. Train once, test anywhere: Zero-shot learning for text classification. In Proceedings of the ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  53. Ma, Y.; Cambria, E.; Gao, S. Label embedding for zero-shot fine-grained named entity typing. In Proceedings of the COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016. [Google Scholar]
  54. Levy, O.; Seo, M.; Zettlemoyer, L. Zero-Shot Relation Extraction via Reading Comprehension. arXiv 2017, arXiv:1706.04115. [Google Scholar] [CrossRef]
  55. Xiang, M.; Grove, J.; Giannakidou, A. Semantic and pragmatic processes in the comprehension of negation: An event related potential study of negative polarity sensitivity. J. Neurolinguist. 2016, 38, 71–88. [Google Scholar] [CrossRef]
  56. Intellectus Consulting. How to Conduct the Wilcoxon Sign Test. 2025. [Online]. Available online: https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/how-to-conduct-the-wilcox-sign-test/ (accessed on 23 October 2025).
  57. Han, Y.; Joe, I. Enhancing Machine Learning Models Through PCA, SMOTE-ENN, and Stochastic Weighted Averaging. Appl. Sci. 2024, 14, 9772. [Google Scholar] [CrossRef]
  58. Limanto, S.; Buliali, J.L.; Saikhu, A. GLoW SMOTE-D: Oversampling Technique to Improve Prediction Model Performance of Students Failure in Courses. IEEE Access 2024, 12, 8889–8901. [Google Scholar] [CrossRef]
  59. Meng, D.; Li, Y. An imbalanced learning method by combining SMOTE with Center Offset Factor. Appl. Soft Comput. 2022, 120, 108618. [Google Scholar] [CrossRef]
Figure 1. Design of the experiment.
Figure 2. TextBlob negation pseudocode.
Figure 3. NegEx with spaCyTextBlob negation pseudocode.
Figure 4. Antonyms–synonyms-only negation pseudocode.
Figure 5. Zero-shot negation pseudocode.
Figure 6. Setting up a dependency parse tree.
Figure 7. Sample visual representation of a dependency tree.
Figure 8. Setup of rule-based, implicit, and double negation, the MLP model, and nested-clause negation.
Figure 9. Analyzing sentiment using various negations.
Figure 10. Applying an MLP model as a learned weight mechanism.
Figure 11. Control model performance: (a) top 10 model accuracy without SMOTE; (b) top 10 model precision without SMOTE.
Figure 12. Model performance: (a) top 10 model recall without SMOTE; (b) top 10 model F1-score without SMOTE.
Figure 13. Model performance: (a) top 10 model accuracy with SMOTE; (b) top 10 model precision with SMOTE.
Figure 14. Model performance: (a) The top 10 model recall with SMOTE; (b) The top 10 model F1-score with SMOTE.
Figure 15. Negation handling summaries.
Table 1. Overview of negation handling.
| No. | Title | Authors | Data | Model Classifier | Negation Handling | Class | Conclusion |
| 1 | A Study of BERT’s Processing of Negations to Determine Sentiment | Giorgia Nidia Carranza Tejada, Johannes C. Scholtes, and Gerasimos Spanakis | 29,175 tweets | BERT | BERT memorization | Positive and negative | The highest precision of 88%, recall of 89%, and F1-score of 89% [38] |
| 2 | A Negation Handling Technique for Sentiment Analysis | Claudia Diamantini, Alex Mircoli, and Domenico Potena | 597 tweets | Word Sense Disambiguation (WSD) | Switch class/polarity score with dependency parse | Positive, neutral, and negative | The best accuracy is 67%, with an F1-score of 72% [3] |
| 3 | Deep Learning Approach for Negation Handling in Sentiment Analysis | Prakash Kumar Singh and Sanchita Paul | 5520 sentences | CRF, SVM, RNN, LSTM + CRF, LSTM, and BiLSTM | Neural network detection | Negation and non-negation | BiLSTM achieved the highest F1-score of 93.09% [29] |
| 4 | Effect of Negation in Sentences on Sentiment Analysis and Polarity Detection | Partha Mukherjee, Youakim Badr, Shreyesh Doppalapudi, Satish M. Srinivasan, Raghvinder S. Sangwan, and Rahul Sharma | 75,000 reviews | RNN, ANN, SVM, NB, and SentiWordNet | _NEG suffix and scope of negation | Positive and negative | ANN achieved the highest accuracy of 95.67% [25] |
| 5 | Effective Negation Handling Approach for Sentiment Classification Using Synsets in the WordNet Lexical Database | Utkarsh Lal and Priya Kamath | 50,000 reviews | Logistic Regression and Naïve Bayes with BoW, TF-IDF, and word embedding | Similarity and antonym | Positive and negative | Logistic regression with word embedding, lemmatization, and negation handling achieved the highest accuracy of 91.79% [39] |
| 6 | Enhanced Twitter Sentiment Analysis Using Hybrid Approach and by Accounting Local Contextual Semantic | Itisha Gupta and Nisheeth Joshi | 12,597 tweets | RBF-SVM | Switch class/polarity score for implicit and explicit negations | Positive, neutral, and negative | Hybrid RBF-SVM achieved the best accuracy of 58.67% [24] |
| 7 | Feature-Based Twitter Sentiment Analysis With Improved Negation Handling | Itisha Gupta and Nisheeth Joshi | 11,216 tweets | SVM, DT, and NB | _NEG suffix and scope of negation | Positive, neutral, and negative | The SVM model classifier achieved the highest F1-score of 69.5% [26] |
| 8 | Naïve Bayes with Negation Handling for Sentiment Analysis of Twitter Data | Lobna H. Kamal, Gerard T. McKee, and Nermin Abdelhakim Othman | 700,000 tweets | NB | Switch class/polarity score | Positive and negative | Naïve Bayes model classifier with negation handling achieved the best accuracy of 77.57% [36] |
| 9 | Negation Handling in Sentiment Analysis at Sentence Level | Umar Farooq, Hasan Mansoor, Antoine Nongaillard, Yacine Ouzrout, and Muhammad Abdul Qadir | 1000 review sites | WSB, SW, HB, and MP | Scope, diminisher, and morphological negation | Negation and non-negation | Negation identification at sentence level achieved the highest accuracy of 83.3% [34] |
| 10 | Negation Handling for Sentiment Classification | Kranti Vithal Ghag and Ketan Shah | 2000 movie reviews | RTFSC, ARTSC, Delta TFIDF, SentiTFIDF, and traditional classifier | Switch class/polarity score | Positive and negative | SentiTFIDF achieved the highest accuracy of 77.3% [37] |
Table 2. The lowest model accuracies and precisions in the control group.
| Lowest Accuracy | | Lowest Precision | |
| Model | Percent | Model | Percent |
| Synonym + GB | 69.059 | Zero-Shot + MLP | 70.045 |
| Synonym-only + GB | 69.059 | Zero-Shot + GB | 71.147 |
| TextBlob + GB | 73.307 | Zero-Shot + MLP + LinearSVC + Voting | 73.727 |
Table 3. The lowest model recalls and F1-scores in the control group.
| Lowest Recall | | Lowest F1-Score | |
| Model | Percent | Model | Percent |
| Zero-Shot + GB | 51.552 | Zero-Shot + GB | 55.531 |
| Zero-Shot + RF | 58.029 | Zero-Shot + RF | 63.278 |
| Zero-Shot + LR | 59.564 | Zero-Shot + LR | 63.453 |
Table 4. The lowest model accuracies and precisions with SMOTE.
| Lowest Accuracy | | Lowest Precision | |
| Model | Percent | Model | Percent |
| Synonym-only + GB | 69.121 | Synonym + GB | 72.626 |
| Synonym + GB | 70.059 | Zero-Shot + BERT | 74.060 |
| TextBlob + GB | 75.219 | Synonym-only + GB | 75.018 |
Table 5. The lowest model recalls and F1-scores with SMOTE.
| Lowest Recall | | Lowest F1-Score | |
| Model | Percent | Model | Percent |
| Synonym-only + GB | 69.193 | Synonym-only + GB | 68.777 |
| Zero-Shot + BERT | 69.882 | Synonym + GB | 69.854 |
| Synonym + GB | 70.176 | Zero-Shot + BERT | 71.911 |
Table 6. Ablation study model performance.
| Treatments | Accuracy | Precision | Recall | F1-Score |
| No Scope | 98.179 | 97.759 | 97.595 | 97.677 |
| No Double | 98.342 | 97.755 | 97.972 | 97.863 |
| No Implicit | 98.397 | 97.936 | 97.994 | 97.965 |
| Uniform Weight | 98.403 | 97.901 | 97.823 | 97.862 |
| Full Hybrid | 98.582 | 98.196 | 98.189 | 98.193 |
Table 7. Comparison of the Hybrid with LLMs.
| Model | Hybrid Accuracy | Hybrid Precision | Hybrid Recall | Hybrid F1-Score | LLM Accuracy | LLM Precision | LLM Recall | LLM F1-Score |
| BERT | 98.582 | 98.196 | 98.189 | 98.193 | 91.021 | 88.747 | 89.888 | 89.314 |
| RoBERTa | 98.486 | 98.060 | 98.098 | 98.079 | 92.944 | 89.875 | 93.831 | 91.811 |
| MLP | 98.098 | 98.101 | 98.097 | 98.098 | 93.344 | 93.360 | 93.318 | 93.279 |
| MLP + GB + LogReg | 98.073 | 98.073 | 98.074 | 98.072 | 93.764 | 93.734 | 93.741 | 93.714 |
| MLP + LinearSVC | 98.067 | 98.072 | 98.066 | 98.068 | 93.855 | 93.808 | 93.833 | 93.807 |
Table 8. Descriptive statistics of the control and SMOTE accuracy.
| Descriptive Statistics | Control | SMOTE |
| Mean | 90.379 | 92.764 |
| Standard Error | 0.569 | 0.500 |
| Median | 92.254 | 93.197 |
| Mode | 93.193 | 94.134 |
| Standard Deviation | 6.074 | 5.340 |
| Sample Variance | 36.896 | 28.514 |
| Kurtosis | 1.990 | 7.124 |
| Skewness | −1.390 | −2.417 |
| Range | 29.203 | 29.461 |
| Minimum | 69.059 | 69.121 |
| Maximum | 98.262 | 98.582 |
| Sum | 10,303.155 | 10,575.119 |
| Count | 114 | 114 |
Table 9. Z-scores between the control and SMOTE method.
| Metric | Accuracy | Precision | Recall | F1-Score |
| Z-score | 8.186723131 | 7.892674018 | 8.729583031 | 8.636278986 |
Table 10. Z-scores between hybrid negation and the other methods.
| Hybrid vs. | NegEx | Synonym w/2nd Rule-Based | Synonym-Only | TextBlob | Zero-Shot |
| Accuracy | 5.344 | 5.373 | 5.373 | 5.373 | 5.344 |
| Precision | 5.373 | 5.373 | 5.373 | 5.373 | 4.851 |
| Recall | 3.024 | 5.243 | 5.330 | 5.373 | 5.359 |
| F1-score | 5.344 | 5.373 | 5.373 | 5.373 | 5.359 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Qorib, M.; Cotae, P. Hybrid Negation: Enhancing Sentiment Analysis for Complex Sentences. Appl. Sci. 2026, 16, 1000. https://doi.org/10.3390/app16021000

