Article

Analyzing Global Attitudes Towards ChatGPT via Ensemble Learning on X (Twitter)

1 Signals, Systems and Components Laboratory, Faculty of Sciences and Technologies, Sidi Mohamed Ben Abdellah University, Fes 30000, Morocco
2 School of Science and Engineering, Al Akhawayn University, Hassan II, Ifrane 53000, Morocco
3 Engineering Laboratory, Polydisciplinary Faculty of Taza, Sidi Mohamed Ben Abdellah University, Fes 30000, Morocco
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(12), 748; https://doi.org/10.3390/a18120748
Submission received: 10 October 2025 / Revised: 7 November 2025 / Accepted: 12 November 2025 / Published: 28 November 2025
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

Abstract

This research investigates global public attitudes towards ChatGPT by analyzing opinions on X (Twitter) to better understand societal perceptions of generative artificial intelligence (AI) applications. As conversational AI systems become increasingly integrated into daily life, evaluating public sentiment is crucial for informing responsible AI development and policymaking. Unlike many prior studies that adopt a binary (positive vs. negative) sentiment framework, this research uses a three-class classification scheme (positive, neutral, and negative), enabling a more comprehensive evaluation of public attitudes from X (Twitter) data. To achieve this, tweets referencing ChatGPT were collected and categorized into positive, neutral, and negative opinions. Several algorithms, including Naïve Bayes, Support Vector Machines (SVMs), Random Forest, and an Ensemble Learning model, were employed to classify sentiments. The Ensemble model demonstrated superior performance, achieving an accuracy of 86%, followed by SVM (84%), Random Forest (79%), and Naïve Bayes (66%). Notably, the Ensemble approach improved the classification of neutral sentiments, increasing recall from 73% (SVM) to 76%, underscoring its robustness in handling ambiguous or mixed opinions. These findings highlight the advantages of Ensemble Learning techniques in social media sentiment analysis and provide valuable insights for AI developers and policymakers seeking to understand and address public perspectives on emerging AI technologies such as ChatGPT.

1. Introduction

Artificial intelligence is a revolution that continues to reshape our world, making long strides in areas such as health [1], education [2], and business. It powers intelligent transport systems, from autonomous cars to traffic flow management that smooths traffic and helps avoid accidents, while e-commerce uses AI to recommend relevant, personalized items for shopping, saving customers time and enhancing the shopping experience. For example, AI in healthcare currently interprets mammograms with an accuracy of 99% [3]; in education, adaptive learning platforms create a distinctive experience tailored to each student's needs [4]. At the center of this game-changing wave is ChatGPT, a conversational AI system from OpenAI based on the GPT (Generative Pre-trained Transformer) architecture. From GPT-1 to the multimodal capability of GPT-4 in 2023 (see Figure 1), the model's evolution reflects significant changes in natural language processing and human–computer interaction.
ChatGPT is used in numerous sectors of life [5]. In education, for example, it supports personalized learning, and in entertainment, it enables the creation of original content. These varied uses stem from its refined functionalities. ChatGPT technology is developing at a dizzying pace, and users are divided in their assessments, which range from admiration and encouragement to worry and fear. Insight into users' opinions is important for understanding AI acceptance and addressing potential societal apprehensions. The present study contributes to this growing understanding by analyzing social media sentiments: tweets are categorized into positive, neutral, and negative types, and machine learning models are employed to analyze trends and address challenges in interpreting public opinion.

2. Related Work

Due to the explosive rise of social networks and digital media in recent years, sentiment analysis has become a growing area of study. Many businesses and organizations are keen to learn what the public and customers think of their products by reading through social media posts. Governments also find public opinion analysis to be quite useful since it sheds light on human behavior and the impact of other people’s opinions. To understand the opinions and feelings conveyed in textual material, sentiment analysis is essential [6,7,8]. Opinion classification can be structured as a binary or multi-class classification task. Binary sentiment analysis categorizes texts into positive or negative groups, while multi-class sentiment analysis classifies them into more specific, detailed categories [9]. Sentiment analysis can be performed on various social media platforms [10], such as Twitter, and on websites, including comments, forums, blogs, and microblogs. Typically, this analysis is performed using either a rule-based system or a machine learning system. Recently, machine learning systems have become more popular because they offer more flexibility and are easier to implement than traditional rule-based methods.
The recent developments in sentiment analysis are owed to the use of machine learning [11] and deep learning algorithms [12,13]. The latest developments in sentiment analysis cover various application areas, datasets, pre-processing strategies, machine learning and deep learning methodologies, and large language models (LLMs) [14]. Sentiment analysis is based on natural language processing (NLP) tools, which are crucial for understanding the nuances of language, which can be complicated because opinions are subjective [15].
Previous research has extensively examined users' opinions about ChatGPT on social media, focusing in particular on Twitter data and Reddit comments [16,17]. Public opinions about ChatGPT have been analyzed using topic modeling, sentiment analysis, and SWOT analysis; in total, 202,905 Reddit comments collected from December 2022 to December 2023 were processed [16]. The sentiments in tweets related to ChatGPT were analyzed on the same dataset using ML algorithms, namely Decision Tree, KNN, Naïve Bayes, Logistic Regression, and SVM. Three feature extraction methods were implemented, and classification accuracy reached 96.41% for SVM when applying TF-IDF. All neutral opinions were dropped from the dataset before training the ML algorithms [17].
The study in [17] worked with a database that included sentiments regarding ChatGPT but excluded neutral sentiments in the data processing stage. The algorithms were trained on datasets where the target variable was binary.
Despite significant advancements in sentiment analysis and the increasing use of social media data for public opinion mining, several research gaps remain concerning the analysis of attitudes towards ChatGPT and similar large language models. Prior studies have primarily focused on binary sentiment classification (positive vs. negative), often excluding neutral sentiments, which represent a large proportion of real-world discussions. Furthermore, most previous analyses have relied only on individual machine learning classifiers (e.g., SVM, Naïve Bayes, or Logistic Regression) without exploring the potential benefits of Ensemble Learning approaches, which can combine the strengths of multiple models to improve generalization and handle ambiguous classes more effectively. Although recent advances in deep learning and transformer-based models (e.g., BERT, RoBERTa) have achieved strong results in sentiment analysis, their computational complexity and limited interpretability may not be ideal for exploratory studies aimed at evaluating social sentiment patterns.
Given these gaps, there is a clear need for a comprehensive, interpretable, and computationally efficient approach to analyzing multi-class sentiment (positive, neutral, and negative) in the context of public perceptions of ChatGPT. To address this need, this research presents an Ensemble Learning model based on SVM, Naïve Bayes, and Random Forest.
The rest of this paper is structured as follows: Section 3 describes the adopted methodology, Section 4 presents the results and their analysis, and Section 5 concludes the study. Section 6 and Section 7 discuss the limitations of the work and outline directions for future research.

3. Methodology

3.1. General Methodology Pipeline

Figure 2 depicts the methodological flow adopted by this study. It progresses through a series of sequential stages from data acquisition to model evaluation.

3.1.1. Data Collection

This research made use of a dataset available on the website Kaggle.com [18]. This dataset comprises 219,293 tweets related to ChatGPT. The tweets were collected over a one-month period. Each tweet was labeled with a sentiment category: positive (“good”), negative (“bad”), or neutral. The opinions were distributed as follows: the positive (good) ones amounted to approximately 26%, the negative (bad) opinions amounted to approximately 49%, and the neutral opinions amounted to approximately 25%, as illustrated in Figure 3. There were more negative opinions than there were positive and neutral ones. This dataset provides a substantial volume of data for training and evaluating sentiment analysis models, facilitating the development of classifiers to predict sentiments in tweets about ChatGPT.

3.1.2. Data Preprocessing

The Twitter data, which originally contained emojis, URLs, hashtags, and various special characters, was first cleaned to guarantee uniformity and clarity for subsequent analysis. Emojis were removed to avoid introducing irrelevant tokens; the text was normalized to lowercase; URLs and user mentions were eliminated; and non-ASCII characters, together with punctuation and unwanted symbols, were filtered out to improve consistency. Hashtags located at the end of tweets were removed, whereas those integrated within a sentence were retained by simply removing the hashtag symbol to preserve their semantic contribution. Words containing symbols such as $ or &, often associated with spam or promotional content, were discarded, and multiple spaces were reduced to a single space to ensure coherent formatting.
After the cleaning stage, the dataset was balanced by increasing the number of samples in the positive and neutral classes through random oversampling, while maintaining the original distribution of the negative class to avoid distortion. Figure 4 shows the refined thematic scope, aiding the discussion on the “applications, impact, and potential of ChatGPT” by using ‘ChatGPT’, ‘AI’, ‘human’, ‘write’, and ‘future’ as keywords. This preprocessing and balancing pipeline enhanced the quality of the dataset and ensured more reliable sentiment analysis and thematic interpretation.
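The cleaning rules above can be sketched with regular expressions. The patterns below are illustrative assumptions about rule order and scope, not the authors' exact pipeline:

```python
import re

def clean_tweet(text: str) -> str:
    """Illustrative cleaning pass: lowercase, strip URLs, mentions,
    trailing hashtags, non-ASCII characters, and extra whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)                     # remove user mentions
    text = re.sub(r"(#\w+\s*)+$", " ", text)              # drop hashtags at the end of the tweet
    text = text.replace("#", "")                          # keep in-sentence hashtags as plain words
    text = re.sub(r"\S*[$&]\S*", " ", text)               # drop spam-like tokens containing $ or &
    text = re.sub(r"[^\x00-\x7f]", " ", text)             # remove non-ASCII chars (incl. emojis)
    text = re.sub(r"[^\w\s]", " ", text)                  # strip punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()              # collapse multiple spaces

print(clean_tweet("Check #ChatGPT out! https://t.co/x @user so cool 🚀 #AI #future"))
# -> "check chatgpt out so cool"
```

Applying the rules in this order matters: trailing hashtags must be removed before the `#` symbol is stripped from in-sentence hashtags.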

3.1.3. BERT Tokenization

Text tokenization is a fundamental step in Natural Language Processing (NLP) that involves transforming raw text into smaller units called tokens. These tokens may represent words, subwords, characters, or symbols, depending on the tokenization strategy used. The purpose of tokenization is to convert unstructured text into a format that can be processed and analyzed by machine learning or deep learning models. Traditional tokenization methods split text into words based on whitespace or punctuation. However, modern NLP models, such as BERT [19], use more advanced tokenization techniques such as subword tokenization (WordPiece) [20]. The cleaned tweets were then tokenized using the BERT WordPiece tokenizer. This tokenizer splits text into meaningful subword units, allowing for the better handling of rare words, informal language, and variations commonly found in social media content.

3.1.4. Text Vectorization

The Term Frequency–Inverse Document Frequency (TF-IDF) technique has been employed to transform text into numerical representations suitable for machine learning [21]. This method assigns weights to words within a document, essentially serving as a measure of their importance or relevance. The mathematical representation for finding TF-IDF is as follows:
$w_{t_k} = tf_k \cdot \log\left(\dfrac{n}{df_k}\right)$
where $n$ is the total number of documents, and $df_k$ is the number of documents that contain the term $t_k$ within the corpus: the term-frequency factor is responsible for enhancing recall, while the inverse-document-frequency factor enhances precision.
Although TF-IDF mitigates the problem of very frequent terms dominating a document, it has limitations; for example, treating each word as an independent index does not allow any notion of word similarity. Despite these drawbacks, TF-IDF vectors generally yield better accuracy than competing representations.
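The weighting formula above can be computed directly. A minimal sketch over a toy corpus of tokenized documents:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute w_t = tf_t * log(n / df_t) per document, following the formula above."""
    n = len(docs)
    df = Counter()                       # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)                # raw term frequency within this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [["chatgpt", "is", "great"],
        ["chatgpt", "is", "scary"],
        ["great", "future"]]
w = tfidf(docs)
# "is" appears in 2 of 3 docs -> weight log(3/2); "future" in 1 of 3 -> log(3)
```

Note that a term appearing in every document gets weight log(1) = 0, which is exactly the down-weighting of uninformative common words that TF-IDF is designed to achieve.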

3.1.5. Class Balancing

Since the dataset was imbalanced, i.e., the negative class was dominant, random oversampling was applied to increase the number of positive and neutral samples while keeping the negative class unchanged.
Random oversampling was selected instead of SMOTE to preserve the linguistic integrity of real sentences and avoid generating artificial, semantically incorrect text [22,23,24].
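Random oversampling itself is straightforward: minority-class samples are duplicated (sampled with replacement) until every class matches the size of the majority class. A minimal sketch, with invented example data:

```python
import random
from collections import Counter

def random_oversample(texts, labels, seed=42):
    """Duplicate randomly chosen minority-class samples until every class
    matches the size of the largest class (here, the negative class)."""
    rng = random.Random(seed)
    by_class = {}
    for t, y in zip(texts, labels):
        by_class.setdefault(y, []).append(t)
    target = max(len(v) for v in by_class.values())
    out_t, out_y = [], []
    for y, items in by_class.items():
        extra = rng.choices(items, k=target - len(items))  # sample with replacement
        for t in items + extra:
            out_t.append(t)
            out_y.append(y)
    return out_t, out_y

texts = ["t1", "t2", "t3", "t4", "t5", "t6"]
labels = ["neg", "neg", "neg", "pos", "pos", "neu"]
bt, bl = random_oversample(texts, labels)
print(Counter(bl))  # each class now has 3 samples
```

Because only existing tweets are duplicated, no synthetic (and possibly ungrammatical) text is ever created, which is the rationale given above for preferring this over SMOTE.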

3.1.6. Training Base Classifiers

The balanced dataset was split into training (75%) and testing (25%) partitions to assess generalization performance objectively. Three supervised machine learning models were trained:
- Multinomial Naïve Bayes (α = 1.0), particularly effective for TF–IDF-based text features.
- Linear Support Vector Machine (SVM) (C = 1.0, max_iter = 1000), known for its strong performance in high-dimensional spaces.
- Random Forest Classifier (100 trees, Gini impurity), providing robustness and reduced sensitivity to noisy data.
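A sketch of this training setup using scikit-learn (an assumption about tooling; the paper does not name its implementation), with the hyperparameters listed above. The labelled tweets are invented placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier

# Toy labelled tweets (illustrative only)
texts = ["love chatgpt", "chatgpt is great", "hate chatgpt", "chatgpt is bad",
         "chatgpt exists", "tried chatgpt today"] * 5
labels = ["pos", "pos", "neg", "neg", "neu", "neu"] * 5

# TF-IDF features, then a 75/25 stratified split as described above
X = TfidfVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels)

models = {
    "nb": MultinomialNB(alpha=1.0),
    "svm": LinearSVC(C=1.0, max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 2))
```

On real data, held-out accuracy rather than training accuracy is what matters, hence the stratified split before any fitting.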

3.1.7. Soft-Voting Ensemble

The predicted probabilities from the three models were combined using a soft-voting ensemble strategy, which averages the class probability distributions of the base learners. This approach combines the complementary strengths of the individual models and improves classification performance, particularly in the neutral sentiment class.

3.2. Machine Learning Algorithms

Support Vector Machine (SVM), Naïve Bayes, and Random Forest were selected as baseline classifiers because together they enable a comprehensive performance evaluation [25]. SVM is a linear approach that is effective in handling high-dimensional feature spaces, such as those produced by TF–IDF representations [25]. Naïve Bayes employs a probabilistic model that is computationally efficient and often performs well with sparse text data [26]. Random Forest, an ensemble method based on multiple decision trees, offers robustness against overfitting and captures complex feature interactions [27]. These three classifiers cover distinct learning paradigms (linear, probabilistic, and ensemble), making them well suited for this analysis.

3.2.1. Naïve Bayes Classifier

The Naïve Bayes classifier utilizes a probabilistic framework for text classification [28]. The posterior probability for a given class cj and document D is calculated as follows:
$P(c_j \mid D) = \dfrac{P(D \mid c_j)\,P(c_j)}{P(D)}$
Assuming conditional independence of features, this simplifies into the following:
$P(D \mid c_j) = \prod_i P(d_i \mid c_j)$
Assuming that $P(D)$ is identical for each class $c_j,\ j = 1, 2, \ldots, k$, Equation (3) becomes the following:
$c(D) = \arg\max_j P(c_j) \prod_i P(d_i \mid c_j)$
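In practice, the product in this decision rule is evaluated in log space to avoid numerical underflow. A minimal sketch with invented priors and per-class word likelihoods:

```python
import math

def nb_classify(doc_tokens, priors, likelihoods):
    """argmax_j of log P(c_j) + sum_i log P(d_i | c_j), per the rule above."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for token in doc_tokens:
            score += math.log(likelihoods[c].get(token, 1e-9))  # tiny floor for unseen terms
        if score > best_score:
            best_class, best_score = c, score
    return best_class

priors = {"pos": 0.26, "neg": 0.49, "neu": 0.25}   # class frequencies as in the dataset
likelihoods = {"pos": {"love": 0.2, "chatgpt": 0.1},
               "neg": {"hate": 0.2, "chatgpt": 0.1},
               "neu": {"chatgpt": 0.1, "tool": 0.1}}
print(nb_classify(["love", "chatgpt"], priors, likelihoods))  # -> 'pos'
```

The hard-coded floor for unseen terms stands in for the Laplace smoothing (α = 1.0) used in the actual Multinomial Naïve Bayes model.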

3.2.2. Support Vector Machine (SVM)

Support Vector Machines classify data by finding the optimal hyperplane that separates classes in an n-dimensional space. The decision boundary is defined as follows:
$w^{T} x_i + b = 0$
where $x_i$ is the feature vector and $y_i$ is the associated label.
The training dataset is $X = \{(x_i, y_i)\}_{i=1}^{n}$. The parameter $w$ defines the optimal separating hyperplane, and $b$ is the bias term. The simplest form of SVM classification occurs when the data are linearly separable in the feature space. In this scenario, the geometric margin is maximized by fixing the functional margin of the closest points to one, the so-called canonical hyperplane [29], leading to a linear classifier satisfying $y_i (w^{T} x_i + b) \geq 1$; the geometric margin in this case is as follows:
$\gamma_i = \dfrac{1}{\|w\|}$
The margin is defined as the distance between the hyperplane and the closest training data point.
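For a given linear boundary (w, b), the geometric margin can be computed as the smallest signed distance of any training point to the hyperplane. A small sketch with an invented 2-D dataset:

```python
import math

def geometric_margin(w, b, points):
    """Smallest signed distance y_i * (w·x_i + b) / ||w|| over the training set,
    matching the margin definition above (assumes all points are correctly classified)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in points)

# Toy 2-D data separated by the line x1 + x2 - 3 = 0 (invented example)
w, b = [1.0, 1.0], -3.0
points = [([0.0, 1.0], -1), ([1.0, 0.0], -1), ([3.0, 2.0], 1), ([2.0, 3.0], 1)]
print(geometric_margin(w, b, points))  # distance of the closest point to the hyperplane
```

SVM training searches for the (w, b) that maximizes exactly this quantity; here the boundary is simply given.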

3.2.3. Random Forest

The Random Forest algorithm, introduced in 2001 by Leo Breiman and Adele Cutler, combines the principles of bagging and random subspaces. It constructs a forest of decision trees, each trained on a slightly different subset of the data. As an ensemble method, it combines the outputs of multiple base classifiers to generate a result, which makes it a highly effective and dependable machine learning technique [30,31].
For each tree (b) from 1 to B, the algorithm selects a bootstrap sample Z* of size N from the training dataset. Then it constructs a random forest tree (Tb) using the following steps:
  • Randomly choose m features from the total p features.
  • Identify the optimal feature and split point among the m chosen features.
  • Divide the node into two child nodes and continue this process until the minimum node size (nmin) is reached.
The output is an ensemble of B decision trees. Each tree predicts a class, and the final prediction is determined by majority voting as shown in Figure 5.
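The final majority-voting step can be expressed in a few lines; the tree predictions below are invented for illustration:

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Final Random Forest prediction: the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Predictions of B = 5 toy trees for a single tweet (illustrative)
print(majority_vote(["pos", "neu", "pos", "neg", "pos"]))  # -> 'pos'
```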

3.2.4. Ensemble Learning

Ensemble Learning combines different machine learning models into a single model that delivers better predictive performance [32]. Accordingly, the algorithms discussed above, RF, NB, and SVM, are combined as base learners in one collective classifier. Predictions are made by soft voting: the class probability distributions of the individual models are averaged, and the class with the highest average probability is selected.
A Support Vector Machine produces scores that are distances of data points from the decision hyperplane separating the categories; such scores are not probabilities. To use the SVM in soft voting within the Ensemble Learning model, its scores must therefore be calibrated into probabilities. The Ensemble Learning model is shown in Figure 6.
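One way to realize this scheme is scikit-learn's VotingClassifier (an assumption about tooling; the paper does not name its implementation). Wrapping LinearSVC in CalibratedClassifierCV supplies the probability estimates that soft voting requires; the labelled tweets here are toy stand-ins:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

texts = ["love chatgpt", "great tool", "hate chatgpt", "bad answers",
         "chatgpt exists", "used chatgpt today"] * 5
labels = ["pos", "pos", "neg", "neg", "neu", "neu"] * 5
X = TfidfVectorizer().fit_transform(texts)

# Calibration turns SVM decision scores into probabilities (five-fold CV, as in the text)
svm = CalibratedClassifierCV(LinearSVC(C=1.0, max_iter=1000), cv=5)

ensemble = VotingClassifier(
    estimators=[("nb", MultinomialNB(alpha=1.0)),
                ("svm", svm),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=0))],
    voting="soft",  # average the predicted class probability distributions
)
ensemble.fit(X, labels)
print(ensemble.predict(X[:1]))
```

Soft voting lets a model that is confidently right outvote two models that are weakly wrong, which is one intuition for why it helps on the ambiguous neutral class.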

4. Results and Discussion

This study evaluated several machine learning models for sentiment analysis, including Naïve Bayes, Support Vector Machine (SVM), Random Forest, and an Ensemble Learning approach. The performance of each model was assessed using standard metrics: precision, recall, F1-score, and accuracy.
To train and evaluate sentiment classification models, three supervised machine learning algorithms were employed with appropriate hyperparameter configurations. The first model was a Multinomial Naïve Bayes classifier, which applies Laplace/Lidstone smoothing to handle zero-frequency terms and learns class prior probabilities directly from the training distribution, making it well suited for text classification tasks based on TF-IDF representations. A Support Vector Machine (SVM) classifier with a linear kernel was also used, known for its strong performance in high-dimensional spaces. This model was trained using a fixed regularization strength (C = 1.0) and a limit of 1000 optimization iterations; its decision scores were subsequently calibrated using five-fold cross-validation to produce probabilistic outputs suitable for ensemble integration. Finally, a Random Forest classifier composed of 100 decision trees was implemented, where node splits were evaluated using the Gini impurity criterion. This ensemble structure enhances robustness by aggregating multiple diverse decision paths, thereby reducing variance and improving generalization. Table 1 summarizes the classifiers used and their key hyperparameter configurations.
The results revealed that the Ensemble Learning model outperformed the others, achieving an accuracy of 86%. This superior performance is further highlighted by its ability to address a key limitation observed in the individual models: the classification of neutral sentiments. The Ensemble Learning model achieved a recall of 76% for neutral sentiments, surpassing both the SVM and Random Forest models. While the individual models demonstrated average accuracy levels that left room for improvement, the Ensemble Learning approach not only raised overall accuracy but also improved recall for neutral sentiments, which had been a weak point in the standalone models. This makes the ensemble model a robust solution for sentiment analysis tasks.
Table 2 provides a comparative analysis of the performance metrics for different machine learning models used in this study. The Naïve Bayes model demonstrates baseline performance with an accuracy of 66%, highlighting its limitations in effectively handling neutral sentiments. The Support Vector Machine (SVM) model shows significant improvement, achieving an accuracy of 84% with strong precision and recall for both positive and negative sentiments. The Random Forest model delivers balanced performance, reaching an accuracy of 79%, though it struggles with recall for neutral sentiments.
As shown in Figure 7, the Ensemble Learning model emerges as the best-performing approach, achieving an accuracy of 86% and excelling in recall for neutral sentiments. This demonstrates its robustness in managing complex sentiment distributions.
Furthermore, with reference to Table 3, the performance of the machine learning models in classifying neutral sentiment varies significantly. Naïve Bayes struggles the most, achieving a neutral precision of 62% and a recall of only 54%, resulting in a high misclassification rate. Notably, 7043 neutral tweets are misclassified as negative.
SVM performs the best overall among individual models, with a neutral precision of 82% and a recall of 73%. However, it still misclassifies 3044 neutral tweets as negative and 2143 as positive. Random Forest outperforms Naïve Bayes but falls short of SVM, achieving a neutral precision of 84% and a recall of 66%. In this case, 3966 neutral tweets are misclassified as positive and 2654 as negative. The Ensemble Learning model surpasses all individual models, achieving the highest F1-score of 80% for neutral sentiment. It also demonstrates fewer misclassifications compared to the other models. These results indicate that while SVM offers the best balance among individual classifiers, Ensemble Learning further enhances neutral sentiment classification. It reduces misclassification errors and achieves more stable performance overall.
The confusion matrices, depicted in Figure 8, provide a deeper understanding of each model’s performance. The Naïve Bayes model, shown in (a), demonstrates adequate classification of positive and negative sentiments but struggles considerably with neutral sentiments, primarily due to its assumption of feature independence. This limitation is reflected in its overall accuracy of 66%. In comparison, the Support Vector Machine (SVM) model, depicted in (b), exhibits notable improvements, achieving an accuracy of 84%, with strong precision and recall for positive and negative sentiments. However, it still faces challenges in accurately classifying neutral sentiments, indicating a need for better feature separability.
The Random Forest model shown, in (c), offers balanced performance with an accuracy of 79%, excelling in positive and negative sentiment classification but showing weaker recall for neutral sentiments.
The Ensemble Learning model, illustrated in (d), combines the strengths of the individual classifiers through soft voting, achieving the highest accuracy of 86% and markedly improving recall for neutral sentiments to 76%. This demonstrates the effectiveness of ensemble techniques in handling complex sentiment distributions by balancing performance across all categories.
The robustness of the ensemble model is further validated by the receiver operating characteristic (ROC) curve shown in Figure 9. The high Area Under the Curve (AUC) values reflect the model’s excellent discriminatory power, particularly for positive and negative sentiments. While the AUC for neutral sentiments indicates improvement over individual models, it still highlights the inherent challenges of this category. Overall, the ensemble model’s balanced performance, high accuracy, and superior AUC values underscore its reliability and effectiveness in sentiment analysis tasks.
The SVM model in this study outperformed its counterpart in prior work, with an accuracy of 84% compared to 65.3% [16]. Naïve Bayes and Random Forest also performed better in this study, with accuracies of 66% and 79%, respectively, compared to 62.8% and 60.3% [16]. The best model in this work is Ensemble Learning, with an accuracy of 86%. This improvement is due to the use of feature extraction, the BERT tokenizer, and the three ML algorithms combined in the Ensemble Learning model.
Neutral sentiment classification is very challenging since neutral tweets contain mixed wording. The Ensemble Learning model addresses this issue by combining complementary decision boundaries from different classifier approaches:
- Naïve Bayes contributes probabilistic weighting, which captures global word frequency trends.
- SVM provides strong linear separation in high-dimensional space, improving margin-based discrimination.
- Random Forest captures non-linear interactions between terms and contextual dependencies missed by SVM or NB.
This voting allows the ensemble to reduce variance (via Random Forest), correct bias (via SVM), and balance probabilistic uncertainty (via Naïve Bayes). Consequently, the ensemble achieved a neutral-class F1-score improvement of 3–6% compared to the best individual classifier.
In previous research [17], neutral sentiments were excluded, resulting in a binary classification task (positive vs. negative). This significantly simplifies model training because the boundary between sentiment classes becomes sharper, allowing algorithms such as SVM or Logistic Regression to achieve higher accuracy (up to 96%). In contrast, the present study preserved all three sentiment categories (positive, neutral, negative), introducing additional linguistic ambiguity and overlapping sentiment boundaries. As a result, overall accuracy is expected to be slightly lower; however, this setup provides a more realistic representation of public discourse, where neutral and mixed opinions are prevalent.
While earlier datasets captured early reactions to ChatGPT’s release, the dataset used in this study presents a more mature discussion period, including critical and balanced perspectives, which further complicates sentiment differentiation.

5. Conclusions

This paper has evaluated the effectiveness of various machine learning methods for analyzing public attitudes toward ChatGPT. The ensemble model demonstrated superior performance compared to individual classifiers, particularly in classifying neutral sentiments, which are traditionally difficult to categorize. Given that nearly half of the tweets in the dataset (approximately 49%) expressed a negative sentiment, there is a clear need for increased public acceptance and further refinement of AI technologies. Future studies should adopt longitudinal designs to capture evolving public perceptions over time, which is essential for understanding how general society views emerging AI systems. Furthermore, in-depth, domain-specific evaluations could yield highly informative insights for particular sectors, thereby increasing the practical utility of ChatGPT and similar technologies.

6. Limitations of the Study

Despite the strong performance of the ensemble model, the approach has limitations. TF–IDF does not capture deep semantic context, which impacts neutral sentiment classification. The dataset consists only of short tweets, limiting generalizability. The study also relies on classical models rather than deep learning or transformer-based methods, and traditional techniques still struggle to detect sarcasm, tone, and cultural language nuances.

7. Future Work

Future work will explore more advanced embedding techniques, such as Word2Vec, GloVe, and contextual models like BERT and RoBERTa, to improve semantic representation, especially for neutral sentiments. Deep learning and transformer-based classifiers may also be integrated to enhance performance. Additionally, expanding the dataset across time, languages, and regions can increase generalizability. Finally, incorporating explainable AI tools could provide clearer insight into how linguistic features shape sentiment classification.

Author Contributions

Conceptualization, Y.T.C., F.M.A. and F.A.; investigation, Y.T.C.; simulation, Y.T.C., F.M.A. and F.A.; writing—original draft preparation, Y.T.C.; writing—review and editing, L.B.; visualization, Y.T.C. and M.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no funding.

Data Availability Statement

The original data presented in this study are openly available at https://www.kaggle.com/ [18].

Conflicts of Interest

We declare that there are no financial or non-financial competing interests related to this manuscript. Neither the authors nor any co-authors have any competing interests to disclose.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
AUC: Area Under the Curve
BERT: Bidirectional Encoder Representations from Transformers
GPT: Generative Pre-Trained Transformer
LLM: Large Language Model
ML: Machine Learning
NB: Naïve Bayes
NLP: Natural Language Processing
RF: Random Forest
ROC: Receiver Operating Characteristic
SVM: Support Vector Machine
TF-IDF: Term Frequency–Inverse Document Frequency

References

  1. Issa, W.B.; Shorbagi, A.; Al-Sharman, A.; Rababa, M.; Al-Majeed, K.; Radwan, H.; Refaat Ahmed, F.; Al-Yateem, N.; Mottershead, R.; Abdelrahim, D.N.; et al. Shaping the future: Perspectives on the Integration of Artificial Intelligence in health profession education: A multi-country survey. BMC Med. Educ. 2024, 24, 1166.
  2. Amiri, S.M.H.; Islam, M.M.; Hossen, M.S. The Role of Artificial Intelligence in Shaping Future Education Policies. Educ. J. 2025, 14, 32–38.
  3. Ahn, J.S.; Shin, S.; Yang, S.-A.; Park, E.K.; Kim, K.H.; Cho, S.I.; Ock, C.-Y.; Kim, S. Artificial Intelligence in Breast Cancer Diagnosis and Personalized Medicine. J. Breast Cancer 2023, 26, 405–435.
  4. Kabudi, T.; Pappas, I.; Olsen, D.H. AI-enabled adaptive learning systems: A systematic mapping of the literature. Comput. Educ. Artif. Intell. 2021, 2, 100017.
  5. Hantom, W.H.; Rahman, A. Arabic Spam Tweets Classification: A Comprehensive Machine Learning Approach. AI 2024, 5, 1049–1065.
  6. Khairy, M.; Mahmoud, T.M.; Omar, A.; Abd El-Hafeez, T. Comparative performance of ensemble machine learning for Arabic cyberbullying and offensive language detection. Lang. Resour. Eval. 2024, 58, 695–712.
  7. Omar, A.; Abd El-Hafeez, T. Quantum computing and machine learning for Arabic language sentiment classification in social media. Sci. Rep. 2023, 13, 17305.
  8. Birjali, M.; Kasri, M.; Beni-Hssane, A. A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowl.-Based Syst. 2021, 226, 107134.
  9. Mamdouh Farghaly, H.; Abd El-Hafeez, T. A new feature selection method based on frequent and associated itemsets for text classification. Concurr. Comput. Pract. Exp. 2022, 34, e7258.
  10. Bi, Y. Sentiment classification in social media data by combining triplet belief functions. J. Assoc. Inf. Sci. Technol. 2022, 73, 968–991.
  11. Revathy, G.; Alghamdi, S.A.; Alahmari, S.M.; Yonbawi, S.R.; Kumar, A.; Anul Haq, M. Sentiment analysis using machine learning: Progress in the machine intelligence for data science. Sustain. Energy Technol. Assess. 2022, 53, 102557.
  12. Abdullah, T.; Ahmet, A. Deep Learning in Sentiment Analysis: Recent Architectures. ACM Comput. Surv. 2022, 55, 1–37.
  13. Alqarni, A.; Rahman, A. Arabic Tweets-Based Sentiment Analysis to Investigate the Impact of COVID-19 in KSA: A Deep Learning Approach. Big Data Cogn. Comput. 2023, 7, 16.
  14. Spam and Sentiment Detection in Arabic Tweets Using MarBert Model. IIETA. Available online: https://www.iieta.org/journals/mmep/paper/10.18280/mmep.090617 (accessed on 7 November 2025).
  15. Khan, M.T.; Durrani, M.; Ali, A.; Inayat, I.; Khalid, S.; Khan, K.H. Sentiment analysis and the complex natural language. Complex Adapt. Syst. Model. 2016, 4, 2.
  16. Naing, S.Z.S.; Udomwong, P. Public Opinions on ChatGPT: An Analysis of Reddit Discussions by Using Sentiment Analysis, Topic Modeling, and SWOT Analysis. Data Intell. 2024, 6, 344–374.
  17. Sabir, A.; Ali, H.A.; Aljabery, M.A. ChatGPT Tweets Sentiment Analysis Using Machine Learning and Data Classification. Informatica 2024, 48, 103–112.
  18. ChatGPT Sentiment Analysis. Available online: https://www.kaggle.com/datasets/charunisa/chatgpt-sentiment-analysis (accessed on 7 November 2025).
  19. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805.
  20. Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv 2016, arXiv:1609.08144.
  21. El-Khair, I.A. TF*IDF. In Encyclopedia of Database Systems; Springer: Boston, MA, USA, 2009; pp. 3085–3086. ISBN 978-0-387-39940-9.
  22. SMOTE for High-Dimensional Class-Imbalanced Data. BMC Bioinformatics. Available online: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-106 (accessed on 7 November 2025).
  23. Mujahid, M.; Kına, E.; Rustam, F.; Villar, M.G.; Alvarado, E.S.; De La Torre Diez, I.; Ashraf, I. Data oversampling and imbalanced datasets: An investigation of performance for machine learning and feature engineering. J. Big Data 2024, 11, 87.
  24. Glazkova, A. A Comparison of Synthetic Oversampling Methods for Multi-class Text Classification. arXiv 2020, arXiv:2008.04636.
  25. Rifaldy, F.; Sibaroni, Y.; Prasetiyowati, S.S. Effectiveness of Word2Vec and TF-IDF in Sentiment Classification on Online Investment Platforms Using Support Vector Machine. JIPI (J. Ilm. Penelit. Dan Pembelajaran Inform.) 2025, 10, 863–874.
  26. Blanquero, R.; Carrizosa, E.; Ramírez-Cobo, P.; Sillero-Denamiel, M.R. Variable selection for Naïve Bayes classification. Comput. Oper. Res. 2021, 135, 105456.
  27. Venkateshwarlu, G.; Akhila, S.; Kavyasree, V.; Vishnu, S.; Prasad, V.S. Enhanced Text Classification Using Random Forest: Comparative Analysis and Insights on Performance and Efficiency. Int. J. Comput. Eng. Res. Trends 2024, 11, 1–8.
  28. Peretz, O.; Koren, M.; Koren, O. Naive Bayes classifier—An ensemble procedure for recall and precision enrichment. Eng. Appl. Artif. Intell. 2024, 136, 108972.
  29. Liang, X.; Zhu, L.; Huang, D.-S. Multi-task ranking SVM for image cosegmentation. Neurocomputing 2017, 247, 126–136.
  30. Biau, G.; Scornet, E. A Random Forest Guided Tour. Test 2016, 25, 197–227.
  31. Pasinetti, S.; Fornaser, A.; Lancini, M.; De Cecco, M.; Sansoni, G. Assisted Gait Phase Estimation Through an Embedded Depth Camera Using Modified Random Forest Algorithm Classification. IEEE Sens. J. 2020, 20, 3343–3355.
  32. Mhawi, D.N.; Aldallal, A.; Hassan, S. Advanced Feature-Selection-Based Hybrid Ensemble Learning Algorithms for Network Intrusion Detection Systems. Symmetry 2022, 14, 1461.
Figure 1. The evolution of ChatGPT from GPT-1 to GPT-4.
Figure 2. Methodology pipeline.
Figure 3. Sentiment distribution of ChatGPT tweets in the dataset.
Figure 4. Word cloud of cleaned tweets.
Figure 5. Random Forest algorithm.
Figure 6. Ensemble Learning model.
Figure 7. Performance of classification models across accuracy and class-based metrics.
Figure 8. Confusion matrices for classification models: (a) Naïve Bayes, (b) Support Vector Machine, (c) Random Forest and (d) Ensemble Learning.
Figure 9. ROC curve for Ensemble Learning.
Table 1. Hyperparameters used by the classifiers.

Classifier | Model/Component | Key Hyperparameters
Naïve Bayes | MultinomialNB() | alpha = 1.0; fit_prior = True
Support Vector Machine (SVM) | LinearSVC | C = 1.0; max_iter = 1000; random_state = 42
Support Vector Machine (SVM) | CalibratedClassifierCV | cv = 5
Random Forest | RandomForestClassifier() | n_estimators = 100; criterion = 'gini'
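The settings in Table 1 map directly onto scikit-learn arguments. As a hedged sketch (assuming the standard scikit-learn components the table names, not the authors' exact script), the three classifiers could be instantiated as follows:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

# Naive Bayes: Laplace smoothing (alpha=1.0), class priors learned from data
nb = MultinomialNB(alpha=1.0, fit_prior=True)

# Linear SVM wrapped in 5-fold probability calibration, which lets a
# margin-based classifier emit class probabilities
svm = CalibratedClassifierCV(
    LinearSVC(C=1.0, max_iter=1000, random_state=42),
    cv=5,
)

# Random Forest: 100 trees, Gini impurity as the split criterion
rf = RandomForestClassifier(n_estimators=100, criterion="gini")
```

Wrapping LinearSVC in CalibratedClassifierCV is a common choice when calibrated probabilities are needed downstream, e.g., for probability-averaging ensembles or ROC analysis.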
Table 2. Performance metrics of machine learning models.

Model | Precision (Avg) | Recall (Avg) | F1-Score (Avg) | Accuracy (%)
Naïve Bayes | 66% | 66% | 66% | 66
SVM | 84% | 84% | 84% | 84
Random Forest | 79% | 79% | 78% | 79
Ensemble Learning | 86% | 86% | 86% | 86
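The Ensemble Learning row in Table 2 combines the three base classifiers of Table 1; the paper's exact combination rule is given in its methodology section. As an illustrative sketch only, the snippet below assumes a soft-voting scheme (averaging calibrated class probabilities) via scikit-learn's VotingClassifier, run on synthetic count features in place of the real TF-IDF tweet vectors:

```python
import numpy as np
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# Toy non-negative count features and three-class labels, illustration only
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(120, 20))
y = rng.integers(0, 3, size=120)  # 0=negative, 1=neutral, 2=positive

# Soft voting averages each base model's predicted class probabilities;
# the averaged probabilities can also feed a one-vs-rest ROC/AUC analysis
ensemble = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB(alpha=1.0)),
        ("svm", CalibratedClassifierCV(
            LinearSVC(C=1.0, max_iter=1000, random_state=42), cv=5)),
        ("rf", RandomForestClassifier(
            n_estimators=100, criterion="gini", random_state=42)),
    ],
    voting="soft",
)
ensemble.fit(X, y)
proba = ensemble.predict_proba(X)  # shape (n_samples, 3); each row sums to 1
```

Soft voting is one plausible reading of the design, since ROC curves for the ensemble (Figure 9) require class probabilities; a hard majority vote would be the simpler alternative.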
Table 3. Performance on neutral sentiment classification.

Model | Neutral Precision | Neutral Recall | Neutral F1-Score
Naïve Bayes | 0.62 | 0.54 | 0.58
SVM | 0.82 | 0.73 | 0.77
Random Forest | 0.84 | 0.66 | 0.74
Ensemble Learning | 0.84 | 0.76 | 0.80
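The per-class figures in Table 3 are the standard precision, recall, and F1 computed on the neutral class alone. A minimal sketch with hypothetical labels (not the paper's data) shows how such per-class metrics are obtained with scikit-learn:

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical gold labels and predictions, for illustration only
y_true = ["positive", "neutral", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "neutral", "neutral", "negative", "positive", "negative"]

labels = ["negative", "neutral", "positive"]
# With average=None (the default), metrics are returned per class,
# in the order given by `labels`
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0
)

for lab, p, r, f in zip(labels, prec, rec, f1):
    print(f"{lab}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Averaging the per-class values (macro averaging) yields the "(Avg)" columns reported in Table 2.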
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Touhami Chahdi, Y.; Abbou, F.M.; Abdi, F.; Bouhadda, M.; Bouanane, L. Analyzing Global Attitudes Towards ChatGPT via Ensemble Learning on X (Twitter). Algorithms 2025, 18, 748. https://doi.org/10.3390/a18120748


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
