Prediction of Machine-Generated Financial Tweets Using Advanced Bidirectional Encoder Representations from Transformers

: With the rise of Large Language Models (LLMs), distinguishing between genuine and AI-generated content, particularly in ﬁnance, has become challenging. Previous studies have focused on binary identiﬁcation of ChatGPT -generated content, overlooking other AI tools used for text re-generation. This study addresses this gap by examining various AI-regenerated content types in the ﬁnance domain. Objective: The study aims to diﬀerentiate between human - generated ﬁnancial content and AI-regenerated co ntent, speciﬁcally focusing on ChatGPT, QuillBot, and SpinBot. It constructs a dataset comprising real text and AI-regenerated text for this purpose. Contribution: This research contributes to the ﬁeld by providing a dataset that includes various types of AI -regenerated ﬁnancial content. It also evaluates the performance of diﬀerent models, particularly highlighting the eﬀectiveness of the Bidirectional Encoder Representations from the Transformers Base Cased model in distinguishing between these content types. Methods: The dataset is meticulously preprocessed to ensure quality and reliability. Various models, including Bidirectional Encoder Representations Base Cased, are ﬁne -tuned and compared with traditional machine learning models using TFIDF and Word2Vec approaches. Results: The Bidirectional Encoder Representations Base Cased model outperforms other models, achieving an accuracy, precision, recall, and F1 score of 0.73, 0.73, 0.73, and 0.72 respectively, in distinguishing between real and AI-rege nerated ﬁnancial content. Conclu-sion: This study demonstrates the eﬀectiveness of the Bidirectional Encoder Representations base model in diﬀerentiating between human - generated ﬁnancial content and AI -regenerated content. It highlights the importance of considering various AI tools in identifying synthetic content, particularly in the ﬁnance domain in Pakistan.


Introduction
In recent years, natural language processing (NLP) has played an important role in improving computer systems' performance for generating and understanding human content [1].Due to the advancement of such a domain, a Large Language Model (LLM) was introduced recently for the generation of language that is relatively close to human tone, after the training of large text data [2].The Generative Pretrained Transformer (GPT) model that was introduced by OpenAI has gained popularity in less time [3].
The founding of ChatGPT signifies a significant turning point in the development of complex computer systems, especially in the area of NLP [4].After being trained on a large amount of textual information, ChatGPT produces quick logical and appropriate responses for the given context.Its influence can be observed in several areas, such as language acquisition, education, and communication, where its capacity to generate text like humans is constructive.The development of ChatGPT as a popular LLM signifies an important breakthrough in NLP.BERT, a massive language model developed by Google, is another noteworthy example that has significantly advanced the field of study.The increasing use of advanced AI-powered chatbots like ChatGPT underscores the importance of differentiating between text created by humans and artificial intelligence [5].This distinction can have significant implications for various fields related to information security and digital forensics [6].In the realm of information security, the ability to identify AIgenerated language is crucial for detecting and mitigating the impact of harmful AI applications, such as deceptive social engineering or the dissemination of false and misleading information.Developing methods to distinguish AI-generated texts is essential for ensuring the integrity and accuracy of data [7], particularly in sensitive industries like banking and finance, political campaigns, and customer reviews of restaurants, movies, or products.ChatGPT can also be used for text regeneration, and the detection of ChatGPT is explored in many studies.There is a lack of identification text that is generated by tools like QuillBot, SpinBot, as well as ChatGPT.The main details about these tools are described below.
OpenAI's GPT [8] series has sparked widespread interest in natural language processing (NLP) because of its novel technique and remarkable performance on linguistic challenges.GPT is a deep learning model that initially acquires knowledge from a large amount of text input, specifically from a transformer architecture.During this early learning stage, the model learns about the syntax, structure, and suggestions of the language.The transformer design in GPT has been recognized for properly managing complex linguistic problems.GPT produces coherent and contextually relevant writing by using a process known as "self-attention" to assess the significance of words inside a sentence.GPT can be fine-tuned using specific data for tasks like translation, summarization, question answering, and idea analysis.GPT and related models benefit people individually as well as society at large by enhancing communication, education, and healthcare services across a wide range of businesses.
In the development of NLP and text generation, Quillbot has emerged as a significant turning point [9].AI researchers and engineers developed Quillbot as a language model to handle the difficulties of automatically paraphrasing text and growth.To improve the quality and accuracy of text, such tools currently need time.Quillbot provides superiorquality text extraction, academic scripting, creation of content, and skillful transmission by utilizing important NLP and ML strategies.
SpinBot [10] is a tool that is frequently used in marketing and content production to generate several versions of a blog post or article for search engine optimization.Nevertheless, because the automatic rewriting process does not always yield clear or comprehensible sentences, the deployment of SpinBot can result in low-quality or meaningless content.It is crucial to exercise caution while using these tools and to carefully check the results to make sure the content is still pertinent, accurate, and understandable.
The structure of the paper is as follows: The relevant works are covered in Section 2, the dataset and methodology are explained in Section 3, the results and discussion are presented in Section 4, and the study is concluded in Section 5.

Literature Review
Yu et al. [11] compiled a comprehensive conceptual dataset incorporated by ChatGPT users to develop identification techniques and investigate the potential negative impact of ChatGPT on academia.The conceptual dataset consists of 35,304 synthetic abstracts.Liao et al. [12] introduced the AIGC (Artificial Intelligence Generated Content) ethics framework in the healthcare sector.Their study mainly focused on comparing ChatGPT-generated texts with medical texts generated by human experts.In addition, they explored machine learning algorithms to gather and recognize medical texts created by ChatGPT.Initially, they accomplished a dataset containing both medical texts from ChatGPT and those from human specialists.Finally, they used and developed machine learning techniques to confirm the source of the generated medical text.ChatGPT has emerged as a potential tool in the medical industry, due to its advanced text analysis skills and user-friendly interface.However, ChatGPT is more concerned with understanding the meaning of text than with complicated data structures and real-time data analysis, which are normally necessary for the construction of an intelligent Clinical Decision Support System (CDSS) employing specialist AI algorithms.Although ChatGPT does not directly perform specific algorithms, it builds the framework for intelligent CDSS at the textual level.
Alamleh et al. [13] examined ChatGPT and other advanced artificial intelligence language models, highlighting concerns about distinguishing between human-written and AI-generated texts in academic and scientific contexts.The study aimed to assess the effectiveness of ML algorithms in discriminating between human-generated text and text generated by artificial intelligence.In their study, they interviewed computer science students about both paper-based and programming activities.They trained and tested a variety of ML models, including Logistic Regression (LR), Decision Tree (DT), support vector machines (SVMs), neural networks (NNs), and Random Forest (RF), using accuracy, computational efficiency, and confusion matrices.By comparing the performance of these models, they determined that the RF model is effective for such a task.Chen et al. [14] introduced a novel way of distinguishing between human-written texts and those generated by ChatGPT utilizing language-based methods.They collected and shared a preprocessed dataset called OpenGPT Text, which contained rewritten text created by ChatGPT.Then, they created, developed, and trained two distinct text classification models utilizing the highly improved pretraining method of BERT (Roberta) and the text-to-text transfer transformer (T5) in their study.Katib et al. [15] proposed a Tunicate Swarm Algorithm with a Long Short-Term Memory Recurrent Neural Network (TSA-LSTMRNN) model to identify human and ChatGPT text in their study.Furthermore, the proposed TSA-LSTMRNN mainly focuses on word embedding, TF-IDF, and count vectorizer for the process of feature extraction.Hamed and Yu [16] demonstrated that the distributions generated by ChatGPT can be discriminated from those produced by human researchers.By developing a coordinated ML technique, they demonstrated how machine-generated articles may be distinguished from specialist-produced articles.They created an algorithmic mechanism for reliably identifying publications published by ChatGPT.
Perkins et al. [17] explored aspects of academic integrity among students utilizing artificial intelligence techniques, with a special emphasis on using ChatGPT in the LLM (Large Langue Model).They discussed the evolution of these tools and highlighted potential ways for an LLM to support students' digital writing education, such as assisting with writing composition and instruction, facilitating AI-human collaboration, improving Automated Writing Assessment (AWA), and assisting EFL (English as a Foreign Language) students.Each institution's academic integrity policy can determine if a student's use of an LLM complies with academic integrity standards, which may need to be updated to reflect the changing use of these technologies in educational contexts.
Maddigan and Susnjak [18] introduced Chat2VIS, a revolutionary framework that uses Large Language Models (LLMs).This framework demonstrates how complicated language understanding challenges can be tackled, and it can guide critical and accurate final solutions more quickly than existing methods.Based on the arguments pointed out, Chat2VIS demonstrates that LLMs are a trustworthy method for visualizing natural language inquiries.Their study also demonstrates how LLM incentives may be set to protect data privacy and security over a wide range of information.To recognize machine-generated content on social media, Kumarage et al. [19] used BERT and ensemble-based techniques that used stylometric elements.
Pardos and Bhandari [20] carried out a fundamental study.They evaluated the efficiency of ChatGPT prompts to those provided by human instructors in two distinct algebra subjects: intermediate and arithmetic algebra.Their study introduces the Tunicate Multitude algorithm, which uses the TSA-LSTMRNN model to identify human-generated text and ChatGPT text.The suggested TSA-LSTMRNN method investigates the model's decision-making and discovers any potential distinguishing patterns.Furthermore, the TSA-LSTMRNN method employs the TF-IDF (Term Frequency-Inverse Document Frequency) scheme, word embedding, and computer vectorization in the feature extraction process.The LSTMRNN model was used for both detection and classification tasks in their study.As compared to a multimodal ensemble, Dipta et al. [21] proposed a single base model with data augmentation and contrastive learning for the prediction of machinegenerated text.

Research Gap
This study fills a major gap in the literature by proposing an extensive classification system that makes the distinctions between text written by QuillBot (v15.151.0),SpinBot (Basic Free Version), ChatGPT (v3.5), and real humans.Prior research has mostly concentrated on the distinction between text generated by ChatGPT and human writers, ignoring the subtleties added by other text-generation programs like QuillBot (v15.151.0), and Spin-Bot (Basic Free Version).This study's incorporation of these extra classes offers a more sophisticated comprehension of text generation, facilitating more precise identification and classification of many forms of generated text.Our comprehension of the digital landscape is strengthened by this classification methodology, which improves our capacity to assess and address the spread of machine-generated information across several online platforms.The major challenges in finance AI text detection, such as domain-specific language and diverse data sources, have also been emphasized.The major contributions of this study are listed below:

•
We meticulously compiled a substantial dataset of finance-related tweets sourced from Twitter, forming the cornerstone of our research.

•
Utilizing this financial tweet dataset, we harnessed advanced language models, including ChatGPT, QuillBot, and SpinBot, to craft pertinent content, augmenting the depth and breadth of our collection for in-depth analysis.

•
The prepared dataset of four classes was properly preprocessed before applying models.

•
The different versions of Bidirectional Encoder Representations from Transformers (BERT) models, including BERT Base Cased, BERT Base Un-Cased, BERT Large Cased, BERT Large Un-Cased, Distilbert Base Cased, and Distilbert Base Un-Cased, are fine-tuned in this study.

•
The performance of the "BERT Base Cased" model is also compared with various machine learning models, including Logistic Regression, Random Forest Classifier, Gradient Boosting Classifier, K Neighbors Classifier, Decision Tree Classifier, Multilayer Perceptron (MLP) Classifier, AdaBoost Classifier, Bagging Classifier, Support Vector Classifier (SVC), and Quadratic Discriminant Analysis.

•
The machine learning models are trained with TF-IDF and Word2Vec separately to properly show the robustness of the proposed fine-tuned "BERT Base Cased" model.

Materials and Methods
In this section, we explore the key aspects of our proposed study, including dataset preparation methods, preprocessing, and a thorough discussion of the models and techniques used in this study, see Figure 1 as an abstract diagram.

Dataset
Using the search API on Twitter, we created a dataset of tweets for our study.We retrieved the tweets based on predefined financial terms, i.e., "debit" and "credit", which helped us to select appropriate tweets for our study.Finally, to prepare the dataset, we regenerated the tweets with QuillBot, SpinBot, and ChatGPT-3.5.To reduce the overfitting chance, we prepared a balanced dataset, i.e., 1000 samples per class.Our prepared dataset consists of a total of 4000 samples from which 1000 belong to real financial texts, 10,000 are generated by ChatGPT-3.5,1000 are generated by SpinBot, and 1000 are generated by QuillBot.

Preprocessing
Data preprocessing is a necessary step in the domains of machine learning and deep learning.To reduce the computation power and achieve effective accuracy, there is a need to remove unnecessary information from tweets before regenerating text from QuillBot, SpinBot, and ChatGPT.We used a well-known library, Tweets-Processor [22], to remove emojis, URLs, and hashtags from the tweets to remove the noise and irrelevant information, see Figure 2 for WordCloud.Furthermore, the point that is noticeable here is that we have collated the English tweets only and applied lemmatization as text preprocessing for the time being to reduce the dimensionality of the dataset, see Figure 3 for the length of text in the prepared dataset.The final prepared samples can be seen in Table 1."so country borrow money pay interest loan burden future generation daily interest payment sit huge amount dead capital urban holding worth form govtowned land building" "The country's borrowing money to pay interest on loans burdens future generations, with daily interest payments sitting on a huge amount of dead capital in the form of government-owned land and buildings".
"so country acquire cash pay revenue advance weight group of people yet to come everyday premium installment sit colossal sum dead capital metropolitan holding worth structure govt-possessed land building" "Therefore, the nation borrows money, pays interest on loans, and burdens future generations with daily interest payments while sitting on enormous amounts of dead capital from government-owned land buildings".

BERT Transformer Model
Development in recent years has focused on using the pretrained deep neural network as the language model and then optimizing the downstream tasks.The meaning of words and the structure of sentences must be accurately represented by these word vectors.These problems can be well solved by the BERT model.It employs a bidirectional transformer as the model's training framework [23].Three hyperparameters primarily decide the size of BERT: the quantity of transformer layers, the number of self-attention heads in every transformer, and the dimensionality of hidden state vectors.There are 12 transformer layers, 768 hidden state vectors, and 12 self-attention heads in the BERT base model.On the other hand, there are 24 transformer layers, 1024 hidden state vectors, and 16 self-attention heads in the BERT base model.Therefore, a huge computation power is required to train the BERT base as well as the BERT large model due to 110 M and 340 M parameters respectively.To mitigate this, large-scale unlabeled texts, such as those collected from Wikipedia, are used to pretrain BERT models in a self-supervised manner for two tasks: masked language modeling (MLM) and next sentence prediction (NSP).The base architecture of the model can be seen in Figure 4. Input Representation:

•
BERT takes input sequences of tokens, which are the basic units of text (words or subwords).

•
Each input sequence starts with a special token [CLS], which stands for classification, followed by the actual text tokens, and ends with a special token [SEP], which indicates the end of the sequence.

•
BERT also uses WordPiece embeddings, which break down words into subword units to handle rare or out-of-vocabulary words.
Transformer Encoder: • BERT utilizes the transformer architecture, which consists of multiple layers of selfattention mechanisms and feedforward neural networks.

•
The self-attention mechanism allows the model to weigh the importance of different words in the input sequence when encoding each word representation.

•
Each transformer layer processes the input sequence in parallel, allowing for efficient computation of contextual word embeddings.

Pre-training Objective:
• BERT is pretrained using two unsupervised learning tasks: • Masked Language Model (MLM): BERT randomly masks some of the input tokens and then tries to predict the original words based on the context provided by the surrounding tokens.

•
Next Sentence Prediction (NSP): BERT is trained to predict whether a pair of sentences appear consecutively in a document or not, helping the model learn relationships between sentences.
Fine-tuning for Text Classification: • After pretraining, BERT can be fine-tuned on a specific text classification task [24].

•
The [CLS] token representation from the final transformer layer is used as the aggregate sequence representation for the classification task.
• A simple classification layer (e.g., a SoftMax classifier) is added on top of the [CLS] representation to predict the class label for the input text.

Benefits of BERT for Text Classification:
• Bidirectional Context: BERT considers context from both directions, capturing richer semantic meaning compared to traditional models.

•
Transfer Learning: Pretraining on a large corpus allows BERT to learn general language representations, which can be fine-tuned on smaller, task-specific datasets for improved performance.

•
Attention Mechanism: The self-attention mechanism in BERT helps the model focus on relevant parts of the input sequence, enhancing its ability to understand complex relationships in text.
Pretraining and fine-tuning are the two phases of a BERT-based model's training process.The trained model could be applied to predict the labels of test samples that have not been seen yet.

Experiments and Discussion
The evaluation metrics that were employed to ascertain the effectiveness of the proposed approach are thoroughly examined in this section [25].It also looks at the hardware and software requirements needed for training and evaluating models.A thorough description of the various hyperparameters and the values that correlate with them is provided.This part also meticulously presents a thorough analysis of the results obtained with the proposed methodology.

Evaluation Metrics
Quantitative measurements known as assessment metrics are essential for assessing the effectiveness of a deep learning model.They are crucial for assessing how well various models or algorithms work on a certain task, figuring out how well a model or algorithm works to solve a certain problem, and pinpointing potential areas for development.This study used recall/sensitivity, ROC curve, accuracy, f1-score, precision, and confusion matrix as assessment metrics.These metrics provide a comprehensive assessment of the model's effectiveness and provide illuminating details regarding its benefits and possible areas for development.

•
Accuracy: To evaluate the overall accuracy of the model's predictions, the accuracy metric computes the ratio of correctly classified cases to total samples, see Equation (1).However, when errors have varying degrees of importance or when datasets are uneven, relying solely on accuracy may not be adequate to provide a comprehensive evaluation.

𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴 = (𝑇𝑇𝑇𝑇 + 𝑇𝑇𝑇𝑇)/((𝑇𝑇𝑇𝑇
• Precision: The ability of a model to correctly identify positive samples from the set of actual positives is referred to as its precision.This metric calculates the ratio of true positives to the sum of true positives and false positives, see Equation (2).In short, accuracy indicates how well the model performs when it generates a positive forecast.
• Recall: Recall measures how successfully the model separates positive samples from the actual positive pool.It is also frequently referred to as sensitivity or the true positive rate.This statistic is computed as the ratio of true positives to the sum of true positives and false negatives, see Equation (3).In essence, recall offers an assessment of the extent to which the model's favorable predictions hold.
• F1-Score: Recall and precision are balanced in the F1-score, a comprehensive measure.It is calculated as these two measurements' harmonic mean, see Equation ( 4).This is particularly helpful when there is an unequal distribution of errors between the classes or when there is no difference in the relative importance of the two error categories.The F1-score, which ranges from 0 to 1, is a combined assessment of the model's recall and precision abilities.It operates most effectively at 1.

Experiments Setup
This experiment uses Python for programming.The deep learning framework used is Keras, which is built on top of TensorFlow.The experiment was developed using the Google Colab free version (https://colab.research.google.com/(accessed on 1 February 2024)).The configuration used in the experiment is detailed in Table 2.

Hyperparameter Tuning
To attain the best results in the training of the fake financial classification model, a thorough empirical testing approach was carried out to adjust several hyperparameters.Batch size, optimizer selection, learning rate, epochs, and the choice of a suitable loss function are some of these crucial variables.The goal was to determine which combination of hyperparameter values produced the best results in the classification of fake content.For the model to successfully identify between cases of real and machine-regenerated financial content with an adequate level of accuracy and resilience, an iterative optimization procedure is required.Table 3 provides the specifics of the parameters.

Experimental Results and Discussion
In this study, we have considered the BERT models and their variations.Extensive experiments are performed for the prepared dataset.The different BERT variations, i.e., BERT base, BERT large, and Distilbert [26], are considered with fine-tuning in this study to propose an effective transformer model for fake text identification [27].The BERT models are trained on BooksCorpus (800 M words) and English Wikipedia (2500 M words).The BERT base has 110 M parameters and the large has 340 parameters.The base model is light and the best for choosing such a classification task.The BERT Base Cascade model outperformed other variations, as can be seen in Table 4.The models were evaluated for 600 test samples.From Table 4, it is observable that the BERT base case model outperformed the variational models of BERT with an accuracy of 0.73, precision of 0.73, recall of 0.73, and f1 of 0.72.The Receiver Operating Characteristics curve (ROC) is a more crucial evaluation matrix used in the proposed study.An effective two-dimensional representation of the tradeoff between true-positive and false-positive rates is an ROC curve.Every DL model was evaluated throughout the training process using a distinct set of test data that was not used in the training phase.To verify equal outcomes and assess the detectors' generalization abilities, the data were constructed in this manner.The ROC curves of the BERT variation models are displayed in Table 5.

Robustness of the Proposed Model
For the robustness of the proposed fine-tuned BERT base model, we have also considered several machine learning models in this study.The TF-IDF (Term Frequency-Inverse Document Frequency) technique [28] has been examined in this study with machine learning models.This statistical measure assesses a word's significance in a document about a corpus of documents.For tasks like text mining, text classification, and search engine optimization, it is frequently utilized in natural language processing and information retrieval.The results of well-known machine learning models with TF-IDF can be seen in Table 6.Table 6.Evaluation scores of machine learning models with TF-IDF vectorizer for fake financial text dataset.

Model
Accuracy Precision Recall F1 Logistic Regression [29] 0.57 0.57 0.57 0.56 Random Forest Classifier [30] 0.47 0.47 0.47 0.47 Gradient Boosting Classifier [31] 0.47 0.47 0.47 0.47 K-Neighbors Classifier [32] 0.46 0.45 0.46 0.45 Decision Tree Classifier [33] 0.40 0.40 0.40 0.40 MLP Classifier [34] 0.59 0.59 0.59 0.58 AdaBoost Classifier [35] 0.54 0.52 0.54 0.52 Bagging Classifier [36] 0.44 0.44 0.44 0.44 SVC [37] 0.53 0.56 0.53 0.52 Quadratic Discriminant Analysis [38] 0.52 0.40 0.52 0.38 The BERT Base Cased model outperforms other models because of its ability to capture context through bidirectional training while maintaining case sensitivity, which is critical for identifying financial institutions.While BERT Large models contain more parameters, they are prone to overfitting and need more computational resources, making them unsuitable for our dataset.DistilBERT, while efficient, makes several tradeoffs in performance.Traditional models such as Logistic Regression, Random Forest, and others that use TF-IDF and Word2Vec embeddings lack the depth of contextual understanding that BERT has, resulting in lesser accuracy.This blend of complexity and generalization makes BERT Base Cased the best fit for our requirements.The extensive experiments conclude that the BERT Base Cased model outperformed not only BERT variations but also 10 machine learning models that were trained using TFIDF and Word2Vec separately.The BERT Base Cased model has superior performance with accuracy, precision, recall, and f1 of 0.73, 0.73, 0.73, and 0.72, respectively.Additionally, the BERT Large Un-Cased model is also effective due to accuracy, precision, recall, and f1 of 0.73, 0.73, 0.72, and 0.72.It is important to remember, nevertheless, that the 340 parameters in the BERT Large model make it computationally expensive.For the machine learning models trained with the TFIDF and Word2Vec approach, the logistic regression model with TFIDF outperformed the others with accuracy, precision, recall, and f1 of 0.62, 0.63, 0.62, and 0.62, respectively.The best models' performance in terms of the BERT base model, BERT large model, TFIDF, and Word2Vec approach can be seen in Figure 5.

Conclusions
In this proposed study, we have covered a major gap in existing studies that are mainly based on binary fake content identification, i.e., ChatGPT and human text.We have considered the AI-based content regeneration online tools, i.e., QuillBot and SpinBot, with ChatGPT to save the financial markets from unfair content or reports.The original financial tweets retrieved from Twitter are based on financial keywords, i.e., debit, credit, and loan.Extensive preprocessing like removing emojis and URLs was performed on the prepared dataset.The different BERT variation models, i.e., BERT Base Cased, BERT Base Un-Cased, BERT Large Cased, BERT Large Un-Cased, Distilbert Base Cased, and Distilbert Base Un-Cased models, were fine-tuned for our four-class dataset: Real, ChatGPT, Quill-Bot, and SpinBot.The BERT Base Cased model outperformed others with accuracy, precision, recall, and f1 of 0.73, 0.73, 0.73, and 0.72, respectively.The proposed fine-tuned BERT Base Cased model is also effective due to the limited number of parameters as compared to BERT large models.For the robustness of the proposed model, different machine learning models were trained using the TFIDF and Word2VEC approach.From the machine learning perspective, the performance of Logistic Regression with TFIDF is effective with accuracy, precision, recall, and f1 of 0.62, 0.63, 0.62, and 0.62, respectively.This study is a pioneer study concerning four classes as per our knowledge.The limited dataset and four classes are the limitations of this study.The focus on specific AI tools and financial content within Pakistan may limit the study's applicability.The keyword-based selection of real tweets may introduce noise into the dataset.In the future, extending the dataset size and improving the tweets' collection process and classes will yield further improvements through additional refining and enhanced model performance.

Figure 1 .
Figure 1.Abstract diagram for proposed study.
fall last month the pakistani consumer see half benefit"."Despitea recent fall in oil prices, Pakistani consumers only see half of the benefits"."oilcost fall last month the pakistani purchaser see half advantage""When the price of oil fell last month, Pakistani consumers benefited by half".

Figure 4 .
Figure 4. Abstract diagram of model architecture.

Figure 5 .
Figure 5. BERT base model, BERT large model, and TFIDF and Word2Vec approach-based best ML models' scores.

Table 1 .
Prepared dataset samples after preprocessing.

Table 4 .
Evaluation scores of BERT variations for fake financial text dataset.

Table 5 .
ROC curves of BERT variations for fake financial text dataset.

Table 7 .
Evaluation scores of machine learning models with Word2Vec for fake financial text dataset.