LDA-Based Topic Modeling Sentiment Analysis Using Topic/Document/Sentence (TDS) Model

Abstract: Customer reviews on the Internet reflect users' sentiments about products, services, and social events. As sentiments can be divided into positive, negative, and neutral forms, sentiment analysis identifies the polarity of information in source materials toward an entity. Most studies have focused on document-level sentiment classification. In this study, we apply an unsupervised machine learning approach to discover sentiment polarity not only at the document level but also at the word level. The proposed topic document sentence (TDS) model is based on the joint sentiment topic (JST) and latent Dirichlet allocation (LDA) topic modeling techniques. The IMDB dataset, comprising user reviews, was used for data analysis. First, we applied the LDA model to discover topics from the reviews; then, the TDS model was implemented to identify the polarity of the sentiment from topic to document, and from document to word levels. The LDAvis tool was used for data visualization. The experimental results show that the analysis not only obtained good topic partitioning results, but also achieved high sentiment analysis accuracy in document- and word-level sentiment classifications.


Introduction
Since 2000, a great deal of research has been conducted on consumers' opinions and sentiments due to the increase in online commercial applications. Business sectors and organizations have put substantial effort into determining consumers' opinions about the products or services they offer. This is because customers' decision-making processes are significantly affected by the opinions of the people around them (friends, family, etc.). As Bing et al. explained in [1], various names appear in the research literature, e.g., "sentiment analysis, opinion mining, opinion extraction, sentiment mining, subjectivity analysis, affect analysis, emotion analysis, and review mining"; however, all of them have a similar purpose and belong to the subject of sentiment analysis or opinion mining. Accordingly, the importance of sentiment analysis (SA) is being realized in a range of different domains, such as consumer products, services, healthcare, political science, social sciences, and financial services [1][2][3]. A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature level, i.e., whether the opinion expressed in a document, a sentence, or an entity feature is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise [4]. Meanwhile, with the rapid growth in the volumes of text data produced in recent years, the need for text mining has increased to discover hidden knowledge and insights from text data. Texts comprise meaningful words that describe people's opinions. Opinions are usually subjective expressions that describe people's feelings and sentiments toward entities, events, and properties [5].
Sentiment analysis, also known as opinion mining, is a task that uses natural language processing (NLP), text analysis, and computational techniques to automate the extraction or classification of the general public's positive, negative, and neutral emotions about the products and services they have used. However, finding the sentiment polarity in a large amount of textual data is an overwhelming task. For example, there is a lack of capability in dealing with complex sentences, and the existing SA techniques do not perform well in certain domains and have inadequate polarity detection and accuracy. Detailed surveys in [6,7] explained the challenges of SA and possible techniques that can be used to solve each problem. An SA task is considered a sentiment classification problem [8]. Sentiment classification is one of the most commonly studied areas. Sentiment classification techniques can be divided into machine learning (ML), lexicon-based, and hybrid approaches [9]. ML approaches use various linguistic features for sentiment classification. The lexicon-based approach implements a sentiment lexicon. The hybrid approach, as the name suggests, combines both approaches. Thus, sentiment lexicons play an important role in most sentiment classification methods. In terms of levels, the classification can be divided into document-level, sentence-level, and aspect-based classification [10]. Sentence-level classification classifies the sentiment expressed in each sentence: first, the subjectivity of the sentence is identified, and then the classification process determines whether the sentence is positive or negative. Aspect-based classification classifies sentiments based on certain aspects of entities, first identifying the entities and their aspects. To perform this analysis, supervised and unsupervised ML algorithms can be chosen.
To address the aforementioned problems, we present a robust, reliable, real-time sentiment analysis framework based on an unsupervised machine learning approach. This work was motivated by the implementation of the JST (joint sentiment topic, explained subsequently) model in sentiment classification. JST detects the sentiment values of topics without considering any external sentiment labels. This causes a slight drawback in JST's sentiment classification accuracy, because without external sentiment labels, topic discovery suffers. To improve topic quality, we used the LDA (latent Dirichlet allocation) model, which performs well on documents that mix content from multiple topics.
The main contributions of this paper are as follows:
• Increasing the automatic discovery of topics from a corpus by joining the proposed method with the LDA approach.
• Providing a more precise sentiment representation over topics, documents, and words by integrating accurate topic and document discovery.
The remainder of the paper is organized as follows: Section 2 reviews existing studies on sentiment analysis. Section 3 presents the proposed sentiment analysis approach in detail. The experimental results based on the IMDB dataset are discussed in Section 4. Section 5 highlights certain limitations of the proposed method. Finally, Section 6 concludes the paper by summarizing our findings and future research directions.

Related Works
Sentiment analysis is one of the hardest tasks in natural language processing because even humans struggle to analyze sentiments accurately. Data scientists are getting better at creating more accurate sentiment classifiers, but challenges in machine-based sentiment analysis remain. In general, existing sentiment analysis systems can be divided into two categories: traditional approaches based on computer vision, and AI-based systems using machine learning (ML) and deep learning (DL). In this section, we focus mainly on the aforementioned AI-based approaches, which are appropriate only under certain conditions. However, their features alone are not sufficient for accurate classification from text analysis. To overcome these limitations, additional sentiment attributes are required, such as feedback data, multilingual efficacy, speed and scale, and social media and multimedia data [11].

Computer Vision and Image Processing Approaches for Sentiment Analysis
Ortis et al. [12] introduced the research field of image sentiment analysis, reviewed the related problems, provided an in-depth overview of current research progress, discussed the major issues, and outlined new opportunities and challenges in this area. The first paper on visual sentiment analysis, which aimed to classify images as "positive" or "negative", dates back to 2010 [13]. In this work, the authors studied the correlations between the sentiment of images and their visual content. They assigned numerical sentiment scores to each picture based on its accompanying text (i.e., meta-data). Udit et al. [14] developed an improved sentiment analysis method using image processing techniques based on visual data. Other research related to aspect-based sentiment analysis (ABSA), which classifies the sentiment of a specific aspect in a text, is presented in [15][16][17]. Earlier research has shown that these approaches are appropriate only under certain conditions.

Artificial Intelligence Approaches for Sentiment Analysis
In recent years, DL approaches have been significantly and effectively implemented in sentiment analysis research in different ways. In contrast to the techniques reviewed earlier that rely on handcrafted characteristics, ML and DL approaches can automatically identify, extract, quantify, and analyze complicated features. Another benefit is that deep neural networks can be applied flexibly and successfully to automatic feature extraction from learned data or analyzed customer feedback data; instead of time being spent on manual feature extraction, effort can go into building a robust dataset and an appropriate network structure.
There has been a great deal of research focus on the problem of sentiment classification. Supervised and unsupervised ML approaches are used separately by most researchers for SA classification. In addition, supervised and unsupervised approaches can be combined to analyze sentiment. In [18], the authors used supervised and unsupervised methods together, proposing meta-classifiers to develop a polarity classification system. Furthermore, in [19], an unsupervised learning algorithm was applied to automatically categorize text, creating training sets from keyword lists. They classified documents into a certain number of predefined categories. The purpose of their study was to overcome the problems associated with creating labeled training documents and categorizing them manually. To evaluate the proposed methods, they implemented a traditional supervised-learning system using the same naïve Bayes classifier, and then tested and compared the performance.
Similar research was conducted by Turney et al. [20], who applied an unsupervised learning algorithm to classify semantic orientation based on the mutual information between document phrases and a small set of positive/negative paradigm words. Lin et al. [21] proposed a framework based on the LDA approach, called JST, to detect sentiments and topics simultaneously. Their JST model is also an unsupervised ML algorithm. In another interesting study, Adnan et al. used a statistical approach in the feature selection process, as detailed in [22]. The hidden Markov model (HMM) and the LDA method were used to separate the entities in a review document from the subjective expressions according to the polarity of those entities. The proposed scheme achieved competitive results for document polarity classification. Although some approaches have applied unsupervised and semi-supervised learning methods [23,24], using supervised learning techniques for aspect-based sentiment analysis is also a very popular concept in machine learning [25]. The LDA model allows documents to be explained by latent topics. There is also a widely applicable ML approach called latent semantic indexing (LSI) [26]. LSI is a well-known feature selection method that attempts to reduce the dimensionality of the data by transforming the text space into a new axis system. However, compared with LSI, LDA has a better statistical foundation for defining the topic-document distribution θ, allows inference on new documents based on previously estimated models, and avoids the overfitting problem.

Proposed Method
Pang et al. [6] mentioned that the sentiment classification problem is comparatively more challenging than traditional topic-based classification because sentiments can be expressed in a subtle manner, while topics can be identified more easily from the co-occurrence of keywords. According to the appraisal-group approach, improving sentiment polarity detection accuracy depends on incorporating prior information or a subjectivity lexicon.
In this study, we propose an unsupervised ML TDS approach to determine sentiment polarity first at the topic level, and then at the document and word levels. We applied the LDA feature selection model to discover the topics in the IMDB dataset [27]. To visualize the topic distribution, we used the LDAvis data visualization tool developed by Carson et al. [28]; it reveals aspects of the topic-term relationships, including topical distance calculation, the number of clusters, the terms, and the value of lambda. Lambda (λ) determines the weight given to the probability of a term under a topic relative to its lift (both measured on the log scale), and the value of lambda is used to compute the most relevant terms for each topic. The overall workflow of this study is shown in Figure 1.
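The λ-weighted ranking that LDAvis uses can be sketched as follows. This is a minimal sketch of the published relevance formula r = λ·log p(w|t) + (1 − λ)·log(p(w|t)/p(w)); the probability arrays below are illustrative toy values, not from our experiments:

```python
import numpy as np

def term_relevance(topic_term_prob, overall_term_prob, lam=0.6):
    """Relevance of each term under one topic, as used by LDAvis.
    lam = 1 ranks purely by in-topic probability; lam = 0 purely by lift."""
    logp = np.log(topic_term_prob)                       # log p(w | topic)
    lift = np.log(topic_term_prob / overall_term_prob)   # log lift
    return lam * logp + (1 - lam) * lift

# Toy 4-term vocabulary for a single topic.
p_w_given_t = np.array([0.5, 0.3, 0.15, 0.05])  # p(w | topic)
p_w = np.array([0.4, 0.1, 0.3, 0.2])            # marginal p(w) in the corpus
ranking = np.argsort(-term_relevance(p_w_given_t, p_w, lam=0.6))
```

Note how term 1 outranks term 0 despite a lower in-topic probability: its lift is much higher, and λ = 0.6 still gives lift substantial weight.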

Introduction to LDA
LDA is a well-known method for topic modeling. First introduced by David et al. in [29], LDA shows topics using word probabilities. LDA is an unsupervised generative probabilistic model of a corpus. The main idea of LDA is that documents are represented as random mixtures over latent topics, where a topic is characterized by a distribution over words [30]. LDA assumes that every document can be represented as a probabilistic distribution over latent topics, as shown in Figure 2. In this process, the topic distributions of all documents share a common Dirichlet prior. Each latent topic in the LDA model is also represented as a probabilistic distribution over words, and the word distributions of topics share a common Dirichlet prior. Given a corpus D consisting of M documents, with document d having N_d words (d ∈ 1, . . . , M), LDA models D according to the following generative process [31]:
1. For each topic j ∈ {1, . . . , T}, draw a word distribution φ_j ~ Dirichlet(β).
2. For each document d, draw a topic distribution θ_d ~ Dirichlet(α).
3. For each word position n in document d, draw a topic z_dn ~ Multinomial(θ_d), and then draw the word w_dn ~ Multinomial(φ_z_dn).
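The generative process can be sketched in a few lines of Python. This is an illustrative toy sampler of the process, not our experimental code; all parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_corpus(M, T, V, alpha, beta, doc_len):
    """Sample a toy corpus from the LDA generative process:
    M documents, T topics, vocabulary of size V; alpha and beta are
    symmetric Dirichlet hyperparameters."""
    phi = rng.dirichlet([beta] * V, size=T)      # step 1: topic-word distributions
    docs = []
    for _ in range(M):
        theta = rng.dirichlet([alpha] * T)       # step 2: document-topic distribution
        words = []
        for _ in range(doc_len):
            z = rng.choice(T, p=theta)           # step 3a: sample a topic
            w = rng.choice(V, p=phi[z])          # step 3b: sample a word from it
            words.append(w)
        docs.append(words)
    return docs

corpus = generate_corpus(M=5, T=2, V=10, alpha=0.5, beta=0.1, doc_len=20)
```

Inference inverts this process: given only the observed words, it recovers plausible θ and φ.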

Regarding the generative process above, the words in documents are the only observed variables, while the others are latent variables (φ and θ) and hyperparameters (α and β). To infer the latent variables and hyperparameters, the probability of the observed data D is computed and maximized as follows:

P(D | α, β) = ∏_{d=1}^{M} ∫ p(θ_d | α) ( ∏_{n=1}^{N_d} Σ_{z_dn} p(z_dn | θ_d) p(w_dn | z_dn, β) ) dθ_d

Here, α is the parameter of the Dirichlet prior on the per-document topic distributions, and β is the parameter of the Dirichlet prior on the per-topic word distributions φ. T is the number of topics, M is the number of documents, and N is the size of the vocabulary. The Dirichlet-multinomial pair (α, θ) governs the corpus-level topic distributions, and the pair (β, φ) governs the topic-word distributions. The variables θ_d are document-level variables, while z_dn and w_dn are word-level variables sampled for each word in each text document.
There have been many studies related to topic models using LDA in different fields, such as topic modeling in linguistic science, political science, medical and biomedical fields, and other research areas [32][33][34].

Topic Document Sentence (TDS) Model
The existing LDA framework comprises three hierarchical layers, where topics are associated with documents and words are associated with topics. However, researchers [21] have implemented the joint sentiment topic (JST) unsupervised ML model by adding an additional layer to the LDA model. More specifically, the sentiment labels of JST are associated with documents, topics are associated with sentiment labels, and words are associated with both sentiment labels and topics across the entire corpus. The difference between TDS and LDA is that LDA supports three layers of processing, while TDS has one additional sentiment label layer for finer-grained sentiment classification. It is important to note that, other than this one additional layer, the TDS model is similar to the JST model. What distinguishes the TDS model from the JST model is the implementation of sentiment label analysis at three levels: topic sentiment analysis, document sentiment analysis, and word sentiment analysis.
Let us assume that we have a corpus with a collection of M documents denoted by D = {d_1, d_2, d_3, . . . , d_M}, and the corresponding vocabulary of the document collection is denoted by V = {w_1, . . . , w_V}. Each document d in the corpus is a sequence of N_d words denoted by d = {w_1, . . . , w_N_d}, where N_d is the number of words in document d, and w_n is the nth word in document d, w_n ∈ V. Assume that S is the number of distinct sentiment labels, and T is the total number of topics.
The generative process shown in Figure 3a represents the original JST model [35]. Like the JST model, the proposed TDS model parameters are as follows:
As mentioned in [18], π plays an important role in identifying the document polarity. In our model implementation, π is the main change. It is worth noting that the sentiment-document distribution is applied at the total-number-of-topics (T) stage.
The formula implementation of the TDS model with joint sentiment is as follows.
P(w, z, l) = P(w | z, l) P(z, l) = P(w | z, l) P(z | l, d) P(l | d)   (4)

For the first term, by integrating out φ, we obtain:

P(w | z, l) = ( Γ(Vβ) / Γ(β)^V )^{T·S} ∏_j ∏_k ( ∏_i Γ(N_i,j,k + β) ) / Γ(N_j,k + Vβ)   (5)

where V is the size of the vocabulary, T is the total number of topics, S is the total number of sentiment labels, N_i,j,k is the number of times word i appears in topic j with sentiment label k, N_j,k is the number of times words are assigned to topic j and sentiment k, and Γ is the gamma function. The second term of Equation (4) is obtained by integrating out θ:

P(z | l, d) = ( Γ(Tα) / Γ(α)^T )^{S·D} ∏_k ∏_d ( ∏_j Γ(N_j,d,k + α) ) / Γ(N_k,d + Tα)   (6)

where D is the total number of documents in the collection, N_j,d,k is the number of times a word from document d has been associated with topic j and sentiment label k, and N_k,d is the number of times sentiment label k has been assigned to word tokens in document d.
For the third term, integrating out π gives:

P(l | d) = ( Γ(Sγ) / Γ(γ)^S )^{D} ∏_d ( ∏_k Γ(N_k,d + γ) ) / Γ(N_d + Sγ)

where γ is the parameter of the Dirichlet prior on the sentiment-document distribution π, D is the total number of documents in the collection, N_k,d is the number of times sentiment label k has been assigned to word tokens in document d, and N_d is the total number of words in document d.
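The count ratios above drive a collapsed Gibbs sampler: for each word, a (topic, sentiment) pair is drawn with probability proportional to their product. The following is a simplified sketch of one such sampling step, assuming count arrays shaped as described in the text and a symmetric γ prior on π, as in the standard JST sampler; it is not the authors' implementation:

```python
import numpy as np

def sample_topic_sentiment(counts, w, d, alpha, beta, gamma, rng):
    """One collapsed Gibbs step for word w in document d: sample a
    (topic j, sentiment k) pair proportional to the three count ratios
    obtained by integrating out phi, theta, and pi."""
    N_ijk, N_jk, N_jdk, N_kd = counts   # shapes: (V,T,S), (T,S), (T,D,S), (S,D)
    V = N_ijk.shape[0]
    T, S = N_jk.shape
    # (T, S) matrix of unnormalized probabilities
    p = ((N_ijk[w] + beta) / (N_jk + V * beta)                    # word | topic, sentiment
         * (N_jdk[:, d, :] + alpha) / (N_kd[:, d] + T * alpha)    # topic | sentiment, doc
         * (N_kd[:, d] + gamma) / (N_kd[:, d].sum() + S * gamma)) # sentiment | doc
    p = p.ravel() / p.sum()
    idx = rng.choice(T * S, p=p)
    return idx // S, idx % S            # chosen topic j and sentiment k

# Tiny demo with V=3 words, T=2 topics, S=2 sentiments, D=1 document.
rng = np.random.default_rng(1)
counts = (np.ones((3, 2, 2)), np.full((2, 2), 3.0),
          np.ones((2, 1, 2)), np.full((2, 1), 2.0))
j, k = sample_topic_sentiment(counts, w=0, d=0, alpha=0.5, beta=0.1, gamma=1.0, rng=rng)
```

In a full sampler, the counts for the current word are decremented before this step and re-incremented for the newly sampled pair.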

Experimental Results and Analysis
This section presents the topic modeling and experimental setup of sentiment polarity classification based on the IMDB dataset. We implemented and tested the proposed method in Visual Studio 2019 C++ on a PC with a 3.20-GHz CPU, 32 GB of RAM, and two Nvidia GeForce 1080Ti GPUs.

Preprocessing the Dataset
Data cleaning is one of the most important processes for obtaining accurate experimental results. To test our model, we used the IMDB movie review dataset [27]. The dataset included 50,000 reviews that were evenly divided into positive and negative reviews. The dataset was divided into an 80% (40,000) training set and a 20% (10,000) testing set. First, unnecessary columns were removed. The second step included spelling corrections and the removal of stray spaces, HTML tags, square brackets, and special characters in the text, and the expansion of contractions. Emojis were then handled by converting them to the appropriate meaning of their occurrence in the document. Thereafter, we converted all text to lowercase and removed text in square brackets, links, punctuation, and words containing numbers. Next, we removed stop words, because these make the analysis less effective and confuse the algorithm. Subsequently, to reduce the vocabulary size and overcome the issue of data sparseness, stemming, lemmatization, and tokenization were applied. We normalized the text in the dataset to transform it into a single canonical form. With the aim of achieving better document classification, we also performed count vectorization for the bag-of-words (BOW) model. The BOW model can be used to calculate various measures to characterize the text. For this calculation, term frequency-inverse document frequency (TF-IDF) is the best-suited method; basically, TF-IDF reflects the importance of a word [36]. We applied the N-gram model to avoid the shortcomings of BOW when dealing with sentences whose words share the same meaning. The N-gram model parses the text into units, each carrying a TF-IDF value, and is an effective representation for sentiment analysis. Each N-gram of the parsed text becomes an entry in the feature vector with the corresponding TF-IDF feature value [37].
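The cleaning steps above can be sketched as follows. This is a minimal illustration covering only the regex-based steps; stemming, lemmatization, emoji handling, and a full stop-word list are omitted, and the stop-word set here is a small placeholder:

```python
import re
import string

# Illustrative stop-word subset; the experiments used a fuller standard list.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "of", "to", "in", "this"}

def preprocess(text):
    """Cleaning steps described above: lowercase, strip HTML tags, text in
    square brackets, links, words with digits, punctuation, and stop words."""
    text = text.lower()
    text = re.sub(r"<.*?>", " ", text)         # HTML tags
    text = re.sub(r"\[.*?\]", " ", text)       # text in square brackets
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"\w*\d\w*", " ", text)      # words containing numbers
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [t for t in text.split() if t not in STOP_WORDS]

tokens = preprocess("This movie is <b>great</b> [spoiler] ... a 10/10!")
```

The surviving tokens would then be stemmed/lemmatized and fed into count vectorization.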
After preprocessing, the LDA model was applied. LDA is a three-level hierarchical Bayesian model that creates probabilities at the word, document, and corpus levels, where corpus level means all documents in the dataset. A model was then developed for identifying the unique words in the initial documents and the number of unique words remaining after removing rare and common words. For the analysis and visualization of document relationships, the developed LDA model was applied at the corpus level, which was used for whole-document visualization. The LDA application greatly reduced the dimensionality of the data. We used a web-based interactive visualization system called LDAvis, developed by Carson et al. [28]. LDAvis helps to interpret the meaning of each topic and measures the prevalence of each topic and the relationships among topics. The removal of rare and common tokens from the documents decreased the number of unique words: initially, the documents covered 163,721 unique words, and after the process, 24,960 unique words remained. From 50,000 documents, the experiments yielded 24,960 unique tokens, as shown in Table 1. Experiments were benchmarked on the IMDB dataset [27], which contains an even number of positive and negative movie reviews retrieved from IMDB. In text mining, datasets often contain typographical errors, and misspelled words decrease classification accuracy. Since the main purpose of this research is to increase sentiment classification accuracy, we used the TextBlob Python library to correct misspelled words. The TextBlob library is useful for manifold natural language processing (NLP) tasks, such as detecting sentiment orientation, the intensity of every word in a sentence, and spelling correction.
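A minimal version of this bag-of-words plus LDA pipeline can be sketched with scikit-learn. Our implementation is in C++; the toy corpus, topic count, and `min_df`/`max_df` thresholds below are illustrative stand-ins for the rare/common word removal described above:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for the preprocessed IMDB reviews.
docs = ["good movie great acting", "bad plot terrible acting",
        "great film good story", "terrible movie bad story"]

# Bag-of-words counts; min_df/max_df drop rare and overly common words,
# mirroring the token filtering described above.
vectorizer = CountVectorizer(min_df=1, max_df=0.9)
X = vectorizer.fit_transform(docs)

# A small LDA model; our experiments used four topics on 50,000 reviews.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # rows: documents, columns: topic weights
```

Each row of `doc_topics` is a normalized topic distribution for one document, which is exactly the representation visualized by LDAvis.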

Performance Measurement
For the evaluation, we used a confusion matrix, which is commonly used to describe data classification performance via three metrics: accuracy, recall, and precision, as highlighted in Equations (7)-(9) and Table 2. To evaluate our model, we first divided each document into two parts and analyzed whether the topics assigned to each part were similar. To implement this analysis, we used a corpus LDA model transformation. The LDA transformation returns, for every document, the topics with non-zero weights; this function then creates a matrix transformation of documents in the topic space. To compare the LDA transformations, we chose the cosine similarity method, which is simple and effective. For two given attribute vectors A and B, the cosine similarity is represented as:

cos(A, B) = (A · B) / (||A|| ||B||)

where A represents the target sentence's word vector and B represents the word vector of the compared sentence. For word-count vectors, the dot product A · B reflects the number of shared words between A and B, and ||A|| and ||B|| reflect the numbers of words in A and B. Table 3 shows an evaluation check of document similarity at the corpus level. The intra-topic similarity was highly accurate, which can be explained by the fact that the topics in the corpus are well modeled. Moreover, the inter-topic similarity was well separated, with an accuracy of 99.92%. Intra- and inter-document evidence has proved effective in improving sentence sentiment classification [38]. We ran the LDAvis tool for four-topic visualization to determine the most relevant terms for every topic and the percentage of tokens; this tool was also helpful for visualizing topic correlations. Table 4 highlights the most frequent terms for the four topics. The frequency rates of all topics are low, which indicates that the analyzed data are large and sparse. Topics can be indicated from the frequency of words; however, the main purpose of this study is to determine which sentiment class those topics, sentences, and words represent.
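The cosine similarity used for the document-split evaluation can be computed as follows. The word-count vectors here are toy values, not our experimental data:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors A and B:
    cos(A, B) = (A . B) / (||A|| ||B||)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy word-count vectors over a shared 4-word vocabulary.
a = np.array([1, 1, 0, 1])   # target half-document
b = np.array([1, 1, 1, 0])   # compared half-document
sim = cosine_similarity(a, b)  # 2 shared words -> 2 / (sqrt(3) * sqrt(3))
```

Applying this to the topic-space vectors of the two halves of each document yields the intra- and inter-topic similarities reported in Table 3.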
The TDS model implementation identified sentiment classes of data, as can be seen in Figure 4.
Sentiment polarities are presented in Figure 4 at the document level and in Figures 7 and 8 at the word level. The tokenized word length distribution table represents the polarity of the documents; the length shows how many sample tokenized words each document includes. As mentioned before, we have 50,000 documents, of which 24,960 are unique tokens. As can be seen from Figure 5, most text lengths fall within [0, 80], and texts with lengths in [0, 40] form the vast majority. Texts with lengths between [10, 40] were selected for the corpus sentiment analysis experiment; the selected texts cover 14,000 documents. Every word appearing in a sentence contributes to identifying document polarity, every document has a role in identifying the entire topic, and the topic can be classified as positive or negative. To adopt this theory, word-level sentiment classification is important.
For the word-level sentiment class, we identified the top seven most common positive and negative sentiment words. Figure 6 represents the most frequent words in the entire dataset; words are classified into sentiment classes with their frequency counts. The representative figures are examples of the seven words most commonly used as positive and negative in our entire dataset. Some words appear in all sentiment classes; however, this measurement is performed subjectively. When classification was done at the sentence level, all group-dependent sentiments were classified into the class in which they were present. Nevertheless, by comparing across all sentiment classes, it can be predicted which sentences are more dominant in a given sentiment class, and that class is taken as the identified sentiment class. From the experimental results, it can be concluded that if a word is represented comparatively more in the positive class than in the other classes, the word is classified as positive. The emotional tendency score judgment rules are as follows:

{Positive: score ≥ 0.05, Negative: score ≤ −0.05} (11)

The score indicates how negative or positive the overall text is: anything below a score of −0.05 is tagged as negative, and anything above 0.05 as positive. Table 5 describes the performance of the TDS model. Our data are well balanced and show relatively high results in positive precision, negative recall, and negative F1 scores. To check the validity of the proposed model, the following models were compared.
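The emotional tendency rule in Equation (11) can be sketched as follows. The neutral fallback for scores strictly between the two thresholds is an assumption, since the rule explicitly defines only the positive and negative cases:

```python
def tag_sentiment(score, pos_thresh=0.05, neg_thresh=-0.05):
    """Apply the emotional tendency rule: scores >= 0.05 are positive,
    scores <= -0.05 are negative, and anything in between is treated
    as neutral (assumed fallback)."""
    if score >= pos_thresh:
        return "positive"
    if score <= neg_thresh:
        return "negative"
    return "neutral"

labels = [tag_sentiment(s) for s in (0.3, -0.2, 0.01)]
```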
The comparison models are an SVM [39], which uses the bag-of-words model for representing texts and TF-IDF for calculating word weights; the CNN model proposed by Kim [40]; and an LSTM model with 300-dimensional vectors [41]. For training the models, the maximum sentence length was set to 60, and zero filling was performed so that all sentences had the same length.

Figure 6 (most common sentiments): film, good, man, watch, see, get, act.

Table 6 and Figures 7 and 8 show the sentiment classification results in comparison with the SVM, CNN, and LSTM models. The precision, recall, and F1 score of the positive sentiment class are denoted by P pos, R pos, and F1 pos.
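The SVM baseline described above (bag-of-words features weighted by TF-IDF, fed to a linear SVM) can be sketched with scikit-learn; the toy training texts and labels below are invented for illustration and are not the paper's data.

```python
# Hedged sketch of the TF-IDF + linear SVM baseline [39].
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["great film, really good acting", "terrible plot, waste of time",
         "good watch, would see again", "bad film, awful script"]
labels = ["pos", "neg", "pos", "neg"]

# TfidfVectorizer builds the weighted bag-of-words representation;
# LinearSVC fits the linear decision boundary on top of it.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["great film with good acting"]))
```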
For the negative sentiment class, precision is denoted by P neg, recall by R neg, and the F1 score by F1 neg, respectively. The F1 score is the harmonic mean of precision and recall; hence, this score considers both false positives and false negatives. Intuitively, it is not as easy to understand as accuracy, but F1 is used more commonly than accuracy. It can be seen that the deep learning models performed better in sentiment classification than the SVM, which is a traditional method. The F1 score is calculated using the following Equation (12):

F1 = 2 × (Precision × Recall) / (Precision + Recall) (12)

It can be seen from Figure 7 that the TDS model performs better only in the positive precision sentiment class compared with the other three models, reaching 94.0% in the TDS model versus 84.8%, 86.0%, and 85.1% in the SVM, CNN, and LSTM models, respectively. Nevertheless, the TDS model reaches the lowest classification results in the recall and F1 score classes. The range of the text sentiment score was between [−1, 1] for all distributed samples, whose selected experimental data length was between [10, 40].
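The per-class metrics above can be computed directly from raw counts; the function and variable names below are our own, and the example counts are illustrative.

```python
# Precision, recall, and the F1 score of Equation (12) from raw counts
# of true positives (tp), false positives (fp), and false negatives (fn).
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall, Equation (12)."""
    return 2 * p * r / (p + r)

# High precision with low recall still yields a modest F1,
# matching the behaviour discussed for the TDS model.
p, r = precision(94, 6), recall(50, 50)  # p = 0.94, r = 0.5
print(round(f1_score(p, r), 3))          # 0.653
```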

Figure 8 shows the negative sentiment class classification performances of the three models compared with our TDS model. The negative recall (R neg) results indicate that our model achieved relatively higher classification performance than the other models, reaching around 99.0%, while the SVM, CNN, and LSTM models reached around 83.5%, 85.82%, and 86.1%, respectively.

Limitations
Some classification results of the TDS model were comparatively higher than those of the other models; however, the model does not achieve dominance in all parts of the classification. Moreover, the F1 score is consistently low in both the positive and negative sentiment classifications. In future work, we will improve the data analysis by separating the score interval into several parts in order to keep the learning rate balanced.

Conclusions
In this study, topic-based SA was analyzed. LDA, which is an unsupervised ML technique, was successfully used for topic modeling. Furthermore, we implemented the TDS model in this study, the main idea of which was to model topics with the aim of improving sentiment classification. The topic similarity checks yielded accurate results. Thereafter, from highly similar topics, we classified sentiments at the topic, document, and word levels. Our experimental results confirm that the TDS model is an excellent ML technique for modeling topics and classifying sentiment polarity. However, the results showed that the TDS model achieved accurate results only in the positive precision and negative recall scores. According to our assessment, one of the main reasons for the low classification performance in the R pos and F1 pos positive sentiment classes and in the P neg and F1 neg negative sentiment classes is probably the high fluctuation of the score accuracy across score intervals.