Sentiment Analysis of Text Reviews Using Lexicon-Enhanced BERT Embedding (LeBERT) Model with Convolutional Neural Network

Sentiment analysis has become an important area of research in natural language processing. It has a wide range of applications, such as understanding user preferences from feedback on e-commerce portals, in politics, and in governance. However, accurate sentiment analysis requires robust text representation techniques that convert words into precise vectors representing the input text. Text representation techniques fall into two categories, lexicon-based techniques and machine learning-based techniques, and previous research shows that both have limitations. For instance, pre-trained word embeddings such as Word2Vec, GloVe, and bidirectional encoder representations from transformers (BERT) generate vectors by considering word distances, similarities, and occurrences, while ignoring other aspects such as the sentiment orientation of words. To address these limitations, this paper presents a sentiment classification model (named LeBERT) that combines a sentiment lexicon, N-grams, BERT, and a convolutional neural network (CNN). In the model, the sentiment lexicon, N-grams, and BERT are used to vectorize words selected from a section of the input text. The CNN serves as the deep neural network classifier that maps features and outputs the sentiment class. The proposed model is evaluated on three public datasets: Amazon product reviews, IMDb movie reviews, and Yelp restaurant reviews. Accuracy, precision, recall, and F-measure are used as performance metrics. The experimental results indicate that the proposed LeBERT model outperforms existing state-of-the-art models, with an F-measure of 88.73% in binary sentiment classification.


Introduction
Recently, social media platforms have created opportunities for businesses and organizations to obtain feedback from their customers and clients through reviews in the form of user-generated posts. Such posts are made available through social media and the World Wide Web, for example in blogs, and contain data in text, audio, or visual form, or a combination of the three modes. Social media text data, in particular, consist of short sentences that are unstructured or semi-structured and often full of colloquial language, which makes building their vector representations and classifying their sentiment messy, difficult, and time-consuming [1][2][3][4]. However, through sentiment analysis (SA), one of the big data analytics techniques, such text data can provide insightful business information [4]. Sentiment analysis is the process of classifying texts into predetermined opinion classes [3], and it can be performed at the document, sentence, or word level. Sentence-level SA is a text classification task that assigns short texts (sentences) to predefined sentiment or opinion classes.

Related Work
Sentiment analysis (SA) is a branch of NLP that uses text mining and related technologies to classify subjective text into classes of opinions, emotions, or other categories. Vector representation of text is a central task in sentiment analysis, since it determines the accuracy and efficiency of the resulting SA models [5]. Many recent studies have applied lexicon-based techniques, pre-trained word embeddings, NLP techniques, and deep learning models to vector representation and to SA more generally. Section 2.1 discusses current research on lexicon-based techniques, N-grams, and NLP, and Section 2.2 discusses research on pre-trained word embeddings and deep learning models.

Lexicon-Based Techniques, N-Grams and Natural Language Processing
Lexicon-based techniques use a dictionary of words labeled with their sentiment orientations. In such techniques, a piece of text is converted into a bag of words whose sentiment orientations are summed or otherwise aggregated to classify the text. This technique is simple, but it depends largely on manual labeling [22]. Baharudin and Khan [23] argued that sentence structure and contextual information are important for sentiment orientation and classification. In their work, each term in a sentence was assigned a sentiment score from the SentiWordNet lexicon, and the sentence was classified by the sum of the scores of its terms. One limitation of this approach is that words with the same orientation can negate one another in context, which leads to a wrong sentiment classification. The main improvements to lexicon-based techniques involve using lexicon-labeled words as input to machine learning classifiers. Mudinas et al. [24] combined a lexicon-based approach with a support vector machine (SVM): they generated word sentiment labels and used them as input to the SVM classifier. Seyed et al. [14] used several lexicons to assign lexicon vectors to the words in a text, which they called Lexicon2Vec (L2V), and combined these with Word2Vec and PoS2vec vectors to obtain a hybrid vector representation. In general, little research has combined lexicon-based methods with deep learning architectures. Huang et al. [25] proposed a sentiment analysis model for online reviews, which they called the polymerization topic sentiment model (PTSM), in which a sentiment lexicon was used to extract sentiment information from online reviews. Although their model performed well with a support vector machine, they did not test it with deep learning classifiers or word embedding algorithms.
However, they recommended lexicon-based methods for solving the over-fitting problem of sentiment analysis models and for filtering out unnecessary information.
Generating word N-grams is another important NLP technique applicable to sentiment analysis. In text classification, word N-grams are used to generate word co-occurrence patterns and vectors for machine learning classifiers. N-gram models are widely used because of their simplicity and effectiveness [26]. However, they do not consider the information carried by the word sequence beyond each N-gram; for instance, words can negate one another within a sentence or carry different meanings in different contexts. In recent research on N-grams for text representation, Kumar et al. [27] proposed a big data analytics framework for sentiment analysis and classification using intelligent cognitive-inspired computing, in which bi-grams and tri-grams were used to extract features from text data and fuzzy cognitive maps served as classifiers. Their work yielded promising results, indicating that N-grams can be used for effective text representation. They also recommended future research on deep learning architectures, an area explored in this work. We advance their work by investigating hybrid NLP techniques, including N-grams and a sentiment lexicon, and by combining pre-trained word embeddings with the sentiment lexicon and N-grams.
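The score-summation approach described above (for example, the SentiWordNet-based scoring of Baharudin and Khan [23]) can be sketched in a few lines of Python. This is a minimal illustration, not the cited authors' implementation: the tiny `TOY_LEXICON` and its scores are hypothetical stand-ins for a real lexicon such as SentiWordNet, and tokenization is a plain whitespace split. The second usage line shows the negation weakness noted above: "not good" is still scored as positive because each word is scored in isolation.

```python
# Hypothetical toy lexicon; a real system would load SentiWordNet or a similar resource.
TOY_LEXICON: dict[str, float] = {
    "good": 1.0,
    "great": 1.0,
    "bad": -1.0,
    "terrible": -1.0,
}


def lexicon_score(sentence: str, lexicon: dict[str, float]) -> float:
    """Sum the polarity scores of all words in the sentence (unknown words score 0)."""
    return sum(lexicon.get(token, 0.0) for token in sentence.lower().split())


def lexicon_classify(sentence: str, lexicon: dict[str, float]) -> str:
    """Classify by the sign of the summed score; a zero total is treated as neutral."""
    score = lexicon_score(sentence, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(lexicon_classify("The food was great", TOY_LEXICON))     # positive
    print(lexicon_classify("The food was not good", TOY_LEXICON))  # positive: negation is ignored
```

Aggregating word scores this way is what makes the method simple, and also why it cannot see that "not" flips the meaning of "good".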

Word Embeddings-Based Techniques and Deep Learning Models
Word embedding-based vector representation techniques have recently come to play an important role in natural language processing [28]. According to Mikolov [29], research on word embedding feature selection gained momentum in 2013. The main word embedding algorithms used to convert words into vectors are Word2Vec [15], GloVe [16], and FastText [17,30]. More recently, the bidirectional encoder representations from transformers (BERT) model [18] has received much attention owing to its bidirectional and attention mechanisms, and BERT embedding-based models have outperformed other models, showing remarkable performance in sentiment analysis tasks [31,32]. Word embeddings improve on the plain bag-of-words representation, since they account for synonyms and produce vectors of lower dimensionality than bag of words [14,15]. Garg [33] studied word embeddings and found that Word2Vec embeddings performed better than the other word embedding algorithms. Currently, most researchers use pre-trained word embedding vectors as inputs to machine learning classifiers in sentiment analysis, since they are more accurate and compatible with deep neural networks [22]. However, pre-trained word embeddings ignore the sentiment orientation of words and their context, which affects sentiment classification accuracy [14,28], because they compute word vectors from word distances and synonyms.
Kim [34] studied the use of pre-trained Word2Vec vectors as inputs to convolutional neural networks (CNNs) and improved their performance by tuning the CNN hyperparameters. Wang et al. [35] used pre-trained GloVe vectors as inputs to attention-based LSTM models for aspect-level sentiment analysis. Liu et al. [21] used pre-trained Word2Vec in an idiom recommendation model for essay writing. Liu et al. [36] used pre-trained Word2Vec vectors and improved them for cross-domain classification by extending the vectors to include domain information. Recently, D'Silva and Sharma [37] used pre-trained FastText word embeddings and neural networks to classify Konkani texts. Hu et al. [38] used BERT to integrate mental features and short-text vectors to improve topic classification and false detection in short texts. Although their work showed better performance, they did not compare their proposal with other word embedding models, and they suggested further research on applying BERT in other text classification contexts. Prottasha et al. [31] compared Word2Vec, GloVe, FastText, and BERT, and demonstrated that transformer architectures such as BERT are the state-of-the-art models for text representation and play a crucial role in sentiment analysis. The advantage of BERT over other word embedding algorithms is that it reads a sequence of words in both directions. Further, BERT uses the transformer's attention mechanism, which assigns each word a vector that depends on the surrounding words; this enhances the semantic representation of the target text. However, BERT takes as input the entire word sequence of the target text. We propose that the performance of BERT can be enhanced by restricting the input sequence to the few words of the target text that carry sentiment information, together with their neighbours. This selection can be guided by a sentiment lexicon and word N-grams.
In a recent study [13], the researchers investigated a text representation technique using a sentiment lexicon and N-grams, proposing a lexicon-pointed hybrid N-gram feature extraction model (LeNFEM). A sentiment lexicon was used to identify a three-word N-gram containing a sentiment word, and this N-gram was then expanded into a hybrid vector containing words, POS tags, and sentiments. Although this text representation technique is novel, the authors proposed further investigation of how it could be applied with deep learning models, including word embeddings. In this paper, we extend that work and present a text representation technique named the lexicon selected-BERT embedding (LeBERT) model, which combines a sentiment lexicon and BERT word embeddings via word N-grams for sentiment classification.
From the related work discussed, we observe that existing deep learning models for sentiment analysis generate text representation vectors using word embeddings, and that BERT is one of the state-of-the-art embedding models; hence, any improvement to it advances sentiment analysis and natural language processing research. With this objective, this study proposes and investigates a combination of the BERT word embedding model, a sentiment lexicon, and N-grams. The novelty of the proposed LeBERT model is that the sentiment lexicon is used to identify the section of a text (a sentence or a document) where sentiment information is located, and the BERT algorithm builds word vectors from that section only. Section 3 presents the details of the proposed model.

The Proposed LeBERT Model
In deep learning, BERT is one of the word embedding and text representation models currently studied for sentiment analysis. Unlike other word embedding algorithms, BERT reads a sequence of words in both directions of the input text, and because its attention mechanism makes each word's vector depend on the surrounding words, it is efficient at word vectorization [39]. However, BERT has two drawbacks for this task. First, although it considers the context of a word when assigning its vector, it does so for all the words in the input text, which leads to a resultant vector of high dimensionality. Second, word vectors built by BERT do not explicitly encode sentiment polarity information, which is critical in sentiment classification. In contrast, a sentiment lexicon can identify sentiment words in a text and assign a specific sentiment polarity to them, but it cannot generate representative word vectors, which leads to high data sparseness. Therefore, to improve sentiment classification, this paper proposes the LeBERT model, which combines a sentiment lexicon, N-grams, and BERT.
The design idea of the LeBERT model is first to use N-grams to split the input text into sections, and then to use a sentiment lexicon to identify the section or sections that contain a sentiment word. Text reviews such as social media posts are typically short, and semantic features in short texts tend to be concentrated in a certain part [39]; extracting features from such parts is therefore expected to yield an efficient and effective text representation. The words of the identified section(s) are then converted into a vector by BERT. This vector is used as input to a CNN model with a fully connected layer, which extracts features from it. The extracted features are then integrated by the dense output layer, and finally a softmax classifier assigns the sentiment class of the text. The architecture of the proposed LeBERT model is shown in Figure 1. As shown in Figure 1, the sentiment lexicon, N-grams, and BERT algorithm are used in the embedding layer to build the word vector. The overall sentiment analysis model using the LeBERT model is presented in Figure 2.

LeBERT Embedding
There are currently two common methods for building text vectors for sentiment analysis: word embedding-based methods and lexicon-based methods. The proposed model uses both through N-grams: the sentiment lexicon identifies the word N-grams that contain a sentiment word, and the BERT word embedding model then builds the vector from the words of those N-grams.
To build the vector, we first generate word N-grams from the sentences. An N-gram is a contiguous sequence of N words from a sentence; N-gram models treat text as a Markov process and are normally used to predict the next word in a sequence. The same process also captures the co-occurrence of words, which strongly influences the sentiment of a text. Here, we use N-gram sequences to partition the text, such as an online review or a sentence, into sections that together represent the entire text, because N-grams capture word co-occurrence more comprehensively than a mere bag of words (BoW). The size of each section depends on the value of N.
For instance, consider a sentence S given as
$$S = \{w_1, w_2, w_3, \ldots, w_n\} \tag{1}$$
where $w_i$ are words. For various values of N, we have:
$$N = 1: \quad N_1 = \{w_1,\ w_2,\ w_3,\ \ldots,\ w_n\}$$
$$N = 2: \quad N_2 = \{w_1\_w_2,\ w_2\_w_3,\ w_3\_w_4,\ \ldots,\ w_{n-1}\_w_n\}$$
$$N = 3: \quad N_3 = \{w_1\_w_2\_w_3,\ w_2\_w_3\_w_4,\ w_3\_w_4\_w_5,\ \ldots,\ w_{n-2}\_w_{n-1}\_w_n\}$$
The fundamental idea is that the set of N-grams makes it possible to select a section of the entire input text, which ensures that the most significant words are used when building text vectors for sentiment analysis. Once the relevant N-gram(s) have been identified, they are converted back into a bag of words, and each word is then converted into a vector using the BERT word embedding algorithm.

The LeBERT Embedding Algorithm
Let $L$ denote the sentiment lexicon; $C$ the corpus of subjective user reviews $R_i$; $V_i$ the vector representation of a subjective review $R_i$; $W_t$ a sentiment term; $W_1$ the first word neighboring the sentiment term; and $W_2$ the second word neighboring the sentiment term.
We define the text vector $V_i$ of a subjective review $R_i$ as the vector obtained from a section $S_i$ of the review, selected with the sentiment lexicon, using the BERT word embedding model ($Be$), that is, $V_i = Be(S_i)$. The procedure for generating this vector representation is listed in Algorithm 1.
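The selection step can be illustrated with the following Python sketch. It is not the authors' Algorithm 1 but one plausible reading of it, under stated assumptions: the review is lower-cased and split on whitespace; for each sentiment term $W_t$ found in the lexicon $L$, an n-word window is kept, centred on $W_t$ so that for n = 3 it holds one neighbour on each side (a hypothetical interpretation of $W_1$ and $W_2$) and is shifted inward at the edges of the review; overlapping windows are merged; and if no sentiment term is found, all words are kept, which corresponds to the plain BERT input. The `embed` argument is a hypothetical placeholder for the BERT encoder $Be$; any callable that maps a word list to a vector will do for testing.

```python
from typing import Callable, Sequence


def select_sentiment_section(words: Sequence[str], lexicon: set[str], n: int = 3) -> list[str]:
    """Keep an n-word window around every sentiment term W_t found in the lexicon L.

    Assumption: the window is centred on W_t (for n = 3, one neighbour on each
    side) and shifted inward when W_t is near the start or end of the review.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(words) <= n:
        return list(words)  # the whole review already fits in one N-gram
    keep: set[int] = set()  # positions of words in the selected section(s)
    for t, word in enumerate(words):
        if word in lexicon:
            start = min(max(t - (n - 1) // 2, 0), len(words) - n)
            keep.update(range(start, start + n))  # overlapping windows merge here
    if not keep:
        return list(words)  # assumption: no sentiment term -> use all words (plain BERT input)
    return [words[i] for i in sorted(keep)]


def lebert_vector(
    review: str,
    lexicon: set[str],
    embed: Callable[[list[str]], Sequence[float]],
    n: int = 3,
) -> Sequence[float]:
    """V_i = Be(S_i): embed only the lexicon-selected section S_i of review R_i.

    `embed` is a hypothetical stand-in for the BERT encoder Be.
    """
    words = review.lower().split()
    section = select_sentiment_section(words, lexicon, n)
    return embed(section)


if __name__ == "__main__":
    lexicon = {"slow", "delicious"}  # toy lexicon, for illustration only
    words = "the service was slow but the pasta was delicious".split()
    print(select_sentiment_section(words, lexicon, n=3))
    # ['was', 'slow', 'but', 'pasta', 'was', 'delicious']
```

With n = 1 only the sentiment terms themselves survive (here `['slow', 'delicious']`), which shows why a single word carries too little context to represent a whole review.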

The CNN Layer
The CNN deep learning model serves as the classifier: it takes the vector produced by the LeBERT embedding as input and outputs the sentiment class. CNNs are a specialized type of artificial neural network that can outperform common machine learning algorithms in supervised learning tasks. Their main function is to identify and learn characteristic patterns in the data through convolution layers, thereby facilitating classification. The CNN model is presented in Figure 3. Convolution kernels (filters) slide over the input in windows, and a nonlinear activation function applied to their outputs produces feature maps. A pooling operation is then applied to the feature maps to select the most salient features. Finally, the dense output layer classifies these features into a positive or a negative class using the softmax activation function, which outputs a probability for each class.
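The forward pass of such a CNN classifier (convolution, ReLU, pooling, a dense hidden layer, and a softmax output) can be sketched in NumPy. This is an untrained, illustrative sketch rather than the authors' TensorFlow model. Assumptions: the input is a matrix with one row per position; since the experiment setup uses BERT's pooled output, the example feeds a single 128-dimensional vector as a (128, 1) matrix, that is, 128 positions with one channel; the kernel size of 3, the 64 filters, and global max pooling are hypothetical choices, while the 100-unit ReLU hidden layer and the 2-unit softmax output follow the experiment setup.

```python
import numpy as np


def conv1d_valid(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """1-D 'valid' convolution: x is (T, D), w is (k, D, F), b is (F,); returns (T - k + 1, F)."""
    k = w.shape[0]
    if x.shape[0] < k:
        raise ValueError("input has fewer positions than the kernel size")
    out = np.empty((x.shape[0] - k + 1, w.shape[2]))
    for t in range(out.shape[0]):
        # Contract the window (k, D) against the kernels (k, D, F) -> (F,)
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return out


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def init_params(emb_dim: int, kernel_size: int = 3, n_filters: int = 64,
                hidden: int = 100, n_classes: int = 2, seed: int = 0) -> dict:
    """Random (untrained) weights; kernel_size and n_filters are hypothetical."""
    rng = np.random.default_rng(seed)
    return {
        "conv_w": rng.normal(0.0, 0.1, (kernel_size, emb_dim, n_filters)),
        "conv_b": np.zeros(n_filters),
        "dense_w": rng.normal(0.0, 0.1, (n_filters, hidden)),
        "dense_b": np.zeros(hidden),
        "out_w": rng.normal(0.0, 0.1, (hidden, n_classes)),
        "out_b": np.zeros(n_classes),
    }


def cnn_forward(x: np.ndarray, params: dict) -> np.ndarray:
    """Input (T, D) -> class probabilities (n_classes,)."""
    feature_maps = relu(conv1d_valid(x, params["conv_w"], params["conv_b"]))
    pooled = feature_maps.max(axis=0)  # global max pooling over positions
    hidden = relu(pooled @ params["dense_w"] + params["dense_b"])  # 100-unit ReLU layer
    return softmax(hidden @ params["out_w"] + params["out_b"])  # 2-unit softmax output


if __name__ == "__main__":
    # Random stand-in for a 128-dim BERT pooled output, treated as 128 positions x 1 channel.
    pooled_output = np.random.default_rng(1).normal(size=128)
    x = pooled_output.reshape(-1, 1)
    probs = cnn_forward(x, init_params(emb_dim=1))
    print(probs, probs.sum())
```

Because the weights are random, the printed probabilities are meaningless until the model is trained; the sketch only shows how shapes flow from the embedding through the feature maps to the two class probabilities.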
Appl. Sci. 2023, 13, 1445

Experiments
This section describes the dataset and the experimental setup used to evaluate the performance of the proposed model, together with the tools and techniques used in model formulation and evaluation.

Dataset
To evaluate the effectiveness of the proposed model, the experiments were carried out using a dataset compiled from three public real-world datasets: the Amazon product reviews dataset, with 70,000 reviews; the IMDb dataset, with 50,000 movie reviews; and the Yelp dataset, with 300,000 restaurant reviews. In the experiments, we used 3000 reviews, as compiled by Kotzias et al. [40] and published in a machine learning repository. For each website, Kotzias et al. [40] randomly sampled 500 positive and 500 negative sentences that were clearly positive or negative.

Experiment Setup
The reviews in the dataset were cleaned of non-English words and preprocessed. Tokenization, N-gram generation, text vector building, and the design of the CNN layers were carried out in Python on Google Colaboratory. The resulting vectors were split into two subsets: 80% of the data was used to train the CNN model, and the remaining 20% was used to evaluate classification performance. Because each input contained multiple sentences (a review), the pooled output of BERT, a single vector for the whole input sequence, was used as the embedding. The rectified linear unit (ReLU) was used as the activation function of the hidden fully connected layer, which had 100 neurons. The dense output layer had two neurons, since the texts were to be classified into two classes, and used softmax as its activation function, in line with the classification problem at hand. In the study, we used 50-dimensional GloVe word embeddings trained on Google News, 250-dimensional Word2Vec embeddings trained on Wikipedia, and 128-dimensional BERT embeddings trained on the English Wikipedia corpus.
In the experiments, we used TensorFlow tools to prepare the data and build the proposed model. From the training set, a small portion (100 reviews) was used for validation. Section 4.3 presents the parameters of the designed CNN model.
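The data partitioning described above (80% training, 20% testing, and 100 training reviews held out for validation) can be sketched as follows. This is a minimal sketch under assumptions the text does not settle: it uses a plain random shuffle with a hypothetical fixed seed, it does not stratify by class, and it is applied to one list of reviews at a time, whether that is one source dataset or the combined 3000 reviews; `split_reviews` is an illustrative name.

```python
import random


def split_reviews(reviews, test_frac: float = 0.2, n_val: int = 100, seed: int = 42):
    """Shuffle, hold out test_frac for testing, then take n_val validation items from the training part."""
    items = list(reviews)
    random.Random(seed).shuffle(items)  # fixed seed (hypothetical) for reproducibility
    n_test = round(len(items) * test_frac)
    test, train = items[:n_test], items[n_test:]
    val, train = train[:n_val], train[n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_reviews(range(1000))  # e.g. one 1000-review source dataset
    print(len(train), len(val), len(test))  # 700 100 200
```

For a 1000-review source dataset this yields 700 training, 100 validation, and 200 test reviews; applied to all 3000 reviews it would yield 2300, 100, and 600.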

Model Parameters
BERT, GloVe, and Word2Vec Pre-Trained Word Embeddings
The model parameters for the BERT word embeddings are shown in Table 1, in which the Keras layer gives the shape of the embedding and the preprocessor used for the BERT model. Owing to limited computational resources, the BERT word embeddings were initialized with a small BERT model; accordingly, the embedding dimension was set to 128, with the matching BERT preprocessor. GloVe and Word2Vec word embeddings of 50 and 250 dimensions, respectively, were used as baseline models, with the parameters shown in Tables 2 and 3. In Tables 2 and 3, the Keras layer is the input layer, whose input vector was obtained from the GloVe or Word2Vec word embeddings; its shape is determined by the dimension of the word embeddings. The dense output layer performs binary classification of the input text into positive or negative sentiment.

Model Performance Evaluation
To verify the effectiveness of the proposed model, a 2 × 2 contingency (confusion) matrix [41] was used, as shown in Table 4; it gives the numbers of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions. Four evaluation metrics were selected: accuracy, precision, recall, and F-measure, calculated from Table 4 as given in Equations (2)-(5).
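Accuracy is the ratio of correct predictions to the total number of predictions:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$
Precision is the ratio of correctly classified items to all items classified into the class:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$
Recall is the ratio of correctly classified items to all items that actually belong to the class:
$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$
F-measure is the harmonic mean of precision and recall:
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$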
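Equations (2)-(5) translate directly into code. The helper below is a minimal sketch (the function name is illustrative) that computes all four metrics from the confusion-matrix counts; returning 0.0 when a denominator is zero is an assumed convention, not one stated in the paper.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy, precision, recall, and F-measure from confusion-matrix counts (Equations (2)-(5))."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed convention: 0.0 when undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)  # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    print(classification_metrics(tp=90, tn=80, fp=20, fn=10))
```

For example, TP = 90, TN = 80, FP = 20, and FN = 10 give an accuracy of 0.85, a precision of about 0.818, a recall of 0.9, and an F-measure of 6/7 ≈ 0.857; the harmonic mean sits closer to the smaller of precision and recall than the arithmetic mean would.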

Results and Discussion
This section describes the experimental results. We first tested the effect of applying the sentiment lexicon on the input text data and the resulting vector by comparing the shape of the Yelp (restaurant reviews) dataset before and after applying the lexicon; Table 5 presents the details. As Table 5 shows, using the sentiment lexicon to select a section of the input text markedly reduced the size of the input. Although the number of posts or paragraphs remained the same, the shape of the input text changed from 11 rows to 3 rows, which in turn would reduce the model's computation time. We then designed and performed experiments with a CNN to evaluate how the LeBERT embedding model performs in sentiment analysis.

Ablation Study on Effect of Size of N-Grams on LeBERT Model
To verify the effectiveness of using the LeBERT model as the embedding layer that generates word vectors, we first studied the effect of the N-gram size on the LeBERT model with a CNN, using the restaurant reviews dataset. The results for N = 1, 2, 3 and for all words are shown in Table 6. N = 1 means that, for each sentence, only the single word chosen by the sentiment lexicon was used; performance was low in this case, since one word cannot represent the sentiment of the entire text. The highest performance was obtained at N = 3. Owing to limited computational resources, N-grams were generated only up to N = 4, as shown in Table 6. The 'All words' category means that the sentiment lexicon was not applied to select words from the input text, which reverts to the original BERT model. The results indicate that N = 3 is the ideal N-gram size for the proposed model. Section 5.2 compares the performance of the model with that of the baseline models on the three datasets.

Comparison of LeBERT Model Performance with Baseline Models
This experiment was carried out to validate the performance of the proposed LeBERT model in terms of the accuracy, recall, precision, and F-measure of the CNN on the three datasets described above, with GloVe and Word2Vec as baseline word embedding models and tri-grams (N = 3). Tables 7-9 show the results on the restaurant reviews, movie reviews, and product reviews datasets, respectively, comparing the pre-trained word embeddings with and without the proposed fusion with the sentiment lexicon. In general, the proposed LeBERT model performs better than the baseline word embedding models. Accuracy is considered a good performance evaluation metric when the classes are balanced [41]; since all three datasets in our experiments had balanced classes, we compared the accuracy of the model with that of the other approaches on the three datasets, as shown in Figure 4. As Figure 4 shows, the proposed LeBERT model achieved the highest accuracy on all datasets, with relatively lower accuracy on the Amazon product reviews dataset. This could be attributed to the reviews referring to various products, so that the sentiment terms varied from one product to another.

Conclusions
Sentiment analysis of social media reviews is a difficult task owing to the sparsity and high dimensionality of the word vectors representing the text. Combining a sentiment lexicon with word embedding algorithms can improve sentiment analysis models for text reviews. In this context, we proposed a sentiment analysis model, named LeBERT, based on a sentiment lexicon, N-grams, BERT word embedding, and a CNN. In the model, the section of a document or sentence where sentiment information is concentrated is selected using the sentiment lexicon and word N-grams, and its words are vectorized with the BERT word embedding algorithm. A CNN classifier then assigns the input vector to a sentiment class. To validate the performance of LeBERT, the original Word2Vec, GloVe, and BERT word embeddings were used as baseline models on three benchmark sentiment datasets. The experimental results show, first, that using the sentiment lexicon substantially reduces the dimension of the input vector, thus improving the efficiency of sentiment analysis models. Second, integrating the sentiment lexicon and N-grams with the BERT embedding algorithm yields a more representative word vector, increasing the predictive performance of the resulting sentiment analysis model. The results also indicate that combining the sentiment lexicon with BERT (through the LeBERT model) outperformed the other word embedding algorithms. This study has some limitations. The designed model used only a convolutional neural network (CNN); in the future, the LeBERT embedding model could be implemented and evaluated with other neural networks, such as long short-term memory (LSTM) networks. In addition, the model was tested and found effective in binary sentiment classification using a sentiment lexicon; it would be interesting to evaluate it on other text classification tasks that use other types of lexicons.