Aspect Term Extraction Based on MFE-CRF
Abstract
1. Introduction
- Put forward the Multi-Feature Embedding (MFE) model: MFE enriches the text representation built on distributed word embeddings to overcome the insufficient contextual information captured by word embeddings alone.
- Put forward MFE-CRF: MFE clustering based on word embeddings is used to strengthen the effect of CRF in aspect term extraction. The clustering classes are obtained by the K-means++ algorithm and set as additional position features for the CRF. Experiments show that MFE clustering significantly improves the performance of CRF in aspect term extraction.
2. Related Work
2.1. Aspect Term Extraction
2.2. Aspect Term Extraction Based on BIO
2.2.1. Sequence Representation of Aspect Term
2.2.2. Sequence Prediction Based on CRF
3. CRF Model Based on MFE Clustering Reinforcement
3.1. MFE
3.1.1. Semantic Capture
- Word-POS: Traditional word embedding models cannot recognize the polysemy of words. Consider the two sentences "jack works hard every day" and "his works are amazing": the meaning of "works" differs, "to work" in the former but "productions" in the latter. To eliminate this ambiguity, MFE combines the Part-of-Speech (POS) tag with each word to obtain the Word-POS feature. Given a text $W = (w_1, w_2, \dots, w_l)$ containing $l$ words, the POS sequence $P = (p_1, p_2, \dots, p_l)$ is obtained by part-of-speech tagging; combining $W$ and $P$ yields the Word-POS sequence $WP = ((w_1, p_1), (w_2, p_2), \dots, (w_l, p_l))$, so the two senses of "works" are distinguished as a verb and a noun. Table 1 compares the traditional word embedding model and the Word-POS model regarding similar elements of the IMDB dataset (http://ai.stanford.edu/∼amaas/data/sentiment/); part-of-speech tagging is done with the NLTK toolkit (http://www.nltk.org/#) and word embeddings are trained by Skip-Gram. In Table 1, "nn" denotes a noun and "vb" a verb.
- Stemming: To eliminate semantic redundancy, the stem of each word in the text is extracted, and the resulting stem sequence is set as the input of the word embedding training model. Table 2 compares the traditional word embedding model and the stem model regarding similar elements. (A minimal sketch of both semantic-capture features follows this list.)
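To make the two semantic-capture features concrete, the following is a minimal sketch using the NLTK toolkit mentioned above; the helper names and the two-letter tag truncation are our own illustration, not the authors' exact code.

```python
# nltk.download('averaged_perceptron_tagger') may be required once.
import nltk
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def word_pos_sequence(tokens):
    # Combine each word with its POS tag, e.g. ("works", "vb"),
    # so the two senses of "works" map to different features.
    tagged = nltk.pos_tag(tokens)
    return [(w.lower(), tag[:2].lower()) for w, tag in tagged]

def stem_sequence(tokens):
    # Reduce each word to its stem to remove semantic redundancy.
    return [stemmer.stem(w.lower()) for w in tokens]

tokens = "natural language processing field".split()
print(word_pos_sequence(tokens))  # e.g. [('natural', 'jj'), ('language', 'nn'), ...]
print(stem_sequence(tokens))      # ['natur', 'languag', 'process', 'field']
```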
3.1.2. Training of MFE
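- Step1. Semantic capture. Transform the raw text into a semantic features sequence (the Word-POS sequence or the stem sequence) as described in Section 3.1.1.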
- Step2. Vector mapping. Once the text has been transformed into the features sequence, it can be used as the input of the word embedding training model, which maps the semantic features sequence to MFE vectors. MFE assumes that similar semantic features occur in similar context structures, so sliding the context window over the features sequence maps texts with similar semantic features to vectors that are close in the embedding space. The overall structure of the training model is shown in Figure 1. A minimal training sketch follows.
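As an illustration, here is a minimal sketch assuming the gensim library (the paper does not name its word2vec implementation); joining Word-POS pairs into single "word_tag" tokens is our own convention for feeding them to a standard trainer.

```python
from gensim.models import Word2Vec

# Each training sample is one text expressed as a semantic features sequence.
sentences = [
    ["jack_nn", "works_vb", "hard_rb", "every_dt", "day_nn"],
    ["his_prp", "works_nn", "are_vb", "amazing_jj"],
]

# sg=1 selects Skip-Gram; sg=0 would select CBOW.
mfe = Word2Vec(sentences, sg=1, vector_size=300, window=10,
               negative=5, min_count=1)
vector = mfe.wv["works_vb"]  # the MFE vector of the verb sense of "works"
```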
3.2. Features of MFE-CRF
3.2.1. General Features
- Lower-case word: Uppercase and lowercase information is considered separately, so the lower-cased form of each word is used as a general feature.
- Suffix of word: The suffix of a word can indicate whether the word is an aspect term. The last two and three characters of each word are extracted as general features.
- Capital: Proper nouns and special words usually begin with an uppercase letter; such words are more likely to be aspect terms.
- Part-of-speech: Aspect terms are usually nouns or words with other specific parts of speech; therefore, part-of-speech is one of the most important general features.
- Stemming: The stem of each word is extracted to obtain a more compact sentence representation.
- Dependency syntax (a minimal sketch combining all the general features follows this list):
- amod: Whether the word is modified by other words;
- nsubj: Whether the word is used as the subject for other words;
- dobj: Whether the word is used as a direct object for other words.
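As an illustration, a minimal sketch of general feature extraction for a single position; the helper name and dictionary keys mirror the example in Section 3.3, and the dependency flags are assumed to come from a parse computed elsewhere.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def general_features(word, pos_tag, dep_relations):
    return {
        "lower": word.lower(),
        "lower[-2]": word.lower()[-2:],   # last two characters
        "lower[-3]": word.lower()[-3:],   # last three characters
        "isTitle": word.istitle(),
        "POS": pos_tag,
        "stem": stemmer.stem(word.lower()),
        "amod": "amod" in dep_relations,
        "nsubj": "nsubj" in dep_relations,
        "dobj": "dobj" in dep_relations,
    }

print(general_features("screen", "NN", {"amod", "nsubj"}))
# {'lower': 'screen', 'lower[-2]': 'en', 'lower[-3]': 'een', 'isTitle': False, ...}
```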
3.2.2. Cluster Features
- Step1. Determine the value of k and choose one center uniformly at random from the sample set (all word vectors).
- Step2. For each word vector $x$, calculate the distance $D(x)$ between the vector and the nearest center that has already been chosen using Formula (5). Given two word vectors $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$ of length $n$, the distance between them is the Euclidean distance $d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$ (5).
- Step3. Choose one new center. Calculate the selection probability of each word vector $x$ using Formula (6), $P(x) = D(x)^2 / \sum_{x'} D(x')^2$ (6), then select the new center by roulette wheel selection.
- Step4. Repeat Step2 and Step3 until k centers have been chosen.
- Step5. For every word vector, calculate its distance to the k centers using Formula (5) and mark the clustering class of the closest center as the clustering class of that vector.
- Step6. Update the centers of the k classes: let $C_j$ denote the set of word vectors belonging to the $j$-th clustering class; the new centroid of the class is the mean of the vectors in $C_j$, i.e., $\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$.
- Step1. According to the steps described in Section 3.1.2, train the Word-POS embeddings and stem embeddings on the corpus.
- Step2. Cluster the Word-POS embeddings and stem embeddings by the same method used for word clustering. (A minimal sketch of the clustering procedure follows this list.)
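The following is a minimal NumPy sketch of Steps 1-6, under the formulas reconstructed above; in practice a library implementation such as scikit-learn's `KMeans(init="k-means++")` provides the same seeding.

```python
import numpy as np

def kmeans_pp(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose the first center uniformly at random.
    centers = [X[rng.integers(len(X))]]
    # Steps 2-4: pick each remaining center by roulette wheel selection,
    # with probability proportional to D(x)^2, the squared distance to
    # the nearest already-chosen center (Formulas (5) and (6)).
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Step 5: assign every vector to its nearest center.
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        # Step 6: move each center to the mean of its cluster.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

word_vectors = np.random.default_rng(1).normal(size=(200, 50))  # toy stand-in
cluster_ids = kmeans_pp(word_vectors, k=5)
```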
3.3. Process of MFE-CRF
- Step1. Extract the general features. According to the general features listed in Section 3.2.1, features are extracted at each position of the observation sequence. Taking the word "screen" as an example, its general features dictionary is {lower: 'screen', lower[–2]: 'en', lower[–3]: 'een', isTitle: false, POS: 'NN', stem: 'screen', amod: true, nsubj: true, dobj: false}.
- Step2. Extract the clustering features. Following the word and MFE clustering methods of Section 3.2.2, six clustering models are constructed with Skip-Gram and CBOW. For each word in the observation sequence, the clustering class under each of the six models is retrieved and set as an additional position feature, so each word receives a clustering features dictionary with six entries (one class per clustering model).
- Step3. Construct the CRF model. Figure 2 shows the overall process of the MFE-CRF model. The CRF model is trained on the position features of the observation sequences together with the state sequences. Once the model converges, the position features of the test text are taken as input and the BIO sequence of the test text is predicted (a minimal training sketch follows this list).
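A minimal end-of-pipeline sketch, assuming the sklearn-crfsuite package (the paper does not name its CRF implementation) and toy feature dictionaries in place of the full general and clustering features:

```python
import sklearn_crfsuite

X_train = [[  # one sentence = one sequence of per-position feature dicts
    {"lower": "the", "POS": "DT", "sg_cluster": "c17"},
    {"lower": "screen", "POS": "NN", "sg_cluster": "c3"},
    {"lower": "is", "POS": "VB", "sg_cluster": "c8"},
    {"lower": "great", "POS": "JJ", "sg_cluster": "c21"},
]]
y_train = [["O", "B", "O", "O"]]  # BIO tags: "screen" is an aspect term

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict(X_train))       # [['O', 'B', 'O', 'O']]
```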
4. Experiments
4.1. Experiment Description
- SemEval2014 (http://alt.qcri.org/semeval2014/task4/index.php?id=important-dates): The data set includes laptop and restaurant trial data. The restaurant data contain 3041 training samples and 800 test samples; the laptop data contain 3045 training samples and 800 test samples. In this experiment, all the training data were used for position feature extraction and model training, and the test data were used for evaluation. L-14 denotes the laptop dataset and R-14 the restaurant dataset of SemEval2014.
- SemEval2015 (http://alt.qcri.org/semeval2015/task12/)/2016 (http://alt.qcri.org/semeval2016/task5/): These data sets include restaurant trial data similar to SemEval2014. R-15 contains 1363 training samples and 685 test samples; R-16 contains 2048 training samples and 676 test samples. All the training data were used for model training and the test data for additional evaluation.
- Yelp dataset (https://www.yelp.com/dataset): Training word embeddings and MFE requires a large amount of data, so the experiment used an additional Yelp dataset for vector training to achieve a better effect. The Yelp dataset contains 335,022 restaurant reviews; 200,000 reviews were selected at random to expand the vector training corpus.
- Amazon product data (http://jmcauley.ucsd.edu/data/amazon/): The data set contains 1,689,188 electronic product reviews; 200,000 reviews containing the words "laptop" or "computer" were selected at random to expand the training text.
4.2. Experiment Setup and Evaluation Measures
- CRF1: CRF1 was trained on the general features of Section 3.2.1. This model was set as the baseline and provides the basic performance of aspect term extraction.
- CRF2: CRF2 was reinforced with Skip-Gram word clustering on top of CRF1. The word embedding dimension was 300, the negative sampling size was 5, the window width was 10, and words appearing fewer than 3 times were ignored (these hyperparameters are used in the sketch after this list).
- CRF3: CRF3 was reinforced with CBOW word clustering on top of CRF2; the CBOW parameters were consistent with Skip-Gram.
- CRF3+Stem: CRF3+Stem was reinforced with stem MFE clustering (both CBOW and Skip-Gram) on top of CRF3. The MFE parameters were consistent with the word embeddings.
- CRF3+WP: CRF3+WP was reinforced with Word-POS MFE clustering (both CBOW and Skip-Gram) on top of CRF3.
- CRF3+ALL: CRF3+ALL was reinforced with both stem and Word-POS MFE clustering on top of CRF3.
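Putting the pieces together, a minimal sketch of how the six clustering models could be configured; gensim and scikit-learn are assumptions, and the cluster count k=100 is a placeholder, since the choice of k is evaluated separately in Section 4.3.3.

```python
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

def train_and_cluster(sentences, sg, k=100):
    # Hyperparameters follow the experiment setup: 300 dimensions,
    # window 10, negative sampling 5, minimum frequency 3.
    model = Word2Vec(sentences, sg=sg, vector_size=300, window=10,
                     negative=5, min_count=3)
    labels = KMeans(n_clusters=k, init="k-means++").fit_predict(model.wv.vectors)
    # Map each vocabulary item to its cluster id for use as a CRF feature.
    return dict(zip(model.wv.index_to_key, labels))

# Six models: {Skip-Gram (sg=1), CBOW (sg=0)} x {word, Word-POS, stem}
# sequences, each yielding one clustering feature per position.
```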
4.3. Results Comparison and Analysis
4.3.1. Overall Assessment
4.3.2. Comparison with Other Methods on F1
- IHS_RD, DLIREC(U), EliXa(U) and NLANGP(U): IHS_RD achieved the best result on the laptop dataset in SemEval2014 [29]; DLIREC(U) achieved the best result on the restaurant dataset in SemEval2014 [11]; EliXa(U) achieved the best result on the restaurant dataset in SemEval2015 [30]; NLANGP(U) achieved the best result on the restaurant dataset in SemEval2016 [31]. "U" means unconstrained: the system used additional resources, such as lexicons or extra training data, without any constraint.
- LSTM, Bi-LSTM and Deep LSTM: These deep learning models were provided by Wu et al. [21].
- MTCA: MTCA was a multi-task attention model that learned shared information among different tasks [32].
4.3.3. The Evaluation of K
5. Conclusions and Future Work
- Better MFE vector training strategy: the MFE in this paper is a derivative of word vector technology, and its vector training algorithm has not been thoroughly studied or improved. In addition, introducing more semantic features places a greater burden on training. Thus, more efficient MFE training methods need to be developed.
- Apply and improve deep learning in aspect term extraction: the MFE-CRF presented in this paper remains within traditional machine learning, so it is necessary to explore how to introduce deep learning into aspect term extraction.
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Yadollahi, A.; Shahraki, A.G.; Zaiane, O.R. Current State of Text Sentiment Analysis from Opinion to Emotion Mining. ACM Comput. Surv. 2017, 50, 25.
- Giachanou, A.; Crestani, F. Like It or Not: A Survey of Twitter Sentiment Analysis Methods. ACM Comput. Surv. 2016, 49, 28.
- Liu, B. Sentiment Analysis and Opinion Mining. Synth. Lect. Hum. Lang. Technol. 2012, 5, 1–167.
- Thet, T.T.; Na, J.C.; Khoo, C.S. Aspect-based sentiment analysis of movie reviews on discussion boards. J. Inf. Sci. 2010, 36, 823–848.
- Wen, H.; Zhao, J. Aspect term extraction of E-commerce comments based on model ensemble. In Proceedings of the 2017 14th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 15–17 December 2017; pp. 24–27.
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of the Advances in Neural Information Processing Systems, Stateline, NV, USA, 5–10 December 2013; MIT Press: Cambridge, MA, USA, 2013; pp. 3111–3119.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781.
- One-Hot. Available online: https://en.wikipedia.org/wiki/One-hot (accessed on 25 July 2018).
- Choi, Y.; Cardie, C.; Riloff, E.; Patwardhan, S. Identifying sources of opinions with conditional random fields and extraction patterns. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver, BC, Canada, 6–8 October 2005; Association for Computational Linguistics: Stroudsburg, PA, USA, 2005; pp. 355–362.
- Jakob, N.; Gurevych, I. Extracting Opinion Targets in a Single- and Cross-Domain Setting with Conditional Random Fields. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, 27–31 July 2011.
- Miao, Q.; Li, Q.; Zeng, D. Mining Fine Grained Opinions by Using Probabilistic Models and Domain Knowledge. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Toronto, ON, Canada, 31 August–3 September 2010; IEEE Computer Society: Washington, DC, USA, 2010; pp. 358–365.
- Toh, Z.; Wang, W. DLIREC: Aspect Term Extraction and Term Polarity Classification System. In Proceedings of the 8th International Workshop on Semantic Evaluation, Dublin, Ireland, 23–24 August 2014.
- Parkhe, V.; Biswas, B. Aspect Based Sentiment Analysis of Movie Reviews: Finding the Polarity Directing Aspects. In Proceedings of the International Conference on Soft Computing and Machine Intelligence, New Delhi, India, 26–27 September 2014; IEEE Computer Society: Washington, DC, USA, 2014; pp. 28–32.
- Guha, S.; Joshi, A.; Varma, V. SIEL: Aspect Based Sentiment Analysis in Reviews. In Proceedings of the 9th International Workshop on Semantic Evaluation, Denver, CO, USA, 4–5 June 2015.
- Román, J.V.; Cámara, E.M.; Morera, J.G.; Zafra, S.M. TASS 2014. The Challenge of Aspect-based Sentiment Analysis. Proces. Del Leng. Nat. 2015, 54, 61–68.
- Pontiki, M.; Galanis, D.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S.; Mohammad, A.S.; Al-Ayyoub, M.; Zhao, Y.; Qin, B.; De Clercq, O.; et al. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In Proceedings of the International Workshop on Semantic Evaluation, Dublin, Ireland, 23–24 August 2014; pp. 27–35.
- Poria, S.; Ofek, N.; Gelbukh, A.; Hussain, A.; Rokach, L. Dependency Tree-Based Rules for Concept-Level Aspect-Based Sentiment Analysis. Commun. Comput. Inf. Sci. 2014, 475, 41–47.
- Khalid, S.; Khan, M.T.; Durrani, M.; Khan, K.H. Aspect-based Sentiment Analysis on a Large-Scale Data: Topic Models are the Preferred Solution. Bahria Univ. J. Inf. Commun. Technol. 2015, 8, 22–27.
- Poria, S.; Chaturvedi, I.; Cambria, E.; Bisio, F. Sentic LDA: Improving on LDA with semantic similarity for aspect-based sentiment analysis. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 4465–4473.
- Schouten, K.; Baas, F.; Bus, O.; Osinga, A.; van de Ven, N.; van Loenhout, S.; Vrolijk, L.; Frasincar, F. Aspect-Based Sentiment Analysis Using Lexico-Semantic Patterns. In Proceedings of the Web Information Systems Engineering—WISE 2016, Shanghai, China, 7–10 November 2016.
- Wu, C.; Wu, F.; Wu, S.; Yuan, Z.; Huang, Y. A Hybrid Unsupervised Method for Aspect Term and Opinion Target Extraction. Knowl. Based Syst. 2018, 148, 66–73.
- Manek, A.S.; Shenoy, P.D.; Mohan, M.C.; Venugopal, K.R. Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier. World Wide Web 2017, 20, 135–154.
- Weichselbraun, A.; Gindl, S.; Fischer, F.; Vakulenko, S.; Scharl, A. Aspect-Based Extraction and Analysis of Affective Knowledge from Social Media Streams. IEEE Intell. Syst. 2017, 32, 80–88.
- Schuster-Böckler, B.; Bateman, A. An Introduction to Hidden Markov Models. Curr. Protoc. Bioinform. 2007, 18.
- Lafferty, J.; McCallum, A.; Pereira, F.C. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning, Williamstown, MA, USA, 28 June–1 July 2001; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2001; pp. 282–289.
- Liu, P.; Joty, S.; Meng, H. Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1433–1443.
- Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A K-Means Clustering Algorithm. J. R. Stat. Soc. Ser. C 1979, 28, 100–108.
- Arthur, D.; Vassilvitskii, S. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2007; pp. 1027–1035.
- Chernyshevich, M. IHS R&D Belarus: Cross-domain extraction of product features using CRF. In Proceedings of the 8th International Workshop on Semantic Evaluation, Dublin, Ireland, 23–24 August 2014.
- Vicente, I.S.; Saralegi, X.; Agerri, R. EliXa: A modular and flexible ABSA platform. In Proceedings of the 9th International Workshop on Semantic Evaluation, Denver, CO, USA, 4–5 June 2015.
- Toh, Z.; Su, J. NLANGP at SemEval-2016 Task 5: Improving aspect based sentiment analysis using neural network features. In Proceedings of the 10th International Workshop on Semantic Evaluation, San Diego, CA, USA, 16–17 June 2016.
- Wang, W.; Pan, S.J.; Dahlmeier, D.; Xiao, X. Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; AAAI Press: San Francisco, CA, USA, 2017; pp. 3316–3322.
Table 1. Closest words in the traditional word embedding model vs. the Word-POS model (IMDB dataset).

Word/Word-POS | Closest Words of the Vector
---|---
works | work, worked, done, plays, working, crafted, quite
(works, nn) | (work, nn), (films, nn), (works, nn), (art, nn), (kundera, nn), (krzysztof, nn), (masterpiece, nn)
(works, vb) | (worked, vb), (work, vb), (work, vb), (succeed, vb), (working, vb), (plays, vb), (done, vb)
Table 2. Closest words in the traditional word embedding model vs. the stem model.

Word/Stem | Closest Words of the Vector
---|---
Run (word) | runs, running, ran, mill, afoul, walk, amok
Running (word) | run, runs, walking, run, minutes, around, screaming
Run (stem) | walk, chase, amok, go, afoul, get, wander
Sequence | Example
---|---
Original Sequence | Natural language processing is a field concerned with the interactions between computers and human languages.
Pretreatment | natural language processing field concerned interactions computers human languages
Word-POS | (natural, jj) (language, nn) (processing, nn) (field, nn) (concerned, vb) (interactions, nn) (computers, nn) (human, jj) (languages, nn)
Stemming | natur languag process field concern interact comput human languag
Models | Restaurant Precision | Restaurant Recall | Restaurant F1 | Laptop Precision | Laptop Recall | Laptop F1
---|---|---|---|---|---|---
CRF1 | 84.91 | 73.83 | 78.98 | 83.09 | 56.70 | 67.71
CRF2 | 86.53 | 79.52 | 82.88 | 84.37 | 60.71 | 70.61
CRF3 | 86.27 | 80.02 | 83.03 | 85.92 | 64.63 | 73.77
CRF3+Stem | 86.24 | 81.33 | 83.71 | 87.07 | 67.02 | 75.74
CRF3+WP | 86.19 | 82.16 | 83.05 | 87.64 | 66.39 | 75.55
CRF3+ALL | 86.41 | 82.35 | 84.33 | 87.81 | 67.82 | 76.53
Models | L-14 | R-14 | R-15 | R-16
---|---|---|---|---
IHS_RD | 74.55 | 79.62 | - | -
DLIREC(U) | 73.78 | 84.01 | - | -
EliXa(U) | - | - | 70.05 | -
NLANGP(U) | - | - | 67.12 | 72.34
LSTM | 62.72 | 75.78 | - | -
Bi-LSTM | 61.98 | 78.63 | - | -
Deep LSTM | 64.27 | 77.67 | - | -
MTCA | 69.14 | - | 71.31 | 73.26
CRF3+Stem | 75.74 | 83.71 | 68.74 | 72.73
CRF3+WP | 75.55 | 83.05 | 69.42 | 73.49
CRF3+ALL | 76.53 | 84.33 | 70.31 | 73.81
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).