Open Access Article

Large Scale Implementations for Twitter Sentiment Classification

Computer Engineering and Informatics Department, University of Patras, Patras 26504, Greece
Department of Informatics, Ionian University, Corfu 49100, Greece
Department of Cultural Heritage Management and New Technologies, University of Patras, Agrinio 30100, Greece
Computer & Informatics Engineering Department, Technological Educational Institute of Western Greece, Patras 26334, Greece
Author to whom correspondence should be addressed.
Academic Editor: Bruno Carpentieri
Algorithms 2017, 10(1), 33;
Received: 8 December 2016 / Revised: 28 February 2017 / Accepted: 1 March 2017 / Published: 4 March 2017
(This article belongs to the Special Issue Humanistic Data Processing)


Sentiment analysis on Twitter data is a challenging problem due to the nature, diversity and volume of the data. People tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast number of opinions on a wide spectrum of topics. This information has huge potential and can be harnessed to determine the sentiment tendency towards these topics. However, since no one can spend an unlimited amount of time reading through these tweets, an automated decision-making approach is necessary. Nevertheless, most existing solutions are limited to centralized environments and can therefore process at most a few thousand tweets. Given the massive number of tweets published daily, such a sample is not representative enough to determine the sentiment polarity towards a topic. In this work, we develop two systems, one built on MapReduce and one on Apache Spark, both frameworks for programming with Big Data. The algorithm exploits all hashtags and emoticons inside a tweet as sentiment labels and classifies tweets into diverse sentiment types in a parallel and distributed manner. Moreover, the sentiment analysis tool combines Machine Learning methodologies with Natural Language Processing techniques and utilizes Apache Spark's machine learning library, MLlib. To address the nature of Big Data, we introduce several pre-processing steps that improve the results of sentiment analysis, as well as Bloom filters that compact the storage size of intermediate data and boost the performance of our algorithm. Finally, the proposed system was trained and validated with real data crawled from Twitter, and an extensive experimental evaluation shows that our solution is efficient, robust and scalable, while confirming the quality of our sentiment identification.
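The abstract names two concrete mechanisms: using the hashtags and emoticons in a tweet as sentiment labels, and Bloom filters that compact intermediate data. The Python sketch that follows illustrates both in isolation, on a single machine. It is not the authors' MapReduce or Spark implementation. The label vocabularies, the names `extract_labels` and `BloomFilter`, the SHA-256-based hashing, and the choice to store tweet tokens in the filter are all hypothetical, chosen only for illustration.

```python
import hashlib

# Hypothetical label vocabularies: the abstract states that hashtags and
# emoticons act as sentiment labels but does not list which ones are used.
EMOTICON_LABELS = {":)": "positive", ":-)": "positive", ":D": "positive",
                   ":(": "negative", ":-(": "negative"}
HASHTAG_LABELS = {"#happy": "positive", "#love": "positive",
                  "#sad": "negative", "#angry": "negative"}


def extract_labels(tweet: str) -> set[str]:
    """Return every sentiment label signalled by a hashtag or emoticon in the tweet."""
    labels = set()
    for token in tweet.split():
        # Emoticons are case-sensitive (":D" vs ":d"), so match them verbatim.
        if token in EMOTICON_LABELS:
            labels.add(EMOTICON_LABELS[token])
        # Hashtags are matched case-insensitively, ignoring trailing punctuation.
        tag = token.lower().rstrip(".,!?")
        if tag in HASHTAG_LABELS:
            labels.add(HASHTAG_LABELS[tag])
    return labels


class BloomFilter:
    """A fixed-size bit array answering 'possibly present' / 'definitely absent'."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # one bit per slot

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests (an assumed scheme).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May report a false positive, but never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


tweet = "Finally on holiday :) #Happy!"
print(extract_labels(tweet))  # {'positive'}

seen = BloomFilter(num_bits=1024, num_hashes=3)
for word in tweet.lower().split():
    seen.add(word)
print("holiday" in seen)  # True
```

A Bloom filter uses a fixed number of bits however many items are added, at the cost of occasional false positives. This trade-off makes it attractive for shrinking intermediate data. In a distributed job, one possible design (again an assumption, not the paper's stated method) is to build one filter per partition and merge filters of identical size and hash functions with a bitwise OR.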
Keywords: Apache Spark; Big Data; Bloom Filters; Hadoop; MapReduce; Twitter

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


MDPI and ACS Style

Kanavos, A.; Nodarakis, N.; Sioutas, S.; Tsakalidis, A.; Tsolis, D.; Tzimas, G. Large Scale Implementations for Twitter Sentiment Classification. Algorithms 2017, 10, 33.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Algorithms EISSN 1999-4893 Published by MDPI AG, Basel, Switzerland