Comparing Deep-Learning Architectures and Traditional Machine-Learning Approaches for Satire Identification in Spanish Tweets
Abstract
1. Introduction
2. Background Information
2.1. Text Classification Approaches
2.2. Satire Identification
3. Materials and Methods
3.1. Data Acquisition
3.2. Feature Extraction
3.2.1. Term-Counting Features
3.2.2. Word Embeddings
- Word2Vec. Word2Vec was one of the first models for obtaining word embeddings. With Word2Vec, word embeddings are learned by training a neural network whose objective is next-word prediction [40]. Specifically, Word2Vec offers two methods for learning word embeddings: (1) the Continuous Bag-of-Words model (CBOW), which predicts a word from its context words; and (2) Skip-gram, which does the opposite and predicts the context words from a target word. Both strategies learn the underlying word representations. The difference is that CBOW is faster and provides better accuracy for frequent words, whereas Skip-gram is more accurate with smaller training data and provides better representations of infrequent words. The pre-trained Word2Vec embeddings used in this experiment were trained on the Spanish Billion Words Corpus [41].
- GloVe. GloVe is a technique for learning word embeddings that exploits statistical information about word co-occurrences and is better suited for NLP tasks such as word analogy or entity recognition [42]. In contrast to Word2Vec, where word embeddings are learned from raw co-occurrence probabilities, GloVe learns from the ratios between co-occurrence probabilities, which helps it to capture fine-grained details about the relevance of two linked terms. The pre-trained GloVe embeddings used in this work were trained on the Spanish Billion Words Corpus [41].
- FastText. FastText is inspired by the Word2Vec model, but it represents each word as a sequence of character n-grams [43]. FastText can therefore handle unknown words and misspellings. In addition, character n-grams make it possible to capture extra semantic information in different types of languages. For example, in inflected languages such as Spanish, they can capture information about prefixes and suffixes, including number and grammatical gender. In agglutinative languages, such as German, in which words can be made up of other words, character n-grams can include information about both words. It is worth noting that these character n-grams behave internally like a BoW model, so the internal order of the character n-grams is not taken into account. FastText provides pre-trained word embeddings for different languages, including Spanish embeddings trained on Wikipedia [44]. For this experiment, however, we use the fastText pre-trained word embeddings trained on the Spanish Unannotated Corpora [45]. This decision was made because these pre-trained word embeddings draw on more sources, including subtitles, news and legislative texts of the European Union.
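
As an illustration of the ideas above, the following Python sketch shows how the training examples differ between CBOW and Skip-gram, and how a word can be broken into character n-grams in the fastText style. It is a minimal sketch and not the code used in the experiments: the function names, the window size, and the toy sentence are assumptions made for the example, and the `<`/`>` boundary markers with n between 3 and 6 follow the commonly documented fastText defaults.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: (target, context) pairs; the target predicts each context word."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((target, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: (context words, target) examples; the context predicts the target."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


def subword_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """fastText-style subwords: character n-grams of the word wrapped in < and >."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]


# Hypothetical toy sentence, invented for this example.
tokens = "el gobierno anuncia nuevas medidas".split()
print(skipgram_pairs(tokens, window=1)[:3])
print(cbow_examples(tokens, window=1)[1])
print(subword_ngrams("niñas", n_min=3, n_max=3))
```

Because a word vector is built from its subwords, two inflected forms such as "niñas" and "niños" share several n-grams, which is one way to picture how number and gender information can be captured.
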
3.3. Supervised Classifiers
3.3.1. Machine-Learning Classifiers
- Random Forest (RF). Random forests belong to the decision-tree family. Decision trees are algorithms that build a tree structure composed of decision rules of the form if-then-else. Each split decision is based on the idea of entropy, maximising the homogeneity of the new subsets. Decision trees are popular because they provide good results, they can be used in both classification and regression problems and, with smaller datasets, they provide interpretable models. However, they present some drawbacks. First, they tend to generate over-fitted models by creating over-complex trees that do not generalise the underlying pattern. Second, they are very sensitive to the input data, and small changes can result in completely different trees. Third, decision trees are affected by bias when the dataset is unbalanced. In this work, we selected Random Forest [46]. Random Forest is an ensemble machine-learning method that uses bagging to create several decision trees and averages their results. Moreover, each tree in the forest considers only a subset of the features and a random sample of the examples, which reduces overfitting.
- Support Vector Machines (SVM). SVMs are a family of classifiers that represent the examples of each class in a hyperspace and determine the hyperplane that best separates the classes. Support Vector Machines allow the use of different kernels to solve linear and non-linear classification problems. Some works that have evaluated satire identification with SVM can be found in [16,20].
- Multinomial Naïve Bayes (MNB). It is a probabilistic classifier based on Bayes' theorem. Specifically, the naïve variant of this classifier assumes that all the features are independent of each other given the class. This classifier has been evaluated for solving similar tasks such as irony detection [25].
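
To make the independence assumption of the naïve Bayes classifier concrete, the sketch below implements a small multinomial variant over token counts in plain Python. It is only an illustration under stated assumptions: the class name `TinyMultinomialNB`, the add-one (Laplace) smoothing, and the four toy headlines are invented for this example and are not taken from the corpus or from the implementation evaluated in this work.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "TinyMultinomialNB":
        self.vocab = {tok for doc in docs for tok in doc}
        self.class_counts = Counter(labels)
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        return self

    def log_posteriors(self, doc: List[str]) -> Dict[str, float]:
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior of the class
            for tok in doc:
                if tok in self.vocab:  # tokens unseen in training are ignored
                    # Naive assumption: tokens are independent given the class,
                    # so their log-likelihoods are simply added.
                    score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, doc: List[str]) -> str:
        scores = self.log_posteriors(doc)
        return max(scores, key=scores.get)


# Hypothetical toy data, invented for this example.
docs = [
    "el presidente inaugura un puente invisible".split(),
    "científicos descubren que el lunes no existe".split(),
    "el congreso aprueba el presupuesto anual".split(),
    "el banco central mantiene los tipos".split(),
]
labels = ["satire", "satire", "news", "news"]

model = TinyMultinomialNB().fit(docs, labels)
print(model.predict("puente invisible".split()))
print(model.predict("presupuesto anual".split()))
```
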
3.3.2. Deep-Learning Architectures
- Multilayer Perceptron (MLP). An MLP is composed of stacked layers of perceptrons in which every node of a layer is fully connected to the nodes of the next layer and there is at least one hidden layer. In this work, we evaluated different vanilla neural networks with different numbers of layers, neurons per layer, batch sizes, and structures. The details of this process are described in Section 3.5.
- Convolutional Neural Networks (CNNs). According to [48], convolutional deep neural networks employ specific layers that convolve filters applied to local features. CNNs became popular in computer vision, but they have also achieved competitive results in NLP tasks such as text classification [15]. The main idea behind CNNs is that they can effectively manage spatial features. In NLP, this means that CNNs are capable of capturing groups of adjacent words. In this work, we stacked a Spatial Dropout layer, a Convolutional layer, and a Global Max Pooling layer. During the hyper-parameter evaluation, we tested connecting the convolutional neural network to several feed-forward neural networks.
- Recurrent Neural Networks (RNNs). RNNs are deep-learning architectures in which the input is treated as a sequence and the connections between units form a directed cycle, so the output at each step depends on the current input and on the state carried over from earlier steps. Moreover, bidirectional RNNs process the sequence in both directions, so they can consider past states as well as future ones. RNNs are widely used in NLP because they handle the input as a sequence, which is suitable for natural language. For example, BiLSTMs have been applied to text classification [49] and irony detection [50]. For this experiment, we evaluated two bidirectional RNNs: Bidirectional Gated Recurrent Units (BiGRU) and Bidirectional Long Short-Term Memory units (BiLSTM). GRUs are an improved version of RNNs that address the vanishing gradient problem by using two gates (update and reset), which filter the information directed to the output. As a result, BiGRU can retain long-term information. As we did with the CNN, we evaluated connecting the RNN layers to different neural network layers.
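
The role of the update and reset gates can be written out explicitly. The equations below give a standard formulation of a GRU step from the literature, not a description of the exact implementation evaluated here. The symbols are assumptions introduced for the example: $x_t$ is the embedding of the word at position $t$, $h_{t-1}$ is the previous hidden state, $W$, $U$ and $b$ are learned parameters, $\sigma$ is the logistic function and $\odot$ is the element-wise product. Some references swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new state)}
\end{aligned}
```

When $z_t$ is close to zero, the previous state is copied forward almost unchanged, which is how information can be kept over long spans. A bidirectional layer runs one such recurrence from left to right and another from right to left, and combines the two hidden states at each position.
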
3.4. Models
3.5. Hyper-Parameter Optimisation
4. Results
4.1. Traditional Machine-Learning with Term-Counting Feature Sets Results
4.2. Deep-Learning with Pre-Trained Word Embeddings Results
5. Discussion
5.1. Insights
- Term-counting features provide, in general, better accuracy for automatic satire identification. It is noteworthy that features based on term counting outperformed those based on pre-trained word embeddings. We consider two main explanations for this fact. On the one hand, the dataset is small (fewer than 5000 documents for each linguistic variant), so it may be easier to categorise texts based on the words that appear than to model more complex relationships between words, as the deep-learning architectures do. On the other hand, as we observed in the analysis of the corpus, each corpus only contains tweets from four different accounts, so it is possible that all the trained models are learning to discern between those accounts rather than the underlying difference between satirical and non-satirical utterances (see Figure 2).
- As Table 8 shows, BiGRU with FastText obtains the best results on all the datasets. The average accuracy for the European Spanish dataset is 80.16392% with a standard deviation of 0.85288. For the Mexican Spanish dataset, the average accuracy is 88.16067% with a standard deviation of 1.21629, and for the full dataset the average accuracy is 83.60150% with a standard deviation of 1.11221. The major difference we identify concerns the choice of pre-trained word embeddings in the Mexican Spanish dataset with RNNs.
- Character n-grams are more reliable than word n-grams for satire classification. Regarding the high variability in the accuracy obtained between the word n-gram (see Table 6) and character n-gram (see Table 7) feature sets, we considered three main hypotheses: (1) important differences in satire identification based on the cultural background, (2) the presence of noisy data in the Spanish corpus, or (3) the different ratio between satirical and non-satirical utterances in the two datasets. To assess these hypotheses, we observed that the results achieved with the Mexican Spanish dataset are always higher than the ones achieved with the European dataset. We consider that the strong imbalance in the number of tweets between satirical and non-satirical accounts (see Figure 2) biased these results, and thus that it is necessary to evaluate these results with a more homogeneous dataset.
- Multinomial Naïve Bayes achieves better accuracy for term-counting features based on word n-grams, whereas Support Vector Machines achieve better results with character n-grams. MNB achieved the best accuracy with the Mexican Spanish and the full dataset, and the second-best result with the European dataset. Regarding spatial data, both CNNs (see Table 8) and term-counting features (see Table 6 and Table 7) achieved similar results.
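
The difference between the two families of term-counting features discussed above can be illustrated with a short Python sketch. It is a simplified illustration and not the feature extractor used in the experiments: the function names, the whitespace tokenisation, the lowercasing and the toy tweet are assumptions made for the example. The ranges follow the word and character n-gram options compared in the results, for instance 1-2 word n-grams and 4-5 character n-grams.

```python
from collections import Counter
from typing import Tuple


def count_word_ngrams(text: str, ngram_range: Tuple[int, int]) -> Counter:
    """Count word n-grams for every n in the inclusive range."""
    tokens = text.lower().split()
    lo, hi = ngram_range
    return Counter(
        " ".join(tokens[i:i + n])
        for n in range(lo, hi + 1)
        for i in range(len(tokens) - n + 1)
    )


def count_char_ngrams(text: str, ngram_range: Tuple[int, int]) -> Counter:
    """Count character n-grams (spaces included) for every n in the range."""
    chars = text.lower()
    lo, hi = ngram_range
    return Counter(
        chars[i:i + n]
        for n in range(lo, hi + 1)
        for i in range(len(chars) - n + 1)
    )


# Hypothetical tweet, invented for this example.
tweet = "El gobierno prohíbe los lunes"
print(count_word_ngrams(tweet, (1, 2)))
print(count_char_ngrams(tweet, (4, 5)).most_common(5))
```

Character n-grams span word boundaries and fragments of words, so a misspelt or inflected form still shares most of its features with the original word, whereas a word n-gram model treats it as a different term.
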
5.2. Comparison with Other Approaches
6. Conclusions and Further Work
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Condren, C. Satire and definition. Humor 2012, 25, 375–399.
- Lee, H.; Kwak, N. The Affect Effect of Political Satire: Sarcastic Humor, Negative Emotions, and Political Participation. Mass Commun. Soc. 2014, 17, 307–328.
- Chen, H.T.; Gan, C.; Sun, P. How does political satire influence political participation? Examining the role of counter-and pro-attitudinal exposure, anger, and personal issue importance. Int. J. Commun. 2017, 11, 19.
- Shabani, S.; Sokhn, M. Hybrid machine-crowd approach for fake news detection. In Proceedings of the 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC), Philadelphia, PA, USA, 18–20 October 2018; pp. 299–306.
- del Pilar Salas-Zárate, M.; Alor-Hernández, G.; Sánchez-Cervantes, J.L.; Paredes-Valverde, M.A.; García-Alcaraz, J.L.; Valencia-García, R. Review of English literature on figurative language applied to social networks. Knowl. Inf. Syst. 2020, 62, 2105–2137.
- Colston, H.L. Figurative language development/acquisition research: Status and ways forward. J. Pragmat. 2020, 156, 176–190.
- Weitzel, L.; Prati, R.C.; Aguiar, R.F. The comprehension of figurative language: What is the influence of irony and sarcasm on NLP techniques? In Sentiment Analysis and Ontology Engineering; Springer: Berlin/Heidelberg, Germany, 2016; pp. 49–74.
- Eke, C.I.; Norman, A.A.; Shuib, L.; Nweke, H.F. Sarcasm identification in textual data: Systematic review, research challenges and open directions. Artif. Intell. Rev. 2020, 53, 4215–4258.
- Canete, J.; Chaperon, G.; Fuentes, R.; Pérez, J. Spanish pre-trained bert model and evaluation data. PML4DC ICLR 2020, 2020. Available online: https://users.dcc.uchile.cl/~jperez/papers/pml4dc2020.pdf (accessed on 19 October 2020).
- del Arco, F.M.P.; Molina-González, M.D.; Ureña-López, L.A.; Martín-Valdivia, M.T. Comparing pre-trained language models for Spanish hate speech detection. Expert Syst. Appl. 2021, 166, 114120.
- Liu, H.; Yin, Q.; Wang, W.Y. Towards explainable NLP: A generative explanation framework for text classification. arXiv 2018, arXiv:1811.00196.
- Kowsari, K.; Jafari Meimandi, K.; Heidarysafa, M.; Mendu, S.; Barnes, L.; Brown, D. Text classification algorithms: A survey. Information 2019, 10, 150.
- Altınel, B.; Ganiz, M.C. Semantic text classification: A survey of past and recent advances. Inf. Process. Manag. 2018, 54, 1129–1153.
- Apolinardo-Arzube, O.; García-Díaz, J.A.; Medina-Moreira, J.; Luna-Aveiga, H.; Valencia-García, R. Evaluating information-retrieval models and machine-learning classifiers for measuring the social perception towards infectious diseases. Appl. Sci. 2019, 9, 2858.
- Yin, W.; Kann, K.; Yu, M.; Schütze, H. Comparative Study of CNN and RNN for Natural Language Processing. arXiv 2017, arXiv:1702.01923.
- Reganti, A.N.; Maheshwari, T.; Kumar, U.; Das, A.; Bajpai, R. Modeling satire in English text for automatic detection. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Barcelona, Spain, 12–15 December 2016; pp. 970–977.
- Ravi, K.; Ravi, V. A novel automatic satire and irony detection using ensembled feature selection and data mining. Knowl. Based Syst. 2017, 120, 15–33.
- Tsonkov, T.V.; Koychev, I. Automatic detection of double meaning in texts from the social networks. In Proceedings of the 2015 Balkan Conference on Informatics: Advances in ICT, Craiova, Romania, 2–4 September 2015; pp. 33–39.
- Barbieri, F.; Ronzano, F.; Saggion, H. Do we criticise (and laugh) in the same way? Automatic detection of multi-lingual satirical news in Twitter. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
- del Pilar Salas-Zárate, M.; Paredes-Valverde, M.A.; Rodriguez-García, M.Á.; Valencia-García, R.; Alor-Hernández, G. Automatic detection of satire in Twitter: A psycholinguistic-based approach. Knowl. Based Syst. 2017, 128, 20–33.
- Tausczik, Y.R.; Pennebaker, J.W. The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang. Soc. Psychol. 2010, 29, 24–54.
- Sharma, A.S.; Mridul, M.A.; Islam, M.S. Automatic Detection of Satire in Bangla Documents: A CNN Approach Based on Hybrid Feature Extraction Model. In Proceedings of the 2019 International Conference on Bangla Speech and Language Processing (ICBSLP), Sylhet, Bangladesh, 27–28 September 2019; pp. 1–5.
- Toçoğlu, M.A.; Onan, A. Satire detection in Turkish news articles: A machine learning approach. In Proceedings of the International Conference on Big Data Innovations and Applications, Istanbul, Turkey, 26–28 August 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 107–117.
- Rashkin, H.; Choi, E.; Jang, J.Y.; Volkova, S.; Choi, Y. Truth of varying shades: Analyzing language in fake news and political fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; pp. 2931–2937.
- Ortega-Bueno, R.; Rangel, F.; Hernández Farías, D.; Rosso, P.; Montes-y Gómez, M.; Medina Pagola, J.E. Overview of the task on irony detection in Spanish variants. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), Co-Located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019), Bilbao, Spain, 24 September 2019.
- Cignarella, A.T.; Bosco, C. ATC at IroSva 2019: Shallow syntactic dependency-based features for irony detection in Spanish variants. In Proceedings of the 35th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019), Bilbao, Spain, 24 September 2019; Volume 2421, pp. 257–263.
- Miranda-Belmonte, H.U.; López-Monroy, A.P. Early Fusion of Traditional and Deep Features for Irony Detection in Twitter. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), Co-Located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019), Bilbao, Spain, 24 September 2019; pp. 272–277.
- González, J.Á.; Hurtado, L.F.; Pla, F. ELiRF-UPV at IroSvA: Transformer Encoders for Spanish Irony Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), Co-Located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019), Bilbao, Spain, 24 September 2019; pp. 278–284.
- García, L.; Moctezuma, D.; Muniz, V. A Contextualized Word Representation Approach for Irony Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), Co-Located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019), Bilbao, Spain, 24 September 2019.
- Iranzo-Sánchez, J.; Ruiz-Dolz, R. VRAIN at IroSva 2019: Exploring Classical and Transfer Learning Approaches to Short Message Irony Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), Co-Located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019), Bilbao, Spain, 24 September 2019; pp. 322–328.
- Frenda, S.; Patti, V. Computational Models for Irony Detection in Three Spanish Variants. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), Co-Located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019), Bilbao, Spain, 24 September 2019; pp. 297–309.
- Deon, D.J.; de Freitas, L.A. UFPelRules to Irony Detection in Spanish Variants. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), Co-Located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019), Bilbao, Spain, 24 September 2019; pp. 310–314.
- Castro, D.; Benavides, L. UO-CERPAMID at IroSvA: Impostor Method Adaptation for Irony Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), Co-Located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019), Bilbao, Spain, 24 September 2019.
- Barbieri, F.; Ronzano, F.; Saggion, H. Is this Tweet satirical? A computational approach for satire detection in Spanish. Proces. Leng. Nat. 2015, 55, 135–142.
- García-Díaz, J.A.; Almela, A.; Alcaraz-Mármol, G.; Valencia-García, R. UMUCorpusClassifier: Compilation and evaluation of linguistic corpus for Natural Language Processing tasks. Proces. Leng. Nat. 2020, 65, 139–142.
- Oliver, I. Programming Classics: Implementing the World’s Best Algorithms; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1994.
- Mehri, A.; Jamaati, M. Variation of Zipf’s exponent in one hundred live languages: A study of the Holy Bible translations. Phys. Lett. A 2017, 381, 2470–2477.
- Krasnowska-Kieraś, K.; Wróblewska, A. Empirical linguistic study of sentence embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 5729–5739.
- Tang, D.; Wei, F.; Yang, N.; Zhou, M.; Liu, T.; Qin, B. Learning sentiment-specific word embedding for twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA, 22–27 June 2014; pp. 1555–1565.
- Goldberg, Y.; Levy, O. word2vec Explained: Deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv 2014, arXiv:1402.3722.
- Cardellino, C. Spanish Billion Words Corpus and Embeddings. 2019. Available online: https://crscardellino.github.io/SBWCE/ (accessed on 19 October 2020).
- Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
- Mikolov, T.; Grave, E.; Bojanowski, P.; Puhrsch, C.; Joulin, A. Advances in pre-training distributed word representations. arXiv 2017, arXiv:1712.09405.
- Grave, E.; Bojanowski, P.; Gupta, P.; Joulin, A.; Mikolov, T. Learning word vectors for 157 languages. arXiv 2018, arXiv:1802.06893.
- Compilation of Large Spanish Unannotated Corpora [Data Set]. 2019. Available online: https://github.com/josecannete/unannotated-spanish-corpora (accessed on 19 October 2020).
- Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
- Stöckl, A. Detecting Satire in the News with Machine Learning. arXiv 2018, arXiv:1810.00593.
- Kim, Y. Convolutional Neural Networks for Sentence Classification. arXiv 2014, arXiv:1408.5882.
- Zhou, P.; Qi, Z.; Zheng, S.; Xu, J.; Bao, H.; Xu, B. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. arXiv 2016, arXiv:1611.06639.
- Zhang, S.; Zhang, X.; Chan, J.; Rosso, P. Irony detection via sentiment-based transfer learning. Inf. Process. Manag. 2019, 56, 1633–1644.
- Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 19 October 2020).
- Autonomio Talos [Computer Software]. 2019. Available online: https://github.com/autonomio/talos (accessed on 19 October 2020).
- Mozetič, I.; Grčar, M.; Smailović, J. Multilingual Twitter sentiment classification: The role of human annotators. PLoS ONE 2016, 11, e155036.
- García-Díaz, J.A.; Cánovas-García, M.; Valencia-García, R. Ontology-driven aspect-based sentiment analysis classification: An infodemiological case study regarding infectious diseases in Latin America. Future Gener. Comput. Syst. 2020, 112, 641–657.
- García-Díaz, J.A.; Cánovas-García, M.; Colomo-Palacios, R.; Valencia-García, R. Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings. Future Gener. Comput. Syst. 2020, 114, 506–518.





| Dataset | Tweets | Satirical | Non-Satirical |
|---|---|---|---|
| European Spanish | 4821 | 2488 | 2333 | 
| train (60%) | 2892 | 1493 | 1400 | 
| evaluation (20%) | 964 | 497 | 466 | 
| test (20%) | 965 | 498 | 467 | 
| Mexican Spanish | 4956 | 2488 | 2468 | 
| train (60%) | 2974 | 1493 | 1481 | 
| evaluation (20%) | 991 | 497 | 493 | 
| test (20%) | 991 | 498 | 494 | 
| Hyper-Parameter | Options | 
|---|---|
| word_n_grams | [(1, 1), (1, 2), (1, 3)] | 
| character_n_grams | [(4, 4), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (4, 10)] | 
| min_df | [0.01, 0.1, 1] | 
| sublinear_tf | [True, False] | 
| use_IDF | [True, False] |
| strip_accents | [None, 'unicode'] |
| rf__n_estimators | [200, 400, 800, 1600] |
| rf__max_depth | [10, 100, 200] |
| svm__kernel | ['rbf', 'poly', 'linear'] |
| lr__solver | ['liblinear', 'lbfgs'] |
| lr__fit_intercept | [True, False] | 
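
To give a sense of the size of the search space defined by the options above, the following Python sketch enumerates every combination for the word n-gram case. It assumes an exhaustive grid search, which is an assumption of this example; the helper `expand` and the grouping of the options into a vectoriser grid and per-classifier grids are likewise illustrative and not taken from the experimental code.

```python
from itertools import product

# Vectoriser options copied from the table above (word n-gram case).
vectoriser_grid = {
    "word_n_grams": [(1, 1), (1, 2), (1, 3)],
    "min_df": [0.01, 0.1, 1],
    "sublinear_tf": [True, False],
    "use_IDF": [True, False],
    "strip_accents": [None, "unicode"],
}

# Classifier-specific options from the same table; MNB has none listed.
classifier_grids = {
    "rf": {"rf__n_estimators": [200, 400, 800, 1600], "rf__max_depth": [10, 100, 200]},
    "svm": {"svm__kernel": ["rbf", "poly", "linear"]},
    "mnb": {},
    "lr": {"lr__solver": ["liblinear", "lbfgs"], "lr__fit_intercept": [True, False]},
}


def expand(grid):
    """Yield every combination of a {name: [options]} grid as a dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


for name, clf_grid in classifier_grids.items():
    combos = [
        {**vec, **clf}
        for vec in expand(vectoriser_grid)
        for clf in expand(clf_grid)
    ]
    print(name, len(combos))
```

Under these assumptions the vectoriser alone contributes 72 configurations, so the random forest grid is the largest (864 combinations) and the Multinomial Naïve Bayes grid the smallest (72).
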
| Hyper-Parameter | RF (ES) | RF (MS) | SVM (ES) | SVM (MS) | MNB (ES) | MNB (MS) | LR (ES) | LR (MS) |
|---|---|---|---|---|---|---|---|---|
| min_df | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 
| sublinear_tf | False | False | True | False | False | True | True | True | 
| use_IDF | False | False | True | True | True | True | True | True | 
| strip_accents | unicode | None | unicode | None | unicode | None | unicode | None | 
| rf__n_estimators | 400 | 400 | - | - | - | - | - | - | 
| rf__max_depth | 200 | 200 | - | - | - | - | - | - | 
| svm__kernel | - | - | rbf | rbf | - | - | - | - | 
| lr__solver | - | - | - | - | - | - | linear | linear | 
| lr__fit_intercept | - | - | - | - | - | - | True | True | 
| Hyper-Parameter | Options | 
|---|---|
| Activation | [elu, relu, selu, sigmoid, tanh] | 
| Batch size | [16, 32, 64] | 
| Dropout | [False, 0.2, 0.5, 0.8] | 
| Neurons per layer | [8, 16, 48, 64, 128, 256] | 
| Learning rate | (0.5, 2, 10) | 
| Numbers of layers | [1, 2, 3, 4] | 
| Shape | ['brick', 'funnel'] |
| Adjust embeddings | [True, False] | 
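
The learning-rate entry in the table above is a triple rather than a list of options. One possible reading, which is an assumption of this sketch and is not stated in the text, is that it denotes (start, stop, number of steps) with the stop value excluded. The helper `expand_range` below is hypothetical and only illustrates that reading.

```python
from typing import List


def expand_range(start: float, stop: float, steps: int) -> List[float]:
    """Return `steps` evenly spaced values from `start` up to, but excluding, `stop`."""
    step = (stop - start) / steps
    # Rounding only hides floating-point noise such as 0.6499999999.
    return [round(start + i * step, 10) for i in range(steps)]


learning_rates = expand_range(0.5, 2, 10)
print(learning_rates)
```

Under this reading the candidates are spaced 0.15 apart, from 0.5 to 1.85, and every learning rate reported for the selected deep-learning configurations (for example 0.65, 0.8, 1.25 or 1.85) is one of them.
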
| Hyper-Parameter (Word2Vec) | CNN (ES) | CNN (MS) | BiLSTM (ES) | BiLSTM (MS) | BiGRU (ES) | BiGRU (MS) | MLP (ES) | MLP (MS) |
|---|---|---|---|---|---|---|---|---|
| Activation | elu | relu | selu | elu | tanh | relu | elu | elu | 
| Batch size | 16 | 32 | 32 | 32 | 64 | 64 | 64 | 16 | 
| Dropout | 0.2 | False | 0.2 | 0.2 | 0.8 | 0.5 | False | 0.2 | 
| Neurons per layer | 256 | 16 | 64 | 64 | 64 | 64 | 128 | 64 | 
| Learning rate | 0.8 | 1.85 | 0.8 | 1.4 | 1.1 | 1.4 | 1.25 | 1.4 | 
| Numbers of layers | 1 | 2 | 2 | 4 | 2 | 4 | 3 | 3 | 
| Shape | - | brick | brick | funnel | funnel | funnel | funnel | brick | 
| Adjust embeddings | True | True | True | True | True | True | True | True | 
| Hyper-Parameter (GloVe) | CNN (ES) | CNN (MS) | BiLSTM (ES) | BiLSTM (MS) | BiGRU (ES) | BiGRU (MS) | MLP (ES) | MLP (MS) |
| Activation | elu | tanh | tanh | elu | sigmoid | tanh | sigmoid | sigmoid | 
| Batch size | 32 | 64 | 16 | 16 | 64 | 16 | 32 | 32 | 
| Dropout | 0.8 | 0.2 | 0.5 | 0.5 | 0.5 | 0.2 | 0.2 | 0.2 | 
| Neurons per layer | 64 | 16 | 128 | 256 | 48 | 256 | 8 | 8 | 
| Learning rate | 1.85 | 1.55 | 0.8 | 0.95 | 1.85 | 1.55 | 0.65 | 0.65 | 
| Numbers of layers | 1 | 1 | 3 | 3 | 4 | 2 | 1 | 1 | 
| Shape | - | - | brick | brick | funnel | funnel | - | - | 
| Adjust embeddings | True | True | True | True | True | True | True | True | 
| Hyper-Parameter (FastText) | CNN (ES) | CNN (MS) | BiLSTM (ES) | BiLSTM (MS) | BiGRU (ES) | BiGRU (MS) | MLP (ES) | MLP (MS) |
| Activation | selu | selu | selu | selu | sigmoid | sigmoid | sigmoid | sigmoid | 
| Batch size | 32 | 32 | 32 | 32 | 32 | 32 | 16 | 16 | 
| Dropout | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 
| Neurons per layer | 256 | 256 | 16 | 64 | 128 | 128 | 64 | 64 | 
| Learning rate | 1.25 | 1.25 | 1.85 | 1.4 | 1.7 | 1.7 | 1.85 | 1.85 | 
| Numbers of layers | 2 | 2 | 2 | 2 | 1 | 1 | 2 | 2 | 
| Shape | funnel | funnel | brick | funnel | - | - | funnel | funnel | 
| Adjust embeddings | True | True | True | True | True | True | True | True | 
| Feature-Set | RF | SVM | MNB | LR | 
|---|---|---|---|---|
| European Spanish | | | | |
| 1 word n-grams | 76.477 | 81.244 | 79.896 | 78.964 | 
| 1-2 word n-grams | 74.648 | 79.171 | 80.933 | 77.824 | 
| 1-2-3 word n-grams | 74.611 | 77.720 | 81.140 | 77.409 | 
| Mexican Spanish | | | | |
| 1 word n-grams | 82.863 | 88.911 | 89.718 | 86.593 | 
| 1-2 word n-grams | 83.266 | 88.206 | 89.415 | 87.298 | 
| 1-2-3 word n-grams | 84.173 | 87.500 | 89.919 | 86.694 | 
| Full dataset | | | | |
| 1 word n-grams | 78.528 | 84.714 | 83.947 | 81.186 | 
| 1-2 word n-grams | 79.499 | 83.589 | 84.458 | 81.851 | 
| 1-2-3 word n-grams | 78.067 | 82.882 | 85.225 | 81.544 | 
| Feature-Set | RF | SVM | MNB | LR | 
|---|---|---|---|---|
| European Spanish | | | | |
| 4 character n-grams | 82.591 | 83.316 | 79.793 | 80.829 | 
| 4-5 character n-grams | 80.725 | 83.523 | 78.756 | 81.347 | 
| 4-6 character n-grams | 80.104 | 83.212 | 78.031 | 81.036 | 
| 4-7 character n-grams | 78.342 | 82.902 | 77.617 | 81.140 | 
| 4-8 character n-grams | 79.067 | 82.694 | 77.617 | 81.036 | 
| 4-9 character n-grams | 78.860 | 82.487 | 77.617 | 80.518 | 
| 4-10 character n-grams | 78.860 | 82.694 | 77.617 | 80.518 | 
| Mexican Spanish | | | | |
| 4 character n-grams | 86.089 | 89.415 | 88.508 | 87.903 | 
| 4-5 character n-grams | 86.996 | 90.020 | 89.214 | 88.710 | 
| 4-6 character n-grams | 86.593 | 90.524 | 88.012 | 89.516 | 
| 4-7 character n-grams | 87.298 | 90.726 | 89.810 | 90.524 | 
| 4-8 character n-grams | 86.996 | 91.028 | 89.911 | 90.726 | 
| 4-9 character n-grams | 87.097 | 91.230 | 89.113 | 90.625 | 
| 4-10 character n-grams | 86.593 | 91.431 | 89.012 | 90.524 | 
| Full dataset | | | | |
| 4 character n-grams | 82.311 | 85.838 | 82.209 | 83.538 | 
| 4-5 character n-grams | 81.595 | 85.481 | 81.442 | 83.282 | 
| 4-6 character n-grams | 81.953 | 85.429 | 81.748 | 83.589 | 
| 4-7 character n-grams | 82.106 | 85.276 | 81.493 | 83.282 | 
| 4-8 character n-grams | 80.675 | 85.378 | 81.544 | 83.487 | 
| 4-9 character n-grams | 81.186 | 85.020 | 81.493 | 83.078 | 
| 4-10 character n-grams | 81.084 | 84.867 | 81.288 | 83.771 | 
| Model | CNN | BiLSTM | BiGRU | MLP | 
|---|---|---|---|---|
| European Spanish | | | | |
| Word2Vec | 79.896 | 80.104 | 80.829 | 79.793 | 
| GloVe | 78.134 | 80.622 | 79.482 | 80.310 | 
| FastText | 80.103 | 80.933 | 81.554 | 80.207 | 
| Mexican Spanish | | | | |
| Word2Vec | 87.770 | 86.996 | 86.491 | 88.709 | 
| GloVe | 87.399 | 88.407 | 86.391 | 88.911 | 
| FastText | 88.407 | 89.415 | 90.524 | 88.508 | 
| Full dataset | | | | |
| Word2Vec | 83.384 | 83.538 | 82.924 | 83.077 | 
| GloVe | 81.595 | 84.918 | 82.822 | 82.924 | 
| FastText | 84.407 | 84.969 | 85.429 | 83.231 | 
| Feature Set | Classifier | Accuracy | 
|---|---|---|
| European Spanish | | |
| 1 word n-grams | SVM | 81.244 | 
| 4-5 character n-grams | SVM | 83.523 | 
| FastText | BiGRU | 81.554 | 
| Mexican Spanish | | |
| 1-2-3 word n-grams | MNB | 89.919 | 
| 4-10 character n-grams | SVM | 91.431 | 
| FastText | BiGRU | 90.524 | 
| Full dataset | | |
| 1-2-3 word n-grams | MNB | 85.225 | 
| 4 character n-grams | SVM | 85.838 | 
| FastText | BiGRU | 85.429 | 
| Dataset | Linguistic Features [20] | Term-Counting Features | Pre-Trained Word Embeddings | 
|---|---|---|---|
| European Spanish | 84.000 | 83.523 | 81.554 | 
| Mexican Spanish | 85.500 | 91.431 | 90.524 | 
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Apolinario-Arzube, Ó.; García-Díaz, J.A.; Medina-Moreira, J.; Luna-Aveiga, H.; Valencia-García, R. Comparing Deep-Learning Architectures and Traditional Machine-Learning Approaches for Satire Identification in Spanish Tweets. Mathematics 2020, 8, 2075. https://doi.org/10.3390/math8112075
 
        


 
       