A Semantic Enhancement Framework for Multimodal Sarcasm Detection
Abstract
1. Introduction
- We propose a semantic enhancement framework (SEF) for multimodal sarcasm detection that captures intra- and inter-modal semantic information across multiple spans and multiple granularities.
- Using contrastive learning, we exploit the semantics of text–image pairs within the same batch to enhance semantic information and bridge the semantic gap between the visual and textual modalities.
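The second contribution relies on in-batch contrastive learning over text–image pairs. The sketch below illustrates the general idea with a symmetric InfoNCE-style loss in plain Python: each text embedding should be most similar to its own image and less similar to the other images in the batch, and vice versa. It is only an illustration under stated assumptions, not the authors' objective: the function names, the use of cosine similarity, and the temperature of 0.07 are all hypothetical choices that do not come from the paper.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def cosine_matrix(text_embs, image_embs):
    """Entry [i][j] is the cosine similarity between text i and image j."""
    texts = [l2_normalize(v) for v in text_embs]
    images = [l2_normalize(v) for v in image_embs]
    return [[sum(a * b for a, b in zip(t, im)) for im in images] for t in texts]


def diagonal_cross_entropy(logits):
    """Mean of -log softmax(row k)[k]: the matching pair is the target class."""
    total = 0.0
    for k, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[k]
    return total / len(logits)


def in_batch_contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Symmetric text-to-image and image-to-text loss over one batch.

    The temperature value is an assumed default, not taken from the paper.
    """
    sims = cosine_matrix(text_embs, image_embs)
    logits_t2i = [[s / temperature for s in row] for row in sims]
    logits_i2t = [list(col) for col in zip(*logits_t2i)]  # transpose
    return 0.5 * (diagonal_cross_entropy(logits_t2i)
                  + diagonal_cross_entropy(logits_i2t))


# Toy batch of two pairs: row k of each list belongs to the same post.
text_batch = [[1.0, 0.0], [0.0, 1.0]]
image_batch = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = in_batch_contrastive_loss(text_batch, image_batch)
# Reversing the images breaks the pairing, so the loss should increase.
shuffled_loss = in_batch_contrastive_loss(text_batch, image_batch[::-1])
print(aligned_loss < shuffled_loss)
```

In a real model the embeddings would come from the text and image encoders, and this term would be combined with the classification loss; how the two are weighted is not something this sketch tries to reproduce.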
2. Related Work
3. Methodology
3.1. Feature Extractor
3.2. Cross-Modal Interaction
3.2.1. Token-Level Congruity
3.2.2. Global-Level Congruity
3.3. Semantic-Enhanced Module
3.4. Training and Learning Objectives
4. Experiments
4.1. Dataset
4.2. Experimental Settings
4.3. Baseline
5. Experimental Results
5.1. Main Results
5.2. Ablation Study
5.3. Effect of GAT Layer Number
5.4. Effect of GCN Layer Number
5.5. Case Study
5.6. Visualization
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Gibbs, R.W. On the psycholinguistics of sarcasm. J. Exp. Psychol. Gen. 1986, 115, 3.
- Liu, H.; Wang, W.; Li, H. Towards Multi-Modal Sarcasm Detection via Hierarchical Congruity Modeling with Knowledge Enhancement. arXiv 2022, arXiv:2210.03501.
- Babanejad, N.; Davoudi, H.; An, A.; Papagelis, M. Affective and contextual embedding for sarcasm detection. In Proceedings of the 28th International Conference on Computational Linguistics, Virtual, 8–13 December 2020; pp. 225–243.
- Kelishadrokhi, M.K.; Ghattaei, M.; Fekri-Ershad, S. Innovative local texture descriptor in joint of human-based color features for content-based image retrieval. Signal Image Video Process. 2023, 17, 4009–4017.
- Xu, N.; Zeng, Z.; Mao, W. Reasoning with multimodal sarcastic tweets via modeling cross-modality contrast and semantic association. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual, 5–10 July 2020; pp. 3777–3786.
- Pan, H.; Lin, Z.; Fu, P.; Qi, Y.; Wang, W. Modeling intra and inter-modality incongruity for multi-modal sarcasm detection. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Virtual, 16–20 November 2020; pp. 1383–1392.
- Liang, B.; Lou, C.; Li, X.; Gui, L.; Yang, M.; Xu, R. Multi-modal sarcasm detection with interactive in-modal and cross-modal graphs. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 4707–4715.
- Liang, B.; Lou, C.; Li, X.; Yang, M.; Gui, L.; He, Y.; Pei, W.; Xu, R. Multi-modal sarcasm detection via cross-modal graph convolutional network. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Volume 1: Long Papers, pp. 1767–1777.
- Pang, S.; Xue, Y.; Yan, Z.; Huang, W.; Feng, J. Dynamic and multi-channel graph convolutional networks for aspect-based sentiment analysis. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Virtual, 1–6 August 2021; pp. 2627–2636.
- Li, R.; Chen, H.; Feng, F.; Ma, Z.; Wang, X.; Hovy, E. Dual graph convolutional networks for aspect-based sentiment analysis. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual, 1–6 August 2021; Volume 1: Long Papers, pp. 6319–6329.
- Yu, H.; Lu, G.; Cai, Q.; Xue, Y. A KGE Based Knowledge Enhancing Method for Aspect-Level Sentiment Classification. Mathematics 2022, 10, 3908.
- Schifanella, R.; De Juan, P.; Tetreault, J.; Cao, L. Detecting sarcasm in multimodal social platforms. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 September 2016; pp. 1136–1145.
- Cai, Y.; Cai, H.; Wan, X. Multi-modal sarcasm detection in twitter with hierarchical fusion model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2506–2515.
- Yue, T.; Mao, R.; Wang, H.; Hu, Z.; Cambria, E. KnowleNet: Knowledge fusion network for multimodal sarcasm detection. Inf. Fusion 2023, 100, 101921.
- Qiao, Y.; Jing, L.; Song, X.; Chen, X.; Zhu, L.; Nie, L. Mutual-enhanced incongruity learning network for multi-modal sarcasm detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 9507–9515.
- Wen, C.; Jia, G.; Yang, J. DIP: Dual Incongruity Perceiving Network for Sarcasm Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 2540–2550.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
- Xu, B.; Huang, S.; Sha, C.; Wang, H. MAF: A General Matching and Alignment Framework for Multimodal Named Entity Recognition. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Virtual, 21–25 February 2022; pp. 1215–1223.
- Zhu, Z.; Zhang, D.; Li, L.; Li, K.; Qi, J.; Wang, W.; Zhang, G.; Liu, P. Knowledge-guided multi-granularity GCN for ABSA. Inf. Process. Manag. 2023, 60, 103223.
- Gao, T.; Yao, X.; Chen, D. SimCSE: Simple contrastive learning of sentence embeddings. arXiv 2021, arXiv:2104.08821.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1746–1751.
- Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610.
- Tay, Y.; Tuan, L.A.; Hui, S.C.; Su, J. Reasoning with sarcasm by reading in-between. arXiv 2018, arXiv:1805.02856.
- Xiong, T.; Zhang, P.; Zhu, H.; Yang, Y. Sarcasm detection with self-matching networks and low-rank bilinear pooling. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 2115–2124.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
Label | Training | Development | Testing |
---|---|---|---|
Positive | 8642 | 959 | 959 |
Negative | 11,174 | 1451 | 1450 |
All | 19,816 | 2410 | 2409 |
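The split sizes in the table above can be checked for internal consistency, and the share of sarcastic (positive) samples per split follows directly from them. The short sketch below does both; the dictionary layout and the helper name `positive_ratio` are illustrative choices, while the counts are copied from the table.

```python
# Counts copied from the dataset statistics table.
splits = {
    "training": {"positive": 8642, "negative": 11174, "all": 19816},
    "development": {"positive": 959, "negative": 1451, "all": 2410},
    "testing": {"positive": 959, "negative": 1450, "all": 2409},
}


def positive_ratio(counts):
    """Fraction of samples in a split that carry the positive label."""
    return counts["positive"] / (counts["positive"] + counts["negative"])


for name, counts in splits.items():
    # Each "All" entry should equal the sum of the two class counts.
    assert counts["positive"] + counts["negative"] == counts["all"]
    print(f"{name}: positive share = {positive_ratio(counts):.3f}")
```

The positive class is the minority in every split, which is one reason precision, recall, and F1 are reported alongside accuracy.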
Modality | Model | Acc (%) | Pre (%) | Rec (%) | F1 (%) |
---|---|---|---|---|---|
Image | Image [13] | 64.76 | 54.41 | 70.80 | 61.53 |
Image | ViT [18] | 67.83 | 57.93 | 70.07 | 63.43 |
Text | TextCNN [24] | 80.03 | 74.29 | 76.39 | 75.32 |
Text | Bi-LSTM [25] | 81.90 | 76.66 | 78.42 | 77.53 |
Text | SIARN [26] | 80.57 | 75.55 | 75.70 | 75.63 |
Text | SMSD [27] | 80.90 | 76.46 | 75.18 | 75.82 |
Text | BERT [17] | 83.85 | 78.72 | 82.27 | 80.22 |
Multimodal | HFM [13] | 83.44 | 76.57 | 84.15 | 80.18 |
Multimodal | D&R Net [5] | 84.02 | 77.97 | 83.42 | 80.60 |
Multimodal | Res-BERT [6] | 84.80 | 77.80 | 84.15 | 80.85 |
Multimodal | Att-BERT [6] | 86.05 | 78.63 | 83.31 | 80.90 |
Multimodal | InCrossMGs [7] | 86.10 | 81.38 | 84.36 | 82.84 |
Multimodal | CMGCN [8] | 86.54 | - | - | 82.73 |
Multimodal | HKE [2] | 87.36 | 81.84 | 86.48 | 84.09 |
Multimodal | SEF (Ours) | 88.45 | 85.35 | 86.58 | 85.96 |
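As a reading aid for the Pre, Rec, and F1 columns, the snippet below applies the standard definition of F1 as the harmonic mean of precision and recall to the SEF row. The function name is an illustrative choice. Not every row of the table reproduces this identity exactly from its rounded precision and recall, so this is a sanity check on one row rather than a statement about how the reported scores were computed.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the SEF row, in percent.
sef_f1 = f1_from_precision_recall(85.35, 86.58)
print(round(sef_f1, 2))  # 85.96, matching the F1 reported for SEF
```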
Model | Acc (%) | F1 (%) |
---|---|---|
SEF | 88.45 | 85.96 |
global | 87.91 | 84.97 |
simcse | 88.20 | 85.37 |
semantic | 87.65 | 84.78 |
Methods | Effectiveness of Semantic Enhancement | Effectiveness of Global Congruity | |
---|---|---|---|
Text | another night having to grind out belgian beer styles, studying for certified <user>. bloody nightmare #beer #nightmare emoji_156 | apparently we have a potato shortage in rotherham this is what i received in a large fries box tonight <user> #valueformoney | hi there <user>, i don’t believe this room is large enough for one on one podcasts. #dominion |
HKE | 56 | 56 | 56 |
SEF | 52 | 52 | 52 |
SEF semantic | 56 | 56 | 52 |
SEF global | 52 | 56 | 56 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhong, W.; Zhang, Z.; Wu, Q.; Xue, Y.; Cai, Q. A Semantic Enhancement Framework for Multimodal Sarcasm Detection. Mathematics 2024, 12, 317. https://doi.org/10.3390/math12020317