Text-Centric Multimodal Contrastive Learning for Sentiment Analysis
Abstract
1. Introduction
- We introduce a text-centric multimodal contrastive learning (TCMCL) framework for sentiment analysis, in which audio and visual information provide auxiliary augmentations of the textual content.
- We propose two contrastive learning strategies, based on instance prediction (IPCL) and sentiment polarity (SPCL), that mine deeper information in the sentiment space and achieve implicit cross-modal alignment during fusion (an illustrative sketch follows this list).
- Our model achieves state-of-the-art performance on the CMU-MOSI and CMU-MOSEI datasets.
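The exact IPCL and SPCL formulations are given in Sections 3.3.2 and 3.4 of the paper. The sketch below is only a generic illustration of what an instance-level (InfoNCE-style) objective and a polarity-supervised (SupCon-style) objective look like in PyTorch; the function names, tensor shapes, symmetric-loss choice, and use of the temperature are our assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a generic instance-level contrastive loss and a
# polarity-supervised contrastive loss, loosely mirroring the IPCL/SPCL ideas.
# All names and details here are assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def ipcl_loss(z_view1, z_view2, temperature=0.7):
    """Instance-level contrast: two augmented views of the same utterance
    (e.g., audio-augmented and vision-augmented text) form the positive pair;
    every other utterance in the batch is a negative."""
    z1 = F.normalize(z_view1, dim=-1)               # (B, D)
    z2 = F.normalize(z_view2, dim=-1)               # (B, D)
    logits = z1 @ z2.t() / temperature              # (B, B) scaled cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # symmetric InfoNCE: each view has to identify its own counterpart
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def spcl_loss(z, polarity, temperature=0.7):
    """Polarity-supervised contrast: samples that share a sentiment polarity
    label are pulled together, all others are pushed apart (SupCon-style)."""
    z = F.normalize(z, dim=-1)                      # (B, D)
    sim = z @ z.t() / temperature                   # (B, B)
    b = z.size(0)
    eye = torch.eye(b, dtype=torch.bool, device=z.device)
    pos_mask = (polarity.unsqueeze(0) == polarity.unsqueeze(1)) & ~eye
    # log-probability of each candidate, excluding self-similarity from the denominator
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")),
                                     dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -(log_prob * pos_mask).sum(dim=1) / pos_counts
    return per_anchor[pos_mask.any(dim=1)].mean()   # average over anchors with positives

# quick check with random features and ternary polarity labels
b, d = 8, 128
demo = ipcl_loss(torch.randn(b, d), torch.randn(b, d)) + \
       spcl_loss(torch.randn(b, d), torch.randint(0, 3, (b,)))
```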
2. Related Work
2.1. Multimodal Sentiment Analysis
2.2. Contrastive Learning
3. Methods
3.1. Overall Architecture
3.2. Single-Modal Feature Extraction
3.3. Cross-Modal Text Augmentation
3.3.1. Siamese Network Structure
3.3.2. Contrastive Learning Task: IPCL and SPCL
3.4. Total Training Loss
4. Experiments
4.1. Datasets and Evaluation Indicators
4.2. Experimental Details
4.3. Baselines
4.4. Experimental Results
4.5. Ablation Study
4.5.1. Uni-Modal Versus Multimodal
4.5.2. With or Without Contrastive Learning Tasks
4.5.3. Feature-Level Attention Versus Sequence-Level Attention
4.6. Parameter Experiments
4.6.1. Different Attention Unit Layers
4.6.2. Different Loss Weights
4.7. Visualization Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Baltrušaitis, T.; Ahuja, C.; Morency, L.P. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 423–443.
- Mullen, T.; Collier, N. Sentiment analysis using support vector machines with diverse information sources. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004; pp. 412–418.
- Whitelaw, C.; Garg, N.; Argamon, S. Using appraisal groups for sentiment analysis. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management, Bremen, Germany, 31 October–5 November 2005; pp. 625–631.
- Yi, J.; Nasukawa, T.; Bunescu, R.; Niblack, W. Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In Proceedings of the Third IEEE International Conference on Data Mining, Melbourne, FL, USA, 19–22 December 2003; pp. 427–434.
- Schuller, B.; Batliner, A.; Steidl, S.; Seppi, D. Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge. Speech Commun. 2011, 53, 1062–1087.
- Xia, R.; Liu, Y. Using denoising autoencoder for emotion recognition. In Proceedings of Interspeech, Lyon, France, 25–29 August 2013; pp. 2886–2889.
- Deng, J.; Xia, R.; Zhang, Z.; Liu, Y.; Schuller, B. Introducing shared-hidden-layer autoencoders for transfer learning and their application in acoustic emotion recognition. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 4818–4822.
- Zhao, S.; Jia, G.; Yang, J.; Ding, G.; Keutzer, K. Emotion recognition from multiple modalities: Fundamentals and methodologies. IEEE Signal Process. Mag. 2021, 38, 59–73.
- Morency, L.P.; Mihalcea, R.; Doshi, P. Towards multimodal sentiment analysis: Harvesting opinions from the web. In Proceedings of the 13th International Conference on Multimodal Interfaces, Alicante, Spain, 14–18 November 2011; pp. 169–176.
- Rosas, V.P.; Mihalcea, R.; Morency, L.P. Multimodal sentiment analysis of Spanish online videos. IEEE Intell. Syst. 2013, 28, 38–45.
- Poria, S.; Cambria, E.; Hussain, A.; Huang, G.B. Towards an intelligent framework for multimodal affective data analysis. Neural Netw. 2015, 63, 104–116.
- Park, S.; Shim, H.S.; Chatterjee, M.; Sagae, K.; Morency, L.P. Multimodal analysis and prediction of persuasiveness in online social multimedia. ACM Trans. Interact. Intell. Syst. (TiiS) 2016, 6, 1–25.
- Han, W.; Chen, H.; Gelbukh, A.; Zadeh, A.; Morency, L.P.; Poria, S. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis. In Proceedings of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, 18–22 October 2021; pp. 6–15.
- Zadeh, A.; Chen, M.; Poria, S.; Cambria, E.; Morency, L.P. Tensor Fusion Network for Multimodal Sentiment Analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; pp. 1103–1114.
- Barezi, E.J.; Fung, P. Modality-based Factorization for Multimodal Fusion. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), Florence, Italy, 2 August 2019; pp. 260–269.
- Zadeh, A.; Liang, P.P.; Mazumder, N.; Poria, S.; Cambria, E.; Morency, L.P. Memory fusion network for multi-view sequential learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- Paraskevopoulos, G.; Georgiou, E.; Potamianos, A. MMLatch: Bottom-up top-down fusion for multimodal sentiment analysis. In Proceedings of the ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtual and Singapore, 23–27 May 2022; pp. 4573–4577.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized autoregressive pretraining for language understanding. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
- Han, W.; Chen, H.; Poria, S. Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Virtual Event, 7–11 November 2021; pp. 9180–9192.
- Yu, W.; Xu, H.; Yuan, Z.; Wu, J. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; Volume 35, pp. 10790–10797.
- Hazarika, D.; Zimmermann, R.; Poria, S. MISA: Modality-invariant and -specific representations for multimodal sentiment analysis. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1122–1131.
- Mai, S.; Zeng, Y.; Hu, H. Multimodal information bottleneck: Learning minimal sufficient unimodal and multimodal representations. IEEE Trans. Multimed. 2022, 25, 4121–4134.
- Sun, L.; Lian, Z.; Liu, B.; Tao, J. Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis. IEEE Trans. Affect. Comput. 2023, 15, 309–325.
- Chen, Q.; Huang, G.; Wang, Y. The weighted cross-modal attention mechanism with sentiment prediction auxiliary task for multimodal sentiment analysis. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 2689–2695.
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
- Yang, K.; Xu, H.; Gao, K. CM-BERT: Cross-modal BERT for text-audio sentiment analysis. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 521–528.
- Rahman, W.; Hasan, M.K.; Lee, S.; Zadeh, A.; Mao, C.; Morency, L.P.; Hoque, E. Integrating multimodal information in large pretrained transformers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; p. 2359.
- Kim, K.; Park, S. AOBERT: All-modalities-in-One BERT for multimodal sentiment analysis. Inf. Fusion 2023, 92, 37–45.
- Tsai, Y.H.H.; Bai, S.; Liang, P.P.; Kolter, J.Z.; Morency, L.P.; Salakhutdinov, R. Multimodal transformer for unaligned multimodal language sequences. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; p. 6558.
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9729–9738.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Virtual Event, 13–18 July 2020; pp. 1597–1607.
- Chen, X.; He, K. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 15750–15758.
- Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2222–2232.
- Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; p. 1724.
- Poria, S.; Chaturvedi, I.; Cambria, E.; Hussain, A. Convolutional MKL based multimodal emotion recognition and sentiment analysis. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; pp. 439–448.
- Kumar, A.; Vepa, J. Gated mechanism for attention based multimodal sentiment analysis. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 4477–4481.
- Nguyen, D.; Nguyen, K.; Sridharan, S.; Dean, D.; Fookes, C. Deep spatio-temporal feature fusion with compact bilinear pooling for multimodal emotion recognition. Comput. Vis. Image Underst. 2018, 174, 33–42.
- Zhang, Y.; Song, D.; Li, X.; Zhang, P.; Wang, P.; Rong, L.; Yu, G.; Wang, B. A quantum-like multimodal network framework for modeling interaction dynamics in multiparty conversational sentiment analysis. Inf. Fusion 2020, 62, 14–31.
- Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA, 1–6 June 2018; pp. 2227–2237.
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training; OpenAI: San Francisco, CA, USA, 2018.
- Zeng, Y.; Li, Z.; Tang, Z.; Chen, Z.; Ma, H. Heterogeneous graph convolution based on in-domain self-supervision for multimodal sentiment analysis. Expert Syst. Appl. 2023, 213, 119240.
- Wu, Z.; Xiong, Y.; Yu, S.X.; Lin, D. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3733–3742.
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap your own latent: A new approach to self-supervised learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; Volume 33, pp. 21271–21284.
- Yan, Y.; Li, R.; Wang, S.; Zhang, F.; Wu, W.; Xu, W. ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual Event, 1–6 August 2021.
- Gao, T.; Yao, X.; Chen, D. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Virtual Event, 7–11 November 2021; pp. 6894–6910.
- Degottex, G.; Kane, J.; Drugman, T.; Raitio, T.; Scherer, S. COVAREP—A collaborative voice analysis repository for speech technologies. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 960–964.
- iMotions. Facial Expression Analysis. 2017. Available online: https://imotions.com/ (accessed on 20 October 2023).
- Chen, M.; Wang, S.; Liang, P.P.; Baltrušaitis, T.; Zadeh, A.; Morency, L.P. Multimodal sentiment analysis with word-level fusion and reinforcement learning. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017; pp. 163–171.
- Oord, A.v.d.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748.
- Zadeh, A.; Zellers, R.; Pincus, E.; Morency, L.P. MOSI: Multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos. arXiv 2016, arXiv:1606.06259.
- Zadeh, A.B.; Liang, P.P.; Poria, S.; Cambria, E.; Morency, L.P. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 2236–2246.
- Tsai, Y.H.H.; Liang, P.P.; Zadeh, A.; Morency, L.P.; Salakhutdinov, R. Learning Factorized Multimodal Representations. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Yu, T.; Gao, H.; Lin, T.E.; Yang, M.; Wu, Y.; Ma, W.; Wang, C.; Huang, F.; Li, Y. Speech-Text Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 7900–7913.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
Hyperparameter settings used in the experiments:

| Parameter | Value |
|---|---|
| Length of feature sequence N | 50 |
| Number of heads in the multi-head attention mechanism i | 5 |
| Number of layers of attention units K | 1 |
| Projection layer (linear layer sizes) | 768 and 128 |
| Temperature parameter | 0.7 |
| Dropout rate | 0.5 |
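For convenience, the settings above can be gathered into a single configuration object. The snippet below is a minimal sketch of such a config; the field names are ours and do not come from the paper or its code.

```python
# Hyperparameters from the table above, collected into one config object.
# Field names are our own invention, chosen for readability.
from dataclasses import dataclass

@dataclass(frozen=True)
class TCMCLConfig:
    seq_len: int = 50                # length of the feature sequence N
    num_heads: int = 5               # heads in the multi-head attention
    num_attn_layers: int = 1         # attention unit layers K
    proj_dims: tuple = (768, 128)    # projection layer sizes
    temperature: float = 0.7         # contrastive temperature
    dropout: float = 0.5             # dropout rate

cfg = TCMCLConfig()
print(cfg)
```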
Comparison with baselines on CMU-MOSI (left five metric columns) and CMU-MOSEI (right five metric columns):

| Model | MAE ↓ | Corr ↑ | Acc2 ↑ | F1 ↑ | Acc7 ↑ | MAE ↓ | Corr ↑ | Acc2 ↑ | F1 ↑ | Acc7 ↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| TFN | 0.944 | 0.672 | 79.3/80.0 | 79.3/80.1 | 33.8 | 0.566 | 0.708 | 80.1/82.1 | 80.2/82.3 | 48.8 |
| MFN | 0.952 | 0.695 | 79.1/80.6 | 79.0/80.5 | 32.7 | 0.589 | 0.725 | 79.9/82.4 | 80.0/82.6 | 47.4 |
| MFM | 0.915 | 0.704 | 79.8/80.4 | 79.8/80.2 | 33.2 | 0.632 | 0.719 | 80.0/82.8 | 80.6/83.0 | 49.2 |
| MulT | 0.787 | 0.783 | 80.8/82.1 | 80.9/82.2 | 36.2 | 0.617 | 0.722 | 82.5/83.5 | 82.6/83.7 | 50.9 |
| MISA | 0.771 | 0.786 | 81.6/83.2 | 81.6/83.3 | 39.1 | 0.599 | 0.724 | 82.1/84.3 | 82.4/84.4 | 48.9 |
| MAG-BERT | 0.731 | 0.783 | 82.7/85.0 | 82.6/85.0 | 44.3 | 0.563 | 0.749 | 82.3/84.9 | 82.6/84.9 | 51.4 |
| Self-MM | 0.727 | 0.787 | 83.0/85.1 | 82.3/85.1 | 44.2 | 0.559 | 0.744 | 81.5/85.1 | 81.7/85.1 | 51.2 |
| MMIM | 0.729 | 0.782 | 83.0/85.3 | 83.0/85.2 | 44.4 | 0.556 | 0.753 | 82.0/85.3 | 82.4/85.2 | 51.6 |
| MIB | 0.723 | 0.769 | 82.8/85.3 | 82.8/85.2 | 42.6 | 0.584 | 0.741 | 82.0/84.4 | 81.9/84.3 | 51.9 |
| MMLATCH | 0.736 | 0.721 | 81.7/84.1 | 81.7/84.1 | 43.0 | 0.582 | 0.723 | 81.2/83.0 | 81.2/83.0 | 52.1 |
| SPECTRA | 0.721 | 0.790 | 83.1/85.8 | 83.1/85.8 | 44.7 | 0.551 | 0.749 | 82.2/85.6 | 82.1/85.5 | 52.3 |
| TCMCL | 0.704 | 0.807 | 84.4/86.7 | 84.3/86.7 | 45.0 | 0.541 | 0.759 | 82.8/85.8 | 83.2/85.7 | 52.8 |
Uni-modal versus multimodal ablation (Section 4.5.1) on CMU-MOSI:

| Configuration | MAE ↓ | Corr ↑ | Acc2 ↑ | F1 ↑ | Acc7 ↑ |
|---|---|---|---|---|---|
| Text | 0.793 | 0.769 | 81.9/83.8 | 81.7/83.8 | 41.0 |
| Audio | 0.971 | 0.667 | 76.5/77.4 | 76.5/77.5 | 30.2 |
| Visual | 0.944 | 0.723 | 78.0/79.5 | 78.0/79.6 | 32.8 |
| Audio-centric | 0.808 | 0.768 | 81.6/83.0 | 81.6/83.1 | 40.3 |
| Visual-centric | 0.756 | 0.780 | 82.5/84.4 | 82.6/84.4 | 42.4 |
| Balance model | 0.747 | 0.778 | 82.2/84.9 | 82.2/84.8 | 41.6 |
| Text-centric | 0.704 | 0.807 | 84.4/86.7 | 84.3/86.7 | 45.0 |
Ablation on the contrastive learning tasks (Section 4.5.2) on CMU-MOSI:

| Setting | MAE ↓ | Corr ↑ | Acc2 ↑ | F1 ↑ | Acc7 ↑ |
|---|---|---|---|---|---|
| w/o IPCL | 0.731 | 0.785 | 83.0/85.5 | 83.2/85.5 | 43.4 |
| w/o SPCL | 0.719 | 0.797 | 83.9/86.1 | 83.8/86.1 | 43.9 |
| w/o CL | 0.745 | 0.784 | 82.8/84.7 | 82.8/84.7 | 42.3 |
| TCMCL | 0.704 | 0.807 | 84.4/86.7 | 84.3/86.7 | 45.0 |
Feature-level versus sequence-level attention (Section 4.5.3) on CMU-MOSI:

| Attention type | MAE ↓ | Corr ↑ | Acc2 ↑ | F1 ↑ | Acc7 ↑ |
|---|---|---|---|---|---|
| Sequence-level attention | 0.739 | 0.788 | 82.9/84.8 | 82.9/84.9 | 41.6 |
| Feature-level attention | 0.704 | 0.807 | 84.4/86.7 | 84.3/86.7 | 45.0 |
Parameter study on the two contrastive loss weights (Section 4.6.2); rows and columns index the two weight values:

| | 0.05 | 0.1 | 0.15 | 0.2 | 0.25 | 0.3 |
|---|---|---|---|---|---|---|
| 0.05 | 86.0 | 85.9 | 84.8 | 83.9 | 84.3 | 83.8 |
| 0.1 | 86.7 | 85.4 | 84.6 | 84.0 | 84.2 | 83.6 |
| 0.15 | 85.5 | 84.4 | 83.9 | 83.7 | 82.9 | 82.9 |
| 0.2 | 84.7 | 84.1 | 84.0 | 83.5 | 82.7 | 82.6 |
| 0.25 | 83.9 | 83.7 | 83.2 | 83.5 | 82.8 | 82.6 |
| 0.3 | 84.0 | 83.2 | 83.0 | 82.4 | 82.5 | 82.3 |
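Section 3.4 of the paper defines the total training loss, and the grid above varies the weights of the two contrastive terms. Below is a minimal sketch of one plausible weighting scheme, assuming the total loss is a prediction loss plus weighted IPCL and SPCL terms; the weight names, the L1 prediction loss, and the additive form are our assumptions, not the paper's equation.

```python
# Sketch of a weighted total loss of the assumed form
#   L_total = L_task + w_ipcl * L_ipcl + w_spcl * L_spcl
# The weight names and the L1 task loss are placeholders, not the paper's definition.
import torch
import torch.nn.functional as F

def total_loss(pred, target, l_ipcl, l_spcl, w_ipcl=0.05, w_spcl=0.1):
    l_task = F.l1_loss(pred, target)   # regression loss on the sentiment score
    return l_task + w_ipcl * l_ipcl + w_spcl * l_spcl

# Usage: under this reading, the best cell in the grid above (86.7) corresponds
# to small values of both contrastive weights (around 0.05-0.1).
loss = total_loss(torch.randn(8), torch.randn(8),
                  l_ipcl=torch.tensor(1.2), l_spcl=torch.tensor(0.8))
```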
Citation: Peng, H.; Gu, X.; Li, J.; Wang, Z.; Xu, H. Text-Centric Multimodal Contrastive Learning for Sentiment Analysis. Electronics 2024, 13, 1149. https://doi.org/10.3390/electronics13061149