A Convolutional Sequence-to-Sequence Attention Fusion Framework for Commonsense Causal Reasoning
Abstract
1. Introduction
- We define a new problem, commonsense causality generation, which benefits many NLP applications such as question answering.
- We propose a method for automatically constructing a cause-effect pair corpus, which facilitates training models for commonsense causality generation.
- We propose a novel causal attention fusion mechanism that incorporates global causal dependencies observed in an external knowledge source. Extensive experiments show that our approach outperforms multiple strong baselines by a substantial margin.
2. Related Work
3. Our Approach
3.1. Cause-Effect Pairs Extraction
- Cue Sentences Collection. We begin by collecting sentences containing either “because” or “so” through regular expression matching. These sentences are then processed using the Stanford CoreNLP [38] tools for tokenization, POS tagging, constituent parsing, and dependency parsing.
- Negation Detection. During the analysis, if the causal conjunction node in the dependency tree has a negation word sibling, such as “not” or “n’t”, we exclude this sentence from further extraction.
- Clause Detection. To extract cause-effect sentence pairs, we devise syntactic pattern matching rules. Specifically, we require that the occurrence of "because" in the sentence introduces a subordinating conjunction leading a subordinate clause (tagged 'SBAR'), which in turn contains a declarative sentence ('S') with a subject ('NP') and a predicate ('VP'). Similarly, we design analogous syntactic rules for the 'so that' pattern. These syntactic rules are further illustrated in Figure 1.
- Spans Extraction. Next, we remove redundant punctuation and unwanted sentence constituents from both the main-clause and subordinate-clause subtrees. The text spans within these subtrees are then extracted as cause-effect sentence pairs. A minimal code sketch of this pipeline is given after this list.
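The following is a minimal sketch of the extraction steps above, assuming constituency parses are already available as bracketed strings (e.g., from Stanford CoreNLP [38]). Only the "because" rule is shown; the negation window size, the helper names, and the toy parse are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of Section 3.1: cue collection, negation filtering, and
# SBAR-pattern matching over a constituency parse.
import re
from nltk.tree import Tree

NEGATIONS = {"not", "n't", "never"}

def collect_cue_sentences(text):
    """Step 1: keep sentences that contain the cue words 'because' or 'so'."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if re.search(r"\b(because|so)\b", s, re.I)]

def has_negated_cue(tokens, window=2):
    """Step 2: drop sentences with a negation word near the causal cue."""
    for i, tok in enumerate(tokens):
        if tok.lower() == "because":
            context = tokens[max(0, i - window):i + window + 1]
            if any(w.lower() in NEGATIONS for w in context):
                return True
    return False

def extract_because_pair(parse_str):
    """Steps 3-4: match SBAR('because' S(NP VP)); return (cause, effect)."""
    tree = Tree.fromstring(parse_str)
    for sbar in tree.subtrees(lambda t: t.label() == "SBAR"):
        leaves = sbar.leaves()
        if not leaves or leaves[0].lower() != "because":
            continue
        for s in sbar.subtrees(lambda t: t.label() == "S"):
            labels = [c.label() for c in s if isinstance(c, Tree)]
            if "NP" in labels and "VP" in labels:   # declarative clause
                cause = " ".join(s.leaves())
                # naive span extraction: everything outside the SBAR is
                # treated as the effect (main clause)
                sbar_tokens = set(leaves)
                effect = " ".join(w for w in tree.leaves()
                                  if w not in sbar_tokens)
                return cause, effect.rstrip(" .")
    return None

parse = ("(ROOT (S (NP (PRP She)) (VP (VBD stayed) (ADVP (RB home)) "
         "(SBAR (IN because) (S (NP (PRP it)) (VP (VBD rained))))) (. .)))")
print(extract_because_pair(parse))  # ('it rained', 'She stayed home')
```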
3.2. Background: CNN Seq2seq Model
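For reference, the convolutional seq2seq model of Gehring et al. [36] builds its encoder and decoder from stacked convolutional blocks, each applying a 1-D convolution, a gated linear unit [39], and a scaled residual connection [40]. The NumPy sketch below shows one such block; the shapes, initialization, and kernel width are illustrative, not the paper's configuration.

```python
# One ConvS2S-style block: 1-D convolution -> GLU -> scaled residual.
import numpy as np

def glu_conv_block(x, W, b):
    """x: (T, d) input sequence; W: (k, d, 2*d) conv weights; b: (2*d,) bias."""
    k, d, _ = W.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))          # keep output length T
    T = x.shape[0]
    conv = np.empty((T, 2 * d))
    for t in range(T):                            # convolve over time
        conv[t] = np.einsum("kd,kdo->o", xp[t:t + k], W) + b
    a, g = conv[:, :d], conv[:, d:]
    h = a * (1.0 / (1.0 + np.exp(-g)))            # gated linear unit (GLU)
    return (h + x) * np.sqrt(0.5)                 # scaled residual connection

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                       # T=5 positions, d=8 dims
W = 0.1 * rng.normal(size=(3, 8, 16))             # kernel width k=3
print(glu_conv_block(x, W, np.zeros(16)).shape)   # (5, 8)
```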
3.3. Causal Attention Fusion Mechanism
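As a rough, hypothetical illustration of attention fusion of this kind, the sketch below mixes standard decoder attention weights with a normalized prior of causal strengths between target and source words (e.g., PMI-style statistics in the spirit of [27]). The convex-combination form, the `causal_prior` input, and the interpolation weight are assumptions for illustration, not the paper's exact formulation.

```python
# Fuse content-based attention with a causal-strength prior drawn from an
# external knowledge source (hypothetical fusion form).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fused_attention(scores, causal_prior, lam=0.5):
    """scores: (tgt_len, src_len) raw attention scores;
    causal_prior: (tgt_len, src_len) causal strengths (assumed input);
    lam: interpolation weight between the two distributions."""
    attn = softmax(scores)                        # standard attention
    prior = causal_prior / np.maximum(            # normalize prior per row
        causal_prior.sum(axis=-1, keepdims=True), 1e-9)
    return lam * attn + (1.0 - lam) * prior       # rows still sum to 1

# toy usage: 2 target steps attending over 3 source words
scores = np.array([[0.2, 1.5, -0.3], [1.0, 0.1, 0.4]])
prior = np.array([[0.0, 2.0, 1.0], [1.0, 0.0, 0.0]])
print(fused_attention(scores, prior))
```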
4. Experiments
4.1. Dataset Description
4.1.1. Cause-Effect Pairs Corpus
4.1.2. Evaluation Datasets
4.2. Baselines
4.2.1. Selection-Based Methods
4.2.2. Generation-Based Methods
4.3. Evaluation Metrics
- BLEU scores [42] include BLEU-1, BLEU-2, and BLEU-3, which combine modified n-gram precision with a sentence brevity penalty:
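In the standard formulation of [42],

$$\text{BLEU-}N = \mathrm{BP} \cdot \exp\!\left(\sum_{n=1}^{N} w_n \log p_n\right), \qquad \mathrm{BP} = \begin{cases} 1, & c > r, \\ e^{\,1 - r/c}, & c \le r, \end{cases}$$

where $p_n$ is the modified $n$-gram precision, $w_n = 1/N$ are uniform weights, $c$ is the total candidate length, and $r$ is the effective reference length.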
- Accuracy (Acc) is the evaluation metric for COPA, computed as:
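Assuming the standard COPA protocol (choose the more plausible of two alternatives), this is

$$\mathrm{Acc} = \frac{\#\{\text{correctly answered COPA questions}\}}{\#\{\text{COPA questions}\}}.$$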
4.4. Experiment Results and Analysis
4.4.1. Comparison Results
4.4.2. Case Studies
5. Implications
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Roemmele, M.; Bejan, C.A.; Gordon, A.S. Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In Proceedings of the AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, Stanford, CA, USA, 21–23 March 2011; AAAI Press: Washington, DC, USA, 2011.
2. Luo, Z.; Sha, Y.; Zhu, K.Q.; Hwang, S.; Wang, Z. Commonsense causal reasoning between short texts. In Principles of Knowledge Representation and Reasoning: Proceedings of the 15th International Conference (KR-16), Cape Town, South Africa, 25–29 April 2016; AAAI Press: Washington, DC, USA, 2016; pp. 421–431.
3. Goodwin, T.; Rink, B.; Roberts, K.; Harabagiu, S.M. UTDHLT: COPACETIC system for choosing plausible alternatives. In Proceedings of the 1st Joint Conference on Lexical and Computational Semantics, Stroudsburg, PA, USA, 7–8 June 2012; pp. 461–466.
4. Jabeen, S.; Gao, X.; Andreae, P. Using asymmetric associations for commonsense causality detection. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence, Gold Coast, Australia, 1–5 December 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 877–883.
5. Lester, B.; Al-Rfou, R.; Constant, N. The power of scale for parameter-efficient prompt tuning. arXiv 2021, arXiv:2104.08691.
6. Wang, J.; Hou, Y.; Liu, J.; Cao, Y.; Lin, C.Y. A statistical framework for product description generation. In Proceedings of the 8th International Joint Conference on Natural Language Processing, Taipei, Taiwan, 8 November 2017; Short Papers; Asian Federation of Natural Language Processing: Taipei, Taiwan, 2017; Volume 2, pp. 187–192.
7. Chen, Y.; Li, V.O.; Cho, K.; Bowman, S.R. A Stable and Effective Learning Strategy for Trainable Greedy Decoding. arXiv 2018, arXiv:1804.07915.
8. Song, K.; Tan, X.; He, D.; Lu, J.; Qin, T.; Liu, T.Y. Double path networks for sequence to sequence learning. arXiv 2018, arXiv:1806.04856.
9. Wu, F.; Fan, A.; Baevski, A.; Dauphin, Y.N.; Auli, M. Pay Less Attention with Lightweight and Dynamic Convolutions. arXiv 2019, arXiv:1901.10430.
10. Wang, W.; Jiao, W.; Hao, Y.; Wang, X.; Shi, S.; Tu, Z.; Lyu, M. Understanding and improving sequence-to-sequence pretraining for neural machine translation. arXiv 2022, arXiv:2203.08442.
11. Fan, A.; Grangier, D.; Auli, M. Controllable abstractive summarization. arXiv 2017, arXiv:1711.05217.
12. Liu, Y.; Luo, Z.; Zhu, K. Controlling length in abstractive summarization using a convolutional neural network. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4110–4119.
13. Narayan, S.; Cohen, S.B.; Lapata, M. Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. arXiv 2018, arXiv:1808.08745.
14. Guo, J.; Xu, L.; Chen, E. Jointly masked sequence-to-sequence model for non-autoregressive neural machine translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual, 5–10 July 2020; pp. 376–385.
15. Kouris, P.; Alexandridis, G.; Stafylopatis, A. Abstractive text summarization: Enhancing sequence-to-sequence models using word sense disambiguation and semantic content generalization. Comput. Linguist. 2021, 47, 813–859.
16. Joshi, A.; Fidalgo, E.; Alegre, E.; Fernández-Robles, L. DeepSumm: Exploiting topic models and sequence to sequence networks for extractive text summarization. Expert Syst. Appl. 2023, 211, 118442.
17. Baevski, A.; Auli, M. Adaptive input representations for neural language modeling. arXiv 2018, arXiv:1809.10853.
18. Song, K.; Tan, X.; Qin, T.; Lu, J.; Liu, T.Y. MASS: Masked sequence to sequence pre-training for language generation. arXiv 2019, arXiv:1905.02450.
19. Lin, S.C.; Yang, J.H.; Nogueira, R.; Tsai, M.F.; Wang, C.J.; Lin, J. Conversational question reformulation via sequence-to-sequence architectures and pretrained language models. arXiv 2020, arXiv:2004.01909.
20. Fan, A.; Lewis, M.; Dauphin, Y. Hierarchical neural story generation. arXiv 2018, arXiv:1805.04833.
21. Fan, A.; Lewis, M.; Dauphin, Y. Strategies for Structuring Story Generation. arXiv 2019, arXiv:1902.01109.
22. Miller, A.H.; Feng, W.; Fisch, A.; Lu, J.; Batra, D.; Bordes, A.; Parikh, D.; Weston, J. ParlAI: A dialog research software platform. arXiv 2017, arXiv:1705.06476.
23. Dinan, E.; Roller, S.; Shuster, K.; Fan, A.; Auli, M.; Weston, J. Wizard of Wikipedia: Knowledge-powered conversational agents. arXiv 2018, arXiv:1811.01241.
24. Zhao, J.; Mahdieh, M.; Zhang, Y.; Cao, Y.; Wu, Y. Effective sequence-to-sequence dialogue state tracking. arXiv 2021, arXiv:2108.13990.
25. Ott, M.; Edunov, S.; Baevski, A.; Fan, A.; Gross, S.; Ng, N.; Grangier, D.; Auli, M. fairseq: A Fast, Extensible Toolkit for Sequence Modeling. In Proceedings of the NAACL-HLT 2019: Demonstrations, Minneapolis, MN, USA, 2–7 June 2019.
26. Gordon, A.S.; Bejan, C.A.; Sagae, K. Commonsense causal reasoning using millions of personal stories. In Proceedings of the Association for the Advancement of Artificial Intelligence, San Francisco, CA, USA, 7–11 August 2011; AAAI Press: Washington, DC, USA, 2011.
27. Church, K.W.; Hanks, P. Word association norms, mutual information, and lexicography. Comput. Linguist. 1990, 16, 22–29.
28. Roemmele, M.; Gordon, A. An encoder-decoder approach to predicting causal relations in stories. In Proceedings of the 1st Workshop on Storytelling, New Orleans, LA, USA, 7 June 2018; Association for Computational Linguistics: New Orleans, LA, USA, 2018; pp. 50–59.
29. Dasgupta, I.; Wang, J.X.; Chiappa, S.; Mitrovic, J.; Ortega, P.A.; Raposo, D.; Hughes, E.; Battaglia, P.; Botvinick, M.; Kurth-Nelson, Z. Causal Reasoning from Meta-reinforcement Learning. arXiv 2019, arXiv:1901.08162.
30. Yeo, J.; Wang, G.; Cho, H.; Choi, S.; Hwang, S. Machine-Translated Knowledge Transfer for Commonsense Causal Reasoning. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, LA, USA, 2–7 February 2018; pp. 2021–2028.
31. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, arXiv:1409.0473.
32. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014; pp. 3104–3112.
33. Paulus, R.; Xiong, C.; Socher, R. A Deep Reinforced Model for Abstractive Summarization. arXiv 2017, arXiv:1705.04304.
34. See, A.; Liu, P.J.; Manning, C.D. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1: Long Papers, pp. 1073–1083.
35. Nallapati, R.; Zhou, B.; dos Santos, C.N.; Gülçehre, Ç.; Xiang, B. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, CoNLL 2016, Berlin, Germany, 11–12 August 2016; pp. 280–290.
36. Gehring, J.; Auli, M.; Grangier, D.; Yarats, D.; Dauphin, Y.N. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, Australia, 6–11 August 2017; pp. 1243–1252.
37. Fan, A.; Grangier, D.; Auli, M. Controllable abstractive summarization. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, NMT@ACL 2018, Melbourne, Australia, 20 July 2018; pp. 45–54.
38. Manning, C.D.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S.J.; McClosky, D. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Baltimore, MD, USA, 23–24 June 2014; pp. 55–60.
39. Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language modeling with gated convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017; pp. 933–941.
40. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
41. Trinh, T.H.; Le, Q.V. A simple method for commonsense reasoning. arXiv 2018, arXiv:1806.02847.
42. Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.-J. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 6–12 July 2002; pp. 311–318.
| Method | Main Objective | Pros | Cons |
|---|---|---|---|
| Feature-based methods | Causal/non-causal classification | Utilizes diverse feature datasets | Limited to predefined features |
| Co-occurrence-based methods | Causal strength computation | Captures statistical dependencies | Limited to lexical templates |
| Neural-based methods (previous) | Causality prediction | High accuracy for causal/non-causal classification | Limited to acquiring causal pairs; struggles with capturing complex causalities |
| Neural-based methods (ours) | Causality generation | Generates causal sentences; captures complex causalities by leveraging external knowledge sources | Limited to the word-based attention fusion mechanism |
Model | BLEU-1 | BLEU-2 | BLEU-3 | Human Accuracy |
---|---|---|---|---|
OUniMat | 17.71 | 1.17 | 0.0 | 0.26 |
PUniMat | 17.54 | - | - | 0.22 |
BiMat | 19.95 | 2.55 | 0.01 | 0.23 |
CAttn | 14.41 | 4.74 | 1.21 | 0.18 |
MultiSAttn | 18.72 | 9.67 | 4.76 | 0.21 |
FAttn | 20.06 | 9.98 | 4.82 | 0.35 |
Model | BLEU-1 | BLEU-2 | BLEU-3 | Accuracy |
---|---|---|---|---|
OUniMat | 11.15 | 0.14 | 0.0 | 0.26 |
PUniMat | 12.35 | - | - | 0.17 |
BiMat | 13.42 | 0.41 | 0.0 | 0.26 |
CAttn | 21.64 | 10.90 | 1.94 | 0.21 |
MultiSAttn | 37.26 | 27.68 | 18.42 | 0.25 |
FAttn | 36.94 | 27.78 | 18.73 | 0.52 |
(a) Causality generation case from the NOVTest dataset.

| | Model | Sentence |
|---|---|---|
| Cause | | the little boy forward was always in the shelter of her arms |
| Effect | | lien pushed the little boy forward and inched herself along on her heels |
| Forward reasoning | CAttn | <f> unk had been a good one |
| | MultiSAttn | <f> they were n't going to kill them |
| | FAttn | <f> unk was upset about it |
| Backward reasoning | CAttn | <b> the unk had been so much more than a year ago |
| | MultiSAttn | <b> she was so small |
| | FAttn | <b> the little boy was so fond of her |

(b) Causality generation case from the COPATest dataset.

| | Model | Sentence |
|---|---|---|
| Cause | | the husband discovered that the husband wife was having an affair |
| Effect | | the husband filed for divorce |
| Forward reasoning | CAttn | <f> i was n't sure what to say |
| | MultiSAttn | <f> the husband was not in the least concerned |
| | FAttn | <f> the husband was disappointed |
| Backward reasoning | CAttn | <b> the unk had to be careful |
| | MultiSAttn | <b> the woman was convicted of the woman |
| | FAttn | <b> the woman wanted to get rid of her husband |