Testing the Effectiveness of the Diagnostic Probing Paradigm on Italian Treebanks
Abstract
1. Introduction
Contributions
- We present a methodology for testing the reliability of probing tasks by building control datasets of varying complexity (sketched in the code example after this list);
- We assess the extent to which the linguistic knowledge encoded by BERT is influenced by sentence length, and show how length can act as a confounding factor that biases estimates of BERT’s knowledge of a wide variety of (morpho-)syntactic phenomena;
- We test the effectiveness of the diagnostic probing approach on Italian, a language frequently neglected in probing studies.
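To make the first contribution concrete, the following is a minimal sketch of how such control datasets can be built (the function names and the binning scheme are our illustration, not the authors’ released code): a “random” control shuffles the gold feature values across all sentences, while a stricter variant shuffles them only within sentence-length bins, so that a probe cannot recover them from length alone.

```python
import random
from collections import defaultdict

def random_control(values, seed=0):
    """Random control: shuffle gold feature values across all sentences."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

def binned_control(values, lengths, bin_size=5, seed=0):
    """Stricter control: shuffle values only among sentences of similar
    length, so that length alone cannot explain probe performance."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for idx, n_tokens in enumerate(lengths):
        bins[n_tokens // bin_size].append(idx)
    shuffled = list(values)
    for indices in bins.values():
        perm = list(indices)
        rng.shuffle(perm)
        for src, dst in zip(perm, indices):
            shuffled[dst] = values[src]
    return shuffled
```

Comparing probe scores on the gold values against scores on these controls (as in Section 4) separates genuine linguistic knowledge from what a probe can glean from confounds such as sentence length.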
2. The Diagnostic Probing Paradigm
3. Methodology
3.1. Data
3.2. Linguistic Features
- (1) In Svizzera, alcuni militanti si sono arrampicati sul tetto dell’ambasciata. [transl. ‘In Switzerland, some militants climbed onto the roof of the embassy.’]
3.3. Control Datasets
3.4. Models
4. Results
4.1. Probing on the Standard Subset
4.2. Probing on the Shortest and Longest Subsets
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
| Linguistic Feature | Label |
|---|---|
| **Order of elements (Order)** | |
| Relative order of subject and object | subj_pre, subj_post, obj_post |
| **Morpho-syntactic information (POS)** | |
| Distribution of UD and language-specific POS | upos_dist_*, xpos_dist_* |
| **Use of Subordination (Subord)** | |
| Distribution of subordinate clauses | subordinate_prop_dist |
| Average length of subordination chains and distribution by depth | avg_subord_chain_len, subordinate_dist_1 |
| Relative order of subordinate clauses | subordinate_post |
| **Syntactic Relations (SyntacticDep)** | |
| Distribution of dependency relations | dep_dist_* |
| **Global and Local Parsed Tree Structures (TreeStructure)** | |
| Depth of the whole syntactic tree | parse_depth |
| Average length of dependency links and of the longest link | avg_links_len, max_links_len |
| Average length of prepositional chains and distribution by depth | avg_prep_chain_len, prep_dist_1 |
| Clause length | avg_token_per_clause |
| **Inflectional morphology (VerbInflection)** | |
| Inflectional morphology of lexical verbs and auxiliaries | verbs_*, aux_* |
| **Verbal Predicate Structure (VerbPredicate)** | |
| Distribution of verbal heads and verbal roots | verbal_head_dist, verbal_root_perc |
| Verb arity and distribution of verbs by arity | avg_verb_edges, verbal_arity_* |
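Many of the tree-structure features above can be recomputed directly from a CoNLL-U parse. Below is a minimal sketch using the third-party `conllu` package; this is our illustration of the feature definitions (the paper itself relies on Profiling-UD), and the treebank file name is a placeholder.

```python
import conllu  # pip install conllu

def tree_depth(sentence):
    """parse_depth: the longest chain of head links from a token to the root."""
    heads = {tok["id"]: tok["head"] for tok in sentence
             if isinstance(tok["id"], int)}  # skip multiword-token ranges
    def depth(tok_id):
        steps = 0
        while heads.get(tok_id, 0):
            tok_id = heads[tok_id]
            steps += 1
        return steps
    return max((depth(t) for t in heads), default=0)

def link_lengths(sentence):
    """avg_links_len and max_links_len: linear distance between each
    dependent and its head, excluding the root link."""
    dists = [abs(tok["id"] - tok["head"]) for tok in sentence
             if isinstance(tok["id"], int) and tok["head"]]
    return sum(dists) / len(dists), max(dists)

with open("it_isdt-ud-train.conllu", encoding="utf-8") as f:  # placeholder path
    first = conllu.parse(f.read())[0]
print(tree_depth(first), link_lengths(first))
```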
| Feat. Group | Mean (Shortest) | CV (Shortest) | Mean (Standard) | CV (Standard) | Mean (Longest) | CV (Longest) |
|---|---|---|---|---|---|---|
| Order | 19.83 | 1.04 | 40.45 | 0.55 | 52.96 | 0.34 |
| POS | 3.56 | 0.14 | 3.56 | 0.09 | 3.68 | 0.03 |
| Subord | 16.73 | 0.96 | 36.51 | 0.58 | 48.67 | 0.29 |
| SyntacticDep | 5.36 | 0.19 | 5.33 | 0.13 | 5.51 | 0.08 |
| TreeStructure | 4.62 | 1.17 | 11.66 | 0.56 | 17.37 | 0.29 |
| VerbInflection | 23.21 | 0.80 | 38.38 | 0.47 | 47.38 | 0.33 |
| VerbPredicate | 16.60 | 0.78 | 23.17 | 0.39 | 25.97 | 0.22 |
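The probing scores reported in Section 4 come from diagnostic models trained on the neural language model’s sentence representations to predict the gold value of each linguistic feature. A minimal sketch of that general recipe follows, using Hugging Face `transformers` with a scikit-learn linear probe; the Italian BERT checkpoint name, the mean pooling, and the toy data are our simplifying assumptions, not the paper’s exact configuration.

```python
import torch
from sklearn.metrics import r2_score
from sklearn.svm import LinearSVR
from transformers import AutoModel, AutoTokenizer

CHECKPOINT = "dbmdz/bert-base-italian-xxl-cased"  # assumed Italian BERT
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT)

def embed(sentence, layer=-1):
    """Mean-pooled hidden states of one sentence from the chosen layer."""
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        states = model(**enc, output_hidden_states=True).hidden_states
    return states[layer][0].mean(dim=0).numpy()

# Toy data: sentences paired with the gold value of one feature
# (e.g., parse_depth); the real setup uses UD treebank splits.
train_sents = ["In Svizzera, alcuni militanti si sono arrampicati sul tetto dell’ambasciata.",
               "Il gatto dorme."]
train_gold = [5.0, 1.0]
test_sents = ["Piove molto a Milano.", "Domani andremo al mare con gli amici."]
test_gold = [2.0, 3.0]

probe = LinearSVR().fit([embed(s) for s in train_sents], train_gold)
print(r2_score(test_gold, probe.predict([embed(s) for s in test_sents])))
```

Running the same probe on the control datasets of Section 3.3 then shows how much of the score survives once the linguistic signal is removed, which is the comparison the following tables summarize for each feature group.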
| Group | Random | Random (Bins) | Random (Lengths) | Swapped | Swapped (Bins) | Swapped (Lengths) |
|---|---|---|---|---|---|---|
| Order | 0.48 | 0.48 | 0.48 | 0.41 | 0.40 | 0.40 |
| POS | 0.40 | 0.31 | 0.25 | 0.12 | 0.12 | 0.12 |
| Subord | 0.43 | 0.41 | 0.41 | 0.38 | 0.35 | 0.35 |
| SyntacticDep | 0.40 | 0.31 | 0.25 | 0.15 | 0.13 | 0.12 |
| TreeStructure | 0.36 | 0.28 | 0.25 | 0.20 | 0.18 | 0.18 |
| VerbInflection | 0.47 | 0.47 | 0.47 | 0.44 | 0.43 | 0.44 |
| VerbPredicate | 0.42 | 0.41 | 0.40 | 0.26 | 0.25 | 0.25 |
| Average | 0.42 | 0.38 | 0.36 | 0.28 | 0.27 | 0.26 |
| Group | Random (Shortest) | Swapped (Shortest) | Random (Longest) | Swapped (Longest) |
|---|---|---|---|---|
| Order | 0.50 | 0.32 | 0.46 | 0.35 |
| POS | 0.43 | 0.13 | 0.38 | 0.13 |
| Subord | 0.49 | 0.20 | 0.39 | 0.28 |
| SyntacticDep | 0.44 | 0.13 | 0.37 | 0.15 |
| TreeStructure | 0.37 | 0.22 | 0.37 | 0.14 |
| VerbInflection | 0.50 | 0.34 | 0.44 | 0.42 |
| VerbPredicate | 0.46 | 0.24 | 0.39 | 0.21 |
| Average | 0.46 | 0.23 | 0.40 | 0.24 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).