Within-Document Arabic Event Coreference: Challenges, Datasets, Approaches and Future Direction
Abstract
1. Introduction
Challenges in Arabic Event Coreference
2. Various Approaches and Data
2.1. Data
Dataset | Docs | Event Mentions | Chains |
---|---|---|---|
ACE 2005 | 599 | 5349 | 4090 |
ECB+ [7] | 982 | 14,884 | 9875 |
TAC KBP [11] | 1075 | 29,471 | 19,257 |
MAVEN-ERE [14] | 4480 | 112,276 | 103,193 |
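As a rough reading aid for the table above, the following sketch derives mentions per document and mentions per chain from the reported counts. The figures are copied from the table; the function name `density` and the choice of ratios are illustrative assumptions rather than statistics reported by the cited datasets.

```python
# Dataset statistics copied from the table above: (documents, event mentions, chains).
DATASETS = {
    "ACE 2005": (599, 5349, 4090),
    "ECB+": (982, 14884, 9875),
    "TAC KBP": (1075, 29471, 19257),
    "MAVEN-ERE": (4480, 112276, 103193),
}


def density(docs, mentions, chains):
    """Return (mentions per document, mentions per chain)."""
    return mentions / docs, mentions / chains


if __name__ == "__main__":
    for name, stats in DATASETS.items():
        per_doc, per_chain = density(*stats)
        print(f"{name:10s} {per_doc:7.2f} mentions/doc {per_chain:5.2f} mentions/chain")
```

A mentions-per-chain ratio close to 1 suggests that most chains in a corpus are singletons, which is worth keeping in mind when comparing scores across datasets.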
Model | Dataset | CoNLL | AVG |
---|---|---|---|
End-to-End within-document [16] | ACE2005 | 64.56 | 62.11 |
Gold triggers within-document [16] | ACE2005 | 86.78 | 86.63 |
EPASE-within-document [17] | ECB+ | 88.3 | - |
EPASE-cross-document [17] | ECB+ | 84.3 | - |
End-to-End within-document [18] | KBP 2016 | 43.55 | 40.61 |
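The CoNLL column in the table above is conventionally the unweighted mean of the MUC, B³ and CEAF-e F1 scores. The sketch below implements B³ for a single document and the CoNLL average. It assumes that gold and system chains cover the same mentions (the gold-trigger setting); the function names and the toy chains are illustrative and do not come from the cited systems.

```python
def b_cubed(gold_chains, system_chains):
    """Return B-cubed (precision, recall, F1) over the mentions found in both partitions."""
    gold_of = {m: frozenset(chain) for chain in gold_chains for m in chain}
    sys_of = {m: frozenset(chain) for chain in system_chains for m in chain}
    mentions = gold_of.keys() & sys_of.keys()
    if not mentions:
        return 0.0, 0.0, 0.0
    # Per-mention overlap between the gold chain and the system chain of that mention.
    precision = sum(len(gold_of[m] & sys_of[m]) / len(sys_of[m]) for m in mentions) / len(mentions)
    recall = sum(len(gold_of[m] & sys_of[m]) / len(gold_of[m]) for m in mentions) / len(mentions)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def conll_score(muc_f1, b3_f1, ceaf_e_f1):
    """Unweighted mean of the three F1 scores."""
    return (muc_f1 + b3_f1 + ceaf_e_f1) / 3


if __name__ == "__main__":
    gold = [{"a", "b", "c"}, {"d"}]      # toy gold chains
    system = [{"a", "b"}, {"c", "d"}]    # toy system chains
    print(b_cubed(gold, system))
```

For reported results, an established scorer should be preferred over a hand-written one, since handling of system mentions that do not match gold mentions differs between implementations.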
2.2. Approaches
3. Proposed Arabic Event Coreference Annotation
3.1. Event Trigger Annotation
3.2. Event Coreference Annotation
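- Event mentions refer to the same real-world event and have the same event type.
  1. قتل ثلاثة وأربعون شخصا في الهجوم على بغداد , توفي ثلاثة وأربعون شخصا في هجوم بغداد (Forty-three people were killed in the attack on Baghdad; forty-three people died in the Baghdad attack.)
  2. الهجوم وقع بعاصمة سوريا .…. قتل في الهجوم على دمشق اربعة مسلحين (The attack took place in the capital of Syria ... four gunmen were killed in the attack on Damascus.)
- Events have the same temporal and location scope, though not necessarily the same temporal expression or the same date.
  1. هجوم في بغداد الخميس ….. قصف في المنطقة الخضراء الاسبوع الماضي (An attack in Baghdad on Thursday ... shelling in the Green Zone last week.)
- Event arguments may be non-coreferential or conflicting.
  1. قتل 18 شخصا ….. عشرات القتلى (18 people were killed ... dozens of dead.)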
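The criteria above can be sketched as a small data structure: annotators record pairwise coreference links, and chains are the connected components of those links, with a check that linked mentions share an event type. The class `EventMention`, the function `build_chains`, the mention identifiers and the `Type.Subtype` labels are hypothetical names chosen for illustration, not part of the proposed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventMention:
    mention_id: str
    trigger: str
    event_type: str  # illustrative "Type.Subtype" label, e.g. "Life.Die"


def build_chains(mentions, links):
    """Group mentions into chains from pairwise annotator links (union-find)."""
    by_id = {m.mention_id: m for m in mentions}
    parent = {m.mention_id: m.mention_id for m in mentions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        # First criterion: coreferent mentions must have the same event type.
        if by_id[a].event_type != by_id[b].event_type:
            raise ValueError(f"{a} and {b} have different event types")
        parent[find(a)] = find(b)

    chains = {}
    for m in mentions:
        chains.setdefault(find(m.mention_id), []).append(m.mention_id)
    return sorted(chains.values())


# Triggers taken from the first example above; the ids are made up.
MENTIONS = [
    EventMention("m1", "قتل", "Life.Die"),
    EventMention("m2", "توفي", "Life.Die"),
    EventMention("m3", "الهجوم", "Conflict.Attack"),
    EventMention("m4", "هجوم", "Conflict.Attack"),
]
LINKS = [("m1", "m2"), ("m3", "m4")]

if __name__ == "__main__":
    print(build_chains(MENTIONS, LINKS))
```

The temporal and location criteria are left to the annotator in this sketch, because they require judgement about scope rather than a string comparison.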
- Morphological Variation:
  - Strategy: Annotators should be trained to identify morphological variations in verbs and to recognize how different forms relate to the same event. The guidelines should provide examples and rules for handling different verb forms within an event chain.
  - Example: Provide annotators with examples of verb conjugations and instruct them to link verbs that share a root and denote the same event, even if their morphological forms differ. For instance, “زار” (visited) and “زرت” (I visited) share the same root and should be linked if they refer to the same event.
- Pronoun Ambiguity:
  - Strategy: The guidelines should give explicit instructions for disambiguating pronoun references. Annotators should consider the context, the candidate antecedents, and gender/number agreement when making annotations.
  - Example: Instruct annotators to look for the closest noun or entity that agrees in gender and number with the ambiguous pronoun. For “ساره رأت محمد وقالت له أنها ستأتي” (Sarah saw Mohammed and told him that she would come), annotators should link “له” (him) to “محمد” since they agree in gender and number.
- Dialectal Variations:
  - Strategy: Annotators should be trained to recognize different dialectal expressions for the same event. The guidelines can include a section on common dialectal variations.
  - Example: If annotators encounter a dialectal phrase that refers to an event, they should link it to the Standard Arabic expression that represents the same event.
- Verb Ellipsis:
  - Strategy: The guidelines should specify how verb ellipsis is handled, emphasizing that omitted verbs should be interpreted in light of the context.
  - Example: For “أحمد أكل التفاحة ومحمد أيضًا” (Ahmed ate the apple, and Mohammad too), annotators should understand that the omitted verb “ate” applies to both Ahmed and Mohammad.

Clear guidelines, training, and regular feedback sessions also help annotators address these linguistic challenges. For ambiguous cases, the schema should include mechanisms for annotator discussion and consensus building. Constant communication between annotators and project supervisors is essential for improving the quality of event coreference annotations for Arabic text.
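The pronoun heuristic described above (choose the closest preceding entity that agrees in gender and number) can be sketched as follows. The `Token` class, its feature values and the hand-assigned features for the example sentence are assumptions made for illustration; a real workflow would take these features from a morphological analyzer or from the annotators themselves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    text: str
    is_entity: bool = False
    gender: Optional[str] = None   # "m" or "f"
    number: Optional[str] = None   # "sg", "du" or "pl"


def closest_agreeing_antecedent(tokens: List[Token], pronoun_index: int) -> Optional[int]:
    """Return the index of the nearest preceding entity that agrees with the pronoun."""
    pronoun = tokens[pronoun_index]
    for i in range(pronoun_index - 1, -1, -1):
        candidate = tokens[i]
        if (candidate.is_entity
                and candidate.gender == pronoun.gender
                and candidate.number == pronoun.number):
            return i
    return None  # no agreeing antecedent: leave the case for annotator discussion


# Example sentence from the guidelines above, with hand-assigned features.
SENTENCE = [
    Token("ساره", is_entity=True, gender="f", number="sg"),
    Token("رأت"),
    Token("محمد", is_entity=True, gender="m", number="sg"),
    Token("وقالت"),
    Token("له", gender="m", number="sg"),    # pronoun "him"
    Token("أنها", gender="f", number="sg"),  # carries the pronoun "she"
    Token("ستأتي"),
]

if __name__ == "__main__":
    print(SENTENCE[closest_agreeing_antecedent(SENTENCE, 4)].text)
```

The heuristic only proposes a candidate. When several entities agree with the pronoun, the closest one is not necessarily correct, so the annotator's reading of the context takes precedence.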
4. Evaluation Metrics
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Doddington, G.R.; Mitchell, A.; Przybocki, M.A.; Ramshaw, L.A.; Strassel, S.M.; Weischedel, R.M. The automatic content extraction (ACE) program: Tasks, data, and evaluation. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), Lisbon, Portugal, 26–28 May 2004; Volume 2, pp. 837–840.
2. Verhagen, M.; Gaizauskas, R.J.; Schilder, F.; Hepple, M.; Katz, G.; Pustejovsky, J. TimeML: Robust Specification of Event and Temporal Expressions in Text. J. Semant. 2007, 24, 37–75.
3. Arabic Events Guidelines Version 5.4.4. Available online: https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/arabic-events-guidelines-v5.4.4.pdf (accessed on 13 June 2023).
4. Cybulska, A.; Vossen, P. Translating Granularity of Event Slots into Features for Event Coreference Resolution. In Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, Denver, CO, USA, 4 June 2015; pp. 1–10.
5. OntoNotes Release 5.0. LDC2013T19. Web Download. Philadelphia: Linguistic Data Consortium. 2013. Available online: https://catalog.ldc.upenn.edu/LDC2013T19 (accessed on 9 May 2023).
6. National Institute of Standards and Technology. TAC Knowledge Base Population (KBP) 2017. Available online: https://tac.nist.gov/2017/KBP/index.html (accessed on 1 May 2023).
7. Cybulska, A.; Vossen, P. Using a sledgehammer to crack a nut? Lexical diversity and event coreference resolution. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, 26–31 May 2014; pp. 4545–4552.
8. Bejan, C.A.; Harabagiu, S. Unsupervised event coreference resolution with rich linguistic features. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), Uppsala, Sweden, 11–16 July 2010.
9. Song, Z.; Bies, A.; Strassel, S.; Riese, T.; Mott, J.; Ellis, J.; Wright, J.; Kulick, S.; Ryant, N.; Ma, X. From light to rich ERE: Annotation of entities, relations, and events. In Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, Denver, CO, USA, 4 June 2015; pp. 89–98.
10. Eirew, A.; Cattan, A.; Dagan, I. WEC: Deriving a large-scale cross-document event coreference dataset from Wikipedia. arXiv 2021, arXiv:2104.05022.
11. Getman, J.; Ellis, J.; Strassel, S.; Song, Z.; Tracey, J. Laying the groundwork for knowledge base population: Nine years of linguistic resources for TAC KBP. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 7–12 May 2018.
12. Hürriyetoğlu, A.; Zavarella, V.; Tanev, H.; Yörük, E.; Safaya, A.; Mutlu, O. Automated extraction of socio-political events from news (AESPEN): Workshop and shared task report. arXiv 2020, arXiv:2005.06070.
13. Hürriyetoğlu, A.; Mutlu, O.; Yörük, E.; Liza, F.F.; Kumar, R.; Ratan, S. Multilingual protest news detection-shared task 1, CASE 2021. In Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text (CASE 2021), Online, 5–6 August 2021; pp. 79–91.
14. Wang, X.; Chen, Y.; Ding, N.; Peng, H.; Wang, Z.; Lin, Y.; Han, X.; Hou, L.; Li, J.; Liu, Z.; et al. MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 926–941.
15. Lu, J.; Ng, V. Event Coreference Resolution: A Survey of Two Decades of Research. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence Survey Track, Stockholm, Sweden, 13–19 July 2018; pp. 5479–5486.
16. Yao, Y.; Li, Z.; Zhao, H. Learning Event-aware Measures for Event Coreference Resolution. In Proceedings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 13542–13556.
17. Zeng, Y.; Jin, X.; Guan, S.; Guo, J.; Cheng, X. Event coreference resolution with their paraphrases and argument-aware embeddings. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain (Online), 8–13 December 2020; pp. 3084–3094.
18. Lai, T.; Ji, H.; Bui, T.; Tran, Q.H.; Dernoncourt, F.; Chang, W. A context-dependent gated module for incorporating symbolic semantics into event coreference resolution. arXiv 2021, arXiv:2104.01697.
19. Nguyen, T.H.; Meyers, A.; Grishman, R. New York University 2016 System for KBP Event Nugget: A Deep Learning Approach; TAC: Tokyo, Japan, 2016.
20. Chen, C.; Ng, V.S. An end-to-end Chinese event coreference resolver. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, 26–31 May 2014; Volume 2, pp. 4532–4538.
21. De Langhe, L.; De Clercq, O.; Hoste, V. Investigating Cross-Document Event Coreference for Dutch. In Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference, Gyeongju, Republic of Korea, 16–17 October 2022; pp. 88–98.
22. Joshi, M.; Levy, O.; Weld, D.S.; Zettlemoyer, L. BERT for coreference resolution: Baselines and analysis. arXiv 2019, arXiv:1908.09091.
23. Joshi, M.; Chen, D.; Liu, Y.; Weld, D.S.; Zettlemoyer, L.; Levy, O. SpanBERT: Improving pre-training by representing and predicting spans. Trans. Assoc. Comput. Linguist. 2020, 8, 64–77.
24. Lu, J.; Ng, V. Learning antecedent structures for event coreference resolution. In Proceedings of the 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, Mexico, 18–21 December 2017; pp. 113–118.
25. Raghunathan, K.; Lee, H.; Rangarajan, S.; Chambers, N.; Surdeanu, M.; Jurafsky, D.; Manning, C.D. A multi-pass sieve for coreference resolution. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Cambridge, MA, USA, 9–11 October 2010; pp. 492–501.
26. Liu, Z.; Araki, J.; Hovy, E.H.; Mitamura, T. Supervised Within-Document Event Coreference using Information Propagation. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, 26–31 May 2014; pp. 4539–4544.
27. Choubey, P.K.; Huang, R. Event coreference resolution by iteratively unfolding inter-dependencies among events. arXiv 2017, arXiv:1707.07344.
28. Chen, C.; Ng, V. Joint inference over a lightly supervised information extraction pipeline: Towards event coreference resolution for resource-scarce languages. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
29. Araki, J.; Mitamura, T. Joint event trigger identification and event coreference resolution with structured perceptron. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 2074–2080.
30. Lu, J.; Ng, V. Conundrums in event coreference resolution: Making sense of the state of the art. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021; pp. 1368–1380.
31. Vilain, M.; Burger, J.D.; Aberdeen, J.; Connolly, D.; Hirschman, L. A model-theoretic coreference scoring scheme. In Proceedings of the Sixth Message Understanding Conference (MUC-6), Columbia, MD, USA, 6–8 November 1995.
32. Bagga, A.; Baldwin, B. Algorithms for scoring coreference chains. In Proceedings of the First International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference, Granada, Spain, 28–30 May 1998; Volume 1, pp. 563–566.
33. Luo, X. On coreference resolution performance metrics. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, BC, Canada, 6–8 October 2005; pp. 25–32.
34. Recasens, M.; Hovy, E. BLANC: Implementing the Rand index for coreference evaluation. Nat. Lang. Eng. 2011, 17, 485–510.
35. Pradhan, S.; Moschitti, A.; Xue, N.; Uryupina, O.; Zhang, Y. CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In Proceedings of the Joint Conference on EMNLP and CoNLL-Shared Task, Jeju, Republic of Korea, 13 July 2012; pp. 1–40.
Arabic | English | Transliteration |
---|---|---|
أصيب جنديان في الهجوم | Two soldiers were injured in the attack | Usyeb jundyan fi alhujum |
Arabic | English | Transliteration |
---|---|---|
توالت الادانات العربية للجريمة البشعة التي اقترفها تنظيم داعش بإقدامه على حرق الطيار الاردني حيا... فأدانت السعودية تلك الجريمة. | Arab condemnations continued for the heinous crime committed by ISIS, as they burned the Jordanian pilot alive. Saudi Arabia condemned that crime | Tawalat al-adanat al-arabiyya lil-jarimah al-bashiyah allati iqtarafaha tanzeem Da’esh bi-iqdamih ‘ala harq al-tayar al-urdoni hayyan... fa-adanat al-Su’udiyyah tilka al-jarimah |
Lemma | English Inflectional Forms | Arabic Inflectional Forms |
---|---|---|
attack— هجم (hujm) | attack, attacks, attacked, attacking | هجم - هاجم - تهاجم - يهاجم - نهاجم - تهاجمونَ - يهاجمونَ – سأهاجم – ستهاجم - سيهاجم - سنهاجم - ستهاجمونَ - سيهاجمونَ - هاجموا- هاجمتُ - هاجمتَ - هاجمتما - أهاجم |
No | Event Type | Event Sub Types | Annotated | Not Annotated | Note |
---|---|---|---|---|---|
1 | Life | Be-Born | Mohamed was born in England on 18 June 1963. محمد ولد في لندن في 13 يونيو 1963 Being born without my hand, I have never experienced any other way. أنا ولدت بدون أيدي, لا أحس أن هنالك أي فرق | The university was born in August 1990. الجامعة ولدت في أغسطس 1990. | The birth of other things or ideas is not encompassed. |
1 | Life | Marry | Amna and Ahmed were married on 9 June 1998. امنة و أحمد تزوجوا عام 1998 Amna and Ahmed are married. (resultative) امنة و أحمد متزوجان | | |
1 | Life | Divorce | A two-year marriage ended in divorce for the couple. انتهى زواج لمدة عامين بالطلاق للزوجين. | | |
1 | Life | Injure | The attack resulted in two soldiers being injured. اصيب جنديان في الهجوم A soldier who has been injured. (resultative) الجندي المصاب | LIFE events | |
1 | Life | Die | Ronald Reagan was the target of an assassination attempt by John Hinckley. جون هينكلي قام بمحاولة اغتيال رونالد ريجان Foreign hostages have been threatened with death by terrorist groups. المجموعات الارهابية هددت بقتل الرهائن الاجانب An automobile accident resulted in her death. توفيت في حادث | | |
2 | Movement | Transport | Fred went to New York on Friday to visit Harry. ذهب فريد إلى نيويورك يوم الجمعة لزيارة هاري. The leaders of Palestine cautioned that Israel should withdraw its troops from the surrounding areas of Palestinian cities. القادة الفلسطينيين حذروا بان علي الاسرائيليين ان يخلوا جنودهم من المدن الفلسطينية | | |
3 | Transaction | Transfer-Ownership | A total of two nuclear submarines have been purchased by China from Russia. اشترت الصين ما مجموعه غواصتين نوويتين من روسيا This report pertains to the submarines that China has recently obtained. (attributive) اقتنت الصين غواصتين حديثا | | |
3 | Transaction | Transfer-Money | There were suspicions that the charity provided funds to an organization. الجمعيات الخيرية متهمة بتمويل منظمة القاعدة | I paid $9 for the movie ticket. دفعت تسعة دولارات ثمنا لتذكرة السينما | TRANSFER-MONEY event. |
4 | Business | Start-Org | Joseph Conrad Parkhurst, the founder of Cycle World motorcycle magazine in 1962, has passed away. .... جوزيف آونارد الذي انشا مجلة السيارات عام 1962 | The event of establishing independence of a geopolitical entity (GPE) or spinning off a subsidiary of an organization (ORG) will not be marked as a START-ORG event in the annotation. | |
4 | Business | Merge-Org | It was announced in September that the long-planned merger with KLM Royal Dutch Airlines was not going to take place. أُعلن في سبتمبر أن الاندماج المخطط له منذ فترة طويلة مع الخطوط الجوية الملكية KLM لن الهولندية | | |
4 | Business | Declare-Bankruptcy | In 1995, Orange County declared bankruptcy. اشهرت شركة كوكي افلاسها عام 1995 | | |
4 | Business | End-Org | FOO Corp folded in 2002. تم طي فو كروب في العام 2002 | | |
5 | Conflict | Attack | The bombing of Fallujah by U.S. forces persisted. القوات الامريكية استمرت في قصف الفالوجا | | |
5 | Conflict | Demonstrate | On Monday, the strike of the union started. وبدأ إضراب النقابة يوم الاثنين. Demonstrators gathered in protest. المعارضون تظاهروا امام البيت الابيض | | |
6 | Contact | Meet | General Motors (GM) is currently in negotiations with Chrysler for the acquisition of Jeep. شركة جي ام تجري مباحثات مع آريزلر لشراء السيارة جيب | | |
6 | Contact | Phone-Write | An e-mail was sent by John to Jane. جون ارسل رسالة اليكترونية الي جين | The event of a very common PERSON talking to reporters or issuing a statement is not taggable. | |
7 | Personnel | Start-Position | In June 1998, Mary Smith became the CEO of Foo Corp. ماري سميث التحقت بالشركة كرئيس مجلس ادارة في يونية عام 1998 | A job creation or other large-scale economic trends in employment will not be annotated in general. | |
7 | Personnel | End-Position | Mary Smith departed from Foo Corp. in July 2000. ماري سميث تركت شركة كوكي في عام 2000م | | |
7 | Personnel | Nominate | We expect the party to nominate him for president. نتوقع أن يرشحه الحزب لمنصب | | |
7 | Personnel | Elect | In 1993, Greg Lashutka won the election and became the mayor of Columbus. جورج انتخب عمده لكولومبيا عام 1993 | | |
8 | Justice | Arrest-Jail | Scott Peterson was taken into custody for the killing of his spouse. تم القبض على سكوت بيترسون بتهمة قتل زوجته. Since May, more than 20 individuals suspected of terrorism have been imprisoned in Russia without trial. منذ مايو الماضي اعتقلت روسيا أكثر من عشرين من الارهابيين المشتبه فيهم بدون اي محاكمة | Only events that can be linked to the legal system of a GPE entity that can be tagged will be annotated as JUSTICE events. | |
8 | Justice | Release-Parole | In accordance with his parole, Fred has been released. وفقًا للإفراج المشروط عنه ، تم إطلاق سراح فريد المسجون | | |
8 | Justice | Trial-Hearing | Jenna Raleigh is facing trial in a military court. جينا رالي ستحاكم في محكمة عسكرية. This week, the trial resumed. المحاكمة المنعقدة هذا الاسبوع | | |
8 | Justice | Charge-Indict | The grand jury indicted Joy Fenter on eleven counts of mail fraud. وجهت هيئة محلفين كبرى اتهامات إلى جوي فينتر في إحدى عشرة تهمة بالاحتيال عبر البريد. | | |
8 | Justice | Sue | She threatened to sue me. هددت بمقاضاتي | | |
8 | Justice | Convict | The court will convict the suspect. المحكمة ستقضي بإدانة المشتبه به بالجريمة | | |
8 | Justice | Sentence | The court sentenced him to 20 years’ hard labor. وحكمت عليه المحكمة بالأشغال الشاقة 20 سنة | | |
8 | Justice | Fine | She was acquitted on all counts. تمت تبرئتها من جميع التهم الموجهة إليه | | |
8 | Justice | Execute | The execution of David Goran by lethal injection took place in March 1987. تم إعدام ديفيد جوران بالحقنة المميتة في مارس 1987. | | |
8 | Justice | Extradite | The ex-leader was sent to Burkina Faso after extradition. تم إرسال الزعيم السابق إلى بوركينا فاسو بعد ترحيله | | |
8 | Justice | Acquit | Chase was acquitted after a trial in the Senate. تمت تبرئة تشيس بعد محاكمة في مجلس الشيوخ. | | |
8 | Justice | Appeal | Ahmed submitted an appeal against the court ruling. قدم أحمد طلب إستئناف الحكم الصادر من المحكمة | | |
8 | Justice | Pardon | The prisoner was granted a pardon after serving the sentence. تم منح العفو للمسجون بعد قضاء فترة العقوبة. | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Aldawsari, M.; Kolhar, M.; Dawood Omer, O.S. Within-Document Arabic Event Coreference: Challenges, Datasets, Approaches and Future Direction. Appl. Sci. 2023, 13, 11004. https://doi.org/10.3390/app131911004