Within-Document Arabic Event Coreference: Challenges, Datasets, Approaches and Future Direction

Event coreference resolution is a crucial component of Natural Language Processing (NLP) applications, as it directly affects text summarization, machine translation, classification, and textual entailment. However, research on this task for Arabic is limited compared with other languages such as English, Chinese, and Spanish. This paper reviews state-of-the-art approaches to event coreference (EC) within the broader context of coreference resolution tasks, emphasizing the significance of EC in NLP. The focus is placed on the latest developments in Arabic language processing related to event coreference. To fill this gap, a comprehensive study of existing work is conducted, and new approaches are suggested. The paper highlights the challenges specific to Arabic event coreference resolution, such as the variability of verb forms, pronoun ambiguity, ellipsis and null arguments, lexical and morphological variation, lack of annotated resources, discourse and pragmatic context, and cultural and contextual sensitivity. Addressing these challenges requires a deep understanding of Arabic linguistics, advanced NLP techniques, and the availability of annotated resources. Furthermore, this paper examines the existing datasets and methods for Arabic event coreference and proposes an annotation scheme. By leveraging existing NLP algorithms and developing event coreference resolution systems tailored for Arabic, the accuracy and performance of NLP tasks can be significantly improved.


Introduction
In the context of the ACE [1] and TimeML [2] schemas, an event is a significant occurrence or activity that happens at a specific point in time. However, the definitions of an event in these two schemas are slightly different. In the ACE2005 schema, an event is a specific occurrence involving participants that takes place at a particular time, such as a meeting, an attack, or an election. An event in ACE2005 is characterized by attributes such as its type, subtype, and participants. In contrast, the TimeML schema defines an event as something that happens or occurs at a single point or over an interval of time. TimeML events can include specific occurrences, such as an earthquake or a meeting, as well as more abstract events, such as the start or end of a period of time. Overall, both ACE2005 and TimeML define an event as a significant occurrence that takes place at a specific time, but they differ in how they represent and categorize events: ACE2005 focuses on the type and attributes of events, whereas TimeML is concerned with the temporal relationships between events and other temporal expressions in a text. In Table 1, bolded words are annotated as events in both corpora.

Arabic | English | Transliteration
أصيب جنديان في الهجوم | Two soldiers were injured in the attack | Usyeb jundyan fi alhujum

Event coreference resolution is a task in Natural Language Processing (NLP) that involves identifying the expressions (event mentions, which may be single words or multi-word phrases) in a text that refer to the same real-world event. In other words, event coreference aims to find all the different ways an event is mentioned in a text and link them together, so that NLP systems can better understand the relationships between events and the entities that participate in them. In Table 2, all bolded words are coreferring event mentions. Event coreference is important for NLP because it affects the quality of tasks such as summarization, translation, and text understanding. For instance, to summarize a text, a system needs to know which events are the most important and how they relate to each other. Event coreference is also difficult because it requires modeling how humans conceptualize and communicate events, not only the words used to express them. Humans can process text without consciously tracking which event mention refers to which, because they have a rich mental representation of events based on their perception, memory, and knowledge. They can use cues such as tense, aspect, modality, and discourse markers to infer the relations between events. However, these cues are not always explicit or consistent in natural language, and they may vary across languages and genres. This paper concentrates on event coreference for Arabic, which poses more challenges and has received less research attention than event coreference in other languages. Arabic is a morphologically rich and syntactically complex language with many varieties and dialects. It also has distinctive writing systems and conventions that affect how events are expressed in texts. This paper surveys the existing work on Arabic event coreference, its challenges, and the available datasets. Furthermore, the paper proposes a schema for annotating Arabic event coreference based on the ACE (Automatic Content Extraction) Arabic Annotation Guidelines for Events [3].

Challenges in Arabic Event Coreference
The Arabic language presents several challenges for event coreference resolution due to its unique linguistic characteristics. These challenges include the following:
Lexical and Morphological Variation: As shown in Table 3, Arabic has a complex system of inflection and morphological variation that can produce many surface forms for a given event. As a result of this morphological richness, it may be difficult to identify and link event mentions expressed in different inflected forms. Consider two sentences: "أحمد زار المكتبة" (Ahmed visited the library) and "زرت المكتبة" (I visited the library). In the second sentence, the verb form "زرت" differs from "زار" because the speaker is first-person singular, which makes it challenging to link this mention to the visiting event in the first sentence.
Dropped Pronouns: Arabic is a pro-drop language: subject pronouns are frequently omitted because the verb's inflection already encodes the subject. This makes it more difficult to determine the agents of events, which are crucial for resolving coreference [4]. For instance, in the sentence "ذهبت إلى المدرسة" ((I) went to school), the subject pronoun "أنا" (I) is dropped, which makes the agent of the event harder to determine; moreover, without diacritics, "ذهبت" can also be read as "you went" or "she went".
Pronoun Ambiguity: Because Arabic pronouns take multiple forms, including attached forms, and are marked for gender and number, they can be highly ambiguous. For example, in the sentence "قالت لها أنها تحبه" (She told her that she loves him), the two occurrences of the attached pronoun "ها" (in "لها" and "أنها") may refer to different entities. Pronoun referents must be disambiguated from context in order to resolve coreferring events accurately.
Event Ellipsis: Event ellipsis refers to situations where an event is implied without being explicitly mentioned. Arabic uses event ellipsis extensively. In the sentence "أكل الولد تفاحة والبنت برتقالة" (The boy ate an apple and the girl [ate] an orange), the eating event is expressed twice, but the second time the verb is omitted and only implied. Recognizing event ellipsis preserves the coherence of the discourse and helps resolve coreferring events accurately.
Cultural and Contextual Sensitivity: The interpretation and resolution of Arabic coreferences may involve a consideration of cultural and contextual factors specific to the Arabic language and its diverse dialects.
In order to address these challenges, a deep understanding of Arabic linguistics, advanced natural language processing techniques, and the availability of annotated resources specific to Arabic event coreference resolution are required.

Lemma | English Inflectional Forms | Arabic Inflectional Forms
attack-هجم (hujm) | attack, attacks, attacked, attacking |

When transferring NLP models from English to Arabic or any other language, it is imperative to understand the linguistic and domain-specific characteristics of both languages and datasets. The key to achieving good performance in cross-lingual NLP tasks is careful analysis, adaptation, and evaluation. In addition, Arabic has several distinctive features, including its root-based morphology, its right-to-left script, and its complex verb conjugation. These linguistic differences could challenge models that are trained on Indo-European languages such as English.
A key difference between within-document and cross-document event coreference resolution lies in the scope and objective of each task. Within-document resolution emphasizes coherence within a single text by connecting mentions of events, whereas cross-document resolution connects and consolidates information about events across multiple texts to improve understanding of events that occur in various contexts. Both tasks have their own challenges and applications in event-centered analysis and information retrieval. Resolving cross-document event coreference is more challenging than resolving within-document coreference because it often involves reasoning about the same event in different contexts, handling variation in event descriptions, and resolving the ambiguity introduced by different descriptions of the same event.

Data
EC resolution faces several major challenges even in the English language domain. The available corpora use a variety of annotation schemas, typologies, and conceptual definitions of what an event is. Therefore, event coreference methods cannot easily be compared on datasets for which they were not designed. The OntoNotes corpus [5] has long been a benchmark for entity and event coreference resolution systems in English. This large-scale, multi-domain text collection contains annotations at both the entity and event levels. However, OntoNotes does not distinguish between entity and event labels: both are simply referred to as mentions. Although event coreference and entity coreference share many aspects, researchers currently face very different challenges in the two tasks. In addition, a verbal event can be annotated as a single-word mention in OntoNotes only if it corefers with a noun phrase. As a result, the event annotations are less consistent, which is a concern because coreference resolution algorithms are intended for practical, real-world applications.
Table 4 shows the available English corpora annotated with event coreference relations, and Table 5 presents the state-of-the-art systems for these corpora. The ACE 2005 corpus [1] is used for evaluation on ACE data. Within the ACE 2005 corpus, occurrences of a predefined set of event types are annotated in English and Chinese, together with their coreference. Although ACE's event schema is relatively limited, its general approach has evolved over time. For example, the ACE methodology is incorporated into the TAC-KBP corpora [6], which combine ACE's typology with more complex and informative annotation styles. This dataset includes documents in English, Arabic, Chinese, and Spanish, with coreferential links within each document. All of these datasets annotate coreferential links only within documents, which makes cross-document event coreference resolution even more challenging. The monolingual English ECB+ corpus [7], which extends the more limited ECB corpus [8], is the standard for cross-document research. This corpus contains a substantial number of newspaper documents with multi-word event spans annotated according to the Rich ERE guidelines [9]. Using a semi-automatic data-collection method, the WEC-Eng project developed the WEC-Eng dataset [10], the second cross-document English corpus considered here. Because of this method, event mentions and references are not limited to predefined categories. English, Dutch, Spanish, and Italian annotations are provided at the cross-document level. In terms of coverage, the dataset is largely unrestricted. The authors of [10] proposed an efficient method for acquiring a large-scale dataset for cross-document event coreference by leveraging event coreference information in Wikipedia.
Several datasets are available in other languages, including Chinese [1,11], Greek [12], and Spanish [1,13]. To the best of our knowledge, there are currently no corpora specifically annotated for Arabic event coreference. Therefore, no system yet exists to detect Arabic event coreference.

Approaches
As listed in Table 5, the model in [16] learns and integrates multiple representations from both event-alone and event-pair data. To create more discriminative representations of events, the authors introduced multiple linguistically motivated event-alone features. To capture the distinctions between event pairs, they considered multiple similarity measures. They demonstrated the effectiveness of the proposed model by achieving state-of-the-art results on the ACE2005 benchmark. For a thorough comparison with the CDGM model, the model is also evaluated with ground-truth triggers and predicted event-alone features.
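The combination of event-alone representations with several pairwise similarity measures can be illustrated with a minimal sketch. The measures below (concatenation, element-wise product, absolute difference, cosine similarity) are common choices assumed for illustration; they are not necessarily the features used in [16], and the vectors `e1` and `e2` stand in for hypothetical event-alone representations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_features(e1, e2):
    """Build a pair representation from two event-alone vectors.

    e1, e2: hypothetical event-alone representations (e.g., a trigger
    embedding concatenated with linguistic features).
    """
    product = [a * b for a, b in zip(e1, e2)]        # element-wise agreement
    abs_diff = [abs(a - b) for a, b in zip(e1, e2)]  # element-wise distance
    # Keep the individual events and several similarity views side by side.
    return e1 + e2 + product + abs_diff + [cosine(e1, e2)]
```

The resulting vector would be fed to a pairwise classifier; keeping the raw event-alone vectors next to the similarity terms lets the classifier weigh properties of each event as well as their agreement.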
A new model for event coreference resolution, called EPASE, was proposed in [17]. By identifying deep paraphrase relationships within the event-specific context of sentences, it can cover event paraphrases in a broader range of situations, which improves generalization. In addition, argument roles are embedded into the event embedding without relying on a fixed number or type of arguments, which makes EPASE more scalable. The method consistently and significantly outperforms existing methods on both within- and cross-document coreference.
The resolution of event coreference is an important research problem with many applications. Although pre-trained language models have achieved remarkable success in recent years, the authors of [18] argue that symbolic features are still highly beneficial. However, symbolic features are usually produced automatically by upstream components of the information extraction pipeline, so they are subject to noise and errors. Furthermore, some features may be more informative than others depending on the context. In response to these observations, they proposed a novel context-dependent gated module for adaptively controlling the information flow from the input symbolic features. Combined with a simple noisy training method, their proposed models achieve state-of-the-art results on two datasets: ACE 2005 and KBP 2016.
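The idea of letting the context control how much symbolic information flows into a mention representation can be sketched with a simple gate. This is a simplified element-wise sigmoid gate written for illustration, assuming the symbolic features (e.g., event type or realis) are already embedded as vectors; the module in [18] differs in its details, and `weights` and `bias` are hypothetical learned parameters.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_symbolic(context, symbolic, weights, bias):
    """Let the context decide how much of each symbolic feature to pass on.

    context:  contextual vector of an event mention (e.g., from a transformer)
    symbolic: embedded symbolic features (e.g., event type, realis), possibly noisy
    weights:  hypothetical learned matrix, one row per symbolic dimension,
              each row as long as context + symbolic
    bias:     hypothetical learned bias, one value per symbolic dimension
    Returns the context vector concatenated with the gated symbolic features.
    """
    joint = context + symbolic
    gated = []
    for i, feature in enumerate(symbolic):
        gate = sigmoid(sum(w * x for w, x in zip(weights[i], joint)) + bias[i])
        gated.append(gate * feature)  # a gate near 0 suppresses an unreliable feature
    return context + gated
```

Because the gate is computed from both the context and the feature itself, the same symbolic value can be trusted in one sentence and damped in another, which is the adaptive behavior described above.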
According to [15], event coreference resolution typically follows the same paradigms as entity coreference resolution. Note, however, that the methods discussed below have been developed for English and Chinese only. Coreference resolution is currently characterized by three predominant paradigms. The most common is the mention-pair approach, which transforms the task of forming clusters of coreferential mentions into a binary classification problem. In this approach, pairs of event mentions are generated and classified as coreferent or not by a binary classifier. A clustering algorithm is then used to reconstruct the event coreference chains from the binary output. Mention-pair systems have evolved in tandem with developments in the machine learning methods commonly used in NLP.
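The mention-pair procedure described above (pairwise binary classification followed by clustering) can be sketched as follows. The classifier is a hypothetical stand-in for a trained model, and transitive closure via union-find is assumed as the clustering step; other clustering strategies, such as best-first linking, are also used in practice.

```python
from itertools import combinations


def cluster_mention_pairs(mentions, is_coreferent):
    """Build coreference chains from pairwise binary decisions.

    mentions:      list of distinct, hashable event mentions
    is_coreferent: hypothetical binary classifier f(m1, m2) -> bool
    Chains are the transitive closure of all positive pairs (union-find).
    """
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps trees shallow
            m = parent[m]
        return m

    # Classify every pair and merge the clusters of positive pairs.
    for m1, m2 in combinations(mentions, 2):
        if is_coreferent(m1, m2):
            parent[find(m1)] = find(m2)

    # Collect mentions by their cluster representative.
    chains = {}
    for m in mentions:
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())
```

Because transitive closure merges any chain of positive decisions, a single false positive can join two otherwise separate chains, which illustrates why purely pairwise decisions do not guarantee globally consistent chains.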
Before deep neural networks [19], feature-based classifiers such as support vector machines [20] and decision trees [4] were used. Among these approaches, surface lexical similarity, captured by string-comparison features, has been shown to be the most powerful indicator of coreference. Features that model the document's structure were also successful in resolving coreference in context [15,21]. However, feature-based methods for coreference resolution now face competition from transformer-based approaches, which use large language models to produce powerful contextual representations of mentions as the foundation of their classifiers [22].
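String-comparison features of the kind mentioned above can be sketched for an Arabic mention pair. The mention fields `text` and `root` are hypothetical; in practice the root or lemma would come from an Arabic morphological analyzer, and real feature sets are considerably richer.

```python
def string_features(m1, m2):
    """Surface lexical-similarity features for a pair of event mentions.

    m1, m2: dicts with hypothetical keys "text" (surface string) and
    "root" (root or lemma supplied by a morphological analyzer).
    """
    t1, t2 = set(m1["text"].split()), set(m2["text"].split())
    union = t1 | t2
    return {
        "exact_match": m1["text"] == m2["text"],           # identical surface form
        "root_match": m1["root"] == m2["root"],             # same root despite inflection
        "token_jaccard": len(t1 & t2) / len(union) if union else 0.0,  # token overlap
    }
```

For "زار" (he visited) and "زرت" (I visited), the surface features are both negative while the root feature is positive, which is exactly the case where plain string matching misses an Arabic coreference cue.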
Span-based transformer approaches [23] have achieved state-of-the-art performance in English. Because these pre-trained language models are optimized to encode longer word sequences, they produce more robust contextual representations of events, including multi-word events. Although mention-pair models generally perform well in coreference resolution tasks, one of their main limitations is that they cannot account for event coreference chains involving more than two events: rather than considering the entire discourse, the algorithm reduces to making pairwise decisions. The second paradigm, mention ranking, addresses some of the limitations of the mention-pair approach. It ranks the possible antecedents of a mention based on the feature representations of the mention and its candidate antecedents.
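The mention-ranking idea can be sketched as choosing, for each mention, the highest-scoring preceding mention. The scoring function is a hypothetical stand-in for a trained model, and a fixed threshold is assumed to play the role of a dummy "no antecedent" candidate.

```python
def rank_antecedents(mentions, score, threshold=0.0):
    """Mention ranking: choose the best-scoring antecedent for each mention.

    mentions:  event mentions in document order
    score:     hypothetical scorer score(candidate_antecedent, mention) -> float
    threshold: score of an implicit dummy antecedent; if no real candidate
               beats it, the mention starts a new coreference chain
    Returns, for each mention, the index of its antecedent or None.
    """
    links = []
    for j, mention in enumerate(mentions):
        best_index, best_score = None, threshold
        for i in range(j):  # only mentions that precede the current one
            s = score(mentions[i], mention)
            if s > best_score:
                best_index, best_score = i, s
        links.append(best_index)
    return links
```

Because candidate antecedents compete with each other (and with the dummy option), each mention receives at most one link, and chains are obtained by following these links backward.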
The algorithm calculates the probability of all coreferential relationships [24] based on a partitioning into coreference chains. A third method of resolving event coreference is the easy-first approach. It applies to events the rule-based multi-pass sieve algorithms that have proved successful in entity coreference research [25]. In such a system, relatively "easy" mentions are resolved first by a series of classification rules, or sieves, arranged in decreasing order of precision. Although these rule-based systems rely primarily on pairwise comparisons, they can incorporate global information about coreference clusters, which addresses the limitations of mention-pair approaches, albeit to a minor extent. In addition, within-chain event argument propagation [26] and agglomerative clustering [27] can further improve the performance of easy-first methods. The methods and algorithms discussed thus far resolve event coreference using gold-standard event mentions. End-to-end systems, however, must first extract mentions from raw text. End-to-end coreference resolution can be performed with either a pipeline or a joint approach.
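A minimal sketch of the easy-first, multi-pass sieve idea follows. The rules are supplied by the caller and are hypothetical; real sieve systems use many more, carefully ordered rules. Each rule compares whole clusters, which is how cluster-level information built by earlier, more precise passes becomes available to later passes.

```python
def sieve_resolve(mentions, sieves):
    """Easy-first multi-pass sieve over event mentions.

    sieves: rule functions rule(cluster_a, cluster_b) -> bool, ordered from
            highest to lowest precision (hypothetical rules).
    """
    clusters = [[m] for m in mentions]  # start with singleton clusters
    for rule in sieves:
        merged = True
        while merged:  # repeat this pass until the rule merges nothing more
            merged = False
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    if rule(clusters[a], clusters[b]):
                        clusters[a].extend(clusters.pop(b))
                        merged = True
                        break
                if merged:
                    break
    return clusters
```

For example, with mentions given as (surface, lemma) tuples, a first sieve that merges clusters sharing an identical surface form followed by a second sieve that merges clusters sharing a lemma would link "attack", "attacks", and "attack" while leaving "injured" on its own.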
In a pipeline configuration, event mentions are first detected and then resolved, and any self-contained detection method may be coupled with any of the resolution methods above. Although such systems are relatively easy to implement and highly customizable, they are prone to error propagation, since errors in one component pass uncorrected to the next. Alternatively, joint event coreference resolution models event detection and coreference resolution simultaneously. Joint inference can be performed with integer linear programming or Markov logic networks [28][29], and each component can be enhanced by incorporating background knowledge. In a full joint-learning approach, the two tasks are combined into a single structured prediction task, and segment-based decoding can be used to produce the joint output. Joint methods, particularly joint inference methods, have been shown to perform best in this field [30], especially when combined with high-performance entity coreference resolution systems [15] and transformer-based architectures [23].
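The error-propagation problem of pipeline systems can be illustrated with a deliberately naive two-stage sketch. Both stages are hypothetical toys: detection is a lookup in an assumed trigger lexicon, and coreference simply groups detected triggers that share an event label.

```python
def detect_triggers(tokens, trigger_lexicon):
    """Stage 1 (detection): indices of tokens found in a trigger lexicon."""
    return [i for i, tok in enumerate(tokens) if tok in trigger_lexicon]


def resolve(tokens, trigger_ids, trigger_lexicon):
    """Stage 2 (coreference): group detected triggers by their event label."""
    chains = {}
    for i in trigger_ids:
        chains.setdefault(trigger_lexicon[tokens[i]], []).append(i)
    return list(chains.values())


def pipeline(tokens, trigger_lexicon):
    # Stage 2 only sees what stage 1 returned, so a missed trigger can
    # never be added to a chain: this is error propagation.
    ids = detect_triggers(tokens, trigger_lexicon)
    return resolve(tokens, ids, trigger_lexicon)
```

With the lexicon `{"attack": "Conflict.Attack", "assault": "Conflict.Attack", "killed": "Life.Die"}` and the tokens `["attack", "killed", "assault"]`, the pipeline returns `[[0, 2], [1]]`; removing `"assault"` from the lexicon leaves that mention undetected, and the result becomes `[[0], [1]]` even though the resolver itself is unchanged.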

Event Trigger Annotation
For annotating event triggers, we plan to follow the ACE2005 schema for Arabic event annotation. That is, when tagging event triggers, annotators must adhere to the ACE2005 Arabic event guidelines [3]. Table 6 shows examples of the main event types and subtypes extracted from the ACE2005 Arabic event guidelines.

Event Coreference Annotation
For annotating Arabic event coreference, annotators will follow the less strict Rich ERE schema [9]. That is, event mentions that refer to the same event occurrence will be grouped into Event Hoppers. An Event Hopper is a more inclusive, less strict notion of event coreference than the strict event coreference of ACE2005 and Light ERE. Event hoppers contain mentions of events that "feel" coreferential to the annotator even if they do not meet the strict event identity requirement of ACE2005. More specifically, event mentions with the following features go into the same hopper (annotated mentions are shown in bold in the examples):

• When event mentions refer to the same real-world event and have the same event type.

Pronoun Ambiguity:
o Strategy: Instruct annotators to look for the closest noun or entity that agrees in gender and number with the ambiguous pronoun.
o Example: For "ساره رأت محمد وقالت له أنها ستأتي" (Sarah saw Mohammed and told him that she will come), annotators should link "له" (him) to "محمد" (Mohammed), since they agree in gender and number.

Dialectal Variations:
o Strategy: Annotators should be trained to recognize different expressions of the same event so that they are aware of dialectal variation. A section on common dialectal variations can be included in the guidelines.
o Example: If annotators encounter a dialectal phrase that refers to an event, they should be instructed to link it to the standard Arabic expression that represents the same event.

Verb Ellipsis:
o Strategy: Guidelines should specify how verb ellipsis is handled, emphasizing that omitted verbs should be interpreted in light of the context.
o Example: For "أحمد أكل التفاحة ومحمد أيضًا" (Ahmed ate the apple, and Mohammad too), annotators should understand that the omitted verb "ate" applies to both Ahmed and Mohammad.
Providing annotators with clear guidelines, training, and regular feedback sessions can also assist in addressing linguistic challenges effectively. When faced with ambiguous cases, the schema should include mechanisms for annotator discussion and consensus building. In order to improve the quality of event coreference annotations for Arabic text, constant communication between annotators and project supervisors is essential.
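The hopper criterion above (same real-world event, same event type) suggests a simple consistency check that could run over completed annotations. The record layout below (`EventMention`, `EventHopper`, a character offset `start`) is a hypothetical format assumed for illustration, not part of the Rich ERE or ACE2005 specifications; whether two mentions describe the same real-world event still has to be judged by annotators.

```python
from dataclasses import dataclass, field


@dataclass
class EventMention:
    trigger: str     # annotated trigger text
    start: int       # character offset of the trigger (hypothetical field)
    event_type: str  # ACE2005 type, e.g. "Conflict"
    subtype: str     # ACE2005 subtype, e.g. "Attack"


@dataclass
class EventHopper:
    hopper_id: str
    mentions: list = field(default_factory=list)

    def type_conflicts(self):
        """Return mentions whose type/subtype differs from the first mention.

        Type and subtype are treated together as the event type; any
        conflict flags the hopper for discussion among annotators.
        """
        if not self.mentions:
            return []
        first = (self.mentions[0].event_type, self.mentions[0].subtype)
        return [m for m in self.mentions[1:] if (m.event_type, m.subtype) != first]
```

For example, a hopper containing "الهجوم" (the attack) and "الاعتداء" (the assault), both Conflict.Attack, passes the check, whereas adding "أصيب" (was injured, Life.Injure) to the same hopper is flagged for review.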

Evaluation Metrics
Following standard practice in event coreference evaluation, Arabic event coreference systems can be evaluated with the most common event coreference metrics, such as MUC [31], B3 (B-Cubed) [32], CEAF [33], and BLANC [34], all of which report results in terms of recall (R), precision (P), and F-score (F). Additionally, the CoNLL score [35], which is the unweighted average of the MUC, B3, and CEAF F-scores, can be used for Arabic event coreference evaluation.
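A minimal sketch of how MUC, B3, CEAFe, and the CoNLL average can be computed follows, assuming that key and response are lists of sets over the same gold mentions. BLANC is omitted for brevity, and the CEAFe alignment is brute-forced for readability, which is only feasible for small inputs; reference scorers solve the alignment with the Kuhn-Munkres algorithm.

```python
from itertools import permutations


def _f1(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0


def muc(key, response):
    """Link-based MUC. key/response: lists of sets of mention ids."""
    def score(gold, pred):
        num = den = 0
        for g in gold:
            parts = set()
            for m in g:
                owner = next((i for i, c in enumerate(pred) if m in c), None)
                # a mention absent from pred forms its own partition
                parts.add(owner if owner is not None else ("unlinked", m))
            num += len(g) - len(parts)
            den += len(g) - 1
        return num / den if den else 0.0
    r, p = score(key, response), score(response, key)
    return p, r, _f1(p, r)


def b_cubed(key, response):
    """Mention-based B3: overlap of each mention's key and response clusters."""
    def score(gold, pred):
        total, count = 0.0, 0
        for g in gold:
            for m in g:
                c = next((c for c in pred if m in c), {m})
                total += len(g & c) / len(g)
                count += 1
        return total / count if count else 0.0
    r, p = score(key, response), score(response, key)
    return p, r, _f1(p, r)


def ceaf_e(key, response):
    """Entity-based CEAF (phi4 similarity) with brute-force one-to-one alignment."""
    def phi4(k, r):
        return 2 * len(k & r) / (len(k) + len(r))
    small, large = (key, response) if len(key) <= len(response) else (response, key)
    best = 0.0
    for perm in permutations(range(len(large)), len(small)):
        best = max(best, sum(phi4(small[i], large[j]) for i, j in enumerate(perm)))
    p = best / len(response) if response else 0.0
    r = best / len(key) if key else 0.0
    return p, r, _f1(p, r)


def conll(key, response):
    """CoNLL score: unweighted mean of the MUC, B3 and CEAFe F-scores."""
    return (muc(key, response)[2] + b_cubed(key, response)[2]
            + ceaf_e(key, response)[2]) / 3
```

For example, with key `[{1, 2, 3}]` and response `[{1, 2}, {3}]`, MUC gives recall 0.5 and precision 1.0: one of the two key links is missing and no spurious link is proposed.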

Conclusions
The successful resolution of event coreference in Arabic holds the potential to significantly benefit a range of applications, including information extraction, sentiment analysis, document summarization, and machine translation. Nevertheless, this task encounters substantial challenges within the Arabic linguistic landscape.
The intricate lexical and morphological variations in Arabic, coupled with the frequent omission of subject pronouns, contribute to the complexity of event coreference resolution. Moreover, the extensive use of ambiguous pronouns and event ellipsis further amplifies this complexity. Additionally, the scarcity of annotated corpora tailored to Arabic event coreference is a major hindrance to the development of specialized systems.
Arabic event coreference remains underexplored compared with languages that have more established resources. The lack of specialized systems and comprehensive datasets highlights the need for concerted efforts to construct suitable corpora. To address this gap, we have put forth a schema for annotating Arabic event coreference. This schema is designed to capture the nuanced relationships between events and thereby provide crucial support for the development of advanced coreference resolution systems.
In future work, we will address event coreference resolution for events whose participants are entities such as individuals and organizations. Integrating entity coreference resolution with event coreference can produce more comprehensive and coherent knowledge graphs or databases. By leveraging shared context, joint entity and event coreference resolution can also improve accuracy.

Table 3.
General overview of the inflectional forms for the word "attack-هجم (hujm)" in Arabic and English.

Table 5.
English state-of-the-art systems in event coreference. The AVG is the average F-score of four metrics: MUC, B3, CEAFe, and BLANC. The CoNLL score is the average of the first three metrics [15].

Table 6.
Event types and subtypes extracted from the ACE2005 event annotation guidelines.
In order to develop an effective annotation schema for Arabic event coreference resolution, it is necessary to consider the specific linguistic challenges associated with Arabic. In addition to the high-level descriptions provided previously, the following outlines a more detailed strategy for addressing these challenges within the annotation schema.
Lexical and Morphological Variation:
o Strategy: Provide annotators with examples of verb conjugations and instruct them to connect verbs with the same root and semantic event, even if they have different morphological forms.
o Example: "زار" (visited) and "زرت" (I visited) share the same root and should be linked if they refer to the same event.
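The root-based linking strategy above could be supported by a pre-annotation helper that proposes candidate links for annotators to confirm. The root lexicon below is a hypothetical hand-written stand-in for the output of an Arabic morphological analyzer, and the helper only suggests pairs; the annotator still decides whether the mentions refer to the same event.

```python
# Hypothetical mini-lexicon mapping inflected verb forms to their root.
# A real system would obtain roots from an Arabic morphological analyzer.
ROOTS = {
    "زار": "زور",    # he visited
    "زرت": "زور",    # I visited
    "زاروا": "زور",  # they visited
    "أكل": "أكل",    # he ate
}


def candidate_links(mentions):
    """Propose index pairs of verb mentions that share a root.

    A shared root is a cue, not proof: annotators must still confirm
    that both mentions refer to the same event.
    """
    pairs = []
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            ri, rj = ROOTS.get(mentions[i]), ROOTS.get(mentions[j])
            if ri is not None and ri == rj:
                pairs.append((i, j))
    return pairs
```

For the mentions "زار", "أكل", and "زرت", the helper proposes only the pair of visiting verbs, leaving forms that are missing from the lexicon unlinked rather than guessing.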