AR-Sanad 280K: A Novel 280K Artificial Sanads Dataset for Hadith Narrator Disambiguation

Determining hadith authenticity is vitally important in the Islamic religion because hadiths record the sayings and actions of Prophet Muhammad (PBUH), and they are the second source of Islamic teachings following the Quran. When authenticating a hadith, the reliability of the hadith narrators is a major factor that hadith scholars consider. However, many narrators share similar names, and the narrators' full names are not usually included in the narration chains of hadiths. Thus, ambiguous narrators first need to be identified. Then, their reliability level can be determined. There are no available datasets that could help address this problem of identifying narrators. Here, we present a new dataset that contains narration chains (sanads) with identified narrators. The AR-Sanad 280K dataset has around 280K artificial sanads and could be used to identify 18,298 narrators. After creating the AR-Sanad 280K dataset, we address narrator disambiguation in several experimental setups. Hadith narrator disambiguation is modeled as a multiclass classification problem with 18,298 class labels. We test different representations and models in our experiments. The best results were achieved by fine-tuning a BERT-based deep learning model (AraBERT). We obtained a 92.9 Micro F1 score and a 30.2 sanad error rate (SER) on the validation set of our artificial sanads AR-Sanad 280K dataset. Furthermore, we extracted a real test set from the sanads of the famous six books in Islamic hadith. We evaluated the best model on the real test data, and we achieved an 83.5 Micro F1 score and a 60.6 sanad error rate.


Introduction
In the Arabic language, the word "hadith" means speech or talk. In Islamic terminology, it refers to the words or deeds of Prophet Muhammad (PBUH) or his tacit approval or criticism of the actions of other people. The matn and the sanad are the primary components of all hadiths. The term sanad refers to the narration chain, a chronological list of the narrators who relayed the hadith from the era of Prophet Muhammad (PBUH) to the present day. The term matn refers to the main body of the hadith, the statement or report that needs to be delivered. Figure 1 shows an example from "Musnad Imam Ahmad", one of the nine most famous hadith books. Narrator Abi Hurairah is a companion of Prophet Muhammad who heard the statement directly from the Prophet. The first narrator, Abdul Samad, is a more recent narrator who died in 206. To illustrate how the hadith is transmitted through time by the narrators, we added the death date of each narrator in Hijri (the Islamic lunar calendar, which begins from Prophet Muhammad's migration to Madinah in 622 CE). Hadiths are of great importance in Islam. They are a major Islamic legislative source, second only to the Quran. Muslims rely on authentic hadiths to know what is allowed and what is prohibited. Hadiths give a detailed explanation of how to practically apply Quranic principles since "Prophet Muhammad is seen as the 'living Quran,' the embodiment of God's will in his behavior and words" [1].
Seeing that hadiths hold great influence, there was an urgent need to make sure that hadiths are not fabricated, to prevent people from being misguided by statements that are falsely attributed to Prophet Muhammad. To that end, the science of hadith emerged. Hadith science is a science that tackles the evaluation of hadith authenticity.
Hadith authentication relies heavily on the sanad and the reliability of the narrators. Hadith scholars believe that if the chain of narrators of a hadith fulfills five criteria, the hadith is to be accepted as authentic: continuity in the chain of narrators; integrity of character; infallible retention; freedom from any hidden defect; and safety from any aberrance [2,3]. The last two criteria apply to the matn, which is rarely examined by hadith scholars.
One of the challenges facing hadith scholars when dealing with the sanad is disambiguating the name of the narrator when only the first name is mentioned in the narration chain [4]. Identifying ambiguous names is a branch of hadith science that deals with unidentified narrators in the sanad. This is different from dealing with unknown and unnamed narrators (an example of an unidentified narrator is "Said Muhammad", and an example of an unnamed narrator is "Said a man"). Narrators are considered unidentified when someone refers to them by names that are similar to those of other narrators, e.g., a first name or surname.
An example is shown in Figure 2. There are two sanads; the name Hammad is mentioned in both of them. However, each one of them is referring to a different narrator, and they both lived in the same era. The first one is Hammad bin Salamah, and the second one is Hammad bin Zayd. In our narrators' list, 112 other narrators share the first name Hammad. For more common names such as Muhammad or Ibrahim, the number goes even higher. This is a cause for concern. If two narrators are confused, this could lead to trusting someone who should not be trusted or vice versa. To identify a narrator, hadith scholars look at the narrators preceding and following the ambiguous name in the narration chain. It is also helpful to study the biographies of narrators. If they are still unable to identify the narrator, they could search for the same matn in different books to see if the narrator's name was spelled fully elsewhere.
After identifying the narrators, the authenticity of the hadith can be determined by examining the degree of reliability of each narrator in the sanad. Narrators' reliability can be determined by: (1) looking through their biographies, (2) looking at Jarh and Taadeel books to find what the scholars said about them, and (3) comparing what they had narrated with the hadiths of other narrators.
We introduce a new dataset that contains 279,625 artificial sanads. Artificial sanads are similar to real sanads but are created by making different combinations of narrators who narrated from each other. Each narrator in those sanads is tagged with an ID to disambiguate his name. More details on the creation process are in Section 3.1. The artificial sanads are created using narrators' data collected from hadith books and narrators' biographies that are made available in digital format on the website of the Custodian of the Two Holy Mosques for the Prophetic Sunnah. We refer to it as the Khadem Al-haramayn website (https://sunnah.alifta.gov.sa/, accessed on 10 November 2021). This website was launched by the General Presidency of Scholarly Research and Ifta.

Contributions
Our main contributions in this paper are:
• Introducing a new Arabic dataset of artificial sanads (AR-Sanad 280K) with identified narrators. This dataset could be used to train systems to disambiguate narrators' names when their full names are not mentioned;
• Introducing a new dataset of real sanads that we use as a test set to evaluate models' performance on real data;
• Presenting a systematic benchmark evaluation using AraBERT, a BERT-based model trained on a very large Arabic corpus. We also evaluate other models on the lite version of the AR-Sanad 280K dataset. This evaluation can be used by other studies to improve the models designed for the narrator disambiguation task.
According to hadith scholars, a tool for disambiguating narrators in sanads is very valuable because it semi-automates their study of the sanads of different hadiths. This saves a lot of time during hadith investigation.

Related Work
Many works have been published in the field of hadith computation, using computational and machine learning methods to solve problems related to hadith. The survey by [4] covers major computational and NLP-based studies of hadith. In this section, we list research works that are most relevant to our work.

Hadith Computation
The work of [5] uses information from the context (sanad) to find out if two narrators in different sanads with similar names are the same person. This comparison might help with building the narration graph. However, it is still hard to know the narrators' identities. For testing, they used hadiths from Sahih Al-Bukhari.
The work of [6] proposes a deterministic way to determine hadith authenticity using information from narrators' biographies. The authors classify sanads into three classes (sahih, hasan, and dha'if) and use information from Taqrib al-Tahzeeb by Ibn Hajar to determine the reliability level of the narrators. To identify the narrators, they used data from [7]. To test their approach, they used 2180 hadiths from Sahih Al-Bukhari and 752 from Tirmizi. They achieved an accuracy of 99.6% on Sahih Al-Bukhari and 93.6% on Tirmizi.
Sahih Muslim and Sahih Al-Bukhari are the two most-trusted hadith books. All hadiths in Sahih Muslim and Sahih Al-Bukhari are authentic. Hence, in [8], the authors compiled a list of all their sanads and treated them as authentic. For any other hadith, if its sanad is in the list, they consider that hadith as authentic. If it is not on the list, the hadith is considered as inauthentic. However, this criterion leaves out other possible authentic sanads.
In the existing works in hadith, researchers used to build their own datasets collected from various hadith books. This makes it hard to compare the performance of different systems [9]. So recently, there have been some efforts to construct hadith datasets and make them available for research purposes.
In [10], the authors collected hadiths from four books in Arabic and English: Sahih Muslim, Sahih Al-Bukhari, Sunan Abi-Dawud, and Muwatta Imam Malik. They used regular expressions to separate matn and sanad. In [11], the authors created a bilingual hadith corpus of 33,359 hadiths gathered from the six most famous hadith books. They used a segmentation tool to separate matn from sanad and included the degree of authenticity and other details about the hadiths.

Word Sense Disambiguation
The narrator disambiguation problem that we tackle in this paper has some similarity to Word Sense Disambiguation (WSD) where a word could have different meanings depending on the context. A name in the sanad could be referring to different people depending on other narrators in the sanad.
In [12], the authors explore two different strategies of integrating pretrained contextualized word representations for WSD: first, by using a straightforward way by finding a word in the training data with the closest contextual representation and assigning it the same sense, and second, passing the word representation through a linear classifier or filtering the representation vector through a gated linear unit. For the contextualized representation, they test two representations: the hidden vector of BERT in the last layer and a weighted sum of all hidden layers.
In [13], the authors create sense embeddings and use k-NN to determine the word sense. For each sense, its embedding is the concatenation of three vectors. The first embedding vector is determined by averaging contextual representation of words annotated with the given sense in the training data. The second embedding vector is determined by averaging contextual representation of words in the gloss of the given sense. The third vector is the fastText embedding of the given sense lemma.
In [14], the authors retrain BERT with the original masked token prediction jointly with masked word sense prediction.
In [15], the authors leverage relational information from lexical knowledge bases. They use the sum of the outputs of the last four layers of BERT as a vector representation of the target word. They feed this vector to a two-layer feedforward network. Instead of using the network output vector for prediction, they compute another vector such that the final score corresponding to a particular sense is a function of both its score and the scores of other senses related to it.
In [16], the authors use context sentence and gloss pairs to fine-tune BERT to determine whether the gloss belongs to the sense of the target word.
In [17], the authors train two encoders, context encoder and gloss encoder, both initialized with BERT. The context encoder produces a representation vector for each word in the context sentence. The gloss encoder produces sense representation. The target word sense is the sense whose representation has the highest dot product score with the target word representation.

Arabic Named Entity Disambiguation
Named Entity Disambiguation (NED) or entity linking is a research area that focuses on linking mentions of entities in text to their corresponding objects in a knowledge base. Our problem is a specific subfield in this area where we focus only on narrator entities, and we use the data on the Khadem Al-haramayn website as our knowledge base.
The work of [18] builds AIDArabic, an extended version of AIDA [19] that is more suitable for performing NED on Arabic text.
In [20], the authors use information from DBpedia and Arabic Wikipedia. They search DBpedia using: Arabic entity mention, English Entity mention, or similar entities on Wikipedia. If none of this returned any entities, they search Wikipedia for articles about the entity. The information extracted from the article web page is used to populate an Arabic Ontology for future queries. If more than one entity is returned, they use other mentions of entities in the context to disambiguate the target entity.
In [21], the authors follow an approach that could be used with low-resource languages and applied it to the Arabic language. They used YAGO3 [22] as their knowledge base and extended the named-entity dictionary using JRC-Names [23] and Google Word-to-Concept repository [24]. They also trained a machine translation system to translate named entities. This helps to find entities that are mentioned in Arabic but only exist in the English knowledge base.
These works try to link any mention of an entity, including a person name, to the corresponding object in a knowledge base. They deal with mentions of entities in regular textual data. Our problem is different since sanads contain only names. It is therefore better to use a domain-specific knowledge base.
None of these works address the problem of narrator disambiguation. In [25], the authors try to link narrator names with entities from DBpedia. However, they found it lacking in this domain: a large number of narrators did not have any entities. In our AR-Sanad 280K dataset, we solve this problem by including 18,298 narrators from the list of narrators on the Khadem Al-haramayn website.
Khadem Al-haramayn is a website that serves hadith science. It has a large collection of hadith books in digital format and offers a variety of services that help scholars with their research. One of the services it offers is narrator identification. Using rules known in hadith science, the website's team identified all narrators in the hadiths added to the website. In [26], the author explores the narrator identification service and how it could be used, shows its advantages, and proposes ideas to improve it.

AR-Sanad 280K Dataset
In this section, we describe our new dataset and the creation process. Because the number of sanads in the six famous hadith books is only about 40K sanads, we did not use real sanads from hadiths. Instead, we created artificial sanads by listing chronologically the narrators who heard from each other. We used the artificial sanads for training and reserved sanads of real hadiths for testing.
The artificial sanads we use reflect real sanads. The names we use in the artificial sanads are the names of real narrators, and we do not refer to them by their nicknames or first names randomly. We refer to them by the names that were used in real hadiths. This means that some of those artificial sanads could be found in real hadiths. However, none of the sanads in the test set are included in the training data. In Section 4.3, we show how the models trained on our artificial AR-Sanad 280K dataset perform when dealing with real sanads.

Creating Artificial Sanads
First, we scraped narrators' data from the Khadem Al-haramayn website. The website has a list of 18,861 narrators and their information. Information about each narrator includes their full name, a list of the narrators they narrated from/to, their appearance forms, date of death, etc.
Appearance forms are short names or ways used to refer to a narrator when he is mentioned in a sanad. As an example, consider a narrator whose full name is "Ibrahim bin Marwan bin Muhammad bin Hassan". He can be referred to by different appearance forms such as "Ibrahim bin Marwan", "Ibrahim bin Marwan Aldemashquey", "Ibrahim bin Marwan bin Muhammad Aldemashquey" or "Ibrahim bin Marwan bin Muhammad Altatarey". Other common appearance forms are those meaning "his father" or "his grandfather". When a narrator hears the hadith from his father or grandfather, he might refer to them using those forms without mentioning their names. A few narrators from the list did not have sufficient information to include their names in any sanad. Thus, the final number of narrators that we used in the AR-Sanad 280K dataset is 18,298. In the end, every narrator has a unique ID that is used to form the artificial sanads.
To create the artificial sanads dataset, we go through the list of narrators one by one and do the following for each narrator ID:

1. We pick a random narrator ID that he/she narrated to and a random narrator ID that he/she narrated from;
2. For the two narrator IDs we picked, we select two other narrator IDs they narrated to/from;
3. For each of the five narrators, we select a random name from their appearance forms;
4. The
5. We repeat steps 1-4 a few times depending on the number of narrators they narrated to/from.
The above procedure produces artificial sanads with five narrators. Similar procedures can be used to produce longer or shorter sanad chains. We created sanads of lengths 3 to 7, as those are the most common sanad lengths in real hadiths. Figure 3 shows the distribution of sanad lengths in ∼30K real hadiths. Notice that we use a standard separator, [ ], and not the narration terms that usually appear in real hadiths such as the hadith in Figure 1. Some examples of narration terms are: narrated, said, mentioned, heard, etc. Extracting narrator names from sanads is another topic discussed by many works [27][28][29][30][31][32], but it is not in the scope of our paper. Those narration terms do not help the identification process. We show the steps to create the sanads in Figure 4. It also explains how we deal with boundary cases where the narrator is a companion of Prophet Muhammad who heard directly from him. In this case, there are no narrators that he narrated from. There is also the case where the narrators lived very recently and no other narrators heard from them, but the hadiths they narrated were written in books.
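The sampling steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the dictionaries `narrated_from`, `narrated_to`, and `forms`, and the function names, are hypothetical, and boundary cases (companions with no teachers, recent narrators with no students) are simply skipped here rather than handled as in Figure 4.

```python
import random

def pick(options, rng):
    """Return a random element, or None when there is nothing to pick."""
    return rng.choice(options) if options else None

def make_sanad(center, narrated_from, narrated_to, forms, rng):
    """Build one five-narrator artificial sanad around `center`.

    narrated_from[n]: IDs of the narrators that n heard from (hypothetical layout).
    narrated_to[n]:   IDs of the narrators that heard from n.
    forms[n]:         appearance forms (short names) of narrator n.
    Returns (name, narrator_id) pairs, most recent narrator first,
    or None when the chain cannot reach five narrators.
    """
    # Step 1: one narrator he narrated to and one he narrated from.
    student = pick(narrated_to.get(center, []), rng)
    teacher = pick(narrated_from.get(center, []), rng)
    if student is None or teacher is None:
        return None
    # Step 2: extend the chain by one more narrator on each side.
    student2 = pick(narrated_to.get(student, []), rng)
    teacher2 = pick(narrated_from.get(teacher, []), rng)
    if student2 is None or teacher2 is None:
        return None
    chain = [student2, student, center, teacher, teacher2]
    # Step 3: a random appearance form for each narrator, tagged with the ID.
    return [(rng.choice(forms[n]), n) for n in chain]
```

Calling `make_sanad` several times per narrator with a seeded `random.Random` instance gives reproducible chains; the returned names would then be joined with the separator, while the IDs serve as labels.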

Special Appearance Forms
Some appearance forms need special attention. When a narrator does not use names and refers to other narrators by their relation to him, such as the forms meaning "his father" or "his grandfather", we refer to this as using special appearance forms. If we used the same method as before to generate sanads with special forms, the special forms might be selected randomly when the narrator preceding them is not the son or the grandson.
To avoid this, we make sure to select the right preceding narrator whenever we use a special appearance form.
The special appearance forms are frequently used in real sanads. In our dataset, we have 12,123 sanads with special forms.
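One way to picture this constraint is a helper that restricts the choice of the preceding narrator whenever a special form is used. The sketch below is an assumption-laden illustration: the `sons_of` mapping, the `FATHER_FORM` placeholder string, and the function name are hypothetical, and the source does not describe how kinship information is stored.

```python
import random

FATHER_FORM = "his father"  # placeholder for the Arabic special form

def pick_preceding(narrator, form, narrated_to, sons_of, rng):
    """Choose the narrator that precedes `narrator` in the sanad.

    For the special 'his father' form, only a son of `narrator` may
    precede him; for any other form, any narrator he narrated to may.
    Returns None when no valid candidate exists.
    """
    candidates = narrated_to.get(narrator, [])
    if form == FATHER_FORM:
        sons = sons_of.get(narrator, set())
        candidates = [c for c in candidates if c in sons]
    return rng.choice(candidates) if candidates else None
```

The same pattern would extend to the grandfather form with a second kinship mapping.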

Dataset Refinement
In this section, we explain the issues we faced while generating the data and how we fixed them.

• We removed duplicate sanads;
• We removed any name that was misspelled in appearance forms;
• After filtering the appearance forms, some narrators did not have appearance forms. We referred to them using their full names. If the name was too long, we used only their first four names;
• We removed duplicate narrators who have identical information in the narrators' list, i.e., same full name, kunia, death date, etc.

Dataset Statistics
We summarize here some statistics of our dataset. To generate the train and validation splits, we used the scikit-learn (https://scikit-learn.org/stable/, accessed on 10 November 2021) implementation for multilabel stratification. We made sure that every narrator has at least one observation in the training data. The training and validation sets contain 223,750 and 55,875 sanads, respectively.
We created overall 279,625 artificial sanads that included 18,298 narrators. In Table 1, we show narrators with the highest and lowest number of appearances in the data. There are 260 narrators who had only one observation in the data. The number of observations for each narrator depends on the number of narrators they narrated to/from (connections). There are 167 narrators who have the name "Muhammad" in appearance forms. In Table 2, we show some of the most common names and the number of narrators who share them. There are 61,598 unique names in appearance forms. Only 3477 of them are shared by more than one narrator. The rest of them are unique to one narrator.

Creating Lite Dataset
The full dataset contains about 280K sanads. The long training time this implies prevents us from evaluating multiple models on it. Therefore, we constructed a lite version of our AR-Sanad 280K dataset. We built the lite version using the same algorithm shown in Figure 4 but restricted the number of narrators to 2222. Using the lite version, we were able to train different models, which led us to the final choice of AraBERT over the alternatives.

Experiments
The most common method that is used by hadith scholars to identify narrators is by leveraging the information about the narrator's students and teachers, i.e., the narrators he narrated to/from. There is a large number of possible combinations of narrators. It is hard to compile a list of all possible cases and how to deal with them. In these kinds of problems, it is preferable to use a machine learning model. It is much easier to let the model learn from data. In this section, we describe our experiments on the dataset using machine learning and deep learning methods.

Lite Dataset
In this section, we describe the experiments on the lite dataset in order to select the best performing models. We do three types of experiments.

Static Embeddings
In the first part of the experiments, we compose a vector representation for each narrator and use it as the input to the classifier. We tested two classifiers, KNN with k = 3 and Naive Bayes. We show the results in Table 3.
The vector representations we used are:
FastText300: For each narrator in the sanad, we average the FastText embeddings of the tokens in his name.
FastText600: To encode some of the context information, we concatenate two vectors. The first one is the FastText300 representation of the narrator we want to classify. The second one is the FastText300 representation of the following or preceding narrator in the sanad.
We use AraBERT [34] on our AR-Sanad 280K dataset. AraBERT is a BERT-based model pretrained on a large-scale Arabic corpus. The size of the vocabulary in AraBERT is 64k tokens. This makes a big difference in performance since our data consists of names only. Notably, 61% of the tokens in our dataset are in AraBERT's vocabulary. This is a good number considering that they are all names. AraBERT also does preprocessing using the Farasa segmenter [35] to segment words into stems, prefixes and suffixes. In Section 4.3.1, we show the difference in performance when we do not do preprocessing with Farasa.
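The two static representations defined earlier in this section (FastText300 and FastText600) can be sketched with NumPy. This is a minimal illustration, assuming a plain dictionary `emb` from token to vector in place of a real FastText model; the function names are hypothetical, and returning a zero vector for names with no known token is a choice made only for this sketch.

```python
import numpy as np

def name_vector(name, emb, dim):
    """FastText300-style vector: average the embeddings of the name's tokens."""
    vecs = [emb[t] for t in name.split() if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def context_vector(names, i, emb, dim):
    """FastText600-style vector: target name vector plus a neighbor's vector.

    Uses the following narrator when there is one, otherwise the preceding one.
    """
    j = i + 1 if i + 1 < len(names) else i - 1
    target = name_vector(names[i], emb, dim)
    neighbor = name_vector(names[j], emb, dim)
    return np.concatenate([target, neighbor])
```

With 300-dimensional embeddings, `name_vector` yields 300 values and `context_vector` yields 600, which can then be passed to a KNN or Naive Bayes classifier.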
We use AraBERT to get a vector representing each narrator in the sanad. We do this by averaging the token embeddings of the narrator's name. Then, we feed this vector to a classification layer to determine the identity of the narrator. Figure 5 illustrates this process.
Figure 5. Classification process: First, every narrator's name is tokenized. Next, the BERT model produces token embeddings. Token embeddings that belong to the same narrator are averaged. Finally, a classification layer produces the output narrator IDs. Better seen after zooming in.
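The pooling and classification step can be illustrated independently of the encoder. The sketch below assumes the token embeddings have already been produced by some model and that each narrator's token span is known; the names `pool_narrators`, `classify`, `W`, and `b` are hypothetical, and a single linear layer stands in for the classification layer.

```python
import numpy as np

def pool_narrators(token_emb, spans):
    """Average token embeddings over each narrator's [start, end) token span."""
    return np.stack([token_emb[s:e].mean(axis=0) for s, e in spans])

def classify(narrator_vecs, W, b):
    """Apply one linear layer and return the most probable narrator ID per name."""
    logits = narrator_vecs @ W + b
    return logits.argmax(axis=1)
```

Here `token_emb` has shape (number of tokens, hidden size) and `W` has shape (hidden size, number of narrator IDs), so each narrator in the sanad receives its own prediction.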
We tried two settings for the model. First, we used AraBERT in the frozen setting. We used the embeddings to train a classification layer. In the second setting, we fine-tuned AraBERT and used the classification layer parameters from the previous experiment as the initial parameters.
Besides using neural networks for classification, we also tried KNN and Naive Bayes classifiers. We also tried building narrator embeddings (narrEmb) and used them together with 1-Nearest-Neighbor to determine narrators' identities. The results of these experiments are shown in Table 4 and show that tuning AraBERT with one classification layer gives the best performance.
Table 5 shows the results of tuning other transformer models, namely AraElectra [36] and AlBERT [37]. Neither model produced results as good as AraBERT's, since both focus on optimizing model efficiency more than performance.
Table 5. Performance of different tuned transformer models.

Model                 Accuracy
AraElectra [36]       92.1
AlBERT-Arabic [37]    92.5
AraBERT [34]          93.1

AraElectra: It uses replaced token detection as a pretraining task. This enables the model to learn from all input tokens instead of just the small masked-out subset. The objective of this work is to improve the efficiency of the pretrained model without hurting performance. The performance of a small-sized Electra model is not far behind the BERT model.
AlBERT: It also focuses on model efficiency with reasonable performance. It reduces the number of BERT parameters by 18× and increases training speed by 1.7×.

Full Dataset
We report F1 scores on the validation set of the full AR-Sanad 280K dataset in Table 6 (Large) and top-k accuracy in Table 7. The fine-tuned AraBERT achieves 92.9 F1. We tested only AraBERT since it produced the best performance on the lite dataset. We observed that the fine-tuned AraBERT outperforms the frozen version. We also report the sanad error rate to evaluate the model's ability to classify whole sanads correctly. Sanad Error Rate (SER) is the percentage of sanads that have at least one of their narrators misclassified. In Table 8, we show the percentage of sanads that have n narrators with wrong predictions.
Table 6. F1 scores and sanad error rate (SER) of the models trained on the large dataset. The Large section shows performance on the validation set of artificial sanads. The Real section shows performance on the test set of real sanads. FrozenAraBERT uses pretrained AraBERT with a classification layer. TunedAraBERT uses pretrained AraBERT, initializes the classification layer with the trained parameters from FrozenAraBERT, and fine-tunes both.
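The sanad error rate defined above is straightforward to compute from per-sanad predictions. The sketch below is illustrative only: the function names are hypothetical, and the narrator-level score shown is plain accuracy, which coincides with micro-averaged F1 in a single-label multiclass setting where every narrator receives exactly one predicted ID.

```python
def sanad_error_rate(gold, pred):
    """Percentage of sanads with at least one misclassified narrator."""
    wrong = sum(1 for g, p in zip(gold, pred) if list(g) != list(p))
    return 100.0 * wrong / len(gold)

def narrator_accuracy(gold, pred):
    """Percentage of individual narrators identified correctly."""
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    return 100.0 * sum(g == p for g, p in pairs) / len(pairs)
```

Both functions take lists of sanads, where each sanad is a list of narrator IDs. A single wrong narrator marks the whole sanad as an error, so SER can be high even when narrator-level accuracy is high.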


Effect of Sanad Length
We take a look at how the sanad length affects model performance. We split the validation set by length and observed the change in performance. In Table 9, we report the results. We notice that the sanad error rate increases as the sanad becomes longer. This is expected since a larger number of narrators need to be identified correctly. We also notice that F1 scores become higher. The model becomes better at identifying individual narrators when it obtains more clues (narrators).

Real Sanads Test Set
To evaluate the model's ability to identify narrators in real sanads, we collected real sanads from the six most famous hadith books. We gathered these data from the Khadem Al-haramayn website, which encompasses a large collection of hadith books with identified narrators. Large groups of researchers and specialists in different fields worked together to develop this website [26]. The narrators' group had 30 members. They were responsible for work related to narrators, including identifying hadith narrators, creating a database for them, etc. [38]. Table 10 shows the number of sanads we extracted from each book. After removing duplicate sanads and sanads with lengths less than 3 or more than 10, the total number of sanads in the test set is 27,056. We did not include hadiths that had more than one sanad. The list of 18,298 narrators is collected from a large number of hadith books. Only 6378 narrators were included in the test set, since the test set has only sanads from the six most famous hadith books. We report the results on the whole test set in Table 6 and top-k accuracy in Table 7. In Table 11, more detailed results on each book are shown separately.
In Table 8, we show the percentage of sanads that have n narrators with wrong predictions. We see that a large percentage of the sanads have only one misidentified narrator. This caused the SER to be as high as 60.6% even though only 23.8% of the sanads have more than one misidentified narrator, which leaves 36.8% of the sanads with exactly one.
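A breakdown of this kind, the share of sanads with exactly n misidentified narrators, can be computed as in the following sketch. The function name is hypothetical, and the input layout (one list of narrator IDs per sanad) is an assumption.

```python
from collections import Counter

def wrong_count_distribution(gold, pred):
    """Map n -> percentage of sanads with exactly n misidentified narrators."""
    counts = Counter(
        sum(g != p for g, p in zip(gs, ps)) for gs, ps in zip(gold, pred)
    )
    return {n: 100.0 * c / len(gold) for n, c in sorted(counts.items())}
```

The sanad error rate is then 100 minus the entry for n = 0, and the gap between SER and the share with more than one error is the share with exactly one.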

Effect of Farasa Segmenter
In this section, we observe the change in performance with and without preprocessing using the Farasa segmenter. First, we show examples of the AraBERT tokenizer output in both cases in Figure 6. In the examples shown, Farasa separates the definite article (equivalent to "the" in English) and the possessive pronoun (equivalent to "his" in English) from the rest of the word. By separating a word from anything that is not an intrinsic part of it, Farasa helps improve the performance of models. In Table 12, we compare the performance of TunedAraBERT on the test set when trained with Farasa preprocessing versus with no preprocessing.

Implementation Details
For training the classification layer in FrozenAraBERT, we selected the best pair of batch size and learning rate from [32, 64, 128, 256, 512, 1024] and [1e-4, 1e-3, 0.01, 0.05, 0.1, 0.5]. We trained the model for five epochs for each pair of parameters and selected the pair that achieved the highest accuracy on the validation set. We show in Figure 7 the training curves of some of the parameters that we tested. When fine-tuning, we reduce the learning rate by a factor of 10. The parameters we used are shown in Table 13. The model was trained with the Adam optimizer [39]. Due to restrictions on the available RAM, we used a batch size of 32 for the fine-tuned model when training on the full AR-Sanad 280K dataset.
For frozen and tuned AraBERT, we start with the selected learning rate. During training, we decrease the learning rate by a factor of two when the accuracy on the validation set is not improving. An initially large learning rate keeps the network from memorizing noisy data, while decaying the learning rate improves the learning of complex patterns [40]. Training stops after a certain number of learning rate decreases: ten for frozen AraBERT and eight for fine-tuning.
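The search grid and the halving rule can be sketched without any deep learning framework. This is an illustration under assumptions: the class and variable names are hypothetical, and "a certain number of trials" is interpreted here as a cap on the number of learning rate reductions, which the source does not spell out.

```python
from itertools import product

BATCH_SIZES = [32, 64, 128, 256, 512, 1024]
LEARNING_RATES = [1e-4, 1e-3, 0.01, 0.05, 0.1, 0.5]
GRID = list(product(BATCH_SIZES, LEARNING_RATES))  # every (batch size, lr) pair

class HalvingSchedule:
    """Halve the learning rate whenever validation accuracy fails to improve."""

    def __init__(self, lr, max_reductions):
        self.lr = lr
        self.best = float("-inf")
        self.reductions = 0
        self.max_reductions = max_reductions

    def step(self, val_accuracy):
        """Record one evaluation; return False when training should stop."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            return True
        self.lr /= 2
        self.reductions += 1
        return self.reductions < self.max_reductions
```

In a training loop, `step` would be called after each validation pass, with `max_reductions=10` for the frozen setting and `max_reductions=8` for fine-tuning under the interpretation above.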

Analysis
We explore some of the possible causes of error on the validation set. We examine the performance of TunedAraBERT on narrators with a low number of appearances in the training data and on narrators who have common or special appearance forms. We observed the following:
• First, we look at some of the predictions made by the model. In Figures 8 and 9, we show a few examples of false and true predictions and the narrators' true identities. We notice that in most cases the model's confidence level is higher for true predictions than for false ones. In total, 68.3% of all narrators were identified correctly with a confidence level of 90% or above. In total, 81.7% of the correct predictions have a confidence level of 90% or above. Only 12.9% of the false predictions have a confidence level of 90% or above;
• Special appearance forms could be a little confusing for models, since the narrator's name is not stated. Only 71% of the narrators that appeared in a special form were classified correctly;
• Figure 10 shows examples of narrators that were not identified correctly, but whose true identities were among the top five most probable ones. Most of them are called by common short names;
• As shown in Table 2, there are many narrators who have similar appearance forms. There were 3669 instances of the names shown in the table; only 26% were correctly identified.
We hope that our AR-Sanad 280K dataset could be used to build better systems that can manage to avoid such errors.
Figure 10. Examples of narrators whose true identities were not on top but were still in the five most probable identities.

Conclusions
We presented a new dataset of artificial sanads that could help models identify narrators with ambiguous names. Narrator disambiguation is very important in order to authenticate hadiths. This dataset could be used to build systems that could help hadith scholars who are working on this problem.
We fine-tuned AraBERT on our dataset and used it to classify narrators with 92.9 micro F1 and 92.5 macro F1 on the validation set. We also created a test set of real sanads from the six most famous hadith books and tagged the narrators with their IDs. Our best model achieved micro and macro F1 scores of 83.5 and 78.8, respectively, on the test set.