1. Introduction
This research forms part of the BiblIndex project, an online index of biblical textual reuses by the Church Fathers, developed at the Institut des Sources Chrétiennes in Lyon. The website currently provides lists of canonical references, linking a particular verse with a specific passage in a patristic work. The main advantage of this system is its ability to process large volumes of data and provide an overview of the role played by specific Bible passages in the history of interpretation. However, this can be an overly simplistic way of delineating the boundaries between source and target texts. Among other biases, it ignores fluctuations in the canon during the production of the earliest Christian writings, as well as the influence of patristic quotations on the subsequent canonical form of biblical texts. In this paper, however, we will focus on another issue: the rich intrabiblical intertextuality that often makes identifying a precise biblical reference difficult when determining which text the Church Father was referring to. Our overall technical objective is to develop a tool that classifies verses based on their proximity to others. This will help annotators of the BiblIndex corpus to select one biblical reference over another and offer website users alternative verses if they wish to expand their research.
Given the Psalms’ status as Jesus’ prayer book and their shared cultural significance in Jewish and Christian traditions, we selected the reuses of the Psalms in the Greek New Testament as a preliminary case study. We used the Septuagint version of the Psalms because specialists today consider this Greek version to be the most important source of Psalmic quotations in New Testament writings (Dorival 2016).1 This corpus is large enough to allow us to make several observations. We identified 614 Psalms reuses in the New Testament.
Instead of exploring the potential of supervised learning with neural networks, we used conventional similarity detection methods to evaluate their effectiveness and establish their potential usefulness to philologists and theologians annotating the BiblIndex corpus. A neural network approach would require a validated corpus of ancient biblical text that machines could use directly, and no such corpus currently exists; moreover, the Bible alone is too small a corpus for such training. Furthermore, our goal is to produce a configurable tool transparent enough for its settings to be adjusted according to scientific criteria and to yield reproducible results, a capability that opaque neural models would preclude. Our aims are, first, to identify text reuses ranging from distant thematic echoes to explicit literal quotations and, second, to classify them into a precise typology based on measurable criteria, such as various morphosyntactic characteristics, proximity to the source text, or explicit mentions of the act of quoting. This approach could be extended to other biblical corpora.
After presenting our gold standard reference corpus, we will outline the methodology adopted to compile the data. Next, we will present the numerical methods applied to detect and classify instances of text reuse. Finally, we will present and analyze the results.
2. The Gold Standard Reference for Text Reuse Between the Psalms and the Greek New Testament
The prevalence of psalm-like language in the New Testament is well documented. A significant number of quotations from the Psalms, as well as more tenuous textual reuses, have been meticulously documented in the notes of most contemporary Bibles. In addition, many partial lists are available online. Several specific studies have also been conducted on quotations from the Old Testament in the New Testament. Some authors, including Archer and Chirichigno ([1983] 2005), have already developed typologies based on the proximity of the New Testament text to the Septuagint or the Hebrew text. However, a thorough examination of the available lists of occurrences reveals that this research is neither exhaustive nor definitive. The differing objectives of these studies are one key factor explaining this gap: some studies are concerned only with identifying literal text reuse for the purpose of philological analysis. Discrepancies also arise because the New Testament may draw on the Hebrew text or on Greek translations that differ from the Septuagint.2 However, the main reason for the discrepancies is the variety of textual reuse, which extends far beyond simple, explicit quotations and consists most of the time of tenuous echoes or reminiscences.
Quotations are verbatim textual reuses of varying length. They may be either explicit or implicit, depending on whether introductory terms referring to the Scriptures (e.g., καθὼς γέγραπται) are present. Quotations may be literal, meaning the words in the quoted psalm and the New Testament verse are identical, or modified, meaning the source is inserted into the target text with minor lexical or morphosyntactic changes.
Echoes are non-verbatim textual reuses that share something with the source text. There are three types of echoes: thematic, semantic, and lexical. Thematic echoes share a common theme with the source text.
Semantic echoes share a common lexical field with the source text.
Lexical echoes share one or two lemmas. These three characteristics may occur simultaneously.
Coincidences are instances of lexical proximity occurring in different thematic and semantic contexts.
The following typologies will be used in Section 5:
Typology 1: distinction between quotations, echoes, and coincidences.
Typology 2: distinction between explicit and implicit reuses.
Typology 3: distinction between literal, modified, lexical, semantic, thematic, and mixed text reuses. Mixed text reuses combine several characteristics from the list above.
Figure 1 displays the respective distributions of types within each typology. As can be seen, the type populations are unbalanced. Note that the “modified” and “literal” types apply only to quotations.
3. Corpus Preparation
The successive stages of corpus preparation are described below. These stages are extensive and intricate, and they require meticulous verification.
3.1. Source Texts
We used as source texts the Greek New Testament (denoted as NT in this paper) edited by Tischendorf in 1869, taken from the PROIEL project (Eckhoff et al. 2018), and Rahlfs’ Septuagint (LXX), edited in 1950, taken from the BiblIndex data (text and verse segmentation). Rather than applying a natural language processing (NLP) pipeline from scratch to raw texts, we preferred to reuse preprocessed and verified open-source data as much as possible, as current NLP software still requires time-consuming manual corrections when applied to Ancient Greek texts. Despite the existence of numerous sources of already lemmatized biblical texts available online, acquiring high-quality data remains challenging. Several textual databases containing the Septuagint and the Greek New Testament provide lemmas and morphosyntactic information, which can be accessed by clicking on a specific word (e.g., Thesaurus Linguae Graecae, BibleWorks, etc.). However, users cannot download a file containing the entire lemmatized and annotated text. Consequently, our dataset is a patchwork of resources from several digital humanities projects, complemented by NLP operations that we performed ourselves. New Testament tokens, lemmas and part-of-speech (PoS) tags originated from a PROIEL file,3 and the Psalms’ lemmas originated from the OpenScriptures repository (Resources for Biblical Texts in Greek, specifically the Septuagint).4 Note that 291 verses were not annotated in the New Testament file (see Section 3.4 for details on data filling).
We had to pay particular attention to diacritical marks in the Septuagint data because these are encoded differently from those in the New Testament. Additionally, we identified errors in the diacritical mark encoding when integrating Trench’s synonyms (Trench 1880; see Section 3.7) and the Louw and Nida (1988) semantic domains (see Section 3.8) into our analysis. Therefore, we decided to remove all diacritical marks from our dataset.
3.2. Partitioning the Corpus
Determining the basic textual unit for a tool designed to analyze reuse phenomena involves balancing algorithmic performance (optimizing the size of the units) with the analytical scale, in line with end-user practices. In our case study of intrabiblical intertextuality, philologists describe reuses, particularly in the critical apparatus, on the basis of the long-established system of versification (dating back to the sixteenth century).
To make similarity and proximity measures more interpretable while remaining close to philological practices, we chose biblical verses as the basic textual unit for our similarity measures. Alternatively, a segmentation into sentences or clauses could also have been adopted for greater semantic coherence. However, such an approach would have made it difficult to relate the results of the text reuse analysis back to the verse division, thus reducing the usability of the tool.
Nevertheless, as sub-verse units are commonly referred to in biblical and patristic studies—especially in the Psalms, where verses often consist of distichs—we carried out a manual segmentation based on units of meaning. Psalmic verses may therefore contain one, two, three, or even four parts in our analysis. Finally, our two corpora comprise 7939 verses from the New Testament and 3093 verse parts from the Book of Psalms. The next steps are applied independently to each of these textual entities.
3.3. Tokenization
All of our similarity measures are based on sequence comparisons at the word level. In our study, therefore, the tokenization step simply involves individualizing words based on spaces, apostrophes, and punctuation marks. In the case of the Psalms, tokenization is performed using the spaCy Ancient Greek pipeline5 named GreCy6 (note that tokenization and all subsequent NLP steps are based on the grc_proiel_trf model,7 which was primarily trained on the New Testament corpus). After tokenization, we also used this model to perform cleaning steps, such as removing punctuation, accents, and breathing marks.
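For illustration, the following minimal Python sketch reproduces this tokenization and cleaning step. It assumes the GreCy grc_proiel_trf model package is installed and loadable through the standard spaCy call; this is a sketch of the principle, not necessarily our exact pipeline configuration.

```python
import unicodedata
import spacy

# Load the GreCy transformer pipeline (assumes the grc_proiel_trf
# model package has been installed beforehand).
nlp = spacy.load("grc_proiel_trf")

def strip_diacritics(word: str) -> str:
    """Remove accents and breathing marks (combining characters)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def tokenize_verse(verse: str) -> list[str]:
    """Tokenize a verse and return cleaned, unaccented tokens."""
    doc = nlp(verse)
    return [strip_diacritics(t.text) for t in doc if not t.is_punct]

# Example: beginning of Psalm 1:1 (LXX).
print(tokenize_verse("Μακάριος ἀνήρ, ὃς οὐκ ἐπορεύθη ἐν βουλῇ ἀσεβῶν"))
```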
3.4. Lemmatization
We then lemmatized our corpus. As explained above, the lemma data were obtained from external sources. Nevertheless, we compared them with lemmatization performed using two NLP tools8: the Classical Language Toolkit (CLTK)9 and GreCy. CLTK’s performance on the Septuagint is unsatisfactory: it often fails to analyze complex verb forms and participles, and it frequently makes gender errors. An evaluation of the results on the first two psalms provides further insight (in the corresponding figure, the words marked in red required correction). Although CLTK proved more effective on the New Testament, it failed to identify elided prepositions such as ‘μετ´’ and ‘παρ´’. Additionally, there were inaccuracies in conjugated forms, and personal pronouns were not always lemmatized in the nominative case. Finally, we chose to use the lemmatized Greek New Testament available in PROIEL, a choice justified by its superior part-of-speech (PoS) analysis and its systematic verification by Hellenists. However, we found a systematic error in cases of crasis (‘κἀγώ’ retained as a lemma instead of ‘καί’ and ‘ἐγώ’), which we solved by automatically replacing the terms concerned in a post-processing step.
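Such a post-processing replacement can be sketched as follows; the mapping shown is illustrative, and the project’s actual replacement table may differ.

```python
# Illustrative crasis mapping only (κἀγώ is the crasis of καί + ἐγώ);
# the actual replacement table may contain further entries.
CRASIS_MAP = {"κἀγώ": ["καί", "ἐγώ"]}

def expand_crasis(lemmas: list[str]) -> list[str]:
    """Replace fused crasis lemmas with their component lemmas."""
    expanded: list[str] = []
    for lemma in lemmas:
        expanded.extend(CRASIS_MAP.get(lemma, [lemma]))
    return expanded
```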
Unfortunately, the PROIEL data were incomplete: Chapter 13 of the Letter to the Hebrews, 1 Peter 3–5, 2 Peter 1–3, the epistles of John and Jude, and a few other verses were missing. This totaled 291 verses. Therefore, we had to complete the lemmatization and PoS tagging of these verses manually.
3.5. Parts-of-Speech
PoS tags were already provided by the New Testament source (the PROIEL project), and manual verification confirmed their quality. Tagging of the Psalmic verse parts was done by GreCy in the same NLP pipeline run as tokenization. As these tags differed from the PROIEL PoS tags used for the New Testament, we chose to harmonize the data using the reference tagset of Prévost et al. (2009). Initially focused on medieval French and Latin, this French project has the broader objective of supporting multilingual grammatical analysis. The harmonized tags were thus obtained through a correspondence table (see Appendix A).
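The principle of such a correspondence table is illustrated below, mapping the Universal Dependencies tags produced by GreCy to harmonized labels. The target labels here are placeholders inspired by the CATTEX conventions, not the actual table of Appendix A.

```python
# Placeholder fragment of a correspondence table from Universal
# Dependencies tags (as output by GreCy) to a harmonized tagset;
# the real table follows Prévost et al. (2009), see Appendix A.
UD_TO_HARMONIZED = {
    "NOUN": "NOMcom",   # common noun
    "PROPN": "NOMpro",  # proper noun
    "VERB": "VERcjg",   # conjugated verb
    "ADJ": "ADJqua",    # qualifying adjective
}

def harmonize(pos_tags: list[str]) -> list[str]:
    """Map raw PoS tags to the harmonized tagset, keeping unknowns."""
    return [UD_TO_HARMONIZED.get(tag, tag) for tag in pos_tags]
```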
3.6. Stop Words
Stop words can either be retained, as a source of grammatical information, or removed, to focus on meaningful terms. To compare the efficiency of these two methodological approaches, we created two versions of our texts: one with stop words and one without. We used the GreCy stop word list.10
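A sketch of this filtering step is given below; the import path for the stop word list is an assumption based on spaCy’s usual layout.

```python
from spacy.lang.grc.stop_words import STOP_WORDS  # import path assumed

def split_stop_words(lemmas: list[str]) -> tuple[list[str], list[str]]:
    """Return (filtered lemmas, stop words) for one verse.

    In practice the stop word list is de-accented first, like the
    rest of the data (see Section 3.1).
    """
    kept = [l for l in lemmas if l not in STOP_WORDS]
    removed = [l for l in lemmas if l in STOP_WORDS]
    return kept, removed
```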
3.7. Lexicon
Since paraphrasing a source text using synonyms is a common feature of intertextuality, we incorporated synonym comparison into our method. From the filtered lemmas of each verse, i.e., the ordered lists of lemmas without stop words, we created a lexicon containing all the lemmas of the verse (without duplicates) in alphabetical order. Based on a digitized database of the Synonyms of the New Testament (Trench 1880), we built a table containing 124 lists of synonymous lemmas (see the project GitHub repository11). Using this table, we replaced each lemma having synonyms with a fixed representative lemma from the corresponding list; for a given list of synonyms, the same representative is used for every replacement. After completing this process, we obtained a lexicon (one for each New Testament verse or Psalms verse part) that is less specific to a given context than the filtered lemmas.
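The substitution can be sketched as follows; the synonym list shown is a single illustrative example, not an excerpt from the actual table.

```python
# Each synonym list is mapped to a fixed representative lemma
# (here, the first element). The sample list is illustrative.
SYNONYM_LISTS = [
    ["αγαπαω", "φιλεω"],  # "to love" (cf. Trench)
]

# Precompute lemma -> representative mapping.
REPRESENTATIVE = {
    lemma: syn_list[0] for syn_list in SYNONYM_LISTS for lemma in syn_list
}

def build_lexicon(filtered_lemmas: list[str]) -> list[str]:
    """Deduplicate, substitute synonyms, and sort alphabetically."""
    return sorted({REPRESENTATIVE.get(l, l) for l in filtered_lemmas})
```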
3.8. Semantic Domains
Since most quotations from the Psalms in the New Testament are neither explicit nor literal, we needed tools to identify paraphrases and lexical similarities. Once again, despite the existence of many print resources,12 we had to build our own tool.
Since we needed to process all the lemmas in bulk, we decided to use the United Bible Societies (2023) dictionary, adapted from the Semantic Dictionary of Biblical Greek (SDGNT). This revised and reformatted edition of Louw and Nida (1988) is supplemented with an exhaustive list of biblical references for each lexical meaning; it was produced and kindly made available by the Summer Institute of Linguistics (SIL). The dictionary contains a lexical analysis for each entry, including definitions, glosses, all scriptural references, and lexical-semantic domains or subdomains.
A table associating semantic domains and subdomains with each New Testament lemma was created from an XML file13 containing the Louw–Nida lexicon. Since the Louw–Nida lexicon covers only the New Testament, the table was completed for the 614 Psalms-specific lemmas: semantic domains and subdomains (from the Louw–Nida categories) were attributed to each of these.
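The resulting lookup can be pictured as follows; the entries and domain codes below are illustrative, as the real table is derived from the SDGNT XML file.

```python
# Hypothetical lemma -> Louw–Nida (sub)domain table; codes follow
# Louw–Nida numbering, but these entries are illustrative only.
DOMAINS = {"ουρανος": ["1"], "γη": ["1"], "αγαπαω": ["25"]}
SUBDOMAINS = {"ουρανος": ["1.5"], "γη": ["1.39"], "αγαπαω": ["25.43"]}

def domain_representation(lemmas: list[str]) -> list[str]:
    """Unordered sequence of semantic domains for a verse's lemmas."""
    domains: set[str] = set()
    for lemma in lemmas:
        domains.update(DOMAINS.get(lemma, []))
    return sorted(domains)
```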
3.9. Multi-Representations of Verses from NLP Operations
We stored and gathered the resulting sequences of textual entities (tokens, PoS tags, lemmas, domains, etc.) from each of these NLP operations, applied sequentially. Each New Testament verse or Psalmic verse part is thus described by eight linguistic representations, organized in two tables containing, respectively, 7939 (NT) and 3093 (Psalms) rows and eight columns of sequences:
Tokens: An ordered sequence of unaccented Greek terms.
Lemmas: An ordered sequence of unaccented Greek lemmas.
PoS: An ordered sequence of part-of-speech tags.
Stop words: An ordered sequence of unaccented Greek stop words.
Filtered lemmas: An ordered sequence of unaccented Greek lemmas without stop words.
Lexicon: An unordered sequence of unaccented Greek lemmas with synonym substitution applied.
Domains: An unordered sequence of semantic domains.
Subdomains: An unordered sequence of semantic subdomains.
Each sequence of items provides a distinct representation of a given verse. These sequences can thus be grouped into literal representations (tokens and lemmas), grammatical representations (PoS and stop words), lexical representations (filtered lemmas and lexicon), and semantic representations (domains and subdomains). These two tables constitute our datasets; in other words, they are the basis for all subsequent experimentation.
Figure 2 presents an overview of the elaboration process of the two datasets (the New Testament and the Psalms), summarizing the operations, the integration of external resources and the application of GreCy. Note that, at this stage of our analysis, the data are strictly textual (no numerical values) and can thus be interpreted and reviewed by individuals without digital tool expertise. Complete versions of our New Testament and Psalms datasets (including the lemmatized NTG text) are available on the project GitHub.14
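To make the structure of the datasets concrete, the following shows a possible shape of a single row, here for a hypothetical Psalms verse part; values are abridged and illustrative.

```python
# Possible shape of one dataset row: the eight representations of a
# single (hypothetical) Psalms verse part. Domain/subdomain codes
# follow Louw–Nida numbering; all values are illustrative.
row = {
    "tokens":          ["μακαριος", "ανηρ", "ος", "ουκ", "επορευθη"],
    "lemmas":          ["μακαριος", "ανηρ", "ος", "ου", "πορευομαι"],
    "pos":             ["ADJ", "NOUN", "PRON", "ADV", "VERB"],
    "stop_words":      ["ος", "ου"],
    "filtered_lemmas": ["μακαριος", "ανηρ", "πορευομαι"],
    "lexicon":         ["ανηρ", "μακαριος", "πορευομαι"],  # sorted, synonyms replaced
    "domains":         ["9", "15", "25"],
    "subdomains":      ["9.1", "15.10", "25.119"],
}
```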
4. Numerical Methods
In Section 3, we created verse representation datasets for the New Testament and the Book of Psalms, using raw Greek texts and linguistic or philological resources. These representations comprise only textual entities (e.g., Greek terms, lemmas, PoS tags) and are strictly knowledge-based, with no statistics involved. Consequently, the objectives of this numerical methods section are twofold.
First, we introduce algorithmic operations that quantify the similarity between each representation of the NT and the Psalms. These operations assign a numerical value corresponding to the proximity of the sequences within the representations of a given NT verse and a given Psalms verse part. Thus, each pair consisting of an NT verse and a Psalmic verse part receives eight values, corresponding, respectively, to the measured similarities between the token, lemma, PoS, stop word, filtered lemma, lexicon, domain and subdomain representations of the two units. We therefore speak of a numerical textual reuse model based on literal, grammatical, lexical, and semantic similarity measures.
Second, we apply statistical methods to analyze these numerical measures of similarity between verse representations to automatically perform several philological tasks:
Detection of new textual reuses of the Psalms in the New Testament.
Prediction of existing Psalmic reuses in the NT (methodological validation).
Classification (i.e., clustering methods in data science) of these reuses according to the balance of the different types of similarity measures (grammatical, lexical, semantic, etc.).
Note that all of these analyses rely solely on unsupervised methods (including machine learning algorithms), meaning that no training is involved in the data science sense. Although we created a gold standard dataset of Psalms reuses in the New Testament (see Section 2), its information is not integrated into the statistical methods of our model. Rather, it is used for comparison, to validate the results (see Section 5) and to support their interpretation (see Section 6).
4.1. Measuring Similarity Between Representations
To characterize the proximity between a given Psalmic verse part and an NT verse, we compute eight independent similarity measures, one for each type of verse representation. The results of detection, prediction and classification of reuse occurrences depend on the chosen similarity measure. To propose a transparent method that avoids the black-box effect and preserves interpretability for non-experts, we chose a deterministic approach rather than a statistical one for converting verse representations into numbers.15 We therefore used string similarity metrics (Navarro 2001),16 a common family of methods that compare sequences of textual items (characters, words, etc.) directly and return numerical values corresponding to their definition of similarity.
4.1.1. Sequence Similarity Metrics
Since we have both ordered and unordered sequences (see Section 3.9), two different metrics are required. For the ordered sequences, we selected the Levenshtein distance, a widely used string metric (Navarro 2001) for this type of task because it is easy to understand. The Levenshtein distance $d_{lev}(s_1, s_2)$ between two sequences $s_1$ and $s_2$ is equal to the minimum number of operations on the elements of $s_1$ necessary to obtain $s_2$ (or conversely, as $d_{lev}$ is symmetric). Admissible operations on elements are deletion (deleting an element anywhere in the sequence), insertion (adding an element anywhere in the sequence), and substitution (changing an element anywhere in the sequence). For example, the Levenshtein distance between two Greek term lists is equal to two when one insertion (e.g., of λόγος) and one substitution are necessary to convert the first list into the second. For unordered sequences, we defined a set similarity metric (see Equation (1)), equal to the difference between the cardinality of the union of the two sequences and their minimum cardinality; this value likewise corresponds to the minimum number of operations needed to convert one set into the other:

$$d_{set}(s_1, s_2) = |s_1 \cup s_2| - \min(|s_1|, |s_2|) \qquad (1)$$
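Both metrics can be implemented in a few lines; the following sketch operates on word-level sequences, in line with our representations (a minimal sketch, not our production code).

```python
def levenshtein(seq1: list[str], seq2: list[str]) -> int:
    """Minimum number of deletions, insertions, and substitutions
    needed to turn seq1 into seq2, computed over whole words."""
    prev = list(range(len(seq2) + 1))
    for i, a in enumerate(seq1, start=1):
        curr = [i]
        for j, b in enumerate(seq2, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def set_distance(seq1: list[str], seq2: list[str]) -> int:
    """Set metric of Equation (1): |s1 ∪ s2| - min(|s1|, |s2|).

    Sequences are treated as duplicate-free sets, as is the case for
    the lexicon, domain, and subdomain representations."""
    s1, s2 = set(seq1), set(seq2)
    return len(s1 | s2) - min(len(s1), len(s2))
```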
4.1.2. Normalization Processes
The results of the introduced similarity metrics depend directly on the lengths of the sequences. Since these are rarely equal (e.g., between NT and Psalms verses), normalization of the similarity measures is necessary to prevent sequence lengths from dominating the results. Additionally, comparing values within statistical methods requires a uniform scale. We therefore apply a normalization based on sequence lengths to each metric, yielding a measure ranging from 0 (identical) to 1 (totally different). We propose two normalization processes that address sequence lengths differently. The first normalization process (Equation (2)) divides the similarity measure between sequences by the length of the longest sequence:

$$D_1(s_1, s_2) = \frac{d(s_1, s_2)}{\max(l_1, l_2)} \qquad (2)$$

The second normalization process (Equation (3)) addresses the bias whereby very long sequences require a large number of deletions in the Levenshtein distance computation; it reduces this bias by discounting the length difference before normalizing by the shorter length:

$$D_2(s_1, s_2) = \frac{d(s_1, s_2) - |l_1 - l_2|}{\min(l_1, l_2)} \qquad (3)$$

where $d$ is the similarity metric ($d_{lev}$ or $d_{set}$) between sequences $s_1$ and $s_2$, while $l_1$ and $l_2$ are the respective lengths of these sequences.
These two normalization processes may encode different information about the similarity between the representations, and since we do not know in advance which process is best suited to a given pair of NT verse and Psalms verse part, we apply both processes to each of the eight representation similarity measures. This results in 16 numerical values in the range [0, 1] for each pair of verses.
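The following sketch combines the two metrics defined above with the two normalizations to produce the 16 features of a verse pair; Equation (3) is implemented as reconstructed above, and the representation names are those of Section 3.9.

```python
def normalize_1(d: int, l1: int, l2: int) -> float:
    """Equation (2): divide by the longest sequence length."""
    return d / max(l1, l2) if max(l1, l2) else 0.0

def normalize_2(d: int, l1: int, l2: int) -> float:
    """Equation (3): discount the unavoidable length difference,
    then divide by the shortest length."""
    if min(l1, l2) == 0:
        return 0.0
    return (d - abs(l1 - l2)) / min(l1, l2)

ORDERED = {"tokens", "lemmas", "pos", "stop_words", "filtered_lemmas"}

def similarity_features(nt_verse: dict, ps_part: dict) -> list[float]:
    """16 features for one NT-verse/Psalms-part pair: two normalized
    measures for each of the eight representations (assumed to be
    duplicate-free for the unordered representations)."""
    features = []
    for name in nt_verse:  # the eight representation keys
        a, b = nt_verse[name], ps_part[name]
        d = levenshtein(a, b) if name in ORDERED else set_distance(a, b)
        l1, l2 = len(a), len(b)
        features.append(normalize_1(d, l1, l2))
        features.append(normalize_2(d, l1, l2))
    return features
```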
Finally, we obtain a knowledge-based model composed of 24,555,327 samples (all possible pairs between NT verses and Psalms verse parts), each described by 16 features. This model forms the basis for various analyses of Psalmic reuses in the New Testament at different scales. Five samples are presented as examples in Appendix B. As this methodological study focuses on our numerical method for detecting Psalmic reuses in the NT, we propose three approaches linked to the established gold standard: reuse prediction, detection, and clustering. By comparing with this reference, we can measure detection performance and the coherence of the clustering with the typologies.
4.2. Reuse Prediction
Note that the ultimate goal of this tool is to assist philologists, patrologists and biblical scholars. We simulate the search for Psalmic reuses in the NT by performing reuse prediction on our intertextuality model. The process takes as input an NT verse (or a range of NT verses) from the gold standard dataset and returns as many predicted Psalms verse parts as there are gold standard references for that input. Figure 3 illustrates this prediction process schematically.
The algorithm sorts all the verse pairs in the textual reuse model containing the input NT verse, from the most probable reuse occurrence to the least probable, and returns the N Psalms verse parts associated with the N best-ranked pairs, where N is the number of Psalmic reuses related to the NT verse in the gold standard data. The predicted Psalms verse parts are then compared with the Psalms verse reuses recorded in the gold standard.
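A minimal sketch of this prediction step is given below, assuming the model is stored as records of (NT verse id, Psalms part id, 16 features) and using the overall mean of the measures as the sorting rule; the data layout is an assumption, and the choice of sorting rule is discussed next.

```python
def predict_reuses(model, nt_verse_id: str, n: int) -> list[str]:
    """Return the N Psalms verse parts most similar to a given NT verse.

    `model` is assumed to be an iterable of (nt_id, ps_id, features)
    records, where `features` holds the 16 normalized measures.
    Lower scores mean greater similarity (0 = identical)."""
    candidates = [(sum(f) / len(f), ps_id)
                  for nt_id, ps_id, f in model if nt_id == nt_verse_id]
    candidates.sort()  # ascending mean measure
    return [ps_id for _, ps_id in candidates[:n]]
```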
In this strictly defined process, the sorting rule is an open choice. As our model comprises various types of similarity measures (literal, lexical, grammatical and semantic), predictions can be performed either independently on each similarity measure (e.g., predictions based only on lexical similarity) or through a combination of several measures (e.g., the mean of the similarity measures under the same normalization). In this study, we evaluate multiple cases: each of the 16 similarity measures taken individually, the mean of the eight measures for each of the two normalizations, and the overall mean value.
Note that in a real usage case (i.e., without a gold standard), the user defines the input verse, selects a sorting rule and specifies the number N of Psalms verse parts to predict, because the number of Psalmic reuses in the input verse is unknown.
4.3. Reuse Detection
The detection process simply involves computing a score for each pair in the textual reuse model and returning the top-scoring pairs (without any input). As with the prediction method, detection is based on a sorting rule chosen by the user. Here, we attempt to detect new intertextuality occurrences by using the mean values of the eight representations’ similarity measures for both normalizations and the overall mean value.
In a real usage case, reuse detection is a complementary approach to reuse prediction: because it does not depend on an input verse, it allows for the discovery of unexpected reuses. However, the ranked list of candidate pairs is too long to examine exhaustively, so in a large corpus not all reuses can be found through detection alone. Additionally, not all sorting rules can be applied, due to the high number of false positives generated when certain similarity measures are used alone (e.g., the similarity measure of PoS representations).
4.4. Reuse Clustering
The multi-similarity measures approach enables comparison of similarity types (literal, grammatical, lexical and semantic) in order to categorize reuses according to their balance of these measures. The corresponding numerical methods are based on the statistics of a reuse dataset, which is obtained by extracting samples from the textual reuse model (i.e., building a reduced dataset by selecting pairs of verses). For this study, the reduced dataset is defined by the gold standard of reuses to enable a comparison of the statistical categorization (a posteriori) with the typologies of reuse established by experts.
Clustering methods are used to find a statistically meaningful partition of the dataset. These methods investigate the underlying structure of the dataset, grouping its samples according to their relative distances within the space defined by the features (in this case, the eight similarity measures). Several clustering algorithms (K-means, Gaussian mixture models, hierarchical clustering, and DBSCAN in UMAP Atlas) were tested and compared with the gold standard. For clarity, this study only presents and uses one clustering, obtained with K-means, which can be considered the simplest and most common clustering algorithm. It partitions n samples into k groups, with k defined by the user, such that each sample belongs to the cluster with the nearest mean (cluster center), thereby minimizing within-cluster variances (i.e., squared Euclidean distances); see MacQueen (1967) for more details.
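For reference, this clustering step corresponds to a standard scikit-learn call; the input matrix below is a random stand-in for the real 682 × 16 reuse dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# X: 682 gold-standard reuse pairs x 16 similarity features
# (random stand-in for the real dataset).
rng = np.random.default_rng(0)
X = rng.random((682, 16))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # cluster index (0-4) per reuse pair
print(np.bincount(labels))      # cluster sizes
```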
5. Results
5.1. Detecting Psalms Reuses
5.1.1. Comparison with Standard Text-Reuse Tool
The same Psalms reuse detection experiment was conducted using Passim,17 a standard text reuse detection tool based on n-gram alignment (Smith et al. 2015), to enable a performance comparison on the same corpus (preprocessed as explained in Section 3) and gold standard (see Section 2). For both Passim and our method, we performed Psalms reuse detection in the New Testament in a fully unsupervised manner. We counted the number of detections that effectively belonged to the gold standard and derived the percentage of gold standard reuses found. For our multi-representation similarity measures, we used the mean values of the measures under normalizations 1 and 2 as the sorting rule. For Passim, given its large number of parameters and its sensitivity to them, various sets of parameters and text representations were explored to maximize performance.18 Note that, to generate comparable results, we configured each method to produce exactly the same number of reuse guesses. Results for both methods, including Passim’s test cases for three different text representations, are displayed in Table 1.
The results in Table 1 suggest that text reuse is difficult to detect in this dataset, regardless of the method or text representation used. However, as these results are based on only 240 reuse guesses, while the gold standard contains 614 reuses, these statistics must be interpreted in relative, not absolute, terms. Within this setting, our method, based on multi-representation similarity measures of text, significantly outperforms Passim’s best results. We also observed that lemmatization and stop word filtering negatively affected n-gram alignment for Psalms reuse detection in the New Testament. An advantage of Passim over our method is that it avoids the NLP steps required to build text representations; on the other hand, Passim’s sensitivity to its parameters requires an optimization that may be time-consuming and difficult, while our approach does not require tuning.
5.1.2. New Psalms Quotations and Echoes Detected in the New Testament
Although the textual reuses of the Psalms in the NT have been widely investigated and referenced in our gold standard dataset, applying the detection process defined in Section 4.3 led to the detection of new reuses. As sorting rules, we used the mean values of the measures under normalizations 1 and 2. Of the first 240 detection results that we reviewed, i.e., 240 pairs of New Testament verses and Psalms verse parts, 56 were existing quotations and 10 were echoes already referenced in the gold standard dataset. Of the remaining verse pairs, 68 were possible reuses that were, to our knowledge, previously unpublished; the rest were false positives. As with the gold standard occurrences, we annotated the detected reuses according to the same typology. The profile of the newly detected possible reuses differs greatly from the gold standard distribution (see Table 2 in comparison with Figure 1a,b).
Among the 68 reuses detected, 20 are distant lexical coincidences, which cannot be considered real text reuses but rather testify to a common lexical substrate. For example, the word τέλος, meaning “end,” appears in the recurrent Psalmic formula “Εἰς τὸ τέλος” and retains its literal meaning in the New Testament verse; similarly, the expression “ἐν γλώσσῃ” was found. Another 45 detections are implicit echoes of Psalmic verses. For example, Acts 4:24 echoes Psalm 134:6. The former reads Δέσποτα, σὺ ὁ ποιήσας τὸν οὐρανὸν καὶ τὴν γῆν καὶ τὴν ϑάλασσαν καὶ πάντα τὰ ἐν αὐτοῖς, “Lord, you who made heaven and earth and the sea and everything in them,” while the latter reads πάντα, ὅσα ἠϑέλησεν ὁ κύριος, ἐποίησεν ἐν τῷ οὐρανῷ καὶ ἐν τῇ γῇ, ἐν ταῖς ϑαλάσσαις καὶ ἐν πάσαις ταῖς ἀβύσσοις, “The Lord has done everything he wanted to do in heaven and on earth, in the seas and in all the depths.” The two verses share essentially the semantic domain of creation. Another example is John 10:3 and Psalm 99:3, which share the word πρόβατα. Of these echoes, one-third share no words with the proposed New Testament verse and exhibit only thematic similarity. Finally, only three detections are implicit non-literal quotations; they are discussed below.
In the first two cases, the Bible’s annotations refer to another psalm because these verses are undoubtedly part of an explicit, longer quotation: 1 Peter 3:10–12 quotes Psalm 35:13–17, and Hebrews 1:6–8 quotes Psalm 8:5–7. In the last case, Hebrews 10:8 repeats Hebrews 10:5 and refers to the same psalm in the Bible apparatus; since the first reference occurs only a few verses earlier, noting the corresponding verse again was deemed unnecessary. In conclusion, none of the new detections are useful for exegetical or theological analysis.
5.2. Predictions of Psalms Reuses in the New Testament
As introduced in Section 4.2, the automatic prediction of reuses is an experiment conducted with our tool on a labeled dataset, i.e., the gold standard. After reviewing the reuse detections, we added the 68 positive reuses detected to the existing dataset presented in Section 2, yielding a complete reuse dataset of 682 occurrences. The model performs a prediction for each of these entries: it tries to predict a source Psalmic verse from a New Testament verse input, seeking the highest similarity value among all possible pairs of NT verses and Psalms verse parts. This prediction test was applied independently for each type of representation in order to compare the performance of each similarity measure and evaluate their respective contributions to a global prediction rate. Prediction success is measured by prediction rates, i.e., the ratio of true predictions to the total number of predictions for each representation. These results are presented in Figure 4, which shows a bar plot of the 16 prediction rates and contributions. Contribution measures how relevant a representation is to the prediction, i.e., the increase in true predictions due to a specific representation. Like the prediction rate, contribution is a ratio of true predictions for each representation, but it is computed within the subset of true predictions obtained through a single representation only.
5.2.1. Prediction Rates
First, we observe similar trends in the results of both normalizations when the eight bars on the left (normalization 1) are compared with the eight bars on the right (normalization 2). The normalization 1 results are always significantly higher than the normalization 2 results, except for the subdomains. This is because normalization 2 is much more sensitive, generating more false positives. Second, comparing the prediction rates of the different representations yields the following results:
Filtered lemmas and lexicon are the best representations for automatically predicting reuses.
The lexicon has significantly better performance (in terms of prediction rate and contribution) than filtered lemmas. This means that using a synonym database (see Section 3.7) is relevant for reuse detection.
Tokens are slightly better than lemmas, especially for normalization 1. This is likely due to the presence of stop words in these two representations because lemmatized stop words may produce more false positives.
PoS, stop words, domains and subdomains—i.e., the most abstract representations—exhibit lower prediction rates than representations that include meaningful terms.
Finally, we note that the prediction rates are quite low: the highest value is approximately 0.3, and only eight of the 16 bars exceed 0.1. However, the overall prediction rate (i.e., the ratio of correct predictions obtained from all representations combined) reaches 0.5. This means that reuse predictions are not achieved by any single representation. The contribution score aims to measure the impact of each representation on this overall prediction rate.
5.2.2. Contributions
We observe that lexicon 1 accounts for the largest share of the reuses predicted by a single representation. The surprising result, however, is that, except for PoS and domain 1, all representations increase the overall prediction rate. Even representations with very low prediction rates, or that appear redundant, provide information to the automatic reuse model. This is particularly noticeable for domain 2 and subdomain 2, which have relatively high contributions compared to the same representations under normalization 1. These results demonstrate the relevance of normalization 2 for reuse prediction, and of a multi-representation approach combining literal, grammatical, lexical and semantic similarity measures.
5.2.3. Predictions by Typologies
Reuse predictions are not uniformly distributed within the gold standard typologies. Among the correct predictions (about half of all cases, see above), quotations and explicit reuses are strongly overrepresented compared to echoes, coincidences and implicit reuses. Figure 5 shows these results, with high scores for quotations and explicit reuses, and mitigated results for the rest. Due to the significant proportion of echoes and implicit reuses in our corpus, the average result is closer to the individual prediction rates and the overall rate given in Figure 4. More precisely, the missed quotations are implicit or modified ones; the only exception is one literal, explicit quotation (John 10:34/Psalm 81:6), which leads to a prediction rate of 0.98 for literal quotations. Given the nature of our corpus, composed of a majority of echoes, coincidences and implicit reuses (cf. Figure 1), achieving such a high prediction rate for literal and explicit text reuse is a significant achievement.
5.3. Clustering of the Psalms Reuses in the New Testament
We created a reuse dataset consisting of 682 rows, each described by 16 values ranging from 0 to 1: the 16 similarity measures derived from our eight representations and two normalizations. As explained in Section 4.4, we used the K-means method to cluster these reuse occurrences into a small number of groups sharing literal, grammatical, lexical, and semantic characteristics. We selected five clusters and compared them to the third typology described in the gold standard section: literal, modified, lexical, semantic, thematic, and mixed. After clustering all reuses into five groups, we computed the consistency of each group with respect to the reuse types. Finally, we plotted the proportion of each reuse type within the five clusters in Figure 6.
As can be seen in Figure 6, there is significant coherence between our third typology and the clustering results, visible through the type proportions (indicated by colors) in the bars representing the reuse clusters. Each cluster can be characterized and attributed to a type as follows:
Thematic cluster: Thematic echoes with a small amount of lexical reuses, as well as some rare modified, literal and semantic reuses.
Mixed cluster: Mainly mixed thematic, lexical and semantic echoes.
Semantic cluster: Almost all semantic reuses are found in this cluster.
Pseudo-literal cluster: A significant portion of literal and modified (near-literal) quotations, as well as thematic and semantic reuses.
Literal cluster: The vast majority of literal and modified quotations.
Note that lexical echoes or coincidences and modified quotations, which are not quantitatively predominant, are not really assigned to a specific cluster but are rather spread across all clusters. This is probably because lexical similarity forms the basis of our intertextuality model, as demonstrated by the prediction results in Section 5.2 above; the model thus has difficulty differentiating lexical reuses from the other types. We also observe two opposite trends in the proportions of reuse types: as the number of literal quotations in a cluster increases, the quantity of thematic reuses decreases. We can therefore interpret the process of reuse attribution corresponding to our third typology as follows: the more subtle a reuse is (i.e., the farther it is from a literal quotation), the more likely the expert is to tag it as thematic. Finally, the consistency of the unsupervised reuse classification, shown in Figure 6, indicates that similarity measures based on verse representations encode information about the linguistic categories of reuse, and can thus help characterize intertextuality between two corpora.
In addition, to interpret the clustering, we computed eight mean similarity measures, one for each representation, for each cluster, and plotted these mean measures on radar charts (see Figure 7) to compare the values and shapes of the five clusters for a given normalization. Regarding the comparison between normalizations, normalization 2 is clearly more sensitive to differences between reuses, as the cluster shapes diverge much more than they do with normalization 1. Although normalization 1 produced significantly better prediction results, normalization 2 is more suitable for characterization; for clarity, we therefore present only the normalization 2 radar charts. As expected, the cluster shapes shrink as the proportion of literal reuses increases (from cluster 1 to cluster 5). We also found that the parts-of-speech similarity measures were lowest because parts-of-speech values are restricted to a few tags. As expected, measures based on the lexicon and filtered lemmas exhibit the greatest variation between clusters. Measures based on domains and subdomains are not significant for discriminating between clusters in general; however, the semantic subdomain and domain representations help distinguish clusters 1 and 2 from cluster 3, which contains the semantic reuses. Clusters 1 and 2 (thematic) also differ from cluster 3 (thematic and lexical), a difference that lies mainly in parts-of-speech and stop words. This can be explained by the nature of lexical similarity, which involves stop words; a textual analysis of the reuses in clusters 1 and 2 should confirm this. In conclusion, cluster content analysis is a relevant method for characterizing reuse clustering. In the case of an unlabeled dataset, it can be used to propose a typology of reuses based on the clustering.
6. Discussion
6.1. A Reuse Characterization Tool
As an extension of the cluster content analysis in Section 5.3, we present examples of reuses analyzed through their similarity measures and plotted on radar charts. Figure 8 shows two such charts: the left chart contains three quotations (one explicit, one implicit, and one implicit quotation detected by our tool), and the right chart contains two implicit echoes.
First, significant variations in similarity measures are observed between reuses, even when they belong to the same type (quotation or echo). The two selected echoes (Figure 8b) have very different profiles. The first echo, between 1 Corinthians 13:1 (κύμβαλον ἀλαλάζον) and Psalm 150:5 (ἐν κυμβάλοις ἀλαλαγμοῦ), has a high semantic similarity, as seen in the domains and subdomains values; however, the difference in the context and meaning of this common expression explains why it is characterized as an echo. The second echo, between John 4:36 and Psalm 125:5, exhibits better grammatical and lexical similarities, as seen in the parts-of-speech and lexicon values; both verses share the theme of harvest, but their contexts exclude the idea of quotation.
Quotations exhibit very low values compared to echoes. We observe a quantitative difference between our three quotations: the literal quotation of Psalm 15:10 found in Acts 2:27–28 has lower values than the two modified quotations, except for the stop words value, which is lower for the detected quotation because of its length and its high number of common lemmas.
The variations between the two implicit quotations (the detected one and the gold standard one) demonstrate qualitative differences. The reuse of Psalm 143:3 in Hebrews 2:6, which experts did not characterize as a quotation due to its proximity to Psalm 8:5, was detected because of the grammatical similarity between the two verses.
The gold standard quotation of Psalm 21:19 in Matthew 27:35 demonstrates grammatical similarities in Parts-of-speech and stop words, and lexical similarities in filtered lemmas and lexicon.
Note that the lemma and token measures are also sharply affected because many stop words are taken into account. Finally, this brief analysis of reuse examples demonstrates the intertextuality model’s ability to capture reuse characteristics and provides methods for analyzing reuse at various scales, from single reuses to groups of tens or hundreds of reuses, as shown in the clustering results of Section 5.3.
6.2. Psalmic Reuses in the NT Map
We created a map that illustrates the locations and types (typology 1) of all the reported and categorized reuses of Psalms in the New Testament (NT), whether they originate from the literature (see Section 2) or from the detection results (see Section 5.1). The map provides an overview of trends in Psalmic quotations, echoes, and coincidences across NT chapters (Figure 9), as well as of explicit and implicit reuses (Figure 10). With these maps, one can make qualitative observations, such as comparing the density of reuses in different NT books. The Book of Revelation reuses the Psalms most frequently (mainly through echoes), while the Epistle to the Hebrews exhibits the highest density of quotations (there, echoes are rare). Additionally, the first chapter of the Gospel of Luke contains a high number of quotations, and several patterns are repeated across the Gospels. For example, all four Gospels contain the same quotations of Psalm 21 during Jesus’ passion, and three quotations of Psalms 109 and 117 (cited twice) appear in a similar pattern in the second half of each Synoptic Gospel. Note also the evolution of reuse types and density in the Pauline epistles, especially the low number of reuses (only a few echoes) in the second half of the Pauline corpus, as defined by the order retained here.
Figure 10 shows that although the Catholic epistles and the Book of Revelation are among the NT texts densest in reuses, they contain no explicit reuses. The Gospel of John reuses the Psalms as many times as the three Synoptic Gospels combined. Conversely, the influence of particular psalms on the NT is evident: the significance of Psalm 109 stands out, especially in the Epistle to the Hebrews, and Psalm 21 is referenced repeatedly in the Synoptic Passion narratives. These maps illustrate the significant role of Psalmic echoes in the NT, where they serve as a woven backdrop. While quotations play an essential, albeit limited, role in specific narratives or demonstrations, especially when made by Jesus himself, echoes better demonstrate the pervasive presence of the Psalms in the NT and the continuity between the two testaments.
6.3. A General Tool for Textual Reuse Detection
Our goal was to incorporate as many features as possible from traditional natural language processing (NLP) tools to determine whether a combined approach could produce useful results for biblical and patristic scholars working on BiblIndex, all the while avoiding the black-box effect. The resulting tool detects quotations and certain thematic and semantic echoes in texts that have not yet been analyzed by humans. In the context of the BiblIndex project, our work demonstrates that merely reporting reuse occurrences between a biblical corpus and another corpus citing biblical texts is insufficient: intra-biblical references must be analyzed in parallel, because many cross-reuses may be missed by analysts, as demonstrated by the detection results in Section 5.1. Note that the reuse of the Psalms in the New Testament has been widely studied and mainly takes the form of echoes, as explained in the dataset preparation (see Section 2); this makes it a difficult test for a reuse detection tool. The clustering-based typology and the possibilities of reuse characterization are also convincing. Additionally, our tool generates fewer false positives than tools that use n-gram methods (see, for instance, Forstall et al. (2014) or the different experiments conducted with TRACER19 in the 2010s). This approach can therefore be useful for analyzing large patristic corpora or other Ancient Greek texts, saving a significant amount of time. However, applying this approach to the same text representations requires digitized linguistic resources, such as synonym lists and semantic domains, or an extension of the biblical resources used here.
Furthermore, depending on the corpus, the proposed detection and prediction methods may miss a large proportion of echoes. This shows that, even though our approach improves reuse prediction through the integration of biblical synonym and semantic domain databases, development efforts are still needed to achieve satisfactory results for a detection tool. This improvement will likely require large language models trained on specific corpora in ancient languages and fine-tuned to measure textual similarity.
6.4. Analyzing Textual Similarity from Intertext Embedding
More generally, the proposed intertext model is not limited to the three operations performed here: detection, prediction, and clustering. Composed of various linguistic-field similarity measures (grammatical, lexical, and semantic) between all pairs of textual entities (in this case, verses) from two corpora, it allows for the analysis of intertextuality at various scales (from verses to entire books) and can be employed in several types of investigations (e.g., text classification and stylometry). Additionally, due to its data structure, which resembles text or sentence embeddings (Mikolov et al. 2013a, 2013b; McGovern et al. 2025), the model can be referred to as “intertext embedding” and described as a knowledge-based sentence-pair similarity represented in a vector space. Through this embedding architecture, machine learning algorithms can perform the aforementioned types of analyses. The clustering of the gold standard reuses proposed here is an example using a restricted set of samples from the embedding; a larger or more specific set of verse pairs could equally undergo clustering analysis.
7. Conclusions
In the context of the BiblIndex project, an online index of biblical textual reuses by the Church Fathers, this work experiments with a numerical approach to the unsupervised detection and characterization of intra-biblical reuses. It introduces a new method for measuring literal, grammatical, lexical and semantic similarities between pairs of textual entities. Unlike recent language models, whose black-box effect limits their usage in philology, the proposed method is designed to be transparent.
The tool is applied to two Ancient Greek corpora: the Book of Psalms from the Septuagint and the New Testament. In parallel, a gold standard of 614 Psalms reuses in the New Testament was compiled from an extended literature review and manually tagged according to three different typologies. Each pair of a Psalms verse part and a New Testament verse is described by a set of numerical measures derived from eight representations obtained through natural language processing operations combined with databases of Greek biblical synonyms and semantic domains. The resulting intertextuality model formally consists of a knowledge-based sentence-pair embedding and is deployed in three reuse analysis tasks: detection of Psalms reuses within the gold standard (outperforming a standard n-gram alignment method) and beyond it (68 new reuses were found); prediction of Psalms reuses from each New Testament verse within the gold standard, to evaluate the model’s efficiency; and clustering of the gold standard reuses to automatically assign reuse types in coherence with the gold standard typologies.
We conclude from this experiment that lexical representations are best suited for reuse prediction, though combining all representations significantly increases the prediction rate from 0.3 to 0.5. This final rate, which is not yet high enough for a reuse detection tool in biblical or patristic studies, is a weighted average over all reuse types, whose individual prediction rates range from 0.38 for implicit echoes to 0.98 for literal quotations. We also compared two normalization processes and determined that the basic process was more relevant for reuse prediction, while the second permits a clearer characterization of reuses and clusters.
Finally, this study demonstrates the relevance of our multi-representation approach, which combines common NLP methods and linguistic databases, for the detection and characterization of reuses. However, prediction results must improve for a fully automated reuse detection tool to cover implicit echoes in the BiblIndex project. A more abstract representation—especially one based on large language models—would likely improve performance, albeit at the expense of interpretability due to a lack of transparency in the method.