Article
Peer-Review Record

Computational Stylometrics and the Pauline Corpus: Limits in Authorship Attribution

Religions 2025, 16(10), 1264; https://doi.org/10.3390/rel16101264
by Anthony Rosa
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 5 August 2025 / Revised: 15 September 2025 / Accepted: 28 September 2025 / Published: 1 October 2025
(This article belongs to the Special Issue Computational Approaches to Ancient Jewish and Christian Texts)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Summary:

The article under review should be published with minor but essential revisions. The main issue that requires correction is re-framing the nature of the critique, which will probably necessitate changing the paper’s title. The article fundamentally critiques the claim that stylometry can provide empirical proof of authorship, on the basis of two primary arguments: (1) different statistical methods produce somewhat different results, and (2) the author’s own statistical methods show proximity between certain New Testament and non-New Testament texts.

However, three major issues arise. First, the author’s fundamental critique is overly narrow, pertaining only to strong claims of authorship based on stylometry; other forms of statistics are not broached. Second, it is something of a straw-man argument: most biblical scholars using statistics and computation would never make so strong a claim as that a single statistical method can prove or disprove authorship. Third, the author’s own statistical analyses provide interesting, positivist results that contribute to many important questions in the field; these analyses triangulate not only with some traditional exegetical conclusions but also with previous statistical analyses, demonstrating clearly their utility in attempts to analyze and understand ancient texts.

Finally, from a methodological perspective, the author must acknowledge that if they deem statistical approaches “incoherent” because (1) different methods produce different results and (2) statistics cannot be absolutely definitive about authorship, then they must equally deem non-computational methods (i.e., all traditional exegesis, historical-cultural methods, philology, etc.) incoherent for exactly the same reasons.

Ultimately, the article is a rebuttal not of the methodology of stylometrics per se (or indeed statistical approaches more broadly, which is the theme of the issue) but only of the very specific and overly-strong (and not widely held, especially by the authors in this special issue) claim that stylometry can prove authorship. It merits especial mention in closing that a great many of the articles in this issue do not use traditional stylometry but other statistical methods, so this rebuttal is further limited in that regard; and in the papers that do use stylometry, none of them make so strong a claim as is attempted for rebuttal here.

 

More specific matters (some redundancy with above):

Stylometric analysis

  • The author uses six features as a proxy for “authorial style”. These are features used in traditional stylometry, and they return what we might term poor results, insofar as they show no real differentiation between Pauline and Julian letters.
  • The conclusion here is a rather narrow one: these are poor proxies for ‘authorial style’. Certainly the field can and should do better: the article has shown how traditional stylometry, even of a more modern and multi-variate kind, is unable to support the conclusions it claims to be able to make, i.e., that there are identifiable authorial fingerprints. But of course WHAT and HOW we choose to code means everything: the article has merely shown that these particular features are not useful, not that stylometry – much less statistical and computational approaches more broadly – isn’t useful per se. (A minimal illustration of the kind of surface-feature coding at issue follows this list.)
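To make the point concrete, a minimal sketch of this kind of surface-feature coding might look as follows. This is a hypothetical Python illustration only: the placeholder strings, the transliterated particles, and the three features chosen here are illustrative assumptions, not the author’s actual six-feature pipeline.

```python
# Hypothetical sketch of "traditional" surface features (mean sentence length,
# type-token ratio, function-word rates). The feature inventory is a design
# decision; different choices here will change every downstream result.
import re
from collections import Counter

FUNCTION_WORDS = {"kai", "de", "gar", "oun", "alla"}  # illustrative transliterated particles

def surface_features(text: str) -> dict:
    sentences = [s for s in re.split(r"[.;·]+", text) if s.strip()]
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    n = len(tokens) or 1
    return {
        "mean_sentence_len": n / max(len(sentences), 1),
        "type_token_ratio": len(counts) / n,
        **{f"fw_{w}": counts[w] / n for w in sorted(FUNCTION_WORDS)},
    }

# Toy usage with placeholder strings standing in for real letters.
print(surface_features("kai de gar. alla oun kai; de kai gar."))
print(surface_features("de de kai. gar alla; oun de kai kai."))
```

Nothing in such an output is an authorial fingerprint unless the features themselves are defensible proxies for style; that is exactly the narrow conclusion the article supports.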

Statistics and historical positivism

  • The author is skeptical that we can learn anything about something like the ‘real historical Paul’ using stylometry, because of the methodological problems mentioned, most notably the lack of a ‘proof text’ that is authentically Pauline. Even if a text such as Romans were Pauline, it has likely undergone many alterations over the years and was likely in part the composition of a scribe. However, such a state of affairs is actually perfectly acceptable for a great deal of scholarship and other thought (theological, philosophical, historical, and otherwise), so long as a given text is understood to reflect an authentically ‘Pauline thought’ or ‘Pauline ideology’ or ‘Pauline community’. This is a widely debated issue within historical studies, for example: most scholars would agree that we have 100% certainty about nothing regarding exact authorship (or even something like ‘Pauline communities’), but we use ‘more and less likely’ historical (and other types of) analysis to engage certain texts as reflecting a particular strand of influential thought. The issue is not to make too extreme a claim, but rather to focus our tools on what we CAN know, namely the features in the text. And stylometry is useful here.
  • But this is what stylometry should do: not find a historical Paul, but rather analyze texts per se to see how they compare. And it can do this in a much more empirical, transparent, and objective way than traditional exegetical analysis.

Stylometry versus traditional methods

  • The author takes serious issue with far-reaching claims of stylometry. This is all well and good; most contributors to this special issue would likely agree. But these critiques must be taken alongside critiques of traditional methods. The question is not whether stylometry can “prove” that a historical Paul wrote one text instead of another, but rather whether it is an improvement on previous methods. And previous methods are equally – if not more – bankrupt on these very same questions. Stylometry cannot prove that Paul wrote Romans, yes. But traditional scholarship has done no better, and I would argue worse, for its focus on features that are stylistically subjective and content-based.
  • On the latter, it is even easier to imitate someone’s content than their style down to an n-gram level of analysis. Just because stylometry cannot do what its most extreme adherents claim (e.g., find the historical Paul) does not mean we should reject it. We should keep in mind its improvement on existing methods, which have been unable to come to any sort of consensus on these very questions over 2000 years.

Positivist contributions via statistics

  • The article provides several useful analyses and conclusions that in fact prove the utility of stylometry. They acknowledge that their clusters linked epistolography but not other types of literature in the Bible. They are right that this doesn’t prove single authorship if two texts are within a cluster, but we need not make that conclusion. Most statistical approaches, in fact, do not go so far as to make that historical, positivist conclusion, but rather conclude more narrowly on texts and text-features. Most recent statistical methods make conclusions along the lines of, ‘These groups of texts are similar in a, b, and c ways, but less similar to other given texts in x, y, and z ways.’ These are exactly the conclusions the author finds, and they are positivist, empirical, and useful for historical analysis. They should then marshal the ambition to dig deeper: which texts, to which degrees, and according to which criteria? These are questions only stylometry, and computation more broadly, can solve.
  • The author’s cluster map is actually tremendously interesting. Behold the proximity of the Gospels, the outlier cluster of Colossians and Ephesians, and the notably broad cluster of epistolography. Cluster mapping has performed as it should. The author should then explore the interesting dimensions of these findings: note the location of Revelation amidst other purportedly different texts; the relative clusterings of the Julian letters in the lower right; the relative clusterings of the Undisputed Letters in the middle; the particular proximities of pairs like Romans and 1 Corinthians, or Galatians and Hebrews and Revelation. There is a wealth of interesting follow-up analyses and studies to conduct. Much the same can be said for the Cosine Similarity graph and discussion (a minimal sketch of both operations follows this list).
  • And wouldn’t a traditional, textual scholar largely agree with these clusterings? With these differences? With these pairings? And cosine similarities of pairs like Romans and 1 Corinthians?
  • Note too the clusters with the Julian letters removed. We have the Gospel narratives and Acts (no surprise); a variety of non-Pauline epistolography (no surprise either); a large cluster of Pauline letters, both authentic and not (also no surprise); and then Colossians and Ephesians (perhaps no surprise there either). These findings triangulate strongly with qualitative, traditional analysis – an important step forward for validating those points of consensus AND the utility and accuracy of the author’s statistical methods.
  • As the author themself notes, these findings cohere to a huge degree with many existing studies and findings! It seems their critique extends only to stylometrists who make overly strong claims about authorship; every other claim of statistical approaches still stands. Most statistical-biblical scholars would agree with this: we shouldn’t make overly strong, essentialist, “proof” claims using statistical methods; rather, we should use these methods as another tool to explore similarities and differences between texts. The author, dare I suggest, would agree, and there is therefore much less reason for polemic here than is presented in the title, framing, or conclusion; they should all be modified accordingly.
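To make the two operations concrete, here is a minimal sketch of pairwise cosine similarity and average-linkage clustering of document vectors. This is a hypothetical Python illustration using scikit-learn and SciPy; the “documents” are invented placeholder strings, not the Greek texts, and character n-grams are merely one reasonable feature choice among many, not the author’s.

```python
# Illustrative sketch (not the author's pipeline): cosine similarity between
# document vectors, then hierarchical clustering of the same vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.cluster.hierarchy import linkage, dendrogram

docs = {
    "Romans_toy": "grace law faith works spirit law faith",
    "1Cor_toy": "grace faith body spirit wisdom faith",
    "Ephesians_toy": "mystery fullness heavenly places grace",
    "Colossians_toy": "mystery fullness head body heavenly",
}
labels = list(docs)

# Character 2-3-grams as a stand-in feature set.
X = TfidfVectorizer(analyzer="char", ngram_range=(2, 3)).fit_transform(docs.values())

# Pairwise cosine similarities, as in the article's similarity discussion.
sims = cosine_similarity(X)
for i, a in enumerate(labels):
    for j, b in enumerate(labels):
        if i < j:
            print(f"cos({a}, {b}) = {sims[i, j]:.3f}")

# Average-linkage clustering over cosine distance, as in many cluster maps.
Z = linkage(X.toarray(), method="average", metric="cosine")
dendrogram(Z, labels=labels, no_plot=True)  # set no_plot=False to visualize
```

Swapping the feature set (function words, lemmata, syntactic tags) or the linkage method will reshuffle the picture somewhat, which is precisely why the follow-up questions above (which texts, to which degrees, according to which criteria) are worth pursuing explicitly.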

Critiques of statistics versus traditional methods

  • The author also overviews a few different stylometry scholars and methods, argues that different methods and approaches find smaller groupings that differ from one another, and frames this fact as a fundamental and fatal methodological critique. It is certainly true that different methods and approaches produce somewhat different findings, and indeed this is something the field is currently sorting out: which methods are best, most useful, and relevant? Different ways of coding, different forms of analysis, and so on all give us different results when we really drill down. But this is a *feature* of this type of analysis, not a bug. The findings of previous stylometrists concern particular ways of grouping texts according to particular features, which find particular areas of overlap. Again, this is useful – and empirically transparent too.
  • If we are to critique such methods as being useless and “incoherent”, as the author suggests, then we should critique all historical, philological, philosophical, and indeed every other qualitative approach as equally useless and “incoherent”, for manifestly the same reason: scholars within these areas disagree, dispute, and reach differing conclusions about Paul. What we would be left with is utter incoherence, absolute aporia, and supra-skepticism about all analysis per se, simply because scholars disagree and use different methods. Traditional exegesis hasn’t been able to prove which (sections of which) letters the historical Paul wrote: throw it all out! This cannot be so: we interrogate our methods, compare findings, triangulate results, explore the strengths and weaknesses of different approaches, and transparently provide our definitions of terms, scholarly assumptions, and methodologies to encourage open debate. Stylometry – and all statistical analysis – does the same.

Reference corpora for triangulating analyses

  • Why can’t we have a reference corpus? Why not trust the biblical scholars who have partitioned the texts between Pauline and non-Pauline? Even if this isn’t guaranteed to be true (it’s not, but “guarantee” is an impossibly high bar), it draws from a vast body of careful cultural, historical, and textual study that groups the texts. Then we can usefully analyze: how do these two sets of letters compare? Where do they overlap and differ? What do they say about highly disputed letters, such as 2 Thessalonians? (A minimal sketch of this kind of comparison follows this list.)
  • And even if we can’t, even if we have no “reference corpus”, we can still draw the same kinds of conclusions: which texts are similar to which, where, how, and to what extent? The problem the author continually returns to is a too-strong claim about authorship. I and most others would agree – but what statistics CAN tell us relates to the texts (i.e., the data) we have today: their shape, their features, and how all of this compares to other texts.
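As an illustration of the workflow described above, here is a hypothetical sketch of scoring a disputed letter against centroids built from a consensus “undisputed Pauline” set and a “non-Pauline” set, in the spirit of a Burrows’-Delta-style comparison. The frequency values and the three-feature profile are invented placeholders, not measurements from the actual letters; only the workflow is the point.

```python
# Hypothetical sketch: use scholarly consensus as a reference corpus and score
# a disputed letter (e.g., a 2 Thessalonians profile) against the two centroids.
import numpy as np

def zscore_columns(matrix: np.ndarray) -> np.ndarray:
    # Standardize each feature column across all documents.
    return (matrix - matrix.mean(axis=0)) / (matrix.std(axis=0) + 1e-9)

# Rows = documents, columns = relative frequencies of selected function words (invented).
undisputed_pauline = np.array([[0.041, 0.012, 0.008],
                               [0.039, 0.015, 0.007],
                               [0.043, 0.011, 0.009]])
non_pauline = np.array([[0.025, 0.022, 0.015],
                        [0.027, 0.020, 0.014]])
disputed = np.array([0.036, 0.016, 0.010])  # placeholder profile for a disputed letter

z = zscore_columns(np.vstack([undisputed_pauline, non_pauline, disputed]))
z_pauline, z_other, z_disputed = z[:3], z[3:5], z[5]

# Delta-style score: mean absolute z-score distance to each reference centroid.
delta_pauline = np.abs(z_disputed - z_pauline.mean(axis=0)).mean()
delta_other = np.abs(z_disputed - z_other.mean(axis=0)).mean()
print(f"delta to Pauline reference: {delta_pauline:.3f}")
print(f"delta to non-Pauline reference: {delta_other:.3f}")
```

The output is not a verdict on authorship; it is a transparent, reproducible statement of relative similarity to two scholar-defined groupings, which is exactly the kind of conclusion urged above.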

Methodological over-reach

  • The author should not – and cannot, I submit – claim that we should be “questioning stylometrics as a contributive discipline for Pauline authorial studies”. It can obviously contribute: it can map, it can test, and it can engage in empirical discussion with the qualitative, traditional methods that approach Pauline authorial studies. Unless the author wants to get rid of authorial studies entirely (which would be the implication of their critique of stylometry per se; see above), scholars need to look to the contributions of all available analytic tools. Just because a tool is not definitive, and some have over-stated its use, does not mean it should be discarded entirely or that it has no function; the same could easily be said for every traditional method ever applied to biblical studies. Instead, we should put statistics in conversation with “fundamental epistemological problems”, as the author indeed suggests, and ultimately pair computational methods with “verified historical evidence to produce reliable conclusions”. I believe it is fair to say that not a single author in this issue would disagree.

Author Response

Please see the attached PDF. Thank you for your detailed feedback!

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This is a provocative and well-argued study, well-informed by recent and relevant research. I commend the author(s) for being at home in both theological and computational studies. It is a significant contribution to the field of Pauline studies.

There is only one thing that comes to mind of which the author(s) seem(s) unaware. The discussion of ancient manuscripts (p. 4 of 22) notes that the autographs are lost and that there is therefore uncertainty about the faithfulness of transmission; it is, however, equally uncertain whether all extant manuscripts would generate similar results. On the one hand, this uncertainty supports the author’s/authors’ point, but on the other hand it potentially undermines it, since the results of this study are based exclusively on the edition of the SBLGNT.

One final note. On p. 19, it is said that ... "the malleability of the outcomes based on feature selection supports Van Nes’ conclusions." It is, however, not clear to me which conclusions.

Author Response

Please see the attached PDF. Thank you for your feedback!

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper offers a blistering critique of computational attempts to sift through the corpus of Pauline letters in the NT. It is well written and engages with a wide range of scholarship, and I have great sympathy with the author's conclusion that computational methods cannot entirely supplant traditional analytic, literary, and intellectual analysis in determining questions of authenticity. Its principal weakness, however, is conceptual. The basis of the argument rests on the idea that we have no 'ground truth' for Pauline authorship and, as a result, no genuine 'stylome' to compare. One can't help but feel that this is a bit of sophistry: the notions that (a) 'one person was responsible for writing some core of the Pauline corpus, and that this person's style can be detected and analysed', and (b) 'that that person is the Paul described in Acts' are not in fact the same question. Obviously, there is no way a computer can ever answer the second question, but I don't think the author's methods go far enough to disprove the first. A fruitful comparand from the field of ancient Greek stylometrics would have been the Homer and Hesiod question: I wonder how it would be received if one asserted that one cannot possibly use stylometric methods on these texts because we cannot establish the 'ground truth' of who, historically, the 'author' was. That said, it is possible the author is right that computational methods cannot reliably determine, across different experimental set-ups, that the same author wrote, say, Ephesians and Colossians, or any other subset, but that does not actually depend whatsoever on the question of the historical Paul.

This problem is compounded by the unfortunate decision to use some epistles of Julian as a comparand. The selection of letters is never justified, and one wonders what motivated it. Worse still, Julian is a uniquely bad case: emperors are busy people, and Julian did a lot in his brief reign. Some of the letters may have been written by him (with amanuenses, etc.), but others may simply have been signed by him. Using a disputed corpus as a distractor or comparand for another disputed corpus is unsound, and one can't help but wonder why the letters of Libanius were not used: from the same cultural ambit, but widely (and correctly) thought to be the work of a single author in the normal sense of the term. Would the results be the same?

In brief, while there is merit here, some fundamental issues need to be addressed, and the choice of Julianic letters is both unmotivated and methodologically unsound. Alongside this there are minor problems that should be addressed: one cannot simply dismiss an analysis by J.N. Adams, possibly the most expert Latinist of the second half of the twentieth century, as dated and unproven; such a dismissal needs to be backed up by evidence. There are also some problems of citation: for example, the essay of Hilla Halla-aho is attributed to 'Oxford Scholarship Online' rather than the book in which it actually appears.

Author Response

Please see the attached PDF. Thank you for your feedback!

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

A significant improvement that better and more narrowly maps the statistical findings onto qualitative issues around (i.e., strong claims regarding) authorship attribution. We are grateful to the author for their extensive and constructive engagement with the first round of feedback. Core points of disagreement remain, both in terms of methodology and content, but this is to be celebrated as exactly the purpose of such a piece, and it therefore belongs in the conversation. The author is thoughtful, careful, and rigorous, and contributes a useful and substantive piece at the intersection of computational statistics and traditional biblical studies.
