Minimal Computing and Weak AI for Historical Research: The Case of Early Modern Church Administration
Abstract
1. Introduction
2. Tasks and Requirements
- T1: Extracting relevant semantic information (dates, persons in different roles, categorical information of different types, geographical entities, temporal information, etc.) into information atoms or semantic elements.
- T2: Harmonizing, standardizing, and normalizing these elements across all their occurrences, e.g., by transforming dates into machine-readable formats, identifying identical persons, places, territories, etc., and potentially linking them to authority data (Wikidata, etc.).
- T3: Reassembling the elements within one entry or across multiple entries into larger semantic or ontological structures (a knowledge graph), such as combining elements that describe distinct events (provisions involving stakeholders, places, dates, etc.) and the distinct objects these events deal with (church offices having a specific legal framing or institutional embedding).
- T4: Using the resulting knowledge graph as a basis for analyzing the semantic data and gaining various insights, such as quantifications and rankings, geographical patterns, social networks, complex patterns and clusters, temporal trends, etc.
- R1: All automation needs to be reproducible, fast, and available offline on multiple personal computers that synchronize and harmonize their individual data later on.
- R2: Model predictions are validated, and the validated and corrected results must feed back into the training data to retrain the model. Predictions must also be interpretable and deterministic to the degree that they truly reflect the shape and quality of the training data and no other semantic context.
- R3: Data models and the ontology must conform strictly and exclusively to the assumptions of domain experts, while also conforming to formalized and machine-readable standards (e.g., OWL).
- R4: Analytical or heuristic functions must be deterministic and simple enough to be validated by doctoral students with no expertise in mathematics or computer science.
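As an illustration of T2, the following sketch normalizes abbreviated Latin place adjectives against a small gazetteer, using only Python's standard library. The gazetteer entries, the helper `normalize_place`, and the similarity cutoff of 0.8 are assumptions made for this example (only the pair Senen/Siena appears in the synopsis table of this article); it is not the project's own linking code.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical mini-gazetteer: abbreviated Latin adjective -> normalized place name.
# A real gazetteer would be far larger and could carry authority IDs (e.g., Wikidata).
GAZETTEER = {
    "senen": "Siena",
    "bononien": "Bologna",
    "florentin": "Florence",
}


def normalize_place(span: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the normalized place for a source span, or None if nothing is close enough."""
    key = span.lower().strip(" .")
    # get_close_matches ranks candidates by difflib's similarity ratio (0..1).
    matches = get_close_matches(key, list(GAZETTEER), n=1, cutoff=cutoff)
    return GAZETTEER[matches[0]] if matches else None


if __name__ == "__main__":
    for span in ["Senen", "Senens.", "Parisien"]:
        print(span, "->", normalize_place(span))
```

Spans without a sufficiently close gazetteer key return `None`, so they could be set aside for manual review rather than being linked silently.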
2.1. Task 1: Extracting
2.2. Task 2: Normalizing and Linking
2.3. Task 3: Grouping
| Listing 1: Feature engineering for the CatBoost classifier, using the Python packages pandas, catboost, and sklearn. See the full code in src/Classifyer.py in (Sander 2024b). |
| from sklearn.compose import ColumnTransformer |
| from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer |
| from sklearn.preprocessing import OneHotEncoder |
| # df is assumed to hold one row per extracted element, with the columns entry, entry_ID, text, label, start, and end. |
| # Element position relative to the length of its entry |
| df['entry_length'] = df['entry'].apply(len) |
| df['start'] = df['start'] / df['entry_length'] |
| df['end'] = df['end'] / df['entry_length'] |
| # Entry-level context, broadcast to every element of the same entry |
| df['entity_count'] = df.groupby('entry_ID')['text'].transform('count') |
| df['avg_start_position'] = df.groupby('entry_ID')['start'].transform('mean') |
| df['avg_end_position'] = df.groupby('entry_ID')['end'].transform('mean') |
| df['all_texts'] = df.groupby('entry_ID')['text'].transform(lambda x: ' '.join(x)) |
| df['all_labels'] = df.groupby('entry_ID')['label'].transform(lambda x: ','.join(sorted(x))) |
| df['all_labels_count'] = df['entry_ID'].map(df.groupby('entry_ID')['label'].agg(list)) |
| # Encode text, labels, and numeric context in a single feature matrix |
| preprocessor = ColumnTransformer( |
| transformers=[ |
| ('text', TfidfVectorizer(token_pattern=r"(?u)\b\w+\b"), 'text'), |
| ('label', OneHotEncoder(handle_unknown='ignore'), ['label']), |
| ('all_texts', TfidfVectorizer(token_pattern=r"(?u)\b\w+\b"), 'all_texts'), |
| ('all_labels', OneHotEncoder(handle_unknown='ignore'), ['all_labels']), |
| ('all_labels_count', CountVectorizer(token_pattern=None, tokenizer=lambda labels: labels, lowercase=False), 'all_labels_count'), |
| ('start_end', 'passthrough', ['start', 'end']), |
| ('context_features', 'passthrough', [ |
| 'entry_length', 'entity_count', 'avg_start_position', 'avg_end_position' |
| ]) |
| ]) |
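To make the positional and contextual features of Listing 1 more concrete, the following stdlib-only sketch recomputes a few of them by hand for a single toy entry. The entry text, the element spans, and the helper names `element_rows` and `entry_context` are hypothetical; the labels are borrowed from the synopsis table, and pandas is deliberately not used here.

```python
from statistics import mean

# Hypothetical toy entry and labelled elements (labels borrowed from the synopsis table).
entry = "Mattheus Cittadinus de canonicatu ecclesiae Senen"
labelled = [
    ("Mattheus Cittadinus", "Providee"),
    ("canonicatu", "Benefice Category"),
    ("Senen", "In Diocese"),
]


def element_rows(entry, labelled):
    """Locate each element in the entry and express its span relative to the entry length."""
    rows = []
    for text, label in labelled:
        start = entry.index(text)          # character offset of the element
        end = start + len(text)
        rows.append({
            "text": text,
            "label": label,
            "start": start / len(entry),   # relative start, between 0 and 1
            "end": end / len(entry),       # relative end, between 0 and 1
        })
    return rows


def entry_context(rows):
    """Entry-level features that the listing broadcasts to every element of the entry."""
    return {
        "entity_count": len(rows),
        "avg_start_position": mean(r["start"] for r in rows),
        "avg_end_position": mean(r["end"] for r in rows),
        "all_texts": " ".join(r["text"] for r in rows),
        "all_labels": ",".join(sorted(r["label"] for r in rows)),
    }


rows = element_rows(entry, labelled)
context = entry_context(rows)
print(context["entity_count"], context["all_labels"])
```

Dividing offsets by the entry length makes positions comparable across entries of different lengths, which is presumably why the listing normalizes `start` and `end` before averaging them.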
2.4. Task 4: Understanding
3. Discussion and Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
| 1 | GRACEFUL17: Global Governance, Local Dynamics. Transnational Regimes of Grace in the Roman Dataria Apostolica (17th Century) is a transnational, Franco-German research project, funded by the Deutsche Forschungsgemeinschaft and the Agence Nationale de la Recherche, and directed by Birgit Emich (Goethe Universität Frankfurt a. M.) and Olivier Poncet (École Nationale des Chartes in Paris). Other partner institutions include the Deutsches Historisches Institut in Rome, the École Française de Rome, and the Université de Reims-Champagne-Ardenne. The project’s digital humanities component is based at the German Historical Institute in Rome. |
| 2 | As a matter of fact, HTR technology is currently being tried, with some success. Yet, parsing the transcribed text into semantic units, also accounting for incoherent bindings of the paper sheets that were written on, poses challenges on yet another level. |
| 3 | The projects “Repertorium (Academicum) Germanicum” are similar in data and analog workflows but are not born digital. Yet, they are an obvious application for the workflows presented here (Höing 1991; Esch 1991; Schwinges 2015; Gubler and Schwinges 2017; Beckstein et al. 2022; Hörnschemeyer and Voigt 2023; Schmugge 2023; Reimann 1991). |
| 4 | Archivio Apostolico Vaticano (AAV) Dataria Ap. (Dataria Apostolica) Expeditiones 2 and 9. |
| 5 | The GRACE ontology resembles the Simple Event Model (van Hage et al. 2009a, 2009b, 2011) in its core assumptions. |
| 6 | |
| 7 | Figure 1 and the synopsis are taken verbatim from a forthcoming data paper (Sander et al. forthcoming). |
| 8 | For the clustering, I used sklearn.cluster.SpectralClustering(n_clusters=n_clusters, affinity='precomputed', assign_labels='discretize'). |
| 9 | P: 0.9186346171867679, R: 0.9307807344458423, F1: 0.9246677907346924. |
| 10 | Certain ontological axioms define the framework for these deep structures. By definition, every event requires an associated object. In such one-to-one relationships, the assignment of specific labels to either an object or an event is deterministic. Machine learning proves especially useful in more complex cases, where rule-based disambiguation reaches its limits. |
| 11 | Yet, promising tests conducted with Jochen Büttner in fact suggest a viable pipeline for using fine-tuned foundation models to efficiently conduct the same task. A joint publication is in progress. |
| 12 | Although one could imagine predicting class, subtype, and index as three separate outputs, doing so would effectively multiply the number of outputs by three and force the model to learn dependencies across them. As classes are currently binary (event vs. object), cardinalities for types are quite high (few types recurring frequently), and indexes do not exceed ten (max. events/objects per entry), this target triplet keeps the number of target dimensions manageable and far lower than the full Cartesian product of separate class, subtype, and index predictions. By encoding each valid combination of class, subtype, and index as a single multilabel triplet, the prediction task is reduced to m independent binary decisions (one per triplet). This approach eliminates the need for the model to output and reconcile three interdependent values (class, subtype, index) for each element. In practice, a MultiOutputClassifier would require three separate heads and learn the intricate dependencies between them, increasing complexity and the risk of inconsistent predictions. The OvR-triplet approach circumvents these issues by treating each composite role as its own binary label while preserving the model’s inherent capacity to assign relevant combinations of deep structures. |
| 13 | The MLB takes each sample’s set of true triplets and transforms it into a fixed-length binary vector. If there are m triplet labels in the training data for one element, this element’s vector has m positions, and it thereby converts a variable-sized label set into the uniform, numeric format required. Once the targets have been binarized, the OvR wrapper constructs m separate binary classifiers—one for each triplet label. Each binary model is trained to distinguish “element belongs to label i” versus “it does not.” During training, OvR simply reads off the corresponding column of the binarized target matrix produced by the MLB. At inference time, each of the m classifiers casts an independent vote on whether its label applies. The collection of positive votes is then recombined into the final multilabel prediction for each element. |
| 14 | F1: 0.9738834762666144, P: 0.9897332440073945, R: 0.9636759179906388, support: 45507. |
| 15 | Neither reference is meant to suggest disagreement. In fact, Eberle et al. (2024) present an AI-assisted research case relying on largely unsupervised ML. |
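The triplet encoding and binarization described in notes 12 and 13 can be sketched in plain Python. The example below is a minimal illustration, not the project's code: the triplet values and the helper names `fit_label_space`, `binarize`, and `recombine` are hypothetical, and the functions only mimic what a multilabel binarizer and the recombination of one-vs-rest votes do in principle.

```python
# Hypothetical composite labels of the form (class, subtype, index).
element_labels = [
    {("event", "provision", 1), ("object", "benefice", 1)},
    {("event", "provision", 1)},
    {("object", "benefice", 1), ("object", "benefice", 2)},
]


def fit_label_space(label_sets):
    """Collect every triplet seen in training, in a fixed (sorted) column order."""
    return sorted(set().union(*label_sets))


def binarize(label_sets, classes):
    """One row per element, one 0/1 column per triplet (the multilabel target matrix)."""
    return [[int(triplet in labels) for triplet in classes] for labels in label_sets]


def recombine(votes, classes):
    """Turn one element's per-label binary votes back into a set of triplets."""
    return {triplet for triplet, vote in zip(classes, votes) if vote}


classes = fit_label_space(element_labels)
targets = binarize(element_labels, classes)

# In a one-vs-rest setup, classifier i would be trained on column i of `targets`.
columns = [list(col) for col in zip(*targets)]

for row in targets:
    print(row)
```

Because only triplets that actually occur in the training data become columns, the number of binary decisions m stays at the number of observed combinations rather than growing with every conceivable combination of class, subtype, and index.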
References
- Aviyente, Selin, and Abdullah Karaaslanli. 2022. Explainability in Graph Data Science: Interpretability, Replicability, and Reproducibility of Community Detection. IEEE Signal Processing Magazine 39: 25–39.
- Balavoine, Ludovic. 2011. Des Hommes et des Bénéfices: Le Système Bénéficial du Diocèse de Bayeux au Temps de Louis XIV. Bibliothèque d’histoire moderne et contemporaine. Paris: H. Champion. Genève: Diff. Slatkine.
- Beckstein, Clemens, Robert Gramsch-Stehfest, Clemens Beck, Jan Engelhardt, Christian Knüpfer, and Georg Zwilling. 2022. Digitale Prosopographie. Die automatisierte Auswertung des Repertorium Germanicum, eines Quellenkorpus zur Geschichte geistlicher Eliten des 15. Jahrhunderts. In Digital History. Konzepte, Methoden und Kritiken Digitaler Geschichtswissenschaft. Edited by Karoline Dominika Döring, Stefan Haas, Mareike König and Jörg Wettlaufer. Berlin: De Gruyter, pp. 151–69.
- Bogart, Steve. 2014. SankeyMATIC: Build a Sankey Diagram. SankeyMATIC, Released. Available online: https://sankeymatic.com/build/ (accessed on 1 May 2025).
- CatBoost. 2025. CatBoost (V. 1.2.8). Available online: https://github.com/catboost/catboost (accessed on 1 May 2025).
- Chase, Harrison. 2022. LangChain. Jupyter Notebook. Available online: https://github.com/langchain-ai/langchain (accessed on 1 May 2025).
- Cucerzan, Silviu. 2007. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. Paper presented at 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, June 28–30; Edited by Jason Eisner. Vienna: Association for Computational Linguistics, pp. 708–16. Available online: https://aclanthology.org/D07-1074/ (accessed on 1 May 2025).
- Dombrowski, Quinn. 2022. Minimizing Computing Maximizes Labor. Digital Humanities Quarterly 16. Available online: https://dhq.digitalhumanities.org/vol/16/2/000594/000594.html (accessed on 1 May 2025).
- Eberle, Oliver, Jochen Büttner, Florian Kräutli, Klaus-Robert Müller, Matteo Valleriani, and Grégoire Montavon. 2022. Building and Interpreting Deep Similarity Models. IEEE Transactions on Pattern Analysis and Machine Intelligence 44: 1149–61.
- Eberle, Oliver, Jochen Büttner, Hassan el-Hajj, Grégoire Montavon, Klaus-Robert Müller, and Matteo Valleriani. 2024. Historical Insights at Scale: A Corpus-Wide Machine Learning Analysis of Early Modern Astronomic Tables. Science Advances 10: eadj1719.
- Esch, Arnold. 1991. EDV-gestützte Auswertung vatikanischer Quellen des Mittelalters: Die neuen Indices des Repertorium Germanicum. Vorbemerkungen zum Thema. Quellen und Forschungen aus Italienischen Archiven und Bibliotheken 71: 241–42.
- Espeland, Wendy Nelson, and Mitchell L. Stevens. 1998. Commensuration as a Social Process. Annual Review of Sociology 24: 313–43.
- Explosion. 2025. Explosion/Displacy. JavaScript. Released April 8. Available online: https://github.com/explosion/displacy (accessed on 1 May 2025). First published 2016.
- Fink, Karl August, and Angelo Mercati. 1951. Das Vatikanische Archiv: Einführung in die Bestände und ihre Erforschung, 2nd ed. Rome: Regenberg.
- GO::DH Minimal Computing Working Group. 2022. Minimal Computing. DHCC, May 17. Available online: https://go-dh.github.io/mincomp/ (accessed on 1 May 2025).
- Gubler, Kaspar, and Rainer Christoph Schwinges. 2017. Repertorium Academicum Germanicum (RAG): Un nuovo Database per un’analisi basata sul Web e per la Visualizzazione dei Dati. Annali di Storia dell’Università Italiane 21: 13–24.
- Guest, Olivia. 2025. What Does ‘Human-Centred AI’ Mean? arXiv:2507.19960.
- Hachey, Ben, Will Radford, Joel Nothman, Matthew Honnibal, and James R. Curran. 2013. Evaluating Entity Linking with Wikipedia. Artificial Intelligence 194: 130–50.
- Halevi Hochwald, Inbal, Gizell Green, Yael Sela, Zorian Radomyslsky, Rachel Nissanholtz-Gannot, and Ori Hochwald. 2023. Converting Qualitative Data into Quantitative Values Using a Matched Mixed-Methods Design: A New Methodological Approach. Journal of Advanced Nursing 79: 4398–410.
- Harris, Steve, and Andy Seaborne. 2013. SPARQL 1.1 Query Language. With Eric Prud’hommeaux. Available online: https://www.w3.org/TR/sparql11-query/ (accessed on 1 May 2025).
- Healy, Kieran. 2017. Fuck Nuance. Sociological Theory 35: 118–27.
- Hiltmann, Torsten, Martin Dröge, Nicole Dresselhaus, Till Grallert, Melanie Althage, Paul Bayer, Sophie Eckenstaler, Koray Mendi, Jascha Marijn Schmitz, Philipp Schneider, et al. 2025. NER4all or Context Is All You Need: Using LLMs for Low-Effort, High-Performance NER on Historical Texts. A Humanities Informed Approach. arXiv:2502.04351.
- Honnibal, Matthew, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. 2020. spaCy: Industrial-Strength Natural Language Processing in Python. Python. Released July 3. First published 2014.
- Höing, Hubert. 1991. Die Erschließung des Repertorium Germanicum durch EDV-gestützte Indices. Technische Voraussetzungen und Möglichkeiten. Quellen und Forschungen aus Italienischen Archiven und Bibliotheken 71: 310–24.
- Hörnschemeyer, Jörg, and Jörg Voigt. 2023. Das ‘Repertorium Germanicum’. Perspektiven einer Digitalen Prosopographie. In Die Römischen Repertorien. Neue Perspektiven für die Erforschung von Kirche und Kurie des Spätmittelalters (1378–1484). Edited by Claudia Märtl, Irmgard Fees, Andreas Rehberg and Jörg Voigt. Bibliothek des Deutschen Historischen Instituts in Rom 145. Berlin: De Gruyter, pp. 135–58.
- Hu, Yuntong, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, and Liang Zhao. 2025. GRAG: Graph Retrieval-Augmented Generation. In Findings of the Association for Computational Linguistics: NAACL 2025. Edited by Luis Chiruzzo, Alan Ritter and Lu Wang. Vienna: Association for Computational Linguistics.
- Jaskulski, Piotr, Tomasz Latos, Mariusz Ryńca, and Adam Zapała. 2025. Reliability of Large Language Models as a Tool for Knowledge Extraction from Biographical Dictionaries: The Case of the Polish Biographical Dictionary. Digital Scholarship in the Humanities 40: 538–48.
- Joyner, David. 2025. ‘AI Veganism’: Some People’s Issues with AI Parallel Vegans’ Concerns about Diet. The Conversation, July 29. Available online: http://theconversation.com/ai-veganism-some-peoples-issues-with-ai-parallel-vegans-concerns-about-diet-260277 (accessed on 1 May 2025).
- Karjus, Andres. 2025. Machine-Assisted Quantitizing Designs: Augmenting Humanities and Social Sciences with Artificial Intelligence. Humanities and Social Sciences Communications 12: 277.
- Kieslich, Kimon, Marco Lünich, and Pero Došenović. 2024. Ever Heard of Ethical AI? Investigating the Salience of Ethical AI Issues among the German Population. International Journal of Human–Computer Interaction 40: 2986–99.
- Kleymann, Rabea. 2025. Taken for Granted? Investigating Constructivist Principles with Bayes’ Theorem in Digital Humanities Scholarship. Digital Scholarship in the Humanities. ahead of print.
- Kosmyna, Nataliya, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, and Pattie Maes. 2025. Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Task. arXiv:2506.08872.
- Landau, Peter. 1980. Beneficium, Benefizium. III. Kanonisches Recht und Kirchenverfassung. In Lexikon des Mittelalters. Munich: Artemis, vol. 1.
- Langlais, Pierre-Carl. 2023. MonadGPT. Hugging Face, November 10. Available online: https://huggingface.co/Pclanglais/MonadGPT (accessed on 1 May 2025).
- Langlais, Pierre-Carl, Pavel Chizhov, Mattia Nee, Carlos Rosas Hinostroza, Matthieu Delsart, Irène Girard, Anastasia Stasenko, and Ivan P. Yamshchikov. 2025. Pleias 1.0: The First Ever Family of Language Models Trained on Fully Open Data. Procedia Computer Science 267: 146–56.
- Lehmann, Jörg, and Anna-Maria Sichani. 2025. A Position Paper on AI and Copyrights in Cultural Heritage and Research (EU and UK). Journal of Open Humanities Data 11.
- Levenshtein, Vladimir I. 1966. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady 10: 707.
- Lundberg, Scott M., Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. 2020. From Local Explanations to Global Understanding with Explainable AI for Trees. Nature Machine Intelligence 2: 56–67.
- Luo, Kangyang, Yuzhuo Bai, Cheng Gao, Shuzheng Si, Yingli Shen, Zhu Liu, Zhitong Wang, Cunliang Kong, Wenhao Li, Yufei Huang, et al. 2025. GLTW: Joint Improved Graph Transformer and LLM via Three-Word Language for Knowledge Graph Completion. In Findings of the Association for Computational Linguistics: ACL 2025. Edited by Wanxiang Che, Joyce Nabende, Ekaterina Shutova and Mohammad Taher Pilehvar. Vienna: Association for Computational Linguistics, pp. 11328–44. Available online: https://aclanthology.org/2025.findings-acl.591/ (accessed on 1 May 2025).
- Marin, Lavinia, and Steffen Steinert. 2025. CTRL+ Ethics: Large Language Models and Moral Deskilling in Professional Ethics Education. In Oxford Intersections: AI in Society. Edited by Philipp Hacker. Oxford: Oxford University Press.
- Mau, Steffen. 2018. Die Quantifizierung des Sozialen. Zeitschrift für Theoretische Soziologie 7: 274–92.
- McInnes, Leland, John Healy, and Steve Astels. 2017. Hdbscan: Hierarchical Density Based Clustering. Journal of Open Source Software 2: 205.
- Merry, Sally Engle, Kevin E. Davis, and Benedict Kingsbury, eds. 2015. The Quiet Power of Indicators: Measuring Governance, Corruption, and Rule of Law, 1st ed. Cambridge: Cambridge University Press.
- Oberbichler, Sarah, and Cindarella Petz. 2025. Evaluating bias within an epistemological framework for AI-based research in the humanities. In Diversità, Equità e Inclusione: Sfide e Opportunità per l’Informatica Umanistica nell’Era dell’Intelligenza Artificiale, Proceedings del XIV Convegno Annuale AIUCD2025. Edited by Simone Rebora, Marco Rospocher and Stefano Bazzaco. Verona: AIUCD, pp. 52–59. Available online: https://amsacta.unibo.it/id/eprint/8380/1/AIUCD2025_Proceedings.pdf (accessed on 1 May 2025).
- Parravicini, Alberto, Rhicheek Patra, Davide B. Bartolini, and Marco D. Santambrogio. 2019. Fast and Accurate Entity Linking via Graph Embedding. Paper presented at 2nd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), Amsterdam, The Netherlands, June 30–July 5; New York: Association for Computing Machinery, pp. 1–9.
- Pásztor, Lajos. 1970. Guida delle Fonti per la Storia dell’America Latina: Negli Archivi della Santa Sede e Negli Archivi Ecclesiastici d’Italia. Vatican City: Archivio Vaticano.
- Pedregosa, Fabian, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-Learn: Machine Learning in Python. Journal of Machine Learning Research 12: 2825–30.
- Peels, Rik. 2019. Replicability and Replication in the Humanities. Research Integrity and Peer Review 4: 2.
- Pellissier Tanon, Thomas. 2025. Oxigraph. Rust. Zenodo. Released June 15.
- Peng, Boci, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. 2024. Graph Retrieval-Augmented Generation: A Survey. arXiv:2408.08921.
- Powers, Simon T., Neil Urquhart, Chloe M. Barnes, Theodor Cimpeanu, Anikó Ekárt, The Anh Han, Jeremy Pitt, and Michael Guckert. 2025. What’s It Like to Trust an LLM: The Devolution of Trust Psychology? IEEE Technology and Society Magazine 44: 30–37.
- Rao, Delip, Paul McNamee, and Mark Dredze. 2013. Entity Linking: Finding Extracted Entities in a Knowledge Base. In Multi-Source, Multilingual Information Extraction and Summarization. Edited by Thierry Poibeau, Horacio Saggion, Jakub Piskorski and Roman Yangarber. Berlin/Heidelberg: Springer, pp. 93–115.
- RDF Core Working Group. 2014. RDF-Semantic Web Standards. HTML. RDF-Semantic Web Standards, February 25. Available online: https://www.w3.org/TR/rdf-schema/ (accessed on 1 May 2025).
- Reimann, Michael. 1991. Neue Erschließungsformen kurialer Quellen: Das Repertorium Germanicum Nikolaus’ V. und Calixts III. (1447–1458) mit computergestützten Indices. Römische Quartalschrift für Christliche Altertumskunde und Kirchengeschichte 86: 98–112.
- Ries, Thorsten, Karina van Dalen-Oskam, and Fabian Offert. 2024. Reproducibility and Explainability in Digital Humanities. International Journal of Digital Humanities 6: 1–7.
- Risam, Roopika, and Alex Gil. 2022. Introduction: The Questions of Minimal Computing. Digital Humanities Quarterly 16. Available online: http://digitalhumanities.org/dhq/vol/16/2/000646/000646.html (accessed on 1 May 2025).
- Sander, Christoph. 2024a. DATAria: Graceful17 Utilities Platform (App). Released. Available online: https://dataria.dhi-roma.it/ (accessed on 1 May 2025).
- Sander, Christoph. 2024b. DATAria: Graceful17 Utilities Platform (Core Codebase). Released. Available online: https://github.com/ch-sander/dataria-core (accessed on 1 May 2025).
- Sander, Christoph. 2025a. GRACEFUL17: Article 1 Code (Nepotism/Lead Speed). Python. March 31, Released April 2.
- Sander, Christoph. 2025b. GRACEFUL17 Explorer. Jinja. April 16. DHI-Roma. Released April 19. Available online: https://github.com/DHI-Roma/g17-explorer (accessed on 1 May 2025).
- Sander, Christoph. 2025c. DATAria Python Utils. Python. December 19, Released April 23. Available online: https://github.com/ch-sander/dataria-py-utils (accessed on 1 May 2025). First published 2024.
- Sander, Christoph. 2025d. DATAria: Graceful17 Utilities Platform (Core Codebase, Release). Zenodo, Released August 4.
- Sander, Christoph. 2025e. G17_cat_boost. Version 5576ed0. Hugging Face, April 28.
- Sander, Christoph. 2025f. La_g17_all_tags. Version 25933a9. Hugging Face, April 24.
- Sander, Christoph, and Bruno Boute. 2025. GRACE Ontology (Version 1.0.2). Zenodo, April 29.
- Sander, Christoph, and Jörg Hörnschemeyer. 2025. GRACEFUL17: A Scalable Digital Fast-Track Strategy: Mining, Modelling, and Mastering Early Modern Church Administration Data. Paper presented at Alliance of Digital Humanities Organizations Annual Conference, Lisbon, Portugal, July 14–18.
- Sander, Christoph, Bruno Boute, Jörg Hörnschemeyer, Naomi Beutler, Filippo Sarra, Valentino Verdone, and Andrea Cicerchia. Forthcoming. GRACEFUL17 Data Paper. Die Zeitschrift Für Digitale Geisteswissenschaften. accepted for publication.
- Sander, Christoph, Naomi Beutler, Filippo Sarra, Valentino Verdone, and Bruno Boute. 2025a. GRACEFUL17 Notebooks. Jupyter Notebook. DHI-Roma, released November 5. Available online: https://github.com/DHI-Roma/g17-notebooks (accessed on 1 May 2025).
- Sander, Christoph, Naomi Beutler, Filippo Sarra, Valentino Verdone, Andrea Cicerchia, Bruno Boute, and Jörg Hörnschemeyer. 2025b. Graceful17: Main Data Repository. Zenodo, February 28.
- Schmugge, Ludwig. 2023. ‘Repertorium Poenitentiariae Germanicum’ und Digital Humanities. Eine fruchtbare Beziehung. In Die Römischen Repertorien. Neue Perspektiven für die Erforschung von Kirche und Kurie des Spätmittelalters (1378–1484). Edited by Claudia Märtl, Irmgard Fees, Andreas Rehberg and Jörg Voigt. Bibliothek des Deutschen Historischen Instituts in Rom 145. Berlin: De Gruyter.
- Schwinges, Rainer Christoph. 2015. Das Repertorium Academicum Germanicum (RAG): Ein digitales Forschungsvorhaben zur Geschichte der Gelehrten des Alten Reiches (1250–1550). Jahrbuch für Universitätsgeschichte 16: 215–32.
- Selim, Rania, Arunima Basu, Ailin Anto, Thomas Foscht, and Andreas Benedikt Eisingerich. 2024. Effects of Large Language Model-Based Offerings on the Well-Being of Students: Qualitative Study. JMIR Formative Research 8: e64081.
- Simons, Arno, Michael Zichert, and Adrian Wüthrich. 2025. Large Language Models for History, Philosophy, and Sociology of Science: Interpretive Uses, Methodological Challenges, and Critical Perspectives. arXiv:2506.12242.
- Sparna. 2025. Sparnatural SPARQL Query Builder. Released. Available online: https://github.com/sparna-git/Sparnatural (accessed on 1 May 2025).
- Spärck Jones, Karen. 1972. A Statistical Interpretation of Term Specificity and Its Application in Retrieval. Journal of Documentation 28: 11–21.
- Sporny, Manu, Dave Longley, Gregg Kellogg, Markus Lanthaler, Pierre-Antoine Champin, and Niklas Lindström. 2020. JSON-LD 1.1. V. 1.1. Released July 16. Available online: https://www.w3.org/TR/json-ld11/ (accessed on 1 May 2025).
- Storti, Nicola. 1969. La Storia e il Diritto della Dataria Apostolica dalle Origini ai Nostri Giorni. Contributi alla storia del diritto canonico. Naples: Athena Mediterranea.
- Suchikova, Yana, Natalia Tsybuliak, Jaime A. Teixeira da Silva, and Serhii Nazarovets. n.d. GAIDeT (Generative AI Delegation Taxonomy): A Taxonomy for Humans to Delegate Tasks to Generative Artificial Intelligence in Scientific Research and Publishing. Accountability in Research, 1–27.
- TildeAI. 2025. TildeOpen-30b. Hugging Face, June 6. Available online: https://huggingface.co/TildeAI/TildeOpen-30b (accessed on 1 May 2025).
- TriplyDB. 2025. TriplyDB/Yasgui. TypeScript. May 31, TriplyDB, Released July 11. Available online: https://github.com/TriplyDB/Yasgui (accessed on 1 May 2025). First published 2014.
- Tudor, Crina, Beata Megyesi, and Robert Östling. 2025. Prompting the Past: Exploring Zero-Shot Learning for Named Entity Recognition in Historical Texts Using Prompt-Answering LLMs. Paper presented at 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025), Albuquerque, NM, USA, May 4; Edited by Anna Kazantseva, Stan Szpakowicz, Stefania Degaetano-Ortlieb, Yuri Bizzoni and Janis Pagel. Albuquerque: Association for Computational Linguistics.
- Valleriani, Matteo. 2025. Large Language Models That Power AI Should Be Publicly Owned. Technology. The Guardian. May 26. Available online: https://www.theguardian.com/technology/2025/may/26/large-language-models-that-power-ai-should-be-publicly-owned (accessed on 1 May 2025).
- van Hage, Willem Robert, Véronique Malaisé, Gerben de Vries, Guus Schreiber, and Maarten van Someren. 2009a. Combining Ship Trajectories and Semantics with the Simple Event Model (SEM). Paper presented at 1st ACM International Workshop on Events in Multimedia (EiMM ’09), Beijing, China, October 23; pp. 73–80.
- van Hage, Willem Robert, Véronique Malaisé, Roxane Segers, Laura Hollink, and Guus Schreiber. 2009b. The Simple Event Model. Available online: https://semanticweb.cs.vu.nl/2009/11/sem/ (accessed on 1 May 2025).
- van Hage, Willem Robert, Véronique Malaisé, Roxane Segers, Laura Hollink, and Guus Schreiber. 2011. Design and Use of the Simple Event Model (SEM). Journal of Web Semantics 9: 128–36.
- Viana, Antonio. 2018. Introducción histórica y canónica al oficio eclesiástico. Ius Canonicum 58: 709–40.
- Xie, Tingyu, Qi Li, Yan Zhang, Zuozhu Liu, and Hongwei Wang. 2024. Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models. arXiv:2311.08921.
- You, Doohee, Andy Parisi, Zach Vander Velden, and Lara Dantas Inojosa. 2025. LLM-as-Classifier: Semi-Supervised, Iterative Framework for Hierarchical Text Classification Using Large Language Models. arXiv:2508.16478.
- Zhu, Yutao, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Haonan Chen, Zheng Liu, Zhicheng Dou, and Ji-Rong Wen. 2025. Large Language Models for Information Retrieval: A Survey. ACM Transactions on Information Systems. ahead of print.
- Zwicklbauer, Stefan, Christin Seifert, and Michael Granitzer. 2016. Robust and Collective Entity Disambiguation through Semantic Embeddings. Paper presented at 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’16), New York, NY, USA, July 7; pp. 425–34.


| Class | Role/Label | Literal/Span | Encoded | Description |
|---|---|---|---|---|
| Person | Providee | Mattheus Cittadinus | Family: Cittadinus; Given: Mattheus | The name of the individual being appointed. |
| Person | Former Possessor | Marci Antonii Amaroni | Family: Amaronus; Given: Marcus Antonius | The name of the former possessor of the office. |
| Date | Event Date | Pridie Id Septembri a ii | 1622-09-12 (i.e., 2nd year of the pontificate of Gregory XV) | The date of the decision (granting of the supplication). |
| Date | Vacancy Date | de augusti prox pret | 1622-08 | The date of the vacancy. |
| Place | Place of Event | Rome apud SMM | Rome, Santa Maria Maggiore | The administrative location of the dating/granting of the papal grace. |
| Place | Location of Institution | Senen | Siena | The location of the benefice’s holding institution, e.g., a church. |
| Institution | In Diocese | Senen | Diocese of Siena | The diocese holding the benefice. |
| Type | Benefice Category | Canonicatu et praebenda | Canonship | The awarded benefice category. |
| Type | Church Category | ecclesiae | Church | The ecclesiastical institution to which the office is attached. |
| Type | Vacancy Category | per obitum/defuncti | Death of predecessor | The reason for the vacancy and office reassignment. |
| Type | Deceased in Curia | extra | Outside the Curia | Indicates whether the death occurred inside or outside the Curia. |
| Type | Source Subregister | per obitum | “Per obitum” sub-register | The sub-register from which the data originates. |
| Monetary Value | Benefice Taxation | 24 duc | 24 ducats | The tax valuation of the office in the Apostolic Chamber’s currency. |
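The "Event Date" row above pairs a Roman-calendar literal with an ISO date. The following sketch shows one way such a literal could be resolved; it is a simplified illustration under stated assumptions, not the project's parser. The helper names, the month-prefix lookup, and the expected token layout (optional "Pridie", marker, month, "a", regnal year) are hypothetical, "ante diem" counting is not handled, and the pontificate start of 9 February 1621 (the election of Gregory XV) is assumed as the reference day, whereas the Dataria's own reckoning may count from a different day.

```python
from datetime import date, timedelta

# Months in which the Nones fall on the 7th and the Ides on the 15th.
LATE_IDES_MONTHS = {3, 5, 7, 10}

# Hypothetical prefix lookup for Latin month names; real sources need more variants.
MONTH_PREFIXES = {
    "ian": 1, "feb": 2, "mar": 3, "apr": 4, "mai": 5, "iun": 6,
    "iul": 7, "aug": 8, "sept": 9, "oct": 10, "nov": 11, "dec": 12,
}

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50}


def roman_to_int(numeral: str) -> int:
    """Convert a lowercase or uppercase Roman numeral (up to L) to an integer."""
    total, previous = 0, 0
    for char in reversed(numeral.lower()):
        value = ROMAN_VALUES[char]
        total += -value if value < previous else value
        previous = max(previous, value)
    return total


def marker_day(marker: str, month: int) -> int:
    """Day of the month on which the Kalends, Nones, or Ides fall."""
    late = month in LATE_IDES_MONTHS
    if marker.startswith("kal"):
        return 1
    if marker.startswith("non"):
        return 7 if late else 5
    if marker.startswith("id"):
        return 15 if late else 13
    raise ValueError(f"unknown marker: {marker}")


def parse_curial_date(literal: str, pontificate_start: date) -> date:
    """Resolve e.g. 'Pridie Id Septembri a ii' against the start of a pontificate."""
    tokens = literal.lower().split()
    pridie = tokens[0] == "pridie"
    if pridie:
        tokens = tokens[1:]
    marker, month_token, _, regnal_token = tokens
    month = next(n for prefix, n in MONTH_PREFIXES.items()
                 if month_token.startswith(prefix))
    day = marker_day(marker, month)
    shift = timedelta(days=1 if pridie else 0)   # "pridie" = the day before

    # Regnal year n begins on the anniversary of the start in year start + n - 1.
    year = pontificate_start.year + roman_to_int(regnal_token) - 1
    resolved = date(year, month, day) - shift
    if resolved < pontificate_start.replace(year=year):
        resolved = date(year + 1, month, day) - shift
    return resolved


print(parse_curial_date("Pridie Id Septembri a ii", date(1621, 2, 9)).isoformat())
```

Keeping the pontificate start as an explicit argument leaves the historically contested choice of reference day visible and easy to change.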
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sander, C. Minimal Computing and Weak AI for Historical Research: The Case of Early Modern Church Administration. Histories 2025, 5, 59. https://doi.org/10.3390/histories5040059

