The Penetration of Digital Methods into Historical Scholarship: A Text-Mining Analysis of Russian Publications
Abstract
1. Introduction
- RQ1: What is the overall prevalence and specific composition of explicit digital method terminology in the metadata of mainstream Russian historical publications compared to a focused sub-field?
- RQ2: How has the frequency of key digital methods (identified through our thesaurus) evolved over the last decades, and can we identify emerging trends or stagnating techniques?
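One way to operationalize the two questions is sketched below in Python. This is an assumption, not the authors' documented procedure: RQ1 is read as the share of texts with at least one thesaurus hit, and RQ2 as that share computed per publication year. The function names and the toy records are hypothetical and contain no data from the study.

```python
from collections import Counter


def prevalence(flags):
    """Share of texts with at least one thesaurus hit (flags: iterable of bool)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def yearly_share(records):
    """Map each year to the share of that year's texts mentioning a digital term.

    records: iterable of (year, has_digital_term) pairs.
    """
    totals, hits = Counter(), Counter()
    for year, has_term in records:
        totals[year] += 1
        hits[year] += int(has_term)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy, made-up records (not data from the study).
toy_records = [(2004, False), (2004, False), (2014, True), (2014, False), (2023, True)]
overall = prevalence(flag for _, flag in toy_records)
by_year = yearly_share(toy_records)
```

Normalizing by the number of texts per year, rather than reporting raw counts, keeps growth in the size of the corpus from being mistaken for growth in the use of digital methods.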
2. Materials and Methods
2.1. Thesaurus Construction
2.1.1. Taxonomic Framework
- 1. Hard-Core Digital Methods encompass unambiguous technical approaches with high diagnostic value, including:
- Programming & Development (specific languages and frameworks, such as Python, R, SQL (Structured Query Language), TensorFlow, PyTorch);
- Artificial Intelligence & Machine Learning (neural networks, deep learning, and specialized algorithms);
- Natural Language Processing & Computational Linguistics (tokenization, named entity recognition, semantic analysis, sentiment analysis, etc.);
- GIS & Spatial Analysis (georeferencing, spatial analysis, historical GIS applications);
- Computer Vision & 3D Technologies (photogrammetry, lidar scanning, 3D reconstruction, OCR—Optical Character Recognition, etc.);
- Visualization & VR/AR, i.e., Virtual Reality, Augmented Reality (interactive dashboards, virtual reconstruction, data visualization platforms);
- Digital Archives & Infrastructures (datasets, corpora, archives, etc.);
- Specialized Tools & Platforms (other digital methods).
- 2. Secondary Methodological Indicators (Second-Tier) include broader computational concepts requiring contextual validation:
- Analytical approaches (data analysis, statistical methods, modeling);
- Infrastructure concepts (databases, web applications, digital archives);
- Process characteristics (automation, digitization, computational methods).
- Diagnostic Specificity: The Hard-Core terms exhibit high contextual stability and minimal semantic ambiguity. Technical proper nouns (“Python,” “QGIS,” “TensorFlow”) and domain-specific techniques (“NER,” “LDA”) maintain consistent referential meaning, making them reliable indicators of digital methodological commitment.
- Contextual Polysemy Mitigation: The Second-Tier terms demonstrate higher contextual dependency. Concepts like “modeling” and “visualization” occur in both computational and traditional methodological contexts, requiring triangulation with other digital indicators for accurate classification. Another notable example is the Russian word “граф”, which means both “earl” (a noble title) and “graph” (a mathematical structure). In Russian historical scholarship, the aristocratic meaning occurs significantly more frequently, creating potential ambiguity in automated method identification. Such homonyms require additional human checking to disambiguate historical concepts from computational methods.
- Disciplinary Adaptation: Historical scholarship often employs computational methods using discipline-specific nomenclature rather than technical terminology. The bipartite system accommodates this linguistic diversity while maintaining classification rigor.
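The human check for homonyms such as “граф” could be supported by pulling each occurrence with its surrounding context, so that a reviewer sees at a glance whether a noble title or a mathematical structure is meant. The Python sketch below is illustrative only: the list of inflected endings, the window size, and the function name are assumptions, and the study itself does not describe how contexts were presented to the reviewers.

```python
import re

# Assumed set of inflected forms of "граф"; word boundaries keep out
# unrelated words such as "графика" or "география".
AMBIGUOUS_FORMS = re.compile(
    r"\bграф(?:а|у|ом|е|ы|ов|ам|ами|ах)?\b", re.IGNORECASE
)


def contexts_for_review(text, window=30):
    """Return a snippet of `window` characters on each side of every match."""
    snippets = []
    for match in AMBIGUOUS_FORMS.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        snippets.append(text[start:end])
    return snippets
```

For example, `contexts_for_review("Письма графа Шереметева хранятся в архиве")` returns one snippet for manual inspection, while a text mentioning only “география и графика” returns none.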
2.1.2. Compilation and Validation
- Core Inclusion: Specific technologies, software platforms, and technical methods with minimal contextual ambiguity.
- Contextual Inclusion: Broader methodological concepts that demonstrate digital approaches when combined with core terms.
- Exclusion: General historical concepts, geographical terms, and traditional methodological approaches.
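The three criteria above can be read as a simple decision rule, sketched here in Python under stated assumptions. The two term sets are tiny illustrative subsets rather than the full thesaurus, matching is limited to single-word terms, and the label "needs_review" for texts containing only contextual terms is our interpretation of "contextual validation", not a category named by the authors.

```python
import re

# Tiny illustrative subsets of the two term lists; not the full thesaurus.
CORE_TERMS = {"python", "qgis", "photogrammetry"}
CONTEXTUAL_TERMS = {"modeling", "database", "digitization"}


def classify(text):
    """Return (label, matched_terms) for one concatenated metadata text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    core = tokens & CORE_TERMS
    contextual = tokens & CONTEXTUAL_TERMS
    if core:
        # A core term is diagnostic on its own; contextual terms add detail.
        return "digital", core | contextual
    if contextual:
        # Contextual terms alone are ambiguous and go to a human reviewer.
        return "needs_review", contextual
    return "not_digital", set()
```

With these toy lists, "Spatial modeling of settlements in QGIS" is labeled "digital", whereas "Modeling of agrarian reform outcomes" is routed to review because "modeling" appears without any core term.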
2.2. Corpus Preparation
- Corpus 1: General History. This dataset comprises metadata for approximately 95,000 scholarly articles available in a digital format. The corpus was built from the CyberLeninka open-access platform (footnote 1), which aggregates Russian-language scholarly journals across multiple indexing tiers, including journals indexed in the Russian Science Citation Index (RSCI), Web of Science, and Scopus. CyberLeninka is known for its extensive coverage of journals across various disciplines (Gasparyan et al., 2019; Semyachkin et al., 2014).
Since not all journals provide full-text articles in open access, data collection focused on metadata that typically encapsulates the core methodological and thematic substance of an article: namely, the title, abstract, and author keyphrases (Zhang et al., 2025). For each entry, these three elements were concatenated into a single text field, forming a representative proxy for the article’s primary content. Data were harvested using automated parsing techniques with the BeautifulSoup library (footnote 2) for Python. The data collection was performed in October 2024. After the initial collection, the data were preprocessed by removing entries not in the Russian language and records lacking a substantive abstract.
The final curated corpus consists of approximately 95,000 text entries published between 2004 and 2023. It is important to acknowledge that the open-access nature of the CyberLeninka platform may introduce certain coverage biases. For instance, prestigious subscription-based journals with limited open-access policies may be underrepresented. Similarly, the corpus likely reflects the output of major research centers more comprehensively than that of smaller regional institutions. Furthermore, authors publishing in open-access venues might exhibit different patterns in methodological self-presentation compared to those in traditional subscription journals. Despite these potential biases, the corpus’s substantial size, temporal span, and inclusion of journals from multiple indexing tiers (RSCI, Web of Science, Scopus) provide a robust and extensive sample of mainstream Russian-language historical discourse, suitable for analyzing the visibility of digital methods in scholarly communication.
- Corpus 2: Great Patriotic War History. The study of the Great Patriotic War, the term used in Soviet and post-Soviet historiography for the period of World War II from Nazi Germany’s invasion of the USSR in 1941 to the Soviet victory in 1945, represents one of the most prominent and enduring sub-fields within Russian historical science.
The Great Patriotic War History corpus was compiled from journals indexed in the Russian Science Citation Index on the Web of Science platform (RSCI WoS). The RSCI WoS database aggregates leading Russian scholarly journals and is integrated into the international WoS citation system, ensuring a selection of high-quality publications. The corpus was constructed by analyzing 13 leading Russian periodicals in history, selected by experts in the field (Sokova et al., 2025). These journals were included based on their status within the RSCI WoS (as of May 2024) and their focus on Russian history.
For all articles published in these journals between 2014 and 2023, metadata (titles, keyphrases, and abstracts) were collected from their open-access sources on the CyberLeninka platform and official journal websites using the BeautifulSoup library. These metadata were concatenated into a single text per article, as in the General History corpus. The data collection, performed automatically from May to June 2024, initially yielded over ten thousand articles from the target journals and periods.
To identify the articles specifically related to the Great Patriotic War, a two-stage filtering process was employed. First, based on an empirical analysis of publications in the RSCI database, expert historians compiled two lists of thematic markers: (a) terms that unequivocally identify the topics of the Great Patriotic War and (b) terms that possibly indicate such topics. The collected texts were lemmatized, and a search was conducted for terms and their derivatives from the first (definitive) list. The resulting articles formed the initial core collection. Subsequently, a search for terms from the second (potential) list was conducted. The articles containing these markers were manually reviewed by experts and added to the collection only if they were confirmed to pertain to the Great Patriotic War. Finally, the entire collection underwent a final expert review. The resulting curated corpus consists of 544 texts, providing a focused sample to examine the penetration of digital methods into a well-established research area. Expert validation was conducted by two researchers from the University of Tyumen, Russia, with formal training in digital history. Disagreements were resolved through discussion until consensus was reached.
- Language filter: only Russian-language publications were retained.
- Document consistency filter: articles were included if they contained a title, abstract, and author-provided keyphrases available in open access.
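The corpus-preparation steps described above (concatenating the three metadata fields, applying the language and consistency filters, and routing texts through the two-stage thematic filter) can be sketched as follows. This is a minimal illustration under several assumptions. The HTML parsing done with BeautifulSoup is omitted and records are taken as already-extracted dictionaries. The Cyrillic-share heuristic stands in for whatever language detection the authors used. Prefix matching on tokens is a crude substitute for the lemmatization reported in the study. The marker prefixes "сталинград" and "эвакуац" are invented placeholders, not the experts' lists.

```python
import re

CYRILLIC = re.compile(r"[а-яё]", re.IGNORECASE)
LETTER = re.compile(r"[^\W\d_]")  # any Unicode letter


def is_russian(text, threshold=0.5):
    """Heuristic language filter: share of Cyrillic letters among all letters."""
    letters = LETTER.findall(text)
    if not letters:
        return False
    cyrillic = sum(1 for ch in letters if CYRILLIC.match(ch))
    return cyrillic / len(letters) >= threshold


def build_text(record):
    """Concatenate title, abstract, and keyphrases; None if any part is missing."""
    parts = [
        record.get("title"),
        record.get("abstract"),
        "; ".join(record.get("keyphrases") or []),
    ]
    if not all(part and part.strip() for part in parts):
        return None
    return " ".join(part.strip() for part in parts)


def two_stage_filter(texts, definitive, potential):
    """Split texts into auto-accepted and manual-review lists by marker prefix."""
    accepted, review = [], []
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        if any(tok.startswith(m) for tok in tokens for m in definitive):
            accepted.append(text)
        elif any(tok.startswith(m) for tok in tokens for m in potential):
            review.append(text)  # added only after expert confirmation
    return accepted, review


# Made-up records for illustration only.
records = [
    {"title": "Эвакуация заводов на Урал",
     "abstract": "Рассматривается перемещение предприятий в 1941 году.",
     "keyphrases": ["эвакуация", "Урал"]},
    {"title": "Сталинградская битва в мемуарах",
     "abstract": "Анализ воспоминаний участников.",
     "keyphrases": ["Сталинград"]},
    {"title": "Medieval trade", "abstract": "An English abstract.",
     "keyphrases": ["trade"]},
    {"title": "Без аннотации", "abstract": "", "keyphrases": ["история"]},
]
texts = [t for r in records if (t := build_text(r)) and is_russian(t)]
accepted, review = two_stage_filter(texts, {"сталинград"}, {"эвакуац"})
```

In this toy run the English-language record and the record without an abstract are dropped, the second record enters the core collection directly, and the first is queued for expert review.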
2.3. Analysis Procedure
3. Results
3.1. Overall Prevalence of Digital Methods
3.2. Composition and Thematic Context of Digital Terminology
- GIS technologies were applied for spatio-temporal analysis (e.g., modeling migration routes, settlement structures) and the reconstruction of historical landscapes.
- Cluster analysis was used to identify typologies and patterns in data, for example, to classify archaeological artifacts or analyze parliamentary voting.
- 3D modeling and photogrammetry were primarily used in archaeology and architectural history for the virtual reconstruction of cultural heritage objects.
- The Second-Tier terms (modeling, data analysis, databases, etc.) often described the creation of digital research infrastructure (e.g., databases of archival documents) or the application of basic statistical methods to analyze historical sources.
3.3. Dynamics of Mentions over Time
4. Discussion
4.1. Limited Discursive Penetration of Digital Method Terminology
4.2. Potential Barriers to Adoption
- Methodological Traditions: Russian historical science has strong traditions of qualitative, source-based analysis, within which digital methods may be perceived as alien.
- Infrastructure and Training: The lack of necessary digital infrastructure and systematic training in digital methods within history departments creates a high barrier to entry for researchers.
- Linguistic and Cultural Context: The dominance of English-language interfaces in complex software and a lack of localized educational resources could be an additional barrier.
4.3. Limitations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Digital Methods
- 1. Hard-Core Digital Methods:
- Programming & Development: “python”, “r package”, “programming language”, “sql”, “postgresql”, “mysql”, “dbms”, “api”, “json”, “xml”.
- Artificial Intelligence & Machine Learning: “machine learning”, “neural network”, “artificial intelligence”, “deep learning”, “tensorflow”, “pytorch”, “svm”, “arima”, “cluster analysis”, “regression analysis”.
- Natural Language Processing & Computational Linguistics: “nlp”, “natural language processing”, “computational linguistics”, “topic modeling”, “lda”, “large language models”, “semantic analysis”, “named entity recognition”, “ner”, “vectorization”, “word2vec”, “embedding”, “stylometry”, “spacy”, “nltk”, “stanza”, “udpipe”, “treetagger”, “mystem”, “huggingface”, “allennlp”, “gensim”, “transformers”, “sentiment analysis”, “dependency parsing”, “syntactic analysis”, “keyness analysis”, “keyword analysis”, “collocation analysis”, “tokenization”, “lemmatization”, “data annotation”.
- GIS & Spatial Analysis: “qgis”, “arcgis”, “nextgis”, “geoserver”, “postgis”, “geopandas”, “geographic information system”, “web-gis”, “historical gis”, “spatial analysis”, “georeferencing”, “geocoding”, “leaflet”, “openlayers”, “mapbox”, “cartography”, “gps”, “geotracker”, “track navigation”, “tracker”, “aerial photography”.
- Computer Vision & 3D Technologies: “computer vision”, “pattern recognition”, “ocr”, “opencv”, “3d model”, “3d reconstruction”, “3d”, “lidar”, “laser scanning”, “photogrammetry”, “aerial survey”, “point cloud”, “digital elevation model”, “voxel”, “autodesk”, “3ds max”, “sketchup”, “bim”, “blender”, “three.js”.
- Visualization & VR/AR: “data visualization”, “tableau”, “gephi”, “d3.js”, “plotly”, “dashboard”, “interactive map”, “virtual reality”, “vr”, “augmented reality”, “ar”, “unity”, “unreal engine”.
- Digital Archives & Infrastructures: “digital archive”, “born-digital”, “iiif”, “dataset”, “text corpus”, “machine translation”.
- Specialized Tools & Platforms: “gpt”, “chatgpt”, “bert”, “deepseek”, “access”, “excel”, “wordpress”, “uav”, “prompt engineering”, “digital humanities”.
- 2. Second-Tier:
- Analytical approaches: “text analysis”, “data analysis”, “visualization”, “big data”, “mapping”, “quantitative analysis”, “statistical analysis”, “modeling”, “automatic classification”, “mathematical method”, “regression analysis”, “distant reading”.
- Infrastructure concepts: “database”, “data bank”, “web application”, “web service”, “web resource”, “internet resource”, “network resource”, “internet”, “information system”, “information retrieval system”, “electronic catalog”, “electronic archive”, “electronic resource”, “computing resources”, “cloud computing”, “distributed systems”, “data warehouse”, “database management system”.
- Process characteristics: “data processing”, “software”, “text recognition”, “digitization”, “graph”, “datafication”, “software product”, “automated”, “automatic”, “computer-based”, “digital”, “interactive”, “computer model”, “interoperability”, “scalability”, “visual data”.
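For automated matching, the appendix lists lend themselves to a nested mapping of tier, category, and terms, from which a reverse index can be derived. The Python sketch below uses a small excerpt of the lists and is an assumption about one workable representation, not the authors' implementation. It also shows why very short terms such as “ar”, “vr”, or “3d” need whole-word matching: a plain substring search would find “ar” inside “archive”. Lemmatization of inflected Russian forms is not modeled here.

```python
import re

# Small excerpt of the appendix; the full lists would be filled in the same way.
THESAURUS = {
    "hard_core": {
        "Visualization & VR/AR": ["data visualization", "virtual reality", "vr", "ar"],
        "Computer Vision & 3D Technologies": ["ocr", "3d", "photogrammetry"],
    },
    "second_tier": {
        "Infrastructure concepts": ["database", "electronic archive"],
    },
}


def build_index(thesaurus):
    """Map each term to its (tier, category) pair."""
    return {
        term: (tier, category)
        for tier, categories in thesaurus.items()
        for category, terms in categories.items()
        for term in terms
    }


def match_terms(text, index):
    """Return {term: (tier, category)} for terms present as whole words or phrases."""
    lowered = text.lower()
    found = {}
    for term, label in index.items():
        # Lookarounds reject matches embedded in a longer word.
        if re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", lowered):
            found[term] = label
    return found


index = build_index(THESAURUS)
```

Calling `match_terms("An electronic archive of parish registers", index)` returns only the entry for “electronic archive”, even though the letters “ar” occur inside both “archive” and “parish”.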
| 1 | https://cyberleninka.ru/, accessed on 11 November 2025. |
| 2 | https://beautiful-soup-4.readthedocs.io/en/latest/, accessed on 11 November 2025. |
References
- Bakels, J. H., Grotkopp, M., Scherer, T., Stratil, J., Jiaming, T. Z., & Dongrui, C. (2025). Matching computational analysis and human experience: Performative arts and the digital humanities. Digital Humanities Research, 5(2), 59.
- Berry, D. M. (2024). Post-digital humanities: Computation and cultural critique in the arts and humanities. EDUCAUSE Review, 49(3), 24–26.
- Bird, S. (2006, July 17–18). NLTK: The natural language toolkit. COLING/ACL 2006 Interactive Presentation Sessions (pp. 69–72), Sydney, Australia.
- Borodkin, L. I. (2014). Historical informatics at the Faculty of History of Moscow State University: From computers to supercomputers. In S. P. Karpov (Ed.), Problems of historiography, source studies and methods of historical research. Proceedings of the 5th scholarly readings in memory of academician I. D. Kovalchenko (pp. 40–47). Moscow University Press. Available online: https://istina.msu.ru/publications/article/8751589/ (accessed on 8 January 2026). (In Russian)
- Borodkin, L. I. (2016). Modeling historical processes: From the reconstruction of reality to the analysis of alternatives. Aletheia. (In Russian)
- Borodkin, L. I. (2024). 20 years of the department of historical informatics of the faculty of history of Moscow State University: New trends in interdisciplinary research. Historical Informatics, 3, 16–32. (In Russian)
- Cai, S., Venugopalan, S., Tomanek, K., Narayanan, A., Morris, M., & Brenner, M. (2022). Context-aware abbreviation expansion using large language models. In Proceedings of the 2022 conference of the north american chapter of the association for computational linguistics: Human language technologies (pp. 1261–1275). Association for Computational Linguistics.
- Ehrmann, M., Bunout, E., & Clavert, F. (2023). Digitised historical newspapers: A changing research landscape (pp. 1–22). De Gruyter Oldenbourg.
- Frolov, A. A. (2017). A dynamic map as the basis of a historical map in a GIS environment. Historical Informatics, 2, 61–73. (In Russian)
- Garskova, I. M. (2018). Historical informatics: Methodological and historiographical aspects of development [Doctor dissertation, Russian State University for the Humanities]. (In Russian)
- Gasparyan, A. Y., Yessirkepov, M., Voronov, A. A., Koroleva, A. M., & Kitas, G. D. (2019). Comprehensive approach to open access publishing: Platforms and tools. Journal of Korean Medical Science, 34(27), e184.
- Gousyatskaya, P., & Loukachevitch, N. (2025). Word sense disambiguation in Russian: A generative LLM approach. In M. Bakaev, R. Bolgov, A. Chizhik, A. Chugunov, V. Demareva, Y. Kabanov, R. Pereira, R. Elakkiya, & W. Zhang (Eds.), Internet and modern society, IMS 2025, communications in computer and information science (Vol. 2671). Springer.
- Joo, S., Hootman, J., & Katsurai, M. (2022). Exploring the digital humanities research agenda: A text mining approach. Journal of Documentation, 78(4), 853–870.
- Korobov, M. (2015). Morphological analyzer and generator for Russian and Ukrainian languages. In International conference on analysis of images, social networks and texts (pp. 320–332). Springer International Publishing.
- Kovalenko, I. R. (2024). Realistic 3D model of the Albazinsky Fort. Bulletin of Amur State University. Series: Humanities, 104, 142–147. Available online: https://vestnik.amursu.ru/wp-content/uploads/2024/03/n104_142-147.pdf (accessed on 8 January 2026). (In Russian)
- Luzietti, R. B., Spadi, A., Giampietro, N., Mancuso, G., Caravale, A., D’Eredità, A., Caradonna, M., Moscati, P., Quochi, V., Monachini, M., & Degl’Innocenti, E. (2025). Digital humanities and heritage science: Moving from landscaping to a dynamic research observatory in an open science cloud. Umanistica Digitale, 9(20), 419–439.
- Nesterov, S. P. (2024). Dynamics of structures and historical reconstruction of the Albazin Fort on the Amur River. Vestnik NSU. Series: History and Philology, 23(7), 105–115. (In Russian)
- Oberbichler, S., Boroş, E., Doucet, A., Marjanen, J., Pfanzelter, E., Rautiainen, J., Toivonen, H., & Tolonen, M. (2022). Integrated interdisciplinary workflows for research on historical newspapers: Perspectives from humanities scholars, computer scientists, and librarians. Journal of the Association for Information Science and Technology, 73(2), 225–239.
- Salomatina, S. A. (2004). Commercial banks in Russia: Dynamics and structure of operations, 1864–1917. ROSSPEN. (In Russian)
- Salomatina, S. A., & Frenkel, O. I. (2016). Regional development of the Russian joint-stock commercial banks in the second half of the 19th century: Statistics and GIS-technologies. History (Electronic Scientific and Educational Journal), 7(51), 17. (In Russian)
- Semyachkin, D., Kislyak, E., & Sergeev, M. (2014). CyberLeninka: Open access and CRIS trends leading to open science in Russia. Procedia Computer Science, 33, 136–139. (In Russian)
- Silber-Varod, V., & Geri, N. (2025). Winds of generative AI: Research trends of digital humanities in computer science publications. Online Journal of Applied Knowledge Management (OJAKM), 13(1), 1–12.
- Sokova, Z. N., Kruzhinov, V. M., & Glazkova, A. V. (2025). Topic modeling of scientific texts using BERTopic (Based on scientific abstracts about the Great Patriotic War of 2014–2023). NSU Vestnik. Series: Linguistics and Intercultural Communication, 23(3), 107–122. (In Russian)
- Toktas, E. (2025). Future scenarios of digital humanities and post-humanist education. Journal of Foresight and Health Governance, 2(1), 21–31.
- Vladimirov, V. N. (2005). Historical geoinformatics: Geographic information systems in historical research. Altai University Press. (In Russian)
- Vladimirov, V. N., Garskova, I. M., & Frolov, A. A. (2020). Historical informatics in a new interdisciplinary field: Academic symposium dedicated to the 15th anniversary of the department of historical informatics of Moscow University. Historical Informatics, 1, 158–170. (In Russian)
- Volkaert, F. (2021). OK computer? The digital turn in legal history: A methodological retrospective. Tijdschrift voor Rechtsgeschiedenis/Revue d’histoire du droit/The Legal History Review, 89(1–2), 1–46.
- Zhang, C., Yan, X., Zhao, L., & Zhang, Y. (2025). Enhancing keyphrase extraction from academic articles using section structure information. Scientometrics, 130(4), 2311–2343.
- Zherebyatev, D. I. (2014). Three-dimensional computer modeling methods in the tasks of historical reconstruction of monastic complexes of Moscow. MAKS-Press. (In Russian)
- Zherebyatev, D. I., Malyshev, A. A., & Moor, V. V. (2018). Gorgippia in the Archaic period: Methods and technologies of 3D reconstruction of an ancient Fortress-City. Historical Informatics, 25(3), 33–50. (In Russian)
| Characteristic | General History | Great Patriotic War History |
|---|---|---|
| Mean text length (characters) | 981.28 | 1108.66 |
| Standard deviation of text length (characters) | 602.53 | 728.18 |
| Median text length (characters) | 798 | 879 |
| Number of texts | 95,720 | 545 |
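Descriptive statistics of this kind can be computed from the concatenated metadata texts with the Python standard library, as in the sketch below. Whether the reported standard deviation is the sample or the population variant is not stated in the source; the sample version (`statistics.stdev`) is used here as an assumption, and the function name is hypothetical.

```python
import statistics


def length_stats(texts):
    """Mean, sample standard deviation, median, and count of text lengths."""
    lengths = [len(text) for text in texts]  # length in characters
    return {
        "mean": round(statistics.mean(lengths), 2),
        "std": round(statistics.stdev(lengths), 2),  # needs at least two texts
        "median": statistics.median(lengths),
        "n": len(lengths),
    }


# Toy input, not the study's corpora.
stats = length_stats(["a" * 10, "b" * 20, "c" * 30])
```

A mean noticeably above the median, as in both corpora, indicates a right-skewed length distribution in which a minority of long abstracts pulls the average up.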
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Sokova, Z.; Kruzhinov, V.; Glazkova, A. The Penetration of Digital Methods into Historical Scholarship: A Text-Mining Analysis of Russian Publications. Publications 2026, 14, 8. https://doi.org/10.3390/publications14010008

