MDPI - Publisher of Open Access Journals

18 pages, 1651 KB

Open Access Article

The Penetration of Digital Methods into Historical Scholarship: A Text-Mining Analysis of Russian Publications

by Zinaida Sokova, Valery Kruzhinov and Anna Glazkova

Publications 2026, 14(1), 8; https://doi.org/10.3390/publications14010008 - 20 Jan 2026

Viewed by 825

The integration of digital technologies into historical research is a global trend; however, its manifestation varies across national academic traditions. This study investigates the explicit articulation and terminological adoption of digital methods in Russian historical science by analyzing the prevalence and dynamics of specific technological terms in a large corpus of publications. We first constructed a controlled thesaurus of 166 digital technologies by manually curating keyphrases from Russia’s primary specialized journal in the field (“Istoricheskaya Informatika”, Historical Informatics). This vocabulary was then used to perform text mining on two distinct corpora: a broad sample of 95K Russian-language history articles from various journals (2004–2024) and a focused sample of publications on the history of the Great Patriotic War from the Russian Science Citation Index (RSCI, 2014–2023). Our quantitative analysis reveals the frequency, trends, and thematic context of digital method mentions. The findings highlight a significant disparity between the specialized discourse of “Istoricheskaya Informatika” and mainstream historical publications, while also identifying specific areas (such as archaeological studies) where certain technologies have gained traction. This research offers a novel, data-driven perspective on the “digital turn” in Russian historiography and contributes to the comparative study of the global development of digital humanities.
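The thesaurus-driven text mining described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `count_term_mentions`, the toy terms, and the toy corpus are all hypothetical, and the matching is plain case-insensitive phrase search, whereas Russian-language text would likely also need lemmatization.

```python
import re
from collections import Counter


def count_term_mentions(documents, thesaurus):
    """Count, per year, how many documents mention each thesaurus term.

    documents: iterable of (year, text) pairs (toy stand-in for a corpus).
    thesaurus: iterable of term strings (toy stand-in for a controlled vocabulary).
    """
    # One case-insensitive whole-phrase pattern per term.
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in thesaurus
    }
    counts = {}
    for year, text in documents:
        year_counts = counts.setdefault(year, Counter())
        for term, pattern in patterns.items():
            if pattern.search(text):
                # Document frequency: each document counts once per term.
                year_counts[term] += 1
    return counts


# Hypothetical terms and documents, for illustration only.
toy_thesaurus = ["GIS", "neural network", "3D modeling"]
toy_corpus = [
    (2010, "The excavation data were mapped with GIS."),
    (2020, "A neural network classified the records; GIS layers were added."),
    (2020, "Archival sources on the war are reviewed."),
]
mentions = count_term_mentions(toy_corpus, toy_thesaurus)
```

Counting each document at most once per term keeps long articles from dominating the trend; raw occurrence counts would be an equally reasonable choice under different assumptions.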

22 pages, 3887 KB

Open Access Article

The Impact of Linguistic Variations on Emotion Detection: A Study of Regionally Specific Synthetic Datasets

by Fernando Henrique Calderón Alvarado

Appl. Sci. 2025, 15(7), 3490; https://doi.org/10.3390/app15073490 - 22 Mar 2025

Cited by 2 | Viewed by 1658

Abstract

This study examines the role of regional linguistic variations in synthetic dataset generation and their impact on emotion detection performance. Emotion detection is essential for natural language processing (NLP) applications such as social media analysis, customer service, and mental health monitoring. To explore this, synthetic datasets were generated using a state-of-the-art language model, incorporating English variations from the United States, United Kingdom, and India, alongside a general baseline dataset. Two levels of prompt specificity were employed to assess the influence of regional linguistic nuances. Statistical analyses (including frequency distribution, term frequency-inverse document frequency (TF-IDF), type–token ratio (TTR), hapax legomena, pointwise mutual information (PMI) scores, and key-phrase extraction) revealed significant linguistic diversity and regional distinctions in the generated datasets. To evaluate their effectiveness, classification experiments were conducted with two models, bidirectional encoder representations from transformers (BERT) and its de-noising sequence-to-sequence variant (BART), beginning with zero-shot classification on the contextualized affect representations for emotion recognition (CARER) dataset, followed by fine-tuning with both baseline and region-specific datasets. Results demonstrated that region-specific datasets, particularly those generated with detailed prompts, significantly improved classification accuracy compared to the baseline. These findings underscore the importance of incorporating global linguistic variations in synthetic dataset generation, offering insights into how regional adaptations can enhance emotion detection models for diverse NLP applications.

(This article belongs to the Special Issue Application of Affective Computing)
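Three of the lexical statistics named in this abstract (TTR, hapax legomena, and PMI) have standard textbook definitions, sketched below. This is only an illustration under simple assumptions: whitespace tokenization, PMI computed over adjacent word pairs with base-2 logarithms, and a made-up sample sentence. The function names are hypothetical, and the paper's actual settings are not stated in the abstract.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def hapax_legomena(tokens):
    """Words that occur exactly once, in sorted order."""
    counts = Counter(tokens)
    return sorted(word for word, c in counts.items() if c == 1)


def pmi(tokens, x, y):
    """PMI of the adjacent pair (x, y): log2(p(x, y) / (p(x) * p(y)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_bigrams = len(tokens) - 1
    if n_bigrams <= 0 or bigrams[(x, y)] == 0:
        return float("-inf")  # the pair never occurs together
    p_xy = bigrams[(x, y)] / n_bigrams
    p_x = unigrams[x] / len(tokens)
    p_y = unigrams[y] / len(tokens)
    return math.log2(p_xy / (p_x * p_y))


# Made-up sample text, for illustration only.
tokens = "i am so happy so happy today".split()
ttr = type_token_ratio(tokens)
hapaxes = hapax_legomena(tokens)
score = pmi(tokens, "so", "happy")
```

A positive PMI means the two words co-occur more often than their individual frequencies would predict, which is why it is commonly used to surface region-specific collocations.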

20 pages, 2731 KB

Open Access Article

A New Unsupervised Technique to Analyze the Centroid and Frequency of Keyphrases from Academic Articles

by Mohammad Badrul Alam Miah, Suryanti Awang, Md Mustafizur Rahman, A. S. M. Sanwar Hosen and In-Ho Ra

Electronics 2022, 11(17), 2773; https://doi.org/10.3390/electronics11172773 - 2 Sep 2022

Cited by 5 | Viewed by 2307

Abstract

Automated keyphrase extraction is crucial for extracting and summarizing relevant information from a variety of publications in multiple domains. However, extracting good-quality keyphrases and summarizing information to a good standard have become extremely challenging because of the advancement of technology and the exponential growth of digital sources and textual information. Because of this, the use of keyphrase features in keyphrase extraction techniques has recently gained tremendous popularity. This paper proposes a new unsupervised region-based keyphrase centroid and frequency analysis technique, named the KCFA technique, for keyphrase extraction as a feature. Data/dataset collection, data pre-processing, statistical methodologies, curve plotting analysis, and curve fitting are the five main processes in the proposed technique. To begin, the technique collects multiple datasets from diverse sources, which are then passed to the data pre-processing step, where several text pre-processing operations are applied. Afterward, the region-based statistical methodologies receive the pre-processed data, followed by the curve plotting analysis and, lastly, the curve fitting technique. The proposed technique is then tested and evaluated using ten (10) best-accessible benchmark datasets from various disciplines. The proposed approach is then compared to our available methods to demonstrate its efficacy, advantages, and importance. Lastly, the results of the experiment show that the proposed method works well to analyze the centroid and frequency of keyphrases from academic articles. It provides a centroid of 706.66 and a frequency of 38.95% in the first region, 2454.21 and 7.98% in the second region, for a total frequency of 68.11%.

(This article belongs to the Section Computer Science & Engineering)

Search Results (3)
