Data Mining and Computational Intelligence for E-Learning and Education—3rd Edition

A special issue of Data (ISSN 2306-5729). This special issue belongs to the section "Information Systems and Data Management".

Deadline for manuscript submissions: closed (20 August 2025) | Viewed by 19235

Special Issue Editor

Guest Editor
Facultad de Informática, Universidad Complutense de Madrid, 28001 Madrid, Spain
Interests: machine learning; artificial intelligence; e-learning; programming languages

Special Issue Information

Dear Colleagues,

In recent decades, the rise of artificial intelligence has driven its application in many fields, including education. Some applications analyze data from teaching and learning activities, in both face-to-face and distance-learning environments, using intelligent algorithms to extract information about the educational process. From this information, it is possible to infer the reasons for students' success or failure, identify patterns of behavior and learning, and make other predictions. Other applications use intelligent algorithms to automate parts of the educational process; related to this are the development of chatbots and approaches to ethics in the use of artificial intelligence. An area of interest has thus developed around the application of artificial intelligence to problem solving in education. This Special Issue will bring together works that show the latest advances in the application of artificial intelligence in the educational field, as well as those describing specific experiences and applications to particular problems.

This Special Issue will serve as a meeting point for all researchers working in these fields, whether they study theories or applications. The topics of interest include, but are not limited to, the following:

  • Generative AI for creating educational content and learning materials;
  • Machine learning and deep learning applications in education;
  • Artificial intelligence for personalized learning and adaptive education;
  • Big data and learning analytics in education;
  • Intelligent tutoring systems and virtual teaching assistants;
  • Predictive analytics and data-driven decision-making in education;
  • AI-driven content creation and curriculum development;
  • Ethics and bias in AI-powered educational tools;
  • Learning experience platforms (LXP) and smart educational environments;
  • Data privacy and security in AI-enhanced learning systems;
  • Conversational AI and chatbots for enhancing student engagement;
  • Gamification and immersive learning experiences using AI;
  • Natural language processing (NLP) in educational assessments and feedback.

Both review articles on the state of the art and experimental/theoretical articles are welcome.

Dr. Antonio Sarasa-Cabezuelo
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Data is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • E-learning
  • Machine learning
  • Artificial intelligence
  • Data analysis
  • Algorithms
  • Big data

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (12 papers)

Research

31 pages, 1887 KB  
Article
ZaQQ: A New Arabic Dataset for Automatic Essay Scoring via a Novel Human–AI Collaborative Framework
by Yomna Elsayed, Emad Nabil, Marwan Torki, Safiullah Faizullah and Ayman Khalafallah
Data 2025, 10(9), 148; https://doi.org/10.3390/data10090148 - 19 Sep 2025
Viewed by 616
Abstract
Automated essay scoring (AES) has become an essential tool in educational assessment. However, applying AES to the Arabic language presents notable challenges, primarily due to the lack of labeled datasets. This data scarcity hampers the development of reliable machine learning models and slows progress in Arabic natural language processing for educational use. While manual annotation by human experts remains the most accurate method for essay evaluation, it is often too costly and time-consuming to create large-scale datasets, especially for low-resource languages like Arabic. In this work, we introduce a human–AI collaborative framework designed to overcome the shortage of scored Arabic essays. Leveraging QAES, a high-quality annotated dataset, our approach uses Large Language Models (LLMs) to generate multidimensional essay evaluations across seven key writing traits: Relevance, Organization, Vocabulary, Style, Development, Mechanics, and Structure. To ensure accuracy and consistency, we design prompting strategies and validation procedures tailored to each trait. This system is then applied to two unannotated Arabic essay datasets: ZAEBUC and QALB. As a result, we introduce ZaQQ, a newly annotated dataset that merges ZAEBUC, QAES, and QALB. Our findings demonstrate that human–AI collaboration can significantly enhance the availability of labeled resources without compromising assessment quality. The proposed framework serves as a scalable and replicable model for addressing data annotation challenges in low-resource languages and supports the broader goal of expanding access to automated educational assessment tools where expert evaluation is limited. Full article

18 pages, 1810 KB  
Article
Analysis of Student Dropout Risk in Higher Education Using Proportional Hazards Model and Based on Entry Characteristics
by Liga Paura, Irina Arhipova, Gatis Vitols and Sandra Sproge
Data 2025, 10(7), 110; https://doi.org/10.3390/data10070110 - 8 Jul 2025
Viewed by 2643
Abstract
The aim of this study is to identify the key factors contributing to student dropout and to develop a predictive model that estimates the dropout risk of students based on their entry characteristics and enrolment registration data. Our analysis is based on the registration and academic data of 971 full-time and part-time bachelor’s students in five faculties, who were enrolled in the academic year 2021–2022 at the Latvia University of Life Sciences and Technologies (LBTU). The dropout analysis was done during the 3.5 years of study, when the students started their last semester in engineering and information technology, agriculture and food technology, economics and social sciences, and forest and environmental studies and when veterinary medicine students had completed more than half of their program of study. Survival analysis methods were used during the study. Students’ dropout risk in relation to gender, faculty, priority to study in the program, and secondary school performance (SM) was estimated using the Proportional hazard model (Cox model). The highest student dropout was observed during the first year of study. Secondary school performance was a significant predictor of students’ dropout risk; students with higher SM had a lower dropout risk (HR = 0.66, p < 0.05). As well, student dropout can be explained by faculty or study programme. Students in economics and social sciences were at lower dropout risk than the students from the other faculties. Results show the model’s concordance index was 0.59, and this indicates that additional or stronger predictors may be needed to improve model performance. Full article
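As a reading aid for the hazard ratio quoted in the abstract above, the following is a minimal worked sketch of the standard Cox proportional hazards form. The covariate indexing and the assumption that SM enters the model as a single term are illustrative, not details taken from the paper.

```latex
% Standard Cox model: baseline hazard h_0(t) scaled by the covariates x_1..x_p.
\[
  h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right),
  \qquad
  \mathrm{HR}_j = \frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = e^{\beta_j}
\]
% Reported value for secondary school performance (SM), assuming a single SM term:
\[
  \mathrm{HR}_{\mathrm{SM}} = 0.66
  \;\Longrightarrow\;
  \beta_{\mathrm{SM}} = \ln 0.66 \approx -0.416,
  \qquad
  1 - 0.66 = 0.34
\]
```

Under these assumptions, a one-unit increase in SM corresponds to a dropout hazard about 34% lower at any given time, with the other covariates held fixed.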

24 pages, 1586 KB  
Article
Effective Education System for Athletes Utilising Big Data and AI Technology
by Martin Mičiak, Dominika Toman, Roman Adámik, Ema Kufová, Branislav Škulec, Nikola Mozolová and Aneta Hoferová
Data 2025, 10(7), 102; https://doi.org/10.3390/data10070102 - 24 Jun 2025
Viewed by 1246
Abstract
Education leads to building successful careers. However, different groups of students have different studying preferences. Our target group are athletes, combining their education and sports training. The main objective is to provide recommendations for an effective education system for athletes, improving their chances of finding new careers after leaving sports. Such a system must include Big Data and utilise AI possibilities currently available that support athletes’ career planning and development in a meaningful way. The main objective is specified by the following partial objectives: identifying what types of Big Data to analyse in connection with the athletes’ education; revealing what AI tools to include in the athletes’ education for their better preparation for a career after sports; determining what knowledge of AI and Big Data athletes need to stay relevant once they enter the labour market. Our study combines secondary and primary data sources. The secondary data (used in the orientation analysis) include case studies on AI and Big Data connected to education. The primary data were collected via a survey performed on over 200 Slovak junior athletes. The results show directions for the sports policymakers and sports organisations’ managers willing to improve their athletes’ career prospects. Full article

10 pages, 267 KB  
Article
Dataset on Programming Competencies Development Using Scratch and a Recommender System in a Non-WEIRD Primary School Context
by Jesennia Cárdenas-Cobo, Cristian Vidal-Silva and Nicolás Máquez
Data 2025, 10(6), 86; https://doi.org/10.3390/data10060086 - 3 Jun 2025
Viewed by 751
Abstract
The ability to program has become an essential competence for individuals in an increasingly digital world. However, access to programming education remains unequal, particularly in non-WEIRD (Western, Educated, Industrialized, Rich, and Democratic) contexts. This study presents a dataset resulting from an educational intervention designed to foster programming competencies and computational thinking skills among primary school students aged 8 to 12 years in Milagro, Ecuador. The intervention integrated Scratch, a block-based programming environment that simplifies coding by eliminating syntactic barriers, and the CARAMBA recommendation system, which provided personalized learning paths based on students’ progression and preferences. A structured educational process was implemented, including an initial diagnostic test to assess logical reasoning, guided activities in Scratch to build foundational skills, a phase of personalized practice with CARAMBA, and a final computational thinking evaluation using a validated assessment instrument. The resulting dataset encompasses diverse information: demographic data, logical reasoning test scores, computational thinking test results pre- and post-intervention, activity logs from Scratch, recommendation histories from CARAMBA, and qualitative feedback from university student tutors who supported the intervention. The dataset is anonymized, ethically collected, and made available under a CC-BY 4.0 license to encourage reuse. This resource is particularly valuable for researchers and practitioners interested in computational thinking development, educational data mining, personalized learning systems, and digital equity initiatives. It supports comparative studies between WEIRD and non-WEIRD populations, validation of adaptive learning models, and the design of inclusive programming curricula. 
Furthermore, the dataset enables the application of machine learning techniques to predict educational outcomes and optimize personalized educational strategies. By offering this dataset openly, the study contributes to filling critical gaps in educational research, promoting inclusive access to programming education, and fostering a more comprehensive understanding of how computational competencies can be developed across diverse socioeconomic and cultural contexts. Full article

29 pages, 4066 KB  
Article
SAPEx-D: A Comprehensive Dataset for Predictive Analytics in Personalized Education Using Machine Learning
by Muhammad Adnan Aslam, Fiza Murtaza, Muhammad Ehatisham Ul Haq, Amanullah Yasin and Numan Ali
Data 2025, 10(3), 27; https://doi.org/10.3390/data10030027 - 20 Feb 2025
Cited by 4 | Viewed by 2196
Abstract
Education is crucial for leading a productive life and obtaining necessary resources. Higher education institutions are progressively incorporating artificial intelligence into conventional teaching methods as a result of innovations in technology. As a high academic record raises a university’s ranking and increases student career chances, predicting learning success has been a central focus in education. Both performance analysis and providing high-quality instruction are challenges faced by modern schools. Maintaining high academic standards, juggling life and academics, and adjusting to technology are problems that students must overcome. In this study, we present a comprehensive dataset, SAPEx-D (Student Academic Performance Exploration), designed to predict student performance, encompassing a wide array of personal, familial, academic, and behavioral factors. Our data collection effort at Air University, Islamabad, Pakistan, involved both online and paper questionnaires completed by students across multiple departments, ensuring diverse representation. After meticulous preprocessing to remove duplicates and entries with significant missing values, we retained 494 valid responses. The dataset includes detailed attributes such as demographic information, parental education and occupation, study habits, reading frequencies, and transportation modes. To facilitate robust analysis, we encoded ordinal attributes using label encoding and nominal attributes using one-hot encoding, expanding our dataset from 38 to 88 attributes. Feature scaling was performed to standardize the range and distribution of data, using a normalization technique. Our analysis revealed that factors such as degree major, parental education, reading frequency, and scholarship type significantly influence student performance. 
The machine learning models applied to this dataset, including Gradient Boosting and Random Forest, demonstrated high accuracy and robustness, underscoring the dataset’s potential for insightful academic performance prediction. In terms of model performance, Gradient Boosting achieved an accuracy of 68.7% and an F1-score of 68% for the eight-class classification task. For the three-class classification, Random Forest outperformed other models, reaching an accuracy of 80.8% and an F1-score of 78%. These findings highlight the importance of comprehensive data in understanding and predicting academic outcomes, paving the way for more personalized and effective educational strategies. Full article
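The preprocessing steps named in the abstract above (label encoding for ordinal attributes, one-hot encoding for nominal attributes, and normalization) might be sketched roughly as follows. This is an illustration only: the function names and example values are hypothetical, and min-max scaling is an assumption, since the abstract does not say which normalization technique was used.

```python
def label_encode(values, order):
    """Map ordinal categories to integers following an explicit order."""
    rank = {category: index for index, category in enumerate(order)}
    return [rank[value] for value in values]


def one_hot_encode(values):
    """Expand a nominal attribute into one 0/1 column per category.

    Returns the sorted category names and one row of indicators per value.
    """
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


def min_max_scale(values):
    """Rescale numbers to the range [0, 1] (assumed normalization choice)."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


if __name__ == "__main__":
    # Hypothetical attribute values, not taken from the SAPEx-D dataset.
    reading = ["low", "high", "medium"]
    transport = ["bus", "car", "bus"]
    encoded = label_encode(reading, ["low", "medium", "high"])
    print(encoded)
    print(one_hot_encode(transport))
    print(min_max_scale(encoded))
```

One-hot encoding adds a column per category, which is consistent with the attribute count growing after encoding, as the abstract reports.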

17 pages, 662 KB  
Article
A Bayesian State-Space Approach to Dynamic Hierarchical Logistic Regression for Evolving Student Risk in Educational Analytics
by Moeketsi Mosia
Data 2025, 10(2), 23; https://doi.org/10.3390/data10020023 - 7 Feb 2025
Viewed by 1724
Abstract
Early detection of academically at-risk students is crucial for designing timely interventions that improve educational outcomes. However, many existing approaches either ignore the temporal evolution of student performance or rely on “black box” models that sacrifice interpretability. In this study, we develop a dynamic hierarchical logistic regression model in a fully Bayesian framework to address these shortcomings. Our method leverages partial pooling across students and employs a state-space formulation, allowing each student’s log-odds of failure to evolve over multiple assessments. By using Markov chain Monte Carlo for inference, we obtain robust posterior estimates and credible intervals for both population-level and individual-specific effects, while posterior predictive checks ensure model adequacy and calibration. Results from simulated and real-world datasets indicate that the proposed approach more accurately tracks fluctuations in student risk compared to static logistic regression, and it yields interpretable insights into how engagement patterns and demographic factors influence failure probability. We conclude that a Bayesian dynamic hierarchical model not only enhances prediction of at-risk students but also provides actionable feedback for instructors and administrators seeking evidence-based interventions. Full article

14 pages, 770 KB  
Article
Stress Factors in Higher Education: A Data Analysis Case
by Rodolfo Bojorque, Fernando Moscoso, Fernando Pesántez and Ángela Flores
Data 2025, 10(2), 22; https://doi.org/10.3390/data10020022 - 7 Feb 2025
Viewed by 2832
Abstract
This study investigates stressors in higher education, focusing on their impact on students and faculty at Universidad Politécnica Salesiana (UPS) and using eight years of comprehensive data. Employing data mining techniques, the research analyzed enrollment, retention, graduation, employability, socioeconomic status, academic performance, and faculty workload to uncover patterns affecting academic outcomes. The study found that UPS exhibits a stable educational system, maintaining consistent metrics across student success indicators. However, the COVID-19 pandemic presented unique stressors, evidenced by a paradoxical increase in student grades during heightened faculty stress levels. This anomaly suggests a potential link between academic rigor and faculty well-being during systemic disruptions. Stressors affecting students directly correlated with reduced academic performance, highlighting the importance of early detection and intervention. Conversely, faculty stress was reflected in adjustments to grading practices, raising questions about institutional pressures and faculty motivation. These findings emphasize the value of proactive data analytics in identifying stress-induced anomalies to support student success and faculty well-being. The study advocates for further research on faculty burnout, motivation, and institutional strategies to mitigate stressors, underscoring the potential of data-driven approaches to enhance the quality and sustainability of higher education ecosystems. Full article

Other

10 pages, 1572 KB  
Data Descriptor
Simultaneous EEG-fNIRS Data on Learning Capability via Implicit Learning Induced by Cognitive Tasks
by Chayapol Chaiyanan, Thanate Angsuwatanakul, Keiji Iramina and Boonserm Kaewkamnerdpong
Data 2025, 10(8), 131; https://doi.org/10.3390/data10080131 - 18 Aug 2025
Viewed by 734
Abstract
The development of real-time learning assessment tools is hindered by an incomplete understanding of the underlying neural mechanisms. To address this gap, this study aimed to identify the specific neural correlates of implicit learning, a foundational process crucial for skill acquisition. We collected simultaneous electroencephalography and functional near-infrared spectroscopy data from thirty healthy adults (ages 21–29) performing a serial reaction time task designed to induce implicit learning. By capturing both electrophysiological and hemodynamic responses concurrently at shared locations, this dataset offers a unique opportunity to investigate neurovascular coupling during implicit learning and gain deeper insights into the neural mechanisms of learning. The dataset is categorized into two groups: participants who demonstrated implicit learning (based on post-experiment interviews) and those who did not. This dataset enables the identification of prominent brain regions, features, and temporal patterns associated with successful implicit learning. This identification will form the basis for future real-time learning assessment tools. Full article

8 pages, 529 KB  
Data Descriptor
An Extended Dataset of Educational Quality Across Countries (1970–2023)
by Hanol Lee and Jong-Wha Lee
Data 2025, 10(8), 130; https://doi.org/10.3390/data10080130 - 15 Aug 2025
Viewed by 938
Abstract
This study presents an extended dataset on educational quality covering 101 countries, from 1970 to 2023. While existing international assessments, such as the Programme for International Student Assessment (PISA) and Trends in International Mathematics and Science Study (TIMSS), offer valuable snapshots of student performance, their limited coverage across countries and years constrains broader analyses. To address this limitation, we harmonized observed test scores across assessments and imputed missing values using both linear interpolation and machine learning (Least Absolute Shrinkage and Selection Operator (LASSO) regression). The dataset included (i) harmonized test scores for 15 year olds, (ii) annual educational quality indicators for the 15–19 age group, and (iii) educational quality indexes for the working-age population (15–64). These measures are provided in machine-readable formats and support empirical research on human capital, economic development, and global education inequalities across economies. Full article

15 pages, 770 KB  
Data Descriptor
NPFC-Test: A Multimodal Dataset from an Interactive Digital Assessment Using Wearables and Self-Reports
by Luis Fernando Morán-Mirabal, Luis Eduardo Güemes-Frese, Mariana Favarony-Avila, Sergio Noé Torres-Rodríguez and Jessica Alejandra Ruiz-Ramirez
Data 2025, 10(7), 103; https://doi.org/10.3390/data10070103 - 30 Jun 2025
Viewed by 723
Abstract
The growing implementation of digital platforms and mobile devices in educational environments has generated the need to explore new approaches for evaluating the learning experience beyond traditional self-reports or instructor presence. In this context, the NPFC-Test dataset was created from an experimental protocol conducted at the Experiential Classroom of the Institute for the Future of Education. The dataset was built by collecting multimodal indicators such as neuronal, physiological, and facial data using a portable EEG headband, a medical-grade biometric bracelet, a high-resolution depth camera, and self-report questionnaires. The participants were exposed to a digital test lasting 20 min, composed of audiovisual stimuli and cognitive challenges, during which synchronized data from all devices were gathered. The dataset includes timestamped records related to emotional valence, arousal, and concentration, offering a valuable resource for multimodal learning analytics (MMLA). The recorded data were processed through calibration procedures, temporal alignment techniques, and emotion recognition models. It is expected that the NPFC-Test dataset will support future studies in human–computer interaction and educational data science by providing structured evidence to analyze cognitive and emotional states in learning processes. In addition, it offers a replicable framework for capturing synchronized biometric and behavioral data in controlled academic settings. Full article

18 pages, 6224 KB  
Data Descriptor
A Structured Dataset for Automated Grading: From Raw Data to Processed Dataset
by Ibidapo Dare Dada, Adio T. Akinwale and Ti-Jesu Tunde-Adeleke
Data 2025, 10(6), 87; https://doi.org/10.3390/data10060087 - 6 Jun 2025
Cited by 1 | Viewed by 2170
Abstract
The increasing volume of student assessments, particularly open-ended responses, presents a significant challenge for educators in ensuring grading accuracy, consistency, and efficiency. This paper presents a structured dataset designed for the development and evaluation of automated grading systems in higher education. The primary objective is to create a high-quality dataset that facilitates the development and evaluation of natural language processing (NLP) models for automated grading. The dataset comprises student responses to open-ended questions from the Management Information Systems (MIS221) and Project Management (MIS415) courses at Covenant University, collected during the 2022/2023 academic session. The responses were originally handwritten, scanned, and transcribed into Word documents. Each response is paired with corresponding scores assigned by human graders, following a detailed marking guide. To assess the dataset’s potential for automated grading applications, several machine learning and transformer-based models were tested, including TF-IDF with Linear Regression, TF-IDF with Cosine Similarity, BERT, SBERT, RoBERTa, and Longformer. The experimental results demonstrate that transformer-based models outperform traditional methods, with Longformer achieving the highest Spearman’s Correlation of 0.77 and the lowest Mean Squared Error (MSE) of 0.04, indicating a strong alignment between model predictions and human grading. The findings highlight the effectiveness of deep learning models in capturing the semantic and contextual meaning of both student responses and marking guides, making it possible to develop more scalable and reliable automated grading solutions. This dataset offers valuable insights into student performance and serves as a foundational resource for integrating educational technology into automated assessment systems. Future work will focus on enhancing grading consistency and expanding the dataset for broader academic applications. 
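One of the baselines named in the abstract above, TF-IDF with cosine similarity, can be illustrated with a small standard-library sketch that compares a student response against a marking guide. The tokenizer, the smoothed IDF formula, and the example sentences are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(document) for document in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {term: math.log((1 + n_docs) / (1 + freq)) + 1.0
           for term, freq in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * idf[term]
                        for term, count in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical marking guide and responses, not taken from the dataset.
    guide = "a project has a defined scope schedule and budget"
    responses = ["a project has a scope and a budget",
                 "information systems store data"]
    vectors = tfidf_vectors([guide] + responses)
    for response, vector in zip(responses, vectors[1:]):
        print(round(cosine_similarity(vectors[0], vector), 3), response)
```

A similarity score like this would still need to be mapped to the grading scale, and it ignores word order and meaning, which may be one reason the transformer-based models fared better in the reported experiments.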

11 pages, 1930 KB  
Data Descriptor
Towards a Dataset of Digitalized Historical German VET and CVET Regulations
by Thomas Reiser, Jens Dörpinghaus, Petra Steiner and Michael Tiemann
Data 2024, 9(11), 128; https://doi.org/10.3390/data9110128 - 3 Nov 2024
Viewed by 1550
Abstract
The digitization of historical documents has gained particular interest in recent years in the digital humanities. The goal is to digitize historical documents by extracting and structuring text from scanned images. Here, we focus on the processing of historical German VET (vocational education and training) and CVET (continuing vocational education and training) regulations to support educational research. This dataset contains data from 1908 to the present and includes 2125 documents as PDF, 983 fully converted XML documents, and additional metadata for 7090 documents from the archive. We present an overview of the historical background and the challenges of processing different historical documents from three different federal states. Full article