Search Results (384)

Search Parameters:
Keywords = NLP model evaluation

27 pages, 1259 KB  
Review
Integrating Artificial Intelligence in Audit Workflow: Opportunities, Architecture, and Challenges: A Systematic Review
by Ashif Anwar and Muhammad Osama Akeel
Account. Audit. 2026, 2(1), 4; https://doi.org/10.3390/accountaudit2010004 - 9 Mar 2026
Viewed by 297
Abstract
Background: This paper is a systematic review of 100 peer-reviewed articles (2015–2025) related to artificial intelligence (AI) applications in the auditing field, and includes machine learning, natural language processing, robotic process automation, and other AI methods. Purpose: The paper delves into the integration of these AI technologies into the audit workflow; empirical implications of these technologies on audit effectiveness; efficiency and quality; and technical, organizational, and regulatory obstacles that suggest more widespread adoption is still limited. Methods: Five large-scale databases and other sources were searched and selected using PRISMA; structured data were extracted, assessed in quality and narrative, and thematically analyzed. Results: The discussion indicates that machine learning-based anomaly detection and predictive analytics, document analysis through NLP, and automation through RPA are becoming part of planning, risk assessments, control tests, and substantive procedures/reporting, with improvements in detection capabilities, coverage and efficiency reported in various empirical and design science studies. The review also presents common architectural models of AI-enabled audit processes, including layered data and governance, model development and oversight, orchestration and automation, auditor-facing applications, and human-in-the-loop controls. Conclusions: The article proposes an AI-based audit workflow reference architecture and summarizes evidence on opportunities, threats, and implementation obstacles, highlighting gaps in longitudinal assessment, comparative evaluation of AI methods, and regulatory recommendations. The results have practical implications for auditors, standard-setters, and system designers seeking to revise the audit approach and regulations to enable AI-driven assurance. Full article

51 pages, 1067 KB  
Article
Language Models Are Polyglots: Language Similarity Predicts Cross-Lingual Transfer Learning Performance
by Juuso Eronen, Michal Ptaszynski, Tomasz Wicherkiewicz, Robert Borges, Katarzyna Janic, Zhenzhen Liu, Tanjim Mahmud and Fumito Masui
Mach. Learn. Knowl. Extr. 2026, 8(3), 65; https://doi.org/10.3390/make8030065 - 7 Mar 2026
Viewed by 267
Abstract
Selecting a source language for zero-shot cross-lingual transfer is typically done by intuition or by defaulting to English, despite large performance differences across language pairs. We study whether linguistic similarity can predict transfer performance and support principled source-language selection. We introduce quantified WALS (qWALS), a typology-based similarity metric derived from features in the World Atlas of Language Structures, and evaluate it against existing similarity baselines. Validation uses three complementary signals: computational similarity scores, zero-shot transfer performance of multilingual transformers (mBERT and XLM-R) on four NLP tasks (dependency parsing, named entity recognition, sentiment analysis, and abusive language identification) across eight languages, and an expert-linguist similarity survey. Across tasks and models, higher linguistic similarity is associated with better transfer, and the survey provides independent support for the computational metrics. Full article
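The typological-similarity idea behind qWALS can be illustrated with a minimal sketch: compare two languages on the WALS features they share and score the fraction that agree. The feature IDs and values below are placeholders for illustration, not the paper's exact metric.

```python
# Sketch of a WALS-feature similarity in the spirit of qWALS (the paper's
# exact quantification is not reproduced; feature values are illustrative).

def wals_similarity(lang_a: dict, lang_b: dict) -> float:
    """Fraction of WALS features, defined for both languages, on which
    the two languages take the same value."""
    shared = [f for f in lang_a if f in lang_b]
    if not shared:
        return 0.0
    matches = sum(1 for f in shared if lang_a[f] == lang_b[f])
    return matches / len(shared)

# Toy feature vectors (WALS feature IDs and values are placeholders).
english = {"81A": "SVO", "85A": "Prepositions", "51A": "No case"}
german  = {"81A": "No dominant", "85A": "Prepositions", "51A": "Case"}
finnish = {"81A": "SVO", "85A": "Postpositions", "51A": "Case"}

print(wals_similarity(english, german))   # 1 of 3 shared features agree
```

A ranking of candidate source languages by this score is then straightforward: pick the candidate maximizing similarity to the target.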

18 pages, 1182 KB  
Article
Co-MedGraphRAG: A Collaborative Large–Small Model Medical Question-Answering Framework Enhanced by Knowledge Graph Reasoning
by Sizhe Chen and Tao Chen
Information 2026, 17(3), 247; https://doi.org/10.3390/info17030247 - 2 Mar 2026
Viewed by 261
Abstract
Large language models (LLMs) have demonstrated significant capabilities in natural language processing (NLP), but they often encounter challenges in the medical domain. This can result in insufficient alignment between generated answers and user intent, as well as factual deviations. To address these issues, we propose Co-MedGraphRAG, a novel framework combining knowledge graph reasoning with large–small model collaboration, aimed at improving the structural grounding and interpretability of medical responses. The framework operates through a multi-stage collaborative mechanism to augment question answering. First, a large language model constructs a question-specific knowledge graph (KG) containing pending entities (denoted as “none”) to explicitly define known and unknown variables. Subsequently, a hybrid reasoning strategy is employed to populate the pending entities, thereby completing the question-specific knowledge graph. Finally, this graph serves as critical structured evidence, combined with the original question, to augment the large language model in generating the final answer, implemented using Qwen2.5-7B and GLM4-9B in this paper. To evaluate the generated answers, we introduce a larger-parameter LLM (GPT-4o) to assess performance across five dimensions and compute an overall score. Experiments on three medical datasets demonstrate that Co-MedGraphRAG achieves consistent improvements in relevance, practicality, and structured knowledge support compared with mainstream Retrieval-Augmented Generation (RAG) frameworks. This work serves as a reference for researchers and developers designing medical question-answering frameworks and exploring decision-support applications. Full article

21 pages, 1533 KB  
Article
Enterprise E-Mail Classification Using Instruction-Following Large Language Models
by Ahmet Çağrı Sarıyıldız and Şafak Durukan-Odabaşı
Appl. Sci. 2026, 16(5), 2173; https://doi.org/10.3390/app16052173 - 24 Feb 2026
Viewed by 266
Abstract
Enterprise e-mail corpora contain heterogeneous and domain-specific content that poses challenges for conventional supervised Natural Language Processing (NLP) approaches due to class imbalance, evolving terminology, and limited labeled data. This study examines the use of instruction-following Large Language Models (LLMs) for enterprise e-mail classification under realistic operational conditions. The study evaluates instruction-based classification and semantic enrichment derived from distributional similarity as two complementary approaches for distinguishing technical from nontechnical messages. The approaches are assessed on a large-scale enterprise e-mail corpus and validated using a manually annotated subset. The results indicate that instruction-following LLMs provide stable contextual reasoning across diverse message structures, while semantic enrichment improves coverage of previously unseen technical expressions. Overall, the study presents an applied NLP framework for enterprise e-mail classification, with attention to interpretability, scalability, and robustness in real-world organizational settings. Full article
(This article belongs to the Special Issue Machine Learning Approaches in Natural Language Processing)

14 pages, 2451 KB  
Article
SQ-LoRA: Memory-Efficient Language Model Compression Through Stable-Rank-Guided Quantization for Edge Computing Applications
by Seda Bayat Toksöz and Gültekin Işik
Appl. Sci. 2026, 16(4), 2113; https://doi.org/10.3390/app16042113 - 21 Feb 2026
Viewed by 277
Abstract
The deployment of transformer-based language models on resource-constrained edge devices presents fundamental challenges in computational efficiency and memory utilization. We introduce SQ-LoRA (Stable-rank Quantized Low-Rank Adaptation), a theoretically grounded compression framework that achieves unprecedented efficiency through the synergistic integration of adaptive low-rank decomposition, hardware-accelerated structured sparsity, and intelligent hybrid quantization. Our primary contribution establishes the first rigorous mathematical connection between the matrix stable rank and optimal LoRA rank selection, formalized in Theorem I, which provides bounded approximation guarantees. SQ-LoRA implements: (1) adaptive rank allocation via stable-rank analysis to automatically determine layer-wise compression ratios; (2) 4:8 structured sparsity patterns, enabling 2× hardware acceleration on modern edge processors; and (3) a three-tier quantization scheme that combines 4-bit NormalFloat storage with selective 3-bit/8-bit precision to preserve outliers. A comprehensive evaluation on four diverse natural language processing (NLP) benchmarks demonstrates that SQ-LoRA achieves a 320 MB memory footprint (96.7% reduction) and a 10 ms inference latency (91.7% improvement), and maintains 82.0% average accuracy (within 0.15% of the full model). Statistical significance testing (p < 0.001) confirms its superiority over state-of-the-art methods. This framework enables the deployment of sophisticated language models on devices with 2 GB of RAM, advancing practical edge-AI applications. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
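The stable rank that drives SQ-LoRA's rank allocation is a standard quantity: the squared Frobenius norm divided by the squared spectral norm, i.e. sum(sigma_i^2) / sigma_max^2, always bounded above by the true rank. A minimal illustration (the paper's adaptive allocation rule mapping stable rank to LoRA rank is not reproduced here):

```python
import numpy as np

def stable_rank(A: np.ndarray) -> float:
    """Stable rank: squared Frobenius norm over squared spectral norm,
    i.e. sum(sigma_i^2) / sigma_max^2. Always <= rank(A)."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return float(np.sum(s**2) / s[0]**2)

# A matrix whose energy concentrates in one direction has stable rank near 1;
# a well-conditioned matrix has stable rank near its full rank.
print(stable_rank(np.ones((4, 4))))  # rank-1 matrix: stable rank = 1.0
print(stable_rank(np.eye(8)))        # identity: stable rank = 8.0
```

Because the stable rank is insensitive to tiny singular values, it is a natural guide for choosing how many LoRA directions a layer actually needs.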

32 pages, 9123 KB  
Article
AI-Based Classification of IT Support Requests in Enterprise Service Management Systems
by Audrius Razma and Robertas Jurkus
Systems 2026, 14(2), 223; https://doi.org/10.3390/systems14020223 - 21 Feb 2026
Viewed by 328
Abstract
In modern organizations, IT Service Management (ITSM) relies on the efficient handling of large volumes of unstructured textual data, such as support tickets and incident reports. This study investigates the automated classification of IT support requests as a data-driven decision-support task within a real-world enterprise ITSM context, addressing challenges posed by multilingual content and severe class imbalance. We propose an applied machine-learning and natural language processing (NLP) pipeline combining text cleaning, stratified data splitting, and supervised model training under realistic evaluation conditions. Multiple classification models were evaluated on historical enterprise ticket data, including a Logistic Regression baseline and transformer-based architectures (multilingual BERT and XLM-RoBERTa). Model validation distinguishes between deployment-oriented evaluation on naturally imbalanced data and diagnostic analysis using training-time class balancing to examine minority-class behavior. Results indicate that Logistic Regression performs reliably for high-frequency, well-defined request categories, while transformer-based models achieve consistently higher macro-averaged F1-scores and improved recognition of semantically complex and underrepresented classes. Training-time oversampling increases sensitivity to minority request types without improving overall accuracy on unbalanced test data, highlighting the importance of metric selection in ITSM evaluation. The findings provide an applied empirical comparison of established text-classification models in ITSM, incorporating both predictive performance and computational efficiency considerations, and offer practical guidance for supporting IT support agents during ticket triage and automated request classification. Full article
(This article belongs to the Section Artificial Intelligence and Digital Systems Engineering)
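The paper's point about metric selection under class imbalance is easy to demonstrate: a classifier that always predicts the majority ticket category scores high accuracy yet poor macro-averaged F1. A stdlib sketch with made-up labels:

```python
# Accuracy vs. macro-F1 on an imbalanced ticket set (toy data, not the
# enterprise corpus from the study).

def f1(y_true, y_pred, label):
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def macro_f1(y_true, y_pred):
    labels = sorted(set(y_true))
    return sum(f1(y_true, y_pred, lbl) for lbl in labels) / len(labels)

# 90 "network" tickets, 10 "security" tickets; model always says "network".
y_true = ["network"] * 90 + ["security"] * 10
y_pred = ["network"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                  # 0.9  -- looks strong
print(macro_f1(y_true, y_pred))  # ~0.47 -- exposes the missed minority class
```

This is exactly why the study reports macro-averaged F1 alongside accuracy when diagnosing minority-class behavior.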

22 pages, 392 KB  
Review
Word Sense Disambiguation with Wikipedia Entities: A Survey of Entity Linking Approaches
by Michael Angelos Simos and Christos Makris
Entropy 2026, 28(2), 236; https://doi.org/10.3390/e28020236 - 18 Feb 2026
Viewed by 380
Abstract
The inference of unstructured text semantics is a crucial preprocessing task for NLP and AI applications. Word sense disambiguation and entity linking tasks resolve ambiguous terms within unstructured text corpora to senses from a predefined knowledge source. Wikipedia has been one of the most popular sources due to its completeness, high link density, and multi-language support. In the context of chatbot-mediated consumption of information in recent years through implicit disambiguation and semantic representations in LLMs, Wikipedia remains an invaluable source and reference point. This survey covers methodologies for entity linking with Wikipedia, including early systems based on hyperlink statistics and semantic relatedness, methods using graph inference problem formalizations and graph label propagation algorithms, neural and contextual methods based on sense embeddings and transformers, and multimodal, cross-lingual, and cross-domain settings. Moreover, we cover semantic annotation workflows that facilitate the scaled-up use of Wikipedia-centric entity linking. We also provide an overview of the available datasets and evaluation measures. We discuss challenges such as partial coverage, NIL concepts, the level of sense definition, combining WSD and large-scale language models, as well as the complementary use of Wikidata. Full article
(This article belongs to the Special Issue Information Theoretic Learning with Its Applications)
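The hyperlink-statistics systems surveyed here typically start from a "commonness" prior, P(sense | mention), estimated from Wikipedia anchor-text counts. A minimal sketch, with invented counts:

```python
# Commonness prior for entity linking, estimated from anchor-text statistics.
# The counts below are made up for illustration.

anchor_counts = {
    "java": {"Java_(programming_language)": 8210,
             "Java_(island)": 1650,
             "Java_coffee": 240},
}

def commonness(mention, sense):
    """P(sense | mention) from anchor counts; 0.0 for unseen pairs."""
    counts = anchor_counts.get(mention.lower(), {})
    total = sum(counts.values())
    return counts.get(sense, 0) / total if total else 0.0

def most_common_sense(mention):
    counts = anchor_counts.get(mention.lower(), {})
    return max(counts, key=counts.get) if counts else None

print(most_common_sense("Java"))  # Java_(programming_language)
```

More sophisticated linkers combine this prior with context, e.g. semantic relatedness between candidate senses of nearby mentions, but the prior alone is a surprisingly strong baseline.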

16 pages, 668 KB  
Article
Evaluation of a Company’s Media Reputation Based on the Articles Published on News Portals
by Algimantas Venčkauskas, Vacius Jusas and Dominykas Barisas
Appl. Sci. 2026, 16(4), 1987; https://doi.org/10.3390/app16041987 - 17 Feb 2026
Viewed by 232
Abstract
A company’s reputation is an important, intangible asset, which is heavily influenced by media reputation. We developed a method to measure a company’s reputation based on sentiments detected in online articles. The sentiment of each sentence was evaluated and categorized into one of three polarities: positive, negative, or neutral. Then, we developed another method to assess a company’s media reputation using all available online articles about the company. The company’s media reputation is presented as a tuple consisting of their media reputation on a scale from 0 to 100, the number of articles related to the company, and the margin of error. Experiments were conducted using articles written in Lithuanian published on major news portals. We used two different tools to assess the sentiments of the articles: Stanford CoreNLP v.4.5.10, combined with Google API, and the pre-trained transformer model XLM-RoBERTa. Google API was used for translation into English, as Stanford CoreNLP does not support the Lithuanian language. The results obtained were compared with those of existing methods, based on the coefficients of media endorsement and media favorableness, showing that the results of the proposed method are less moderate than the coefficient of media favorableness and less extreme than the coefficient of media endorsement. Full article
(This article belongs to the Special Issue Multimodal Emotion Recognition and Affective Computing)
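The reputation tuple described above (score on a 0–100 scale, article count, margin of error) could be computed along these lines. The scoring scheme and the normal-approximation interval are illustrative assumptions, not the paper's formulas:

```python
import math

def media_reputation(n_pos, n_neu, n_neg, z=1.96):
    """Illustrative reputation tuple: (score on 0-100, sample size,
    margin of error). Half-weighting neutral sentiment and the
    normal-approximation interval are assumptions for this sketch."""
    n = n_pos + n_neu + n_neg
    p = (n_pos + 0.5 * n_neu) / n          # neutral counts half
    moe = z * math.sqrt(p * (1 - p) / n)   # 95% CI half-width for a proportion
    return round(100 * p, 1), n, round(100 * moe, 1)

# Hypothetical sentiment counts aggregated over articles about one company.
print(media_reputation(n_pos=120, n_neu=300, n_neg=80))  # (54.0, 500, 4.4)
```

Reporting the margin of error alongside the score, as the paper does, keeps companies with few articles from appearing spuriously well or badly rated.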

13 pages, 1009 KB  
Article
Phishing Email Detection Using BERT and RoBERTa
by Mariam Ibrahim and Ruba Elhafiz
Computation 2026, 14(2), 46; https://doi.org/10.3390/computation14020046 - 7 Feb 2026
Viewed by 1028
Abstract
One of the most harmful and deceptive forms of cybercrime is phishing, which targets users with malicious emails and websites. In this paper, we focus on the use of natural language processing (NLP) techniques and transformer models for phishing email detection. The Nazario Phishing Corpus is preprocessed and blended with real emails from the Enron dataset to create a robustly balanced dataset. Urgency, deceptive phrasing, and structural anomalies were some of the neglected features and sociolinguistic traits of the text, which underwent tokenization, lemmatization, and noise filtration. We fine-tuned two transformer models, Bidirectional Encoder Representations from Transformers (BERT) and the Robustly Optimized BERT Pretraining Approach (RoBERTa), for binary classification. The models were evaluated on the standard metrics of accuracy, precision, recall, and F1-score. Given the context of phishing, emphasis was placed on recall to reduce the number of phishing attacks that went unnoticed. The results show that RoBERTa has more general performance and fewer false negatives than BERT and is therefore a better candidate for deployment on security-critical tasks. Full article

18 pages, 1035 KB  
Article
Narrative Divergence and Disinformation: An Entropic Model for Assessing the Informative Utility of Public Information Sources
by José Ignacio Peláez, Gustavo Fabian Vaccaro and Felix Infante León
Entropy 2026, 28(2), 183; https://doi.org/10.3390/e28020183 - 6 Feb 2026
Viewed by 520
Abstract
In today’s information ecosystem, disinformation threatens civic autonomy and the stability of public discourse. Beyond the intentional spread of false information, it often appears as narrative divergence among sources interpreting shared events, generating fragmentation and measurable losses in structural coherence. This study examines disinformation within an entropic structural framework, defining it as narrative disorder and epistemic incoherence in information systems. The approach moves beyond fact-checking by treating narrative structure and informational order as quantifiable attributes of public communication. We present the QVP-RI (Relational Information Valuation) operator, a computational model that quantifies narrative divergence through informational entropy and normalized structural divergence, without issuing truth assessments. Implemented through state-of-the-art NLP pipelines and entropic analysis, the operator maps narrative structure and epistemic order across plural media environments. Unlike accuracy-driven approaches, it evaluates narrative coherence and informational utility (IU) as complementary indicators of epistemic value. Experimental validation with 500 participants confirms the robustness of the structural–entropic model and identifies high divergence regions, revealing communication vulnerabilities and showing how narrative disorder enables disinformation dynamics. The QVP-RI operator thus offers a computationally grounded tool for analyzing disinformation as narrative divergence and for strengthening epistemic order in open information systems. Full article
(This article belongs to the Special Issue Complexity of Social Networks)
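The entropic machinery the QVP-RI operator builds on can be illustrated with Shannon entropy plus a normalized divergence such as Jensen–Shannon over two sources' distributions across narrative frames. The frames and probabilities below are hypothetical, and this sketch is not the operator itself:

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; bounded in [0, 1], so it
    doubles as a normalized divergence score."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

# Two sources' distributions over four (hypothetical) narrative frames.
source_a = [0.70, 0.20, 0.05, 0.05]
source_b = [0.10, 0.15, 0.40, 0.35]

print(entropy(source_a))                  # low: concentrated narrative
print(js_divergence(source_a, source_b))  # high: divergent framing
```

High divergence between sources covering the same events is what the framework reads as narrative disorder, without ever judging which framing is true.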

25 pages, 1516 KB  
Article
Comparative Benchmarking of Deep Learning Architectures for Detecting Adversarial Attacks on Large Language Models
by Oleksandr Kushnerov, Ruslan Shevchuk, Serhii Yevseiev and Mikołaj Karpiński
Information 2026, 17(2), 155; https://doi.org/10.3390/info17020155 - 4 Feb 2026
Viewed by 460
Abstract
The rapid adoption of large language models (LLMs) in corporate and governmental systems has raised critical security concerns, particularly prompt injection attacks exploiting LLMs’ inability to differentiate control instructions from untrusted user inputs. This study systematically benchmarks neural network architectures for malicious prompt detection, emphasizing robustness against character-level adversarial perturbations—an aspect that remains comparatively underemphasized in the specific context of prompt-injection detection despite its established significance in general adversarial NLP. Using the Malicious Prompt Detection Dataset (MPDD) containing 39,234 labeled instances, eight architectures—Dense DNN, CNN, BiLSTM, BiGRU, Transformer, ResNet, and character-level variants of CNN and BiLSTM—were evaluated based on standard performance metrics (accuracy, F1-score, and AUC-ROC), adversarial robustness coefficients against spacing and homoglyph perturbations, and inference latency. Results indicate that the word-level 3_Word_BiLSTM achieved the highest performance on clean samples (accuracy = 0.9681, F1 = 0.9681), whereas the Transformer exhibited lower accuracy (0.9190) and significant vulnerability to spacing attacks (adversarial robustness ρ_spacing = 0.61). Conversely, the Character-level BiLSTM demonstrated superior resilience (ρ_spacing = 1.0, ρ_homoglyph = 0.98), maintaining high accuracy (0.9599) and generalization on external datasets with only 2–4% performance decay. These findings highlight that character-level representations provide intrinsic robustness against obfuscation attacks, suggesting Char_BiLSTM as a reliable component in defense-in-depth strategies for LLM-integrated systems. Full article
(This article belongs to the Special Issue Public Key Cryptography and Privacy Protection)
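The two perturbation families and the robustness coefficient can be sketched as follows. The homoglyph map is a small illustrative subset, and taking ρ as adversarial accuracy divided by clean accuracy is one plausible definition consistent with how the coefficients behave, not necessarily the paper's exact formula:

```python
# Character-level perturbations of the kind the benchmark tests, plus an
# assumed robustness coefficient rho = adversarial accuracy / clean accuracy.

HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "i": "\u0456"}

def homoglyph_attack(text):
    """Swap Latin letters for visually identical Cyrillic ones."""
    return "".join(HOMOGLYPHS.get(c, c) for c in text)

def spacing_attack(text):
    """Insert spaces between characters to break word-level tokenization."""
    return " ".join(text)

def robustness(clean_acc, adv_acc):
    """rho = 1.0 means no degradation under attack."""
    return adv_acc / clean_acc

print(homoglyph_attack("ignore previous instructions"))
print(spacing_attack("ignore"))
print(robustness(0.92, 0.56))   # made-up accuracies for a word-level model
```

Both attacks leave the prompt human-readable while scrambling word-level token sequences, which is why character-level models degrade so little.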

17 pages, 783 KB  
Article
Hospital-Wide Sepsis Detection: A Machine Learning Model Based on Prospectively Expert-Validated Cohort
by Marcio Borges-Sa, Andres Giglio, Maria Aranda, Antonia Socias, Alberto del Castillo, Cristina Pruenza, Gonzalo Hernández, Sofía Cerdá, Lorenzo Socias, Victor Estrada, Roberto de la Rica, Elisa Martin and Ignacio Martin-Loeches
J. Clin. Med. 2026, 15(2), 855; https://doi.org/10.3390/jcm15020855 - 21 Jan 2026
Viewed by 612
Abstract
Background/Objectives: Sepsis detection remains challenging due to clinical heterogeneity and limitations of traditional scoring systems. This study developed and validated a hospital-wide machine learning model for sepsis detection using retrospectively developed data from prospectively expert-validated cases, aiming to improve diagnostic accuracy beyond conventional approaches. Methods: This retrospective cohort study analysed 218,715 hospital episodes (2014–2018) at a tertiary care centre. Sepsis cases (n = 11,864, 5.42%) were prospectively validated in real-time by a Multidisciplinary Sepsis Unit using modified Sepsis-2 criteria with organ dysfunction. The model integrated structured data (26.95%) and unstructured clinical notes (73.04%) extracted via natural language processing from 2829 variables, selecting 230 relevant predictors. Thirty models including random forests, support vector machines, neural networks, and gradient boosting were developed and evaluated. The dataset was randomly split (5/7 training, 2/7 testing) with preserved patient-level independence. Results: The BiAlert Sepsis model (random forest + Sepsis-2 ensemble) achieved an AUC-ROC of 0.95, sensitivity of 0.93, and specificity of 0.84, significantly outperforming traditional approaches. Compared to the best rule-based method (Sepsis-2 + qSOFA, AUC-ROC 0.90), BiAlert reduced false positives by 39.6% (13.10% vs. 21.70%, p < 0.01). Novel predictors included eosinopenia and hypoalbuminemia, while traditional variables (MAP, GCS, platelets) showed minimal univariate association. The model received European Medicines Agency approval as a medical device in June 2024. Conclusions: This hospital-wide machine learning model, trained on prospectively expert-validated cases and integrating extensive NLP-derived features, demonstrates superior sepsis detection performance compared to conventional scoring systems. External validation and prospective clinical impact studies are needed before widespread implementation. Full article

33 pages, 550 KB  
Article
Intelligent Information Processing for Corporate Performance Prediction: A Hybrid Natural Language Processing (NLP) and Deep Learning Approach
by Qidi Yu, Chen Xing, Yanjing He, Sunghee Ahn and Hyung Jong Na
Electronics 2026, 15(2), 443; https://doi.org/10.3390/electronics15020443 - 20 Jan 2026
Viewed by 384
Abstract
This study proposes a hybrid machine learning framework that integrates structured financial indicators and unstructured textual strategy disclosures to improve firm-level management performance prediction. Using corporate business reports from South Korean listed firms, strategic text was extracted and categorized under the Balanced Scorecard (BSC) framework into financial, customer, internal process, and learning and growth dimensions. Various machine learning and deep learning models—including k-nearest neighbors (KNNs), support vector machine (SVM), light gradient boosting machine (LightGBM), convolutional neural network (CNN), long short-term memory (LSTM), autoencoder, and transformer—were evaluated, with results showing that the inclusion of strategic textual data significantly enhanced prediction accuracy, precision, recall, area under the curve (AUC), and F1-score. Among individual models, the transformer architecture demonstrated superior performance in extracting context-rich semantic features. A soft-voting ensemble model combining autoencoder, LSTM, and transformer achieved the best overall performance, leading in accuracy and AUC, while the best single deep learning model (transformer) obtained a marginally higher F1 score, confirming the value of hybrid learning. Furthermore, analysis revealed that customer-oriented strategy disclosures were the most predictive among BSC dimensions. These findings highlight the value of integrating financial and narrative data using advanced NLP and artificial intelligence (AI) techniques to develop interpretable and robust corporate performance forecasting models. In addition, we operationalize information security narratives using a reproducible cybersecurity lexicon and derive security disclosure intensity and weight share features that are jointly evaluated with BSC-based strategic vectors. Full article
(This article belongs to the Special Issue Advances in Intelligent Information Processing)
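Soft voting, the mechanism behind the winning ensemble, simply averages each member model's predicted class probabilities and takes the argmax. A sketch with made-up probabilities:

```python
# Soft voting over per-model class-probability vectors (toy numbers, not
# outputs of the paper's autoencoder/LSTM/transformer models).

def soft_vote(prob_sets):
    """prob_sets: one probability vector per model, for a single sample.
    Returns (winning class index, averaged probability vector)."""
    n = len(prob_sets)
    k = len(prob_sets[0])
    avg = [sum(p[i] for p in prob_sets) / n for i in range(k)]
    return max(range(k), key=avg.__getitem__), avg

# Hypothetical outputs of three models for one firm
# (classes: 0 = underperform, 1 = outperform).
probs = [[0.55, 0.45], [0.40, 0.60], [0.30, 0.70]]
label, avg = soft_vote(probs)
print(label, avg)   # class 1 wins with average probability ~0.58
```

Averaging probabilities rather than hard votes lets a confident model outweigh two lukewarm ones, which is often why soft-voting ensembles edge out their best single member.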

19 pages, 2077 KB  
Article
Evaluating Natural Language Processing and Named Entity Recognition for Bioarchaeological Data Reuse
by Alphaeus Lien-Talks
Heritage 2026, 9(1), 35; https://doi.org/10.3390/heritage9010035 - 19 Jan 2026
Viewed by 384
Abstract
Bioarchaeology continues to generate growing volumes of data from finite and often destructively sampled resources, making data reusability critical according to FAIR principles (Findable, Accessible, Interoperable, Reusable) and CARE (Collective Benefit, Authority to Control, Responsibility and Ethics). However, much valuable information remains trapped in grey literature, particularly PDF-based reports, limiting discoverability and machine processing. This paper explores Natural Language Processing (NLP) and Named Entity Recognition (NER) techniques to improve access to osteoarchaeological and palaeopathological data in grey literature. The research developed and evaluated the Osteoarchaeological and Palaeopathological Entity Search (OPES), a lightweight prototype system designed to extract relevant terms from PDF documents within the Archaeology Data Service archive. Unlike transformer-based Large Language Models, OPES employs interpretable, computationally efficient, and sustainable NLP methods. A structured user evaluation (n = 83) involving students (42), experts (26), and the general public (15) assessed five success criteria: usefulness, time-saving ability, accessibility, reliability, and likelihood of reuse. Results demonstrate that while limitations remain in reliability and expert engagement, NLP and NER show clear potential to increase FAIRness of osteoarcheological datasets. The study emphasises the continued need for robust evaluation methodologies in heritage AI applications as new technologies emerge. Full article
(This article belongs to the Special Issue AI and the Future of Cultural Heritage)
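A lightweight, interpretable extractor of the kind OPES favors over transformer-based LLMs can be as simple as gazetteer matching against a curated vocabulary. The term list below is an illustrative subset, not the OPES vocabulary:

```python
import re

# Gazetteer-style term extraction in the spirit of OPES's interpretable,
# computationally efficient approach (actual OPES term lists and matching
# rules are not reproduced here).

PALAEOPATHOLOGY_TERMS = {"cribra orbitalia", "osteoarthritis",
                         "periostitis", "enamel hypoplasia"}

def extract_terms(text, vocabulary=PALAEOPATHOLOGY_TERMS):
    """Return vocabulary terms found in the text, whole-word matched."""
    found = []
    lowered = text.lower()
    for term in sorted(vocabulary):
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append(term)
    return found

report = ("The adult male skeleton showed cribra orbitalia and "
          "moderate osteoarthritis of the left knee.")
print(extract_terms(report))   # ['cribra orbitalia', 'osteoarthritis']
```

Every match is directly traceable to a vocabulary entry, which is the interpretability and sustainability trade-off the paper weighs against LLM-based extraction.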

30 pages, 1372 KB  
Systematic Review
A Systematic Review and Bibliometric Analysis of Automated Multiple-Choice Question Generation
by Dimitris Mitroulias and Spyros Sioutas
Big Data Cogn. Comput. 2026, 10(1), 35; https://doi.org/10.3390/bdcc10010035 - 18 Jan 2026
Viewed by 715
Abstract
The aim of this study is to systematically capture, synthesize, and evaluate current research trends related to Automated Multiple-Choice Question Generation as they emerge within the broader landscape of natural language processing (NLP) and large language model (LLM)-based educational and assessment research. A systematic search and selection process was conducted following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, using predefined inclusion and exclusion criteria. A total of 240 eligible publications indexed in the Scopus database were identified and analyzed. To provide a comprehensive overview of this evolving research landscape, a bibliometric analysis was performed utilizing performance analysis and scientific mapping methods, supported by the Bibliometrix (version 4.2.2) R package and VOSviewer (version 1.6.19) software. The findings of the performance analysis indicate a steady upward trend in publications and citations, with significant contributions from leading academic institutions—primarily from the United States—and a strong presence in high quality academic journals. Scientific mapping through co-authorship analysis reveals that, despite the increasing research activity, there remains a need for enhanced collaborative efforts. Bibliographic coupling organizes the analyzed literature into seven thematic clusters, highlighting the main research axes and their diachronic evolution. Furthermore, co-word analysis identifies emerging research trends and underexplored directions, indicating substantial opportunities for future investigation. To the best of our knowledge, this study represents the first systematic bibliometric analysis that examines Automated Multiple-Choice Question Generation research within the context of the broader LLM-driven educational assessment literature. By mapping the relevant scientific production and identifying research gaps and future directions, this work contributes to a more coherent understanding of the field and supports the ongoing development of research at the intersection of generative AI and educational assessment. Full article
(This article belongs to the Special Issue Generative AI and Large Language Models)
