Article

18 Pages

DefAn: Definitive Answer Dataset for LLM Hallucination Evaluation

A. B. M. Ashikur Rahman, Saeed Anwar, Muhammad Usman, Irfan Ahmad and Ajmal Mian

Information 2025, 16(11), 937; https://doi.org/10.3390/info16110937

28 October 2025

Large Language Models (LLMs) represent a major step in AI development and are increasingly used in daily applications. However, they are prone to hallucinations, generating claims that contradict established facts, deviating from prompts, and produci...

24,071 Results Found

Applicability Evaluation of the Global Synthetic Tropical Cyclone Hazard Dataset in Coastal China

Educational Evaluation with MLLMs: Framework, Dataset, and Comprehensive Assessment

A Dataset and Experimental Evaluation of a Parallel Conflict Detection Solution for Model-Based Diagnosis

Statistical Evaluation and Analysis of Road Extraction Methodologies Using a Unique Dataset from Remote Sensing

NILMPEds: A Performance Evaluation Dataset for Event Detection Algorithms in Non-Intrusive Load Monitoring

Epitope Prediction Based on Random Peptide Library Screening: Benchmark Dataset and Prediction Tools Evaluation

Organ-On-A-Chip (OOC) Image Dataset for Machine Learning and Tissue Model Evaluation

AutoML-Based Prediction of Unconfined Compressive Strength of Stabilized Soils: A Multi-Dataset Evaluation on Worldwide Experimental Data

Applicability Evaluation of Antarctic Ozone Reanalysis and Merged Satellite Datasets

Evaluation of Benchmark Datasets and Deep Learning Models with Pre-Trained Weights for Vision-Based Dynamic Hand Gesture Recognition

Evaluation of Eight Global Precipitation Datasets in Hydrological Modeling

Multiscale Evaluation of Gridded Precipitation Datasets across Varied Elevation Zones in Central Asia’s Hilly Region

Evaluation and Error Analysis of Multi-Source Precipitation Datasets during Summer over the Tibetan Plateau

Warm-Season Precipitation in the Eastern Pamir Plateau: Evaluation from Multi-Source Datasets and Elevation Dependence

Evaluation of Sixteen Gridded Precipitation Datasets over the Caribbean Region Using Gauge Observations

Satellite-Based Precipitation Datasets Evaluation Using Gauge Observation and Hydrological Modeling in a Typical Arid Land Watershed of Central Asia

Evaluation of Global Historical Cropland Datasets with Regional Historical Evidence and Remotely Sensed Satellite Data from the Xinjiang Area of China

Evaluation of High-Resolution Crop Model Meteorological Forcing Datasets at Regional Scale: Air Temperature and Precipitation over Major Land Areas of China

Evaluation of Multi-Source Soil Moisture Datasets over Central and Eastern Agricultural Area of China Using In Situ Monitoring Network

Thailand Raw Water Quality Dataset Analysis and Evaluation

Automated Dataset-Creation and Evaluation Pipeline for NER in Russian Literary Heritage

An Evaluation of Large Language Models for Supplementing a Food Extrusion Dataset

Dataset Evaluation Method and Application for Performance Testing of SSVEP-BCI Decoding Algorithm

The Proposition and Evaluation of the RoEduNet-SIMARGL2021 Network Intrusion Detection Dataset

A Comprehensive Benchmarking Framework for Sentinel-2 Sharpening: Methods, Dataset, and Evaluation Metrics

Cross-Dataset Evaluation of Deep Learning Networks for Uterine Cervix Segmentation

Evaluation of Potential Evapotranspiration Based on CMADS Reanalysis Dataset over China

DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark

Evaluation of Machine Learning Algorithms in Network-Based Intrusion Detection Using Progressive Dataset

Building High-Quality Datasets for Information Retrieval Evaluation at a Reduced Cost

Evaluating the Accuracy of a Gridded Near-Surface Temperature Dataset over Mainland China

An Evaluation of the Capability of Global Meteorological Datasets to Capture Drought Events in Xinjiang

Evaluation of Infrared Thermography Dataset for Delamination Detection in Reinforced Concrete Bridge Decks

Bird Object Detection: Dataset Construction, Model Performance Evaluation, and Model Lightweighting

Advances in Face Recognition: A Comprehensive Review of Feature Extraction and Dataset Evaluation

A Bilingual Basque–Spanish Dataset of Parliamentary Sessions for the Development and Evaluation of Speech Technology

Evaluation of Online Inquiry Competencies of Chilean Elementary School Students: A Dataset

3DRIED: A High-Resolution 3-D Millimeter-Wave Radar Dataset Dedicated to Imaging and Evaluation

Digital Technology in Cultural Heritage: Construction and Evaluation Methods of AI-Based Ethnic Music Dataset

Exploring Emotional Stimuli Detection in Artworks: A Benchmark Dataset and Baselines Evaluation

Weed Species Identification: Acquisition, Feature Analysis, and Evaluation of a Hyperspectral and RGB Dataset with Labeled Data

A Benchmark for the Evaluation of Corner Detectors

Artificial Intelligence for Text-Based Vehicle Search, Recognition, and Continuous Localization in Traffic Videos

A New Dataset and Performance Evaluation of a Region-Based CNN for Urban Object Detection

Deep 3D Convolutional Neural Network for Facial Micro-Expression Analysis from Video Images

Evaluation and Comparison of Five Long-Term Precipitation Datasets in the Hang-Jia-Hu Plain of Eastern China

xScore: A Simple Metric for Cross-Domain Robustness in Lightweight Vision Models

Theory and Data-Driven Competence Evaluation with Multimodal Machine Learning—A Chinese Competence Evaluation Multimodal Dataset

Clinical Application of Vision Transformers for Melanoma Classification: A Multi-Dataset Evaluation Study