13,873 Results Found

  • Article
  • Open Access
2 Citations
3,023 Views
18 Pages

A Large Benchmark Dataset for Individual Sheep Face Recognition

  • Yue Pang,
  • Wenbo Yu,
  • Chuanzhong Xuan,
  • Yongan Zhang and
  • Pei Wu

The mutton sheep breeding industry has transformed significantly in recent years, from traditional grassland free-range farming to a more intelligent approach. As a result, automated sheep face recognition systems have become vital to modern breeding...

  • Article
  • Open Access
875 Views
17 Pages

MASS-LSVD: A Large-Scale First-View Dataset for Marine Vessel Detection

  • Yunsheng Fan,
  • Dongjie Ju,
  • Bing Han,
  • Feng Sun,
  • Liran Shen,
  • Zongjiang Gao,
  • Dongdong Mu and
  • Longhui Niu

19 November 2025

In this paper, we release a new large-scale dataset containing multiple categories of ships and floating objects at sea, which we call MASS-LSVD. It is used to train and validate target detection algorithms and future large models for ship autopiloti...

  • Article
  • Open Access
3 Citations
4,058 Views
9 Pages

Despite the growing capabilities of large language models, concerns exist about the biases they develop. In this paper, we propose a novel, automated mechanism for debiasing through specified dataset augmentation in the lens of bias producers that ca...

  • Data Descriptor
  • Open Access
1,514 Views
27 Pages

DLCPD-25: A Large-Scale and Diverse Dataset for Crop Disease and Pest Recognition

  • Heng-Wei Zhang,
  • Rui-Feng Wang,
  • Zhengle Wang and
  • Wen-Hao Su

20 November 2025

The accurate identification of crop pests and diseases is critical for global food security, yet the development of robust deep learning models is hindered by the limitations of existing datasets. To address this gap, we introduce DLCPD-25, a new lar...

  • Article
  • Open Access
2 Citations
5,278 Views
22 Pages

A Framework for Domain-Specific Dataset Creation and Adaptation of Large Language Models

  • George Balaskas,
  • Homer Papadopoulos,
  • Dimitra Pappa,
  • Quentin Loisel and
  • Sebastien Chastin

This paper introduces a novel framework for addressing domain adaptation challenges in large language models (LLMs), emphasising privacy-preserving synthetic data generation and efficient fine-tuning. The proposed framework employs a multi-stage appr...

  • Article
  • Open Access
12 Citations
6,490 Views
30 Pages

Towards an Improved Large-Scale Gridded Population Dataset: A Pan-European Study on the Integration of 3D Settlement Data into Population Modelling

  • Daniela Palacios-Lopez,
  • Thomas Esch,
  • Kytt MacManus,
  • Mattia Marconcini,
  • Alessandro Sorichetta,
  • Greg Yetman,
  • Julian Zeidler,
  • Stefan Dech,
  • Andrew J. Tatem and
  • Peter Reinartz

11 January 2022

Large-scale gridded population datasets available at the global or continental scale have become an important source of information in applications related to sustainable development. In recent years, the emergence of new population models has levera...

  • Article
  • Open Access
9 Citations
7,357 Views
21 Pages

2 November 2022

In recent years, Vehicle Make and Model Recognition (VMMR) has attracted a lot of attention as it plays a crucial role in Intelligent Transportation Systems (ITS). Accurate and efficient VMMR systems are required in real-world applications including...

  • Article
  • Open Access
2,375 Views
22 Pages

A Clustering Algorithm for Large Datasets Based on Detection of Density Variations

  • Adrián Josué Ramírez-Díaz,
  • José Francisco Martínez-Trinidad and
  • Jesús Ariel Carrasco-Ochoa

15 July 2025

Clustering algorithms help handle unlabeled datasets. In large datasets, density-based clustering algorithms effectively capture the intricate structures and varied distributions that these datasets often exhibit. However, while these algorithms can...

  • Article
  • Open Access
43 Citations
5,941 Views
17 Pages

An Oversampling Method for Class Imbalance Problems on Large Datasets

  • Fredy Rodríguez-Torres,
  • José F. Martínez-Trinidad and
  • Jesús A. Carrasco-Ochoa

28 March 2022

Several oversampling methods have been proposed for solving the class imbalance problem. However, most of them require searching the k-nearest neighbors to generate synthetic objects. This requirement makes them time-consuming and therefore unsuitabl...

  • Article
  • Open Access
11 Citations
3,965 Views
19 Pages

23 July 2022

There is a large number of grid-based climate datasets available which differ in terms of their data source, estimation procedures, and spatial and temporal resolutions. This study evaluates the performance of diverse meteorological datasets in terms...

  • Article
  • Open Access
10 Citations
6,714 Views
31 Pages

1 July 2024

This work introduces an unsupervised framework for video anomaly detection, leveraging a hybrid deep learning model that combines a vision transformer (ViT) with a convolutional spatiotemporal relationship (STR) attention block. The proposed model ad...

  • Article
  • Open Access
27 Citations
6,272 Views
19 Pages

Efficient Distributed Preprocessing Model for Machine Learning-Based Anomaly Detection over Large-Scale Cybersecurity Datasets

  • Xavier Larriva-Novo,
  • Mario Vega-Barbas,
  • Víctor A. Villagrá,
  • Diego Rivera,
  • Manuel Álvarez-Campana and
  • Julio Berrocal

15 May 2020

New computational and technological paradigms that currently guide developments in the information society, i.e., Internet of things, pervasive technology, or Ubicomp, favor the appearance of new intrusion vectors that can directly affect people’...

  • Article
  • Open Access
718 Views
24 Pages

18 July 2025

The construction of large-scale, dynamic datasets for specialized domain models often suffers from problems of low efficiency and poor consistency. This paper proposes a method that integrates multi-role collaboration with automated annotation to add...

  • Proceeding Paper
  • Open Access
2 Citations
4,713 Views
10 Pages

Large Landing Trajectory Dataset for Go-Around Analysis

  • Raphael Monstein,
  • Benoit Figuet,
  • Timothé Krauth,
  • Manuel Waltert and
  • Marcel Dettling

13 December 2022

The analysis and prediction of go-arounds, also referred to as missed approaches, is an active field of research due to the go-around’s impact on safety and the disruption of the traffic flow at airports. The advent of open-source aircraft traj...

  • Data Descriptor
  • Open Access
7 Citations
7,467 Views
27 Pages

#PraCegoVer: A Large Dataset for Image Captioning in Portuguese

  • Gabriel Oliveira dos Santos,
  • Esther Luna Colombini and
  • Sandra Avila

21 January 2022

Automatically describing images using natural sentences is essential to visually impaired people’s inclusion on the Internet. This problem is known as Image Captioning. There are many datasets in the literature, but most contain only English ca...

  • Article
  • Open Access
3 Citations
5,994 Views
12 Pages

A Large-Scale Mouse Pose Dataset for Mouse Pose Estimation

  • Jun Sun,
  • Jing Wu,
  • Xianghui Liao,
  • Sijia Wang and
  • Mantao Wang

25 April 2022

Mouse pose estimations have important applications in the fields of animal behavior research, biomedicine, and animal conservation studies. Accurate and efficient mouse pose estimations using computer vision are necessary. Although methods for mouse...

  • Article
  • Open Access
29 Citations
13,413 Views
24 Pages

WAID: A Large-Scale Dataset for Wildlife Detection with Drones

  • Chao Mou,
  • Tengfei Liu,
  • Chengcheng Zhu and
  • Xiaohui Cui

17 September 2023

Drones are widely used for wildlife monitoring. Deep learning algorithms are key to the success of monitoring wildlife with drones, although they face the problem of detecting small targets. To solve this problem, we have introduced the SE-YOLO model...

  • Article
  • Open Access
1 Citation
1,911 Views
18 Pages

An Evaluation of Large Language Models for Supplementing a Food Extrusion Dataset

  • Necva Bölücü,
  • Jordan Pennells,
  • Huichen Yang,
  • Maciej Rybinski and
  • Stephen Wan

15 April 2025

Food extrusion is a widely used processing technique that transforms raw ingredients into structured food products—foods with well-defined textures, shapes, and functionalities—through mechanical shear and thermal energy. Despite its broa...

  • Data Descriptor
  • Open Access
15 Citations
4,423 Views
11 Pages

Large-Scale Dataset of Local Java Software Build Results

  • Matúš Sulír,
  • Michaela Bačíková,
  • Matej Madeja,
  • Sergej Chodarev and
  • Ján Juhár

21 September 2020

When a person decides to inspect or modify a third-party software project, the first necessary step is its successful compilation from source code using a build system. However, such attempts often end in failure. In this data descriptor paper, we pr...

  • Article
  • Open Access
20 Citations
7,542 Views
21 Pages

UnityShip: A Large-Scale Synthetic Dataset for Ship Recognition in Aerial Images

  • Boyong He,
  • Xianjiang Li,
  • Bo Huang,
  • Enhui Gu,
  • Weijie Guo and
  • Liaoni Wu

9 December 2021

As a data-driven approach, deep learning requires a large amount of annotated data for training to obtain a sufficiently accurate and generalized model, especially in the field of computer vision. However, when compared with generic object recognitio...

  • Article
  • Open Access
4 Citations
4,432 Views
19 Pages

30 April 2025

In the field of Cyber Threat Intelligence (CTI), the scarcity of high-quality and labelled datasets that include Indicators of Compromise (IoCs) impacts the design and implementation of robust predictive models that are capable of classifying IoCs in...

  • Article
  • Open Access
26 Citations
5,811 Views
25 Pages

29 April 2021

The problem of automatic detection of fake news in social media, e.g., on Twitter, has recently drawn some attention. Although, from a technical perspective, it can be regarded as a straight-forward, binary classification problem, the major challenge...

  • Article
  • Open Access
8 Citations
3,953 Views
18 Pages

Simulating Large-Scale 3D Cadastral Dataset Using Procedural Modelling

  • Jernej Tekavec,
  • Anka Lisec and
  • Eugénio Rodrigues

Geospatial data and information within contemporary land administration systems are fundamental to manage the territory adequately. 3D land administration systems, often addressed as 3D cadastre, promise several benefits, particularly in managing tod...

  • Article
  • Open Access
2 Citations
1,731 Views
23 Pages

Overwater-Haze: A Large-Scale Overwater Paired Image Dehazing Dataset

  • Yuhang Xie,
  • Meng Li,
  • Siqi Wang and
  • Hongbo Wang

19 August 2025

Maritime navigation safety relies on high-precision perception systems. However, hazy weather often significantly compromises system performance, particularly by reducing image quality and increasing navigational risks. Although image dehazing techni...

  • Article
  • Open Access
2,346 Views
14 Pages

10 May 2024

Modern UAVs (unmanned aerial vehicles) equipped with video cameras can provide large-scale high-resolution video data. This poses significant challenges for structure from motion (SfM) and simultaneous localization and mapping (SLAM) algorithms, as m...

  • Article
  • Open Access
11 Citations
5,397 Views
12 Pages

19 December 2022

The detection of road facilities or roadside structures is essential for high-definition (HD) maps and intelligent transportation systems (ITSs). With the rapid development of deep-learning algorithms in recent years, deep-learning-based object detec...

  • Article
  • Open Access
22 Citations
9,214 Views
18 Pages

Diverse Scene Stitching from a Large-Scale Aerial Video Dataset

  • Tao Yang,
  • Jing Li,
  • Jingyi Yu,
  • Sibing Wang and
  • Yanning Zhang

28 May 2015

Diverse scene stitching is a challenging task in aerial video surveillance. This paper presents a hybrid stitching method based on the observation that aerial videos captured in real surveillance settings are neither totally ordered nor completely un...

  • Article
  • Open Access
7 Citations
3,972 Views
21 Pages

19 February 2025

With the rapid development of large visual language models (LVLMs) and multimodal large language models (MLLMs), these models have demonstrated strong performance in various multimodal tasks. However, alleviating the generation of hallucinations rema...

  • Article
  • Open Access
85 Citations
10,861 Views
27 Pages

LASDU: A Large-Scale Aerial LiDAR Dataset for Semantic Labeling in Dense Urban Areas

  • Zhen Ye,
  • Yusheng Xu,
  • Rong Huang,
  • Xiaohua Tong,
  • Xin Li,
  • Xiangfeng Liu,
  • Kuifeng Luan,
  • Ludwig Hoegner and
  • Uwe Stilla

The semantic labeling of the urban area is an essential but challenging task for a wide variety of applications such as mapping, navigation, and monitoring. The rapid advance in Light Detection and Ranging (LiDAR) systems provides this task with a po...

  • Article
  • Open Access
1 Citation
1,099 Views
18 Pages

A Large-Language-Model-Based Dataset of Plant Species for Green Roofs in China

  • Haoyu Han,
  • Xiliang Liu,
  • Shaofu Lin,
  • Yumiao Chang,
  • Shimin Ding and
  • Jing Zhang

20 August 2025

As urbanization accelerates, a host of negative ecological impacts have become increasingly prominent. Green roofs, as a sustainable solution, can effectively mitigate the urban heat island effect and reduce carbon footprints. However, the lack of da...

  • Article
  • Open Access
1,450 Views
28 Pages

27 May 2025

The pre-training and fine-tuning paradigm has significantly advanced satellite remote sensing applications. However, its potential remains largely underexplored for airborne laser scanning (ALS), a key technology in domains such as forest management...

  • Feature Paper
  • Article
  • Open Access
125 Citations
10,722 Views
15 Pages

21 March 2018

Among the members of biometric identifiers, the palmprint and the palmvein have received significant attention due to their stability, uniqueness, and non-intrusiveness. In this paper, we investigate the problem of palmprint/palmvein recognition and...

  • Article
  • Open Access
3 Citations
4,031 Views
18 Pages

AlgoLabel: A Large Dataset for Multi-Label Classification of Algorithmic Challenges

  • Radu Cristian Alexandru Iacob,
  • Vlad Cristian Monea,
  • Dan Rădulescu,
  • Andrei-Florin Ceapă,
  • Traian Rebedea and
  • Ștefan Trăușan-Matu

9 November 2020

While semantic parsing has been an important problem in natural language processing for decades, recent years have seen a wide interest in automatic generation of code from text. We propose an alternative problem to code generation: labelling the alg...

  • Data Descriptor
  • Open Access
3 Citations
2,840 Views
15 Pages

A Large-Scale Dataset of Conservation and Deep Tillage in Mollisols, Northeast Plain, China

  • Fahui Jiang,
  • Shangshu Huang,
  • Yan Wu,
  • Mahbub Ul Islam,
  • Fangjin Dong,
  • Zhen Cao,
  • Guohui Chen and
  • Yuming Guo

24 December 2022

One of the primary challenges of our time is to feed a growing and more demanding world population with degraded soil environments under more variable and extreme climate conditions. Conservation tillage (CS) and deep tillage (DT) have received stron...

  • Article
  • Open Access
1 Citation
3,842 Views
20 Pages

6 September 2024

The accurate reconstruction of indoor environments is crucial for applications in augmented reality, virtual reality, and robotics. However, existing indoor datasets are often limited in scale, lack ground truth point clouds, and provide insufficient...

  • Article
  • Open Access
825 Views
22 Pages

NutritionVerse3D2D: Large 3D Object and 2D Image Food Dataset for Dietary Intake Estimation

  • Chi-en Amy Tai,
  • Matthew Keller,
  • Saeejith Nair,
  • Yuhao Chen,
  • Yifan Wu,
  • Olivia Markham,
  • Krish Parmar,
  • Pengcheng Xi and
  • Alexander Wong

4 November 2025

Elderly populations often face significant challenges when it comes to dietary intake tracking, often exacerbated by health complications. Unfortunately, conventional diet assessment techniques such as food frequency questionnaires, food diaries, and...

  • Data Descriptor
  • Open Access

13 January 2026

Graphene (GRA) and graphene oxide (GO) have drawn significant attention in materials science, chemistry, and nanotechnology because of their tunable physicochemical properties and wide range of potential uses in biomedical and environmental applicati...

  • Article
  • Open Access
18 Citations
8,458 Views
25 Pages

EyeTrackUAV2: A Large-Scale Binocular Eye-Tracking Dataset for UAV Videos

  • Anne-Flore Perrin,
  • Vassilios Krassanakis,
  • Lu Zhang,
  • Vincent Ricordel,
  • Matthieu Perreira Da Silva and
  • Olivier Le Meur

8 January 2020

The fast and tremendous evolution of the unmanned aerial vehicle (UAV) imagery gives place to the multiplication of applications in various fields such as military and civilian surveillance, delivery services, and wildlife monitoring. Combining UAV i...

  • Data Descriptor
  • Open Access
20 Citations
7,451 Views
10 Pages

Large-Scale Dataset for the Analysis of Outdoor-to-Indoor Propagation for 5G Mid-Band Operational Networks

  • Usman Ali,
  • Giuseppe Caso,
  • Luca De Nardis,
  • Konstantinos Kousias,
  • Mohammad Rajiullah,
  • Özgü Alay,
  • Marco Neri,
  • Anna Brunstrom and
  • Maria-Gabriella Di Benedetto

15 March 2022

Understanding radio propagation characteristics and developing channel models is fundamental to building and operating wireless communication systems. Among other uses, channel characterization and modeling can be used for coverage and performance a...

  • Article
  • Open Access
42 Citations
9,241 Views
12 Pages

17 July 2020

Currently there is no publicly available adequate dataset that could be used for training Generative Adversarial Networks (GANs) on car images. All available car datasets differ in noise, pose, and zoom levels. Thus, the objective of this work was to...

  • Article
  • Open Access
4 Citations
4,325 Views
13 Pages

DIR: A Large-Scale Dialogue Rewrite Dataset for Cross-Domain Conversational Text-to-SQL

  • Jieyu Li,
  • Zhi Chen,
  • Lu Chen,
  • Zichen Zhu,
  • Hanqi Li,
  • Ruisheng Cao and
  • Kai Yu

9 February 2023

Semantic co-reference and ellipsis always lead to information deficiency when parsing natural language utterances with SQL in a multi-turn dialogue (i.e., conversational text-to-SQL task). The methodology of dividing a dialogue understanding task int...

  • Article
  • Open Access
81 Citations
16,011 Views
15 Pages

25 February 2021

The recent explosion of large volume of standard dataset of annotated images has offered promising opportunities for deep learning techniques in effective and efficient object detection applications. However, due to a huge difference of quality betwe...

  • Article
  • Open Access
14 Citations
4,615 Views
23 Pages

Early detection of dyslexia and learning disorders is vital for avoiding a learning disability, as well as supporting dyslexic students by tailoring academic programs to their needs. Several studies have investigated using supervised algorithms to sc...

  • Article
  • Open Access
20 Citations
5,403 Views
23 Pages

The possibility of carrying out a meaningful forensic analysis on printed and scanned images plays a major role in many applications. First of all, printed documents are often associated with criminal activities, such as terrorist plans, child pornog...

  • Systematic Review
  • Open Access
1 Citation
3,618 Views
19 Pages

27 May 2025

Synthetic medical text generation has emerged as a solution to data scarcity and privacy constraints in clinical NLP. This review systematically evaluates the use of Large Language Models (LLMs) for structured medical text generation, examining techn...

  • Article
  • Open Access
1 Citation
3,603 Views
28 Pages

KRID: A Large-Scale Nationwide Korean Road Infrastructure Dataset for Comprehensive Road Facility Recognition

  • Hyeongbok Kim,
  • Eunbi Kim,
  • Sanghoon Ahn,
  • Beomjin Kim,
  • Sung Jin Kim,
  • Tae Kyung Sung,
  • Lingling Zhao,
  • Xiaohong Su and
  • Gilmu Dong

14 March 2025

Comprehensive datasets are crucial for developing advanced AI solutions in road infrastructure, yet most existing resources focus narrowly on vehicles or a limited set of object categories. To address this gap, we introduce the Korean Road Infrastruc...

  • Article
  • Open Access
13 Citations
2,244 Views
20 Pages

12 January 2024

We aimed to improve the detection accuracy of laser methane sensors in expansive temperature application environments. In this paper, a large-scale dataset of the measured concentration of the sensor at different temperatures is established, and a te...

  • Communication
  • Open Access
2 Citations
1,554 Views
7 Pages

Predictors of ICU Admission in Children with COVID-19: Analysis of a Large Mexican Population Dataset

  • Martha I. Cárdenas-Rojas,
  • José Guzmán-Esquivel and
  • Efrén Murillo-Zamora

22 May 2023

Children, although mostly affected mildly or asymptomatically, have also developed severe coronavirus disease 2019 (COVID-19). This study aims to assess potential predictors of intensive care unit (ICU) admission in a large population (n = 21,121) of...

  • Article
  • Open Access
18 Citations
6,320 Views
22 Pages

Hyperspectral Image Classification on Large-Scale Agricultural Crops: The Heilongjiang Benchmark Dataset, Validation Procedure, and Baseline Results

  • Hongzhe Zhang,
  • Shou Feng,
  • Di Wu,
  • Chunhui Zhao,
  • Xi Liu,
  • Yuan Zhou,
  • Shengnan Wang,
  • Hongtao Deng and
  • Shuang Zheng

26 January 2024

Over the past few decades, researchers have shown sustained and robust investment in exploring methods for hyperspectral image classification (HSIC). The utilization of hyperspectral imagery (HSI) for crop classification in agricultural areas has bee...

  • Data Descriptor
  • Open Access
28 Citations
8,631 Views
16 Pages

4 August 2022

The COVID-19 Omicron variant, reported to be the most immune-evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online...
