49 Results Found

  • Article
  • Open Access
3 Citations
4,114 Views
19 Pages

2 December 2024

This research presents a retrospective analysis of zero-shot object detectors in automating image labeling for eyeglasses detection. The increasing demand for high-quality annotations in object detection is being met by AI foundation models with open...

  • Article
  • Open Access
974 Views
26 Pages

21 November 2025

Achieving object grasping in everyday environments by leveraging the powerful generalization capabilities of foundational general models while enhancing their deployment efficiency within robotic control systems represents a key challenge for service...

  • Article
  • Open Access
1,492 Views
19 Pages

This study presents a zero-shot object detection framework for corner casting detection in shipping container operations, leveraging edge computing for intelligent robotic perception and control. The proposed system integrates Grounding DINO on a Ras...

  • Review
  • Open Access
5 Citations
3,517 Views
44 Pages

Low-Light Image and Video Enhancement for More Robust Computer Vision Tasks: A Review

  • Mpilo M. Tatana,
  • Mohohlo S. Tsoeu and
  • Rito C. Maswanganyi

Computer vision aims to enable machines to understand the visual world. Computer vision encompasses numerous tasks, namely action recognition, object detection and image classification. Much research has been focused on solving these tasks, but one t...

  • Article
  • Open Access
2 Citations
1,769 Views
23 Pages

Drone-Based Detection and Classification of Greater Caribbean Manatees in the Panama Canal Basin

  • Javier E. Sanchez-Galan,
  • Kenji Contreras,
  • Allan Denoce,
  • Héctor Poveda,
  • Fernando Merchan and
  • Hector M. Guzmán

21 March 2025

This study introduces a novel, drone-based approach for the detection and classification of Greater Caribbean Manatees (Trichechus manatus manatus) in the Panama Canal Basin by integrating advanced deep learning techniques. Leveraging the high-perfor...

  • Review
  • Open Access
161 Citations
35,788 Views
28 Pages

Remote Sensing Object Detection in the Deep Learning Era—A Review

  • Shengxi Gui,
  • Shuang Song,
  • Rongjun Qin and
  • Yang Tang

12 January 2024

Given the large volume of remote sensing images collected daily, automatic object detection and segmentation have been a consistent need in Earth observation (EO). However, objects of interest vary in shape, size, appearance, and reflecting propertie...

  • Article
  • Open Access
14 Citations
4,743 Views
30 Pages

12 February 2025

Multi-object tracking (MOT) is an important task in computer vision, particularly in complex, dynamic environments with crowded scenes and frequent occlusions. Traditional tracking methods often suffer from identity switches (IDSws) and fragmented tr...

  • Article
  • Open Access
1,792 Views
15 Pages

Zero-Shot Learning for Sustainable Municipal Waste Classification

  • Dishant Mewada,
  • Eoin Martino Grua,
  • Ciaran Eising,
  • Patrick Denny,
  • Pepijn Van de Ven and
  • Anthony Scanlan

Automated waste classification is an essential step toward efficient recycling and waste management. Traditional deep learning models, such as convolutional neural networks, rely on extensive labeled datasets to achieve high accuracy. However, the an...

  • Article
  • Open Access
2 Citations
2,602 Views
19 Pages

19 January 2025

Multi-sensor fusion, such as LiDAR- and camera-based 3D object detection, is a key technology in autonomous driving and robotics. However, traditional 3D detection models are limited to recognizing predefined categories and struggle with unknown o...

  • Article
  • Open Access
3,755 Views
21 Pages

10 February 2025

Zero-shot counting is a subcategory of Generic Visual Object Counting, which aims to count objects from an arbitrary class in a given image. While few-shot counting relies on delivering exemplars to the model to count similar class objects, zero-shot...

  • Article
  • Open Access
3 Citations
2,171 Views
27 Pages

5 June 2025

Automated food safety inspection systems rely heavily on the visual detection of contamination, spoilage, and foreign objects in food products. Current approaches typically require extensive labeled training data for each specific hazard type, limiti...

  • Article
  • Open Access
2 Citations
2,721 Views
16 Pages

Artificial Intelligence Vision Methods for Robotic Harvesting of Edible Flowers

  • Fabio Taddei Dalla Torre,
  • Farid Melgani,
  • Ilaria Pertot and
  • Cesare Furlanello

14 November 2024

Edible flowers, with their increasing demand in the market, face a challenge in labor-intensive hand-picking practices, hindering their attractiveness for growers. This study explores the application of artificial intelligence vision for robotic harv...

  • Article
  • Open Access
412 Views
15 Pages

18 December 2025

Traditional segmentation methods are slow and rely on manual annotations, which are labor-intensive. To address these limitations, we propose YOLO-SAM AgriScan, a unified framework that combines the fast object detection capabilities of YOLOv11 with...

  • Article
  • Open Access
382 Views
19 Pages

D-Know: Disentangled Domain Knowledge-Aided Learning for Open-Domain Continual Object Detection

  • Bintao He,
  • Caixia Yan,
  • Yan Kou,
  • Yinghao Wang,
  • Xin Lv,
  • Haipeng Du and
  • Yugui Xie

1 December 2025

Continual learning for open-vocabulary object detection aims to enable pretrained vision–language detectors to adapt to diverse specialized domains while preserving their zero-shot generalization capabilities. However, existing methods primaril...

  • Article
  • Open Access
2,861 Views
16 Pages

Prompt Self-Correction for SAM2 Zero-Shot Video Object Segmentation

  • Jin Lee,
  • Ji-Hun Bae,
  • Dang Thanh Vu,
  • Le Hoang Anh,
  • Zahid Ur Rahman,
  • Heonzoo Lee,
  • Gwang-Hyun Yu and
  • Jin-Young Kim

10 September 2025

Foundation models, exemplified by the Segment Anything Model (SAM), have revolutionized object segmentation with their impressive zero-shot capabilities. The recent SAM2 extended these abilities to the video domain, utilizing an object pointer and me...

  • Article
  • Open Access
1 Citation
2,422 Views
17 Pages

Zero-Shot Day–Night Domain Adaptation for Face Detection Based on DAl-CLIP-Dino

  • Huadong Sun,
  • Yinghui Liu,
  • Ziyang Chen and
  • Pengyi Zhang

Two challenges in computer vision (CV) related to face detection are the difficulty of acquisition in the target domain and the degradation of image quality. Especially in low-light situations, the poor visibility of images is difficult to label, whi...

  • Feature Paper
  • Article
  • Open Access
714 Views
21 Pages

KORIE: A Multi-Task Benchmark for Detection, OCR, and Information Extraction on Korean Retail Receipts

  • Mahmoud SalahEldin Kasem,
  • Mohamed Mahmoud,
  • Mostafa Farouk Senussi,
  • Mahmoud Abdalla and
  • Hyun Soo Kang

4 January 2026

We introduce KORIE, a curated benchmark of 748 Korean retail receipts designed to evaluate scene text detection, Optical Character Recognition (OCR), and Information Extraction (IE) under challenging digitization conditions. Unlike existing large-sca...

  • Article
  • Open Access
1 Citation
1,827 Views
17 Pages

22 April 2025

Log anomaly detection in cloud computing environments is essential for maintaining system reliability and security. While sequence modeling architectures such as LSTMs and Transformers have been widely employed to capture temporal dependencies in log...

  • Article
  • Open Access
6 Citations
3,015 Views
14 Pages

28 November 2024

This study conducted an analysis of zero-shot detection capabilities using two frameworks, YOLO-World and Grounding DINO, on a selection of images in the wild blueberry (Vaccinium angustifolium Ait.) cropping system. The datasets included ripe wild b...

  • Article
  • Open Access
2 Citations
3,199 Views
16 Pages

Conveyors are used commonly in industrial production lines and automated sorting systems. Many applications require fast, reliable, and dynamic detection and recognition for the objects on conveyors. Aiming at this goal, we design a framework that in...

  • Article
  • Open Access
7 Citations
4,949 Views
16 Pages

Language Models for Multimessenger Astronomy

  • Vladimir Sotnikov and
  • Anastasiia Chaikova

With the increasing reliance of astronomy on multi-instrument and multi-messenger observations for detecting transient phenomena, communication among astronomers has become more critical. Apart from automatic prompt follow-up observations, short repo...

  • Article
  • Open Access
2 Citations
2,481 Views
28 Pages

25 June 2025

Object detection is essential for precision agriculture applications like automated plant counting, but the minimum dataset requirements for effective model deployment remain poorly understood for arable crop seedling detection on orthomosaics. This...

  • Article
  • Open Access
3 Citations
1,837 Views
13 Pages

A Rapid Construction Method for High-Throughput Wheat Grain Instance Segmentation Dataset Using High-Resolution Images

  • Qi Gao,
  • Heng Li,
  • Tianyue Meng,
  • Xinyuan Xu,
  • Tinghui Sun,
  • Liping Yin and
  • Xinyu Chai

13 May 2024

Deep learning models can enhance the detection efficiency and accuracy of rapid on-site screening for imported grains at customs, satisfying the need for high-throughput, efficient, and intelligent operations. However, the construction of datasets, w...

  • Article
  • Open Access
10 Citations
6,439 Views
29 Pages

1 November 2021

With the increasing number of underwater pipeline investigation activities, the research on automatic pipeline detection is of great significance. At this stage, object detection algorithms based on Deep Learning (DL) are widely used due to their abi...

  • Article
  • Open Access
9 Citations
8,904 Views
13 Pages

7 March 2024

Object detection is a crucial research topic in the fields of computer vision and artificial intelligence, involving the identification and classification of objects within images. Recent advancements in deep learning technologies, such as YOLO (You...

  • Article
  • Open Access
35 Citations
6,878 Views
20 Pages

Using Multimodal Large Language Models (MLLMs) for Automated Detection of Traffic Safety-Critical Events

  • Mohammad Abu Tami,
  • Huthaifa I. Ashqar,
  • Mohammed Elhenawy,
  • Sebastien Glaser and
  • Andry Rakotonirainy

2 September 2024

Traditional approaches to safety event analysis in autonomous systems have relied on complex machine and deep learning models and extensive datasets for high accuracy and reliability. However, the emergence of multimodal large language models (MLLMs) of...

  • Article
  • Open Access
24 Citations
5,619 Views
19 Pages

10 October 2024

The integration of thermal imaging data with multimodal large language models (MLLMs) offers promising advancements for enhancing the safety and functionality of autonomous driving systems (ADS) and intelligent transportation systems (ITS). This stud...

  • Article
  • Open Access
205 Views
15 Pages

12 January 2026

The success of large-scale deep learning models in remote sensing tasks has been transformative, enabling significant advances in image classification, object detection, and image–text retrieval. However, their computational and memory demands...

  • Article
  • Open Access
7 Citations
4,936 Views
18 Pages

DyGS-SLAM: Realistic Map Reconstruction in Dynamic Scenes Based on Double-Constrained Visual SLAM

  • Fan Zhu,
  • Yifan Zhao,
  • Ziyu Chen,
  • Chunmao Jiang,
  • Hui Zhu and
  • Xiaoxi Hu

12 February 2025

Visual SLAM is widely applied in robotics and remote sensing. The fusion of Gaussian radiance fields and Visual SLAM has demonstrated astonishing efficacy in constructing high-quality dense maps. While existing methods perform well in static scenes,...

  • Article
  • Open Access
1,509 Views
29 Pages

DASeg: A Domain-Adaptive Segmentation Pipeline Using Vision Foundation Models—Earthquake Damage Detection Use Case

  • Huili Huang,
  • Andrew Zhang,
  • Danrong Zhang,
  • Max Mahdi Roozbahani and
  • James David Frost

14 August 2025

Limited labeled imagery and tight response windows hinder the accurate damage quantification for post-disaster assessment. The objective of this study is to develop and evaluate a deep learning-based Domain-Adaptive Segmentation (DASeg) workflow to d...

  • Article
  • Open Access
1 Citation
2,180 Views
13 Pages

A CLIP-Based Framework to Enhance Order Accuracy in Food Packaging

  • Mattia Gatti,
  • Anwar Ur Rehman and
  • Ignazio Gallo

This study addresses the challenge of ensuring order accuracy in the dynamic environment of industrial food packaging through a novel zero-shot learning framework. The fundamental limitations of conventional systems, which rely heavily on pre-defined...

  • Article
  • Open Access
1 Citation
1,760 Views
22 Pages

13 February 2025

As maritime transportation and human activities at sea continue to grow, ensuring the safety of offshore infrastructure has become an increasingly pressing research focus. However, traditional high-precision sensor systems often involve prohibitive c...

  • Article
  • Open Access
2 Citations
993 Views
18 Pages

18 November 2025

Neural Architecture Search (NAS) is critical for developing efficient and robust perception models for UAV and drone-based applications, where real-time small object detection and computational constraints are major challenges. Existing NAS methods,...

  • Article
  • Open Access
2,809 Views
21 Pages

UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding

  • Lijie Tao,
  • Guoting Wei,
  • Zhuo Wang,
  • Zhaoshuai Qi,
  • Ying Li and
  • Haokui Zhang

14 July 2025

Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. Although the majority of existing methods are de...

  • Article
  • Open Access
3 Citations
1,788 Views
21 Pages

4 April 2025

This study evaluates the performance of You Only Look Once version 8 (YOLOv8) and a SAM-based unified and robust zero-shot visual tracker with motion-aware instance-level memory (SAMURAI) for worker detection in masonry construction environments unde...

  • Article
  • Open Access
1,874 Views
21 Pages

24 September 2025

Timely detection of road surface defects such as cracks and potholes is critical for ensuring traffic safety and reducing infrastructure maintenance costs. While recent advances in image-based deep learning techniques have shown promise for automated...

  • Article
  • Open Access
1,351 Views
37 Pages

Cascaded Hierarchical Attention with Adaptive Fusion for Visual Grounding in Remote Sensing

  • Huming Zhu,
  • Tianqi Gao,
  • Zhixian Li,
  • Zhipeng Chen,
  • Qiuming Li,
  • Kongmiao Miao,
  • Biao Hou and
  • Licheng Jiao

23 August 2025

Visual grounding for remote sensing (RSVG) is the task of localizing the referred object in remote sensing (RS) images by parsing free-form language descriptions. However, RSVG faces the challenge of low detection accuracy due to unbalanced multi-sca...

  • Article
  • Open Access
3 Citations
2,315 Views
14 Pages

EXACT-Net: Framework for EHR-Guided Lung Tumor Auto-Segmentation for Non-Small Cell Lung Cancer Radiotherapy

  • Hamed Hooshangnejad,
  • Gaofeng Huang,
  • Katelyn Kelly,
  • Xue Feng,
  • Yi Luo,
  • Rui Zhang,
  • Ziyue Xu,
  • Quan Chen and
  • Kai Ding

6 December 2024

Background/Objectives: Lung cancer is a devastating disease with the highest mortality rate among cancer types. Over 60% of non-small cell lung cancer (NSCLC) patients, accounting for 87% of lung cancer diagnoses, require radiation therapy. Rapid tre...

  • Article
  • Open Access
701 Views
23 Pages

Seconds count differently for people in danger. We present a real-time streaming pipeline for audio-based detection of hazardous life events affecting life and property. The system operates online rather than as a retrospective analysis tool. Its obj...

  • Article
  • Open Access
18 Citations
4,991 Views
24 Pages

ST-DeepGait: A Spatiotemporal Deep Learning Model for Human Gait Recognition

  • Latisha Konz,
  • Andrew Hill and
  • Farnoush Banaei-Kashani

21 October 2022

Human gait analysis presents an opportunity to study complex spatiotemporal data transpiring as co-movement patterns of multiple moving objects (i.e., human joints). Such patterns are acknowledged as movement signatures specific to an individual, off...

  • Article
  • Open Access
145 Views
35 Pages

13 January 2026

Deep learning models have advanced rapidly, leading to claims that they now match or exceed human performance. However, such claims are often based on closed-set conditions with fixed labels and extensive supervised training, and do not consider diff...

  • Article
  • Open Access
1,013 Views
19 Pages

27 September 2025

Background/Objectives: To develop an automated American Joint Committee on Cancer (AJCC) staging system for radical prostatectomy pathology reports using large language model-based information extraction and knowledge graph validation. Methods: Patho...

  • Article
  • Open Access
585 Views
23 Pages

Anatomical Alignment of Femoral Radiographs Enables Robust AI-Powered Detection of Incomplete Atypical Femoral Fractures

  • Doyoung Kwon,
  • Jin-Han Lee,
  • Joon-Woo Kim,
  • Ji-Wan Kim,
  • Sun-jung Yoon,
  • Sungmoon Jeong and
  • Chang-Wug Oh

20 November 2025

An incomplete atypical femoral fracture is subtle and requires early diagnosis. However, artificial intelligence models for these fractures often fail in real-world clinical settings due to the “domain shift” problem, where performance de...

  • Article
  • Open Access
2,700 Views
28 Pages

22 September 2025

Remote sensing visual question answering (RSVQA) involves interpreting complex geospatial information captured by satellite imagery to answer natural language questions, making it a vital tool for observing and analyzing Earth’s surface without...

  • Article
  • Open Access
276 Views
15 Pages

Few-Shot Transfer Learning for Diabetes Risk Prediction Across Global Populations

  • Shrinit Babel,
  • Sunit Babel,
  • John Hodgson and
  • Enrico Camporesi

19 December 2025

Background and Objectives: Type 2 diabetes mellitus (T2DM) affects over 537 million adults worldwide and disproportionately burdens low- and middle-income countries, where diagnostic resources are limited. Predictive models trained in one population...

  • Article
  • Open Access
1 Citation
357 Views
19 Pages

Enhancing Cascade Object Detection Accuracy Using Correctors Based on High-Dimensional Feature Separation

  • Andrey V. Kovalchuk,
  • Andrey A. Lebedev,
  • Olga V. Shemagina,
  • Irina V. Nuidel,
  • Vladimir G. Yakhno and
  • Sergey V. Stasenko

This study addresses the problem of correcting systematic errors in classical cascade object detectors under severe data scarcity and distribution shift. We focus on the widely used Viola–Jones framework enhanced with a modified Census transfor...

  • Article
  • Open Access
18 Citations
4,346 Views
20 Pages

4 September 2018

The goal of light detection and ranging (LIDAR) systems is to achieve high-resolution three-dimensional distance images with high refresh rates and long distances. In scanning LIDAR systems, an idle listening time between pulse transmission and recep...