Open Access Article

Co-Training for Deep Object Detection: Comparing Single-Modal and Multi-Modal Approaches

1 Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB), 08193 Bellaterra, Spain
2 Computer Science Department, Universitat Autònoma de Barcelona (UAB), 08193 Bellaterra, Spain
* Author to whom correspondence should be addressed.
Academic Editor: Guillermo Villanueva
Sensors 2021, 21(9), 3185; https://doi.org/10.3390/s21093185
Received: 30 March 2021 / Revised: 22 April 2021 / Accepted: 28 April 2021 / Published: 4 May 2021
(This article belongs to the Special Issue Feature Papers in Physical Sensors Section 2020)
Abstract: Top-performing computer vision models are powered by convolutional neural networks (CNNs). Training an accurate CNN depends strongly on both the raw sensor data and their associated ground truth (GT). Such GT is usually collected through human labeling, which is time-consuming and does not scale well. This data-labeling bottleneck may be intensified by domain shifts among image sensors, which could force per-sensor data labeling. In this paper, we focus on the use of co-training, a semi-supervised learning (SSL) method, to obtain self-labeled object bounding boxes (BBs), i.e., the GT needed to train deep object detectors. In particular, we assess the effectiveness of multi-modal co-training, which relies on two different views of an image, namely, appearance (RGB) and estimated depth (D). Moreover, we compare appearance-based single-modal co-training with the multi-modal variant. Our results suggest that, both in a standard SSL setting (no domain shift, a few human-labeled data) and under a virtual-to-real domain shift (many virtual-world labeled data, no human-labeled data), multi-modal co-training outperforms single-modal co-training. In the latter case, when GAN-based domain translation is applied, both co-training modalities perform on par, at least when using an off-the-shelf depth estimation model not specifically trained on the translated images.
Keywords: co-training; multi-modality; vision-based object detection; ADAS; self-driving
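As a rough illustration of the cross-teaching idea summarized in the abstract, the sketch below outlines a multi-modal co-training loop over paired appearance (RGB) and estimated depth (D) views: each cycle trains one detector per view, and each detector's confident detections become self-labeled GT for the other view. All names (multimodal_co_training, train, conf_thr), the data layout, and the simple confidence-threshold selection rule are hypothetical simplifications, not the paper's actual pipeline.

```python
from typing import Callable, List, Tuple

BBox = Tuple[float, float, float, float, float]   # x1, y1, x2, y2, confidence score
Detector = Callable[[object], List[BBox]]         # image -> detected boxes
TrainFn = Callable[[List[Tuple[object, List[BBox]]]], Detector]

def multimodal_co_training(
    labeled_rgb: List[Tuple[object, List[BBox]]],    # initial labeled RGB set
    labeled_depth: List[Tuple[object, List[BBox]]],  # initial labeled depth set
    unlabeled_pairs: List[Tuple[object, object]],    # (rgb_image, depth_image) pairs
    train: TrainFn,                                  # trains a detector on (image, boxes) pairs
    cycles: int = 5,
    conf_thr: float = 0.8,
):
    """One possible co-training loop: per cycle, train a detector per view,
    then let each detector pseudo-label unlabeled images for the *other* view,
    keeping only confident detections as new self-labeled ground truth."""
    labeled_rgb, labeled_depth = list(labeled_rgb), list(labeled_depth)
    for _ in range(cycles):
        det_rgb = train(labeled_rgb)
        det_depth = train(labeled_depth)
        still_unlabeled = []
        for rgb_img, d_img in unlabeled_pairs:
            rgb_boxes = [b for b in det_rgb(rgb_img) if b[4] >= conf_thr]
            d_boxes = [b for b in det_depth(d_img) if b[4] >= conf_thr]
            # Cross-teaching: confident RGB detections supervise the depth view, and vice versa.
            if rgb_boxes:
                labeled_depth.append((d_img, rgb_boxes))
            if d_boxes:
                labeled_rgb.append((rgb_img, d_boxes))
            if not rgb_boxes and not d_boxes:
                still_unlabeled.append((rgb_img, d_img))
        unlabeled_pairs = still_unlabeled
    return labeled_rgb, labeled_depth
```

In a single-modal variant of this sketch, both detectors would be trained on the RGB view alone (e.g., from differently initialized models), with the same cross-labeling scheme.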
MDPI and ACS Style

Gómez, J.L.; Villalonga, G.; López, A.M. Co-Training for Deep Object Detection: Comparing Single-Modal and Multi-Modal Approaches. Sensors 2021, 21, 3185. https://doi.org/10.3390/s21093185

AMA Style

Gómez JL, Villalonga G, López AM. Co-Training for Deep Object Detection: Comparing Single-Modal and Multi-Modal Approaches. Sensors. 2021; 21(9):3185. https://doi.org/10.3390/s21093185

Chicago/Turabian Style

Gómez, Jose L., Gabriel Villalonga, and Antonio M. López. 2021. "Co-Training for Deep Object Detection: Comparing Single-Modal and Multi-Modal Approaches" Sensors 21, no. 9: 3185. https://doi.org/10.3390/s21093185
