Search Results (3)

Search Parameters:
Keywords = space-variant computer vision

21 pages, 6781 KB  
Article
Prototype-Based Support Example Miner and Triplet Loss for Deep Metric Learning
by Shan Yang, Yongfei Zhang, Qinghua Zhao, Yanglin Pu and Hangyuan Yang
Electronics 2023, 12(15), 3315; https://doi.org/10.3390/electronics12153315 - 2 Aug 2023
Cited by 5 | Viewed by 3343
Abstract
Deep metric learning aims to learn a mapping function that projects input data into a high-dimensional embedding space, facilitating the clustering of similar data points while ensuring dissimilar ones are far apart. The most recent studies focus on designing a batch sampler and mining online triplets to achieve this purpose. Conventionally, hard negative mining schemes serve as the preferred batch sampler. However, most hard negative mining schemes search for hard examples in randomly selected mini-batches at each epoch, which often results in less-than-optimal hard examples and thus sub-optimal performance. Furthermore, Triplet Loss is commonly adopted to perform online triplet mining by pulling hard positives close to, and pushing negatives away from, the anchor. However, when the anchor in a triplet is an outlier, the positive example is pulled away from the centroid of its cluster, resulting in a loose cluster and inferior performance. To address these challenges, we propose the Prototype-based Support Example Miner (pSEM) and Prototype-based Triplet Loss (pTriplet Loss). First, we present a support example miner designed to mine the support classes on the prototype-based nearest neighbor graph of classes, then locate support examples by searching for instances at the intersection between clusters of these support classes. Second, we develop a variant of Triplet Loss, the Prototype-based Triplet Loss, in which a dynamically updated prototype rectifies outlier anchors, reducing their detrimental effects and yielding a more robust formulation. Extensive experiments on typical Computer Vision (CV) and Natural Language Processing (NLP) tasks, namely person re-identification and few-shot relation extraction, demonstrate the effectiveness and generalizability of the proposed scheme, which consistently outperforms state-of-the-art models.
(This article belongs to the Special Issue Machine Intelligent Information and Efficient System)
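
The anchor-rectification idea in the abstract lends itself to a short sketch. Below is a minimal PyTorch sketch of a prototype-based triplet loss, in which each class keeps an exponential-moving-average prototype and anchors are blended toward their class prototype before the margin loss is computed, damping outlier anchors. The class name PTripletLoss, the 0.5 blending weight, and the EMA momentum are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of a prototype-based triplet loss (assumes PyTorch).
    import torch
    import torch.nn.functional as F

    class PTripletLoss(torch.nn.Module):
        def __init__(self, num_classes, dim, margin=0.3, momentum=0.9):
            super().__init__()
            self.margin = margin
            self.momentum = momentum
            # One prototype per class, updated by exponential moving average.
            self.register_buffer("prototypes", torch.zeros(num_classes, dim))

        @torch.no_grad()
        def update_prototypes(self, embeddings, labels):
            # EMA update of each class prototype from the current batch mean.
            for c in labels.unique():
                mean = embeddings[labels == c].mean(dim=0)
                self.prototypes[c] = (self.momentum * self.prototypes[c]
                                      + (1 - self.momentum) * mean)

        def forward(self, anchors, positives, negatives, anchor_labels):
            # Rectify each anchor by blending it with its class prototype,
            # reducing the pull an outlier anchor exerts on its positives.
            proto = self.prototypes[anchor_labels]
            rectified = F.normalize(0.5 * anchors + 0.5 * proto, dim=1)
            d_pos = (rectified - positives).pow(2).sum(dim=1)
            d_neg = (rectified - negatives).pow(2).sum(dim=1)
            return F.relu(d_pos - d_neg + self.margin).mean()

In use, update_prototypes would be called once per batch with the current embeddings before computing the loss, so the prototypes track the drifting embedding space during training.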

18 pages, 3761 KB  
Letter
Application-Oriented Retinal Image Models for Computer Vision
by Ewerton Silva, Ricardo da S. Torres, Allan Pinto, Lin Tzy Li, José Eduardo S. Vianna, Rodolfo Azevedo and Siome Goldenstein
Sensors 2020, 20(13), 3746; https://doi.org/10.3390/s20133746 - 4 Jul 2020
Cited by 1 | Viewed by 3443
Abstract
Energy and storage constraints are important concerns for software applications running in low-power environments. Computer vision (CV) applications exemplify this concern well, since conventional uniform image sensors typically capture large amounts of data to be further handled by the appropriate CV algorithms. Moreover, much of the acquired data are often redundant and outside the application's interest, which leads to unnecessary processing and energy expenditure. In the literature, techniques for sensing and re-sampling images in non-uniform fashions have emerged to cope with these problems. In this study, we propose Application-Oriented Retinal Image Models that define a space-variant configuration of uniform images and contemplate requirements of energy consumption and storage footprint for CV applications. We hypothesize that our models might decrease energy consumption in CV tasks. Moreover, we show how to create the models and validate their use in a face detection/recognition application, demonstrating the trade-off between storage, energy, and accuracy.
(This article belongs to the Special Issue Information Fusion and Machine Learning for Sensors)
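
The paper's retinal models are application-oriented, but a classic instance of space-variant re-sampling is a log-polar grid that keeps full resolution at a central "fovea" and coarsens toward the periphery, shrinking the stored data. The sketch below, assuming NumPy and OpenCV, shows that general idea only; the ring/wedge counts, the centered fovea, and the name log_polar_resample are illustrative assumptions and do not reproduce the paper's models.

    # Minimal sketch of space-variant (log-polar) image re-sampling.
    import cv2
    import numpy as np

    def log_polar_resample(image, rings=64, wedges=128):
        """Resample a uniform image on a log-polar grid about its center.

        Sampling density is highest near the center (the "fovea") and
        falls off toward the periphery, so far fewer samples are stored
        than in the original uniform image.
        """
        h, w = image.shape[:2]
        cy, cx = h / 2.0, w / 2.0
        max_r = min(cx, cy)
        # Log-spaced radii concentrate samples near the center.
        radii = np.exp(np.linspace(0.0, np.log(max_r), rings))
        thetas = np.linspace(0.0, 2 * np.pi, wedges, endpoint=False)
        rr, tt = np.meshgrid(radii, thetas, indexing="ij")
        map_x = (cx + rr * np.cos(tt)).astype(np.float32)
        map_y = (cy + rr * np.sin(tt)).astype(np.float32)
        # Each output pixel (ring, wedge) samples the source at (map_x, map_y).
        return cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR)

A 640x480 frame re-sampled at 64 rings by 128 wedges stores roughly 2.7% of the original pixels, which is the kind of storage/energy saving the abstract targets, at the cost of peripheral detail.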

21 pages, 6127 KB  
Article
Temporal Modeling on Multi-Temporal-Scale Spatiotemporal Atoms for Action Recognition
by Guangle Yao, Tao Lei, Xianyuan Liu and Ping Jiang
Appl. Sci. 2018, 8(10), 1835; https://doi.org/10.3390/app8101835 - 6 Oct 2018
Cited by 2 | Viewed by 3518
Abstract
As an important branch of video analysis, human action recognition has attracted extensive research attention in the computer vision and artificial intelligence communities. In this paper, we propose to model the temporal evolution of multi-temporal-scale atoms for action recognition. An action can be considered a temporal sequence of action units. These action units, which we refer to as action atoms, capture the key semantic and characteristic spatiotemporal features of actions at different temporal scales. We first investigate Res3D, a powerful 3D CNN architecture, and create variants of Res3D for different temporal scales. At each temporal scale, we design practices to transfer the knowledge learned from RGB to optical flow (OF) and build RGB and OF streams to extract deep spatiotemporal information using Res3D. We then propose an unsupervised method to mine action atoms in the deep spatiotemporal space. Finally, we use a long short-term memory (LSTM) network to model the temporal evolution of atoms for action recognition. The experimental results show that our proposed multi-temporal-scale spatiotemporal atom modeling method achieves recognition performance comparable to that of state-of-the-art methods on two challenging action recognition datasets: UCF101 and HMDB51.
(This article belongs to the Special Issue Multimodal Deep Learning Methods for Video Analytics)
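
The final stage, modeling the temporal evolution of atoms with an LSTM, can be sketched compactly. Below is a minimal PyTorch sketch that runs an LSTM over an ordered sequence of per-atom feature vectors and classifies the action from the last hidden state; the feature dimension, hidden size, and classifier head are illustrative assumptions rather than the paper's exact configuration.

    # Minimal sketch of temporal modeling over action-atom features
    # with an LSTM (assumes PyTorch).
    import torch
    import torch.nn as nn

    class AtomSequenceClassifier(nn.Module):
        def __init__(self, atom_dim=512, hidden=256, num_classes=101):
            super().__init__()
            # The LSTM consumes the ordered sequence of per-atom features.
            self.lstm = nn.LSTM(atom_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_classes)

        def forward(self, atoms):
            # atoms: (batch, num_atoms, atom_dim), one row per action atom
            # mined from the video at some temporal scale.
            _, (h_n, _) = self.lstm(atoms)
            # The final hidden state summarizes the temporal evolution.
            return self.head(h_n[-1])

    # Example: 8 atoms of 512-D features for a batch of 4 videos;
    # 101 classes matches UCF101.
    logits = AtomSequenceClassifier()(torch.randn(4, 8, 512))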
