To assist humans with their daily tasks, mobile robots are expected to navigate complex and dynamic environments, presenting unpredictable combinations of known and unknown objects. Most state-of-the-art object recognition methods are unsuitable for this scenario because they require that: (i) all target object classes are known beforehand, and (ii) a vast number of training examples is provided for each class. This evidence calls for novel methods to handle unknown object classes, for which fewer images are initially available (few-shot recognition). One way of tackling the problem is learning how to match novel objects to their most similar supporting example. Here, we compare different (shallow and deep) approaches to few-shot image matching on a novel data set, consisting of 2D views of common object types drawn from a combination of ShapeNet and Google. First, we assess if the similarity of objects learned from a combination of ShapeNet and Google can scale up to new object classes, i.e., categories unseen at training time. Furthermore, we show how normalising the learned embeddings can impact the generalisation abilities of the tested methods, in the context of two novel configurations: (i) where the weights of a Convolutional two-branch Network are imprinted and (ii) where the embeddings of a Convolutional Siamese Network are L2-normalised.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited