For multiple query retrieval, the user is required to provide a set of images as the query. The distance between the features of this set of images and the features of each image in the database is then computed. The keystone of a multiple query system, then, is how to compute this distance, which should reflect the information conveyed by the low-level features of the set of images provided as the query. In the rest of the paper, we will refer to this distance as the set distance, and we will refer to the set of images that define a single query as the set of queries. In the last decade, several multiple query retrieval systems have been proposed. In the following, we describe these approaches.
In [17], the authors propose a weighted color and texture histogram algorithm which computes the set distance using the multi-histogram intersection method [18]. This involves a weighted combination of the texture and color distances. The color and texture weights provide the relative importance of each feature, and a cross-validation technique is suggested as a way of determining these parameters. Image Grouper [19] requires the user to provide two sets of query images. One set, relevant to the user semantics, is called multiple positive groups, and another set, irrelevant to the user semantics, is called multiple negative groups. The set distance is computed as the distance between the mean of each positive group and the image in the database using Fisher's discriminant analysis (FDA) [20]. The minimum weighted distance combination algorithm [21] uses a linear weighted summation of the different features. The set distance is defined as the minimum distance between each query image and the image in the dataset. Similarly, the standard-deviation-based weights approach [22] defines the set distance in the same way as [21]. However, the weights are defined as the normalized inverse of the standard deviation of the image features. The MindReader approach [23] requires the user to provide a goodness score for each selected query image. Using the covariance matrix of the set of query images, this approach learns an appropriate Mahalanobis distance. The authors in [23] defined an objective function as the sum of the Mahalanobis distances of the optimal feature vector and the images in the database weighted by the corresponding goodness scores. Minimizing this objective function provides the optimal feature vector. The set distance is then expressed as the Mahalanobis distance [24] between an image in the database and the optimal feature vector. The logic AND-based distance [25] assumes that retrieved images must be similar to all query images. Thus, the set distance is expressed as the maximum of the Euclidean distances between the query images and the database image. The multi-feature query [26] combines logic AND and logic OR distances. This approach takes the minimum distance of a given query image and a given database image with respect to the different features and the maximum of query set distances to a given image in the database. The linear distance combination approach in [27] compels the user to provide a set of query images and their respective weights or scores of goodness. The set distance is a linear weighted sum over all distances between the query images and the database image.
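To make the differences between these definitions concrete, the following minimal sketch contrasts three of the set distances described above: the minimum over the query images (as in [21] and [22]), the maximum over the query images (the logic AND-based distance of [25]), and the linear weighted sum (as in [27]). It is an illustration under simplifying assumptions, not the implementation of any of the cited systems: each image is assumed to be a single feature vector, the Euclidean distance is used throughout, the per-feature weights of [21] and [22] are omitted, and all function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def set_distance_min(queries: Sequence[Vector], image: Vector) -> float:
    """Minimum over the query images: the database image only needs to be
    close to one query image (feature weights omitted in this sketch)."""
    return min(euclidean(q, image) for q in queries)


def set_distance_max(queries: Sequence[Vector], image: Vector) -> float:
    """Maximum over the query images (logic AND): the database image must
    be close to every query image."""
    return max(euclidean(q, image) for q in queries)


def set_distance_weighted_sum(
    queries: Sequence[Vector], image: Vector, scores: Sequence[float]
) -> float:
    """Linear weighted sum of the distances, using user-supplied
    goodness scores (one score per query image)."""
    if len(queries) != len(scores):
        raise ValueError("one goodness score is needed per query image")
    return sum(s * euclidean(q, image) for q, s in zip(queries, scores))


# Toy set of queries (two images) and one database image.
queries = [(0.0, 0.0), (3.0, 4.0)]
image = (0.0, 0.0)

d_min = set_distance_min(queries, image)
d_max = set_distance_max(queries, image)
d_sum = set_distance_weighted_sum(queries, image, scores=(0.5, 0.5))
```

In this toy example, the database image coincides with the first query image and lies at distance 5 from the second, so the three definitions rank it very differently: the minimum reports a perfect match, the maximum reports the distance to the farthest query image, and the weighted sum falls in between.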
To the best of our knowledge, the reported approaches are the only works in the literature that deal with multiple query content-based image retrieval. Although deep learning has been an active field of research, deep learning approaches have been used in the context of image retrieval only to extract the features. In fact, no deep learning-based approach has been reported to learn the semantics of the query. Concerning the approaches stated above, some require that the user provide a set of feature weights, while others compel the user to give a set of goodness scores. While these weights and goodness scores could enhance the retrieval results, they are very ambiguous to set and make the query impractical for the user. Other approaches learn the distance using the Mahalanobis distance or the standard deviation. However, these two approaches require a large number of query images to produce a significant result, which also yields an impractical query for the user. Some other approaches use the minimum and the maximum to select a feature or a query image. This is a type of crisp weighting of the features, whereby the selected feature is assigned a value of one, and the others are assigned a value of zero. Likewise, when applied to query images, this approach offers a crisp score of goodness, where the selected query image is assigned a score of one and the others a score of zero. However, selecting a single feature or a single query image discards the information carried by the remaining ones, which could be useful.
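The crisp-weighting interpretation can be checked with a short sketch. Assuming a list of distances (one per feature, or one per query image), taking the minimum or the maximum is equivalent to a weighted sum in which the selected entry receives a weight of one and all others a weight of zero. The helper names below are hypothetical and are introduced only for this illustration.

```python
from typing import Callable, List, Sequence


def crisp_weights(
    distances: Sequence[float],
    select: Callable[[Sequence[float]], float] = min,
) -> List[float]:
    """One-hot weights: 1.0 for the first entry equal to select(distances),
    0.0 for every other entry."""
    chosen = list(distances).index(select(distances))
    return [1.0 if i == chosen else 0.0 for i in range(len(distances))]


def weighted_sum(distances: Sequence[float], weights: Sequence[float]) -> float:
    """Linear weighted combination of the distances."""
    return sum(w * d for w, d in zip(weights, distances))


# Toy distances, e.g. one per query image.
distances = [0.2, 0.9, 0.5]

w_or = crisp_weights(distances, min)   # selects the closest entry
w_and = crisp_weights(distances, max)  # selects the farthest entry

# The weighted sums reproduce the minimum and the maximum exactly,
# while the two unselected distances contribute nothing.
d_or = weighted_sum(distances, w_or)
d_and = weighted_sum(distances, w_and)
```

Under this view, the unselected distances have no influence on the result, which is the loss of information noted above; a soft weighting would instead let every entry contribute.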