Article

Content Based Image Retrieval Using Embedded Neural Networks with Bandletized Regions

1
Department of Computer Engineering, University of Engineering and Technology, Taxila 47050, Pakistan
2
Department of Computer Science, University of Engineering and Technology, Taxila 47050, Pakistan
3
School of Computer Science and Engineering, Korea University of Technology and Education, Cheonan 330-708, Korea
*
Author to whom correspondence should be addressed.
Entropy 2015, 17(6), 3552-3580; https://doi.org/10.3390/e17063552
Submission received: 9 March 2015 / Revised: 7 May 2015 / Accepted: 18 May 2015 / Published: 29 May 2015

Abstract

One of the major requirements of content based image retrieval (CBIR) systems is to ensure meaningful image retrieval against query images. The performance of these systems is severely degraded when image content that does not belong to the objects of interest is included during the image representation phase. Image segmentation is often considered as a solution, but no existing technique can guarantee robust object extraction; moreover, most segmentation techniques are slow and their results are unreliable. To overcome these problems, a bandelet transform based image representation technique is presented in this paper, which reliably returns information about the major objects found in an image. For image retrieval, artificial neural networks (ANN) are applied, and the performance of the system is evaluated on three standard datasets used in the domain of CBIR.

1. Introduction

The number of digital images in personalized and enterprise collections has grown immensely, so there is a growing demand for powerful automatic image indexing and retrieval. However, with such widespread use and availability of images, textual annotation (using keywords) has become impracticable and unsuitable for representing and retrieving images. This is the reason that content based image retrieval (CBIR) has attracted great interest amongst research communities [1–3].
Content based image retrieval systems generate meaningful image representations by considering the visual characteristics of images, i.e., color [4–6], texture [7–10], shape [11,12] and salient points [13,14], and return closely resembling images, in terms of distance, as the semantic response. One of the major challenges in this regard is the semantic gap, i.e., low-level features are not sufficient to characterize high-level image semantics [15]. To bridge this gap to some extent, an important focus of research is the enhancement of these features, so that machine learning algorithms can make significant improvements. Features can be split into two main categories: global image features and local features of region representations. Global image features are generated by considering the whole image as a single entity and then obtaining feature representations; color histograms [6,16], texture [10,17,18] and color layout [4,5] are all examples of global image features. Region representations, or local features, are obtained either from segmented regions [11,19,20] or by detecting interest points, e.g., features obtained by the scale-invariant feature transform (SIFT) [21] and speeded-up robust features (SURF) [14]. Segmentation makes it possible to generate representations that take into account the actual objects of interest and avoid image regions that are less informative. This is the reason that image representations obtained through segmentation are much more powerful than global image representations. However, the drawback of segmentation based image features is that currently no technique can perform image segmentation reliably, and the associated computational cost makes it of little practical use [11,20,22,23].
To reap the benefits of segmentation based image representations and to overcome the associated drawbacks, the focus of the current manuscript is on the identification of image segments that contain major image objects by applying the bandelet transform. The bandelet transform returns the geometric representation of the texture of the object regions, which can be used to discriminate the objects of interest in a fast way; the detailed procedure is described in Section 3.1. The major problem with the geometric output is that its interpretation is complicated by the close resemblance of the connected regions. To ensure the correct association, artificial neural networks are applied and correct texture classification is performed. We then apply the Gabor filter and generate the texture representation from it based on the classification output. To further enhance the image representation capabilities, we have also estimated the color content in the YCbCr color domain and fused it with the texture.
As described by Irtaza et al. [24], the major drawbacks of a CBIR (or query by image content) system that severely impact retrieval performance are: (1) the lack of output verification and (2) the neglect of neighborhood similarity in the semantic association. We have therefore followed their findings and also included the neighborhood in the semantic association process. Content based image retrieval is then performed by artificial neural networks after training them with the obtained features. The bandelet transform has previously been used for medical image retrieval [25], although that feature extraction approach differs from ours. Before this, researchers utilized the bandeletization property for image compression [26], image enhancement [27] and gender classification [28].
The remainder of the paper is organized as follows: Section 2 introduces related work in the domain of CBIR. Section 3 describes the proposed technique in detail. Experimental results are provided in Section 4. Finally, we conclude our findings in Section 5.

2. Related Work

Numerous CBIR systems [1,29] have been proposed so far to address the image searching problem and to make image retrieval more effective. For this, research is directed at the exploration of new signature types [18] and powerful image similarity measures. In CBIR, the image signature plays an imperative role in building an efficient image search. The query image and the images found in the repository are represented as collections of feature vectors, and ranking of the relevant results is performed on the basis of common norms, i.e., distance, or of semantic association by a machine learning technique [15]. Signature development is usually performed through the analysis of color [4,16], texture [17] or shape [19], or by combining any of these and representing them mathematically [8]. Color features are extensively used in CBIR, which may be attributed to their greater discriminative potential in a three dimensional domain compared to single dimensional gray level images. Texture features are powerful visual features used to capture the repetitive patterns of surfaces in images. Shape is known to be an important cue that humans use to identify and recognize objects in the real world, and shape features have been used for image retrieval in many applications [30]. Shape feature extraction techniques are classified into contour based and region based methods: contour based methods extract features from the shape boundary (the contours), while region based methods extract features from the entire region.
A color estimation method based on color difference histograms (CDH) is presented in [16]. Their work diminished the reliance on the frequency of pixels in an image. The unique characteristic of CDH is that it counts the perceptually uniform color differences between two points with respect to color and edge orientation in the L*a*b color space. Dominant colors and edge orientations are then used for image representation. In the work of [31], a color-texture and dominant color based image retrieval system (CTDCIRS) is proposed, offering three different image features: the motif co-occurrence matrix (MCM), dynamic dominant color (DDC), and difference between pixels of scan pattern (DBPSP). In the first step, a color quantization algorithm divides the image into eight coarse partitions and the dominant colors are obtained from every partition. In the second step, MCM and DBPSP are applied for the texture representation of the image using the motif transform. These three features are combined to improve the efficiency of the system. Yue et al. [32] introduced a feature extraction method based on combined texture-color features for automatic retrieval: in the initial step the hue-saturation-value (HSV) color space is computed, and co-occurrence matrices are then used to build the texture features.
The work of [33] presented a feature extraction approach by generating the curvelet representation of images. The curvelet transform is combined with a vector codebook of region based sub-band clustering for the extraction of dominant colors. In this approach, the query image and target images are compared using the principle of most similar highest priority (MSHP) and evaluated for retrieval performance. In the work of [34], a wavelet based color histogram is introduced, which also considers the texture and color components of the images for retrieval. The work of [35] proposed the micro-structure descriptor (MSD), which depends on the primary colors in small structures with similar edge orientation, simulated according to human visual processing. Hejazi and Ho [17] presented an image retrieval approach based on the textural information of an image; their approach considers the directionality, orientation and regularity of the texture using a nonlinear modified discrete Radon transform, which is then used for image retrieval. Vector quantization (VQ) is applied in the work of [18] for feature extraction: an image is split into pixel blocks of size 2 × 2 in the RGB domain, and these blocks are then used to develop training vectors. For the initial training sets, the Linde-Buzo-Gray (LBG) and Kekre's proportionate error (KPE) algorithms are applied. After training, the same approach is used for testing and decent results are produced.
In the work of [36], the RGB image is transformed into the opponent chromaticity space, where Zernike chromaticity distribution moments are used to capture the characteristics of the color content. In the contour domain, texture features are extracted using a scale-invariant and rotation-invariant image descriptor. The combination of texture-color features motivated by the human visual system provides efficient and flexible approximation processing. In the work of [13], a CBIR system is proposed that uses the color and texture features of image sub-regions: regions of interest (ROI) are identified approximately by segmenting the image into fixed sections after applying morphological dilation. The color of the ROIs is calculated from quantized histograms in the HSV color space, and the texture is computed from the gray level co-occurrence matrix (GLCM). The query and target images are compared using the same number of regions of interest. Lin et al. [37] proposed three image features for efficient automatic image retrieval: the difference between pixels of scan pattern (DBPSP) is used to extract the texture feature, the color co-occurrence matrix (CCM) is used to obtain the color features, and the last feature depends on the color distribution and is called the color histogram for K-mean (CHKM).
Irtaza and Jaffar [24] presented a potential solution for retrieving semantically similar images from large image repositories against any query image. A genetic algorithm and support vector machines are used to minimize the gap between high-level and low-level features, and relevance feedback is also incorporated to avoid the risk of dissociation. The work of [38] presented a gender classification technique based on feature selection through the eigenvalues of Gabor filters and Haar based wavelet packets; the experimental results show that Gabor features combined with random mean feature values improve retrieval performance and classification accuracy. Jhanwar et al. [39] presented a CBIR technique using the motif co-occurrence matrix (MCM), which is obtained from the motif transformed image and is similar to the color co-occurrence matrix.
In the work of [40], texture is identified as the main difficulty for segmentation, since it is very hard to estimate the parameters of a texture model when texture segmentation is applied; these problems are handled by J-image segmentation (JSEG), which consists of two independent steps: color quantization and spatial segmentation. Wang et al. [22] introduced a semantics classification method which uses a wavelet-based approach for feature extraction and an image-segmentation-based region matching approach for image comparison. The integrated region matching (IRM) proposed in this work is not efficient for texture classification due to uncertain modeling; to address this issue, the idea was further developed by Chen et al. [23], who proposed cluster-based retrieval of images by unsupervised learning (CLUE). This unsupervised clustering-based technique generates multiple clusters of retrieved results and gives more accurate results than the previous work, but it suffers from issues such as identifying the number of clusters and segmentation uncertainty, due to which its results are not reliable. ElAlami [41] proposed a model based on three techniques: the first is concerned with feature extraction from the image database, for which the color histogram and Gabor filter are used to extract color and texture features; the second depends on a genetic algorithm to obtain the optimal boundaries of these discrete values; and the last performs feature selection through two successive functions, called preliminary and intense reduction, to extract the most similar features from the original feature repository.
Some other visual features have also been proposed for CBIR, such as salient points and spatial features. SIFT [21] and SURF [14], based on the salient points found in an image, are familiar visual features, and researchers have done a lot of work using these salient points in content based image retrieval. Velmurugan and Baboo [14] applied SURF features combined with color features to improve retrieval accuracy. Mallat and Peyré [42] introduced bandelet approaches to geometric image representations: orthogonal bandelets using an adaptive segmentation are a well suited solution for capturing the regularity of edge structures, applying a bandeletization to the wavelet coefficients of an orthogonal transform.
Qu et al. [43] proposed an image fusion system based on the bandelet transform, which represents sharp image transitions such as edges by taking advantage of the geometric regularity of the image structure. To create the fused image, a maximum selection rule is applied to the geometric flow and bandelet coefficients of the source images.
The technique presented in the current paper considers the most prominent objects in an image, in a precise manner, using the object geometric representation obtained by the bandelet transform. The texture information found at the object boundaries is then used as a component of the feature vectors, after applying targeted parameters to the Gabor transform based on the artificial neural network suggestions. The features are further improved by incorporating color information in the YCbCr domain. Image semantics are then obtained by artificial neural networks.

3. Proposed Method

The most important capability of the proposed method is its ability to identify the most prominent objects in an image. These objects are then treated as the core content from which the feature vectors are generated. To achieve this, image transformations are first generated through the bandelet transform, which returns the geometric boundaries of the major objects found in an image. We then apply a Gabor filter with targeted parameters (as will be described) to estimate the texture content around these boundaries. These geometric boundaries are ambiguous in the sense that they could easily be associated with unwanted texture classes, as all of them closely resemble one another; if not carefully considered, this leads to wrong parameter estimation and, consequently, unsatisfactory image retrieval output. Therefore, to avoid this situation, geometric classification is performed through backpropagation neural networks, which ensures that the texture estimation parameters used to apply the Gabor filter are approximated with maximum accuracy.
To increase the power of the feature vectors, color components are also included in the YCbCr domain after approximating them through a wavelet decomposition of the color histograms. The proposed features are computed for all images present in the image repository, and their semantic classes are determined through ground truth training with artificial neural networks and the refined neighborhood of every image. We generate an inverted index over the semantic sets, which guarantees fast image retrieval once the semantic class of the query image is determined. The complete process of the proposed method is represented in Figure 1, and the details of the process are given in the subsequent sections.

3.1. Bandelet Transform

The issue with wavelet bases is that the same texture values can have different directions in an image. To overcome this limitation, Le Pennec and Mallat [42,44] proposed capturing geometric regularity in an anisotropic way by eliminating the redundancy of the wavelet transform using the concept of bandeletization. The bandelet transform is a self-adaptive multiscale geometry analysis method which exploits the geometric information of images, in contrast to non-adaptive algorithms such as the curvelet [9,33] and contourlet transforms [7]. The bandelet transform not only has the properties of multiscale analysis, directionality and anisotropy, but also offers critical sampling and adaptability for image representation. Bandelet bases accumulate their support extended in the direction perpendicular to that of the maximal regularity of the function, as shown in Figure 2. The Alpert transform is used for the bandeletization, which closely follows the geometry of the underlying image. The main objective is to take advantage of sharp image transitions by computing the geometric flow and forming bandelet bases which capture the constantly changing directions of grayscale images.
As shown in Figure 3, the bandelet transform divides the image into square blocks and obtains one contour (Ωi) from each. If a small image region does not contain any contour, the image intensity is uniform and regular in that region and therefore no flow of lines is defined. A short sketch of this block-splitting step is given below.
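The fragment below is a simplified illustration, not the full bandelet machinery: it partitions a grayscale image into fixed square blocks and flags the blocks whose intensity is essentially uniform; for such blocks, as noted above, no flow of lines is defined. The 8 × 8 block size follows Section 3.1.1, while the variance threshold is an illustrative assumption.

```python
import numpy as np

def partition_blocks(gray, block_size=8):
    """Split a grayscale image into non-overlapping square blocks."""
    h, w = gray.shape
    h, w = h - h % block_size, w - w % block_size     # crop to a multiple of the block size
    blocks = (gray[:h, :w]
              .reshape(h // block_size, block_size, w // block_size, block_size)
              .swapaxes(1, 2))                        # shape: (rows, cols, block, block)
    return blocks

def has_contour(block, var_threshold=25.0):
    """A block with (nearly) uniform intensity carries no geometric flow."""
    return block.var() > var_threshold

# Example: count the blocks that would be kept for geometry estimation.
gray = np.random.randint(0, 256, (256, 384)).astype(float)   # stand-in for a real image
blocks = partition_blocks(gray, block_size=8)
kept = sum(has_contour(b) for row in blocks for b in row)
print(f"{kept} of {blocks.shape[0] * blocks.shape[1]} blocks contain contours")
```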

3.1.1. Alpert Bases in Bandelet Transform

Following the work of [45], the Alpert transform is applied to compute the bandelet bases that approximate images having some geometric regularity. For this, the image is divided into square blocks, denoted S, and the geometric flow is estimated in every block. In our implementation the block size is 8 × 8.
As elaborated in Figure 3, if we use smaller blocks, i.e., 4 × 4, the image is divided into more chunks, but the bandelet transform is then not able to capture the sharp edges; similarly, if the block size is larger, i.e., 16 × 16 or 32 × 32, the geometric flow exceeds the object boundaries. Hence, through experimental observation, we use a block size of 8 × 8 for appropriate object estimation. The Alpert transform parallel to the geometric flow is constructed over the space $l^2(S)$ of wavelet coefficients in $S$ with piecewise polynomials over bands of dyadic widths. A geometric flow direction $\gamma$ is assumed to be known over $S$, and a warping operator $W$ maps $S$ to the warped square $\bar{S}$: any sampling point $x_n = 2^j n$ is warped into $\bar{x}_n = W(2^j n)$. Similarly, $l^2(\bar{S})$ represents functions sampled in the warped domain, i.e., $\bar{g}(\bar{x}_n)$ for $(2^j n) \in S$. To define the multiresolution, for each scale $2^{l}$ the warped square $\bar{S}$ is recursively subdivided into $2^{l}$ horizontal bands through:
$$\bar{S} = \bigcup_{i=0}^{2^{l}-1} \bar{\beta}_{l,i} \tag{1}$$
In Equation (1), the bands satisfy $\bar{\beta}_{l,i} = \bar{\beta}_{l-1,2i} \cup \bar{\beta}_{l-1,2i+1}$. The value of each band is then carried back to the original square $S$ using the Alpert multiresolution space $\bar{V}_l \subset l^2(\bar{S})$, i.e., $\beta_{l,i} \stackrel{\mathrm{def}}{=} W^{-1}(\bar{\beta}_{l,i}) \cap S$, which has a width roughly equal to $\lambda 2^{l}$ and a number of sampling points roughly equal to $2^{l}(\lambda 2^{l})^{2}$. The Alpert vectors are obtained through Equations (2) and (3):
$$\bar{V}_l = \left\{\, g \in l^2(\bar{S}) \;:\; g \text{ satisfies Equation (3) on every band } \bar{\beta}_{l,i} \,\right\} \tag{2}$$
$$g(\bar{x}_n) = P_i(\bar{x}_n), \qquad \bar{x}_n \in \bar{\beta}_{l,i} \tag{3}$$
According to the multiresolution spaces, orthogonal bases $(h_{l,i,k})_{i,k}$ of each space are obtained by Gram-Schmidt orthogonalization, and the resulting vectors are built from the monomials:
$$p_k(x_n) = (x_1)^{k_1}\,(x_2)^{k_2} \tag{4}$$
Alpert wavelets $(\Psi_{l,i,k})_{i,k}$ are the orthogonal bases of the orthogonal complement $W_l$ of $\bar{V}_l$. The Alpert wavelet vectors $(\Psi_{l,i,k})_{k}$ are therefore computed by a Gram-Schmidt orthogonalization of the family:
$$\left[\, h_{l-1,2i,k},\; h_{l-1,2i+1,k} \,\right]_{k_1+k_2<p} \subset \bar{V}_{l-1} \tag{5}$$
The resulting multiwavelet vectors $(\Psi_{l,i,k})$ are orthogonal to $\bar{V}_l$ and have vanishing moments over the warped domain:
$$\sum_{\bar{x}_n} \bar{\Psi}_{l,i,k}(\bar{x}_n)\,(\bar{x}_n)^{k} = 0 \tag{6}$$
Equation (6) holds with $(x)^{k} = (x_1)^{k_1}(x_2)^{k_2}$ for each point $x_n = (x_1, x_2)$ of the warped domain. The orthogonal bases $(\bar{\Psi}_{l,i,k})_{l,i,k}$ of $l^2(\bar{S})$ define an orthogonal Alpert basis of $l^2(S)$ through:
$$\psi_{l,i,k}(x_n) = \bar{\psi}_{l,i,k}(\bar{x}_n) \tag{7}$$
In a square block at scale $2^{l}$, the orthogonal Alpert basis $\beta(S, \gamma)$ of $l^2(S)$ is computed as:
$$\beta(S,\gamma) \;\stackrel{\mathrm{def}}{=}\; \left\{\, \psi_{l,m} \;:\; L \le l \le 0 \ \text{ and } \ 0 \le m < p(p+1)(2^{l}-1) \,\right\} \tag{8}$$
The bandelet transform provides this information for each square block; if the flow in a square $S$ is undefined, the projection onto $\beta(S, \gamma)$ leaves the wavelet coefficients in $S$ unchanged. Over a dyadic segmentation, the bandeletization basis $\beta(\Gamma_j)$ of the whole space of wavelet coefficients at scale $2^{j}$ is:
$$\beta(\Gamma_j) = \bigcup_{S \in S_j} \beta(S, \gamma_S) \tag{9}$$
After applying the Alpert transform, we obtain a vector for each square, i.e.,
$$\psi_v[n] = \psi_{l,k}[n] \tag{10}$$
In the equation above, $\psi_v[n]$ are the coordinates of the bandelet functions, which belong to the space $L^2([0,1]^2)$. These coordinates are further used to build the bandelet basis, a construction called bandeletization:
$$\beta(\Gamma) = \bigcup_{j \le 0} \left\{\, b_v \;:\; \psi_v \in \beta(\Gamma_j) \,\right\} \quad \text{where } \Gamma = \bigcup_{j \le 0} \Gamma_j \tag{11}$$
The bandelet basis is the key factor for computing the image geometry. In the bandelet transform, the best basis is therefore obtained by minimizing a Lagrangian function:
$$\beta(\Gamma^{*}) = \underset{\beta(\Gamma)}{\operatorname{argmin}}\; \mathcal{L}\big(f, \beta(\Gamma), T\big) \tag{12}$$
The bandelet transform uses Equation (12) to generate the geometries of an image. Here "T" is a threshold whose value affects the level of detail of the estimation; different values can be adopted, e.g., 32, 48, 56, etc. In our implementation we use a threshold value of 70, chosen after detailed experimentation, which is able to estimate the theme object in an image. Following [43], each block is approximated in a separable wavelet basis of the $L^2(\Omega)$ domain, i.e.,
$$\left.\begin{aligned} \phi_{j,m}(x) &= \phi_{j,m_1}(x_1)\,\phi_{j,m_2}(x_2)\\ \psi^{H}_{j,m}(x) &= \phi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2)\\ \psi^{V}_{j,m}(x) &= \psi_{j,m_1}(x_1)\,\phi_{j,m_2}(x_2)\\ \psi^{D}_{j,m}(x) &= \psi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2) \end{aligned}\right\} \quad \text{where } j, m_1, m_2 \in I(\Omega) \tag{13}$$
where $I(\Omega)$ is an index set which depends on the geometry of the boundary of $\Omega$, and $x_1$, $x_2$ denote pixel locations in the image. Equation (13) represents the customized wavelets at the boundary, and the geometric flow is calculated in the region $\Omega$. These wavelet bases are replaced by bandelet orthonormal bases of $L^2(\Omega)$. Then,
$$\left.\begin{aligned} &\phi_{j,m_1}(x_1)\,\psi_{j,m_2}\big(x_2 - c(x_1)\big)\\ &\psi_{j,m_1}(x_1)\,\phi_{j,m_2}\big(x_2 - c(x_1)\big)\\ &\psi_{j,m_1}(x_1)\,\psi_{j,m_2}\big(x_2 - c(x_1)\big) \end{aligned}\right\} \quad \text{where } j, m_1, m_2 \in I(\Omega) \tag{14}$$
In the above equation, $c(x_1)$ defines the flow line associated with a fixed translation parameter: the point $(x_1, x_2 + c(x_1))$ belongs to $\Omega$ and follows the direction along which the geometric flow is extended. The function $c(x)$ is obtained as:
$$c(x) = \int_{x_{\min}}^{x} c'(u)\, du \tag{15}$$
The flow is parallel, and $c'$ is computed as an expansion over a translated function $b$ dilated by a scale factor $2^{l}$; the flow at this scale is then characterized by:
$$c'(t) = \sum_{n=1}^{2^{k-l}} a_n\, b\!\left(2^{-l} t - n\right) \tag{16}$$
The bandeletization of the wavelet coefficients uses the Alpert transform to define a set of bandelet coefficients; combining these coefficients gives the inner product of the original image $f$ with the bandelets:
$$b^{k}_{j,l,n}(x) = \sum_{p} a_{l,n}[p]\, \psi^{k}_{j,p}(x) \tag{17}$$
The local geometric flow depends on these coefficients and scales; therefore, for each scale $2^{j}$ and orientation $k$ a different geometry is obtained. After the bandeletization process, we obtain a multiscale low- and high-pass filtering structure similar to the wavelet transform. Equations (12) and (17) are used to calculate the geometry of the images, as shown in Figure 4. The regions containing contours are further used for texture classification, and the features are computed with an artificial neural network structure.
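The role of the threshold T in Equation (12) can be pictured with a reduced sketch. The fragment below is only a simplified stand-in under stated assumptions: it hard-thresholds 2-D wavelet detail coefficients with PyWavelets instead of performing the best-basis search over dyadic segmentations, but it shows how a larger T keeps only the dominant geometric structure (edges and contours) of the image.

```python
import numpy as np
import pywt

def threshold_geometry(gray, T=70, wavelet='haar', level=2):
    """Hard-threshold 2-D wavelet detail coefficients at T and reconstruct.

    Only a rough stand-in for bandelet best-basis selection: the coefficients
    that survive the threshold correspond to the strongest edges/contours.
    """
    coeffs = pywt.wavedec2(gray, wavelet, level=level)
    thresholded = [coeffs[0]]                      # keep the approximation untouched
    for (cH, cV, cD) in coeffs[1:]:
        thresholded.append(tuple(np.where(np.abs(c) >= T, c, 0.0) for c in (cH, cV, cD)))
    return pywt.waverec2(thresholded, wavelet)

gray = np.random.rand(256, 256) * 255              # stand-in for a real grayscale image
geometry_like = threshold_geometry(gray, T=70)
```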

3.1.2. Texture Feature Extraction using Bandelets

Texture is a significant component of human visual perception, and many researchers have worked on how to characterize it effectively in images. In this paper, we propose a new method to identify the most prominent texture areas in the image, which constitute the major image objects. In the proposed method, image transformations are first generated through the bandelet transform, which gives back the geometric boundaries of the major objects found in an image. We apply a Gabor filter with targeted parameters to estimate the texture content around these boundaries. These geometric boundaries are ambiguous in the sense that they could easily be associated with undesired texture classes, as all of them closely resemble one another; if not carefully considered, this leads to wrong parameter estimation and, consequently, unsatisfactory image retrieval output. Therefore, to avoid this situation, geometric classification is performed through backpropagation neural networks, which ensures that the texture estimation parameters used to apply the Gabor filter are approximated with maximum accuracy. The following are the main steps of texture feature extraction:
  • Convert the input RGB image (I) of size M × N into a grayscale image.
  • Apply the bandelet transform to calculate the geometry of the image and obtain the directional edges.
  • An artificial neural network is used to classify the blocks having directional edges, after training on the sample edge set described in Figure 5. Once the network is trained, every geometric shape obtained in step 2 is classified for parameter estimation; these parameters are further described in the Gabor filter section.
  • After parameter estimation, the blocks with geometric content are passed to the Gabor filter to estimate the texture.
  • Steps 1 to 4 are repeated for the whole image repository (a code sketch of this loop is given below).
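The skeleton below is a sketch of this loop rather than the authors' implementation: the helpers bandelet_geometry, classify_direction, and gabor_texture_features are hypothetical placeholders for the operations detailed in Sections 3.1.1, 3.1.3, and 3.1.4.

```python
import numpy as np
from skimage.color import rgb2gray

def extract_texture_features(rgb_image, bandelet_geometry, classify_direction,
                             gabor_texture_features, block_size=8):
    """Sketch of steps 1-4: grayscale -> bandelet geometry -> ANN direction -> Gabor.

    The three callables are hypothetical stand-ins for the components
    described in the surrounding subsections.
    """
    gray = rgb2gray(rgb_image)                               # step 1
    geometry_blocks = bandelet_geometry(gray, block_size)    # step 2: blocks with directional edges
    features = []
    for block, geometry in geometry_blocks:
        direction = classify_direction(geometry)             # step 3: horizontal / vertical / diagonal / none
        if direction != 'none':
            features.append(gabor_texture_features(block, direction))   # step 4
    return np.concatenate(features) if features else np.zeros(0)

# Step 5: repeat the same call over the whole image repository.
# repository_features = [extract_texture_features(img, ...) for img in repository]
```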

3.1.3. Artificial Neural Network

Neural networks (NN) are renowned as powerful and dominant tools in the area of pattern recognition and are inspired by the biological neurons found in the human brain. The least mean square rule and the gradient search method are used to minimize the average difference between the network output and the target values [46].
Backpropagation neural networks are applied to classify the texture on the basis of the geometry returned by the bandelet transform. In this regard, we classify the texture direction of every block as horizontal, vertical, right/left diagonal, or no contour, after training on a small set developed manually. For this, we placed 14 block samples representing each of the mentioned geometric types in every category, as described in Figure 5. To generate these samples, we consider only the image geometry and suppress the original image content. Once the network is trained, we apply it to classify every block present in the image. The reason for performing this task with an ANN instead of a kernel (the window based operations used in image processing) is that the geometry is not fixed and shows different variations within the same category; in this situation the performance of kernel based operations is poor, whereas the ANN classifies the texture with maximum accuracy. Figure 6 shows the structure of the neural network, which is defined with one hidden layer of 20 neurons and four output units. The sigmoid function is used as the transfer function in both the hidden layer and the output layer, i.e.,
$$f(x) = g(x) = \frac{1}{1 + \exp(-x/x_0)} \tag{18}$$
Details of the neural network structure are summarized in Table 1.
After training of the neural network, every block of an image is tested against the network, and its texture type is determined as:
$$m^{*} = \underset{m}{\operatorname{argmax}}\;\big(\bar{y}_{f_m}\big) \tag{19}$$
where $m$ ranges over the output units of the neural network structure and $\bar{y}_{f_m}$ returns the association factor of a particular output unit. The texture type of the block is given by the output unit with the highest association factor.
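As a hedged sketch of this classifier (not the exact network of Table 1), a single-hidden-layer backpropagation network with 20 logistic units and four outputs can be trained on flattened 8 × 8 geometry samples; the winning output unit then plays the role of Equation (19). scikit-learn's MLPClassifier is used here for brevity, and the random arrays stand in for the 14 manually prepared samples per category.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical training data: 14 flattened 8x8 geometry samples per category
# (horizontal, vertical, left diagonal, right diagonal).
X_train = np.random.rand(4 * 14, 64)          # stand-in for the real sample blocks
y_train = np.repeat(np.arange(4), 14)         # 0..3 = the four texture directions

net = MLPClassifier(hidden_layer_sizes=(20,), activation='logistic',
                    solver='sgd', learning_rate_init=0.1, max_iter=2000,
                    random_state=0)
net.fit(X_train, y_train)

# Classify one geometry block: Equation (19), argmax over the output units.
block = np.random.rand(1, 64)
association = net.predict_proba(block)         # association factor of each output unit
texture_type = int(np.argmax(association))     # 0..3
```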

3.1.4. Gabor Feature

Gabor filters are extensively used in computer vision and pattern recognition. Successful applications of Gabor wavelet filters include feature extraction, texture segmentation, face recognition, fingerprint identification, edge detection, contour detection, directional image enhancement, hierarchical image representation, compression, and recognition. Gabor filtering is a strong technique for reducing noise and can easily reduce image redundancy and repetition [47]. Gabor filters can be convolved with a small portion of an image or with the full image. An image region is expressed by the different Gabor responses generated through different orientations, frequencies and angles [24,48]. For an image I(x, y) of size M × N, its discrete Gabor wavelet transform is given by the convolution:
$$G_{mn}(x,y) = \sum_{s}\sum_{t} I(x-s,\, y-t)\, \psi^{*}_{mn}(s,t) \tag{20}$$
where $s$ and $t$ index the filter mask, and $\psi^{*}_{mn}$ is the complex conjugate of $\psi_{mn}$, the self-similar function created by rotation and dilation of the following mother wavelet:
$$\psi(x,y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\!\left[-\frac{1}{2}\left(\frac{x^{2}}{\sigma_x^{2}} + \frac{y^{2}}{\sigma_y^{2}}\right)\right]\exp\!\left(j\, 2\pi \lambda x\right) \tag{21}$$
where $\lambda$ is the modulation frequency. The generating function is used to obtain the self-similar Gabor wavelets:
$$\psi_{mn}(x,y) = a^{-m}\, \psi(x', y') \tag{22}$$
where $m$ and $n$ specify the scale and orientation of the wavelet, with $m = 0, 1, \ldots, M-1$ and $n = 0, 1, \ldots, N-1$, and the terms $x'$ and $y'$ are given by:
$$x' = a^{-m}\,(x\cos\theta + y\sin\theta) \tag{23}$$
$$y' = a^{-m}\,(-x\sin\theta + y\cos\theta) \tag{24}$$
where $a > 1$ and $\theta = n\pi/N$. In the Gabor filter, $\sigma$ is the standard deviation of the Gaussian function, $\lambda$ is the wavelength of the harmonic function, and $\theta$ is the orientation. In our implementation, the blocks having a bandelet based geometric response are passed to the Gabor filter, and based on the neural network classification we select the parameters for the application of the Gabor filter [48].
For horizontal texture portions:
  • θ = π, and λ = 0.3.
For vertical texture portions:
  • θ = π/2, and λ = 0.4.
For left diagonal texture portions:
  • θ = π/4, and λ = 0.5.
For right diagonal texture portions:
  • θ = 3π/4, and λ = 0.5.
The energy computation is performed using the following equation:
$$F_v = \mu\big((A - \lambda_E I)\,X\big) \tag{25}$$
where $F_v$ is the feature vector, $\lambda_E$ are the eigenvalues, $X$ is the eigenvector matrix, $I$ is the identity matrix, and $A$ is the Gabor response on a particular block.
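A minimal sketch of this stage is given below. It builds the real part of the Gabor kernel from Equations (21)-(24) with the orientation/frequency pairs listed above and convolves it with an 8 × 8 block; the kernel size, the sigma values, and the use of the mean absolute response as the energy summary are illustrative assumptions rather than the exact eigen-decomposition of Equation (25).

```python
import numpy as np
from scipy.signal import convolve2d

# (theta, lambda) pairs suggested by the neural network classification.
GABOR_PARAMS = {
    'horizontal':     (np.pi,         0.3),
    'vertical':       (np.pi / 2,     0.4),
    'left_diagonal':  (np.pi / 4,     0.5),
    'right_diagonal': (3 * np.pi / 4, 0.5),
}

def gabor_kernel(theta, lam, sigma_x=2.0, sigma_y=2.0, size=7):
    """Real part of the Gabor wavelet of Equations (21)-(24) (a = 1, m = 0)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)       # Equation (23)
    yr = -x * np.sin(theta) + y * np.cos(theta)      # Equation (24)
    gauss = np.exp(-0.5 * (xr**2 / sigma_x**2 + yr**2 / sigma_y**2)) / (2 * np.pi * sigma_x * sigma_y)
    return gauss * np.cos(2 * np.pi * lam * xr)      # real part of exp(j*2*pi*lambda*x')

def block_energy(block, direction):
    """Convolve one block with the direction-specific kernel and summarize it."""
    theta, lam = GABOR_PARAMS[direction]
    response = convolve2d(block, gabor_kernel(theta, lam), mode='same')
    return np.mean(np.abs(response))                 # simple energy summary (assumption)

block = np.random.rand(8, 8)                         # stand-in for a bandelet geometry block
print(block_energy(block, 'horizontal'))
```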

3.2. Color Feature Extraction

In CBIR, color is one of the most important and significant visual attributes. It has been extensively studied, the motivation being that color estimation is not sensitive to rotation, translation and scale changes. A variety of color spaces are available and serve effectively for different applications [6,49]. In this paper, color features are extracted on the basis of edge detection in the YCbCr color space: edges are extracted by applying the Canny edge detector to the Y (luminance) component. The main steps of color feature extraction are as follows:
  • The RGB image (I) is converted into the YCbCr color space.
  • After conversion, we separate the Y, Cb, and Cr components and apply the Canny edge detector to the Y component of the image.
  • In the next step, we combine the edges obtained in the previous step with the unchanged Cb and Cr components.
  • After step (3), the combined image is converted back into a single RGB image.
  • The individual R, G, and B components are then separated and the histogram of each component is calculated, giving 256 bins for each of HR, HG, and HB.
  • To improve the feature performance, we apply a wavelet transform to each histogram obtained in the previous step: the discrete wavelet transform of HR is taken at level 2 and those of HG and HB at level 3. After this step, we have 128 bins, i.e., 64 bins from HR, 32 bins from HG, and 32 bins from HB.
  • The feature vector is calculated for every image in the repository.
Figure 7 illustrates the color features obtained through the above steps; a short code sketch of the same procedure is given below.
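The sketch below follows the listed steps under a few stated assumptions: scikit-image is used for the YCbCr conversion and the Canny detector, the edge map simply replaces the Y channel before converting back to RGB, and the Haar wavelet is used for the histogram decomposition (the paper does not name the mother wavelet).

```python
import numpy as np
import pywt
from skimage.color import rgb2ycbcr, ycbcr2rgb
from skimage.feature import canny

def color_features(rgb):
    """256-bin histograms of R, G, B after edge-guided YCbCr processing,
    compressed by a discrete wavelet transform to 64 + 32 + 32 = 128 bins."""
    ycbcr = rgb2ycbcr(rgb)                                    # step 1
    y = ycbcr[..., 0]
    edges = canny(y / 255.0).astype(float) * 255.0            # step 2: Canny on the Y component
    ycbcr_edges = np.dstack([edges, ycbcr[..., 1], ycbcr[..., 2]])   # step 3
    rgb_back = np.clip(ycbcr2rgb(ycbcr_edges), 0, 1)          # step 4
    channels = (rgb_back[..., 0], rgb_back[..., 1], rgb_back[..., 2])   # step 5
    hists = [np.histogram(c, bins=256, range=(0, 1))[0].astype(float) for c in channels]
    # Step 6: DWT approximation coefficients, level 2 for H_R and level 3 for H_G, H_B.
    hr = pywt.wavedec(hists[0], 'haar', level=2)[0]           # 64 bins
    hg = pywt.wavedec(hists[1], 'haar', level=3)[0]           # 32 bins
    hb = pywt.wavedec(hists[2], 'haar', level=3)[0]           # 32 bins
    return np.concatenate([hr, hg, hb])                       # 128-dimensional color feature

features = color_features(np.random.rand(256, 384, 3))        # stand-in for a real RGB image
print(features.shape)   # (128,)
```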

3.3. Fusion Vector

Applying the above procedures generates two feature vectors: the texture features obtained through the bandelet transform and the color features obtained from the YCbCr color space. The aggregation of these two vectors into a single vector constitutes the fused feature vector of an image.

3.4. Content Based Image Retrieval

Once the images present in the image repository are represented in the form of low level features, we can determine their semantic classes. To determine the actual semantic class, a sub-repository of images is generated containing representatives of "M" known classes, where every class contains R ≥ 2 images. In our implementation the value of "R" is set to 30, which means that 30 images from every semantic class of a ground truth image repository are used to build the training repository. On this sub-repository, neural networks with class-specific association parameters are trained. The one-against-all (OAA) association rule is used for network development, with the goal of decreasing the mean squared error between the actual association and the association obtained by the NN. The class-specific training set can be defined as Ωtr = Ωpos ∪ Ωneg, where Ωpos contains the R images of a particular class and Ωneg contains all other images in the training repository. Once the training is complete, the semantic class of every image present in the image repository is determined on the basis of the decision function and the association rules.
Due to the object composition of the images, many images may tend to show an association with other classes, e.g., mountain images sometimes associate with beach images. Therefore, a mechanism is required to reduce such associations. For this reason, the class finalization process also involves the top "K" neighbors in the semantic association process, using the majority voting rule (MVR) [50]:
$$C(x) = \operatorname{sgn}\left\{ \sum_{i} C_i(X) - \frac{K-1}{2} \right\} \tag{26}$$
where $C_i(X)$ is the class-wise association of the input image and its top neighbors:
$$C_i(X) = \bar{y}_{f_l} \tag{27}$$
where $l = \{1, 2, 3, \ldots, n\}$ indexes the neural network structures, and $\bar{y}_{f_l}$ returns the association factor of a particular neural network structure with a specific class. The MVR counts the largest number of classifiers that agree with each other [50]; therefore, the class association can be determined by:
$$C_F^{*}(x) = \operatorname{argmax}\left( \sum_{i} C_i^{*}(x) \right) \tag{28}$$
Once the semantic association of all images present in the repository is determined, we store the semantic association values in a file that serves as the semantic association database. Therefore, after determining the semantic class of the query image through the trained neural networks, we compute the Euclidean distance between the query image and all images of the same semantic class, taking into account the values of the semantic association database, and generate the output on the basis of the feature similarities.
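A reduced sketch of the retrieval step is shown below. It assumes a trained multiclass classifier (for example, the MLPClassifier sketched in Section 3.1.3) standing in for the one-against-all ensemble, a precomputed table of semantic classes for the repository images, and plain Euclidean ranking inside the query's semantic cluster; the file-based inverted index and the neighborhood voting of Equations (26)-(28) are omitted for brevity.

```python
import numpy as np

def retrieve(query_feature, classifier, repo_features, repo_classes, top_k=20):
    """Return the indices of the top_k repository images in the query's semantic class,
    ranked by Euclidean distance in feature space."""
    query_class = int(classifier.predict(query_feature.reshape(1, -1))[0])
    # "Inverted index": restrict the search to images already assigned to this class.
    candidates = np.flatnonzero(repo_classes == query_class)
    dists = np.linalg.norm(repo_features[candidates] - query_feature, axis=1)
    ranked = candidates[np.argsort(dists)]
    return ranked[:top_k]

# repo_features: (N, d) fused feature matrix; repo_classes: (N,) semantic labels
# stored after the offline classification pass described above.
```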

4. Performance Evaluation

To evaluate the retrieval capabilities of the proposed method, numerous experiments were performed on three different image datasets. For the implementation we used Matlab 2010 in the Windows 7 environment on a Dell Core i3 machine. The details of the experiments are presented in the following subsections: Section 4.1 describes the datasets used for image retrieval, Section 4.2 reports the retrieval precision and recall on randomly selected queries, and Sections 4.3-4.5 present comparison results of the proposed method with some state of the art works in CBIR.

4.1. Image Datasets

For our experiments we used three image datasets: Corel, Coil, and Caltech 101. The Corel dataset contains 10,908 images of size 384 × 256 or 256 × 384. For this dataset, we report results on ten semantic categories having 100 images each: Africa, Buses, Beach, Dinosaurs, Buildings, Elephants, Horses, Mountains, Flowers, and Food. We report results on these categories because they are the same semantic groups used by most researchers working in the domain of CBIR to report the effectiveness of their work [31,33,37,39,41], so a clear performance comparison is possible. To further elaborate the performance of the proposed system, experiments were also performed on the Columbia object image library (COIL) [33], which contains 7200 images from 100 different categories. Finally, we used the Caltech 101 image set, which consists of 101 image categories, each with a different number of images. For simplicity, we manually selected 30 categories which contain at least 100 images each.

4.2. Retrieval Precision/Recall Evaluation

For our experiments we wrote a computer simulation which randomly selects 300 images from the image repository and uses them as query images. As already described, the image datasets group images into semantic concepts, so on the basis of their labels we can automatically determine their semantic association. We ran this simulation on all three datasets mentioned previously and determined the performance by counting how many correct results are obtained for each query image. In the proposed work, we report the average result after running the experiments five times for each image category. For these experiments, an inverted index mechanism is used which, after determining the semantic class of the query image, returns the relevant images against it, i.e., the approach followed by Google for text document search. According to the proposed method, we apply the trained neural networks to every image present in the image repository and determine its semantic class. The class association information is stored in a file which serves as the semantic association database. The benefit of this approach is that, after the semantic information has been determined once, we only need to determine the semantic class of the query image, and the relevance information about the semantic cluster is predetermined. Overall, the class association accuracy is evaluated in terms of precision and recall using the following formulas:
$$\text{Precision} = \frac{N_A(q)}{N_R(q)} \tag{29}$$
$$\text{Recall} = \frac{N_A(q)}{N_t} \tag{30}$$
where NA(q) is the number of retrieved images relevant to the query image, NR(q) is the number of images retrieved against the query image, and Nt is the total number of relevant images available in the database. Precision (or specificity) measures the ability of the system to retrieve, among all retrieved images, only those that are relevant to the query, while recall (also known as sensitivity or true positive rate) measures the ability of the classifier to associate models with their actual class. For the presentation of results, the top 20 retrieved images against each query are used to compute precision and recall, and we report the average over five runs as mentioned previously. To illustrate the performance of the proposed system, we randomly selected three images from each of the 10 previously mentioned categories of the Corel image set and display their results in Figure 8, which shows the precision obtained by our method against these query images for the top 10 to top 40 retrievals. The quantitative analysis suggests that the quality of the system is good in terms of precision, as reliable results appear for these random selections; the most reliable results appear in the range of 10 to 30 retrieved images, as there are 100 images in a single category. It is important to note that these results are achieved without any external supervision by the user, unlike most relevance feedback based CBIR techniques.
In Figures 9 and 10, the same experiment is performed on the Caltech 101 image set by randomly selecting four images, with precision and recall reported for the top 10 to top 60 retrievals. On the basis of the retrieval accuracy we can say that the proposed method is quite efficient. Another important point is that the results reported here represent the retrieval against a few random queries, while the overall accuracy is reported as the average over 100 query images and five repetitions of the experiments.
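For completeness, the two measures of Equations (29) and (30) can be computed as in the short sketch below, where retrieved and relevant are index sets for a single query; the top-20 evaluation in the text corresponds to passing the first 20 retrieved indices.

```python
def precision_recall(retrieved, relevant):
    """Precision = |relevant and retrieved| / |retrieved|;
    Recall = |relevant and retrieved| / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: top-20 retrieval for a query from a category with 100 relevant images.
p, r = precision_recall(retrieved=range(20), relevant=range(15, 115))
print(p, r)   # 0.25 precision (5/20), 0.05 recall (5/100)
```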

4.3. Comparison on Corel Image Set

To determine the usability of the proposed method, it is compared with some state of the art methods in CBIR, namely [31,33,37,39,41]. These techniques were chosen because they report their results on the same ten semantic categories of the Corel dataset described earlier, so a clear performance comparison is possible. Table 2 gives the class wise comparison of the proposed system with the other systems in terms of precision; the results show that the proposed system performs better than all other systems in terms of average precision. Table 3 reports the comparison in terms of recall rates for the same systems, from which it can easily be observed that the proposed system has the highest recall rates. Figure 11 shows the performance of the proposed method in terms of precision compared with the other state of the art systems, and Figure 12 shows the corresponding comparison in terms of recall rates, confirming the values given in Table 3.

4.4. Comparison on Coil Image Set

From the precision and recall results reported for the Corel dataset, we can observe that integrating curvelet transform with enhanced dominant colors extraction and texture (ICTEDCT) [33] has the second highest precision and recall rates. Therefore, we report the performance comparison on the Coil dataset at different retrieval rates against ICTEDCT [33]. For this experiment, five images are selected from each image category and the performance of both systems is compared for each category. From the results shown in Figure 13, it can be clearly observed that the proposed method gives higher recall and precision rates than ICTEDCT [33]. Hence, from the results of the proposed method on the Coil and Corel datasets, we can say that the proposed method is more precise and effective than the other CBIR systems.

4.5. Comparison with State-of-the-Art Methods

The retrieval results are also compared with state-of-the-art image retrieval methods, including efficient content-based image retrieval using a multiple support vector machines ensemble (EMSVM) [51], SIMPLIcity [22], CLUE [23], patch based histogram of oriented gradients-local binary pattern (patch based HOG-LBP) [52], and edge orientation difference histogram and color-SIFT (EODH and Color-SIFT) [53]. These techniques were chosen for comparison because they report their results on the same ten semantic categories of the Corel dataset described earlier, so a clear performance comparison is possible. Table 4 presents the comparison of the proposed system with the other systems in terms of average precision; the results show that the proposed system performs better than all other systems, and the same results are graphically illustrated in Figure 14. Table 5 reports the comparison in terms of recall rates, from which it can easily be observed that the proposed system has better recall rates; Figure 15 illustrates the same comparison graphically.

5. Conclusions

Content based image retrieval has gained a lot of research attention owing to its many application benefits. This paper has introduced a mechanism for automatic image retrieval. The main consideration of the paper is that the most relevant image results can be obtained if we generate image representations that emphasize the core image objects instead of considering every image patch. Therefore, we applied the bandelet transform for feature extraction, which considers the core objects found in an image. To further enhance the image representation capabilities, color features are also incorporated. Semantic association is performed through artificial neural networks, and an inverted index mechanism is used to return the images against queries to ensure fast retrieval. The results of the proposed method are reported on three image datasets: Corel, Coil, and Caltech-101. The comparison with other standard CBIR systems reveals that the proposed system outperforms them in terms of average precision and recall values.

Acknowledgments

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science, ICT & Future Planning (MSIP) (2013R1A1A2008180).

Author Contributions

Rehan Ashraf and Khalid Bashir designed the research while Rehan Ashraf performed the experiments, analyzed the data and plotted the figures. Aun Irtaza helped in writing the paper. During the development of the paper, we benefited from the suggestions and critical insights provided by Khalid Bashir, Aun Irtaza and Muhammad Tariq Mahmood. Correspondence and requests for materials should be addressed to Rehan Ashraf. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gudivada, V.N.; Raghavan, V.V. Content based image retrieval systems. Computer 1995, 28, 18–22. [Google Scholar]
  2. Smeulders, A.W.; Worring, M.; Santini, S.; Gupta, A.; Jain, R. Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1349–1380. [Google Scholar]
  3. Datta, R.; Li, J.; Wang, J.Z. Content-based image retrieval: approaches and trends of the new age, Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, New York, NY, USA, 1–2 August 2005; pp. 253–262.
  4. Lei, Z.; Fuzong, L.; Bo, Z. A CBIR method based on color-spatial feature, Proceedings of the IEEE Region 10 Conference, Cheju Island, South Korea, 15–17 September 1999; pp. 166–169.
  5. Smith, J.R.; Chang, S.F. Tools and Techniques for Color Image Retrieval. Proc. SPIE. 1996, 2670, 2–7. [Google Scholar]
  6. Plataniotis, K.N.; Venetsanopoulos, A.N. Color Image Processing and Applications; Springer: New York, NY, USA, 2000. [Google Scholar]
  7. Chitaliya, N.; Trivedi, A. Comparative analysis using fast discrete Curvelet transform via wrapping and discrete Contourlet transform for feature extraction and recognition, Proceedings of 2013 International Conference on Intelligent Systems and Signal Processing (ISSP), Gujarat, India, 1–2 March 2013; pp. 154–159.
  8. Barley, A.; Town, C. Combinations of Feature Descriptors for Texture Image Classification. J. Data Anal. Inf. Process 2014, 2. [Google Scholar] [CrossRef]
  9. Sumana, I.J.; Islam, M.M.; Zhang, D.; Lu, G. Content based image retrieval using curvelet transform, Proceedings of 2008 IEEE 10th Workshop on Multimedia Signal Processing, Cairns, Australia, 8–10 October 2008; pp. 11–16.
  10. Zhang, J.; Tan, T. Brief review of invariant texture analysis methods. Pattern Recognit. 2002, 35, 735–747. [Google Scholar]
  11. Yang, M.; Kpalma, K.; Ronsin, J. A survey of shape feature extraction techniques. In Pattern Recognition Techniques, Technology and Applications; Yin, P.-Y., Ed.; InTech: Rijeka, Croatia, 2008; pp. 43–90. [Google Scholar]
  12. Zhang, D.; Lu, G. Shape-based image retrieval using generic Fourier descriptor. Signal Process. Image Commun. 2002, 17, 825–848. [Google Scholar]
  13. Vimina, E.; Jacob, K.P. Content Based Image Retrieval Using Low Level Features of Automatically Extracted Regions of Interest. J. Image Graph. 2013, 1, 7–11. [Google Scholar]
  14. Velmurugan, K.; Baboo, L.D.S.S. Content-based image retrieval using SURF and colour moments. Glob. J. Comput. Sci. Technol. 2011, 11, 1–4. [Google Scholar]
  15. Liu, Y.; Zhang, D.; Lu, G.; Ma, W.Y. A survey of content-based image retrieval with high-level semantics. Pattern Recognit. 2007, 40, 262–282. [Google Scholar]
  16. Liu, G.H.; Yang, J.Y. Content-based image retrieval using color difference histogram. Pattern Recognit. 2013, 46, 188–198. [Google Scholar]
  17. Hejazi, M.R.; Ho, Y.S. An efficient approach to texture-based image retrieval. Int. J. Imaging Syst. Technol. 2007, 17, 295–302. [Google Scholar]
  18. Kekre, H.; Thepade, S.D.; Sarode, T.K.; Suryawanshi, V. Image Retrieval using Texture Features extracted from GLCM, LBG and KPE. Int. J. Comput. Theory Eng. 2010, 2, 1793–8201. [Google Scholar]
  19. Zhang, D.; Lu, G. A comparative study on shape retrieval using Fourier descriptors with different shape signatures, Proceedings of International Conference on Intelligent Multimedia and Distance Education (ICIMADE01), Fargo, ND, USA, 1–3 June 2001; pp. 1–9.
  20. Prasad, B.; Biswas, K.K.; Gupta, S. Region-based image retrieval using integrated color, shape, and location index. Comput. Vis. Image Underst. 2004, 94, 193–233. [Google Scholar]
  21. Yuan, X.; Yu, J.; Qin, Z.; Wan, T. A SIFT-LBP image retrieval model based on bag of features, Proceedings of 2011 18th IEEE International Conference on Image Processing, Brussels, Belgium, 11–14 September 2011.
  22. Wang, J.Z.; Li, J.; Wiederhold, G. SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 947–963. [Google Scholar]
  23. Chen, Y.; Wang, J.Z.; Krovetz, R. CLUE: cluster-based retrieval of images by unsupervised learning. IEEE Trans. Image Process 2005, 14, 1187–1201. [Google Scholar]
  24. Irtaza, A.; Jaffar, M.A. Categorical image retrieval through genetically optimized support vector machines (GOSVM) and hybrid texture features. Signal Image Video Process 2014. [Google Scholar] [CrossRef]
  25. Sk. Sajidaparveen, G.; Chandramohan, B. Medical image retrieval using bandelet. Int. J. Sci. Eng. Technol. 2014, 02, 1103–1115. [Google Scholar]
  26. Peyré, G.; Mallat, S. Surface compression with geometric bandelets. ACM Trans. Graph. (TOG) 2005, 24, 601–608. [Google Scholar]
  27. Le Pennec, E.; Mallat, S. Sparse geometric image representations with bandelets. IEEE Trans. Image Process 2005, 14, 423–438. [Google Scholar]
  28. Alomar, F.A.; Muhammad, G.; Aboalsamh, H.; Hussain, M.; Mirza, A.M.; Bebis, G. Gender recognition from faces using bandlet and local binary patterns, Proceedings of 2013 20th International Conference on Systems, Signals and Image Processing (IWSSIP), Bucharest, Romania, 7–9 July 2013; pp. 59–62.
  29. Flickner, M.; Sawhney, H.; Niblack, W.; Ashley, J.; Huang, Q.; Dom, B.; Gorkani, M.; Hafner, J.; Lee, D.; Petkovic, D.; et al. Query by image and video content: The QBIC system. Computer 1995, 28, 23–32. [Google Scholar]
  30. Besl, P.J.; McKay, N.D. Method for registration of 3-D shapes. Proc. SPIE. 1992. [Google Scholar] [CrossRef]
  31. Rao, M.B.; Rao, B.P.; Govardhan, A. CTDCIRS: Content based image retrieval system based on dominant color and texture features. Int. J. Comput. Appl. 2011, 18, 40–46. [Google Scholar]
  32. Yue, J.; Li, Z.; Liu, L.; Fu, Z. Content-based image retrieval using color and texture fused features. Math. Comput. Model 2011, 54, 1121–1127. [Google Scholar]
  33. Youssef, S.M. ICTEDCT-CBIR: Integrating curvelet transform with enhanced dominant colors extraction and texture analysis for efficient content-based image retrieval. Comput. Electr. Eng. 2012, 38, 1358–1376. [Google Scholar]
  34. Singha, M.; Hemachandran, K. Content based image retrieval using color and texture. Signal Image Process. Int. J. 2012, 3, 39–57. [Google Scholar]
  35. Liu, G.H.; Li, Z.Y.; Zhang, L.; Xu, Y. Image retrieval based on micro-structure descriptor. Pattern Recognit. 2011, 44, 2123–2133. [Google Scholar]
  36. Wang, X.Y.; Yang, H.Y.; Li, D.M. A new content-based image retrieval technique using color and texture information. Comput. Electr. Eng. 2013, 39, 746–761. [Google Scholar]
  37. Lin, C.H.; Chen, R.T.; Chan, Y.K. A smart content-based image retrieval system based on color and texture feature. Image Vis. Comput. 2009, 27, 658–665. [Google Scholar]
  38. Ashraf, R.; Mahmood, T.; Irtaza, A.; Bajwa, K.B. A novel approach for the gender classification through trained neural networks. J. Basic Appl. Sci. Res. 2014, 4, 136–144. [Google Scholar]
  39. Jhanwar, N.; Chaudhuri, S.; Seetharaman, G.; Zavidovique, B. Content based image retrieval using motif cooccurrence matrix. Image Vis. Comput. 2004, 22, 1211–1220. [Google Scholar]
  40. Deng, Y.; Manjunath, B. Unsupervised segmentation of color-texture regions in images and video. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 800–810. [Google Scholar]
  41. ElAlami, M.E. A novel image retrieval model based on the most relevant features. Knowl.-Based Syst. 2011, 24, 23–32. [Google Scholar]
  42. Mallat, S.; Peyré, G. A review of bandlet methods for geometrical image representation. Numer. Algorithms 2007, 44, 205–234. [Google Scholar]
  43. Qu, X.; Yan, J.; Xie, G.; Zhu, Z.; Chen, B. A novel image fusion algorithm based on bandelet transform. Chin. Opt. Lett. 2007, 5, 569–572. [Google Scholar]
  44. Le Pennec, E.; Mallat, S. Bandelet image approximation and compression. Multiscale Model. Simul. 2005, 4, 992–1039. [Google Scholar]
  45. Peyré, G.; Mallat, S. Orthogonal bandelet bases for geometric images approximation. Commun. Pure Appl. Math. 2008, 61, 1173–1212. [Google Scholar]
  46. Weber, M.; Crilly, P.; Blass, W.E. Adaptive noise filtering using an error-backpropagation neural network. IEEE Trans. Instrum. Meas. 1991, 40, 820–825. [Google Scholar]
  47. Andrysiak, T.; Choraś, M. Image retrieval based on hierarchical Gabor filters. Int. J. Appl. Math. Comput. Sci. 2005, 15, 471–480. [Google Scholar]
  48. Lam, M.; Disney, T.; Pham, M.; Raicu, D.; Furst, J.; Susomboon, R. Content-based image retrieval for pulmonary computed tomography nodule images. Proc. SPIE. 2007. [Google Scholar] [CrossRef]
  49. Acharya, T.; Ray, A.K. Image Processing: Principles and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2005. [Google Scholar]
  50. Tao, D.; Tang, X.; Li, X.; Wu, X. Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1088–1099. [Google Scholar]
  51. Yildizer, E.; Balci, A.M.; Hassan, M.; Alhajj, R. Efficient content-based image retrieval using multiple support vector machines ensemble. Expert Syst. Appl. 2012, 39, 2385–2396. [Google Scholar]
  52. Yu, J.; Qin, Z.; Wan, T.; Zhang, X. Feature integration analysis of bag-of-features model for image retrieval. Neurocomputing 2013, 120, 355–364. [Google Scholar]
  53. Tian, X.; Jiao, L.; Liu, X.; Zhang, X. Feature integration of EODH and Color-SIFT: Application to image retrieval based on codebook. Signal Process. Image Commun. 2014, 29, 530–545. [Google Scholar]
Figure 1. Proposed Method.
Figure 2. Bandelet Transform [35,37]. (a) Dyadic segmentation based on local directionality of the image; (b) A sample bandelet segmentation square that contains a strong regularity function shown by the red dash; (c) Geometric flow and sampling position; (d) Sampling position adapted to the warped geometric flow; (e) Illustration of a warping example.
Figure 3. Geometric flow representation using different block sizes (a) small size 4 × 4; (b) medium size 8 × 8.
Figure 4. Object categorization on the basis of the geometric flow obtained through bandletization.
Figure 5. Types of texture.
Figure 6. The structure of the neural network.
Figure 7. (a) RGB original image; (b) Y matrix luminance image; (c) Canny luma image; (d) Canny RGB image.
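As a rough illustration of the processing shown in Figure 7, the luminance (Y) image can be derived from the RGB channels with the standard BT.601 weighting before Canny edge detection is applied. The weights, thresholds, and per-channel edge combination in the sketch below are assumptions for illustration and may differ from the exact settings used by the authors.

import cv2
import numpy as np

# Sketch of the processing illustrated in Figure 7: luminance (Y) extraction
# followed by Canny edge detection. The BT.601 weights, the thresholds and the
# per-channel combination are assumptions; 'query.jpg' is a hypothetical path.
bgr = cv2.imread('query.jpg')
blue, green, red = cv2.split(bgr.astype(np.float32))
y = (0.299 * red + 0.587 * green + 0.114 * blue).astype(np.uint8)  # Y (luma) image

edges_luma = cv2.Canny(y, 100, 200)                 # Canny edges on the luma image

# One way to obtain a "Canny RGB" map: edges per channel, combined by OR.
b8, g8, r8 = cv2.split(bgr)
edges_rgb = cv2.bitwise_or(cv2.bitwise_or(cv2.Canny(b8, 100, 200),
                                          cv2.Canny(g8, 100, 200)),
                           cv2.Canny(r8, 100, 200))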
Figure 8. Query performance on the Corel image dataset for the top 10 to top 40 retrievals.
Figure 9. Query performance on the Caltech image dataset for the top 10 to top 60 retrievals in terms of precision.
Figure 10. Query performance on the Caltech image dataset for the top 10 to top 60 retrievals in terms of recall.
Figure 11. Comparison of the mean precision obtained by the proposed method with other standard retrieval systems.
Figure 12. Comparison of the mean recall obtained by the proposed method with other standard retrieval systems.
Figure 13. Comparison of the precision and recall obtained by the proposed method with ICTEDCT-CBIR (integrating curvelet transform with enhanced dominant colors extraction and texture analysis).
Figure 14. Comparison of the mean precision obtained by the proposed method with state-of-the-art retrieval systems.
Figure 15. Comparison of the mean recall obtained by the proposed method with state-of-the-art retrieval systems.
Table 1. Summary of the neural network structure used for every image category in this work.
INPUT LAYER

Input: a = (a_1, …, a_N), dim(a) = N

MIDDLE (HIDDEN) LAYER

Input: b = Ua, dim(b) = M
Output: c = f(b − s), dim(c) = M
U: M × N weight matrix
f: hidden layer activation function
s: hidden layer thresholds

OUTPUT LAYER

Input: d = Wc, dim(d) = 1
Output: e = g(d − t), dim(e) = 1
W: 1 × M weight matrix
g: output layer activation function
t: output layer threshold

ERROR CORRECTION

MSE: E = (1/2)(p − e)^2, where p is the target output
ΔW_ij = α ∂E/∂W_ij = α δ_i c_j
Δt_i = α δ_i
ΔU_ji = β ∂E/∂U_ji
α, β: learning rates; δ_i: back-propagated error term
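To make the structure of Table 1 concrete, the following is a minimal NumPy sketch of one forward pass and one error-correction step for a single per-category network. The sigmoid activations, layer sizes, and learning rates are illustrative assumptions rather than the exact configuration used in this work, and the update signs follow the usual gradient-descent convention.

import numpy as np

# Minimal sketch of the per-category network summarized in Table 1.
# Sigmoid activations, layer sizes and learning rates are illustrative
# assumptions; the update signs follow standard gradient descent.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

N, M = 64, 16                         # input features and hidden units (assumed)
rng = np.random.default_rng(0)
U = rng.normal(0.0, 0.1, (M, N))      # hidden weight matrix, M x N
s = np.zeros(M)                       # hidden layer thresholds
W = rng.normal(0.0, 0.1, (1, M))      # output weight matrix, 1 x M
t = np.zeros(1)                       # output layer threshold
alpha, beta = 0.1, 0.1                # learning rates

def train_step(a, p):
    """One forward/backward pass for feature vector a and target p in [0, 1]."""
    b = U @ a                         # hidden layer input
    c = sigmoid(b - s)                # hidden layer output
    d = W @ c                         # output layer input
    e = sigmoid(d - t)                # network output
    E = 0.5 * (p - e) ** 2            # mean squared error of Table 1

    delta = (p - e) * e * (1.0 - e)               # output error term
    W_new = W + alpha * np.outer(delta, c)        # weight update, alpha * delta_i * c_j
    t_new = t - alpha * delta                     # output threshold update
    delta_h = (W.T @ delta) * c * (1.0 - c)       # back-propagated hidden error
    U_new = U + beta * np.outer(delta_h, a)       # hidden weight update
    s_new = s - beta * delta_h                    # hidden threshold update
    return W_new, t_new, U_new, s_new, float(E[0])

# Example: one update toward target 1.0 for a hypothetical feature vector.
a_demo = rng.random(N)
W, t, U, s, err = train_step(a_demo, 1.0)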
Table 2. Comparison of the mean precision obtained by the proposed method with other standard retrieval systems on the top 20 retrievals.
Class        Proposed Method   [39]    [31]    [41]    [33]    [37]
Africa       0.65              0.45    0.56    0.70    0.64    0.68
Beach        0.70              0.39    0.53    0.56    0.64    0.54
Buildings    0.75              0.37    0.61    0.57    0.70    0.54
Buses        0.95              0.74    0.89    0.87    0.92    0.88
Dinosaurs    1.00              0.91    0.98    0.97    0.99    0.99
Elephants    0.80              0.30    0.57    0.67    0.78    0.65
Flowers      0.95              0.85    0.89    0.91    0.95    0.89
Horses       0.90              0.56    0.78    0.83    0.95    0.80
Mountains    0.75              0.29    0.51    0.53    0.74    0.52
Food         0.75              0.36    0.69    0.74    0.81    0.73

Mean         0.820             0.522   0.701   0.735   0.812   0.722
Table 3. Comparison of the mean recall obtained by the proposed method with other standard retrieval systems on the top 20 retrievals.
Class        Proposed Method   [39]    [31]    [41]    [33]    [37]
Africa       0.13              0.11    0.15    0.15    0.13    0.14
Beach        0.14              0.12    0.19    0.19    0.13    0.19
Buildings    0.15              0.12    0.18    0.18    0.14    0.17
Buses        0.19              0.09    0.11    0.11    0.18    0.12
Dinosaurs    0.20              0.07    0.09    0.09    0.20    0.10
Elephants    0.16              0.13    0.15    0.15    0.16    0.14
Flowers      0.19              0.08    0.11    0.11    0.19    0.11
Horses       0.18              0.10    0.13    0.13    0.19    0.13
Mountains    0.15              0.13    0.22    0.22    0.15    0.21
Food         0.15              0.12    0.13    0.13    0.16    0.13

Mean         0.164             0.107   0.146   0.146   0.163   0.144
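The entries of Tables 2 and 3 follow the standard top-k retrieval measures: precision is the fraction of the top 20 retrieved images that belong to the query category, while recall is the fraction of the relevant images of that category (100 per category in the Corel dataset) returned among the top 20. The sketch below illustrates this computation; the function name and example labels are hypothetical.

def precision_recall_at_k(retrieved_labels, query_label, class_size, k=20):
    """Precision and recall over the top-k images of a ranked retrieval list.

    retrieved_labels: category labels of the ranked results
    query_label:      category label of the query image
    class_size:       number of relevant images in the database
                      (100 per category in the Corel set)
    """
    top_k = retrieved_labels[:k]
    relevant = sum(1 for label in top_k if label == query_label)
    return relevant / k, relevant / class_size

# Example: 13 of the top 20 results share the query's category, out of 100
# relevant images, giving precision 0.65 and recall 0.13 (cf. the Africa row).
print(precision_recall_at_k(['africa'] * 13 + ['beach'] * 7, 'africa', 100))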
Table 4. Comparison of the mean precision obtained by the proposed method with state-of-the-art methods on the top 20 retrievals.
Class        Proposed Method   EMSVM [51]   Simplicity [22]   CLUE [23]   HOG-LBP [52]   SIFT [53]
Africa       0.65              0.5          0.4               0.5         0.55           0.75
Beach        0.70              0.7          0.3               0.35        0.47           0.38
Buildings    0.75              0.2          0.4               0.45        0.56           0.54
Buses        0.95              0.8          0.6               0.65        0.91           0.97
Dinosaurs    1.00              0.9          0.96              0.95        0.94           0.99
Elephants    0.80              0.6          0.3               0.3         0.49           0.66
Flowers      0.95              1.00         0.6               0.75        0.85           0.92
Horses       0.90              0.8          0.6               0.7         0.52           0.87
Mountains    0.75              0.5          0.25              0.3         0.37           0.59
Food         0.75              0.6          0.45              0.6         0.55           0.62

Mean         0.820             0.661        0.486             0.555       0.621          0.729
Table 5. Comparison of the mean recall obtained by the proposed method with state-of-the-art methods on the top 20 retrievals.
Class        Proposed Method   EMSVM [51]   Simplicity [22]   CLUE [23]   HOG-LBP [52]   SIFT [53]
Africa       0.13              0.1          0.08              0.1         0.11           0.15
Beach        0.14              0.14         0.06              0.07        0.09           0.08
Buildings    0.15              0.04         0.07              0.09        0.11           0.11
Buses        0.19              0.14         0.12              0.13        0.18           0.19
Dinosaurs    0.20              0.18         0.19              0.19        0.1            0.13
Elephants    0.16              0.12         0.06              0.06        0.1            0.13
Flowers      0.19              0.2          0.12              0.15        0.17           0.18
Horses       0.18              0.16         0.12              0.14        0.1            0.17
Mountains    0.15              0.1          0.05              0.06        0.08           0.12
Food         0.15              0.12         0.09              0.12        0.11           0.13

Mean         0.164             0.130        0.096             0.111       0.124          0.146
