# Color Texture Image Retrieval Based on Local Extrema Features and Riemannian Distance

## Abstract

## 1. Introduction

## 2. Local Texture Representation and Description Using Local Extrema Features

#### 2.1. Approach

#### 2.2. Generation of Local Extrema-Based Descriptor (LED)

- fix the number $N$ of nearest local maxima and nearest local minima considered for each keypoint;
- or fix a window size $W\times W$ around each keypoint, so that all local maxima and minima inside that window are considered.
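The two selection strategies above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `neighbors_in_window` and `n_nearest`, the coordinate tuples, and the choice of Euclidean distance for "nearest" are all assumptions made for the example.

```python
import math

def neighbors_in_window(p, extrema, W):
    """Extrema lying inside the W x W window centred on keypoint p (W assumed odd)."""
    half = W // 2
    px, py = p
    return [q for q in extrema
            if abs(q[0] - px) <= half and abs(q[1] - py) <= half]

def n_nearest(p, extrema, N):
    """The N extrema closest to p (Euclidean distance is an assumption here)."""
    px, py = p
    return sorted(extrema, key=lambda q: math.hypot(q[0] - px, q[1] - py))[:N]

# Hypothetical local maxima around a keypoint p.
p = (10, 10)
maxima = [(12, 9), (30, 30), (11, 14), (8, 3), (10, 12)]

window_set = neighbors_in_window(p, maxima, 9)   # variable-size set
nearest_set = n_nearest(p, maxima, 2)            # fixed-size set
```

With a fixed window the number of selected extrema varies with the local texture, whereas fixing $N$ keeps the set size constant but lets its spatial extent vary.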

- Mean and variance of three color channels:$${\mu}_{c}^{\mathrm{max}}\left(p\right)=\frac{1}{|{\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)|}\sum _{q\in {\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)}{I}_{c}\left(q\right),$$$${\sigma}_{c}^{2\mathrm{max}}\left(p\right)=\frac{1}{|{\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)|}\sum _{q\in {\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)}{({I}_{c}\left(q\right)-{\mu}_{c}^{\mathrm{max}}\left(p\right))}^{2},$$
- Mean and variance of spatial distances from each local maximum to point p:$${\mu}_{d}^{\mathrm{max}}\left(p\right)=\frac{1}{|{\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)|}\sum _{q\in {\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)}d(p,q),$$$${\sigma}_{d}^{2\mathrm{max}}\left(p\right)=\frac{1}{|{\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)|}\sum _{q\in {\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)}{(d(p,q)-{\mu}_{d}^{\mathrm{max}}\left(p\right))}^{2},$$
- Circular variance [38] of angles of geometric vectors formed by each local maximum and point p:$${\sigma}_{\mathrm{cir},\alpha}^{2\mathrm{max}}\left(p\right)=1-\sqrt{{\overline{c}}_{\alpha}{\left(p\right)}^{2}+{\overline{s}}_{\alpha}{\left(p\right)}^{2}},$$$${\overline{c}}_{\alpha}\left(p\right)=\frac{1}{|{\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)|}\sum _{q\in {\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)}\mathrm{cos}\alpha (p,q),$$$${\overline{s}}_{\alpha}\left(p\right)=\frac{1}{|{\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)|}\sum _{q\in {\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)}\mathrm{sin}\alpha (p,q),$$$$\alpha (p,q)=\mathrm{arctan}\left(\frac{{y}_{q}-{y}_{p}}{{x}_{q}-{x}_{p}}\right),$$
- Mean and variance of gradient magnitudes:$${\mu}_{g}^{\mathrm{max}}\left(p\right)=\frac{1}{|{\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)|}\sum _{q\in {\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)}\nabla I\left(q\right),$$$${\sigma}_{g}^{2\mathrm{max}}\left(p\right)=\frac{1}{|{\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)|}\sum _{q\in {\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)}{(\nabla I\left(q\right)-{\mu}_{g}^{\mathrm{max}}\left(p\right))}^{2},$$
- Circular variance [38] of gradient orientations:$${\sigma}_{\mathrm{cir},\theta}^{2\mathrm{max}}\left(p\right)=1-\sqrt{{\overline{c}}_{\theta}{\left(p\right)}^{2}+{\overline{s}}_{\theta}{\left(p\right)}^{2}},$$$${\overline{c}}_{\theta}\left(p\right)=\frac{1}{|{\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)|}\sum _{q\in {\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)}\mathrm{cos}\theta \left(q\right),$$$${\overline{s}}_{\theta}\left(p\right)=\frac{1}{|{\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)|}\sum _{q\in {\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)}\mathrm{sin}\theta \left(q\right).$$
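A minimal sketch of how the statistics listed above could be computed for one extrema set (maxima or minima) around a keypoint is given below. The function names, the dictionary keys (`xy`, `color`, `grad`, `theta`) and the ordering of the output values are hypothetical. The angle $\alpha(p,q)$ is computed with `math.atan2`, which is an assumption: it keeps the full quadrant information and is consistent with the remark in Figure 2 that $\alpha(p,q)\ne\alpha(q,p)$.

```python
import math

def mean_var(values):
    """Mean and variance, dividing by the set size as in the formulas above."""
    n = len(values)
    mu = sum(values) / n
    return mu, sum((v - mu) ** 2 for v in values) / n

def circular_variance(angles):
    """One minus the length of the mean resultant vector of the angles (radians)."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    return 1.0 - math.hypot(c, s)

def extrema_set_features(p, neighbors):
    """Twelve statistics of one extrema set around keypoint p.

    Each neighbor is a dict with hypothetical keys: 'xy' (x, y),
    'color' (three channel values), 'grad' (gradient magnitude),
    'theta' (gradient orientation in radians).
    """
    px, py = p
    feats = []
    for c in range(3):  # mean and variance of each color channel
        feats.extend(mean_var([q["color"][c] for q in neighbors]))
    # mean and variance of the spatial distances d(p, q)
    feats.extend(mean_var([math.hypot(q["xy"][0] - px, q["xy"][1] - py)
                           for q in neighbors]))
    # circular variance of the angles alpha(p, q)
    feats.append(circular_variance([math.atan2(q["xy"][1] - py, q["xy"][0] - px)
                                    for q in neighbors]))
    # mean and variance of the gradient magnitudes
    feats.extend(mean_var([q["grad"] for q in neighbors]))
    # circular variance of the gradient orientations
    feats.append(circular_variance([q["theta"] for q in neighbors]))
    return feats

# Two hypothetical maxima placed symmetrically around p.
p = (0.0, 0.0)
neighbors = [
    {"xy": (3.0, 4.0), "color": (10, 20, 30), "grad": 2.0, "theta": 0.0},
    {"xy": (-3.0, -4.0), "color": (30, 20, 10), "grad": 4.0, "theta": math.pi},
]
features = extrema_set_features(p, neighbors)
```

In this toy example the two neighbors point in opposite directions, so both circular variances are close to 1, while the distance variance is 0 because both lie at distance 5 from p.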

## 3. Proposed Framework for Texture and Color Image Retrieval

#### 3.1. LED Feature Extraction

#### 3.2. Dissimilarity Measure for Retrieval

#### 3.3. Proposed Retrieval Framework

- Load the query color image ${I}_{\mathrm{color}}$.
- Convert the image to grayscale I.
- Compute gradient images from I:
  - ($\nabla I,\theta $) for the 27D version,
  - ($\nabla I,\nabla {I}_{\sigma},\theta ,{\theta}_{\sigma}$) for the enhanced 33D version.

- Extract the keypoint set and the two local extrema sets from I:
  - keypoint set: $S\left(I\right)={S}_{{\omega}_{1}}^{\mathrm{max}}\left(I\right)$,
  - extrema sets: ${S}_{{\omega}_{2}}^{\mathrm{max}}\left(I\right)$ and ${S}_{{\omega}_{2}}^{\mathrm{min}}\left(I\right)$.

- Generate LED descriptors for all keypoints.
- Estimate the feature covariance matrix of these LED descriptors.
- Compute the Riemannian distance (15) between the query and the other images from the database.
- Sort these distance measures and produce the best matches as the final retrieval result for the query.
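The last three steps can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names are hypothetical, `np.cov` uses its default unbiased normalization (the source does not specify one), and the Riemannian distance follows the form listed in Table 10, $\sqrt{\sum_{\ell}\log^{2}\lambda_{\ell}}$ with $\lambda_{\ell}$ the generalized eigenvalues of $({C}_{1},{C}_{2})$. Both covariance matrices are assumed to be symmetric positive definite, which requires more keypoints than descriptor dimensions.

```python
import numpy as np

def covariance_signature(descriptors):
    """Covariance matrix (d x d) of an (n_keypoints x d) array of descriptors."""
    return np.cov(np.asarray(descriptors, dtype=float), rowvar=False)

def riemannian_distance(C1, C2):
    """sqrt(sum_l log^2(lambda_l)), where lambda_l * C1 * x = C2 * x."""
    L = np.linalg.cholesky(C1)        # C1 = L L^T
    Linv = np.linalg.inv(L)
    M = Linv @ C2 @ Linv.T            # symmetric, same eigenvalues as C1^{-1} C2
    lam = np.linalg.eigvalsh(M)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def rank_database(query_cov, database_covs):
    """Database indices sorted from the smallest to the largest distance."""
    dists = [riemannian_distance(query_cov, C) for C in database_covs]
    return sorted(range(len(dists)), key=dists.__getitem__)

# Toy 2-D example with hand-made covariance matrices.
C_query = np.eye(2)
C_a = np.diag([np.e, np.e ** 2])      # log-eigenvalues 1 and 2
C_b = 2.0 * np.eye(2)
d_qa = riemannian_distance(C_query, C_a)
ranking = rank_database(C_query, [C_a, C_query, C_b])
```

Whitening by the Cholesky factor of ${C}_{1}$ turns the generalized eigenproblem into a symmetric one, so `eigvalsh` returns real eigenvalues. Inverting the roles of the two matrices only flips the signs of the logarithms, so the distance is symmetric.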

## 4. Experimental Study

#### 4.1. Image Databases

#### 4.2. Evaluation Criteria

#### 4.3. Results and Discussion

#### 4.3.1. Performance in Retrieval Accuracy

#### 4.3.2. Computation Time

#### 4.3.3. Sensitivity Analysis

#### 4.3.4. Sensitivity to Distance Measure

## 5. Conclusions

## Author Contributions

## Conflicts of Interest

## References

**Figure 1.** Spatial distribution and arrangement of local maximum pixels (in red) and local minimum pixels (in green) within four different textures ($100\times 100$ pixels) extracted from the Vistex database [37]. The local extrema are detected using a $5\times 5$ search window. The figure is best viewed in color. (**a**) Buildings.0009; (**b**) Leaves.0016; (**c**) Water.0005; (**d**) Fabric.0017.
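The window-based detection mentioned in the Figure 1 caption can be sketched as below. The exact detection rule is not spelled out here, so this example assumes that a pixel is a local maximum (resp. minimum) when it is strictly greater (resp. smaller) than every other pixel in the $\omega\times\omega$ window centred on it, with windows clipped at the image border. The function name `local_extrema` and the (row, column) indexing are also assumptions.

```python
def local_extrema(image, omega):
    """Return (maxima, minima) as lists of (row, col) for a 2-D list `image`."""
    half = omega // 2
    rows, cols = len(image), len(image[0])
    maxima, minima = [], []
    for r in range(rows):
        for c in range(cols):
            # All other pixels in the (border-clipped) window around (r, c).
            window = [image[i][j]
                      for i in range(max(0, r - half), min(rows, r + half + 1))
                      for j in range(max(0, c - half), min(cols, c + half + 1))
                      if (i, j) != (r, c)]
            v = image[r][c]
            if all(v > w for w in window):
                maxima.append((r, c))
            elif all(v < w for w in window):
                minima.append((r, c))
    return maxima, minima

# Flat toy image with one bright and one dark pixel.
gray = [
    [5, 5, 5, 5, 5],
    [5, 9, 5, 5, 5],
    [5, 5, 5, 5, 5],
    [5, 5, 5, 1, 5],
    [5, 5, 5, 5, 5],
]
maxima, minima = local_extrema(gray, 3)
```

A larger window yields fewer, more widely spaced extrema, which is the usual trade-off when choosing the window size.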

**Figure 2.** Geometric and gradient information from a local maximum (resp. local minimum) pixel $q=({x}_{q},{y}_{q})$ within ${\mathcal{N}}_{W}^{\mathrm{max}}\left(p\right)$ (resp. ${\mathcal{N}}_{W}^{\mathrm{min}}\left(p\right)$) considered for the calculation of the LED descriptor at the studied keypoint $p=({x}_{p},{y}_{p})$. Here, $d(p,q)$ is the distance between p and q, and $\alpha (p,q)$ is the angle of the vector from p to q. We have $d(p,q)=d(q,p)$ but $\alpha (p,q)\ne \alpha (q,p)$. Finally, $\nabla I\left(q\right)$ and $\theta \left(q\right)$ are the gradient magnitude and gradient orientation at q.

**Figure 3.** Proposed framework for color texture image retrieval using the local extrema-based descriptors and the Riemannian distance.

**Figure 5.** Sensitivity of the proposed method to its parameters in terms of average retrieval rate (%) and feature extraction time (s). Experiments were conducted on the Vistex-640 database using the 27D LED+RD. (**a**) Sensitivity to window size ${\omega}_{1}$ for keypoint extraction; (**b**) sensitivity to window size W for descriptor generation.

Database | Number of Classes | Number of Images/Class | Total Number |
---|---|---|---|
Vistex-640 | 40 | 16 | 640 |
Stex-7616 | 476 | 16 | 7616 |
CBT-2800 | 112 | 25 | 2800 |
USPtex-2292 | 191 | 12 | 2292 |
Outex-1360 | 68 | 20 | 1360 |

**Table 2.** Average retrieval rate (ARR) on the **Vistex-640** database by the proposed method compared to the state-of-the-art methods.

Method | Using Color | ARR (%) |
---|---|---|
GT+GGD+KLD [4] | - | 76.57 |
DT-CWT [5] | - | 80.78 |
DT-CWT+DT-RCWT [5] | - | 82.34 |
MGG+Gaussian+KLD [11] | √ | 87.40 |
MGG+Laplace+GD [11] | √ | 91.70 |
DCT+MGMM [7] | - | 84.94 |
Gaussian Copula+Gamma+ML [12] | √ | 89.10 |
Gaussian Copula+Weibull+ML [12] | √ | 89.50 |
Student-t Copula+GG+ML [12] | √ | 88.90 |
LMEBP [15] | - | 87.77 |
Gabor LMEBP [15] | - | 87.93 |
LtrP [17] | - | 90.02 |
Gabor LtrP [17] | - | 90.16 |
LEP+colorhist [19] | √ | 82.65 |
MCMCM+DBPSP [52] | √ | 86.17 |
Gaussian Copula-MWbl [9] | - | 84.41 |
ODBTC [26] | √ | 90.67 |
Gaussian Copula+Gabor Wavelet [10] | √ | 92.40 |
EDBTC [27] | √ | 92.55 |
DDBTC [28] | √ | 92.65 |
LECoP [21] | √ | 92.99 |
ODII [25] | √ | 93.23 |
CNN-AlexNet [51] | √ | 91.34 |
CNN-VGG16 [51] | √ | 92.97 |
CNN-VGG19 [51] | √ | 93.04 |
Proposed LED+RD (27D) | √ | 94.64 |
Proposed LED+RD (33D) | √ | 94.70 |

Class | 27D | 33D | Class | 27D | 33D |
---|---|---|---|---|---|
Bark.0000 | 76.95 | 75.00 | Food.0008 | 100.00 | 100.00 |
Bark.0006 | 98.05 | 98.05 | Grass.0001 | 94.53 | 93.36 |
Bark.0008 | 83.20 | 84.38 | Leaves.0008 | 99.61 | 100.00 |
Bark.0009 | 77.13 | 78.13 | Leaves.0010 | 100.00 | 100.00 |
Brick.0001 | 100.00 | 99.22 | Leaves.0011 | 100.00 | 100.00 |
Brick.0004 | 97.66 | 98.05 | Leaves.0012 | 55.86 | 56.25 |
Brick.0005 | 100.00 | 100.00 | Leaves.0016 | 86.72 | 86.72 |
Buildings.0009 | 100.00 | 100.00 | Metal.0000 | 98.83 | 99.61 |
Fabric.0000 | 100.00 | 100.00 | Metal.0002 | 100.00 | 100.00 |
Fabric.0004 | 76.95 | 77.34 | Misc.0002 | 100.00 | 100.00 |
Fabric.0007 | 99.61 | 99.61 | Sand.0000 | 100.00 | 100.00 |
Fabric.0009 | 100.00 | 100.00 | Stone.0001 | 82.42 | 82.42 |
Fabric.0011 | 100.00 | 100.00 | Stone.0004 | 90.63 | 91.02 |
Fabric.0014 | 100.00 | 100.00 | Terrain.0010 | 94.92 | 95.31 |
Fabric.0015 | 100.00 | 100.00 | Tile.0001 | 91.80 | 89.45 |
Fabric.0017 | 96.48 | 97.66 | Tile.0004 | 100.00 | 100.00 |
Fabric.0018 | 98.83 | 100.00 | Tile.0007 | 100.00 | 100.00 |
Flowers.0005 | 100.00 | 100.00 | Water.0005 | 100.00 | 100.00 |
Food.0000 | 100.00 | 100.00 | Wood.0001 | 96.88 | 96.88 |
Food.0005 | 99.22 | 99.22 | Wood.0002 | 88.28 | 90.23 |
ARR | 94.64 | 94.70 | | | |

**Table 4.** Average retrieval rate (ARR) on the **Stex-7616** database by the proposed method compared to the state-of-the-art methods.

Method | Using Color | ARR (%) |
---|---|---|
GT+GGD+KLD [4] | - | ${}^{\star}$ 49.30 |
DT-CWT+Weibull+KLD [6] | - | ${}^{\star}$ 58.80 |
MGG+Laplace+GD [11] | √ | ${}^{\star}$ 71.30 |
DWT+Gamma+KLD [8] | - | ${}^{\star}$ 52.90 |
Gaussian Copula+Gamma+ML [12] | √ | 69.40 |
Gaussian Copula+Weibull+ML [12] | √ | 70.60 |
Student-t Copula+GG+ML [12] | √ | 65.60 |
LEP+colorhist [19] | √ | 59.90 |
DDBTC [28] | √ | 44.79 |
LECoP [21] | √ | 74.15 |
Gaussian Copula+Gabor Wavelet [10] | √ | 76.40 |
CNN-AlexNet [51] | √ | 68.84 |
CNN-VGG16 [51] | √ | 74.92 |
CNN-VGG19 [51] | √ | 73.93 |
Proposed LED+RD (27D) | √ | 79.95 |
Proposed LED+RD (33D) | √ | 80.08 |

**Table 5.** Average retrieval rate (ARR) on the **CBT-2800** database by the proposed method compared to the state-of-the-art methods.

Method | Using Color | ARR (%) |
---|---|---|
LBP [13] | - | ${}^{\star}$ 81.75 |
LtrP [17] | - | ${}^{\star}$ 82.05 |
LOCTP-YCbCr [20] | √ | 84.46 |
LOCTP-HSV [20] | √ | 88.60 |
LOCTP-LAB [20] | √ | 88.90 |
LOCTP-RGB [20] | √ | 93.89 |
CNN-AlexNet [51] | √ | 90.72 |
CNN-VGG16 [51] | √ | 91.64 |
CNN-VGG19 [51] | √ | 90.36 |
Proposed LED+RD (27D) | √ | 99.06 |
Proposed LED+RD (33D) | √ | 98.79 |

**Table 6.** Average retrieval rate (%) on the **USPtex-2292** and **Outex-1360** databases by the proposed method compared to some reference methods.

Method | USPtex-2292 | Outex-1360 |
---|---|---|
DDBTC (${L}_{1}$) [28] | 63.19 | 61.97 |
DDBTC (${L}_{2}$) [28] | 55.38 | 57.51 |
DDBTC (${\chi}^{2}$) [28] | 73.41 | 65.54 |
DDBTC (Canberra) [28] | 74.97 | 66.82 |
CNN-AlexNet [51] | 83.57 | 69.87 |
CNN-VGG16 [51] | 85.03 | 72.91 |
CNN-VGG19 [51] | 84.22 | 73.20 |
Proposed LED+RD (27D) | 90.22 | 76.54 |
Proposed LED+RD (33D) | 90.50 | 76.67 |

Method | Feature Dimension |
---|---|
DT-CWT [4] | $(3\times 6+2)\times 2=40$ |
DT-CWT+DT-RCWT [4] | $2\times (3\times 6+2)\times 2=80$ |
LBP [13] | 256 |
LTP [16] | $2\times 256=512$ |
LMEBP [15] | $8\times 512=4096$ |
Gabor LMEBP [15] | $3\times 4\times 512=6144$ |
LEP+colorhist [19] | $16\times 8\times 8\times 8=8192$ |
LECoP(${H}_{18}{S}_{10}{V}_{256}$) [21] | $18+10+256=284$ |
LECoP(${H}_{36}{S}_{20}{V}_{256}$) [21] | $36+20+256=312$ |
LECoP(${H}_{72}{S}_{20}{V}_{256}$) [21] | $72+20+256=348$ |
ODII [25] | $128+128=256$ |
CNN-AlexNet [51] | 4096 |
CNN-VGG16 [51] | 4096 |
CNN-VGG19 [51] | 4096 |
Proposed LED+RD (27D) | 27 |
Proposed LED+RD (33D) | 33 |

**Table 8.** Performance of the proposed method in terms of feature extraction (FE) time and dissimilarity measurement (DM) time. Experiments were conducted on the Vistex-640 database.

Version | FE Time ${t}_{\mathrm{data}}$ (s) | FE Time ${t}_{\mathrm{image}}$ (s) | DM Time ${t}_{\mathrm{data}}$ (s) | DM Time ${t}_{\mathrm{image}}$ (s) | Total Time ${t}_{\mathrm{data}}$ (s) | Total Time ${t}_{\mathrm{image}}$ (s) | ARR (%) |
---|---|---|---|---|---|---|---|
27D | 422.8 | 0.661 | 22.3 | 0.035 | 445.1 | 0.695 | 94.64 |
33D | 476.6 | 0.745 | 28.1 | 0.044 | 504.7 | 0.789 | 94.70 |

Method | Feature Dimension | Extraction Time (ms) |
---|---|---|
SIFT [39] | 128 | 538.6 |
SURF [40] | 64 | 162.2 |
BRISK [53] | 64 | 8.2 |
BRIEF [54] | 32 | 3.2 |
Proposed LED (27D) | 27 | 1298.7 |
Proposed LED (33D) | 33 | 1476.3 |

**Table 10.** Sensitivity to distance measures in terms of dissimilarity measurement time and average retrieval rate (ARR). Experiments were conducted on the **Vistex-640** database using the 27D LED descriptors.

Distance Measure | Formula | ${t}_{\mathrm{data}}$ (s) | ${t}_{\mathrm{image}}$ (ms) | ARR (%) |
---|---|---|---|---|
*Taking into account mean feature vectors* | | | | |
Simplified Mahalanobis | ${({\mu}_{1}-{\mu}_{2})}^{T}\left({C}_{1}^{-1}+{C}_{2}^{-1}\right)({\mu}_{1}-{\mu}_{2})$ | 40.51 | 63.30 | 90.48 |
Symmetric Kullback–Leibler | $\mathrm{trace}\left({C}_{1}{C}_{2}^{-1}+{C}_{2}{C}_{1}^{-1}\right)+{({\mu}_{1}-{\mu}_{2})}^{T}\left({C}_{1}^{-1}+{C}_{2}^{-1}\right)({\mu}_{1}-{\mu}_{2})$ | 47.10 | 73.59 | 91.82 |
*Not accounting for mean feature vectors* | | | | |
Log-euclidean | ${\lVert \mathrm{log}({C}_{1})-\mathrm{log}({C}_{2})\rVert}_{F}$ | 20.21 | 31.58 | 72.65 |
Bartlett | $\mathrm{log}\frac{{\lvert {C}_{1}+{C}_{2}\rvert}^{2}}{\lvert {C}_{1}\rvert \lvert {C}_{2}\rvert}$ | 27.03 | 42.23 | 76.51 |
Wishart-like | $\mathrm{trace}\left({C}_{1}{C}_{2}^{-1}+{C}_{2}{C}_{1}^{-1}\right)$ | 37.79 | 59.90 | 92.39 |
Riemannian | $\sqrt{{\sum}_{\ell =1}^{d}{\mathrm{log}}^{2}{\lambda}_{\ell}}$, where ${\lambda}_{\ell}{C}_{1}{\chi}_{\ell}-{C}_{2}{\chi}_{\ell}=0,\ell =1\dots d$ | 22.34 | 34.91 | 94.64 |

