Deep Learning Using Symmetry, FAST Scores, Shape-Based Filtering and Spatial Mapping Integrated with CNN for Large Scale Image Retrieval

Abstract: This article applies symmetry of sampling, scoring, scaling, filtering and suppression over deep convolutional neural networks in combination with a novel content-based image retrieval scheme to retrieve highly accurate results. For this, ResNet-generated signatures are fused with the proposed image features. In the first step, symmetric sampling is performed on the images around the neighborhood key points. Thereafter, rotated sampling patterns and pairwise comparisons are applied, and the images are smoothed with Gaussian kernels whose standard deviation grows with the sampling radius. The smoothed intensity values are used to compute local gradients. Box filtering approximates the Gaussian with the lowest-scale standard deviation, and the responses are suppressed by a non-maximal technique. The resulting feature sets are scaled at various levels with parameterized smoothed images. The principal component analysis (PCA)-reduced feature vectors are combined with the ResNet-generated features, and spatial color coordinates are integrated with convolutional neural network (CNN)-extracted features to comprehensively represent the color channels. The proposed method is evaluated on challenging datasets including Cifar-100 (10), Cifar-10 (10), ALOT (250), Corel-10000 (10), Corel-1000 (10) and Fashion (15). The presented method shows remarkable results on the texture dataset ALOT with 250 categories and on Fashion (15), reports significant results on the Cifar-10 and Cifar-100 benchmarks, and obtains outstanding results on the Corel-1000 dataset in comparison with state-of-the-art methods.


Introduction
Symmetry creates harmony, which is desirable in all areas of life. Applying symmetry to content-based image retrieval (CBIR) through deep learning is the novel idea implemented by this contribution. Traditionally, CBIR is performed with color [1,2], object and texture features [3-5].
In the modern era, retrieving relevant images with the highest precision is a major challenge for deep learning, and existing pipelines still miss symmetry in feature extraction and description. Convolutional neural networks mainly focus on large image datasets [6], so CNNs are attracting growing attention in the computer vision community. Accurate image representation and image feature extraction play a vital role in image analysis [6]. Moreover, deep learning is also employed for feature learning and description.

Symmetric Sampling

Let $M(M-1)/2$ be the number of sampling-point pairs $(q_i, q_j)$, and let the standard deviation $\theta$ be proportional to the distance between the circle points. The descriptor compares the short-distance point pairs $(q_i, q_j) \in H$, where $H \subseteq B$ is the subset of short-distance pairs and $B$ is the set of all sampling-point pairs. Every bit $c$ then corresponds to Equation (1) [56] as follows:

$$ c = \begin{cases} 1, & K(q_j, \theta_j) > K(q_i, \theta_i) \\ 0, & \text{otherwise} \end{cases} \quad \forall (q_i, q_j) \in H \qquad (1) $$

The number of all possible tests depends upon the configuration of the sampling pattern. The BRISK sampling pattern contains M = 60 points, and BRISK differs significantly from pre-rotated and pre-scaled sampling patterns [56]. BRISK evaluates the sampling pattern at sampling points centered at increasing radii around the key point, and it employs fewer sampling points than pairwise comparisons because one point participates in multiple comparisons. Finally, there is a spatial restriction on the comparisons, since brightness differences are only required to be locally consistent. Equation (1) [56] yields a bit string of length 512. Algorithm 1 shows the symmetric sampling: local intensity gradients, collected sampling patterns, spread concentric circles and scaling are initialized in steps 1-4. It iterates over the whole image in step 5, computing the sampling patterns and scaling for each value in step 6. In step 7, it computes the local intensity gradients from the step 6 results. After these computations, a pairwise comparison is performed in step 8, for which scalable sampling is applied in step 9.
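A minimal Python sketch of the short-distance comparison in Equation (1) is given below. It assumes a precomputed BRISK-style sampling pattern: the point coordinates, per-ring smoothing widths and the short-distance pair set H are passed in, so all names here are illustrative rather than part of the original algorithm's API.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def brisk_bits(image, sample_points, sigmas, short_pairs):
    """Descriptor bits of Equation (1): bit c = 1 when K(q_j) > K(q_i)
    for each short-distance pair (i, j) in H."""
    img = image.astype(float)
    # Blur once per distinct sigma; points on the same ring share a sigma.
    blurs = {s: gaussian_filter(img, sigma=s) for s in set(sigmas)}
    K = [blurs[s][int(q[1]), int(q[0])] for q, s in zip(sample_points, sigmas)]
    bits = [1 if K[j] > K[i] else 0 for i, j in short_pairs]
    return np.array(bits, dtype=np.uint8)  # length 512 for the full BRISK pattern
```

In practice, OpenCV's `cv2.BRISK_create()` computes this descriptor directly; the sketch only exposes the pairwise-comparison step described above.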

Gaussian Smoothing
To best describe the values derived from the symmetry of sampling using BRISK, smoothing is applied in a new way to obtain the optimal gradient values. Smoothing is applied to the image features after symmetric sampling (Figure 1). For this task, we used the Gaussian smoothing technique [56]; computing exact Gaussian values would not be feasible for the proposed approach when computational resources are limited. $K(q_i, \theta_i)$ and $K(q_j, \theta_j)$ are the smoothed intensity values, which are used to calculate the local gradient $t(q_i, q_j)$ with Equation (5) [56] as follows:

$$ t(q_i, q_j) = (q_j - q_i)\,\frac{K(q_j, \theta_j) - K(q_i, \theta_i)}{\lVert q_j - q_i \rVert^2} \qquad (5) $$

The distance thresholds are set as $\partial_{max} = 9.75g$ and $\partial_{min} = 13.67g$, where $g$ is the scale of key point $l$. Iterating over the point pairs in $I$, the overall pattern direction of the key point $l$ is estimated with Equation (6) [56] as follows:

$$ t = \begin{pmatrix} t_x \\ t_y \end{pmatrix} = \frac{1}{|I|} \sum_{(q_i, q_j) \in I} t(q_i, q_j) \qquad (6) $$

This calculation uses the long-distance pairs, based on the local gradients, which are not important in the global feature determination [56].
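The following sketch evaluates Equations (5) and (6) directly from smoothed intensities; the pair set I and the intensity list K are assumed to come from the sampling step above.

```python
import numpy as np

def local_gradient(q_i, q_j, K_i, K_j):
    """Equation (5): t(q_i, q_j) = (q_j - q_i) * (K_j - K_i) / ||q_j - q_i||^2."""
    d = np.asarray(q_j, dtype=float) - np.asarray(q_i, dtype=float)
    return d * (K_j - K_i) / np.dot(d, d)

def pattern_direction(long_pairs, points, K):
    """Equation (6): average the local gradients over the long-distance
    pair set I to estimate the overall direction of the key point."""
    grads = [local_gradient(points[i], points[j], K[i], K[j])
             for i, j in long_pairs]
    return np.mean(grads, axis=0)  # (t_x, t_y)
```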

Space-Based Placements
To achieve subsampling for feature reduction at the signature formation step, we introduce space-based placement. The resultant values of smoothing are input to the placement process, as seen in Figure 1. Scale spaces are normally organized as an image pyramid divided into octaves. Gaussian smoothing is applied repeatedly to the images, and subsampling is then performed to reach the highest level of the pyramid. Key points are detected in the octave layers of the image pyramid to increase the computational efficiency. The scale-space pyramid consists of n octaves $w_i$ and n intra-octaves $f_i$, where n = 4 and $i \in \{0, 1, 2, 3, \ldots, n-1\}$. The octaves are formed by progressively half-sampling the original image, and every intra-octave $f_i$ is placed between the layers $w_i$ and $w_{i+1}$.
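A compact sketch of the octave/intra-octave pyramid described above, using OpenCV; `cv2.pyrDown` performs the Gaussian smoothing and half-sampling in one call, and the 1.5× factor seeding the intra-octaves follows the text.

```python
import cv2

def build_pyramid(image, n_octaves=4):
    """n octaves w_i by repeated half-sampling; each intra-octave f_i lies
    between w_i and w_(i+1), seeded by a 1.5x down-sampled image."""
    octaves = [image]
    intra = [cv2.resize(image, None, fx=1 / 1.5, fy=1 / 1.5)]
    for _ in range(n_octaves - 1):
        octaves.append(cv2.pyrDown(octaves[-1]))  # Gaussian smooth + subsample
        intra.append(cv2.pyrDown(intra[-1]))
    return octaves, intra
```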

Shape-Based Filtering
Shape detection refers to object recognition with outlined boundary detection, which is filtered at this intermediate step to recognize the image content finely; embodying this cycle with the correct input and output parameters is a novelty of the presented approach. As seen in Figure 1, this step is performed on the images after the space-based placement step. The box filtering technique is applied here: box filters of size 9×9 approximate Gaussian smoothing with standard deviation $\theta_i = 1.2$ and represent the lowest scale for computing the response [57]. The box filters are denoted by $E_{xx}$, $E_{yy}$ and $E_{xy}$; rectangular regions make the computation simple and efficient when applying weights. Equation (7) is used to calculate the Hessian determinant [57] as follows:

$$ \det(L_{approx}) = E_{xx} E_{yy} - (d\,E_{xy})^2 \qquad (7) $$

where $d$ is the relative weight of the filter responses, balancing the expression of the Hessian determinant in Equation (7). This weighting preserves the energy among the Gaussian kernels, and the filter responses are normalized according to their size [57]. The Hessian determinant represents the response of the image at location x; over the various scales, the responses are saved in a map.
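The sketch below forms the response map of Equation (7). For brevity it stands in Sobel second derivatives for the exact 9×9 box filters $E_{xx}$, $E_{yy}$, $E_{xy}$, with d = 0.9 as a common default for this approximation; treat it as an illustration of the filtering step, not the exact kernels of [57].

```python
import cv2
import numpy as np

def hessian_response(gray, d=0.9):
    """Equation (7): det(L_approx) = E_xx * E_yy - (d * E_xy)^2,
    computed over the whole image as a response map."""
    g = cv2.GaussianBlur(gray.astype(np.float32), (9, 9), 1.2)
    Exx = cv2.Sobel(g, cv2.CV_32F, 2, 0)  # second derivative in x
    Eyy = cv2.Sobel(g, cv2.CV_32F, 0, 2)  # second derivative in y
    Exy = cv2.Sobel(g, cv2.CV_32F, 1, 1)  # mixed derivative
    return Exx * Eyy - (d * Exy) ** 2
```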

FAST Score-Based Suppression
The presented model uses a novel inline series of steps, including suppression after filtering, to obtain the best scores, accelerated by the intervention of the FAST algorithm; in series with these steps, we incorporate interpolation for better estimation of intermediate values. Suppression is applied at this step in Figure 1. Image suppression searches for whole or sudden alterations in images by various methods. The proposed technique uses a non-maximal suppression algorithm that deals with the problem of detecting too many interest points [58]. Non-maximal suppression requires a point to fulfill the maximum condition with respect to its eight neighbors in the features from accelerated segment test (FAST) scores of the same layer; the score is the maximal threshold for which an image point is still considered a corner. The scores in the layer above and the layer below are also required to be lower. Square patches of equal size are compared, with the side length selected as 2 pixels in the layer of the suspected maximum; since the neighboring layers use different discretizations [55], interpolation is applied at the patch boundaries [55]. Non-maximal suppression then performs sub-pixel and continuous-scale refinement for every detected maximum. First, a two-dimensional quadratic function is fit in the least-squares sense to each of the three score patches, limiting the complexity of the refinement process and yielding three sub-pixel-refined saliency maxima; we assume a 3×3 score patch on every layer. Next, a one-dimensional parabola is fit along the scale axis to the refined scores, giving the final score estimate and the scale estimate at its maximum. Finally, the image coordinates between the patches in the layers next to the determined scale are re-interpolated.
The algorithm [58] calculates a score function $H_1$ for every detected point; interest points are generated by the algorithms. The score function is the sum of the absolute differences between the pixels in the adjacent arc and the central pixel. Two adjacent interest points thus compare their values of $H_1$, and the one with the lower $H_1$ is deleted, as in Equation (8) [58]:

$$ H_1 = \sum_{j \in \text{arc}} \big( \lvert pv_j - v_e \rvert - s_1 \big) \qquad (8) $$

In Equation (8), $s_1$ is the recognition threshold, $pv$ denotes pixel values, $v$ denotes value and $e$ the central pixel. The score function can also be defined in alternative ways; moreover, a heuristic function compares two adjacent corners and removes the minor one after the comparison [58].
Furthermore, distortion suppression is described for the distortion of the digital image. An accurate model includes the lenses, the object and so on; one common model uses an image b[w, z] distorted by a linear, shift-invariant system l[w, z] and made impure by noise m[w, z]. Algorithm 2 shows the steps of the suppression algorithm, which computes the score function for each candidate point ϼ. At step 2, it loops over the length of the point ϼ, for which adjacent pixels are assigned at step 3. To retain or discard the values in steps 4-8, the adjacent pixels are checked together with their neighboring pixels, and a keep-or-discard decision is made depending upon the value size. The score function it calls is pseudo-coded in Algorithm 3, where contiguous arcs are assigned at step 2 and the central pixel at step 3; the difference from the central pixel is calculated at step 3 and its absolute value is computed at step 4. These absolute differences are summed over the length of the range, and the summation is returned as the score at the end of the score function.
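A Python sketch of the $H_1$ score of Equation (8) and the adjacent-point rule described above; the circle offsets mirror the 16-pixel FAST ring, and points are assumed to lie at least 3 pixels from the image border.

```python
import numpy as np

# Offsets of the 16-pixel Bresenham circle used by FAST.
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def score_h1(img, y, x, s1=0):
    """Equation (8): sum of absolute differences between the arc pixels
    and the central pixel, offset by the threshold s1."""
    center = float(img[y, x])
    return sum(abs(float(img[y + dy, x + dx]) - center) - s1 for dx, dy in CIRCLE)

def suppress_adjacent(points, img):
    """Of two adjacent interest points, delete the one with the lower H_1."""
    scores = {p: score_h1(img, *p) for p in points}
    kept = []
    for p in points:
        rivals = [q for q in points if q != p
                  and abs(p[0] - q[0]) <= 1 and abs(p[1] - q[1]) <= 1]
        if all(scores[p] >= scores[q] for q in rivals):  # ties keep both
            kept.append(p)
    return kept
```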

Various Level Scaling
In this approach, scaling is introduced to fetch the image features at various levels for a more accurate depiction of image content. The proposed method applies scaling for handling the structure of an image at various levels (Figure 1). The images can be represented in scale space by smoothing-kernel parameterization for fine scaling [30]. The intra-octave $f_0$ is attained by down-sampling the original image $w_0$ by a factor of 1.5, and the remaining layers are derived from it. Thus, if the scale is denoted by c, then $c(w_i) = 2^i$ and $c(f_i) = 2^i \times 1.5$. In the BRISK framework, 9-16 masks are commonly used, which require a minimum of 9 consecutive pixels in the circle of 16 pixels.
Moreover, linear scale-space has wide applicability and attractive properties, derivable from a small set of scale-space axioms [59]. The axioms require linearity and spatial shift-invariance; collectively, they formalize the notion that the transformation from fine to coarse scales must not create new structures [59]. The scale-space concept includes linear scale-space derived operators, with which computerized systems can express a large class of visual operations for processing visual information.
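The per-layer scale assignment and the 9-16 FAST detection can be stated in a few lines; `cv2.FastFeatureDetector_create` exposes the 9-16 mask directly, while the scale table follows $c(w_i) = 2^i$ and $c(f_i) = 1.5 \cdot 2^i$ above.

```python
import cv2

def layer_scales(n=4):
    """c(w_i) = 2**i for octaves, c(f_i) = 1.5 * 2**i for intra-octaves."""
    return {**{f"w{i}": 2 ** i for i in range(n)},
            **{f"f{i}": 1.5 * 2 ** i for i in range(n)}}

def detect_layer(img, threshold=30):
    """FAST 9-16 on one pyramid layer: >= 9 contiguous of 16 circle pixels."""
    fast = cv2.FastFeatureDetector_create(
        threshold=threshold, type=cv2.FAST_FEATURE_DETECTOR_TYPE_9_16)
    return fast.detect(img, None)
```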

Feature Reduction
The presented technique represents the image features in a compact manner; to further minimize retrieval time, the features are represented by their coefficients through principal component analysis (PCA). In Figure 1, after the various-level scaling step, the presented technique applies a feature reduction algorithm, which uses different mathematical models to discard insignificant components and perform image compression against data redundancy [60]. The PCA reduces the variables, and these variables measure a number of uncorrelated factors [61]. The PCA works when most of the variables measure a similar construct [61]; it is also used for data processing, extracting a few synthetic variables called principal components, which are sequences of data projections [61]. The PCA is applied for compression and dimension reduction to find the coefficients of highest variance [61]. The procedure of decreasing the number of random variables is called dimension reduction. Let m denote the dimensions of a vector v of $p_1$ random variables, with the measurement reduced from $p_1$ to r [55]. The PCA finds linear combinations $b_1'v, b_2'v, \ldots, b_r'v$, called principal components, that have maximal data variance and are uncorrelated with the preceding $b_i'v$. Solving this maximization problem, the eigenvectors $b_1, b_2, \ldots, b_r$ of the covariance matrix t correspond to the r largest eigenvalues. Moreover, the eigenvalues give the variances of the respective principal components, and the ratio of the sum of the first r eigenvalues to the sum of the variances of all $p_1$ original variables represents the proportion of the total variance in the original dataset accounted for by the first r principal components [55].
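A small illustration of the reduction step with scikit-learn; the 128-dimensional placeholder matrix and the choice r = 32 are arbitrary, standing in for the fused descriptor matrix of the proposed method.

```python
import numpy as np
from sklearn.decomposition import PCA

features = np.random.rand(500, 128)   # placeholder for p1-dimensional descriptors
pca = PCA(n_components=32)            # keep the r components of largest variance
reduced = pca.fit_transform(features)
# Proportion of total variance retained by the first r principal components.
print(pca.explained_variance_ratio_.sum())
```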

Spatial Color Features Extraction

The presented approach introduces a new way of presenting the color channels by their coefficients, showing the color image contents in a compact and efficient way. A color histogram captures only the color distribution and does not include any spatial information; in our approach, the spatial correlation of color changes is represented by distance. Suppose I is an image whose g colors are quantized as $D_1, \ldots, D_g$. The color of a pixel P = (a, j) is defined in Equation (9) [19] as follows:

$$ P \in I_{D_i} \iff c(P) = D_i \qquad (9) $$

Equation (10) [19] is used to compute the distance between pixels $P_1(a_1, j_1)$ and $P_2(a_2, j_2)$; we define it as follows:

$$ \lvert P_1 - P_2 \rvert = \max\big( \lvert a_1 - a_2 \rvert, \lvert j_1 - j_2 \rvert \big) \qquad (10) $$

The color histogram X of an image I is defined in Equation (11) [19] as

$$ X_{D_i}(I) = g^2 \cdot \Pr_{P \in I}\big[ P \in I_{D_i} \big] \qquad (11) $$

Equation (11) shows that $X_{D_i}(I)/g^2$ gives the probability that a pixel has color $D_i$, where I is an image and $D_i$ is a pixel color. The histogram X is linear in the image size and can be computed in time $O(g^2)$.

Suppose that the distance d1 is fixed a priori. Then the correlogram of the image I is defined for $a, k \in [g]$ and $j \in [d]$ in Equation (12) [19] as

$$ \gamma^{(l)}_{D_a, D_k}(I) = \Pr_{P_1 \in I_{D_a},\, P_2 \in I}\big[ P_2 \in I_{D_k} \,\big|\, \lvert P_1 - P_2 \rvert = l \big] \qquad (12) $$

Equation (12) represents the spatial arrangement of color pixels in the image: $\gamma$ denotes the probability of finding a pixel of color $D_k$ at distance l away from a given pixel of color $D_a$. The spatial relationship between pixels of the same color is defined in Equation (13) [19] as follows:

$$ \alpha^{(l)}_{D}(I) = \gamma^{(l)}_{D, D}(I) \qquad (13) $$

Equation (13) is derived from Equation (12), where $\alpha$ represents the probability of a pixel of color D at distance l.
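The autocorrelogram of Equation (13) can be estimated as below; for brevity the sketch samples only the four axis-aligned neighbors at each distance rather than the full L∞ ring, and `quantized` is assumed to hold per-pixel color indices in [0, g).

```python
import numpy as np

def autocorrelogram(quantized, n_colors, distances=(1, 3, 5, 7)):
    """Probability that a pixel at distance l from a pixel of color D
    also has color D (Equation (13))."""
    h, w = quantized.shape
    result = np.zeros((n_colors, len(distances)))
    for di, l in enumerate(distances):
        for c in range(n_colors):
            ys, xs = np.nonzero(quantized == c)
            if ys.size == 0:
                continue
            hits = total = 0
            for dy, dx in ((-l, 0), (l, 0), (0, -l), (0, l)):
                ny, nx = ys + dy, xs + dx
                ok = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)
                total += int(ok.sum())
                hits += int((quantized[ny[ok], nx[ok]] == c).sum())
            result[c, di] = hits / total if total else 0.0
    return result
```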

Residual Network Architecture
The proposed ResNet architecture is fused with the presented feature detection and extraction to obtain maximum result accuracy; we take advantage of ResNet's remarkable performance and image classification ability. In ResNet, suppose H(x) is the underlying mapping to be fit by a few stacked layers, with x denoting the input to the first of these layers. The main contribution of residual networks is the skipped connection: it is easier to optimize the residual mapping than the original unreferenced mapping H(x) = F(x) + x, so the stacked weight layers estimate F(x) instead of H(x), with no additional parameters or computational complication in ResNet [62]. If the added layers can be constructed as identity mappings, a deeper model has a training error no greater than its shallower counterpart. It has been suggested that the degradation problem arises from difficulties in approximating identity mappings by multiple non-linear layers. With the residual reformulation, if the identity mapping is optimal, the weights of the non-linear layers can simply be driven toward zero to approach identity mappings. In practice, the reformulation helps to precondition the problem when identity mappings are near-optimal; experiments in [62] show that the learned residual functions have responses of small magnitude, indicating that identity mappings provide reasonable preconditioning. The building blocks of [62] are defined in Equation (14) [62] as follows:

$$ y = F(x, \{W_i\}) + x \qquad (14) $$

In Equation (14), x and y are the input and output vectors of the layers, and $F(x, \{W_i\})$ is the residual mapping to be learned. The F + x operation is realized by a shortcut connection with element-wise addition, and the second nonlinearity is adopted after the addition. The dimensions of x and F must be equal in Equation (14); when the input and output channels change, a linear projection $W_s$ is applied on the shortcut connection to match the dimensions, as in Equation (15) [62]:

$$ y = F(x, \{W_i\}) + W_s x \qquad (15) $$

The identity mapping suffices to address the degradation problem and is inexpensive, so $W_s$ is used only for matching dimensions. The notation above refers to fully connected layers for simplicity but applies equally to convolutional layers, where $F(x, \{W_i\})$ represents multiple convolutional layers. The plain baselines are inspired by the philosophy of the VGG nets [48]: the convolutional layers normally have 3 × 3 filters and follow two simple design rules: (1) layers producing feature maps of the same size have the same number of filters; and (2) when the feature map size is halved, the number of filters is doubled to preserve the time complexity per layer.
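A minimal PyTorch sketch of the building blocks in Equations (14) and (15); the paper's experiments used MATLAB, so this is only an illustration of the residual formulation, not the authors' implementation.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x, {W_i}) + x (Equation (14)); a 1x1 projection W_s is applied
    on the shortcut only when the dimensions change (Equation (15))."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.f = nn.Sequential(                       # the residual mapping F
            nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.shortcut = (nn.Identity() if stride == 1 and in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1, stride, bias=False))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Element-wise addition, then the second nonlinearity.
        return self.relu(self.f(x) + self.shortcut(x))
```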
In this architecture, an image is resized with its shorter side sampled randomly for scale augmentation [49]. A 224 × 224 crop is randomly sampled from the image, with the per-pixel mean subtracted [49], and the standard color augmentation of [49] is used. In ResNet [62], batch normalization [50] is performed right after every convolution and before the activation. The weights in the ResNet architecture [62] are initialized as in [51], and every residual or plain net is trained from scratch. The ResNet architecture [62] uses stochastic gradient descent (SGD) with a mini-batch size of 256; the learning rate starts from 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to 60 × 10^4 iterations. Using ResNet, the performance of several computer vision applications has been increased, such as object detection and face recognition. Our feature vectors are fused with the ResNet-generated feature vectors to create a powerful image signature that deeply represents the shape and object features. Hence, the proposed algorithm sharply represents the deep image features. These features are input into a bag-of-words (BoW) architecture, which fetches, indexes and shows the resultant images using KNN (k-nearest neighbors); a minimal sketch of this indexing pipeline follows the contribution list below. The proposed light-weight image retrieval method inputs the PCA-reduced image features to the bag-of-words framework for efficient retrieval, even for large datasets. Hence, the novelties and main contributions of the presented work are as follows:

1. For maximum image content representation, a new method is introduced that fuses ResNet architecture-based signatures with the color-texture-shape carrier attributes produced by the proposed algorithms.

2. To obtain improved results, a method is introduced that enhances the capabilities of the ResNet architecture by its internal coupling with primitive features.

3. A light-weight feature detection criterion is introduced that flows in simple steps, including sampling, smoothing and placement, and returns the salient key points with local feature information.

4. An efficient and effective feature extraction strategy is presented with three easy steps of filtering, suppression and scaling, whose results reflect the potential image contents.

5. Color and gray-level features-based image retrieval strengthened by a convolutional neural network is introduced for the first time.

6. An innovative, time-efficient recipe is presented that works equally for color channels, 0-255 gray levels and all layers of the ResNet architecture, and shows the retrieval results in a fraction of the time.

7. A new idea is contributed by assembling the fused CNN and primitive features with the bag-of-visual-words framework to index, rank and retrieve the classified images.
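The sketch below illustrates the BoW indexing and KNN retrieval referred to above, assuming per-image matrices of fused, PCA-reduced descriptors; the vocabulary size and k are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def bow_histograms(descriptor_sets, n_words=200):
    """Quantize fused descriptors into visual words and encode every image
    as a normalized word histogram."""
    vocab = KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(descriptor_sets))
    hists = [np.bincount(vocab.predict(d), minlength=n_words) for d in descriptor_sets]
    return vocab, np.array([h / max(h.sum(), 1) for h in hists], dtype=float)

def retrieve(query_hist, db_hists, k=10):
    """Rank database images against the query by KNN over BoW histograms."""
    knn = NearestNeighbors(n_neighbors=k).fit(db_hists)
    return knn.kneighbors(query_hist.reshape(1, -1), return_distance=False)[0]
```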

Datasets
The effectiveness and accuracy of an image retrieval system is tested by selecting suitable image datasets. Some databases are tailored to the nature of a project, and most contributions are mainly domain-oriented; moreover, comparing results with existing methods is a big challenge. Different image databases are used according to their complexity and versatility, generic CBIR usage, object occlusion, object information and spatial color. Experiments are performed on a variety of standardized benchmarks, including Cifar-100 [63], Cifar-10 (10) [63], ALOT (250) [19], Corel-1000 (10) [19], Corel-10000 (10) [19] and Fashion (15) [64]. These challenging benchmarks cover a wide range of image semantic groups. Result accuracy is affected by image attributes such as color, quality, occlusion, cluttering, overlapping, size and object location [65]. The selected datasets are characterized by the diversity of their image categories, drawn from various areas, with several types of objects positioned in the background and foreground [18].

Input Process
The query image input to the system is normally a color image. This color image is converted to a gray-scale image for the proposed algorithm, while the color image is also input to the convolutional neural network. In the input process, the input image is selected from the image benchmarks; in the proposed work, the input images were taken from Cifar-100 (10), Cifar-10 (10), ALOT (250), Corel-10000 (10), Corel-1000 (10) and Fashion (15). Images were sampled with 70% and 30% proportions for training and testing, respectively, and random images were selected from each category using permutation. The training and testing time of ResNet varied for each dataset, depending upon the number of images, image size and number of categories, along with hardware aspects including batch size, processor, DL library and GPU scaling. Normally 15 to 75 epochs are used on a single-node GPU, with a variable training time of ~3-450 min.
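The 70/30 split by random permutation can be written per category as follows; the function name and fixed seed are illustrative.

```python
import numpy as np

def split_indices(n_images, train_frac=0.7, seed=0):
    """Random-permutation split of one category into 70% train / 30% test."""
    perm = np.random.default_rng(seed).permutation(n_images)
    cut = int(train_frac * n_images)
    return perm[:cut], perm[cut:]
```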

Evaluation of Precision and Recall
Precision and recall are two metrics used to evaluate the accuracy of performance: precision measures the positive predicted values, and recall evaluates the true positive rate. The precision is computed for every category using Equation (16) [18], and the recall using Equation (17) [18], as follows:

$$ P = \frac{G_{w(n)}}{G_{u(n)}} \qquad (16) $$

$$ R = \frac{G_{w(n)}}{G_o} \qquad (17) $$

where $G_{w(n)}$ denotes the retrieved images relevant to the query image, $G_{u(n)}$ denotes the total number of images retrieved for the query image, and $G_o$ denotes the total number of related images available in the database.

Evaluation of Average Retrieval Precision (ARP)
Average retrieval precision (ARP) graphs show the average retrieval precision of the proposed method for the various datasets. The ARP is computed using Equation (18) [19] as follows:

ARP = (1/k) * Σ_{i=1}^{k} AP_i    (18)

In Equation (18), AP denotes average precision and k denotes the total number of categories, so ARP averages the precision over all categories of each dataset. The ARP graph shows the data in sequence, where every bar represents the number of correctly retrieved images regardless of category, and the x-axis plots the number of classes against average precision. The average precision gradually decreases as the number of categories increases, because a large number of categories yields a large denominator. ARP was computed for the datasets Cifar-10, Cifar-100, ALOT (250), Corel-1000 (10), Corel-10000 and Fashion (15).
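Equation (18) in code form, assuming a list of per-category average precision values:

def arp(average_precisions):
    # Eq. (18): mean of the per-category average precisions (k categories).
    return sum(average_precisions) / len(average_precisions)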

Evaluation of F-Measure
The f-measure is computed as the harmonic mean of average precision (p) and recall (q) using Equation (19) [66] as follows:

F = 2pq / (p + q)    (19)

In Equation (19), F denotes the f-measure, p the average precision and q the recall.
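Equation (19) transcribed directly:

def f_measure(p, q):
    # Eq. (19): harmonic mean of average precision p and recall q.
    return 2 * p * q / (p + q)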

Experimental Results and Discussion
The experimentation was performed on a Core i7 machine with a GPU and 8 GB RAM. MATLAB R2019a with the CNN toolbox provided the training and testing environment. Extensive experiments were performed on a variety of datasets to endorse the validity of the results.

Results on Large Data
Experiments were performed on large datasets such as Cifar-10 and Cifar-100 to test the effectiveness of the proposed method. The Cifar-10 database contains 60,000 images in 10 different categories of 32 × 32 RGB color images [63], covering semantic groups such as birds, frogs, ships, dogs, cars, cats, airplanes, horses, deer and trucks, with 6000 images in each category. It is therefore essential to produce the retrieval results with the highest throughput, and the computational load is an important factor at this stage. Our technique controlled it by applying proper sampling and reduction of features in three stages: first, at the symmetry-of-sampling stage (Section 3.1) to maintain harmony of samples; secondly, at the subsampling of features (Section 3.3); and finally, by applying principal component analysis (Section 3.7). These three levels of work resulted in prompt feature extraction and quick user response with low computational load, endorsed by the statistical fact that the aggregate time for feature detection, extraction, fusion, CNN extraction and BoW indexing of an image was ~0.01 to 0.015 s.
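The cost of the final reduction stage can be profiled with a sketch such as the following; the descriptor matrix size and the 32 retained components are placeholders, and the measured time covers only the PCA stage, not the full per-image pipeline quoted above.

import time
import numpy as np
from sklearn.decomposition import PCA

descriptors = np.random.rand(500, 128)          # hypothetical local features
t0 = time.perf_counter()
reduced = PCA(n_components=32).fit_transform(descriptors)
print(f"PCA stage: {time.perf_counter() - t0:.4f} s, shape {reduced.shape}")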
The proposed method showed the highest average precision ratios in seven categories of the Cifar-10 dataset. Figure 2 shows sample images of the categories in which the proposed method achieved the highest average precision rates. The images were classified correctly due to the deep learning features used in the proposed method; the integration of image sampling and scaling with CNN features made it possible to correctly classify images from a large range of semantic groups, such as airplane, deer, ship, truck, horse, dog and bird. The proposed method provided above 95% mean average precision for these categories and also showed better average precision results in some other categories, such as frogs, cars and cats. Figure 3 is the graphical representation of the average precision (AP) rates in seven categories of the Cifar-10 dataset: Figure 3a reports the highest AP rates in some categories and Figure 3b reports an outstanding recall ratio. The tabular representation of AP rates for the Cifar-10 dataset is shown in Table 1. The proposed method showed above 90% average precision for image categories such as deer, horses, dogs and birds, and above 85% average precision for airplanes and trucks. The category dogs reported a 100% AP rate, which shows the strength of the proposed method; above 70% AP rates were achieved for the remaining categories, and the mAP was 88% over the 10 categories of the Cifar-10 dataset. Figure 4a shows the average retrieval precision (ARP) for the 10 categories of the Cifar-10 dataset. The proposed method reported the highest ARP ratios for the categories airplanes and frogs, and the other categories also showed above 90% ARP rates, demonstrating outstanding performance on the Cifar-10 dataset. Figure 4b shows the f-measure rates for the proposed method as a pie chart: the categories airplanes, frogs, trucks, horses and dogs reported a 9% f-measure rate, and the other categories an 11% f-measure rate.
The Cifar-100 dataset is the same as the Cifar-10 dataset, with 32 × 32 RGB color images, except that it contains 100 different categories. The Cifar-100 dataset contains various semantic groups such as bowls, rabbit, clock, lamp, tiger, forest, mountain, butterfly, elephant, willow, bus, person, house, road, palm, tractor, rocket and motorcycle, with 600 images in each category. Sample images of the Cifar-100 dataset are shown in Figure 5. The proposed method showed remarkable average precision ratios in most of the Cifar-100 categories, achieving up to 80% AP in most of the complex image categories, as shown in Table 2. The images were classified well by the presented method, which used image sampling and shape-based smoothing in combination with CNN features to classify images of different semantic groups, including rabbit, whale, trout, flatfish, otter, sunflower, roses, apple, orange, mushroom, bottle, cups, plates, chair, wardrobe, bridge, house, camel, elephant, kangaroo, girl, man, palm, pine, willow, bus, train, rocket, tank, tractor, spider, snail, lizard and turtle. The proposed method reported 100% average precision for many categories, as listed in Table 2, and above 84% mean average precision (mAP) over all categories. It also provided significant ARP ratios for most of the categories, with f-measures between 18% and 30%. The proposed method thus reported significant average precision rates for the large Cifar-100 dataset, showing excellent performance with a 100% average precision rate in many categories and more than 80% in the others.
The strength of the proposed method was its significant average precision results for large datasets such as Cifar-10 and Cifar-100.
The average retrieval precision (ARP) for the Cifar-100 dataset is shown in Figure 6. The proposed method showed outstanding ARP rates for the Cifar-100 dataset. It was observed that above 80% results were achieved in all categories.

Results on Texture Datasets
The ALOT (250) and Fashion (15) datasets are challenging benchmarks for image categorization and classification, mainly used to classify texture images from semantic groups. Moreover, the number of categories is an important factor in the domain of content-based image retrieval; for this challenging reason, a large database consisting of 250 categories, the ALOT dataset, was used to test the effectiveness and versatility of the proposed method. The ALOT database [19] contains 250 categories with 100 samples each, at a resolution of 384 × 235 pixels [19]. The various semantic groups in the ALOT dataset include fruit, vegetables, clothes, spices, stones, cigarettes, sands, leaves, coins, sea shells, seeds, fabrics, bubbles, embossed fabrics, vertical and horizontal lines and small repeated patterns. These categories contribute different spatial information, objects, object shapes and texture information for classifying images. The presented method effectively classified the texture images from semantically similar groups with similar foreground and background objects. Symmetric sampling and norm steps were applied by the proposed method to achieve remarkable results for images with different textures, and the images were effectively classified using CNN features with image sampling, scaling integration and shape-based filtering. Scaling at different levels and symmetric sampling were used to achieve significant AP rates for various texture images. In the ALOT dataset, most of the categories contain texture images with similar patterns and colors, whereas other categories contain different object patterns; the presented method showed significant results, with up to 80% average precision rates in most of the challenging categories. Sample images of the ALOT dataset with similar colors and patterns are shown in Figure 7. The proposed method showed significant average precision results for images with similar colors and patterns, as shown in Figure 8. It was observed that texture images of vertical lines with the same color but different line directions were efficiently classified, with significant results across image categories. Gaussian smoothing and shape-based filtering with CNN features made it possible to efficiently classify the texture images from different image categories.
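As a minimal illustration of the smoothing applied to such textures before feature extraction (the sigma value, box size and random placeholder image are assumptions, not the paper's parameters):

import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

texture = np.random.rand(384, 235)              # placeholder ALOT-sized image
smoothed = gaussian_filter(texture, sigma=1.6)  # Gaussian smoothing
box_approx = uniform_filter(texture, size=5)    # box-filter approximation of it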

The proposed method showed remarkable average precision ratios for texture images due to the Gaussian smoothing and spatial mapping applied in the presented technique. Most of the categories showed above 80% AP rates, as shown in Figure 8a, and only one category reported a 70% AP rate. Figure 8b shows outstanding mean average precision rates for the image categories leaf, stone and fabric, all three of which showed between 90% and 100% mAP. Sample images of the ALOT dataset with different colors and textures are shown in Figure 9; the RGB coefficient step was used by the proposed method to classify images of different colors.

Table 3 shows the average precision ratios for the ALOT dataset for image categories with leaf, stone, bubble, spices, sea shells, vegetables, fruit, seeds, beans, coins and fabric textures [19]. It was observed that the proposed method showed above 90% AP rates in most of the bubble texture categories and outstanding AP ratios in the stone texture categories. The leaf texture category also showed significant results, with 90% or more AP in most cases, and above 85% AP was achieved in the fabric texture category. The proposed method was also experimentally applied to some other categories, such as stones, cigarettes, vegetables, beans, coins, spices and fruit, achieving above 90% results in most of them. It was noticed that the proposed method showed improved performance for most of the image categories with different shapes and colors: image sampling, shape-based filtering, RGB coefficients and spatial mapping with CNN features made it possible to classify the images effectively and efficiently. Categories such as spices, seeds, vegetables and fruit, which differ in color, showed above 90% results, and the image categories for embossed fabrics, bubble textures and others were likewise classified accurately. Overall, mean average precision was above 93% for all categories of the ALOT (250) dataset.
The versatility and superiority of the proposed method was tested by experimenting with the fashion dataset. The fashion dataset is well suited to texture analysis, since it contains images with various types of texture, shapes and color objects. It is a challenging set of 15 object categories comprising 293,800 HD images; the object categories contain different types of fabrics such as uniform, jacket, long dress, shirt, suit, cloak, blouses, sweater, jersey t-shirt, polo-sport shirt, robe, undergarments, vest-waistcoat and coat [64]. The dataset contains more than 260 thousand images with different foreground and background textures. The proposed method performs well on cluttered and complex objects owing to its object recognition capability, and the image classification performed remarkably, with improved AP and AR rates for overlapping, complex and cluttered objects. Figure 10 shows sample images of the fashion dataset. Figure 11 shows the average precision and average recall rates for all 15 categories of the fashion dataset. Figure 11a shows the significant AP results: three out of 15 categories show 100% AP, whereas the other categories also show remarkable results with more than a 70% AP rate. Only one category, vest-waistcoat, showed a 40% AP rate, due to complex backgrounds and fake-color images.
The proposed method also showed improved results for overlaid and complex images, as shown in Figure 11b, where the categories coat and uniform reported significant AR rates. The performance of the proposed method was also measured using mean average precision, and more than 80% mAP was achieved. Figure 12a shows the ARP for the fashion (15) dataset. The proposed method showed significant ARP rates for the fashion dataset, as it used the L2 color coefficient to effectively index and classify the images. Most of the categories, including blouses, jacket, coat, jersey t-shirt, long dress, robe and uniform texture, showed the outstanding performance of the proposed method, with ARP rates above 85% in many categories. The f-measure results for the fashion dataset are graphically represented in Figure 12b. The proposed method reported encouraging f-measure results: the category vest-waistcoat showed the highest f-measure at 10%, shirt and polo-sport shirt reported 8%, cloak and short dress showed 7%, and all other categories reported 6%. The significant f-measure results showed the superiority of the presented method on the fashion (15) dataset.
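The L2 color coefficient used for indexing can be pictured as an L2-normalized color descriptor; the sketch below is only an illustrative guess at that step, with the bin count and function name assumed.

import numpy as np

def l2_color_coefficient(rgb, bins=8):
    # Per-channel intensity histogram, L2-normalized; an illustrative
    # stand-in for the L2 color coefficient used for indexing.
    hists = [np.histogram(rgb[..., c], bins=bins, range=(0, 255))[0]
             for c in range(3)]
    vec = np.concatenate(hists).astype(float)
    return vec / (np.linalg.norm(vec) or 1.0)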

Results on Blobs
The Corel-1000 dataset is commonly used for image classification and retrieval [38,67,68]. Corel datasets consist of various image categories containing complex objects on plain backgrounds. The dataset contains 1000 images in ten categories, covering semantic groups such as food, flowers, animals, natural scenes, buses, buildings, mountains and people. Corel-1000 was tested for object detection and for the versatility of its image semantics; each semantic group contains 100 images with a resolution of 256 × 384 or 384 × 256 pixels. Figure 13 shows sample images of the Corel-1000 dataset.

Figure 13. Corel-1000 dataset: Sample images from each category [19].

The average precision results for the Corel-1000 dataset are shown in Figure 14a. The proposed method effectively classified the blob images from semantically different groups containing different foreground and background images; the images of the Corel-1000 dataset were classified efficiently due to the deep learning features of the proposed method. Image sampling, scaling integration, shape-based filtering, RGB coefficients and spatial mapping with CNN features made it possible to effectively classify the images, and the average precision results show the superiority of the proposed method on blob images due to symmetric sampling, shape-based filtering and RGB coefficient mapping. The proposed method showed significant performance in most of the categories, such as beaches, buildings, buses, dinosaurs, flowers, mountains, horses and food. For complex categories including dinosaurs, flowers and horses, the proposed method reported 100% AP rates; the categories buses and mountains showed 97% and 95% AP rates, respectively, and the other categories showed above 75% AP rates. The mean average precision of the proposed method was more than 89%. The presented method also showed remarkable results for average recall, as shown in Figure 14b, where the categories buses, dinosaurs, flowers and horses reported significant performance with a 0.10 AR rate.

The ARP for the Corel-1000 dataset is shown in Figure 15a; the proposed method showed remarkable ARP results. Figure 15b shows the f-measure results of the proposed method for the Corel-1000 dataset: the categories African, buildings, elephants and food showed an 11% f-measure, mountains and beaches reported a 10% f-measure, and the other categories showed 9% results.

Results for Small and Tiny Images
The Corel-10000 dataset [19] contains various image categories. The Corel-10000 database comprises hundreds of categories, where each category contains 100 images. The image size is 128 × 85 or 85 × 128 pixels for every semantic group, so the images of the Corel-10000 dataset are small. The dataset contains various semantic groups such as butterfly, ketch, cars, planets, flags, texture, shining stars, text, hospital, flowers, food, sunset, animals, human texture and trees. Figure 16 shows sample images of Corel-10000.

Figure 16. Corel-10000 dataset: Sample images from some categories [19].

Table 4 shows the average precision, average recall, ARP and f-measure of the proposed method for the Corel-10000 dataset. The proposed method showed outstanding performance in most of the image categories, with average precision rates between 70% and 100% and significant average recall rates; most of the complex categories reported better performance with a 0.10 AR rate. The proposed method showed improved performance for different categories with images of various shapes and colors. The ARP results likewise showed outstanding performance, with above 85% ARP ratios for most of the image categories, and significant f-measure results for many categories. The Cifar-100 dataset contains tiny images, and the proposed method reported outstanding average precision and f-measure results for these tiny and complex images, providing significant results for the tiny images of the Cifar-100 database.

Results of the Corel-1000 Dataset with Existing State-of-the-Art Methods
To test the effectiveness and accuracy of the proposed method, the results on the Corel-1000 dataset were compared with existing state-of-the-art methods, including CDLIR [69], CBSSC [70], CRHOG [71], GRMCB [72], RLMIR [73], IKAMC [74], AMCI [75] and IRMSR [76]. A graphical comparison of the average precision of the proposed method with these methods is shown in Figure 17. The proposed method showed outstanding performance in most of the categories compared with the other methods, reporting the highest average precision rates in the categories African, beaches, dinosaurs, flowers, horses, mountains and food. However, existing state-of-the-art methods showed better average precision results in some categories, namely buildings, buses and elephants, in which the proposed method still showed good accuracy: RLMIR [73] reported better AP for the category buses, CRHOG [71] showed improved AP for the category buildings, and GRMCB [72] provided a better result for the category elephants.

Figure 17. Comparison of the average precisions attained by the proposed method and other standard retrieval systems using the Corel-1000 dataset.

Table 5 shows the average precision ratios of the proposed method compared with the existing state-of-the-art methods. The proposed method showed significant average precision rates in the African, food, buses, dinosaurs, flowers, horses, mountains and beaches categories. Figure 18 compares the proposed method with the existing methods in terms of mean average precision. The proposed method showed the highest mAP at 0.89. CBSSC [70] reported the second highest mAP at 0.78, while GRMCB [72] and RLMIR [73] provided an mAP of 0.76. CDLIR [69], AMCI [75], IRMSR [76] and CRHOG [71] showed mAP values between 0.66 and 0.76, and IKAMC [74] reported the lowest mAP at 0.64.

Limitations
The only pitfall of our approach is that it is not applicable to satellite images.

Conclusions
This paper presents a novel technique that detects salient objects and spatial color and texture features in a novel way to represent the image contents accurately, and combines them with the signatures extracted by the ResNet architecture. The extracted image feature candidates are powerful information carriers of image contents, strengthened by the signatures produced by the convolutional neural network ResNet. Signatures oriented around image smoothing, sampling, suppression and scaling are capable of recognizing deep image features. The proposed method shows remarkable results on most of the datasets, including the 250 categories of the ALOT benchmark and the 100 categories of Corel-10000. Remarkable results are achieved on the challenging Cifar-100 and Cifar-10 benchmarks, and texture features are finely distinguished at high precision on the fashion dataset. An extension of this contribution is to run it on the cloud for the VOC benchmark.