Dihedral Group D4—A New Feature Extraction Algorithm

In this paper, we propose a new feature descriptor for images that is based on the dihedral group D4, the symmetry group of the square. The group action of the D4 elements on a square image region is used to create a vector space that forms the basis for the feature vector. For the evaluation, we employed the Error-Correcting Output Coding (ECOC) algorithm and tested our model on four diverse datasets. The results indicate that the feature vectors obtained from the proposed D4 algorithm are comparable in performance to those of the Histograms of Oriented Gradients (HOG) model. Furthermore, because the D4 model encapsulates the complete set of orientations pertaining to the D4 group, it can be generalized to a wide range of image classification applications.


Introduction
In computer vision, a feature vector or descriptor for an image region is usually defined by mathematical operations on a set of neighboring pixels in the image region. These operations generally result in a compact representation of the image region, which reduces the computational complexity associated with classification tasks. An optimal feature vector should provide a suitable representation of an object or image region that enables its discrimination from the other objects or image regions in the scene.
The Histograms of Oriented Gradients (HOG) model is based on the idea that an object's shape and appearance can be characterized by the distribution of local intensity gradients [1]. A feature vector in the HOG algorithm is calculated by dividing an image into smaller regions called cells and, for each cell, accumulating a histogram of gradients over all pixels in the cell [1]. The local gradients are contrast-normalized by selecting larger regions called blocks and using the results to normalize all the cells in a block [1]. In their study [1], Dalal and Triggs observed that the HOG-based feature vector outperformed the wavelet [9], PCA-SIFT [10], and shape context [11] based descriptors for a human detection test case.
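The per-cell histogram described above can be sketched as follows. This is a minimal illustration, not the implementation from [1]: the function name `cell_histogram`, the use of 9 bins over unsigned orientations in [0, 180), and the magnitude weighting are assumptions made here, and the block-level contrast normalization is omitted.

```python
import numpy as np

def cell_histogram(cell, num_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    Assumption: 9 bins over [0, 180) degrees; block normalization is not included.
    """
    cell = np.asarray(cell, dtype=np.float64)
    gy, gx = np.gradient(cell)                       # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)                     # gradient strength per pixel
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned orientation in degrees
    hist, _ = np.histogram(angle, bins=num_bins, range=(0.0, 180.0), weights=magnitude)
    return hist

# A horizontal intensity ramp: every pixel has the same gradient direction.
ramp = np.tile(np.arange(8.0), (8, 1))
ramp_hist = cell_histogram(ramp)
```

For the ramp, all gradient energy falls into the first orientation bin, which is the behavior one would expect from a cell containing a single edge direction.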
Based on the success obtained by using local gradients or edge orientations in the HOG model [1], we hypothesize that applying the D4 elements to a square image region can capture the local gradients. We investigate whether the inherent properties of the complete set of elements of the D4 group can form a natural basis for calculating a feature vector suitable for image discrimination. The D4 group has shown promising results in various computer vision applications [12][13][14][15][16][17], which motivated us to use this group for our proposed algorithm.
The rest of the article is organized as follows. In Section 2, we briefly discuss the theory behind the dihedral group D4. In Section 3, we outline the proposed D4 algorithm for calculating the feature vector associated with a given image. In Section 3.3, we briefly describe the databases used for testing the performance of the proposed model. In Section 3.4, we briefly explain the ECOC algorithm that is used for classification. In Section 4, we discuss the results obtained for the different datasets used in this paper. In Section 5, we discuss the different customizable aspects of the proposed D4 model and possible future research directions. Finally, based on the results, we outline our conclusions.

Theory
A dihedral group Dn is the group of symmetries of a regular n-sided polygon, i.e., a polygon in which all sides have the same length and all angles are equal. Dn has n rotational symmetries and n reflection symmetries. In other words, it has n axes of symmetry and a total of 2n different symmetries [18]. For instance, the polygons for n = 3 to 6 and the associated reflection symmetries are shown in Figure 1. Here, we can see that, if n is odd, each axis of symmetry connects a vertex with the midpoint of the opposite side. If n is even, there are n/2 symmetry axes connecting the midpoints of opposite sides and n/2 symmetry axes connecting opposite vertices.
A group is a set G together with a binary operation * on its elements. This operation * must behave such that:
(i) G must be closed under *, that is, for every pair of elements $g_1, g_2$ in G we must have that $g_1 * g_2$ is again an element in G.
(ii) The operation * must be associative, that is, for all elements $g_1, g_2, g_3$ in G we must have that $(g_1 * g_2) * g_3 = g_1 * (g_2 * g_3)$.
(iii) There is an element e in G, called the identity element, such that for all g ∈ G we have that e * g = g = g * e.
(iv) For every element g in G there is an element $g^{-1}$ in G, called the inverse of g, such that $g * g^{-1} = e = g^{-1} * g$.
The group D4 has eight elements: four rotational symmetries and four reflection symmetries. The rotations are by 0°, 90°, 180°, and 270°, and the reflections are defined along the four axes shown in Figure 1. We refer to these elements as $\sigma_0, \sigma_1, \ldots, \sigma_7$. Note that the identity element is rotation by 0°, and that for each element there is an element (possibly the element itself, as for the reflections) that has the opposite effect on the square, as required in the definition of a group. As an example of one of the group elements, consider Figure 2, where we demonstrate rotation by 90° counterclockwise on a square with labeled corners.
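To make the group structure concrete, the eight elements can be written as array operations and the closure and inverse properties checked numerically. This is a sketch under assumptions: the ordering of the elements (rotations first, then reflections) and the names `D4_ELEMENTS`, `signature`, and `compose` are chosen here for illustration and are not taken from the paper.

```python
import numpy as np

# Assumed ordering: indices 0-3 are rotations, indices 4-7 are reflections.
D4_ELEMENTS = [
    lambda b: b,                  # rotation by 0 degrees (identity)
    lambda b: np.rot90(b, 1),     # rotation by 90 degrees counterclockwise
    lambda b: np.rot90(b, 2),     # rotation by 180 degrees
    lambda b: np.rot90(b, 3),     # rotation by 270 degrees counterclockwise
    lambda b: np.flipud(b),       # reflection about the horizontal axis
    lambda b: np.fliplr(b),       # reflection about the vertical axis
    lambda b: b.T,                # reflection about the main diagonal
    lambda b: np.rot90(b, 2).T,   # reflection about the anti-diagonal
]

def signature(transform):
    """Identify a transform by its effect on a 2 x 2 square with distinct corner labels."""
    square = np.arange(4).reshape(2, 2)
    return transform(square).tobytes()

def compose(f, g):
    """Return the transform 'apply g, then f'."""
    return lambda b: f(g(b))

signatures = [signature(t) for t in D4_ELEMENTS]

# Closure: composing any two elements gives one of the eight elements again.
closed = all(signature(compose(f, g)) in signatures
             for f in D4_ELEMENTS for g in D4_ELEMENTS)

# Inverses: every element can be undone by some element of the set.
has_inverses = all(any(signature(compose(f, g)) == signatures[0] for g in D4_ELEMENTS)
                   for f in D4_ELEMENTS)
```

Because the action of the group on the four labeled corners is faithful, comparing the transformed 2 x 2 squares is enough to tell the elements apart.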

Method
In this section, we describe the details of our proposed algorithm. First, we discuss the colorspace used for our proposed model. Second, we describe the procedure used to obtain the D4-based feature vector from a given image. Third, we discuss the conditions under which the proposed model can generate sparse feature vectors and describe our proposed solution to mitigate that problem. Fourth, we briefly explain details of the four different databases used in our analysis. Fifth, we outline details of the procedure used for the analysis and discuss the ECOC algorithm.

De-Correlated Color Space
As a first step, to reduce redundant information across the color channels, the input RGB color image I is de-correlated. In line with the study by Sharma [17], the color channels are de-correlated as follows. First, the matrix entries of I are reorganized to create a two-dimensional matrix M of size w × n, where n is the number of channels and w is the length of each channel vector, i.e., the product of the number of rows and the number of columns of the image. In the case of an RGB image, n = 3. After that, the matrix entries of M are normalized by the mean for each channel. Next, we calculate the correlation matrix of M as
$$C = M^{\mathsf{T}} M, \tag{1}$$
where the size of C is n by n. Following this, the eigendecomposition of the symmetric matrix C is calculated as
$$C = V D V^{\mathsf{T}}, \tag{2}$$
where V is a square matrix whose columns are the eigenvectors of C, while D is the diagonal matrix whose diagonal entries are the corresponding eigenvalues. Finally, the RGB image channels are transformed into the eigenvector space (also known as principal components) as
$$S = V^{\mathsf{T}} (M - \mu)^{\mathsf{T}}, \tag{3}$$
where µ is the mean for each channel and S is the transformed space matrix that represents the de-correlated channels. As an example, the de-correlated channels of an RGB image are shown in Figure 3.
If the input image is grayscale, we perform histogram equalization on the image and normalize it to the range [0, 1].
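The de-correlation steps for a color image can be sketched with NumPy as below. This is an illustrative PCA-style projection rather than the implementation used in the paper: the function name `decorrelate_channels`, the pixels-as-rows layout, and the absence of any scaling of the correlation matrix are assumptions made here.

```python
import numpy as np

def decorrelate_channels(image):
    """Project the channels of an H x W x n image onto their principal components.

    Assumption: pixels are stored as rows and channels as columns of the data matrix.
    """
    rows, cols, n = image.shape
    m = image.reshape(rows * cols, n).astype(np.float64)  # one row per pixel
    mu = m.mean(axis=0)                                   # per-channel mean
    centered = m - mu                                     # mean-normalized channels
    c = centered.T @ centered                             # n x n symmetric matrix
    eigenvalues, v = np.linalg.eigh(c)                    # columns of v are eigenvectors of c
    s = centered @ v                                      # de-correlated channels, one per column
    return s.reshape(rows, cols, n), eigenvalues

# Synthetic image whose second channel is strongly correlated with the first.
rng = np.random.default_rng(0)
base = rng.random((16, 16, 1))
image = np.concatenate([base, 0.5 * base + 0.1 * rng.random((16, 16, 1)),
                        rng.random((16, 16, 1))], axis=2)
decorrelated, eigenvalues = decorrelate_channels(image)
```

`np.linalg.eigh` is used because the matrix is symmetric, so the eigenvectors it returns are orthonormal and the transformed channels have no cross-correlation.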

Proposed D4 Model
To calculate the feature vector associated with an input image, we decompose the image into k square regions of size N by N pixels each, as shown in Figure 4. Note that the choice of N can influence the results, as discussed in Section 4. If the image size is not a multiple of the square region size, the image borders are extended by padding with neighboring information.
Let B (i.e., a square region) be an N × N matrix and $\sigma_i$ one of the eight group elements of D4. The eight elements are the rotations by 0°, 90°, 180°, and 270°, and the reflections along the horizontal, vertical, and two diagonal axes of the square. As an example, the eight group transformations pertaining to a square block of an image are shown in Figure 5. As the asymmetry associated with rotation by 0° is trivial, there are only seven unique asymmetries to be considered; these seven asymmetries are used in the proposed algorithm. The asymmetry of the square region B under $\sigma_i$, denoted by $A_i(B)$, is defined as
$$A_i(B) = \frac{1}{N^2} \sum_{x, y} \sqrt{\left| B(x, y) - (\sigma_i B)(x, y) \right|}, \tag{4}$$
where i ∈ {1, 2, 3, 4, 5, 6, 7}, the sum runs over all pixel positions (x, y) in the region, and $N^2$ is the total number of pixels in each square region. In other words, the asymmetry for each unique group element is represented by a non-negative real value, obtained as the mean of the square roots of the absolute entrywise differences between B and $\sigma_i B$.
Finally, the seven scalar asymmetry values associated with each square region in the image are collected in a matrix R and normalized to the range [0, 1] for each element. Figure 6 shows the different features associated with a cat image captured by the different asymmetries $R_1$ to $R_7$. Collecting the asymmetries over all regions and channels results in a k × 7 × 3 sized feature vector, where k corresponds to the number of blocks into which the input image is divided, 7 corresponds to the number of different asymmetries, and 3 corresponds to the number of channels of the input RGB image. The resulting feature vector is then used for image classification tasks. The proposed D4 model was implemented in MATLAB, and its implementation will be made available at the MathWorks File Exchange website.
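The asymmetry computation for a single channel can be sketched in Python as follows. The paper's implementation is in MATLAB; this version is only an illustration, and the function names, the order of the seven transformations, and the use of edge replication for padding are assumptions. The final normalization to [0, 1] is left out.

```python
import numpy as np

def d4_asymmetries(block):
    """Seven asymmetry values of a square block: mean square root of absolute differences."""
    block = np.asarray(block, dtype=np.float64)
    transformed = [
        np.rot90(block, 1),      # rotation by 90 degrees
        np.rot90(block, 2),      # rotation by 180 degrees
        np.rot90(block, 3),      # rotation by 270 degrees
        np.flipud(block),        # reflection about the horizontal axis
        np.fliplr(block),        # reflection about the vertical axis
        block.T,                 # reflection about the main diagonal
        np.rot90(block, 2).T,    # reflection about the anti-diagonal
    ]
    return np.array([np.mean(np.sqrt(np.abs(block - t))) for t in transformed])

def d4_features(channel, n):
    """Split one channel into non-overlapping n x n blocks and return a k x 7 array."""
    channel = np.asarray(channel, dtype=np.float64)
    pad_rows = (-channel.shape[0]) % n
    pad_cols = (-channel.shape[1]) % n
    # Assumption: borders are extended by repeating the nearest edge pixels.
    padded = np.pad(channel, ((0, pad_rows), (0, pad_cols)), mode="edge")
    rows, cols = padded.shape
    return np.array([d4_asymmetries(padded[r:r + n, c:c + n])
                     for r in range(0, rows, n)
                     for c in range(0, cols, n)])

# A block that is symmetric under the horizontal-axis reflection only.
stripes = np.array([[0.0, 1.0], [0.0, 1.0]])
stripe_asymmetries = d4_asymmetries(stripes)
```

For the striped block, the asymmetry under the horizontal-axis reflection is 0 because the block is unchanged, while the asymmetry under the vertical-axis reflection is 1 because every pixel changes by one unit. Applying `d4_features` to each of the three channels and stacking the results gives the k × 7 × 3 layout described above.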

Special Case
A limitation of the proposed algorithm is that, for completely symmetric patterns such as those shown in Figure 7, the generated feature vectors will be sparse, because every asymmetry value of a block that is unchanged by the D4 transformations is zero. This can be addressed by selecting the square blocks with an overlap, as shown in Figure 8. Note that, for our calculations, we used an overlap of 50% for each block, which was an arbitrary choice.
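One way to generate overlapping blocks is to step through the image with a stride smaller than the block size. The sketch below is an assumption about how the overlap could be realized; the function name `block_origins` and the rounding of the stride are hypothetical choices, not details from the paper.

```python
def block_origins(height, width, n, overlap=0.5):
    """Top-left corners of n x n blocks whose neighbors overlap by the given fraction."""
    stride = max(1, int(round(n * (1.0 - overlap))))  # 50% overlap gives a stride of n / 2
    return [(r, c)
            for r in range(0, height - n + 1, stride)
            for c in range(0, width - n + 1, stride)]

overlapping = block_origins(16, 16, 8)               # 50% overlap
disjoint = block_origins(16, 16, 8, overlap=0.0)     # no overlap
```

For a 16 × 16 image and N = 8, this yields 9 blocks with 50% overlap instead of 4 without overlap, so a pattern that is symmetric within one block is also sampled by shifted blocks in which it is generally not symmetric.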

Databases
To evaluate the performance of the feature vector obtained from the D4 model, we used four different datasets: Cats and Dogs [19], Fashion-MNIST [20], Person [1], and NLC [21].
The Cats and Dogs [19] dataset consists of 8192 RGB color images of cats and dogs. A few sample images of the two categories are shown in Figure 9. As the pictures are taken against complex backgrounds, this dataset is considered to be quite challenging for machine learning algorithms [22].
The Fashion-MNIST [20] dataset consists of grayscale images of clothing items belonging to 10 different categories. Using this dataset enables us to explore our proposed model on image data that lack color information. A few sample images from the Fashion-MNIST [20] dataset are shown in Figure 10.
The Person [1] dataset consists of RGB color images of people in different upright positions (as shown in Figure 11) and is divided into two categories: positive samples, which contain people, and negative samples, which do not.
The NLC [21] dataset consists of color images of the sky, which are divided into four categories: noctilucent clouds, tropospheric clouds, clear sky, and rest. This is a unique dataset, as it does not contain the usual shapes, such as people, animals, and clothing items, that exist in the other datasets. A few sample images belonging to the different categories are shown in Figure 12.
For more details on the datasets, please see Table 1.

Procedure for Analysis
In this section, we outline the procedure used to analyze the proposed D4 model on the four datasets used in this paper. We compared the performance of the proposed D4 model with that of the HOG [1] model by employing the ECOC algorithm. For the training phase of the ECOC algorithm, we used 60% of the samples of each dataset, and for evaluating the performance we used the remaining 40%. The total number of samples used in our analysis and the size of the sample images for each dataset are shown in Table 1. The data samples of each database were randomized prior to selection for training and testing.
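A randomized 60/40 split of this kind can be sketched as follows. The function name `split_indices`, the fixed seed, and the rounding of the cut point are assumptions made for illustration; the paper does not specify these details.

```python
import numpy as np

def split_indices(num_samples, train_fraction=0.6, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    rng = np.random.default_rng(seed)            # seed fixed only for reproducibility
    order = rng.permutation(num_samples)         # randomize before selecting
    cut = int(round(train_fraction * num_samples))
    return order[:cut], order[cut:]

train_idx, test_idx = split_indices(100)
```

Shuffling before cutting ensures that both subsets draw from all classes even if the samples are stored class by class.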

ECOC Algorithm
The ECOC algorithm is suitable for problems that involve instances belonging to multiple classes or categories [23]. For instance, in an optical digit recognition task, each handwritten character can belong to one of the ten different classes associated with digits from 0 to 9.
The ECOC algorithm [23] is based on the distributed output code approach described in the study by Sejnowski et al. [24]. The general idea is to decompose a multi-class problem into several binary problems, each handled by a binary classifier (such as a support vector machine [25]). This means that, for a given class i, a classifier should be able to discriminate between the patterns of class i and those of the remaining classes [26]. In this manner, each class is assigned a unique n-bit binary string called a codeword [23], where each bit identifies the membership of the class for one classifier [27]. In the evaluation stage, the classification decision is based on the output codeword obtained from the binary classifiers. The distances between the output codeword and the codewords of the classes are calculated, and the class with the shortest distance is assigned as the predicted class. For our calculations, we used the ECOC model with SVM from MATLAB.
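The decoding step can be illustrated with a small sketch. It assumes a one-vs-rest coding design, following the description of class i against the remaining classes, and uses the Hamming distance between codewords; the coding design and distance used by the MATLAB model in the paper may differ, and the function names are hypothetical.

```python
import numpy as np

def one_vs_rest_codewords(num_classes):
    """Row i is the codeword of class i: only the bit of classifier i is set."""
    return np.eye(num_classes, dtype=int)

def decode(bits, codewords):
    """Return the class whose codeword is closest to the classifier outputs."""
    distances = np.sum(codewords != np.asarray(bits), axis=1)  # Hamming distance per class
    return int(np.argmin(distances))

codewords = one_vs_rest_codewords(4)
predicted = decode([0, 0, 1, 0], codewords)   # only the classifier of class 2 fired
```

Here the output codeword matches the codeword of class 2 exactly, so class 2 is predicted. Coding designs with longer codewords increase the distance between classes, which is what allows some classifier errors to be corrected.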

Results
In this section, we discuss the results obtained by testing different colorspaces and norm functions, and we compare the D4 model and the HOG [1] model across the four datasets.

Colorspace Selection
We used percentage accuracy as a metric for judging the performance of a model, which is defined as the ratio of number of samples with correct classification to that of the total number of samples used for testing.
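Written as a formula, the metric reads as follows, where the symbols $N_{\text{correct}}$ (number of correctly classified test samples) and $N_{\text{test}}$ (total number of test samples) are names introduced here for illustration:

```latex
\[
\text{Accuracy}\,(\%) = 100 \times \frac{N_{\text{correct}}}{N_{\text{test}}}
\]
```

As a purely illustrative calculation, 90 correct classifications out of 120 test samples would correspond to an accuracy of 75%.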
To see if the choice of colorspace can influence the performance of the proposed D4 model, we used the Person [1] dataset for testing. Our results, as shown in Table 2, indicate that the accuracy for the RGB colorspace is slightly lower than for the HSV and de-correlated (De-Corr) colorspaces. For our analysis in the rest of the paper, we use the HSV colorspace for the D4 model.

Norm Function Selection
We tested different norm functions for the calculation of the values associated with the asymmetries, as defined in Equation (4). Three functions, namely the L1 norm, the L2 norm, and our proposed norm defined by the mean square root of absolute differences, were evaluated on the Person [1] dataset. The results shown in Table 3 suggest that the choice of norm function influences the accuracy of the prediction, with our proposed norm giving the highest accuracy. Thus, this norm function was employed in our proposed D4 model.

Comparison for Different Databases
We compared the performance of the feature vectors generated by the proposed D4 model and the HOG [1] model on four different datasets. As shown in Table 4, the different block or cell sizes used in the D4 and HOG [1] models lead to different feature vector sizes. Note that N is the block size in the case of the D4 model and the cell size in the case of the HOG [1] model. A larger block or cell size generates a more compact feature vector for both the D4 and HOG [1] models. However, it can also reduce the accuracy, as illustrated by the results for the NLC [21] dataset, where the accuracy of both the D4 and HOG [1] models decreases when the block or cell size is increased from 8 to 16. A larger feature vector can capture more details associated with an image, but it also increases the computational complexity of the classification task.
The accuracy percentages in Table 4 indicate that the performance of the proposed D4 model is comparable to that of the HOG [1] model.
To see if combining the proposed D4 and the HOG [1] models could provide a better classification accuracy, we combined the feature vectors from both algorithms. The associated accuracies can be observed in Table 4. We note that the combined model outperforms the individual models for all four datasets. This indicates that there are differences between the feature vectors obtained from the proposed D4 model and the HOG [1] model, and that exploiting these differences can further improve the classification performance.

Discussion
As mentioned in Section 4.1, the proposed D4 model is influenced by the choice of colorspace used for the calculation of the feature vector. For images, the colorspaces that generate uncorrelated channels, such as L*a*b*, HSV, and De-Corr, give better classification accuracies than the traditional RGB channels.
The choice of norm function, as outlined in Section 4.2, can also influence the performance of the D4-based feature vector. We employed a custom norm function that returns a scalar value associated with a particular asymmetry as defined by the D4 group elements. The feature vectors generated by using this norm function give better accuracies than the feature vectors using the L1 and L2 norm functions. It should be noted that a recent study by Ballesteros and Salgado [28], where the authors explored the optimal parameters for the HOG [1] model, suggests that the choice of norm function depends on the task at hand.
As shown in Table 4, the choice of block size for calculating the feature vector of the D4 model can influence the accuracy of the classification. Using a larger block size generates a more compact feature vector, but it can also reduce the accuracy of prediction, as discussed in Section 4.3. This implies that the choice of block size depends on the type of classification problem.
The proposed D4 model calculates a feature vector for a given image based on seven unique asymmetries associated with the dihedral group D4. These asymmetries encapsulate the local gradients in a manner that makes them suitable for use as a feature vector. This can be observed from the results obtained in Section 4.3, where the performance of the D4 model is comparable to that of the HOG [1] model. Furthermore, the simplicity of the asymmetry calculations reduces the computational complexity of the D4 model.
The combined D4 and HOG model outperforms the individual D4 and HOG models for all the datasets used in this paper (as shown in Table 4). This implies that, for a given image, the feature vectors generated by the two models are not identical.
Studies by Bilgic et al. [29] and Hong et al. [30] on improving the robustness of the HOG [1] model and using it for real-time tasks suggest employing the AdaBoost algorithm, which combines the results of a set of weak classifiers over multiple iterations to provide a robust classification output. Similar approaches can be applied to the proposed D4 model to improve its robustness and enable its use in real-time applications. This is something we plan to address in the future.
In the future, the proposed D4-based feature vector approach can be extended to three-dimensional image data. This can be achieved by dividing the three-dimensional image space into cubes and by employing the symmetry group associated with a cube, i.e., using the S4 × S2 group transformations. S2 is the symmetric group of degree 2 and has two elements: the identity and the permutation interchanging the two points [18]. S4 is the symmetric group of degree 4, i.e., all permutations of a set of size four [18]. S4 has 24 elements, which are obtained by rotations about axes through opposite faces, opposite vertices (the space diagonals), and opposite edges of the cube.
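The element counts for the cube can be checked numerically. The sketch below uses signed permutation matrices, a standard matrix representation of the symmetries of a cube centered at the origin; this representation and the function name `cube_symmetries` are choices made here for illustration and are not part of the proposed method.

```python
import numpy as np
from itertools import permutations, product

def cube_symmetries():
    """All 3 x 3 signed permutation matrices: the symmetries of an axis-aligned cube."""
    matrices = []
    for perm in permutations(range(3)):              # which axis maps to which
        for signs in product((1, -1), repeat=3):     # whether each axis is flipped
            m = np.zeros((3, 3), dtype=int)
            for row, (col, sign) in enumerate(zip(perm, signs)):
                m[row, col] = sign
            matrices.append(m)
    return matrices

symmetries = cube_symmetries()
# Rotations preserve orientation, i.e., they have determinant +1.
rotations = [m for m in symmetries if round(np.linalg.det(m)) == 1]
```

The full set contains 48 matrices, matching the order of S4 × S2, and the 24 matrices with determinant +1 are the rotations of the cube. A three-dimensional analogue of the asymmetry features would therefore compare each cube with up to 47 transformed copies of itself.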

Conclusions
In this article, we proposed a new feature descriptor for images that has its basis in the elements of the dihedral group D4. The group action of the D4 elements on a square image region is used to create a vector space that forms the basis for the feature vector. For testing the performance of the D4-based feature vector, we used the Error-Correcting Output Coding (ECOC) algorithm. An evaluation was performed using four different datasets. Our results show that the proposed D4 algorithm is comparable in performance to the Histograms of Oriented Gradients (HOG) model. In addition, as the D4 model captures the complete set of orientations pertaining to the D4 group, it can be generalized to a wide range of image classification tasks. Finally, we outlined a few directions for future research.
Funding: This research received no external funding.