# Rotation Invariant Networks for Image Classification for HPC and Embedded Systems

## Abstract

## 1. Introduction

## 2. Related Works

#### 2.1. Continuous Filter Sampling

#### 2.2. Discrete Angular Sampling

## 3. Methodology

#### 3.1. Oriented Representation Mapping

Consider an input image $\mathrm{x}$ (Figure 2a). An oriented component is obtained by the product $\mathrm{x}\times {g}^{\phi}$, where $\phi $ is the orientation of an oriented edge detector $g$. We use a set of orientations indexed by $i=0,\dots ,N-1$, with $N\in {\mathbb{Z}}^{+}$, where $N$ is the number of orientations into which we want to decompose the input image. The set $[\mathrm{x}\times {g}^{{\phi}_{i}}]$ therefore contains $N$ oriented components (features) of the input image $\mathrm{x}$. We define $d\phi =2\pi /N$ as the angular sampling step, in radians, for a filter with periodicity $2\pi $. The orientation of the filter at position $i$ is then ${\phi}_{i}=2\pi i/N$.
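As a worked instance of this sampling, take $N=8$, the value used in the figures. The step and the resulting orientations are then

$$d\phi =\frac{2\pi }{8}=\frac{\pi }{4}\ (45^{\circ}),\qquad {\phi}_{i}=\frac{i\pi }{4}\in \left\{0,\frac{\pi }{4},\frac{\pi }{2},\frac{3\pi }{4},\pi ,\frac{5\pi }{4},\frac{3\pi }{2},\frac{7\pi }{4}\right\},\quad i=0,\dots ,7.$$

Because the filter has periodicity $2\pi $, the next sample ${\phi}_{8}=2\pi $ coincides with ${\phi}_{0}$, so these are exactly $N=8$ distinct orientations.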

Let $\mathsf{\Phi}$ be a mapping $\mathsf{\Phi}:{\mathbb{R}}^{m\times n}\to {\mathbb{R}}^{m\times n\times N}$, and $\mathsf{\Phi}\left(\mathrm{x}\right)$ a feature representation of $\mathrm{x}$.
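To make the shape of this mapping concrete, the following is a minimal NumPy sketch of the decomposition stage only: it turns an $m\times n$ image into an $m\times n\times N$ stack of oriented components. Several choices here are assumptions made for illustration and are not taken from the paper: the product $\mathrm{x}\times {g}^{\phi}$ is interpreted as a zero-padded 2D cross-correlation, the edge detector is an odd-phase Gabor filter, and the function names (`gabor_kernel`, `correlate_same`, `oriented_mapping`) and the parameter values (`size`, `sigma`, `wavelength`) are hypothetical.

```python
import numpy as np

def gabor_kernel(phi, size=7, sigma=2.0, wavelength=4.0):
    """Odd-phase Gabor filter oriented at angle phi (radians).

    Assumed parameter values; the odd (sine) phase gives the filter
    a periodicity of 2*pi in phi, since g(phi + pi) = -g(phi).
    """
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate along the direction phi.
    xr = xs * np.cos(phi) + ys * np.sin(phi)
    envelope = np.exp(-(xs**2 + ys**2) / (2.0 * sigma**2))
    return envelope * np.sin(2.0 * np.pi * xr / wavelength)

def correlate_same(x, k):
    """Same-size 2D cross-correlation with zero padding (odd kernel sizes)."""
    kh, kw = k.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += k[dy, dx] * padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out

def oriented_mapping(x, n_orient=8):
    """Map an (m, n) image to an (m, n, N) stack of oriented components."""
    phis = 2.0 * np.pi * np.arange(n_orient) / n_orient  # phi_i = 2*pi*i/N
    comps = [correlate_same(x, gabor_kernel(p)) for p in phis]
    return np.stack(comps, axis=-1)

x = np.random.default_rng(0).standard_normal((28, 28))
feat = oriented_mapping(x, n_orient=8)
print(feat.shape)  # (28, 28, 8)
```

With an odd-phase filter, the components at ${\phi}_{i}$ and ${\phi}_{i}+\pi $ are negatives of each other, which is why all $N$ samples over $[0,2\pi )$ carry distinct information only after a sign-sensitive nonlinearity; this is a property of the assumed filter, not a claim about the paper's design.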

#### Gabor Filters

#### 3.2. Feature Extraction

#### State-of-the-Art Architectures as Backbone (ResNet)

#### 3.3. Translating Predictor

#### 3.4. Memory-Footprint vs. Data Parallelism

## 4. Results

#### 4.1. MNIST

#### 4.2. CIFAR-10

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

**Figure 1.** Rotational Invariant CNN architecture. Input is class number 4 rotated 90° clockwise. ($N=8$, $K=10$ convolutions in (**e**) scan each place with shared weights).

**Figure 2.** Oriented mapping. First, the image is decomposed into a set of N oriented components. Second, each result is re-oriented to the reference. This becomes a roto-translational feature stage.

**Figure 3.** Oriented representation mapping to obtain the roto-translational feature space. Left side: upright input (**a**) and its oriented components (**b**) generate a translating output as the product of the rotation compensation (**c**). Right side: rotated input (**d**) and its oriented components (**e**), with the output translated by one position along axis N (**f**).
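The behaviour described in this caption, where rotating the input shifts the output along axis N, can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's implementation: it uses $N=4$ so that every rotation is an exact multiple of 90° (via `np.rot90`, with no interpolation), a random $3\times 3$ array as a stand-in for the oriented filter, and zero-padded cross-correlation as the product. The helper names are hypothetical.

```python
import numpy as np

def correlate_same(x, k):
    """Same-size 2D cross-correlation with zero padding (odd kernel sizes)."""
    kh, kw = k.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += k[dy, dx] * padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out

def oriented_mapping(x, g0, n_orient=4):
    """Decompose x with rotated copies of g0, then rotate each response back."""
    comps = []
    for i in range(n_orient):
        g_i = np.rot90(g0, i)            # filter at phi_i = i * 90 degrees
        r_i = correlate_same(x, g_i)     # oriented component
        comps.append(np.rot90(r_i, -i))  # re-orient to the reference
    return np.stack(comps, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))   # square test image
g0 = rng.standard_normal((3, 3))    # stand-in oriented filter (assumption)

k = 1                                # rotate the input by k * 90 degrees
feat = oriented_mapping(x, g0)
feat_rot = oriented_mapping(np.rot90(x, k), g0)

# The rotated input yields the same features, cyclically shifted by k along N.
shift_ok = np.allclose(feat_rot, np.roll(feat, k, axis=-1))
print(shift_ok)  # True
```

For general $N$ the rotations are no longer exact on the pixel grid, so the equality would hold only approximately, up to interpolation error.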

**Figure 5.** The backbone feature extraction is applied to each filter of the oriented feature space. ResNet used as backbone example. Output is a translating feature space. (Image blocks with $N=8$).
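One way to see why the output remains a translating feature space is that the same function is applied independently to every orientation slice, so a cyclic shift of the slices produces the same shift of the outputs. The sketch below illustrates this with a deliberately trivial stand-in for the backbone; `toy_backbone`, `apply_per_orientation`, and the weight array `w` are hypothetical and do not represent ResNet or the paper's code.

```python
import numpy as np

def toy_backbone(channel, w):
    """Hypothetical stand-in for a backbone: ReLU, then two pooled features."""
    h = np.maximum(channel, 0.0)
    return np.array([h.mean(), (h * w).sum()])

def apply_per_orientation(feat, w):
    """Apply the same backbone (shared weights w) to each of the N slices."""
    n_orient = feat.shape[-1]
    outs = [toy_backbone(feat[..., i], w) for i in range(n_orient)]
    return np.stack(outs, axis=0)  # shape (N, D)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 8))  # (m, n, N) oriented feature space
w = rng.standard_normal((8, 8))        # shared weights (assumption)

out = apply_per_orientation(feat, w)
out_shifted = apply_per_orientation(np.roll(feat, 1, axis=-1), w)

# Shifting the input along N shifts the backbone output along N.
print(np.allclose(out_shifted, np.roll(out, 1, axis=0)))  # True
```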

**Figure 6.** Accuracy values for the proposed architecture on CIFAR-10 (training: upright/validation: rotated).

Method | Error Rate | # Parameters |
---|---|---|
ORN-8 (ORPooling) [20] | 16.67% | 397 k |
ORN-8 (ORAlign) [20] | 16.24% | 969 k |
RotInv Conv. (RP_RF_1) [23] | 19.85% | 130 k |
RotInv Conv. (RP_RF_1_32) [23] | 12.20% | 1 M |
RotDCF (60 degrees) [22] | 17.64% | 760 k |
Spherical CNN [15] | 6.00% | 68 k |
Icosahedral CNN [16] | 30.01% | n.c. |
RI-LBCNNs [21] | 25.77% | 390 k |
Covariant CNN [27] | 17.21% | 7 k |
RIN (Steerable + Convolutional) (this paper) | 2.05% | 42 k |
RIN (Gabor + Convolutional) (this paper) | 1.71% | 9 k |

Method | Error Rate | # Parameters |
---|---|---|
RotInv Conv. (RP_RF_1) [23] | 55.88% | 130 k |
ORN [20] | 59.31% | 382 k |
RP_1234 [17] | 62.55% | 130 k |
RIN (Gabor + Convolutional) (this paper) | 37.60% | 93 k |
RIN (Gabor + Convolutional) (this paper) | 28.69% | 238 k |
RIN (Gabor + Convolutional) (this paper) | 33.25% | 586 k |
RIN (Gabor + ResNet8) (this paper) | 27.68% | 243 k |
RIN (Gabor + ResNet14) (this paper) | 27.34% | 341 k |
RIN (Gabor + ResNet20) (this paper) | 21.10% | 438 k |
RIN (Gabor + ResNet26) (this paper) | 21.50% | 537 k |
