# PrimeNet: Adaptive Multi-Layer Deep Neural Structure for Enhanced Feature Selection in Early Convolution Stage

^{1}

^{2}

^{*}

## Abstract

**:**

## 1. Introduction

- A backward-locking free novel dynamic MLP structure ‘PrimeNet’ is proposed to encourage the most vital distinctive attributes within highly correlated multiscale activations.
- PrimeNet builds a localized learning strategy to train the weight layers with locally generated errors.
- To avoid extreme compression of the information passing through PrimeNet and to avoid correlated regions concentrating in local regions, a summarized local translation-invariant features projection is utilized in this research.

## 2. Relevant Work

#### 2.1. Conditional Computation

#### 2.2. Network Pruning and Distillation

#### 2.3. Knowledge Distillation

#### 2.4. Discussion

## 3. Method

#### 3.1. Local Loss Computation

#### 3.2. Convolution with Local Loss

#### 3.3. PrimeNet

#### 3.4. Feature Representation with Minimum Loss Local Structure Transferring

#### 3.5. Learning Algorithm

Algorithm 1 Training algorithm for classifier model. | |

Input: Batch Images | |

Output: Prediction | |

1: procedure TRAIN( ) | |

2: capsuleX=Capsule() | ▷ Shallow network instance for x scale. |

3: capsuleY=Capsule() | ▷ Shallow network instance for y scale. |

4: capsuleZ=Capsule() | ▷ Shallow network instance for z scale. |

5: classi f ier=PrimeNet() | ▷ Classifier Network Instance. |

6: procedure FEATURE_TRAIN(images) | ▷ Individual capsule training for feature generarion |

7: featuresX = capsule(images) | ▷ Features X from Capsule X. |

8: lossX = loss( featuresX) | ▷ loss from feature X. |

9: featuresY = capsule(images) | ▷ Features Y from Capsule Y. |

10: lossY = loss( featuresY) | ▷ loss from feature Y. |

11: featuresZ = capsule(images) | ▷ Features Z from Capsule Z. |

12: lossZ = loss( featuresZ) | ▷ loss from feature Z. |

13: if (lossX < lossY)&(lossX < lossZ) then | |

14: min_loss_ features=featureX | |

15: else if (lossY > lossX)&(lossY > lossZ) then | |

16: min_loss_ features=featureY | |

17: else | |

18: min_loss_ features=featureZ | |

19: end if | |

20: gradient_update(capsuleX, capsuleY, capsuleZ) | ▷ local gradient update |

21: end procedure | |

22: for epoch in range(epochs) do | |

23: classifier_loss = 0 | |

24: total_classifier_loss = 0 | |

25: for batch_images in train_dataset do | |

26: MinLossFeature=feature_train(batch_images) | |

27: features=MinLossFeature + input_projection | ▷ Input feature projection is concatenated. |

28: classifier_output = classifier( features) | |

29: classifier_loss = loss(classifier_output) | |

30: total_classifier_loss += classifier_loss | ▷ Combined classifier loss |

31: update_weights=gradient_update(classifier) | |

32: end for | |

33: end for | |

34: end procedure |

## 4. Experiments and Results

#### 4.1. Implementation Details

**32**is the number of filters). NA means that there are no trainable parameters involved. GAP refers to the meaning Global Average Pooling in the Classifier configuration.

#### 4.2. Experimental Setup

#### 4.3. Results

#### 4.4. Computational Analysis

## 5. Ablation Study

#### 5.1. Local Gradient Update

#### 5.2. Adaptive Cost-Conscious Local Structure Transfer

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv
**2013**, arXiv:1312.4400. [Google Scholar] - Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Rosenblatt, F. Principles of Neurodynamics. Perceptrons and the Theory of Brain Mechanisms; Technical Report; Cornell Aeronautical Lab. Inc.: Buffalo, NY, USA, 1961. [Google Scholar]
- Nøkland, A.; Eidnes, L.H. Training neural networks with local error signals. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 4839–4850. [Google Scholar]
- Bolukbasi, T.; Wang, J.; Dekel, O.; Saligrama, V. Adaptive neural networks for fast test-time prediction. arXiv
**2017**, arXiv:1702.07811. [Google Scholar] - Huang, G.; Chen, D.; Li, T.; Wu, F.; van der Maaten, L.; Weinberger, K.Q. Multi-scale dense networks for resource efficient image classification. arXiv
**2017**, arXiv:1703.09844. [Google Scholar] - Lin, J.; Rao, Y.; Lu, J.; Zhou, J. Runtime neural pruning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 2178–2188. [Google Scholar]
- Wang, X.; Yu, F.; Dou, Z.Y.; Darrell, T.; Gonzalez, J.E. Skipnet: Learning dynamic routing in convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 409–424. [Google Scholar]
- Veit, A.; Belongie, S. Convolutional networks with adaptive inference graphs. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–18. [Google Scholar]
- Figurnov, M.; Collins, M.D.; Zhu, Y.; Zhang, L.; Huang, J.; Vetrov, D.; Salakhutdinov, R. Spatially adaptive computation time for residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1039–1048. [Google Scholar]
- Kong, S.; Fowlkes, C. Pixel-wise attentional gating for parsimonious pixel labeling. arXiv
**2018**, arXiv:1805.01556. [Google Scholar] - Li, Z.; Yang, Y.; Liu, X.; Zhou, F.; Wen, S.; Xu, W. Dynamic computational time for visual attention. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 1199–1209. [Google Scholar]
- Ying, C.; Fragkiadaki, K. Depth-adaptive computational policies for efficient visual tracking. In International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2017; pp. 109–122. [Google Scholar]
- Wu, Z.; Nagarajan, T.; Kumar, A.; Rennie, S.; Davis, L.S.; Grauman, K.; Feris, R. Blockdrop: Dynamic inference paths in residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8817–8826. [Google Scholar]
- McIntosh, L.; Maheswaranathan, N.; Sussillo, D.; Shlens, J. Recurrent segmentation for variable computational budgets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1648–1657. [Google Scholar]
- Kang, D.; Dhar, D.; Chan, A.B. Incorporating Side Information by Adaptive Convolution. Int. J. Comput. Vis.
**2020**, 128, 2897–2918. [Google Scholar] [CrossRef] - Li, H.; Zhang, H.; Qi, X.; Yang, R.; Huang, G. Improved techniques for training adaptive deep networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 1891–1900. [Google Scholar]
- Huang, G.; Liu, S.; Van der Maaten, L.; Weinberger, K.Q. Condensenet: An efficient densenet using learned group convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2752–2761. [Google Scholar]
- Mnih, V.; Heess, N.; Graves, A.; Kavukcuoglu, K. Recurrent models of visual attention. arXiv
**2014**, arXiv:1406.6247. [Google Scholar] - Khan, F.U.; Aziz, I.B.; Akhir, E.A.P. Pluggable Micronetwork for Layer Configuration Relay in a Dynamic Deep Neural Surface. IEEE Access
**2021**, 9, 124831–124846. [Google Scholar] [CrossRef] - He, Y.; Kang, G.; Dong, X.; Fu, Y.; Yang, Y. Soft filter pruning for accelerating deep convolutional neural networks. arXiv
**2018**, arXiv:1808.06866. [Google Scholar] - Singh, P.; Verma, V.K.; Rai, P.; Namboodiri, V.P. Play and prune: Adaptive filter pruning for deep model compression. arXiv
**2019**, arXiv:1905.04446. [Google Scholar] - Lin, M.; Ji, R.; Zhang, Y.; Zhang, B.; Wu, Y.; Tian, Y. Channel pruning via automatic structure search. arXiv
**2020**, arXiv:2001.08565. [Google Scholar] - Wang, L.; Dong, X.; Wang, Y.; Ying, X.; Lin, Z.; An, W.; Guo, Y. Exploring Sparsity in Image Super-Resolution for Efficient Inference. arXiv
**2021**, arXiv:2006.09603. [Google Scholar] - Kim, J.; Chang, S.; Yun, S.; Kwak, N. Prototype-based Personalized Pruning. arXiv
**2021**, arXiv:2103.15564. [Google Scholar] - Luo, C.; Zhan, J.; Hao, T.; Wang, L.; Gao, W. Shift-and-Balance Attention. arXiv
**2021**, arXiv:2103.13080. [Google Scholar] - Lan, X.; Zhu, X.; Gong, S. Knowledge distillation by on-the-fly native ensemble. arXiv
**2018**, arXiv:1806.04606. [Google Scholar] - Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv
**2015**, arXiv:1503.02531. [Google Scholar] - Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv
**2017**, arXiv:1412.6980. [Google Scholar] - Springenberg, J.T.; Dosovitskiy, A.; Brox, T.; Riedmiller, M. Striving for Simplicity: The All Convolutional Net. arXiv
**2015**, arXiv:1412.6806. [Google Scholar] - Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? arXiv
**2014**, arXiv:1411.1792. [Google Scholar] - Panchal, G.; Ganatra, A.; Shah, P.; Panchal, D. Determination of over-learning and over-fitting problem in back propagation neural network. Int. J. Soft Comput.
**2011**, 2, 40–51. [Google Scholar] [CrossRef] - Van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res.
**2008**, 9, 2579–2605. [Google Scholar] - He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv
**2015**, arXiv:1512.03385. [Google Scholar]

**Figure 1.**The backward-locking free, custom convolution block that updates the layer-wise local loss instead of sending the gradients back through backpropagation. Redrawn from figure source in Reference [4].

**Figure 2.**The proposed PrimeNet architecture. Multi-scale micro-network operates to perform intermediate classifications. Local loss is computed for each scale convolution, and the adaptive convolution is performed. The other layers with higher losses are soft pruned, and their weights are locally updated.

**Figure 3.**PrimeNet training flow. The shallow capsules network inside the selection block convolves at different scales, and the batch-wise loss is calculated as gradients are updated locally. The feature representation with minimum loss is forwarded as an input to the classifier, and classification is performed on locally learned feature representation.

**Figure 6.**The t-SNE visualization of features from image representations by proposed PrimeNet (

**a**) and ResNet50 (

**b**).

**Figure 9.**(

**a**,

**b**) The incremental model learning from the dynamically selected feature representations as a result of fine tuning on top of the classifier. Linear classifier network (static convolutional) losses and accuracies are also presented to demonstrate the difference in performance.

Capsule (Shallow Network) | PrimeNet (Classifer) | ||
---|---|---|---|

Layer | Parameter | Layer | Parameter |

Conv2D | 1 × 1, 32 | MaxPool | 3 × 3 |

Conv2D | 3 × 3, 5 × 5, 7 × 7, 64 | Conv2D | 1 × 1, 64 |

BN | NA | BN | NA |

LReLU | NA | MaxPool | 2 × 2 |

Flatten | NA | Conv2D | 3 × 3, 128 |

Dense | 28 × 28 × 1, 32 × 32 × 3 | GAP | NA |

Flatten | NA | ||

Dense | 10 |

Dataset | Input Size | No. of Classes | Train Size | Test Size |
---|---|---|---|---|

MNIST | 28 × 28 × 1 | 10 | 60,000 | 10,000 |

CIFAR-10 | 32 × 32 × 3 | 10 | 50,000 | 10,000 |

SVHN | 32 × 32 × 3 | 10 | 73,257 | 26,032 |

**Table 3.**Comparison of classification error rate on evaluation model (%). The symbols $\left(\mu \right)$ and $\left(\sigma \right)$ represent the mean and standard deviation, respectively, for the top 5 scores from each dataset.

Dataset | PrimeNet Test Error (%) | Model | Baseline Error (%) | Baseline Method | ||
---|---|---|---|---|---|---|

Top-1 | Top-5 ($\mathit{\mu}$) | $\mathit{\sigma}$ | ||||

MNIST | 0.59 | 0.706 | 0.14 | DT-RAM [12] | 1.46, 1.12 | Conditional Computation |

Condensenet [18] | 3.46, 3.76 | Conditional Computation | ||||

ITADN [17] | 5.9 | Conditional Computation | ||||

CIFAR-10 | 6.21 | 7.34 | 3.67 | RNP [7] | 15.05 | Network Pruning |

SFP [21] | 7.74, 6.32 | Network Pruning | ||||

ITADN [17] | 3.13, 5.99 | Knowledge Distillation | ||||

SVHN | 1.71 | 2.2 | 0.42 | ONE [27] | 1.63 | Knowledge Distillation |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Khan, F.U.; Aziz, I. PrimeNet: Adaptive Multi-Layer Deep Neural Structure for Enhanced Feature Selection in Early Convolution Stage. *Appl. Sci.* **2022**, *12*, 1842.
https://doi.org/10.3390/app12041842

**AMA Style**

Khan FU, Aziz I. PrimeNet: Adaptive Multi-Layer Deep Neural Structure for Enhanced Feature Selection in Early Convolution Stage. *Applied Sciences*. 2022; 12(4):1842.
https://doi.org/10.3390/app12041842

**Chicago/Turabian Style**

Khan, Farhat Ullah, and Izzatdin Aziz. 2022. "PrimeNet: Adaptive Multi-Layer Deep Neural Structure for Enhanced Feature Selection in Early Convolution Stage" *Applied Sciences* 12, no. 4: 1842.
https://doi.org/10.3390/app12041842