# Feature Activation through First Power Linear Unit with Sign

## Abstract

## 1. Introduction

## 2. Related Works

## 3. Proposed Method

#### 3.1. Source of Motivation

#### 3.2. Derivation Process

- (I). Defined on the set of real numbers: $$\forall x\in \mathbb{R},\ \left|f\left(x\right)\right|\ne \infty$$
- (II). Through the origin of the coordinate system: $$f\left(0\right)=0$$
- (III). Continuous at the demarcation point: $$\underset{x\to {0}^{+}}{\lim }f\left(x\right)=\underset{x\to {0}^{-}}{\lim }f\left(x\right)=f\left(0\right)$$
- (IV). Differentiable at the demarcation point: $$\left.\frac{\partial f\left(x\right)}{\partial x}\right|_{x={0}^{+}}=\left.\frac{\partial f\left(x\right)}{\partial x}\right|_{x={0}^{-}}$$
- (V). Monotone increasing: $$\forall x\in \mathbb{R},\ \frac{\partial f\left(x\right)}{\partial x}\geqslant 0$$
- (VI). Convex down: $$\forall x\in \mathbb{R},\ \frac{{\partial }^{2}f\left(x\right)}{\partial {x}^{2}}\geqslant 0$$ (conditions (II)–(VI) are checked numerically in the sketch at the end of this subsection)

- When $\beta =1$, then ${\omega}_{2}<0$ and $\alpha >0$, so subordinate condition (ii) holds for all $x\in (-\infty ,0)$.
- When $\beta =-1$, then ${\omega}_{2}>0$ and $\alpha <0$, so subordinate condition (ii) holds for all $x\in (-\infty ,0)$.

- $\beta =1,\phantom{\rule{4pt}{0ex}}{\omega}_{2}<0,\phantom{\rule{4pt}{0ex}}{\omega}_{1}=-{\omega}_{2}>0,\phantom{\rule{4pt}{0ex}}\alpha >0,\phantom{\rule{4pt}{0ex}}\theta =-\alpha $
- $\beta =-1,\phantom{\rule{4pt}{0ex}}{\omega}_{2}>0,\phantom{\rule{4pt}{0ex}}{\omega}_{1}=-{\omega}_{2}<0,\phantom{\rule{4pt}{0ex}}\alpha <0,\phantom{\rule{4pt}{0ex}}\theta =\alpha $
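
To make the constraint set concrete, the short Python sketch below numerically spot-checks conditions (II)–(VI) for one candidate of this family, using the assumed piecewise form $f(x)=x$ for $x\geqslant 0$ and $f(x)=\frac{1}{1-x}-1$ for $x<0$. This stand-in is chosen only because it satisfies the constraints listed above; the exact FPLUS coefficients are fixed by the derivation in the text.

```python
import numpy as np

def f(x):
    """Assumed candidate: identity for x >= 0, saturating negative first power branch for x < 0."""
    x = np.asarray(x, dtype=float)
    x_neg = np.minimum(x, 0.0)                      # keep the unused branch away from its pole at x = 1
    return np.where(x >= 0, x, 1.0 / (1.0 - x_neg) - 1.0)

xs = np.linspace(-8.0, 8.0, 4001)
d1 = np.gradient(f(xs), xs)                         # numerical first derivative
d2 = np.gradient(d1, xs)                            # numerical second derivative
eps = 1e-6

print("(II)  passes through the origin:", abs(f(0.0)) < 1e-12)
print("(III) continuous at x = 0:      ", abs(f(eps) - f(-eps)) < 1e-5)
print("(IV)  one-sided slopes agree:   ", abs(f(eps) / eps - f(-eps) / (-eps)) < 1e-3)
print("(V)   monotone increasing:      ", bool(np.all(d1 >= -1e-6)))
print("(VI)  convex (f'' >= 0):        ", bool(np.all(d2 >= -1e-3)))
```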

## 4. Analysis of the Method

#### 4.1. Characteristics and Attributes

- For positive inputs, the identity mapping is retained so as to avoid the gradient vanishing problem.
- For negative inputs, the output gradually approaches saturation as the input grows more negative, which gives the method robustness to noise.
- The mean of the unit's outputs is close to zero, because the response to a negative input is not simply zero but a small negative value that offsets the overall activation, so the bias shift effect is reduced.
- When the formulation of the negative part is expanded as a Taylor series, as in Equation (43), the operation carried out in the negative domain is equivalent to decomposing the received input signal into its order components, so richer features can be attained to a certain extent.
- From an overall perspective, the shape of the function is unilaterally inhibitive, and this one-sided form helps introduce sparsity into the output nodes, making them behave more like logical neurons (a minimal code sketch of this shape follows the list).
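
To make these qualitative points concrete, the following is a minimal PyTorch sketch of an activation module with this shape: identity in the positive zone and a saturating negative first power branch in the negative zone. The negative branch used here, $\frac{1}{1-x}-1$ (bounded below by $-1$), and the class name are assumptions for illustration, consistent with the derivation conditions but not necessarily the published FPLUS expression.

```python
import torch
import torch.nn as nn

class FirstPowerSaturatingUnit(nn.Module):
    """Sketch of a unilaterally inhibitive activation: identity for x >= 0,
    a saturating negative first power branch for x < 0 (assumed stand-in for FPLUS)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_neg = torch.clamp(x, max=0.0)               # avoid the pole of the unused branch at x = 1
        negative_branch = 1.0 / (1.0 - x_neg) - 1.0   # saturates toward -1 as x -> -inf
        return torch.where(x >= 0, x, negative_branch)

if __name__ == "__main__":
    act = FirstPowerSaturatingUnit()
    x = torch.linspace(-6.0, 6.0, 7)
    print(act(x))   # negative inputs saturate toward -1, positive inputs pass through unchanged
```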

#### 4.2. Implications and Examples

## 5. Experiments and Discussion

#### 5.1. Influence of Two Alterable Factors $\lambda $ and $\mu $

#### 5.2. Intuitive Features of Activation Effect

#### 5.3. Explication of Robustness and Reliability

#### 5.4. Comparison of Performance on CIFAR-10 & CIFAR-100

#### 5.5. Experimental Results on ImageNet-ILSVRC2012

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. **1943**, 5, 115–133.
2. Hodgkin, A.L.; Huxley, A.F. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. **1990**, 52, 117.
3. Dayan, P.; Abbott, L.F. Theoretical Neuroscience: Computational & Mathematical Modeling of Neural Systems; The MIT Press: Cambridge, MA, USA, 2001.
4. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 807–814.
5. Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), Ft. Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
6. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier Nonlinearities Improve Neural Network Acoustic Models. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013.
7. Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). arXiv **2015**, arXiv:1511.07289.
8. Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-Normalizing Neural Networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 972–981.
9. Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. **2018**, 107, 3–11.
10. Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for Activation Functions. arXiv **2017**, arXiv:1710.05941.
11. Misra, D. Mish: A Self Regularized Non-Monotonic Neural Activation Function. arXiv **2019**, arXiv:1908.08681v2.
12. Goodfellow, I.; Warde-Farley, D.; Mirza, M.; Courville, A.; Bengio, Y. Maxout Networks. In Proceedings of the 30th International Conference on International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013.
13. Ma, N.; Zhang, X.; Liu, M.; Sun, J. Activate or Not: Learning Customized Activation. arXiv **2021**, arXiv:2009.04759.
14. Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 1314–1324.
15. Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. **1958**, 65, 386–408.
16. Courbariaux, M.; Bengio, Y.; David, J.P. BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations. arXiv **2015**, arXiv:1511.00363.
17. Berradi, Y. Symmetric Power Activation Functions for Deep Neural Networks. In Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications, Rabat, Morocco, 2–5 May 2018; pp. 1–6.
18. Gulcehre, C.; Moczulski, M.; Denil, M.; Bengio, Y. Noisy Activation Functions. arXiv **2016**, arXiv:1603.00391.
19. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1026–1034.
20. Trottier, L.; Giguère, P.; Chaib-draa, B. Parametric Exponential Linear Unit for Deep Convolutional Neural Networks. In Proceedings of the 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, Mexico, 18–21 December 2017; pp. 207–214.
21. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. J. Mach. Learn. Res. **2010**, 9, 249–256.
22. Lecun, Y.; Boser, B.; Denker, J.; Henderson, D.; Howard, R.; Hubbard, W.; Jackel, L. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput. **1989**, 1, 541–551.
23. Amari, S.I. Natural Gradient Works Efficiently in Learning. Neural Comput. **1999**, 10, 251–276.
24. Attwell, D.; Laughlin, S.B. An Energy Budget for Signaling in the Grey Matter of the Brain. J. Cereb. Blood Flow Metab. **2001**, 21, 1133–1145.
25. Lennie, P. The Cost of Cortical Computation. Curr. Biol. CB **2003**, 13, 493–497.
26. Xu, B.; Wang, N.; Chen, T.; Li, M. Empirical Evaluation of Rectified Activations in Convolutional Network. arXiv **2015**, arXiv:1505.00853.
27. Shang, W.; Sohn, K.; Almeida, D.; Lee, H. Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units. arXiv **2016**, arXiv:1603.05201.
28. Ma, N.; Zhang, X.; Sun, J. Funnel Activation for Visual Recognition. In Proceedings of the ECCV, Glasgow, UK, 23–28 August 2020.
29. Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; Liu, Z. Dynamic ReLU. In Proceedings of the ECCV, Glasgow, UK, 23–28 August 2020.
30. Barron, J.T. Continuously Differentiable Exponential Linear Units. arXiv **2017**, arXiv:1704.07483.
31. Zheng, Q.; Tan, D.; Wang, F. Improved Convolutional Neural Network Based on Fast Exponentially Linear Unit Activation Function. IEEE Access **2019**, 7, 151359–151367.
32. Basirat, M.; Roth, P.M. The Quest for the Golden Activation Function. arXiv **2018**, arXiv:1808.00783.
33. Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv **2016**, arXiv:1606.08415.
34. Dugas, C.; Bengio, Y.; Bélisle, F.; Nadeau, C.; Garcia, R. Incorporating Second-Order Functional Knowledge for Better Option Pricing. In Proceedings of the 13th International Conference on Neural Information Processing Systems, Denver, CO, USA, 1 January 2000; pp. 451–457.
35. Ying, Y.; Su, J.; Shan, P.; Miao, L.; Peng, S. Rectified Exponential Units for Convolutional Neural Networks. IEEE Access **2019**, 7, 2169–3536.
36. Kiliarslan, S.; Celik, M. RSigELU: A nonlinear activation function for deep neural networks. Expert Syst. Appl. **2021**, 174, 114805.
37. Pan, J.; Hu, Z.; Yin, S.; Li, M. GRU with Dual Attentions for Sensor-Based Human Activity Recognition. Electronics **2022**, 11, 1797.
38. Tedesco, S.; Alfieri, D.; Perez-Valero, E.; Komaris, D.S.; Jordan, L.; Belcastro, M.; Barton, J.; Hennessy, L.; O’Flynn, B. A Wearable System for the Estimation of Performance-Related Metrics during Running and Jumping Tasks. Appl. Sci. **2021**, 11, 5258.
39. Hubel, D.H.; Wiesel, T.N. Receptive Fields of Single Neurons in the Cat’s Striate Cortex. J. Physiol. **1959**, 148, 574–591.
40. Bhumbra, G.S. Deep learning improved by biological activation functions. arXiv **2018**, arXiv:1804.11237.
41. Ramachandran, P.; Zoph, B.; Le, Q. Swish: A Self-Gated Activation Function. arXiv **2017**, arXiv:1710.05941.
42. Lecun, Y.; Bottou, L. Gradient-based learning applied to document recognition. Proc. IEEE **1998**, 86, 2278–2324.
43. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
44. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv **2015**, arXiv:1502.03167.
45. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. **2014**, 15, 1929–1958.
46. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 3–19.
47. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv **2016**, arXiv:1602.07360.
48. Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. arXiv **2017**, arXiv:1707.07012.
49. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv **2016**, arXiv:1602.07261.
50. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM **2017**, 60, 84–90.
51. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
52. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv **2014**, arXiv:1409.1556.
53. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
54. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
55. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; Volume 11218.
56. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
57. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv **2019**, arXiv:1905.11946.
58. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.

**Figure 2.** Curves of a pair of power functions with opposite signs, with 1 and −1 as exponents. The shaded region indicates the general orientation in which $y=\omega x$ approaches tangency with $y=-\omega {x}^{-1}$.

**Figure 4.** Signal flow diagram of the input passing through the FPLUS activation unit. There are two switches in the diagram: the upper one chooses the branch of the weight, and the lower one decides whether to apply the negative first power or not.

**Figure 5.** A neuron in a convolutional layer with the FPLUS unit embedded as the activation gate. X denotes the input, which may have been normalized beforehand, and Y denotes the convolutional output after the requisite excitation.
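
As a usage illustration of the layout in Figure 5, the sketch below chains a convolution, batch normalization, and such an activation gate in PyTorch. The `FirstPowerSaturatingUnit` class is the hypothetical stand-in sketched in Section 4 (redeclared here so the snippet runs standalone), and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class FirstPowerSaturatingUnit(nn.Module):
    # Hypothetical stand-in for the FPLUS gate (same assumed form as in the Section 4 sketch).
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_neg = torch.clamp(x, max=0.0)
        return torch.where(x >= 0, x, 1.0 / (1.0 - x_neg) - 1.0)

# Conv -> BatchNorm -> activation gate, mirroring the neuron layout of Figure 5.
block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),               # "X may be normalized before" the activation gate
    FirstPowerSaturatingUnit(),       # activation gate in place of ReLU
)

y = block(torch.randn(1, 3, 32, 32))  # Y: the excited convolutional output
print(y.shape)                        # torch.Size([1, 64, 32, 32])
```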

**Figure 6.** Architecture of LeNet-5. Four activation units lie in the four corresponding layers, illustrated by blue boxes.

**Figure 7.** Parameter variation after learning, starting from different initializations, on MNIST and Fashion-MNIST. Triangular points are samples of $\lambda $ and circular ones are samples of $\mu $. Initial values are distinguished by different colors.
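
Figure 7 tracks how the two trainable factors move away from their initial values during learning. Below is a minimal PyTorch sketch of registering $\lambda $ and $\mu $ as learnable scalars; where exactly they enter the closed form is defined by the paper's equations, so the placement used here ($\mu $ scaling the input of the assumed negative branch and $\lambda $ scaling its output) is purely illustrative.

```python
import torch
import torch.nn as nn

class ParametricSaturatingUnit(nn.Module):
    """Hypothetical parametric variant with learnable scalars lam and mu.
    Their placement in the formula below is illustrative, not the published PFPLUS form."""

    def __init__(self, init_lam: float = 1.0, init_mu: float = 1.0):
        super().__init__()
        self.lam = nn.Parameter(torch.tensor(init_lam))
        self.mu = nn.Parameter(torch.tensor(init_mu))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_neg = torch.clamp(x, max=0.0)                            # negative part of the input
        neg = self.lam * (1.0 / (1.0 - self.mu * x_neg) - 1.0)     # scaled saturating branch
        return torch.where(x >= 0, x, neg)

act = ParametricSaturatingUnit(init_lam=0.1, init_mu=10.0)
loss = act(torch.randn(4, 8)).sum()
loss.backward()
print(act.lam.grad, act.mu.grad)   # both factors receive gradients, so they can be learned jointly
```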

**Figure 8.** Visualization of feature maps produced by ResNet-18's first activation layer, which has 64 channels generated by the preceding convolutional layer with batch normalization. The maps are presented in 8 × 8 grids in the original grayscale format. The comparison displays the impact of the two different activation approaches.

**Figure 9.** Architecture of the traditional ResNet-50 with an added dropout layer. Details of Bottleneck A and Bottleneck B are shown below.

**Figure 10.** Structures of Bottleneck A and Bottleneck B in the construction of Figure 9, with CBAM modules embedded.

**Figure 11.** Visualization of feature maps and heat maps produced by the model's final convolutional layer, under the two circumstances of whether batch normalization is employed or not. The original input image and the corresponding prediction results for ReLU and FPLUS are also given in the figure for reference.

**Figure 14.** Curves of validation loss and top-1 accuracy for each model activated by different methods. Subfigures (**a**–**i**) are organized by the type of network. Every subfigure contains a loss descent record on the left and an illustration of the top-1 accuracy ascent on the right.

**Table 1.** Loss and accuracy on MNIST and Fashion-MNIST under different combinations of the factors $\lambda $ and $\mu $.

Dataset | $\mathit{\lambda}$ | Loss ($\mu$ = 0.01) | Loss ($\mu$ = 0.1) | Loss ($\mu$ = 1) | Loss ($\mu$ = 10) | Accuracy ($\mu$ = 0.01) | Accuracy ($\mu$ = 0.1) | Accuracy ($\mu$ = 1) | Accuracy ($\mu$ = 10)
---|---|---|---|---|---|---|---|---|---
MNIST | 0.01 | 1.573 | 1.485 | 1.441 | 1.534 | 39.93% | 43.85% | 44.89% | 41.20%
MNIST | 0.1 | 0.147 | 0.114 | 0.139 | 0.158 | 96.21% | 97.01% | 96.63% | 96.05%
MNIST | 1 | 0.054 | 0.046 | 0.027 | 0.028 | 98.40% | 98.31% | 98.97% | 98.92%
MNIST | 10 | 0.567 | 0.430 | 0.111 | 0.079 | 96.21% | 96.47% | 97.11% | 97.45%
Fashion-MNIST | 0.01 | 1.091 | 1.072 | 1.065 | 1.343 | 52.88% | 57.25% | 59.45% | 52.30%
Fashion-MNIST | 0.1 | 0.630 | 0.568 | 0.534 | 0.579 | 76.49% | 78.84% | 79.87% | 77.97%
Fashion-MNIST | 1 | 0.309 | 0.307 | 0.253 | 0.267 | 87.75% | 88.18% | 89.62% | 88.98%
Fashion-MNIST | 10 | 0.496 | 0.636 | 0.371 | 0.341 | 83.49% | 84.99% | 86.53% | 86.71%

**Table 2.** Comparison of loss and accuracy for training on the Intel Image Classification dataset with ResNet-50 under multiple network configurations, grouped by two learning rate decay schedules.

| BatchNorm | CBAM | Dropout | Step lr Decay: Loss (ReLU) | Step lr Decay: Loss (FPLUS) | Step lr Decay: Acc. (ReLU) | Step lr Decay: Acc. (FPLUS) | Exp. lr Decay: Loss (ReLU) | Exp. lr Decay: Loss (FPLUS) | Exp. lr Decay: Acc. (ReLU) | Exp. lr Decay: Acc. (FPLUS) |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | 0.674 | 0.484 | 75.24% | 83.18% | 1.098 | 0.822 | 55.39% | 66.95% |
| ✓ | | | 0.682 | 0.544 | 87.28% | 88.11% | 0.678 | 0.669 | 86.21% | 86.40% |
| | ✓ | | 0.802 | 0.551 | 69.98% | 80.19% | 1.335 | 1.133 | 44.25% | 55.75% |
| | | ✓ | 0.722 | 0.499 | 72.57% | 82.66% | 1.123 | 0.835 | 53.19% | 67.19% |
| ✓ | ✓ | | 0.563 | 0.571 | 86.90% | 86.71% | 0.607 | 0.609 | 85.79% | 86.31% |
| ✓ | | ✓ | 0.846 | 0.490 | 87.08% | 87.44% | 0.683 | 0.684 | 86.03% | 86.51% |
| | ✓ | ✓ | 0.846 | 0.617 | 67.47% | 77.53% | 1.306 | 1.151 | 46.14% | 54.58% |
| ✓ | ✓ | ✓ | 0.586 | 0.496 | 87.50% | 88.50% | 0.651 | 0.666 | 85.71% | 85.75% |

**Table 3.** Networks trained on CIFAR-10, with their parameter quantities and computational overhead.

Network | # Params | FLOPs
---|---|---
SqueezeNet [47] | 0.735 M | 0.054 G
NASNet [48] | 4.239 M | 0.673 G
ResNet-50 [43] | 23.521 M | 1.305 G
InceptionV4 [49] | 41.158 M | 7.521 G

**Table 4.** Loss on the validation set of CIFAR-10 for each network with different activation functions.

Activation | SqueezeNet | NASNet | ResNet-50 | InceptionV4
---|---|---|---|---
ReLU | 0.350 | 0.309 | 0.335 | 0.442
LReLU | 0.345 | 0.307 | 0.322 | 0.412
PReLU | 0.365 | 0.366 | 0.370 | 0.305
ELU | 0.360 | 0.280 | 0.259 | 0.234
SELU | 0.447 | 0.307 | 0.295 | 0.307
FPLUS | 0.349 | 0.287 | 0.255 | 0.220
PFPLUS | 0.344 | 0.323 | 0.251 | 0.230

**Table 5.** Accuracy rate (%) on the validation set of CIFAR-10 for each network with different activation functions.

Activation | SqueezeNet | NASNet | ResNet-50 | InceptionV4
---|---|---|---|---
ReLU | 88.67 | 91.55 | 91.94 | 87.95
LReLU | 88.63 | 91.61 | 92.20 | 88.18
PReLU | 88.91 | 91.52 | 92.09 | 92.95
ELU | 87.82 | 91.70 | 92.71 | 92.59
SELU | 84.83 | 90.35 | 91.10 | 90.53
FPLUS | 88.70 | 91.46 | 92.75 | 93.47
PFPLUS | 89.09 | 91.62 | 93.13 | 93.41

**Table 6.** Networks trained on CIFAR-100, with their parameter quantities and computational overhead.

Network | # Params | FLOPs
---|---|---
AlexNet [50] | 36.224 M | 201.056 M
GoogLeNet [51] | 6.403 M | 534.418 M
VGGNet-19 [52] | 39.328 M | 418.324 M
ResNet-101 [43] | 42.697 M | 2.520 G
DenseNet-121 [53] | 7.049 M | 898.225 M
Xception [54] | 21.014 M | 1.134 G
ShuffleNetV2 [55] | 1.361 M | 45.234 M
MobileNetV2 [56] | 2.369 M | 67.593 M
EfficientNetB0 [57] | 0.807 M | 2.432 M

**Table 7.** Top-1 and top-5 accuracy rates (%) on the validation set of CIFAR-100 for multiple networks with different activation implementations.

Network | Top-1: ReLU | Top-1: ELU | Top-1: Swish | Top-1: Mish | Top-1: FPLUS | Top-1: PFPLUS | Top-5: ReLU | Top-5: ELU | Top-5: Swish | Top-5: Mish | Top-5: FPLUS | Top-5: PFPLUS
---|---|---|---|---|---|---|---|---|---|---|---|---
AlexNet [50] | 59.73 | 65.84 | 51.99 | 61.30 | 66.28 | 66.08 | 85.19 | 89.14 | 79.21 | 86.19 | 89.14 | 89.07
GoogLeNet [51] | 73.69 | 73.17 | 74.01 | 74.10 | 74.38 | 74.69 | 92.50 | 92.36 | 92.39 | 92.81 | 92.77 | 92.85
VGGNet-19 [52] | 69.00 | 65.49 | 68.30 | 68.10 | 68.03 | 68.24 | 87.86 | 84.25 | 88.58 | 88.69 | 89.72 | 89.36
ResNet-101 [43] | 74.42 | 74.99 | 74.48 | 74.97 | 75.20 | 75.51 | 93.06 | 93.18 | 92.57 | 93.13 | 93.29 | 93.26
DenseNet-121 [53] | 74.74 | 72.77 | 75.36 | 75.16 | 73.72 | 74.63 | 92.59 | 93.20 | 93.30 | 93.28 | 93.32 | 93.30
Xception [54] | 72.62 | 72.34 | 72.81 | 72.86 | 72.83 | 73.21 | 91.37 | 91.56 | 91.14 | 91.53 | 91.80 | 92.04
ShuffleNetV2 [55] | 65.32 | 68.01 | 66.73 | 67.48 | 69.01 | 67.89 | 88.58 | 90.53 | 89.46 | 89.72 | 91.23 | 90.17
MobileNetV2 [56] | 64.09 | 65.57 | 65.73 | 65.85 | 65.47 | 66.52 | 88.52 | 89.25 | 88.52 | 89.06 | 89.34 | 90.01
EfficientNetB0 [57] | 62.33 | 63.62 | 63.20 | 63.45 | 64.40 | 64.31 | 86.72 | 88.20 | 85.97 | 86.81 | 88.61 | 89.02

**Table 8.** Training loss of each activation method applied on ResNet-101 for ILSVRC2012.

Activation Method | ReLU | ELU | FPLUS
---|---|---|---
Training Loss | 1.10 | 1.26 | 1.16

**Table 9.** Top-1 and top-5 accuracy rates (%) of each activation method applied on ResNet-101 for ILSVRC2012, grouped by training set and validation set.

Activation Method | Training Set: Top-1 Accuracy | Training Set: Top-5 Accuracy | Validation Set: Top-1 Accuracy | Validation Set: Top-5 Accuracy
---|---|---|---|---
ReLU | 73.98 | 90.76 | 73.55 | 91.37
ELU | 70.23 | 88.01 | 72.15 | 90.36
FPLUS | 72.20 | 89.23 | 73.64 | 91.64

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
