A Novel Adaptive Activation Function for Convolutional Neural Networks: The Parametric Arctangent Unit (PATU)
Abstract
1. Introduction
- 1.
- 2. The parametric arctangent unit (PATU) activation function is proposed. PATU retains the behavior of ReLU for positive inputs, applies the arctangent function to negative inputs, and uses a trainable parameter to control the saturation of the negative part.
- 3. PATU is evaluated on the CIFAR10 and CIFAR100 [30] datasets using three CNN architectures, namely SmallNet [31], Network in Network (NIN) [32], and the Residual Network (ResNet) [33], which differ in depth and filter structure. The experimental results show that PATU achieves the highest accuracy in five of the six network/dataset settings and the best mean rank among ReLU, PReLU, ELU, Swish, and PATU.
2. Related Works
2.1. Predefined Activation Function
2.1.1. ReLU
2.1.2. ELU
2.1.3. Swish
2.2. Parametric Activation Function
PReLU
3. Properties of Enhanced Activation Function
3.1. Mean Activation
3.2. Adaptability
3.3. Local Non-Linearity
3.4. Noise-Robust Deactivation State
4. The Proposed PATU Activation Function
- PATU behaves like ReLU for positive inputs and applies the arctangent function to negative inputs. The resulting negative activations push the mean activation closer to zero, which reduces the effect of bias shift.
- For negative inputs close to zero, the arctangent function provides a comparatively large derivative. This relieves the vanishing gradient problem, so the weights and biases can be updated properly. As the input becomes more negative, the derivative decreases gradually and approaches zero; this soft saturation yields a noise-robust deactivation state, which improves the robustness of the model.
- A trainable parameter controls the saturation shape of the arctangent function in the negative part, so negative values can be activated selectively and adaptively. This parameterization enhances the flexibility and non-linearity of the model.
- The proposed PATU activation function is a locally nonlinear function.
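The properties listed above can be illustrated with a minimal sketch. The exact PATU formula is not reproduced in this section, so the code assumes a hypothetical form: f(x) = x for x ≥ 0 and f(x) = α·arctan(x) for x < 0, with `alpha` standing in for the trainable parameter. The function names are illustrative only and are not taken from the paper.

```python
import math


def patu(x: float, alpha: float) -> float:
    """Assumed PATU form: identity for x >= 0, alpha * arctan(x) for x < 0."""
    return x if x >= 0.0 else alpha * math.atan(x)


def patu_grad_x(x: float, alpha: float) -> float:
    """Derivative with respect to the input under the same assumption."""
    return 1.0 if x >= 0.0 else alpha / (1.0 + x * x)


def patu_grad_alpha(x: float, alpha: float) -> float:
    """Derivative with respect to alpha; zero on the positive (ReLU-like) side."""
    return 0.0 if x >= 0.0 else math.atan(x)


if __name__ == "__main__":
    alpha = 0.3  # hypothetical value chosen for illustration only
    for x in (-100.0, -10.0, -1.0, -0.1, 0.0, 2.0):
        print(
            f"x={x:8.1f}  f(x)={patu(x, alpha):8.4f}  "
            f"df/dx={patu_grad_x(x, alpha):.6f}"
        )
```

Under this assumed form, the negative output is bounded below by −απ/2 and the input gradient decays as α/(1 + x²), which is the soft-saturation behavior described in the list; setting α = 0 reduces the function to ReLU.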
5. Experimental Evaluations
5.1. Experimental Settings
5.2. Analysis of Parameter Initialization
5.3. Results of SmallNet on CIFAR10 and CIFAR100
5.4. Results of Network in Network on CIFAR10 and CIFAR100
5.5. Results of ResNet18 on CIFAR10 and CIFAR100
5.6. Mean Rank
6. Discussion
7. Limitations
- The arctangent function and the trainable parameter in the negative part increase the computational cost. This results in a moderate increase in the training time per epoch, which may be an issue for very large-scale or time-critical applications.
- Our experiments are currently restricted to relatively small-scale datasets, i.e., CIFAR10 and CIFAR100. The performance of PATU on large-scale datasets such as ImageNet has not yet been evaluated. Moreover, the proposed method has not been validated on more challenging computer vision tasks such as object detection and image dehazing [54,55].
- The initial value of the trainable parameter in the negative part can be sensitive to the dataset and the task.
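The training-time overhead mentioned in the first limitation can be quantified from the "Training Time" columns of the results tables. The sketch below computes the PATU-to-ReLU ratio for each table; it assumes that the three results tables correspond, in order, to SmallNet, NIN, and ResNet18, and the variable names are illustrative.

```python
# Training times in seconds, read from the "Training Time" columns of the
# results tables as (ReLU, PATU). The network labels are an assumption about
# which table belongs to which architecture.
training_time_s = {
    "SmallNet": (5, 8),
    "NIN": (25, 59),
    "ResNet18": (25, 40),
}

# Relative cost of PATU with ReLU as the baseline (1.0 means no overhead).
overhead = {
    network: patu_s / relu_s
    for network, (relu_s, patu_s) in training_time_s.items()
}

for network, ratio in overhead.items():
    print(f"{network:9s} PATU/ReLU = {ratio:.2f}x")
```

With the reported values, the ratio is 1.60 for the first and third tables and 2.36 for the second, so the overhead depends noticeably on the architecture.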
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Abdullah, A.; Arianti, D.; Sahran, S. Iterative Ensemble Threshold Selection in Branch CNNs for Efficient Image Classification. In 2025 International Conference on Intelligent Systems: Theories and Applications (SITA); IEEE: New York, NY, USA, 2025; pp. 1–8.
- Abdullah, A.; Wong, W.S.; Albashish, D. EB-CNN: Ensemble of branch convolutional neural network for image classification. Pattern Recognit. Lett. 2025, 189, 1–7.
- Su, Z.; Adam, A.; Nasrudin, M.F.; Prabuwono, A.S. Proposal-free fully convolutional network: Object detection based on a box map. Sensors 2024, 24, 3529.
- Oday, A.; Azizi, A.; Sahran, S. YOLO-OSAM: Reassembly Spatial Attention Mechanisms for Facial Expression Recognition. Trait. Signal 2025, 42, 2379.
- Hassabis, D.; Kumaran, D.; Summerfield, C.; Botvinick, M. Neuroscience-inspired artificial intelligence. Neuron 2017, 95, 245–258.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
- Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022, 503, 92–108.
- Duan, B.; Yang, Y.; Dai, X. Activation by Switch Unit of Opposite First Powers. In 2022 IEEE 8th International Conference on Computer and Communications (ICCC); IEEE: New York, NY, USA, 2022; pp. 1431–1439.
- Rasamoelina, A.D.; Adjailia, F.; Sinčák, P. A review of activation function for artificial neural network. In 2020 IEEE 18th World Symposium on Applied Machine Intelligence and Informatics (SAMI); IEEE: New York, NY, USA, 2020; pp. 281–286.
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814.
- Nwankpa, C.; Ijomah, W.; Gachagan, A.; Marshall, S. Activation functions: Comparison of trends in practice and research for deep learning. arXiv 2018, arXiv:1811.03378.
- Wang, S.H.; Sakk, E. The effect of activation function choice on the performance of convolutional neural networks. J. Emerg. Investig. 2023, 6, 1–9.
- Budiman, N.A.; Adi, K.; Wibowo, A. Impact of Activation Function on the Performance of Convolutional Neural Network in Identifying Oil Palm Fruit Ripeness. Int. J. Math. Comput. Res. 2025, 13, 5107–5113.
- Liu, Y.; Zhang, J.; Gao, C.; Qu, J.; Ji, L. Natural-Logarithm-Rectified Activation Function in Convolutional Neural Networks. In Proceedings of the 2019 IEEE 5th International Conference on Computer and Communications (ICCC), Chengdu, China, 6–9 December 2019; pp. 2000–2008.
- Douglas, S.; Yu, J. Why RELU Units Sometimes Die: Analysis of Single-Unit Error Backpropagation in Neural Networks. In Proceedings of the 2018 52nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 28–31 October 2018; pp. 864–868.
- Luo, G.; Wang, X.; Zhao, W.; Tao, S.; Tang, Z. ReLU Neural Networks and Their Training. Mathematics 2025, 14, 39.
- Pusztaházi, L.S.; Eigner, G.; Csiszár, O. Parametric activation functions for neural networks: A tutorial survey. IEEE Access 2024, 12, 168626–168644.
- Ohn, I.; Kim, Y. Smooth function approximation by deep neural networks with general activation functions. Entropy 2019, 21, 627.
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. Proc. ICML 2013, 30, 3.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034.
- Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (elus). arXiv 2015, arXiv:1511.07289.
- Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-normalizing neural networks. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for activation functions. arXiv 2017, arXiv:1710.05941.
- Florek, D.; Miłosz, M. Comparison of an effectiveness of artificial neural networks for various activation functions. J. Comput. Sci. Inst. 2023, 26, 7–12.
- Jiang, T.; Cheng, J. Target recognition based on CNN with LeakyReLU and PReLU activation functions. In 2019 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC); IEEE: New York, NY, USA, 2019; pp. 718–722.
- Shah, A.; Kadam, E.; Shah, H.; Shinde, S.; Shingade, S. Deep residual networks with exponential linear unit. In Proceedings of the Third International Symposium on Computer Vision and the Internet, Florence, Italy, 21–24 September 2016; pp. 59–65.
- Wang, T.; Qin, Z.; Zhu, M. An ELU network with total variation for image denoising. In Proceedings of the International Conference on Neural Information Processing; Springer International Publishing: Cham, Switzerland, 2017; pp. 227–237.
- Ying, Y.; Su, J.; Shan, P.; Miao, L.; Wang, X.; Peng, S. Rectified exponential units for convolutional neural networks. IEEE Access 2019, 7, 101633–101640.
- Eger, S.; Youssef, P.; Gurevych, I. Is it time to swish? Comparing deep learning activation functions across NLP tasks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4415–4424.
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009.
- Qiu, S.; Xu, X.; Cai, B. FReLU: Flexible rectified linear units for improving convolutional neural networks. In 2018 24th International Conference on Pattern Recognition (ICPR); IEEE: New York, NY, USA, 2018; pp. 1223–1228.
- Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics JMLR Workshop and Conference Proceedings, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
- Amari, S.I. Natural gradient works efficiently in learning. Neural Comput. 1998, 10, 251–276.
- Hock, H.C.; Wahid, N.; Ong, P. Parametric flatten-t swish: An adaptive nonlinear activation function for deep learning. J. Inf. Commun. Technol. (JICT) 2021, 20, 21–39.
- Jinsakul, N.; Tsai, C.F.; Tsai, C.E.; Wu, P. Enhancement of deep learning in image classification performance using xception with the swish activation function for colorectal polyp preliminary screening. Mathematics 2019, 7, 1170.
- Szandała, T. Review and comparison of commonly used activation functions for deep neural networks. In Bio-Inspired Neurocomputing; Springer: Singapore, 2020; pp. 203–224.
- Szandała, T. Benchmarking comparison of swish vs. other activation functions on cifar-10 imageset. In International Conference on Dependability and Complex Systems; Springer International Publishing: Cham, Switzerland, 2019; pp. 498–505.
- Tripathi, G.C.; Rawat, M.; Rawat, K. Swish activation based deep neural network predistorter for RF-PA. In TENCON 2019–2019 IEEE Region 10 Conference (TENCON); IEEE: New York, NY, USA, 2019; pp. 1239–1242.
- Liu, X.; Di, X. TanhExp: A Smooth Activation Function with High Convergence Speed for Lightweight Neural Networks. arXiv 2020, arXiv:2003.09855.
- Hochreiter, S.; Schmidhuber, J. Feature extraction through LOCOCODE. Neural Comput. 1999, 11, 679–714.
- Wang, Y.; Li, Y.; Song, Y.; Rong, X. The influence of the activation function in a convolution neural network model of facial expression recognition. Appl. Sci. 2020, 10, 1897.
- Ping, W.; Peng, K.; Gibiansky, A.; Arik, S.O.; Kannan, A.; Narang, S.; Miller, J. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Liao, X.; Sahran, S.; Abdullah, A.; Shukor, S.A. Adacb: An adaptive gradient method with convergence range bound of learning rate. Appl. Sci. 2022, 12, 9389.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Hou, S.; Liu, X.; Wang, Z. Dualnet: Learn complementary features for image recognition. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 502–510.
- Murthy, V.N.; Singh, V.; Chen, T.; Manmatha, R.; Comaniciu, D. Deep decision network for multi-class image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2240–2248.
- Xu, J.; Li, Z.; Du, B.; Zhang, M.; Liu, J. Reluplex made more practical: Leaky ReLU. In 2020 IEEE Symposium on Computers and Communications (ISCC); IEEE: New York, NY, USA, 2020; pp. 1–7.
- Yang, C.; Yang, Z.; Liao, S.; Hong, Z.; Nai, W. Triple-GAN with variable fractional order gradient descent method and mish activation function. In 2020 12th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC); IEEE: New York, NY, USA, 2020; Volume 1, pp. 244–247.
- Sütfeld, L.R.; Brieger, F.; Finger, H.; Füllhase, S.; Pipa, G. Adaptive blending units: Trainable activation functions for deep neural networks. In Science and Information Conference; Springer International Publishing: Cham, Switzerland, 2020; pp. 37–50.
- Zhang, S.; Zhang, X.; Shen, L.; Wan, S.; Ren, W. Wavelet-based physically guided normalization network for real-time traffic dehazing. Pattern Recognit. 2025, 172, 112451.
- Liu, Y.; Li, T.; Tan, C.; Ren, W.; Ancuti, C.; Lin, W. IHDCP: Single image dehazing using inverted haze density correction prior. IEEE Trans. Image Process. 2026, 35, 1448–1461.








| α | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.8226 | 0.8231 | 0.8191 | 0.8257 | 0.8235 | 0.8247 | 0.8190 | 0.8214 | 0.8193 | 0.8182 | 0.8197 |

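
The initialization sweep in the table above can be summarized with a few lines of code. This is only a sketch for reading the table: the values are copied from it, the variable names are illustrative, and nothing here reflects how the authors selected their setting.

```python
# Initial alpha values and the accuracies reported for them in the table.
alphas = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
accuracies = [
    0.8226, 0.8231, 0.8191, 0.8257, 0.8235, 0.8247,
    0.8190, 0.8214, 0.8193, 0.8182, 0.8197,
]

# Pick the initialization with the highest reported accuracy.
best_alpha, best_accuracy = max(zip(alphas, accuracies), key=lambda pair: pair[1])

# Gap between the best and worst initialization, as a rough sensitivity measure.
spread = max(accuracies) - min(accuracies)

print(f"best alpha = {best_alpha}, accuracy = {best_accuracy:.4f}")
print(f"spread across initializations = {spread:.4f}")
```

For the reported values, the highest accuracy occurs at α = 0.3 (0.8257), and the best and worst initializations differ by 0.0075.
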

Results of SmallNet on CIFAR10 and CIFAR100.

| Activation Function | CIFAR10 (%) | CIFAR100 (%) | Training Time |
|---|---|---|---|
| ReLU | 81.95 ± 0.37 | 51.35 ± 0.28 | 5 s |
| PReLU | 81.62 ± 0.30 | 51.25 ± 0.41 | 7 s |
| ELU | 81.57 ± 0.33 | 51.20 ± 0.18 | 5 s |
| Swish | 81.43 ± 0.22 | 52.00 ± 0.31 | 6 s |
| PATU | 82.40 ± 0.16 | 52.79 ± 0.33 | 8 s |

Results of Network in Network on CIFAR10 and CIFAR100.

| Activation Function | CIFAR10 (%) | CIFAR100 (%) | Training Time |
|---|---|---|---|
| ReLU | 84.55 ± 0.37 | 56.41 ± 0.47 | 25 s |
| PReLU | 86.47 ± 0.25 | 59.26 ± 0.36 | 41 s |
| ELU | 86.27 ± 0.24 | – | 27 s |
| Swish | 86.10 ± 0.23 | 59.11 ± 0.27 | 37 s |
| PATU | 86.42 ± 0.14 | 59.52 ± 0.39 | 59 s |

Results of ResNet18 on CIFAR10 and CIFAR100.

| Activation Function | CIFAR10 (%) | CIFAR100 (%) | Training Time |
|---|---|---|---|
| ReLU | 82.32 ± 0.51 | 51.54 ± 0.16 | 25 s |
| PReLU | 82.52 ± 0.12 | 50.60 ± 0.63 | 38 s |
| ELU | 81.60 ± 0.27 | 52.88 ± 0.77 | 26 s |
| Swish | 82.74 ± 0.66 | 54.27 ± 0.62 | 30 s |
| PATU | 83.51 ± 0.53 | 54.46 ± 1.59 | 40 s |

Mean rank of each activation function across the six settings (C10 = CIFAR10, C100 = CIFAR100).

| Activation Function | SmallNet (C10) | SmallNet (C100) | NIN (C10) | NIN (C100) | ResNet18 (C10) | ResNet18 (C100) | Mean Rank |
|---|---|---|---|---|---|---|---|
| ReLU | 2 | 3 | 5 | 4 | 4 | 4 | 3.67 |
| PReLU | 3 | 4 | 1 | 2 | 3 | 5 | 3.00 |
| ELU | 4 | 5 | 3 | 5 | 5 | 3 | 4.17 |
| Swish | 5 | 2 | 4 | 3 | 2 | 2 | 3.00 |
| PATU | 1 | 1 | 2 | 1 | 1 | 1 | 1.17 |
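
The mean rank column can be reproduced directly from the per-setting ranks. The sketch below recomputes it as the arithmetic mean of the six ranks in each row; the dictionary layout and names are illustrative.

```python
# Per-setting ranks copied from the table, in column order:
# SmallNet (C10), SmallNet (C100), NIN (C10), NIN (C100),
# ResNet18 (C10), ResNet18 (C100).
ranks = {
    "ReLU": [2, 3, 5, 4, 4, 4],
    "PReLU": [3, 4, 1, 2, 3, 5],
    "ELU": [4, 5, 3, 5, 5, 3],
    "Swish": [5, 2, 4, 3, 2, 2],
    "PATU": [1, 1, 2, 1, 1, 1],
}

# Mean rank is the arithmetic mean over the six settings (lower is better).
mean_rank = {name: sum(values) / len(values) for name, values in ranks.items()}

for name, value in sorted(mean_rank.items(), key=lambda item: item[1]):
    print(f"{name:6s} {value:.2f}")
```

Rounded to two decimals, the means are 25/6 ≈ 4.17 for ELU and 7/6 ≈ 1.17 for PATU.
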

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Liao, X.; Sahran, S.; Abdullah, A.; Shukor, S.A.; Deriche, M. A Novel Adaptive Activation Function for Convolutional Neural Networks: The Parametric Arctangent Unit (PATU). Symmetry 2026, 18, 971. https://doi.org/10.3390/sym18060971

