This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method

1 School of Electrical Engineering and Electronic Information, Xihua University, Chengdu 610039, China
2 School of Sciences, Southwest Petroleum University, Chengdu 610500, China
* Author to whom correspondence should be addressed.
Entropy 2026, 28(1), 57; https://doi.org/10.3390/e28010057
Submission received: 25 September 2025 / Revised: 17 December 2025 / Accepted: 26 December 2025 / Published: 31 December 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

When training a neural network, the choice of activation function can greatly affect performance. An activation function with a large derivative may push the coefficients of the later layers further from the computed descent direction, making deep networks harder to train. Conversely, an activation function whose derivative magnitude is less than one can lead to the vanishing gradient problem. To overcome this drawback, we propose applying pseudo-normalization, which enlarges selected gradients by dividing them by their root mean square. This amplification is performed every few layers to keep the gradient amplitudes above one, thereby avoiding vanishing gradients while also preventing gradient explosion. We successfully applied this approach to several deep networks with hyperbolic tangent activation for image classification. To gain a deeper understanding of the algorithm, we employed interpretability techniques to examine the networks' predictions. We found that, in contrast to popular networks that learn image features, these networks rely primarily on the contour information of images for classification. This suggests that our technique can be used in combination with other widely used algorithms.
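The following is a minimal sketch of the pseudo-normalizing idea as described in the abstract, not the authors' published implementation: during backpropagation through a stack of tanh layers, the gradient is divided by its root mean square every few layers so that its typical magnitude is pulled back toward one. The interval EVERY_N_LAYERS, the helper rms_rescale, and the toy backward pass below are illustrative assumptions.

```python
import numpy as np

EVERY_N_LAYERS = 3   # assumed rescaling interval ("every few layers")
EPS = 1e-8           # numerical safeguard, assumed

def rms_rescale(grad):
    """Divide a gradient by its root mean square so its typical magnitude
    returns toward 1, countering shrinkage from tanh derivatives (<= 1)."""
    rms = np.sqrt(np.mean(grad ** 2)) + EPS
    return grad / rms

# Toy backward pass through a stack of tanh layers.
rng = np.random.default_rng(0)
num_layers = 12
weights = [rng.standard_normal((32, 32)) * 0.1 for _ in range(num_layers)]
activations = [np.tanh(rng.standard_normal(32)) for _ in range(num_layers)]

grad = rng.standard_normal(32)  # gradient flowing back from the loss
for layer in reversed(range(num_layers)):
    # Chain rule through tanh: the derivative 1 - tanh(x)^2 is at most 1,
    # which is what makes gradients vanish in deep stacks.
    grad = (weights[layer].T @ grad) * (1.0 - activations[layer] ** 2)
    if layer % EVERY_N_LAYERS == 0:
        grad = rms_rescale(grad)  # pseudo-normalize every few layers
    print(f"layer {layer:2d}: gradient RMS = {np.sqrt(np.mean(grad**2)):.3e}")
```

Without the rescaling step, the printed gradient RMS shrinks steadily toward zero as the loop moves toward the earlier layers; with it, the magnitude is restored at regular intervals without growing unboundedly.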
Keywords: deep learning; convolutional neural networks; backpropagation algorithm; vanishing gradient; pseudo-normalizing

Share and Cite

MDPI and ACS Style

Bu, Y.; Jiang, W.; Lu, G.; Zhang, Q. Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method. Entropy 2026, 28, 57. https://doi.org/10.3390/e28010057

AMA Style

Bu Y, Jiang W, Lu G, Zhang Q. Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method. Entropy. 2026; 28(1):57. https://doi.org/10.3390/e28010057

Chicago/Turabian Style

Bu, Yun, Wenbo Jiang, Gang Lu, and Qiang Zhang. 2026. "Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method" Entropy 28, no. 1: 57. https://doi.org/10.3390/e28010057

APA Style

Bu, Y., Jiang, W., Lu, G., & Zhang, Q. (2026). Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method. Entropy, 28(1), 57. https://doi.org/10.3390/e28010057

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
