This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method
by Yun Bu 1,*, Wenbo Jiang 1, Gang Lu 1 and Qiang Zhang 2
1 School of Electrical Engineering and Electronic Information, Xihua University, Chengdu 610039, China
2 School of Sciences, Southwest Petroleum University, Chengdu 610500, China
* Author to whom correspondence should be addressed.
Entropy 2026, 28(1), 57; https://doi.org/10.3390/e28010057 (registering DOI)
Submission received: 25 September 2025 / Revised: 17 December 2025 / Accepted: 26 December 2025 / Published: 31 December 2025
Abstract
When training a neural network, the choice of activation function can greatly affect performance. An activation function with a large derivative may cause the weights of later layers to deviate further from the computed update direction, making deep networks harder to train. Conversely, an activation function whose derivative has an amplitude of less than one can lead to the vanishing gradient problem. To overcome this drawback, we propose a pseudo-normalizing method that enlarges selected gradients by dividing them by their root mean square. This amplification is applied every few layers so that the gradient amplitudes remain larger than one, thereby avoiding vanishing gradients without causing gradient explosion. We successfully applied this approach to several deep networks with hyperbolic tangent activation for image classification. To gain a deeper understanding of the algorithm, we used interpretability techniques to examine the networks' predictions. We found that, in contrast to popular networks that learn image features, our networks rely primarily on the contour information of images for classification. This suggests that our technique can be used to complement other widely used algorithms.
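As a rough illustration of the idea described in the abstract, the sketch below (a minimal example written for this page, not the authors' code) backpropagates through a stack of tanh layers and, every few layers, divides the propagated gradient by its root mean square so that its amplitude does not decay toward zero. The depth, layer width, rescaling interval, and epsilon guard are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of RMS-based gradient rescaling ("pseudo-normalizing"),
# under assumed layer sizes and a rescaling interval K chosen for illustration.
rng = np.random.default_rng(0)
L, WIDTH, K, EPS = 12, 64, 3, 1e-8          # depth, layer width, rescale interval, guard

# Roughly norm-preserving linear layers; tanh derivatives (< 1) shrink gradients.
weights = [rng.normal(0.0, 1.0 / np.sqrt(WIDTH), (WIDTH, WIDTH)) for _ in range(L)]

# Forward pass, keeping pre-activations for the backward pass.
x = rng.normal(size=WIDTH)
acts, pres = [x], []
for W in weights:
    z = W @ acts[-1]
    pres.append(z)
    acts.append(np.tanh(z))

# Backward pass: propagate an upstream gradient toward the input.
grad = rng.normal(size=WIDTH)
for i in reversed(range(L)):
    grad = weights[i].T @ (grad * (1.0 - np.tanh(pres[i]) ** 2))
    if i % K == 0:                            # pseudo-normalize every K layers
        rms = np.sqrt(np.mean(grad ** 2)) + EPS
        grad = grad / rms                     # rescaled gradient has RMS ~ 1
    print(f"layer {i:2d}: gradient RMS = {np.sqrt(np.mean(grad ** 2)):.3e}")
```

Commenting out the rescaling branch lets one compare how the printed per-layer gradient RMS evolves with depth with and without the pseudo-normalizing step.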
Share and Cite
MDPI and ACS Style
Bu, Y.; Jiang, W.; Lu, G.; Zhang, Q.
Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method. Entropy 2026, 28, 57.
https://doi.org/10.3390/e28010057
AMA Style
Bu Y, Jiang W, Lu G, Zhang Q.
Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method. Entropy. 2026; 28(1):57.
https://doi.org/10.3390/e28010057
Chicago/Turabian Style
Bu, Yun, Wenbo Jiang, Gang Lu, and Qiang Zhang.
2026. "Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method" Entropy 28, no. 1: 57.
https://doi.org/10.3390/e28010057
APA Style
Bu, Y., Jiang, W., Lu, G., & Zhang, Q.
(2026). Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method. Entropy, 28(1), 57.
https://doi.org/10.3390/e28010057
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.