# Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks

## Abstract


## 1. Introduction

## 2. Related Work

- An MI-based automation of decaying-LR SGD training of neural network models that adaptively sets the LR, layer-wise or for the whole network, throughout the training cycle.
- An LR Range Test that defines the broad LR bounds within which the proposed algorithm operates.
- Evaluation of the proposed algorithm against state-of-the-art alternatives on a range of data sets and models, demonstrating the viability of using MI to automate decaying-LR SGD training.
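The first contribution can be pictured with a small sketch. This is an illustration of the general idea only, not the paper's Algorithm 1: it assumes the measured MI between model outputs and true labels is mapped linearly onto an LR inside the bounds `[lr_min, lr_max]` produced by an LR Range Test; the function name `mi_to_lr` and the linear mapping are assumptions for illustration.

```python
# Hedged sketch of MI-driven LR decay (an illustration, not the paper's
# Algorithm 1): each epoch, an MI estimate between model outputs and true
# labels would be computed on a fixed sample, and rising MI is mapped to
# a decaying LR inside the bounds [lr_min, lr_max] from an LR Range Test.
import math

def mi_to_lr(mi, mi_max, lr_min, lr_max):
    """Decay the LR from lr_max toward lr_min as MI approaches its
    attainable maximum (e.g. log of the number of classes)."""
    frac = min(max(mi / mi_max, 0.0), 1.0)  # training-progress proxy
    return lr_max - frac * (lr_max - lr_min)

lr_min, lr_max = 1e-4, 0.2       # example bounds from an LR Range Test
mi_max = math.log(10)            # I(Y; Yhat) <= log K for K = 10 classes
for mi in (0.3, 1.2, 2.0, 2.3):  # MI estimates as training progresses
    print(f"MI={mi:.1f} -> LR={mi_to_lr(mi, mi_max, lr_min, lr_max):.4f}")
```

As MI saturates toward its upper bound, the schedule bottoms out at `lr_min`, which is the qualitative behaviour a decaying-LR policy needs.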

## 3. Approach

Algorithm 1: MI-based decaying LR SGD

Algorithm 2: LR Range Test
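The general shape of an LR Range Test (in the spirit of Smith's cyclical-LR work cited in the references) can be sketched as follows. This is a toy illustration, not the paper's Algorithm 2: the quadratic loss, the sweep bounds, and the 1% improvement margin are all assumptions.

```python
# Illustrative LR Range Test sketch: sweep the LR geometrically from a
# small to a large value over short runs, and keep the band of LRs for
# which the loss still decreases. Toy quadratic objective in plain NumPy;
# the loss model and margin are assumptions, not the paper's Algorithm 2.
import numpy as np

def loss_after_short_run(lr, steps=50):
    """Run SGD on f(w) = 0.5 * w^2 from w = 5; diverges for lr >= 2."""
    w = 5.0
    for _ in range(steps):
        w -= lr * w  # gradient of 0.5 * w^2 is w
    return 0.5 * w * w

lrs = np.geomspace(1e-4, 10.0, num=30)
losses = np.array([loss_after_short_run(lr) for lr in lrs])

# Keep LRs whose short-run loss beats the initial loss by a small margin;
# the min/max of that set give broad [lr_min, lr_max] bounds.
initial_loss = 0.5 * 5.0**2
usable = lrs[losses < 0.99 * initial_loss]
print(f"LR bounds: [{usable.min():.2e}, {usable.max():.2e}]")
```

On this toy objective the test correctly excludes the divergent regime (lr above 2) and very small LRs that barely move the loss, leaving a broad usable band.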

## 4. Experiments

## 5. Discussion

## 6. Conclusions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. He, X.; Zhao, K.; Chu, X. AutoML: A Survey of the State-of-the-Art. arXiv **2019**, arXiv:1908.00709.
2. Bottou, L. Online Algorithms and Stochastic Approximations. In Online Learning and Neural Networks; Cambridge University Press: Cambridge, UK, 1998.
3. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. arXiv **2016**, arXiv:1603.05027.
4. Goyal, P.; Dollár, P.; Girshick, R.; Noordhuis, P.; Wesolowski, L.; Kyrola, A.; Tulloch, A.; Jia, Y.; He, K. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv **2017**, arXiv:1706.02677.
5. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 2006.
6. Ruder, S. An overview of gradient descent optimization algorithms. arXiv **2016**, arXiv:1609.04747.
7. Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. **2011**, 12, 2121–2159.
8. Zeiler, M. ADADELTA: An Adaptive Learning Rate Method. arXiv **2012**, arXiv:1212.5701.
9. Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. In Neural Networks for Machine Learning; COURSERA: Mountain View, CA, USA, 2012.
10. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv **2014**, arXiv:1412.6980.
11. Rolinek, M.; Martius, G. L4: Practical loss-based stepsize adaptation for deep learning. arXiv **2018**, arXiv:1802.05074.
12. You, Y.; Gitman, I.; Ginsburg, B. Large Batch Training of Convolutional Networks. arXiv **2017**, arXiv:1708.03888.
13. Shamir, O.; Sabato, S.; Tishby, N. Learning and generalization with the Information Bottleneck. Theor. Comput. Sci. **2010**, 411, 2696–2711.
14. Hu, B.G.; He, R.; Yuan, X.T. Information-Theoretic Measures for Objective Evaluation of Classifications. arXiv **2011**, arXiv:1107.1837.
15. Meyen, S. Relation between Classification Accuracy and Mutual Information in Equally Weighted Classification Tasks. Master's Thesis, University of Hamburg, Hamburg, Germany, 2016.
16. Tishby, N.; Zaslavsky, N. Deep learning and the information bottleneck principle. In Proceedings of the IEEE Information Theory Workshop (ITW), Jerusalem, Israel, 26 April–1 May 2015.
17. Tishby, N.; Pereira, F.C.; Bialek, W. The information bottleneck method. In Proceedings of the 37th Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 22–24 September 1999.
18. Vasudevan, S. Dynamic Learning Rate using Mutual Information. arXiv **2018**, arXiv:1805.07249.
19. Fang, H.; Wang, V.; Yamaguchi, M. Dissecting Deep Learning Networks–Visualising Mutual Information. Entropy **2018**, 20, 823.
20. Hjelm, R.D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Bachman, P.; Trischler, A.; Bengio, Y. Learning Deep Representations by Mutual Information estimation and maximization. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
21. Smith, L.N. Cyclical Learning Rates for Training Neural Networks. arXiv **2015**, arXiv:1506.01186.
22. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE **1998**, 86, 2278–2324.
23. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009.
24. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. (IJCV) **2015**, 115, 211–252.
25. Springenberg, J.; Dosovitskiy, A.; Brox, T.; Riedmiller, M. Striving for Simplicity: The All Convolutional Net. arXiv **2014**, arXiv:1412.6806.
26. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
27. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E **2004**, 69, 066138.
28. Zagoruyko, S.; Komodakis, N. Wide Residual Networks. arXiv **2016**, arXiv:1605.07146.
29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. arXiv **2015**, arXiv:1512.03385.
30. Wilson, A.; Roelofs, R.; Stern, M.; Srebro, N.; Recht, B. The marginal value of Adaptive Gradient methods in Machine Learning. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017.
31. Keskar, N.; Socher, R. Improving generalization performance by switching from Adam to SGD. arXiv **2017**, arXiv:1712.07628.
32. Cho, D.; Yoo, C.; Im, J.; Cha, D. Comparative assessment of various machine learning-based bias correction methods for numerical weather prediction model forecasts of extreme air temperatures in urban areas. Earth Space Sci. **2020**, 7, e2019EA000740.

**Figure 1.** Mutual Information (MI) (of input and output training data) vs. sample size for MNIST (**left**) and CIFAR-10 (**right**), as computed using the KSG estimator. The plots show the estimated mean and standard deviation (error bar) for each sample size tested. A sample size of 1000 was chosen for MI computation in the experiments of this paper, selected as a trade-off between the computational cost of computing MI and the variation in estimates. A sample-size sensitivity test using CIFAR-10 is described in the experiments.
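The sample-size trade-off above can be reproduced in miniature with a kNN-based MI estimator. The sketch below uses scikit-learn's `mutual_info_classif` (a nearest-neighbour estimator in the KSG family) on synthetic data; the synthetic features, the per-feature-sum proxy for total MI, and the sample sizes tried are assumptions standing in for the paper's MNIST/CIFAR-10 setup.

```python
# Sketch: estimate MI between inputs and class labels for increasing
# sample sizes, mirroring the sample-size trade-off study. Uses
# scikit-learn's kNN-based estimator (KSG family); the synthetic data
# below stands in for MNIST/CIFAR-10 and is an assumption.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n_total, n_features = 5000, 16
X = rng.normal(size=(n_total, n_features))
y = (X[:, 0] > 0).astype(int)  # labels correlated with one feature

for n in (100, 500, 1000, 2000):
    idx = rng.choice(n_total, size=n, replace=False)
    # MI per feature (nats); the sum is a crude proxy for I(X; Y)
    mi = mutual_info_classif(X[idx], y[idx], n_neighbors=3, random_state=0)
    print(f"sample size {n}: total MI estimate {mi.sum():.3f}")
```

Running a sweep like this at several random seeds gives the mean-and-error-bar picture of the figure: estimates stabilise as the sample size grows, while cost grows with it, motivating an intermediate choice such as 1000.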

**Figure 3.** MNIST (Model based on LeNet-5): Accuracy and LR plots using the proposed approach and a single model-level LR in [0.0001, 0.2]. The proposed approach produced a best test accuracy of 99.27% in 50 epochs, compared to the best alternative of 99.53% obtained using both Adam and RMSprop. Best outcomes from 3 random seed runs are reported.

**Figure 4.** MNIST (Model based on LeNet-5): Accuracy and LR plots using the proposed approach and layer-wise LR in [0.0001, 0.2]. The proposed approach produced a best test accuracy of 99.39% in 50 epochs, compared to the best alternative of 99.43% obtained using LARS. Best outcomes from 3 random seed runs are reported.

**Figure 5.** CIFAR10 (AllConvNet): Accuracy and LR plots using the proposed approach and a single model-level LR in [0.00075, 0.04]. The proposed approach produced a best test accuracy of 88.86% in 350 epochs, compared to the best alternative of 88.13% obtained using RMSprop. Best outcomes from 3 random seed runs are reported.

**Figure 6.** CIFAR10 (AllConvNet): Accuracy and LR plots using the proposed approach and layer-wise LR in [0.00075, 0.04]. The proposed approach produced a best test accuracy of 87.77% in 350 epochs, compared to the best alternative of 77.03% obtained using LARS. Best outcomes from 3 random seed runs are reported.

**Figure 7.** CIFAR10 (VGG16): Accuracy and LR plots using the proposed approach and a single model-level LR in [0.0003, 0.07]. The proposed approach produced a best test accuracy of 92.21% in 200 epochs, compared to the best alternative of 92.58% obtained using SGD with a fixed LR decay policy. Best outcomes from 3 random seed runs are reported.

**Figure 8.** CIFAR100 (Wide-Resnet-28-10): Accuracy and LR plots using the proposed approach and a single model-level LR in [0.0003, 0.07]. The proposed approach produced a best test accuracy of 81.25% in 200 epochs, compared to the best alternative of 81.76% obtained using SGD with a fixed LR decay policy. The proposed approach reached top-level accuracies 10–15% faster than the alternative. Best outcomes from 3 random seed runs are reported.

**Figure 9.** ImageNet-1K (Resnet-50): Accuracy and LR plots using the proposed approach and a single model-level LR in [0.0005, 0.1]. The proposed approach produced a best test accuracy of 76.05% in 100 epochs, compared to the best alternative of 75.57% obtained using SGD with a fixed LR decay policy. Best outcomes from 3 random seed runs are reported.

**Figure 10.** Results of the application of the proposed approach to a temperature prediction (regression problem) data set [32]. The proposed approach produced a competitive Mean Absolute Error (MAE) of 1.32 °C in comparison to the best alternative approach (Adam), which produced an MAE of 0.97 °C. Reported numbers are best outcomes of three random seed runs.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Vasudevan, S.
Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks. *Entropy* **2020**, *22*, 560.
https://doi.org/10.3390/e22050560

**AMA Style**

Vasudevan S.
Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks. *Entropy*. 2020; 22(5):560.
https://doi.org/10.3390/e22050560

**Chicago/Turabian Style**

Vasudevan, Shrihari.
2020. "Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks" *Entropy* 22, no. 5: 560.
https://doi.org/10.3390/e22050560