Open Access Article

Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks

Shrihari Vasudevan
IBM Research, Bangalore 560045, India
Entropy 2020, 22(5), 560; https://doi.org/10.3390/e22050560
Received: 2 April 2020 / Revised: 10 May 2020 / Accepted: 15 May 2020 / Published: 17 May 2020
(This article belongs to the Special Issue Deep Artificial Neural Networks Meet Information Theory)
This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and the true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to a layer-wise setting of the LR, as MI naturally provides a layer-wise performance metric. An LR range test for determining the operating LR range is also proposed. Experiments compared this approach with popular gradient-based adaptive LR algorithms such as Adam, RMSprop, and LARS. Accuracy outcomes that were competitive with or better than these alternatives, obtained in competitive or better training time, demonstrate the feasibility of the metric and the approach.
Keywords: deep neural networks; stochastic gradient descent; mutual information; adaptive learning rate
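
The abstract does not reproduce the paper's exact update rule, so the following is only a minimal sketch of the general idea: estimate MI between predicted and true labels from their empirical joint distribution at the end of each epoch, and decay the LR toward a floor as that MI approaches its upper bound (the entropy of the label distribution). The function names `mutual_information`, `label_entropy`, and `mi_decayed_lr`, the linear decay rule, and the `lr_max`/`lr_min` bounds are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def mutual_information(y_true, y_pred, n_classes):
    """Estimate MI (in nats) between true and predicted labels
    from their empirical joint distribution."""
    joint = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        joint[t, p] += 1.0
    joint /= joint.sum()
    p_true = joint.sum(axis=1, keepdims=True)   # marginal of true labels
    p_pred = joint.sum(axis=0, keepdims=True)   # marginal of predictions
    nz = joint > 0                               # avoid log(0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (p_true @ p_pred)[nz])))

def label_entropy(y_true, n_classes):
    """Entropy H(Y) of the empirical label distribution, an upper bound on MI."""
    counts = np.bincount(y_true, minlength=n_classes).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mi_decayed_lr(mi, mi_max, lr_max=0.1, lr_min=1e-4):
    """Hypothetical decay rule: interpolate linearly from lr_max to lr_min
    as MI(output; labels) rises toward its upper bound H(Y)."""
    frac = np.clip(mi / mi_max, 0.0, 1.0)
    return lr_min + (lr_max - lr_min) * (1.0 - frac)

# Per-epoch usage inside a training loop (predictions from a validation pass):
# mi = mutual_information(y_val, y_val_pred, n_classes)
# lr = mi_decayed_lr(mi, label_entropy(y_val, n_classes))
```

The paper additionally uses MI as a layer-wise performance metric to set per-layer LRs; the same kind of MI estimate could in principle be applied to each layer's output, but the precise layer-wise formulation and the proposed LR range test are given in the full text.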