Article

A Zeroth-Order Adaptive Learning Rate Method to Reduce Cost of Hyperparameter Tuning for Deep Learning

1 School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China
2 National Engineering Laboratory for Big Data Analytics, Xi’an Jiaotong University, Xi’an 710049, China
3 School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(21), 10184; https://doi.org/10.3390/app112110184
Submission received: 5 October 2021 / Revised: 26 October 2021 / Accepted: 28 October 2021 / Published: 30 October 2021
(This article belongs to the Topic Machine and Deep Learning)

Abstract

Owing to its powerful data representation ability, deep learning has dramatically improved the state of the art in many practical applications. However, its utility depends heavily on the fine-tuning of hyper-parameters, including the learning rate, batch size, and network initialization. Although many first-order adaptive methods (e.g., Adam, Adagrad) have been proposed to adjust the learning rate based on gradients, they remain sensitive to the initial learning rate and the network architecture. The main challenge of using deep learning in practice is therefore how to reduce the cost of tuning hyper-parameters. To address this, we propose a heuristic zeroth-order learning rate method, Adacomp, which adaptively adjusts the learning rate based only on the values of the loss function. The main idea is that Adacomp penalizes large learning rates to ensure convergence and compensates small learning rates to accelerate training; as a result, Adacomp is robust to the initial learning rate. We conducted extensive experiments, including comparisons with six typical adaptive methods (Momentum, Adagrad, RMSprop, Adadelta, Adam, and Adamax), on several benchmark image classification datasets (MNIST, KMNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100). The results show that Adacomp is robust not only to the initial learning rate but also to the network architecture, network initialization, and batch size.
Keywords: deep learning; adaptive learning rate; robustness; stochastic gradient descent
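
To make the idea described in the abstract concrete, the following is a minimal, self-contained Python sketch of a zeroth-order learning-rate controller in the spirit of Adacomp: it looks only at successive loss values, shrinking the rate when the loss worsens (penalizing large learning rates) and growing it when the loss improves (compensating small ones). The class name ZerothOrderLRController and the multiplicative shrink/grow factors are illustrative assumptions, not the update rule derived in the paper.

class ZerothOrderLRController:
    def __init__(self, lr=0.1, shrink=0.5, grow=1.1, lr_min=1e-6, lr_max=1.0):
        self.lr = lr            # current learning rate
        self.prev_loss = None   # loss observed at the previous step
        self.shrink = shrink    # penalty factor when the loss worsens (assumed)
        self.grow = grow        # compensation factor when the loss improves (assumed)
        self.lr_min, self.lr_max = lr_min, lr_max

    def step(self, loss):
        # Adjust the learning rate using only the newly observed loss value (no gradients).
        if self.prev_loss is not None:
            if loss > self.prev_loss:
                # Loss increased: the last step was too aggressive, so penalize the rate.
                self.lr = max(self.lr * self.shrink, self.lr_min)
            else:
                # Loss decreased: compensate a conservative rate to speed up training.
                self.lr = min(self.lr * self.grow, self.lr_max)
        self.prev_loss = loss
        return self.lr


# Toy usage: minimize f(w) = (w - 3)^2 with gradient steps whose size is set by the controller.
if __name__ == "__main__":
    w, ctrl = 0.0, ZerothOrderLRController(lr=0.5)
    for _ in range(50):
        loss = (w - 3.0) ** 2
        lr = ctrl.step(loss)
        w -= lr * 2.0 * (w - 3.0)   # analytic gradient of the toy objective
    print(f"w = {w:.3f}, final lr = {ctrl.lr:.4f}")

The multiplicative shrink/grow rule above is only one possible way to "penalize" and "compensate" the learning rate; the paper derives its own rule and evaluates it against the gradient-based methods listed in the abstract.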

Share and Cite

MDPI and ACS Style

Li, Y.; Ren, X.; Zhao, F.; Yang, S. A Zeroth-Order Adaptive Learning Rate Method to Reduce Cost of Hyperparameter Tuning for Deep Learning. Appl. Sci. 2021, 11, 10184. https://doi.org/10.3390/app112110184

AMA Style

Li Y, Ren X, Zhao F, Yang S. A Zeroth-Order Adaptive Learning Rate Method to Reduce Cost of Hyperparameter Tuning for Deep Learning. Applied Sciences. 2021; 11(21):10184. https://doi.org/10.3390/app112110184

Chicago/Turabian Style

Li, Yanan, Xuebin Ren, Fangyuan Zhao, and Shusen Yang. 2021. "A Zeroth-Order Adaptive Learning Rate Method to Reduce Cost of Hyperparameter Tuning for Deep Learning" Applied Sciences 11, no. 21: 10184. https://doi.org/10.3390/app112110184

APA Style

Li, Y., Ren, X., Zhao, F., & Yang, S. (2021). A Zeroth-Order Adaptive Learning Rate Method to Reduce Cost of Hyperparameter Tuning for Deep Learning. Applied Sciences, 11(21), 10184. https://doi.org/10.3390/app112110184

