Entropy 2019, 21(4), 357;

Multistructure-Based Collaborative Online Distillation

National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, China
School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK
Author to whom correspondence should be addressed.
Received: 5 March 2019 / Revised: 24 March 2019 / Accepted: 1 April 2019 / Published: 2 April 2019
(This article belongs to the Special Issue The Information Bottleneck in Deep Learning)


Recently, deep learning has achieved state-of-the-art performance in many areas where traditional machine-learning methods based on shallow architectures fall short. However, achieving higher accuracy usually requires extending the network depth or ensembling the outputs of different neural networks, both of which increase the demand for memory and computing resources. This makes it difficult to deploy deep-learning models in resource-constrained scenarios such as drones, mobile phones, and autonomous driving. Improving network performance without expanding the network scale has therefore become an active research topic. In this paper, we propose a cross-architecture online-distillation approach that addresses this problem by transmitting supplementary information between different networks. We use an ensemble method to aggregate networks of different structures, forming a better teacher than traditional distillation methods provide. In addition, discontinuous distillation with progressively enhanced constraints replaces fixed distillation in order to reduce the loss of information diversity during the distillation process. Our training method improves the distillation effect and yields strong gains in network performance. We validated the results on several popular models: on the CIFAR100 dataset, accuracy improved by 5.94% for AlexNet, 2.88% for VGG, 5.07% for ResNet, and 1.28% for DenseNet. Extensive experiments on the CIFAR10, CIFAR100, and ImageNet datasets demonstrate the effectiveness of the proposed method, with significant improvements over traditional knowledge distillation.
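The abstract describes three ingredients: an ensemble teacher built from peer networks of different architectures, a distillation term applied only at intervals rather than at every step, and a constraint weight that grows over training. The sketch below illustrates one way these pieces could fit together in PyTorch; the logit averaging, the temperature, the distillation interval, and the weight schedule are illustrative assumptions, not the authors' exact formulation or hyperparameters.

```python
# Minimal sketch of cross-architecture online distillation (assumed PyTorch setup).
# The ensemble teacher, temperature, ramp-up schedule, and distillation interval
# are illustrative choices, not the paper's exact settings.
import torch
import torch.nn.functional as F

def ensemble_teacher_logits(logits_list):
    """Average the logits of peer networks (e.g., AlexNet, VGG, ResNet, DenseNet)
    to form a stronger ensemble teacher."""
    return torch.stack(logits_list, dim=0).mean(dim=0)

def distillation_loss(student_logits, teacher_logits, temperature=3.0):
    """KL divergence between temperature-softened student and teacher distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

def collaborative_step(models, optimizers, x, y, step,
                       distill_every=2, max_alpha=1.0, total_steps=10000):
    """One training step for all peers: every `distill_every` steps they also
    distill from the ensemble teacher, with a weight that grows over training
    (a progressively enhanced constraint)."""
    with torch.no_grad():
        teacher = ensemble_teacher_logits([m(x) for m in models])
    alpha = max_alpha * min(1.0, step / total_steps)   # ramp up the constraint
    use_distill = (step % distill_every == 0)          # discontinuous distillation
    for model, opt in zip(models, optimizers):
        logits = model(x)
        loss = F.cross_entropy(logits, y)              # supervised loss
        if use_distill:
            loss = loss + alpha * distillation_loss(logits, teacher)
        opt.zero_grad()
        loss.backward()
        opt.step()
```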
Keywords: deep learning; knowledge distillation; distributed architecture; supplementary information


This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


MDPI and ACS Style

Gao, L.; Lan, X.; Mi, H.; Feng, D.; Xu, K.; Peng, Y. Multistructure-Based Collaborative Online Distillation. Entropy 2019, 21, 357.


