
Information Bottleneck Theory Based Exploration of Cascade Learning

School of Electronics and Computer Science, University of Southampton, Southampton SO17 3AS, UK
Author to whom correspondence should be addressed.
Academic Editor: Boštjan Brumen
Entropy 2021, 23(10), 1360
Received: 6 September 2021 / Revised: 12 October 2021 / Accepted: 15 October 2021 / Published: 18 October 2021
(This article belongs to the Special Issue Information-Theoretic Data Mining)
In solving challenging pattern recognition problems, deep neural networks have shown excellent performance by forming powerful mappings between inputs and targets, learning representations (features) and making subsequent predictions. A recent tool to help understand how representations are formed is based on observing the dynamics of learning on an information plane using mutual information, linking the input to the representation (I(X;T)) and the representation to the target (I(T;Y)). In this paper, we use an information-theoretical approach to understand how Cascade Learning (CL), a method to train deep neural networks layer-by-layer, learns representations, as CL has shown comparable results while saving computation and memory costs. We observe that performance is not linked to information compression, which differs from observations of End-to-End (E2E) learning. Additionally, CL can inherit information about targets, and gradually specialise extracted features layer-by-layer. We evaluate this effect by proposing an information transition ratio, I(T;Y)/I(X;T), and show that it can serve as a useful heuristic in setting the depth of a neural network that achieves satisfactory classification accuracy.
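The abstract's information transition ratio, I(T;Y)/I(X;T), can be sketched in code. The paper's actual mutual-information estimator is not specified here, so the following is a minimal illustrative sketch using a simple histogram (binning) estimator over scalar summaries of the input X, a layer's representation T, and the target Y; the function names `mutual_information` and `transition_ratio` are hypothetical, not from the paper.

```python
import numpy as np

def mutual_information(a, b, bins=10):
    """Histogram-based estimate (in nats) of I(A;B) for two 1-D samples."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()                # empirical joint distribution
    p_a = p_ab.sum(axis=1, keepdims=True)     # marginal over A
    p_b = p_ab.sum(axis=0, keepdims=True)     # marginal over B
    mask = p_ab > 0                           # avoid log(0) on empty cells
    return float((p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])).sum())

def transition_ratio(x, t, y, bins=10):
    """Information transition ratio I(T;Y) / I(X;T) from the abstract."""
    return mutual_information(t, y, bins) / mutual_information(x, t, bins)
```

For a perfectly information-preserving layer (T identical to X and Y), the ratio is exactly 1 under any common binning; in practice the ratio traces how much target-relevant information each successive layer retains relative to what it keeps about the input.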
Keywords: information bottleneck theory; Cascade Learning; neural networks
MDPI and ACS Style

Du, X.; Farrahi, K.; Niranjan, M. Information Bottleneck Theory Based Exploration of Cascade Learning. Entropy 2021, 23, 1360.

AMA Style

Du X, Farrahi K, Niranjan M. Information Bottleneck Theory Based Exploration of Cascade Learning. Entropy. 2021; 23(10):1360.

Chicago/Turabian Style

Du, Xin, Katayoun Farrahi, and Mahesan Niranjan. 2021. "Information Bottleneck Theory Based Exploration of Cascade Learning" Entropy 23, no. 10: 1360.


