Towards New Generation, Biologically Plausible Deep Neural Network Learning
Abstract
1. Introduction
2. Background and Motivation
2.1. End-to-End Backpropagation
2.1.1. Biological Implausibility of End-to-End Backpropagation
- First is the lack of local error representation and the reliance on nonlocal learning rules. Classic end-to-end backpropagation updates the weight between two nodes using a nonlocal rule, whereas Hebb’s work demonstrates that a change in synaptic strength in biological neural network learning should depend only on the neurons local to that synapse [20].
- Second is the symmetry of forward and backward weights. In artificial neural networks, the same synapse carries an identical weight during forward (information) and backward (error) propagation. This symmetry would imply identically weighted forward and backward connections between biological neurons; however, no such symmetry is observed in the brain [21].
- Third are the data demands. The end-to-end backpropagation algorithm requires a very large labelled dataset, whereas biological neural systems use unsupervised learning with observations from their extensive sensory experience to train feature detectors [22].
- Fourth are the unrealistic models of neurons. By and large, artificial neural networks use neurons that produce a continuous-valued output, whereas real neurons emit discrete spikes [23]; in other words, while natural neurons either fire or not (are “on” or “off”), the output of an artificial neuron exhibits a smoothly varying degree of activation.
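To make the fourth point concrete, the contrast between a graded artificial activation and an all-or-nothing spike can be sketched as follows. This is a toy illustration only (the weight values, threshold, and function names are illustrative choices, not part of the model proposed in this paper):

```python
import numpy as np

def rate_neuron(x, w):
    """Typical artificial neuron: a smooth, continuous-valued output."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))  # sigmoid of the weighted input

def spiking_neuron(x, w, threshold=1.0):
    """Crude all-or-nothing abstraction: the unit either fires or it does not."""
    return 1 if np.dot(w, x) >= threshold else 0

x = np.array([0.5, 1.0])  # illustrative input
w = np.array([0.8, 0.4])  # illustrative weights; np.dot(w, x) == 0.8

print(rate_neuron(x, w))     # a graded value in (0, 1)
print(spiking_neuron(x, w))  # either 0 or 1
```

The graded unit reports *how strongly* it is driven, while the thresholded unit only reports *whether* it is driven past its threshold, which is the qualitative gap the fourth criticism points at.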
2.1.2. Limitations of End-to-End Backpropagation
- computationally intensive to train and deploy (GPUs required, and sometimes even TPUs) [30],
- susceptible to algorithmic bias [31],
- poor at representing and conveying uncertainty (how can one know what the model knows?) [32],
- difficult to complement with structure and prior knowledge during learning [33],
- uninterpretable black boxes (which do not easily establish trust) [34], and
- require expert knowledge for their design and fine-tuning [35].
2.2. Biologically Plausible Learning
2.3. A Balancing Act
2.4. Scalability
3. Related Work
3.1. Supervised Learning Approaches
3.1.1. Target Propagation
3.1.2. Feedback Alignment
3.2. Semi-Supervised Learning Approaches
3.2.1. Autoencoders (AEs) and Restricted Boltzmann Machines (RBMs)
3.2.2. Unsupervised Learning with Hidden Competing Units
3.2.3. Bayesian Confidence Propagation Neural Network (BCPNN)
3.3. Competitive Learning
- a starting state comprising a set of units whose randomly distributed parameters enable each unit to respond slightly differently to inputs;
- limited unit “strength” [25]; and
- unit competition for the “winner-takes-all” right to respond to input.
3.4. Learning Robustness
4. Our Biologically Plausible Model
4.1. Overall Structure
Forward Pass
4.2. The Unsupervised Component
4.2.1. Activations of Hidden Units
4.2.2. Temporal Competition
4.2.3. Synaptic Plasticity Rule
4.2.4. Approximate but Fast(er) Learning
4.2.5. Summary
4.3. The Supervised Component
5. Experimental Analysis
5.1. Implementation and Hardware
5.2. Datasets
5.3. Architectures
5.4. Cost Function
5.5. Hyperparameters
5.6. Baseline Comparate
5.7. Experiments
5.7.1. Feature Detectors
5.7.2. Data Scarcity
Results
5.7.3. Robustness to Data Corruption
Additive Gaussian Noise
Feature Detectors
Robustness to Additive Gaussian Noise
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Vinyals, O.; Babuschkin, I.; Czarnecki, W.M.; Mathieu, M.; Dudzik, A.; Chung, J.; Choi, D.H.; Powell, R.; Ewalds, T.; Georgiev, P.; et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 2019, 575, 350–354.
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of go without human knowledge. Nature 2017, 550, 354–359.
- Khalil, R.A.; Jones, E.; Babar, M.I.; Jan, T.; Zafar, M.H.; Alhussain, T. Speech emotion recognition using deep learning techniques: A review. IEEE Access 2019, 7, 117327–117345.
- Chen, H.; Chen, A.; Xu, L.; Xie, H.; Qiao, H.; Lin, Q.; Cai, K. A deep learning CNN architecture applied in smart near-infrared analysis of water pollution for agricultural irrigation resources. Agric. Water Manag. 2020, 240, 106303.
- Li, Y.; Ma, L.; Zhong, Z.; Liu, F.; Chapman, M.A.; Cao, D.; Li, J. Deep learning for lidar point clouds in autonomous driving: A review. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 3412–3432.
- Caie, P.D.; Dimitriou, N.; Arandjelović, O. Precision medicine in digital pathology via image analysis and machine learning. In Artificial Intelligence and Deep Learning in Pathology; Elsevier: Amsterdam, The Netherlands, 2021; pp. 149–173.
- Rojas, R. The backpropagation algorithm. In Neural Networks; Springer: Berlin/Heidelberg, Germany, 1996; pp. 149–182.
- Li, J.; Wu, Y.; Gaur, Y.; Wang, C.; Zhao, R.; Liu, S. On the comparison of popular end-to-end models for large scale speech recognition. arXiv 2020, arXiv:2005.14327.
- Li, L.; Gong, B. End-to-end video captioning with multitask reinforcement learning. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 339–348.
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199.
- Carlini, N.; Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, Dallas, TX, USA, 3 November 2017; pp. 3–14.
- Carlini, N.; Wagner, D. Defensive distillation is not robust to adversarial examples. arXiv 2016, arXiv:1607.04311.
- Werbos, P.J. The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting; John Wiley & Sons: Hoboken, NJ, USA, 1994; Volume 1.
- Werbos, P. New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Dissertation, Harvard University, Cambridge, MA, USA, 1974.
- Hendler, J. Avoiding another AI winter. IEEE Intell. Syst. 2008, 23, 2–4.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
- Minsky, M.; Papert, S. Perceptrons: An Introduction to Computational Geometry; The MIT Press: Cambridge, MA, USA, 1969.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489.
- Hebb, D.O. The Organization of Behavior: A Neuropsychological Theory; Psychology Press: New York, NY, USA, 2005.
- Whittington, J.C.; Bogacz, R. Theories of error back-propagation in the brain. Trends Cogn. Sci. 2019, 23, 235–250.
- Krotov, D.; Hopfield, J.J. Unsupervised learning by competing hidden units. Proc. Natl. Acad. Sci. USA 2019, 116, 7723–7731.
- Tavanaei, A.; Ghodrati, M.; Kheradpisheh, S.R.; Masquelier, T.; Maida, A. Deep learning in spiking neural networks. Neural Netw. 2019, 111, 47–63.
- Nøkland, A. Direct feedback alignment provides learning in deep neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29.
- Ravichandran, N.B.; Lansner, A.; Herman, P. Brain-like approaches to unsupervised learning of hidden representations-a comparative study. In Proceedings of the International Conference on Artificial Neural Networks, Bratislava, Slovakia, 14–17 September 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 162–173.
- Bartunov, S.; Santoro, A.; Richards, B.; Marris, L.; Hinton, G.E.; Lillicrap, T. Assessing the scalability of biologically-motivated deep learning algorithms and architectures. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Volume 31.
- Lillicrap, T.P.; Cownden, D.; Tweed, D.B.; Akerman, C.J. Random synaptic feedback weights support error backpropagation for deep learning. Nat. Commun. 2016, 7, 13276.
- Lee, D.H.; Zhang, S.; Fischer, A.; Bengio, Y. Difference target propagation. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Porto, Portugal, 7–11 September 2015; pp. 498–515.
- Illing, B.; Gerstner, W.; Brea, J. Biologically plausible deep learning—But how far can we go with shallow networks? Neural Netw. 2019, 118, 90–101.
- Chen, X.; Fan, H.; Girshick, R.; He, K. Improved baselines with momentum contrastive learning. arXiv 2020, arXiv:2003.04297.
- Zhao, Q.; Adeli, E.; Pfefferbaum, A.; Sullivan, E.V.; Pohl, K.M. Confounder-aware visualization of convnets. In Proceedings of the International Workshop on Machine Learning In Medical Imaging, Shenzhen, China, 13 October 2019; pp. 328–336.
- Xia, Y.; Zhang, J.; Jiang, T.; Gong, Z.; Yao, W.; Feng, L. HatchEnsemble: An efficient and practical uncertainty quantification method for deep neural networks. Complex Intell. Syst. 2021, 7, 2855–2869.
- Lampinen, J.; Vehtari, A. Bayesian approach for neural networks—Review and case studies. Neural Netw. 2001, 14, 257–274.
- Cooper, J.; Arandjelović, O.; Harrison, D.J. Believe the HiPe: Hierarchical perturbation for fast, robust, and model-agnostic saliency mapping. Pattern Recognit. 2022, 129, 108743.
- Dimitriou, N.; Arandjelovic, O. Magnifying Networks for Images with Billions of Pixels. arXiv 2021, arXiv:2112.06121.
- Grinberg, L.; Hopfield, J.; Krotov, D. Local unsupervised learning for image analysis. arXiv 2019, arXiv:1908.08993.
- Bengio, Y.; Goodfellow, I.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2017; Volume 1.
- Rumelhart, D.E.; Zipser, D. Feature discovery by competitive learning. Cogn. Sci. 1985, 9, 75–112.
- Hendrycks, D.; Dietterich, T. Benchmarking neural network robustness to common corruptions and perturbations. arXiv 2019, arXiv:1903.12261.
- Recht, B.; Roelofs, R.; Schmidt, L.; Shankar, V. Do CIFAR-10 classifiers generalize to CIFAR-10? arXiv 2018, arXiv:1806.00451.
- Azulay, A.; Weiss, Y. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv 2018, arXiv:1805.12177.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
- Hendrycks, D.; Mazeika, M.; Dietterich, T. Deep anomaly detection with outlier exposure. arXiv 2018, arXiv:1812.04606.
- Hendrycks, D.; Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv 2016, arXiv:1610.02136.
- Liu, S.; Garrepalli, R.; Dietterich, T.; Fern, A.; Hendrycks, D. Open category detection with PAC guarantees. In Proceedings of the International Conference on Machine Learning (PMLR), Stockholm, Sweden, 10–15 July 2018; pp. 3169–3178.
- Steinhardt, J.; Koh, P.W.W.; Liang, P.S. Certified defenses for data poisoning attacks. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Hendrycks, D.; Mazeika, M.; Wilson, D.; Gimpel, K. Using trusted data to train deep networks on labels corrupted by severe noise. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Volume 31.
- Löwel, S.; Singer, W. Selection of intrinsic horizontal connections in the visual cortex by correlated neuronal activity. Science 1992, 255, 209–212.
- Oja, E. Simplified neuron model as a principal component analyzer. J. Math. Biol. 1982, 15, 267–273.
- Dimitriou, N.; Arandjelovic, O. A new look at ghost normalization. arXiv 2020, arXiv:2007.08554.
- Linsker, R. From basic network principles to neural architecture: Emergence of spatial-opponent cells. Proc. Natl. Acad. Sci. USA 1986, 83, 7508–7512.
- Pehlevan, C.; Hu, T.; Chklovskii, D.B. A Hebbian/anti-Hebbian neural network for linear subspace learning: A derivation from multidimensional scaling of streaming data. Neural Comput. 2015, 27, 1461–1495.
- Von der Malsburg, C. Self-organization of orientation sensitive cells in the striate cortex. Kybernetik 1973, 14, 85–100.
- Seung, H.S.; Zung, J. A correlation game for unsupervised learning yields computational interpretations of Hebbian excitation, anti-Hebbian inhibition, and synapse elimination. arXiv 2017, arXiv:1704.00646.
- Chakravarthy, A. Visualizing Intermediate Activations of a CNN Trained on the MNIST Dataset. 2019. Available online: https://towardsdatascience.com/visualizing-intermediate-activations-of-a-cnn-trained-on-the-mnist-dataset-2c34426416c8 (accessed on 29 October 2022).
- Bottou, L. Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 2012; pp. 421–436.
Data Set | p | k | | Batch Size | Epoch # | Learning Rate
---|---|---|---|---|---|---
MNIST | 2 | 2 | 0.4 | 100 | 1000 | 0.02 → 0.00
CIFAR-10 | 2 | 2 | 0.3 | 100 | 1000 | 0.02 → 0.00
Data Set | Optimizer | Batch Size | Epoch # | Learning Rate | m | n |
---|---|---|---|---|---|---|---
MNIST | Adam | 100 | 600 | 0.0001 | 6 | 4.5 | 0.01
CIFAR-10 | Adam | 100 | 600 | 0.004 → 0.00001 | 6 | 4.5 | 0.01
Data Set | Optimizer | Batch Size | Epoch # | Learning Rate | m | n |
---|---|---|---|---|---|---|---
MNIST | Adam | 100 | 600 | 0.001 → 0.00001 | 6 | 1 | 0.01
CIFAR-10 | Adam | 100 | 600 | 0.004 → 0.001 | 4 | 1 | 0.01
Accuracy (%) | Classic NN | Proposed Model
---|---|---
No noise | 98.52 | 97.39
Noise (zero mean, 0.1 STD) | 98.52 | 96.87
Noise (zero mean, 0.5 STD) | 96.60 | 92.86
Noise (zero mean, 1 STD) | 87.74 | 73.13
Accuracy (%) | Classic NN | Proposed Model
---|---|---
No noise | 55.15 | 48.62
Noise (zero mean, 0.1 STD) | 54.84 | 48.18
Noise (zero mean, 0.5 STD) | 51.18 | 35.70
Noise (zero mean, 1 STD) | 43.50 | 21.07
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Apparaju, A.; Arandjelović, O. Towards New Generation, Biologically Plausible Deep Neural Network Learning. Sci 2022, 4, 46. https://doi.org/10.3390/sci4040046