Delving into Unsupervised Hebbian Learning from Artificial Intelligence Perspectives
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. Definitions of Hebbian Learning
3.1.1. Mathematical Formulation of Hebbian Learning
3.1.2. Hebbian Weight Update Rule
3.1.3. Implementation of Anti-Hebbian Mechanism
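To make Sections 3.1.2 and 3.1.3 concrete, the following is a minimal NumPy sketch of a ranking-based Hebbian update with an anti-Hebbian term, in the spirit of Krotov and Hopfield [10]. The winner-take-all gating, the Oja-style decay, and the names and defaults `k` and `delta` are illustrative assumptions, not the exact rule evaluated in this paper.

```python
import numpy as np

def hebbian_update(W, x, lr, k=2, delta=0.4):
    """One ranking-based Hebbian step with an anti-Hebbian term.

    Illustrative sketch: the strongest-responding hidden unit is
    strengthened (Hebbian), the k-th ranked unit is weakened
    (anti-Hebbian, strength `delta`), and an Oja-style decay keeps
    the synapses bounded. `k` and `delta` are assumed names/values.
    W : (n_hidden, n_input) synapses;  x : (n_input,) input sample.
    """
    h = W @ x                              # post-synaptic activations
    order = np.argsort(h)[::-1]            # hidden units ranked by activation
    g = np.zeros(W.shape[0])
    g[order[0]] = 1.0                      # winner: Hebbian strengthening
    g[order[k - 1]] = -delta               # k-th ranked unit: anti-Hebbian weakening
    dW = g[:, None] * (x[None, :] - h[:, None] * W)
    return W + lr * dW
```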
3.2. Supervised Learning Settings
3.3. Definition of Different Activation Functions
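For reference, the standard definitions of the activation functions compared in Section 4.2 are given below; the RePU exponent p corresponds to the power parameter listed in Section 3.6, and the LeakyReLU slope α and Swish gate β follow their usual conventions [17,18,19,22,23].

```latex
\begin{aligned}
\mathrm{ReLU}(x)      &= \max(0, x) \\
\mathrm{LeakyReLU}(x) &= \max(\alpha x,\, x), \quad 0 < \alpha < 1 \\
\mathrm{RePU}(x)      &= \max(0, x)^{p} \\
\mathrm{GELU}(x)      &= x\,\Phi(x) \quad \text{($\Phi$: standard normal CDF)} \\
\mathrm{Swish}(x)     &= x\,\sigma(\beta x) \quad \text{($\sigma$: logistic sigmoid)} \\
\mathrm{Softplus}(x)  &= \log\left(1 + e^{x}\right)
\end{aligned}
```

Note that RePU reduces to ReLU for p = 1.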
3.4. Statistics of Datasets
3.5. Approaches to Data Fusion
3.5.1. Strategies for Utilizing Additional Unlabeled Datasets
- s0. Synapses are trained on the Fashion-MNIST training set for 25 epochs.
- s1. Synapses are first trained on the KMNIST training set for 5 epochs, followed by training on the Fashion-MNIST training set for 20 epochs.
- s2. Two separate sets of synapses are independently trained for 25 epochs, one on the Fashion-MNIST training set and the other on the KMNIST training set, before being combined (strategies s1 and s2 are sketched in code after this list).
- s3. Two separate sets of synapses are independently trained for 25 epochs, one on the Fashion-MNIST training set and the other on the MNIST training set, before being combined.
- s4. Two separate sets of synapses are independently trained for 25 epochs, one on the Fashion-MNIST training set and the other on a combined set of KMNIST and MNIST training data, before being combined.
- s5. Two separate sets of synapses are independently trained for 25 epochs, one on the Fashion-MNIST training set and the other on its test set, before being combined.
- s6. Synapses are trained for 25 epochs on the combined Fashion-MNIST training and test sets.
- s7. Synapses are trained for 20 epochs on the combined Fashion-MNIST training and test sets.
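The following sketch makes the pipelines of s1 and s2 explicit, reusing the `hebbian_update` sketch from Section 3.1. The helper `train_synapses`, the initial learning rate `lr0`, the initialization scale, and per-sample updates are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def train_synapses(X, epochs, lr0=0.02, W=None, seed=0):
    """Unsupervised Hebbian training of one set of synapses.
    The number of hidden neurons matches the input dimensionality
    (Section 3.6), and the learning rate decays linearly to 0.
    X : (n_samples, n_input) array of flattened images."""
    rng = np.random.default_rng(seed)
    if W is None:
        W = rng.normal(scale=0.01, size=(X.shape[1], X.shape[1]))
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)       # linear decay to 0
        for x in X[rng.permutation(len(X))]:
            W = hebbian_update(W, x, lr)
    return W

# s1: sequential training -- pre-train on KMNIST, then Fashion-MNIST
# W = train_synapses(kmnist_train, epochs=5)
# W = train_synapses(fashion_train, epochs=20, W=W)

# s2: independent training, then merging (Section 3.5.2)
# W_a = train_synapses(fashion_train, epochs=25)
# W_b = train_synapses(kmnist_train, epochs=25)
```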
3.5.2. Equation for Synaptic Weight Merging with Multiple Datasets
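A natural form for the merge, consistent with the hybrid parameter studied in Section 4.3.2 and with weight averaging in the style of model soups [40], is a convex combination of the independently trained synapses; the notation below is an assumption, not the paper's exact equation.

```latex
W_{\mathrm{merged}} = \lambda\, W_{1} + (1 - \lambda)\, W_{2}, \qquad \lambda \in [0, 1],
```

where W_1 and W_2 are the sets of synapses trained on the two datasets and λ is the hybrid parameter. Under this assumed form, λ = 1 would recover the Fashion-MNIST-only synapses (strategy s0).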
3.6. Training Details
Unsupervised Hebbian stage:
- Batch size: 100.
- Learning rate: Linear decay to 0.
- Number of hidden neurons: 784 (matching the 28 × 28 input dimensionality).
- Power parameter: .
- Anti-Hebbian strength: .
- Anti-Hebbian indices: .
- Numerical stability threshold: .
Supervised classifier stage (sketched in code after this list):
- Batch size: 100.
- Learning rate: Cosine annealing from 0.01 to 0 over 200 epochs.
- Optimizer: Adam [4].
- Loss function: Cross-entropy.
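A minimal PyTorch sketch of this supervised stage under the settings listed above; training a linear readout on frozen Hebbian features is an assumption in the spirit of Krotov and Hopfield [10], not a detail given by the list itself.

```python
import torch
import torch.nn as nn

def train_readout(features, labels, n_hidden=784, n_classes=10):
    """Supervised stage: Adam [4], cosine annealing 0.01 -> 0 over
    200 epochs, cross-entropy loss, batch size 100 (Section 3.6).
    features : (N, n_hidden) float tensor of frozen Hebbian
    activations; labels : (N,) long tensor. The linear-readout
    architecture is an illustrative assumption."""
    head = nn.Linear(n_hidden, n_classes)
    opt = torch.optim.Adam(head.parameters(), lr=0.01)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=200, eta_min=0.0)
    loss_fn = nn.CrossEntropyLoss()
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(features, labels),
        batch_size=100, shuffle=True)
    for _ in range(200):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(head(xb), yb).backward()
            opt.step()
        sched.step()                      # one annealing step per epoch
    return head
```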
4. Results
4.1. Overfitting in Hebbian Learning
4.1.1. Degraded Performance Caused by Overfitting
4.1.2. Visualization of Trained Synapses Under Overfitting
4.2. Effects of Activation Functions on Representations Learned by Hebbian Learning
4.3. Results of Data Fusion Strategies
4.3.1. Comparison of Different Data Fusion Strategies
4.3.2. Influence of the Hybrid Parameter
5. Conclusions
6. Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774.
- Sejnowski, T.J. The unreasonable effectiveness of deep learning in artificial intelligence. Proc. Natl. Acad. Sci. USA 2020, 117, 30033–30038.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part I 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 818–833.
- Bengio, Y.; Lee, D.H.; Bornschein, J.; Mesnard, T.; Lin, Z. Towards biologically plausible deep learning. arXiv 2015, arXiv:1502.04156.
- Illing, B.; Gerstner, W.; Brea, J. Biologically plausible deep learning—But how far can we go with shallow networks? Neural Netw. 2019, 118, 90–101.
- Hebb, D.O. The Organization of Behavior: A Neuropsychological Theory; Psychology Press: London, UK, 2005.
- Fung, C.C.A.; Fukai, T. Competition on presynaptic resources enhances the discrimination of interfering memories. PNAS Nexus 2023, 2, pgad161.
- Krotov, D.; Hopfield, J.J. Unsupervised learning by competing hidden units. Proc. Natl. Acad. Sci. USA 2019, 116, 7723–7731.
- Moraitis, T.; Toichkin, D.; Journé, A.; Chua, Y.; Guo, Q. SoftHebb: Bayesian inference in unsupervised Hebbian soft winner-take-all networks. Neuromorphic Comput. Eng. 2022, 2, 044017.
- Maass, W. On the computational power of winner-take-all. Neural Comput. 2000, 12, 2519–2535.
- Gupta, M.; Ambikapathi, A.; Ramasamy, S. HebbNet: A simplified Hebbian learning framework to do biologically plausible learning. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 3115–3119.
- Lagani, G.; Falchi, F.; Gennaro, C.; Fassold, H.; Amato, G. Scalable bio-inspired training of Deep Neural Networks with FastHebb. Neurocomputing 2024, 595, 127867.
- Földiák, P. Forming sparse representations by local anti-Hebbian learning. Biol. Cybern. 1990, 64, 165–170.
- Pehlevan, C.; Hu, T.; Chklovskii, D.B. A Hebbian/anti-Hebbian neural network for linear subspace learning: A derivation from multidimensional scaling of streaming data. Neural Comput. 2015, 27, 1461–1495.
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814.
- Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
- Hendrycks, D.; Gimpel, K. Gaussian error linear units (GELUs). arXiv 2016, arXiv:1606.08415.
- Rives, A.; Meier, J.; Sercu, T.; Goyal, S.; Lin, Z.; Liu, J.; Guo, D.; Ott, M.; Zitnick, C.L.; Ma, J.; et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. USA 2021, 118, e2016239118.
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
- Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for activation functions. arXiv 2017, arXiv:1710.05941.
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; Volume 30, p. 3.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
- Malenka, R.C.; Bear, M.F. LTP and LTD. Neuron 2004, 44, 5–21.
- Hoel, E. The overfitted brain: Dreams evolved to assist generalization. Patterns 2021, 2, 100244.
- Salman, S.; Liu, X. Overfitting mechanism and avoidance in deep neural networks. arXiv 2019, arXiv:1901.06566.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 18–20 June 2016; pp. 2818–2826.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 18–20 June 2016; pp. 770–778.
- Lin, W.; Fung, C.C.A. Utilizing Data Imbalance to Enhance Compound–Protein Interaction Prediction Models. Adv. Intell. Syst. 2025, 2400985.
- Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language modeling with gated convolutional networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 933–941.
- Shazeer, N. GLU variants improve transformer. arXiv 2020, arXiv:2002.05202.
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009.
- Bao, H.; Dong, L.; Piao, S.; Wei, F. BEiT: BERT Pre-Training of Image Transformers. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021.
- Lin, W.; Fung, C.C.A. Rethinking the Masking Strategy for Pretraining Molecular Graphs from a Data-Centric View. ACS Omega 2024, 9, 20832–20838.
- Hu, W.; Liu, B.; Gomes, J.; Zitnik, M.; Liang, P.; Pande, V.S.; Leskovec, J. Strategies for Pre-training Graph Neural Networks. In Proceedings of the 8th International Conference on Learning Representations (ICLR 2020), Addis Ababa, Ethiopia, 26–30 April 2020.
- Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rehawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7112–7127.
- Luo, H.; Xiang, Y.; Fang, X.; Lin, W.; Wang, F.; Wu, H.; Wang, H. BatchDTA: Implicit batch alignment enhances deep learning-based drug–target affinity estimation. Briefings Bioinform. 2022, 23, bbac260.
- Wortsman, M.; Ilharco, G.; Gadre, S.Y.; Roelofs, R.; Gontijo-Lopes, R.; Morcos, A.S.; Namkoong, H.; Farhadi, A.; Carmon, Y.; Kornblith, S.; et al. Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 23965–23998.
- Krotov, D.; Hopfield, J.J. Dense associative memory for pattern recognition. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS’16), Barcelona, Spain, 5–10 December 2016; pp. 1180–1188.
- Ba, J.; Frey, B. Adaptive dropout for training deep neural networks. In Proceedings of the Advances in Neural Information Processing Systems 26 (NIPS 2013), Lake Tahoe, NV, USA, 5–8 December 2013; Volume 26.
- Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017, arXiv:1708.07747.
- Clanuwat, T.; Bober-Irizar, M.; Kitamoto, A.; Lamb, A.; Yamamoto, K.; Ha, D. Deep learning for classical Japanese literature. arXiv 2018, arXiv:1812.01718.
- Wu, J.; Pan, Y.; Ye, Q.; Zhou, J.; Gou, F. Intelligent cell images segmentation system: Based on SDN and moving transformer. Sci. Rep. 2024, 14, 24834.
- Jain, N.; Chiang, P.Y.; Wen, Y.; Kirchenbauer, J.; Chu, H.M.; Somepalli, G.; Bartoldson, B.R.; Kailkhura, B.; Schwarzschild, A.; Saha, A.; et al. NEFTune: Noisy Embeddings Improve Instruction Finetuning. In Proceedings of the ICLR, Vienna, Austria, 7–11 May 2024.
- Lv, Z.; Zhu, S.; Wang, Y.; Ren, Y.; Luo, M.; Wang, H.; Zhang, G.; Zhai, Y.; Zhao, S.; Zhou, Y.; et al. Development of bio-voltage operated humidity-sensory neurons comprising self-assembled peptide memristors. Adv. Mater. 2024, 36, 2405145.
- Journé, A.; Rodriguez, H.G.; Guo, Q.; Moraitis, T. Hebbian deep learning without feedback. arXiv 2022, arXiv:2209.11883.
- Schmidgall, S.; Ziaei, R.; Achterberg, J.; Kirsch, L.; Hajiseyedrazi, S.; Eshraghian, J. Brain-inspired learning in artificial neural networks: A review. APL Mach. Learn. 2024, 2, 021501.
- Hinton, G. The forward-forward algorithm: Some preliminary investigations. arXiv 2022, arXiv:2212.13345.







Table 1. Statistics of the datasets (Section 3.4).

| Dataset | Training Samples | Test Samples | Description |
|---|---|---|---|
| Fashion-MNIST | 60,000 | 10,000 | Ten classes of clothing items |
| MNIST | 60,000 | 10,000 | Ten classes of handwritten digits |
| KMNIST | 60,000 | 10,000 | Ten classes of handwritten Japanese (Kuzushiji) characters |
Table 2. Training and test accuracy (%) under different activation functions after 20, 25, and 30 training epochs (Section 4.2).

| Activation | Training Acc. (20 Epochs) | Training Acc. (25 Epochs) | Training Acc. (30 Epochs) | Test Acc. (20 Epochs) | Test Acc. (25 Epochs) | Test Acc. (30 Epochs) |
|---|---|---|---|---|---|---|
| RePU | 87.91 ± 0.10 | 77.84 ± 2.29 | 56.08 ± 0.05 | 82.25 ± 0.05 | 73.02 ± 2.07 | 54.91 ± 0.32 |
| ReLU | 89.27 ± 0.05 | 85.78 ± 2.02 | 67.08 ± 0.04 | 84.57 ± 0.04 | 80.01 ± 2.18 | 66.35 ± 0.11 |
| LeakyReLU | 89.15 ± 0.04 | 90.21 ± 0.34 | 86.35 ± 0.05 | 84.53 ± 0.05 | 84.66 ± 0.20 | 84.03 ± 0.07 |
| GELU | 89.32 ± 0.04 | 87.25 ± 0.85 | 68.62 ± 0.07 | 84.58 ± 0.08 | 81.81 ± 0.93 | 67.02 ± 0.13 |
| Swish | 89.44 ± 0.09 | 89.18 ± 0.61 | 75.31 ± 1.46 | 84.67 ± 0.17 | 83.57 ± 0.69 | 72.95 ± 1.21 |
| Softplus | 89.46 ± 0.10 | 90.90 ± 0.26 | 72.24 ± 0.12 | 84.68 ± 0.27 | 85.07 ± 0.46 | 70.42 ± 0.11 |