Transfer Entropy and O-Information to Detect Grokking in Tensor Network Multi-Class Classification Problems
Abstract
1. Introduction
2. Materials and Methods
2.1. PRISMA Hyperspectral Dataset
2.2. Fashion MNIST Dataset
2.3. Tensor Network
- (i) Reduced density matrix in the label space, extracted using the contraction scheme shown in Figure 2d. This quantity encodes the coherence and distinguishability between label states during training.
- (ii) Local magnetization, computed separately for each label and feature index, as depicted in Figure 2e. These expectation values reveal the contribution of individual input features to the classification decision, serving as a form of interpretable attribution.
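The two quantities listed above can be illustrated with a minimal NumPy sketch. It is not the contraction scheme of Figure 2d,e: it assumes a small dense state tensor `psi` with one label index followed by one index per feature site, whereas the paper works with a tensor network. The function names, the choice of the Pauli-z operator for the magnetization, and the normalization are assumptions made for illustration only.

```python
import numpy as np

def label_reduced_density_matrix(psi):
    """Reduced density matrix over the label index (illustrative sketch).

    psi: array of shape (n_labels, d, ..., d); the first axis is the label,
    the remaining axes are feature sites (assumed dense layout).
    """
    n_labels = psi.shape[0]
    mat = psi.reshape(n_labels, -1)      # flatten all feature indices
    rho = mat @ mat.conj().T             # trace out the feature sites
    return rho / np.trace(rho).real      # normalize to unit trace

def local_magnetization(psi, op=None):
    """Expectation value of a single-site operator per label and site.

    op defaults to Pauli-z, which assumes local dimension d == 2.
    Returns an array of shape (n_labels, n_sites).
    """
    if op is None:
        op = np.diag([1.0, -1.0])
    n_labels, n_sites = psi.shape[0], psi.ndim - 1
    mag = np.zeros((n_labels, n_sites))
    for l in range(n_labels):
        phi = psi[l]                     # state conditioned on label l
        norm = np.vdot(phi, phi).real
        for i in range(n_sites):
            # apply op on site i, then restore the original axis order
            op_phi = np.moveaxis(np.tensordot(op, phi, axes=([1], [i])), 0, i)
            mag[l, i] = np.vdot(phi, op_phi).real / norm
    return mag

# Toy state: label 0 paired with all sites "up", label 1 with all sites "down".
psi = np.zeros((2, 2, 2, 2))
psi[0, 0, 0, 0] = 1.0
psi[1, 1, 1, 1] = 1.0
rho = label_reduced_density_matrix(psi)
mag = local_magnetization(psi)
```

In this toy state the two label branches are orthogonal, so the off-diagonal entries of `rho` vanish, and each row of `mag` carries the sign of the corresponding feature configuration. A dense tensor is only practical for a handful of sites; it is used here to keep the definitions explicit.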
2.4. Transfer Entropy
2.5. O-Information
3. Results
3.1. Classification Performances
- (i) Fashion MNIST: dress, sneaker, bag;
- (ii) Hyperspectral land cover: cropland, olive tree, grapevine.
3.2. Magnetization Pattern Extraction
- (i) Top row: label 0 (dress/cropland);
- (ii) Middle row: label 1 (sneaker/olive tree);
- (iii) Bottom row: label 2 (bag/grapevine).
3.3. Causal Information Transfer Between Quantum Masks
3.4. Score Redundancy Peak at Grokking
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
QML | Quantum Machine Learning
MPS | Matrix Product State
MNIST | Modified National Institute of Standards and Technology
PRISMA | PRecursore IperSpettrale della Missione Applicativa
SVD | Singular Value Decomposition
Appendix A
Appendix B
1 | 1.298 | 1.304 | 0.726 | 1.265 | 1.248 | 0.056 |
2 | 1.515 | 1.221 | 0.774 | 1.177 | 1.297 | 0.107 |
3 | 1.524 | 0.918 | 0.725 | 0.850 | 1.005 | 0.102 |
4 | 1.415 | 0.713 | 0.726 | 0.421 | 0.782 | 0.244 |
5 | 1.279 | 0.750 | 0.763 | 0.192 | 0.597 | 0.333 |
6 | 1.296 | 0.554 | 0.800 | 0.079 | 0.489 | 0.358 |
7 | 1.290 | 0.396 | 0.857 | 0.283 | 0.366 | 0.369 |
8 | 1.287 | 0.215 | 0.934 | 0.662 | 0.243 | 0.409 |
9 | 1.295 | 0.036 | 1.009 | 1.005 | 0.230 | 0.429 |
10 | 1.344 | 0.148 | 1.098 | 1.399 | 0.187 | 0.387 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Pomarico, D.; Cilli, R.; Monaco, A.; Bellantuono, L.; La Rocca, M.; Maggipinto, T.; Magnifico, G.; Ontivero Ortega, M.; Pantaleo, E.; Tangaro, S.; et al. Transfer Entropy and O-Information to Detect Grokking in Tensor Network Multi-Class Classification Problems. Technologies 2025, 13, 438. https://doi.org/10.3390/technologies13100438
APA Style: Pomarico, D., Cilli, R., Monaco, A., Bellantuono, L., La Rocca, M., Maggipinto, T., Magnifico, G., Ontivero Ortega, M., Pantaleo, E., Tangaro, S., Stramaglia, S., Bellotti, R., & Amoroso, N. (2025). Transfer Entropy and O-Information to Detect Grokking in Tensor Network Multi-Class Classification Problems. Technologies, 13(10), 438. https://doi.org/10.3390/technologies13100438