Article

Information-Theoretical Analysis of a Transformer-Based Generative AI Model

Department of Electrical and Computer Engineering, Santa Clara University, Santa Clara, CA 95053, USA
* Author to whom correspondence should be addressed.
Entropy 2025, 27(6), 589; https://doi.org/10.3390/e27060589
Submission received: 26 March 2025 / Revised: 26 May 2025 / Accepted: 28 May 2025 / Published: 31 May 2025

Abstract

Large Language Models have shown a remarkable ability to “converse” with humans in natural language across myriad topics. Despite the proliferation of these models, a deep understanding of how they work under the hood remains elusive. The core of these Generative AI models is composed of layers of neural networks that employ the Transformer architecture. This architecture learns from large amounts of training data and creates new content in response to user input. In this study, we analyze the internals of the Transformer using Information Theory. To quantify the amount of information passing through a layer, we view it as an information transmission channel and compute the capacity of that channel. The highlight of our study is that, using Information-Theoretical tools, we develop techniques to visualize on an Information plane how the Transformer encodes the relationships between words in sentences as these words are projected into a high-dimensional vector space. We use Information Geometry to analyze the high-dimensional vectors in a Transformer layer and infer relationships between words based on the length of the geodesic connecting these vector distributions on a Riemannian manifold. Our tools reveal more information about these relationships than attention scores do. In this study, we also show how Information-Theoretic analysis can help in troubleshooting learning problems in the Transformer layers.
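To make the geodesic idea concrete, the following minimal sketch (not the paper's implementation) assumes that a word's representation at a Transformer layer can be summarized by a univariate Gaussian fitted to the coordinates of its vector, and then uses the closed-form Fisher-Rao geodesic distance between two such Gaussians as a proxy for how strongly the words are related; the helper names fit_gaussian and fisher_rao_distance and the synthetic 768-dimensional vectors are purely illustrative assumptions.

```python
import numpy as np

def fit_gaussian(vec):
    """Summarize a word's layer output (a d-dimensional vector) by the mean
    and standard deviation of its coordinates. This univariate-Gaussian
    summary is an illustrative simplification, not the paper's construction."""
    return float(np.mean(vec)), float(np.std(vec)) + 1e-12

def fisher_rao_distance(g1, g2):
    """Closed-form Fisher-Rao geodesic distance between two univariate
    Gaussians N(mu1, sigma1^2) and N(mu2, sigma2^2): under the Fisher metric,
    the Gaussian family is (up to a factor of sqrt(2)) the hyperbolic upper
    half-plane, which yields this arccosh expression."""
    (mu1, s1), (mu2, s2) = g1, g2
    num = 0.5 * (mu1 - mu2) ** 2 + (s1 - s2) ** 2
    return np.sqrt(2.0) * np.arccosh(1.0 + num / (2.0 * s1 * s2))

# Toy usage with synthetic 768-dimensional "layer outputs" for three words;
# a shorter geodesic is read as a stronger relationship between the words.
rng = np.random.default_rng(0)
v_king = rng.normal(0.0, 1.0, 768)
v_queen = v_king + rng.normal(0.0, 0.1, 768)   # nearly the same distribution
v_car = rng.normal(2.0, 0.5, 768)              # a very different distribution

g_king, g_queen, g_car = map(fit_gaussian, (v_king, v_queen, v_car))
print("king-queen geodesic:", fisher_rao_distance(g_king, g_queen))
print("king-car geodesic:  ", fisher_rao_distance(g_king, g_car))
```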
Keywords: Machine learning; Generative AI; Transformer; Information Theory; Mutual Information estimation; Information Geometry; Riemann manifold; Fisher metric; geodesic

