Article

Probing the Topology of the Space of Tokens with Structured Prompts

1 Department of Mathematics and Statistics, American University, Washington, DC 20016, USA
2 Galois, Inc., Arlington, VA 22203, USA
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(20), 3320; https://doi.org/10.3390/math13203320
Submission received: 25 July 2025 / Revised: 30 September 2025 / Accepted: 15 October 2025 / Published: 17 October 2025
(This article belongs to the Special Issue New Perspectives in Harmonic Analysis)

Abstract

Some large language models (LLMs) are open source and therefore fully open to scientific study. Many LLMs, however, are proprietary, and their internals are hidden, which hinders the research community's ability to study their behavior under controlled conditions. For instance, the token input embedding specifies the internal vector representation of each token used by the model. If the token input embedding is hidden, latent semantic information about the set of tokens is unavailable to researchers. This article presents a general and flexible method for prompting an LLM to reveal its token input embedding, even if this information is not published with the model. Moreover, this article provides strong theoretical justification, in the form of a mathematical proof for generic LLMs, for why this method should be expected to work. If the LLM can be prompted systematically, and certain benign conditions on the quantity of data collected from the responses are met, the topology of the token embedding is recovered. With this method in hand, we demonstrate its effectiveness by recovering the token subspace of the Llemma-7B LLM. We demonstrate the flexibility of the method by performing the recovery three times, each time applying the same algorithm to different information collected from the responses. While the prompting can be a performance bottleneck depending on the size and complexity of the LLM, the recovery runs within a few hours on a typical workstation. The results of this paper apply not only to LLMs but also to general nonlinear autoregressive processes.
Keywords: large language model; autoregressive process; systematic prompting; dynamical system; genericity; embedding methods; transversality
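To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical black-box function query_next_token_probs for querying the model, and substitutes persistent homology (via the third-party ripser package) for the paper's actual recovery algorithm; all names are illustrative.

import numpy as np
from ripser import ripser  # persistent homology; pip install ripser

def query_next_token_probs(prompt: str) -> np.ndarray:
    """Hypothetical black-box call: return the model's next-token
    probability vector for the given prompt. Wire this to the LLM
    under study (open weights or a hosted API exposing logits)."""
    raise NotImplementedError

def collect_responses(tokens: list[str]) -> np.ndarray:
    # One structured prompt per token; each response distribution is the
    # "information collected from the responses" for that token.
    return np.stack([query_next_token_probs(t) for t in tokens])

def recover_topology(tokens: list[str]):
    features = collect_responses(tokens)
    # Pairwise Euclidean distances between response distributions; other
    # divergences could be substituted without changing the pipeline.
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    # Persistent homology of the resulting point cloud approximates the
    # topology of the token subspace, given enough collected data.
    return ripser(dist, distance_matrix=True, maxdim=1)["dgms"]

Rerunning recover_topology with different features extracted from the same responses (for example, top-k probabilities rather than the full distribution) mirrors the paper's strategy of performing the recovery several times with the same algorithm applied to different collected information.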

Share and Cite

MDPI and ACS Style

Robinson, M.; Dey, S.; Kushner, T. Probing the Topology of the Space of Tokens with Structured Prompts. Mathematics 2025, 13, 3320. https://doi.org/10.3390/math13203320

AMA Style

Robinson M, Dey S, Kushner T. Probing the Topology of the Space of Tokens with Structured Prompts. Mathematics. 2025; 13(20):3320. https://doi.org/10.3390/math13203320

Chicago/Turabian Style

Robinson, Michael, Sourya Dey, and Taisa Kushner. 2025. "Probing the Topology of the Space of Tokens with Structured Prompts." Mathematics 13, no. 20: 3320. https://doi.org/10.3390/math13203320

APA Style

Robinson, M., Dey, S., & Kushner, T. (2025). Probing the Topology of the Space of Tokens with Structured Prompts. Mathematics, 13(20), 3320. https://doi.org/10.3390/math13203320

