Category-Theoretical and Topos-Theoretical Frameworks in Machine Learning: A Survey
Abstract
1. Introduction and Background
- Studies on specific categorical frameworks corresponding to specific machine learning methods. Backpropagation is formalized within Cartesian differential categories, structuring gradient-based learning. Probabilistic models, including Bayesian inference, are studied in Markov categories, capturing stochastic dependencies. Clustering algorithms are analyzed in the category of metric spaces, providing a structured view of similarity-based learning.
- Methodological approaches that explore the potential applications of category theory to various aspects of machine learning from a broad mathematical perspective. For example, research has examined how topoi capture the internal properties of neural networks, how 2-categories formalize component compositions in learning models, and how toposes and stacks provide structured frameworks for encoding learning dynamics and invariances.
- Example 1. Dimensionality Reduction: Traditional methods risk information loss. Topos theory, through its algebraic structure, enables dimensionality reduction that preserves structural integrity while extracting key features.
- Example 2. Interpretability of Machine Learning Models: Deep learning models are often ‘black boxes’. Topos theory provides logical reasoning to interpret internal structures, outputs, and emergent phenomena in large-scale models.
- Example 3. Dynamic Data Analysis: Traditional static analysis struggles with evolving data. Topos theory naturally captures temporal changes and local–global relationships.
- In gradient-based learning, we introduce base categories, functors, and the structure of compositional optimization.
- In probability-based learning, we present categorical probability models, Bayesian inference, and probabilistic programming.
- In invariance and equivalence-based learning, we explore categorical clustering, manifold learning, and persistent homology.
- In topos-based learning, we apply sheaf and stack structures to machine learning, building on Lafforgue’s reports.
- Finally, we discuss applications and future directions, covering frustrated AI systems, categorical modeling, and emerging challenges.
2. Developments in Gradient-Based Learning
2.1. Fundamental Components: The Base Categories and Functors
- Objects: the objects of the base (strict symmetric monoidal) category 𝒞.
- Morphisms: pairs of the form (P, f), representing a map from the input A to the output B, with P being an object of 𝒞 (the parameter object) and f: P ⊗ A → B a morphism of 𝒞.
- Compositions of morphisms: the composition of morphisms (P, f): A → B and (Q, g): B → C is given by the pair (Q ⊗ P, g ∘ (id_Q ⊗ f)).
- Identity: The identity endomorphism on A is the pair (I, id_A), where I is the monoidal unit. (I ⊗ A = A due to the strict monoidal property.)
- Objects are pairs (A, A′) of objects in 𝒞.
- A morphism from (A, A′) to (B, B′) consists of a pair of morphisms in 𝒞, denoted as (get: A → B, put: A ⊗ B′ → A′).
- The composition of (f, f♯): (A, A′) → (B, B′) and (g, g♯): (B, B′) → (C, C′) is given by get g ∘ f and put sending (a, c′) to f♯(a, g♯(f(a), c′)).
- The identity on (A, A′) is the pair (id_A, π), where π: A ⊗ A′ → A′ discards the first component.
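To make these two constructions concrete, the following Python sketch (our own illustration, not code from the cited papers) represents a parametric morphism and a lens as small data structures with their compositions; the names Para, Lens, compose_para, and compose_lens are ours, and the backward 'put' maps are chosen so that they behave like reverse derivatives.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Para:
    """A morphism A -> B in Para(C): a map f : P x A -> B for some
    parameter object P, represented here as a Python callable."""
    f: Callable[[Any, Any], Any]          # f(p, a) -> b

def compose_para(g: Para, f: Para) -> Para:
    """Composite of f : A -> B and g : B -> C; its parameter is the pair
    (q, p) and its map sends ((q, p), a) to g(q, f(p, a))."""
    return Para(lambda qp, a: g.f(qp[0], f.f(qp[1], a)))

@dataclass
class Lens:
    """A morphism (A, A') -> (B, B') in Lens(C): a forward 'get' A -> B
    and a backward 'put' A x B' -> A'."""
    get: Callable[[Any], Any]
    put: Callable[[Any, Any], Any]

def compose_lens(outer: Lens, inner: Lens) -> Lens:
    """The forward pass chains the gets; the backward pass re-runs
    inner.get to feed outer.put, then applies inner.put."""
    return Lens(
        get=lambda a: outer.get(inner.get(a)),
        put=lambda a, c_: inner.put(a, outer.put(inner.get(a), c_)),
    )

# Lenses whose 'put' acts like a reverse derivative (vector-Jacobian product).
square = Lens(get=lambda a: a * a, put=lambda a, db: 2 * a * db)
double = Lens(get=lambda b: 2 * b, put=lambda b, dc: 2 * dc)
both = compose_lens(double, square)
print(both.get(3.0), both.put(3.0, 1.0))   # 18.0 and 12.0 = d(2a^2)/da at a=3

# Parametric maps: an affine layer followed by a scaling layer.
affine = Para(lambda p, a: p[0] * a + p[1])   # parameters (w, b)
scale = Para(lambda s, b: s * b)              # parameter s
net = compose_para(scale, affine)
print(net.f((2.0, (3.0, 1.0)), 2.0))          # 2 * (3*2 + 1) = 14.0
```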
2.2. Composition of Components
- Model: A parameterized function maps inputs to outputs, with reparameterization enabling parameter updates.
- Loss Map: Computes error; its reverse map aids in gradient-based updates.
- Gradient Descent: Iteratively updates parameters using gradient information.
- Optimizer: Includes basic and stateful variants, the latter incorporating memory for adaptive updates.
- Learning Rate: A scalar controlling update step size.
- Corner Structure: Ensures compatibility between learning components.
- Case Study 1. Supervised Learning: In supervised learning of parameters, fixing the optimizer enables parameter learning to be expressed as a morphism in the category of parametric lenses, yielding a lens whose ‘get’ is the identity morphism and whose ‘put’ maps inputs to updated parameters. The following formulas summarize the learning process, and a minimal code sketch is given after them.
- * Updates of parameters for the network provided by the state and step size
- * Predicted output of the network
- * Difference between prediction and true value
- * Updates of parameters and input
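As a deliberately simplified instance of the formulas above, the sketch below performs repeated gradient-descent ‘put’ steps on a linear model for a single training pair; the names model, mse_loss, and put, and all numeric values, are our own illustration rather than part of the cited framework.

```python
import numpy as np

def model(p, a):
    """Parameterized model: here a linear map with weights w and bias b."""
    w, b = p
    return w @ a + b

def mse_loss(b_pred, b_true):
    """Loss map comparing the prediction with the true value."""
    return 0.5 * np.sum((b_pred - b_true) ** 2)

def put(p, a, b_true, eta=0.1):
    """One supervised-learning step: map (parameters, input, target)
    to updated parameters using plain gradient descent."""
    w, b = p
    b_pred = model(p, a)            # predicted output of the network
    delta = b_pred - b_true         # difference between prediction and truth
    grad_w = np.outer(delta, a)     # gradient of the loss w.r.t. w
    grad_b = delta                  # gradient of the loss w.r.t. b
    return (w - eta * grad_w, b - eta * grad_b)

a_train, b_train = np.array([1.0, 2.0]), np.array([3.0])
p = (np.zeros((1, 2)), np.zeros(1))
for _ in range(100):
    p = put(p, a_train, b_train)
print(model(p, a_train), mse_loss(model(p, a_train), b_train))  # ~3.0, ~0.0
```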
- Case Study 2. DeepDream Framework: This utilizes the parameters p of a trained classifier network to generate or amplify specific features and shapes of a given type b in the input a. This enables the network to enhance selected features in an image. The corresponding categorical framework formalizes how the gradient descent lens connects to the input, facilitating structured image modification to elicit a specific interpretation. The system learns a morphism in the category of parametric lenses from inputs to outputs, with the input itself playing the role of the quantity being optimized. This defines a lens whose ‘get’ function is trivial and whose ‘put’ function maps the current input to an updated input. The learning process follows the formulas below, and a minimal code sketch is given after them.
- * The updated input provided by the state and step size
- * Prediction of the network
- * Changes in prediction and true value
- * Changes in parameters and input
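The DeepDream-style update can be sketched in the same spirit: the trained parameters are frozen and the input is moved by gradient ascent so that a chosen output score grows. The toy linear ‘classifier’, the names score and put_input, and all values below are our own illustration; a real instance would use a trained deep network.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.normal(size=(3, 8))            # frozen, "trained" parameters
target_class = 1                       # the feature/class to amplify

def score(p, a):
    """Network output (class scores) for input a."""
    return p @ a

def put_input(a, eta=0.05):
    """Map the current input to an updated input that amplifies the
    target class score; the parameters p stay fixed."""
    grad_a = p[target_class]           # d(score[target])/da for a linear model
    return a + eta * grad_a            # gradient *ascent* on the input

a = rng.normal(size=8)
for _ in range(50):
    a = put_input(a)
print(score(p, a)[target_class])       # the amplified score grows with iterations
```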
2.3. Other Related Research
3. Developments in Probability-Based Learning
- Empirical Probability: Defined as the ratio of event occurrences to total observations. In the context of category theory, empirical probability is represented in three distinct ways: first, as a functor mapping observations A to distributions on A; second, through the Giry monad, which captures finite measures; and third, within the framework of measure-theoretic probability, where categories such as Meas formalize measurable spaces and measurable functions.
- Theoretical Probability: Defined as the ratio of favorable outcomes to possible outcomes. Within category theory, theoretical probability is modeled using the Giry monad to represent probability distributions, and categories such as Meas to formalize measurable spaces and functions. Furthermore, monoidal categories provide a structured framework for combining distributions, facilitating the modeling of probabilistic processes in machine learning.
- Joint Probability: Joint probability, P(A ∩ B), quantifies the likelihood of two events occurring together. In category theory, it is modeled using the copying structure in Markov categories or the tensor product in monoidal categories.
- Conditional Probability and Bayes’ Theorem in Machine Learning: Conditional probability plays a fundamental role in probabilistic reasoning, allowing beliefs to be updated in light of new information. In machine learning, it is widely used in probabilistic models such as Bayesian networks and hidden Markov models. The joint probability of two events can be expressed as P(A ∩ B) = P(A ∣ B) P(B). A direct application of conditional probability is Bayes’ Theorem, which updates the probability of a hypothesis H given new evidence E: P(H ∣ E) = P(E ∣ H) P(H) / P(E). This theorem is central to Bayesian inference and is widely applied in generative models, reinforcement learning, and uncertainty quantification. In category-theoretic terms, probabilistic transitions can be modeled using Markov categories, where conditional dependencies are represented naturally. For practical machine learning applications, however, the focus remains on efficient approximation techniques, such as variational inference (VI) and Markov chain Monte Carlo (MCMC), for handling complex distributions. A small numeric illustration of Bayes’ Theorem is given after this list.
- Descriptive Statistics: This summarizes data characteristics through measures such as central tendency, dispersion, and visualization techniques, offering insights into distribution patterns and trends.
- Inferential Statistics: This uses sample data to estimate population parameters, facilitating hypothesis testing, interval estimation, and predictive modeling.
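As a small numeric illustration of Bayes’ Theorem (our own made-up example), consider a diagnostic test with a 1% prior prevalence, 99% sensitivity, and a 5% false-positive rate.

```python
# Posterior P(H | E) = P(E | H) P(H) / P(E), with P(E) obtained from the
# law of total probability.  All numbers are illustrative.
p_h = 0.01                      # prior P(H)
p_e_given_h = 0.99              # likelihood P(E | H)
p_e_given_not_h = 0.05          # false-positive rate P(E | not H)

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)   # marginal evidence
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 3))    # ~0.167: the posterior is still modest
```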
3.1. Categorical Background of Probability and Statistics Learning
- Categorization of traditional probability theory structures: Constructs like probability spaces and integration can be categorized using structures such as the Giry monad, which maps measurable spaces to probability measures, preserving the measurable structure. These frameworks formalize relationships between spaces and probability measures, enabling compositional probabilistic models.
- Categorization of synthetic probability and statistical concepts: Certain axioms and structures are taken as ‘fundamental’ in probabilistic logic, and inference processes are derived from them. Measure-theoretic models serve as concrete instances of these abstract frameworks. Markov categories are used to represent stochastic maps, conditional probabilities, and compositional reasoning in probabilistic systems.
3.2. Preliminaries and Notions
- 1.
- For every , the map defined by
- 2.
- If are in , then any measurable function such that is also in .
- 3.
- contains all constant functions.
- Ω is the sample space;
- F is a σ-algebra of subsets of Ω;
- P is a probability measure on F.
- 1.
- Non-negativity: f(y ∣ x) ≥ 0 for all x ∈ X and y ∈ Y.
- 2.
- Finiteness: For each x ∈ X, only finitely many y ∈ Y satisfy f(y ∣ x) ≠ 0.
- 3.
- Normalization: For each x ∈ X, the transition probabilities sum to one: Σ_y f(y ∣ x) = 1.
- In Markov chains, a stochastic map describes the transition probabilities between states, forming the basis for modeling sequential dependencies.
- In reinforcement learning, policy functions and transition models are often represented as stochastic maps, capturing the inherent randomness in environment dynamics.
- In probabilistic inference, stochastic maps define the conditional distributions in Bayesian networks and hidden Markov models.
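In the finite case, a stochastic map can be stored as a row-stochastic matrix with entry T[x, y] = P(y | x); composition of stochastic maps is then matrix multiplication (the Chapman-Kolmogorov equation). The NumPy sketch below, with arbitrary illustrative transition values, checks the normalization condition and iterates a small Markov chain.

```python
import numpy as np

T1 = np.array([[0.9, 0.1],        # transitions from state 0
               [0.2, 0.8]])       # transitions from state 1
T2 = np.array([[0.5, 0.5],
               [0.0, 1.0]])

# Non-negativity and normalization of a stochastic map.
assert np.all(T1 >= 0) and np.allclose(T1.sum(axis=1), 1.0)

composite = T1 @ T2               # composing stochastic maps = matrix product
print(composite.sum(axis=1))      # rows of the composite still sum to one

# Pushing a distribution forward through a stochastic map: p_next = p @ T.
p = np.array([1.0, 0.0])
for _ in range(3):
    p = p @ T1
print(p)                          # distribution after three Markov steps
```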
- 1.
- Probability Measure Condition: For each x ∈ X, the assignment B ↦ k(x, B) is a probability measure on Y, i.e., it is non-negative, countably additive, and assigns total mass one.
- 2.
- Measurability Condition: For each measurable set B ⊆ Y, the function x ↦ k(x, B) is measurable.
- Bayesian learning: Modeling posterior distributions in Bayesian inference.
- Sequential decision-making: Representing transition dynamics in stochastic control and reinforcement learning.
- Variational inference: Defining probability measures in stochastic optimization and Monte Carlo methods.
- A comultiplication (copy) map X → X ⊗ X;
- A counit (delete) map X → I.
- Objects: The objects of the Kleisli category Kl(T) of a monad T (such as the Giry monad) are the same as the objects of the base category.
- Morphisms: For objects X and Y, a morphism X → Y in Kl(T) is a morphism X → T Y in the base category.
- Composition: For f: X → T Y and g: Y → T Z, their composition in Kl(T) is defined as μ_Z ∘ T(g) ∘ f, where μ is the multiplication of the monad.
- Identity: For each object X, the identity morphism is given by the unit η_X: X → T X of the monad.
- represents the probability density or likelihood of y given x;
- is the marginal probability of y under the prior π;
- is a probability measure on , representing the conditional distribution of x given y (the posterior distribution).
- Discrete Case: If X is a discrete random variable with a probability mass function p defined on a finite set 𝒳, the entropy is given by H(X) = −Σ_{x ∈ 𝒳} p(x) log p(x).
- Continuous Case: If X is a continuous random variable with a probability density function f defined on a support set S, the entropy h(X), also referred to as differential entropy, is given by h(X) = −∫_S f(x) log f(x) dx.
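Both cases are easy to evaluate numerically; the sketch below (our own illustration) computes the Shannon entropy of a finite distribution and the closed-form differential entropy of a Gaussian density.

```python
import numpy as np

def shannon_entropy(p, base=2):
    """H(X) = -sum_x p(x) log p(x), ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(base)

print(shannon_entropy([0.5, 0.5]))          # 1 bit: a fair coin
print(shannon_entropy([0.9, 0.1]))          # ~0.469 bits: a biased coin

def gaussian_differential_entropy(sigma):
    """h(X) = 0.5 * log(2 * pi * e * sigma^2) in nats: the closed form of
    -integral f(x) log f(x) dx for a Gaussian density."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)

print(gaussian_differential_entropy(1.0))   # ~1.419 nats
```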
3.3. Framework of Categorical Bayesian Learning
- The combination of Bayesian inference and backpropagation induces the Bayesian inverse.
- The gradient-based learning process is further formalized as a functor .
3.3.1. Probability Model
- A morphism p: I → X from the monoidal unit in a Markov category can be viewed as a probability distribution (a state) on X. A morphism f: X → Y is called a channel.
- Given a channel f: X → Y and a state ω on X, a state on X ⊗ Y can be defined by copying X and applying f to one of the copies; the resulting composite is referred to as the jointification of f and ω.
- Let ω be a joint state on X ⊗ Y. A disintegration of ω consists of a channel f: X → Y and a state ω_X on X such that their jointification recovers ω (i.e., the corresponding diagram commutes). If every joint state in a category allows for such a decomposition, then the category is said to allow for conditional distributions.
- Let be a Markov category. If for every morphism , there exists a morphism such that the following commutative diagram holds, then is said to have conditional distribution (i.e., X can be factored out as a premise for Y).
- Equivalence of Channels: Let be a state on the object . Let be morphisms in . f is said to be almost everywhere equal to g if the following commutative diagram holds. If is a state and is the corresponding marginal distribution, and are channels such that and both form decompositions with respect to , then f is almost everywhere equal to g with respect to .
- Bayesian Inverse: Let ω be a state on X and let f: X → Y be a channel. The Bayesian inverse of f with respect to ω is a channel f†: Y → X satisfying the corresponding commutative diagram. If a Bayesian inverse exists for every state ω and channel f, then the category is said to support Bayesian inverses. This definition can be rephrased using the concept of decomposition: the Bayesian inverse can be obtained by decomposing the joint distribution that results from jointifying f with ω. The Bayesian inverse is not necessarily unique; however, if f† and g† are Bayesian inverses of a channel f with respect to the state ω, then f† is almost everywhere equal to g†. A numeric sketch for finite channels is given after this list.
- If a category admits conditional probabilities, it is causal. However, the converse does not hold.
- The categories and are both causal.
- If is causal, then (or written as ) is symmetric monoidal.
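For finite (discrete) channels, the Bayesian inverse defined above can be computed entrywise by Bayes’ rule. The sketch below (our own illustration, with an arbitrary prior and channel) forms the jointification, the pushforward marginal, and the inverse channel, and checks the defining equation.

```python
import numpy as np

pi = np.array([0.7, 0.3])                 # prior state on X
F = np.array([[0.8, 0.2],                 # channel X -> Y: F[x, y] = P(y | x)
              [0.1, 0.9]])

joint = pi[:, None] * F                   # jointification: P(x, y)
marginal_y = joint.sum(axis=0)            # pushforward of pi along F

G = (joint / marginal_y).T                # Bayesian inverse Y -> X: G[y, x] = P(x | y)
print(G)
print(G.sum(axis=1))                      # each row of G is a distribution on X

# Defining equation: pi(x) F(x, y) = marginal_y(y) G(y, x) for all x, y.
assert np.allclose(joint, (marginal_y[:, None] * G).T)
```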
3.3.2. Introduction of Functor
- A category C is called an M-actegory if there exists a strong monoidal functor M → End(C), where End(C) is the category of endofunctors on C, with composition as the monoidal operation. For m in M and c in C, the action is denoted by m • c.
- is called a right -actegory if it is an -actegory and is equipped with a natural isomorphism:
- If is a right -actegory, the following natural isomorphisms must exist:
- Objects: The objects of .
- 1-morphism: in consists of a pair , where and is a morphism in .
- Composition of 1-morphisms: Let and . The composition is the morphism in given by:
- 2-morphism: Let . A 2-morphism is given by a morphism such that the following diagram commutes:
- Identity morphisms and composition: These in the category inherit from the identity morphisms and composition in .
3.3.3. The Final Combination: Functor
- Objects: Pairs , where and .
- Morphisms: A morphism consists of a pair , where:
- 1.
- is a morphism in ;
- 2.
- is a morphism in .
- Hom-Sets: The Hom-set is given by the dependent sum:
- Define the functor : given , let .
- Define as the lens corresponding to the functor with objects and morphisms as follows.
- Objects: For , where and .
- Morphisms: A morphism is given by a morphism in and a morphism in .
- Combining with reverse derivatives, define the functor : given , let . If is a morphism in , then
- Let and be Markov categories, with being causal. Assume is a symmetric monoidal -actegory consistent with . Then is a symmetric monoidal category. The categories and are -actegories. is a functor that, when applied to , yields:If is a symmetric monoidal category, then a -actegory allows a canonical functor , where . The functor is the unit of the pseudomonad defined by . Thus, we obtain the following diagram:
- Define the functor .
3.4. Other Related Research
3.4.1. Categorical Probability Framework and Bayesian Inference
3.4.2. Generalized Models and Probabilistic Programming
3.4.3. Applications and Advanced Techniques in Categorical Structures
4. Developments in Invariance and Equivalence-Based Learning
- Other functorial construction-based methods;
4.1. Functorial Constructions and Properties
- Clustering: A clustering algorithm takes a finite metric space and assigns each point in the space to a cluster.
- Manifold Learning: Manifold learning algorithms, such as Isomap, Metric Multidimensional Scaling, and UMAP, construct embeddings of the points in X into a low-dimensional Euclidean space, which are interpreted as coordinates on the support of the data distribution. These techniques are based on the assumption that this support can be well-approximated by a manifold.
4.1.1. Preliminaries and Notions
- X is a finite set,
- d: X × X → [0, ∞] is a function satisfying the following properties:
- 1.
- Identity: d(x, x) = 0 for all x ∈ X,
- 2.
- Symmetry: d(x, y) = d(y, x) for all x, y ∈ X,
- 3.
- Triangle Inequality: d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X.
- Objects: Finite uber-metric spaces;
- Morphisms: Non-expansive maps between uber-metric spaces.
- Non-Nestedness: If and , then .
- Flag Property: The simplicial complex associated with , defined by:
- Identity on Underlying Sets: For each object , the underlying set of is X.
- Preservation of Structure: maps morphisms (non-expansive maps) in to morphisms in , preserving the clustering structure.
- Closure under Subsets: If σ belongs to the complex and τ ⊆ σ, then τ also belongs to the complex.
- Finiteness: is a finite collection of finite sets.
- The vertices of the complex are the elements of X.
- A subset σ ⊆ X forms a simplex if and only if d(x, y) ≤ δ for all x, y ∈ σ, where δ is the given scale parameter.
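A minimal sketch of the flag construction just described: a subset is admitted as a simplex exactly when all of its pairwise distances are at most the scale. The point set, the distances, and the function name flag_complex below are our own illustration.

```python
from itertools import combinations

points = ["a", "b", "c", "d"]
dist = {("a", "b"): 1.0, ("a", "c"): 1.2, ("a", "d"): 3.0,
        ("b", "c"): 0.8, ("b", "d"): 2.5, ("c", "d"): 2.7}

def d(x, y):
    return 0.0 if x == y else dist.get((x, y), dist.get((y, x)))

def flag_complex(points, delta, max_dim=2):
    """All simplices of dimension <= max_dim whose pairwise distances are
    <= delta: the cliques of the delta-neighborhood graph (flag property)."""
    simplices = [(p,) for p in points]
    for k in range(2, max_dim + 2):                  # subsets of size k
        for subset in combinations(points, k):
            if all(d(x, y) <= delta for x, y in combinations(subset, 2)):
                simplices.append(subset)
    return simplices

# At scale 1.3 only the three short edges survive, and they form one triangle.
print(flag_complex(points, delta=1.3))
# vertices, the edges {a,b}, {a,c}, {b,c}, and the 2-simplex {a,b,c}
```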
4.1.2. Functorial Manifold Learning
- Objects: Tuples , where n is a natural number and is a real-valued function that satisfies for or .
- Morphisms: when for all and .
- is a hierarchical -clustering functor;
- maps a fuzzy, non-nested flag cover with vertex set X to some with cardinality .
- : Constructs a local metric space around each point in X;
- : Converts each local metric space into a fuzzy simplicial complex;
- : Takes a fuzzy union of these fuzzy simplicial complexes;
- : Converts the resulting fuzzy simplicial complex into a fuzzy non-nested flag cover;
- : Constructs a loss function based on this cover.
- Developing more robust theories regarding the resistance of various types of unsupervised and supervised algorithms to noise;
- Exploring the possibility of imposing stricter bounds on the stability of our results by shifting from a finite-dimensional space to a distributional perspective, potentially incorporating concepts such as surrogate loss functions and divergence measures.
4.1.3. Functorial Clustering
- Disjoint clustering: When an element belongs to only one cluster, e.g., clustering by content.
- Fuzzy clustering: An element can belong to all clusters, but with a certain degree of membership, e.g., clustering within a color range.
- Overlapping clustering: An element can belong to multiple clusters, e.g., people who like Eastern and Western cuisines.
- For objects: A finite uber-metric space is sent to a fibered fuzzy simplicial complex , where for , is a simplicial complex whose 0-simplices are X, and the 1-simplices satisfy for . has no n-simplices for .
- For morphisms: A map is sent to a natural transformation between the two fibered fuzzy simplicial complexes , where the distribution at α is given by f.
- For objects: The vertex set X of is mapped to , where .
- For morphisms: sends a natural transformation to the function f defined by μ on the vertex sets of and (the function must be non-expansive because for all , if , then must be in or ).
- Flat Clustering Functor: A functor , which is constant on the underlying set.
- Non-trivial Flat Clustering Functor: is a flat clustering functor, where there exists a clustering parameter such that for any is a single simplex containing two points, and for any , is a pair of two simplices. The clustering parameter is the upper bound on the distance at which the clustering functor identifies two points as belonging to the same simplex.
- A functor is defined such that for any , is a flat clustering functor.
- Non-trivial hierarchical clustering functor : A hierarchical clustering functor where for all , is a flat clustering functor with clustering parameter .
- Single Linkage: The points lie in the same cluster with strength at least a if there exists a sequence of points connecting them in which every consecutive pair lies within the distance threshold corresponding to a.
- Maximum Linkage: The points lie in the same cluster with strength at least a if the largest pairwise distance between them is no larger than the threshold corresponding to a.
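Flat single-linkage clustering at a fixed scale can be computed with a union-find pass over all pairs: merging every pair within the scale automatically merges chains of close points. The distance matrix and the scale below are our own illustration.

```python
import numpy as np

def single_linkage(dist, a):
    """Cluster labels at scale a: points joined by a chain of steps of
    length <= a end up with the same label."""
    n = dist.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] <= a:
                parent[find(i)] = find(j)   # merge the two clusters

    return [find(i) for i in range(n)]

dist = np.array([[0.0, 1.0, 2.5, 6.0],
                 [1.0, 0.0, 1.2, 6.0],
                 [2.5, 1.2, 0.0, 6.0],
                 [6.0, 6.0, 6.0, 0.0]])
print(single_linkage(dist, a=1.5))   # [2, 2, 2, 3]: points 0, 1, 2 chain together
```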
4.2. Persistent Homology
- Defining differentiability and derivatives within the infinite-dimensional and singular category of barcodes , enabling PH-based gradient-descent optimization [96].
- Introducing Extended Persistent Homology (EPH) and applying it to machine learning, using the Laplacian operator for graph dataset classification [97].
- Analyzing the fiber of the persistence map from filter functions on a simplicial complex to the space of persistence barcodes. By applying increasing homeomorphisms of the real line, they showed the fiber forms a polyhedral complex and established a functorial relationship from barcodes to polyhedral complexes [98].
- Developing an algorithm to compute the polyhedral complex forming the fiber for arbitrary simplicial complexes K, enabling homology computations and fiber statistics [99].
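For intuition, zero-dimensional persistent homology needs no specialized library: sorting the pairwise distances and recording the scales at which connected components merge yields the H0 barcode. The sketch below is our own from-scratch illustration; in practice one would typically use a library such as GUDHI or Ripser.

```python
import numpy as np

def h0_barcode(points):
    """H0 bars of the Vietoris-Rips filtration of a point cloud: each merge
    of two components kills one bar [0, death); one bar lives forever."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))

    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                      # a component dies at this scale
            parent[ri] = rj
            bars.append((0.0, float(length)))
    bars.append((0.0, float("inf")))      # the surviving component never dies
    return bars

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.1]])
print(h0_barcode(pts))   # two short bars, one long bar, and the infinite bar
```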
4.3. Other Related Research
5. Developments in Topos-Based Learning
5.1. The Reports of Laurent Lafforgue
- Emergence and Invariance: Higher-order categories and geometric logic model emergent behavior and preserved properties in neural networks.
- Geometric Representation: Data spaces can be modeled as geometric objects, capturing structural evolution beyond linear representations.
- Local-to-Global Analysis: Topos-theoretic spaces enable local properties to determine global learning behavior.
- Categories and Generic Models: A mathematical theory derived from a neural network framework in logic can be effectively expressed using a category with suitable properties and a generic model, which acts as a functor.
- Logical Foundations: Axiomatic structures in category theory provide a formal basis for machine learning transformations.
- Theory Mappings: Functors between categories of models facilitate structured transformations across learning paradigms.
- Classifying Topos: A universal framework encoding neural network properties via categorical semantics.
The Idea in the Reports of Laurent Lafforgue
- The naming functor assigns structured labels to objects, creating a unified vocabulary.
- The partial knowledge functor encodes what is currently known and how it evolves.
- Each topos represents a structured knowledge space, capturing relationships between learning components.
- Topos morphisms track how information evolves, refining knowledge across different layers of abstraction.
- Hierarchical decomposition enables a systematic approach to integrating localized details into broader learning architectures.
5.2. Artificial Neural Networks: Corresponding Topos and Stack
- DNNs excel at pattern recognition but lack structured reasoning.
- Bayesian models are good at probabilistic inference but do not support logical deduction.
- Human-like intelligence requires both statistical learning and logical inference.
- Explainable AI: Making neural networks more interpretable by modeling learning as logical transformations.
- Transfer and Compositional Learning: Enabling modular knowledge transfer between tasks.
- Hierarchical and Multi-Agent Learning: Encoding layered decision-making in AI systems.
- AI for Scientific Discovery: Applying topos-based representations in physics-inspired machine learning.
- Symbolic-Neural Hybrid Models: Bridging deep learning and structured reasoning for more robust AI.
- Key Concepts and Their Role: Several fundamental mathematical structures are central to understanding the topos-theoretic approach to machine learning. While detailed definitions are omitted, we highlight the key ideas and their relevance to our discussion.
- Groupoids and Stacks: Groupoids represent structures where every morphism is invertible, capturing symmetries and equivalences. Stacks extend this idea by formalizing hierarchical relationships between objects in a categorical framework. These notions are useful in modeling invariance and modularity in deep learning architectures.
- Grothendieck Construction and Fibrations: The Grothendieck construction provides a systematic way to transition between category-valued functors and structured categories, particularly in the study of fibration structures. This is instrumental in understanding hierarchical feature representations in neural networks.
- Type Theory and Logical Structures: Type theory provides a formal language for structuring logical reasoning. Within categorical semantics, type-theoretic frameworks are used to interpret neural network functions as structured logical operations, enhancing explainability and compositional learning.
- Locally Cartesian Closed Categories and Model Categories: These structures support flexible transformations between logical and geometric representations of learning processes. They facilitate the integration of homotopy-theoretic ideas with machine learning, particularly in capturing topological invariants of learned representations.
- Sheaves, Cosheaves, and Invariance Principles: Sheaf-theoretic techniques allow for local-to-global reasoning, enabling structured information flow in learning systems. Cosheaves, in contrast, provide a dual perspective that is valuable for understanding distributed representations and signal propagation in neural networks.
Topoi and Stacks of Deep Neural Networks
- Invariant Network Structures: Patterns in neural networks, such as CNNs and LSTMs, correspond to Giraud’s stack structures, supporting generalization under specific constraints.
- Artificial Languages and Logic: Learning representations are structured along the fibers of stacks, incorporating different types of logic (intuitionistic, classical, linear) to enable formal reasoning.
- Semantic Functions in Networks: Neural network functions act as semantic functions, encoding and processing structured information, leading to meaningful outputs.
- Entropy and Semantic Information: Entropy measures the distribution of semantic information within the network, providing insights into how knowledge is processed and retained.
- Geometric Semantics and Invariance: Geometric fibrant objects in Quillen’s closed model categories help classify semantics, particularly through homotopy invariants that capture structural consistency.
- Type-Theoretic Learning Structures: Intensional Martin–Löf Type Theory (MLTT) structures geometric objects and their relationships, offering a constructive reasoning framework.
- Information Flow and Derivators: Grothendieck derivators analyze how information is stored and exchanged within the categorical framework.
- Local–Global Coherence: Topos theory formalizes how local interactions in a network lead to global coherence, explaining structured information flow.
- Structured Node Representation: Nodes act as both receivers and transmitters. The Grothendieck construction models their internal structure, associating fibers with a base category. This extends to stacks (2-sheaves) [26], capturing hierarchical dependencies in complex networks.
- A fibration encodes fibered categories satisfying stack axioms.
- The topos of sheaves is equivalent to , forming a classifying topos.
- (feedforward propagation).
- (feedback propagation).
- Type-Theoretic Structure: Network operations are represented as types (presheaves) on a stack’s fibers, capturing the logical rules of learning.
- Semantic Refinement Across Layers: Deeper layers refine their understanding of inputs, aligning with the hierarchical nature of feature extraction in DNNs.
- Types: Objects in .
- Contexts: Slice categories .
- Propositions: -truncated objects in .
- Proofs: Generalized elements of propositions.
- Feedforward transformation: .
- Feedback transformation: .
- Model Category and M-L Type Theory in DNNs: Homotopy theory provides a structured way to study transformations between neural network mappings. In model categories, weak equivalences capture essential features of learning, allowing homotopies between mappings to be explicitly observed. This perspective aligns with Quillen’s model theory, where fibrations and cofibrations ensure structural consistency in DNNs.
- The category of stacks (groupoid stacks) corresponds to the category . Within the groupoid fibration, stacks are fibrant objects, and weak equivalences define homotopy relations, linking them to M-L type theory.
- This result extends to general stack categories, connecting Quillen models with intensional M-L theory and Voevodsky’s homotopy type theory.
- The primary categories studied include (groupoids) and (small categories). In , fibrations lift isomorphisms, cofibrations are injective functors, and weak equivalences correspond to categorical equivalences.
- Given the hierarchical structure of DNNs, a poset representing layers induces a fibration , providing a structured context for M-L type theory.
- Types in this framework are defined as fibrations, supporting logical operations such as conjunction, disjunction, implication, negation, and quantification.
- The M-L structure over DNNs associates contexts and types with geometric fibrations in the 2-category of contravariant functors , ensuring a well-defined internal logic.
- Similar principles apply to groupoids, allowing M-L theory to define language and semantics for neural networks, aligning machine learning with structured categorical reasoning.
- Dynamics and Homology in Deep Neural Networks: In supervised and reinforcement learning, network decisions rely on structured information flow. The dynamic object represents network activity, guiding decision-making based on learned semantic structures. The key challenge is understanding how the entire deep neural network (DNN) contributes to output decisions. This is achieved by encoding output propositions into truth values and expanding the network’s structure accordingly.
- Direct representation of semantic content (first level).
- Higher-order structures capturing evolving theories (second level).
- Random variables in a layer correspond to logical propositions, enabling measurement.
- Layers represent gerbe objects, modeling dynamic semantic transformations across feedforward and feedback loops.
5.3. Other Related Works
- Ref. [140] leveraged the logical programming language ProbLog to unify semantic information and communication by integrating technical communication (TC) and semantic communication (SC) through the use of internal logics. This approach demonstrates how logical programming can bridge semantic and technical paradigms to enhance communication systems.
- Ref. [141] examined semantic communication in AI applications, focusing on causal representation learning and its implications for reasoning-driven semantic communication networks. The authors proposed a comprehensive set of key performance indicators (KPIs) and metrics to evaluate semantic communication systems and demonstrated their scalability to large-scale networks, thereby establishing a framework for designing efficient, learning-oriented semantic communication networks.
- Ref. [142] explored the mathematical underpinnings of statistical systems by representing them as partially ordered sets (posets) and expressing their phases as invariants of these representations. By employing homological algebra, the authors developed a methodology to compute these phases, offering a robust framework for analyzing the structural and statistical properties of such systems.
5.4. Case Study: Frustrated Systems in AI and Topos-Theoretic Approaches
- Neural Networks: In deep learning, conflicting weight updates can create local minima, complicating optimization.
- Optimization Problems: Problems like the traveling salesman problem and graph coloring exhibit frustration due to multiple competing constraints.
- Quantum Computing: Quantum error correction and superposition management rely on coherent state transitions, which can be modeled using topos structures.
- Difficulty in Finding Global Optima: Local minima hinder effective optimization.
- High Computational Complexity: Navigating rugged optimization landscapes requires extensive computation.
- Sensitivity to Perturbations: Small changes can lead to instability in model performance.
- Long Relaxation Times: Convergence to stable states can be slow in complex AI systems.
- Formalizing Learning Architectures: Representing training processes as morphisms within a topos.
- Enhancing Optimization: Providing structured methods to escape local minima.
- Ensuring Stability and Robustness: Using categorical structures to generalize across learning conditions.
- Neural Networks: Regularization techniques and structured weight updates reduce the risk of local minima.
- Quantum Algorithms: Error correction and coherence management benefit from topos-theoretic representations.
- Reinforcement Learning: Managing the exploration-exploitation tradeoff through categorical methods.
- Hybrid Approaches: Combining topos theory with traditional optimization techniques for enhanced performance.
5.5. Outlook and Future Directions
- Robustness in Adversarial Learning: Investigating topos-theoretic invariances to develop models that are more resistant to adversarial perturbations.
- Explainability in Deep Learning: Using internal logic of topoi to formalize explainability in black-box models such as transformers and generative AI.
- Multi-Agent and Federated Learning: Applying categorical compositionality to improve information sharing and coordination between decentralized models.
- Efficient Knowledge Transfer in Pretrained Models: Leveraging geometric morphisms to enhance the transferability of representations across different tasks.
- Topos-Based Optimization Frameworks: Exploring higher categorical structures in optimization, potentially improving convergence and stability in gradient-based learning.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Shiebler, D.; Gavranović, B.; Wilson, P. Category Theory in Machine Learning. arXiv 2021, arXiv:2106.07032. [Google Scholar]
- Lu, X.; Tang, Z. Causal Network Condensation. arXiv 2022, arXiv:2112.15515. [Google Scholar]
- Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2017, arXiv:1609.04747. [Google Scholar]
- Cruttwell, G.S.H.; Gavranovic, B.; Ghani, N.; Wilson, P.; Zanasi, F. Categorical Foundations of Gradient-Based Learning. In Programming Languages and Systems; Sergey, I., Ed.; Springer: Cham, Switzerland, 2022; pp. 1–28. [Google Scholar]
- Cruttwell, G.S.H.; Gavranovic, B.; Ghani, N.; Wilson, P.; Zanasi, F. Deep Learning with Parametric Lenses. arXiv 2024, arXiv:2404.00408. [Google Scholar]
- Capucci, M.; Gavranović, B.; Hedges, J.; Rischel, E.F. Towards Foundations of Categorical Cybernetics. Electron. Proc. Theor. Comput. Sci. 2022, 372, 235–248. [Google Scholar] [CrossRef]
- Gavranović, B. Fundamental Components of Deep Learning: A category-theoretic approach. arXiv 2024, arXiv:2403.13001. [Google Scholar]
- Blute, R.F.; Cockett, J.R.B.; Seely, R.A. Cartesian differential categories. Theory Appl. Categ. 2009, 22, 622–672. [Google Scholar]
- Cockett, R.; Cruttwell, G.; Gallagher, J.; Lemay, J.S.P.; MacAdam, B.; Plotkin, G.; Pronk, D. Reverse derivative categories. arXiv 2019, arXiv:1910.07065. [Google Scholar]
- Wilson, P.W. Category-Theoretic Data Structures and Algorithms for Learning Polynomial Circuits. Ph.D. Thesis, University of Southampton, Southampton, UK, 2023. [Google Scholar]
- Wilson, P.; Zanasi, F. Reverse Derivative Ascent: A Categorical Approach to Learning Boolean Circuits. Electron. Proc. Theor. Comput. Sci. 2021, 333, 247–260. [Google Scholar] [CrossRef]
- Statusfailed. Numeric Optics: A Python Library for Constructing and Training Neural Networks Based on Lenses and Reverse Derivatives; Online Resource. 2025. Available online: https://github.com/statusfailed/numeric-optics-python (accessed on 20 February 2025).
- Wilson, P.; Zanasi, F. Data-Parallel Algorithms for String Diagrams. arXiv 2023, arXiv:2305.01041. [Google Scholar]
- Wilson, P. Yarrow Diagrams: String Diagrams for the Working Programmer. 2023. Available online: https://github.com/yarrow-id/diagrams (accessed on 23 August 2024).
- Wilson, P. Yarrow-polycirc: Differentiable IR for Zero-Knowledge Machine Learning. 2023. Available online: https://github.com/yarrow-id/polycirc (accessed on 23 August 2024).
- Wilson, P. Catgrad: A Categorical Deep Learning Compiler. 2024. Available online: https://github.com/statusfailed/catgrad (accessed on 23 August 2024).
- Cruttwell, G.; Gallagher, J.; Lemay, J.S.P.; Pronk, D. Monoidal reverse differential categories. Math. Struct. Comput. Sci. 2022, 32, 1313–1363. [Google Scholar] [CrossRef]
- Cruttwell, G.; Lemay, J.S.P. Reverse Tangent Categories. arXiv 2023, arXiv:2308.01131. [Google Scholar]
- Fong, B.; Spivak, D.I.; Tuyéras, R. Backprop as Functor: A Compositional Perspective on Supervised Learning. In Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2019), Vancouver, BC, Canada, 24–27 June 2019; IEEE: New York, NY, USA, 2019; pp. 1–13. [Google Scholar] [CrossRef]
- Fong, B.; Johnson, M. Lenses and Learners. arXiv 2019, arXiv:1903.03671. [Google Scholar]
- Fong, B. Causal Theories: A Categorical Perspective on Bayesian Networks. arXiv 2013, arXiv:1301.6201. [Google Scholar]
- Spivak, D.I. Functorial aggregation. arXiv 2023, arXiv:2111.10968. [Google Scholar] [CrossRef]
- Ghica, D.R.; Kaye, G.; Sprunger, D. A Fully Compositional Theory of Sequential Digital Circuits: Denotational, Operational and Algebraic Semantics. arXiv 2024, arXiv:2201.10456. [Google Scholar]
- Videla, A.; Capucci, M. Lenses for Composable Servers. arXiv 2022, arXiv:2203.15633. [Google Scholar]
- Gavranović, B. Space-time tradeoffs of lenses and optics via higher category theory. arXiv 2022, arXiv:2209.09351. [Google Scholar]
- Belfiore, J.C.; Bennequin, D. Topos and Stacks of Deep Neural Networks. arXiv 2022, arXiv:2106.14587. [Google Scholar]
- Spivak, D.I. Learners’ languages. Electron. Proc. Theor. Comput. Sci. 2022, 372, 14–28. [Google Scholar] [CrossRef]
- Capucci, M. Diegetic Representation of Feedback in Open Games. Electron. Proc. Theor. Comput. Sci. 2023, 380, 145–158. [Google Scholar] [CrossRef]
- Hedges, J.; Sakamoto, R.R. Reinforcement Learning in Categorical Cybernetics. arXiv 2024, arXiv:2404.02688. [Google Scholar]
- Lanctot, M.; Lockhart, E.; Lespiau, J.B.; Zambaldi, V.; Upadhyay, S.; Pérolat, J.; Srinivasan, S.; Timbers, F.; Tuyls, K.; Omidshafiei, S.; et al. OpenSpiel: A Framework for Reinforcement Learning in Games. arXiv 2020, arXiv:1908.09453. [Google Scholar]
- Kamiya, K.; Welliaveetil, J. A category theory framework for Bayesian learning. arXiv 2021, arXiv:2111.14293. [Google Scholar]
- Gavranović, B.; Lessard, P.; Dudzik, A.J.; Von Glehn, T.; Madeira Araújo, J.A.G.; Veličković, P. Position: Categorical Deep Learning is an Algebraic Theory of All Architectures. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., Berkenkamp, F., Eds.; Proceedings of Machine Learning Research, Volume 235. PMLR: Cambridge, MA, USA, 2024; pp. 15209–15241. [Google Scholar]
- Vákár, M.; Smeding, T. CHAD: Combinatory Homomorphic Automatic Differentiation. ACM Trans. Program. Lang. Syst. 2022, 44, 1–49. [Google Scholar] [CrossRef]
- Abbott, V. Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures. arXiv 2024, arXiv:2402.05424. [Google Scholar]
- Abbott, V.; Zardini, G. Functor String Diagrams: A Novel Approach to Flexible Diagrams for Applied Category Theory. arXiv 2024, arXiv:2404.00249. [Google Scholar]
- Hauenstein, J.D.; He, Y.H.; Kotsireas, I.; Mehta, D.; Tang, T. Special issue on Algebraic Geometry and Machine Learning. J. Symb. Comput. 2023, 118, 93–94. [Google Scholar] [CrossRef]
- Lawvere, F.W. The Category of Probabilistic Mappings—With Applications to Stochastic Processes, Statistics, and Pattern Recognition. Semin. Handout Notes 1962. Unpublished. [Google Scholar]
- Giry, M. A categorical approach to probability theory. In Categorical Aspects of Topology and Analysis; Banaschewski, B., Ed.; Springer: Berlin/Heidelberg, Germany, 1982; pp. 68–85. [Google Scholar]
- Leinster, T. Codensity and the ultrafilter monad. arXiv 2013, arXiv:1209.3606. [Google Scholar]
- Sturtz, K. Categorical Probability Theory. arXiv 2015, arXiv:1406.6030. [Google Scholar]
- Belle, R.V. Probability Monads as Codensity Monads. Theory Appl. Categ. 2021, 38, 811–842. [Google Scholar]
- Burroni, E. Distributive laws. Applications to stochastic automata. (Lois distributives. Applications aux automates stochastiques.). Theory Appl. Categ. 2009, 22, 199–221. [Google Scholar]
- Culbertson, J.; Sturtz, K. Bayesian machine learning via category theory. arXiv 2013, arXiv:1312.1445. [Google Scholar]
- Culbertson, J.; Sturtz, K. A categorical foundation for Bayesian probability. Appl. Categ. Struct. 2014, 22, 647–662. [Google Scholar] [CrossRef]
- Chentsov, N.N. Categories of mathematical statistics. Uspekhi Mat. Nauk 1965, 20, 194–195. [Google Scholar]
- Giry, M. A categorical approach to probability theory. In Proceedings of the 1982 International Conference on Category Theory, Dundee, Scotland, 29 March–2 April 1982. [Google Scholar]
- Golubtsov, P.V. Axiomatic description of categories of information transformers. Probl. Peredachi Informatsii 1999, 35, 80–98. [Google Scholar]
- Golubtsov, P.V. Monoidal Kleisli category as a background for information transformers theory. Inf. Process. 2002, 2, 62–84. [Google Scholar]
- Kallenberg, O. Random Measures, Theory and Applications; Springer: Cham, Switzerland, 2017; Volume 1. [Google Scholar]
- Fritz, T.; Perrone, P. Bimonoidal Structure of Probability Monads. In Proceedings of the 2018 Symposium on Logic in Computer Science, Oxford, UK, 9–12 July 2018. [Google Scholar]
- Fritz, T. A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Adv. Math. 2020, 370, 107239. [Google Scholar] [CrossRef]
- Fritz, T.; Rischel, E.F. Infinite Products and Zero-One Laws in Categorical Probability. Compositionality 2020, 2, 13509. [Google Scholar] [CrossRef]
- Fritz, T.; Gonda, T.; Perrone, P.; Fjeldgren Rischel, E. Representable Markov categories and comparison of statistical experiments in categorical probability. Theor. Comput. Sci. 2023, 961, 113896. [Google Scholar] [CrossRef]
- Fritz, T.; Gadducci, F.; Perrone, P.; Trotta, D. Weakly Markov Categories and Weakly Affine Monads. arXiv 2023, arXiv:2303.14049. [Google Scholar]
- Sabok, M.; Staton, S.; Stein, D.; Wolman, M. Probabilistic programming semantics for name generation. Proc. ACM Program. Lang. 2021, 5, 1–29. [Google Scholar] [CrossRef]
- Moggi, E. Computational Lambda-Calculus and Monads; Laboratory for Foundations of Computer Science, Department of Computer Science, University of Edinburgh: Edinburgh, UK, 1988. [Google Scholar]
- Sennesh, E.; Xu, T.; Maruyama, Y. Computing with Categories in Machine Learning. arXiv 2023, arXiv:2303.04156. [Google Scholar]
- Ambrogioni, L.; Lin, K.; Fertig, E.; Vikram, S.; Hinne, M.; Moore, D.; van Gerven, M. Automatic structured variational inference. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Virtual, 13–15 April 2021; pp. 676–684. [Google Scholar]
- Meulen, F.; Schauer, M. Automatic Backward Filtering Forward Guiding for Markov processes and graphical models. arXiv 2020, arXiv:2010.03509. [Google Scholar]
- Braithwaite, D.; Hedges, J. Dependent Bayesian Lenses: Categories of Bidirectional Markov Kernels with Canonical Bayesian Inversion. arXiv 2022, arXiv:2209.14728. [Google Scholar]
- Heunen, C.; Kammar, O.; Staton, S.; Yang, H. A convenient category for higher-order probability theory. In Proceedings of the IEEE 2017 32nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), Reykjavik, Iceland, 20–23 June 2017; pp. 1–12. [Google Scholar]
- Kallenberg, O. Foundations of Modern Probability, 2nd ed.; Probability and Its Applications; Springer: New York, NY, USA, 2002. [Google Scholar]
- Villani, C. Optimal Transport: Old and New. In Grundlehren der Mathematischen Wissenschaften; Springer: Berlin/Heidelberg, Germany, 2009; Volume 338. [Google Scholar]
- Lane, S.M. Categories for the Working Mathematician, 2nd ed.; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1998; Volume 5. [Google Scholar]
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2012. [Google Scholar]
- Etingof, P.; Gelaki, S.; Nikshych, D.; Ostrik, V. Tensor Categories; Mathematical Surveys and Monographs; American Mathematical Society: Providence, RI, USA, 2015; Volume 205. [Google Scholar]
- Schreiber, U.; Leinster, T.; Baez, J. The n-Category Café, September 2006. Available online: https://golem.ph.utexas.edu/category/2006/09/index.shtml (accessed on 23 August 2024).
- Corfield, D.; Schölkopf, B.; Vapnik, V. Falsificationism and statistical learning theory: Comparing the Popper and Vapnik-Chervonenkis dimensions. J. Gen. Philos. Sci. 2009, 40, 51–58. [Google Scholar] [CrossRef]
- Bradley, T.D.; Terilla, J.; Vlassopoulos, Y. An enriched category theory of language: From syntax to semantics. La Mat. 2022, 1, 551–580. [Google Scholar] [CrossRef]
- Fabregat-Hernández, A.; Palanca, J.; Botti, V. Exploring explainable AI: Category theory insights into machine learning algorithms. Mach. Learn. Sci. Technol. 2023, 4, 045061. [Google Scholar] [CrossRef]
- Aguirre, A.; Barthe, G.; Birkedal, L.; Bizjak, A.; Gaboardi, M.; Garg, D. Relational Reasoning for Markov Chains in a Probabilistic Guarded Lambda Calculus. In Proceedings of the European Symposium on Programming, Thessaloniki, Greece, 16–19 April 2018. [Google Scholar]
- Schauer, M.; Meulen, F. Compositionality in algorithms for smoothing. In Proceedings of the 2023 International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
- Fritz, T.; Klingler, A. The d-separation criterion in Categorical Probability. J. Mach. Learn. Res. 2022, 24, 1–49. [Google Scholar]
- Perrone, P. Markov Categories and Entropy. IEEE Trans. Inf. Theory 2024, 70, 1671–1692. [Google Scholar] [CrossRef]
- Mahadevan, S. Categoroids: Universal Conditional Independence. arXiv 2022, arXiv:2208.11077. [Google Scholar]
- Yang, B.; Marisa, Z.Z.K.; Shi, K. Monadic Deep Learning. arXiv 2023, arXiv:2307.12187. [Google Scholar]
- Shiebler, D. Functorial Clustering via Simplicial Complexes. In Proceedings of the NeurIPS 2020 Workshop on Topological Data Analysis and Beyond, Online, 11 December 2020. [Google Scholar]
- Shiebler, D. Functorial Manifold Learning. Electron. Proc. Theor. Comput. Sci. 2022, 372, 1–13. [Google Scholar] [CrossRef]
- Edelsbrunner, H.; Harer, J. Persistent homology-a survey. Contemp. Math. 2008, 453, 257–282. [Google Scholar]
- Pun, C.S.; Xia, K.; Lee, S.X. Persistent-Homology-based Machine Learning and its Applications—A Survey. arXiv 2018, arXiv:1811.00252. [Google Scholar] [CrossRef]
- Shiebler, D. Compositionality and Functorial Invariants in Machine Learning. Ph.D. Thesis, University of Oxford, Oxford, UK, 2023. [Google Scholar]
- Kelly, G.M. Basic Concepts of Enriched Category Theory; London Mathematical Society Lecture Note Series; Cambridge University Press: Cambridge, UK, 1982; Volume 64. [Google Scholar]
- Hatcher, A. Algebraic Topology; Cambridge University Press: Cambridge, UK, 2002. [Google Scholar]
- Edelsbrunner, H.; Harer, J.L. Computational Topology: An Introduction; American Mathematical Society: Providence, RI, USA, 2010. [Google Scholar]
- Jain, A.K.; Dubes, R.C. Algorithms for Clustering Data; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1988. [Google Scholar]
- Carlsson, G.E.; Mémoli, F. Classifying Clustering Schemes. Found. Comput. Math. 2010, 13, 221–252. [Google Scholar] [CrossRef]
- Spivak, D.I. Metric Realization of Fuzzy Simplicial Sets; Online Resource; Self-Published Notes. Available online: https://dspivak.net/metric_realization090922.pdf (accessed on 23 August 2024).
- McInnes, L. Topological methods for unsupervised learning. In Proceedings of the Geometric Science of Information: 4th International Conference, GSI 2019, Toulouse, France, 27–29 August 2019; Proceedings 4. Springer: Berlin/Heidelberg, Germany, 2019; pp. 343–350. [Google Scholar]
- Chazal, F.; Cohen-Steiner, D.; Glisse, M.; Guibas, L.J.; Oudot, S.Y. Proximity of persistence modules and their diagrams. In Proceedings of the Twenty-Fifth Annual Symposium on Computational Geometry, Aarhus, Denmark, 8–10 June 2009; pp. 237–246. [Google Scholar]
- Ghrist, R. Barcodes: The persistent topology of data. Bull. Am. Math. Soc. 2008, 45, 61–75. [Google Scholar] [CrossRef]
- Edelsbrunner, H.; Morozov, D. Persistent Homology: Theory and Practice; Technical Report; Lawrence Berkeley National Lab. (LBNL): Berkeley, CA, USA, 2012. [Google Scholar]
- Huber, S. Persistent homology in data science. In Proceedings of the Data Science—Analytics and Applications: 3rd International Data Science Conference–iDSC2020, Vienna, Austria, 13 May 2020; Springer: Berlin/Heidelberg, Germany, 2021; pp. 81–88. [Google Scholar]
- Dłotko, P.; Wagner, H. Computing homology and persistent homology using iterated Morse decomposition. arXiv 2012, arXiv:1210.1429. [Google Scholar]
- Gameiro, M.; Hiraoka, Y.; Obayashi, I. Continuation of point clouds via persistence diagrams. Phys. D Nonlinear Phenom. 2016, 334, 118–132. [Google Scholar] [CrossRef]
- Leygonie, J. Differential and fiber of persistent homology. Ph.D. Thesis, University of Oxford, Oxford, UK, 2022. [Google Scholar]
- Leygonie, J.; Oudot, S.; Tillmann, U. A Framework for Differential Calculus on Persistence Barcodes. Found. Comput. Math. 2019, 22, 1069–1131. [Google Scholar] [CrossRef]
- Yim, K.M.; Leygonie, J. Optimization of Spectral Wavelets for Persistence-Based Graph Classification. Front. Appl. Math. Stat. 2021, 7, 651467. [Google Scholar] [CrossRef]
- Leygonie, J.; Tillmann, U. The fiber of persistent homology for simplicial complexes. J. Pure Appl. Algebra 2021, 226, 107099. [Google Scholar] [CrossRef]
- Leygonie, J.; Henselman-Petrusek, G. Algorithmic reconstruction of the fiber of persistent homology on cell complexes. J. Appl. Comput. Topol. 2024, 226, 2015–2049. [Google Scholar] [CrossRef]
- Jardine, J.F. Data and homotopy types. arXiv 2019, arXiv:1908.06323. [Google Scholar]
- Jardine, J.F. Persistent homotopy theory. arXiv 2020, arXiv:2002.10013. [Google Scholar]
- Jardine, J.F. Directed Persistence. 2020. Available online: https://www.math.uwo.ca/faculty/jardine/preprints/fund-cat03.pdf (accessed on 23 August 2024).
- Ballester, R.; Casacuberta, C.; Escalera, S. Topological Data Analysis for Neural Network Analysis: A Comprehensive Survey. arXiv 2024, arXiv:2312.05840. [Google Scholar]
- Turkevs, R.; Montúfar, G.; Otter, N. On the effectiveness of persistent homology. Adv. Neural Inf. Process. Syst. 2022, 35, 35432–35448. [Google Scholar]
- Zhao, Q.; Ye, Z.; Chen, C.; Wang, Y. Persistence Enhanced Graph Neural Network. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Online, 26–28 August 2020. [Google Scholar]
- Solomon, E.; Wagner, A.; Bendich, P. From Geometry to Topology: Inverse Theorems for Distributed Persistence. In Proceedings of the 38th International Symposium on Computational Geometry (SoCG 2022), Berlin, Germany, 7–10 June 2022; Leibniz International Proceedings in Informatics (LIPIcs). Goaoc, X., Kerber, M., Eds.; Dagstuhl: Berlin, Germany, 2022; Volume 224, pp. 61:1–61:16. [Google Scholar]
- Zhou, L. Beyond Persistent Homology: More Discriminative Persistent Invariants. Ph.D. Thesis, The Ohio State University, Columbus, OH, USA, 2023. [Google Scholar]
- Belchí, F.; Murillo, A. A∞-persistence. Appl. Algebra Eng. Commun. Comput. 2014, 26, 121–139. [Google Scholar] [CrossRef]
- Herscovich, E. A higher homotopic extension of persistent (co)homology. J. Homotopy Relat. Struct. 2014, 13, 599–633. [Google Scholar] [CrossRef]
- Guss, W.H.; Salakhutdinov, R. On Characterizing the Capacity of Neural Networks using Algebraic Topology. arXiv 2018, arXiv:1802.04443. [Google Scholar]
- Petri, G.; Leitão, A. On the topological expressive power of neural networks. In Proceedings of the Topological Data Analysis and Beyond Workshop at the 34th Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020. [Google Scholar]
- Walton, S. Isomorphism, Normalizing Flows, and Density Estimation: Preserving Relationships Between Data; Technical Report; University of Oregon, Computer and Information Sciences Department. 2023. Available online: https://www.cs.uoregon.edu/Reports/AREA-202307-Walton.pdf (accessed on 4 July 2024).
- Mahadevan, S. Unifying causal inference and reinforcement learning using higher-order category theory. arXiv 2022, arXiv:2209.06262. [Google Scholar]
- Shiebler, D. Kan Extensions in Data Science and Machine Learning. arXiv 2022, arXiv:2203.09018. [Google Scholar]
- Mahadevan, S. GAIA: Categorical Foundations of Generative AI. arXiv 2024, arXiv:2402.18732. [Google Scholar]
- Mahadevan, S. Empowering Manufacturing: Generative AI Revolutionizes ERP Application. Int. J. Innov. Sci. Res. 2024, 9, 593–595. [Google Scholar] [CrossRef]
- Sridhar, M. Universal Causality. Entropy 2023, 25, 574. [Google Scholar] [CrossRef] [PubMed]
- Mahadevan, S. Causal Homotopy. arXiv 2021, arXiv:2112.01847. [Google Scholar]
- Morales-Álvarez, P.; Sánchez, M. A note on the causal homotopy classes of a globally hyperbolic spacetime. Class. Quantum Gravity 2015, 32, 197001. [Google Scholar] [CrossRef]
- Lafforgue, L. Some Possible Roles for AI of Grothendieck Topos Theory; Technical Report. 2022. Available online: https://www.laurentlafforgue.org/Expose_Lafforgue_topos_AI_ETH_sept_2022.pdf (accessed on 23 August 2024).
- Caramello, O. Grothendieck Toposes as Unifying ‘Bridges’: A Mathematical Morphogenesis. In Objects, Structures, and Logics: FilMat Studies in the Philosophy of Mathematics; Springer International Publishing: Cham, Switzerland, 2022; pp. 233–255. [Google Scholar]
- Villani, M.J.; McBurney, P. The Topos of Transformer Networks. arXiv 2024, arXiv:2403.18415. [Google Scholar]
- Asher, N. Lexical Meaning in Context: A Web of Words; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
- Abrusán, M.; Asher, N.; Van de Cruys, T. Content vs. function words: The view from distributional semantics. Zas Pap. Linguist. 2018, 60, 1–21. [Google Scholar] [CrossRef]
- Kawahara, Y.; Furusawa, H.; Mori, M. Categorical representation theorems of fuzzy relations. Inf. Sci. 1999, 119, 235–251. [Google Scholar] [CrossRef]
- Hyland, J.M.E.; Pitts, A.M. The theory of constructions: Categorical semantics and topos-theoretic models. Contemp. Math. 1989, 92, 137–199. [Google Scholar]
- Katsumata, S.Y.; Rival, X.; Dubut, J. A categorical framework for program semantics and semantic abstraction. Electron. Notes Theor. Inform. Comput. 2023, 3, 11-1–11-18. [Google Scholar] [CrossRef]
- Babonnaud, W. A topos-based approach to building language ontologies. In Proceedings of the Formal Grammar: 24th International Conference, FG 2019, Riga, Latvia, 11 August 2019; Proceedings 24. Springer: Berlin/Heidelberg, Germany, 2019; pp. 18–34. [Google Scholar]
- Saba, W.S. Logical Semantics and Commonsense Knowledge: Where Did we Go Wrong, and How to Go Forward, Again. arXiv 2018, arXiv:1808.01741. [Google Scholar]
- Tasić, M. On the knowability of the world: From intuition to turing machines and topos theory. Biocosmol.-Neo-Aristot. 2014, 4, 87–114. [Google Scholar]
- Awodey, S.; Kishida, K. Topological Semantics for First-Order Modal Logic; Online Resource. 2006. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=17cc1aa99e748fddc31320de1409efa78991b913 (accessed on 12 August 2024).
- Wilkins, I. Topos of Noise. Angelaki 2023, 28, 144–162. [Google Scholar] [CrossRef]
- Lafforgue, L. Some Sketches for a Topos-Theoretic AI; Technical Report. 2024. Available online: https://bm2l.github.io/projects/lafforgue/ (accessed on 5 August 2024).
- Bennequin, D.; Belfiore, J.C. Mathematics for AI: Categories, Toposes, Types. In Mathematics for Future Computing and Communications; Cambridge University Press: Cambridge, UK, 2021; pp. 98–132. [Google Scholar]
- Caramello, O.; Lafforgue, L. Ontologies, knowledge representations and Grothendieck toposes. In Proceedings of the Invited Talk to Semantics Workshop, Lagrange Center, Huawei, Paris, France, 3–4 February 2022. [Google Scholar]
- Hmamouche, Y.; Benjillali, M.; Saoudi, S.; Yanikomeroglu, H.; Renzo, M.D. New Trends in Stochastic Geometry for Wireless Networks: A Tutorial and Survey. Proc. IEEE 2021, 109, 1200–1252. [Google Scholar] [CrossRef]
- Belfiore, J.C.; Bennequin, D.; Giraud, X. Logical Information Cells I. arXiv 2021, arXiv:2108.04751. [Google Scholar]
- Bloomfield, C.; Maruyama, Y. Fibered universal algebra for first-order logics. J. Pure Appl. Algebra 2024, 228, 107415. [Google Scholar] [CrossRef]
- Caramello, O. Fibred sites and existential toposes. arXiv 2022, arXiv:2212.11693. [Google Scholar]
- Choi, J.; Loke, S.W.; Park, J. A Unified Approach to Semantic Information and Communication Based on Probabilistic Logic. IEEE Access 2022, 10, 129806–129822. [Google Scholar] [CrossRef]
- Chaccour, C.; Saad, W.; Debbah, M.; Han, Z.; Poor, H.V. Less Data, More Knowledge: Building Next Generation Semantic Communication Networks. IEEE Commun. Surv. Tutorials 2024, 27, 37–76. [Google Scholar] [CrossRef]
- Sergeant-Perthuis, G. Compositional statistical mechanics, entropy and variational inference. In Proceedings of the Twelfth Symposium on Compositional Structures (SYCO 12), Birmingham, UK, 15–16 April 2024. [Google Scholar]
- Youvan, D. Modeling Frustrated Systems within the Topos of Artificial Intelligence: Achieving Coherent Outputs through Categorical and Logical Structures; Online Resource. 2023. Available online: https://www.researchgate.net/publication/381656591_Modeling_Frustrated_Systems_within_the_Topos_of_Artificial_Intelligence_Achieving_Coherent_Outputs_through_Categorical_and_Logical_Structures?channel=doi&linkId=667962688408575b8384bdb4&showFulltext=true (accessed on 6 August 2024).
Aspect | Shiebler et al. (2021) [1] | This Survey |
---|---|---|
Time Coverage | Up to 2021 | Primarily From 2021 to Present |
Main Topics | Gradient-based learning, Bayesian learning, invariant and equivariant learning | Includes recent advancements in the three topics but primarily focuses on topos-based machine learning |
Main Focus | Strong emphasis on functoriality and composability in learning processes | Builds upon composability while extending to additional categorical structures, including higher-order categories |
Some Critical Properties | Causality and interventions are not explicitly addressed; concurrency and dynamic state transitions are not covered; focuses on compositional semantics through component-based structures | Emphasizes causal reasoning through higher-order category theory, particularly via sheaves and presheaves; introduces topos-based approaches for capturing concurrency and dynamic state transitions (e.g., Petri nets); focuses on global semantics |
Applications of Topos Theory | Not Covered | Explores advanced applications of topos theory, including model interpretability, dimensionality reduction, and temporal data analysis |
Case Study | Not Covered | Outlines existing case studies in gradient-based learning, Bayesian learning, invariance and equivariance-based learning, and topos-based learning. |
Theoretical vs. Practical Contributions | Focuses on well-established theoretical insights with practical implications | Highlights emerging research (e.g., Lafforgue’s work), including theoretical contributions with ongoing debates on practical feasibility; highlights research based on persistent homology |
Future Research Implications | Potential future directions are relatively fixed, derived from combinations of the framework's components | Also provides an outlook on potential developments in topos-based ML research, with a focus on explorations derived from network structures and layer combinations
Optimizer | Categorical Interpretation | Strengths and Limitations |
---|---|---|
Stochastic Gradient Descent (SGD) | Treated as a morphism in a Cartesian differential category, updating parameters locally in an iterative process | Simple, computationally efficient, widely used in deep learning frameworks. Prone to slow convergence and oscillations in non-convex loss landscapes. |
Adaptive Moment Estimation (ADAM) | Utilizes monoidal categories where additional structures (moment estimates) modulate updates, adapting step sizes dynamically | Faster convergence, effective for sparse gradients, adaptive learning rate. Can lead to poor generalization due to aggressive adaptation of gradients. |
Nesterov Accelerated Gradient (NAG) | Introduces a higher-order functor-like mechanism, predicting transformation effects before applying updates | Reduces oscillations, improves convergence speed over vanilla SGD. Sensitive to hyperparameter tuning, requires additional computations per update. |
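Read as update rules, the three optimizers in the table differ mainly in the state they carry alongside the parameters: SGD is stateless, ADAM threads a pair of moment estimates, and NAG threads a velocity and looks ahead before differentiating. The following numpy sketch is purely illustrative (function names such as `sgd_step`, `adam_step`, and `nag_step` are ours, not the survey's), but it shows how each variant fits the same "parameters and state in, parameters and state out" shape mentioned in the categorical reading above.

```python
import numpy as np

def sgd_step(p, grad, lr=0.1):
    """SGD: a stateless update applied iteratively."""
    return p - lr * grad

def adam_step(p, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """ADAM: the extra structure is a pair of moment estimates (m, v)
    plus a step counter t, carried as explicit state."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

def nag_step(p, velocity, grad_fn, lr=0.1, momentum=0.9):
    """NAG: evaluate the gradient at the looked-ahead point before updating."""
    lookahead = p + momentum * velocity
    velocity = momentum * velocity - lr * grad_fn(lookahead)
    return p + velocity, velocity

# Usage on the toy objective f(p) = p^2, whose gradient is 2p.
grad_fn = lambda p: 2.0 * p
p = 5.0
for _ in range(100):
    p = sgd_step(p, grad_fn(p))
print(round(p, 6))  # close to 0
```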
Characteristic | Construction | Motivation
---|---|---|
Parametricity | Para | A neural network is a mapping with an explicit parameter, i.e., a function $f \colon P \times A \to B$, and supervised learning amounts to finding a ‘good’ parameter for $f$. Parameters also arise elsewhere, e.g., in the loss function.
Bidirectionality | Lens | Information flows bidirectionally: inputs are sent forward through sequential layers to outputs and the loss, and backpropagation then reverses this flow to update parameters (backward pass).
Differentiation | CRDC | Differentiate the map that sends a parameter to its associated loss in order to reduce that loss; a Cartesian reverse differential category (CRDC) captures exactly this reverse differentiation.
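As a compact summary of the three characteristics above, the following display collects the data that the Para, Lens, and CRDC constructions contribute; it is a sketch in the standard notation of the parametric-lens literature rather than a reproduction of the survey's own diagrams (the symbols $P$, $A$, $B$, $A'$, $B'$, and $R[f]$ are the usual placeholder names).

```latex
% Parametricity (Para): a model is a map with an explicit parameter object
f \colon P \times A \longrightarrow B
% Bidirectionality (Lens): a morphism (A, A') \to (B, B') is a get/put pair
\mathrm{get} \colon A \to B, \qquad \mathrm{put} \colon A \times B' \to A'
% Differentiation (CRDC): every map carries a reverse derivative
R[f] \colon (P \times A) \times B \longrightarrow P \times A
```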
Component | Pictorial Definition | Categorical Construction |
---|---|---|
Model | ||
Loss map | ||
Optimizer | Gradient descent | |
Stateful Optimizer | ||
Learning rate | ||
Corner | ||
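The components in the table compose into a single supervised-learning step: the model and loss are run forward (‘get’), a reverse pass (‘put’) produces parameter gradients, and the optimizer with its learning rate reparameterizes. The sketch below is a minimal illustration of that wiring, assuming a one-layer linear model and finite-difference gradients so that it stays self-contained; the names `model_get`, `model_put`, and `learning_step` are ours and not taken from the surveyed frameworks.

```python
import numpy as np

def model_get(p, a):
    """Forward pass of a tiny linear model (get): P x A -> B."""
    W, b = p
    return W @ a + b

def loss_get(b_pred, b_true):
    """Loss map: squared error."""
    return float(np.sum((b_pred - b_true) ** 2))

def model_put(p, a, b_true, eps=1e-6):
    """Backward pass (put): gradient of the loss w.r.t. the parameters,
    approximated by finite differences to avoid external autodiff libraries."""
    W, b = p
    base = loss_get(model_get(p, a), b_true)
    gW, gb = np.zeros_like(W), np.zeros_like(b)
    for idx in np.ndindex(W.shape):
        W2 = W.copy(); W2[idx] += eps
        gW[idx] = (loss_get(model_get((W2, b), a), b_true) - base) / eps
    for i in range(b.size):
        b2 = b.copy(); b2[i] += eps
        gb[i] = (loss_get(model_get((W, b2), a), b_true) - base) / eps
    return gW, gb

def learning_step(p, a, b_true, lr=0.1):
    """Optimizer + learning rate: plain gradient descent on the parameters."""
    gW, gb = model_put(p, a, b_true)
    W, b = p
    return W - lr * gW, b - lr * gb

# Usage: fit the single sample (a, b_true) = (1, 2).
p = (np.array([[0.0]]), np.array([0.0]))
for _ in range(50):
    p = learning_step(p, np.array([1.0]), np.array([2.0]))
print(p)  # W + b approaches 2, and the loss approaches 0
```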
Concept | Mathematical Definition | Giry Monad Interpretation | Application in ML
---|---|---|---|
Measure Space $X$ | Measurable space $(X, \Sigma_X)$ | Object of $\mathbf{Meas}$ | Data space (e.g., feature space)
Probability Measure | Countably additive measure $\mu$ with $\mu(X) = 1$ | Functor $P$: maps $X$ to the space of probability measures $P(X)$ | Expresses uncertainty over datasets
Dirac Measure | $\delta_x(A) = 1$ if $x \in A$, else $0$ | Unit $\eta_X \colon X \to P(X)$: assigns a point mass to an outcome | Represents prior knowledge (Bayesian priors)
Pushforward Measure | $(f_* \mu)(B) = \mu(f^{-1}(B))$ for measurable $f \colon X \to Y$ | Functoriality $P(f)$: transforms probability distributions | Likelihood in Bayesian models
Marginalization | Integration over probability measures | Multiplication $\mu_X \colon P(P(X)) \to P(X)$ | Posterior computation in Bayesian inference
Bayesian Inference | $p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$ | Pushforward and integration formalize the update | Posterior learning in probabilistic models
Variational Autoencoders (VAEs) | Approximate inference over a latent space $Z$ | Latent distribution as a Giry monad object $P(Z)$ | Generative modeling, variational inference
Probabilistic Programming | Probabilistic computation using distributions | Monadic composition of random variables | Defining probabilistic ML models (e.g., Pyro, Turing.jl)
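The monadic structure summarized above becomes very concrete for finite, discrete distributions, where measures are just dictionaries of probabilities. The following sketch is illustrative only (the helper names `unit`, `pushforward`, and `bind` are ours): `unit` is the Dirac measure, `pushforward` is the functorial action, and `bind` combines pushforward with marginalization, which is how probabilistic programming languages chain random variables.

```python
from collections import defaultdict

def unit(x):
    """Monad unit: the Dirac measure delta_x, all mass on one outcome."""
    return {x: 1.0}

def pushforward(f, dist):
    """Functorial action P(f): transport a distribution along f : X -> Y."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[f(x)] += p
    return dict(out)

def bind(dist, kernel):
    """Monadic bind: weight each conditional distribution kernel(x) by dist[x]
    and marginalize over x (multiplication after pushforward)."""
    out = defaultdict(float)
    for x, p in dist.items():
        for y, q in kernel(x).items():
            out[y] += p * q
    return dict(out)

# Usage: a prior over a coin's type and a likelihood kernel for one flip.
prior = {"fair": 0.5, "biased": 0.5}
flip = lambda c: {"heads": 0.5, "tails": 0.5} if c == "fair" else {"heads": 0.9, "tails": 0.1}
print(bind(prior, flip))  # marginal over observations: {'heads': 0.7, 'tails': 0.3}
```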
Category | Objects | Morphisms | Composition
---|---|---|---|
$\mathbf{Stoch}$ (Stochastic Category) | Measurable spaces | Stochastic kernels | Integration of kernels
$\mathbf{FinStoch}$ (Finite Stochastic) | Finite sets | Stochastic maps (probability matrices) | Matrix multiplication
(Probabilistic Stochastic) | Measurable spaces | Stochastic kernels preserving measurability | Integral transformation
| Measurable spaces enriched over | Stochastic kernels respecting the enrichment | Enriched composition
$\mathbf{BorelStoch}$ (Borel Stochastic) | Standard Borel spaces | Borel-measurable Markov kernels | Integral transformation preserving Borel structure
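As the table notes, composition in FinStoch is literally matrix multiplication of row-stochastic matrices. A minimal numpy sketch (illustrative, not taken from the surveyed works):

```python
import numpy as np

# A stochastic map X -> Y over finite sets is a row-stochastic matrix:
# entry [i, j] is the probability of landing in y_j given x_i.
f = np.array([[0.9, 0.1],       # X = {x0, x1}, Y = {y0, y1}
              [0.2, 0.8]])
g = np.array([[1.0, 0.0, 0.0],  # Y = {y0, y1}, Z = {z0, z1, z2}
              [0.1, 0.3, 0.6]])

# Composition in FinStoch: (f ; g) = f @ g, again row-stochastic.
h = f @ g
assert np.allclose(h.sum(axis=1), 1.0)  # rows still sum to 1
print(h)

# The identity morphism on X is the identity matrix (a deterministic 'do nothing').
identity_X = np.eye(2)
assert np.allclose(identity_X @ f, f)
```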
Monad | Functor | Unit | Multiplication
---|---|---|---|
Giry Monad | Assigns a measurable space $X$ to the space of probability measures $P(X)$ | Maps $x$ to the Dirac measure $\delta_x$ | Integrates over probability measures: $\mu_X(\Pi)(A) = \int_{P(X)} p(A)\, d\Pi(p)$
Distribution Monad | Assigns $X$ to $D(X)$, the set of probability measures with a measurable structure | Maps $x$ to the Dirac measure $\delta_x$ | Flattens a distribution over distributions by weighted averaging, aggregating probability measures
Probability Monad | Assigns $X$ to $P(X)$, where measures satisfy $p(X) = 1$ | Maps $x$ to the Dirac measure $\delta_x$ | Integration as for the Giry monad, ensuring total probability mass 1 is preserved
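For completeness, the unit and multiplication listed above are subject to the usual monad laws; written for the Giry monad (a standard fact rather than something specific to this survey), they read:

```latex
% Unit laws
\mu_X \circ P(\eta_X) = \mathrm{id}_{P(X)} = \mu_X \circ \eta_{P(X)}
% Associativity
\mu_X \circ P(\mu_X) = \mu_X \circ \mu_{P(X)}
```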
Machine Learning Area | Challenges in Conventional Methods | Topos-Based Solutions |
---|---|---|
Graph Neural Networks (GNNs) | Difficulty in capturing long-range dependencies and global structures in graphs | Sheaf theory and presheaves encode local–global relationships, enabling structured information propagation while preserving higher-order dependencies |
Attention Mechanisms in Transformer Networks | Lack of intrinsic geometric interpretation of token dependencies across layers | Free cocompletions in topoi can formalize transformer networks as morphisms within topoi, enhancing interpretability with the internal logic [122] |
Causal Machine Learning | Traditional methods focus on correlations without capturing causal structures | Geometric morphisms between topoi model causal relationships across data representations, enhancing counterfactual reasoning and robustness |
Generative Models (VAEs, GANs) | Ensuring generated samples respect underlying data invariances (e.g., transformations in computer vision) | Topos structure provides a natural framework to enforce invariances through algebraic structures and geometric morphisms |
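To ground the first row of this table, the sketch below builds a cellular sheaf on a three-vertex path graph and runs one step of sheaf diffusion; it is a toy illustration in the spirit of sheaf-based GNN work rather than code from any of the surveyed papers, and the stalk dimension, random restriction maps, and step size `alpha` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2                        # stalk dimension at every vertex and edge
vertices = [0, 1, 2]
edges = [(0, 1), (1, 2)]     # a three-vertex path graph

# Restriction maps F_{v <= e} : F(v) -> F(e), one per incident (vertex, edge) pair.
restrict = {(v, e): rng.standard_normal((d, d)) for e in edges for v in e}

# Coboundary delta : sum_v F(v) -> sum_e F(e),
# (delta x)_e = F_{v<=e} x_v - F_{u<=e} x_u for each oriented edge e = (u, v).
delta = np.zeros((d * len(edges), d * len(vertices)))
for k, (u, v) in enumerate(edges):
    delta[k*d:(k+1)*d, v*d:(v+1)*d] += restrict[(v, (u, v))]
    delta[k*d:(k+1)*d, u*d:(u+1)*d] -= restrict[(u, (u, v))]

# Sheaf Laplacian L = delta^T delta; its kernel consists of global sections,
# i.e., vertex assignments whose restrictions agree over every edge.
L = delta.T @ delta

# One diffusion step of the kind used in sheaf-based message passing:
# x <- x - alpha * L x, which shrinks edge disagreement for small alpha.
x = rng.standard_normal(d * len(vertices))    # stacked per-vertex features
alpha = 0.01
x_next = x - alpha * (L @ x)
print(np.linalg.norm(delta @ x), np.linalg.norm(delta @ x_next))
```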
Research Area | Open Problems | Potential Directions |
---|---|---|
Geometric Morphisms and Learning Dynamics | Formalizing the role of geometric morphisms in machine learning and optimization | Investigate how coarse-graining and refinement interact across learning scales, particularly in hierarchical architectures |
Language Structure of Information | Understanding the impact of Galois group actions, fibered information spaces, and fundamental groups on deep learning | Develop robust representation learning and algorithmic reasoning using categorical invariances |
Higher-Order Categorical Structures in ML | Extending higher categorical structures (e.g., 3-categories) to model deep learning information flow | Apply categorical compositional semantics to reinforcement learning, generative models, and hierarchical architectures |
Topos-Based Representations for Neural Architectures | Investigating how topoi enhance interpretability beyond transformers | Apply topos-based structures to CNNs, RNNs, and attention mechanisms to improve abstraction and long-range dependencies |
Semantic Communication and Learning | Bridging semantic information theory with machine learning frameworks | Explore how logical program synthesis and categorical reasoning can improve probabilistic and structured learning models |