A Logifold Structure for Measure Space

Inkee Jung; Siu-Cheong Lau

doi:10.3390/axioms14080599

and

Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA

^*

Author to whom correspondence should be addressed.

Axioms2025, 14(8), 599;https://doi.org/10.3390/axioms14080599

This article belongs to the Special Issue Recent Advances in Function Spaces and Their Applications

Version Notes

Order Reprints

Abstract

In this paper, we develop a geometric formulation of datasets. The key novel idea is to formulate a dataset to be a fuzzy topological measure space as a global object and equip the space with an atlas of local charts using graphs of fuzzy linear logical functions. We call such a space a logifold. In applications, the charts are constructed by machine learning with neural network models. We implement the logifold formulation to find fuzzy domains of a dataset and to improve accuracy in data classification problems.

Keywords:

logifold; dataset; neural network; measure theory; fuzzy space; data classification; machine learning

MSC:

46S40; 14P10; 68T07; 53Z50

1. Introduction

In geometry and topology, the manifold approach dates back to Riemann using open subsets in

R^{n}

as local models to build a space. Such a local-to-global principle is central to geometry and has achieved extremely exciting breakthroughs in modeling spacetime by Einstein’s theory of relativity.

In recent years, the rapid development of data science brings immense interest to datasets that are ‘wilder’ than typical spaces that are well-studied in geometry and topology. Taming the wild is a central theme in the development of mathematics. Advances in computational tools have helped to expand the realm of mathematics in history. For instance, it took many years in human history to recognize the irrational number

π

and approximate it by rational numbers. In this regard, we consider machine learning by neural network models as a modern tool to find expressions of a ‘wild space’ (for instance a dataset in real life) as a union (limit) of fuzzy geometric spaces expressed by finite formulae.

The mathematical background of this paper lies in manifold, probability, and measure theory. Only basic mathematical knowledge in these two subjects is necessary. For instance, the textbooks [1,2] provide excellent introduction to measure theory and manifold theory, respectively.

Let X be a topological space,

B_{X}

the corresponding Borel

σ

-algebra, and

μ

a measure on

(X, B_{X})

. We understand a dataset, for instance, the collection of all labeled appearances of cats and dogs, as a fuzzy topological measure space in nature. In the example of cats and dogs, topology concerns about the nearby appearances of the objects; measure concerns about how typical each appearance is; and fuzziness comes from how likely the objects belonging to cats, dogs, or neither.

To work with such a complicated space, we would like to have local charts that admit finite mathematical expressions and have logical interpretations, playing the role of local coordinate systems for a measure space. In our definition, a local chart

U \subset X

is required to have a positive measure. Moreover, to avoid triviality and requiring too many charts to cover the whole space, we may further fix

ϵ > 0

and require that

μ (X - U) < ϵ

. Such a condition disallows U to be too simple, such as a tiny ball around a point in a dataset. This resembles the Zariski-open condition in algebraic geometry.

In place of open subsets of

R^{n}

, we formulate ‘local charts’ that are closely related to neural network models and have logic gate interpretations. Neural network models provide a surprisingly successful tool for finding mathematical expressions that approximate a dataset. Non-differentiable or even discontinuous functions analogous to logic gate operations are frequently used in network models. It provides an important class of non-smooth and even discontinuous functions to study a space.

We take classification problems as the main motivation in this paper. For this, we consider the graph of a function

f : D \to T

, where D is a measurable subset in

R^{n}

(with the standard Lebesgue measure) and T is a finite set (with the discrete topology). The graph

gr (f) \subset D \times T

is equipped with the push-forward measure by

D \to gr (f)

.

We use the graphs of linear logical functions

f : D \to T

explained below as local models. A chart is of the form

(U, Φ)

, where

U \subset X

is a measurable subset which satisfies

μ (X - U) < ϵ

, and

Φ : U \to gr (f)

is a measure-preserving homeomorphism. We define a linear logifold to be a pair

(X, U)

, where

U

is a collection of charts

(U_{i}, Φ_{i})

such that

μ (X - ⋃_{i} U_{i}) = 0

. In applications, the condition makes sure that the logifold covers almost every element (in the measure-theoretical sense) in the dataset. Figure 1 provides a simple example for a logifold.

Figure 1. An example of a logifold. The graph jumps over values 0 and 1 infinitely in left-approaching to the point marked by a star (and the length of each interval is halved). This is covered by infinitely many charts of linear logical functions, each of which has only finitely many jumps. Moreover, the base is a measurable subset of

R

(which is hard to depict and not shown in the picture).

In the example of cats and dogs,

U \subset X

is a representing sample collection of appearances of cats and dogs. We take 2D images with n number of pixels for these appearances and get a function

Φ : U \to D \times {0, 1}

where

T = {0, 1}

is the collection of the labels and

D \subset R^{n}

is the collection of pictures. The image

Φ (U) \subset R^{n} \times {0, 1}

is given by the graph of a function

f : D \to {0, 1}

, which tells whether each picture shows a cat or a dog.

The definition of linear logical functions is motivated from neural network models and has a logic gate interpretation. A network model consists of a directed graph, whose arrows are equipped with linear functions and vertices are equipped with non-linear functions, which are typically ReLu or sigmoid functions in middle layers and are sigmoid or softmax functions in the last layer. The review paper [3] provides an excellent overview of deep learning models. Note that sigmoid and softmax functions are smoothings of the discrete-valued step function and the index-max function, respectively. Such smoothings are useful to describe the fuzziness of data.

From this perspective, step and index-max functions are the non-fuzzy (or called classical) limit of sigmoid and softmax functions, respectively. We will take such a limit first, and come back to fuzziness in a later stage. This means we replace all sigmoid and softmax functions in a neural network model by step and index-max functions. We will show that such a neural network is equivalent to a linear logical graph as follows: at each node of the directed graph, there is a system of N linear inequalities (on the input Euclidean domain

D \subset R^{n}

) that produces

2^{N}

possible Boolean outcomes for an input element, which determine the next node that this element will be passed to. We call the resulting function a linear logical function.

From the perspective of functional analysis, the functions under consideration have targets being finite sets or simplices, which are not vector spaces. Thus, the set of functions

D \to T

is NOT a vector space. This is the main difference from the typical setting of Fourier analysis. We discuss more about this aspect using semiring structures in Section 2.5.

We prove that linear logical functions can approximate any given measurable function

f : D \to T

, where D is a measurable subset of

R^{n}

with

μ (D) < \infty

. This provides a theoretical basis of using these functions in modeling.

Theorem 1

(Universal approximation theorem by linear logical functions). Let

f : D \to T

be a measurable function whose domain

D \subset R^{n}

is of finite Lebesgue measure, and suppose that its target set T is finite. For any

ϵ > 0

, there exists a linear logical function L and a measurable set

E \subset R^{n}

of the Lebesgue measure less than ϵ such that

{L |}_{D - E} {\equiv f |}_{D - E}

.

By taking the limit

ϵ \to 0

, the above theorem finds a linear logifold structure on the graph of a measurable function. In reality,

ϵ

reflects the error of a network in modeling a dataset.

It turns out that for

D = R^{n}

, linear logical functions

R^{n} \to T

(where T is identified with a finite subset of

R

) are equivalent to semilinear functions

R^{n} \to T

, whose graphs are semilinear sets defined by linear equations and inequalities [4]. Semilinear sets provide the simplest class of definable sets of so-called o-minimal structures, which are closely related to model theory in mathematical logic. O-minimal structures made an axiomatic development of Grothendieck’s idea of finding tame spaces that exclude wild topology. On the one hand, definable sets have a finite expression which is crucial to the predictive power and interpretability in applications. On the other hand, our setup using measurable sets

D \subset R^{n}

provides a larger flexibility for modeling data.

Compared to more traditional approximation methods such as Fourier series, there are reasons why linear logical functions are preferred in many situations for data. When the problem is discrete in nature (for instance the target set T is finite), it is simple and natural to take the most basic kinds of discrete-valued functions as building blocks, namely step functions formed by linear inequalities. These basic functions are composed to form networks which are supported by current computational technology. Moreover, such discrete-valued functions have fuzzy and quantum deformations which have rich meanings in mathematics and physics.

Now let us address fuzziness, another important feature of a dataset besides discontinuities. In practice, there is always an ambiguity in determining whether a point belongs to a dataset. This is described as a fuzzy space

(X, P)

, where X a topological measure space and

P : X \to (0, 1]

is a continuous measurable function that encodes the probability of whether a given point of X belongs to the fuzzy space under consideration. Here, we require

P > 0

on purpose, and while points of zero probability of belonging may be adjoined to X to that the union gets simplified (for instance, X may be embedded into

R^{n}

, where points in

R^{n} - X

have zero probability of belonging), they are auxiliary and have no intrinsic meaning.

Let us illustrate by the above-mentioned example of cats and dogs. The fuzzy topological measure space X consists of all possible labeled appearances of cats and dogs. The function value

P (x) \in (0, 1]

expresses the probability of x belonging to the dataset, in other words, how likely the label for the appearance is correct.

To be useful, we need a finite mathematical expression (or approximation) for

P

. This is where neural network models enter into the description. A neural network model for a classification problem that has the softmax function in the last layer gives a function

f = (f_{1}, \dots, f_{d}) : R^{n} \to S

, where S is the standard simplex

{\sum_{i = 1}^{d} y_{i} = 1} \subset R^{d}

. This gives a fuzzy space

(R^{n} \times T, P),

where

T = {1, \dots, d}

and

P (p, t) : = f_{t} (p)

. As we have explained above, in the non-fuzzy limit, sigmoid and softmax functions are replaced by their classical counterparts of step and index-max functions, respectively, and we obtain

f^{classical} : R^{n} \to T

and the subset

{(p, t) : f^{classical} (p) = t} \subset R^{n} \times T

as the classical limit. Figure 2 shows a very simple example of a fuzzy space and its classical and quantum analogs.

Figure 2. The left hand side shows a simple example of a logifold. It is the graph of the step function

[- 1, 1] \to {0, 1}

. The figure in the middle shows a fuzzy deformation of it, which is a fuzzy subset in

[- 1, 1] \times {0, 1}

. The right hand side shows the graph of the probability distribution of a quantum observation, which consists of the maps

\frac{| z_{0} |^{2}}{| z_{0} |^{2} + {| z_{1} |}^{2}}

and

\frac{| z_{1} |^{2}}{| z_{0} |^{2} + {| z_{1} |}^{2}}

from the state space

P^{1}

to

[0, 1]

.

However, the ambient space

R^{n}

is not intrinsic; for instance, in the context of images, the dimension n gets bigger if we take images with higher resolutions, even though the objects under concern remain the same. Thus, like in manifold theory, the target space is taken to be a topological space rather than

R^{n}

, and our theory takes a topological measure space X in place of

R^{n}

(or

R^{n} \times T

).

R^{n} \times T

(for various possible n) is taken as an auxiliary ambient space that contains (fuzzy) measurable subsets that serve as charts to describe a dataset

(X, P)

.

Generally, we formulate fuzzy linear logical functions (Definition 2) and fuzzy linear logifolds (Definition 9). A fuzzy logical graph is a directed graph G whose each vertex of G is equipped with a state space, and each arrow is equipped with a continuous map between the state spaces. The walk on the graph (determined by inequalities) depends on the fuzzy propagation of the internal state spaces.

Our logifold formulation of a dataset can be understood as a geometric theory for ensemble learning and the method of Mixture of Experts. Ensemble learning utilizes multiple trained models to make a decision or prediction, see, for instance [5,6,7]. Ensemble machine learning achieves improvement in classification problems, see, for instance [8,9]. In the method of Mixture of Experts (see, for instance [10,11]), several expert models

E_{j}

are employed, and there is also a gating function

G_{j} (x)

. The final outcome is given by the total

\sum_{j} G_{j} (x) E_{j} (x)

. This idea of using ‘experts’ to describe a dataset is similar to the formulation of a fuzzy logifold. On the other hand, motivated by manifold theory, we formulate universal mathematical structures that are common to datasets, namely the global intrinsic structure of a fuzzy topological measure space, and local logical structures among data points expressed by graphs of fuzzy logical functions.

The research design of this paper is characterized by its dual approach, outlined as follows: it involves rigorous mathematical constructions and formalization of logifolds and their properties, complemented by the design of algorithms, empirical validation through experiments that demonstrate their practical advantages in enhancing prediction accuracy for ensemble machine learning. This dual approach ensures both the theoretical soundness and practical utility of the proposed framework.

For readers who are more computationally oriented, they can first directly go to Section 4, where we describe the implementation of the logifold theory in algorithms. The key new ingredient in our implementation is the fuzzy domain of each model. A trained model typically does not have a perfect accuracy rate and performs well only on a subset of data points, or for a subset of target classes. A major step here is to find and record the domain of each model where it works well. Leveraging certainty scores derived from the softmax outputs of classifiers, a model’s prediction is only used in the implemented logifold structure if its certainty exceeds a predefined threshold, allowing for a refined voting system (Section 4.5).

Two experiments were conducted for logifold and they were published in our paper [12] using the refined voting system on the following well-known benchmark datasets: CIFAR10 [13], MNIST [14], and Fashion MNIST [15]. We summarize the experimental results in Section 5.

Organization

The structure of this paper is as follows. First, we formulate fuzzy linear logical functions in Section 2. Next, we establish relations with semilinear functions in Section 3.1, prove the universal approximation theorem for linear logical functions in Section 3.2, and define fuzzy linear logifolds in Section 3.3. We provide a detailed description of the algorithmic implementation of logifolds in Section 4.

2. Linear Logical Functions and Their Fuzzy Analogs

Given a subset of

R^{n}

, one would like to describe it as the zero locus, the image, or the graph of a function in a certain type. In analysis, we typically think of continuous/smooth/analytic functions. However, when the domain is not open, smoothness may not be the most relevant condition.

The success of network models has taught us a new kind of function that is surprisingly powerful in describing datasets. Here, we formulate them using directed graphs and call them linear logical functions. The functions offer three distinctive advantages. First, they have the advantage of being logically interpretable in theory. Second, they are close analogues of quantum processes. Namely, they are made up of linear functions and certain non-linear activation functions, which are analogous to unitary evolution and quantum measurements. Finally, it is natural to add fuzziness to these functions and hence they are better adapted to describe statistical data.

2.1. Linear Logical Functions and Their Graphs

We consider functions

D \to T

for

D \subset R^{n}

and a finite set T constructed from a graph as follows. Let G be a finite directed graph that has no oriented cycle and has exactly one source vertex which has no incoming arrow and

| T |

target vertices. Each vertex that has more than one outgoing arrows is equipped with an affine linear function

l = (l_{1}, \dots, l_{k})

on

R^{n}

, where the outgoing arrows at this vertex are one-to-one corresponding to the chambers in

R^{n}

subdivided by the hyperplanes

{l_{i} = 0}

. Explicitly, let

I \subset {1, \dots, k}

, and consider

C_{I} : = {x \in R^{n} : l_{i} \geq 0 if i \in I and l_{i} < 0 otherwise} .

If

C_{I}

is non-empty, we call

C_{I}

a chamber associated with

l = (l_{1}, \dots, l_{k})

.

Definition 1.

A linear logical function

f_{G, L} : D \to T

is a function made in the following way from

(G, L)

, where G is a finite directed graph that has no oriented cycle and has exactly one source vertex and

| T |

target vertices,

L = {l_{v} : v s . is a vertex with more than one outgoing arrows},

l_{v}

are affine linear functions whose chambers in D are one-to-one corresponding to the outgoing arrows of v.

(G, L)

is called a linear logical graph.

Given

x \in D

, we get a path from the source vertex to one of the target vertices in G as follows. We start with the source vertex. At a vertex v, if there is only one outgoing arrow, we simply follow that arrow to reach the next vertex. If there are more than one outgoing arrows, we consider the chambers made by the affine linear function

l_{v}

associated with the vertex v, and pick the outgoing arrow that corresponds to the chamber that x lies in. See Figure 3. Since the graph is finite and has no oriented cycle, we will stop at a target vertex, which is associated with an element

t \in T

. This defines the function

f_{G, L}

by setting

f_{G, L} (x) = t

.

Figure 3. The left side shows a partial directed graph at vertex v, with five outgoing arrows. On the right, chambers are formed in

R^{2}

by the affine maps

L_{v} = (l_{1}, l_{2}, l_{3})

defined on

R^{2}

. A point x is marked in the chamber defined by

{l_{1} \leq 0, l_{2} \geq 0, l_{3} \leq 0}

. One of the arrows corresponding to the shaded chamber containing x is highlighted in the left diagram.

Proposition 1.

Consider a feed-forward network model whose activation function at each hidden layer is the step function and that at the last layer is the index-max function. The function is of the form

σ \circ L_{N} \circ s_{N - 1} \circ L_{N - 1} \circ \dots \circ s_{1} \circ L_{1},

where

L_{i} : R^{n_{i - 1}} \to R^{n_{i}}

are affine linear functions with

n_{0} = n

,

s_{i}

are the entrywise step functions and σ is the index-max function. We make the generic assumption that the hyperplanes defined by

L_{i}

for

i = 2, \dots, N

do not contain

s_{i - 1} \circ \dots \circ L_{1} (D)

. Then this is a linear logical function with target

T = {1, \dots, n_{N}}

(on any

D \subset R^{n}

, where

R^{n}

is the domain of

L_{1}

).

Proof.

The linear logical graph

(G, L)

is constructed as follows. The source vertex

v_{0}

is equipped with the affine linear function

L_{1}

. Then we make N number of outgoing arrows of

v_{0}

(and corresponding vertices), where N is the number of chambers of

L_{1}

, which are one-to-one corresponding to the possible outcomes of

s_{1}

(which form a finite subset of

{0, 1}^{n_{1}}

). Then we consider

s_{2} \circ L_{2}

restricted to this finite set, which also has a finite number of possible outcomes. This produces exactly one outgoing arrow for each of the vertices in the first layer. We proceed inductively. The last layer

σ \circ L_{N}

is similar and has

n_{N}

possible outcomes. Thus, we obtain

(G, L)

as claimed, where L consists of only one affine linear function

L_{1}

over the source vertex. □

Figure 4 depicts the logical graph in the above proposition.

Figure 4. A linear logical graph for a feed-forward network whose activation function at each hidden layer is the step function and that at the last layer is the index-max function.

Proposition 2.

Consider a feed-forward network model whose activation function at each hidden layer is the ReLu function and that at the last layer is the index-max function. The function takes the form

σ \circ L_{N} \circ r_{N - 1} \circ L_{N - 1} \circ \dots \circ r_{1} \circ L_{1},

where

L_{i} : R^{n_{i - 1}} \to R^{n_{i}}

are affine linear functions with

n_{0} = n

,

r_{i}

are the entrywise ReLu functions, and σ is the index-max function. This is a linear logical function.

Proof.

We construct a linear logical graph

(G, L)

, which produces this function. The first step is similar to the proof of the above proposition. Namely, the source vertex

v_{0}

is equipped with the affine linear function

L_{1}

. Next, we make N number of outgoing arrows of

v_{0}

(and corresponding vertices), where N is the number of chambers of

L_{1}

, which are one-to-one corresponding to the possible outcomes of the sign vector of

r_{1}

(which form a finite subset of

{0, +}^{n_{1}}

). Now we consider the next linear function

L_{2}

. For each of these vertices in the first layer, we consider

L_{2} \circ r_{1} \circ L_{1}

restricted to the corresponding chamber, which is a linear function on the original domain

R^{n}

, and we equip this function to the vertex. Again, we make a number of outgoing arrows that correspond to the chambers in

R^{n}

made by this linear function. We proceed inductively, and get to the layer of vertices that correspond to the chambers of

L_{N - 1} \circ r_{N - 2} \circ L_{N - 3} \circ \dots \circ r_{1} \circ L_{1}

. Write

L_{N} = (l_{1}, \dots, l_{n_{N}})

, and consider

{\tilde{L}}_{N} = (l_{i} - l_{j} : i \neq j)

. At each of these vertices,

{\tilde{L}}_{N} \circ r_{N - 1} \circ L_{N - 1} \circ \dots \circ r_{1} \circ L_{1}

restricted on the corresponding chamber is a linear function on the original domain

R^{n}

, and we equip this function to the vertex and make outgoing arrows corresponding to the chambers of the function. In each chamber, the index i that maximizes

l_{i} \circ r_{N - 1} \circ L_{N - 1} \circ \dots \circ r_{1} \circ L_{1}

is determined, and we make one outgoing arrow from the corresponding vertex to the target vertex

i \in T

. □

By the above propositions, ReLu/sigmoid-based feed forward network functions are linear logical functions. Thus, the above linear logical functions have the same computational complexity as the corresponding ReLu/sigmoid-based functions. Figure 5 depicts the logical graph in the above proposition.

Figure 5. The linear logical graph for a feed-forward network whose activation function at each hidden layer is the ReLu function and that at the last layer is the index-max function.

In classification problems, T is the set of labels for elements in D, and the data determine a subset in

D \times T

as a graph of a function. The deep learning of network models provides a way to approximate the subset as the graph of a linear logical function

gr (f_{G, L})

. Theoretically, this gives an interpretation of the dataset; namely, the linear logical graph gives a logical way to deduce the labels based on linear conditional statements on D.

The following lemma concerns the monoidal structure on the set of linear logical functions on D.

Lemma 1.

Let

f_{G_{i}, L_{i}} : D \to T_{i}

be linear logical functions for

i \in I

, where

I = {1, \dots, k}

. Then

(f_{G_{i}, L_{i}} : i \in I) : D \to \prod_{i \in I} T_{i}

is also a linear logical function.

Proof.

We construct a linear logical graph out of

G_{i}

for

i \in I

as follows. First, take the graph

G_{1}

. For each target vertex of

G_{1}

, we equip it with the linear function at the source vertex of

G_{2}

and attach to it the graph

G_{2}

. The target vertices of the resulting graph are labeled by

T_{1} \times T_{2}

. Similarly, each target vertex of this graph is equipped with the linear function at the source vertex of

G_{3}

and attached with graph

G_{3}

. Inductively, we obtain the required graph, whose target vertices are labeled by

\prod_{i \in I} T_{i}

. By this construction, the corresponding function is

(f_{G_{i}, L_{i}} : i \in I)

. □

f_{(G, L)}

admits the following algebraic expression in the form of a sum over paths which has an important interpretation in physics. The proof is straightforward and is omitted. A path in a directed graph is a finite sequence of composable arrows. The set of all linear combinations of paths and the trivial paths at vertices form an algebra by concatenation of paths.

Proposition 3.

Given a linear logical graph

(G, L)

,

f_{(G, L)} (x) = h (\sum_{γ} c_{γ} (x) γ) = \sum_{γ} c_{γ} (x) h (γ),

where the sum is over all possible paths γ in G from the source vertex to one of the target vertices;

h (γ)

denotes the target vertex that γ heads to for

γ = a_{r} \dots a_{1}

,

c_{γ} (x) = \prod_{i = 1}^{r} s_{a_{i}} (x),

where

s_{a} (x) = 1

if x lies in the chamber corresponding to the arrow a; otherwise, it is 0. In the above sum, exactly one of the terms is non-zero.

2.2. Zero Locus

Alternatively, we can formulate the graphs

gr (f_{G, L})

as zero loci of linear logical functions targeted at the field

F_{2}

with two elements as follows. Such a formulation has the advantage of making the framework of algebraic geometry available in this setting.

Proposition 4.

For each linear logical function

f_{G, L} : D \to T

, there exists a linear logical function

f_{\tilde{G}, \tilde{L}} : D \times T \to F_{2}

whose zero locus in

D \times T

equals

gr (f_{G, L})

.

Proof.

Given a linear logical function

f_{G, L} : D \to T

, we construct another linear logical function

f_{\tilde{G}, \tilde{L}} : D \times T \to F_{2}

as follows. Without loss of generality, let

T = {1, \dots, p}

, so that

D \times T

is embedded as a subset of

R^{n + 1}

. Any linear function on

R^{n}

is pulled back as a linear function on

R^{n + 1}

by the standard projection

R^{n + 1} \to R^{n}

that forgets the last component. Then

f_{G, L}

is lifted as a linear logical function

D \times T \to T

.

Consider the corresponding graph

(G, L)

. For the k-th target vertex of

(G, L)

(that corresponds to

k \in T

), we equip it with the linear function

(y - (k - 1 / 2), (k + 1 / 2) - y) : R^{n + 1} \to R^{2},

where y is the last coordinate of

R^{n + 1}

. This linear function produces three chambers in

R^{n + 1}

. Correspondingly, we make three outgoing arrows of the vertex. Finally, the outcome vertex that corresponds to

(+, +)

is connected to the vertex

0 \in F_{2}

; the other two outcome vertices are connected to the vertex

1 \in F_{2}

. We obtain a linear logical graph

(\tilde{G}, \tilde{L})

and the corresponding function

f_{\tilde{G}, \tilde{L}} : D \times T \to F_{2}

.

By construction,

f_{\tilde{G}, \tilde{L}} (x, y) = 0

for

(x, y) \in D \times T

if and only if

y = f_{G, L} (x) \in T

. Thus, the zero locus of

f_{\tilde{G}, \tilde{L}}

is the graph of

f_{G, L}

. □

The set of functions (with a fixed domain) valued in

F_{2}

forms a unital commutative and associative algebra over

F_{2}

, which is known as a Boolean algebra.

Proposition 5.

The subset of linear logical functions

D \to F_{2}

forms a Boolean ring

L

(for a fixed

D \subset R^{n}

).

Proof.

We need to show that the subset is closed under addition and multiplication induced from the corresponding operations of

F_{2}

.

Let

f_{(G_{1}, L_{1})}

and

f_{(G_{2}, L_{2})}

be linear logical functions

D \to F_{2}

. By Lemma 1,

(f_{(G_{1}, L_{1})}, f_{(G_{2}, L_{2})}) : D \to {(F_{2})}^{2}

is a linear logical function. Consider the corresponding logical graph. The target vertices are labeled by

(s_{1}, s_{2}) \in {(F_{2})}^{2}

. We connect each of them to the vertex

s_{1} + s_{2} \in F_{2}

by an arrow. This gives a linear logical graph whose corresponding function is

f_{(G_{1}, L_{1})} + f_{(G_{2}, L_{2})}

. We obtain

f_{(G_{1}, L_{1})} \cdot f_{(G_{2}, L_{2})}

in a similar way. □

In this algebro-geometric formulation, the zero locus of

f_{G, L} : D \to F_{2}

corresponds to the ideal

(f_{G, L}) \subset L

.

2.3. Parameterization

The graph

gr (f_{G, L})

of a linear logical function can also be put in parametric form. For the moment, we assume the domain D is finite. First, we need the following lemma.

Lemma 2.

Assume

D \subset R^{n}

is finite. Then the identity function

I_{D} : D \to D

is a linear logical function.

Proof.

Since D is finite, there exists a linear function

l : R^{n} \to R^{N}

such that each chamber of l contains at most one point of D. Then we construct the linear logical graph G as follows. The source vertex is equipped with the linear function l and outgoing arrows corresponding to the chambers of l. Elements of D are identified as the target vertices of these arrows which correspond to chambers that contain them. The corresponding function

f_{G, L}

equals

I_{D}

. □

Proposition 6.

Given a linear logical function

f_{G, L} : D \to T

with finite

| D |

, there exists an injective linear logical function

D \to D \times T

whose image equals

gr (f_{G, L})

.

Proof.

By Lemmas 1 and 2,

(I_{D}, f_{G, L})

is a linear logical function. By definition, its image equals

gr (f_{G, L})

. □

2.4. Fuzzy Linear Logical Functions

Another important feature of a dataset is its fuzziness. Below, we formulate the notion of a fuzzy linear logical function and consider its graph. Basic notions of fuzzy logic can be found in textbooks such as [16]. There are many developed applications of fuzzy logic such as modeling, control, pattern recognition and networks, see, for instance [17,18,19,20,21].

Definition 2.

Let G be a finite directed graph that has no oriented cycle and has exactly one source vertex and target vertices

t_{1}, \dots, t_{K}

as in Definition 1. Each vertex v of G is equipped with a product of standard simplices

P_{v} = \prod_{k = 1}^{m_{v}} S^{d_{v, k}}, S^{d_{v, k}} = \{(y_{1}, \dots, y_{d_{v, k}}) \in R_{\geq 0}^{d_{v, k}} : \sum_{i = 1}^{d_{v, k}} y_{i} \leq 1\}

for some integers

m_{v} > 0

,

d_{v, k} \geq 0

.

P_{v}

is called the internal state space of the vertex v. Let D be a subset of the internal state space of the source vertex of G. Each vertex v that has more than one outgoing arrow is equipped with an affine linear function

l_{v} : \prod_{k = 1}^{m_{v}} R^{d_{v, k}} \to R^{j}

for some

j > 0

, and we require that the collection of non-empty intersections

C_{I} \cap P_{v}

of the chambers

C_{I}

of

l_{v}

with the product simplex

P_{v}

are one-to-one corresponding to the outgoing arrows of v. Let L denote the collection of these affine linear functions

l_{v}

. Moreover, each arrow a is equipped with a continuous function

p_{a} : P_{s (a)} \to P_{t (a)},

where

s (a), t (a)

denote the source and target vertices, respectively.

We call

(G, L, P, p)

a fuzzy linear logical graph. Let

P^{out}

denote the disjoint union

∐_{l = 1}^{K} P_{t_{l}}

.

(G, L, P, p)

determines a function

f_{(G, L, P, p)} : D \to P^{out}

as follows. Given

x \in D

, it is an element of the internal state space

P_{v}

of the source vertex v. Moreover, it lies in a unique chamber of the affine linear function

l_{v}

at the source vertex v. Let

v^{'}

be the head vertex of the outgoing arrow a corresponding to this chamber. We have the element

p_{a} (x) \in P_{v^{'}}

. By repeating this process, we obtain a path from the source vertex to one of the target vertices

t_{l}

, and also an element in the internal state space

P_{t_{l}}

that we define as

f_{(G, L, P, p)} (x)

. The resulting function

f_{(G, L, P, p)}

is called a fuzzy linear logical function.

In the above definition, a standard choice of a continuous map between product simplices

\prod_{j = 1}^{m} S^{d_{j}} \to \prod_{k = 1}^{n} S^{r_{k}}

is using the softmax function

\tilde{σ}

:

{(\tilde{σ} \circ l_{k} ({\vec{x}}_{1}, \dots, {\vec{x}}_{m}))}_{k = 1}^{n},

where

l_{k} : \prod_{j = 1}^{m} R^{d_{j}} \to R^{r_{k}}

are affine linear functions.

\tilde{σ} : R^{r} \to S^{r}

is defined by

\tilde{σ} (z_{1}, \dots, z_{r}) = (\frac{e^{z_{1}}}{1 + \sum_{i = 1}^{r} e^{z_{i}}}, \dots, \frac{e^{z_{r}}}{1 + \sum_{i = 1}^{r} e^{z_{i}}}) .

This is the sigmoid function when

r = 1

.

Remark 1.

We can regard the definition of a fuzzy linear function as a generalization of the linear logical function in Definition 1. Note that the linear functions

l_{v}

in L in the above definition have domain to be the internal state spaces over the corresponding vertices v. In comparison, the linear functions

l_{v}

in L in Definition 1 have domain as the input space

R^{n}

. A linear logical graph

(G, L)

in Definition 1 has no internal state space except the

R^{n}

at the input vertex.

To relate the two notions given by the above definition and Definition 1, we can set

P_{v} = S^{n}

as the same for all vertices v except the target vertices

t_{1}, \dots, t_{K}

, which are equipped with the zero-dimensional simplex (a point), and we set

p_{a}

as the identity maps for all arrows a that are not targeted at any of

t_{i}

. Then

f_{(G, L, P, p)}

reduces back to a linear logical function in Definition 1.

Remark 2.

We call the corners of the convex set

P_{v}

state vertices, which take the form

e_{I} = (e_{i_{1}}, \dots, e_{i_{m_{v}}}) \in P_{v}

for a multi-index

I = (i_{1}, \dots, i_{m_{v}})

, where

{e_{0}, \dots, e_{d_{v, k}}} \subset R^{d_{v, k} + 1}

is the standard basis. We have a bigger graph

\hat{G}

by replacing each vertex v of G by the collection of state vertices over v and each arrow of G by the collection of all possible arrows from source state vertices to target state vertices. Then the vertices of G are interpreted as ‘layers’ or ‘clusters’ of vertices of

\hat{G}

. The input state

x \in D

, the arrow linear functions L, and the maps between state spaces p determine the probability of getting to each target state vertex of

\hat{G}

from the source vertex.

In this interpretation, we take the target set to be the disjoint union of corners of

P_{t_{l}}

at the target vertices

t_{1}, \dots, t_{K}

as follows:

T = ∐_{l = 1}^{K} T_{l} : = ∐_{l = 1}^{K} \{e_{I} : I = (i_{1}, \dots, i_{m_{t_{l}}}) for i_{k} \in {0, \dots, d_{t_{l}, k}}\},

(1)

which is a finite set. The function

f = f_{(G, L, P, p)}

determines the probability of the outcome for each input state

x \in D

as follows. Let

f (x) \in P_{t_{l}} = \prod_{k = 1}^{m_{t_{l}}} S^{d_{t_{l}, k}}

for some

l = 1, \dots, K

. Then the probability of being in

T_{j}

is zero for

j \neq l

. Writing

f (x) = (f_{1} (x), \dots, f_{m_{t_{l}}} (x))

for

f_{k} (x) \in S^{d_{t_{l}, k}}

, the probability of the output to be

t = e_{I} \in T_{l}

for

I = (i_{1}, \dots, i_{m_{t_{l}}})

is given by

\prod_{k = 1}^{m_{t_{l}}} f_{k}^{(i_{k})} (x) \in [0, 1]

.

Proposition 7.

Consider the function

f = \tilde{σ} \circ L_{N} \circ {\tilde{s}}_{N - 1} \circ L_{N - 1} \circ \dots \circ {\tilde{s}}_{1} \circ L_{1}

given by a feed-forward network model whose activation function at each hidden layer is the sigmoid function (denoted by

{\tilde{s}}_{i}

), and that at the last layer it is the softmax function

\tilde{σ}

. f is a fuzzy linear logical function.

Proof.

We set

(G, L, P, p)

as follows. G is the graph that has

(N + 1)

vertices

v_{0}, \dots, v_{N}

with arrows

a_{i}

from

v_{i - 1}

to

v_{i}

for

i = 1, \dots, N

. L is just an empty set.

P_{i} : = {(S^{1})}^{m_{i}}

for

i = 0, \dots, N - 1

, where

m_{i}

is the dimension of the domain of

L_{i + 1}

. The one-dimensional simplex

S^{1}

is identified with the interval

[0, 1]

.

P_{N} : = S^{m_{N}}

, where

m_{N}

is the dimension of the target of

L_{N}

. Then

\begin{matrix} p_{i} : = {\tilde{s}}_{i} \circ L_{i} |_{{[0, 1]}^{m_{i - 1}}} for i = 1, \dots, N - 1 \\ p_{N} : = \tilde{σ} \circ L_{N} |_{{[0, 1]}^{m_{N - 1}}} . \end{matrix}

Then

f = f_{(G, L, P, p)}

. □

Proposition 8.

Consider the function

f = \tilde{σ} \circ L_{N} \circ r_{N - 1} \circ L_{N - 1} \circ \dots \circ r_{1} \circ L_{1}

given by a feed-forward network model whose activation function at each hidden layer is the ReLu function (denoted by

r_{i}

), and that at the last layer it is the softmax function

\tilde{σ}

. Here, each

L_{i}

denotes an affine linear function. The function f is a fuzzy linear logical function.

Proof.

We need to construct a fuzzy linear logical graph

(G, L, P, p)

such that

f = f_{(G, L, P, p)}

. We take G to be the logical graph constructed in Example 2 (Figure 5) with the last two layers of vertices replaced by a single target vertex t. Each vertex that points to the last vertex t and t itself only has zero or one outgoing arrow and hence is not equipped with a linear function. Other vertices are equipped with linear functions on the input space

R^{n}

as in Example 2. We take the internal state space to be the n-dimensional cube

P_{v} : = {(S^{1})}^{n} \subset R^{n}

, where n is the input dimension at every vertex v except at the target vertex, whose internal state space is defined to be the simplex

P_{t} : = S^{d}

, where d is the target dimension of f. The function

p_{a}

in Definition 2 is defined to be the identity function on the internal state space

P_{v}

for every arrow a except for the arrows that point to t. Now, we need to define

p_{a}

for the arrows that point to t. Let

t_{i}^{'}

be the source vertices of these arrows. The input space

R^{n}

is subdivided into chambers

{x \in R^{n} : the path determined by x targets at t_{i}^{'}}

. Moreover,

L_{N} \circ r_{N - 1} \circ L_{N - 1} \circ \dots \circ r_{1} \circ L_{1}

is a piecewise-linear function, whose restrictions on each of these chambers is linear and extend to a linear function l on

R^{n}

. Then

p_{a}

for the corresponding arrow a is defined to be

\tilde{σ} \circ l

. By this construction, we have

f = f_{(G, L, P, p)}

. □

As in Proposition 3,

f_{(G, L, P, p)}

can be expressed in the form of the sum over paths.

Proposition 9.

Given a fuzzy linear logical graph

(G, L, P, p)

,

f_{(G, L, P, p)} (x) = \sum_{γ} c_{γ} (x) p_{γ} (x),

where the sum is over all possible paths γ in G from the source vertex to one of the target vertices; for

γ = a_{r} \dots a_{1}

,

p_{γ} (x) = \prod_{i = 1}^{r} p_{a_{r}} \dots p_{a_{1}} (x)

;

c_{γ} (x) = \prod_{i = 1}^{r} s_{a_{i}} (p_{a_{i - 1} \dots a_{1}} (x)),

where

s_{a} (x) = 1

if

x \in P_{t_{a}}

lies in the chamber corresponding to the arrow a or is 0 otherwise. In the above sum, exactly one of the terms is non-zero.

2.5. The Semiring of Fuzzy Linear Logical Functions

Since the target space

P^{o u t}

is not a vector space, the set of functions

D \to P^{o u t}

does not form a vector space. This makes a main difference from usual approximation theory (such as Fourier analysis or Taylor series) where the function space is flat as the target is a vector space (typically

R

). For a fuzzy logic target (the product simplex in our case), the ‘function space’ forms a semiring. Due to this difference, we shall work out some well-known results in functional analysis in the semiring context below.

Recall that a semi-group is a set equipped with a binary operation that is associative; a monoid is a semi-group with an identity element. A semiring is a set equipped with two binary operations addition and multiplication which satisfy the ring axioms except for the existence of additive inverses. In particular, a semiring is a semi-group under addition and a semi-group under multiplication, and the two operations are compatible in the sense that multiplication distributes over addition. A semiring without a unit is a semiring that does not have a multiplicative identity.

Example 1

(Viterbi semiring). Let

S^{1} = [0, 1]

be the unit interval with the usual multiplication and addition defined as taking maximum. The additive identity is 0 and the multiplicative identity is 1. Then

(S^{1}, +, \cdot)

is a semiring. This is known as the Viterbi semiring that appears in probabilistic parsing.

The subset of two elements

{0, 1} \subset [0, 1]

is a sub-semiring known as the Boolean semiring. Addition (taking maximum) and multiplication are identified as the logical operations OR and AND, respectively.

In this subsection, we focus on the case that the target space

P^{o u t}

is a product of intervals

\prod_{k = 1}^{m} S^{1} = {[0, 1]}^{m}

.

Lemma 3.

The set of functions from a set D to

{[0, 1]}^{m}

(or

{0, 1}^{m}

) has a semiring structure.

Proof.

For the semiring structure on

{[0, 1]}^{m}

(or

{0, 1}^{m}

), we simply take entrywise addition and multiplication, where addition is defined as taking maximum and multiplication is the usual one. The additive identity is the origin

\vec{0}

and the multiplicative identity is

(1, \dots, 1)

. The set of functions to

{[0, 1]}^{m}

(or

{0, 1}^{m}

) has a semiring structure by taking maximum and multiplication of function values. The additive identity is the zero function that takes all domain points to

\vec{0}

. □

Remark 3.

If we take a disjoint union of product simplices in place of

{[0, 1]}^{m}

, we still have a semi-group structure by entriwise multiplication. However, since taking the entriwise maximum does not preserve the inequality

\sum_{i} x_{i} \leq 1

for a simplex, this destroys the semiring structure. This is why we restrict ourselves to

{[0, 1]}^{m}

in this section.

Theorem 2.

Fix D as a subset of a product simplex, and fix the target product of intervals

P^{o u t} = {[0, 1]}^{n}

. The set of fuzzy linear logical functions

f : D \to P^{o u t}

is a sub-semiring of the semiring of functions

D \to P^{o u t}

.

Similarly, the set of linear logical functions

f : D \to {0, 1}^{n}

is a sub-semiring of the semiring of functions

D \to {0, 1}^{n}

.

Proof.

Given two linear logical functions

f_{(G_{1}, L_{1}, P_{1}, p_{1})}

and

f_{(G_{2}, L_{2}, P_{2}, p_{2})}

with the same domain

D \subset P^{i n}

and target

P^{o u t}

, we need to show that their sum (in the sense of entriwise maximum) and product are also linear logical functions. We construct a directed graph G with one source vertex and one target vertex by merging the target vertex of

G_{1}

with the source vertex of

G_{2}

as a single vertex, and then we add one outgoing arrow to the final vertex of

G_{2}

to another vertex, which is the target of G.

Now for the internal state spaces, over those vertices v of G that belong to

G_{1}

, we take

{(P_{1})}_{v} \times P^{i n}

; over those v that belong to

G_{2}

, we take

P^{o u t} \times {(P_{2})}_{v}

. Note that the vertex that comes from merging of the target vertex of

G_{1}

and the source vertex of

G_{2}

is equipped with the internal state space

P^{o u t} \times P^{i n}

; the vertex that corresponds to the final vertex of

G_{2}

is

P^{o u t} \times P^{o u t}

. The last vertex is equipped with the space

P^{o u t}

.

The arrow maps are defined as follows. For arrows a that come from

G_{1}

, we take the function

p_{1} (a) \times Id

. For arrows a that come from

G_{2}

, we take the function

Id \times p_{2} (a)

. By this construction for an input

x \in D

, the value becomes

(f_{(G_{1}, L_{1}, P_{1}, p_{1})} (x), f_{(G_{2}, L_{2}, P_{2}, p_{2})} (x))

at the vertex corresponding to the final vertex of

G_{2}

. The last arrow is then equipped with the continuous function

P^{o u t} \times P^{o u t} \to P^{o u t}

given by the entriwise maximum (or entriwise product). Thus,

(G, L, P, p)

gives the function

f_{(G, L, P, p)}

, which is the sum (or the product) of

f_{(G_{1}, L_{1}, P_{1}, p_{1})}

and

f_{(G_{2}, L_{2}, P_{2}, p_{2})}

.

The proof for the case of linear logical functions

f : D \to {0, 1}^{n}

is similar and hence omitted. □

P^{o u t} = {[0, 1]}^{n}

is equipped with the standard Euclidean metric

d_{P^{o u t}}

. Moreover, the domain D is equipped with the Lebesgue measure

μ

. We consider the set of measurable functions

D \to P^{o u t}

.

Lemma 4.

The set of measurable functions

D \to P^{o u t}

form a sub-semiring.

Proof.

If

f, g : D \to P^{o u t}

are measurable functions, then

(f, g) : D \to P^{o u t} \times P^{o u t}

is also measurable. Taking entriwise maximum and multiplication are continuous functions

P^{o u t} \times P^{o u t} \to P^{o u t}

. Thus, the compositions, which give the sum and product of f and g, are also measurable functions. Furthermore, the zero function is measurable. Thus, the set of measurable functions

D \to P^{o u t}

forms a sum semiring. □

We define

d (f, g) : = \int_{D} d_{P^{o u t}} (f (x), g (x)) d μ (x) = \int_{D} ∥ f (x) - g (x) ∥ d μ (x) .

To make this a distance function such that

d (f, g) = 0

implies

f = g

, we need to quotient out the functions whose support has measure zero.

Proposition 10.

The subset of measurable functions

f : D \to P^{o u t}

whose support

{f \neq \vec{0}}

has measure zero forms an ideal of the semiring of measurable functions

D \to P^{o u t}

. The above function d gives a metric on the corresponding quotient semiring, denoted by

L^{0} (D, P^{o u t})

.

Proof.

The support of the sum

max (f, g)

is the union of the supports of f and g. The support of the product

f \cdot g

is the intersection of the supports of f and g. Thus, if the supports of f and g have measure zero, then so does the support of

max (f, g)

. Furthermore, if f has the support of measure zero,

f \cdot g

has the support of measure zero (with no condition on g). This shows that the subset of measurable functions

f : D \to P^{o u t}

with the support of measure zero forms an ideal.

d is well-defined on the quotient semiring because if f and g are in the same equivalent class, that is

f + δ_{1} = g + δ_{2}

for some measurable functions

δ_{1}, δ_{2}

whose support have measure zero, then

{f |}_{D - A} {= g |}_{D - A}

for some set A of measure zero. Thus, for any other function h,

\begin{matrix} d (f, h) = & \int_{D} d_{P^{o u t}} (f (x), h (x)) d μ (x) = \int_{D - A} d_{P^{o u t}} (f (x), h (x)) d μ (x) \\ = & \int_{D - A} d_{P^{o u t}} (g (x), h (x)) d μ (x) = \int_{D} d_{P^{o u t}} (g (x), h (x)) d μ (x) = d (g, h) . \end{matrix}

Now, consider any two elements

[f], [g]

in the quotient semiring. It is obvious that

d ([f], [g]) = d ([g], [f])

from the definition of d. Furthermore, the following triangle inequality holds:

\begin{matrix} d ([f], [g]) = & \int_{D} ∥ f (x) - g (x) ∥ d μ (x) \\ \leq & \int_{D} ∥ f (x) - h (x) ∥ d μ (x) + \int_{D} ∥ h (x) - g (x) ∥ d μ (x) \\ = & d ([f], [h]) + d ([h], [g]) . \end{matrix}

Finally, suppose

d ([f], [g]) = 0

. This means that

\int_{D} ∥ f (x) - g (x) ∥ d μ (x) = 0

. Since the integrand is non-negative, we have

∥ f (x) - g (x) ∥ = 0

for almost every

x \in D

. Thus,

f (x) = g (x)

away from a measure zero subset

A \subset D

. Since

f (x) = 1_{D - A} f + 1_{A} f

and

g (x) = 1_{D - A} g + 1_{A} g

(where

1_{A}

denotes the characteristic function of the set A which takes value 1 on A and 0 otherwise), we have

[f] = [1_{D - A} f] = [1_{D - A} g] = [g] .

□

Proposition 11.

L^{0} (D, P^{o u t})

is a topological semiring (under the topology induced from the metric d).

Proof.

We need to show that the addition and multiplication operations

max (f, g), f \cdot g : L^{0} (D, P^{o u t}) \times L^{0} (D, P^{o u t}) \to L^{0} (D, P^{o u t})

is continuous. First, consider

(f, g) : L^{0} (D, P^{o u t}) \times L^{0} (D, P^{o u t}) \to L^{0} (D, P^{o u t} \times P^{o u t}) .

This is continuous, as follows: for

ϵ > 0

, consider

\tilde{f}, \tilde{g}

with

d (\tilde{f}, f) < δ

and

d (\tilde{g}, g) < δ

. Then

\begin{matrix} d ((\tilde{f}, \tilde{g}), (f, g)) = & \int_{D} d_{P^{o u t} \times P^{o u t}} ((\tilde{f} (x), \tilde{g} (x)), (f (x), g (x))) d μ (x) \\ \leq \int_{D} | \tilde{f} - f | d μ + \int_{D} | \tilde{g} - g | d μ < 2 δ = ϵ \end{matrix}

by taking

δ = ϵ / 2

. Now, the function

max (a, b) : P^{o u t} \times P^{o u t} \to P^{o u t}

is Lipschitz continuous with Lipschitz constant 1. Entriwise multiplication

P^{o u t} \times P^{o u t} \to P^{o u t}

is also Lipschitz continuous since the product

P^{o u t} \times P^{o u t}

is compact. Denote either one of these two functions by

ϕ

, and let K be the Lipschitz constant. As above, we have chosen suitable

δ

to control

\tilde{f}

and

\tilde{g}

such that

\int_{D} d_{P^{o u t} \times P^{o u t}} ((\tilde{f} (x), \tilde{g} (x)), (f (x), g (x))) d μ (x) < ϵ .

Then

\begin{matrix} d (ϕ \circ (\tilde{f}, \tilde{g}), ϕ \circ (f, g)) = & \int_{D} d_{P^{o u t}} (ϕ (\tilde{f} (x), \tilde{g} (x)), ϕ (f (x), g (x))) d μ (x) \\ \leq K \int_{D} d_{P^{o u t} \times P^{o u t}} ((\tilde{f} (x), \tilde{g} (x)), (f (x), g (x))) d μ (x) < K ϵ . \end{matrix}

□

2.6. Graph of Fuzzy Linear Logical Function as a Fuzzy Subset

A fuzzy subset of a topological measure space X is a continuous measurable function

F : X \to [0, 1]

. This generalizes the characteristic function of a subset. The interval

[0, 1]

can be equipped with the semiring structure whose addition and multiplication are taking maximum and minimum, respectively. This induces a semiring structure on the collection of fuzzy subsets

F : X \to [0, 1]

that plays the role of union and intersection operations.

The graph

gr (f)

of a function

f : D \to ∐_{l = 1}^{K} P_{t_{l}},

where

P_{t_{l}}

are products of simplices as in Definition 2, is defined to be the fuzzy subset in

D \times T

, where T is defined by Equation (1), given by the probability at every

(x, t) \in D \times T

determined by f in Remark 2.

The following is a fuzzy analog of Proposition 4.

Proposition 12.

Let

f_{(G, L, P, p)}

be a fuzzy linear logical function. Let

F : D \times T \to [0, 1]

be the characteristic function of its graph where T is defined by (1). Then

F

is also a fuzzy linear logical function.

Proof.

Similar to the proof of Proposition 4, we embed the finite set T as the subset

{1, \dots, | T |} \subset R

. The affine linear functions on

R^{n} \supset D

in the collection L are pulled back as affine linear functions on

R^{n + 1} \supset D \times T

. Similarly for the input vertex

v_{0}

, we replace the product simplex

P_{v_{0}}

by

{\tilde{P}}_{v_{0}} : = P_{v_{0}} \times [0, | T | + 1]

(where the interval

[0, | T | + 1]

is identified with

S^{1}

); for the arrows a tailing at

v_{0}

,

p_{a}

are pulled back to be functions

{\tilde{P}}_{v_{0}} \to P_{h (a)}

. Then, we obtain

(\tilde{L}, \tilde{P}, \tilde{p})

on G.

For each of the target vertices

t_{l}

of G, we equip it with the linear function

(y - 3 / 2, y - 5 / 2, \dots, y - (| T | - 1 / 2)) : R^{n + 1} \to R^{| T | - 1},

where y is the last coordinate of the domain

R^{n + 1}

. It divides

R^{n + 1}

into

| T |

chambers that contain

R^{n} \times {j}

for some

j = 1, \dots, | T |

. Correspondingly, we make

| T |

outgoing arrows of the vertex

t_{l}

. The new vertices are equipped with the internal state space

P_{t_{l}}

, and the new arrows are equipped with the identity function

P_{t_{l}} \to P_{t_{l}}

. Then we get

| T | K

additional vertices, where K is the number of target vertices

t_{l}

of G. Let us label these vertices as

v_{j, l}

for

j \in T

and

l \in {1, \dots, K}

. Each of these vertices are connected to the new output vertex by a new arrow. The new output vertex is equipped with the internal state space

S^{1} ≅ [0, 1]

. The arrow from

v_{j, l}

to the output vertex is equipped with the following function

p_{j, l} : P_{v_{j, l}} \to [0, 1]

. If

j \notin T_{l}

, then we set

p_{j, l} \equiv 0

. Otherwise, for

j = e_{I} \in T_{l}

and

I = (i_{1}, \dots, i_{m_{t_{l}}})

,

p_{j, l} : = \prod_{k = 1}^{m_{t_{l}}} u_{k}^{(i_{k})}

, where

u_{k}^{(0)}, \dots, u_{k}^{(d_{t_{l}, k})}

are the coordinates of

S^{d_{t_{l}, k}} \subset R_{\geq 0}^{d_{t_{l}, k} + 1}

. This gives the fuzzy linear logical graph

(\tilde{G}, \tilde{L}, \tilde{P}, \tilde{p})

whose associated function

f_{(\tilde{G}, \tilde{L}, \tilde{P}, \tilde{p})} : D \times T \to [0, 1]

is the characteristic function. □

The above motivates us to consider fuzzy subsets whose characteristic functions are fuzzy linear logical functions

F : X \to [0, 1]

. Below, we show that they form a sub-semiring, that is, they are closed under fuzzy union and intersection. We need the following lemma analogous to Lemma 1.

Lemma 5.

Let

f_{G_{i}, L_{i}, P_{i}, p_{i}} : D \to P_{i}^{out}

be fuzzy linear logical functions for

i \in I

, where

I = {1, \dots, k}

, and we assume that the input state space

P_{i, in}

are the same for all i. Then

(f_{G_{i}, L_{i}, P_{i}, p_{i}} : i \in I) : D \to \prod_{i \in I} P_{i}^{out}

is also a fuzzy linear logical function.

Proof.

By the proof of Lemma 1, we obtain a new graph

(G, L)

from

(G_{i}, L_{i})

for

i = 1, \dots, k

by attaching

(G_{i + 1}, L_{i + 1})

to the target vertices of

(G_{i}, L_{i})

. For the internal state spaces, we change as follows. First, we make a new input vertex

{\tilde{v}}_{0}

and an arrow

{\tilde{a}}_{0}

from

{\tilde{v}}_{0}

to the original input vertex

v_{0}

of

(G, L)

. We denote the resulting graph by

(\tilde{G}, \tilde{L})

. We define

{\tilde{P}}_{{\tilde{v}}_{0}} : = P_{v_{0}}

,

{\tilde{P}}_{v_{0}} : = \prod_{i = 1}^{k} P_{v_{0}}

, where

P_{v_{0}} = P_{i, in}

for all i by assumption, and

{\tilde{p}}_{{\tilde{a}}_{0}} : P_{v_{0}} \to \prod_{i = 1}^{k} P_{v_{0}}

to be the diagonal map

{\tilde{p}}_{{\tilde{a}}_{0}} = (Id, \dots, Id)

. The internal state spaces

P_{v}

over vertices v of

G_{1}

are replaced by

P_{v} \times \prod_{i = 2}^{k} P_{v_{0}}

, and

p_{a}

for arrows a of

G_{1}

are replaced by

(p_{a}, Id, \dots, Id)

. Next, over the vertices v of the graph

G_{2}

that is attached to the target vertex

t_{l}

of

G_{1}

, the internal state space

P_{v}

is replaced by

P_{t_{l}} \times P_{v} \times \prod_{i = 3}^{k} P_{v_{0}}

, and

p_{a}

for arrows a of

G_{2}

are replaced by

(Id, p_{a}, Id, \dots, Id)

. Inductively, we obtain the desired graph

(\tilde{G}, \tilde{L}, \tilde{P}, \tilde{p})

. □

Proposition 13.

Suppose

F_{1}, F_{2} : X \to [0, 1]

are fuzzy subsets defined by fuzzy linear logical functions. Then

F_{1} \cup F_{2}

and

F_{1} \cap F_{2}

are also fuzzy subsets defined by fuzzy linear logical functions.

Proof.

By the previous lemma,

(F_{1}, F_{2}) = f_{(G, L, P, p)} : X \to [0, 1] \times [0, 1]

for some fuzzy linear logical graph

(G, L, P, p)

, which has a single output vertex whose internal state space is

[0, 1] \times [0, 1] ≅ S^{1} \times S^{1}

. We attach an arrow a to this output vertex. Over the new target vertex v,

P_{v} : = [0, 1]

;

p_{a} : = max : [0, 1] \times [0, 1] \to [0, 1]

(or

p_{a} : = min

). Then, we obtain

(\tilde{G}, \tilde{L}, \tilde{P}, \tilde{p})

, whose corresponding fuzzy function defines

F_{1} \cup F_{2}

(or

F_{1} \cap F_{2}

, respectively). □

Remark 4.

For

f : P^{in} \to P^{out}

, where

P^{in}, P^{out}

are product simplices, we can have various interpretations.

1.: As a usual function, its graph is in the product $P^{in} \times P^{out}$ .
2.: As a fuzzy function on $P^{in}$ : $P^{in} \to T$ , where T is the finite set of vertices of the product simplex $P^{out}$ , its graph is a fuzzy subset in $P^{in} \times T$ .
3.: The domain product simplex $P^{in}$ can also be understood as a collection of fuzzy points over V, the finite set of vertices of $P^{in}$ , where a fuzzy point here just refers to a probability distribution (which integrates to 1).
4.: Similarly, $P^{in} \times P^{out}$ can be understood as a collection of fuzzy points over $V \times T$ . Thus, the (usual) graph of f can be interpreted as a sub-collection of fuzzy points over $V \times T$ .

(Id, f)

gives a parametric description of the graph of a function f. The following ensures that it is a fuzzy linear logical function if f is.

Corollary 1.

Let

f : D \to P^{out}

be a fuzzy linear logical function. Then

(Id, f) : D \to D \times P^{out}

is also a fuzzy linear logical function whose image is the graph of f.

Proof.

By Lemma 5, it suffices to know that

Id : D \to D

is a fuzzy linear logical function. This is obvious: we take the graph with two vertices serving as input and output, which are connected by one arrow. The input and output vertices are equipped with the internal state spaces that contain D, and p is just defined by the identity function. □

Remark 5.

Generative deep learning models that are widely used nowadays can be understood as parametric descriptions of data sets X by fuzzy linear logical functions

f : D \to X

(where D and X are embedded in certain product simplices

P^{in}

and

P^{out}

, respectively, and D is usually called the noise space). We focus on classification problems in the current work and plan to extend the framework to other problems as well in the future.

2.7. A Digression to Non-Linearity in a Quantum-Classical System

The fuzzy linear logical functions in Definition 2 have the following quantum analog. Quantum systems and quantum random walks are well known and studied, see, for instance [22,23]. On the other hand, they depend only linearly on the initial state in probability. The motivation of this subsection is to compare fuzzy and quantum systems and to show how non-linear dependence on the initial probability distribution can come up. On the other hand, this section is mostly unrelated to the rest of the writing and can be skipped.

Definition 3.

Let G be a finite directed graph that has no oriented cycle and has exactly one source vertex and target vertices

t_{1}, \dots, t_{K}

. Each vertex v is equipped with a product of projectifications of Hilbert spaces over complex numbers as follows:

Q_{v} : = \prod_{l = 1}^{m_{v}} P (H_{v, l})

for some integer

m_{v} > 0

. We fix an orthonormal basis in each Hilbert space

H_{v, l}

, which gives a basis in the tensor product as follows:

E^{(v)} = \{e_{I}^{(v)} = (e_{i_{1}}^{(v)}, \dots, e_{i_{m_{v}}}^{(v)}) : e_{i_{l}}^{(v)} is a basic vector of H_{v, l}\} .

For each vertex v that has more than one outgoing arrows, we make the choice of decomposing set

E^{(v)}

into subsets that are one-to-one corresponding to the outgoing arrows. Each arrow a is equipped with a map

q_{a}

from the corresponding subset of basic vectors

e_{I}^{(t (a))}

to

Q_{h (a)}

.

Let us call the tuple

(G, Q, E, q)

a quantum logical graph.

We obtain a probabilistic map

f_{(G, Q, E, q)} : Q^{in} \to T

as follows. Given a state

\vec{w} = (w_{1}, \dots, w_{m_{v}}) \in Q_{v}

at a vertex v, we make a quantum measurement and

\vec{w}

projects to one of the basic elements

e_{I}^{(v)}

with probability

\prod_{l = 1}^{m_{v}} {| ⟨ w_{l}, e_{i_{l}}^{(v)} ⟩ |}^{2}

. The outcome

e_{I}^{(v)}

determines which outgoing arrow a to pick, and the corresponding map

q_{a}

sends it to an element of

Q_{h (a)}

. Inductively, we obtain an element

f_{(G, Q, E, q)} (w) \in T

.

However, such a process is simply linearly depending on the initial condition in probability as follows: the probabilities of outcomes of the quantum process

f_{(G, Q, E, q)} (w)

for an input state w (which is complex-valued) simply linearly depends on the modulus of components of the input w. In other words, the output probabilities are simply obtained by a matrix multiplication on the input probabilities. To produce non-linear physical phenomena, we need the following extra ingredient.

Let us consider the state space

P^{n}

of a single particle. A basis gives a map

μ : P^{n} \to S^{n}

to the simplex

S^{n}

(also known as the moment map of a corresponding torus action on

P^{n}

):

μ = (\frac{| z_{0} |^{2}}{| z_{0} |^{2} + \dots + {| z_{n} |}^{2}}, \dots, \frac{| z_{n} |^{2}}{| z_{0} |^{2} + \dots + {| z_{n} |}^{2}}) : P^{n} \to S^{n} .

The components of the moment map are the probability of quantum projection to basic states of the particle upon observation. By the law of large numbers, if we make independent observations of particles in an identical quantum state

\vec{z} = [z_{0} : \dots : z_{n}] \in P^{n}

for N times, the average of the observed results (which are elements in

{(1, 0, \dots, 0), \dots, (0, \dots, 0, 1)}

) converges to

μ (\vec{z}) \in S^{n}

as

N \to \infty

.

The additional ingredient we need is the choice of a map

s : S^{n} \to P^{m}

and

m, n \in Z_{> 0}

. For instance, when

m = n = 1

, we set the initial phase of the electron spin state according to a number in

[0, 1]

. Upon an observation of a state, we obtain a point in

{(1, 0, \dots, 0), \dots, (0, \dots, 0, 1)} \subset S^{n}

. Now, if we have N particles simultaneously observed, we obtain N values, whose average is again a point p in the simplex

S^{n}

. By s, these are turned to N quantum particles in state

s (p) \in P^{m}

again.

μ : P^{n} \to S^{n}

and

s : S^{n} \to P^{m}

give an interplay between quantum processes and classical processes with averaging. Averaging in the classical world is the main ingredient to produce non-linearity from the linear quantum process.

Now, let us modify Definition 3 by using

s : S^{n} \to P^{m}

. Let

P_{v}

be the product simplex corresponding to

Q_{v}

at each vertex. Moreover, as in Definitions 1 and 2 for (fuzzy) linear logical functions, we equip each vertex with affine linear functions

l_{v}

whose corresponding systems of inequalities divide

P_{v}

into chambers. This decomposition of

P_{v}

plays the role of the decomposition of

E^{(v)}

in Definition 3. The outgoing arrows at v are in a one-to-one correspondence with the chambers. Each outgoing arrow a at v is equipped with a map

{\tilde{q}}_{a}

from the corresponding chamber of

P_{v}

to

Q_{h (a)}

.

{\tilde{q}}_{a}

can be understood as an extension of

q_{a}

(whose domain is a subset of corners of

P_{t (a)}

) in Definition 3.

Definition 4.

We call the tuple

(G, Q, E, L, \tilde{q})

(where L is the collection of affine linear functions

l_{v}

) a quantum-classical logical graph.

Given N copies of the same state

\vec{w}

in

Q_{v}

, we first take a quantum projection of these and they become elements in

P_{v}

. We take an average of these N elements, which lies in a certain chamber defined by

l_{v}

. The chamber corresponds to an outgoing arrow a, and the map

{\tilde{q}}_{a}

produces N elements in

Q_{h (a)}

. Inductively, we obtain a quantum-classical process

Q^{in} \to T

.

For

N = 1

, this essentially produces the same linear probabilistic outcomes as in Definition 3. On the other hand, when

N > 1

, the process is no longer linear and produces a fuzzy linear logical function

P^{in} \to P^{out}

.

In summary, non-linear dependence on the initial state results from averaging of observed states.

Remark 6.

We can allow loops or cycles in the above definition. Then the system may run without stop. In this situation, the main object of concern is the resulting (possibly infinite) sequence of pairs

(v, s)

, where v is a vertex of G and s is a state in

Q_{v}

. This gives a quantum-classical walk on the graph G.

We can make a similar generalization for (fuzzy) linear logical functions by allowing loops or cycles. This is typical in applications in time-dependent network models.

3. Linear Logical Structures for a Measure Space

In the previous section, we have defined linear logical functions based on a directed graph. In this section, we first show the equivalence between our definition of linear logical functions and semilinear functions [4] in the literature. Thus, the linear logical graph we have defined can be understood as a representation of semilinear functions. Moreover, fuzzy and quantum logical functions that we define can be understood as deformations of semilinear functions.

Next, we consider measurable functions and show that they can be approximated and covered by semilinear functions. This motivates the definition of a logifold, which is a measure space that has graphs of linear logical functions as local models.

3.1. Equivalence with Semilinear Functions

Let us first recall the definition of semilinear sets.

Definition 5.

For any positive integer n, semilinear sets are the subsets of

R^{n}

that are finite unions of sets of the form

{x \in R^{n} : f_{1} (x) = \dots = f_{k} (x) = 0, g_{1} (x) > 0, \dots, g_{l} (x) > 0},

(2)

where the

f_{i}

and

g_{j}

are affine linear functions.

A function

f : D \to T

on

D \subset R^{n}

, where T is a discrete set, is called semilinear if for every

t \in T

,

f^{- 1} {t}

equals to the intersection of D with a semilinear set.

Now let us consider linear logical functions defined in the last section. We show that the two notions are equivalent (when the target set is finite). Thus, a linear logical graph can be understood as a graphical representation (which is not unique) of a semilinear function. From this perspective, the last section provides fuzzy and quantum deformations of semilinear functions.

Theorem 3.

Consider

f : D \to T

for a finite set

T = {t_{1}, \dots, t_{s}}

, where

D \subset R^{n}

. f is a semilinear function if and only if it is a linear logical function.

Proof.

It suffices to consider the case

D = R^{n}

. We use the following terminologies for convenience. Let

V (G)

and

E (G)

be the sets of vertices and arrows, respectively, for a directed graph G. A vertex

v \in V (G)

is called nontrivial if it has more than one outgoing arrows. It is said to be simple if it has exactly one outgoing arrow. We call a vertex that has no outgoing arrow a target and that which has no incoming arrow a source. For a target t, let

R_{t}

be the set of all paths from the source to target t.

Consider a linear logical function

f = f_{(G, L)} : R^{n} \to T

. Let p be a path in

R_{t}

for some

t \in T

. Let

{v_{1}, \dots, v_{k}}

be the set of non-trivial vertices that p passes through. This is a non-empty set unless f is just a constant function (recall that G has only one source vertex). At each of these vertices

v_{i}

,

R^{n}

is subdivided according to the affine linear functions

g_{i, 1}, \dots, g_{i, N_{i}}

into chambers

C_{i, 1}, \dots, C_{i, m_{i}}

, where

m_{i}

is the number of its outgoing arrows. All the chambers

C_{i, j}

are semilinear sets.

For each path

p \in R_{t}

, we define a set

E_{p}

such that

x \in E_{p}

if x follows path p to get the target t. Then

E_{p}

can be represented as

C_{1, j_{1}} \cap \dots \cap C_{k, j_{k}},

which is semilinear. Moreover, the finite union

f^{- 1} (t) = ⋃_{p \in R_{t}} E_{p}

is also a semilinear set. This shows that f is a semilinear function.

Conversely, suppose that we are given a semilinear function. Without loss of generality, we can assume that f is surjective. For every

t \in T

,

f^{- 1} (t_{i})

is a semilinear set defined by a collection of affine linear functions in the form of (2). Let

F = {l_{1}, \dots, l_{N}}

be the union of these collections over all

t \in T

.

Now, we construct a linear logical graph associated with f. We consider the chambers made by

(l_{1}, - l_{1}, \dots, l_{N}, - l_{N})

by taking the intersection of the half spaces

l_{i} \geq 0

,

l_{i} < 0

,

- l_{i} \geq 0

,

- l_{i} < 0

. We construct outgoing arrows of the source vertex associated with these chambers.

For each

t \in T

,

l_{j}

occurs in defining

f^{- 1} (t)

as either one of the following ways:

$l_{j} > 0$ , which is equivalent to $l_{j} \geq 0$ and $- l_{j} < 0$ ,
$l_{j} = 0$ , which is equivalent to $l_{j} \geq 0$ and $- l_{j} \geq 0$ ,
$l_{j}$ is not involved in defining $f^{- 1} (t)$ .

Thus,

f^{- 1} (t)

is a union of a sub-collection of chambers associated with the outgoing arrows. Then, we assign these outgoing arrows with the target vertex t. This is well-defined since

f^{- 1} (t)

for different t are disjoint to each other. Moreover, since

⋃_{t} f^{- 1} (t) = R^{n}

, every outgoing arrow is associated with a certain target vertex.

In summary, we have constructed a linear logical graph G which produces the function f. □

The above equivalence between semilinear functions and linear logical functions naturally generalizes to definable functions in other types of o-minimal structures. They provide the simplest class of examples in o-minimal structures for semi-algebraic and subanalytic geometry [4]. The topology of sub-level sets of definable functions was recently investigated in [24]. Let us first recall the basic definitions.

Definition 6

([4]). A structure

S

on

R

consists of a Boolean algebra

S_{n}

of subsets of

R^{n}

for each

n = 0, 1, 2, \dots,

such that

1.: the diagonals ${x \in R^{n} : x_{i} = x_{j}}, 1 \leq i < j \leq n$ belong to $S_{n}$ ;
2.: $A \in S_{m}, B \in S_{n} \Rightarrow A \times B \in S_{m + n}$ ;
3.: $A \in S_{n + 1} \Rightarrow π (A) \in S_{n}$ , where $π : R^{n + 1} \to R^{n}$ is the projection map defined by $π (x_{1}, \dots, x_{n + 1}) = (x_{1}, \dots, x_{n})$ ;
4.: the ordering ${(x, y) \in R^{2} : x < y}$ of $R$ belongs to $S_{2}$ .

A structure

S

is o-minimal if the sets in

S_{1}

are exactly the subsets of

R

that have only finitely many connected components, that is, the finite unions of intervals and points.

Given a collection

A

of subsets of the Cartesian spaces

R^{n}

for various n, such that the ordering

{(x, y) : x < y}

belongs to

A

, define

Def (A)

as the smallest structure on the real line containing

A

by adding the diagonals to

A

and closing off under Boolean operations, Cartesian products, and projections. Sets in

Def (A)

are said to be definable from

A

or simply definable if

A

is clear from context.

Given definable sets

A \subset R^{m}

and

B \subset R^{n}

we say that a map

f : A \to B

is definable if its graph

Γ (f) = {(x, f (x)) \in R^{m + n} : x \in A}

is definable.

Remark 7.

If

A

consists of the ordering, the singletons

{r}

for any

r \in R

, the graph in

R^{2}

of scalar multiplications maps

x \mapsto λ x : R \to R

for any

λ \in R

, and the graph of addition

{(x, y, z) \in R^{3} : z = x + y}

. Then

Def (A)

consists of semilinear sets for various positive integers n (Definition 5).

Similarly, if

A

consist of the ordering, singletons, and the graphs of addition and multiplication, then

Def (A)

consists of semi-algebraic sets, which are finite unions of sets of the form

\{x \in R^{n} : f (x) = 0, g_{1} (x) > 0, \dots, g_{l} (x) > 0\},

where f and

g_{1}, \dots, g_{l}

are real polynomials in n variables, due to the Tarski–Seidenberg Theorem [25].

One obtains semi-analytic sets in which the above

f, g_{1}, \dots, g_{l}

become real analytic functions by including graphs of analytic functions. Let an be the collection

A

and of the functions

f : R^{n} \to R

for all positive integers n such that

{f |}_{I^{n}}

is analytic,

I = [- 1, 1] \subset R

, and f is identically 0 outside the cubes. The theory of semi-analytic sets and subanalytic sets show that

Def (an)

is o-minimal, and relatively compact semi-analytic sets have only finitely many connected components. See [25] for efficient exposition of the ojasiewicz-Gabrielov-Hironaka theory of semi- and subanalytic sets.

Theorem 4.

Let us replace the collection of affine linear functions at vertices in Definition 1 by polynomials and call the resulting functions polynomial logical functions. Then

f : D \to T

for a finite set T is a semi-algebraic function if and only if f is a polynomial logical function.

The proof of the above theorem is similar to that of Theorem 3 and hence omitted.

3.2. Approximation of Measurable Functions by Linear Logical Functions

We consider measurable functions

f : D \to T

, where

0 < μ (D) < \infty

and T is a finite set. The following approximation theorem for measurable functions has two distinct features since T is a finite set. First, the functions under consideration, and linear logical functions that we use, are discontinuous. Second, the ‘approximating function’ actually exactly equals to the target function in a large part of D. Compared to traditional approximation methods, linear logical functions have an advantage of being representable by logical graphs, which have fuzzy or quantum generalizations.

Theorem 5

(Universal approximation theorem for measurable functions). Let μ be the standard Lebesgue measure on

R^{n}

. Let

f : D \to T

be a measurable function with

μ (D) < \infty

and a finite target set T. For any

ϵ > 0

, there exists a linear logical function

L : D \to \tilde{T}

, where

\tilde{T} = T \cup {*}

is T adjunct with a singleton, and a measurable set

E \subset D

with

μ (E) < ϵ

such that

{L |}_{D - E} {\equiv f |}_{D - E}

.

Proof.

Let

R : = \{\prod_{k = 1}^{n} (a_{k}, b_{k}] \subset R^{n} : a_{k} < b_{k} for all k\}

be the family of rectangles in

R^{n}

. We use the well-known fact that for any measurable set U of finite Lebesgue measure, there exists a finite subcollection

\{R_{j} : j = 1, \dots, N\}

of

R

such that

μ (U ▵ ⋃_{1}^{N} R_{j}) < ϵ

(see, for instance [1]). Here,

A ▵ B : = (A - B) \cup (B - A)

denotes the symmetric difference of two subsets

A, B

.

Suppose that a measurable function

f : D \to T = {t_{1}, \dots, t_{m}}

and

ϵ > 0

be given. For each

t \in T

, let

S_{t}

be a union of finitely many rectangles of

R

that approximates

f^{- 1} (t) \subset D

(that has finite measure) in the sense that

μ ((f^{- 1} (t) ▵ S_{t})) < \frac{ϵ}{| T |}

. Note that

S_{t}

is a semilinear set.

The case

m = 1

is trivial. Suppose

m > 1

. Define semilinear sets

S_{*} = ⋃_{i < j} (S_{t_{i}} \cap S_{t_{j}})

and

S_{t} = S_{t} ∖ S_{*}

for each

t \in T

. Now, we define

L : D \to \tilde{T}

,

L (p) = \{\begin{matrix} t & if p \in S_{t} \cap D \\ * & if p \in D ∖ ⋃_{t \in T} S_{t} \end{matrix}

which is a semilinear function on D.

If

p \in S_{*} \cap D

, then

p \in S_{t_{i}} \cap S_{t_{j}}

for some

t_{i}, t_{j} \in T

with

t_{i} \neq t_{j}

. In such a case,

p \in f^{- 1} (t_{i})

implies

p \notin f^{- 1} (t_{j})

. It shows

S_{*} \cap D \subset ⋃_{t \in T} (S_{t} ∖ f^{- 1} (t))

. Furthermore, we have

D ∖ ⋃_{t \in T} S_{t} \subset ⋃_{t \in T} (f^{- 1} (t) ∖ S_{t})

. Therefore

D ∖ ⋃_{t \in T} S_{t} = (D ∖ ⋃_{t \in T} S_{t}) \cup (D \cap S_{*}) \subset ⋃_{t \in T} f^{- 1} (t) ▵ S_{t},

and hence

μ (D ∖ ⋃_{t \in T} S_{t}) \leq \sum_{t \in T} μ (f^{- 1} (t) ▵ S_{t}) < ϵ .

By Theorem 3, L is a linear logical function. □

Corollary 2.

Let

f : D \to T

be a measurable function where

D \subset R^{n}

is of finite measure and T is finite. Then there exists a family

L

of linear logical functions

L_{i} : D_{i} \to T

, where

D_{i} \subset D

and

L_{i} {\equiv f |}_{D_{i}}

, such that

D ∖ ⋃_{i} D_{i}

is a measure zero set.

3.3. Linear Logifold

To be more flexible, we can work with the Hausdorff measure which is recalled as follows.

Definition 7.

Let

p \geq 0

,

δ > 0

. For any

U \subset R^{n}

,

Diam (U)

denotes the diameter of U defined by the supremum of distance of any two points in U. For a subset

E \subset R^{n}

, define

H_{δ}^{p} (E) : = inf_{U_{δ} (E)} \sum_{U \in U_{δ} (E)} Diam {(U)}^{p},

where

U_{δ} (E)

denotes a cover of E by sets U with

Diam (U) < δ

. Then the p-dimensional Hausdorff measure is defined as

H^{p} (E) : = lim_{δ \to 0} H_{δ}^{p} (E)

. The Hausdorff dimension of E is

{dim}_{H} (E) : = inf_{p} {p \in [0, \infty) : H^{p} (E) = 0}

.

Definition 8.

A linear logifold is a pair

(X, U)

, where X is a topological space equipped with a σ-algebra and a measure μ,

U

is a collection of pairs

(U_{i}, ϕ_{i})

, where

U_{i}

are subsets of X such that

μ (U_{i}) > 0

and

μ (X - ⋃_{i} U_{i}) = 0

;

ϕ_{i}

are measure-preserving homeomorphisms between

U_{i}

and the graphs of linear logical functions

f_{i} : D_{i} \to T_{i}

(with an induced Hausdorff measure), where

D_{i} \subset R^{n_{i}}

are

H^{p_{i}}

-measurable subsets in certain dimension

p_{i}

, and

T_{i}

are discrete sets.

The elements of

U

are called charts. A chart

(U, ϕ)

is called entire up to measure ϵ if

μ (X - U) < ϵ

.

Comparing to a topological manifold, we require

μ (U_{i}) > 0

in place of an openness condition. Local models are now taken to be graphs of linear logical functions in place of open subsets of Euclidean spaces.

Then, the results in the last subsection can be rephrased as follows.

Corollary 3.

Let

f : D \to T

be a measurable function on a measurable set

D \subset R^{n}

of finite measure with a finite target set T. For any

ϵ > 0

, its graph

gr (f) \subset D \times T

can be equipped with a linear logifold structure that has an entire chart up to measure ϵ.

Remark 8.

In [26], relations between neural networks and quiver representations were studied. In [27,28], a network model is formulated as a framed quiver representation; learning of the model was formulated as a stochastic gradient descent over the corresponding moduli’s space. In this language, we now take several quivers, and we glue their representations together (in a non-linear way) to form a ‘logifold’.

In a similar manner, we define a fuzzy linear logifold below. By Remark 2 and (2) of Remark 4, a fuzzy linear logical function has a graph as a fuzzy subset of

D \times T

. We are going to use the fuzzy graph as a local model for a fuzzy space

(X, P)

.

Definition 9.

A fuzzy linear logifold is a tuple

(X, P, U)

, where

1.: X is a topological space equipped with a measure μ;.
2.: $P : X \to (0, 1]$ is a continuous measurable function.
3.: $U$ is a collection of tuples $(ρ_{i}, ϕ_{i}, f_{i})$ , where $ρ_{i}$ are measurable functions $ρ_{i} : X \to [0, 1]$ with $\sum_{i} ρ_{i} \leq 1_{X}$ that describe fuzzy subsets of X, whose supports are denoted by by $U_{i} = {x \in X : ρ_{i} (x) > 0} \subset X$ ;

$ϕ_{i} : U_{i} \to D_{i} \times T_{i}$

are measure-preserving homeomorphisms where $T_{i}$ are finite sets in the form of (1) and $D_{i} \subset R^{n_{i}}$ are $H^{p_{i}}$ -measurable subsets in certain dimension $p_{i}$ ; $f_{i}$ are fuzzy linear logical functions on $D_{i}$ whose target sets are $T_{i}$ , as described in Remark 2.
4.: the induced fuzzy graphs $F_{i} : D_{i} \times T_{i} \to [0, 1]$ of $f_{i}$ satisfy

$P = \sum_{i} ρ_{i} \cdot ϕ_{i}^{*} (F_{i}) .$

(3)

Persistent homology [29,30,31] can be defined for fuzzy spaces

(X, P)

by using the filtration

X_{c} : = {x \in X : P (x) \geq c} \subset X

for

c \in [0, 1]

associated with

P

. We plan to study persistent homology for fuzzy logifolds in a future work.

4. Ensemble Learning and Logifolds

In this section, we briefly review the ensemble learning methods in [32] and make a mathematical formulation via logifolds. Moreover, in Section 4.4 and Section 4.5, we introduce the concept of fuzzy domain and develop a refined voting method based on this. We view each trained model as a chart given by a fuzzy linear logical function. The domain of each model can be a proper subset of its feature space defined by the inverse image of a proper subset of the target classes. For each trained model, a fuzzy domain is defined using the certainty score for each input, and only inputs which lie in this certain part are accepted. In [12], we demonstrated in experiments that this method produces improvements in accuracy compared to taking the average of outputs.

4.1. Mathematical Description of Neural Network Learning

Consider a subset

Z

of

R^{n} \times T

, where

T = {c_{1}, \dots, c_{N}}

, which we take as the domain of a model.

R^{n}

is typically referred to as the feature space, while each

c_{i} \in T

represents a class. We embed T as the corners of the standard simplex

S^{N - 1} \subset R^{N}

. One wants to find an expression of the probability distribution of

Z

in terms of a function produced by a neural network.

Definition 10.

The underlying graph of a neural network is a finite directed graph G. Each vertex v is associated with

R^{n_{v}}

for some

n_{v} \in Z_{> 0}

, together with a non-linear function

R^{n_{v}} \to R^{n_{v}}

called an activation function.

Let Θ be the vector space of linear representations. A linear representation associates each arrow

\vec{a}

with a linear map

R^{n_{s (\vec{a})}} \to R^{n_{t (\vec{a})}}

, where

s (\vec{a}), t (\vec{a})

are the source and target vertices, respectively.

Let us fix γ to be a linear combination of paths between two fixed vertices s and t in G. The associated network function

f_{θ} : R^{n_{s}} \to R^{n_{t}}

for each

θ \in Θ

is defined to be the corresponding function obtained by the sum of compositions of linear functions and activation functions along the paths of γ.

One would like to minimize the function

C_{Z} : Θ \to R

,

C_{Z} (θ) : = \sum_{(x, y) \in Z} {∥ f_{θ} (x) - y ∥}^{2},

which measures the distance between the graph of

f_{θ}

and

Z

. To do this, one takes a stochastic gradient descent(SGD) over

Θ

. In a discrete setting, it is given by the following equation:

θ_{k + 1} = θ_{k} - η \nabla C_{Z} - η W_{k},

where

η \in R_{> 0}

is called the step size or learning rate,

W_{k}

is the

k^{th}

noise or Brownian Motion, and

\nabla C_{Z}

denotes the gradient vector field of

C_{Z}

(in practice, the sample

Z

is divided into batches and C is the sum over a batch).

For practical purposes, the completion of the computational process is marked by the verification of epochs. Then the hyper-parameter space

H

for SGD is a subspace

{(η, Batch size, Epochs, Noise)} \subset R^{3} \times D

, where

D

is the space of

R

-valued random variables with zero mean and finite variance. This process is called the training procedure, and the resulting function

g = f_{θ_{*}}

is called a trained model, where

θ_{*}

is the minimizer. The

arg max g (x)

is called the prediction of g at x, which is well-defined almost everywhere. For

c_{i} \in T

,

g (x, c_{i}) : = g {(x)}_{i}

is called the certainty of the model for x being in class

c_{i}

.

4.2. A Brief Description of Ensemble Machine Learning

Ensemble machine learning utilizes more than one classifiers to make decisions. Dasarathy and Sheela [33] were early contributors to this theory, who proposed the partitioning feature space using multiple classifiers. Ensemble systems offer several advantages, including smoothing decision boundaries, reducing classifier bias, and addressing issues related to data volume. A widely accepted key to a successful ensemble system is achieving diversity among its classifiers. Refs. [7,34,35] provide good reviews of this theory.

Broadly speaking, designing an ensemble system involves determining how to obtain classifiers with diversity and how to combine their predictions effectively. Here, we briefly introduce popular methods. Bagging [36], short for bootstrap aggregating, trains multiple classifiers, each on a randomly sampled subset of the training dataset. Boosting, such as AdaBoost(Adaptive Boosting) [37,38] iteratively trains classifiers by focusing on the instances they misclassified in previous rounds. In Mixture of Experts [11], each classifier specializes in different tasks or subsets of dataset, with a ‘gating’ layer which determines weights for the combination of classifiers.

Given multiple classifiers, an ensemble system makes decisions based on predictions from diverse classifiers and a rule for combining predictions is necessary. This is usually done by taking a weighted sum of the predictions, see, for instance [39]. Moreover, the weights may also be tuned via a training process.

4.3. Logifold Structure

Let

(X, P)

be a fuzzy topological measure space with

μ (X) > 0

, where

μ

is the measure of X, which is taken as an idealistic dataset. For instance, it can be the set of all possible appearances of cats, dogs, and eggs. A sample fuzzy subset U of X is taken and is identified with a subset of

R^{n} \times T

. This identification is denoted by

ϕ : U \to R^{n} \times T

and

Z : = ϕ (U)

. For instance, this can be obtained by taking pictures in a certain number of pixels for some cats and dogs, and T is taken as the subset of labels {‘C’,‘D’}. By the mathematical procedure given above, we obtain a trained model g, which is a fuzzy linear logical function, denoted by

g = g_{(G, L, P, p)} : R^{n} \to S^{| T | - 1}

, where G is the neural network with one target vertex, and L and p are the affine linear maps and activation functions, respectively. This is the concept of a chart of X in Definition 9.

Let

g_{i} : R^{n_{i}} \to S^{| T_{i} | - 1}

be a number of trained models and

G_{i} : R^{n_{i}} \times T_{i} \to [0, 1]

be the corresponding certainty functions. Note that

n_{i}

and

T_{i}

can be distinct for different models. Their results are combined according to certain weight functions

ρ_{i} : X \to [0, 1]

and we get

\sum_{i} ρ_{i} (x) G_{i} (ϕ_{i} (x)) : X \to [0, 1],

where the sum is over those i whose corresponding charts

U_{i}

contain x.

This gives

P

in (3), and we obtain a fuzzy linear logifold.

In the following sections, we introduce the implementation details for finding fuzzy domains and the corresponding voting system for models with different domains.

4.4. Thick Targets and Specialization

We consider fuzzy subsets

U_{i} \subset X

and

Z_{i} \subset R^{n_{i}} \times T_{i}

with identification

ϕ_{i} : U_{i} \to Z_{i}

. A common fuzziness that we make use of comes from ‘thick targets’. For instance, let

T = {‘ C ’, ‘ D ’, ‘ E ’}

(continuing the example used in the last subsection). Consider

\tilde{T} = {{‘ C ’, ‘ D ’}, {‘ E ’}}

, which consists of the two classes

{‘ C ’, ‘ D ’}

and

{‘ E ’}

. We take a sample

\tilde{Z} \subset R^{n} \times \tilde{T}

consisting of pictures of cats, dogs, and eggs with the two possible labels ‘cats or dogs’ and ‘eggs’. Then, we train a model with two targets (

| \tilde{T} | = 2

), and obtain

\tilde{G} : R^{n} \times \tilde{T} \to [0, 1]

.

Definition 11.

Let T be a finite set and

\tilde{T}

be a subset of the power set

P (T)

such that

\emptyset \notin \tilde{T}

and

t \cap t^{'} = \emptyset

for all distinct

t, t^{'} \in \tilde{T}

.

1.: $t \in \tilde{T}$ is called thin (or fine) if it is a singleton and thick otherwise.
2.: The union $⋃ \tilde{T} \subset T$ is called the flattening of $\tilde{T}$ .
3.: We say that $\tilde{T}$ is full if its flattening is T and fine if all its elements are thin.

Given a model

g_{i} : R^{n_{i}} \to S^{| {\tilde{T}}_{i} | - 1}

, where

g_{i} = (g_{i, 1}, \dots, g_{i, | {\tilde{T}}_{i} |})

, define the certainty function of

g_{i}

as

C_{i} : = {max}_{j} g_{i, j}

. For

α \in [0, 1]

,

Z_{α, i} : = {(x, y) \in Z_{i} : C_{i} (x) \geq α}

is called the certain part of the model

g_{i}

, or fuzzy domain of

g_{i}

, at certainty threshold

α

in

Z_{i}

. Let

M

denote the collection of trained model

{g_{i}}_{i \in I}

with identifications

ϕ_{i} : U_{i} \subset X \to Z_{i}

. The union

X_{α} = ⋃_{i \in I} ϕ_{i}^{- 1} (Z_{α, i})

is called the certain part with certainty threshold α of

M

. For instance, in the dataset of appearances of dogs, cats, and eggs, suppose we have a model

g_{i}

with target

T_{i} = {‘ C ’, ‘ D ’}

.

ϕ_{i}^{- 1} (Z_{α, i})

is the subset of appearance of cats and dogs sampled by the set of labeled pictures

Z_{i} \subset R^{n_{i}} \times T_{i}

that has certainty

\geq α

by the model. Note that as

α

decreases, there must be a greater or equal number of models satisfying the conditions, in particular,

X_{α} \subset X_{α^{'}}

for

α^{'} < α

.

Table 1 summarizes the notations introduced here.

Table 1. Summary of frequently used symbols. Let X denote the dataset equipped with a topology and a measure representing its distribution in real-world space. Following Definition 9, we define the fuzzy linear Logifold as the collection

M = {(U_{i}, ϕ_{i}, g_{i})}_{i \in I}

, where each triple consists of a domain

U_{i} \subset X

, an identification map

ϕ_{i} : U_{i} \to R^{n_{i}} \times {\tilde{T}}_{i}

, and a trained model

g_{i}

. For brevity, we refer to

M

as the set of trained models

{g_{i}}_{i \in I}

. Note that

U_{i} = Z_{α, i .}

for some certainty threshold

α

in application.

One effective method we use to generate more charts of different types is to change the target of a model, which we call specialization. A network function

f_{θ} : R^{n} \to R^{| \tilde{T} |}

can be turned into a ‘specialist’ for a new target classes

{\tilde{T}}^{'} = {t_{1}, \dots, t_{m}}

, where each new target

t_{i} \in {\tilde{T}}^{'}

is a proper subset of target classes

\tilde{T}

of f such that

t_{i} \cap t_{j} = \emptyset

if

t_{i} \neq t_{j}

.

Let G and t be the underlying graph of a neural network function and associated target vertex. By adding one more vertex u and adjoining it to the target vertex t of G, we can associate a function which is the composition of linear and activation functions along the arrow

\vec{a}

whose source is t and the target is u, with u associated with

R^{m}

. This results in a network function

g_{θ^{'}} : R^{| \tilde{T} |} \to R^{m}

with underlying graph

t \overset{\vec{a}}{⟶} u

. By composing f and g, we obtain

{\tilde{f}}_{(θ^{'}, θ)} = g_{θ^{'}} \circ f_{θ}

, whose target classes are

t_{1}, \dots, t_{m}

, with the concatenated graph consisting of G and

t \overset{\vec{a}}{⟶} u

. Training the obtained network function

{\tilde{f}}_{(θ^{'}, θ)}

is called a specialization.

4.5. Voting System

We assume the above setup and notations for a dataset X and a collection of trained models

M

. We introduce a voting system that utilizes fuzzy domains and incorporates trained models with different targets. Pseudo-codes for the implementation are provided in Appendix A.

Let

T = {c_{1}, \dots, c_{N}}

be a given set of target classes, and suppose a measurable function

f : X \to T

is given. In practice, this function is inferred from a statistical sample. We will compare f with the prediction obtained from

M

.

Consider the subcollection of models

g_{i}

in

M

which have flattened targets

T_{i} \subset T

. By abuse of notation, we still denote this subcollection by

M

. We assume the following.

If the flattened target $T_{i}$ of a model is minimal in the sense that $T_{j} \neg \subset T_{i}$ for any other $j \neq i$ , then ${\tilde{T}}_{i}$ is fine, that is, all its elements are singleton.
Every target classes $t_{1}, \dots, t_{k}$ in a target set ${t_{1}, \dots, t_{k}}$ has no intersection.
$M$ has a trained model whose flattened target equals $T = {c_{1}, \dots, c_{N}}$ .

Below, we first define a target graph, in which each node corresponds to a flattened target

T^{'}

. Next, we consider predictions from the collection of models with the same flattened target

T_{i} = T^{'}

for some

T^{'} \subset T

. We then combine predictions from nodes along a path in a directed graph with no oriented cycle.

4.5.1. Construction of the Target Graph

We assign a partial order to the collection of trained models

M

, where the weighted answers from each trained model accumulate according to this order. The partial order is encoded by a directed graph with no oriented cycle, which we call a target graph. Define

T

as the collection of flattenings, that is,

T : = \{T_{i} : i \in I\} .

Among the flattenings in

T

, the partially ordered subset relation induces a directed graph that has a single source vertex, called the root node, which is associated with the given set of target classes

T = {c_{1}, \dots, c_{N}}

.

Let

T

denote the set of nodes in the target graph. For each node

s \in T

, let

T_{s} \in T

denote the associated flattening of the target, and define

I_{s}

as the index set

I_{s} : = {i \in I ∣ T_{i} = T_{s}},

which records the indices of the trained models corresponding to node s. By abuse of notation,

T

also refers to the target graph itself, and

g_{i} \in I_{v}

indicates that

g_{i}

is the trained model whose index i belongs to

I_{v}

.

We define the refinement of targets, which allow us to systematically combine the predictions from multiple models at each node.

Definition 12.

Let T be a finite set and suppose we have a collection of subsets

{\tilde{T}}_{i}

of the power set

P (T)

for

i \in I

such that for each i, its flattening equals T, that is,

⋃ {\tilde{T}}_{i} = T

; moreover,

\emptyset \notin {\tilde{T}}_{i}

and

t \cap t^{'} = \emptyset

for all distinct

t, t^{'} \in {\tilde{T}}_{i}

. The common refinement is defined as

\{⋂_{i \in I} t_{i} : {(t_{i})}_{i \in I} \in \prod_{i \in I} {\tilde{T}}_{i}\} ∖ \{\emptyset\} .

At each node

v \in T

, we consider the collection

{{\tilde{T}}_{i}}_{i \in I_{v}}

of targets of all models at the node, and we take its common refinement

{\bar{T}}_{v}

. See Example 2.

4.5.2. Voting Rule for Multiple Models Sharing the Same Target

Let

I_{E}

denote the characteristic function of a measurable set E, defined as

I_{E} (x) = {\begin{cases} 1 & if x \in E \\ 0 & otherwise, \end{cases}

where

E \subset X

or

E \subset R^{n}

for some positive integer n.

Let

(U_{i}, ϕ_{i}, g_{i})

be a triple consisting of trained model

g_{i} : R^{n_{i}} \to {\tilde{T}}_{i}

, fuzzy subset

U_{i} \subset X

, and identification

ϕ_{i} : U_{i} \to Z_{i} \subset R^{n_{i}} \times {\tilde{T}}_{i}

. Let

\begin{matrix} {\hat{x}}_{i} : = & π_{1} \circ ϕ_{i} (x) \in R^{n_{i}}, \\ {\hat{y}}_{i} : = & π_{2} \circ ϕ_{i} (x) \in {\tilde{T}}_{i} \end{matrix}

denote the feature and output of realized data for each

x \in U_{i}

via identification

ϕ_{i}

with the projection maps

π_{1}

and

π_{2}

from

R^{n_{i}} \times {\tilde{T}}_{i}

onto their first and second components, respectively.

The accuracy function

Φ_{i}

of the trained model

g_{i}

over

Z_{i}

at certainty threshold α is defined as

Φ_{i} (α) : = \frac{|{({\hat{x}}_{i}, {\hat{y}}_{i}) \in Z_{α, i} : P_{i} ({\hat{x}}_{i}) = {\hat{y}}_{i}}|}{|Z_{α, i}|} .

Here, we denote the measure of a subset Z by

| Z |

.

Let

G = {g_{1}, \dots, g_{m}}

be a family of trained models sharing the same target set

\tilde{T} = {t_{1}, \dots, t_{k}}

with accuracies

Φ_{1}, \dots, Φ_{m}

, respectively. We define the weighted answer from

G

, a group of trained models sharing the same targets for

x \in X

at a certainty threshold α as

p_{G} (α, x) : = (p_{G} {(α, x)}_{1}, \dots, p_{G} {(α, x)}_{k}) : = {(\frac{\sum_{i} I_{X_{α, i}} (x) Φ_{i} (α) g_{i, j} ({\hat{x}}_{i})}{Φ_{G} (α)})}_{j = 1, \dots, k},

where

Φ_{G} (α) : = \sum_{i} I_{{C_{i} (p_{i}) \geq α}} (p_{i}) Φ_{i} (α)

is the accuracy function of models in

G

with certainty threshold

α

, and

X_{α, i} : = ϕ^{- 1} (Z_{α, i})

is the certain part of

g_{i}

in X with certainty threshold

α

for each

i = 1, \dots, m

. If

Φ_{G} (α) = 0

, then we define

p_{G} (α, x) = 0 \in R^{k}

. See Example 2.

4.5.3. Voting Rule at a Node

Let v be a node in the target graph

T

associated with flattened target

T_{v}

, refinements

{\bar{T}}_{v}

, and associated models

I_{v}

. Consider the collection of all distinct target sets of models in

I_{v}

, that is

{{\tilde{T}}_{i}}_{i \in I_{v}} = {{\tilde{T}}_{v, 1}, \dots, {\tilde{T}}_{v, n_{v}}}

for some positive integer

n_{v}

. Let

G_{i}

denote the family of models sharing the same target

{\tilde{T}}_{v, i} = {t_{i, 1}, \dots, t_{i, k_{i}}}

for

i = 1, \dots, n_{v}

.

For each family of models

G_{i}

, we have a combined answer vector

p_{G_{i}} \in R^{k_{i}}

with the accuracy function

Φ_{G_{i}}

. Define

Ψ_{G_{i}}

as the weight function for the family of models

G_{i}

as

Ψ_{G_{i}} : = \{\begin{matrix} \frac{Φ_{G_{i}}}{\sum Φ_{G_{i}}} if \sum Φ_{G_{i}} \neq 0, \\ 0 otherwise \end{matrix}

for each

i = 1, \dots, n_{v}

.

Since each

p_{G_{i}} {(α, x)}_{j}

, the j-th component of the weighted answer from

G_{i}

for x at certainty threshold

α

indicates how much it predicts x to be classified into target

t_{i, j} \in {\tilde{T}}_{i}

at certainty threshold

α

, we can multiply these ‘scores’ to compute the overall agreement on

\cap_{i = 1}^{n_{v}} t_{i, j_{i}}

among the families

G_{i}

. For a given tuple

{(t_{i, j_{i}})}_{i} \in \prod_{i = 1}^{n_{v}} {\tilde{T}}_{v, i}

with multi-index

J = (j_{1}, \dots, j_{n_{v}})

, define combined answer on

t_{J}

(at node v) as

p_{J} (α, x) = \prod_{i = 1}^{n_{v}} p_{G_{i}} {(α, x)}_{j_{i}},

where

t_{J}

is the tuple

{(t_{i, j_{i}})}_{i = 1, \dots, n_{v}}

. Let

J_{v}

be the set of all possible indices J.

Let

{\bar{T}}_{v} = {{\bar{t}}_{1}, \dots, {\bar{t}}_{k}}

be the collection of refinements at node v. Then there exist unique indices

J_{1}, \dots, J_{k}

such that

{\bar{t}}_{s} = ⋂ t_{J_{s}} : = t_{1, j_{1}} \cap \dots \cap t_{n_{v}, j_{n_{v}}}

, where

J_{s} = (j_{1}, \dots, j_{n_{v}})

for each

s = 1, \dots, k

. For each multi-index J in

J_{v}

, we call J an invalid combination if

\cap t_{J} = \emptyset

and a valid combination otherwise, that is,

\cap t_{J} \in {\bar{T}}_{v}

. For an invalid combination

J = (j_{1}, \dots, j_{n_{v}})

, we define the contribution factor

β

to distribute their combined answer

p_{J}

to other refinements as follows:

β (t_{i, j_{i}}, {\bar{t}}_{s}) = \{\begin{matrix} \frac{p_{J_{s}}}{\sum_{{\bar{t}}_{m} \subset t_{i, j_{i}}} p_{J_{m}}} & if {\bar{t}}_{s} \subset t_{i, j_{i}} \\ 0 & otherwise \end{matrix}

(4)

for each

i = 1, \dots, n_{v}

and a valid refinement

{\bar{t}}_{s} \in {\bar{T}}_{v}

. Since

p_{J} = p_{J} \cdot (Ψ_{G_{1}} + \dots + Ψ_{G_{n_{v}}})

, we can decompose

p_{J}

into

\begin{matrix} p_{J} = & p_{J} Ψ_{G_{1}} β (t_{1, j_{1}}, {\bar{t}}_{1}) + \dots + p_{J} Ψ_{G_{n_{v}}} β (t_{n_{v}, j_{n_{v}}}, {\bar{t}}_{1}) \\ + \dots + p_{J} Ψ_{G_{1}} β (t_{1, j_{1}}, {\bar{t}}_{k}) + \dots + p_{J} Ψ_{G_{n_{v}}} β (t_{n_{v}, j_{n_{v}}}, {\bar{t}}_{k}), \end{matrix}

as

\sum_{s = 1}^{k} β (t_{i, j_{i}}, {\bar{t}}_{s}) = 1

for any

t_{i, j_{i}}

. Then, we define

A

as the answer at node v, a function from X to

R^{k}

at certainty threshold

α

, as

A_{v} = {(p_{J_{s}} + \sum_{\begin{matrix} J \in J_{v, invalid} \\ J = (j_{1}, \dots, j_{n_{v}}) \end{matrix}} \sum_{i = 1}^{n_{v}} p_{J} Ψ_{G_{i}} β (t_{i, j_{i}}, {\bar{t}}_{s}))}_{s = 1, \dots, k},

where

J_{v, invalid}

is the collection of invalid combinations at node v. See Example 2.

Example 2

(An example of the voting procedure at a node). For a given node

v \in T

, let the flattened target

T_{v} = {c_{1}, c_{2}, c_{3}, c_{4}, c_{5}}

and the indices of models

I_{v} = {(1, 1), (1, 2), 2}

be associated with v. Suppose that the two models

g_{1, 1}

and

g_{1, 2}

share the target set

{\tilde{T}}_{1}

, and

{\tilde{T}}_{2}

denotes the target set of

g_{2}

, where

\begin{matrix} {\tilde{T}}_{1} & = {t_{1, 1}, t_{1, 2}} = {{c_{1}, c_{2}, c_{3}}, {c_{4}, c_{5}}}, \\ {\tilde{T}}_{2} & = {t_{2, 1}, t_{2, 2}, t_{2, 3}} = {{c_{1}, c_{2}}, {c_{3}, c_{4}}, {c_{5}}} . \end{matrix}

Their refinement

{\bar{T}}_{v}

is

{{c_{1}, c_{2}}, {c_{3}}, {c_{4}}, {c_{5}}}

. Let

G_{1} = {g_{1, 1}, g_{1, 2}}

and

G_{2} = {g_{2}}

, the collections of models sharing the same targets.

Suppose that the accuracy functions

Φ_{1, 1}, Φ_{1, 2}, Φ_{2}

of models

g_{1, 1}, g_{1, 2}, g_{2}

are given, respectively. For simplicity, we look at certainty threshold

α_{0} = 0

and suppress our notation reserved for the certainty threshold α. Let an instance x be given, and trained models provide answers for x as follows:

\begin{matrix} g_{1, 1} & = (a_{1, 1}, a_{1, 2}), & g_{1, 2} & = (a_{2, 1}, a_{2, 2}), & g_{2} & = (b_{1}, b_{2}, b_{3}) . \end{matrix}

Then, we have the weighted answers

p_{G}

from each collection of models sharing the same targets

G

as follows:

\begin{matrix} p_{G_{1}} & = (\frac{Φ_{1, 1} a_{1, 1} + Φ_{1, 2} a_{2, 1}}{Φ_{G_{1}}}, \frac{Φ_{1, 1} a_{1, 2} + Φ_{1, 2} a_{2, 2}}{Φ_{G_{1}}}) : = (a_{1}, a_{2}), \\ p_{G_{2}} & = (b_{1}, b_{2}, b_{3}), \end{matrix}

where

Φ_{G_{1}} = Φ_{1, 1} + Φ_{1, 2}

and

Φ_{G_{1}} = Φ_{2}

are the accuracy functions of

G_{1}

and

G_{2}

, respectively. Additionally, we have the weight functions

Ψ_{G_{1}} = \frac{Φ_{G_{1}}}{Φ_{G_{1}} + Φ_{G_{2}}}

and

Ψ_{G_{2}} = \frac{Φ_{G_{2}}}{Φ_{G_{1}} + Φ_{G_{2}}}

.

Let

J_{v}

, the collection of all combinations, be

{J_{1}, J_{2}, J_{3}, J_{4}, J_{5}, J_{6}}

, where

\begin{matrix} t_{J_{1}} & = {t_{1, 1}, t_{2, 1}}, & t_{J_{2}} & = {t_{1, 1}, t_{2, 2}}, & t_{J_{3}} & = {t_{1, 1}, t_{2, 3}}, \\ t_{J_{4}} & = {t_{1, 2}, t_{2, 1}}, & t_{J_{5}} & = {t_{1, 2}, t_{2, 2}}, & t_{J_{6}} & = {t_{1, 2}, t_{2, 3}} . \end{matrix}

Note that

J_{3}

and

J_{4}

are invalid combinations at node v, and the refinements of valid combinations are

\cap t_{J_{1}} = {c_{1}, c_{2}}, \cap t_{J_{2}} = {c_{3}}, \cap t_{J_{5}} = {c_{4}}

, and

\cap t_{J_{6}} = {c_{5}}

. We compute the combined answer for each

t_{J_{i}}

as

\begin{matrix} p_{J_{1}} = & a_{1} b_{1}, & p_{J_{2}} = & a_{1} b_{2}, & p_{J_{3}} = & a_{1} b_{3}, \\ p_{J_{4}} = & a_{2} b_{1}, & p_{J_{5}} = & a_{2} b_{2}, & p_{J_{6}} = & a_{2} b_{3} . \end{matrix}

Then, the nontrivial contribution factors β of

J_{3}

defined in Equation (4) are

\begin{matrix} β (t_{1, 1}, {c_{1}, c_{2}}) & = \frac{p_{J_{1}}}{p_{J_{1}} + p_{J_{2}}} = \frac{b_{1}}{b_{1} + b_{2}}, \\ β (t_{1, 1}, {c_{3}}) & = \frac{p_{J_{2}}}{p_{J_{1}} + p_{J_{2}}} = \frac{b_{2}}{b_{1} + b_{2}}, \\ β (t_{2, 3}, {c_{5}}) & = 1, \end{matrix}

as

t_{1, 1} = {c_{1}, c_{2}, c_{3}}

and

t_{2, 3} = {c_{5}}

, and β of

J_{4}

are

\begin{matrix} β (t_{1, 2}, {c_{4}}) & = \frac{b_{2}}{b_{2} + b_{3}}, \\ β (t_{1, 2}, {c_{5}}) & = \frac{b_{3}}{b_{2} + b_{3}}, \\ β (t_{2, 1}, {c_{1}, c_{2}}) & = 1, \end{matrix}

as

t_{1, 2} = {c_{4}, c_{5}}

and

t_{2, 1} = {c_{1}, c_{2}}

. Therefore, the answer

A_{v}

at node v for x is

\begin{matrix} A_{v} (x) & = (a_{1} b_{1} + \frac{a_{1} b_{1} b_{3} Ψ_{G_{1}}}{b_{1} + b_{2}} + a_{2} b_{1} Ψ_{G_{2}}, a_{1} b_{2} + \frac{a_{1} b_{2} b_{3} Ψ_{G_{1}}}{b_{2} + b_{3}} \\ , a_{2} b_{2} + \frac{a_{2} b_{1} b_{2} Ψ_{G_{1}}}{b_{2} + b_{3}}, a_{2} b_{3} + a_{1} b_{3} Ψ_{G_{2}} + \frac{a_{2} b_{1} b_{3} Ψ_{G_{1}}}{b_{2} + b_{3}}) . \end{matrix}

4.5.4. Accumulation of Votes Along Valid Paths

For a target class

c \in T

, a sequence of nodes

γ = (s_{0}, s_{1}, \dots, s_{m})

in

T

is called a valid path for c if

γ

satisfies the following conditions:

$s_{0}$ is the root of $T$ .
$c \in T_{s_{i}}$ for all $i = 0, \dots, m$ .
${\bar{T}}_{s_{m}}$ consists of thin targets.

Let

γ = (s_{0}, s_{1}, \dots, s_{m})

be a valid path for a class

c \in T

, where

s_{0}

is the root of

T

. Since each trained model provides prediction independently, we define

M (γ, α, x)

, the weighted answer for x at certainty threshold α along a valid path γ, as the product

M (γ, α, x) = {(\prod_{i = 0}^{m} A_{s_{i}} {(α, x)}_{j_{i, t}})}_{t \in {\bar{T}}_{s_{m}}} = (M_{t_{1}}, \dots, M_{t_{| {\bar{T}}_{s_{m}} |}}),

(5)

which represents how much

M

predicts x to be classified in each target

t \in {\bar{T}}_{s_{m}}

along the path

γ

. Here,

j_{i, t}

is the index of

\bar{t} \in {\bar{T}}_{s_{i}}

such that

\bar{t}

is the the unique refinement in

{\bar{T}}_{s_{i}}

containing c. Then, define

P (γ, α, x)

, the prediction for x at certainty threshold α along a valid path γ, as the

arg max M (γ, α, x)

.

Remark 9.

Under the specialization method explained in Section 4.4, we can construct the ‘gating layer’ as in the Mixture of Expert [11] using this voting strategy. Let g be a trained model in

M

and the targets of g be

\tilde{T} = \{t_{1}, \dots, t_{k}\}

, where

t_{i} = {c_{i, 1}, \dots, c_{i, n_{i}}}

for

i = 1, \dots, k

such that

T = {c_{1, 1}, \dots, c_{k, n_{k}}}

. Then g serves as the ‘gating’ layer in

M

navigating an instance to other trained models that are trained on a dataset containing classes exclusively within a target

t_{i}

of

\tilde{T}

. See [12] for the experimental results.

4.5.5. Vote Using Validation History

We introduce the ‘using validation history’ method in prediction to alleviate concerns regarding the optimal valid path or certainty threshold. In other words,

α

and

γ

in Equation (5) are fixed through this method based on the validation dataset.

Let

X_{val}

be a measurable subset of X with

μ (X_{val}) < \infty

, where

μ

is the measure of X. Let

Γ (c)

denote the set of all valid paths in the target graph

T

for a class c. Given the true label function

f : X \to T

, we define r as the expected accuracy along a path γ at certainty threshold α to class c as follows:

r_{c} (γ, α) : = \frac{|({P (γ, α, x) = c} \cap f^{- 1} (c)) \cup {P (γ, α, x) \neq c and f (x) \neq c}|}{|X_{val}|},

where

γ \in Γ (c)

. Since there is a finite number of valid paths for each target class and

α \in [0, 1]

, there exists a tuple of maximizers

(γ^{*} (c), α^{*} (c))

for each class c such that the expected accuracy r attains its supremum at

α^{*} (c)

. We define the answer using the validation history for

x \in X

as

M (x) = (M_{c_{1}} (γ^{*} (c_{1}), α^{*} (c_{1}), x), \dots, M_{c_{N}} (γ^{*} (c_{N}), α^{*} (c_{N}), x)) .

Remark 10.

Let a system of trained models

M

and validation dataset

X_{val}

be given. The combining rule using the validation history for each trained model serves as the role of ρ in Definition 9, and

P : X \to [0, 1]

is defined by the vote of

M

using the validation history.

Remark 11.

To use the validation history, every model in the Logifold must first be evaluated on a validation dataset. Once this evaluation is completed, the time and space complexity of prediction become proportional to the number of models involved along the predetermined paths, which are bounded above by the total number of models in

M

.

Constructing the target tree requires searching for partial subset relations under the assumption of a unique maximal element. In the worst case, this process has a complexity of

O (T^{2})

, where T is the total number of target classes.

This bound arises because, in the worst case, the root may have

T - 1

children, such as

{1, \dots, T - 1}

,

{1, \dots, T - 2, T}

, …,

{2, \dots, T}

. Each of these children can, in turn, have up to

T - 2

children, and so on. When a new model is inserted into the tree, its position must be determined by repeatedly checking subset relations at each level. At most, this search requires

1 + (T - 1) + (T - 2) + \dots + 2

operations and memories, leading to a total complexity proportional to

O (T^{2})

in both time and space. Thus, constructing the full target tree or inserting a new model in the worst case involves quadratic growth in the number of target classes.

5. Experiments and Results

We summarize the results of two experiments from our previous paper [12] on well-known benchmark datasets as follows: CIFAR10 [13], MNIST [14], and Fashion MNIST [15].

The objective of first experiment was to demonstrate the logifold’s ability to mitigate the negative impact of poor-performing models within an ensemble. The setup involved combining six low-accuracy Simple CNN models (with an average accuracy of 56.45%) with one high-accuracy ResNet20 [40] model (achieving 85.96% accuracy) on the CIFAR10 test set (10,000 images). The logifold achieved a notable 84.86% accuracy, which outperformed simple averaging (62.55%) and majority voting (58.72%). This outcome underscores Logifold’s capability to leverage the contributions of the most well-trained component model, even when poor performance models are prevalent. Results are detailed in Table 2.

Table 2. Accuracy results for varying certainty thresholds. Thresholds are selected as

σ (- \infty), σ (0), σ (1), \dots, σ (7)

, where

σ (x) : = {(1 + e^{- x})}^{- 1}

denotes the sigmoid function. The second column indicates accuracy using the refined voting system without validation history; thus, the accuracy at the 0 threshold reflects standard weighted voting based on validation performance. The third column shows accuracy within the subset of data classified above each certainty threshold. Baseline comparisons (simple average and majority voting) and refined voting using validation history explained in Section 4.5.5 are summarized below the table.

The second experiment highlights the flexibility of the Logifold structure when integrating models trained on diverse target datasets. We combined MNIST, Fashion MNIST, and CIFAR10 datasets by resizing MNIST and Fashion MNIST images from

28 \times 28

grayscale to

32 \times 32

RGB images, matching the CIFAR10 dataset dimensions. This concatenation created a unified dataset with 30 distinct target classes: 10 classes from each dataset. We trained a ResNet20 [40] model T on the combined dataset and ResNet20 models M, F, and C, each exclusively trained on MNIST, Fashion MNIST, and CIFAR10, respectively. Additionally, a filter model

T^{'}

was derived through the specialization method explained in Section 4.4. Evaluated on a combined testing dataset of 30,000 samples (10,000 per dataset), the Logifold comprising

T^{'}

, M, F, and C achieved 89.31% accuracy using the refined voting with validation history method, outperforming that of a single model T at 75.16%. When model T was included alongside

T^{'}

, M, F, and C, the accuracy remained at 75.16% for simple averaging, weithed averaging or majority voting methods due to predictions of model T dominating these voting systems. However, refined voting system achieved 90.78%. Table 3 summarizes the results.

Table 3. Accuracy results for various Logifold compositions evaluated on the combined test set from MNIST, Fashion MNIST, and CIFAR10 datasets. The second column reports results obtained through non-refined voting methods (simple average, weighted average, or majority voting). The third column provides accuracy using refined voting with validation history.

6. Conclusions

The inherently discrete and often irregular nature of real-world data calls for a new mathematical framework that more faithfully models such complexity. Traditional manifold theory, while useful for interpolating data points, often falls short of addressing the essential discreteness and fuzziness needed in contemporary data science applications. Building on the local-to-global principle from classical geometry, we adapt it into a measure-theoretic context, deliberately moving beyond idealized assumptions like smoothness. We design frameworks that can handle the discrete, high-dimensional, and heterogeneous characteristics typical of modern datasets. Within this perspective, we propose the logifold structure (Definition 8)—a geometric construction that models the local behaviors of classifiers as “charts” and integrates them coherently into a global structure.

A key conceptual choice in this framework is to use the graphs of linear logical functions (Definition 1) as the fundamental local models. Instead of relying on a single global predictor, the logifold represents the data through a collection of these local, interpretable views—each similar in spirit to neural network models. Supported by the universality theorem (Theorem 5), this design ensures that the entire data space can be adequately covered by an atlas of specialized and diverse models. Each local model remains reliable within its domain, and together they form a coherent and mathematically grounded global ensemble. This research design, therefore, directly addresses the non-Euclidean, discrete, and high-dimensional challenges of real-world data by integrating well-understood local structures into a unified, flexible global framework.

We support our theoretical framework with an implementation of computer algorithms and experiments. In our previous work [12], we tested logifolds on the following well-known benchmark datasets: CIFAR10, MNIST, and Fashion MNIST. The experiments demonstrated that logifolds can effectively defend against poor outputs from underperforming models within an ensemble and flexibly combine models specialized for different target domains. Notably, the logifold ensemble substantially outperformed both simple averaging and majority voting baselines, and it also achieved higher accuracy than single models trained on the full task.

In future research, we shall focus on developing sophisticated, adaptive, and potentially learned methods for defining and enforcing model domains. This is essential for moving the logifold concept from a theoretical and experimental success toward broader practical adoption—particularly in settings involving adversarial attacks, out-of-distribution data, or highly dynamic environments where model uncertainty and reliability are critical.

Author Contributions

Conceptualization, S.-C.L.; methodology, I.J. and S.-C.L.; software, I.J. and S.-C.L.; validation, I.J.; formal analysis, I.J. and S.-C.L.; investigation, I.J. and S.-C.L.; resources, S.-C.L.; writing—original draft preparation, I.J. and S.-C.L.; writing—review and editing, I.J. and S.-C.L.; visualization, I.J. and S.-C.L.; supervision, S.-C.L.; project administration, S.-C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Pseudo-Codes

In this appendix, we provide pseudocode implementations for the algorithms discussed in Section 4. Given classification classes are

c_{1}, \dots, c_{N}

.

For instance, in Algorithm A1 with Example 2, we have the following six combinations:

\begin{matrix} {CB}_{1} : = & (t_{1, 1}, t_{2, 1}) = ({c_{1}, c_{2}, c_{3}}, {c_{1}, c_{2}}), & {CB}_{4} : = (t_{1, 2}, t_{2, 1}) & = ({c_{4}, c_{5}}, {c_{1}, c_{2}}), \\ {CB}_{2} : = & (t_{1, 1}, t_{2, 2}) = ({c_{1}, c_{2}, c_{3}}, {c_{3}, c_{4}}), & {CB}_{5} : = (t_{1, 2}, t_{2, 2}) & = ({c_{4}, c_{5}}, {c_{3}, c_{4}}), \\ {CB}_{3} : = & (t_{1, 1}, t_{2, 3}) = ({c_{1}, c_{2}, c_{3}}, {c_{5}}), & {CB}_{6} : = (t_{1, 2}, t_{2, 3}) & = ({c_{4}, c_{5}}, {c_{5}}) . \end{matrix}

Let

{\bar{t}}_{i}

denote the refinement obtained by the combination

{CB}_{i}

for

i = 1, \dots, 6

. In other words,

{CB}_{i} . refinement = {\bar{t}}_{i}

and

{\bar{t}}_{i} . component = {CB}_{i}

. Then

{\bar{t}}_{1} = {c_{1}, c_{2}}, {\bar{t}}_{2} = {c_{3}}, {\bar{t}}_{3} = {\bar{t}}_{4} = \emptyset, {\bar{t}}_{5} = {c_{4}}

, and

{\bar{t}}_{6} = {c_{5}}

.

{CB}_{3}

and

{CB}_{4}

are invalid combinations, and

{\bar{t}}_{1}, {\bar{t}}_{2}, {\bar{t}}_{5}

and

{\bar{t}}_{6}

are valid refinements. Therefore, we have

\begin{matrix} \bar{T} . allCombinations = & \{{CB}_{i} : i = 1, \dots, 6\}, \\ \bar{T} . validCombinations = & \{{CB}_{1}, {CB}_{2}, {CB}_{5}, {CB}_{6}\}, \\ \bar{T} . valid = & {{\bar{t}}_{1}, {\bar{t}}_{2}, {\bar{t}}_{5}, {\bar{t}}_{6}} . \end{matrix}

Algorithm A1 Refinement

Input: Targets

T_{1}, \dots, T_{n}

with

T = \cup T_{1} = \dots = \cup T_{n}

Output: Refinement

\bar{T}

1:: Initialize $\bar{T}$
2:: Combinations ← all combinations $(t_{1}, \dots, t_{n}) \in \prod_{i}^{n} T_{i}$
3:: for $c = (t_{1}, \dots, t_{n}) \in$ Combinations do
4:: $\bar{t} \leftarrow ⋂ c = t_{1} \cap \dots \cap t_{n}$
5:: $c . refinement \leftarrow \bar{t}$
6:: if $\bar{t} \neq \emptyset$ then
7:: $\bar{t} . component \leftarrow c$
8:: Append $\bar{t}$ to $\bar{T} . valid$
9:: Append c to $\bar{T} . validCombinations$
10:: end if
11:: Append c to $\bar{T} . allCombinations$
12:: end for
13:: return $\bar{T}$

Algorithm A2 Construct target graph

Input: Trained models

g_{1}, \dots, g_{n}

and corresponding targets

{\tilde{T}}_{1}, \dots, {\tilde{T}}_{n}

.
Output: Target graph

T

.

1:: Re-index ${{\tilde{T}}_{1}, \dots, {\tilde{T}}_{n}}$ as ${{\tilde{T}}_{1}, \dots, {\tilde{T}}_{m}}$ such that all elements are distinct.
2:: $G_{1}, \dots, G_{m} \leftarrow$ corresponding collections of trained models associated with ${\tilde{T}}_{1}, \dots, {\tilde{T}}_{m}$ ▹Group models sharing the same targets together
3:: Initialize an array $T$
4:: for $\tilde{T}$ runs over ${\tilde{T}}_{1}, \dots, {\tilde{T}}_{m}$ do
5:: $T \leftarrow ⋃ \tilde{T}$
6:: if $T \notin T$ then Append T to $T$
7:: end if
8:: end for
9:: Sort $T$ by decreasing size
10:: for T runs over $T$ do
11:: Add node T to $T$
12:: Initialize $T . nextNodes$
13:: $i \leftarrow$ be the index of T in $T$
14:: for $T^{'} \in T [0, \dots, i - 1]$ do
15:: if $T \subset T^{'}$ then Append T to $T^{'} . nextNodes$
16:: end if
17:: end for
18:: Initialize $T . targetsAndModels$
19:: for $\tilde{T} \in {{\tilde{T}}_{1}, \dots, {\tilde{T}}_{m}}$ do
20:: if $\cup \tilde{T} = T$ then
21:: Add $\tilde{T}$ and corresponding group of models G to $T . targetsAndModels$
22:: end if
23:: end for
24:: Initialize $T . models$
25:: $\{{({\tilde{T}}_{j}, G_{j})}_{j \in Λ}\} \leftarrow T . targetsAndModels$ ▹ $Λ$ is a finite index set
26:: Add $⋃_{j \in Λ} G_{j}$ to $T . models$
27:: $T . refinement \leftarrow REFINEMENT ({({\tilde{T}}_{j})}_{j \in Λ})$
28:: end for
29:: return $T$

Algorithm A3 Fuzzy accuracy

Input: Threshold

α \in [0, 1]

, Model g, Dataset

Z = {(x, y_{x})}

Output: Accuracy

Φ

and Certain Part

1:: Initialize CertainPart and $Φ$
2:: $CertainPart (α) \leftarrow {x ∣ max g (x) \geq α}$
3:: $Φ (α) \leftarrow \frac{| {x \in CertainPart (α) ∣ arg max g (x) = y_{x}} |}{| CertainPart (α) |}$
4:: return $Φ, CertainPart$

Algorithm A4 Total weight and rho

Input: Threshold

α \in [0, 1]

, Models

G = {g_{1}, \dots, g_{n}}

with accuracies

Φ_{g_{1}}, \dots, Φ_{g_{n}}

, Instance x
Output: Sum of weight by Accuracy

Φ

and

{(ρ_{g})}_{g \in G}

1:: for $g \in G$ do
2:: if $max g (x) \geq α$ then
3:: $ρ_{g} (α, x) \leftarrow 1$
4:: else
5:: $ρ_{g} (α, x) \leftarrow ϵ$ ▹ $ϵ$ can be any sufficiently small number.
6:: end if
7:: end for
8:: $Φ (α, x) \leftarrow \sum_{g \in G} ρ_{g} (α, x) Φ_{g} (α)$
9:: return $Φ, {(ρ_{g})}_{g \in G}$

Algorithm A5 Voting rule for shared targets

Input: Threshold

α \in [0, 1]

, Targets

T = {t_{1}, \dots, t_{n}}

, Models

G = {g_{1}, \dots, g_{k}}

with accuracies

Φ_{g_{1}}, \dots, Φ_{g_{k}}

, Instance x
Output: Answer p

1:: $Φ_{G}, {(ρ_{g})}_{g \in G} \leftarrow TOTAL WEIGHT AND RHO (α, G, x)$
2:: $p (α, x) \leftarrow {(Φ^{- 1} (α, x) \cdot \sum_{g \in G} ρ_{g} (α, x) Φ_{g} (α, x) g_{j} (x))}_{j = 1, \dots, n}$
3:: return p

Algorithm A6 Distribute answers at a node

Input: Node v in a target graph

T

with

\bar{T} = v . refinement

, Combined votes

{p_{c}}_{c \in \bar{T} . allCombinations}

, Instance x
Output: Answer for x at each threshold

α

1:: $\{{({\tilde{T}}_{i}, G_{i})}_{i = 1, \dots, k}\} \leftarrow v . targetsAndModels$
2:: Initialize $p$ and $β$
3:: for $t \in ⋃_{i = 1}^{k} {\tilde{T}}_{i}$ do
4:: ${\bar{T}}_{t} \leftarrow {\bar{t} \in \bar{T} . valid : \bar{t} \subset t}$
5:: $C_{t} \leftarrow {\bar{t} . component : \bar{t} \in {\bar{T}}_{t}}$
6:: end for
7:: for $t \in ⋃_{i = 1}^{k} {\tilde{T}}_{i}$ do
8:: for $\bar{t} \in {\bar{T}}_{t}$ do
9:: $c \leftarrow \bar{t} . component$
10:: $β (t, \bar{t}) \leftarrow \frac{p_{c} (α, x)}{\sum_{d \in C_{t}} p_{d} (α, x)}$
11:: end for
12:: end for
13:: for $c = (t_{1, j_{1}}, \dots, t_{k, j_{k}}) \in \bar{T} . allCombinations$ do
14:: if $c \in \bar{T} . validCombinations$ then
15:: $s \leftarrow$ the index of $c . refinement$ in $\bar{T}$ .
16:: $p {(α, x, c)}_{s} \leftarrow p_{c} (α, x)$
17:: $p {(α, x, c)}_{j} \leftarrow 0$ for all $j \neq s$
18:: else
19:: for $s = 1, \dots, m$ do
20:: if ${t \in c : {\bar{t}}_{s} \subset t} \neq \emptyset$ then
21:: $p {(α, x, c)}_{s} \leftarrow \sum_{t_{i, j_{i}} \in {t \in c : {\bar{t}}_{s} \subset t}} β (t_{i, j_{i}}, {\bar{t}}_{s}) Ψ_{G_{i}} (α, x) p_{c} (α, x)$
22:: else
23:: $p {(α, x, c)}_{s} \leftarrow ϵ$ ▹ $ϵ$ can be any sufficiently small number.
24:: end if
25:: end for
26:: end if
27:: end for
28:: return $p$

Algorithm A7 Voting rule at a node

Input: Threshold

α \in [0, 1]

, Node v in a target graph

T

, Instance x
Output: Answer for x at each threshold

α

1:: $\{{({\tilde{T}}_{i}, G_{i})}_{i = 1, \dots, k}\} \leftarrow v . targetsAndModels$
2:: $G \leftarrow {G_{1}, \dots, G_{k}}$
3:: for $(\tilde{T}, G) \in v . targetsAndModels$ do
4:: $p_{G} \leftarrow VOTING RULE FOR SHARED TARGETS (α, \tilde{T}, G)$
5:: $Φ, {(ρ_{g})}_{g \in G} \leftarrow TOTAL WEIGHT AND RHO (α, G, x)$
6:: end for
7:: $Ψ \leftarrow {(\frac{Φ_{G}}{\sum_{G \in G} Φ_{G}})}_{G \in G}$
8:: $\bar{T} = {{\bar{t}}_{1}, \dots, {\bar{t}}_{m}} \leftarrow v . refinement$
9:: for $t \in ⋃_{i = 1}^{k} {\tilde{T}}_{i}$ do
10:: ${\bar{T}}_{t} \leftarrow {\bar{t} \in \bar{T} . valid : \bar{t} \subset t}$
11:: $C_{t} \leftarrow {\bar{t} . component : \bar{t} \in {\bar{T}}_{t}}$
12:: end for
13:: Enumerate ${{\tilde{T}}_{1}, \dots, {\tilde{T}}_{k}} = {{t_{1, 1}, \dots, t_{1, r_{1}}}, \dots, {t_{k, 1}, \dots, t_{k, r_{k}}}}$
14:: $C \leftarrow \bar{T} . allCombinations$
15:: for $c = (t_{1, j_{1}}, \dots, t_{k, j_{k}}) \in C$ do
16:: Initialize $p_{c}$
17:: $p_{c} (α, x) \leftarrow \prod_{i = 1}^{k} p_{G_{i}} {(α, x)}_{j_{i}}$
18:: end for
19:: $Answers \leftarrow DISTRIBUTE ANSWERS AT A NODE (v, {p_{c}}_{c \in C}, x)$
20:: $p (α, x) \leftarrow {(\sum_{c \in C} Answers {(α, x, c)}_{j})}_{j = 1, \dots, m}$
21:: return $p$

Algorithm A8 Valid paths

Input: Target Graph

T

, Target class

c \in {c_{1}, \dots, c_{N}}

Output:

Valid Paths

1:: $s_{0} \leftarrow {c_{1}, \dots, c_{N}}$
2:: Initialize $Candidates$
3:: Search for all paths $s_{0}, \dots, s_{m}$ such that $s_{i} \in s_{i - 1} . nextNodes$ and append to $Candidates$
4:: Initialize $Valid Paths$
5:: for $s_{0}, \dots, s_{m} \in Candidates$ do
6:: if $c \in s_{i}$ for all $i = 0, \dots, m$ and $| \bar{t} | = 1 for all \bar{t} \in s_{m} . refinement$ then
7:: Append $s_{0}, \dots, s_{m}$ to $Valid Paths$
8:: end if
9:: end for
10:: return $Valid Paths$

Algorithm A9 Voting rule along a path

Input: Threshold

α \in [0, 1]

, Valid path

γ = (s_{0}, \dots, s_{m})

to c, Instance x
Output:

M

Answer for x at each threshold

α

along a valid path

γ

1:: ${{\bar{t}}_{1}, \dots, {\bar{t}}_{n}} \leftarrow s_{m} . refinement$
2:: for ${\bar{t}}_{j} \in {{\bar{t}}_{1}, \dots, {\bar{t}}_{n}}$ do
3:: for $s_{i} \in γ$ do
4:: ${Answer}_{s_{i}} (α, x) \leftarrow VOTING RULE AT A NODE {(α, s_{i}, x)}_{k}$ where ${\bar{t}}_{k} \in s_{i} . refinement$ contains c
5:: end for
6:: $M (γ, α, x) \leftarrow \prod_{i = 0}^{m} {Answer}_{s_{i}} (α, x)$
7:: end for
8:: return $M$

Algorithm A10 Compute expected accuracy

Input: Answer Vector

M

, Target Class c, Valid Path

γ

to c, Threshold

α \in [0, 1]

, (Validation) Dataset

X = {(x, y_{x})}

Output: Expected Accuracy r

1:: $Prediction (γ, α, x) \leftarrow arg max M (γ, α, x)$
2:: $X_{c} \leftarrow {x \in X : y_{x} = c}$
3:: $T P \leftarrow \{Prediction (γ, α, x) = c\} \cap X_{c}$
4:: $T N \leftarrow \{Prediction (γ, α, x) \neq c\} \cap X ∖ X_{c}$
5:: $r (γ, α) \leftarrow \frac{|T P \cup T N|}{|X|}$
6:: return r

Algorithm A11 Vote using validation history

Input: Thresholds

A = {0 = α_{0}, \dots, α_{n}}

, Target Graph

T

, (Validation) Dataset

Z = {{(x, y_{x})}_{x \in X_{val}}}

, Instance x
Output: Final Answer

M (x) = (M_{c_{1}} (x), \dots, M_{c_{N}} (x))

1:: for $c \in {c_{1}, \dots, c_{N}}$ do
2:: $Γ (c) \leftarrow VALID PATHS (T, c)$
3:: for $γ = (s_{0}, \dots, s_{m}) \in Γ (c)$ , and $α \in A$ do
4:: $\bar{T} \leftarrow s_{m} . refinement$
5:: for $x \in X_{val}$ and $\bar{t} \in \bar{T}$ do
6:: $M_{\bar{t}} (γ, α, x) \leftarrow VOTING RULE ALONG A PATH (α, t, γ, x)$
7:: end for
8:: $M (γ, α, x) \leftarrow {(M_{\bar{t}})}_{\bar{t} \in \bar{T}}$
9:: $r (c, γ, α) \leftarrow COMPUTE EXPECTED ACCURACY (M, c, γ, α, X_{val})$
10:: end for
11:: $γ^{*} (c), α^{*} (c) \leftarrow arg {max}_{γ, α} r (c, γ, α)$
12:: $M_{c} (x) \leftarrow M (γ^{*} (c), α^{*} (c), x)$
13:: end for
14:: return $M_{c_{1}} (x), \dots, M_{c_{N}} (x)$

References

Folland, G.B. Real analysis: Modern techniques and their applications. In Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 1999. [Google Scholar]
Lee, J.M. Introduction to Smooth Manifolds. In Graduate Texts in Mathematics, 2nd ed.; Springer: New York, NY, USA, 2012. [Google Scholar]
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
van den Dries, L. O-minimal structures and real analytic geometry. In Current Developments in Mathematics 1998 (Cambridge, MA); International Press of Boston: Somerville, MA, USA, 1999; pp. 105–152. [Google Scholar]
Zhang, S.; Liu, M.; Yan, J. The Diversified Ensemble Neural Network. Adv. Neural Inf. Process. Syst. 2020, 33, 16001–16011. [Google Scholar]
Wood, D.; Mu, T.; Webb, A.M.; Reeve, H.W.J.; Luján, M.; Brown, G. A unified theory of diversity in ensemble learning. J. Mach. Learn. Res. 2023, 24, 17302–17350. [Google Scholar]
Zhang, C.; Ma, Y. (Eds.) Ensemble Machine Learning Methods and Applications, 1st ed.; Springer: New York, NY, USA, 2012. [Google Scholar]
Abouelnaga, Y.; Ali, O.S.; Rady, H.; Moustafa, M. CIFAR-10: knn-based ensemble of classifiers. arXiv 2016, arXiv:1611.04905. [Google Scholar]
Antonio, B.; Moroni, D.; Martinelli, M. Efficient adaptive ensembling for image classification. Expert Syst. 2023, 42, e13424. [Google Scholar] [CrossRef]
Shen, L.; Chen, G.; Shao, R.; Guan, W.; Nie, L. MoME: Mixture of multimodal experts for generalist multimodal large language models. In Proceedings of the NIPS ’24: Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; pp. 42048–42070. [Google Scholar]
Jacobs, R.A.; Jordan, M.I.; Nowlan, S.J.; Hinton, G.E. Adaptive mixtures of local experts. Neural Comput. 1991, 3, 79–87. [Google Scholar] [CrossRef] [PubMed]
Jung, I.; Lau, S.-C. Logifold: A Geometrical Foundation of Ensemble Machine Learning. In Proceedings of the 2024 4th International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), Male, Maldives, 4–6 November 2024; pp. 1–6. [Google Scholar]
Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report 0; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
Deng, L. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Process. Mag. 2012, 29, 141–142. [Google Scholar] [CrossRef]
Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar] [CrossRef]
Hájek, P. Metamathematics of Fuzzy Logic; Springer Science & Business Media: New York, NY, USA, 2013; Volume 4. [Google Scholar]
Das, R.; Sen, S.; Maulik, U. A survey on fuzzy deep neural networks. ACM Comput. Surv. 2020, 53, 1–25. [Google Scholar] [CrossRef]
Kwan, H.K.; Cai, Y. A fuzzy neural network and its application to pattern recognition. IEEE Trans. Fuzzy Syst. 1994, 2, 185–193. [Google Scholar] [CrossRef]
Mendel, J. Fuzzy logic systems for engineering: A tutorial. Proc. IEEE 1995, 83, 345–377. [Google Scholar] [CrossRef]
Popko, E.; Weinstein, I. Fuzzy logic module of convolutional neural network for handwritten digits recognition. J. Phys. Conference Ser. 2016, 738, 012123. [Google Scholar] [CrossRef]
Takagi, T.; Sugeno, M. Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst. Man Cybern. 1985, SMC-15, 116–132. [Google Scholar] [CrossRef]
Ashtekar, A.; Schilling, T.A. Geometrical Formulation of Quantum Mechanics; Springer: New York, NY, USA, 1999. [Google Scholar]
Kempe, J. Quantum random walks: An introductory overview. Contemp. Phys. 2003, 44, 307–327. [Google Scholar] [CrossRef]
Ji, M.; Meng, K.; Ding, K. Euler characteristics and homotopy types of definable sublevel sets, with applications to topological data analysis. arXiv 2023, arXiv:2309.03142. [Google Scholar] [CrossRef]
Bierstone, E.; Milman, P.D. Semianalytic and subanalytic sets. Publ. MathÉmatiques L’IhÉs 1988, 67, 5–42. [Google Scholar] [CrossRef]
Armenta, M.A.; Jodoin, P.-M. The representation theory of neural networks. Mathematics 2021, 9, 3216. [Google Scholar] [CrossRef]
Jeffreys, G.; Lau, S.-C. Kähler geometry of framed quiver moduli and machine learning. Found. Comput. Math. 2023, 23, 1899–1957. [Google Scholar] [CrossRef]
Jeffreys, G.; Lau, S.-C. Noncommutative geometry of computational models and uniformization for framed quiver varieties. Pure Appl. Math. Q. 2023, 19, 731–789. [Google Scholar] [CrossRef]
Carlsson, G.; Zomorodian, A.; Collins, A.; Guibas, L. Persistence barcodes for shapes. Intl. Shape Model. 2005, 11, 149–187. [Google Scholar] [CrossRef]
Edelsbrunner, H.; Letscher, D.; Zomorodian, A. Topological persistence and simplification. Discret. Comput. Geom. Graph Draw. 2002, 28, 511–533. [Google Scholar] [CrossRef]
Zomorodian, A.; Carlsson, G. Computing persistent homology. Discrete Comput. Geom. 2005, 33, 249–274. [Google Scholar] [CrossRef]
Polikar, R. Ensemble based systems in decision making. IEEE Circuits Syst. Mag. 2006, 6, 21–45. [Google Scholar] [CrossRef]
Dasarathy, B.; Sheela, B. A composite classifier system design: Concepts and methodology. Proc. IEEE 1979, 67, 708–713. [Google Scholar] [CrossRef]
Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2020, 14, 241–258. [Google Scholar] [CrossRef]
Yang, Y.; Lv, H.; Chen, N. A survey on ensemble learning under the era of deep learning. Artif. Intell. Rev. 2023, 56, 5545–5589. [Google Scholar] [CrossRef]
Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
Schapire, R.E. The strength of weak learnability. Mach. Learn. 1990, 5, 197–227. [Google Scholar] [CrossRef]
Fusion of Label Outputs; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2004; Chapter 4; pp. 111–149.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar] [CrossRef]

Figure 1. An example of a logifold. The graph jumps over values 0 and 1 infinitely in left-approaching to the point marked by a star (and the length of each interval is halved). This is covered by infinitely many charts of linear logical functions, each of which has only finitely many jumps. Moreover, the base is a measurable subset of

R

(which is hard to depict and not shown in the picture).

Figure 2. The left hand side shows a simple example of a logifold. It is the graph of the step function

[- 1, 1] \to {0, 1}

. The figure in the middle shows a fuzzy deformation of it, which is a fuzzy subset in

[- 1, 1] \times {0, 1}

. The right hand side shows the graph of the probability distribution of a quantum observation, which consists of the maps

\frac{| z_{0} |^{2}}{| z_{0} |^{2} + {| z_{1} |}^{2}}

and

\frac{| z_{1} |^{2}}{| z_{0} |^{2} + {| z_{1} |}^{2}}

from the state space

P^{1}

to

[0, 1]

.

Figure 3. The left side shows a partial directed graph at vertex v, with five outgoing arrows. On the right, chambers are formed in

R^{2}

by the affine maps

L_{v} = (l_{1}, l_{2}, l_{3})

defined on

R^{2}

. A point x is marked in the chamber defined by

{l_{1} \leq 0, l_{2} \geq 0, l_{3} \leq 0}

. One of the arrows corresponding to the shaded chamber containing x is highlighted in the left diagram.

Figure 4. A linear logical graph for a feed-forward network whose activation function at each hidden layer is the step function and that at the last layer is the index-max function.

Figure 5. The linear logical graph for a feed-forward network whose activation function at each hidden layer is the ReLu function and that at the last layer is the index-max function.

Table 1. Summary of frequently used symbols. Let X denote the dataset equipped with a topology and a measure representing its distribution in real-world space. Following Definition 9, we define the fuzzy linear Logifold as the collection

M = {(U_{i}, ϕ_{i}, g_{i})}_{i \in I}

, where each triple consists of a domain

U_{i} \subset X

, an identification map

ϕ_{i} : U_{i} \to R^{n_{i}} \times {\tilde{T}}_{i}

, and a trained model

g_{i}

. For brevity, we refer to

M

as the set of trained models

{g_{i}}_{i \in I}

. Note that

U_{i} = Z_{α, i .}

for some certainty threshold

α

in application.

Table 1. Summary of frequently used symbols. Let X denote the dataset equipped with a topology and a measure representing its distribution in real-world space. Following Definition 9, we define the fuzzy linear Logifold as the collection

M = {(U_{i}, ϕ_{i}, g_{i})}_{i \in I}

, where each triple consists of a domain

U_{i} \subset X

, an identification map

ϕ_{i} : U_{i} \to R^{n_{i}} \times {\tilde{T}}_{i}

, and a trained model

g_{i}

. For brevity, we refer to

M

as the set of trained models

{g_{i}}_{i \in I}

. Note that

U_{i} = Z_{α, i .}

for some certainty threshold

α

in application.

Symbol	Meaning
X	A fuzzy topological measure space (dataset)
$M$	A collection of trained models ${g_{i}}_{i \in I}$
$(U_{i}, ϕ_{i}, g_{i})$	The identification $ϕ_{i} : U_{i} \subset X \to R^{n_{i}} \times {\tilde{T}}_{i}$ ,
	$Z_{i} : = ϕ_{i} (U_{i})$ and ${\tilde{T}}_{i}$ the target of $g_{i}$ .
$T_{i} : = ⋃ {\tilde{T}}_{i}$	Flattening of targets of the i-th model
$g_{i, j} : = j - th component of g_{i}$	The certainty of $g_{i}$ to $t_{j} \in {\tilde{T}}_{i}$
$C_{i} : = {max}_{j} g_{i, j},$	The certainty function of $g_{i}$
$P_{i} : = t_{arg {max}_{j} g_{i, j}}$	The prediction function of $g_{i}$
$Z_{α, i} \subset R^{n_{i}} \times T_{i}$	The certain part by $g_{i}$
	at certainty threshold $α$ in $Z_{i}$
$X_{α} = ⋃_{i \in I} ϕ_{i}^{- 1} (Z_{α, i}) \subset X$	The certain part of X
	at certainty threshold $α$ by $M$

Table 2. Accuracy results for varying certainty thresholds. Thresholds are selected as

σ (- \infty), σ (0), σ (1), \dots, σ (7)

, where

σ (x) : = {(1 + e^{- x})}^{- 1}

denotes the sigmoid function. The second column indicates accuracy using the refined voting system without validation history; thus, the accuracy at the 0 threshold reflects standard weighted voting based on validation performance. The third column shows accuracy within the subset of data classified above each certainty threshold. Baseline comparisons (simple average and majority voting) and refined voting using validation history explained in Section 4.5.5 are summarized below the table.

Table 2. Accuracy results for varying certainty thresholds. Thresholds are selected as

σ (- \infty), σ (0), σ (1), \dots, σ (7)

, where

σ (x) : = {(1 + e^{- x})}^{- 1}

denotes the sigmoid function. The second column indicates accuracy using the refined voting system without validation history; thus, the accuracy at the 0 threshold reflects standard weighted voting based on validation performance. The third column shows accuracy within the subset of data classified above each certainty threshold. Baseline comparisons (simple average and majority voting) and refined voting using validation history explained in Section 4.5.5 are summarized below the table.

Certainty Threshold	Accuracy by Refined Voting	Accuracy in Certain Part	The Number of Certain Data
0	0.6158	0.6158	10,000
0.5	0.6158	0.6158	10,000
0.7311	0.6187	0.6187	10,000
0.8808	0.7821	0.7821	10,000
0.9526	0.8185	0.8946	8653
0.9820	0.7856	0.9626	6906
0.9933	0.7363	0.9856	5361
0.9975	0.7068	0.9934	5361
0.9991	0.6740	0.9978	3246

Simple average: 0.6255. Majority voting: 0.5872. Voting system using validation history: 0.8486.

Table 3. Accuracy results for various Logifold compositions evaluated on the combined test set from MNIST, Fashion MNIST, and CIFAR10 datasets. The second column reports results obtained through non-refined voting methods (simple average, weighted average, or majority voting). The third column provides accuracy using refined voting with validation history.

Comprising Models	Accuracy by Non-Refined Voting	Accuracy by Refined Voting
T	75.16%	75.13%
$(M, F, C, T^{'})$	9.93%	89.31%
$(T, M, F, C, T^{'})$	75.16%	90.78%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

A Logifold Structure for Measure Space

Abstract

1. Introduction

Organization

2. Linear Logical Functions and Their Fuzzy Analogs

2.1. Linear Logical Functions and Their Graphs

2.2. Zero Locus

2.3. Parameterization

2.4. Fuzzy Linear Logical Functions

2.5. The Semiring of Fuzzy Linear Logical Functions

2.6. Graph of Fuzzy Linear Logical Function as a Fuzzy Subset

2.7. A Digression to Non-Linearity in a Quantum-Classical System

3. Linear Logical Structures for a Measure Space

3.1. Equivalence with Semilinear Functions

3.2. Approximation of Measurable Functions by Linear Logical Functions

3.3. Linear Logifold

4. Ensemble Learning and Logifolds

4.1. Mathematical Description of Neural Network Learning

4.2. A Brief Description of Ensemble Machine Learning

4.3. Logifold Structure

4.4. Thick Targets and Specialization

4.5. Voting System

4.5.1. Construction of the Target Graph

4.5.2. Voting Rule for Multiple Models Sharing the Same Target

4.5.3. Voting Rule at a Node

4.5.4. Accumulation of Votes Along Valid Paths

4.5.5. Vote Using Validation History

5. Experiments and Results

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Appendix A. Pseudo-Codes

References

Article Metrics

Citations

Article Access Statistics