1. Introduction
In brief, the quantum measurement problem consists of finding a rule that correlates states of a quantum system with those of a macroscopic observer. When phrased in probabilistic terms, the problem is to find a consistent rule of replacing joint probabilities, $P(a\wedge b)$, by conditional probabilities, $P(a|b)$, where $a$ and $b$ represent states (or properties) of the system and the observer, respectively. In standard quantum mechanics the rule can be inferred from the Bayes law by the following sequence of equivalences:
$$P(a|b) = \frac{P(a\wedge b)}{P(b)} = \frac{\langle\psi|P_b P_a P_b|\psi\rangle}{\langle\psi|P_b|\psi\rangle} = \langle\psi_b|P_a|\psi_b\rangle,\qquad |\psi_b\rangle = \frac{P_b|\psi\rangle}{\sqrt{\langle\psi|P_b|\psi\rangle}}.\tag{1}$$
Thus, the process of conditioning by the event “$b$ has occurred” can be represented by the “state vector reduction”,
$$|\psi\rangle\ \mapsto\ |\psi_b\rangle = \frac{P_b|\psi\rangle}{\sqrt{\langle\psi|P_b|\psi\rangle}}.\tag{2}$$
However, do we really need (2)? From an operational point of view, it is enough if we know the joint probability, $P(a\wedge b)$, and the probability of the condition, $P(b)$. Both numbers are directly related to experimental data, so (2) is redundant.
If we try to generalize the above procedure beyond quantum mechanics, various possibilities arise. In nonlinear quantum mechanics, for example, once we obtain $P(a\wedge b)$ and $P(b)$, we can deduce the mathematical form of an effective state vector reduction, but it will not coincide with (2), because the sequence of transformations (1) will no longer be true (cf. [1] for the details). A naive combination of (2) with a nonlinear evolution of states implies the inconsistency known as faster-than-light communication [2,3,4,5]. Of course, one can work with the projection postulate even in nonlinear quantum mechanics (eliminating the faster-than-light effect), but the form of state vector reduction must first be derived in a consistent way from the Bayes law [1]. Here, consistency is the keyword.
The Bayes law, when written as $P(a\wedge b) = P(a|b)\,P(b)$, is known as the product rule. Jaynes [6] (following the ideas of Aczél [7] and Cox [8]) derives the product rule from some very general desiderata of consistent and plausible reasoning but, interestingly, what one finds turns out to be more general,
$$g\big(P(a\wedge b)\big) = g\big(P(a|b)\big)\,g\big(P(b)\big),\tag{5}$$
where $g$ is some monotone non-negative function (cf. Equation (2.27) in [6]). Still, for Jaynes, $P$ is not yet a probability. His intuition tells him that the probability (or, rather, a measure of plausibility) is given by $g(P)$, so that the product rule is reconstructed in the standard form for the plausibility $g(P)$.
What we will discuss later on in this paper employs a possibility that was not taken into account by Jaynes. Namely, we will treat formulas such as (5) as a definition of a new product, $\odot$, so that
$$P(a\wedge b) = P(a|b)\odot P(b),\qquad x\odot y = g^{-1}\big(g(x)\,g(y)\big).$$
We will also see that $g$ and its higher iterates have intriguing similarities to neural activation functions, whereas the higher iterates of $g^{-1}$ resemble a white noise.
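As a quick illustration, the following minimal Python sketch (assuming the concrete bijection $g(p) = \sin^2(\pi p/2)$ that will be introduced in Section 4; the function names are illustrative) shows how the product $\odot$ acts on probabilities, and how the iterates of $g$ steepen toward a step function:

```python
import numpy as np

def g(p):
    """The bijection g(p) = sin^2(pi p/2), introduced in Section 4."""
    return np.sin(np.pi * p / 2) ** 2

def g_inv(p):
    """Inverse bijection, g^{-1}(p) = (2/pi) arcsin(sqrt(p))."""
    return 2 / np.pi * np.arcsin(np.sqrt(p))

def odot(x, y):
    """The new product defined via (5): x ⊙ y = g^{-1}(g(x) g(y))."""
    return g_inv(g(x) * g(y))

# Product rule in the ⊙ form: P(a∧b) = P(a|b) ⊙ P(b)
print(odot(0.7, 0.5))     # ≈ 0.43, to be compared with 0.7 * 0.5 = 0.35

# Iterates of g steepen toward a step function at p = 1/2 (activation-like),
# while iterates of g^{-1} flatten toward the constant 1/2 ("white noise").
p = np.array([0.1, 0.4, 0.45, 0.55, 0.9])
gk = p.copy()
for _ in range(10):
    gk = g(gk)
print(np.round(gk, 3))    # ≈ [0, 0, 0, 1, 1]
```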
A new product is an element of a new arithmetic, leading us ultimately to a whole hierarchical structure of such generalized models. As one of the conclusions, we will find that both $p$ and $g(p)$ may be treated as genuine probabilities, provided $g$ is restricted to the class discussed in detail in Section 2. One of the possibilities, directly related to the measurement problem, is that $p$ are probabilities at a hidden-variable level, whereas $g(p)$ are the quantum ones. We will see that any two neighboring levels of the hierarchy are related to each other in a way that may be regarded as a form of a quantum–subquantum relationship. This will lead to the idea of relativity of quantumness.
In any such generalized and fundamental theory one is necessarily confronted with the chicken-or-egg dilemma: What was first, $P(a\wedge b)$ and $P(b)$, or $P(a|b)$ and $P(b)$? The Bayes law that defines the conditional probability in terms of the joint probability, or the product rule that defines the joint probability in terms of the conditional probability?
An alternative form of the dilemma can be expressed in terms of the projection postulate: Do we first define conditional probabilities in terms of some given form of state vector reduction, or do we begin with joint probabilities and then infer the form of state vector reduction? In nonlinear quantum mechanics, the latter strategy is superior to the former one. However, in the Bayesian approach to probability, one updates probabilities on the basis of prior information, so the conditional probabilities are superior to the joint ones.
The formalism of arithmetic hierarchies discussed in the present paper clearly prefers the Bayesian approach. The reason lies in the three fundamental lemmas, which we will discuss in Section 2, and which are true only for binary probabilities. Binary coding has priority here, as we have to construct probabilities involving more than two events in terms of binary trees of conditional probabilities. Binary coding becomes as fundamental for probability theory as two-spinors are fundamental for relativistic physics [9].
We begin in Section 2 by recalling the three fundamental lemmas about the functional equation $g(p) + g(1-p) = 1$. In Section 3, we construct a hierarchy of isomorphic arithmetics associated with $g$. The hierarchy of arithmetics leads to a hierarchy of probabilities introduced in Section 4. A hierarchical ordering relation, briefly discussed in Section 5, will allow us to unambiguously employ symbols such as < and >. A family of product rules, discussed in Section 6, is employed in the problem of hidden-variable representation of singlet-state probabilities in Section 7. We explain, in particular, that one encounters here three types of arithmetic levels in a single formula for joint probabilities: quantum, macroscopic, and hidden. Section 8 introduces some elements of hierarchical calculi, with special emphasis on non-Newtonian integration. We make here a digression on Rényi's entropy, which is implicitly based on a generalized arithmetic but does not take advantage of the possibilities inherent in generalized calculus. Section 9 is devoted to local hidden-variable models of singlet-state probabilities constructed in terms of the generalized calculus. This seems to be the most controversial aspect of the formalism, as it clearly contradicts the common wisdom about Bell's theorem. Section 10 brings us to the intriguing role played in quantum mechanics by the geodesic distance in the projective space of quantum states. A typical discussion of the Fubini–Study metric is restricted in the literature to its geometric interpretation. Here, we reveal its unknown aspect: its role for the arithmetic structure of quantum states. It seems that $g(p) = \sin^2(\pi p/2)$ is a fundamental bijection that determines the arithmetic of the subquantum world. In Section 11, we give a simple argument explaining why the effective number of distinguishable probabilistic levels of the hierarchy is finite. We also point out a possible interpretation of the hierarchy of probabilities in terms of neural activation functions. At such a formal level, the only means of relating formal probabilities to experiment is via the laws of large numbers, discussed in Section 12. In Section 13, we return to the problem of Bell's inequalities. We depart here a little from the formalism we developed in a series of earlier papers, where the same arithmetic was used at the hidden and the macroscopic levels. Our current understanding of the problem is that it is better to employ the freedom of combining different arithmetics simultaneously. We end the paper with remarks on open problems in Section 14, and a certain personal perspective is given in Section 15. Appendix A is devoted to certain technicalities which cannot be found in the literature.
3. Hierarchy of Isomorphic Arithmetics
Assume that the $g$ occurring in the above three lemmas is a restriction of a bijection $g:\mathbb{R}\to\mathbb{R}$, i.e., $g(x)\in[0,1]$ for $x\in[0,1]$. It does not matter what the properties of $g(x)$ are if $x\notin[0,1]$, except for the bijectivity of $g$. Put differently, $g$ belongs to the equivalence class $[g]$ of bijections whose restrictions to $[0,1]$ are identical. Following the notation of Lemma 3, we denote $g^0(x) = x$, $g^{k+1} = g\circ g^k$, $g^{-k} = \big(g^k\big)^{-1}$. Now, let $x, y\in\mathbb{R}$. Define,
$$x\oplus_k y = g^k\big(g^{-k}(x) + g^{-k}(y)\big),$$
$$x\ominus_k y = g^k\big(g^{-k}(x) - g^{-k}(y)\big),$$
$$x\odot_k y = g^k\big(g^{-k}(x)\,g^{-k}(y)\big),$$
$$x\oslash_k y = g^k\big(g^{-k}(x)/g^{-k}(y)\big).$$
The arithmetic $\mathbb{A}_k$ is the set $\mathbb{R}$ equipped with the above four operations, i.e., $\mathbb{A}_k = \{\mathbb{R},\oplus_k,\ominus_k,\odot_k,\oslash_k\}$.
The ordering relation is independent of $k$ if $g$ is increasing, which we therefore assume; hence $x\leq_k y$ if and only if $x\leq y$. The neutral elements of addition, $0_k = g^k(0)$, and multiplication, $1_k = g^k(1)$, can be regarded as bits, in principle applicable to some form of binary coding. Greater natural numbers are obtained by the $n$-times repeated addition of $1_k$,
$$n_k = \underbrace{1_k\oplus_k\cdots\oplus_k 1_k}_{n\ \text{times}} = g^k(n).$$
An $n$th power of $x$,
$$x^{\odot_k n} = \underbrace{x\odot_k\cdots\odot_k x}_{n\ \text{times}},$$
satisfies
$$x^{\odot_k n} = g^k\big(g^{-k}(x)^n\big).$$
Rational numbers are those of the form
$$m_k\oslash_k n_k = g^k(m/n).\tag{26}$$
The notion of rationality is arithmetic-dependent. Indeed, let $g^k(m/n)$ be a rational number in the arithmetic $\mathbb{A}_k$. Then, typically, $g^l(m/n)$, $l\neq k$, is not a rational number in $\mathbb{A}_k$. Still, it is a rational number in the arithmetic $\mathbb{A}_l$, in consequence of (26).
For any $k$ and $l$, the four arithmetic operations are related by
$$x\oplus_l y = g^{l-k}\Big(g^{k-l}(x)\oplus_k g^{k-l}(y)\Big),$$
and analogously for $\ominus$, $\odot$, and $\oslash$. The bijection $g^{l-k}$ is an isomorphism of $\mathbb{A}_k$ and $\mathbb{A}_l$, for any $k, l$,
$$g^{l-k}\big(x\oplus_k y\big) = g^{l-k}(x)\oplus_l g^{l-k}(y),\qquad g^{l-k}\big(x\odot_k y\big) = g^{l-k}(x)\odot_l g^{l-k}(y).$$
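The following Python sketch (with a toy increasing bijection $g(x) = x^3$, chosen only for illustration; the paper's $g$ is specified in Section 4) implements the level-$k$ addition and checks the isomorphism property numerically:

```python
import numpy as np

# Level-k arithmetic generated by an increasing bijection g of the reals.
# Toy choice g(x) = x^3; any increasing bijection works the same way.
g, g_inv = lambda x: x ** 3, lambda x: np.cbrt(x)

def g_k(x, k):
    """k-th iterate of g (negative k iterates the inverse)."""
    for _ in range(abs(k)):
        x = g(x) if k > 0 else g_inv(x)
    return x

def oplus(x, y, k):
    """x ⊕_k y = g^k(g^{-k}(x) + g^{-k}(y))."""
    return g_k(g_k(x, -k) + g_k(y, -k), k)

x, y = 2.0, 3.0
# Isomorphism: g^{l-k}(x ⊕_k y) = g^{l-k}(x) ⊕_l g^{l-k}(y), here k = 0, l = 1
print(g_k(x + y, 1), oplus(g_k(x, 1), g_k(y, 1), 1))   # both 125.0
# Natural numbers at level 1: 2_1 = 1_1 ⊕_1 1_1 = g(2) = 8
print(oplus(g(1.0), g(1.0), 1), g(2.0))                # both 8.0
```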
The value $k = 0$ is not privileged. The role of a 0th level can be played by any $l$. The notation in which $g^l$ plays the role of the identity map is perfectly acceptable, hence any $\mathbb{A}_l$ can be regarded as “the” ordinary arithmetic we are taught at school. The latter statement is the content of the “arithmetic Copernican principle”, introduced in [13] and discussed further in [14]. In the present paper, we nevertheless simplify notation and assume that the 0th level is the ordinary arithmetic. This is analogous to the usual habit of imposing initial conditions in Newtonian dynamics “at $t = 0$” instead of a general $t_0$.
The hierarchy of arithmetics leads to the hierarchy of probabilities.
4. Hierarchy of Probabilities
Let $p\in[0,1]$, so that $g^k(p)\in[0,1]$ and $g^{-k}(p)\in[0,1]$, for any $k$. Now, let $p$, $q$, $p+q=1$, be probabilities. Assuming that $g$ satisfies the assumptions of Lemma 1, we find (in consequence of Lemmas 2 and 3, and $g^k(1)=1$ for any $k$)
$$p_k = g^k(p),\qquad q_k = g^k(q),\tag{36}$$
$$p_k + q_k = 1,\tag{37}$$
$$p\oplus_k q = 1,\tag{38}$$
for any $k$.
The Copernican aspect is visible at the level of probabilities as well, if we define $P = p_k$, $Q = q_k$, so that
$$P_{-k} = g^{-k}(P),\qquad Q_{-k} = g^{-k}(Q),\tag{39}$$
$$P_{-k} + Q_{-k} = 1,\tag{40}$$
$$P\oplus_{-k} Q = 1,\tag{41}$$
for any $k$. Indeed, how to distinguish between (36)–(38) and (39)–(41), if we bear in mind that $k$ can be positive, negative, or zero, and the formulas are true for all $k$? How to distinguish between the two levels if in both cases we find $p+q=1$ and $P+Q=1$? Which of the probabilities, $p$ or $P$, is the one we measure in experiment? Which iterate, $k$, $0$, or $-k$, is the one that defines the probabilities we determine experimentally in terms of frequencies of successes? Which natural numbers, $n_k$, $n_0$, or $n_{-k}$, are the ones we use to define numbers of trials and successes?
Formula (38) shows that probabilities $p$ and $q$ sum to 1 in infinitely many ways, corresponding to infinitely many values of $k$ in $\oplus_k$. Formula (37) shows that probabilities $p$ and $q$ generate infinitely many probabilities $p_k$ and $q_k$ that sum to 1 by means of the same addition $+$. The Arithmetic Copernican Principle is a relativity principle which states that any value of $k$ can correspond to the arithmetic and probability that we regard as “the human and experimental one”.
Still, this is not the end of the story. Replacing in (37) $k$ by $k+l$,
$$g^{k+l}(p) + g^{k+l}(q) = 1,$$
and acting on both sides with $g^{-l}$, we find
$$p_k\oplus_{-l} q_k = 1,$$
for any $k, l$. The resulting wealth of available probability models implied by a single bijection $g$ is truly overwhelming, yet it is ignored by those who study quantum probabilities and the hidden-variables problem.
Let us now consider the concrete case of the equivalence class of a function $g$ whose restriction to $[0,1]$ is given by $g(p) = \sin^2(\pi p/2)$. Then,
$$g^{-1}(p) = \frac{2}{\pi}\arcsin\sqrt{p}.$$
Let $p(\theta) = 1 - \theta/\pi$ be the probability of finding a point belonging to the overlap of two half-circles rotated by $\theta$. Then, for $0\leq\theta\leq\pi$,
$$g\big(p(\theta)\big) = \sin^2\!\Big(\frac{\pi}{2}\Big(1-\frac{\theta}{\pi}\Big)\Big) = \cos^2\frac{\theta}{2},$$
in which we recognize the conditional probabilities for two successive measurements of spin-1/2 in two Stern–Gerlach devices placed one after another, with relative angle $\theta$.
By Lemma 3, we have in fact much more, because the first iterate can be replaced by any integer iterate. For example, the second iterate
$$g^2(p) = \sin^2\!\Big(\frac{\pi}{2}\sin^2\frac{\pi p}{2}\Big)$$
satisfies $g^2(p) + g^2(1-p) = 1$, of course, as can be proved by a straightforward but instructive calculation [14]. The minus-first iterate,
$$g^{-1}(p) = \frac{2}{\pi}\arcsin\sqrt{p},$$
satisfies $g^{-1}(p) + g^{-1}(1-p) = 1$, and so on and so forth.
Clearly, we have absolutely no criterion that could indicate which level of the hierarchy is the one we regard as our human one, a fact that justifies the adjective “Copernican”. For example, rewriting (49) as
$$g^2\big(p(\theta)\big) = g\big(p(\theta')\big) = \cos^2\frac{\theta'}{2},$$
we find the relation between the two parameters, $\theta$ and $\theta'$, corresponding to the two levels of the hierarchy (see Figure 1),
$$\theta' = \pi\sin^2\frac{\theta}{2}.$$
The usual tests of classicality and quantumness are based on inequalities. However, in order to discuss an inequality we have to control ordering relations such as ≤ and ≥. Fortunately, with our assumptions about g the problem is trivial.
8. Hierarchy of Calculi
A hierarchy of arithmetics leads to a hierarchy of “non-Newtonian” calculi [16,17,18,19,20,21]. Here, functions have to be treated as mappings between arithmetics and not between sets, hence it is more appropriate to write $f:\mathbb{A}_k\to\mathbb{A}_l$ with some $k, l$. Otherwise the notions of derivative and integral are ambiguous. The derivative of $f:\mathbb{A}_k\to\mathbb{A}_l$ is
$$\frac{\mathrm{D}f(x)}{\mathrm{D}x} = \lim_{h\to 0}\Big(f(x\oplus_k h_k)\ominus_l f(x)\Big)\oslash_l h_l.\tag{74}$$
As before, $h_k = g^k(h)$, $h_l = g^l(h)$. The derivative is $\oplus_l$-linear and satisfies an appropriate Leibniz rule,
$$\frac{\mathrm{D}\big(f_1(x)\odot_l f_2(x)\big)}{\mathrm{D}x} = \frac{\mathrm{D}f_1(x)}{\mathrm{D}x}\odot_l f_2(x)\,\oplus_l\, f_1(x)\odot_l\frac{\mathrm{D}f_2(x)}{\mathrm{D}x}.$$
Integration of $f:\mathbb{A}_k\to\mathbb{A}_l$ is defined in a way that guarantees the two fundamental theorems of calculus (under standard assumptions about differentiability and continuity):
$$\int_a^b\frac{\mathrm{D}f(x)}{\mathrm{D}x}\,\mathrm{D}x = f(b)\ominus_l f(a),\qquad \frac{\mathrm{D}}{\mathrm{D}x}\int_a^x f(y)\,\mathrm{D}y = f(x).$$
The formulas become less abstract if one considers the following commutative diagram ($\tilde f = g^{-l}\circ f\circ g^k$),
$$\begin{array}{ccc}
\mathbb{A}_k & \xrightarrow{\ f\ } & \mathbb{A}_l\\
{\scriptstyle g^k}\uparrow & & \uparrow{\scriptstyle g^l}\\
\mathbb{R} & \xrightarrow{\ \tilde f\ } & \mathbb{R}
\end{array}\tag{79}$$
leading to a very simple and useful form of the derivative (74),
$$\frac{\mathrm{D}f(x)}{\mathrm{D}x} = g^l\Big(\tilde f{\,}'\big(g^{-k}(x)\big)\Big),$$
while the integral reads,
$$\int_a^b f(x)\,\mathrm{D}x = g^l\left(\int_{g^{-k}(a)}^{g^{-k}(b)}\tilde f(r)\,\mathrm{d}r\right).\tag{80}$$
Here, $\int\ldots\mathrm{d}r$ denotes the usual (Riemann, Lebesgue, etc.) integral in $\mathbb{R}$. Formula (80) is derived under the assumption that $g$ is continuous (in the usual meaning of the term employed in ordinary “Newtonian” real analysis), which is, however, automatically guaranteed by the fact that $g$ is an increasing bijection of $\mathbb{R}$ onto $\mathbb{R}$. What is important, neither $g$ nor its inverse $f = g^{-1}$ has to be differentiable in the standard Newtonian sense. The latter makes an important difference with respect to ordinary differential geometry, where such a $g$ would be excluded as non-differentiable at some points. In the non-Newtonian formalism, any bijection $g$, as well as its inverse $f$, is automatically smooth with respect to the non-Newtonian differentiation defined by the same $g$. Various explicit examples can be found in [22,23].
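The commutative-diagram formulas are easy to test numerically. The sketch below (Python; it assumes a toy bijection $g(x) = x^3$ and the map $f = g\circ\sin$ between levels $k = 0$ and $l = 1$, both chosen only for illustration) computes the derivative and the integral through $\tilde f$ and verifies the first fundamental theorem:

```python
import numpy as np

# Toy bijection g(x) = x^3 and a map f: A_0 -> A_1 with ftilde = sin,
# i.e. f = g o sin (k = 0, l = 1); all choices are illustrative only.
g, g_inv = lambda x: x ** 3, lambda x: np.cbrt(x)
f = lambda x: g(np.sin(x))
ftilde = lambda r: g_inv(f(r))          # g^{-l} o f o g^k, with k = 0

def D(x, h=1e-6):
    """Non-Newtonian derivative: Df/Dx = g^l(ftilde'(g^{-k}(x)))."""
    return g((ftilde(x + h) - ftilde(x - h)) / (2 * h))

def I(fun_tilde, a, b, n=100001):
    """Non-Newtonian integral (80): g^l( ∫ ftilde(r) dr )."""
    r = np.linspace(a, b, n)
    return g(np.trapz(fun_tilde(r), r))

a, b = 0.2, 1.1
# First fundamental theorem: ∫_a^b (Df/Dx) Dx = f(b) ⊖_1 f(a)
lhs = I(lambda r: g_inv(D(r)), a, b)
rhs = g(g_inv(f(b)) - g_inv(f(a)))      # f(b) ⊖_1 f(a) in A_1
print(lhs, rhs)                          # both ≈ g(sin(1.1) - sin(0.2))
```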
Linearity of the integral must be understood in the sense of $\oplus_l$,
$$\int_a^b\big(f_1(x)\oplus_l f_2(x)\big)\,\mathrm{D}x = \int_a^b f_1(x)\,\mathrm{D}x\;\oplus_l\;\int_a^b f_2(x)\,\mathrm{D}x,$$
a property of fundamental importance for Bell-type inequalities [13]. An analogous form of generalized linearity of integrals occurs in fuzzy calculus [24,25,26,27,28].
Diagram (79) implies a composition rule for maps acting between different levels of the hierarchy, (85), which leads to a new type of a chain rule, relating derivatives and integrals at different levels of the hierarchy, (86). Formulas (85) and (86) do not seem to appear in the literature, so we prove them in Appendix A.
Digression: Logarithm and Rényi Entropies
The exponential function is defined by the differential equation,
$$\frac{\mathrm{D}\,\mathrm{Exp}(x)}{\mathrm{D}x} = \mathrm{Exp}(x).$$
The solution is given by
$$\mathrm{Exp}(x) = g^l\Big(e^{g^{-k}(x)}\Big)$$
and satisfies
$$\mathrm{Exp}(x\oplus_k y) = \mathrm{Exp}(x)\odot_l\mathrm{Exp}(y).$$
The inverse is given by
$$\mathrm{Ln}(x) = g^k\Big(\ln g^{-l}(x)\Big),$$
where $\mathrm{Ln}\big(\mathrm{Exp}(x)\big) = x$, and
$$\mathrm{Ln}(x\odot_l y) = \mathrm{Ln}(x)\oplus_k\mathrm{Ln}(y).$$
Now, consider probabilities $p_1,\ldots,p_n$, $\sum_{i=1}^n p_i = 1$. Rényi introduced his $\alpha$-entropy as a Kolmogorov–Nagumo average [29,30,31,32,33,34,35,36] of the Shannon amount of information [37] (we prefer the natural logarithm to the original $\log_2$ from [33], but this is just a choice of units of information),
$$S_\alpha = g_\alpha^{-1}\left(\sum_{i=1}^n p_i\,g_\alpha(-\ln p_i)\right) = \frac{1}{1-\alpha}\ln\sum_{i=1}^n p_i^\alpha,\qquad g_\alpha(x) = e^{(1-\alpha)x}.\tag{92}$$
It is clear that (92) can be expressed in several different ways by means of generalized arithmetics. For example, the Kolmogorov–Nagumo average $g_\alpha^{-1}\big(\sum_i p_i\,g_\alpha(x_i)\big)$ has the same functional form as a non-Newtonian (deformed) mean. Alternatively, defining $\oplus_\alpha$ and $\odot_\alpha$ as the operations induced by the bijection $g_\alpha$, we find a representation of (92) entirely within the corresponding generalized arithmetic.
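A minimal numerical confirmation of (92) (Python; the function names are illustrative) compares the Kolmogorov–Nagumo average with the closed form:

```python
import numpy as np

def renyi_kn(p, alpha):
    """Rényi entropy as a Kolmogorov-Nagumo average of -ln p_i,
    with g_alpha(x) = exp((1 - alpha) x), as in (92)."""
    g = lambda x: np.exp((1 - alpha) * x)
    g_inv = lambda y: np.log(y) / (1 - alpha)
    return g_inv(np.sum(p * g(-np.log(p))))

def renyi_direct(p, alpha):
    """The textbook closed form (1/(1-alpha)) ln sum_i p_i^alpha."""
    return np.log(np.sum(p ** alpha)) / (1 - alpha)

p = np.array([0.5, 0.25, 0.125, 0.125])
for alpha in (0.5, 2.0, 3.0):
    print(alpha, renyi_kn(p, alpha), renyi_direct(p, alpha))
# alpha -> 1 recovers the Shannon entropy -sum_i p_i ln p_i
```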
Rényi's choice of $g_\alpha$ was dictated by the assumed additivity of entropy for independent (i.e., uncorrelated) systems. Our general formalism suggests various hierarchical generalizations of the notion of entropy, automatically inheriting their additivity properties from the arithmetics involved. Some examples can be found in [10].
9. Application: Local Hidden-Variable Models Based on Non-Newtonian Integration
Consider an integral representation of the standard $[0,1]$-valued probability, with probability densities $\rho$ and characteristic functions $\chi$ treated as mappings $\mathbb{A}_0\to\mathbb{A}_0$. For example, setting $k = l = 0$ in (63) and (64), one can express the probabilities in integral forms,
$$P(a) = \int_0^{2\pi}\chi_\alpha(x)\,\rho(x)\,\mathrm{d}x,\tag{98}$$
$$P(a\wedge b) = \int_0^{2\pi}\chi_\alpha(x)\,\chi_\beta(x)\,\rho(x)\,\mathrm{d}x.\tag{99}$$
Here, $\chi_\alpha$ is the characteristic function of the half-circle located symmetrically with respect to the angle $\alpha$; $\rho(x) = 1/(2\pi)$ is the uniform probability density on the circle. Formula (99) is local in the sense of Bell [38] and Clauser and Horne [39], because of the product structure of the term $\chi_\alpha(x)\,\chi_\beta(x)$.
The case $k = l = 0$ of the Bayes law discussed in Section 6 is (with $P(b)\neq 0$)
$$P(a|b) = \frac{\int_0^{2\pi}\chi_\alpha(x)\,\chi_\beta(x)\,\rho(x)\,\mathrm{d}x}{\int_0^{2\pi}\chi_\beta(x)\,\rho(x)\,\mathrm{d}x} = \int_0^{2\pi}\chi_\alpha(x)\,\rho(x|b)\,\mathrm{d}x,$$
which is equivalent to the assumption that the first measurement reduces the probability density according to
$$\rho(x)\ \mapsto\ \rho(x|b) = \frac{\chi_\beta(x)\,\rho(x)}{\int_0^{2\pi}\chi_\beta(y)\,\rho(y)\,\mathrm{d}y}.\tag{103}$$
Equation (103) is an example of a classical projection postulate in theories based on the arithmetic $\mathbb{A}_0$.
Returning to the singlet case, corresponding to $k = 0$, $l = 1$, we can write it in analogy to (98) and (99),
$$P_1(a\wedge b) = g\left(\int_0^{2\pi}\chi_{a\wedge b}(x)\,\rho(x)\,\mathrm{d}x\right),\tag{106}$$
where $\rho$ is the uniform probability density, and $\chi_{a\wedge b}(x) = \chi_\alpha(x)\,\chi_\beta(x)$ is the characteristic function representing the conjunction “$a$ and $b$”. Notice that (106) is a non-Newtonian integral
$$P_1(a\wedge b) = \int_0^{2\pi}F(x)\,\mathrm{D}x$$
of the function
$$F(x) = g\big(\chi_\alpha(x)\big)\odot_1 g\big(\chi_\beta(x)\big)\odot_1 g\big(\rho(x)\big),\tag{109}$$
where $F:\mathbb{A}_0\to\mathbb{A}_1$, and the multiplication is given by
$$x\odot_1 y = g\big(g^{-1}(x)\,g^{-1}(y)\big).$$
The right-hand side of (109) has again the Bell–Clauser–Horne product form, the only difference being that instead of $\chi_\alpha(x)\,\chi_\beta(x)$ one employs $g\big(\chi_\alpha(x)\big)\odot_1 g\big(\chi_\beta(x)\big)$. This is why (109) can be regarded as a local hidden-variable representation of singlet-state probabilities, hence a counterexample to Bell's theorem. This is the main idea of the approach to singlet-state correlations introduced in [12] and further discussed in [10,13,14].
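A Monte Carlo sketch of this construction (Python), assuming $g(p) = \sin^2(\pi p/2)$ and a bijection of the form $G(x) = g(2x)/2$ (consistent with Lemma 4 below; all names are illustrative), reproduces the singlet probability from an ordinary, manifestly local integral over the hidden variable:

```python
import numpy as np

rng = np.random.default_rng(1)

def chi(x, angle):
    """Characteristic function of the half-circle centred at `angle`."""
    return (np.cos(x - angle) > 0).astype(float)

g = lambda p: np.sin(np.pi * p / 2) ** 2
G = lambda x: g(2 * x) / 2

alpha, beta = 0.3, 1.5
theta = beta - alpha
x = rng.uniform(0, 2 * np.pi, 1_000_000)   # hidden variable, density 1/(2*pi)

# hidden-level joint probability: ordinary integral of a Bell-CH product
p_hidden = np.mean(chi(x, alpha) * chi(x, beta + np.pi))   # = theta/(2*pi)

# quantum-level singlet probability obtained by the bijection G
print(G(p_hidden))                          # ≈ 0.5*sin^2(theta/2)
print(0.5 * np.sin(theta / 2) ** 2)
```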
A formal basis of the construction from [10,12,13,14] is given by the following:
Lemma 4. Consider four joint probabilities, $P(+,+)$, $P(+,-)$, $P(-,+)$, $P(-,-)$, satisfying (114). A sufficient condition for (115) is given by $G(x) = \frac{1}{2}g(2x)$, where $g$ satisfies Lemma 1. Any such $G$ has a fixed point at $x = 1/4$.
A disadvantage of the construction based on Lemma 4 is its restriction to “rotationally symmetric” probabilities, i.e., those fulfilling (114). Moreover, being in itself sufficient as a counterexample to Bell's theorem, it lacks the generality typical of arbitrary quantum states.
The fundamental structure of the quantum probability model seems to be best described by Formula (69).
So far, the angles occurring in singlet-state probabilities were interpretable as experimental parameters (angles between polarizers or Stern–Gerlach devices). But what about arbitrary quantum states, even those described by infinite-dimensional Hilbert spaces? It turns out that the parameter in question can be interpreted in geometric terms, independently of the physical nature of the problem.
10. Fubini–Study Geodesic Distance as a Hidden Variable
The scalar product $\langle a|b\rangle$ of two vectors belonging to some Hilbert space defines their Fubini–Study geodesic distance $D_{ab}$ [40,41,42,43,44,45],
$$\cos D_{ab} = \frac{|\langle a|b\rangle|}{\sqrt{\langle a|a\rangle\,\langle b|b\rangle}}.$$
Let $P_a = |a\rangle\langle a|$ be a projector, $\langle a|a\rangle = 1$, and $\langle b|b\rangle = 1$, so that
$$P(a|b) = \langle b|P_a|b\rangle = |\langle a|b\rangle|^2$$
is a conditional quantum probability. The geodesic distance between $|a\rangle$ and $|b\rangle$ satisfies
$$\cos^2 D_{ab} = P(a|b),\tag{118}$$
and thus,
$$D_{ab} = \arccos\sqrt{P(a|b)}.$$
The formal angle $D_{ab}$ between the two vectors in the Hilbert space acquires a direct physical interpretation if $a$ and $b$ represent linear polarizations of photons: $D_{ab}$ becomes the angle between the two polarizers. In the analogous case of electrons, $D_{ab}$ would represent one half of the angle between the two Stern–Gerlach devices.
Next, let us rewrite (118) as
$$P(a|b) = \cos^2 D_{ab} = g\Big(1 - \frac{2D_{ab}}{\pi}\Big),$$
where $g(p) = \sin^2(\pi p/2)$ is the bijection we have introduced in the context of the singlet state. Probabilities $p = 1 - 2D_{ab}/\pi$ and $g(p) = P(a|b)$ represent, respectively, the hidden and the quantum neighboring levels of the hierarchy of (conditional) probabilities. The hidden probability is thus directly related to the Fubini–Study geodesic distance,
$$p = 1 - \frac{2D_{ab}}{\pi},$$
where $2D_{ab}/\pi$ is the probability that two randomly chosen and intersecting straight lines intersect at an angle not exceeding $D_{ab}$.
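The relation between the Fubini–Study distance and the two neighboring probability levels can be checked for random states of any dimension (Python sketch; the names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def random_state(dim):
    """Haar-random pure state in C^dim."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

g = lambda p: np.sin(np.pi * p / 2) ** 2

for dim in (2, 5, 50):
    a, b = random_state(dim), random_state(dim)
    P_quantum = abs(np.vdot(a, b)) ** 2          # conditional probability |<a|b>|^2
    D = np.arccos(np.sqrt(P_quantum))            # Fubini-Study geodesic distance
    p_hidden = 1 - 2 * D / np.pi                 # hidden-level probability
    print(dim, np.isclose(g(p_hidden), P_quantum))  # g maps hidden -> quantum
```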
The Fubini–Study geodesic distance has been turned into a classical measure of a subset of a quarter-circle. It defines the whole hierarchy of probabilities, $p_k = g^k(p)$, where $p_1 = P(a|b)$ is the quantum one. Note that $g(p) = \sin^2(\pi p/2)$ has been elevated to the role of a universal bijection, defining an arithmetic applicable to all the possible (pure) quantum states. Explicitly, we find the hierarchy written out in (123)–(127).
Since $P(a|b) = |\langle a|b\rangle|^2$ is real, it can be written as a real quadratic form, (128). Hence, the same probability can be rewritten at the neighboring level, (132), where the quadratic form in (132) is defined in a way that parallels the form of (128), but with all the “standard” sums $+$ and products $\cdot$ replaced by $\oplus_1$ and $\odot_1$, and all the coefficients transformed by $g$. In effect, the difference between (128) and (132) is purely notational, as one can write the whole hierarchy of probabilities in a “quantum” form as well.
This is the Copernican principle in action. The choice of the “quantum” level of the hierarchy is just a matter of convention. In fact, any formula from (123)–(127) can represent the quantum mechanics known from textbooks.
It is perhaps more striking that any of these levels can be regarded as a hidden-variable level, where the hidden variable is given by an appropriate geodesic distance.
The concrete example of $g(p) = \sin^2(\pi p/2)$ can help us to understand the structure of the whole hierarchy. We will see that, in spite of the infinite dimension of the hierarchy, one effectively deals with a finite-dimensional structure.
11. Effective Truncation of the Infinite Hierarchy of Probabilities
Figure 2 explains why, in spite of the infinite number of levels, those that statistically differ from one another may be limited to a finite “band” in the hierarchy. What it practically means is that if our level of the hierarchy is given by some $l$ (say, $l = 0$) then, depending on the available precision of our experiments, we may restrict the analysis to a finite collection of probabilities. In the example depicted in Figure 2, we can restrict the analysis to 31 levels,
$$p_k = g^k(p),\qquad -15\leq k\leq 15,$$
because the full infinite hierarchy is indistinguishable from this finite collection.
When increasing $k$ in $g^k$, we effectively obtain a theory that may look discrete, because the iterates $g^k$ with sufficiently large $k$ are indistinguishable from the red step function in Figure 2. For the negative iterates $g^{-k}$, we obtain an analogous behavior of the inverse functions.
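The effective truncation is easy to quantify numerically. The sketch below (Python; the resolution and the grid are arbitrary choices made for the example) counts how many positive iterates remain distinguishable from the limiting step function:

```python
import numpy as np

g = lambda p: np.sin(np.pi * p / 2) ** 2
p = np.linspace(0, 1, 2001)

eps = 1e-3                        # assumed experimental resolution
step = (p > 0.5).astype(float)    # limiting "activation" step at p = 1/2
step[p == 0.5] = 0.5              # 1/2 is a fixed point of g

gk, k = p.copy(), 0
while np.max(np.abs(gk - step)) > eps and k < 100:
    gk, k = g(gk), k + 1
print(k)  # number of positive levels distinguishable at this resolution
```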
Let us stress that the above argument for indistinguishability has been formulated only for probabilities, $p\in[0,1]$, hence for the restrictions $g^k:[0,1]\to[0,1]$, and not for the full bijections $g^k:\mathbb{R}\to\mathbb{R}$. In principle, for arguments outside $[0,1]$, all the levels of the hierarchy may be distinguishable.
Notice that for this concrete $g$, one finds $\lim_{k\to\infty}g^k(p) = 0$ if $p < 1/2$, $g^k(1/2) = 1/2$, and $\lim_{k\to\infty}g^k(p) = 1$ if $p > 1/2$. Thus, the higher-level probabilities possess several obvious analogies to neural activation functions [46], making the links between the hierarchical structure and the measurement problem even more intriguing. An observer who measures the probabilities $p_k$ with large $k$ ignores practically all the events whose probability is smaller than 1/2, and treats all the events with probability greater than 1/2 as certain.
This type of behavior is the essence of learning algorithms. An intriguing possibility occurs that $g^k(p)$ is a probability related to the act of learning that events with probability $p$ are true. Hence, the natural question: Is the stabilization of large iterates $g^k$ on effectively the step function a formal counterpart of the stabilization of self-observation, a creation of self-awareness?
For the negative iterates, instead of a threshold function we tend toward a “white noise”: $g^{-k}(p)\to 1/2$ for $0 < p < 1$, $g^{-k}(0) = 0$, and $g^{-k}(1) = 1$, for $k\to\infty$. The lower levels of the hierarchy become less and less diverse from the point of view of a higher-level observer. Here, the analogy with observations of micro-scale events is quite evident. The relativity of probability becomes analogous to the “relativity of smallness”: what is small to us may be large for a bacterium or an atom.
It is worth recalling that $g^k$ and $g^{-k}$ only look discrete due to our limited resolution; in reality, both maps are continuous bijections of $[0,1]$ onto itself.
Now, what about experiment and laws of large numbers? Can they somehow discriminate between all these probabilities?
12. Hierarchical Laws of Large Numbers
Laws of large numbers formalize the relations between probabilities (real numbers), (natural) numbers of trials and successes, and (rational) numbers of their relative frequencies. However, as we already know, all these notions are arithmetic-dependent: a natural number in $\mathbb{A}_k$ may not be a natural number from the point of view of some other $\mathbb{A}_l$, a rational number in $\mathbb{A}_k$ may not be a rational number from the point of view of $\mathbb{A}_l$, and so on. The most general law of large numbers should involve all the levels of the hierarchy simultaneously. Dealing with binary events, we need an appropriate generalization of the Bernoulli law of large numbers.
To begin with, let us imagine we “live” in a world where all the possible computations are performed in terms of the arithmetic $\mathbb{A}_0$. If we toss a coin, say, one hundred times, and observe heads forty times, the arithmetic formulation of the experiment involves $40_0 = 40$ heads in $100_0 = 100$ trials. The experimental ratio is $40_0\oslash_0 100_0 = 0.4$. This is a rational number in $\mathbb{A}_0$.
If the same experiment is described by an observer who employs the arithmetic $\mathbb{A}_k$, $k\neq 0$, the experimental ratio is given by $40_k\oslash_k 100_k = g^k(0.4)$. In terms of $p = 0.4$ and $p_k = g^k(p)$, we can write $p = 40_0\oslash_0 100_0$ and $p_k = 40_k\oslash_k 100_k$. Yet, if we demanded $p_k = p$, it would imply that $g^k(0.4) = 0.4$, i.e., $0.4$ is a fixed point of $g^k$. Since the same argument can be applied to any rational number, one arrives at the conclusion that the trivial case $g = \mathrm{id}$ is the only solution.
One concludes that a nontrivial $g$ generically implies $p_k\neq p$ for $k\neq 0$. In other words, the same experiment can be described by different probabilities, $p$ and $p_k$, although from the frequentist perspective both descriptions involve forty successes in one hundred trials. We inevitably arrive at the whole hierarchy.
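In code, the level-$k$ description of the same coin experiment reads (Python; assuming $g(p) = \sin^2(\pi p/2)$):

```python
import numpy as np

g = lambda x: np.sin(np.pi * x / 2) ** 2   # restriction of g to [0, 1]

def ratio_level_k(successes, trials, k):
    """Experimental ratio n_k ⊘_k N_k = g^k(n/N) as seen by a level-k observer."""
    r = successes / trials
    for _ in range(k):
        r = g(r)
    return r

# forty heads in one hundred trials, seen from levels 0, 1, 2, 3
for k in range(4):
    print(k, ratio_level_k(40, 100, k))
# level 0: 0.4; level 1: g(0.4) = sin^2(0.2*pi) ≈ 0.345; and so on
```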
This is my tentative interpretation of the hierarchical structure. However, the links with neural activation functions deserve a separate study.
In order to formulate a generalized Bernoulli law of large numbers, we have to estimate the probability that
$$\big|\,(n_k\oslash_k N_k)\ominus_k p_k\,\big|_k\ \leq\ \epsilon_k.\tag{140}$$
The modulus is defined in $\mathbb{A}_k$ in the standard way,
$$|x|_k = \begin{cases}x & \text{for } x\geq 0_k,\\ 0_k\ominus_k x & \text{for } x < 0_k,\end{cases}$$
where we keep in mind that, by assumption, $0_k = 0$ and the ordering relation is unaffected by a strictly increasing $g$. Inequality (140) effectively boils down to
$$\Big|\frac{n}{N} - p\Big|\ \leq\ \epsilon.$$
Next, we note that the probabilities depicted in the lower part of Figure 3 are normalized in consequence of the identity
$$g^k\left(\sum_{n=0}^N\binom{N}{n}p^n q^{N-n}\right) = g^k(1) = 1.$$
The probability
$$P_k(n|N) = g^k\left(\binom{N}{n}p^n q^{N-n}\right)$$
corresponds to $n_k$ successes in $N_k$ trials. The expected number of successes and the corresponding variance read,
$$\langle n\rangle_k = g^k(Np),\tag{149}$$
$$\langle\Delta n^2\rangle_k = g^k(Npq).\tag{150}$$
Applying $g^{-k}$ to (149) and (150), we find
$$\langle n\rangle = Np,\qquad \langle\Delta n^2\rangle = Npq.$$
Now, let $\chi_\epsilon(n) = 1$ if $|n/N - p|\geq\epsilon$, and $\chi_\epsilon(n) = 0$ otherwise. Then,
$$P\Big(\Big|\frac{n}{N}-p\Big|\geq\epsilon\Big) = \sum_{n=0}^N\chi_\epsilon(n)\binom{N}{n}p^n q^{N-n},$$
where $P$ is the 0th-level probability that $|n/N - p|\geq\epsilon$. In this way we have arrived at the standard Bernoulli law of large numbers in $\mathbb{A}_0$,
$$P\Big(\Big|\frac{n}{N}-p\Big|\geq\epsilon\Big)\ \leq\ \frac{pq}{N\epsilon^2}.\tag{155}$$
Of course, the left-hand side of (155) cannot be greater than 1, so the number of trials $N$ must be chosen so that $pq/(N\epsilon^2)\leq 1$.
For $k\neq 0$ we find, denoting $P_k = g^k(P)$, $\epsilon_k = g^k(\epsilon)$,
$$P_k\Big(\big|\,(n_k\oslash_k N_k)\ominus_k p_k\,\big|_k\geq\epsilon_k\Big)\ \leq\ g^k\Big(\frac{pq}{N\epsilon^2}\Big),$$
for any $k$.
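The hierarchical bound is a direct consequence of the monotonicity of $g$: applying $g^k$ to both sides of a level-0 inequality preserves it. A quick numerical illustration (Python; the simulation parameters are arbitrary):

```python
import numpy as np

g = lambda x: np.sin(np.pi * x / 2) ** 2

def g_k(x, k):
    for _ in range(abs(k)):
        x = g(x) if k > 0 else 2 / np.pi * np.arcsin(np.sqrt(x))
    return x

p, q, eps, N = 0.5, 0.5, 0.1, 100
bound0 = p * q / (N * eps ** 2)           # Chebyshev-Bernoulli bound, level 0
rng = np.random.default_rng(0)
tosses = rng.random((20000, N)) < p
freq = tosses.mean(axis=1)
prob0 = np.mean(np.abs(freq - p) >= eps)  # empirical level-0 probability

for k in (-1, 0, 1, 2):
    # the level-k observer sees g^k(probability) <= g^k(bound);
    # the inequality survives because g is strictly increasing
    print(k, g_k(prob0, k) <= g_k(bound0, k))
```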
In order to get a feel of the influence of $g^k$ on the rate of convergence of experimental ratios to probabilities, consider the simple case of a symmetric coin, $p = q = 1/2$, and the universal quantum bijection $g(p) = \sin^2(\pi p/2)$. Since $g^k(1/2) = 1/2$ for any $k$, we have to estimate
$$P_k\Big(\big|\,(n_k\oslash_k N_k)\ominus_k \tfrac{1}{2}\,\big|_k\geq\epsilon_k\Big)\ \leq\ g^k\Big(\frac{1}{4N\epsilon^2}\Big).\tag{158}$$
Figure 4 illustrates the right-hand side of (158) as a function of $N$ and $\epsilon$, for the first four iterates of $g$, from $g^1$ to $g^4$. The graphs are intriguing. Their interpretation is additionally obscured by the fact that Wolfram Mathematica operates in the arithmetic $\mathbb{A}_0$, which is not used by any of the four observers. The problem requires further studies.
13. Hierarchical Approach to Bell’s Theorem—Revisited
If we are able to reconstruct singlet-state probabilities in a hidden-variable way, it means that Bell's inequality (in any form) cannot be proved for the model. In the hierarchical context, the obstacle for proving the inequality lies in the lack of the $k$-level additivity of the $l$-level integrals, if $k\neq l$. The usual derivation, when seen from the hierarchical perspective, assumes $\oplus_0$-additivity of integrals, which is untrue for a nontrivial $g$, and in particular for $g(p) = \sin^2(\pi p/2)$; hence, the inequality derived at level zero does not apply to level 1: Level-0 formulas are “violated” by level-1 probabilities (and the other way around).
Let us see how it works. Consider the joint probabilities
$$P(a\wedge b) = P(b)\,P_1(a|b) = P(a)\,P_1(b|a),\tag{161}$$
where we assume the independence of the order in which the measurements are performed. This is typical of the scenarios involving “observer 1 measuring $a$” (“Alice”) and “observer 2 measuring $b$” (“Bob”), who are space-like separated, and thus the order is undefined.
Now, we will derive an analog of the Clauser–Horne inequality [39]. We will work with probabilities (161). Let us stress that an analogous derivation was presented in [13], but it was based on the form occurring in (109), that is, by means of the bijection $G$. The derivation we will discuss now is based on $g$, and not on $G$. Why? Because we want a proof that is easy to generalize to arbitrary quantum states.
We assume a local hidden-variable form of the probabilities that occur at the hidden level (level zero), hence
$$P(a|b) = \int_0^{2\pi}\chi_a(x)\,\rho(x|b)\,\mathrm{d}x.$$
Level-one conditional probabilities $P_1(a|b) = g\big(P(a|b)\big)$ can be rewritten in several useful forms. First of all, introducing the reduced (conditional) probability density, we obtain the “projection postulate”,
$$P_1(a|b) = g\left(\int_0^{2\pi}\chi_a(x)\,\rho(x|b)\,\mathrm{d}x\right),\qquad \rho(x|b) = \frac{\chi_b(x)\,\rho(x)}{\int_0^{2\pi}\chi_b(y)\,\rho(y)\,\mathrm{d}y}.$$
Secondly, we can explicitly express the conditional probability in a local Clauser–Horne form (in the arithmetic $\mathbb{A}_1$),
$$P_1(a|b) = \int_0^{2\pi}g\big(\chi_a(x)\big)\odot_1 g\big(\rho(x|b)\big)\,\mathrm{D}x,$$
where $g\big(\chi_a(x)\big) = \chi_a(x)$ (because $g(0) = 0$, $g(1) = 1$).
Repeating step by step the derivation of the Clauser–Horne inequality [39], but here in the arithmetic $\mathbb{A}_1$, we can derive an analogous inequality which must be satisfied at the quantum level of the hierarchy. Such an inequality cannot be violated by quantum probabilities. For simplicity, let us reduce the analysis to singlet-state probabilities and $P(a) = P(b) = 1/2$ for any $a$, $b$. Then,
$$P_1(a|b) = \sin^2\frac{\alpha-\beta}{2}$$
and
$$P(a\wedge b) = \frac{1}{2}\,P_1(a|b).$$
Next, we consider the Clauser–Horne linear combination of the characteristic functions corresponding to the four pairs of settings. Repeating in $\mathbb{A}_1$ the reasoning from [39], we obtain the $\mathbb{A}_1$ counterpart of the Clauser–Horne inequality. $\odot_1$-multiplying the latter by $g\big(\rho(x|b)\big)$, integrating with $\mathrm{D}x$, and taking into account the $\oplus_1$-linearity of the $\mathbb{A}_1$ integral, we find the inequality (179), satisfied by the conditional probabilities at the quantum level of the hierarchy.
Notice that inequality (179) involves conditional probabilities, as opposed to the original Clauser–Horne one, which was based on joint probabilities. The inequalities derived in the arithmetic induced by $G$ and discussed in [12,13] were also based on joint probabilities. However, joint probabilities involve the “macroscopic” level-0 multiplication of $P(b)$ by $P_1(a|b)$, whereas the conditional probabilities involve only the arithmetic of the “microscopic” level-1 probability $P_1(a|b)$.
When investigating the violation of inequalities such as (179), one should keep in mind the difference between $g(x)$, for $x\in[0,1]$, and its extension $g(x)$ beyond the interval $[0,1]$. Here, (179) is derived under the assumption that $g(n) = n$, for any integer $n$. Readers interested in explicit examples of such bijections may consult [12,13,14].
The inequality that can indeed be violated is
$$-1\ \leq\ P(a\wedge b) - P(a\wedge b') + P(a'\wedge b) + P(a'\wedge b') - P(a') - P(b)\ \leq\ 0,\tag{180}$$
but it cannot be proved for the model, so it is simply untrue. The technical difficulty in proving (180) is the lack of $\oplus_0$-linearity of the $\mathbb{A}_1$ integral.
The notion of “violation” of a formula is, in my opinion, very confusing. In the same sense one could say that the real-number inequality $x^2\geq 0$ is violated by complex numbers. Instead of saying that $z = i$ violates $x^2\geq 0$, one rather says that $z^2\geq 0$ cannot be proved for all $z\in\mathbb{C}$. The same happens with the Bell inequality, derived in $\mathbb{A}_0$ but not valid in $\mathbb{A}_1$. On the other hand, the inequalities that can be derived in $\mathbb{A}_1$ are never “violated” in $\mathbb{A}_1$, but certainly will be untrue in some other $\mathbb{A}_k$.
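For completeness, a quick numerical check (Python) of the standard level-0 Clauser–Horne combination (180), evaluated on the singlet joint probabilities used above; the chosen settings make the combination exceed the classical upper bound 0:

```python
import numpy as np

# Singlet joint probabilities P(a∧b) = 1/2 sin^2((alpha-beta)/2), marginals 1/2.
P  = lambda a, b: 0.5 * np.sin((a - b) / 2) ** 2
Pm = 0.5

# Standard Clauser-Horne combination, derived within A_0:
# -1 <= P(a,b) - P(a,b') + P(a',b) + P(a',b') - P(a') - P(b) <= 0
a, ap, b, bp = 0.0, 3 * np.pi / 2, 3 * np.pi / 4, np.pi / 4
CH = P(a, b) - P(a, bp) + P(ap, b) + P(ap, bp) - Pm - Pm
print(CH)   # ≈ 0.207, outside [-1, 0]: the A_0 inequality fails at level 1
```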
15. An Open Ending
Standard modern physics involves a three-level hierarchy: quantum, classical, and cosmological. As human observers, we are positioned at the center of this hierarchy, but the connections with the remaining two levels remain unclear. We do not understand how it is that we observe quantum properties (the measurement problem). Similarly, we do not understand our relation with the large-scale universe (the dark-energy problem). In both cases, the arithmetic freedom is probably essential [12,48] but generally overlooked by our scientific community.
Bell's theorem is generally believed to eliminate levels lower than the quantum one, but the hierarchical picture questions this viewpoint: Quantum and classical probabilities typical of the singlet state belong to neighboring levels in the hierarchy, and this holds for any two neighboring levels. Elimination of any of the levels would thus destroy the whole hierarchical structure, all quantum levels included [14].
To the best of my knowledge, the first systematic study of generalized arithmetics in physics was initiated by my paper [49], in which the relativity of arithmetic was interpreted in terms of a fundamental symmetry. However, I merely rediscovered a structure that had previously been introduced to calculus by Grossman and Katz (non-Newtonian calculus) [16,17,18], Maslov (idempotent analysis) [50], and Pap (g-calculus) [19]. The origins of the idea of generalized arithmetic and calculus can be traced back to the works of Volterra on the product integral [51], Kolmogorov [34] and Nagumo [35] on generalized means, and Rényi on generalized entropies [33]. Studies of a nonstandard number theory were initiated by Rashevsky [52] and, in the concrete form of non-Diophantine arithmetic, developed by Burgin [53,54,55,56]. Generalized forms of arithmetic can be found in Benioff's attempts to formulate a coherent theory of physics and mathematics [57,58,59,60,61]. Mathematical constructions such as Lad's impediment functions [62], cepstral signal analysis [63,64], fractal $F^\alpha$-calculus [65,66,67,68,69,70], or nonextensive statistics [71,72,73,74,75] involve certain formal elements analogous to non-Newtonian integration or differentiation. The first application of non-Newtonian calculus to probability of which I am aware was provided by Meginniss in his analysis of the objectivity of $p$ versus the subjectivity of $g(p)$, with applications to gambling theory [76]. Another field in which generalized arithmetic and non-Newtonian calculus are starting to attract attention is mathematical finance [77,78]. From my personal perspective, the most important achievements of the new formalism include circumventing the limitations of Bell's theorem and Tsirelson bounds in quantum mechanics [12,13]; the arithmetic of time, which appears to eliminate dark energy from cosmology in the same way that the arithmetic of velocities eliminated the luminiferous aether from special relativity [48]; formulating wave propagation along fractal coastlines [23]; and overcoming the limitations of Fourier analysis on Cantor sets [47].
The two most important observations of the present study seem to be the interpretation of the singlet-state probabilities in terms of several different arithmetic levels occurring in a single Formula (69), and the possible links with neural-network learning algorithms.
The hierarchical structure is clearly “there”. What we have understood so far is just the tip of the iceberg.