Abstract
Probability theory is built around Kolmogorov’s axioms. To each event, a numerical degree of belief between 0 and 1 is assigned, which provides a way of summarizing the uncertainty. Kolmogorov’s probabilities of mutually exclusive events are added; the sum over all possible events is one. The numerical degrees of belief can be estimated from a sample as an approximation of the true fraction: the frequency of an event in a sample is counted and normalized, resulting in a linear relation. We introduce quantum-like sampling, for which the resulting Kolmogorov probabilities are in a sigmoid relation. The sigmoid relation offers better interpretability, since it induces the bell-shaped distribution, and it also leads to less uncertainty when computing Shannon’s entropy. Additionally, we conducted 100 empirical experiments in which random training and validation sets were repeatedly drawn from the Titanic data set and a Naïve Bayes classifier was trained with quantum-like sampling. On average, the accuracy increased.
1. Introduction
Quantum algorithms are based on different principles than classical algorithms. Here we investigate a simple quantum-like algorithm that is motivated by quantum physics. The incompatibility between Kolmogorov’s probabilities and quantum probabilities results from the different norms that are used: in quantum probabilities, the vector of amplitudes of all events has length one in the $l_2$ norm. Usually, Kolmogorov’s probabilities are converted into quantum-like probabilities by the square-root operation; however, it is difficult to attach any meaning to the square-root operation. Motivated by this lack of interpretability, we define quantum-like sampling, the $l_2$ sampling. The resulting Kolmogorov probabilities are no longer linear in the counts but related to the sigmoid function. In the following, we first introduce the traditional $l_1$ sampling. Then we introduce the quantum-like sampling, the $l_2$ sampling. We indicate the relation between the $l_2$ sampling, the sigmoid function and the normal distribution. The quantum-like sampling leads to less uncertainty. We support this hypothesis by empirical experiments with a simple Naïve Bayes classifier on the Titanic dataset.
2. Kolmogorov’s Probabilities
Probability theory is built around Kolmogorov’s axioms (first published in 1933 [1]). All probabilities are between 0 and 1: for any proposition x,
$$0 \le P(x) \le 1,$$
and necessarily true propositions have probability 1 while necessarily false propositions have probability 0,
$$P(\mathrm{true}) = 1, \qquad P(\mathrm{false}) = 0.$$
To each sentence, a numerical degree of belief between 0 and 1 is assigned, which provides a way of summarizing the uncertainty. The last axiom expresses the probability of disjunction and is given by
$$P(x \lor y) = P(x) + P(y) - P(x \land y).$$
Where do these numerical degrees of belief come from?
- Humans can hold subjective degrees of belief, which can be determined by empirical psychological experiments. This is a very subjective way to determine the numerical degree of belief.
- For a finite sample we can estimate the true fraction: we count the frequency of an event in the sample. We do not know the true value because we cannot access the whole population of events. This approach is called the frequentist approach.
- It appears that the true values can be determined from the true nature of the universe; for example, for a fair coin, the probability of heads is $1/2$. This approach is related to the Platonic world of ideas. However, we can never verify whether a fair coin exists.
Frequentist Approach and $l_1$ Sampling
Relying on the frequentist approach, one can determine the probability of an event $x$ by counting. If $\Omega$ is the set of all possible events, the cardinality $|\cdot|$ determines the number of elements of a set, and $|x|$ is the number of elements of the set $x$ with $|x| \le |\Omega|$, then
$$P(x) = \frac{|x|}{|\Omega|}.$$
This kind of sampling is the $l_1$ sampling. With $n$ events $x_1, x_2, \ldots, x_n$ that describe all possible events of the set $\Omega$,
$$\sum_{i=1}^{n} P(x_i) = \frac{\sum_{i=1}^{n} |x_i|}{|\Omega|} = 1.$$
We can interpret each $P(x_i)$ as a component of an $n$-dimensional vector
$$\mathbf{p} = \big(P(x_1), P(x_2), \ldots, P(x_n)\big)$$
with the $l_1$ norm
$$\|\mathbf{p}\|_1 = \sum_{i=1}^{n} |P(x_i)| = 1;$$
that is, $\mathbf{p}$ is a unit-length vector in the $l_1$ norm.
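As an illustration of the $l_1$ sampling, the following minimal Python sketch (not from the original paper; the event counts are hypothetical) estimates Kolmogorov probabilities from counted frequencies.

```python
import numpy as np

# Hypothetical counts |x_i| of three mutually exclusive events in a sample.
counts = np.array([12, 30, 58], dtype=float)

# l1 sampling: divide each count by the size of the sample |Omega|.
p_l1 = counts / counts.sum()

print(p_l1)          # [0.12 0.3  0.58]
print(p_l1.sum())    # 1.0 -> a unit-length vector in the l1 norm
```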
3. Quantum Probabilities
Quantum physics evaluates the probability of a state $x$ as the squared magnitude of a probability amplitude $a(x)$, which is represented by a complex number,
$$P(x) = |a(x)|^2 = a(x) \cdot \overline{a(x)}.$$
This is because the product of a complex number with its conjugate is always a non-negative real number. For $n$ states, the amplitudes are normalized such that
$$\sum_{i=1}^{n} |a(x_i)|^2 = 1.$$
Quantum physics by itself does not offer any justification or explanation beside the statement that it just works fine, see [2]. These quantum probabilities are also called von Neumann probabilities. Converting the sum of two amplitudes into a probability leads to an interference term,
$$|a(x) + a(y)|^2 = |a(x)|^2 + |a(y)|^2 + 2\,\mathrm{Re}\big(a(x)\,\overline{a(y)}\big),$$
making both approaches, in general, incompatible:
$$P(x \lor y) = P(x) + P(y) \ne |a(x) + a(y)|^2.$$
In other words, the summation rule of classical probability theory is violated, resulting in one of the most fundamental laws of quantum mechanics, see [2]. In quantum physics we interpret each amplitude $a(x_i)$ as a component of an $n$-dimensional vector
$$\mathbf{a} = \big(a(x_1), a(x_2), \ldots, a(x_n)\big).$$
In quantum physics the unit-length vector is computed in the $l_2$ norm,
$$\|\mathbf{a}\|_2 = \sqrt{\sum_{i=1}^{n} |a(x_i)|^2} = 1,$$
instead of the $l_1$ norm, in which the classical probability vector has unit length,
$$\|\mathbf{p}\|_1 = \sum_{i=1}^{n} |P(x_i)| = 1.$$
By replacing the $l_1$ norm by the Euclidean $l_2$ norm in classical probability theory (Kolmogorov probabilities), we obtain quantum mechanics [3] with all the corresponding laws. The incompatibility between Kolmogorov’s probabilities and quantum probabilities results from the simple fact that, in general,
$$\|\mathbf{p}\|_1 = 1 \quad \text{does not imply} \quad \|\mathbf{p}\|_2 = 1.$$
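A short numerical sketch (with an arbitrary probability vector) illustrates this fact: a vector of unit length in the $l_1$ norm generally does not have unit length in the $l_2$ norm, while the vector of square-root amplitudes does.

```python
import numpy as np

p = np.array([0.12, 0.30, 0.58])      # a Kolmogorov probability vector
print(np.linalg.norm(p, ord=1))       # 1.0   -> unit length in the l1 norm
print(np.linalg.norm(p, ord=2))       # ~0.66 -> not unit length in the l2 norm

a = np.sqrt(p)                        # amplitudes obtained by the square root
print(np.linalg.norm(a, ord=2))       # 1.0   -> unit length in the l2 norm
```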
This incompatibility is also the basis for quantum cognition. Quantum cognition is motivated by clues from psychology indicating that human cognition is based on quantum probability rather than on the traditional probability theory as explained by Kolmogorov’s axioms, see [4,5,6,7]. Empirical findings show that, under uncertainty, humans tend to violate expected utility theory and consequently the laws of classical probability theory (e.g., the law of total probability [6,7,8]) [4,9,10,11,12,13,14], leading to what is known as the “disjunction effect”, which, in turn, leads to a violation of the Sure Thing Principle. The violation results from an additional interference term that influences the classical probabilities.
Conversion
The amplitude is the square root of the belief multiplied by the corresponding phase [11,12,13],
$$a(x) = \sqrt{P(x)} \cdot e^{i\theta}.$$
With $l_1$ sampling it is difficult to attach any meaning to the square root
$$\sqrt{P(x)} = \sqrt{\frac{|x|}{|\Omega|}}.$$
Motivated by the lack of interpretability, we define the quantum-like sampling, the $l_2$ sampling, with $n$ events $x_1, x_2, \ldots, x_n$ that describe all possible events,
$$P(x_i) = \frac{|x_i|^2}{\sum_{j=1}^{n} |x_j|^2},$$
and
$$a(x_i) = \frac{|x_i|}{\sqrt{\sum_{j=1}^{n} |x_j|^2}} \cdot e^{i\theta_i},$$
with
$$\sum_{i=1}^{n} |a(x_i)|^2 = 1.$$
The $l_2$ sampling leads to an interpretation of the amplitudes as a normalized frequency of occurrence of an event multiplied by the phase. The normalization factor depends on the distribution of all values $|x_j|$. When developing empirical experiments that are explained by quantum cognition models, $l_2$ sampling should be used rather than $l_1$ sampling. What is the interpretation of the $l_2$ sampling outside the quantum framework?
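A minimal sketch of the $l_2$ sampling defined above (hypothetical counts, phases set to one): the amplitudes are the counts normalized in the $l_2$ norm, and the probabilities are their squared magnitudes.

```python
import numpy as np

counts = np.array([12, 30, 58], dtype=float)    # hypothetical counts |x_i|

# l2 (quantum-like) sampling: normalize the counts in the Euclidean norm.
amplitudes = counts / np.linalg.norm(counts, ord=2)
p_l2 = amplitudes ** 2                           # probabilities = squared magnitudes

print(amplitudes)          # a unit-length vector in the l2 norm
print(p_l2, p_l2.sum())    # probabilities sum to 1, but differ from counts/counts.sum()
```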
4. Quantum-Like Sampling and the Sigmoid Function
To understand the difference between $l_1$ sampling and $l_2$ sampling (quantum-like sampling) we will analyze a simple binary event $x$ with its complement $\neg x$, where $|x| + |\neg x| = |\Omega|$.
For $l_1$ sampling, $P(x)$ and $P(\neg x)$ are defined as
$$P_1(x) = \frac{|x|}{|\Omega|}, \qquad P_1(\neg x) = \frac{|\Omega| - |x|}{|\Omega|},$$
and for $l_2$ sampling
$$P_2(x) = \frac{|x|^2}{|x|^2 + (|\Omega| - |x|)^2}, \qquad P_2(\neg x) = \frac{(|\Omega| - |x|)^2}{|x|^2 + (|\Omega| - |x|)^2}.$$
With $t = |x|$ ranging from $0$ to $|\Omega|$,
we can define the functions for binary $l_1$ and $l_2$ sampling as
$$f_1(t) = \frac{t}{|\Omega|}$$
and
$$f_2(t) = \frac{t^2}{t^2 + (|\Omega| - t)^2}.$$
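The two functions can be written directly in Python; the value of $|\Omega|$ below is an arbitrary example.

```python
import numpy as np

omega = 20                      # |Omega|, an arbitrary example size

def f1(t):
    """l1 sampling of a binary event: linear in the count t."""
    return t / omega

def f2(t):
    """l2 (quantum-like) sampling of a binary event: sigmoid in the count t."""
    return t**2 / (t**2 + (omega - t)**2)

t = np.arange(omega + 1)
print(f1(t))                    # a straight line from 0 to 1
print(f2(t))                    # a sigmoid-shaped curve from 0 to 1
```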
In Figure 1, $f_1(t)$ is compared to $f_2(t)$ for four different values of $|\Omega|$ (panels (a)–(d)). With growing size of $|\Omega|$ the function $f_2(t)$ converges to a continuous sigmoid function. In Figure 2 this sigmoid function is compared to the well-known logistic function, scaled to the same range.
Figure 1.
$f_1(t)$ compared to $f_2(t)$ for four different values of $|\Omega|$, panels (a)–(d). With growing size of $|\Omega|$ the function $f_2(t)$ converges to a sigmoid function.
Figure 2.
Sigmoid functions: $f_2(t)$ versus the logistic function.
The derivative of the sigmoid function is a bell-shaped function. For the continuous function $f_2(t)$, the derivative is similar to the Gaussian distribution. The central limit theorem states that under certain (fairly common) conditions, the sum of many random variables will have an approximately Gaussian distribution. In Figure 3 the derivative of $f_2(t)$ is indicated and compared to a Gaussian distribution with corresponding mean and standard deviation.
Figure 3.
(a) the derivative of $f_2(t)$; (b) the derivative of $f_2(t)$ compared to the Gaussian distribution.
The derivative of the sigmoid function is less similar to the probability mass function of the binomial distribution.
The $l_2$ sampling leads to a natural sigmoid representation of probabilities that reflects the nature of the Gaussian/normal distribution. This leads to less uncertainty using $l_2$ sampling compared to $l_1$ sampling, as measured by Shannon’s entropy
$$H = -P(x) \log_2 P(x) - P(\neg x) \log_2 P(\neg x),$$
as can be seen in Figure 4.
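The claim of lower uncertainty can be checked numerically: for every count $t$, the $l_2$-sampled probability lies further from $1/2$ than the $l_1$-sampled one, so its binary Shannon entropy is never larger (a small sketch using the same formulas as $f_1$ and $f_2$ above).

```python
import numpy as np

omega = 20

def binary_entropy(p):
    # Shannon entropy of a binary event; clipping avoids log2(0).
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

t = np.arange(omega + 1)
h_l1 = binary_entropy(t / omega)
h_l2 = binary_entropy(t**2 / (t**2 + (omega - t)**2))

print(np.all(h_l2 <= h_l1 + 1e-9))    # True: l2 sampling never increases the entropy
```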
Figure 4.
Shannon’s entropy $H$: blue discrete plot for $l_1$ sampling and yellow discrete plot for $l_2$ sampling.
Combination
Depending on whether $l_1$ or $l_2$ sampling is used, we get different results when we combine probabilities. Multiplying two independently sampled events $x$ and $y$ results in the joint distribution
$$P_1(x \land y) = P_1(x) \cdot P_1(y)$$
when sampled with $l_1$ and
$$P_2(x \land y) = P_2(x) \cdot P_2(y)$$
when sampled with $l_2$, as indicated in Figure 5. The same applies to the weighted sum of information, corresponding to Shannon’s entropy as used in the ID3 algorithm [15,16,17] for symbolic machine learning to perform a greedy search for small decision trees. Computing Shannon’s entropy of the combined events when sampled with $l_1$ and when sampled with $l_2$ leads to different entropy values, as indicated in Figure 6.
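A minimal sketch of this combination step with two hypothetical binary events $x$ and $y$: the joint probabilities and their binary Shannon entropies differ between the two samplings.

```python
import numpy as np

omega = 20
tx, ty = 6, 14                  # hypothetical counts |x| and |y| out of |Omega| = 20

def p1(t):                      # l1 sampling of a binary event
    return t / omega

def p2(t):                      # l2 (quantum-like) sampling of a binary event
    return t**2 / (t**2 + (omega - t)**2)

def h(p):                       # binary Shannon entropy
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

joint_l1 = p1(tx) * p1(ty)      # product of independent l1-sampled probabilities
joint_l2 = p2(tx) * p2(ty)      # product of independent l2-sampled probabilities

print(joint_l1, h(joint_l1))    # 0.21  and its entropy
print(joint_l2, h(joint_l2))    # ~0.13 and a different entropy
```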
Figure 5.
Multiplying two independently sampled events $x$ and $y$ leads to different results: (a) joint distribution with $l_1$ sampling, (b) joint distribution with $l_2$ sampling, (c) contour plot of (a), (d) contour plot of (b).
Figure 6.
Different Shannon entropy values for sampled events $x$ and $y$: (a) entropy with $l_1$ sampling, (b) entropy with $l_2$ sampling, (c) contour plot of (a), (d) contour plot of (b).
Less uncertainty should lead to better results in machine learning. We perform empirical experiments with a simple Naïve Bayes classifier to support this hypothesis.
5. Naïve Bayes Classifier
For a target function $f: X \rightarrow V$, where each instance $x$ is described by attributes $a_1, a_2, \ldots, a_n$, the most probable value of $f(x)$ is:
$$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n) = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j) \, P(v_j).$$
The likelihood of the conditional probability $P(a_1, a_2, \ldots, a_n \mid v_j)$ is described by all possible combinations of attribute values. For $n$ binary variables, the exponential growth of combinations being true or false becomes an intractable problem for large $n$, since all possible combinations must be known. The decomposition of large probabilistic domains into weakly connected subsets via conditional independence,
$$P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_{i=1}^{n} P(a_i \mid v_j),$$
is known as the Naïve Bayes assumption and is one of the most important developments in the recent history of Artificial Intelligence [16]. It assumes that a single cause directly influences a number of events, all of which are conditionally independent. The Naïve Bayes classifier is defined as
$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j).$$
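The following sketch shows how such a Naïve Bayes classifier for binary attributes can be built on either sampling scheme. It is not the author’s original code; the function names are hypothetical, and the only difference between the two variants is whether counts are normalized with the $l_1$ or the $l_2$ rule.

```python
import numpy as np

def normalize(counts, norm="l1"):
    """Turn a vector of counts into probabilities with l1 or l2 (quantum-like) sampling."""
    counts = np.asarray(counts, dtype=float)
    if norm == "l1":
        return counts / counts.sum()
    return counts**2 / (counts**2).sum()

def train_naive_bayes(X, y, norm="l1"):
    """X: (n_samples, n_attributes) binary matrix, y: binary target vector."""
    classes = np.unique(y)
    prior = normalize([np.sum(y == c) for c in classes], norm)
    likelihood = {}
    for c in classes:
        Xc = X[y == c]
        # For every attribute: probabilities of the values 0 and 1 given the class.
        likelihood[c] = [normalize([np.sum(Xc[:, j] == 0), np.sum(Xc[:, j] == 1)], norm)
                         for j in range(X.shape[1])]
    return classes, prior, likelihood

def predict(x, classes, prior, likelihood):
    """Return the class v maximizing P(v) * prod_i P(a_i | v)."""
    scores = [prior[k] * np.prod([likelihood[c][j][int(x[j])] for j in range(len(x))])
              for k, c in enumerate(classes)]
    return classes[int(np.argmax(scores))]
```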
Titanic Dataset
When the Titanic sank, it killed 1502 out of 2224 passengers and crew. We are using a processed file titanic.csv from https://gist.github.com/michhar/ accessed on 1 July 2021 that corresponds to the file train.csv from kaggle https://www.kaggle.com/c/titanic/data?select=train.csv accessed on 1 July 2021 and contains data for 891 of the real Titanic passengers. Each row represents one person. The columns describe different attributes about the person, including their name, whether they survived, their passenger-class, their sex, their age, their fare, and six other attributes, see Figure 7. In our experiment we will only use the attributes survived, pclass, sex and age.
Figure 7.
The Titanic data set is represented in an Excel table that contains data for 891 of the real Titanic passengers, some entries are not defined.
The binary attribute survived (= s) will be our target $h$, resulting in the priors $P(s=1)$ and $P(s=0)$. Out of the 891 passengers, 342 survived and 549 did not survive. The attribute pclass has three values: 1 for upper class, 2 for middle class and 3 for lower class. It will be binarized into the attribute $c$, indicating whether the person had a lower-class cabin, by
$$c = \begin{cases} 1 & \text{if } pclass = 3, \\ 0 & \text{otherwise;} \end{cases}$$
in the case pclass is not defined, the default value is 0. The attribute sex has two values, represented by the binary attribute $m$,
$$m = \begin{cases} 1 & \text{if } sex = \text{male}, \\ 0 & \text{otherwise;} \end{cases}$$
in the case sex is not defined, the default value is 0. The attribute age will be binarized into the attribute $g$, indicating whether the person is a child ($g = 0$) or a grown-up ($g = 1$), by a fixed age threshold; in the case age is not defined, the default value is 1.
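A possible preprocessing sketch, assuming pandas, a local copy of titanic.csv with the Kaggle column names (Survived, Pclass, Sex, Age), and an assumed age threshold of 13 years for the child/grown-up split, since the exact cut-off is not restated here.

```python
import pandas as pd

df = pd.read_csv("titanic.csv")                  # processed Titanic file, 891 passengers

# Binarize the attributes; missing entries receive the defaults described above.
c = df["Pclass"].fillna(0).eq(3).astype(int)     # 1 if lower-class cabin, else 0
m = df["Sex"].fillna("").eq("male").astype(int)  # 1 if male, else 0
g = df["Age"].fillna(99).ge(13).astype(int)      # 1 if grown-up (threshold assumed), else 0

y = df["Survived"].astype(int).to_numpy()        # binary target s
X = pd.concat([c, m, g], axis=1).to_numpy()      # attribute matrix for the classifier
```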
Using the $l_1$ sampling over the whole data set, we get the values for the priors
$$P_1(s=1) = \frac{342}{891} \approx 0.384, \qquad P_1(s=0) = \frac{549}{891} \approx 0.616,$$
and for the likelihoods the corresponding conditional relative frequencies $P_1(c \mid s)$, $P_1(m \mid s)$ and $P_1(g \mid s)$.
Using the $l_2$ sampling over the whole data set, we get the values for the priors
$$P_2(s=1) = \frac{342^2}{342^2 + 549^2} \approx 0.280, \qquad P_2(s=0) = \frac{549^2}{342^2 + 549^2} \approx 0.720,$$
and for the likelihoods the corresponding $l_2$-sampled conditional probabilities.
The $l_1$- and $l_2$-sampled probability values are quite different.
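The two priors can be reproduced directly from the survival counts given above; this small sketch shows how far apart the two sampling schemes already are for the prior alone.

```python
counts = {"survived": 342, "died": 549}    # counts out of the 891 passengers

total = sum(counts.values())
p1_survived = counts["survived"] / total                                   # l1 sampling
p2_survived = counts["survived"]**2 / sum(v**2 for v in counts.values())   # l2 sampling

print(f"{p1_survived:.3f}")   # 0.384
print(f"{p2_survived:.3f}")   # 0.280
```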
We measure the accuracy of our algorithm by dividing the number of entries it correctly classified by the total number of entries. For $l_1$ sampling, 190 entries are classified incorrectly, resulting in an accuracy of $(891-190)/891 \approx 0.787$; for $l_2$ sampling, 179 entries are classified incorrectly, resulting in an accuracy of $(891-179)/891 \approx 0.799$.
In the next step we separate our data set into a training set and a validation set, resulting in 295 elements in the validation set. We use from sklearn.model_selection import train_test_split with the parameters train_test_split(XH, yy, test_size=0.33, random_state=42). The validation set is used to check how well our algorithm is doing. For $l_1$ sampling, 60 entries are classified incorrectly, resulting in an accuracy of $(295-60)/295 \approx 0.797$; for $l_2$ sampling, 55 entries are classified incorrectly, resulting in an accuracy of $(295-55)/295 \approx 0.814$.
In the next step we sample 100 times a random training set and validation set, using from sklearn.model_selection import train_test_split with the parameters train_test_split(XH, yy, test_size=0.33). On average, $l_1$ sampling classified more entries incorrectly than $l_2$ sampling, resulting in a lower mean accuracy for $l_1$ sampling.
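A sketch of this repeated evaluation loop, assuming the preprocessing and the train_naive_bayes/predict helpers sketched earlier; the resulting mean accuracies depend on the random splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def accuracy(X_val, y_val, model):
    classes, prior, likelihood = model
    predictions = np.array([predict(x, classes, prior, likelihood) for x in X_val])
    return np.mean(predictions == y_val)

acc_l1, acc_l2 = [], []
for _ in range(100):                      # 100 random training/validation splits
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.33)
    acc_l1.append(accuracy(X_val, y_val, train_naive_bayes(X_tr, y_tr, norm="l1")))
    acc_l2.append(accuracy(X_val, y_val, train_naive_bayes(X_tr, y_tr, norm="l2")))

print(np.mean(acc_l1), np.mean(acc_l2))   # mean accuracies over the 100 splits
```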
The trend from this simple evaluation indicates that $l_2$ sampling leads to better results compared to $l_1$ sampling in the empirical experiments.
So far we have looked at the performance of classification models in terms of accuracy; specifically, we measured error based on the fraction of mistakes. In some tasks, however, certain types of mistakes are worse than others. In our simple task, the entries that are correctly classified with $l_1$ sampling are also correctly classified with $l_2$ sampling.
6. Conclusions
We introduced quantum-like sampling, also called the $l_2$ sampling. The $l_2$ sampling leads to an interpretation of the amplitudes as a normalized frequency of occurrence of an event multiplied by the phase,
$$a(x_i) = \frac{|x_i|}{\sqrt{\sum_{j=1}^{n} |x_j|^2}} \cdot e^{i\theta_i}.$$
The normalization factor depends on the distribution of all values $|x_j|$. When developing empirical experiments that are explained by quantum cognition models, $l_2$ sampling should be used rather than $l_1$ sampling.
The quantum-inspired $l_2$ sampling maps the probability values to a natural continuous sigmoid function; its derivative is a bell-shaped function that is similar to the Gaussian distribution of events. The $l_2$ sampling improves the classification accuracy of machine learning models that are based on sampling, as indicated by empirical experiments with a Naïve Bayes classifier.
Funding
This work was supported by national funds through FCT, Fundação para a Ciência e a Tecnologia, under project UIDB/50021/2020.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The datasets used in this paper are provided within the main body of the manuscript.
Acknowledgments
We would like to thank the anonymous reviewers for their valuable feedback.
Conflicts of Interest
Compliance with Ethical Standards: The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The author declares no conflict of interest. This article does not contain any studies with human participants or animals performed by the author.
References
- Kolmogorov, A. Grundbegriffe der Wahrscheinlichkeitsrechnung; Springer: Berlin, Germany, 1933. [Google Scholar]
- Binney, J.; Skinner, D. The Physics of Quantum Mechanics; Oxford University Press: Oxford, UK, 2014. [Google Scholar]
- Aaronson, S. Is quantum mechanics an island in theoryspace? In Proceedings of the Växjö Conference Quantum Theory: Reconsideration of Foundations; Khrennikov, A., Ed.; University Press: Vaxjo, Sweden, 2004; Volume 10, pp. 15–28. [Google Scholar]
- Busemeyer, J.; Wang, Z.; Trueblood, J. Hierarchical bayesian estimation of quantum decision model parameters. In Proceedings of the 6th International Symposium on Quantum Interactions, Paris, France, 27–29 June 2012; pp. 80–89. [Google Scholar]
- Busemeyer, J.R.; Trueblood, J. Comparison of quantum and bayesian inference models. In Quantum Interaction; Lecture Notes in Computer Science; Bruza, P., Sofge, D., Lawless, W., van Rijsbergen, K., Klusch, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5494, pp. 29–43. [Google Scholar]
- Busemeyer, J.R.; Wang, Z.; Lambert-Mogiliansky, A. Empirical comparison of markov and quantum models of decision making. J. Math. Psychol. 2009, 53, 423–433. [Google Scholar] [CrossRef]
- Busemeyer, J.R.; Wang, Z.; Townsend, J.T. Quantum dynamics of human decision-making. J. Math. Psychol. 2006, 50, 220–241. [Google Scholar] [CrossRef]
- Khrennikov, A. Quantum-like model of cognitive decision making and information processing. J. Biosyst. 2009, 95, 179–187. [Google Scholar] [CrossRef] [PubMed]
- Busemeyer, J.; Pothos, E.; Franco, R.; Trueblood, J. A quantum theoretical explanation for probability judgment errors. J. Psychol. Rev. 2011, 118, 193–218. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Busemeyer, J.; Wang, Z. Quantum cognition: Key issues and discussion. Top. Cogn. Sci. 2014, 6, 43–46. [Google Scholar] [CrossRef] [PubMed]
- Wichert, A. Principles of Quantum Artificial Intelligence: Quantum Problem Solving and Machine Learning, 2nd ed.; World Scientific: Singapore, 2020. [Google Scholar]
- Wichert, A.; Moreira, C. Balanced quantum-like model for decision making. In Proceedings of the 11th International Conference on Quantum Interaction, Nice, France, 3–5 September 2018; pp. 79–90. [Google Scholar]
- Wichert, A.; Moreira, C.; Bruza, P. Quantum-like bayesian networks. Entropy 2020, 22, 170. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Yukalov, V.; Sornette, D. Decision theory with prospect interference and entanglement. Theory Decis. 2011, 70, 283–328. [Google Scholar] [CrossRef] [Green Version]
- Luger, G.F.; Stubblefield, W.A. Artificial Intelligence, Structures and Strategies for Complex Problem Solving, 3rd ed.; Addison-Wesley: Boston, MA, USA, 1998. [Google Scholar]
- Mitchell, T. Machine Learning; McGraw-Hill: New York, NY, USA, 1997. [Google Scholar]
- Winston, P.H. Artificial Intelligence, 3rd ed.; Addison-Wesley: Boston, MA, USA, 1992. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).