Quantum-Like Sampling

Probability theory is built around Kolmogorov's axioms. To each event, a numerical degree of belief between 0 and 1 is assigned, which provides a way of summarizing the uncertainty. Kolmogorov's probabilities of events are additive, and the probabilities of all possible events sum to one. The numerical degrees of belief can be estimated from a sample by the true fraction: the frequency of an event in a sample is counted and normalized, resulting in a linear relation. We introduce quantum-like sampling. The resulting Kolmogorov probabilities are in a sigmoid relation. The sigmoid relation offers better interpretability since it induces the bell-shaped distribution, and it also leads to less uncertainty when computing Shannon's entropy. Additionally, we conducted empirical experiments by quantum-like sampling 100 random training and validation sets out of the Titanic data set using the Naïve Bayes classifier. On average, the accuracy increased from 78.84% to 79.46%.


Introduction
Quantum algorithms are based on different principles than classical algorithms. Here we investigate a simple quantum-like algorithm that is motivated by quantum physics. The incompatibility between Kolmogorov's probabilities and quantum probabilities results from the different norms that are used. In quantum probabilities, the vector representing the amplitudes of all events has length one in the $l_2$ norm. Usually, Kolmogorov's probabilities are converted into quantum-like probabilities by the square root operation, but it is difficult to attach any meaning to the square root operation. Motivated by this lack of interpretability we define quantum-like sampling, the $l_2$ sampling. The resulting Kolmogorov probabilities are no longer linear but related to the sigmoid function. In the following we first introduce the traditional $l_1$ sampling. Then we introduce quantum-like sampling, the $l_2$ sampling. We indicate the relation between the $l_2$ sampling, the sigmoid function, and the normal distribution. The quantum-like sampling leads to less uncertainty. We fortify this hypothesis by empirical experiments with a simple Naïve Bayes classifier on the Titanic dataset.
To each sentence, a numerical degree of belief between 0 and 1 is assigned, which provides a way of summarizing the uncertainty. The last axiom expresses the probability of a disjunction and is given by

$$p(x \lor y) = p(x) + p(y) - p(x \land y).$$

Where do these numerical degrees of belief come from?
• Humans can believe in a subjective viewpoint, which can be determined by some empirical psychological experiments. This approach is a very subjective way to determine the numerical degree of belief.
• For a finite sample we can estimate the true fraction. We count the frequency of an event in a sample. We do not know the true value because we cannot access the whole population of events. This approach is called frequentist.
• It appears that the true values can be determined from the true nature of the universe, for example, for a fair coin, the probability of heads is 0.5. This approach is related to the Platonic world of ideas. However, we can never verify whether a fair coin exists.
Frequentist Approach and $l_1$ Sampling

Relying on the frequentist approach, one can determine the probability of an event x by counting. If Ω is the set of all possible events, then p(Ω) = 1; the cardinality card(Ω) determines the number of elements of the set Ω, and card(x) is the number of elements of the set x with x ⊆ Ω:

$$p(x) = \frac{card(x)}{card(\Omega)}.$$

This kind of sampling is the $l_1$ sampling. With n events $x_1, x_2, \ldots, x_n$ that describe all possible events of the set Ω, we can interpret $card(x_i)/card(\Omega)$ as the i-th dimension of an n-dimensional vector

$$\mathbf{p} = \left( \frac{card(x_1)}{card(\Omega)}, \frac{card(x_2)}{card(\Omega)}, \ldots, \frac{card(x_n)}{card(\Omega)} \right),$$

which, with the $l_1$ norm

$$\|\mathbf{p}\|_1 = \sum_{i=1}^{n} \frac{card(x_i)}{card(\Omega)} = 1,$$

is a unit-length vector in the norm $l_1$.
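As a small illustration (with purely hypothetical counts), the following Python sketch performs $l_1$ sampling on a vector of event counts:

```python
import numpy as np

# Hypothetical counts card(x_i) for n = 4 mutually exclusive events.
counts = np.array([10.0, 30.0, 40.0, 20.0])

# l1 sampling: normalize by the l1 norm, i.e., the total count card(Omega).
p_l1 = counts / counts.sum()

print(p_l1)         # [0.1 0.3 0.4 0.2]
print(p_l1.sum())   # 1.0 -- a unit-length vector in the l1 norm
```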

Quantum Probabilities
Quantum physics evaluates a probability p(x) of a state x as the squared magnitude of a probability amplitude A(x), which is represented by a complex number $A(x) = \alpha + i \cdot \beta$:

$$p(x) = |A(x)|^2.$$

This is because the product of a complex number with its conjugate is always a real number, with

$$A(x)^* \cdot A(x) = \alpha^2 + \beta^2 = |A(x)|^2.$$
Quantum physics by itself does not offer any justification or explanation besides the statement that it just works fine, see [2]. These quantum probabilities are also called von Neumann probabilities. Converting the sum of two amplitudes into a probability leads to an interference term,

$$|A(x) + A(y)|^2 = |A(x)|^2 + |A(y)|^2 + 2 \cdot \Re(A(x) \cdot A^*(y)),$$

making both approaches, in general, incompatible. In other words, the summation rule of classical probability theory is violated, resulting in one of the most fundamental laws of quantum mechanics, see [2]. In quantum physics we interpret $A(x_i)$ as the i-th dimension of an n-dimensional vector

$$\mathbf{A} = (A(x_1), A(x_2), \ldots, A(x_n)).$$

In quantum physics the unit-length vector is computed in the norm $l_2$ instead of the norm $l_1$,

$$\|\mathbf{A}\|_2 = \sqrt{\sum_{i=1}^{n} |A(x_i)|^2} = 1,$$

being a unit-length vector in the norm $l_2$. By replacing the $l_1$ norm by the Euclidean $l_2$ norm in classical probability theory (Kolmogorov probabilities), we obtain quantum mechanics [3] with all the corresponding laws. The incompatibility between Kolmogorov's probabilities and quantum probabilities results from the simple fact that different norms are used. This incompatibility is also the basis for quantum cognition. Quantum cognition is motivated by clues from psychology indicating that human cognition is based on quantum probability rather than the traditional probability theory as explained by Kolmogorov's axioms, see [4-7]. Empirical findings show that, under uncertainty, humans tend to violate the expected utility theory and consequently the laws of classical probability theory (e.g., the law of total probability [6-8]), leading to what is known as the "disjunction effect" [4,9-14] which, in turn, leads to a violation of the Sure Thing Principle. The violation results from an additional interference term that influences the classical probabilities.
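The interference term can be verified numerically. The following sketch, with two arbitrary illustrative amplitudes, shows that the quantum rule equals the classical sum plus the interference term:

```python
import numpy as np

# Two illustrative complex amplitudes with different phases.
A_x = np.sqrt(0.5) * np.exp(1j * 0.0)
A_y = np.sqrt(0.5) * np.exp(1j * np.pi / 3)

# Classical (Kolmogorov) summation of the two probabilities.
p_classical = abs(A_x) ** 2 + abs(A_y) ** 2

# Quantum rule: square the magnitude of the summed amplitudes.
p_quantum = abs(A_x + A_y) ** 2

# The difference is the interference term 2 * Re(A(x) * A(y)^*).
interference = 2 * (A_x * np.conj(A_y)).real
print(np.isclose(p_quantum, p_classical + interference))  # True
```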

Conversion
The amplitude is the root of the belief multiplied by the corresponding phase [11-13]:

$$A(x) = \sqrt{p(x)} \cdot e^{i \cdot \Phi}.$$

With $l_1$ sampling it is difficult to attach any meaning to $\sqrt{card(x)}$. Motivated by the lack of interpretability we define the quantum-like sampling, the $l_2$ sampling, with n events $x_1, x_2, \ldots, x_n$ that describe all possible events:

$$A(x_i) = \frac{card(x_i)}{\sqrt{\sum_{j=1}^{n} card(x_j)^2}} \cdot e^{i \cdot \Phi}$$

with

$$p(x_i) = |A(x_i)|^2 = \frac{card(x_i)^2}{\sum_{j=1}^{n} card(x_j)^2}.$$

The $l_2$ sampling leads to an interpretation of the amplitudes as a normalized frequency of occurrence of an event multiplied by the phase; Φ is dependent on the distribution of all values $card(x_i)$. When developing empirical experiments that are explained by quantum cognition models, $l_2$ sampling should be used rather than $l_1$ sampling. What is the interpretation of $|card(x_i)|^2$ outside the quantum interpretation?
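A minimal sketch of $l_2$ sampling, using the same hypothetical counts as before (the phase is set to zero, so the amplitudes are real):

```python
import numpy as np

# The same hypothetical counts card(x_i) as above.
counts = np.array([10.0, 30.0, 40.0, 20.0])

# l2 (quantum-like) sampling: the amplitudes are the counts
# normalized by the l2 norm.
amplitudes = counts / np.linalg.norm(counts)

# The resulting Kolmogorov probabilities are the squared amplitudes.
p_l2 = amplitudes ** 2

print(p_l2)         # [0.0333 0.3 0.5333 0.1333]: large counts are emphasized
print(p_l2.sum())   # 1.0
```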

Quantum-Like Sampling and the Sigmoid Function
To understand the difference between $l_1$ sampling and $l_2$ sampling (quantum-like sampling) we analyze a simple binary event x. For $l_1$ sampling, p(x) and p(¬x) are defined as

$$p(x) = \frac{card(x)}{card(\Omega)}, \qquad p(\neg x) = \frac{card(\neg x)}{card(\Omega)},$$

and for $l_2$ sampling as

$$p(x) = \frac{card(x)^2}{card(x)^2 + card(\neg x)^2}, \qquad p(\neg x) = \frac{card(\neg x)^2}{card(x)^2 + card(\neg x)^2}.$$

With ω = card(x) and Ω = card(Ω), we can define the functions for binary $l_1$ and $l_2$ sampling as

$$f(\omega) = \frac{\omega}{\Omega}$$

and

$$g(\omega) = \frac{\omega^2}{\omega^2 + (\Omega - \omega)^2}.$$

In Figure 1, $f(\omega)$ is compared to $g(\omega)$. The derivative of the sigmoid function g(ω) is a bell-shaped function. For the continuous function g(ω), the derivative is similar to the Gaussian distribution. The central limit theorem states that under certain (fairly common) conditions, the sum of many random variables will have an approximately Gaussian distribution. In Figure 3, the derivative of the sigmoid function g(ω) is compared with the probability mass function of the binomial distribution, to which it is less similar.
The $l_2$ sampling leads to a natural sigmoid representation of probabilities that reflects the nature of the Gaussian/normal distribution. This leads to less uncertainty using $l_2$ compared to $l_1$ sampling, represented by Shannon's entropy

$$H = -p(x) \cdot \log_2 p(x) - p(\neg x) \cdot \log_2 p(\neg x),$$

as can be seen in Figure 4 for Ω = 100.
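The entropy comparison can be reproduced with a short sketch; the functions below follow the definitions of f, g, and H given above:

```python
import numpy as np

Omega = 100  # total number of observations card(Omega)

def f(w):
    """Binary l1 sampling: linear relation p(x) = omega / Omega."""
    return w / Omega

def g(w):
    """Binary l2 sampling: sigmoid omega^2 / (omega^2 + (Omega - omega)^2)."""
    return w ** 2 / (w ** 2 + (Omega - w) ** 2)

def entropy(p):
    """Shannon's entropy of a binary event, in bits."""
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

w = np.arange(1, Omega)  # omega = card(x), endpoints excluded to avoid log(0)
print(entropy(f(w)).mean())  # average entropy under l1 sampling
print(entropy(g(w)).mean())  # smaller average entropy under l2 sampling
```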

Combination
Depending on $l_1$ or $l_2$ sampling we get different results when we combine probabilities. Multiplying two independently sampled events x and y results in the joint distribution

$$p_1(x, y) = f(\omega_x) \cdot f(\omega_y)$$

when $l_1$ sampled with Ω = 10, and

$$p_2(x, y) = g(\omega_x) \cdot g(\omega_y)$$

when $l_2$ sampled with Ω = 10, as indicated in Figure 5. The same applies to the weighted sum of information corresponding to Shannon's entropy, as used in the ID3 algorithm [15-17] for symbolic machine learning to perform a greedy search for small decision trees. Computing the entropies when $l_1$ sampled with Ω = 10 and when $l_2$ sampled with Ω = 10 leads to different values $H_1$ and $H_2$, as indicated in Figure 6. Less uncertainty would lead to better results in machine learning. We perform empirical experiments with a simple Naïve Bayes classifier to fortify this hypothesis.
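A short sketch of the joint distributions under both samplings, reusing the binary functions f and g from above with Ω = 10:

```python
import numpy as np

Omega = 10

def f(w):  # binary l1 sampling
    return w / Omega

def g(w):  # binary l2 sampling
    return w ** 2 / (w ** 2 + (Omega - w) ** 2)

# Joint distributions of two independent events over the counts 1..Omega-1.
w = np.arange(1, Omega)
joint_l1 = np.outer(f(w), f(w))  # p1(x, y) = f(w_x) * f(w_y)
joint_l2 = np.outer(g(w), g(w))  # p2(x, y) = g(w_x) * g(w_y)

print(joint_l1[4, 4], joint_l2[4, 4])  # equal at the midpoint omega = 5
print(joint_l1[0, 0], joint_l2[0, 0])  # l2 suppresses rare joint events more
```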

Naïve Bayes Classifier
For a target function f : X → h, where each instance x is described by attributes $a_1, a_2, \ldots, a_n$, the most probable value of f(x) is

$$h_{MAP} = \arg\max_{h_i \in h} p(h_i \mid a_1, a_2, \ldots, a_n) = \arg\max_{h_i \in h} \frac{p(a_1, a_2, \ldots, a_n \mid h_i) \cdot p(h_i)}{p(a_1, a_2, \ldots, a_n)}.$$

The likelihood of the conditional probability $p(a_1, a_2, \ldots, a_n \mid h_i)$ is described by $2^n$ possible combinations. For n binary variables, the exponential growth of combinations being true or false becomes an intractable problem for large n, since all $2^n - 1$ possible combinations must be known. The decomposition of large probabilistic domains into weakly connected subsets via conditional independence,

$$p(a_1, a_2, \ldots, a_n \mid h_i) = \prod_{j=1}^{n} p(a_j \mid h_i),$$
is known as the Naïve Bayes assumption and is one of the most important developments in the recent history of Artificial Intelligence [16]. It assumes that a single cause directly influences a number of events, all of which are conditionally independent. The Naïve Bayes classifier is defined as

$$h_{NB} = \arg\max_{h_i \in h} p(h_i) \cdot \prod_{j=1}^{n} p(a_j \mid h_i).$$
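As a minimal, self-contained sketch (not the original implementation), the following Python code estimates the prior $p(h_i)$ and the conditionals $p(a_j \mid h_i)$ from raw counts using either $l_1$ or $l_2$ sampling and returns the most probable hypothesis; the count tables at the bottom are illustrative:

```python
import numpy as np

def estimate(counts, norm="l1"):
    """Turn raw event counts into probabilities by l1 or l2 sampling."""
    counts = np.asarray(counts, dtype=float)
    if norm == "l1":
        return counts / counts.sum()
    return counts ** 2 / (counts ** 2).sum()  # l2 (quantum-like) sampling

def classify(prior_counts, cond_counts, instance, norm="l1"):
    """Naive Bayes: h_NB = argmax_i p(h_i) * prod_j p(a_j | h_i).

    prior_counts: counts per hypothesis h_i, shape (n_hypotheses,)
    cond_counts:  one count table per attribute, shape (n_hypotheses, n_values)
    instance:     observed value index a_j for each attribute
    """
    scores = estimate(prior_counts, norm)
    for table, a_j in zip(cond_counts, instance):
        # p(a_j | h_i): each hypothesis' count row is sampled separately.
        scores = scores * np.array([estimate(row, norm)[a_j] for row in table])
    return int(np.argmax(scores))

# Illustrative counts: two hypotheses (0: survived, 1: died) and one
# binary attribute Sex with the values (0: female, 1: male).
priors = [342, 549]
sex = np.array([[233, 109],   # survived: counts of (female, male)
                [81, 468]])   # died:     counts of (female, male)
print(classify(priors, [sex], [0], norm="l1"))  # 0 -> 'survived'
print(classify(priors, [sex], [0], norm="l2"))  # 0 -> 'survived'
```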

Titanic Dataset
When the Titanic sank, it killed 1502 out of 2224 passengers and crew. We are using the processed file titanic.csv from https://gist.github.com/michhar/ accessed on 1 July 2021, which corresponds to the file train.csv from Kaggle https://www.kaggle.com/c/titanic/data?select=train.csv accessed on 1 July 2021 and contains data for 891 of the real Titanic passengers. Each row represents one person. The columns describe different attributes about the person, including their PassengerId, whether they Survived, their passenger class Pclass, their Name, their Sex, their Age, and six other attributes, see Figure 7. In our experiment we only use the attributes Survived, Pclass, Sex and Age. In the next step we separate our data set into a training set and a validation set, resulting in 295 elements in the validation set. We use from sklearn.model_selection import train_test_split with the parameters train_test_split(XH, yy, test_size=0.33, random_state=42). The latter set is used to validate how well our algorithm is doing. For $l_1$ sampling, 60 entries are wrongly classified, resulting in an accuracy of 79.66%; for $l_2$ sampling, 55 entries are wrongly classified, resulting in an accuracy of 81.36%.
Next, we sample 100 times a random training set and validation set. We use from sklearn.model_selection import train_test_split with the parameters train_test_split(XH, yy, test_size=0.33). For $l_1$ sampling, on average 62.42 entries were wrongly classified, resulting in an accuracy of 78.84%; for $l_2$ sampling, on average 60.59 entries were wrongly classified, resulting in an accuracy of 79.46%.
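The repeated evaluation can be sketched as follows; fit_and_score is a hypothetical placeholder for training the Naïve Bayes model with the chosen sampling and returning the validation accuracy, and XH and yy hold the attributes and labels as in the text:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def evaluate(XH, yy, norm, runs=100):
    """Mean validation accuracy over `runs` random train/validation splits."""
    accuracies = []
    for _ in range(runs):
        # No fixed random_state: every run draws a fresh random split.
        X_train, X_val, y_train, y_val = train_test_split(
            XH, yy, test_size=0.33)
        # fit_and_score is a hypothetical stand-in for training the
        # Naive Bayes classifier (l1 or l2 sampled) and scoring it.
        accuracies.append(fit_and_score(X_train, y_train, X_val, y_val, norm))
    return np.mean(accuracies)

# print(evaluate(XH, yy, norm="l1"), evaluate(XH, yy, norm="l2"))
```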
The trend from this simple evaluation indicates that $l_2$ sampling leads to better results than $l_1$ sampling in the empirical experiments.
So far we have looked at the performance of classification models in terms of accuracy. Specifically, we measured error based on the fraction of mistakes. However, in some tasks, certain types of mistakes are worse than others. In our simple task, however, the entries correctly classified with $l_1$ sampling are also correctly classified with $l_2$ sampling.

Conclusions
We introduced quantum-like sampling, also called $l_2$ sampling. The $l_2$ sampling leads to an interpretation of the amplitudes as a normalized frequency of occurrence of an event multiplied by the phase; Φ is dependent on the distribution of all values $card(x_i)$. When developing empirical experiments that are explained by quantum cognition models, $l_2$ sampling should be used rather than $l_1$ sampling. The quantum-inspired $l_2$ sampling maps the probability values to a natural continuous sigmoid function, whose derivative is a bell-shaped function similar to the Gaussian distribution of events. The $l_2$ sampling improves the classification accuracy of machine learning models that are based on sampling, as indicated by our empirical experiments with a Naïve Bayes classifier.