Exploration of Quantum Interference in Document Relevance Judgement Discrepancy

Quantum theory has been applied in a number of fields outside physics, e.g., cognitive science and information retrieval (IR). Recently, it has been shown that quantum theory can subsume various key IR models into a single mathematical formalism of Hilbert vector spaces. While a series of quantum-inspired IR models has been proposed, limited effort has been devoted to verifying the existence of the quantum-like phenomenon in real users’ information retrieval processes through a user study. In this paper, we aim to explore and model the quantum interference in users’ relevance judgement about documents, caused by the presentation order of documents. A user study in the context of IR tasks has been carried out. The existence of the quantum interference is tested by the violation of the law of total probability and the validity of the order effect. Our main findings are: (1) there is an apparent judging discrepancy across different users and document presentation orders, and the empirical data violate the law of total probability; (2) most search trials recorded in the user study show the existence of the order effect, and the incompatible decision perspectives in the quantum question (QQ) model are valid in some trials. We further explain the judgement discrepancy in more depth, in terms of four effects (comparison, unfamiliarity, attraction and repulsion) and also analyse the dynamics of document relevance judgement in terms of the evolution of the information need subspace.


Introduction
Quantum theory has been applied in many fields outside physics, e.g., cognitive science [1][2][3][4][5] and information retrieval [6]. Quantum cognition has been attracting increasing attention for explaining some non-classical phenomena [7][8][9][10][11][12]. The quantum framework built upon the quantum probability theory uses a new mathematical formulation for constructing models of human judgement and decision-making [13,14]. The axioms of the classical probability theory, such as the axiom of Boolean logic and the law of total probability, do not always hold in the framework of quantum probability. Moreover, recent work using the Bayesian model comparison method has revealed that the quantum cognition model with an interference term is more competitive in terms of accuracy, parsimony and robustness [15,16].
In information retrieval (IR), van Rijsbergen first proposed that the quantum theory can be used to axiomatize the geometric, probabilistic and logic-based IR models within a single mathematical formalism in complex Hilbert vector spaces [6], and various key concepts from quantum mechanics find their analogy in the IR field. Following van Rijsbergen's pioneering work, a body of quantum or quantum-like frameworks has recently been developed to formalize retrieval models and has achieved significant improvements [17][18][19][20]. Piwowarski et al. [21] attempted to find an effective quantum representation of documents and queries for measuring the relevance between them. Zuccon and Azzopardi [17] proposed a quantum probability ranking principle (QPRP), which incorporates the inter-document dependency as a kind of interference effect. Sordoni et al. [22] developed a quantum language model (QLM), which achieves an improvement over the traditional language model. More recently, Li et al. [23] extended QLM and applied it to session search. Xie et al. [24] formulated the quantum entanglement in QLM and achieved good retrieval performance.
While many quantum-inspired IR models have been proposed, limited efforts have been devoted to exploring the quantum-like phenomenon (e.g., quantum interference) in the real users' IR processes from a real user study perspective. Motivated by the quantum interference-based user study [8], there has been some research [25][26][27] that tries to validate the existence of quantum-like phenomenon and to explain the IR process from the user cognition point of view. Zhang et al. [25] proposed to use the probabilistic automata to model the transition of user's relevance judgement states. Wang et al. [26] conducted experiments that revealed the relevance judgement discrepancy between participants who simultaneously make the relevance judgement of documents on two topics. Their results show that the relevance degree of a topic is greatly affected by the companion topic's relevance. Despite these recent advances, we believe that the research in this area needs to be investigated in-depth with a well-designed methodology and more convincing real user studies.
In this paper, we aim to investigate and understand the phenomenon of quantum-like interference in document relevance judgement via an extensive user study. In order to explain the effect of the document presentation order on the interference in the search process, we will focus on the following research questions: (1) Does there exist any interference in relevance judgement of documents? If yes, is it a quantum interference? (2) Why is the judgement of relevance dynamic? What is the cause for such dynamics?
To the best of our knowledge, no previous study has investigated the order effect from a quantum point of view for the document relevance judgement. The verification of the existence of quantum interference will be based on the law of total probability and the order effect. We will also adopt the q-test, which is an effective test method to examine the existence of incompatible decisions in the quantum question (QQ) model [28] that assumes that only the question order influences the question context [29]. In our user study, to simulate a real search environment, a web browser-based system is built to present the textual stimulus and to collect users' rating data. Participants are randomly divided into two groups. For a given query term, each group will judge the relevance of the same documents, but in a different presentation order. The experimental results reveal the existence of the quantum-like interference: (1) most of the search trials show an apparent judging discrepancy and the violation of the total probability law; (2) most of the queries show order effects in the $\chi^2$ test; and the q-test is valid for some cases in the process of relevance judgement, but other cases that show an apparent order effect do not follow the q-test because their context is more complex than that assumed by the QQ model.
To interpret the judgement discrepancy and the order effect in depth, this paper will study four effects, i.e., the comparison effect, the unfamiliarity effect, the attraction effect and the repulsion effect. These four effects to some extent are rooted in the indetermination of relevance in the human mind. Specifically, the comparison effect utilizes the reference points model derived from psychology to explain an intuitive phenomenon. The unfamiliarity effect reveals that a lack of background knowledge leads to an interference in the process of document relevance judgement. We further analyse the dynamics of document relevance judgements and propose to study the attraction effect and the repulsion effect through the evolution of the information need subspace. The information need evolves after a series of relevance judgements about documents and is not only about the original query, but also involves complex cognitive reconstruction. Inspired by the subspace projection in quantum theory, the dynamic information need model can be an approach to capture the complex cognitive interference process in information retrieval.

Relevance in Information Retrieval
Relevance is a key concept in the IR field. Document relevance judgement is a task for the user to judge whether or not a document is relevant to a query or topic. In the earlier literature, relevance is mainly concerned with topical relevance. Recently, this concept has evolved with the development of IR applications. Schamber et al. [30] re-examined the concept of relevance from a multidimensional, dynamic and user-centric perspective. No consensus may have been reached on the definition of "relevance", but the properties of relevance, such as dynamics and multidimensionality, have been widely accepted. Dynamics means that users' perception of relevance can change over time in the process of interaction with the IR system. Multidimensionality means that relevance is perceived from many aspects and perspectives. For example, people not only expect that the system can return more relevant and large-scale documents, but also need well-organized results, which contain a variety of fresh and authoritative information.
Basically, there are two classes of relevance: objective or system-based relevance and subjective or human (user)-based relevance [31]. The former is a static concept, while the latter is treated as a subjective individualized mental experience that involves cognitive restructuring [32]. In fact, most of the current information processing systems do not employ effective user models. The real-time cognitive process is more complex, and it is unreasonable to build a static and rigid IR system for users who have ever higher expectations.

Quantum-Inspired IR Research
Van Rijsbergen's pioneering work [6] introduces the quantum theory (QT) into the IR field and subsumes the major IR models into a single mathematical formalism in Hilbert vector spaces. Recently, researchers have applied QT in many areas, such as lexical semantic spaces [33], document re-ranking by the quantum probability ranking principle (QPRP) [17], the quantum language model (QLM) [22], a QLM-based session search model [23], etc.
The different judgements of documents caused by different presentation orders were studied from a classical perspective [34,35]. Eisenberg and Barry [36] find that the judgements are influenced by whether documents are presented to judges in a high to low rank or a low to high rank. The approach of QPRP [17,37] presents a document ranking principle inspired by the quantum interference phenomenon, which assumes that the optimal selection of a document in each rank position should consider the influence of previously-selected documents. However, they neither validate the existence of the interference phenomenon from the users' perspective nor account for the interference. To some extent, our work can provide additional evidence for the necessity of QPRP from the perspective of a user study.
The interference phenomenon in IR has been studied by dynamic relevance judgement [25] and topic interference [26]. Zhang et al. [25] proposed to adopt a probabilistic automaton to model the transition of user judgement states. They use a transition matrix to represent the judgement interference effect of the previous document on the current one. However, the probabilistic automaton framework is a generalization of the concept of the Markov model, which is not in the quantum framework (due to the utilization of unitary transformation [38], the quantum finite automaton [39] may be more powerful to model such a dynamic process, which is valuable for future study). Wang et al. [26] showed that the relevance of a topic to a document is greatly affected by the companion topic's relevance, and the degree of the impact differs with respect to different companion topics. They argue that the judgement of a document may only be interfered with by a different reference point of another topic's relevance degrees. In their setting, the participants simultaneously make the relevance judgement of the two topics, and the judgement may be affected by a significant comparison effect, but without an order effect. Recently, a study on the interference between different dimensions of the relevance [31] of a document has been carried out [27]. They put forward an experimental framework for examining whether dimensions of relevance interact via an order effect and find that the user may judge different dimensions of relevance based on an incompatible decision perspective (two decisions are incompatible if they have to be made sequentially, and the order does matter).

Classical and Quantum Probability Measurement
In this section, we will point out the limit of the classical probability theory by introducing some non-classical phenomena, which concern the complex cognition process in humans' minds. In addition, the quantum probability measurement will be preliminarily introduced. The advantages of quantum probability compared to classical probability theory for explaining the interference effect will also be expounded.

Classical Probability Measurement
Researchers used to apply the classical probability theory derived from the Kolmogorov axioms [40] to decision models. The classical probability theory relies on a set-theoretic representation and represents events as subsets of a sample space. Therefore, the probability framework in the Kolmogorov theory should obey some classical axioms, like the distributive axiom and the law of total probability. For example, let $A$ and $B$ represent different events, and let $\bar{B}$ be the complement of $B$. We have:
$$p(A) = p(B)\,p(A|B) + p(\bar{B})\,p(A|\bar{B}) \qquad (1)$$
The law of total probability is violated in some non-classical phenomena. For example, the categorization-decision experiments [9] show that the probability of attacking in the decision-alone condition is higher than both the probabilities of attacking after different categorizations, resulting in a violation of the law of total probability (LTP). Classical models, like Markov [41] and Markov-like models [25], have been proposed to explain such dynamic phenomena, but fail to explain the violation of the law of total probability [15,42].
In the prisoner dilemma game [43], 97% of the players will defect when told the opponent defected, and 84% of them will defect when informed that he or she cooperated. However, when they do not know the opponent's decision, only 63% will defect. We write the probability of defecting under the condition that participants were told the opponent's decision as $p'(d)$:
$$p'(d) = p(od)\,p(d|od) + p(oc)\,p(d|oc) \qquad (2)$$
where $od$ means the event that the opponent would defect and $oc$ means the event that the opponent would cooperate. The probability of defection under the unknown condition, when the participants do not know the opponent's decision, is denoted as $p(d)$. If the law of total probability is valid, we have:
$$p(d) = p'(d) = p(od)\,p(d|od) + p(oc)\,p(d|oc) \qquad (3)$$
Because $p(od) + p(oc) = 1$, the right-hand side is a weighted average of $p(d|od)$ and $p(d|oc)$ and must therefore lie between 0.84 and 0.97. However, according to the empirical data, $p(d) = 0.63$ is lower than the probability of each possible case (i.e., $p(d|od)$ and $p(d|oc)$) [43]. Thus, the defection probability under the "known condition" is evidently higher than the probability under the "unknown condition" (i.e., $p'(d) > p(d)$), which violates the law of total probability. The classical probability theory fails to explain non-classical cognition phenomena like this prisoner dilemma game. In this paper, we also find some similar non-classical cognition phenomena in users' relevance judgement process when searching for information. In order to better understand the cognitive process of users' relevance judgement in information retrieval, we introduce the quantum probability theory, which has been theoretically proven to be able to consistently explain the non-classical cognition phenomena, e.g., the violation of LTP.
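The violation can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function names `ltp_bounds` and `violates_ltp` are illustrative and not part of the original study. It relies only on the fact that the law of total probability makes the unknown-condition probability a weighted average of the two conditional probabilities.

```python
def ltp_bounds(p_given_od, p_given_oc):
    """Range of p(d) allowed by the law of total probability.

    p(d) = p(od) * p(d|od) + p(oc) * p(d|oc) with p(od) + p(oc) = 1 is a
    weighted average, so it must lie between the two conditional values.
    """
    return min(p_given_od, p_given_oc), max(p_given_od, p_given_oc)


def violates_ltp(p_unknown, p_given_od, p_given_oc):
    """True if no value of p(od) in [0, 1] can reproduce p_unknown."""
    low, high = ltp_bounds(p_given_od, p_given_oc)
    return not (low <= p_unknown <= high)


# Defection rates quoted in the text for the prisoner dilemma game [43].
print(ltp_bounds(0.97, 0.84))          # (0.84, 0.97)
print(violates_ltp(0.63, 0.97, 0.84))  # True
```

The check ignores sampling error; with real data, the comparison would additionally need a significance test.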

Quantum Probability Theory
Quantum probability theory is defined by von Neumann [44] with projective geometry. Different from the classical probability, which defines events as subsets in a sample space, quantum probability theory defines events as subspaces in a vector space. In the absence of measurement and observation, the initial system (described by a wave function) is in a superposed state, which is based on the superposition principle. A wave function, which is governed by the linear Schrödinger equation, is designed to describe the behaviour of a wave in any space or at any time. The behaviour of a wave function can be reconsidered as a superposition of (possibly infinitely many) other wave functions of a certain type, namely stationary states, whose behaviour is particularly simple (for all linear systems, the net response at a given place and time caused by two or more stimuli is the sum of the responses, which would have been caused by each stimulus individually [45]). Instead of the classical superposition of two realistic physical systems, the quantum superposition principle admits the possibility of the coexistence of multiple exclusive possible states simultaneously in a single system. Considering an electron with two possible configurations (up and down), the two mutually exclusive elementary outcomes are considered as two basis states, which are denoted as $|\uparrow\rangle$ and $|\downarrow\rangle$, respectively. Before the measurement (observation), the configuration of an electron is indeterminate. Quantum mechanics describes the physical system of a qubit with a superposed state between $|\uparrow\rangle$ and $|\downarrow\rangle$:
$$|S\rangle = \alpha|\uparrow\rangle + \beta|\downarrow\rangle \qquad (4)$$
where $\alpha$ and $\beta$ are complex numbers, and $|\alpha|^2 + |\beta|^2 = 1$. Von Neumann postulates that the measurement will lead to a collapse of the current vector state [44]. Recently, Conte [46,47] has proven this postulate solely based on the mathematical framework of traditional Clifford algebra, which results in a self-consistent formulation of quantum theory. In the projective measurement (projective measurement is one kind of quantum measurement [48]), after a measurement, the current superposed state will collapse onto a basis vector with a probability that is determined by the squared length of the projection from the state vector to the basis. Then, the current state of the system is updated as $|\uparrow\rangle$ or $|\downarrow\rangle$. Each observation is theorized as projecting the state vector $|S\rangle$ onto the subspace that represents the event $|\uparrow\rangle$ or $|\downarrow\rangle$. The operator of the collapse to a basis is called a "projector". $\Pi_\uparrow = |\uparrow\rangle\langle\uparrow|$ represents the projector for the basis $|\uparrow\rangle$, and $\Pi_\downarrow = |\downarrow\rangle\langle\downarrow|$ represents the projector for the basis $|\downarrow\rangle$. The probability of observing event $|\uparrow\rangle$ is calculated as $\|\Pi_\uparrow|S\rangle\|^2$, where both $|\uparrow\rangle$ and $|S\rangle$ are unit vectors, so the length of the projection can also be written as the modulus of the inner product between $|\uparrow\rangle$ and $|S\rangle$: $\|\Pi_\uparrow|S\rangle\| = |\langle S|\uparrow\rangle|$.
The probabilities of the events (up or down) are derived from the squared magnitudes of the amplitudes: $|\alpha|^2$ for the up configuration and $|\beta|^2$ for the down configuration. To guarantee the probability normalization, we have $|\alpha|^2 + |\beta|^2 = 1$. For the sake of visualization, in the projective measurement, as shown in Figure 1a, real numbers (rather than complex numbers) are adopted as the probability amplitudes. Then, the probability is computed by the squared length of the projection.
Figure 1. (a) The projective measurement: when measuring, the state $|S\rangle$ will "collapse" onto the "down" basis $|\downarrow\rangle$ with the probability that is determined by the squared length of the projection; (b) the case of the prisoner dilemma game: when the player does not know the opponent's action, he or she may choose to defect with a low probability. If he or she is informed about the opponent's action, he or she will become more greedy and be more likely to defect.
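As an illustration of the measurement rule, the sketch below computes the two outcome probabilities from a pair of amplitudes. It is a minimal example under stated assumptions: the function name `measurement_probabilities` and the 60-degree state are hypothetical choices made here for illustration only.

```python
import math


def measurement_probabilities(alpha, beta):
    """Probabilities of 'up' and 'down' for |S> = alpha|up> + beta|down>.

    alpha and beta may be real or complex; the state must be normalized.
    """
    norm_sq = abs(alpha) ** 2 + abs(beta) ** 2
    if not math.isclose(norm_sq, 1.0, rel_tol=1e-9):
        raise ValueError("state is not normalized")
    # Squared length of the projection onto each basis vector.
    return abs(alpha) ** 2, abs(beta) ** 2


# Real amplitudes, as in the visualization: |S> at 60 degrees from |up>.
angle = math.radians(60)
p_up, p_down = measurement_probabilities(math.cos(angle), math.sin(angle))
print(p_up, p_down)  # approximately 0.25 and 0.75

# Complex amplitudes give probabilities in the same way.
print(measurement_probabilities(1 / math.sqrt(2), 1j / math.sqrt(2)))
```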

Quantum Cognition and Quantum Interference
Beyond microscopic physics, the quantum or quantum-like phenomena in the cognition process have been evidenced by the pioneering research of Conte and Khrennikov [49][50][51][52][53]. Cognition is "the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses" [54]. As Aristotle said, the knowledge one can acquire and the organizing power one can use constitute the two parts of information [55]. The concept of information is illuminated by Deutsch [56], who also reaches the conclusion that "the world is made of qubit". We cannot separate the information of the outside world from mental entities [57]. As human beings, we have cognitive structures and functions and are able to not only receive inputs from the outside, but also process such information by performing semantic acts with our willing awareness [58]. Costa de Beauregard [59] proposes that such symmetry between the knowing awareness and the willing awareness upon information is equivalent to the symmetry of the quantum collapse [60,61], which provides the possibility that the cognitive process obeys the basic rules of quantum mechanics. In quantum cognition, the wave function represents the knowledge factor that engages our cognitive function [58], and projectors represent logic statements and, thus, directly relate information with our perceptive and cognitive functions. Quantum mechanics relates logic and mental entities, which is also supported by a number of results [62][63][64] indicating the possible intrinsic logical origin of quantum mechanics. Some recent works [65][66][67][68] have also elaborated the important role of quantum mechanics at a perceptive and cognitive level.
Moreover, the quantum waves may be considered as probability waves or information waves based on the qubit-world statements [56,69]. It may be inherent that the probability concept is both objective and subjective. Humans' minds can simultaneously consider multiple potential choices with each single decision or judgement before the actual response to the outside. There exists an intrinsic indetermination (e.g., an intrinsic doubt or an inner conflict) that characterizes the cognitive status of the subject in the decision or judgement [13]. After the response to the outside, the superposed states in the mind would collapse to a basis state.
In the non-classical phenomenon of humans' subjective attitude, the quantum theory provides a geometric approach [4,27,29,70,71] to the probability measurement. A person's belief about the subjective attitude on the decision or question is presented by a state vector in a multidimensional Hilbert space, while a potential decision or response is represented by a subspace of the Hilbert space. Each response can lead to a "projection" of the current belief state on the corresponding subspace.
In the game of the prisoner dilemma, if the player does not know the decision of the opponent, he or she may expect that both of them will choose to cooperate and get a minimum punishment. When told the action of the opponent, he or she will become unworried and greedy. Whatever the opponent chooses, he or she will make a conservative decision to defect.
In this cognitive process, it is inappropriate to only consider the conditional probability in a static context like Equation (2). The probabilities also depend on the dynamic context, which may change the state of the mental entity whose basic representative is consciousness. For the participant, the acquirement of the information (knowledge) about the opponent's action will lead to a semantic act, which can largely affect the following decision-making bias in his or her mind. This influence is modelled as a collapse of the wave function, which represents the knowledge factor of the participants [58]. The introduction of quantum collapse brings the cognitive process from a superimposed linear dynamics to a new dynamics in which semantics is involved, thus really involving the information and directly relating the mental entities [68].
When told the result that the opponent will defect, the current context of the participant's decision will be changed. Technically speaking, the belief state (wave function) $|S\rangle$ projects onto the basis "the opponent defects" with a probability $\|\Pi_{od}|S\rangle\|^2$, which is the squared length of the projection from $|S\rangle$ to $|S_{od}\rangle$ (see Figure 1b). Then, the player makes his or her own decision under the belief state $|S_{od}\rangle$, and the probability of defection is $\|\Pi_d|S_{od}\rangle\|^2$. According to the derivation in the Appendix (inspired by [4]), we get the probability of defection in the two-stage condition (when told the opponent's decision) as:
$$p'(d) = \|\Pi_d\Pi_{od}|S\rangle\|^2 + \|\Pi_d\Pi_{oc}|S\rangle\|^2 = |\langle S_{od}|S\rangle|^2\,|\langle S_d|S_{od}\rangle|^2 + |\langle S_{oc}|S\rangle|^2\,|\langle S_d|S_{oc}\rangle|^2 \qquad (6)$$
In the unknown condition, we get the probability of defection as:
$$p(d) = \|\Pi_d|S\rangle\|^2 = |\langle S_{od}|S\rangle|^2\,|\langle S_d|S_{od}\rangle|^2 + |\langle S_{oc}|S\rangle|^2\,|\langle S_d|S_{oc}\rangle|^2 + 2\,|\langle S_{od}|S\rangle\langle S_d|S_{od}\rangle\langle S_{oc}|S\rangle\langle S_d|S_{oc}\rangle|\cos(\theta) \qquad (7)$$
where $\theta$ is the phase difference between the two path amplitudes $\langle S_d|S_{od}\rangle\langle S_{od}|S\rangle$ and $\langle S_d|S_{oc}\rangle\langle S_{oc}|S\rangle$. Compared to Equation (6), the additional term $2\,|\langle S_{od}|S\rangle\langle S_d|S_{od}\rangle\langle S_{oc}|S\rangle\langle S_d|S_{oc}\rangle|\cos(\theta)$ is called the interference term [4,15,42], which allows the quantum probability theory to explain the violation of the law of total probability. The presence of the aforementioned interference term has been verified in many experiments [11,13,49,50,[72][73][74].
For the sake of simplification and intuition, we describe the events and their corresponding probabilities in the real vector space. As shown in Figure 1b, this provides a coarse way to visualize the interference effect. There are two pairs of orthogonal bases, one for the player's action (defect or cooperate) and the other for the opponent's decision that is told (the opponent cooperated or defected). Both the lengths of the projections (i.e., $\Pi_d\Pi_{od}|S\rangle$ and $\Pi_d\Pi_{oc}|S\rangle$) on the basis "defect" in the two-stage condition are higher than the projection $\Pi_d|S\rangle$ (directly on the defect basis) under the unknown condition, which is consistent with the empirical data in the prisoner dilemma game [43]. It also implies that the interference effect works, which means:
$$\|\Pi_d|S\rangle\|^2 \neq \|\Pi_d\Pi_{od}|S\rangle\|^2 + \|\Pi_d\Pi_{oc}|S\rangle\|^2$$
In other words, different "paths" lead to different final probabilities. The sum of the probabilities of the path $S \to S_{od} \to S_d$ and the path $S \to S_{oc} \to S_d$ is different from the probability of the path $S \to S_d$.
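The role of the interference term can be illustrated numerically. The sketch below treats each "path" to defection (via "opponent defects" or via "opponent cooperates") as a complex amplitude and compares the two-stage probability with the direct one. The function names and the amplitude values are hypothetical and chosen only for illustration; they are not estimated from any data.

```python
import cmath
import math


def two_stage_probability(a_od, a_oc):
    """Sum of the squared moduli of the two path amplitudes (no interference)."""
    return abs(a_od) ** 2 + abs(a_oc) ** 2


def interference_term(a_od, a_oc):
    """2 |a_od| |a_oc| cos(theta), with theta the phase difference of the paths."""
    theta = cmath.phase(a_od) - cmath.phase(a_oc)
    return 2 * abs(a_od) * abs(a_oc) * math.cos(theta)


def direct_probability(a_od, a_oc):
    """Squared modulus of the summed amplitudes (unknown condition)."""
    return abs(a_od + a_oc) ** 2


# Hypothetical path amplitudes: moduli 0.6 and 0.5, phase difference 2.5 rad.
a_od = cmath.rect(0.6, 0.0)
a_oc = cmath.rect(0.5, 2.5)

p_two_stage = two_stage_probability(a_od, a_oc)
p_direct = direct_probability(a_od, a_oc)
print(p_two_stage, p_direct, interference_term(a_od, a_oc))
```

With a phase difference whose cosine is negative, the direct probability falls below the two-stage probability, which is the pattern reported for the prisoner dilemma game.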
The change in cognitive states that results from making one decision or judgement can cause a person to respond differently to the subsequent one, which leads to an order effect [75,76]. The order effect can be found in many scenarios. For example, Moore [77] reports an interesting public opinion poll on the characters of Bill Clinton and Al Gore. Participants were asked two questions, and their answers differed greatly depending on the order in which the questions were asked. The order effect may be a special case of the interference phenomenon. In the poll, if question A is asked first, the answers to question B will be affected by the previous answers. In the opposite order, the action of answering question B will interfere with question A.
Trueblood et al. [76] propose an explanation of the order effect inspired by incompatibility in quantum mechanics, i.e., by observables that cannot be observed simultaneously [78]. Their explanation [76] explores the potential to model scenarios in which two related questions are raised in order and may not be considered simultaneously.
From the perspective of the incompatibility of cognition [76], the order effect is due to the incompatibility of two measurements. Let A and B denote the two measurements. Since the corresponding projections do not commute in general, we get:
$$\Pi_A\Pi_B \neq \Pi_B\Pi_A$$
In quantum cognition, quantum interference is derived from the complex probability amplitudes. In the IR task, however, the intuitive meaning of the complex field can hardly be found. For the sake of computational feasibility, most IR models only adopt the real field of the probability amplitudes [17,22,79]. Therefore, in our IR approach to quantum interference, we can employ the subspace projection in the real vector space, which can reflect the interference effect to some extent.
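The non-commutativity of projections can be seen in a two-dimensional real vector space. The sketch below is illustrative only: the helper names and the angles (0, 45 and 10 degrees) are assumptions made here, not values from the study. It builds two projectors, shows that their products differ, and computes the probability of a positive outcome on both measurements in each order.

```python
import math


def projector(angle_deg):
    """2x2 projector onto the unit vector at the given angle (in degrees)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c * c, c * s], [c * s, s * s]]


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(p, v):
    """Apply a 2x2 matrix to a 2-vector."""
    return [p[0][0] * v[0] + p[0][1] * v[1], p[1][0] * v[0] + p[1][1] * v[1]]


def sequential_probability(first, second, state):
    """Squared length of the state after projecting with first, then second."""
    v = apply(second, apply(first, state))
    return v[0] ** 2 + v[1] ** 2


pi_a = projector(0)    # hypothetical measurement A
pi_b = projector(45)   # hypothetical measurement B
state = [math.cos(math.radians(10)), math.sin(math.radians(10))]

print(matmul(pi_a, pi_b))  # differs from the product in the other order
print(matmul(pi_b, pi_a))
p_ab = sequential_probability(pi_a, pi_b, state)  # A first, then B
p_ba = sequential_probability(pi_b, pi_a, state)  # B first, then A
print(p_ab, p_ba)
```

In this sketch the two orders give different probabilities whenever the belief state is not symmetric with respect to the two measurement directions.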

Methodology
In this section, we describe the methodology of how to explore the existence of the quantum interference in the process of document relevance judgement and how to interpret such an interference based on the subspace projections of the information need. In the users' perspective, the concept of "relevance" may have some properties of subjectivity and dynamics. Thus, there may be many cognitive interferences in the process of relevance judgement. Section 3.1 introduces how we examine the existence of the quantum interference in a common IR scenario, and then, we develop a method to describe the evolution of information need in Section 3.2.

Test of the Law of the Total Probability
The probability derived from the Kolmogorov axioms obeys the law of total probability (LTP). Imagine a scenario in which two groups of users judge two documents $d_A$ and $d_B$.
Participants from the first group should judge $d_A$ and then $d_B$, while participants from the second group should judge $d_B$ only. We denote $r_A$ as the event that a participant judges $d_A$ as "relevant", $\bar{r}_A$ as the event that a participant judges $d_A$ as "not relevant" and similarly for $r_B$ and $\bar{r}_B$. For the first group, we have:
$$p(r_A, r_B) + p(\bar{r}_A, r_B) = p(r_A)\,p(r_B|r_A) + p(\bar{r}_A)\,p(r_B|\bar{r}_A)$$
where $p(r_B|r_A)$ means the probability that a participant who has judged $d_A$ as "relevant" will judge $d_B$ as "relevant" and $p(r_B|\bar{r}_A)$ means the probability that a participant who has judged $d_A$ as "not relevant" will judge $d_B$ as "relevant". For the second group, the process of judgement on $d_B$ is in the non-comparative context (in this paper, the non-comparative context of a document means the scenario in which users have never judged any documents before judging the current document). We denote the relevance probability of $d_B$ as $p(r_B)$. In this paper, we want to test the equation:
$$p(r_B) = p(r_A)\,p(r_B|r_A) + p(\bar{r}_A)\,p(r_B|\bar{r}_A)$$
which is derived from the law of total probability. Since $p(r_A) + p(\bar{r}_A) = 1$, the right-hand side always lies between $p(r_B|r_A)$ and $p(r_B|\bar{r}_A)$. Therefore, if $p(r_B)$ is significantly bigger (or smaller) than both $p(r_B|r_A)$ and $p(r_B|\bar{r}_A)$ simultaneously, there will be no possibility to satisfy the law of total probability.
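The test described above can be sketched directly on judgement counts. The code below is a minimal illustration: the function name `ltp_check`, its argument names and the example counts are hypothetical, and the check compares point estimates only, without the significance testing that real data would require.

```python
def ltp_check(n_rr, n_rn, n_nr, n_nn, n_b_rel, n_b_not):
    """Compare the LTP prediction from group 1 with the observation in group 2.

    Group 1 judged d_A then d_B; n_rn counts "A relevant, B not relevant",
    n_nr counts "A not relevant, B relevant", and so on.
    Group 2 judged d_B only; n_b_rel of them rated it relevant.
    """
    n_a_rel = n_rr + n_rn
    n_a_not = n_nr + n_nn
    n_group2 = n_b_rel + n_b_not
    if n_a_rel == 0 or n_a_not == 0 or n_group2 == 0:
        raise ValueError("each condition needs at least one participant")
    n_group1 = n_a_rel + n_a_not
    p_b_given_a = n_rr / n_a_rel
    p_b_given_not_a = n_nr / n_a_not
    # Right-hand side of the law of total probability, estimated from group 1.
    prediction = ((n_a_rel / n_group1) * p_b_given_a
                  + (n_a_not / n_group1) * p_b_given_not_a)
    p_b_alone = n_b_rel / n_group2
    low, high = sorted((p_b_given_a, p_b_given_not_a))
    return {
        "p(rB|rA)": p_b_given_a,
        "p(rB|not rA)": p_b_given_not_a,
        "ltp_prediction": prediction,
        "p(rB)": p_b_alone,
        "outside_ltp_range": not (low <= p_b_alone <= high),
    }


# Hypothetical counts for one query, 30 participants per group.
result = ltp_check(n_rr=14, n_rn=6, n_nr=6, n_nn=4, n_b_rel=9, n_b_not=21)
print(result)
```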

Test of the Order Effect and Incompatibility
There is a special interference phenomenon named the "order effect", which is introduced in Section 2.4. Imagine a relevance judgement scenario in which two groups of users judge two documents $d_A$ and $d_B$. Participants from the first group should judge $d_A$ and then $d_B$, while participants from the second group should judge $d_B$ and then $d_A$. With a given query, participants from the two groups will thus judge the relevance of documents A and B in different orders, one in the AB order and the other in the BA order. There will be four types of event for the first group, ArBr, ArBn, AnBr and AnBn, where ArBn denotes the event of rating $d_A$ as "relevant" first and then $d_B$ as "not relevant" in the AB order, and so forth. Analogously, there will be four types of event for the second group, BrAr, BrAn, BnAr and BnAn, where BrAn denotes the event of rating $d_B$ as "relevant" first and then $d_A$ as "not relevant" in the BA order, and so forth.
An order effect is determined by comparing the agreement rates in a non-comparative context vs. a comparative context. From a classical perspective, the proportions of subjects with the judgements ArBr, ArBn, AnBr and AnBn in Group 1 should be approximately equal to the proportions of subjects with the judgements BrAr, BnAr, BrAn and BnAn in Group 2, respectively. An order effect occurs when the proportions of subjects in the four classes differ between the AB order and the BA order. A two-tailed $\chi^2$ test of equality of four proportions between populations will be carried out to test the order effect.
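A test of this kind can be sketched as a Pearson $\chi^2$ test of homogeneity on a 2 x 4 table (two presentation orders, four judgement classes), which has three degrees of freedom. The code below is an assumption-laden illustration rather than the procedure used in the study: the function names and the counts are hypothetical, and the p-value uses the closed-form survival function of the $\chi^2$ distribution with three degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf_df3(x):
    """Survival function of the chi-square distribution with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


def order_effect_chi2(group_ab, group_ba):
    """Pearson chi-square test that two groups share the same four proportions.

    group_ab: counts of (ArBr, ArBn, AnBr, AnBn) in the AB order.
    group_ba: counts of (BrAr, BnAr, BrAn, BnAn) in the BA order, i.e.,
              listed so that each position refers to the same pair of ratings.
    """
    n_ab, n_ba = sum(group_ab), sum(group_ba)
    total = n_ab + n_ba
    statistic = 0.0
    for observed_ab, observed_ba in zip(group_ab, group_ba):
        column = observed_ab + observed_ba
        if column == 0:
            raise ValueError("empty judgement class; degrees of freedom would change")
        expected_ab = n_ab * column / total
        expected_ba = n_ba * column / total
        statistic += (observed_ab - expected_ab) ** 2 / expected_ab
        statistic += (observed_ba - expected_ba) ** 2 / expected_ba
    return statistic, chi2_sf_df3(statistic)


# Hypothetical counts for one query, 40 participants per group.
statistic, p_value = order_effect_chi2([20, 5, 3, 12], [10, 10, 8, 12])
print(statistic, p_value)
```

A small p-value would indicate that the four proportions differ between the two orders, i.e., an order effect under this test.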
For the application of human's answers to attitude questions, the quantum question order model (QQ model) [29] has achieved success. Wang et al. [29] propose a q-test, which can determine whether the order effect can be explained by the QQ model based on an incompatible decision perspective. Although recent works [68,80] claim that the feasibility of the QQ model on the order effect may be an open question, some empirical data have revealed that there is a high statistical association between the result of the q-test and the order effect [28]. In this paper, we want to explore whether the q-test for the QQ model is valid in the scenarios of document relevance judgement.
The q-test is an a priori, parameter-free and precise test of the quantum question model for examining whether the two events are incompatible [27,29]. The quantum question equality (QQ equality) is shown as Equation (11):
$$p(ArBn) + p(AnBr) = p(BrAn) + p(BnAr) \qquad (11)$$
This prediction of the QQ equality can be tested as follows. Let $p_{AB} = p(ArBn) + p(AnBr)$ be the probability of having different answers to the two questions in the AB order and $p_{BA} = p(BrAn) + p(BnAr)$ be the probability of having different answers to the two questions in the BA order. Then, the QQ model must predict that the following q-test value equals zero:
$$q = p_{AB} - p_{BA} = 0 \qquad (12)$$
It has been proven that, in the QQ model, the probability of having different answers (the answers to the two questions are contrary; for example, ArBn and AnBr in the AB order) should be the same in both orders. In the QQ model, the q-test will hold when there is an order effect. However, in a context that is more complex than the QQ model context, not all of the phenomena with the order effect satisfy the q-test. The q-test for the QQ model is based on the assumption that "only the question order influences the question context".
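The q-test value can be computed from the observed counts in the two groups. The sketch below is illustrative: the function name `q_test_value` and the counts are hypothetical, and it returns only the point estimate of q, without the statistical test needed to decide whether q differs significantly from zero.

```python
def q_test_value(counts_ab, counts_ba):
    """Point estimate of q = p_AB - p_BA.

    counts_ab: dict with keys "ArBr", "ArBn", "AnBr", "AnBn" (AB order).
    counts_ba: dict with keys "BrAr", "BrAn", "BnAr", "BnAn" (BA order).
    """
    n_ab = sum(counts_ab.values())
    n_ba = sum(counts_ba.values())
    if n_ab == 0 or n_ba == 0:
        raise ValueError("both groups need at least one participant")
    # Probability of giving different ratings to the two documents.
    p_ab = (counts_ab["ArBn"] + counts_ab["AnBr"]) / n_ab
    p_ba = (counts_ba["BrAn"] + counts_ba["BnAr"]) / n_ba
    return p_ab - p_ba


# Hypothetical counts for one query.
q = q_test_value(
    {"ArBr": 20, "ArBn": 5, "AnBr": 3, "AnBn": 12},
    {"BrAr": 10, "BrAn": 8, "BnAr": 10, "BnAn": 12},
)
print(q)  # negative here: contrary ratings are more frequent in the BA order
```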

Dynamic Information Need Model in Quantum Measurement
Relevance can be defined as "a dynamic concept that depends on users' judgements of the quality of the relationship between information and information need at a certain point in time" [30,31]. Although the dynamic nature of relevance has been widely accepted, relatively few works try to model and explain the dynamic process of judging ordered documents from a cognitive interference perspective.
Relevance judgement is the process in which users subjectively evaluate whether the information of returned documents can satisfy their information need. Intuitively, when someone has judged a document, it may affect the user's mind about the original query intention, leading to a change of the context of the process.
Inspired by quantum measurement, we try to build a quantum formal framework that looks at the information need and documents in a single vector space. The understanding of the original information need in a human's mind is denoted as a belief state in the vector space. Each judgement is considered as a collapse of the user's belief state, and the belief state will be updated to one of the bases as a basis state. The next document can provide a new group of orthogonal bases for the next judgement. The belief state of the information need, which is derived from the original query terms, can be denoted as a unit vector $|S\rangle$. The event that represents "the document $d_i$ is relevant" is denoted as $|r_i\rangle$.
Imagine a scenario in which, for a given query, each user judges the relevance of two documents $d_A$ and $d_B$. $|r_A\rangle$ and $|\bar{r}_A\rangle$ constitute a pair of orthogonal bases, as do $|r_B\rangle$ and $|\bar{r}_B\rangle$. Just like the previous examples (e.g., the prisoner dilemma game and public polls in the paper), a different order of projection leads to a different final probability. In the case of information need subspace projection, if someone has updated his or her information need to $|r_A\rangle$ or $|\bar{r}_A\rangle$ (after judging $d_A$ as relevant or irrelevant, respectively), the relevance probability of $d_B$ is computed as:
$$p(r_B \mid r_A) = \|\Pi_{r_B} |r_A\rangle\|^2 = |\langle r_B | r_A \rangle|^2, \qquad p(r_B \mid \bar{r}_A) = \|\Pi_{r_B} |\bar{r}_A\rangle\|^2 = |\langle r_B | \bar{r}_A \rangle|^2 \tag{13}$$
Similar to Equation (7), when we have not judged document $d_A$, the relevance probability of $d_B$ can be computed as:
$$p(r_B) = \|\Pi_{r_B} |S\rangle\|^2 = |\langle r_B | S \rangle|^2 \tag{14}$$
Due to the existence of the interference term, when $\cos\theta \neq 0$, $p(r_B) \neq p(r_A, r_B) + p(\bar{r}_A, r_B)$. From the projection approach in Equations (13) and (14), we get the observation that $\|\Pi_{r_B} S\|^2 \neq \|\Pi_{r_B} \Pi_{r_A} S\|^2 + \|\Pi_{r_B} \Pi_{\bar{r}_A} S\|^2$ when $\cos\theta \neq 0$. When $\cos\theta = 0$, the quantum probability will "collapse" into the classical probability, and the interference effect does not occur.
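The gap between the direct projection and the sum of the two sequential projections can be illustrated numerically. The following is a minimal sketch in a real two-dimensional space, so the phase factor of the interference term is restricted to ±1; the function names and the angles are hypothetical and are not taken from the study.

```python
import math


def unit(angle):
    """Unit vector in the plane at the given angle (radians)."""
    return (math.cos(angle), math.sin(angle))


def dot(u, v):
    """Inner product of two real 2D vectors."""
    return u[0] * v[0] + u[1] * v[1]


def relevance_probabilities(s_angle, a_angle, b_angle):
    """Return (p_direct, p_sequential, interference) for judging d_B.

    p_direct:      squared projection of S onto r_B (d_A not judged).
    p_sequential:  sum of the two paths via r_A and via not-r_A.
    interference:  cross term, so that p_direct = p_sequential + interference.
    """
    S = unit(s_angle)
    r_A = unit(a_angle)
    not_r_A = unit(a_angle + math.pi / 2)  # orthogonal to r_A
    r_B = unit(b_angle)

    p_direct = dot(r_B, S) ** 2
    amp_via_r_A = dot(r_B, r_A) * dot(r_A, S)
    amp_via_not_r_A = dot(r_B, not_r_A) * dot(not_r_A, S)
    p_sequential = amp_via_r_A ** 2 + amp_via_not_r_A ** 2
    interference = 2 * amp_via_r_A * amp_via_not_r_A
    return p_direct, p_sequential, interference


# Hypothetical angles (radians) for the belief state and the two documents.
p_direct, p_sequential, interference = relevance_probabilities(0.3, 0.9, 1.4)
print(p_direct, p_sequential, interference)
```

When the belief state already coincides with one of the basis vectors of $d_A$, one of the two path amplitudes vanishes and so does the interference term, which matches the classical case.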

User Study for Quantum Interference
In order to investigate the quantum interference phenomenon in the process of relevance judgement, we conduct an online user study to collect the experimental data about users' relevance judgement for documents. In this section, we first introduce the details of the experimental setting and then analyse the experimental results. Based on the experimental results, we finally investigate the existence of quantum interference.

Experimental Procedure of the User Study
In our user study, we recruited 61 participants, including 50 postgraduate students and 11 undergraduate students. Their ages range from 19 to 28 (mean = 23.89, variance = 3.04). The participants cover a wide range of majors (e.g., engineering, economics and maths). All of them have rich experience in using search engines and a good English reading level: 38 participants have passed the CET-6, and the rest have passed the CET-4 (in the College English Test of China, the CET-4 and CET-6 certificates correspond to the English level expected of non-English-major undergraduates and postgraduates, respectively).
We design a browser-based data-collecting system, shown in Figure 2, which presents the textual content to users and records their explicit relevance feedback. The presentation style is similar to that of mainstream websites, with the title and content in the centre of the browser; advertisements and irrelevant navigation elements are removed. Most of the materials are collected from Wikipedia, and the rest are from China Daily. The topics of the selected materials cover various types of content, ranging from general information, such as public debates, to professional information for specific fields. The details of the selected materials are shown in Table 1.
Each participant is first guided through a test case, which teaches them to rate the relevance degree of the documents given a corresponding query. In the formal test, there are 15 sessions, each of which has two documents with respect to a specific query. The participants are asked to rate the documents one by one in the given order. They have a break after finishing a session and can click the "Next" button to move to the next session on their own initiative. It takes about one hour for a participant to finish all 15 sessions (including the breaks between sessions). We paid five dollars to each participant.
All of the participants are randomly divided into two groups for the same query. The two documents with respect to one query are presented to the two groups in different orders. For each query, there is a description as a supplementary explanation to ensure that it can be easily understood by the participants. The simulated search engine result pages (SERPs) consist only of the titles of the two corresponding documents. Since we aim to investigate the interference between the main contents of the two documents, we need to avoid introducing other uncertain factors into the SERPs. To this end, we remove the snippets, URLs and other links that usually appear in the SERPs of typical commercial search engines. For a better search experience, we limit the document length to between 200 and 300 words (239 on average).
Documents are rated with four relevance degrees: fully irrelevant, a little relevant, partially relevant and fully relevant. Before the experiments, participants are given explanations of the basic concepts of the different relevance degrees.
After all 15 trials, we conducted a questionnaire survey to investigate the participants' familiarity with these topics. Participants rate each of the 15 query-document pairs. The familiarity degrees range from one to five, where a rating of one means they know nothing about the topics of the query and documents, while a rating of five means they are very familiar with those topics.

Statistics for the Judgement Discrepancy of Relevance Probability
For each trial, there are two documents: $d_A$ and $d_B$. Group 1 sees the order "$d_A$ and then $d_B$", while Group 2 sees "$d_B$ and then $d_A$". There is no document before $d_A$ in Group 1. Thus, the judgement of $d_A$ in Group 1 has not been influenced by any other document, which means that $d_A$ is judged in the non-comparative context in Group 1, as is $d_B$ in Group 2.
We divide the four degrees of relevance into two classes: "fully irrelevant" and "a little relevant" are ascribed to "not relevant", while "partially relevant" and "fully relevant" are ascribed to "relevant". We have collected rating data from 61 users, 31 in Group 1 and 30 in Group 2.
The notation $\Delta_1$ denotes the difference in the judgement of the document $d_A$ between the two groups, where the presentation order in Group 1 is "$d_A$ then $d_B$" (AB order) and that in Group 2 is "$d_B$ then $d_A$" (BA order). The notation $\Delta_2$ denotes the corresponding judgement difference for the document $d_B$.
We calculate the judgement discrepancy of documents' relevance between the two groups. The data are shown in Table 2. The cases for which the absolute difference of relevance probability between the two groups is bigger than 0.2 are bolded. Most of the relevance judgements of documents for a query have been influenced by the previous document relevance judgement. The judgement discrepancy between two groups varies from trial to trial with the mean of the absolute value 0.1231 (for all 30 documents). A two-tailed t-test of the four relevance degrees (without binarization of the original rating score) of the same documents by two groups is carried out (α = 0.05). Those cases exhibiting a statistically-significant difference are flagged by †. The main reason for a non-significant difference between two groups may be that the relevance degrees only have four discrete values and that users tend to make a conservative judgement with a rating of two or three ("a little relevant" or "partially relevant", respectively).
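The binarization and the discrepancy computation can be sketched as follows. This is only an illustration: the ratings are hypothetical, the four degrees are assumed to be coded 1 to 4, and the sign convention (Group 1 minus Group 2) is an assumption, since only the absolute difference is used for the bolding described above.

```python
def relevance_probability(ratings, threshold=3):
    """Share of users whose rating counts as "relevant".

    Ratings are assumed to be coded 1 (fully irrelevant), 2 (a little
    relevant), 3 (partially relevant) and 4 (fully relevant); ratings of
    3 or 4 are binarized to "relevant".
    """
    relevant = sum(1 for rating in ratings if rating >= threshold)
    return relevant / len(ratings)


def judgement_discrepancy(ratings_group1, ratings_group2):
    """Difference in relevance probability of one document between groups.

    The sign convention (Group 1 minus Group 2) is an assumption made here.
    """
    return relevance_probability(ratings_group1) - relevance_probability(ratings_group2)


# Hypothetical ratings of the same document by the two groups.
group1 = [1, 2, 2, 3, 4, 2, 1, 3]
group2 = [3, 4, 2, 3, 3, 4, 1, 3]
delta = judgement_discrepancy(group1, group2)
print(f"discrepancy = {delta:+.3f}, bolded = {abs(delta) > 0.2}")
```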
Let us look at the example of the trial about the query "innovation driven": 6% of participants judge the document about the "sharing economy" as relevant when they judge it first, while for participants from the other group, who judge the document about the "new open economic system" before the "sharing economy", the relevance probability increases to 37%, which is much higher than the probability in the non-comparative context. At the same time, some sessions, like "Albert Einstein" and "Kung Fu Panda", do not show an apparent judgement difference between the two groups.

Evaluation of Quantum Interference in Relevance Judgement
As shown in Table 2, most of the sessions show an apparent judgement difference in the two groups. We will test the judgement difference from the perspective of cognition interference.

Violation of the Law of Total Probability
In our experiment, one group judges the two documents $d_A$ and $d_B$ in the AB order, while the other judges them in the BA order. The first group provides a judgement of $d_B$ in the comparative context and a judgement of $d_A$ in the non-comparative context. We denote the relevance probability of the document $d_i$ without any previously-judged document as $p(r_i)$. In the comparative context, $p(r_i \mid \bar{r}_j)$ is the relevance probability under the condition that the previous document was judged as "not relevant", and $p(r_i \mid r_j)$ is the relevance probability under the opposite condition. Under the law of total probability, the probability of the document relevance in the comparative context is formalized as Equation (15):
$$p(r_i) = p(r_j)\, p(r_i \mid r_j) + p(\bar{r}_j)\, p(r_i \mid \bar{r}_j) \tag{15}$$
The empirical results of our experiments are shown in Table 3. Referring to Table 1, Group 1 provides a non-comparative context to judge $d_A$, while Group 2 provides a non-comparative context to judge $d_B$. We regularly take $d_A$ as $d_i$ and $d_B$ as $d_j$, but the cases flagged by "*" in Table 3 mean the opposite, i.e., $d_B$ as $d_i$ and $d_A$ as $d_j$.
The cases in which the relevance probability of the document in the non-comparative context is higher (lower) than both of the two conditional probabilities are bolded. In the last column, "+" ("−") means that the relevance probability in the non-comparative context is larger (lower) than both probabilities in the comparative context, regardless of the relevance judgement of the previous document. In this experiment, 22 cases (out of 30) do not satisfy the law of total probability, which indicates that interference has occurred in this judging process.
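The "+" / "−" flag can be motivated by a simple observation: under the law of total probability, the unconditional probability is a weighted average of the two conditional probabilities and therefore must lie between them. The sketch below encodes this check; the function names and the numbers are hypothetical and are not values from Table 3.

```python
def total_probability(p_rj, p_ri_given_rj, p_ri_given_not_rj):
    """Classical prediction for p(r_i) from the two conditional probabilities."""
    return p_rj * p_ri_given_rj + (1.0 - p_rj) * p_ri_given_not_rj


def violation_sign(p_ri, p_ri_given_rj, p_ri_given_not_rj):
    """Return "+" or "-" if p(r_i) lies outside both conditionals, else "".

    A weighted average of two numbers always lies between them, so a value
    above (below) both conditionals cannot satisfy the law of total
    probability for any value of p(r_j).
    """
    low = min(p_ri_given_rj, p_ri_given_not_rj)
    high = max(p_ri_given_rj, p_ri_given_not_rj)
    if p_ri > high:
        return "+"
    if p_ri < low:
        return "-"
    return ""


# Hypothetical probabilities for one document.
print(violation_sign(0.37, 0.52, 0.45))  # non-comparative value below both conditionals
print(violation_sign(0.50, 0.40, 0.60))  # compatible with the classical law
```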

Test of the Order Effect and Incompatibility
An order effect is determined by comparing the agreement rates obtained in the two groups. If the two groups make markedly different judgements in different orders, there is an apparent order effect. In our experiment, the results of a two-tailed χ² test (α = 0.05) of the equality of proportions between populations are shown in Table 4, where 12 queries (out of 15 queries) presenting an order effect are flagged by †.
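For illustration, a χ² statistic for the equality of the four judgement proportions between the two groups could be computed as sketched below. This is a plain-Python sketch with hypothetical counts, not the exact procedure used for Table 4; the constant 7.815 is the standard critical value of the χ² distribution with three degrees of freedom at α = 0.05.

```python
def chi_square_statistic(group1_counts, group2_counts):
    """Pearson chi-square statistic for a 2 x k contingency table.

    Each argument lists the number of users in the k judgement classes,
    aligned across groups (e.g. [ArBr, ArBn, AnBr, AnBn] and
    [BrAr, BnAr, BrAn, BnAn]).
    """
    rows = [group1_counts, group2_counts]
    row_totals = [sum(row) for row in rows]
    col_totals = [a + b for a, b in zip(group1_counts, group2_counts)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:  # skip classes that nobody chose in either group
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Critical value for df = (2 - 1) * (4 - 1) = 3 at alpha = 0.05.
CRITICAL_VALUE_DF3_ALPHA_005 = 7.815

# Hypothetical counts for one query.
statistic = chi_square_statistic([20, 0, 0, 10], [0, 10, 10, 10])
print(statistic, statistic > CRITICAL_VALUE_DF3_ALPHA_005)
```

With about 30 users per group, some expected counts can be small, in which case the χ² approximation is rough and an exact test may be preferable.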
In order to examine whether the decision pattern in the quantum question model is applicable to the relevance judgement process, we employ the q-test based on the QQ model. It has been proven that, in the QQ model, the probabilities of having different answers in the two orders should be equal to each other. The QQ model must predict $q = p_{AB} - p_{BA} = 0$, where $p_{AB} = p(ArBn) + p(AnBr)$ and $p_{BA} = p(BrAn) + p(BnAr)$, as described in Section 3.1. The result is reported in Table 4. The queries that are statistically significant in the χ² test are flagged by †. Some of the queries that show an apparent order effect in the χ² test fail the q-test. The Jaccard index (the size of the intersection of two sets divided by the size of their union) between the χ² test and the q-test is 0.231, which shows a low agreement between the two test methods. There may be two main reasons. First, the QQ model is more suitable for relatively large datasets. Second, real search settings are more complex than the setting assumed by the QQ model. The q-test for the QQ model assumes that only the question order influences the question context [29]. This assumption is usually violated in the relevance judgement process. A fuzzy information need, different background knowledge about the query and documents and even different reading habits can all introduce uncertainty into the cognitive process.
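The Jaccard index between the sets of queries flagged by the two tests can be computed as in the sketch below. The query identifiers are hypothetical placeholders and do not reproduce Table 4.

```python
def jaccard_index(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(set_a), set(set_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here when both sets are empty
    return len(set_a & set_b) / len(union)


# Hypothetical query ids flagged by each test.
order_effect_queries = {1, 2, 3}
q_test_queries = {2, 3, 4}
print(jaccard_index(order_effect_queries, q_test_queries))
```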
In summary, most of the queries show an apparent order effect. In some cases, when the q-test holds and there is statistical significance in the χ² test, the process of the relevance judgement on the two documents can be explained by an incompatible decision perspective in the QQ model.

Discussions for Quantum Interference in Relevance Judgement
Experiments have revealed that the interference phenomenon and order effects exist in the relevance judgement. This section will provide four judging effects to account for the judgement discrepancy. The comparison effect employs a psychological hypothesis and assumes that different reference points lead to different judgements. The unfamiliar effect points out that the lack of background knowledge in users' minds may be the underlying cause of interference. The more unfamiliar the query and documents are, the more apparent the interference phenomenon is. The attraction effect and the repulsion effect provide a new model to explain how the previously-judged documents affect the current judgement from the perspective of the dynamic information need.

Comparison Effect
For the query of "Mo Yan", who was awarded the Nobel Prize in Literature, when someone has seen a "fully relevant" document about "The Red Sorghum" (Mo's representative work of hallucinatory realism), they will lower the score of the second document about "Nobel Prize". In the non-comparative context, 55% of the participants judge the second one as "relevant", but when having seen a more relevant document and setting a high reference point, the probability decreases to 30%. Another example of the low reference point is about the query "religion"; the document about "politics" provides a low expectation for users; and users become more likely to accept that the document about "culture" is relevant to the query.
As shown in Figure 3, the extent of interference largely depends on the relevance of the previous document. There are 30 points that represent the 30 documents, and the relevance deviation ∆ is the difference in the relevance probability of a document between the comparative context and the non-comparative context. The Pearson correlation coefficient, which measures the linear relationship between the relevance of the previous document and ∆, is −0.53 with a two-tailed p-value of 0.0029. In other words, the more relevant the previous document is, the lower the relevance degree that users perceive in the current document.
Just like the anchoring-and-adjustment heuristic, about which Tversky and Kahneman [81] said, "people make estimates by starting from an initial value that is adjusted to yield the final answer", different reference points [82] yield different estimates, which are biased toward the initial values. When someone has visited a highly relevant document, he or she expects the following documents to have a high relevance degree as well; if the next document falls short of this expectation, he or she will be disappointed and lower its score. On the contrary, if he or she has met an irrelevant document before, he or she will more likely accept a document that is partially relevant or even a little relevant.

Unfamiliar Effect
Contextuality is one of the two aspects (the other aspect is quantum entanglement) of quantum theory that opens the door to explaining phenomena in cognition and decision-making in a totally new light [4]. The concept of contextuality is captured in quantum theory by the idea of "interference": "the context generated by making a first judgement or decision interferes with the subsequent judgement or decisions to produce order effects". In the search process, the concept of "context" is complex. If your major is computer science, you will have a clear understanding of the concepts of "Boolean algebra" and "Turing machine", thus easily making a judgement of their relevance to the query "computer" without reading the documents carefully. The context largely depends on the participants' knowledge state, information need and other environmental factors. If participants already have a clear understanding of the query, the documents and the relation between them before the trial starts, interference will barely be observed: since the context of the search process cannot bring them extra information, they judge the relevance by their a priori knowledge rather than by the information in the document itself. In cognitive science, a concept has two types of properties: context-independent properties, which are activated by the words in the whole cognitive space, and context-dependent properties, which are activated by the current contexts [83].
In the search process, the context includes the users' cognitive background, the information need and any other environmental factors. For example, for the query "Albert Einstein" and the two documents about "Isaac Newton" and "the theory of relativity", respectively, most participants have a clear understanding of the two concepts (or "topics") and their relation with the query, thus leading to a tiny interference with each document. Things are the opposite when we consider the query of "transgene" and its corresponding documents about "hybrid" and "garden roses", respectively. These unfamiliar concepts have caused a massive interference with each other and influenced the participants' judgement. Generally speaking, higher context dependence and lower context-independence will be more likely to cause a more apparent interference phenomenon.
The phenomenon of interference could be more apparent in an unfamiliar context. In other words, a lack of background knowledge about the topic of the query leads to an apparent interference between judgements. We have indeed found a strong negative correlation between the familiarity with the topic of the query-document pair and the judging deviation, as shown in Figure 4. We call this the "unfamiliar effect". In Figure 4, there are 15 points that represent the 15 query-document pairs, and the mean relevance deviation is calculated as the mean of the absolute values of the relevance deviation in the two orders: MeanDeviation = $(|\Delta_1| + |\Delta_2|)/2$. The familiarity ratings come from the questionnaire introduced in Section 4. The Pearson correlation coefficient, which measures the linear relationship between them, is −0.68 with a two-tailed p-value of 0.0054. The results show that the more unfamiliar the topic of the session is, the greater the judging difference caused by the previous document is.
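The Pearson correlation coefficient used here and in the comparison effect can be computed as in the following sketch. The familiarity ratings and deviations below are hypothetical values chosen only to show the mechanics; they are not the data plotted in Figure 4.

```python
import math


def mean_deviation(delta_1, delta_2):
    """MeanDeviation = (|delta_1| + |delta_2|) / 2 for one query."""
    return (abs(delta_1) + abs(delta_2)) / 2


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical mean familiarity ratings (1 to 5) and judgement deviations.
familiarity = [1.5, 2.0, 3.0, 4.0, 4.5]
deviations = [
    mean_deviation(0.30, -0.26),
    mean_deviation(-0.20, 0.24),
    mean_deviation(0.15, 0.13),
    mean_deviation(-0.05, 0.09),
    mean_deviation(0.04, -0.02),
]
print(pearson_r(familiarity, deviations))
```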

Attraction Effect and Repulsion Effect
The statistical conclusion and intuitive explanation can reveal some users' behavioural patterns, yet it is difficult to give an effective and systematic explanation of the dynamic search process, which involves a series of cognitive processes. Besides the interference phenomenon in humans' minds, the concepts in the query and documents may be mutually connected. Judging a document may affect the user's state of information need, resulting in different relevance judgements in different orders. The information need is essentially uncertain and will evolve with the users' reading experience. Every relevance judgement of the current document will lead to a "collapse" of the information need, and thus, users may make a different judgement of the next document. We now present an explanation inspired by quantum measurement to account for the dynamic nature of the relevance judgement process and to explain how the visited documents affect the information need.
The dynamic information need model has been introduced in Section 3.2. As shown in Figure 5, for simplicity, we treat the semantic space as a two-dimensional space. The information need is denoted as the belief state in the user's mind. $|r_A\rangle$ represents the event that $d_A$ is "relevant" to the information need, and $|\bar{r}_A\rangle$ represents that $d_A$ is "not relevant". $|r_B\rangle$ represents the event that $d_B$ is "relevant" to the information need, and $|\bar{r}_B\rangle$ represents that $d_B$ is "not relevant". $|r_A\rangle$ and $|\bar{r}_A\rangle$ constitute a pair of orthogonal bases, as do $|r_B\rangle$ and $|\bar{r}_B\rangle$.
Figure 5. Two examples of the dynamic need model inspired by quantum mechanics. (a) Attraction effect on the query "American President": the judgement of rating the document "Obama" as "relevant" will attract the state of the users' information need to the document about "Obama's wife"; (b) repulsion effect on the query "probability": the judgement of rating the document "distribution" as "irrelevant" will push away the state of the users' information need from the related concept.
For example, given the query "American president", $d_A$ is about "Barack Obama", while $d_B$ is about "Obama's wife". $|r_A\rangle$ represents the event that the user judges that the document with the title "Barack Obama" is "relevant" to the query "American president". $|\bar{r}_A\rangle$ represents the event that the user judges that the document with the title "Barack Obama" is "not relevant" to the query "American president". In other words, it means that the document is not about the topic of the query "American president". Obviously, $|r_A\rangle$ and $|\bar{r}_A\rangle$ are mutually exclusive elementary outcomes and can constitute a pair of orthogonal bases, as do $|r_B\rangle$ and $|\bar{r}_B\rangle$. The angle between $|S\rangle$ and $|r_A\rangle$ is approximately determined by the original relevance degree between the query and document $d_A$.
Based on the two types of judgement, we can find two effects inspired by the dynamic assumption. One is the "attraction effect", illustrated in Figure 5a. For the case of "American president", 37% of participants judge the document $d_B$ about "Obama's wife" as relevant when they judge it directly in the non-comparative context. When participants from the other group first judge the document $d_A$ about "Obama" as relevant, the information need may become more closely connected with "Obama". In quantum theory, this means that the user's state collapses into the subspace of "Obama". After the belief state of the information need is updated to "Obama", the second measurement on "Obama's wife" will be more easily accepted by users (the relevance probability increases to 52%). Looking at Figure 5a, the length after two projections is longer than the length after only one projection.
In general, this is one case of the "attraction effect". In a more common scenario, you want to find something about "NoSQL database", but you know nothing about it. When shown a document about "MongoDb", one of the most popular NoSQL databases, you will refuse to continue reading it because you lack background knowledge about "MongoDb" and are not conscious of the connection between the "NoSQL database" and "MongoDb". If a document that introduces the variety of NoSQL databases is shown before the document about "MongoDb" and bridges the gap between these concepts, things would be different.
Another effect is named the "repulsion effect". If the participants have rejected something, they will be more likely to dislike homogeneous topics. For the query "probability" and the corresponding documents about "statistics" and "distribution", which are analogous in content, as shown in Figure 5b, users who label the first document as not relevant will be more likely to reject the second one (in the comparative context, the relevance probability reduces from 0.83 to 0.62). In quantum theory, once the first document is judged, the dynamic information need "collapses" into the basis that is orthogonal to the rejected document and has a large angle with any documents highly related to the rejected content.
The "attraction effect" and the "repulsion effect" provide an explanation for the dynamic nature of the relevance judgement. This reveals that the information need is not only about the original query terms, but also evolves along with the judgement or interaction in the search process. The visited pages will affect the following judgement because of the users' unconscious collapse of the information need. Unfortunately, the context of the search process is too complex to quantitatively analyse the judging discrepancy in different orders. The judgement discrepancy may also be caused by the superposition of multiple effects, leaving more detailed work for the future.

Conclusions and Future Work
In this paper, we design and carry out a user study in order to investigate quantum interference in the document relevance judgement process. The investigation is based on three kinds of tests, i.e., the test of the law of total probability, the χ² test for the order effect and the q-test for the quantum question model. The results show that there indeed exists a document relevance judgement discrepancy and that the order effect is clearly observed, while the q-test holds only in some trials, because the context of the relevance judgement process is more complex than the question answering context.
In order to further interpret the document relevance judgement discrepancy, we have proposed four effects, namely the comparison effect, the unfamiliarity effect, the attraction effect and the repulsion effect. The comparison effect and the unfamiliarity effect provide an intuitive interpretation, while the attraction effect and the repulsion effect interpret the dynamic nature of the relevance judgement based on the evolution of the information need subspace.
One direct line of future work is to make use of the four effects for predicting the dynamic relevance of documents in the real-time IR setting. Each interaction and relevance judgement by a user can reflect the subtle change of the user's cognitive state, which helps the IR system to perceive the user's real-time information need. In the real search process, if we know how the relevance judgement on one document interferes with the judgements on the others, it can help the IR system present better ordered documents to satisfy the user's information need. Especially in a user-centred IR system, it is more critical to build an effective user model based on users' real-time information need.
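In the unknown condition,
$$\begin{aligned} p(d) &= |\langle S_d | S \rangle|^2 = \left| \langle S_{od} | S \rangle \langle S_d | S_{od} \rangle + \langle S_{oc} | S \rangle \langle S_d | S_{oc} \rangle \right|^2 \\ &= \left( \langle S_{od} | S \rangle \langle S_d | S_{od} \rangle + \langle S_{oc} | S \rangle \langle S_d | S_{oc} \rangle \right) \left( \langle S_{od} | S \rangle \langle S_d | S_{od} \rangle + \langle S_{oc} | S \rangle \langle S_d | S_{oc} \rangle \right)^{\dagger} \\ &= |\langle S_{od} | S \rangle|^2 \, |\langle S_d | S_{od} \rangle|^2 + |\langle S_{oc} | S \rangle|^2 \, |\langle S_d | S_{oc} \rangle|^2 + 2 \left| \langle S_{od} | S \rangle \langle S_d | S_{od} \rangle \langle S_{oc} | S \rangle \langle S_d | S_{oc} \rangle \right| \cos\theta \end{aligned} \tag{23}$$
Compared to Equation (18), the additional term $2 \left| \langle S_{od} | S \rangle \langle S_d | S_{od} \rangle \langle S_{oc} | S \rangle \langle S_d | S_{oc} \rangle \right| \cos\theta$ is the interference term.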