1. Introduction
This paper addresses the problem of information extraction by an AI-powered chatbot that is not fully reliable and may make mistakes in its responses. The problem concerns searching for and extracting relevant information from large datasets in response to a user query. Unlike conventional search-and-rescue robots, which typically perform targeted searches using well-known pathfinding algorithms to navigate physical space autonomously, an AI chatbot processes a natural-language conversation with a human user and extracts meaningful information through semantic search and human–AI interaction to generate the most relevant and contextually appropriate responses. During a user–AI conversation in natural language, the human user can modify and refine queries until he/she is satisfied with the chatbot’s output. We analyzed two key characteristics of human–AI interaction: search reliability and efficiency [1,2,3]. The reliability of an AI-based chatbot is reflected in its ability to understand user queries and provide correct answers; it is measured by the frequency (probability) of correct responses [1,2]. The search efficiency of a chatbot [3] indicates how accurate and relevant the information returned by the chatbot is. In this study, search efficiency is measured by the level of satisfaction that a human user reports for a correct answer from the chatbot on a 10-point Likert scale. This is a well-known rating system for measuring customer satisfaction (implemented, e.g., in IBM SPSS Statistics version 25), in which the highest score of 10 means “very satisfied” on a 10-point scale from 1 to 10, and the lowest score of 1 means “very dissatisfied” [4].
In this paper, we uncovered a counterintuitive relationship between the reliability of AI chatbots and their search efficiency. We demonstrated that, under general conditions, reducing the reliability of deployed chatbots can improve the search efficiency. In particular, a less reliable chatbot that stops searching after extracting relevant information can have a higher expected efficiency than a more reliable chatbot. This phenomenon aligns with the family of “more-for-less” paradoxes observed in various complex systems. Finally, we discussed the underlying mechanism of this paradox.
The main contributions of this paper were as follows: (i) we presented and described a multi-round conversational process between a human user and an AI chatbot, such as ChatGPT, that served to extract relevant information and consisted of multiple rounds of query modification/refinement during the conversation; (ii) we formulated the information extraction optimization problem for an imperfect AI chatbot as a two-player discrete sequential search problem with probabilistic parameters, where the first player, the AI chatbot, performed the search, while the second player, the human user, evaluated the performance of the first player at each search iteration; (iii) we developed an efficient index-based algorithm for finding the optimal search policy, which extended the well-known fast scheduling algorithms of Ross [5], Sweat [6], and Chew [7] for discrete sequential search problems; and finally, (iv) we analyzed and explained the counterintuitive relationship between AI chatbot reliability and search efficiency in the multi-round information search/extraction process.
The remainder of the paper is organized as follows. In the next section, we introduce the main definitions that will be used in the sequel. Section 3 describes the information extraction problem to be solved. Section 4 provides a brief overview of previous work. In Section 5, we propose a mathematical model and an efficient index-based algorithm to optimize the expected user satisfaction over an infinite horizon. In Section 6, we formulate and prove the paradoxical relationship between the reliability and efficiency of an optimal information extraction process using an AI chatbot. A numerical example supporting and illustrating the studied paradox is presented in Section 7. A summary of the contributions and an outline of future research are provided in Section 8.
2. Preliminaries
In this section, we provide the basic definitions that will be used in the sequel.
2.1. The Discrete Search Problem with Imperfect Robot Sensors
The common discrete search problem, well known in operations research, is as follows [5,6,7,8,9,10]. A target is hidden in one of the known locations, 1, 2, …, N, with a probability distribution p = (p1, p2, …, pN), ∑i pi = 1, and remains in this location during the search. A search device (for example, a search-and-rescue robot) must perform a sequence of inspections at locations 1, 2, …, N to find the target and continue searching until the target is found.
The robot’s sensor is imperfect; that is, each location i has an associated non-zero overlook probability ai, so that if the target is at location i and the robot inspects that location, there is a probability ai > 0 that the sensor will not detect the target, i = 1, …, N.
Given the above probabilities, along with the duration of the inspections and the reward for detecting the target at a given location, the goal is to define a probabilistic search policy that finds the optimal sequence of robot movements that maximizes the expected total reward over an infinite time interval. Specific details regarding information search/extraction using AI-powered chatbots are presented in the next subsection.
2.2. The Information Extraction Problem Using AI Chatbots
The studied problem refers to searching and extracting relevant information from a large dataset in response to a user’s query. The goal of an AI chatbot is to provide the most useful and accurate information and avoid irrelevant results.
While the task of information extraction using AI chatbots can be considered a special class of the general discrete search problem, these problems differ in their goals, physical environments, and solution methods. Search-and-rescue robots perform searches, typically after disasters, in physical areas, seeking to find and rescue survivors, while AI-powered chatbots perform information search and extraction, aiming to find, extract, and generate relevant textual and visual answers from digital knowledge bases, documents, and the web. Search-and-rescue robots use discrete pathfinding algorithms such as A* and Dijkstra’s algorithm to autonomously navigate physical spaces, while the conversation between a human user and an AI chatbot consists of multiple stages (rounds), during which the user modifies/refines queries while the chatbot improves its responses over time based on user feedback and learning from the user’s interactions with the AI, until the user is satisfied.
An AI-powered chatbot searches, discovers, and extracts the most relevant information requested through a process called semantic search. This is an advanced information retrieval technology that focuses on understanding the meaning and content of a user’s query, rather than simply searching and matching keywords. The chatbot typically scans hundreds of databases and their clusters depending on the complexity of the query, where thousands of documents can be scanned in milliseconds, making this process efficient. The process consists of the following main steps:
Query understanding, where the chatbot interprets the human user’s request using natural language processing to extract its meaning and then converts the text into a numerical (vector) format;
Semantic search, where the chatbot compares the query with data stored in databases or multiple database clusters to find the most relevant matches;
Ranking/filtering, where the chatbot prioritizes the most relevant pieces of information and refines its response;
Response generation, where, once the relevant information has been found, the chatbot generates a natural-language answer from the extracted content.
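The steps above can be illustrated with a minimal, self-contained sketch of query understanding, semantic search, and ranking. It uses toy bag-of-words count vectors and cosine similarity in place of the learned dense embeddings that production chatbots actually use; all function names and data here are illustrative assumptions, not part of any real chatbot API.

```python
from collections import Counter
import math

def embed(text):
    """Toy 'embedding' (step 1, query understanding): a bag-of-words
    count vector. Real chatbots use learned dense embeddings instead."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(u[w] * v[w] for w in u)          # missing words count as 0
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def semantic_search(query, documents, top_k=2):
    """Steps 2-3: compare the query vector with each stored document,
    then rank/filter the most relevant matches."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    scored.sort(reverse=True)                  # highest similarity first
    return [doc for score, doc in scored[:top_k] if score > 0]

docs = [
    "rescue robots navigate physical space",
    "chatbots extract information from databases",
    "weather forecast for tomorrow",
]
print(semantic_search("how do chatbots find information", docs, top_k=1))
# → ['chatbots extract information from databases']
```

Step 4 (response generation) is omitted, since it requires a generative language model rather than a few lines of ranking code.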
In the next section, we present a mathematical model of the problem.
3. Description of the Efficiency-Maximizing AI-Based Search Problem
The target information is hidden in one of the known database clusters (called locations) 1, 2, …, N, with a known probability distribution p = (p1, p2, …, pN), ∑i pi = 1, and remains in this location during the search. The search device (in our case, the AI-powered chatbot) must perform a sequence of scans of locations 1, 2, …, N to find the target information and continue the multi-round search until the human user declares, in some round, that the target has been found; at that point, the search process stops.
The chatbot is imperfect (not completely reliable); i.e., at each stage, each location i has a non-zero overlook probability ai, so that if the requested target information is in location i and the chatbot scans this location, with probability ai it fails to recognize the target and does not report its presence there, i = 1, …, N.
Given the above probabilities p = (p1, p2, …, pN) and a = (a1, a2, …, aN), as well as the durations of all conversation rounds and the user satisfaction levels with the extracted target information in each conversation round, the objective of the information extraction problem under consideration is to determine a stochastic search policy, i.e., the optimal sequence of chatbot location scans that maximizes the expected user satisfaction over an infinite time horizon. In Table 1, we highlight the main differences between the general discrete search problem, well known in operations research, and the information extraction problem. The essential difference between them is that the present work is motivated by and focuses on the practical issue of analyzing the properties of existing AI chatbots.
Let us consider the mathematical model in more detail. The discrete search area A contains N known database clusters (locations), in one of which the target information is hidden, with a known probability distribution p = (p1, p2, …, pN), ∑i pi = 1. The AI chatbot inspects the area to find and identify the hidden target, whose location is unknown in advance; we assume that this smart robot must perform location scans sequentially, one after another, in each round. The AI chatbot is imperfect; this means that associated with each location i is a non-zero overlook probability ai, that is, ai is the conditional probability that the chatbot will not detect the target information in a single scan of location i, given that this location in fact contains the requested information, i ∈ A = {1, …, N}. Therefore, since ai ≠ 0, for the search to be successful, each location may need to be scanned infinitely often, and, from the mathematical point of view, the sequence of user–chatbot conversational rounds can in general be infinite.
The durations of the user’s query modifications at each conversation round are assumed to be known in advance. Also, as indicated above, in this work, the search efficiency is measured by the satisfaction level that the human user reports when he/she is satisfied with the chatbot’s response at some stage (round) of the conversation; this level is assumed to decrease monotonically over the duration of the user’s conversation with the chatbot.
The goal of information extraction is to determine the optimal search strategy to find the target information with the maximum expected search efficiency (user’s satisfaction) over an infinite time horizon.
In addition to the probabilities pi and ai defined above, we used the following notation:
Policy π is an infinite sequence of conversation stages (rounds), π = (π1, π2, …); πk is the number of the location (database cluster) scanned at the kth round of the sequence π, πk ∈ A = {1, …, N}, k = 1, 2, …;
n is the (unknown) round of the sequence π at which the chatbot successfully extracts the target information;
u(π,n) is the number of unsuccessful scans (rounds) of location i = πn before the nth (successful) round of the sequence π; u(π,n) ≤ n − 1;
aπn^u(π,n) (1 − aπn) is the conditional probability that, after u(π,n) unsuccessful scans of location πn, the chatbot successfully detects that location (database cluster) πn contains the target information, given that the latter is indeed at location πn;
pπn aπn^u(π,n) (1 − aπn) is the unconditional probability that the chatbot finds the requested information at location πn;
tk is the known time required for a human user to read, evaluate, and refine the chatbot’s response in round k, k = 1, 2, …;
Tπ,n = t1 + t2 + … + tn is the total time spent by the human user in conversation with the chatbot from round 1 to round n in the sequence π;
r(π) is the known Likert-scale satisfaction level obtained by the human user when the target information is found at location πn of the sequence π (before discounting). If the search for the target information requires Tπ,n time units, the satisfaction level is discounted by the factor d^Tπ,n for a known discount factor d, d ∈ (0,1), i.e., Rdiscount(π) = r(π) d^Tπ,n.
In the problem under consideration, the satisfaction level is a random variable with the value x = r(π) d^Tπ,n and the probability p(x) = pπn aπn^u(π,n) (1 − aπn), depending on π. Then, the expected discounted satisfaction (search efficiency) corresponds to the expected accumulated information obtained by the human user under the policy π, denoted as F(π), and is defined over an infinite time horizon as follows:
F(π) = ∑n=1,2,… pπn aπn^u(π,n) (1 − aπn) r(π) d^Tπ,n. (1)
The goal is to define a stochastic policy for the search chatbot, determining where it should search at each round; that is, to find an optimal infinite sequence of chatbot actions that maximizes the expected discounted satisfaction F(π) of the human user over an infinite period of time.
4. Previous Work
This section is devoted to two different topics of this study: a review of efficient discrete search algorithms and a brief overview of paradoxical phenomena arising in operations research.
4.1. Efficient Algorithms for the Discrete Sequential Search Problems
The discrete search problem is one of the oldest problems in operations research, first formulated and solved by Bernard Koopman and his team during World War II, when the military task was to provide effective methods for detecting submarines [11]. An excellent review of search theory and a recent bibliography of the discrete search literature can be found in [8,12,13].
Several authors have studied discrete sequential search problems with overlook probabilities in various settings and developed efficient index-based solution methods (see [5,6,7,8,9,10]). When a search device, such as a search-and-rescue robot or an unmanned aerial vehicle (drone), is not reliable, the inspection of a location is imperfect, and in this case, any location may need to be inspected many, even infinitely many, times. Thus, the discrete search problem is to find an optimal strategy rather than a single optimal search sequence; the optimal strategy depends on the random moment at which the search device finds the target.
A typical stopping rule for a multi-step stochastic search process is: stop at a step when the target is found or when the assigned search time has elapsed. Other criteria for terminating the search process, called stopping rules, have been proposed by Ross [5] and Chew [7]. Trummel and Weisinger [14] and Wegener [15] showed that the general discrete search problem is NP-hard.
In a more recent study, Kriheli et al. [16] proposed a fast algorithm for searching for hidden targets by unmanned aerial vehicles under imperfect inspections with non-zero false-positive and false-negative detection probabilities.
Kress et al. [17] analyzed several myopic and biology-inspired search strategies that minimize search time and showed that a local index-based policy is optimal when target detection by an autonomous automated search device is accompanied by reliable verification work performed by a team of humans.
The discrete sequential search problems mentioned above are single-player search problems in which the decision-maker must optimize a single objective function, such as maximizing the expected reward or minimizing the expected cost, given stochastic or deterministic constraints imposed by the environment; typically, it is necessary to find a single optimal alternative from a finite or infinite set of alternatives. In contrast, two-player search problems are usually studied and solved in the context of game theory as zero-sum games, in which two players compete to find a hidden object or location, with or without cooperation from each other.
The two-player game involves a seeker, who moves through space, and a hider, who tries to prevent the seeker from finding the hidden object. The presence of a second rational player radically changes the problem from optimization to forecasting and countering the optimal counter-strategies of the other player. This introduces fundamentally new methods, moving from search theory to game theory, where the optimal decisions of each player are chosen depending on the decisions of a rational opponent or a cooperative player [18,19].
In recent years, two-player game problems and their solution methods have become a viable and widely used tool for analyzing and optimizing human–robot interactions. This is a new, rapidly growing area of research in artificial intelligence that studies efficient interactions between humans and robots, particularly through the lens of their joint performance of complex tasks, with direct interaction and coordination [20,21]. Importantly, searches involving human–robot interaction generally outperform hidden-object searches by an autonomous robot [22,23].
In this paper, we continue and extend models of human–robot interaction by focusing on the study of a multi-round search scenario in which imperfect inspections of given databases are performed by an AI-powered chatbot, accompanied by a human user who verifies the chatbot’s responses and refines his/her queries.
4.2. Paradoxes in Operations Research
In teaching operations research and artificial intelligence courses, the authors have found that presenting counterintuitive or paradox-like anomalies can be a great eye-opener for listeners. Studying such anomalies leads to a better understanding of algorithms and a deeper learning of operations research and AI techniques. A typical example is the famous Braess paradox, which arises in dynamic transportation networks and states that “adding a new wide road to a congested transport network can lead to an increase in overall travel time” [24,25]. Similar counterintuitive effects are known to occur, for instance, in power transmission systems [26], biological and ecological systems [27], and in exploiting generative artificial intelligence [28].
The mechanism of this paradox can be explained by changes in drivers’ priorities and behaviors after a network reconstruction. For example, in response to a capacity addition, more drivers begin to use the expanded highway; those previously not traveling at peak times shift to the peak hours, and public transport users switch to driving. From a mathematical perspective, such driver behavior disrupts the equilibrium and leads to suboptimal traffic flow for everyone, because the Nash equilibrium of such a system is not necessarily optimal. The network change induces a new game structure among the many drivers. Unlike at a Nash equilibrium, where drivers have no incentive to change their routes, when the system is not at a Nash equilibrium, individual drivers can improve their respective travel times by changing the routes they take. It follows that drivers will continue to switch until they reach a new Nash equilibrium, despite the reduction in overall performance.
Another phenomenon, called the Jevons paradox, describes a situation where technological progress increases the efficiency of resource use, reducing the amount required per user and lowering its cost. However, the falling cost of use increases demand. Consequently, while individual resource consumption decreases, total societal resource use by all users increases, regardless of whether the resource is fuel, a computer, or an electronic device. Therefore, this “paradox” is explained by the substantial societal increase in demand, which outweighs individual gains in resource efficiency, leading to a net increase (rather than decrease) in societal resource consumption [29,30]. These paradoxes warn against making decisions based solely on local or individual optimization without assessing the broader impact on the system.
One more paradox, arising in scheduling theory, was described by Spieksma and Woeginger [31] and states that “increasing the speed of some machines in a no-wait flow-shop problem can significantly worsen the optimal total time”. There are other, less well-known anomalies studied in the literature, for example, the so-called “more-for-less paradox” in the classical transportation problem [32] and Graham’s multiprocessor scheduling anomaly [33].
The next sections present another paradoxical “more-for-less” phenomenon, according to which increasing generative AI’s reliability decreases its efficiency. Note that in all mentioned papers and in this one, the word “paradox” is used to denote a counterintuitive and surprising phenomenon, not a classical paradox in the context of formal logic.
5. Finding the Optimal Policy π* for the Problem
In this section, we proved the optimality of the greedy index policy for the information extraction problem. We used a basic method called the “interchange argument”, which involves exchanging two consecutive search locations. For completeness, we described this index policy for solving the problem under study. In what follows, we used this policy to identify the paradoxical property of the problem. The idea of this method was as follows.
To find the optimal search strategy that maximizes the expected discounted satisfaction presented in (1), it sufficed to compare the objective function F(π) for two conjugate infinite sequences π1 and π2, in which exactly two arbitrary neighboring locations, πk and πk+1, were swapped.
Having performed elementary arithmetic operations with expression (1), it was easy to obtain expression (2).
From (2), it followed that F(π1) ≥ F(π2) if inequality (3), or its equivalent form, holds.
Since the neighboring locations were chosen arbitrarily, in order for a sequence π* to have the maximum expected efficiency F(π*), it was sufficient that all of its components were chosen sequentially, one after another, in non-decreasing order of the coefficients defined in (3). We came to the conclusion that the two-index parameter Qi,j defined in (4) characterized the “attractiveness” of location i, given that this location had already been searched unsuccessfully j times, j ≥ 1.
Remark. The symbol u(π,n) in Formula (3) denotes the same quantity as the symbol j in Formula (4), i.e., the number of times a certain location i = πn has already been unsuccessfully scanned by an unreliable chatbot. This double notation is introduced for convenience, to write the formulas in a more readable form: the former notation is convenient when comparing the two policies denoted π1 and π2, while the latter is more convenient below when describing the upcoming multi-round algorithm.
Using relations (2)–(3) and notation (4), we obtained the following result. The policy which maximized the expected search efficiency (i.e., user satisfaction) was defined as follows:
Proposition 1. If there have already been j = j(i) unsuccessful scans of each location i, where i ∈ {1, …, N} and j ≥ 1, then choosing for the next scan the location i*, which has the maximum value of the parameter Qi,j defined in (4) among all i for a fixed j = j(i), ensures the maximum discounted satisfaction F(π) of the user.
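The greedy rule of Proposition 1 can be sketched in a few lines of Python. Since the closed form of Qi,j is given by the paper’s displayed Formula (4), the index is passed in here as a function q(i, j); the particular stand-in form used in the demo, pi·ai^(j−1)·(1 − ai) (the probability that the jth scan of location i is the first successful one), is an assumption of this sketch, not the paper’s exact expression.

```python
def next_location(q, j):
    """Greedy index rule of Proposition 1: given the current number of
    unsuccessful scans j[i] for each location i, scan next the location
    with the largest index value Q_{i, j(i)}."""
    return max(j, key=lambda i: q(i, j[i]))

# Hypothetical three-location instance with an illustrative index form.
p = {1: 0.5, 2: 0.3, 3: 0.2}   # hiding probabilities
a = {1: 0.4, 2: 0.2, 3: 0.1}   # overlook probabilities
q = lambda i, j: p[i] * a[i] ** (j - 1) * (1 - a[i])

j = {1: 1, 2: 1, 3: 1}          # every location scanned unsuccessfully once
print(next_location(q, j))      # → 1 (Q_{1,1} = 0.5·0.6 = 0.30 is the largest)
```

After a failed scan of location 1, its counter becomes j[1] = 2, its index drops to Q1,2 = 0.5·0.4·0.6 = 0.12, and the rule switches to location 2.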
5.1. Iterative Algorithm for Constructing the Optimal Sequence
Proposition 1 formulated above allowed us to build an efficient iterative algorithm for constructing an optimal search policy π*.
The algorithm was defined for an indefinite number of search rounds over an infinite time horizon. It contained an unknown number of rounds and continued until the human user was satisfied with the information obtained; in this case, the user stopped the search process. In essence, this procedure requires that at each round, for each subsequent search, exactly one known cluster of databases is selected and scanned, namely the cluster with the maximum parameter Qi,j, defined in (4), among the values of Qi,j that have not yet been selected.
The results of the calculations at any round k, k = 1,2, … are represented as a k × N matrix, denoted as Mk, in which N columns correspond to predetermined locations for scanning, and the rows correspond to the search rounds performed before the algorithm stops. The elements mi,j of the matrix Mk represent the parameters Qi,j from expression (4), computed sequentially round by round. Below, we describe round 1 separately, since it differed from all subsequent rounds in that it contained an additional step, designated as 1.1, in which the first row of the matrix M1 was formed. After that, we offer a description of the current kth round, the same for all k ≥ 2.
As mentioned, in this procedure, the subscript j denotes how many unsuccessful scans of location i (i = 1, …, N) were made by the chatbot before the start of round k (k = 1, 2, …). Therefore, this index depends on the two indicators i and k, and below the subscript j is replaced by the notation jk(i).
5.2. Rounds of the Algorithm
Let us first describe round 1, which consisted of the following four steps.
Step 1.1. Calculate the elements of the first row of the matrix M1.
In accordance with expression (4), we set j = 1 for all locations, calculated the parameters Qi,1 for all i = 1, …, N, and wrote them as the corresponding elements mi,1 of the first row of the matrix M1.
Step 1.2. The chatbot performed the first (preliminary) scan of each location, one by one, and extracted the relevant information from each location. The chatbot being unreliable, the human user found that the target information from the once-scanned locations was not yet satisfactory and hence assigned 1 to the number of unsuccessful checks for each location.
Step 1.3. Determine the location i1* and scan it once more.
Location i1* was chosen such that Qi1*,1 = maxi Qi,1. Location i1* was scanned. If the human user was satisfied with the target information now extracted from location π1* = i1*, the partial satisfaction level F(π*) obtained so far was computed, where π* = {i1*}, and the search was terminated; the element Qi1*,1 in cell (1, i1*) was highlighted in bold (see the numerical example in Section 7). Otherwise, the number of unsuccessful scans for location i1* was increased by 1, i.e., we set j2(i1*) := j1(i1*) + 1 = 2, and went to the next round, round 2.
Let us now assume that rounds 1, …, k (k ≥ 1) were completed, and the locations π1*, …, πk* were determined at these rounds, i.e., the optimal subsequence π*(k) = (i1*, i2*, …, ik*) was found. At the end of the kth round, all parameters Qi,j, i = 1, …, N, j = 1, 2, …, k + 1, were calculated and represented as a known (k + 1) × N matrix Mk+1. Accordingly, the corresponding indices j1(i1*), j2(i2*), …, jk(ik*) and the objective function F(π*) were calculated. Then, the next, (k + 1)th, round was as follows.
Round k + 1, k = 1, 2, …, consisted of the following three steps.
Step k + 1.1. Determine the location to be scanned (k + 1)th-in-turn by the iterative algorithm.
The second index j in the elements of the (k + 1)th row of the matrix Mk+1 was set to jk+1(i) for all i; among all parameters Qi,jk+1(i), the location ik+1* to be scanned next was selected as the one with the maximum value: Qik+1*,jk+1(ik+1*) = maxi Qi,jk+1(i). This item was placed in cell (k + 1, ik+1*) of the (k + 1)th row of the matrix Mk+1 and highlighted in bold.
Step k + 1.2. The human user checked whether location ik+1* contained the required target information; if yes, the search stopped; otherwise, the number of failed checks for location ik+1* was increased by 1.
The current optimal subsequence, which for now was π* = π*(k+1) = (i1*, …, ik+1*), was considered, and its efficiency F(π*) was computed. The number of unsuccessful scans of location ik+1* was increased by 1, that is, the second index for location ik+1* was changed as follows: jk+2(ik+1*) := jk+1(ik+1*) + 1, leaving the second indices for all other locations unchanged: jk+2(i) := jk+1(i), i ≠ ik+1*. Go to Step k + 1.3.
Step k + 1.3. Determine all elements Qi,j in the (k + 2)th row of the matrix Mk+2.
The (k + 2)th row of the matrix Mk+2, in which the elements Qi,j were calculated, was defined as follows: (i) the parameter Qik+1*,jk+2(ik+1*) was placed in cell (k + 2, ik+1*), and (ii) all other elements of the (k + 2)th row of Mk+2 were the same as the corresponding elements in the previous row, k + 1, of Mk+1, that is, Qi,jk+2(i) = Qi,jk+1(i) for i ≠ ik+1*. Rows 1, …, k + 1 of the matrix Mk+2 were left the same as in Mk+1; thus, the matrix Mk+2 was built. Go to the next round. The rounds were repeated until the human user was satisfied with the target information extracted by the chatbot.
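The rounds described above can be condensed into a short loop. As in the earlier sketch, the index Qi,j of (4) is passed in as a function q with an illustrative stand-in form (an assumption of this sketch), the user’s verification is modeled by a callback, and the bookkeeping matrix Mk is reduced to the per-location counters jk(i), which is all the loop actually needs.

```python
def extract(locations, q, user_satisfied, max_rounds=50):
    """Multi-round search sketch: in each round, scan the location with the
    maximal index Q_{i, j(i)} (the paper's Formula (4), passed in as `q`),
    let the user verify the answer, and on failure increment that location's
    unsuccessful-scan counter j(i). Returns the sequence of scanned locations."""
    j = {i: 1 for i in locations}     # counters after round 1's preliminary scans
    history = []
    for _ in range(max_rounds):       # the true horizon is unbounded
        i_star = max(locations, key=lambda i: q(i, j[i]))
        history.append(i_star)
        if user_satisfied(i_star):    # the human user accepts the answer
            return history
        j[i_star] += 1                # one more unsuccessful scan of i_star
    return history

# Hypothetical two-location instance; the index form below is an illustrative
# stand-in for Formula (4), not the paper's exact expression.
p = {1: 0.5, 2: 0.3}
a = {1: 0.4, 2: 0.2}
q = lambda i, j: p[i] * a[i] ** (j - 1) * (1 - a[i])
print(extract([1, 2], q, user_satisfied=lambda i: i == 2))   # → [1, 2]
```

The trace matches the rounds above: location 1 wins the first comparison (0.30 vs. 0.24), its index drops to 0.12 after the failed scan, and location 2 is scanned next.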
We concluded this section by discussing the complexity of the proposed algorithm. First, we note that the sequential scan of datasets (also called a full scan), which is considered in this paper, is the simplest (and, at the same time, the slowest) search procedure compared to the indexed and vector searches widely used by AI chatbots. However, sequential search is often used when there is no index on the queried field, for example, if the data is loaded on the fly without a search index.
The worst-case complexity C (running time) of the greedy iterative algorithm was defined as the sum of the chatbot’s execution time Tchatbot and the response time Tuser of the human user, determined at each round of the algorithm and multiplied by the total number of available rounds R of the chatbot–human dialogue, that is, C = (Tchatbot + Tuser)R, where Tchatbot = Ntscan; N is the total number of locations, and tscan denotes the average scanning time of one location. The AI chatbot was typically expected to produce one scan within 100–200 milliseconds; Tuser is usually a few minutes at each round. The number of the rounds was not known a priori, before the dialogue was completed, but it depended on the user’s satisfaction or the entire time budget allocated in advance. Thus, the total running time of the algorithm was proportional to the size R × N of the current matrix MR computed for R rounds of the algorithm.
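As a quick worked example of the bound C = (Tchatbot + Tuser)·R, using the 100–200 ms per-scan time mentioned above and otherwise hypothetical values for N, Tuser, and R:

```python
# Worked example of the running-time bound C = (T_chatbot + T_user) * R.
N = 1000                        # database clusters (hypothetical)
t_scan = 0.15                   # seconds per scan, within the 100-200 ms range
T_chatbot = N * t_scan          # 150 s of scanning per round
T_user = 120.0                  # ~2 minutes of reading/refining per round (hypothetical)
R = 5                           # rounds until the user is satisfied (hypothetical)
C = (T_chatbot + T_user) * R    # worst-case total dialogue time
print(C)                        # → 1350.0 (seconds)
```

Note that, with these numbers, the human side dominates each round’s cost far less than one might expect only because the full scan of all N clusters is slow; with indexed or vector search, Tuser would dominate.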
The proposed iterative procedure is illustrated by a numerical example in Section 7, which shows the operation of the algorithm and demonstrates that the paradoxical property manifests itself already in the early rounds of the considered iterative search process.
6. Comparison of Two Chatbots: The “Paradox”
The goal of this section was to uncover a counterintuitive property of the general AI chatbot information extraction problem. When using the optimal search policy, an AI-based chatbot must find and extract the target information in a multi-round iterative process. We supposed that, since the search chatbot was unreliable, the human user was never satisfied with the chatbot’s first answer and asked the latter to improve it with a refined query; in other words, a successful round was always preceded by at least one unsuccessful round, i.e., we assumed that u(π,n) ≥ 1 for any n.
Suppose we have two search chatbots with the same time/efficiency characteristics and the same discount factor but with different reliabilities. Let us denote them chatbots 1 and 2 and assume that both are sufficiently reliable in the sense that their overlook probabilities are quite small: 0 < ai(s) < 0.5, s = 1, 2, in all locations i = 1, …, N. For definiteness, let us assume that the first chatbot is more reliable, that is, 0 < ai(1) < ai(2), i = 1, …, N. Both chatbots operate in the same environment, but due to their different reliabilities, their optimal search policies may be different.
Let π*(1) be the optimal policy for chatbot 1 and π*(2) be the optimal policy for chatbot 2. Accordingly, let F1(π*(1)) and F2(π*(2)) denote the satisfaction functions obtained by the first and second chatbot, respectively.
The main theoretical result of this work is as follows:
Proposition 2. If the chatbots are sufficiently reliable, that is, 0 < ai1 < ai2 < 0.5, i = 1, …, N, then, when using the optimal search policy, a less reliable chatbot is more efficient, in the sense that its search efficiency is greater than the efficiency of a more reliable chatbot, i.e., F1(π*(1)) < F2(π*(2)).
To prove Proposition 2, we first need to prove the following auxiliary claim:
Proposition 3. Let the function f(x) be xk−1(1 − x), where k is a positive integer greater than 1. Then, f(x) is increasing in x when 0 < x < 0.5.
Proof. We have: f(x) = xk−1(1 − x) = xk−1 − xk, k > 1.
Then, f’(x) = (xk−1 − xk)’ = xk−2(k − 1 − kx). Since x > 0, f’(x) > 0 iff k − 1 − kx > 0, or, equivalently, f’(x) > 0 iff x < (k − 1)/k.
Note that (k − 1)/k ≥ 0.5 for any integer k > 1. Then, for k > 1, we have: 0 < x < 0.5 ≤ (k − 1)/k.
Therefore, f’(x) > 0, and, hence, f(x) is increasing. □
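Proposition 3 is also easy to verify numerically; the sketch below checks that f(x) = x^(k−1)·(1 − x) is strictly increasing on a grid over (0, 0.5) for several values of k > 1:

```python
# Numerical check of Proposition 3: f(x) = x**(k-1) * (1 - x) is strictly
# increasing on (0, 0.5) for every integer k > 1.

def f(x: float, k: int) -> float:
    return x ** (k - 1) * (1 - x)

for k in range(2, 8):
    xs = [i / 1000 for i in range(1, 500)]          # grid over (0, 0.5)
    values = [f(x, k) for x in xs]
    # each value must be strictly smaller than the next one
    assert all(a < b for a, b in zip(values, values[1:])), k
print("f is strictly increasing on (0, 0.5) for k = 2..7")
```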
Now, we return to the proof of Proposition 2. Let us denote the optimal expected discounted satisfactions for chatbots 1 and 2 by F1(π*(1)) = E(Rdisc,1(π*(1))) and F2(π*(2)) = E(Rdisc,2(π*(2))), respectively. Since the reliabilities of the two chatbots are different, the optimal search policies of these chatbots, denoted as π*(1) = (π1*(1), π2*(1), …) and π*(2) = (π1*(2), π2*(2), …), respectively, can in general be different, despite all their other characteristics being pairwise identical. The optimal values of the corresponding expected discounted satisfaction, F1(π*(1)) and F2(π*(2)), can also be different.
According to expression (1), each term of the expected discounted satisfaction contains a factor of the form f(ai) = ai^(u(i)−1)·(1 − ai), where u(i) > 1 is the number of scans of location i. If 0 < ai1 < ai2 < 0.5 for all i, then, according to Proposition 3 (applied with k = u(i) > 1), we have:
f(ai1) < f(ai2).(6)
Then, from (1) and (6), it follows that, for the fixed policy π*(1), the following is true:
F1(π*(1)) < F2(π*(1)).(7)
Next, it is easy to see the following:
F2(π*(1)) ≤ F2(π*(2)),(8)
since π*(2) denotes the optimal policy for chatbot 2, that is, π*(2) guarantees the maximum expected satisfaction F2(π). From (7) and (8), it follows that F1(π*(1)) < F2(π*(2)); that is, a less reliable chatbot is more efficient in the sense that it provides greater expected satisfaction to the human user than a more reliable chatbot. □
7. Discussion
At this point, two related questions arise: does the paradoxical situation under consideration occur (1) when the chatbot is completely reliable, i.e., its overlook probability ai = 0, and (2) when the chatbot is not completely reliable, that is, its overlook probability is strictly positive but can be arbitrarily small?
Let us answer the first question first. A simple counterexample clearly shows that the answer is: there is no paradox. For completeness, we provide a proof below.
Consider the following numerical example, the idea of which was suggested to us by I.M. Sonin. For an absolutely reliable chatbot 1, we set ai1 = 0; for chatbot 2, we set ai2 = 0.1; the known satisfaction levels are ri = 10 for each chatbot and any location, and the prior probabilities that the target information is hidden at location i are pi = 0.5, i = 1, 2; d = 0.1. Then, the absolutely reliable chatbot 1 will inevitably find the target information at the right location, either during the first search with probability ½ or during the second search (after an unsuccessful first search) with probability ½, and the search process ends. With d = 0.1, the overall objective function F1(π*) for the first chatbot is p1·10 + p2·10·(1 − d) = 0.5·10 + 0.5·10·0.9 = 9.5.
As for the second, unreliable chatbot, since ai2 > 0, the iterative search over these two locations can last an infinite number of rounds, and the expected total satisfaction on an infinite horizon obeys expression (1). Elementary arithmetic operations on Formula (1) using the data for chatbot 2 and the sum of an infinite geometric progression show that the expected satisfaction F2(π*) of the second chatbot does not exceed 4.9 points and, in fact, is even much less for a very small ai2 > 0. We see that the efficiency of the absolutely reliable chatbot is greater, which means there is no paradox in this case.
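Since expression (1) is not reproduced in this section, the following sketch recomputes the counterexample under simplifying assumptions of ours: a per-round discount factor (1 − d)^(n − 1) and a policy that alternates between the two locations. Even under these assumptions, the absolutely reliable chatbot is more efficient, in line with the conclusion above (the exact values differ from those given by Formula (1)):

```python
# Two-location counterexample, under our stated assumptions: per-round discount
# (1 - d) ** (n - 1) and an alternating scan policy. This is a sketch, not the
# paper's exact model (1).

def expected_satisfaction(overlook, p=(0.5, 0.5), r=10.0, d=0.1, max_rounds=200):
    """Expected discounted satisfaction when locations 1 and 2 are scanned alternately."""
    total = 0.0
    for target in (0, 1):                  # location that hides the information
        miss = 1.0                         # P(target not yet found before this round)
        for n in range(1, max_rounds + 1):
            scanned = (n - 1) % 2          # alternate locations 0, 1, 0, 1, ...
            if scanned == target:
                find = 1.0 - overlook      # detection probability at this scan
                total += p[target] * miss * find * r * (1 - d) ** (n - 1)
                miss *= overlook           # the scan overlooked the target
    return total

f1 = expected_satisfaction(0.0)   # absolutely reliable chatbot: 9.5
f2 = expected_satisfaction(0.1)   # unreliable chatbot: smaller, so no paradox here
print(f1, f2)
```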
Now let us answer the second question. At first glance, here too, the answer should not be paradoxical, since the objective function defined in expression (1) is continuous in the variable ai, and, therefore, one would expect the same answer as in the case where the probability ai = 0. However, this reasoning is incorrect; in fact, it is valid only in the special case where the chatbot is absolutely reliable and retrieves the target information during the first round of the algorithm. In that case, the process is finite and terminates after one scan of each location. However, if the chatbot is unreliable and thus does not detect the target in the first round, then, starting from the second round of the multi-round iterative search, the objective function (1) is discontinuous in the discrete variable ui (i.e., the number of unsuccessful scans of location i), and the proof based on the continuity of the objective function is no longer valid. The correct answer to the second question is that if the AI chatbot does not extract the target information in the first round of search, then during the iterative search procedure described in
Section 5.2, the less reliable chatbot is more efficient for any positive overlook probability ai, however small. This claim is rigorously proved in Proposition 2 of Section 6.
8. Numerical Examples
We conducted experimental computations in which we modeled the information extraction process for two AI chatbots with different characteristics; we varied the number of rounds of the algorithm, the scanning time, and the overlook probabilities, including very large and very small ones. The computations consistently confirmed our judgement that the considered paradoxical phenomenon exists and is easily reproduced.
In this section, we considered numerical examples for two chatbots with six locations, randomly selected from our experiments. The examples convincingly confirmed the existence of the paradox. The input data are presented in
Table 2. The notations are introduced in
Section 3 as follows:
pi is a known probability that location i contains the information we are looking for; i = 1, …, 6;
ti is the time (in minutes) spent by a human user in conversation with the chatbot to evaluate the chatbot’s answer about the scan of location i; at each round, the time is assumed to remain the same;
ai is a non-zero overlook probability;
ri is a known Likert-scale satisfaction level obtained by the human user when the target information is found at location
i (at any round) before discounting; recall that the level of satisfaction that a human user receives for a correct answer from the chatbot is measured on the 10-point Likert scale. This is the IBM SPSS rating system, in which the highest score of 10 signifies “very satisfied” on a 10-point scale ranging from 1 to 10, while the lowest score of 1 means “very dissatisfied” [
4];
π = (π1, π2, …, πk) is a finite sequence of conversation stages (rounds);
πk is the number of a location (a database) scanned at the kth round of the sequence π, πk ∈ A = {1, …, 6}, k = 1, 2, …;
n is the a priori unknown round of the sequence π at which the chatbot successfully extracts the targeted information;
Tπ,n = tπ1 + tπ2 + … + tπn is the total time spent by a human user in conversation with the chatbot from round 1 to round n in sequence π;
If the search for the target information required Tπ,n time units in π, the level of satisfaction in round πn was discounted by (1 − d)^Tπ,n for a known discount factor d, d ∈ (0, 1), i.e., the discounted level of satisfaction was computed as follows: Rdiscount(π) = rπn·(1 − d)^Tπ,n.
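The discounting rule above can be expressed as a short function; the sequence, times, and satisfaction levels below are illustrative assumptions of ours, not the values of Table 2:

```python
# Discounted satisfaction for a search sequence, following the notation above:
# T_{pi,n} = t_{pi_1} + ... + t_{pi_n} and R_discount = r_{pi_n} * (1 - d) ** T_{pi,n}.
# The numeric inputs below are illustrative, not taken from Table 2.

def discounted_satisfaction(pi, n, t, r, d):
    """pi: 1-based location sequence; n: round of success; t, r: per-location
    evaluation times and satisfaction levels (dicts); d: discount factor in (0, 1)."""
    total_time = sum(t[loc] for loc in pi[:n])      # T_{pi,n}
    return r[pi[n - 1]] * (1 - d) ** total_time

t = {1: 2.0, 2: 3.0, 3: 1.0}      # minutes per evaluation (assumed)
r = {1: 8, 2: 10, 3: 6}           # Likert satisfaction levels (assumed)
pi = (2, 1, 3)
print(discounted_satisfaction(pi, 2, t, r, d=0.05))  # 8 * 0.95 ** 5
```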
The numerical parameters of the times and probabilities were randomly selected. These and numerous other examples stored in our dataset confirmed our theoretical judgements.
Calculations of the proposed iterative algorithm are presented in
Table 3 and
Table 4 for chatbots 1 and 2, respectively, for 12 consecutive rounds. The table columns correspond to the six locations considered; the rows represent rounds of the search example. The elements in the cells are the coefficients
Qi,j, calculated for 12 rounds, row by row.
The number in the cell (
i,
j) of these tables represents the parameter
Qi,j, calculated by Formula (4), where
i = 1, …, 12;
j = 1, …, 6 for two chatbots. The tables represent the matrices
Mk for two chatbots, introduced in
Section 5.2. The calculations for chatbot 1 are presented in
Table 3; similar calculations for chatbot 2 are presented in
Table 4. For chatbot 1, in the first round, the first row of matrix
M1 (i.e., the first row of
Table 3) was filled in with the coefficients
Qi,1,
i = 1, …,
N. Then, all the locations were (unsuccessfully) scanned once, after which the
Qi*(1),1 =
maxiQi,1 =
Q4,1 = 0.5112 and the corresponding location
i*(1) = 4 were found. Next, during this round, chatbot 1 scanned location 4 once more; we assumed that in this example, it did not discover the relevant target information there. Then, the number of unsuccessful scans of this location increased to 2, and the corresponding parameter
Q4,2 was calculated according to (4) (it equals 0.0204) and placed in cell (2,4) of matrix
M2. All other elements of the second row of this matrix were left unchanged—the same as in the previous row of
M1. The first round of the algorithm was completed, and then the second round started.
In round 2, the next location to be scanned was selected by computing Qi*(2),2 = maxiQi,2 = Q6,2 = 0.3237, and hence, the location i*(2) = 6 was chosen. Qi*(2),2 was placed in cell (2,6) of the second row of matrix M2, and this item was highlighted in bold.
Next, during this round, chatbot 1 scanned location 6 once more; we assumed that the human user was not satisfied with the target information extracted there. Then, the number of unsuccessful scans of this location increased to 2; the corresponding parameter Q6,3 was calculated according to (4) (it equals 0.0204) and placed in cell (3,6) of the next matrix M3. All other elements of the third row of this matrix were left unchanged, the same as in the previous row of M2. The second round was completed.
During the subsequent rounds, the algorithm sequentially filled in the next rows 3, 4, …, 12 of
Table 3, following the instructions of the algorithm, as in round 2.
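The row-by-row filling procedure described above can be sketched as follows. Formula (4) is not reproduced in this excerpt, so the index function below is a stand-in of ours (prior × overlook^(u−1) × (1 − overlook) × satisfaction); only the mechanics of selecting the maximal coefficient and updating a single entry per round are illustrated, and the input data are assumed rather than taken from Table 2:

```python
# Sketch of the greedy matrix-filling procedure of Section 5.2. Formula (4) is
# not shown in this excerpt, so q_index is a hypothetical stand-in index; only
# the round-by-round mechanics are illustrated.

def q_index(p, a, r, u):
    # assumed index: prior * overlook**(u-1) * (1 - overlook) * satisfaction
    return p * a ** (u - 1) * (1 - a) * r

def greedy_rounds(p, a, r, n_rounds):
    """Each round: pick the location with the largest index, scan it
    (assume the scan fails), and recompute only that location's index."""
    n = len(p)
    unsuccessful = [1] * n                          # u(i): scans of location i so far
    row = [q_index(p[i], a[i], r[i], 1) for i in range(n)]
    policy = []
    for _ in range(n_rounds):
        best = max(range(n), key=lambda i: row[i])  # maximal coefficient in the row
        policy.append(best + 1)                     # 1-based location number
        unsuccessful[best] += 1                     # the extra scan was unsuccessful
        row[best] = q_index(p[best], a[best], r[best], unsuccessful[best])
    return policy

p = [0.1, 0.2, 0.15, 0.25, 0.1, 0.2]   # priors (assumed)
a = [0.3, 0.25, 0.2, 0.35, 0.3, 0.25]  # overlook probabilities (assumed)
r = [6, 7, 8, 10, 5, 9]                # satisfaction levels (assumed)
print(greedy_rounds(p, a, r, 6))
```

Note how a once-scanned location re-enters the policy later (as location 4 does here) once the indices of the other locations have dropped below its recomputed value; the same behavior is visible in the 12-round policies reported below.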
We assumed that the human user was satisfied with the target information retrieved in the 12th round. The results obtained for each chatbot over 12 rounds were as follows: (1) The optimal policies for chatbots 1 and 2 were equal to
π*
(1) = (4, 6, 1, 3, 5, 2, 6, 1, 3, 4, 6, 5) and
π*
(2) = (4, 6, 1, 3, 5, 2, 6, 4, 3, 5, 1, 6), respectively. (2) The partial efficiency
F1(
π*
(1)) and, accordingly,
F2(
π*
(2)) obtained at each round for the two chatbots are shown in the corresponding cells of the last columns of
Table 3 and
Table 4, respectively. Comparing these results, we see that
in each round of the search process, even in the early rounds, the optimal expected search efficiency of the less reliable chatbot was significantly greater than the optimal efficiency of the more reliable chatbot. The example considered, constructed for 12 rounds, also covers simpler numerical examples; for instance, the calculations presented in the top rows 1, 2, and 3 of
Table 3 and
Table 4 corresponded to a small illustrative example where the number of rounds of the algorithm was 3.
These numerical examples and, more generally, all the other numerous experimental calculations that we systematically performed showed that the paradoxical phenomenon in question clearly exists and is easily reproduced. However, it is unknown whether these experimental data always reflect the possible behavior of real, accessible AI chatbots. Investigating this issue is a challenging open question and an attractive direction for future research.
9. Concluding Remarks and Open Problems
The focus of this work was to study a realistic situation where the overlook probability of a chatbot is non-zero and thus, the search process does not end during the first round of human–chatbot conversation. The main contributions of this paper were as follows. First, we analyzed the two-player discrete sequential search problem under study, derived a greedy index-based iterative search policy for its solution, and proved that it was optimal for the considered search problem. Then, we identified a paradoxical “more-for-less” phenomenon: a less reliable chatbot had a higher efficiency for any positive overlook probability, even if it was very small.
Despite the “paradoxical” wording of the title, the result of this paper is quite logically explainable and completely compatible with common sense. The point is that if a more reliable chatbot has a lower overlook probability, then it is plausible to think that it will find the target information faster, i.e., in fewer search rounds and ultimately in less average search time.
This common-sense observation is entirely consistent with a well-known fact from probability theory. Let x be a geometric random variable with parameter p, representing the number of independent trials (rounds) up to and including the first success; its distribution is given by p(k) = p(1 − p)^(k−1), k = 1, 2, …. In this case, it is known that the expected number of trials equals 1/p = 1/(1 − ai) [34]. Thus, the higher the overlook probability ai, the greater the expected number of rounds and the average search time before the target information is found, and, therefore, the greater the total expected accumulated satisfaction and the search efficiency determined by expression (1). Therefore, in essence, there is no contradiction in the obtained result.
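The geometric-distribution fact invoked above can be checked by direct summation of the series; the overlook probability used below is illustrative:

```python
# Check of the geometric-distribution fact: for per-scan success probability
# p = 1 - a_i, P(first success at trial k) = p * (1 - p) ** (k - 1), and the
# expected number of trials is 1/p.

def expected_trials(p: float, max_k: int = 1000) -> float:
    """Truncated series sum of k * p * (1 - p) ** (k - 1)."""
    return sum(k * p * (1 - p) ** (k - 1) for k in range(1, max_k + 1))

a_i = 0.2                      # overlook probability (illustrative)
p = 1 - a_i
print(expected_trials(p))      # close to 1/p = 1.25
```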
Finally, we would like to acknowledge some limitations of the present study, most notably the lack of comparison between the proposed mathematical model of information extraction with AI and the behavior of real, publicly available AI chatbots such as ChatGPT; the lack of discussion of potential bias in human users’ satisfaction assessment; and the lack of information about chatbots using non-greedy search policies where data are not scanned sequentially. These issues are beyond the scope of this study and are challenging directions for future research.
It would be interesting to know what paradoxical situations may arise in other search/extraction scenarios encountered in information search performed by real AI chatbots, in particular in multi-target, multi-criteria, multi-user, and multi-chatbot settings. Another general problem is to find good search policies (ideally, optimal or approximate with performance guarantees) for real-world situations, based on advanced algorithmic ideas such as optimizing with learning [35] and reinforcement search [36].