Secretary Problem with Possible Errors in Observation

Abstract: The classical secretary problem models a situation in which the decision maker sequentially observes objects described only by their relative ranks and can either select or reject each of them. Theoretical studies show that the optimal strategy is to reject the first 37% of objects and select the next relatively best one. However, empirical results show that people do not apply this optimal rule. In this article, we model the doubts of the decision maker by considering a modification of the secretary problem in which the decision maker cannot observe the relative ranks properly. We calculate the optimal strategy for such a problem and the value of the problem. In special cases, we also relate this problem to the no-information best choice problem and the no-information second-best choice problem.

Vanderbei [6] considered a variant of the best choice problem. In this version, the decision maker wishes to pick not the best but the second-best candidate, assuming that the best one is going to study at Harvard. It is worth noting that Vanderbei solved this problem thirty years before publishing the results. A further extension of this model was considered by Szajowski [7], who studied more general payoff functions: the decision maker is allowed to stop at the a-th rank of the candidate. Exact results for the optimal strategy for the 3rd rank were obtained by Quine and Law [8]. Hsiau and Yang [9] considered a model in which the candidates are divided into groups. The best choice problem with a payoff function dependent on the absolute value of the item was introduced by Bartoszyński and Govindarajulu [10]. Szajowski [11] modified this idea and included different personal costs of choosing the object.
Presman and Sonin [12] changed the first condition of the CSP: they considered a random number of objects in the search. Oveis Gharan and Vondrák [13] derived the best strategy for the situation where an upper bound on the possible number of candidates is given. Recently, a modification of the secretary problem with a non-uniform permutation of ranks (so the second condition of the CSP is changed) has been considered (v. Crews et al. [14]). Among the many variations in the literature, it is a new model for sequential decision-making. The authors weight each ranking permutation according to the position of the best candidate in order to model the costs incurred by conducting interviews with candidates that are ultimately not hired. It turns out that imposing even infinitesimal interview costs decreases the probability of success to about 28%, as opposed to 1/e (about 37%) in the classical case. Jones [15] considered the best choice problem when the decision maker wishes to avoid patterns. This models a situation where there are "extrinsic trends in the candidate pool as well as intrinsic learning on the part of the interviewer". A game version of the no-information best choice problem was presented by Ano [16].
In this article, we focus on the third condition of the CSP. We would like to find the best strategy in a situation where the DM is not able to rank the objects properly. We show that this problem can be solved using optimal stopping theory (v. Peskir and Shiryaev [17]). By a solution, we understand finding the optimal strategy, i.e., the stopping moment that maximizes the expected reward, together with the value of the problem. We show that the optimal strategy is a simple threshold strategy. A similar, though not identical, problem was considered by Gusein-Zade [18], who introduced a problem where the outcomes are variables dependent on the trials and the distribution of the observed process is known to the decision maker. In this article, we construct the best choice problem with observation errors. A special case is presented and compared with the classical model. Via a limiting process, we show that the success probability is between 25% and 1/e. By comparing this model with the classical one, we obtain an interesting combinatorial identity and prove it using the principle of mathematical induction.

Motivation
In many experiments connected with the CSP, it has been noticed that DMs do not follow the best strategy derived from the mathematical models. Seale and Rapoport [19] showed that a non-optimal decision rule that counts the number of successive non-candidates performs remarkably well. They observed that the DM tends to stop the search too early. The suggestion is that this is due to the cost function: the endogenous cost of time spent in the search. Bearden and Murphy [20] suggested that an early stop may be due to variability in stopping thresholds. Recently, Hsiao and Kemp [21] noticed the opposite behavior: an increase in the length of the search depending on its context. They provided only the design of the experiment.
In the widely analyzed models of choosing the best object, it is assumed that the DM can reliably and objectively assess the usefulness of subsequent objects. An example would be choosing the best candidate (as in the CSP). If the DM is able to observe the size of the feature of particular interest, then we are dealing with a full-information best choice problem, i.e., we observe realizations of random variables from some known probability distribution. If such observation is impossible, then we assume that the DM assigns ranks to the objects, and we obtain a no-information best choice problem.
Suppose, however, that the DM selects candidates only on the basis of the information contained in the applicants' documents (e.g., CVs, letters, images, etc.). A similar situation arose in Kepler's dowry problem (v. Ferguson [1]), where the choice process was based not only on interviews, but also on negotiations with the families, the advice of friends, etc. In such settings, there may be an error in the assessment of the objects.
Suppose we want to buy a set of goods. The products are presented to us sequentially, and a rejected offer will not be renewed. If the person making the decision were an expert in the field, then, by viewing the products one by one, he would assign them relative ranks with certainty. For a non-expert, however, this task is difficult: he knows that, with some probability, he will be wrong. The relatively best product will always be assessed as the best of those observed so far, but worse products may be placed higher in the ranking due to lack of practice. The goal is to choose the best product in the presented set.
A slightly different approach was proposed by Krieger and Samuel-Cahn [22], who propose a model in which the rank is determined on the basis of counterfeit objects. We propose a model where the rank is correctly assigned but is falsified before the selection itself. While there is no problem with the relatively first object, there are problems with the subsequent ones. The decision maker knows that this can happen and must take the possible chance of error into account during selection. We assume that this probability is known to him.

General Description
Let Ω be a probability space with sigma-algebra F and measure P. Let us consider permutations (x_1, ..., x_n) of the set {1, ..., n}. Let X_t, t = 1, ..., n, be the absolute rank of the object observed at moment t. Since all permutations are equally probable, we have

P(X_1 = x_1, ..., X_n = x_n) = 1/n!.

Let Y_t be the relative rank of the t-th object, i.e., {Y_t}_{t=1}^{n} is a sequence of independent random variables with distribution

P(Y_t = k) = 1/t, k = 1, ..., t.

In the considered problem, the DM does not observe the sequence Y_t. Suppose that he sequentially observes a sequence O_t whose conditional distribution given Y_t is known:

e_t(i, j) = P(O_t = i | Y_t = j), i, j = 1, ..., t. (1)

For t = 1, ..., n, the values (1) are nonnegative and less than or equal to 1. Consider the following matrix:

E_t = [e_t(i, j)]_{i,j=1,...,t}. (2)

Each column of E_t sums to 1. The matrix is upper triangular, since we assume that the objects can look no worse than they are in reality. We make the following assumption about the sum of the first row of the above matrix. Let

ε(t) = t Σ_{j=1}^{t} e_t(1, j). (3)

Note that t ≤ ε(t) ≤ t². We assume that the function (3) is a non-decreasing function of t, i.e., the probability of observing a 1 is not smaller than a step earlier in time. Using simple calculations, we obtain that

P(O_t = 1) = Σ_{j=1}^{t} e_t(1, j) P(Y_t = j) = ε(t)/t². (4)

The goal of the DM is to maximize the probability of selecting the object with absolute rank 1 based on the observed sequence O_t. His strategies are the stopping moments (or stopping times), i.e., random variables τ such that {τ = t} ∈ F_t = σ{O_1, ..., O_t}. Let T be the collection of all stopping moments. Let us define the times at which the successive "ones" appear:

τ_1 = inf{t : O_t = 1}, τ_{i+1} = inf{t > τ_i : O_t = 1}

(with the convention inf ∅ = ∞). Denote the set of these times by T_0 and note that T_0 ⊆ T. The optimal strategy is a stopping moment τ* such that

P(X_{τ*} = 1) = sup_{τ ∈ T} P(X_τ = 1). (5)

Suppose that at some moment t we observe the event {O_t = 1}. Since absolute rank 1 can be attained only at moments at which the relative rank equals 1, let us calculate the probability that the current object is relatively the best of those seen so far.
Using the Bayes formula, we obtain

P(Y_t = 1 | O_t = 1) = e_t(1, 1) P(Y_t = 1) / P(O_t = 1) = t/ε(t).

Let us calculate the probability that in the future there will be no further object with relative rank 1, given that the current relative rank is 1:

P(X_t = 1 | Y_t = 1) = t/n.

Therefore, the initial problem can be transformed into the maximization problem of the gain function

g(O_t, t) = P(X_t = 1 | O_t = 1) = t²/(n ε(t)), (6)

regarded as a function of the observed object. The pair Θ_i = (O_{τ_i}, τ_i), τ_i ∈ T_0, is a Markov chain with state space {1} × {1, ..., n}. The problem is therefore to find a stopping time τ* such that

E_r(g(Θ_{τ*})) = sup_{τ ∈ T_0} E_r(g(Θ_τ)), (7)

where E_r(Z) = E(Z | Θ_1 = r) for an integrable random variable Z. Furthermore, we will omit the first coordinate in the notation of function (6) and simply denote it by g(t). Suppose that, at moment t, we observe the event {O_t = 1}. Let us calculate the probability that the next observation of the value 1 occurs at moment s > t:

P(τ_{i+1} = s | τ_i = t) = [Π_{k=t+1}^{s−1} (1 − ε(k)/k²)] ε(s)/s². (8)

In other words, this is the transition law of the process Θ. Let us define the operator T f on the set of measurable and integrable functions f:

T f(t) = Σ_{s=t+1}^{n} [Π_{k=t+1}^{s−1} (1 − ε(k)/k²)] (ε(s)/s²) f(s), (9)

with the convention that the empty product is equal to 1. Let Q f be the maximal operator on the set of measurable and integrable functions f, i.e.,

Q f(r) = max{f(r), T f(r)}. (10)

Let us define the value function of the problem as

v(r) = sup_τ E_r(g(τ)). (11)

Recall that function (11) satisfies the following recursion (v. Peskir and Shiryaev [17]):

v(r) = max{g(r), Tv(r)}.
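As a numerical check, the value function can be computed by backward induction directly from the recursion v(r) = max{g(r), Tv(r)}. The sketch below is an illustration, not code from the paper; the name `value_function` and its signature are ours, and it assumes the gain g(t) = t²/(n ε(t)) and the probability ε(k)/k² of observing a "1" at moment k, as reconstructed above.

```python
def value_function(n, eps):
    """Backward induction for v(t) = max{g(t), Tv(t)}.

    eps : callable t -> epsilon(t), the row-sum quantity of the observation
    matrix; g(t) = t^2 / (n * eps(t)) is the gain, and a "1" is observed at
    moment k with probability eps(k) / k^2.
    Returns the value function v (indexed 1..n) and the stopping region.
    """
    g = [0.0] * (n + 1)
    for t in range(1, n + 1):
        g[t] = t * t / (n * eps(t))
    v = [0.0] * (n + 1)
    stop = [False] * (n + 1)
    v[n] = g[n]
    stop[n] = True
    for t in range(n - 1, 0, -1):
        # Tv(t): sum over the moment s of the next observed "1"
        cont, free = 0.0, 1.0   # 'free' = P(no "1" strictly between t and s)
        for s in range(t + 1, n + 1):
            p1 = eps(s) / (s * s)
            cont += free * p1 * v[s]
            free *= 1.0 - p1
        v[t] = max(g[t], cont)
        stop[t] = g[t] >= cont
    return v, stop
```

With ε(t) = t (no observation errors) and n = 100, this returns the classical no-information value ≈ 0.371 with threshold 38; with ε(1) = 1, ε(t) = 1.1·t (the special case below with p = 0.1), it reproduces a value close to 0.3505 and a threshold near 39.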

Lemma 1. v(t) is a non-increasing function of t.
Proof. Using the Bellman equation and the formula for the transition probability (v. Peskir and Shiryaev [17]), we obtain the recursive equation

v(t) = max{ t²/(n ε(t)), Σ_{s=t+1}^{n} [Π_{k=t+1}^{s−1} (1 − ε(k)/k²)] (ε(s)/s²) v(s) }.

For example, let us consider a situation in which e_t(1, 1) = 1, e_t(1, 2) = p = 1 − e_t(2, 2), e_t(1, k) = 0 for k = 3, ..., t; p = 0.1, n = 100. The value function v(k) is presented in Figure 1 below. In this case, the maximum value is around 0.3505, and the point at which the function starts to decrease is k = 39. To calculate the threshold point, we use backward induction (since the problem has a finite horizon). First, note that, for the last moment n, we obtain g(n) = n²/(n ε(n)) and Tg(n) = 0; therefore, v(n) = g(n). Further iterations for the function v lead to the conclusion that there exists a point t* such that g(t) ≥ Tg(t) for t ≥ t* and g(t) < Tg(t) for t < t*. The point t* can be found as a solution of the equality g(t) = Tg(t), which gives us

t²/ε(t) = Σ_{s=t+1}^{n} Π_{k=t+1}^{s−1} (1 − ε(k)/k²). (12)

The left-hand side of the equality in (12) is increasing in t, while the right-hand side is decreasing in t. Therefore, if the inequality g(t) ≥ Tg(t) holds for some t, then it also holds for t + 1. Hence, the problem is monotone and the One-Step Look-Ahead (OLA) rule is optimal (v. Peskir and Shiryaev [17], Abdel-Hameed [23]). The OLA rule gives us the strategy: stop at the first (if any) index t ≥ t* such that O_t = 1.
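The numbers in this example can also be checked by direct simulation. The following sketch is our illustration of the special case described above (the second-best object so far is observed as the best with probability p; all other ranks are observed correctly); the function `simulate` and its parameters are our own naming, with the threshold t* = 39 taken from the example.

```python
import random

def simulate(n=100, p=0.1, t_star=39, trials=100_000, seed=1):
    """Monte Carlo estimate of P(select the overall best object) for the
    threshold rule: stop at the first t >= t_star with O_t = 1."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))    # absolute ranks; 1 = best object
        rng.shuffle(perm)
        best_so_far = n + 1             # smallest absolute rank seen so far
        second_so_far = n + 1           # second smallest seen so far
        for t, x in enumerate(perm, start=1):
            observed_one = False
            if x < best_so_far:
                second_so_far, best_so_far = best_so_far, x
                observed_one = True     # relative rank 1: always seen as 1
            elif x < second_so_far:
                second_so_far = x
                observed_one = rng.random() < p   # rank 2 may look like rank 1
            if observed_one and t >= t_star:
                wins += x == 1          # success iff the overall best is chosen
                break
    return wins / trials
```

With the default parameters, the estimate lands close to the value 0.3505 reported above.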
The value of the problem is

v* = (1/n) Σ_{s=t*}^{n} Π_{k=t*}^{s−1} (1 − ε(k)/k²).

Summarizing our considerations, we conclude that the optimal strategy is to stop at the first moment t ≥ t* (if any) such that the observed rank O_t = 1. Then the probability of choosing the object with absolute rank 1 is the highest. It is worth noting that this requires a longer search than in the CSP.

Special Case
In the CSP, the DM observes the sequence sequentially and tries to select the best object among all (past, present, and future). In this special case, we allow misrepresentation of the relative rank 2, which means that

e_t(1, 1) = 1, e_t(1, 2) = p = 1 − e_t(2, 2), e_t(j, j) = 1 for j = 3, ..., t,

for all t ≥ 1, with all remaining entries of E_t equal to 0. Hence ε(t) = (1 + p)t and g(t) = t/((1 + p)n) for t ≥ 2. Using the formula for total probability, we can calculate the distribution of O_k for k ≥ 2:

P(O_k = 1) = (1 + p)/k, P(O_k = 2) = (1 − p)/k, P(O_k = j) = 1/k, j = 3, ..., k.

To find the OLA strategy, we have to calculate the expected reward for doing one step more, i.e.,

Tg(t) = (1/n) Σ_{s=t+1}^{n} Π_{k=t+1}^{s−1} (1 − (1 + p)/k) = (1/n) Σ_{s=t+1}^{n} Σ_{i≥0} (−(1 + p))^i Σ_{t<m_1<⋯<m_i<s} 1/(m_1 ⋯ m_i),

where the last sum is taken over all possible choices of the indices m_1, ..., m_i. Additionally, Tg(n) = 0. The moment t* is the solution of the equality g(t) = Tg(t), i.e.,

t/((1 + p)n) = (1/n) Σ_{s=t+1}^{n} Π_{k=t+1}^{s−1} (1 − (1 + p)/k),

which, after some simplification, gives

t/(1 + p) = Σ_{s=t+1}^{n} Π_{k=t+1}^{s−1} (1 − (1 + p)/k). (15)
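Equation (15) can be solved numerically. The sketch below is our illustration (the name `ola_threshold` is ours): it evaluates the right-hand side of (15) and returns the smallest t for which the stopping inequality holds. For p = 0, the product telescopes and the rule reduces to the classical condition Σ_{s=t+1}^{n} 1/(s−1) ≤ 1.

```python
def ola_threshold(n, p):
    """Smallest t with t/(1+p) >= sum_{s=t+1}^n prod_{k=t+1}^{s-1} (1-(1+p)/k),
    i.e., the OLA threshold of the special case."""
    q = 1.0 + p
    for t in range(2, n + 1):
        total, prod = 0.0, 1.0
        for s in range(t + 1, n + 1):
            total += prod           # prod = prod_{k=t+1}^{s-1} (1 - q/k)
            prod *= 1.0 - q / s     # extend the product by the factor k = s
        if t / q >= total:
            return t
    return n
```

For n = 100 this gives the classical threshold 38 at p = 0, a threshold near 39 at p = 0.1 (in agreement with the example above), and a threshold near n/2 at p = 1.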
The value function for this example is presented in Figure 1.

Combinatorial Identity
Substituting p = 0, we obtain the classical version of the secretary problem, i.e., the no-information best choice problem.
Let us recall the formula from which the optimal threshold can be obtained. In the CSP, t* is the smallest t for which

Σ_{s=t+1}^{n} 1/(s − 1) ≤ 1. (16)

Now, by combining (16) and (15) (with p = 0), we obtain the following:

t Σ_{s=t+1}^{n} 1/(s − 1) = Σ_{s=t+1}^{n} Π_{k=t+1}^{s−1} (1 − 1/k). (17)

Note that (17) can be expressed as: for every j ≥ t + 1,

Σ_{i≥0} (−1)^i Σ_{t<m_1<⋯<m_i<j} 1/(m_1 ⋯ m_i) = t/(j − 1),

where the term for i = 0 is equal to 1.

Proof. We will use mathematical induction on j to prove the above identity. For any t and j = t + 1, both sides are equal to 1, since the range of the indices is empty. For j = t + 2, we have

1 − 1/(t + 1) = t/(t + 1).

Suppose the formula is true for j. Let us check it for j + 1. Splitting the sum on the left-hand side according to whether the largest index equals j, we have

Σ_{i≥0} (−1)^i Σ_{t<m_1<⋯<m_i<j+1} 1/(m_1 ⋯ m_i) = (1 − 1/j) Σ_{i≥0} (−1)^i Σ_{t<m_1<⋯<m_i<j} 1/(m_1 ⋯ m_i) = (1 − 1/j) t/(j − 1) = t/j.

Using the induction principle, we obtain the identity for all j ≥ t + 1.
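The identity can also be verified numerically. The sketch below is our illustration (the name `alternating_sum` is ours): it enumerates all index subsets t < m_1 < ⋯ < m_i < j and checks that the alternating sum Σ_i (−1)^i Σ 1/(m_1 ⋯ m_i) equals t/(j − 1).

```python
from itertools import combinations
from math import prod

def alternating_sum(t, j):
    """Sum over i >= 0 of (-1)^i times the sum over t < m_1 < ... < m_i < j
    of 1/(m_1 * ... * m_i); the i = 0 term is the empty product, equal to 1."""
    indices = range(t + 1, j)
    total = 0.0
    for i in range(len(indices) + 1):
        for ms in combinations(indices, i):
            total += (-1) ** i / prod(ms)   # prod(()) == 1
    return total
```

For instance, alternating_sum(3, 7) evaluates 1 − (1/4 + 1/5 + 1/6) + (1/20 + 1/24 + 1/30) − 1/120 = 1/2 = 3/(7 − 1), as the identity predicts.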

Asymptotic Behavior of the Threshold and Value
Let us check the behavior of the solution as the number of observations n → ∞. In the general case, it is hard to describe the value function due to the necessary assumptions on the function ε, so let us examine the special case. Since the parameter p carries information about the behavior of the second-best object, we expect a connection with the problem of selecting the best or/and the second-best object. Let t/n → u ∈ [0, 1] as n → ∞, and interpret both sides of equality (15) as functions interpolated at the points u_k = k/n on the interval [0, 1]. Passing to the limit, we obtain

g(u) = u/(1 + p), Tg(u) = (u − u^{1+p})/p.

The optimal threshold u* = lim_{n→∞} t*/n can be calculated from g(u) = Tg(u), which gives

u* = (1 + p)^{−1/p}.

It is easy to see that u* ∈ (e^{−1}, 1/2]. Therefore, the optimal strategy is as follows: skip the first n·u* objects and accept the first observed relatively best object after the moment n·u*. Now, let us calculate the probability that, using this strategy, we accept the best object overall. We make an approximation of the value function:

v(u*) = Tg(u*) = (u* − u*^{1+p})/p = (1 + p)^{−(1+p)/p}.

Note that v(u*) ∈ [1/4, e^{−1}). The variability of the functions v(u*) and u* as functions of the parameter p is presented in Figure 2 below. The conclusion from the presented analysis is as follows: the higher the probability of the information being distorted, the lower the expected value of the win and the later the stopping threshold. The DM should prefer a strategy more conservative than in the CSP.
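The closed forms above can be evaluated directly. The sketch below (function names ours) computes u* = (1 + p)^{−1/p} and v(u*) = (1 + p)^{−(1+p)/p}; both are consistent with the stated ranges u* ∈ (e^{−1}, 1/2], v(u*) ∈ [1/4, e^{−1}) and with the numerical example (u* ≈ 0.3855, v ≈ 0.3505 for p = 0.1, n = 100).

```python
from math import e

def u_star(p):
    """Asymptotic optimal threshold: skip the first u*·n objects (p > 0)."""
    return (1 + p) ** (-1 / p)

def v_star(p):
    """Asymptotic probability of selecting the overall best object (p > 0)."""
    return (1 + p) ** (-(1 + p) / p)
```

For p = 1 these give u* = 1/2 and v = 1/4 (the second-best problem), and as p → 0 both tend to 1/e, the classical values.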
For p = 1, we have a secretary problem with a fatal error: the object with relative rank 2 always appears as the relatively best one. It is worth noting that, in this case, the asymptotic optimal rule and value are the same as for the problem of stopping on the second-best object (e.g., Vanderbei [6], Szajowski [7]).

Conclusions
Although the secretary problem has been known for many years, it is still actively developed in mathematics. By modifying one or more assumptions of the CSP, we can obtain another version of the stopping problem. Here, we introduced a modification that prevents the DM from reliably assessing the rank of the currently observed object. In our version, the optimal stopping rule exists due to the finite expected value of the gain function and the finite horizon. In further modifications, it is worth considering an infinite horizon and examining the existence and form of the optimal solution. Since the distribution of the observed object depends on an unobservable Markov chain, this topic is strongly connected with Hidden Markov Models (v. Bäuerle and Rieder [24]), and further applications of this tool in the CSP can be considered. The CSP is the source of many interesting patterns in combinatorics and probability theory, and the presented model is one such development. Analytical considerations confirm that, in the case of a known possibility of making errors, the DM should adopt a more conservative strategy and choose a candidate a bit later than in a decision process without any doubts. However, the expected gain from using the above rule is smaller than in the classical case.
Funding: This research received no external funding.