Comparative analysis of the existence and uniqueness conditions of parameter estimation in paired comparison models

In this paper, paired comparison models with a stochastic background are investigated. We focus on models that allow three options for choice, with the parameters estimated by the maximum likelihood method. The existence and uniqueness of the estimator is a key issue of the evaluation. In the case of two options, a necessary and sufficient condition was given by Ford for the Bradley-Terry model. We generalize this statement to the set of strictly log-concave distributions. Although in the case of three options no necessary and sufficient condition is known, two different sufficient conditions have been formulated in the literature. In this paper we generalize and compare these conditions. Their capacities to indicate the existence of the maximum are analyzed by a large number of computer simulations. These simulations support that the new condition indicates the existence of the maximum much more frequently than the previously known ones.


Introduction
Comparisons in pairs are frequently used in ranking and rating problems. They are mainly applied when scaling is very uncertain, but comparing the objects to the others can guarantee more definite results. The area of possible applications is extremely large; some examples are the following: education (Sahroni and Ariff, 2016; Kosztyán et al., 2020), sports (Cattelan et al., 2013; Gyarmati et al., 2023; Orbán-Mihálykó et al., 2022), information retrieval (Jeon and Kim, 2013; Gyarmati et al., 2022), energy supply (Trojanowski and Kazibudzki, 2021), the financial sector (Montequín et al., 2020), and management (Canco et al., 2021). The most popular method is AHP (Analytic Hierarchy Process), elaborated by Saaty (Saaty, 1977, 2004) and developed by others; see for example the detailed literature review in (Liu et al., 2020). The method has lots of advantages: more than two options, several methods for evaluation, the opportunity of incomplete comparisons, a simple condition for the uniqueness of the evaluation (Bozóki et al., 2010), the possibility of multi-level decisions (Rahman et al., 2021), and the concept of consistency (Brunelli, 2014). Nevertheless, due to the lack of a stochastic background, the usual statistical tools, like confidence intervals and hypothesis testing, are out of reach.
Fundamentally different models of paired comparisons are the Thurstone motivated stochastic models. The basic concept is the idea of latent random variables, presented in (Thurstone, 1927). Thurstone assumed Gauss distributed latent random variables and allowed two options in decisions, "worse" and "better". The method was later modified: the Gauss distribution was replaced by the logistic distribution in (Bradley and Terry, 1952), and the model is called the Bradley-Terry model (BTM). One of its main advantages is its simple mathematical formulae. Thurstone applied the least squares method for parameter estimation; BTM applies maximum likelihood estimation, and the uncomplicated formulae allow quick numerical methods for solving the optimization problems. The existence and uniqueness of the optimizer is a key issue in the case of ML estimations; a necessary and sufficient condition for it is proved in (Ford Jr, 1957). The model was generalized for three options ("worse", "equal" and "better") in (Glenn and David, 1960) for the Gauss distribution and in (Rao and Kupper, 1967) for the logistic distribution. The latter paper applied maximum likelihood parameter estimation. Davidson made further modifications to the model concerning ties in (Davidson, 1970). For more than 3 options we can find generalizations in (Agresti, 1992) in the case of the Bradley-Terry model, and in (Orbán-Mihálykó et al., 2019a) in the case of the Gauss distribution. In (Orbán-Mihálykó et al., 2019b) it was proved that, for a broad set of cumulative distribution functions of the latent random variables, the models require the same conditions in order to be able to evaluate the data uniquely: the strictly log-concave property of the probability density function is the crucial point of the uniqueness, while the assurance of the existence is hidden in the data structure. We mention that the Gauss distribution and the logistic distribution are included in the set of distributions having a strictly log-concave probability density function. Note that, due to the probabilistic background, the Thurstone motivated models offer the opportunity of building in the home-field or first-mover advantage (Hankin, 2020), testing hypotheses (Szabó et al., 2016), and making forecasts (McHale and Morton, 2011); therefore, they are worth investigating.
In Yan (2016), the author analyzes the structure of the comparisons allowing both two and three options in choice. The author emphasizes that not only the structure of the graph made from the compared pairs but also the results of the comparisons affect the existence of the MLE. He makes some data perturbations in the cases where there are comparisons but some results do not occur. By these perturbations, the zero data values become positive, and these positive values guarantee the strongly connected property of the directed graph constructed from the wins. But these perturbations modify the data structure; therefore, it would be better to avoid them.
In (Bong and Rinaldo, 2022), the authors investigate BTM with two options and provide estimations for the probability of the existence of the MLE. The authors turn to Ford's condition to check whether the MLE exists uniquely or not. As Ford's condition is a necessary and sufficient condition, it indicates explicitly whether the MLE works or not. But in the case of other distributions and/or more than two options these investigations could not be performed due to the lack of a necessary and sufficient condition for the existence and uniqueness of the MLE.
To continue their research, it would be conducive to have a (necessary and) sufficient condition for the existence and uniqueness. To the best knowledge of the authors, there is no such theorem in the research literature; only two sufficient conditions are known. In this paper we compare the known conditions, formulate their generalization, and prove it. Then we compare the applicability of the different conditions from the following point of view: how often, and for what kinds of parameters, are they able to indicate the existence and uniqueness of the MLE? We make a large number of computer simulations and use them to answer these questions.
The paper is organised as follows: In Section 2 the investigated model is described. In Section 3 we present new conditions under which the existence and uniqueness are fulfilled. The proof can be found in Appendix A. In Section 4 the simulation results concerning the applicability are presented. Finally, a short summary is given.

The investigated model
Let the number of the different objects to evaluate be denoted by n, and let the objects be referred to as 1, 2, ..., n. We want to evaluate them on the basis of the opinions of some persons called observers. Let us denote the latent random variable belonging to the i-th object by ξ_i, i = 1, 2, ..., n. Let the number of the options in a choice be s = 3, namely "worse", "equal" and "better", denoted by C_1, C_2 and C_3. We split the set of the real numbers R into 3 pairwise disjoint intervals. Each option in judgment corresponds to an interval on the real line; the correspondence is noted by the same index. If the judgment between the i-th and j-th objects is the option C_k, then we assume that the difference ξ_i − ξ_j of the latent random variables ξ_i and ξ_j is in the interval I_k, k = 1, 2, 3. The intervals are determined by their endpoints −∞, −d, d and ∞, that is, I_1 = (−∞, −d), I_2 = [−d, d] and I_3 = (d, ∞). The above intervals together with the corresponding options are presented in Figure 1. We can write the differences of the latent random variables in the following form:

ξ_i − ξ_j = m_i − m_j + η_{i,j},

where m_i = E(ξ_i) and the random variables η_{i,j} are identically distributed with expectation 0. The ranking of the expectations determines the ranking of the objects, and the differences in their values give information concerning the differences of the strengths. We want to estimate the expectations and the value of the border of "equal" (d) on the basis of the data. For that we use maximum likelihood estimation.
Figure 1: The options and the intervals belonging to them.

The probabilities of the events can be computed on the basis of the assumptions concerning the distributions of η_{i,j} as follows:

P(the judgment between i and j is C_1) = F(−d − (m_i − m_j)),   (3)
P(the judgment between i and j is C_2) = F(d − (m_i − m_j)) − F(−d − (m_i − m_j)),   (4)
P(the judgment between i and j is C_3) = 1 − F(d − (m_i − m_j)),   (5)

where F is the (common) cumulative distribution function (c.d.f.) of η_{i,j}.
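As an illustration, the three probabilities can be computed as in the following sketch (the function name is ours; the logistic c.d.f. stands in for F):

```python
import math

def choice_probabilities(delta, d, cdf=lambda x: 1.0 / (1.0 + math.exp(-x))):
    """Probabilities of "worse", "equal", "better" for the strength
    difference delta = m_i - m_j and tie threshold d > 0.
    cdf defaults to the logistic c.d.f. as an example of F."""
    p_worse = cdf(-d - delta)
    p_equal = cdf(d - delta) - cdf(-d - delta)
    p_better = 1.0 - cdf(d - delta)
    return p_worse, p_equal, p_better
```

For equal strengths (delta = 0) the "worse" and "better" probabilities coincide by the symmetry of f, and the three values always sum to 1.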
Let the number of observers be r. The judgment produced by the u-th observer (u = 1, 2, ..., r) concerning the comparison of the i-th and the j-th objects is encoded by the elements of a 4-dimensional matrix X, which has only 0 and 1 coordinates depending on the choice of the respondent. The third index corresponds to the options in choices: k = 1, 2, 3 stand for the judgments "worse", "equal", and "better", respectively. The matrix of all judgments is X, with indices i = 1, 2, ..., n, j = 1, 2, ..., n, k = 1, 2, 3, u = 1, 2, ..., r, where X_{i,j,k,u} = 1 if the opinion of the u-th observer in pursuance of the comparison of the i-th and the j-th objects is C_k, and X_{i,j,k,u} = 0 otherwise. Of course, due to the symmetry, X_{i,j,k,u} = X_{j,i,4−k,u}. This expresses that if the i-th object is "better" than the j-th object, then the j-th object is "worse" than the i-th object, according to the judgment of the u-th respondent.
Let A_{i,j,k} = Σ_{u=1}^{r} X_{i,j,k,u} be the number of observations C_k in pursuance of the comparison of the i-th and the j-th objects, and let A denote the three-dimensional matrix containing the elements A_{i,j,k}. Of course, A_{i,j,k} = A_{j,i,4−k}.
The likelihood function expresses the probability of the sample as a function of the parameters. Assuming independent judgments, the likelihood function is

L(m, d) = Π_{i<j} Π_{k=1}^{3} P(the judgment between i and j is C_k)^{A_{i,j,k}},   (6)

which has to be maximized in m = (m_1, ..., m_n) and 0 < d.
One can realize that the likelihood function depends only on the differences of the parameters m_i; therefore, one of them can be fixed.
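A minimal sketch of the log-likelihood computation (function name and data layout are ours; a logistic F is assumed for concreteness, and k = 0, 1, 2 encodes "worse", "equal", "better"):

```python
import math

def log_likelihood(A, m, d, cdf=lambda x: 1.0 / (1.0 + math.exp(-x))):
    """A[i][j][k]: count of judgments "worse" (k=0), "equal" (k=1),
    "better" (k=2) for the ordered pair (i, j); m: strength parameters."""
    n = len(m)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):          # each pair counted once
            delta = m[i] - m[j]
            p = (cdf(-d - delta),
                 cdf(d - delta) - cdf(-d - delta),
                 1.0 - cdf(d - delta))
            for k in range(3):
                if A[i][j][k] > 0:
                    total += A[i][j][k] * math.log(p[k])
    return total
```

Since only the differences m_i − m_j enter, shifting every m_i by the same constant leaves the value unchanged, which is exactly why one parameter can be fixed at 0.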

Conditions for the existence and uniqueness
In (Ford Jr, 1957), the author presents a necessary and sufficient condition for the existence and uniqueness of the MLE if there are only two options for choice and F, the c.d.f. of η_{i,j}, is the logistic c.d.f. The condition is the following: for an arbitrary non-empty partition of the objects into S and S̄, there exists at least one element of S which is "better" than an element of S̄, and vice versa. In (Davidson, 1970), the author states that this condition, supplemented with the condition "there is at least one tie ("equal")", is enough for having a unique maximizer in a modified Bradley-Terry model. The theorem assumes the logistic distribution and its proof uses this special form; therefore, it is valid only for the investigated special model. Now we prove it for a broad set of c.d.f.'s. We require the following properties: F is a c.d.f. with 0 < F(x) < 1, F is three times continuously differentiable, its probability density function f is symmetric, and the logarithm of f is a strictly concave function on R. The Gauss and logistic distributions belong to this set, together with many others. Let us denote the set of these c.d.f.'s by F.
First we state the following generalization of Ford's theorem:

Theorem 1 Let F ∈ F and suppose that there are only two options in choice. Fix the value of the parameter m_1 = 0. The necessary and sufficient condition for the existence and uniqueness of the MLE is the following: for an arbitrary non-empty partition of the objects into S and S̄, there exists at least one element of S which is "better" than an element of S̄, and vice versa.
The proof of sufficiency relies on the argumentation of Theorem 4, omitting the variable d. The steps used are (ST3), (ST5), and (ST6) in Appendix A. In the last step, the strictly concave property of logL can be concluded from the theory of logarithmic concave measures (Prékopa, 1973). The necessity is obvious: if there were a partition without a "better" from one subset to the other, then each element of this subset would be "worse" than the elements of the complement, but the measure of "worse" could not be estimated. The likelihood function would be monotone increasing; consequently, the maximum would not be reached.
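Ford's partition condition is equivalent to the strong connectivity of the directed graph of "better" judgments, which suggests the following check (a sketch; names are ours):

```python
def ford_condition(wins):
    """wins[i][j] > 0 iff i was judged "better" than j at least once.
    Ford's partition condition holds iff the directed "wins" graph is
    strongly connected, i.e. node 0 reaches every node both in the
    graph and in its reverse."""
    n = len(wins)

    def reaches_all(adj):
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in range(n):
                if adj(u, v) and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    return (reaches_all(lambda u, v: wins[u][v] > 0) and
            reaches_all(lambda u, v: wins[v][u] > 0))
```

A 3-cycle of wins satisfies the condition, while an object that never loses (or never wins) violates it, mirroring the necessity argument above.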
Returning to the case of three options, we formulate the conditions of Davidson as follows:

DC 1 There exists an index pair (i_1, j_1) for which 0 < A_{i_1,j_1,2}.
DC 2 For any non-empty partition of the objects into S and S̄, there exist at least two index pairs (i_2, j_2) and (i_3, j_3), i_2, i_3 ∈ S, j_2, j_3 ∈ S̄, for which 0 < A_{i_2,j_2,3} and 0 < A_{i_3,j_3,1}.
Condition DC 1 expresses that there is a judgment "equal". Condition DC 2 coincides with the condition of Ford in (Ford Jr, 1957) in the case of two options. It expresses that in both subsets there is at least one object which is "better" than an object in the complement.
Theorem 2 Let F ∈ F. If conditions DC 1 and DC 2 hold, then, fixing m_1 = 0, the likelihood function (6) attains its maximal value and its argument is unique.
Theorem 2 is the consequence of a more general statement, Theorem 4, which will be proved in Appendix A.
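Conditions DC can be verified mechanically on the data matrix A; a sketch (0-indexed k, so k = 0, 1, 2 stands for "worse", "equal", "better", and DC 2 is tested as strong connectivity of the "better" digraph):

```python
def dc_conditions(A):
    """DC 1: at least one "equal" judgment.
    DC 2: Ford's partition condition, i.e. the directed "better"
    graph is strongly connected."""
    n = len(A)
    dc1 = any(A[i][j][1] > 0 for i in range(n) for j in range(n) if i != j)

    def strongly_connected(edge):
        # node 0 must reach every node in the graph and in its reverse
        for adj in (edge, lambda u, v: edge(v, u)):
            seen, stack = {0}, [0]
            while stack:
                u = stack.pop()
                for v in range(n):
                    if adj(u, v) and v not in seen:
                        seen.add(v)
                        stack.append(v)
            if len(seen) < n:
                return False
        return True

    dc2 = strongly_connected(lambda u, v: A[u][v][2] > 0)
    return dc1 and dc2
```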
Now we turn to another set of conditions which guarantees the existence and uniqueness of the MLE. These conditions will be abbreviated by the initial letters MC.

MC 1 There is at least one index pair (i_1, j_1) for which 0 < A_{i_1,j_1,2}.

MC 2 There is at least one index pair (i_2, j_2) for which 0 < A_{i_2,j_2,1} and 0 < A_{i_2,j_2,3}.

Let us define the graph G^(M) as follows: the nodes are the objects to be compared. There is an edge between two nodes i and j if 0 < A_{i,j,2} or (0 < A_{i,j,1} and 0 < A_{i,j,3}) holds.
MC 3 The graph G^(M) is connected.
Theorem 3 (Orbán-Mihálykó et al., 2019b) Let F ∈ F. If conditions MC 1, MC 2 and MC 3 hold, then, after fixing m_1 = 0, the likelihood function (6) attains its maximal value and the argument of the maximum is unique.
To clarify the relationship between conditions DC 1, DC 2 and MC 1, MC 2, MC 3, we present two examples. In Example 1, DC 1 and DC 2 are satisfied but MC 2 and MC 3 are not. In Example 2, DC 2 is not satisfied but MC 1, MC 2 and MC 3 are. These examples expose that the sets of conditions DC and MC do not cover each other. Moreover, they support that the MLE may exist uniquely even if DC 1 and DC 2, or MC 1, MC 2 and MC 3, do not hold. Therefore, we can see that neither the conditions DC nor the conditions MC are necessary conditions.

The above theorems can be generalized. Let us introduce the following set of conditions denoted by SC:
SC 1 There is at least one index pair (i_1, j_1) for which 0 < A_{i_1,j_1,2}.

Let us introduce a graph belonging to the results of the comparisons as follows: let DG^(SC) be a directed graph whose nodes are the objects, with a directed edge from i to j if there is an opinion according to which i is "better" than j, that is, 0 < A_{i,j,3}. Now we can formulate the following conditions:

SC 2 There is a cycle in the directed graph DG^(SC).

SC 3 For any non-empty partition of the objects into S and S̄, there exist at least two (not necessarily different) index pairs (i_2, j_2) and (i_3, j_3), i_2, i_3 ∈ S, j_2, j_3 ∈ S̄, for which 0 < A_{i_2,j_2,3} and 0 < A_{i_3,j_3,1}, or there exists an index pair (i_4, j_4), i_4 ∈ S and j_4 ∈ S̄, for which 0 < A_{i_4,j_4,2}.
It is easy to see that condition SC 2 is more general than condition MC 2, and condition SC 3 is more general than condition DC 2. Condition SC 3 expresses that any subset and its complement are interconnected by an opinion "better" or an opinion "equal". Here condition DC 2 is replaced by a more general condition: besides "better", the opinion "equal" can also be an appropriate judgment for the connection.
To analyse the relationships between the sets of conditions DC, MC and SC, we can recognize that (A) DC 1, MC 1 and SC 1 coincide; (B) if DC 2 holds, then so do SC 2 and SC 3; (C) if MC 2 holds, so does SC 2; (D) if MC 3 holds, so does SC 3. Together these present that conditions SC 1, SC 2, and SC 3 are the generalization of the conditions DC and MC. To show that SC is really a more general set of conditions we present Example 3. In this case neither condition DC 2 nor MC 2 holds, but SC 1, SC 2 and SC 3 do.
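Checking the SC conditions can also be sketched in code (0-indexed k, so k = 0, 1, 2 stands for "worse", "equal", "better"; we use the observation that SC 3 is equivalent to strong connectivity of the digraph whose arcs are the "better" arcs plus both directions of every "equal" edge, since (A ∧ B) ∨ C = (A ∨ C) ∧ (B ∨ C)):

```python
def sc_conditions(A):
    """Symmetry A[i][j][k] = A[j][i][2-k] is assumed for the input."""
    n = len(A)
    sc1 = any(A[i][j][1] > 0 for i in range(n) for j in range(n) if i != j)

    # SC 2: a cycle in the digraph DG^(SC) with i -> j iff A[i][j][2] > 0
    better = [[A[i][j][2] > 0 for j in range(n)] for i in range(n)]
    color = [0] * n                      # 0 new, 1 on stack, 2 done

    def has_cycle(u):
        color[u] = 1
        for v in range(n):
            if better[u][v]:
                if color[v] == 1 or (color[v] == 0 and has_cycle(v)):
                    return True
        color[u] = 2
        return False

    sc2 = any(color[u] == 0 and has_cycle(u) for u in range(n))

    # SC 3: strong connectivity of the "better or equal" digraph
    def edge(u, v):
        return A[u][v][2] > 0 or A[u][v][1] > 0

    sc3 = True
    for adj in (edge, lambda u, v: edge(v, u)):
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in range(n):
                if adj(u, v) and v not in seen:
                    seen.add(v)
                    stack.append(v)
        sc3 = sc3 and len(seen) == n
    return sc1 and sc2 and sc3
```

Note that a single "better-worse" pair between two objects forms a 2-cycle, so it already fulfils SC 2.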

Now we state the following theorem.
Theorem 4 Let F ∈ F. If conditions SC 1, SC 2 and SC 3 hold, then, after fixing m_1 = 0, the likelihood function (6) attains its maximal value and its argument is unique.
The proof of Theorem 4 can be found in Appendix A.
We note that Theorem 2 is a straightforward consequence of Theorem 4. Unfortunately, conditions SC 1, SC 2 and SC 3 are not necessary conditions. One can prove that in the case of Example 4 there exists a unique maximizer of the function (6), but SC 2 does not hold.

Comparisons of the efficiency of the conditions
In this section, we investigate in some special situations which sets of conditions (conditions DC 1, DC 2; conditions MC 1, MC 2, MC 3; conditions SC 1, SC 2, SC 3) are fulfilled, i.e. are able to detect the existence and uniqueness of the maximizer.
From the applications' perspective, there are cases when the strengths of the objects to rank are close to each other and cases when they differ very much. On the other hand, there are cases when the judgment "equal" is frequent, and cases when it is rare. Referring to sports: in football and in chess the result draw comes up often, but in handball rarely.
The most general set of conditions is the set SC. These conditions are fulfilled most frequently among the three sets of conditions. Nevertheless, it is interesting to what extent it is more applicable than the other two sets of conditions. For that, we made a large number of computer simulations with different parameter settings, and we investigated how frequently the conditions are satisfied and how frequently we experience that the maximum exists.
We used Monte Carlo simulation for the investigations. We fixed the differences between two consecutive expectations and the value of the parameter d. This means that in our cases m = (0, h, 2h, ..., (n − 1)h). We investigated 8 objects, and we generated randomly the pairs between which comparisons exist. The number of comparisons was 8, 16, 32, or 64. The results of the comparisons were also generated randomly, according to the probabilities (3), (4) and (5).
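The data generation step can be sketched as follows (a sketch only: sampling details not stated in the text, such as drawing pairs uniformly and independently, are our assumptions, and the logistic c.d.f. stands in for F):

```python
import math
import random

def simulate_comparisons(n=8, h=0.4, d=0.5, n_comparisons=16, seed=0):
    """Generate random paired-comparison data: strengths
    m = (0, h, ..., (n-1)h), random pairs, outcomes drawn from the
    three-option probabilities; k = 0, 1, 2 is "worse", "equal", "better"."""
    rng = random.Random(seed)
    cdf = lambda x: 1.0 / (1.0 + math.exp(-x))
    m = [k * h for k in range(n)]
    A = [[[0] * 3 for _ in range(n)] for _ in range(n)]
    for _ in range(n_comparisons):
        i, j = rng.sample(range(n), 2)           # a random pair of objects
        delta = m[i] - m[j]
        p_worse = cdf(-d - delta)
        p_equal = cdf(d - delta) - p_worse
        u = rng.random()
        k = 0 if u < p_worse else (1 if u < p_worse + p_equal else 2)
        A[i][j][k] += 1
        A[j][i][2 - k] += 1                      # symmetry of the data matrix
    return A
```

On matrices generated this way, one can then test the DC, MC, and SC conditions and run the numerical optimization, as in the study described above.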
In these random cases we checked whether conditions DC, MC, and SC are satisfied or not. Moreover, we performed the numerical optimizations and investigated whether the maximal value exists. We used 4 parameter ensembles, called situations, which are shown in Table 1.
In the presented situations, if the value of h is small, then the strengths of the objects are close to each other. This implies that many "better-worse" pairs could be formed during the simulations. On the other hand, if the value of h is large, the strengths of the objects are far from each other; then we can expect only a few "better-worse" pairs, but a great number of "better" judgments. In terms of the number of "equal" judgments, if d is large then lots of "equal" judgments could be formed during the simulations, while only a few when d is small. The set of conditions DC can apply the judgments "better" well, but it requires only a single "equal" judgment. The set of conditions MC can use the judgments "equal" for connections, as well as the "better-worse" pairs of judgments. Conditions SC do not require pairs, only judgments "better" forming one cycle. We recall that a single "better-worse" pair is appropriate as a cycle. The judgments "equal" are well-applicable for this set of conditions, too.
Table 1 summarizes the situations with the presumable ratios of the "equal" judgments and "better-worse" pairs. In addition, Tables 2, 3, 4 and 5 contain the numerical results of the simulations. The situations are ordered by the decreasing number of cases in which the maximal value exists. Column MAX contains the number of the cases when the maximum exists. Columns DC/MAX, MC/MAX and SC/MAX present the ratios of the cases when the sets of conditions DC, MC, SC hold, respectively. We can see that, increasing the number of comparisons, the number of cases when the maximal value exists and the ratios increase. We draw attention to the fact that the values in the column SC/MAX are less than 1 on several occasions. This shows again that SC is not a necessary condition.
We performed 10^8 simulations per situation. Table 2 presents the results in Situation I. In this case we can see that the DC/MAX rate is lower than the MC/MAX rate. We could predict it, because there are lots of "equal" judgments. The SC/MAX rate is high even for a small number of comparisons. In the case of 16 comparisons, SC is 3.5 times better than MC and over 100 times better than DC. Table 3 presents the results of Situation II. In this case, the rate of "equal" is low, which does not favour the set of conditions MC. This is also reflected in the ratio MC/MAX, which is much worse than the ratio DC/MAX. The set of conditions SC still stands out among the other conditions.
Table 4 shows the results of Situation III. Here the maximum values exist more rarely than in the previous two cases. In this case the number of "equal" decisions is high, while the number of "better-worse" pairs is low, which is favorable for the set of conditions MC and disadvantageous for the set of conditions DC, as we can see in Table 4. It can also be seen that none of the methods are as good as in the previous tables in terms of detecting the existence of the maximum. SC stands out again from the other two sets of conditions. Nevertheless, SC is able to show the existence of the maximum only in 73% of the cases of 32 comparisons, compared to 99% in the previous situations. The set of conditions DC is almost useless: it is useful only in 3.3% of the cases, even if the number of comparisons equals 64. The set of conditions MC is slowly catching up and getting better, but for small numbers of comparisons (8, 16, 32) it is far from the much better SC criteria. Table 5 presents the results in Situation IV. In the latter case, the numbers of "equal" choices and "better-worse" pairs are small, which is unfavorable principally for MC. In this situation, SC detects the existence of the maximal value exceptionally well. DC detects it less well, but it works better than MC. Nevertheless, for small numbers of comparisons, both are orders of magnitude weaker than SC.
In all situations we have found that when we make few comparisons, SC is superior to the other conditions. As we make more and more comparisons, both other sets of conditions get better and better, but they always remain worse than SC. The clear conclusion from the four tables is that the set of conditions SC is much more effective than the others, especially for small numbers of comparisons.

Summary
In this paper, conditions guaranteeing the existence and uniqueness of the maximum likelihood parameter estimation are investigated. The case of a general log-concave probability density function is studied. If two options are allowed, the usually applied Ford's condition is generalized from the logistic distribution to a wide set of distributions. This condition is a necessary and sufficient condition. In the case of three options in decision, a necessary and sufficient condition has not been proved, but there are two different sufficient conditions. We generalized them. A new set of conditions is proved which guarantees the existence and uniqueness of the maximizer. Moreover, we compared the conditions with the help of computer simulations, and we have experienced that the new set of conditions indicates the existence and uniqueness much more frequently than the previously known conditions. Consequently, it provides more effective methods for such research as was performed by Yan (Yan, 2016) and Bong and Rinaldo (Bong and Rinaldo, 2022).
The research includes the possibility of further developments. It would be desirable to set up a necessary and sufficient condition for the existence and uniqueness of the maximizer in the case of three options in choices, and simulations may help these findings. Further research is necessary to investigate the case of more than 3 options. These would be the subject of a future paper.
Appendix A

The logarithm of the likelihood function (6),

logL(m, d) = Σ_{i<j} Σ_{k=1}^{3} A_{i,j,k} · log P(the judgment between i and j is C_k),   (8)

is maximized under the conditions 0 < d and m_1 = 0. We prove that (8) attains its maximal value under the conditions 0 < d and m_1 = 0, and that the argument of the maximal value is unique.
Computing the value of the log-likelihood function at m = (0, 0, 0, ..., 0), d = 1 and denoting this value by logL_0, the maximum has to be sought in such regions where the values of (8) are at least logL_0. Moreover, we note that every term of the sum in (8) is negative (or zero if A_{i,j,k} = 0); consequently, the maximum cannot be attained in those regions where any term is below logL_0. By investigating the limits of the terms, we will check which parameters can be restricted to closed bounded regions. The proof of the existence relies on the Weierstrass theorem: we restrict the range of d and m_2, ..., m_n to bounded closed sets where the continuous function (8) attains its maximal value. For that, we prove some lemmas.
(ST1) The first step is to find a positive lower bound for the variable d.
Lemma 1 Condition SC 1 guarantees that the maximum can be attained only in the region ε ≤ d, with an appropriate value 0 < ε.
Proof. SC 1 guarantees that there exists an index pair (i, j) for which 0 < A_{i,j,2}. If d → 0, the arguments of the c.d.f. tend to the same value, so their difference tends to zero. Consequently, its logarithm, with a positive multiplier, tends to minus infinity. As 0.5 · A_{i,j,2} · log(F(d − (m_i − m_j)) − F(−d − (m_i − m_j))) < logL_0 if d < ε, we can restrict the region of d to the subset ε ≤ d, with an appropriate value 0 < ε, while seeking the maximum.
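The blow-up exploited in this proof is easy to observe numerically; with a logistic F and an illustrative value of m_i − m_j (our choices), the "equal" term tends to minus infinity as d → 0:

```python
import math

def equal_term(d, delta=0.3, a_equal=1,
               cdf=lambda x: 1.0 / (1.0 + math.exp(-x))):
    """The term 0.5 * A_{i,j,2} * log(F(d - delta) - F(-d - delta))
    from the proof of Lemma 1; logistic F assumed for illustration."""
    return 0.5 * a_equal * math.log(cdf(d - delta) - cdf(-d - delta))
```

The term is increasing in d (the interval widens) and drops below any fixed bound, in particular below logL_0, once d is small enough, which is exactly what allows restricting to ε ≤ d.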
(ST2) The next step is to find an upper bound for the variable d.
Lemma 2 If 0 < A_{i,j,3}, then there exists an upper bound K_{i,j} such that the maximum can be attained only in the region d − (m_i − m_j) ≤ K_{i,j}.
Proof. It is easy to see that if 0 < A_{i,j,3}, then (10)