1. Introduction
Subjects are repeatedly compared in pairs in a wide spectrum of situations, including sports games [1], ranking of scientific journals [2,3], the quality of product brands [4] and crowdsourcing [5]. For instance, one team plays with another team in basketball; papers in one journal cite papers in another journal; one consumer chooses one product over another; workers in a crowdsourcing setup are asked to compare pairs of items.
One of the fundamental problems in paired comparison analysis is to derive a fair and reliable ranking of all subjects based on observed comparison data. In round-robin tournaments, where every pair of subjects competes sufficiently many times, a natural ranking can be obtained directly from the number of wins, as the full set of pairwise comparisons eliminates biases from incomplete matchups. However, in most practical scenarios (e.g., sports leagues, crowdsourcing evaluations, or journal rankings), comparisons are often sparse (not all pairs interact) and stochastic (outcomes contain random noise), so rankings based solely on raw win counts are unreliable. To address this issue, paired comparison models have been developed to statistically infer the underlying merit parameters of subjects and generate objective rankings; see the classic monograph [6] for a comprehensive overview of such models and their theoretical foundations. Statistical models not only provide a method of ranking all subjects but are also tools for making inferences on the merits of subjects (e.g., testing whether two subjects have the same merit).
Here, we are concerned with a class of paired comparison models that assign one merit parameter to each subject and assume that the win–loss probability of any pair only depends on the difference between their merit parameters. Specifically, the probability of subject i winning against subject j is
$$\mathbb{P}(i \text{ beats } j) = F(\beta_i - \beta_j), \qquad (1)$$
where F is a known cumulative distribution function satisfying $F(x) = 1 - F(-x)$, $\beta_i$ is the merit parameter of subject i and
is the total number of subjects. The well-known Bradley–Terry model [7], which dates back to at least 1929 [8], and the Thurstone model [9] are two special cases of Model (1). The former postulates the logistic distribution for F, while the latter postulates the normal distribution.
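As a minimal sketch of Model (1), the snippet below evaluates the win probability F(beta_i - beta_j) for the two special cases just mentioned. The function names, the symbol `beta` for a merit parameter and the example merit values are illustrative assumptions, and the Thurstone case is taken here with a standard normal F.

```python
import math


def logistic_cdf(x: float) -> float:
    """F for the Bradley-Terry model (standard logistic CDF)."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x: float) -> float:
    """F for the Thurstone model (standard normal CDF, an assumed scaling)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(beta_i: float, beta_j: float, cdf=logistic_cdf) -> float:
    """Probability that subject i beats subject j: F(beta_i - beta_j)."""
    return cdf(beta_i - beta_j)


# Hypothetical merits: subject i has merit 1.0, subject j has merit 0.0.
p_bt = win_probability(1.0, 0.0)                  # Bradley-Terry
p_th = win_probability(1.0, 0.0, cdf=normal_cdf)  # Thurstone
```

Because both choices of F satisfy F(x) = 1 - F(-x), the two win probabilities of any pair sum to one, and equal merits give a probability of one half.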
In the standard setting where n is fixed and the number of comparisons in each pair goes to infinity, the theoretical properties of Model (1) have been widely investigated; see Chapter 4 of [6]. In the opposite scenario where n goes to infinity and each pair has a fixed number of comparisons, ref. [10] proved the uniform consistency and asymptotic normality of the maximum likelihood estimator (MLE) in the Bradley–Terry model.
When the number of subjects is large, paired comparisons are often sparse. Taking the NCAA Division I FBS (Football Bowl Subdivision) regular season, for example, a team plays with at most 14 other teams among a total of 120 teams. The observed comparisons can be represented in a comparison graph with
nodes denoting subjects and a weighted edge between two nodes denoting the number of comparisons. The Erdős–Rényi comparison graph has been widely considered in the literature, e.g., [1,11,12,13], where the number of comparisons between any two subjects follows a binomial distribution
, and
measures the sparsity. Under a very weak condition on the sparsity on
, ref. [13] established the uniform consistency and asymptotic normality of the MLE in the Bradley–Terry model by extending the proof strategies in [10].
Moreover, ref. [14] considered a fixed sparse comparison graph in which any two subjects are connected by paths of length 2 or 3, and in which the consistency and asymptotic normality of the MLE also hold. Inference in the high-dimensional setting under the Bradley–Terry model and some generalized versions has also attracted great interest in the machine learning literature, where upper bounds on various errors have been established under different conditions [e.g., the
error
in [15,16], the mean square error in [17], the bias
in [18]]. Under the assumption that the log-likelihood function is strictly convex, ref. [1] established the uniform consistency of the MLE in general paired comparison models. However, the asymptotic theory of moment estimation under sparse paired comparison models remains largely underdeveloped in the literature. Existing theoretical developments focus almost exclusively on the MLE, which relies on strict distributional assumptions and is computationally demanding in high-dimensional settings. In contrast, this paper develops the method of moments (MOM), which avoids full distributional assumptions and maintains computational simplicity while achieving comparable asymptotic properties. The primary novelty of this work is to extend the asymptotic theory of high-dimensional sparse paired comparison models from the MLE to the method of moments, establishing a parallel and complementary theoretical framework.
We further elaborate on the advantages of the method of moments (MOM) for high-dimensional sparse paired comparison models. Beyond computational efficiency, MOM has two key strengths over maximum likelihood estimation (MLE) for practical inference: (1) Robustness—the moment estimator is much less sensitive to outliers (e.g., upsets in our NFL data) in sparse settings, where MLE can be biased by extreme observations; (2) Quasi-likelihood compatibility—MOM relies only on moment conditions and avoids strict distributional assumptions, making it robust to model misspecification in high-dimensional sparse scenarios. These merits justify the use of MOM in this study, and our subsequent analysis establishes its asymptotic properties as the core theoretical contribution.
The main contributions of this paper are as follows. First, we develop the moment estimation, instead of the MLE or Bayesian estimation, based on the scores of subjects (i.e., the number of wins) to estimate the merit parameters in Model (1). We prioritize moment estimation over the MLE because it is natural to rank subjects according to their scores and the computation based on moment equations is simpler, especially in high-dimensional sparse settings where the MLE may suffer from numerical instability due to nonlinear optimization. When
belongs to the exponential family of distributions, both estimators are identical. Second, under an Erdős–Rényi comparison graph, we establish a unified theoretical framework in which the uniform consistency and asymptotic normality of the moment estimator hold when
n goes to infinity and
tends to zero. A key idea in the proof of consistency is to obtain the convergence rate of the Newton iterative sequence for computing the estimator. The asymptotic normality is proved by applying Taylor expansions to a series of functions constructed from the estimating equations and showing that the remainder terms in the expansions are asymptotically negligible. Although each pair of subjects is assumed to be compared with the same probability
, our proof strategy can be easily extended to the case with different comparison probabilities at the order of
. Third, we use the Thurstone model to illustrate the unified theoretical results. Further extensions to a fixed sparse comparison graph in [14] are also derived. Numerical studies and real data analysis illustrate our theoretical findings.
The rest of this paper is organized as follows. In Section 2, we present the moment estimation. In Section 3, we present the consistency and asymptotic normality of the moment estimator. We illustrate our unified results with one application in Section 4. We extend the asymptotic results to a fixed comparison graph in Section 5. In Section 6, we carry out simulations and give a real data analysis. We give a summary and further discussion in Section 7. The proofs of the main results are relegated to Appendix A. The proofs of the supporting lemmas are relegated to Appendix B.
2. Moment Estimation
Assume that
subjects that are labeled as “
”, are compared in pairs repeatedly. Let
be the number of times that subject i is compared with subject j and
be the number of times that subject i beats subject j out of
comparisons. As a result,
. By convention, define
and
. The comparison matrix
is generated from an Erdős–Rényi comparison graph, where
follows a binomial distribution
with
measuring the sparsity of comparisons. More generally,
. We set
to be the same for ease of exposition. Recall that
are the merit parameters of subjects
. The probability in Model (1) implies that the winning probability depends only on the difference in merit between two subjects. For model identification, we normalize
by setting
as in [10]. We assume that all paired comparisons are independent and
follows a binomial distribution
conditional on
.
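The data-generating mechanism described above can be sketched as follows. The names `t` (comparison counts), `a` (win counts), `T` (maximum number of comparisons per pair) and `p` (comparison probability) are assumed for illustration, as are the merit values, and the logistic F of the Bradley–Terry model is used as one concrete choice.

```python
import math
import random


def simulate_comparisons(beta, T, p, seed=0):
    """Draw t[i][j] ~ Bin(T, p), then a[i][j] ~ Bin(t[i][j], F(beta_i - beta_j))."""
    rng = random.Random(seed)
    m = len(beta)
    t = [[0] * m for _ in range(m)]  # t[i][j]: number of comparisons of i and j
    a = [[0] * m for _ in range(m)]  # a[i][j]: number of wins of i over j
    for i in range(m):
        for j in range(i + 1, m):
            # Binomial draw as a sum of T Bernoulli(p) trials.
            t_ij = sum(rng.random() < p for _ in range(T))
            # Win probability under a logistic F (assumed choice).
            prob = 1.0 / (1.0 + math.exp(-(beta[i] - beta[j])))
            a_ij = sum(rng.random() < prob for _ in range(t_ij))
            t[i][j] = t[j][i] = t_ij
            a[i][j] = a_ij
            a[j][i] = t_ij - a_ij
    return t, a


# Hypothetical merits for five subjects; the first is the reference with merit 0.
beta_true = [0.0, 0.5, -0.5, 1.0, -1.0]
t, a = simulate_comparisons(beta_true, T=3, p=0.6, seed=1)
scores = [sum(row) for row in a]  # total wins of each subject
```

By construction the comparison matrix is symmetric with a zero diagonal, and the wins of the two subjects in any pair add up to their number of comparisons.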
Let
be the total wins of subject
i and
. To motivate the estimating equations, we compare the maximum likelihood equations and the moment equations under the Thurstone model described in Section 4. The maximum likelihood equations are
where
is the density function of the standard normal distribution and
is its distribution function. The corresponding moment equations are
We can see that the latter is simpler and easier to compute. On the other hand, it is natural to rank subjects according to their scores. Thus, we use the moment estimation here. When
in Model (1) belongs to the exponential family of distributions, both are the same.
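To make the comparison concrete, the two systems can be written out under assumed notation, since the symbols are not shown in the text above: $t_{ij}$ is the number of comparisons between $i$ and $j$, $a_{ij}$ is the number of wins of $i$ over $j$ (so $a_{ji} = t_{ij} - a_{ij}$), $a_i = \sum_{j \neq i} a_{ij}$ is the score of $i$, and $\beta_i$ is the merit of $i$. This is a sketch derived from the binomial likelihood, not a transcription of the authors' displays.

```latex
% Thurstone model: F = \Phi with density \phi.
\begin{align*}
\text{likelihood equations:} \quad
  & \sum_{j \neq i} \left[
      a_{ij}\,\frac{\phi(\beta_i - \beta_j)}{\Phi(\beta_i - \beta_j)}
      - a_{ji}\,\frac{\phi(\beta_i - \beta_j)}{1 - \Phi(\beta_i - \beta_j)}
    \right] = 0, \\
\text{moment equations:} \quad
  & a_i = \sum_{j \neq i} t_{ij}\,\Phi(\beta_i - \beta_j).
\end{align*}
% Logistic F (Bradley--Terry): f = F(1 - F), hence f/F = 1 - F and f/(1 - F) = F, so
\begin{align*}
\sum_{j \neq i} \bigl[ a_{ij}(1 - F) - (t_{ij} - a_{ij})F \bigr]
  = \sum_{j \neq i} \bigl[ a_{ij} - t_{ij} F(\beta_i - \beta_j) \bigr] = 0 .
\end{align*}
```

The last display shows why the two approaches agree in the logistic case: the density weights cancel and the likelihood equations reduce to the moment equations, whereas under the normal distribution the ratio weights remain.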
Write
as the expectation of
and
. Then, the estimating equations are
The solution to the above equations is the moment estimator denoted by
and
. Let
If
is a one-to-one mapping, then
exists and is unique, i.e.,
. When
does not exist (i.e.,
is not one-to-one), any solution
of Equation (2) is a moment estimator of
. The Newton–Raphson algorithm can be used to solve Equation (2). Moreover, the R package “BradleyTerry2” can be used to compute the estimator in the Bradley–Terry model.
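The Newton–Raphson iteration mentioned above can be sketched as follows. The notation is assumed for illustration: `t[i][j]` is the number of comparisons, `a[i][j]` the number of wins of i over j, subject 0 is the reference with merit fixed at 0, and F is the standard normal distribution function (Thurstone model). The toy data and all function names are hypothetical; this is not the authors' implementation.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def moment_estimate(t, a, cdf=norm_cdf, pdf=norm_pdf, tol=1e-10, max_iter=50):
    """Solve score_i = sum_j t[i][j] * F(beta_i - beta_j) for i >= 1, with beta_0 = 0."""
    m = len(t)
    scores = [sum(a[i]) for i in range(m)]
    beta = [0.0] * m
    for _ in range(max_iter):
        # Residuals of the moment equations for the free parameters.
        H = [sum(t[i][j] * cdf(beta[i] - beta[j]) for j in range(m) if j != i)
             - scores[i] for i in range(1, m)]
        if max(abs(h) for h in H) < tol:
            break
        # Jacobian of the residuals with respect to beta_1, ..., beta_{m-1}.
        J = [[0.0] * (m - 1) for _ in range(m - 1)]
        for i in range(1, m):
            for j in range(m):
                if j == i:
                    continue
                w = t[i][j] * pdf(beta[i] - beta[j])
                J[i - 1][i - 1] += w
                if j >= 1:
                    J[i - 1][j - 1] -= w
        step = solve_linear(J, H)
        beta = [0.0] + [beta[i] - step[i - 1] for i in range(1, m)]
    return beta


# Toy data: three subjects, every pair compared 10 times.
t = [[0, 10, 10], [10, 0, 10], [10, 10, 0]]
a = [[0, 6, 8], [4, 0, 7], [2, 3, 0]]
beta_hat = moment_estimate(t, a)
```

In this balanced toy design the estimated merits are ordered in the same way as the scores (14, 11 and 5 wins), which reflects the remark that ranking by score is natural for the moment estimator.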
We discuss the existence of
from the viewpoint of graph connectivity. If the comparison graph with the matrix
as its adjacency matrix is not connected, then there are two nonempty sets of subjects such that there are no comparisons between subjects in the first set and those in the second. In this case, there is no basis for ranking subjects in the first set relative to those in the second set. Further, a necessary condition for the existence of
is that the directed graph
with the win–loss matrix
as its adjacency matrix is strongly connected. In other words, for every partition of the subjects into two nonempty sets, a subject in the second set beats a subject in the first at least once. To see this, assume that there are two nonempty sets
and
such that all subjects in
win all comparisons with subjects in
. Without loss of generality, we set
and
with
, where
for
and
. By summing
over
, we have
Because
is a sum of
,
, and
, we have
Because
for
and
and at least one such
, it must be
when
in order to guarantee both sides in the above equation to be equal. In this case, at least one such difference
must go to infinity, so that the moment estimate does not exist. The strong connectivity of
is also sufficient for guaranteeing the existence of the MLE in the Bradley–Terry model [19], in which the moment estimator is equal to the MLE. It is interesting to see whether the strong connectivity of
is sufficient to guarantee the existence of
in a general model. In the next section, we will show that
exists with probability approaching one under some mild conditions.
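The necessary condition discussed above can be checked directly from a win–loss matrix. The sketch below assumes a hypothetical matrix `a` with `a[i][j]` the number of wins of i over j, and treats it as a directed graph with an edge from i to j whenever i has beaten j at least once.

```python
def reachable(adj, start):
    """Set of nodes reachable from `start` by depth-first search."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_strongly_connected(a):
    """True if the win-loss digraph (edge i -> j when a[i][j] > 0) is strongly connected."""
    m = len(a)
    forward = [[j for j in range(m) if a[i][j] > 0] for i in range(m)]
    backward = [[j for j in range(m) if a[j][i] > 0] for i in range(m)]
    # Strongly connected iff every node is reachable from node 0
    # in both the graph and its reverse.
    return len(reachable(forward, 0)) == m and len(reachable(backward, 0)) == m


# Every subject has beaten and lost to someone: the condition holds.
a_ok = [[0, 6, 8], [4, 0, 7], [2, 3, 0]]
# Subject 0 never loses, so no finite merit can match its score.
a_bad = [[0, 5, 5], [0, 0, 3], [0, 2, 0]]

ok = is_strongly_connected(a_ok)
bad = is_strongly_connected(a_bad)
```

In the second example the partition into subject 0 and the other two subjects violates the condition, which is the situation in which a merit difference must diverge.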
3. Asymptotic Properties
In this section, we present the consistency and asymptotic normality of the moment estimator. We first introduce some notation. For a subset
, let
and
denote the interior and closure of
C, respectively. For a vector
, denote
by a vector norm with the
-norm,
, and the
-norm,
. Let
be an
-neighborhood of
x. For an
matrix
, let
denote the matrix norm induced by the
-norm on vectors in
, i.e.,
and let
be a general matrix norm. Define the matrix maximum norm:
. We use the superscript “*” to denote the true parameter under which the data are generated. When there is no ambiguity, we omit the superscript “*”.
Recall that
is the expectation of
. We assume that
is a continuous function with a third derivative. Write
and
as the first and second derivatives of
on
, respectively. Let
be a small positive number. When
, we assume that there are three positive numbers,
, such that
where
.
We use the Bradley–Terry model to illustrate the above inequalities, where
. A direct calculation gives that
It is easy to show that
If
, then
.
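As a numerical companion to the Bradley–Terry illustration, the sketch below tabulates the first three derivatives of the logistic function and their extreme values. The symbol `mu` for the logistic function and the grid are assumptions; the inequalities of the text are not reproduced here, only the kind of derivative bounds they rely on.

```python
import math


def mu(x):
    """Logistic function e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))


def mu1(x):
    """First derivative: mu * (1 - mu)."""
    s = mu(x)
    return s * (1.0 - s)


def mu2(x):
    """Second derivative: mu * (1 - mu) * (1 - 2 mu)."""
    s = mu(x)
    return s * (1.0 - s) * (1.0 - 2.0 * s)


def mu3(x):
    """Third derivative: mu * (1 - mu) * (1 - 6 mu (1 - mu))."""
    s = mu(x)
    return s * (1.0 - s) * (1.0 - 6.0 * s * (1.0 - s))


def mu1_lower_bound(c):
    """Smallest value of mu1 on [-c, c]; mu1 is even and decreasing in |x|."""
    return mu1(c)


grid = [k / 100.0 for k in range(-1000, 1001)]
max_mu1 = max(mu1(x) for x in grid)           # 1/4, attained at x = 0
max_abs_mu2 = max(abs(mu2(x)) for x in grid)  # close to 1 / (6 * sqrt(3))
max_abs_mu3 = max(abs(mu3(x)) for x in grid)  # 1/8, attained at x = 0
```

The upper bounds are absolute constants, while the lower bound on the first derivative decays as the range of merit differences grows, which is why the spread of the parameters enters the asymptotic conditions.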
3.1. Consistency
To establish the consistency of
, let us first define a system of functions:
and
. It is clear that
. Let
be the Jacobian matrix of
on the parameter
. The asymptotic behavior of
depends crucially on the inverse of
. For convenience, denote
as
, where
Define
When
and
, in view of inequality (3b), the entries of V satisfy the following inequalities:
Without loss of generality, we assume that
when
hereafter (otherwise, we redefine
and repeat a similar process). Our strategy for the proof of consistency crucially depends on the existence of the inverse of V, which requires that V has full rank. It is easy to show that V is positive semi-definite. Thus, if V has full rank, then V must be positive definite. The following lemma ensures the existence of the inverse of V.
Lemma 1. Assume that . With probability at least , is positive definite.
Because
when
, we have
The probability of the nonexistence of
is less than
, which goes to zero exponentially fast. Generally, the inverse of V does not have a closed form. Ref. [20] proposed to approximate the inverse of V,
, by the matrix
, where
In the above equation,
if
; otherwise,
. By extending the proof of [20] to the sparse case, the upper bound of the approximation error
is given in Lemma A2.
Recall that the main idea of the proof of consistency in the Bradley–Terry model [10,14] contains two parts. Let
,
,
and
. Since
, it suffices to show that the ratio of subject
,
, and the ratio of
,
are very close. With the nice mathematical properties of the logistic function
, the first part is to show that there are a number of subjects satisfying the following inequalities:
where
b and
c are certain numbers. The second part is to eliminate common terms
based on the condition that the number of the common neighbors between any two subjects,
, is at least
, where
in [10] and
in [14]. In the Erdős–Rényi comparison graph, ref. [13] further showed that there is at least one subject with its ratio close to both
and
.
The aforementioned strategies for the proof of consistency are built on the premise of the existence of the MLE, which is guaranteed by the necessary and sufficient condition that the directed graph with the win–loss matrix as its adjacency matrix is strongly connected [19]. As discussed before, it may be difficult to find the minimal sufficient condition to guarantee the existence of
in general paired comparison models. To overcome this difficulty, we aim to obtain the convergence rate of the Newton iterative sequence for solving Equation (2). Under the well-known Newton–Kantorovich conditions, the Newton iterative sequence converges, and its limiting point is the solution. We apply an adjusted version of the Newton–Kantorovich theorem in [21] to this end, which not only guarantees the existence of the solution but also gives an optimal error bound for the Newton iterative sequence.
Now, we formally state the consistency result.
Theorem 1. Assume that Conditions (3a), (3b) and (3c) hold. If , then exists with probability approaching one and is uniformly consistent in the sense that
To see how small could be, we consider a special case that is a constant vector, in which and are also constants. According to the above theorem, if , then .
3.2. Asymptotic Normality of
We establish the asymptotic distribution of
by characterizing its asymptotic representation. In detail, we apply a second-order Taylor expansion to
and find that
can be represented as the sum of a main term
and an asymptotically negligible remainder term, where
denotes the expectation conditional on
. Because
does not have a closed form, we use the matrix S defined in (7) to approximate it. We formally state the asymptotic normality of
as follows.
Theorem 2. Let and . If , then for fixed k, the vector follows a k-dimensional multivariate normal distribution with mean zero and the covariance matrix , where
Remark 1. If , then is equal to . When belongs to the exponential family of distributions (e.g., the Bradley–Terry model), U is identical to V. If , then the asymptotic variance of involves an additional factor . The asymptotic variance of is on the order of if is bounded above by a constant.
5. Extension to a Fixed Sparse Design
In some applications such as sports, the comparison graph may be fixed rather than random. For example, in the regular season of the National Football League (NFL), games are scheduled in advance. More specifically, the 32 teams of the NFL are split into 2 conferences and 8 divisions, each division consisting of 4 teams. In the regular season, each team plays 16 matches, 6 within the division and 10 against teams from other divisions. Motivated by this design, ref. [14] proposed a sparsity condition that controls the number of paths of length 2 or 3 from one subject to another:
That is,
is the minimum ratio of the total number of paths between any i and j with length 2 or 3. Under the Erdős–Rényi comparison graph, a similar sparsity property holds. Specifically, the set of common neighbors of any two subjects i and j has at least the following size:
with a probability of at least
if
; see (A12) in the proof of Lemma A2.
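The role of common neighbors can be explored numerically. The sketch below counts, for every pair of subjects, the subjects that have been compared with both, which equals the number of paths of length 2 between them. As a simplifying assumption, an edge is present with probability `p` independently for each pair, and the function names, graph size and probability are illustrative.

```python
import random


def erdos_renyi_adjacency(m, p, seed=0):
    """Symmetric 0/1 adjacency matrix; each pair is linked with probability p."""
    rng = random.Random(seed)
    adj = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


def min_common_neighbors(adj):
    """Smallest number of common neighbors over all pairs of distinct subjects."""
    m = len(adj)
    best = m
    for i in range(m):
        for j in range(i + 1, m):
            common = sum(1 for k in range(m) if adj[i][k] and adj[j][k])
            best = min(best, common)
    return best


adj = erdos_renyi_adjacency(30, 0.5, seed=0)
k_min = min_common_neighbors(adj)
```

A pair with no common neighbor has no path of length 2, so a condition of this type rules out comparison graphs in which some pairs are only weakly linked through the rest of the subjects.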
We assume that if two subjects have comparisons, they are compared T times, in accordance with the aforementioned setting for ease of exposition. Similar to Lemma A2, the approximation error of using
S to approximate
is
where
and
. With similar lines of argument as in the proofs of Theorems 1 and 2, we have the following theorem, whose proof is omitted.
Theorem 3. Assume that Conditions (3a), (3b) and (3c) hold. If , then exists with probability approaching one and is uniformly consistent in the sense that
Let and . If , then for fixed k, the vector follows a k-dimensional multivariate normal distribution with mean zero and the covariance matrix , where is given in (7).
7. Summary and Discussion
We have presented the moment estimation based on the scores of subjects in the paired comparison model under sparse comparison graphs. We have established the uniform consistency and asymptotic normality of the moment estimator. The consistency is shown by obtaining the convergence rate of the Newton iterative sequence. This leads to a condition on the sparsity parameter
requiring that
if
is a constant vector. We note that this condition looks much stronger than that in the Bradley–Terry model in [13]. Since we consider a general model, it seems reasonable that a stronger condition is imposed. On the other hand, the condition imposed on
may not be the best possible condition. In particular, the conditions for guaranteeing the asymptotic normality seem stronger than those needed for the consistency. Note that the asymptotic behavior of the moment estimator depends not only on
but also on the configuration of all parameters. It would be of interest to investigate whether these conditions could be relaxed.
In this paper, we assume that given the comparison graph, all paired comparisons are independent. Note that the moment equations hold regardless of whether comparisons are independent, so the moment estimation still works when comparisons are dependent. The consistency result in Theorem 1 still holds as long as one retains the same order of the upper bound of
in Lemma A4. In fact, the independence assumption is not directly used when checking our proofs. It is only used in Lemma A4 to derive the upper bound of
using the Hoeffding inequality. Analogously, the independence assumption is used to derive the central limit theorem of
. In the dependent case, there are also many Hoeffding-type exponential tail inequalities (e.g., [22,23,24]) and central limit theorems for sums of a sequence of random variables (e.g., [25,26]) that can be applied.
Building on the theoretical and empirical findings of this study, we identify two promising avenues for further exploration, which not only address the current limitations but also extend the moment estimation framework to more complex and practical scenarios.
(1) Moment estimation for paired comparison models with dependent outcomes. This paper assumes that all paired comparison outcomes are independent, which is a standard but restrictive assumption in many real-world settings (e.g., crowdsourcing evaluations where raters may have consistent biases, or sports leagues where team performance is serially correlated). A natural extension is to develop moment estimation methods for models with dependent outcomes, such as incorporating Markovian dependence or exchangeable correlation structures. Key challenges include deriving unbiased moment equations under dependence and establishing asymptotic properties (consistency, asymptotic normality) using tools from dependent random variable theory (e.g., Hoeffding-type inequalities for associated variables [22,23,24]).
(2) High-dimensional sparse paired comparison models with structured merit parameters. This study focuses on unstructured merit parameters (i.e.,
are independent). However, in many applications, merit parameters often exhibit inherent structures, such as group-level homogeneity (e.g., teams in the same sports division share similar strengths) or sparsity (e.g., only a few subjects have distinct merits in large-scale crowdsourcing). Extending the moment estimation framework to incorporate such structures (e.g., group lasso-penalized moment estimation, sparse merit parameter inference) would improve estimation efficiency. Key research questions include designing computationally feasible penalized moment equations and establishing oracle properties for the structured estimators. These directions align with the core theme of sparse paired comparison inference and address practical limitations of the current work. We believe pursuing these avenues will not only extend the theoretical scope of moment estimation but also broaden its applicability to more complex real-world problems.