Moment Estimation in Paired Comparison Models with a Growing Number of Subjects

Wang, Qiuping; Pan, Lu; Yan, Ting

doi:10.3390/e28030314


by Qiuping Wang ¹,*, Lu Pan ² and Ting Yan ³

¹ School of Mathematics and Statistics, Zhaoqing University, Zhaoqing 526000, China

² School of Mathematics and Statistics, Shangqiu Normal University, Shangqiu 476000, China

³ Department of Statistics, Central China Normal University, Wuhan 430079, China

* Author to whom correspondence should be addressed.

Entropy 2026, 28(3), 314; https://doi.org/10.3390/e28030314

Submission received: 19 February 2026 / Revised: 5 March 2026 / Accepted: 9 March 2026 / Published: 11 March 2026

(This article belongs to the Section Information Theory, Probability and Statistics)


Abstract

When the number of subjects, n, is large, paired comparisons are often sparse. Here, we study statistical inference in a class of paired comparison models parameterized by a set of merit parameters, under an Erdös–Rényi comparison graph, where the sparsity is measured by a probability $p_n$ tending to zero. We use moment estimation based on the scores of subjects to infer the merit parameters. We establish a unified theoretical framework in which the uniform consistency and asymptotic normality of the moment estimator hold as the number of subjects goes to infinity. A key idea in the proof of consistency is to obtain the convergence rate of the Newton iterative sequence used to compute the estimator. We use the Thurstone model to illustrate the unified theoretical results. Extensions to a fixed sparse comparison graph are also provided. Numerical studies and a real data analysis illustrate our theoretical findings.

Keywords:

consistency; central limit theorem; method of moments; paired comparison; parameter estimation; sparse

1. Introduction

Subjects are repeatedly compared in pairs in a wide spectrum of situations, including sports games [1], ranking of scientific journals [2,3], the quality of product brands [4] and crowdsourcing [5]. For instance, one basketball team plays against another; papers in one journal cite papers in another journal; a consumer chooses one product over another; and workers in a crowdsourcing setup are asked to compare pairs of items.

One of the fundamental problems in paired comparison analysis is to derive a fair and reliable ranking of all subjects based on observed comparison data. In round-robin tournaments—where every pair of subjects competes sufficiently many times—a natural ranking can be directly obtained from the number of wins, as the full pairwise comparisons eliminate biases from incomplete matchups. However, in most practical scenarios (e.g., sports leagues, crowdsourcing evaluations, or journal rankings), comparisons are often sparse (not all pairs interact) and stochastic (outcomes contain random noise), leading to unreliable direct rankings based solely on raw win counts. To address this issue, paired comparison models have been developed to statistically infer the underlying merit parameters of subjects and generate objective rankings; see the classic monograph by [6] for a comprehensive overview of such models and their theoretical foundations. Statistical models not only provide a method of ranking all subjects but are also tools for making inferences on the merits of subjects (e.g., testing whether two subjects have the same merit).

Here, we are concerned with a class of paired comparison models that assign one merit parameter to each subject and assume that the win–loss probability of any pair depends only on the difference between their merit parameters. Specifically, the probability that subject i beats subject j is

$$P(i \text{ beats } j) = F(\beta_i - \beta_j), \quad i, j = 0, 1, \dots, n;\ i \neq j, \tag{1}$$

where F is a known cumulative distribution function satisfying $F(x) = 1 - F(-x)$, $\beta_i$ is the merit parameter of subject i, and $n + 1$ is the total number of subjects. The well-known Bradley–Terry model [7], which dates back to at least 1929 [8], and the Thurstone model [9] are two special cases of Model (1): the former takes $F(x)$ to be the logistic distribution function, while the latter takes it to be the normal distribution function.
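To make Model (1) concrete, here is a minimal Python sketch of the two special cases above. The helper names `bradley_terry_cdf`, `thurstone_cdf` and `win_probability` are illustrative, not from the paper; the standard normal CDF is computed with `math.erf`.

```python
import math

def bradley_terry_cdf(x: float) -> float:
    """Logistic CDF F(x) = e^x / (1 + e^x), evaluated in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def thurstone_cdf(x: float) -> float:
    """Standard normal CDF Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_probability(beta_i: float, beta_j: float, F=thurstone_cdf) -> float:
    """P(i beats j) = F(beta_i - beta_j) under Model (1)."""
    return F(beta_i - beta_j)

# The symmetry F(x) = 1 - F(-x) makes the two win probabilities of a pair sum to one.
for F in (bradley_terry_cdf, thurstone_cdf):
    p_ij = win_probability(0.7, 0.2, F)
    p_ji = win_probability(0.2, 0.7, F)
    print(F.__name__, round(p_ij, 4), round(p_ij + p_ji, 12))
```

For the same merit gap, the logistic and normal choices of F give somewhat different win probabilities, which is why the constants $b_{n0}, b_{n1}, b_{n2}$ introduced in Section 3 are model-specific.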

In the standard setting, where n is fixed and the number of comparisons in each pair goes to infinity, the theoretical properties of Model (1) have been widely investigated in Chapter 4 of [6]. In the opposite scenario, where n goes to infinity and each pair has a fixed number of comparisons, ref. [10] proved the uniform consistency and asymptotic normality of the maximum likelihood estimator (MLE) in the Bradley–Terry model.

When the number of subjects is large, paired comparisons are often sparse. In the NCAA Division I FBS (Football Bowl Subdivision) regular season, for example, a team plays at most 14 other teams out of a total of 120 teams. The observed comparisons can be represented by a comparison graph with $n + 1$ nodes denoting subjects, in which a weighted edge between two nodes records the number of comparisons between them. The Erdös–Rényi comparison graph has been widely considered in the literature, e.g., [1,11,12,13]; in this graph, the number of comparisons between any two subjects follows a binomial distribution $\mathrm{Bin}(T, p_n)$, and $p_n$ measures the sparsity. Under a very weak condition on $p_n$, ref. [13] established the uniform consistency and asymptotic normality of the MLE in the Bradley–Terry model by extending the proof strategies in [10].

Moreover, ref. [14] considered a fixed sparse comparison graph in which any two subjects are connected by paths of length 2 or 3, and showed that the consistency and asymptotic normality of the MLE still hold. Inference in the high-dimensional setting under the Bradley–Terry model and some generalized versions has also attracted great interest in the machine learning literature, where upper bounds on various errors are established under different conditions (e.g., the $\ell_1$ error $\|\hat{\beta} - \beta\|_1$ in [15,16], the mean square error in [17], and the error $E\|\hat{\beta} - \beta\|_\infty$ in [18]). Under the assumption that the log-likelihood function is strictly convex, ref. [1] established the uniform consistency of the MLE in general paired comparison models. However, the asymptotic theory of moment estimation under sparse paired comparison models remains largely underdeveloped. Existing theoretical developments focus almost exclusively on the MLE, which relies on strict distributional assumptions and is computationally demanding in high-dimensional settings. In contrast, this paper develops the method of moments (MOM), which avoids full distributional assumptions and remains computationally simple while achieving comparable asymptotic properties. The primary novelty of this work is to extend the asymptotic theory of high-dimensional sparse paired comparison models from the MLE to the method of moments, establishing a parallel and complementary theoretical framework.

We further elaborate on the advantages of MOM for high-dimensional sparse paired comparison models. Beyond computational efficiency, MOM has two key strengths over the MLE for practical inference: (1) robustness, since the moment estimator is much less sensitive to outliers (e.g., upsets in our NFL data) in sparse settings, where the MLE can be biased by extreme observations; and (2) quasi-likelihood compatibility, since MOM relies only on moment conditions and avoids strict distributional assumptions, making it robust to model misspecification in high-dimensional sparse scenarios. These merits justify the use of MOM in this study, and our subsequent analysis establishes its asymptotic properties as the core theoretical contribution.

The main contributions of this paper are as follows. First, we develop moment estimation, rather than maximum likelihood or Bayesian estimation, based on the scores of subjects (i.e., their numbers of wins) to estimate the merit parameters in Model (1). We prioritize moment estimation over the MLE because it is natural to rank subjects by their scores and because solving the moment equations is computationally simpler, especially in high-dimensional sparse settings where the MLE may suffer from numerical instability in nonlinear optimization. When $F(\cdot)$ belongs to the exponential family, the two estimators are identical. Second, under an Erdös–Rényi comparison graph, we establish a unified theoretical framework in which the uniform consistency and asymptotic normality of the moment estimator hold as n goes to infinity and $p_n$ tends to zero. A key idea in the proof of consistency is to obtain the convergence rate of the Newton iterative sequence used to compute the estimator. Asymptotic normality is proved by applying Taylor expansions to a series of functions constructed from the estimating equations and showing that the remainder terms in these expansions are asymptotically negligible. Although each pair of subjects is assumed to be compared with the same probability $p_n$, our proof strategy extends readily to different comparison probabilities of the order $p_n$. Third, we use the Thurstone model to illustrate the unified theoretical results. Extensions to the fixed sparse comparison graph in [14] are also derived. Numerical studies and a real data analysis illustrate our theoretical findings.

The rest of this paper is organized as follows. In Section 2, we present the moment estimator. In Section 3, we establish the consistency and asymptotic normality of the moment estimator. We illustrate our unified results with an application in Section 4. We extend the asymptotic results to a fixed comparison graph in Section 5. In Section 6, we carry out simulations and give a real data analysis. We give a summary and further discussion in Section 7. The proofs of the main results are relegated to Appendix A, and the proofs of the supporting lemmas to Appendix B.

2. Moment Estimation

Assume that $n + 1$ subjects, labeled “$0, \dots, n$”, are compared in pairs repeatedly. Let $t_{ij}$ be the number of times subject i is compared with subject j, and let $a_{ij}$ be the number of times subject i beats subject j among these $t_{ij}$ comparisons. As a result, $a_{ij} + a_{ji} = t_{ij}$. By convention, define $t_{ii} = 0$ and $a_{ii} = 0$. The comparison matrix $(t_{ij})_{(n+1) \times (n+1)}$ is generated from an Erdös–Rényi comparison graph, where $t_{ij}$ follows a binomial distribution $\mathrm{Bin}(T, p_n)$ with $p_n$ measuring the sparsity of comparisons. More generally, one may take $t_{ij} \sim \mathrm{Bin}(T_{ij}, p_n)$; we set all $T_{ij}$ equal to T for ease of exposition. Recall that $\beta_0, \dots, \beta_n$ are the merit parameters of subjects $0, \dots, n$. Model (1) implies that the winning probability depends only on the difference in merit between two subjects. For model identification, we normalize $\beta_i$, $i = 0, 1, \dots, n$, by setting $\beta_0 = 0$ as in [10]. We assume that all paired comparisons are independent and that, conditional on $t_{ij}$, $a_{ij}$ follows a binomial distribution $\mathrm{Bin}(t_{ij}, p_{ij})$ with $p_{ij} = F(\beta_i - \beta_j)$.
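The sampling scheme just described can be simulated directly. The following is a minimal sketch assuming NumPy and a Thurstone choice of F; the function `simulate_comparisons`, the seed, and the values n = 50, T = 1 and $p_n = 0.3$ are hypothetical, while the linear merits $\beta_i = ic\log n/n$ with c = 0.3 mirror the simulation design of Section 6.1.

```python
import numpy as np
from math import erf, sqrt

def simulate_comparisons(beta, T, p_n, rng):
    """Draw (t_ij, a_ij) for all pairs i < j under the Erdos-Renyi design, with F = Phi.

    beta: merits of length n + 1 with beta[0] = 0. Returns a symmetric count matrix t
    with t[i, j] ~ Bin(T, p_n) and a win matrix a with a[i, j] + a[j, i] = t[i, j].
    """
    m = len(beta)                                    # m = n + 1 subjects
    t = np.zeros((m, m), dtype=int)
    a = np.zeros((m, m), dtype=int)
    for i in range(m):
        for j in range(i + 1, m):
            t_ij = rng.binomial(T, p_n)              # number of comparisons of the pair
            p_ij = 0.5 * (1.0 + erf((beta[i] - beta[j]) / sqrt(2.0)))  # F(beta_i - beta_j)
            a_ij = rng.binomial(t_ij, p_ij)          # wins of i over j, given t_ij
            t[i, j] = t[j, i] = t_ij
            a[i, j], a[j, i] = a_ij, t_ij - a_ij
    return t, a

rng = np.random.default_rng(2026)
n = 50
beta = np.arange(n + 1) * 0.3 * np.log(n) / n        # beta_i = i c log n / n with c = 0.3
t, a = simulate_comparisons(beta, T=1, p_n=0.3, rng=rng)
scores = a.sum(axis=1)                               # a_i, the total wins of subject i
print(t.sum() // 2, "comparisons; first five scores:", scores[:5])
```

The diagonal stays zero, matching the convention $t_{ii} = a_{ii} = 0$, and the scores sum to the total number of comparisons.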

Let $a_i = \sum_{j=0}^{n} a_{ij}$ be the total number of wins of subject i, and let $a = (a_1, \dots, a_n)^\top$. To motivate the estimating equations, we compare the maximum likelihood equations and the moment equations under the Thurstone model described in Section 4. The maximum likelihood equations are

$$\sum_{j \neq i} \left[ \frac{a_{ij}\,\phi(\beta_i - \beta_j)}{\Phi(\beta_i - \beta_j)} - \frac{(t_{ij} - a_{ij})\,\phi(\beta_i - \beta_j)}{1 - \Phi(\beta_i - \beta_j)} \right] = 0, \quad i = 1, \dots, n,$$

where $\phi(\cdot)$ is the density function of the standard normal distribution and $\Phi(\cdot)$ is its distribution function. The corresponding moment equations are

$$a_i = \sum_{j \neq i} t_{ij}\,\Phi(\beta_i - \beta_j), \quad i = 1, \dots, n.$$

The latter are clearly simpler and easier to compute. Moreover, it is natural to rank subjects according to their scores. We therefore use moment estimation here. When $F(\cdot)$ in Model (1) belongs to the exponential family, the two sets of equations coincide.

Write $\mu(\cdot)$ for the expectation function associated with $F(\cdot)$, so that $E(a_{ij} \mid t_{ij}) = t_{ij}\,\mu(\beta_i - \beta_j)$, and let $\mu_{ij}(\beta) = \mu(\beta_i - \beta_j)$. Then, the estimating equations are

$$a_i = \sum_{j \neq i} t_{ij}\,\mu_{ij}(\beta), \quad i = 1, \dots, n. \tag{2}$$

The solution to these equations is the moment estimator, denoted by $\hat{\beta} = (\hat{\beta}_1, \dots, \hat{\beta}_n)^\top$, with $\hat{\beta}_0 = 0$. Let

$$\varphi(\beta) = \Big( \sum_{j \neq 1} t_{1j}\,\mu_{1j}(\beta), \dots, \sum_{j \neq n} t_{nj}\,\mu_{nj}(\beta) \Big)^\top.$$

If $\varphi: \mathbb{R}^n \to (0, \infty)^n$ is one-to-one and a lies in its range, then $\hat{\beta}$ exists and is unique, i.e., $\hat{\beta} = \varphi^{-1}(a)$. When $\varphi^{-1}$ does not exist (i.e., $\varphi$ is not one-to-one), any solution $\hat{\beta}$ of Equation (2) is a moment estimator of $\beta$. The Newton–Raphson algorithm can be used to solve Equation (2). Moreover, the R package “BradleyTerry2” can be used to compute the estimator in the Bradley–Terry model.
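As a rough illustration of the Newton–Raphson approach, the sketch below solves Equation (2) for the Thurstone model, where $\mu = \Phi$ and $\mu' = \phi$. It is a plain undamped Newton iteration written for this example (not the authors' code), assuming NumPy; the function name `moment_estimator` is hypothetical. The Jacobian it uses is the matrix $V$ with $v_{ij} = -t_{ij}\mu'(\pi_{ij})$ and $v_{ii} = \sum_{j \neq i} t_{ij}\mu'(\pi_{ij})$ introduced in Section 3.1, and $\beta_0$ is held at zero.

```python
import numpy as np
from math import erf, sqrt

# Vectorized standard normal CDF (mu = Phi in the Thurstone model).
Phi = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0))))

def phi(x):
    """Standard normal density, mu' = phi."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def moment_estimator(t, a, max_iter=100, tol=1e-10):
    """Newton-Raphson for a_i = sum_j t_ij Phi(beta_i - beta_j), i = 1..n, with beta_0 = 0.

    t, a: (n+1) x (n+1) arrays of comparison counts and win counts (zero diagonals).
    Returns beta_hat of length n+1, or None if the iteration does not converge.
    """
    scores = a.sum(axis=1)                         # a_i
    beta = np.zeros(t.shape[0])
    for _ in range(max_iter):
        diff = beta[:, None] - beta[None, :]       # pi_ij = beta_i - beta_j
        H = (t * Phi(diff)).sum(axis=1) - scores   # H_i(beta) = sum_j t_ij mu_ij(beta) - a_i
        W = t * phi(diff)
        V = -W
        np.fill_diagonal(V, W.sum(axis=1))         # v_ii = sum_{j != i} t_ij phi(pi_ij), as t_ii = 0
        try:
            # Row and column 0 are dropped because beta_0 = 0 is fixed.
            step = np.linalg.solve(V[1:, 1:], H[1:])
        except np.linalg.LinAlgError:
            return None                            # singular Jacobian
        beta[1:] -= step
        if not np.all(np.isfinite(beta)):
            return None                            # iteration diverged
        if np.max(np.abs(step)) < tol:
            return beta
    return None

# Noise-free check: with the expected win counts a_ij = t_ij * Phi(beta_i - beta_j),
# the true beta solves Equation (2) exactly (a is real-valued here, for testing only).
t = 4.0 * (1.0 - np.eye(4))
beta_true = np.array([0.0, 0.5, -0.3, 1.0])
a = t * Phi(beta_true[:, None] - beta_true[None, :])
beta_hat = moment_estimator(t, a)
print(beta_hat)
```

With real integer win counts, the iteration can fail when the estimate does not exist (for example, when the win–loss graph discussed next is not strongly connected); returning `None` is simply one way to flag this in the sketch.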

We discuss the existence of $\hat{\beta}$ from the viewpoint of graph connectivity. If the comparison graph with adjacency matrix $(t_{ij})_{i,j=0,\dots,n}$ is not connected, then the subjects can be split into two nonempty sets with no comparisons between subjects in the first set and those in the second. In this case, there is no basis for ranking subjects in the first set against those in the second set. Further, a necessary condition for the existence of $\hat{\beta}$ is that the directed graph $G_n$ with the win–loss matrix $A = (a_{ij})$ as its adjacency matrix is strongly connected. In other words, for every partition of the subjects into two nonempty sets, some subject in the second set beats some subject in the first set at least once. To see this, assume that there are two nonempty sets $B_1$ and $B_2$ such that the subjects in $B_1$ win all comparisons with subjects in $B_2$. Without loss of generality, we set $B_1 = \{0, \dots, m\}$ and $B_2 = \{m+1, \dots, n\}$ with $0 \leq m < n$, so that $a_{ij} = t_{ij}$ for $i \in B_1$ and $j \in B_2$. Summing the estimating equations for $a_i$ over $i = 0, \dots, m$, we have

$$\sum_{i=0}^{m} a_i = \sum_{i=0}^{m}\sum_{j=0}^{m} t_{ij}\,\mu(\beta_i - \beta_j) + \sum_{i=0}^{m}\sum_{j=m+1}^{n} t_{ij}\,\mu(\beta_i - \beta_j).$$

Because $a_i$ is the sum of $a_{ij}$, $j = 0, \dots, n$, and $\mu(\beta_i - \beta_j) + \mu(\beta_j - \beta_i) = 1$, the terms within $B_1$ cancel on both sides, and we have

$$\sum_{i=0}^{m}\sum_{j=m+1}^{n} a_{ij} = \sum_{i=0}^{m}\sum_{j=m+1}^{n} t_{ij}\,\mu(\beta_i - \beta_j).$$

Because $a_{ij} = t_{ij}$ for $i = 0, \dots, m$ and $j = m+1, \dots, n$, and at least one such $t_{ij} > 0$, we must have $\mu(\beta_i - \beta_j) = 1$ whenever $t_{ij} > 0$ for both sides of the above equation to be equal. In this case, at least one such difference $\beta_i - \beta_j$ must diverge to infinity, so the moment estimate does not exist. The strong connectivity of $G_n$ is also sufficient for the existence of the MLE in the Bradley–Terry model [19], in which the moment estimator equals the MLE. It is an interesting question whether the strong connectivity of $G_n$ is sufficient to guarantee the existence of $\hat{\beta}$ in a general model. In the next section, we show that $\hat{\beta}$ exists with probability approaching one under some mild conditions.
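Since strong connectivity of $G_n$ is necessary for $\hat{\beta}$ to exist (and is the property tracked in Table 1), a quick check is useful before running any solver. A minimal sketch in plain Python, assuming the win matrix is given as a nested list or array; `is_strongly_connected` is a hypothetical helper name. It uses the standard fact that a digraph is strongly connected exactly when every node is reachable from one fixed node both along the edges and along the reversed edges.

```python
from collections import deque

def _reachable(adj, start):
    """Breadth-first search: the set of nodes reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(a):
    """a[i][j] = number of times subject i beat subject j; edge i -> j when a[i][j] > 0."""
    m = len(a)
    forward = [[j for j in range(m) if a[i][j] > 0] for i in range(m)]
    backward = [[j for j in range(m) if a[j][i] > 0] for i in range(m)]
    return len(_reachable(forward, 0)) == m and len(_reachable(backward, 0)) == m

# A cycle of upsets (0 beats 1, 1 beats 2, 2 beats 0) is strongly connected;
# a strict hierarchy (0 beats 1 and 2, 1 beats 2) is not, so no estimate exists.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
hierarchy = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
print(is_strongly_connected(cycle), is_strongly_connected(hierarchy))
```

The hierarchy example is exactly the situation in the argument above, with $B_1 = \{0\}$ winning all its comparisons against $B_2 = \{1, 2\}$.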

3. Asymptotic Properties

In this section, we present the consistency and asymptotic normality of the moment estimator. We first introduce some notation. For a subset $C \subset \mathbb{R}^n$, let $C^0$ and $\bar{C}$ denote the interior and closure of C, respectively. For a vector $x = (x_1, \dots, x_n)^\top \in \mathbb{R}^n$, let $\|x\|_\infty = \max_{1 \leq i \leq n} |x_i|$ and $\|x\|_1 = \sum_i |x_i|$ denote its $\ell_\infty$-norm and $\ell_1$-norm, respectively, and let $\|x\|$ denote a general vector norm. Let $B(x, \epsilon) = \{y : \|x - y\|_\infty \leq \epsilon\}$ be an $\epsilon$-neighborhood of x. For an $n \times n$ matrix $J = (J_{ij})$, let $\|J\|_\infty$ denote the matrix norm induced by the $\ell_\infty$-norm on vectors in $\mathbb{R}^n$, i.e.,

$$\|J\|_\infty = \max_{x \neq 0} \frac{\|Jx\|_\infty}{\|x\|_\infty} = \max_{1 \leq i \leq n} \sum_{j=1}^{n} |J_{ij}|,$$

and let $\|J\|$ be a general matrix norm. Define the matrix maximum norm $\|J\|_{\max} = \max_{i,j} |J_{ij}|$. We use the superscript “*” to denote the true parameter under which the data are generated. When there is no ambiguity, we omit the superscript “*”.

Recall that $\mu(\cdot)$ is the expectation function of $F(\cdot)$. We assume that $\mu(\cdot)$ is continuous and three times differentiable. Write $\mu'$ and $\mu''$ for the first and second derivatives of $\mu(\pi)$ with respect to $\pi$, respectively. Let $\epsilon_n$ be a small positive number. When $\beta \in B(\beta^*, \epsilon_n)$, we assume that there are three positive numbers $b_{n0}, b_{n1}, b_{n2}$ such that

$$\Big[\min_{i,j} \mu'(\pi_{ij})\Big] \cdot \Big[\max_{i,j} \mu'(\pi_{ij})\Big] > 0, \tag{3a}$$

$$b_{n0} \leq \min_{i,j} |\mu'(\pi_{ij})| \leq \max_{i,j} |\mu'(\pi_{ij})| \leq b_{n1}, \tag{3b}$$

$$\max_{i,j} |\mu''(\pi_{ij})| \leq b_{n2}, \tag{3c}$$

where $\pi_{ij} := \beta_i - \beta_j$.

We use the Bradley–Terry model to illustrate the above inequalities, where $\mu(x) = e^x/(1 + e^x)$. A direct calculation gives

$$\mu'(x) = \frac{e^x}{(1 + e^x)^2}, \qquad \mu''(x) = \frac{e^x(1 - e^x)}{(1 + e^x)^3}.$$

Because $\mu'(x)$ decreases in $|x|$ and $|\pi_{ij}| \leq 2(\|\beta^*\|_\infty + \epsilon_n)$ when $\beta \in B(\beta^*, \epsilon_n)$, it is easy to show that

$$b_{n0} = \frac{e^{2\|\beta^*\|_\infty + 2\epsilon_n}}{(1 + e^{2\|\beta^*\|_\infty + 2\epsilon_n})^2} \leq |\mu'(x)| \leq b_{n1} = \frac{1}{4}, \qquad |\mu''(x)| \leq b_{n2} = \frac{1}{4}. \tag{4}$$

If $\epsilon_n = o(1)$, then $1/b_{n0} = O(e^{2\|\beta^*\|_\infty})$.
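The bounds in (4) can be checked numerically. The sketch below (plain Python; the values $\|\beta^*\|_\infty = 1.5$ and $\epsilon_n = 0.1$ are arbitrary choices for illustration) evaluates $\mu'$ and $\mu''$ on a grid covering $|x| \leq 2(\|\beta^*\|_\infty + \epsilon_n)$, the range of $\pi_{ij}$ when $\beta \in B(\beta^*, \epsilon_n)$.

```python
import math

def mu1(x):
    """mu'(x) = e^x / (1 + e^x)^2 for the Bradley-Terry model."""
    return math.exp(x) / (1.0 + math.exp(x)) ** 2

def mu2(x):
    """mu''(x) = e^x (1 - e^x) / (1 + e^x)^3."""
    return math.exp(x) * (1.0 - math.exp(x)) / (1.0 + math.exp(x)) ** 3

beta_sup, eps = 1.5, 0.1                 # illustrative ||beta*||_inf and epsilon_n
Q = 2.0 * (beta_sup + eps)               # largest possible |pi_ij|
b_n0 = mu1(Q)                            # lower bound in (4)
grid = [-Q + 2.0 * Q * k / 2000 for k in range(2001)]

print("b_n0 =", round(b_n0, 4))
print("min mu' on grid =", round(min(mu1(x) for x in grid), 4), "(should be >= b_n0)")
print("max mu' on grid =", round(max(mu1(x) for x in grid), 4), "(should be <= 1/4)")
print("max |mu''| on grid =", round(max(abs(mu2(x)) for x in grid), 4), "(should be <= 1/4)")
```

The grid maximum of $|\mu''|$ is about 0.096 (the exact maximum is $1/(6\sqrt{3})$), so $b_{n2} = 1/4$ in (4) is a valid but loose bound.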

3.1. Consistency

To establish the consistency of $\hat{\beta}$, we first define a system of functions:

$$H_i(\beta) = \sum_{j=0}^{n} t_{ij}\,\mu_{ij}(\beta) - a_i, \quad i = 0, \dots, n, \tag{5}$$

and $H(\beta) = (H_1(\beta), \dots, H_n(\beta))^\top$. Clearly, $H(\hat{\beta}) = 0$. Let $H'(\beta)$ be the Jacobian matrix of $H(\beta)$ with respect to $\beta$. The asymptotic behavior of $\hat{\beta}$ depends crucially on the inverse of $H'(\beta)$. For convenience, denote $H'(\beta)$ by $V = (v_{ij})_{i,j=1,\dots,n}$, where

$$v_{ij} = -t_{ij}\,\mu'(\pi_{ij}),\ i \neq j, \qquad v_{ii} = \sum_{j=0, j \neq i}^{n} t_{ij}\,\mu'(\pi_{ij}).$$

Define

$$v_{i0} = v_{0i} := -\Big(v_{ii} + \sum_{j=1, j \neq i}^{n} v_{ij}\Big) = -t_{0i}\,\mu'(\pi_{0i}),\ i = 1, \dots, n, \qquad v_{00} = \sum_{j=1}^{n} t_{0j}\,\mu'(\pi_{0j}).$$

When $\beta \in B(\beta^*, \epsilon_n)$ and $\min_{i,j} \mu'(\pi_{ij}) > 0$, in view of inequality (3b), the entries of V satisfy the following inequalities:

$$\begin{aligned} &\text{if } t_{i0} > 0: && t_{i0}\,b_{n0} \leq v_{ii} + \sum_{j=1, j \neq i}^{n} v_{ij} \leq t_{i0}\,b_{n1}, && i = 1, \dots, n, \\ &\text{if } t_{ij} > 0: && t_{ij}\,b_{n0} \leq -v_{ij} \leq t_{ij}\,b_{n1}, && i, j = 1, \dots, n;\ i \neq j. \end{aligned} \tag{6}$$

Without loss of generality, we assume hereafter that $\min_{i \neq j} \mu'(\pi_{ij}) > 0$ when $\beta \in B(\beta^*, \epsilon_n)$ (otherwise, we redefine $H_i(\beta) = a_i - \sum_{j \neq i} t_{ij}\,\mu_{ij}(\beta)$ and repeat a similar argument). Our proof of consistency relies crucially on the existence of the inverse of V, which requires V to have full rank. It is easy to show that V is positive semi-definite, since it is symmetric and diagonally dominant with nonnegative diagonal entries. Thus, if V has full rank, then V must be positive definite. The following lemma assures the existence of the inverse of V.

Lemma 1.

Assume that $\min_{i,j} \mu'_{ij}(\beta) > 0$. With probability at least $1 - (1 - p_n)^{nT}$, $H'(\beta)$ is positive definite.

Because $\log(1 - x) \leq -x$ for $x \in (0, 1)$, we have

$$(1 - p_n)^{nT} = e^{nT \log(1 - p_n)} \leq e^{-p_n T n}.$$

Hence, the probability that $V^{-1}$ does not exist is at most $e^{-p_n T n}$, which goes to zero exponentially fast. In general, the inverse of V has no closed form. Ref. [20] proposed to approximate $V^{-1}$ by the matrix $S = (s_{ij})_{n \times n}$, where

$$s_{ij} = \frac{\delta_{ij}}{v_{ii}} + \frac{1}{v_{00}}. \tag{7}$$

In the above equation, $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise. By extending the proof of [20] to the sparse case, an upper bound on the approximation error $\|V^{-1} - S\|_{\max}$ is given in Lemma A2.
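The quality of the approximation $S \approx V^{-1}$ in (7) is easy to probe numerically. The sketch below is illustrative only: it assumes NumPy, a Thurstone model ($\mu' = \phi$), an Erdös–Rényi design with $T = 1$, and hypothetical choices of n, $p_n$ and seed; `build_V` and `approx_inverse` are helper names introduced here. It builds the full $(n+1) \times (n+1)$ array holding $v_{ij}$, $v_{ii}$ and $v_{00}$, takes $V$ as its lower-right $n \times n$ block, and compares $V^{-1}$ with S in the max norm.

```python
import numpy as np

def build_V(t, beta):
    """(n+1)x(n+1) array: v_ij = -t_ij phi(pi_ij) off the diagonal, v_ii = sum_j t_ij phi(pi_ij)."""
    diff = beta[:, None] - beta[None, :]
    W = t * np.exp(-0.5 * diff ** 2) / np.sqrt(2.0 * np.pi)   # t_ij * mu'(pi_ij) with mu' = phi
    V_full = -W
    np.fill_diagonal(V_full, W.sum(axis=1))                   # t_ii = 0, so this sums over j != i
    return V_full

def approx_inverse(V_full):
    """S from (7): s_ij = delta_ij / v_ii + 1 / v_00 for i, j = 1, ..., n."""
    d = np.diag(V_full)
    return np.diag(1.0 / d[1:]) + 1.0 / d[0]

rng = np.random.default_rng(1)
n, p_n = 200, 0.5
upper = np.triu(rng.binomial(1, p_n, size=(n + 1, n + 1)), k=1)
t = (upper + upper.T).astype(float)
beta = np.arange(n + 1) * 0.3 * np.log(n) / n

V_full = build_V(t, beta)
V = V_full[1:, 1:]                          # Jacobian H'(beta); beta_0 = 0 is fixed
S = approx_inverse(V_full)
err = np.abs(np.linalg.inv(V) - S).max()
print("||V^{-1} - S||_max =", err, "  max_ij s_ij =", S.max())
```

In runs of this kind, the max-norm error is expected to be much smaller than the entries of S themselves, which is what makes S usable as a closed-form stand-in for $V^{-1}$.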

Recall that the proof of consistency in the Bradley–Terry model [10,14] consists of two parts. Let $\hat{u}_i = e^{\hat{\beta}_i}$, $u_i = e^{\beta_i}$, $i_0 = \arg\max_i \hat{u}_i/u_i$ and $i_1 = \arg\min_i \hat{u}_i/u_i$. Since $\hat{u}_0/u_0 = 1$, it suffices to show that the ratio of subject $i_0$, $\hat{u}_{i_0}/u_{i_0}$, and the ratio of subject $i_1$, $\hat{u}_{i_1}/u_{i_1}$, are very close. Exploiting the convenient algebraic properties of the logistic function $\mu(x) = e^x/(1 + e^x)$, the first part shows that there are a number of subjects satisfying the inequalities

$$b \sum_{j:\, t_{i_0 j} > 0} \Big( \frac{\hat{u}_{i_0}}{u_{i_0}} - \frac{\hat{u}_j}{u_j} \Big) \leq c, \qquad b \sum_{j:\, t_{i_1 j} > 0} \Big( \frac{\hat{u}_j}{u_j} - \frac{\hat{u}_{i_1}}{u_{i_1}} \Big) \leq c, \tag{8}$$

where b and c are certain numbers. The second part eliminates the common terms $\hat{u}_j/u_j$ based on the condition that the number of common neighbors of any two subjects, $\min_{i,j} \#\{k : t_{ik} > 0, t_{jk} > 0\}$, is at least $\tau n$, where $\tau = 1$ in [10] and $\tau \in (0, 1)$ in [14]. In the Erdös–Rényi comparison graph, [13] further showed that there is at least one subject whose ratio is close to both $\hat{u}_{i_0}/u_{i_0}$ and $\hat{u}_{i_1}/u_{i_1}$.

The aforementioned proof strategies are built on the premise that the MLE exists, which is guaranteed by the necessary and sufficient condition that the directed graph with the win–loss matrix as its adjacency matrix is strongly connected [19]. As discussed before, it may be difficult to find a minimal sufficient condition guaranteeing the existence of $\hat{\beta}$ in general paired comparison models. To overcome this difficulty, we obtain the convergence rate of the Newton iterative sequence for solving Equation (2). Under the well-known Newton–Kantorovich conditions, the Newton iterative sequence converges, and its limit is the solution. To this end, we apply an adjusted version of the Newton–Kantorovich theorem in [21], which not only guarantees the existence of the solution but also gives an optimal error bound for the Newton iterative sequence.

Now, we formally state the consistency result.

Theorem 1.

Assume that Conditions (3a), (3b) and (3c) hold. If $b_{n1}^4 b_{n2}/(b_{n0}^6 p_n^4) = o\big((n/\log n)^{1/2}\big)$, then $\hat{\beta}$ exists with probability approaching one and is uniformly consistent in the sense that

$$\|\hat{\beta} - \beta^*\|_\infty = O_p\Big( \frac{b_{n1}^2}{b_{n0}^3 p_n^2} \sqrt{\frac{\log n}{n}} \Big) = o_p(1). \tag{9}$$

To see how small $p_n$ can be, consider the special case in which $\beta^*$ is a constant vector, so that $b_{n0}$, $b_{n1}$ and $b_{n2}$ are also constants. By Theorem 1, if $(\log n/n)^{1/8} = o(p_n)$, then $\|\hat{\beta} - \beta^*\|_\infty = O_p\big(p_n^{-2} (\log n/n)^{1/2}\big)$.
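A little arithmetic shows how slowly this threshold decays. The sketch below (plain Python; the function names are illustrative) evaluates $(\log n/n)^{1/8}$, which $p_n$ must dominate, and the rate $p_n^{-2}(\log n/n)^{1/2}$ in the constant-$b$ case at an arbitrary $p_n = 0.5$.

```python
import math

def sparsity_threshold(n):
    """(log n / n)^(1/8): with constant b's, Theorem 1 needs (log n / n)^(1/8) = o(p_n)."""
    return (math.log(n) / n) ** 0.125

def consistency_rate(n, p_n):
    """Order p_n^(-2) (log n / n)^(1/2) of the error bound in Theorem 1 with constant b's."""
    return p_n ** -2 * math.sqrt(math.log(n) / n)

for n in (100, 1_000, 10_000, 1_000_000):
    print(f"n = {n:>9,}: threshold = {sparsity_threshold(n):.3f}, "
          f"rate at p_n = 0.5: {consistency_rate(n, 0.5):.4f}")
```

The threshold is still about 0.25 at $n = 10^6$, so in this special case the condition of Theorem 1 permits only a slowly vanishing $p_n$.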

3.2. Asymptotic Normality of $\hat{β}$

We establish the asymptotic distribution of $\hat{\beta}$ by characterizing its asymptotic representation. Specifically, we apply a second-order Taylor expansion to $H(\hat{\beta})$ and find that $\hat{\beta} - \beta^*$ can be represented as the sum of a main term $V^{-1}(a - E^* a)$ and an asymptotically negligible remainder term, where $E^*$ denotes the expectation conditional on $\{t_{ij} : i, j = 0, \dots, n\}$. Because $V^{-1}$ has no closed form, we use the matrix S defined in (7) to approximate it. We formally state the asymptotic normality of $\hat{\beta}$ as follows.

Theorem 2.

Let $V = \partial H(\beta^*)/\partial \beta$ and $U = (u_{ij}) := \mathrm{Var}(a \mid t_{ij}, 0 \leq i, j \leq n)$. If $b_{n2} b_{n1}^6 b_{n0}^{-9} p_n^{-6} = o(n^{1/2}/\log n)$, then for any fixed k, the vector $(\hat{\beta}_1 - \beta_1^*, \dots, \hat{\beta}_k - \beta_k^*)$ is asymptotically k-dimensional multivariate normal with mean zero and covariance matrix $\Sigma = (\sigma_{ij})_{k \times k}$, where

$$\sigma_{ij} = \frac{\delta_{ij}\,u_{ii}}{v_{ii}^2} + \frac{u_{00}}{v_{00}^2}. \tag{10}$$
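Formula (10) is straightforward to evaluate once the diagonal entries $u_{ii}$ and $v_{ii}$ are available. A minimal NumPy sketch; `asymptotic_covariance` is a hypothetical helper, and its inputs are assumed to be indexed from 0 (the baseline subject) to n.

```python
import numpy as np

def asymptotic_covariance(u_diag, v_diag, k):
    """Sigma in (10): sigma_ij = delta_ij * u_ii / v_ii^2 + u_00 / v_00^2, i, j = 1, ..., k.

    u_diag[i] = u_ii and v_diag[i] = v_ii for i = 0, ..., n; index 0 is the baseline subject.
    """
    u = np.asarray(u_diag, dtype=float)
    v = np.asarray(v_diag, dtype=float)
    common = u[0] / v[0] ** 2                     # shared term coming from the baseline beta_0 = 0
    return np.diag(u[1:k + 1] / v[1:k + 1] ** 2) + common

# When U = V (e.g., the Bradley-Terry model), Sigma reduces to delta_ij / v_ii + 1 / v_00.
v = np.array([2.0, 4.0, 5.0])
Sigma = asymptotic_covariance(v, v, k=2)
print(Sigma)
```

The constant off-diagonal term shows that all estimated merits are positively correlated through their shared comparison with the baseline subject.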

Remark 1.

If $U = V$, then $\sigma_{ij} = \delta_{ij}/v_{ii} + 1/v_{00}$. When $F(\cdot)$ belongs to the exponential family (e.g., the Bradley–Terry model), U is identical to V. If $U \neq V$, then the asymptotic variance of $\hat{\beta}_i$ involves the additional factor $u_{ii}$. If $\|\beta^*\|_\infty$ is bounded above by a constant, the asymptotic variance of $\hat{\beta}_i$ is of order $(n p_n)^{-1}$, i.e., its asymptotic standard deviation is of order $(n p_n)^{-1/2}$.

4. Application to the Thurstone Model

In this section, we illustrate the unified theoretical results with an application to the Thurstone model.

The original Thurstone model includes a variance $\sigma^2$ in the normal distribution, i.e., the probability that subject i is preferred over j is $F((\beta_i - \beta_j)/\sigma)$. Since only the ratios $\beta_i/\sigma$ are identifiable, we simply set $\sigma = 1$ hereafter. Recall that $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ is the standard normal density function and $\Phi(x) = \int_{-\infty}^{x} \phi(s)\,ds$ is the standard normal distribution function. In the Thurstone model, $\mu(x) = \Phi(x)$. Then,

$$\mu'(x) = \phi(x), \qquad \mu''(x) = -\frac{x}{\sqrt{2\pi}}\,e^{-x^2/2}.$$

Since $\phi(x)$ is a decreasing function of $|x|$, for $|x| \leq Q_n$ we have

$$\frac{1}{\sqrt{2\pi}}\,e^{-Q_n^2/2} \leq \phi(x) \leq \frac{1}{\sqrt{2\pi}}.$$

Let $h(x) = x e^{-x^2/2}$. Then, $h'(x) = (1 - x^2) e^{-x^2/2}$. Therefore, $h(x)$ is increasing in x on $(0, 1)$ and decreasing on $(1, \infty)$. As a result, for $x > 0$, $h(x)$ attains its maximum value $e^{-1/2}$ at $x = 1$. Since h is an odd function, we have $|h(x)| \leq e^{-1/2} \approx 0.61$, so that $|\mu''(x)| = |h(x)|/\sqrt{2\pi} \leq (2\pi e)^{-1/2}$. When $\beta \in B(\beta^*, \epsilon_n)$, we have $|\pi_{ij}| \leq Q_n := 2(\|\beta^*\|_\infty + \epsilon_n)$. Therefore,

$$b_{n0} = \frac{1}{\sqrt{2\pi}}\,e^{-2(\|\beta^*\|_\infty + \epsilon_n)^2}, \qquad b_{n1} = \frac{1}{\sqrt{2\pi}}, \qquad b_{n2} = (2\pi e)^{-1/2}.$$
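These constants can be sanity-checked on a grid. In the sketch below (plain Python), $\|\beta^*\|_\infty$ and $\epsilon_n$ are arbitrary illustrative values, and the range of $\pi_{ij}$ is taken to be $|x| \leq Q_n = 2(\|\beta^*\|_\infty + \epsilon_n)$, the largest value $|\beta_i - \beta_j|$ can take when $\beta \in B(\beta^*, \epsilon_n)$.

```python
import math

def phi(x):
    """Standard normal density, mu'(x) in the Thurstone model."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def dphi(x):
    """mu''(x) = -x phi(x)."""
    return -x * phi(x)

beta_sup, eps = 1.0, 0.05                    # illustrative ||beta*||_inf and epsilon_n
Q = 2.0 * (beta_sup + eps)                   # bound on |pi_ij| for beta in B(beta*, eps)
b_n0 = math.exp(-Q * Q / 2.0) / math.sqrt(2.0 * math.pi)
b_n1 = 1.0 / math.sqrt(2.0 * math.pi)
b_n2 = 1.0 / math.sqrt(2.0 * math.pi * math.e)

grid = [-Q + 2.0 * Q * k / 4000 for k in range(4001)]
print(f"b_n0 = {b_n0:.4f}, b_n1 = {b_n1:.4f}, b_n2 = {b_n2:.4f}")
print("min/max of mu' on grid:", round(min(map(phi, grid)), 4), round(max(map(phi, grid)), 4))
print("max |mu''| on grid:", round(max(abs(dphi(x)) for x in grid), 4))
```

The bound $b_{n2}$ is attained at $|x| = 1$, while $b_{n0}$ shrinks like $e^{-2\|\beta^*\|_\infty^2}$, which is what drives the conditions in the corollary below when $\|\beta^*\|_\infty$ grows.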

In view of Theorems 1 and 2, we have the following corollary.

Corollary 1.

If $b_{n1}^4 b_{n2}/(b_{n0}^6 p_n^2) = o\big((n/\log n)^{1/2}\big)$ and $p_n > 24 \log n/n$, then $\hat{\beta}$ exists with probability approaching one and is uniformly consistent in the sense that

$$\|\hat{\beta} - \beta^*\|_\infty = O_p\Big( \frac{b_{n1}^2}{b_{n0}^3 p_n} \sqrt{\frac{\log n}{n}} \Big) = o_p(1). \tag{11}$$

Let $V = \partial H(\beta^*)/\partial \beta$. If $b_{n2} b_{n1}^6 b_{n0}^{-9} = o(n^{1/2}/\log n)$, then for any fixed k, the vector $(\hat{\beta}_1 - \beta_1^*, \dots, \hat{\beta}_k - \beta_k^*)$ is asymptotically k-dimensional multivariate normal with mean zero and covariance matrix $\Sigma = (\sigma_{ij})_{k \times k}$ defined in (10).

5. Extension to a Fixed Sparse Design

In some applications, such as sports, the comparison graph may be fixed rather than random. For example, in the regular season of the National Football League (NFL), games are scheduled in advance. More specifically, the 32 NFL teams belong to 2 conferences and are divided into 8 divisions of 4 teams each. In the regular season, each team plays 16 games: 6 within its division and 10 against teams from other divisions. Motivated by this design, ref. [14] proposed a sparsity condition that controls the length of paths between any two subjects to be 2 or 3:

$$\tau_n := \min_{0 \leq i < j \leq n} \frac{\#\{k : t_{ik} > 0,\ t_{jk} > 0\}}{t}.$$

That is, $\tau_n$ is the minimum ratio of the total number of paths of length 2 or 3 between any two subjects i and j. A similar sparsity property holds under the Erdös–Rényi comparison graph. Specifically, the set of common neighbors of any two subjects i and j has size at least

$$\#\{k : t_{ik} > 0,\ t_{jk} > 0\} \geq \frac{1}{2}(n - 1)p_n^2$$

with probability at least $1 - O(1/n)$ if $n p_n^2 \geq 24 \log n$; see (A12) in the proof of Lemma A2.
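The common-neighbor bound can be illustrated by simulation. This is a sketch, not a proof: it assumes NumPy, $T = 1$, a hypothetical n and seed, and takes the smallest $p_n$ allowed by $n p_n^2 \geq 24\log n$. Counting common neighbors for all pairs at once uses the fact that, for the 0/1 adjacency matrix $E$ with $E_{ik} = 1\{t_{ik} > 0\}$, the $(i, j)$ entry of $E^2$ equals $\#\{k : t_{ik} > 0, t_{jk} > 0\}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, T = 200, 1
p_n = np.sqrt(24 * np.log(n) / n)          # smallest p_n with n p_n^2 >= 24 log n
upper = np.triu(rng.binomial(T, p_n, size=(n + 1, n + 1)), k=1)
t = upper + upper.T                         # symmetric comparison counts, zero diagonal
E = (t > 0).astype(int)                     # adjacency indicator 1{t_ik > 0}
common = E @ E                              # common[i, j] = #{k : t_ik > 0, t_jk > 0}
iu = np.triu_indices(n + 1, k=1)            # all pairs i < j
min_common = common[iu].min()
bound = 0.5 * (n - 1) * p_n ** 2
print(f"p_n = {p_n:.3f}; smallest common-neighbor count = {min_common}; bound = {bound:.1f}")
```

Because $n p_n^2 \geq 24\log n$ forces $p_n$ to be fairly large for moderate n (here about 0.8), the bound is far from tight in this example.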

For ease of exposition, and in accordance with the setting above, we assume that any two subjects that are compared are compared T times. Similar to Lemma A2, the error of approximating $V^{-1}$ by S satisfies

$$\|V^{-1} - S\|_{\max} \leq \frac{2T^2 b_{n1}^2 \rho_{\max}}{b_{n0}^3 \tau_n^3 n^2},$$

where $\rho_{\max} = t_{\max}/n$ and $t_{\max} = \max_i t_i$. With arguments similar to those in the proofs of Theorems 1 and 2, we obtain the following theorem, whose proof is omitted.

Theorem 3.

Assume that Conditions (3a), (3b) and (3c) hold. If $b_{n1}^4 b_{n2}/(b_{n0}^6 \tau_n^2) = o\big((n/\log n)^{1/2}\big)$, then $\hat{\beta}$ exists with probability approaching one and is uniformly consistent in the sense that

$$\|\hat{\beta} - \beta^*\|_\infty = O_p\Big( \frac{b_{n1}^2}{b_{n0}^3 \tau_n} \sqrt{\frac{\log n}{n}} \Big) = o_p(1). \tag{12}$$

Let $V = \partial H(\beta^*)/\partial \beta$ and $U = (u_{ij}) := \mathrm{Var}(a \mid t_{ij}, 0 \leq i, j \leq n)$. If $b_{n2} b_{n1}^6 b_{n0}^{-9} \tau_n^{-3} = o(n^{1/2}/\log n)$, then for any fixed k, the vector $(\hat{\beta}_1 - \beta_1^*, \dots, \hat{\beta}_k - \beta_k^*)$ is asymptotically k-dimensional multivariate normal with mean zero and covariance matrix $\Sigma = (\sigma_{ij})_{k \times k}$, where $\sigma_{ij}$ is given in (10).

6. Numerical Studies

In this section, we evaluate the asymptotic results for the moment estimator in the Thurstone model through simulation studies and a real data example.

6.1. Simulation Studies

We carry out simulations to evaluate the finite-sample performance of the moment estimator in the Thurstone model. We set $T = 1$, which means that any pair has one comparison with probability $p_n$ and no comparison with probability $1 - p_n$. Let c be a constant. We set the merit parameters to be linear in the subject index, i.e., $\beta_i^* = i c \log n / n$ for $i = 1, \dots, n$, with $\beta_0^* = 0$. We considered three values of c: $c = 0.3, 0.5, 0.8$. By allowing $\|\beta^*\|_\infty$ to grow with n, we intended to assess the asymptotic properties under different asymptotic regimes.

To see how small $p_n$ can be, we first evaluate how often the win–loss graph $G_n$ fails to be strongly connected, since strong connectivity is a necessary condition for the existence of the moment estimator. We fix $c = 0.4$. The results, based on 1000 repetitions, are shown in Table 1. The necessary condition failed in every simulation when $p_n = \log n / n$, while it held with a frequency of almost $100\%$ when $p_n = (\log n / n)^{1/2}$. This shows that the rate at which $p_n$ tends to zero must be controlled.

Based on Theorem 2, and since (10) gives $\sigma_{ii} + \sigma_{jj} - 2\sigma_{ij} = u_{ii}/v_{ii}^2 + u_{jj}/v_{jj}^2$, the statistic

$$\hat{\xi}_{ij} = \frac{\hat{\beta}_i - \hat{\beta}_j - (\beta_i^* - \beta_j^*)}{(\hat{u}_{ii}/\hat{v}_{ii}^2 + \hat{u}_{jj}/\hat{v}_{jj}^2)^{1/2}}$$

converges in distribution to the standard normal distribution, where $\hat{u}_{ii}$ and $\hat{v}_{ii}$ are the estimates of $u_{ii}$ and $v_{ii}$ obtained by replacing $\beta^*$ with $\hat{\beta}$. Therefore, we assessed the asymptotic normality of $\hat{\xi}_{ij}$ via the coverage probability of the $95\%$ confidence interval and the length of the confidence interval. The number of times that $\hat{\beta}$ failed to exist was also recorded. Two values, $n = 100$ and $n = 200$, were considered for the number of subjects. Each simulation was repeated 10,000 times.
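For the Thurstone model, the plug-in quantities are $\hat{u}_{ii} = \sum_j t_{ij}\Phi(\hat{\pi}_{ij})\{1 - \Phi(\hat{\pi}_{ij})\}$, because $a_i$ is a sum of independent binomial counts given the $t_{ij}$, and $\hat{v}_{ii} = \sum_j t_{ij}\phi(\hat{\pi}_{ij})$, where $\hat{\pi}_{ij} = \hat{\beta}_i - \hat{\beta}_j$. The sketch below (NumPy; the helper names are hypothetical) turns them into a 95% confidence interval for $\beta_i - \beta_j$ using the variance $u_{ii}/v_{ii}^2 + u_{jj}/v_{jj}^2$, which equals $\sigma_{ii} + \sigma_{jj} - 2\sigma_{ij}$ from (10) for $i, j \geq 1$; for $j = 0$ the same expression reduces to $\sigma_{ii}$ because $\hat{\beta}_0 = 0$.

```python
import numpy as np
from math import erf, sqrt

Phi = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0))))

def plug_in_u_v(t, beta_hat):
    """Diagonals u_ii = Var(a_i | t) and v_ii of V for the Thurstone model, evaluated at beta_hat."""
    diff = beta_hat[:, None] - beta_hat[None, :]
    p = Phi(diff)
    u = (t * p * (1.0 - p)).sum(axis=1)                                    # sum_j t_ij p_ij (1 - p_ij)
    v = (t * np.exp(-0.5 * diff ** 2) / np.sqrt(2.0 * np.pi)).sum(axis=1)  # sum_j t_ij phi(pi_ij)
    return u, v

def ci_difference(beta_hat, t, i, j, z=1.96):
    """Approximate 95% CI for beta_i - beta_j with variance u_ii/v_ii^2 + u_jj/v_jj^2."""
    u, v = plug_in_u_v(t, beta_hat)
    se = np.sqrt(u[i] / v[i] ** 2 + u[j] / v[j] ** 2)
    est = beta_hat[i] - beta_hat[j]
    return est - z * se, est + z * se

# Toy input: 4 subjects, every pair compared 3 times, all fitted merits equal to zero.
t = 3.0 * (1.0 - np.eye(4))
lo, hi = ci_difference(np.zeros(4), t, 1, 2)
print(lo, hi)
```

In a simulation, an interval of this form covers $\beta_i^* - \beta_j^*$ exactly when $|\hat{\xi}_{ij}| \leq 1.96$, so recording coverage and interval length amounts to tracking $\hat{\xi}_{ij}$ and its denominator.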

The simulation results are shown in Table 2. When $p_{n} = {(\log n / n)}^{1 / 4}$, all simulated coverage probabilities are very close to the target level of $95 \%$. When $p_{n} = {(\log n / n)}^{1 / 2}$, by contrast, they are slightly below the nominal level for $n = 100$ and very close to $95 \%$ for $n = 200$. The length of the confidence interval decreases as $n$ increases, which qualitatively agrees with the theory. As expected, for fixed $n$, the length of the confidence interval increases as $p_{n}$ decreases. Finally, for fixed $n$ and $p_{n}$, the lengths of the confidence intervals differ little across the three values of $c$.

6.2. A Real Data Example

We use the 2018 NFL regular season data as an illustrative example, which are available from https://www.espn.com/nfl/schedule/_/year/2018 (accessed on 20 August 2025). The NFL consists of thirty-two teams that are divided evenly into two conferences, and each conference has four divisions of four teams each. In the regular season, each team plays each of the three other teams in its division twice and plays ten teams from other divisions once each, for a total of sixteen games. As discussed in [14], the design of the NFL regular season satisfies the sparsity condition of the fixed comparison graph with $τ_{n} = 1 / 16$. We removed two ties before our analysis. The merits obtained by fitting the Thurstone model to the remaining data are given in Table 3, where the “Arizona Cardinals”, the team with the fewest wins, is used as the baseline (with ${\hat{β}}_{0} = 0$).

It is interesting to compare the six playoff seeds of each conference, as determined by the NFL rules, with the ordering by the fitted merits in Table 3. The NFL rules are based on the regular-season won–lost percentage (PCT) and can be briefly summarized as follows: the team with the best PCT in each division wins the division, and the four division winners in each conference are seeded one through four; the two other teams in each conference with the best PCT are seeded five and six. The six playoff seeds in the American Football Conference, from No. 1 to No. 6, based on the PCT are Kansas City Chiefs, New England Patriots, Pittsburgh Steelers, Houston Texans, Los Angeles Chargers, and Indianapolis Colts, while the teams selected based on the fitted merits are Kansas City Chiefs, New England Patriots, Houston Texans, Baltimore Ravens, Los Angeles Chargers, and Pittsburgh Steelers. The corresponding six playoff seeds in the National Football Conference based on the PCT are Los Angeles Rams, New Orleans Saints, Chicago Bears, Washington Redskins, Seattle Seahawks, and Carolina Panthers, while the teams selected based on the fitted merits are Los Angeles Rams, New Orleans Saints, Chicago Bears, Dallas Cowboys, Seattle Seahawks, and Philadelphia Eagles. As we can see, the top three teams selected based on the PCT and on the fitted merits in each conference are the same, whereas the teams selected as No. 4, No. 5, and No. 6 are not all the same.

7. Summary and Discussion

We have studied moment estimation based on the scores of subjects in paired comparison models under sparse comparison graphs, and we have established the uniform consistency and asymptotic normality of the moment estimator. Consistency is shown by obtaining the convergence rate of the Newton iterative sequence. This leads to a condition on the sparsity parameter $p_{n}$: when $β^{*}$ is a constant vector, it requires ${(\log n / n)}^{1 / 8} = o (p_{n})$. This condition appears much stronger than that for the Bradley–Terry model in [13]; since we consider a general class of models, a more stringent condition is perhaps to be expected. On the other hand, the condition imposed on $b_{n 0}$ may not be the best possible. In particular, the conditions guaranteeing asymptotic normality seem stronger than those needed for consistency. Note that the asymptotic behavior of the moment estimator depends not only on $b_{n 0}$ but also on the configuration of all parameters. It would be of interest to investigate whether these conditions could be relaxed.

In this paper, we assume that, given the comparison graph, all paired comparisons are independent. Note, however, that the moment equation holds regardless of whether the comparisons are independent, so the moment estimator can still be applied to dependent comparisons. The consistency result in Theorem 1 continues to hold as long as the upper bound on $| a_{i} - E^{*} a_{i} |$ in Lemma A4 holds with the same order. In fact, a check of our proofs shows that the independence assumption is used directly in only two places: in Lemma A4, where the upper bound on $| a_{i} - E^{*} a_{i} |$ is derived from the Hoeffding inequality, and in the derivation of the central limit theorem for $a_{i} - E^{*} a_{i}$. In the dependent case, there are also many Hoeffding-type exponential tail inequalities (e.g., [22,23,24]) and central limit theorems for sums of sequences of random variables (e.g., [25,26]) that could be applied.

Building on the theoretical and empirical findings of this study, we identify two promising avenues for further exploration, which extend the moment estimation framework to more complex and practical scenarios:

1. Moment estimation for paired comparison models with dependent outcomes. This paper assumes that all paired comparison outcomes are independent, which is a standard but restrictive assumption in many real-world settings (e.g., crowdsourcing evaluations where raters may have consistent biases, or sports leagues where team performance is serially correlated). A natural extension is to develop moment estimation methods for models with dependent outcomes, such as those incorporating Markovian dependence or exchangeable correlation structures. Key challenges include deriving unbiased moment equations under dependence and establishing asymptotic properties (consistency and asymptotic normality) using tools from the theory of dependent random variables (e.g., Hoeffding-type inequalities for associated variables [22,23,24]).

2. High-dimensional sparse paired comparison models with structured merit parameters. This study focuses on unstructured merit parameters, i.e., the $β_{i}$ are not assumed to share any common structure. However, in many applications, merit parameters exhibit inherent structures, such as group-level homogeneity (e.g., teams in the same sports division share similar strengths) or sparsity (e.g., only a few subjects have distinct merits in large-scale crowdsourcing). Extending the moment estimation framework to incorporate such structures (e.g., group lasso-penalized moment estimation or sparse merit parameter inference) could improve estimation efficiency. Key research questions include designing computationally feasible penalized moment equations and establishing oracle properties for the structured estimators.

These directions align with the core theme of sparse paired comparison inference and address practical limitations of the current work. We believe that pursuing them will both extend the theoretical scope of moment estimation and broaden its applicability to complex real-world problems.

Author Contributions

Conceptualization, Q.W.; Data curation, Q.W.; Formal analysis, Q.W.; Funding acquisition, Q.W.; Investigation, Q.W. and L.P.; Methodology, Q.W. and T.Y.; Software, Q.W. and L.P.; Supervision, T.Y.; Validation, Q.W. and T.Y.; Visualization, Q.W. and L.P.; Writing—original draft, Q.W.; Writing—review and editing, Q.W., L.P. and T.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 12301386.

Informed Consent Statement

Not applicable.

Data Availability Statement

We use the 2018 NFL regular season data as an example, which is available from https://www.espn.com/nfl/schedule/_/year/2018 (accessed on 20 August 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

In this section, we present the proofs of the theorems.

Appendix A.1. Preliminaries

We first state two preliminary results that will be used in the proofs. The first is the optimal error bound for Newton's method given in [21] under the Kantorovich conditions [27].

Lemma A1

([21]). Let $X$ and $Y$ be Banach spaces, let $D$ be an open convex subset of $X$, and let $F: D \subseteq X \to Y$ be Fréchet differentiable. Assume that, at some $x_{0} \in D$, $F^{'} (x_{0})$ is invertible and that

∥ F^{'} {(x_{0})}^{- 1} (F^{'} (x) - F^{'} (y)) ∥ \leq K ∥ x - y ∥, \quad x, y \in D,

(A1)

∥ F^{'} {(x_{0})}^{- 1} F (x_{0}) ∥ \leq η, \quad h = K η \leq 1 / 2, \quad \bar{S} (x_{0}, t^{*}) \subseteq D, \quad t^{*} = 2 η / (1 + \sqrt{1 - 2 h}) .

(A2)

Then: (1) The Newton iterates $x_{n + 1} = x_{n} - F^{'} {(x_{n})}^{- 1} F (x_{n})$ , $n \geq 0$ are well-defined, lie in $\bar{S} (x_{0}, t^{*})$ and converge to a solution $x^{*}$ of $F (x) = 0$ .
(2) The solution $x^{*}$ is unique in $S (x_{0}, t^{* *}) \cap D$ , $t^{* *} = (1 + \sqrt{1 - 2 h}) / K$ if $2 h < 1$ and in $\bar{S} (x_{0}, t^{* *})$ if $2 h = 1$ .
(3) $∥ x^{*} - x_{n} ∥ \leq t^{*}$ if $n = 0$ and $∥ x^{*} - x_{n} ∥ \leq 2^{1 - n} {(2 h)}^{2^{n} - 1} η$ if $n \geq 1$ .
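The error bounds in part (3) can be tabulated directly from $K$ and $η$. The sketch below (the function name is ours) illustrates the doubly exponential decay of the bounds when $2 h < 1$, and the fact that the $n = 0$ bound satisfies $t^{*} \leq 2 η$.

```python
import math

def kantorovich_bounds(K, eta, steps):
    """Error bounds of Lemma A1 for the Newton iterates.

    Returns (t_star, t_star_star, bounds), where bounds[k] bounds ||x* - x_k||:
    bounds[0] = t*, and bounds[k] = 2^(1-k) (2h)^(2^k - 1) eta for k >= 1.
    Requires h = K * eta <= 1/2.
    """
    h = K * eta
    if h > 0.5:
        raise ValueError("Kantorovich condition h = K * eta <= 1/2 fails")
    root = math.sqrt(1.0 - 2.0 * h)
    t_star = 2.0 * eta / (1.0 + root)   # radius containing all iterates
    t_star_star = (1.0 + root) / K      # uniqueness radius
    bounds = [t_star]
    for k in range(1, steps + 1):
        bounds.append(2.0 ** (1 - k) * (2.0 * h) ** (2 ** k - 1) * eta)
    return t_star, t_star_star, bounds
```

With $K = 0.25$ and $η = 1$ (so $h = 1/4$), the bounds after one, two, and three steps are $0.5$, $0.0625$, and $1/512$.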

The second result bounds the error of approximating $V^{- 1}$ by $S$; its proof is given in Appendix B.

Lemma A2.

If $p_{n} \geq 24 \log n / n$, then, for sufficiently large $n$, with a probability of at least $1 - O (n^{- 1})$, we have

{∥ V^{- 1} - S ∥}_{max} \leq \frac{12 T b_{n 1}^{2}}{b_{n 0}^{3} n (n - 1) p_{n}^{3}} .
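Lemma A2 can be illustrated numerically. The sketch uses the entries $s_{i j} = δ_{i j} / v_{i i} + 1 / v_{00}$ of $S$ that appear in the proof in Appendix B.2, generic symmetric edge weights $w_{i j}$ in place of $t_{i j} μ^{'} (π_{i j})$ (a hypothetical input), and a naive Gauss–Jordan inverse. For the complete graph with unit weights, the max-norm error is exactly $2 / (n (n + 1))$, while the diagonal entries of $S$ equal $2 / n$.

```python
def invert(A):
    """Gauss-Jordan inverse (with partial pivoting) of a small nonsingular matrix."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def v_and_s(weights):
    """Build V (weighted Laplacian on subjects 0..n with row/column 0 removed)
    and S with s_ij = delta_ij / v_ii + 1 / v_00."""
    m = len(weights)  # m = n + 1 subjects
    deg = [sum(weights[i][k] for k in range(m) if k != i) for i in range(m)]
    V = [[deg[i] if i == j else -weights[i][j] for j in range(1, m)]
         for i in range(1, m)]
    S = [[(1.0 / deg[i] if i == j else 0.0) + 1.0 / deg[0] for j in range(1, m)]
         for i in range(1, m)]
    return V, S

def max_abs_diff(A, B):
    """Entrywise max-norm of A - B."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))
```

Replacing the unit weights by those of a sparse random graph shows how the error grows as the graph becomes sparser, in line with the $p_{n}^{- 3}$ factor in the bound.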

Appendix A.2. Proof of Theorem 1

We prove Theorem 1 by using Lemma A1 to obtain the convergence rate of the Newton iterative sequence, which requires verifying the Kantorovich conditions (A1) and (A2). Condition (A1) depends on the Lipschitz continuity of $H_{i}^{'} (β)$. Recall that $t_{max} = {max}_{i = 0, \dots, n} t_{i}$ and $t_{min} = {min}_{i = 0, \dots, n} t_{i}$.

Lemma A3.

Let $D = B (β^{*}, ϵ_{n}) (\subset R^{n})$ be an open convex set containing the true point $β^{*}$. For any given set $\{t_{i j}, 0 \leq i, j \leq n\}$, if Inequality (3c) holds, then

max_{i = 0, \dots, n} {∥ H_{i}^{'} (x) - H_{i}^{'} (y) ∥}_{1} \leq 4 b_{n 2} t_{max} {∥ x - y ∥}_{\infty} .

Condition (A2), in turn, depends on the magnitudes of $| a_{i} - E (a_{i} | t_{i j}, j = 0, \dots, n) |$, $i = 0, \dots, n$, which are bounded in the following lemma.

Lemma A4.

With a probability of at least $1 - O (1 / n)$, we have

max_{i = 0, \dots, n} | a_{i} - E (a_{i} | t_{i j}, j = 0, \dots, n) | \leq \sqrt{2 t_{max} \log n} .
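As noted in Section 7, independence enters Lemma A4 only through the Hoeffding inequality. A sketch of how a bound of this form can be obtained is given below, assuming that, conditional on $\{t_{i j}\}$, $a_{i}$ is a sum of $t_{i}$ independent $\{0, 1\}$-valued outcomes and writing $P^{*}$ for the conditional probability (our notation); the constants need not match those of the authors' proof.

```latex
\begin{aligned}
P^{*}\big( | a_{i} - E^{*} a_{i} | \geq s \big)
  &\leq 2 \exp\Big( - \frac{2 s^{2}}{t_{i}} \Big)
   \leq 2 \exp( - 4 \log n ) = 2 n^{-4},
   \qquad s = \sqrt{2 t_{max} \log n}, \; t_{i} \leq t_{max}, \\
P^{*}\Big( \max_{0 \leq i \leq n} | a_{i} - E^{*} a_{i} | \geq s \Big)
  &\leq \sum_{i = 0}^{n} 2 n^{-4} = 2 (n + 1) n^{-4} = O(n^{-3}) .
\end{aligned}
```

Since $O(n^{-3})$ is of smaller order than $O(n^{-1})$, this union bound is more than sufficient for the probability statement in Lemma A4.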

The following lemma gives a lower bound on $t_{min}$ and upper bounds on $t_{max}$ and $\sum_{i} t_{i}$.

Lemma A5.

(1) With probability at least $1 - (n + 1) exp (- \frac{1}{8} n T p_{n})$,

t_{min} = min_{i = 0, \dots, n} t_{i} \geq \frac{1}{2} n T p_{n} .

(2) With probability at least $1 - (n + 1) exp (- \frac{1}{10} n T p_{n})$,

t_{max} = max_{i = 0, \dots, n} t_{i} \leq \frac{3}{2} n T p_{n} .

(3) With probability at least $1 - exp (- \frac{1}{10} n (n + 1) T p_{n})$,

\sum_{i = 0}^{n} t_{i} \leq 3 n (n + 1) T p_{n} .

We are now ready to prove Theorem 1.

Proof of Theorem 1.

Note that $\hat{β}$ is the solution to the equation $H (β) = 0$. We prove consistency by obtaining the convergence rate of the Newton iterative sequence $β^{(k + 1)} = β^{(k)} - {[H^{'} (β^{(k)})]}^{- 1} H (β^{(k)})$, where we set $β^{(0)} := β^{*}$. To apply Lemma A1, we choose the convex set $D = B (β^{*}, ϵ_{n})$. The following calculations are based on the event $E_{n}$:

\Big\{ \{t_{i j}, 0 \leq i, j \leq n\} : \max_{i} | a_{i} - E (a_{i} | t_{i j}, j = 0, \dots, n) | \leq \sqrt{2 t_{max} \log n}, \; H^{'} (β) > 0, \; t_{min} \geq \frac{1}{2} n T p_{n}, \; t_{max} \leq \frac{3}{2} n T p_{n} \Big\} .

Note that $b_{n 0} \leq | μ_{i j}^{'} (β) | \leq b_{n 1}$ when $β \in B (β^{*}, ϵ_{n})$. Let $V = {(v_{i j})}_{n \times n} = H^{'} (β^{*})$. We use $S$ defined in (7) to approximate $V^{- 1}$ and let $W = V^{- 1} - S$. We verify the Kantorovich conditions in Lemma A1 as follows. Since $\sum_{i = 0}^{n} H_{i} (β) = 0$, we have

\sum_{i = 1}^{n} H_{i} (β) = - H_{0} (β) .

Based on Lemma A3, we have

\begin{aligned} & {∥ {[H^{'} (β^{*})]}^{- 1} [H^{'} (x) - H^{'} (y)] ∥}_{\infty} \\ & \leq {∥ S [H^{'} (x) - H^{'} (y)] ∥}_{\infty} + {∥ W [H^{'} (x) - H^{'} (y)] ∥}_{\infty} \\ & \leq \max_{i = 1, \dots, n} \frac{1}{v_{i i}} {∥ H_{i}^{'} (x) - H_{i}^{'} (y) ∥}_{1} + \frac{1}{v_{00}} {∥ H_{0}^{'} (x) - H_{0}^{'} (y) ∥}_{1} + {∥ W ∥}_{\infty} {∥ H^{'} (x) - H^{'} (y) ∥}_{\infty} \\ & \leq \Big[ \frac{2}{b_{n 0} t_{min}} + n \cdot \frac{12 b_{n 1}^{2} T}{n (n - 1) b_{n 0}^{3} p_{n}^{3}} \Big] \times 4 b_{n 2} t_{max} \times {∥ x - y ∥}_{\infty} \\ & = O \Big( \frac{b_{n 1}^{2} b_{n 2}}{b_{n 0}^{3} p_{n}^{2}} \Big) \times {∥ x - y ∥}_{\infty} . \end{aligned}

Thus, we can set $K = O (b_{n 1}^{2} b_{n 2} b_{n 0}^{- 3} p_{n}^{- 2})$ in (A1). Again, based on the event $E_{n}$, we have

\begin{aligned} η & = {∥ {[H^{'} (β^{*})]}^{- 1} H (β^{*}) ∥}_{\infty} \\ & \leq n {∥ V^{- 1} - S ∥}_{max} {∥ H (β^{*}) ∥}_{\infty} + \max_{i = 1, \dots, n} \frac{| H_{i} (β^{*}) |}{v_{i i}} + \frac{| H_{0} (β^{*}) |}{v_{00}} \\ & \leq \Big[ O \Big( \frac{b_{n 1}^{2}}{n b_{n 0}^{3} p_{n}^{3}} \Big) + O \Big( \frac{1}{b_{n 0} t_{min}} \Big) \Big] \times O ({(t_{max} \log n)}^{1 / 2}) \\ & = O \Big( \frac{b_{n 1}^{2}}{b_{n 0}^{3} p_{n}^{2}} \sqrt{\frac{\log n}{n}} \Big) . \end{aligned}

If

K η = O (\frac{b_{n 1}^{4} b_{n 2}}{b_{n 0}^{6} p_{n}^{4}} \sqrt{\frac{\log n}{n}}) = o (1),

(A3)

then Condition (A2) holds. Based on Lemma A1, ${lim}_{k} β^{(k)}$ exists; denoting it by $\hat{β}$, part (3) of Lemma A1 with $n = 0$ gives $∥ \hat{β} - β^{*} ∥_{\infty} \leq t^{*} \leq 2 η$, so that

∥ \hat{β} - β^{*} ∥_{\infty} = O (\frac{b_{n 1}^{2}}{b_{n 0}^{3} p_{n}^{2}} \sqrt{\frac{\log n}{n}}) .

Based on Lemmas 1, A2, A4 and A5, the event $E_{n}$ holds with a probability of at least $1 - O (n^{- 1})$ if $p_{n} \geq 24 \log n / n$. Note that (A3) implies $p_{n} \geq 24 \log n / n$. This completes the proof. □
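In practice, the moment equation $H (β) = 0$ can be solved with the same Newton iteration used in the proof. Because $β^{*}$ is unknown, the sketch below starts from $β = 0$ rather than $β^{(0)} = β^{*}$. It assumes the Thurstone link $μ = Φ$, fixes $β_{0} = 0$, and solves the $n$ equations $H_{1} = \dots = H_{n} = 0$ ($H_{0}$ is then implied by $\sum_{i} H_{i} = 0$); the Jacobian entries follow the partial derivatives computed in the proof of Lemma A3. The helper names are hypothetical, and no step-size safeguard is included, so the iteration can fail when the estimator does not exist (e.g., when the win–loss graph is not strongly connected).

```python
import math

def Phi(x):
    """Standard normal CDF (assumed Thurstone link mu)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density (mu')."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def moment_newton(t, a, tol=1e-10, max_iter=100):
    """Newton iteration beta <- beta - [H'(beta)]^{-1} H(beta), with beta_0 = 0.

    t[i][j]: number of comparisons between i and j; a[i]: score (wins) of i.
    H_i(beta) = sum_{j != i} t_ij * Phi(beta_i - beta_j) - a_i, i = 1..n.
    """
    m = len(a)
    beta = [0.0] * m
    for _ in range(max_iter):
        H = [sum(t[i][j] * Phi(beta[i] - beta[j]) for j in range(m) if j != i) - a[i]
             for i in range(1, m)]
        J = [[0.0] * (m - 1) for _ in range(m - 1)]
        for i in range(1, m):
            for j in range(m):
                if j != i and t[i][j] > 0:
                    d = t[i][j] * phi(beta[i] - beta[j])
                    J[i - 1][i - 1] += d          # dH_i / dbeta_i
                    if j >= 1:
                        J[i - 1][j - 1] -= d      # dH_i / dbeta_j
        step = solve_linear(J, H)
        for i in range(1, m):
            beta[i] -= step[i - 1]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

For instance, with ten comparisons between every pair of three subjects and scores $a = (7, 10, 13)$, the symmetry of $Φ$ forces ${\hat{β}}_{2} = 2 {\hat{β}}_{1}$, and the iteration returns ${\hat{β}}_{1} \approx 0.26$.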

Appendix A.3. Proof of Theorem 2

Write ${Var}^{*}$ and $E^{*}$ for the conditional variance and conditional expectation given $\{t_{i j}, 0 \leq i, j \leq n\}$. Let $U = (u_{i j}) := {Var}^{*} (a)$. In the Bradley–Terry model, $U = H^{'} (β^{*})$. Note that $a_{i}$ is a sum of $t_{i}$ independent Bernoulli random variables. Based on Lemma A5, $\frac{1}{2} n T p_{n} \leq {min}_{i} t_{i} \leq \frac{3}{2} n T p_{n}$ with high probability, so ${min}_{i} t_{i}$ is of order $n p_{n}$. Let $σ_{min} = {min}_{i \neq j} p_{i j} (1 - p_{i j})$. If $n p_{n} σ_{min} \to \infty$, then ${min}_{i} u_{i i} \to \infty$. Based on the central limit theorem for bounded random variables, as in [28] (p. 289), if $n p_{n} σ_{min} \to \infty$, then $u_{i i}^{- 1 / 2} (a_{i} - E^{*} (a_{i}))$ converges in distribution to the standard normal distribution. When considering the asymptotic behavior of the vector $(a_{1}, \dots, a_{r})$ with a fixed $r$, one can replace the scores $a_{1}, \dots, a_{r}$ by the independent random variables ${\tilde{a}}_{i} = a_{i, r + 1} + \dots + a_{i n}$, $i = 1, \dots, r$. Therefore, we have the following proposition.

Proposition A1.

If $n p_{n} σ_{min} \to \infty$, then, as $n \to \infty$, for any fixed $r \geq 1$, the components of $(a_{1} - E^{*} (a_{1}), \dots, a_{r} - E^{*} (a_{r}))$ are asymptotically independent and normally distributed with variances $u_{11}, \dots, u_{r r}$, respectively. Moreover, the first $r$ rows of $S (a - E^{*} (a))$ are asymptotically normal with covariance matrix $Σ = (σ_{i j})$, where

σ_{i j} = \frac{δ_{i j} u_{i i}}{v_{i i}^{2}} + \frac{u_{00}}{v_{00}^{2}} .
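A short worked consequence of Proposition A1: for fixed $1 \leq i \neq j \leq r$, the asymptotic variance of the difference of two components of $S (a - E^{*} (a))$ follows directly from $Σ$, and the baseline contribution $u_{00} / v_{00}^{2}$, shared by all components, cancels. The same calculation applies to ${\hat{β}}_{i} - {\hat{β}}_{j}$ once $\hat{β} - β^{*}$ is approximated by $S (E^{*} a - a)$, as in the proof of Theorem 2; because ${\hat{β}}_{0} = 0$ is fixed, a comparison with the baseline retains the $u_{00} / v_{00}^{2}$ term.

```latex
\begin{aligned}
σ_{ii} + σ_{jj} - 2 σ_{ij}
  &= \Big( \frac{u_{ii}}{v_{ii}^{2}} + \frac{u_{00}}{v_{00}^{2}} \Big)
   + \Big( \frac{u_{jj}}{v_{jj}^{2}} + \frac{u_{00}}{v_{00}^{2}} \Big)
   - 2 \, \frac{u_{00}}{v_{00}^{2}}
   = \frac{u_{ii}}{v_{ii}^{2}} + \frac{u_{jj}}{v_{jj}^{2}}, \qquad i \neq j, \\
σ_{ii}
  &= \frac{u_{ii}}{v_{ii}^{2}} + \frac{u_{00}}{v_{00}^{2}}
   \qquad \text{(the $i$th component alone, i.e., a comparison with the baseline)} .
\end{aligned}
```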

Lemma A6.

Let $V = H^{'} (β^{*})$, $W = V^{- 1} - S$ and ${Cov}^{*} (\cdot) = Cov (\cdot | t_{i j}, 0 \leq i, j \leq n)$. Then,

{Cov}^{*} (W H (β^{*})) = O_{p} (\frac{b_{n 1}^{5}}{n^{2} b_{n 0}^{6} p_{n}^{5}}) .

Further, if $U = H^{'} (β^{*})$, then

{Cov}^{*} (W H (β^{*})) = O_{p} (\frac{b_{n 1}^{2}}{n^{2} b_{n 0}^{3} p_{n}^{5}}) .

Now, we are ready to prove Theorem 2.

Proof of Theorem 2.

Let ${\hat{π}}_{i j} = {\hat{β}}_{i} - {\hat{β}}_{j}$ and $π_{i j}^{*} = β_{i}^{*} - β_{j}^{*}$. Based on Theorem 1, $\hat{β} \in B (β^{*}, ϵ_{n})$. To simplify notation, write $μ_{i j}^{'} = μ^{'} (π_{i j}^{*})$. Based on a second-order Taylor expansion, we have

t_{i j} μ ({\hat{π}}_{i j}) - t_{i j} μ (π_{i j}^{*}) = t_{i j} μ_{i j}^{'} ({\hat{β}}_{i} - β_{i}^{*}) - t_{i j} μ_{i j}^{'} ({\hat{β}}_{j} - β_{j}^{*}) + g_{i j}, \quad i \neq j,

(A4)

where $g_{i j}$ is the second-order remainder term:

g_{i j} = \frac{1}{2} t_{i j} μ^{″} ({\tilde{π}}_{i j}) [{({\hat{β}}_{i} - β_{i}^{*})}^{2} + {({\hat{β}}_{j} - β_{j}^{*})}^{2} - 2 ({\hat{β}}_{i} - β_{i}^{*}) ({\hat{β}}_{j} - β_{j}^{*})] .

In the above equation, ${\tilde{π}}_{i j}$ lies between $π_{i j}^{*}$ and ${\hat{π}}_{i j}$. If $b_{n 1}^{4} b_{n 2} b_{n 0}^{- 6} p_{n}^{- 4} = o ({(n / \log n)}^{1 / 2})$, then, based on Theorem 1, we have

∥ \hat{β} - β^{*} ∥_{\infty} = O_{p} (\frac{b_{n 1}^{2}}{b_{n 0}^{3} p_{n}^{2}} \sqrt{\frac{\log n}{n}}) .

Therefore, in view of (3c), $| μ^{″} ({\tilde{π}}_{i j}) | \leq b_{n 2}$, so that

| g_{i j} | \leq 4 b_{n 2} t_{i j} {∥ \hat{β} - β^{*} ∥}_{\infty}^{2} .

(A5)

Let $g_{i} = \sum_{j \neq i} g_{i j}$, $i = 0, \dots, n$, and $g = {(g_{1}, \dots, g_{n})}^{⊤}$. Then, based on Lemma A5 (2), we have

max_{i = 0, \dots, n} | g_{i} | \leq 4 b_{n 2} t_{max} \cdot O_{p} (\frac{b_{n 1}^{4} \log n}{n b_{n 0}^{6} p_{n}^{4}}) = O_{p} (\frac{b_{n 1}^{4} b_{n 2} \log n}{b_{n 0}^{6} p_{n}^{3}}) .

(A6)

Writing the equations in (A4) in matrix form, we have

E^{*} a - a = V (\hat{β} - β^{*}) + g .

(A7)

Equivalently,

\hat{β} - β^{*} = V^{- 1} (E^{*} a - a) + V^{- 1} g .

(A8)

Similarly, we have

\begin{matrix} E^{*} a_{0} - a_{0} & = & \frac{\partial H_{0} (β^{*})}{\partial β} (\hat{β} - β^{*}) + \frac{1}{2} {(\hat{β} - β^{*})}^{⊤} \frac{\partial^{2} H_{0} (\tilde{β})}{\partial β \partial β^{⊤}} (\hat{β} - β^{*}) \\ = & - \sum_{i = 1}^{n} v_{i 0} ({\hat{β}}_{i} - β_{i}^{*}) + \frac{1}{2} v_{i 0} {({\hat{β}}_{i} - β_{i}^{*})}^{2} . \end{matrix}

Therefore, based on Lemma A5 (2), we have

| E^{*} a_{0} - a_{0} + \sum_{i = 1}^{n} v_{i 0} ({\hat{β}}_{i} - β_{i}^{*}) | = O_{p} (\frac{b_{n 1}^{4} t_{max} \log n}{n b_{n 0}^{6} p_{n}^{4}}) = O_{p} (\frac{b_{n 1}^{4} \log n}{b_{n 0}^{6} p_{n}^{3}}) .

Note that $\sum_{i = 0}^{n} (a_{i} - E^{*} (a_{i})) = 0$. Multiplying both sides of (A7) by the row vector with all elements equal to 1 yields

\sum_{i = 1}^{n} g_{i} = a_{0} - E^{*} a_{0} + \sum_{i = 1}^{n} v_{i 0} ({\hat{β}}_{i} - β_{i}^{*}) .

Therefore, we have

| \sum_{i = 1}^{n} g_{i} | = O_{p} (\frac{b_{n 1}^{4} \log n}{b_{n 0}^{6} p_{n}^{3}}) .

(A9)

Based on (A6) and Lemma A2, we have

\begin{aligned} {∥ V^{- 1} g ∥}_{\infty} & \leq {∥ S g ∥}_{\infty} + {∥ (V^{- 1} - S) g ∥}_{\infty} \\ & \leq \max_{i = 1, \dots, n} \frac{| g_{i} |}{v_{i i}} + \frac{1}{v_{00}} | \sum_{i = 1}^{n} g_{i} | + n {∥ V^{- 1} - S ∥}_{max} {∥ g ∥}_{\infty} \\ & = O_{p} \Big( \frac{b_{n 2}}{t_{min} b_{n 0}} + \frac{b_{n 1}^{2}}{n b_{n 0}^{3} p_{n}^{3}} \Big) \times O_{p} \Big( \frac{b_{n 1}^{4} b_{n 2} \log n}{b_{n 0}^{6} p_{n}^{3}} \Big) \\ & = O_{p} \Big( \frac{b_{n 2} b_{n 1}^{6} \log n}{n b_{n 0}^{9} p_{n}^{6}} \Big) . \end{aligned}

If $b_{n 2} b_{n 1}^{6} b_{n 0}^{- 9} p_{n}^{- 6} = o (n^{1 / 2} / \log n)$, then we have

{\hat{β}}_{i} - β_{i}^{*} = {[V^{- 1} (E^{*} a - a)]}_{i} + o_{p} (n^{- 1 / 2}) .

(A10)

Consequently, in view of Lemma A6, we have

{\hat{β}}_{i} - β_{i}^{*} = {[S (E^{*} a - a)]}_{i} + o_{p} (n^{- 1 / 2}) .

Therefore, Theorem 2 follows immediately from Proposition A1. □

Appendix B

In this section, we present the proofs of the supporting lemmas.

Appendix B.1. Proof of Lemma 1

Proof of Lemma 1.

For an arbitrary nonzero vector $x = {(x_{1}, \dots, x_{n})}^{⊤} \in R^{n}$, direct calculations give

\begin{aligned} x^{⊤} V x & = \sum_{i = 1}^{n} x_{i}^{2} v_{i i} + \sum_{i = 1}^{n} \sum_{j = 1, j \neq i}^{n} x_{i} v_{i j} x_{j} \\ & = - \sum_{i = 1}^{n} \sum_{j = 1, j \neq i}^{n} x_{i}^{2} v_{i j} - \sum_{i = 1}^{n} x_{i}^{2} v_{i 0} + \sum_{i = 1}^{n} \sum_{j = 1, j \neq i}^{n} x_{i} v_{i j} x_{j} \\ & = - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1, j \neq i}^{n} {(x_{i} - x_{j})}^{2} v_{i j} - \sum_{i = 1}^{n} x_{i}^{2} v_{i 0}, \end{aligned}

where the second equality follows from $v_{i i} = - \sum_{j = 0, j \neq i}^{n} v_{i j}$. Since $v_{i j} \leq 0$ for $i \neq j$ (including $j = 0$), every term in the last expression is nonnegative. Therefore, $x^{⊤} V x = 0$ if and only if

x_{i} v_{i 0} = 0, \quad i = 1, \dots, n, \qquad v_{i j} (x_{i} - x_{j}) = 0, \quad 1 \leq i \neq j \leq n .

Because $μ^{'} (π_{i j}) \neq 0$ and $v_{i j} = - t_{i j} μ^{'} (π_{i j})$ for $i \neq j$, the above equations are equivalent to

x_{i} t_{i 0} = 0, \quad i = 1, \dots, n, \qquad t_{i j} (x_{i} - x_{j}) = 0, \quad 1 \leq i \neq j \leq n .
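The identity for $x^{⊤} V x$ at the start of this proof can be sanity-checked numerically. The sketch uses random symmetric nonnegative weights $w_{i j}$ in place of $t_{i j} μ^{'} (π_{i j})$ (a hypothetical input), sets $v_{i j} = - w_{i j}$ for $i \neq j$ and $v_{i i} = - \sum_{j \neq i} v_{i j}$ over $j = 0, \dots, n$, and returns the difference between the two sides, which should vanish up to rounding error.

```python
import random

def quad_form_identity_gap(n, seed=0):
    """Return x^T V x minus [-1/2 sum_{i != j} (x_i - x_j)^2 v_ij - sum_i x_i^2 v_i0]."""
    rng = random.Random(seed)
    m = n + 1  # subjects 0..n
    w = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            w[i][j] = w[j][i] = rng.random()
    # v_ij = -w_ij off the diagonal; v_ii = -sum_{j != i} v_ij over j = 0..n
    v = [[-w[i][j] if i != j else sum(w[i][k] for k in range(m) if k != i)
          for j in range(m)] for i in range(m)]
    x = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(n)]  # x[0] is unused
    lhs = sum(x[i] * v[i][j] * x[j] for i in range(1, m) for j in range(1, m))
    rhs = (-0.5 * sum((x[i] - x[j]) ** 2 * v[i][j]
                      for i in range(1, m) for j in range(1, m) if j != i)
           - sum(x[i] ** 2 * v[i][0] for i in range(1, m)))
    return lhs - rhs
```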

Let $E$ be the event

\{ \{t_{i j}\}_{i, j = 0, i \neq j}^{n} : x_{i} t_{i 0} = 0, i = 1, \dots, n, \; t_{i j} (x_{i} - x_{j}) = 0, 1 \leq i \neq j \leq n \} .

To show Lemma 1, it is sufficient to obtain an upper bound on the probability of the event $E$. We evaluate this probability in two cases: $x$ has some zero elements, and $x$ has no zero elements.

Case I: $x$ has some zero elements. Let $\{0, x_{i_{1}}, \dots, x_{i_{k}}\}$ be the $k + 1$ distinct values in $\{x_{1}, \dots, x_{n}\}$, let $Ω_{0} = \{i : x_{i} = 0\}$, and let $Ω_{j} = \{q : x_{q} = x_{i_{j}}\}$, $j = 1, \dots, k$. Since $x \neq 0$, $k \geq 1$. It is clear that

| Ω_{0} | > 0, \quad | Ω_{j} | > 0, \quad j = 1, \dots, k, \quad \sum_{i = 0}^{k} | Ω_{i} | = n + 1,

(A11)

where $| Ω_{i} |$ denotes the cardinality of $Ω_{i}$. Therefore, we have

\begin{aligned} P (E) & = {(1 - p_{n})}^{T \sum_{j = 1}^{k} | Ω_{j} |} \times \prod_{0 \leq i < j \leq k} {(1 - p_{n})}^{T | Ω_{i} | | Ω_{j} |} \\ & = {(1 - p_{n})}^{T (\sum_{j = 1}^{k} | Ω_{j} | + \sum_{0 \leq i < j \leq k} | Ω_{i} | | Ω_{j} |)} . \end{aligned}

To obtain an upper bound on $P (E)$, it is sufficient to find the minimum of $\sum_{j = 1}^{k} | Ω_{j} | + \sum_{0 \leq i < j \leq k} | Ω_{i} | | Ω_{j} |$ under the constraint (A11). Let $y_{i} = | Ω_{i} |$. Then,

\begin{aligned} & \sum_{j = 1}^{k} y_{j} + \sum_{0 \leq i < j \leq k} y_{i} y_{j} \\ & = \sum_{j = 1}^{k} y_{j} + \frac{1}{2} \sum_{i = 0}^{k} \sum_{j = 0, j \neq i}^{k} y_{i} y_{j} \\ & = \frac{1}{2} \sum_{i = 0}^{k} \sum_{j = 0}^{k} y_{i} y_{j} - \frac{1}{2} \sum_{i = 0}^{k} y_{i}^{2} + \sum_{j = 1}^{k} y_{j} \\ & = \frac{1}{2} {(\sum_{i = 0}^{k} y_{i})}^{2} - \frac{1}{2} \sum_{i = 1}^{k} {(y_{i} - 1)}^{2} - \frac{1}{2} y_{0}^{2} + \frac{1}{2} k \\ & = \frac{1}{2} ({(n + 1)}^{2} + k) - \frac{1}{2} \sum_{i = 0}^{k} z_{i}^{2}, \end{aligned}

where $z_{0} = y_{0}$ and $z_{i} = y_{i} - 1$, $i = 1, \dots, k$. Under the constraints $\sum_{i} z_{i} = n + 1 - k > 0$ and $z_{i} \geq 0$, the function $\sum_{i = 0}^{k} z_{i}^{2}$ attains its maximum at the points where one coordinate equals $n + 1 - k$ and all others are zero, i.e., $z = (0, \dots, n + 1 - k, 0, \dots, 0)$. Therefore, we have

\begin{matrix} \sum_{j = 1}^{k} y_{i} + \sum_{0 \leq i < j \leq k} y_{i} y_{j} \\ \geq & \frac{1}{2} ({(n + 1)}^{2} + k - {(n + 1 - k)}^{2}) = \frac{1}{2} (2 (n + 1) k + k - k^{2}) \\ = & \frac{1}{2} [- {(k - \frac{2 (n + 1) + 1}{2})}^{2} + {(n + 1 + \frac{1}{2})}^{2}] . \end{matrix}

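Because the above function is concave in $k$, its minimum over $1 \leq k \leq n$ is attained at an endpoint, and comparing $k = 1$ with $k = n$ shows that it is attained at $k = 1$. That is,

\begin{aligned} & - {(k - \frac{2 (n + 1) + 1}{2})}^{2} + {((n + 1) + \frac{1}{2})}^{2} \\ & \geq - {(1 - (n + 1) - 1 / 2)}^{2} + {(n + 1)}^{2} + (n + 1) + 1 / 4 \\ & = 2 (n + 1) . \end{aligned}

Hence, the exponent is at least $\frac{1}{2} \cdot 2 (n + 1) = n + 1$, which shows

P (E) \leq {(1 - p_{n})}^{T (n + 1)} .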
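The lower bound on the exponent in Case I can be confirmed by brute force for small $n$. The sketch below (function names are ours) enumerates all block sizes $(y_{0}, \dots, y_{k})$ satisfying (A11) with $1 \leq k \leq n$ and returns the minimum of $\sum_{j \geq 1} y_{j} + \sum_{0 \leq i < j \leq k} y_{i} y_{j}$, which equals $n + 1$, attained at $k = 1$ with $y_{0} = n$ and $y_{1} = 1$.

```python
from itertools import combinations

def compositions(total, parts):
    """Yield all tuples of `parts` positive integers summing to `total`."""
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0,) + cuts + (total,)
        yield tuple(bounds[k + 1] - bounds[k] for k in range(parts))

def case_one_exponent(y):
    """sum_{j >= 1} y_j + sum_{0 <= i < j <= k} y_i y_j for y = (y_0, ..., y_k)."""
    k = len(y) - 1
    pair_sum = sum(y[i] * y[j] for i in range(k + 1) for j in range(i + 1, k + 1))
    return sum(y[1:]) + pair_sum

def min_case_one(n):
    """Minimum of the exponent over all sizes allowed by (A11), 1 <= k <= n."""
    return min(case_one_exponent(y)
               for k in range(1, n + 1)
               for y in compositions(n + 1, k + 1))
```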

Case II: $x$ has no zero elements. With the same notation $Ω_{j}$ as in Case I, we have $| Ω_{0} | = 0$ and

\begin{aligned} P (E) & = {(1 - p_{n})}^{T \sum_{j = 1}^{k} | Ω_{j} |} \times \prod_{1 \leq i < j \leq k} {(1 - p_{n})}^{T | Ω_{i} | | Ω_{j} |} \\ & = {(1 - p_{n})}^{T (\sum_{j = 1}^{k} | Ω_{j} | + \sum_{1 \leq i < j \leq k} | Ω_{i} | | Ω_{j} |)} . \end{aligned}

It is sufficient to obtain the minimum of $\sum_{j = 1}^{k} | Ω_{j} | + \sum_{1 \leq i < j \leq k} | Ω_{i} | | Ω_{j} |$ under the constraints $\sum_{i} | Ω_{i} | = n$ and $k \geq 1$. Let $y_{i} = | Ω_{i} |$. Then,

\begin{aligned} \sum_{j = 1}^{k} y_{j} + \sum_{1 \leq i < j \leq k} y_{i} y_{j} & = \sum_{j = 1}^{k} y_{j} + \frac{1}{2} \sum_{i = 1}^{k} \sum_{j = 1, j \neq i}^{k} y_{i} y_{j} \\ & = \frac{1}{2} \sum_{i = 1}^{k} \sum_{j = 1}^{k} y_{i} y_{j} - \frac{1}{2} \sum_{i = 1}^{k} y_{i}^{2} + (n + 1) \\ & = \frac{1}{2} {(\sum_{i = 1}^{k} y_{i})}^{2} - \frac{1}{2} \sum_{i = 1}^{k} y_{i}^{2} + (n + 1) \\ & = \frac{1}{2} {(n + 1)}^{2} + (n + 1) - \frac{1}{2} \sum_{i = 1}^{k} y_{i}^{2} . \end{aligned}

Under the constraints $\sum_{i} y_{i} = n + 1 > 0$ and $y_{i} \geq 0$, the function $\sum_{i = 1}^{k} y_{i}^{2}$ attains its maximum at the points $y = (0, \dots, n + 1, 0, \dots, 0)$. Therefore, we have

\sum_{j = 1}^{k} y_{j} + \sum_{1 \leq i < j \leq k} y_{i} y_{j} \geq \frac{1}{2} {(n + 1)}^{2} + (n + 1) - \frac{1}{2} {(n + 1)}^{2} = n + 1 .

This shows

P (E) \leq {(1 - p_{n})}^{(n + 1) T} .

By combining the upper bounds on $P (E)$ in Cases I and II, we have

P (E^{c}) \geq 1 - {(1 - p_{n})}^{(n + 1) T},

where $E^{c}$ is the event that $V$ is positive definite. □

Appendix B.2. Proof of Lemma A2

Proof of Lemma A2.

Based on Lemma 1, $V$ is positive definite with a probability of at least $1 - e^{- p_{n} T (n + 1)}$. In what follows, we assume that $V$ is positive definite, so that its inverse exists. The proof proceeds in two parts. The first part establishes a lower bound on the number of common neighbors of any two subjects $i$ and $j$, that is, on

min_{i, j} \# \{k : t_{i k} > 0, t_{k j} > 0\} .

The second part shows an inequality of the form [cf. (A16)]

a \geq b [\sum_{\{k : t_{α k} > 0, t_{k β} > 0\}} (z_{i α} - z_{i β})] .

For the second part, we follow the proof in [20], with minor modifications that simplify it.

Part I. Let $1_{\{\cdot\}}$ be an indicator variable, which equals one when the expression in $\{\cdot\}$ is true and zero otherwise. For any given $i \neq j$, define

ξ_{i j} = \sum_{k = 0, k \neq i, j}^{n} 1_{\{t_{i k} > 0, t_{j k} > 0\}} .

Note that $ξ_{i j}$ is the sum of $n - 1$ independent Bernoulli random variables, and, for three distinct indices $i, j, k$,

E 1_{\{t_{i k} > 0, t_{j k} > 0\}} = P (t_{i k} > 0) P (t_{j k} > 0) = {(1 - {(1 - p_{n})}^{T})}^{2} := η_{n},

so that $E ξ_{i j} = (n - 1) η_{n}$.

Based on the Chernoff bound in [29], we have

P (ξ_{i j} \leq \frac{1}{2} (n - 1) η_{n}) \leq exp (- \frac{1}{8} (n - 1) η_{n}) .

It follows that

P (min_{i, j} ξ_{i j} \leq \frac{1}{2} (n - 1) η_{n}) \leq \sum_{i, j} P (ξ_{i j} \leq \frac{1}{2} (n - 1) η_{n}) \leq \frac{(n + 1) n}{2} exp (- \frac{1}{8} (n - 1) η_{n}) .

Since $T \geq 1$,

η_{n} = {(1 - {(1 - p_{n})}^{T})}^{2} \geq {(1 - (1 - p_{n}))}^{2} = p_{n}^{2} .

That is, with a probability of at least $1 - \frac{(n + 1) n}{2} exp (- \frac{1}{8} (n - 1) η_{n})$, we have

min_{i, j} ξ_{i j} \geq \frac{1}{2} (n - 1) η_{n} \geq \frac{1}{2} (n - 1) p_{n}^{2} .

(A12)

Part II. For convenience, we introduce a non-negative array $\{q_{i j}\}_{i, j = 1}^{n}$, where

q_{i j} := - v_{i j}, \quad i \neq j; \qquad q_{i i} := \sum_{k = 1}^{n} v_{i k} = - v_{i 0}, \qquad i, j = 1, \dots, n .

Let

\begin{aligned} m & := \min_{(i, j) : q_{i j} > 0} q_{i j}, & M & := \max_{i, j} q_{i j}, \\ t_{max} & := \max_{i} t_{i}, & t_{min} & := \min_{i} t_{i} . \end{aligned}

It is clear that $M \geq m > 0$ and

q_{i j} \geq 0, \quad q_{i j} = q_{j i}, \quad v_{i j} = - q_{i j}, \quad i \neq j, \qquad M t_{max} \geq v_{i i} = \sum_{k = 1}^{n} q_{i k} \geq m t_{min} .

Notice that

V^{- 1} - S = (V^{- 1} - S) (I - V S) + S (I - V S),

where $I$ is the $n \times n$ identity matrix. Let $X = I - V S$, $Y = S X$ and $Z = V^{- 1} - S$; then we have the recursion formula

Z = Z X + Y .

The goal is to obtain an upper bound on all $| z_{i j} |$.

According to the definitions of $S$, $V$ and $X$, we have

\begin{aligned} x_{i j} & = δ_{i j} - \sum_{k = 1}^{n} v_{i k} s_{k j} \\ & = δ_{i j} - \sum_{k = 1}^{n} v_{i k} (\frac{δ_{k j}}{v_{j j}} + \frac{1}{v_{00}}) \\ & = δ_{i j} - \frac{v_{i j}}{v_{j j}} - \frac{q_{i i}}{v_{00}} \\ & = (1 - δ_{i j}) \frac{q_{i j}}{v_{j j}} - \frac{q_{i i}}{v_{00}} \end{aligned}

and

\begin{matrix} y_{i j} & = & \sum_{k = 1}^{n} s_{i k} x_{k j} = \sum_{k = 1}^{n} (\frac{δ_{i k}}{v_{i i}} + \frac{1}{v_{00}}) ((1 - δ_{k j}) \frac{q_{k j}}{v_{j j}} - \frac{q_{k k}}{v_{00}}) \\ = & \sum_{k = 1}^{n} \frac{δ_{i k}}{v_{i i}} \{(1 - δ_{k j}) \frac{q_{k j}}{v_{j j}} - \frac{q_{k k}}{v_{00}}\} + \sum_{k = 1}^{n} \frac{1}{v_{00}} \{(1 - δ_{k j}) \frac{q_{k j}}{v_{j j}} - \frac{q_{k k}}{v_{00}}\} \\ = & \frac{(1 - δ_{i j}) q_{i j}}{v_{i i} v_{j j}} - \frac{q_{i i}}{v_{i i} v_{00}} - \frac{q_{j j}}{v_{j j} v_{00}} . \end{matrix}

Since

0 \leq \frac{q_{i j}}{v_{i i} v_{j j}} \leq \frac{M}{m^{2} t_{min}^{2}}, \qquad 0 \leq \frac{q_{i i}}{v_{i i} v_{00}} \leq \frac{M}{m^{2} t_{min}^{2}},

for any distinct $i, j, k$, we have

| y_{i j} | \leq a := \frac{2 M}{m^{2} t_{min}^{2}}, \qquad | y_{i j} - y_{i k} | \leq a .

(A13)

In view of the expressions for $x_{i j}$ and $y_{i j}$, we have

z_{i j} = \sum_{k = 1}^{n} z_{i k} (1 - δ_{k j}) \frac{q_{k j}}{v_{j j}} - \sum_{k = 1}^{n} z_{i k} \frac{q_{k k}}{v_{00}} + y_{i j}, \quad i, j = 1, \dots, n .

(A14)

Now, we fix an arbitrary $i$ and consider an upper bound on ${max}_{j} | z_{i j} |$.

Let $α$ and $β$ be such that

z_{i α} = max_{k = 1, \dots, n} z_{i k}, \qquad z_{i β} = min_{k = 1, \dots, n} z_{i k} .

Without loss of generality, we assume that $z_{i α} \geq | z_{i β} |$ (otherwise, we can reverse the sign of all $z_{i k}$ and repeat the same argument). Below, we show that $z_{i β} \leq 0$; note that this conclusion is not established in [20]. Multiplying both sides of (A14) by $v_{j j}$, we have

v_{j j} z_{i j} = \sum_{k = 1}^{n} z_{i k} (1 - δ_{k j}) q_{k j} - \sum_{k = 1}^{n} z_{i k} \frac{q_{k k} v_{j j}}{v_{00}} + v_{j j} y_{i j} .

Summing the above equations over $j = 1, \dots, n$, we have

\sum_{j = 1}^{n} v_{j j} z_{i j} = \sum_{k = 1}^{n} \sum_{j = 1}^{n} z_{i k} (1 - δ_{k j}) q_{k j} - \sum_{k = 1}^{n} \sum_{j = 1}^{n} z_{i k} \frac{q_{k k} v_{j j}}{v_{00}} + \sum_{j = 1}^{n} v_{j j} (\frac{(1 - δ_{i j}) q_{i j}}{v_{i i} v_{j j}} - \frac{q_{i i}}{v_{i i} v_{00}} - \frac{q_{j j}}{v_{j j} v_{00}}) .

Thus,

\begin{matrix} \sum_{k = 1}^{n} \sum_{j = 1}^{n} z_{i k} \frac{q_{k k} v_{j j}}{v_{00}} + \sum_{k = 1}^{n} z_{i k} q_{k k} \\ = & \sum_{j = 1}^{n} v_{j j} (\frac{(1 - δ_{i j}) q_{i j}}{v_{i i} v_{j j}} - \frac{q_{i i}}{v_{i i} v_{00}} - \frac{q_{j j}}{v_{j j} v_{00}}) \\ = & \sum_{j = 1}^{n} \frac{(1 - δ_{i j}) q_{i j}}{v_{i i}} - \frac{q_{i i} \sum_{j} v_{j j}}{v_{i i} v_{00}} - \sum_{j = 1}^{n} \frac{q_{j j}}{v_{00}} \\ = & - \frac{q_{i i}}{v_{i i}} - \frac{q_{i i} \sum_{j} v_{j j}}{v_{i i} v_{00}} = - \frac{q_{i i}}{v_{i i} v_{00}} (v_{00} + \sum_{j = 1}^{n} v_{j j}) \end{matrix}

Thus,

\begin{matrix} - \frac{q_{i i}}{v_{i i} v_{00}} (v_{00} + \sum_{j = 1}^{n} v_{j j}) & = & \sum_{k = 1}^{n} z_{i k} \frac{q_{k k} \sum_{j = 1}^{n} v_{j j}}{v_{00}} + \sum_{k = 1}^{n} z_{i k} q_{k k} \\ \geq & z_{i β} (v_{00} + \sum_{j = 1}^{n} v_{j j}), \end{matrix}

This shows

z_{i β} \leq - \frac{q_{i i}}{v_{i i} v_{00}} \leq 0 .

(A15)

Since $\sum_{k = 1}^{n} q_{k α} / v_{α α} = 1$, we have

\sum_{k = 1}^{n} z_{i α} \frac{q_{k α}}{v_{α α}} = \sum_{k = 1}^{n} z_{i k} (1 - δ_{k α}) \frac{q_{k α}}{v_{α α}} - \sum_{k = 1}^{n} z_{i k} \frac{q_{k k}}{v_{00}} + y_{i α} .

In other words,

\sum_{k = 1}^{n} [z_{i α} - z_{i k} (1 - δ_{k α})] \frac{q_{k α}}{v_{α α}} = - \sum_{k = 1}^{n} z_{i k} \frac{q_{k k}}{v_{00}} + y_{i α} .

Analogously, we have

\sum_{k = 1}^{n} [z_{i k} (1 - δ_{k β}) - z_{i β}] \frac{q_{k β}}{v_{β β}} = \sum_{k = 1}^{n} z_{i k} \frac{q_{k k}}{v_{00}} - y_{i β} .

Therefore,

\begin{aligned} y_{i α} - y_{i β} &= \sum_{k = 1}^{n} \left\{ [z_{i α} - z_{i k} (1 - δ_{k α})] \frac{q_{k α}}{v_{α α}} + [z_{i k} (1 - δ_{k β}) - z_{i β}] \frac{q_{k β}}{v_{β β}} \right\} \\ &\geq \left[ \sum_{k : t_{α k} > 0, t_{k β} > 0} (z_{i α} - z_{i β}) \right] \times \frac{m}{M t_{max}} . \end{aligned} \tag{A16}

The following calculations are carried out on the event

E_{n} = \left\{ \min_{i \neq j} ξ_{i j} \geq \frac{1}{2} (n - 1) p_{n}^{2}, \ t_{min} \geq \frac{1}{2} n T p_{n}, \ t_{max} \leq \frac{3}{2} n T p_{n} \right\} .

In view of (A15) and (A16), we have

\begin{aligned} z_{i α} &\leq z_{i α} - z_{i β} \leq \frac{M t_{max}}{m} \times {[\min_{i \neq j} ξ_{i j}]}^{- 1} \times \frac{2 M}{m^{2} t_{min}^{2}} \\ &\leq \frac{M \cdot \frac{3}{2} n T p_{n}}{m} \times \frac{1}{\frac{1}{2} (n - 1) p_{n}^{2}} \times \frac{2 M}{m^{2} {(\frac{1}{2} n T p_{n})}^{2}} \\ &= \frac{24 M^{2}}{n (n - 1) m^{3} T p_{n}^{3}} . \end{aligned} \tag{A17}
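The constant 24 in (A17) comes from multiplying three factors evaluated at the boundary of the event $E_n$. A quick way to double-check this arithmetic is to evaluate the middle and last lines of (A17) with exact rational arithmetic. The sketch below does this with Python's `fractions.Fraction`; the function names and the sample values of $M$, $m$, $n$, $T$ and $p_n$ are illustrative choices, not quantities taken from the paper.

```python
from fractions import Fraction


def a17_product(M, m, n, T, p):
    """Middle line of (A17): each factor evaluated at the boundary of E_n."""
    t_max = Fraction(3, 2) * n * T * p          # t_max <= (3/2) n T p_n on E_n
    t_min = Fraction(1, 2) * n * T * p          # t_min >= (1/2) n T p_n on E_n
    xi_min = Fraction(1, 2) * (n - 1) * p ** 2  # min xi_ij >= (1/2)(n - 1) p_n^2 on E_n
    return (M * t_max / m) * (1 / xi_min) * (2 * M / (m ** 2 * t_min ** 2))


def a17_closed_form(M, m, n, T, p):
    """Last line of (A17): 24 M^2 / (n (n - 1) m^3 T p_n^3)."""
    return 24 * M ** 2 / (n * (n - 1) * m ** 3 * T * p ** 3)


# Illustrative values; any positive M, m, p_n and integers n >= 2, T >= 1 work.
M, m, p = Fraction(3), Fraction(1, 2), Fraction(1, 10)
print(a17_product(M, m, 100, 2, p) == a17_closed_form(M, m, 100, 2, p))  # True
```

Exact fractions are used instead of floats so that the comparison is an identity check rather than an approximate one.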

Note that $M = T b_{n 1}$ and $m = b_{n 0}$. By Lemma 6 in the main text and the two tail bounds established in the proof of Lemma A5, we have

P (t_{min} \geq \frac{1}{2} n T p_{n}, t_{max} \leq \frac{3}{2} n T p_{n}) \geq 1 - (n + 1) e^{- \frac{1}{8} n T p_{n}} - (n + 1) e^{- \frac{1}{10} n T p_{n}} \geq 1 - 2 (n + 1) e^{- \frac{1}{10} n T p_{n}} .

In view of inequality (A12), we have

P (E_{n}) \geq 1 - \frac{(n + 1) n}{2} e^{- \frac{1}{8} (n - 1) p_{n}} - 2 (n + 1) e^{- \frac{1}{10} n T p_{n}} .

If $p_{n} \geq 24 \log n / n$, then, since $T \geq 1$,

\frac{(n + 1) n}{2} e^{- \frac{1}{8} (n - 1) p_{n}} = O (\frac{1}{n}), \quad 2 (n + 1) e^{- \frac{1}{10} n T p_{n}} \leq 2 (n + 1) n^{- 2.4} = O (\frac{1}{n^{1.4}}),

so that

P (E_{n}) \geq 1 - O (\frac{1}{n}) .
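To see how fast the exceptional probability of $E_n$ vanishes at the boundary rate $p_n = 24 \log n / n$, one can evaluate the tail terms numerically. The sketch below sums the $ξ_{ij}$ term from (A12) with the lower- and upper-tail Chernoff terms for $t_i$ from Lemma A5 (exponents $1/8$ and $1/10$). The helper name `failure_bound`, the default constant `c = 24`, and the choice $T = 1$ in the usage are assumptions made for illustration; only values of $n$ large enough that $p_n < 1$ are meaningful.

```python
import math


def failure_bound(n: int, T: int, c: float = 24.0) -> float:
    """Sum of the three tail terms that bound P(E_n^c) when p_n = c log n / n."""
    p = c * math.log(n) / n
    if not 0.0 < p < 1.0:
        raise ValueError("p_n = c log n / n must lie in (0, 1); increase n")
    xi_term = (n + 1) * n / 2 * math.exp(-(n - 1) * p / 8)  # xi_ij tail, inequality (A12)
    low_term = (n + 1) * math.exp(-n * T * p / 8)           # P(t_min < n T p_n / 2), Lemma A5
    high_term = (n + 1) * math.exp(-n * T * p / 10)         # P(t_max > 3 n T p_n / 2), Lemma A5
    return xi_term + low_term + high_term


for n in (200, 1000, 10000, 100000):
    # If P(E_n^c) = O(1/n), then n * bound should stay bounded.
    print(n, round(n * failure_bound(n, T=1), 3))
```

For these values of $n$ and $T = 1$, $n$ times the bound stays below 1, and the $ξ_{ij}$ term dominates, which is in line with the $O(1/n)$ rate stated above.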

Let $F_{n}$ be the event that $V^{- 1}$ exists. By Lemma 1, if $p_{n} \geq 24 \log n / n$, then, since $p_{n} T (n + 1) \geq 24 \log n$,

P (F_{n}) \geq 1 - \exp (- p_{n} T (n + 1)) \geq 1 - \frac{1}{n^{24}} .

Therefore,

P (E_{n} ⋂ F_{n}) \geq 1 - O (\frac{1}{n}) .

Substituting $M = T b_{n 1}$ and $m = b_{n 0}$ into (A17) proves Lemma 2. □

Appendix B.3. Proof of Lemma A3

Proof of Lemma A3.

Recall that $π_{i j} = β_{i} - β_{j}$ and

H_{i} (β) = \sum_{j \neq i} t_{i j} μ (π_{i j}) - a_{i}, \quad i = 1, \dots, n .

The Jacobian matrix $H^{'} (β)$ of $H (β)$ is computed as follows. Taking partial derivatives of $H_{i}$ with respect to $β_{j}$, $j \neq i$, and with respect to $β_{i}$, we have

\frac{\partial H_{i} (β)}{\partial β_{j}} = - t_{i j} μ^{'} (π_{i j}), \frac{\partial H_{i} (β)}{\partial β_{i}} = \sum_{j \neq i} t_{i j} μ^{'} (π_{i j}),

\frac{\partial^{2} H_{i} (β)}{\partial β_{i} \partial β_{j}} = - t_{i j} μ^{″} (π_{i j}), \frac{\partial^{2} H_{i} (β)}{\partial β_{i}^{2}} = \sum_{j \neq i} t_{i j} μ^{″} (π_{i j}) .
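The first-order formulas above can be sanity-checked numerically for a concrete choice of $μ$. The sketch below assumes the Thurstone link $μ = Φ$ (the standard normal CDF, so $μ' = φ$), a hypothetical four-subject comparison-count matrix `t` and win vector `a`, and treats all merits as free (it ignores any identifiability constraint such as fixing one merit). It compares the closed-form Jacobian with central finite differences of $H$.

```python
import math


def Phi(x: float) -> float:
    """Standard normal CDF: the Thurstone choice of mu."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x: float) -> float:
    """Standard normal density: mu' for the Thurstone model."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def H(beta, t, a):
    """Moment equations H_i(beta) = sum_{j != i} t_ij mu(beta_i - beta_j) - a_i."""
    n = len(beta)
    return [sum(t[i][j] * Phi(beta[i] - beta[j]) for j in range(n) if j != i) - a[i]
            for i in range(n)]


def analytic_jacobian(beta, t):
    """Jacobian built from the closed-form partial derivatives."""
    n = len(beta)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i:
                d = t[i][j] * phi(beta[i] - beta[j])
                J[i][j] = -d   # dH_i/dbeta_j = -t_ij mu'(pi_ij)
                J[i][i] += d   # dH_i/dbeta_i = sum_{j != i} t_ij mu'(pi_ij)
    return J


def numeric_jacobian(beta, t, a, h=1e-6):
    """Central finite differences of H, one column per perturbed merit."""
    n = len(beta)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(beta), list(beta)
        up[j] += h
        down[j] -= h
        Hu, Hd = H(up, t, a), H(down, t, a)
        for i in range(n):
            J[i][j] = (Hu[i] - Hd[i]) / (2.0 * h)
    return J


# Hypothetical toy data: 4 subjects, symmetric comparison counts, observed wins.
beta = [0.0, 0.4, -0.3, 1.1]
t = [[0, 2, 0, 1], [2, 0, 3, 0], [0, 3, 0, 2], [1, 0, 2, 0]]
a = [1, 2, 3, 1]
gap = max(abs(x - y)
          for ra, rn in zip(analytic_jacobian(beta, t), numeric_jacobian(beta, t, a))
          for x, y in zip(ra, rn))
print(f"max |analytic - numeric| = {gap:.2e}")
```

Central differences have $O(h^2)$ truncation error, so agreement to roughly $10^{-8}$ or better is what one would expect when the formulas are right.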

When $β \in B (β^{*}, ϵ_{n})$, Condition (3c) gives

| \frac{\partial^{2} H_{i} (β)}{\partial β_{i} \partial β_{j}} | \leq b_{n 2} t_{i j}, \quad i \neq j .

Let

g_{i j} (β) = {(\frac{\partial^{2} H_{i} (β)}{\partial β_{1} \partial β_{j}}, \dots, \frac{\partial^{2} H_{i} (β)}{\partial β_{n} \partial β_{j}})}^{⊤} .

Consequently, for $β \in B (β^{*}, ϵ_{n})$,

| \frac{\partial^{2} H_{i} (β)}{\partial β_{i}^{2}} | \leq t_{i} b_{n 2}, \quad | \frac{\partial^{2} H_{i} (β)}{\partial β_{j} \partial β_{i}} | \leq b_{n 2} t_{i j} . \tag{A18}

Since $t_{i} = \sum_{j \neq i} t_{i j}$, (A18) gives $\| g_{i i} (β) \|_{1} \leq t_{i} b_{n 2} + \sum_{j \neq i} t_{i j} b_{n 2} = 2 t_{i} b_{n 2}$. Note that when $i \neq j$ and $k \neq i, j$,

\frac{\partial^{2} H_{i} (β)}{\partial β_{k} \partial β_{j}} = 0 .

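Hence, for $j \neq i$, only the $i$-th and $j$-th entries of $g_{i j} (β)$ can be nonzero, and each is bounded in absolute value by $b_{n 2} t_{i j}$ under Condition (3c); therefore $\| g_{i j} (β) \|_{1} \leq 2 t_{i j} b_{n 2}$. Consequently, for vectors $x, y \in D$, we have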

\begin{aligned} \max_{i = 0, \dots, n} {\| H_{i}^{'} (x) - H_{i}^{'} (y) \|}_{1} &\leq \max_{i = 0, \dots, n} \sum_{j = 1}^{n} \left| \frac{\partial H_{i} (x)}{\partial x_{j}} - \frac{\partial H_{i} (y)}{\partial y_{j}} \right| \\ &= \max_{i = 0, \dots, n} \sum_{j = 1}^{n} \left| \int_{0}^{1} {[g_{i j} (t x + (1 - t) y)]}^{⊤} (x - y) \, d t \right| \\ &\leq \max_{i = 0, \dots, n} 4 b_{n 2} t_{i} {\| x - y \|}_{\infty} \\ &= 4 b_{n 2} t_{max} {\| x - y \|}_{\infty} . \end{aligned}

This completes the proof. □
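The Lipschitz bound of Lemma A3 can also be checked numerically. For the Thurstone link $μ = Φ$ we have $μ''(x) = -x φ(x)$, whose absolute value is maximized at $|x| = 1$, so $b_{n2} = φ(1)$ is a valid global bound (an assumption standing in for Condition (3c)). The sketch below uses that value, a hypothetical random symmetric count matrix, and random pairs $(x, y)$, and records the largest observed ratio of the left-hand side to $4 b_{n2} t_{max} \|x - y\|_\infty$.

```python
import math
import random


def phi(x: float) -> float:
    """Standard normal density, i.e. mu' for the Thurstone link mu = Phi."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# sup_x |mu''(x)| = sup_x |x phi(x)| is attained at |x| = 1.
B2 = phi(1.0)


def jacobian(beta, t):
    """H'(beta): off-diagonal -t_ij mu'(pi_ij), diagonal sum_j t_ij mu'(pi_ij)."""
    n = len(beta)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i:
                d = t[i][j] * phi(beta[i] - beta[j])
                J[i][j] = -d
                J[i][i] += d
    return J


def lipschitz_sides(x, y, t):
    """Return (max_i ||H_i'(x) - H_i'(y)||_1, 4 b_n2 t_max ||x - y||_inf)."""
    Jx, Jy = jacobian(x, t), jacobian(y, t)
    lhs = max(sum(abs(u - v) for u, v in zip(rx, ry)) for rx, ry in zip(Jx, Jy))
    t_max = max(sum(row) for row in t)
    rhs = 4.0 * B2 * t_max * max(abs(u - v) for u, v in zip(x, y))
    return lhs, rhs


rng = random.Random(0)
n = 6
t = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        t[i][j] = t[j][i] = rng.randint(1, 3)  # hypothetical symmetric counts

worst = 0.0
for _ in range(200):
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    lhs, rhs = lipschitz_sides(x, y, t)
    worst = max(worst, lhs / rhs)
print(f"largest observed lhs/rhs ratio: {worst:.3f}")
```

A ratio at most 1 is what the lemma guarantees; in practice it is typically well below 1, because the triangle-inequality steps in the proof are not tight.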

Appendix B.4. Proof of Lemma A4

Proof of Lemma A4.

Recall that $t_{i} = \sum_{j \neq i} t_{i j}$ and that $a_{i}$ is the number of wins of subject $i$ in its $t_{i}$ comparisons. Since all comparisons are mutually independent, given $t_{i j} = m_{i j}$ for $j = 0, \dots, n$, $a_{i}$ is the sum of $m_{i} = \sum_{j \neq i} m_{i j}$ independent Bernoulli random variables. Hoeffding's inequality [30] states that a sum $S$ of $m$ independent $[0, 1]$-valued random variables satisfies $P (| S - E S | \geq s) \leq 2 \exp (- 2 s^{2} / m)$. Taking $s = \sqrt{2 m_{i} \log n}$, we have

\begin{aligned} & P (| a_{i} - E (a_{i} | t_{i j}, j = 0, \dots, n) | \geq \sqrt{2 m_{i} \log n} \mid t_{i j} = m_{i j}, j = 0, \dots, n) \\ &\leq 2 \exp \{ - \frac{2 \cdot 2 m_{i} \log n}{m_{i}} \} = \frac{2}{n^{4}} , \end{aligned}

where $E (a_{i} | t_{i j}, j = 0, \dots, n) = \sum_{j} E (a_{i j} | t_{i j}, j = 0, \dots, n)$ is the conditional expectation of $a_{i}$ given $t_{i j}$, $0 \leq j \leq n$; below we abbreviate it as $E a_{i}$. Note that this upper bound does not depend on $m_{i j}$. By the law of total probability, for fixed $i$,

\begin{aligned} & P (| a_{i} - E a_{i} | \geq \sqrt{2 t_{i} \log n}) \\ &= \sum_{m_{i 0} = 0}^{T} \dots \sum_{m_{i n} = 0}^{T} P (t_{i j} = m_{i j}, j = 0, \dots, n) \times P (| a_{i} - E a_{i} | \geq \sqrt{2 m_{i} \log n} \mid t_{i j} = m_{i j}, j = 0, \dots, n) \\ &\leq \frac{2}{n^{4}} \sum_{m_{i 0} = 0}^{T} \dots \sum_{m_{i n} = 0}^{T} P (t_{i j} = m_{i j}, j = 0, \dots, n) = \frac{2}{n^{4}} . \end{aligned}

Therefore, since $t_{i} \leq t_{max}$ for every $i$,

\begin{aligned} & P (\max_{i = 1, \dots, n} | a_{i} - E a_{i} | \geq \sqrt{2 t_{max} \log n}) \\ &\leq P (⋃_{i} \{| a_{i} - E a_{i} | \geq \sqrt{2 t_{i} \log n}\}) \\ &\leq \sum_{i = 1}^{n} P (| a_{i} - E a_{i} | \geq \sqrt{2 t_{i} \log n}) \\ &\leq n \times \frac{2}{n^{4}} = \frac{2}{n^{3}} \leq \frac{1}{n} \quad (n \geq 2) . \end{aligned}

This completes the proof. □
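A small Monte Carlo experiment illustrates Lemma A4. The sketch below assumes the Thurstone link $μ = Φ$, hypothetical merits drawn uniformly from $[-1, 1]$, and $t_{ij} \sim Bin(T, p)$ independently for each pair. In each replication it simulates the win counts $a_i$ and records whether $\max_i |a_i - E(a_i \mid t)| \geq \sqrt{2 t_{max} \log n}$. The function name `exceedance_rate` and all parameter values are illustrative assumptions.

```python
import math
import random


def Phi(x: float) -> float:
    """Standard normal CDF, used here as the Thurstone win probability mu."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exceedance_rate(n=30, T=5, p=0.3, reps=200, seed=1):
    """Fraction of replications with max_i |a_i - E(a_i | t)| >= sqrt(2 t_max log n)."""
    rng = random.Random(seed)
    beta = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # hypothetical merits
    hits = 0
    for _ in range(reps):
        wins = [0] * n        # a_i
        mean = [0.0] * n      # E(a_i | t_ij, j != i)
        t_tot = [0] * n       # t_i
        for i in range(n):
            for j in range(i + 1, n):
                t_ij = sum(rng.random() < p for _ in range(T))  # t_ij ~ Bin(T, p)
                p_ij = Phi(beta[i] - beta[j])                   # P(i beats j)
                w_ij = sum(rng.random() < p_ij for _ in range(t_ij))
                wins[i] += w_ij
                wins[j] += t_ij - w_ij
                mean[i] += t_ij * p_ij
                mean[j] += t_ij * (1.0 - p_ij)
                t_tot[i] += t_ij
                t_tot[j] += t_ij
        threshold = math.sqrt(2.0 * max(t_tot) * math.log(n))
        if max(abs(w - m) for w, m in zip(wins, mean)) >= threshold:
            hits += 1
    return hits / reps


rate = exceedance_rate()
print(f"empirical exceedance rate: {rate:.3f} (bound 1/n = {1 / 30:.3f})")
```

One would expect the empirical rate to sit well below $1/n$, since Hoeffding's inequality is conservative when each $t_i$ is of moderate size.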

Appendix B.5. Proof of Lemma A5

Proof of Lemma A5.

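We first derive a uniform lower bound on $t_{i}$, $i = 0, \dots, n$. Note that $t_{i}$ is the sum of $n$ independent and identically distributed (i.i.d.) binomial random variables $Bin (T, p_{n})$, one for each of the other $n$ subjects; equivalently, it is the sum of $n T$ i.i.d. Bernoulli random variables with success probability $p_{n}$, so that $E t_{i} = n T p_{n}$. For such a sum $X$ with mean $μ$, the Chernoff bound [29] gives $P (X < (1 - δ) μ) \leq \exp (- δ^{2} μ / 2)$ for $0 < δ < 1$ and $P (X > (1 + δ) μ) \leq \exp (- δ^{2} μ / (2 + δ))$ for $δ > 0$. Taking $δ = 1 / 2$ in the first inequality, we have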

P (min_{i = 0, \dots, n} t_{i} < \frac{T}{2} n p_{n}) \leq \sum_{i = 0}^{n} P (t_{i} < \frac{T}{2} n p_{n}) \leq (n + 1) exp (- \frac{T}{8} n p_{n}) .

Thus, with probability at least $1 - (n + 1) \exp (- \frac{T}{8} n p_{n})$,

\min_{i = 0, \dots, n} t_{i} \geq \frac{T}{2} n p_{n} .

Analogously, by the Chernoff bound [29], we have

P (max_{i = 0, \dots, n} t_{i} > \frac{3}{2} n T p_{n}) \leq \sum_{i = 0}^{n} P (t_{i} > \frac{3}{2} n T p_{n}) \leq (n + 1) exp (- \frac{1}{10} n T p_{n}),

and

P (\frac{1}{2} \sum_{i} t_{i} > \frac{3}{2} (n + 1) n T p_{n}) \leq exp (- \frac{1}{10} (n + 1) n T p_{n}) .

□
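The two Chernoff bounds used in Lemma A5 can be compared with exact binomial tails, since $t_i \sim Bin(nT, p_n)$ under the assumptions of the lemma. The sketch below evaluates the exact tails with `math.comb` (adequate for moderate $nT$, roughly up to 1000 trials, before the binomial coefficients overflow a float) and returns them next to $e^{-μ/8}$ and $e^{-μ/10}$, where $μ = nTp_n$. The helper names and parameter values are illustrative.

```python
import math


def binom_pmf(N: int, p: float, k: int) -> float:
    """P(X = k) for X ~ Bin(N, p)."""
    return math.comb(N, k) * p ** k * (1.0 - p) ** (N - k)


def tails_vs_chernoff(n: int, T: int, p: float):
    """Exact tails of t_i ~ Bin(nT, p) next to the Chernoff bounds of Lemma A5."""
    N = n * T                 # t_i is a sum of nT Bernoulli(p) trials
    mu = N * p                # E t_i = n T p_n
    lower = sum(binom_pmf(N, p, k) for k in range(N + 1) if k < mu / 2)
    upper = sum(binom_pmf(N, p, k) for k in range(N + 1) if k > 1.5 * mu)
    return lower, math.exp(-mu / 8), upper, math.exp(-mu / 10)


for n, T, p in [(50, 2, 0.1), (100, 3, 0.05), (200, 3, 0.2)]:
    lo, lo_bound, hi, hi_bound = tails_vs_chernoff(n, T, p)
    print(f"n={n}, T={T}, p={p}: P(t_i < mu/2)={lo:.2e} <= {lo_bound:.2e}; "
          f"P(t_i > 1.5 mu)={hi:.2e} <= {hi_bound:.2e}")
```

The comparison makes visible how loose the exponents $1/8$ and $1/10$ are for moderate $μ$, while still decaying exponentially in $nTp_n$.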

Appendix B.6. Proof of Lemma A6

Proof of Lemma A6.

Write

\begin{aligned} & H = H (β^{*}), \quad V = H^{'} (β^{*}), \\ & E^{*} (\cdot) = E (\cdot | t_{i j}, 0 \leq i, j \leq n), \quad {Cov}^{*} (\cdot) = Cov (\cdot | t_{i j}, 0 \leq i, j \leq n) . \end{aligned}

Then $H = E^{*} (a) - a$. Let $W = V^{- 1} - S$, and note that $U = {Cov}^{*} (H)$. Direct calculation gives

{Cov}^{*} (W H) = (V^{- 1} - S) U (V^{- 1} - S) = I_{1} + I_{2},

where $I_{1} = (V^{- 1} - S) V (V^{- 1} - S) = V^{- 1} - S + S V S - S$ and $I_{2} = (V^{- 1} - S) (U - V) (V^{- 1} - S)$. It is easy to verify that

{(S V S - S)}_{i j} = \frac{v_{i 0}}{v_{i i} v_{00}} + \frac{v_{0 j}}{v_{j j} v_{00}} - \frac{(1 - δ_{i j}) v_{i j}}{v_{i i} v_{j j}} .

Therefore,

max_{i, j} | {(S V S - S)}_{i j} | \leq \frac{3 b_{n 1}}{t_{min}^{2} b_{n 0}^{2}} .

Based on Lemmas 3 and 5 in the main text, we have

I_{1} = O_{p} (\frac{b_{n 1}^{2}}{n^{2} b_{n 0}^{3} p_{n}^{5}}) .
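The decomposition ${Cov}^{*}(WH) = I_1 + I_2$ is purely algebraic: it only uses $V^{-1} V = V V^{-1} = I$, so it holds for any matrix $S$ and any $U$. The sketch below checks it in exact rational arithmetic for a hypothetical symmetric, strictly diagonally dominant $V$, a diagonal $S$ (a simplified stand-in, not the Simons–Yao approximation used in the paper), and a random symmetric $U$. The helpers `matmul`, `madd` and `inverse` are written here for illustration.

```python
import random
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def madd(A, B, sign=1):
    """Entrywise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inverse(A):
    """Gauss-Jordan inverse over the rationals; A must be invertible."""
    n = len(A)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


rng = random.Random(3)
n = 4
# Hypothetical V: symmetric, non-positive off-diagonal, strictly diagonally dominant.
V = [[Fraction(0)] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        V[i][j] = V[j][i] = Fraction(-rng.randint(0, 3))
for i in range(n):
    V[i][i] = sum(-V[i][j] for j in range(n) if j != i) + rng.randint(1, 3)

S = [[1 / V[i][i] if i == j else Fraction(0) for j in range(n)] for i in range(n)]
U = [[Fraction(0)] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        U[i][j] = U[j][i] = Fraction(rng.randint(-5, 5), rng.randint(1, 4))

W = madd(inverse(V), S, -1)                           # W = V^{-1} - S
lhs = matmul(matmul(W, U), W)                         # Cov*(WH) = W U W
I1 = madd(W, madd(matmul(matmul(S, V), S), S, -1))    # V^{-1} - S + SVS - S
I2 = matmul(matmul(W, madd(U, V, -1)), W)             # (V^{-1} - S)(U - V)(V^{-1} - S)
print(lhs == madd(I1, I2))  # True
```

Because the identity is exact, any discrepancy in such a check would point to a sign or ordering slip in the expansion rather than to rounding.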

Next, we bound $I_{2}$. Direct calculation gives

\begin{aligned} & {[(V^{- 1} - S) (U - V) (V^{- 1} - S)]}_{i j} \\ &= \sum_{k, s} {(V^{- 1} - S)}_{i k} {(U - V)}_{k s} {(V^{- 1} - S)}_{s j} \\ &= O ({(\frac{b_{n 1}^{2}}{n^{2} b_{n 0}^{3} p_{n} η_{n}})}^{2}) \sum_{k, s} | {(U - V)}_{k s} | \\ &= O ({(\frac{b_{n 1}^{2}}{n^{2} b_{n 0}^{3} p_{n} η_{n}})}^{2}) \times 2 (b_{n 1} + 1 / 4) \sum_{i} t_{i} \\ &= O (\frac{b_{n 1}^{5}}{n^{2} b_{n 0}^{6} p_{n}^{5}}) , \end{aligned}

where the second equality follows from Lemma 3, and the third from $p_{i j} (1 - p_{i j}) \leq 1 / 4$ and $μ_{i j}^{'} (β) \leq b_{n 1}$. Therefore, we have

I_{1} + I_{2} = O_{p} (\frac{b_{n 1}^{5}}{n^{2} b_{n 0}^{6} p_{n}^{5}}) .

□

References

1. Han, R.; Xu, Y.; Chen, K. A general pairwise comparison model for extremely sparse networks. J. Am. Stat. Assoc. 2023, 118, 2422–2432.
2. Stigler, S.M. Citation patterns in the journals of statistics and probability. Stat. Sci. 1994, 9, 94–108.
3. Varin, C.; Cattelan, M.; Firth, D. Statistical modelling of citation exchange between statistics journals. J. R. Stat. Soc. Ser. A-Stat. Soc. 2016, 179, 1–63.
4. Radlinski, F.; Joachims, T. Active exploration for learning rankings from clickthrough data. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: San Jose, CA, USA, 2007; pp. 570–579.
5. Chen, B.; Escalera, S.; Guyon, I.; Ponce-López, V.; Shah, N.B.; Simon, M.O. Overcoming calibration problems in pattern labeling with pairwise ratings: Application to personality traits. In European Conference on Computer Vision (ECCV 2016) Workshops; Springer: Cham, Switzerland, 2016; Volume 9915, pp. 419–432.
6. David, H.A. The Method of Paired Comparisons, 2nd ed.; Oxford University Press: Oxford, UK, 1988.
7. Bradley, R.A.; Terry, M.E. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika 1952, 39, 324–345.
8. Zermelo, E. Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung. Math. Z. 1929, 29, 436–460.
9. Thurstone, L.L. A law of comparative judgment. Psychol. Rev. 1927, 34, 273–286.
10. Simons, G.; Yao, Y.C. Asymptotics when the number of parameters tends to infinity in the Bradley–Terry model for paired comparisons. Ann. Stat. 1999, 27, 1041–1060.
11. Chen, Y.; Suh, C. Spectral MLE: Top-K rank aggregation from pairwise comparisons. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015); Bach, F., Blei, D., Eds.; International Machine Learning Society (IMLS): Stroudsburg, PA, USA, 2015; pp. 371–380.
12. Shah, N.B.; Wainwright, M.J. Simple, robust and optimal ranking from pairwise comparisons. J. Mach. Learn. Res. 2018, 18, 1–38.
13. Han, R.; Ye, R.; Tan, C.; Chen, K. Asymptotic theory of sparse Bradley–Terry model. Ann. Appl. Probab. 2020, 30, 2491–2515.
14. Yan, T.; Yang, Y.; Xu, J. Sparse paired comparisons in the Bradley–Terry model. Stat. Sin. 2012, 22, 1035–1318.
15. Agarwal, A.; Patil, P.; Agarwal, S. Accelerated spectral ranking. In Proceedings of the Machine Learning Research, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; PMLR: Cambridge, MA, USA, 2018; Volume 80, pp. 70–79.
16. Hendrickx, J.; Olshevsky, A.; Saligrama, V. Graph resistance and learning from pairwise comparisons. In Proceedings of the Conference on Neural Information Processing Systems; NIPS: San Diego, CA, USA, 1999; pp. 2702–2711.
17. Vojnovic, M.; Yun, S. Parameter estimation for Thurstone choice models. arXiv 2017, arXiv:1705.00136.
18. Wang, J.; Shah, N.; Ravi, R. Stretching the effectiveness of MLE from accuracy to bias for pairwise comparisons. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS 2020), Online, 26–28 August 2020; Chiappa, S., Calandra, R., Eds.; Proceedings of Machine Learning Research; PMLR: Cambridge, MA, USA, 2020; Volume 108, pp. 66–76.
19. Ford, L.R. Solution of a ranking problem from binary comparisons. Am. Math. Mon. 1957, 64, 28–33.
20. Simons, G.; Yao, Y.C. Approximating the inverse of a symmetric positive definite matrix. Linear Algebra Appl. 1998, 281, 97–103.
21. Yamamoto, T. Error bounds for Newton's iterates derived from the Kantorovich theorem. Numer. Math. 1986, 48, 91–98.
22. Delyon, B. Exponential inequalities for sums of weakly dependent variables. Electron. J. Probab. 2009, 752–779.
23. Roussas, G.G. Exponential probability inequalities with some applications. Lect. Notes-Monogr. Ser. 1996, 30, 303–319.
24. Ioannides, D.A.; Roussas, G.G. Exponential inequality for associated random variables. Stat. Probab. Lett. 1999, 42, 423–431.
25. Cocke, W.J. Central limit theorems for sums of dependent vector variables. Ann. Math. Statist. 1972, 43, 968–976.
26. Cox, J.T.; Grimmett, G. Central limit theorems for associated random variables and the percolation model. Ann. Probab. 1984, 12, 514–528.
27. Kantorovich, L.V. Functional analysis and applied mathematics. Uspekhi Mat. Nauk 1948, 3, 89–185.
28. Loève, M. Probability Theory I, 4th ed.; Springer: New York, NY, USA, 1977.
29. Chernoff, H. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist. 1952, 23, 493–507.
30. Hoeffding, W. Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 1963, 58, 13–30.

Table 1. The failure frequency ($\times 100\%$).


| $p_{n}$ | $n = 100$ | $n = 500$ | $n = 1000$ |
|---|---|---|---|
| ${(\log n / n)}^{1 / 2}$ | $0.5$ | 0 | 0 |
| ${(\log n / n)}^{2 / 3}$ | $25.8$ | $0.2$ | 0 |
| $\log n / n$ | 100 | 100 | 100 |

Table 2. Each entry reports the coverage frequency ($\times 100\%$) of the confidence interval for $β_{i} - β_{j}$ for the pair $(i, j)$ / the length of the confidence interval / the failure probability ($\times 100\%$).


$p_{n} = {(\log n / n)}^{1 / 4}$

| $n$ | $(i, j)$ | $c = 0.2$ | $c = 0.5$ | $c = 0.8$ |
|---|---|---|---|---|
| 100 | $(1, 2)$ | $94.5 / 1.04 / 0$ | $94.62 / 1.06 / 0$ | $94.62 / 1.06 / 0$ |
|  | $(49, 50)$ | $94.97 / 1.04 / 0$ | $94.34 / 1.04 / 0$ | $94.34 / 1.04 / 0$ |
|  | $(99, 100)$ | $95.12 / 1.04 / 0$ | $94.99 / 1.06 / 0$ | $94.99 / 1.06 / 0$ |
| 200 | $(1, 2)$ | $95.18 / 0.78 / 0$ | $94.71 / 0.79 / 0$ | $94.71 / 0.79 / 0$ |
|  | $(99, 100)$ | $95.14 / 0.78 / 0$ | $94.49 / 0.78 / 0$ | $94.49 / 0.78 / 0$ |
|  | $(199, 200)$ | $94.82 / 0.78 / 0$ | $94.64 / 0.79 / 0$ | $94.64 / 0.79 / 0$ |

$p_{n} = {(\log n / n)}^{1 / 2}$

| $n$ | $(i, j)$ | $c = 0.2$ | $c = 0.4$ | $c = 0.6$ |
|---|---|---|---|---|
| 100 | $(1, 2)$ | $93.28 / 1.58 / 0.19$ | $93.48 / 1.60 / 0.49$ | $93.48 / 1.60 / 1.15$ |
|  | $(49, 50)$ | $93.85 / 1.58 / 0.19$ | $93.94 / 1.58 / 0.49$ | $93.94 / 1.58 / 1.15$ |
|  | $(99, 100)$ | $93.80 / 1.58 / 0.19$ | $93.96 / 1.61 / 0.49$ | $93.96 / 1.61 / 1.15$ |
| 200 | $(1, 2)$ | $94.04 / 1.26 / 0$ | $93.89 / 1.28 / 0$ | $93.89 / 1.28 / 0$ |
|  | $(99, 100)$ | $94.14 / 1.26 / 0$ | $94.22 / 1.26 / 0$ | $94.22 / 1.26 / 0$ |
|  | $(199, 200)$ | $94.38 / 1.26 / 0$ | $93.98 / 1.28 / 0$ | $93.98 / 1.28 / 0$ |

Table 3. The fitted merit ${\hat{β}}_{i}$, the number of wins $a_{i}$, and the standard error ${\hat{σ}}_{i}$.


| Division | American Football Conference team | ${\hat{β}}_{i}$ | $a_{i}$ | ${\hat{σ}}_{i}$ | Division | National Football Conference team | ${\hat{β}}_{i}$ | $a_{i}$ | ${\hat{σ}}_{i}$ |
|---|---|---|---|---|---|---|---|---|---|
| East | New England Patriots | $1.452$ | 11 | $0.519$ | East | Dallas Cowboys | $1.284$ | 10 | $0.512$ |
|  | New York Jets | $0.338$ | 4 | $0.530$ |  | Philadelphia Eagles | $1.193$ | 9 | $0.511$ |
|  | Miami Dolphins | $0.718$ | 7 | $0.509$ |  | New York Giants | $0.423$ | 5 | $0.514$ |
|  | Buffalo Bills | $0.673$ | 6 | $0.514$ |  | Washington Redskins | $0.756$ | 7 | $0.511$ |
| North | Baltimore Ravens | $1.382$ | 10 | $0.514$ | North | Chicago Bears | $1.494$ | 12 | $0.532$ |
|  | Cincinnati Bengals | $0.769$ | 6 | $0.514$ |  | Green Bay Packers | $0.595$ | 6 | $0.522$ |
|  | Pittsburgh Steelers | $1.337$ | 9 | $0.52$ |  | Minnesota Vikings | $1.059$ | 8 | $0.523$ |
|  | Cleveland Browns | $0.968$ | 7 | $0.519$ |  | Detroit Lions | $0.579$ | 6 | $0.516$ |
| South | Indianapolis Colts | $1.244$ | 10 | $0.512$ | South | New Orleans Saints | $1.908$ | 13 | $0.544$ |
|  | Houston Texans | $1.430$ | 11 | $0.516$ |  | Atlanta Falcons | $0.767$ | 7 | $0.512$ |
|  | Tennessee Titans | $1.205$ | 9 | $0.51$ |  | Carolina Panthers | $0.840$ | 7 | $0.511$ |
|  | Jacksonville Jaguars | $0.591$ | 5 | $0.517$ |  | Tampa Bay Buccaneers | $0.506$ | 5 | $0.52$ |
| West | Kansas City Chiefs | $1.762$ | 12 | $0.537$ | West | Los Angeles Rams | $1.963$ | 13 | $0.559$ |
|  | Denver Broncos | $0.713$ | 6 | $0.523$ |  | San Francisco 49ers | $0.166$ | 4 | $0.54$ |
|  | Oakland Raiders | $0.365$ | 4 | $0.537$ |  | Seattle Seahawks | $1.305$ | 10 | $0.525$ |
|  | Los Angeles Chargers | $1.748$ | 12 | $0.536$ |  | Arizona Cardinals | 0 | 3 | $0.555$ |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, Q.; Pan, L.; Yan, T. Moment Estimation in Paired Comparison Models with a Growing Number of Subjects. Entropy 2026, 28, 314. https://doi.org/10.3390/e28030314


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

