Article

Closed-Form Expressions for the Normalizing Constants of the Mallows Model and Weighted Mallows Model on Combinatorial Domains

by Jean-Pierre van Zyl 1,* and Andries Petrus Engelbrecht 1,2,3
1 Division of Computer Science, Stellenbosch University, Stellenbosch 7599, South Africa
2 Department of Industrial Engineering, Stellenbosch University, Stellenbosch 7599, South Africa
3 Center for Applied Mathematics and Bioinformatics, Gulf University for Science and Technology, Mubarak Al-Abdullah 7207, Kuwait
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(19), 3126; https://doi.org/10.3390/math13193126
Submission received: 28 July 2025 / Revised: 11 September 2025 / Accepted: 24 September 2025 / Published: 30 September 2025
(This article belongs to the Section D1: Probability and Statistics)

Abstract

This paper expands the Mallows model for use in combinatorial domains. The Mallows model is a popular distribution used to sample permutations around a central tendency, but it requires a unique normalizing constant for each distance metric used in order to be computationally efficient. In this paper, closed-form expressions for the Mallows model normalizing constant are derived for the Hamming distance, the symmetric difference, and the similarity coefficient in combinatorial domains. Additionally, closed-form expressions are derived for the normalizing constant of the weighted Mallows model in combinatorial domains. The weighted Mallows model increases the versatility of the Mallows model by allowing granular control over the likelihoods of individual components in the domain. The derived closed-form expressions reduce the order of the calculations required to compute probabilities from exponential to constant.

1. Introduction

Permutation and combination spaces are an integral part of the foundations of mathematics and have been studied for centuries. Berge provides excellent explanations of the intricacies of combinatorial counting problems, permutation groups, and the set theory which underpins these issues [1]. Additionally, Biggs outlines the history of the study of combinations and permutations, dating back to the ancient Greek and Hindu historians, which highlights the influence that this field has had on mathematics [2].
Permutations are not simply an ancient concept confined to old texts but are still a relevant and actively studied part of the contemporary literature, e.g., see [3]. Sampling random permutations famously became contentious during the conscription of United States citizens by executive order of President Nixon [4]. The randomness of the sampled conscription dates was questioned by Fienberg [5], who found that conscription ranks were correlated with birthdates, with a Spearman rank correlation coefficient of 0.226. The “non-random” randomly sampled permutation of dates highlights the challenge of creating a distribution defined on a permutation domain.
Closely related to permutations are combinations, which are non-repeated unordered collections of discrete objects. Hall and Knuth emphasize the increased attention combinatorial problems have gained since the advent of computers and how computational complexity remains one of the limiting factors for combinatorial problems [6]. Papadimitriou and Steiglitz further illustrate the immense scale and ubiquity of combinatorial optimization problems (COPs) [7]. The high level of difficulty of COPs is not surprising, since COPs are often underpinned by non-deterministic polynomial time (NP)-hard decision problems. This NP-hardness, combined with the combinatorial explosion of the search space of COPs, necessitates highly specialized control of the exploration–exploitation trade-off in combinatorial optimization algorithms (COAs). Often there are high levels of exploration required, which are emphasized by new “combinatorial pure exploration” approaches in multi-armed bandit algorithms [8].
The normalizing constant, sometimes referred to as the partition function in physics, is an integral part of the definition of a statistical distribution. The normalizing constant ensures that the probability density function (PDF) of a distribution is valid and that the area under the curve of the PDF is one. For many models, a normalizing constant without a closed-form expression results in a doubly intractable problem; a popular example of a doubly intractable sampling problem is in exponential random graph models (ERGMs) for social networks [9]. As a result of the intractability of this class of problems, authors in the contemporary literature have proposed approximation approaches to sample from doubly intractable distributions, e.g., Monte Carlo Metropolis–Hastings (MCMH) [10] and Markov chain Monte Carlo (MCMC) [11].
The Mallows model (MM) [12] is a popular distribution used to model permutations; however, no research has been conducted on the adaptation of the MM for combinations. This paper derives the closed-form expression for the normalizing constant of the MM with different combinatorial discrepancy functions. The contribution of this paper allows the MM to be applied to combinatorial problems with better computational efficiency.

2. The Mallows Model

In a seminal 1957 paper, Mallows proposed the MM as a distribution for ranked data [12]. Mallows started with a general non-null model to calculate the probability of putting two objects $U_i$ and $U_j$ in the correct order. The probability of correctly ordering two objects is formulated as
$$P = \frac{1}{2} + \frac{1}{2}\tanh\left(k \log \theta + \log \phi\right),$$
where $\tanh$ is the hyperbolic tangent function, $k = j - i$ is the discrepancy between the true ranks of the two objects, and $\theta$ and $\phi$ are model parameters.
The parameters $\phi$ and $\theta$ change the behavior of the model, with $\phi$ closely associated with Spearman's $\rho$ [13], while $\theta$ is associated with Kendall's $\tau$ [14]. These two parameters can be varied in order to produce two different effects. Firstly, the parameter $\theta$ is interpreted as the weights assigned to each of the n objects in the ordering, i.e., the level of importance assigned to a correct ordering in each position. Secondly, the parameter $\phi$ is interpreted as the probabilities that will be assigned to a newly ranked object, i.e., if $(n-1)$ objects have been ranked and an additional object is introduced. Both the weights from $\theta$ and the probabilities from $\phi$ follow a geometric progression, decreasing away from the true position in the ordering.
The model parameters can be varied to obtain different behaviors, with the null hypothesis corresponding to θ = ϕ = 1 . An additional behavior is obtained when ϕ = 1 , in which case the MM gives a special case of the Bradley–Terry model [15]. When θ = 1 , the joint distribution of ρ and τ asymptotically tends to the bivariate normal form, which makes the two parameters indistinguishable.
 Definition 1.
Let $\pi_i \in \Omega$ be an arbitrary permutation of length n. Then,
$$P(\pi \mid \pi_0, \theta) = \frac{1}{\Psi(\theta, \pi_0)}\, e^{-\theta d(\pi, \pi_0)},$$
where $\Psi(\theta, \pi_0)$ is the normalizing constant, $\pi_0$ is the central tendency, $\theta$ is the dispersion parameter, and $d(\cdot,\cdot)$ is the discrepancy function. When $\theta > 0$, $\pi_0$ is the mode of the distribution; when $\theta < 0$, $\pi_0$ is the antimode; and when $\theta = 0$, the distribution is uniform.
 Definition 2.
The explicit formulation of the normalizing constant of the MM is
$$\Psi(\pi_0, \theta) = \sum_{\pi_i \in \Omega} e^{-\theta d(\pi_0, \pi_i)},$$
where $\Omega$ is the permutation space.
The explicit form of the normalizing constant in Definition 2 is valid for any $d(\cdot,\cdot)$, but it is computationally expensive to calculate. The calculation time complexity is $O(n!)$ for permutation spaces and $O(2^n)$ for combinatorial domains. Therefore, closed-form expressions for the normalizing constant, which do not require the summation over all possible permutations in $\Omega$, are required to make the MM computationally feasible. The issue of computational complexity can be circumvented by the definition of a closed-form expression which is independent of the modal tendency. A closed-form expression of the normalizing constant, $\Psi(\theta)$, depends only on the dispersion parameter and is often calculable in $O(n)$. However, $\Psi(\theta)$ is discrepancy-function-specific, with no general form, and so has to be derived for each new $d(\cdot,\cdot)$.
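For reference, the explicit sum of Definition 2 can be written directly. The following minimal Python sketch (function and variable names are illustrative, not from the paper) makes the exponential cost visible for a small combinatorial domain; the comment anticipates the closed form derived in Section 4.2.

```python
import itertools
import math

def psi_explicit(omega, d, theta, center):
    """Explicit normalizing constant of Definition 2: one term per element of the
    domain, i.e. O(n!) for permutation spaces and O(2^n) for combinatorial domains."""
    return sum(math.exp(-theta * d(x, center)) for x in omega)

universe = ("a", "b", "c", "d")
omega = [frozenset(c) for r in range(len(universe) + 1)
         for c in itertools.combinations(universe, r)]          # all 2^4 = 16 subsets
center = frozenset({"a", "c"})
sym_diff = lambda x, y: len(x ^ y)                               # symmetric difference

print(psi_explicit(omega, sym_diff, 1.5, center))                # ~2.238
print((math.exp(-1.5) + 1) ** len(universe))                     # closed form (Section 4.2), same value
```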

3. Discrepancy Functions

3.1. Permutation Discrepancy Functions

The original MM was defined with two variations (i.e., Mallows’ ϕ and Mallows’ θ ), which correspond to the use of Kendall’s τ and Spearman’s ρ , respectively. However, Diaconis proposed that any right-invariant discrepancy metric can be used with the MM and outlined six existing metrics which are compatible [16].
Kendall proposed a “measure of rank correlation”, named $\tau$, which measures the number of discordant pairs between two permutations [14]. Practically, Kendall's $\tau$ counts the number of inversions required to transform one permutation into the other. The distance measure based on $\tau$ is
$$d_\tau(\pi, \sigma) = \frac{2\Sigma}{n(n-1)},$$
where $\Sigma = |\{(i,j) \in \Omega \times \Omega : i < j \wedge \pi(i) > \sigma(j)\}|$ is the inversion number, i.e., the number of inversions needed to transform $\pi$ into $\sigma$. The closed-form expression for Kendall's $\tau$ was derived by Fligner and Verducci as $\Psi(\theta) = \prod_{j=1}^{n-1} \frac{1 - e^{-\theta(n-j+1)}}{1 - e^{-\theta}}$ [17].
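As a numerical sanity check of the quoted Fligner–Verducci expression, the following sketch (helper names are mine) enumerates a small permutation space by brute force. Note that it uses the raw inversion count $\Sigma$, to which the closed form applies, rather than the normalized $d_\tau$ above.

```python
import itertools
import math

def inversion_count(pi, sigma):
    """Number of discordant pairs between two permutations (raw Kendall distance)."""
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (pi[i] < pi[j]) != (sigma[i] < sigma[j]))

n, theta = 5, 0.8
identity = tuple(range(n))
brute = sum(math.exp(-theta * inversion_count(p, identity))
            for p in itertools.permutations(range(n)))           # O(n!) terms
closed = math.prod((1 - math.exp(-theta * (n - j + 1))) / (1 - math.exp(-theta))
                   for j in range(1, n))                         # closed form
print(brute, closed)                                             # both ~7.43
```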
Spearman’s correlation coefficient, alternatively named Spearman’s $\rho$, was proposed by Spearman [13]. Spearman’s $\rho$ is a non-parametric measure of correlation between the rankings of two variables and is used to measure the discrepancy between permutations. Closely related to $\rho$ is Spearman’s footrule [18], which was introduced as an alternative to Spearman’s correlation coefficient and can be used with the MM. The $\rho$-based distance measure for permutations is defined as
$$d_\rho(\pi, \sigma) = \sum_{j=1}^{n} \left(\pi(j) - \sigma(j)\right)^2.$$
No closed-form expression for $\Psi(\theta)$ with the correlation-coefficient-based distance is known.
Hamming developed a system to detect and correct errors on digital computers, a system which relies heavily on the calculation of a distance between binary vectors [19]. The distance chosen by Hamming is equivalent to the complement of logical and (∧) of binary vectors, and is referred to as the Hamming distance. The Hamming distance is a count of the number of positions for which the symbols of two permutations differ:
$$d_H(\pi, \sigma) = n - \sum_{j=1}^{n} \delta_{\pi(j), \sigma(j)}.$$
Fligner and Verducci derived the closed-form expression for the normalizing constant with the Hamming distance as $\Psi(\theta) = n!\, e^{-n\theta} \sum_{j=0}^{n} \frac{(e^{\theta} - 1)^j}{j!}$ [17].
The Ulam distance is the minimum number of insertions and deletions required to transform one permutation to another and is the permutation edit distance [20]. The Ulam distance is a specific case of the generalized Levenshtein edit distance from [21]. Alternatively, the Ulam distance can be thought of as the complement of the longest increasing subsequence of a permutation as follows:
$$d_U(\pi, \sigma) = n - \max_k \left\{ \left| \{\pi\sigma^{-1}(i_1), \ldots, \pi\sigma^{-1}(i_k)\} \right| : \pi\sigma^{-1}(i_1) < \cdots < \pi\sigma^{-1}(i_k) \right\}.$$
There is no additive decomposition form for the Ulam distance. Therefore, Irurozki defined an alternative formula for the normalizing constant as $\Psi(\theta) = \sum_{d=0}^{n} S_u(n, d)\, e^{-\theta d}$, where $S_u(n, d)$ is the number of permutations at distance d from the identity [22].
Cayley proposed the distance which counts the minimum number of transpositions required to change a permutation π into σ [23]. The minimum number of transpositions can be calculated as the complement of the number of cycles in the composition of the two permutations which are compared. The resulting formula for the distance is
$$d_C(\pi, \sigma) = n - \left| \left\{ \{i_1, \ldots, i_k\} \subseteq \{1, \ldots, n\} : \pi\sigma^{-1}(i_j) = i_{(j+1) \bmod k} \right\} \right|.$$
The initial closed-form expression for the Cayley distance was given by Fligner and Verducci [17], with errors corrected by Irurozki [22]. The resulting expression is $\Psi(\theta) = \prod_{j=1}^{n-1} \left((n-j)\, e^{-\theta} + 1\right)$.

3.2. Combinatorial Discrepancy Functions

Combinatorial discrepancy functions calculate the dissimilarity between sets within the defined universal set of elements, $\kappa, \eta \subseteq U$, with $|U| = n$. Unlike discrepancy functions on permutation domains, combinatorial discrepancy functions do not account for the ordering of elements.
Although the Hamming distance was originally developed for binary vectors, and is popularly applied to permutations, it is also used for sets. The Hamming distance in set notation makes use of the intersection operation, instead of the logical and, which results in
$$d_H(\kappa, \eta) = n - |\kappa \cap \eta|.$$
The Jaccard distance, proposed by the botanist Jaccard [24], is one of the most popular set-based distance measures. The Jaccard index was originally developed to measure the amount of diversity of plant species in different plots of land, with the Jaccard distance defined as the complement of the Jaccard index. The Jaccard distance is
$$d_J(\kappa, \eta) = 1 - \frac{|\kappa \cap \eta|}{|\kappa \cup \eta|}.$$
Similarity-coefficient-based distance metrics are popular in the existing literature. Similarity coefficients measure the amount of agreement (in present attributes and non-present attributes) between two sets. Matching coefficient (MC) is a term often used to describe similarity coefficients related to the Rand index [25] or the coefficients attributed to either Goodall [26] or Sokal and Michener [27]. MCs are often normalized to the range [ 0 , 1 ] , with a corresponding distance metric defined as the complement of the coefficient. The generally used form of similarity-coefficient-based distances is
$$d_M(\kappa, \eta) = \frac{|\kappa \cap \eta| + |\kappa^c \cap \eta^c|}{n}.$$
A simple measure of dissimilarity which can be used for sets is the count of the number of insertions and deletions required to transform one set into another. The count of insertions and deletions is simply the cardinality of the symmetric difference between the two sets, denoted by $\Delta$, which is formulated as
$$d_S(\kappa, \eta) = |\eta \setminus \kappa| + |\kappa \setminus \eta| = |\kappa \,\Delta\, \eta|.$$
The symmetric difference is closely related to the Jaccard distance, since the Jaccard distance is equivalent to $\frac{|\kappa \,\Delta\, \eta|}{|\kappa \cup \eta|}$.
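A short sketch of the four combinatorial discrepancy functions, implemented literally from the formulas above (function names and the example sets are mine), may help make the definitions concrete.

```python
def hamming(kappa, eta, n):
    return n - len(kappa & eta)                       # d_H = n - |kappa ∩ eta|

def jaccard(kappa, eta):
    union = kappa | eta                               # d_J = 1 - |kappa ∩ eta| / |kappa ∪ eta|
    return 1 - len(kappa & eta) / len(union) if union else 0.0

def matching(kappa, eta, universe):                   # d_M = (|kappa ∩ eta| + |kappa^c ∩ eta^c|) / n
    both_absent = len((universe - kappa) & (universe - eta))
    return (len(kappa & eta) + both_absent) / len(universe)

def symmetric_difference(kappa, eta):
    return len(kappa ^ eta)                           # d_S = |kappa Δ eta|

U = {"a", "b", "c"}
kappa, eta = {"a", "b"}, {"b"}
print(hamming(kappa, eta, len(U)),                    # 2
      jaccard(kappa, eta),                            # 0.5
      matching(kappa, eta, U),                        # 0.666...
      symmetric_difference(kappa, eta))               # 1
```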

4. Normalizing Constant for Combinatorial Domains

Fligner and Verducci outline a process by which the closed-form expression of the normalizing constant can be derived, representing the distance to the central tendency as a random variable, $D(\nu)$ [17]. The moment-generating function (MGF) of the distribution over $\nu$, $D(\nu)$, is calculated under assumed conditions of uniformity over the input domain. The Taylor series expansion of the additive decomposition of the discrepancy measure is then substituted into the MGF of $D(\nu)$, which results in the closed-form expression for the normalizing constant.
The MGF of a discrepancy function as a random variable is given in Definition 3.
 Definition 3.
Let $D(\nu)$ be the distance of an arbitrary combination to the central tendency, expressed as a random variable. Its moment-generating function is denoted
$$\mathrm{MGF}(D(\nu)) \equiv M_{D,\theta}(t).$$
Given the condition $\theta = 0$, the following shorthand notation is defined:
$$M_{D,0}(t) = M(t)$$
and
$$P(D(\nu) = d \mid \theta = 0) = P_0(d).$$
Under the assumption of uniformity, the normalizing constant can be expressed using the MGF of $D(\nu)$ as in Lemma 1.
 Lemma 1.
The normalizing constant, under the assumption of uniformity, is related to the MGF by
$$\Psi(\theta) = 2^n M(-\theta).$$
 Proof.
$$\Psi(\theta) = \sum_{\pi \in \Omega} e^{-\theta d(\pi, \pi_0)} = 2^n \sum_{d_i} P_0(d_i)\, e^{-\theta d_i} = 2^n M(-\theta),$$
where the second sum runs over the distance values $d_i = d(\pi, \pi_0)$ attained by $\pi \in \Omega$.
The exponential tilting of the MM, as first defined in [28], is given in Definition 4.
 Definition 4.
Since the MM is an exponential family, the MGF of $D(\nu)$ under the MM is
$$M_{D,\theta}(t) = \frac{M(t - \theta)}{M(-\theta)}.$$
According to Fligner and Verducci, the expected value and variance of $D(\nu)$ were derived through a direct argument by Lehmann [29] and are given in Definition 5.
 Definition 5.
The expected value of $D(\nu)$ is
$$E_\theta[D(\nu)] = \frac{d}{dt} \log(M(t)) \Big|_{t=-\theta}$$
and the variance of $D(\nu)$ is
$$\mathrm{Var}_\theta[D(\nu)] = \frac{d^2}{dt^2} \log(M(t)) \Big|_{t=-\theta}.$$
Consider an appropriate discrepancy function, $d(\cdot,\cdot)$, and a combination, $\kappa$ (for distinguishing nomenclature, permutations are referenced by $\pi$ and combinations by $\kappa$). The distance from $\kappa$ to the central tendency, $\kappa_0$, can then be defined element-wise as in Definition 6.
 Definition 6.
The element-wise representation of a discrepancy function between a combination $\kappa$ and a central tendency $\kappa_0$ is captured by the quantity $\epsilon_i$ as
$$\epsilon_i(\kappa) = \begin{cases} 1 & \text{if } \kappa(i) \circ \kappa_0(i), \\ 0 & \text{otherwise}, \end{cases}$$
where $\circ$ is a dyadic operator for element-wise comparison and $\kappa(i)$ is the $i$-th element of the combination $\kappa$.
The element-wise representation of the discrepancy function in Definition 6 can then be used to define the additive decomposition of $d(\kappa, \kappa_0) = d(\kappa)$, given in Definition 7.
 Definition 7.
The additive decomposition of a discrepancy function is
$$d(\kappa) = g(X(\kappa))$$
with
$$X(\kappa) = \sum_{i=1}^{n} \epsilon_i(\kappa),$$
where g is a function which transforms $X(\kappa)$ if needed (e.g., a linear transform, a translation, or the identity).
 Definition 8.
Assume that $\kappa$ is uniformly distributed over $\Omega$; define a function f such that
$$f(t) = E\left[t^X\right],$$
where, for brevity, X is the distance of a sampled combination to the central tendency.
 Definition 9.
The jth derivative of f(t) evaluated at $t = 1$ is
$$f^{(j)}(1) = E\left[\frac{X!}{(X-j)!}\right].$$
Proof. 
From the power rule,
$$\frac{d}{dt} f(t) = E\left[X \cdot t^{X-1}\right],$$
which after repeated application becomes
$$\frac{d^j}{dt^j} f(t) = E\left[X (X-1) \cdots (X-j+1) \cdot t^{X-j}\right].$$
When evaluated at $t = 1$, this becomes
$$\frac{d^j}{dt^j} f(t) \Big|_{t=1} = E\left[X (X-1) \cdots (X-j+1) \cdot 1^{X-j}\right] = E\left[\frac{X!}{(X-j)!}\right].$$
 Lemma 2.
From the definition of f and $f^{(j)}$, the Taylor series expansion of f(t) at $t = 1$ is
$$f(t) = \left(\frac{t-1}{2} + 1\right)^n.$$
Proof. 
The nth degree Taylor series expansion of f(t) around $t = 1$ is
$$f(t) = \sum_{j=0}^{n} \frac{f^{(j)}(1)}{j!} (t-1)^j.$$
From Definition 9,
$$f(t) = \sum_{j=0}^{n} \frac{E\left[\frac{X!}{(X-j)!}\right]}{j!} (t-1)^j = \sum_{j=0}^{n} E\left[\frac{X!}{j!(X-j)!}\right] (t-1)^j = \sum_{j=0}^{n} E\left[\binom{X}{j}\right] (t-1)^j,$$
where the binomial coefficient $\binom{X}{j}$ is zero when $X < j$. The binomial coefficient can also be rewritten as
$$\binom{X}{j} = \sum_{A_j} \epsilon_{i_1} \cdots \epsilon_{i_j},$$
where $A_j = \{S \subseteq \Omega \mid |S| = j\}$. Consequently,
$$E\left[\binom{X}{j}\right] = \sum_{A_j} E\left[\epsilon_{i_1} \cdots \epsilon_{i_j}\right] = \binom{n}{j} 2^{-j}.$$
Hence,
$$f(t) = \sum_{j=0}^{n} E\left[\binom{X}{j}\right] (t-1)^j = \sum_{j=0}^{n} \binom{n}{j} 2^{-j} (t-1)^j = \sum_{j=0}^{n} \binom{n}{j} \left(\frac{t-1}{2}\right)^j.$$
Finally, from the binomial theorem, $(x + y)^n = \sum_{j=0}^{n} \binom{n}{j} x^j y^{n-j}$, with the substitution of $x = \frac{t-1}{2}$ and $y = 1$, the result is
$$f(t) = \sum_{j=0}^{n} \binom{n}{j} \left(\frac{t-1}{2}\right)^j = \left(\frac{t-1}{2} + 1\right)^n.$$
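A quick numerical check of Lemma 2 under the uniformity assumption: enumerating all binary vectors of length n and averaging $t^X$ reproduces $\left(\frac{t-1}{2}+1\right)^n$ (a sketch; names are mine).

```python
import itertools

def f_exact(t, n):
    """E[t^X] for X the sum of n uniform binary indicators, by full enumeration."""
    total = sum(t ** sum(eps) for eps in itertools.product((0, 1), repeat=n))
    return total / 2 ** n

n, t = 5, 0.7
print(f_exact(t, n), ((t - 1) / 2 + 1) ** n)   # both 0.44370...
```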

4.1. Normalizing Constant with the Hamming Distance

This section proves the existence of a closed-form expression for the normalizing constant of the MM with the Hamming distance.
 Theorem 1.
The closed-form expression of the MM normalizing constant with the Hamming distance is
$$\Psi(\theta) = \left(e^{-\theta} + 1\right)^n.$$
Proof. 
Let the element-wise representation of the Hamming distance between two combinations be defined using the equality operator (=) as
$$\epsilon_i(\kappa) = \begin{cases} 1 & \text{if } \kappa(i) = \kappa_0(i), \\ 0 & \text{otherwise}. \end{cases}$$
The additive decomposition of the Hamming distance is then defined with the transform function $g_H(x) = n - x$, i.e.,
$$d_H(\kappa) = n - X(\kappa).$$
The MGF of $D(\nu)$ with the Hamming distance and the assumption of uniformity is
$$M(t) = E\left[e^{(n - X)t}\right] = e^{nt} f(e^{-t}) = e^{nt} \left(\frac{e^{-t} - 1}{2} + 1\right)^n.$$
From Lemmas 1 and 2, the closed-form expression for the normalizing constant with the Hamming discrepancy function is
$$\Psi(\theta) = 2^n M(-\theta) = 2^n e^{-\theta n} \left(\frac{e^{\theta} - 1}{2} + 1\right)^n = \left(e^{-\theta} + 1\right)^n.$$
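The result can be checked by brute force on a small universal set: summing $e^{-\theta d_H}$ over all $2^n$ combinations reproduces $\left(e^{-\theta}+1\right)^n$ (a sketch; helper names and the example values are mine).

```python
import itertools
import math

def d_hamming(kappa, kappa_0, universe):
    """n minus the number of elements on which the two sets agree (present in both
    or absent from both), i.e. the size of the symmetric difference."""
    agreements = sum((e in kappa) == (e in kappa_0) for e in universe)
    return len(universe) - agreements

universe = ("a", "b", "c", "d", "e")
kappa_0 = {"b", "d"}
theta = 0.9

brute = sum(math.exp(-theta * d_hamming(set(c), kappa_0, universe))
            for r in range(len(universe) + 1)
            for c in itertools.combinations(universe, r))        # 2^5 = 32 terms
closed = (math.exp(-theta) + 1) ** len(universe)                 # Theorem 1
print(brute, closed)                                             # both ~5.51
```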

4.2. Normalizing Constant with the Symmetric Difference

This section proves the existence of a closed-form expression for the normalizing constant of the MM with the symmetric difference.
 Theorem 2.
The closed-form expression of the MM normalizing constant with the symmetric difference is
$$\Psi(\theta) = \left(e^{-\theta} + 1\right)^n.$$
Proof. 
Let the element-wise representation of the symmetric difference between two combinations be defined in terms of the logical exclusive or (⊕) as
$$\epsilon_i(\kappa) = \begin{cases} 1 & \text{if } \kappa(i) \oplus \kappa_0(i), \\ 0 & \text{otherwise}. \end{cases}$$
The additive decomposition of the symmetric difference is then defined with the transform function $g_S(x) = x$, i.e.,
$$d_S(\kappa) = X(\kappa).$$
The MGF of $D(\nu)$ with the symmetric difference and the assumption of uniformity is
$$M(t) = E\left[e^{Xt}\right] = f(e^{t}) = \left(\frac{e^{t} - 1}{2} + 1\right)^n.$$
From Lemmas 1 and 2, the closed-form expression for the normalizing constant with the symmetric difference discrepancy function is
$$\Psi(\theta) = 2^n M(-\theta) = 2^n \left(\frac{e^{-\theta} - 1}{2} + 1\right)^n = \left(e^{-\theta} + 1\right)^n.$$
The cardinality of the symmetric difference between two combinations, $|\kappa \,\Delta\, \eta|$, is equivalent to the Hamming distance between $\kappa$ and $\eta$. Therefore, the expressions derived in Theorems 1 and 2 are equivalent, despite the difference in the Boolean operator used to define the two distance metrics.

4.3. Normalizing Constant with the Similarity Coefficient Distance

This section proves the existence of a closed-form expression for the normalizing constant of the MM with the similarity coefficient.
 Theorem 3.
The closed-form expression of the MM normalizing constant with the similarity coefficient is
$$\Psi(\theta) = \left(e^{-\theta/n} + 1\right)^n.$$
Proof. 
Let the element-wise representation of the similarity coefficient distance be defined in terms of the negation of the logical exclusive or (⊙) as
$$\epsilon_i(\kappa) = \begin{cases} 1 & \text{if } \kappa(i) \odot \kappa_0(i), \\ 0 & \text{otherwise}. \end{cases}$$
The additive decomposition of the similarity coefficient distance is then defined with the transform function $g_M(x) = \frac{x}{n}$, i.e.,
$$d_M(\kappa) = \frac{X(\kappa)}{n}.$$
The MGF of $D(\nu)$ with the similarity coefficient and the assumption of uniformity is
$$M(t) = E\left[e^{\frac{X}{n} t}\right] = f(e^{t/n}) = \left(\frac{e^{t/n} - 1}{2} + 1\right)^n.$$
From Lemmas 1 and 2, the closed-form expression for the normalizing constant with the similarity coefficient discrepancy function is
$$\Psi(\theta) = 2^n M(-\theta) = 2^n \left(\frac{e^{-\theta/n} - 1}{2} + 1\right)^n = \left(e^{-\theta/n} + 1\right)^n.$$
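The same brute-force check applies to the similarity coefficient; note the $\theta/n$ scaling in the closed form (a sketch; names are mine).

```python
import itertools
import math

def d_similarity(kappa, kappa_0, universe):
    """(|kappa ∩ kappa_0| + |kappa^c ∩ kappa_0^c|) / n, the fraction of agreeing elements."""
    agreements = sum((e in kappa) == (e in kappa_0) for e in universe)
    return agreements / len(universe)

universe = ("a", "b", "c", "d")
kappa_0 = {"a"}
theta, n = 2.0, len(universe)

brute = sum(math.exp(-theta * d_similarity(set(c), kappa_0, universe))
            for r in range(n + 1)
            for c in itertools.combinations(universe, r))
closed = (math.exp(-theta / n) + 1) ** n                         # Theorem 3
print(brute, closed)                                             # both ~6.66
```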

5. Weighted Mallows Model for Combinatorial Domains

Fligner and Verducci proposed an extension of the MM, the generalized Mallows model (GMM) [17]. The GMM has $n-1$ dispersion parameters (i.e., $\theta_j$ with $1 \le j < n$), which are each used to influence the probability of a specific position within a sampled permutation. Irurozki proposed the weighted Mallows model (WMM) extension of the MM with the Hamming distance for permutation problems [22]. Since the decomposition vector of the Hamming distance has n terms instead of $n-1$ terms, the GMM of the Hamming distance does not exist.
The WMM can be constructed for combination spaces as well. The WMM is used to control the likelihood of specific elements being included in the combinations sampled around the central tendency. In other words, the WMM can sample combinations, given a central tendency and collection of dispersion parameters, and take into account different levels of preference for the inclusion of each element e i U .
 Definition 10.
Let the element-wise distance vector be
$$D(\kappa) = \left(D_1(\kappa, \kappa_0), \ldots, D_n(\kappa, \kappa_0)\right),$$
where $D_j(\kappa, \kappa_0) = 0$ if $\kappa(j) \circ \kappa_0(j)$ and 1 otherwise (i.e., $D(\kappa)$ is a vector of discrepancy function components defined using the dyadic operator $\circ$). It follows that $d(\kappa, \kappa_0) = \sum_{j=1}^{n} D_j(\kappa)$.
 Definition 11.
Define a vector
$$\epsilon(\kappa) = \left(\epsilon_1(\kappa), \ldots, \epsilon_n(\kappa)\right)$$
with $\epsilon(\kappa) = g(D(\kappa))$ (i.e., each $\epsilon_j(\kappa) = g(D_j(\kappa))$), where g is a function which transforms $D_j(\kappa)$ if needed (e.g., a linear transform, a translation, or the identity). When $\kappa$ is sampled uniformly, $\epsilon(\kappa)$ is a random binary vector; therefore, the probability of sampling $\kappa$ with a distance vector $\epsilon$ is $P_0(\epsilon(\kappa) = \epsilon)$.
 Lemma 3.
The normalizing constant, under the assumption of uniformity, is related to the MGF by
$$\Psi(\theta) = 2^n M_\epsilon(-\theta).$$
Proof. 
The normalizing constant can be expressed as a function of the MGF associated with a random vector X, $M_X(t) = E\left[\prod_{j=1}^{n} e^{t_j X_j}\right]$. Then,
$$\Psi(\theta) = 2^n \sum_{D \in \mathbb{B}^n} P_0(D(\kappa) = D)\, e^{-\sum_{j=1}^{n} \theta_j D_j} = 2^n \sum_{g(D) \in \mathbb{B}^n} P_0(g(D))\, e^{-\sum_{j=1}^{n} \theta_j g(D_j)} = 2^n \sum_{\epsilon \in \mathbb{B}^n} P_0(\epsilon)\, e^{-\sum_{j=1}^{n} \theta_j \epsilon_j} = 2^n M_\epsilon(-\theta).$$
 Definition 12.
Assume that $\kappa$ is uniformly distributed over $\Omega$, and define a function f similarly to the process in [17] as
$$f_\epsilon(t) = f_\epsilon(t_1, \ldots, t_n) = E\left[t_1^{\epsilon_1} \cdots t_n^{\epsilon_n}\right].$$
 Lemma 4.
The Taylor series expansion of $f_\epsilon(t)$ at $t = \mathbf{1}$ is
$$f_\epsilon(t) = \prod_{j=1}^{n} \frac{t_j + 1}{2}.$$
Proof. 
The multivariate Taylor series expansion of $f_\epsilon(t)$ around $t = \mathbf{1}$ is
$$f_\epsilon(t) = \sum_{j=0}^{\infty} \frac{1}{j!} \sum_{x_1 + \cdots + x_n = j} \frac{\partial^j f}{\partial t_1^{x_1} \cdots \partial t_n^{x_n}} \Big|_{t=\mathbf{1}} (t_1 - 1)^{x_1} \cdots (t_n - 1)^{x_n},$$
while the derivative of $f_\epsilon$ with respect to the variable $t_i$ is given as
$$\frac{\partial f_\epsilon}{\partial t_i} = \sum_{\epsilon} P_0(\epsilon)\, t_1^{\epsilon_1} \cdots \epsilon_i t_i^{\epsilon_i - 1} \cdots t_n^{\epsilon_n} = \sum_{\epsilon \mid \epsilon_i = 0} P_0(\epsilon)\, t_1^{\epsilon_1} \cdots 0 \cdot t_i^{0-1} \cdots t_n^{\epsilon_n} + \sum_{\epsilon \mid \epsilon_i = 1} P_0(\epsilon)\, t_1^{\epsilon_1} \cdots 1 \cdot t_i^{1-1} \cdots t_n^{\epsilon_n} = 0 + \sum_{\epsilon \mid \epsilon_i = 1} P_0(\epsilon) \prod_{j \ne i} t_j^{\epsilon_j},$$
which evaluated at $t = \mathbf{1}$ is
$$\frac{\partial f_\epsilon}{\partial t_i} \Big|_{t=\mathbf{1}} = \sum_{\epsilon \mid \epsilon_i = 1} P_0(\epsilon)\, 1^{\epsilon_1} \cdots 1 \cdots 1^{\epsilon_n} = \sum_{\epsilon \mid \epsilon_i = 1} P_0(\epsilon).$$
This quantity is simply the probability of uniformly sampling a combination in which element i is included, i.e.,
$$\sum_{\epsilon \mid \epsilon_i = 1} P_0(\epsilon) = \frac{2^{n-1}}{2^n}.$$
The second-order partial derivative with respect to the same variable $t_i$ is zero, i.e.,
$$\frac{\partial^2 f}{\partial t_i^2} = 0.$$
However, the second-order cross partial derivative with respect to two different variables $t_{i_1}$ and $t_{i_2}$ is
$$\frac{\partial^2 f_\epsilon}{\partial t_{i_1} \partial t_{i_2}} \Big|_{t=\mathbf{1}} = \sum_{\epsilon \mid \epsilon_{i_1} = 1,\, \epsilon_{i_2} = 1} P_0(\epsilon)\, 1^{\epsilon_1} \cdots 1 \cdots 1 \cdots 1^{\epsilon_n} = \sum_{\epsilon \mid \epsilon_{i_1} = 1,\, \epsilon_{i_2} = 1} P_0(\epsilon),$$
which is the probability of uniformly sampling a combination in which both elements $i_1$ and $i_2$ are included, i.e.,
$$\sum_{\epsilon \mid \epsilon_{i_1} = 1,\, \epsilon_{i_2} = 1} P_0(\epsilon) = \frac{2^{n-2}}{2^n}.$$
In general, the k-th-order cross partial derivative evaluated at $t = \mathbf{1}$ is
$$\frac{\partial^k f}{\partial t_{i_1} \cdots \partial t_{i_k}} \Big|_{t=\mathbf{1}} = \frac{2^{n-k}}{2^n} = \frac{1}{2^k}.$$
Since each set of j distinct indices appears in j! different orders in the Taylor expansion, and since $\frac{\partial^j f}{\partial t_i^j} = 0$ for all $j > 1$, the Taylor series expansion becomes
$$f_\epsilon(t) = \sum_{j=0}^{n} \frac{1}{j!} \sum_{A_j} j!\, \frac{\partial^j f}{\partial t_{i_1} \cdots \partial t_{i_j}} \Big|_{t=\mathbf{1}} \prod_{s=1}^{j} (t_{i_s} - 1) = \sum_{j=0}^{n} \frac{1}{j!} \sum_{A_j} \frac{j!}{2^j} \prod_{s=1}^{j} (t_{i_s} - 1) = \sum_{j=0}^{n} \frac{1}{2^j} \sum_{A_j} \prod_{s=1}^{j} (t_{i_s} - 1) = \sum_{j=0}^{n} \frac{1}{2^j}\, e_j\big((t_1 - 1), \ldots, (t_n - 1)\big),$$
where $e_j\big((t_1 - 1), \ldots, (t_n - 1)\big)$ is the elementary symmetric polynomial (ESP) of degree j and $A_j = \{S \subseteq \Omega \mid |S| = j\}$. The sequence of ESPs has a generating function of the form
$$\prod_{j=1}^{n} \left(1 + x_j \cdot a\right) = \sum_{j=0}^{n} a^j \cdot e_j(x_1, \ldots, x_n),$$
which can be used to represent the weighted sum of the ESPs as
$$\sum_{j=0}^{n} \frac{1}{2^j}\, e_j(x_1, \ldots, x_n) = \prod_{j=1}^{n} \left(1 + x_j \cdot \frac{1}{2}\right).$$
Hence,
$$f_\epsilon(t) = \prod_{j=1}^{n} \frac{t_j + 1}{2}.$$
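Lemma 4 can also be verified by direct enumeration of the $2^n$ binary vectors (a sketch; names are mine).

```python
import itertools
import math

def f_eps_exact(t):
    """E[t_1^eps_1 * ... * t_n^eps_n] for a uniformly random binary vector eps."""
    n = len(t)
    total = sum(math.prod(tj ** ej for tj, ej in zip(t, eps))
                for eps in itertools.product((0, 1), repeat=n))
    return total / 2 ** n

t = (0.3, 1.7, 0.9, 2.2)
print(f_eps_exact(t), math.prod((tj + 1) / 2 for tj in t))   # both 1.3338...
```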

5.1. Weighted Mallows Model Normalizing Constant with the Hamming Distance

This section proves the existence of a closed-form expression for the normalizing constant of the WMM with the Hamming distance.
 Theorem 4.
The closed-form expression of the WMM normalizing constant with the Hamming distance is
$$\Psi(\theta) = \prod_{j=1}^{n} \left(e^{-\theta_j} + 1\right).$$
Proof. 
Let the element-wise distance vector be $H(\kappa) = (H_1(\kappa, \kappa_0), \ldots, H_n(\kappa, \kappa_0))$, where $H_j(\kappa, \kappa_0) = 1$ if $\kappa(j) = \kappa_0(j)$ and 0 otherwise (i.e., a vector of Hamming distance components). It follows that
$$d_H(\kappa, \kappa_0) = \sum_{j=1}^{n} H_j(\kappa).$$
Define the vector $\epsilon(\kappa)$ using the function $g_H(x) = 1 - x$:
$$\epsilon(\kappa) = g_H(H) = \left(g_H(H_1(\kappa)), \ldots, g_H(H_n(\kappa))\right) = \left(1 - H_1(\kappa), \ldots, 1 - H_n(\kappa)\right).$$
The MGF parameterized by $\epsilon$ is
$$M_\epsilon(-\theta) = M_{g_H(H)}(-\theta) = M_{(1-H)}(-\theta) = e^{-\sum_{j=1}^{n} \theta_j} f_\epsilon(e^{\theta}) = e^{-\sum_{j=1}^{n} \theta_j} \prod_{j=1}^{n} \frac{1 + e^{\theta_j}}{2},$$
which, from Lemmas 3 and 4, results in the closed-form expression for the normalizing constant of the WMM with the Hamming distance as
$$\Psi(\theta) = 2^n e^{-\sum_{j=1}^{n} \theta_j} \prod_{j=1}^{n} \frac{1 + e^{\theta_j}}{2} = \prod_{j=1}^{n} \left(e^{-\theta_j} + 1\right).$$
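As with the unweighted case, Theorem 4 can be checked by brute force. The sketch below (names are mine) weights the positions at which the two sets differ; under the uniform distribution the same normalizing constant is obtained if the agreeing positions are weighted instead, so the check is insensitive to that choice.

```python
import itertools
import math

def weighted_mismatch(kappa, kappa_0, universe, thetas):
    """Sum of theta_j over the positions j at which the two sets disagree."""
    return sum(th for e, th in zip(universe, thetas)
               if (e in kappa) != (e in kappa_0))

universe = ("a", "b", "c", "d")
thetas = (0.5, 1.0, 1.5, 2.0)
kappa_0 = {"a", "d"}

brute = sum(math.exp(-weighted_mismatch(set(c), kappa_0, universe, thetas))
            for r in range(len(universe) + 1)
            for c in itertools.combinations(universe, r))        # 2^4 = 16 terms
closed = math.prod(math.exp(-th) + 1 for th in thetas)           # Theorem 4
print(brute, closed)                                             # both ~3.05
```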

5.2. Weighted Mallows Model Normalizing Constant with the Symmetric Difference

This section proves the existence of a closed-form expression for the normalizing constant of the WMM with the symmetric difference.
 Theorem 5.
The closed-form expression of the WMM normalizing constant with the symmetric difference is
$$\Psi(\theta) = \prod_{j=1}^{n} \left(e^{-\theta_j} + 1\right).$$
Proof. 
Let the element-wise distance vector be $S(\kappa) = (S_1(\kappa, \kappa_0), \ldots, S_n(\kappa, \kappa_0))$, where $S_j(\kappa, \kappa_0) = 1$ if $\kappa(j) \oplus \kappa_0(j)$ and 0 otherwise (i.e., a vector of symmetric difference components). It follows that
$$d_S(\kappa, \kappa_0) = \sum_{j=1}^{n} S_j(\kappa).$$
Define the vector $\epsilon(\kappa)$ using the function $g_S(x) = x$:
$$\epsilon(\kappa) = g_S(S) = \left(g_S(S_1(\kappa)), \ldots, g_S(S_n(\kappa))\right) = \left(S_1(\kappa), \ldots, S_n(\kappa)\right).$$
The MGF parameterized by $\epsilon$ is
$$M_\epsilon(-\theta) = M_{g_S(S)}(-\theta) = M_S(-\theta) = f_\epsilon(e^{-\theta}) = \prod_{j=1}^{n} \frac{1 + e^{-\theta_j}}{2},$$
which, from Lemmas 3 and 4, results in the closed-form expression for the normalizing constant of the WMM with the symmetric difference as
$$\Psi(\theta) = 2^n \prod_{j=1}^{n} \frac{1 + e^{-\theta_j}}{2} = \prod_{j=1}^{n} \left(e^{-\theta_j} + 1\right).$$

5.3. Weighted Mallows Model Normalizing Constant with the Similarity Coefficient Distance

This section proves the existence of a closed-form expression for the normalizing constant of the WMM with the similarity coefficient.
 Theorem 6.
The closed-form expression of the WMM normalizing constant with the similarity coefficient is
$$\Psi(\theta) = \prod_{j=1}^{n} \left(e^{-\theta_j/n} + 1\right).$$
Proof. 
Let the element-wise distance vector be $M(\kappa) = (M_1(\kappa, \kappa_0), \ldots, M_n(\kappa, \kappa_0))$, where $M_j(\kappa, \kappa_0) = \frac{1}{n}$ if $\kappa(j) \odot \kappa_0(j)$ and 0 otherwise (i.e., a vector of similarity coefficient components). It follows that
$$d_M(\kappa, \kappa_0) = \sum_{j=1}^{n} M_j(\kappa).$$
Define the vector $\epsilon(\kappa)$ using the function $g_M(x) = xn$:
$$\epsilon(\kappa) = g_M(M) = \left(g_M(M_1(\kappa)), \ldots, g_M(M_n(\kappa))\right) = \left(M_1(\kappa)\, n, \ldots, M_n(\kappa)\, n\right).$$
The MGF parameterized by $\epsilon$ is
$$M_\epsilon\!\left(-\tfrac{\theta}{n}\right) = M_{g_M(M)}\!\left(-\tfrac{\theta}{n}\right) = M_{nM}\!\left(-\tfrac{\theta}{n}\right) = f_\epsilon\!\left(e^{-\theta/n}\right) = \prod_{j=1}^{n} \frac{1 + e^{-\theta_j/n}}{2},$$
which, from Lemmas 3 and 4, results in the closed-form expression for the normalizing constant of the WMM with the similarity coefficient as
$$\Psi(\theta) = 2^n \prod_{j=1}^{n} \frac{1 + e^{-\theta_j/n}}{2} = \prod_{j=1}^{n} \left(e^{-\theta_j/n} + 1\right).$$

6. Discussion

The preceding sections derived the closed-form expressions for the MM and WMM. The derived expressions for the normalizing constant of the MM and WMM with different discrepancy functions are summarized in Table 1.
Consider a hypothetical combinatorial domain defined on the universal set $U = \{a, b, c\}$. Given that the cardinality of the universal set is $|U| = n$, the set U spans $2^n$ possible combinations, i.e.,
$$\mathcal{P}(U) = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\}\}.$$
Consider two sets $\kappa = \{a, b\}$ and $\kappa_0 = \{b\}$; the symmetric difference is
$$d_S(\kappa, \kappa_0) = |\kappa \,\Delta\, \kappa_0| = |\kappa_0 \setminus \kappa| + |\kappa \setminus \kappa_0| = 0 + 1 = 1.$$
The probability of sampling a set $\kappa$ given the central tendency $\kappa_0$ is expressed as
$$P(\kappa \mid \kappa_0, \theta) = \frac{1}{\Psi(\theta)}\, e^{-\theta d(\kappa, \kappa_0)};$$
with an arbitrary value of $\theta = 1.5$,
$$P(\{a, b\} \mid \{b\}, 1.5) = \frac{1}{\left(e^{-1.5} + 1\right)^3}\, e^{-1.5 \cdot 1} = 0.1219.$$
Alternatively, if the similarity coefficient is used, i.e.,
$$d_M(\kappa, \kappa_0) = \frac{|\kappa \cap \kappa_0| + |\kappa^c \cap \kappa_0^c|}{n} = \frac{1 + 1}{3} = \frac{2}{3},$$
then the resultant probability is
$$P(\{a, b\} \mid \{b\}, 1.5) = \frac{1}{\left(e^{-1.5/3} + 1\right)^3}\, e^{-1.5 \cdot \frac{2}{3}} = 0.0887.$$
If the value of the dispersion parameter is arbitrarily changed to $\theta = 5$ for the similarity coefficient calculation, the result is
$$P(\{a, b\} \mid \{b\}, 5) = \frac{1}{\left(e^{-5/3} + 1\right)^3}\, e^{-5 \cdot \frac{2}{3}} = 0.0212.$$
The effect of the dispersion parameter is further illustrated by the following two examples with the symmetric difference: given $\kappa = \{a, b\}$ and $\kappa_0 = \{a, b\}$, the probability of sampling the central tendency for a dispersion of $\theta = 0$ is
$$P(\{a, b\} \mid \{a, b\}, 0) = \frac{1}{\left(e^{0} + 1\right)^3}\, e^{-0 \cdot 0} = 0.125 = \frac{1}{2^n}.$$
In contrast, for a dispersion of $\theta = 10$, the probability is
$$P(\{a, b\} \mid \{a, b\}, 10) = \frac{1}{\left(e^{-10} + 1\right)^3}\, e^{-10 \cdot 0} = 0.9998 \approx 1.0.$$
When the dispersion parameter is zero, the model behaves like a uniform distribution, with all subsets of P ( U ) equally likely to be sampled with a probability of 1 2 n . As the dispersion parameter increases, there is a point at which it becomes certain that the central tendency is the only combination which can be sampled.
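The four probabilities above can be reproduced with a few lines of Python (a sketch; the helper name mallows_prob is mine).

```python
import math

def mallows_prob(d, theta, psi):
    """P(kappa | kappa_0, theta) = exp(-theta * d) / Psi(theta)."""
    return math.exp(-theta * d) / psi

n = 3
# Symmetric difference, kappa = {a, b}, kappa_0 = {b}: d = 1, theta = 1.5
print(mallows_prob(1, 1.5, (math.exp(-1.5) + 1) ** n))           # 0.1219...
# Similarity coefficient, same pair: d = 2/3, theta = 1.5
print(mallows_prob(2 / 3, 1.5, (math.exp(-1.5 / n) + 1) ** n))   # 0.0887...
# Sampling the central tendency itself (d = 0) at theta = 0 and theta = 10
print(mallows_prob(0, 0.0, (math.exp(0.0) + 1) ** n))            # 0.125
print(mallows_prob(0, 10.0, (math.exp(-10.0) + 1) ** n))         # 0.99986... ≈ 1.0
```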
The combinatorial MM holds promise for application to a wide range of problems in the combinatorial domain. Many combinatorial decision problems are in the class of NP problems [30]. Combinatorial decision problems in NP often result in COPs, which do not have tractable exact solutions. For example, the graph coloring problem, Boolean satisfiability problem, job-shop scheduling problem, and the traveling salesman problem are all examples of COPs which are in NP. As a result, approximation approaches such as metaheuristics are employed to find potentially optimal solutions in a reasonable amount of time.
The combinatorial MM is applicable to any problem for which there is an underlying combinatorial decision problem and which can be represented as a finite set. It is a popular approach in the literature to enhance the exploration and exploitation capabilities of metaheuristics through the inclusion of a probabilistic local search mechanism. For example, the popular particle swarm optimization (PSO) metaheuristic has a variant which uses the Gaussian density function to sample new positions [31]. Similarly, the logistic chaotic function has been employed to improve the performance of differential evolution (DE) [32], and the Wigner semicircle distribution has been used to stochastically apply different local search techniques to evolutionary strategies (ESs) [33].

7. Conclusions and Future Work

This paper expanded the Mallows model (MM) and weighted Mallows model (WMM) for use in combinatorial domains. Previously, the computational bottleneck for probability calculation and combination sampling has been the normalizing constant. The requirement to sum over all possible subsets of elements in the universal set made it computationally infeasible to use the MM to sample combinations in relation to a central tendency.
The closed-form expressions for the normalizing constant with the Hamming distance, symmetric difference, and similarity coefficient (Rand index) make the MM and WMM computationally feasible for use on combinatorial problems and provide a platform for solving problems defined using set-valued representations. Undoubtedly, the most natural measurement of discrepancy in combinatorial domains remains the Jaccard distance. Unfortunately, a closed-form expression for the Jaccard distance remains unobtainable due to the lack of an additive decomposition form; however, researchers are invited to attempt to derive a more efficient expression for the normalizing constant with the Jaccard distance.
Additionally, future work should concentrate on efficient sampling methods for the MM and determine whether existing approaches like Gibbs sampling are appropriate. Further, the parameter learning of the combinatorial MM should be explored, for example, a thorough investigation into the maximum likelihood estimation (MLE) of the parameters.

Author Contributions

Conceptualization, J.-P.v.Z. and A.P.E.; methodology, J.-P.v.Z.; formal analysis, J.-P.v.Z.; writing—original draft preparation, J.-P.v.Z.; writing—review and editing, A.P.E.; supervision, A.P.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Berge, C. Principles of Combinatorics; Academic Press: New York, NY, USA, 1971. [Google Scholar]
  2. Biggs, N.L. The roots of combinatorics. Hist. Math. 1979, 6, 109–136. [Google Scholar] [CrossRef]
  3. Cameron, P.J. Permutation Groups; Number 45; Cambridge University Press: Cambridge, UK, 1999. [Google Scholar]
  4. Nixon, R. Amending the Selective Service Regulations to Prescribe Random Selection; Executive Order 11497, Number 229; Federal Register: Washington, DC, USA, 1969; Volume 34. [Google Scholar]
  5. Fienberg, S.E. Randomization and Social Affairs: The 1970 Draft Lottery. Science 1971, 171, 255–261. [Google Scholar] [CrossRef]
  6. Hall, M.; Knuth, D.E. Combinatorial Analysis and Computers. Am. Math. Mon. 1965, 72, 21–28. [Google Scholar] [CrossRef]
  7. Papadimitriou, C.H.; Steiglitz, K. Combinatorial Optimization: Algorithms and Complexity; Courier Corporation: New York, NY, USA, 2013. [Google Scholar]
  8. Chen, S.; Lin, T.; King, I.; Lyu, M.R.; Chen, W. Combinatorial pure exploration of multi-armed bandits. Adv. Neural Inf. Process. Syst. 2014, 27, 379–387. [Google Scholar]
  9. Robins, G.; Pattison, P.; Kalish, Y.; Lusher, D. An introduction to exponential random graph (p*) models for social networks. Soc. Netw. 2007, 29, 173–191. [Google Scholar] [CrossRef]
  10. Liang, F. A double Metropolis–Hastings sampler for spatial models with intractable normalizing constants. J. Stat. Comput. Simul. 2010, 80, 1007–1022. [Google Scholar] [CrossRef]
  11. Park, J.; Haran, M. Bayesian inference in the presence of intractable normalizing functions. J. Am. Stat. Assoc. 2018, 113, 1372–1390. [Google Scholar] [CrossRef]
  12. Mallows, C.L. Non-null ranking models. I. Biometrika 1957, 44, 114–130. [Google Scholar] [CrossRef]
  13. Spearman, C. The proof and measurement of association between two things. Am. J. Psychol. 1904, 15, 72–101. [Google Scholar] [CrossRef]
  14. Kendall, M.G. A new measure of rank correlation. Biometrika 1938, 30, 81–93. [Google Scholar] [CrossRef]
  15. Bradley, R.A.; Terry, M.E. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika 1952, 39, 324–345. [Google Scholar] [CrossRef]
  16. Diaconis, P. Group Representations in Probability and Statistics; Lecture Notes-Monograph Series; Institute of Mathematical Statistics: Hayward, CA, USA, 1988; Volume 11, pp. i–vi+1–192. [Google Scholar]
  17. Fligner, M.A.; Verducci, J.S. Distance based ranking models. J. R. Stat. Soc. Ser. B (Methodol.) 1986, 48, 359–369. [Google Scholar] [CrossRef]
  18. Spearman, C. Footrule for measuring correlation. Br. J. Psychol. 1906, 2, 89. [Google Scholar] [CrossRef]
  19. Hamming, R.W. Error detecting and error correcting codes. Bell Syst. Tech. J. 1950, 29, 147–160. [Google Scholar] [CrossRef]
  20. Ulam, S.M. Monte Carlo calculations in problems of mathematical physics. Mod. Math. Eng. 1961, 261, 281. [Google Scholar]
  21. Levenshtein, V.I. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet Physics-Doklady; American Institute of Physics: New York, NY, USA, 1966; Volume 10, pp. 707–710. [Google Scholar]
  22. Irurozki, E. Sampling and Learning Distance-Based Probability Models for Permutation Spaces. Ph.D. Thesis, Universidad del País Vasco-Euskal Herriko Unibertsitatea, Biscay, Spain, 2014. [Google Scholar]
  23. Cayley, A. Note on the theory of permutations. Lond. Edinburgh Dublin Philos. Mag. J. Sci. 1849, 34, 527–529. [Google Scholar] [CrossRef]
  24. Jaccard, P. Distribution de la Flore Alpine dans le Bassin des Dranses et dans quelques régions voisines. Bull. Soc. Vaudoise Sci. Nat. 1901, 37, 241–272. [Google Scholar]
  25. Rand, W.M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 1971, 66, 846–850. [Google Scholar] [CrossRef]
  26. Goodall, D.W. The distribution of the matching coefficient. Biometrics 1967, 23, 647–656. [Google Scholar] [CrossRef]
  27. Sokal, R.R.; Michener, C.D. A Statistical Method for Evaluating Systematic Relationships; University of Kansas Scientific Bulletin: Lawrence, KS, USA, 1958. [Google Scholar]
  28. Esscher, F. On the probability function in the collective theory of risk. Scand. Actuar. J. 1932, 1932, 175–195. [Google Scholar] [CrossRef]
  29. Lehmann, E.L. Theory of Point Estimation; A Wiley Publication in Mathematical Statistics; Wiley: Hoboken, NJ, USA, 1983. [Google Scholar]
  30. Karp, R.M. On the computational complexity of combinatorial problems. Networks 1975, 5, 45–68. [Google Scholar] [CrossRef]
  31. Krohling, R.A. Gaussian swarm: A novel particle swarm optimization algorithm. In Proceedings of the IEEE Conference on Cybernetics and Intelligent Systems, Singapore, 1–3 December 2004; IEEE: New York, NY, USA, 2004; Volume 1, pp. 372–376. [Google Scholar]
  32. Jia, D.; Zheng, G.; Khan, M.K. An effective memetic differential evolution algorithm based on chaotic local search. Inf. Sci. 2011, 181, 3175–3187. [Google Scholar] [CrossRef]
  33. Lara, A.; Sanchez, G.; Coello, C.A.C.; Schutze, O. HCS: A new local search strategy for memetic multiobjective evolutionary algorithms. IEEE Trans. Evol. Comput. 2009, 14, 112–132. [Google Scholar] [CrossRef]
Table 1. Summary of derived normalizing constants.
Discrepancy Function | Normalizing Constant
Hamming distance | $\Psi(\theta) = \left(e^{-\theta} + 1\right)^n$
Symmetric difference | $\Psi(\theta) = \left(e^{-\theta} + 1\right)^n$
Similarity index | $\Psi(\theta) = \left(e^{-\theta/n} + 1\right)^n$
Weighted Hamming distance | $\Psi(\theta) = \prod_{j=1}^{n} \left(e^{-\theta_j} + 1\right)$
Weighted symmetric difference | $\Psi(\theta) = \prod_{j=1}^{n} \left(e^{-\theta_j} + 1\right)$
Weighted similarity index | $\Psi(\theta) = \prod_{j=1}^{n} \left(e^{-\theta_j/n} + 1\right)$
