Assessing Antithetic Sampling for Approximating Shapley, Banzhaf, and Owen Values

Computing Shapley values for large cooperative games is an NP-hard problem. For practical applications, stochastic approximation via permutation sampling is widely used. In the context of machine learning applications of the Shapley value, the concept of antithetic sampling has become popular. The idea is to employ the reverse permutation of a sample in order to reduce variance and accelerate convergence of the algorithm. We study this approach for the Shapley and Banzhaf values, as well as for the Owen value, which is a solution concept for games with precoalitions. We combine antithetic samples with established stratified sampling algorithms. Finally, we evaluate the performance of these algorithms on four different types of cooperative games.


Introduction
Cooperative game theory [1,2] provides a framework for analyzing situations where groups of individuals or entities collaborate to achieve common goals or outcomes. Unlike non-cooperative game theory, which focuses on strategic interactions without binding agreements, cooperative game theory explores scenarios in which players can form coalitions and distribute the benefits of their cooperation.
Modern applications of cooperative game theory reach far beyond the fair allocation of benefits. For example, models and solution concepts from cooperative game theory are employed for understanding voting power in committees [3][4][5], as well as for analyzing genetic networks [6,7], terrorist networks [8][9][10], or complex shareholding networks [11,12]. However, the number of coalitions grows exponentially in the number of players n, which makes efficient computations on cooperative games challenging. This is particularly true for the Shapley value [13], the most widely used solution concept from cooperative game theory. In the general case, calculating the Shapley value is NP-hard [14][15][16]. Recently, a substantial body of research in interpretable machine learning has used the Shapley value to determine the importance of features with respect to the outcome of neural networks [17][18][19][20]. In these applications, Shapley values are frequently approximated via permutation sampling, a technique introduced in the seminal paper by Castro et al. (2009) [21]. Permutation sampling for Shapley value computation has since been improved in various ways in order to draw samples more efficiently [22][23][24]. In the machine learning community, antithetic sampling, i.e., employing a permutation together with its reverse, has gained traction due to its simplicity since it was recommended by Mitchell et al. (2022) [23]. The recent survey by Chen et al. (2023) [18] presents antithetic sampling in a favorable light, and a recent book by Molnar (2023) [20] recommends antithetic sampling as well.
This paper studies antithetic permutation sampling for approximating Shapley values from a cooperative game theory perspective. We deliberately include two other point-valued solution concepts in our investigations. The Banzhaf value [25,26] enjoys a wide range of applications, including contemporary studies on data valuation in machine learning [27]. The Owen value [28] generalizes the Shapley value to situations with precoalitions, i.e., the player set is partitioned into disjoint a priori unions of players. Algaba et al. (2023) [8] approximate Owen values to identify the most important people in the terrorist network responsible for the attack on the World Trade Center in September of 2001.
We incorporate antithetic sampling into existing algorithms for approximating the Shapley, Banzhaf, and Owen values. In particular, we combine existing stratified sampling algorithms for the Shapley value [22] and for the Owen value [29] with the concept of antithetic sampling via the reverse permutation and develop new algorithms. We point out the unbiasedness and consistency of our estimators.
Our article is limited to point-valued solution concepts based on marginal contributions. For a study presenting Shapley value approximation in the broader context of computing solutions of cooperative games, we refer the reader to Liben-Nowell et al. (2012) [30]. Specifically, the goal of this research is to evaluate the potential of antithetic sampling in the context of sampling permutations or subsets with replacement. There are various other concepts for approximating Shapley values which this paper does not discuss, as they are too numerous to list. In particular, we do not perform any sampling without replacement [31,32], and we completely omit the multilinear extension method by Owen (1972) [33] and its recent variants [34], any approaches that combine sampling with exact solutions of subproblems [35,36], and any modern machine learning-related models [19,37].
This article is organized as follows: In Section 2, we introduce the basic concepts from cooperative game theory, provide a brief introduction to Monte Carlo methods and some variance reduction techniques, and give a very brief overview of existing permutation sampling algorithms. In Section 3, we discuss antithetic sampling for the Shapley and Banzhaf values, introduce a novel stratified antithetic sampling algorithm for Shapley value approximation, and point out how antithetic samples can be incorporated into the two-stage stratified sampling algorithm with optimum allocation introduced in [22]. In Section 4, we integrate antithetic sampling into the approximation algorithm for the Owen value introduced by Saavedra-Nieves, García-Jurado, and Fiestras-Janeiro (2018) in [38] and then develop a sophisticated stratified antithetic sampling method for the Owen value based on the ideas by Saavedra-Nieves from [29]. In Section 5, we analyze the performance of the discussed algorithms for four different types of cooperative games and assess the results critically. We end with our conclusions and recommendations in Section 6.

Preliminaries
In this section, we first introduce a few basic concepts from cooperative game theory, including the Shapley [13], Banzhaf [25,26], and Owen [28] values. Afterwards, we introduce some terminology on Monte Carlo methods, including variance reduction via antithetic sampling and stratified sampling.

Cooperative Game with Transferable Utility
The focus of our study is cooperative games with transferable utility (TU games) [1,2]. The term cooperative describes the fact that players can form coalitions and make binding agreements on how to distribute the proceeds of these coalitions between themselves. The term transferable utility means that the amount of utility earned by a coalition can be both expressed by a number and transferred between the players. The most common type of utility is money.
A TU game is a pair $(N, v)$ where $N = \{1, \ldots, n\}$ is the set of players and $v : 2^N \to \mathbb{R}$ is a real-valued function, called the characteristic function, defined on the subsets of $N$. The subsets $S \subseteq N$ are also called coalitions. The function $v$ assigns to each coalition $S$ a real number $v(S)$ representing the amount of utility earned by this coalition. For the empty coalition, the normalization $v(\emptyset) = 0$ holds. Throughout this work, we denote the cardinality of a set by the corresponding lower-case letter, e.g., $|S| = s$.
Let $G^N$ be the set of TU games with player set $N$. A point-valued solution concept is a map $\phi : G^N \to \mathbb{R}^n$ that assigns a vector $\phi(N, v) \in \mathbb{R}^n$ to each game $(N, v) \in G^N$, where the $i$-th element of this vector, $\phi_i(N, v)$, represents the worth or influence of player $i \in N$ in the game $(N, v)$ according to the underlying solution concept. The most popular point-valued solution concept in cooperative game theory is the Shapley value [13]. Given the player set $N$ and the characteristic function $v : 2^N \to \mathbb{R}$, the Shapley value of player $i$ is defined as
$$Sh_i(N, v) = \sum_{S \subseteq N \setminus \{i\}} \frac{s!\,(n - s - 1)!}{n!} \bigl( v(S \cup \{i\}) - v(S) \bigr). \tag{1}$$
Definition 1 ([21,22]). Let $\pi(N)$ denote the set of all possible permutations of the player set $N = \{1, \ldots, n\}$. For an order $O \in \pi(N)$, we denote by $\mathrm{pre}_i(O)$ the set of predecessors of player $i$ in $O$.
Using the previous definition, the Shapley value (1) of player $i$ can be rewritten in the form
$$Sh_i(N, v) = \frac{1}{n!} \sum_{O \in \pi(N)} \bigl( v(\mathrm{pre}_i(O) \cup \{i\}) - v(\mathrm{pre}_i(O)) \bigr). \tag{2}$$
Another important point-valued solution concept in cooperative game theory is the Banzhaf value [25,26]. Given the player set $N$ and the characteristic function $v : 2^N \to \mathbb{R}$, the Banzhaf value of player $i$ is defined as
$$Bz_i(N, v) = \frac{1}{2^{n-1}} \sum_{S \subseteq N \setminus \{i\}} \bigl( v(S \cup \{i\}) - v(S) \bigr). \tag{3}$$
From (2) and (3), we see that both the Shapley and the Banzhaf value are based on the concept of marginal contributions. Note that the Shapley value is an efficient solution concept, as $\sum_{i=1}^{n} Sh_i(N, v) = v(N)$, whereas in general the Banzhaf value is not, i.e., the sum of the Banzhaf values of all players does not necessarily equal the value $v(N)$ of the grand coalition.
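To make the two definitions concrete, the following minimal Python sketch evaluates the subset formulas for the Shapley and Banzhaf values by brute-force enumeration. The function names and the three-player weighted majority game (weights 2, 1, 1 and quota 3) are hypothetical illustrations and are not taken from the paper; the enumeration is only feasible for very small n.

```python
from itertools import combinations
from math import factorial


def shapley_exact(n, v):
    """Exact Shapley values via the subset formula; v maps a frozenset to a float."""
    players = range(1, n + 1)
    result = {}
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for s in range(n):  # s = |S| runs over 0, ..., n - 1
            weight = factorial(s) * factorial(n - s - 1) / factorial(n)
            for subset in combinations(others, s):
                S = frozenset(subset)
                total += weight * (v(S | {i}) - v(S))
        result[i] = total
    return result


def banzhaf_exact(n, v):
    """Exact Banzhaf values: unweighted mean of all marginal contributions."""
    players = range(1, n + 1)
    result = {}
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for s in range(n):
            for subset in combinations(others, s):
                S = frozenset(subset)
                total += v(S | {i}) - v(S)
        result[i] = total / 2 ** (n - 1)
    return result


# Hypothetical example game (an assumption for illustration only).
WEIGHTS = {1: 2, 2: 1, 3: 1}


def majority_game(S):
    """A coalition wins (value 1) if its total weight reaches the quota 3."""
    return 1.0 if sum(WEIGHTS[j] for j in S) >= 3 else 0.0


sh = shapley_exact(3, majority_game)
bz = banzhaf_exact(3, majority_game)
print(sh, sum(sh.values()))
print(bz, sum(bz.values()))
```

In this example the Shapley values sum to v(N) = 1, while the Banzhaf values sum to 1.25, which illustrates that the Banzhaf value is not efficient in general.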
A TU game with precoalitions (also known as a priori unions) is a triple $(N, v, P)$ where $(N, v)$ is a TU game and $P = \{P_1, \ldots, P_p\}$ is a partition of the player set $N$ with $p$ being the number of precoalitions. Each player has to be part of a precoalition, i.e., $\bigcup_{b=1}^{p} P_b = N$. Furthermore, all precoalitions are disjoint, i.e., $P_b \cap P_c = \emptyset$ for all $b, c \in \{1, \ldots, p\}$ with $b \neq c$. We denote by $P(i)$ the precoalition to which player $i$ belongs. Throughout this work, we will use the terms precoalition, a priori union, and union as synonyms as long as there is no ambiguity. Likewise, we will use the terms partition and coalition structure synonymously.
We denote by $\psi : U^N \to \mathbb{R}^n$ a point-valued solution concept for games with precoalitions, where $U^N$ is the set of TU games with precoalitions with player set $N$. The most frequently used solution concept for cooperative games with precoalitions is the Owen value [28]. It can be viewed as an extension of the Shapley value to games with precoalitions. Given the player set $N$, the characteristic function $v : 2^N \to \mathbb{R}$, and the partition $P$, the Owen value of player $i$ is defined as
$$Ow_i(N, v, P) = \sum_{R \subseteq P \setminus \{P(i)\}} \; \sum_{S \subseteq P(i) \setminus \{i\}} \frac{r!\,(p - r - 1)!}{p!} \cdot \frac{s!\,(|P(i)| - s - 1)!}{|P(i)|!} \bigl( v(Q_R \cup S \cup \{i\}) - v(Q_R \cup S) \bigr), \tag{4}$$
where $Q_R = \bigcup_{P_b \in R} P_b$ is the set of all players belonging to the unions in $R$. Just like for the Shapley value, it is also possible to write the Owen value in terms of permutations. We call a permutation $O \in \pi(N)$ compatible with a partition $P$ of the player set if the elements of each class of $P$ are never torn apart in the order $O$. Let $\pi_P(N)$ denote the set of all permutations of $N$ which are compatible with the coalition structure $P$. Then, the Owen value (4) of player $i$ can be rewritten in the form
$$Ow_i(N, v, P) = \frac{1}{|\pi_P(N)|} \sum_{O \in \pi_P(N)} \bigl( v(\mathrm{pre}_i(O) \cup \{i\}) - v(\mathrm{pre}_i(O)) \bigr). \tag{5}$$
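The permutation form of the Owen value can be illustrated by enumerating all orders that are compatible with the coalition structure. The sketch below is a brute-force illustration under our own assumptions: the helper names `compatible_orders` and `owen_exact`, the representation of the partition as a list of tuples, and the example game are hypothetical and not part of the paper.

```python
from itertools import permutations, product

# Hypothetical example game (an assumption): weights 2, 1, 1 and quota 3.
WEIGHTS = {1: 2, 2: 1, 3: 1}


def majority_game(S):
    return 1.0 if sum(WEIGHTS[j] for j in S) >= 3 else 0.0


def compatible_orders(partition):
    """Yield every order of N in which each precoalition stays contiguous."""
    for block_order in permutations(partition):
        # Combine every internal ordering of each block.
        for inner in product(*(permutations(block) for block in block_order)):
            yield tuple(player for block in inner for player in block)


def owen_exact(partition, v):
    """Average marginal contribution over all compatible orders."""
    players = [player for block in partition for player in block]
    totals = {i: 0.0 for i in players}
    count = 0
    for order in compatible_orders(partition):
        count += 1
        prefix = frozenset()
        previous = 0.0  # uses the normalization v(empty set) = 0
        for i in order:
            prefix = prefix | {i}
            current = v(prefix)
            totals[i] += current - previous
            previous = current
    return {i: total / count for i, total in totals.items()}


ow = owen_exact([(1,), (2, 3)], majority_game)
print(ow)
```

For the partition {{1}, {2, 3}} there are only four compatible orders, compared to six permutations in total, and the resulting values again sum to v(N).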

Monte Carlo Methods
Our introduction to Monte Carlo methods and permutation sampling follows Botev and Ridder (2017) [39] and Mitchell et al. (2022) [23]. Let $\mu = E[H(X)]$, where $X : \Omega \to D_H$ is a discrete random variable following a uniform distribution, $\Omega$ is the sample space, and $H : D_H \to \mathbb{R}$ is an arbitrary function that returns a real value for any value of its domain $D_H$. The exact value of $\mu$ can be retrieved by evaluating
$$\mu = \frac{1}{|\Omega|} \sum_{\omega \in \Omega} H(X(\omega)).$$
Using the crude Monte Carlo method [39], we can approximate $\mu$ as
$$\hat{\mu} = \frac{1}{m} \sum_{k=1}^{m} H(X_k)$$
with $X_1, \ldots, X_m$ being drawn as i.i.d. replications of $X$ and $m$ being the number of samples. The resulting estimator $\hat{\mu}$ is unbiased, i.e., $E[\hat{\mu}] = \mu$, and its variance is given by
$$\mathrm{Var}[\hat{\mu}] = \frac{\mathrm{Var}[H(X)]}{m},$$
so that the variance shrinks with an increasing number of samples $m$, i.e., the estimator $\hat{\mu}$ is consistent, as long as $\mathrm{Var}[H(X)]$ is finite. Using Equation (2) and employing a uniform sample of permutations $\Pi_m \subset \Omega = \pi(N)$ of size $m$ delivers a simple Monte Carlo estimator for the Shapley value:
$$\widehat{Sh}_i = \frac{1}{m} \sum_{O \in \Pi_m} \bigl( v(\mathrm{pre}_i(O) \cup \{i\}) - v(\mathrm{pre}_i(O)) \bigr). \tag{6}$$
This approach is called permutation sampling and was formally established by Castro et al. (2009) [21]. The estimator for the Shapley value of player $i$ in (6) is unbiased and consistent. The central limit theorem guarantees convergence at a rate of $O(1/\sqrt{m})$. In terms of a practical implementation, a single sample of $m$ permutations $\Pi_m$ can be used to evaluate the Shapley values $Sh_i$ for all players $i$. We can walk through each permutation $O \in \Pi_m$ of length $n$ position by position; when evaluating $v(\mathrm{pre}_i(O) \cup \{i\})$ for the player $i$ at the current position, we simply reuse $v(\mathrm{pre}_i(O))$ from the previous computation [23].
Antithetic sampling is a variance reduction technique for Monte Carlo methods. Instead of taking i.i.d. samples, samples are taken as correlated pairs. The overview article [40] defines the antithetic estimate as
$$\hat{\mu}_{as} = \frac{1}{m} \sum_{k=1}^{m/2} \bigl( H(X_k) + H(X_k^{as}) \bigr),$$
where $X_1, \ldots, X_{m/2}$ is an i.i.d. sample and $X_1^{as}, \ldots, X_{m/2}^{as}$ its corresponding antithetic sample. The variance of the estimator $\hat{\mu}_{as}$ is given by
$$\mathrm{Var}[\hat{\mu}_{as}] = \frac{1}{m} \bigl( \mathrm{Var}[H(X)] + \mathrm{Cov}[H(X), H(X^{as})] \bigr),$$
so that if $H(X)$ and $H(X^{as})$ have a negative covariance, i.e., they are negatively correlated, the variance of the estimator $\hat{\mu}_{as}$ is reduced compared to the crude Monte Carlo approach. Antithetic sampling for functions of permutations was first investigated in [41]. The idea is simply to combine permutations and their reverse permutations. The purpose of this article is to study antithetic sampling more deeply and to integrate this idea into established approaches for stratification.
Stratified sampling is a more general variance reduction technique. It divides the population into strata which form a partition of the sample space $\Omega$ [39]. Let $\{z_1, \ldots, z_l\}$ be the set of those strata with $z_b \cap z_c = \emptyset$ for all $b, c \in \{1, \ldots, l\}$ with $b \neq c$, as well as $\bigcup_{b=1}^{l} z_b = \Omega$. Let $Z$ be a discrete random variable taking values from $\{z_1, \ldots, z_l\}$ with probability mass function $p_Z$. Then $\mu$ can be rewritten as
$$\mu = \sum_{b=1}^{l} p_Z(z_b)\, E[H(X) \mid Z = z_b].$$
Here, $E[H(X) \mid Z = z_b]$ is the expected value of $H(X)$ under the condition that $Z = z_b$, which can be approximated by
$$\hat{\mu}_b = \frac{1}{m_b} \sum_{k=1}^{m_b} H(X_{b,k}),$$
where $X_{b,1}, \ldots, X_{b,m_b}$ is an i.i.d. sample simulated from the conditional distribution of $X$ given that $Z = z_b$ and $m_b$ is the sample size of stratum $z_b$. The resulting estimator of $\mu$ is given by
$$\hat{\mu}_{st} = \sum_{b=1}^{l} p_Z(z_b)\, \hat{\mu}_b$$
with $E[\hat{\mu}_{st}] = \mu$, i.e., the estimator $\hat{\mu}_{st}$ is unbiased. An upper bound on its variance is given in [39] as
$$\mathrm{Var}[\hat{\mu}_{st}] = \frac{1}{m} \sum_{b=1}^{l} p_Z(z_b)\, \mathrm{Var}[H(X) \mid Z = z_b] \le \frac{\mathrm{Var}[H(X)]}{m} = \mathrm{Var}[\hat{\mu}],$$
which means this technique performs at least as well as the crude Monte Carlo method. Note that the latter relation only holds for a proportional allocation of the total sample size $m$ with respect to $p_Z(\cdot)$, i.e., $m_b = p_Z(z_b)\, m$ for all $b \in \{1, \ldots, l\}$.
Stratified sampling was first used for Shapley value estimation in [42] and later improved in [22]. We will combine stratified sampling algorithms from [22] with the concept of antithetic sampling in the following section.
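The permutation sampling estimator, including the reuse of the previously evaluated coalition value, can be sketched in Python as follows. This is a minimal illustration and not the reference implementation of [21]; the function name `appro_shapley`, the `rng` parameter, and the example game are our own assumptions.

```python
import random

# Hypothetical example game (an assumption): weights 2, 1, 1 and quota 3.
WEIGHTS = {1: 2, 2: 1, 3: 1}


def majority_game(S):
    return 1.0 if sum(WEIGHTS[j] for j in S) >= 3 else 0.0


def appro_shapley(n, v, m, rng=None):
    """Estimate all Shapley values from m uniformly sampled permutations."""
    if rng is None:
        rng = random.Random()
    players = list(range(1, n + 1))
    estimate = {i: 0.0 for i in players}
    for _ in range(m):
        order = players[:]
        rng.shuffle(order)  # uniform random permutation of N
        prefix = frozenset()
        previous = 0.0  # v(empty set) = 0
        for i in order:
            prefix = prefix | {i}
            current = v(prefix)  # v(pre_i(O) united with {i})
            estimate[i] += (current - previous) / m
            previous = current  # reused as v(pre_j(O)) for the next player j
        # One permutation therefore costs n evaluations of v.
    return estimate


est = appro_shapley(3, majority_game, m=4000, rng=random.Random(0))
print(est, sum(est.values()))
```

Because the marginal contributions along any single order telescope to v(N), the estimates sum to v(N) for every sample size.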
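The role of the covariance term can be seen from a short derivation. We assume here that $m$ is even, that the pairs $(X_k, X_k^{as})$ are independent across $k$, and that $X^{as}$ has the same marginal distribution as $X$, so that $\mathrm{Var}[H(X^{as})] = \mathrm{Var}[H(X)]$:

$$
\begin{aligned}
\mathrm{Var}[\hat{\mu}_{as}]
&= \frac{1}{m^2} \sum_{k=1}^{m/2} \mathrm{Var}\bigl[ H(X_k) + H(X_k^{as}) \bigr] \\
&= \frac{1}{m^2} \cdot \frac{m}{2} \cdot \bigl( 2\,\mathrm{Var}[H(X)] + 2\,\mathrm{Cov}[H(X), H(X^{as})] \bigr) \\
&= \frac{1}{m} \bigl( \mathrm{Var}[H(X)] + \mathrm{Cov}[H(X), H(X^{as})] \bigr).
\end{aligned}
$$

Compared with $\mathrm{Var}[H(X)]/m$ for $m$ independent samples, the antithetic estimator gains exactly when the covariance is negative and loses when it is positive.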
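The comparison with crude Monte Carlo under proportional allocation follows from the law of total variance. Writing $\mu_b = E[H(X) \mid Z = z_b]$ and $\sigma_b^2 = \mathrm{Var}[H(X) \mid Z = z_b]$ (shorthand introduced here for illustration), and assuming independent samples across strata with $m_b = p_Z(z_b)\, m$:

$$
\begin{aligned}
\mathrm{Var}[\hat{\mu}_{st}]
&= \sum_{b=1}^{l} p_Z(z_b)^2\, \frac{\sigma_b^2}{m_b}
 = \frac{1}{m} \sum_{b=1}^{l} p_Z(z_b)\, \sigma_b^2, \\
\mathrm{Var}[H(X)]
&= \sum_{b=1}^{l} p_Z(z_b)\, \sigma_b^2 + \sum_{b=1}^{l} p_Z(z_b)\, (\mu_b - \mu)^2
 \;\ge\; \sum_{b=1}^{l} p_Z(z_b)\, \sigma_b^2 .
\end{aligned}
$$

The gain over crude Monte Carlo is therefore the between-strata term $\frac{1}{m}\sum_b p_Z(z_b)(\mu_b - \mu)^2$, which vanishes only if all strata have the same mean.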

Antithetic Sampling for the Shapley and Banzhaf Values
The concept of antithetic sampling was defined in Section 2 in a general way. In this section, we explain how to generate antithetic subsets. We integrate the idea of antithetic subsets into established algorithms for computing Shapley and Banzhaf values.

Antithetic Subset Generation
When applying Monte Carlo methods in the context of cooperative games, samples are mostly permutations or subsets (coalitions) of the player set $N$. Lomeli et al. (2019) [41] define the antithetic sample of a given permutation as its reverse. In this subsection, we generalize this idea to generating antithetic subsets. Let $O$ be a random permutation of the player set $N$. The antithetic sample $S^{as}$ to $S = \mathrm{pre}_i(O)$ can be defined as
$$S^{as} = \mathrm{pre}_i(\mathrm{rev}(O)) = N \setminus (S \cup \{i\}), \tag{7}$$
where $\mathrm{rev}(O)$ is a function that returns the reversed permutation of the given order $O$. Throughout this work, we use rule (7) to generate antithetic sample elements, which skips the intermediate steps of reversing the order and evaluating $\mathrm{pre}_i(\cdot)$ again. Furthermore, (7) makes it more straightforward to adapt antithetic sampling to algorithms that are based on sampling coalitions instead of permutations. Let us formally define a function for generating antithetic subsets.
Definition 2. Let $N = \{1, \ldots, n\}$ be the set of players. Then $as_i : 2^{N \setminus \{i\}} \to 2^{N \setminus \{i\}}$ with $as_i(S) = N \setminus (S \cup \{i\})$ returns an antithetic subset for a given subset $S \subseteq N \setminus \{i\}$ and fixed player $i \in N$.

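It is easy to see that $as_i$ is its own inverse, since $as_i(as_i(S)) = N \setminus \bigl( (N \setminus (S \cup \{i\})) \cup \{i\} \bigr) = S$, which yields the following remark.
Remark 1. The map $as_i : 2^{N \setminus \{i\}} \to 2^{N \setminus \{i\}}$ from Definition 2 is a bijection.
Let us reformulate $as_i$ as a piecewise function.
Remark 2. The map $as_i : 2^{N \setminus \{i\}} \to 2^{N \setminus \{i\}}$ from Definition 2 can be rewritten as a piecewise function
$$as_i(S) = as_i^h(S) \quad \text{if } s = h, \qquad h \in \{0, \ldots, n - 1\},$$
where every $as_i^h$ is a subfunction
$$as_i^h : \{S \in 2^{N \setminus \{i\}} \mid s = h\} \to \{S \in 2^{N \setminus \{i\}} \mid s = n - h - 1\}, \qquad as_i^h(S) = N \setminus (S \cup \{i\}).$$
It is easy to see that the domains of all subfunctions form a partition of the domain of the piecewise function $as_i$, i.e., $\bigcup_{h=0}^{n-1} \{S \in 2^{N \setminus \{i\}} \mid s = h\} = 2^{N \setminus \{i\}}$ and the sets $\{S \in 2^{N \setminus \{i\}} \mid s = h\}$ are pairwise disjoint for different values of $h$. The same is true for the codomain of $as_i$, i.e., the codomains of all $as_i^h$ form a partition of the codomain of $as_i$. Hence, every subfunction $as_i^h$ is a bijection.
In the following, we will indicate by bold letters, i.e., $\mathbf{as}_i(M)$, the elementwise application of $as_i$ to all subsets $S \in M$ in a sample $M$.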
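The map from Definition 2 and its relation to reversed permutations can be checked directly for a small player set. The following Python sketch uses hypothetical helper names (`antithetic_subset`, `pre`) that are not part of the paper.

```python
from itertools import combinations


def antithetic_subset(n, S, i):
    """as_i(S): all players of N = {1, ..., n} except those in S and player i."""
    return frozenset(range(1, n + 1)) - frozenset(S) - {i}


def pre(order, i):
    """Set of predecessors of player i in the given order (a tuple of players)."""
    return frozenset(order[:order.index(i)])


order = (3, 1, 4, 2)
reverse = tuple(reversed(order))

# The antithetic subset equals the predecessor set in the reversed order.
print(antithetic_subset(4, pre(order, 1), 1), pre(reverse, 1))

# as_i is an involution and maps subsets of size h to subsets of size n - h - 1.
n, i = 5, 2
others = [j for j in range(1, n + 1) if j != i]
for h in range(n):
    for subset in combinations(others, h):
        S = frozenset(subset)
        assert antithetic_subset(n, antithetic_subset(n, S, i), i) == S
        assert len(antithetic_subset(n, S, i)) == n - h - 1
```

Because applying the map twice returns the original subset, it is a bijection on the subsets of N without player i, in line with Remark 1.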

Computing Shapley Values Using Antithetic Sampling
The algorithm ApproShapley is a simple algorithm for Shapley value approximation based on random sampling proposed in [21]. We already introduced this idea through Equation (6) at the beginning of Section 2.2. Although the algorithm was already extended to make use of antithetic sampling in [23], we dedicate this subsection to this algorithm and provide a concise description of it.
The algorithm ApproShapley from [21] is a random sampling algorithm to estimate the Shapley value of all players at once. For a given number of players $n$ and a specified sample size $m$ for each player, the algorithm draws $m$ random orders $O$ of the player set $N$. The algorithm estimates the Shapley value $Sh_i$ for all players $i$ as the average marginal contribution of player $i$ over all those $m$ orders (6).
When applying antithetic sampling to this algorithm, the algorithm only draws $\lceil m/2 \rceil$ random orders of the player set $N$ (with $\lceil \cdot \rceil$ denoting the ceiling function). In exchange, for every sampled order the antithetic sample generated via $as_i$ is used to update the estimated Shapley value as well.
We describe this approach in Algorithm 1.
Theorem 1. The estimator $\widehat{Sh}_i^{as}$ for the Shapley value of player $i$ from Algorithm 1 is both unbiased, i.e., $E[\widehat{Sh}_i^{as}] = Sh_i$, and consistent, i.e., $\lim_{m \to \infty} P(|\widehat{Sh}_i^{as} - Sh_i| > \varepsilon) = 0$ for all $\varepsilon > 0$.
Proof. The paper [21] points out that the estimator $\widehat{Sh}_i$ from (6) is unbiased and consistent since $\widehat{Sh}_i$ is a sample mean and $Sh_i$ is a population mean. This is also true for our proposed estimator $\widehat{Sh}_i^{as}$. To prove that, we need to show that the samples obtained by calling $as_i$ follow the same probability distribution as if they were sampled directly. This means showing that $as_i$ maps every given subset to a unique antithetic subset that has an equal probability during the random sampling process. The probability of randomly sampling a subset $S$ in the context of the Shapley value is given by
$$P(S) = \frac{1}{n \binom{n-1}{s}} = \frac{s!\,(n - s - 1)!}{n!},$$
whereby it is easy to see that a subset of size $s = h$ with $h \in \{0, \ldots, n - 1\}$ has the same probability of being randomly sampled as a subset of size $s = n - h - 1$ due to the symmetry of the binomial coefficient, i.e., $\binom{n-1}{h} = \binom{n-1}{n-h-1}$. This matches our definitions of all $as_i^h$, where each $as_i^h$ maps a subset from $\{S \in 2^{N \setminus \{i\}} \mid s = h\}$ to an antithetic subset from $\{S \in 2^{N \setminus \{i\}} \mid s = n - h - 1\}$. Thus, the elements from the domain and codomain of all $as_i^h$ have an equal probability of being taken when conducting random sampling.
Furthermore, it is clear that each element from $\{S \in 2^{N \setminus \{i\}} \mid s = h\}$ maps to a unique element from $\{S \in 2^{N \setminus \{i\}} \mid s = n - h - 1\}$. Therefore, as long as the randomly sampled subsets are i.i.d. within the domains of all $as_i^h$, the generated samples are also i.i.d. within the codomains of all $as_i^h$. This is always the case since the proposed algorithm takes all subsets of the same size with equal probability, which leads to the conclusion that the estimator $\widehat{Sh}_i^{as}$ is both unbiased, i.e., $E[\widehat{Sh}_i^{as}] = Sh_i$, and consistent, i.e., $\lim_{m \to \infty} P(|\widehat{Sh}_i^{as} - Sh_i| > \varepsilon) = 0$ for all $\varepsilon > 0$.

Algorithm 1 Antithetic sampling for Shapley value approximation
    ...
    end for
end for

Remark 3. Another property of the original algorithm from [21] is its efficiency in allocation, i.e., $\sum_{i=1}^{n} \widehat{Sh}_i = v(N)$, which also holds for the antithetic version of this algorithm, taking into account that the sum of the marginal contributions in any order equals $v(N)$. This is trivial for the randomly sampled orders, but it also holds for the antithetic samples. Since we generate the antithetic samples by using $as_i(S)$ with $S = \mathrm{pre}_i(O)$ independently for every player $i$ in the proposed algorithm, these antithetic samples are equal to $\mathrm{pre}_i(\mathrm{rev}(O))$, which means all antithetic samples are based on the same antithetic permutation $\mathrm{rev}(O)$ in a fixed iteration $j$. Thus, the sum over the marginal contributions of all players to their respective antithetic samples $S^{as}$ in a fixed iteration $j$ also equals $v(N)$.
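The antithetic variant of permutation sampling described in this subsection can be sketched in Python as follows. This is an illustration under our own assumptions rather than the authors' listing: the function name, the `rng` parameter, and the example game are hypothetical, and the normalization by m + (m mod 2) simply equals twice the number of sampled orders.

```python
import math
import random

# Hypothetical example game (an assumption): weights 2, 1, 1 and quota 3.
WEIGHTS = {1: 2, 2: 1, 3: 1}


def majority_game(S):
    return 1.0 if sum(WEIGHTS[j] for j in S) >= 3 else 0.0


def appro_shapley_antithetic(n, v, m, rng=None):
    """Estimate all Shapley values from ceil(m / 2) orders and their reverses."""
    if rng is None:
        rng = random.Random()
    players = list(range(1, n + 1))
    grand = frozenset(players)
    estimate = {i: 0.0 for i in players}
    denominator = m + m % 2  # total number of marginal contributions per player
    for _ in range(math.ceil(m / 2)):
        order = players[:]
        rng.shuffle(order)
        prefix = frozenset()
        for i in order:
            S = prefix  # predecessors of i in the sampled order
            S_as = grand - S - {i}  # predecessors of i in the reversed order
            contribution = v(S | {i}) - v(S) + v(S_as | {i}) - v(S_as)
            estimate[i] += contribution / denominator
            prefix = prefix | {i}
    return estimate


est = appro_shapley_antithetic(3, majority_game, m=2000, rng=random.Random(1))
print(est, sum(est.values()))
```

Each iteration adds the marginal contributions of one order and of its reverse, so the estimates sum to v(N), which is the efficiency property stated in Remark 3. For clarity the sketch evaluates v four times per player; an implementation could reuse values along the order and its reverse.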

Computing Shapley Values Using a Combination of Stratified and Antithetic Sampling
The algorithm St-ApproShapley proposed by Castro et al. (2017) [22] uses stratification to reduce the variance of the estimated Shapley values. The algorithm approximates the Shapley value of every player independently. Hence, we only describe the algorithm for estimating the Shapley value for a fixed player $i$ in the following.
The algorithm St-ApproShapley by Castro et al. (2017) [22] defines $n$ strata $T_h$ with $h \in \{0, \ldots, n - 1\}$, where stratum $T_h$ includes all subsets of $N \setminus \{i\}$ with size $h$, i.e., $T_h = \{S \subseteq N \setminus \{i\} \mid s = h\}$. From each of these strata $T_h$, a sample $M_h$ of size $m_h$ is taken. For every sample $M_h$, the mean of the marginal contributions of player $i$ to all $S \in M_h$ is calculated, resulting in $\widehat{Sh}_{i,h}^{st}$ for all $h$. The estimated Shapley value $\widehat{Sh}_i^{st}$ is the mean over all those $\widehat{Sh}_{i,h}^{st}$. In the following, we would like to extend the algorithm St-ApproShapley [22] to make use of antithetic sampling. A simple solution would be halving the sample size $m_h$ of every stratum and creating the antithetic sample element $S^{as} = as_i(S)$ for every $S \in M_h$, which would result in
$$\widehat{Sh}_{i,h}^{st,as*} = \frac{1}{2\,|M_h|} \sum_{S \in M_h} \bigl( v(S \cup \{i\}) - v(S) + v(S^{as} \cup \{i\}) - v(S^{as}) \bigr), \tag{9}$$
where $\widehat{Sh}_{i,h}^{st,as*}$ should be the average marginal contribution of player $i$ in position $h + 1$ over the sample $M_h$. A closer look shows that this is not the case: $\widehat{Sh}_{i,h}^{st,as*}$ from (9) consists not only of marginal contributions of player $i$ in position $h + 1$, but also of marginal contributions in position $n - h$, because every antithetic subset $S^{as}$ has size $n - h - 1$. In the following, we propose a more sophisticated algorithm.
Our new algorithm combining stratified and antithetic sampling only runs for $h \in \{0, \ldots, \lceil n/2 \rceil - 1\}$. For every $h$, an i.i.d. sample $M_h$ of size $m_h$ from stratum $T_h$ is taken and the corresponding antithetic sample $M_{n-h-1}$ from stratum $T_{n-h-1}$ is generated by applying $\mathbf{as}_i(M_h)$. The former sample is used to update the estimator $\widehat{Sh}_{i,h}^{st,as}$ of stratum $T_h$, while the latter sample is used to update the estimator $\widehat{Sh}_{i,n-h-1}^{st,as}$ of the stratum $T_{n-h-1}$. The estimator of each stratum is the average over the marginal contributions of player $i$ to all subsets $S$ in the sample of the underlying stratum. The estimator of the Shapley value $\widehat{Sh}_i^{st,as}$ of player $i$ is the average over all those stratum estimators.
When executing the novel algorithm as described above, there is an edge case if $n$ is odd. In that case, if $h = (n - 1)/2$, we have $h = n - h - 1$. Thus, the estimators $\widehat{Sh}_{i,h}^{st,as}$ and $\widehat{Sh}_{i,n-h-1}^{st,as}$ are the same estimator. In that case, the estimator is updated twice per iteration and therefore has used $2 m_h$ samples. To account for that, the estimator will be divided by 2 before being added to the overall estimator $\widehat{Sh}_i^{st,as}$, which is shown in the conditional statement at the end of the outer loop of our proposed Algorithm 2.
In our edge case when $n$ is odd, an additional aspect needs to be taken into account whenever an equal sample size for all strata is desired. For $n$ odd and $h = (n - 1)/2$, the domain $\{S \in 2^{N \setminus \{i\}} \mid s = h\}$ and the codomain $\{S \in 2^{N \setminus \{i\}} \mid s = n - h - 1\}$ of the function $as_i^{(n-1)/2}$ are equal sets. This means that generating $\mathbf{as}_i(M_{(n-1)/2})$ does not result in an antithetic sample from another stratum as it does for all other $M_h$, but once again in a sample from $T_{(n-1)/2}$. This would result in taking twice the number of samples from $T_{(n-1)/2}$ as from any other $T_h$. Thus, if an equal sample size for all strata $T_h$ is desired, $m_{(n-1)/2}$ is only allowed to be half the size of every other $m_h$. We formalize this idea in Algorithm 3.
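The combination of stratification by coalition size with antithetic subsets can be sketched for a fixed player as follows. This is a simplified illustration and not a transcription of Algorithm 2 or Algorithm 3: the function name, the single per-stratum sample size `m_h`, and the example game are our own assumptions, and the middle stratum for odd n is handled by counting the samples per stratum instead of the explicit division by 2 described above.

```python
import math
import random

# Hypothetical example game (an assumption): weights 2, 1, 1 and quota 3.
WEIGHTS = {1: 2, 2: 1, 3: 1}


def majority_game(S):
    return 1.0 if sum(WEIGHTS[j] for j in S) >= 3 else 0.0


def st_antithetic_shapley(n, v, i, m_h, rng=None):
    """Stratified antithetic estimate of the Shapley value of player i."""
    if rng is None:
        rng = random.Random()
    grand = frozenset(range(1, n + 1))
    others = [j for j in range(1, n + 1) if j != i]
    stratum_sum = [0.0] * n  # index h: subsets of size h
    stratum_count = [0] * n
    for h in range(math.ceil(n / 2)):  # h = 0, ..., ceil(n / 2) - 1
        for _ in range(m_h):
            S = frozenset(rng.sample(others, h))  # uniform within stratum T_h
            S_as = grand - S - {i}  # lies in stratum T_{n - h - 1}
            for T in (S, S_as):
                stratum_sum[len(T)] += v(T | {i}) - v(T)
                stratum_count[len(T)] += 1
    # For odd n the middle stratum receives 2 * m_h samples; dividing by the
    # actual count gives the same stratum mean as the division by 2 in the text.
    means = [total / count for total, count in zip(stratum_sum, stratum_count)]
    return sum(means) / n


est1 = st_antithetic_shapley(3, majority_game, i=1, m_h=5, rng=random.Random(0))
est2 = st_antithetic_shapley(3, majority_game, i=2, m_h=5, rng=random.Random(0))
print(est1, est2)
```

In this tiny example every antithetic pair in the middle stratum of player 2 contains one coalition with marginal contribution 1 and one with marginal contribution 0, so the estimate is exact for any sample size. This is an artifact of the example and should not be expected in general.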
Proof. Let us start by revisiting the results for the estimator of the original algorithm St-ApproShapley from [22]. The estimators $\widehat{Sh}_{i,h}^{st}$ are unbiased for all $h$. The estimated Shapley value $\widehat{Sh}_i^{st}$ is the mean over all those $\widehat{Sh}_{i,h}^{st}$ and hence unbiased as well. Equal sample sizes for all strata entail that $m \to \infty$ implies $m_h \to \infty$ for all strata $T_h$, and hence $\widehat{Sh}_i^{st}$ is consistent. The original estimator $\widehat{Sh}_i^{st}$ can be interpreted as a mean of means, which is still possible for our estimator $\widehat{Sh}_i^{st,as}$. We pointed out that $as_i$ is a bijective function which can be rewritten as a piecewise function consisting of $as_i^h$ for all $h \in \{0, \ldots, n - 1\}$, where every $as_i^h$ maps from $\{S \in 2^{N \setminus \{i\}} \mid s = h\}$ to $\{S \in 2^{N \setminus \{i\}} \mid s = n - h - 1\}$. We also pointed out that each of these $as_i^h$ is bijective, which means there is a one-to-one mapping between the elements of the domain and the elements of the codomain.
Since our proposed algorithm takes i.i.d. samples $M_h$ from the domains of each $as_i^h$ with $h \in \{0, \ldots, \lceil n/2 \rceil - 1\}$, the elements from the codomains generated by applying $as_i^h$ to $M_h$ are also i.i.d. among themselves because of the one-to-one mapping. These elements from the codomains are used as the samples $M_{n-h-1}$ for all strata $T_{n-h-1}$. Since the estimators of the strata are the mean of those i.i.d. ...

Computing Shapley Values Using Two-Stage-Stratification and Antithetic Samples
Castro et al. (2017) [22] proposed an even more sophisticated version of their algorithm St-ApproShapley, called Two-Stage-St-ApproShapley-opt, that further reduces the variance of the estimated Shapley values by sampling proportional to the variance of the strata. The latter approach is normally referred to as optimum allocation or Neyman allocation [43]. Note that unlike St-ApproShapley, this algorithm approximates the Shapley value for all players at once. Therefore, the population is divided into $n \times n$ strata indexed by $i$ and $h$, where $i \in N$ defines the considered player in the stratum, i.e., the player whose marginal contributions are calculated, and $h \in \{0, \ldots, n - 1\}$ defines the number of players that player $i$ is joining in the stratum, i.e., the size of $S$.
The algorithm is divided into two stages. In the first stage, the Shapley value is estimated by stratified sampling for every player as described for the algorithm St-ApproShapley, whereby each stratum obtains an equal sample size $m/(2n^2)$. We refer to the beginning of Section 3.3 for more details about stratified sampling in the context of the Shapley value. In addition to the approximation of the Shapley value, the variance of each stratum for each player is estimated. Afterwards, the sample sizes for the second stage are calculated such that the sample size of each stratum is proportional to its variance as estimated in the first stage. The second stage is once again a stratified sampling algorithm for every player as described at the beginning of Section 3.3, but this time with a different sample size for each stratum, namely the sample sizes calculated previously from the variances estimated in the first stage. The original algorithm Two-Stage-St-ApproShapley-opt from [22] might take more samples than specified by the user. This is why we slightly changed the calculation of the sample sizes for the second stage compared to the original algorithm. In our opinion, this allows for fairer comparisons with other algorithms. In the antithetic algorithm that we propose later in this subsection, we use only the generic term Calculate $m_{i,h}^{st}$ as a placeholder instead of an exact implementation. We refer to [22] for the original sample allocation method. In our adapted sample allocation method, on the other hand, the sample size $m_{i,h}^{st}$ of a stratum with player $i$ in position $h + 1$ is calculated as described in Algorithm 4.

Please note that we will also use Algorithm 4 for the sample allocation in the non-antithetic version of the Two-Stage-St-ApproShapley-opt when conducting comparisons between the non-antithetic and the antithetic version of the Two-Stage-St-ApproShapley-opt in Section 5. Furthermore, we emphasize that this slightly changed sample distribution affects the usage of antithetic sampling neither in a positive nor in a negative way. Again, the only reason for using this different approach is to allow fairer comparisons to other algorithms.
As for our heuristics for combining antithetic sampling with two-stage sampling, there are a few aspects that need to be considered. First, it is important to update the estimated variance in the first stage regardless of whether a randomly sampled subset or a subset derived through the antithetic sampling process, i.e., via $as_i$, is used. Second, it is not possible to use $as_i$ as the function to generate an antithetic subset in the second stage. In the second stage, the sample sizes are not equal for each stratum, but proportional to the estimated variance of each stratum and thus bound to a specific stratum with player $i$ being in position $h + 1$, i.e., a stratum where player $i$ is joining subsets with $h$ players. When using $as_i$ for generating the antithetic subset, the antithetic subset would be a sample for player $i$ in position $n - h$ but not $h + 1$, i.e., the size of the antithetic subset would be $n - h - 1$ but not $h$. Thus, using $as_i$ for antithetic subset generation would result in an incorrect usage of the sample sizes determined after the first stage.
To solve the latter problem, we define a function called get_antithetic_S_for_same_position. This function returns an approximated antithetic subset $S^{as}$ with $s^{as} = s$ for a given player set $N$, subset $S$, and player $i$. The term for_same_position refers to the fact that player $i$ is in the same position in the returned antithetic sample $S^{as}$ as it is in the provided sample $S$. Note that this no longer satisfies the definition of antithetic sampling in the context of permutations by [41], where the authors propose to take the reverse permutation as the antithetic sample of a given permutation.
The algorithm inside the function get_antithetic_S_for_same_position works as follows: At first, a prototypical antithetic sample $S^{as}$ is generated by using $as_i(S)$. Since we want $S^{as}$ to be used with a player in position $s + 1$, $S^{as}$ must be adapted to be of size $s$. If the initial antithetic sample is too small, a random subset of size $s - s^{as}$ of the remaining subset $N \setminus (S^{as} \cup \{i\})$ will be appended to it. If $S^{as}$ is already too large, $s^{as} - s$ elements of $S^{as}$ will be discarded and only the resulting smaller subset will eventually be used as the antithetic sample. Algorithm 5 specifies our approach.
Algorithm 5 Defining $S^{as}$ for a given position $s + 1$ of player $i$
function get_antithetic_S_for_same_position($N$, $S$, $i$)
    $S^{as} \leftarrow as_i(S)$
    if $s^{as} < s$ then
        Take a random subset $A \subseteq N \setminus (S^{as} \cup \{i\})$ of size $s - s^{as}$
        $S^{as} \leftarrow S^{as} \cup A$
    else if $s^{as} > s$ then
        Take a random subset $A \subset S^{as}$ of size $s$
        $S^{as} \leftarrow A$
    end if
    return $S^{as}$
end function

In the first stage, it would also have been possible to use $as_i$ to generate samples as shown in Algorithm 2, because the sample sizes are equal for each stratum in that stage. In other words, it would be possible to employ our function get_antithetic_S_for_same_position only for the second stage. For simplicity, we use get_antithetic_S_for_same_position in both stages of the antithetic version of the Two-Stage-St-ApproShapley-opt.
To obtain our antithetic version of the algorithm Two-Stage-St-ApproShapley-opt, only small adaptations are needed. In both stages, the sample size of every stratum is halved as compared to the base algorithm. Furthermore, in both stages, our function get_antithetic_S_for_same_position will be called for every sampled subset $S$ to generate a corresponding $S^{as}$, and both $S$ and $S^{as}$ will be used to update the average marginal contribution of player $i$ in position $h + 1$. In addition to that, both resulting marginal contributions, i.e., $x$ and $x^{as}$, will be used to update the estimated variance of the underlying stratum in the first stage. The rest of the algorithm remains unaffected, and we refer to Algorithm 6 for more details.
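Algorithm 5 translates almost line by line into Python. The sketch below keeps the function name from the text; the `rng` parameter and the use of `sorted` to obtain a reproducible sampling order are our own additions, and players are assumed to be integers.

```python
import random


def get_antithetic_S_for_same_position(N, S, i, rng=None):
    """Return an approximately antithetic subset of the same size as S."""
    if rng is None:
        rng = random.Random()
    N = frozenset(N)
    S = frozenset(S)
    s = len(S)
    S_as = N - S - {i}  # prototypical antithetic subset as_i(S)
    if len(S_as) < s:
        # Too small: pad with random players from N without (S_as and i).
        pool = sorted(N - S_as - {i})
        S_as = S_as | frozenset(rng.sample(pool, s - len(S_as)))
    elif len(S_as) > s:
        # Too large: keep a random subset of size s.
        S_as = frozenset(rng.sample(sorted(S_as), s))
    return S_as


rng = random.Random(3)
N = range(1, 8)
small = get_antithetic_S_for_same_position(N, {1, 2}, 4, rng)
large = get_antithetic_S_for_same_position(N, {1, 2, 3, 5, 6}, 4, rng)
print(small, large)
```

Note that the padding pool N \ (S^as ∪ {i}) coincides with the original subset S, so when S contains more than half of the other players, the returned subset necessarily overlaps with S.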
end for end for , ∀i ∈ N Proof.The original algorithm Two-Stage-St-ApproShapley-opt returns an unbiased and consistent estimator [22].This estimator can be interpreted as a mean of means, which is still possible for our proposed algorithm.Hence, we need to prove that the estimators of all strata are unbiased.The true value of each stratum is its population mean, and thus the estimator of each stratum is unbiased if it is a sample mean, which is given as long as samples are i.i.d.taken from the stratum.The process of obtaining the samples from a stratum consists of two steps.First, subsets are randomly taken from each stratum.Second, for each of these subsets an antithetic subset is generated by calling get_antithetic_S_for_same_position.Note that this process is executed in the first as well as the second stage of the algorithm for every stratum, where the difference between both stages lies only in the sample size.In the following, we will show that the samples obtained via this process are equally distributed within each stratum.
Inside our function get_antithetic_S_for_same_position, prototypical antithetic samples are generated by calling $as_i$. We already proved that $as_i$ is bijective, which means it is a one-to-one mapping. Thus, as long as its inputs are i.i.d., so are its outputs. This is always given due to the fact that each subset $S$ within a stratum is equally likely to be taken during random sampling. Consequently, each antithetic subset is also generated with equal probability. The following optional operations inside the function, i.e., extending the prototypical samples or shrinking them, change the prototypical antithetic samples in a random way which preserves their uniform distribution. Therefore, it can be assumed that the combined samples of each stratum, i.e., those sampled directly and those generated via get_antithetic_S_for_same_position, behave like i.i.d. samples, and thus the estimator of each stratum is a sample mean, which results in unbiased estimators of all strata. Since the estimator $\widehat{Sh}^{st-opt,as}_i$ is a mean of those estimators of all strata, it is also unbiased, i.e., $E[\widehat{Sh}^{st-opt,as}_i] = Sh_i$. Algorithm 6 ensures that $m \to \infty$ guarantees an infinite number of samples for all strata with nonzero variance and hence $\widehat{Sh}^{st-opt,as}_i$ is consistent, i.e., $\lim_{m \to \infty} P(|\widehat{Sh}^{st-opt,as}_i - Sh_i| > \varepsilon) = 0$ for all $\varepsilon > 0$.

Computing Banzhaf Values Using Antithetic Sampling
In this subsection, we extend the use of antithetic sampling to another eminent point-valued solution concept for TU games, i.e., the Banzhaf value [25,26]. To do so, we use an algorithm called simple random sampling with replacement from a paper by Saavedra-Nieves (2021) [44] as the base algorithm that will be extended to use antithetic sampling. This algorithm samples $m$ subsets of $N \setminus \{i\}$ and averages the marginal contributions of a player $i$ to these subsets as the estimated Banzhaf value of player $i$.
We propose a new algorithm taking advantage of antithetic sampling. Only small changes compared to the original algorithm are needed. Instead of sampling $m$ times, the new algorithm only samples $\lceil m/2 \rceil$ times. In every iteration, the antithetic sample element $S^{as}$ of the sampled subset $S$ is generated by calling the function $as_i$. The rest is identical to the original algorithm, which means averaging the marginal contributions of a player to these subsets, i.e., all sampled coalitions $S$ and all generated antithetic coalitions $S^{as}$, as the estimated Banzhaf value.

for $j \in \{1, \dots, \lceil m/2 \rceil\}$ do
    Take a random subset $S \subseteq N \setminus \{i\}$
    $S^{as} \leftarrow as_i(S)$
    $\widehat{Bz}^{as}_i \leftarrow \widehat{Bz}^{as}_i + \frac{v(S \cup \{i\}) - v(S) + v(S^{as} \cup \{i\}) - v(S^{as})}{m + m \bmod 2}$
end for

Proof. In [44], it was shown that the estimator $\widehat{Bz}_i$ of the base algorithm is both unbiased and consistent since $\widehat{Bz}_i$ is a sample mean and $Bz_i$ is a population mean. This is also true for our proposed estimator $\widehat{Bz}^{as}_i$. We pointed out that $as_i$ is a bijection that maps from $2^{N \setminus \{i\}}$ to $2^{N \setminus \{i\}}$, i.e., $as_i$ is a one-to-one mapping. Thus, as long as the inputs of $as_i$, i.e., the randomly sampled subsets, are i.i.d., the outputs of $as_i$, i.e., the corresponding antithetic subsets, are also i.i.d. among themselves. Given that the inputs of $as_i$ are in fact i.i.d. since they are randomly sampled from $2^{N \setminus \{i\}}$, the corresponding antithetic subsets are i.i.d. as well. Therefore, the estimator is a sample mean, where the whole sample consists of the randomly sampled subsets as well as those generated via $as_i$. Thus, the estimator $\widehat{Bz}^{as}_i$ is both unbiased, i.e., $E[\widehat{Bz}^{as}_i] = Bz_i$, and consistent, i.e., $\lim_{m \to \infty} P(|\widehat{Bz}^{as}_i - Bz_i| > \varepsilon) = 0$ for all $\varepsilon > 0$.
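The antithetic Banzhaf estimator described in this subsection can be sketched as follows. This is an illustrative sketch, not the reference implementation: it assumes $as_i(S) = N \setminus (S \cup \{i\})$, draws a uniformly random subset of $N \setminus \{i\}$ by independent fair coin flips, and uses $\lceil m/2 \rceil$ iterations with divisor $m + (m \bmod 2)$. The function name and the `rng` parameter are hypothetical.

```python
import math
import random


def banzhaf_antithetic(v, N, i, m, rng=random):
    """Estimate the Banzhaf value of player i from ceil(m/2) antithetic pairs."""
    others = frozenset(j for j in N if j != i)
    total = 0.0
    for _ in range(math.ceil(m / 2)):
        # Each player joins S with probability 1/2: uniform over all subsets.
        S = frozenset(j for j in others if rng.random() < 0.5)
        S_as = others - S  # antithetic subset as_i(S)
        total += v(S | {i}) - v(S) + v(S_as | {i}) - v(S_as)
    # Two marginal contributions per iteration: m + (m mod 2) in total.
    return total / (m + m % 2)
```

For an additive game such as `v = lambda S: sum(S)`, every marginal contribution of player $i$ equals $i$, so the estimate is exact for any sample size; this makes a convenient sanity check.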

Antithetic Sampling for the Owen Value
We want to extend the use of antithetic sampling to solution concepts for games with precoalitions. In this section, we propose a random as well as a stratified sampling algorithm in combination with antithetic sampling for the Owen value. First, we incorporate the precoalition structure into our function for generating antithetic subsets from Definition 2.

Antithetic Subset Generation for Games with Precoalitions
The elements of $P$ form a partition of the player set $N$ and represent the coalition structure. This means that elements within a precoalition may never be split when sampling subsets $S$. Therefore, not all subsets $S \subseteq N \setminus \{i\}$ are compatible with $P$.

Definition 3. Let $N = \{1, \dots, n\}$ be the set of players and $P$ be a partition of $N$ specifying the precoalition structure. We define a set $C_i$ that contains all possible subsets $S \subseteq N \setminus \{i\}$ compatible with $P$ for a given player $i$, i.e., $C_i = \{S \subseteq N \setminus \{i\} \mid S \text{ is compatible with } P\}$. We introduce a function $\widetilde{as}_i$ derived from the function $as_i$ from Definition 2 which returns a subset compatible with $P$ from a subset compatible with $P$, i.e., $\widetilde{as}_i : C_i \to C_i$.

Remark 4. The idea of the function $\widetilde{as}_i$ from Definition 3 is as follows: Let $S \in C_i$ be compatible with $P$ for a given player $i$. Then, $S^{as}$ contains all precoalitions from $P \setminus P^{(i)}$ that are not in $S$ as well as all players from $P^{(i)} \setminus \{i\}$ that are not in $S$.

It is worthwhile to formally establish the following property.

Theorem 5. The map $\widetilde{as}_i : C_i \to C_i$ from Definition 3 is a bijection.
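Proof. The proof for injectivity of $\widetilde{as}_i$ is trivial, but proving surjectivity provides additional insight. $C_i$ can be modeled as

$$C_i = \Big\{ \bigcup_{A \in R} A \cup Q \;\Big|\; R \subseteq P \setminus P^{(i)},\ Q \subseteq P^{(i)} \setminus \{i\} \Big\}.$$

From Equation (8), we know $S = N \setminus (S^{as} \cup \{i\})$. Furthermore, $S^{as}$ being an element of $C_i$ means that it can be represented in the form

$$S^{as} = \bigcup_{A \in R'} A \cup Q' \quad \text{with } R' \subseteq P \setminus P^{(i)},\ Q' \subseteq P^{(i)} \setminus \{i\},$$

which means that any $S^{as} \in C_i$ can be reached from a set $S$ defined as $\bigcup_{A \in P \setminus (R' \cup P^{(i)})} A \cup P^{(i)} \setminus (Q' \cup \{i\})$. It is easy to see that this set $S$ is also an element of $C_i$. Note that this set $S$ contains all precoalitions from $P \setminus P^{(i)}$ that are not in $S^{as}$, which is expressed by $\bigcup_{A \in P \setminus (R' \cup P^{(i)})} A$, and all players from $P^{(i)} \setminus \{i\}$ that are not in $S^{as}$, which is expressed by $P^{(i)} \setminus (Q' \cup \{i\})$. Hence, $\widetilde{as}_i$ is a bijection.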
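The bijection property can also be checked by brute force on a small example. The sketch below assumes that a subset is compatible with $P$ if it is a union of complete precoalitions other than $P^{(i)}$ together with an arbitrary subset of $P^{(i)} \setminus \{i\}$, and that $\widetilde{as}_i(S) = N \setminus (S \cup \{i\})$ as described in Remark 4. The function names are hypothetical.

```python
from itertools import chain, combinations


def powerset(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def compatible_subsets(P, i):
    """Enumerate C_i for a partition P given as a list of frozensets."""
    own = next(block for block in P if i in block)
    others = [block for block in P if block is not own]
    result = set()
    for R in powerset(others):          # complete foreign precoalitions
        base = frozenset().union(*R)
        for Q in powerset(own - {i}):   # any players from i's own precoalition
            result.add(base | frozenset(Q))
    return result


def as_tilde(P, S, i):
    """Antithetic subset for games with precoalitions (see Remark 4)."""
    N = frozenset().union(*P)
    return N - S - {i}
```

For instance, with `P = [frozenset({1, 2}), frozenset({3, 4, 5}), frozenset({6})]` and player 4, `compatible_subsets(P, 4)` has $2^2 \cdot 2^2 = 16$ elements, and applying `as_tilde` to all of them yields the same 16 sets again.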
Let us reformulate ãs i as a piecewise function.
Remark 5. The map $\widetilde{as}_i : C_i \to C_i$ from Definition 3 can be rewritten as a piecewise function (Equation (10)), where the variable $r \in \{0, \dots, p - 1\}$ counts the number of precoalitions from $P \setminus P^{(i)}$ and the variable $q \in \{0, \dots, p^{(i)} - 1\}$ counts the number of other players from $P^{(i)} \setminus \{i\}$ in $S$.
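Remark 6. The piecewise function $\widetilde{as}_i$ from Equation (10) introduced in Remark 5 consists of the subfunctions $\widetilde{as}^{k,h}_i : C_i^{k,h} \to C_i^{p-k-1,\,p^{(i)}-h-1}$ for $k \in \{0, \dots, p - 1\}$ and $h \in \{0, \dots, p^{(i)} - 1\}$.

The relationships between domains and codomains of the subfunctions from Remark 6 are visualized in Figure 1.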
Figure 1. Mutually antithetic strata are highlighted in the same color. These are domains and codomains of all $\widetilde{as}^{k,h}_i$, while the matrix itself represents $C_i$. For example, $\widetilde{as}^{0,0}_i$ maps from $C_i^{0,0}$ to $C_i^{p-1,\,p^{(i)}-1}$ and, vice versa, $\widetilde{as}^{p-1,\,p^{(i)}-1}_i$ maps from $C_i^{p-1,\,p^{(i)}-1}$ to $C_i^{0,0}$. For clarity, not all combinations are highlighted in color. We focus on $C_i^{0,0}$ and C…

It is trivial that the partition property of the domains also holds for the codomain of $\widetilde{as}_i$, i.e., the codomains of all $\widetilde{as}^{k,h}_i$ are a partition of the codomain of $\widetilde{as}_i$. It is clear that the subfunctions from Remark 6 are bijective.

Computing Owen Values Using Antithetic Sampling
For the Owen value, we use the algorithm from [38] as the base algorithm. In each iteration, the algorithm chooses a random permutation of players for each precoalition $P_j$ with $j \in \{1, \dots, p\}$. Then, it chooses a random permutation of those precoalitions. This results in an order $\tilde{O}$. The estimated Owen value is the average marginal contribution of player $i$ to $pre_i(\tilde{O})$ for all sampled orders $\tilde{O}$, where $pre_i(\cdot)$ returns the set of all players before player $i$ for a given order as explained in Equation (5).
Compared to the base algorithm, in the antithetic variant the number of randomly taken samples is halved and for each of the remaining $\lceil m/2 \rceil$ samples $S$ an antithetic sample is generated via $\widetilde{as}_i(S)$. Note that inside the description of Algorithm 8, the sampling procedure takes each permutation $\tilde{O} \in \pi(N)$, i.e., each permutation $\tilde{O}$ compatible with the partition $P$, with the same probability. Concretely, it chooses a random permutation of the elements of each precoalition and then it takes at random a permutation of the $p$ precoalitions.

Proof. The paper [38] shows that the estimator of the base algorithm is both unbiased and consistent. Again, the estimator is a sample mean while the true value is a population mean. Thus, we need to show that calling $\widetilde{as}_i$ does not change this behavior, i.e., the resulting estimator in our algorithm is still a sample mean. To prove that, we need to show that the samples obtained by calling $\widetilde{as}_i$ follow the same probability distribution as if they were directly randomly sampled. This means showing that $\widetilde{as}_i$ maps every given subset to a unique antithetic subset that has an equal probability of being chosen during the random sampling process. For the Owen value, the probability of randomly sampling a subset $S$ with $k$ other precoalitions and $h$ other players from player $i$'s precoalition $P^{(i)}$ is given by

$$\frac{1}{p \binom{p-1}{k} \, p^{(i)} \binom{p^{(i)}-1}{h}},$$

whereby it is easy to see that a subset with a given $k$ and $h$ has the same probability of being randomly sampled as a subset with $p - k - 1$ other precoalitions and $p^{(i)} - h - 1$ other players from player $i$'s own precoalition due to the symmetry of the binomial coefficients. This matches our definitions of all $\widetilde{as}^{k,h}_i$, where each $\widetilde{as}^{k,h}_i$ maps a subset from $C_i^{k,h}$ to an antithetic subset from $C_i^{p-k-1,\,p^{(i)}-h-1}$.
Thus, the elements from the domain and codomain of all $\widetilde{as}^{k,h}_i$ have an equal probability of being chosen during random sampling. Furthermore, we pointed out that these subfunctions $\widetilde{as}^{k,h}_i$ are bijective, which means that each element of the domain maps to a unique element of the codomain and thus the elements of the codomain are i.i.d. among themselves. Therefore, the antithetic estimator of the Owen value is still a sample mean and hence both unbiased and consistent.

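Algorithm 8 Antithetic sampling for Owen value approximation
$\widehat{Ow}^{as}_i \leftarrow 0 \quad \forall i \in N$
for $j \in \{1, \dots, \lceil m/2 \rceil\}$ do
    Take a random order $\tilde{O} \in \pi(N)$, i.e., a permutation $\tilde{O}$ compatible with the partition $P$
    for $i \in N$ do
        $S \leftarrow pre_i(\tilde{O})$
        $S^{as} \leftarrow \widetilde{as}_i(S)$
        $\widehat{Ow}^{as}_i \leftarrow \widehat{Ow}^{as}_i + \frac{v(S \cup \{i\}) - v(S) + v(S^{as} \cup \{i\}) - v(S^{as})}{m + m \bmod 2}$
    end for
end for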
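Algorithm 8 translates almost line by line into Python. The sketch below is illustrative only: it assumes that the partition is given as a list of lists, that $\widetilde{as}_i(S) = N \setminus (S \cup \{i\})$, and that $\lceil m/2 \rceil$ compatible orders are drawn. The helper names are hypothetical.

```python
import math
import random


def random_compatible_order(P, rng=random):
    """Shuffle the precoalitions, then the players inside each precoalition."""
    blocks = [list(block) for block in P]
    rng.shuffle(blocks)
    order = []
    for block in blocks:
        rng.shuffle(block)
        order.extend(block)
    return order


def owen_antithetic(v, P, m, rng=random):
    """Antithetic sampling estimate of the Owen value for all players."""
    N = frozenset(j for block in P for j in block)
    estimate = {i: 0.0 for i in N}
    for _ in range(math.ceil(m / 2)):
        order = random_compatible_order(P, rng)
        for position, i in enumerate(order):
            S = frozenset(order[:position])   # predecessors of i in the order
            S_as = N - S - {i}                # antithetic subset
            estimate[i] += (v(S | {i}) - v(S) + v(S_as | {i}) - v(S_as)) / (m + m % 2)
    return estimate
```

Since the antithetic subset of the predecessors of $i$ is exactly the set of successors of $i$, each iteration effectively evaluates the sampled order and its reverse. As a consequence, the estimates in this sketch always sum to $v(N)$.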

Computing Owen Values Using a Combination of Stratified and Antithetic Sampling
The article [29] presents a stratified sampling algorithm for the Owen value. The algorithm divides the population into $p \times p^{(i)}$ strata based on the number of other precoalitions from $P \setminus P^{(i)}$, denoted by $k \in \{0, \dots, p - 1\}$, and on the number of players from $P^{(i)} \setminus \{i\}$, denoted by $h \in \{0, \dots, p^{(i)} - 1\}$. The marginal contributions of player $i$ in each of these strata are averaged and the estimator is the weighted sum over the averages of all strata. The strata weights for the Owen value are defined in Equation (11).
There are different possibilities of how an antithetic version of this algorithm can be implemented. We already proposed a stratified antithetic sampling algorithm for the Shapley value in Section 3.3 where the algorithm runs only for $h \in \{0, \dots, n/2 - 1\}$ instead of for all $h \in \{0, \dots, n - 1\}$ because the strata of the positions $\{n/2, \dots, n - 1\}$ can be reached through the antithetic sampling process. Based on that previous observation, an intuitive approach for the use with precoalitions would be to run only for $k \in \{0, \dots, p/2 - 1\}$ and $h \in \{0, \dots, p^{(i)}/2 - 1\}$, so that only samples from these strata are taken directly and the samples for all other positions are generated via antithetic sampling. However, this leads to no samples being taken from strata where $k \ge p/2 \wedge h < p^{(i)}/2$ as well as $k < p/2 \wedge h \ge p^{(i)}/2$. This problem is visualized in Figure 2.
Figure 2. It is not possible to reach all strata through antithetic sampling when running for $k \in \{0, \dots, p/2 - 1\}$ and $h \in \{0, \dots, p^{(i)}/2 - 1\}$. Directly sampled strata are highlighted in blue, those generated through antithetic sampling in green, and those that cannot be reached in red.
Letting either $k$ or $h$ run over its full range, i.e., $\{0, \dots, p - 1\}$ or $\{0, \dots, p^{(i)} - 1\}$, respectively, solves this problem. Both approaches result in reaching all combinations of $k$ and $h$. We decided to run $k$ over $\{0, \dots, p - 1\}$, which is visualized in Figure 3 and described in the following. Note that we keep the letter $k$ for this loop variable, because we generally aim to use $k$ as a loop variable for the precoalitions throughout this work whenever possible without creating ambiguity.
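The coverage problem and its solution can be reproduced with a few lines of Python. The sketch below uses the mapping $(k, h) \mapsto (p - k - 1,\, p^{(i)} - h - 1)$ between mutually antithetic strata; the example sizes $p = 4$ and $p^{(i)} = 6$ and all names are chosen purely for illustration.

```python
def reached_strata(p, p_own, k_range, h_range):
    """Strata hit directly or through the antithetic mapping."""
    direct = {(k, h) for k in k_range for h in h_range}
    antithetic = {(p - k - 1, p_own - h - 1) for (k, h) in direct}
    return direct | antithetic


p, p_own = 4, 6  # illustrative sizes, both even
all_strata = {(k, h) for k in range(p) for h in range(p_own)}

# Intuitive approach: restrict both k and h to the lower half.
naive = reached_strata(p, p_own, range(p // 2), range(p_own // 2))
# Chosen approach: k runs over its full range, h over the lower half.
full_k = reached_strata(p, p_own, range(p), range(p_own // 2))

missing = all_strata - naive
```

In this example the intuitive approach reaches only 12 of the 24 strata, while letting $k$ run over its full range reaches all of them.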
We commence by describing the algorithm for even $p^{(i)}$. The algorithm becomes more complicated when $p^{(i)}$ is odd. We will go into more detail regarding the edge cases occurring for odd $p^{(i)}$ later.
For all $k \in \{0, \dots, p - 1\}$ and $h \in \{0, \dots, p^{(i)}/2 - 1\}$, a sample $M_{k,h}$ of size $m_{k,h}$ is taken from the stratum $C_i^{k,h}$, and the corresponding antithetic sample $M_{p-k-1,\,p^{(i)}-h-1}$ is generated via $\widetilde{as}_i$, being also of size $m_{k,h}$. Then, for all $k \in \{0, \dots, p - 1\}$ and $h \in \{0, \dots, p^{(i)}/2 - 1\}$ the mean of marginal contributions of player $i$ to all $S \in M_{k,h}$ as well as $M_{p-k-1,\,p^{(i)}-h-1}$ is calculated, resulting in $\widehat{Ow}^{st,as}_{i,k,h}$ and $\widehat{Ow}^{st,as}_{i,p-k-1,\,p^{(i)}-h-1}$, respectively. The estimator $\widehat{Ow}^{st,as}_i$ is the weighted sum over all those $\widehat{Ow}^{st,as}_{i,k,h}$. The weights are defined in Equation (11). For these weights there holds $w^{Ow}_{k,h} = w^{Ow}_{p-k-1,\,p^{(i)}-h-1}$, so that the same weight $w^{Ow}_{k,h}$ can be used for the estimators $\widehat{Ow}^{st,as}_{i,k,h}$ and $\widehat{Ow}^{st,as}_{i,p-k-1,\,p^{(i)}-h-1}$.

While this idea works fine as long as $p^{(i)}$ is even, things become more complicated when $p^{(i)}$ is odd. In that case, there holds $h = p^{(i)} - h - 1$ for $h = (p^{(i)} - 1)/2$, which means that samples with that value of $h$ and any given $k$ are mapped to samples with the same $h$ and $p - k - 1$. Therefore, to avoid generating samples by both random sampling and the antithetic sample generation process, $k$ is only allowed to run up to $\lceil p/2 \rceil - 1$ for this value of $h$. This is visualized in Figure 4. Inside Algorithm 9, this is taken into account by calling continue to skip sampling subsets from strata with $h = (p^{(i)} - 1)/2$ and $k \ge \lceil p/2 \rceil$.
Algorithm 9 Stratified antithetic sampling for Owen value approximation

If, in addition to $p^{(i)}$ being odd, $p$ is also odd, there is another effect that needs to be considered. In that case, there holds $k = p - k - 1$ for $k = (p - 1)/2$. Thus, for $k = (p - 1)/2$ and $h = (p^{(i)} - 1)/2$ the estimators $\widehat{Ow}^{st,as}_{i,k,h}$ and $\widehat{Ow}^{st,as}_{i,p-k-1,\,p^{(i)}-h-1}$ are the same estimator. In that case, the estimator is updated twice per iteration and therefore uses $2 m_{k,h}$ samples. To correct that, the estimator is divided by 2 before being added to the overall estimator $\widehat{Ow}^{st,as}_i$, which is shown in the conditional statement at the end of the loop in Algorithm 9. We refer to Figure 5 for more details. The same edge case also needs to be taken into account if a proportional sample distribution over all strata is desired.
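The edge cases for odd $p^{(i)}$ and odd $p$ can be checked with a small counting experiment. The sketch below is not Algorithm 9 itself; it only enumerates which strata are sampled directly under the rules described above and counts how often each stratum receives samples. It assumes that $h$ runs up to $\lceil p^{(i)}/2 \rceil - 1$ and that strata with the middle value of $h$ are skipped for $k \ge \lceil p/2 \rceil$; all names are hypothetical.

```python
import math
from collections import Counter


def directly_sampled(p, p_own):
    """Strata (k, h) that are sampled directly rather than generated."""
    strata = []
    for k in range(p):
        for h in range(math.ceil(p_own / 2)):
            is_middle_h = p_own % 2 == 1 and h == (p_own - 1) // 2
            if is_middle_h and k >= math.ceil(p / 2):
                continue  # reached through antithetic samples instead
            strata.append((k, h))
    return strata


def coverage(p, p_own):
    """Count how often each stratum is hit (directly or antithetically)."""
    hits = Counter()
    for (k, h) in directly_sampled(p, p_own):
        hits[(k, h)] += 1
        hits[(p - k - 1, p_own - h - 1)] += 1
    return hits
```

With $p = 4$ and $p^{(i)} = 5$ every stratum is hit exactly once. With $p = 5$ and $p^{(i)} = 5$ the central stratum $(2, 2)$ is hit twice, which is the case that requires dividing the corresponding estimator by 2.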
Figure 4. If $p^{(i)}$ is odd and $p$ is even, samples with $h = (p^{(i)} - 1)/2$ are only taken from strata up to $k = p/2 - 1$, i.e., those strata highlighted in blue in the middle column. Their corresponding antithetic samples are samples from the strata with the same number $h$ of other players from player $i$'s precoalition and $p - k - 1$ other precoalitions involved, i.e., those strata highlighted in green in the middle column. Note that directly sampled strata are highlighted in blue and those generated through antithetic sampling in green.

Figure 5. If both $p^{(i)}$ and $p$ are odd, the antithetic sample of a sample from $C_i^{(p-1)/2,\,(p^{(i)}-1)/2}$ is also from $C_i^{(p-1)/2,\,(p^{(i)}-1)/2}$. This stratum is highlighted in orange, all other sampled strata are highlighted in blue, and the rest of the samples generated through antithetic sampling are highlighted in green.
We already showed that $\widetilde{as}_i$ can alternatively be defined as multiple subfunctions $\widetilde{as}^{k,h}_i$. Taking a look at Equation (11), it is easy to see that the domain $C_i^{k,h}$ and the codomain $C_i^{p-k-1,\,p^{(i)}-h-1}$ are always equally weighted. Algorithm 10 creates a sample distribution that is proportional to the weights of the strata. For a total sample size $m$, the domain and codomain strata therefore receive the same number of samples, and thus the proportionality of the sample sizes with respect to the weights is still satisfied for all strata.
Algorithm 10 Sample allocation for stratified antithetic sampling for Owen value approximation
for $k \in \{0, \dots, p - 1\}$ do
    for $h \in \{0, \dots,$ …

As we already mentioned, there is one edge case in which the sample size needs to be adapted if a proportional sample distribution is desired. If both $p$ and $p^{(i)}$ are odd, then $m_{(p-1)/2,\,(p^{(i)}-1)/2}$ must be halved. This is due to the fact that a sample $M_{k,h}$ leads to a sample $M_{p-k-1,\,p^{(i)}-h-1}$ when using $\widetilde{as}_i$ and $\widetilde{as}^{k,h}_i$ for antithetic sample generation. When $k = (p - 1)/2$ and $h = (p^{(i)} - 1)/2$, there holds $k = p - k - 1$ and $h = p^{(i)} - h - 1$, which means that the antithetic sample generated via $\widetilde{as}^{k,h}_i$ is once again taken from the same stratum as the input of $\widetilde{as}^{k,h}_i$, i.e., the domain $C_i^{k,h}$ and the codomain $C_i^{p-k-1,\,p^{(i)}-h-1}$ are equal sets.

Proof. The proof relies heavily on the study of stratified sampling for the Owen value in the paper by Saavedra-Nieves (2023) [29]. In [29], it is pointed out that the estimators $\widehat{Ow}^{st}_{i,k,h}$ associated with the strata $C_i^{k,h}$ are unbiased for all $k \in \{0, \dots, p - 1\}$ and all $h \in \{0, \dots, p^{(i)} - 1\}$. The estimated Owen value $\widehat{Ow}^{st}_i$ is the mean over all those $\widehat{Ow}^{st}_{i,k,h}$ and hence unbiased as well. A proportional allocation procedure for all strata entails that $m \to \infty$ implies $m_{k,h} \to \infty$ for all strata $C_i^{k,h}$ and hence $\widehat{Ow}^{st}_i$ is consistent. Since our estimator $\widehat{Ow}^{st,as}_i$ can be interpreted as a weighted sum of means, we continue by showing that the estimator of each stratum $\widehat{Ow}^{st,as}_{i,k,h}$ is both unbiased and consistent. In the following, we focus on the case that $p^{(i)}$ is even. For all strata with $h \in \{0, \dots, p^{(i)}/2 - 1\}$, the quantity $\widehat{Ow}^{st,as}_{i,k,h}$ is a sample mean and provides an unbiased estimator for $Ow_{i,k,h}$, which is a population mean of the underlying stratum. The samples of all other strata where $h \ge p^{(i)}/2$ are obtained by generating the antithetic samples from the strata that were directly sampled, i.e., $\widetilde{as}_i(S)$ for all $S \in M_{k,h}$, $h \in \{0, \dots, p^{(i)}/2 - 1\}$.
We already showed that each $\widetilde{as}^{k,h}_i$ is a bijection between its domain and its codomain. Thus, as long as the sample $M_{k,h}$ from the domain $C_i^{k,h}$ is i.i.d., the antithetic sample $M_{p-k-1,\,p^{(i)}-h-1}$ from the codomain is i.i.d. as well.

Results
In this section, we estimate Shapley, Banzhaf, and Owen values by using the algorithms proposed in this work. We use TU games for which the solution is known to measure the error. We employ the mean squared error (mse) defined as

$$\mathrm{mse} = \frac{1}{n} \sum_{i=1}^{n} \big(\hat{E}_i - E_i\big)^2 \qquad (12)$$

for error measurement, where $\hat{E}$ is the estimator and $E$ the true value, i.e., in practice $E$ is replaced by $Sh$ for the Shapley value, by $Bz$ for the Banzhaf value, and by $Ow$ for the Owen value. We introduce our test games in Section 5.1 and then display, analyze, and interpret our computational results in Section 5.2.

Test Games with Known Solutions
In the following, we describe the four distinguished test games that are used throughout this section.
Airport games go back to Littlechild and Thompson (1977) [45]. The problem consists of $n$ players each owning an airplane that requires a specific runway length. The challenge that arises is how to divide the costs for a runway in a fashion that fits the needs of all players. This makes the airport problem a special case of a maintenance problem in which the tree representing the problem turns out to be a line graph. For more details on airport games, including closed form solutions for the Shapley and Owen values, we refer to [46,47]. Let $(N, v)$ be an airport game with $N = \{1, \dots, n\}$ and $v$ defined as

$$v(S) = \max_{i \in S} c_i \ \text{ for } \emptyset \ne S \subseteq N, \qquad v(\emptyset) = 0,$$

with cost vector $c = (c_1, \dots, c_n)$.
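For small instances, an airport game and its exact Shapley values can be written down directly, which is useful for validating sampling algorithms. The sketch below assumes the standard characteristic function $v(S) = \max_{i \in S} c_i$ with $v(\emptyset) = 0$ and computes exact Shapley values by enumerating all $n!$ orders; the cost vector in the example is hypothetical and not one of the test instances of this paper.

```python
from itertools import permutations


def airport_game(costs):
    """Characteristic function of an airport game; costs maps player -> cost."""
    def v(S):
        return max((costs[i] for i in S), default=0)
    return v


def shapley_exact(v, players):
    """Exact Shapley values by full enumeration (feasible for small n only)."""
    players = list(players)
    totals = {i: 0.0 for i in players}
    orders = list(permutations(players))
    for order in orders:
        S = set()
        for i in order:
            totals[i] += v(S | {i}) - v(S)
            S.add(i)
    return {i: total / len(orders) for i, total in totals.items()}
```

For the hypothetical costs $c = (1, 2, 3)$, the enumeration yields the Shapley values $1/3$, $5/6$, and $11/6$, which sum to the total cost $v(N) = 3$.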
Bankruptcy games go back to O'Neill (1982) [49] who studied a problem of rights arbitration from the Talmud. Imagine a person dies leaving debts $c_1, \dots, c_n$ to $n$ creditors. When the sum $\sum_{i=1}^{n} c_i$ of the debts is greater than the value of the estate $E$ of the deceased, we are confronted with the dilemma that the debts are mutually inconsistent because the estate is too small to meet all of the claims of the $n$ creditors. We define a bankruptcy game $v$ for a set of players $N = \{1, \dots, n\}$, a debt vector $c$ of length $n$, and an estate $E$ as

$$v(S) = \max\Big\{0,\ E - \sum_{i \in N \setminus S} c_i\Big\}.$$

Concretely, we use the claims vector $c$ from (14) with estate $E = 40$ for our bankruptcy games with $n = 20$ players. In the case of the precoalitions (15) and $n = 40$ players, we simply duplicate the claims vector (14).

All results were obtained under Microsoft Windows 10 Home (64-bit) on an Intel(R) Core(TM) i7-1165G7 CPU with a clock speed of 2.80 GHz and 16 GB RAM, i.e., on a standard laptop PC.
For our sampling algorithms, it is safe to assume that evaluating the characteristic function (of the TU game to be approximated) is the most costly part of the computation. Hence, we always plot the number of samples per player (x-axis) against the mean squared error (mse, y-axis) from Equation (12).
Let us look at Figure 6 for the Shapley values first. For the airport, bankruptcy, and glove games, antithetic sampling (Algorithm 1) clearly outperforms classical random sampling. For the airport game, stratified antithetic sampling (Algorithm 2) performs better than stratified sampling, while it is safe to say that for the bankruptcy game and the glove game, stratified antithetic sampling does not do worse than stratified sampling. Also, for the airport, bankruptcy, and glove games, two-stage stratified antithetic sampling (Algorithm 6) never shows a worse performance than two-stage stratified sampling (Algorithm 4). However, for the weighted game the picture changes. Stratified antithetic sampling performs worse than stratified sampling and even loses to classical random sampling, while antithetic sampling (without stratification) converges most slowly. Only two-stage antithetic stratified sampling performs equally well as its baseline counterpart for the weighted game.

We study approximations of the Banzhaf value in Figure 7. For the airport, bankruptcy, and glove games, antithetic sampling (Algorithm 7) leads to faster convergence than classical random sampling in all three cases. Again, for the weighted game the observation is reversed as classical random sampling outperforms antithetic sampling.
We next turn our attention to the Owen value and Figure 8. For both the airport game and the bankruptcy game, antithetic sampling (Algorithm 8) clearly outperforms classical random sampling and stratified antithetic sampling (Algorithm 9) converges faster than stratified sampling. For the glove game, antithetic sampling still beats the base algorithm slightly, whereas stratified antithetic sampling and stratified sampling perform equally well. Again, for the weighted game our observations are different. The two antithetic variants converge more slowly than the base algorithms.
We finally compare execution times and MSEs for Shapley value approximations for larger airport games with 40, 60, 80, and 100 players and 30,000 samples per player in Table 1. As for the execution times, we are well aware that these depend on our concrete implementation in R. Random antithetic sampling (Algorithm 1) always takes a little more time than random sampling (ApproShapley) and so does two-stage stratified antithetic sampling (Algorithm 6) as compared to two-stage stratified sampling (Algorithm 4). On the other hand, our stratified antithetic sampling approach (Algorithm 2) always needs slightly less computing time than stratified sampling (St-ApproShapley). In terms of the MSEs (which we deem much more meaningful and important than our execution times), the picture in Table 1 is very clear. Stratification in both its classical and our antithetic variant pays off in all four test cases when compared to approaches without stratification. Two-stage stratification in both its classical and our antithetic variant performs even better.

Comparison with the Ergodic Sampling Approach by Illés and Kerényi
Illés and Kerényi (2022) [55] propose ergodic sampling for approximating Shapley values, i.e., their sampled permutations are ergodic (meaning they follow the strong law of large numbers) but not independent. Ergodic sampling aims to construct pairs of negatively correlated samples in order to reduce the variance of the estimate, implying that antithetic sampling can be regarded as the simplest heuristic for creating ergodic samples. Illés and Kerényi [55] propose a sophisticated algorithm to learn the best ergodic transform for a TU game at hand. Their algorithm consists of two stages. In the first stage, $m_1$ random permutations are sampled in order to learn an optimal ergodic transform $t$ for a specific TU game via a greedy approach. In the second stage, this transform $t$ is employed for actual ergodic rather than independent sampling, see [55] for a detailed description of the algorithms. Illés and Kerényi [55] provide a MATLAB implementation of their approach via https://de.mathworks.com/matlabcentral/fileexchange/71822-estimation-of-the-shapley-value-by-ergodic-sampling (accessed on 4 October 2023).
We ported their algorithm to R and integrated it into the package mentioned at the beginning of Section 5.2.
Table 2 compares our antithetic approaches and ergodic sampling with six different values of $m_1$ for a bankruptcy game with $n = 20$ players. For all values of $m_1$ except for $m_1 = 5$, ergodic sampling leads to a lower MSE than random sampling (ApproShapley), but it is outperformed by random antithetic sampling (Algorithm 1). All four stratified sampling approaches lead to superior results.
Table 2. Execution times and MSEs for different antithetic sampling algorithms and ergodic sampling for approximating the Shapley value of the bankruptcy game described in Section 5.1, i.e., a bankruptcy game with $n = 20$ players, the claims vector defined in Equation (14), and estate $E = 40$. The sample size per player is 100,000, which means the overall sample size is $m = 2{,}000{,}000$.

Critical Appraisal of Antithetic Sampling
On the one hand, our experiments confirm that the concept of antithetic sampling bears plenty of promise for accelerating estimations of Shapley, Banzhaf, and Owen values. On the other hand, we observe that this concept cannot be recommended unconditionally. From a very broad perspective, our experiments affirm in the context of permutation sampling for TU games that "stratification" is a much more powerful concept for variance reduction than "incorporating antithetic samples into an established algorithm".
The question whether antithetic sampling algorithms truly lead to acceleration appears to depend on the game at hand, its properties, and its parametrization. For example, we observe that both airport games and bankruptcy games are convex games, whereas glove games and weighted games are not convex in general. A TU game $v$ is called convex if $v(S \cup T) + v(S \cap T) \ge v(S) + v(T)$ for all $S, T \subseteq N$, see [2], p. 10. Our experiments show that antithetic sampling can lead to an increase rather than a decrease in variance, a phenomenon Illés and Kerényi [55] also observe for ergodic sampling in some examples in their paper.
Finally, there is clearly a lack of analytical understanding of antithetic sampling in the context of estimating Shapley values. As long as we are unable to estimate the decrease in variance achieved via antithetic sampling, we will not be able to quantify the sample size a practitioner needs in order to guarantee a certain theoretical error. In such a case, one could only rely on the error bound for the base variant (rather than the antithetic variant) of the algorithm. For example, for the classical ApproShapley algorithm, we could still rely on bounds for the estimation error based on Hoeffding's inequality from the paper by Maleki et al. (2013) [42] in cases where the variance of marginal contributions or the range of marginal contributions is known. We agree with the remarks by Illés and Kerényi [55] that it is not known how to quantify the quality of variance reduction methods for the Shapley value as one can find only illustrative examples, but no statistical results, on this question in the literature.
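To illustrate the kind of guarantee that remains available for the base variant, the following sketch evaluates the generic two-sided Hoeffding bound for the mean of $m$ i.i.d. marginal contributions with known range $r$, i.e., $P(|\widehat{Sh}_i - Sh_i| \ge \varepsilon) \le 2 \exp(-2 m \varepsilon^2 / r^2)$. Solving for $m$ gives $m \ge r^2 \ln(2/\delta) / (2 \varepsilon^2)$ for a failure probability of at most $\delta$. The function name and parameters are our own, and the exact form of the bound used in [42] may differ.

```python
import math


def hoeffding_sample_size(r, eps, delta):
    """Samples needed so that P(|estimate - true value| >= eps) <= delta.

    r is the range of the marginal contributions of the player,
    eps the tolerated absolute error, delta the failure probability.
    """
    if r <= 0 or eps <= 0 or not 0 < delta < 1:
        raise ValueError("require r > 0, eps > 0 and 0 < delta < 1")
    return math.ceil(r ** 2 * math.log(2 / delta) / (2 * eps ** 2))
```

For marginal contributions in a range of width 1, an error tolerance of 0.1, and a failure probability of 5%, this bound asks for 185 i.i.d. samples. Whether antithetic pairs need fewer samples for the same guarantee is exactly what cannot currently be quantified.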

Conclusions
This article studies antithetic permutation sampling for approximating three point-valued solution concepts from cooperative game theory, i.e., the Shapley, Banzhaf, and Owen values. We provide a detailed analysis of antithetic subset generation and present novel antithetic sampling algorithms for the Banzhaf and Owen values. We show how to combine stratified sampling and antithetic sampling and develop sophisticated algorithms for the Shapley and Owen values. We point out that all our estimators are both unbiased and consistent.
This study was motivated by the widespread usage of antithetic sampling approximations of Shapley values in interpretable machine learning [18,20,23], which employ randomly sampled permutations together with their reverse permutations. The goal of our research was to provide a detailed assessment of the potential of antithetic sampling in the context of sampling permutations or subsets. Deliberately, we did not only study Shapley values, but also the Banzhaf and Owen values in order to ensure that our observations are also valid for other solution concepts based on marginal contributions and in the presence of precoalitions.
Our numerical experiments support the assessment that the concept of antithetic variates can lead to faster convergence of sampling algorithms for Shapley, Banzhaf, and Owen values, especially when combined with existing approaches for stratified sampling. However, we also find that this is not always the case and hence antithetic sampling should not be recommended without reservation. We regret the lack of theoretical bounds for the estimation error of antithetic sampling methods as compared to their corresponding base methods employing i.i.d. sampling. Our experiments show that stratification has a more profound effect on improving Shapley value estimations than our additional incorporation of antithetic sampling.
We wish to emphasize that this article is definitely not meant to be an overview of state-of-the-art algorithms for estimating Shapley values. Our subject is limited to antithetic sampling with replacement. While our paper reports very favorable results for Neyman sampling, i.e., the two-stage stratified sampling algorithm from [22], we need to stress that we omitted other important stratification methods, in particular Bernstein sampling introduced by Burgess and Chapman in their papers [31,32]. We are certain it would not have changed our evaluation of antithetic sampling. Also, we deliberately did not include sampling approaches without replacement [31,32] in this study. While these approaches allow for sharper error bounds, they are more sophisticated to implement and discuss as storage requirements might become more critical. We are convinced that incorporating antithetic samples into existing algorithms without replacement in a similar fashion to our study would not lead to a different assessment of the advantages and disadvantages of antithetic sampling.
Finally, our research emphasizes an open research question already posed similarly by Illés and Kerényi [55]. Can we identify classes of TU games for which specific Monte Carlo methods perform well? Can we identify certain favorable properties of TU games in the latter context, ideally independent from the parametrization of the games? Trying to build upon the work by Liben-Nowell et al. (2012) [30] could provide a starting point.

Theorem 2. The estimator $\widehat{Sh}^{st,as}_i$ for the Shapley value of player $i$ from Algorithm 2 with sample allocation according to Algorithm 3 is both unbiased, i.e., $E[\widehat{Sh}^{st,as}_i] = Sh_i$, and consistent, i.e., $\lim_{m \to \infty} P(|\widehat{Sh}^{st,as}_i - Sh_i| > \varepsilon) = 0$ for all $\varepsilon > 0$.

Theorem 3. The estimator $\widehat{Sh}^{st-opt,as}_i$ for the Shapley value of player $i$ from Algorithm 6 with sample allocation according to Algorithm 5 is both unbiased, i.e., $E[\widehat{Sh}^{st-opt,as}_i] = Sh_i$, and consistent, i.e., $\lim_{m \to \infty} P(|\widehat{Sh}^{st-opt,as}_i - Sh_i| > \varepsilon) = 0$ for all $\varepsilon > 0$.

Algorithm 6 Stratified antithetic sampling for Shapley value approximation with optimum sample allocation

Remark 7. It is easy to see that the domains of the subfunctions from Remark 6 form a partition of the domain of $\widetilde{as}_i$, i.e., $\bigcup_{k,h} C_i^{k,h} = C_i$ and $C_i^{b} \cap C_i^{d} = \emptyset$ for $b, d \in \{0, \dots, p - 1\} \times \{0, \dots, p^{(i)} - 1\}$ and $b \ne d$.

Theorem 6. The estimator $\widehat{Ow}^{as}_i$ for the Owen value of player $i$ from Algorithm 8 is both unbiased, i.e., $E[\widehat{Ow}^{as}_i] = Ow_i$, and consistent, i.e., $\lim_{m \to \infty} P(|\widehat{Ow}^{as}_i - Ow_i| > \varepsilon) = 0$ for all $\varepsilon > 0$.

If the domain and the codomain of $\widetilde{as}^{k,h}_i$ are equal sets, antithetic sample generation would result in sampling twice the sample size specified in the variable $m_{k,h}$. Hence, $m_{(p-1)/2,\,(p^{(i)}-1)/2}$ should be halved in advance as described in Algorithm 10.

Theorem 7. The estimator $\widehat{Ow}^{st,as}_i$ for the Owen value of player $i$ from Algorithm 9 with sample allocation according to Algorithm 10 is both unbiased, i.e., $E[\widehat{Ow}^{st,as}_i] = Ow_i$, and consistent, i.e., $\lim_{m \to \infty} P(|\widehat{Ow}^{st,as}_i - Ow_i| > \varepsilon) = 0$ for all $\varepsilon > 0$.

For the bankruptcy games with precoalitions, we use $c^P = (c, c)$ with estate $E^P = 80$.

Glove games are defined by a set $N = \{1, \dots, n\}$ of $n$ players and a disjoint union $N = L \cup R$ with $L$ being the set of players in possession of one left-hand glove each and $R$ being the set of players in possession of one right-hand glove each. The worth of a coalition $S$ is the number of pairs of gloves that the members of $S$ can supply:

$$v(S) = \min(|S \cap L|, |S \cap R|). \qquad (20)$$

For more details on glove games, we refer to the textbook by Peters (2015) [50], pp. 155-156. Concretely, for our glove games with $n = 20$ players, the set of players with left-hand gloves is $L = \{1, 2, 5, 7, 12, 17, 18, 19\}$ and hence the set of players with right-hand gloves is $R = N \setminus L$. In the case of the precoalitions (15) and $n = 40$ players, we work with $L^P = \{1, 2, 5, 7, 12, 17, 18, 19, 21, 22, 25, 27, 32, 37, 38, 39\}$ and $R^P = N \setminus L^P$.
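The glove game from Equation (20) with the concrete set $L$ given above is straightforward to encode. The sketch below is only meant to make the test instance reproducible; the function name `glove_game` is our own.

```python
def glove_game(L, R):
    """Characteristic function v(S) = min(|S ∩ L|, |S ∩ R|) of a glove game."""
    L, R = frozenset(L), frozenset(R)

    def v(S):
        S = frozenset(S)
        return min(len(S & L), len(S & R))

    return v


# Test instance with n = 20 players as specified in the text.
N = set(range(1, 21))
L = {1, 2, 5, 7, 12, 17, 18, 19}
R = N - L
v = glove_game(L, R)
```

With 8 left-hand and 12 right-hand gloves, the grand coalition can form 8 pairs, so `v(N)` equals 8.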

Figure 6. Performance gain analysis of antithetic sampling for Shapley value estimation.

Figure 8. Performance gain analysis of antithetic sampling for Owen value estimation.
…, n}. Further, let $O : N \to N$ be a permutation that assigns the player $O(k)$ to each position $k$. Given a permutation $O \in \pi(N)$, we define $pre_i(O)$ as the set of predecessors of player $i$ in the order $O$, i.e., $pre_i(O) = \{O(1), \dots, O(k - 1)\}$ if $i = O(k)$. In this setting, the marginal contribution of player $i$ for a given order $O$ …

… i.i.d. samples, they are unbiased, which results in the estimator $\widehat{Sh}^{st,as}_i$ being unbiased for $Sh_i$. Algorithm 3 ensures equal sample sizes for all strata. Hence $m \to \infty$ guarantees $m_h \to \infty$ for all strata $T_h$ and hence $\widehat{Sh}^{st,as}_i$ …

Algorithm 3 Sample allocation for stratified antithetic sampling for Shapley value approximation

… n − 1}
Obtain $m^{st}_{i,h}$ according to Algorithm 4 or Castro et al. (2017) [22], $\forall i \in N$, $\forall h \in \{0, \dots, n - 1\}$
for $i \in N$ and $h \in \{0, \dots, n - 1\}$ do
    for $j \in \{1, \dots,$ … do
        Choose random subset $S \subseteq N \setminus \{i\}$ of size $h$
        $S^{as} \leftarrow$ get_antithetic_S_for_same_position(N, S, i)
        $\widehat{Sh}$ … $+\ v(S \cup \{i\}) - v(S) + v(S^{as} \cup \{i\}) - v(S^{as})$ …

… as well. Due to the random sampling process, it is given that the samples from the domains $C_i^{k,h}$ … with $h \ge p^{(i)}/2$ are sample means and therefore unbiased. It is clear that these arguments carry over to the case that $p^{(i)}$ is odd. … $Ow_i$. The sample allocation according to Algorithm 10 ensures that $m \to \infty$ still guarantees $m_{k,h} \to \infty$ for all strata $C_i^{k,h}$ …

… 1, we also study Shapley values of airport games $(N, v)$ with low variance of the form $c = (n - 1, \dots, n - 1$ …

Table 2 (rows recovered from the extraction):

Algorithm                                   Execution time   MSE
Two-Stage Stratified Antithetic Sampling    48.7 s           1.42 × 10^−6
Ergodic Sampling, m_1 = 5                   43.4 s           3.44 × 10^−5
Ergodic Sampling, m_1 = 10                  43.6 s           2.39 × 10^−5
Ergodic Sampling, m_1 = 25                  45.3 s           1.59 × 10^−5
Ergodic Sampling, m_1 = 50                  47.5 s           1.50 × 10^−5
Ergodic Sampling, m_1 = 100                 51.1 s           1.54 × 10^−5
Ergodic Sampling, m_1 = 200                 59.3 s           1.90 × 10^−5