Fairness in Algorithmic Decision-Making: Applications in Multi-Winner Voting, Machine Learning, and Recommender Systems

Abstract: Algorithmic decision-making has become ubiquitous in our societal and economic lives. With more and more decisions being delegated to algorithms, we have also encountered increasing evidence of ethical issues with respect to biases and lack of fairness pertaining to algorithmic decision-making outcomes. Such outcomes may lead to detrimental consequences for minority groups in terms of gender, ethnicity, and race. In response, recent research has shifted from the design of algorithms that merely pursue optimal outcomes with respect to a fixed objective function to ones that also ensure additional fairness properties. In this study, we aim to provide a broad and accessible overview of the recent research endeavor aimed at introducing fairness into algorithms used in automated decision-making in three principal domains, namely, multi-winner voting, machine learning, and recommender systems. Even though these domains have developed separately from each other, they share a common view of decision-making as an application that requires evaluating a given set of alternatives and ranking them with respect to a clearly defined objective function. More specifically, these relate to tasks such as (1) collectively selecting a fixed number of winning (or potentially high-valued) alternatives from a given initial set of alternatives; (2) clustering a given set of alternatives into disjoint groups based on various similarity measures; or (3) finding a consensus ranking of all or a subset of the given alternatives. To this end, we illustrate a multitude of fairness properties studied in these three streams of literature, discuss their commonalities and interrelationships, synthesize what we know so far, and provide a useful perspective for future research.


Introduction
Decision-making by algorithms is becoming a ubiquitous part of our societal and economic lives. Algorithmic decisions increasingly appear in a plethora of domains such as healthcare, law, education, banking, and e-commerce. In healthcare, for example, algorithms are routinely used to monitor biochemical signals in patients and immediately alert clinicians when anomalies arise [1]. Deep learning algorithms are able to process anonymized electronic health records and flag potential emergencies, to which clinicians can then promptly respond. Similarly, in US courts, an algorithmic system known as COMPAS is used to estimate the risk of recidivism. Human resource departments in various companies increasingly resort to algorithms that filter the initial pool of applications to reduce the human time and effort spent on evaluating them [2]. Similarly, universities and colleges have begun using algorithmic predictions on big data to estimate which students will do well before accepting their admission applications [3]. With banks moving towards mobile payments to offer a seamless and fast customer experience, payment services based on machine learning algorithms identify and verify credit fraud in real time. Similarly, insurance companies use automated data credibility assessment methods to quickly perform complex rounds of approval, verification, and evaluation so as to flag duplicate or otherwise unusual activities. Online retailers such as Amazon and Alibaba routinely deploy recommender system algorithms to filter the set of product items displayed on the user's dashboard. It is becoming evident that, whether we want it or not, algorithmic decisions leave their footprints in our day-to-day activities, from the way we do grocery shopping to the way we do banking. This increasing application and deployment of algorithmic decision-making in the economy and society is driven by its high accuracy, effectiveness, low cost, and efficiency. The acceleration in the adoption of algorithmic decision-making is further supported by access to the massive volumes of data currently being collected in the digital economy, as well as by advances in hardware such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs).
In addition to the notable benefits and growing prevalence of algorithmic decisions, we are also witnessing growing concern and skepticism in academia and popular media regarding algorithmic unfairness and evidence that algorithms may inadvertently discriminate against certain minority groups. Evidence has shown that algorithmic decisions not only counteract and expose biases but also afford new mechanisms for introducing biases with unintended and detrimental effects [4]. Specifically, algorithmic decisions have been shown to amplify biases and unfairness embedded in data in terms of sensitive features such as gender, culture, and race. For example, in their recent study, Caliskan, Bryson, and Narayanan [5] found that natural language processing algorithms capture historical gender discrimination, for instance by more closely associating words like "doctor" with males and "nurse" with females. As such algorithms are trained on historical data, past discrimination and stereotypes prevalent in society are reflected in their predictions. These concerns become particularly alarming when algorithmic decisions interact with and influence almost every aspect of the economic and social life of groups and individuals. As an example, consider the work of Angwin et al. [6], who found that COMPAS is biased against African-American defendants. As the tool's error rates were asymmetric, African-American defendants were more likely than white defendants to be incorrectly labeled as higher-risk than they actually were. In another example, recommender algorithms deployed for personalization have been shown to propagate or even create biases that may influence the decisions and opinions of the user at the receiving end [7,8]. Such phenomena have been observed on social media platforms such as Facebook and Twitter, resulting in an increase in the polarization of society by over 20 percent in the last eight years [9]. Algorithmic decisions have also been shown to amplify gender biases embedded in data. For example, algorithms trained on data that feature an underrepresentation of women in science, technology, engineering, and mathematics (STEM) topics output decisions more biased towards men [10].
Nevertheless, it is encouraging to observe that, in response to the above-mentioned scrutiny and the ensuing debates in popular media, computer science scholars have been swift to begin collaborating with lawyers, policy-makers, economists, social scientists, and others in designing fair, transparent, and reliable algorithms. This has also led to the organization of the relatively new yet highly influential ACM conference on Fairness, Accountability, and Transparency in Machine Learning (FATML), which is particularly targeted at bringing together researchers and practitioners interested in fairness, accountability, and transparency in socio-technical systems. Even though these outlets mainly focus on algorithmic fairness pertaining to machine learning algorithms, this represents an important step towards achieving fairness in algorithmic decision-making in general.
Taking a further step in this direction, in this paper, we review the proliferation of research on fairness in algorithms, synthesize our present understanding, and conclude by identifying major challenges (if any), pressing open questions, and future research directions. We particularly look into three domains of research on decision-making, namely, multi-winner voting algorithms (Section 2), machine learning algorithms (Section 3), and recommender system algorithms (Section 4). A major portion of recent research on fairness in algorithmic decision-making falls into one of these three domains. We recall many concrete concepts of fairness in these areas and discuss their importance, their interrelationships, and other related problems. Notably, understanding and keeping in touch with the latest research on fairness in algorithms is of great importance to policy makers and practitioners interested in introducing algorithmic decision-making into their organizations and businesses. This article also aims to provide a concise overview to readers looking for an outlook on the diverse dimensions of algorithmic fairness research in one place. This is important because, as is evident in our review, algorithmic fairness research is advancing in diverging directions, with a variety of definitions, fair algorithm designs, and mechanisms being developed under uncorrelated, non-intersecting, and idiosyncratic assumptions. It is therefore imperative to have some sort of convergence in the further development of the field in order to facilitate more beneficial societal impact.
Note that research has revealed that biases in algorithmic decisions can come from a multitude of sources, such as human decisions on how the data was collected, noisy preferences provided by decision makers, the features selected, the steps taken in cleaning and preprocessing the data, and even the choice of algorithm itself. These elements largely depend on courses of action taken by the user of the algorithm. However, in this article, our main focus is on fairness in algorithmic decision-making itself, setting aside the biases introduced by human courses of action. Accordingly, given as input a set of alternatives $A = \{a_1, a_2, \ldots, a_n\}$, the decision-making task is to evaluate this set based on a clearly defined objective function. Based on this evaluation, the decision-making algorithm groups the alternatives and ranks the groups in a particular order. This formulation captures applications such as (1) collectively selecting a fixed number of winners from a set of alternatives; (2) clustering the set of alternatives into disjoint groups; or (3) finding a consensus ranking of a smaller subset of or all alternatives. We believe these specific tasks are covered by the applications of algorithms in multi-winner voting, machine learning, and recommender systems.

Multi-Winner Voting
Collective decision-making is a significant branch of social choice theory and has wide applications in both economics and computer systems. Examples of applications include political elections, committee selections (e.g., journal editorial board selection), selecting items to display in online shops, recommending multiple items to users in recommender systems, employee recruitment in companies or institutes, selecting heuristic algorithms in meta-heuristics, and selecting data to load into caches in cloud computing systems. Concretely, collective decision-making is mainly concerned with deriving consensus outcomes based on the preferences of a number of decision-making participants over possible outcomes. Without a doubt, voting is one of the most popular approaches to collective decision-making. In this setting, we have a set of candidates $C$ (possible outcomes) and a set of voters $V$ (decision-making participants), each of whom has a preference over the candidates in $C$, and we either aim to select a subset of exactly $k$ candidates as winners for some integer $k$, or find a ranking of the candidates from the best to the worst for the community. It should be pointed out that voters need not be human beings; they can also be criteria, robots, functions, or even algorithms. A large number of algorithms, or multi-winner voting rules, have been proposed for the former purpose. However, as fairness properties were not comprehensively taken into account when these rules were coined, many of them may result in unfair outcomes. For instance, assume that we have 100 voters who are divided into two groups: the majority and the minority. The majority consists of 90 voters, all of whom approve their favored candidates $c_1, c_2, \ldots, c_{10}$. The remaining 10 voters, who are the minority, approve only the last candidate, denoted by $c_{11}$, probably because only this candidate has positive utility to them. If we aim to select 10 winners and apply the prevalent approval voting, then $\{c_1, c_2, \ldots, c_{10}\}$ will be selected, as these candidates are approved by the maximum number of voters. This result is clearly biased against the minority, since their opinion is completely ignored.
In this section, we survey recent progress in the study of fairness properties in multi-winner decision-making. Regarding fairness, two important questions are: fair for whom, and fair at which level? These two questions guide the definition of different fairness properties. In voting, we have two types of entities, namely, the candidates and the voters, both of whom may need to be fairly considered. For single-winner voting rules ($k = 1$), which aim to select exactly one winner, the neutrality property ensures that candidates are treated equally, whereas the anonymity property ensures that voters are treated equally [11]. Recall that neutrality says that the winners' identities remain the same after candidates are renamed, and anonymity says that all voters have equal power and that their order has no impact on the results (see the work by the authors of [12] for the formal definitions). These two properties have long been studied in the literature. Neutrality and anonymity are of course also desired for multi-winner voting rules, where a fixed number of winners are selected [13,14]. However, these two properties only provide individual-level fairness by regarding each voter and each candidate as an independent individual; they do not say anything about group-level fairness, which is of particular importance in some real-world applications. Consider the above example and ask what a fair result for both the majority and the minority should be. As the minority accounts for 10% of all voters, should 10% of the winners also come from their approved candidates? If so, a fair result would be to select $c_{11}$ and nine of $\{c_1, c_2, \ldots, c_{10}\}$ as the winners. To fill this gap, proportional fairness properties of multi-winner decision-making have been proposed and have received a considerable amount of study in recent years. Generally speaking, these properties stipulate that certain groups of voters should be proportionally represented in a committee according to the strength of their numbers.
This section is devoted to numerous important proportionality properties studied in the recent literature. We discuss mainly two preference models: the dichotomous preference model and the linear preference model.

Dichotomous preference.
Each voter classifies candidates into two classes, namely, the approved candidates and the disapproved candidates. In particular, all approved candidates are preferred to all disapproved candidates, and candidates inside each class are equally preferred.

Linear preference.
Each voter ranks all candidates in a linear order $\succ$, from the best to the worst. For two candidates $a$ and $b$, $a \succ b$ means that the corresponding voter strictly prefers $a$ to $b$.
Multi-winner voting rules with dichotomous preferences are often referred to as approval-based multi-winner voting rules, and with linear preferences are referred to as ranking-based rules.
We divide our discussion into four subsections. In Sections 2.1 and 2.2, we survey fairness properties for ranking-based and approval-based voting, respectively. These properties are aimed at certain groups of voters. We give the definitions of these properties, discuss the relations among them, point out the complexity of two important problems related to these concepts, and offer an overview of the most important voting rules studied in the literature and whether they fulfill these properties. In Section 2.3, we discuss recent research on the setting where candidates have sensitive attributes or are labeled, and fairness is provided for groups of candidates. Section 2.4 discusses fairness properties based on stability concepts.

Voter Fairness in Ranking-Based Voting
We first consider ranking-based voting, where voters are asked to report linear-order preferences over candidates. A voter $v$'s preference is denoted by $\succ_v$, so that $a \succ_v b$ means that this voter prefers the candidate $a$ to the candidate $b$. A crucial notion in this setting is that of a solid coalition, which was first mentioned in the work by the authors of [15]. Particularly, for a subset $C' \subseteq C$ of candidates, a solid coalition is a subset of voters $U \subseteq V$ such that all voters in $U$ rank all candidates in $C'$ above all the other candidates, i.e., for all voters $v \in U$, it holds that $a \succ_v b$ for all $a \in C'$ and $b \in C \setminus C'$. In this case, we say that $U$ supports $C'$ and call $U$ a $C'$-solid coalition.
The following proportional property provides fairness for solid coalitions: it states that for a solid coalition of a certain scale, a guaranteed number of candidates supported by this coalition should be selected as winners.
q-Proportionality for solid coalitions (q-PSC) [16]. For a rational number $q$, a $k$-committee $w \subseteq C$ satisfies $q$-PSC if for every positive integer $\ell$ and for every solid coalition $U \subseteq V$ supporting some $C' \subseteq C$ such that $|U| \geq \ell q$, it holds that $|w \cap C'| \geq \min\{\ell, |C'|\}$.
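To make the definition concrete, the following is a minimal sketch of a q-PSC test; it is not the procedure of [16], and the function name `satisfies_q_psc` and the example profile are our own assumptions. It relies on two observations: a candidate set supported by some solid coalition must be the set of the top $j$ candidates of some voter, and for each such set it suffices to check the largest supporting coalition with the largest admissible $\ell$.

```python
from fractions import Fraction
from math import floor

def satisfies_q_psc(rankings, committee, q):
    """Check q-PSC for strict rankings (each lists all candidates, best first)."""
    committee = set(committee)
    num_candidates = len(rankings[0])
    for j in range(1, num_candidates + 1):
        # Voters sharing the same set of top-j candidates form the largest
        # solid coalition supporting that set.
        coalition_size = {}
        for ranking in rankings:
            supported = frozenset(ranking[:j])
            coalition_size[supported] = coalition_size.get(supported, 0) + 1
        for supported, size in coalition_size.items():
            # Largest l with l * q <= |U|; smaller l impose weaker requirements.
            ell = floor(Fraction(size) / Fraction(q))
            if len(committee & supported) < min(ell, len(supported)):
                return False
    return True

# Hypothetical profile: two voters rank a > b > c > d, two rank c > d > a > b.
rankings = [("a", "b", "c", "d")] * 2 + [("c", "d", "a", "b")] * 2
hare_quota = Fraction(len(rankings), 2)  # n / k with k = 2
print(satisfies_q_psc(rankings, {"a", "c"}, hare_quota))  # True
print(satisfies_q_psc(rankings, {"a", "b"}, hare_quota))  # False
```

In the second call, the two voters who rank c first form a solid coalition of size $q$ supporting $\{c\}$, yet the committee contains no candidate from that set.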
Normally, we are only interested in the case where $n/(k+1) < q \leq n/k$, where $n$ denotes the number of voters. One of the reasons is that when $k = 1$, i.e., we select only one winner, a $q$-PSC committee is a singleton consisting of a candidate who is most preferred by at least a majority of the voters, whenever such a candidate exists (note that such a candidate must be a Condorcet winner). In addition, a $q$-PSC committee is not guaranteed to exist if $q \leq n/(k+1)$. Moreover, if $q > n/k$, any $q$-PSC committee must exhibit some counter-intuitive properties (see the work by the authors of [16] for the details).
Specifically, if $q$ is equal to the so-called Hare quota $n/k$, the property is referred to as Hare-PSC ($q_H$-PSC). If $q$ is equal to the Droop quota $\lfloor n/(k+1) \rfloor + 1$, the property is referred to as Droop-PSC ($q_D$-PSC).
Proportionality for solid coalitions seems to have been first considered by Dummett [15]. Many of its variants have been studied very recently [16,17]. For example, weak $q$-PSC puts constraints only on solid coalitions supporting a candidate set of limited size, and asks a committee $w$ to contain all candidates who are supported by these solid coalitions.
Weak q-PSC [16]. A committee $w \subseteq C$ satisfies weak $q$-PSC if, for every positive integer $\ell$, every $C' \subseteq C$ such that $|C'| \leq \ell$, and every $C'$-solid coalition $U$ of size at least $\ell q$, it holds that $C' \subseteq w$.
As with $q$-PSC, we are particularly interested in the case where $n/(k+1) < q \leq n/k$. Weak $q_H$-PSC and weak $q_D$-PSC denote weak $q$-PSC where $q$ takes the Hare quota and the Droop quota, respectively. Both PSC and weak PSC are designed to guarantee fairness for voters at the group level, but they differ in the degree of fairness they provide. In fact, it follows from the definitions that $q$-PSC implies weak $q$-PSC, but not necessarily the other way around.
Given a concept of fairness, a significant question is whether we can efficiently compute a committee providing this fairness. Let $\tau$ be a fairness property.

τ-Computing
Input: An election $(C, V)$ and a positive integer $k \leq |C|$.
Question: Is there a $k$-committee $w \subseteq C$ which provides the $\tau$ property at $(C, V)$?
Prior to the proposal of many fairness properties, a large body of voting rules had already been studied extensively in the literature. Analyzing whether the outcomes of these voting rules provide some specific fairness property is also of particular importance. This motivation brings the following decision problem into the line of research.

τ-Testing
Input: An election $(C, V)$ and a committee $w \subseteq C$.
Question: Does $w$ satisfy $\tau$ at $(C, V)$?
Concerning the complexity of these two decision problems, Aziz and Lee [16] proved that both computing and testing $q$-PSC and weak $q$-PSC are polynomial-time solvable for all possible values of $q$. See Table 1.
Table 1. Complexity of computing a committee satisfying a proportional property, or testing whether a given committee satisfies a proportional property. In the table, "P" stands for "polynomial-time solvable". All results are from the work by the authors of [16].

Property      Complexity of Computing   Complexity of Testing
q-PSC         P                         P
weak q-PSC    P                         P

For the second decision problem, we survey the results for many voting rules studied in the literature. Let $(C, V)$ be an election. For a vote $v \in V$, the position of a candidate $c$ in $v$, denoted by $\mathrm{pos}_v(c)$, is the number of candidates ranked before $c$ plus one, i.e., $\mathrm{pos}_v(c) = |\{c' \in C : c' \succ_v c\}| + 1$.

Committee scoring rules. Under a committee scoring rule, each voter provides a score to each committee based on the positions of the committee members in the preference of this voter, and winning committees are those with the maximum total score. Committee scoring rules were first studied by Elkind et al. [17] as a general framework to encapsulate many concrete multi-winner voting rules, including, e.g., Bloc, k-Borda, and Chamberlin-Courant.
• k-Borda. Each voter gives $m - i$ points to each candidate ranked in the $i$-th position, where $m$ denotes the number of candidates. The score of a committee from a voter is the sum of the scores of all its members from the voter.
• Bloc. Every voter gives 1 point to each of their top $k$ ranked candidates. The score of a committee from a voter is the sum of the scores of all its members from the voter.
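Because the k-Borda and Bloc scores of a committee are sums of per-candidate scores, a winning committee consists of $k$ candidates with the highest individual scores. The following minimal sketch illustrates this; the function names, the lexicographic tie-breaking, and the five-voter profile are our own assumptions.

```python
def k_borda(rankings, k):
    """Select k candidates with the highest Borda scores (m - i points for position i)."""
    m = len(rankings[0])
    score = {c: 0 for c in rankings[0]}
    for ranking in rankings:
        for position, c in enumerate(ranking, start=1):
            score[c] += m - position
    # Ties are broken lexicographically (an assumption, not part of the rule).
    return sorted(score, key=lambda c: (-score[c], c))[:k]

def bloc(rankings, k):
    """Select k candidates that appear most often among the voters' top k."""
    score = {c: 0 for c in rankings[0]}
    for ranking in rankings:
        for c in ranking[:k]:
            score[c] += 1
    return sorted(score, key=lambda c: (-score[c], c))[:k]

# Hypothetical profile with five voters and three candidates.
rankings = [("a", "b", "c")] * 2 + [("c", "b", "a")] * 2 + [("b", "a", "c")]
print(k_borda(rankings, 2))  # ['b', 'a']  (Borda scores: a = 5, b = 6, c = 4)
print(bloc(rankings, 2))     # ['b', 'a']  (Bloc scores:  a = 3, b = 5, c = 2)
```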
Monroe's rule. This rule is similar to the CC rule but with the further restriction that every candidate can represent at most $\lceil n/k \rceil$ voters. Let $g : V \to C$ be an assignment function and, for $c \in C$, let $g^{-1}(c)$ be the set of voters $v \in V$ such that $g(v) = c$. Moreover, let $G$ be the set of all assignment functions from $V$ to $C$. The Monroe score of a $k$-committee $w \subseteq C$ is then defined as

$$\max_{\substack{g \in G,\ g(V) \subseteq w \\ |g^{-1}(c)| \leq \lceil n/k \rceil \text{ for all } c \in w}} \ \sum_{v \in V} \alpha(\mathrm{pos}_v(g(v))),$$

where $\alpha : \mathbb{N} \to \mathbb{N}$ is a mapping as in CC. Monroe's rule selects $k$-committees with the maximum score as winning committees.

Single transferable vote (STV). STV rules are a large class of voting rules, each of which is characterized by a rational number $q$ and some vote-reweighting approach. A common principle of these rules is to guarantee that certain groups of voters are proportionally represented. Fixing a rational quota $q$ and a vote-reweighting approach, the STV rule selects winning committees iteratively as shown below. For a candidate $c$, let $V_{top}(c)$ be the set of voters ranking $c$ in the top.
1. Initially, we associate with each voter $v \in V$ a weight denoted by weight($v$). (Usually, all voters have weight 1 initially, but this is not necessarily the case.)
2. If there is a candidate $c \in C$ that is ranked in the top by at least $q$ voters, that candidate is added to the winning committee. Then, we apply the vote-reweighting approach so that the total weight of all votes ranking $c$ in the top is reduced by $\min\{q, p\}$, where $p = \sum_{v \in V_{top}(c)} \mathrm{weight}(v)$ is the sum of the weights of all voters ranking $c$ in the top before the reweighting. Moreover, the candidate $c$ is deleted from $C$ and from all votes.
3. If there is no such candidate $c$, then a candidate that is ranked in the top by the least number of voters is eliminated.
4. The procedure terminates once $k$ candidates are selected.
Many concrete STV rules have been considered in the literature (see the works by the authors of [18,19] for a history and a summary of many important STV rules). However, for simplicity, in this survey, we discuss only STV rules where initially all voters have weight 1 and the uniform reweighting approach is used in Step 2. According to this reweighting approach, in Step 2, the weight of a voter $v$ who ranks $c$ in the top is reduced to $\max\{0, \mathrm{weight}(v) \cdot (1 - q/p)\}$. Two important STV rules are those where $q$ is equal to the Hare quota or the Droop quota, i.e., $q = n/k$ and $q = \lfloor n/(k+1) \rfloor + 1$. We denote these two special STV rules as H-STV and D-STV, respectively.
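The iterative procedure with uniform reweighting can be sketched as follows. This is an illustrative reading of Steps 1 to 4, not a reference implementation from [18,19]: the function name `stv` is hypothetical, "ranked in the top by at least $q$ voters" is interpreted as total top weight at least $q$, ties are broken lexicographically, and, as an extra safeguard not stated in the steps, the remaining candidates are all elected once their number no longer exceeds the number of open seats.

```python
from fractions import Fraction

def stv(rankings, k, q):
    """STV with quota q and uniform reweighting; all voters start with weight 1."""
    weights = [Fraction(1)] * len(rankings)
    remaining = set(rankings[0])
    winners = []
    while len(winners) < k and remaining:
        if len(remaining) <= k - len(winners):
            # Safeguard (assumption): fill the open seats with whoever is left.
            winners.extend(sorted(remaining))
            break
        # Current top choice of every voter and the weight behind each candidate.
        top_choice = [next(c for c in r if c in remaining) for r in rankings]
        top_weight = {c: Fraction(0) for c in remaining}
        for c, w in zip(top_choice, weights):
            top_weight[c] += w
        best = max(sorted(remaining), key=lambda c: top_weight[c])
        if top_weight[best] >= q:
            # Step 2: elect and scale supporters' weights by max{0, 1 - q/p}.
            p = top_weight[best]
            factor = max(Fraction(0), 1 - Fraction(q) / p)
            weights = [w * factor if c == best else w
                       for w, c in zip(weights, top_choice)]
            winners.append(best)
            remaining.remove(best)
        else:
            # Step 3: eliminate a candidate with the least top weight.
            worst = min(sorted(remaining), key=lambda c: top_weight[c])
            remaining.remove(worst)
    return winners

# Hypothetical profile: n = 6, k = 2, Droop quota = floor(6 / 3) + 1 = 3.
rankings = [("a", "b", "c")] * 3 + [("c", "b", "a")] * 2 + [("b", "c", "a")]
print(stv(rankings, 2, 3))  # ['a', 'c']
```

In this run, a reaches the quota immediately and its supporters' weights drop to 0; no remaining candidate reaches the quota, so b is eliminated and c takes the last seat.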
Much research has been done to investigate whether the above-defined multi-winner voting rules provide the PSC fairness property; see Table 2 for a summary of the currently known results. According to this table, by using STV rules, we are able to obtain, in polynomial time, winning committees that provide both $q_H$-PSC and $q_D$-PSC fairness simultaneously. Nevertheless, it is important to point out that STV rules fail many monotonicity properties [16]. As a remedy, Aziz and Lee [16] recently proposed a new rule, which they named the "Expanding Approvals Rule" (EAR). In particular, they showed that EAR has the following advantages compared with the other concrete rules studied in the literature to date. First, an EAR winning committee can always be computed in polynomial time. Second, EAR committees provide both $q_H$-PSC and $q_D$-PSC fairness. Third, EAR committees satisfy many monotonicity properties. Finally, EAR works not only for strict preference elections but also for the case where voters hold weak order preferences over candidates. See the work by the authors of [16] for the definition of EAR and detailed discussions.

Table 2. A summary of the PSC properties satisfied by several important multi-winner voting rules and the complexity of computing a winning committee with respect to these rules. In the table, "N" means that the rule in the corresponding row does not satisfy the property in the corresponding column, and "Y" means that the rule satisfies the property.

Observing that weak $q$-PSC is too strong a property for many rules to satisfy, Elkind et al. [17] studied three weaker versions, namely, solid coalitions, consensus committee, and unanimity. They showed that each of SNTV, Bloc, k-Borda, CC, and Monroe fails at least one of these weaker versions, and these results imply the ones for these rules in the table.

Voter Fairness in Approval-Based Voting
This section is devoted to fairness properties of approval-based multi-winner voting rules, where each vote $v \in V$ consists of a subset of candidates, namely the candidates approved by the corresponding voter. Several important proportionality properties have been put forward in the literature. In general, these properties aim at providing fairness for groups of voters who approve some candidates in common. In particular, they ensure that for such a group of sufficiently large size, at least a certain number of candidates approved by all (or some) members of this group are selected.

Justified representation (JR).
A $k$-committee $w \subseteq C$ provides JR if, for every subset $U \subseteq V$ of at least $n/k$ votes such that $\bigcap_{u \in U} u \neq \emptyset$, at least one of the candidates approved by some vote in $U$ is included in $w$, i.e., $w \cap (\bigcup_{u \in U} u) \neq \emptyset$. This property was proposed by Aziz et al. [22,23].
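JR can be tested directly from the definition: a committee fails JR exactly when some candidate is approved by at least $n/k$ voters, none of whom approves any committee member. The following minimal sketch illustrates this test; the function name `provides_jr` and the four-voter example are our own assumptions.

```python
def provides_jr(votes, committee, k):
    """Return True if the k-committee provides JR for the given approval votes."""
    committee = set(committee)
    n = len(votes)
    # Voters who approve no member of the committee.
    unrepresented = [set(v) for v in votes if not (set(v) & committee)]
    candidates = set().union(*votes)
    for c in candidates:
        supporters = sum(1 for v in unrepresented if c in v)
        if supporters * k >= n:  # a group of size >= n/k agreeing on c is ignored
            return False
    return True

# Hypothetical election with n = 4 and k = 2.
votes = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]
print(provides_jr(votes, {"a", "b"}, 2))  # False: the two c-voters are ignored
print(provides_jr(votes, {"a", "c"}, 2))  # True
```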

Proportional justified representation (PJR).
A $k$-committee $w \subseteq C$ provides PJR if for every positive integer $\ell \leq k$ and for every subset $U \subseteq V$ of at least $\ell \cdot n/k$ votes such that $|\bigcap_{u \in U} u| \geq \ell$, the committee $w$ contains at least $\ell$ candidates from $\bigcup_{u \in U} u$, i.e., $|w \cap (\bigcup_{u \in U} u)| \geq \ell$. This property was proposed in the work by the authors of [24].

Extended justified representation (EJR).
A $k$-committee $w \subseteq C$ provides EJR if for every positive integer $\ell \leq k$ and for every subset $U \subseteq V$ of at least $\ell \cdot n/k$ votes such that $|\bigcap_{u \in U} u| \geq \ell$, the committee $w$ contains at least $\ell$ candidates from some vote $u \in U$, i.e., $|w \cap u| \geq \ell$ for at least one $u \in U$. This property was proposed by Aziz et al. [23].

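Perfect representation (PR).
PR is defined for special elections. Particularly, let $(C, V)$ be an election such that $|V| = k \cdot s$ for some positive integer $s$. A $k$-committee $w = \{c_1, \ldots, c_k\}$ provides PR if $V$ can be partitioned into $k$ pairwise disjoint subsets $V_1, \ldots, V_k$, each of size $s$, such that for every $i \in \{1, \ldots, k\}$, $c_i$ is approved by all votes in $V_i$. This property was studied in the work by the authors of [24].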
From the definitions, it is easy to see that EJR implies PJR, and PJR implies JR [22,24]. As discussed in the previous section, two important questions are (1) whether we can always efficiently compute a committee providing a certain property and (2) whether we can efficiently determine whether a given committee provides a certain fairness property. For JR, we have positive answers to both questions. However, for the two more refined concepts EJR and PJR, we have a positive answer only to the first question. See Table 3 for a summary of the concrete results. The next important question is, therefore, whether there are committees providing several fairness properties simultaneously, and whether we can compute such a committee in polynomial time if it exists.
We check the answer by surveying several well-studied and natural multi-winner voting rules.

Approval voting (AV).
The AV score of a candidate is the number of votes approving this candidate, and a winning k-committee consists of k candidates with the highest AV scores.
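As a small illustration, the following sketch applies AV to the 100-voter example from the beginning of this section. The function name `approval_voting`, the candidate labels `c01` to `c11`, and the lexicographic tie-breaking are our own assumptions.

```python
from collections import Counter

def approval_voting(votes, k):
    """Return k candidates with the highest numbers of approvals."""
    scores = Counter(c for vote in votes for c in vote)
    # Ties are broken lexicographically (an assumption, not part of AV).
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:k]

# 90 majority voters approve c01..c10; 10 minority voters approve only c11.
majority = {f"c{i:02d}" for i in range(1, 11)}
votes = [majority] * 90 + [{"c11"}] * 10
winners = approval_voting(votes, 10)
print(winners)           # c01 through c10
print("c11" in winners)  # False: the minority's candidate is never selected
```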

Satisfaction approval voting (SAV).
The SAV score of a candidate $c$ is defined as $\sum_{v \in V : c \in v} \frac{1}{|v|}$, where $|v|$ denotes the number of candidates approved by the vote $v$. A winning $k$-committee consists of $k$ candidates with the highest SAV scores.

Minimax approval voting (MAV).
This rule aims to find a committee that is closest to every voter's opinion. More precisely, the Hamming distance between a committee $w$ and a vote $v$ is $d_H(v, w) = |w \setminus v| + |v \setminus w|$, and this rule selects a $k$-committee $w$ minimizing $\max_{v \in V} d_H(v, w)$.

Proportional approval voting (PAV).
The PAV score of a committee $w$ is defined as $\sum_{v \in V} \sum_{i=1}^{|w \cap v|} \frac{1}{i}$. A winning $k$-committee is one with the maximum score.

Sequential proportional approval voting (seq-PAV).
This rule provides an approximate solution to the PAV rule. It selects $k$ winners in $k$ rounds, one in each round. Precisely, we initially let $w = \emptyset$. Assume that we have an $i$-committee $w$ after round $i < k$. Then, in the next round, we find a candidate $c$ which offers the maximum PAV score of $w \cup \{c\}$, and we extend $w$ by setting $w := w \cup \{c\}$. After $k$ rounds, $w$ contains exactly $k$ candidates.
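The PAV score and the greedy seq-PAV procedure can be sketched as follows. This is a minimal illustration with hypothetical function names, exact rational arithmetic, and lexicographic tie-breaking (our own assumption), applied to a small made-up election.

```python
from fractions import Fraction

def pav_score(votes, committee):
    """PAV score: each voter contributes 1 + 1/2 + ... + 1/|w ∩ v|."""
    committee = set(committee)
    return sum(
        sum(Fraction(1, i) for i in range(1, len(committee & set(v)) + 1))
        for v in votes
    )

def seq_pav(votes, candidates, k):
    """Greedily add the candidate that maximizes the PAV score of the committee."""
    committee = []
    for _ in range(k):
        rest = sorted(set(candidates) - set(committee))  # lexicographic ties
        best = max(rest, key=lambda c: pav_score(votes, committee + [c]))
        committee.append(best)
    return committee

# Hypothetical election: 6 voters approve {a, b, c}, 3 voters approve {d}.
votes = [{"a", "b", "c"}] * 6 + [{"d"}] * 3
committee = seq_pav(votes, {"a", "b", "c", "d"}, 3)
print(committee)                    # ['a', 'b', 'd']
print(pav_score(votes, committee))  # 12
```

In the third round, adding c would raise the score by only 6 · 1/3 = 2, whereas adding d raises it by 3, so the smaller group obtains a seat.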

Chamberlin-Courant approval voting (CCAV).
This rule is a variant of the CC rule for approval-based voting. In particular, a voter is satisfied with a committee if and only if this committee contains at least one of her approved candidates. This rule selects a $k$-committee that satisfies the maximum number of voters.

Monroe's approval voting (MonAV).
This is a variant of Monroe's rule for approval-based voting and is similar to CCAV. In CCAV, a candidate can satisfy all voters who approve this candidate. However, in MonAV, we require that each candidate is assigned to at most $\lceil n/k \rceil$ voters approving this candidate and, moreover, that each voter is assigned to at most one candidate. The MonAV score of a committee is the maximum number of voters who can be satisfied by this committee under the above conditions.
In addition to the above rules, an important class of rules, coined by Phragmén, has been studied. These rules determine the winners by reasoning backwards. Particularly, assume that we know the $k$ winners. The rules assume that each of these winners has a unit point which is distributed over all voters approving this candidate in a way that achieves some objective (Phragmén's rules differ only in their objectives). Then, the selected winners should be those that yield the optimal objective over all subsets of $k$ candidates. We give the formal definitions below.
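A load distribution is a two-dimensional array $x = (x_{v,c})_{v \in V, c \in C}$ satisfying the following conditions.
• For every $v \in V$ and $c \in C$, it holds that $0 \leq x_{v,c} \leq 1$, and $x_{v,c} = 0$ whenever $c \notin v$. This corresponds to the requirement that the point of a winner is only distributed over voters approving that winner.
• It holds that $\sum_{v \in V} \sum_{c \in C} x_{v,c} = k$. That is, there are in total $k$ points to be distributed.
• For every $c \in C$, it holds that $\sum_{v \in V} x_{v,c} \in \{0, 1\}$. This together with the previous restriction ensures that exactly $k$ candidates have points to distribute.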
For a load distribution $x$ and a vote $v$, let $x_v = \sum_{c \in C} x_{v,c}$. Particularly, $x_v$ is referred to as the voter load of $v$. Due to the last two conditions in the definition of load distribution, we know that each load distribution $x$ gives us a unique $k$-committee $f(x) = \{c \in C : \sum_{v \in V} x_{v,c} = 1\}$. Note that for a $k$-committee $w$, there can be multiple load distributions $x$ such that $f(x) = w$.

max-Phragmén.
This rule first calculates a load distribution $x$ such that $\max_{v \in V} x_v$ is minimized. Then, $f(x)$ is the winning committee.

var-Phragmén.
This rule first calculates a load distribution $x$ such that $\sum_{v \in V} x_v^2 = \sum_{v \in V} (\sum_{c \in C} x_{v,c})^2$ is minimized. Then, $f(x)$ is the winning committee.

seq-Phragmén.
This rule takes $k$ rounds to select the winners, one in each round. For a candidate $c$, let $V_c = \{v \in V : c \in v\}$ be the set of voters approving $c$. Initially, let $w = \emptyset$. Let $x_v^{(j)}$ denote the voter loads after round $j$. At first, all voters have a load of 0, i.e., $x_v^{(0)} = 0$ for all $v \in V$. As the first candidate, we select a candidate $c \in C$ that receives the most approvals and add $c$ to $w$. Then, the voter load of each voter approving this selected candidate is increased to $1/|V_c|$. In the next round, we choose a candidate that induces a (new) maximal voter load that is as small as possible, but now we have to take into account that some voters already have a non-zero load. The new maximal load if some candidate $c \in C \setminus w$ is chosen as the $(j+1)$-st committee member is measured as

$$s_c^{(j+1)} = \frac{1 + \sum_{v \in V_c} x_v^{(j)}}{|V_c|}.$$

In other words, if $c$ is chosen, then we adjust the voter loads of all voters approving $c$ so that they have the same voter load afterwards. Let $c$ be the candidate that minimizes $s_c^{(j+1)}$. Then, $c$ is added to $w$, and we set $x_v^{(j+1)} = s_c^{(j+1)}$ for all $v \in V_c$ and $x_v^{(j+1)} = x_v^{(j)}$ for all $v \notin V_c$. After $k$ rounds, the committee $w$ consists of exactly $k$ candidates. Note that we also obtain a load distribution $x$ such that $f(x) = w$.
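The rounds of seq-Phragmén can be sketched as follows. This is a minimal illustration under our own assumptions: the function name `seq_phragmen` is hypothetical, ties are broken lexicographically, candidates without any approvals are skipped, and the election is a small made-up example.

```python
from fractions import Fraction

def seq_phragmen(votes, candidates, k):
    """Return the seq-Phragmén committee and the final voter loads."""
    load = [Fraction(0)] * len(votes)
    committee = []
    for _ in range(k):
        best, best_load = None, None
        for c in sorted(set(candidates) - set(committee)):
            supporters = [i for i, v in enumerate(votes) if c in v]
            if not supporters:
                continue  # a candidate nobody approves cannot carry any load
            # New common load of c's supporters if c were selected now.
            s = (1 + sum(load[i] for i in supporters)) / len(supporters)
            if best_load is None or s < best_load:
                best, best_load = c, s
        if best is None:
            break
        committee.append(best)
        for i, v in enumerate(votes):
            if best in v:
                load[i] = best_load
    return committee, load

# Hypothetical election: 6 voters approve {a, b, c}, 3 voters approve {d}.
votes = [{"a", "b", "c"}] * 6 + [{"d"}] * 3
committee, load = seq_phragmen(votes, {"a", "b", "c", "d"}, 3)
print(committee)  # ['a', 'b', 'd']
print(sum(load))  # 3, i.e., exactly k points have been distributed
```

Here, selecting c in the third round would raise the load of the six majority voters to 1/2, whereas selecting d gives the three minority voters a load of only 1/3, so d is chosen.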
The above rules have been extensively studied in the literature from different aspects [26–34]. However, the proportionality properties of these rules defined above have only received attention recently. Fernández et al. [24] proved that winner determination for all multi-winner voting rules that satisfy PR must be NP-hard. This directly implies that AV, SAV, and seqPAV do not fulfill PR, as winner determination for these rules is polynomial-time solvable. Fernández and Fisteus [35] showed that MAV does not satisfy PR. Aziz et al. [23] showed that AV, SAV, MAV, and seqPAV do not satisfy JR. As PJR and EJR imply JR, it must be that AV, MAV, SAV, and seqPAV also fail PJR and EJR. So, none of AV, MAV, SAV, and seqPAV satisfies any of the properties studied in this section. The proportionality properties of other rules have also been studied in the literature, and we summarize them in Table 4.

Table 4. A summary of proportionality properties of important approval-based multi-winner voting rules and the complexity of winner determination for these rules. In the table, "N" means that the rule in the corresponding row does not satisfy the property in the corresponding column, and "Y" means that the rule satisfies the property.

With the help of Table 4, we know that the answer to the following important question is in the negative:

Is there a natural rule (or an algorithm) whose outcome always provides JR, EJR, PJR, and PR simultaneously?
But do we still have some hope? The answer is unfortunately in the negative again. In fact, Fernández et al. [24] proved that there is no voting rule whose outcome always provides both PR and EJR. In particular, they constructed an election instance in which none of the PR committees provides EJR (Theorem 4 in [24]). This negative result was in fact their motivation for proposing the PJR property. Because EJR implies PJR, PJR implies JR, and PR and EJR cannot always be provided together, our question breaks down into the following two questions. First, is there any natural EJR rule? Second, is there any natural rule whose outcome always provides PR and PJR? The results in Table 4 provide a comprehensive answer: among the rules in the table, PAV is the only one that provides EJR, and thus JR and PJR too, and max-Phragmén is the only one that guarantees PJR and PR, and thus JR too.

However, an obvious disadvantage of PAV and max-Phragmén is that computing a winning committee for them is computationally hard. To overcome this dilemma, we need to explore alternative rules that satisfy these properties. Max-Phragmén seems unlikely to have a proper alternative that remedies this disadvantage, since computing any PR committee has been shown to be NP-hard [24]. For PAV, there do exist good alternatives. In particular, very recently, Aziz et al. [25] crafted two polynomial-time algorithms (multi-winner voting rules) whose outcome always provides EJR, and thus JR and PJR as well. It should be pointed out that other approaches to overcoming the dilemma include designing fixed-parameter algorithms or polynomial-time algorithms for domain-restricted elections. This work has been conducted for PAV in recent years (see, e.g., [34,39,40]).

Fairness for Candidates with Sensitive Attributes
In the previous two sections, we mainly surveyed fairness properties designed for certain groups of voters. In some real-world applications, candidates have sensitive attributes. In these applications, fairness for groups of candidates has to be imposed on the decision-making procedure to avoid discrimination. In this section, we survey recent progress on this topic. We still assume that a fixed number k of winners is to be selected.
Celis, Huang, and Vishnoi [41] studied fairness in the setting where candidates belong to a number of groups, each of which corresponds to a sensitive attribute such as gender, ethnicity, etc. Notably, each candidate may have several attributes, and hence the groups need not be disjoint. They studied a quite general framework which requires that a winning committee maximizes the score with respect to some defined scoring function, and fulfills the restriction that, for each group of candidates with a specific attribute, a prescribed number of group members must be selected.
Formally, let $f : 2^C \to \mathbb{R}_{\ge 0}$ be a scoring function. Moreover, let $C_1, C_2, \ldots, C_t \subseteq C$ be subsets of candidates (it may be that $C_i \cap C_j \neq \emptyset$), and for each $C_i$, $1 \le i \le t$, let $\ell_i$ and $u_i$ be two integers such that $0 \le \ell_i \le u_i \le |C_i|$. Then, the goal of an f-multi-winner voting rule is to select a k-committee $w \subseteq C$ with the maximum score under the restriction that $\ell_i \le |w \cap C_i| \le u_i$ for each $1 \le i \le t$.

The framework is so general that it only stipulates the minimum and the maximum numbers of candidates that should be selected from each group, and leaves the settings of these two values to specific applications. In particular, the values $\ell_i$ and $u_i$ can be constants, or any function of the number of candidates in the groups, the total number of candidates, etc. As argued by the authors, the framework generalizes several important proportional fairness properties studied in the literature, such as fully proportional representation [42], fixed-degressive proportionality [43], and flexible proportionality [44].
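To make the constrained selection problem concrete, the following sketch enumerates all k-committees and keeps the best one that respects the per-group lower and upper bounds. It is only an illustration under stated assumptions: the function name `best_constrained_committee`, the additive approval-count score, and the toy data are all hypothetical, and the exhaustive search takes exponential time, so it is not one of the approximation algorithms studied in [41].

```python
from itertools import combinations


def best_constrained_committee(candidates, k, score, groups, lower, upper):
    """Brute-force search for a highest-scoring k-committee w such that
    lower[i] <= |w ∩ groups[i]| <= upper[i] holds for every group i."""
    best, best_score = None, None
    for w in combinations(candidates, k):
        w_set = set(w)
        feasible = all(
            lower[i] <= len(w_set & groups[i]) <= upper[i]
            for i in range(len(groups))
        )
        if not feasible:
            continue
        s = score(w_set)
        if best_score is None or s > best_score:
            best, best_score = w_set, s
    return best, best_score  # (None, None) if no feasible committee exists


# Hypothetical instance: approval counts per candidate and two groups.
approvals = {"a": 5, "b": 4, "c": 3, "d": 1}
groups = [{"a", "b"}, {"c", "d"}]


def av_score(w):
    """Additive score: total number of approvals received by the committee."""
    return sum(approvals[c] for c in w)


unconstrained = best_constrained_committee(list(approvals), 2, av_score, groups, [0, 0], [2, 2])
constrained = best_constrained_committee(list(approvals), 2, av_score, groups, [1, 1], [1, 1])
print(unconstrained, constrained)
```

Without constraints the committee {a, b} wins with score 9; requiring exactly one member from each group yields {a, c} with score 8, which illustrates the (here small) price of the fairness constraints.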
Given that the framework is so general, it is not surprising that computing a winning committee is computationally hard. Given this negative result, the authors explored numerous approximation algorithms for calculating committees satisfying the above fairness constraints. Their results largely depend on the maximum number of groups to which a candidate belongs. For example, they showed that if every candidate belongs to exactly one group, i.e., $(C_1, C_2, \ldots, C_t)$ forms a partition of $C$, there is a $(1 - 1/e)$-approximation algorithm, and that this factor is optimal. However, in the case where some candidate belongs to at least three groups, checking whether there is a feasible solution is already NP-hard, and even in the case where feasible solutions exist, finding a solution with approximation factor $\omega(\log \Delta / \Delta)$ remains NP-hard, where $\Delta$ is the maximum number of groups to which a candidate belongs. For many other interesting theoretical results, we refer to Tables 1 and 2 in [41]. The authors also conducted experiments showing that, for many rules, the constrained version outputs a committee whose score is very close to that of the unconstrained version. Table 4 in [41] summarizes their findings regarding this issue.
Almost at the same time that Celis, Huang, and Vishnoi [41] posted their paper on arXiv (https://arxiv.org/abs/1710.10057), Bredereck et al. [45] posted on arXiv (https://arxiv.org/abs/1711.06527) a paper investigating a similar model. However, they mainly focused on the parameterized complexity and computational complexity of the winner determination problem. Similar constraints have also been considered in party-based voting (an apportionment problem) [46], where each party nominates several candidates and a total of k seats should be distributed to these parties based on the preferences of voters over parties.

Stable Fairness
Cheng et al. [47] recently put forward a notion of group fairness inspired by the concept of the core in cooperative game theory. In general, it says that a committee is fair to a group of voters if they cannot obtain, by deviating, a committee of proportional size that is strictly better for all members. Formally, for two committees $S \subseteq C$ and $S' \subseteq C$, let $V(S, S')$ be the number of voters who prefer $S'$ to $S$. We say that $S'$ blocks $S$ if and only if

$$V(S, S') \ge \frac{|S'|}{k} \cdot n,$$

where n is the number of voters and k is the desired winning committee size. A committee $S$ is i-stable for some integer i with $1 \le i \le k$ if and only if there does not exist a committee $S'$ of size at most i which blocks $S$. Cheng et al. [47] showed that their notion generalizes some previously studied notions such as justified representation. They also extended their notion to Stable Lotteries and Approximate Stability, and studied the existence of these stable solutions and how efficiently they can be calculated. We refer to [47] for the details.
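The blocking condition can be checked directly on small instances. The sketch below assumes the threshold form $V(S, S') \ge (|S'|/k) \cdot n$ and, purely for illustration, an approval-based preference model in which a voter prefers $S'$ to $S$ when she approves strictly more members of $S'$; this preference model and the names `blocks` and `is_i_stable` are assumptions of the sketch and are not taken from [47]. The stability check enumerates all small committees and is therefore exponential in i.

```python
from itertools import combinations


def blocks(s_new, s_old, ballots, k):
    """True if committee s_new blocks s_old: at least (|s_new| / k) * n voters
    strictly prefer s_new (here: approve strictly more of its members)."""
    n = len(ballots)
    prefer = sum(1 for b in ballots if len(b & s_new) > len(b & s_old))
    return prefer * k >= len(s_new) * n  # integer form of the threshold


def is_i_stable(s, candidates, ballots, k, i):
    """True if no committee of size at most i blocks s (exhaustive search)."""
    return not any(
        blocks(set(t), s, ballots, k)
        for size in range(1, i + 1)
        for t in combinations(candidates, size)
    )


# Hypothetical election: half of the voters approve only c, half approve a and b.
ballots = [{"c"}] * 3 + [{"a", "b"}] * 3
print(blocks({"c"}, {"a", "b"}, ballots, k=2))                   # the c-voters deserve one seat
print(is_i_stable({"a", "c"}, ["a", "b", "c"], ballots, 2, 2))   # a mixed committee is not blocked
```

Here the committee {a, b} is blocked by {c}, because half of the voters prefer {c} and a single seat corresponds to half of the electorate when k = 2, whereas {a, c} is 2-stable.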

Machine Learning Algorithms
Machine learning (ML) algorithms have gained a lot of attention in recent years due to their growing predictive capabilities. In this paper, we mainly cover supervised machine learning. The other two classes of machine learning, namely, unsupervised learning and reinforcement learning, have received comparatively less research attention with respect to fairness and remain beyond the scope of our review.
Supervised machine learning algorithms are provided with a set of input "features" denoted by $x^{(i)} \in \mathcal{X}$ and output "target" labels $y^{(i)} \in \mathcal{Y}$, which are jointly called the "training set" $(\mathcal{X}, \mathcal{Y})$. Given the training set, supervised machine learning algorithms learn a function $h : \mathcal{X} \to \mathcal{Y}$ such that $h(x)$ is a "good" predictor for the corresponding value of $y$ (for an unknown $x$), where $h$ denotes the "hypothesis". Based on the distribution of $\mathcal{Y}$, such a task is either "regression" (where $y^{(i)} \in \mathcal{Y}$ is continuous) or "classification" (where $y^{(i)} \in \mathcal{Y}$ is a discrete class). A machine learning algorithm is evaluated based on its ability to correctly predict the label $y'$ for an unseen data point $x'$. Notably, such algorithms represent automated data-driven decision-making that functions by learning from historical decisions, often taken by humans. The utility of such systems (both classification and regression) is optimized by minimizing the errors during training and prediction over the given training set. When given an initial set of alternatives, such tasks could represent clustering or classifying a set of alternatives into disjoint groups.

Arguably, when such systems are trained and optimized for making decisions (especially about individuals belonging to different protected classes), some classes might be unfairly treated with respect to the outcome and the error rates of the algorithmic decision-making. To account for and avoid such unfairness, studies on fairness in machine learning have introduced various notions of fairness. In the next sections, we provide a brief review of various such definitions (Section 3.1) and mechanisms (Section 3.2) of fair machine learning algorithms.

Fairness Notions
The literature on fair ML algorithms has predominantly drawn on concepts and definitions of fairness from the legal domain. Popular concepts such as direct discrimination (or "disparate treatment") and indirect discrimination (or "disparate impact") are based on various antidiscrimination laws that prohibit unfair treatment of individuals based on sensitive attributes such as gender, race, etc. [4]. Disparate treatment occurs when the decision an individual receives changes with changes in her sensitive attribute information. Similarly, disparate impact occurs when the decision outcomes disproportionately benefit or hurt members of certain sensitive attribute groups. More formally:

Disparate Treatment. Given a dataset D = (A, X, Y), with a set of sensitive attributes A (such as race, gender, etc.), remaining attributes X, binary class to be predicted Y, and predicted binary class Ŷ, disparate treatment is said to exist in D if $\Pr(\hat{Y} \mid X, A) \neq \Pr(\hat{Y} \mid X)$.

Disparate Impact. Given a dataset D = (A, X, Y), with a set of sensitive attributes A (such as race, gender, etc.), remaining attributes X, and binary class to be predicted Y, disparate impact is said to exist in D, for positive outcome class 1 and majority protected attribute value 1, if

$$\frac{\Pr(Y = 1 \mid A = 0)}{\Pr(Y = 1 \mid A = 1)} \le \tau,$$

where $\tau$ is a threshold (commonly $\tau = 0.8$) and $\Pr(Y = y \mid A = a)$ denotes the conditional probability that the class outcome is $y \in Y$ given sensitive attribute value $a \in A$.
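A common way to test a dataset for disparate impact is to compare positive-outcome rates between the two groups. The sketch below assumes the ratio form $\Pr(Y = 1 \mid A = 0) / \Pr(Y = 1 \mid A = 1)$ with group 1 as the majority group and a threshold of 0.8 (the "four-fifths rule"); the function name and the toy data are hypothetical.

```python
def disparate_impact_ratio(y, a):
    """Ratio of positive-outcome rates Pr(Y=1 | A=0) / Pr(Y=1 | A=1).

    y: list of binary outcomes, a: list of binary sensitive-attribute values.
    Assumes both groups are non-empty and group 1 has at least one positive.
    """
    group0 = [yi for yi, ai in zip(y, a) if ai == 0]
    group1 = [yi for yi, ai in zip(y, a) if ai == 1]
    rate0 = sum(group0) / len(group0)
    rate1 = sum(group1) / len(group1)
    return rate0 / rate1


# Toy data: 2 of 5 positives in group 0, 4 of 5 positives in group 1.
a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]

ratio = disparate_impact_ratio(y, a)
has_disparate_impact = ratio <= 0.8  # threshold value is an assumption
print(ratio, has_disparate_impact)
```

With these numbers the ratio is 0.4 / 0.8 = 0.5, which falls below the assumed threshold, so the toy dataset would be flagged.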
For the convenience of the readers, in Table 5, we provide a concise summary of various definitions of fairness in the literature.
In our review, we observed that a majority of recent studies have focused on the design of automated decision-making systems that aim at avoiding one or both of these unfairness notions. For example, Feldman et al. [48] developed a test for disparate impact as well as methods by which data might be made unbiased. Luong et al. [49] provided a method for discrimination discovery and prevention from a dataset of historical decisions by adopting a variant of k-NN classification. Zemel et al. [50] proposed a learning algorithm for fair classification that achieves both group fairness and individual fairness by formulating fairness as an optimization problem of finding a good representation of the data with two competing goals: to encode the data as well as possible while simultaneously obfuscating any information about membership in the protected group. Zafar et al. [51] introduced a flexible constraint-based framework to enable the design of fair margin-based classifiers, which makes use of a general and intuitive measure of decision boundary unfairness.

In a more recent work, Zafar et al. [52] introduced an alternative notion of unfairness called disparate mistreatment. A classifier is said to suffer from disparate mistreatment if the misclassification rates differ across groups of individuals with different values of the sensitive attribute A. Zafar et al. [52] proposed that disparate mistreatment in a binary classification task can be specified with respect to various misclassification measures such as the overall misclassification rate, false positive rate, false negative rate, false omission rate, and false discovery rate. We have also witnessed recent works drawing on fairness concepts from economics and social welfare, such as equality, Gini distribution, etc., in their conceptualization of fairness.
Although the fair ML literature has gained tremendous research attention in recent years, two of its major features are an extensive set of fairness definitions to choose from and various empirical and theoretical findings suggesting that it is impossible to satisfy several of these definitions at the same time. Nevertheless, in our review, we aim to cluster the notions of fairness in machine learning studied in the extant literature into three main categories, namely, anticlassification, statistical parity, and calibration.
1. Anticlassification, also known as unawareness, seeks to achieve fairness in ML outcomes by excluding the use of protected features such as race, gender, or ethnicity from the statistical model. This notion is consistent with disparate treatment. Despite being intuitive, easy to use, and having legal support, a crucial difficulty of this approach is that a protected feature might be correlated with many other unprotected features, and it is practically infeasible to identify all such covariate "proxies" and remove them from the statistical model. For example, the protected feature race might be correlated with various other features, such as education level, salary, life expectancy, etc., and removing all these proxies from the statistical model could have detrimental effects on predictive performance.
Suppose we have a vector $x_i \in \mathbb{R}^t$ that represents the visible attributes of individual i, such as race, gender, education level, age, etc. An algorithmic decision can be represented as a function $d : \mathbb{R}^t \to \{0, 1\}$, where $d(x) = k$, $k \in \{0, 1\}$, means that action $a_k$ is taken. Suppose that $x$ can be partitioned into protected and unprotected features: $x = (x_p, x_u)$. Let $X_p$ denote the set of all protected features. Then, anticlassification requires that decisions do not consider protected attributes; more formally, $d(x) = d(x')$ for all $x, x'$ such that $x_u = x'_u$. Several other variants of anticlassification have also been proposed in the literature [53,54].

2. Statistical parity (also known as demographic parity, independence, and classification parity) requires that common measures of predictive accuracy and performance errors remain uniform across the groups segmented by the protected features. This includes notions such as statistical parity, equality of accuracy, equality of false positive/false negative rates, and equality of positive/negative predictive values [55–57]. The main idea of this notion is to quantify the benefit and harm of the ML prediction for the groups segmented by protected attributes, to equalize them, and to distribute the errors equally among different stakeholders [55]. This notion of fairness has recently found application in criminal justice [58] and is consistent with disparate impact.
The measures of classification parity based on the false positive rate and on the proportion of decisions that are positive have received considerable attention in the machine learning domain [55,59,60]. For formal definitions, please refer to Table 5.
Recent research by Hu and Chen [61] suggests that enforcing statistical parity criteria in the short term helps build the reputation of the disadvantaged minority in the labor market in the long run. Note that a critical flaw of the notion of statistical parity is that it can easily be satisfied by some arbitrary configuration; for example, selecting the best-qualified candidates from one group and random candidates from the other group can still satisfy statistical parity. Moreover, the definition also ignores any possible correlation between the positive outcome and the protected attributes.

3. Calibration requires that ML outcomes remain independent of protected features after controlling for estimated risk. Calibration relates to the fairness of risk scores and requires that, for a given risk score, the proportion of individuals re-offending remains uniform across protected groups. Calibration is beneficial as a fairness condition as it does not require much intervention in the existing decision-making process [62]. A major disadvantage of calibration is that a risk score can be manipulated to appear calibrated by ignoring information about the favored group [63]. Formally, given risk scores $s(x)$, calibration is satisfied when $\Pr(Y = 1 \mid s(X), X_p) = \Pr(Y = 1 \mid s(X))$.

Despite this multitude of notions measuring fairness from diverse perspectives, recent research has identified theoretical and empirical evidence that each of them suffers from significant statistical limitations [57]. The notions of fairness described above only aim to ensure equality between group averages, particularly for groups drawn from protected classes such as gender, race, etc. In contrast, the "individual notion" takes into account additional characteristics of individuals and looks into differences between individuals rather than groups. Individual fairness is satisfied when similar individuals are treated similarly. Users are treated as individuals regardless of their group membership (either protected or unprotected group). Individual fairness is quantified by relating the distance between the predicted outcomes to the distance between the individual characteristics [64]. Joseph et al. [65] introduced the study of fairness in multi-armed bandit problems, which ensures that, given a pool of individuals, a worse individual is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs.

A major drawback of the existing individual notions of fairness is the need to make strong initial assumptions. For instance, the notion coined by Dwork et al. [64] assumes the existence of a previously agreed-upon similarity metric, which is nontrivial to compute, and that of Joseph et al. [65] requires significant assumptions about the underlying functional form of the relationship between features and labels for any possible practical application. Another drawback pertains to the difficulty of selecting an appropriate metric function to measure the similarity of two inputs [66].
It is also worth noting that an emerging "causal" notion of fairness draws on the literature on causal discovery and inference in its definitions [67–69]. Another emerging stream of literature proposes that the right notion of fairness depends on the context [60,70]. Please refer to Table 5 for specific definitions.

Fairness Definition | Description
Equalized Odds | Predicted outcome $\hat{Y}$ satisfies equalized odds with respect to protected attribute $A$ and true outcome $Y$ if $\hat{Y}$ and $A$ are independent conditional on $Y$; more specifically, $P(\hat{Y} = 1 \mid A = 0, Y = y) = P(\hat{Y} = 1 \mid A = 1, Y = y)$ for $y \in \{0, 1\}$.
Equal Opportunity | A binary predictor $\hat{Y}$ satisfies equal opportunity with respect to $A$ and $Y$ if $P(\hat{Y} = 1 \mid A = 0, Y = 1) = P(\hat{Y} = 1 \mid A = 1, Y = 1)$.
Statistical Parity | A predictor $\hat{Y}$ satisfies demographic parity if $P(\hat{Y} \mid A = 0) = P(\hat{Y} \mid A = 1)$ [64].
Counterfactual Fairness | For a given causal model $(U, V, F)$ where $V \equiv A \cup X$, predictor $\hat{Y}$ is said to be "counterfactually fair" if, under any context $X = x$ and $A = a$, $P(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a) = P(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a)$ for all $y$ and for any value $a'$ attainable by $A$ [68].
Fairness through Awareness | An algorithm is fair if it gives similar predictions to similar individuals. Any two individuals who are similar with respect to a similarity metric defined for a particular task should be classified similarly [64].
Individual Fairness | Let $O$ be a measurable space and $\delta(O)$ be the space of distributions over $O$. If $M : X \to \delta(O)$ denotes a map that maps each individual to a distribution of outcomes, the formulation of individual fairness is then $D(M(x), M(x')) \le d(x, x')$, where $x, x' \in \mathbb{R}^d$ are two individuals and $d$ and $D$ are metric functions on the input space and the output space, respectively [64].
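The group-level definitions in Table 5 can be measured on a labeled sample as gaps between group-conditional rates. The following sketch is illustrative only: it assumes a binary protected attribute and binary labels, reports absolute differences (a gap of 0 means the criterion holds exactly on the sample), and uses hypothetical names and toy data.

```python
def positive_rate(y_pred, mask):
    """Fraction of positive predictions among the selected rows."""
    selected = [p for p, m in zip(y_pred, mask) if m]
    return sum(selected) / len(selected)  # assumes at least one selected row


def group_fairness_gaps(y_true, y_pred, a):
    """Absolute gaps between groups A=0 and A=1 for three criteria."""

    def cond_rate(group, label=None):
        mask = [
            ai == group and (label is None or yi == label)
            for ai, yi in zip(a, y_true)
        ]
        return positive_rate(y_pred, mask)

    parity_gap = abs(cond_rate(0) - cond_rate(1))   # P(Ŷ=1 | A)
    tpr_gap = abs(cond_rate(0, 1) - cond_rate(1, 1))  # P(Ŷ=1 | A, Y=1)
    fpr_gap = abs(cond_rate(0, 0) - cond_rate(1, 0))  # P(Ŷ=1 | A, Y=0)
    return {
        "statistical_parity": parity_gap,
        "equal_opportunity": tpr_gap,
        "equalized_odds": max(tpr_gap, fpr_gap),
    }


a      = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(group_fairness_gaps(y_true, y_pred, a))
```

In the toy sample, group 1 receives positive predictions three times as often as group 0, and both its true positive rate and its false positive rate are higher, so all three gaps are non-zero.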

Fairness Mechanisms
Next, we discuss three classes of fairness mechanisms, clustered by the stage at which the debiasing mechanism is applied: preprocessing, in-processing, and postprocessing.

A. Preprocessing. Preprocessing methods deal with removing the protected features or their covariates before training the model. Similar to anticlassification, this method comes with severe disadvantages, as the protected feature might be correlated with many other unprotected features, and it is practically infeasible to identify all such covariates and exclude them without losing much predictive accuracy. Kamiran and Calders [71] suggest a set of data preprocessing techniques aimed at ensuring fairness for classification tasks. These include suppression, massaging the dataset, reweighting, and sampling.
Suppression. In this process, exactly as in anticlassification, all the features that correlate with the protected set of features $X_p$ are first identified and then removed from the classification model.

Massaging the dataset. In this process, the labels of some data points are manipulated in order to remove existing discrimination from the training data. In order to find a good set of labels to change, Kamiran and Calders [71] proposed a combination of ranking and learning.

Reweighting. Instead of changing the labels, in this method the tuples in the training dataset are assigned asymmetric weights in order to overcome the bias.

Sampling. Kamiran and Calders [71] introduced "uniform sampling" and "preferential sampling", where the training data is sampled with the help of a ranker as a debiasing method.
Kamiran and Calders [71] found that suppressing the protected attributes does not always remove the bias, and that the massaging and preferential sampling techniques performed best for debiasing, with a minimal loss in accuracy.
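As an illustration of the reweighting idea, the sketch below assigns each combination of protected-attribute value and label the weight "expected frequency under independence divided by observed frequency", i.e., $w(a, y) = \Pr(A = a) \Pr(Y = y) / \Pr(A = a, Y = y)$. The exact weighting scheme, the function name, and the data are assumptions of this sketch; the details of the original method should be taken from [71].

```python
from collections import Counter
from fractions import Fraction


def reweighing_weights(a, y):
    """Weight for each observed (attribute, label) pair:
    expected count under independence / observed count."""
    n = len(y)
    count_a = Counter(a)
    count_y = Counter(y)
    count_ay = Counter(zip(a, y))
    return {
        (ai, yi): Fraction(count_a[ai] * count_y[yi], n * count_ay[(ai, yi)])
        for (ai, yi) in count_ay
    }


# Toy data: group 0 has 1 positive out of 4, group 1 has 3 positives out of 4.
a = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 0, 0, 0, 1, 1, 1, 0]
weights = reweighing_weights(a, y)


def weighted_positive_rate(group):
    """Positive rate within a group after applying the weights."""
    rows = [(ai, yi) for ai, yi in zip(a, y) if ai == group]
    total = sum(weights[r] for r in rows)
    positive = sum(weights[r] for r in rows if r[1] == 1)
    return positive / total


print(weights, weighted_positive_rate(0), weighted_positive_rate(1))
```

Under-represented combinations (positives in group 0, negatives in group 1) receive weight 2, the others weight 2/3, and after weighting both groups have a positive rate of 1/2. A classifier that supports per-sample weights could then be trained on the weighted data.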
Another idea developed in preprocessing is to learn a new representation of the data that removes the information correlated with the sensitive attribute [50,72,73]. The downstream algorithm, such as a classifier, then uses the cleaned data. An advantage of this method is that the analyst can avoid the need to modify the classifier or to access sensitive attributes at test time.

B. In-processing. In this method, the optimization procedure is modified to incorporate the cost of unfairness. This is typically done by adding a constraint to the optimization problem or by adding the cost of unfairness as a regularizer. For example, Agarwal et al. incorporate cost-sensitive classification into their original objective function [59]. Given a dataset $\{(x_i, c_i^0, c_i^1)\}_{i=1}^{n}$, where $c_i^0$ is the cost of predicting 0 on $x_i$ and $c_i^1$ is the cost of predicting 1 on $x_i$, a cost-sensitive classification algorithm outputs

$$\hat{h} = \arg\min_{h \in \mathcal{H}} \sum_{i=1}^{n} \left[ h(x_i)\, c_i^1 + (1 - h(x_i))\, c_i^0 \right],$$

where $\mathcal{H}$ is the hypothesis class and $h(x_i) \in \{0, 1\}$ denotes the prediction of classifier $h$ on $x_i$.
More generally, the reduction approach by Agarwal et al. reduces training with fairness constraints to solving a sequence of cost-sensitive classification problems using off-the-shelf methods [59].
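The cost-sensitive classification step can be illustrated on a tiny hypothesis class. The sketch below assumes one-dimensional features and threshold classifiers $h_t(x) = 1$ if $x \ge t$; it simply evaluates the total cost $\sum_i [h(x_i) c_i^1 + (1 - h(x_i)) c_i^0]$ for each candidate threshold. The function name, the hypothesis class, and the costs are hypothetical, and the sketch covers only this single step, not the full reduction of [59].

```python
def cost_sensitive_threshold(xs, c0, c1, thresholds):
    """Pick the threshold classifier h_t(x) = [x >= t] with minimal total cost.

    c0[i] is the cost of predicting 0 on xs[i], c1[i] the cost of predicting 1.
    Returns the best threshold and its total cost (first minimum wins ties).
    """

    def total_cost(t):
        return sum(c1[i] if x >= t else c0[i] for i, x in enumerate(xs))

    best = min(thresholds, key=total_cost)
    return best, total_cost(best)


xs = [0.1, 0.4, 0.6, 0.9]
c0 = [0, 3, 0, 1]  # predicting 0 is expensive on the second example
c1 = [1, 0, 1, 0]

best_t, cost = cost_sensitive_threshold(xs, c0, c1, [0.0, 0.3, 0.5, 0.8, 1.0])
print(best_t, cost)
```

Because predicting 0 on the second example carries a cost of 3, the cheapest classifier uses the threshold 0.3 and accepts one unit of cost on the third example instead. In a fairness-aware reduction, such per-example costs would encode the current trade-off between accuracy and the fairness constraints.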
An important advantage of this method is that there is no need to access sensitive attributes at test time. This method also provides higher flexibility in terms of the trade-off between accuracy and fairness measures. An important disadvantage is that this method is task specific and requires modification of the classifier, which can often substantially increase the computational complexity.
The method to optimize counterfactual fairness also falls into this category. Kusner et al. [68] propose "counterfactual fairness", which explicitly specifies the assumptions about the data generating process. This can be done by adding a linear or convex surrogate for the fairness constraint to the learning models. For example, consider a predictive problem with fairness considerations, where A, X, and Y represent the protected attributes, remaining attributes, and the output of interest, respectively.

C. Postprocessing. Postprocessing methods edit the posteriors in order to satisfy the fairness constraints. The method searches for a proper threshold for each group using the original score function. We refer to Hardt et al. [55] for more details on this postprocessing method. This method requires test-time access to the protected attribute and lacks flexibility in terms of the trade-off between accuracy and fairness. However, it benefits from being general and applicable to any classifier without modification.
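The idea of searching for a separate threshold per group can be sketched as follows. This is a deliberately simplified illustration, not the procedure of [55]: it assumes that fairness is expressed as a common target true positive rate, picks for each group the threshold whose empirical true positive rate is closest to that target (preferring higher thresholds on ties), and uses hypothetical names and data.

```python
def true_positive_rate(scores, labels, threshold):
    """Share of actual positives whose score reaches the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def group_thresholds(scores, labels, groups, target_tpr):
    """Choose one score threshold per group so that each group's true
    positive rate is as close as possible to target_tpr."""
    result = {}
    for g in sorted(set(groups)):
        s_g = [s for s, gi in zip(scores, groups) if gi == g]
        y_g = [y for y, gi in zip(labels, groups) if gi == g]
        # Candidate thresholds are the observed scores, highest first,
        # so that ties are resolved in favor of the stricter threshold.
        candidates = sorted(set(s_g), reverse=True)
        result[g] = min(
            candidates,
            key=lambda t: abs(true_positive_rate(s_g, y_g, t) - target_tpr),
        )
    return result


scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.5, 0.4, 0.1]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

print(group_thresholds(scores, labels, groups, target_tpr=1.0))
```

In the toy data, the score function systematically assigns lower scores to group 1. A single global threshold of 0.7 would accept every qualified member of group 0 and none of group 1, whereas the group-specific thresholds 0.8 and 0.5 equalize the true positive rates. The sketch also shows why this family of methods needs the protected attribute at test time.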
Besides these, there is also some work on fairness in unsupervised learning. In their recent paper, Bolukbasi et al. [74] analyzed the unfairness present in word embeddings, a popular framework used to represent text data as vectors, and quantitatively demonstrated that word embeddings contain biases in their geometry that reflect stereotypes present in our society (for example, words like "programmer" were closer to male names, whereas "homemaker" was closer to female names). Additionally, the authors introduced various debiasing methods to deal with the detrimental effects of such gender bias. In a similar vein, Zhao et al. [75] investigated various datasets and models associated with multilabel object classification and visual semantic role labeling, and found that datasets for these tasks contain significant gender bias, which is amplified by the models trained on these datasets. As an example, the authors found that activities like cooking are much more strongly associated with females than with males. Following these works, a large number of subsequent works have been devoted to techniques for removing the biases embedded in word embeddings [76–79].

Recommender Systems
Recommender systems are among the most pervasive applications of algorithmic decision-making in industry, with many services using them to support users in finding products or information of potential interest [80]. Such systems find applications in various online platforms such as Netflix, LinkedIn, Amazon, etc., where the set of alternative items is very large and needs to be filtered down to a smaller set before being presented to the user. There are various approaches to recommender systems, such as collaborative filtering [81], content-based filtering [82], knowledge-based recommendation [83], and hybrid combinations of these.

First, collaborative filtering algorithms are based on the assumption of word-of-mouth, that is, the decisions of users are influenced by other users who are close to them (such as family and friends). User-based collaborative filtering [81] identifies the k-nearest neighbors of the focal user and, based on these nearest neighbors, calculates a prediction of the focal user's rating for a specific item. In contrast to user-based collaborative filtering, item-based collaborative filtering [84] searches for items rated by the focal user that received ratings similar to those of the item currently under investigation in order to estimate its utility. Second, content-based filtering [82] is based on the assumption of monotonicity of personal interests. In content-based filtering, the content of already consumed items is compared with that of new items that can potentially be recommended to the user. Based on some "similarity" measure of such comparisons, items that are likely to be of interest to the focal user are recommended. Third, knowledge-based recommendation [83] also draws on deeper knowledge (such as semantic knowledge) about the items, in addition to the ratings and textual item descriptions that the first two approaches use.
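User-based collaborative filtering can be sketched as a similarity-weighted average over the k most similar users who rated the target item. The choices below are assumptions made for illustration and are not prescribed by [81]: cosine similarity over raw ratings, no mean-centering, and the names `cosine` and `predict_rating`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def predict_rating(ratings, user, item, k):
    """Predict `user`'s rating of `item` from the k most similar users who rated it."""
    neighbours = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    top = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no informative neighbour available
    return sum(sim * rating for sim, rating in top) / total


ratings = {
    "u": {"x": 5, "y": 1},
    "v": {"x": 5, "y": 1, "z": 4},  # tastes similar to u
    "w": {"x": 1, "y": 5, "z": 2},  # tastes opposite to u
}
print(predict_rating(ratings, "u", "z", k=1), predict_rating(ratings, "u", "z", k=2))
```

With k = 1 the prediction equals the rating of the most similar user v; with k = 2 the dissimilar user w pulls the prediction down, but only in proportion to her lower similarity.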
The study of bias and fairness in recommender systems is an emerging research area that is receiving increasing attention. This is further fueled by evidence of the detrimental consequences of popularity bias in recommender systems, where recommenders typically emphasize popular items over "long-tail", less popular ones that may only be popular among small groups of users [85]. Notably, a majority of recommender algorithms can be considered a subset of machine learning algorithms. Nevertheless, we discuss them separately here due to their unique importance and application pertaining to fairness, and because studying fairness in recommender systems is considered challenging and complex, as such systems often consist of multiple models, must balance multiple goals, and are difficult to evaluate due to sparsity and dynamism.
As with algorithmic fairness in general, defining fairness in recommender systems is challenging. In traditional recommender systems, the optimization only concerns predictive accuracy, that is, how well the algorithm predicts whether a user will like an item or not, based on the utilities of users. The literature on fairness in recommender systems adds constraints or additional objectives in order to ensure sufficient item coverage, fairness, or diversity in item recommendation. Recommender systems with such constraints can better facilitate the adoption and purchase of items and deal fairly with the wishes and preferences of all classes/groups of users [86].
Similar to the accuracy-fairness trade-off in machine learning, recommender systems also suffer from a utility-fairness conundrum, as making the recommendations fair will likely reduce the utility of the entire system. Moreover, recommender systems also suffer from some unique shortcomings compared to machine learning fairness in general [87]. For instance, in a recent paper, Farnadi et al. [88] defined two primary types of bias. First, observation bias appears due to the feedback loop in recommender systems: an item displayed by the recommender system is further reinforced by the choices of the agent over time, which increases the probability that the item is retained in the system. Moreover, items similar to such an item also receive more weight and are more likely to be recommended. Second, bias stemming from imbalance in the data arises when a systematic bias is present in the data or experience due to societal or historical features. The literature has explored approaches to handling such biases by increasing the diversity of recommendations [89,90]. Additionally, a more recent line of research looks at fairness in recommender systems through the use of various metrics. For instance, Yao and Huang [87] adopt five different fairness metrics in their exploration of fair recommender systems based on matrix factorization. Burke [91] introduces fairness via neighborhood balancing with a sparse linear method.
Although these early works have played a vital role in increasing our understanding of fairness in recommender systems, most of the existing work in fair recommender systems focus on fairness in supervised learning setting, and only very recently are researchers moving towards fairness in unsupervised tasks such as clustering and ranking (See, for example, the works by the authors of [87,[91][92][93].).Below, we provide a more elaborate overview of the existing literature divided into three main clusters: (1) fairness for users and group of users (Section 4.1) where we look at research work aimed at introducing fairness for users or their groups; (2) fairness for items (Section 4.2) where fairness is introduced from the side of recommended items, and, finally; (3) multi-stakeholder fairness (Section 4.3) where fairness incorporates various stakeholders at the same time.In Table 6, we provide a concise summary of these three domains.

Fairness for Users and Groups of Users
These methods aim to ensure fairness for individual users or groups of users. Similar to the classification or statistical parity discussed in Section 3.1, fairness for users considers group fairness, in which the protected group incurs rating prediction errors in parity with the nonprotected group. Yao and Huang [87] studied fairness in collaborative-filtering settings and identified new fairness metrics that can be optimized by adding fairness terms to the learning objective. They also showed via experiments that their new metrics measure fairness better than the baseline and are effective in reducing bias. In another paper, Ning and Karypis [94] aimed to achieve the same notion of user fairness by adding to the collaborative filtering objective function a regularization term that measures the deviation in the total weight assigned to protected and nonprotected group members.
A related but separate line of work looks at individual fairness in group recommendation, where the goal is to design systems that recommend to a group of users while respecting the individual preferences of the group members. In such a setting, the objective is not only to maximize the overall satisfaction among group members but also to ensure that the recommendations are fair in the sense of minimizing dissatisfaction among group members. Earlier work in this line mainly views fairness from the perspective of game theory and voting theory, treating the group decision process either as a non-cooperative game or as a voting campaign, without clearly modeling the trade-off between overall satisfaction and fairness of users [99–102].
In a more recent work, Lin et al. [103] investigated the group recommendation problem through a computational lens. Their method tries to maximize the satisfaction of each group member while minimizing the unfairness between them. The authors conceptualize such fairness-aware group recommendation as a multiobjective optimization problem consisting of two independent objectives: individual fairness and social welfare. In a similar line of research, user fairness is modeled in terms of the satisfaction of the user with the group recommendation. Qi et al. [104] propose probabilistic models that capture the preference of a group towards a recommended package, and incorporate fairness by ensuring that no user is consistently slighted by the item selection in the package. This idea has been further developed in subsequent papers [105,106]. For example, in [105], Serbos et al. develop fairness measures for package recommendation based on "proportionality" and "envy-freeness".
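The idea of adding a fairness term to the learning objective can be illustrated with a small sketch. It is not the metric of [87] or the regularizer of [94]; the gap measure, the function names `group_error_gap` and `fair_objective`, and the weight `lam` are assumptions chosen for illustration. The sketch penalizes the difference in mean absolute rating-prediction error between the protected and nonprotected groups.

```python
from statistics import mean


def group_error_gap(ratings, predictions, protected):
    """Absolute gap in mean absolute prediction error between the two groups.

    ratings, predictions: dicts mapping (user, item) -> rating value.
    protected: set of user ids that belong to the protected group.
    Both groups are assumed to have at least one observed rating.
    """
    protected_errors, other_errors = [], []
    for (user, item), true_rating in ratings.items():
        error = abs(predictions[(user, item)] - true_rating)
        (protected_errors if user in protected else other_errors).append(error)
    return abs(mean(protected_errors) - mean(other_errors))


def fair_objective(ratings, predictions, protected, lam=0.5):
    """Mean squared error plus a weighted group-parity penalty (illustrative)."""
    mse = mean((predictions[key] - r) ** 2 for key, r in ratings.items())
    return mse + lam * group_error_gap(ratings, predictions, protected)
```

Setting `lam` to zero recovers the purely accuracy-driven objective; larger values trade accuracy for parity of errors across groups.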
Proportionality. Given a package P and a parameter Δ, we say that a user u likes an item i ∈ P if i is ranked in the top-Δ% of the preferences of u over all items. Consequently, for a user u and a package P, we say that P is m-proportional for u, for m ≥ 1, if there exist at least m items in P that are liked by u.
Envy-freeness. Given a group G, a package P, and a parameter Δ, we say that a user u ∈ G is envy-free for an item i ∈ P if r(u, i) is in the top-Δ% of the preferences in the set {r(v, i) : v ∈ G}. Consequently, for a user u, a package P, and a group G, we say that the package P is m-envy-free for u, for m ≥ 1, if u is envy-free for at least m items in P.
The authors develop algorithms that can construct a package of items for a group of users satisfying either proportionality or envy-freeness.
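The two definitions can be checked mechanically for a given package. The following is a minimal sketch, not the package-construction algorithms of [105]: it assumes ratings are stored as a nested dictionary `r[user][item]`, writes the percentage parameter as `delta`, rounds the top-percentage cutoff up to at least one entry, and breaks ties in a user's ranking by input order. All function names are hypothetical.

```python
import math


def top_count(n, delta):
    """How many of n entries fall in the top-delta% (assumed: rounded up, at least 1)."""
    return max(1, math.ceil(n * delta / 100))


def likes(r, u, i, all_items, delta):
    """True if item i is among user u's top-delta% items by rating."""
    ranked = sorted(all_items, key=lambda item: r[u][item], reverse=True)
    return i in ranked[:top_count(len(ranked), delta)]


def is_m_proportional(r, u, package, all_items, delta, m=1):
    """True if user u likes at least m items of the package."""
    return sum(likes(r, u, i, all_items, delta) for i in package) >= m


def envy_free_for(r, u, i, group, delta):
    """True if r[u][i] is in the top-delta% of the group's ratings for item i."""
    values = sorted((r[v][i] for v in group), reverse=True)
    threshold = values[top_count(len(values), delta) - 1]
    return r[u][i] >= threshold


def is_m_envy_free(r, u, package, group, delta, m=1):
    """True if user u is envy-free for at least m items of the package."""
    return sum(envy_free_for(r, u, i, group, delta) for i in package) >= m
```

A package can then be screened by calling `is_m_proportional` or `is_m_envy_free` for every member of the group and counting how many members are satisfied.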
Also within group recommendation, Sacharidis [107] takes the minimum utility that a group member receives as the notion of fairness. The author further proposes a technique that ranks the items by considering all admissible ways in which a group might reach a decision.
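The tension between overall satisfaction and fairness among group members can be made concrete with a toy example. The sketch below is not the method of [103] or [107]: it assumes that social welfare is the average member utility, that fairness is the minimum member utility, and that the two are combined by a simple weighted sum with a hypothetical weight `lam`.

```python
def social_welfare(utilities):
    """Average utility of the group members for one item."""
    return sum(utilities) / len(utilities)


def min_utility(utilities):
    """Utility of the worst-off group member for one item."""
    return min(utilities)


def combined_score(utilities, lam=0.5):
    """Weighted sum of welfare and worst-off utility (an assumed scalarization)."""
    return lam * social_welfare(utilities) + (1 - lam) * min_utility(utilities)


def best_item(member_utilities, lam=0.5):
    """Pick the item with the highest combined score.

    member_utilities: dict mapping item -> list of member utilities.
    """
    return max(member_utilities, key=lambda item: combined_score(member_utilities[item], lam))
```

With `lam=1.0` the choice is driven by average satisfaction alone, while smaller values increasingly protect the least satisfied member.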

Fairness for Items
Although recent research has focused on fairness and diversity with respect to user preferences as a quality of recommendations, fairness with respect to groups of items is also receiving growing attention. For example, researchers have looked into algorithms that guarantee fairness among item categories when items are recommended to users. Steck [95] studied movie recommendation and suggested that item fairness should ensure that the various (past) areas of interest of a user are reflected in their corresponding proportions when making current recommendations. For a particular set of recommendations to be fair, it must contain items from various groups in a ratio equal to the group ratio present in the subject's input preferences. To ensure such fairness, the author proposes a greedy iterative re-ranking (postprocessing) algorithm that constructs a list balancing the utility of the selected objects against the list's deviation from the input preferences.
In a similar vein, Tsintzou, Pitoura, and Tsaparas [96] presented another re-ranking method that achieves fairness by recompiling a set of objects such that the ratio of objects from various groups (output bias) is the same as the ratio present in the subject's input preferences (input bias). Such a method avoids amplifying existing biases in the input by iteratively swapping a low-utility nonprotected object with a high-utility protected object.
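A greedy re-ranking of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of [95]: deviation from the input preferences is measured with a smoothed Kullback-Leibler divergence between category shares, the trade-off weight `lam` and smoothing constant `alpha` are hypothetical parameters, and each item belongs to exactly one category.

```python
import math
from collections import Counter


def category_shares(items, category):
    """Fraction of the given items that falls into each category."""
    counts = Counter(category[i] for i in items)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def kl_deviation(target, actual, alpha=0.01):
    """KL(target || actual), with actual smoothed towards target to avoid zeros."""
    deviation = 0.0
    for c, p in target.items():
        q = (1 - alpha) * actual.get(c, 0.0) + alpha * p
        deviation += p * math.log(p / q)
    return deviation


def greedy_rerank(candidates, utility, category, history, k, lam=0.5):
    """Build a list of k items, one at a time, trading utility against deviation."""
    target = category_shares(history, category)
    chosen, remaining = [], list(candidates)
    while remaining and len(chosen) < k:
        def objective(item):
            trial = chosen + [item]
            total_utility = sum(utility[i] for i in trial)
            deviation = kl_deviation(target, category_shares(trial, category))
            return (1 - lam) * total_utility - lam * deviation
        best = max(remaining, key=objective)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With `lam=0` the procedure simply returns the top-k items by utility; as `lam` grows, the list is pulled towards the category proportions observed in the user's history.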

Multiple Stakeholder Fairness
A unique characteristic of recommender systems is that they facilitate mapping or transactions between parties, such as producers and consumers, a perspective now popularly known as multi-stakeholder recommendation or two-sided markets. Such platforms benefit from integrating the preferences of multiple parties into recommendation generation and evaluation. They are now common in online marketplaces across a variety of industries such as music (Spotify, SoundCloud, and Pandora), recruitment (LinkedIn), content and entertainment (Dailymotion and YouTube), transportation and housing (Airbnb and Uber), etc. What these platforms have in common is that they provide a place where providers and users congregate and transact. While traditional recommender systems focused specifically on satisfying the consumer by providing a set of relevant content, these multi-sided recommender systems face the additional problem of optimizing preferences for providers as well as for the platform. Fairness requires multiple parties to gain or lose equally with respect to the recommendations made. Such a system is known as a multi-stakeholder recommender system and is gaining a lot of recent research attention [108].
A recent paper by Burke [91] provides a useful starting point for research in multi-stakeholder fairness. Burke's framework divides the stakeholders of a given recommender system into three categories (consumers, providers, and platforms) and introduces measures that take such multisided fairness into consideration. In a similar vein, Abdollahpouri et al. [97] describe the origins of multistakeholder recommendation and the landscape of system designs, providing illustrative examples of current research. This line of research distinguishes itself from fairness considerations in earlier works, where recommender systems are typically evaluated on their ability to provide items that satisfy the needs and interests of the end user. In the same line of research, Mehrotra et al. [98] propose a conceptual computational framework that applies counterfactual estimation techniques to understand and evaluate different recommendation policies with respect to the trade-off between relevance and fairness in the absence of A/B tests, a popular method of comparing two versions of the same method against each other to determine which one performs better.

Conclusions
Algorithms are taking increasingly prominent decision-making roles in various applications in societal, organizational, and individual lives. Algorithmic decision-making has proliferated everywhere, from legal to medical settings and from social media to employee recruitment in firms. As algorithmic decisions find their way into major areas of societal impact, it becomes imperative to ensure that they guarantee some level of fairness and trust, all the more so when individuals and groups that represent minorities or protected classes in terms of gender, race, etc., are exposed to the detrimental consequences of algorithmic decisions. Motivated by the growing attention and interest of the public and academia in fairness in algorithmic decision-making, this article endeavored to collect, survey, and synthesize emerging and existing research aimed at introducing fairness in algorithmic decision-making. In this work, we provide a useful and simplified taxonomy of the current state of research in algorithmic fairness with a particular focus on decision-making as an application. We believe that such a taxonomy and framework for analyzing algorithmic fairness research should be beneficial for future research.

Challenges and Future Research Directions
Our review also identified various challenges with respect to existing research on fair algorithmic decisions. First, our review identified multiple definitions of what constitutes a fair decision-making algorithm and diverse approaches to ensuring fairness in algorithmic decisions. This is particularly true because fairness, being a social construct, is measured through various notions that often correspond to differing lenses in social sciences, justice, economics, and moral philosophy. On the one hand, such diverse (and often uncorrelated) definitions and methods provide a variety of tools to address different manifestations of bias and discrimination embedded in data. On the other hand, the existence of different definitions has led the research community onto diverging paths, resulting in a fragmented domain of science.
For instance, consider two salient measures of fairness: (a) algorithmic fairness that requires the score an algorithm produces to be equally accurate for all members versus (b) algorithmic fairness that requires the algorithm to produce the same percentage of prediction errors for each group under consideration. Even though these measures share normative common ground, there is so far no algorithmic solution that achieves parity in both dimensions. Moreover, there is no consensus in the literature on the best definition of fairness under a given circumstance. Theoretical and empirical evidence showing that different definitions of fairness cannot be satisfied at once makes the endeavor even more difficult for policy-makers. Evaluating each definition and method to decide which to adopt for a given task is therefore daunting, and it is important for the algorithmic fairness community to move towards a converging path. For instance, the unified framework by Speicher et al. [109] is an important and encouraging first step in this direction. We hope that this article provides a broad overview that helps such an effort succeed. Moreover, our view is that such a unified framework should cross domains of algorithms and not remain limited to machine learning.
Second, our review found that a large majority of the work reviewed above centers on the development of statistical definitions of fairness and methods to expose and remove the corresponding biases. Research efforts need to be directed at bridging the gap between mathematical and algorithmic research in academia and its application in practice. See, for example, the work of Veale, Van Kleek, and Binns [110]. In order to make fair algorithms accessible to practitioners, easy-to-use and off-the-shelf tools need to be developed. We have already seen encouraging first steps in this direction from the Human-Computer Interaction (HCI) community (see, e.g., the works by the authors of [111,112]). Future work should therefore aim at linking the definitions of fairness studied in research to definitions of fairness based on users' perception. For example, in their recent work, Srivastava, Heidari, and Krause [46] found that the simplest mathematical definition of fairness (i.e., demographic parity) most closely matches people's idea of fairness in practice. This association remains true even when the participants were explicitly informed about the existence of other, more complicated notions of fairness [46]. In the same vein, Holstein et al. [113] conducted a systematic investigation of commercial product teams' challenges and needs for support in developing fairer ML systems. The study identified various areas of alignment and disconnect between the challenges faced by teams in practice and the solutions proposed in the fair ML research literature. Similar research connecting work in academia and practice should help make the literature on fairness in algorithms more realistic and more easily accessible to practitioners.
Third, empirical evidence and mathematical proofs have by now extensively established the prevalence of inherent trade-offs between the constraints imposed by notions of fairness and the performance accuracy of algorithms [63,114]. This has practical implications, as the designer of the system and the user need to decide on the level of performance accuracy that they are willing to forgo in order to satisfy fairness constraints. The design of algorithms that handle such trade-offs in a systematic way could be beneficial and should be explored further.
Fourth, our review found that fairness in machine learning and recommender systems is excessively focused on supervised learning. Though there has been some progress in unsupervised learning, such as word embedding and clustering [5,115–117], it is limited compared to the supervised setting. Future work should further advance fair decision-making with respect to unsupervised learning algorithms.
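The tension between the two measures can be seen in a small numerical sketch. The confusion-matrix counts below are hypothetical, and reading measure (a) as the positive predictive value and measure (b) as the false positive and false negative rates is an interpretation assumed here for illustration.

```python
def group_rates(tp, fp, fn, tn):
    """Predictive value and error rates from one group's confusion counts."""
    return {
        "ppv": tp / (tp + fp),  # share of positive predictions that are correct
        "fpr": fp / (fp + tn),  # share of true negatives predicted positive
        "fnr": fn / (fn + tp),  # share of true positives predicted negative
    }


# Hypothetical counts: group A has a 50% base rate, group B a 10% base rate.
group_a = group_rates(tp=40, fp=10, fn=10, tn=40)
group_b = group_rates(tp=8, fp=2, fn=2, tn=88)
```

In this toy setting both groups have the same positive predictive value and the same false negative rate, yet their false positive rates differ because their base rates differ, which illustrates why parity on one measure does not imply parity on the other.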

Limitations
Note that, due to lack of space and a chosen design to keep the discussion focused, in this article we focus only on fairness in algorithmic decision-making in three main domains, namely multi-winner voting, machine learning, and recommender systems. It is important to note that, by design, we have not given enough attention to a large and perhaps equally important body of work on peripheral topics such as fairness in natural language understanding, resource allocation, representation learning, causal learning, etc. We leave this open for future research to survey.
are not yet in w. Then we add c to w and set x_v^(j+1) := s_c^(j+1)

• Single nontransferable vote (SNTV). Every
with preference ≻_v and a nonempty committee w ⊆ C, let top_w(v) be the top-ranked candidate of v among w, i.e., top_w(v) is the candidate c ∈ w such that c ≻_v c′ for all c′ ∈ w \ {c}. The CC score of a committee w ⊆ C from a voter with mapping α is then α(pos_v(top_w(v))). In this section, we consider only the Borda satisfaction function α : N → N, which, for m candidates, holds that α

Table 3. Complexity of computing a committee satisfying a proportional property or testing whether a given committee satisfies a proportional property.

Table 5. Different types of fairness in recommender systems.

Table 6. Different types of fairness in recommender systems.