1. Introduction
Classification under uncertainty by a group of agents is a common task that appears in many fields. In some applications, it is formulated as a labeling process over similar entities (also called “instances”), while in others, it is formulated as a clustering procedure. For example, consider a group of physicians analyzing the medical records of a patient. Each physician analyzes the symptoms of the patient and diagnoses possible diseases, thus classifying or tagging the case with the disease name. The final diagnosis of the group is made based on the collective classifications provided by the group members. Naturally, with prior knowledge of the expertise of each physician, a larger weight can be given to those physicians who are experts in the specific disease. Note, however, that the challenge of reaching a collective decision is further complicated when there is no prior knowledge of the agents’ expertise. This can be the case when ad hoc classifications are obtained from online surveys and questionnaires answered by anonymous users with different yet unknown expertise levels.
One of the most popular methods to reach a collective classification based on a group of agents’ answers is known as the “wisdom of the crowd” (WOC). According to this approach, a decision can be reached based on the aggregated opinion of the agents, including both the experts and nonexperts [
1]. WOC is usually based on a majority (or plurality) vote, meaning that the opinion preferred by most of the agents is considered to be the correct answer. The WOC’s main assumption is that the expertise level of the agents is distributed somewhat symmetrically around the unknown true answer. Therefore, it makes sense to apply a majority vote procedure to obtain better accuracy (i.e., relying on the law of large numbers). Numerically, the majority vote corresponds to the median statistic, and for a relatively large number of agents with non-skewed opinions, it effectively solves the group classification problem. Another setting where a majority vote is effective is when the agents who make classifications have high and homogeneous levels of expertise in the considered field.
Nonetheless, in various settings, the WOC assumption does not hold. For example, in online questionnaires over the internet in specific fields, only a few of the users are real experts in the field, while most of the users are nonexperts, and considering their opinions can seriously reduce the collective classification accuracy.
In this paper, we focus on ad hoc classification by a group of agents with unknown different levels of expertise. The suggested algorithm includes two stages:
 
1. Classification of the agents according to the levels of their expertise;
2. Classification of the entities with respect to the agents’ levels of expertise.
In other words, in the first stage, the algorithm recognizes the experts in the fields of the presented entities, and in the second stage, it classifies the entities, preferring the opinions of these experts (for example, using a weighting scheme or an expectation-maximization scheme).
In the classification of the agents, we assume that the agents with the same fields of expertise have relatively close or even the same opinions in their field of expertise, while the nonexperts’ opinions (if they are not biased) are more scattered over other possible classifications. Accordingly, if the agents propose similar classes for the same entities, then these agents are considered to be experts in these classes. Consequently, a lower level of expertise can be associated with agents who are inconsistent in their opinions and create classes that differ from the classes proposed by the other agents. Certainly, if the levels of the agents’ expertise are known, then this stage can be omitted, and the problem can be reduced to the majority or plurality votes and further optimization procedures.
In the classification of the entities, one can utilize conventional methods such as a combination of the agents’ weighted classifications. We follow the expectation-maximization (EM) approach suggested by Dawid and Skene [
2]. In the expectation (E) step, the algorithm estimates the correct choices according to the agents’ expertise, and in the maximization (M) step, it maximizes the likelihood of the agents’ expertise with respect to the distances from the correct choices. To measure the distances between the agents’ classifications, we use a weighted Hamming distance, which is a normalized metric over the set of partitions that represent the agents’ classifications.
The suggested algorithm was validated and tested using simulated and real-world datasets [
3,
4]. The obtained classifications were compared against several approaches: (i) classifications obtained by a brute-force likelihood-maximization (LM) algorithm (see
Section 4), (ii) majority vote (see
Section 6.2.1), (iii) the recently developed fast Dawid–Skene (FDS) algorithm [
5], and (iv) the widely known GLAD classification algorithm [
6]. It was found that the proposed algorithm considerably outperforms these popular methods, achieving higher accuracy with lower computation time.
The rest of this paper is organized as follows: In
Section 2, we briefly overview the related methods that form a basis for the suggested techniques.
Section 3 includes a formal description of the considered problem. In
Section 4, we outline and clarify the brute-force likelihood-maximization algorithm, which is used for comparison on small datasets.
Section 5 presents the suggested distance-based collaborative classification (DBCC) algorithm.
Section 6 includes the results of the numerical simulations and the comparisons of the proposed DBCC algorithm with other classification techniques.
Section 7 concludes the paper.
2. Related Work
In terms of the classification problems, aggregation of the agents’ opinions is often treated as a proper application of crowdsourcing techniques. Chiu et al. [
7] considered decision-making processes with crowdsourcing and outlined three potential roles of the crowd: intelligence (problem identification), design (alternative solutions), and choice (evaluation of alternatives). Each of these problems can be addressed by different methods and, in particular, by recognition of the crowd’s preferences and choice of the alternatives based on these preferences or, in contrast, by collecting as many alternative opinions as possible and aggregating them into a unified one.
Crowd opinion aggregation is conducted by several methods and depends on the problem set. For example, Ma et al. [
8] developed an algorithm for gradual aggregation based on measuring the distances between opinions at three similarity levels. In the problems considered above, following the “wisdom of the crowd” (WOC) approach, the most commonly used aggregation technique is expectation maximization (EM) [
9], which was also implemented in the wellknown Dawid–Skene (DS) algorithm [
2]. First, this algorithm was applied to analyze the error rate, and then it was extended to different problems that required an aggregation of opinions. In particular, Zhang et al. [
10] proposed a two-stage version of the algorithm and justified its performance using spectral methods. Shah et al. [
11] considered a permutationbased model and introduced a new error metric that compares different estimators in the DS algorithm. Finally, Sinha et al. [
5] suggested a fast-executable version of the DS algorithm (termed the FDS algorithm). At the estimation step (E-step), the dataset is estimated based on the current values of the parameters, while at the maximization step (M-step), the values of the parameters are chosen such that the likelihood of the dataset is maximized. Starting from the initial estimates, the algorithm alternates between the E-step and the M-step until the estimates converge to a unified decision.
In parallel to the development of the DS method, other studies have focused on the data analysis phase of the problem, as well as on the possible extensions of the method to multilabel classifications. Following this direction, Duan et al. [
12] proposed three statistical quality control models based on the DS algorithm. The authors incorporated label dependency to estimate the multiple true labels given the crowdsourced multi-label annotations provided for each instance (entity).
13], who considered the agent’s reliability and the dependency of the classes.
The EM-based methods were also enriched with learning techniques to obtain better classifications. In particular, multiple-Gaussian-process techniques make it possible to learn from the agents and estimate the reliability of individual agents from the data without any prior knowledge. Groot et al. [
14] and Rodrigues and Pereira [
15] introduced different models based on standard Gaussian classifiers and presented a precise handling of multiple agents with different levels of expertise.
Other suggested methods for estimating the agents’ expertise levels were based on probabilistic methods. Using such an approach, Whitehill et al. [
6] proposed a procedure for determining the agents’ expertise (called the GLAD algorithm), while Raykar et al. [
16] suggested a method for estimating the classes’ true labels. Bachrach et al. [
17] proposed a probabilistic graphical model that considered the entities, the agents’ expertise, and the true labels of the entities. Finally, since the considered problem can be viewed within the framework of unsupervised learning, Rodrigues and Pereira [
15] addressed it as a problem of deep learning using crowd opinions in neural networks. Moayedikia et al. [
18] proposed an unsupervised approach based on optimization methods using the “harmony search” over different agent combinations.
Following the work of Chiu et al. [
7], the present paper focuses on the evaluation of classification alternatives, where the crowd preferences are identified and analyzed ad hoc to further support the decision-making process. This study presents a novel heuristic that follows the direction outlined in the DS algorithm and its faster FDS version. It addresses the problem of unsupervised classification for a relatively small number of entities and varying levels of the agents’ expertise. The performance of the suggested heuristic is compared with several known approaches, especially with the popular majority voting method and the FDS algorithm.
3. Problem Setup
Let $X=\{{x}_{1},{x}_{2},\dots ,{x}_{n}\}$ be a set of $n$ entities that represent certain characteristics of some phenomenon, and let $j=1,2,\dots ,l$ be the labels by which the set of entities can be divided into $l$ classes ${C}_{j}\subset X$, such that ${\bigcup}_{j=1}^{l}{C}_{j}=X$ and ${C}_{i}\cap {C}_{j}=\varnothing $ for $i\ne j$. The set of the correct classes ${C}_{j}$ forms an ordered partition $\gamma =\{{C}_{1},{C}_{2},\dots ,{C}_{l}\}$, where the order of the classes is defined by the order of the labels in the sense that if the labels $i$ and $j$ satisfy $i<j$, then class ${C}_{i}$ precedes class ${C}_{j}$ in $\gamma $.
We assume that the classification of the entities is conducted by $m$ agents. Consequently, each $k$th agent, $k=1,2,\dots ,m$, generates a partition ${\alpha}_{k}=\{{C}_{1}^{k},{C}_{2}^{k},\dots ,{C}_{l}^{k}\}$ of the set $X$ by labeling the entities, and this partition represents the agent’s opinion on the considered phenomenon. Similar to the partition $\gamma $, the order in the agents’ partitions ${\alpha}_{k}$, $k=1,2,\dots ,m,$ is defined by the order of the labels $j=1,2,\dots ,l$. It is assumed that the agents are independent in their opinions. However, different agents, $u$ and $v$, $u\ne v$, can generate equivalent classifications ${\alpha}_{u}={\alpha}_{v}$, where ${C}_{j}^{u}={C}_{j}^{v}$, $j=1,2,\dots ,l$. In addition, it is assumed that for each class, ${C}_{j}\subset X$, $j=1,2,\dots ,l$, there exists at least one agent $u$ who is an expert in this class. This assumption implies that if the correct classification $\gamma =\{{C}_{1},{C}_{2},\dots ,{C}_{l}\}$ is available, class ${C}_{j}^{u}$ from the agent’s classification ${\alpha}_{u}$ is equivalent to class ${C}_{j}$ from the correct classification $\gamma $.
The considered problem is formulated as follows: given the set $X=\{{x}_{1},{x}_{2},\dots ,{x}_{n}\}$ of entities and the set $\mathcal{A}=\left\{{\alpha}_{1},{\alpha}_{2},\dots ,{\alpha}_{m}\right\}$ of classifications created by $m$ agents using $l$ labels, find a classification ${\gamma}^{*}=\{{C}_{1}^{*},{C}_{2}^{*},\dots ,{C}_{l}^{*}\}$, $l\le n$, that is as close as possible to the unknown correct classification $\gamma =\{{C}_{1},{C}_{2},\dots ,{C}_{l}\}$.
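To make the correspondence between an agent’s label vector and the ordered partition it induces concrete, the following minimal sketch builds the partition from a label vector (the representation and the function name are ours, not the paper’s):

```python
def labels_to_partition(labels, l):
    """Build the ordered partition (C_1, ..., C_l) from a label vector,
    where labels[i-1] in {1, ..., l} is the tag given to entity x_i."""
    classes = [set() for _ in range(l)]
    for i, lab in enumerate(labels, start=1):
        classes[lab - 1].add(f"x{i}")
    return classes

# An agent's label vector and the corresponding ordered partition alpha_k:
alpha = labels_to_partition([1, 2, 1, 2], 2)  # -> [{"x1", "x3"}, {"x2", "x4"}]
```

Note that empty classes are retained, so the partition stays ordered by label even when an agent uses only some of the $l$ labels.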
To clarify the problem, let us consider a toy example of the dataset presented in
Table 1. The dataset consists of
$n=12$ entities classified by
$m=6$ agents with
$l=4$ classes. The unknown correct classification is denoted by
$\gamma $. In addition, we use
${\gamma}_{M}$ to denote the classification obtained by the majority vote.
The columns in the table are denoted by ${\left({r}_{1,1},{r}_{2,1},\dots ,{r}_{12,1}\right)}^{T}$, …, ${\left({r}_{1,6},{r}_{2,6},\dots ,{r}_{12,6}\right)}^{T}$, where the table entry ${r}_{i,k}$ represents the classification of entity ${x}_{i}$ by agent ${a}_{k}$ to one of the classes ${C}_{1}$, ${C}_{2}$, ${C}_{3}$, and ${C}_{4}$. The actual table entries are the tags of the corresponding classes, namely, $1$, $2$, $3$, and $4$.
In this example, we assume that the first agent,
$k=1,$ is an expert in class
${C}_{1}$, the second agent,
$k=2$, is an expert in class
${C}_{2}$, the third agent,
$k=3$, is an expert in classes
${C}_{1}$ and
${C}_{2}$, the fourth agent,
$k=4,$ is an expert in class
${C}_{3}$, the fifth agent,
$k=5,$ is an expert in class
${C}_{4}$, and finally, the sixth agent,
$k=6$, is an expert in the last two classes
${C}_{3}$ and
${C}_{4}$. The data are summarized in
Table 1.
The results of the comparison of the agents’ classifications
${a}_{k}$ with the correct classification
$\gamma $ appear in the eighth column of
Table 1. It can be seen that each agent
$k=1,2,\dots ,6$ provides the classification
${a}_{k}$ which is rather far from the correct classification,
$\gamma $. Similarly, the classification
${\gamma}_{M}$, in the last column of
Table 1, generated by the majority vote is also far from the correct classification (with an accuracy level of
$50\%$). Thus, majority voting does not work well in this case, since the agents’ classifications are not symmetrically distributed around the correct class. Note, however, that the classification produced by the proposed algorithm (presented in Section 5 and denoted by ${\gamma}^{*}$), which classifies the entities according to the agents’ expertise (unknown a priori), is equivalent to the correct classification $\gamma $, i.e., it results in a $100\%$ accurate classification, where for
 
Expert $k=1$, class ${C}_{1}=\left\{{x}_{2},{x}_{6},{x}_{8}\right\}$;
Expert $k=2$, class ${C}_{2}=\left\{{x}_{1},{x}_{3},{x}_{10}\right\}$;
Expert $k=3$, classes ${C}_{1}=\left\{{x}_{2},{x}_{6},{x}_{8}\right\}$ and ${C}_{2}=\left\{{x}_{1},{x}_{3},{x}_{10}\right\}$;
Expert $k=4$, class ${C}_{3}=\left\{{x}_{4},{x}_{5},{x}_{11}\right\}$;
Expert $k=5$, class ${C}_{4}=\left\{{x}_{7},{x}_{9},{x}_{12}\right\}$;
Expert $k=6$, classes ${C}_{3}=\left\{{x}_{4},{x}_{5},{x}_{11}\right\}$ and ${C}_{4}=\left\{{x}_{7},{x}_{9},{x}_{12}\right\}$.
Thus, by identifying the expert agents, a correct classification can be achieved (see the implementation of the proposed algorithm to
Table 1 at the end of
Section 5).
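As an illustration of this idea, once the experts per class are identified, the final classification ${\gamma}^{*}$ can be assembled by merging their per-class opinions. A minimal sketch using the toy values listed above (the helper name is ours, not the paper’s):

```python
# Per-class expert opinions from the toy example of Table 1: for each label,
# the entities proposed by an agent identified as an expert in that class.
expert_classes = {
    1: {"x2", "x6", "x8"},    # experts k = 1 and k = 3
    2: {"x1", "x3", "x10"},   # experts k = 2 and k = 3
    3: {"x4", "x5", "x11"},   # expert k = 4
    4: {"x7", "x9", "x12"},   # experts k = 5 and k = 6
}

def combine_expert_classes(expert_classes):
    """Merge the per-class expert opinions into one labeling (entity -> label)."""
    labeling = {}
    for label, members in expert_classes.items():
        for entity in members:
            labeling[entity] = label
    return labeling

gamma_star = combine_expert_classes(expert_classes)  # recovers gamma on the toy data
```

Of course, in the actual problem the experts are unknown; identifying them is precisely the first stage of the algorithm.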
Note, again, that in the considered setup both the correct classification and the agents’ levels and fields of expertise are unknown, and this information should be estimated only from the agents’ classifications. As seen later, the recognition of the expert agents is based on the assumption that experts in the same field of expertise provide closer answers than the answers of the nonexpert agents.
4. Local Search by Likelihood Maximization
Inspired by the considered example, where the best classification is obtained by relying on the opinions of the experts, we start with an algorithm that provides an exact solution by maximizing the expected likelihood of the agents’ classifications. This algorithm follows the brute-force approach and, because of its high computational complexity, can be applied only to small datasets.
Let $X=\{{x}_{1},{x}_{2},{\dots ,x}_{n}\}$ be a set of entities and $\mathcal{A}=\left\{{\alpha}_{1},{\alpha}_{2},\dots ,{\alpha}_{m}\right\}$ be the set of agents’ classifications ${\alpha}_{k}=\{{C}_{1}^{k},{C}_{2}^{k},\dots ,{C}_{l}^{k}\}$, $k=1,2,\dots ,m$, while the correct classification $\gamma =\left\{{C}_{1},{C}_{2},\dots ,{C}_{l}\right\}$ is unknown to the agents.
Let
${r}_{ik}\in \left\{1,\dots ,l\right\}$ be the tag with which the $k$th agent labeled entity ${x}_{i}$ (see the columns in
Table 1); in other words, the values
${r}_{ik}$ are the opinions of the agents about that entity, and
${r}_{ik}=j$ denotes that in the classification of an agent
${\alpha}_{k}$, entity
${x}_{i}$ is in class
${C}_{j}^{k}$.
Assume that in the correct classification $\gamma $ an entity ${x}_{i}\in X$ belongs to class ${C}_{j}$. Since $\gamma $ is unknown, we consider the probability ${p}_{j{j}^{\prime}}^{k}=Pr\left\{{r}_{ik}={j}^{\prime}\mid {x}_{i}\in {C}_{j}\right\}$ that the $k$th agent classifies an entity ${x}_{i}$ as a member of the class ${C}_{{j}^{\prime}}$ while the correct class is ${C}_{j}$, and ${P}_{k}={\Vert {p}_{j{j}^{\prime}}^{k}\Vert}_{l\times l}$ denotes the probability matrix that includes the opinions ${p}_{j{j}^{\prime}}^{k}$ of the $k$th agent, $k=1,2,\dots ,m$, on the membership of the entity ${x}_{i}$, $i=1,2,\dots ,n$, in the classes ${C}_{j}$, $j=1,2,\dots ,l$. If agent $k$ is completely reliable, then ${P}_{k}$ is the identity matrix. In general, the agent is considered to be an expert in class ${C}_{j}$ if ${p}_{jj}^{k}$ is close to one, while ${p}_{j{j}^{\prime}}^{k}$ and ${p}_{{j}^{\prime}j}^{k}$ are close to zero for all ${j}^{\prime}\ne j$.
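The conditions on the confusion matrix ${P}_{k}$ can be illustrated with a short sketch (the function names and the tolerance value are our illustrative assumptions, not part of the paper’s method):

```python
def identity_confusion(l):
    """P_k of a completely reliable agent: p_jj = 1 and p_jj' = 0 for j != j'."""
    return [[1.0 if j == j2 else 0.0 for j2 in range(l)] for j in range(l)]

def is_expert(P, j, tol=0.1):
    """Check the expertise condition for class j: p_jj near one while the
    off-diagonal entries p_jj' and p_j'j are near zero (tolerance is ours)."""
    l = len(P)
    return (P[j][j] >= 1 - tol and
            all(P[j][j2] <= tol and P[j2][j] <= tol
                for j2 in range(l) if j2 != j))
```

For a completely reliable agent, `is_expert` holds for every class; for an agent whose errors leak into class $j$ from other rows, it fails even if ${p}_{jj}^{k}$ itself is high.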
Finally, we denote by ${p}_{C}=Pr\left\{C\ne \varnothing \right\}$ the probability that class $C\subset X$ includes at least one entity. If $\stackrel{~}{C}\left(i\right)$ is the estimated class for entity ${x}_{i}$, then ${p}_{\stackrel{~}{C}\left(i\right)}$ is the probability that entity ${x}_{i}$ is classified to class $\stackrel{~}{C}\left(i\right)$. Additionally, we denote by ${\stackrel{~}{c}}_{i}\le l$ the label associated with class $\stackrel{~}{C}\left(i\right)$. Similarly, $C\left(i\right)$ denotes the correct class of the $i$th entity ${x}_{i}$, and ${p}_{C\left(i\right)}$ is the probability that entity ${x}_{i}$ is correctly included in class $C\left(i\right)$.
Using these terms, the classification problem can be formulated as a problem of finding the classes
$\stackrel{~}{C}\left(i\right)$,
$i=1,2,\dots ,n$, the matrices
${P}_{k}$,
$k=1,2,\dots ,m$, and the probabilities
${p}_{{\stackrel{~}{c}}_{i},{r}_{ik}}^{k}$ that maximize the likelihood function. In other words, it is required to maximize the value of the likelihood function
$$\mathcal{L}={\prod}_{i=1}^{n}{p}_{\stackrel{~}{C}\left(i\right)}{\prod}_{k=1}^{m}{p}_{{\stackrel{~}{c}}_{i},{r}_{ik}}^{k}\hspace{2em}(1)$$
with respect to its arguments and subject to the relevant conditions ${\sum}_{{j}^{\prime}=1}^{l}{p}_{j{j}^{\prime}}^{k}=1$ for all $j=1,2,\dots ,l$ and $k=1,2,\dots ,m$. An approximated solution of this problem can be defined as follows:
$${\stackrel{~}{c}}_{i}=\underset{j=1,2,\dots ,l}{\mathrm{argmax}}{\sum}_{k=1}^{m}\mathbb{I}\left({r}_{ik}=j\right),\hspace{1em}i=1,2,\dots ,n,$$
where $\mathbb{I}$ is an indicator function, that is, $\mathbb{I}\left(a=b\right)=1$ if $a=b$ and $\mathbb{I}\left(a=b\right)=0$ otherwise. The approximated solution can be obtained, for example, by majority vote (see
Section 6.2.1), which can also be used as an initial solution in the considered optimization algorithm.
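The plurality-vote approximation described above can be sketched as follows (a minimal illustration; tie-breaking by the smallest label is our assumption, not fixed by the text):

```python
from collections import Counter

def majority_vote(R):
    """Plurality-vote approximation: R[i][k] is the label r_ik given to
    entity x_i by agent a_k; returns one label per entity. Ties are broken
    by the smallest label (an assumption, not fixed by the text)."""
    labels = []
    for row in R:
        counts = Counter(row)
        # argmax over labels j of sum_k I(r_ik = j)
        best = max(counts.items(), key=lambda t: (t[1], -t[0]))[0]
        labels.append(best)
    return labels
```

This per-entity vote serves as the initial solution that the local search then refines.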
The proposed algorithm, which aims to solve optimization problem (1) by local search, is outlined as follows (Algorithm 1).
Algorithm 1: Likelihood Maximization 
Given the set $X$ of $n$ items ${x}_{i}$, $i=1,2,\dots ,n$, and the set of the agents’ classifications ${\alpha}_{k}$, $k=1,2,\dots ,m$, do:
Create the agents’ opinions matrix $r=\Vert {r}_{ik}\Vert $, $i=1,2,\dots ,n$, $k=1,2,\dots ,m$. Start with the solution given by the approximate formulae or by majority vote. Solve optimization problem (1). While no improvements to the current solution (which is the set of classes $\stackrel{~}{C}\left(i\right)$, $i=1,2,\dots ,n$) in its entire neighborhood are found, do: Define the neighbors of the solution as the classifications that can be obtained from the solution by changing the estimated class $\stackrel{~}{C}\left(i\right)$ for a single entity ${x}_{i}$; Calculate the likelihood for the set of neighboring classifications; Exclude the neighbors with a small likelihood; Solve optimization problem (1); End while. Return the obtained solution.

Following the outlined algorithm, an initial solution is refined iteratively until the maximal expected likelihood is reached. Such a method can provide an optimal solution to the problem; however, it requires high computation power and can be implemented only for relatively small problems. The time complexity of Algorithm 1 is $\mathcal{O}\left(\upsilon nm{l}^{3}\right)$, where $n$ is the number of entities, $m$ is the number of agents, $l$ is the number of classes, and $\upsilon $ is the number of iterations until the algorithm converges. Here, $\upsilon $ is the number of repetitions of lines 5–8 in the while loop, where up to $l$ candidate classes are considered for each of up to $n$ entities, and the optimization problem is solved in $m{l}^{2}$ steps. Since the number of classes $l$ is at most equal to the number of items $n$, the complexity of Algorithm 1 in the worst case is $\mathcal{O}\left(\upsilon m{n}^{4}\right)$.
Having said that, the above Algorithm 1 can be used to prove the existence of a solution to the problem under the indicated assumption. Moreover, in the simulations shown below, we use this algorithm for analysis and comparison of the optimal classifications against the classifications generated by the heuristic method that is suggested next.
5. Suggested Algorithm: DistanceBased Collaborative Classification
The suggested algorithm, called the distance-based collaborative classification (DBCC) algorithm, consists of two stages: in the first stage, based on the presented opinions, the agents are tagged as experts and nonexperts for each of the different classes, and in the second stage, the classification of the entities is conducted with respect to the agents’ levels of expertise.
Classification of the agents according to their expertise levels is based on the assumption that agents with similar fields of expertise produce similar classifications of the related entities. On the other hand, the classifications of nonexpert agents are distributed over a relatively larger range of classes. Consequently, the tagging of the agents as experts and nonexperts is conducted by clustering the agents’ classifications ${\alpha}_{k}$, $k=1,2,\dots ,m$, with respect to the different classes.
Let
$sim\left({\alpha}_{u},{\alpha}_{v}\mid C\right)$ be a certain measure of similarity between two classifications
${\alpha}_{u}$ and
${\alpha}_{v}$ with respect to the class
$C\subset X$,
$u,v=1,2,\dots ,m$. Then, over all of the agents’ classifications
${\alpha}_{k}$,
$k=1,2,\dots ,m$, a central classification
$\xi \left(C\right)$ with respect to class
$C$ can be defined as follows:
$$\xi \left(C\right)=\underset{{\alpha}_{u},\hspace{0.1em}u=1,2,\dots ,m}{\mathrm{Argmax}}{\sum}_{v=1}^{m}sim\left({\alpha}_{u},{\alpha}_{v}\mid C\right).\hspace{2em}(2)$$
The assumption about the closeness of the classifications produced by experts in a certain class implies that the values $sim\left({\alpha}_{k},\xi \left(C\right)\mid C\right)$ of the similarities between the agents’ classifications ${\alpha}_{k}$, $k=1,2,\dots ,m$, and the central classification $\xi \left(C\right)$ are distributed according to a mixture of two distributions: the first represents the distribution of the experts in class $C$, and the second represents the distribution of the nonexperts in this class.
The similarity between the classifications can be measured by several methods, for example, by the Rokhlin or Ornstein distances, or by the symmetric version of the Kullback–Leibler divergence (for the use of such metrics, refer, e.g., to [
19]). However, to avoid additional specification of probabilistic measures over the entities, in the suggested algorithm, we use a normalized version of the wellknown Hamming distance. This distance is defined as follows:
Let ${\alpha}_{u}=\{{C}_{1}^{u},{C}_{2}^{u},\dots ,{C}_{l}^{u}\}$ and ${\alpha}_{v}=\{{C}_{1}^{v},{C}_{2}^{v},\dots ,{C}_{l}^{v}\}$ be two classifications of the set $X=\{{x}_{1},{x}_{2},\dots ,{x}_{n}\}$ of entities. Consider the classes ${C}_{j}^{u}\in {\alpha}_{u}$ and ${C}_{j}^{v}\in {\alpha}_{v}$, $j=1,2,\dots ,l$, and let $\mathfrak{n}\left({\alpha}_{u}\mid j\right)=\#{C}_{j}^{u}$ denote the cardinality of the class ${C}_{j}^{u}$, while $\mathfrak{n}\left({\alpha}_{v}\mid j\right)=\#{C}_{j}^{v}$ denotes the cardinality of the class ${C}_{j}^{v}$. The values $\mathfrak{n}\left({\alpha}_{u}\mid j\right)$ and $\mathfrak{n}\left({\alpha}_{v}\mid j\right)$ are the numbers of entities that are included in the $j$th class or, similarly, are tagged with the label $j$ by agents $u$ and $v$, respectively. In other words, $\mathfrak{n}\left({\alpha}_{u}\mid j\right)$ and $\mathfrak{n}\left({\alpha}_{v}\mid j\right)$ represent the independent opinions of agents $u$ and $v$ about the $j$th class.
In addition, let $\mathfrak{n}\left({\alpha}_{u},{\alpha}_{v}\mid j\right)=\#\left(\left({C}_{j}^{u}\cup {C}_{j}^{v}\right)\backslash \left({C}_{j}^{u}\cap {C}_{j}^{v}\right)\right)$ denote the cardinality of the symmetric difference between the classes ${C}_{j}^{u}$ and ${C}_{j}^{v}$. The number $\mathfrak{n}\left({\alpha}_{u},{\alpha}_{v}\mid j\right)$ represents the disagreement of the agents about the $j$th class. The normalized Hamming distance between the classifications ${\alpha}_{u}$ and ${\alpha}_{v}$ is defined as the following ratio:
$${d}_{norHam}\left({\alpha}_{u},{\alpha}_{v}\mid j\right)=\frac{\mathfrak{n}\left({\alpha}_{u},{\alpha}_{v}\mid j\right)}{\mathfrak{n}\left({\alpha}_{u}\mid j\right)+\mathfrak{n}\left({\alpha}_{v}\mid j\right)}.\hspace{2em}(3)$$
For each $j$, the defined distance ${d}_{norHam}\left({\alpha}_{u},{\alpha}_{v}\mid j\right)$ is a metric such that $0\le {d}_{norHam}\left({\alpha}_{u},{\alpha}_{v}\mid j\right)\le 1$. It represents the disagreement between the agents with respect to a given class and, consequently, enables the definition of experts and nonexperts per class.
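A minimal sketch of this distance, using dictionaries that map entities to labels (the representation and function name are ours); the usage checks reproduce the worked values discussed at the end of this section, namely distances of $1$ and $0.75$:

```python
def d_norham(alpha_u, alpha_v, j):
    """Normalized Hamming distance between two classifications w.r.t. class j.
    alpha_u, alpha_v: dicts mapping entity -> label (our representation).
    Ratio of the symmetric difference of the two j-classes to the sum of
    their cardinalities; returning 0 for two empty classes is our convention."""
    Cu = {x for x, lab in alpha_u.items() if lab == j}
    Cv = {x for x, lab in alpha_v.items() if lab == j}
    total = len(Cu) + len(Cv)
    return len(Cu ^ Cv) / total if total else 0.0
```

Since only the memberships of class $j$ enter the computation, the distance can be evaluated independently per class, which is exactly how it is used in the agent-clustering stage.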
Additionally, using this distance, the set of classifications
${\alpha}_{k}$ and, consequently, the set of
$m$ agents can be considered as a metric space that allows for the application of conventional clustering algorithms. In the suggested DBCC algorithm, we apply Gaussian mixture clustering and the expectation-maximization algorithm [
20].
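Given only the one-dimensional array of agent-to-center distances, the two-group split can be sketched with a simple two-means procedure, used here as a lightweight stand-in for the Gaussian-mixture/EM clustering that DBCC actually applies (the function name and initialization are our assumptions):

```python
def split_experts(distances, iters=50):
    """Split agents into experts/nonexperts from their distances to the
    central classification. A one-dimensional two-means sketch, used as a
    lightweight stand-in for the Gaussian mixture/EM clustering of DBCC."""
    c = [min(distances), max(distances)]   # initial centers: extreme distances
    for _ in range(iters):
        groups = ([], [])
        for d in distances:
            groups[abs(d - c[0]) > abs(d - c[1])].append(d)
        new_c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
        if new_c == c:
            break
        c = new_c
    # Experts are the agents closer to the smaller-distance center.
    return [abs(d - c[0]) <= abs(d - c[1]) for d in distances]
```

The Gaussian-mixture version additionally yields membership probabilities, but for the 0/1 weights used below, a hard split of this kind already conveys the idea.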
As a result of the clustering, the agents are tagged according to their level of expertise with respect to each class $C\subset X$. These levels are represented by the weights ${w}_{k}\left(C\right)$ associated with the agents and are used at the classification stage of the entities.
Classification of the entities ${x}_{i}\in X$, $i=1,2,\dots ,n$, based on the agents’ opinions ${\alpha}_{k}$, $k=1,2,\dots ,m$, with respect to their expertise levels ${w}_{k}\left(C\right)$, $C\subset X$, is conducted using conventional voting techniques; in the suggested DBCC algorithm, we use the relative majority vote.
In general, the suggested algorithm acts as follows: In the first stage, for each class, the differences (in terms of the normalized Hamming distance) between the agents’ classifications are defined. Using these distances, the agents are divided into two groups: experts and nonexperts. At this stage, an assumption is made that the experts in their area of expertise provide similar classifications of the related instances, unlike the nonexperts, whose classifications are more diverse. Accordingly, the opinions of the experts gain higher weights with respect to the nonexperts when all of the opinions are aggregated.
In the second stage, the entities are classified by majority vote with respect to the weighted opinions of the agents. Then, the obtained solution is corrected following the stages of the EM algorithms; the resulting classification of the entities is considered as an estimated classification obtained at the Mstep and is used at the Estep for the definition of more precise levels of the agents’ expertise.
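The weighted vote of the second stage can be sketched as follows (a minimal illustration with per-class 0/1 expertise weights; the names and the tie-breaking rule are our assumptions):

```python
def weighted_vote(R, weights):
    """R[i][k]: label given to entity x_i by agent a_k; weights[j][k]: weight
    of agent k for label j (1 for experts in C_j, 0 otherwise, as in DBCC).
    Returns per-entity labels by weighted plurality; tie-breaking by the
    smallest label is our assumption."""
    labels = []
    for row in R:
        scores = {}
        for k, lab in enumerate(row):
            scores[lab] = scores.get(lab, 0.0) + weights[lab][k]
        labels.append(max(scores.items(), key=lambda t: (t[1], -t[0]))[0])
    return labels
```

With all weights equal to one, this reduces to the plain plurality vote, which is why the majority vote can serve as the initial estimate before any expertise information is available.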
The DBCC algorithm is outlined as follows (Algorithm 2):
Algorithm 2: Distance-Based Collaborative Classification (DBCC) Algorithm
Given the set $X$ of $n$ items ${x}_{i}$, $i=1,2,\dots ,n$, the enumeration $j=1,2,\dots ,l$ of possible classes, and the set of the agents’ classifications ${\alpha}_{k}$, $k=1,2,\dots ,m$, do:
Initialization
1. Initialize distance matrices ${\Vert d\Vert}_{m\times m}$, distance arrays ${\Vert a\Vert}_{m}$, and weight arrays ${\Vert w\Vert}_{m}$.
2. Initialize expertise map ${\Vert E\Vert}_{n\times m}$.
Classification of the agents and definition of the expertise levels
3. For each class ${C}_{j}$, $j=1,\dots ,l$, do:
4.   For each agent $u=1,2,\dots ,m$, do:
5.     For each agent $v=1,2,\dots ,m$, do:
6.       Set the distance ${d}_{uv}={d}_{norHam}\left({\alpha}_{u},{\alpha}_{v}\mid j\right)$ between the classifications ${\alpha}_{u}$ and ${\alpha}_{v}$ with respect to class ${C}_{j}$ into the distance matrix ${\Vert d\Vert}_{m\times m}$.
7.     End
8.   End
9.   Find the central classification ${\xi}_{j}=\underset{u=1,2,\dots ,m}{\mathrm{Argmin}}{\sum}_{v=1}^{m}{d}_{uv}$.
10.   For each agent $k=1,2,\dots ,m$, do:
11.     Set the distance ${d}_{k}={d}_{norHam}\left({\xi}_{j},{\alpha}_{k}\mid j\right)$ from the agent’s classification ${\alpha}_{k}$ to the central classification ${\xi}_{j}$ into the distance array ${\Vert a\Vert}_{m}$.
12.   End
13.   Cluster the agents into two groups (experts and nonexperts) with respect to the distance array ${\Vert a\Vert}_{m}$.
14.   For each agent $k=1,2,\dots ,m$, do:
15.     Set the weight ${w}_{k}$ into the weights array: the agents in the cluster closest to the central classification obtain the weight 1, and the more distant agents obtain the weight 0.
16.   End
17.   For each agent $k$ in the group of expert agents, do:
18.     Add class ${C}_{j}$ to the expertise map ${E}_{jk}$ of the $k$th agent with the weight ${w}_{k}$.
19.   End
20. End
Classification of the entities with respect to the agents’ expertise
21. For each entity ${x}_{i}$, $i=1,\dots ,n$, do:
22.   For each class ${C}_{j}$, $j=1,2,\dots ,l$, do:
23.     Initialize the score of ${C}_{j}$ by zero.
24.   End
25.   For each agent $k=1,2,\dots ,m$, do:
26.     If class ${C}_{j}$ is in the agent’s expertise map ${E}_{jk}$, then add the agent’s weight to the score of this class.
27.   End
28.   Set the label for entity ${x}_{i}$ as the index $j$ of the class with the highest score.
29. End
Correction of the classification by repeating the expectation-maximization steps
30. Repeat until convergence (expectation-maximization):
31.   M-step: from the estimated correct classification, obtain the normalized Hamming distances for all agents.
32.   E-step: estimate the correct classification by running steps 4–17 over the obtained distances.
33. End

The suggested DBCC algorithm is a heuristic procedure utilizing EM techniques. At the M-step, it maximizes the likelihood of the agents’ expertise by using the distances from the estimated correct classification. The latter is obtained at the E-step with respect to the agents’ expertise at the previous iteration. The process converges in the sense that the difference between the classifications obtained in two sequential steps tends to zero. In practice, the process can be terminated when the difference between two sequential classifications falls below a certain predefined value of order $n\times {10}^{-3}$.
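The stopping rule can be sketched as a comparison of two successive classifications (a minimal illustration; expressing the difference as the fraction of changed labels is our choice of measure):

```python
def label_change_fraction(prev, curr):
    """Fraction of entities whose label changed between two successive
    classifications; the EM loop stops once this falls below a small
    threshold (expressing the difference as a fraction is our choice)."""
    changed = sum(1 for a, b in zip(prev, curr) if a != b)
    return changed / len(prev)
```

A fixed iteration cap alongside the threshold is a common safeguard in case the alternation cycles between near-identical solutions.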
The time complexity of the suggested Algorithm 2 is $\mathcal{O}\left(\upsilon \left(l{m}^{2}+(l+m)n\right)\right)$, where $n$ is the number of entities, $m$ is the number of agents, $l$ is the number of classes, and $\upsilon $ is the number of iterations up to the convergence of the EM part of the algorithm. Here, $\upsilon $ defines the number of iterations of the EM loop (see Line 32); in the term $l{m}^{2}$, $l$ is the number of iterations of the for loop (Lines 3–20), and ${m}^{2}$ is both the number of iterations of the nested loops (Lines 4–8) and the number of steps in the operation in Line 9 (the other loops require $m$ steps); the term $(l+m)n$ represents the number of iterations of the loop in Lines 21–29 together with its two internal for loops (Lines 22–24 and 25–27). Since the number of classes $l$ is at most equal to the number of items $n$, the complexity of the algorithm in the worst case is $\mathcal{O}\left(\upsilon \left({m}^{2}n+{n}^{2}\right)\right)$.
To clarify the main advantage of the algorithm, which first identifies experts and nonexperts and then uses them for classification, let us refer back to the dataset presented in
Table 1.
Consider the classifications ${\alpha}_{1}$ and ${\alpha}_{2}$ provided by the first and second agents with respect to class ${C}_{1}$. Following Equation (3), the distance between the classifications ${\alpha}_{1}$ and ${\alpha}_{2}$ is the ratio between the number of the agents’ disagreements about the membership of the entities in a certain class and the total number of their independent classifications with respect to that class. For the first and second agents with respect to ${C}_{1}$, one obtains $\mathfrak{n}\left({\alpha}_{1},{\alpha}_{2}|1\right)=3$, which represents the disagreement regarding the three entities ${x}_{2}$, ${x}_{6}$, and ${x}_{8}$ that were classified by the first agent into class ${C}_{1}$ ($\mathfrak{n}\left({\alpha}_{1}|1\right)=3$) but were classified into other classes by the second agent ($\mathfrak{n}\left({\alpha}_{2}|1\right)=0$). Thus, the distance ${d}_{NorHam}\left({\alpha}_{1},{\alpha}_{2}|1\right)=3/\left(3+0\right)=1$ is the maximal possible distance between these classifications.
Similarly, the distance between the classifications ${\alpha}_{3}$ and ${\alpha}_{4}$ with respect to class ${C}_{1}$ is obtained as follows. The number of disagreements between the agents is $\mathfrak{n}\left({\alpha}_{3},{\alpha}_{4}|1\right)=6$ (entities ${x}_{2}$, ${x}_{3}$, ${x}_{7}$, ${x}_{8}$, ${x}_{10}$, and ${x}_{12}$), while regarding entity ${x}_{6}$, the agents agree with one another. The numbers of independent classifications of the third and fourth agents with respect to class ${C}_{1}$ are $\mathfrak{n}\left({\alpha}_{3}|1\right)=3$ (entities ${x}_{2}$, ${x}_{6}$, and ${x}_{8}$) and $\mathfrak{n}\left({\alpha}_{4}|1\right)=5$ (entities ${x}_{3}$, ${x}_{6}$, ${x}_{7}$, ${x}_{10}$, and ${x}_{12}$), respectively. Thus, ${d}_{NorHam}\left({\alpha}_{3},{\alpha}_{4}|1\right)=6/\left(3+5\right)=0.75$.
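Under the set-based reading of Equation (3) implied by these numbers (a disagreement is an entity assigned to the class by exactly one of the two agents), both distances can be verified with a short sketch; the function name is illustrative:

```python
def norm_hamming(a, b):
    """Normalized Hamming distance between two agents' assignments to one class.

    a, b: sets of entities each agent assigned to the class.
    Distance = |symmetric difference| / (|a| + |b|); defined as 0 when both empty.
    """
    if not a and not b:
        return 0.0
    return len(a ^ b) / (len(a) + len(b))

# Assignments to class C1 as listed in the worked example (Table 1):
a1 = {"x2", "x6", "x8"}                 # first agent
a2 = set()                              # second agent assigns nothing to C1
a3 = {"x2", "x6", "x8"}                 # third agent
a4 = {"x3", "x6", "x7", "x10", "x12"}   # fourth agent

print(norm_hamming(a1, a2))  # 1.0  (maximal possible distance)
print(norm_hamming(a3, a4))  # 0.75
```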
Calculation of the distances among the agents with respect to all four classes
${C}_{j}$,
$j=1,\dots ,4,$ results in the following tables (zero distances are shown in bold font):
$\mathbf{Class}\text{}{\mathit{C}}_{\mathbf{1}}$  ${\mathit{\alpha}}_{\mathbf{1}}$  ${\mathit{\alpha}}_{\mathbf{2}}$  ${\mathit{\alpha}}_{\mathbf{3}}$  ${\mathit{\alpha}}_{\mathbf{4}}$  ${\mathit{\alpha}}_{\mathbf{5}}$  ${\mathit{\alpha}}_{\mathbf{6}}$ 
${\alpha}_{\mathbf{1}}$  $$  $1.0$  $\mathbf{0}$  $0.75$  $1.0$  $0.71$ 
${\alpha}_{2}$  $1.0$  $$  $1.0$  $1.0$  $1.0$  $1.0$ 
${\alpha}_{3}$  $\mathbf{0}$  $1.0$  $$  $0.75$  $1.0$  $0.71$ 
${\alpha}_{4}$  $0.75$  $1.0$  $0.75$  $$  $0.5$  $0.56$ 
${\alpha}_{5}$  $1.0$  $1.0$  $1.0$  $0.5$  $$  $0.43$ 
${\alpha}_{6}$  $0.71$  $1.0$  $0.71$  $0.56$  $0.43$  $$ 
$\mathbf{Class}\text{}{\mathit{C}}_{\mathbf{2}}$  ${\mathit{\alpha}}_{\mathbf{1}}$  ${\mathit{\alpha}}_{\mathbf{2}}$  ${\mathit{\alpha}}_{\mathbf{3}}$  ${\mathit{\alpha}}_{\mathbf{4}}$  ${\mathit{\alpha}}_{\mathbf{5}}$  ${\mathit{\alpha}}_{\mathbf{6}}$ 
${\alpha}_{1}$  $$  $0.5$  $0.5$  $1.0$  $1.0$  $1.0$ 
${\alpha}_{2}$  $0.5$  $$  $\mathbf{0}$  $1.0$  $1.0$  $1.0$ 
${\alpha}_{3}$  $0.5$  $\mathbf{0}$  $$  $1.0$  $1.0$  $1.0$ 
${\alpha}_{4}$  $1.0$  $1.0$  $1.0$  $$  $0.67$  $0.5$ 
${\alpha}_{5}$  $1.0$  $1.0$  $1.0$  $0.67$  $$  $0.33$ 
${\alpha}_{6}$  $1.0$  $1.0$  $1.0$  $0.5$  $0.33$  $$ 
$\mathbf{Class}\text{}{\mathit{C}}_{\mathbf{3}}$  ${\mathit{\alpha}}_{\mathbf{1}}$  ${\mathit{\alpha}}_{\mathbf{2}}$  ${\mathit{\alpha}}_{\mathbf{3}}$  ${\mathit{\alpha}}_{\mathbf{4}}$  ${\mathit{\alpha}}_{\mathbf{5}}$  ${\mathit{\alpha}}_{\mathbf{6}}$ 
${\alpha}_{1}$  $$  $0.64$  $0.67$  $1.0$  $1.0$  $1.0$ 
${\alpha}_{2}$  $0.64$  $$  $0.56$  $0.8$  $0.78$  $0.8$ 
${\alpha}_{3}$  $0.67$  $0.56$  $$  $1.0$  $1.0$  $1.0$ 
${\alpha}_{4}$  $1.0$  $0.8$  $1.0$  $$  $1.0$  $\mathbf{0}$ 
${\alpha}_{5}$  $1.0$  $0.78$  $1.0$  $1.0$  $$  $1.0$ 
${\alpha}_{6}$  $1.0$  $0.8$  $1.0$  $\mathbf{0}$  $1.0$  $$ 
$\mathbf{Class}\text{}{\mathit{C}}_{\mathbf{4}}$  ${\mathit{\alpha}}_{\mathbf{1}}$  ${\mathit{\alpha}}_{\mathbf{2}}$  ${\mathit{\alpha}}_{\mathbf{3}}$  ${\mathit{\alpha}}_{\mathbf{4}}$  ${\mathit{\alpha}}_{\mathbf{5}}$  ${\mathit{\alpha}}_{\mathbf{6}}$ 
${\alpha}_{1}$  $$  $0.33$  $0.25$  $1.0$  $0.71$  $0.71$ 
${\alpha}_{2}$  $0.33$  $$  $0.33$  $1.0$  $1.0$  $1.0$ 
${\alpha}_{3}$  $0.25$  $0.33$  $$  $1.0$  $0.71$  $0.71$ 
${\alpha}_{4}$  $1.0$  $1.0$  $1.0$  $$  $0.5$  $0.56$ 
${\alpha}_{5}$  $0.71$  $1.0$  $0.71$  $1.0$  $$  $\mathbf{0}$ 
${\alpha}_{6}$  $0.71$  $1.0$  $0.71$  $1.0$  $\mathbf{0}$  $$ 
Note that, for each class, the minimal distance between the classifications is zero, and the experts can be defined with respect to this distance. For class ${C}_{1}$, a zero distance ${d}_{NorHam}\left({\alpha}_{1},{\alpha}_{3}|1\right)=0$ is obtained between classifications ${\alpha}_{1}$ and ${\alpha}_{3}$, i.e., the first and third agents are considered to be experts with respect to class ${C}_{1}$. Similarly, for class ${C}_{2}$, a zero distance is obtained, ${d}_{NorHam}\left({\alpha}_{2},{\alpha}_{3}|2\right)=0$, and, thus, the second and third agents are considered to be experts with respect to class ${C}_{2}$; ${d}_{NorHam}\left({\alpha}_{4},{\alpha}_{6}|3\right)=0$, so the fourth and sixth agents are experts in class ${C}_{3}$; and ${d}_{NorHam}\left({\alpha}_{5},{\alpha}_{6}|4\right)=0$, so the fifth and sixth agents are considered to be experts with respect to class ${C}_{4}$.
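The zero-distance rule can be sketched for the ${C}_{1}$ table above; the binary weighting and the helper name are illustrative of this example only (on real data, exact zero distances are rare, which is why the EM correction is needed):

```python
import numpy as np

def expert_weights(D):
    """Binary expert weights for one class: the agents in the pair attaining
    the minimal off-diagonal distance get weight 1, all others weight 0."""
    off = D.astype(float).copy()
    np.fill_diagonal(off, np.inf)               # ignore self-distances
    i, j = np.unravel_index(np.argmin(off), off.shape)
    w = np.zeros(D.shape[0])
    w[[i, j]] = 1.0
    return w

# Distance table for class C1 (agents alpha_1..alpha_6):
D1 = np.array([
    [0.00, 1.00, 0.00, 0.75, 1.00, 0.71],
    [1.00, 0.00, 1.00, 1.00, 1.00, 1.00],
    [0.00, 1.00, 0.00, 0.75, 1.00, 0.71],
    [0.75, 1.00, 0.75, 0.00, 0.50, 0.56],
    [1.00, 1.00, 1.00, 0.50, 0.00, 0.43],
    [0.71, 1.00, 0.71, 0.56, 0.43, 0.00],
])
print(expert_weights(D1).tolist())  # [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

The nonzero weights select the first and third agents, matching the expert identification for ${C}_{1}$ above.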
Following these distance calculations, the “experts” in each class obtain a weight of
$1$, while the other nonexpert agents obtain zero weights. Thus, in this weighting scheme, only expert classifications are considered. Finally, in the considered dataset (see
Table 1), according to the opinions of the experts (the first and third agents), the first class is
${C}_{1}^{*}=\left\{{x}_{2},{x}_{6},{x}_{8}\right\}$; according to the opinions of the experts (the second and third agents), the second class is
${C}_{2}^{*}=\left\{{x}_{1},{x}_{3},{x}_{10}\right\}$; according to the opinions of the experts (the fourth and sixth agents), the third class is
${C}_{3}^{*}=\left\{{x}_{4},{x}_{5},{x}_{11}\right\}$; and according to the opinions of the experts (the fifth and sixth agents), the fourth class is
${C}_{4}^{*}=\left\{{x}_{7},{x}_{9},{x}_{12}\right\}$.
The union of these classes, ${C}_{1}^{*}\cup {C}_{2}^{*}\cup {C}_{3}^{*}\cup {C}_{4}^{*}$, then forms the resulting partitioning of the dataset.
Note that this straightforward, illustrative example does not require (and does not demonstrate) the complicated clustering and correction steps of the EM algorithm, which play an important role in real-world datasets, where the division of the agents into experts and nonexperts is not binary.
6. Numerical Simulations and Comparisons
The suggested algorithm was studied using two data settings: simulated data with known characteristics, which enabled an analysis of the effectiveness and robustness of the DBCC algorithm, and real-world data obtained from a dedicated questionnaire.
Classifications obtained by the suggested Algorithm 2 were compared with the results provided by the optimal brute-force likelihood-maximization algorithm, the majority vote, the most accurate heuristic FDS algorithm, and the fastest GLAD algorithm.
The algorithms were implemented in the Python programming language and run on a standard Lenovo ThinkPad T480 PC with an Intel® Core™ i7-8550U processor (8 MB cache, up to 4.00 GHz) and 32 GB of memory (DDR4, 4267 MHz).
6.1. Data
To analyze the proposed method, it was applied to different datasets: (i) simulated data, (ii) real-world data with simulated classes, and, finally, (iii) an entirely real-world questionnaire dataset. In the first case, for a given set of
$n$ entities
${x}_{i}$,
$i=1,2,\dots ,n$, we simulated both the classes
${C}_{j}$,
$j=1,2,\dots ,l$, and the agents’ classifications
${\alpha}_{k}$,
$k=1,2,\dots ,m$; in the second case, we used real-world data with simulated labeling; and in the third case, we created and analyzed an online questionnaire that measures the levels of expertise of users regarding famous paintings and painters (the questionnaire is available via the link Famous painters (google.com); see
Appendix A).
6.1.1. Simulated Data
In the simulated data, we used $m\in \left\{4,10,16,20,24,32\right\}$ agents in the trials, while their classifications ${\alpha}_{k}$, $k=1,2,\dots ,m$, were randomly generated. The probability of obtaining correct classifications for expert agents was specified as ${p}_{e}\in \left[0.6,1.0\right]$, and for nonexpert agents as ${p}_{n}\in \left[0.2,0.6\right]$. The number of entities in the trials was $n\in \left\{50,200,300,500,1000,2000\right\}$, and the number of classes was $l\in \left\{2,3,4,6,8,10,12,16\right\}$.
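The generation of agents with reliabilities ${p}_{e}$ and ${p}_{n}$ can be sketched as below; the uniform-wrong-class error model is an assumption for illustration, since the text does not specify how incorrect labels are distributed:

```python
import random

def simulate_agent(true_labels, num_classes, p_correct, rng):
    """Return one agent's classification: each entity receives the true class
    with probability p_correct, otherwise a uniformly random wrong class
    (assumed error model, for illustration only)."""
    out = []
    for t in true_labels:
        if rng.random() < p_correct:
            out.append(t)
        else:
            out.append(rng.choice([c for c in range(num_classes) if c != t]))
    return out

rng = random.Random(42)
truth = [rng.randrange(4) for _ in range(500)]   # n = 500 entities, l = 4 classes
expert = simulate_agent(truth, 4, 0.9, rng)      # p_e = 0.9
novice = simulate_agent(truth, 4, 0.3, rng)      # p_n = 0.3

accuracy = lambda a: sum(x == t for x, t in zip(a, truth)) / len(truth)
print(accuracy(expert), accuracy(novice))        # accuracies near p_e and p_n
```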
6.1.2. Real-World Data with Simulated Classes
In the case of real-world data with simulated classes, we considered real-world data from different databases; to define multiple agents with different levels of expertise, we used simulated labeling of the data. The agents were simulated by using different classifiers (e.g., random forests), and their expertise over different classes was simulated by scrambling the features in the dataset. In the comparative analysis, we used seven known datasets from Kaggle [
3], as follows: Iris, Abalone Age, Glass Type, Students’ Results, User Activity, Robots Conversation, and Wine Quality.
For example, in the Iris dataset, the agents’ expertise was defined as follows: Agent 1 and Agent 2 are experts in the class “Iris-setosa”, Agent 2 is an expert in the class “Iris-versicolor”, and Agent 4 is an expert in the class “Iris-virginica”. Recall that, according to this definition, the probability that these agents provide a correct classification of the entities of these classes is higher.
In addition, we used the WiFi localization database from the Machine Learning Repository [
21]. The datasets have different numbers of entities
$150<n<4000$ and different numbers of classes
$l\in \left\{3,4,5,6\right\}$; for the different numbers of classes, different numbers of agents
$10<m<20$ were simulated with various levels of expertise.
6.1.3. Real-World Data
To obtain real-world data, we designed and distributed an online questionnaire that contains questions on painters and paintings based on common knowledge. In particular, the questionnaire contains 40 paintings created by eight famous painters. The agents were asked to indicate the painter of each painting. Thus, in terms of classification, the agents were required to classify
$n=40$ entities into
$l=8$ classes. The questionnaire was offered to
$m=90$ volunteers at the university, including both students and professors, without any specific educational background in the arts. Examples of the paintings and the questionnaire that were used are presented in
Appendix A.
6.2. Algorithms for Comparisons
The results obtained by the suggested algorithm were compared with the results obtained by four baseline methods: (i) the widely used majority voting algorithm; (ii) the brute-force maximum-likelihood optimization; (iii) the FDS algorithm, which was recently proposed as an effective heuristic for establishing expert-based classification; and (iv) the GLAD algorithm.
6.2.1. Majority Vote
A majority vote is a simple and popular rule that is often used in various social choice tasks. The algorithm based on this rule acts as follows:
Let $X=\{{x}_{1},{x}_{2},\dots ,{x}_{n}\}$ be a set of entities that should be classified by $m$ agents into $l\le n$ possible classes. Then, entity ${x}_{i}$, $i=1,2,\dots ,n$, is classified into class ${C}_{j}$, $j=1,2,\dots ,l$, if the majority of the agents classified it into this class (thus labeling it with the $j$th label); ties are broken randomly.
As indicated above, despite its simplicity, the majority vote rule provides good results in crowdsourcing tasks when the number of agents is relatively large and the agents have similar levels and fields of expertise.
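The plurality rule with random tie-breaking can be sketched as follows (the fixed seed is for reproducibility of the illustration):

```python
import random
from collections import Counter

def majority_vote(labels, rng=None):
    """Plurality vote per entity; ties are broken uniformly at random.

    labels: list of m equal-length lists; labels[k][i] is the class index
    agent k assigned to entity i. Returns the n winning class indices.
    """
    rng = rng or random.Random(0)
    n = len(labels[0])
    winners = []
    for i in range(n):
        counts = Counter(agent[i] for agent in labels)
        top = max(counts.values())
        tied = [c for c, v in counts.items() if v == top]
        winners.append(rng.choice(tied))
    return winners

# Three agents, four entities; the last entity is a three-way tie.
votes = [[0, 1, 2, 0],
         [0, 1, 2, 1],
         [0, 1, 0, 2]]
print(majority_vote(votes))  # first three entities: clear pluralities 0, 1, 2
```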
6.2.2. Likelihood Maximization
The likelihood-optimization procedure (Algorithm 1, presented in
Section 4) is a brute-force algorithm that obtains an optimal solution for relatively small problems.
In the numerical simulations, optimization problem (1) was solved by using a local search heuristic, which is feasible for the considered cases with a small number of agents.
6.2.3. Fast Dawid–Skene Algorithm
As indicated above, the fast Dawid–Skene (FDS) algorithm [
5] is a modification of the original DS aggregation algorithm proposed by Dawid and Skene [
2].
The FDS algorithm follows the EM approach: at the E-step, the data are classified using the current parameter values, and at the M-step, these values are corrected to maximize the likelihood of the data. The algorithm starts with some initial classification. It then alternates between the E-step and the M-step until convergence, that is, until the difference between the current and the previously obtained classifications is less than a predefined small value.
The suggested unsupervised classification algorithm follows the same approach, with the above-indicated differences in the classifications conducted at the E-step and in the parameters used.
6.2.4. GLAD Algorithm
The generative model for labels, abilities, and difficulties (GLAD) [
6] is a probabilistic algorithm that simultaneously infers the expertise of each agent, the context of the entity, and the most likely class for each entity.
Similar to the other indicated methods, this algorithm follows the EM approach: given the agents’ classifications and initial expertise estimates, at the E-step, it computes the posterior probabilities of the classes for every entity, and at the M-step, it maximizes the expectation of the log-likelihood of the observed and hidden parameters using gradient descent.
6.3. Simulation Results
The suggested algorithm was implemented over different datasets, as indicated above, with different groups of agents, and compared with the four outlined algorithms.
6.3.1. Likelihood Maximization vs. Majority Voting
The comparison of the algorithm based on majority voting (
Section 6.2.1) and the likelihood-maximization Algorithm 1 (
Section 6.2.2) was conducted using the simulated settings, with
$m=8$ and
$m=12$ agents. In both cases, the number of entities was
$n=400$, and the number of classes was
$l=4$. Such a relatively small dataset enables the application of the optimal likelihood-maximization Algorithm 1 and its timely execution. The results of the simulations are summarized in
Table 2.
In the considered settings, the likelihood-maximization Algorithm 1 outperformed majority voting for both $m=8$ and $m=12$ agents, providing a higher accuracy (hit rate) within similar computation times.
6.3.2. Suggested Algorithm vs. Majority Voting
In the next simulations, the proposed DBCC algorithm was compared with the majority voting rule. In the simulations, agents of different levels of expertise were selected, such that their classifications ${\alpha}_{k}$, $k=1,2,\dots ,m$, followed the correct classification with probabilities ${p}_{n}\in \left[0.2,0.6\right]$ for nonexpert agents and ${p}_{e}\in \left[0.6,1.0\right]$ for expert agents. Since the probabilities ${p}_{n}$ and ${p}_{e}$ are, in essence, measures of the agents’ levels of expertise in certain fields, we refer to these probabilities as the reliabilities of the agents.
The trials were executed for
$m=32$ experts classifying
$n=500$ entities into
$l=8$ classes. The probabilities of correct classifications (considered as the reliabilities of the agents) were
${p}_{e}\in \left[0.6,1.0\right]$. The percentage of times where the proposed DBCC algorithm outperformed the majority vote method with respect to the ratio of expert and nonexpert reliability is given in
Figure 1.
As expected, for homogeneous groups that included agents with close levels of expertise, majority voting outperformed the suggested algorithm. However, for heterogeneous groups of agents with different levels of expertise, the suggested DBCC algorithm outperformed the majority voting method.
These results demonstrate once again that the suggested algorithm is preferable over a majority vote for practical tasks where the group of agents includes both experts and nonexperts with respect to different fields.
Figure 2 demonstrates the percentage of times when the proposed DBCC algorithm outperformed the majority voting method in a classification of
$n$ entities. In these simulations, the probability that the experts would provide correct classifications was
${p}_{e}=0.7$, and the probability that the nonexperts would provide correct classifications was
${p}_{n}=0.2$.
It can be seen that for a heterogeneous group of agents that includes both experts and nonexperts, the suggested algorithm substantially outperforms the majority voting method, and its effectiveness increases with the size of the group.
In both settings, the group of agents included experts and nonexperts of different levels of expertise. For these agents, the suggested DBCC algorithm outperformed the majority voting method. At the same time, the effectiveness of the suggested algorithm decreased as the expertise levels decreased, and when the group of agents included only nonexperts, it became less effective than majority voting.
The obtained results demonstrate that the suggested algorithm is preferable in tasks where a small number $m$ of agents classifies a large number $n$ of entities. In contrast, if the number $n$ of entities is small and the number $m$ of agents is large, it is preferable to use a majority vote.
6.3.3. Accuracy Analysis
The accuracy of the suggested algorithm was compared against the accuracy of the majority voting (see
Section 6.2.1), the likelihood-maximization Algorithm 1 (see
Section 6.2.2), and the FDS algorithm (see
Section 6.2.3). In addition, we also present the results of the GLAD algorithm [
6]. The results of the simulations are shown in
Figure 3.
It can be seen that for a relatively small number of entities ($n<750$), the suggested DBCC algorithm outperforms the benchmark algorithms. For a larger number of entities, the DBCC is close to the FDS and the likelihoodmaximization algorithms. The other two methods, majority voting and GLAD, result in lower accuracy.
For many entities ($n>1000$), the confusion matrices in the likelihood-maximization algorithm become more accurate and very close to the true ones, which improves the algorithm’s accuracy until it approaches the optimal solution, obtaining 100% accuracy.
In the next simulations, the algorithms were applied to the real-world data [
3,
21] with simulated labeling, as indicated in
Section 6.1.2. The results of the simulations are shown in
Figure 4.
The DBCC algorithm and the FDS algorithm outperformed majority voting on all of the datasets. Additionally, it should be noted that since the likelihood-maximization Algorithm 1 utilizes the probabilities that the agents provide a correct classification and depends on the correctness of these probabilities, it results in lower accuracy than the other algorithms on the datasets with a relatively small number of entities. In contrast, on the datasets with large numbers of entities, it demonstrates optimal accuracy. This observation illustrates the well-known difference between statistical probabilities estimated from relatively small samples and theoretical probabilities defined over infinite populations.
6.3.4. Run Time until Convergence
In the last simulations, the run time until convergence of the suggested algorithm was studied. We compared it with the benchmark methods: the likelihood-maximization Algorithm 1 (see
Section 6.2.2), the FDS algorithm (see
Section 6.2.3), and the previously mentioned GLAD algorithm (see
Section 6.2.4). Since the majority voting method is not an iterative process, we did not consider it in these simulations. The graphs of the run time with respect to the number
$n$ of entities are shown in
Figure 5.
It can be seen that the run time of the suggested algorithm is very close to that of the fastest algorithm, GLAD, and, like the GLAD algorithm, it depends linearly on the number of entities.
The likelihood-maximization Algorithm 1 is the slowest algorithm, since it checks all of the possibilities to find the maximum likelihood according to the given confusion matrices. The number of such possibilities increases exponentially with the number of entities.
The FDS algorithm is faster than the likelihood-maximization Algorithm 1, but it is still slower than the suggested algorithm since, in contrast to the suggested algorithm, it calculates the maximum likelihood of every class for every entity at the E-step.
Thus, from the run time point of view, the suggested algorithm acts similarly to the fastest algorithm and results in classifications that are close in accuracy to the classifications created by the most accurate algorithms.
7. Conclusions
In this paper, we present a novel algorithm for the unsupervised collaborative classification of a set of arbitrary entities. In contrast to the existing methods, the suggested algorithm starts with the classification of the agents into experts and nonexperts in each domain, and then it generates a classification of the entities by preferring the opinions of the expert agents.
Classification of the agents is based on the assumption that the experts have similar opinions in their field of expertise, while the nonexperts often tend to disagree and adopt different opinions in fields in which they are not experts.
Classification of the entities is based on the conventional expectationmaximization method initialized by majority vote and using the agents’ levels of expertise, as defined at the stage of the agents’ classification.
To verify the performance of the algorithm, we also formalized the considered task as an optimization problem and suggested the brute-force likelihood-maximization algorithm (LMax), which provides the exact solution.
Numerical simulations of the suggested DBCC algorithm and its comparisons with the known methods, such as majority vote, the FDS algorithm, and the GLAD algorithm, demonstrated that the run time of the suggested Algorithm 2 depends linearly on the number of entities, and it is close in run time to the fastest GLAD algorithm.
The accuracy of the suggested Algorithm 2 depends on the expertise levels of the agents. For the heterogeneous group that includes both experts and nonexperts, the suggested Algorithm 2 resulted in a higher accuracy than the known heuristic algorithms and, especially, outperformed them in the scenarios where a small group of the agents considered a dataset with many entities.