Mathematics
  • Article
  • Open Access

27 November 2023

Unsupervised Classification under Uncertainty: The Distance-Based Algorithm

1 Department of Industrial Engineering, Tel-Aviv University, Ramat-Aviv, Tel-Aviv 69978, Israel
2 Department of Industrial Engineering and Management, Faculty of Engineering, Ariel University, Ariel 40700, Israel
3 Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala 147004, India
4 Department of Management Science and Engineering, Institute of Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA
This article belongs to the Section D2: Operations Research and Fuzzy Decision Making

Abstract

This paper presents a method for unsupervised classification of entities by a group of agents with unknown domains and levels of expertise. In contrast to the existing methods based on majority voting (“wisdom of the crowd”) and their extensions by expectation-maximization procedures, the suggested method first determines the levels of the agents’ expertise and then weights their opinions by their expertise levels. In particular, we assume that agents provide relatively close classifications within their fields of expertise. Therefore, the expert agents are recognized by using a weighted Hamming distance between their classifications, and then the final classification of the group is determined from the agents’ classifications by expectation-maximization techniques, with preference given to the recognized experts. The algorithm was verified and tested on simulated and real-world datasets and benchmarked against existing algorithms. We show that such a method reduces incorrect classifications and effectively solves the problem of unsupervised collaborative classification under uncertainty, while outperforming other known methods.

1. Introduction

Classification under uncertainty by a group of agents is a common task that appears in different fields. In some applications it is formulated as a labeling process of similar entities (also called “instances”), while in others it is formulated as clustering procedures. For example, consider a group of physicians analyzing the medical records of a patient. Each physician analyzes the symptoms of the patient and diagnoses possible diseases, thus classifying or tagging the case with the disease name. The final diagnosis of the group is made based on the collective classifications provided by the group members. Naturally, with prior knowledge of the expertise of each physician, a larger weight can be given to those physicians who are experts in the specific disease. Note, however, that the challenge of reaching a collective decision is further enhanced when there is no prior knowledge on the agents’ expertise. This can be the case when ad hoc classifications are obtained by online surveys and questionnaires based on anonymous users with different yet unknown expertise levels.
One of the most popular methods to reach a collective classification based on a group of agents’ answers is known as the “wisdom of the crowd” (WOC). According to this approach, a decision can be reached based on the aggregated opinion of the agents, including both the experts and non-experts []. WOC is usually based on a majority (or plurality) vote, meaning that the opinion preferred by most of the agents is considered to be the correct answer. The WOC’s main assumption is that the agents’ answers are distributed somewhat symmetrically around the unknown true answer. Therefore, it makes sense to apply a majority vote procedure to obtain better accuracy (i.e., relying on the law of large numbers). Numerically, the majority vote is represented by the median statistic, and for a relatively large number of agents with unbiased (non-skewed) answers, it effectively solves the group classification problem. Another setting where a majority vote is effective is when the agents who make the classifications have high and homogeneous levels of expertise in the considered field.
Nonetheless, in various settings, the WOC assumption does not hold. For example, in online questionnaires over the internet in specific fields, only a few of the users are real experts in the field, while most of the users are non-experts, and considering their opinions can seriously reduce the collective classification accuracy.
In this paper, we focus on ad hoc classification by a group of agents with unknown different levels of expertise. The suggested algorithm includes two stages:
- Classification of the agents according to the levels of their expertise;
- Classification of the entities with respect to the agents’ levels of expertise.
In other words, in the first stage, the algorithm recognizes the experts in the fields of the presented entities, and in the second stage it classifies the entities, preferring the opinions of these experts (for example, using some weighting scheme or expectation-maximization scheme).
In the classification of the agents, we assume that the agents with the same fields of expertise have relatively close or even the same opinions in their field of expertise, while the non-experts’ opinions (if they are not biased) are more scattered over other possible classifications. Accordingly, if the agents propose similar classes for the same entities, then these agents are considered to be experts in these classes. Consequently, a lower level of expertise can be associated with agents who are inconsistent in their opinions and create classes that differ from the classes proposed by the other agents. Certainly, if the levels of the agents’ expertise are known, then this stage can be omitted, and the problem can be reduced to the majority or plurality votes and further optimization procedures.
In the classification of the entities, one can utilize conventional methods such as combination of the weighted agents’ classifications. We follow the expectation-maximization (EM) approach as suggested by Dawid and Skene []. In the expectation (E) step, the algorithm estimates correct choices according to the agent’s expertise, and in the maximization (M) step, it maximizes the likelihood of the agent’s expertise with respect to the distances from the correct choices. To measure the distances between the agents’ classifications, we use the weighted Hamming distance, which is a normalized metric over the set of partitions that represent the agents’ classifications.
The suggested algorithm was validated and tested using simulated and real-world datasets [,]. The obtained classifications were compared against several approaches: (i) classifications obtained by a brute-force likelihood-maximization (LM) algorithm (see Section 4), (ii) majority vote (see Section 6.2.1), (iii) the recently developed fast Dawid–Skene (FDS) algorithm [], and (iv) the widely known GLAD classification algorithm []. It was found that the proposed algorithm considerably outperforms these popular methods due to its higher accuracy and lower computation time.
The rest of this paper is organized as follows: In Section 2, we briefly overview the related methods that form a basis for the suggested techniques. Section 3 includes a formal description of the considered problem. In Section 4, we outline and clarify the brute-force likelihood-maximization algorithm, which is used for comparisons of the classifications of small datasets. Section 5 presents the suggested distance-based collaborative classification (DBCC) algorithm. Section 6 includes the results of the numerical simulations and the comparisons of the proposed DBCC algorithm with other classification techniques. Section 7 concludes the discourse.

3. Problem Setup

Let X = {x_1, x_2, …, x_n} be a set of n entities that represent certain characteristics of some phenomenon, and let j = 1, 2, …, l be the labels by which the set of entities can be divided into l classes C_j ⊆ X, such that ∪_{j=1}^{l} C_j = X and C_i ∩ C_j = ∅ for i ≠ j. The set of the correct classes C_j forms an ordered partition γ = {C_1, C_2, …, C_l}, where the order of the classes is defined by the order of the labels, in the sense that if labels i and j satisfy i < j, then class C_i precedes class C_j in γ.
We assume that the classification of the entities is conducted by m agents. Consequently, each kth agent, k = 1, 2, …, m, generates a partition α_k = {C_1^k, C_2^k, …, C_l^k} of the set X by labeling the entities, and this partition represents the agent’s opinion on the considered phenomenon. Similar to the partition γ, the order in the agents’ partitions α_k, k = 1, 2, …, m, is defined by the order of the labels j = 1, 2, …, l. It is assumed that the agents are independent in their opinions. However, different agents u and v, u ≠ v, can generate equivalent classifications α_u = α_v, where C_j^u = C_j^v, j = 1, 2, …, l. In addition, it is assumed that for each class C_j ⊆ X, j = 1, 2, …, l, there exists at least one agent u who is an expert in this class. This assumption implies that if the correct classification γ = {C_1, C_2, …, C_l} were available, class C_j^u from the agent’s classification α_u would be equivalent to class C_j from the correct classification γ.
The considered problem is formulated as follows: given the set X = {x_1, x_2, …, x_n} of entities and the set A = {α_1, α_2, …, α_m} of classifications created by m agents using l labels, find a classification γ* = {C_1*, C_2*, …, C_l*}, l ≤ n, which is as close as possible to the unknown correct classification γ = {C_1, C_2, …, C_l}.
To clarify the problem, let us consider a toy example of the dataset presented in Table 1. The dataset consists of n = 12 entities classified by m = 6 agents with l = 4 classes. The unknown correct classification is denoted by γ . In addition, we use γ M to denote the classification obtained by the majority vote.
Table 1. Example of the simulated data with the correct classification and the majority vote for m = 6 agents classifying n = 12 entities by l = 4 classes.
The columns in the table are denoted by r 1,1 , r 2,1 , , r 12,1 T , …, r 1,6 , r 2,6 , , r 12,6 T , where the table entry r i , k represents the classification of element x i by agent a k to one of the classes C 1 , C 2 , C 3 , and C 4 . The actual table entries are the tags of the corresponding class, namely, 1 , 2 , 3 , and 4 .
In this example, we assume that the first agent, k = 1 , is an expert in class C 1 , the second agent, k = 2 , is an expert in class C 2 , the third agent, k = 3 , is an expert in classes C 1 and C 2 , the fourth agent, k = 4 , is an expert in class C 3 , the fifth agent, k = 5 , is an expert in class C 4 , and finally, the sixth agent, k = 6 , is an expert in the last classes C 3 and C 4 . The data are summarized in Table 1.
The results of the comparison of the agents’ classifications α_k with the correct classification γ appear in the eighth column of Table 1. It can be seen that each agent k = 1, 2, …, 6 provides a classification α_k that is rather far from the correct classification γ. Similarly, the classification γ_M in the last column of Table 1, generated by the majority vote, is also far from the correct classification (with an accuracy level of 50%). Thus, majority voting does not work well in this case, since the agents’ classifications are not symmetrically distributed around the correct classes. Note, however, that the classification produced by the proposed algorithm presented in Section 5, denoted by γ*, which classifies the entities according to the agents’ expertise (unknown a priori), is equivalent to the correct classification γ, i.e., it results in a 100% accurate classification, where for
- Expert k = 1, class C_1 = {x_2, x_6, x_8};
- Expert k = 2, class C_2 = {x_1, x_3, x_10};
- Expert k = 3, classes C_1 = {x_2, x_6, x_8} and C_2 = {x_1, x_3, x_10};
- Expert k = 4, class C_3 = {x_4, x_5, x_11};
- Expert k = 5, class C_4 = {x_7, x_9, x_12};
- Expert k = 6, classes C_3 = {x_4, x_5, x_11} and C_4 = {x_7, x_9, x_12}.
Thus, by identifying the expert agents, a correct classification can be achieved (see the implementation of the proposed algorithm to Table 1 at the end of Section 5).
Note, again, that in the considered setup both the correct classification and the agents’ levels and fields of expertise are unknown, and this information should be estimated only from the agents’ classifications. As seen later, the recognition of the expert agents is based on the assumption that experts in the same field of expertise provide closer answers than the answers of the non-expert agents.

4. Local Search by Likelihood Maximization

Inspired by the considered example, where the best classification is provided by considering the opinions of the experts, we start with an algorithm that provides an exact solution by maximization of the expected likelihood between the agents’ classifications. This algorithm follows the brute force approach and, because of its high computational complexity, it can be applied only to small datasets.
Let X = { x 1 , x 2 , , x n } be a set of entities and A = α 1 , α 2 , , α m be the set of agents’ classifications α k = { C 1 k , C 2 k , , C l k } , k = 1 , 2 , , m , while the correct classification γ = C 1 , C 2 , , C l is unknown to the agents.
Let r_i^k ∈ {1, …, l} be the tag by which the kth agent labeled the entity x_i (see the columns in Table 1); in other words, the values r_i^k are the opinions of the agents about that entity, and r_i^k = j denotes that in the classification α_k of agent k, entity x_i is in class C_j^k.
Assume that in the correct classification γ an entity x_i ∈ X belongs to class C_j. Since γ is unknown, we consider the probability p_{j′j}^k = Pr{r_i^k = j′ | x_i ∈ C_j} that the kth agent classifies an entity x_i as a member of class C_{j′} while the correct class is C_j, and P^k = (p_{j′j}^k)_{l×l} denotes the probability (confusion) matrix that includes the opinions p_{j′j}^k of the kth agent, k = 1, 2, …, m, on the membership of the entities x_i, i = 1, 2, …, n, in the classes C_j, j = 1, 2, …, l. If agent k is completely reliable, then P^k is a unit (identity) matrix. In general, the agent is considered to be an expert in class C_j if p_{jj}^k is close to one, while p_{j′j}^k and p_{jj′}^k are close to zero for all j′ ≠ j.
Finally, we denote by p_C = Pr{C} the probability that class C ⊆ X includes at least one entity. Then, if C̃_i is an estimated class for the entity x_i, p_{C̃_i} is the probability that entity x_i will be classified into class C̃_i. Additionally, we denote by c̃_i ∈ {1, …, l} the label associated with class C̃_i. Similarly, C_i denotes the correct class of the ith entity x_i, and the value p_{C_i} is the probability that the entity x_i will be correctly included in the class C_i.
Using these terms, the classification problem can be formulated as the problem of finding the classes C̃_i, i = 1, 2, …, n, the matrices P^k, k = 1, 2, …, m, and the probabilities p^k_{c̃_i, r_i^k} that maximize the likelihood function

$$L(\tilde{C}, p_{\tilde{C}}, P^1, P^2, \dots, P^m) = \prod_{i=1}^{n} p_{\tilde{C}_i} \prod_{k=1}^{m} p^{k}_{\tilde{c}_i, r_i^k} \qquad (1)$$

In other words, it is required to maximize the value of the likelihood function

$$L(\tilde{C}, p_{\tilde{C}}, P^1, P^2, \dots, P^m) \to \max$$

with respect to its arguments and subject to the relevant conditions:

$$\sum_{j'=1}^{l} p^{k}_{j'j} = 1, \quad p^{k}_{j'j} \ge 0, \quad k = 1, 2, \dots, m, \quad j, j' = 1, 2, \dots, l;$$

$$\sum_{j=1}^{l} p_{\tilde{C}_i} = 1, \quad \tilde{c}_i \in \{1, 2, \dots, l\}, \quad p_{\tilde{C}_i} \ge 0, \quad i = 1, 2, \dots, n.$$
An approximated solution of this problem can be defined as follows:

$$p_{C_i} = \sum_{i=1}^{n} I\{\tilde{C}_i = C_i\} \Big/ n,$$

$$p^{k}_{c_i, j} = \sum_{i=1}^{n} I\{\tilde{C}_i = C_i \wedge r_i^k = j\} \Big/ \sum_{i=1}^{n} I\{\tilde{C}_i = C_i\},$$

where I is an indicator function, that is, I{a = b} = 1 if a = b and I{a = b} = 0 otherwise. The approximated solution can be obtained, for example, by majority vote (see Section 6.2.1), which can also be used as an initial solution in the considered optimization algorithm.
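The two estimates above translate directly into code. The following NumPy sketch is only an illustration (not the authors' implementation): it assumes the agents' opinions are stored as an n × m integer matrix r with labels 1, …, l, and that c_est holds the current estimate of the correct label of each entity; the uniform fallback for classes with no estimated members is an added assumption.

```python
import numpy as np

def estimate_parameters(r, c_est, l):
    """Approximate class priors and per-agent confusion matrices from an estimated classification.

    r     : (n, m) int array, r[i, k] = label in 1..l given by agent k to entity i
    c_est : (n,) int array, current estimate of the correct label of each entity
    l     : number of classes
    Returns (p_class, P), where p_class[j-1] estimates Pr{C_j} and
    P[k, j-1, j2-1] estimates Pr{agent k answers j2 | correct class is j}.
    """
    n, m = r.shape
    p_class = np.array([(c_est == j).sum() / n for j in range(1, l + 1)])

    P = np.zeros((m, l, l))
    for k in range(m):
        for j in range(1, l + 1):
            mask = (c_est == j)                  # entities currently estimated to belong to class j
            if mask.sum() == 0:
                P[k, j - 1, :] = 1.0 / l         # no evidence for this class: assume uniform answers
                continue
            for j2 in range(1, l + 1):
                P[k, j - 1, j2 - 1] = (r[mask, k] == j2).sum() / mask.sum()
    return p_class, P
```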
The proposed algorithm, which aims to solve optimization problem (1) by local search, is outlined as follows (Algorithm 1).
Algorithm 1: Likelihood Maximization
Given the set X of n items x i , i = 1 , 2 , , n , and the set of the agents’ classifications α k , k = 1 , 2 , , m , do:
1. Create the agents’ opinions matrix r = (r_i^k), i = 1, 2, …, n, k = 1, 2, …, m.
2. Start with the initial solution given by the approximate formulae or by majority vote.
3. Solve optimization problem (1).
4. While improvements to the current solution (the set of classes C̃_i, i = 1, 2, …, n) can be found in its neighborhood, do:
5.   Define the neighbors of the solution as the classifications that can be obtained from the solution by changing the estimated class C̃_i of a single entity x_i;
6.   Calculate the likelihood for the set of neighboring classifications;
7.   Exclude the neighbors with a small likelihood;
8.   Solve optimization problem (1);
9. End while.
10. Return the obtained solution.
Following the outlined algorithm, an initial solution is refined iteratively until reaching the maximal expected likelihood. Such a method can provide an optimal solution to the problem; however, it requires high computation power and can be implemented only for relatively small problems. The time complexity of Algorithm 1 is O(υ·n·m·l³), where n is the number of entities, m is the number of agents, l is the number of classes, and υ is the number of iterations until the algorithm converges. Here, υ is the number of repetitions of lines 5–8 in the while loop, where a maximum of l classes are considered for each of at most n entities, and the optimization problem is solved in m·l² steps. Since the number of classes l is at most equal to the number of items n, the complexity of Algorithm 1 in the worst case is O(υ·m·n⁴).
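For illustration, a simplified sketch of this local search (reusing estimate_parameters from the sketch above) might look as follows; it evaluates the likelihood in log form for numerical stability and accepts any improving single-entity relabeling, which is a lighter variant of the pruning described in the listing.

```python
import numpy as np

def log_likelihood(r, c_est, p_class, P, eps=1e-12):
    """Logarithm of the likelihood function (1) for an estimated classification c_est."""
    ll = 0.0
    n, m = r.shape
    for i in range(n):
        j = c_est[i] - 1
        ll += np.log(p_class[j] + eps)
        for k in range(m):
            ll += np.log(P[k, j, r[i, k] - 1] + eps)
    return ll

def likelihood_maximization(r, l, c_init, max_rounds=100):
    """Simplified local search: accept single-entity relabelings that increase the likelihood."""
    c_est = np.array(c_init, dtype=int)
    for _ in range(max_rounds):
        p_class, P = estimate_parameters(r, c_est, l)      # approximate formulae sketched above
        best_ll = log_likelihood(r, c_est, p_class, P)
        improved = False
        for i in range(len(c_est)):                        # neighbors: change the class of one entity
            for j in range(1, l + 1):
                if j == c_est[i]:
                    continue
                candidate = c_est.copy()
                candidate[i] = j
                ll = log_likelihood(r, candidate, p_class, P)
                if ll > best_ll:
                    best_ll, c_est, improved = ll, candidate, True
        if not improved:                                   # no improving neighbor: stop
            break
    return c_est
```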
Having said that, the above Algorithm 1 can be used to prove the existence of a solution to the problem under the indicated assumption. Moreover, in the simulations shown below, we use this algorithm for analysis and comparison of the optimal classifications against the classifications generated by the heuristic method that is suggested next.

5. Suggested Algorithm: Distance-Based Collaborative Classification

The suggested algorithm, called the distance-based collaborative classification (DBCC) algorithm, consists of two stages: in the first stage, based on the presented opinions, the agents are tagged as experts and non-experts for each of the different classes, and in the second stage, the classification of the entities is conducted with respect to the agents’ levels of expertise.
Classification of the agents according to their expertise levels is based on the assumption that agents with similar fields of expertise produce similar classifications of the related entities. On the other hand, the classifications of non-expert agents are distributed over a relatively larger range of classes. Consequently, the tagging of the agents as experts and non-experts is conducted by clustering the agents’ classifications α k , k = 1 , 2 , , m , with respect to the different classes.
Let sim(α_u, α_v | C) be a certain measure of similarity between two classifications α_u and α_v with respect to the class C ⊆ X, u, v = 1, 2, …, m. Then, over all of the agents’ classifications α_k, k = 1, 2, …, m, a central classification ξ(C) with respect to class C can be defined as follows:

$$\xi(C) = \operatorname*{argmin}_{u = 1, 2, \dots, m} \sum_{v=1}^{m} sim(\alpha_u, \alpha_v \mid C). \qquad (2)$$
The assumption about the closeness of the classifications produced by experts in a certain class implies that the values s i m α k , ξ | C of similarities between the agents’ classifications α k , k = 1 , 2 , , m , and some central classification ξ C are distributed according to the mixture of two distributions: the first represents the distribution of the experts in class C , and the second represents the distribution of the non-experts in this class.
The similarity between the classifications can be measured by several methods, for example, by the Rokhlin or Ornstein distances, or by the symmetric version of the Kullback–Leibler divergence (for the use of such metrics, refer, e.g., to []). However, to avoid additional specification of probabilistic measures over the entities, in the suggested algorithm, we use a normalized version of the well-known Hamming distance. This distance is defined as follows:
Let α_u = {C_1^u, C_2^u, …, C_l^u} and α_v = {C_1^v, C_2^v, …, C_l^v} be two classifications of the set X = {x_1, x_2, …, x_n} of entities. Consider the classes C_j^u ∈ α_u and C_j^v ∈ α_v, j = 1, 2, …, l, and let n(α_u | j) = #C_j^u denote the cardinality of the class C_j^u, while n(α_v | j) = #C_j^v denotes the cardinality of the class C_j^v. The values n(α_u | j) and n(α_v | j) are the numbers of entities that are included in the jth class or, similarly, are tagged with the label j by agents u and v, respectively. In other words, n(α_u | j) and n(α_v | j) represent the independent opinions of agents u and v about the jth class.
In addition, let n(α_u, α_v | j) = #((C_j^u ∪ C_j^v) \ (C_j^u ∩ C_j^v)) denote the cardinality of the symmetric difference between the classes C_j^u and C_j^v. The number n(α_u, α_v | j) represents the disagreement of the agents about the jth class. The normalized Hamming distance between the classifications α_u and α_v is defined as the following ratio:

$$d_{norHam}(\alpha_u, \alpha_v \mid j) = \frac{n(\alpha_u, \alpha_v \mid j)}{n(\alpha_u \mid j) + n(\alpha_v \mid j)}. \qquad (3)$$

For each j, the defined distance d_norHam(α_u, α_v | j) is a metric such that 0 ≤ d_norHam(α_u, α_v | j) ≤ 1. It represents the disagreement between the agents with respect to a given class and, consequently, enables the definition of experts and non-experts per class.
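A short sketch of this distance and of the central classification defined above, assuming each agent’s classification is stored as a vector of labels (one entry per entity), is given below; the convention of returning zero when both compared classes are empty is an added assumption.

```python
import numpy as np

def nor_ham(alpha_u, alpha_v, j):
    """Normalized Hamming distance between two classifications with respect to class j.

    alpha_u, alpha_v : (n,) int arrays of labels (one entry per entity).
    Returns |symmetric difference of the two j-classes| / (|C_j^u| + |C_j^v|).
    """
    in_u = (np.asarray(alpha_u) == j)
    in_v = (np.asarray(alpha_v) == j)
    denom = in_u.sum() + in_v.sum()
    if denom == 0:
        return 0.0                                  # both agents left class j empty (assumed convention)
    return np.logical_xor(in_u, in_v).sum() / denom

def central_classification(alphas, j):
    """Index of the classification minimizing the total distance to all others, plus the distance matrix."""
    m = len(alphas)
    d = np.array([[nor_ham(alphas[u], alphas[v], j) for v in range(m)] for u in range(m)])
    return int(np.argmin(d.sum(axis=1))), d
```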
Additionally, using this distance, the set of classifications α k and, consequently, the set of m agents can be considered as a metric space that allows for the application of conventional clustering algorithms. In the suggested DBCC algorithm, we apply Gaussian mixture clustering and the expectation-maximization algorithm [].
As a result of the clustering, the agents are tagged according to their level of expertise with respect to each class C X . These levels are represented by the weights w k C associated with the agents and are used at the classification stage of the entities.
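As one possible realization of this clustering step (the paper does not prescribe a particular library), the Gaussian mixture can be fitted, for example, with scikit-learn’s GaussianMixture; the hard 0/1 weights below follow the weighting used in the toy example of Section 3, while softer weights based on the mixture responsibilities are equally possible.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def expert_weights(distances_to_center):
    """Split the agents into experts/non-experts by their distance to the central classification.

    distances_to_center : (m,) array of d_norHam(xi_j, alpha_k | j) values, one per agent.
    Returns an (m,) array of weights: 1 for agents in the closer (expert) component, 0 otherwise.
    """
    a = np.asarray(distances_to_center, dtype=float).reshape(-1, 1)
    gm = GaussianMixture(n_components=2, random_state=0).fit(a)
    labels = gm.predict(a)
    expert_component = int(np.argmin(gm.means_.ravel()))   # the component with the smaller mean distance
    return (labels == expert_component).astype(float)
```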
Classification of the entities x i X , i = 1 , 2 , , n , based on the agents’ opinions α k , k = 1 , 2 , , m , with respect to their expertise levels w k C , C X , is conducted using conventional voting techniques; in the suggested DBCC algorithm, we use the relative majority vote.
In general, the suggested algorithm acts as follows: In the first stage, for each class, the differences (in terms of the normalized Hamming distance) between the agents’ classifications are defined. Using these distances, the agents are divided into two groups: experts and non-experts. At this stage, an assumption is made that the experts in their area of expertise provide similar classifications of the related instances, unlike the non-experts, whose classifications are more diverse. Accordingly, the opinions of the experts gain higher weights with respect to the non-experts when all of the opinions are aggregated.
In the second stage, the entities are classified by majority vote with respect to the weighted opinions of the agents. Then, the obtained solution is corrected following the stages of the EM algorithms; the resulting classification of the entities is considered as an estimated classification obtained at the M-step and is used at the E-step for the definition of more precise levels of the agents’ expertise.
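Putting these pieces together, the two-stage loop can be sketched as follows, reusing nor_ham, central_classification, and expert_weights from the sketches above. The scoring rule (an entity’s chosen class receives the expertise weight of the agent that chose it) and the fallback to a plain majority when no expert voted are our reading of the algorithm outline, not the authors’ code.

```python
import numpy as np

def weighted_vote(r, weights, l):
    """Classify each entity by the relative majority of the expert-weighted opinions.

    r       : (n, m) matrix of labels in 1..l
    weights : (m, l) array, weights[k, j-1] = expertise weight of agent k for class j
    """
    n, m = r.shape
    labels = np.empty(n, dtype=int)
    for i in range(n):
        scores = np.zeros(l)
        for k in range(m):
            j = r[i, k]
            scores[j - 1] += weights[k, j - 1]          # an expert in class j supports its own choice
        if scores.sum() == 0:                           # no expert voted: fall back to a plain majority
            scores = np.bincount(r[i] - 1, minlength=l).astype(float)
        labels[i] = int(np.argmax(scores)) + 1          # ties broken by the lowest label (a choice)
    return labels

def dbcc(r, l, max_rounds=20):
    """Sketch of the DBCC loop: per-class expert detection, weighted vote, EM-style correction."""
    n, m = r.shape
    labels = None
    for _ in range(max_rounds):
        weights = np.zeros((m, l))
        for j in range(1, l + 1):
            if labels is None:
                # first pass: distances of the agents to the central classification for class j
                c_idx, d = central_classification([r[:, k] for k in range(m)], j)
                a = d[c_idx]
            else:
                # correction passes: distances of the agents to the currently estimated classification
                a = np.array([nor_ham(labels, r[:, k], j) for k in range(m)])
            weights[:, j - 1] = expert_weights(a)
        new_labels = weighted_vote(r, weights, l)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                       # converged: the classification no longer changes
        labels = new_labels
    return labels
```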
The DBCC algorithm is outlined as follows (Algorithm 2):
Algorithm 2: Distance-Based Collaborative Classification (DBCC) Algorithm
Given the set X of   n items x i , i = 1 , 2 , , n , the enumeration j = 1 , 2 , , l of possible classes and the set of the agents’ classifications α k , k = 1 , 2 , , m , do:
Initialization
1. Initialize distance matrices d_{m×m}, distance arrays a_m, and weight arrays w_m.
2. Initialize the expertise map E_{n×m}.
Classification of the agents and definition of the expertise levels
3. For each class C_j, j = 1, …, l, do:
4.   For each agent u = 1, 2, …, m, do:
5.     For each agent v = 1, 2, …, m, do:
6.       Set the distance d_uv = d_norHam(α_u, α_v | j) between the classifications α_u and α_v with respect to class C_j into the distance matrix d_{m×m}.
7.     End
8.   End
9.   Find the central classification ξ_j = argmin_{u=1,2,…,m} Σ_{v=1}^{m} d_uv.
10.   For each agent k = 1, 2, …, m, do:
11.     Set the distance d_k = d_norHam(ξ_j, α_k | j) from the agent’s classification α_k to the central classification ξ_j into the distance array a_m.
12.   End
13.   Cluster the agents into two groups (experts and non-experts) with respect to the distance array a_m.
14.   For each agent k = 1, 2, …, m, do:
15.     Set the weight w_k into the weights array: the agent with the vector closest to the center obtains the weight 1, and more distant agents obtain the weight 0.
16.   End
17.   For each agent k in the group of expert agents, do:
18.     Add class C_j to the expertise map E_{jk} of the kth agent with the weight w_k.
19.   End
20. End
Classification of the entities with respect to the agents’ expertise
21. For each entity x_i, i = 1, …, n, do:
22.   For each class C_j, j = 1, 2, …, l, do:
23.     Initialize the score of C_j by zero.
24.   End
25.   For each agent k = 1, 2, …, m, do:
26.     If class C_j is in the agent’s expertise map E_{jk}, then add a score to this class.
27.   End
28.   Set the label of entity x_i as the index j of the class with the highest score.
29. End
Correction of the classification by repeating the expectation-maximization steps
30. Repeat until convergence (expectation maximization):
31.   M-step: from the estimated correct classification, obtain the normalized Hamming distances for all agents.
32.   E-step: estimate the correct classification by running steps 4–17 over the obtained distances.
33. End
The suggested DBCC algorithm is a heuristic procedure that utilizes EM techniques. At the M-step, it maximizes the likelihood of the agents’ expertise by using the distances from the estimated correct classification. The latter is obtained at the E-step with respect to the agents’ expertise at the previous iteration. The process converges in the sense that the difference between the classifications obtained in two sequential steps tends to zero. In practice, the process can be terminated when the difference between two sequential classifications decreases below a certain predefined value of the order of n × 10⁻³.
The time complexity of the suggested Algorithm 2 is O(υ·(l·m² + (l + m)·n)), where n is the number of entities, m is the number of agents, l is the number of classes, and υ is the number of iterations up to the convergence of the EM part of the algorithm. Here, υ defines the number of iterations of the algorithm (see Line 32); in the term l·m², l is the number of iterations of the for loop over the classes (Lines 3–20), and m² is the number of iterations of the nested loops (Lines 4–8) and the number of steps of the operation in Line 9 (the other loops require m steps); the term (l + m)·n represents the number of iterations of the loop in Lines 21–29 together with its two internal for loops (Lines 22–24 and 25–27). Since the number of classes l is at most equal to the number of items n, the complexity of the algorithm in the worst case is O(υ·(m²·n + n²)).
To clarify the main advantage of the algorithm that aims to find experts and non-experts for further classification, let us refer back to the dataset presented in Table 1.
Consider the classifications α_1 and α_2 provided by the first and the second agents with respect to class C_1. Following Equation (3), the distance between the classifications α_1 and α_2 is the ratio between the number of the agents’ disagreements about the membership of entities in a certain class and the total number of entities they assigned to that class. For the first and the second agents with respect to C_1, one obtains n(α_1, α_2 | 1) = 3, which represents the disagreement regarding the three entities x_2, x_6, and x_8 that were classified by the first agent into class C_1 (n(α_1 | 1) = 3) but were classified into other classes by the second agent (n(α_2 | 1) = 0). Thus, the distance d_norHam(α_1, α_2 | 1) = 3/(3 + 0) = 1 is the maximal possible distance between these classifications.
Similarly, the distance between the classifications α_3 and α_4 with respect to class C_1 is obtained as follows: The number of disagreements between the agents is n(α_3, α_4 | 1) = 6 (entities x_2, x_3, x_7, x_8, x_10, and x_12), while regarding entity x_6, the agents agree with one another. The numbers of independent classifications of the third and fourth agents for class C_1 are n(α_3 | 1) = 3 (entities x_2, x_6, and x_8) and n(α_4 | 1) = 5 (entities x_3, x_6, x_7, x_10, and x_12), respectively. Thus, d_norHam(α_3, α_4 | 1) = 6/(3 + 5) = 0.75.
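As a quick check, the two distances can be reproduced with the nor_ham helper sketched earlier in this section, encoding only the class-C_1 memberships quoted in the text (all other entities receive a dummy label 0, which does not affect the distance with respect to class 1).

```python
import numpy as np

# Class-C1 memberships quoted in the text; all other entities get a dummy label 0.
alpha1 = np.zeros(12, dtype=int); alpha1[[1, 5, 7]] = 1             # agent 1: x2, x6, x8 in C1
alpha2 = np.zeros(12, dtype=int)                                    # agent 2: no entity in C1
alpha3 = np.zeros(12, dtype=int); alpha3[[1, 5, 7]] = 1             # agent 3: x2, x6, x8 in C1
alpha4 = np.zeros(12, dtype=int); alpha4[[2, 5, 6, 9, 11]] = 1      # agent 4: x3, x6, x7, x10, x12 in C1

print(nor_ham(alpha1, alpha2, 1))   # 3 / (3 + 0) = 1.0
print(nor_ham(alpha3, alpha4, 1))   # 6 / (3 + 5) = 0.75
print(nor_ham(alpha1, alpha3, 1))   # 0.0: agents 1 and 3 fully agree on C1
```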
Calculation of the distances among the agents with respect to all four classes C_j, j = 1, …, 4, results in the following tables (the diagonal entries, which are zero by definition, are omitted; zero off-diagonal distances indicate pairs of agents with identical classifications of the given class):

Class C_1:
        α_1    α_2    α_3    α_4    α_5    α_6
α_1      –     1.0    0      0.75   1.0    0.71
α_2     1.0     –     1.0    1.0    1.0    1.0
α_3     0      1.0     –     0.75   1.0    0.71
α_4     0.75   1.0    0.75    –     0.5    0.56
α_5     1.0    1.0    1.0    0.5     –     0.43
α_6     0.71   1.0    0.71   0.56   0.43    –

Class C_2:
        α_1    α_2    α_3    α_4    α_5    α_6
α_1      –     0.5    0.5    1.0    1.0    1.0
α_2     0.5     –     0      1.0    1.0    1.0
α_3     0.5    0       –     1.0    1.0    1.0
α_4     1.0    1.0    1.0     –     0.67   0.5
α_5     1.0    1.0    1.0    0.67    –     0.33
α_6     1.0    1.0    1.0    0.5    0.33    –

Class C_3:
        α_1    α_2    α_3    α_4    α_5    α_6
α_1      –     0.64   0.67   1.0    1.0    1.0
α_2     0.64    –     0.56   0.8    0.78   0.8
α_3     0.67   0.56    –     1.0    1.0    1.0
α_4     1.0    0.8    1.0     –     1.0    0
α_5     1.0    0.78   1.0    1.0     –     1.0
α_6     1.0    0.8    1.0    0      1.0     –

Class C_4:
        α_1    α_2    α_3    α_4    α_5    α_6
α_1      –     0.33   0.25   1.0    0.71   0.71
α_2     0.33    –     0.33   1.0    1.0    1.0
α_3     0.25   0.33    –     1.0    0.71   0.71
α_4     1.0    1.0    1.0     –     0.5    0.56
α_5     0.71   1.0    0.71   1.0     –     0
α_6     0.71   1.0    0.71   1.0    0       –
Note that, for each class, the minimal distance between the classifications is zero, and the experts can be defined with respect to this distance. For class C_1, a zero distance d_norHam(α_1, α_3 | 1) = 0 is obtained between classifications α_1 and α_3, i.e., the first and third agents are considered to be experts with respect to class C_1. Similarly, for class C_2, a zero distance d_norHam(α_2, α_3 | 2) = 0 is obtained and, thus, the second and third agents are considered to be experts with respect to class C_2; d_norHam(α_4, α_6 | 3) = 0, so the fourth and sixth agents are experts in class C_3; and d_norHam(α_5, α_6 | 4) = 0, so the fifth and sixth agents are considered to be experts with respect to class C_4.
Following these distance calculations, the “experts” in each class obtain a weight of 1, while the other non-expert agents obtain zero weights. Thus, in this weighting scheme, only the expert classifications are considered. Finally, in the considered dataset (see Table 1), according to the opinions of the experts (the first and third agents), the first class is C_1* = {x_2, x_6, x_8}; according to the opinions of the experts (the second and third agents), the second class is C_2* = {x_1, x_3, x_10}; according to the opinions of the experts (the fourth and sixth agents), the third class is C_3* = {x_4, x_5, x_11}; and according to the opinions of the experts (the fifth and sixth agents), the fourth class is C_4* = {x_7, x_9, x_12}.
Then, the resulting partitioning of the dataset is as follows:
γ* = {C_1*, C_2*, C_3*, C_4*} = {{x_2, x_6, x_8}, {x_1, x_3, x_10}, {x_4, x_5, x_11}, {x_7, x_9, x_12}}.
Note that this straightforward, illustrative example does not require (and does not demonstrate) the more involved clustering and correction steps of the EM algorithm, which play an important role in real-world datasets, where the division of the agents into experts and non-experts is not binary.

6. Numerical Simulations and Comparisons

The suggested algorithm was studied using two data settings: simulated data with known characteristics, which enabled the analysis of the effectiveness and robustness of the DBCC algorithm, and real-world data obtained from a dedicated questionnaire.
Classifications obtained by the suggested Algorithm 2 were compared with the results provided by the optimal likelihood-maximization brute-force algorithm, the majority vote, the most accurate heuristic FDS algorithm, and the fastest GLAD algorithm.
The algorithms were implemented in the Python programming language and run on a standard Lenovo ThinkPad T480 PC with an Intel® Core™ i7-8550U Processor (8M Cache, 4.00 GHz) and 32 GB memory (DDR4 4267 MHz).

6.1. Data

To analyze the proposed method, it was applied to different datasets: (i) simulated data, (ii) real-world data with simulated classes and, finally, (iii) an entirely real-world questionnaire dataset. In the first case, for given n entities x_i, i = 1, 2, …, n, we simulated both the classes C_j, j = 1, 2, …, l, and the agents’ classifications α_k, k = 1, 2, …, m; in the second case, we used real-world data with simulated labeled datasets; and in the third case, we created and analyzed an online questionnaire that measures the levels of expertise of users regarding famous paintings and painters (the questionnaire is available via the link in the references; see Appendix A).

6.1.1. Simulated Data

In the simulated data, we used m ∈ {4, 10, 16, 20, 24, 32} agents in the trials, while their classifications α_k, k = 1, 2, …, m, were randomly generated. The probability of obtaining correct classifications was specified as p_e ∈ [0.6, 1.0] for expert agents and p_n ∈ [0.2, 0.6] for non-expert agents. The number of entities in the trials was n ∈ {50, 200, 300, 500, 1000, 2000}, and the number of classes was l ∈ {2, 3, 4, 6, 8, 10, 12, 16}.
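The paper does not spell out the exact generation scheme; one plausible sketch, in which an agent answers correctly with probability p_e for entities of its expert classes and p_n otherwise, and picks a uniformly random wrong label when it errs, is given below (the parameter values and the expertise pattern are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_agents(true_labels, expert_classes, l, p_e=0.8, p_n=0.4):
    """Simulate one classification per agent.

    true_labels    : (n,) array of correct labels in 1..l
    expert_classes : list of sets; expert_classes[k] = classes agent k is an expert in
    Returns an (n, m) matrix r of the agents' answers.
    """
    n, m = len(true_labels), len(expert_classes)
    r = np.empty((n, m), dtype=int)
    for k in range(m):
        for i, c in enumerate(true_labels):
            p = p_e if c in expert_classes[k] else p_n
            if rng.random() < p:
                r[i, k] = c                                      # correct answer
            else:
                wrong = [j for j in range(1, l + 1) if j != c]   # a uniformly random wrong label
                r[i, k] = rng.choice(wrong)
    return r

# Example: 200 entities, 4 classes, and the expertise pattern of the toy example in Section 3.
true_labels = rng.integers(1, 5, size=200)
experts = [{1}, {2}, {1, 2}, {3}, {4}, {3, 4}]
r = simulate_agents(true_labels, experts, l=4)
```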

6.1.2. Real-World Data with Simulated Classes

In the case of real-world data with simulated classes, we considered the real-world data from different databases, where to define multiple agents with different expertise, we used simulated labeling of the data. The agents were simulated by using different classifiers (e.g., random forests), and their expertise over different classes was simulated by scrambling the features in the dataset. In a comparative analysis, we used seven known datasets from Kaggle [], as follows: Iris, Abalone Age, Glass Type, Students’ Results, User Activity, Robots Conversation, and Wine Quality.
For example, in the Iris dataset, the agents’ expertise was defined as follows: Agent 1 and Agent 2 are experts in the class “Iris-setosa”, Agent 2 is an expert in the class “Iris-versicolor”, and Agent 4 is an expert in the class “Iris-virginica”. Recall that, according to this definition, the probability that these agents provide correct classifications of the entities of these classes is higher.
In addition, we used the Wi-Fi localization database from the Machine Learning Repository []. The datasets have different numbers of entities (150 < n < 4000) and different numbers of classes l ∈ {3, 4, 5, 6}; for the different numbers of classes, different numbers of agents (10 < m < 20) were simulated with various levels of expertise.

6.1.3. Real-World Data

To obtain real-world data, we designed and distributed an online questionnaire that contains questions on painters and paintings based on common knowledge. In particular, the questionnaire contains 40 paintings created by eight famous painters. The agents were asked to indicate the painter of each painting. Thus, in terms of classification, the agents were required to classify n = 40 entities into l = 8 classes. The questionnaire was offered to m = 90 volunteers in the university, including both students and professors, without any specific educational background in the arts. An example of the paintings and questionnaire that was used are presented in the Appendix A.

6.2. Algorithms for Comparisons

The results obtained by the suggested algorithm were compared with the results obtained by four baseline methods: (i) the widely used majority voting algorithm; (ii) the brute-force maximum-likelihood optimization; (iii) the FDS algorithm, which was recently proposed as an effective heuristic to establish an expert-based classification; and (iv) the GLAD algorithm.

6.2.1. Majority Vote

A majority vote is a simple and popular rule that is often used in different tasks of social choices. The algorithm based on this rule acts as follows:
Let X = {x_1, x_2, …, x_n} be a set of entities that should be classified by m agents into l ≤ n possible classes. Then, the entity x_i, i = 1, 2, …, n, is classified into class C_j, j = 1, 2, …, l, if the majority of the agents classified it into this class (thus labeling it with the jth label); ties are broken randomly.
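A minimal sketch of this rule, with random tie-breaking, might look as follows (assuming the same n × m label matrix r used in the earlier sketches).

```python
import numpy as np

def majority_vote(r, l, rng=None):
    """Plain plurality vote over the agents' labels; ties are broken at random."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, _ = r.shape
    labels = np.empty(n, dtype=int)
    for i in range(n):
        counts = np.bincount(r[i] - 1, minlength=l)       # votes received by each of the l classes
        winners = np.flatnonzero(counts == counts.max())  # classes tied for the maximum
        labels[i] = int(rng.choice(winners)) + 1
    return labels
```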
As indicated above, despite its simplicity, in crowdsourcing tasks the majority vote rule provides good results when the number of agents is relatively large and the agents have similar levels and fields of expertise.

6.2.2. Likelihood Maximization

The likelihood-maximization procedure (Algorithm 1, presented in Section 4) is a brute-force algorithm that is used to obtain an optimal solution for relatively small problems.
In the numerical simulations, optimization problem (1) was solved by using a local search heuristic that is feasible for the considered cases with a small number of agents.

6.2.3. Fast Dawid–Skene Algorithm

As indicated above, the fast Dawid–Skene (FDS) algorithm [] is a modification of the original DS aggregation algorithm proposed by Dawid and Skene [].
The FDS algorithm follows the EM approach, such that at the E-step, the data are classified using the current parameter values, and at the M-step, these values are corrected to maximize the likelihood of the data. The algorithm starts with some initial classification. It then alternates between the E-step and the M-step up to convergence, such that the difference between the current and the previously obtained classifications is less than the predefined small value.
The suggested unsupervised classification algorithm follows the same approach, with the above-indicated differences in the classifications conducted at the E-step and in the parameters used.
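The published FDS implementation is the authors’ own; purely for illustration, a Dawid–Skene-style EM loop with hard assignments at the E-step (the characteristic feature of FDS) can be sketched by reusing majority_vote and estimate_parameters from the earlier sketches.

```python
import numpy as np

def ds_em(r, l, max_iter=50, eps=1e-12):
    """Dawid-Skene-style EM with hard label assignments at the E-step (in the spirit of FDS).

    Starts from a majority-vote estimate, then alternates:
      M-step: re-estimate the class priors and per-agent confusion matrices;
      E-step: assign each entity to its maximum-posterior class.
    """
    n, m = r.shape
    c_est = majority_vote(r, l)                          # initial estimate (Section 6.2.1 sketch)
    for _ in range(max_iter):
        p_class, P = estimate_parameters(r, c_est, l)    # M-step (Section 4 sketch)
        new_c = np.empty(n, dtype=int)
        for i in range(n):                               # E-step: hard argmax over candidate classes
            log_post = np.log(p_class + eps)
            for k in range(m):
                log_post = log_post + np.log(P[k, :, r[i, k] - 1] + eps)
            new_c[i] = int(np.argmax(log_post)) + 1
        if np.array_equal(new_c, c_est):
            break                                        # converged: labels no longer change
        c_est = new_c
    return c_est
```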

6.2.4. GLAD Algorithm

The generative model for labels, abilities, and difficulties (GLAD) [] is a probabilistic algorithm that simultaneously infers the expertise of each agent, the context of the entity, and the most likely class for each entity.
Similar to the other indicated methods, this algorithm follows the EM approach: given the agents’ classifications and initial expertise estimates, at the E-step it computes the posterior probability for every entity, and at the M-step it maximizes the expectation of the log-likelihood of the observed and hidden parameters using gradient descent.

6.3. Simulation Results

The suggested algorithm was implemented over different datasets, as indicated above, with different groups of agents, and compared with the four outlined algorithms.

6.3.1. Likelihood Maximization vs. Majority Voting

The comparison of the algorithm based on majority voting (Section 6.2.1) and the likelihood-maximization Algorithm 1 (Section 6.2.2) was conducted using the simulated settings, with m = 8 and m = 12 agents. In both cases, the number of entities was n = 400 , and the number of classes was l = 4 . Such a relatively small dataset enables the application of the optimal likelihood-maximization Algorithm 1 and its timely execution. The results of the simulations are summarized in Table 2.
Table 2. Simulation results of majority voting and likelihood maximization for m = 8 and m = 12 agents classifying n = 400 entities by l = 4 classes.
In the considered settings, the likelihood-maximization Algorithm 1 outperformed majority voting both for m = 8 and for m = 12 agents and provided a higher accuracy (hit rate) within similar computation times.

6.3.2. Suggested Algorithm vs. Majority Voting

In the next simulations, the proposed DBCC algorithm was compared with the majority voting rule. In the simulations, agents of different levels of expertise were selected, such that their classifications α_k, k = 1, 2, …, m, would follow the correct classification with probabilities p_n ∈ [0.2, 0.6] for non-expert agents and with probabilities p_e ∈ [0.6, 1.0] for expert agents. Since the probabilities p_n and p_e are, in essence, measures of the agents’ levels of expertise in certain fields, we refer to these probabilities as the reliabilities of the agents.
The trials were executed for m = 32 agents classifying n = 500 entities into l = 8 classes. The probabilities of correct classifications (considered as the reliabilities of the agents) were p_e ∈ [0.6, 1.0]. The percentage of times in which the proposed DBCC algorithm outperformed the majority vote method, with respect to the ratio of expert to non-expert reliability, is given in Figure 1.
Figure 1. Percentage of cases in which the suggested algorithm outperformed the majority voting method in terms of accuracy (y-axis) with respect to the ratio of expert reliability to non-expert reliability (x-axis). Different dashed lines correspond to different levels of the agents’ reliability, which represent the levels of the agents’ expertise.
As expected, for homogeneous groups that included agents with close levels of expertise, majority voting outperformed the suggested algorithm. However, for heterogeneous groups of agents with different levels of expertise, the suggested DBCC algorithm outperformed the majority voting method.
These results demonstrate once again that the suggested algorithm is preferable over a majority vote for practical tasks where the group of agents includes both experts and non-experts with respect to different fields.
Figure 2 demonstrates the percentage of times when the proposed DBCC algorithm outperformed the majority voting method in a classification of n entities. In these simulations, the probability that the experts would provide correct classifications was p e = 0.7 , and the probability that the non-experts would provide correct classifications was p n = 0.2 .
Figure 2. Percentage of cases where the suggested algorithm obtained more accurate classifications than the majority voting method (y-axis) with respect to the number of entities n (x-axis).
It can be seen that for a heterogeneous group of agents that includes both experts and non-experts, the suggested algorithm substantially outperforms the majority voting method, and its effectiveness increases with the size of the group.
In both settings, the group of agents included experts and non-experts with different levels of expertise. For these agents, the suggested DBCC algorithm outperformed the majority voting method. At the same time, the effectiveness of the suggested algorithm decreased as the expertise levels decreased, and when the group of agents included only non-experts, it became less effective than majority voting.
The obtained results demonstrate that the suggested algorithm is preferable in tasks where a small number m of agents classifies a large number n of entities. In contrast, if the number n of entities is small and the number m of agents is large, it is preferred to use a majority vote.

6.3.3. Accuracy Analysis

The accuracy of the suggested algorithm was compared against the accuracy of the majority voting (see Section 6.2.1), the likelihood-maximization Algorithm 1 (see Section 6.2.2), and the FDS algorithm (see Section 6.2.3). In addition, we also present the results of the GLAD algorithm []. The results of the simulations are shown in Figure 3.
Figure 3. Accuracy of the suggested DBCC algorithm and the benchmark algorithms (y-axis) with respect to the number n of entities (x-axis). In the figure, the algorithms are denoted as follows: CClas—the suggested DBCC algorithm, LMax—the likelihood-maximization algorithm, FDS—fast Dawid–Skene algorithm, GLAD—GLAD algorithm, Maj—majority voting method.
It can be seen that for a relatively small number of entities ( n < 750 ), the suggested DBCC algorithm outperforms the benchmark algorithms. For a larger number of entities, the DBCC is close to the FDS and the likelihood-maximization algorithms. The other two methods, majority voting and GLAD, result in lower accuracy.
For large numbers of entities (n > 1000), the confusion matrices estimated by the likelihood-maximization algorithm become more accurate and very close to the true ones, which improves the algorithm’s accuracy until it approaches the optimal solution, reaching 100% accuracy.
In the next simulations, the algorithms were applied to the real-world data [,] with simulated labeling, as indicated in Section 6.1.2. The results of the simulations are shown in Figure 4.
Figure 4. Accuracy of the suggested and benchmark algorithms applied to real-world data with simulated labeling. In the figure, the algorithms are denoted as follows: CClas—suggested DBCC algorithm, LMax—the likelihood-maximization algorithm, FDS—fast Dawid–Skene algorithm, GLAD—GLAD algorithm, Maj—majority voting method. The number n of entities in the datasets is as follows: Iris— 150 , Glass— 215 , Students— 1000 , User— 1000 , Wine— 1000 , Robots— 2000 , Wi-Fi— 2000 , and Abalone— 4000 .
The DBCC algorithm and the FDS algorithm outperformed the majority voting in all of the datasets. Additionally, it should be noted that since the likelihood-maximization Algorithm 1 utilizes the probabilities that the agents provide correct classifications and depends on the correctness of these probabilities, it results in lower accuracy than the other algorithms on the datasets with a relatively small number of entities. In contrast, on the datasets with large numbers of entities, it demonstrates near-optimal accuracy. This observation illustrates the well-known difference between statistical probabilities estimated from relatively small samples and theoretical probabilities that are defined over infinite populations.

6.3.4. Run Time until Convergence

In the last simulations, the run time until convergence of the suggested algorithm was studied. We compared it with the benchmark methods: the likelihood-maximization Algorithm 1 (see Section 6.2.2), the FDS algorithm (see Section 6.2.3), and the previously mentioned GLAD algorithm (see Section 6.2.4). Since the majority voting method is not an iterative process, we did not consider it in these simulations. The graphs of the run time with respect to the number n of entities are shown in Figure 5.
Figure 5. Run time of the suggested and known algorithms with respect to the number n of entities. In the figure, the algorithms are denoted as follows: CClas—the proposed DBCC algorithm, LMax—the likelihood-maximization algorithm, FDS—fast Dawid–Skene algorithm, GLAD—GLAD algorithm.
It can be seen that the run time of the suggested algorithm is very close to the run time of the fastest GLAD algorithm and, as in the GLAD algorithm, it linearly depends on the number of entities.
The likelihood-maximization Algorithm 1 is the slowest algorithm, since it checks all of the possibilities to find the maximum likelihood according to the given confusion matrices. The number of such possibilities increases exponentially with the number of entities.
The FDS algorithm is faster than the likelihood-maximization Algorithm 1, but it is still slower than the suggested algorithm since, in contrast to the suggested algorithm, it calculates the maximum likelihood of every class for every entity in the E-step.
Thus, from the run time point of view, the suggested algorithm acts similarly to the fastest algorithm and results in classifications that are close in accuracy to the classifications created by the most accurate algorithms.

7. Conclusions

In this paper, we present a novel algorithm for unsupervised collaborative classification of a set of arbitrary entities. In contrast to the existing methods, the suggested algorithm starts with the classification of the agents into experts and non-experts in each domain, and then it generates a classification of the entities by preferring the opinions of the expert agents.
Classification of the agents is based on the assumption that the experts have similar opinions in their field of expertise, while the non-experts often tend to disagree and adopt different opinions in fields in which they are not experts.
Classification of the entities is based on the conventional expectation-maximization method initialized by majority vote and using the agents’ levels of expertise, as defined at the stage of the agents’ classification.
To verify the algorithm, we also formalized the considered task in the form of an optimization problem and suggested the likelihood-maximization (LMax) algorithm, which uses brute force and provides an exact solution.
Numerical simulations of the suggested DBCC algorithm and its comparisons with the known methods, such as majority vote, the FDS algorithm, and the GLAD algorithm, demonstrated that the run time of the suggested Algorithm 2 depends linearly on the number of entities, and it is close in run time to the fastest GLAD algorithm.
The accuracy of the suggested Algorithm 2 depends on the expertise levels of the agents. For the heterogeneous group that includes both experts and non-experts, the suggested Algorithm 2 resulted in a higher accuracy than the known heuristic algorithms and, especially, outperformed them in the scenarios where a small group of the agents considered a dataset with many entities.

Author Contributions

Conceptualization, I.B.-G. and P.G.; methodology, I.B.-G., E.K. and T.R.; software, A.G.; validation, P.G., A.G. and P.K.; formal analysis, A.G. and E.K.; investigation, T.R.; resources, I.B.-G.; data curation, P.K.; writing—original draft preparation, E.K.; writing—review and editing, P.K.; visualization, A.G.; supervision, I.B.-G.; project administration, I.B.-G. and P.G.; funding acquisition, I.B.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the Koret Foundation through the Digital Living 2030 grant.

Data Availability Statement

Data are available via the links appearing in the references.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The data were collected using Google Forms []. The questionnaire included the images of the paintings and the list of options regarding the author of the painting. The resulting dataset can be downloaded via the link at [].
An example of the question is shown in Figure A1.
Figure A1. Example of the question: (a) the painting and (b) the list of possible alternatives to be selected by the user.
The image is accompanied by a list of painters, and the respondent is required to choose the painter who authored the presented painting. The other examples of the paintings are shown in Figure A2.
Figure A2. Other examples of the paintings appearing in the questionnaire: (a) Paul Klee, Tale a la Hoffmann; (b) Rembrandt Harmenszoon van Rijn, Storm on the Sea of Galilee; (c) Vincent van Gogh, Rooftops; (d) Michelangelo di Lodovico Buonarroti Simoni, Libyan Sibyl.

References

  1. Hamada, D.; Nakayama, M.; Saiki, J. Wisdom of crowds and collective decision making in a survival situation with complex information integration. Cogn. Res. 2020, 5, 48. [Google Scholar] [CrossRef] [PubMed]
  2. Dawid, A.P.; Skene, A.M. Maximum likelihood estimation of observer error-rates using the EM algorithm. J. Roy. Stat. Soc. Ser. C 1979, 28, 20–28. [Google Scholar] [CrossRef]
  3. Kaggle Inc. Iris Flower Dataset/Abalone Age Prediction/Glass Classification/Students Test Data/User Activity/Classification of Robots from Their Conversation/Wine Quality Dataset. Available online: https://www.kaggle.com/datasets/ (accessed on 23 November 2023).
  4. The Paintings Authorship. Dataset. Available online: https://www.iradbengal.sites.tau.ac.il/_files/ugd/901879_2cafbbe73b0248828ed5dece50c6c3f0.csv?dn=Painters_dataset.csv (accessed on 23 November 2023).
  5. Sinha, V.B.; Rao, S.; Balasubramanian, V.N. Fast Dawid-Skene: A fast vote aggregation scheme for sentiment classification. In Proceedings of the 7th KDD Workshop on Issues of Sentiment Discovery and Opinion Mining, London, UK, 20 August 2018. [Google Scholar]
  6. Whitehill, J.; Ruvolo, P.; Wu, T.; Bergsma, J.; Movellan, J. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. Adv. Neural Inf. Process. Syst. 2009, 22, 2035–2043. [Google Scholar]
  7. Chiu, C.; Liang, T.; Turban, E. What can crowdsourcing do for decision support? Decis. Support Syst. 2014, 65, 40–49. [Google Scholar] [CrossRef]
  8. Ma, J.; Lu, J.; Zhang, G. A three-level-similarity measuring method of participant opinions in multiple-criteria group decision supports. Decis. Support Syst. 2014, 59, 74–83. [Google Scholar] [CrossRef]
  9. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–38. [Google Scholar] [CrossRef]
  10. Zhang, Y.; Chen, X.; Zhou, D.; Jordan, M.I. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. J. Mach. Learn. Res. 2016, 17, 1–44. [Google Scholar] [PubMed]
  11. Shah, N.B.; Balakrishnan, S.; Wainwright, M.J. A Permutation-based model for crowd labeling: Optimal estimation and robustness. arXiv 2016, arXiv:1606.09632. [Google Scholar] [CrossRef]
  12. Duan, L.; Oyama, S.; Sato, H.; Kurihara, M. Separate or joint? Estimation of multiple labels from crowdsourced annotations. Expert Syst. Appl. 2014, 41, 5723–5732. [Google Scholar] [CrossRef]
  13. Wei, X.; Zeng, D.D.; Yin, J. Multi-Label Annotation Aggregation in Crowdsourcing. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018. [Google Scholar]
  14. Groot, P.; Birlutiu, A.; Heskes, T. Learning from multiple annotators with Gaussian processes. In Proceedings of the 21st Int Conf Artificial Neural Networks and Machine Learning, Espoo, Finland, 14–17 June 2011; pp. 159–164. [Google Scholar]
  15. Rodrigues, F.; Pereira, F.C. Deep learning from crowds. In Proceedings of the 32nd Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 1611–1618. [Google Scholar]
  16. Raykar, V.C.; Yu, S.; Zhao, L.H.; Valadez, G.H.; Florin, C.; Moy, L. Learning from crowds. J. Mach. Learn. Res. 2010, 11, 1297–1322. [Google Scholar]
  17. Bachrach, Y.; Minka, T.; Guiver, J.; Graepel, T. How to Grade a Test Without Knowing the Answers—A Bayesian Graphical Model for Adaptive Crowdsourcing and Aptitude Testing. In Proceedings of the 29th International Conference on Machine Learning, Edinburgh, UK, 26 June–1 July 2012; pp. 819–826. [Google Scholar]
  18. Moayedikia, A.; Yeoh, W.; Ong, K.; Ling, Y. Improving accuracy and lowering cost in crowdsourcing through an unsupervised expertise estimation approach. Decis. Support Syst. 2019, 122. [Google Scholar] [CrossRef]
  19. Kagan, E.; Ben-Gal, I. Probabilistic Search for Tracking Targets; Wiley & Sons: Chichester, UK, 2013. [Google Scholar]
  20. van Dyk, D.A. Fitting Mixed-effects models using efficient EM-type algorithms. J. Comput. Graph. Stat. 2000, 9, 78–98. [Google Scholar]
  21. UCI. 2007. Available online: https://archive.ics.uci.edu/dataset/196/localization+data+for+person+activity (accessed on 23 November 2023).
  22. The Paintings Authorship. Questionnaire (Google Form). Available online: https://docs.google.com/forms/d/e/1FAIpQLSf_iUo1T7gMIPJyEoG0Cz3xoetfv6LNZvHjcmUyRL_Z4i3Kqw/viewform (accessed on 23 November 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
