1. Introduction
Intelligent cooperative multiagent systems (ICMASs) are applied to a large diversity of difficult real-life problem-solving tasks [1,2,3,4,5,6]. In cooperative multiagent systems (CMASs), the intelligence can be considered at the system's level. Much research [3,7] has proved that, even in cooperative multiagent systems composed of simple agents, an increased intelligence at the system's level can emerge if the member agents cooperate efficiently and flexibly.
In the scientific literature, intelligence estimation is often based on intuitive, biologically inspired considerations. There are very few studies related to intelligence measurement, and fewer still that can be applied for an accurate and robust symmetric comparison of the intelligence of two or more multiagent systems. We consider that, similarly to biological systems, there is variability even in the case of ICMASs: in some situations, a CMAS could behave more intelligently, in others less intelligently. Another aspect consists in treating what we call low and high outlier intelligence values; taking such values into consideration can sometimes produce an erroneous evaluation and/or comparison result. A further aspect that we consider important is the number of compared multiagent systems. If the intelligence of more than two multiagent systems must be compared, comparing them pairwise is an erroneous decision. We elaborate on this subject in the discussion section.
We have not found in the scientific literature an effective metric that includes all of the considerations mentioned above and is able to simultaneously compare the similarity in intelligence of any number (even more than two) of cooperative multiagent systems. With this purpose, we propose a novel metric called MetrIntSimil that is capable of making an accurate and robust symmetric comparison of the similarity in intelligence of two or more cooperative multiagent systems, taking into consideration the variability in the intelligence of the multiagent systems. The Travelling Salesman Problem (TSP) [8,9,10] is a well-known NP-hard problem (non-deterministic polynomial-time hardness). NP-hardness is the defining property of a class of problems that are "at least as hard as the hardest problems in NP". Both variants of the TSP, the Symmetric TSP (STSP) and the Asymmetric TSP (ATSP), are frequently studied. In order to assess the effectiveness of the proposed metric, we conducted a case study. Three multiagent systems composed of cooperative reactive agents specialized in solving a class of NP-hard problems, the STSP [11], were considered. The cooperative multiagent systems operated as a Best-Worst Ant System [12,13], a Min-Max Ant System [14,15] and an Ant Colony System [16,17].
The remainder of the paper is organized as follows: Section 2 analyzes the intelligence of cooperative multiagent systems and presents representative metrics described in the scientific literature for measuring artificial systems' intelligence. In Section 3, our proposed MetrIntSimil metric for the intelligence comparison of several cooperative multiagent systems is presented. For validation purposes, a case study is presented in Section 4. In Section 5, we discuss the designed MetrIntSimil metric and compare it with a recent metric presented in the literature. Section 6 presents our next research directions. In Section 7, the main conclusions of the research described in this paper are presented.
3. Description of the MetrIntSimil Metric
In this section, we present a novel universal metric proposed for the accurate and robust symmetric comparison of the similarity in intelligence of two or more than two
CMASs specialized in difficult problem solving. The
MetrIntSimil metric is described as an algorithm called from now on
Multiagent Systems Intelligence Comparison. Henceforth, we consider a set of k cooperative multiagent systems denoted as MAS1, MAS2, …, MASk, where k represents the number of compared multiagent systems. The intelligence indicators obtained as a result of the problem-solving intelligence measuring are denoted as Int1 = {a1,1, a1,2, …, a1,n1}, Int2 = {a2,1, a2,2, …, a2,n2}, …, Intk = {ak,1, ak,2, …, ak,nk}, where ni represents the cardinality/sample size of Inti. Table 1 presents the obtained intelligence indicator results: Int1 represents the measured intelligence of MAS1; Int2 represents the measured intelligence of MAS2; …; Intk represents the measured intelligence of MASk.
An intelligence indicator should give a quantitative indication of a system's intelligence. In the case of a particular set of cooperative multiagent systems, the researcher who wishes to compare the intelligence of several multiagent systems should decide on the most appropriate intelligence indicator. Our metric, presented in the form of the
MetrIntSimil algorithm, is appropriate for multiagent systems where the problem-solving intelligence indicator of each system can be expressed as a single value. If necessary, in the case of a multiagent system, this value can be calculated as the weighted sum of some other values that measure different aspects of the system intelligence, Equation (1):

Int = w1 × C1 + w2 × C2 + … + wr × Cr    (1)

Equation (1) indicates the general case when the intelligence indicator is calculated as the weighted sum of r intelligence component measures, where: C1, C2, …, Cr denote the intelligence component measures, which are obtained as a result of a problem-solving intelligence evaluation; and w1, w2, …, wr denote the intelligence component weights. For illustrative purposes, we present the scenario of an intelligent cooperative multiagent system composed of flying drones (drones with agent properties) denoted
CoopIntDrones. The drones should cooperatively perform different missions established by a human specialist(s) denoted
HE. Based on the efficient cooperative solving of difficult problems, the intelligence can be considered at the system’s level. The intelligence of such a system cannot be unanimously defined. The human specialist(s) who would like to measure the
CoopIntDrones intelligence must clarify what he/she understands by intelligence, establish the corresponding problem-solving intelligence indicator, and establish the intelligence components based on which the intelligence indicator is generated.
HE could consider, for example, the machine intelligence based on the intelligence of fulfilling the mission and the ability to learn.
CoopIntDrones can learn new data/information/knowledge that could increase the efficiency of cooperation and improve the fulfilling of future missions. As intelligence components, the following could be considered: the necessary time for the fulfilling of the mission; the mission fulfilling accuracy; quantity of new data/information/knowledge learnt at the system’s level; quantity of measurable improvement in cooperation efficiency by learning; degree of autonomy in the fulfilling of the mission (counting the number of times for which the remote intervention of human specialists was necessary) and some others.
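As an illustration of the weighted-sum indicator in Equation (1), the following sketch combines hypothetical CoopIntDrones component measures into a single indicator (the component values, weights and function name are illustrative assumptions, not taken from the case study):

```python
# Sketch of the weighted-sum intelligence indicator from Equation (1).
# Component values and weights below are illustrative, not from the paper.
def intelligence_indicator(components, weights):
    """Combine r intelligence component measures into a single indicator."""
    if len(components) != len(weights):
        raise ValueError("each component measure needs a weight")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    return sum(w * c for c, w in zip(components, weights))

# Hypothetical drone-mission components: accuracy, learning gain, autonomy.
components = [0.90, 0.70, 0.80]
weights = [0.5, 0.3, 0.2]
indicator = intelligence_indicator(components, weights)  # 0.45 + 0.21 + 0.16
```

Normalizing the weights to sum to one keeps the indicator on the same scale as the individual component measures.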
The central intelligence indicators of the compared multiagent systems illustrate their central intelligence tendencies. We considered as the central intelligence indicator of each system the mean or the median of its intelligence indicator sample. The mean is chosen as the central intelligence indicator in the parametric case (all the intelligence indicator data are sampled from Gaussian populations with equal variances; the variance is the square of the standard deviation (SD), Variance = SD²), and the median in the nonparametric case (not all the intelligence indicator data are sampled from Gaussian populations, or all are sampled from Gaussian populations but not all the intelligence indicator variances are equal).
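The choice between the mean and the median as the central intelligence indicator can be sketched as follows (a minimal illustration; the function name and sample values are our own):

```python
# Central intelligence indicator (sketch): the mean in the parametric case,
# the median otherwise -- the median is more robust to extreme values.
import statistics

def central_intelligence(sample, parametric):
    """Central intelligence indicator of one intelligence indicator sample."""
    return statistics.mean(sample) if parametric else statistics.median(sample)

# A single extreme value (94) pulls the mean to 28 but leaves the median at 12.
central_intelligence([10, 11, 12, 13, 94], parametric=False)  # median = 12
central_intelligence([10, 11, 12, 13, 94], parametric=True)   # mean = 28
```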
Figure 1 presents the flowchart of the main processing steps performed by the
MetrIntSimil metric. The
MetrIntSimil algorithm described in detail in
Figure 2 compares the intelligence of the k compared multiagent systems on some sets of testing problems. It checks whether the results (more concretely, the central intelligence indicators) are similar or different from a statistical point of view. In the following, we call the Null Hypothesis, denoted as H0, the statement that the intelligences of the compared multiagent systems are similar from the statistical point of view. We denote by H1 the Alternative Hypothesis, which indicates that the intelligences of the compared multiagent systems are different from the statistical point of view.
The MetrIntSimil metric uses as input the intelligence indicator samples obtained during the intelligence evaluation in solving some sets of test problems.
For the normality verification, we propose the One-Sample Kolmogorov–Smirnov Goodness-of-Fit test [52,53,54] and the Lilliefors test [53,54,55], which is based on the Kolmogorov–Smirnov test. For the verification of the equality of variances of two samples, the F test [56] can be used. For the verification of the equality of variances of more than two samples, we propose the use of the Bartlett test [57,58].
We propose in some cases the use of a method for the elimination of outlier intelligence indicator values. We call an outlier intelligence indicator value a very high or very low intelligence value that differs from the other intelligence indicator values. The difference of an intelligence indicator value from the others should be considered from the statistical point of view. Based on this fact, we consider appropriate the application of statistical tests for the detection of outlier intelligence values. There are many tests for the detection of statistical outliers described in the scientific literature, such as: Chauvenet's criterion [59,60], Peirce's criterion [61], Dixon's Q test [62] and the Grubbs test [63].
We chose the Grubbs test for outliers' detection and decided to apply the significance level α = 0.05. At its first application, the Grubbs test is able to detect a single outlier (if there is at least one outlier). If a value is identified as an outlier, then it can be concluded that this is the value most statistically different from the other measured intelligence indicator values. If an outlier is identified, then it may be decided whether the outliers' detection test should be applied again. This is a recursive process, and the detection method can be applied repeatedly until no further outliers are identified.
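The recursive Grubbs procedure described above can be sketched as follows (SciPy has no built-in Grubbs test, so the critical value is computed from the Student t distribution in the usual way; the function name and data are illustrative):

```python
# Recursive Grubbs outlier removal (sketch), two-sided, alpha = 0.05.
import numpy as np
from scipy import stats

def grubbs_remove_outliers(values, alpha=0.05):
    """Repeatedly apply the two-sided Grubbs test, dropping at most one
    outlier per pass, until no further outlier is detected."""
    vals = list(map(float, values))
    while len(vals) > 2:                       # Grubbs needs at least 3 values
        arr = np.array(vals)
        mean, sd = arr.mean(), arr.std(ddof=1)
        if sd == 0:
            break
        idx = int(np.argmax(np.abs(arr - mean)))
        g = abs(arr[idx] - mean) / sd          # Grubbs statistic
        n = len(arr)
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        if g > g_crit:
            vals.pop(idx)                      # drop the outlier and retest
        else:
            break
    return vals

# The value 25.0 is statistically far from the others and is removed.
cleaned = grubbs_remove_outliers([9.8, 10.1, 10.0, 9.9, 10.2, 25.0])
```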
If the sample intelligence data does not follow a Gaussian distribution, then one can opt for the application of a transformation. Some of the most common normalizing transformations are indicated in Table 2 [64].
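As a sketch of such a transformation, a log transform applied to right-skewed (lognormal-shaped) indicator data brings it close to a Gaussian shape (illustrative deterministic data; SciPy assumed available):

```python
# Normalizing transformation (sketch): right-skewed data becomes Gaussian
# after a log transform, one of the common transformations of this kind.
import numpy as np
from scipy import stats

skewed = stats.lognorm.ppf(np.arange(1, 201) / 201, s=0.8)  # right-skewed
transformed = np.log(skewed)                                # normal after log

def normality_p(sample):
    """p-value of the K-S test against a normal fitted to the sample."""
    return stats.kstest(sample, 'norm',
                        args=(sample.mean(), sample.std(ddof=1)))[1]

p_before, p_after = normality_p(skewed), normality_p(transformed)
# p_before rejects normality; p_after does not.
```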
For the effective comparison of the intelligence of the multiagent systems, the parametric Single-Factor ANOVA test [65] or the nonparametric Kruskal–Wallis test [66] should be applied. Whichever of them is chosen, the value of α (the significance level at which the statistical test is applied) should be established. We suggest the value α = 0.05, which we consider the most appropriate. α denotes the probability of making a Type I error, that is, of rejecting H0 (the Null Hypothesis) when it is true. A Type I error means detecting an effect that is not present.
In the proposed metric algorithm, if p-value > α (the p-value obtained by applying the Single-Factor ANOVA test or the Kruskal–Wallis test), then it can be decided that H0 could be accepted. The conclusion states that, even if there is a numerical difference between the calculated central intelligence indicators, there is no statistical difference between the intelligences of the studied k multiagent systems. The numerical difference is the result of the variability in the intelligence of the multiagent systems. In this situation, from the classification point of view, all of the multiagent systems can be classified in the same class composed of systems with similar intelligence.
If H1 is accepted (as a result of H0 rejection), then the intelligence levels of the compared multiagent systems are different. The numerical difference between the central intelligence indicators is statistically significant and is not the consequence of the variability. From the classification point of view, the multiagent systems cannot all be classified in the same class composed of systems with similar intelligence. If H1 is accepted, then the Dunn test [67] or the Tukey test [68,69] should be applied, which allow the classification of all the studied CMASs into intelligence classes. More concretely, these tests make a statistical comparison between the central intelligence indicators, the means in the parametric case or the medians in the nonparametric case.
The Tukey test [68,69] is a single-step parametric multiple pairwise comparison method. The Tukey test can be used as a post hoc analysis following the rejection of the Single-Factor ANOVA test's null hypothesis. Dunn's test [67] is a non-parametric multiple pairwise comparison method based on rank sums. It is used as a post hoc method following the rejection of a Kruskal–Wallis test's null hypothesis. In the case study presented in a further section, based on the non-parametric data, the Dunn test is applied.
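The core comparison step, choosing between the Single-Factor ANOVA test and the Kruskal–Wallis test and deciding on H0, can be sketched as follows (an illustrative sketch with deterministic toy indicator samples; the post hoc Tukey/Dunn classification step is omitted):

```python
# Core comparison step (sketch): ANOVA in the parametric case,
# Kruskal-Wallis otherwise; H0 is kept when p-value > alpha.
import numpy as np
from scipy import stats

def compare_intelligence(samples, parametric, alpha=0.05):
    """Return ('similar', p) if H0 is kept, ('different', p) if rejected."""
    if parametric:
        _, p = stats.f_oneway(*samples)   # Single-Factor ANOVA
    else:
        _, p = stats.kruskal(*samples)    # Kruskal-Wallis
    return ('similar' if p > alpha else 'different'), p

# Toy indicator samples built from normal quantiles: two systems with
# near-identical intelligence, one clearly stronger.
base = stats.norm.ppf(np.arange(1, 31) / 31, loc=100, scale=5)
mas1, mas2, mas3 = base, base + 0.5, base + 30.0
verdict, p = compare_intelligence([mas1, mas2, mas3], parametric=True)
# verdict == 'different': the three systems are not all in one class.
```

In the full algorithm, a 'different' verdict would then trigger the Tukey (parametric) or Dunn (nonparametric) post hoc classification into intelligence classes.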
5. Discussion and Comparison of the MetrIntSimil Metric
In our research, we considered measuring difficult problem-solving intelligence at the level of the whole cooperative multiagent system, not at the individual/agent level. Our metric, presented in the form of the MetrIntSimil algorithm, is appropriate for multiagent systems where the problem-solving intelligence indicator of a multiagent system can be expressed as a single value. If necessary, this value can be calculated as the weighted sum of several intelligence component measures that quantify different aspects of the system's intelligence.
An intelligence indicator should give a quantitative indication of a system's intelligence in solving a difficult problem. The researcher who wishes to compare the intelligence of two or more multiagent systems should decide on the type of intelligence indicator. For all the compared multiagent systems, the type of intelligence indicator should be the same; an effective metric must measure the same type of intelligence. As an example, in the case of biological systems, it makes no sense to compare the intelligence of a fish with the intelligence of a bird.
The elaborated metric takes into consideration the variability in the intelligence of the compared multiagent systems. A multiagent system could have different intelligent reactions in different situations; in a specific situation, the reaction could be more or less intelligent. In our research, we considered the presence of high and low outlier intelligence values, which are statistically very different from all the other intelligence values. If such outlier values are taken into consideration, they could distort the result of comparing the multiagent systems' intelligence.
In our research, we considered the necessity of establishing a central intelligence indicator of a CMAS, which illustrates the central intelligence tendency of the multiagent system. As possible central intelligence indicators, we considered the means of the intelligence indicator sample data in the parametric case (all the samples are drawn from Gaussian populations and their variances are equal from the statistical point of view) and the medians in the nonparametric case (not all the intelligence indicator data are sampled from Gaussian populations, or all are sampled from Gaussian populations but have different variances). The median is more robust than the mean; a very high or very low value influences the mean more than the median.
We did not find in the scientific literature an effective metric based on difficult problem-solving intelligence measuring that has all the properties of the MetrIntSimil metric, such as: allowing the simultaneous intelligence comparison of two or more than two multiagent systems; and accuracy, robustness and universality at the same time.
The MetrIntComp metric [51] presented in the scientific literature is able to compare two cooperative multiagent systems' intelligence. The MetrIntComp metric is based on a principle of difficult problem-solving intelligence measuring similar to that of the MetrIntSimil metric. The MetrIntComp metric uses difficult problem-solving intelligence evaluation data, based on which it makes a mathematically grounded comparison of exactly two cooperative multiagent systems' intelligence. This allows the classification of the compared systems in intelligence classes (classification in the same class or in different classes). The main advantage of the MetrIntComp metric is its robustness. The robustness is assured by the fact that, for the comparison of the obtained intelligence indicator data, the metric algorithm uses the two unpaired samples Mann–Whitney test, which is known as a robust nonparametric test [89,90]. It does not require data normality (that the samples belong to a Gaussian distribution).
Based on the obtained intelligence indicators, MetrIntSimil makes a mathematically grounded analysis. At a specific step of the MetrIntSimil metric algorithm, based on some analyses, it chooses between the application of the parametric Single-Factor ANOVA test [65] and the nonparametric Kruskal–Wallis test [66]. Based on this fact, the MetrIntSimil metric is accurate and robust at the same time. MetrIntSimil conserves and extends the properties and advantages of the MetrIntComp metric. In the case of normally distributed intelligence indicator data with the same variances, MetrIntSimil is able to apply a parametric test, which is the most appropriate. Another advantage consists in the necessary sample size of intelligence indicators: if a parametric test can be applied, then the required sample size is smaller than in the nonparametric case.
The Mann–Whitney test for two unpaired samples is the non-parametric analog of the two-sample unpaired t-test. It uses a different test statistic than the Kruskal–Wallis test (U instead of the H of the Kruskal–Wallis test), but for two samples the p-value is mathematically identical to that of a Kruskal–Wallis test [91,92].
Compared with the MetrIntComp metric, the MetrIntSimil metric is able to make a simultaneous comparison of more than two multiagent systems at the established significance level α (the probability of making a Type I error is α). MetrIntComp could be used for the comparison of more than two cooperative multiagent systems, pair by pair, but this approach is not appropriate. The probability of making a Type I error increases as the number of tests increases. With a simultaneous test, if the significance level is set at α, the probability of a Type I error remains α regardless of the number of groups being compared, whereas for m independent pairwise tests the overall probability of a Type I error is 1 − (1 − α)^m. For example, if the probability of a Type I error for each analysis is set at α = 0.05 and six two-sample tests (t-tests, for example) are performed, the overall probability of a Type I error for the set of tests is 1 − (1 − 0.05)^6 = 0.265.
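The familywise error figure quoted above can be checked directly (a one-line illustration; the function name is ours):

```python
# Familywise Type I error across m independent tests at per-test level alpha.
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests

overall = familywise_error(0.05, 6)   # 1 - 0.95**6, approximately 0.265
```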
In the scientific literature, there is no universal view on what intelligence metrics should measure. Each of the designed metrics considers machine intelligence based on different principles. Based on this fact, most of them cannot be effectively compared directly with each other. For comparison purposes, we chose a recent intelligence metric called MetrIntComp [51], which made the comparison possible; this opens a research direction towards the standardization of intelligence metrics.
Table 6 summarises the comparison results.
A case study was realized for the experimental evaluation of the MetrIntComp metric proposed in the paper [51]. It measured and compared the intelligence of two cooperative multiagent systems in solving an NP-hard problem, more concretely the Symmetric TSP. We used the intelligence indicators reported in the paper [51] and applied the MetrIntSimil metric to them. The same result was obtained with MetrIntSimil as was obtained by applying the MetrIntComp metric. Both of the metrics made a differentiation in intelligence between the two studied cooperative multiagent systems, even though the numerical difference between the measured intelligences was small. Based on this fact, the two multiagent systems could not be considered to belong to the same class of intelligence and should be classified in different classes of intelligence.
7. Conclusions
A large diversity of intelligent cooperative multiagent systems (CMASs) is used for many real-life problem-solving tasks. There are very few metrics designed for the quantitative evaluation of CMASs' intelligence, and even fewer that also allow an effective quantitative comparison of the intelligence levels of several multiagent systems. In this paper, we proposed a novel metric called MetrIntSimil that allows an accurate and robust symmetric comparison of the similarity in intelligence of two or more than two CMASs. The proposed metric efficiently takes into account the variability in the intelligence of the compared CMASs.
For validation purposes of the MetrIntSimil metric, we conducted a case study for three cooperative multiagent systems: one that operated by mimicking a Best-Worst Ant System [12,13], one that operated by mimicking a Min-Max Ant System [14,15] and one that operated by mimicking an Ant Colony System [16,17]. The evaluation was carried out for solving an NP-hard problem, the Symmetric Traveling Salesman Problem [11]. The proposed metric identified that two of the multiagent systems have similar intelligence levels and, based on that, can be classified in the same similarity class of intelligence. The third multiagent system's intelligence is different from the other two multiagent systems' intelligence and, based on that, it should be considered to belong to another intelligence class. Another conclusion consists in the fact that the multiagent systems belonging to one of the two classes have a higher intelligence level than those belonging to the other.
The universal MetrIntSimil metric does not depend on aspects/details such as the studied/compared cooperative multiagent systems' architecture. It could be applied even to the comparison of similarity in intelligence of systems that operate individually, without cooperating. Based on a comprehensive study of the scientific literature, we consider that our proposed metric is original and will represent a basis for measuring and comparing systems' intelligence in many future research works worldwide.