Designing Possibilistic Information Fusion—The Importance of Associativity, Consistency, and Redundancy

One of the main challenges in designing information fusion systems is to decide on the structure and order in which information is aggregated. The key criteria by which topologies are constructed include the associativity of fusion rules as well as the consistency and redundancy of information sources. Fusion topologies designed with regard to these criteria are flexible in design, produce maximal specific information, and are robust against unreliable or defective sources. In this article, an automated data-driven design approach for possibilistic information fusion topologies is detailed that explicitly considers associativity, consistency, and redundancy. The proposed design is intended to handle epistemic uncertainty, that is, to result in robust topologies even when training data are lacking. The fusion design approach is evaluated on selected publicly available real-world datasets obtained from technical systems. Epistemic uncertainty is simulated by withholding parts of the training data. It is shown that, in this context, consistency as the sole design criterion results in topologies that are not robust. Including a redundancy metric leads to improved robustness in the case of epistemic uncertainty.


Introduction
The discipline of information fusion is concerned with the aggregation of uncertain information from several sources. Through the process of fusion, uncertainty is to be reduced, that is, information fusion aims at creating information of higher quality [1].
Uncertainty and ignorance manifest in many forms, such as a lack of confidence, aleatoric uncertainty, or epistemic uncertainty. A comprehensive taxonomy of ignorance is provided by Ayyub and Klir [2]. Uncertain information is modelled in various mathematical frameworks, especially probability theory, Dempster-Shafer theory, fuzzy set theory, and possibility theory [3], and each has strengths and weaknesses with regard to types of uncertainty. Possibilistic information fusion is focused on handling epistemic uncertainty, imprecise information, and incomplete information [4,5], which stem from, e.g., scarce, repetitive, or biased data. In possibilistic information fusion, knowledge about the state of affairs is complemented by excluding alternatives which single information sources deem impossible.
In the following, this paper relies on the nomenclature of information items and information sources adopted from [6].

Definition 1 (Information Item).
Consider an unknown entity v and a non-empty set of possible alternatives X_A = {x_1, . . . , x_n} with n ∈ N_{>0}. An information item models information in the form of plausibilities or probabilities about v regarding X_A. An information item can be, e.g., a set, an interval, a probability distribution, or a possibility distribution. Consequently, an item may be expressed with certainty (v = x or, assuming A ⊂ X_A, v ∈ A), may be affected by uncertainty (v is probably x or v is possibly x), or may be expressed imprecisely (x_1 < v < x_2).

Definition 2 (Information Source).
An information source S provides information items. It is an ordered sequence of information items S = {I_1, I_2, . . . , I_m} with m ∈ N_{>0}. Each I_j represents an information item at instance j ∈ {1, . . . , m}. An information source may be, for example, a technical sensor, a variable, a feature, or a human expert.
Often information fusion benefits from distributing the fusion into a multi-step piecewise process [7][8][9][10]. This means, for example, that information items are fused sequentially, in parallel, or hierarchically instead of centralised all at once. The sequence in which items are fused is often referred to as the topology or architecture. While the term architecture is often used in a broader sense to refer to complete fusion frameworks (see [11][12][13][14]), the term topology is used in this paper to describe the structure in which the fusion is arranged. Example fusion topologies are shown in Figure 1.
Designing and optimising a fusion topology is one of the main challenges in implementing an information fusion system [15]. An optimal topology reduces communicational and computational loads, increases fusion accuracy [16], and helps to detect defective sources [17]. Fusion topologies are usually designed manually, as, e.g., in the dissertation of Mönks [18], or require meta-knowledge about information sources, such as in the work of Fritze et al. [19]. Automated learning processes are rare. Such a learning process is made more difficult by epistemic uncertainty due to, e.g., missing or underrepresented classes in training data or due to having few training data instances to begin with. This calls for approaches to learning topologies based on possibility theory.

Key characteristics for designing fusion topologies are the associativity of fusion rules, the consistency of information items, and the redundancy of information sources. Associativity allows the optimisation of a topology towards, e.g., computational load or other criteria without having to worry about distorting the fusion result. Associativity is especially crucial if a specific topology is necessitated by an application.
Information may not be available at the same time, or information sources may be spatially distributed so that a centralised fusion is simply not feasible. Structuring fusion based on consistency or redundancy was proposed quite early [17,20]. The basic idea is to fuse consistent or redundant information in earlier stages and complementary information in later stages. Grouping sources in this way provides the benefits that (i) it is reasonable to conduct fusion conjunctively, resulting in maximal certain information [6], and (ii) it is easier to identify defective or malfunctioning sources, increasing the robustness of applications [21][22][23][24].
In this article, we contribute an approach towards the data-driven automated learning of information fusion topologies. The article focuses on information modelled within possibility theory. As a foundation, common possibilistic fusion rules are recapitulated and analysed regarding the associativity property. Based on this analysis, design algorithms relying on consistency and redundancy are proposed and discussed. The aim of the design algorithms is to build topologies that result in maximal specific (i.e., minimal uncertain) fusion outcomes and that facilitate source defect detection. The proposed learning approaches are discussed with regard to their robustness and further improved by exploiting outlier-resistant averaging possibilistic fusion rules. As a first step, an overview of the state of the art in fusion topology design is given independently of the mathematical framework.

Fusion Topology Design in Related Work
Information fusion systems are composed of various interacting parts and methodologies, such as information sources, information pre-processing, fusion nodes, mathematical frameworks, or fusion algorithms. This results in high-dimensional design spaces, i.e., a large number of hyperparameters. Deciding on and designing the topology is an important subtask in fusion system design, as identified by Raz et al. [16]. The authors explored the design space of a relatively simple fusion task (still > 2 × 10^5 design combinations) with the help of machine-learning algorithms. Their goal was to estimate the impact of design choices on the performance of the fusion system. Among other design parameters, the topology and the allocation of sources to fusion nodes were identified to be crucial to the performance. This motivated ongoing work on topology design.
A widely used approach towards designing topologies and allocating information sources is to rely on meta-knowledge about the information sources. Mönks et al. [18,25] grouped information sources (here: technical sensors) into a two-level fusion topology based on the sensor's observed objects, measured physical property, or spatial location. Semantically close (e.g., observing the same object) or spatially close sensors are assumed to be at least partly redundant and are allocated to the same fusion node. This manual approach has been partly automated by Fritze et al. [19,26,27], who equipped sensors with a self-description containing information about the sensor's characteristics, its contextual environment, and observed objects. A rule-based system then matches and groups sensors based on their self-description. Other ontology-based approaches have been proposed by Boury-Brisset [28] and Martí et al. [29]. Neither focuses on topology design specifically but rather on designing or facilitating a fusion system. Boury-Brisset [28] discussed ontological methods for the integration in the Joint Directors of Laboratories (JDL) fusion architecture [30], including the semantic integration of information. Martí et al. [29] proposed an ontology-based adaptive sensor fusion architecture that organises sensors and external sources into preprocessing nodes and fusion nodes depending on the task at hand. A recent application of ontology-based design of information fusion systems can be found in the field of assisted living [31]. Ontological approaches reduce the manual effort needed for structuring fusion topologies; however, they still require profound expert knowledge about the information sources and their context. Building the ontology requires manual engineering and is time-consuming [28].
Designing information fusion topologies is closely related to the data association step, predominantly but not exclusively used in the JDL fusion architecture. Solaiman and Bossé [32] refer to the task of data association as the identification of any relation between information elements and monitored objects. Waltz and Llinas [33] defined the data association problem with regard to fusion systems more specifically as the "Cross correlation of measurements and m-ary decisions to partition all measurements into sets of common origin. One can distinguish between associating a set of measurements (partitioning) and associating a measurement (or a set of measurements) to a given object. [. . . ]".
In this definition, the partitioning of measurements refers to preparing a fusion task in which each partition represents the input to a fusion node; hence, the relation to designing fusion topologies. Data-driven approaches for data association are given by Grabisch and Prade [34] and Ayoun and Smets [35]. Both approaches cluster sensor measurements based on quantifications of the measurements' proximities. Grabisch and Prade [34] modelled information within possibility theory and computed the proximity based on the degree of intersection of possibility distributions. Ayoun and Smets [35] used Dempster-Shafer theory instead and clustered based on the degree of conflict between measurements. A similar approach, although not explicitly labelled as data association, was taken by Schubert [36,37], who clustered basic belief functions (evidential masses) based on their conflict and attraction with each other. All of these works ([34][35][36][37]) partition information sources based on single instances of measurements (the current measurement) and not on historical data. More sophisticated interdependencies and interrelations between information sources can only be detected robustly in historical data. For example, for the identification and quantification of redundancies between sources, meaningful data are necessary, which span over the sources' frame of discernment, as shown by Holst and Lohweg [38,39].
Regarding the problem of data association, it has to be mentioned that more recent publications focus solely on the specific application task of visual target tracking (see for example the works of Kamal et al. and Yoon et al. [40,41]). This focus comes with a shift in interpretation of the data association problem as shown by the definition given by Khaleghi et al. [42]: "[. . . ] the data association problem, which may come in two forms: measurement-to-track and track-to-track association. The former refers to the problem of identifying from which target, if any, each measurement is originated, while the latter deals with distinguishing and combining tracks, [. . . ]". Publications with this shifted focus are less related to the problem of designing fusion topologies.
In summary, in related works, the task of structuring fusion topologies has been approached based on expert knowledge, ontologies, or based on current measurements. Approaches that consequently analyse historical data or information in order to derive a fusion topology are missing. While this section considered topology design independently from the mathematical fusion framework, the remainder of this paper focuses on possibility theory.

Fusion within Possibility Theory
To provide a basis for a discussion on fusion topology design, the importance of associativity, and the role of consistency and redundancy, the core principles of possibility theory (PosT) are recapitulated. For this, common fusion rules are also reported in detail.
The main motivation behind PosT is that probability theory (ProbT) is not able to model epistemic uncertainty, such as imprecision or missing information, adequately. Probability theory models random phenomena quantitatively; PosT handles incomplete information qualitatively [5,43]. Zadeh [44] introduced PosT based on fuzzy sets in the context of natural language processing. He interpreted fuzzy membership functions as possibility distributions, allowing uncertainties in the sense of imprecision as well as a lack of confidence in statements [45].
Consequently, PosT is mathematically close to fuzzy set theory [46]. This proximity often allows mathematical operations defined in the context of fuzzy sets, such as similarity measures or t-norms, to be applied to possibility distributions. Since Zadeh's introduction of PosT, Dubois and Prade [4,6,47,48,49] and Yager [50][51][52][53] have mainly contributed to the advancement of possibility theory. If not explicitly mentioned otherwise, a numerical, real-valued representation of possibility values is assumed (cf. Dubois et al. [6] for an overview of qualitative and numerical possibility scales).
A possibility distribution is defined as a mapping of mutually exclusive and exhaustive alternative events to a numerical representation. Let the set of all alternative events be described as the frame of discernment X and let v ∈ X be an imprecisely known element whose true value is unknown. Then, a possibility distribution is defined by

π_v : X → [0, 1].

Alternatives x ∈ X that are assigned higher values are deemed more plausible. Alternatives with π_v(x) = 0 are considered impossible, and alternatives with π_v(x) = 1 are fully plausible. Possibility theory is strongly guided by the minimum specificity principle, which states that any alternative x not known to be impossible should not be disregarded [45]. Extreme cases of knowledge about v are total ignorance and complete knowledge. In the first case, ∀x ∈ X : π_v(x) = 1. In the case of complete knowledge, only one alternative is fully possible, and all others are impossible. A possibility distribution π_v(x) is said to be normal if ∃x ∈ X : π_v(x) = 1. The subset A ⊆ X for which ∀x ∈ A : π_v(x) = 1 is referred to as the core of π_v(x); the subset for which ∀x ∈ A : π_v(x) > 0 is referred to as the support. In the following, the shortened notation π(x) = π_v(x) is used.
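These basic notions can be sketched for a finite frame of discernment. The following minimal Python sketch (names, values, and the dictionary representation are illustrative assumptions, not from this article) checks normality and extracts the core and support of a discrete possibility distribution:

```python
# Discrete possibility distribution over a finite frame of discernment X,
# represented as a dict mapping alternatives to possibility degrees in [0, 1].

pi = {"x1": 0.0, "x2": 0.4, "x3": 1.0, "x4": 1.0}  # assumed example values

def is_normal(pi):
    """Normal: at least one alternative is fully possible (pi(x) = 1)."""
    return max(pi.values()) == 1.0

def core(pi):
    """Core: alternatives that are fully possible (pi(x) = 1)."""
    return {x for x, p in pi.items() if p == 1.0}

def support(pi):
    """Support: alternatives that are not impossible (pi(x) > 0)."""
    return {x for x, p in pi.items() if p > 0.0}

print(is_normal(pi))        # True
print(sorted(core(pi)))     # ['x3', 'x4']
print(sorted(support(pi)))  # ['x2', 'x3', 'x4']
```

Total ignorance corresponds to a distribution with all values 1; complete knowledge to a distribution whose core contains exactly one alternative and whose other values are 0.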
Let multiple information sources S = {S_1, . . . , S_n} each provide an information item I_i, i ∈ {1, . . . , n}, in the form of a possibility distribution π_i regarding the same imprecisely known element v ∈ X. A possibilistic fusion operator is then defined by fu : [0, 1]^n → [0, 1], and the fused possibility distribution is obtained as π^(fu)(x) = fu(π_1(x), . . . , π_n(x)). Multiple information sources allow the identification of even more impossible or hardly possible alternatives for the unknown v, resulting in more precise, more specific, and thus higher-quality information. In this sense, the goal in possibilistic fusion is to reach a maximal specific outcome (the most certain outcome possible), although possibility theory follows the minimum specificity principle. It is important that none of the available information is disregarded or neglected, that is, that every information source is considered by the fusion process (see also the fairness property postulated for fusion operators [6]). This fairness constraint reflects the minimum specificity principle stating that alternatives that are not known to be impossible are not to be ruled out [45].
Over time, multiple possibilistic fusion operators have been proposed, verified, and brought to applications. We propose to categorise these operators as follows:
• Possibilistic Pooling Fusion has mainly been advanced by Dubois et al. [4,48]. The aim of possibilistic pooling is to find the possibility degree for each alternative x. Hence, operators work on the grades of possibility (by applying fuzzy norms). Inside this framework, the choice of fusion rules is most often based on the state of knowledge about the reliability of the information sources involved. Depending on reliability and available knowledge, fusion operators are distinguished into conjunctive, disjunctive, and trade-off modes [32].
• Possibilistic Estimation Fusion was mainly devised and advanced by Yager [54]. In contrast to pooling, estimation operators are based on Zadeh's extension principle [55], which defines the application of mappings to fuzzy inputs. The goal of estimation is to find the result that is most compatible with all information items. Operators apply averaging functions on the frame of discernment X.
• Majority-guided Fusion identifies majority subsets, often based on consistency measures, and aggregates information from these subsets either exclusively or prioritised, similar to a voting procedure. Majority-guided fusion deliberately violates the fairness principle. It finds application in situations in which it is explicitly known that sources produce consistent readings, e.g., in redundantly engineered technical sensor systems [23]. The operators for majority-guided fusion are often based upon either pooling or fuzzy estimation, as is shown in detail in the following.

Possibilistic Pooling Fusion
Conjunctive and disjunctive fusion is most commonly performed using triangular norms (t-norms) and their counterparts, triangular conorms (s-norms), both stemming from fuzzy set theory. Triangular norms and conorms are functions t, s : [0, 1]^2 → [0, 1], which satisfy the properties of commutativity, associativity, and monotonicity [56]. For t-norms, 1 is the identity element, i.e., t(π, 1) = π. For s-norms, 0 is the identity element, i.e., s(π, 0) = π. Examples of t-norms are the minimum and the product operator. An example of an s-norm is the maximum operator. Although t-norms and s-norms are defined as binary functions, they can be directly applied to multiple possibility distributions because of their commutativity and associativity.
In conjunctive mode, it is presumed that sources agree at least partially about the possibility of alternatives, that is, their information items are at least partially consistent. Partially agreeing sources are characterised by items with h(I) > 0, that is, their possibility distributions have overlapping supports. Fully agreeing sources have items with h(I) = 1, i.e., their possibility distributions have overlapping cores. Conjunctive fusion of fully consistent information items is then achieved by directly applying a t-norm [48]:

π^(fu)(x) = t(π_1(x), . . . , π_n(x)).

As t-norms satisfy the strong zero preservation principle, i.e., t(π, 0) = 0, conjunctive fusion excludes all alternatives which at least one information source deems impossible. Conjunctive fusion results in the most specific outcome by eliminating alternatives. If information items are only partially consistent, then fusion based on t-norms results in subnormal possibility distributions. Renormalising the resulting possibility distribution leads to

π^(fu)(x) = t(π_1(x), . . . , π_n(x)) / h,  with h = max_{x ∈ X} t(π_1(x), . . . , π_n(x)),

which is only defined if sources are not completely disagreeing and their information items are not fully inconsistent, i.e., h ≠ 0 [48].
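For a discrete frame, conjunctive pooling with the minimum t-norm and the subsequent renormalisation can be sketched as follows (a hedged illustration; the dictionaries, names, and values are assumed, not taken from this article):

```python
# Conjunctive pooling with the minimum t-norm: pointwise minimum over all
# distributions, then renormalisation by the height h of the conjunction.

def conjunctive_fusion(distributions):
    """Fuse a list of dicts (x -> possibility) with the min t-norm."""
    xs = distributions[0].keys()
    fused = {x: min(pi[x] for pi in distributions) for x in xs}
    h = max(fused.values())  # consistency (height) of the conjunction
    if h == 0.0:
        raise ValueError("fully inconsistent items: renormalisation undefined")
    return {x: p / h for x, p in fused.items()}, h

pi1 = {1: 0.0, 2: 1.0, 3: 0.6, 4: 0.0}
pi2 = {1: 0.0, 2: 0.5, 3: 1.0, 4: 0.3}
fused, h = conjunctive_fusion([pi1, pi2])
print(h)      # 0.6 (items are only partially consistent)
print(fused)  # alternative 3 becomes fully possible, 2 maps to 0.5/0.6
```

Note that alternative 4, deemed impossible by one source, is excluded by the zero preservation of the t-norm, and the subnormal intermediate result is made normal again by dividing by h.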
Disjunctive fusion is appropriate if information items are completely inconsistent, i.e., sources disagree, at least one of them is wrong in its assessment, and it is not known which one. The disjunctive fusion is given by applying an s-norm:

π^(fu)(x) = s(π_1(x), . . . , π_n(x)),

keeping all available information. In general, purely disjunctive fusion is not desirable as it results in minimal specific outcomes but is necessary in disagreeing cases. Trade-off fusion modes combine conjunctive and disjunctive fusion depending on what is known (or assumed) about the reliability of sources. Prominent fusion rules can be found in the paper of Dubois and Prade [4]. For this paper, the most important of these are fusion based on maximal consistent subsets, quantified fusion, and adaptive fusion.
One prominent way to aggregate information in a two-step process is to search for maximal consistent subsets (MCS) [20,57]. These nonconflicting MCS are fused conjunctively prior to a disjunctive fusion of the intermediate results. Dubois et al. [58] proposed an algorithm that finds MCS with linear complexity. In this algorithm, all subsets of I with a consistency above or equal to α ∈ [0, 1] are clustered. Let I_MCS ⊆ I denote the MCS; then, MCS fusion is formalised for a possibilistic setting as [6]:

π^(fu)(x) = max_{I_MCS} min_{π_i ∈ I_MCS} π_i(x).   (6)

Later advancements in MCS fusion were proposed in multiple works [59][60][61]. Quantified fusion [62,63] is a similar two-step fusion process, which assumes that the number of reliable sources j is known. The quantified rule takes all subsets of information items I* ⊆ I with cardinality j and fuses these conjunctively in the first step. All intermediate results are then fused disjunctively:

π^(fu)(x) = max_{I* ⊆ I, |I*| = j} min_{π_i ∈ I*} π_i(x).   (7)

Adaptive fusion aims at progressing gradually from conjunctive to disjunctive behaviour as conflict increases. A simple adaptive fusion rule is

π^(fu)(x) = max( min_{i ∈ {1,...,n}} π_i(x) / h, min( max_{i ∈ {1,...,n}} π_i(x), 1 − h ) ).   (8)

It fuses all sources disjunctively (assuming one source is right) and discounts the result by (1 − h). In parallel, it fuses all sources conjunctively (assuming all sources are right), and both intermediate results are combined. This rule does not consider situations in which more than one or fewer than all sources are reliable. If many sources are fused, it is likely that h → 0, thus resulting in uninformative results [4]. Dubois' adaptive fusion rule [4,48] builds upon the quantified (7) and adaptive fusion rule (8), assuming that a minimum and maximum number of reliable sources are known. The minimum and maximum numbers are derived from the consistency of the information items I. The cardinality of the largest fully consistent subset gives the minimum number j⁻ = max{|I*| : I* ⊆ I, h(I*) = 1}; the largest partially consistent subset provides the maximum number j⁺ = max{|I*| : I* ⊆ I, h(I*) > 0}.
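The two-step structure of the quantified rule (conjunctive fusion of every subset of cardinality j, then disjunctive fusion of the intermediate results) can be illustrated on a discrete frame. The following Python sketch uses assumed names and values and brute-force subset enumeration, which is only feasible for small n:

```python
# Quantified fusion: max (disjunction) over all size-j subsets of the
# min (conjunction) of the subset's possibility distributions.
from itertools import combinations

def quantified_fusion(distributions, j):
    """Two-step quantified rule for j assumed-reliable sources."""
    xs = distributions[0].keys()
    intermediates = [
        {x: min(pi[x] for pi in subset) for x in xs}
        for subset in combinations(distributions, j)
    ]
    return {x: max(inter[x] for inter in intermediates) for x in xs}

pi1 = {1: 1.0, 2: 0.2, 3: 0.0}
pi2 = {1: 0.9, 2: 1.0, 3: 0.0}
pi3 = {1: 0.0, 2: 0.1, 3: 1.0}  # outlier source
print(quantified_fusion([pi1, pi2, pi3], j=2))
# The consistent pair {pi1, pi2} dominates: alternative 1 keeps degree 0.9
```

With j = 2 assumed reliable sources, the outlier pi3 is effectively outvoted: every subset containing it yields near-zero degrees, so the consistent pair determines the result.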
The adaptive fusion is then

π^(fu)(x) = max( π^(fu)_+(x) / h⁺, min( π^(fu)_−(x), 1 − h⁺ ) ),   (9)

in which π^(fu)_−(x) and π^(fu)_+(x) are obtained by quantified fusion (7) (with j⁻ and j⁺, respectively) and h⁺ = max_{I* ⊆ I, |I*| = j⁺} h(I*). In this way, completely disagreeing sources with fully inconsistent items (h = 0) are disregarded. However, small changes in the input possibility distributions may lead to significant changes in the fusion result [64].
Oussalah et al. [64] proposed changes to (9), improving the behaviour in the case of outliers and the robustness against small changes. For their progressive fusion rule, they introduced a distance measure with which the disjunctive part (π^(fu)_−(x)) is adapted. Let x_0, x_1 ∈ X be the smallest and largest elements of the consensus set; then,

d(x) = max(x_0 − x, x − x_1, 0)

measures the distance from a point x to the consensus set. Let α(x) = min(d(x)/d_0, 1) be a weighting factor. The threshold d_0 is the maximum distance up to which outliers are considered.
Then, the progressive fusion rule is

π^(fu)(x) = max( π^(fu)_+(x) / h⁺, min( max_{i ∈ {1,...,n}} π_i(x), α(x) ) ).   (10)

Instead of (9), (10) considers the completely disjunctive fusion of all information items. The degree to which disjunction is considered relies on d(x): the further x is from the consensus set, the more consideration is given to inconsistent items.

Possibilistic Estimation Fusion
Whereas pooling fusion aims at discarding alternatives, estimation fusion assumes that none of the sources is completely wrong and attempts to find a fusion result that is compatible with all information items [4]. Nonetheless, more specific or precise outcomes are still preferable. Estimation fusion has received less attention in the scientific community compared with pooling fusion (the higher number of citations of Dubois' paper [4] compared to Yager's paper [65] reflects this). Therefore, the following discussion takes a deeper look into the algebraic properties of estimation fusion.
Estimation fusion is based on Zadeh's extension principle, which allows mapping functions to be applied to fuzzy sets [66]. Let Y, Z be frames of discernment and F : Y → Z. Let A be a fuzzy set defined on Y and B a fuzzy set defined on Z; then, F maps the fuzzy membership function µ_A(y) with y ∈ Y to µ_B(z) with z ∈ Z by µ_B(z) = µ_A(F^{-1}(z)) = µ_A(y) with z = F(y). If multiple y result in the same z, then

µ_B(z) = max_{y ∈ F^{-1}(z)} µ_A(y).   (11)

In multi-source estimation fusion, the input possibility distributions are first pooled by a fusion function, referred to in this context as G. The result is then mapped by the multi-parameter function F(x_1, x_2, . . . , x_n) with x_i ∈ X_i, i ∈ {1, . . . , n} onto a new frame of discernment X, i.e.,

π^(fu)(x) = max_{(x_1,...,x_n) ∈ F^{-1}(x)} G(π_1(x_1), . . . , π_n(x_n)),   (12)

which notation is used in the following. The fusion rule in (12) does not fix the choice of G and F. Yager [65] proposed an estimation fusion rule in which G is the minimum operator and F is defined to be an averaging operator.
Yager's estimation fusion rule [65] is then:

π^(fu)(x) = max_{(x_1,...,x_n) ∈ F^{-1}(x)} min(π_1(x_1), . . . , π_n(x_n)).   (13)

The application of the minimum operator results in maximal specific possibility distributions, which are placed on an averaged frame of discernment. The disadvantages of estimation fusion are that (i) it requires a frame of discernment on which it is sensible to apply averaging operators and that (ii) estimation fusion may lead to fusion results that have been deemed impossible by all sources, i.e., the results do not satisfy the zero preservation principle [4]. Regarding the first disadvantage, it is often assumed that X ⊆ R [65], which is also assumed for the remainder of this section.
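On a small discrete frame, Yager's estimation fusion rule can be evaluated by brute force over all input tuples. The sketch below (assumed names and grid; G = min, F = arithmetic mean) is illustrative only and scales exponentially with the number of sources:

```python
# Brute-force estimation fusion on a discrete frame X ⊆ R: for every tuple
# of inputs, pool with G and place the degree at the tuple's mean F.
from itertools import product

def estimation_fusion(distributions, xs, G=min):
    n = len(distributions)
    fused = {x: 0.0 for x in xs}
    for tup in product(xs, repeat=n):
        x = sum(tup) / n                 # F: arithmetic mean of the tuple
        if x in fused:                   # keep only results on the grid
            degree = G(pi[xi] for pi, xi in zip(distributions, tup))
            fused[x] = max(fused[x], degree)
    return fused

xs = [0, 1, 2, 3, 4]
pi1 = {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0}  # source 1: "v = 1"
pi2 = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 0.0}  # source 2: "v = 3"
print(estimation_fusion([pi1, pi2], xs))
# With G = min, full possibility concentrates at the average, x = 2
```

Note that x = 2 was deemed impossible by both sources, illustrating disadvantage (ii): estimation fusion does not satisfy the zero preservation principle.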
If G is also an averaging operator, then a noteworthy interaction between estimation fusion and the frame of discernment takes place, which is relevant for practical implementations.

Proposition 1.
If G is an averaging operator other than the minimum operator and X ⊆ R, then fusion with (13) is influenced by the borders of X. More formally, min_{π^(fu)(x)>0} x depends on min_{x∈X} x, and max_{π^(fu)(x)>0} x depends on max_{x∈X} x.
Proof. Let x_a = min{x ∈ X | ∃i : π_i(x) > 0}, i.e., x_a is the smallest element in X for which at least one π_i > 0. Furthermore, let x_b = min_{x∈X} x. If G ≠ min, then, for at least one permutation of the n-tuple (x_b, . . . , x_b, x_a), G(π_1(x_b), . . . , π_n(x_a)) > 0. This n-tuple defines the minimum boundary of π^(fu), i.e., min_{π^(fu)(x)>0} x = F(x_b, . . . , x_b, x_a). The argument for the maximum boundary is analogous. □

An example of the effects of Proposition 1 is illustrated in Figure 2.

Figure 2. Interaction between estimation fusion (12) and X as discussed in Proposition 1. A frame of discernment X = [0, 10] and three possibility distributions are given. Each possibility distribution claims complete knowledge: π_1(x = 3) = 1, π_2(x = 5) = 1, and π_3(x = 7) = 1. The plots show fusion results (dashed red) in which F is the arithmetic mean and G is (a) the minimum, (b) the maximum, and (c) the arithmetic mean operator.

Corollary 1. If X is also unbounded and F is an averaging operator other than the minimum or maximum operator, then (13) results in an unbounded π^(fu). If X is half-bounded, then π^(fu) is also half-bounded.
Proof. From Proposition 1, it follows directly that, if F ≠ max, then lim_{max_{x∈X} x → ∞} max_{π^(fu)(x)>0} x = ∞; the argument for the lower bound with F ≠ min is analogous. □

Consequently, if G is an averaging operator other than the minimum operator, then it is reasonable to apply estimation fusion only on a bounded X. Otherwise, (12) and (13) lead to fusion results spanning to infinity, even for very precise input possibility distributions.

Majority-Guided Fusion
In essence, fusion rules that focus on and prioritise the consensus set (often also referred to as the majority observation) fall under the category of majority-guided fusion. Majority-guided fusion is particularly sensible in cases in which information sources are known to produce consistent items. Possibility distributions deviating from the consensus set are then deduced to be faulty (unreliable) instead of giving useful information about the unknown value v.
With this in mind, Dubois' fusion rule (9) already qualifies as a majority-guided fusion rule because it ignores all inconsistent information items (although this fact is precisely one of the main points of criticism by Oussalah et al. [64]). In the specific case of assuming fully reliable sources and expecting consistency between items, it is reasonable to rely on simpler fusion rules; accordingly, it was proposed to use a purely conjunctive fusion rule [23]. Similarly simple are counting fusion functions; here, the result is the alternative that most sources consider possible [5].
Estimation fusion rules, such as (13), favour the majority observation because of the averaging characteristic of the estimation operator F. A more complex majority-guided fusion rule, which is based on Yager's estimation fusion (13), was proposed by Glock et al. [67]: the majority-opinion-guided possibilistic fusion rule (MOGPFR). The MOGPFR replaces both the conjunctive fusion part G and the estimation operator F with the Implicative Importance Weighted Ordered Weighted Averaging (IIWOWA) operator. The IIWOWA operator, as proposed by [68], is an extension of the parent class of Ordered Weighted Averaging (OWA) operators [50]. An OWA operator allows weighting inputs with w = (w_1, . . . , w_n), w_i ∈ [0, 1], and ∑_i w_i = 1. Inputs π_i are ordered in descending order, resulting in the aggregation ∑_i w_i · π_(i). This allows the aggregation to be shifted between the minimum with w = (0, . . . , 0, 1) and the maximum with w = (1, 0, . . . , 0). In the MOGPFR, λ_IIWOWA(•) denotes the IIWOWA operator, and rel_i is the reliability of each source. The MOGPFR specifically allows the control of fusion by (i) a reliability vector, which discounts information items, and (ii) two weighting vectors, w_p and w_m, which control whether G and F are close to the minimum or maximum operator, respectively. The IIWOWA operator is defined only for inputs in [0, 1], which necessitates the fuzzification of X so that the possibility distributions π_i become membership functions µ_(i). The MOGPFR facilitates the prioritisation of information items belonging to the majority observation. The importance values v_i are determined by a distance function of π_i to the majority set; the possibility distribution π_i is discounted accordingly. The parameters w_p and w_m allow adapting the fusion towards conjunctive and disjunctive behaviour. The benefit gained by the MOGPFR lies in its level of control through parametrisation.
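The OWA mechanism underlying the IIWOWA operator can be sketched compactly. The following illustration (assumed names; plain OWA only, without the importance weighting and implication of the full IIWOWA operator) shows how the weight vector shifts the aggregation between minimum and maximum:

```python
# Plain OWA operator: sort inputs in descending order, then take the
# weighted sum with a weight vector that sums to 1.

def owa(values, weights):
    assert abs(sum(weights) - 1.0) < 1e-9  # weights must sum to 1
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

pis = [0.2, 0.9, 0.5]
print(owa(pis, [1.0, 0.0, 0.0]))  # 0.9 -> behaves like the maximum
print(owa(pis, [0.0, 0.0, 1.0]))  # 0.2 -> behaves like the minimum
print(owa(pis, [1/3, 1/3, 1/3]))  # equal weights -> arithmetic mean
```

Weight vectors between these extremes interpolate smoothly between disjunctive (max-like) and conjunctive (min-like) aggregation, which is the behaviour the MOGPFR exposes via its parameters w_p and w_m.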

Approach towards Topology Design
Associative fusion rules allow changing the sequence in which information sources are fused without altering the fusion result. Therefore, associativity is a beneficial property with regard to the topology design of distributed information fusion systems. Assuming associativity, a system designer or a design algorithm can focus on other criteria for designing a fusion system, such as the spatial availability of sources or the consistency and redundancy of sources. In this section, we analyse the presented fusion rules regarding the associativity property and its impact on topology design. Following this, a two-layer fusion topology based on the MCS fusion rule (6) is presented. Consistency as a design criterion both increases the specificity of fusion results due to the minimum operator [6] and facilitates source defect detection algorithms [21,22]. This motivates the focus on MCS fusion topologies in this article.
Some flaws and shortcomings of this consistency-based approach are discussed, which lead to several adjustments to overcome them, including the introduction of redundancy as a design criterion.
First, both fusion node and fusion topology are defined, and some notation is introduced:

Definition 4 (Fusion Node). A fusion node fn is a self-contained module encapsulating a fusion operator. A node takes information items as input and outputs a single fused information item. As a node is a self-contained module, a fusion node and its fusion operator have to satisfy the following additional properties:
• Modularity: A fusion node outputs a fused information item, which qualifies as a possibility distribution π (see Section 3), i.e., π is normal. This property allows self-contained intermediate results in a topology and makes fusion nodes modular. This increases the transparency of the distributed fusion topology.
• Self-Reproducing: Given a single input, a fusion node reproduces this input. It preserves its identity, i.e., fu(I) = I.
Idempotency as a property is not required since idempotency restricts the fusion node in the case where a reinforcement effect is desired (e.g., via the product operator as a t-norm). A fusion node with an associative fusion operator is beneficial since it allows splitting the fusion node.
A fusion node is a modular part of a fusion topology. In order to facilitate the fusion process of the grander topology, it may output auxiliaries denoted as [AUX]. Consequently, a node is also required to be able to process [AUX] as input if necessary.
Definition 5 (Fusion Topology). Interconnected fusion nodes build up a fusion topology. Fusion nodes may be interconnected in parallel, serially, hierarchically, in cascades, or in more complex structures. A fusion topology organises a feed-forward flow of information; recursive interconnections are excluded. A fusion topology is constructed in layers l ∈ N >0 . In each layer, fusion nodes are indexed consecutively with k ∈ N >0 . The k-th fusion node in layer l is denoted by fn (k,l) , its output information item by I (k,l) , and its auxiliary output by [AUX] (k,l) .
Given the above definitions, Figure 3 shows a three-layer example topology to help visualise the introduced notations.

Associativity
In possibilistic information fusion, the fusion process is rarely considered to be distributed. As a consequence, possibilistic fusion rules are often not associative, so that differently structured topologies can yield heavily altered fusion results. However, in works regarding possibilistic fusion, associativity has been considered with low priority at best and neglected at worst. For instance, associativity is described as a useful property by Dubois et al. [6]; however, its absence is not considered to be a fatal flaw.
As a first step in discussing associativity, the fusion rules presented in the previous section are summarised in Table 1 (proofs for nonassociativity in the case of the minimum-norm and associativity in the case of the product-norm are given by Dubois and Prade [47]). The table also shows whether the rules satisfy the following two properties:

Definition 6 (Associativity). A fusion operator fu is associative if the fusion outcome is independent of the sequence in which information items are fused, i.e., fu(I 1 , I 2 , I 3 ) = fu(fu(I 1 , I 2 ), I 3 ) = fu(I 1 , fu(I 2 , I 3 )).

Definition 7 (Quasi-associativity). A fusion operator fu is quasi-associative if it can be expressed as a sequence of associative steps and a final operation acting on the results of the previous associative steps [47]. Let f be an associative function and g be a function not restricted to the associativity property; then fu is quasi-associative if fu(I 1 , I 2 , . . . , I n ) = g( f (I 1 , I 2 , . . . , I n )).

Proposition 2. If a fusion operator is associative, then it is also quasi-associative.
Proof. Let I be a set of information items, let f = fu and g be an identity function: g(I) = I.
Then, g( f (I)) = fu(I)-that is, by making use of an identity function, an associative fusion operator becomes quasi-associative.
From this, it follows that, if a fusion rule is not quasi-associative, then it is also not associative. Associative rules allow unrestricted topology design in the sense that sources can be freely assigned to fusion nodes without changing the overall fusion result. Quasi-associative rules require a final centralised fusion step in which the nonassociative part is computed. The associative part can be distributed to fusion nodes.
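A concrete instance of this split is the arithmetic mean: merging partial (sum, count) states is associative and can be distributed over nodes, while the final division is the centralised nonassociative step g. A minimal Python sketch (names are illustrative):

```python
from functools import reduce

def f_merge(a, b):
    """Associative part f: componentwise addition of (sum, count) states."""
    return (a[0] + b[0], a[1] + b[1])

def g_final(state):
    """Final, non-associative step g: the division."""
    return state[0] / state[1]

def mean_quasi(items):
    """Arithmetic mean written as g(f(...)); the partial states can be
    merged in any grouping without changing the final result."""
    return g_final(reduce(f_merge, [(x, 1) for x in items]))
```

Any grouping of the partial states yields the same final mean, so only the division has to run in a central node.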
MCS fusion (6) is based on the idea that consistent information items are to be fused conjunctively first before the results are fused disjunctively. MCS fusion thus specifies a sequence in which information is to be fused. Consequently, MCS fusion is not associative. It is quite easy to see that different sequences result in different outcomes (see Appendix B for an example). Quantified fusion (7) has a similar approach, meaning that it fuses conjunctively and disjunctively in two steps. Quantified fusion is-for the same reasons as MCS fusion-not associative and not quasi-associative.
More sophisticated fusion rules-such as the adaptive (8), (9) and progressive (10) rules-attempt to make the most of all available information. These fusion rules rely on specific metrics, such as global consistency, consistency between specific subsets, or distances between information items. Many of these metrics are only computable if all information items are available centrally. Since all three rules (8), (9), and (10) are based on the quantified fusion rule, they inherit quantified fusion's nonassociativity.

MCS-Based Topology Design
In addition to relying on associative and quasi-associative rules, there is the third option to design a fusion topology and its fusion process based on the characteristics of the information items themselves. In this case, the possibility distributions of sources are analysed, which guides the design towards desired effects. In a sense, the information provided by the multi-source system dictates the topology.
One approach to do so is to build upon the MCS fusion rule (6). It itself is not quasi-associative, and thus information items cannot be freely assigned to fusion nodes. However, by carefully searching for all the most consistent subsets, fusion can be distributed in a way that each fusion node produces the most specific intermediate result from agreeing sources, thus emphasising the consensus of this agreeing subset. In such a two-layer topology, all I ∈ I MCS-α (k) are fused in separate fusion nodes fn (k) using, at the first level, a mix of renormalised conjunctive minimum fusion and maximum fusion, given by (15). At the second level, all intermediate results are fused disjunctively using the maximum operator. An exemplary fusion topology based on the MCS fusion rule is shown in Figure 4.
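A minimal sketch of such a two-layer topology in Python, assuming discrete possibility distributions represented as lists over X and a pure renormalised-minimum rule in the first-layer nodes (the actual node rule (15) mixes conjunctive and disjunctive behaviour and is not reproduced):

```python
def node_fuse(dists):
    """First-layer node: pointwise minimum, renormalised to height 1."""
    raw = [min(d[x] for d in dists) for x in range(len(dists[0]))]
    height = max(raw)
    return [v / height for v in raw] if height > 0 else raw

def mcs_topology_fuse(groups):
    """Two-layer topology: conjunctive fusion inside each node, then
    disjunctive (maximum) fusion of the intermediate results."""
    inter = [node_fuse(g) for g in groups]
    return [max(i[x] for i in inter) for x in range(len(inter[0]))]
```

For two consistent sources grouped in one node and a third, disjoint source in its own node, the fused result keeps the consensus peak of the first node and the alternative of the second.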
As MCS fusion analyses the consistency of information items, the inferred topology needs to be adapted for each new set of items. This is, particularly in a technical system, often not practical or feasible. Think, for example, of a technical multi-sensor system in which sensors give updated measurements in periodic time increments. In this case, the advantages of distributed fusion-such as the distribution of computing load into local nodes or lower communication loads by condensing information-are negated by the reorganisation with each measurement. Finding the MCS requires having all information items at hand in one central node, rendering the distribution of the fusion process pointless. Therefore, topology design based on MCS fusion is only beneficial if knowledge about the sources' expected behaviour regarding consistency exists a priori. In other words, if it is known that sources produce consistent items continually, then they are assigned to a fusion node without the need for an update with each new instance or measurement. This knowledge can be derived or learned from representative training data. Conclusions about the sources' consistency in the training data are used to build up the MCS fusion topology. Let S MCS−α (k) be a set of information sources that are assigned to fusion node fn (k) . Furthermore, let j = {1, . . . , m} be the indices of the training data, I (k),j be an information item produced by source S (k) at instance j, and I (k),j be the set of all information items of S MCS−α (k) at instance j. Condition (16) then requires that a source S (k) belongs to S MCS−α at least to a degree of α. MCS-based fusion nodes are then created by Algorithm 1, which is based on the algorithm provided for finding MCS [58,61]. Algorithm 1 starts with S and searches all MCS for the first data instance (j = 1). The found MCS are stored and themselves searched for new MCS for the next data instance, and so forth. Each resulting set S MCS−α (k) is assigned to fusion node fn (k) .
The algorithm relies on finding MCS of information items as defined by Dubois et al. [58,61].
Algorithm 1 takes a set of information sources S and an alpha-cut-level α as input and outputs a set of sets S h with the fusion node sets S MCS−α . For the following computations, the minimum consistency in each group is stored as a reference value: α r (k) = min j h(I (k),j ) (17). In an MCS fusion topology, which is learned from training data rather than updated for each j, it is not guaranteed that, for new data instances, the intermediate results I fu (k,1) are disjoint. Because of this, the maximum fusion rule of the final layer as described previously is replaced with (15). This means that, in the case that the topology is learned using Algorithm 1, all fusion nodes use the same fusion rule.
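The design step of Algorithm 1 can be sketched as follows; the greedy grouping used here is a simplified stand-in for the MCS search of [58,61], and all names are illustrative:

```python
def consistency(dists):
    """Consistency (2): height of the pointwise minimum."""
    return max(min(d[x] for d in dists) for x in range(len(dists[0])))

def find_groups(sources, items, alpha):
    """Greedily grow groups whose joint consistency stays >= alpha."""
    groups = []
    for s in sources:
        for g in groups:
            if consistency([items[t] for t in g] + [items[s]]) >= alpha:
                g.append(s)
                break
        else:
            groups.append([s])
    return groups

def design_topology(sources, training, alpha):
    """Refine the grouping instance by instance, as in Algorithm 1."""
    groups = [list(sources)]
    for items in training:  # items maps a source name to its distribution
        groups = [sub for g in groups for sub in find_groups(g, items, alpha)]
    return groups
```

Sources that stay consistent over all training instances end up in the same node; a source that becomes inconsistent at any instance is split off.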
Regarding the parameter α, the following observation leads to maximally specific fusion results at the first layer. If, for all j, the cores of the possibility distributions are disjoint, then fusion with MCS-1 is equal to maximum fusion [6]. Therefore, MCS-1 fusion demands continuous mutual consistency. In contrast, MCS-0 results in minimum fusion if, for all j, the supports overlap, and it is less restrictive.

Proposition 4. MCS fusion as outlined in (15) results in maximally specific information items if Algorithm 1 is executed with α = 0.
Proof. With decreasing α, the condition for grouping items into fusion nodes becomes less strict, as can be seen in (16). Thus, fusion node sizes increase with decreasing α. It follows that the maximum node sizes are achieved if α = 0. The more information items belong to a node, the more alternatives for the unknown true value are eliminated by the minimum operator in (6). Consequently, the integral of π(x) over [x a , x b ] inside the specificity measure (A2) becomes minimal if α = 0, and therefore the specificity (A2) itself becomes maximal.
Consequently, we propose designing MCS fusion with α = 0 to achieve maximal node sizes and maximally specific fusion results.
The approach presented in (16) and Algorithm 1 allows the transfer of the MCS fusion rule (6) to distributed fusion topologies. This is an alternative to designing topologies based on (quasi-)associative fusion rules, which are rare in a possibilistic setting. An MCS-based topology is aimed at producing maximally specific and precise fusion subresults. However, distributed MCS fusion lacks robustness in the case of nonrepresentative training data or defective sources, which is detailed in the next section.

Robustness
The MCS fusion topology based on consistencies in the historic training data is prone to unexpected inconsistencies in information items. Due to the minimum operator used in the first-level fusion nodes (see (15) and Figure 4), intermediate fusion results are altered significantly if items are less consistent than they are expected to be, that is, h < α r . Even in large groups of sources, a single information source producing an unexpectedly inconsistent item may change the outcome significantly. An example of such an occurrence inside a fusion node using α r = 1 is given in Figure 5. Unexpectedly inconsistent behaviour of reliable sources occurs in two situations.

• First, incomplete information and epistemic uncertainty in the training data may lead to prematurely assessing a group of sources as consistent. Information sources may produce different (in)consistent behaviours depending on the training data's true value and its position on the frame of discernment. Take, for example, a condition-monitoring scenario of a technical system in which sensors state the condition on a discrete frame of discernment X = {error1, error2, normal}. Two sensors may both detect two of the conditions (e.g., error1, normal); however, only one is able to detect the third condition (error2). If the training data do not include data regarding error2, then, with Algorithm 1, both sensors are falsely identified as consistent and grouped into a fusion node. If error2 occurs later, then the sensors behave unexpectedly inconsistently. This problem relates to spurious correlations in probability theory [70], i.e., in large datasets, it is particularly likely that correlations between variables are found incorrectly.
• Second, defective sources are a cause of unexpected inconsistent behaviour. Defective sources are sources that are trustworthy and therefore have a high reliability but nonetheless start to supply incorrect information [71]. Source defects appear in different forms: information can change suddenly, drift continuously or incrementally, or be characterised by an increasing number of outliers [72,73]. Countermeasures are majority-guided fusion rules as applied by Ehlenbröker et al. and Holst and Lohweg [21,23]. These require redundant and reliable sources in a fusion node.
In the following, we propose three adaptations to the distributed MCS-based fusion topology. These adaptations aim to increase the robustness of the topology in the case of incomplete training data and defective sources.
• Redundancy-Driven Topology Design: To counteract nonrepresentative training data, it must be ensured that information sources are not prematurely deemed to be consistent. For this, it must be analysed whether the consistent behaviour between sources extends over the entire frame of discernment. Therefore, instead of the consistency metric used in (16), the redundancy metric originally proposed in previous works [38,39] is adopted, which ensures that the complete frame of discernment is considered.
• Discounting Defective Sources: Grouping the information sources by consistency (or redundancy) eases the detection of defects [23,24]. Items detected as defective are discounted in the fusion node so that they have less influence on the output of the node. This requires an adjustment of the fusion rule (previously the minimum or maximum operator) in the nodes. This defect detection step explicitly exploits the distributed topology to its advantage and deliberately dismisses the associativity of the overall fusion.
• Estimation-Fusion-Based Nodes: Averaging information is a natural way to favour the opinions of the majority. Adopting estimation fusion in nodes results in more robust behaviour against defects-such as outliers-compared to purely conjunctive fusion as applied in (6).

Redundancy-Driven Topology Design
In previous work [39], a redundancy metric was proposed that introduces the notion of the range of a set of possibility distributions: Definition 8 (Range [39]). Given a frame of discernment X = [x a , x b ], the range of a set of possibility distributions p quantifies how far p stretches over X. Let P (p) be the power set of all possible p; then the range is described by a monotonic increasing function rge : P (p) → [0, 1].
The range determines whether a set of possibility distributions covers X. Together with the consistency measurement applied in (16), rge is adopted into the topology design approach. Consistency and range are balanced against each other, which results in a dual redundancy metric: Definition 9 (Possibilistic Redundancy Metric [39]). Let S = {S 1 , S 2 , . . . , S n }, i.e., a set of information sources, and P (S) be all possible combinations of sources, then a possibilistic redundancy metric ρ is a function that maps P (S) to the unit interval: ρ : P (S) → [0, 1]. Information sources are only redundant if their information items both (i) are redundant themselves and (ii) cover the frame of discernment, i.e., have a high range (Definition 8). In accordance with [39], the redundancy of information items is determined via possibilistic similarity measures. Consistency (2) satisfies the requirements to serve as a similarity measure [32].
In this context, and to qualify as an intuitively meaningful metric, the following requirements have to be met:
• Boundaries: A redundancy metric should be able to model complete redundancy and complete non-redundancy. It follows that ρ is minimally and maximally bounded. It is proposed that ρ ∈ [0, 1].
• Identity relation: An information source is fully redundant with identical copies of itself: ρ(S, S, . . . , S) = 1. Note that sources can be redundant without necessarily being identical.
• Symmetry: The metric ρ is a symmetric function in all its arguments, i.e., ρ(S 1 , S 2 , . . . , S n ) = ρ(S p(1) , S p(2) , . . . , S p(n) ) for any permutation p of {1, . . . , n}.
The following relations between the redundancy of information items and sources hold.
• If information sources are redundant, then they provide redundant information items. Consequently, ρ(S) increases as the redundancy of information items increases.
• Redundant information items do not necessitate that their information sources are also redundant. In cases of incomplete information, redundant information items may be a case of spurious redundancy (similar to spurious correlation).
To capture the idea of a dual metric, ρ is designed to be a function of two pieces of evidence. The evidence against redundancy is e c : P (S) → [0, 1]. As long as information items are redundant, e c (S) = 0. Determining the redundancy of information items is both based on the similarity of possibility distributions and related to the notion of possibilistic dependency. An overview of possibilistic redundancy measures for information items is provided by Holst and Lohweg [39]. Dependency measures are reviewed by Dubois et al. [74].
Evidence in favour of redundancy e p : P (S) → [0, 1] quantifies the amount of epistemic uncertainty in the training data. It incorporates the range of information, i.e., it indicates to what degree information is available from the complete frame of discernment. A set of information sources is only redundant if e p (S) > 0 and e c (S) < 1. The smaller value of e p and (1 − e c ) dominates the redundancy metric. In previous work [39], the geometric mean is proposed as an averaging function for e p and e c as follows: ρ(S) = ρ(e c (S), e p (S)) = √(e p (S) · (1 − e c (S))). (18)
Let the consistency measure h (2) determine the redundancy between information items and let I j be the set of information items available at instance j; then, in (19), e c averages the consistencies available from the training data with an averaging operator (see Definition 3). Designing MCS-based topologies via (16) is based on the notion that the consistency is above a certain α for all instances. To keep this notion for the redundancy-based design, the minimum operator is used as the averaging operator in (19).
The evidence e p is computed based on the range. The range itself is dependent on the position of the possibility distributions on the frame of discernment, which is determined by their centre of gravity [2]; if π(x) = 1 and ∀x′ ∈ X \ {x} : π(x′) = 0, the position coincides with x. The position of a set of possibility distributions p is obtained by prior disjunctive fusion (5), i.e., pos(p) = pos(fu(p)).
Given a set of information sources S = {S 1 , S 2 , . . . , S n } providing information items I j = p j = {π 1,j , π 2,j , . . . , π n,j }, at least one pair p j , p j′ of information item sets needs to range over the frame of discernment X in order to provide evidence for redundant behaviour, i.e., e p (S) > 0 if ∃j : rge(p j ) > 0.
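A sketch of the dual metric under two simplifying assumptions: e c is taken as one minus the minimum joint consistency over the training instances (min-averaging, as in (19)), and the range is approximated by the normalised spread of the fused items' centres of gravity. Both choices are stand-ins for the measures of [39]:

```python
import math

def consistency(dists):
    """Consistency (2): height of the pointwise minimum."""
    return max(min(d[x] for d in dists) for x in range(len(dists[0])))

def centre_of_gravity(dist):
    """Position of a discrete distribution over indices 0..|X|-1."""
    return sum(x * p for x, p in enumerate(dist)) / sum(dist)

def redundancy(training):
    """Dual redundancy metric: geometric mean of e_p and (1 - e_c)."""
    n_x = len(next(iter(training[0].values())))
    e_c = 1.0 - min(consistency(list(items.values())) for items in training)
    positions = []
    for items in training:
        # disjunctive fusion (5) before locating the set of items
        fused = [max(d[x] for d in items.values()) for x in range(n_x)]
        positions.append(centre_of_gravity(fused))
    e_p = (max(positions) - min(positions)) / (n_x - 1)
    return math.sqrt(e_p * (1.0 - e_c))
```

Sources that agree but are only ever observed in one region of X obtain e_p = 0 and hence ρ = 0: consistent, yet possibly only spuriously redundant.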
The redundancy metric ρ (18) is used as a decision criterion to find suitable sets of information sources S ρ (k) to be fused in fusion nodes fn (k) . Algorithm 2 describes a simple approach that searches all subsets of consistency-based fusion nodes in S h (found by Algorithm 1). A set of sources is only assigned to a fusion node if ρ ≥ η.
If ρ(S) ≥ η or |S| = 1, then S is added to the fusion topology (provided S is not already contained in S ρ ). Otherwise, all subsets of S with one source removed are created and appended to the list of candidate sets to be checked for redundancy; the search then continues with the next candidate. As motivated previously, the redundancy-based approach of Algorithm 2 results in a more robust MCS-based topology design than Algorithm 1. As (18) includes the range of information items, the effects of incomplete information and epistemic uncertainty in the training data are reduced. This leads to fewer detections of spurious relations.
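A sketch of Algorithm 2's breadth-first subset search, with ρ passed in as a black-box function; the handling of duplicates and of subsets of already accepted sets is simplified compared to the original algorithm:

```python
def redundancy_nodes(node_sources, rho, eta):
    """Keep each candidate set whose redundancy rho is >= eta (or that
    is a singleton); otherwise enqueue all one-smaller subsets."""
    accepted, queue, seen = [], [tuple(node_sources)], set()
    while queue:
        s = queue.pop(0)
        if rho(s) >= eta or len(s) == 1:
            if s not in accepted:
                accepted.append(s)
        else:
            for drop in s:
                sub = tuple(t for t in s if t != drop)
                if sub not in seen:
                    seen.add(sub)
                    queue.append(sub)
    return accepted
```

With a ρ that only deems {A, B} redundant, the full set {A, B, C} is rejected and replaced by its redundant subset plus singletons.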

Discounting Defective Sources
Information items that deviate from the expected level of consistency α r (17) are seen as unreliable and, consequently, are discounted in each fusion node. Therefore, the degree of reliability rel ∈ [0, 1] is determined with regard to α r . Let I be the information items fused in a node and I* be the largest subset in I which has (i) h(I*) ≥ α r and (ii) |I*| > 1; the reliability of each item is then determined via (23). In the case that there is no unique I* with h(I*) ≥ α r and at least two elements, all items are seen as fully reliable, and fusion needs to switch to disjunctive fusion.
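Finding I* can be sketched as an exhaustive search over subsets, which is feasible for the small node sizes considered here; the actual reliability assignment (23) is not reproduced:

```python
from itertools import combinations

def consistency(dists):
    """Consistency (2): height of the pointwise minimum."""
    return max(min(d[x] for d in dists) for x in range(len(dists[0])))

def largest_consistent_subset(items, alpha_r):
    """Largest subset I* with h(I*) >= alpha_r and |I*| > 1; returns
    None if no such subset exists or if it is not unique."""
    for size in range(len(items), 1, -1):
        hits = [c for c in combinations(range(len(items)), size)
                if consistency([items[i] for i in c]) >= alpha_r]
        if len(hits) == 1:
            return set(hits[0])
        if len(hits) > 1:
            return None  # not unique -> switch to disjunctive fusion
    return None
```

Items outside the returned index set would then be discounted before fusion.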
Information items' possibility distributions are modified prior to fusion so that they have a lesser effect on the fusion results [4,75]. A modification function for discounting information items has to satisfy the following requirements (extended from previous work [39]).

Definition 10 (Requirements for Information Item Modification).
As modification aims at changing fusion outputs, the requirements interact with the fusion rules to be applied on π:
• Information preservation: If rel(I) = 1, then the information must not be changed but instead preserved. Let π′ be a modified possibility distribution based on π. If rel(I) = 1, then π′ = π.
• Neutral element: If rel(I) = 0, then I needs to have no effect on the fusion. The item I then needs to act as a neutral element of the fusion operator fu, i.e., fu(I′, I) = fu(I′) for any other item I′.
Modification functions were proposed by Yager and Kelman [75], π′(x) = rel · π(x) + 1 − rel, and by Dubois and Prade [4], π′(x) = max(π(x), 1 − rel).
Both satisfy the requirements for modification only for conjunctive fusion. A general modification function for use with OWA operators was proposed by Larsen [68]. It is defined based on the andness degree and ∈ [0, 1] of the OWA fusion: π′(x) = and + rel · (π(x) − and). (24)
The OWA operator results in minimum fusion for and = 1 and in maximum fusion for and = 0. The OWA modification (24) introduces a global possibility level of and to the distribution π′. Because of this, the modification satisfies the neutral element requirement only if and = 1 or and = 0, but not for 0 < and < 1.
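The three modification functions can be stated directly in code; the distributions are again discrete lists over X:

```python
def modify_yager_kelman(pi, rel):
    """pi'(x) = rel * pi(x) + (1 - rel)  [75]"""
    return [rel * p + (1.0 - rel) for p in pi]

def modify_dubois_prade(pi, rel):
    """pi'(x) = max(pi(x), 1 - rel)  [4]"""
    return [max(p, 1.0 - rel) for p in pi]

def modify_larsen_owa(pi, rel, andness):
    """pi'(x) = and + rel * (pi(x) - and)  [68]"""
    return [andness + rel * (p - andness) for p in pi]
```

For rel = 0 the first two raise the whole distribution to the constant 1, the neutral element of minimum fusion; Larsen's variant reaches the neutral element of maximum fusion with andness = 0.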
All three modification functions raise the overall possibility level globally. As argued in previous work [39], this kind of modification function is counterintuitive if it is considered that defective or unreliable sources may err in their estimation of the unknown value v. An unreliable source may be only slightly incorrect; raising the possibility level globally cannot model such a situation. A modification function that widens or shrinks the possibility distribution is therefore proposed in (25) (adapted from previous work [39]). This modification considers both minimum and maximum fusion as they occur in the MCS-based fusion topology but does not approach a global modification. The reliability rel and the control parameter β ∈ R ≥1 define a vicinity around x, and the new possibility π′(x) is taken from this vicinity. This creates a widening or shrinking effect, respectively. The parameter β allows controlling the size of the vicinity and, thus, the extent to which rel alters π(x). The larger β is, the less effect rel has on π(x). If rel > 0 and β → ∞, then (25) has no widening or shrinking effect.

Estimation-Based Fusion Nodes
The third adaptation to increase the robustness of the proposed MCS-based fusion topology is to replace fusion in the first layer (15) with estimation fusion (13). In this way, defective sources have a lesser impact on the fusion result of a node.
Associativity needs to hold for first layer fusion nodes (see Figure 4) if multi-level fusion is to be achieved (splitting fusion nodes into smaller ones). Estimation fusion is only associative if G is associative and monotonic increasing and F is associative. In the proposed estimation-based fusion nodes, G is the minimum operator that satisfies associativity and monotonicity. The function F is defined to be an averaging operator, which is rarely associative, e.g., the arithmetic mean. Multi-level distributed fusion can still be achieved by using a fusion node's ability to output auxiliary information (see Definition 4).
If a node outputs the number of information items that contributed to its fusion result as a weight w, then a weighted arithmetic mean operator over these weighted items results in associative fusion. In the following, we refer back to the notation of fusion nodes as defined in Definition 4, i.e., I (k,l) denotes the set of information items that serve as input to fusion node fn (k,l) . To achieve associativity, a weight w (k,l) is assigned to the output of fn (k,l) , defined as the accumulated weight of the node's inputs. The distributed weighted average function allows splitting nodes without changing the fusion result. An overview of a distributed fusion topology based on estimation fusion rules is given in Figure 6.
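The count-weighted mean and its invariance under node splitting can be sketched on scalar estimates; each item is a (value, weight) pair, and a node's auxiliary output is its accumulated weight:

```python
def as_item(x):
    """Wrap a raw source value with unit weight."""
    return (x, 1)

def node_average(inputs):
    """Estimation-based node: weighted arithmetic mean of the inputs;
    the accumulated weight is passed on as auxiliary output."""
    w = sum(wi for _, wi in inputs)
    v = sum(vi * wi for vi, wi in inputs) / w
    return (v, w)
```

Splitting five sources into a two-source node and a three-source node and averaging the two node outputs (with their weights) reproduces the centralised mean.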
To keep the option of discounting defective sources, the weights w (k,l) are modified in the case that a defect is detected via (23): if rel(I (k,l) ) = 1, then the information is preserved; otherwise, if rel(I (k,l) ) = 0, the information item is completely discounted.

Remark on Multi-Level Fusion by Splitting Nodes
The MCS-based design approach describes a two-layer fusion topology by first fusing consistent or redundant information items conjunctively and then fusing the intermediate results disjunctively. In this context, multi-layer fusion can be achieved by splitting a single fusion node into multiple smaller ones. This may be beneficial if, e.g., communication or computational loads per node need to be optimised. While this approach of splitting is feasible due to the associativity of applied fusion rules, the ability of the fusion topology to detect and discount defective sources is reduced by doing so.
Discounting information items requires finding the unique largest subset of items whose consistency is greater than α r . If multiple sources are defective simultaneously, then-depending on the fusion node size-the largest subset may be made up of defective sources. In the worst case, the maximum number of defective sources a fusion node can handle is (n − 1)/2 [24], with n being the number of sources contributing to a fusion node. As the proposed discounting approach is node-specific, the ability of a node to discount defective sources is hindered by splitting nodes. The smaller n is, the smaller the maximum number of detectable defective sources. This has to be kept in mind when designing an MCS-based fusion topology.

Evaluation
The evaluation is structured into three parts, which focus on the computational complexity, the topology design approaches, and the robustness of distributed MCS fusion. Distributing information fusion is motivated-as outlined in Section 1-by the assumption that the computational load per distributed node is less than the load for a single centralised node. First, this assumption is examined for MCS and estimation fusion.
Subsequently, the computational complexity of design Algorithms 1 and 2 are discussed. Their performance and the effectiveness of the MCS-fusion adaptations (see Section 4.3) are then evaluated on selected real-world datasets.

Computational Complexity
The following evaluation of computational time complexity relies on the Bachmann-Landau notation f (n) = O(g(n)), which states that a function f (n) does not grow faster for n → ∞ than g(n). f (n) is therefore asymptotically upper bounded by g(n). O(g(n)) denotes the set of all f (n) such that there exist positive constants c and n 0 : f (n) ≤ c · g(n), ∀n ≥ n 0 [76].

Fusion Rules
In the following, we evaluate whether the computational load of MCS and estimation fusion are decreased by distributing, i.e., whether each fusion node in a distributed topology has a lower load compared to a single centralised node. For MCS fusion, it is assumed that the MCS have already been found, i.e., only (6) is considered.
As (6) consists exclusively of minimum and maximum operations, centralised MCS fusion is O(n) with n being the number of input information sources. In a distributed two-layer fusion topology, each fusion node has n f ≤ n input sources. First-layer nodes operate using renormalised minimum fusion; the final-layer node applies maximum fusion. Fusion in each node is therefore O(n f ). This simple observation shows that the computational load of distributed nodes is less than in centralised fusion-for reasonable MCS fusion topologies.
For estimation fusion, the situation is not as simple. Estimation fusion, as defined in (11), (12), and (13), iterates over every n-tuple (x 1 , . . . , x n ). Thus, the computational load increases exponentially with the number of inputs n.
Proposition 5. Let X * be the frame of discernment with the highest cardinality in {X 1 , . . . , X n }; then the complexity of the estimation fusion rule (11) is O(|X * | n · O(F) + |X * | n · O(G) + |X * | n ). If G is the minimum operator and F is the arithmetic mean operator, then the complexity is O(|X * | n ).

Proof. (11) is a combination of F, G, and the maximum operator. F and G need to be computed for each n-tuple (x 1 , . . . , x n ) for every x i ∈ X i , i.e., F and G are computed ∏ n i=1 |X i | times. The maximum operator is computed for each x ∈ X; its number of inputs is at worst ∏ n i=1 |X i |. In total, the complexity of (11) is bounded by |X * | n times the cost of evaluating F, G, and the maximum.
Therefore, the complexity of (12) relies on the complexities of G and F; however, it is safe to say that the growth |X * | n leads to issues in practical implementations. Unfortunately, in this case, the lack of scalability cannot be solved by distributing the estimation fusion over several nodes. Proposition 6. Let G be the minimum operator and F be an averaging operator as defined in (13). Assume a topology of fusion nodes using estimation fusion (13) exclusively, then fusion at the final fusion node in the last layer still grows exponentially, that is, has O(∏ n i=1 |X i |) or O(|X * | n ), respectively.
Proof. Looking at a single fusion node with n k inputs, F maps in the worst case each tuple (x 1 , . . . , x n k ) to a unique point x. Then, the size of the output's frame of discernment is ∏ n k i=1 |X i |. Let fn (k,l) be fusion nodes arranged in a topology so that the fusion topology outputs a single information item, i.e., there is a final fusion node fn (1,L) , L ∈ N + . Assume all n available information items are input into a fusion node exactly once. Then, the final node has to process 2 ≤ n final ≤ n input information items. The number of tuples to iterate is then ∏ n final k=1 |X k,L−1 |. In a two-layer topology, this product equals ∏ n i=1 |X i |, since every original item enters exactly one first-layer node. Thus, fusion at the final node has O(|X * | n ).
For estimation fusion, the number of elements in the frame of discernment grows with each fusion node. The final fusion node has to process in worst case |X * | n tuples, which is the same for centralised fusion.
Yager demonstrated [65] that, if all π i are convex and if X contains only real-valued ordered elements, then (13) (that is, G = min and F is an averaging operator) can also be computed via crisp-set α-cuts (28).

Definition 11. A possibility distribution π is said to be convex iff (1) each of its α-cuts A α is a single closed interval, i.e., A α = [a, b], and (2) all A α are nested, i.e., ∀α 1 > α 2 : A α 1 ⊆ A α 2 .

For each α-level, the crisp sets A α i are fused using the averaging operator F (29). The fused possibility distribution is then obtained by taking the maximum α-level over all fused cuts (30).

Proposition 7. The computational load of (28)-(30) grows linearly in the number of input possibility distributions n, the number of elements in X * , and the number of α-levels n α , i.e., (28)-(30) have in total O(n · |X * | · n α ).
In contrast to (13), the computational load is distributed over fusion nodes if (28)-(30) are distributed. Using α-cuts, neither |X| nor n_α grows with each fusion node; rather, they stay constant. Consequently, increasing the number of fusion nodes in a topology, which decreases the number of inputs per fusion node, reduces the computational load per node. In conclusion, both estimation fusion and MCS fusion profit from a reduced computational load per node if fusion is distributed.
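The α-cut computation (28)-(30) can be sketched on a discretised frame as follows, assuming convex input distributions sampled on a shared grid and an endpoint-wise arithmetic mean as the averaging operator F; grid, level count, and function names are illustrative:

```python
import numpy as np

def alpha_cut_fusion(dists, xs, n_alpha=100):
    """Fuse convex possibility distributions via alpha-cuts (28)-(30).

    dists: list of arrays pi_i sampled on the shared grid xs.
    Each alpha-cut of a convex distribution is a closed interval; F is
    applied endpoint-wise as an arithmetic mean. Cost is O(n * |X*| * n_alpha)."""
    alphas = np.linspace(1e-9, 1.0, n_alpha)
    fused = np.zeros_like(xs, dtype=float)
    for a in alphas:
        los, his = [], []
        for pi in dists:
            idx = np.where(pi >= a)[0]   # (28): alpha-cut as index range
            if len(idx) == 0:
                break                    # empty cut: level contributes nothing
            los.append(xs[idx[0]])
            his.append(xs[idx[-1]])
        else:
            lo, hi = np.mean(los), np.mean(his)   # (29): F on interval endpoints
            inside = (xs >= lo) & (xs <= hi)
            fused[inside] = np.maximum(fused[inside], a)  # (30): max alpha-level
    return fused
```

Because the grid xs and the number of levels n_alpha stay fixed across fusion nodes, distributing this computation over a topology keeps the per-node load bounded, in line with the discussion above.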

Fusion Topology Algorithms
Using (16) naively to search all possible subsets of a set of information sources S for fusion nodes is computationally demanding; such an approach grows exponentially in the number of sources n. The proposed Algorithm 1 presents a computationally faster approach. Proposition 8. Algorithm 1 for finding consistency-based fusion nodes has complexity O(m · n²) with n = |S| and m being the number of training data instances.
Proof. Algorithm 1 iterates over all training data instances j. For j = 1, it searches S for all MCS. As the algorithm of [58,61] grows linearly in n, this step is O(n). For each subsequent iteration with j > 1, it searches all MCS previously found at j − 1 again for MCS. The maximum number of found MCS is n, and the maximum number of sources belonging to an MCS is also n, i.e., each iteration at j > 1 grows with n². Consequently, Algorithm 1 is O(m · n²).
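For illustration, a simplified sketch of Algorithm 1 follows, assuming rectangular possibility distributions represented as intervals so that an MCS is a maximal set of sources whose intervals share a common point. The interval-sweep MCS search below is a stand-in for the algorithm of [58,61], and all names are illustrative:

```python
def maximal_consistent_subsets(items):
    """All maximal groups of sources whose intervals intersect at some point.
    items: dict mapping source -> (lo, hi)."""
    groups = set()
    points = sorted(p for lo, hi in items.values() for p in (lo, hi))
    for x in points:
        g = frozenset(s for s, (lo, hi) in items.items() if lo <= x <= hi)
        if g:
            groups.add(g)
    # keep only groups not strictly contained in another group
    return [g for g in groups if not any(g < h for h in groups)]

def consistency_nodes(training_data):
    """Sketch of Algorithm 1 (alpha = 0): keep only groups that stay
    consistent on every training instance.
    training_data: list of dicts mapping source -> (lo, hi)."""
    nodes = None
    for instance in training_data:
        if nodes is None:
            nodes = maximal_consistent_subsets(instance)  # j = 1: search all of S
        else:
            refined = []
            for g in nodes:  # j > 1: re-search only inside previous MCS
                refined.extend(
                    maximal_consistent_subsets({s: instance[s] for s in g}))
            nodes = refined
    nodes = set(nodes)
    return [g for g in nodes if not any(g < h for h in nodes)]
```

The outer loop over m instances and the bounded re-search inside previously found MCS mirror the O(m · n²) argument of the proof.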
The redundancy-based Algorithm 2 takes the fusion nodes found by Algorithm 1 as input. If an MCS does not meet the redundancy criterion, then Algorithm 2 searches within this MCS for the largest subsets with ρ ≥ η. Since the maximum size of an MCS is n, Algorithm 2 is O(2^n).
In contrast to the consistency-based algorithm, the redundancy-based version in its current implementation scales poorly with the number of sources. For practical implementations, this needs to be addressed in future works. In this regard, plausibility checks are promising as to whether subsets of S_MCS−α(k) can actually exhibit the required range; if they cannot, searching these subsets can be skipped entirely, saving computational time.
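The exponential subset search of Algorithm 2 can be sketched as follows, with the redundancy metric ρ passed in as a callable; its actual definition (18) combines the consistency and range evidences, which are omitted here, and the function names are illustrative:

```python
from itertools import combinations

def redundancy_nodes(mcs_nodes, rho, eta=0.6):
    """Sketch of Algorithm 2: within each consistency-based fusion node,
    keep the largest subsets whose redundancy rho meets the threshold eta.
    Enumerating all subsets makes this O(2^n) in the node size."""
    result = []
    for node in mcs_nodes:
        members = sorted(node)
        for size in range(len(members), 1, -1):  # try the largest subsets first
            hits = [frozenset(c) for c in combinations(members, size)
                    if rho(frozenset(c)) >= eta]
            if hits:
                result.extend(hits)
                break  # smaller subsets of a qualifying set are not needed
    return result
```

A plausibility check as suggested above would prune entire branches of this enumeration, e.g., by bounding the achievable range of a subset before evaluating ρ on it.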

Robustness
Fusion using the default MCS-based topology is prone to unexpected behaviour of information sources regarding their consistency (see Section 4.2). In the following, the MCS fusion design approach and topology are evaluated on selected real-world datasets regarding their robustness. First, consistency-based design is compared to the redundancy-based design approach. Following this, the adaptations of discounting and estimation fusion are evaluated. Implementation and data preprocessing are detailed to increase reproducibility.

Data Preprocessing
Several data preprocessing steps are performed before the implementation. These are necessary (i) to homogenise heterogeneous frames of discernment, (ii) to reduce the effects of noise (aleatoric uncertainty) on the fusion results and topology design, and (iii) to handle data that are not available as possibility distributions but rather as singular values or probability distributions. Preprocessing comprises the following three steps.

•	First, if data are singular values or probability distributions, they are transformed into possibility distributions. For this step, a singular value x is interpreted as a probability distribution with p(x) = 1 and ∀x' ∈ X \ {x} : p(x') = 0. The transformation is conducted by the truncated triangular probability-possibility transformation [49,77,78], resulting in π(x).
•	Second, sources providing noisy data are regarded as partially unreliable. Their possibility distributions are modified using (25) accordingly. Unreliability values for information sources are determined heuristically.
•	Third, the modified possibility distributions π(x) are mapped to a common, shared frame of discernment. This X is based on fuzzy memberships µ, i.e., X = [µ_a, µ_b]. This requires a fuzzy class to be defined to which µ(x) indicates the degree of membership of x. The class membership function µ(x) can either be provided by an expert or trained automatically [18,38,39]. Here, µ(x) is trained by the parametric unimodal potential function proposed by Lohweg et al. [79]: µ(x) = 2^{−((x̄ − x)/C_l)^{D_l}} for x ≤ x̄ and µ(x) = 2^{−((x − x̄)/C_r)^{D_r}} for x > x̄, (31) with x̄ being the arithmetic mean of the given training data x. The parameters are determined as follows: C_l = x̄ − min_{j∈{1,2,...,m}} x_j, C_r = max_{j∈{1,2,...,m}} x_j − x̄, and D_l, D_r ∈ N_{>1}. D_l and D_r are often determined empirically [21,80]; a training routine for D_l and D_r based on density estimations is given by Mönks et al. [81]. The possibility distribution π(x) is then mapped to π(µ) via the extension principle as follows: π(µ) = max_{x∈X : µ(x)=µ} π(x).
A detailed description and visualisations of these preprocessing steps are given in previous work [39]. Together, the preprocessing steps allow the proposed design algorithms to be applied even to heterogeneous, noisy, and nonpossibilistic data. Robustness against noise can additionally be increased by data filtering. However, since the parameters of (31) rely on the minimum and maximum values of the training data, applying a filter directly to the training data x would distort the borders of the unimodal potential function. For this reason, the memberships µ(x), instead of the raw data, are filtered in the preprocessing.
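The membership training in the third step can be sketched as follows, assuming the piecewise form of the unimodal potential function (31) with the parameters C_l, C_r, D_l, D_r as defined above; this is a sketch, not the authors' reference implementation:

```python
import numpy as np

def train_potential_function(x_train, D_l=2, D_r=2):
    """Fit the unimodal potential function from training data:
    C_l and C_r are the spreads from the mean to the extreme values."""
    x_bar = float(np.mean(x_train))
    C_l = x_bar - float(np.min(x_train))
    C_r = float(np.max(x_train)) - x_bar

    def mu(x):
        x = np.asarray(x, dtype=float)
        left = 2.0 ** (-((x_bar - x) / C_l) ** D_l)    # branch for x <= mean
        right = 2.0 ** (-((x - x_bar) / C_r) ** D_r)   # branch for x > mean
        return np.where(x <= x_bar, left, right)

    return mu
```

By construction, µ equals 1 at the training mean and 0.5 at the training minimum and maximum, so unseen values outside the training range receive rapidly decreasing memberships.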

Nonrepresentative Training Data
The effects of nonrepresentative training data on the consistency-based MCS topology design and on the redundancy-based design are evaluated. The consistency-based topology is obtained by Algorithm 1 with parameter α = 0, as argued in Proposition 4. Its redundancy-based counterpart is obtained by Algorithm 2. To ensure highly redundant information sources in fusion nodes, the parameter η is set to 0.6, i.e., sources are added to a fusion node only if their redundancy is greater than or equal to η.
Both design approaches are applied to the Sensorless Drive Diagnosis (SDD) dataset [82,83], a multi-class classification dataset. (The SDD dataset is available for download at the University of California Machine Learning Repository [84].) Nonrepresentative training data are simulated by withholding data of certain classes from the design algorithms, creating a situation of epistemic uncertainty.
For the creation of the SDD dataset, an electromechanical drive was monitored to detect faulty system behaviour. The data comprise features obtained from phase-related motor currents and voltages; each feature serves as an information source in this evaluation. The dataset is particularly interesting because (i) it contains highly noisy data and (ii) the data are often linearly or nonlinearly correlated and thus potentially redundant. The SDD dataset contains 11 classes in total, of which class 1 represents healthy system behaviour (henceforth referred to as the normal condition). All other classes represent various fault states, such as gear or bearing damage.
The design algorithms are executed on two subsets of the dataset. First, only data belonging to the normal condition build a reduced training dataset. This reduced set manifests epistemic uncertainty. It is nonrepresentative with regard to the complete behaviour of information sources. For comparison, the second subset is constructed to include all data, i.e., the complete dataset serves as training data.
Regarding the preprocessing steps, the unimodal potential function (31) is trained on the normal condition with parameters D_l = 2 and D_r = 2. To account for the noise in the dataset, possibility distributions are modified with the reliability parameters ∀S ∈ S : rel(S) = 0.9 and β = 1. Additionally, memberships are smoothed with a moving average filter using a window size of 5. As the SDD dataset provides data as singular values, the preprocessing steps result in rectangular possibility distributions.
The following behaviour is expected from the topology design approaches, which helps in verifying their output:
•	For the consistency-based approach, fusion nodes trained on complete data are expected to be smaller than or of equal size compared with nodes trained on reduced data. More specifically, ∀k, ∃k' : S_MCS−α(k),complete ⊆ S_MCS−α(k'),reduced because (16) requires the consistencies of all data instances to be above the threshold α.
•	Sources grouped by the redundancy-based approach S_ρ(k) are expected to always be a subset of at least one consistency-based group S_MCS−α(k'), i.e., ∀k, ∃k' : S_ρ(k) ⊆ S_MCS−α(k'), because the redundancy metric (18) is more restrictive than pure consistency. The additional range information (22) prevents sources from being added to a fusion node when it is not known that they behave consistently over the complete frame of discernment.
The results of Algorithms 1 and 2 are shown in Tables 2 and 3, respectively. Both tables show the fusion nodes found for the first layer of the two-layer fusion topology. Fusion nodes are shown both for reduced and complete training data along with the redundancy ρ (18), the range evidence e_p (20), and the inconsistency evidence e_c (19).
The results in Table 2 show that the MCS-based topology meets the expectation regarding fusion node sizes. Furthermore, each set S_MCS−α(k),complete is a subset of at least one S_MCS−α(k'),reduced, e.g., S_MCS−α(1),complete ⊂ S_MCS−α(7),reduced. It is also notable that, especially but not exclusively on reduced data, some sources occur in many fusion nodes; this relates, for example, to sources 25 and 37. Sources with little informative value are likely to be consistent with other sources because they provide possibility distributions that are wide or even close to total ignorance. Indeed, sources 25 and 37 both provide large possibility distributions covering a significant part of the frame of discernment. Lastly, no fusion node based on complete data is identical to a fusion node based on reduced data (which is different in the following redundancy-based approach); the fusion nodes differ significantly. This means that nonrepresentative data limit the performance of the consistency-based approach substantially, i.e., because epistemic uncertainty is not considered by Algorithm 1, fusion nodes are inflated with spuriously consistent information sources.
The results of the redundancy-based approach (Table 3) also meet the expectations formulated beforehand, i.e., ∀k, ∃k' : S_ρ(k) ⊆ S_MCS−α(k'). In contrast to the consistency-based approach, sources with little informative value (e.g., sources 25 and 37) are not part of fusion nodes. The computation of the range (22) penalises wide possibility distributions. This is because of the disjunctive fusion prior to computing the position of a set of distributions (21): sets including information items close to total ignorance are given a position close to 0.5, resulting in low range values and hence low redundancies.
Similar to the consistency-based approach, the size of the fusion nodes decreases from reduced to complete training data. This shows that the redundancy-based approach is not able to rule out all sets showing spurious redundancy. However, the majority of nodes learned on complete data are identical to nodes learned on reduced data; this is true, e.g., for sets such as {10, 11, …}, with further nodes coming close. This shows that the redundancy-based approach finds significant sets despite nonrepresentative training data.

Table 2. Fusion nodes and their contributing information sources as designed by Algorithm 1 with parameter α = 0. Grouped information sources are consistent for all instances of the training data (see metric e_c (19)). The left side shows fusion nodes found on reduced, highly epistemically uncertain training data, i.e., only data of the class stating the normal condition were available. The right side shows nodes found on complete data. Fusion node sets on reduced training data do not meet the required redundancy threshold (i.e., ρ < η), which is due to the low range-based evidence e_p (20). Information sources are numbered as provided by the SDD dataset [82,83]. Fusion nodes with fewer than two information sources are omitted. In total, 24 fusion nodes were found on reduced data and 28 on complete data.

Therefore, the redundancy-based approach copes better than the consistency-based approach in situations with high epistemic uncertainty because the evidence e_p (20) quantifies epistemic uncertainty. Nonetheless, it is advisable to update and adapt the fusion nodes and topology with newly available data; this reduces the risk of nodes with spurious redundancy.

Figure 7 depicts scatter plots of selected information sources to visualise the shortcomings of the consistency-based approach and to show the effects of epistemic uncertainty. Information items may be close to each other, and therefore consistent, for parts of the training data (see plots (a), (b), and (c)). This is indicated by the fact that the positions of the items are clustered in the upper right corners for reduced training data. This does not mean that consistent behaviour carries over to the complete data (which is only true for (c)).

Table 3. Fusion nodes and their contributing information sources as designed by Algorithm 2 with parameters α = 0 and η = 0.6. Grouped information sources are consistent for all instances of the training data and range over a significant part of the frame of discernment. The left side shows fusion nodes found on reduced, highly epistemically uncertain training data. The right side shows nodes found on complete data. Information sources (features) are numbered as provided by the SDD dataset [82,83]. Fusion nodes with fewer than two information sources are omitted. In total, 29 fusion nodes were found on reduced data and 31 on complete data.

Figure 7. Information items of selected information sources belonging to reduced training data (green) and complete training data (blue). The data belong to the Sensorless Drive Diagnosis dataset [82,83]. Subplot (a) shows information sources (features) {1, 5}, (b) {25, 10}, and (c) {43, 45}. Each point in the scatter plots represents the position or centre of gravity of a possibility distribution obtained by (21). The possibility distributions of a single pair are plotted below each scatter plot to give an intuition about the size of the distributions. In the case of reduced training data, information sources (a) {1, 5} and (b) {25, 10} belong to fusion nodes in the consistency-based approach (see Table 2) but not in the redundancy-based approach (see Table 3). Without the additional information provided by the range metric (22), the consistency-based approach considers sources that turn out to be inconsistent on complete training data. Sources (c) {43, 45} are given as an example in which information items are consistent over the complete training data; both the consistency-based and the redundancy-based approach consider {43, 45} in fusion nodes. Note that the scatter plot in (a) is zoomed in for better visibility.

Defective Sources
Regarding defective sources, two adaptations to the MCS topology were proposed in this paper. Both adaptations, (i) discounting defective sources and (ii) estimation-fusion-based nodes, were evaluated on data with purposely engineered source defects.
The Typical Sensor Defects (TSD) dataset [21] provides such defective sources. (The TSD dataset is available for download at https://zenodo.org/record/56358 (accessed on 9 March 2022).) The TSD dataset contains data of a storage container for hazardous and flammable materials monitored by, e.g., temperature, smoke, and gas sensors. The dataset comprises several files, each of which includes a specific simulated source defect, such as incremental drift or outlier readings. For this evaluation, the files "data_standard.csv" and "data_drift_0_001.csv" are used.
The first file provides unaltered data without defects. The second contains the same data except that a temperature sensor (feature 15) drifts with 1‰ h−1 of its base value. Regarding preprocessing, the parameters for the unimodal potential function (31) are provided as metadata in the dataset. As the data are hardly affected by noise, all sources are considered fully reliable, and no averaging filter is applied. Data are provided with an error margin of ±2% of the sensor's measurement range [21], creating a uniform probability density function. Thus, preprocessing results in triangular possibility distributions.
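For a uniform density on [x − a, x + a], such as the one induced by the ±2% error margin, the probability-possibility transformation yields a triangular possibility distribution peaking at the measured value x and reaching 0 at the margins. This is sketched below on a discrete grid; names and grid are illustrative:

```python
import numpy as np

def uniform_to_triangular(x, margin, grid):
    """Possibility distribution induced by a uniform density on
    [x - margin, x + margin]: pi(y) = 1 - |y - x| / margin, clipped to [0, 1].
    pi peaks at x and reaches 0 at the interval borders."""
    pi = 1.0 - np.abs(np.asarray(grid, dtype=float) - x) / margin
    return np.clip(pi, 0.0, 1.0)
```

This triangle is the most specific possibility distribution dominating the uniform density, which is the rationale behind the triangular distributions mentioned above.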
The fusion topology is learned on unaltered data using the consistency-based approach of Algorithm 1, again with α = 0. This creates three fusion nodes fn_(1), fn_(2), and fn_(3) on the first layer; their fusion results are fused at the final node fn_(1,2) using MCS fusion (6). For the first-layer nodes, the following fusion rules are used and evaluated:
•	renormalised conjunctive fusion based on (15),
•	discounted renormalised conjunctive fusion extending (15) with (23) and (25),
•	estimation fusion (13), and
•	weighted estimation fusion (27).
To quantify robustness, the similarity between the fusion results on the unaltered and the drift-affected data is computed, with similarities sim ∈ [0, 1] and sim = 1 indicating full similarity. Table 4 lists the minimum, arithmetic mean, and maximum values of the computed similarities for fn_(2) and fn_(1,2). High similarities show robust behaviour against the defective source. As fn_(1) and fn_(3) contain no defective sources, they are omitted from the table. The results show that renormalised conjunctive fusion, which is the default rule in MCS fusion, was affected the most by the drifting source. Measures against defective sources are therefore reasonable.
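The concrete similarity measure is not reproduced here; a plausible stand-in mapping two discrete possibility distributions to sim ∈ [0, 1] is a Jaccard-style overlap. This is an assumption for illustration, not necessarily the measure used in the evaluation:

```python
import numpy as np

def similarity(pi_a, pi_b):
    """Jaccard-style similarity between two possibility distributions on a
    shared discrete frame: 1.0 for identical distributions, 0.0 for
    distributions with disjoint supports. (Hypothetical stand-in measure.)"""
    num = float(np.minimum(pi_a, pi_b).sum())
    den = float(np.maximum(pi_a, pi_b).sum())
    return 1.0 if den == 0.0 else num / den
```

Any measure of this kind would show the pattern reported in Table 4: a drifting source pulls the fused distribution away from the reference result, lowering the minimum and mean similarity.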
The approach of detecting inconsistent possibility distributions and discounting them by widening improved the robustness slightly but not substantially. This ineffectiveness has two reasons. First, widening with (25) does shift the fusion result toward reliable sources but does not guarantee that the original fusion result is restored; it is reasonable to assume that the parameter β has a substantial impact, which needs to be investigated in further works. Second, a drifting possibility distribution may actually drift into other possibility distributions, creating a false most consistent subset in the process.
This may lead to situations in which the wrong source is discounted. It is assumed that the risk of this happening decreases with the number of sources in a fusion node.
Estimation fusion nodes, on the other hand, showed a significant increase in robustness, evidenced by the higher minimum and mean values. Weighted estimation fusion demonstrated the best performance. Due to its averaging nature, estimation fusion reduces the effects of defective sources more effectively the higher the number of sources.

Table 4. Similarity between fusion node outputs on the unaltered (standard) dataset and the drift-affected dataset. The table shows the minimum, arithmetic mean, and maximum of the similarities computed on each data instance. The drift-affected source belongs to fn_(2); therefore, fn_(1) and fn_(3) are not explicitly listed. Similarity is increased by the proposed countermeasures against defective sources.

Conclusions
Choosing a topology is one of the main challenges in information fusion system design. Associativity, consistency, and redundancy play key roles in the performance of a topology. In this article, we detailed and discussed a data-driven design approach resulting in a two-layer topology inspired by MCS fusion. Due to the associativity of fusion rules in the first layer nodes, the topology can be extended to multiple layers without affecting the fusion results.
The basic design approach relies on the consistency of information items to find MCS nodes. The resulting consistency-based topology was susceptible to unexpected behaviour from information sources caused by unrepresentative training data or defective sources. We proposed adaptations to the basic design comprising the inclusion of a redundancy metric, the automated discounting of defective sources, and the application of outlier robust estimation fusion.
In the evaluation, we demonstrated that the redundancy-enhanced design resulted in more robust topologies in the case of epistemic uncertainty. Furthermore, evaluation showed that discounting defective sources and estimation fusion reduced the effects of defective sources. Estimation fusion outperformed the discounting approach in this regard mainly because, in certain situations, the discounting approach incorrectly identified sources as defective. Further work is required to improve this.
While the consistency-based approach found MCS in linear time regarding the number of sources and the number of data instances, the redundancy-enhanced version searched the power set of all MCS. Although ∀k : |S_MCS−α(k)| ≤ |S| and although, in practical applications, it is reasonable to assume ∀k : |S_MCS−α(k)| ≪ |S|, the scalability of the redundancy-based approach needs to be improved in further works. Another topic that should be addressed in further works is adapting the design approaches so that they are able to update a topology on streamed data. With new data becoming available, the epistemic uncertainty is reduced; updating a topology has the potential to improve the fusion results continuously in small steps.

Figure A1. Three possibility distributions fused by the quantified fusion rule (7).