Modeling of Nonlinear Aggregation for Information Fusion Systems with Outliers Based on the Choquet Integral

Modern information fusion systems essentially associate decision-making processes with multi-sensor systems. Precise decision-making processes depend upon aggregating useful information extracted from large numbers of messages or large datasets; meanwhile, the distributed multi-sensor systems which employ several geographically separated local sensors are required to provide sufficient messages or data with similar and/or dissimilar characteristics. These kinds of information fusion techniques have been widely investigated and used for implementing several information retrieval systems. However, the results obtained from the information fusion systems vary in different situations and performing intelligent aggregation and fusion of information from a distributed multi-source, multi-sensor network is essentially an optimization problem. A flexible and versatile framework which is able to solve complex global optimization problems is a valuable alternative to traditional information fusion. Furthermore, because of the highly dynamic and volatile nature of the information flow, a swift soft computing technique is imperative to satisfy the demands and challenges. In this paper, a nonlinear aggregation based on the Choquet integral (NACI) model is considered for information fusion systems that include outliers under inherent interaction among feature attributes. The estimation of interaction coefficients for the proposed model is also performed via a modified algorithm based on particle swarm optimization with quantum-behavior (QPSO) and the high breakdown value estimator, least trimmed squares (LTS). From simulation results, the proposed MQPSO algorithm with LTS (named LTS-MQPSO) readily corrects the deviations caused by outliers and swiftly achieves convergence in estimating the parameters of the proposed NACI model for the information fusion systems with outliers.


Introduction
In the modern world, to make optimum decisions in economics, industry, science, aeronautics, manufacturing, traffic control, and many other military and civilian applications we are extremely dependent on useful and crucial information which is drawn from messages or data via transformation, classification and/or some other processing. Therefore, multi-sensor systems providing these messages or data are becoming increasingly important in meeting the goals of optimum decision-making. Besides, a feasible model to elaborate on information fusion and a soft computing technique to perform the heavy computations required are also critical.
Traditionally, the most common models are the weighted average model and the linear regression model. These models are linear and assume that there is no interaction among feature attributes (i.e., input information). However, in many real-world systems, the inherent interaction among feature attributes must be considered carefully, and such systems are essentially non-additive. Hence, a nonlinear aggregation based on a nonlinear integral (NANI) model with respect to a non-additive set function is a powerful way of coping with these systems. The Choquet integral is the most frequently used nonlinear integral, and several works have proposed its use [1][2][3][4]. Liu et al. [1] proposed a NACI model derived from one of the following three kinds of fuzzy supports: the bespoke fuzzy support, the sample relative fuzzy support and the response correlative fuzzy support. This model deals with the interaction among feature attributes based on statistical correlation. Wang et al. proposed the original [2] and weighted [3,4] NACI models to deal with information with numerical and categorical feature attributes, respectively. In fact, the weighted NACI model is a generalized form of the original one. In these two models, the interaction among the feature attributes toward the objective attributes (i.e., outputs) is described by non-additive set functions and is essentially derived from statistical correlation. Although the weighted NACI model successfully describes the interaction among hybrid feature attributes, it requires more parameters to be estimated than the original NACI model: for a system with n-dimensional feature attributes, $2^n + n$ parameters must be determined, so the number of parameters increases exponentially with the dimension of the feature attributes.
Finding these parameters is essentially an optimization problem, and the basic idea is to make the residuals as small as possible. Residuals are defined here as the differences between what is actually observed and what is estimated. To minimize residuals, the least squares (LS) method is traditionally used, and it typically achieves a good estimate when all attributes are uncontaminated. Unfortunately, in real-world applications the feature and objective attributes are often subject to outliers, which may arise for various reasons, such as erroneous measurements or data with a heavy-tailed distribution. Whenever outliers exist, they can cause a serious deviation in the estimate. In the outlier detection literature [5][6][7], the least trimmed squares (LTS) estimator and the least median squares (LMS) estimator are the most popular ways of eliminating the effects caused by outliers. The LTS estimator not only possesses a high breakdown value but also has several advantages over the LMS estimator; therefore, in this study we focus on the LTS estimator to eliminate the interference from outliers. A feasible model that is able to effectively reject outliers is thus also a contribution of this paper to the fuzzy integral problem.
Once a feasible model has been chosen, the next challenge is to estimate the model's parameters efficiently and swiftly so that they satisfy specific criteria. That is, a timesaving soft computing technique is necessary for an information fusion system with contaminated attributes. In the literature, there are many outstanding soft computing techniques that qualify for this task, such as neural networks (NN) [8], genetic algorithms (GA) [9] and ant colony optimization (ACO) [10]. Particle swarm optimization with quantum behavior (QPSO), which is an improved version of the traditional particle swarm optimization (PSO) [11], is another powerful choice [12][13]. In the QPSO algorithm, particles are bounded in the search range just as electrons move in a quantum well; meanwhile, according to the uncertainty principle, a particle's position and velocity cannot be determined simultaneously. Hence, the information of a particle in quantum space is described by probabilities (i.e., a wave function), and the dynamic behavior of a particle is widely divergent and governed by the Schrödinger equation. The QPSO algorithm ensures the congregation of the particle swarm without losing randomness: a particle can appear at any position of the whole search space with a certain probability. This algorithm offers high performance in single-mode systems because of its swift convergence. However, in multimode optimization systems particles usually fall into local extrema and then converge prematurely. In order to retain the merit of quick convergence while overcoming the premature convergence of the traditional PSO, we proposed a QPSO algorithm with the elitist crossover mechanism of the GA (named MQPSO) in our previous work [14] and demonstrated performance superior to that of the GA in estimating model parameters. In this paper, we improve the MQPSO algorithm proposed in our previous work so that it can handle systems with outliers.
That is, the mechanism of the LTS estimator is introduced to eliminate deviations caused by outliers and to enhance the robustness of the MQPSO algorithm. To distinguish it from the earlier version, the revised MQPSO algorithm is named LTS-MQPSO. The most significant improvement is that the LTS-MQPSO algorithm combines the concepts of simulated annealing (SA) and the GA within the QPSO algorithm to achieve global search and to overcome premature convergence in the optimization process, respectively; meanwhile, the LTS estimator is applied to eliminate the interference from outliers. In order to verify the proposed LTS-MQPSO algorithm, a numerical example is also presented in this study. The experimental results show that the proposed LTS-MQPSO algorithm is able to acquire reasonable parameters for the NACI model and to make quite precise decisions.
The rest of the paper is organized as follows. Section 2 introduces the NACI model and characterizes the information fusion system. Section 3 briefly describes the least trimmed squares estimator and the QPSO algorithm. Section 4 presents the LTS-MQPSO algorithm in detail. Section 5 shows the results of the numerical simulation, and Section 6 concludes the paper.

The NACI Model and Information Fusion System Characterization
In traditional linear aggregation, the model most frequently used to describe the relation between the feature attributes X and the objective attribute Y is the Lebesgue-like integral [15] of Equation (1). It consists of a constant term, a scaling factor applied to the integral, the integrand f, which represents observations of the feature attributes X, an additive measure $\mu$, which indicates the relative contribution of each element of the feature attributes, and an error term $e_r$, which is a normally distributed random perturbation with zero mean and variance $\sigma^2$. This linear model gives a good approximation under the fundamental assumption that there is no interaction among feature attributes. However, in many real-world systems, the inherent interaction among feature attributes must be considered carefully. To reasonably describe this interaction, Wang and Klir [16,17] proposed a regular non-additive set function $\mu$ named the normalized general measure (NGM). The NGM is defined on the power set of the feature attributes and is normalized so that $\mu(\emptyset) = 0$ and $\mu(X) = 1$. In addition, a nonlinear integral is introduced to aggregate the feature attributes. That is, whenever we deal with information fusion systems in which the information possesses inherent interactions, the nonlinear integral with respect to the NGM is the most reasonable tool. In practical applications, there are many kinds of nonlinear integrals, such as the Choquet integral [18], the Sugeno integral [19] and the Wang integral [20]. The Sugeno integral is, by definition, similar to logical operations and thus is not an extension of the Lebesgue-like integral. Although the Sugeno integral is very fast to compute, it cannot be precisely inverted, which is a fatal defect. The Wang integral, on the other hand, has been shown to possess remarkable properties, but it is rather complex and quite time-consuming to compute.
Those are the main reasons why the Choquet integral is adopted in this paper. For a nonnegative function f, the Choquet integral with respect to the NGM $\mu$ is defined as:

$$(C)\int f\,d\mu = \int_0^{\infty} \mu(F_\alpha)\,d\alpha$$

where $F_\alpha = \{x \mid f(x) \ge \alpha,\ x \in X\}$ is called the $\alpha$-cut set of the function f. Since X is a finite set, the values of the measurable function f can be sorted as:

$$f(x_1^*) \le f(x_2^*) \le \dots \le f(x_n^*)$$

where $(x_1^*, x_2^*, \dots, x_n^*)$ is a permutation of $(x_1, x_2, \dots, x_n)$. Then, the discrete form of the Choquet integral with respect to the NGM defined above can be expressed as:

$$(C)\int f\,d\mu = \sum_{i=1}^{n} \left[f(x_i^*) - f(x_{i-1}^*)\right]\,\mu(\{x_i^*, x_{i+1}^*, \dots, x_n^*\}), \qquad f(x_0^*) = 0$$

Compared to the linear aggregation model shown in Equation (1), $\mu(\{x_i\})$ represents the relative strength of the contribution to the objective attribute Y by the single feature attribute $x_i$, and $\mu(A)$ represents the joint relative strength of the contribution to the objective attribute Y by the feature attribute set A. In addition, to deal simultaneously with observations that have categorical and numerical attributes, the NACI model, which describes the relation between the hybrid attributes X and the objective attribute Y, is expressed by Equation (7) [4], where c and q are constants. In the NACI model, the constants c and q, the vectors and the NGM $\mu$ are all parameters of the model. In total there are $2^n + n$ unknown parameters, and this number increases exponentially with the dimension of the feature attributes. In order to complete the NACI model, these parameters have to be determined in advance; this is the so-called training state of the NACI model. In training, associating Equation (7) with the available observations constitutes an over-determined system involving the Choquet integral, so an analytic solution for the model parameters cannot be obtained exactly. Furthermore, the constants c and q are essentially different from the other parameters, which are governed by the Choquet integral.
Therefore, a dual optimization procedure must be performed simultaneously; meanwhile, the performance index of the optimization J (called the fitness function) is introduced in Equation (9), where k is the number of observations available for the training state. Because the kernel of this performance index is the LS estimator, it always suffers from atypical observations, which arise from outliers in real-world systems. That is, the LS method deviates seriously when estimating a model's parameters in the presence of outliers. Hence, a major objective of this study is to propose a feasible method for resolving this issue. The proposed method has to achieve not only precise model parameters but also a remarkable capability for rejecting outliers. In general, problems of this kind are called robust regression, and many high breakdown value regression estimators have been proposed for them [6,7]. For reasons of simplicity and efficiency, the LMS and the LTS are the more popular regression estimators in scientific applications. Furthermore, the LTS estimator possesses not only the same breakdown value as the LMS but also several additional merits: for instance, its objective function is smoother and its statistical efficiency is better. Therefore, we base the treatment of outliers on the LTS method, and Equation (9) is revised into the trimmed form of Equation (10), where $y_j^*$ and $f_j^*$ are a permutation of the observations under the best model parameters and h is the trimming parameter of the LTS estimator. The block diagrams of the proposed structure for the training state and for the information fusion system are shown in Figures 1 and 2, respectively. In Figure 1, the block named MQPSO receives the differences between the observed and estimated objective attributes as long as the termination criterion is not yet satisfied; meanwhile, the parameters of the NACI model are updated based on these differences.
Another block, named LTS, is used for filtering out atypical observations, and the trimming parameter h of the LTS estimator is revised according to the best global parameters found so far. The block named "Non-additive systems with outliers" is the system under consideration; that is, it is the source of the training data (observations) used for modeling the NACI. The block named "Subset of observations" represents the observations that remain after the LTS step. This subset still comes from the non-additive system, but it differs from the raw observations in that the outliers have been trimmed. In Figure 2, the block named feature attributes of information depicts consecutive observations over a period in which the decision profile (DP) is produced. By associating the DP with the model's parameters acquired in the training state, a precise decision can usually be made. In addition, the block named decision by majority helps to ensure that a correct decision is made in a lightly contaminated environment.
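As an illustration of the discrete Choquet integral described in this section, the following is a minimal Python sketch, not the authors' implementation. It assumes nonnegative feature values stored in a dictionary and a set function stored as a dictionary keyed by `frozenset`; the names `choquet_integral` and `additive_measure` are hypothetical helpers introduced only for this example.

```python
from itertools import combinations


def additive_measure(weights):
    """Build an additive set function from per-attribute weights (hypothetical helper)."""
    names = list(weights)
    mu = {frozenset(): 0.0}
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            mu[frozenset(subset)] = sum(weights[x] for x in subset)
    return mu


def choquet_integral(f, mu):
    """Discrete Choquet integral of nonnegative f with respect to the set function mu.

    f  : dict mapping attribute name -> observed value
    mu : dict mapping frozenset of attribute names -> measure value
    """
    order = sorted(f, key=f.get)          # x_1^*, ..., x_n^* with ascending f
    total = 0.0
    previous = 0.0                        # f(x_0^*) = 0
    for i, x in enumerate(order):
        upper_set = frozenset(order[i:])  # {x_i^*, ..., x_n^*}
        total += (f[x] - previous) * mu[upper_set]
        previous = f[x]
    return total


# With an additive measure the integral reduces to a weighted sum.
weights = {"x1": 0.5, "x2": 0.3, "x3": 0.2}
values = {"x1": 0.2, "x2": 0.5, "x3": 0.9}
linear_result = choquet_integral(values, additive_measure(weights))

# With a non-additive measure the joint strength of {a, b} is not 0.3 + 0.4.
interacting = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.3,
    frozenset({"b"}): 0.4,
    frozenset({"a", "b"}): 1.0,
}
nonlinear_result = choquet_integral({"a": 1.0, "b": 2.0}, interacting)
```

The first call reproduces the weighted average because the measure is additive, which is the sense in which the Choquet integral extends the linear model; the second call shows how a non-additive measure changes the aggregate.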

The LTS Estimator and the QPSO Algorithm
The LTS estimator is formulated as:

$$\min \sum_{i=1}^{h} \left(r^2(d)\right)_{i:k} \qquad (11)$$

where $d_i$ is the $i$th observation, $r^2(d_i)$ is the $i$th squared residual, $(r^2(d))_{1:k} \le \dots \le (r^2(d))_{k:k}$ are the squared residuals sorted in ascending order, k is the number of observations and h is the number of data points that are not trimmed from the data set. In robust regression analysis [6], the maximum tolerance to outliers (named the maximum breakdown point) of any equivariant regression estimator satisfies:

$$\varepsilon^* \le \frac{\lfloor (k - p)/2 \rfloor + 1}{k} \qquad (12)$$

where p is the dimension of the variables. Intuitively, the breakdown point is bounded above by 50%. The maximum breakdown point in Equation (12) is actually attained by the LTS estimator with $h = \lfloor (k + p + 1)/2 \rfloor$ in a multiple regression system, and the solution of Equation (11) always exists. Of course, one can obtain the optimal solution by solving $\binom{k}{h}$ ordinary least squares problems, one for each subset of $\{1, 2, \dots, k\}$ with h elements, and selecting the best among all candidates. Obviously, this is laborious and impractical for real-world systems with large numbers of observations. In order to cope with large numbers of observations, the FAST-LTS method has been proposed [7]. Its major distinguishing features are the initial h-subset, the C-step and the nested extensions. By and large, the initial h-subset is a preselecting mechanism intended to ensure that a clean h-subset is available, and the C-step (concentration step) refits the model on the current h-subset and then keeps the h observations with the smallest residuals. When the number of observations is large, the performance of these two procedures is poor and they take much more time. To deal with this situation, the procedure named nested extension is introduced. In nested extensions, the data are partitioned into many subsets, and the initial h-subset and the C-step are applied to each subset. Next, the feasible solutions of each subset are extended to the full set of observations and the C-step procedure is performed repeatedly. Finally, an optimal solution that satisfies the desired accuracy is obtained.
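The trimmed objective and the C-step described above can be sketched as follows. This is a simplified illustration for ordinary linear regression using NumPy, assuming random elemental starts; it omits the nested extensions of FAST-LTS [7] and is not the NACI fitting procedure of this paper. The function names `lts_objective`, `c_step` and `lts_fit` are hypothetical.

```python
import numpy as np


def lts_objective(residuals, h):
    """Sum of the h smallest squared residuals."""
    r2 = np.sort(np.asarray(residuals, dtype=float) ** 2)
    return float(r2[:h].sum())


def c_step(X, y, subset, h):
    """One concentration step: fit LS on the subset, keep the h best-fitting points."""
    theta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
    r2 = (y - X @ theta) ** 2
    new_subset = np.sort(np.argsort(r2)[:h])
    return theta, new_subset


def lts_fit(X, y, h, n_starts=20, max_steps=50, seed=0):
    """Approximate LTS regression by repeated C-steps from random starts (assumed scheme)."""
    rng = np.random.default_rng(seed)
    k, p = X.shape
    best_theta, best_obj = None, np.inf
    for _ in range(n_starts):
        # Elemental start: fit exactly p random points, then take the h closest points.
        start = rng.choice(k, size=p, replace=False)
        theta, *_ = np.linalg.lstsq(X[start], y[start], rcond=None)
        subset = np.sort(np.argsort((y - X @ theta) ** 2)[:h])
        for _ in range(max_steps):
            theta, new_subset = c_step(X, y, subset, h)
            if np.array_equal(new_subset, subset):
                break  # the h-subset no longer changes
            subset = new_subset
        obj = lts_objective(y - X @ theta, h)
        if obj < best_obj:
            best_theta, best_obj = theta, obj
    return best_theta, best_obj


# Example: a line y = 1 + 2x with 10% of the responses shifted by a large offset.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 100)
X = np.column_stack([np.ones(100), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.01, 100)
y[:10] += 50.0
theta_lts, obj_lts = lts_fit(X, y, h=75)
```

Each C-step can only keep or lower the trimmed objective, which is why iterating it from several starts and keeping the best result is a common practical substitute for the exhaustive subset search.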
After the uncontaminated observations have been selected, a proper soft computing technique is essential for efficiently estimating the parameters of the NACI model. In the literature there are many outstanding soft computing techniques that qualify for this work. The QPSO algorithm is one of them and possesses significant global and local search abilities. In the QPSO algorithm, particles move in a multi-dimensional quantum space, and the state of a particle is described by a normalized wave function $\Psi(\mathbf{r}, t)$, i.e., the probability amplitude of the position where the particle is present; $|\Psi(\mathbf{r}, t)|^2$ is then interpreted as the corresponding probability density function, which satisfies the following equation:

$$\int |\Psi(\mathbf{r}, t)|^2\,d\mathbf{r} = 1 \qquad (13)$$

where $\mathbf{r}$ are the n-dimensional coordinates. A single particle with mass m is subjected to the influence of a potential field $V(\mathbf{r}, t)$ in the quantum space, and the wave function is governed by the Schrödinger equation:

$$i\hbar\,\frac{\partial}{\partial t}\Psi(\mathbf{r}, t) = \left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r}, t)\right]\Psi(\mathbf{r}, t) \qquad (14)$$

where $\hbar$ is the Planck constant and $\nabla^2$ is the Laplacian operator. In an environment with a potential field, the particles are attracted to the center of the field during the optimization process, and this attraction leads to the global optimum. Based on the assumption that the attractive potential field is time-independent (the so-called stationary state), the solution of the time-independent Schrödinger equation has the form [21]:

$$\Psi(\mathbf{r}, t) = \psi(\mathbf{r})\,e^{-i\omega t} \qquad (15)$$

where $\omega$ has the dimensions of an angular frequency. In theory, any type of potential well can describe a system that is bounded and attracted by a potential field. However, the simplest one is the Delta Potential Well, whose potential field is given by:

$$V(\mathbf{r}) = -\gamma\,\delta(\mathbf{r}) \qquad (16)$$

where $\gamma$ is a positive number proportional to the "depth" of the potential well. The meaning of Equation (16) is that the depth is infinite at the origin and zero elsewhere.
For the sake of simplicity, the solution of the time-independent Schrödinger equation for this system is considered in one-dimensional space and expressed as:

$$\psi(z) = \frac{1}{\sqrt{L}}\,e^{-|z|/L}, \qquad Q(z) = |\psi(z)|^2 = \frac{1}{L}\,e^{-2|z|/L} \qquad (17)$$

where $Q(z)$ is the probability density function for measuring a particle's state and L is the characteristic length of the Delta Potential Well. L specifies the search scope of a particle and is called "Creativity" or "Imagination". In order to obtain the precise position of a particle, the Monte Carlo method is used to simulate the procedure whereby the quantum state collapses to the classical state. The position of the $i$th particle can then be expressed as:

$$x_i = p_i \pm \frac{L}{2}\,\ln\!\left(\frac{1}{u}\right), \qquad i = 1, \dots, NP$$

where NP is the number of particles in a population, u is a random number uniformly distributed on [0, 1], and $p_i$ is determined by the constriction coefficients $c_1$, $c_2$, the best position of the $i$th particle $p_i^{loc}$ and the global best position found so far $p^{gol}$. In order to improve the performance of the QPSO algorithm, Sun et al. [13] employ a Mainstream Thought Point (also named the Mean Best Position, mbest) to evaluate the parameter L. However, to extend the global search of the QPSO algorithm, the mbest is modified here, and the two parameters mbest and L are expressed in terms of a creative coefficient $\beta$, which is used to adjust the convergence speed of an individual particle and the performance of the QPSO algorithm. Hence, the particle's position is updated in each iteration by substituting mbest and L into the position equation.
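To make the position update concrete, the following minimal Python sketch implements the standard QPSO update in the style of Sun et al. [13], which is an assumption: it uses a random convex combination of the local and global best positions as the attractor, the plain mean of the local best positions as mbest, and a linearly decreasing creative coefficient. It does not include the modified mbest or the nonlinear coefficient schedule of this paper. The names `qpso_step` and `qpso_minimize` are hypothetical.

```python
import numpy as np


def qpso_step(x, p_loc, p_gol, beta, rng):
    """One standard QPSO position update for the whole swarm.

    x, p_loc : arrays of shape (NP, dim); p_gol : array of shape (dim,)
    """
    shape = x.shape
    phi = rng.random(shape)
    p = phi * p_loc + (1.0 - phi) * p_gol      # attractor between local and global best
    mbest = p_loc.mean(axis=0)                 # mean best position of the swarm
    u = 1.0 - rng.random(shape)                # uniform on (0, 1], avoids log(1/0)
    sign = np.where(rng.random(shape) < 0.5, -1.0, 1.0)
    return p + sign * beta * np.abs(mbest - x) * np.log(1.0 / u)


def qpso_minimize(fitness, dim, n_particles=30, n_iter=300,
                  bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness` with standard QPSO and a linearly decreasing beta (assumed)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(bounds[0], bounds[1], (n_particles, dim))
    p_loc = x.copy()
    f_loc = np.array([fitness(row) for row in x])
    for t in range(n_iter):
        beta = 1.0 - 0.5 * t / max(n_iter - 1, 1)   # from 1.0 down to 0.5
        p_gol = p_loc[np.argmin(f_loc)]
        x = qpso_step(x, p_loc, p_gol, beta, rng)
        f_x = np.array([fitness(row) for row in x])
        better = f_x < f_loc                        # update personal bests
        p_loc[better] = x[better]
        f_loc[better] = f_x[better]
    best = int(np.argmin(f_loc))
    return p_loc[best].copy(), float(f_loc[best])


# Example: minimize the 3-dimensional sphere function.
best_x, best_f = qpso_minimize(lambda v: float(np.sum(v ** 2)), dim=3)
```

Because the step length is proportional to the distance between each particle and mbest, the swarm explores widely while the particles are scattered and contracts as they agree, which is the behavior the creative coefficient is meant to tune.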

The LTS-MQPSO Algorithm
In empirical applications, however, the QPSO algorithm often stagnates when searching for the global optimal solution in multimodal problems and systems. It is also strongly influenced by the creative coefficient $\beta$. In order to remedy these defects, the updating mechanism of the creative coefficient $\beta$ in the MQPSO algorithm proposed in our previous work is revised. That is, the modified MQPSO algorithm combines the QPSO algorithm with mechanisms of the SA and the GA to achieve global search and to overcome the premature convergence of the traditional PSO in the optimization process. Two significant improvements are introduced in the modified MQPSO algorithm: the nonlinear updating of the creative coefficient $\beta$ in the manner of the SA, and the instantaneous monitoring of the convergence of the optimization procedure. In the QPSO algorithm, the creative coefficient $\beta$ is set to a large value at the beginning and is decreased as the optimization proceeds. This mechanism ensures that a global search is performed at the beginning and that convergence is achieved at the end. In general, $\beta$ decreases linearly, but a nonlinear revision that follows the convergence of the optimization process is more reasonable and feasible. In the modified MQPSO algorithm, a nonlinear revising mechanism similar to the SA algorithm is introduced, in which $\Delta\beta$ is the step length of $\beta$, $\Delta fit$ is the rate of change of the best estimate so far and $\beta_{ini}$ is the initial value of $\beta$. A typical curve of $\beta$ adjusted by $\Delta fit$ is shown in Figure 3. The other improvement of the modified MQPSO algorithm is the mechanism for overcoming premature convergence.
Inspired by the mechanisms of mutation and elite crossover in the GA, an index of conquering stagnation (named ECM, an abbreviation of Elite Crossover and Mutation) is used for monitoring the status of the optimization procedure in the modified MQPSO algorithm. During the optimization procedure, the modified MQPSO algorithm preserves each distinct $p^{gol}$; meanwhile, the ECM is set to zero whenever $p^{gol}$ is updated and is increased by one whenever $p^{gol}$ is unchanged. Before finishing the current iteration, the modified MQPSO algorithm checks whether the ECM exceeds a specified threshold. If it does, the algorithm replaces the original population (either all particles or only the worse ones) with the collected $p^{gol}$ and immediately resets the ECM to zero.
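The bookkeeping behind the ECM index can be sketched as below. This is a minimal illustration under stated assumptions: it only covers the counter and the replacement of the worst particles by the collected global best positions, it treats lower fitness as better, and it does not model the crossover or mutation operators themselves. The class name `ECMMonitor` and its methods are hypothetical.

```python
import numpy as np


class ECMMonitor:
    """Tracks stagnation of the global best position (hypothetical helper)."""

    def __init__(self, ecm_max):
        self.ecm_max = ecm_max   # threshold on consecutive non-improving iterations
        self.ecm = 0             # current stagnation count
        self.elites = []         # every distinct global best seen so far

    def update(self, p_gol, improved):
        """Record one iteration; return True when the population should be reseeded."""
        if improved:
            self.elites.append(np.array(p_gol, dtype=float))
            self.ecm = 0
            return False
        self.ecm += 1
        if self.ecm > self.ecm_max:
            self.ecm = 0
            return True
        return False

    def reseed(self, population, fitness_values):
        """Replace the worst particles (largest fitness) with the collected elites."""
        new_population = np.array(population, dtype=float)
        n = min(len(self.elites), len(new_population))
        if n == 0:
            return new_population
        worst = np.argsort(fitness_values)[::-1][:n]
        new_population[worst] = np.array(self.elites[-n:])
        return new_population


# Example: one improvement followed by three stagnant iterations with ecm_max = 2.
monitor = ECMMonitor(ecm_max=2)
flags = [monitor.update([9.0, 9.0], improved=True)]
flags += [monitor.update([9.0, 9.0], improved=False) for _ in range(3)]
population = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
reseeded = monitor.reseed(population, fitness_values=np.array([1.0, 5.0, 3.0]))
```

In this example the reseed request is raised on the third stagnant iteration, and the particle with the largest fitness value is the one replaced by the stored global best.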
For observations without outliers, the MQPSO algorithm offers better parameter estimation performance than the GA [14]. However, because the kernel of its fitness evaluation is the LS estimator, the MQPSO algorithm deviates seriously in contaminated circumstances. Therefore, the LTS estimator is introduced to sieve out the uncontaminated observations. The proposed LTS-MQPSO algorithm is listed below and its flow chart is shown in Figure 4.
Step 1: Randomly initialize the population of particles with dimension $2^n + 2n$ and then evaluate their fitness values by Equation (10).
Step 2: Sort the particles according to their fitness values and then initialize $p_i^{loc}$ and $p^{gol}$.
Step 3: Apply the LTS estimator to sieve out the h uncontaminated observations.
Step 6: Evaluate the fitness values of all particles based on Equation (10).
Step 7: According to the fitness values evaluated in Step 6, update $p_i^{loc}$.
Step 8: Check whether the maximum number of iterations has been reached or the termination criterion is satisfied. If yes, go to Step 11; otherwise perform the next step.
Step 9: Check whether $p^{gol}$ has been updated. If $p^{gol}$ has been updated, set the ECM to 0, perform the LTS procedure and go to Step 3. If $p^{gol}$ is unchanged, increase the ECM by 1 and perform the next step.
Step 10: Check whether the maximum ECM has been reached. If yes, replace the population at iteration $t+1$ with the collected $p^{gol}$ and go to Step 4; otherwise keep the population at iteration $t+1$ and go to Step 4.
Step 11: Check whether $p^{gol}$ should be updated and then output the results.

Numerical Simulation and Results
The multi-sensor-based intelligent security robot (ISR) [23] consists of six subsystems: the sensor system, the remote supervision system, the software development system, the image system, the obstacle avoidance system and the motion planning system. These subsystems acquire and preliminarily process sensory signals, and the sensory data are then transmitted through interface devices to the main controller (IPC) for further treatment. The hierarchical structure of the sensory systems used for the ISR is shown in Figure 5. In the fire detection subsystem and the intruder detection subsystem, the sensory data are transmitted through a digital input/output interface. That is, these two subsystems only send a decision, which is made by an information fusion system, to the IPC of the ISR. However, a wrong decision is usually made whenever the sensory signal is contaminated with outliers. In this simulation, we focus our attention on the fire detection subsystem. This subsystem consists of environmental sensors, which include flame sensors, smoke sensors and temperature sensors. It is suitable for demonstrating and verifying the effectiveness and feasibility of the proposed information fusion system shown in Figures 1 and 2. Prior to performing the numerical simulation, the principles of these three sensors are briefly described. In the smoke sensor module, the kernel is a TG135 ionization smoke sensor. When smoke occurs, an ionizing radioactive source is brought close to the plates and the air itself is ionized; in other words, a tiny current is generated. For the flame sensor module, the R2868 ultraviolet sensor is used for detecting the flame. Its peak wavelength is 200 nm and its sensing wavelength range is 185-260 nm. For the temperature sensor module, the AD590 semiconductor sensor is adopted to detect the temperature of the fire. This sensor has a positive temperature coefficient of about 0.7, and its linearity is within 0.5% for a temperature range between −65 °C and 150 °C.
The standard output of the AD590 is 1 μA/K. In general, these sensory signals are all tiny and have to be converted to a standardized voltage output by an amplifier circuit. In addition, the relation between the input sensory signals and the output voltage signals must be made linear by tuning the calibration circuits. Finally, the sensory signals are converted to binary digital signals and transmitted to the IPC. In this experiment, the three modules are integrated, and the resulting 3-in-1 fire detection sensor is shown in Figure 6. Because the sensory signal is tiny, it is prone to outliers, which cause a wrong output. Fortunately, these outliers generally last only an instant, and we are able to eliminate them by considering the interactions among consecutive samples. For the sake of simplicity, an artificial observation profile, which simulates four consecutive normalized sampling data points, is created to estimate the model's parameters with the proposed LTS-MQPSO algorithm in the training state. All simulations are implemented in the Matlab environment and conducted on a PC with an Intel Core 2 Duo P8400 CPU and 4 GB of RAM. The trimming parameter is set to h = 75%. We then randomly create 400 4-dimensional feature attributes with 10% random contamination to produce the training data shown in Table 1, where $y_{true}$ are the original objective attributes, $y_{cont}$ are the contaminated objective attributes and the bold-faced numbers indicate contaminated objective attributes. In this example, the program terminates when the number of iterations reaches a maximum of 1,500 or the mean square error is less than $10^{-5}$. After running the proposed LTS-MQPSO algorithm many times, the average results of estimating the model parameters and the comparisons are shown in Tables 2-4. In addition, plots of the training data and the estimated results are shown in Figures 7-10.
Figure 7 compares the contaminated (red line) and the estimated (blue dashed line) objective attributes. The two curves nearly overlap except at the points where outliers are present. To show the outlier rejection performance clearly, the zoomed-in portion, which is circled with a dotted line, is shown in Figure 8. As shown in Figure 8, the LTS-MQPSO algorithm is able to identify outliers and reject them. Figure 9 compares the original (red line) and the estimated (blue dashed line) objective attributes. The two curves overlap almost everywhere. To distinguish them from each other, the zoomed-in portion, which is circled with a dotted line, is shown in Figure 10. As shown in this figure, the difference between the original and the estimated objective attributes is less than $10^{-4}$. These results indicate that the LTS-MQPSO algorithm is able to make quite precise estimates of the model's parameters.

Conclusions
In this paper, the NACI model in association with the LTS-MQPSO algorithm is considered and developed to deal with a non-additive system with outliers. Whenever atypical observations are present, a parameter estimation method based on the LS estimator is no longer feasible, and replacing the LS estimator with the LTS estimator is an excellent alternative. We integrate the mechanisms of the SA and the GA into the QPSO algorithm to estimate the parameters of the NACI model; meanwhile, the LTS estimator is introduced to filter out outliers before performing the modified MQPSO algorithm. The simulation results show that the proposed LTS-MQPSO algorithm can precisely estimate the parameters of the NACI model for observations contaminated with outliers, while maintaining close agreement between the estimated and the original objective attributes.