GACEM: Genetic Algorithm Based Classifier Ensemble in a Multi-sensor System

Multi-sensor systems (MSS) are increasingly applied in pattern classification, yet the search for the optimal classification framework remains an open problem. The development of classifier ensembles seems to provide a promising solution. A classifier ensemble is a learning paradigm in which many classifiers are jointly used to solve a problem, and it has proven to be an effective method for enhancing classification ability. In this paper, by introducing the concepts of Meta-feature (MF) and Trans-function (TF) to describe the relationship between the nature and the measurement of the observed phenomenon, classification in a multi-sensor system is unified in the classifier ensemble framework. An approach called Genetic Algorithm based Classifier Ensemble in Multi-sensor system (GACEM) is then presented, in which a genetic algorithm simultaneously optimizes the selection of feature subsets and the decision combination. GACEM first trains a number of classifiers based on different combinations of feature vectors and then selects the classifiers whose weights are higher than a pre-set threshold to make up the ensemble. An empirical study shows that, compared with conventional feature-level voting and decision-level voting, GACEM not only achieves better and more robust performance but also simplifies the system markedly.


Introduction
Classification is one of the most important purposes of multi-sensor systems (e.g., target recognition [1,2], personal identity verification [3], landmine detection [4]). It is well known that data available from multiple sources underlying the same phenomenon may contain complementary information. Intuitively, if such information from multiple sources can be appropriately combined, the performance of a classification system can be improved. A classification system capable of combining information from multiple sources or from multiple feature sets is said to be capable of performing data fusion. There are two conventional approaches, i.e., feature-level fusion and decision-level fusion [2,5-7]. In feature-level fusion, features are extracted from multiple sensor observations and combined into a single concatenated feature vector, which is input to a classifier such as a neural network or a decision tree. Decision-level fusion fuses the sensor information after each sensor has made a preliminary decision on the classification task [8]. There have been some qualitative suggestions about how to choose the fusion strategy: Brooks [6] suggested that feature-level fusion would be the superior choice if the information represented by the data was correlated, while decision-level fusion would be the better choice if the data was uncorrelated. Additionally, it was demonstrated in [9] that decision-level fusion worked well when the data was fault-free, but that its performance degraded faster than that of feature-level fusion when measurement error was introduced into the system. However, most of these conclusions come from empirical research, and neither feature-level fusion nor decision-level fusion can be proven to be the optimal fusion technique in all cases, so the search for the optimal fusion framework in multi-sensor systems is still an open problem.
In the last decade, many papers have proposed classifier ensembles for designing high-performance pattern classification systems [10,11]. A classifier ensemble is also known under different names in the literature: combining classifiers, committees of learners, mixtures of experts, classifier fusion, multiple classifier systems, etc. [12]. It has been shown that, in the long run, the combined decision can be expected to be better (more accurate, more reliable) than the classification decision of the best individual classifier [13]. Generally, research on classifier ensembles involves two main phases: the design of the ensemble process and the design of the combination function. Although this formulation of the design problem suggests that an effective design should address both phases, most of the design methods described in the literature focus on only one of them [10,14]. To our knowledge, little research has focused on applying classifier ensembles to multi-sensor systems. Ref. [15] argued that applying classifier ensembles in decision-level fusion could help moderation compensate for sampling problems, where moderation can be regarded as replacing any fusion parameter's value with its mathematical expectation. However, the results would be more convincing if supported by a large-scale empirical study, and it is almost impossible to moderate sophisticated classifiers, such as neural networks, because of the high variability of their many parameters. Another approach, proposed by Polikar et al. in [16], generates an ensemble of classifiers using data from each source and combines these classifiers using a weighted voting procedure. The weights are determined from the individual classifier's training performance as well as the observed or predicted reliability of each data source. In essence, the approach is derived from AdaBoost [17], which involves subsampling the training examples [18].
We have also shown an analogous application of the Bagging algorithm [19] in mechanical noise source identification [20].
Moreover, Roli et al. presented an application of classifier fusion for multi-sensor image recognition [21]. The common feature is that Refs. [16,20,21] mostly focused on the decision level. As shown in later sections (see Section 2.3), we believe that these approaches could be synergistic with the new method proposed in this article.
In this paper, an approach named Genetic Algorithm based Classifier Ensemble in Multi-sensor system (GACEM) is proposed. By introducing the concepts of Meta-feature (MF) and Trans-function (TF), the fusion problem is unified in the classifier ensemble framework, and it is shown that both feature-level fusion and decision-level fusion are special cases of our framework. Then, in contrast to previous applications of GA [22,23], an ad hoc chromosome coding strategy is presented in GACEM for simultaneously selecting feature subsets and optimizing the decision combination. Correspondingly, genetic operators such as the crossover and mutation operators are modified to take into account a chromosome template that is partly binary-coded and partly real-coded. The final classifier ensemble framework is obtained after evolution. Finally, an experiment on the classification of 35 different kinds of sound sources is designed, and the results demonstrate the effectiveness of GACEM.
The paper is organized as follows. In the next section we analyze the feasibility of applying classifier ensembles in multi-sensor systems. The technical details of GACEM are discussed in Section 3. Section 4 presents and analyzes the experimental results of sound source classification. Finally, conclusions and some potential directions for further research are presented in Section 5.

Problem formulation
Consider a classification problem where a test pattern (which may be an event, a physical phenomenon, etc.) is to be assigned to a class label S. The concepts of MF and TF are the theoretical basis for applying classifier ensemble methods in multi-sensor systems. Unfortunately, in many situations the MF and TF may be hard to make concrete and to understand, so they are more useful for theoretical deduction than for calculation. Under certain conditions, however, they do have an exact physical meaning. For example, in the sound measurement system (see Section 4.1), if we use the power spectrum as the feature vector, then $R_0$ is the power spectrum at the excitation point (sound source position) and $T_i$ is in fact equivalent to the squared magnitude of the frequency response function (FRF) between the excitation point and the i-th response point (sensor position). Given a precise system model (e.g., a finite element model built in ANSYS), all the information mentioned above can be calculated.
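The sound-measurement example can be written out as a short worked relation. This is only a sketch under stated assumptions: the transmission path is taken to be linear and time-invariant, measurement noise is neglected, $R_i$ is taken to be the power spectrum observed at the i-th sensor, and the symbols $f$ (frequency) and $H_i(f)$ (the FRF between the excitation point and the i-th sensor) are notation introduced here, not taken from the paper.

```latex
% Assumed notation: f is frequency, H_i(f) is the FRF from the source to sensor i.
% R_0 is the power spectrum at the source (the MF), T_i is the TF for sensor i.
R_i(f) = T_i(f)\, R_0(f), \qquad T_i(f) = \lvert H_i(f) \rvert^{2}
```

Under these assumptions each sensor sees the same Meta-feature $R_0$, filtered by its own Trans-function $T_i$.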

Classifier ensembles in multi-sensor systems
Using MF and TF, the observation set ... That is to say, any technique proven to be effective in pattern classification is also believed to be theoretically effective in pattern classification in MSS. Many researchers have shown that the classifier ensemble is a very promising way to improve classification performance [10,11,21], and a typical illustration of a classifier ensemble can be found in [24]. As shown in Figure 2, several feature sets are generated from the raw data of an observed phenomenon, and a number of classifiers can then be obtained by training on various combinations of the feature sets. Note that the number of feature sets (M) and the number of classifiers (N) may be unequal. Finally, on the basis of the output of each classifier, the final classification result is given by some fusion rule, such as majority voting [25], plurality voting [26] or weighted averaging [27]. Analogously, in MSS, the feature vector $R_i$ ($i = 1, \ldots, M$) is also generated from the MF $R_0$ describing the observed phenomenon, and combining feature vectors from different sensors leads to a variety of classifiers. As shown in Figure 3, the red line $T_i$ denotes the TF from $R_0$ to $R_i$, and the $C_{ij}$ are binary (0-1) parameters representing whether the feature vector $R_i$ contributes to the training of the j-th classifier $f_j$, i.e., $C_{ij} = 1$ means that it does and $C_{ij} = 0$ that it does not. The importance of the j-th classifier is indicated by its weight. It is also important to understand that the generated classifier $f_j$ may itself be a sub-ensemble, obtained by operations such as Bagging or Boosting as mentioned in [16] or [20]. This, however, is not the focus here and will be addressed in future work.
In particular, two special cases can be given. Case (1) corresponds to feature-level fusion [see Figure 1(a)] and case (2) corresponds to decision-level fusion [see Figure 1(b)]. Next, given a pool of N classifiers, there are a number of possible combining strategies to follow, but it is usually not clear which one is optimal for a particular problem. The simplest idea is to enumerate all possible solutions, i.e., to assess the classification accuracy of every possible solution on a validation set and then choose the one exhibiting the best performance [10]. The exponential complexity of such a search, however, limits its practical applicability to larger systems. For example, if $M = N = 5$, the number of possible feature combinations is already on the order of $10^7$. Considering that a large-scale MSS in engineering may contain hundreds of sensors, exhaustive search is obviously impractical, so a more feasible search algorithm is needed.
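The growth of the search space can be checked with a few lines of arithmetic. The exact counting formula used in the paper is not recoverable from the text, so the sketch below shows two plausible counts as assumptions: every binary parameter $C_{ij}$ free, and every classifier required to use at least one feature vector. The function name `search_space_sizes` is hypothetical, and the classifier weights are not counted at all.

```python
def search_space_sizes(m: int, n: int) -> tuple[int, int]:
    """Count candidate feature-to-classifier assignments for M sensors and N classifiers.

    Both formulas are assumptions made for illustration; the real-valued
    classifier weights would enlarge the search space further.
    """
    unrestricted = 2 ** (m * n)      # each of the M*N binary parameters is free
    non_empty = (2 ** m - 1) ** n    # each classifier uses at least one feature vector
    return unrestricted, non_empty


if __name__ == "__main__":
    for m in (5, 7):
        total, non_empty = search_space_sizes(m, m)
        print(f"M = N = {m}: {total:.2e} unrestricted, {non_empty:.2e} non-empty")
```

Either count is already in the tens of millions for M = N = 5 and grows exponentially with the product of M and N.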

GACEM: Genetic Algorithm based Classifier Ensemble in a Multi-sensor System
In essence, searching for the optimal classifier ensemble framework in MSS is an 'optimization-centered' problem, whereas traditional optimization techniques often fail to meet the demands and challenges of highly dynamic and volatile information flow [28]. Among the prevailing optimization approaches, the genetic algorithm (GA) provides a valuable alternative to traditional methods owing to its inherently parallel nature and its capacity for global optimization.

A brief introduction of GA
A genetic algorithm is a search algorithm based on the mechanics of natural selection and natural genetics. It efficiently uses historical information to obtain new search points with expected enhanced performance. In every generation, a new set of artificial individuals is created using information from the best of the old generation. A genetic algorithm combines survival of the fittest among the old population with a randomized information exchange that helps to form new individuals with higher fitness. There are three basic genetic algorithm operators: selection, crossover and mutation. These operators, combined with a proper definition of the fitness function, constitute the main body of a genetic algorithm [29]. GA has been used in various pattern recognition problems, such as image registration, semantic scene interpretation and feature selection [28].
In summary, the GA search process typically comprises the following steps:
Step 1. Randomly generate an initial population of chromosomes.
Step 2. Evaluate the fitness (objective function) of each chromosome.
Step 3. Are the termination criteria met? If YES, go to step 7. If NO, go to step 4.
Step 4. Generate a new population by selecting pairs for mating and recombining them using crossover and mutation.
Step 5. Evaluate the fitness (objective function) of each new chromosome.
Step 6. Identify the fittest individual in the population. Go to step 3.
Step 7. Stop and output the fittest individual.
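The steps above can be sketched as a small loop. This is a generic illustration on binary chromosomes, not the GACEM implementation: the function name `run_ga`, the single-point crossover, the per-gene bit-flip mutation and all default parameter values are assumptions chosen for brevity.

```python
import random


def run_ga(fitness, n_genes, pop_size=20, max_gen=100, fitness_limit=None,
           p_cross=0.8, p_mut=0.02, seed=0):
    """Maximize a non-negative `fitness` over binary chromosomes (n_genes >= 2)."""
    rng = random.Random(seed)
    # Step 1: random initial population.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    # Step 2: evaluate the fitness of each chromosome.
    scores = [fitness(c) for c in pop]
    best_score, best = max(zip(scores, pop), key=lambda t: t[0])
    for _ in range(max_gen):
        # Step 3: stop early once the fitness limit is reached.
        if fitness_limit is not None and best_score >= fitness_limit:
            break
        # Step 4: fitness-proportional mating, then crossover and mutation.
        wheel = [s + 1e-9 for s in scores]  # small offset keeps the wheel valid if all scores are 0
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = rng.choices(pop, weights=wheel, k=2)
            child = list(a)
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_genes)
                child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
        # Step 5: evaluate the new chromosomes.
        scores = [fitness(c) for c in pop]
        # Step 6: keep track of the fittest individual seen so far.
        gen_score, gen_best = max(zip(scores, pop), key=lambda t: t[0])
        if gen_score > best_score:
            best_score, best = gen_score, gen_best
    return best, best_score


if __name__ == "__main__":
    # Toy objective: the number of ones in the chromosome.
    print(run_ga(sum, n_genes=10, fitness_limit=10))
```

Tracking the best individual outside the population is a design choice of this sketch; the listed steps do not say whether the fittest chromosome is carried over unchanged.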

Detail of GACEM
In this section we present an approach, GACEM, for finding the optimal classifier ensemble in MSS. As mentioned above, the purpose of GACEM is to optimize the design of both the ensemble process and the combination function.

Fitness function
Although there have been studies on how to evaluate the performance of classifier ensembles, and various measures have been proposed for this purpose [12], we do not believe that those heuristic statistical measures are necessarily superior to using the classification accuracy directly as the evaluation criterion. Moreover, evaluating on a validation set separate from the training set is believed to moderate the risk of overfitting [30]. The classification performance on a validation sample set is therefore adopted as the fitness function in GACEM.

Selection operators
We choose the roulette selection in GACEM. The standard roulette selection chooses parents by simulating a roulette wheel, in which the area of the section of the wheel corresponding to an individual chromosome is proportional to its fitness performance.
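A minimal sketch of roulette-wheel selection follows. The function name `roulette_select` and the uniform fallback for a population whose fitness values are all zero are assumptions added for illustration; the fitness values are assumed to be non-negative, as classification accuracies are.

```python
import random
from itertools import accumulate


def roulette_select(fitnesses, rng):
    """Return the index of one parent, chosen with probability proportional to fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.randrange(len(fitnesses))  # degenerate wheel: fall back to a uniform pick
    spin = rng.random() * total               # where the ball lands on the wheel
    for idx, edge in enumerate(accumulate(fitnesses)):
        if spin < edge:                       # first sector whose upper edge lies beyond the spin
            return idx
    return len(fitnesses) - 1                 # guard against floating-point round-off


if __name__ == "__main__":
    rng = random.Random(1)
    counts = [0, 0, 0]
    for _ in range(10000):
        counts[roulette_select([0.1, 0.3, 0.6], rng)] += 1
    print(counts)  # fitter individuals are drawn more often
```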

Crossover operator
Since the chromosome contains both binary and real-valued codes, a hybrid crossover operator is needed. For the b-Part (binary-coded), the scattered crossover function is adopted: it creates a random binary vector, takes the genes from the first parent where the vector is 1 and from the second parent where the vector is 0, and combines them to form the child. For the r-Part (real-coded), we use the intermediate crossover function.
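The hybrid operator might look like the sketch below. The formula for the intermediate crossover is not recoverable from the text, so one common form is assumed here: child = parent1 + rand * ratio * (parent2 - parent1), drawn independently for each gene. The chromosome layout (a pair of lists, `b_part` and `r_part`), the function name `hybrid_crossover` and the `ratio` parameter are likewise hypothetical.

```python
import random


def hybrid_crossover(parent1, parent2, rng, ratio=1.0):
    """Cross two chromosomes given as (b_part, r_part) pairs and return one child."""
    b1, r1 = parent1
    b2, r2 = parent2
    # b-Part: scattered crossover driven by a random binary mask
    # (mask 1 -> gene from the first parent, mask 0 -> gene from the second).
    mask = [rng.randint(0, 1) for _ in b1]
    child_b = [g1 if m == 1 else g2 for m, g1, g2 in zip(mask, b1, b2)]
    # r-Part: assumed intermediate crossover; with ratio = 1 every child gene
    # lies between the two parent genes.
    child_r = [w1 + rng.random() * ratio * (w2 - w1) for w1, w2 in zip(r1, r2)]
    return child_b, child_r


if __name__ == "__main__":
    rng = random.Random(0)
    p1 = ([1, 1, 1, 1], [0.2, 0.8])
    p2 = ([0, 0, 0, 0], [0.6, 0.4])
    print(hybrid_crossover(p1, p2, rng))
```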

Mutation operator
Mutation is also applied separately to the two parts. For the b-Part, a random gene is chosen and its value is replaced by its logical complement (NOT). For the r-Part, another gene is chosen at random and its value is replaced by a new random number in [0,1].
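As a rough illustration, the two-part mutation could be written as follows. The chromosome layout (a pair of lists) and the function name `hybrid_mutation` are assumptions, and the new weight is drawn uniformly, which the text does not state explicitly.

```python
import random


def hybrid_mutation(chromosome, rng):
    """Mutate one gene in each part of a (b_part, r_part) chromosome; returns a copy."""
    b_part, r_part = list(chromosome[0]), list(chromosome[1])
    i = rng.randrange(len(b_part))
    b_part[i] = 1 - b_part[i]    # logical NOT on a 0/1 gene
    j = rng.randrange(len(r_part))
    r_part[j] = rng.random()     # fresh weight, assumed uniform on [0, 1)
    return b_part, r_part


if __name__ == "__main__":
    rng = random.Random(0)
    print(hybrid_mutation(([0, 1, 0, 1], [0.5, 0.5]), rng))
```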

Stopping criteria
There are two termination conditions in GACEM. The algorithm stops when either the number of generations reaches the maximum $I_{max}$ or the best fitness value exceeds the fitness limit $L_{fit}$.

Flowchart
We have now introduced most of the details of GACEM, but three important prerequisites remain before the algorithm can be run: (1) choosing the basic classifier, (2) determining N and (3) choosing the decision combination function. For (1), note first that GACEM is classifier-independent, i.e., any classifier, such as a neural network (NN) or a decision tree (DT), could in theory serve as the basic classifier for the ensemble. However, because GA is an inherently time-consuming search strategy, more efficient classifiers such as decision trees and k-nearest neighbors (k-NN) are better choices. For (2), N could theoretically range from 1 to infinity (which of course makes no practical sense). Too large a value of N increases the complexity of the classifier ensemble system [30], while too small a value deprives GACEM of sufficiently diverse classifiers and degrades its performance. The search for an appropriate N is therefore a heuristic process, which we discuss in Section 4.2. For (3), although many approaches such as voting and averaging are in common use [11,31], none has been proved to be a panacea, and the choice is more of an art than a science. It has been shown, however, that ensembling many rather than all of the available classifiers can achieve better performance [23]. The basic idea in GACEM is therefore to admit into the ensemble only those of the N classifiers whose weights are greater than a pre-set threshold and to ignore the others. The effect of different combination functions is discussed in Section 4.
Step 1. Generate the initial population of chromosomes.
Step 2. Evaluate the fitness (classification accuracy on $S_{val}$) of each new chromosome: decode the i-th chromosome and build N classifiers based on $S_{train}$; choose those classifiers whose weight is greater than the threshold to construct the classifier ensemble; calculate the classification accuracy on $S_{val}$ (i.e., the fitness of the i-th chromosome) using the generated classifier ensemble. Then find the chromosome with the highest fitness, $Chm_b^0$, among the population.
Step 3. Are the optimization criteria met? If YES, go to step 9. If NO, go to step 4.
Step 4. Generate new population using the selection operator.
Step 5. Perform the crossover operator according to the crossover probability c P .
Step 6. Perform the mutation operator according to the mutation probability m P .
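The fitness evaluation in the flowchart (decode a chromosome, build the classifiers, keep those whose weight exceeds the threshold, score the ensemble on the validation set) can be sketched as below. Several things here are assumptions rather than the paper's method: the chromosome is taken to be N binary masks of length M plus N weights, a 1-nearest-neighbour rule stands in for the basic classifier, plurality voting is the combination function, and an empty ensemble is given fitness 0. All function names are hypothetical.

```python
from collections import Counter


def concat(sample, mask):
    """Concatenate the per-sensor feature vectors selected by a 0/1 mask."""
    return [x for vec, use in zip(sample, mask) if use for x in vec]


def nn_predict(train_x, train_y, query):
    """1-nearest-neighbour label under squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[dists.index(min(dists))]


def ensemble_fitness(b_part, r_part, train, val, threshold):
    """Validation accuracy of the ensemble encoded by one chromosome.

    b_part: N masks of length M (mask[i] == 1 means sensor i feeds that classifier).
    r_part: N classifier weights.
    train, val: lists of (per_sensor_feature_vectors, label) pairs.
    """
    # Keep only classifiers whose weight exceeds the threshold and that see some data.
    members = [m for m, w in zip(b_part, r_part) if w > threshold and any(m)]
    if not members:
        return 0.0  # assumption: an empty ensemble classifies nothing correctly
    train_y = [y for _, y in train]
    # "Training" a 1-NN classifier amounts to storing its view of the training set.
    views = [[concat(s, m) for s, _ in train] for m in members]
    correct = 0
    for sample, label in val:
        votes = Counter(nn_predict(tx, train_y, concat(sample, m))
                        for tx, m in zip(views, members))
        if votes.most_common(1)[0][0] == label:  # plurality vote, ties go to the first label seen
            correct += 1
    return correct / len(val)


if __name__ == "__main__":
    # Two sensors: sensor 0 separates the classes, sensor 1 is misleading.
    train = [([[0.0], [5.0]], "a"), ([[1.0], [0.0]], "b")]
    val = [([[0.1], [0.0]], "a"), ([[0.9], [5.0]], "b")]
    print(ensemble_fitness([[1, 0], [0, 1]], [0.9, 0.01], train, val, threshold=0.05))
```

In this toy case the chromosome that gives the misleading sensor a weight below the threshold obtains the higher fitness, which is the behaviour the GA is meant to exploit.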

Experiment environment
MSS have found a number of applications in modern engineering, and sound source classification is one of them. To obtain a better estimate of a sound source's characteristics, a number of sensors are used for condition monitoring and data acquisition. For example, [32] demonstrated the use of an onboard MSS for monitoring and diagnosing a ship's acoustic health. In this article, an analogous experiment is designed. A reduced-scale ribbed cylindrical double-shell (see Figure 5) is built to simulate a ship's cabin, and two vibration exciters (see Figure 6) are placed in the double-shell to simulate sound sources by working under different frequency conditions (see Table 1). Seven sensors, comprising five accelerometers and two hydrophones, are used for data acquisition at different positions (see Table 2). An overall sketch of the experiment can be found in Figure 7.

Feature generation
In our experiment, the sampling frequency is 1 kHz and the analysis frequency is 500 Hz. For each sound source the sampling time is 10 s, so the time series of each sound source contains 10,000 points. Data samples are extracted from the recordings as consecutive, non-overlapping segments of 512 points, starting from the beginning. Each sound source thus yields 19 data samples, of which four are used for training, five for validation in the fitness function and 10 for testing generalization. The total numbers of data samples in the training, validation and test sets over all sound sources are therefore 140, 175 and 350, respectively. For a given sound source, the data samples in the different sets are all independent and identically distributed (i.i.d.) owing to the steady character of the source signal. A detailed description of the sample sets can be found in Table 3. After computing the power spectrum of each raw data sample, we divide the spectrum from 0 to 500 Hz into 25 equal-width bins, each covering a 20 Hz frequency band. The sum over each bin is taken as one dimension of the feature vector for classification, so each raw data sample is transformed into a 25-dimensional feature vector. Letting x denote such a feature vector, it is then scaled to ensure that all elements of x lie in [0,1]. As an example, the time series, power spectrum and feature vector of one sample of the 22nd sound source signal in channel $A_1$ are shown in Figure 8.
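The feature pipeline described above can be sketched in plain Python. Two points are assumptions: the scaling formula is not recoverable from the text, so min-max scaling is used as one way to bring all elements into [0,1], and the 500 Hz spectral line is folded into the last band. The power spectrum is computed with a direct DFT for self-containment (an FFT would normally be used), and the function names are hypothetical.

```python
import cmath
import math


def power_spectrum(segment):
    """One-sided power spectrum |X_k|^2 for k = 0 .. n/2, via a direct DFT."""
    n = len(segment)
    spectrum = []
    for k in range(n // 2 + 1):
        xk = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(segment))
        spectrum.append(abs(xk) ** 2)
    return spectrum


def band_features(segment, fs=1000.0, n_bands=25, band_hz=20.0):
    """Sum the power spectrum over 20 Hz bands and scale the result to [0, 1]."""
    n = len(segment)
    feats = [0.0] * n_bands
    for k, power in enumerate(power_spectrum(segment)):
        band = min(int((k * fs / n) // band_hz), n_bands - 1)  # the 500 Hz line joins the last band
        feats[band] += power
    lo, hi = min(feats), max(feats)
    if hi == lo:
        return [0.0] * n_bands                     # flat spectrum: nothing to scale
    return [(f - lo) / (hi - lo) for f in feats]   # assumed min-max scaling


if __name__ == "__main__":
    # A 110 Hz tone sampled at 1 kHz; its energy should fall in the 100-120 Hz band.
    tone = [math.sin(2 * math.pi * 110 * t / 1000) for t in range(512)]
    feats = band_features(tone)
    print(feats.index(max(feats)))
```

With 10,000 points per source and 512-point segments, `10000 // 512` gives the 19 samples per source quoted in the text.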

Experimental methodology
In our experiments, GACEM is compared with the conventional approaches, i.e., feature-level fusion (FLF), decision-level fusion (DLF), and the single basic classifier generated on the Sensor channel with the Best Performance (SBP). The genetic algorithm employed by GACEM is implemented in MATLAB 7.1. The experiments with GACEM are confined to four basic types of classifiers: (1) Linear Discriminant Classifier (LDC) [33], (2) Quadratic Discriminant Classifier (QDC) [33], (3) k-Nearest Neighbor (k-NN) [34] and (4) Classification And Regression Trees (CART) [35]. Within one round of performance comparison among FLF, DLF, SBP and GACEM, the selected basic classifiers are identical. We do not optimize the architecture or the parameters of the basic classifiers because we care about the relative performance of the ensemble approaches rather than their absolute performance. Moreover, as mentioned above, the decision combination function $F_{DC}$ can be an arbitrary rule. Without loss of generality, we adopt plurality voting as the decision combination function.
The Classification Accuracy Rate (CAR) of GACEM with different basic classifiers is given in Figure 9, and the best fitness value versus generation is shown in Figure 10. In Table 4, each row represents the feature source of a classifier. For example, in Table 4(a), the first classifier $f_1$ is built on the feature from the 2nd sensor channel ($H_2$) and its weight is 0.2075. Because the given threshold is 0.05, $f_1$ is accepted into the classifier ensemble system.
Here we also adopt plurality voting as the decision combination function. A natural explanation for the choice of threshold is that a classifier whose weight is less than the average (1/N) contributes little to the ensemble.
A comparison of CAR for $N = 3M$ and $N = M$ is shown in Figure 11. CAR improves for every kind of basic classifier, which supports our hypothesis that enlarging N is helpful. The best fitness value versus generation of GACEM with different basic classifiers is shown in Figure 12. As in Figure 10, the upward trend in the last few generations suggests that more generations would yield better performance. With $N = 3M$, the number of classifiers selected for the ensemble is 7, 11, 3 and 12 using LDC, QDC, k-NN and CART, respectively. In particular, when the basic classifier is k-NN, only three of the 21 ($N = 3M = 21$) generated classifiers are chosen for the ensemble (see Table 5). Nevertheless, the performance is even better than that of the ensemble consisting of seven classifiers presented in Table 4(c). This means that GACEM can generate classifier ensembles that are far smaller yet have more powerful classification ability.
Table 5. Encoded chromosome individual with the best fitness on k-NN; only $f_{15}$, $f_{16}$ and $f_{19}$ have weights greater than the threshold.
Figure 13 shows that, with the basic classifier fixed, the CAR of GACEM varies little among the three listed combination functions, i.e., majority voting, plurality voting and weighted averaging. This means that GACEM is not very sensitive to the choice of combination function.
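For reference, the three combination functions compared above can be sketched as follows. The paper does not spell out their exact definitions, so the versions below are common textbook forms and should be read as assumptions: majority voting returns no decision unless one label has more than half of the votes, and weighted averaging operates on per-class scores supplied by each classifier. Function names are hypothetical.

```python
from collections import Counter


def plurality_vote(labels):
    """Label with the most votes (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def majority_vote(labels):
    """Label backed by more than half of the votes, or None if there is none."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def weighted_average(score_dicts, weights):
    """Class with the largest weighted mean of the per-class scores."""
    total = sum(weights)
    classes = set().union(*score_dicts)
    mean = {c: sum(w * s.get(c, 0.0) for s, w in zip(score_dicts, weights)) / total
            for c in classes}
    return max(mean, key=mean.get)


if __name__ == "__main__":
    votes = ["a", "b", "a", "c"]
    print(plurality_vote(votes), majority_vote(votes))
    print(weighted_average([{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}], [0.2, 0.8]))
```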

Conclusions
The experimental study shows that GACEM is superior to both conventional feature-level fusion and decision-level fusion because it combines more than one classifier to obtain a more precise classification result. In addition, GACEM is able to pick out the elite classifiers from a pool in which good and bad are intermingled, which can reduce the complexity of the classifier ensemble system remarkably.
Although GACEM has obtained impressive performance in our empirical study, we believe that there are still several directions for improvement: (1) using more sophisticated and powerful classifiers, such as the support vector machine (SVM), as the basic classifier, (2) improving the basic classifiers by combining them with methods that subsample the training examples, such as Bagging or Boosting, and (3) using different basic classifiers for different subsets of the feature set by adding extra gene positions that indicate both the type and the parameters of the basic classifier, and then allowing the GA to search for the optimal setting. It is also feasible to design algorithms for sensor selection [36,37] along the same lines as GACEM.