# A Risk Profile for Information Fusion Algorithms


## 1. Introduction

## 2. Background

#### 2.1. Alpha-Beta Fusion Algorithm

With α, the power of a generalized mean, and N^β, representing the effective number of independent samples, both the need to smooth errors and the need to account for correlations are modeled. The probability of class C_i, conditioned on multiple data inputs x_1, ..., x_N, is approximated by [21]:

$$P(C_i \mid x_1, \ldots, x_N) \approx \frac{1}{Z}\, P(C_i) \left( \sum_{j=1}^{N} w_j\, P(x_j \mid C_i)^{\alpha} \right)^{N^{\beta}/\alpha}, \qquad (1)$$

where w_i is a weighting of the input probabilities, Z normalizes over the classes, and the α → 0 case is taken as the weighted geometric mean. The generalized mean of the likelihood functions P(x_j | C_i) is computed with α as the parameter of the generalized mean. The prior probability P(C_i) is considered independent of the likelihoods. β ranges from 0 (fully correlated likelihoods) to 1 (fully independent likelihoods). Combinations of alpha and beta recover well-known combining rules, such as naïve-Bayes (α = 0, β = 1), log-average (α = 0, β = 0), and average (α = 1, β = 0), along with a continuum of combinations between these rules, which provides flexibility in modeling error and correlation. The weighting parameters can be used as confidence measures on the individual inputs. While these weights will not be examined here, use of the assessment tools described here on individual inference algorithms could provide a method for assigning them.
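The combining rule above can be sketched numerically. The code below is a minimal illustration, not the authors' implementation: it assumes the fused class score is the prior times the weighted generalized mean of the likelihoods raised to the power N^β, renormalized over classes; the function names are illustrative.

```python
import numpy as np

def generalized_mean(p, alpha, w=None):
    """Weighted generalized (power) mean of values p.

    alpha = 0 is taken as the limiting case, the weighted geometric mean.
    """
    p = np.asarray(p, dtype=float)
    w = np.full(p.shape, 1.0 / p.size) if w is None else np.asarray(w, dtype=float)
    if alpha == 0.0:
        return np.exp(np.sum(w * np.log(p)))
    return np.sum(w * p ** alpha) ** (1.0 / alpha)

def alpha_beta_fuse(likelihoods, priors, alpha, beta):
    """Sketch of alpha-beta fusion for an (N inputs x K classes) likelihood array.

    Each class score is prior * M_alpha(likelihoods)**(N**beta), renormalized.
    (alpha=0, beta=1) -> naive Bayes; (0, 0) -> log-average; (1, 0) -> average.
    """
    L = np.asarray(likelihoods, dtype=float)
    n, k = L.shape
    scores = np.array([priors[c] * generalized_mean(L[:, c], alpha) ** (n ** beta)
                       for c in range(k)])
    return scores / scores.sum()
```

For two inputs reporting likelihoods (0.8, 0.6) for the true class, (α, β) = (0, 1) reproduces the naïve-Bayes product rule and (1, 0) the simple average.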

A standard metric for assessing the accuracy of probability forecasts is the surprisal, −ln p_{true,i}, where p_{true,i} is the probability of the true class for the i-th sample. While this metric accurately reflects the cost of information for a probability, its asymptotic approach to infinity as the probability approaches zero is a severe cost, which has been criticized as not reflective of the performance of forecasting systems. Within forecasting, a common alternative is the Brier or mean-square scoring rule, (1 − p_{true,i})^2, which limits the cost of reporting zero probability to 1. Like surprisal, the Brier score is a proper score, which requires that the reported forecast be unbiased relative to the expected probability. Nevertheless, an overreliance on the mean-square average, which does not reflect the full characteristics of an information metric, can encourage the design of inferencing algorithms that report very confident probabilities. This over-confidence reduces the robustness of the algorithm. It can be examined by considering a direct model of risk and robustness using the principles of nonlinear statistical coupling, and will be evident in the example shown in Section 4.

#### 2.2. Modeling Risk with Coupled-Surprisal

For independent subsystems A and B, the non-additive (Tsallis) entropy combines as H_q(AB) = H_q(A) + H_q(B) + (1 − q)H_q(A)H_q(B). The maximum entropy distribution for the non-additive entropy constrained by the deformed probabilities is referred to as a q-exponential or q-Gaussian. This methodology has been shown to provide an effective model of the origin of power-law phenomena in a variety of complex systems [26], such as turbulence, the dynamics of the solar wind, stock market fluctuations, and biological processes, to name just a few. Further theoretical evidence, including generalized central limit theorems [15,27], probabilistic models [28,29,30], and models of statistical fluctuations [31,32], has demonstrated a foundational justification for this approach. Nevertheless, a direct physical interpretation of the symbol q has remained elusive. One of the difficulties is that the original convention, based on raising probabilities to a power, results in an asymmetry between the mathematical model and the physical principle of non-additivity. By directly modeling the degree of non-additivity as κ = 1 − q, other physical interpretations are more direct and the mathematical symmetry simplifies the development of consistent models [6,10,33,34]. Kappa is chosen as a symbol because of its usage for coupling coefficients, such as the spring constant, and its relation to curvature in geometry. The definition used here is related to Kaniadakis' deformed logarithm by κ_coupling = 2κ_Kaniadakis with r = 2κ_Kaniadakis [35,36].

The coupled-surprisal, −ln_κ(p_i), which is shown in Figure 1, has some advantages with regard to a consistent interpretation of κ as specifying the negative risk, or optimism. Positive values of κ lower the information cost, which is equivalent to a reduction in risk or an optimistic belief. This domain has a finite cost for probability zero and is associated with the compact-support maximum entropy distributions, which have a finite domain of non-zero probabilities. The cost of information for negative values of κ approaches infinity faster than the Shannon surprisal. This domain is associated with the heavy-tail maximum entropy distributions and is consistent with the higher cost of information selecting distributions which are more 'robust', in that they model a slower decay to states which can be ignored.
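These two regimes can be sketched with a short function, assuming the Tsallis-style deformed logarithm ln_κ(x) = (x^κ − 1)/κ with κ = 1 − q; the exact normalization may differ from the authors' definition, so treat this as illustrative.

```python
import math

def coupled_surprisal(p, kappa):
    """Coupled surprisal -ln_kappa(p), assuming ln_kappa(x) = (x**kappa - 1)/kappa.

    kappa -> 0 recovers the Shannon surprisal -ln p.
    kappa > 0: finite cost at p = 0 (optimistic / decisive regime).
    kappa < 0: cost grows faster than -ln p as p -> 0 (robust regime);
               note p = 0 with kappa < 0 is a genuine pole.
    """
    if kappa == 0.0:
        return -math.log(p)
    return (1.0 - p ** kappa) / kappa
```

For example, with κ = 0.5 the cost of reporting probability zero is capped at 1/κ = 2, while with κ = −0.5 the cost at p = 0.01 is already several times the Shannon surprisal.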

**Figure 1.** Coupled-surprisal cost function, −ln_κ p_true. The red curve is the Shannon surprisal (κ = 0); curves above it represent a more robust metric (κ < 0); curves below it represent a more decisive metric (κ > 0).

## 3. Relationship between Generalized Mean and Generalized Entropies

The Tsallis entropy can be expressed as an expectation over p_i and p_i^{−κ}, to emphasize the relation to the generalized mean with both the weight and the sample equal to p_i and the mean parameter α = −κ. The expression for the Rényi entropy is common with the substitution κ = 1 − q; however, the connection between Equations (8) and (6) requires use of the κ-power and κ-product.

The assessment metric used here is the average coupled-surprisal of the true-class probabilities, −ln_κ p_{i,true}, averaged over equally weighted test samples.
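A risk profile like the one shown later in Figure 3b, the generalized mean of the true-class probabilities as a function of the coupling κ, can be sketched as follows. This is illustrative code under the assumption that the profile is the power mean M_κ = (mean p^κ)^{1/κ}, the effective probability corresponding to the average coupled-surprisal.

```python
import numpy as np

def risk_profile(p_true, kappas):
    """Generalized mean of true-class probabilities at each coupling value.

    M_kappa = (mean(p**kappa))**(1/kappa); kappa = 0 is taken as the
    geometric-mean limit. Decisive algorithms score high for kappa > 0
    but collapse for kappa < 0 when any true-class probability is near zero.
    """
    p = np.asarray(p_true, dtype=float)
    out = []
    for k in kappas:
        if k == 0.0:
            out.append(np.exp(np.mean(np.log(p))))  # geometric mean
        else:
            out.append(np.mean(p ** k) ** (1.0 / k))
    return np.array(out)
```

Sweeping κ over, say, [−1, 1] for each fusion method's true-class probabilities traces the profile: a single low probability drags the κ < 0 end down sharply, exposing a lack of robustness that the arithmetic mean (κ = 1) hides.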

The alpha-beta fusion method of Equation (1) can also be interpreted in terms of risk biases. The generalized-mean parameter sets the fusion risk bias, α = −κ_f. The input weights modify the confidence of each input, or in terms of risk, w_i = 1 − κ_i. The output confidence can be split into a portion which is the sum of the input weights and a portion which is the output confidence, N^{β−1} = 1 − κ_o. Thus the fusion method can be viewed as controlling the risk bias on the input, fusion, and output of the probabilities. When κ_f = 0 the coupled-power term reduces to the standard power term, which is the expression for the coupled-probability. Likewise, the output probability is modified by the coupled-probability term 1 − κ_o.

## 4. Application to Designing and Assessing a Fusion Algorithm

**Figure 2.** (a) Examples of the handwritten numerals used as a classification and inferencing problem. (b) Individual misclassification rates for the six feature sets. (c) Fusion of the feature sets using the generalized mean with alpha varied between −1 and 2.

**Figure 3.** (a) Histogram of the probabilities assigned to the true class for four fusion methods. (b) A risk profile based on the generalized mean of the true-class probabilities versus the coupling parameter κ. The naïve-Bayes is a decisive fusion method which has a near-perfect score for large positive values of κ, but lacks robustness, which is reflected in the sharp drop in the generalized mean for negative values of κ. Averaging and log-averaging are more robust methods; the generalized mean decays more slowly for negative values of κ but does not achieve as high a value for positive κ. Using the alpha-beta fusion method, the neutral (Shannon surprisal) metric is optimized at α = 0.4, β = 0.6 and has improved robustness relative to the naïve-Bayes.

The weights w_i of the alpha-beta fusion method in Equation (1) can be viewed as an individual risk bias on the inputs, equal to κ_i = 1 − w_i. These algorithmic methods will be considered in more detail in a future publication.

**Figure 4.** Performance of alpha-beta fusion against effective probabilities based on the cost function for (a) Shannon surprisal, κ = 0; (b) the Brier or mean-square average; (c) robust coupled-surprisal, κ = −0.5; and (d) decisive coupled-surprisal, κ = 0.5. The circles indicate the region of optimal performance.

## 5. Conclusions

The alpha-beta fusion algorithm uses α, the power of a generalized mean, to smooth errors in the likelihoods and N^β to model the effective number of independent samples given N inputs. Together these two parameters span a continuum of combining rules, including naïve-Bayes, log-average, and average fusion, while allowing the risk bias of the fused probabilities to be controlled.

## Acknowledgements

## References

1. Dawid, A.P. The geometry of proper scoring rules. Ann. Inst. Stat. Math. **2007**, 59, 77–93.
2. Gneiting, T.; Raftery, A.E. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. **2007**, 102, 359–378.
3. Jose, V.; Nau, R.F.; Winkler, R.L. Scoring rules, generalized entropy, and utility maximization. Oper. Res. **2008**, 56, 1146.
4. Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003.
5. Rosen, D.B. How good were those probability predictions? The expected recommendation loss (ERL) scoring rule. In Proceedings of the Thirteenth International Workshop on Maximum Entropy and Bayesian Methods; Heidbreder, G.R., Ed.; Kluwer Academic Pub.: Santa Barbara, CA, USA, 1993; p. 401.
6. Wang, Q.A.; Nivanen, L.; Le Mehaute, A.; Pezeril, M. On the generalized entropy pseudoadditivity for complex systems. J. Phys. A **2002**, 35, 7003–7007.
7. Furuichi, S.; Yanagi, K.; Kuriyama, K. Fundamental properties of Tsallis relative entropy. J. Math. Phys. **2004**, 45, 4868.
8. Beck, C. Generalised information and entropy measures in physics. Cont. Phys. **2009**, 50, 495–510.
9. Tsallis, C. Nonadditive entropy and nonextensive statistical mechanics: an overview after 20 years. Braz. J. Phys. **2009**, 39, 337–356.
10. Nelson, K.P.; Umarov, S. Nonlinear statistical coupling. Phys. A **2010**, 389, 2157–2163.
11. Borges, E.P. A possible deformed algebra and calculus inspired in nonextensive thermostatistics. Phys. A **2004**, 340, 95–101.
12. Pennini, F.; Plastino, A.; Ferri, G.L. Fisher information, Borges operators, and q-calculus. Phys. A **2008**, 387, 5778–5785.
13. Suyari, H.; Tsukada, M. Law of error in Tsallis statistics. IEEE Trans. Inf. Theory **2005**, 51, 753–757.
14. Wada, T.; Suyari, H. κ-generalization of Gauss' law of error. Phys. Lett. A **2006**, 348, 89–93.
15. Umarov, S.; Tsallis, C.; Steinberg, S. On a q-central limit theorem consistent with nonextensive statistical mechanics. Milan J. Math. **2008**, 76, 307–328.
16. Kittler, J.; Hatef, M.; Duin, R.; Matas, J. On combining classifiers. IEEE Trans. Patt. Anal. Mach. Intel. **1998**, 20, 226.
17. Tax, D.; Van Breukelen, M.; Duin, R. Combining multiple classifiers by averaging or by multiplying? Patt. Recognit. **2000**, 33, 1475–1485.
18. Kuncheva, L.I. Combining Pattern Classifiers: Methods and Algorithms; Wiley-Interscience: Hoboken, NJ, USA, 2004.
19. Hero, A.O.; Ma, B.; Michel, O.; Gorman, J. Alpha-divergence for classification, indexing and retrieval. Technical Report CSPL-328; University of Michigan, Communication and Signal Processing Laboratory: Ann Arbor, MI, USA, May 2001.
20. Amari, S. Integration of stochastic models by minimizing α-divergence. Neural Comp. **2007**, 19, 2780–2796.
21. Scannell, B.J.; McCann, C.; Nelson, K.P.; Tgavalekos, N.T. Fusion algorithm for the quantification of uncertainty in multi-look discrimination. Presented at the 8th Annual U.S. Missile Defense Conference, Washington, DC, USA, 22–24 March 2010.
22. Anteneodo, C.; Tsallis, C.; Martinez, A.S. Risk aversion in economic transactions. Europhys. Lett. **2002**, 59, 635–641.
23. Anteneodo, C.; Tsallis, C. Risk aversion in financial decisions: A nonextensive approach. arXiv **2003**, arXiv:cond-mat/0306605v1.
24. Topsoe, F. On truth, belief and knowledge. In Proceedings of the 2009 IEEE International Symposium on Information Theory (ISIT'09), Seoul, Korea, 28 June–3 July 2009; Volume 1, pp. 139–143.
25. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. **1988**, 52, 479–487.
26. Gell-Mann, M.; Tsallis, C. Nonextensive Entropy: Interdisciplinary Applications; Oxford University Press: New York, NY, USA, 2004.
27. Vignat, C.; Plastino, A. Central limit theorem and deformed exponentials. J. Phys. A **2007**, 40, F969–F978.
28. Marsh, J.A.; Fuentes, M.A.; Moyano, L.G.; Tsallis, C. Influence of global correlations on central limit theorems and entropic extensivity. Phys. A **2006**, 372, 183–202.
29. Moyano, L.G.; Tsallis, C.; Gell-Mann, M. Numerical indications of a q-generalised central limit theorem. Europhys. Lett. **2006**, 73, 813.
30. Hanel, R.; Thurner, S.; Tsallis, C. Limit distributions of scale-invariant probabilistic models of correlated random variables with the q-Gaussian as an explicit example. Eur. Phys. J. B **2009**, 72, 263–268.
31. Beck, C.; Cohen, E. Superstatistics. Phys. A **2003**, 322, 267–275.
32. Wilk, G.; Włodarczyk, Z. Fluctuations, correlations and the nonextensivity. Phys. A **2007**, 376, 279–288.
33. Nelson, K.P.; Umarov, S. The relationship between Tsallis statistics, the Fourier transform, and nonlinear coupling. arXiv **2008**, arXiv:0811.3777v1 [cs.IT].
34. Souto Martinez, A.; Silva González, R.; Lauri Espíndola, A. Generalized exponential function and discrete growth models. Phys. A **2009**, 388, 2922–2930.
35. Kaniadakis, G.; Scarfone, A.M. A new one-parameter deformation of the exponential function. Phys. A **2002**, 305, 69–75.
36. Kaniadakis, G.; Lissia, M.; Scarfone, A.M. Two-parameter deformations of logarithm, exponential, and entropy: A consistent framework for generalized statistical mechanics. Phys. Rev. E **2005**, 71, 046128.
37. Tsallis, C.; Plastino, A.R.; Alvarez-Estrada, R.F. Escort mean values and the characterization of power-law-decaying probability densities. J. Math. Phys. **2009**, 50, 043303.
38. Abe, S. Stability of Tsallis entropy and instabilities of Renyi and normalized Tsallis entropies: A basis for q-exponential distributions. Phys. Rev. E **2002**, 66, 046134.
39. Oikonomou, T. Tsallis, Renyi and nonextensive Gaussian entropy derived from the respective multinomial coefficients. Phys. A **2007**, 386, 119–134.
40. UCI Machine Learning Repository. Available online: http://www.ics.uci.edu/~mlearn/MLRepository.html (accessed on 15 November 2010).
41. Duin, R.; Tax, D. Experiments with classifier combining rules. In Multiple Classifier Systems; Springer: Berlin, Germany, 2000; Volume 1857, pp. 16–29.
42. Shafer, G. A Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, USA, 1976.

© 2011 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Nelson, K.P.; Scannell, B.J.; Landau, H.
A Risk Profile for Information Fusion Algorithms. *Entropy* **2011**, *13*, 1518-1532.
https://doi.org/10.3390/e13081518
