Proceeding Paper

Application of Information Theory Entropy as a Cost Measure in the Automatic Problem Solving †

Eugene Eberbach
Department of Computer Science and Robotics Engineering Program, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609-2280, USA
Presented at the IS4SI 2017 Summit DIGITALISATION FOR A SUSTAINABLE SOCIETY, Gothenburg, Sweden, 12–16 June 2017.
Proceedings 2017, 1(3), 194; https://doi.org/10.3390/IS4SI-2017-04037
Published: 9 June 2017

Abstract

We study the relation between Information Theory and Automatic Problem Solving to demonstrate that the entropy measure can be used as a special case of the $-Calculus cost function measure. We hypothesize that Kolmogorov Complexity (Algorithmic Entropy) can be useful to standardize the $-Calculus search (algorithm) cost function.

1. Introduction

The field of Information Theory, founded by Claude Shannon in 1948 [1], is a branch of statistics that is essentially about uncertainty in communication. Shannon showed that uncertainty can be quantified, linking physical entropy to messages, and defined the entropy of a discrete random variable X as Entropy(X) = −Σi P(xi) log2 P(xi), where the sum runs over i = 1, …, n.
A key result of Shannon entropy is that −log2 P(xi) gives the length in bits of the optimal prefix code (e.g., a Huffman code) for a message xi. Analogously to conditional probabilities, conditional entropy has been defined to express mutual information, as well as entropy for continuous random variables.
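As a concrete illustration (ours, not part of the original text), the following minimal Python sketch computes the entropy of a small, made-up discrete distribution and the ideal per-symbol code lengths −log2 P(xi):

```python
import math

def entropy(probabilities):
    """Shannon entropy H(X) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical source emitting four symbols with these probabilities.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

H = entropy(probs.values())
# -log2 P(x) is the ideal code length of symbol x; an optimal prefix code
# (e.g., a Huffman code) attains these lengths when they are integers.
ideal_lengths = {x: -math.log2(p) for x, p in probs.items()}

print(f"H(X) = {H:.3f} bits")   # H(X) = 1.750 bits
print(ideal_lengths)            # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}
```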
It appears that Information Theory can be applied to practically anything, including coding theory, communication, data mining, machine learning, physics, and bioinformatics. In this paper, we investigate the relations between Information Theory and Automatic Problem Solving.
Universal Problem Solving Methods are part of AI and Theoretical Computer Science [2,3,4,5]. However, automatic problem solving requires the construction of a universal algorithm, and this is a problem unsolvable by a Turing machine. The importance of automatic problem solving methods is so tremendous that even partial solutions are very desirable:
  • The never-ending dream of universal problem-solving methods has resurfaced throughout the history of computer science:
    - Universal Turing Machine (UTM, Turing, 1936);
    - General Problem Solver (GPS, Simon and Newell, 1961);
    - Evolutionary Algorithms (Holland, Rechenberg, Fogel, 1960s);
    - Prolog and the Fifth Generation Computers Project (1970s/80s);
    - Genetic Programming Problem Solver (GPPS, Koza, 1990s);
    - $-Calculus [2,3,4];
  • Negative Results: a universal algorithm does not exist—halting/decision problem of UTM (Turing, 1936) can be represented as a special instance of the universal algorithm, No Free Lunch Theorem (Wolpert, Macready, 1997);
  • Positive Result: a universal algorithm can be indefinitely asymptotically approximated (Eberbach, CEC’2003).
For such approximation, models of computation more expressive than the TM are needed. They are called superTuring or hypercomputational models of computation, or superrecursive algorithms [3,6].

2. Automatic Problem Solving: $-Calculus SuperTuring Model of Computation

  • $-Calculus (read: Cost Calculus) [2,3,4] is a process algebra using anytime algorithms [5] for interactive problem solving, targeting intractable and undecidable problems. It is a formalization of resource-bounded computation (anytime algorithms) [5], guaranteed to produce better results as more resources (e.g., time, memory) become available. Its unique feature is support for problem solving by incrementally searching for solutions and using a cost performance measure to direct the search.
  • We can write in short that $-Calculus = Process Algebra + Anytime Algorithms.
  • Historically, $-Calculus has been inspired by both λ-Calculus and π-Calculus:
    - Church’s λ-Calculus (1936): sequential algorithms, equivalent to the TM model, built around the function as a basic primitive.
    - Milner’s π-Calculus (1992): parallel algorithms, a superTuring model, but with no support for automatic problem solving, built around interaction.
    - Eberbach’s $-Calculus (1997): parallel algorithms, a superTuring model, with built-in support for automatic problem solving (kΩ-optimization), built around cost.
  • The kΩ-optimization meta-search represents this “impossible” to construct but “possible to approximate indefinitely” universal algorithm, i.e., it approximates the universal algorithm. It is a very general search method, allowing it to simulate many other search algorithms, including A*, minimax, dynamic programming, tabu search, and evolutionary algorithms.
$-Calculus Syntax:
  • Simplicity: everything is a $-expression; it is an open system, written in prefix notation with a potentially infinite number of arguments, and consisting of simple and complex $-expressions (a small illustrative encoding is sketched after this list).
    - Simple (atomic) $-expressions:
  • send ( a P1 P2 ) send Pi through channel a.
  • receive ( a X1 X2 ) receive Xi from channel a.
  • cost ( $ P1 P2 ) compute cost of Pi.
  • suppression ( P1 P2 ) suppress evaluation of Pi.
  • atomic call ( a P1 P2 ) and its definition ( := (a X1 X2 ) ^P^).
  • negation ( ¬a P1 P2 …) negation of atomic function call.
    - Complex $-expressions:
  • general choice ( + P1 P2 ) pick up one of Pi.
  • cost choice ( ? P1 P2 ) pick up Pi with the smallest cost.
  • adversary choice ( # P1 P2 ) pick up Pi with the highest cost.
  • sequential composition ( . P1 P2 ) do P1 P2 sequentially.
  • parallel composition ( || P1 P2 ) do P1 P2 in parallel.
  • function call ( f P1 P2 ) and its definition ( := ( f X1 X2 ) P).
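The sketch below (our illustration, not part of the calculus definition) encodes a few complex $-expressions as prefix-notation nested tuples in Python and evaluates a toy cost for them; the atomic action costs, the uniform probabilities assumed for general choice, and the summing of costs over compositions are all simplifying assumptions made only for this example:

```python
# Illustrative only: $-expressions as prefix-notation nested tuples.
ACTION_COST = {"a1": 1.0, "a2": 4.0, "b1": 2.0, "b2": 3.0}  # hypothetical atomic costs

def cost(expr):
    """Toy cost evaluator for a fragment of the complex $-expressions."""
    if isinstance(expr, str):              # simple (atomic) $-expression
        return ACTION_COST[expr]
    op, *args = expr
    child_costs = [cost(e) for e in args]
    if op == "?":                          # cost choice: cheapest branch
        return min(child_costs)
    if op == "#":                          # adversary choice: costliest branch
        return max(child_costs)
    if op == "+":                          # general choice: expected cost,
        return sum(child_costs) / len(child_costs)  # assuming uniform probabilities
    if op in (".", "||"):                  # sequential/parallel composition:
        return sum(child_costs)            # summing is a simplifying assumption
    raise ValueError(f"unknown operator {op!r}")

# ( ? ( . a1 b1 ) ( . a2 b2 ) ): pick the cheaper of two sequential branches.
expr = ("?", (".", "a1", "b1"), (".", "a2", "b2"))
print(cost(expr))  # 3.0 -- the branch starting with a1
```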
$-Calculus Operational Cost Semantics:
  • Search methods (and optimization, in particular) can be:
    - complete if no solutions are omitted,
    - optimal if the highest quality solution is found,
    - totally optimal if the highest quality solution with minimal search cost is found.
  • Search can involve single or multiple agents; for multiple agents it can be:
    - cooperative ($-calculus cost choice used),
    - competitive ($-calculus adversary choice used),
    - random ($-calculus general choice used).
  • Search can be:
    - offline (n = 0, the complete solution is computed first and executed afterwards without perception),
    - online (n ≠ 0, action execution and computation are interleaved).
On Problem Solving as an Instance of Multiobjective Minimization:
Given an objective/cost function $: A × X → R, where A is an algorithm operating on its input X and R is the set of real numbers, problem solving can be understood as a multiobjective (total) minimization problem: find a* ∈ AF and x* ∈ XF, where AF ⊆ A are the terminal states of the algorithm and XF ⊆ X are the terminal states of X, such that $(a*, x*) = min{$1($2(a), $3(x)) : a ∈ A, x ∈ X}, where $3 is a problem-specific cost function, $2 is a search algorithm cost function, and $1 is an aggregating function combining $2 and $3 (a toy numerical sketch of this total-optimization reading follows the list below).
  • If $1 becomes an identity function we obtain Pareto optimality keeping objectives separate.
  • For optimization (best quality solutions)—$2 is fixed, and $3 is used only.
  • For search optimization (minimal search costs)—$3 is fixed, and $2 is used only.
  • For total optimization (best quality solutions with min search costs)—both $1, $2 and $3 are used.
  • kΩ-optimization (meta-search) is a very general search method that dynamically builds optimal or “satisficing” plans of actions from atomic and complex $-expressions.
  • The kΩ-meta-search is controlled by the following parameters:
    - k—depth of search,
    - b—width of search,
    - n—depth of execution,
    - Ω—alphabet for optimization.
  • kΩ-optimization works by iterating through three phases: select, examine, and execute.
  • It is a very flexible and powerful method that combines the best of both worlds: deliberative agents for flexibility, and reactive agents for robustness.
  • The “best” programs are the programs with minimal cost—each statement in the language has its associated cost $ (this leads to a new paradigm—cost languages).
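As mentioned above, here is a toy numerical sketch (ours, with arbitrary numbers) contrasting pure optimization with total optimization: $3 scores solution quality, $2 charges for search effort, and $1 aggregates the two; the weighted sum used for $1 is just one possible choice of aggregating function.

```python
# Hypothetical costs for two candidate search algorithms.
search_cost   = {"greedy": 1.0, "exhaustive": 5.0}   # $2(a): cost of running a
solution_cost = {"greedy": 3.0, "exhaustive": 2.0}   # $3(x): cost of the solution a finds

def aggregate(s2, s3):
    """$1: one possible aggregating function (a weighted sum)."""
    return 0.5 * s2 + s3

# Pure optimization ($2 fixed): only solution quality matters.
best_quality = min(solution_cost, key=solution_cost.get)

# Total optimization: best quality obtainable with minimal search cost.
best_total = min(search_cost, key=lambda a: aggregate(search_cost[a], solution_cost[a]))

print(best_quality, best_total)   # exhaustive greedy
```

With these arbitrary numbers the exhaustive algorithm finds the better solution, but the greedy one wins once search cost is charged, which is exactly the trade-off that total optimization is meant to capture.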
Cost Performance Measures and Standard Cost Function:
  • $-calculus is built around the central notion of cost. Cost functions represent a uniform criterion for search and for the quality of solutions in problem solving. Cost functions have their roots in von Neumann/Morgenstern utility theory and satisfy the axioms for utilities [5]. In decision theory they allow choosing states with optimal utilities on average (the maximum expected utility principle).
  • In $-calculus they allow choosing states with minimal costs subject to uncertainty (expressed by probabilities, fuzzy set, or rough set membership functions).
  • It is not clear whether it is possible to define a minimal and complete set of cost functions, i.e., one usable “for anything”. $-calculus approximates this desire for universality by defining a standard cost function usable for many things (but not for everything; thus a user may define their own cost functions).

3. Application of Information Theory Entropy as an Instance of $-Calculus Cost Measure

Most algorithms that have been developed for learning decision trees are variations on a core algorithm that employs a top-down, greedy search through the space of possible decision trees, e.g., the ID3 algorithm by Quinlan and its successor C4.5 [5]. ID3 performs a simple hill-climbing search through the hypothesis space of possible decision trees, using the Shannon-based information gain measure as its evaluation function. The information gain Gain(S,A) of an attribute A relative to a collection of examples S is defined as
Gain(S,A) = Entropy(S) − Σv∈Values(A) (|Sv|/|S|) Entropy(Sv), where Entropy(S) = −Σi pi log2 pi.
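For concreteness, here is a minimal Python sketch of these two formulas on a tiny, made-up training set (the attribute and value names mirror those used in the walkthrough below):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class distribution of S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target="class"):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v)."""
    n = len(rows)
    g = entropy([r[target] for r in rows])
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        g -= (len(subset) / n) * entropy(subset)
    return g

# Toy data: two input attributes a1, a2 with values a11/a12 and a21/a22.
data = [
    {"a1": "a11", "a2": "a21", "class": "yes"},
    {"a1": "a11", "a2": "a22", "class": "yes"},
    {"a1": "a12", "a2": "a21", "class": "no"},
    {"a1": "a12", "a2": "a22", "class": "no"},
]
print(gain(data, "a1"), gain(data, "a2"))  # 1.0 0.0 -- a1 separates the classes perfectly
```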
Let us consider problem solving (learning + classification) for ID3 expressed as a special case of kΩ-search, finding the shortest classification tree by minimizing the sum of negative gains, i.e., maximizing the sum of positive gains. The system consists of one agent (p = 1) that is interested only in the information gain for the alphabet A = {ai, aij}, i, j = 1, 2, i.e., Ω = A; the costs of other actions are ignored (being neutral, in this case having cost 0), and it uses a standard cost function $ = $3, where $3 represents the quality of solutions in the form of cumulative negative information gains (the payoff in $-calculus). In other words, total optimization is not performed, only regular optimization as in the original ID3. A weak congruence is used; in other words, empty actions have zero cost. The number of steps in the derivation tree selected for optimization in the examine phase is k = 2, the branching factor is b = ∞, and the number of steps selected for execution in the examine phase is n = 0, i.e., execution is postponed until learning is over. The flags gp = reinf = update = strongcong = 0. The goal of learning/classification is to minimize the sum of negative information gains. Machine learning takes the form of a tree of $-expressions that is built in the select phase, pruned in the examine phase, and passed to the execute phase for classification work. Data are split into training and test data as usual.
Let’s assume for simplicity that we have only one decision attribute and two input attributes a1 and a2 with data taking two possible values on them denoted by a11, a12, a21, a22. Let’s assume that cost of actions is equal to entropy of data associated with this action, i.e., $(ai) = Entropy(ai), $(aij) = Entropy(aij), i, j = 1, 2.
OPTIMIZATION: The goal will be to minimize the sum of costs (negative gains).
0.
t = 0, initialization phase init: S0 = ε0:
The initial tree consists of an empty action ε0 representing a missing classification tree whose cost is ignored (a weak congruence). Because S0 is not the goal state, the first loop iteration, consisting of the select, examine, and execute phases, replaces the invisible ε0 two steps deep (k = 2) with all offspring (b = ∞).
1.
t = 1, first loop iteration:
select phase sel: ε0 = (? (. a1 (+ (. a11 ε11) (. a12 ε12))) (. a2 ( + (. a21 ε21) (. a22 ε22)))),
examine phase exam: $(S0) = $(ε0) = min($(a1) + p11 $(a11) + p12 $(a12), $(a2) + p21 $(a21) + p22 $(a22)).
Let us assume that attribute a1 was selected, i.e., the $-expression starting from a1 is cheaper. Note that, due to the appropriate definition of the standard cost function [2,3], this corresponds to the negative gain from ID3.
Note that no estimates of future solutions are used (weak congruence—greedy hill climbing search). Execution is postponed (n = 0), and follow-up ε11 and ε12 will be selected for expansion in the next loop iteration.
2.
t = 2, second loop iteration:
select phase sel: ε11 = (. a2 (+ (. a21 ε21) (. a22 ε22))).
Let us assume that ε22 has data from one class only; thus, it is a leaf node and no further splitting of training data is required.
examine phase exam:
Nothing to optimize/prune—all attributes were used in the path, or the leaf node contained sample data from one class of the decision attribute. This ends the learning phase, and the shortest decision tree is designated for execution:
execute phase exec:
Test data are classified by the decision tree left over from the select/examine phases. After that, the kΩ-search re-initializes for a new problem to solve.
Note that if we change, for example, the value of k (considering a few attributes in parallel), b, or n, or switch from optimization to total optimization, the result will be a related algorithm, but no longer ID3. This is the biggest advantage and flexibility of $-calculus automatic problem solving: it can modify existing algorithms “on the fly” and design new algorithms, rather than merely simulating ID3.
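The toy Python sketch below is our simplified emulation of the select/examine loop described above, not an implementation of the actual kΩ-machinery: in each iteration it “selects” the candidate splits of the current node, “examines” them by keeping the attribute with minimal cost (the probability-weighted entropy of its children, which is equivalent to maximizing information gain), and recurses until each leaf is pure or no attributes remain; classification (the execute phase) is postponed until learning ends (n = 0).

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_cost(rows, attr, target="class"):
    """Cost of splitting on attr: probability-weighted entropy of its children."""
    n = len(rows)
    return sum(
        (len(sub) / n) * entropy(sub)
        for sub in (
            [r[target] for r in rows if r[attr] == v]
            for v in {r[attr] for r in rows}
        )
    )

def learn(rows, attrs, target="class"):
    """Select candidate splits, keep the cheapest (greedy hill climbing), recurse."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:          # pure leaf, or no attributes left
        return Counter(labels).most_common(1)[0][0]
    best = min(attrs, key=lambda a: split_cost(rows, a, target))   # examine phase
    rest = [a for a in attrs if a != best]
    return {
        "attr": best,
        "children": {
            v: learn([r for r in rows if r[best] == v], rest, target)
            for v in {r[best] for r in rows}
        },
    }

def classify(tree, row):
    """Execute phase: walk the learned tree for a test example."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree

data = [
    {"a1": "a11", "a2": "a21", "class": "yes"},
    {"a1": "a11", "a2": "a22", "class": "yes"},
    {"a1": "a12", "a2": "a21", "class": "no"},
    {"a1": "a12", "a2": "a22", "class": "yes"},
]
tree = learn(data, ["a1", "a2"])            # splits on a1 first, then a2 where needed
print(classify(tree, {"a1": "a12", "a2": "a22"}))   # 'yes'
```

Replacing this greedy, one-attribute-at-a-time selection with a deeper lookahead, or charging for search effort as well, would correspond to the kind of ID3 modifications described above.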

4. Conclusions and Future Work

All we were able to demonstrate in this paper is that entropy can be used as a special case of a $-calculus cost function; however, it cannot replace all instances of cost functions. One of the main unsolved problems of $-calculus is the axiomatization of cost functions in the style of Kolmogorov’s axiomatization of probability theory [5] (this might be an undecidable problem), or estimating how well the $-calculus standard cost function approximates all cost functions. We hypothesize that Kolmogorov complexity, also known as algorithmic entropy [7], can be used to standardize the $-calculus search (algorithm) cost function $2, but this is left for future work.
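Kolmogorov complexity is itself uncomputable, so any use of it as a standard search cost function $2 would have to rely on computable approximations. A common, crude upper bound is the length of a compressed description; the sketch below (our illustration, not a result of this paper) contrasts a highly regular, hypothetical search trace with a patternless one of the same length:

```python
import random
import string
import zlib

def compressed_length(description: str) -> int:
    """Crude, computable upper bound on algorithmic entropy:
    the length in bytes of a zlib-compressed description."""
    return len(zlib.compress(description.encode("utf-8"), 9))

# A highly regular trace compresses well (low algorithmic entropy) ...
regular_trace = "select examine execute " * 100

# ... while a patternless trace of the same length does not.
random.seed(0)
random_trace = "".join(random.choices(string.ascii_lowercase + " ", k=len(regular_trace)))

print(compressed_length(regular_trace), compressed_length(random_trace))
```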

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656. [Google Scholar] [CrossRef]
  2. Eberbach, E. Approximate reasoning in the algebra of bounded rational agents. Int. J. Approx. Reason. 2008, 49, 316–330. [Google Scholar] [CrossRef]
  3. Eberbach, E. The $-Calculus process algebra for problem solving: A paradigmatic shift in handling hard computational problems. Theor. Comput. Sci. 2007, 383, 200–243. [Google Scholar] [CrossRef]
  4. Eberbach, E. $-Calculus of bounded rational agents: Flexible optimization as search under bounded resources in interactive systems. Fundam. Inform. 2005, 68, 47–102. [Google Scholar]
  5. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Prentice-Hall: Upper Saddle River, NJ, USA, 1995. [Google Scholar]
  6. Burgin, M. Super-Recursive Algorithms; Springer: New York, NY, USA, 2005. [Google Scholar]
  7. Kolmogorov, A.N. On tables of random numbers. Sankhyā Ser. A 1965, 25, 369–375. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
