2.1. The Basic Concepts and Definitions
In order to make the paper self-contained, we define a simple model of a real computer; a detailed description can be found in [4]. A computer consists of a set of instructions $I$ and an accessible memory $M$. It is important to note that an instruction is specified not only by its name (say, JUMP), but also by the memory addresses and register indexes it involves. For example, all JUMP instructions which deal with different memory addresses are contained in $I$ as different instructions.
We suppose that at the initial moment there is a program and data, which can be considered as binary words $P$ and $D$ located in the memory $M$ of a computer. In what follows we call the pair $(P, D)$ a computer task. A computer task determines a certain sequence of instructions $X = x_1 x_2 \ldots$, $x_i \in I$. (It is supposed that an instruction may contain an address of a memory location, the index of a register, etc.) For example, if the program $P$ contains a loop which will be executed ten times, then the sequence $X$ will contain the body of this loop repeated ten times. We say that two computer tasks $(P_1, D_1)$ and $(P_2, D_2)$ are different if the corresponding sequences $X_1$ and $X_2$ are different.
It is important to note that we do not suppose that all possible sequences of instructions are allowable: in principle, there can be sequences of instructions which are forbidden. For example, it is possible that some pairs of instructions are forbidden, etc. In other words, the sequences of instructions may have to satisfy some limitations (or some rules). We denote the set of all allowable sequences of instructions by $S$. So, any computer task can be presented as a sequence of instructions from $S$. Moreover, the opposite is also true: any sequence of instructions from $S$ can be considered as a computer task. Indeed, using a so-called assembly language, any sequence of instructions from $S$ can be presented as a computer program; see, for example, [6]. (It is worth noting that some sequences can be meaningless and two different sequences of instructions can be equivalent. This situation is typical for any language when one considers its capacity, because a language contains synonyms, etc.)
Let us denote the execution time of an instruction $x$ by $\tau(x)$. For the sake of simplification, we suppose that all execution times $\tau(x)$, $x \in I$, are integers and that the greatest common divisor of $\tau(x)$, $x \in I$, equals 1. (This assumption is valid for many computers if the time unit equals a so-called clock rate and there are instructions whose execution time is one unit, i.e., $\tau(x) = 1$; see [6].) In this paper this assumption makes it possible to use $\lim$ instead of $\limsup$ when the capacity is defined.
Naturally, the execution time $\tau(X)$ of a sequence of instructions $X = x_1 x_2 \ldots x_t$ is given by
$$\tau(X) = \sum_{i=1}^{t} \tau(x_i).$$
Denote the number of different problems whose execution time equals $T$ by $\nu(T)$, and let $N(T)$ be the size of the set of all sequences of instructions whose execution time equals $T$, i.e.,
$$N(T) = |\{X : X \in S,\ \tau(X) = T\}|. \qquad (1)$$
The key observation is as follows:
$$\nu(T) = N(T). \qquad (2)$$
Hence,
$$\frac{\log \nu(T)}{T} = \frac{\log N(T)}{T}. \qquad (3)$$
(Here and below $T$ is an integer, and $|Y|$ is the number of elements of $Y$ if $Y$ is a set, and the length of $Y$ if $Y$ is a word.) In other words, the total number of computer tasks executed in time $T$ is equal to (1). Based on this consideration, we give the following definition.
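As a concrete illustration of (1), $N(T)$ can be computed for a small hypothetical computer (the instruction names and execution times below are invented for the example, not taken from the model above). When every sequence is allowable, each sequence of total time $T$ ends in some instruction $x$, which gives the recursion $N(T) = \sum_{x \in I} N(T - \tau(x))$:

```python
import math
from functools import lru_cache

# Hypothetical toy instruction set: name -> execution time tau(x).
TAU = {"ADD": 1, "JUMP": 2}

@lru_cache(maxsize=None)
def N(T):
    """N(T): number of instruction sequences with total execution time exactly T,
    assuming every sequence is allowable. Each sequence ends in some instruction x,
    so N(T) = sum over x of N(T - tau(x))."""
    if T == 0:
        return 1            # the empty sequence
    return sum(N(T - t) for t in TAU.values() if t <= T)

print([N(T) for T in range(1, 7)])         # → [1, 2, 3, 5, 8, 13]
# The ratio log2 N(T) / T stabilizes as T grows; its limit is the
# computer capacity of Definition 1.
print(round(math.log2(N(40)) / 40, 3))     # → 0.683
```

For these two execution times $N(T)$ is a Fibonacci sequence, so the ratio converges to $\log_2 \varphi \approx 0.694$, where $\varphi$ is the golden ratio.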
Definition 1. Let there be a computer with a set of instructions $I$ and let $\tau(x)$ be the execution time of an instruction $x \in I$. The computer capacity $C(I)$ is defined as follows:
$$C(I) = \lim_{T \to \infty} \frac{\log N(T)}{T}, \qquad (4)$$
where $N(T)$ is defined in (1).

Claim 1. The limit (4) exists if $I$ is finite, the execution times $\tau(x)$, $x \in I$, are integers and the greatest common divisor of $\tau(x)$, $x \in I$, equals 1.
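The role of the gcd condition in Claim 1 can be seen on a hypothetical example: if every execution time shares a divisor $d > 1$, then $N(T) = 0$ whenever $d$ does not divide $T$, so $\log N(T)/T$ has no plain limit and only the $\limsup$ exists.

```python
# Hypothetical instruction set whose execution times share the divisor 2,
# violating the gcd = 1 assumption of Claim 1.
TAU = {"LOAD": 2, "STORE": 4}

def N(T):
    """Number of instruction sequences with total execution time exactly T."""
    if T == 0:
        return 1
    return sum(N(T - t) for t in TAU.values() if t <= T)

# N(T) = 0 for every odd T, so log N(T) / T oscillates and has no limit.
print([N(T) for T in range(1, 9)])   # → [0, 1, 0, 2, 0, 3, 0, 5]
```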
The next question to be investigated is the definition of the efficiency (or performance) of a computer, when the computer is used for solving problems of a certain kind. For example, one computer can be a Web server, another can be used for solving differential equations, etc. Certainly, the computer efficiency depends on the problems the computer has to solve. In order to model this situation we suggest the following approach: there is an information source which generates a sequence of computer tasks in such a way that the computer begins to solve each next task as soon as the previous task is finished. We will not deal with a probability distribution on the sequences of computer tasks, but consider the sequences of computer instructions, determined by the sequences of computer tasks, as a stochastic process. In what follows we will consider the model where this stochastic process is stationary and ergodic, and we will define the computer efficiency for this case.
A natural question is the applicability of this model. The point is that modern computers are commonly used for solving several computer tasks in parallel, which is why the sequence of executed instructions is a mixture of quite a large number of subsequences. So, in some natural situations, this sequence can be modeled by a stationary ergodic source.
The definition of efficiency will be based on results and ideas of information theory, which we introduce in what follows. Let there be a stationary and ergodic process $z = z_1, z_2, \ldots$ generating letters from a finite alphabet $A$ (the definition of a stationary ergodic process can be found, for example, in [7]). The $k$-order Shannon entropy and the limit Shannon entropy are defined as follows:
$$h_k(z) = -\frac{1}{k+1} \sum_{u \in A^{k+1}} P_z(u) \log P_z(u), \qquad h_\infty(z) = \lim_{k \to \infty} h_k(z), \qquad (5)$$
where $u \in A^{k+1}$ and $P_z(u)$ is the probability that $z_1 z_2 \ldots z_{k+1} = u$ (this limit always exists; see [5,7]). We will consider so-called i.i.d. sources. By definition, they generate independent and identically distributed random variables from some set $A$. Now we can define the computer efficiency.
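For an i.i.d. source the $k$-order entropy (5) does not depend on $k$, so the limit entropy equals the single-letter entropy. A minimal sketch (the binary source and its probabilities are invented for the example):

```python
import math
from itertools import product

# Hypothetical i.i.d. source over A = {0, 1} with P(0) = 0.25, P(1) = 0.75.
P = {0: 0.25, 1: 0.75}

def h_k(k):
    """k-order Shannon entropy of definition (5):
    -(1/(k+1)) * sum over all words u of length k+1 of P(u) * log2 P(u).
    For an i.i.d. source, P(u) is the product of the letter probabilities."""
    total = 0.0
    for u in product(P, repeat=k + 1):
        p_u = math.prod(P[a] for a in u)
        total -= p_u * math.log2(p_u)
    return total / (k + 1)

# h_k is the same for every k, so h_inf equals the single-letter entropy h_0.
print([round(h_k(k), 6) for k in range(4)])   # → [0.811278, 0.811278, 0.811278, 0.811278]
```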
Definition 2. Let there be a computer with a set of instructions $I$ and let $\tau(x)$ be the execution time of an instruction $x \in I$. Let this computer be used for solving such a randomly generated sequence of computer tasks that the corresponding sequence of instructions $z = z_1, z_2, \ldots$, $z_i \in I$, is a stationary ergodic stochastic process. Then the efficiency is defined as follows:
$$c(z) = \frac{h_\infty(z)}{\sum_{x \in I} P_z(x)\, \tau(x)}, \qquad (6)$$
where $P_z(x)$ is the probability that $z_1 = x$.
Informally, the Shannon entropy is the quantity of information (per letter) which can be transmitted, and the denominator in Equation (6) is the average execution time of an instruction.
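To make (6) concrete, here is the efficiency of a hypothetical i.i.d. source that picks each instruction of an invented three-instruction computer uniformly (all names, times, and probabilities are assumptions for the example):

```python
import math

# Hypothetical toy instruction set; tau(x) is the execution time.
TAU = {"ADD": 1, "MUL": 2, "DIV": 4}
# An i.i.d. source choosing each instruction with equal probability.
P = {x: 1 / len(TAU) for x in TAU}

entropy = -sum(p * math.log2(p) for p in P.values())   # h_inf = h_0 for i.i.d.
avg_time = sum(P[x] * TAU[x] for x in TAU)             # denominator of (6)
efficiency = entropy / avg_time
print(round(efficiency, 4))                            # → 0.6793
```

The uniform source is generally not optimal; the optimal i.i.d. distribution is given later in the section.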
More formally, if we take a large integer $T$ and consider all $T$-letter sequences $z_1 z_2 \ldots z_T$, then, for large $T$, the number of “typical” sequences will be approximately $2^{T h_\infty(z)}$, whereas the total execution time of such a sequence will be approximately $T \sum_{x \in I} P_z(x)\, \tau(x)$. (By definition of a typical sequence, the frequency of any word $u$ in it is close to the probability $P_z(u)$. The total probability of the set of all typical sequences is close to 1.) So, the ratio of $\log 2^{T h_\infty(z)} = T h_\infty(z)$ and the total execution time will be asymptotically equal to (6), if $T \to \infty$.
A rigorous proof can be obtained based on methods of information theory; see [7]. We do not give it, because definitions do not need to be proven, but we mention that there are many results about channels which transmit letters of unequal duration [8].
2.2. Methods for Estimating the Computer Capacity
Now we consider the question of estimating the computer capacity and efficiency defined above. The efficiency, in principle, can be estimated based on statistical data obtained by observing a computer which solves tasks of a certain kind.
The computer capacity $C(I)$ can be estimated in different situations by different methods. In particular, a stream of instructions generated by different computer tasks can be described as a sequence of words created by a formal language, or the dependence between sequentially executed instructions can be modeled by Markov chains, etc. Seemingly the most general approach is to define the set of admissible sequences of instructions as a certain subset of all possible sequences. More precisely, the set of admissible sequences $G$ is defined as a subset $G \subset A^\infty$, where $A^\infty$ is the set of one-side infinite words over the alphabet $A$:
$$A^\infty = \{x = x_1 x_2 \ldots : x_i \in A,\ i = 1, 2, \ldots\}.$$
In this case the capacity of $G$ is deeply connected with the topological entropy and the Hausdorff dimension; for definitions and examples see [9,10,11,12] and references therein. We do not consider this approach in detail, because it seems to be difficult to use it for solving applied problems which require a finite description of the channels.
The simplest estimate of computer capacity can be obtained if we suppose that all sequences of the instructions are admissible. In other words, we consider the set of instructions
I as an alphabet and suppose that all sequences of letters (instructions) can be executed. In this case the method of calculation of the lossless channel capacity, given by C. Shannon in [
5], can be used. It is important to note that this method can be used for upper-bounding the computer capacity for all other models, because for any computer the set of admissible sequences of instructions is a subset of all words over the “alphabet”
I.
Let, as before, there be a computer with a set of instructions $I$ whose execution times are $\tau(x)$, $x \in I$, and suppose that all sequences of instructions are allowed. In other words, if we consider the set $I$ as an alphabet, then all possible words over this alphabet can be considered as admissible sequences of instructions for the computer. The question we consider now is how one can calculate (or estimate) the capacity (4) in this case. The solution was suggested by C. Shannon [5], who showed that the capacity is equal to the logarithm of the largest real solution $X_0$ of the following equation:
$$\sum_{x \in I} Y^{-\tau(x)} = 1, \qquad (7)$$
where $Y$ is a variable. In other words,
$$C(I) = \log X_0. \qquad (8)$$
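Equation (7) is easy to solve numerically, because its left-hand side is strictly decreasing for $Y > 0$; a bisection sketch for a hypothetical three-instruction set (names and times invented for the example):

```python
import math

# Hypothetical execution times tau(x) for a toy instruction set.
TAU = {"ADD": 1, "MUL": 2, "DIV": 4}

def f(y):
    """Left-hand side of Shannon's equation (7): sum over x of y**(-tau(x))."""
    return sum(y ** -t for t in TAU.values())

# f(1) = |I| >= 1 and f(|I|) <= 1 (since every tau(x) >= 1), and f is
# strictly decreasing, so bisection on [1, |I|] finds the largest real root X0.
lo, hi = 1.0, float(len(TAU))
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) > 1 else (lo, mid)

X0 = (lo + hi) / 2
capacity = math.log2(X0)          # C(I) = log2 X0, Equation (8)
print(round(X0, 6), round(capacity, 6))
```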
It is easy to see that the efficiency (6) is maximal if the sequence of instructions $z_1, z_2, \ldots$ is generated by an i.i.d. source with probabilities
$$p(x) = X_0^{-\tau(x)}, \qquad x \in I, \qquad (9)$$
where $X_0$ is the largest real solution to the Equation (7). Indeed, having taken into account that $h_\infty(z) = h_0(z)$ for an i.i.d. source [7] and the definition of entropy (5), the direct calculation of $c(z)$ in (6) shows that
$$c(z) = \frac{-\sum_{x \in I} p(x) \log p(x)}{\sum_{x \in I} p(x)\, \tau(x)} = \frac{\sum_{x \in I} p(x)\, \tau(x) \log X_0}{\sum_{x \in I} p(x)\, \tau(x)} = \log X_0$$
and, hence, $c(z) = C(I)$.
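This optimal distribution can be checked numerically on the same kind of hypothetical toy instruction set: the values $X_0^{-\tau(x)}$ do sum to 1, and the resulting efficiency equals $\log_2 X_0$ (the names, times, and solver below are assumptions for the example):

```python
import math

# Hypothetical toy instruction set; X0 is the largest real root of (7),
# found here by bisection.
TAU = {"ADD": 1, "MUL": 2, "DIV": 4}

def solve_x0():
    """Bisection for the root of sum_x Y**(-tau(x)) = 1 on [1, |I|]."""
    lo, hi = 1.0, float(len(TAU))
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(mid ** -t for t in TAU.values()) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

X0 = solve_x0()
p = {x: X0 ** -t for x, t in TAU.items()}     # optimal i.i.d. probabilities (9)
assert abs(sum(p.values()) - 1) < 1e-9        # they form a distribution

entropy = -sum(q * math.log2(q) for q in p.values())
avg_time = sum(p[x] * TAU[x] for x in TAU)
# The efficiency of this source equals the capacity log2 X0.
print(round(entropy / avg_time, 6), round(math.log2(X0), 6))
```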
It will be convenient to combine all the results about computer capacity and efficiency in the following statement:
Theorem 1. Let there be a computer with a set of instructions $I$ and let $\tau(x)$ be the execution time of $x \in I$. Suppose that all sequences of instructions are admissible computer programs. Then the following equalities are valid:

i) The computer capacity (4) equals $\log X_0$, where $X_0$ is the largest real solution to the Equation (7).

ii) The efficiency (6) is maximal if the sequences of instructions are generated by an i.i.d. source with probabilities $p(x) = X_0^{-\tau(x)}$, $x \in I$.
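Both parts of the theorem can be verified numerically on a hypothetical two-instruction computer with execution times 1 and 2; Equation (7) then reads $1/Y + 1/Y^2 = 1$, whose largest real root is the golden ratio:

```python
import math

# Hypothetical two-instruction computer with execution times 1 and 2.
# Equation (7): 1/Y + 1/Y**2 = 1, so X0 = (1 + sqrt(5)) / 2.
X0 = (1 + math.sqrt(5)) / 2

def N(T, _memo={0: 1, 1: 1}):
    """N(T) for this computer satisfies the recursion N(T) = N(T-1) + N(T-2)."""
    if T not in _memo:
        _memo[T] = N(T - 1) + N(T - 2)
    return _memo[T]

# Part i): the growth rate of N(T) approaches the capacity log2 X0.
growth = math.log2(N(61) / N(60))
print(round(growth, 6), round(math.log2(X0), 6))

# Part ii): the optimal i.i.d. probabilities p(x) = X0**(-tau(x)) give
# efficiency (6) equal to the capacity.
p = [X0 ** -1, X0 ** -2]
entropy = -sum(q * math.log2(q) for q in p)
avg_time = sum(q * t for q, t in zip(p, [1, 2]))
print(round(entropy / avg_time, 6))
```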