Article

A Fundamental Scale of Descriptions for Analyzing Information Content of Communication Systems

Laboratorio de Evolución, Universidad Simón Bolívar, Sartenejas, Baruta, Miranda, 1080, Venezuela
* Author to whom correspondence should be addressed.
Entropy 2015, 17(4), 1606-1633; https://doi.org/10.3390/e17041606
Submission received: 3 January 2015 / Revised: 16 March 2015 / Accepted: 19 March 2015 / Published: 25 March 2015
(This article belongs to the Special Issue Information Theoretic Incentives for Cognitive Systems)

Abstract:
The complexity of the description of a system is a function of the entropy of its symbolic description. Prior to computing the entropy of the system's description, an observation scale has to be assumed. In texts written in artificial and natural languages, typical scales are binary digits, characters, and words. However, treating a language as a structure built around a preconceived set of symbols, such as words or characters, limits the level of complexity that can be revealed analytically. This study introduces the notion of the fundamental scale of a description as a way to analyze the essence of the structure of a language. The concept of the Fundamental Scale is tested on English and Musical Instrument Digital Interface (MIDI) music texts, using an algorithm developed to split a text into the collection of symbol sets that minimizes the observed entropy of the system. The Fundamental Scale reflects more details of a language's complexity than bits, characters, or words do. Results show that the Fundamental Scale makes it possible to compare completely different languages, such as English and MIDI-coded music, with respect to their structural entropy. This comparative power facilitates the study of the complexity of the structure of different communication systems.

1. Introduction

Understanding systems and their complexity requires accounting for their entropy. How information emerges with the scale of observation has become a topic of discussion, since it reveals much of a system's nature and structure. Bar-Yam [1] and Bar-Yam et al. [2] have proposed the complexity profile as a useful tool for studying systems at different scales. Among others, López-Ruiz et al. [3] and Prokopenko et al. [4] focus on how the balance between disorder and self-organization in a system changes across scales of observation. In a different approach, Gell-Mann [5] considers complexity a property associated with the irregularities of the physical system. Yet Gell-Mann sees both randomness and order as manifestations of regularity, and therefore as quantities that offer the possibility of reducing the length of a description and hence the computed complexity of a system.
These complexity concepts are all evaluated at arbitrarily selected symbol scales. The chosen observation scale depends on the communication system used in the description; for example, systems described in human natural languages tend to be analyzed at the character and word scales, because those scales hold the most meaning for humans, while analyses of information in the context of its transmission commonly take binary codes as the base of study. A consequence of preselecting the scale of observation is that our assumptions about the system's structure may be built into the analysis, skewing our interpretation of the system's properties.
Many studies have evaluated the entropy of descriptions at a preconceived scale. In 1997, Kontoyiannis [6] evaluated description entropies at the scale of characters; in 2002, Montemurro and Zanette [7] studied entropy as a function of the role of words; more recently, Savoy [8] and Febres et al. [9] studied the impact of writing style on the entropy of speeches, using the word as the unit of scale. Piasecki and Plastino [10] studied entropy as a function of a two-dimensional domain, exploring the effects of multivariate distributions and calculating the entropy associated with several 2D patterns. All these studies proceed in the same direction: assume a domain space and a scale, then compute the entropy. The strategy of the present study is to pose the same problem in reverse: given an entropy descriptor of a multivariate distribution defined over some domain space, what is the best way to segment that domain space in order to reproduce the known entropy descriptor? The answer to this question has a twofold value: (a) the distribution of sizes of the space segments indicates the scale that best represents the system's expression, and (b) it provides an approximation to the algorithmic complexity of the description.
Algorithmic complexity as a concept does not consider the observation scale [5,11]. Algorithmic complexity, also called Kolmogorov complexity, is the length of the shortest string that completely describes a system. Since there is no way to guarantee that a given string is the shortest, algorithmic complexity has been regarded as an unreachable figure. Nevertheless, estimating complexity by searching for a nearly incompressible description of a system has the advantage of being independent of the observation scale. In fact, a nearly incompressible description can be sought by adjusting the observation scale until the process discovers the scale that best compresses the original description. The result is an approximation to the algorithmic complexity of the system.
While these previous studies take characters or words as the symbols, in the present study we are free to group adjacent characters into symbols in order to satisfy a higher-level criterion: the minimization of the entropy. This study develops a series of algorithms to recognize the set of symbols that, according to their frequencies, leads to a minimum-entropy description. The method developed here mimics, in simplified form, the evolution of a communication system. The proposed algorithm is tested on a short example of English text and on two full descriptions: an English text and a Musical Instrument Digital Interface (MIDI) music file. This representation of the components may convey a description of a system and its structural essence.

2. A Quantitative Description of a Communication System

A version of Shannon's entropy formula, generalized for communication systems comprising D symbols, is used to compute the quantity of information in a descriptive text. To determine the symbols that make up the sequential text, a group of algorithms was developed. These algorithms recognize the set of symbols which forms the language used in the textual description. The number of symbols D represents not only the diversity of the language but also the fundamental scale used for the system description.

2.1. Quantity of Information for a D’nary Communication System

We refer to language as the set of symbols used to construct a written message. The number of different symbols in a language will be referred to as the diversity D.
To compute the entropy h of a language, that is, the entropy of the set of D different symbols used with probabilities p_i to form a written message, we use Shannon's entropy expression, normalized to produce values between zero and one:
h = -\sum_{i=1}^{D} p_i \cdot \log_D p_i \qquad (1)
Note that the base of the logarithm equals the language's diversity D, whereas Shannon's classical expression uses base 2, which is likewise the diversity of the binary language he studied. Researchers such as Zipf [12], Kirby [13], Kontoyiannis [6], Gelbukh and Sidorov [14], Montemurro and Zanette [7], Savoy [15], Febres, Jaffe and Gershenson [9] and Febres and Jaffe [16], among others, have studied the relationship between the structure of human and artificial languages and the symbol probability distribution of written expressions in each type of language.
All these studies take characters or words as the symbols; in the present study we are free to group adjacent characters into symbols so as to minimize the entropy h expressed in Equation (1). In the following sections we explain this optimization problem and our approach to finding a solution reasonably close to the set of symbols that produces the absolute minimum entropy.
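As an illustration only (a minimal sketch in Python written for this rendering, not the authors' MoNet software), Equation (1) can be computed from any tokenized message; the function below treats whatever tokens it receives as the symbols of the language:

```python
import math
from collections import Counter

def normalized_entropy(symbols):
    """Equation (1): entropy with a base-D logarithm, so 0 <= h <= 1."""
    counts = Counter(symbols)          # frequency of each distinct symbol
    total = sum(counts.values())
    d = len(counts)                    # D: the diversity of the language
    if d < 2:
        return 0.0                     # a single-symbol message carries no uncertainty
    return -sum((f / total) * math.log(f / total, d)
                for f in counts.values())

# The same text scored at the character scale:
print(normalized_entropy(list("what is an adverb?")))
```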

2.2. Scale and Resolution

We propose a quantitative concept of scale: the scale of a system equals the diversity of the language used for its description. Thus, for example, if a picture is made with all available colors of an 8-bit color map of pixels, then the diversity of the picture's color language equals 2^8, and the scale of the picture description, considering each color as a symbol, is also 2^8. Another example is a binary language, a scale-2 communication system made up of only two symbols. Notice we have used the term "communication system" to refer to the medium used to code information.
Interestingly, the scale of a system's description is determined in the first place by the observer, and only to a much smaller degree by the system itself. The presumably high complexity of a system, functioning through the actions and reactions of a large number of tiny pieces, simply dissipates if (a) the observer or describer fails to see the details, (b) the observer or describer is not interested in the details and prefers to focus on the macroscopic interactions that regulate the whole system's behavior, or (c) the system does not have sufficiently many different components, which play the role of symbols here, to refer to each type of piece. It is clear that any observed system scale implies the use of a certain number of symbols. It is also clear that the number of different symbols used in a description is linked with our intuitive idea of scale. There being no other known quantitative meaning of the word scale, we suggest its use as a descriptor of languages obtained by specifying the number of symbols forming them.
Resolution specifies the maximum accuracy of observation and defines the smallest observable piece of information. In the computer coded files we used to interpret descriptions, we consider the character as the smallest observable and non-divisible piece of information.
Let E denote the physical space that a symbol or a character occupies, and let the sub-index indicate the object being referred to. Thus, for a written message M constructed using D_M different symbols Y, with M = {Y_1, Y_2, ..., Y_{D_M}}, we say the message M occupies the space E_M and each symbol Y_i occupies the space E_{Y_i}. We define the length of every character to equal one; therefore E_{C_i} = 1 for any i. Finally, if the number of characters in the message is N, each symbol Y_i appears F_{Y_i} times within the message, and the symbol diversity is D_M, we can write the following constraint over the number of characters, the symbols, and the space they occupy:
E_M = \sum_{i=1}^{D_M} F_{Y_i} E_{Y_i} = \sum_{i=1}^{N} E_{C_i} = N \qquad (2)
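For concreteness, a toy check of this accounting (the example message and the segmentation below are ours, chosen only to illustrate the bookkeeping):

```python
from collections import Counter

message = "an adverb is a word"
segmentation = ["an ", "adverb", " is ", "a", " ", "word"]  # hypothetical symbols
assert "".join(segmentation) == message   # the symbols tile the message exactly

F = Counter(segmentation)                  # F_Y: occurrences of each distinct symbol
# E_M = sum over distinct symbols of F_Y * E_Y = N, with E_Y = len(symbol):
assert sum(f * len(y) for y, f in F.items()) == len(message)
```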

2.3. Looking for a Proper Language Scale

We see the scale of a language as the finite set of symbols that "best" serves to represent a written message. The qualification "best" refers to the capacity of the set of symbols to convey the message precisely and in the most effective way.
Take, for example, the Western natural languages. Among their alphabets there are only minor differences; too few to explain how far apart those languages are. As Newman [17] observes, letters may be the basic units of a language, but there are other units formed by groups of letters.
Chomsky's syntactic structures [18], later called context-free grammar (CFG) [19], offer another representation of natural language structure. A CFG describes rules for the proper connections among words according to their specific function within the text; thus, a CFG is a grammar generator useful for studying the structure of sentences. Chomsky himself treats a language as an infinite or finite set of sentences. CFG works at a much larger scale than the one we are looking for in this study.
Regarding natural languages, it is common to think of a word as the group of characters between a leading and a trailing blank-space. At some time a meaning was assigned to that word, and thereafter the word's meaning, as well as its written form, evolves and adopts a shape that works well for us, the users of the language. Zipf's principle of least effort [12] and Flesch's reading ease score [20] certainly give indications of the mechanisms that drive words, as written symbols, to reduce the number of characters needed to represent them.
From a quantitative linguistics perspective, this widely accepted method for recognizing words offers limited applicability. Punctuation signs, for example, have a very precise meaning and use; their frequency of appearance in any Western natural language competes with that of the most common words in English and Spanish [21]. However, punctuation signs are very seldom preceded by a blank-space and are normally written as a single character, which promotes the false idea that they function like letters of the alphabet; they do not. They carry meaning just as common words do. Another situation revealing the inconvenience of this natural but too rigid conception of words is the English contraction formed with the apostrophe. It is difficult to count the number of words in the expression "they're": how many are there, one or two? See Febres [21] for a detailed explanation of English and Spanish word recognition and treatment for quantification purposes.
Intuitively, the symbols forming a description written in some language should be those driving the whole message to low entropy when entropy is computed as a function of symbol frequencies. In this situation the message is fixed, as are the text and the quantity of information it conveys. There appears, then, to be a conflict: while the information is constant because the message is invariant, any change to the set of symbols considered as basic units alters the computed message entropy, as if the information had changed; it has not. To resolve this paradox, we return to the question asked at the beginning of this section about the meaning of "best" in this context. From the point of view of the message emitter, "best" concerns the efficiency of transmitting an idea. This is what Shannon's work was intended for: to determine the amount of information, estimated as entropy, needed to transmit an idea. From the reader's point of view, the economy of the problem works differently. The reader's problem is to interpret the received message so as to maximize the information extracted. In other words, the reader focuses on the symbols which make the script an organized, and therefore easier to interpret, message. If the reader is human and there are words in the message, the focused symbols are most likely words, because those are the symbols that add meaning for this kind of reader. But if there existed another set of symbols which made the message look even more organized, the reader would rather use that set, because it would require less effort to read.
In conclusion, what the reader considers "best" is the set of symbols that maximizes the organization of the message, while for the sender "best" means the set of symbols that minimizes the disorder of the message and thus the quantity of information processed. These statements are expressed as objective functions in Equation (3), where the best set of symbols is named B, the message is M, the message entropy is h_M, and the message organization is (1 − h_M):
\text{Sender's objective: } \min_B h_M \qquad \text{Receiver's objective: } \max_B \,(1 - h_M) = \min_B h_M \qquad (3)
Following this reasoning, "best" means the same for both sides of the communication process. This may have important implications when considering languages as living organisms or colonies of organisms. Both parties to the communication process push the language to evolve in the same direction: increasing self-organization and reducing the entropy of messages; the two come together. Self-organization can thus be seen as one of the evolutionary directions of languages, and as an indirect measure of how deeply evolved a language is and of its capacity to convey complex ideas or sensations. Finally, an objective function has been found for searching for the most effective set of symbols, the set with minimal entropy, to describe a language. It will be used to recognize the set of symbols that best describes the language used to write a description.

2.4. Language Recognition

Consider a description consisting of a message M built up from a sequence of N characters, or elementary symbols. The message M can be treated as an ordered set of characters C_i:
M = \{C_1, C_2, \ldots, C_N\} \qquad (4)
No restriction is imposed on the possibility of repeating characters. Consider also the language B, consisting of a set of D_B different symbols Y_i, each formed by a sequence of E_{Y_i} consecutive characters found with probability P(Y_i) > 0 in message M. Thus:
B = \{Y_1, Y_2, \ldots, Y_{D_B}; P(Y_i)\} \qquad (5)
Y_i = \{C_j, C_{j+1}, \ldots, C_{j+E_{Y_i}-1}\}, \quad 1 \le i \le D_B, \; 1 \le j \le N - E_{Y_i} + 1 \qquad (6)
The symbol probability distribution P(Y_i) can be obtained by dividing the frequency distribution f_i by the total number of symbols N in the message:
P(Y_i) = \frac{f_i}{N} \qquad (7)
Language B, used to convey the message M, can now be specified as the set of D_B different symbols together with the probability density function P(Y_i), which establishes the relative frequencies of appearance of the symbols Y_i. Each symbol Y_i is constructed from a sequence of contiguous characters, as indicated in Equation (6). The set of symbols that describes the message M with the least entropy is obtained by solving the following optimization problem:
\min_B \; -\sum_{i=1}^{D_B} \frac{F_{Y_i} E_{Y_i}}{N} \cdot \log_{D_B} \frac{F_{Y_i} E_{Y_i}}{N} \qquad (8a)

subject to:

B = \{Y_1, Y_2, \ldots, Y_{D_B}; P(Y_i)\}, \quad i = 1, 2, \ldots, D_B \qquad (8b)
Y_i = \{C_j, C_{j+1}, \ldots, C_{j+E_{Y_i}-1}\}, \quad i = 1, 2, \ldots, D_B, \; j = 1, 2, \ldots, N - E_{Y_i} + 1 \qquad (8c)
\sum_{i=1}^{D_B} F_{Y_i} E_{Y_i} = N \qquad (8d)
F_{Y_i} \ge 1, \quad E_{Y_i} \ge 1, \quad i = 1, 2, 3, \ldots, D_B \qquad (8e)
The resulting language is the best in the sense that it is the set of symbols offering a maximum organization of the message. The symbol lengths range from a minimum to a maximum, defining a distribution of symbol lengths characteristic of this scale of observation, which we refer to as the Fundamental Scale.
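The objective of problem (8) can be evaluated directly on any candidate segmentation. A sketch follows (ours, under the assumption that the segmentation tiles the whole message); note that the weight of each symbol is the fraction of the message space it occupies:

```python
import math
from collections import Counter

def description_entropy(segmentation):
    """Objective of problem (8) for a candidate language B, given as the
    tiled message: p_i = F_Yi * E_Yi / N, logarithm base D_B."""
    n = sum(len(y) for y in segmentation)      # N: total characters covered
    counts = Counter(segmentation)             # F_Y for each distinct symbol
    d = len(counts)                            # D_B: diversity of language B
    if d < 2:
        return 0.0
    return -sum((f * len(y) / n) * math.log(f * len(y) / n, d)
                for y, f in counts.items())

# Two candidate languages for the same text; lower is better:
text = "the cat and the dog and the bird"
print(description_entropy(list(text)))                        # character symbols
print(description_entropy(["the ", "cat ", "and ", "the ",
                           "dog ", "and ", "the ", "bird"]))  # word-like symbols
```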

3. The Algorithm

The optimization problem (8) is highly nonlinear and its restrictions are coupled. A strategy for finding a solution has been devised: a computerized process composed of text-string processing, entropy calculations, text-symbol ordering, and genetic algorithms. Given a description consisting of a text of N characters, the purpose of the algorithm is to build a set of symbols B whose entropy is close to a minimum. The process forms symbols by joining up to V adjacent characters of the text. A loop in which V is kept constant controls the size of the symbols being incorporated into language B. The process ends once symbols of the maximum length V_mx have been considered. We add a sub-index to language B_V to indicate the symbol size V considered at each stage of its construction. We have divided the algorithm into several sections, named for their similarity to a system in which each symbol appears and becomes part of a language only if it survives the competition against other symbols. A pseudo-code of the fundamental scale algorithm is included in Appendix A.

3.1. Base Language Construction

In the first stage, the message M is separated into single characters. The resulting set of characters, along with their frequency distribution, constitutes the first attempt at a good language and is denoted B_1. The sub-index indicates the maximum length that any symbol can have.
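In code, this first stage is trivial; a minimal sketch (ours):

```python
from collections import Counter

def base_language(message):
    """Stage 1: tile the message with single characters to obtain B_1."""
    segmentation = list(message)       # the message as 1-character symbols
    return Counter(segmentation), segmentation
```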

3.2. Prospective Symbol Detection

Prospective symbol detection consists of scanning the text looking for strings of exactly V characters. All V-long strings are considered prospective symbols to join the previously constructed language B_{V−1}, made of strings of up to V−1 characters. The idea is to find all the different V-long strings present in the message M which, after complying with the entropy reduction criteria described below, complement language B_{V−1} to form language B_V.
To cover all possible character sequences forming symbols of length V, several passes are made over the text. The difference from one pass to another is the character at which the first symbol starts, which we call the phase of the pass. Figure 2 illustrates how this strategy covers all possible symbol instances for any symbol size V.
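A sketch of the scan (ours): taking a window of V characters at every phase 0, 1, ..., V−1 is equivalent to sliding the window over every starting position, so every V-long string in the message is enumerated:

```python
def prospective_symbols(message, v):
    """Stage 2: collect every distinct V-character string in the message,
    scanning in passes that differ by their starting phase (cf. Figure 2)."""
    found = set()
    for phase in range(v):                              # phase of the pass
        for j in range(phase, len(message) - v + 1, v): # jump V chars per step
            found.add(message[j:j + v])
    return found
```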

3.3. Symbol Birth Process

Prospective symbols detected in the previous stage whose likelihood of being entropy-reducing symbols is presumed too low are discarded and never inserted into the language. Interpreting the entropy of Equation (1) as a summation of the uncertainty contributed by each symbol, we can intuit that minimum total uncertainty, that is, minimum entropy, occurs when each symbol contributes about the same uncertainty. Thus, any prospective symbol must be close to the average uncertainty per symbol in order to have a chance of actually reducing the entropy after its insertion. The uncertainty contribution u_i of symbol i, and its average value, can be estimated as:
u_i = -p_i \log_{D_{B_V}} p_i = \frac{h}{D_{B_V}} \qquad (9)
This leads us to look for symbols complying with the condition in Equation (10), and to save processing time whenever a prospective symbol falls outside a band of width 2λ around the average uncertainty value:
\frac{h}{D_{B_V}} - \lambda < u_i < \frac{h}{D_{B_V}} + \lambda \qquad (10)
The parameter λ can be adjusted to avoid improperly rejecting entropy-reducing symbols, or to operate on the safe side at the expense of processing time.
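A sketch of this filter (ours; the band width 0.05 is an arbitrary illustrative value, not a value taken from the paper):

```python
import math

def passes_birth_filter(p, h, d, lam=0.05):
    """Keep a prospective symbol only if its uncertainty contribution
    u = -p * log_D(p) lies within +/- lam of the average h / D
    (Equations (9) and (10))."""
    u = -p * math.log(p, d)            # requires 0 < p < 1 and d >= 2
    avg = h / d
    return avg - lam < u < avg + lam
```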

3.4. Conservation of Symbolic Quantity

The inclusion of prospective symbols into the arrays of symbols representing the language B is performed so as to avoid overlap between newly inserted symbols and the symbols already in the language. Therefore, every time a prospective symbol is inserted into the stack of symbols, the instances of former symbols occupying the space of the new symbol must be released. Sometimes the freed string is only a fraction of a previously existing symbol; thus the insertion of a symbol may break up other symbols, generating empty spaces into which the recovered fragments must be reinserted in order to keep the original text intact.

3.5. Symbol Survival Process

A final calculation confirms the entropy reduction achieved after the insertion of a symbol into the language being formed. Symbols that do not produce an entropy reduction are rejected, and the language B is reverted to its condition prior to the last insertion.
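The survival test reduces to a tentative insertion followed by a measurement. A sketch (ours) against an assumed language interface with insert, revert, and entropy operations:

```python
def try_insert(language, candidate):
    """Sections 3.4-3.5: insert a candidate symbol, re-tile the text, and
    keep the change only if the measured entropy drops. `language` is an
    assumed object exposing insert(), revert(), and entropy()."""
    before = language.entropy()
    language.insert(candidate)         # may split overlapped symbols (Section 3.4)
    if language.entropy() >= before:   # no reduction: the symbol does not survive
        language.revert()
        return False
    return True
```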

3.6. Controlling Computational Complexity

The computational complexity of this algorithm is far beyond polynomial; a rough estimate puts the number of steps required above the factorial of the diversity of the language treated. Segmenting the message into shorter pieces therefore allows the algorithm to find a feasible solution and keeps processing times affordable for large texts. This strategy is in effect a kind of parallel processing which reduces the algorithm's computational complexity enough to make it an applicable tool. A complex-system software platform has been developed alongside this study to deal with the complexities of this algorithm and with the structure needed to maintain a record of every symbol of each description within a corpus of many texts. This experimental software is named MoNet, and a brief description of it can be found in [21].
The noise introduced by cutting the original description into pieces is limited: at most two symbols may be fractured per segment, very few compared to the number of symbols making up each segment. The algorithm calculates the entropy of each description chunk. But, as Grabchak et al. [22] explain, the estimation of a description's entropy must consider the bias introduced when short text samples are evaluated. Taking advantage of the extensive list of symbols and frequencies organized by the software MoNet, we instead calculate the description entropy from the joint set of symbols of all the description's partitions, thereby reassembling the whole description. As a result, no bias has to be corrected.
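A sketch of this pooling (ours): rather than averaging the biased per-chunk entropies, the symbol frequencies of all chunks are joined and a single entropy is computed over the reassembled description:

```python
import math
from collections import Counter

def pooled_entropy(chunk_counts):
    """Join the symbol frequencies found in each description chunk and
    compute one entropy over the whole description (Section 3.6)."""
    pooled = Counter()
    for counts in chunk_counts:                # one Counter per chunk
        pooled.update(counts)
    total, d = sum(pooled.values()), len(pooled)
    if d < 2:
        return 0.0
    return -sum((f / total) * math.log(f / total, d)
                for f in pooled.values())
```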

4. Tests and Results

In order to compare the differences obtained when observing a written message at the scales of characters, words, and the fundamental scale, we designed an Example Text. Table 1 shows the symbols obtained from the analysis of the Example Text at the three observation scales used in this study. The entropies calculated at the scales of characters and words were 0.81 and 0.90, respectively; the entropy at the fundamental scale was 0.76, an important reduction of the information required to describe the same message.
These results also agree with our intuition. Clearly, the selection of a certain character string as a fundamental symbol is favored by the frequency of appearance of that string. As a result, the space character (represented as ø in the table) is recognized as the most frequent fundamental symbol; it is indeed an important structural piece of any English text, since it marks the beginning and end of natural words. The length of the string of characters also favors the survival of a symbol in its competition with other prospective symbols. The string "describ", for example, appears twice in the Example Text, and the algorithm recognized it as a symbol. On the other hand, the string "An adverb" also appears twice, but the algorithm found it more effective for reducing the overall entropy to break that phrase apart and increase the appearances of other symbols. A similar case is the word "adverb", which appears in nine instances (not including those written with the first capital letter) in the Example Text; here the entropy minimization found a larger entropy reduction by splitting the word "adverb" into shorter and more frequent symbols such as "dv" (10 times) and the characters "e" (70 times), "a" (40 times), "r" (33 times), and "b" (12 times).
In another experiment, we contrasted two different types of communication systems by performing tests on full, real messages. The first test is based on a text description written in English, the second on the text file of a piece of music coded in the MIDI format. The English text is the speech given by Bertrand Russell in 1950 during the Nobel Prize ceremony. The MIDI music is a version of the 4th movement of Beethoven's Ninth Symphony. The sizes of these descriptions are near the limit of applicability of the algorithm. English descriptions of 1300 words or less can be processed in under a minute; larger English texts have to be segmented, using the computational complexity control described in Section 3.6, to reach reasonable working times. Bertrand Russell's speech was fractioned into seven pieces. For MIDI music files, processing times start to increase sharply for pieces lasting about 3 min. The version of the 4th movement of Beethoven's Ninth Symphony used here is 25 min long; it was necessary to process it by fractioning it into 20 segments.
To reveal the differences among descriptions observed at different scales, symbol frequency distributions were produced. For the English text, the character, word, and fundamental scales were applied; for the MIDI music text, distributions at the character and fundamental scales were constructed, since words do not exist as a scale for music. The corresponding detailed sets of fundamental symbols can be seen in Appendix B. The frequency distributions were ordered by the frequency rank of the symbols; the results are thus Zipf profiles.
Table 2 shows the length N, the diversity D, and the entropy h obtained for these two descriptions analyzed at the several scales, and Figure 3 shows the corresponding Zipf profiles for Bertrand Russell's English speech and for the 4th movement of Beethoven's 9th Symphony. Each description's profiles are presented at the scales at which it was analyzed: the character scale and the fundamental scale for both English and music, and the word scale for English only.
In Figures 3a and 3b, the character scale exhibits the smallest diversity range. Taking only characters as allowable symbols rules out any combination into more elaborate symbols, and excludes any possibility of representing how the describing information of a system arranges itself to create what could loosely be called the "language genotype". Allowing symbols to be composed of several successive characters dramatically increases the diversity of symbols.
Selecting the symbols of an observation scale under the criterion of minimizing the entropy of the resulting frequency distribution bounds the final symbolic diversity of the scale, while capturing a variety of symbols that represents the way characters are organized into the language structure. The fundamental scale appears to be the most effective scale, since with it the original message can be represented with the most compressed information, expressed as the lowest entropy measured across all scales in both communication systems evaluated.
Any scale of observation corresponds to the size of the symbols focused on at that scale. When that size is the same for all symbols, the scale can be regarded as regular and specified by its size. If, on the contrary, the scale does not correspond to a constant symbol size, then a symbol frequency distribution based on the sizes is a valid depiction of the scale. That is the case for the word scale of English texts and the fundamental scale of our two examples. Figures 4 and 5 show those distributions and are useful for interpreting the fundamental scales of both examples.

5. Discussion

The results clearly show that the calculated entropy content of a communication system varies in important ways depending on the scale of analysis. Looking at a language at the scale of characters provides a different picture than examining it at the level of words, or at the fundamental scale described here. Thus, in order to compare different communication systems, we need a comparable scale applicable to each of them. We showed that the fundamental scale presented here is applicable to very different communication systems, such as music, computer programs, and natural languages. This allows us to perform comparative studies of the systems' entropy, and thus to draw inferences about the relative complexity of different communication systems.
In both examples analyzed, the profiles at the character scale and at the fundamental scale run close to each other from the most frequent symbols down to ranks near the middle of the logarithmic scale. At lower rankings, the fundamental-scale profile extends its tail toward the region of low symbol frequencies. The closeness of the fundamental and character-scaled profiles in the high-frequency region indicates that the character-scaled language B_1 is a subset of the fundamental-scale language. The language at the fundamental scale, having a greater symbolic diversity and therefore more degrees of freedom, finds a way to generate a symbol frequency distribution with a lower entropy than the minimal-entropy distribution of the description viewed at the scale of words. Focusing on the fundamental-scale profiles, the symbols located in the lower-rank region, the tail of the profile, tend to be longer symbols formed by more than one character. These multi-character symbols, which cannot exist at the character scale, are formed at the expense of instances of single-character symbols typically located in the profile's head. This explains the nearly constant gap between the two profiles at the profiles' heads.
The English description observed at the scale of words produces a symbol profile incapable of showing short symbols, fragments of a word, which would represent important aspects of a spoken language such as syllables and other typical fundamental language sounds. At the opposite extreme, observing at the character scale forbids considering strings of characters as symbols, so meaningful words or structures cannot appear at this scale, and important information about the structure of the described system is missed.
The fundamental scale, on the other hand, appears as an intermediate scale capable of capturing both the most elementary structure of a language, such as its alphabet, and larger structures representing the result of the language's evolution toward more specialized and complex symbols. The same applies to the MIDI representation of music: there is no word scale for music, but clearly the character scale does not capture the richness that is undoubtedly present in this type of language.
Another difference between the fundamental scale and the other scales is its sensitivity to the order in which symbols appear in the text. At the scale of words or characters, the symbol frequency profile does not vary with symbol order; the profiles depend only on the number of appearances of each symbol, word or character, at the scale in question. The profile built at the fundamental scale does change when the symbol order is altered, not because of the order itself, but because the set of symbols recognized as fundamental changes when the order of words or characters is modified. As a consequence, the character and word scales have no sense of grammar, while the fundamental scale and its corresponding profile are affected by the order in which words are organized, or disorganized, and are therefore sensitive to the rules of grammar. Other communication systems may not have words, but they must have some rules, the equivalent of a grammar. Assuming rigid rules, such as a fixed symbol size or fixed symbol delimiters, seems to be a barrier when studying the structure of system descriptions.
In its search for symbols, the fundamental scale method accounts for frequent string sequences which result from grammar rules. The string "ing", for example, appears at the end of words representing verbs or actions; moreover, it is normally followed by a space character (" "). Since the sequence appears with noticeable frequency, the fundamental scale method recognizes the character sequence "ing" (ending with a space) as an entropy-reducing token and therefore an important descriptive piece of English as a language. The observation of a description at its fundamental scale is therefore sensitive to the order in which character strings appear within the description. The fundamental scale method detects internal grammar that has been ignored in the many previous studies analyzing Zipf profiles at the scale of words.
Although the concept of the fundamental scale is applicable to descriptions built over multidimensional spaces, the fundamental scale method and the algorithm developed here are devised for one-dimensional descriptions. The symbol search scans the description along the writing dimension of the text file being analyzed. This means that the fundamental symbols constituting 2D descriptions, such as pictures, photographs or plain data tables, cannot be discovered with the algorithm as developed. To extend the fundamental scale algorithm to descriptions of more than one dimension, restriction (8c) must be modified, or complemented, to incorporate a notion of indivisible information unit, the role the character has played in this study, and of the allowed symbol boundary shapes in the description space considered. This adjustment is difficult because establishing criteria for the shapes of the boundaries becomes a hard-to-solve topology problem, especially in higher-dimensional spaces.
There are other limitations to the analysis of one-dimensional descriptions. Some punctuation signs, which belong more to the writing system than to the language itself, work in pairs: parentheses, quotes, and exclamation and question marks (which open and close in Spanish) are some of the written punctuation signs which work in couples. Intuition indicates that each of them is half of a single symbol. Not treating each half as part of the same symbol most likely increases the entropy associated with the set of symbols discovered, a deviation from the ideal application of the method. Nevertheless, in English, Spanish, and human natural languages in general, the characters which work in pairs appear infrequently compared to the rest of the characters, so the entropy distortion introduced by this effect is small.
Practical use of the algorithm is feasible up to certain description lengths; the actual limit depends on the nature of the language used in the description. For syllabic human natural languages the algorithm can be applied directly to texts of 40,000 characters or less; longer texts can be analyzed by partitioning. Thus, for texts expressed in human natural languages, the application limit covers most needs. For the analysis of music, use of the algorithm is limited to the MIDI format, and even then results in long processing times on the powerful computers available today. The problem of scanning all possible sets of symbols in a sequence of characters grows combinatorially; it rapidly becomes too complex in the computational sense, and its practical application is only feasible for representations of music in reduced sets of digitized symbols such as MIDI coding. Using more comprehensive formats such as MP3, a compression technology capable of reducing the size of a music file while keeping reasonably good sound quality, would suffice to place the solution of the problem beyond our capacity to perform experiments with large sets of musical pieces. Yet the fundamental scale method provides new possibilities for discovering the most representative dimension of small textual descriptions, allowing us to advance in our understanding of languages.
The Fundamental Scale, as a concept and as a method for finding a quantitative approximation to the description of communication systems, promises to be fruitful in further research. Tackling the limits of the algorithm, by finding ways to reduce the number of loops and by sharpening the criteria used, may extend the space of practical use of the notion of a description's fundamental scale. Here we showed that the method reveals structural properties of languages and other communication systems, offering a path for comparative studies of the complexity of communication.

Appendix A. The Fundamental Scale Algorithm Pseudo-Code

The following are a series of pseudo-codes of routines to determine the Fundamental Scale of any sequence of characters.
[The pseudo-code routines appear as images in the original publication (figures Entropy 17 01606f6 through 01606f9).]

Appendix B

B.1. Bertrand Russell’s speech given at the 1950 Nobel Award Ceremony:

Word-scale profile: complete list. Speech text.
Total number of symbols [words]: 5716. Diversity: 1868.
[Table omitted: a two-column flattened listing giving Rank, Symbol, Occurrences, and Length for each of the 1868 distinct words.]

B.2. Bertrand Russell’s speech given at the 1950 Nobel Award Ceremony:

Fundamental-scale profile: Complete profile. Speech text.
Total number of symbols [Fundamental Symbols]: 25,362. Diversity: 1247.
[Table omitted: a two-column flattened listing giving Rank, Symbol, Probability, Occurrences, and Length for each of the 1247 fundamental symbols.]

B.3. Beethoven 9th Symphony, 4th movement:

Fundamental-scale profile: Complete profile. Complete text.
Total number of symbols [Fundamental Symbols]: 84,645. Diversity: 2824.
[Table omitted: a two-column flattened listing giving Rank, Symbol, Probability, Occurrences, and Length for each of the 2824 fundamental symbols.]

Acknowledgments

We wish to thank the anonymous referees for comments which improved the presentation of our work.

Author Contributions

Conceived the notion of Fundamental Scale: Gerardo Febres. Conceived and implemented the algorithm: Gerardo Febres. Designed and performed the tests: Gerardo Febres. Analyzed the results: Gerardo Febres and Klaus Jaffe. Wrote the paper: Gerardo Febres and Klaus Jaffe. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bar-Yam, Y. Multiscale Complexity/Entropy. Adv. Complex Syst. 2004, 7, 47–63.
2. Bar-Yam, Y.; Harmon, D.; Bar-Yam, Y. Computationally tractable pairwise complexity profile. Complexity 2013, 18, 20–27.
3. López-Ruiz, R.; Mancini, H.; Calbet, X. A Statistical Measure of Complexity. Phys. Lett. A 1995, 209, 321–326.
4. Prokopenko, M.; Boschetti, F.; Ryan, A.J. An information-theoretic primer on complexity, self-organisation and emergence. Complexity 2008, 15, 11–28.
5. Gell-Mann, M. What is Complexity? Remarks on simplicity and complexity by the Nobel Prize-winning author of The Quark and the Jaguar. Complexity 1995, 1.
6. Kontoyiannis, I. The Complexity and Entropy of Literary Styles; NSF Technical Report 97; Department of Statistics, Stanford University: Stanford, CA, USA, 1997; pp. 1–15.
7. Montemurro, M.A.; Zanette, D.H. Entropic Analysis of the Role of Words in Literary Texts. Adv. Complex Syst. 2002, 5, 7–17.
8. Savoy, J. Text Clustering: An Application with the State of the Union Addresses. J. Assoc. Inf. Sci. Technol. 2015.
9. Febres, G.; Jaffé, K.; Gershenson, C. Complexity measurement of natural and artificial languages. Complexity 2014.
10. Piasecki, R.; Plastino, A. Entropic descriptor of a complex behaviour. Physica A 2010, 389, 397–407.
11. Funes, P. Complexity measures for complex systems and complex objects. Available online: www.cs.brandeis.edu/~pablo/complex.maker.html (accessed on 19 March 2015).
12. Zipf, G.K. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology; Addison-Wesley: New York, NY, USA, 1949.
13. Kirby, G. Zipf's Law. J. Naval Sci. 1985, 10, 180–185.
14. Gelbukh, A.; Sidorov, G. Zipf and Heaps Laws' Coefficients Depend on Language. Comput. Linguist. Intell. Text Process. 2001, 2004, 332–335.
15. Savoy, J. Vocabulary Growth Study: An Example with the State of the Union Addresses. J. Quant. Linguist. 2015, in press.
16. Febres, G.; Jaffé, K. Quantifying literature quality using complexity criteria. arXiv 2014, arXiv:1401.7077.
17. Newman, M.E.J. Power laws, Pareto distributions and Zipf's law. Contemp. Phys. 2005, 46, 323–351.
18. Chomsky, N. Syntactic Structures; Mouton: The Hague, The Netherlands, 1957; pp. 27–33.
19. Sipser, M. Introduction to the Theory of Computation; Thomson Course Technology: Boston, MA, USA, 2006; pp. 100–108.
20. Flesch, R. How to Test Readability; Harper & Brothers: New York, NY, USA, 1951.
21. Febres, G. MoNet: Complex experiment modeling platform. Available online: www.gfebres.com\F0IndexFrame\F132Body\F132BodyPublications\MoNET\MultiscaleStructureModeller.pdf (accessed on 19 March 2015).
22. Grabchak, M.; Zhang, Z.; Zhang, D.T. Authorship Attribution Using Entropy. J. Quant. Linguist. 2013, 20, 301–313.
Figure 1. Major components of the fundamental scale algorithm.
Figure 2. Examples of reading a text to recognize prospective symbols with a sliding window of SymbolSize = 4 and reading Phase = 0, 1, and 3. Phase = 2 not shown. The message: “xMTrkbhÿXbÿYÿQñÖZbQñêrÿQÞgzÿQËQbØQËlÿQÿñQñpMTrkÿQ€ÿÿQ”.
Figure 3. Symbol profiles for an English text (a) and a MIDI music text (b) at different scales of observation.
Figure 4. Bertrand Russell's 1950 Nobel ceremony speech: behavior according to symbol length. (a) At the fundamental scale, symbol occurrences vs. symbol length. (b) At the fundamental scale, symbol-length frequency distribution. (c) At the word scale, symbol occurrences vs. symbol length. (d) At the word scale, symbol-length frequency distribution.
Figure 5. Beethoven's 9th Symphony, 4th movement, MIDI music language: behavior according to symbol length. (a) At the fundamental scale, symbol occurrences vs. symbol length. (b) At the fundamental scale, symbol-length frequency distribution.
Table 1. Results of the analysis of the Example Text at the three scales studied.
Example Text: symbol sets at different scales.
-What is an adverb? An adverb is a word or set of words that modifies verbs, adjectives, or other adverbs. An adverb answers how, when, where, or to what extent, how often or how much (e.g., daily, completely). Rule 1. Many adverbs end with the letters “ly”, but many do not. An adverb is a word that changes or simplifies the meaning of a verb, adjective, other adverb, clause, or sentence expressing manner, place, time, or degree. Adverbs typically answer questions such as how?, in what why?, when?, where?, and to what extent?. Adverbs should never be confused with verbs. While verbs are used to describe actions, adverbs are used describe the way verbs are executed. Some adverbs can also modify adjectives as well as other adverbs.
F_Y = frequency, E_Y = space occupied, N = message length, ø = space, V_mx = max. symbol length = 13.
Character scale: Diversity D = 38; Entropy h = 0.8080; Specific diversity d = 0.0486; Length N = 782.
Word scale: Diversity D = 82; Entropy h = 0.9033; Specific diversity d = 0.4795; Length N = 171.
Fundamental scale: Diversity D = 80; Entropy h = 0.7628; Specific diversity d = 0.1384; Length N = 578.
[Table omitted: the full symbol inventories of the Example Text, flattened in the original; they list Idx., Symbol, and F_Y at the character and word scales, and Idx., Symbol, F_Y, and E_Y at the fundamental scale.]
Table 2. Details of the two descriptions used to test the fundamental scale method.

Bertrand Russell 1950 Nobel Lecture (English):
  Characters: Length N = 32,621; Diversity D = 68; Entropy h = 0.7051.
  Fundamental: Length N = 26,080; Diversity D = 1227; Entropy h = 0.5178.
  Words: Length N = 6476; Diversity D = 1590; Entropy h = 0.8215.

Beethoven Symphony 9, Mov. 4 (MIDI music):
  Characters: Length N = 103,564; Diversity D = 160; Entropy h = 0.6464.
  Fundamental: Length N = 84,645; Diversity D = 2824; Entropy h = 0.4658.
  Words: not defined.
