Article

Revealing Short-Term Memory Communication Channels Embedded in Alphabetical Texts: Theory and Experiments

by
Emilio Matricciani
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, 20133 Milan, Italy
Information 2025, 16(10), 847; https://doi.org/10.3390/info16100847
Submission received: 20 August 2025 / Revised: 24 September 2025 / Accepted: 29 September 2025 / Published: 30 September 2025

Abstract

The aim of the present paper is to further develop a theory on the flow of linguistic variables making a sentence, namely, the transformation of (a) characters into words; (b) words into word intervals; and (c) word intervals into sentences. The relationship between two linguistic variables is studied as a communication channel whose performance is determined by the slope of their regression line and by their correlation coefficient. The mathematical theory is applicable to any field/specialty in which a linear relationship holds between two variables. The signal-to-noise ratio Γ is a figure of merit of how "deterministic" a channel is, i.e., of how negligible the scattering of the data around the regression line is. The larger Γ is, the more the channel is "deterministic". In conclusion, humans have invented codes in which the sequences of symbols that make words cannot vary very much when indicating single physical or mental objects of their experience (larger Γ). On the contrary, large variability (smaller Γ) is achieved by introducing interpunctions to make word intervals, and word intervals make sentences that communicate concepts. This theory can inspire new research lines in cognitive science.

1. Introducing an Equivalent Input–Output Model of Short-Term Memory

Humans can communicate and extract meaning both from spoken and written languages. Whereas the sensory processing pathways for listening and reading are distinct, listeners and readers appear to extract very similar information about the meaning of a narrative story, heard or read, because the brain assimilates a written text like the corresponding spoken/heard text [1]. In the following, therefore, we consider the processing of reading or writing a text (a writer is also a reader of his/her own text) to be the same, because it involves the same brain activity. In other words, the human brain represents semantic information in an amodal form, independently of input modality.
How the human brain analyzes parts of a sentence (parsing) and describes their syntactic roles is still a major question in cognitive neuroscience. In references [2,3], we proposed that a sentence is elaborated by short-term memory (STM) with two independent processing units in series (equivalent surface processors) of similar size. The clues for conjecturing this input–output model emerged from considering many novels belonging to Italian and English literature. In reference [3], we showed that there are no significant mathematical/statistical differences between the two literary corpora, according to the so-called surface deep language parameters, suitably defined.
The model conjectures that the mathematical structure of alphabetical languages—digital codes created by the human mind for communication—seems to be deeply rooted in humans, independently of the particular language used or historical epoch. The complex and inaccessible mental process lying beneath communication—still largely unknown—can be studied by looking at the input–output functioning revealed by the structure of alphabetical languages.
The first processor is linked to the number of words between two contiguous interpunctions, denoted by the variable I_P and termed the word interval (Appendix A lists the mathematical symbols used in the present article), which ranges approximately within Miller's 7 ± 2 law [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]. The second processor is linked to the number M_F of I_P's contained in a sentence, referred to as the extended short-term memory (E–STM), ranging approximately from 1 to 6. These two units can process sentences containing approximately 8.3 to 61.2 words, values that can be converted into time by assuming a reading speed. This conversion gives 2.6 to 19.5 s for a fast reader [14] and 5.3 to 30.1 s for a reader of novels, values well supported by experiments [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30].
The E–STM must not be confused with the intermediate memory [31,32]. It is not modeled by studying neuronal activity, but by studying only the surface aspects of human communication (due, of course, to neuronal activity), such as words and interpunctions, whose effects writers and readers have experienced since the invention of writing. In other words, the model proposed in references [2,3] describes the "input–output" characteristics of STM. In reference [33], we further developed the theory by including an equivalent first processor that memorizes syllables and characters to produce a word.
In conclusion, in references [2,3,33], we have proposed an input–output model of STM, made of three equivalent linear processors in series which independently process (1) syllables and characters to make a word, (2) words and interpunctions to make a word interval, and (3) word intervals to make a sentence. This is a simple but useful approach because the many brain processes involved in speech/text are not yet fully understood, whereas characters, words, and interpunctions (the latter are needed to distinguish word intervals and sentences) can be easily studied [34,35,36]. Moreover, the theory can inspire new research lines in cognitive science.
In other words, the model conjectures that the mathematical structure of alphabetical languages is deeply rooted in humans, independently of the particular language used or historical epoch. The complex and inaccessible mental process lying beneath communication, still largely unknown, is revealed by looking at the input–output functioning built into alphabetical languages of any historical epoch.
The literature on STM and its various aspects is immense and multidisciplinary; we have recalled above only a few references, but nobody, as far as we know, has considered the connections we found and discussed in references [2,3,33]. Our modeling of STM processing by three units in series is new.
A sentence conveys meaning, of course; therefore, the theory we have developed might be one of the necessary starting points to arrive at the Information Theory that will finally include meaning.
Today, many scholars are trying to arrive at a "semantic communication" theory or "semantic information" theory, but the results are still, in our opinion, in their infancy [37,38,39,40,41,42,43,44,45]. These theories, like those concerning STM, have not considered the main "ingredients" of our theory, namely the number of characters per word C_P, the word interval I_P, and the number of word intervals per sentence M_F, parameters that anybody understands and can calculate in any alphabetical language [34,35,36], as a starting point for including meaning. This is still a very open issue.
The aim of the present paper is twofold: (a) to further develop the theory proposed in references [2,3,33], and (b) to apply it to the flow of linguistic variables making a sentence. This "signal" flow is built into the model proposed in reference [33], namely, the transformation of (a) characters into words, (b) words into word intervals, and (c) word intervals into sentences, according to Figure 1. Since the connection between these linguistic variables is described by regression lines [34,35,36], in the present article, we analyze experimental scatterplots between these variables.
The article is divided into two parts. In the first part (Section 2, Section 3 and Section 4), we recall and further develop the theory of linear channels [2,3,33]; in the second part (Section 5, Section 6, Section 7 and Section 8), we apply it to a significant database of literary texts.
The database of literary texts considered is a large set of the New Testament (NT) books, namely the Gospels according to Matthew, Mark, Luke, John, the Book of Acts, the Epistle to the Romans, and the Apocalypse—155 chapters in total, according to the traditional subdivision of these texts. We have considered the original Greek texts and their translations into Latin and into 35 modern languages, texts partially studied in reference [35]. Notice that in this paper, "translation" is indistinguishable from "language" because we deal with only one translation per language.
We consider the NT books and their modern translations for two reasons: (a) they tell the same story, and therefore, it is meaningful to compare the translations in different languages; (b) they use common words—not the words of scientific/academic disciplines—therefore, they can give some clues on how most humans communicate.
After this introductory section, Section 2 presents the theory of linear regression lines and associated communication channels; Section 3 presents the connection of single linear channels; Section 4 proposes and discusses the theory of series connection of single channels affected by noise; Section 5 reports an exploratory data analysis of the NT texts; Section 6 reports findings concerning single channels; Section 7 concerns series connection of channels; Section 8 concerns cross channels; and finally, Section 9 summarizes the main findings and indicates future studies.

2. Theory of Linear Regression Lines and Associated Communication Channels

In this section, we recall and further expand the general theory of stochastic variables linearly connected, originally developed for linguistic channels [35,36] but applicable to any other field/specialty in which a linear relationship holds between two variables.
Let x (independent variable) and y (dependent variable) be linked by the following line:
y = m x + b
Notice that Equation (1) models a deterministic relationship through the slope m and the intercept b. Since in most scatterplots between linguistic variables b ≈ 0, in the following we assume
b = 0.
However, notice that if b ≠ 0, the theory can be fully applied by defining a new dependent variable y̌ = y - b.
In general, the relationship between x and y is not deterministic, i.e., given by Equation (1), but stochastic (random). Equation (1) models, in fact, two variables perfectly correlated—correlation coefficient r = 1 —characterized by a multiplicative “bias” m . In general, however, these conditions do not hold. Therefore, Equation (1) can be written as follows:
y = m x + n
In Equation (3), n is an additive Gaussian stochastic variable with zero mean value [34,35,36]; therefore, Equation (3) models a noisy linear channel. Notice that n must not be confused with the intercept b .
Figure 2 shows the flow chart describing Equations (1) and (3) with a system/channel representation. The black box indicated with m represents the deterministic channel, i.e., Equation (1); the black box indicated with r represents the parallel channel due to the scattering of y around the regression line. The additive noise n is a Gaussian stochastic variable with zero mean that makes the linear channel partially stochastic, namely “noisy”.
Now, let us consider the following:
(a)
The variance of the difference between the values calculated with Equation (1) (m ≠ 1) and those calculated with y = x (m = 1, the 45° line) at a given x value as the "regression noise" power N_m [35]. This "noise" is due to the multiplicative bias between the two variables.
(b)
The variance of the difference between the values not lying on Equation (1) (r ≠ 1) and those lying on it (r = 1) as the "correlation noise" power N_r [35]. This "noise" is due to the spread of y around the line given by Equation (1), modeled by n.
(c)
Let s_x^2 and s_y^2 be the variances of x and y.
In case (a), we obtain the difference (m - 1)x; therefore, the variance (or power) of the values lying on the regression line, i.e., the regression noise, is given by the following:
N_m = (m - 1)^2 s_x^2
Now, we define the regression noise-to-signal power ratio (NSR), R_m, as follows:
R_m = N_m / s_x^2 = (m - 1)^2
In case (b), the fraction of the variance s_y^2 due to the values of y not lying on the regression line (correlation noise power, N_r) is given by the following [46]:
N_r = (1 - r^2) s_y^2
The parameter r^2 is called the coefficient of determination, and it is proportional to the variance of y explained by the regression line [46]. However, this variance is correlated with the slope m because the fraction of the variance s_y^2 due to the regression line, namely r^2 s_y^2, is related to m according to the following [46]:
r^2 s_y^2 = m^2 s_x^2.
Figure 3 shows the flow chart of variances.
Therefore, inserting Equation (7) into Equation (6), we obtain the correlation NSR, R_r:
R_r = N_r / s_x^2 = [(1 - r^2) / r^2] m^2
Now, since the two noise sources are disjoint, the total NSR R of the channel shown in Figure 2 and Figure 3 is given by the following:
R = R_m + R_r
Therefore, R depends only on the two parameters m and r of the regression line:
R = (m - 1)^2 + [(1 - r^2) / r^2] m^2
Finally, the signal-to-noise ratio (SNR) γ is given by the following:
γ = 1/R = 1 / {(m - 1)^2 + [(1 - r^2) / r^2] m^2}
In decibels, this is written as follows:
Γ = 10 log_10(γ)
Of course, no channel is expected to yield r = 1 and m = 1, which would give γ = ∞. In empirical scatterplots, it is very likely that r < 1 and m ≠ 1.
In conclusion, the slope m measures the multiplicative “bias” of the dependent variable y compared to the independent variable x in the deterministic channel; the correlation coefficient r   measures how “precise” the linear best fit is.
Finally, notice that a more direct and insightful analysis can be achieved by using the NSR instead of the more common SNR because, in Equation (9), the single-channel NSRs simply add together. This makes it easy to study, for example, which addend determines R, and thus Γ, whereas this is far less easy with Equation (11). Moreover, this choice also leads to a useful graphical representation of Equation (10) that can guide analysis and design [11], as shown in Section 8.
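To make the figures of merit above concrete, the following Python sketch computes R_m, R_r, R, γ, and Γ from a channel's slope m and correlation coefficient r, following the definitions of this section; the numerical values in the example are illustrative only and are not taken from the paper's tables.

```python
import math

def channel_metrics(m: float, r: float) -> dict:
    """Figures of merit of a noisy linear channel y = m*x + n, computed only
    from the regression slope m and the correlation coefficient r."""
    R_m = (m - 1.0) ** 2                    # regression NSR (multiplicative bias)
    R_r = (1.0 - r ** 2) / r ** 2 * m ** 2  # correlation NSR (scattering around the line)
    R = R_m + R_r                           # total NSR: the two noise sources are disjoint
    gamma = 1.0 / R                         # signal-to-noise ratio, linear units
    Gamma_dB = 10.0 * math.log10(gamma)     # signal-to-noise ratio in dB
    return {"R_m": R_m, "R_r": R_r, "R": R, "gamma": gamma, "Gamma_dB": Gamma_dB}

# Illustrative values only (not taken from the paper's tables):
print(channel_metrics(m=0.95, r=0.98))
```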
In the next sections, we apply the theory of linear channel modeling to specific cases.

3. Connection of Single Linear Channels

We first study how the output variable y k of channel k relates to the output variable y j of another similar channel j for the same input x . This channel is termed “cross channel” and it is fundamental in studying language translation [35]. Secondly, we study how the output of a deterministic channel, modeled by Equation (1), relates to the output of its stochastic version, Equation (3).

3.1. Cross Channels

Let us consider a scatterplot k and a scatterplot j in which the independent variable x and the dependent variable y are linked by linear regression lines:
y_k = m_k x_k
y_j = m_j x_j
As discussed in Section 2, Equations (13) and (14) do not give the full relationship between the two variables because they link only conditional average values, measured by the slopes m_k and m_j in the deterministic channels. According to Equation (3), we can write more general linear relationships by considering the scattering of the data, always present in experiments, modeled by additive Gaussian zero-mean noise sources n_k and n_j:
y_k = m_k x_k + n_k,
y_j = m_j x_j + n_j.
Now, we can develop a series of interesting investigations about these equations. By eliminating x , we can compare the dependent variable y j of Equation (16) to the dependent variable y k of Equation (15) for x k = x j = x . In doing so, we can find the regression line and the correlation coefficient of the new scatterplot linking y j to y k without the availability of the scatterplot itself.
By eliminating x between Equations (15) and (16), we obtain the following:
y_j = (m_j / m_k) y_k - (m_j / m_k) n_k + n_j
Compared to the new independent variable y k , the slope m k j of the regression line is given by the following:
m_kj = m_j / m_k
Because the two Gaussian noise sources are independent and additive, the total noise is given by the following:
n_kj = -(m_j / m_k) n_k + n_j = -m_kj n_k + n_j
Figure 4 shows the flow chart describing the cross channel.
Now, from Equation (18), the R m of the new channel is
R_m = (m_kj - 1)^2.
The unknown correlation coefficient r k j between y j and y k is given by the following [35]:
r_kj = cos[arccos(r_j) - arccos(r_k)]
Therefore, the R r of the new channel is
R_r = [(1 - r_kj^2) / r_kj^2] m_kj^2.
In conclusion, in the new channel connecting y_j to y_k, we can determine the slope and the correlation coefficient of the scatterplot between y_j and y_k for the same value of the independent variable x. In practice, such a scatterplot is rarely available experimentally because we are unlikely to find values of y_k and y_j for exactly the same value of x; therefore, cross channels can reveal relationships that are very difficult to discover experimentally.
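As a minimal computational sketch of Section 3.1 (not the author's code), the function below derives the cross-channel slope m_kj, correlation coefficient r_kj, and total NSR from the slopes and correlation coefficients of two single channels; the input values in the example are hypothetical.

```python
import math

def cross_channel(m_k: float, r_k: float, m_j: float, r_j: float):
    """Parameters of the cross channel linking y_j to y_k for the same input x."""
    m_kj = m_j / m_k                                  # slope of the cross channel
    r_kj = math.cos(math.acos(r_j) - math.acos(r_k))  # correlation coefficient of the cross channel
    R_m = (m_kj - 1.0) ** 2                           # regression NSR
    R_r = (1.0 - r_kj ** 2) / r_kj ** 2 * m_kj ** 2   # correlation NSR
    return m_kj, r_kj, R_m + R_r

# Hypothetical single channels k and j sharing the same independent variable x:
print(cross_channel(m_k=0.20, r_k=0.99, m_j=0.22, r_j=0.98))
```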
In the next sections, we further develop the theory of linear channels, originally established in reference [35], for cross channels.

3.2. Stochastic Versus Deterministic Channel

We compare a deterministic channel k with a stochastic channel j derived from channel k by adding noise. In other words, we start from the regression line given by Equation (1) and then add the noise n due to the correlation coefficient r ≠ 1. Therefore, from the theory of stochastic channels discussed in Section 3.1, we obtain
y_k = m_k x_k,
y_j = m_k x_k + n_k,
m_kj = m_k / m_k = 1,
R_m = (m_kj - 1)^2 = 0,
r_kj = cos[arccos(r_j) - arccos(1)] = cos[arccos(r_j)] = r_j = r,
R = R_r = (1 - r^2) / r^2.
In conclusion, in transforming a deterministic channel into a stochastic channel, only the correlation noise is present; therefore, the SNR is given by
γ = r^2 / (1 - r^2).
Equation (29) coincides with the ratio between the variance explained by the regression line (proportional to the coefficient of determination r^2) and the variance due to the scattering (correlation noise), proportional to 1 - r^2 [46].
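As a purely illustrative numerical check (the value of r below is arbitrary and not taken from the paper's data), a channel with r = 0.99 would give
γ = 0.99^2 / (1 - 0.99^2) = 0.9801 / 0.0199 ≈ 49.3, i.e., Γ = 10 log_10(49.3) ≈ 16.9 dB,
so even a small departure of r from 1 produces a finite, measurable SNR.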
So far, we have considered single channels. In the next section, we consider the series connection of single channels to determine the SNR of the overall channel.

4. Series Connection of Single Channels Affected by Correlation Noise

In this section, we consider a channel made of a series of single channels. We consider this case because it can be found in many specialties, and because in Section 7 we apply it to specific linguistic channels.
Figure 5 shows the flow chart of three single channels in series. These channels can be characterized as in Section 3.2, i.e., only with the correlation noise; therefore, the overall channel is compared to the deterministic channel in which
m = m_1 m_2 m_3.
From Figure 5, it is evident that the output noise of a preceding channel produces additive noise at the output of the next channel in series. The purpose of this section is to calculate R at the output of the series of channels.
Theorem 1.
The NSR R of n linear channels in series, each characterized by the correlation noise-to-signal ratio R_i, is given by
R = Σ_{i=1}^{n} R_i.
Proof. 
Let the three linear relationships of the isolated channels of Figure 5 (i.e., before connecting them in series) be given by
y = m_1 x + n_1,
z = m_2 y + n_2,
t = m_3 z + n_3.
Let s_x^2, s_y^2, s_z^2, and s_t^2 be the variances (powers) of the variables, and let N_1 = N_{1r}, N_2 = N_{2r}, N_3 = N_{3r} be the variances (powers) of the Gaussian zero-mean noises n_1, n_2, n_3; then, the NSRs of the isolated channels are given by
R_1 = N_1 / s_y^2 = N_1 / (m_1^2 s_x^2),
R_2 = N_2 / s_z^2 = N_2 / (m_2^2 s_y^2),
R_3 = N_3 / s_t^2 = N_3 / (m_3^2 s_z^2).
When the first two blocks are connected in series, the input to the second block must also include the output noise of the first block; therefore, from Equations (31) and (33), we obtain the modified output variable z̆:
z̆ = m_2 y + n_2 = m_2 (m_1 x + n_1) + n_2 = m_2 m_1 x + m_2 n_1 + n_2.
In Equation (37), m_2 m_1 x is the output "signal" and m_2 n_1 + n_2 is the output noise; therefore, the NSR at the output of the second block is
R = (m_2^2 N_1 + N_2) / (m_2^2 m_1^2 s_x^2) = m_2^2 N_1 / (m_2^2 m_1^2 s_x^2) + N_2 / (m_2^2 m_1^2 s_x^2) = N_1 / (m_1^2 s_x^2) + N_2 / (m_2^2 s_y^2) = R_1 + R_2,
where we used s_y^2 = m_1^2 s_x^2 for the noiseless part of the signal.
Now, for three channels in series, it is sufficient to consider R given by Equation (38) as the input NSR to the third single channel to obtain the final NSR and prove Equation (31):
R = R_1 + R_2 + R_3.
Finally, notice that R of Equation (30) is proportional to the mean ⟨R_i⟩:
R = n [(1/n) Σ_{i=1}^{n} R_i] = n ⟨R_i⟩.
In other words, the series channel averages the single R i .
In conclusion, Equations (30)–(40) allow us to study channels made of several single channels in series, each affected by correlation noise, by simply adding together their single NSRs.
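The additivity of the single-channel NSRs can also be checked numerically. The following sketch (assumed slopes and NSR values, not the paper's) simulates three noisy linear blocks in series and compares the measured output NSR with the sum of the single-channel NSRs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative slopes and single-channel correlation NSRs (not the paper's values).
m = [0.21, 0.15, 0.38]
R_i = [0.02, 0.10, 0.11]

x = rng.uniform(1000.0, 5000.0, 100_000)   # input "signal" samples
signal = x.copy()
noise = np.zeros_like(x)
for m_i, R_block in zip(m, R_i):
    # Noise power of this block, referred to the (deterministic) signal power at its output.
    N_i = R_block * np.var(m_i * signal)
    noise = m_i * noise + rng.normal(0.0, np.sqrt(N_i), x.size)
    signal = m_i * signal

R_series = np.var(noise) / np.var(signal)
print(R_series, sum(R_i))                  # the two values nearly coincide (about 0.23)
```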
In the next sections, we apply the theory to linguistic channels suitably defined, after exploring the database on the NT mentioned in Section 1.

5. Exploratory Data Analysis

In this second part, we explore the linear relationships between characters, words, interpunctions, and sentences in the New Testament books considered (Matthew, Mark, Luke, John, Acts, Epistle to the Romans, Apocalypse), according to the flow chart shown in Figure 1. This is the database of our experimental analysis and application of the theory of linear channels discussed in the previous sections.
Table 1 lists the language of translation and language family, including the total number of characters ( C ) , words ( W ) , sentences ( S ), and interpunctions ( I ).
Figure 6 shows the scatterplots in the original Greek texts between (a) characters and words; (b) words and interpunctions; (c) interpunctions and sentences; and (d) characters and sentences. Figure 7 shows these scatterplots in the English translation. Appendix B shows examples of scatterplots in other languages. Table 2 reports the slope m and correlation coefficient r of the indicated scatterplots (155 samples for each scatterplot) for each translation, namely the input parameters of our theory on communication channels. The differences between languages are due to the large "domestication" of the original Greek texts discussed in reference [47].
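As a minimal sketch of how the inputs of Table 2 can be produced, the code below counts characters, words, interpunctions, and sentences in one chapter and then fits a regression line through the origin (consistent with the assumption b ≈ 0 of Section 2) to per-chapter counts. The sets of marks treated as interpunctions and as sentence endings, and the counting rules themselves, are assumptions made here for illustration, not the paper's exact definitions.

```python
import re
import numpy as np

# Assumed punctuation sets, for illustration only.
INTERPUNCTIONS = set(".,;:?!")
SENTENCE_ENDINGS = set(".?!")

def chapter_counts(text: str):
    """Per-chapter counts of characters (letters in words), words, interpunctions, sentences."""
    words = re.findall(r"[^\W\d_]+", text, flags=re.UNICODE)
    n_chars = sum(len(w) for w in words)
    n_words = len(words)
    n_interp = sum(text.count(p) for p in INTERPUNCTIONS)
    n_sent = sum(text.count(p) for p in SENTENCE_ENDINGS)
    return n_chars, n_words, n_interp, n_sent

def slope_and_r(x, y):
    """Regression line through the origin (y = m*x) and Pearson correlation coefficient."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m = np.sum(x * y) / np.sum(x * x)   # least-squares slope with zero intercept
    r = np.corrcoef(x, y)[0, 1]
    return m, r
```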
The four scatterplots define fundamental linear channels and they are connected with important linguistic parameters previously studied [34,35,36], namely the following:
(a)
The number of characters per word, C P , given by the ratio between characters (abscissa) and words (ordinate) in Figure 6a.
(b)
The number of words between two successive interpunctions, I_P—called the word interval—given by the ratio between words (abscissa) and interpunctions (ordinate) in Figure 6b.
(c)
The number of word intervals in sentences, M_F, given by the ratio between interpunctions (abscissa) and sentences (ordinate) in Figure 6c; a minimal computation of these three ratios is sketched right after this list.
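As referenced above, a simple illustration of these three ratios, assuming per-chapter count arrays such as those produced by the earlier sketch (hypothetical names):

```python
import numpy as np

def deep_language_parameters(n_chars, n_words, n_interp, n_sent):
    """C_P, I_P, and M_F as simple ratios of per-chapter counts."""
    n_chars, n_words = np.asarray(n_chars, float), np.asarray(n_words, float)
    n_interp, n_sent = np.asarray(n_interp, float), np.asarray(n_sent, float)
    C_P = n_chars / n_words    # characters per word
    I_P = n_words / n_interp   # words per word interval
    M_F = n_interp / n_sent    # word intervals per sentence
    return C_P, I_P, M_F
```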
Figure 6d shows the scatterplot between characters and sentences, which will be discussed in Section 7.
In the next section, we study the channels corresponding to these scatterplots.
Figure 8 shows the probability distributions of the correlation coefficient r and the coefficient of determination r 2 for the scatterplots: words versus characters (green line); interpunctions versus words (cyan); and sentences versus interpunctions (magenta). The black line refers to the scatterplot sentences versus characters; the red line refers to the series channel considered in Section 7, which links characters to sentences.
For correlation coefficients—and consequently for the coefficient of determination, which determines the SNR—we notice the following remarkable findings:
(a)
In any language, the largest correlation coefficient is found in the scatterplot between characters and words. The communication digital codes invented by humans show remarkably strict relationships between digital symbols (characters) and their sequences (words) used to indicate items of their experience, material or immaterial. Languages do not differ from each other very much, with r in the range 0.9753–0.9983 (Armenian, Cebuano) and overall r = 0.9925 ± 0.0038.
(b)
The smallest correlation coefficient is found in the scatterplot between characters and sentences, being overall 0.0140 ± 0.0027 . This relationship must be, of course, the most unpredictable and variable because the many digital symbols that make a sentence can create an extremely large number of combinations, each delivering a different concept.
(c)
The correlation coefficient (and also the coefficient of determination r 2 ) decreases as characters combine to create words, as words combine to create word intervals, and as word intervals combine to create sentences.
The path just mentioned in item (c) describes an increasing creativity and variety of meaning different from that of the deterministic channel.
The characters-to-words channel shows the largest r^2; therefore, this channel is the nearest to being purely deterministic. It does not tend to be typical of a particular text/writer but rather of a language, because a writer has very little freedom in using words of very different length [34], if we exclude specialized words belonging to scientific and academic disciplines.
On the contrary, the channels words-to-interpunctions and interpunctions-to-sentences are less deterministic, as a writer can exercise his/her creativity of expression more freely; therefore, these channels depend more on the writer/text than on the language. Finally, the big “jump” from characters to sentences gives the greatest freedom.
In conclusion, humans have invented codes whose sequences of symbols that make words cannot vary very much when indicating single physical or mental objects of their experience. To communicate concepts, on the contrary, a large variability can be achieved by introducing interpunctions to form word intervals and word intervals to form sentences, the final depositary of human basic concepts.
Figure 9 shows the probability distributions of the slope m. The black line (only partially visible because it is superimposed on the red line) refers to the scatterplot sentences versus characters; the red line refers to the series channel that connects sentences to characters, as discussed in Section 7.
On the slopes, we notice the following important findings:
(a)
The slope of the scatterplot between interpunctions and sentences (magenta line) is the largest in any language—overall 0.3795 ± 0.0755 —and determines the number of word intervals, M F , contained in a sentence in its deterministic channel.
(b)
The slope of the scatterplot between interpunctions and words (cyan line) determines the length of the word interval, I P , in its deterministic channel.
(c)
The slope of the scatterplot between words and characters (green line) determines the number of characters per word, C P , in its deterministic channel. As discussed below, this channel is the most “universal” channel because, from language to language, C P varies little compared to other linguistic variables.
(d)
The smallest slopes are found in the scatterplots between characters and sentences, being overall 0.0140 ± 0.0027. For example, in English there are 519,043 characters and 6590 sentences (Table 1); now, according to Table 2, the deterministic channel predicts 0.0128 × 519,043 ≈ 6644 sentences, just +0.8% difference from the true value.
As reiterated above, the slopes describe deterministic channels. As discussed in Section 6, a deterministic channel is not “deterministic” when concerning the number of concepts, because the same number of sentences can communicate different meanings by just changing words and interpunctions. What is “deterministic” is the size of the ensemble.
In the next section, we model single linguistic channels, i.e., channels not yet connected in series, from the linear relationships shown above.

6. Single Linguistic Channels

In this section, we apply the theory developed in Section 3.2 to the scatterplots of Section 5 and therefore to the following single channels:
(a)
Characters-to-words.
(b)
Words-to-interpunctions.
(c)
Interpunctions-to-sentences.
These single channels are modeled as in Figure 2 and Figure 3; they are affected only by the correlation noise. Γ is obtained from Equation (29) and drawn in Figure 10. Table 3 reports the mean and standard deviation of Γ in each channel.
From Figure 10 and Table 3, we notice the following interesting facts:
(a)
Languages show different Γ values due to the large degree of domestication of the original Greek texts [47].
(b)
Γ decreases steadily in this order: characters-to-words, words-to-interpunctions, and interpunctions-to-sentences. A decreasing Γ says, to a certain extent, how much less deterministic a channel is.
(c)
Words-to-interpunctions and interpunctions-to-sentences have close values, and therefore, they show similar deterministic channels.
(d)
Most languages have a Γ greater than that in Greek. This agrees with the finding that in modern translations of the Greek texts, domestication prevails over foreignization [47].
(e)
Finally, we can consider Γ as a figure of merit of a linguistic channel being deterministic: the larger Γ is, the more the channel is deterministic.
Figure 11 shows histograms (37 samples) of Γ for each channel. The probability density function of Γ can be modeled with a Gaussian model (therefore, γ is a lognormal stochastic variable), with the mean and standard deviation reported in Table 3. Figure 12 shows the probability distributions of Γ, which show, again, the differences and similarities of the channels.
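A small sketch of this modeling step is shown below (assumed sample values, not the paper's): it fits the Gaussian parameters of Γ in dB and converts them into the parameters of the corresponding lognormal variable γ = 10^(Γ/10).

```python
import numpy as np

def gamma_model(Gamma_dB_samples):
    """Gaussian fit of Gamma (dB) and lognormal parameters of gamma = 10**(Gamma/10)."""
    G = np.asarray(Gamma_dB_samples, float)
    mu_dB, sigma_dB = G.mean(), G.std(ddof=1)
    # ln(gamma) = Gamma * ln(10) / 10, so ln(gamma) is Gaussian with:
    mu_ln = mu_dB * np.log(10) / 10
    sigma_ln = sigma_dB * np.log(10) / 10
    return mu_dB, sigma_dB, mu_ln, sigma_ln

# Hypothetical Gamma samples (dB), one value per language:
print(gamma_model([8.1, 7.4, 9.0, 8.6, 7.9]))
```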
In conclusion, the large Γ of the characters-to-words channel, in any language, indicates that the transformation of characters into words is the most deterministic.
In the next section, we connect the single channels to obtain the series channels modeled in Figure 5 and study them according to the theory of Section 4.

7. Series Connection of Linguistic Channels Affected by Correlation Noise

Let us connect the three single channels to obtain the series channel shown in Figure 5 and apply the theory of Section 4. We first show the results concerning the theory of series channel, and then we compare the single channel characters-to-sentences to that obtained with the series of single channels.
Figure 13a shows the single NSRs and the series NSRs in linear units for each language; Figure 13b shows the corresponding Γ (dB), partially already reported in Figure 10. We can notice that, in the sum indicated in Equation (31), the NSR of the characters-to-words channel is negligible compared to the other two NSRs. For example, in English (language no. 10), R = 0.0152 + 0.1059 + 0.1120 = 0.2331, which reduces to 0.1059 + 0.1120 = 0.2179 if the first addend is neglected; therefore, R ≈ 0.22 against R ≈ 0.23. In general, R_1 ≪ R_2, R_3, so the characters-to-words channel can be ignored to a first approximation, because it is about 1/10 of the other two addends in Equation (31).
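For illustration, the following few lines reproduce this check with the NSR values quoted above for English and also express the two totals as Γ in dB (the dB values are derived here, not quoted from the paper).

```python
import math

R1, R2, R3 = 0.0152, 0.1059, 0.1120        # single-channel NSRs quoted above for English
R_full = R1 + R2 + R3                      # 0.2331
R_approx = R2 + R3                         # 0.2179, neglecting the characters-to-words channel
Gamma_full = -10 * math.log10(R_full)      # Gamma = 10*log10(1/R), about 6.3 dB
Gamma_approx = -10 * math.log10(R_approx)  # about 6.6 dB
print(R_full, R_approx, Gamma_full, Gamma_approx)
```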
For the characters-to-sentences channel, Figure 14a shows the slope calculated from the scatterplot between characters and sentences (Table 2) and the slope given by Equation (30). The agreement is excellent; in practice, the two values coincide (correlation coefficient 0.9998). Figure 14b shows the scatterplot between the correlation coefficient calculated from the scatterplot between characters and sentences (Table 2) and that calculated by solving Equation (29) for r after calculating γ from Equation (40). In this case, the two values are poorly correlated (correlation coefficient 0.3929). Finally, notice the difference between the probability distribution of Γ calculated by solving Equation (29) for r (the red line in Figure 12) and that calculated from the available scatterplots and regression line (Table 2), i.e., the black line. The smoother red curve models the relationship between characters and sentences more accurately than the available scatterplot shown in Figure 6d, because R is proportional to the mean value of the single-channel NSRs (see Equation (41)).
In conclusion, the Γ calculated in a series channel linking two variables is more reliable than that calculated from a single channel/scatterplot between the two variables.
In the next section, we apply the theory of cross channels of Section 3.1.

8. Cross Channels: Language Translations

In cross channels, we study how the output variable y k of channel k relates to the output variable y j of another similar channel j for the same input x ; therefore, we apply the theory of Section 3.1. In this new channel, we can determine the slope and the correlation coefficient of the scatterplot between y k and y j for the same value of the independent variable x ; therefore, cross channels can reveal relationships more difficult to discover experimentally.
From the database of the NT texts and the scatterplots of Figure 6, we can study at least three cross channels:
(a)
The words-to-words channel, by eliminating characters; therefore, the number of words is compared for the same number of characters.
(b)
The interpunctions-to-interpunctions channel, by eliminating words; therefore, the number of word intervals is compared for the same number of words.
(c)
The sentences-to-sentences channel, by eliminating interpunctions; therefore, the number of sentences is compared for the same number of word intervals.
Now, since these channels connect one independent variable in one language to the same (dependent) variable in another language, they describe very important linguistic channels, namely translation channels, and they can be studied from this particular perspective. Therefore, cross channels in alphabetical texts describe the mathematics/statistics of translation, as we first studied in reference [35].
Figure 15 shows the slope m k j and the correlation coefficient r k j by assuming Greek as language k , namely the reference language, for the three cross channels. We can notice the following:
(a)
For most languages, m k j > 1 in any cross channel; therefore, most modern languages tend to use more words for the same number of characters; more word intervals for the same number of words; and more sentences for the same number of word intervals than Greek. In other words, the corresponding deterministic channel (the channel characterized by a multiplicative slope) is significantly biased compared to the original Greek texts.
(b)
The correlation coefficient r k j is always very near unity. Therefore, the scattering of the data around the regression line is similar in all three cross channels.
Figure 16 shows the findings assuming English as the reference language. In this case, we consider the “translation” from English into the other languages [35]. Clear differences are noticeable:
(a)
Words-to-words channel: For most languages, m_kj ≈ 1. The multiplicative bias is small, as languages tend to use the same number of words as in English. This was not the case for Greek. The correlation coefficient r_kj is practically the same for all languages. In other words, modern languages tend to use the same number of words as in English for the same number of characters; therefore, domestication of the alleged translation of English into the other languages is moderate, compared to Greek or Latin (see languages 1 and 2 in Figure 16a).
(b)
Interpunctions-to-interpunctions channel: The multiplicative bias m_kj is strong, as in Greek; therefore, the deterministic cross channels are different from language to language. The correlation coefficient r_kj is more scattered than in Figure 15 and different from language to language. Curiously, in the channel English-to-Greek, m_kj ≈ 1, and there is no bias. The correlation coefficient r_kj is similar to that of the sentences-to-sentences channel.
(c)
Sentences-to-sentences channel: m_kj ≠ 1 for most languages, and r_kj is similar to that of the interpunctions-to-interpunctions channel.
Since similar diagrams can be shown when other modern languages are considered as the independent language, we can conclude that the translation from Greek to modern languages shows a high degree of domestication, due especially to the multiplicative bias, namely to the deterministic channels rather than the stochastic part of the channel. In conclusion, the translation from a modern language into another modern language is mainly achieved through deterministic channels. Therefore, the SNR is mainly determined by R m .
This conclusion is visually evident in the scatterplot between X = R_m and Y = R_r shown in Figure 17, where a constant value of Γ traces an arc of a circle [34]. It is clear that R_m ≫ R_r; in other words, in the three cross channels, Γ is dominated by R_m, in agreement with what is shown in Figure 15 and Figure 16.
Finally, Figure 18 shows the mean value and standard deviation of Γ in the three channels by assuming the language indicated in abscissa as an independent language/translation.
Notice that, overall, the probability distribution of Γ can be modeled as Gaussian, with the mean value and standard deviation reported in Table 4 (for its calculation, see Appendix C). Notice that cross channels have larger Γ values than the series channels (Table 3), because “translation” between two modern languages uses mostly deterministic channels.
Figure 19 shows, as an example, the modeling of the words-to-words overall channel.
Now, we conjecture the characteristics of the three channels for an indistinct human being by merging all values, as done with words in Figure 19.
Figure 20 shows the Gaussian probability density functions and their probability distributions of the overall Γ in the three channels calculated with the values in Table 4. These distributions refer, therefore, to channels in which all languages merge into a single digital code. In other words, we might consider these probability distributions as “universal”, typical of humans using plain text.
From Figure 18, Figure 19 and Figure 20 and Table 4, the following “universal” characteristics clearly emerge.
(a)
The words-to-words channel is distinguished from the other two channels, with a larger Γ . This channel is the most deterministic.
(b)
The interpunctions-to-interpunctions and sentences-to-sentences channels are very similar both in the mean value and standard deviation of Γ , therefore indicating a similar freedom in creating variations with respect to their deterministic channels.

9. Summary and Conclusions

How the human brain analyzes parts of a sentence (parsing) and describes their syntactic roles is still a major question in cognitive neuroscience. In references [2,3,33], we proposed that a sentence is elaborated by short-term memory with three independent processing units in series: (1) syllables and characters to make a word, (2) words and interpunctions to make a word interval, and (3) word intervals to make a sentence.
This approach is simple but useful, because the multiple processing of the brain regarding speech/text is not yet fully understood but characters, words, and interpunctions—the latter are needed to distinguish word intervals and sentences—can be easily studied in any alphabetical language and epoch. Our conjecture, therefore, is that we can find clues on the performance of the mind, at a high cognitive level, by studying the most abstract human invention, namely alphabetical texts.
The aim of the present paper was to further develop and complete the theory proposed in references [2,3,33] and then apply it to the flow of linguistic variables making a sentence, namely, the transformation of (a) characters into words; (b) words into word intervals; and (c) word intervals into sentences. Since the connection between these linguistic variables is described by regression lines, we have analyzed experimental scatterplots between the variables.
In the first part of the article, we have recalled and further developed the theory of linear channels, which models stochastic variables linearly connected. The theory is applicable to any field/specialty in which a linear relationship holds between two variables.
We have first studied how the output variable y k of channel k relates to the output variable y j of another similar channel j for the same input x . These channels can be termed as “cross channels” and are fundamental in studying language translation.
Secondly, we have studied how the output of a deterministic channel relates to the output of its noisy version. A deterministic channel is not “deterministic” when concerning the number of concepts, because, for example, the same number of sentences can communicate different meanings by just changing words and interpunctions. What is “deterministic” is the size of the ensemble.
Then, we have studied a channel made of a series of single channels and have established that its noise-to-signal ratio R is proportional to the average of the single-channel noise-to-signal ratios.
In the second part of the article, we have explored, experimentally, the linear relationships between characters, words, interpunctions, and sentences in a large set of the New Testament books. We have considered the original Greek texts and their translation to Latin and to 35 modern languages because, in any language, they tell the same story; therefore, it is meaningful to compare their translations. Moreover, they use common words; therefore, they can give some clues on how most humans communicate.
The characters-to-words channel is the nearest to being purely deterministic. It does not tend to be typical of a particular text/writer but more of a language because a writer has very little freedom in using words of very different length.
On the contrary, the channels words-to-interpunctions and interpunctions-to-sentences are less deterministic, as they depend more on writer/text than on language.
The signal-to-noise ratio Γ is a figure of merit of how deterministic a channel is. The larger Γ is, the more the channel is deterministic.
In conclusion, humans have invented codes in which the sequences of symbols that make words cannot vary very much when indicating single physical or mental objects of their experience. On the contrary, to communicate concepts, large variability is achieved by introducing interpunctions to make word intervals and word intervals to make sentences, the final depositary of human basic concepts. Future work should be devoted to non-alphabetical languages. Finally, notice that the theory can inspire new research lines in cognitive science.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The author thanks Lucia Matricciani for drawing Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A

Table A1. List of mathematical symbols.

Symbol: Definition
m: Slope of the regression line
m_kj: Slope in a cross channel
r_kj: Correlation coefficient in a cross channel
n_C: Number of characters per chapter
n_W: Number of words per chapter
n_S: Number of sentences per chapter
n_I: Number of interpunctions per chapter
r: Correlation coefficient of linear variables
r^2: Coefficient of determination
s: Standard deviation
s^2: Variance
C_P: Characters per word
I_P: Word interval
M_F: Word intervals per sentence
N_m: Regression noise power
N_r: Correlation noise power
R_m: Regression noise-to-signal power ratio
R_r: Correlation noise-to-signal power ratio
P_F: Words per sentence
γ: Signal-to-noise ratio (linear)
Γ: Signal-to-noise ratio (dB)
⟨ ⟩: Mean value

Appendix B. Scatterplots in Different Languages

Figure A1. Scatterplots for the French translation between (a) characters and words; (b) words and interpunctions; (c) interpunctions and sentences; and (d) between characters and sentences. The continuous black line is the regression line.
Figure A2. Scatterplots for the Italian translation between (a) characters and words; (b) words and interpunctions; (c) interpunctions and sentences; and (d) between characters and sentences. The continuous black line is the regression line.
Figure A3. Scatterplots for the Portuguese translation between (a) characters and words; (b) words and interpunctions; (c) interpunctions and sentences; and (d) between characters and sentences. The continuous black line is the regression line.
Figure A4. Scatterplots for the Spanish translation between (a) characters and words; (b) words and interpunctions; (c) interpunctions and sentences; and (d) between characters and sentences. The continuous black line is the regression line.
Figure A5. Scatterplots for the German translation between (a) characters and words; (b) words and interpunctions; (c) interpunctions and sentences; and (d) between characters and sentences. The continuous black line is the regression line.
Figure A6. Scatterplots for the Russian translation between (a) characters and words; (b) words and interpunctions; (c) interpunctions and sentences; and (d) between characters and sentences. The continuous black line is the regression line.

Appendix C

Let m_k and s_k be the (conditional) mean value and standard deviation of the samples belonging to the k-th set out of the N sets of the ensemble, e.g., the values shown in Figure 18. From statistical theory [46,47], the unconditional mean (ensemble mean) m is given by the mean of means:
m = (1/N) Σ_{k=1}^{N} m_k.
The unconditional variance (ensemble variance) s^2 (s is the unconditional standard deviation) is given by
s^2 = var(m_k) + (1/N) Σ_{k=1}^{N} s_k^2,
var(m_k) = (1/N) Σ_{k=1}^{N} m_k^2 - m^2.
From Equations (A1)–(A3), we obtain the overall values reported in Table 4.
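A minimal numerical sketch of Equations (A1)–(A3), assuming arrays of per-language means and standard deviations (the values in the example are hypothetical):

```python
import numpy as np

def ensemble_mean_std(means, stds):
    """Unconditional (ensemble) mean and standard deviation from conditional ones."""
    means, stds = np.asarray(means, float), np.asarray(stds, float)
    m = means.mean()                          # mean of means, Equation (A1)
    var_means = np.mean(means ** 2) - m ** 2  # variance of the conditional means, Equation (A3)
    s2 = var_means + np.mean(stds ** 2)       # ensemble variance, Equation (A2)
    return m, np.sqrt(s2)

# Hypothetical per-language means and standard deviations of Gamma (dB):
print(ensemble_mean_std([8.1, 7.4, 9.0, 8.6], [0.9, 1.1, 0.8, 1.0]))
```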

References

  1. Deniz, F.; Nunez–Elizalde, A.O.; Huth, A.G.; Gallant, J.L. The Representation of Semantic Information Across Human Cerebral Cortex During Listening Versus Reading Is Invariant to Stimulus Modality. J. Neurosci. 2019, 39, 7722–7736. [Google Scholar] [CrossRef] [PubMed]
  2. Matricciani, E. A Mathematical Structure Underlying Sentences and Its Connection with Short–Term Memory. AppliedMath 2024, 4, 120–142. [Google Scholar] [CrossRef]
  3. Matricciani, E. Is Short–Term Memory Made of Two Processing Units? Clues from Italian and English Literatures down Several Centuries. Information 2024, 15, 6. [Google Scholar] [CrossRef]
  4. Miller, G.A. The Magical Number Seven, Plus or Minus Two. Some Limits on Our Capacity for Processing Information. Psychol. Rev. 1956, 63, 343–352. [Google Scholar] [CrossRef]
  5. Crowder, R.G. Short–term memory: Where do we stand? Mem. Cogn. 1993, 21, 142–145. [Google Scholar] [CrossRef]
  6. Lisman, J.E.; Idiart, M.A.P. Storage of 7 ± 2 Short–Term Memories in Oscillatory Subcycles. Science 1995, 267, 1512–1515. [Google Scholar] [CrossRef]
  7. Cowan, N. The magical number 4 in short–term memory: A reconsideration of mental storage capacity. Behav. Brain Sci. 2000, 24, 87–114. [Google Scholar] [CrossRef]
  8. Bachelder, B.L. The Magical Number 7 ± 2: Span Theory on Capacity Limitations. Behav. Brain Sci. 2001, 24, 116–117. [Google Scholar] [CrossRef]
  9. Saaty, T.L.; Ozdemir, M.S. Why the Magic Number Seven Plus or Minus Two. Math. Comput. Model. 2003, 38, 233–244. [Google Scholar] [CrossRef]
  10. Burgess, N.; Hitch, G.J. A revised model of short–term memory and long–term learning of verbal sequences. J. Mem. Lang. 2006, 55, 627–652. [Google Scholar] [CrossRef]
  11. Richardson, J.T.E. Measures of short–term memory: A historical review. Cortex 2007, 43, 635–650. [Google Scholar] [CrossRef] [PubMed]
  12. Mathy, F.; Feldman, J. What’s magic about magic numbers? Chunking and data compression in short–term memory. Cognition 2012, 122, 346–362. [Google Scholar] [CrossRef] [PubMed]
  13. Gignac, G.E. The Magical Numbers 7 and 4 Are Resistant to the Flynn Effect: No Evidence for Increases in Forward or Backward Recall across 85 Years of Data. Intelligence 2015, 48, 85–95. [Google Scholar] [CrossRef]
  14. Trauzettel-Klosinski, S.; Dietz, K. Standardized Assessment of Reading Performance: The New International Reading Speed Texts IreST. IOVS 2012, 53, 5452–5461. [Google Scholar] [CrossRef]
  15. Melton, A.W. Implications of Short–Term Memory for a General Theory of Memory. J. Verbal Learn. Verbal Behav. 1963, 2, 1–21. [Google Scholar] [CrossRef]
  16. Atkinson, R.C.; Shiffrin, R.M. The Control of Short–Term Memory. Sci. Am. 1971, 225, 82–91. [Google Scholar] [CrossRef]
  17. Murdock, B.B. Short–Term Memory. Psychol. Learn. Motiv. 1972, 5, 67–127. [Google Scholar]
  18. Baddeley, A.D.; Thomson, N.; Buchanan, M. Word Length and the Structure of Short–Term Memory. J. Verbal Learn. Verbal Behav. 1975, 14, 575–589. [Google Scholar] [CrossRef]
  19. Case, R.; Midian Kurland, D.; Goldberg, J. Operational efficiency and the growth of short–term memory span. J. Exp. Child Psychol. 1982, 33, 386–404. [Google Scholar] [CrossRef]
  20. Grondin, S. A temporal account of the limited processing capacity. Behav. Brain Sci. 2000, 24, 122–123. [Google Scholar] [CrossRef]
  21. Pothos, E.M.; Joula, P. Linguistic structure and short–term memory. Behav. Brain Sci. 2000, 24, 138–139. [Google Scholar] [CrossRef]
  22. Conway, A.R.A.; Cowan, N.; Michael, F.; Bunting, M.F.; Therriaulta, D.J.; Minkoff, S.R.B. A latent variable analysis of working memory capacity, short–term memory capacity, processing speed, and general fluid intelligence. Intelligence 2002, 30, 163–183. [Google Scholar] [CrossRef]
  23. Jonides, J.; Lewis, R.L.; Nee, D.E.; Lustig, C.A.; Berman, M.G.; Moore, K.S. The Mind and Brain of Short–Term Memory. Annu. Rev. Psychol. 2008, 69, 193–224. [Google Scholar] [CrossRef] [PubMed]
  24. Barrouillet, P.; Camos, V. As Time Goes By: Temporal Constraints in Working Memory. Curr. Dir. Psychol. Sci. 2012, 21, 413–419. [Google Scholar]
  25. Potter, M.C. Conceptual short–term memory in perception and thought. Front. Psychol. 2012, 3, 113. [Google Scholar] [CrossRef]
  26. Jones, G.; Macken, B. Questioning short–term memory and its measurements: Why digit span measures long–term associative learning. Cognition 2015, 144, 1–13. [Google Scholar] [CrossRef]
  27. Chekaf, M.; Cowan, N.; Mathy, F. Chunk formation in immediate memory and how it relates to data compression. Cognition 2016, 155, 96–107. [Google Scholar] [CrossRef]
  28. Norris, D. Short–Term Memory and Long–Term Memory Are Still Different. Psychol. Bull. 2017, 143, 992–1009. [Google Scholar] [CrossRef]
  29. Houdt, G.V.; Mosquera, C.; Napoles, G. A review on the long short–term memory model. Artif. Intell. Rev. 2020, 53, 5929–5955. [Google Scholar] [CrossRef]
  30. Islam, M.; Sarkar, A.; Hossain, M.; Ahmed, M.; Ferdous, A. Prediction of Attention and Short–Term Memory Loss by EEG Workload Estimation. J. Biosci. Med. 2023, 11, 304–318. [Google Scholar] [CrossRef]
  31. Rosenzweig, M.R.; Bennett, E.L.; Colombo, P.J.; Lee, P.D.W. Short–term, intermediate–term and Long–term memories. Behav. Brain Res. 1993, 57, 193–198. [Google Scholar] [CrossRef]
  32. Kaminski, J. Intermediate–Term Memory as a Bridge between Working and Long–Term Memory. J. Neurosci. 2017, 37, 5045–5047. [Google Scholar] [CrossRef] [PubMed]
  33. Matricciani, E. Equivalent Processors Modelling the Short–Term Memory. Preprints 2025. [Google Scholar] [CrossRef]
  34. Matricciani, E. Deep Language Statistics of Italian throughout Seven Centuries of Literature and Empirical Connections with Miller’s 7 ∓ 2 Law and Short–Term Memory. Open J. Stat. 2019, 9, 373–406. [Google Scholar] [CrossRef]
  35. Matricciani, E. A Statistical Theory of Language Translation Based on Communication Theory. Open J. Stat. 2020, 10, 936–997. [Google Scholar] [CrossRef]
  36. Matricciani, E. Multiple Communication Channels in Literary Texts. Open J. Stat. 2022, 12, 486–520. [Google Scholar] [CrossRef]
  37. Strinati, E.C.; Barbarossa, S. 6G Networks: Beyond Shannon Towards Semantic and Goal–Oriented Communications. Comput. Netw. 2021, 190, 107930. [Google Scholar] [CrossRef]
  38. Shi, G.; Xiao, Y.; Li, Y.; Xie, X. From semantic communication to semantic–aware networking: Model, architecture, and open problems. IEEE Commun. Mag. 2021, 59, 44–50. [Google Scholar] [CrossRef]
  39. Xie, H.; Qin, Z.; Li, G.Y.; Juang, B.H. Deep learning enabled semantic communication systems. IEEE Trans. Signal Process. 2021, 69, 2663–2675. [Google Scholar] [CrossRef]
  40. Luo, X.; Chen, H.H.; Guo, Q. Semantic communications: Overview, open issues, and future research directions. IEEE Wirel. Commun. 2022, 29, 210–219. [Google Scholar] [CrossRef]
  41. Wanting, Y.; Hongyang, D.; Liew, Z.Q.; Lim, W.Y.B.; Xiong, Z.; Niyato, D.; Chi, X.; Shen, X.; Miao, C. Semantic Communications for Future Internet: Fundamentals, Applications, and Challenges. IEEE Commun. Surv. Tutor. 2023, 25, 213–250. [Google Scholar]
  42. Bao, J. Towards a theory of semantic communication. In Proceedings of the Network Science Workshop, West Point, NY, USA, 22–24 June 2011; pp. 110–117. [Google Scholar]
  43. Bellegarda, J.R. Exploiting Latent Semantic Information in Statistical Language Modeling. Proc. IEEE 2000, 88, 1279–1296. [Google Scholar] [CrossRef]
  44. D’Alfonso, S. On Quantifying Semantic Information. Information 2011, 2, 61–101. [Google Scholar] [CrossRef]
  45. Zhong, Y. A Theory of Semantic Information. China Commun. 2017, 14, 1–17. [Google Scholar] [CrossRef]
  46. Papoulis, A. Probability & Statistics; Prentice Hall: Hoboken, NJ, USA, 1990. [Google Scholar]
  47. Matricciani, E. Domestication of Source Text in Literary Translation Prevails over Foreignization. Analytics 2025, 4, 17. [Google Scholar] [CrossRef]
Figure 1. A flow chart of linguistic variables. The output variable of each block is connected to its input variable by a regression line.
Figure 2. Flow chart of linear systems. Upper panel: deterministic channel with multiplicative bias m, Equation (1). Lower panel: noisy deterministic channel with multiplicative bias and Gaussian noise source, Equation (3).
Figure 3. A flow chart of variances: r 2 s y 2 is the output variance of the values lying on the regression line, Equation (7); 1 r 2 s y 2 is the output variance due to the values of y not lying on the regression line, Equation (6).
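For readers who wish to experiment with the variance decomposition of Figure 3, the following minimal Python sketch (an illustration, not the author's code; the per-chapter counts are hypothetical) fits the multiplicative-bias model of Figure 2 to paired samples, computes the correlation coefficient, and splits the output variance s_y^2 into r^2·s_y^2 (values lying on the regression line) and (1 − r^2)·s_y^2 (scatter around it).

```python
import numpy as np

# Hypothetical per-chapter counts (illustrative numbers, not the paper's data):
# x = characters per chapter, y = words per chapter.
x = np.array([1200.0, 1850.0, 950.0, 2100.0, 1600.0, 1300.0])
y = np.array([ 250.0,  395.0, 200.0,  440.0,  330.0,  270.0])

# Slope of a regression line forced through the origin (y = m*x),
# consistent with the multiplicative-bias model of Figure 2.
m = np.sum(x * y) / np.sum(x * x)

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

# Variance decomposition of Figure 3:
# r^2 * s_y^2       -> variance of the values lying on the regression line
# (1 - r^2) * s_y^2 -> variance of the scatter around the line ("noise")
s_y2 = np.var(y, ddof=0)
signal_var = r**2 * s_y2
noise_var = (1.0 - r**2) * s_y2

print(f"m = {m:.4f}, r = {r:.4f}")
print(f"signal variance = {signal_var:.2f}, noise variance = {noise_var:.2f}")
```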
Figure 4. Flow chart describing a cross channel.
Figure 5. Flow chart of noisy single channels connected in series.
Figure 6. Scatterplots for the original Greek texts between (a) characters and words; (b) words and interpunctions; (c) interpunctions and sentences; and (d) between characters and sentences. The continuous black line is the regression line.
Figure 7. Scatterplots for the English texts between (a) characters and words; (b) words and interpunctions; (c) interpunctions and sentences; and (d) between characters and sentences. In this case, English is the language to be translated. The continuous black line is the regression line.
Figure 8. (a) Cumulative probability distribution that the abscissa is not exceeded, correlation coefficient r; (b) cumulative probability distribution that the abscissa is not exceeded, coefficient of determination r^2. Both refer to the following scatterplots: words versus characters, green; interpunctions versus words, cyan; sentences versus interpunctions, magenta. The black line refers to the scatterplot sentences versus characters; the red line refers to the series channel considered in Section 7.
Figure 9. Cumulative probability distribution that the abscissa is not exceeded for the regression line slope m in the following scatterplots: words versus characters, green; interpunctions versus words, cyan; sentences versus interpunctions, magenta. The black line (not visible because it is superposed by the red line) refers to the scatterplot sentences versus characters; the red line refers to the series channel considered in Section 7.
Figure 10. (a) Signal-to-noise ratio SNR Γ (dB) versus language (see the order number in Table 1); (b) theoretical relationship between Γ and the coefficient of determination. Characters-to-words, green; words-to-interpunctions, cyan; interpunctions-to-sentences, magenta. Horizontal lines in (a) mark mean values.
Figure 11. Histograms (37 samples) of the signal-to-noise ratio (SNR Γ) for each channel: (a) characters-to-words; (b) words-to-interpunctions; (c) interpunctions-to-sentences.
Figure 12. Cumulative probability distribution that the abscissa is not exceeded, for the signal-to-noise ratio, SNR Γ , in the following channels: characters-to-words, green; words-to-interpunctions, cyan; interpunctions-to-sentences, magenta. The black line refers to the channel characters-to-sentences estimated from the scatterplot of Figure 6d; the red line refers to the series channel considered in Section 7.
Figure 13. (a) Single-channel NSR and series-channel NSR in linear units; (b) the signal-to-noise ratio SNR Γ (dB). The horizontal lines mark mean values. Channels: characters-to-words, green; words-to-interpunctions, cyan; interpunctions-to-sentences, magenta; series channel, red.
Figure 14. (a) A scatterplot between the slope calculated from the scatterplot between characters and sentences (Table 2) and the slope given by Equation (31); (b) a scatterplot between the correlation coefficient calculated from the scatterplot between characters and sentences (Table 2) and that calculated by solving Equation (29) for r . The continuous black line is the regression line.
Figure 15. The mean value (upper panel) and correlation coefficient (lower panel) in the indicated languages, assuming Greek as the reference language (to be translated) in the channels: (a) words-to-words; (b) interpunctions-to-interpunctions; (c) sentences-to-sentences.
Figure 16. The mean value (upper panel) and correlation coefficient (lower panel) in the indicated languages, assuming English as the reference language (to be translated) in the channels: (a) words-to-words; (b) interpunctions-to-interpunctions; (c) sentences-to-sentences.
Figure 17. A scatterplot between X = R_m and Y = R_r in the indicated channels: (a) words-to-words; (b) interpunctions-to-interpunctions; (c) sentences-to-sentences. Red circles indicate the coordinates ⟨R_m⟩, ⟨R_r⟩ of the barycenter.
Figure 18. The mean value (upper panel) and standard deviation (lower panel) of the SNR Γ (dB) in the indicated language (see Table 1) and in the indicated channels: (a) words-to-words; (b) interpunctions-to-interpunctions; (c) sentences-to-sentences. Black lines indicate the overall means. The mean and standard deviation are calculated from the mean of the variances.
Figure 19. A histogram of the signal-to-noise ratio SNR Γ (dB) in the words-to-words channel (37 × 37 − 37 = 1332 samples), shown with blue circles. The continuous black line models the histogram with a Gaussian density function.
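The Gaussian modelling mentioned in the captions of Figures 19 and 20 amounts to estimating the mean and standard deviation of the Γ (dB) samples and overlaying the corresponding normal density on their histogram. A minimal Python sketch is shown below; it uses synthetic data generated with the Table 4 statistics as a stand-in for the 1332 words-to-words samples and is not the author's procedure.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Synthetic stand-in for the 1332 words-to-words Gamma (dB) samples;
# mean and standard deviation taken from Table 4.
gamma_db = rng.normal(loc=18.93, scale=9.21, size=1332)

# Histogram (as a probability density) of the samples.
counts, edges = np.histogram(gamma_db, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Gaussian model with the sample mean and standard deviation.
mu, sigma = gamma_db.mean(), gamma_db.std(ddof=1)
model = norm.pdf(centers, loc=mu, scale=sigma)

for c, h, g in zip(centers, counts, model):
    print(f"{c:6.1f} dB: histogram {h:.4f}, Gaussian model {g:.4f}")
```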
Figure 20. “Universal” Gaussian probability density function (upper panel) and probability distribution function that the abscissa is not exceeded (lower panel) of Γ (dB) in the following channels: words-to-words, black; interpunctions-to-interpunctions, blue; sentences-to-sentences, red. The horizontal black line indicates the mean value.
Table 1. The language of translation and language family of the New Testament books (Matthew, Mark, Luke, John, Acts, Epistle to the Romans, Apocalypse), including the total number of characters (C), words (W), sentences (S), and interpunctions (I). The list concerning the genealogy of Jesus of Nazareth reported in Matthew 1.1–1.17 and in Luke 3.23–3.38 was deleted so as not to bias the statistics of linguistic variables [35]. The source of the texts considered is reported in reference [35].
Language | Order | Abbreviation | Language Family | C | W | S | I
Greek | 1 | Gr | Hellenic | 486,520 | 100,145 | 4759 | 13,698
Latin | 2 | Lt | Italic | 467,025 | 90,799 | 5370 | 18,380
Esperanto | 3 | Es | Constructed | 492,603 | 111,259 | 5483 | 22,552
French | 4 | Fr | Romance | 557,764 | 133,050 | 7258 | 17,904
Italian | 5 | It | Romance | 505,535 | 112,943 | 6396 | 18,284
Portuguese | 6 | Pt | Romance | 486,005 | 109,468 | 7080 | 20,105
Romanian | 7 | Rm | Romance | 513,876 | 118,744 | 7021 | 18,587
Spanish | 8 | Sp | Romance | 505,610 | 117,537 | 6518 | 18,410
Danish | 9 | Dn | Germanic | 541,675 | 131,021 | 8762 | 22,196
English | 10 | En | Germanic | 519,043 | 122,641 | 6590 | 16,666
Finnish | 11 | Fn | Germanic | 563,650 | 95,879 | 5893 | 19,725
German | 12 | Ge | Germanic | 547,982 | 117,269 | 7069 | 20,233
Icelandic | 13 | Ic | Germanic | 472,441 | 109,170 | 7193 | 19,577
Norwegian | 14 | Nr | Germanic | 572,863 | 140,844 | 9302 | 18,370
Swedish | 15 | Sw | Germanic | 501,352 | 118,833 | 7668 | 15,139
Bulgarian | 16 | Bg | Balto-Slavic | 490,381 | 111,444 | 7727 | 20,093
Czech | 17 | Cz | Balto-Slavic | 416,447 | 92,533 | 7514 | 19,465
Croatian | 18 | Cr | Balto-Slavic | 425,905 | 97,336 | 6750 | 17,698
Polish | 19 | Pl | Balto-Slavic | 506,663 | 99,592 | 8181 | 21,560
Russian | 20 | Rs | Balto-Slavic | 431,913 | 92,736 | 5594 | 22,083
Serbian | 21 | Sr | Balto-Slavic | 441,998 | 104,585 | 7532 | 18,251
Slovak | 22 | Sl | Balto-Slavic | 465,280 | 100,151 | 8023 | 19,690
Ukrainian | 23 | Uk | Balto-Slavic | 488,845 | 107,047 | 8043 | 22,761
Estonian | 24 | Et | Uralic | 495,382 | 101,657 | 6310 | 19,029
Hungarian | 25 | Hn | Uralic | 508,776 | 95,837 | 5971 | 22,970
Albanian | 26 | Al | Albanian | 502,514 | 123,625 | 5807 | 19,352
Armenian | 27 | Ar | Armenian | 472,196 | 100,604 | 6595 | 18,086
Welsh | 28 | Wl | Celtic | 527,008 | 130,698 | 5676 | 22,585
Basque | 29 | Bs | Isolate | 588,762 | 94,898 | 5591 | 19,312
Hebrew | 30 | Hb | Semitic | 372,031 | 88,478 | 7597 | 15,806
Cebuano | 31 | Cb | Austronesian | 681,407 | 146,481 | 9221 | 16,788
Tagalog | 32 | Tg | Austronesian | 618,714 | 128,209 | 7944 | 16,405
Chichewa | 33 | Ch | Niger–Congo | 575,454 | 94,817 | 7560 | 15,817
Luganda | 34 | Lg | Niger–Congo | 570,738 | 91,819 | 7073 | 16,401
Somali | 35 | Sm | Afro-Asiatic | 584,135 | 109,686 | 6127 | 17,765
Haitian | 36 | Ht | French Creole | 514,579 | 152,823 | 10,429 | 23,813
Nahuatl | 37 | Nh | Uto-Aztecan | 816,108 | 121,600 | 9263 | 19,271
Table 2. Slope m and correlation coefficient r of indicated regression lines in each language/translation.
Columns: m_1, r_1 = words vs. characters; m_2, r_2 = interpunctions vs. words; m_3, r_3 = sentences vs. interpunctions; m, r = sentences vs. characters.
Language | m_1 | r_1 | m_2 | r_2 | m_3 | r_3 | m | r
Greek | 0.2054 | 0.9893 | 0.1369 | 0.9298 | 0.3541 | 0.9382 | 0.0099 | 0.8733
Latin | 0.1944 | 0.9890 | 0.2038 | 0.9515 | 0.2957 | 0.9366 | 0.0117 | 0.8646
Esperanto | 0.2256 | 0.9920 | 0.2045 | 0.9668 | 0.2461 | 0.9545 | 0.0113 | 0.8998
French | 0.2386 | 0.9945 | 0.1347 | 0.9483 | 0.4045 | 0.9509 | 0.0131 | 0.9339
Italian | 0.2233 | 0.9921 | 0.1636 | 0.9476 | 0.3489 | 0.9537 | 0.0127 | 0.8856
Portuguese | 0.2246 | 0.9924 | 0.1845 | 0.9620 | 0.3532 | 0.9484 | 0.0146 | 0.9106
Romanian | 0.2312 | 0.9933 | 0.1568 | 0.9589 | 0.3823 | 0.9384 | 0.0138 | 0.8820
Spanish | 0.2320 | 0.9919 | 0.1580 | 0.9619 | 0.3565 | 0.9581 | 0.0130 | 0.9047
Danish | 0.2417 | 0.9945 | 0.1694 | 0.9574 | 0.3961 | 0.9551 | 0.0163 | 0.9257
English | 0.2364 | 0.9925 | 0.1365 | 0.9509 | 0.3962 | 0.9483 | 0.0128 | 0.8916
Finnish | 0.1702 | 0.9904 | 0.2067 | 0.9621 | 0.3029 | 0.9464 | 0.0107 | 0.9131
German | 0.2142 | 0.9938 | 0.1731 | 0.9637 | 0.3511 | 0.9555 | 0.0130 | 0.9325
Icelandic | 0.2315 | 0.9937 | 0.1805 | 0.9600 | 0.3672 | 0.9527 | 0.0154 | 0.9296
Norwegian | 0.2460 | 0.9956 | 0.1305 | 0.9581 | 0.5018 | 0.9621 | 0.0162 | 0.9626
Swedish | 0.2371 | 0.9918 | 0.1277 | 0.9218 | 0.5041 | 0.9499 | 0.0154 | 0.9423
Bulgarian | 0.2271 | 0.9926 | 0.1809 | 0.9590 | 0.3861 | 0.9482 | 0.0159 | 0.9203
Czech | 0.2223 | 0.9927 | 0.2125 | 0.9496 | 0.3879 | 0.9282 | 0.0184 | 0.9034
Croatian | 0.2287 | 0.9915 | 0.1825 | 0.9504 | 0.3853 | 0.9605 | 0.0161 | 0.9095
Polish | 0.1968 | 0.9939 | 0.2159 | 0.9650 | 0.3768 | 0.9245 | 0.0160 | 0.9049
Russian | 0.2148 | 0.9889 | 0.2397 | 0.9712 | 0.2566 | 0.9274 | 0.0132 | 0.8728
Serbian | 0.2370 | 0.9925 | 0.1745 | 0.9513 | 0.4154 | 0.9436 | 0.0172 | 0.9111
Slovak | 0.2149 | 0.9911 | 0.1973 | 0.9532 | 0.4085 | 0.9544 | 0.0173 | 0.9092
Ukrainian | 0.2181 | 0.9893 | 0.2122 | 0.9730 | 0.3556 | 0.9448 | 0.0166 | 0.9545
Estonian | 0.2054 | 0.9912 | 0.1881 | 0.9559 | 0.3342 | 0.9467 | 0.0129 | 0.8995
Hungarian | 0.1882 | 0.9885 | 0.2412 | 0.9719 | 0.2632 | 0.9482 | 0.0120 | 0.9282
Albanian | 0.2458 | 0.9896 | 0.1573 | 0.9607 | 0.3040 | 0.9582 | 0.0117 | 0.9106
Armenian | 0.2140 | 0.9753 | 0.1802 | 0.9699 | 0.3698 | 0.9635 | 0.0142 | 0.8868
Welsh | 0.2482 | 0.9953 | 0.1734 | 0.9818 | 0.2543 | 0.9493 | 0.0109 | 0.9336
Basque | 0.1614 | 0.9939 | 0.2045 | 0.9673 | 0.2925 | 0.9506 | 0.0097 | 0.9210
Hebrew | 0.2380 | 0.9945 | 0.1784 | 0.9615 | 0.4869 | 0.9635 | 0.0206 | 0.9144
Cebuano | 0.2149 | 0.9983 | 0.1145 | 0.9465 | 0.5491 | 0.9578 | 0.0136 | 0.9670
Tagalog | 0.2072 | 0.9957 | 0.1281 | 0.9555 | 0.4879 | 0.9363 | 0.0130 | 0.9411
Chichewa | 0.1649 | 0.9964 | 0.1685 | 0.9420 | 0.4733 | 0.9596 | 0.0132 | 0.9381
Luganda | 0.1610 | 0.9951 | 0.1797 | 0.9488 | 0.4314 | 0.9501 | 0.0125 | 0.9235
Somali | 0.1876 | 0.9965 | 0.1628 | 0.9300 | 0.3505 | 0.9399 | 0.0107 | 0.8773
Haitian | 0.2972 | 0.9959 | 0.1571 | 0.9672 | 0.4338 | 0.9567 | 0.0203 | 0.9288
Nahuatl | 0.1489 | 0.9955 | 0.1593 | 0.9304 | 0.4759 | 0.9582 | 0.0114 | 0.9435
Overall | 0.2161 ± 0.0296 | 0.9925 ± 0.0038 | 0.1750 ± 0.0308 | 0.9558 ± 0.0131 | 0.3795 ± 0.0755 | 0.9492 ± 0.0100 | 0.0140 ± 0.0027 | 0.9149 ± 0.0252
Table 3. The mean and standard deviation of the signal-to-noise ratio Γ (dB) in the indicated channel. The probability density function of each channel is modeled as Gaussian.
Channel | Mean ± Standard Deviation of Γ (dB)
Characters-to-Words | 18.60 ± 2.00
Words-to-Interpunctions | 10.42 ± 1.38
Interpunctions-to-Sentences | 9.66 ± 0.89
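As a rough consistency check, the mean values of Table 3 can be approximated from the overall correlation coefficients of Table 2, assuming (as the variance decomposition of Figure 3 and the theoretical curve of Figure 10b suggest) that a single channel obeys Γ = r^2/(1 − r^2), with Γ (dB) = 10·log10 Γ. The Python sketch below illustrates this assumed relationship; it does not reproduce the paper's own computation.

```python
import math

# Overall correlation coefficients from Table 2 (mean over the 37 texts).
r_overall = {
    "characters-to-words": 0.9925,
    "words-to-interpunctions": 0.9558,
    "interpunctions-to-sentences": 0.9492,
}

for channel, r in r_overall.items():
    # Assumed single-channel relationship Gamma = r^2 / (1 - r^2),
    # i.e., the ratio of the two variances shown in Figure 3.
    gamma = r**2 / (1.0 - r**2)
    gamma_db = 10.0 * math.log10(gamma)
    print(f"{channel}: Gamma ≈ {gamma_db:.1f} dB")
```

Running it gives approximately 18.2, 10.2, and 9.6 dB, close to the 18.60, 10.42, and 9.66 dB of Table 3; small differences are expected because Table 3 averages Γ over the individual texts rather than evaluating it at the overall mean r.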
Table 4. The mean and standard deviation of the signal-to-noise ratio Γ (dB) in the indicated cross channels. The probability density function of each channel is modeled as Gaussian.
Channel | Mean Γ (dB) | Standard Deviation (dB)
Words-to-Words | 18.93 | 9.21
Interpunctions-to-Interpunctions | 15.60 | 7.99
Sentences-to-Sentences | 14.94 | 8.08