Readability Indices Do Not Say It All on a Text Readability

Abstract: We propose a universal readability index, $G_U$, applicable to any alphabetical language and related to cognitive psychology, the theory of communication, phonics and linguistics. This index also considers readers' short-term memory processing capacity, here modeled by the word interval $I_P$, namely, the number of words between two interpunctions. No current readability formula considers $I_P$, but scatterplots of $I_P$ versus a readability index show that texts with the same readability index can have very different values of $I_P$, ranging from 4 to 9, practically Miller's range, which refers to 95% of readers. It is unlikely that $I_P$ has no impact on reading difficulty. The examples shown are taken from Italian and English literature, and from translations of the New Testament into Latin and contemporary languages. We also propose an extremely compact formula relating the capacity of human short-term memory to the difficulty of reading a text. It should synthetically model human reading difficulty, a kind of "footprint" of humans. However, further experimental and multidisciplinary work is necessary to confirm our conjecture about the dependence of a readability index on a reader's short-term memory.


Introduction
First developed in the United States [1-9], readability formulae are applicable to any alphabetical language. They are based on the length of words and sentences, and therefore allow different texts to be compared automatically and objectively, to assess the difficulty that readers may find in reading them. From the writer's point of view, a readability formula allows the best possible match between readers and texts to be designed. Many readability formulae have been proposed for English [6], but only a few for a handful of other languages [10].
In Reference [11] we defined a global readability formula applicable to any alphabetical language, based on a calque of the readability formula used in Italian [12], both to provide a formula for languages that have none and to estimate, on common grounds, the readability of texts belonging to different languages or translations. In fact, because an "absolute" readability formula (i.e., a formula that provides numerical indices related to a universal origin, such as "zero") might not exist at all, the readability formula proposed in Reference [11] can be used to compare different texts, because what counts in this comparison is the difference between numerical values. In other words, differences give more insight than absolute values for the purpose of comparing texts [11].
As the title of this article claims, however, no current readability formula says everything about a text's readability, because it neglects the response of readers' short-term memory to the partial stimuli contained in a sentence, i.e., to how the words of a sentence are punctuated, a process described by the word interval $I_P$ [13]. All readability formulae neglect, in fact, the empirical connection between the short-term memory capacity of

A Readability Formula for Alphabetical Languages
The observation that differences are more important than absolute values in using readability formulae [13] justifies the development of a readability formula that can be used to compare texts, even those written in different languages [15]. For most languages, in fact, no readability formula has been defined, and only a few languages adapt English formulae to their texts [10,18]. The proposed formula, of course, does not exclude the use of other readability formulae specifically devised for a language (e.g., the large choice available for English [4,6]), but it allows the readability of texts written in any language, and in translation, to be compared on the same ground.
For this purpose, we proposed in Reference [11] to adopt, as a reference, the readability formula developed for Italian, known by the acronym GULPEASE [12]:

$$G = 89 - 10\,C_P + \frac{300}{P_F} \qquad (1)$$

In Equation (1), $C_P$ is the number of characters per word, and $P_F$ is the number of words per sentence. Notice that, like all readability formulae, Equation (1) does not contain any reference to interpunctions (besides, of course, full stops, question marks and exclamation marks, which determine the length of sentences), and therefore it does not consider the parameter very likely linked to short-term memory capacity, namely the word interval $I_P$ [13].
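As an illustration only, Equation (1) can be sketched in Python. The tokenization rules below (words as runs of letters, sentences ending at full stops, question marks and exclamation marks) and the function names are assumptions made for this sketch, not the procedure used in the cited references.

```python
import re

def deep_language_parameters(text):
    """Return (C_P, P_F): characters per word and words per sentence.

    Assumption: a word is a run of letters (apostrophes allowed inside),
    and a sentence ends at '.', '?' or '!'.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Count letters only, so apostrophes do not inflate word length.
    c_p = sum(len(w.replace("'", "")) for w in words) / len(words)
    p_f = len(words) / len(sentences)
    return c_p, p_f

def gulpease(c_p, p_f):
    """Equation (1): G = 89 - 10*C_P + 300/P_F."""
    return 89 - 10 * c_p + 300 / p_f

# Average Italian word length (4.48) with 20-word sentences.
print(round(gulpease(4.48, 20), 1))  # 59.2
```

Halving the sentence length in this example raises the term $300/P_F$ from 15 to 30, which shows how strongly $P_F$ drives the index compared with $C_P$.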
G can be interpreted as a readability index by considering the number of years of school attended in Italy's school system (see Reference [12]), as shown in Figure 1. The larger G, the more readable the text for any number of school years.
The continuous lines shown in Figure 1 divide the quadrant into areas of the same performance of texts, such as "almost unintelligible", "very difficult", etc. For example, the area labelled "easy" indicates all combinations of values of G and school years required to declare a text "easy" to read. In all cases, it is shown that, as the number of school years of the reader increases, the readability index he/she can tolerate decreases.
In Reference [11] we have shown, for Italian literature, that the term $10\,C_P$ varies very little from text to text and across seven centuries, while the term $300/P_F$ varies considerably and, in practice, determines the value of the readability index.
Equation (1) says that a text is more difficult to read if $P_F$ is large, i.e., if sentences are long, and if $C_P$ is large, i.e., if words are long. In other words, a text is easier to read if it contains short words and short sentences, a result that is predicted by any known readability formula and should be true, of course, in any language.

Figure 1. Readability index $G$ (see Reference [12]) as a function of the number of school years attended in Italy. The continuous lines divide the quadrant into areas of the same performance of texts. Elementary school lasts 5 years, junior high school lasts 3 years, and high school lasts 5 years. Children stay at school till they are 19 years old. For comparison, the green vertical axis on the right refers to the Flesch Reading Ease index.
In Reference [11], we proposed adopting Equation (1) for other languages as well, such as those listed in Table 1, by scaling the constant 10 according to the ratio between the average number of characters per word in Italian, $\langle C_{P,ITA} \rangle = 4.48$, and the average number of characters per word in another language, e.g., $\langle C_{P,ENG} \rangle = 4.24$ for English. The rationale for this choice is that $C_P$ is a parameter typical of a language which, if not scaled, would bias $G$ without really quantifying the change in reading difficulty for readers, who are surely accustomed to reading, in their own language, words that are on average shorter or longer than those found in Italian. This scaling therefore avoids changing $G$ merely because a language has, on average, shorter or longer words than Italian. In any case, as recalled above, $C_P$ affects a readability formula much less than $P_F$ [13].

Table 1. Values of $C_P$ and $k$ of Equations (2) and (3) in the New Testament texts in the indicated languages. Languages are listed according to their language family (see Reference [11]).

Language | Language Family | $C_P$ | $k$

On the other hand, we have maintained the constant 300 because $P_F$ depends significantly on the author's style [13,15], not on the language. Finally, notice that the constant 89 just sets the absolute ordinate scale, and therefore it has no impact on comparisons [13].
In conclusion, in Reference [11] we defined a global readability index applicable to texts written in a given language as:

$$G = 89 - 10\,k\,C_P + \frac{300}{P_F} \qquad (2)$$

$$k = \frac{\langle C_{P,ITA} \rangle}{\langle C_{P,\mathrm{lang}} \rangle} \qquad (3)$$

where $\langle C_{P,\mathrm{lang}} \rangle$ is the average number of characters per word in the language considered. By using Equations (2) and (3), we force the average value of $10 \times C_P$ of any language to be equal to that found in Italian, namely $10 \times 4.48$. Table 1 reports, for Greek, Latin and 35 contemporary languages, the average values of $C_P$ [11] and the calculated values of the constant $k$ of Equation (3). For example, for English texts, $C_P$ of a sample text is multiplied by 10.6 instead of 10; for Nahuatl (longer words), $C_P$ is multiplied by 6.7, and for Haitian (shorter words) by 13.3.
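The scaling described above can be sketched as follows. This is a minimal illustration, assuming that the scaling constant is the ratio of the Italian average word length to that of the target language, as the text describes; the function names are hypothetical.

```python
C_P_ITALIAN = 4.48  # average characters per word in Italian (value quoted in the text)

def scaling_constant(c_p_language_avg):
    """Equation (3): k = <C_P,ITA> / <C_P,lang>."""
    return C_P_ITALIAN / c_p_language_avg

def global_readability(c_p, p_f, c_p_language_avg):
    """Equation (2): G = 89 - 10*k*C_P + 300/P_F."""
    k = scaling_constant(c_p_language_avg)
    return 89 - 10 * k * c_p + 300 / p_f

# For English (<C_P,ENG> = 4.24), C_P is multiplied by about 10.6 instead of 10.
print(round(10 * scaling_constant(4.24), 1))  # 10.6
```

With this scaling, a text whose $C_P$ equals the average of its own language receives the same word-length term, $10 \times 4.48$, whatever the language.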
Notice that $k$ seems to be a stable factor. For example, in the sample of English literature studied in Reference [17], we found $\langle C_{P,ENG} \rangle = 4.23$ (instead of the 4.24 of Table 1). Because the value found in Italian literature [13] is $\langle C_{P,ITA} \rangle = 4.67$, we obtain $k = 4.67/4.23 = 1.10$, instead of the $k = 4.48/4.24 = 1.06$ of Table 1.
As recalled above, all readability formulae substantially tell the same story, and therefore they should be very similar and it is very likely that any one of them can be obtained from another. We illustrate this fact with an example.
Because English has more readability formulae than any other language, let us compare $G$ to the most classical English readability formula, proposed and amply discussed by Flesch [1,2] and known as the Flesch Reading Ease (RE) formula:

$$RE = 206.835 - 1.015\,w - 84.6\,s \qquad (4)$$

In Equation (4), $w$ is the average number of words per sentence, and $s$ is the average number of syllables per word. Because the number of characters per word is, on average, proportional to the number of syllables per word, the parameter $s$ parallels $C_P$ and, of course, $w = P_F$.
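For comparison, the standard Flesch Reading Ease formula can be sketched next to the Italian-derived index. The input values used below are arbitrary illustrations, not data from the cited corpora, and the function names are hypothetical.

```python
def flesch_reading_ease(w, s):
    """Flesch Reading Ease: RE = 206.835 - 1.015*w - 84.6*s.

    w: average words per sentence; s: average syllables per word.
    """
    return 206.835 - 1.015 * w - 84.6 * s

def gulpease(c_p, p_f):
    """Equation (1): G = 89 - 10*C_P + 300/P_F."""
    return 89 - 10 * c_p + 300 / p_f

# Both indices fall as sentences get longer (w = P_F), with word length held fixed.
for words_per_sentence in (10, 20, 30):
    re_value = flesch_reading_ease(words_per_sentence, 1.5)
    g_value = gulpease(4.5, words_per_sentence)
    print(words_per_sentence, round(re_value, 2), round(g_value, 2))
```

The two indices depend on sentence length in different ways (linearly in RE, inversely in $G$), so a monotonic but not strictly linear relationship between them is to be expected.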
How Equation (4) quantifies the degree of difficulty was defined by Flesch himself [1,2], and its values are reported on the right ordinate scale of Figure 1, for comparison with $G$ (left ordinate scale). Figure 2 shows the scatterplot of the values calculated with the global readability index $G$, Equation (2), versus those calculated with RE, Equation (4), according to WinWord, for novels from English literature [17] (Table 2). We can notice a fair agreement between the two indices, with a correlation coefficient of 0.850. The bias could be compensated by downscaling RE.

Figure 2. Scatterplot between $G$ and RE for the English novels listed in Table 2. Robinson Crusoe, cyan "o"; Pride and Prejudice, black "o"; Vanity Fair, blue "o"; Alice's Adventures in Wonderland, magenta "o"; Treasure Island, green "o"; Adventures of Huckleberry Finn, red "+"; Peter Pan, blue "+"; The Sun Also Rises, green "+"; A Farewell to Arms, black "+".

Table 2. Novels from English literature. Deep-language parameters $C_P$, $P_F$, $I_P$, $G$ and universal readability index $G_U$, the latter discussed in Section 4. Novels are listed according to the year of publication.

Literary Work
The attribution of the grade level GL in the USA school system was defined by Kincaid et al. [3], by using the same parameters w and s. The grade level is similar to that attributed to G.
Another readability formula, the Automated Readability Index (ARI), was also defined by Kincaid et al. for specific military documents [3]. It is fully related to $G$ because it depends on the same parameters, $C_P$ and $P_F$ (Equation (5)). As ARI increases, the required age of readers increases too. Figure 3 shows the scatterplot between the global index $G$, Equation (2), and ARI, for the same English novels considered in Figure 2. We can see a very tight relationship for fixed $C_P$.
In conclusion, the global readability formula, Equation (2), provides a readability index that can be directly scaled to ARI and approximately also to RE. For this reason, we continue studying $G$, which we will modify by introducing the word interval $I_P$ to obtain the universal readability formula/index mentioned above. To do so, we need to recall, in the next section, some fundamental knowledge on $I_P$.

Word Interval and Short-Term Memory
As we have discussed in References [11,13,15], the word interval $I_P$, namely the number of words per interpunction, varies in the same range as the short-term memory capacity given by Miller's 7 ± 2 law [14], a range that includes 95% of all cases. Very likely the two ranges are deeply related, because interpunctions organize small portions of the more complex arguments that make up a sentence into short chunks of text, which are the natural input to short-term memory [19][20][21][22][23][24][25][26][27]. Moreover, $I_P$, drawn against the number of words per sentence, $P_F$, tends to approach a horizontal asymptote as $P_F$ increases, and this occurs both in ancient classical languages (Greek and Latin) and in contemporary languages, as shown in References [11,13] by studying translations of the New Testament books from Greek. In other words, even if sentences get longer, $I_P$ cannot get larger than about the upper limit of Miller's law (namely 9), because of the constraints imposed by the short-term memory capacity of readers and of writers as well.
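To make the definition concrete, the word interval of a passage can be estimated with a short sketch. The set of interpunctions used here (. , ; : ! ?) and the choice to count a run of adjacent marks as a single interpunction are assumptions made for illustration; the cited references may count punctuation differently.

```python
import re

def word_interval(text):
    """Estimate I_P: number of words divided by number of interpunctions.

    Assumptions: interpunctions are . , ; : ! ? and a run of adjacent
    marks (e.g. '?!') counts once; words are runs of letters.
    """
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
    marks = re.findall(r"[.,;:!?]+", text)
    if not marks:
        raise ValueError("text contains no interpunctions")
    return len(words) / len(marks)

sample = "When the rain stopped, we walked home; it was late."
print(round(word_interval(sample), 2))  # 3.33
```

The sample sentence has 10 words and 3 interpunctions, so its word interval is well below its sentence length of 10 words, as expected when a sentence is broken into several punctuated chunks.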
The average value of $I_P$ can be empirically related to the average value of $P_F$ according to the non-linear relationship given in Reference [13], Equation (6), where $I_{P\infty}$ gives the horizontal asymptote, and $P_{Fo}$ gives the value of $\langle P_F \rangle$ at which the exponential falls to $1/e$ of its maximum value. Equation (6) is a good average mathematical model for Italian literature [13] and also for Greek, Latin and contemporary languages [11,15]. Reference [11] reports the values of $I_{P\infty}$ and $P_{Fo}$ for each language considered.
We have now carried out the same analysis as for the large corpus of Italian literature [13] on a smaller but useful corpus of English literature recently studied in Reference [17], and have calculated the best-fit values of Equation (6). Figure 4 shows the scatterplot of $I_P$ versus $P_F$ (values calculated for each chapter) and the best-fit curve, with $I_{P\infty} = 6.70$ and $P_{Fo} = 6.78$, to be compared with $I_{P\infty} = 7.37$ and $P_{Fo} = 10.22$ for Italian literature, whose curve is also drawn.
Notice that the constants for English literature differ from those reported in Reference [17] ($I_{P\infty} = 6.57$, $P_{Fo} = 4.16$) for the same literary corpus, because the latter resulted from fitting Equation (6) to the average values of $I_P$ and $P_F$, not to the values obtained from the samples (one sample per chapter), which give the scatterplot drawn in Figure 4. The different values are due, of course, to the non-linear best fit.
Now, as we have recalled in Section 2, any readability index is practically a function only of $P_F$. Readability formulae do not consider $I_P$, but the scatterplots of $I_P$ versus $G$ tell an interesting story: texts with the same $G$ do not show the same $I_P$. In other words, according to the theory of readability formulae, a text with a given index should be readable with the same effort both by readers who display a powerful short-term memory processing capacity (large $I_P$) and by readers who do not (small $I_P$). For example, for $G = 60$ ("easy/standard" texts for readers with 8 years of school, Figure 1), Figure 5 shows that $I_P$ can vary from 4 to 9. This is practically Miller's range, which refers to 95% of readers [14]. We think that these readers should be distinguished, and therefore our aim is to propose, in the next section, a possible "universal" readability index, $G_U$, based on $G$, which includes $I_P$.

A Universal Readability Formula
We suppose that the global readability index $G$ should be modified by introducing a function that depends linearly on $I_P$. Our hypothesis is based on Miller's law, which quantifies linearly the processing capacity of short-term memory. Moreover, the function should not change the global value for a reader with an "average" short-term memory processing capacity. For words, this average is not 7, but about 6 [1,28]; therefore, in the following we assume this latter value. Notice that 6.03 is the average value of $I_P$ (standard deviation 1.11) of the data listed in Table 2 of Reference [11], a further indication of its barycentric value. We write our proposed universal readability formula as Equation (7), where $G$ is given by Equation (2). We assume that the numerical value of the discrete derivative $\Delta G / \Delta I_P$ is given by Equation (8), in which the numerical values are the maximum and minimum averages found in Italian literature (see Reference [13]). The oldest of these texts (seven centuries old, e.g., Boccaccio's Decameron) are still read today in Italian high schools with reasonable effort, a possibility not available in other Western languages.
From [13], we calculate the value of this derivative (Equation (9)). Therefore, the proposed universal readability formula is given by:

$$G_U = G - 6\,(I_P - 6) \qquad (10)$$

Equation (10) sets $G_U = G$ for $I_P = 6$, $G_U < G$ for $I_P > 6$ and $G_U > G$ for $I_P < 6$. In other words, if a text with a given $G$ has a small word interval $I_P$, then it should be read more easily than a text with the same $G$ but a larger $I_P$. For example, within Miller's range, texts with $G = 60$ would be transformed to $G_U = 66$ for $I_P = 5$ and to $G_U = 42$ for $I_P = 9$. In the first case, the text considered "easy" after 8 years of school (Figure 1) is considered "easy" to read after only 7.2 years of school; in the second case, the text would be considered "easy" only after about 13.2 years of school. The difference between the two indices is therefore very large: 66 − 42 = 24, corresponding to 13.2 − 7.2 = 6 years of school. This significant difference would be lost in the original formula of Equation (2), or in any other readability formula.

Figure 6 shows the scatterplots between $G_U$ and $I_P$ (blue circles) for the samples of the literary texts considered in Italian literature [13] and in English literature (Table 2). Compared to the scatterplots of Figure 5 (redrawn in Figure 6 with red circles), the difference between $G_U$ and $G$ is evident: the linear dependence of $G_U$ on $I_P$, according to Equation (10), spreads the values around a line and introduces significant correlation coefficients, −0.9016 for Italian and −0.7730 for English. The regression line:

$$G_U = a\,I_P + b \qquad (11)$$

is very similar in the two languages (Equation (12) for Italian); for English it is:

$$G_{U,ENG} = -8.88\,I_P + 111.64 \qquad (13)$$

This result indicates that Equation (11) might be "universal". Finally, some specific examples concerning novels taken from Italian and English literature will further illustrate the relationship between $G$ and $G_U$.

Analytics 2023, 2
Table 3 shows how the readability index is modified from $G$ to $G_U$ for some Italian novels written from the XIV to the XX century [13]. For example, it is interesting to notice how $G$ is transformed into $G_U$ for the two novels written by Alessandro Manzoni.

Alessandro Manzoni (Milan 1785 - Milan 1873), one of the most studied Italian novelists in Italian high schools (Licei) and universities, in 1827 published Fermo e Lucia (Fermo and Lucia), a text that scholars of Italian literature, and Manzoni himself, consider the "first" version of his masterpiece I Promessi Sposi (The Betrothed, available in a new English translation [29]), published in the years 1840-1842. According to scholars of Italian literature [30][31][32][33], the two versions differ very much, both in story structure and characters and, as far as we are concerned here, also in style and language; therefore, it is interesting to see how much the author transformed (mathematically) Fermo e Lucia into I Promessi Sposi, a study partially carried out in References [13,15].

Table 3. Novels from Italian literature [13]. Average deep-language parameters $C_P$, $P_F$, $I_P$ and $G$, and corresponding universal readability index $G_U$. Novels are listed according to the alphabetical order of the author's name.

Novel
As far as readability is concerned, from Table 3 we notice a large improvement in I Promessi Sposi compared to Fermo e Lucia, if differences are considered. In fact, $G = 51.72$ in Fermo e Lucia and $G = 56.00$ in I Promessi Sposi, a difference of only 4.28 units, leading to a decrease in school years (for "easy" reading, Figure 1) of only about 0.8 years. This difference does not account for the difference in reading difficulty between the two texts discussed by scholars of Italian literature [30][31][32][33]. However, if we consider $G_U$, then the difference is quite large, very likely measuring the relative reading difficulty, because $G_U$ ranges from 44.70 to 60.20, a difference of 15.5 units, leading to a decrease in school years (for "easy" reading, Figure 1) from 11.8 (Fermo e Lucia) to only 8 (I Promessi Sposi), in agreement with the assessment of scholars of Italian literature [30][31][32][33]. In conclusion, $G_U$ is a better estimate than $G$ in assessing the difference in reading difficulty between these two much-studied novels. Table 2 also shows how the readability index is modified from $G$ to $G_U$ in some English novels.
As we can read from Table 2, in Robinson Crusoe the readability index decreases from 50.84 to 42.22, therefore passing from about 10.3 to 12.4 years of school for "easy" reading (Figure 1). For Hemingway's novels, according to $G_U$, The Sun Also Rises is more readable (72.45) than A Farewell to Arms (66.99); the order given by $G$, i.e., 72.58 and 73.17, respectively, is reversed, therefore reducing the number of years of school required for "easy" reading by 1 (Figure 1). The Hound of the Baskervilles changes its readability index from 60.27 to 46.16, therefore passing from 8 to 11.5 years of school for "easy" reading (Figure 1).
In conclusion, by introducing the word interval $I_P$ in the definition of a readability index, as in Equation (10), readability differences in texts are more "fine-tuned" for readers.
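The worked figures quoted in this section can be checked with a minimal sketch. It assumes that Equation (10) has the linear form $G_U = G - 6\,(I_P - 6)$, which is the form that reproduces the examples given in the text ($G = 60$ becomes 66 at $I_P = 5$ and 42 at $I_P = 9$); the function name and default arguments are hypothetical.

```python
def universal_readability(g, i_p, slope=6.0, i_p_ref=6.0):
    """Sketch of Equation (10): G_U = G - slope * (I_P - i_p_ref).

    Assumption: slope = 6 index units per word and reference interval
    i_p_ref = 6, as inferred from the worked examples in the text.
    """
    return g - slope * (i_p - i_p_ref)

# A text with G = 60 across Miller's range of word intervals.
for i_p in (5, 6, 9):
    print(i_p, universal_readability(60, i_p))
```

At the reference interval $I_P = 6$ the index is unchanged, shorter intervals raise it, and longer intervals lower it, which is the behaviour described for Equation (10).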

A "Footprint" of Humans
As already recalled, in Reference [11] we studied the translation of the New Testament from Greek into Latin and contemporary languages. For all these translations, we have recently calculated the scatterplots between $G$ and $I_P$, and between $G_U$ and $I_P$, with results very similar to those shown in Figure 6. Some specific examples are reported in Appendix A. Similarly, we have calculated the linear best fit between $G_U$ and $I_P$. Appendix B lists the values of the constants $a$ and $b$ of Equation (11) [...] (Table 2). In the first case, the difference in the readability of the two novels is 19.71, and in the second case it is 17.49, which implies an "error" of about 0.25 years of school (Figure 1).
It may be interesting to consider the most compact relationship between $G_U$ and $I_P$, given by the overall average values of the constants reported in Appendix B (Equation (14)). Figure 7 shows this average relationship together with ±1 standard deviation bounds. These extremely compact curves can synthetically represent how the capacity of human short-term memory (modelled by $I_P$) is related to the difficulty of reading a text in any alphabetical language; therefore, they may be considered a kind of "footprint" of humans.
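The linear best fit between $G_U$ and $I_P$ can be sketched with an ordinary least-squares fit. The data points below are synthetic: they are generated exactly on the English regression line of Equation (13) purely to show that the procedure recovers the slope and intercept, and they are not samples from any corpus. The function name is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit y = a*x + b; returns (a, b, r) with r the correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return a, b, r

# Synthetic word intervals spanning Miller's range, placed on Equation (13).
word_intervals = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
g_u_values = [-8.88 * i_p + 111.64 for i_p in word_intervals]

a, b, r = fit_line(word_intervals, g_u_values)
print(round(a, 2), round(b, 2), round(r, 2))
```

With real samples the points scatter around the line, so the correlation coefficient would be smaller in magnitude than in this noise-free illustration.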

Conclusions
We have proposed a universal readability index, $G_U$, Equation (10). Compared to current readability indices, this index also considers readers' short-term memory processing capacity, here described by the word interval $I_P$, namely, the number of words between two interpunctions. The observation that differences give more insight than absolute values has justified, we think, the development of a universal readability formula that is useful for comparing texts, even those written in different languages, and that is applicable to alphabetical languages and related to cognitive psychology, the theory of communication, phonics and linguistics.
Scholars have never considered including the word interval $I_P$ in current readability formulae, but the scatterplots of $I_P$ versus any readability index show that texts with the same readability index can have very different values of $I_P$. Now, it is unlikely that $I_P$ has no impact on reading difficulty. By introducing $I_P$ in the definition of a readability index, readability differences in texts are better "fine-tuned" for readers, e.g., with their school years as a reference. We have used the global readability index derived from the Italian formula [11], after showing that Flesch's index and ARI are connected to this index because they depend on the same variables.
We have calculated an extremely compact formula, Equation (14), which can measure how the capacity of human short-term memory (modelled by $I_P$) is likely related to the difficulty of reading a text, measured by the universal readability index $G_U$ defined here. We think that it synthetically models human reading difficulty, i.e., it might be considered a "footprint" of humans.
However, there is an important aspect to be considered. Because, as far as we know, there are no direct experiments on the relationship between readability and short-term memory capacity, the universal index here proposed, Equation (10), should be considered a first step in researching this important relationship. Therefore, further work needs to be carried out by a multidisciplinary team of researchers to fully validate Equation (10).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Scatterplots between $G_U$ and $I_P$ for selected languages. We show the scatterplots between $G$ and $I_P$ (red circles), and between $G_U$ and $I_P$ (blue circles), for some selected languages. For each language shown, the correlation coefficient between $G_U$ and $I_P$ is reported in Table A1.