Article

Word Length in Political Public Speaking: Distribution and Time Evolution

by Natalia L. Tsizhmovska 1 and Leonid M. Martyushev 1,2,*
1 Technical Physics Department, Ural Federal University, 19 Mira St., 620002 Ekaterinburg, Russia
2 Institute of Industrial Ecology, Russian Academy of Sciences, 20 S. Kovalevskaya St., 620219 Ekaterinburg, Russia
* Author to whom correspondence should be addressed.
Entropy 2024, 26(3), 180; https://doi.org/10.3390/e26030180
Submission received: 25 December 2023 / Revised: 14 February 2024 / Accepted: 19 February 2024 / Published: 21 February 2024

Abstract

In this paper, word length in the texts of public speeches by USA and UK politicians is analyzed. More than 300 speeches delivered over the past two hundred years were studied. It is found that the lognormal distribution better describes the distribution of word length than do the Weibull and Poisson distributions, for example. It is shown that the length of words does not change significantly over time (the average value either does not change or slightly decreases, and the mode slightly increases). These results are fundamentally different from those obtained previously for sentence lengths and indicate that, in terms of quantitative linguistic analysis, the word length in politicians’ speech has not evolved over the last 200 years and does not obey the principle of least effort proposed by G. Zipf.

1. Introduction

Natural languages have been one of the most important subjects of science for a long time. However, the interest in language research has flared up with renewed vigor in recent times due to the development of computer technology, general digitalization, development of artificial intelligence systems, etc. One of the scientific areas demonstrating a new impetus for development is quantitative linguistics. Here, quantitative statistical methods of analysis are used to study texts and speeches, which helps to gain a deeper understanding of the structure and evolution of language. By transforming linguistic objects of natural language into numbers according to certain rules, quantitative linguistics strives, on the basis of statistical analysis, to formulate general laws according to which language functions. There are a number of achievements along this path, three of which are as follows: (1) Zipf’s law: the frequency of words is inversely proportional to their rank in frequency lists. (2) Brevity law: the more frequently a word is used, the “shorter” that word tends to be. (3) Menzerath–Altmann law: the sizes of the constituents of a construction decrease with the increase in size of the construction under study, and vice versa [1,2].
It is well known [3,4] that homogeneous and representative datasets (collections of texts) should be used to detect linguistic laws, since even the genre of the text [5,6,7], not to mention the time of its writing, can influence the results obtained. An important dataset for study is the texts of politicians’ public speeches. These texts are well documented over an interval of sufficient length and “ought to be fairly consistent in genre and register” [8]. An important feature of political speeches is that for the most part they are carefully prepared for perception and effective impact on the average mass listener living in a certain time epoch. Therefore, despite the specific genre, political speeches are an excellent object for studying the laws and evolution of language.
The study of politicians’ public speeches using the methods of quantitative linguistics has been previously carried out in works [8,9,10,11,12]. In particular, [8] analyzes the inaugural speeches of USA presidents and notes that the average length of sentences has monotonically decreased over 200 years, and in total has decreased by approximately 50%. In [9], this conclusion for the USA was confirmed and also generalized to public speeches by members of UK parliamentary parties in the period from 1900 to 2000. It was found in [9] that the frequency distribution of sentence lengths during public speaking in the USA and the UK is best described not by the lognormal distribution, but by the Weibull distribution. A similar distribution and reduction in the length of sentences over time was associated in [9] with the principle of least effort proposed by G. Zipf [4,13,14]. The formulation of this principle is as follows: “… the person will strive to minimize the probable average rate of his work-expenditure (over time). And in so doing he will be minimizing his effort. … Least effort, therefore, is a variant of least work.”.
Will the length of words, rather than sentences, also obey the principle of least effort for the same public speeches of politicians, i.e., have Weibull distributions and decrease over time?
This is a very interesting and controversial issue. Indeed, according to Zipf’s law and the brevity law, the power function describes the distribution of word lengths [13,14,15] and considering the Weibull distribution is an unnecessary complication. However, a question arises regarding the validity of these laws when considering words of arbitrary length. For example, according to studies [15] conducted for English and Swedish, only words containing more than three letters follow this law and without a lower bound on word length, the distribution of word lengths should be more complex. Both discrete distributions (especially Poisson) [4,5] and continuous distributions (in particular, lognormal and gamma distributions) [13,15,16,17,18] have been previously proposed in the literature as possible options. Regarding the temporal behavior of average word length, research results are very contradictory [5]. Thus, according to the Menzerath–Altmann law, as the length of sentences calculated in words decreases, the length of the words should increase on average. From the studies of politicians’ public speeches [8,9,10], the sentence length, as mentioned above, decreases significantly over time. However, according to [8], the length of the word does not increase, but rather decreases slightly (by about 5% over the last century). Note that for other genres of text, there has been a slight increase in the average word length over several centuries for Arabic [7], as well as for English and Russian throughout the nineteenth and most of the twentieth century [11] (however, according to [11], since the end of the twentieth century, there has been a decrease in the average word length). Other studies of changes in average word length over time can be found in [5,11,19]; their results are extremely contradictory. Thus, as seen here, there is no answer to the question formulated above. The purpose of this paper is to attempt to answer it within quantitative linguistics.

2. Data for Analysis

The choice of public speeches of politicians in the USA and the UK was determined by the availability of speeches in digital archives, as well as by a desire for homogeneity of texts by genre and uniformity in distribution over time.
We analyzed text transcripts of 88 speeches by USA presidents from 1789 to 2021 (including 59 inaugural addresses), available in [20,21]. Six speeches were made in the 18th century, 51 speeches were made in the 19th century, 25 in the 20th century, and 6 in the 21st century. For the 18th and 19th centuries, speeches by USA presidents were both inaugural and other speeches. These speeches were made in 1789, 1793, 1796–1798, 1801, 1803, 1805, 1809, 1812, 1813, 1815, 1817, 1821, 1825, 1827, 1829, 1833, 1837–1839, 1841, 1842, 1845, 1846, 1848–1851, 1853, 1854, 1856, 1857, 1861, 1864, 1865, 1869, 1873, 1875–1877, 1879, 1881, 1885, 1889, 1893, 1895, 1897. For the 20th and 21st centuries, only inaugural addresses were used (except for 1968, where an additional speech by President Johnson was used); as a result, they are uniformly distributed every four years 1901–2021.
For the UK, transcripts of 247 speeches were analyzed for the period from 1808 to 2018. These speeches are available in [22,23]. Initially, only the speeches of UK party leaders were considered, but they only covered the period from 1895 to 2018. To extend the temporal coverage, speeches by queens and kings have been added, as well as some additional speeches by party members in Parliament. As a result, 19 speeches were analyzed for the 19th century (they belonged to 1808, 1814, 1814, 1815, 1817, 1819, 1827, 1830, 1837, 1842, 1853, 1868, 1877, 1893, 1895, 1896, 1897, 1897, 1899), 170 for the 20th century (1900–1913, 1917–1930, 1932–1937, 1941–1943, 1945–1951, 1955–1958, 1960–1999), and 58 speeches (2000–2018) for the 21st century.
The word length was determined by the number of letters from space to space. This approach is not universal, but it is widely used [19,24]. It is known that there is still no universal, generally accepted unit for measuring word length (number of syllables, number of letters, breath groups, etc.). The choice of such a unit is made on the basis of the problem statement [4,19,24]. At the moment, we cannot be completely sure that the method of measuring word length does not affect the results obtained. So, in [25], it is stated that: “There can be no a priori decision as to what a word is, or in what units word length can be measured. Meanwhile, in contemporary theories of science, linguistics is no exception to the rule: there is hardly any science which would not acknowledge, to one degree or another, that it has to define its object, first, and that constructive processes are at work in doing so. What has not yet been studied is whether there are particular dependencies between the results obtained on the basis of different measurement units; it goes without saying that, if they exist, they are highly likely to be language-specific.”.
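The space-to-space definition above can be illustrated with a minimal Python sketch. This is not the authors' program: the function name `word_lengths`, the decision to strip all non-alphanumeric characters (so apostrophes and hyphens are not counted), and the sample sentence are assumptions made only for illustration.

```python
def word_lengths(text):
    """Return the length of every space-delimited token in `text`.

    Assumption: only letters and digits are counted, so punctuation
    attached to a token (full stops, commas, quotes) does not add length.
    """
    lengths = []
    for token in text.split():
        # Keep letters and digits; drop punctuation attached to the token.
        core = "".join(ch for ch in token if ch.isalnum())
        if core:
            lengths.append(len(core))
    return lengths


# Made-up sample sentence, not taken from the analyzed speeches.
sample = "We the people of 1789, in order to form a more perfect union."
lengths = word_lengths(sample)
print(lengths)
```

Note that in this sketch a number written in digits, such as "1789", is counted by its digits, which may differ from the length of the spoken word it stands for.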
Word length calculation was performed automatically using the developed and tested computer program (see an example of such calculation in Figure 1).
As can be seen from Figure 1, the software we use counts digits, which can misrepresent the length of the spoken words that these digits stand for. We did not change the software algorithm, since handling digits correctly is a rather non-trivial computational task. More importantly, the error that these digits introduce into the results of the paper is negligible, because the share of digits in the total number of words in the studied political speeches is very small. In the speech, a fragment of which is presented in Figure 1, this share is 0.003, and the highest share in the studied speeches was about 0.004. The statistical quantities studied below are not sensitive to this influence. This software was previously used to calculate sentence lengths as well [9]. Therefore, the dots in the text are saved after processing, as the fragment in Figure 1 demonstrates. These dots have no effect on word length processing.
The average speech length in words was approximately 2800 for the USA and 5100 for the UK. As can be seen from Figure 2, in the vast majority of cases the number of words for each speech is more than 1000. This is a very good sample size for a reliable analysis. The difference in the average number of words for the USA and the UK is due to political tradition and historical reasons. This difference is not the subject of this article and does not affect the results presented below.
In further analysis, word length will be treated as a random variable. This term is used in the sense of mathematical statistics. One may wonder whether such an assumption can be used when the texts of political speeches are usually carefully prepared. The answer is yes, for two reasons. The first reason is that when preparing speeches, one does not consider the length of the words, but primarily their content and meaning. The second reason is that even if writers of political speeches paid special attention to the length of the words used (and, for example, considered the Menzerath–Altmann law or various readability indexes), many other controlled and uncontrolled factors, such as political, cultural, historical, psychological, etc., would have various multidirectional effects on the choice of words and therefore, from the point of view of mathematical statistics, word length would be random.
Statistical analysis was carried out in widespread professional commercial software MATLAB R2020 (The MathWorks). Distribution parameters were calculated using the Levenberg–Marquardt algorithm implemented in MATLAB. The commercial software Statistica 12.0 (TIBCO Software) was used to visualize the results and to validate and verify several calculations. The data (year of the speech and values corresponding to the processed word lengths of speeches) are open access [26].

3. Analysis of the Word Length Distribution Law

To analyze the distribution law, 82 speeches by USA presidents and 245 speeches from the UK were used. Speeches with presumed multimodality were excluded: those of 1793, 1793, 1829, 1833, 1846, 1849 and 2005 for the USA, and two speeches, of 1814 and 1977, for the UK. Five continuous distributions with no more than two parameters were used for the analysis: lognormal, Weibull, Rayleigh, folded normal and half-normal (normal). Weibull and lognormal distributions are traditionally used in the study of word lengths, as discussed in the Introduction. The Rayleigh distribution is very closely related to the Weibull distribution, and it was important to check whether word lengths are described by this simpler one-parameter distribution. Folded normal and half-normal distributions are directly related to the normal distribution, which is the most basic distribution in mathematical statistics. Agreement with the normal distribution is usually checked first, before proceeding to more complex distributions.
The ranking of these distributions by the quality of empirical data description was carried out according to the coefficient of determination: the closer the coefficient of determination was to 1, the higher the quality of distribution was (the highest is the first place, the lowest (worst) is the fifth place).
Table 1 and Table 2 summarize the calculation results presented in detail in Appendix A. The tables show that the best distribution is the lognormal distribution: it took first place for the overwhelming majority of speeches (in 58 cases out of 82 for the USA and in 230 cases out of 245 for the UK). Second place, by a wide margin, goes to the Weibull distribution, which was best in only 24 cases (out of 82) for the USA and 15 cases (out of 245) for the UK. Note that the coefficients of determination for these two distributions differ only slightly, but the difference is statistically significant. The average value of the coefficient of determination was 0.998 for the lognormal distribution and 0.995 for the Weibull distribution. For folded/half-normal and Rayleigh distributions, the coefficients of determination ranged from 0.979 to 0.986.
Thus, based on the coefficient of determination, the lognormal distribution seems to be the most suitable for describing the law of distribution of word lengths in the texts of the considered public speeches. As is known, the lognormal (cumulative) distribution has the form $\frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\left(\frac{\ln x - \mu}{\sigma\sqrt{2}}\right)$, where σ and µ are parameters. The computed parameters σ and µ for all analyzed speeches are presented in Appendix B. Examples of the description of empirical data of different years for the USA and the UK are presented in Figure 3 and Figure 4. As can be seen, the lognormal distribution better describes empirical distributions in comparison with its direct “competitor”, the Weibull distribution. This is most noticeable for small word lengths (lengths between one and four). At the same time, from Figure 3 and Figure 4, it can be seen that the lognormal distribution systematically underestimates the mode value of the empirical data by approximately one.
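The ranking by coefficient of determination can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: a coarse grid search replaces the Levenberg–Marquardt fit, the fitted object is assumed to be the cumulative share of words with length not exceeding x, and the "observed" values are synthetic (generated from a lognormal curve with µ = 1.25, σ = 0.70), not data from the speeches. All function names are hypothetical.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """Lognormal CDF: 1/2 + 1/2 * erf((ln x - mu) / (sigma * sqrt(2)))."""
    return 0.5 + 0.5 * math.erf((math.log(x) - mu) / (sigma * math.sqrt(2)))


def weibull_cdf(x, scale, shape):
    """Two-parameter Weibull CDF: 1 - exp(-(x / scale) ** shape)."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_by_grid(cdf, xs, ys, first_values, second_values):
    """Return (best R^2, first parameter, second parameter) over a grid."""
    best = None
    for a in first_values:
        for b in second_values:
            r2 = r_squared(ys, [cdf(x, a, b) for x in xs])
            if best is None or r2 > best[0]:
                best = (r2, a, b)
    return best


# Synthetic cumulative shares for word lengths 1..16 (NOT the paper's data).
xs = list(range(1, 17))
ys = [lognormal_cdf(x, 1.25, 0.70) for x in xs]

# Parameter grids with step 0.01 (lognormal) and 0.05 (Weibull); assumed ranges.
logn_fit = fit_by_grid(lognormal_cdf, xs, ys,
                       [i / 100 for i in range(50, 201)],   # mu
                       [i / 100 for i in range(30, 121)])   # sigma
weib_fit = fit_by_grid(weibull_cdf, xs, ys,
                       [i / 20 for i in range(40, 161)],    # scale
                       [i / 20 for i in range(10, 61)])     # shape

# Rank the candidates: the closer R^2 is to 1, the higher the place.
ranking = sorted([("Lognormal", logn_fit[0]), ("Weibull", weib_fit[0])],
                 key=lambda item: item[1], reverse=True)
print(ranking)
```

Because the synthetic data come from a lognormal curve, the lognormal candidate ranks first here by construction; with real word-length counts the same ranking step would be applied to the fitted R² values of all candidate distributions.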
The insets in Figure 3 and Figure 4 show approximations of the empirical histograms using the discrete two-parameter Poisson function $\lambda^{k} e^{-\lambda}/k!$ and the one-parameter power function following the brevity law and Zipf’s law. As discussed in the Introduction, these functions are proposed to describe the distribution of word lengths. However, as can be seen from the presented data, these theoretical distributions (functions) are not applicable for the data we are studying.
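A rough way to see how such a comparison could be set up is sketched below in Python. The relative frequencies are hypothetical (not data from the paper), the Poisson parameter is simply set to the mean word length, and the power function is normalized over the observed lengths with an arbitrarily chosen exponent; none of these choices is claimed to match the fitting used for the insets.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability lam**k * exp(-lam) / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def power_probs(lengths, exponent):
    """Power function k**(-exponent), normalized over the given lengths."""
    weights = [k ** (-exponent) for k in lengths]
    total = sum(weights)
    return [w / total for w in weights]


def sum_squared_error(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Hypothetical relative frequencies of word lengths 1..10 (NOT the paper's data).
observed = {1: 0.04, 2: 0.17, 3: 0.21, 4: 0.18, 5: 0.12,
            6: 0.09, 7: 0.08, 8: 0.05, 9: 0.04, 10: 0.02}
lengths = sorted(observed)
freqs = [observed[k] for k in lengths]

# Assumption: lambda is taken as the mean word length of the histogram.
lam = sum(k * p for k, p in observed.items())
poisson_values = [poisson_pmf(k, lam) for k in lengths]

# Assumption: exponent 1.0 is an arbitrary illustrative choice.
power_values = power_probs(lengths, 1.0)

sse_poisson = sum_squared_error(freqs, poisson_values)
sse_power = sum_squared_error(freqs, power_values)
print(lam, sse_poisson, sse_power)
```

A monotonically decreasing power function cannot reproduce a histogram that peaks at an intermediate length, which is one simple way to read the mismatch reported for short words.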
Note that no significant differences were found in the parameters µ and σ of the lognormal distributions used to describe speeches delivered at approximately the same time by US presidents and members of various parliamentary parties in the UK.

4. Change in Word Length over Time

A number of quantities were calculated to analyze the possible change in word length over time. They are listed below:
1. The average word length. To calculate this parameter, the total number of letters in a speech is divided by the number of words. The change in this parameter over time is shown in Figure 5. As can be seen from the graph, the average word length for UK speeches has remained almost unchanged over two hundred years and is about 4.5. For USA speeches, the result is more complicated. From 1789 to about 1950, the average length of words practically did not change and was about 4.9. After that, however, there is a small stepwise change in the average word length to 4.5. As a result, from 1950 to the present, the average length of words for the USA and the UK is the same. Within quantitative linguistics it is impossible to understand the reasons for such behavior. However, based on historical facts, we put forward the following hypotheses to explain it. (1) Initially, USA presidents made speeches only before Congress; other citizens became acquainted with their speeches through newspapers. Such speeches only began to be fully broadcast on radio and television after the Second World War. This may have led to the observed decrease in average word length. (2) The unchanged average word length for the UK can apparently be explained by the fact that the speeches were made before members of Parliament, a rather conservative representative body with strong centuries-old traditions. Adherence to tradition is a quality that is often used to describe British society as a whole. This is illustrated by the result obtained in this work: the average length of words used by parliamentarians has remained unchanged over more than two hundred years. (3) It was after the Second World War that a special relationship emerged between the USA and the UK. The term “special relationship” publicly emerged in Winston Churchill’s “Iron Curtain” speech of 1946. The special relationship is a term that is often used to describe the close historic, political, military, economic and cultural relations between USA and UK political leaders and elites. This could be the reason that the vocabulary of USA and UK politicians became very close after the war and, as a consequence, the average length of words is the same.
2. The median of the word length distribution for all speeches was found to be four and did not change over time.
3. The maximum word length of speeches delivered over 200 years also did not change and was in the range from 14 to 16. Examples of the most common words of this length for the USA: accountability, administration, constitutional, accomplishment, irresponsibility. For the UK, these words are responsibility, congratulation, disestablishment, apprenticeship, discrimination, internationalism.
4. The mode for each of the speeches was calculated based on the averaging of the three most probable word lengths. The averaging was performed taking into account the probabilities of occurrence of these three values in the text. This was performed to make this parameter more sensitive to the shape and width of the length distribution near the maximum. The change in the mode of the word length distribution over time is shown in Figure 6. As can be seen, the mode of word lengths of public speeches for both the USA and the UK increases slightly over 200 years from 2.85 to 3.02. This change is approximately linear with a slope of 0.0005 ± 0.0002 for the USA and 0.0006 ± 0.0001 for the UK.
5. Figure 7 and Figure 8 show the dependence of the parameters of the lognormal distribution µ, σ on time. As can be seen, µ does not depend on time, remaining approximately equal to 1.25. At the same time, the parameter σ slightly decreases over 200 years from 0.74 to 0.62. These results are consistent with the results presented in Figure 5 and Figure 6. Indeed, as we know, the mean value for a lognormal distribution is related to µ, σ as $\exp(\mu + \sigma^{2}/2)$. As a consequence, when µ is constant and σ decreases, the mean value will decrease. On the other hand, since the mode is related to µ, σ as $\exp(\mu - \sigma^{2})$, then when µ remains constant and σ decreases, the mode should increase. The above agreement in the behavior of the parameters of the lognormal distribution µ, σ (Figure 7 and Figure 8) with the results based on the analysis of directly empirical histograms (Figure 5 and Figure 6) provides an additional argument for the applicability of the lognormal distribution to describe the distribution of word lengths.
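The descriptive quantities in the list above (average, median, maximum, and the mode averaged over the three most probable lengths) can be sketched in Python as follows. The weighted-mode formula is one reading of the description in item 4 (a probability-weighted average of the three most frequent lengths); the function name `summarize` and the toy list of lengths are assumptions, not the authors' code or data.

```python
from collections import Counter
from statistics import median


def summarize(lengths):
    """Descriptive statistics of a list of word lengths."""
    counts = Counter(lengths)
    # Three most frequent lengths with their counts.
    top3 = counts.most_common(3)
    top3_total = sum(count for _, count in top3)
    # Assumed reading of item 4: weight each of the three lengths by how
    # often it occurs (equivalent to weighting by its probability).
    weighted_mode = sum(length * count for length, count in top3) / top3_total
    return {
        "mean": sum(lengths) / len(lengths),  # total letters / number of words
        "median": median(lengths),
        "max": max(lengths),
        "weighted_mode": weighted_mode,
    }


# Toy word lengths, not taken from the analyzed speeches.
toy_lengths = [1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 9]
stats = summarize(toy_lengths)
print(stats)
```

Averaging over three lengths instead of taking the single most frequent one yields a non-integer value, which is why a small drift such as 2.85 to 3.02 can be resolved at all.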
The change in the probability density of the lognormal distribution with time is presented in Figure 9. It can be seen that this change is very insignificant over two hundred years.
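The relations between µ, σ and the mean and mode of a lognormal distribution quoted in item 5 can be checked numerically with a short Python sketch. It uses only the rounded parameter values stated in the text (µ ≈ 1.25, σ from 0.74 to 0.62); the function names are hypothetical.

```python
import math


def lognormal_mean(mu, sigma):
    """Mean of a lognormal distribution: exp(mu + sigma**2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2)


def lognormal_mode(mu, sigma):
    """Mode of a lognormal distribution: exp(mu - sigma**2)."""
    return math.exp(mu - sigma ** 2)


mu = 1.25                             # approximately constant over time
sigma_early, sigma_late = 0.74, 0.62  # start and end of the 200-year span

mean_early = lognormal_mean(mu, sigma_early)
mean_late = lognormal_mean(mu, sigma_late)
mode_early = lognormal_mode(mu, sigma_early)
mode_late = lognormal_mode(mu, sigma_late)

print(f"mean: {mean_early:.2f} -> {mean_late:.2f}")
print(f"mode: {mode_early:.2f} -> {mode_late:.2f}")
```

With µ fixed, a smaller σ lowers the mean and raises the mode, which is the direction of change described for Figure 5 and Figure 6. The model mode also comes out below the empirical mode, in line with the underestimate of about one noted in Section 3.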

5. Conclusions

This paper shows that the lognormal distribution describes the distribution of word lengths in the texts of public speeches by politicians better than the other distributions considered. This is significantly different from the result obtained for the sentence length distribution, in which the Weibull distribution is preferred [9]. As a consequence, we can conclude that the principle of least effort does not have a significant impact on the length of words used by politicians. As is known, an important reason for the appearance of a lognormal distribution is the presence of a multiplicative random process that determines the random variable [27]. In our case, such a random variable is the length of the word.
It is found that word length in public speeches of USA and UK politicians has remained almost unchanged over the last two hundred years. There has been a very slight decrease in the average value for the USA and a slight increase in the most probable word length (mode). This result differs significantly from the results obtained previously for sentence lengths, the lengths of which decreased significantly over time [9]. The result obtained for words is quite surprising. Indeed, for such a long period of time, many very significant historical events have occurred for humanity, new techniques and technologies have appeared and improved explosively and enormous cultural and social changes have taken place in society. All this inevitably led to the emergence of new words and should have affected the frequency of use of existing words. However, as statistical analysis has shown, the length of words used has not changed much. Apparently, the length of words is statistically a very conservative parameter, which turns out to be insensitive to the above changes. These changes affect words in different ways, which do not affect their average length.
The results obtained in this paper not only question the applicability of the principle of least effort, but also indicate a possible limitation of the Menzerath–Altmann, Zipf and brevity laws for analyzing word lengths in public speaking. We base this conclusion on the following logic. (1) As discussed above (see also [9]), if the principle of least effort held, it should lead to a Weibull distribution for word lengths and to a decrease in average word lengths over time. However, neither was found. (2) Since it was previously discovered in [9] that the length of sentences decreases significantly, then, based on the constancy of the length of words over time obtained for the same speeches, it follows that the Menzerath–Altmann law is inapplicable for the sentence/word pair in the speeches of politicians. (3) Since the consequence of the Zipf and brevity laws is a power-law dependence of the frequency of words on their length, the established lognormal distribution for word lengths shows the limitations of these laws, at least when considering all word lengths, including the shortest ones.
There are quite a lot of laws in quantitative linguistics, only a small part of which we critically discuss in this article. These laws (statements) do not have the same degree of generality and universal acceptance as, for example, can be seen in physics. If we use the terminology of physics, then these statements in quantitative linguistics can be called working hypotheses rather than laws. These working hypotheses need clarification and adjustment. Thanks to research like the present one, this work of turning working hypotheses into laws is taking place.
It is important to emphasize in conclusion, following [4], that word length is an essential property of a word; from the point of view of quantitative linguistics, word length with its relationships with other linguistic structures and levels provides information that is an important part of the general theory of language.

Author Contributions

L.M.M.: conceptualization, methodology, supervision, investigation, writing—original draft, writing—review and editing. N.L.T.: investigation, software, validation, computation, visualization, reading—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Ranking the suitability of theoretical distribution laws for the observed empirical word length distributions based on the determination coefficient ($R^2$) criterion. After the name of the distribution, its $R^2$ is given:
Table A1. Data for the USA (distribution laws).
Table A1. Data for the USA (distribution laws).
Year, PartyI Place, R2II Place, R2III Place, R2
1789Lognormal, 0.995Weibull, 0.993Folded normal, 0.987
1796Lognormal, 0.998Weibull, 0.996Rayleigh, 0.992
1797Weibull, 0.993Lognormal, 0.992Folded normal, 0.989
1798Lognormal, 0.989Weibull, 0.988Folded normal, 0.981
1801Lognormal, 0.997Weibull, 0.994Folded normal, 0.985
1803Lognormal, 0.996Weibull, 0.994Folded normal 0.985
1805Lognormal, 0.996Weibull, 0.995Folded normal, 0.988
1805Weibull, 0.994Lognormal, 0.994Folded normal, 0.988
1809Lognormal, 0.993Weibull, 0.992Folded normal, 0.987
1812Weibull, 0.996Lognormal, 0.992Folded normal, 0.991
1813Lognormal, 0.995Weibull, 0.994Folded normal, 0.987
1815Lognormal, 0.993Weibull, 0.992Folded normal, 0.986
1817Lognormal, 0.994Weibull, 0.994Folded normal, 0.987
1821Lognormal, 0.996Weibull, 0.995Folded normal, 0.987
1821Lognormal, 0.994Weibull, 0.994Folded normal, 0.986
1825Lognormal, 0.992Weibull, 0.992Folded normal, 0.986
1825Lognormal, 0.993Weibull, 0.991Folded normal, 0.984
1827Lognormal, 0.994Weibull, 0.992Folded normal, 0.984
1833Lognormal, 0.993Weibull, 0.992Folded normal, 0.985
1837Weibull, 0.994Lognormal, 0.991Folded normal, 0.989
1838Weibull, 0.993Lognormal, 0.992Folded normal, 0.988
1839Lognormal, 0.993Weibull 0.992Folded normal, 0.987
1841Lognormal, 0.993Weibull, 0.993Folded normal, 0.987
1842Lognormal, 0.993Weibull, 0.993Folded normal, 0.987
1845Lognormal, 0.993Weibull, 0.993Folded normal, 0.987
1848Weibull, 0.995Lognormal, 0.993Folded normal, 0.99
1850Lognormal, 0.993Weibull, 0.993Folded normal, 0.986
1851Lognormal, 0.993Weibull, 0.992Folded normal, 0.985
1853Lognormal, 0.994Weibull, 0.994Folded normal, 0.988
1854Weibull, 0.991Lognormal, 0.988Folded normal, 0.986
1856Lognormal, 0.99Weibull, 0.988Folded normal, 0.982
1857Weibull, 0.995Lognormal, 0.995Folded normal, 0.989
1861Lognormal, 0.996Weibull, 0.994Folded normal, 0.986
1861Lognormal, 0.997Weibull, 0.993Folded normal, 0.983
1864Lognormal, 0.994Weibull, 0.994Folded normal, 0.987
1865Lognormal, 0.998Weibull, 0.993Rayleigh, 0.989
1865Lognormal, 0.992Weibull, 0.99Folded normal, 0.982
1868Weibull, 0.994Lognormal, 0.992Folded normal, 0.989
1869Weibull, 0.995Lognormal, 0.994Folded normal, 0.99
1873Lognormal, 0.995Weibull, 0.994Folded normal, 0.987
1875Lognormal, 0.996Weibull, 0.994Folded normal 0.986
1876Lognormal, 0.994Weibull, 0.994Folded normal, 0.987
1877Weibull, 0.993Lognormal, 0.992Folded normal, 0.989
1877Weibull, 0.992Lognormal, 0.991Folded normal, 0.987
1879Weibull, 0.992Lognormal, 0.989Folded normal, 0.986
1881Lognormal, 0.995Weibull, 0.992Folded normal, 0.984
1885Lognormal, 0.992Weibull, 0.992Folded normal, 0.985
1889Lognormal, 0.995Weibull, 0.994Folded normal, 0.986
1893Weibull, 0.993Lognormal, 0.991Folded normal, 0.988
1893Weibull, 0.993Lognormal, 0.992Folded normal, 0.987
1895Weibull, 0.992Lognormal, 0.99Folded normal, 0.987
1897Lognormal, 0.992Weibull, 0.992Folded normal, 0.986
1901Lognormal, 0.993Weibull, 0.992Folded normal, 0.984
1905Lognormal, 0.998Weibull, 0.994Folded normal, 0.985
1909Weibull, 0.993Lognormal, 0.992Folded normal, 0.987
1913Lognormal, 0.998Weibull, 0.994Rayleigh, 0.984
1917Lognormal, 0.998Weibull, 0.994Rayleigh, 0.987
1921Lognormal, 0.993Weibull, 0.993Folded normal, 0.987
1925Lognormal, 0.995Weibull, 0.993Folded normal, 0.986
1929Weibull, 0.993Lognormal, 0.993Folded normal, 0.987
1933Weibull, 0.996Lognormal, 0.996Folded normal, 0.989
1937Lognormal, 0.996Weibull, 0.996Folded normal, 0.989
1941Lognormal, 0.997Weibull, 0.993Folded normal, 0.984
1945Lognormal, 0.999Weibull, 0.995Folded normal, 0.986
1949Weibull, 0.996Lognormal, 0.995Folded normal, 0.991
1953Lognormal, 0.997Weibull, 0.995Folded normal, 0.987
1957Lognormal, 0.998Weibull, 0.995Rayleigh, 0.99
1961Lognormal, 0.998Weibull, 0.996Rayleigh, 0.992
1965Lognormal, 0.997Weibull, 0.996Rayleigh, 0.99
1969Lognormal, 0.998Weibull, 0.996Rayleigh, 0.988
1973Lognormal, 0.999Weibull, 0.993Folded normal, 0.983
1977Lognormal, 0.998Weibull, 0.996Folded normal, 0.988
1981Lognormal, 0.997Weibull, 0.995Folded normal, 0.987
1985Lognormal, 0.998Weibull, 0.996Folded normal, 0.988
1989Lognormal, 0.999Weibull, 0.996Rayleigh, 0.993
1993Lognormal, 0.997Weibull, 0.994Rayleigh, 0.986
1997Lognormal, 0.998Weibull, 0.995Rayleigh, 0.989
2001Weibull, 0.996Lognormal, 0.995Folded normal, 0.99
2009Lognormal, 0.998Weibull, 0.995Rayleigh, 0.99
2013Lognormal, 0.998Weibull, 0.996Rayleigh, 0.99
2017Lognormal, 0.997Weibull, 0.995Rayleigh, 0.992
2021Lognormal, 0.998Weibull, 0.996Rayleigh, 0.992
Table A2. Data for UK (distribution laws).
Year, Party | I place, R² | II place, R² | III place, R²
1808 | Lognormal, 0.996 | Weibull, 0.996 | Folded normal, 0.989
1814 | Lognormal, 0.996 | Weibull, 0.993 | Folded normal, 0.985
1815 | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1817 | Lognormal, 0.995 | Weibull, 0.995 | Folded normal, 0.987
1819 | Lognormal, 0.993 | Weibull, 0.993 | Folded normal, 0.986
1827 | Lognormal, 0.996 | Weibull, 0.995 | Folded normal, 0.987
1830 | Lognormal, 0.998 | Weibull, 0.995 | Rayleigh, 0.987
1837 | Lognormal, 0.996 | Weibull, 0.994 | Folded normal, 0.985
1842 | Weibull, 0.995 | Lognormal, 0.994 | Folded normal, 0.99
1853 | Lognormal, 0.995 | Weibull, 0.994 | Folded normal, 0.985
1868 | Lognormal, 0.998 | Weibull, 0.993 | Folded normal, 0.985
1877 | Lognormal, 0.996 | Weibull, 0.992 | Folded normal, 0.984
1893 | Lognormal, 0.993 | Weibull, 0.987 | Half Normal, 0.979
1895 Liberal | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1896 Liberal | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1897 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1897 Liberal | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1899 Liberal | Lognormal, 0.998 | Weibull, 0.993 | Folded normal, 0.984
1900 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.988
1901 Liberal | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.984
1902 Conservative | Lognormal, 0.997 | Weibull, 0.996 | Folded normal, 0.988
1903 Conservative | Lognormal, 0.998 | Weibull, 0.997 | Folded normal, 0.99
1903 Liberal | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.984
1904 Conservative | Lognormal, 0.997 | Weibull, 0.996 | Folded normal, 0.989
1905 Liberal | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.987
1906 Conservative | Lognormal, 0.997 | Weibull, 0.994 | Folded normal, 0.987
1907 | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.986
1907 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1907 Liberal | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1908 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1908 Liberal | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1909 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1909 Liberal | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.985
1910 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.989
1910 Liberal | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.987
1911 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1912 Conservative | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.989
1912 Liberal | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.987
1913 Conservative | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.989
1913 Liberal | Lognormal, 0.997 | Weibull, 0.994 | Folded normal, 0.986
1917 | Lognormal, 0.999 | Weibull, 0.996 | Folded normal, 0.988
1918 Liberal | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.986
1919 Liberal | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.987
1920 Conservative | Lognormal, 0.999 | Weibull, 0.995 | Rayleigh, 0.988
1920 Liberal | Lognormal, 0.997 | Weibull, 0.993 | Folded normal, 0.985
1921 Conservative | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.99
1921 Liberal | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1922 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.989
1922 Liberal | Lognormal, 0.997 | Weibull, 0.993 | Folded normal, 0.984
1923 Liberal | Lognormal, 0.996 | Weibull, 0.993 | Folded normal, 0.984
1924 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.987
1924 Labour | Lognormal, 0.997 | Weibull, 0.993 | Folded normal, 0.984
1924 Liberal | Lognormal, 0.995 | Weibull, 0.993 | Folded normal, 0.985
1925 Conservative | Lognormal, 0.997 | Weibull, 0.994 | Folded normal, 0.985
1925 Liberal | Lognormal, 0.995 | Weibull, 0.993 | Folded normal, 0.985
1926 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1927 | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.988
1927 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.988
1927 Liberal | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.991
1928 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1928 Liberal | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.984
1929 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Rayleigh, 0.988
1929 Liberal | Lognormal, 0.999 | Weibull, 0.994 | Rayleigh, 0.989
1930 Liberal | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.987
1932 Conservative | Lognormal, 0.998 | Weibull, 0.993 | Folded normal, 0.984
1932 Liberal | Lognormal, 0.997 | Weibull, 0.995 | Rayleigh, 0.984
1933 Conservative | Lognormal, 0.999 | Weibull, 0.993 | Rayleigh, 0.985
1934 Conservative | Lognormal, 0.998 | Weibull, 0.993 | Folded normal, 0.983
1935 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1936 Liberal | Lognormal, 0.994 | Weibull, 0.993 | Folded normal, 0.986
1937 | Lognormal, 0.997 | Weibull, 0.996 | Folded normal, 0.989
1937 Liberal | Lognormal, 0.996 | Weibull, 0.994 | Folded normal, 0.985
1941 Liberal | Lognormal, 0.998 | Weibull, 0.994 | Rayleigh, 0.985
1942 Liberal | Lognormal, 0.996 | Weibull, 0.995 | Folded normal, 0.987
1943 Liberal | Lognormal, 0.997 | Weibull, 0.994 | Folded normal, 0.986
1945 Liberal | Lognormal, 0.996 | Weibull, 0.994 | Folded normal, 0.986
1946 Labour | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.984
1947 | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1947 Labour | Lognormal, 0.998 | Weibull, 0.993 | Folded normal, 0.984
1948 Labour | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.986
1949 Labour | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.985
1950 Labour | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.985
1951 Labour | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.985
1955 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.987
1956 Conservative | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.985
1957 | Lognormal, 0.999 | Weibull, 0.994 | Rayleigh, 0.99
1957 Conservative | Lognormal, 0.998 | Weibull, 0.994 | Rayleigh, 0.986
1958 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1960 Conservative | Lognormal, 0.998 | Weibull, 0.993 | Folded normal, 0.983
1961 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1962 Conservative | Lognormal, 0.997 | Weibull, 0.994 | Folded normal, 0.985
1963 Conservative | Lognormal, 0.997 | Weibull, 0.994 | Folded normal, 0.985
1963 Liberal | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.987
1964 Labour | Lognormal, 0.996 | Weibull, 0.995 | Folded normal, 0.989
1965 Conservative | Lognormal, 0.999 | Weibull, 0.995 | Folded normal, 0.986
1965 Labour | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.988
1966 Conservative | Lognormal, 0.998 | Weibull, 0.994 | Rayleigh, 0.986
1966 Labour | Weibull, 0.995 | Lognormal, 0.994 | Folded normal, 0.988
1967 | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.988
1967 Conservative | Lognormal, 0.998 | Weibull, 0.994 | Rayleigh, 0.984
1967 Labour | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.986
1968 Conservative | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.985
1968 Labour | Lognormal, 0.997 | Weibull, 0.996 | Folded normal, 0.988
1969 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.988
1969 Labour | Lognormal, 0.997 | Weibull, 0.996 | Folded normal, 0.989
1970 Conservative | Lognormal, 0.997 | Weibull, 0.991 | Folded normal, 0.981
1970 Labour | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.987
1971 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Rayleigh, 0.987
1971 Labour | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.989
1972 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1972 Labour | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1973 Conservative | Lognormal, 0.998 | Weibull, 0.994 | Rayleigh, 0.986
1973 Labour | Lognormal, 0.997 | Weibull, 0.997 | Folded normal, 0.99
1974 Labour | Lognormal, 0.996 | Weibull, 0.995 | Folded normal, 0.988
1975 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Rayleigh, 0.986
1975 Labour | Lognormal, 0.996 | Weibull, 0.995 | Folded normal, 0.987
1976 Conservative | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.985
1976 Labour | Lognormal, 0.997 | Weibull, 0.996 | Folded normal, 0.988
1977 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Rayleigh, 0.988
1977 Liberal a | Weibull, 0.995 | Lognormal, 0.995 | Folded normal, 0.989
1977 Liberal b | Weibull, 0.995 | Lognormal, 0.995 | Folded normal, 0.989
1978 Conservative | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.989
1978 Labour | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1978 Liberal | Lognormal, 0.997 | Weibull, 0.996 | Folded normal, 0.989
1979 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Rayleigh, 0.987
1979 Labour | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.985
1979 Liberal | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.987
1980 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1980 Labour | Lognormal, 0.998 | Weibull, 0.994 | Folded normal, 0.985
1980 Liberal | Weibull, 0.996 | Lognormal, 0.995 | Folded normal, 0.989
1981 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.989
1981 Labour | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.987
1981 Liberal | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.986
1982 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.99
1982 Labour | Lognormal, 0.999 | Weibull, 0.994 | Rayleigh, 0.986
1982 Liberal | Lognormal, 0.996 | Weibull, 0.995 | Folded normal, 0.988
1982 SDP-Liberal Alliance | Lognormal, 0.995 | Weibull, 0.993 | Folded normal, 0.985
1983 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Rayleigh, 0.988
1983 Labour | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.987
1983 Liberal | Lognormal, 0.996 | Weibull, 0.996 | Folded normal, 0.988
1984 Conservative | Lognormal, 0.998 | Weibull, 0.995 | Rayleigh, 0.987
1984 Labour | Lognormal, 0.997 | Weibull, 0.993 | Folded normal, 0.983
1984 Liberal | Lognormal, 0.996 | Weibull, 0.995 | Folded normal, 0.988
1985 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.989
1985 Labour | Lognormal, 0.998 | Weibull, 0.995 | Folded normal, 0.986
1985 Liberal | Lognormal, 0.996 | Weibull, 0.996 | Folded normal, 0.988
1986 Conservative | Weibull, 0.997 | Lognormal, 0.997 | Rayleigh, 0.992
1986 Labour | Lognormal, 0.997 | Weibull, 0.994 | Folded normal, 0.986
1986 Liberal | Weibull, 0.996 | Lognormal, 0.996 | Folded normal, 0.99
1987 Conservative | Lognormal, 0.997 | Weibull, 0.997 | Rayleigh, 0.989
1987 | Weibull, 0.997 | Lognormal, 0.996 | Folded normal, 0.991
1987 Labour | Lognormal, 0.998 | Weibull, 0.993 | Folded normal, 0.984
1987 SDP-Liberal Alliance a | Lognormal, 0.997 | Weibull, 0.994 | Folded normal, 0.985
1987 SDP-Liberal Alliance b | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.987
1988 Conservative | Lognormal, 0.997 | Weibull, 0.996 | Rayleigh, 0.989
1988 Labour | Lognormal, 0.996 | Weibull, 0.994 | Folded normal, 0.987
1988 Liberal | Weibull, 0.996 | Lognormal, 0.996 | Folded normal, 0.99
1989 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.991
1989 Labour | Lognormal, 0.998 | Weibull, 0.993 | Folded normal, 0.983
1990 Labour | Lognormal, 0.996 | Weibull, 0.994 | Folded normal, 0.986
1991 Conservative | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.993
1991 Labour | Lognormal, 0.996 | Weibull, 0.995 | Folded normal, 0.987
1992 Conservative | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.992
1992 Labour | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.988
1992 Liberal Democrat | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.99
1993 Conservative | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.993
1993 Labour | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.989
1993 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1994 Conservative | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.992
1994 Labour | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1994 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1995 Conservative | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.993
1995 Labour | Weibull, 0.998 | Lognormal, 0.997 | Rayleigh, 0.992
1996 Conservative | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.991
1996 Labour | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.989
1996 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.991
1997 Conservative | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.991
1997 | Lognormal, 0.996 | Weibull, 0.994 | Folded normal, 0.986
1997 Labour | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
1997 Labour | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.99
1998 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.99
1998 Labour | Lognormal, 0.997 | Weibull, 0.997 | Rayleigh, 0.99
1998 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.99
1999 Conservative | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.989
1999 Labour | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.99
1999 Liberal Democrat a | Lognormal, 0.999 | Weibull, 0.994 | Folded normal, 0.984
1999 Liberal Democrat b | Lognormal, 0.997 | Weibull, 0.996 | Rayleigh, 0.989
2000 Conservative | Lognormal, 0.999 | Weibull, 0.995 | Rayleigh, 0.992
2000 Labour | Lognormal, 0.997 | Weibull, 0.997 | Folded normal, 0.99
2000 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.992
2001 Conservative | Lognormal, 0.997 | Weibull, 0.997 | Folded normal, 0.989
2001 Labour | Lognormal, 0.997 | Weibull, 0.997 | Folded normal, 0.99
2001 Liberal Democrat | Lognormal, 0.997 | Weibull, 0.996 | Folded normal, 0.987
2002 Conservative | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.993
2002 Labour | Lognormal, 0.998 | Weibull, 0.996 | Folded normal, 0.988
2002 Liberal Democrat | Lognormal, 0.997 | Weibull, 0.997 | Folded normal, 0.989
2003 Conservative | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.992
2003 Labour | Lognormal, 0.998 | Weibull, 0.995 | Rayleigh, 0.988
2003 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.99
2004 Conservative | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.993
2004 Labour | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.991
2004 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.991
2005 Conservative | Lognormal, 0.999 | Weibull, 0.995 | Rayleigh, 0.989
2005 Labour | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.988
2005 Liberal Democrat | Lognormal, 0.997 | Weibull, 0.997 | Folded normal, 0.989
2006 Conservative a | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.99
2006 Conservative b | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.99
2006 Labour | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.989
2006 Liberal Democrat | Lognormal, 0.997 | Weibull, 0.995 | Folded normal, 0.987
2007 | Weibull, 0.995 | Lognormal, 0.994 | Folded normal, 0.989
2007 Conservative | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.994
2007 Labour | Lognormal, 0.998 | Weibull, 0.996 | Rayleigh, 0.989
2007 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.991
2008 Conservative | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.99
2008 Labour | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.992
2008 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.998 | Rayleigh, 0.994
2009 Conservative | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.993
2009 Labour | Lognormal, 0.999 | Weibull, 0.995 | Rayleigh, 0.992
2009 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.991
2010 Conservative | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.994
2010 Labour | Lognormal, 0.998 | Weibull, 0.995 | Rayleigh, 0.987
2010 Liberal Democrat | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.993
2011 Conservative | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.993
2011 Labour | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.994
2011 Liberal Democrat | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.994
2012 Conservative | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.995
2012 Liberal Democrat | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.992
2013 Conservative | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.996
2013 Labour | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.996
2013 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.992
2014 Conservative | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.995
2014 Labour | Lognormal, 0.999 | Weibull, 0.997 | Rayleigh, 0.993
2014 Liberal Democrat | Lognormal, 0.999 | Weibull, 0.996 | Rayleigh, 0.991
2015 Conservative | Lognormal, 0.998 | Weibull, 0.998 | Rayleigh, 0.996
2015 Labour | Lognormal, 0.997 | Weibull, 0.997 | Rayleigh, 0.991
2015 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.99
2016 Conservative | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.992
2016 Labour | Weibull, 0.997 | Lognormal, 0.997 | Folded normal, 0.991
2016 Liberal Democrat | Lognormal, 0.998 | Weibull, 0.998 | Rayleigh, 0.992
2017 | Lognormal, 0.997 | Weibull, 0.996 | Rayleigh, 0.991
2017 Conservative | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.992
2017 Labour | Weibull, 0.997 | Lognormal, 0.997 | Rayleigh, 0.991
2017 Liberal Democrat | Weibull, 0.997 | Lognormal, 0.996 | Folded normal, 0.991
2018 Conservative | Lognormal, 0.998 | Weibull, 0.997 | Rayleigh, 0.993
2018 Labour | Lognormal, 0.997 | Weibull, 0.996 | Folded normal, 0.989
“a” and “b” denote different speeches by the same party during the same year.

Appendix B

The tables below list the parameters of the lognormal distributions used to describe the empirical word length distributions.
Table A3. Data for the USA.
Year | R² | µ | σ
1789 | 0.995 | 1.27 | 0.73
1796 | 0.998 | 1.19 | 0.60
1797 | 0.992 | 1.26 | 0.73
1798 | 0.989 | 1.29 | 0.73
1801 | 0.997 | 1.25 | 0.68
1803 | 0.996 | 1.25 | 0.68
1805 | 0.994 | 1.27 | 0.70
1805 | 0.996 | 1.28 | 0.68
1809 | 0.993 | 1.25 | 0.75
1812 | 0.992 | 1.32 | 0.70
1813 | 0.995 | 1.26 | 0.70
1815 | 0.993 | 1.23 | 0.73
1817 | 0.994 | 1.26 | 0.70
1821 | 0.994 | 1.26 | 0.69
1821 | 0.996 | 1.27 | 0.68
1825 | 0.993 | 1.27 | 0.71
1825 | 0.992 | 1.29 | 0.72
1827 | 0.994 | 1.24 | 0.69
1833 | 0.993 | 1.26 | 0.71
1837 | 0.991 | 1.29 | 0.72
1838 | 0.992 | 1.29 | 0.71
1839 | 0.993 | 1.24 | 0.74
1841 | 0.993 | 1.25 | 0.71
1842 | 0.993 | 1.23 | 0.73
1845 | 0.993 | 1.27 | 0.71
1848 | 0.993 | 1.30 | 0.68
1850 | 0.993 | 1.25 | 0.71
1851 | 0.993 | 1.23 | 0.71
1853 | 0.994 | 1.28 | 0.71
1854 | 0.988 | 1.30 | 0.73
1856 | 0.990 | 1.24 | 0.74
1857 | 0.995 | 1.28 | 0.68
1861 | 0.997 | 1.25 | 0.68
1861 | 0.996 | 1.22 | 0.70
1864 | 0.994 | 1.29 | 0.68
1865 | 0.992 | 1.30 | 0.68
1865 | 0.998 | 1.22 | 0.61
1868 | 0.992 | 1.32 | 0.69
1869 | 0.994 | 1.23 | 0.70
1873 | 0.995 | 1.21 | 0.71
1875 | 0.996 | 1.24 | 0.69
1876 | 0.994 | 1.27 | 0.71
1877 | 0.991 | 1.29 | 0.71
1877 | 0.992 | 1.27 | 0.72
1879 | 0.989 | 1.30 | 0.71
1881 | 0.995 | 1.27 | 0.68
1885 | 0.992 | 1.28 | 0.71
1889 | 0.995 | 1.28 | 0.68
1893 | 0.992 | 1.29 | 0.71
1893 | 0.991 | 1.31 | 0.71
1895 | 0.990 | 1.29 | 0.72
1897 | 0.992 | 1.27 | 0.70
1901 | 0.993 | 1.29 | 0.70
1905 | 0.998 | 1.21 | 0.66
1909 | 0.992 | 1.25 | 0.72
1913 | 0.998 | 1.20 | 0.65
1917 | 0.998 | 1.18 | 0.64
1921 | 0.993 | 1.29 | 0.71
1925 | 0.995 | 1.25 | 0.69
1929 | 0.993 | 1.29 | 0.72
1933 | 0.996 | 1.24 | 0.69
1937 | 0.996 | 1.26 | 0.68
1941 | 0.997 | 1.20 | 0.67
1945 | 0.999 | 1.16 | 0.66
1949 | 0.995 | 1.30 | 0.67
1953 | 0.997 | 1.23 | 0.64
1957 | 0.998 | 1.19 | 0.63
1961 | 0.998 | 1.22 | 0.62
1965 | 0.997 | 1.18 | 0.62
1969 | 0.998 | 1.16 | 0.64
1973 | 0.999 | 1.16 | 0.67
1977 | 0.998 | 1.20 | 0.65
1981 | 0.997 | 1.19 | 0.67
1985 | 0.998 | 1.22 | 0.64
1989 | 0.999 | 1.15 | 0.61
1993 | 0.997 | 1.23 | 0.64
1997 | 0.998 | 1.23 | 0.63
2001 | 0.995 | 1.22 | 0.66
2009 | 0.998 | 1.22 | 0.62
2013 | 0.998 | 1.25 | 0.63
2017 | 0.997 | 1.27 | 0.61
2021 | 0.998 | 1.17 | 0.62
Table A4. Data for UK.
Year, Party | R² | µ | σ
1808 | 0.996 | 1.24 | 0.68
1814 | 0.996 | 1.21 | 0.70
1815 | 0.998 | 1.16 | 0.68
1817 | 0.995 | 1.22 | 0.68
1819 | 0.993 | 1.26 | 0.71
1827 | 0.996 | 1.10 | 0.71
1830 | 0.998 | 1.20 | 0.65
1837 | 0.996 | 1.25 | 0.67
1842 | 0.994 | 1.22 | 0.71
1853 | 0.995 | 1.21 | 0.68
1868 | 0.998 | 1.20 | 0.70
1877 | 0.996 | 1.17 | 0.70
1893 | 0.993 | 1.15 | 0.73
1895 Liberal | 0.998 | 1.17 | 0.69
1896 Liberal | 0.998 | 1.17 | 0.67
1897 Liberal | 0.998 | 1.17 | 0.66
1897 Conservative | 0.998 | 1.19 | 0.67
1899 Liberal | 0.998 | 1.16 | 0.67
1900 Conservative | 0.998 | 1.18 | 0.63
1901 Liberal | 0.998 | 1.18 | 0.66
1902 Conservative | 0.997 | 1.20 | 0.68
1903 Conservative | 0.998 | 1.19 | 0.68
1903 Liberal | 0.998 | 1.16 | 0.67
1904 Conservative | 0.997 | 1.21 | 0.66
1905 Liberal | 0.997 | 1.21 | 0.68
1906 Conservative | 0.997 | 1.19 | 0.69
1907 | 0.998 | 1.14 | 0.69
1907 Conservative | 0.998 | 1.21 | 0.67
1907 Liberal | 0.998 | 1.18 | 0.67
1908 Liberal | 0.998 | 1.17 | 0.66
1908 Conservative | 0.998 | 1.20 | 0.66
1909 Liberal | 0.998 | 1.17 | 0.67
1909 Conservative | 0.998 | 1.18 | 0.66
1910 Liberal | 0.998 | 1.17 | 0.68
1910 Conservative | 0.998 | 1.18 | 0.66
1911 Conservative | 0.998 | 1.17 | 0.65
1912 Liberal | 0.997 | 1.17 | 0.68
1912 Conservative | 0.999 | 1.15 | 0.64
1913 Conservative | 0.999 | 1.16 | 0.63
1913 Liberal | 0.997 | 1.18 | 0.69
1917 | 0.999 | 1.18 | 0.66
1918 Liberal | 0.998 | 1.20 | 0.68
1919 Liberal | 0.997 | 1.18 | 0.69
1920 Liberal | 0.997 | 1.18 | 0.69
1920 Conservative | 0.999 | 1.11 | 0.64
1921 Liberal | 0.998 | 1.18 | 0.68
1921 Conservative | 0.999 | 1.17 | 0.63
1922 Liberal | 0.997 | 1.19 | 0.69
1922 Conservative | 0.998 | 1.13 | 0.67
1923 Liberal | 0.996 | 1.20 | 0.70
1924 Liberal | 0.995 | 1.23 | 0.70
1924 Labour | 0.997 | 1.19 | 0.68
1924 Conservative | 0.998 | 1.16 | 0.66
1925 Liberal | 0.995 | 1.26 | 0.69
1925 Conservative | 0.997 | 1.20 | 0.67
1926 Conservative | 0.998 | 1.21 | 0.66
1927 | 0.999 | 1.16 | 0.65
1927 Liberal | 0.999 | 1.20 | 0.61
1927 Conservative | 0.998 | 1.18 | 0.64
1928 Liberal | 0.998 | 1.18 | 0.67
1928 Conservative | 0.998 | 1.18 | 0.65
1929 Liberal | 0.999 | 1.20 | 0.62
1929 Conservative | 0.998 | 1.20 | 0.63
1930 Liberal | 0.997 | 1.21 | 0.67
1932 Conservative | 0.998 | 1.17 | 0.67
1932 Liberal | 0.997 | 1.24 | 0.66
1933 Conservative | 0.999 | 1.17 | 0.65
1934 Conservative | 0.998 | 1.16 | 0.67
1935 Conservative | 0.998 | 1.17 | 0.65
1936 Liberal | 0.994 | 1.26 | 0.68
1937 | 0.997 | 1.19 | 0.67
1937 Liberal | 0.996 | 1.22 | 0.68
1941 Liberal | 0.998 | 1.23 | 0.65
1942 Liberal | 0.996 | 1.24 | 0.67
1943 Liberal | 0.997 | 1.24 | 0.66
1945 Liberal | 0.996 | 1.23 | 0.67
1946 Labour | 0.998 | 1.20 | 0.67
1947 | 0.998 | 1.19 | 0.64
1947 Labour | 0.998 | 1.18 | 0.67
1948 Labour | 0.997 | 1.20 | 0.68
1949 Labour | 0.998 | 1.19 | 0.67
1950 Labour | 0.998 | 1.18 | 0.67
1951 Labour | 0.998 | 1.17 | 0.67
1955 Conservative | 0.998 | 1.18 | 0.65
1956 Conservative | 0.998 | 1.19 | 0.66
1957 | 0.999 | 1.19 | 0.61
1957 Conservative | 0.998 | 1.18 | 0.64
1958 Conservative | 0.998 | 1.21 | 0.65
1960 Conservative | 0.998 | 1.20 | 0.67
1961 Conservative | 0.998 | 1.19 | 0.65
1962 Conservative | 0.997 | 1.22 | 0.67
1963 Liberal | 0.998 | 1.20 | 0.65
1963 Conservative | 0.997 | 1.22 | 0.67
1964 Labour | 0.996 | 1.24 | 0.69
1965 Labour | 0.997 | 1.22 | 0.68
1965 Conservative | 0.999 | 1.19 | 0.65
1966 Labour | 0.994 | 1.28 | 0.69
1966 Conservative | 0.998 | 1.19 | 0.64
1967 | 0.997 | 1.20 | 0.68
1967 Labour | 0.997 | 1.25 | 0.67
1967 Conservative | 0.998 | 1.19 | 0.65
1968 Labour | 0.997 | 1.26 | 0.66
1968 Conservative | 0.998 | 1.23 | 0.66
1969 Labour | 0.997 | 1.24 | 0.65
1969 Conservative | 0.998 | 1.18 | 0.64
1970 Labour | 0.997 | 1.25 | 0.67
1970 Conservative | 0.997 | 1.19 | 0.69
1971 Conservative | 0.998 | 1.19 | 0.64
1971 Labour | 0.998 | 1.26 | 0.66
1972 Conservative | 0.998 | 1.18 | 0.66
1972 Labour | 0.998 | 1.22 | 0.66
1973 Conservative | 0.998 | 1.18 | 0.64
1973 Labour | 0.997 | 1.26 | 0.65
1974 Labour | 0.996 | 1.26 | 0.67
1975 Conservative | 0.998 | 1.20 | 0.64
1975 Labour | 0.996 | 1.28 | 0.68
1976 Conservative | 0.998 | 1.21 | 0.65
1976 Labour | 0.997 | 1.22 | 0.68
1977 Conservative | 0.998 | 1.19 | 0.63
1997 Labour | 0.998 | 1.25 | 0.65
1977 Liberal a | 0.995 | 1.25 | 0.69
1977 Liberal b | 0.995 | 1.26 | 0.70
1978 Conservative | 0.999 | 1.20 | 0.63
1978 Labour | 0.998 | 1.18 | 0.67
1978 Liberal | 0.997 | 1.25 | 0.66
1979 Conservative | 0.998 | 1.20 | 0.64
1979 Labour | 0.998 | 1.19 | 0.66
1979 Liberal | 0.997 | 1.24 | 0.67
1980 Conservative | 0.998 | 1.22 | 0.65
1980 Labour | 0.998 | 1.18 | 0.66
1980 Liberal | 0.995 | 1.27 | 0.68
1981 Conservative | 0.998 | 1.20 | 0.63
1981 Labour | 0.998 | 1.17 | 0.66
1981 Liberal | 0.997 | 1.25 | 0.67
1982 SDP-Liberal Alliance | 0.995 | 1.21 | 0.70
1982 Conservative | 0.998 | 1.21 | 0.62
1982 Labour | 0.999 | 1.15 | 0.65
1982 Liberal | 0.996 | 1.26 | 0.67
1983 Conservative | 0.998 | 1.21 | 0.63
1983 Labour | 0.998 | 1.16 | 0.65
1983 Liberal | 0.996 | 1.25 | 0.67
1984 Conservative | 0.998 | 1.21 | 0.64
1984 Labour | 0.997 | 1.22 | 0.65
1984 Liberal | 0.996 | 1.24 | 0.68
1985 Conservative | 0.998 | 1.24 | 0.63
1985 Labour | 0.998 | 1.21 | 0.65
1985 Liberal | 0.996 | 1.28 | 0.67
1986 Conservative | 0.997 | 1.27 | 0.62
1986 Labour | 0.997 | 1.22 | 0.66
1986 Liberal | 0.996 | 1.27 | 0.67
1987 | 0.996 | 1.30 | 0.65
1987 Conservative | 0.997 | 1.26 | 0.63
1987 Labour | 0.998 | 1.20 | 0.67
1987 SDP-Liberal Alliance a | 0.997 | 1.24 | 0.66
1987 SDP-Liberal Alliance b | 0.997 | 1.27 | 0.67
1988 Conservative | 0.997 | 1.26 | 0.63
1988 Labour | 0.996 | 1.25 | 0.67
1988 Liberal | 0.996 | 1.23 | 0.68
1989 Conservative | 0.998 | 1.23 | 0.62
1989 Labour | 0.998 | 1.19 | 0.66
1990 Labour | 0.996 | 1.24 | 0.66
1991 Conservative | 0.999 | 1.21 | 0.61
1991 Labour | 0.996 | 1.23 | 0.65
1992 Conservative | 0.999 | 1.21 | 0.61
1992 Labour | 0.997 | 1.22 | 0.66
1992 Liberal Democrat | 0.999 | 1.20 | 0.63
1993 Conservative | 0.999 | 1.20 | 0.60
1993 Labour | 0.998 | 1.23 | 0.63
1993 Liberal Democrat | 0.998 | 1.22 | 0.67
1994 Conservative | 0.998 | 1.21 | 0.61
1994 Labour | 0.998 | 1.20 | 0.65
1994 Liberal Democrat | 0.998 | 1.23 | 0.65
1995 Conservative | 0.999 | 1.21 | 0.61
1995 Labour | 0.997 | 1.21 | 0.63
1996 Conservative | 0.998 | 1.18 | 0.62
1996 Labour | 0.999 | 1.16 | 0.64
1996 Liberal Democrat | 0.998 | 1.22 | 0.62
1997 | 0.996 | 1.19 | 0.67
1997 Conservative | 0.998 | 1.19 | 0.63
1997 Labour | 0.998 | 1.19 | 0.64
1998 Conservative | 0.998 | 1.23 | 0.63
1998 Labour | 0.997 | 1.19 | 0.63
1998 Liberal Democrat | 0.998 | 1.23 | 0.63
1999 Conservative | 0.998 | 1.23 | 0.64
1999 Labour | 0.998 | 1.20 | 0.63
1999 Liberal Democrat a | 0.999 | 1.19 | 0.66
1999 Liberal Democrat b | 0.997 | 1.24 | 0.64
2000 Conservative | 0.999 | 1.21 | 0.61
2000 Labour | 0.997 | 1.24 | 0.64
2000 Liberal Democrat | 0.998 | 1.26 | 0.61
2001 Conservative | 0.997 | 1.23 | 0.63
2001 Labour | 0.997 | 1.24 | 0.64
2001 Liberal Democrat | 0.997 | 1.26 | 0.65
2002 Conservative | 0.999 | 1.21 | 0.60
2002 Labour | 0.998 | 1.22 | 0.64
2002 Liberal Democrat | 0.997 | 1.28 | 0.64
2003 Conservative | 0.999 | 1.23 | 0.60
2003 Labour | 0.998 | 1.19 | 0.63
2003 Liberal Democrat | 0.998 | 1.26 | 0.63
2004 Conservative | 0.999 | 1.20 | 0.61
2004 Labour | 0.998 | 1.21 | 0.62
2004 Liberal Democrat | 0.998 | 1.28 | 0.62
2005 Conservative | 0.999 | 1.17 | 0.63
2005 Labour | 0.998 | 1.21 | 0.64
2005 Liberal Democrat | 0.997 | 1.24 | 0.65
2006 Conservative a | 0.999 | 1.24 | 0.64
2006 Conservative b | 0.999 | 1.22 | 0.63
2006 Labour | 0.998 | 1.22 | 0.64
2006 Liberal Democrat | 0.997 | 1.24 | 0.65
2007 | 0.994 | 1.32 | 0.70
2007 Conservative | 0.999 | 1.17 | 0.60
2007 Labour | 0.998 | 1.19 | 0.64
2007 Liberal Democrat | 0.998 | 1.25 | 0.62
2008 Conservative | 0.999 | 1.21 | 0.63
2008 Labour | 0.998 | 1.21 | 0.61
2008 Liberal Democrat | 0.998 | 1.24 | 0.60
2009 Conservative | 0.999 | 1.20 | 0.60
2009 Labour | 0.999 | 1.21 | 0.60
2009 Liberal Democrat | 0.998 | 1.22 | 0.62
2010 Conservative | 0.999 | 1.22 | 0.60
2010 Labour | 0.998 | 1.18 | 0.64
2010 Liberal Democrat | 0.999 | 1.22 | 0.61
2011 Conservative | 0.999 | 1.22 | 0.60
2011 Labour | 0.999 | 1.18 | 0.59
2011 Liberal Democrat | 0.999 | 1.23 | 0.60
2012 Conservative | 0.999 | 1.19 | 0.58
2012 Liberal Democrat | 0.999 | 1.21 | 0.62
2013 Conservative | 0.999 | 1.20 | 0.58
2013 Labour | 0.999 | 1.15 | 0.58
2013 Liberal Democrat | 0.998 | 1.22 | 0.62
2014 Conservative | 0.999 | 1.18 | 0.59
2014 Labour | 0.999 | 1.19 | 0.60
2014 Liberal Democrat | 0.999 | 1.23 | 0.61
2015 Conservative | 0.998 | 1.21 | 0.59
2015 Labour | 0.997 | 1.24 | 0.62
2015 Liberal Democrat | 0.998 | 1.19 | 0.63
2016 Conservative | 0.998 | 1.23 | 0.62
2016 Labour | 0.997 | 1.28 | 0.63
2016 Liberal Democrat | 0.998 | 1.21 | 0.62
2017 | 0.997 | 1.25 | 0.62
2017 Conservative | 0.998 | 1.22 | 0.62
2017 Labour | 0.997 | 1.29 | 0.63
2017 Liberal Democrat | 0.996 | 1.25 | 0.67
2018 Conservative | 0.998 | 1.22 | 0.62
2018 Labour | 0.997 | 1.25 | 0.65
“a” and “b” denote different speeches by the same party during the same year.

References

  1. Altmann, G. Prolegomena to Menzerath’s law. Glottometrika 1980, 2, 1–10.
  2. Zipf, G.K. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology; Addison-Wesley Publishing: Cambridge, MA, USA, 1949.
  3. Popescu, I.; Naumann, S.; Kelih, E.; Rovenchak, A.; Overbeck, A.; Sanada, H.; Smith, R.D.; Čech, R.; Mohanty, P.; Wilson, A. Word length: Aspects and languages. Issues Quant. Linguist. 2013, 3, 224–281.
  4. Grzybek, P. (Ed.) Contributions to the Science of Text and Language: Word Length Studies and Related Issues; Springer Science & Business Media: Dordrecht, The Netherlands, 2006; Volume 31.
  5. Rottmann, O.A. On Word Length in German and Polish. Glottometrics 2018, 42, 13–20.
  6. Vieira, D.S.; Picoli, S.; Mendes, R.S. Robustness of sentence length measures in written texts. Phys. A Stat. Mech. Its Appl. 2018, 506, 749–754.
  7. Milička, J. Average Word Length from the Diachronic Perspective: The Case of Arabic. Linguist. Front. 2018, 1, 81–89.
  8. Liberman, M. Real Trends in Word and Sentence Length. 2011. Available online: http://languagelog.ldc.upenn.edu/nll (accessed on 25 December 2023).
  9. Tsizhmovska, N.L.; Martyushev, L.M. Principle of least effort and sentence length in public speaking. Entropy 2021, 23, 1023.
  10. Tucker, E.C.; Capps, C.J.; Shamir, L. A data science approach to 138 years of congressional speeches. Heliyon 2020, 6, e04417.
  11. Bochkarev, V.V.; Shevlyakova, A.V.; Solovyev, V.D. The average word length dynamics as an indicator of cultural changes in society. Soc. Evol. Hist. 2015, 14, 153–175.
  12. Lenard, D.B. Gender differences in the length of words and sentences on the corpus of congressional speeches. Imp. J. Interdiscip. Res. 2016, 2, 1417–1424.
  13. Corral, Á.; Serra, I. The brevity law as a scaling law, and a possible origin of Zipf’s law for word frequencies. Entropy 2020, 22, 224.
  14. Chen, H.; Liu, H. A diachronic study of Chinese word length distribution. Glottometrics 2014, 29, 81–94.
  15. Sigurd, B.; Eeg-Olofsson, M.; Van de Weijer, J. Word length, sentence length and frequency—Zipf revisited. Stud. Linguist. 2004, 58, 37–52.
  16. Grzybek, P. Word Length. In The Oxford Handbook of the Word; Taylor, J.R., Ed.; Oxford Academic: Oxford, UK, 2015.
  17. Torre, I.G.; Luque, B.; Lacasa, L.; Kello, C.T.; Hernández-Fernández, A. On the physical origin of linguistic laws and lognormality in speech. R. Soc. Open Sci. 2019, 6, 191023.
  18. Rosen, K.M. Analysis of speech segment duration with the lognormal distribution: A basis for unification and comparison. J. Phon. 2005, 33, 411–426.
  19. Kosmidis, K.; Kalampokis, A.; Argyrakis, P. Language time series analysis. Phys. A Stat. Mech. Its Appl. 2006, 370, 808–816.
  20. University of Virginia. Famous Presidential Speeches of the United States. Available online: https://millercenter.org/the-presidency/presidential-speeches (accessed on 1 January 2022).
  21. Inaugural Addresses of the Presidents of the United States. Available online: https://www.bartleby.com/124/ (accessed on 25 July 2021).
  22. British Political Speech; Swansea University: Swansea, UK. Available online: http://britishpoliticalspeech.org/index.htm (accessed on 25 July 2021).
  23. UK Parliament 2023. Hansard. Available online: https://hansard.parliament.uk/ (accessed on 1 January 2022).
  24. Montemurro, M.A.; Pury, P.A. Long-range fractal correlations in literary corpora. Fractals 2002, 10, 451–461.
  25. Grzybek, P. History and Methodology of Word Length Studies: The State of the Art; Springer: Dordrecht, The Netherlands, 2007; pp. 15–90.
  26. Tsizhmovska, N.L. Word Length in Public Speaking; Ural Federal University: Yekaterinburg, Russia, 2023; Available online: https://github.com/Kototiapa/Word-Length-in-Public-Speaking (accessed on 1 January 2022).
  27. Sobkowicz, P.; Thelwall, M.; Buckley, K.; Paltoglou, G.; Sobkowicz, A. Lognormal distributions of user post lengths in Internet discussions—A consequence of the Weber-Fechner law? EPJ Data Sci. 2013, 2, 1–20.
Figure 1. Example of the word length calculation by the algorithm used in the study.
Figure 2. Number of words in the text N versus time t. Black triangles indicate data for the UK and red circles indicate data for the USA.
Figure 3. Word length distribution histogram. The dashed red line is the lognormal distribution and the solid blue line is the Weibull distribution. The inset shows the best possible approximation of the histogram by a discrete Poisson distribution (black dots) and by a power function of the form const/(word length) (dashed green line). (a) USA speech delivered in 1873; 1334 words. The coefficient of determination is 0.995 for the lognormal distribution (µ = 1.21, σ = 0.71). (b) UK speech delivered in 1868; 1595 words. The coefficient of determination is 0.998 for the lognormal distribution (µ = 1.20, σ = 0.70).
Figure 4. Word length distribution histogram. The dashed red line is the lognormal distribution and the solid blue line is the Weibull distribution. The inset shows the best possible approximation of the histogram by a discrete Poisson distribution (black dots) and by a power function of the form const/(word length) (dashed green line). (a) USA speech delivered in 1977; 1213 words. The coefficient of determination is 0.998 for the lognormal distribution (µ = 1.20, σ = 0.65). (b) UK speech (Conservative party) delivered in 1978; 4898 words. The coefficient of determination is 0.999 for the lognormal distribution (µ = 1.20, σ = 0.63).
Figure 5. Average word length A as a function of the time t when the speech was delivered. Red circles indicate data for the USA and black triangles indicate data for the UK.
Figure 6. Mode word length M as a function of the time t when the speech was delivered. Red circles indicate data for the USA and black triangles indicate data for the UK.
Figure 7. Behavior of the parameter µ of the lognormal distribution versus the time t. Red circles indicate data for the USA and black triangles indicate data for the UK.
Figure 8. Behavior of the parameter σ of the lognormal distribution versus the time t. Red circles indicate data for the USA and black triangles indicate data for the UK.
Figure 9. Lognormal distributions showing the change in the word length distribution over time. Data for the USA: the µ parameter is 1.25, and the σ parameter is 0.68 (red line), 0.64 (blue line) and 0.60 (green line) for 1815, 1915 and 2015, respectively.
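The effect of the Figure 9 parameters can be checked with the standard closed-form expressions for a continuous lognormal distribution: mode = exp(µ − σ²) and mean = exp(µ + σ²/2). The short Python sketch below evaluates both for the three σ values in the caption. The function and variable names are hypothetical, and the continuous mode computed here is not the same quantity as the integer-valued histogram mode M of Figure 6.

```python
import math


def lognormal_mode(mu, sigma):
    """Mode of a continuous lognormal distribution: exp(mu - sigma^2)."""
    return math.exp(mu - sigma ** 2)


def lognormal_mean(mu, sigma):
    """Mean of a continuous lognormal distribution: exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2)


# Parameters quoted in the Figure 9 caption (USA data).
MU = 1.25
SIGMA_BY_YEAR = {1815: 0.68, 1915: 0.64, 2015: 0.60}

summary = {
    year: (lognormal_mode(MU, s), lognormal_mean(MU, s))
    for year, s in SIGMA_BY_YEAR.items()
}
for year, (mode, mean) in summary.items():
    print(f"{year}: mode = {mode:.2f}, mean = {mean:.2f}")
```

With µ held fixed, a smaller σ moves the mode up and the mean down, so the distribution narrows from both sides while its median exp(µ) stays put.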
Table 1. Ranking of distributions according to the coefficient of determination criterion. USA speeches. The total number is 82.
Place   Lognormal   Weibull   Folded Normal   Rayleigh   Half Normal
1       58          24        0               0          0
2       24          58        0               0          0
3       0           0         67              15         0
4       0           0         15              9          58
5       0           0         0               14         68
6       0           0         0               44         38
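A ranking table of this kind can be built by sorting the distributions by their coefficient of determination for each speech and counting how often each one takes each place. The Python sketch below shows one way to do this; the R² values, the reduced set of three distributions and the function names are hypothetical and are not data from the paper.

```python
from collections import Counter


def rank_distributions(r2_by_distribution):
    """Order distribution names from best (highest R^2) to worst.

    Ties keep the dictionary's insertion order because sorted() is stable.
    """
    return sorted(r2_by_distribution, key=r2_by_distribution.get, reverse=True)


def tally_places(speeches):
    """Count how often each distribution takes each place across speeches."""
    tally = Counter()
    for r2_by_distribution in speeches:
        ranking = rank_distributions(r2_by_distribution)
        for place, name in enumerate(ranking, start=1):
            tally[(place, name)] += 1
    return tally


# Hypothetical R^2 values for three speeches; not data from the paper.
speeches = [
    {"Lognormal": 0.998, "Weibull": 0.985, "Rayleigh": 0.910},
    {"Lognormal": 0.991, "Weibull": 0.994, "Rayleigh": 0.905},
    {"Lognormal": 0.999, "Weibull": 0.980, "Rayleigh": 0.930},
]

tally = tally_places(speeches)
names = ["Lognormal", "Weibull", "Rayleigh"]
for place in range(1, len(names) + 1):
    row = [tally[(place, name)] for name in names]
    print(place, row)
```

Each printed row corresponds to one place, with one count per distribution, so every row sums to the number of speeches.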
Table 2. Ranking of distributions according to the coefficient of determination criterion. UK speeches. The total number is 245.
Place   Lognormal   Weibull   Folded Normal   Rayleigh   Half Normal
1       230         15        0               0          0
2       14          230       1               0          0
3       0           0         145             99         1
4       0           0         99              83         63
5       0           0         0               37         208
6       1           0         0               26         218
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.