# Statistics and Machine Learning Experiments in English and Romanian Poetry

## Abstract


## 1. Introduction

## 2. Material and Methods

#### 2.1. Letter Frequency
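
The letter-frequency estimate is straightforward to reproduce; the following is a minimal Python sketch (the choice to fold case and drop non-letter characters is an assumption consistent with the 26-letter alphabet of the tables in Section 3):

```python
from collections import Counter

def letter_frequencies(text):
    """Relative frequency of each letter; case-folded, non-letters dropped."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.items()}

# Example: frequencies sum to 1 over the letters that occur.
freq = letter_frequencies("To be, or not to be")
print(freq["o"])  # "o" occurs 4 times among the 13 letters
```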

#### 2.2. Informational Entropy
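
Shannon's entropy of a letter distribution can be sketched directly from its definition (with the usual convention that zero-probability terms contribute nothing):

```python
import math

def shannon_entropy(probs):
    """H(p) = -sum_i p_i * log2(p_i), in bits; zero-probability terms drop out."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform distribution over the 26-letter alphabet attains the maximum,
# log2(26) ≈ 4.70 bits; the texts in Section 3 score roughly 4.1-4.2 bits.
print(shannon_entropy([1 / 26] * 26))
```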

#### 2.3. Informational Energy
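
Onicescu's informational energy is the sum of the squared probabilities; a minimal sketch:

```python
def informational_energy(probs):
    """Onicescu's informational energy: E(p) = sum_i p_i^2.
    Minimal (1/n) for the uniform distribution over n outcomes, maximal (1)
    when one outcome is certain; it moves opposite to the entropy."""
    return sum(p * p for p in probs)

# Uniform over 26 letters gives 1/26 ≈ 0.0385; the values reported in
# Section 3 (≈ 0.031-0.042) sit near this uniform baseline.
print(informational_energy([1 / 26] * 26))
```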

#### 2.4. Markov Processes
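
A first-order character-level Markov chain can be estimated from a text by counting transitions between consecutive characters; the sketch below (illustrative only, not the paper's estimation code) builds the transition counts and samples a sequence from the fitted chain:

```python
import random
from collections import Counter, defaultdict

def transition_counts(text):
    """Count character-to-character transitions (first-order Markov chain)."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def sample_chain(counts, start, length, seed=0):
    """Generate text by repeatedly sampling the next character
    proportionally to the observed transition counts."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        nxt = counts.get(out[-1])
        if not nxt:
            break  # no observed successor: stop early
        chars, weights = zip(*nxt.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)

print(sample_chain(transition_counts("abab"), "a", 5))  # deterministic here: "ababa"
```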

#### 2.5. N-Gram Entropy
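
The n-gram (block) entropy divides the entropy of the empirical n-gram distribution by n, giving an estimate in bits per character; a sketch:

```python
import math
from collections import Counter

def ngram_entropy(text, n):
    """Per-character entropy of the empirical n-gram distribution, H_n / n."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / n

# For highly repetitive text the per-character estimate drops as n grows.
print(ngram_entropy("abababab", 1), ngram_entropy("abababab", 2))
```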

#### 2.6. Language Entropy
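
A standard definition consistent with the n-gram entropies of Section 2.5, stated here as the usual limit (the paper's exact formulation is not reproduced in this extract): the entropy of the language is the limiting per-character block entropy over the alphabet $\mathcal{A}$, with $q$ the empirical n-gram distribution,

```latex
H_{\text{lang}} = \lim_{n \to \infty} \frac{H_n}{n},
\qquad
H_n = -\sum_{g \in \mathcal{A}^n} q(g)\, \log_2 q(g).
```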

#### 2.7. Conveyed Information

**Proposition 1.**

**Proof.** (i)

#### 2.8. N-Gram Informational Energy
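
The same empirical n-gram distribution yields the n-gram informational energy by squaring probabilities instead of taking logarithms; a sketch:

```python
from collections import Counter

def ngram_energy(text, n):
    """Onicescu energy of the empirical n-gram distribution: sum_g q(g)^2."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return sum((c / total) ** 2 for c in counts.values())

print(ngram_energy("abababab", 1))  # two equiprobable unigrams: 0.5
```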

#### 2.9. Comparison of Two Texts
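
The exact comparison statistic is not reproduced in this extract; as one plausible sketch, two texts can be compared through the distance between their letter-frequency vectors (the Euclidean metric here is an assumption, not necessarily the paper's choice):

```python
import math
from collections import Counter

def letter_frequencies(text):
    """Relative letter frequencies, case-folded, non-letters dropped."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.items()}

def distribution_distance(p, q):
    """Euclidean distance between two frequency dictionaries."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))

d = distribution_distance(letter_frequencies("to be or not"),
                          letter_frequencies("to be or not"))
print(d)  # identical texts are at distance 0.0
```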

#### 2.10. Neural Networks in Poetry

#### 2.10.1. RNNs

#### 2.10.2. Data Collection

#### 2.10.3. Data Processing and Clean-up

#### 2.10.4. Choosing the Model
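
Figure 5 describes the chosen model: 70 LSTM cells, one dense layer, and a softmax output. At generation time such a model emits a probability distribution over characters, and the next character is sampled from it. The sketch below shows only that sampling step; the temperature parameter is an assumption, a standard knob in character-level generation (low temperature makes output conservative, high temperature more adventurous):

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores to a probability distribution; temperature < 1
    sharpens the distribution, temperature > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_char(chars, logits, temperature=1.0, seed=None):
    """Sample one character from the softmax distribution over `chars`."""
    rng = random.Random(seed)
    return rng.choices(chars, weights=softmax(logits, temperature))[0]

# At very low temperature, sampling is nearly greedy.
print(sample_char(["a", "b"], [2.0, 0.0], temperature=0.1, seed=0))
```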

## 3. Results and Discussion

- what! not a word!–and am i the the cisseth pantses susthe the the lof lelen ban depilers in the shele whe whas the as hof bathe the with lens on bo sint the wor thire ched the the the ghordes ang the aned fors and rore dis the the the sors,

- Wake a wildly thrilling measure.
- There will my gentle girl and i,
- The bare pus the soun mare shes and the stist,
- And the and will sost an fing sote no on ass som yian,
- A whis for atile and or souns and pey a porin dave sis bund and are.

- Tha thaigh cill list the the beraich be for withe hond,
- Far om dill and butting of wole,
- O sould af mare hor wore sour;
- To will the ast dore he ares ereis s soar?

- No just applause her honoured name
- Of idiots that infest her age;
- No just applause her honoured name of strain,
- And still the last his breathed lings his press,
- Thou not his art of heart to loves of may,
- The spare that mays the ding the rain,
- Which cares the shep her loves of prose;
- Then heart the such the gleam the live decear;
- This still the gard, our thee in parter’s fiest
- The love through may the word,
- When bay and mights of some the song the new
- The sits again, and sing…

- Outhful eclogues of our pope?
- Yet his and philips’ faults, of differs still;
- To hours, where still the promenty glow.
- The dearts and the tome to dark,
- And to the child and fears of the and the farre;
- Though her decerance more thy poes’s to dark,
- And path of a still howard of day!
- Your conful in so later her stord,
- And praise of the dear the day,
- In proded the former frem this such,
- In the soul in sumery vain,
- And so and the Lamble…

- The hall of my fathers, art gone to decay;
- In thy once smiling garms and stranger striating and and his die;
- The bounding poud the spreise of fall the live,
- The classic poed to the dear of dray–
- To must the dear a strong the combon the forth
- The great by the dear with this the breathing, well,
- In all the percid the thanding showe;
- When prate the that a some his will the say;
- Thou the eyes of the bard, and thought the sungod a fill;

- A transport of young indignation,
- With fervent contempt evermore to disdain you:
- I seembrate to the sought for the faint shill
- The gark her sweets thou greated of the sain,
- A dele the praise the seem and merben meet,
- I change eye may of thy spirit lead,
- No verse thy fain as gale the sight.

- 5.
- Then, the more the durming fort confine!
- The pallage in once the piling fain,
- And not stranger the saint despise me to fair,
- The lovering deep and the prines of critic…

- Few short years will shower
- The gift of riches, and the pride of power;
- E’en now a name the seems who the prophears to fear,
- And stripling band the heart they discounts all the trays!
- One back with must a seem shill the pands,
- The love and the forger disgreand ford;
- And fur the versance the lead at lade;
- Who bard my path, and bounce love alone;
- The daughter land admartuner sweet the speak, while,
- May speak and the some fain the gold my pease,…

- And so perhaps you’ll say of me,
- In which your readers may agree.
- Still I write on, and so mine thee;
- Yet with the heart of earmoness of and cartale.

- 1.
- What and a seem the only soul its me,
- Who balling all of the shall fordst these page bare:
- And the shall helf the partless deep for the cheep;
- What shine! though all some can must here the seif.
- The tells the palling, while the charms the plies;
- And for the sight! when they the heart the prine,

- Hope’s endeavour,
- Or friendship’s tears, pride rush’d between,
- And blotted out the mander great.
- Who fate, though for the right the still grew,
- To siek’st envesing in stream and tome,
- In parth will be my faster and dear,
- To gromend the dare, which lead the glord,
- The pence with the steel to the foem
- And the tarth, i change the rofty cound,
- This cale the ang repire the view,
- For the the saight we fore before,
- And the fall to the deastex flow,…

- Live me again a faithful few,
- In years and feelings still the same,
- And I will fly the hoother shine,
- And child the dast the tarth with
- The tone the fall thy stain’d or sweet,
- And then long the pang the nommer reat,
- When lut the well the sames like the stand,
- When wit the changuid our foight with still thee,
- What mingtred her song each senate,
- And brand the senty art steps record,
- The spring the first by poesion the faint,
- Some light the blest the rest to sig…

- Would have woman from paradise driven;
- Instead of his houris, a flimsy pretence,
- While voice is and the buld they consuce,
- Which menting parragues, chilfoued the resure
- And to might with from the gold a boy,
- The seems my son to cartan’s plaintus for the
- The ear in the still be the praces,
- And life the disonge to shere a chance;
- The mowting pours, the sonite of the fore
- The pures of sain, and all her all,
- And the presert, conal all, or the steems,…

- Lengthened line majestic swim,
- The last display the free unfettered limb!
- Those for him as here a gay and speet,
- Revile to prayial though thy soul and the such
- Roprowans his view where the happling hall:
- He feelin sense with the gloolish fair to the daw;
- The scane the pands unpressar of all the shakenome from.

## 4. Conclusions and Future Directions

## Funding

## Conflicts of Interest

## References

- Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
- Shannon, C. Prediction and entropy of printed English. Bell Syst. Tech. J. **1951**, 30, 50–64.
- Cover, T.M.; King, R.C. A convergent gambling estimate of the entropy of English. IEEE Trans. Inf. Theory **1978**, 24, 413–421.
- Teahan, W.J.; Cleary, J.G. The entropy of English using PPM-based models. In Proceedings of the Data Compression Conference—DCC ’96, Snowbird, UT, USA, 31 March–3 April 1996; pp. 53–62.
- Kontoyiannis, I. The Complexity and Entropy of Literary Styles; NSF Technical Report No. 97; Department of Statistics, Stanford University: Stanford, CA, USA, 1996.
- Guerrero, F.G. A New Look at the Classical Entropy of Written English. Available online: https://arxiv.org/ftp/arxiv/papers/0911/0911.2284.pdf (accessed on 10 October 2020).
- Moradi, H.; Grzymala-Busse, J.W.; Roberts, J.A. Entropy of English text: Experiments with humans and a machine learning system based on rough sets. Inf. Sci. **1998**, 104, 31–47.
- Maixner, V. Some remarks on entropy prediction of natural language texts. Inf. Storage Retr. **1971**, 7, 293–295.
- Savchuk, A.P. On the evaluation of the entropy of language using the method of Shannon. Theory Probab. Appl. **1964**, 9, 154–157.
- Nemetz, T. On the experimental determination of the entropy. Kybernetik **1972**, 10, 137–139.
- Basharin, G.P. On a statistical estimate for the entropy of a sequence of independent random variables. Theory Probab. Appl. **1959**, 4, 333–336.
- Blyth, C.R. Note on estimating information. Ann. Math. Stat. **1959**, 30, 71–79.
- Pfaffelhuber, E. Error estimation for the determination of entropy and information rate from relative frequency. Kybernetik **1971**, 8, 350–351.
- Grignetti, M. A note on the entropy of words in printed English. Inf. Control **1964**, 7, 304–306.
- Treisman, A. Verbal responses and contextual constraints in language. J. Verbal Learn. Verbal Behav. **1965**, 4, 118–128.
- White, H.E. Printed English compression by dictionary encoding. Proc. IEEE **1967**, 5, 390–396.
- Miller, G.A.; Selfridge, J.A. Verbal context and the recall of meaningful material. Am. J. Psychol. **1950**, 63, 176–185.
- Jamison, D.; Jamison, K. A note on the entropy of partially known languages. Inf. Control **1968**, 12, 164–167.
- Wanas, M.A.; Zayed, A.I.; Shaker, M.M.; Taha, E.H. First, second and third order entropies of Arabic text. IEEE Trans. Inf. Theory **1976**, 22, 123.
- Guerrero, F.G. On the entropy of written Spanish. Rev. Colomb. Estad. **2012**, 35, 425–442.
- Barnard, G.A. Statistical calculation of word entropies for four western languages. IRE Trans. Inf. Theory **1955**, 1, 49–53.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley and Sons, Inc.: New York, NY, USA, 2012.
- Rosso, O.A.; Craig, H.; Moscato, P. Shakespeare and other English Renaissance authors as characterized by Information Theory complexity quantifiers. Phys. A Stat. Mech. Appl. **2009**, 388, 916–926.
- Lesher, G.W.; Moulton, B.J.; Higginbotham, D.J. Limits of Human Word Prediction Performance. 2002. Available online: http://www.csun.edu/~hfdss006/conf/2002/proceedings/196.htm (accessed on 15 November 2020).
- Grzymala-Busse, J.W. LERS—A system for learning from examples based on rough sets. In Intelligent Decision Support, Handbook of Applications and Advances of the Rough Set Theory; Slowinski, R., Ed.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1992; pp. 3–18.
- Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. **1982**, 11, 341–356.
- Pawlak, Z. Rough classification. Int. J. Man Mach. Stud. **1984**, 20, 469–483.
- Boulanger-Lewandowski, N.; Bengio, Y.; Vincent, P. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In Proceedings of the Twenty-Ninth International Conference on Machine Learning (ICML’12), Edinburgh, UK, 26 June–1 July 2012.
- Eck, D.; Schmidhuber, J. A First Look at Music Composition Using LSTM Recurrent Neural Networks; Technical Report No. IDSIA-07-02; IDSIA USI-SUPSI, Istituto Dalle Molle di Studi sull’Intelligenza Artificiale: Manno, Switzerland, 2002.
- Sutskever, I.; Hinton, G.E.; Taylor, G.W. The recurrent temporal restricted Boltzmann machine. In Proceedings of the 21st International Conference on Neural Information Processing Systems (NIPS’08), Vancouver, BC, Canada, 8–11 December 2008; pp. 1601–1608.
- Sutskever, I.; Martens, J.; Hinton, G.E. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), Bellevue, WA, USA, 28 June–2 July 2011.
- Fermi, E. Thermodynamics; Dover Publications, Inc.: New York, NY, USA, 1956.
- Onicescu, O. Energie informationala (Informational energy). St. Cerc. Math. **1966**, 18, 1419–1430.
- Onicescu, O. Théorie de l’information. Énergie informationnelle. C. R. Acad. Sci. Paris Sér. A **1966**, 263, 841–842.
- Onicescu, O.; Stefanescu, V. Elemente de Statistica Informationala cu Aplicatii (Elements of Informational Statistics with Applications); Editura Tehnica: Bucharest, Romania, 1979.
- Halliday, D.; Resnick, R. Fundamentals of Physics; John Wiley and Sons, Inc.: New York, NY, USA, 1974.
- Marcus, S. Poetica Matematica (Mathematical Poetics); Editura Academiei Republicii Socialiste Romania: Bucharest, Romania, 1970.
- Rizescu, D.; Avram, V. Using Onicescu’s informational energy to approximate social entropy. Procedia Soc. Behav. Sci. **2014**, 114, 377–381.
- Bailey, K. System entropy analysis. Kybernetes **1997**, 26, 674–687.
- Gray, R. Entropy and Information Theory, 2nd ed.; Springer: New York, NY, USA, 2011.
- Agop, M.; Gavrilut, A.; Rezus, E. Implications of Onicescu’s informational energy in some fundamental physical models. Int. J. Mod. Phys. B **2015**, 29, 1550045.
- Alipour, M.; Mohajeri, A. Onicescu information energy in terms of Shannon entropy and Fisher information densities. Mol. Phys. **2012**, 110, 403–405.
- Chatzisavvas, K.; Moustakidis, C.; Panos, C. Information entropy, information distances, and complexity in atoms. J. Chem. Phys. **2005**, 123, 174111.
- Nielsen, F. A Note on Onicescu’s Informational Energy and Correlation Coefficient in Exponential Families. arXiv **2020**, arXiv:2003.13199.
- Calin, O.; Udriste, C. Geometric Modeling in Probability and Statistics; Springer: New York, NY, USA, 2014.
- Gagniuc, P.A. Markov Chains: From Theory to Implementation and Experimentation; John Wiley & Sons: Hoboken, NJ, USA, 2020.
- Øksendal, B.K. Stochastic Differential Equations: An Introduction with Applications, 6th ed.; Springer: Berlin, Germany, 2003.
- Calin, O. Deep Learning Architectures—A Mathematical Approach; Springer Series in Data Sciences; Springer: New York, NY, USA, 2020.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature **1986**, 323, 533–536.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. **1997**, 9, 1735–1780.

**Figure 1.** The letter probability distribution for: (**a**) Jefferson the Virginian; (**b**) Othello; (**c**) Memories of Childhood (in Romanian); (**d**) The Morning Star (in Romanian).

**Figure 2.** An interpretation of the informational energy as a moment of inertia: distribution (**a**) rotates about the x-axis more easily than distribution (**b**), so it has a smaller moment of inertia.

**Figure 5.** A simple RNN with 70 LSTM cells, followed by one dense layer and a softmax activation output.

**Figure 6.** The characters present in Byron’s poetry. Each character is enclosed in single quotation marks.

| $X$ | a | b | ⋯ | z |
|---|---|---|---|---|
| $P(X)$ | $p(a)$ | $p(b)$ | ⋯ | $p(z)$ |

| Title | Entropy |
|---|---|
| Jefferson the Virginian | 4.158 |
| Nietzsche | 4.147 |
| Genesis | 4.105 |
| Macbeth | 4.187 |
| Romeo and Juliet | 4.182 |
| Othello | 4.160 |
| King Richard III | 4.214 |
| Byron vol. I | 4.184 |
| Byron vol. II | 4.184 |

| Author | Entropy |
|---|---|
| Eminescu | 3.875 |
| Ispirescu | 3.872 |
| Creanga | 3.877 |

| Title | Information Energy |
|---|---|
| Jefferson the Virginian | 0.0332 |
| Nietzsche | 0.0334 |
| Genesis | 0.0348 |
| Macbeth | 0.0319 |
| Romeo and Juliet | 0.0323 |
| Othello | 0.0327 |
| King Richard III | 0.0314 |
| Byron vol. I | 0.0322 |
| Byron vol. II | 0.0324 |

| Author | Information Energy |
|---|---|
| Eminescu | 0.0368 |
| Ispirescu | 0.0422 |
| Creanga | 0.0399 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Calin, O.
Statistics and Machine Learning Experiments in English and Romanian Poetry. *Sci* **2020**, *2*, 92.
https://doi.org/10.3390/sci2040092

**AMA Style**

Calin O.
Statistics and Machine Learning Experiments in English and Romanian Poetry. *Sci*. 2020; 2(4):92.
https://doi.org/10.3390/sci2040092

**Chicago/Turabian Style**

Calin, Ovidiu.
2020. "Statistics and Machine Learning Experiments in English and Romanian Poetry" *Sci* 2, no. 4: 92.
https://doi.org/10.3390/sci2040092