Open Access Article

Recurrence Networks in Natural Languages

1 Departamento de Física, Escuela Superior de Física y Matemáticas, Ciudad de México 07738, Mexico
2 Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, Ciudad de México 07340, Mexico
3 Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico
4 Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Zacatecas 98000, Mexico
5 Department of Physics, Queens College, City University of New York, New York, NY 11367, USA
6 Advanced Consortium on Cooperation, Conflict, and Complexity (AC4), Earth Institute, Columbia University, New York, NY 10027, USA
7 Graduate Center, City University of New York, New York, NY 10016, USA
* Author to whom correspondence should be addressed.
Entropy 2019, 21(5), 517; https://doi.org/10.3390/e21050517
Received: 7 April 2019 / Revised: 14 May 2019 / Accepted: 17 May 2019 / Published: 23 May 2019
(This article belongs to the Special Issue Entropy, Nonlinear Dynamics and Complexity)
We present a study of natural language using the recurrence network method. In our approach, the repetition of patterns of characters is evaluated without considering the word structure in written texts from different natural languages. Our dataset comprises 85 eBooks written in 17 different European languages. The similarity between patterns of length m is determined by the Hamming distance, and a threshold value r defines a match between two patterns, i.e., a repetition occurs if the Hamming distance is less than or equal to r. In this way, we calculate the adjacency matrix, where a connection between two nodes exists when a match occurs. Next, the recurrence network is constructed for the texts and some representative network metrics are calculated. Our results show that the average values of network density, clustering, and assortativity are larger than those of the corresponding shuffled versions, while for metrics such as closeness, both original and random sequences exhibit similar values. Moreover, our calculations show similar average values for density among languages that belong to the same linguistic family. In addition, the application of a linear discriminant analysis leads to well-separated clusters of language families based on the network-density properties. Finally, we discuss our results in the context of the general characteristics of written texts.
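
The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that patterns are overlapping character windows of length m, that the network is undirected, and that self-loops are excluded; the function names `hamming`, `recurrence_adjacency`, and `network_density` are hypothetical and chosen only for this example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def recurrence_adjacency(text, m, r):
    """Adjacency matrix linking patterns whose Hamming distance is <= r.

    Assumption: patterns are overlapping windows of m characters, so a
    text of length N yields N - m + 1 nodes.
    """
    n = len(text) - m + 1
    if n < 1:
        raise ValueError("text is shorter than the pattern length m")
    patterns = [text[i:i + m] for i in range(n)]
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # assumption: no self-loops
            if hamming(patterns[i], patterns[j]) <= r:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


def network_density(adjacency):
    """Fraction of possible undirected edges that are present."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    edges = sum(sum(row) for row in adjacency) / 2
    return edges / (n * (n - 1) / 2)


# Toy input: "abcabc" with m = 3 gives the patterns abc, bca, cab, abc.
A = recurrence_adjacency("abcabc", m=3, r=0)
print(network_density(A))
```

With r = 0 only exact repetitions are linked, so the toy text has a single edge between the two occurrences of "abc". The pairwise loop is quadratic in the number of patterns, so this sketch is suitable only for short sequences.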
Keywords: recurrence networks; natural languages; patterns repetition

MDPI and ACS Style

Baeza-Blancas, E.; Obregón-Quintana, B.; Hernández-Gómez, C.; Gómez-Meléndez, D.; Aguilar-Velázquez, D.; Liebovitch, L.S.; Guzmán-Vargas, L. Recurrence Networks in Natural Languages. Entropy 2019, 21, 517.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
