Method of Distinguishing Styles by Fractal and Statistical Indicators of the Text as a Sequence of the Number of Letters in Its Words
Abstract
1. Introduction
- Descriptive statistics;
- Correlation analysis between the original text and its translations;
- Approximation of histograms by the number of letters in words.
- The text is represented as a regular sequence of random events, without semantic representation. This allows the classical methods of time series analysis to be used.
- A method for calculating the exact value of the fractal dimension is developed.
- A fractal analysis model is presented that can be used to calculate the Hurst index.
2. State of the Art
- Fractal analysis (fractal dimension, Hurst index and constant);
- R/S power dependence;
- Phase analysis (quasi-cycle parameters); and
- Construction of recurrence diagrams.
3. Materials and Methods
3.1. Text Representation as a Regular Sequence of Random Events
- A regular sequence of elements, when the intervals between them are precisely defined and do not change their size, i.e., ;
- If there are permissible deviations from regularity, then ;
- Intervals are random variables: .
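The representation used throughout the paper reduces a text to the number of letters in each of its words. A minimal sketch of that conversion is given below; the tokenization rule (letters of any script, with apostrophes and hyphens joining parts of one word) and the function name `word_lengths` are assumptions made for illustration, not the authors' preprocessing.

```python
import re

# Assumed word pattern: runs of letters (any script); an apostrophe or hyphen
# may join two runs into a single word.
_WORD = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def word_lengths(text):
    """Return the number of letters in each word, in reading order."""
    # Only alphabetic characters are counted, so "isn't" has 4 letters.
    return [sum(ch.isalpha() for ch in word) for word in _WORD.findall(text)]


print(word_lengths("The text is a regular sequence."))  # [3, 4, 2, 1, 7, 8]
```

The resulting list of integers is the sequence on which all later indicators are computed; a different treatment of hyphenated words or digits would change the sequence slightly.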
3.2. Fractal Analysis of Regular Sequence
- The fractal dimension; and
- The related indicator, which is often called the Hurst index, the trend indicator, or the Hurst exponent.
3.2.1. Cellular Method for Determining the Fractal Dimension
3.2.2. Determination of the Hurst Index
3.2.3. Determination of the Power Dependence Constant
3.3. Fractal Analysis Model
4. Results
4.1. Text Preparation
4.2. Setting the Size of the Grid Cells
4.3. Determining the Number of Cells
- (1) The number of grid cells is determined for each group by covering, with vertical cells, the range between the minimum and maximum values of the elements in that group.
- (2) The fractal dimension is calculated. The essence of the proposed method is as follows.
- (3) For each group and each grid, one calculates the difference between the maximum and minimum values of the elements in the group and divides it by the size of the grid cell. Obviously, the group size and the cell size must match, and or the number of cells of a specific grid size can be determined as follows:
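The counting rule in step (3) can be sketched as follows. The sketch assumes that the sequence is cut into consecutive, non-overlapping groups of `m` words and that a shorter final group is still counted; the function name `count_cells` is hypothetical. The sample values are the letter counts listed in the first 11 rows of Table 1.

```python
def count_cells(lengths, m):
    """Number of cells of size m covering the sequence (may be fractional)."""
    total = 0.0
    for start in range(0, len(lengths), m):
        group = lengths[start:start + m]  # the last group may be shorter than m
        # Vertical extent of the group, measured in units of the cell size.
        total += (max(group) - min(group)) / m
    return total


sample = [6, 10, 3, 5, 7, 6, 9, 1, 5, 13, 4]  # letter counts, rows 1-11 of Table 1
print(count_cells(sample, 2))  # 11.5
```

For `m = 2` the groups are (6, 10), (3, 5), (7, 6), (9, 1), (5, 13), and (4), which contribute 2 + 1 + 0.5 + 4 + 4 + 0 = 11.5 cells.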
4.4. Determination of the Fractal Dimension D of the Sequence
- (a) Column O, starting with cell O2, contains the number of letters in each word of a particular text, as shown in Table 1;
- (b) The first row of columns P, Q, R, S, and T (cells P1, Q1, R1, S1, and T1) gives the nominal size of the groups into which the sequence is divided, which corresponds to the cell size of a particular grid. Cells P2, Q2, R2, S2, and T2 contain the formulas that give the size of the fractal cell;
- (c) Further calculations are performed according to Formula (5);
- (d) Autocomplete fills the columns with the values of the differences over the specified intervals;
- (e) The formulas from step (c) calculate the number of cells in a group of words for the cell size of each grid. However, because autocomplete produces a “sliding” calculation, the division of the sequence into groups is lost. Therefore, to obtain the exact number of grid cells that cover the graph in accordance with Formula (5), only certain values in each of the columns P, Q, R, S, and T are summed, always starting from the first cell: every second cell for the grid with cell size m = 2 (P2, P4, P6, …), every third cell for m = 3 (Q2, Q5, Q8, …), every fourth cell for m = 4 (R2, R6, R10, …), every fifth cell for m = 5 (S2, S7, S12, …), and every sixth cell for m = 6 (T2, T8, T14, …);
- (f) To find the value of the fractal dimension, a table of correspondence between the grid cell size and the number of cells of that grid is compiled (Table 2).
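The spreadsheet procedure in steps (a) to (e) can be mirrored in a few lines. The sketch below assumes that each autocompleted cell holds (max − min) / m over the window of m words starting at that row, with the window truncated at the end of the sequence; this assumption reproduces the m = 2 column printed in Table 1. The names `sliding_column` and `cells_from_column` are hypothetical.

```python
def sliding_column(lengths, m):
    """Autocompleted column: (max - min) / m over the window starting at each row."""
    column = []
    for i in range(len(lengths)):
        window = lengths[i:i + m]  # truncated near the end of the sequence
        column.append((max(window) - min(window)) / m)
    return column


def cells_from_column(column, m):
    """Sum every m-th entry, starting with the first, to restore the grouping."""
    return sum(column[::m])


letters = [6, 10, 3, 5, 7, 6, 9, 1, 5, 13, 4]  # column O, rows 1-11 of Table 1
col_p = sliding_column(letters, 2)
print(col_p)  # [2.0, 3.5, 1.0, 1.0, 0.5, 1.5, 4.0, 2.0, 4.0, 4.5, 0.0]
print(cells_from_column(col_p, 2))  # 11.5
```

Summing every m-th entry is equivalent to cutting the sequence into non-overlapping groups of m words, which is why the strided sum, rather than the sum of the whole column, gives the number of cells.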
4.5. Determining the Hurst Exponent
- If 0 ≤ H < 0.5, the levels are oscillating;
- If H = 0.5, the series is an example of random Brownian motion;
- If 0.5 < H ≤ 1, the series is fractal with the presence of a trend.
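The three cases above translate directly into a small helper. The relation H = 2 − D used here is an assumption inferred from the reported results, where the D and H values listed for every style sum to 2; the function names and the returned labels are hypothetical.

```python
def hurst_from_dimension(d):
    """Hurst exponent from the fractal dimension, assuming H = 2 - D."""
    return 2.0 - d


def describe_series(h):
    """Map a Hurst exponent to the qualitative behaviour listed above."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("H is expected to lie in [0, 1]")
    if h < 0.5:
        return "oscillating"
    if h == 0.5:  # exact comparison; estimated values rarely hit 0.5 exactly
        return "random (Brownian motion)"
    return "trend"


h = hurst_from_dimension(1.2105)  # D reported for the conversational style
print(round(h, 4), describe_series(h))  # 0.7895 trend
```

In practice an estimated H is never exactly 0.5, so a tolerance band around 0.5 may be more useful than the exact comparison shown here.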
4.6. Determination of the R/S Ratio Constant
5. Discussion
- First, each text style has its own form of paragraphs, indents, punctuation, etc. These elements disappear in the model, so the texts lose some of their specific style features. Nevertheless, the results of the study indicate that differences between the styles still appear;
- Second, the material used is sufficient to draw statistical conclusions only for short texts.
- In terms of fractal indicators, the poetic style has the largest fractal dimension and the conversational style the smallest. In our opinion, this is because colloquial language mainly uses short words, whereas the poetic style uses rhyming pairs of words, which can be quite long.
- The fractal dimension values for the artistic, confessional, scientific, and epistolary styles are very close. This can be explained as follows: the first two styles are aimed at the average reader, whereas the second two are aimed at a specific reader, i.e., a specialist. The business and journalistic styles are also quite close to each other.
- The Hurst index is rigidly related to the fractal dimension. Its interpretation requires an analysis of the meaning of the text, because this indicator characterizes the trends in the fluctuations of the levels of the numerical sequence. Therefore, the problem of how to connect it with the text size remains.
- The constant on the set of two-parameter functions is a parameter of position or scale. From a physical point of view, this constant characterizes the material, environment, and conditions. In mathematical problems, it arises from solving differential equations and integrals. Table 3 suggests the following classification: the constant is lowest for business and journalistic texts (0.26 and 0.25, respectively); slightly higher for colloquial texts (0.33); higher still for artistic, scientific, and epistolary texts (0.38, 0.39, and 0.40, respectively); and largest for confessional texts and poems (0.46 and 0.68, respectively). The correspondence of fractal indicators to these styles remains problematic.
- The journalistic style has the smallest value of the power function constant, and the poetic style has the largest. The two values differ by a factor of almost three, and this is for only eight short texts.
- According to the statistical indicators, the average word length is largest for the scientific and business styles (6.42 and 6.36), which can be explained by the presence of long terms in these texts: technical, economic, political, and others. The journalistic and epistolary styles have relatively high and almost identical average word lengths (5.43 and 5.44, respectively). The artistic, confessional, and poetic styles have close average lengths (4.22, 4.34, and 4.37, respectively), and the conversational style stands apart (4.76).
- The standard deviation is smallest for the artistic, confessional, and poetic styles (2.38, 2.33, and 2.34, respectively) and largest for the business and epistolary styles (3.23 and 3.30, respectively). The conversational, journalistic, and scientific styles fall between these two groups (2.74, 2.94, and 3.07, respectively).
- The range of the cumulative series is quite difficult to interpret because the cumulative series is very nonlinear. By this indicator, the largest values belong to the conversational and epistolary styles (47.4 and 41.4, respectively), and the smallest to the artistic and confessional styles (23.4 and 27.7, respectively). The remaining scientific, business, journalistic, and poetic styles lie between these two groups.
- In the analysis of the English text, as shown in Figure 6, the behavior of the fractal and statistical indicators gives grounds for the following conclusions. First, all indicators confirm the high homogeneity of the first four parts of the text. Here, as in the previous discussion, the behavior of the range of the cumulative series was not considered, although for the first three parts it differs a little from that of the fourth and fifth parts.
- The results of the cluster analysis confirm the difference between the styles even if the editing method was used to construct the proposed values.
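The three statistical indicators discussed above (average word length, standard deviation, and the range of the cumulative series) can be computed from a word-length sequence as sketched below. The sketch assumes the population standard deviation and takes the range over the running sums of deviations from the mean; the source lines do not state which variants the authors used, and the function name `series_indicators` is hypothetical.

```python
from itertools import accumulate
from statistics import fmean, pstdev


def series_indicators(lengths):
    """Return (mean, standard deviation, range of the cumulative deviation series)."""
    mean = fmean(lengths)
    std = pstdev(lengths)  # population form; the sample form is an equally plausible choice
    # Running sum of deviations from the mean, one value per word.
    cumulative = list(accumulate(x - mean for x in lengths))
    spread = max(cumulative) - min(cumulative)
    return mean, std, spread


a, s, r = series_indicators([1, 2, 3, 4, 5])
print(a, round(s, 4), r)  # 3.0 1.4142 3.0
```

For the toy sequence the deviations are −2, −1, 0, 1, 2, the running sums are −2, −3, −3, −2, 0, and the range is therefore 0 − (−3) = 3.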
6. Conclusions
N | O | P | Q | R | S | T |
---|---|---|---|---|---|---|
Word Number | Letters Count | m = 2 | m = 3 | m = 4 | m = 5 | m = 6 |
1 | 6 | 2 | 2.333 | 1.75 | 1.4 | 1.167 |
2 | 10 | 3.5 | 2.333 | 1.75 | 1.4 | 1.167 |
3 | 3 | 1 | 1.333 | 1 | 1.2 | 1.333 |
4 | 5 | 1 | 0.667 | 1 | 1.6 | 1.333 |
5 | 7 | 0.5 | 1 | 2 | 1.6 | 2 |
6 | 6 | 1.5 | 2.667 | 2 | 2.4 | 2 |
7 | 9 | 4 | 2.667 | 3 | 2.4 | 2 |
8 | 1 | 2 | 4 | 3 | 2.4 | 2 |
9 | 5 | 4 | 3 | 2.25 | 1.8 | 1.5 |
10 | 13 | 4.5 | 3 | 2.25 | 1.8 | 1.5 |
11 | 4 | 0 | 0 | 0 | 0 | 0 |
Cell size | Number of Cells | Logarithm of Their Size | Logarithm of Their Number |
---|---|---|---|
2.0 | 111.5 | 0.7 | 4.7 |
3.0 | 67.7 | 1.1 | 4.2 |
4.0 | 52.5 | 1.4 | 4.0 |
5.0 | 38.2 | 1.6 | 3.6 |
6.0 | 28.3 | 1.8 | 3.3 |
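The fractal dimension can be estimated from this correspondence table as the negative slope of a straight line fitted in log-log coordinates. The sketch below assumes an ordinary least-squares fit over natural logarithms; whether the authors fit the line this way is not stated in the lines above, and the function name `fractal_dimension` is hypothetical.

```python
import math


def fractal_dimension(cell_sizes, cell_counts):
    """Negative least-squares slope of log(count) against log(size)."""
    xs = [math.log(m) for m in cell_sizes]
    ys = [math.log(n) for n in cell_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


sizes = [2, 3, 4, 5, 6]                     # cell sizes from the table above
counts = [111.5, 67.7, 52.5, 38.2, 28.3]    # numbers of cells from the table above
d = fractal_dimension(sizes, counts)
print(round(d, 2))  # 1.21
```

Because the slope of a log-log line does not depend on the base of the logarithm, using base 10 instead of the natural logarithm would give the same estimate.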
Styles | n | D | H | R | S | C |
---|---|---|---|---|---|---|
conversational | 150 | 1.2105 | 0.7895 | 47.4362 | 2.742 | 0.3329 |
artistic | 150 | 1.3497 | 0.6503 | 23.4228 | 2.3791 | 0.3802 |
scientific | 150 | 1.3157 | 0.6843 | 36.9329 | 3.0744 | 0.3913 |
business | 150 | 1.2607 | 0.7393 | 33.7315 | 3.2267 | 0.2586 |
journalistic | 150 | 1.2414 | 0.7586 | 32.3893 | 2.9368 | 0.2477 |
confessional | 150 | 1.3485 | 0.6515 | 27.7248 | 2.3301 | 0.4567 |
epistolary | 150 | 1.3129 | 0.6871 | 41.3557 | 3.2989 | 0.4027 |
poetic | 150 | 1.3702 | 0.6298 | 37.1611 | 2.3408 | 0.6793 |
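The constant C in the last column appears consistent with a power dependence of the form R/S = C · n^H. That form is an assumption inferred from the section titles and from the tabulated numbers, not a formula quoted from the text; the function name `power_law_constant` is hypothetical.

```python
def power_law_constant(r, s, n, h):
    """C in the assumed relation R/S = C * n**h."""
    return (r / s) / n ** h


# Conversational style, values taken from the table above.
c = power_law_constant(r=47.4362, s=2.742, n=150, h=0.7895)
print(round(c, 2))  # 0.33
```

The result agrees with the tabulated 0.3329 to two decimal places; the remaining gap may come from rounding of the inputs or from the exact sequence length used in the original calculation.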
Fragment of English text | D | H | A | R | S | C |
---|---|---|---|---|---|---|
first | 1.2943 | 0.7057 | 3.6513 | 21.0769 | 1.7414 | 0.2930 |
second | 1.2806 | 0.7194 | 3.6667 | 27.6667 | 1.6674 | 0.3737 |
third | 1.2755 | 0.7245 | 3.6564 | 22.3026 | 1.7290 | 0.2828 |
fourth | 1.279 | 0.721 | 3.7231 | 9.0000 | 1.7066 | 0.1178 |
fifth | 1.3495 | 0.6505 | 4.0615 | 36.7538 | 2.0675 | 0.5757 |
No of Object or Group | Name of Objects and Groups | Distance between Objects and Groups |
---|---|---|
1 | colloquial | - |
2 | artistic | - |
3 | scientific | - |
4 | business | - |
5 | journalistic | - |
6 | confessional | - |
7 | epistolary | - |
8 | poetical | - |
9 | 2 + 6 | 0.263 |
10 | 3 + 7 | 0.538 |
11 | 4 + 5 | 0.554 |
12 | 8 + 9 | 1.132 |
13 | 1 + 11 | 1.328 |
14 | 13 + 10 | 2.055 |
15 | 12 + 14 | 5.506 |
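The merge table above fully determines the cluster tree, so the groups present at any distance threshold can be recovered from it. The sketch below only replays the listed merges; it does not recompute distances, and the helper names `members` and `clusters_at` are hypothetical.

```python
# Objects 1-8 and merge steps 9-15, copied from the table above.
NAMES = {1: "colloquial", 2: "artistic", 3: "scientific", 4: "business",
         5: "journalistic", 6: "confessional", 7: "epistolary", 8: "poetical"}
MERGES = {9: (2, 6, 0.263), 10: (3, 7, 0.538), 11: (4, 5, 0.554),
          12: (8, 9, 1.132), 13: (1, 11, 1.328), 14: (13, 10, 2.055),
          15: (12, 14, 5.506)}


def members(group):
    """Original objects contained in a group (or the object itself)."""
    if group not in MERGES:
        return {group}
    left, right, _ = MERGES[group]
    return members(left) | members(right)


def clusters_at(threshold):
    """Clusters that exist after applying every merge with distance <= threshold."""
    roots = set(NAMES)
    for group, (left, right, distance) in sorted(MERGES.items()):
        if distance <= threshold:
            roots -= {left, right}
            roots.add(group)
    return sorted(sorted(NAMES[m] for m in members(root)) for root in roots)


for cluster in clusters_at(1.2):
    print(cluster)
```

With a threshold of 1.2 this yields four clusters: artistic, confessional, and poetical; business and journalistic; colloquial alone; and epistolary with scientific. Raising the threshold above 2.055 leaves two clusters, which are joined only at the much larger distance of 5.506.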
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kaminskiy, R.; Shakhovska, N.; Kajanová, J.; Kryvenchuk, Y. Method of Distinguishing Styles by Fractal and Statistical Indicators of the Text as a Sequence of the Number of Letters in Its Words. Mathematics 2021, 9, 2410. https://doi.org/10.3390/math9192410