New Estimations for Shannon and Zipf–Mandelbrot Entropies

The main purpose of this paper is to find new estimations for the Shannon and Zipf–Mandelbrot entropies. We apply some refinements of the Jensen inequality to obtain different bounds for these entropies. Initially, we use a specific convex function in the refinement of the Jensen inequality and then vary the weights and the domain of the function to obtain general bounds for the Shannon entropy (SE). As particular cases of these general bounds, we derive some bounds for the SE which are, in fact, applications of some other well-known refinements of the Jensen inequality. Finally, we derive different estimations for the Zipf–Mandelbrot entropy (ZME) by using the new bounds of the Shannon entropy for the Zipf–Mandelbrot law (ZML). We also discuss particular cases and the bounds related to two different parameters of the Zipf–Mandelbrot entropy. At the end of the paper we give some applications in linguistics.


Introduction
The Shannon entropy [1] plays a key role in information theory, where it is often regarded as a measure of uncertainty. There are basically two ways of understanding the Shannon entropy. From one point of view, it quantifies the amount of information gained about the value of a random variable X (after measurement). From another point of view, it gives the amount of uncertainty about X before we learn its value (before measurement) [2]. The entropy of a random variable is defined in terms of its probability distribution, and it can serve as a good measure of predictability or uncertainty. SE allows us to estimate the minimum average number of bits needed to encode a string of symbols, based on the size of the alphabet and the frequency of the symbols. The formula for SE is given by [1] as $-\sum_{i=1}^{n} \psi_i \log \psi_i$, where $\psi_1, \psi_2, \ldots, \psi_n \in \mathbb{R}^{+}$ with $\sum_{i=1}^{n} \psi_i = 1$. The Shannon entropy has many applications in the applied sciences and in other fields, such as biology [3], genomic geography [4], and finance [5]. Currently, the Shannon entropy is applied in the simulation of laser dynamics and as an objective measure to evaluate models and compare observational results [6,7].
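As a minimal sketch, the entropy $-\sum_{i=1}^{n} \psi_i \log \psi_i$ of a probability vector can be evaluated numerically as follows. The function name `shannon_entropy` and the `base` parameter are illustrative choices, not part of the original text; base e gives the entropy in nats and base 2 gives it in bits.

```python
import math


def shannon_entropy(probs, base=math.e):
    """Return -sum(p * log(p)) for a probability vector `probs`."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9, abs_tol=1e-12):
        raise ValueError("probabilities must sum to 1")
    # Zero entries are skipped, following the convention 0 * log 0 = 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# Uniform law on four symbols, measured in bits (mathematically log2(4) = 2).
uniform = [0.25] * 4
print(shannon_entropy(uniform, base=2))
```

The validation step reflects the constraint that the weights sum to 1; dropping it would silently return a value that is not an entropy.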
In 1932, George Zipf proposed that the size of the r-th largest occurrence of an event is inversely proportional to its rank. That is, the law states that $P_r = 1/r^{b}$, where $P_r$ is the frequency of occurrence of the r-th ranked item and b is close to unity. In linguistics, Zipf found that if one counts the number of times each word appears in a text and ranks the words by frequency, then the product of the rank (γ) of a word and the frequency of its appearance (ρ) is a constant (C): C = ρ · γ (see [8,9]).
There are several applications of the Zipf law, and here we present some of them. The law has been applied to city populations. Kristian Giesen and Jens Suedekum studied the city size distributions of individual German districts [10]. They built their study on the insight of Gabaix (1999) that the Zipf law follows from a random growth process. More precisely, Gabaix shows that if the districts follow the Gibrat law, then the Zipf law should be observed both at the district level and at the national level. Using non-parametric procedures, they found that the Gibrat law holds in each German district, regardless of how "districts" are defined. In other words, the Gibrat and Zipf laws tend to hold everywhere in space. In geology, the Zipf law has been used with moderate success in estimating resources such as ore deposits and petroleum [11]. In principle, it predicts how many entities of a certain size remain in a sequence of decreasing size, assuming the largest has been established. For solar flare intensity, M. E. J. Newman (2004) [12] presents the cumulative distribution of the peak gamma-ray intensity of solar flares, based on observations made between 1980 and 1989 by the Hard X-Ray Burst Spectrometer onboard the Solar Maximum Mission satellite, launched in 1980. The spectrometer used a CsI scintillation detector to measure gamma-rays from solar flares. For website traffic (Shane Parkins, 2015) [13], the Zipf law seems, by all accounts, to be the rule rather than the exception. It is present at the level of routers that transmit data from one geographic location to another and in the content of the World Wide Web. At the social and economic levels, it also governs how people choose the sites they visit and form peer-to-peer communities.
The ubiquity of the Zipf law in cyberspace supports a deeper understanding of internet phenomena, for example, discovering the potential of prevalence proxy caches in different Autonomous Systems (ASes), with the purpose of reducing the costs incurred by internet service providers and easing the load on the internet backbone [14].
It was found that Zipf's law can describe the size and rank distribution of earthquakes by magnitude, but it cannot predict when they will occur. For the Earth-Moon system, the crater size-frequency distribution can also be represented by the Zipf law [12,15].
In 1966, Benoit Mandelbrot gave an improvement of the Zipf law, known as ZML, which generalizes the account of the low-rank words in a corpus [16]: the frequency of the word of rank i is $c/(i+h)^{r}$, where i < 1000 and r, c > 0; for h = 0, we get the Zipf law.
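The following sketch evaluates the Zipf-Mandelbrot law as a probability distribution over the ranks 1, ..., n. It assumes the normalized form $\psi_i = 1/((i+h)^{r} Q_{n,h,r})$ with $Q_{n,h,r} = \sum_{i=1}^{n} 1/(i+h)^{r}$, together with h ≥ 0 and r > 0; the function name `zipf_mandelbrot_pmf` is hypothetical.

```python
import math


def zipf_mandelbrot_pmf(n, h, r):
    """Probabilities 1 / ((i + h)**r * Q) for the ranks i = 1..n."""
    if n < 1 or h < 0 or r <= 0:
        raise ValueError("need n >= 1, h >= 0 and r > 0")
    weights = [1.0 / (i + h) ** r for i in range(1, n + 1)]
    q = sum(weights)  # normalizing constant Q_{n,h,r} (assumed definition)
    return [w / q for w in weights]


zipf = zipf_mandelbrot_pmf(n=5, h=0.0, r=1.0)     # h = 0 recovers the Zipf law
shifted = zipf_mandelbrot_pmf(n=5, h=2.0, r=1.0)  # h > 0 flattens the top ranks

# Ratio of the first two probabilities: (2 + h) / (1 + h) when r = 1.
print(zipf[0] / zipf[1], shifted[0] / shifted[1])
```

With h = 0 and r = 1, the first rank is twice as likely as the second; with h = 2 the ratio drops to 4/3, which is how the shift h damps the dominance of the lowest ranks.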
To complete this section, we give some notions and results from ref. [20]. Let $g : G \to \mathbb{R}$ be a convex function defined on the convex set G, and let $T_n = \{1, 2, 3, \ldots, n\}$. Let s be a fixed positive integer and let l range over all positive integers such that 1 ≤ l ≤ s ≤ n. Suppose M s 1 , M s 2 , ..., M s l represent any subsets of $T_n$, such that M s … Analogously, for other particular values of s with 1 ≤ l ≤ s ≤ n, one can obtain different functionals. The following generalized refinements of the Jensen inequality were given in refs. [20,21].
Due to the great importance of the Shannon and Zipf-Mandelbrot entropies, many results in the literature are devoted to these entropies. The main focus of this paper is to relate some refinements of the Jensen inequality to the Shannon and Zipf-Mandelbrot entropies. We use the main results given in ref. [20] to obtain some estimations for these entropies. We also discuss some particular cases of these results. At the end of the paper, we give some applications in linguistics. The idea of this paper can be applied to other results on the Jensen inequality to obtain new estimations for these entropies.

Estimations for the Shannon Entropy
We start by giving our first main result for the Shannon entropy.
In the following corollary, we discuss another particular case of Theorem 3.

Estimations for the Zipf-Mandelbrot Entropy
In the following main result, we obtain some general estimations for the Zipf-Mandelbrot entropy.
Proof. If we substitute $\psi_i$ with … Then, … Now, by applying Theorem 3 for $\psi_i = \frac{1}{(i+h)^{r} Q_{n,h,r}}$, we obtain the required result.
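For orientation, the effect of this substitution on the Shannon entropy can be sketched as follows. The computation assumes that $Q_{n,h,r} = \sum_{i=1}^{n} 1/(i+h)^{r}$ is the normalizing constant, so that $\sum_{i=1}^{n} \psi_i = 1$, and that log denotes the natural logarithm. Since $\log \psi_i = -r \log(i+h) - \log Q_{n,h,r}$, we get

```latex
\begin{align*}
-\sum_{i=1}^{n} \psi_i \log \psi_i
  &= \sum_{i=1}^{n} \frac{r \log(i+h) + \log Q_{n,h,r}}{(i+h)^{r} Q_{n,h,r}} \\
  &= \frac{r}{Q_{n,h,r}} \sum_{i=1}^{n} \frac{\log(i+h)}{(i+h)^{r}} + \log Q_{n,h,r}.
\end{align*}
```

The last expression is the usual closed form of the Zipf-Mandelbrot entropy, so under these assumptions any bound on the Shannon entropy of $\psi$ becomes a bound on this quantity.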
We can use Theorem 4 to obtain the following corollary.

Remark 4.
By using Remark 2, we also have … In the following result, we obtain an estimation for the Zipf-Mandelbrot entropy concerning two different parameters.
Now we give applications of the above results in linguistics. In ref. [26], Gelbukh and Sidorov observed the difference between the coefficients $r_1$ and $r_2$ in the Zipf law for the English and Russian languages. They processed 39 literary texts for each language, chosen randomly from different genres, with the requirement that each text contain more than 10,000 running words. They calculated the coefficients for each of these texts and obtained an average of $r_1 = 0.973863$ for the English language and $r_2 = 0.892869$ for the Russian language.
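To give a feeling for these two coefficients, the sketch below evaluates the Zipf-Mandelbrot entropy for each of them. It assumes the closed form $\frac{r}{Q_{n,h,r}} \sum_{i=1}^{n} \frac{\log(i+h)}{(i+h)^{r}} + \log Q_{n,h,r}$ with $Q_{n,h,r} = \sum_{i=1}^{n} 1/(i+h)^{r}$ and natural logarithms. The vocabulary size `N_RANKS = 1000` and the shift h = 0 are hypothetical values chosen only for illustration; they are not taken from ref. [26].

```python
import math


def zipf_mandelbrot_entropy(n, h, r):
    """Closed form (r / Q) * sum(log(i + h) / (i + h)**r) + log(Q)."""
    q = sum((i + h) ** -r for i in range(1, n + 1))
    s = sum(math.log(i + h) * (i + h) ** -r for i in range(1, n + 1))
    return r * s / q + math.log(q)


R_ENGLISH = 0.973863  # average coefficient reported in the text for English
R_RUSSIAN = 0.892869  # average coefficient reported in the text for Russian
N_RANKS = 1000        # hypothetical number of ranks, not from the text

for name, r in (("English", R_ENGLISH), ("Russian", R_RUSSIAN)):
    print(name, zipf_mandelbrot_entropy(N_RANKS, 0.0, r))
```

For fixed n and h, a smaller exponent gives a flatter distribution, so under these assumptions the value computed for the Russian coefficient is larger than the one for the English coefficient.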
In the following results, we give the application of inequality (11) for the English language.