Partition Entropy as a Measure of Regularity of Music Scales

Abstract: The entropy of the partition generated by an n-tone music scale is proposed to quantify its regularity. The normalized entropy relative to a regular partition and its complement, here referred to as the bias, allow us to analyze various conditions of similarity between an arbitrary scale and a regular scale. Interesting particular cases are scales with limited bias because their tones are distributed along specific interval fractions of a regular partition. The most typical case in music concerns partitions associated with well-formed scales generated by a single tone h. These scales are maximally even sets that combine two elementary intervals. Then, the normalized entropy depends on each number of intervals as well as their relative size. When well-formed scales are refined, several nested families stand out with increasing regularity. It is proven that a scale of minimal bias, i


Introduction
An n-tone music scale $E_n$ determines a partition of the octave into n intervals. Regarding their regularity, scales can be equal divisions of the octave, i.e., with n tones of equal temperament (n-TET scale), where discretization precision increases as the number of tones grows, or unequal divisions, where precision increases in two ways: as the size of the intervals decreases and as their regularity increases. In particular, well-formed scales of one generator, hereinafter referred to as cyclic scales, are formed, if they are non-degenerate (i.e., not of equal temperament), from two elementary intervals [1,2], one wider than the interval of an n-TET scale and the other narrower. These scales also fulfill the condition of maximum evenness [3,4]; that is, they present the most even distribution, which does not depend on the relative size of the intervals. In many cases, as the number of tones of a cyclic scale increases, the regularity of the intervals decreases, while the scale still satisfies the condition of a maximally even set. Therefore, for cyclic scales, it is not obvious how to quantify precision or the regularity of the partition as the number of tones increases. The purpose of the present work is to analyze this using partition entropy, as well as to study the bias from regular temperament in more general cases.
The regularity of the intervals of $E_n$ can be quantified in several ways. A fairly common way is from their standard deviation. In the octave, for $i \in I = \{1, \ldots, n\}$, the intervals $A_i$ between consecutive tones, when they are considered a discrete random variable of equal probability such that $\sum_{i\in I} A_i = 1$, have expected value $t = \frac{1}{n}$, i.e., the size of the elementary interval of the n-TET scale. A measure of dispersion of these values about the mean is the standard deviation $\sigma$, defined from its square, the variance, $\sigma^2 = \frac{1}{n}\sum_{i\in I}(A_i - t)^2$. In this way, it is possible to compare different n-tone scales in terms of the average dispersion of their intervals relative to the n-TET scale. However, this procedure does not present interesting properties to deal with, for instance, successive refinements of cyclic scales.
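As a minimal sketch of this dispersion measure, the following Python snippet computes $\sigma$ for a list of step intervals normalized to the octave. The function name `interval_std` and the example scale (the diatonic major scale written as steps of 12-TET) are illustrative assumptions, not taken from the paper.

```python
import math

def interval_std(intervals):
    """Standard deviation of the step intervals about the n-TET step t = 1/n."""
    n = len(intervals)
    t = 1.0 / n
    return math.sqrt(sum((a - t) ** 2 for a in intervals) / n)

# Hypothetical example: diatonic major scale as steps of 12-TET (octave = 1)
major = [s / 12 for s in (2, 2, 1, 2, 2, 2, 1)]
sigma_major = interval_std(major)      # positive: the steps are unequal
sigma_tet = interval_std([1 / 7] * 7)  # zero: regular 7-TET partition
```

The value is zero only for the regular partition, which is what makes it usable as a first, crude regularity measure.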
Another way to measure the regularity of $E_n$ may be based on the quadratic sum of its intervals. We consider this case in a more general way. Let us assume that $(M, \mu)$ is a Lebesgue-measurable space such that $\mu(M) = 1$ and $\alpha = \{A_i\}_{i\in I}$ is a finite measurable partition of M; that is, $\mu(\cup_{i\in I} A_i) = 1$ and $\mu(A_i \cap A_j) = 0$ if $i \neq j$. (Equalities that involve measures are understood to be true except in a null set; that is, we should strictly write $\mu(M \setminus \cup_{i\in I} A_i) = 0$, but we will not write this to simplify the notation.) Then, the sum of squares for the partition $\alpha$ relative to the measure $\mu$ is $S(\alpha) = \sum_{i\in I}\mu(A_i)^2$. The function $S(\alpha)$ is a convex function of the $\mu(A_i)$, which, when constrained to $\sum_{i\in I}\mu(A_i) = 1$, has an absolute minimum for $\mu(A_i) = \frac{1}{n}$, $\forall i$. In this case, $S(\alpha) = \frac{1}{n}$ is a decreasing function of n. When the partition is refined, that is, if $\beta = \{B_j\}_{j\in J}$ is another partition of the octave, then their sum $\alpha \vee \beta = \{A_i \cap B_j\}_{i\in I, j\in J}$ is a refinement of both. In order to make successive refinements, it would be desirable to be able to express $S(\alpha \vee \beta) = S(\alpha) + F(\alpha, \beta)$ with a function F that is a linear combination of squares of $\mu(A_i \cap B_j)$, i.e., also a sum of squares of the refinement. But this is not possible because cross products necessarily appear in F.
Fortunately, the notions of regularity and fineness of a partition have, in several fields, a well-known way of being quantified, which is the partition entropy. The concept of entropy was introduced by Clausius in thermodynamics (in 1850), Boltzmann applied it to statistical mechanics (1877), Planck related it to probability theory (1906), Shannon applied it to information theory (1948), Jaynes used it as a measure of uncertainty (1957), and Kolmogorov extended it to deterministic systems (1958).
Certainly, entropy has been used in music almost from the beginning of information theory [5][6][7] by considering the musical language as a source that produces a sequence of symbols representing musical tones and by associating them with certain probabilities according to their frequency of appearance. In particular, a wide range of works have used entropy to identify music styles (e.g., [8][9][10][11]), although results may vary depending on the preanalytical assumptions made in order to treat the data (scale degrees, pitch class, octave equivalence, weighting by duration, key-signature dependency, modal bias, etc.). With the same purpose, cross-entropy (e.g., [12]) has been used to quantify stylistic similarity between two sequences [13]. An up-to-date review on entropy and other physical parameters applied to music is provided by Gündüz [14].
Nevertheless, it seems that entropy as a measure of regularity of music scales has been neglected. There is a significant difference between discrete state spaces and continuous state spaces. In most cases, entropy is considered in terms of information theory, where the natural application is to symbolic frequency analysis. Although Shannon did suggest a generalization to the continuum through infinitesimal equal measure states, becoming an integral form of his discrete theory, partition entropy measures an inverse density rather than a frequency over equal measure.
Therefore, in the current paper, it is proposed to use the normalized entropy as a measure of the regularity. Its application is illustrated in two cases: first, to study the similarity between an arbitrary scale and an n-TET scale; and second, to study the regularity of cyclic scales generated by a tone h and the relationship between their bias and the rational approximation of $\log_2 h$ that they provide.

Partition Entropy
Following the notation and terminology of Arnold and Avez [15], the entropy of the partition $\alpha$ is defined from a concave function $z(t)$, as shown below.
$$z(t) = \begin{cases} -t\log_2 t, & 0 < t \le 1 \\ 0, & t = 0 \end{cases} \qquad (1)$$
so that $H(\alpha) = \sum_{i\in I} z(\mu(A_i))$. (The entropy of the partition is also called metric entropy, and it consists of replacing the metric of probability theory, i.e., the probability, with a generic metric, where the concepts relative to the partitions remain.) The following relationship is fulfilled, $0 \le H(\alpha) \le \log_2 n$. Let us remember that if the function of a random variable $f(x)$ is concave, the expected value satisfies $E(f(x)) \le f(E(x))$ (Jensen's inequality). Therefore, for a partition of n elements, the value $\log_2 n$ is the maximum value that the entropy can reach, and it is reached when the elements of the partition are of equal measure. Thus, $H(\alpha) = -\sum_{i\in I}\mu(A_i)\log_2\mu(A_i)$ (in probabilities, it would be the information associated with the measure of the random variable $A_i$, i.e., the expected value of the information content or self-information of the variable $A_i$). The smaller $\mu(A_i)$, the greater the above logarithm in absolute value; so, this average gives an idea of how refined the partition is.
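Shannon (1948) showed that the entropy defined in this way is the only function that, except for a multiplicative constant, satisfies the following postulates: (i) H is continuous in the measures $\mu(A_i)$; (ii) when all the $\mu(A_i)$ are equal to $\frac{1}{n}$, H is an increasing function of n; and (iii) when a choice is broken down into successive choices, H is the weighted sum of the individual entropies. Let us consider partitions $\alpha = \{A_i\}_{i\in I}$ and $\beta = \{B_j\}_{j\in J}$ and a refinement of both, $\alpha \vee \beta = \{A_i \cap B_j\}_{i\in I, j\in J}$. The conditional entropy of $\alpha$ relative to $\beta$ is defined as $H(\alpha/\beta) = -\sum_{i\in I, j\in J}\mu(A_i \cap B_j)\log_2\mu(A_i/B_j)$, where $\mu(A_i/B_j)$ is the conditional measure of $A_i$ relative to $B_j$.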
Therefore, the partition entropy seems like an appropriate parameter to measure how much a scale of n arbitrary tones differs from an n-TET scale. It is also the parameter that provides us with an estimate of how refined a scale is, because the entropy of a scale of n tones will always be greater than that of a subscale of n−1 tones, and less than or equal to that of an n-TET scale. This will be useful when dealing with cyclic scales. In addition, as the intervals of a partition are subdivided, the entropy is additive with regard to the intervals being refined, which saves calculations.
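The two bounds just mentioned can be checked numerically. The sketch below assumes the standard form $z(t) = -t\log_2 t$ with $z(0) = 0$ for the concave function; the function name `partition_entropy` and the example scales are illustrative choices, not part of the paper.

```python
import math

def partition_entropy(measures):
    """H = sum of z(mu_i), assuming z(t) = -t*log2(t) and z(0) = 0."""
    return sum(-t * math.log2(t) for t in measures if t > 0)

tet12 = [1 / 12] * 12                            # 12-TET partition of the octave
major = [s / 12 for s in (2, 2, 1, 2, 2, 2, 1)]  # a 7-tone subscale of 12-TET

h_tet12 = partition_entropy(tet12)  # reaches the maximum log2(12)
h_major = partition_entropy(major)  # below log2(7) and below its refinement
```

Here the 12-TET partition refines the 7-tone subscale, so its entropy is larger, while the subscale itself stays below the 7-TET value $\log_2 7$.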

Cyclic Scales
The computation of entropy for cyclic scales requires a brief review of their properties, which hereinafter are summarized by following Cubarsi [16,17].
A ratio of 2 between frequencies corresponds to the range of one octave. For any frequency ratio $\nu \in \Omega \equiv (0, \infty)$, the values $2^k\nu$, $k \in \mathbb{Z}$, define one equivalence class. The set $\Omega$ is a commutative group for multiplication. The set of all the octaves of the fundamental frequency ratio ($\nu_0 = 1$) is a monogenous subgroup of $\Omega$ of infinite cardinality, $\Omega_2 = \{2^k, k \in \mathbb{Z}\}$. The frequency classes are the elements of the quotient group $\Omega_0 = \Omega/\Omega_2$, also commutative for multiplication. For each equivalence class, we choose a representative in the interval [1, 2) (with identified extremes) as the reference octave, which we identify with $\Omega_0$. A finite set of these representatives will be referred to as scale tones.
An n-tone cyclic scale $E^h_n$ is a scale of one generator, a positive real value h. The scale tones satisfy the symmetry condition, consisting of displaying several degrees of rotational symmetry, which is equivalent to the closure condition [18][19][20], although such an equivalence does not hold for scales with more than two generators [21,22]. The partition of the octave induced by the scale notes has exactly two sizes of scale steps, and each number of generic intervals occurs in two different sizes, which is known as Myhill's property [1,2]. For h = 3, we get the particular case of the 12-tone Pythagorean scale, generated by fifths, as well as those listed in Table 1. In general, for h other than a rational power of 2 (which would lead to degenerate cases of equal temperament scales), we would obtain generalized Pythagorean scales. The scale tones are $\nu_k = h^k / 2^{\lfloor k\log_2 h\rfloor}$, $k = 0, \ldots, n-1$, where $\lfloor\cdot\rfloor$ is the floor function. When the scale tones are ordered from lowest to highest pitch in [1, 2) (say, in cyclic order or by ordinal), we find two extreme tones, the minimum tone $\nu_m$ and the maximum tone $\nu_M$, which determine the two elementary factors $U = \nu_m = h^m / 2^{\lfloor m\log_2 h\rfloor}$ (up the fundamental) and $D = \frac{2}{\nu_M} = 2^{\lfloor M\log_2 h\rfloor + 1} / h^M$ (down the fundamental) associated with the generic widths of the step interval such that $U^M D^m = 2$. The indices satisfy $n = m + M$, and they are all coprime.
The tone $\nu_n = h^n / 2^{\lfloor n\log_2 h\rfloor}$, which does not belong to the scale $E^h_n$, provides the closure condition (either $\nu_n \to 1^+$ or $\nu_n \to 2^-$) determining the n-order comma $\kappa_n = \min(\nu_n, \frac{2}{\nu_n})$ (in the frequency space), i.e., the error in closing the scale near the fundamental with no other scale tones between them. The comma itself does not provide information about whether $\nu_n$ closes above or below the fundamental. Using the index $N = \lfloor m\log_2 h\rfloor + \lfloor M\log_2 h\rfloor + 1 = \lfloor n\log_2 h + \frac{1}{2}\rfloor$, two parameters provide this information: on the one hand, the scale closure $\gamma_n = h^n / 2^N$, which is a value close to 1 satisfying $\frac{U}{D} = \gamma_n$; and on the other hand, the scale digit $\delta = N - \lfloor n\log_2 h\rfloor$, taking values 0 or 1. Then, $\delta = 0$ if $\gamma_n > 1$ and $\delta = 1$ if $\gamma_n < 1$. Figure 1 (left and central panels) displays how the 12-tone cyclic scale for h = 3 (Pythagorean scale) is formed. The h iterates of indices m = 7 and M = 5 are the extreme tones, while n = 12 provides the closure, since there is no other tone with a frequency that is between $\nu_{12}$ and the fundamental. The ratio between consecutive iterates is either $\frac{3}{2}$ or $\frac{3}{4}$, except for the last iterate (the wolf fifth), which compensates for the comma. When the scale tones are arranged in pitch order, i.e., by ordinals, their ratios are either $U = \frac{3^7}{2^{11}}$ or $D = \frac{2^8}{3^5}$. Such a distinction is much clearer under the following alternative approach. Each tone $\nu_k$ in the frequency space is associated with a note or pitch class $\log_2\nu_k$ in the octave $S_0 = \mathbb{R}/\mathbb{Z}$, so that the above quantities have the corresponding ones in $S_0$. Thus, the elementary intervals $u = \log_2 U$ and $d = \log_2 D$ generate the partition of the octave in n intervals, with $Mu + md = 1$, satisfying $u - d = \phi_n$, with interval closure $\phi_n = \log_2\gamma_n$ and interval comma $|\phi_n| = \log_2\kappa_n$. Sometimes, it is more useful to work in the multiplicative space of tones, and sometimes, in the additive space of notes. In the latter case, a frequency $x \in [1, 2)$ is usually expressed by musicians as $1200\log_2 x \in [0, 1200)$ in cents (¢), so that each semitone of the 12-TET scale is divided into 100 parts. Figure 1 (right panel) displays how the intervals between the 12 notes of the Pythagorean scale are distributed along the circle of the octave (clockwise direction). All of the intervals, even the last one, are either u = 113.7¢ or d = 90.2¢.
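The construction above can be sketched in a few lines of Python, working in the additive space of notes. The function name `cyclic_scale` and the returned dictionary are hypothetical helpers; the sketch assumes that n is a valid size of a cyclic scale for the generator h (for other values of n the partition has more than two step sizes).

```python
import math

def cyclic_scale(h, n):
    """Indices and elementary intervals of the n-tone cyclic scale generated by h."""
    log2h = math.log2(h)
    # Pitch classes of the first n iterates of h, reduced to the octave [0, 1)
    notes = [(k * log2h) % 1.0 for k in range(n)]
    m = min(range(1, n), key=lambda k: notes[k])  # index of the minimum tone
    M = max(range(1, n), key=lambda k: notes[k])  # index of the maximum tone
    u = notes[m]                                  # u = log2(U)
    d = 1.0 - notes[M]                            # d = log2(D)
    N = math.floor(n * log2h + 0.5)
    phi = n * log2h - N                           # interval closure log2(gamma_n)
    delta = N - math.floor(n * log2h)             # scale digit
    return {"m": m, "M": M, "u": u, "d": d, "N": N, "phi": phi, "delta": delta}

pyth = cyclic_scale(3, 12)  # 12-tone Pythagorean scale
u_cents, d_cents = 1200 * pyth["u"], 1200 * pyth["d"]
```

For h = 3 and n = 12 this reproduces the indices m = 7 and M = 5 and the two step sizes quoted in the text, and the relations $Mu + md = 1$ and $u - d = \phi_n$ hold up to rounding error.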

The fraction $\frac{N}{n}$ is associated with the convergent and semi-convergent continued fraction expansions of $\log_2 h$ [23][24][25]. Among cyclic scales, two categories may be pointed out. The first category is formed by optimal scales, associated with the best closure $\gamma_n \approx 1$, corresponding to the best rational approximations, i.e., the convergents of its canonical continued fraction expansions from both sides. The second category, which we shall name accurate scales, is associated with the best estimations of the generator tone, $2^{N/n}/h \approx 1$, corresponding to the good rational approximations $\frac{N}{n}$ of $\log_2 h$. Apart from accurate scales, which include optimal scales, there are cyclic scales not associated with good or best rational approximations, still corresponding to semiconvergents.
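To illustrate the link with continued fractions, the following sketch lists the denominators of the first convergents of $\log_2 3$ using the standard recurrence $q_k = a_k q_{k-1} + q_{k-2}$. The function name is hypothetical, and the floating-point iteration is an assumption that is only reliable for the first few terms.

```python
import math

def convergent_denominators(x, count):
    """Denominators of the first `count` continued-fraction convergents of x."""
    q_prev, q = 0, 1  # q_{-1} and q_0
    dens = []
    for _ in range(count):
        a = math.floor(x)  # next partial quotient
        if dens:           # q_k = a_k * q_{k-1} + q_{k-2} for k >= 1
            q_prev, q = q, a * q + q_prev
        dens.append(q)
        x = 1.0 / (x - a)  # rounding error grows at each inversion
    return dens

dens = convergent_denominators(math.log2(3), 9)
```

The denominators from 5 onward coincide with the numbers of tones of the first optimal Pythagorean scales discussed later in the text.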
Notice that the terms "good" and "best" rational approximations [26] are equivalent to the best approximation "of the first kind" and "of the second kind", respectively [27].The conditions mean the following.
A one-sided best approximation of log 2 h + occurs when γ n < 1 and A one-sided best approximation of log 2 h − occurs when A one-sided good approximation of log 2 h + occurs when For all these scales, the values U, D, and γ n have bounds according to Appendix A. In particular, | log 2 γ n | quantifies the error of the rational approximation of log 2 h.These bounds determine the interval between a note of the cyclic scale E h n and the one with the same ordinal in the n-TET scale, which is analyzed in Appendix B.
Finally, the family of cyclic scales follows a chain, $E^h_n \subset E^h_{n^+}$, so that, starting from the indices (m, M) of $E^h_n$, the corresponding values for the next scale $E^h_{n^+}$ are (i) $m^+ = m + M$, $M^+ = M$ $\iff$ $\delta = 0$ and (ii) $m^+ = m$, $M^+ = m + M$ $\iff$ $\delta = 1$ (see Table 1).
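The chain rule can be iterated mechanically. The sketch below assumes that the chain starts at the 2-tone scale with indices (m, M) = (1, 1), which is an assumption of this example rather than a statement from the paper; the function name `chain_sizes` is hypothetical.

```python
import math

def chain_sizes(h, steps):
    """Numbers of tones n = m + M along the chain of cyclic scales generated by h."""
    log2h = math.log2(h)
    m, M = 1, 1  # assumed starting point: 2-tone scale
    sizes = []
    for _ in range(steps):
        n = m + M
        sizes.append(n)
        # Scale digit: 0 if the scale closes above the fundamental, 1 if below
        delta = math.floor(n * log2h + 0.5) - math.floor(n * log2h)
        if delta == 0:
            m = m + M  # case (i): m+ = m + M, M+ = M
        else:
            M = m + M  # case (ii): m+ = m, M+ = m + M
    return sizes

pyth_chain = chain_sizes(3, 9)
```

For h = 3 the chain passes through the familiar 5-, 7-, and 12-tone Pythagorean scales before reaching 41 and 53 tones.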

Entropy of a Cyclic Scale
For a cyclic scale $E^h_n$, with the measure defined for tone intervals as $\mu([x_0, x_1]) = \log_2\frac{x_1}{x_0}$ for $0 < x_0 \le x_1$, consider the partition of the octave $\alpha \equiv (u^M, d^m)$ composed of M intervals of width u and m intervals of width d, regardless of the order in which the intervals follow each other. The partition entropy is
$$H(E^h_n) = -M u\log_2 u - m d\log_2 d. \qquad (4)$$
We will explicitly write it in terms of the indices of the minimum and maximum tones m and M. In order to do so, we define the values where $\{\log_2 h^x\}$ is the mantissa of $\log_2 h^x$, that is, $\log_2 h^x - \lfloor\log_2 h^x\rfloor$, such that the elemental intervals u and d can be expressed in terms of the indices of the minimum and maximum tones as In this way, Equation (4) becomes Let us see how to express the entropy of the cyclic scale $E^h_{n^+}$, the one following $E^h_n$ in the chain of cyclic scales. The scale refinement process is as follows. The partition $(u^M, d^m)$ breaks, so that the major interval splits into two, one of the same size as the minor one plus a remainder. This remainder is smaller than the smaller interval only when the scale is optimal. This process is iterated.
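As a hedged numerical illustration, the entropy of a cyclic scale can be evaluated directly from the two elementary intervals, using $u = \{m\log_2 h\}$ and $d = 1 - \{M\log_2 h\}$, which follow from the definitions of U and D. The function name `cyclic_entropy` is hypothetical, and n is assumed to be a valid cyclic scale size.

```python
import math

def cyclic_entropy(h, n):
    """Entropy -M*u*log2(u) - m*d*log2(d) of the n-tone cyclic scale generated by h."""
    notes = [(k * math.log2(h)) % 1.0 for k in range(1, n)]
    u = min(notes)                   # mantissa of m*log2(h)
    d = 1.0 - max(notes)             # one minus the mantissa of M*log2(h)
    m = notes.index(min(notes)) + 1  # index of the minimum tone
    M = notes.index(max(notes)) + 1  # index of the maximum tone
    # M intervals of width u and m intervals of width d
    return -M * u * math.log2(u) - m * d * math.log2(d)

H12 = cyclic_entropy(3, 12)  # Pythagorean 12-tone scale
H17 = cyclic_entropy(3, 17)  # next scale in the chain
```

The 17-tone scale refines the 12-tone one, so its entropy is larger, while both stay below the entropy of the equal temperament scale with the same number of tones.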
As explained in Section 3, we must distinguish two cases, depending on whether $\gamma_n$ is greater or less than 1, or equivalently, if $\delta$ equals 0 or 1: In this case, $N = \lfloor n\log_2 h\rfloor$, and $m^+ = n$, $M^+ = M$. The refinement is performed in the M intervals of size u, and the m intervals of size d are maintained: Thus, we express the entropy according to Equation (3), with the new partition $\alpha \vee \beta = (u'^{M^+}, d'^{m^+})$: In terms of the indices of the extreme tones, denoting $H_n = H(E^h_n)$, it can be written as The refinement is performed in the m intervals of size d, and the M intervals of size u are maintained: In terms of the indices of the extreme tones, they are equivalent to

Partition Modulation
According to Equation (A2), we write the elementary intervals of a cyclic scale as
$$u = \frac{1 + m\phi_n}{n}, \qquad d = \frac{1 - M\phi_n}{n}. \qquad (9)$$
We have referred to one degenerate case of a cyclic scale, the limiting case when the two intervals coincide, $u \to d$. In this case, $u = d = \frac{1}{n}$ and $E^h_n = E^\top_n$, the n-TET scale. This is equivalent to $\phi_n \to 0$ in the expressions of Equation (9). However, we should consider two more degenerate cases. The larger one interval is, the smaller the other, always filling a full octave, $Mu + md = 1$. Therefore, if $d \to 0$, then the cyclic scale becomes an equal temperament scale of M tones, and if $u \to 0$, the scale becomes an equal temperament scale of m tones. In other words, one of the following cases applies: In this case, the intervals of the cyclic scale satisfy $0 < d < \phi_n < u$. Since $u - d = \phi_n$, when $u \to \phi_n$ the scale is no longer non-degenerate and becomes an M-TET scale, with entropy $\log_2 M$. Obviously, an optimal cyclic scale is far from this situation, because it satisfies $0 < \phi_n < d < u$. (ii) Also, according to Appendix A, for $\phi_n < 0$, owing to Equation (A8), In this case, the intervals of the cyclic scale satisfy $0 < u < |\phi_n| < d$. Then, $d \to |\phi_n|$, so that the scale becomes a degenerate m-TET scale with entropy $\log_2 m$. An optimal cyclic scale is also far from this situation, since it satisfies $0 < |\phi_n| < u < d$. These results are an immediate consequence of the bounds obtained in Appendix A.

Modulating Temperament Scales
The most usual case of cyclic scale $E^h_n$ is the one associated with a generator corresponding to a harmonic $h \in \mathbb{Z}^+$ of the fundamental tone. Then, the tonal class of the generator is $g = h / 2^{\lfloor\log_2 h\rfloor} \in (1, 2)$, which can be written as $g = 2^{(\mu + \phi_n)/n}$, where $\mu = N - n\lfloor\log_2 h\rfloor$ is the ordinal of the scale tone that best approximates g [17], known as the chromatic length of the pitch class of the generator. The value $\mu = \lfloor n\log_2 g + \frac{1}{2}\rfloor$ also determines the indices m and M of the extreme tones, corresponding to the ordinals 1 and n−1 of the scale notes. Since $\mu m - n\lfloor m\log_2 g\rfloor = 1$, m is the positive integer, $0 < m < n$, such that $\mu m \equiv 1 \pmod{n}$.
For fixed n (and also m and M), the family of cyclic scales $C^h_n(\xi)$ has generators that, in general, are not the tonal class of any harmonic; that is, they are real scales. In this case, we write them as $g' = 2^{(\mu + \xi)/n}$ so that, for $\xi \in (-\frac{1}{m}, \frac{1}{M})$, we obtain a partition modulation, which is a continuum of irregular temperaments [28,29] close to $E^h_n$ and $E^\top_n$, also corresponding to cyclic scales. Hence, the n-tone cyclic scales $C^h_n(\xi)$ are a family of modulating temperament scales around the generator $2^{\mu/n}$. For example, for n = 12 and $g' = \sqrt[4]{5}$, the value $\xi = \frac{12}{4}\log_2 5 - \lfloor\frac{12}{4}\log_2 5 + \frac{1}{2}\rfloor = -0.03422$ gives rise to the quarter-comma meantone temperament, which fits the class of the fifth harmonic well, in exchange for decreasing by 5¢ the accuracy of the third one. Since the value of $|\xi|$ is lower than $\frac{1}{n+m} = \frac{1}{19} = 0.05263$, it results in an optimal scale.
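The quarter-comma meantone example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic quoted above; the variable names are illustrative, and the value m = 7 is the index of the minimum tone of the 12-tone scale taken from the text.

```python
import math

n = 12
g = 5 ** 0.25             # generator: fourth root of the fifth harmonic
x = n * math.log2(g)      # equals (12/4) * log2(5)
mu = math.floor(x + 0.5)  # ordinal of the scale tone nearest to the generator
xi = x - mu               # modulation parameter
m = 7                     # index of the minimum tone for n = 12 (from the text)
is_optimal = abs(xi) < 1 / (n + m)
```

The generator falls slightly below the seventh step of 12-TET, so $\xi$ is negative and small enough to satisfy the optimality bound $\frac{1}{19}$.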

Entropy in Terms of the Closure
Let us consider an n-tone modulating temperament scale $C^h_n(\xi) \in T^\mu_n$. According to Equations (4) and (9), we write the entropy of $C^h_n(\xi)$ as
$$H_n(\xi) = \log_2 n - \frac{M}{n}(1 + m\xi)\log_2(1 + m\xi) - \frac{m}{n}(1 - M\xi)\log_2(1 - M\xi). \qquad (10)$$
By differentiating Equation (10) with respect to $\xi$, we obtain
$$\frac{dH_n}{d\xi} = -\frac{mM}{n}\left[\log_2(1 + m\xi) - \log_2(1 - M\xi)\right].$$
If $0 < \xi < \frac{1}{M}$, the two terms of the previous equation are positive and the derivative is negative; therefore, in this interval, the entropy is a decreasing function. If $-\frac{1}{m} < \xi < 0$, both terms in the above equation are negative and the derivative is positive, so in this interval, the entropy is an increasing function. At $\xi = 0$, there is a local maximum, since log In addition, it is straightforward to see the following properties: M . Properties (c) and (d) refer to the slight asymmetry of the entropy far from $\xi = 0$, which, as we shall see in the last section, is negligible around $\xi = 0$. Property (e) means that a cyclic scale generated by h and its inverse scale, generated counter-clockwise with swapped indices of the extreme tones, although they have different notes that differ in one comma, have the same entropy.
Therefore, the entropy of any n-tone modulating temperament scale in the family T µ n only depends on the closure.
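The dependence on the closure can be explored numerically. The sketch below assumes the expressions $u = \frac{1 + m\xi}{n}$ and $d = \frac{1 - M\xi}{n}$, which follow from $Mu + md = 1$ and $u - d = \xi$; the function name `entropy_xi` is a hypothetical helper.

```python
import math

def entropy_xi(xi, m, M):
    """Entropy of the (m + M)-tone modulating temperament scale with closure xi."""
    n = m + M
    u = (1 + m * xi) / n  # assumed form, from M*u + m*d = 1 and u - d = xi
    d = (1 - M * xi) / n
    return -M * u * math.log2(u) - m * d * math.log2(d)

m, M = 7, 5                                # indices of the 12-tone scale
h_max = entropy_xi(0.0, m, M)              # n-TET value log2(12)
h_edge = entropy_xi(1 / M - 1e-9, m, M)    # close to the degenerate M-TET scale
```

The entropy peaks at $\xi = 0$, decreases on both sides, and approaches $\log_2 M$ as $\xi \to \frac{1}{M}$, in agreement with the degenerate cases described in the section on partition modulation.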

Normalized Entropy
Since the entropy increases as the partition is refined, to be able to compare qualities of different scales, whether they are cyclic or not, we will consider the normalized entropy, i.e., relative to the maximum value that it can reach, that of the equally tempered scale. For an n-tone scale $E_n$, it is defined as $\eta(E_n) = \frac{H(E_n)}{\log_2 n}$, such that $\eta$ tends to 1 as $E_n$ approaches an n-TET scale. This ratio has also been called efficiency and relative entropy [30]. For a cyclic scale $E^h_n$, we will write $\eta_n = \eta(E^h_n)$. Let us see in which case the normalized entropy increases when a scale is being refined. Let us assume two scales, not necessarily cyclic: Lemma 2. The normalized entropy increases if and only if the relative increment of entropy in refining the partition is greater than the relative increment of the entropy of the corresponding equal temperament scales.

Proof. By the sub-additivity property, H(E n
The deviation of an n-tone scale $E_n$, not necessarily cyclic, relative to the regular n-TET scale will be measured from the complement of the normalized entropy, which we will call bias: $\theta(E_n) = 1 - \eta(E_n)$.

Bias of a Cyclic Scale
For a non-degenerate n-tone cyclic scale $C^h_n(\xi) \in T^\mu_n$, the bias $\theta(C^h_n(\xi)) = \theta_n(\xi)$ depends on the divergence of its elementary intervals u and d with regard to the elementary interval of the n-TET scale. Thus, bearing in mind Equation (10),
$$\theta_n(\xi) = \frac{1}{\log_2 n}\left[\frac{M}{n}(1 + m\xi)\log_2(1 + m\xi) + \frac{m}{n}(1 - M\xi)\log_2(1 - M\xi)\right].$$
The graph of the function $\theta_n$ is the mirror image up-to-down of $H_n$, scaled by the factor $\frac{1}{\log_2 n}$; hence, it is convex (Figure 3). The value $\theta_n(0) = 0$ is its minimum in the interval $\xi \in (-\frac{1}{m}, \frac{1}{M})$, and at the extremes (corresponding to degenerate scales), it takes the values $\theta_n(-\frac{1}{m}) = 1 - \frac{\log_2 m}{\log_2 n}$ and $\theta_n(\frac{1}{M}) = 1 - \frac{\log_2 M}{\log_2 n}$. Therefore, in the interval $(-\frac{1}{m}, 0)$, $\theta_n(\xi)$ is a decreasing function, and in $(0, \frac{1}{M})$, it is an increasing function of $\xi$. Note that this behavior of $\theta_n$ in terms of $\xi$ holds when n is fixed. In other words, $\theta_n = \theta(\xi; m, M)$. Then, although for another n′-tone cyclic scale, $n' \neq n$, the bias $\theta_{n'}$ has a similar behavior with respect to $\xi'$, we cannot assure that if $\xi < \xi'$, then $\theta(\xi; m, M) < \theta(\xi'; m', M')$, since this also depends on the values of m′ and M′, such that $m' + M' = n'$.
Figure 4 shows the trends of the bias $\theta_n$ and the interval comma $|\phi_n|$ for Pythagorean scales (generated by h = 3) in terms of n. In general, optimal scales have low bias, but the bias does not always decrease as a scale is refined. For example, the first optimal scales are those of n = 5; 12; 41; 53; 306; 665; ..., and the bias decreases for n = 5; 12; 53; 665, but the scale of n = 41 has greater bias than those of n = 5; 12, and the one of n = 306 has greater bias than those of n = 12; 53. Also notice that there are intervals where $|\phi_n|$ decreases but $\theta_n$ increases. We will say that a cyclic scale $E^h_n$ is of minimal bias (MB) if for any cyclic scale $E^h_{n'}$ with $n' < n$, $\theta_n < \theta_{n'}$. Obviously, this is tantamount to saying that $\eta_n > \eta_{n'}$; hence, the scale has greater normalized entropy than the previous ones. Not all optimal cyclic scales are MB.
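The non-monotonic behavior of the bias along the optimal Pythagorean scales can be checked with a short computation. The sketch assumes the bias is one minus the entropy divided by $\log_2 n$, and it builds the partition directly from the sorted notes; the function name `bias` is hypothetical.

```python
import math

def bias(h, n):
    """Bias 1 - H/log2(n) of the n-tone scale built from the iterates of h."""
    notes = sorted((k * math.log2(h)) % 1.0 for k in range(n))
    # Step intervals between consecutive notes, closing the octave at 1.0
    steps = [b - a for a, b in zip(notes, notes[1:] + [1.0])]
    H = sum(-t * math.log2(t) for t in steps)
    return 1.0 - H / math.log2(n)

theta = {n: bias(3, n) for n in (5, 12, 41, 53)}
```

With these values, the 12- and 53-tone scales each lower the bias, whereas the 41-tone scale has a larger bias than both the 5- and 12-tone scales, so it is optimal but not MB.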
Before a deeper analysis, in an approximate way, we may estimate how close an MB scale and an equal temperament scale of the same number of tones are. We may use the criterion for which a cyclic scale (not necessarily optimal) unambiguously approximates an n-TET scale if every note of the former is at a distance equal to or less than half an elementary interval from the latter. According to Equation (A12) with $\lambda = 2$, this leads to the condition $¢(\kappa_n) \le \frac{600}{n-1}$ (condition I|2n). This condition can be satisfied by optimal and non-optimal cyclic scales. For example, for n = 3, condition I|2n holds although the scale is not optimal. On the contrary, n = 41; 306; 111 202 are optimal scales but the condition I|2n is not met.
Let us see what happens with the MB condition. In this case, for up to $n = 2 \cdot 10^5$, all scales that satisfy MB also satisfy I|2n, but there are scales that are I|2n and not MB scales, for example, for n = 3; 306; 15 601; 31 867; 79 335. Hence, the condition that a note of the cyclic scale is closer than half an interval to a note of a tempered scale is weaker than the MB condition. On the contrary, if we further restrict the above condition, for example, so that the respective notes are at most a third of an interval apart, i.e., $¢(\kappa_n) \le \frac{400}{n-1}$ (condition I|3n), then for up to $n = 2 \cdot 10^5$, all I|3n scales are MB scales. In Table 1, these and other properties are displayed.
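Both conditions are simple to test from the comma in cents. The sketch below assumes the general bound $\frac{1200}{\lambda(n-1)}$ cents, which reduces to the two bounds quoted above for $\lambda = 2$ and $\lambda = 3$; the function names are hypothetical.

```python
import math

def comma_cents(h, n):
    """Interval comma |n*log2(h) - N| of the n-tone cyclic scale, in cents."""
    x = n * math.log2(h)
    return 1200 * abs(x - math.floor(x + 0.5))

def satisfies_condition(h, n, lam):
    """Condition I|lam*n: comma not larger than 1200 / (lam * (n - 1)) cents."""
    return comma_cents(h, n) <= 1200 / (lam * (n - 1))

i2n_12 = satisfies_condition(3, 12, 2)  # Pythagorean 12-tone scale
i2n_41 = satisfies_condition(3, 41, 2)  # optimal, but the comma is too large
i3n_53 = satisfies_condition(3, 53, 3)
```

For the Pythagorean family, the 12- and 53-tone scales pass the test, while the 41-tone scale fails condition I|2n even though it is optimal.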

Scale Distributed within Regular Intervals
We study several families of scales for which it is possible to estimate a priori the lower limit of the entropy.
We say that two n-tone scales $E_n$ and $E'_n$ sharing the fundamental tone alternate if their notes in cyclic order ($s_j = \log_2\sigma_j \in E_n$ and $s'_j = \log_2\sigma'_j \in E'_n$, where $0 < j < n$) fulfill one of the following conditions: If $E'_n$ is the equal temperament scale $E^\top_n$, then we write $\sigma'_j = \vartheta'_j$ and say that $E_n$ is regular interval distributed (RID). In this case, the alternation can also be defined from the following relations involving intervals between tones, (The interval between two tones Note that these conditions are generally more restrictive than the condition Obviously, Equation (16) implies the condition of Equation (17), although, as seen in Appendix B, for cyclic scales, they are equivalent.
For example, suppose case (a). If $0 \le I(\sigma_j, \vartheta'_j) < \frac{1}{n}$, then the interval between two consecutive notes of $E_n$ satisfies I(σ j , σ j+1 In a similar way, we would reason case (b).
Lemma 3. The interval between two consecutive notes of an n-tone RID scale is lower than $\frac{2}{n}$.

Scale r-Similar to n-TET
A criterion for measuring the proximity between scales is similarity [31]. Two n-tone scales T and T′ are similar at level r > 0 (r-similar) if for each tone For example, a cyclic scale $E^h_n$ and the n-TET scale have a level of similarity $\frac{1}{2n}$ if and only if any pair of notes with ordinal j, i.e., $\vartheta_j \in E^h_n$ and $\vartheta'_j \in E^\top_n$, satisfy $d(\vartheta_j, \vartheta'_j) \le \frac{1}{2n}$. This condition is equivalent to the condition of Equation (A12) with $\lambda = 2$, $¢(\kappa_n) \le \frac{600}{n-1}$. However, for the current purpose of evaluating and comparing entropy partitions, we will slightly modify such a concept by assuming that the fundamental is shared by both scales. Then, we say a scale $E_n$ is similar to the n-TET scale at level r, with $0 < r \le \frac{1}{2n}$, if their tones satisfy $d(\sigma_j, \vartheta'_j) \le r$; $0 < j < n$ (18)
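A direct check of this similarity condition compares each note with the n-TET note of the same ordinal. In the sketch below, the distance d is taken as the plain absolute difference of pitch classes, which is an assumption that is adequate when both scales share the fundamental and the deviations are small; the function names are hypothetical.

```python
import math

def max_deviation(h, n):
    """Largest distance between same-ordinal notes of the cyclic and n-TET scales."""
    notes = sorted((k * math.log2(h)) % 1.0 for k in range(n))
    return max(abs(s - j / n) for j, s in enumerate(notes))

def is_similar(h, n, r):
    """True if the cyclic scale is r-similar to n-TET (shared fundamental)."""
    return max_deviation(h, n) <= r

sim_12 = is_similar(3, 12, 1 / (2 * 12))  # Pythagorean 12-tone scale
sim_41 = is_similar(3, 41, 1 / (2 * 41))  # 41-tone scale
```

The largest deviation of the 12-tone Pythagorean scale is about 21.5¢, well within half a semitone, whereas the 41-tone scale exceeds half of its elementary interval, in line with condition I|2n.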

Scale One-Side r-Similar to n-TET
We introduce a concept that will mix the one of similarity with the one of distribution within regular intervals, which will be appropriate for studying the bounds of the entropy for cyclic scales.
Let $\vartheta'_j$ be the tones of the n-TET scale. We say that a scale $E_n$ is one-side r-similar (r-OSS) to the n-TET scale, with 0 It is OSS by the right (a) or by the left (b), respectively. Therefore, when $r = \frac{1}{n}$, it matches the definition of an RID scale.

Entropy of a Scale $\frac{1}{\lambda n}$-OSS to n-TET
Consider an n-tone scale $E_n$, not necessarily cyclic, which is $\frac{1}{\lambda n}$-OSS to n-TET for $\lambda \ge 1$. We already know that the maximum entropy is $\log_2 n$. Let us see the minimum entropy it can reach. We assume case (a) of Equation (19) and consider the n notes ($n \ge 2$) of $E_n$ in cyclic order in [0, 1) ∪ {1}, since the entropy does not change if we add a null set. We also extend the range of the possible variation in the intervals between notes at the extremes to include the limiting degenerate cases. The points determining the division of the octave are By writing $T = \frac{1}{n}$ and assuming $t_0 = t_n = 0$, the respective intervals are Then, the scale $E_n$ generates the partition $\alpha = \{T_i\}$, $i \in \{1, \ldots, n\}$, with entropy The entropy $H(\alpha)$ is a concave and differentiable function $f(t_1, \ldots, t_{n-1})$, $f: V \to \mathbb{R}$, with $V = [0, \frac{1}{\lambda n}]^{n-1}$, a hypercube, which is a convex and compact space. Then, f has a local and global maximum at $(t_1, \ldots, t_{n-1}) = (0, \ldots, 0)$, corresponding to an n-TET scale with $T_i = T$ and $H = \log_2 n$, and the global minimum is reached at one or some of the vertices of V. These vertices are determined using the possible values $t_j \in \{0, \frac{1}{\lambda n}\}$, $1 \le j \le n-1$, and the resulting intervals $T_i$ can only take the following values: interval $T_1$, values of which A in number have width $\frac{\lambda - 1}{\lambda n}$, B width $\frac{1}{n}$, and C width $\frac{\lambda + 1}{\lambda n}$, so that (Some of these intervals may have width zero, giving rise to a degenerate scale with less than n non-null intervals.) The entropy at one of these vertices can be written in terms of the respective number of intervals as where z is the concave function defined in Equation (1), also extended for values t > 1, where z(t) < 0.
Among possible configurations, we look for the minimum value that f can take, which will correspond to the minimum value of g. Notice that this quantity is added to the entropy of an n-TET scale, as it is consistent with the fact that the B intervals of width $\frac{1}{n}$ are not involved in the expression. However, the C intervals, of greater width than $\frac{1}{n}$, contribute to decreasing the entropy, while the A intervals, of width lower than $\frac{1}{n}$, contribute to increasing it, since the corresponding function z evaluated in values less than 1 is positive.

Entropy of a Scale $\frac{1}{\lambda n}$-Similar to n-TET
Let us calculate the minimum entropy that can be reached by an n-tone scale $E_n$, not necessarily cyclic, with a similarity level $\frac{1}{\lambda n}$ with the n-TET scale for $\lambda \ge 2$. According to the previous notation and considerations, the points that determine the division of the octave are by assuming $t_0 = t_n = 0$. As before, we write the entropy of the partition $\alpha = \{T_i\}$, $i \in \{1, \ldots, n\}$, as , a convex and compact space. The local and global maximum of f takes place at $(t_1, \ldots, t_{n-1}) = (0, \ldots, 0)$, corresponding to the scale of n-TET, with $T_i = T$ and $H = \log_2 n$. The global minimum is reached at one or some of the vertices of V. These vertices are determined using the possible values $t_j \in \{-\frac{1}{\lambda n}, \frac{1}{\lambda n}\}$, $1 \le j \le n-1$, and the resulting intervals $T_i$ can only take the following values: the values $\frac{\lambda - 1}{\lambda n}$ and $\frac{\lambda + 1}{\lambda n}$ of the extreme intervals $T_1$ and $T_n$, while the intermediate intervals can have values $\frac{\lambda - 2}{\lambda n}$, $\frac{1}{n}$, and $\frac{\lambda + 2}{\lambda n}$. The octave is covered by a number of different intervals satisfying of which A in number have width $\frac{\lambda - 2}{\lambda n}$, B width $\frac{\lambda - 1}{\lambda n}$, C width $\frac{1}{n}$, D width $\frac{\lambda + 1}{\lambda n}$, and E width $\frac{\lambda + 2}{\lambda n}$, so that The entropy at one of these vertices can be written in terms of the respective number of intervals: Once again, the intervals of width $\frac{1}{n}$ are not involved in this expression. The intervals in numbers D and E, of greater width than $\frac{1}{n}$, contribute to decreasing the entropy, while those in numbers A and B, of width lower than $\frac{1}{n}$, contribute to increasing it. The minimum value of g is obtained using the highest possible values of E and D (in that order) and the minimum of A and B.

Cyclic RID Scales
We may reformulate Theorem A1 according to the above definitions. Theorem 3. Optimal cyclic scales are RID scales.
Nevertheless, there are many non-optimal cyclic scales that are also RID scales.Cyclic RID scales correspond to partial convergents of continued fractions where the comma is not as low as for optimal scales, i.e., their closure is not a best approximation, although they belong to a family of relatively good partial convergents.However, although in most cases, a convergent that gives a good approximation (i.e., an accurate scale) generates a RID scale, there are exceptions, such as for n = 200.Between the notes j = 83 of the respective scales E 3 200 and E ⊤ 200 , there is a distance of nearly 1.5 elementary intervals of the equal temperament scale, and between two consecutive notes of the cyclic scale, there may be a distance equivalent to 2.1 regular intervals, which means that within some regular intervals, there are two notes of the cyclic scale.
Therefore, there exist accurate scales that are not RID, and RID scales that are not accurate.

Comma and Elementary Intervals of an RID Scale
An RID scale is immediately identified from its interval comma $|\phi_n| = |\log_2 \gamma_n| = |n \log_2 h - N|$. As seen in Appendix B, the distance between the tones $\vartheta'_j \in E^{\top}_n$ and
This condition, which is satisfied by all optimal scales, ensures that the value of $|\phi_n|$ is small enough to be far from the previously seen degenerate cases of fewer than n notes, since for every RID scale,
From another point of view, by considering RID scales, we are excluding "bad approximations" of $\log_2 h$ that satisfy any of the following: if
This fact also has implications for the bounds of the two elementary intervals of cyclic RID scales,
We distinguish two cases: (a) M > m: Therefore, the extremes still correspond to n-tone scales, avoiding degenerate cases.
and through substitution into Equation (14), we obtain
The same happens with the higher-order terms. Therefore, for sufficiently small $|\phi_n|$, we can use the approximation derived from the following result.
Lemma 5. The bias of a cyclic RID scale satisfies

Cyclic Scales of Minimal Bias
As explained in Section 4, when a cyclic scale is refined, in each iteration one of the indices of the extreme tones remains fixed, while the other increases by a value equal to the one that remains fixed. Let us see that in each refinement, the following function appearing in Equation (34) always increases:
Notice that for m + M > 2, the above inequalities hold, and if m + M = 2, then M = m = 1, so that the above results are also valid.

Corollary 1. Let us write ψ
With this notation, the bias of a cyclic RID scale can be estimated from
Theorem 4. Every cyclic RID scale of minimal bias is optimal.
Proof. We prove this by contraposition, i.e., by denying the consequent.
Therefore, if a cyclic RID scale is not optimal, it cannot be MB.

Conclusions
In this paper, we proposed measuring the regularity of the intervals of a music scale by their partition entropy. Several properties make this parameter well suited to the purpose: for an n-TET scale, it is a continuous increasing function of n and is always the maximum value that the entropy of any n-tone scale can reach, and its sub-additivity guarantees that the entropy always increases when the partition is refined. In order to compare scales with different numbers of tones, the entropy relative to that of the corresponding regular scale is used, i.e., the normalized entropy, so that its complement to 1 quantifies the bias relative to the n-TET scale.
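The three quantities summarized above (partition entropy, normalized entropy, and bias) can be illustrated with a short sketch. The function names are hypothetical, and the example scale (a diatonic pattern measured in 12-TET steps) is chosen here only for illustration; the widths are rescaled so that they sum to one octave.

```python
import math

def partition_entropy(intervals):
    """Base-2 entropy of the partition given by interval widths (any positive unit)."""
    total = sum(intervals)
    fractions = [w / total for w in intervals]  # rescale so the octave has width 1
    return -sum(p * math.log2(p) for p in fractions if p > 0)

def normalized_entropy(intervals):
    """Entropy relative to the n-TET scale with the same number of tones (n >= 2)."""
    return partition_entropy(intervals) / math.log2(len(intervals))

def bias(intervals):
    """Complement to 1 of the normalized entropy; zero for an n-TET scale."""
    return 1.0 - normalized_entropy(intervals)

diatonic = [2, 2, 1, 2, 2, 2, 1]       # illustrative 7-tone scale in semitone steps
refined = [1, 1, 2, 1, 2, 2, 2, 1]     # same scale with the first whole tone split

print("bias of 12-TET:   ", bias([1] * 12))
print("bias of diatonic: ", bias(diatonic))
print("entropy before/after refinement:",
      partition_entropy(diatonic), partition_entropy(refined))
```

The last line shows the refinement property in a single case: splitting one interval raises the entropy, although the bias of the refined scale must be compared against a different regular scale (8-TET instead of 7-TET).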
The main application of these concepts has been to cyclic scales, whose properties were reviewed and further investigated in the Appendices. Since non-degenerate cyclic scales are maximal even sets [3,4] and their intervals come in two possible sizes [1,2], the remaining properties allowing us to distinguish between scale distributions are the number of intervals of each size and the ratio between them.
Two situations have been analyzed. The first concerns cyclic scales with a fixed number of tones, which form a family of modulating temperament scales around one generator. In this case, the bias only depends on the closure, i.e., the relative size of both elementary intervals. The second concerns the refinement of cyclic scales, where the bias also depends on how many intervals of each size there are.
In order to study this dependency, it was necessary to restrict the scales in two ways. First, we focused on scales with a lower limit of the entropy, determined by the condition that their notes are distributed along each of the intervals of a regular scale (RID scales). This study was conducted in a general way, by calculating the maximum bias of several scales, not necessarily cyclic, with different levels of similarity with an n-TET scale, either from one side or from both sides. Figure 4 displays the similarity levels for cyclic scales. In addition, we considered scales with the comma not exceeding that of an RID scale, since in this case, the dependency between bias and closure is well defined.
We proved that any cyclic scale of minimal bias (MB scale), i.e., with a bias that is lower than that of the cyclic scales of fewer tones, is necessarily optimal, i.e., corresponds to a best rational approximation of $\log_2 h$.
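The link between optimal scales and best rational approximations of $\log_2 h$ can be explored with a small sketch. It assumes that N is the integer nearest to $n \log_2 h$ and that a scale is "optimal" when its comma $|\phi_n|$ is smaller than that of every scale with fewer tones; both are working assumptions for this illustration, and the function names are hypothetical.

```python
import math

def interval_comma(h, n):
    """|phi_n| = |n*log2(h) - N|, taking N as the nearest integer (assumption)."""
    x = n * math.log2(h)
    return abs(x - round(x))

def record_commas(h, n_max):
    """Numbers of tones n <= n_max whose comma beats every smaller n."""
    best = float("inf")
    records = []
    for n in range(1, n_max + 1):
        phi = interval_comma(h, n)
        if phi < best:
            best = phi
            records.append(n)
    return records

print(record_commas(3, 60))
print("Pythagorean comma in cents:", 1200 * interval_comma(3, 12))
```

For h = 3, the record list includes the familiar sizes 5, 12, 41, and 53, and the comma of the 12-tone scale is the Pythagorean comma of about 23.46 cents.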
Notice that the bias, $\theta_n$, according to Equations (35) and (36), is proportional to the product of $\psi(m, M)$, depending on each number of elementary intervals, and $\phi_n^2$, depending on their relative size, so that $\theta_n$ accounts for the degrees of freedom allowed for cyclic scales.
Among the optimal scales, it was possible to select the ones for which intervals are distributed along the octave as regularly as possible relative to an equal temperament scale of the same number of tones. Therefore, in relation to the closure, scales can be ordered in nested families, from worst to best, as cyclic, accurate, and optimal scales, whilst in relation to their regularity, they can be ordered in nested families as cyclic, RID, optimal, and MB scales.
Although the current entropy-based measure has been particularly used to deepen the study of cyclic scales, the present work clearly suggests future applications to more general cases, also through using alternative metrics, either based on the distribution of the scale notes or the intervals, such as the recent Boltzmann-Shannon Interaction Entropy [32], allowing us to estimate a normalized entropy from a finite sample of points on a bounded interval.
Table A1 summarizes the above results over the octave $S_0$. Notice that, for optimal scales, Equations (A3) and (A6) give the following bounds, similarly to the theory of continued fractions:
Table A1. Bounds for $\phi_n = \log_2 \gamma_n$, $u = \log_2 U$, and $d = \log_2 D$, depending on the type of cyclic scale.

Figure 1. (Left) Tones (frequencies $\nu_k$) of the cyclic scale for h = 3 and n = 12 in [1, 2), in order of iterates (blue) and in pitch order (green). The red dot is assumed to close the scale. (Center) Ratios between iterates (blue) and consecutive scale tones (green). (Right) Scale intervals in cents ($1200 \log_2 \nu_k$) along the circle of the octave in the clockwise direction (number in white for the first one), with two interval sizes.
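The construction shown in Figure 1 can be reproduced with a few lines. This is a minimal sketch under stated assumptions: tones are handled as pitch classes $\log_2 \nu_k$ in [0, 1) rather than as frequencies in [1, 2), the scale starts at iterate 0, and the function names are hypothetical.

```python
import math
from collections import Counter

def cyclic_scale(h, n):
    """Pitch classes (in octaves, within [0, 1)) of the first n iterates of h, in pitch order."""
    step = math.log2(h)
    return sorted((k * step) % 1.0 for k in range(n))

def intervals_in_cents(tones):
    """Consecutive intervals around the octave circle, in cents (1200 per octave)."""
    closed = tones + [tones[0] + 1.0]  # close the circle at the octave
    return [1200.0 * (b - a) for a, b in zip(closed, closed[1:])]

tones = cyclic_scale(3, 12)
sizes = Counter(round(c, 3) for c in intervals_in_cents(tones))
for size, count in sorted(sizes.items()):
    print(f"{count} intervals of {size} cents")
```

For h = 3 and n = 12, the intervals fall into the two sizes mentioned in the caption, roughly 90.2 and 113.7 cents.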

Figure 2 displays how entropy increases with the number of tones of a Pythagorean scale. The left panel refers to the scales with a lower number of tones, while the right panel shows the larger-scale trend (on a logarithmic scale), making explicit the values of δ.

Lemma 1. The scales formed from the two elementary intervals $u' = \frac{1 + m\xi}{n}$ and $d' = \frac{1 - M\xi}{n}$ with values $-\frac{1}{m} < \xi < \frac{1}{M}$ generate an infinite and continuous family of n-tone cyclic scales C
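The dependence of the bias on the parameter ξ of this family can be examined numerically. The sketch below rests on assumptions not stated in the lemma as it appears here: that n = m + M, and that the scale has M intervals of width u′ and m intervals of width d′, which is the combination that makes the widths add up to one octave. The function names are hypothetical.

```python
import math

def family_intervals(m, M, xi):
    """Widths of the assumed family: M intervals u' and m intervals d', with n = m + M."""
    n = m + M
    if not (-1.0 / m < xi < 1.0 / M):
        raise ValueError("xi must satisfy -1/m < xi < 1/M")
    u = (1.0 + m * xi) / n
    d = (1.0 - M * xi) / n
    return [u] * M + [d] * m

def bias(intervals):
    """1 minus the entropy normalized by log2(n); the order of intervals is irrelevant."""
    n = len(intervals)
    entropy = -sum(w * math.log2(w) for w in intervals if w > 0)
    return 1.0 - entropy / math.log2(n)

m, M = 5, 7
for xi in (0.02, 0.01, 0.005):
    theta = bias(family_intervals(m, M, xi))
    approx = m * M * xi ** 2 / (2.0 * math.log(m + M))  # second-order estimate
    print(f"xi = {xi}: bias = {theta:.3e}, quadratic estimate = {approx:.3e}")
```

Under these assumptions, expanding the logarithms to second order gives a bias of approximately $\frac{mM\xi^2}{2\ln n}$, so halving ξ divides the bias by about four. This estimate is derived here for illustration and is not quoted from the paper.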

Figure 3. Graphs of normalized entropy and bias.

Figure 4. Behavior of $|\phi_n| = \log_2 \kappa_n$ (red) and $\theta_n = 1 - \eta_n$ (blue) in terms of the number of tones n (bilogarithmic scales). The maximum bias for several similarity levels to n-TET scales is shown in gray.

Table 1. Properties
$< u < \frac{1}{n}$. Once again, the extremes correspond to n-tone scales not degenerating toward scales of fewer tones. (b) M < m:
For large values of $|\phi_n|$, that is, $|\phi_n| > \frac{1}{m}$ or $|\phi_n| > \frac{1}{M}$, Equation (14) can have a quite arbitrary and non-symmetrical behavior, but for values $|\phi_n| < \frac{1}{n-1}$, and in particular, for optimal scales with $|\phi_n| < \frac{1}{n+M}$ or $|\phi_n| < \frac{1}{n+m}$, depending on the value δ, the bias $\theta_n$ behaves as proportional to $\phi_n^2$. With $|m\phi_n| < 1$ and $|M\phi_n| < 1$, we can approximate the logarithms in the following expressions as
Assume two cyclic RID scales $E^h_n$ and $E^h_{n'}$ such that $n > n'$ with interval commas satisfying $|\phi_n| > |\phi_{n'}|$, i.e., $E^h_n$ is not optimal. Then, applying the previous corollary to Equation (36), we have