Article

Elementary yet Precise Best-Case Analysis of MergeSort with an Application to the Sum of Digits Problem

by
Marek A. Suchenek
Department of Computer Science, California State University, Dominguez Hills, 1000 E. Victoria St., Carson, CA 90747, USA
Mathematics 2026, 14(4), 733; https://doi.org/10.3390/math14040733
Submission received: 4 January 2026 / Revised: 31 January 2026 / Accepted: 9 February 2026 / Published: 21 February 2026

Abstract

An exact formula $B(n) = \frac{n}{2}(\lfloor \lg n \rfloor + 1) - \sum_{k=0}^{\lfloor \lg n \rfloor} 2^{k} \operatorname{Zigzag}\left(\frac{n}{2^{k+1}}\right)$, where $\operatorname{Zigzag}(x) = \min(x - \lfloor x \rfloor, \lceil x \rceil - x)$, for the minimum number $B(n)$ of comparisons of keys performed by MergeSort on an n-element array is derived and analyzed. The said formula is less complex than any other known formula for the same and can be evaluated in $O(\log^{c} n)$ time, where c is a constant. It is shown that there is no closed-form formula for the above. Other variants for $B(n)$ are described as well. Since the recurrence relation for the minimum number of comparisons of keys for MergeSort is identical with a recurrence relation for the number of 1s in binary expansions of all integers between 0 and n (exclusively), the above results extend to the sum of binary digits problem.
MSC:
11A63; 11B37; 68P10; 68R99; 68W40

1. Introduction

“One Picture is Worth a Thousand Words”
[An advertisement for the San Antonio Light (1918)]
Teaching undergraduate Analysis of Algorithms has been a rewarding, although a bit taxing, experience. I was often surprised to learn that many basic problems that clearly belong to its core syllabus had been left unanswered or partially answered. Additionally, it seemed a bit odd to me that many otherwise decent texts offered unnecessarily imprecise computations (text [1] being a notable exception in this category) of several rather fundamental results.
In this article, I pursue a seemingly marginal—albeit fundamental—topic, a precise characterization of the best-case behavior of a well-known sorting algorithm, MergeSort , whose pursuit, however, yields some interesting findings that could hardly be characterized as “marginal.” It turns out that—contrary to what a casual student of this subject might believe—computing the exact formula for the number of comparisons of keys that MergeSort performs on any n-element array in the best case is not a routine exercise (in fact, it is significantly more complicated than precise derivation of the well-known exact formula for the number of comparisons of keys performed by it in the worst case) and leads to an instance of a problem that gained some notoriety for being a hard nut to crack analytically: the sum of digits problem. Even more unexpectedly, the relatively straightforward (although not quite closed-form) formula for the said number of comparisons yields an improvement of a well-known answer to this instance of the sum of digits problem:
How many 1s appear in binary representations of all integers between (but not including) 0 and n?

2. Materials and Methods

Several derivations presented in this article may be characterized as experimental mathematics. Once they produced experimentally (using Wolfram Mathematica software) the desired (sought-after) formulas, finding analytic proofs of the said formulas became a relatively easy and straightforward exercise, and has been done in Section 9, Section 10 and Section 11. Those derivations provide an insightful exercise that demonstrates how certain, apparently, hard-to-solve recurrence relations pertaining to the complexity of divide-and-conquer algorithms can be practically and precisely solved by means of experiments and computations augmented with formal analytic proofs of the derived solutions.
No materials were used in the research described in this article.

3. Results

The main result of this article is the following formula of the Main Theorem that gives the exact number B ( n ) of comparisons of keys performed by the classic MergeSort algorithm in the best case while sorting an n-element array.
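$$B(n) = \frac{n}{2}(\lfloor \lg n \rfloor + 1) - \sum_{k=0}^{\lfloor \lg n \rfloor} 2^{k} \operatorname{Zigzag}\left(\frac{n}{2^{k+1}}\right),$$
where
$$\operatorname{Zigzag}(x) = \min(x - \lfloor x \rfloor, \lceil x \rceil - x).$$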
The above formula may be evaluated in sublinear $O(\log^{c} n)$ time, where c is a constant, which is considerably shorter than the time needed for running MergeSort on its best-case input array (for instance, an array of consecutive integers) of n elements or for counting directly how many times digit 1 occurs in binary representations of all integers between 1 and n. This constitutes a significant improvement over another exact formula, attributed to J. R. Trollope and discussed in Section 8, for the said number of 1s, which involves infinite summation. Even if only finitely many terms in that infinite summation are non-zero, the Trollope formula does not indicate how long its computation will take for a given value of n. For instance, the Trollope formula may possibly require more time to evaluate than counting directly how many times digit 1 occurs in binary representations of all integers between 1 and n. (See Note 1 at the end of this Section.)
The exactness of the formula derived and proved in this article allows for its definite experimental verification, which in general cannot be done for characterizations given by Big-O/Big-Ω/Big-Θ notation. Such an experimental verification has been performed by the author using a Java program that incorporates the code for Merge visualized in Figure 1.
Moreover, the exact and practical way to compute how many comparisons of keys MergeSort performs in the best case while sorting an n-element array fills a significant gap in many computer science faculty and students’ knowledge of the time complexity of MergeSort and satisfies the cognitive curiosity of those who realize the advantages of knowing things exactly in exact sciences and mathematics. (See Note 2 at the end of this Section.)
It is also demonstrated (see Corollary 3) that there is no exact closed-form formula that gives the said number of comps and (in Section 7) that the said number is significantly larger (by a term of size $\Theta(n)$) than half of the number of comps performed by MergeSort in the worst case.
Note 1. 
Since the total number of significant bits in binary representations of all numbers between 1 and n is equal to $\sum_{i=1}^{n} (\lfloor \lg i \rfloor + 1)$, the direct counting of 1s in all those numbers requires at least $\lg n!$ or, by virtue of Stirling’s formula, $n \lg n - n \lg e + O(\lg n)$ steps that take $\Theta(n \log n)$ time to carry out, which is significantly more than the $O(\log^{c} n)$ time that is sufficient for the evaluation of Formula (15) derived and proved in this article.
Note 2. 
A student of mine was once trying to convince me that the number of comparisons of keys that MergeSort performs in the worst case was exactly equal to twice the number it performed in the best case. In order to show her why it was not exactly the case, I experimentally derived an exact formula for it. This is how I got into writing this article.

4. MergeSort and Its Best-Case Behavior

A call to MergeSort inherits an n-element array A of integers and sorts it non-decreasingly, following the steps described below.
A Java code of Merge is shown in Figure 1.
A typical measure of the running time of MergeSort is the number of comparisons of keys, which for brevity I call comps, that it performs while sorting array A . Since no comps are performed outside Merge , the running time of MergeSort can be computed as the sum of numbers of comps performed by all calls to Merge during the execution of MergeSort .
Because all comps during the execution of Merge are performed within the while-loop that begins in line 76 of the code shown in Figure 1, and the said loop does not terminate before the last element of one of the two arrays has been compared with an element of the other array, the number of comps performed by Merge on two arrays is never less than the size of the smaller of the two arrays passed to it, which in the case of step 2c of Algorithm 1 is not less than $\lfloor n/2 \rfloor$, thus making $\lfloor n/2 \rfloor$ a lower bound for the said number of comps. Since the number of comps performed by Merge in the case where all the elements of the first array of size $\lfloor n/2 \rfloor$ are less than all the elements of the second array of size $\lceil n/2 \rceil$ is actually equal to the size of the first array, it follows that $\lfloor n/2 \rfloor$ is the minimum number of comps performed during the execution of Merge (and not just a lower bound of that number).
Algorithm 1  MergeSort
To sort an n-element array A, do the following:
  1. If $n \leq 1$, then return A to the caller;
  2. If $n \geq 2$, then
    (a) Pass the first $\lfloor n/2 \rfloor$ elements of A to a recursive call to MergeSort;
    (b) Pass the last $\lceil n/2 \rceil$ elements of A to another recursive call to MergeSort;
    (c) Linearly merge, by means of a call to Merge, the non-decreasingly sorted arrays that were returned from those calls into one non-decreasingly sorted array A;
    (d) Return A to the caller.
Obviously, an increasingly sorted array A of any size $n \geq 2$ produces only best-case scenarios for all subsequent calls to Merge. Therefore, $\lfloor k/2 \rfloor$, where $k \geq 2$ is the size of any sub-array of A passed to any (recursive) call to MergeSort, is the actual minimum of, and not just a lower bound on, the number of comps performed by Merge invoked by the said call to MergeSort. This fact allows for a rudimentary analysis of the recursion tree for MergeSort that easily yields, as we will see in the next two subsections, the exact Formula (7) for the minimum number of comps for the entire MergeSort. Without said fact, one could only derive by means of analysis of the recursion tree a lower bound on the number of comps performed by MergeSort, even though, as I have pointed out above, the size $\lfloor n/2 \rfloor$ of the first sub-array passed to Merge is the actual minimum of comps performed at this level and not just a lower bound.
The problem arises when one tries to reduce the said formula, which naturally involves long summations, to one that can be evaluated in a logarithmic time.
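The best-case behavior described above can be observed directly. The following is a minimal Python sketch, not the author's Java code from Figure 1; the names `merge`, `merge_sort`, and `comps` are illustrative assumptions. It follows the steps of Algorithm 1 and counts comparisons of keys made inside the merging loop.

```python
def merge(left, right, counter):
    """Linearly merge two sorted lists; counter[0] accumulates key comparisons."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1                       # one comparison of keys (a "comp")
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])                   # leftovers are copied without comps
    merged.extend(right[j:])
    return merged


def merge_sort(a, counter):
    """Algorithm 1: sort list a, splitting into floor(n/2) and ceil(n/2) elements."""
    n = len(a)
    if n <= 1:
        return a
    half = n // 2
    return merge(merge_sort(a[:half], counter),
                 merge_sort(a[half:], counter),
                 counter)


def comps(a):
    """Number of comps performed while sorting a copy of a."""
    counter = [0]
    result = merge_sort(list(a), counter)
    assert result == sorted(a)
    return counter[0]
```

On an increasingly sorted input every call to `merge` stops after exhausting its first argument, so `comps(range(8))` performs 12 comps, while the input `[1, 3, 2, 4]` forces 5 comps on 4 elements.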

4.1. Recursion Tree

The obvious recursion tree for MergeSort and sufficiently large n is shown in Figure 2 and explained in some detail in the caption below it.
A recursive application of the equality
$$\left\lceil \frac{n}{2} \right\rceil = \left\lfloor \frac{n+1}{2} \right\rfloor \tag{1}$$
(which can be easily verified separately for odd and even values of n) allows for rewriting of that tree onto one whose first four levels are shown in Figure 3.

4.2. Recurrence Relation $B(n)$ for the Minimum Numbers of Comps

As I noted before, the number of comps performed by a single call to MergeSort in the best case on an n-element array for $n \geq 2$ is equal to $\lfloor n/2 \rfloor$. (Recall that all comps performed during a single call to MergeSort are done within the call to Merge that MergeSort invokes after both recursive calls to itself have been completed).
Therefore, the following recurrence relation for the minimum number $B(n)$ of comparisons of keys that MergeSort performs on any n-element array is straightforward to derive from its description given by Algorithm 1, as follows:
$$B(1) = 0, \tag{2}$$
and for $n \geq 2$,
$$B(n) = \left\lfloor \frac{n}{2} \right\rfloor + B\left(\left\lfloor \frac{n}{2} \right\rfloor\right) + B\left(\left\lceil \frac{n}{2} \right\rceil\right). \tag{3}$$
Using the equality (1), the recurrence relation (3) is equivalent to
$$B(n) = \left\lfloor \frac{n}{2} \right\rfloor + B\left(\left\lfloor \frac{n}{2} \right\rfloor\right) + B\left(\left\lfloor \frac{n+1}{2} \right\rfloor\right). \tag{4}$$
A graph of B ( n ) is shown in Figure 4.
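The recurrence (2) and (4) can be evaluated directly. The sketch below is an illustrative Python transcription (the function name `B` and the use of memoization are assumptions made here, not part of the article); it serves as a reference implementation against which closed-form candidates can be checked.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def B(n):
    """Minimum number of comps of MergeSort on n elements, by recurrences (2) and (4)."""
    if n <= 1:
        return 0                                  # equality (2)
    return n // 2 + B(n // 2) + B((n + 1) // 2)   # equality (4)
```

For example, `[B(n) for n in range(1, 9)]` evaluates to `[0, 1, 2, 4, 5, 7, 9, 12]`.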
Note 3. 
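If n is a power of 2, that is, if $n = 2^{\lfloor \lg n \rfloor}$, then
$$B(n) = \frac{n}{2} \lg n.$$
Indeed, in such a case, recurrences (3) and (4) simplify to $B(n) = \frac{n}{2} + 2 B\left(\frac{n}{2}\right)$, so that $B(n) = 2 \cdot \frac{n}{2} + 2^{2} B\left(\frac{n}{2^{2}}\right) = \dots = \lfloor \lg n \rfloor \frac{n}{2} + 2^{\lfloor \lg n \rfloor} B\left(\frac{n}{2^{\lfloor \lg n \rfloor}}\right) = \lfloor \lg n \rfloor \frac{n}{2} + 2^{\lfloor \lg n \rfloor} B(1) =$ [since $\lfloor \lg n \rfloor = \lg n$ and, by the equality (2), $B(1) = 0$] $= \frac{n}{2} \lg n$.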

4.3. A Solution of Recurrence Relation $B(n)$

Unfolding the recurrence (4) allows for noticing that the minimum number $B(n)$ of comps performed by all calls to Merge is equal to the sum of all values shown at nodes highlighted yellow in the recursion tree T of Figure 3. They may be summed up level by level. One can notice from Figure 3 that the number of comps performed at any level k with the maximal number $2^{k}$ of nodes is given by this formula:
$$\sum_{i=0}^{2^{k}-1} \left\lfloor \frac{n+i}{2^{k+1}} \right\rfloor. \tag{5}$$
What is not clear is whether all levels of the recursion tree T are maximal. Fortunately, the answer to this question does not depend on whether a given instance of MergeSort is running on a best-case array or on any other array. It has been known from a classic analysis of the worst-case running time of MergeSort that every level of its recursion tree T that contains at least one non-leaf (in other words, a node that shows a value $p \geq 2$) is maximal. Appendix A contains a detailed derivation of that fact. Thus all levels 0 through $h - 1$ of T are maximal. Therefore, Formula (5) gives the number of comps for every level $0 \leq k \leq h - 1$.
The last level h of T may be not maximal because the level $h - 1$ may contain leaves (in other words, nodes that show the value $p = 1$, where $p = \left\lfloor \frac{n+i}{2^{h-1}} \right\rfloor$ for some $0 \leq i \leq 2^{h-1} - 1$), which do not have any children in the level h. However, for each such node, the value of $\left\lfloor \frac{p}{2} \right\rfloor = \left\lfloor \frac{n+i}{2^{h}} \right\rfloor$ is 0, so it can be included in summation (5) without affecting its value even though the said value does not correspond to any node in the level h. Therefore, Formula (5) also gives the number of comps for the level $k = h$.
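Additionally, the depth h of T is known to be equal to $\lceil \lg n \rceil$, as Theorem A2 in Appendix A states. Thus the minimum number of comps performed by MergeSort is given by this summation
$$\sum_{k=0}^{\lceil \lg n \rceil} \sum_{i=0}^{2^{k}-1} \left\lfloor \frac{n+i}{2^{k+1}} \right\rfloor. \tag{6}$$
Since $n - 1 < 2^{\lceil \lg n \rceil}$, so that $n + 2^{\lceil \lg n \rceil} - 1 < 2^{\lceil \lg n \rceil + 1}$, for every $0 \leq i \leq 2^{\lceil \lg n \rceil} - 1$ we have $n + i < 2^{\lceil \lg n \rceil + 1}$, thus making
$$\left\lfloor \frac{n+i}{2^{\lceil \lg n \rceil + 1}} \right\rfloor = 0,$$
it follows that the value of the inner summation (5) in summation (6) for $k = h = \lceil \lg n \rceil$ is 0. Therefore, Formula (6) is equal to this slightly shorter formula (if n is a power of 2, then $\lfloor \lg n \rfloor = \lceil \lg n \rceil$ and the two formulas coincide; otherwise $\lfloor \lg n \rfloor = \lceil \lg n \rceil - 1$ and only the zero term for $k = \lceil \lg n \rceil$ is dropped):
$$\sum_{k=0}^{\lfloor \lg n \rfloor} \sum_{i=0}^{2^{k}-1} \left\lfloor \frac{n+i}{2^{k+1}} \right\rfloor. \tag{7}$$
Unfortunately, summation (7) contains $n - 1$ non-zero terms (each corresponding to one of $n - 1$ internal nodes of 2-tree T), so it cannot be evaluated quickly in its present form. Fortunately, as I am going to demonstrate, its inner summation (5) can be reduced to a closed-form formula (13).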
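As an illustration of why summation (7) is exact but slow, the following Python sketch lists its terms explicitly. It assumes the reading of (7) with upper limit $\lfloor \lg n \rfloor$; the names `terms` and `B_rec` are hypothetical helpers introduced here. The sum of the terms should agree with the recurrence (2) and (4), and the number of non-zero terms should be $n - 1$.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def B_rec(n):
    """Reference value of B(n) from recurrences (2) and (4)."""
    return 0 if n <= 1 else n // 2 + B_rec(n // 2) + B_rec((n + 1) // 2)


def terms(n):
    """All terms floor((n + i) / 2^(k+1)) of summation (7), level by level."""
    return [(n + i) // 2 ** (k + 1)
            for k in range(n.bit_length())   # k = 0 .. floor(lg n)
            for i in range(2 ** k)]          # i = 0 .. 2^k - 1
```

Evaluating `sum(terms(n))` touches on the order of n terms, which is the cost that the Zigzag-based reduction in the next subsection avoids.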

4.4. Zigzag Function

In order to reduce (5) to a closed form, I am going to use the function Zigzag defined by
$$\operatorname{Zigzag}(x) = \min(x - \lfloor x \rfloor, \lceil x \rceil - x). \tag{8}$$
The following fact is instrumental for that purpose.
Theorem 1. 
For every natural number n and every positive natural number m,
$$\sum_{i=m}^{2m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor - \sum_{i=0}^{m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor = 2m \times \operatorname{Zigzag}\left(\frac{n}{2m}\right), \tag{9}$$
where Zigzag is a function defined by (8) and visualized in Figure 5.
Proof. 
The equality (9) can be verified experimentally, for instance, with the help of software for symbolic computation; I used Wolfram Mathematica for that purpose. The analytic proof is deferred to Section 9. □
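The experimental verification mentioned in the proof can be reproduced in a few lines. The sketch below uses Python with exact rational arithmetic instead of Wolfram Mathematica (a substitution made here for illustration); `zigzag`, `lhs_9`, and `rhs_9` are hypothetical names for the two sides of equality (9).

```python
from fractions import Fraction
from math import floor, ceil


def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x), definition (8)."""
    return min(x - floor(x), ceil(x) - x)


def lhs_9(n, m):
    """Left-hand side of (9): upper-half sum minus lower-half sum."""
    upper = sum((n + i) // (2 * m) for i in range(m, 2 * m))
    lower = sum((n + i) // (2 * m) for i in range(0, m))
    return upper - lower


def rhs_9(n, m):
    """Right-hand side of (9), computed exactly with Fraction."""
    return 2 * m * zigzag(Fraction(n, 2 * m))
```

Such a check over a finite range of n and m is evidence, not a proof; the analytic argument remains the one in Section 9.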
Corollary 1. 
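For every natural number n and every positive natural number m,
$$\sum_{i=0}^{m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor = \frac{n}{2} - m \times \operatorname{Zigzag}\left(\frac{n}{2m}\right), \tag{10}$$
where Zigzag is a function defined by (8) and visualized in Figure 5.
Proof. 
First, let us note (analytic proof of this fact is a straightforward exercise; see Appendix B) that
$$\sum_{i=0}^{2m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor = n. \tag{11}$$
From (11), I conclude
$$\sum_{i=0}^{m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor + \sum_{i=m}^{2m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor = n. \tag{12}$$
Solving Equations (9) and (12) for $\sum_{i=0}^{m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor$ (subtract (9) from (12) and divide by 2) yields (10). □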
Here is the closed form of summation (5).
Corollary 2. 
For every natural number n and every natural number k,
$$\sum_{i=0}^{2^{k}-1} \left\lfloor \frac{n+i}{2^{k+1}} \right\rfloor = \frac{n}{2} - 2^{k} \operatorname{Zigzag}\left(\frac{n}{2^{k+1}}\right), \tag{13}$$
where Zigzag is a function defined by (8) and visualized in Figure 5.
Proof. 
Substitute $m = 2^{k}$ in (10). □
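The identities (10), (11), and (13) lend themselves to the same kind of spot check. The following Python sketch is an illustration under the reading of the formulas given above; `zigzag`, `lower_half_sum`, and `full_sum` are names assumed here.

```python
from fractions import Fraction
from math import floor, ceil


def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x), definition (8)."""
    return min(x - floor(x), ceil(x) - x)


def lower_half_sum(n, m):
    """Sum of floor((n + i) / (2m)) for i = 0 .. m - 1, left-hand side of (10)."""
    return sum((n + i) // (2 * m) for i in range(m))


def full_sum(n, m):
    """Sum of floor((n + i) / (2m)) for i = 0 .. 2m - 1, left-hand side of (11)."""
    return sum((n + i) // (2 * m) for i in range(2 * m))


def closed_form_10(n, m):
    """Right-hand side of (10): n/2 - m * Zigzag(n / (2m))."""
    return Fraction(n, 2) - m * zigzag(Fraction(n, 2 * m))
```

With $m = 2^{k}$, `lower_half_sum(n, 2 ** k)` is exactly the inner summation (5), so each level of the recursion tree costs one Zigzag evaluation instead of $2^{k}$ floor operations.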
The following theorem yields Formula (14) for the minimum number B ( n ) of comps performed by MergeSort .
Theorem 2. 
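For every natural number n,
$$\sum_{k=0}^{\lfloor \lg n \rfloor} \sum_{i=0}^{2^{k}-1} \left\lfloor \frac{n+i}{2^{k+1}} \right\rfloor = \frac{n}{2}(\lfloor \lg n \rfloor + 1) - \sum_{k=0}^{\lfloor \lg n \rfloor} 2^{k} \operatorname{Zigzag}\left(\frac{n}{2^{k+1}}\right), \tag{14}$$
where Zigzag is a function defined by (8) and visualized in Figure 5.
Proof. 
By (13) applied to each inner summation,
$$\sum_{k=0}^{\lfloor \lg n \rfloor} \sum_{i=0}^{2^{k}-1} \left\lfloor \frac{n+i}{2^{k+1}} \right\rfloor = \sum_{k=0}^{\lfloor \lg n \rfloor} \left( \frac{n}{2} - 2^{k} \operatorname{Zigzag}\left(\frac{n}{2^{k+1}}\right) \right) =$$
$$= \frac{n}{2}(\lfloor \lg n \rfloor + 1) - \sum_{k=0}^{\lfloor \lg n \rfloor} 2^{k} \operatorname{Zigzag}\left(\frac{n}{2^{k+1}}\right).$$
□
Formula (14), although not quite a closed form, comprises a summation with only $\lfloor \lg n \rfloor + 1$ closed-form terms, so it may be evaluated in $O(\log^{c} n)$ time, where c is a constant. I will show in Section 5 that (14) does not have a closed form. Graphs of both sides of equality (14) are shown in Figure 6. One can see that, for natural numbers n, they coincide with the solution $B(n)$ of recurrences (2) and (3) visualized in Figure 4.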
Main Theorem. 
For every natural number n, the minimum number $B(n)$ of comps that MergeSort performs while sorting an n-element array is
$$B(n) = \frac{n}{2}(\lfloor \lg n \rfloor + 1) - \sum_{k=0}^{\lfloor \lg n \rfloor} 2^{k} \operatorname{Zigzag}\left(\frac{n}{2^{k+1}}\right), \tag{15}$$
where Zigzag is a function defined by (8) and visualized in Figure 5.
Proof. 
As it has been shown in Section 4.3, Formula (7) gives the minimum number $B(n)$ of comps performed by MergeSort on any n-element array. Thus substituting $B(n)$ for
$$\sum_{k=0}^{\lfloor \lg n \rfloor} \sum_{i=0}^{2^{k}-1} \left\lfloor \frac{n+i}{2^{k+1}} \right\rfloor$$
in equality (14) of Theorem 2 yields (15). □
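Formula (15) translates into a loop over $\lfloor \lg n \rfloor + 1$ terms. The Python sketch below is an illustrative evaluation with exact rationals, not the author's Java verification program; `zigzag`, `B_formula`, and `B_rec` are names assumed here, and `B_rec` is the recurrence (2) and (4) used only for comparison.

```python
from fractions import Fraction
from functools import lru_cache
from math import floor, ceil


def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x), definition (8)."""
    return min(x - floor(x), ceil(x) - x)


def B_formula(n):
    """B(n) by Formula (15), for a positive integer n."""
    L = n.bit_length() - 1                       # floor(lg n)
    s = sum(2 ** k * zigzag(Fraction(n, 2 ** (k + 1))) for k in range(L + 1))
    return Fraction(n, 2) * (L + 1) - s          # always an integer value


@lru_cache(maxsize=None)
def B_rec(n):
    """Reference value of B(n) from recurrences (2) and (4)."""
    return 0 if n <= 1 else n // 2 + B_rec(n // 2) + B_rec((n + 1) // 2)
```

For instance, `B_formula(3)` evaluates to 2 and `B_formula(8)` to 12, matching the recurrence; the loop length depends only on the number of bits of n.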

5. A Fractal in $B(n)$

A deceptively simple expression
$$\sum_{k=0}^{\lfloor \lg x \rfloor} 2^{k+1} \operatorname{Zigzag}\left(\frac{x}{2^{k+1}}\right), \tag{16}$$
half of which occurs in Formula (15) of the Main Theorem, is a formidable adversary for those who may try to turn it into a closed form, although the time required for its evaluation for any given n is $O(\log^{c} n)$ (so, for all practical purposes, some may consider (15) a closed-form formula). That does not come as a surprise, taking into account that its graph, shown in Figure 7, resembles a fractal. This can be easily seen as soon as the sawtooth function $2^{\lfloor \lg x \rfloor + 1} - x$ is subtracted from it, yielding the function $F(x)$ given by
$$F(x) = \sum_{k=0}^{\lfloor \lg x \rfloor} 2^{k+1} \operatorname{Zigzag}\left(\frac{x}{2^{k+1}}\right) - 2^{\lfloor \lg x \rfloor + 1} + x. \tag{17}$$
Since $\frac{1}{2} \leq \frac{x}{2^{\lfloor \lg x \rfloor + 1}} < 1$, equality (8) implies
$$\operatorname{Zigzag}\left(\frac{x}{2^{\lfloor \lg x \rfloor + 1}}\right) = 1 - \frac{x}{2^{\lfloor \lg x \rfloor + 1}},$$
or
$$2^{\lfloor \lg x \rfloor + 1} \operatorname{Zigzag}\left(\frac{x}{2^{\lfloor \lg x \rfloor + 1}}\right) = 2^{\lfloor \lg x \rfloor + 1} - x. \tag{18}$$
Equality (18) simplifies definition (17) of function F to
$$F(x) = \sum_{k=1}^{\lfloor \lg x \rfloor} 2^{k} \operatorname{Zigzag}\left(\frac{x}{2^{k}}\right), \tag{19}$$
visualized in Figure 8.
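The passage from definition (17) to the shorter form (19) can be checked numerically on rational arguments. The Python sketch below is only an illustration; `zigzag`, `floor_lg`, `F_17`, and `F_19` are hypothetical names, and arguments are assumed to satisfy $x \geq 1$.

```python
from fractions import Fraction
from math import floor, ceil


def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x), definition (8)."""
    return min(x - floor(x), ceil(x) - x)


def floor_lg(x):
    """floor(lg x) for a rational x >= 1 (equals floor(lg floor(x)))."""
    return floor(x).bit_length() - 1


def F_17(x):
    """F(x) by definition (17): expression (16) minus the sawtooth 2^(floor(lg x)+1) - x."""
    x = Fraction(x)
    L = floor_lg(x)
    s = sum(2 ** (k + 1) * zigzag(x / 2 ** (k + 1)) for k in range(L + 1))
    return s - 2 ** (L + 1) + x


def F_19(x):
    """F(x) by the simplified form (19)."""
    x = Fraction(x)
    return sum(2 ** k * zigzag(x / 2 ** k) for k in range(1, floor_lg(x) + 1))
```

For example, both `F_17(3)` and `F_19(3)` evaluate to 1, and both vanish at powers of 2.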
The function F is a fractal with quasi-similarity that repeats at intervals of exponentially growing length. It is a union
$$F = \bigcup_{k=0}^{\infty} f_{k} \tag{20}$$
of functions $f_{k}$, each having an interval $[2^{k}, 2^{k+1})$ as its domain. In other words, for every integer $k \geq 0$,
$$f_{k} = F \restriction [2^{k}, 2^{k+1}), \tag{21}$$
which, of course, yields (20).
Let $\hat{f}_{k}$ be the normalized $f_{k}$ on interval $[0, 1)$, defined by
$$\hat{f}_{k}(x) = \frac{1}{2^{k}} f_{k}(2^{k}(x + 1)), \tag{22}$$
and $\tilde{f}_{k}$ be the periodized $\hat{f}_{k}$, obtained by composing it with the sawtooth function $x - \lfloor x \rfloor$ (the fractional part of x) and defined by
$$\tilde{f}_{k}(x) = \hat{f}_{k}(x - \lfloor x \rfloor). \tag{23}$$
Contracting definitions (21), (22), and (23) yields
$$\tilde{f}_{k}(x) = \frac{1}{2^{k}} F(2^{k}(x - \lfloor x \rfloor + 1)). \tag{24}$$
One can compute (an elementary geometric argument based on the graph visualized in Figure 9 will do) from (24) the following alternative formula for $\tilde{f}_{k}(x)$:
$$\tilde{f}_{k}(x) = \sum_{i=0}^{k-1} \frac{1}{2^{i}} \operatorname{Zigzag}(2^{i} x). \tag{25}$$
Figure 9 shows functions $\tilde{f}_{0}, \dots, \tilde{f}_{6}$ drawn on the same graph.
Since each function $f_{k}$, and therefore each function $\hat{f}_{k}$, and therefore each function $\tilde{f}_{k}$, is a result of smaller and smaller triangles, originating in the function Zigzag of definition (19) of function F, piled on one another as shown in Figure 9, for any integers $0 \leq i < j$, $\tilde{f}_{i}$ linearly interpolates $\tilde{f}_{j}$. Because of that, each $\tilde{f}_{i}$ linearly interpolates the limit $\tilde{F}$ of all $\tilde{f}_{k}$s defined by
$$\tilde{F}(x) = \lim_{k \to \infty} \tilde{f}_{k}(x), \tag{26}$$
as Figure 10 illustrates. An application of (25) to (26) yields
$$\tilde{F}(x) = \sum_{i=0}^{\infty} \frac{1}{2^{i}} \operatorname{Zigzag}(2^{i} x). \tag{27}$$
Since, for all integers n, k, and $i \geq k$, $2^{i} \frac{n}{2^{k}}$ is an integer, $\operatorname{Zigzag}\left(2^{i} \frac{n}{2^{k}}\right) = 0$. Therefore, by virtue of (25) and (27), for all non-negative integers k and n,
$$\tilde{F}\left(\frac{n}{2^{k}}\right) = \tilde{f}_{k}\left(\frac{n}{2^{k}}\right). \tag{28}$$
This and (25) eliminate the need for infinite summation (as it appears in (27)) while computing $\tilde{F}\left(\frac{n}{2^{k}}\right)$.
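Equality (28) means that $\tilde{F}$ can be computed exactly at any dyadic rational with a finite loop. The following Python sketch illustrates that under the reading of (25) given above; `zigzag` and `f_tilde` are names assumed here.

```python
from fractions import Fraction
from math import floor, ceil


def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x), definition (8)."""
    return min(x - floor(x), ceil(x) - x)


def f_tilde(k, x):
    """Finite sum (25): sum over i = 0 .. k - 1 of Zigzag(2^i x) / 2^i."""
    x = Fraction(x)
    return sum(zigzag(2 ** i * x) / 2 ** i for i in range(k))
```

At $x = n / 2^{k}$ every term with $i \geq k$ is zero, so `f_tilde(k, x)` and `f_tilde(k + j, x)` agree for every $j \geq 0$; for example, `f_tilde(3, Fraction(3, 8))` equals `Fraction(5, 8)`.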
It can be shown that, although continuous, the function $\tilde{F}$ (known under the name of the blancmange function or the Takagi fractal curve) is nowhere differentiable. As such, it does not have a closed-form formula, as any closed-form formula on a real interval must define a function that has a derivative at every point of that interval, except for a non-dense set of its points. Since $\tilde{F}$ can be expressed as a function, described by a closed-form formula, of the right-hand side of Formula (14), the latter does not have a closed-form formula either.
Theorem 3. 
For every closed-form formula $\varphi(n)$ there is a positive n such that
$$\sum_{k=0}^{\lfloor \lg n \rfloor} 2^{k} \operatorname{Zigzag}\left(\frac{n}{2^{k+1}}\right) \neq \varphi(n), \tag{29}$$
where Zigzag is a function defined by (8) and visualized in Figure 5.
Proof. 
Follows from the above discussion. A more detailed analytic proof is deferred to Section 10. □
This way, we arrived at the following conclusion.
Corollary 3. 
There is no closed-form formula for B ( n ) .
Proof. 
A closed-form formula for $B(n)$ would, by virtue of (15), yield a closed-form formula for $\sum_{k=0}^{\lfloor \lg n \rfloor} 2^{k} \operatorname{Zigzag}\left(\frac{n}{2^{k+1}}\right)$, which by Theorem 3 does not exist. □
Note 4. 
One can apply the reverse transformations to those used in Section 5 on the function $\tilde{F}$ and construct a fractal function $\breve{F}$, shown in Figure 11, given by the equation
$$\breve{F}(x) = 2^{\lfloor \lg x \rfloor} \tilde{F}\left(\frac{x}{2^{\lfloor \lg x \rfloor}}\right), \tag{30}$$
which for every positive integer n satisfies
$$\breve{F}(n) = F(n), \tag{31}$$
where F is given by (19).

6. Computing $\tilde{F}(x)$ and $B(n)$ from One Another

Computing values of the function $\tilde{F}(x)$ does not have to be as complex as (or more complex than) definition (27) implies. Of course, for every integer n, $\tilde{F}(n) = 0$. One can apply some elementary arguments based on a structure visualized in Figure 10 to conclude that
$$\tilde{F}\left(\frac{2}{3}\right) = \tilde{F}\left(\frac{1}{3}\right) = \frac{2}{3} \tag{32}$$
(the latter being the maximum of $\tilde{F}(x)$) or that, for every positive integer k,
$$\tilde{F}\left(\frac{1}{2^{k}}\right) = \frac{k}{2^{k}}. \tag{33}$$
It takes a bit more work to compute (for $k \geq 2$)
$$\tilde{F}\left(\frac{3}{2^{k}}\right) = \frac{3k - 4}{2^{k}}. \tag{34}$$
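The special values (32), (33), and (34) can be spot-checked with the finite sums (25). The sketch below is an illustrative Python check; `zigzag` and `f_tilde` are names assumed here. Since 1/3 and 2/3 are not dyadic, (32) is approached only in the limit: every term $\operatorname{Zigzag}(2^{i}/3)$ equals 1/3, so the K-term partial sum is $\frac{2}{3}(1 - 2^{-K})$. Value (34) is checked for $k \geq 2$, where $3 \leq 2^{k}$.

```python
from fractions import Fraction
from math import floor, ceil


def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x), definition (8)."""
    return min(x - floor(x), ceil(x) - x)


def f_tilde(k, x):
    """Finite sum (25); equals F~(x) exactly when x = n / 2^k, by (28)."""
    x = Fraction(x)
    return sum(zigzag(2 ** i * x) / 2 ** i for i in range(k))


def partial_sum_at_third(K):
    """K-term partial sum of (27) at x = 1/3; tends to 2/3 as K grows."""
    return f_tilde(K, Fraction(1, 3))
```

For instance, `f_tilde(4, Fraction(3, 16))` equals `Fraction(8, 16)`, in agreement with (34) for $k = 4$.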
It turns out that computing values of the function $\tilde{F}(x)$ for every x that has a finite binary representation can be done easily if an oracle for computing the values of the function $B(n)$ defined by (2) and (4) is given, which is not that surprising after a glance at Figure 11. Once that is accomplished, since $\tilde{F}(x)$ is a continuous function and the set of numbers with finite binary representations is dense in the set $\mathbb{R}$ of reals, it allows for fast approximations of $\tilde{F}(x)$ for every real x. (It helps to remember that $\tilde{F}$ is a periodic function with $\tilde{F}(x) = \tilde{F}(x - \lfloor x \rfloor)$.)
Theorem 4. 
For every positive integer n and integer k with $n \leq 2^{k}$,
$$\tilde{F}\left(\frac{n}{2^{k}}\right) = \frac{n \times k - 2 B(n)}{2^{k}}. \tag{35}$$
Proof. 
Equality (35) can be verified experimentally, for instance, with the help of software for symbolic computation (Section 4.4). The analytic proof is deferred to Section 11. □
Theorem 4 allows for easy computing of $B(n)$ if $\tilde{F}\left(\frac{n}{2^{k}}\right)$ is given for some $k \geq \lg n$ using this form of (35):
Corollary 4. 
For every positive integer n and integer k with $n \leq 2^{k}$,
$$B(n) = \frac{n \times k}{2} - 2^{k-1} \tilde{F}\left(\frac{n}{2^{k}}\right). \tag{36}$$
Proof. 
An obvious conclusion from (35). □
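Equalities (35) and (36) can be checked experimentally in the same spirit as Theorem 1. The Python sketch below is an illustration, with `zigzag`, `F_tilde_dyadic`, and `B_rec` as assumed names; `F_tilde_dyadic` relies on (28) to evaluate $\tilde{F}$ at $n / 2^{k}$ by a finite sum.

```python
from fractions import Fraction
from functools import lru_cache
from math import floor, ceil


def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x), definition (8)."""
    return min(x - floor(x), ceil(x) - x)


def F_tilde_dyadic(n, k):
    """F~(n / 2^k), computed exactly by the finite sum (25), valid by (28)."""
    x = Fraction(n, 2 ** k)
    return sum(zigzag(2 ** i * x) / 2 ** i for i in range(k))


@lru_cache(maxsize=None)
def B_rec(n):
    """B(n) from recurrences (2) and (4)."""
    return 0 if n <= 1 else n // 2 + B_rec(n // 2) + B_rec((n + 1) // 2)


def rhs_35(n, k):
    """Right-hand side of (35): (n*k - 2*B(n)) / 2^k, for n <= 2^k."""
    return Fraction(n * k - 2 * B_rec(n), 2 ** k)
```

For example, with $n = 5$ and $k = 3$ both `F_tilde_dyadic(5, 3)` and `rhs_35(5, 3)` equal `Fraction(5, 8)`.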
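For instance, putting $k = \lfloor \lg n \rfloor + 1$ in (36) easily yields (15). For $k = \lceil \lg n \rceil$, I obtain
$$B(n) = \frac{n \lceil \lg n \rceil}{2} - 2^{\lceil \lg n \rceil - 1} \tilde{F}\left(\frac{n}{2^{\lceil \lg n \rceil}}\right) =$$
[by (27)]
$$= \frac{n \lceil \lg n \rceil}{2} - 2^{\lceil \lg n \rceil - 1} \sum_{i=0}^{\infty} \frac{1}{2^{i}} \operatorname{Zigzag}\left(2^{i} \frac{n}{2^{\lceil \lg n \rceil}}\right) =$$
[since for $i \geq \lceil \lg n \rceil$, $2^{i} \frac{n}{2^{\lceil \lg n \rceil}}$ is an integer and $\operatorname{Zigzag}\left(2^{i} \frac{n}{2^{\lceil \lg n \rceil}}\right) = 0$]
$$= \frac{n \lceil \lg n \rceil}{2} - 2^{\lceil \lg n \rceil - 1} \sum_{i=0}^{\lceil \lg n \rceil - 1} \frac{1}{2^{i}} \operatorname{Zigzag}\left(2^{i} \frac{n}{2^{\lceil \lg n \rceil}}\right) =$$
$$= \frac{n \lceil \lg n \rceil}{2} - \frac{1}{2} \sum_{i=0}^{\lceil \lg n \rceil - 1} 2^{\lceil \lg n \rceil - i} \operatorname{Zigzag}\left(\frac{n}{2^{\lceil \lg n \rceil - i}}\right).$$
Substituting k for $\lceil \lg n \rceil - i$, I conclude
$$B(n) = \frac{n \lceil \lg n \rceil}{2} - \frac{1}{2} \sum_{k=1}^{\lceil \lg n \rceil} 2^{k} \operatorname{Zigzag}\left(\frac{n}{2^{k}}\right), \tag{37}$$
similar to the (15) characterization of $B(n)$.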
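The alternative Formula (37) can be evaluated and compared with the recurrence in the same way as (15). The Python sketch below assumes the reading of (37) with $\lceil \lg n \rceil$ as the upper limit (the choice $k = \lceil \lg n \rceil$ is the smallest k with $n \leq 2^{k}$); `zigzag`, `B_37`, and `B_rec` are hypothetical names.

```python
from fractions import Fraction
from functools import lru_cache
from math import floor, ceil


def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x), definition (8)."""
    return min(x - floor(x), ceil(x) - x)


def B_37(n):
    """B(n) by Formula (37), for a positive integer n."""
    c = (n - 1).bit_length()                     # ceil(lg n)
    s = sum(2 ** k * zigzag(Fraction(n, 2 ** k)) for k in range(1, c + 1))
    return Fraction(n * c - s, 2)


@lru_cache(maxsize=None)
def B_rec(n):
    """B(n) from recurrences (2) and (4)."""
    return 0 if n <= 1 else n // 2 + B_rec(n // 2) + B_rec((n + 1) // 2)
```

For $n = 3$ the formula gives $\frac{3 \cdot 2 - (1 + 1)}{2} = 2$, in agreement with the recurrence.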

7. Relationship Between the Best Case and the Worst Case

An attentive student of MergeSort tends to believe (taking into account that substituting $n - 1$ for $\lfloor n/2 \rfloor$ in the recurrence relation (3) for the best case yields the recurrence relation for the worst case) that its worst-case behavior is about twice as bad as its best-case behavior. This, of course, is only approximately true. In this section, I will derive the exact difference between $2B(n)$ and $W(n)$ using the function F defined by (17).
An exact formula for the number $W(n)$ of comparisons of keys performed by MergeSort in the worst case is known (see Formula (A7) in Appendix A) and is given for any positive integer n by the following equality:
$$W(n) = \sum_{i=1}^{n} \lceil \lg i \rceil. \tag{38}$$
From (15) and (17), one can derive
$$2B(n) = n \lfloor \lg n \rfloor - 2^{\lfloor \lg n \rfloor + 1} + 2n - F(n) =$$
[by $\sum_{i=1}^{n} \lceil \lg i \rceil = n \lfloor \lg n \rfloor - 2^{\lfloor \lg n \rfloor + 1} + n + 1$ from [3]]
$$= \sum_{i=1}^{n} \lceil \lg i \rceil - 1 + n - F(n) =$$
[by (38)]
$$= W(n) - 1 + n - F(n).$$
The above equality yields the following characterization.
Theorem 5. 
For every positive integer n, the difference between twice the number $B(n)$ of comparisons of keys performed in the best case and the number $W(n)$ of comparisons of keys performed in the worst case by MergeSort while sorting an n-element array is
$$2B(n) - W(n) = n - 1 - F(n), \tag{39}$$
where $F(n)$, visualized in Figure 8, is given by (19).
Proof. 
Follows from the above discussion. □
In particular, since, for every positive integer n,
$$0 \leq F(n) \leq \frac{n-1}{2} \tag{40}$$
(see Figure 8 for an explanation), I conclude with the following tight linear bounds on $2B(n) - W(n)$.
Corollary 5. 
For every positive integer n, the difference between twice the minimum number $B(n)$ and the maximum number $W(n)$ of comparisons of keys performed by MergeSort while sorting an n-element array satisfies this inequality:
$$\frac{n-1}{2} \leq 2B(n) - W(n) \leq n - 1. \tag{41}$$
Proof. 
Follows from (39) and (40). □
Obviously, $2B(n) - W(n) = n - 1$ whenever $F(n) = 0$, that is, whenever $n = 2^{\lfloor \lg n \rfloor}$. It can be shown that $2B(n) - W(n) = \frac{n-1}{2}$ whenever $n = \frac{1}{3}(2^{k+1} + (-1)^{k})$ for some integer $k \geq 0$. Therefore, the number of comps performed by MergeSort in the best case is larger than half the number of comps performed by it in the worst case by at least $\frac{n-1}{4}$ and at most $\frac{n-1}{2}$, which makes the former number significantly larger (by a term of size $\Theta(n)$) than half of the latter number.
A graph of 2 B ( n ) W ( n ) and its tight bounds are shown in Figure 12.
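The relationship (39) and the bounds (41) can be explored numerically. The following Python sketch is an illustration only; `zigzag`, `B_rec`, `W`, and `F` are names assumed here, with `W` computed directly from the summation (38) and `F` from (19).

```python
from fractions import Fraction
from functools import lru_cache
from math import floor, ceil


def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x), definition (8)."""
    return min(x - floor(x), ceil(x) - x)


@lru_cache(maxsize=None)
def B_rec(n):
    """Best-case comps B(n) from recurrences (2) and (4)."""
    return 0 if n <= 1 else n // 2 + B_rec(n // 2) + B_rec((n + 1) // 2)


def W(n):
    """Worst-case comps by (38): sum of ceil(lg i) for i = 1 .. n."""
    return sum((i - 1).bit_length() for i in range(1, n + 1))


def F(n):
    """F(n) by (19), for a positive integer n."""
    L = n.bit_length() - 1                       # floor(lg n)
    return sum(2 ** k * zigzag(Fraction(n, 2 ** k)) for k in range(1, L + 1))
```

For $n = 5$, `2 * B_rec(5) - W(5)` is $10 - 8 = 2 = \frac{n-1}{2}$, which is the lower extreme of (41); for $n = 8$ it is $24 - 17 = 7 = n - 1$, the upper extreme.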

8. The Sum of Digits Problem

A known [but not to me during my derivation of Formula (15)] explicit formula, published in [4], for the total number $A(n, 2)$ of 1s in binary representations of all integers between 0 and n (not including 0 and n) is expressed in terms of the function Zigzag (referred to as $2g$ in [4]). The following are verbatim quotations and screenshots from [4].
“If $\alpha(\kappa, r)$ denotes the sum of the digits of $\kappa$ when $\kappa$ is represented in base r, then”
[Screenshot of the formula from [4]; image not available in this text version.]
“Let $g(x)$ be periodic of period 1 and defined on $[0, 1]$ by”
[Screenshot of the definition from [4]; image not available in this text version.]
It has been shown in [5] that the recurrence relation for $A(n, 2)$ is the same as the recurrence relation for $B(n)$ given by (2) and (3). Therefore, Formula (14) derived in this article is equivalent to $A(n, 2)$ given above by the considerably more complicated definition. Interestingly, the above definition can be simplified to (14) along the lines of the elementary derivation of the alternative Formula (37) for $B(n)$.
Even more interestingly, if someone did succeed in simplifying Trollope’s formula in [4] then I am not aware of it. In particular, the most recent article [6] on the subject of sum of digits does not contain any hint that such a simplification has been found and published.
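The connection with the sum of digits problem can be illustrated by counting binary 1s directly and comparing the count with Formula (15). The Python sketch below is such a comparison; `zigzag`, `ones_below`, and `B_formula` are names assumed here, and the direct count is the slow, linear-size method that the formula replaces.

```python
from fractions import Fraction
from math import floor, ceil


def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x), definition (8)."""
    return min(x - floor(x), ceil(x) - x)


def ones_below(n):
    """Number of 1s in binary representations of all integers strictly between 0 and n."""
    return sum(bin(i).count("1") for i in range(1, n))


def B_formula(n):
    """B(n) by Formula (15), for a positive integer n."""
    L = n.bit_length() - 1                       # floor(lg n)
    s = sum(2 ** k * zigzag(Fraction(n, 2 ** (k + 1))) for k in range(L + 1))
    return Fraction(n, 2) * (L + 1) - s
```

For $n = 8$, the integers 1 through 7 contain $1 + 1 + 2 + 1 + 2 + 2 + 3 = 12$ ones, which equals `B_formula(8)`.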

9. Proof of Theorem 1, Section 4.4

In this section, I provide an analytic proof of the experimentally derived Theorem 1, Section 4.4, which was instrumental for the derivation of a logarithmic-length Formula (15) for B ( n ) . The result and its proof have a flavor of Concrete Mathematics. Although they are interesting in their own right, they cannot be found in [7].
Theorem 6. 
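(Same as Theorem 1.) For every natural number n and every positive natural number m,
$$\sum_{i=m}^{2m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor - \sum_{i=0}^{m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor = 2m \times \operatorname{Zigzag}\left(\frac{n}{2m}\right), \tag{42}$$
where Zigzag is a function defined by (8) and visualized in Figure 5.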
Proof. 
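First, let us note that
$$\sum_{i=m}^{2m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor - \sum_{i=0}^{m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor = \sum_{i=0}^{m-1} \left\lfloor \frac{n+i+m}{2m} \right\rfloor - \sum_{i=0}^{m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor =$$
$$= \sum_{i=0}^{m-1} \left( \left\lfloor \frac{n+i}{2m} + \frac{1}{2} \right\rfloor - \left\lfloor \frac{n+i}{2m} \right\rfloor \right),$$
that is,
$$\sum_{i=m}^{2m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor - \sum_{i=0}^{m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor = \sum_{i=0}^{m-1} \left( \left\lfloor \frac{n+i}{2m} + \frac{1}{2} \right\rfloor - \left\lfloor \frac{n+i}{2m} \right\rfloor \right). \tag{43}$$
Let
$$n = k \times 2m + r, \tag{44}$$
where $0 \leq r < 2m$, and let $0 \leq i < m$. We have
$$\left\lfloor \frac{n+i}{2m} + \frac{1}{2} \right\rfloor = \left\lfloor \frac{k \times 2m + r + i}{2m} + \frac{1}{2} \right\rfloor = k + \left\lfloor \frac{r+i}{2m} + \frac{1}{2} \right\rfloor$$
and
$$\left\lfloor \frac{n+i}{2m} \right\rfloor = \left\lfloor \frac{k \times 2m + r + i}{2m} \right\rfloor = k + \left\lfloor \frac{r+i}{2m} \right\rfloor.$$
Thus, by virtue of (43),
$$\sum_{i=m}^{2m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor - \sum_{i=0}^{m-1} \left\lfloor \frac{n+i}{2m} \right\rfloor = \sum_{i=0}^{m-1} \left( \left\lfloor \frac{r+i}{2m} + \frac{1}{2} \right\rfloor - \left\lfloor \frac{r+i}{2m} \right\rfloor \right). \tag{45}$$
We have
$$\left\lfloor \frac{r+i}{2m} + \frac{1}{2} \right\rfloor - \left\lfloor \frac{r+i}{2m} \right\rfloor = \begin{cases} 1 & \text{if } \frac{1}{2} \leq \frac{r+i}{2m} < 1 \\ 0 & \text{otherwise,} \end{cases} \tag{46}$$
because $\frac{r+i}{2m} + \frac{1}{2} < \frac{3m}{2m} + \frac{1}{2} = 2$ so that $\left\lfloor \frac{r+i}{2m} + \frac{1}{2} \right\rfloor \leq 1$, and therefore, $\left\lfloor \frac{r+i}{2m} + \frac{1}{2} \right\rfloor - \left\lfloor \frac{r+i}{2m} \right\rfloor \leq 1$.
Let I be defined as
$$I = \left\{ i \in \mathbb{N} \;\middle|\; \frac{1}{2} \leq \frac{r+i}{2m} < 1 \right\} = \{ i \in \mathbb{N} \mid m - r \leq i < 2m - r \}, \tag{47}$$
where, as assumed above, only the indices $0 \leq i < m$ of the summation are counted.
By virtue of (46), we have
$$\sum_{i=0}^{m-1} \left( \left\lfloor \frac{r+i}{2m} + \frac{1}{2} \right\rfloor - \left\lfloor \frac{r+i}{2m} \right\rfloor \right) = \sum_{i \in I} \left( \left\lfloor \frac{r+i}{2m} + \frac{1}{2} \right\rfloor - \left\lfloor \frac{r+i}{2m} \right\rfloor \right) = \sum_{i \in I} 1 = \# I, \tag{48}$$
where $\# I$ denotes the cardinality of I.
If $r \leq m$, then, by (47), $\# I = m - (m - r) = r$. If $r > m$, then, by (47), $\# I = 2m - r$. In any case,
$$\# I = \min(r, 2m - r) = 2m \min\left(\frac{r}{2m}, 1 - \frac{r}{2m}\right) =$$
[since $0 \leq \frac{r}{2m} < 1$ so that $\left\lfloor \frac{r}{2m} \right\rfloor = 0$ and, for $r > 0$, $\left\lceil \frac{r}{2m} \right\rceil = 1$; for $r = 0$ both minima are 0]
$$= 2m \min\left(\frac{r}{2m} - \left\lfloor \frac{r}{2m} \right\rfloor, \left\lceil \frac{r}{2m} \right\rceil - \frac{r}{2m}\right) =$$
[by definition (8) of the function Zigzag]
$$= 2m \times \operatorname{Zigzag}\left(\frac{r}{2m}\right) =$$
[since Zigzag is a periodic function with period 1]
$$= 2m \times \operatorname{Zigzag}\left(k + \frac{r}{2m}\right) = 2m \times \operatorname{Zigzag}\left(\frac{k \times 2m + r}{2m}\right) =$$
[by (44)]
$$= 2m \times \operatorname{Zigzag}\left(\frac{n}{2m}\right).$$
Thus
$$\# I = 2m \times \operatorname{Zigzag}\left(\frac{n}{2m}\right). \tag{49}$$
From (43), (45), (48), and (49), I conclude (42). □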

10. Proof of Theorem 3, Section 5

In this section, I provide an analytic proof of Theorem 3.
I begin with a brief discussion/motivation of what can be generally considered a closed-form formula for a function from the set of real numbers, except, perhaps, a finite number of reals, into a set of real numbers.
The general concept of closed-form formula (cff) has not been precisely defined in the literature. In this article, it denotes a formula that is composed of constants, variables, and some “standard” arithmetic and logical operations that has a finite and fixed structure and length that do not depend on the values of any variables occurring in the said formula. More specifically, in this article, it is used in the following sense.
The arithmetic constants are any integers with fixed values. The logical constants are true and false.
The basic arithmetic functions are: binary addition, binary multiplication, binary maximum and minimum, subtraction, division, exponentiation, logarithm functions, and the floor function. The basic arithmetic relations are: the equality relation and the less-than relation. The basic logical operations are: negation and binary conjunction.
An atomic cff is an expression that is a constant or variable, or denotes a reference to a basic function or relation whose arguments are constants or variables. The set of cffs is the smallest set of expressions that contains all atomic formulas and is closed under any finite and syntactically correct applications of basic functions and relations and the conditional arithmetic operation by the statement if c then d else e, where c is a logical cff and d and e are arithmetic cffs.
Note 5. 
One could use a more inclusive definition of closed-form formula without invalidating the results presented in this article as long as the extra arithmetic expressions allowed by it are differentiable on their domains except, perhaps, on their non-dense subsets.
Since non-rational real numbers are defined as limits of infinite, Cauchy-convergent sequences of rationals, limits of certain infinite sequences of reals do implicitly occur in definitions of some functions the references to which are accepted as cffs. Below is a motivating example, the function $2^x : \mathbb{R} \to \mathbb{R}$, that gives an insight into what is accepted as a cff for a continuous function (like, say, $\tilde{F}(x)$) on the set $\mathbb{R}$ of reals or on an interval thereof. One picks a dense (in the metric topology of $\mathbb{R}$) subset $Q$ (in this example, the set of rationals) of $\mathbb{R}$, with a collection of mappings $\rho_x(i) : \mathbb{N} \to Q$, where $x \in \mathbb{R}$, given by $\rho_x(i) = \frac{\lfloor i \times x \rfloor}{i}$ so that $\lim_{i \to \infty} \rho_x(i) = x$. Since, for any $x \in \mathbb{R} \setminus Q$, $2^x$ has been defined as
$$2^x = \lim_{i \to \infty} 2^{\rho_x(i)} = \lim_{i \to \infty} 2^{\frac{\lfloor i \times x \rfloor}{i}},$$
$\lim_{i \to \infty} 2^{\frac{\lfloor i \times x \rfloor}{i}}$ is considered a cff for the function $2^x : \mathbb{R} \to \mathbb{R}$.
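The convergence used in this example can be observed numerically. The Python sketch below is only an illustration under stated assumptions: the names `rho` and `approx_pow2` are hypothetical, floating-point numbers stand in for exact reals, and $i$ is taken to be a positive integer.

```python
from math import floor, sqrt

def rho(x, i):
    """rho_x(i) = floor(i * x) / i: a rational approximation of x from below (i >= 1)."""
    return floor(i * x) / i

def approx_pow2(x, i):
    """The i-th term 2 ** rho_x(i) of the sequence whose limit defines 2 ** x."""
    return 2.0 ** rho(x, i)

if __name__ == "__main__":
    x = sqrt(2)  # an irrational exponent, up to floating-point precision
    for i in (10, 1_000, 100_000):
        # Columns: i, rho_x(i), 2 ** rho_x(i), and the library value of 2 ** x.
        print(i, rho(x, i), approx_pow2(x, i), 2.0 ** x)
```

Since $0 \leq x - \rho_x(i) < \frac{1}{i}$, the printed approximations approach $2^x$ as $i$ grows.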
For the reader’s convenience, Theorem 3 is quoted below as Theorem 7.
Theorem 7. 
(Same as Theorem 3). There is no cff $\varphi(n)$ the domain of which contains all positive integers and the values of which coincide with $\sum_{k=0}^{\lfloor \lg n \rfloor} 2^k \, \mathrm{Zigzag}\left(\frac{n}{2^{k+1}}\right)$ for all positive integers n; that is, for every cff $\varphi(n)$, there is a positive n such that
$$\sum_{k=0}^{\lfloor \lg n \rfloor} 2^k \, \mathrm{Zigzag}\left(\frac{n}{2^{k+1}}\right) \neq \varphi(n),$$
where Zigzag is a function defined by (8) and visualized in Figure 5.
The rest of this section constitutes the proof of Theorem 7.
Lemma 1. 
For every positive integer n,
$$\tilde{F}\left(\frac{n}{2^{\lfloor \lg n \rfloor}}\right) = \frac{n(\lfloor \lg n \rfloor + 2) - 2B(n)}{2^{\lfloor \lg n \rfloor}} - 2, \tag{52}$$
where the function $F$ has been defined by equality (17); the function $\tilde{F}$, visualized in Figure 13, has been defined by equality (27); and the function $B(n)$ has been defined by Equations (2) and (4).
Proof. 
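From (17) I compute
$$\sum_{i=0}^{\lfloor \lg n \rfloor} 2^{i+1} \, \mathrm{Zigzag}\left(\frac{n}{2^{i+1}}\right) = F(n) + 2^{\lfloor \lg n \rfloor + 1} - n,$$
that is,
$$\sum_{i=0}^{\lfloor \lg n \rfloor} 2^{i} \, \mathrm{Zigzag}\left(\frac{n}{2^{i+1}}\right) = \frac{1}{2} F(n) + 2^{\lfloor \lg n \rfloor} - \frac{n}{2}. \tag{53}$$
Applying equality (53) to equality (15) I conclude
$$B(n) = \frac{n}{2}(\lfloor \lg n \rfloor + 1) - \left(\frac{1}{2} F(n) + 2^{\lfloor \lg n \rfloor} - \frac{n}{2}\right),$$
or
$$B(n) = \frac{n}{2}(\lfloor \lg n \rfloor + 1) - \frac{1}{2} F(n) - 2^{\lfloor \lg n \rfloor} + \frac{n}{2},$$
that is,
$$\frac{1}{2} F(n) = \frac{n}{2}(\lfloor \lg n \rfloor + 1) - B(n) - 2^{\lfloor \lg n \rfloor} + \frac{n}{2},$$
or
$$F(n) = n(\lfloor \lg n \rfloor + 2) - 2B(n) - 2^{\lfloor \lg n \rfloor + 1}. \tag{54}$$
On the other hand, by virtue of (19),
$$F(n) = \sum_{i=1}^{\lfloor \lg n \rfloor} 2^{i} \, \mathrm{Zigzag}\left(\frac{n}{2^{i}}\right) =$$
[putting $j = \lfloor \lg n \rfloor - i$]
$$= \sum_{j=0}^{\lfloor \lg n \rfloor - 1} 2^{\lfloor \lg n \rfloor - j} \, \mathrm{Zigzag}\left(\frac{n}{2^{\lfloor \lg n \rfloor - j}}\right) = 2^{\lfloor \lg n \rfloor} \sum_{j=0}^{\lfloor \lg n \rfloor - 1} \frac{1}{2^{j}} \, \mathrm{Zigzag}\left(2^{j} \frac{n}{2^{\lfloor \lg n \rfloor}}\right) =$$
[since for $j \geq \lfloor \lg n \rfloor$, $2^{j} \frac{n}{2^{\lfloor \lg n \rfloor}} \in \mathbb{N}$ so that $\mathrm{Zigzag}\left(2^{j} \frac{n}{2^{\lfloor \lg n \rfloor}}\right) = 0$]
$$= 2^{\lfloor \lg n \rfloor} \sum_{j=0}^{\infty} \frac{1}{2^{j}} \, \mathrm{Zigzag}\left(2^{j} \frac{n}{2^{\lfloor \lg n \rfloor}}\right) =$$
[by (27)]
$$= 2^{\lfloor \lg n \rfloor} \tilde{F}\left(\frac{n}{2^{\lfloor \lg n \rfloor}}\right).$$
Thus
$$F(n) = 2^{\lfloor \lg n \rfloor} \tilde{F}\left(\frac{n}{2^{\lfloor \lg n \rfloor}}\right)$$
or
$$\tilde{F}\left(\frac{n}{2^{\lfloor \lg n \rfloor}}\right) = \frac{1}{2^{\lfloor \lg n \rfloor}} F(n). \tag{56}$$
Combining equalities (54) and (56) yields
$$\tilde{F}\left(\frac{n}{2^{\lfloor \lg n \rfloor}}\right) = \frac{1}{2^{\lfloor \lg n \rfloor}} \left(n(\lfloor \lg n \rfloor + 2) - 2B(n) - 2^{\lfloor \lg n \rfloor + 1}\right),$$
or (52). □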
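Lemma 1 lends itself to a quick machine check. The Python sketch below is an illustration, not part of the proof: the helper names are hypothetical, $B(n)$ is computed from the recurrence $B(1) = 0$, $B(n) = B(\lfloor n/2 \rfloor) + B(\lceil n/2 \rceil) + \lfloor n/2 \rfloor$, which is assumed here to be the content of Equations (2) and (4), and exact rational arithmetic is used so that equality can be tested literally.

```python
from fractions import Fraction
from functools import lru_cache
from math import floor, ceil

def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x)."""
    return min(x - floor(x), ceil(x) - x)

@lru_cache(maxsize=None)
def best_case(n):
    """B(n) from the assumed form of recurrences (2) and (4)."""
    if n <= 1:
        return 0
    return best_case(n // 2) + best_case(n - n // 2) + n // 2

def blancmange_dyadic(num, k):
    """F~(num / 2**k); terms with i >= k vanish because Zigzag is 0 at integers."""
    x = Fraction(num, 2 ** k)
    return sum(Fraction(1, 2 ** i) * zigzag(2 ** i * x) for i in range(k))

def lemma1_rhs(n):
    """Right-hand side of (52)."""
    p = n.bit_length() - 1  # floor(lg n) for n >= 1
    return Fraction(n * (p + 2) - 2 * best_case(n), 2 ** p) - 2

def lemma1_holds(n):
    p = n.bit_length() - 1
    return blancmange_dyadic(n, p) == lemma1_rhs(n)

if __name__ == "__main__":
    print(all(lemma1_holds(n) for n in range(1, 1000)))
```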
Lemma 2. 
If the function $B(n)$ defined by Equations (2) and (4) has a cff $\beta : \mathbb{N} \to \mathbb{N}$, then the function $\tilde{F}(x)$ defined by Equation (27) has a cff $\varphi : [1, 2) \to [0, \frac{2}{3}]$.
Proof. 
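Let
$$D = \left\{ \frac{n}{2^{\lfloor \lg n \rfloor}} \;\middle|\; n \in \mathbb{N} \right\}$$
be the set of rationals in the interval $[1, 2)$ with finite binary representations, enumerated by $\nu(n) : \mathbb{N} \to D$ given by $\nu(n) = \frac{n}{2^{\lfloor \lg n \rfloor}}$ and visualized in Figure 14. (It is a trivial exercise to show that every real number with finite binary representation in the interval $[1, 2)$ is of the form $\frac{n}{2^{\lfloor \lg n \rfloor}}$ for some $n \in \mathbb{N}$, and it is obvious that every real number of that form has a finite binary representation and falls into that interval).
D is a dense subset of the interval $[1, 2)$ of reals. Indeed, if $x \in [1, 2)$, then, for every $n \in \mathbb{N}$, $\frac{\lfloor 2^{n} x \rfloor}{2^{n}} \in D$ and
$$\lim_{n \to \infty} \frac{\lfloor 2^{n} x \rfloor}{2^{n}} = x. \tag{58}$$
Hence, for any $x \in [1, 2)$, putting
$$n = \lfloor 2^{i} x \rfloor,$$
so that
$$\lfloor \lg n \rfloor = \lfloor \lg \lfloor 2^{i} x \rfloor \rfloor = \lfloor \lg (2^{i} x) \rfloor = \lfloor i + \lg x \rfloor = i + \lfloor \lg x \rfloor = i$$
[the last equality holds because $1 \leq x < 2$ so that $0 \leq \lg x < 1$ and $\lfloor \lg x \rfloor = 0$], or
$$\lfloor \lg n \rfloor = i,$$
I conclude, by virtue of (58),
$$\tilde{F}(x) = \tilde{F}\left(\lim_{i \to \infty} \frac{\lfloor 2^{i} x \rfloor}{2^{i}}\right) =$$
[by the continuity of $\tilde{F}(x)$]
$$= \lim_{i \to \infty} \tilde{F}\left(\frac{\lfloor 2^{i} x \rfloor}{2^{i}}\right) =$$
[by equality (52) of Lemma 1]
$$= \lim_{i \to \infty} \left(\frac{\lfloor 2^{i} x \rfloor (i + 2) - 2B(\lfloor 2^{i} x \rfloor)}{2^{i}} - 2\right) = \lim_{i \to \infty} \frac{i \lfloor 2^{i} x \rfloor - 2B(\lfloor 2^{i} x \rfloor)}{2^{i}} + \lim_{i \to \infty} \left(\frac{2 \lfloor 2^{i} x \rfloor}{2^{i}} - 2\right) =$$
[by equality (58)]
$$= \lim_{i \to \infty} \frac{i \lfloor 2^{i} x \rfloor - 2B(\lfloor 2^{i} x \rfloor)}{2^{i}} + 2x - 2.$$
Thus for any $x \in [1, 2)$,
$$\tilde{F}(x) = \lim_{i \to \infty} \frac{i \lfloor 2^{i} x \rfloor - 2B(\lfloor 2^{i} x \rfloor)}{2^{i}} + 2x - 2. \tag{61}$$
Equality (61) shows that if there is a cff $\beta : \mathbb{N} \to \mathbb{N}$ for the function B defined by Equations (2) and (4) then there is a cff $\varphi : [1, 2) \to [0, \frac{2}{3}]$ given by
$$\lim_{i \to \infty} \frac{i \lfloor 2^{i} x \rfloor - 2\beta(\lfloor 2^{i} x \rfloor)}{2^{i}} + 2x - 2$$
for $\tilde{F}(x)$. This completes the proof of Lemma 2. □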
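The limit in equality (61) can be watched converging. The following Python sketch is a hedged illustration: `approx_blancmange` is a hypothetical name for the expression under the limit, and $B(n)$ is again computed from the recurrence that is assumed to be the content of Equations (2) and (4). The test point $x = \frac{4}{3}$ is convenient because every term $\mathrm{Zigzag}(2^{i} \cdot \frac{4}{3})$ equals $\frac{1}{3}$, so the series (27) sums to $\frac{2}{3}$ there.

```python
from fractions import Fraction
from functools import lru_cache
from math import floor

@lru_cache(maxsize=None)
def best_case(n):
    """B(n) from the assumed form of recurrences (2) and (4)."""
    if n <= 1:
        return 0
    return best_case(n // 2) + best_case(n - n // 2) + n // 2

def approx_blancmange(x, i):
    """The expression under the limit in (61), for a rational x in [1, 2)."""
    n = floor(2 ** i * x)
    return Fraction(i * n - 2 * best_case(n), 2 ** i) + 2 * x - 2

if __name__ == "__main__":
    x = Fraction(4, 3)
    for i in (5, 10, 20, 40):
        # The printed values should approach 2/3 = 0.666...
        print(i, float(approx_blancmange(x, i)))
```

For a dyadic $x$, such as $\frac{3}{2}$, the expression is already exact once $2^{i} x$ is an integer.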
Should the nowhere differentiable function $\tilde{F}$ have a cff, it would be differentiable everywhere except, perhaps, on a non-dense subset of $\mathbb{R}$. The following inductive argument demonstrates that. All atomic cffs are differentiable except, perhaps, on a non-dense subset of $\mathbb{R}$. If a finite number of cffs are differentiable except, perhaps, on non-dense subsets of $\mathbb{R}$, then their composition is differentiable except, perhaps, on non-dense subsets of $\mathbb{R}$. (For instance, the function $x \, \mathrm{Zigzag}\left(\frac{1}{x}\right)$ is differentiable on $(0, 1)$, except for the non-dense set $\left\{ \frac{2}{n} \mid n \in \mathbb{N} \right\}$.) Thus $\tilde{F}$ has no cff.
The above observation, together with Lemma 2, completes the proof of Theorem 7.

11. Proof of Theorem 4, Section 6

In this Section, I provide an analytic proof of experimentally derived Theorem 4, Section 6. This result, re-stated as Theorem 8 below, allows for practically efficient computations of values of the continuous blancmange function for reals with finite binary floating-point representations. I also provide some properties (Lemmas 3–5) of the Zigzag function, given by equality (8) and visualized in Figure 5, that are useful for a neat derivation of a formula for the blancmange function as the limit of a finite sum of some values of the Zigzag function.
Let the function $\tilde{F}$ (known as the blancmange function), visualized in Figure 13, be defined by (27) and $B(n)$, given by (2) and (4), be the least number of comparisons of keys that MergeSort performs while sorting an n-element array.
Theorem 8. 
(Same as Theorem 4.) For every positive integer n and integer k with $n \leq 2^{k}$,
$$\tilde{F}\left(\frac{n}{2^{k}}\right) = \frac{n \times k - 2B(n)}{2^{k}}.$$
The remainder of this section constitutes a proof of Theorem 8.
Note. The function Zigzag, visualized in Figure 5, has been defined by Equation (8).
Lemma 3. 
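For every $k \geq \lfloor \lg n \rfloor + 2$,
$$2^{k} \, \mathrm{Zigzag}\left(\frac{n}{2^{k}}\right) = n. \tag{63}$$
Proof.
Let $k \geq \lfloor \lg n \rfloor + 2$, or $2^{k} \geq 2 \times 2^{\lfloor \lg n \rfloor + 1} > 2n$, that is,
$$2^{k} > 2n. \tag{64}$$
We have
$$0 \leq \left\lfloor \frac{n}{2^{k}} \right\rfloor \leq$$
[by (64)]
$$\leq \left\lfloor \frac{n}{2n} \right\rfloor = \left\lfloor \frac{1}{2} \right\rfloor = 0$$
or
$$\left\lfloor \frac{n}{2^{k}} \right\rfloor = 0. \tag{65}$$
Additionally,
$$1 \leq \left\lceil \frac{n}{2^{k}} \right\rceil \leq$$
[by (64)]
$$\leq \left\lceil \frac{n}{2n} \right\rceil = \left\lceil \frac{1}{2} \right\rceil = 1$$
or
$$\left\lceil \frac{n}{2^{k}} \right\rceil = 1. \tag{66}$$
Now,
$$2^{k} \, \mathrm{Zigzag}\left(\frac{n}{2^{k}}\right) =$$
[by (8)]
$$= 2^{k} \min\left\{ \frac{n}{2^{k}} - \left\lfloor \frac{n}{2^{k}} \right\rfloor, \left\lceil \frac{n}{2^{k}} \right\rceil - \frac{n}{2^{k}} \right\} = \min\left\{ n - 2^{k} \left\lfloor \frac{n}{2^{k}} \right\rfloor, 2^{k} \left\lceil \frac{n}{2^{k}} \right\rceil - n \right\} =$$
[by (65) and (66)]
$$= \min\{ n, 2^{k} - n \} =$$
[since by (64), $2^{k} - n \geq n$]
$$= n.$$
Hence, (63) holds. □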
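A small exact check of Lemma 3 is sketched below in Python. The function names are hypothetical; the last printed line shows a case with $k = \lfloor \lg n \rfloor + 1$, just below the range covered by the lemma, where the equality no longer holds.

```python
from fractions import Fraction
from math import floor, ceil

def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x)."""
    return min(x - floor(x), ceil(x) - x)

def lemma3_holds(n, k):
    """True when 2**k * Zigzag(n / 2**k) == n, computed with exact rationals."""
    return 2 ** k * zigzag(Fraction(n, 2 ** k)) == n

if __name__ == "__main__":
    # floor(lg n) + 2 equals n.bit_length() + 1 for every n >= 1.
    ok = all(lemma3_holds(n, k)
             for n in range(1, 500)
             for k in range(n.bit_length() + 1, n.bit_length() + 6))
    print(ok)
    print(lemma3_holds(3, 2))  # k = floor(lg 3) + 1 is outside the lemma's range
```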
Lemma 4. 
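For every $k \geq \lfloor \lg n \rfloor + 1$,
$$\sum_{i=\lfloor \lg n \rfloor + 2}^{k} 2^{i} \, \mathrm{Zigzag}\left(\frac{n}{2^{i}}\right) = n \times k - n(\lfloor \lg n \rfloor + 1). \tag{67}$$
Proof.
By induction on k.
Basis step: $k = \lfloor \lg n \rfloor + 1$.
$$L = \sum_{i=\lfloor \lg n \rfloor + 2}^{\lfloor \lg n \rfloor + 1} 2^{i} \, \mathrm{Zigzag}\left(\frac{n}{2^{i}}\right) = 0.$$
$$R = n(\lfloor \lg n \rfloor + 1) - n(\lfloor \lg n \rfloor + 1) = 0.$$
Hence, $L = R$. This completes the basis step.
Inductive step: from $k \geq \lfloor \lg n \rfloor + 1$ to $k + 1 \geq \lfloor \lg n \rfloor + 2$.
Inductive hypothesis: (67).
$$\sum_{i=\lfloor \lg n \rfloor + 2}^{k+1} 2^{i} \, \mathrm{Zigzag}\left(\frac{n}{2^{i}}\right) = \sum_{i=\lfloor \lg n \rfloor + 2}^{k} 2^{i} \, \mathrm{Zigzag}\left(\frac{n}{2^{i}}\right) + 2^{k+1} \, \mathrm{Zigzag}\left(\frac{n}{2^{k+1}}\right) =$$
[by the inductive hypothesis and by equality (63) of Lemma 3, which applies to $k + 1$ because $k + 1 \geq \lfloor \lg n \rfloor + 2$]
$$= n \times k - n(\lfloor \lg n \rfloor + 1) + n = n \times (k + 1) - n(\lfloor \lg n \rfloor + 1).$$
Thus
$$\sum_{i=\lfloor \lg n \rfloor + 2}^{k+1} 2^{i} \, \mathrm{Zigzag}\left(\frac{n}{2^{i}}\right) = n \times (k + 1) - n(\lfloor \lg n \rfloor + 1).$$
This completes the inductive step. □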
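The identity of Lemma 4 says, in effect, that each of the $k - \lfloor \lg n \rfloor - 1$ terms of the sum equals $n$. A minimal Python sketch of that check, with hypothetical helper names and exact rational arithmetic, might look as follows.

```python
from fractions import Fraction
from math import floor, ceil

def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x)."""
    return min(x - floor(x), ceil(x) - x)

def lemma4_lhs(n, k):
    """Sum of 2**i * Zigzag(n / 2**i) for i = floor(lg n) + 2, ..., k."""
    p = n.bit_length() - 1  # floor(lg n) for n >= 1
    return sum(2 ** i * zigzag(Fraction(n, 2 ** i)) for i in range(p + 2, k + 1))

def lemma4_rhs(n, k):
    """n * k - n * (floor(lg n) + 1)."""
    p = n.bit_length() - 1
    return n * k - n * (p + 1)

if __name__ == "__main__":
    ok = all(lemma4_lhs(n, k) == lemma4_rhs(n, k)
             for n in range(1, 300)
             for k in range(n.bit_length(), n.bit_length() + 8))
    print(ok)
```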
Lemma 5. 
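For every $k \geq \lfloor \lg n \rfloor + 1$,
$$2^{k} \tilde{F}\left(\frac{n}{2^{k}}\right) = \sum_{i=1}^{k} 2^{i} \, \mathrm{Zigzag}\left(\frac{n}{2^{i}}\right). \tag{68}$$
Proof.
By definition (27) of the function $\tilde{F}$, I get
$$2^{k} \tilde{F}\left(\frac{n}{2^{k}}\right) = 2^{k} \sum_{i=0}^{\infty} \frac{1}{2^{i}} \, \mathrm{Zigzag}\left(2^{i} \frac{n}{2^{k}}\right) =$$
[since, for every integer x, $\mathrm{Zigzag}(x) = 0$, so that, for $i \geq k$, $\mathrm{Zigzag}\left(2^{i} \frac{n}{2^{k}}\right) = 0$]
$$= 2^{k} \sum_{i=0}^{k-1} \frac{1}{2^{i}} \, \mathrm{Zigzag}\left(2^{i} \frac{n}{2^{k}}\right) = \sum_{i=0}^{k-1} 2^{k-i} \, \mathrm{Zigzag}\left(\frac{n}{2^{k-i}}\right) =$$
[putting $j = k - i$]
$$= \sum_{j=1}^{k} 2^{j} \, \mathrm{Zigzag}\left(\frac{n}{2^{j}}\right),$$
which completes the proof of (68). □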
At this point, I am ready to conclude the proof of Theorem 4.
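By virtue of (15), we have
$$2B(n) = n(\lfloor \lg n \rfloor + 1) - \sum_{k=0}^{\lfloor \lg n \rfloor} 2^{k+1} \, \mathrm{Zigzag}\left(\frac{n}{2^{k+1}}\right) =$$
$$= n(\lfloor \lg n \rfloor + 1) - \sum_{i=1}^{\lfloor \lg n \rfloor + 1} 2^{i} \, \mathrm{Zigzag}\left(\frac{n}{2^{i}}\right) =$$
$$= n(\lfloor \lg n \rfloor + 1) - \sum_{i=1}^{k} 2^{i} \, \mathrm{Zigzag}\left(\frac{n}{2^{i}}\right) + \sum_{i=\lfloor \lg n \rfloor + 2}^{k} 2^{i} \, \mathrm{Zigzag}\left(\frac{n}{2^{i}}\right) =$$
[by Lemmas 4 and 5]
$$= n(\lfloor \lg n \rfloor + 1) - 2^{k} \tilde{F}\left(\frac{n}{2^{k}}\right) + n \times k - n(\lfloor \lg n \rfloor + 1) = n \times k - 2^{k} \tilde{F}\left(\frac{n}{2^{k}}\right),$$
that is,
$$2B(n) = n \times k - 2^{k} \tilde{F}\left(\frac{n}{2^{k}}\right),$$
from which the equality stated in Theorem 8 follows.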
This completes the proof of Theorem 4.
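Note. A glance at the proof of Lemma 4 suffices to notice that it fails if $n > 2^{k}$, and so does Theorem 4. In particular, for $k = \lfloor \lg n \rfloor$, Lemma 1 yields
$$\tilde{F}\left(\frac{n}{2^{k}}\right) = \frac{n \times k - 2B(n)}{2^{k}} + \frac{2n}{2^{k}} - 2 > \frac{n \times k - 2B(n)}{2^{k}}$$
since, for $n > 2^{\lfloor \lg n \rfloor}$, $\frac{2n}{2^{\lfloor \lg n \rfloor}} - 2 > 0$.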
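Theorem 8 and the Note above can be illustrated together. In the Python sketch below (hypothetical names; $B(n)$ computed from the recurrence assumed to be the content of Equations (2) and (4)), `blancmange_via_b` evaluates the right-hand side of Theorem 8, while `blancmange_series` sums the defining series (27) exactly, which is possible for dyadic arguments because all but finitely many terms vanish.

```python
from fractions import Fraction
from functools import lru_cache
from math import floor, ceil

def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x)."""
    return min(x - floor(x), ceil(x) - x)

@lru_cache(maxsize=None)
def best_case(n):
    """B(n) from the assumed form of recurrences (2) and (4)."""
    if n <= 1:
        return 0
    return best_case(n // 2) + best_case(n - n // 2) + n // 2

def blancmange_series(x, terms):
    """First `terms` terms of (27); exact when 2**terms * x is an integer."""
    return sum(Fraction(1, 2 ** i) * zigzag(2 ** i * x) for i in range(terms))

def blancmange_via_b(n, k):
    """Right-hand side of Theorem 8: (n * k - 2 * B(n)) / 2**k."""
    return Fraction(n * k - 2 * best_case(n), 2 ** k)

def gap_at_floor_lg(n):
    """Series value minus the Theorem 8 expression at k = floor(lg n)."""
    k = n.bit_length() - 1
    return blancmange_series(Fraction(n, 2 ** k), k + 1) - blancmange_via_b(n, k)

if __name__ == "__main__":
    print(blancmange_via_b(3, 2), blancmange_series(Fraction(3, 4), 3))
    print(gap_at_floor_lg(3))  # nonzero: n = 3 exceeds 2**floor(lg 3) = 2
```

The gap returned by `gap_at_floor_lg` is the term $\frac{2n}{2^{k}} - 2$ of the Note, which vanishes only when $n$ is a power of 2.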

12. Final Remarks

It appears that a lack of sufficient interaction between research related to the sum of digits problem and analysis of recursive algorithms had a detrimental impact on the progress of the said research. In particular, I found that some clever use of recursion trees that are often utilized in the analysis of algorithms but are relatively rarely used in pure mathematics allowed for a relatively easy derivation of a finitary Formula (7) that, due to its finiteness, is significantly simpler and more practical to compute (but still requiring the addition of $n - 1$ terms) than an instance for $r = 2$ of the infinite formula (1.1) derived in [4] and quoted in Section 8. I suppose that if the mathematicians interested in the sum of digits problem knew Formula (7), then some of them would simplify it to Formula (15) of the Main Theorem that can be evaluated in $O(\log^{c} n)$ time for some constant $c$.
A similar lack of communication between researchers has been hypothesized in [6]: “When reading the literature on the subject we have noted that the most recent papers do not always cite more ancient ones, confirming a remark of Stolarsky […]: Whatever its mathematical virtues, the literature on sums of digital sums reflects a lack of communication between researchers”.
On the other hand, those willing to find or quote from the literature a formula for the exact number of comparisons of keys performed by MergeSort in the best case on an n-element array might have been discouraged by finding Trollope’s infinite formula (1.1) that, for $r = 2$, yields the count of 1s occurring in binary representations of all integers between 1 and n and has been proved, around 1974, to yield the said number of comps. To make things look even more discouraging, directly counting the said 1s requires $\Theta(n \lg n)$ time that is about as long as running MergeSort on an increasingly sorted array of consecutive numbers from 1 to n and counting how many comps have been done. These could have been factors that stopped some of those analyzing MergeSort from trying to precisely characterize its best-case time complexity, thus leaving a significant loophole in many computer science faculty and students’ knowledge of the time complexity of MergeSort.
Fortunately, this article fills this important loophole.
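The contrast drawn above between direct counting and Formula (15) can be made concrete. The Python sketch below is an illustration with hypothetical names: `ones_below` counts the 1s digit by digit over all integers below $n$, while `best_case_formula` evaluates Formula (15), as stated in the Abstract, with a number of terms that grows only with $\lfloor \lg n \rfloor$.

```python
from fractions import Fraction
from math import floor, ceil

def zigzag(x):
    """Zigzag(x) = min(x - floor(x), ceil(x) - x)."""
    return min(x - floor(x), ceil(x) - x)

def ones_below(n):
    """Direct count of 1s in the binary representations of 0, 1, ..., n - 1."""
    return sum(bin(m).count("1") for m in range(n))

def best_case_formula(n):
    """Formula (15): (n/2)(floor(lg n) + 1) - sum_k 2**k * Zigzag(n / 2**(k+1))."""
    p = n.bit_length() - 1  # floor(lg n) for n >= 1
    correction = sum(2 ** k * zigzag(Fraction(n, 2 ** (k + 1))) for k in range(p + 1))
    return Fraction(n * (p + 1), 2) - correction  # integer-valued Fraction

if __name__ == "__main__":
    for n in (1, 2, 3, 5, 1000):
        print(n, ones_below(n), best_case_formula(n))
```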

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The Author declares that no conflict of interest or ethical issues have occurred during the research described and in the preparation of this article.

Appendix A. A Derivation of the Worst-Case Running Time $W(n) = \sum_{i=1}^{n} \lceil \lg i \rceil$ of MergeSort

Let us assume that $n \geq 2$ is large enough to spur a cascade of many recursive calls to MergeSort following the recursion tree T, a sketch of which is shown in Figure 2.
The nodes in tree T correspond to calls to MergeSort and show sizes of (sub)arrays passed to those calls. The root corresponds to the original call to MergeSort. If a call that is represented by a node p executes further recursive calls to MergeSort, then these calls are represented by the children of p; otherwise, p is a leaf. Thus T is a 2-tree (a binary tree whose every non-leaf has exactly 2 children).
The levels in tree T are enumerated from 0 to h, where h is the number of the last level of the tree, or, in other words, the depth of T. In Figure 2, they are shown on the left side of the tree. The root is at the level 0, its children are at level 1, its grandchildren are at level 2, its great-grandchildren (not shown on the sketch in Figure 2) are at level 3, and so on. Clearly, since every call to MergeSort on a sub-array of size $\geq 2$ executes two further recursive calls to MergeSort, only the nodes that show the value 1 are leaves and all other nodes have 2 children each. Thus, since all nodes in the last level h are leaves, they all show the value 1. Additionally, since the original input array gets split, eventually, onto n 1-element sub-arrays, the number of all leaves in T is n. (This, however, does not mean that the last level h necessarily contains all the leaves of T).
If a level i has $2^{i}$ nodes, each of them showing a value $\geq 2$, then each such node has 2 children so that level $i + 1$ has twice the number of nodes in level i, that is, $2^{i+1}$ nodes. Since level 0 has $2^{0}$ nodes, it follows (completion of a proof by induction with the basis and inductive steps outlined above is left as an exercise for the reader) that if k is the level number of any level above which all the nodes show values $\geq 2$, then all levels $i = 0, \ldots, k$ contain exactly $2^{i}$ nodes each.
The last level h may contain $2^{h}$ nodes or fewer. I am going to show that each level i above level h contains exactly $2^{i}$ nodes. Here is a very insightful property that I am going to use for that purpose. It states that MergeSort is splitting its input array fairly evenly so that, at any level of the recursion tree, the difference between the lengths of the longest sub-array and the shortest sub-array is $\leq 1$. This fact is the root cause of good worst-case performance of MergeSort.
Theorem A1. 
The difference between values shown by any two nodes in the same level of the recursion tree for MergeSort is $\leq 1$.
Proof. 
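The property clearly holds for level 0. I will show that if it holds for level i and i is not the last level of the recursion tree (that is, $i < h$), then it also holds for level $i + 1$.
Let us assume that the property holds for some level $i < h$. Let $c \leq d$ be numbers shown by any two (not necessarily distinct) nodes in level $i + 1$. It suffices to show that
$$d - c \leq 1. \tag{A1}$$
Let $a \leq b$ be the numbers shown by the parents of the above-mentioned nodes. Those parents, of course, must reside in level i. By the inductive hypothesis (that holds for level i), $b - a \leq 1$, that is,
$$a \leq b \leq a + 1. \tag{A2}$$
The numbers shown by all their four children are $\lfloor \frac{a}{2} \rfloor$, $\lceil \frac{a}{2} \rceil$, $\lfloor \frac{b}{2} \rfloor$, and $\lceil \frac{b}{2} \rceil$, so the largest difference between any of those four numbers is $\lceil \frac{b}{2} \rceil - \lfloor \frac{a}{2} \rfloor$. In particular, $d - c$ is not larger than that.
We have
$$d - c \leq \left\lceil \frac{b}{2} \right\rceil - \left\lfloor \frac{a}{2} \right\rfloor \leq$$
[by (A2)]
$$\leq \left\lceil \frac{a + 1}{2} \right\rceil - \left\lfloor \frac{a}{2} \right\rfloor =$$
[since for any integer c, $\lceil \frac{c}{2} \rceil = \lfloor \frac{c + 1}{2} \rfloor$]
$$= \left\lfloor \frac{a + 2}{2} \right\rfloor - \left\lfloor \frac{a}{2} \right\rfloor = \left\lfloor \frac{a}{2} + 1 \right\rfloor - \left\lfloor \frac{a}{2} \right\rfloor =$$
[since for every x, $\lfloor x + 1 \rfloor = \lfloor x \rfloor + 1$]
$$= \left\lfloor \frac{a}{2} \right\rfloor + 1 - \left\lfloor \frac{a}{2} \right\rfloor = 1.$$
Thus (A1) holds. This completes the inductive step and completes the proof of the property. □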
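The property of Theorem A1 can also be observed by building the levels of the recursion tree explicitly. The Python sketch below is a minimal illustration with a hypothetical function name; it assumes, as described above, that a call on a sub-array of size $s \geq 2$ spawns calls on sub-arrays of sizes $\lfloor s/2 \rfloor$ and $\lceil s/2 \rceil$.

```python
def recursion_tree_levels(n):
    """Sizes of (sub)arrays shown at each level of the MergeSort recursion tree, n >= 1."""
    levels = [[n]]
    while any(size >= 2 for size in levels[-1]):
        next_level = []
        for size in levels[-1]:
            if size >= 2:  # nodes showing 1 are leaves and have no children
                next_level += [size // 2, size - size // 2]
        levels.append(next_level)
    return levels

if __name__ == "__main__":
    for level_number, sizes in enumerate(recursion_tree_levels(11)):
        print(level_number, sizes, max(sizes) - min(sizes))
```

Each printed line lists a level number, the values shown at that level, and the difference between the largest and the smallest of them.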
As I have noted, the values shown at all nodes in the last level h are all 1. Thus the values shown at their parents that reside at level $h - 1$ are all 2, and the values shown at their grandparents, which reside at level $h - 2$, are all $\geq 3$. Thus, by Theorem A1, all nodes at level $h - 2$ show values $\geq 2$ and, therefore (as I have proved before), all levels $i = 0, \ldots, h - 1$ have $2^{i}$ nodes each, as it has been visualized in Figure 2.
Theorem A2. 
The depth h of the recursion tree $T(n)$ for MergeSort run on an array of size n is
$$h = \lceil \lg n \rceil. \tag{A3}$$
Proof. 
$T(n)$ has n leaves (one for each of the n 1-element subarrays of the sorted array that are passed to (recursive) calls to MergeSort). Since every level k of $T(n)$, except, perhaps, for the last level h, has the maximal number $2^{k}$ of nodes that any binary tree can have at level k, a binary tree with n leaves could not be any shorter than $T(n)$. Hence, $T(n)$ is a shortest binary tree with n leaves. Therefore (by a well-known fact; see, for instance, [2], page 68, Lemma 2.5), its depth h is equal to $\lceil \lg n \rceil$. Thus the equality (A3) holds. □
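Equality (A3) can be cross-checked by computing the depth of the recursion tree directly. The following Python sketch uses hypothetical names and assumes the same splitting of a size-$s$ sub-array into parts of sizes $\lfloor s/2 \rfloor$ and $\lceil s/2 \rceil$.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def depth(n):
    """Depth of the MergeSort recursion tree for an array of size n >= 1."""
    if n <= 1:
        return 0  # a single leaf
    return 1 + max(depth(n // 2), depth(n - n // 2))

def ceil_lg(n):
    """ceil(lg n) for an integer n >= 1, computed without floating point."""
    return (n - 1).bit_length()

if __name__ == "__main__":
    print(all(depth(n) == ceil_lg(n) for n in range(1, 10_000)))
```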
Because each node in any level above $h - 1$ shows a value $\geq 2$, it has 2 children. Thus the value it shows is equal to the sum of values shown by its children, as I have indicated at the beginning of this Section. From that, I conclude (a proof by induction is left as an exercise for the reader) that the sum of values shown at nodes in any level $i = 0, \ldots, h - 1$ is the same for each such level. Thus the said sum is equal to the value shown by the only node at level 0, that is, equal to n.
Let $a_{1}, \ldots, a_{2^{i}}$ be the values shown at the nodes of some level $i = 0, \ldots, h - 1$. The number of comps performed by a call to Merge invoked by the call to MergeSort on an array of $a_{j}$ elements is either 0 if $a_{j} = 1$ (no call to Merge is made) or, as I have shown in the previous section, is $a_{j} - 1$ if $a_{j} \geq 2$. Therefore, in either case, it is $a_{j} - 1$. Thus the number of comps $C_{i}$ performed at level i is
$$C_{i} = (a_{1} - 1) + \cdots + (a_{2^{i}} - 1) = (a_{1} + \cdots + a_{2^{i}}) - \underbrace{(1 + \cdots + 1)}_{2^{i}} = n - 2^{i}. \tag{A4}$$
Moreover, since all nodes at the last level h are 1s,
$$C_{h} = 0. \tag{A5}$$
Therefore, the total number $W(n)$ of comps that MergeSort performs in the worst case on an n-element array is equal to
$$W(n) = \sum_{i=0}^{h} C_{i} =$$
[by (A5)]
$$= \sum_{i=0}^{h-1} C_{i} =$$
[by (A4)]
$$= \sum_{i=0}^{h-1} (n - 2^{i}) = nh - (2^{h} - 1) = nh - 2^{h} + 1 =$$
[by (A3)]
$$= n \lceil \lg n \rceil - 2^{\lceil \lg n \rceil} + 1.$$
This way, I have proved the following.
Theorem A3. 
The number $W(n)$ of comparisons of keys that MergeSort performs in the worst case while sorting an n-element array is
$$W(n) = n \lceil \lg n \rceil - 2^{\lceil \lg n \rceil} + 1.$$
Proof. 
Follows from the above derivation. □
Using the well-known (see, for instance, [3]) closed-form formula for $\sum_{i=1}^{n} \lceil \lg i \rceil$, I conclude that
$$W(n) = \sum_{i=1}^{n} \lceil \lg i \rceil.$$
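The three characterizations of $W(n)$ obtained in this appendix can be compared numerically. The Python sketch below is an illustration with hypothetical names; the recurrence in `worst_case` is an assumption that restates the derivation above, in which a merge producing $a_{j} \geq 2$ elements performs $a_{j} - 1$ comps in the worst case.

```python
from functools import lru_cache

def ceil_lg(n):
    """ceil(lg n) for an integer n >= 1."""
    return (n - 1).bit_length()

@lru_cache(maxsize=None)
def worst_case(n):
    """W(n) from the assumed recurrence W(1) = 0, W(n) = W(floor(n/2)) + W(ceil(n/2)) + n - 1."""
    if n <= 1:
        return 0
    return worst_case(n // 2) + worst_case(n - n // 2) + n - 1

def worst_case_closed(n):
    """Theorem A3: n * ceil(lg n) - 2**ceil(lg n) + 1."""
    h = ceil_lg(n)
    return n * h - 2 ** h + 1

def worst_case_sum(n):
    """Sum of ceil(lg i) for i = 1, ..., n."""
    return sum(ceil_lg(i) for i in range(1, n + 1))

if __name__ == "__main__":
    for n in (1, 2, 3, 5, 10, 1000):
        print(n, worst_case(n), worst_case_closed(n), worst_case_sum(n))
```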

Appendix B. Proof of $\sum_{i=0}^{m-1} \left\lfloor \frac{n + i}{m} \right\rfloor = n$

Theorem A4. 
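For every natural number n and every positive natural number m,
$$\sum_{i=0}^{m-1} \left\lfloor \frac{n + i}{m} \right\rfloor = n.$$
Proof.
Let $n = km + l$, where $0 \leq l < m$.
We have
$$\left\lfloor \frac{n + i}{m} \right\rfloor = \left\lfloor \frac{km + l + i}{m} \right\rfloor = \left\lfloor k + \frac{l + i}{m} \right\rfloor = k + \left\lfloor \frac{l + i}{m} \right\rfloor.$$
Therefore,
$$\sum_{i=0}^{m-1} \left\lfloor \frac{n + i}{m} \right\rfloor = mk + \sum_{i=0}^{m-1} \left\lfloor \frac{l + i}{m} \right\rfloor = mk + \sum_{i=m-l}^{m-1} \left\lfloor \frac{l + i}{m} \right\rfloor = mk + \sum_{i=m-l}^{m-1} 1 = mk + l = n.$$
□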
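Theorem A4 is easy to confirm by brute force over a range of arguments. The Python sketch below uses a hypothetical function name and integer division, which coincides with the floor function for the non-negative operands involved.

```python
def floor_sum(n, m):
    """Sum of floor((n + i) / m) for i = 0, ..., m - 1, with n >= 0 and m >= 1."""
    return sum((n + i) // m for i in range(m))

if __name__ == "__main__":
    print(all(floor_sum(n, m) == n for n in range(0, 200) for m in range(1, 50)))
```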

References

  1. Sedgewick, R.; Flajolet, P. An Introduction to the Analysis of Algorithms; Pearson: Boston, MA, USA, 2013.
  2. Baase, S. Computer Algorithms: Introduction to Design and Analysis, 2nd ed.; Addison-Wesley Publishing: Reading, MA, USA, 1991.
  3. Knuth, D.E. The Art of Computer Programming, 2nd ed.; Addison-Wesley Publishing: Reading, MA, USA, 1997; Volume 3.
  4. Trollope, J.R. An explicit expression for binary digital sums. Math. Mag. 1968, 41, 21–25.
  5. McIlroy, M.D. The number of 1’s in binary integers: Bounds and extremal properties. SIAM J. Comput. 1974, 3, 255–261.
  6. Allouche, J.-P.; Stipulanti, M. Summing the sum of digits. Commun. Math. 2024, 33, 2.
  7. Knuth, D.; Graham, R.; Patashnik, O. Concrete Mathematics: A Foundation for Computer Science; Addison–Wesley: Reading, MA, USA, 1994.
Figure 1. A Java code of Merge, based on a pseudo-code from [2]. Calls to Boolean method Bcnt.incr() count the number of comps and facilitate experimental verification of the formula derived and proved in this article.
Figure 2. A sketch of the recursion 2-tree T for MergeSort for $n \geq 10$, with level numbers shown on the left and the numbers of nodes in respective levels shown on the right. The nodes correspond to calls to MergeSort and show the sizes of (sub)arrays passed to those calls. The last level number is h, the depth of T; it only contains nodes with the value 1. The root corresponds to the original call to MergeSort. If a call that is represented by a node p executes further recursive calls to MergeSort, then these calls are represented by the children of p; otherwise, p is a leaf. Thus a node in T is a leaf if, and only if, it shows the number 1. The level $h - 1$ of tree T may or may not contain some (but not all) leaves (not shown on the sketch), and if it does contain leaves then the level h of tree T contains fewer than n nodes, all of them being leaves. In any case, the total number of leaves in tree T is n. The wavy line represents a path of the right child in T.
Figure 3. The first four levels of the recursion 2-tree T from Figure 2, with the equality (1) applied, recursively. The number of comparisons of keys performed in the best case by Merge invoked in step 2c of Algorithm 1 as a result of a call to MergeSort corresponding to a node of T is equal to the number that is shown in its left child, highlighted yellow. All the children of the nodes at any level $k \leq h - 1$ of T show numbers of the form $\left\lfloor \frac{n + i}{2^{k+1}} \right\rfloor$, where $0 \leq i < 2^{k+1}$. All the right children of the nodes at the level k of T show numbers of the form $\left\lfloor \frac{n + i}{2^{k+1}} \right\rfloor$, where $2^{k} \leq i < 2^{k+1}$. Thus all the left children (highlighted yellow) at the level k show numbers of the form $\left\lfloor \frac{n + i}{2^{k+1}} \right\rfloor$, where $0 \leq i < 2^{k}$.
Figure 4. Graph of the solution $B(n)$ of the recurrences (2) and (4).
Figure 5. Graph of function $\mathrm{Zigzag}(x) = \min(x - \lfloor x \rfloor, \lceil x \rceil - x)$.
Figure 6. Graphs of functions $\sum_{k=0}^{\lfloor \lg n \rfloor} \sum_{i=0}^{2^{k} - 1} \left\lfloor \frac{n + i}{2^{k+1}} \right\rfloor$ (bottom line) and $\frac{n}{2}(\lfloor \lg n \rfloor + 1) - \sum_{k=0}^{\lfloor \lg n \rfloor} 2^{k} \, \mathrm{Zigzag}\left(\frac{n}{2^{k+1}}\right)$ (top line) of equality (14). They coincide with each other for all natural numbers n.
Figure 7. A graph of function $\sum_{k=0}^{\lfloor \lg x \rfloor} 2^{k+1} \, \mathrm{Zigzag}\left(\frac{x}{2^{k+1}}\right)$ plotted against a sawtooth function $2^{\lfloor \lg x \rfloor + 1} - x$ (the orange line).
Figure 8. A graph of function $F(x) = \sum_{k=1}^{\lfloor \lg x \rfloor} 2^{k} \, \mathrm{Zigzag}\left(\frac{x}{2^{k}}\right)$ plotted below its tight linear upper bound $y = \frac{x - 1}{2}$ (it can be shown that $F(x) = \frac{x - 1}{2}$ whenever $x = \frac{1}{3}(2^{k+1} + (-1)^{k})$ for some integer $k \geq 0$); also shown below $F(x)$ are the terms $2^{k} \, \mathrm{Zigzag}\left(\frac{x}{2^{k}}\right)$ (the brown lines) of the summation and their tight linear upper bound $y = \frac{x}{3}$ (the orange line).
Figure 9. A graph of the first six (the first one is 0) normalized parts (in blue, orange, green, red and purple) of function F of Figure 8 plotted against the line $y = \sum_{i=0}^{\infty} \frac{1}{2^{2i+1}} = \frac{2}{3}$ (in brown). Also shown (in blue) are the first five terms $\frac{1}{2^{i}} \, \mathrm{Zigzag}(2^{i} x)$, $i = 0, \ldots, 4$, of sums that occur in Formula (25) for $\tilde{f}_{k}(x)$; for each integer n and all $x \in [n, n + 1)$, their parts above the X-axis restricted to $[n, n + 1)$ visualize a fragment of an infinite binary search trie T defined as the set of shortest binary representations of $x - \lfloor x \rfloor$ with the last digit 1 (if the said binary representation is finite) being interpreted as the sequence terminator; in particular, the root of T is 0.1, and if a is a finite binary sequence, then the children of the binary representation 0.a1 are 0.a01 and 0.a11.
Figure 10. Functions $\tilde{f}_{0}(x)$, $\tilde{f}_{1}(x)$, $\ldots$, and their limit (the topmost curve) $\tilde{F}(x)$ (each plotted in a different color) given by (27). Collapsing $\mathrm{Zigzag}(x)$ would yield the same, albeit scaled-down (by a factor of 2), pattern $\frac{1}{2} \tilde{F}(2x)$, as (27) does imply.
Figure 11. A graph (in orange) of function $\breve{F}(x) = 2^{\lfloor \lg x \rfloor} \tilde{F}\left(\frac{x}{2^{\lfloor \lg x \rfloor}}\right)$ plotted above a graph (in blue) of the function $F(x) = \sum_{k=1}^{\lfloor \lg x \rfloor} 2^{k} \, \mathrm{Zigzag}\left(\frac{x}{2^{k}}\right)$.
Figure 12. A graph of $2B(n) - W(n)$ shown (in blue) between graphs of its tight linear bounds $n - 1$ (in green) and $\frac{n - 1}{2}$ (in orange).
Figure 13. A graph of the function $\tilde{F}(x) = \sum_{i=0}^{\infty} \frac{1}{2^{i}} \, \mathrm{Zigzag}(2^{i} x)$.
Figure 14. A graph of enumeration $\nu(n) = \frac{n}{2^{\lfloor \lg n \rfloor}}$ of the set D.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Suchenek, M.A. Elementary yet Precise Best-Case Analysis of MergeSort with an Application to the Sum of Digits Problem. Mathematics 2026, 14, 733. https://doi.org/10.3390/math14040733
