An integer $n \geq 2$ is prime if the only positive integers that divide it are 1 and itself; if $n \geq 2$ is not prime we say it is composite. The number 1 is neither prime nor composite, but instead is called a unit.
This property is called unique factorization, and is one of the most important properties of prime numbers. If 1 were considered a prime, then unique factorization would fail; for example, if 1 were prime then $3^5 \cdot 7$ and $1 \cdot 3^5 \cdot 7$ would be two different factorizations of 1701.
The proof is by contradiction. Assume there are only finitely many primes. We label them $p_1, \dots, p_n$. Consider the number $m = p_1 \cdots p_n + 1$. Either this is a new prime not on our list, or it is composite. If it is composite, it must be divisible by some prime; however, it cannot be divisible by any prime on our list, as each of these gives remainder 1. Thus $m$ is either prime or divisible by a prime not on our list. In either case, our list was incomplete, proving that there are infinitely many primes. With a little bit of effort, one can show that Euclid’s argument proves there is a constant $C$ such that $\pi(x) \geq C \log\log x$. The first few primes generated in this manner are 2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801. A fascinating question is whether or not every prime is eventually listed (see [13]).
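The list of primes quoted above can be reproduced with a short sketch. It assumes that "generated in this manner" means the usual Euclid-Mullin construction: start with 2 and repeatedly append the smallest prime factor of one plus the product of the primes found so far. The function names are illustrative, and plain trial division is used, so only the first eight terms are practical here.

```python
def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n >= 2 by trial division."""
    if n % 2 == 0:
        return 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d
        d += 2
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_mullin(count: int) -> list:
    """First `count` (>= 1) terms of the assumed Euclid-Mullin construction."""
    primes = [2]
    product = 2
    while len(primes) < count:
        # product + 1 leaves remainder 1 on division by every prime listed so far
        p = smallest_prime_factor(product + 1)
        primes.append(p)
        product *= p
    return primes


print(euclid_mullin(8))
```

Later terms require factoring much larger numbers, which is why trial division stops being adequate quickly.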
The notation means .
A complex number is of the form $z = x + iy$ with $x, y$ real and $i = \sqrt{-1}$. We call $x$ the real part and $y$ the imaginary part; we frequently denote these by $\Re(z)$ and $\Im(z)$, respectively.
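Let $s = \sigma + it$. Then $|n^{-s}| = n^{-\sigma}$. The integral test from calculus states that the series $\sum_n n^{-\sigma}$ converges if and only if $\int_1^\infty x^{-\sigma}\,dx$ converges, and this integral converges if $\sigma > 1$.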
To see this, use the geometric series formula (see Footnote 9) to expand $(1 - p^{-s})^{-1}$ as $\sum_{k=0}^\infty p^{-ks}$ and note that $n^{-s}$ occurs exactly once on each side (and clearly every term from expanding the product is of the form $n^{-s}$ for some $n$).
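As a numerical sanity check of the identity between the sum over integers and the product over primes, the following sketch compares truncations of both at $s = 2$, where the common value is $\pi^2/6$. The truncation points and function names are arbitrary choices made for illustration.

```python
import math


def primes_up_to(limit: int) -> list:
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # cross out the multiples of p starting at p*p
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def zeta_partial_sum(s: float, terms: int) -> float:
    """Sum of n^(-s) for 1 <= n <= terms."""
    return sum(n ** -s for n in range(1, terms + 1))


def euler_partial_product(s: float, limit: int) -> float:
    """Product of (1 - p^(-s))^(-1) over primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1.0 / (1.0 - p ** -s)
    return product


print(zeta_partial_sum(2.0, 100000), euler_partial_product(2.0, 100000), math.pi ** 2 / 6)
```

Both truncations approach the same limit; the agreement illustrates the identity but does not prove it.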
We give two quick proofs of the importance of the Euler product by showing how it implies there are infinitely many primes. The first is $\lim_{s \to 1^+} \zeta(s) = \infty$, which means there must be infinitely many primes as otherwise the product is finite. The second proof is to note $\zeta(2) = \pi^2/6$ is irrational; if there were only finitely many primes then the product would be rational. See for example [16] for details.
The subject of meromorphic continuation belongs to complex analysis. For the benefit of the reader who hasn’t seen this, we give a brief example that will be of use throughout this paper, namely the geometric series formula $\sum_{n=0}^\infty r^n = \frac{1}{1-r}$. Note that while the sum makes sense only when $|r| < 1$, $\frac{1}{1-r}$ is well-defined for all $r \neq 1$ and agrees with the sum whenever $|r| < 1$. We say $\frac{1}{1-r}$ is a meromorphic continuation of the sum.
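One proof is to use the Gamma function, $\Gamma(s) = \int_0^\infty e^{-t} t^{s-1}\,dt$. A simple change of variables gives $\int_0^\infty x^{\frac{s}{2}-1} e^{-n^2 \pi x}\,dx = \Gamma(s/2) / (n^s \pi^{s/2})$. Summing over $n$ represents a multiple of $\zeta(s)$ as an integral. After some algebra we find $\pi^{-s/2}\Gamma(s/2)\zeta(s) = \int_1^\infty x^{\frac{s}{2}-1}\omega(x)\,dx + \int_1^\infty x^{-\frac{s}{2}-1}\omega(1/x)\,dx$, with $\omega(x) = \sum_{n=1}^\infty e^{-n^2 \pi x}$. Using Poisson summation, we find $\omega(1/x) = -\frac{1}{2} + \frac{1}{2}x^{1/2} + x^{1/2}\omega(x)$, which yields $\pi^{-s/2}\Gamma(s/2)\zeta(s) = \frac{1}{s(s-1)} + \int_1^\infty \left(x^{-\frac{s}{2}-\frac{1}{2}} + x^{\frac{s}{2}-1}\right)\omega(x)\,dx$, from which the claimed functional equation follows.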
Let $f$ be a meromorphic function with only finitely many poles on an open set $U$ which is bounded by a ‘nice’ curve $\gamma$. Thus at each point $z_0 \in U$ we have $f(z) = \sum_{n=N}^\infty a_n (z - z_0)^n$ with $a_N \neq 0$. If $N > 0$ we say $f$ has a zero of order $N$. If $N < 0$ we say $f$ has a pole of order $-N$, and in this case we call $a_{-1}$ the residue of $f$ at $z_0$ (for clarity, we often denote this by $\mathrm{Res}(f, z_0)$). If $f$ does not have a pole at $z_0$, then the residue is zero. Our assumption implies that there are only finitely many points where the residue is non-zero. The residue theorem states $\frac{1}{2\pi i}\oint_\gamma f(z)\,dz = \sum_{z_0 \in U} \mathrm{Res}(f, z_0)$. One useful variant is to apply this to $f'(z)/f(z)$, which then counts the number of zeros of $f$ minus the number of poles; another is to look at $\frac{f'(z)}{f(z)} g(z)$ where $g(z)$ is analytic, which we will do later when stating the explicit formula for the zeros of the Riemann zeta function.
Some care is required with this sum, as $\sum_\rho 1/|\rho|$ diverges. The solution involves pairing the contribution from $\rho$ and $\overline{\rho}$; see for example [17].
To see this, note , while the contribution from with is bounded by (this is because implies ).
Partial summation is the discrete analogue of integration by parts [16]. In our case,
is equivalent to
Note this is only true for zeros in the critical strip, namely $0 \leq \Re(s) \leq 1$; for zeros outside the critical strip we can and do have zeros of $\zeta(s)$ not corresponding to zeros of $\xi(s)$ because of poles of the Gamma function.
The Riemann Hypothesis is probably the most important mathematical aside ever in a paper. Riemann [18] wrote (translated into English; note when he talks about the roots being real, he’s writing the roots as $\frac{1}{2} + i\gamma$, and thus $\gamma \in \mathbb{R}$ is the Riemann Hypothesis): …and it is very probable that all roots are real. Certainly one would wish for a stricter proof here; I have meanwhile temporarily put aside the search for this after some fleeting futile attempts, as it appears unnecessary for the next objective of my investigation.
Though not mentioned in the paper, Riemann had developed a terrific formula for computing the zeros of $\zeta(s)$, and had checked (but never reported!) that the first few were on the critical line $\Re(s) = 1/2$. His numerical computations were only discovered decades later when Siegel was looking through Riemann’s papers.
The prime number theorem is in fact equivalent to the statement that $\Re(\rho) < 1$ for any zero $\rho$ of $\zeta(s)$. The prime number theorem was first proved independently by Hadamard [22] and de la Vallée Poussin [23] in 1896. Each proof crucially used results from complex analysis, which is hardly surprising given that Riemann had shown $\pi(x)$ is related to the zeros of the meromorphic function $\zeta(s)$. It was not until about 50 years later that Erdős [14] and Selberg [15] obtained elementary proofs of the prime number theorem (in other words, proofs that did not use complex analysis, which was quite surprising as the prime number theorem was known to be equivalent to a statement about zeros of a meromorphic function). See [24] for some commentary on the history of elementary proofs.
By this we mean . If the integrand is not absolutely convergent, then the value could depend on how we tend to infinity. The standard example is the Cauchy distribution, . Note , and .
The reason is that we can always adjust our probability distribution to have mean 0 and variance 1 by simple translations and rescaling. For example, if the density $p$ has mean $\mu$ and variance $\sigma^2$, then $\sigma \cdot p(\sigma x + \mu)$ has mean 0 and variance 1. Thus the third moment (or the fourth if the third vanishes) is the first moment that truly shows the ‘shape’ of the density.
, the rth
. The reason for the non-uniqueness of moments is that the moment generating function
does not converge in a neighborhood of the origin. See [44], Chapter 2.
The standard example is the function $f(x) = e^{-1/x^2}$ if $x \neq 0$ and 0 otherwise. By using the definition of the derivative and L’Hôpital’s rule, we see $f^{(n)}(0) = 0$ for all $n$, but clearly this function is non-zero if $x \neq 0$. Thus the Taylor series about the origin agrees with the function only at the origin! This example illustrates how much harder real analysis can be than complex analysis. There, if a function of a complex variable has even one derivative then it has infinitely many, and is given by its Taylor series in some neighborhood of the point.
We can consider the probability densities $A_N(x) = N A(Nx)$, where $A$ is a probability density and $N \to \infty$. As $N \to \infty$ almost all the probability (mass) is concentrated in a narrower and narrower band about the origin; we let $\delta(x)$ be the limit with all the mass at one point. It is a discrete (as opposed to continuous) probability measure, with infinite density but finite mass. Note that $\delta(x - a)$ acts like a unit point mass; however, instead of having its mass concentrated at the origin, it is now concentrated at $a$.
As A is real symmetric, the eigenvalues are real (see Footnote 19) and thus this measure is well defined.
These ensembles have behavior that is often described by a parameter $\beta$, which is 1 for real symmetric, 2 for complex Hermitian and 4 for symplectic matrices.
]. We quote from [47] who quote from [48]: There is no covering company responsible for organizing the city transport. Consequently, constraints such as a time table that represents external influence on the transport do not exist. Moreover, each bus is the property of the driver. The drivers try to maximize their income and hence the number of passengers they transport. This leads to competition among the drivers and to their mutual interaction. It is known that without additive interaction the probability distribution of the distances between subsequent buses is close to the Poissonian distribution and can be described by the standard bus route model.... A Poisson-like distribution implies, however, that the probability of close encounters of two buses is high (bus clustering) which is in conflict with the effort of the drivers to maximize the number of transported passengers and accordingly to maximize the distance to the preceding bus. In order to avoid the unpleasant clustering effect the bus drivers in Cuernavaca engage people who record the arrival times of buses at significant places. Arriving at a checkpoint, the driver receives the information of when the previous bus passed that place. Knowing the time interval to the preceding bus the driver tries to optimize the distance to it by either slowing down or speeding up.
The papers go on to show the behavior is well-modeled by random matrix theory (specifically, ensembles of complex Hermitian matrices)!
Explicitly, given two points with masses $m_1$ and $m_2$ and initial velocities $\vec{v}_1$ and $\vec{v}_2$ and located at $\vec{r}_1$ and $\vec{r}_2$, we can describe how the system evolves in time given that gravity is the only force in play.
While there are known solutions for special arrangements of special masses, the problem of three bodies in general position is still open; see [49] for more details.
Whether or not Pluto will regain planetary status is an entirely different question.
This is reminiscent of the Central Limit Theorem. For example, if we average over all sequences of tossing a fair coin $2N$ times, we obtain $N$ heads, and most sequences of $2N$ tosses will have approximately $N$ heads, where approximately means deviations on the order of $\sqrt{N}$.
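The coin-tossing statement can be checked exactly for a moderate number of tosses. The sketch below assumes $2N$ fair tosses and measures "approximately $N$ heads" as a head count within a chosen width of $N$; the width $2\sqrt{N}$ used in the example is an arbitrary illustrative choice.

```python
from math import comb, isqrt


def average_heads(n: int) -> float:
    """Average number of heads over all 2**(2n) sequences of 2n fair tosses."""
    total = 2 ** (2 * n)
    weighted = sum(heads * comb(2 * n, heads) for heads in range(2 * n + 1))
    return weighted / total


def fraction_near_half(n: int, width: int) -> float:
    """Fraction of sequences of 2n tosses whose head count is within `width` of n."""
    total = 2 ** (2 * n)
    low, high = max(0, n - width), min(2 * n, n + width)
    close = sum(comb(2 * n, heads) for heads in range(low, high + 1))
    return close / total


n = 500
print(average_heads(n))                      # the system average
print(fraction_near_half(n, 2 * isqrt(n)))   # share of sequences close to the average
```

The average is exactly $N$, while the second quantity shows that the overwhelming majority of individual sequences lie within a few multiples of $\sqrt{N}$ of that average.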
The first reference to this conjecture in the literature might not have been until 1973 by Montgomery [3].
If $v$ is an eigenvector with eigenvalue $\lambda$ of a Hermitian matrix $A$ (so $A = A^*$ with $A^*$ the complex conjugate transpose of $A$), then $v^*(Av) = v^*(A^*v) = (Av)^*v$; the first expression is $\lambda \|v\|^2$ while the last is $\overline{\lambda}\|v\|^2$, with $\|v\|^2 = v^*v$ non-zero. Thus $\lambda = \overline{\lambda}$, and the eigenvalues are real. This is one of the most important properties of Hermitian matrices, as it allows us to order the eigenvalues.
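A small numerical illustration of the reality of Hermitian eigenvalues: the sketch below computes the eigenvalues of a $2 \times 2$ complex matrix from its characteristic polynomial and compares a Hermitian example with a non-Hermitian one. The specific matrices and the helper name are illustrative choices only.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] via the characteristic polynomial."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


b = 1 + 2j

# Hermitian: real diagonal, lower-left entry is the conjugate of the upper-right
hermitian = eigenvalues_2x2(2, b, b.conjugate(), -1)

# Not Hermitian: the lower-left entry is unrelated to the upper-right one
generic = eigenvalues_2x2(2, b, 3 - 1j, -1)

print(hermitian)  # imaginary parts vanish
print(generic)    # imaginary parts do not vanish
```

For the Hermitian case the discriminant equals $(a-d)^2 + 4|b|^2 \geq 0$, which is the $2 \times 2$ shadow of the general argument.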
In fact, one of the authors has used Weibull distributions to model run production in major league baseball, giving a theoretical justification for Bill James’ Pythagorean Won-Loss formula [62].
Obviously this Weibull cannot be a normal distribution, as they have very different decay rates for large x, and this Weibull is a one-sided distribution! What we mean is that for this Weibull is well approximated by a normal distribution which shares its mean and variance, which are (respectively) and .
Historically, Fréchet introduced this distribution in 1927, and nuclear physicists often refer to the Weibull distribution as the Brody distribution [25].
There is an interesting perspective on proving that more than a third of the zeros lie on the critical line. As zeros off the line occur in pairs symmetric about the line, proving that more than a third of all non-trivial zeros lie on the line is equivalent to proving that more than half of all zeros with real part at least 1/2 are on the line! Thus, in this sense, a ‘majority’ of all zeros are on the critical line.
The class number measures how much unique factorization fails in the ring of integers of a finite extension of $\mathbb{Q}$, and thus is an extremely important property of these fields. For example, $\mathbb{Z}[i]$ has unique factorization, while $\mathbb{Z}[\sqrt{-5}]$ does not (in the latter, note we can write 6 as either $2 \cdot 3$ or $(1+\sqrt{-5})(1-\sqrt{-5})$, and none of these four numbers can be factored as $ab$ without one of the two factors being a unit; the units are numbers in the ring whose norm is 1; in $\mathbb{Z}[\sqrt{-5}]$, these numbers are $\pm 1$, while in $\mathbb{Z}[i]$ we would also have numbers such as $\pm i$). The class number problem is to find all imaginary quadratic fields with a given class number; see [70] for more details and results. It turns out (see [72]) that if there are many small (relative to the average spacing) gaps between zeros of $\zeta(s)$ on the critical line, then there are terrific lower bounds for the class number; another connection between the class number and zeros of $L$-functions is through the work of Goldfeld [73] and Gross-Zagier [74].
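The failure of unique factorization can be verified by a finite search. The sketch below assumes the standard example, the ring $\mathbb{Z}[\sqrt{-5}]$ with $6 = 2 \cdot 3 = (1+\sqrt{-5})(1-\sqrt{-5})$, and represents $a + b\sqrt{-5}$ as the pair `(a, b)`; the helper names are illustrative. Since the norm $a^2 + 5b^2$ is multiplicative, a proper factor of an element of norm 4, 6 or 9 would need norm 2 or 3.

```python
def norm(a: int, b: int) -> int:
    """Norm of a + b*sqrt(-5)."""
    return a * a + 5 * b * b


def multiply(x, y):
    """Product of (a + b*sqrt(-5)) and (c + d*sqrt(-5)) as a pair."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)


def elements_with_norm(target: int):
    """All pairs (a, b) with a^2 + 5 b^2 == target (a finite search)."""
    bound = int(target ** 0.5) + 1
    return [
        (a, b)
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
        if norm(a, b) == target
    ]


print(multiply((2, 0), (3, 0)), multiply((1, 1), (1, -1)))  # both equal 6
print(elements_with_norm(1))                                # the units
print(elements_with_norm(2), elements_with_norm(3))         # no elements of norm 2 or 3
```

Because no element has norm 2 or 3, none of $2$, $3$, $1 \pm \sqrt{-5}$ factors non-trivially, so the two factorizations of 6 are genuinely different.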
Of course, one has to invert these relations to find the eigenvalues!
] for proofs of the central limit theorem, or [16] for a sketch of the proof.
For example, imagine for all N we always had half the moments equal 0 and the other half equal ; then the average is but no measure is close to the system average.
Many of the papers in the field have large sections devoted to handling combinatorics; see for instance [83]. Interestingly, sometimes the combinatorics cannot be handled, and in [87] we believe the number theory and random matrix theory agree where both have been calculated, but we cannot do the combinatorics to prove this.
The proof is trivial if $A$ is diagonal or upper triangular, following by definition. As we only need this result for real symmetric matrices, which can be diagonalized, we only give the proof in this case. Let $S$ be such that $A = S^{-1}\Lambda S$ with $\Lambda$ diagonal. The claim follows from $A^k = S^{-1}\Lambda^k S$ and $\mathrm{Trace}(S^{-1}\Lambda^k S) = \mathrm{Trace}(\Lambda^k)$.
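The statement being proved, that the trace of $A^k$ equals the sum of the $k$th powers of the eigenvalues, can be checked directly on a small example. The sketch below uses a $2 \times 2$ real symmetric matrix, whose eigenvalues have a closed form; the matrix and function names are illustrative.

```python
import math


def mat_mul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace_of_power(a, k: int):
    """Trace of a^k for k >= 1, by repeated multiplication."""
    power = a
    for _ in range(k - 1):
        power = mat_mul(power, a)
    return sum(power[i][i] for i in range(len(a)))


def symmetric_2x2_eigenvalues(a: float, b: float, d: float):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean + radius, mean - radius


A = [[2, 1], [1, 3]]
lam1, lam2 = symmetric_2x2_eigenvalues(2, 1, 3)
for k in range(1, 7):
    print(k, trace_of_power(A, k), lam1 ** k + lam2 ** k)
```

The two columns agree up to floating-point rounding, which is the content of the lemma in this case.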
With a little more work, we could calculate the variance of the average eigenvalue squared, using either the Central Limit Theorem or even Chebyshev’s Theorem (from probability).
This is similar in some sense to dimensional analysis arguments in physics, which detect parameter dependence but not the constants. For example, imagine a pendulum of mass $m$ (in kilograms), length $L$ (in meters) where the difference in rest height (when the pendulum is down) and the raised height (where it is at angle $\theta$) is $h$ meters. We assume the only force acting is gravity, with constant $g$ (in meters per second squared). The period, which is how long (in seconds) it takes the pendulum to do a complete cycle, must be a function of $m$, $L$, $h$ and $g$; however, the only combinations of these quantities that give units of seconds are $\sqrt{L/g}$ and $\sqrt{h/g}$; thus the period must be a function of these two expressions. The correct answer turns out to be (at least for small initial displacements) approximately $2\pi\sqrt{L/g}$; we are able to deduce the correct functional form, though the constants are beyond such analysis.
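To see the constant that dimensional analysis cannot supply, one can integrate the pendulum equation numerically. The sketch below assumes the standard model $\theta'' = -(g/L)\sin\theta$ and the standard small-displacement period $2\pi\sqrt{L/g}$; the step size, the value of $g$ and the function names are illustrative choices. Note that the mass never enters the equation of motion.

```python
import math


def small_angle_period(length: float, g: float = 9.81) -> float:
    """Standard small-displacement approximation 2*pi*sqrt(L/g)."""
    return 2 * math.pi * math.sqrt(length / g)


def simulated_period(length: float, theta0: float, g: float = 9.81, dt: float = 1e-4) -> float:
    """Period from integrating theta'' = -(g/L) sin(theta), released from rest at theta0 > 0."""
    theta, omega, t = theta0, 0.0, 0.0
    while True:
        # one velocity Verlet step
        acc = -(g / length) * math.sin(theta)
        theta_new = theta + omega * dt + 0.5 * acc * dt * dt
        acc_new = -(g / length) * math.sin(theta_new)
        omega += 0.5 * (acc + acc_new) * dt
        if theta_new <= 0.0:
            # first passage through the bottom is a quarter period; interpolate the crossing
            frac = theta / (theta - theta_new)
            return 4 * (t + frac * dt)
        theta, t = theta_new, t + dt


print(small_angle_period(1.0), simulated_period(1.0, 0.05))
```

Quadrupling the length doubles the period in both computations, matching the $\sqrt{L/g}$ dependence predicted by the units alone.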
It is not hard to show the odd moments vanish by simple counting arguments. For the even moments, if the $a_{ij}$’s are not matched in pairs then there is negligible contribution as $N \to \infty$. The proof follows by counting how many tuples there can be with a given matching, and then comparing that to the normalizing power of $N$ (the proofs are somewhat easier if our distribution is even). For example, as we are assuming our distribution has zero mean, if ever there was an $a_{ij}$ that was unmatched (so neither $(i,j)$ nor $(j,i)$ occurs as the index in any other factor in the trace expansion), then the expectation of this term must vanish as each $a_{ij}$ is drawn from a mean zero distribution. The number of valid pairings of the $a_{ij}$’s (where everything is matched in pairs) is $(2k-1)!!$, which is the $2k$th moment of the standard normal; not every matching contributes fully, though, and this is why the resulting moments are significantly smaller than the Gaussian’s.
The Catalan numbers are $C_k = \frac{1}{k+1}\binom{2k}{k}$. They arise in a variety of combinatorial problems; see for example [88].
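For concreteness, the sketch below computes the Catalan numbers from the closed form $C_k = \frac{1}{k+1}\binom{2k}{k}$, cross-checks them against the standard recurrence $C_{n+1} = \sum_{i=0}^{n} C_i C_{n-i}$, and lists them next to $(2k-1)!!$, the total number of ways to pair up $2k$ objects (equivalently the $2k$th moment of the standard normal). Function names are illustrative.

```python
from math import comb


def catalan(k: int) -> int:
    """Closed form C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def catalan_by_recurrence(k: int) -> int:
    """C_0 = 1 and C_{n+1} = sum_{i=0}^{n} C_i * C_{n-i}."""
    values = [1]
    for n in range(k):
        values.append(sum(values[i] * values[n - i] for i in range(n + 1)))
    return values[k]


def pairings(k: int) -> int:
    """(2k-1)!!, the number of ways to split 2k objects into k pairs."""
    result = 1
    for odd in range(1, 2 * k, 2):
        result *= odd
    return result


for k in range(1, 7):
    print(k, catalan(k), pairings(k))
```

The Catalan numbers grow much more slowly than the pairing counts, consistent with only some of the matchings contributing in the limit.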
If the ensemble of matrices had a different symmetry, the start of the proof proceeds as above but the combinatorics changes. For example, looking at real symmetric Toeplitz matrices (matrices constant along diagonals) leads to very different combinatorics (and in fact a different density of states than the semi-circle); see [83]. If we looked at $d$-regular graphs, the combinatorics differs again, this time involving local trees; see [90].
A graph is d-regular if each vertex is connected to exactly d neighbors.
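The definition translates directly into a check on an adjacency structure. In the sketch below a graph is represented as a dictionary mapping each vertex to the set of its neighbors; this representation and the helper names are assumptions made for illustration.

```python
def is_d_regular(adjacency: dict, d: int) -> bool:
    """True if every vertex has exactly d neighbors."""
    return all(len(neighbors) == d for neighbors in adjacency.values())


def cycle_graph(n: int) -> dict:
    """Cycle on n >= 3 vertices: each vertex is joined to its two cyclic neighbors."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}


def complete_graph(n: int) -> dict:
    """Complete graph on n vertices: each vertex is joined to all others."""
    return {v: {w for w in range(n) if w != v} for v in range(n)}


print(is_d_regular(cycle_graph(5), 2))     # a cycle is 2-regular
print(is_d_regular(complete_graph(4), 3))  # K_4 is 3-regular
```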
The Fourier transform of $g$ is $\hat{g}(y) = \int_{-\infty}^\infty g(x) e^{-2\pi i x y}\,dx$.
This means for any non-negative integers $m$ and $n$ that $\sup_x |x^m g^{(n)}(x)| < \infty$ (i.e., $g$ and all its derivatives decay faster than the reciprocal of any polynomial).
We want (1)
is symmetric; (2)
in the hyperplane
; see [86
The earliest occurrence was probably in the work of Dirichlet, who used $L$-functions attached to characters on $(\mathbb{Z}/m\mathbb{Z})^*$ to study primes in arithmetic progressions. Another example is elliptic curve $L$-functions, which (at least conjecturally) give information about the Mordell-Weil group of rational solutions of the elliptic curve.
This is because the zeros are tending to infinity. Thus, given any zero and any finite box, only finitely many zeros can be associated to it such that the required differences lie in the box. Therefore, this zero has negligible contribution in the limit as we are dividing by N.
For example, the Birch and Swinnerton-Dyer conjecture states the order of vanishing at the central point of an elliptic curve L-function equals the rank of the Mordell-Weil group of rational solutions of the elliptic curve; this is quite important information which we do not wish to discard!
These classical compact groups are much more natural random matrix ensembles. In our original formulation, we chose the matrix elements randomly and independently from some probability distribution p. What should we take for p? The GOE and GUE ensembles, where the entries are chosen from Gaussians, arise by imposing invariance on under orthogonal (respectively unitary) transformations; these are natural conditions to impose as the probability of a transformation should be independent of the coordinate system used to write it down. The classical compact groups come endowed with a natural probability, namely Haar measure.
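The eigenvalues of a unitary matrix are of the form $e^{i\theta}$. To see this, let $v$ be an eigenvector of $U$ with eigenvalue $\lambda$. Note $v^*v = v^*U^*Uv = (Uv)^*(Uv) = |\lambda|^2 v^*v$, which gives $|\lambda|^2 = 1$, so $\lambda = e^{i\theta}$ for some real $\theta$. Thus, similar to real symmetric and complex Hermitian matrices, we can parametrize the eigenvalues by a real quantity.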
We use the normalization $\hat{f}(y) = \int_{-\infty}^\infty f(x) e^{-2\pi i x y}\,dx$. The Fourier transform has many nice properties on the Schwartz space (see [21] for example).
These are very strong conditions, and most choices of
will not satisfy these requirements. Fortunately there are many choices that do work, and these frequently encode information about arithmetically interesting problems. The most studied examples include Dirichlet $L$-functions, modular and Maass forms; see [99] for details.
For example, there are three flavors of orthogonal symmetry, and their 1-level densities are indistinguishable if the support of $\hat{\phi}$ is contained in $(-1, 1)$; however, the 2-level densities of all three are distinguishable for arbitrarily small support [100]. Another example is in obtaining better decay rates for high vanishing at the central point in a family [101].
This means we avoid the trivial character $\chi_0(n) = 1$ if $n$ is relatively prime to $m$ and 0 otherwise; this character gives rise to a simple modification of $\zeta(s)$.
We could consider the related family of quadratic Dirichlet characters coming from a fundamental discriminant with ; this family agrees with the scaling limit of symplectic matrices.
There are actually three flavors of orthogonal groups. If all the signs of the functional equation are even, the corresponding group should be SO(even), and similarly SO(odd) if the signs are all odd. We refer the reader to the previously mentioned surveys for the details.
In fact, the case being studied here has the best averaging formula! For cuspidal newforms we have the Petersson formula, and for families of elliptic curves we can use periodicity in evaluating Legendre sums of cubic polynomials. In general, however, unless our family is obtained in some manner from a family such as one of these, we do not possess the needed averaging formula.
is the set where multiplication and addition are modulo m; for example, if , and then , so , while . The hardest step in proving that this is a group under multiplication is finding inverses; one way to accomplish this is with the Euclidean algorithm.
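Finding inverses with the Euclidean algorithm can be sketched as follows. The extended algorithm produces integers $x, y$ with $ax + my = \gcd(a, m)$, so when the gcd is 1 the residue of $x$ modulo $m$ is the inverse of $a$. The function names and the modulus 12 in the example are illustrative choices.

```python
from math import gcd


def extended_gcd(a: int, b: int):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def inverse_mod(a: int, m: int) -> int:
    """Multiplicative inverse of a modulo m; requires gcd(a, m) == 1."""
    g, x, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError("a is not invertible modulo m")
    return x % m


def unit_group(m: int) -> list:
    """Residues in 1..m-1 that are relatively prime to m."""
    return [a for a in range(1, m) if gcd(a, m) == 1]


for a in unit_group(12):
    print(a, inverse_mod(a, 12))
```

Every residue relatively prime to the modulus receives an inverse this way, which is the step needed to see that these residues form a group under multiplication.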
These properties mean , and . Further, though initially only defined on , we extend the definition to all of by setting . As , or for all . Thus has to be an root of unity. We see each of these roots gives rise to a character (one of which is the principal or trivial character); by multiplicativity once we know the character’s action on the generator we know it everywhere.
This is not surprising, as the above argument is quite crude, where we have inserted absolute values and thus lost the expected cancelation due to the terms having opposite sign.
This just asserts that any ‘nice’ L-function has all of its zeros in the critical strip having real part equal to 1/2.
If =, then is related to a product over primes of . It is conjectured that these functions have functional equations, satisfy the Riemann hypothesis, et cetera. The existence of the Rankin-Selberg convolution is known for just a few choices of the ’s.
There are several difficulties with the proofs in general, ranging from not knowing properties of the Rankin-Selberg convolution in general to not having a good averaging formula over general families.
A special case of this theorem was discovered by Dueñez-Miller in [102] in studies of a family of $\mathrm{GL}(4)$ and a family of $\mathrm{GL}(6)$ $L$-functions. The analysis there led to a disproof of a folklore conjecture that the theory of low-lying zeros of $L$-functions is equivalent to a theory of the distribution of signs of the functional equations in the family; see [102] for details. The key ingredient in the proofs is the universality of the second moments of the Satake parameters; this is similar to the universality found by Rudnick and Sarnak [86] in the $n$-level correlations. The higher moments of the Satake parameters control the rate of convergence to the random matrix predictions.
Instead of attaching a symmetry constant, additionally one can attach a symmetry vector which incorporates other information, such as the rank of the family.
The search string was ’number theory and random matrix theory’.