Nuclei, Primes and the Random Matrix Connection

In this article, we discuss the remarkable connection between two very different fields, number theory and nuclear physics. We describe the essential aspects of these fields, the quantities studied, and how insights in one have been fruitfully applied in the other. The exciting branch of modern mathematics, random matrix theory, provides the connection between the two fields. We assume no detailed knowledge of number theory, nuclear physics, or random matrix theory; all that is required is some familiarity with linear algebra and probability theory, as well as some results from complex analysis. Our goal is to provide the inquisitive reader with a sound overview of the subjects, placing them in their historical context in a way that is not traditionally given in the popular and technical surveys.

In the early 1970's a remarkable connection was unexpectedly discovered between two very different fields, nuclear physics and number theory, when it was noticed that random matrix theory accurately modeled many problems in each. Random matrix theory was first used in the early 1900's in the study of the statistics of population characteristics [Wis]. The field developed rapidly in the 1950's when it was found to describe the spacing distributions of adjacent resonances (of the same spin and parity) observed in the interaction of low energy neutrons with nuclei [Wig5], and it flourished in the 1970's following a chance encounter between Hugh Montgomery and Freeman Dyson [Mon] (when they saw it also predicted answers to many of the most difficult problems in number theory).
In this review article we describe the subjects and the quantities studied, and how insights in one field have been fruitfully applied in the other. We assume no familiarity with either subject; for the most part, basic linear algebra and probability theory suffice (though we need some results from complex analysis on analytic continuation and contour integration for some of the number theory calculations). As there are many mathematical surveys of the subject, as well as some popular accounts [Ha, Roc] of how the connection between the fields was noticed, our goal is to explain the broad brushstrokes of the theory without getting bogged down in the technical details. For those interested in a more mathematical survey, we recommend [Con2, Con3, FSV, KaSa2, KeSn3] (see also Section 1.8 of [Meh2]). Our point is to give the flavor of the subject, and bring these amazing connections to the attention of a wide audience. We concentrate on a representative sample of results and problems, and urge the interested reader to sample the bibliography. In particular, though we discuss many of the current statistics studied, we discuss the computations in detail only for the classical problem of the density of normalized eigenvalues of real symmetric matrices and the 1-level density for the family of Dirichlet L-functions. We chose these examples as the key steps in the analysis of these problems are similar to many others, but the mathematical prerequisites to follow the calculations are significantly less.
In particular, our choice means that there are many important topics which will have only the briefest of mention (if any). To do the field justice would require a significantly longer article than this. Our hope is that by keeping the prerequisites modest a large audience will be able to appreciate the striking similarities between two very different fields, and get a sense as to the nature of the computations. There are a few places where real and complex analysis are used (Fourier transforms and the residue theorem), as well as some abstract algebra or group theory (mostly group homomorphisms from (Z/mZ)* to complex numbers of absolute value 1). We state all needed results, and when possible provide brief explanations and proofs. While in the last part of the paper we concentrate on Dirichlet L-functions, we do mention additional families of L-functions (the background material is more substantial here, and we give only the briefest mention of the needed facts).
The paper is organized as follows. In §2 we first give some number theory preliminaries to set the stage, describing some of the problems researchers are interested in, and how they are connected with the zeros of the Riemann zeta function. We mention the famous Riemann Hypothesis, and how its veracity is related to understanding the prime numbers. This provides the motivation for studying the behavior of these zeros. The amazing observation, first noticed in the 1970's, is that many properties of these zeros can be modeled by random matrix theory, which had enjoyed a remarkable success in modeling nuclear physics. We briefly describe random matrix theory and discuss why it is applicable to so many problems. In §3 we describe some of the history of nuclear physics, concentrating on the experimental results which laid the groundwork for the introduction of random matrix theory. We then sketch the proof of one of the most important results in the subject, Wigner's semi-circle law, in §4. While other results are more closely related to the number theory quantities we wish to study, we give this proof as it highlights in a very accessible manner the techniques needed to attack a variety of problems. We then return to number theory in §5 and discuss some (but by no means all!) of the earliest applications of random matrix theory. We concentrate on the 1-level density, highlighting the similarities between this calculation and the proof of Wigner's semi-circle law. We give an interpretation of our number theory results in the language of nuclear physics, and then conclude with a very brief summary of some of the current avenues being explored.
For the reader: The core of the paper is Sections 2 and 3, where we describe the number theory problems and the nuclear physics history which led to the development of random matrix theory, as well as briefly summarizing random matrix theory. Sections 4 and 5 are more advanced (especially the latter), where we give details of the calculations. Many of the more technical comments and some proofs of claims are relegated to the footnotes; these may safely be skipped by the reader interested in the broad brushstrokes of the theory and subject. For the benefit of the reader, we have also included in the footnotes definitions and explanations of much of the assumed background material to help keep the paper accessible.
2. Introduction
2.1. Number Theory Preliminaries. The primes¹ are the building blocks of number theory: every positive integer can be written uniquely as a product of prime powers [HW].² One of the most important questions we can ask about primes is also one of the most basic: how many primes are there at most x? In other words, how many building blocks are there up to a given point?
Euclid proved over 2000 years ago that there are infinitely many primes; so, if we let π(x) denote the number of primes at most x, we know $\lim_{x\to\infty} \pi(x) = \infty$. Euclid's proof is still used in courses around the world.³ Can we do better? How rapidly does π(x) go to infinity?
In particular, what can we say about $\lim_{x\to\infty} \pi(x)/x$, where π(x)/x represents the probability that a number at most x is prime?
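The question just posed is easy to explore numerically. The sketch below is an illustration we add, not a computation from the literature: it counts primes with a sieve of Eratosthenes and prints π(x), the proportion π(x)/x, and the two classical approximations x/log x and Li(x), the integral of dt/log t from 2 to x. The function names are our own, and Li(x) is computed with Simpson's rule after the substitution t = e^u.

```python
import bisect
import math


def prime_sieve(limit):
    """Return the sorted list of primes <= limit (sieve of Eratosthenes)."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            # Cross out n*n, n*n + n, ...; smaller multiples are already gone.
            is_prime[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n in range(2, limit + 1) if is_prime[n]]


def prime_count(x, primes):
    """pi(x): the number of primes <= x, read off from a sorted prime list."""
    return bisect.bisect_right(primes, x)


def log_integral(x, steps=20_000):
    """Li(x) = integral from 2 to x of dt / log t.

    Substituting t = e^u gives the integral of e^u / u over [log 2, log x],
    which we approximate with composite Simpson's rule (steps must be even).
    """
    a, b = math.log(2), math.log(x)
    h = (b - a) / steps
    total = math.exp(a) / a + math.exp(b) / b
    for i in range(1, steps):
        u = a + i * h
        total += (4 if i % 2 else 2) * math.exp(u) / u
    return total * h / 3


primes = prime_sieve(10**6)
for exponent in range(2, 7):
    x = 10**exponent
    count = prime_count(x, primes)
    # Columns: x, pi(x), pi(x)/x, x/log x, Li(x)
    print(x, count, round(count / x, 5),
          round(x / math.log(x), 1), round(log_integral(x), 1))
```

The proportion π(x)/x visibly decreases as x grows, and in this range Li(x) tracks π(x) much more closely than x/log x does.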
The answer is given by the Prime Number Theorem, which states that the number of primes at most x is Li(x) + o(Li(x)), where $\mathrm{Li}(x) = \int_2^x dt/\log t$, and for x large Li(x) is approximately x/log x.⁴ In particular, π(x)/x tends to zero, but only as slowly as 1/log x. While it is possible to prove the prime number theorem elementarily [Erd, Sel2], the most informative proofs use complex numbers⁵ and complex analysis, and lead to the fascinating connection between number theory and nuclear physics. One of the most fruitful approaches to understanding the primes is to understand properties of the Riemann zeta function, ζ(s), which is defined for ℜ(s) > 1 by
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}; \tag{2.1} \]
the series converges for ℜ(s) > 1 by the integral test.⁶ By unique factorization, it turns out that we may also write ζ(s) as a product over primes;⁷ this is called the Euler product of ζ(s), and is one of its most important properties:⁸
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p\ \mathrm{prime}} \left(1 - \frac{1}{p^s}\right)^{-1}. \tag{2.2} \]
Initially defined only for ℜ(s) > 1, using complex analysis the Riemann zeta function can be meromorphically continued⁹ to all of C, having only a simple pole with residue 1 at s = 1. It satisfies the functional equation¹⁰
\[ \xi(s) = \tfrac{1}{2}\, s(s-1)\, \Gamma\!\left(\tfrac{s}{2}\right) \pi^{-s/2} \zeta(s) = \xi(1-s). \tag{2.3} \]
The distribution of the primes is a difficult problem; however, the distribution of the positive integers is not, and has been completely known for quite some time! The hope is that we can understand $\sum_n 1/n^s$, as this involves sums over the integers, and somehow pass this knowledge on to the primes through the Euler product (see Footnote 8 for two examples). Riemann [Ri] (see [Ed] for an English translation) observed a fascinating connection between the zeros of ζ(s) and the error term in the prime number theorem. As this relation is the starting point for our story, we describe the details at some length in the next paragraph. This part is a bit more technical and relies on complex analysis. The reader may safely skip most of the next paragraph; the key piece for the rest of the paper is (2.8), where we show how the primes are connected to the zeros of ζ(s) (the function Λ(n) which appears is defined in (2.4)).
3 A careful analysis of Euclid's argument proves there is a C > 0 such that π(x) ≥ C log log x. The first few primes generated in this manner (at each step adjoining the smallest prime factor of one more than the product of the primes found so far) are 2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801. A fascinating question is whether or not every prime is eventually listed (see [Sl]).
5 For a complex number z = x + iy with x and y real, we call x the real part and y the imaginary part; we frequently denote these by ℜ(z) and ℑ(z), respectively.
6 Write σ = ℜ(s), so that $|n^{-s}| = n^{-\sigma}$. The integral test from calculus states that $\sum_n n^{-\sigma}$ converges if and only if $\int_1^\infty x^{-\sigma}\,dx$ converges, and this integral converges if σ > 1.
7 To see this, use the geometric series formula (see Footnote 9) to expand $(1 - p^{-s})^{-1}$ as $\sum_{k=0}^{\infty} p^{-ks}$ and note that $n^{-s}$ occurs exactly once on each side (and clearly every term from expanding the product is of the form $n^{-s}$ for some n).
8 We give two quick proofs of the importance of the Euler product by showing how it implies there are infinitely many primes. The first is that $\sum_n 1/n^s \to \infty$ as s → 1+, which means there must be infinitely many primes as otherwise the product is finite. The second proof is to note that $\sum_n 1/n^2 = \pi^2/6$ is irrational; if there were only finitely many primes then the product would be rational. See for example [MT-B] for details.
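The agreement between the series and the Euler product for ζ(s) can be checked numerically. The following minimal sketch is our own illustration, with s = 2 and both cutoffs chosen arbitrarily: it compares a partial sum over the integers, a partial product over the primes, and the known value ζ(2) = π²/6.

```python
import math


def primes_up_to(limit):
    """Sorted list of primes <= limit (simple sieve of Eratosthenes)."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def zeta_by_sum(s, num_terms):
    """Partial sum of the series: sum of 1/n^s for 1 <= n <= num_terms."""
    return sum(1.0 / n**s for n in range(1, num_terms + 1))


def zeta_by_euler_product(s, prime_limit):
    """Partial Euler product: product of 1/(1 - p^(-s)) over primes p <= prime_limit."""
    product = 1.0
    for p in primes_up_to(prime_limit):
        product /= 1.0 - p ** (-s)
    return product


exact = math.pi**2 / 6
print("series :", zeta_by_sum(2, 100_000))
print("product:", zeta_by_euler_product(2, 100_000))
print("pi^2/6 :", exact)
```

Both truncations approach π²/6 from below, since every omitted term of the series is positive and every omitted factor of the product exceeds 1.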
One of the most natural things to do to a complex function is to take contour integrals of its logarithmic derivative [La, SS]; this will yield information about zeros and poles (we'll see later that we can get even more information if we weight the integral with a test function). There are two expressions for ζ(s); however, for the logarithmic derivative it is clear that we should use the Euler product over the sum expansion, as the logarithm of a product is the sum of the logarithms. Let
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^r \text{ for some prime } p \text{ and integer } r \ge 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2.4} \]
Then
\[ \frac{\zeta'(s)}{\zeta(s)} = -\sum_{p} \frac{\log p \cdot p^{-s}}{1 - p^{-s}} = -\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} \tag{2.5} \]
(this is proved by using the geometric series formula to write $(1 - p^{-s})^{-1}$ as $\sum_{k=0}^{\infty} 1/p^{ks}$, collecting terms and then using the definition of Λ(n)). Moving the negative sign over and multiplying by $x^s/s$, we find
\[ \frac{1}{2\pi i} \int_{(c)} -\frac{\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\, ds = \sum_{n=1}^{\infty} \Lambda(n) \cdot \frac{1}{2\pi i} \int_{(c)} \left(\frac{x}{n}\right)^{s} \frac{ds}{s}, \tag{2.6} \]
where we are integrating over some line ℜ(s) = c > 1. The integral on the right hand side (with its factor of 1/2πi) is 1 if n < x and 0 if n > x (by choosing x non-integral, we don't need to worry about x = n), and thus the right hand side is $\sum_{n \le x} \Lambda(n)$. By shifting contours and keeping track of the poles and zeros of ζ(s), the residue theorem¹¹ [La, SS] implies that the left hand side is
\[ x - \sum_{\rho:\ \zeta(\rho) = 0} \frac{x^{\rho}}{\rho} - \log 2\pi; \tag{2.7} \]
the x term comes from the pole of ζ(s) at s = 1 (remember we count poles with a minus sign), while the $x^\rho/\rho$ terms arise from zeros; in both cases we must multiply by the residue, which is $x^\rho/\rho$. The function ζ(s) has neither a zero nor a pole at s = 0, but the factor 1/s has a simple pole there, which contributes the constant $-\zeta'(0)/\zeta(0) = -\log 2\pi$.¹² The Riemann zeta function vanishes whenever ρ is a negative even integer; we call these the trivial zeros. These terms contribute $\sum_{k=1}^{\infty} x^{-2k}/(2k) = -\tfrac{1}{2}\log(1 - x^{-2})$. This leads to the following beautiful formula:
\[ x - \sum_{\substack{\rho:\ \zeta(\rho) = 0 \\ 0 \le \Re(\rho) \le 1}} \frac{x^{\rho}}{\rho} - \log 2\pi - \frac{1}{2}\log\left(1 - x^{-2}\right) = \sum_{n \le x} \Lambda(n). \tag{2.8} \]
If we write n as $p^r$, the contribution from all $p^r$ pieces with r ≥ 2 is bounded by $2x^{1/2}\log x$ for x large,¹³ thus we really have a formula for the sum of the primes at most x, with the prime p weighted by log p. Through partial summation, knowing the weighted sum is equivalent to knowing the unweighted sum.¹⁴
We can now see the connection between the zeros of the Riemann zeta function and counting primes at most x. The contribution from the trivial zeros is well-understood, and is just $-\tfrac{1}{2}\log(1 - x^{-2})$. The remaining zeros, whose real parts are in [0, 1], are called the non-trivial or critical zeros. They are far more important and more mysterious. The smaller the real part of these zeros of ζ(s), the smaller the error. Due to the functional equation, however, if ζ(ρ) = 0 for a critical zero ρ then ζ(1 − ρ) = 0 as well.¹⁵ Thus the 'smallest' the largest real part can be is 1/2. The assertion that every critical zero has real part exactly 1/2 is the celebrated Riemann Hypothesis.¹⁶ It has a plethora of applications throughout number theory and mathematics; counting primes is but one of many.¹⁷ It is clear, however, that the distribution of the zeros of the Riemann zeta function will be of primary (in both senses of the word!) importance.
If we assume the Riemann Hypothesis, all the zeros in the critical strip (0 ≤ ℜ(ρ) ≤ 1) lie on the critical line ℜ(s) = 1/2, and it makes sense to talk about the distribution of spacings between adjacent zeros. The purpose of this note is to discuss one of the most powerful models used to predict the behavior of these zeros, namely random matrix theory. While other methods have since been developed, random matrix theory (which we describe in the next subsection) was the first to make truly accurate, testable predictions. The general idea is that the behavior of zeros of the Riemann zeta function is well modeled by the behavior of eigenvalues of certain matrices. This idea had previously been successfully used to model the distribution of energy levels of heavy nuclei (some of the fundamental papers and books on the subject, ranging from experiments to theory, include [BFFMPW, DLL, Dy1, Dy2, FLM, FRG, FKPT, Gau, HH, HPB, Hu, Meh1, Meh2, MG, Po, Wig1, Wig2, Wig3, Wig4, Wig5, Wig6]). We describe the development of random matrix theory in nuclear physics in detail in the next section, and then delve into more of the details of the connection between the two subjects in §4 and §5.
9 The subject of meromorphic continuation belongs to complex analysis. For the benefit of the reader who hasn't seen this, we give a brief example that will be of use throughout this paper, namely the geometric series formula $1 + r + r^2 + r^3 + \cdots = 1/(1 - r)$. Note that while the sum makes sense only when |r| < 1, 1/(1 − r) is well-defined for all r ≠ 1 and agrees with the sum whenever |r| < 1. We say 1/(1 − r) is a meromorphic continuation of the sum.
10 One proof is to use the Gamma function, $\Gamma(s) = \int_0^\infty e^{-t} t^{s-1}\,dt$. A simple change of variables gives $\int_0^\infty x^{\frac{s}{2}-1} e^{-n^2\pi x}\,dx = \Gamma(s/2)/(n^s \pi^{s/2})$. Summing over n represents a multiple of ζ(s) as an integral. After some algebra we find $\Gamma(s/2)\,\zeta(s)/\pi^{s/2} = \int_1^\infty x^{\frac{s}{2}-1}\omega(x)\,dx + \int_1^\infty x^{-\frac{s}{2}-1}\omega(1/x)\,dx$, with $\omega(x) = \sum_{n=1}^{+\infty} e^{-n^2\pi x}$. Using Poisson summation, we find $\omega(1/x) = -\tfrac{1}{2} + \tfrac{1}{2}x^{1/2} + x^{1/2}\omega(x)$, which yields $\pi^{-s/2}\,\Gamma(s/2)\,\zeta(s) = \frac{1}{s(s-1)} + \int_1^\infty \left(x^{-\frac{s}{2}-\frac{1}{2}} + x^{\frac{s}{2}-1}\right)\omega(x)\,dx$. The right hand side is unchanged when s is replaced by 1 − s, which gives the functional equation.
11 Let f be a meromorphic function with only finitely many poles on an open set U which is bounded by a 'nice' curve γ. Thus at each point $z_0 \in U$ we have $f(z) = \sum_{n=N}^{\infty} a_n (z - z_0)^n$ with N > −∞ and $a_N \ne 0$. If N > 0 we say f has a zero of order N. If N < 0 we say f has a pole of order −N, and in this case we call $a_{-1}$ the residue of f at $z_0$ (for clarity, we often denote this by Res(f, $z_0$)). If f does not have a pole at $z_0$, then the residue is zero. Our assumption implies that there are only finitely many points where the residue is non-zero. The residue theorem states $\frac{1}{2\pi i}\oint_\gamma f(z)\,dz = \sum_{z \in U} \mathrm{Res}(f, z)$. One useful variant is to apply this to f′(z)/f(z), which then counts the number of zeros of f minus the number of poles; another is to look at f′(z)/f(z) · g(z) where g(z) is analytic, which we will do later when stating the explicit formula for the zeros of the Riemann zeta function.
12 Some care is required with this sum, as $\sum_\rho 1/|\rho|$ diverges. The solution involves pairing the contribution from ρ with $\bar\rho$; see for example [Da].
13 To see this, note $\sum_{p^2 \le x} \log p \le x^{1/2}\log x$, while the contribution from $n = p^r$ with r ≥ 3 is bounded by $\sum_{r \ge 3} \sum_{p^r \le x} \log p \le x^{1/3} \cdot 2\log^2 x$ (this is because $p^r \le x$ implies $r \le \log_2 x \le 2\log x$).
14 Partial summation is the discrete analogue of integration by parts [MT-B]. In our case, $\sum_{p \le x} \log p \sim x$ is equivalent to $\sum_{p \le x} 1 \sim x/\log x$.
15 Note this is only true for zeros in the critical strip, namely 0 ≤ ℜ(ρ) ≤ 1; for zeros outside the critical strip we can and do have zeros of ζ(s) not corresponding to zeros of ζ(1 − s) because of poles of the Gamma function.
16 The Riemann Hypothesis is probably the most important mathematical aside ever in a paper. Riemann [Ed, Ri] wrote (translated into English; note when he talks about the roots being real, he's writing the roots as 1/2 + iγ, and thus γ ∈ R is the Riemann Hypothesis): "...and it is very probable that all roots are real. Certainly one would wish for a stricter proof here; I have meanwhile temporarily put aside the search for this after some fleeting futile attempts, as it appears unnecessary for the next objective of my investigation." Though not mentioned in the paper, Riemann had developed a terrific formula for computing the zeros of ζ(s), and had checked (but never reported!) that the first few were on the critical line ℜ(s) = 1/2. His numerical computations were only discovered decades later when Siegel was looking through Riemann's papers.
17 The prime number theorem is in fact equivalent to the statement that ℜ(ρ) < 1 for any zero of ζ(s). The prime number theorem was first proved independently by Hadamard [Had] and de la Vallée Poussin [dlVP] in 1896. Each proof crucially used results from complex analysis, which is hardly surprising given that Riemann had shown π(x) is related to the zeros of the meromorphic function ζ(s). It was not until about 50 years later that Erdős [Erd] and Selberg [Sel2] obtained elementary proofs of the prime number theorem (in other words, proofs that did not use complex analysis, which was quite surprising as the prime number theorem was known to be equivalent to a statement about zeros of a meromorphic function). See [Gol2] for some commentary on the history of elementary proofs.

2.2. Random Matrix Theory Preliminaries. Before describing what we mean by random matrix theory and random matrix ensembles (i.e., sets of matrices), we quickly review the needed analysis and probability material, and then in the next subsection discuss why random matrix theory is so successful at modeling a variety of problems.
Let p(x) be a continuous or discrete probability distribution. For notational convenience we assume p is continuous and use integral notation below, though similar statements hold in the discrete case. This means p(x) ≥ 0, $\int_{-\infty}^{\infty} p(x)\,dx = 1$, and if X is a random variable with density p, then the probability X takes on a value in [a, b] is just $\int_a^b p(x)\,dx$.
Let X be a random variable with density p. We define the k-th moment of p, denoted $\mu_k$ or $E[X^k]$, by
\[ \mu_k = E[X^k] = \int_{-\infty}^{\infty} x^k p(x)\,dx. \tag{2.9} \]
The zeroth moment is always 1, and the first moment is called the mean.
The second moment is related to the variance. Recall the variance σ² is defined by
\[ \sigma^2 = \int_{-\infty}^{\infty} (x - \mu_1)^2\, p(x)\,dx = \mu_2 - \mu_1^2, \]
and equals the second moment if the mean is zero. For convergence issues, we typically are interested in random variables with zero mean, variance 1 and finite higher moments.¹⁸ While at first it might seem restrictive to assume we have mean 0 and variance 1, this is actually equivalent to assuming that the first and second moments are finite.¹⁹ The moments are extremely important for understanding a density. While it is not the case that the moments uniquely determine a probability distribution,²⁰ they do for sufficiently nice distributions. The situation is similar to the theory of Taylor series. It is sadly not the case that every 'nice' function agrees with its Taylor series in an arbitrarily small neighborhood about the point of expansion, even if by 'nice' we mean infinitely differentiable!²¹ See [ShTa, Si] for more details.
18 By this we mean $\int_{-\infty}^{\infty} |x|^k p(x)\,dx < \infty$. If the integral is not absolutely convergent, then the value could depend on how we tend to infinity. The standard example is the Cauchy distribution, $p(x) = (\pi(1 + x^2))^{-1}$. Note $\lim_{A\to\infty} \int_{-A}^{A} x\,p(x)\,dx = 0$, while $\lim_{A\to\infty} \int_{-A}^{2A} x\,p(x)\,dx = (\log 2)/\pi$.
19 The reason is we can always adjust our probability distribution, in this case, to have mean 0 and variance 1 by simple translations and rescaling. For example, if the density p has mean µ and variance σ² > 0, then g(x) = σ p(σx + µ) has mean 0 and variance 1. Thus the third moment (or the fourth if the third vanishes) is the first moment that truly shows the 'shape' of the density.
20 The standard example is the pair of densities $f_1(x) = (2\pi x^2)^{-1/2} \exp(-(\log x)^2/2)$ and $f_2(x) = f_1(x)\,[1 + \sin(2\pi \log x)]$ on x > 0. For r ∈ N, the r-th moment of $f_1$ and $f_2$ is $\exp(r^2/2)$. The reason for the non-uniqueness of moments is that the moment generating function $M_f(t) = \int_{-\infty}^{\infty} \exp(tx) f(x)\,dx$ does not converge in a neighborhood of the origin. See [CaBe], Chapter 2.
21 The standard example is the function $f(x) = \exp(-1/x^2)$ if |x| > 0 and 0 otherwise. By using the definition of the derivative and L'Hôpital's rule, we see $f^{(n)}(0) = 0$ for all n, but clearly this function is non-zero if |x| > 0. Thus the Taylor series about 0 is identically zero and agrees with f only at the origin! This example illustrates how much harder real analysis can be than complex analysis. There, if a function of a complex variable has even one derivative then it has infinitely many, and is given by its Taylor series in some neighborhood of the point.
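The claim that moments need not determine a density can be checked numerically on the standard log-normal example, which we take from [CaBe] (Chapter 2) as an assumption of this sketch: f1(x) = exp(−(log x)²/2)/(x√(2π)) and f2(x) = f1(x)[1 + sin(2π log x)] on x > 0. The integration routine below is our own; it substitutes x = e^u and sums over a fine grid, which is very accurate here because the integrand is a smooth, rapidly decaying bump.

```python
import math


def f1(x):
    """Log-normal density on x > 0."""
    return math.exp(-math.log(x) ** 2 / 2) / (x * math.sqrt(2 * math.pi))


def f2(x):
    """A different density on x > 0 that shares all integer moments with f1."""
    return f1(x) * (1 + math.sin(2 * math.pi * math.log(x)))


def moment(density, r, half_width=12.0, steps=24_000):
    """r-th moment: the integral of x^r * density(x) over x > 0.

    With x = e^u this is the integral of e^{(r+1)u} * density(e^u) du over
    all real u; the integrand is concentrated near u = r, so we sum over the
    window [r - half_width, r + half_width] on an evenly spaced grid.
    """
    start = r - half_width
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = math.exp(start + i * h)
        total += x ** (r + 1) * density(x)
    return total * h


for r in range(5):
    print(r, moment(f1, r), moment(f2, r), math.exp(r * r / 2))

# The densities themselves differ: at x = e^{1/4} the sine factor equals 1.
x0 = math.exp(0.25)
print(f1(x0), f2(x0))
```

For each r the two computed moments agree with exp(r²/2), even though f2 is twice as large as f1 at x = e^{1/4}.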
We can now describe random matrix theory and the ensembles we'll study. Consider a real symmetric matrix A, so
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1N} \\ a_{12} & a_{22} & a_{23} & \cdots & a_{2N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{1N} & a_{2N} & a_{3N} & \cdots & a_{NN} \end{pmatrix} = A^T, \qquad a_{ij} = a_{ji}. \]
We fix a density p, and define
\[ \mathrm{Prob}(A) = \prod_{1 \le i \le j \le N} p(a_{ij}). \]
This means
\[ \mathrm{Prob}\left(A : a_{ij} \in [\alpha_{ij}, \beta_{ij}]\right) = \prod_{1 \le i \le j \le N} \int_{\alpha_{ij}}^{\beta_{ij}} p(x_{ij})\,dx_{ij}. \]
The goal is to understand properties of the eigenvalues of A. We accomplish this by studying a related measure where we place point masses at the normalized eigenvalues. We use the Dirac delta functional $\delta(x - x_0)$, which is a unit point mass at $x_0$.²² This means $\int f(x)\,\delta(x - x_0)\,dx = f(x_0)$. To each real symmetric matrix A, we attach a probability measure²³
\[ \mu_{A,N}(x) = \frac{1}{N} \sum_{i=1}^{N} \delta\!\left(x - \frac{\lambda_i(A)}{2\sqrt{N}}\right); \tag{2.11} \]
in §4.1 we'll see why we are normalizing the eigenvalues as we have done here. This measure counts the fraction of normalized eigenvalues in an interval:
\[ \int_a^b \mu_{A,N}(x)\,dx = \frac{\#\left\{\lambda_i : \lambda_i(A)/(2\sqrt{N}) \in [a, b]\right\}}{N}. \tag{2.12} \]
Using the definition of the Dirac delta functional, the k-th moment, which we denote $M_{A,N}(k)$, is readily computed:
\[ M_{A,N}(k) = \int_{-\infty}^{\infty} x^k \mu_{A,N}(x)\,dx = \frac{\sum_{i=1}^{N} \lambda_i(A)^k}{2^k N^{\frac{k}{2}+1}}. \tag{2.13} \]
While this is a nice, explicit formula for the k-th moment, it seems useless as we do not know the location of the eigenvalues of A; we will see in §4 that this is not the case at all. There are many other ensembles of matrices worth studying. In addition to real symmetric matrices, complex Hermitian and symplectic are frequently studied.²⁴ In this paper we concentrate on real symmetric matrices.
22 We can consider the probability densities $A_N(x)\,dx$, where A is a probability density and $A_N(x) = N \cdot A(Nx)$. As N → ∞ almost all the probability (mass) is concentrated in a narrower and narrower band about the origin; we let δ(x) be the limit with all the mass at one point. It is a discrete (as opposed to continuous) probability measure, with infinite density but finite mass. Note that $\delta(x - x_0)$ acts like a unit point mass; however, instead of having its mass concentrated at the origin, it is now concentrated at $x_0$.
23 As A is real symmetric, the eigenvalues are real (see Footnote 31) and thus this measure is well defined.
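As a preview of why the moment formula is usable without locating any eigenvalue, the sketch below relies on the standard linear algebra fact that the sum of the k-th powers of the eigenvalues of A equals Tr(A^k). Several choices here are assumptions of our illustration rather than part of the text above: the entries are standard normals, the eigenvalues are normalized by 2√N (so that M(k) = Tr(A^k)/(2^k N^{k/2+1})), and the comparison values 1/4 and 1/8 are the second and fourth moments of the semi-circle density (2/π)√(1 − x²) on [−1, 1].

```python
import random


def random_symmetric_matrix(n, rng):
    """N x N real symmetric matrix; the entries on and above the diagonal are
    independent standard normals (mean 0, variance 1)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0)
    return a


def normalized_moments(a):
    """Return (M(2), M(4)), where M(k) = Tr(A^k) / (2^k N^(k/2 + 1))."""
    n = len(a)
    trace2 = sum(x * x for row in a for x in row)  # Tr(A^2) = sum of a_ij^2
    trace4 = 0.0
    for i in range(n):
        for j in range(n):
            # (A^2)_ij is the dot product of rows i and j, as A is symmetric.
            entry = sum(x * y for x, y in zip(a[i], a[j]))
            trace4 += entry * entry  # Tr(A^4) = sum of (A^2)_ij^2
    return trace2 / (4 * n**2), trace4 / (16 * n**3)


m2, m4 = normalized_moments(random_symmetric_matrix(100, random.Random(1)))
print("M(2) =", m2, " semi-circle value: 0.25")
print("M(4) =", m4, " semi-circle value: 0.125")
```

No eigenvalue is ever computed: the moments are polynomials in the matrix entries, which is what makes averaging over the ensemble tractable.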
Random matrix theory models the behavior of a system by an appropriate set of matrices. Specifically, we calculate some quantity (say the probability two normalized eigenvalues are less than half the average spacing apart) for each matrix and then average over all matrices in our family. The hope, which is borne out in many cases (ranging from number theory to nuclear physics to bus routes in Cuernavaca, Mexico²⁵), is that these system averages are close to the behavior of the system of interest. We describe this correspondence in greater detail below.
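The statistic just mentioned can be computed by hand in the smallest case. The sketch below is our own illustration under stated assumptions: we use 2 × 2 real symmetric matrices with Gaussian entries (variance 1 on the diagonal, 1/2 off the diagonal), for which the two eigenvalues are available in closed form, and we compare against the Wigner surmise density (π/2) s exp(−πs²/4) for the normalized spacing s. For contrast we also print the value for independent (Poisson) levels, whose spacing density is exp(−s).

```python
import math
import random


def normalized_spacings(num_matrices, rng):
    """Eigenvalue gaps of random 2 x 2 real symmetric matrices [[a, b], [b, c]],
    rescaled so that the average gap is 1."""
    gaps = []
    for _ in range(num_matrices):
        a = rng.gauss(0.0, 1.0)
        c = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, math.sqrt(0.5))
        # The eigenvalues are (a + c)/2 +- sqrt((a - c)^2 + 4 b^2)/2.
        gaps.append(math.sqrt((a - c) ** 2 + 4.0 * b * b))
    mean_gap = sum(gaps) / len(gaps)
    return [gap / mean_gap for gap in gaps]


def fraction_closer_than(spacings, threshold):
    """Fraction of matrices whose normalized spacing is below the threshold."""
    return sum(1 for s in spacings if s < threshold) / len(spacings)


spacings = normalized_spacings(100_000, random.Random(0))
observed = fraction_closer_than(spacings, 0.5)
# Integral of (pi/2) s exp(-pi s^2 / 4) over [0, 1/2]:
surmise = 1.0 - math.exp(-math.pi / 16.0)
# Integral of exp(-s) over [0, 1/2], the value for independent levels:
independent = 1.0 - math.exp(-0.5)
print(observed, surmise, independent)
```

The eigenvalues repel: close pairs are less than half as likely as they would be for independent levels (about 0.18 against 0.39).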
24 These ensembles have behavior that is often described by a parameter β, which is 1 for real symmetric, 2 for complex Hermitian and 4 for symplectic matrices.
25 See [BBDS, KrSe]. We quote from [BBDS] who quote from [KrSe]: "There is no covering company responsible for organizing the city transport. Consequently, constraints such as a time table that represents external influence on the transport do not exist. Moreover, each bus is the property of the driver. The drivers try to maximize their income and hence the number of passengers they transport. This leads to competition among the drivers and to their mutual interaction. It is known that without additive interaction the probability distribution of the distances between subsequent buses is close to the Poissonian distribution and can be described by the standard bus route model.... A Poisson-like distribution implies, however, that the probability of close encounters of two buses is high (bus clustering) which is in conflict with the effort of the drivers to maximize the number of transported passengers and accordingly to maximize the distance to the preceding bus. In order to avoid the unpleasant clustering effect the bus drivers in Cuernavaca engage people who record the arrival times of buses at significant places. Arriving at a checkpoint, the driver receives the information of when the previous bus passed that place. Knowing the time interval to the preceding bus the driver tries to optimize the distance to it by either slowing down or speeding up." The papers go on to show the behavior is well modeled by random matrix theory (specifically, ensembles of complex Hermitian matrices)!

2.3. Why Random Matrix Theory. Why do random matrix models have a chance of giving useful answers to questions in nuclear physics and other subjects? We consider one of the central problems of classical mechanics, namely the orbits in a solar system. It is possible to write down a closed form solution in the special case when there are just two point masses interacting through gravity.²⁶ The three body problem, however, defies closed form solutions.²⁷ From physical grounds we know of course that there is a solution; however, for our solar system we cannot analyze the solution well enough to determine whether or not billions of years from now Pluto will escape from the sun's influence!²⁸ As difficult as the above problem is, the situation is significantly worse when we try to understand the behavior of heavy nuclei. Uranium, for instance, has over 200 protons and neutrons in its nucleus, each subject to and contributing to complex forces. If we completely understood the theory of the nucleus, we could predict the energy levels; sadly, we are far from a complete understanding! As we'll see in the next section, physicists were able to gain some insights into the nuclear structure by shooting neutrons into the nucleus and analyzing the results; however, a complete understanding of the nucleus was, and still is, lacking.

Figure 1. Molecules in a box
How should we attack such a problem? It's useful to recall other complex problems from physics and how they were successfully modeled. We consider a standard problem in statistical mechanics, namely calculating the pressure on a wall. Consider the box in Figure 1. For simplicity we assume that every molecule is moving either left or right, and all are traveling at the same speed. If we want to calculate the pressure on the left wall, we need to know how many particles strike the wall in an infinitesimal time. Thus we need to know how many particles are close to the left wall and moving towards it. In a room there would be at least a mole (about 6.022 · 10²³) of air molecules, which means that this computation is well beyond our abilities. Without going into all of the physics (see for example [Re]), we can get a rough idea of what is happening. The complexity, the enormous number of configurations of positions of the molecules, actually helps us. For each configuration we can calculate the pressure due to that configuration. We then average over all configurations, and it turns out that a generic configuration is close to the system average. This theory has enjoyed great success, and suggests a way to model nuclear physics.
26 Explicitly, given two points with masses m₁ and m₂ and initial velocities v₁ and v₂ and located at r₁ and r₂, we can describe how the system evolves in time given that gravity is the only force in play.
27 While there are known solutions for special arrangements of special masses, three bodies in general position is still open; see [Wh] for more details.
28 Whether or not Pluto will regain planetary status is an entirely different question.
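The statement that a generic configuration is close to the system average can be illustrated with a toy version of the box. The model below is a deliberately crude sketch of our own, not a physical simulation: each molecule has a position chosen uniformly in [0, 1) and moves left or right with equal probability, and a molecule strikes the left wall during the time window if it is moving left and lies within a distance `reach` of the wall (a hypothetical parameter standing for speed times elapsed time).

```python
import random


def wall_hits(num_molecules, reach, rng):
    """Number of molecules in one random configuration that strike the left
    wall: those moving left whose position is within `reach` of the wall."""
    hits = 0
    for _ in range(num_molecules):
        position = rng.random()
        moving_left = rng.random() < 0.5
        if moving_left and position < reach:
            hits += 1
    return hits


def worst_relative_deviation(num_molecules, reach, num_configurations, rng):
    """Largest relative gap, over several random configurations, between the
    number of hits in one configuration and the average over all configurations."""
    average = num_molecules * reach / 2
    counts = [wall_hits(num_molecules, reach, rng)
              for _ in range(num_configurations)]
    return max(abs(count - average) for count in counts) / average


rng = random.Random(2024)
for num_molecules in (10**3, 10**4, 10**5):
    print(num_molecules, worst_relative_deviation(num_molecules, 0.1, 10, rng))
```

The relative fluctuation shrinks roughly like one over the square root of the number of molecules, so with a mole of molecules an individual configuration is indistinguishable from the average for all practical purposes.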
Returning to our problem about heavy nuclei, from quantum mechanics we have the following equation governing our problem:
\[ H\Psi_n = E_n \Psi_n, \tag{2.14} \]
where H is the Hamiltonian, whose entries depend on the system, $E_n$ are the energy levels and $\Psi_n$ are the energy eigenfunctions. Thus we have 'reduced' nuclear physics to linear algebra. Unfortunately, there are two difficulties with this approach. The first is that H is an infinite dimensional matrix, and the second is that we do not know any of the entries! This makes for quite a daunting task! Wigner's great insight was that this enormous complexity is similar to what we saw in Statistical Mechanics, and actually helps us. The interactions are so complex we might as well regard each entry as some randomly chosen number. Thus instead of considering the true H for the system, we consider N × N real symmetric matrices with entries independently chosen from nice probability distributions. We compute whatever statistics we are interested in for these matrices, average over all matrices, and then take the N → ∞ scaling limit. The main result is that the behavior of the eigenvalues of an arbitrary matrix is often well approximated by the behavior obtained by averaging over all matrices, and this is a good model for many systems, ranging from the energy levels of heavy nuclei to the zeros of the Riemann zeta function.²⁹
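The assertion that an individual matrix behaves like the ensemble average can be tested on one simple statistic. In the sketch below, which is our own illustration, the statistic is the normalized fourth moment Tr(A⁴)/(2⁴N³) of the eigenvalues, with the eigenvalues scaled by 2√N (an assumption of this example). We draw a few matrices from two quite different entry distributions with mean 0 and variance 1 (random signs, and uniform on [−√3, √3]) and compare each with 1/8, the fourth moment of the semi-circle density on [−1, 1].

```python
import math
import random


def random_symmetric(n, draw_entry):
    """N x N real symmetric matrix with independent entries on and above
    the diagonal, each produced by calling draw_entry()."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = draw_entry()
    return a


def fourth_moment(a):
    """Normalized fourth moment Tr(A^4) / (2^4 N^3) of the eigenvalues of A."""
    n = len(a)
    trace4 = 0.0
    for i in range(n):
        for j in range(n):
            # (A^2)_ij is the dot product of rows i and j, as A is symmetric.
            entry = sum(x * y for x, y in zip(a[i], a[j]))
            trace4 += entry * entry  # Tr(A^4) = sum of (A^2)_ij^2
    return trace4 / (16 * n**3)


rng = random.Random(11)
samplers = {
    "random signs": lambda: rng.choice((-1.0, 1.0)),
    "uniform": lambda: rng.uniform(-math.sqrt(3.0), math.sqrt(3.0)),
}
for name, draw_entry in samplers.items():
    values = [fourth_moment(random_symmetric(60, draw_entry)) for _ in range(4)]
    print(name, [round(value, 4) for value in values], "limit: 0.125")
```

Every individual matrix lands close to the same limiting value, whichever distribution the entries came from; the dependence on the particular matrix and on the particular density fades as N grows.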

3. Nuclear Physics History
Below we discuss some of the history of investigations of the nucleus, concentrating on the parts that led to the introduction of random matrix theory to the subject. We mention some of the connections with number theory, which will be explored in much greater detail later.
3.1. Introduction. The Riemann Hypothesis asserts that the nontrivial zeros of the Riemann zeta function are of the form ρ = 1/2 + iγ ρ with γ ρ real. About the year 1913, Pólya conjectured 30 that the γ ρ are the eigenvalues of a naturally occurring, unbounded, self-adjoint operator, and are therefore real. 31 Later, Hilbert contributed to the conjecture, and reportedly introduced the phrase 'spectrum' to describe the eigenvalues of an equivalent Hermitian operator, apparently by analogy with the optical spectra observed in atoms. This remarkable analogy pre-dated Heisenberg's Matrix Mechanics and the Hamiltonian formulation of Quantum Mechanics by more than a decade. Not surprisingly, the Pólya-Hilbert conjecture was considered so intractable that it was not pursued for decades, and Random Matrix Theory remained in a dormant state. To quote Diaconis [Di1]: "Historically, Random Matrix Theory was started by Statisticians [Wis] studying the correlations between different features of population (height, weight, income...). This led to correlation matrices with (i, j) entry the correlation between the ith and jth features. If the data were based on a random sample from a larger population, these correlation matrices are random; the study of how the eigenvalues of such samples fluctuate was one of the first great accomplishments of Random Matrix Theory." Diaconis [Di2] has given an extensive review of Random Matrix Theory from the perspective of a statistician. A strong argument can be made, however, that Random Matrix Theory, as we know it today in the Physical Sciences, began in a formal mathematical sense with the Wigner surmise [Wig5] concerning the spacing distribution of adjacent resonances (of the same spin and parity) in the interactions between low-energy neutrons and nuclei, discussed below. 30 The first reference to this conjecture in the literature might not have been until 1973 by Montgomery [Mon]. 
31 If $v$ is an eigenvector with eigenvalue $\lambda$ of a Hermitian matrix $A$ (so $A = A^*$, with $A^*$ the conjugate transpose of $A$), then $v^*(Av) = v^*(A^*v) = (Av)^*v$; the first expression is $\lambda\|v\|^2$ while the last is $\bar{\lambda}\|v\|^2$, with $\|v\|^2 = v^*v = \sum_i |v_i|^2$ nonzero. Thus $\lambda = \bar{\lambda}$, and the eigenvalues are real. This is one of the most important properties of Hermitian matrices, as it allows us to order the eigenvalues.
3.2. Nuclear Physics and Random Matrix Theory. The period from the mid-1930's to the late 1970's was the Golden Age of Neutron Physics; widespread interest in understanding the physics of the nucleus, coupled with the need for accurate data in the design of nuclear reactors, made the field of Neutron Physics of global importance in fundamental Physics, Technology, Economics, and Politics. In the mid-1950's, a discovery was made that turned out to have far-reaching consequences beyond anything that those working in the field could have imagined. For the first time, it was possible to study the microstructure of the continuum in a strongly-coupled, many-body system, at very high excitation energies. This unique situation came about as the result of the following facts:
• Neutrons, with kinetic energies of a few electron-volts, excite states in compound nuclei at energies ranging from about 5 million electron-volts to almost 10 million electron-volts (typical neutron binding energies). Schematically, see Figure 2.
• Low-energy resonant states in heavy nuclei (mass numbers greater than about 100) have lifetimes in the range $10^{-14}$ to $10^{-15}$ seconds, and therefore they have widths of about 1 eV. The compound nucleus loses all memory of the way in which it is formed. It takes a relatively long time for sufficient energy to reside in a neutron before being emitted. This is a highly complex, statistical process. In heavy nuclei, the average spacing of adjacent resonances is typically in the range from a few eV to several hundred eV.
• Just above the neutron binding energy, the angular momentum barrier restricts the possible range of values of total spin of a resonance, J (J = I + i + l, where I is the spin of the target nucleus, i is the neutron spin, and l is the relative orbital angular momentum). This is an important technical point.
• The neutron time-of-flight method provides excellent energy resolution at energies up to several keV. (See Firk [Fi] for a review of time-of-flight spectrometers.) A 1-eV neutron travels 1 meter in 72.3 microseconds. At nonrelativistic energies, the energy resolution $\Delta E$ at an energy $E$ is simply $\Delta E / E \approx 2\Delta t / t_E$ (as $E \propto 1/t_E^2$), where $\Delta t$ is the total timing uncertainty, and $t_E$ is the flight time for a neutron of energy $E$.
In 1958, the two highest-resolution neutron spectrometers in the world had total timing uncertainties ∆t ≈ 200 nanoseconds. For a flight-path length of 50 meters the resolution was ∆E ≈ 3 eV at 1 keV.
In $^{238}$U + n, the excitation energy is about 5 MeV; the effective resolution for a 1 keV neutron was therefore $\Delta E / E_{\rm effective} \approx 6 \cdot 10^{-7}$ (3.2) (at 1 eV, the effective resolution was about $10^{-11}$). Two basic broadening effects limit the sensitivity of the method:
(1) Doppler broadening of the resonance profile due to the thermal motion of the target nuclei; it is characterized by the quantity $\delta \approx 0.3\sqrt{E/A}$ eV (with $E$ in eV), where $A$ is the mass number of the target. If $E = 1$ keV and $A = 200$, $\delta \approx 0.7$ eV, a value that may be ten times greater than the natural width of the resonance.
(2) Resolution broadening of the observed profile due to the finite resolving power of the spectrometer. For a review of the experimental methods used to measure neutron total cross sections see Firk and Melkonian [FM]. Lynn [Ly] has given a detailed account of the theory of neutron resonance reactions.
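The resolution figures quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the nonrelativistic relation $\Delta E/E \approx 2\Delta t/t_E$ and the room-temperature Doppler estimate $\delta \approx 0.3\sqrt{E/A}$ eV; the function names are illustrative and are not taken from any of the cited references.

```python
import math

# Flight time of a 1 eV neutron over 1 meter, in microseconds (value quoted in the text).
US_PER_M_AT_1_EV = 72.3

def flight_time_us(energy_ev, path_m):
    """Nonrelativistic flight time in microseconds; t scales as 1/sqrt(E)."""
    return US_PER_M_AT_1_EV * path_m / math.sqrt(energy_ev)

def energy_resolution_ev(energy_ev, path_m, dt_us):
    """Assumed relation: Delta E = 2 E (Delta t / t_E), because E is proportional to 1/t^2."""
    return 2.0 * energy_ev * dt_us / flight_time_us(energy_ev, path_m)

def doppler_width_ev(energy_ev, mass_number):
    """Assumed Doppler estimate: delta ~ 0.3 sqrt(E/A) eV for a room-temperature target."""
    return 0.3 * math.sqrt(energy_ev / mass_number)

# Accelerator-based spectrometer: 200 ns timing uncertainty, 50 m flight path, 1 keV neutrons.
linac = energy_resolution_ev(1000.0, 50.0, 0.2)
# Reactor chopper: pulses of about 1 microsecond over the same path length.
chopper = energy_resolution_ev(1000.0, 50.0, 1.0)
# Resolution relative to the ~5 MeV excitation energy of the compound nucleus.
effective = linac / 5.0e6
# Doppler width for E = 1 keV and mass number A = 200.
doppler = doppler_width_ev(1000.0, 200)

print(f"accelerator: {linac:.2f} eV, chopper: {chopper:.1f} eV")
print(f"effective resolution: {effective:.1e}, Doppler width: {doppler:.2f} eV")
```

Under these assumptions the accelerator value comes out at a few eV and the chopper value is roughly five times larger, which is the gap that motivated the move to accelerator-based spectrometers.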
In the early 1950's, the field of low-energy neutron resonance spectroscopy was dominated by research groups working at nuclear reactors. They were located at National Laboratories in the United States, the United Kingdom, Canada, and the former USSR. The energy spectrum of fission neutrons produced in a reactor is moderated in a hydrogenous material to generate an enhanced flux of low-energy neutrons. To carry out neutron time-of-flight spectroscopy, the continuous flux from the reactor is "chopped" using a massive steel rotor with fine slits through it. At the maximum attainable speed of rotation (about 20,000 rpm), and with slits a few thousandths-of-an-inch in width, it is possible to produce pulses each with a duration of approximately 1 µs. The chopped beams have rather low fluxes, and therefore the flight paths are limited in length to less than 50 meters. The resolution at 1 keV is then $\Delta E \approx 20$ eV, clearly not adequate for the study of resonance spacings of about 10 eV.
In 1952, there were only four accelerator-based, low-energy neutron spectrometers operating in the world. They were at Columbia University in New York City, Brookhaven National Laboratory, the Atomic Energy Research Establishment, Harwell, England, and at Yale University. The performances of these early accelerator-based spectrometers were comparable with those achieved at the reactor-based facilities. It was clear that the basic limitations of the neutron-chopper spectrometers had been reached, and therefore future developments in the field would require improvements in accelerator-based systems.
In 1956, a new high-powered injector for the electron gun of the Harwell electron linear accelerator was installed to provide electron pulses with very short durations (typically less than 200 nanoseconds) [FRG]. The pulsed neutron flux (generated by the (γ, n) reaction) was sufficient to permit the use of a 56-meter flight path; an energy resolution of 3 eV at 1 keV was achieved.
At the same time, Professors Havens and Rainwater (pioneers in the field of neutron time-of-flight spectroscopy) and their colleagues at Columbia University were building a new 385-MeV proton synchrocyclotron a few miles north of the campus (at the Nevis Laboratory). The accelerator was designed to carry out experiments in meson physics and low-energy neutron physics (neutrons generated by the (p, n) reaction). By 1958, they had produced a pulsed proton beam with duration of 25 nanoseconds, and had built a 37-meter flight path [RDRH, DRRH]. The hydrogenous neutron moderator generated an effective pulse width of about 200 nanoseconds for 1 keV-neutrons. In 1960, the length of the flight path was increased to 200 meters, thereby setting a new standard in neutron time-of-flight spectroscopy [GRPH].
3.3. The Wigner Surmise. At a conference on Neutron Physics by Time-of-Flight, held in Gatlinburg, Tennessee on November 1st and 2nd, 1956, Professor Eugene Wigner (Nobel Laureate in Physics, 1963) presented his surmise regarding the theoretical form of the spacing distribution of adjacent neutron resonances (of the same spin and parity) in heavy nuclei. At the time, the prevailing wisdom was that the spacing distribution had a Poisson form (see, however, [GP]). The limited experimental data then available was not sufficiently precise to fix the form of the distribution (see [Hu]). The following quotation, taken from Wigner's presentation at the conference, introduces the concept of random matrices in Physics, for the first time: "Perhaps I am now too courageous when I try to guess the distribution of the distances between successive levels. I should re-emphasize that levels that have different J-values (total spin) are not connected with each other. They are entirely independent. So far, experimental data are available only on even-even elements. Theoretically, the situation is quite simple if one attacks the problem in a simple-minded fashion. The question is simply 'what are the distances of the characteristic values of a symmetric matrix with random coefficients?' We know that the chance that two such energy levels coincide is infinitely unlikely. We consider a two-dimensional matrix, $\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, in which case the distance between two levels is $\sqrt{(a_{11} - a_{22})^2 + 4a_{12}^2}$. This distance can be zero only if $a_{11} = a_{22}$ and $a_{12} = 0$. The difference between the two energy levels is the distance of a point from the origin, the two coordinates of which are $(a_{11} - a_{22})$ and $a_{12}$. The probability that this distance is S is, for small values of S, always proportional to S itself because the volume element of the plane in polar coordinates contains the radius as a factor.
The probability of finding the next level at a distance S now becomes proportional to S dS. Hence the simplest assumption will give the probability $\frac{\pi}{2}\rho^2 \exp\left(-\frac{\pi}{4}\rho^2 S^2\right) S\, dS$ for a spacing between S and S + dS.
If we put $x = \rho S = S/\langle S \rangle$, where $\langle S \rangle$ is the mean spacing, then the probability distribution takes the standard form $p(x)\,dx = \frac{\pi}{2}\, x \exp\left(-\frac{\pi}{4}x^2\right) dx$, where the coefficients are obtained by normalizing both the area and the mean to unity." This form, in which the probability of zero spacing is zero, is strikingly different from the Poisson form in which the probability is a maximum for zero spacing. The form of the Wigner surmise had been previously discussed by Wigner himself [Wig1], and by Landau and Smorodinsky [LS], but not in the spirit of Random Matrix Theory. It is interesting to note that the Wigner distribution is a special case of a general statistical distribution, named after Professor E. H. Waloddi Weibull (1887-1979), a Swedish engineer and statistician [Wei]. For many years, the distribution has been in widespread use in statistical analyses in industries such as aerospace, automotive, electric power, nuclear power, communications, and life insurance. 32 The distribution gives the lifetimes of objects and is therefore invaluable in studies of the failure rates of objects under stress (including people!). The Weibull probability density function is $\mathrm{Wei}(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} \exp\left(-(x/\lambda)^k\right)$, where $x \ge 0$, $k > 0$ is the shape parameter, and $\lambda > 0$ is the scale parameter. We see that $\mathrm{Wei}(x; 2, 2/\sqrt{\pi}) = p(x)$, the Wigner distribution. Other important Weibull distributions are given in the following list:
• $\mathrm{Wei}(x; 1, 1) = \exp(-x)$, the Poisson distribution (the exponential law for spacings between Poisson-distributed points);
• $\mathrm{Wei}(x; 2, \lambda) = \mathrm{Ray}(\lambda)$, the Rayleigh distribution;
• $\mathrm{Wei}(x; 3, \lambda)$ is approximately a normal distribution. 33
For $\mathrm{Wei}(x; k, \lambda)$, the mean is $\lambda\Gamma(1 + 1/k)$, the median is $\lambda(\log 2)^{1/k}$, and the mode is $\lambda((k-1)/k)^{1/k}$ if $k > 1$. As $k \to \infty$, the Weibull distribution has a sharp peak at $\lambda$. 34
At the time of the Gatlinburg conference, no more than 20 s-wave neutron resonances had been clearly resolved in a single compound nucleus and therefore it was not possible to make a definitive test of the Wigner surmise. Immediately following the conference, J. A. Harvey and D. J. Hughes [HH], and their collaborators, working in the fast-neutron-chopper groups at the high flux reactor at the Brookhaven National Laboratory, and at the Oak Ridge National Laboratory, gathered their own limited data, and all the data from neutron spectroscopy groups around the world, to obtain the first global spacing distribution of s-wave neutron resonances. Their combined results, published in 1958, showed a distinct lack of very closely spaced resonances, in agreement with the Wigner surmise.
By late 1959, the experimental situation had improved greatly. At Columbia University, two students of Professors Havens and Rainwater completed their PhD theses; one, Joel Rosen [RDRH], studied the first 55 resonances in $^{238}$U + n up to 1 keV, and the other, J. Scott Desjardins [DRRH], studied resonances in two silver isotopes (of different spin) in the same energy region. These were the first results from the new high-resolution neutron facility at the Nevis cyclotron.
At Harwell, Firk, Lynn, and Moxon [FLM] completed their study of the first 100 resonances in $^{238}$U + n at energies up to 1.8 keV; their measurement of the total neutron cross section for the interaction $^{238}$U + n in the energy range 400-1800 eV is shown in Figure 3.
33 Obviously this Weibull cannot be a normal distribution, as they have very different decay rates for large x, and this Weibull is a one-sided distribution! What we mean is that for $0 \le x \le 2$ this Weibull is well approximated by a normal distribution which shares its mean and variance, which are (respectively) $\Gamma(4/3) \approx .893$ and $\Gamma(5/3) - \Gamma(4/3)^2 \approx .105$.
34 Historically, Fréchet introduced this distribution in 1927, and Nuclear Physicists often refer to the Weibull distribution as the Brody distribution [BFFMPW].
When this experiment began in 1956, no resonances had been resolved at energies above 500 eV. The distribution of adjacent spacings of the first 100 resonances in the single compound nucleus, $^{238}$U + n, ruled out an exponential distribution and provided the best evidence (then available) in support of Wigner's proposed distribution.
Over the last half-century, numerous studies have not changed the basic findings. At the present time, almost 1000 s-wave neutron resonances in the compound nucleus $^{239}$U have been observed in the energy range up to 20 keV. The latest results, with their greatly improved statistics, are shown in Figure 4 [DLL].
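The identification of the Wigner distribution with a Weibull distribution made earlier in this subsection can be checked numerically. The sketch below assumes the standard Weibull density $\frac{k}{\lambda}(x/\lambda)^{k-1}\exp(-(x/\lambda)^k)$ and the standard formulas for its mean, median, mode and variance; the function names are illustrative only.

```python
import math

def weibull_pdf(x, k, lam):
    """Standard Weibull density with shape k and scale lam (assumed parametrisation)."""
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def wigner_pdf(x):
    """Wigner surmise p(x) = (pi/2) x exp(-(pi/4) x^2)."""
    return (math.pi / 2) * x * math.exp(-(math.pi / 4) * x * x)

def weibull_mean(k, lam):
    return lam * math.gamma(1 + 1 / k)

def weibull_median(k, lam):
    return lam * math.log(2) ** (1 / k)

def weibull_mode(k, lam):
    # Only meaningful for k > 1.
    return lam * ((k - 1) / k) ** (1 / k)

# Scale parameter that turns the k = 2 Weibull into the Wigner distribution.
lam = 2 / math.sqrt(math.pi)

# Largest pointwise difference between the two densities on a grid over [0, 5].
max_diff = max(abs(weibull_pdf(i / 100, 2, lam) - wigner_pdf(i / 100)) for i in range(501))

# The mean spacing should be 1 by construction.
wigner_mean = weibull_mean(2, lam)

# Mean and variance of Wei(x; 3, 1), the case compared with a normal distribution.
cubic_mean = weibull_mean(3, 1.0)
cubic_var = math.gamma(5 / 3) - math.gamma(4 / 3) ** 2

print(f"max |Wei - Wigner| = {max_diff:.2e}, mean = {wigner_mean:.6f}")
print(f"median = {weibull_median(2, lam):.4f}, mode = {weibull_mode(2, lam):.4f}")
print(f"Wei(3,1): mean = {cubic_mean:.3f}, variance = {cubic_var:.3f}")
```

The mode of the Wigner distribution lies below its median, which in turn lies below the unit mean, reflecting the skew of the distribution toward larger spacings.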
3.4. Further Developments. The first numerical investigation of the distribution of successive eigenvalues associated with random matrices was carried out by Porter and Rosenzweig in the late 1950's [PR]. They diagonalized a large number of matrices where the elements are generated randomly but constrained by a probability distribution. The analytical theory developed in parallel with their work: Mehta [Meh1], Mehta and Gaudin [MG], and Gaudin [Gau]. At the time it was clear that the spacing distribution was not influenced significantly by the chosen form of the probability distribution. Remarkably, the n × n distributions had forms given almost exactly by the original Wigner 2 × 2 distribution.
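Wigner's 2 × 2 argument is easy to test by simulation. The sketch below is only an illustration under stated assumptions: the diagonal entries are standard normal and the off-diagonal entry has variance 1/2 (the usual Gaussian Orthogonal Ensemble convention, chosen here because it makes the 2 × 2 spacing follow the Wigner form exactly); the sample size and seed are arbitrary.

```python
import math
import random

def normalized_spacings(n_samples, seed=0):
    """Spacings of random 2x2 real symmetric matrices, rescaled to mean spacing 1."""
    rng = random.Random(seed)
    raw = []
    for _ in range(n_samples):
        a11 = rng.gauss(0.0, 1.0)
        a22 = rng.gauss(0.0, 1.0)
        a12 = rng.gauss(0.0, math.sqrt(0.5))  # assumed GOE convention: variance 1/2
        # Distance between the two eigenvalues of [[a11, a12], [a12, a22]].
        raw.append(math.sqrt((a11 - a22) ** 2 + 4.0 * a12 ** 2))
    mean = sum(raw) / len(raw)
    return [s / mean for s in raw]

def wigner_cdf(x):
    """Cumulative distribution of p(x) = (pi/2) x exp(-(pi/4) x^2)."""
    return 1.0 - math.exp(-(math.pi / 4.0) * x * x)

spacings = normalized_spacings(20000)
checkpoints = (0.25, 0.5, 1.0, 2.0)
empirical = {x: sum(1 for s in spacings if s <= x) / len(spacings) for x in checkpoints}

for x in checkpoints:
    print(f"x = {x}: simulated {empirical[x]:.4f}, Wigner {wigner_cdf(x):.4f}")
```

The scarcity of small spacings in the simulated sample is the level repulsion that distinguishes the Wigner form from the Poisson form.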
The linear dependence of p(x) on the normalized spacing x (for small x) is a direct consequence of the symmetries imposed on the (Hamiltonian) matrix, $H = (h_{ij})$. Dyson [Dy1] discussed the general mathematical properties associated with random matrices and made fundamental contributions to the theory by showing that different results are obtained when different symmetries are assumed for H. He introduced three basic distributions; in Physics, only two are important:
• the Gaussian Orthogonal Ensemble (GOE) for systems in which rotational symmetry and time-reversal invariance hold (the Wigner distribution): $p(x) = (\pi/2)\, x \exp(-(\pi/4)x^2)$;
• the Gaussian Unitary Ensemble (GUE) for systems in which time-reversal invariance does not hold (French et al. [FKPT]): $p(x) = (32/\pi^2)\, x^2 \exp(-(4/\pi)x^2)$.
The mathematical details associated with these distributions are given in [Meh1].
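A quick numerical check of the two spacing laws is sketched below. It assumes the unit-mean forms of the surmises, with exponent $-(\pi/4)x^2$ for the GOE and $-(4/\pi)x^2$ for the GUE (the values for which both the area and the mean equal one), and compares them with the exponential spacing law of Poisson-distributed points; the midpoint-rule integrator and the cutoff at x = 40 are choices made for this illustration.

```python
import math

def goe_surmise(x):
    return (math.pi / 2) * x * math.exp(-(math.pi / 4) * x * x)

def gue_surmise(x):
    return (32 / math.pi ** 2) * x * x * math.exp(-(4 / math.pi) * x * x)

def poisson_spacing(x):
    return math.exp(-x)

def integrate(f, a, b, n=40000):
    """Composite midpoint rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

densities = {"Poisson": poisson_spacing, "GOE": goe_surmise, "GUE": gue_surmise}
area, mean, small_gap = {}, {}, {}
for name, p in densities.items():
    area[name] = integrate(p, 0.0, 40.0)                        # should be 1
    mean[name] = integrate(lambda x, p=p: x * p(x), 0.0, 40.0)  # should be 1
    small_gap[name] = integrate(p, 0.0, 0.1, n=1000)            # P(spacing < 0.1)

for name in densities:
    print(f"{name}: area {area[name]:.5f}, mean {mean[name]:.5f}, "
          f"P(x < 0.1) = {small_gap[name]:.5f}")
```

The probability of a spacing below one tenth of the mean drops sharply from the Poisson case to the GOE (linear repulsion) and again to the GUE (quadratic repulsion).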
The impact of these developments was not immediate in Nuclear Physics. At the time, the main research endeavors were concerned with the structure of nuclei: experiments and theories connected with Shell-, Collective-, and Unified models, and with the nucleon-nucleon interaction. The study of Quantum Statistical Mechanics was far removed from the main stream. Almost two decades went by before Random Matrix Theory was introduced in other fields of Physics (see, for example, Bohigas, Giannoni and Schmit [BGS] and Alhassid [Al]).
3.5. From Physics to Number Theory. Interestingly, the next development occurred in an area having nothing to do with Physics. In the field of Number Theory, perhaps the greatest unsolved problem has to do with the Riemann conjecture (that dates from the mid-19th century): if $\zeta(s) = \sum_{n=1}^{\infty} 1/n^s$, then every complex number $\rho$ in the critical strip ($0 \le \Re(\rho) \le 1$) at which the analytic continuation of $\zeta(s)$ has a non-trivial zero has real part equal to 1/2. In 1914, Hardy [Har] proved that there are infinitely many zeros of the zeta function on the critical line $\Re(s) = 1/2$. Later Selberg [Sel1] proved that a small positive percentage are on this line; this was improved by Levinson [Lev] to a third, and now thanks to Conrey [Con1] we know at least two-fifths lie on the line. 35
In the early 1970's, Hugh Montgomery, a mathematician at the University of Michigan, was investigating the relative spacing of the zeros of the zeta function [Mon] (because of applications to the class number problem 36). Let us recall that, if we have a series of points distributed randomly along a line, with average density normalized to 1, and we treat the coordinates of the points as independent random variables, then the probability of finding j points in a given interval of length x is the Poisson distribution $P_j(x) = \frac{x^j}{j!} e^{-x}$. For our real symmetric and complex Hermitian random matrix ensembles, the probability of finding more than one eigenvalue in a short interval is less than that given by the Poisson distribution; the eigenvalues of the random matrix are said to 'repel' each other. The pair and higher level correlation functions describe this effect (we discuss these functions in greater detail later in the paper; knowing all the correlation functions is equivalent to knowing the neighbor spacings). Montgomery studied the pair correlation function for the zeros of the zeta function and he gave evidence that it has the asymptotic form $1 - \left(\frac{\sin \pi x}{\pi x}\right)^2$, the pair correlation function of the GUE (an ensemble without time-reversal invariance). In a masterful numerical calculation of the distribution of spacings between zeros of the zeta function, Andrew Odlyzko [Od1,Od2] tested the Montgomery conjecture by studying millions of normalized zeros near the $10^{20}$th and the $10^{22}$nd zero of $\zeta(s)$. His computed correlation function shows remarkable agreement with Montgomery's form (see Figure 5).
35 There is an interesting perspective to proving more than a third of the zeros lie on the critical line. As zeros off the line occur in complex conjugate pairs, proving more than a third of all non-trivial zeros lie on the line is equivalent to more than ...
36 ... $b \in \mathbb{Z}\}$ does not (in the latter, note we can write 6 as either $2 \cdot 3$ or $(1 + i\sqrt{5})(1 - i\sqrt{5})$, and none of these four numbers can be factored as $(a + ib\sqrt{5})(c + id\sqrt{5})$ without one of the two factors being a unit). The units are numbers in the ring whose norm is 1; in $\mathbb{Z}[i\sqrt{5}]$, these numbers are $\pm 1$ (in $\mathbb{Z}[\sqrt{5}]$ we would also have numbers such as $2 + \sqrt{5}$, as $(2 + \sqrt{5})(-2 + \sqrt{5}) = 1$). The class number problem is to find all imaginary quadratic fields with a given class number; see [St, Wa] for more details and results. It turns out (see [CI]) that if there are many small (relative to the average spacing) gaps between zeros of $\zeta(s)$ on the critical line, then there are terrific lower bounds for the class number; another connection between the class number and zeros of L-functions is through the work of Goldfeld [Gol1] and Gross-Zagier [GZ].
As we shall see, this work continues to have a profound impact on developments in contemporary Number Theory.
In the remaining two sections, we explore one statistic from random matrix theory (the density of eigenvalues) and one from number theory (the 1-level density of low-lying zeros). Though these statistics are not exactly analogous, they are similar. The reason we chose to study these two is that the general steps of the proofs are similar, and thus this provides a nice introduction to how intuitions and methods in one field can be transferred to another.

Wigner's Semi-circle law
4.1. Wigner's Semi-circle Law (Statement). We state and prove a version of Wigner's semi-circle law below. We refer the reader to [ERSY,ESY,TV1,TV2] for the most general version and proof of the semi-circle law as well as spacings between adjacent eigenvalues. We content ourselves with this special case as this version is easy to state and prove, and the conditions are frequently satisfied in practice.
Theorem 4.1 (Wigner's semi-circle law). Consider the ensemble of N × N real symmetric matrices with entries independent, identically distributed random variables from a fixed probability distribution p(x) with mean 0, variance 1, and other moments finite. Then for almost all A, as N → ∞ the density of the eigenvalues normalized by $2\sqrt{N}$, $\mu_{A,N}(x) = \frac{1}{N}\sum_{i=1}^{N} \delta\left(x - \frac{\lambda_i(A)}{2\sqrt{N}}\right)$, converges to the semi-circle: $\mu_{A,N}(x) \to \frac{2}{\pi}\sqrt{1 - x^2}$ if $|x| \le 1$, and 0 otherwise. In other words, the number of normalized eigenvalues in an interval $[a, b] \subset [-1, 1]$ is found by integrating the semi-circle over that interval.
Note that such a result could never hold for all A, as given any $\epsilon > 0$ there is always a small (though rapidly tending to zero!) probability that we've chosen a matrix that is within $\epsilon$ units of being diagonal (i.e., each non-diagonal entry is at most $\epsilon$ in absolute value).
For example, consider Figures 6 and 7. In the first we've drawn the entries from the standard normal. This satisfies the conditions of Wigner's semi-circle law, and we see already that with just 400 × 400 matrices the fit is excellent.
How essential are the conditions in the theorem? Does the result hold even if these conditions are violated, but perhaps the proof is just harder (or currently unknown)? To investigate this, we choose instead of the standard normal the Cauchy distribution $(\pi(1 + x^2))^{-1}$. This distribution clearly has infinite variance, and thus obviously fails to satisfy the conditions. (It also has no mean as the integral of |x| is infinite.) We see that the behavior is decidedly non-semi-circular (the huge probabilities at the end are the probabilities of observing an eigenvalue that far or further).
We will use the Method of Moments to prove the semi-circle law. We briefly summarize how we can pass from knowledge of the moments to knowledge of the eigenvalue distribution. Recall the k-th moment is $\mu_k = \frac{1}{2^k N^{k/2+1}} \sum_{i=1}^{N} \lambda_i(A)^k$. Imagine we had a 1 × 1 matrix and we knew the first moment of the eigenvalues (well, here it would just be eigenvalue). We have one equation in one unknown: $\frac{1}{2}\lambda_1(A) = \mu_1$; (4.1) this is clearly solvable and we can express $\lambda_1(A)$ in terms of $\mu_1$. Imagine now we have a 2 × 2 matrix. Then we have two equations in two unknowns: $\frac{1}{2^{5/2}}(\lambda_1(A) + \lambda_2(A)) = \mu_1$ and $\frac{1}{2^4}(\lambda_1(A)^2 + \lambda_2(A)^2) = \mu_2$. (4.2) For almost all $\mu_1$ and $\mu_2$ this is solvable (actually, we do not have to worry about this ever not being solvable, as the $\lambda_i(A)$ are always drawn from a matrix, and thus the equations will be consistent). We can therefore express the two eigenvalues in terms of the first two moments. Similarly, if we looked at the first three moments of a 3 × 3 matrix we would have enough information to find the eigenvalues. In the general case, we need to know the first N moments to find the eigenvalues of an N × N matrix; 37 as we are letting N → ∞, we need to compute all the moments to determine the eigenvalues.
The idea of the proof is as follows. For each matrix A we calculate its moments; let us denote the k-th moment of A by $M_{A,N}(k)$. The average of $M_{A,N}(k)$ over the ensemble is denoted $M_N(k)$. We then show that $\lim_{N \to \infty} M_N(k) = C(k)$, the k-th moment of the semi-circle. This is almost, but not quite, enough to then conclude that a central limit theorem type situation occurs 38, and a generic eigenvalue measure is close to the system average (which converges to the semi-circle as N → ∞). The reason it is not sufficient is that we must also control the variances 39; however, this is easily done by similar arguments (see for example [HM, MMS]).
37 Of course, one has to invert these relations to find the eigenvalues!
38 See [GS, Fel] for proofs of the central limit theorem, or [MT-B] for a sketch of the proof.
39 For example, imagine for all N we always had half the moments equal 0 and the other half equal 2C(k); then the average is C(k) but no measure is close to the system average.
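The target values C(k) in the argument above can be computed directly. The sketch below assumes the semi-circle density $\frac{2}{\pi}\sqrt{1 - x^2}$ on [−1, 1] and compares a numerical integral with the closed form $C(2m) = \frac{1}{4^m}\cdot\frac{1}{m+1}\binom{2m}{m}$ (a Catalan number divided by $4^m$), which is a standard fact about this density; the substitution x = sin θ and the helper names are choices made for this illustration.

```python
import math

def semicircle_moment(k, n=2000):
    """k-th moment of (2/pi) sqrt(1 - x^2) on [-1, 1].

    With x = sin(theta) the integral becomes
    (2/pi) * integral of sin(theta)^k cos(theta)^2 over [-pi/2, pi/2],
    evaluated here with the midpoint rule.
    """
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = -math.pi / 2 + (i + 0.5) * h
        total += math.sin(theta) ** k * math.cos(theta) ** 2
    return (2 / math.pi) * h * total

def catalan(m):
    """m-th Catalan number, (1/(m+1)) * binomial(2m, m)."""
    return math.comb(2 * m, m) // (m + 1)

moments = [semicircle_moment(k) for k in range(9)]

for k, value in enumerate(moments):
    expected = catalan(k // 2) / 4 ** (k // 2) if k % 2 == 0 else 0.0
    print(f"C({k}) = {value:.6f} (closed form {expected:.6f})")
```

The odd moments vanish by symmetry, so only the even moments carry information about the limiting density.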

4.2. Wigner's Semi-circle Law (Sketch of Proof).
We sketch the proof of Wigner's Semi-circle Law. As we've stated earlier, the reason we chose to prove this (as opposed to many of the other results) is that this proof is mostly self-contained, and highlights the key features of proofs in the subject.
There are typically three steps in working with random matrix ensembles (or the corresponding number theory quantities). We state these steps below, and then elaborate in great detail.
(1) Determine the correct scale.
(2) Develop an explicit formula relating what we want to study to something we understand.
(3) Use an averaging formula to analyze the quantities above.
Note it is not always trivial to figure out what is the correct statistic to study, and frequently very advanced combinatorics are needed to analyze the quantities. 40 We describe these steps in detail for our random matrix ensembles and the semi-circle law. The key input is the following basic result from linear algebra, the Eigenvalue Trace Lemma: 41 if A is an N × N matrix with eigenvalues $\lambda_i(A)$, then $\mathrm{Trace}(A^k) = \sum_{i=1}^{N} \lambda_i(A)^k$. (4.4) As the trace of a matrix is the sum of its diagonal entries, $\mathrm{Trace}(A^k) = \sum_{i_1=1}^{N} \cdots \sum_{i_k=1}^{N} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_k i_1}$. The Eigenvalue Trace Lemma allows us to do the first two steps, namely determine the correct scale and relate what we want to study (the eigenvalues) to what we know (the matrix elements we randomly choose). We'll see later the analogue of this in number theory.
40 Many of the papers in the field have large sections devoted to handling combinatorics; see for instance [HM, Rub, RS]. Interestingly, sometimes the combinatorics cannot be handled, and in [Gao] we believe the number theory and random matrix theory agree where both have been calculated, but we cannot do the combinatorics to prove this.
41 The proof is trivial if A is diagonal or upper triangular, following by definition. As we only need this result for real symmetric matrices, which can be diagonalized, we only give the proof in this case. Let S be such that $A = S\Lambda S^{-1}$ with $\Lambda$ diagonal. The claim follows from $A^k = S\Lambda^k S^{-1}$ and Trace(ABC) = Trace(BCA).
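The Eigenvalue Trace Lemma can be verified by hand in the 2 × 2 case, where the eigenvalues have a closed form. The sketch below is an illustration only: the test matrix and helper names are arbitrary, and the trace is computed two ways (by repeated matrix multiplication and by the sum over index tuples $a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_k i_1}$).

```python
import itertools
import math

def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mid = (a + d) / 2
    half_gap = math.sqrt((a - d) ** 2 + 4 * b * b) / 2
    return mid - half_gap, mid + half_gap

def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][m] * y[m][j] for m in range(n)) for j in range(n)] for i in range(n)]

def trace_of_power(a, k):
    """Trace(A^k) via repeated matrix multiplication."""
    n = len(a)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        power = mat_mul(power, a)
    return sum(power[i][i] for i in range(n))

def trace_by_index_sum(a, k):
    """Trace(A^k) as the sum over all index tuples of a[i1][i2] a[i2][i3] ... a[ik][i1]."""
    n = len(a)
    total = 0.0
    for idx in itertools.product(range(n), repeat=k):
        term = 1.0
        for pos in range(k):
            term *= a[idx[pos]][idx[(pos + 1) % k]]
        total += term
    return total

A = [[2.0, 1.0], [1.0, -1.0]]  # arbitrary test matrix
lam1, lam2 = eigenvalues_2x2_symmetric(2.0, 1.0, -1.0)

for k in range(1, 7):
    print(k, lam1 ** k + lam2 ** k, trace_of_power(A, k), trace_by_index_sum(A, k))
```

The index-tuple form is the one used in the proof, since it expresses the moments as a polynomial in the randomly chosen matrix entries.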
⋄ For the first step, we take k = 2 and find $N \cdot \langle \lambda(A)^2 \rangle = \sum_{i=1}^{N} \lambda_i(A)^2 = \mathrm{Trace}(A^2) = \sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij} a_{ji} = \sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}^2$, where $\langle \lambda(A)^2 \rangle$ denotes the average of the square of the eigenvalues. As $a_{ij} = a_{ji}$ and these are drawn from a probability distribution with mean 0 and variance 1, we have $E[a_{ij}^2] = 1$, as this is just the variance. Thus the expected value of N times the average eigenvalue square is just $N^2$, so the average eigenvalue square is of size N, so (heuristically) the average eigenvalue is of size $\sqrt{N}$. 42 Why do we then normalize the eigenvalues by dividing by $2\sqrt{N}$ instead of $\sqrt{N}$? The reason is to make the final formula 'clean' (i.e., this allows us to say the semi-circle law instead of the semi-ellipse); arguments such as these will capture the dependence on the key parameter (in this case, the N-dependence), but it will not catch constant dependence. 43
⋄ For the second step, we want to understand the eigenvalues but it is the matrix elements we choose. The Eigenvalue Trace Lemma says $\sum_{i} \lambda_i(A)^k = \mathrm{Trace}(A^k)$; thus (2.13) becomes $M_{A,N}(k) = \frac{\mathrm{Trace}(A^k)}{2^k N^{k/2+1}}$. (4.7) This is a terrific exchange. We have discussed how knowing the moments of the eigenvalues suffices to determine the eigenvalues; this allows us to express these moments in terms of the quantities we are choosing.
42 With a little more work, we could calculate the variance of the average eigenvalue squared, using either the Central Limit Theorem or even Chebyshev's Theorem (from probability).
43 This is similar in some sense to dimensional analysis arguments in physics, which detect parameter dependence but not the constants. For example, imagine a pendulum of mass m (in kilograms), length L (in meters) where the difference in rest height (when the pendulum is down) and the raised height (where it is at angle θ) is $L_0$ meters. We assume the only force acting is gravity, with constant g (in meters per second squared). The period (how long, in seconds, it takes the pendulum to do a complete cycle) must be a function of m, L, $L_0$ and g; however, the only combinations of these quantities that give units of seconds are $\sqrt{L/g}$ and $\sqrt{L_0/g}$; thus the period must be a function of these two expressions. The correct answer turns out to be (at least for small initial displacements) approximately $2\pi\sqrt{L/g}$; we are able to deduce the correct functional form, though the constants are beyond such analysis.
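⋄ For the third and final step, we note that in order for (4.7) to be useful we must be able to average it over all A in our family; in other words, we must compute $M_N(k) = \mathbb{E}[M_{A,N}(k)] = \int \cdots \int M_{A,N}(k) \prod_{1 \le i \le j \le N} p(a_{ij})\, da_{ij}$. The advantage is that $\mathrm{Trace}(A^k)$ is a polynomial in the matrix entries, and the integrals above can be readily evaluated. For example, the integral for the average second moment is $M_N(2) = \frac{1}{2^2 N^2} \int \cdots \int \sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij} a_{ji} \cdot p(a_{11})\,da_{11} \cdots p(a_{NN})\,da_{NN}$.
As $a_{ij} = a_{ji}$, each of the $N^2$ terms in the sum is $a_{ij}^2$. The integration factors as $\int a_{ij}^2\, p(a_{ij})\,da_{ij} \cdot \prod_{(k,l) \ne (i,j),\ k \le l} \int p(a_{kl})\,da_{kl} = 1$
(the first piece is 1 because it is a variance and thus is 1 by assumption, while the others are 1 as this is one of the defining properties of a probability distribution). Thus $M_N(2) = N^2/(4N^2) = 1/4$, which is the second moment of the semi-circle.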
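The averaging step can also be explored by simulation. The following sketch draws a single random real symmetric matrix and evaluates the normalized second and fourth moments $\mathrm{Trace}(A^k)/(2^k N^{k/2+1})$ directly from its entries; the matrix size, the seed, and the choice of standard normal entries are assumptions made for this illustration, and for finite N the values agree with the semi-circle moments 1/4 and 1/8 only approximately.

```python
import random

def random_symmetric_matrix(n, rng):
    """n x n real symmetric matrix with independent N(0, 1) entries on and above the diagonal."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0)
    return a

def normalized_moment_2(a):
    """Trace(A^2) / (2^2 N^2); Trace(A^2) is the sum of the squares of all entries."""
    n = len(a)
    return sum(x * x for row in a for x in row) / (4 * n ** 2)

def normalized_moment_4(a):
    """Trace(A^4) / (2^4 N^3); as A^2 is symmetric, Trace(A^4) is the sum of (A^2)_{ij}^2."""
    n = len(a)
    columns = list(zip(*a))
    total = 0.0
    for row in a:
        for col in columns:
            entry = sum(x * y for x, y in zip(row, col))  # entry (i, j) of A^2
            total += entry * entry
    return total / (16 * n ** 3)

rng = random.Random(1)
n = 60
A = random_symmetric_matrix(n, rng)
m2 = normalized_moment_2(A)
m4 = normalized_moment_4(A)

print(f"second moment: {m2:.4f} (semi-circle: 0.2500)")
print(f"fourth moment: {m4:.4f} (semi-circle: 0.1250)")
```

No eigenvalues are computed here: the moments are obtained entirely from the matrix entries, which is exactly the exchange the Eigenvalue Trace Lemma provides.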
While the second moment calculation looks simple, the higher moment calculations require more involved computations and combinatorics 44, in particular the Catalan numbers. 45 The point is these computations can be done, and this analysis completes the proof (see [MT-B] for the general arguments, and [Leh] for the combinatorics calculation). 46
44 It is not hard to show the odd moments vanish by simple counting arguments. For the even moments, if the $a_{ij}$'s are not matched in pairs then there is negligible contribution as N → ∞. The proof follows by counting how many tuples there can be with a given matching, and then comparing that to $N^{k/2+1}$ (the proofs are somewhat easier if our distribution is even). For example, as we are assuming our distribution has zero mean, if ever there was an $a_{i_\ell i_{\ell+1}}$ that was unmatched (so neither $(i_\ell, i_{\ell+1})$ nor $(i_{\ell+1}, i_\ell)$ occurs as the index in any other factor in the trace expansion), then the expectation of this term must vanish as each $a_{ij}$ is drawn from a mean zero distribution. The number of valid pairings of the $a_{ij}$'s (where everything is matched in pairs) is $(2k - 1)!! = (2k - 1)(2k - 3) \cdots 3 \cdot 1$, which is the 2k-th moment of the standard normal; not every matching contributes fully, though, and this is why the resulting moments are significantly smaller than the Gaussian's.
45 The Catalan numbers are $C_n = \frac{1}{n+1}\binom{2n}{n}$. They arise in a variety of combinatorial problems; see for example [Stan].
46 If the ensemble of matrices had a different symmetry, the start of the proof proceeds as above but the combinatorics changes. For example, looking at Real Symmetric Toeplitz matrices (matrices constant along diagonals) leads to very different combinatorics (and in fact a different density of states than the semi-circle); see [HM, MMS]. If we looked at d-regular graphs, the combinatorics differs again, this time involving local trees; see [McK].
4.3. Additional statistics. There are numerous other statistics we could investigate; the density of normalized eigenvalues is by no means the most natural, but it does highlight the general features. The more fundamental statistic is the spacings between adjacent normalized eigenvalues, and not their density (though the density is used to rescale to have mean spacing 1, allowing us to compare apples and apples). These spacings can either be attacked directly, or through the n-level correlations and combinatorics. It is conjectured that for our ensembles of real symmetric matrices, the spacings between normalized eigenvalues converge to a universal measure independent of p. This measure is approximately $(\pi/2)\, x \exp(-(\pi/4)x^2)$. Until very recently this was only known if the matrix elements were chosen from normal distributions; however, there has been great progress since the original version of this paper was written. L. Erdös, J. A. Ramirez, B. Schlein, T. Tao, V. Vu and H.-T. Yau [ERSY,ESY,TV1,TV2] have removed this assumption and greatly generalized the class of matrices where the conjecture is known; the interested reader should see these papers for details. As the best result is constantly being improved, the interested reader should check the arxiv, http://arxiv.org/, for the current status. Below we give some numerics from investigations in the bulk of the spectrum (i.e., normalized eigenvalues near 0). Our first example is when p is the uniform distribution on [−1, 1] (Figure 8). Already for 300 × 300 matrices we see excellent agreement with the conjecture.
What about other distributions, for example the Cauchy density $(\pi(1 + x^2))^{-1}$? We saw earlier that the density of states was decidedly non-semi-circular. It is a different story for the spacings (Figure 9). Already for 300 × 300 matrices we see excellent agreement with the conjecture.
There are numerous other ensembles where the density of states is non-semi-circular but the spacing between adjacent eigenvalues seems to agree with the conjecture. For example, McKay [McK] proved the density of states of d-regular graphs 47 is Kesten's measure (which does converge to the semi-circle as d → ∞), and simulations by many (including [JMRR]) see the conjectured behavior. This is also apparent in more advanced tests, where the distribution of the largest eigenvalue is observed to follow a β = 1 Tracy-Widom distribution (see [MNS]). These distributions govern the largest eigenvalues in many settings; see [TW1,TW2,TW3]. If the ensemble has a very different structure,  however (such as the Toeplitz ensembles in [BDJ, HM, MMS]) then both statistics could behave differently.

From Random Matrix Theory to Number Theory
We now discuss the path from random matrix theory to number theory. As this story has been told numerous times, we concentrate on some illuminating aspects and refer the reader to the references previously mentioned. We concentrate on a very small subset of statistics and connections; for example, we will almost completely ignore the contributions to studying moments of the zeta function. Our goal is to explain how the behavior of some key statistics in number theory is the same as that of the corresponding statistics in random matrix theory. We concentrate on ζ(s) and its simplest generalization, Dirichlet L-functions, though these results hold for a larger class of L-functions as well (exactly how large a class is the subject of many research programs).
The starting point of our analysis is Riemann's Explicit Formula, which is a natural generalization of (2.5). Let φ(s) be a 'nice' function. We have

Shifting contours, by the residue theorem the left hand side is basically $\phi(1) - \sum_\rho \phi(\rho)$, while the right hand side is (for s = σ + it):

The integral is basically the Fourier transform 48 (up to some constants) of f(t) = φ(σ + it). A careful analysis [MT-B] gives the following explicit formula relating sums of a test function over zeros of ζ(s) to sums of the Fourier transform over primes.
Theorem 5.1 (Explicit Formula). Let $\sum_\rho$ denote the sum over the non-trivial zeros of ζ(s) (i.e., the zeros in the critical strip), g an even Schwartz function 49 of compact support and $\phi(r) = \int_{-\infty}^{\infty} g(u)e^{iru}\,du$. Write ρ as $1/2 + i\gamma_\rho$; if the Riemann Hypothesis is true then $\gamma_\rho$ is real. We have

Note that, up to scale, g and φ are essentially a Fourier transform pair.

Preliminaries.
We now come to the key moment of our story, when Montgomery and Dyson [Mon] noticed the agreement between the pair correlation of zeros of ζ(s) and eigenvalues of complex Hermitian matrices. The pair correlation statistic of a set $\{x_1, x_2, \dots\}$ is
$$\lim_{N\to\infty} \frac{\#\{(j_1, j_2) : 1 \le j_1 \ne j_2 \le N,\ x_{j_1} - x_{j_2} \in I\}}{N},$$
where I is an arbitrary interval. We can generalize this to triple correlation (which would be how often pairs of differences are in a box) and higher; knowing all the correlations is equivalent to knowing the spacing between adjacent elements. Instead of using a box or hypercube we can use a smooth test function. 50 Odlyzko [Od1,Od2] observed phenomenal agreement between the spacings of adjacent zeros and the corresponding distribution for spacings between adjacent eigenvalues of complex Hermitian matrices; see Figure 10.

Hejhal [Hej] proved (for suitable test functions) that the triple correlation of ζ(s) agrees with random matrix theory, while Rudnick and Sarnak [RS] showed agreement with the n-level correlations of any L-function attached to a cuspidal automorphic representation of $GL_m/\mathbb{Q}$. To describe these L-functions in detail would be too much of a digression, so we will content ourselves with a very brief introduction, referring the interested reader to [RS] for details. The Riemann zeta function has an Euler product, a functional equation, and is conjectured to have all of its zeros in the critical strip 0 ≤ ℜ(s) ≤ 1 on the line ℜ(s) = 1/2. The generalization is a series such as $\sum_{n=1}^{\infty} a_n/n^s$, where the $a_n$ are of arithmetic interest. In order to call this series an L-function, we require it to have certain properties (such as an Euler product and a functional equation, as well as some growth rates on the $a_n$'s). Such L-functions arise throughout number theory. 51

In random matrix theory, in order to understand the behavior of the eigenvalues of one matrix A we embedded it in a family of random matrices, and showed that with high probability the behavior of the eigenvalues of A is close to the ensemble average (at least as N → ∞). For the n-level correlations, we do not need to perform any averaging over L-functions; we may study an individual L-function. The reason is that the average spacing between zeros in the critical strip whose imaginary part is of size T is (up to some constants) of size 1/log T; in other words, the higher up we go, the more densely the zeros of an L-function are packed together, and thus one L-function provides enough zeros high up to average.

48 The Fourier transform of g is $\hat{g}(\xi) = \int_{-\infty}^{\infty} g(x)e^{-2\pi i x \xi}\,dx$.
49 This means for any m, n ≥ 0 that $\lim_{|x|\to\infty} (1 + x^2)^m g^{(n)}(x) = 0$ (i.e., g and all its derivatives tend to zero faster than any polynomial).
50 We want (1) $f(x_1, \dots, x_n)$ is symmetric; (2) $f(x + t(1, \dots, 1)) = f(x)$ for t ∈ R; (3) f(x) → 0 rapidly as |x| → ∞ in the hyperplane $\sum_j x_j = 0$; see [RS].
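To make the pair correlation statistic concrete, here is a minimal sketch for a finite list of points. It assumes the statistic counts ordered pairs of distinct indices among the first N points whose difference lies in an interval I, divided by N; the limit N → ∞ is not taken, and the function name and the closed interval [lower, upper] are illustrative choices rather than anything fixed by the references.

```python
def pair_correlation(points, lower, upper):
    """Finite-N pair correlation: the number of ordered pairs (i, j), i != j,
    with lower <= x_i - x_j <= upper, divided by N = len(points)."""
    n = len(points)
    if n == 0:
        return 0.0
    hits = 0
    for i, xi in enumerate(points):
        for j, xj in enumerate(points):
            if i != j and lower <= xi - xj <= upper:
                hits += 1
    return hits / n


if __name__ == "__main__":
    # A perfectly rigid sequence with unit spacing: no pairs at all are closer
    # than distance 1, the extreme form of "repulsion" between neighbours.
    lattice = list(range(1, 11))
    print(pair_correlation(lattice, -0.5, 0.5))
    print(pair_correlation(lattice, 0.5, 1.5))
```

In practice the points would first be rescaled to have mean spacing 1 (eigenvalues or zeros in a window), so that different data sets can be compared on the same scale. The double loop is quadratic in N, which is adequate for illustration but not for millions of zeros.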
The results mentioned above suggested that, for the purposes of number theory, it sufficed to know how random complex Hermitian matrices behave, as the zeros of ζ(s) (and other L-functions) high up on the critical line showed remarkable agreement with these eigenvalues. This turned out, however, to be only part of the story. The reason is that the n-level correlations are insensitive to finitely many zeros. In other words, if we were to remove the 1701 zeros nearest to the critical point s = 1/2, the n-level correlations would not change. 52 This is a major problem for number theory, as often we expect there to be behavior of arithmetic interest at the central point. 53 Katz and Sarnak [KaSa1,KaSa2] showed that, as the size of the matrices tends to infinity, the n-level correlations of complex Hermitian matrices also equal those of N × N unitary matrices, as well as those of its orthogonal and symplectic subgroups. 54 Thus when we say that the zeros behave like eigenvalues of complex Hermitian matrices, we could have also said they behave like eigenvalues of unitary matrices (or one of their subgroups). 55 We thus need a new statistic to study which will 'break' this symmetry, and say which ensemble is truly modeling the behavior. Further, our statistic should take into account the behavior near the central point, as interesting arithmetic occurs there. One popular choice is the 1-level density, which we now describe.

51 The earliest occurrence was probably in the work of Dirichlet, who used L-functions attached to characters on (Z/mZ)* to study primes in arithmetic progressions. Another example is elliptic curve L-functions, which (at least conjecturally) give information about the Mordell-Weil group of rational solutions of the elliptic curve.
52 This is because the zeros are tending to infinity. Thus, given any zero and any finite box, only finitely many zeros can be associated to it such that the required differences lie in the box. Therefore, this zero has negligible contribution in the limit as we are dividing by N.
53 For example, the Birch and Swinnerton-Dyer conjecture states that the order of vanishing at the central point of an elliptic curve L-function equals the rank of the Mordell-Weil group of rational solutions of the elliptic curve; this is quite important information which we do not wish to discard!
54 These classical compact groups are much more natural random matrix ensembles. In our original formulation, we chose the matrix elements randomly and independently from some probability distribution p. What should we take for p? The GOE and GUE ensembles, where the entries are chosen from Gaussians, arise by imposing invariance on Prob(A)dA under orthogonal (respectively unitary) transformations; these are natural conditions to impose as the probability of a transformation should be independent of the coordinate system used to write it down. The classical compact groups come endowed with a natural probability, namely Haar measure.

5.2. 1-level Density (Preliminaries). Let φ(x) be an even Schwartz function. This means for any m, n ≥ 0 that $\lim_{|x|\to\infty} (1 + x^2)^m \phi^{(n)}(x) = 0$ (i.e., φ and all its derivatives tend to zero faster than any polynomial). We also assume the Fourier transform 56 is compactly supported; this means there is some σ < ∞ such that $\hat{\phi}(\xi) = 0$ if |ξ| ≥ σ.
Consider an L-function
$$L(s, f) = \sum_{n=1}^{\infty} \frac{\lambda_f(n)}{n^s};$$
we assume this series converges for ℜ(s) > 1, has a meromorphic extension to all of C satisfying a functional equation, and has an Euler product. 57 We have remarked that high up, the spacing between zeros is like 1/log T at height T; what is it near s = 1/2? The answer can be deduced from an analysis of the functional equation, which shows there is some number $C_f$ (called the analytic conductor) such that the zeros near the central point are spaced on the order of $1/\log C_f$. This suggests we study the following statistic:

55 The eigenvalues of a unitary matrix are of the form $e^{i\theta}$. To see this, let v be an eigenvector of U with eigenvalue λ. Note $v^* U^* U v = v^* v$, which gives $|\lambda|^2 \|v\|^2 = \|v\|^2$, so |λ| = 1. Thus, similar to real symmetric and complex Hermitian matrices, we can parametrize the eigenvalues by a real quantity.
56 We use the normalization $\hat{\phi}(\xi) = \int_{-\infty}^{\infty} \phi(x)e^{-2\pi i x \xi}\,dx$. The Fourier transform has many nice properties on the Schwartz space (see [SS] for example).
57 These are very strong conditions, and most choices of $\lambda_f(n)$ will not satisfy these requirements. Fortunately there are many choices that do work, and these frequently encode information about arithmetically interesting problems. The most studied examples include Dirichlet L-functions, modular and Maass forms; see [IK] for details.
Definition 5.2 (The 1-level density). Let φ be an even Schwartz function and L(s, f) an L-function as above, with non-trivial zeros $1/2 + i\gamma_{f,j}$. The 1-level density is
$$D_f(\phi) = \sum_j \phi\left(\gamma_{f,j}\,\frac{\log C_f}{2\pi}\right).$$
We may generalize the above to n-level densities; while for some applications it is essential to understand these generalizations 58, for many purposes studying the 1-level density suffices. This statistic differs in several important ways from the n-level correlation.
The first is that individual zeros now contribute in the limit. Moreover, most of the contribution is from the zeros near the central point (thus this statistic is sensitive to what is happening there). This is because φ is of rapid decay, so once we are a couple of average spacings away, there is negligible contribution. There is a trade-off, namely it no longer suffices to study just one L-function. The reason is we always need something to average over, and there are just too few zeros near the central point on this scale. The solution is to look at the zeros near the central point for many L-functions that share common properties. We average the 1-level densities over the family. Unlike the n-level correlations, where we are looking high up on the critical line and see the same behavior in all L-functions, we see very different behavior near the central point, depending on what family of L-functions we study. Katz and Sarnak [KaSa1,KaSa2] conjecture that to any 'nice' family of L-functions, as the conductors tend to infinity the 1-level density of the family agrees with the N → ∞ scaling limit of a classical compact group (typically N × N unitary, orthogonal or symplectic matrices). Moreover, these groups all have distinguishable behavior. In other words, the universality seen in the n-level correlations is broken.
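The following minimal sketch shows how such a statistic could be evaluated numerically. It assumes the 1-level density has the form $\sum_j \phi(\gamma_j \log C / 2\pi)$, where the $\gamma_j$ are the imaginary parts of the zeros and C is the analytic conductor; the zero lists in the example are hypothetical placeholders, not computed zeros of any L-function, and the Gaussian test function is just one convenient even Schwartz function.

```python
import math


def one_level_density(gammas, conductor, phi):
    """Sum phi over the zeros' imaginary parts rescaled by log(C) / (2 pi)."""
    scale = math.log(conductor) / (2 * math.pi)
    return sum(phi(g * scale) for g in gammas)


def family_average(family, phi):
    """Average the 1-level density over a family given as
    (gammas, conductor) pairs, one pair per L-function."""
    total = sum(one_level_density(g, c, phi) for g, c in family)
    return total / len(family)


def gaussian(x):
    """An even Schwartz test function, exp(-pi x^2)."""
    return math.exp(-math.pi * x * x)


if __name__ == "__main__":
    # Hypothetical data: two "L-functions" with made-up low-lying zeros.
    family = [
        ([-2.0, -1.0, 1.0, 2.0], math.exp(2 * math.pi)),
        ([-1.5, 0.0, 1.5], math.exp(2 * math.pi)),
    ]
    print(family_average(family, gaussian))
```

Because the test function decays rapidly, only zeros within a few rescaled spacings of the central point contribute noticeably, which is the sensitivity to low-lying zeros described above.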
Before describing the proof, we give some examples of families of L-functions and the corresponding symmetries.
(1) Dirichlet L-functions: Let m be a prime and consider all non-principal 59 Dirichlet characters χ from (Z/mZ)* to the complex numbers of absolute value 1. To each character χ we have an L-function $L(s, \chi) = \sum_n \chi(n)/n^s = \prod_p (1 - \chi(p)p^{-s})^{-1}$. As m → ∞, the behavior agrees with the scaling limit of unitary matrices. 60
(2) Cuspidal newforms: Let $\Gamma_0(N) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \mathbb{Z},\ ad - bc = 1,\ c \equiv 0 \bmod N \right\}$. We say f is a weight k holomorphic cuspform of level N if $f(\gamma z) = (cz + d)^k f(z)$ for all $\gamma \in \Gamma_0(N)$, where γz = (az + b)/(cz + d). As k or N tend to infinity, the behavior agrees with the scaling limit of orthogonal matrices. 61
(3) Elliptic curves: Let $E : y^2 = x^3 + A(T)x + B(T)$ be an elliptic curve over Q(T). For each t ∈ Z we can specialize and get an elliptic curve $E_t : y^2 = x^3 + A(t)x + B(t)$. We can build an L-function from the numbers $a_p(t)$, where $a_p(t)$ is related to the number of solutions to $y^2 \equiv x^3 + A(t)x + B(t) \bmod p$. Our family is now t ∈ [X, 2X] with X → ∞, and these families have orthogonal symmetry.

58 For example, there are three flavors of orthogonal symmetry, and their 1-level densities are indistinguishable if the support of $\hat{\phi}$ is contained in (−1, 1); however, the 2-level densities of all three are distinguishable for arbitrarily small support [Mil1]. Another example is in obtaining better decay rates for high vanishing at the central point in a family [HM].
59 This means we avoid the trivial character $\chi_0$, where $\chi_0(n) = 1$ if n is relatively prime to m and 0 otherwise; this character gives rise to a simple modification of ζ(s).
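To illustrate the elliptic curve family, the sketch below counts solutions of the specialized congruence modulo a prime p and converts the count into a coefficient. It assumes the common convention $a_p = p - \#\{(x, y) \bmod p : y^2 \equiv x^3 + Ax + B\}$ at primes of good reduction; the text above only says that $a_p(t)$ is "related to" this count, so the exact normalization here is an assumption, and the function names are illustrative.

```python
def count_affine_points(a, b, p):
    """Number of pairs (x, y) mod p with y^2 = x^3 + a*x + b (mod p)."""
    # Tabulate how many y give each square residue, then look up each x.
    square_counts = {}
    for y in range(p):
        r = y * y % p
        square_counts[r] = square_counts.get(r, 0) + 1
    return sum(square_counts.get((x**3 + a * x + b) % p, 0) for x in range(p))


def trace_of_frobenius(a, b, p):
    """Assumed convention: a_p = p minus the affine point count."""
    return p - count_affine_points(a, b, p)


def family_traces(A, B, t_values, p):
    """Specialize y^2 = x^3 + A(t) x + B(t) at each t and return {t: a_p(t)}."""
    return {t: trace_of_frobenius(A(t), B(t), p) for t in t_values}


if __name__ == "__main__":
    # The curve y^2 = x^3 - x at a few small primes of good reduction.
    for p in (3, 5, 7, 13):
        print(p, trace_of_frobenius(-1, 0, p))
```

With this convention the values satisfy Hasse's bound $|a_p| \le 2\sqrt{p}$ at primes of good reduction, which gives a simple sanity check on any implementation.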
In these and many other cases (see [DM1,FI,Gao,Gü,HM,HR,ILS,KaSa2,Mil1,Mil5,Ro,Rub,Yo2] for a representative sampling of results), we can show for suitably restricted φ that the 1-level density agrees with the scaling limit of one of the mentioned classical compact groups.
5.3. 1-level Density (Proofs). We briefly describe how the proofs proceed. We concentrate on the family of Dirichlet characters with prime conductor m tending to infinity. As in the proof of Wigner's semi-circle law, there are three steps.
⋄ We first must determine the correct scale to study the zeros near the central point. The answer can be shown to follow from the functional equation; in this case, we normalize the zeros by the factor (log(m/π))/(2π).
⋄ The next step is to relate what we want to study, namely the zeros near the central point, to something we have a chance of understanding, in this case the coefficients of the L-function, χ(n). We do this with a generalization of Riemann's explicit formula (this is the analogue of the Eigenvalue Trace Lemma). Our result is a straightforward generalization of the formula Riemann used to connect the zeros of ζ(s) with the distribution of primes. The difference here is that we have a more general test function. It can be shown (see [ILS, RS]) that for φ an even Schwartz function and $L(s, \chi) = \sum_n \chi(n)/n^s$ a Dirichlet L-function from a non-trivial character χ with conductor m and zeros $\rho = \frac{1}{2} + i\gamma_{\chi,\rho}$, then

Note the left hand side is a sum over zeros and the right hand side a sum over the coefficients in the L-function. We also have $\hat{\phi}$ on the right hand side. We now see how the support condition enters. As we assume $\hat{\phi}$ is supported in (−σ, σ), this restricts the sums on the right to having only finitely many terms. This is the 1-level density for one Dirichlet L-function L(s, χ). We now average over all non-principal χ (i.e., $\chi \ne \chi_0$). We will see below that there are m − 2 such characters, and thus we obtain (5.10).
⋄ Similar to the proof of Wigner's semi-circle law, our explicit formula would be useless unless we can perform the averaging over the family. We briefly review some needed results about these characters (see for instance [MT-B]), and then show how to handle the sums. Unfortunately, the averaging formulas in number theory are significantly worse than the corresponding averaging formulas in random matrix theory; this leads to far more restricted results in number theory. 62 For m prime, the group (Z/mZ)* is cyclic of order m − 1; 63 denote its generator by g. Let $\zeta_{m-1} = e^{2\pi i/(m-1)}$. The principal character $\chi_0$ is given by
$$\chi_0(k) = \begin{cases} 1 & \text{if } (k, m) = 1 \\ 0 & \text{if } (k, m) > 1. \end{cases}$$
As the characters are group homomorphisms from (Z/mZ)* to complex numbers of absolute value 1, the m − 2 primitive characters are determined by their action on g. 64 Thus there exists an ℓ such that $\chi(g) = \zeta_{m-1}^{\ell}$. A simple calculation (use the explicit representation for the characters and the geometric series formula) shows that
$$\sum_{\chi} \chi(k) = \begin{cases} m - 1 & \text{if } k \equiv 1 \bmod m \\ 0 & \text{otherwise,} \end{cases}$$
where we are summing over all characters, including the principal one. It is easy to remove the contribution from the principal character, and we find for any prime p ≠ m
$$\sum_{\chi \ne \chi_0} \chi(p) = \begin{cases} -1 + m - 1 & \text{if } p \equiv 1 \bmod m \\ -1 & \text{otherwise.} \end{cases} \tag{5.13}$$

60 We could consider the related family of quadratic Dirichlet characters coming from a fundamental discriminant d ∈ [X, 2X] with X → ∞; this family agrees with the scaling limit of symplectic matrices.
61 There are actually three flavors of orthogonal groups. If all the signs of the functional equation are even, the corresponding group should be SO(2N), and similarly if the signs are all odd. We refer the reader to the previously mentioned surveys for the details.
62 In fact, the case being studied here has the best averaging formula! For cuspidal newforms we have the Petersson formula, and for families of elliptic curves we can use periodicity in evaluating Legendre sums of cubic polynomials. In general, however, unless our family is obtained in some manner from a family such as one of these, we do not possess the needed averaging formula.
63 (Z/mZ)* is the set {1, 2, ..., m − 1} where multiplication and addition are modulo m; for example, if m = 17, x = 11 and y = 9 then xy = 99 = 5 · 17 + 14, so xy ≡ 14 mod 17, while x + y = 20 ≡ 3 mod 17. The hardest step in proving that this is a group under multiplication is finding inverses; one way to accomplish this is with the Euclidean algorithm.
64 These properties mean χ(1) = 1, χ(xy) = χ(x)χ(y) and $\chi(x^r) = \chi(x)^r$. Further, though initially only defined on (Z/mZ)*, we extend the definition to all of Z by setting χ(n + ℓm) = χ(n). As $g^{m-1} \equiv 1 \bmod m$, $\chi(g^{m-1}) = 1$ or $\chi(g)^{m-1} = 1$ for all χ. Thus χ(g) has to be an (m − 1)-st root of unity. We see each of these roots gives rise to a character (one of which is the principal or trivial character); by multiplicativity once we know the character's action on the generator we know it everywhere.
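The character sums above are easy to verify numerically. The sketch below builds all m − 1 characters modulo an odd prime m from a generator g via $\chi_\ell(g^a) = e^{2\pi i \ell a/(m-1)}$, and then sums χ(p) over the non-principal characters. The helper names are illustrative, and the brute-force generator search is only meant for small m.

```python
import cmath


def find_generator(m):
    """Smallest g whose powers give all of (Z/mZ)^*; assumes m is an odd prime."""
    for g in range(2, m):
        seen, x = set(), 1
        for _ in range(m - 1):
            x = x * g % m
            seen.add(x)
        if len(seen) == m - 1:
            return g
    raise ValueError("no generator found; is m an odd prime?")


def characters(m):
    """All m - 1 characters mod m, each a dict n -> chi(n) for n = 1..m-1.
    The entry with index 0 is the principal character."""
    g = find_generator(m)
    discrete_log, x = {}, 1
    for a in range(m - 1):          # x runs through g^0, g^1, ..., g^(m-2)
        discrete_log[x] = a
        x = x * g % m
    zeta = cmath.exp(2j * cmath.pi / (m - 1))
    return [{n: zeta ** (ell * a) for n, a in discrete_log.items()}
            for ell in range(m - 1)]


def nonprincipal_sum(m, p):
    """Sum of chi(p) over the m - 2 non-principal characters (p not divisible by m)."""
    return sum(chi[p % m] for chi in characters(m)[1:])


if __name__ == "__main__":
    print(nonprincipal_sum(17, 103))  # 103 = 6 * 17 + 1, so p is 1 mod m
    print(nonprincipal_sum(17, 3))    # p is not 1 mod m
```

Up to floating point error, the first sum should be m − 2 and the second −1, matching the two cases of the averaging formula. This dichotomy is what makes the family average computable: only primes congruent to 1 modulo m contribute a large term.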
We expand on the above. Assuming the (Generalized) Riemann Hypothesis 66 , the zeros of our L-functions lie on the critical line ℜ(s) = 1/2. Thus we may order them, and it makes sense to talk about spacings between adjacent zeros. Following the Pólya-Hilbert dream (and there are numerous people pursuing this), we may even try to search for a physical system whose energy levels are these zeros! Regardless, we have two sequences of real numbers: the zeros of our L-function(s) and the energy levels of our heavy nucleus (nuclei).
How do we understand the structure of the nucleus? We bombard it with low energy neutrons, and see what happens. The analogy on the number theory side is we 'bombard' the zeros of our L-function with our Schwartz test function; what we 'see' now is a sum over primes.
In physics, ideally we would like to be able to send in a neutron with any energy; unfortunately, current technology only allows us to send in neutrons with energy in a given band. Thus we cannot obtain perfect information about the internal structure of the nucleus and its energy levels. This corresponds exactly with number theory, where the support of the Fourier transform of our Schwartz test function is playing the role of the neutron's energy. Bombarding the zeros with a test function is equivalent to summing the Fourier transform against related quantities, and our averaging formulas are only able to handle certain restricted sums.
It is worth dwelling on this last observation a little more. The Heisenberg Uncertainty Principle can be recast in mathematical terms as a statement about a function and its Fourier transform, namely it is not possible to simultaneously localize f and $\hat{f}$ (i.e., the product of the variances, their spreads about their means, cannot be too small). Ideally we would like to take δ(x − a) as our test function, as this would allow us to understand whether or not there are zeros at a. In particular, if we take a = 0 we would understand the behavior at the critical point. Unfortunately the Fourier transform of δ(x) is identically 1; this corresponds to having absolutely no control on the prime sum side.

5.5. Future avenues. Random matrix theory has enjoyed remarkable success in suggesting questions and predicting answers for number theory. The n-level correlations and densities are two of many examples. One problem, however, is that random matrix theory often cannot detect the arithmetic of the L-functions, and this must be added (in a sometimes unsatisfying manner). A terrific example of this is in the study of moments of L-functions (see [CGo,CGh,CFKRS,KeSn1,KeSn2] and the references therein); see also the discussion below on the hybrid product formulas of Gonek, Hughes and Keating.

For families of L-functions $\{L(s, f_i)\}_{f_i \in \mathcal{F}_i}$ (i ∈ {1, 2, ..., I}), we can form the Rankin-Selberg convolution and study the family $\{L(s, f_1 \otimes \cdots \otimes f_I)\}_{(f_1, \dots, f_I) \in \mathcal{F}_1 \times \cdots \times \mathcal{F}_I}$; see [IK] for details. 67 Given a family $\{L(s, f_i)\}_{f_i \in \mathcal{F}_i}$, the Katz-Sarnak conjecture states the behavior of zeros near the central point (as the conductors tend to infinity) agrees with the N → ∞ scaling limit of a subgroup of unitary matrices U(N). A natural question to ask is how the behavior of zeros in the family of convolutions is related to the behavior of the constituent families. Dueñez and Miller show that if the families of L-functions are 'nice' (see [DM2] for statements and proofs 68), then we can attach a symmetry constant $c(\mathcal{F}_i)$ to each family satisfying the following conditions: (1) $c(\mathcal{F}_i)$ is 0 if the family has unitary symmetry, 1 if the family has symplectic symmetry and −1 if the family has orthogonal symmetry; (2) $c(\mathcal{F}_i \times \mathcal{F}_j) = c(\mathcal{F}_i) \cdot c(\mathcal{F}_j)$. In other words, for many families the symmetry type of the convolution is the product of the symmetry types. 69 This leads to a very nice map from families of L-functions to {0, ±1}. 70

Another problem is that the main term in the 1-level density agrees with random matrix theory, but the arithmetic of the family does not surface until we examine the lower order terms (which control the rate of convergence; see for example [FI,Mil5,Yo1]). One promising line of research is the L-functions Ratios Conjecture [CFZ1,CFZ2], which is supported by corresponding calculations for random matrix ensembles (see [CS,GJMMNPP,Mil4,Mil6,MilMo,Sto] for some recent work supporting these conjectures, especially [CS] for a very accessible introduction to the method and a summary of its successes). Another approach is through hybrid product formulas [GHK]. A typical L-function has two product representations, one as an Euler product over primes, and one as a Hadamard product over its zeros. In this approach an L-function is modeled by the product of a partial Euler product and a partial Hadamard product. The Hadamard piece is believed to be well-modeled by random matrix theory, while the Euler product introduces the arithmetic. Thus the interplay between random matrix theory and number theory continues, and what began as a chance meeting in the 1970's now yields over 1,000,000 hits on a Google search 71 (as of August 2009).

67 If $L(s, f_i) = L_{\infty;i}(s) \prod_p \prod_{j=1}^{m_i} (1 - \alpha_{j;i}(p)p^{-s})^{-1}$, then $L(s, f_1 \otimes f_2)$ is related to a product over primes of $\prod_{j=1}^{m_1} \prod_{k=1}^{m_2} (1 - \alpha_{j;1}(p)\alpha_{k;2}(p)p^{-s})^{-1}$. It is conjectured that these functions have functional equations, satisfy the Riemann hypothesis, et cetera. The existence of the Rankin-Selberg convolution is known for just a few choices of the $f_i$'s.
68 There are several difficulties with the proofs in general, ranging from not knowing properties of the Rankin-Selberg convolution in general to not having a good averaging formula over general families.
69 A special case of this theorem was discovered in studies of a family of GL(4) and a family of GL(6) L-functions. The analysis there led to a disproof of a folklore conjecture that the theory of low-lying zeros of L-functions is equivalent to a theory of the distribution of signs of the functional equations in the family; see [DM1,DM2] for details. The key ingredient in the proofs is the universality of the second moments of the Satake parameters $\alpha_{j;i}(p)$; this is similar to the universality found by Rudnick and Sarnak [RS] in the n-level correlations. The higher moments of the Satake parameters control the rate of convergence to the random matrix predictions.
70 Instead of attaching a symmetry constant, one can additionally attach a symmetry vector which incorporates other information, such as the rank of the family.