A New Class of Weighted CUSUM Statistics

A change point is a location or time at which observations or data obey two different models: before and after. In real problems, we may know some prior information about the location of the change point, say at the right or left tail of the sequence. How does one incorporate this prior information into current cumulative sum (CUSUM) statistics? We propose a new class of weighted CUSUM statistics with three different types of quadratic weights accounting for different prior positions of the change points. One interpretation of the weights is the mean duration in a random walk. Under the normal model with known variance, the exact distributions of these statistics are explicitly expressed in terms of eigenvalues. Theoretical results about the explicit differences between the distributions are valuable. The expansions of the asymptotic distributions are compared with the expansions of the limit distributions of the Cramér-von Mises statistic and the Anderson-Darling statistic. We provide some extensions from independent normal responses to more interesting models, such as graphical models, mixtures of normals, Poisson models, and weakly dependent models. Simulations suggest that the proposed test statistics have better power than graph-based statistics. We illustrate their application to a detection problem with video data.


Introduction
A change point is a location or time at which observations or data obey two different models: before and after. Detecting change points is a nontrivial problem and has been studied by many authors; see a book treatment in [1] and recent advances in CUSUM-based change point tests [2][3][4]. In real problems, we may know some prior information about the location of the change point, say at the right or left tail of the sequence. How does one incorporate prior information into current CUSUM-based statistics? We consider a new class of weighted CUSUM statistics for a simple model and provide some extensions to more complicated models.
Given a series of univariate random variables Y_1, . . . , Y_n, we consider the problem of testing whether there is a change in the mean of their distribution. The test statistic we use is

S_n(Y; τ, γ) = ∑_{k=1}^{n−1} { |∑_{j=1}^{k} (Y_j − Ȳ)| / (σ w_k(τ)^{1/2}) }^γ,    (1)

where Y = (Y_1, . . . , Y_n)′, Ȳ = n^{−1} ∑_{j=1}^{n} Y_j, σ² denotes the common variance of the Y_i, γ > 0, and

w_k(τ) = −(k − τ)² + max{τ², (n − τ)²},

where τ = 0, n/2, and n account for three different prior positions of the change point, respectively. We call S_n a weighted CUSUM (WC) statistic. Inspired by the change point literature, we consider these types of quadratic weights. The term max{τ², (n − τ)²} = max_{0≤j≤n} (j − τ)² is introduced to ensure that the weight w_k(τ) is positive for any 0 < k < n. Usually, we choose γ = 2 to capture the change in the mean. When τ = n/2, the weight w_k(n/2) = k(n − k) corresponds to the likelihood ratio test; see Csörgő and Horváth [1] and a related review in Jandhyala et al. [5]. If prior information indicates that the change point more likely occurs in the right or left tail of the sequence, we can set the weight w_k(0) = (n + k)(n − k) (left drifted to the symmetry center point 0) or w_k(n) = k(2n − k) (right drifted to the symmetry center point n) to improve the power of the test.
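To make the weights concrete, the following small sketch (our illustration, not code from the paper) evaluates w_k(τ) from its quadratic definition and checks the three closed forms quoted above, including positivity for 0 < k < n.

```python
def w(k, tau, n):
    """Quadratic weight w_k(tau) = -(k - tau)^2 + max(tau^2, (n - tau)^2)."""
    return -(k - tau) ** 2 + max(tau ** 2, (n - tau) ** 2)

n = 10
for k in range(1, n):
    assert w(k, n / 2, n) == k * (n - k)        # tau = n/2: likelihood-ratio weight
    assert w(k, 0, n) == (n + k) * (n - k)      # tau = 0: centre of symmetry at 0
    assert w(k, n, n) == k * (2 * n - k)        # tau = n: centre of symmetry at n
    assert min(w(k, n / 2, n), w(k, 0, n), w(k, n, n)) > 0
print([w(k, n // 2, n) for k in range(1, n)])
```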
One interpretation of the weights is the mean duration in a random walk {X_i, i ≥ 0} on N + 1 states, {0, 1, . . . , N}, whose transition probabilities are given by P(X_{i+1} = k ± 1 | X_i = k) = 1/2 for k = 1, . . . , N − 1, P(X_{i+1} = 0 | X_i = 0) = 1, and P(X_{i+1} = N | X_i = N) = 1. Let T denote the random time at which the process first reaches 0 or N. Then, for k = 1, . . . , n − 1, E(T | X_0 = k) = k(n − k) = w_k(n/2) if N = n; E(T | X_0 = k) = k(2n − k) = w_k(n) if N = 2n; and E(T | X_0 = n − k) = (n + k)(n − k) = w_k(0) if N = 2n. Figure 1 depicts four vectors w_k for n = 10. The centers of symmetry of these quadratic weights are at different positions. The weights in (1) can be thought of as an inverse prior probability on the change point, giving S_n a Bayesian flavor, as in Gardner [6], who used the uniform prior n^{−2}, or Perron [7], who devised a unit-root test for time series. From a frequentist perspective, the weighted sum statistic offers an alternative to the maximum statistic most commonly used (Csörgő and Horváth [1]); small simulations (omitted here) suggest it has higher power, especially when the change point is at the center of the sequence for any τ, in the left tail of the sequence for τ = n, and in the right tail of the sequence for τ = 0.
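The mean-duration interpretation can be verified numerically. The sketch below (our illustration) solves the standard first-step equations E(T | X_0 = k) = 1 + ½E(T | X_0 = k − 1) + ½E(T | X_0 = k + 1), with E(T | X_0 = 0) = E(T | X_0 = N) = 0, and compares the solution with the three quadratic weights.

```python
import numpy as np

def mean_duration(N):
    """Expected absorption time E(T | X_0 = k), k = 1..N-1, for the simple
    symmetric random walk on {0, ..., N} absorbed at 0 and N."""
    # First-step analysis: E_k - 0.5 E_{k-1} - 0.5 E_{k+1} = 1, E_0 = E_N = 0.
    A = np.eye(N - 1)
    for k in range(N - 1):
        if k > 0:
            A[k, k - 1] = -0.5
        if k < N - 2:
            A[k, k + 1] = -0.5
    return np.linalg.solve(A, np.ones(N - 1))

n = 10
E = mean_duration(n)        # N = n  gives w_k(n/2) = k(n - k)
E2 = mean_duration(2 * n)   # N = 2n gives w_k(n) = k(2n - k)
for k in range(1, n):
    assert np.isclose(E[k - 1], k * (n - k))
    assert np.isclose(E2[k - 1], k * (2 * n - k))
    # starting from n - k on {0, ..., 2n} gives w_k(0) = (n + k)(n - k)
    assert np.isclose(E2[n - k - 1], (n + k) * (n - k))
```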
For these types of quadratic weights, a couple of questions naturally arise: will different weights lead to different distributions of WC in Equation (1)? If so, how significant will the differences in the distribution be? If two different weights lead to the same distribution, are there any intrinsic reasons? Although one can estimate the distribution of WC by simulation, theoretical results about the explicit differences of the distributions are valuable. Moreover, simulations and computations of eigenvalues for large n are computationally expensive. To answer the aforementioned questions, we shall study the distribution of the WC theoretically; we derive Karhunen-Loève expansions of the exact and asymptotic distributions of the WC statistics. The calculation of a Karhunen-Loève expansion is a nontrivial task, even under the normal model. Gardner [6] discussed the uniform weight under the normal assumption, but the quadratic weights we consider here increase the difficulty substantially. We present below a unified theory that enables us to establish the distribution of WC using dual Hahn polynomials. The asymptotic distributions for the quadratic weights w k (0) and w k (n) are identical, and the expansions of asymptotic distributions between w k (0) and another quadratic weight w k (n/2) differ by an odd number of terms. We make a comparison with the expansion of the limit distributions of the Cramér-von Mises statistic and the Anderson and Darling statistics; see also MacNeill [8].
The WC has some variants in other models. For example, in the graphical model, γ can be 1 if we replace Y with a count of edges. Here, the main challenge is to approximate the covariance of edge-count statistics under the null permutations. In the normal mixed model, a variant of WC can be derived by considering a marginal likelihood function. In the Poisson mixed model, however, the calculation of the marginal likelihood function is hindered by an integral without a closed form. To approximate this integral, one may use the Laplace or saddle point approximation [9][10][11][12][13]. Here, we apply the saddle point approximation to the integral and provide a variant of WC related to the log link. For the classical change point Poisson model without latent variables, see [1] (p. 27); for the Poisson process with a change point, we refer readers to Akman and Raftery [14] and Loader [15]. Moreover, to accommodate weak dependence in practice, we avoid the estimation of the variance and provide a randomized version of WC.
The structure of the paper is outlined as follows. In Section 2, we derive the explicit expansions of the distribution of the WC statistics and explore their connections with the Karhunen-Loève expansion. We derive extended versions of WC by considering the observations as nodes in the graphical model and allowing the observations from a normal or Poisson mixed model to be weakly dependent. In Section 3, we discuss the power of the proposed WC test. In Section 4, we use simulation to compare the performance of this test with that of a graph-based test statistic. In Section 5, we present an application for video data. In Section 6, we discuss the extension to multiple change points and suggest future work on other quadratic weights.

Explicit Distribution for a Normal Model
We assume here that {Y i } are independent following a normal distribution with a common known variance σ 2 . The case of unknown σ 2 is addressed in Remark 3, and an extension relaxing the independence assumption is given in Section 2.6.
Following the derivation in Gardner [6], we write (1) as a quadratic form

S_n(Y; τ, 2) = σ^{−2} Y′QY,    (3)

where p_k = p_k(τ) = w_k^{−1}(τ) and n²Q = AA′ with A = (A_1, · · · , A_{n−1}). Here, A_k = p_k^{1/2}(n − k, · · · , n − k, −k, · · · , −k)′ such that the first k entries of A_k are p_k^{1/2}(n − k) and the last n − k entries are −p_k^{1/2}k. By using the recurrence identity and the dual Hahn polynomial, we obtain a new exact result in terms of the eigenvalues of Q in (3).

Theorem 1. Assume that {Y_i} are independent normally distributed random variables with a common mean and known variance σ². The exact distribution of S_n(Y; τ, 2) is

S_n(Y; τ, 2) =_d ∑_{k=1}^{n} λ_k(τ) Z_k²,    (4)

where the Z_k are independent standard normal random variables, λ_n(τ) = 0, and, for k = 1, . . . , n − 1,

λ_k(n/2) = 1/{k(k + 1)},  λ_k(0) = λ_k(n) = 1/{2k(2k + 1)}.

The proof of Theorem 1 is given in Appendix A. We make the following remarks.

Remark 1.
It is interesting that λ_{2k}(n/2) = λ_k(0) for all 0 < k < n/2; namely, the eigenvalues for w_k(n/2) with even indices coincide with the eigenvalues for w_k(0) with indices less than n/2. As the sample size increases from n to n + 1, the n − 1 nonzero eigenvalues are retained, and the added nonzero eigenvalue must be 1/{n(n + 1)} for w_k(n/2) or 1/{2n(2n + 1)} for w_k(0) or w_k(n). This phenomenon does not arise for the uniform weights of Gardner [6]. As far as we know, this recursive property of the eigenvalues for the non-uniform weights is new. Figure 2 depicts the pattern of eigenvalues (cross products of rows and columns) illustrated by dots for the three weights w_k(n/2) for τ = n/2 (blue), w_k(0) for τ = 0 (green), and w_k(n) for τ = n (purple) with the increase of n.
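The eigenvalue recursion described in this remark pins down the closed forms λ_k(n/2) = 1/{k(k + 1)} and λ_k(0) = λ_k(n) = 1/{2k(2k + 1)}; the sketch below (our verification, with the quadratic form Q built exactly as described in the text) confirms them numerically for a small n.

```python
import numpy as np

def Q_matrix(n, tau):
    """Q = A A' / n^2 with A_k = p_k^{1/2} (n-k, ..., n-k, -k, ..., -k)',
    where p_k = 1 / w_k(tau)."""
    w = np.array([-(k - tau) ** 2 + max(tau ** 2, (n - tau) ** 2)
                  for k in range(1, n)], dtype=float)
    A = np.zeros((n, n - 1))
    for j, k in enumerate(range(1, n)):
        A[:k, j] = (n - k) / np.sqrt(w[j])
        A[k:, j] = -k / np.sqrt(w[j])
    return A @ A.T / n ** 2

n = 12
closed_forms = {n / 2: lambda k: 1 / (k * (k + 1)),
                0:     lambda k: 1 / (2 * k * (2 * k + 1)),
                n:     lambda k: 1 / (2 * k * (2 * k + 1))}
for tau, lam in closed_forms.items():
    eig = np.sort(np.linalg.eigvalsh(Q_matrix(n, tau)))[::-1]
    ref = np.array([lam(k) for k in range(1, n)])   # n - 1 nonzero eigenvalues
    assert np.allclose(eig[:n - 1], ref)            # matches the closed form
    assert abs(eig[n - 1]) < 1e-10                  # plus one zero eigenvalue
```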

Remark 2.
The distribution in (4) can be calculated numerically using Imhof's method [16] or simulated by a Monte Carlo method, but accurate analytical approximations are potentially faster and more stable. A saddle point approximation to the distribution of quadratic forms in normal variates was studied in Kuonen [17], building on Daniels [9,18] and Lugannani and Rice [19].

Remark 3.
When the variance σ 2 is unknown, we can replace σ 2 with a consistent estimator by using Slutsky's lemma. This also holds in Corollary 1. For dependent data, one issue is to give a valid estimate of the variance; see Section 2.6.

Karhunen-Loève Expansion
The squared integral of a Brownian bridge arises in the study of tests of goodness-of-fit. Given a sample of independent and identically distributed random variables with an empirical distribution function F_n(x), the statistic

ω_n² = n ∫_{−∞}^{∞} {F_n(x) − F(x)}² ψ(F(x)) dF(x)

provides a test of the null hypothesis that the observations come from the distribution F(·). The Cramér-von Mises statistic has ψ(t) ≡ 1, and the Anderson-Darling statistic has ψ(t) = 1/{t(1 − t)}. Here, we shall discuss two new weights: ψ(t) = 1/{t(2 − t)} and ψ(t) = 1/(1 − t²). MacNeill [8] showed that

∫_0^1 {B(t) − tB(1)}² dt =_d ∑_{k=1}^{∞} (πk)^{−2} Z_k²,

where {√2 sin(kπt), k = 1, 2, . . .} is an orthonormal basis in L²(0, 1), B(t) is a standard Brownian motion, and B(t) − tB(1) is a Brownian bridge.
Anderson and Darling [20] showed that

∫_0^1 {B(t) − tB(1)}²/{t(1 − t)} dt =_d ∑_{k=1}^{∞} {k(k + 1)}^{−1} Z_k².

In Appendix B, we use Jacobi polynomials to derive the Karhunen-Loève expansion for the integrals of the weighted square of the Brownian bridge with the two new weights ψ(t) = 1/{t(2 − t)} and ψ(t) = 1/(1 − t²). The results are stated in the following theorem.

Theorem 2. The two weights ψ(t) = 1/{t(2 − t)} and ψ(t) = 1/(1 − t²) lead to the same Karhunen-Loève expansions:

∫_0^1 {B(t) − tB(1)}²/{t(2 − t)} dt =_d ∑_{k=1}^{∞} {2k(2k + 1)}^{−1} Z_k²,    (5)

∫_0^1 {B(t) − tB(1)}²/(1 − t²) dt =_d ∑_{k=1}^{∞} {2k(2k + 1)}^{−1} Z_k².    (6)

The proofs of the above two equalities are provided in Appendix B. One can see the equivalence of the two equalities by the change of variable t → 1 − t.
Given different probabilities p, Table 1 presents the critical values c_p for which p = P(χ²_n(τ) ≤ c_p) for different n, where χ²_n(τ) = ∑_{k=1}^{n} λ_k(τ)Z_k²; the calculations of critical values for finite n are based on Imhof's method [16] implemented in the R package CompQuadForm [21]. A few critical values are tabulated in Anderson and Darling [22] for χ²_n(τ) with n = ∞. One can see that the critical values converge very quickly as n increases to ∞. In fact, we can connect the limit distribution of the WC statistic and its functional limit distribution through the Karhunen-Loève expansion of the integral of the weighted square of the Brownian bridge in terms of the Jacobi polynomials. Theorem 1 immediately implies the following asymptotic distribution as n → ∞.
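As a rough cross-check on such tables (our sketch; the paper itself uses Imhof's method via CompQuadForm), the limiting critical values can be approximated by plain Monte Carlo on a truncated expansion ∑_{k≤K} λ_k Z_k², with the λ_k taken as 1/{k(k + 1)} for τ = n/2 and 1/{2k(2k + 1)} for τ = 0 or n.

```python
import numpy as np

def wc_quantile(tau_type, p=0.95, K=500, reps=40_000, seed=1):
    """Monte Carlo p-quantile of the truncated series sum_{k<=K} lambda_k Z_k^2.
    'center': lambda_k = 1/(k(k+1)) (tau = n/2);
    'tail':   lambda_k = 1/(2k(2k+1)) (tau = 0 or n)."""
    k = np.arange(1, K + 1)
    lam = 1 / (k * (k + 1)) if tau_type == "center" else 1 / (2 * k * (2 * k + 1))
    rng = np.random.default_rng(seed)
    z2 = rng.standard_normal((reps, K)) ** 2     # Z_k^2 draws
    return float(np.quantile(z2 @ lam, p))

c_center = wc_quantile("center")   # near the classical Anderson-Darling 5% point
c_tail = wc_quantile("tail")
print(c_center, c_tail)
```

The truncation error is small here because the neglected eigenvalues sum to 1/(K + 1) for the centered weight.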

Corollary 1.
Under the assumptions of Theorem 1, S_n(Y; τ, 2) =_d ∑_{k=1}^{n} λ_k(τ)Z_k² converges to ∑_{k=1}^{∞} λ_k(τ)Z_k² in probability as n → ∞. By the functional limit theorem, the limit can also be written as

∫_0^1 {B(t) − tB(1)}² ψ_τ(t) dt,

with ψ_{n/2}(t) = 1/{t(1 − t)}, ψ_0(t) = 1/(1 − t²), and ψ_n(t) = 1/{t(2 − t)}.

Graphical Model
Assume the {Y_{i,j}, 1 ≤ i ≤ n, 1 ≤ j ≤ q} are independent. Consider testing the null hypothesis that all Y_{i,j} have common mean µ and variance σ² against the alternative that the mean and variance are (µ_−, σ²_−) for i ≤ k* and (µ_+, σ²_+) for i > k*, where µ_− ≠ µ_+ or σ²_− ≠ σ²_+; the parameters µ, µ_−, µ_+, σ², σ²_−, and σ²_+ are unknown. A graphical model can be established by treating each q-dimensional vector as a node and assigning the Euclidean distance between any two vectors. Here, we consider a path P with an ordering of nodes (v_1, . . . , v_n) and edges (v_i, v_{i+1}) for i = 1, . . . , n − 1. Associated with the path, the count of edges that connect nodes between the two disjoint sets N_k = {1, . . . , k} and N̄_k = {k + 1, . . . , n} is defined to be

C_P(N_k, N̄_k) = ∑_{i=1}^{n−1} I{(v_i ∈ N_k, v_{i+1} ∈ N̄_k) or (v_i ∈ N̄_k, v_{i+1} ∈ N_k)},

where I(·) is an indicator function that takes the value 1 if its argument is true and 0 otherwise. The count C_P(N_k, N̄_k) is the number of edges between the two groups N_k and N̄_k.
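The edge count along a path is easy to compute, and its permutation-null mean can be checked by brute force for small n. The factor 2k(n − k)/n used below is our own derivation (each of the n − 1 edges has two endpoints forming a uniformly random pair of distinct labels, hence crosses the split with probability 2k(n − k)/{n(n − 1)}); it is an illustration, not a formula quoted from [23].

```python
from itertools import permutations

def edge_count(path, k):
    """C_P(N_k, N_k-bar): edges (v_i, v_{i+1}) of the path with one endpoint
    in {1, ..., k} and the other in {k+1, ..., n}."""
    return sum((a <= k) != (b <= k) for a, b in zip(path, path[1:]))

n, k = 6, 2
counts = [edge_count(p, k) for p in permutations(range(1, n + 1))]
mean = sum(counts) / len(counts)   # exact permutation expectation over all n! orders
assert abs(mean - 2 * k * (n - k) / n) < 1e-9
```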
Denote the expectation and variance of C_P(N_k, N̄_k) under the n! permutations of the nodes by E_perm C_P(N_k, N̄_k) and Var_perm C_P(N_k, N̄_k); explicit expressions are given in [23]. A WC statistic may then be constructed as

S_n(P; τ, γ) = ∑_{k=1}^{n−1} { |C_P(N_k, N̄_k) − E_perm C_P(N_k, N̄_k)| / w_k(τ)^{1/2} }^γ.

A large observed value of S_n(P*; τ, γ) based on the shortest Hamiltonian path (SHP), P*, indicates a rejection of the null hypothesis, i.e., there is a change point; see the heuristic algorithm for the SHP in Biswas et al. [24] and the analysis of power and change point estimation in Shi, Wu and Rao [25,26] for γ = 2 and w_k(τ) = Var_perm C_P(N_k, N̄_k). Here, we will establish the asymptotic distribution of S_n(P; τ, γ) for γ = 1, 2. First, we give the following lemma.
By the functional limit theorem, the limit distributions of S_n(P; τ, γ) for γ = 1 and γ = 2 follow, which solves an open problem in [25,26]. Different values of γ lead to different rates of convergence and different "normings".

Normal Mixed Model
Assume Y_{i,j} = µ_i + U_i + ε_{i,j}, 1 ≤ i ≤ n, 1 ≤ j ≤ q, where the ε_{i,j} are independent and identically normally distributed with mean zero and variance σ², and the U_i are independent latent variables following a normal distribution with mean zero and variance ν².
Consider testing H_0 : µ_i = µ for all i against H_a : µ_i = µ_− for i ≤ k* and µ_i = µ_+ for i > k*, where µ_− ≠ µ_+, the parameters µ, µ_− and µ_+ are unknown, and we tentatively assume the time k*, called the change point, and the variances σ² and ν² to be known. Since the group means Ȳ_{i•} = q^{−1}∑_{j=1}^{q} Y_{i,j} are independent N(µ, ν² + σ²/q) under H_0, the marginal log-likelihood function of µ under H_0 is

ℓ(µ) = c − {2(ν² + σ²/q)}^{−1} ∑_{i=1}^{n} (Ȳ_{i•} − µ)²,

where c does not depend on µ. In a similar way, the marginal log-likelihood function of µ_− and µ_+ under H_a can be obtained. Then, twice the marginal log-likelihood ratio for a known k* is

2{ℓ_a(µ̂_−, µ̂_+) − ℓ(µ̂)} = (ν² + σ²/q)^{−1} n{k*(n − k*)}^{−1} {∑_{j=1}^{k*} (Ȳ_{j•} − Ȳ_{••})}².

As the change point k* could be unknown in practice, we may sum over k* = 1, . . . , n − 1 and consider the average value, which leads to

S_n(Ȳ_•; τ, 2) = (ν² + σ²/q)^{−1} ∑_{k=1}^{n−1} w_k^{−1}(τ) {∑_{j=1}^{k} (Ȳ_{j•} − Ȳ_{••})}²,    (14)

where Ȳ_{••} = n^{−1}∑_{i=1}^{n} Ȳ_{i•}. By Theorem 1 and Remark 3, in terms of the weighted version for any τ, as n → ∞, S_n(Ȳ_•; τ, 2) converges in distribution to ∑_{k=1}^{∞} λ_k(τ)Z_k².
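The marginal step above rests on the fact that, under a random-intercept model of the form Y_{ij} = µ + U_i + ε_{ij}, each group mean is N(µ, ν² + σ²/q). The sketch below (our check, not the paper's code) verifies this by integrating the latent variable out numerically.

```python
import numpy as np

def marginal_loglik_exact(ybar, mu, sigma2, nu2, q):
    """Marginal log-density of a group mean: Ybar_i ~ N(mu, nu2 + sigma2/q)."""
    v = nu2 + sigma2 / q
    return -0.5 * np.log(2 * np.pi * v) - (ybar - mu) ** 2 / (2 * v)

def marginal_loglik_numeric(ybar, mu, sigma2, nu2, q, grid=200_001, lim=10.0):
    """Integrate the N(0, nu2) latent intercept out on a fine grid."""
    u = np.linspace(-lim, lim, grid)
    s2 = sigma2 / q                     # var of Ybar_i given U_i = u
    f_cond = np.exp(-(ybar - mu - u) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    f_u = np.exp(-u ** 2 / (2 * nu2)) / np.sqrt(2 * np.pi * nu2)
    return float(np.log((f_cond * f_u).sum() * (u[1] - u[0])))

a = marginal_loglik_exact(0.7, 0.2, sigma2=1.0, nu2=0.5, q=8)
b = marginal_loglik_numeric(0.7, 0.2, sigma2=1.0, nu2=0.5, q=8)
assert abs(a - b) < 1e-6
```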

Poisson Mixed Model
Assume that, given the latent variables U_i, the Y_{i,j} are independent Poisson random variables with mean exp(ρ_i + U_i), where the U_i are independent N(0, ν²). Consider testing H_0 : ρ_i = ρ for all i against H_a : ρ_i = ρ_− for i ≤ k* and ρ_i = ρ_+ for i > k*, where ρ_− ≠ ρ_+, and the parameters ρ, ρ_− and ρ_+ are unknown. Under the normal distribution for U_i, the likelihood ratio contains an integral. With the focus on the simple Poisson mixed model without a change point, Hall et al. [27,28] applied the Gaussian variational approximation (GVA) to approximate the integral so as to avoid solving it. We provide a saddle point approximation here. The marginal log-likelihood function of ρ under H_0 is

ℓ(ρ) = c_1 + ∑_{i=1}^{n} {qȲ_{i•}ρ + log I_i(ρ)},

where c_1 does not depend on ρ and

I_i(ρ) = ∫_{−∞}^{∞} exp{qȲ_{i•}u − qe^ρe^u} (2πν²)^{−1/2} exp{−u²/(2ν²)} du.

The calculation of ℓ(ρ) is hindered by the lack of a closed form for the integral I_i(ρ). Here, we apply the saddle point approximation to the integral as shown in Lemma 2.
where the symbol ≈ means asymptotic equivalence and the saddle point c solves φ′(u) = 0 with φ(u) = au − be^u, i.e., c = log(a/b).
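The accuracy of the saddle point step can be illustrated on the simpler integral ∫ e^{au − be^u} du, which has the closed form Γ(a)/b^a (substitute t = be^u); a Laplace approximation at the saddle c = log(a/b), where −φ″(c) = a, gives (a/b)^a e^{−a} √(2π/a). This check is our own illustration and ignores the Gaussian factor coming from the latent variable; it is done in log space to avoid overflow.

```python
import math

def log_exact(a, b):
    """log of the closed form Gamma(a) / b**a (via the substitution t = b e^u)."""
    return math.lgamma(a) - a * math.log(b)

def log_laplace(a, b):
    """Saddle point approximation: phi(u) = a u - b e^u, c = log(a/b),
    -phi''(c) = a, so the integral is approximately e^{phi(c)} sqrt(2 pi / a)."""
    c = math.log(a / b)
    return (a * c - a) + 0.5 * math.log(2 * math.pi / a)

for a in (5.0, 50.0, 500.0):
    err = log_laplace(a, 2.0) - log_exact(a, 2.0)
    print(a, err)                 # log-scale error shrinks like 1/(12a)
    assert abs(err) < 1 / (6 * a)
```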
In (16), I_i(ρ) = I(ρ; qȲ_{i•}, q, ν²), so Lemma 2 gives the leading term of I_i(ρ) and hence a leading term approximation to max_ρ ℓ(ρ). In a similar way, max_{ρ_1,ρ_2} ℓ(ρ_1, ρ_2) under H_a can be approximated, giving the approximate log-likelihood ratio in (17). Considering that the change point k* is unknown, we may sum (17) over k* = 1, . . . , n − 1 as in (1) and consider the average value, which yields a weighted statistic of the form (1). Note that the term w_k(n/2) is derived from the approximate likelihood ratio statistic, which differs from the classical Poisson change point statistic in Csörgő and Horváth [1] (p. 27).
By Theorem 1 and Remark 3, in terms of the weighted version for any τ, as n, q → ∞, the resulting statistic converges in distribution to ∑_{k=1}^{∞} λ_k(τ)Z_k².

Weak Dependence
Now, we consider a space-time model for the distribution of Y_{i,j}, where i indexes time and j indexes space. First, we assume some weak dependence conditions in space by supposing that the central limit theorem holds:

q^{1/2}(Ȳ_{i•} − EȲ_{i•}) →_d N(0, σ²),    (19)

where σ² = lim_{q→∞} q var(Ȳ_{i•}). Next, we assume some weak dependence conditions in time by supposing that the following invariance principle or functional central limit theorem holds for any t ∈ (0, 1) [29,30]:

(q/n)^{1/2} ∑_{i=1}^{⌊nt⌋} (Ȳ_{i•} − EȲ_{i•}) ⇒ σ̃B(t),    (20)

where Ȳ_{••} = ∑_{i=1}^{n} Ȳ_{i•}/n = µ̂_{1,n} and σ̃² = lim_{n,q→∞} nq var(Ȳ_{••}).
The weak dependence conditions in (19) and (20) are satisfied if the series is m-dependent, mixing, or a linear process. Shao and Zhang [31] proposed a self-normalized change point statistic that replaces the unknown long-run variance by a data-dependent normalizer. Similarly, with the same w_k as above, we propose a randomized version of WC, denoted (23), in which the long-run variance σ̃² is replaced by such a normalizer. By the functional central limit theorem, when n → ∞, the limit null distribution of (23) follows.

Power and Change Point Estimation
Considering the WC statistic S_n(Ȳ_•; τ, 2) in (14), we now study the power of the change point test based on (23) under the alternative hypothesis in Section 2.4, assuming the weak dependence conditions of Section 2.6. We note that (23) has the same asymptotic null distribution as (4) in Theorem 1; the corresponding functional form is given in Theorem 2. To establish the consistency of the test, we make the further assumption that the change point index k* is bounded away from the endpoints.
Theorem 3. Suppose that, under the alternative hypothesis, the change magnitude Δ = µ_+ − µ_− ≠ 0, the weak dependence conditions (19) and (20) hold, 0 < τ_1 ≤ k*/n ≤ τ_2 < 1 for two constants τ_1 and τ_2, and nqΔ² → ∞ (together with a further rate condition on n). Then the power of the test based on (23) converges to one.

The proof of Theorem 3 is in Appendix E. As expected, the power of the test based on (23) increases with n, q, and the size of the change in the mean.
The estimated change point is

k̂(τ) = argmax_{1≤k≤n−1} w_k^{−1}(τ) {∑_{j=1}^{k} (Ȳ_{j•} − Ȳ_{••})}².    (24)

We refer the reader to Bai [32,33] for some early work on the asymptotic distribution of k̂(n/2) and to [34] for a treatment of the convergence rate of k̂(n/2).
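A minimal sketch of the estimator (ours, assuming the weighted-CUSUM objective: the argmax over k of the squared centered partial sum divided by w_k(τ)):

```python
import numpy as np

def change_point_estimate(y, tau=None):
    """k_hat(tau) = argmax_k (sum_{j<=k}(y_j - ybar))^2 / w_k(tau);
    tau defaults to n/2 (the likelihood-ratio weight)."""
    n = len(y)
    tau = n / 2 if tau is None else tau
    k = np.arange(1, n)
    w = -(k - tau) ** 2 + max(tau ** 2, (n - tau) ** 2)   # quadratic weights
    cusum = np.cumsum(y - y.mean())[:-1]                  # partial sums, k = 1..n-1
    return int(k[np.argmax(cusum ** 2 / w)])

rng = np.random.default_rng(0)
n, k_star = 100, 30
y = rng.standard_normal(n)
y[k_star:] += 2.0            # mean shift of size Delta = 2 after k_star
est = change_point_estimate(y)
print(est)                   # typically within a few indices of k_star
```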

Simulations
The main purpose of this simulation is to assess the effect of different values of w_k(τ), n, q, and the change magnitude on the power of our test in (23) and that of the graph-based tests [25,26,35], as both can handle high-dimensional data, and the distance underlying the graph can be changed to test for changes of different parameters, allowing a fair comparison. For example, if we are not sure whether the mean or the variance changes, the Euclidean distance can be used to measure the distance between any two nodes in the graph; see Chen and Zhang [35] and Shi, Wu and Rao [25]. A pseudo-distance can be used instead if only a change in the mean needs to be detected; see Shi, Wu and Rao [26]. We denote the maximal test of Chen and Zhang based on the Euclidean distance by MST and that based on the pseudo-distance by MST*. The associated algorithm is in the R package gSeg [36]. Similarly, we denote Shi, Wu and Rao's test [25,26] based on the Euclidean distance by SHP and that based on the pseudo-distance by SHP*; the associated R package can be accessed from [37]. First, we simulate {Y_{i,j}, 1 ≤ i ≤ k*, 1 ≤ j ≤ q} as independent standard normal random variables and {Y_{i,j}, k* + 1 ≤ i ≤ n, 1 ≤ j ≤ q} as independent normal random variables with mean Δ and variance 1. The critical values for α = 0.05 are given in Table 1 with p = 1 − α. We use these critical values and generate 200 simulations with sample sizes n = 40, 80, dimensions q = 50, 100, change point locations k* = n/4, n/2, 3n/4, and change magnitudes Δ = 0.1, 0.2.
In Table 2, we show the percentage of rejections of the null hypothesis at level 0.05 for each of the change point tests. We can see that the power of the graph-based method MST* or SHP* is higher than that of MST and SHP, since the former use the pseudo-distance designed for detecting changes in the mean. Interestingly, the power of the graph-based methods is still not as high as that of (23). To our knowledge, this comparison has not appeared in the literature; at the least, it shows that there is room for improvement in graph-based change point detection.

Now we look at the effect of the weights on the power. The weight w_k(n/2) yields the highest power when the change point is in the middle; the w_k(n) weight yields the highest power when the change point is near the beginning of the sequence, and conversely, the w_k(0) weight yields the highest power when the change point is near the end of the sequence. Moreover, the power increases with increasing n, q, and Δ, which agrees with Theorem 3.

Now, we introduce a mixture distribution and slightly change the way the random variables are generated. We simulate {Y_{i,j}, k* + 1 ≤ i ≤ n, 1 ≤ j ≤ q} from a mixture of two normal distributions with mixture weights (0.5, 0.5) or (0.8, 0.2), means (0, 0.2) or (0, 1), and variances always (1, 1), which corresponds to Δ = 0.1 or Δ = 0.2, respectively. We keep the other settings from the previous comparison. As we expected, the difference between Table 2 and Table 3 is very small.

Table 2. Estimated power (%) for the w_k(0), w_k(n/2), and w_k(n) in (23), MST, MST*, SHP, and SHP*, based on 200 simulations; n are the sample sizes, q are the dimensions, k* are the change point locations, and Δ is the size of the change in the mean of the normal random variables.

Table 3. Estimated power (%) for the w_k(0), w_k(n/2), and w_k(n) in (23), MST, and MST*, under the mixture alternatives, based on 200 simulations.

Data Analysis
Here, we analyze the video data provided by Dr. Mathieu Lihoreau, which are available from [26]. In Lihoreau, Chittka and Raine [38], the authors used artificial pollen to attract bees and an automatic monitoring camera to capture the bees' flight paths. However, the automatic monitoring does not start recording exactly when a bee enters and stop when it leaves; in fact, in this video, the recording starts before the bee enters and does not stop after the bee leaves. Since we only care about the part of the video with bees, detecting the arrival and departure of the bee helps us to automatically cut the original video. Although the video contains interference from ants, the bees are much larger than the ants, so it can be assumed that the arrival and departure of a bee cause a change in the mean of the pixel values of the images.
This video has a length of 49 seconds, a frame width of 352, a frame height of 288, and a frame rate of 29.97 frames per second. Shi, Wu and Rao [26] extracted the video into n = 49 images at a rate of one frame per second. From these 49 images, the image positions corresponding to the bee entering and leaving are 4 and 40, respectively. Moreover, we can extract this video into more images at a rate of 2 or 5 frames per second, so the number of images, n, increases to 98 or 245, and the positions of the images corresponding to the entry and exit of the bee change with n accordingly. If we call the image locations where the bee appears and leaves change points, k*, we assume that k*/n is constant with respect to n and close to 0 or 1, respectively. In Figure 3, the first row shows four images located at 4 (change point), 5, 40 (change point), and 41 from the 49 extracted images; the second row shows four images located at 7 (change point), 8, 79 (change point), and 80 from the 98 extracted images; and the third row shows four images located at 19 (change point), 20, 198 (change point), and 199 from the 245 extracted images. Since the images contain R, G, and B components, we use a weighted average of the R, G, and B components and same-scale transformations on the weighted average, as suggested by Shi, Wu and Rao [26]. Our quadratic weight test statistics are able to detect these two change points. We compared them to the graph-based change point estimates by applying the methods SHP* and MST* once to the whole sequence. As shown in Table 4, all tests are significant at level 0.05, except that the quadratic weight w_k(0) for the sample of size 49 returns a p-value of 0.067; w_k(0) and w_k(n) give estimates of the second and first change points, respectively; w_k(n/2) gives the same estimate of the change point as w_k(n), and, like SHP* and MST*, it cannot provide an estimate of the second change point.
Thus, we recommend the two weights w_k(0) and w_k(n) for detecting the departure and arrival of the bee, respectively.

Table 4. Estimated change points for the w_k(0), w_k(n/2), w_k(n), MST*, and SHP*, based on the extracted 49, 98, and 245 images; n are the sample sizes and k* are the change point locations.
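Frame preprocessing of this kind can be sketched as follows (our illustration; the luminance weights 0.299/0.587/0.114 are the common ITU-R BT.601 choice, and the per-frame flattening is an assumption, not the exact pipeline of Shi, Wu and Rao [26]).

```python
import numpy as np

def frames_to_features(frames):
    """Map (n, H, W, 3) RGB frames to an (n, H*W) matrix of grayscale pixels."""
    rgb_weights = np.array([0.299, 0.587, 0.114])   # assumed luminance weights
    gray = frames @ rgb_weights                      # weighted average over channels
    return gray.reshape(len(frames), -1)             # one row per frame: q = H * W

# toy stand-in for the video: 49 tiny frames, mean shift on frames 4..39
rng = np.random.default_rng(0)
frames = rng.random((49, 8, 8, 3))
frames[4:40] += 0.5
X = frames_to_features(frames)
print(X.shape)   # (49, 64)
```

Each row of X then plays the role of one q-dimensional observation Y_i in the change point tests.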

Discussion
This paper mainly focuses on single change point detection. However, it is possible to extend our method and apply the WC statistic to the detection of multiple change points. An approach recommended in the literature is to select data intervals in which there is evidence of a single change point. Some researchers have suggested penalty procedures based on either the adaptive lasso [39] or the smoothly clipped absolute deviation [40][41][42]; others have applied CUSUM statistics [4,[43][44][45]]. Once such intervals have been chosen, one can apply tests based on WC. If the test is rejected for some interval, then the change point within it can be estimated by (24).
It would also be of interest, although challenging, to consider other quadratic weights, such as w_k(n/4) and w_k(3n/4), as these statistics may be more powerful for detecting change points close to the third-quarter and quarter positions of the sequence. The eigenvalues for these quadratic weights may not have recursive formulas.

Acknowledgments: We thank two anonymous reviewers for helpful comments.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Proof of Theorem 1. The exact distribution of S_n(Y; τ, 2) is determined by the eigenvalues of Q. Define the n × (n − 1) matrix B = (B_1, · · · , B_{n−1}) with B_k = p_k^{−1/2}(0, · · · , 0, 1, −1, 0, · · · , 0)′ such that all entries of B_k are zero except the kth entry p_k^{−1/2} and the (k + 1)th entry −p_k^{−1/2}. It is readily seen that A′B = nI_{n−1} and thus B′QB = I_{n−1}. Note that

B′B = P^{−1/2}TP^{−1/2},

where P is a diagonal matrix with P_{kk} = p_k and T is a tridiagonal matrix with T_{kk} = 2 and T_{k,k+1} = T_{k+1,k} = −1.
We shall find the relationship between Q and B′B. We diagonalize B′B = RΓR′ with R′R = I_{n−1} and Γ a diagonal matrix with Γ_{kk} = γ_k. Set C = BRΓ^{−1/2}. We have C′C = I_{n−1} and C′QC = Γ^{−1}. Finally, we introduce u = (n^{−1/2}, · · · , n^{−1/2})′ such that Qu = 0, C′u = 0, and u′u = 1. Define U = (C, u). It then follows that U′U = I_n and

U′QU = diag(Γ^{−1}, 0).

This implies that the nonzero eigenvalues of Q are the reciprocals of those of B′B.
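The duality used at this step, that the nonzero eigenvalues of Q are reciprocals of those of B′B, is easy to confirm numerically (our check; since p_k = w_k^{−1}, we have p_k^{1/2} = w_k^{−1/2} and p_k^{−1/2} = w_k^{1/2}).

```python
import numpy as np

n, tau = 9, 4.5
w = np.array([-(k - tau) ** 2 + max(tau ** 2, (n - tau) ** 2)
              for k in range(1, n)], dtype=float)

# A_k = w_k^{-1/2} (n-k, ..., n-k, -k, ..., -k)';  B_k = w_k^{1/2} (0,..,1,-1,..,0)'
A = np.zeros((n, n - 1))
B = np.zeros((n, n - 1))
for j, k in enumerate(range(1, n)):
    A[:k, j], A[k:, j] = (n - k) / np.sqrt(w[j]), -k / np.sqrt(w[j])
    B[k - 1, j], B[k, j] = np.sqrt(w[j]), -np.sqrt(w[j])

Q = A @ A.T / n ** 2
assert np.allclose(A.T @ B, n * np.eye(n - 1))   # A'B = n I_{n-1}
eq = np.sort(np.linalg.eigvalsh(Q))[1:]          # nonzero eigenvalues of Q
eb = np.sort(np.linalg.eigvalsh(B.T @ B))[::-1]
assert np.allclose(eq, 1 / eb)                   # reciprocal relation
```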
Let (v_1, · · · , v_{n−1})′ be the eigenvector corresponding to an eigenvalue λ of B′B. We have the recurrence identity

−p_{k−1}^{−1/2}v_{k−1} + 2p_k^{−1/2}v_k − p_{k+1}^{−1/2}v_{k+1} = λp_k^{1/2}v_k,    (A1)

where v_0 = v_n = 0, p_0 = 1, and k = 1, · · · , n − 1. The above recurrence relation appears in Gardner [6] (1.7) as an eigenvalue equation for a forward difference operator. As mentioned in Gardner [6], it is difficult to find an explicit formula for the eigenvalues unless the prior distribution is uniform, i.e., p_k is independent of k. To overcome this difficulty, we make use of the above recurrence relation and then apply the classical theory of orthogonal polynomials and special functions. To be more specific, we shall link the eigenvector in (A1) to the dual Hahn polynomial by making some transformations.
Since S_n(Y; τ, 2) is a quadratic form in normal random variables, the results follow.

Appendix B
Proof of Theorem 2. Case I. For the weight 2t − t², we define X_t = {B(t) − tB(1)}/√(2t − t²). By the Karhunen-Loève expansion, X_t can be expanded in terms of random variables Z_k that are stochastically independent normal and functions e_k(·) that form an orthonormal basis. Then, the integral of the square of X_t becomes ∑_{k=1}^{∞} Z_k², and we need the variances of the Z_k. We consider the covariance of X_t, called the Mercer kernel, K_X(t, s). By Mercer's theorem, there exists a set {λ_k, e_k(t)} such that K_X(t, s) = ∑_k λ_k e_k(t)e_k(s), where the λ_k are eigenvalues and the e_k(t) are eigenfunctions satisfying the Fredholm integral equation. Denote e(t) = √(2t − t²) f(t). After multiplication by √(2t − t²) on both sides, the above eigenvalue problem becomes an equation for f involving the Brownian bridge kernel min(t, s) − ts. Note that

∫_0^1 {min(t, s) − ts}s^k ds = (t − t^{k+2})/{(k + 1)(k + 2)}.
We then have an identity for the power-series coefficients f_k of f. Obviously, λ ≠ 0; otherwise, f_k = 0 for all k ≥ 0. Now, we compare the coefficients of t^k with k ≥ 0 on both sides of the above identity, which yields a recurrence relation for the f_k. We shall prove that λ_n = 1/{(2n)(2n + 1)}, n = 1, 2, . . ., are eigenvalues of K_X(t, s). To see this, we solve the above recurrence relation. Making use of the Pochhammer symbol (a)_j := ∏_{i=0}^{j−1}(a + i), we obtain an explicit expression for the f_k, and consequently the eigenvalue equation holds, which agrees with λ = 1/{(2n)(2n + 1)}. By normalizing f_0 = 1, we can express the eigenfunction explicitly, where the Z_k are independent normal random variables, each having mean zero and variance 1. That proves (5).

Case II. For the weight 1 − t², we intend to solve the corresponding eigenvalue problem. Denote e(t) = √(1 − t²) f(t). After multiplication by √(1 − t²) on both sides, the eigenvalue problem again reduces to an equation for f. Obviously, λ ≠ 0; otherwise, f_k = 0 for all k ≥ 0. Now, we compare the coefficients of t^k with k ≥ 0 on both sides of the above identity. It follows that f_0 = 0, and the f_k satisfy a recurrence relation. On account of f_0 = 0, the recurrence relation implies that f_{2j} = 0 for all j ≥ 0. Now, we set g_j = f_{2j+1} with j ≥ 0; the recurrence relation (with k = 2j + 1) becomes a recurrence for the g_j. For convenience, we let 1/λ = 2µ(2µ + 1). Making use of the Pochhammer symbol (a)_j := ∏_{i=0}^{j−1}(a + i), we obtain an explicit expression for the g_j. Consequently, the left-hand side is λf_1. When µ = n + 1 with n = 0, 1, · · ·, the right-hand side can be calculated by the Pfaff-Saalschütz identity, and the two sides agree.
To calculate the covariance, we need the following moments for any disjoint subsets A_1, A_2, and A_3 of N_n; their sizes are denoted by n_1, n_2, and n_3 with n_1 + n_2 + n_3 ≤ n.
A simple calculation then gives the required moments.