Article

One Dimensional Discrete Scan Statistics for Dependent Models and Some Related Problems

by Alexandru Amarioarei 1,2 and Cristian Preda 3,4,5,6,*
1 Faculty of Mathematics and Computer Science, University of Bucharest, 010014 Bucharest, Romania
2 National Institute of Research and Development for Biological Sciences, 060031 Bucharest, Romania
3 Laboratoire de Mathématiques Paul Painlevé, University of Lille, 59655 Villeneuve d’Ascq, France
4 Biostatistics Department, Delegation for Clinical Research and Innovation, Lille Catholic Hospitals, GHICL, 59462 Lomme, France
5 Institute of Statistics and Applied Mathematics of the Romanian Academy, 050711 Bucharest, Romania
6 Inria Lille Nord-Europe, MODAL, 59655 Villeneuve d’Ascq, France
* Author to whom correspondence should be addressed.
Mathematics 2020, 8(4), 576; https://doi.org/10.3390/math8040576
Submission received: 18 March 2020 / Revised: 7 April 2020 / Accepted: 9 April 2020 / Published: 13 April 2020
(This article belongs to the Special Issue Probability, Statistics and Their Applications)

Abstract: The one dimensional discrete scan statistic is considered over sequences of random variables generated by block-factor dependence models. Viewed as the maximum of a 1-dependent stationary sequence, the distribution of the scan statistic is approximated with high accuracy and sharp error bounds are provided. The length of the longest increasing run is related to the scan statistic and its distribution is studied. The moving average process is a particular case of a block factor, and the distribution of the associated scan statistic is approximated. Numerical results are presented.

1. Introduction

There are many situations in which an investigator observes an accumulation of events of interest and wants to decide whether such a realisation is due to chance or not. These problems belong to the class of cluster detection problems, where the basic idea is to identify regions that are unexpected or anomalous with respect to the distribution of events. Depending on the application domain, these anomalous agglomerations of events can correspond to a diversity of phenomena: for example, one may want to find clusters of stars, deposits of precious metals, outbreaks of disease, minefields, or defective batches of manufactured parts, among many other possibilities. If such an observed accumulation of events exceeds a preassigned threshold, usually determined from a specified significance level corresponding to a normal situation (the null hypothesis), then it is legitimate to say that we have an unexpected cluster, and proper measures have to be taken accordingly.
Searching for unusual clusters of events is of great importance in many scientific and technological fields, including DNA sequence analysis ([1,2]), brain imaging ([3]), target detection in sensor networks ([4,5]), astronomy ([6,7]), and reliability theory and quality control ([8]), among many other domains. One of the tools used by practitioners to decide on the unusualness of such an agglomeration of events is the scan statistic. Basically, tests based on scan statistics look for events that are clustered amongst a background of events that are sporadic.
Let $m$ and $T$, $2 \le m \le T$, be two positive integers and let $X_1, \ldots, X_T$ be a sequence of independent and identically distributed random variables with common distribution $F_0$. The one dimensional discrete scan statistic is defined as
$$S_m(T) = \max_{1 \le i \le T-m+1} W_i, \qquad (1)$$
where the random variables $W_i$ are the moving sums of length $m$ given by
$$W_i = \sum_{j=i}^{i+m-1} X_j. \qquad (2)$$
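For illustration, the moving sums and the resulting scan statistic can be computed directly; the short sketch below (Python with numpy; function names and parameters are illustrative choices, not part of the paper) follows Equations (1) and (2).

```python
import numpy as np

def scan_statistic(x, m):
    """One dimensional discrete scan statistic S_m(T): the maximum moving sum of length m."""
    x = np.asarray(x, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(x)))
    w = c[m:] - c[:-m]                    # W_1, ..., W_{T-m+1}, as in Equation (2)
    return w.max()

# illustrative example: T = 1000 Bernoulli(0.1) observations scanned with a window of length m = 10
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.1, size=1000)
print(scan_statistic(x, m=10))
```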
Usually, the statistical tests based on the one dimensional discrete scan statistic are employed when one wants to detect a local change in the signal within a sequence of $T$ observations, via testing the null hypothesis of uniformity, $H_0$, against a cluster alternative, $H_1$ (see References [9,10]). Under $H_0$, the random observations $X_1, \ldots, X_T$ are i.i.d. with distribution $F_0$, while under the alternative hypothesis there exists a location $1 \le i_0 \le T-m+1$ such that $X_i$, $i \in \{i_0, \ldots, i_0+m-1\}$, are distributed according to $F_1 \ne F_0$, and outside this region the $X_i$ are distributed as $F_0$.
We observe that whenever $S_m(T)$ exceeds the threshold $\tau$, where the value of $\tau$ is computed from the relation $P_{H_0}\left(S_m(T) \ge \tau\right) = \alpha$ and $\alpha$ is a preassigned significance level of the testing procedure, the generalized likelihood ratio test rejects the null hypothesis in favor of the clustering alternative (see Reference [9]). It is interesting to note that most of the research has been carried out for $F_0$ being the binomial, Poisson or normal distribution (see References [9,10,11,12,13]). More recently, Reference [14] proposed a testing procedure based on the one-dimensional scan statistic for the geometric and negative binomial distributions.
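As a rough illustration of how such a threshold can be obtained in practice, the sketch below (a minimal Monte Carlo recipe in Python with numpy, assuming a Bernoulli choice for $F_0$; it is not the procedure of References [9,10]) estimates $\tau$ as an empirical quantile of $S_m(T)$ under $H_0$.

```python
import numpy as np

def scan_statistic(x, m):
    c = np.concatenate(([0.0], np.cumsum(np.asarray(x, dtype=float))))
    return (c[m:] - c[:-m]).max()

def mc_threshold(sample_h0, T, m, alpha, iters=10_000, seed=1):
    """Empirical (1 - alpha)-quantile of S_m(T) under H_0, used as the rejection threshold tau."""
    rng = np.random.default_rng(seed)
    stats = np.array([scan_statistic(sample_h0(rng, T), m) for _ in range(iters)])
    return np.quantile(stats, 1.0 - alpha)

# illustrative choice: F_0 = Bernoulli(0.1), T = 500, m = 20, significance level alpha = 0.05
tau = mc_threshold(lambda rng, T: rng.binomial(1, 0.1, T), T=500, m=20, alpha=0.05)
print(tau)
```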
There are three main approaches used for investigating the exact distribution of the one dimensional discrete scan statistics—the combinatorial methods ([12,15]), the Markov chain imbedding technique ([16,17]) and the conditional probability generating function method ([18,19]). Due to the high complexity and the limited range of application of the exact formulas, a considerable number of approximations and bounds have been developed for the estimation of the distribution of the one dimensional discrete scan statistics, for example, References [9,12,13,20]. A full treatment of these results is presented in References [10,11].
Even though, in general, the $X_i$'s are assumed to be i.i.d., there are applications, such as detecting similarities between DNA sequences, where the $X_i$'s are not independent ([21]). In order to evaluate the effect of dependence, the alternative model is in many cases a Markov chain, whose whole dependence structure is determined by the joint distribution of two consecutive random variables.
In this work we introduce dependence models based on block-factors obtained from i.i.d. sequences in the context of the one dimensional discrete scan statistics. We derive approximations and their corresponding errors with application to the longest increasing run distribution and the moving average process.
The paper is structured as follows. In Section 2 we introduce the block-factor model and present the approximation technique for the distribution of the scan statistic under this model. As particular block-factor models, the distribution of the length of the longest increasing run in a sequence of i.i.d. real random variables and the moving average process are related to the scan statistic distribution in Section 3. Numerical results based on simulations illustrate the accuracy of the approximation. Concluding remarks end the paper.

2. One Dimensional Scan Statistics for Block-Factor Dependence Model

Most of the research devoted to the one dimensional discrete scan statistic considers the independent and identically distributed model for the random variables that generate the sequence which is to be scanned. In this section, we define a dependence structure for the underlying random sequence based on a block-factor type model.

2.1. The Block-Factor Dependence Model

Let us recall (see also Reference [22]) that the sequence $\{X_i\}_{i \ge 1}$ of random variables with state space $S_W$ is said to be a $k$-block-factor of the sequence $\{Y_i\}_{i \ge 1}$ with state space $S_Y$ if there is a measurable function $f : S_Y^k \to S_W$ such that
$$X_i = f\left(Y_i, Y_{i+1}, \ldots, Y_{i+k-1}\right), \quad i \ge 1. \qquad (3)$$
Figure 1 presents the sequence $\{X_i\}_{i=1,\ldots,T}$ of length $T$ obtained as a $k$-block-factor from a sequence $\{Y_i\}_{i=1,\ldots,\tilde{T}}$ of length $\tilde{T} = T + k - 1$ through some function $f$.
As an example of a block-factor model, the authors of Reference [23] consider an i.i.d. sequence $\{Y_n\}_{n \ge 1}$ of standard normal random variables and the 2-block-factor defined by
$$X_i = a Y_i + b Y_{i+1}, \quad i \ge 1, \quad \text{for } f(x,y) = ax + by, \ a, b \in \mathbb{R}.$$
Due to the overlapping structure of $X_i$ and $X_{i+1}$, they obtain a stationary Gaussian process $\{X_i\}_{i \ge 1}$ with a specific correlation structure, for which the distribution of the scan statistic is studied.
More generally, observe that if a sequence $\{X_i\}_{i \ge 1}$ of random variables is a $k$-block-factor, then the sequence is $(k-1)$-dependent. Recall that a sequence $\{X_i\}_{i \ge 1}$ is $m$-dependent with $m \ge 1$ (see Reference [22]) if, for any $h \ge 1$, the $\sigma$-fields generated by $\{X_1, \ldots, X_h\}$ and $\{X_{h+m+1}, \ldots\}$ are independent.
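A minimal sketch of such a construction (Python with numpy; the choice $f(x,y) = 0.5x + 0.5y$ is only an illustrative instance of the 2-block-factor above) is given below.

```python
import numpy as np

def block_factor(y, f, k):
    """X_i = f(Y_i, ..., Y_{i+k-1}) for i = 1, ..., T, with T = len(y) - k + 1, as in (3)."""
    y = np.asarray(y)
    T = len(y) - k + 1
    return np.array([f(*y[i:i + k]) for i in range(T)])

# illustrative instance of the 2-block-factor above, with a = b = 0.5 (our choice)
rng = np.random.default_rng(2)
y = rng.standard_normal(1001)                               # i.i.d. standard normal Y's
x = block_factor(y, lambda u, v: 0.5 * u + 0.5 * v, k=2)    # a 1-dependent Gaussian sequence
print(x[:5])
```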

2.2. Scan Statistics Viewed as Maximum of 1-Dependent Sequence

Let $\{X_1, \ldots, X_T\}$ be a $k$-block-factor of the i.i.d. sequence $\{Y_1, \ldots, Y_{\tilde{T}}\}$, where $\tilde{T} - 1 \ge k$ and $T = \tilde{T} - k + 1$, and let $S_m(T)$ be the scan statistic associated to the sequence $\{X_i\}_{i=1,\ldots,T}$, as defined in (1), for a scanning window of length $m$, $1 \le m \le T$.
Put $T = (L-1)(m+k-2) + m - 1$ for some integer $L \ge 1$ and define, for each $j \in \{1, \ldots, L-1\}$, the random variables
$$Z_j = \max_{(j-1)(m+k-2)+1 \,\le\, i \,\le\, j(m+k-2)} W_i, \qquad (4)$$
where $W_i = \sum_{s=i}^{i+m-1} X_s$. That is, for each $j \in \{1, \ldots, L-1\}$, $Z_j$ is the scan statistic associated to the sequence of length $2m+k-3$, $\{X_{(j-1)(m+k-2)+1}, \ldots, X_{j(m+k-2)+m-1}\}$. An illustration of the construction of the variables $Z_j$ is presented in Figure 2 for $L = 5$ and $k = 1$.
Then $\{Z_j\}_{j=1,\ldots,L-1}$ is 1-dependent and stationary, and we have that
$$S(m,T) = \max_{1 \le j \le L-1} Z_j. \qquad (5)$$
Thus, for any block-factor model obtained from an i.i.d. sequence, the distribution of the associated scan statistic is the distribution of the maximum of a 1-dependent stationary sequence.
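The sketch below (Python with numpy; the parameters $m$, $k$, $L$ and the 2-block-factor are illustrative choices) builds the variables $Z_j$ of (4) for one simulated sequence and checks the identity (5) numerically.

```python
import numpy as np

def moving_sums(x, m):
    c = np.concatenate(([0.0], np.cumsum(np.asarray(x, dtype=float))))
    return c[m:] - c[:-m]                            # W_1, ..., W_{T-m+1}

def z_blocks(x, m, k, L):
    """Z_j, j = 1, ..., L-1: block maxima of the moving sums over blocks of m + k - 2 indices."""
    w = moving_sums(x, m)
    p = m + k - 2
    return np.array([w[(j - 1) * p: j * p].max() for j in range(1, L)])

# illustrative parameters: m = 5, k = 2, L = 10, so T = (L-1)(m+k-2) + m - 1 = 49
m, k, L = 5, 2, 10
T = (L - 1) * (m + k - 2) + m - 1
rng = np.random.default_rng(3)
y = rng.standard_normal(T + k - 1)
x = np.array([0.5 * y[i] + 0.5 * y[i + 1] for i in range(T)])    # a 2-block-factor sequence
z = z_blocks(x, m, k, L)
print(np.isclose(z.max(), moving_sums(x, m).max()))              # True: S(m, T) = max_j Z_j
```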

2.3. Approximation

In References [24,25], the authors extended the approximation results obtained in Reference [26] for the distribution of the maximum of a 1-dependent stationary sequence. The main result is stated in the following theorem.
Let $\{Z_j\}_{j \ge 1}$ be a strictly stationary 1-dependent sequence of random variables and, for $x < \sup\{u \mid P(Z_1 \le u) < 1\}$, let
$$q_n = q_n(x) = P\left(\max(Z_1, \ldots, Z_n) \le x\right), \quad n \ge 1. \qquad (6)$$
Theorem 1.
For all $x$ such that $q_1(x) \ge 1 - \alpha \ge 0.9$, the following approximation formula holds:
$$\left| q_n - \frac{2q_1 - q_2}{\left[1 + q_1 - q_2 + 2(q_1 - q_2)^2\right]^n} \right| \le n\, F(\alpha, n)\, (1 - q_1)^2 \qquad (7)$$
with
$$F(\alpha, n) = 1 + \frac{3}{n} + \frac{\Gamma(\alpha)}{n} + K(\alpha)(1 - q_1) \qquad (8)$$
where $\Gamma(\alpha) = L(\alpha) + E(\alpha)$,
K ( α ) = 11 3 α ( 1 α ) 2 + 2 l ( 1 + 3 α ) 2 + 3 l α α ( 2 l α ) ( 1 + l α ) 2 1 α ( 1 + l α ) 2 3 1 2 α ( 1 + l α ) 1 α ( 1 + l α ) 2 2
L ( α ) = 3 K ( α ) ( 1 + α + 3 α 2 ) [ 1 + α + 3 α 2 + K ( α ) α 3 ] + α 6 K 3 ( α ) + 9 α ( 4 + 3 α + 3 α 2 ) + 55.1
E ( α ) = η 5 1 + ( 1 2 α ) η 4 1 + α ( η 2 ) 1 + η + ( 1 3 α ) η 2 2 ( 1 α η 2 ) 4 ( 1 α η 2 ) 2 α η 2 ( 1 + η 2 α η ) 2
and where $\eta = 1 + l\alpha$ with $l = l(\alpha) = t_2^3(\alpha) + \varepsilon$, for arbitrarily small $\varepsilon > 0$, and $t_2(\alpha)$ the second root in magnitude of the equation $\alpha t^3 - t + 1 = 0$.
The values of the functions $K$ and $\Gamma$ for selected values of $\alpha$ are presented in Table 1. These values allow one to compute directly the error bound of the approximation in (7).
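As a quick numerical check of the $l(\alpha)$ column of Table 1, the sketch below (Python with numpy) solves the cubic $\alpha t^3 - t + 1 = 0$ and cubes one of its roots; taking the smaller of the two positive roots is our reading of $t_2(\alpha)$, since the extracted definition is ambiguous, and it reproduces the tabulated values up to rounding.

```python
import numpy as np

def l_alpha(alpha, eps=1e-6):
    """l(alpha) = t^3 + eps, with t a positive root of alpha*t^3 - t + 1 = 0.

    Choosing the smaller of the two positive roots is an assumption on our part;
    it matches the l(alpha) column of Table 1 up to rounding.
    """
    roots = np.roots([alpha, 0.0, -1.0, 1.0])        # coefficients of alpha*t^3 + 0*t^2 - t + 1
    pos = sorted(r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0)
    return pos[0] ** 3 + eps

for a in (0.1, 0.05, 0.025, 0.01):
    print(a, round(l_alpha(a), 4))                   # compare with the l(alpha) column of Table 1
```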
Applying Theorem 1 to the sequence $\{Z_j\}_{j=1,\ldots,L-1}$ defined in (4), from (5) we obtain an approximation and the associated error bound for $P\left(S(m,T) \le x\right)$ in the following way. Put, for $s \in \{2, 3\}$,
$$Q_s = Q_s(x) = P\left( \bigcap_{j=1}^{s-1} \{Z_j \le x\} \right) \qquad (12)$$
and observe that, using the notation of (6), we have $Q_s = q_{s-1}$. For $x$ such that $Q_2(x) \ge 1 - \alpha \ge 0.9$, we apply the result of Theorem 1 to obtain the approximation
$$P\left(S(m,T) \le x\right) \approx \frac{2Q_2 - Q_3}{\left[1 + Q_2 - Q_3 + 2(Q_2 - Q_3)^2\right]^{L-1}}, \qquad (13)$$
with an error bound of approximately $(L-1)\, F(\alpha_1, L-1)\, (1 - Q_2)^2$. Observe that $Q_2$ and $Q_3$ represent the distributions of the scan statistic over sequences of lengths $2m+k-3$ and $3m+2k-5$, respectively. In general, $Q_2$ and $Q_3$ are estimated by Monte Carlo simulation.
Thus, if $\hat{Q}_s$ is an estimate of $Q_s$, $s \in \{2,3\}$, with $\left|\hat{Q}_s - Q_s\right| \le \beta_s$ and $x$ is such that $1 - \hat{Q}_2(x) \le 0.1$, then
$$\left| P\left(S(m,T) \le x\right) - \frac{2\hat{Q}_2 - \hat{Q}_3}{\left[1 + \hat{Q}_2 - \hat{Q}_3 + 2\left(\hat{Q}_2 - \hat{Q}_3\right)^2\right]^{L-1}} \right| \le E_{total}, \qquad (14)$$
where $E_{total}$ is the total error of the approximation, given by
$$E_{total} = (L-1)\left[\beta_2 + \beta_3 + F\left(\hat{Q}_2, L-1\right)\left(1 - \hat{Q}_2 + \beta_2\right)^2\right]. \qquad (15)$$
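A minimal sketch of how (14) and (15) can be evaluated for a block-factor model is given below (Python with numpy; the sampler, the crude binomial bounds used for $\beta_2$ and $\beta_3$, and the plug-in value passed for $F$ are illustrative assumptions, not the authors' implementation).

```python
import numpy as np

def scan_stat(x, m):
    """Maximum moving sum of window length m."""
    c = np.concatenate(([0.0], np.cumsum(np.asarray(x, dtype=float))))
    return (c[m:] - c[:-m]).max()

def estimate_Q(sample_x, length, m, x_level, iters, rng):
    """Monte Carlo estimate of P(scan statistic over a sequence of given length <= x_level)."""
    hits = sum(scan_stat(sample_x(rng, length), m) <= x_level for _ in range(iters))
    return hits / iters

def approximation_14(sample_x, m, k, L, x_level, F_value, iters=10**5, seed=0):
    """Right-hand side of (13)/(14) and the total error (15).

    beta_2 and beta_3 are replaced by a crude 3-sigma binomial bound (our assumption),
    and F_value is a user-supplied plug-in for F(Q2_hat, L-1).
    """
    rng = np.random.default_rng(seed)
    q2 = estimate_Q(sample_x, 2 * m + k - 3, m, x_level, iters, rng)      # Q2_hat
    q3 = estimate_Q(sample_x, 3 * m + 2 * k - 5, m, x_level, iters, rng)  # Q3_hat
    beta = 3.0 * np.sqrt(0.25 / iters)               # crude bound used for both beta_2 and beta_3
    d = q2 - q3
    approx = (2 * q2 - q3) / (1 + d + 2 * d * d) ** (L - 1)
    e_total = (L - 1) * (2 * beta + F_value * (1 - q2 + beta) ** 2)
    return approx, e_total

# example: k = 2 block factor X_i = 1{Y_i < Y_(i+1)} of i.i.d. uniforms (see Section 3.1)
def sample_x(rng, T):
    y = rng.random(T + 1)                            # Y_1, ..., Y_{T+1}
    return (y[:-1] < y[1:]).astype(float)            # X_1, ..., X_T

print(approximation_14(sample_x, m=8, k=2, L=1251, x_level=7.5, F_value=20.0, iters=10**4))
```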
One of the main advantages of this approximation method with respect to the product-type approximation proposed in Reference [12], which uses the same quantities $Q_2$ and $Q_3$, is that it provides sharp error bounds for the approximation.

3. Some Related Problems to the Scan Statistics under Block-Factor Dependence Model

In order to illustrate the efficiency of the approximation (14) and the obtained error bounds, in this section we present two examples of statistics related to discrete scan statistics.

3.1. Length of the Longest Increasing Run in an i.i.d. Sequence

Let $Y_1, Y_2, \ldots, Y_{\tilde{T}}$ be a sequence of length $\tilde{T}$, $\tilde{T} \ge 1$, of independent and identically distributed random variables with common distribution $F$. We say that the subsequence $Y_k, \ldots, Y_{k+l-1}$ forms an increasing run (or ascending run) of length $l \ge 1$, starting at position $k \ge 1$, if it satisfies the relation
$$Y_{k-1} > Y_k < Y_{k+1} < \cdots < Y_{k+l-1} > Y_{k+l}. \qquad (16)$$
We denote the length of the longest increasing run among the first $\tilde{T}$ random variables by $M_{\tilde{T}}$. This run statistic plays an important role in many applications in fields such as computer science, reliability theory and quality control. The asymptotic behaviour of $M_{\tilde{T}}$ has been investigated by several authors, depending on the common distribution $F$. In the case of a continuous distribution, Reference [27] (see also Reference [28]) has shown that this behaviour does not depend on the common law. For the particular setting of uniform $\mathcal{U}([0,1])$ random variables, this problem was addressed by References [29,30,31]. Under the assumption that the distribution $F$ is discrete, the limit behaviour of $M_{\tilde{T}}$ depends strongly on the common law $F$, as Reference [32] (see also References [33,34]) proved for the cases of the geometric and Poisson distributions. In Reference [35], the case of the discrete uniform distribution is investigated, while in Reference [36] the authors study the asymptotic distribution of $M_{\tilde{T}}$ when the variables are uniformly distributed but not independent.
In this section, we evaluate the distribution of the length of the longest increasing run using the methodology developed in Section 2. The idea is to express the distribution of the random variable M T ˜ in terms of the distribution of the scan statistics random variable.
Let $T = \tilde{T} - 1$ and define the block-factor transformation $f : \mathbb{R}^2 \to \mathbb{R}$ by
$$f(x,y) = \begin{cases} 1, & \text{if } x < y \\ 0, & \text{otherwise.} \end{cases} \qquad (17)$$
Then our block-factor model becomes
$$X_i = f(Y_i, Y_{i+1}) = \mathbb{1}_{\{Y_i < Y_{i+1}\}}, \qquad (18)$$
and $X_1, \ldots, X_T$ form a 1-dependent and stationary sequence of random variables.
Notice that the distribution of $M_{\tilde{T}}$ and the distribution of the length of the longest run of ones, $L_T$, among the first $T$ binary random variables $X_i$, are related and satisfy the following identity
$$P\left(M_{\tilde{T}} \le m\right) = P\left(L_T < m\right), \quad \text{for } m \ge 1. \qquad (19)$$
The statistic $L_T$ is also known as the length of the longest success run or head run and has been extensively studied in the literature. One can consult the monographs of References [16,17] for applications and further results concerning this statistic. Moreover, the random variable $L_T$ can be interpreted as a particular case of the scan statistic, and between the two we have the relation
$$P\left(L_T \ge m\right) = P\left(S(m,T) = m\right). \qquad (20)$$
Hence, combining (19) and (20), we can express the distribution of the length of the longest increasing run as
$$P\left(M_{\tilde{T}} \le m\right) = P\left(S(m,T) < m\right). \qquad (21)$$
Thus, we can estimate the distribution of $M_{\tilde{T}}$ using the foregoing identity and the approximations developed in Section 2 for the discrete scan statistic.
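The identity (21) can be checked pathwise on simulated data; the sketch below (Python with numpy; the sample size, window length and seed are arbitrary illustrative choices) compares a brute-force computation of the longest increasing run with the scan statistic of the binary block factor (18).

```python
import numpy as np

def longest_increasing_run(y):
    """Length of the longest strictly increasing run in y (brute force)."""
    best = cur = 1
    for a, b in zip(y[:-1], y[1:]):
        cur = cur + 1 if a < b else 1
        best = max(best, cur)
    return best

def scan_stat(x, m):
    c = np.concatenate(([0.0], np.cumsum(np.asarray(x, dtype=float))))
    return (c[m:] - c[:-m]).max()

# pathwise check of (21): M <= m holds exactly when S(m, T) < m for the block factor (18)
rng = np.random.default_rng(4)
y = rng.random(10001)                                # Y_1, ..., Y_{T+1}, uniform on [0, 1]
x = (y[:-1] < y[1:]).astype(float)                   # X_i = 1{Y_i < Y_{i+1}}
m = 8
print(longest_increasing_run(y) <= m, scan_stat(x, m) < m)   # the two booleans always agree
```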
We should also note that in Reference [30] the author studied the asymptotic behaviour of $L_T$ over a sequence of $m$-dependent binary random variables. It is shown there that, given a stationary $m$-dependent sequence $\{V_k\}_{k \ge 1}$ of random variables with values 0 and 1, if there exist positive constants $t$, $C$ such that
$$P\left(V_{k+1} = 1 \mid V_1 = \cdots = V_k = 1\right) \le 1 - \frac{C}{k^t}, \quad \text{for all } k \ge C, \qquad (22)$$
then, as $N \to \infty$,
$$\max_{1 \le k \le N} \left| P\left(L_N < k\right) - e^{-N r(k)} \right| = O\left(\frac{(\ln N)^h}{N}\right), \qquad (23)$$
where $r(k) = P\left(V_1 = \cdots = V_k = 1\right) - P\left(V_1 = \cdots = V_{k+1} = 1\right)$ and $h = \sup\{mt, 1\}$.
In order to illustrate the accuracy of the approximation of $M_{\tilde{T}}$ based on scan statistics, using the methodology developed in Section 2, we consider that the random variables $Y_i$ have a common uniform $\mathcal{U}([0,1])$ distribution. Simple calculations show that $P\left(X_1 = \cdots = X_k = 1\right) = \frac{1}{(k+1)!}$ and
$$P\left(X_{k+1} = 1 \mid X_1 = \cdots = X_k = 1\right) = \frac{1}{k+2} \le 1 - \frac{2}{k}, \qquad (24)$$
thus $C = 2$, $t = 1$ and $h = 1$. In the context of our particular situation, the result of Reference [30] in Equation (23) becomes
$$\max_{1 \le m \le T} \left| P\left(L_T < m\right) - e^{-T r(m)} \right| = O\left(\frac{\ln T}{T}\right), \qquad (25)$$
where $r(m) = P\left(X_1 = \cdots = X_m = 1\right) - P\left(X_1 = \cdots = X_{m+1} = 1\right) = \frac{m+1}{(m+2)!}$.
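For reference, the limit approximation (25) is straightforward to evaluate; a short sketch (plain Python, with $T = 10000$ as in the setting of Table 2) is given below.

```python
import math

def lim_app(m, T):
    """Limit approximation exp(-T * r(m)) of P(M <= m), with r(m) = (m+1)/(m+2)!, as in (25)."""
    r = (m + 1) / math.factorial(m + 2)
    return math.exp(-T * r)

T = 10000                                            # T = T_tilde - 1 with T_tilde = 10001
for m in range(5, 11):
    print(m, round(lim_app(m, T), 8))                # compare with the LimApp column of Table 2
```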
In Table 2, we present a numerical comparison between the simulated value (column Sim), obtained by Monte Carlo simulation with $ITER_{sim} = 10^4$ trials, the approximation based on scan statistics (column App), computed from Equation (14) with $\hat{Q}_2$ and $\hat{Q}_3$ estimated from $ITER_{app} = 10^5$ trials, and the limit distribution (column LimApp) of the length of the longest increasing run, $P\left(M_{\tilde{T}} \le m\right)$, in a sequence of $\tilde{T} = 10001$ random variables distributed uniformly over $[0,1]$. The results show that both our method and the asymptotic approximation in (25) are very accurate. It is worth mentioning that for our simulations we used an adapted version of the importance sampling procedure introduced in Reference [3], an efficient method that has proved to perform very well for small $p$-values (where naive Monte Carlo methods tend to break down) [37].

3.2. The Moving Average-Like Process of Order q Model

We consider the particular situation of the one dimensional discrete scan statistic defined over a sequence of random variables obtained as a linear block factor of a discrete Gaussian white noise. Because of the similarity with the definition of a classical moving average process, we call this block-factor model a moving average-like process. It is worth mentioning that the distribution of the scan statistic in the context of a moving average process for normal data was studied in Reference [38], where the authors compared the product-type approximation developed in Reference [13] with the approximation of Reference [23]. In the block-factor model introduced in (3), let $q \ge 1$ be a positive integer and $Y_1, Y_2, \ldots, Y_{\tilde{T}}$ be a sequence of independent and identically distributed Gaussian random variables with known mean $\mu$ and variance $\sigma^2$.
Let $a = (a_1, \ldots, a_{q+1}) \in \mathbb{R}^{q+1}$ be a fixed non-null vector and take $f : \mathbb{R}^{q+1} \to \mathbb{R}$, the (measurable) transformation that defines the block-factor model, to be
$$f(y_1, \ldots, y_{q+1}) = a_1 y_1 + a_2 y_2 + \cdots + a_{q+1} y_{q+1}. \qquad (26)$$
For $i \in \{1, \ldots, T\}$, with $T = \tilde{T} - q$, our dependent model is defined by the relation
$$X_i = a_1 Y_i + a_2 Y_{i+1} + \cdots + a_{q+1} Y_{i+q}. \qquad (27)$$
The moving sums of size $m$, $W_i$, $1 \le i \le T-m+1$, can be expressed as
$$W_i = \sum_{j=i}^{i+m-1} X_j = b_1 Y_i + b_2 Y_{i+1} + \cdots + b_{m+q} Y_{i+m-1+q}, \qquad (28)$$
where the coefficients $b_1, \ldots, b_{m+q}$ are evaluated by
(a)
For $m \ge q$,
$$b_k = \begin{cases} \sum_{j=1}^{k} a_j, & k \in \{1, \ldots, q\} \\ \sum_{j=1}^{q+1} a_j, & k \in \{q+1, \ldots, m\} \\ \sum_{j=k-m+1}^{q+1} a_j, & k \in \{m+1, \ldots, m+q\} \end{cases} \qquad (29)$$
(b)
For $m < q$,
$$b_k = \begin{cases} \sum_{j=1}^{k} a_j, & k \in \{1, \ldots, m\} \\ \sum_{j=k-m+1}^{k} a_j, & k \in \{m+1, \ldots, q\} \\ \sum_{j=k-m+1}^{q+1} a_j, & k \in \{q+1, \ldots, m+q\}. \end{cases} \qquad (30)$$
Therefore, for each $i \in \{1, \ldots, T-m+1\}$, the random variable $W_i$ follows a normal distribution with mean $E[W_i] = (b_1 + \cdots + b_{m+q})\mu$ and variance $Var[W_i] = \left(b_1^2 + \cdots + b_{m+q}^2\right)\sigma^2$. Moreover, a simple calculation shows that the covariance matrix $\Sigma = \left\{Cov(W_t, W_s)\right\}$ has the entries
$$Cov(W_t, W_s) = \begin{cases} \sum_{j=1}^{m+q-|t-s|} b_j\, b_{|t-s|+j}\, \sigma^2, & |t-s| \le m+q-1 \\ 0, & \text{otherwise.} \end{cases} \qquad (31)$$
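The coefficients $b_k$ and the covariance matrix (31) are easy to compute numerically; a minimal sketch (Python with numpy; the function names and the example parameters are illustrative) follows.

```python
import numpy as np

def b_coefficients(a, m):
    """b_1, ..., b_{m+q}: b_k is the sum of a_j over j in [max(1, k-m+1), min(q+1, k)]."""
    a = np.asarray(a, dtype=float)                   # a = (a_1, ..., a_{q+1})
    q = len(a) - 1
    return np.array([a[max(0, k - m): min(q + 1, k)].sum() for k in range(1, m + q + 1)])

def cov_matrix(a, m, n_windows, sigma2=1.0):
    """Covariance matrix of the moving sums W_1, ..., W_{n_windows}, following (31)."""
    b = b_coefficients(a, m)
    cov = np.zeros((n_windows, n_windows))
    for t in range(n_windows):
        for s in range(n_windows):
            d = abs(t - s)
            if d <= len(b) - 1:
                cov[t, s] = sigma2 * np.dot(b[: len(b) - d], b[d:])
    return cov

# example parameters taken from the application below: q = 2, a = (0.3, 0.1, 0.5), m = 20
print(cov_matrix([0.3, 0.1, 0.5], m=20, n_windows=22)[0, :5])
```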
Given the mean and the covariance matrix of the vector ( W 1 , , W T m + 1 ) , one can use the importance sampling algorithm developed in Reference [3] (see also Reference [37]) or the one presented in Reference [39] to estimate the distribution of the one dimensional discrete scan statistics S ( m , T ) . Another way is to use the quasi-Monte Carlo algorithm developed in Reference [40] to approximate the multivariate normal distribution.
In our application example we adopt the importance sampling procedure developed in Reference [3]. In order to evaluate the accuracy of the approximation developed in (14), we consider $q = 2$, $T = 1000$, $m = 20$, $Y_i \sim N(0,1)$ and the coefficients of the moving average model $(a_1, a_2, a_3) = (0.3, 0.1, 0.5)$. We compare our approximation (column App), given in (14), with the product-type approximation (column AppPT) of Reference [41]. Table 3 presents the numerical results for this setting. In our algorithms we used $ITER_{app} = 10^6$ trials for the computation of $\hat{Q}_2(x)$ and $\hat{Q}_3(x)$ and $ITER_{sim} = 10^5$ trials for the Monte Carlo simulation of $P\left(S(m,T) \le x\right)$.
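For readers who want a quick, if far less efficient, baseline than the importance sampling procedure used here, the sketch below (Python with numpy; the level $x = 13$ and the number of trials are arbitrary illustrative choices) estimates $P\left(S(m,T) \le x\right)$ for the above MA-like setting by naive Monte Carlo.

```python
import numpy as np

def scan_stat(x, m):
    c = np.concatenate(([0.0], np.cumsum(np.asarray(x, dtype=float))))
    return (c[m:] - c[:-m]).max()

def ma_like_scan(rng, a, T, m):
    """One realization of S(m, T) for the MA-like block factor X_i = a_1 Y_i + ... + a_{q+1} Y_{i+q}."""
    q = len(a) - 1
    y = rng.standard_normal(T + q)
    x = sum(a[j] * y[j: j + T] for j in range(q + 1))
    return scan_stat(x, m)

# naive Monte Carlo at the Table 3 setting; x = 13 and 2000 trials are illustrative only
rng = np.random.default_rng(5)
a, T, m = [0.3, 0.1, 0.5], 1000, 20
est = np.mean([ma_like_scan(rng, a, T, m) <= 13.0 for _ in range(2000)])
print(est)
```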

4. Conclusions

Block-factor models defined from i.i.d. sequences generate random sequences with a particular type of dependence structure. For this type of dependence, the scan statistic can be viewed as the maximum of a 1-dependent stationary sequence, for which the distribution can be approximated with high accuracy. The approximation error can be controlled by using efficient simulation algorithms, such as the importance sampling approach proposed in Reference [3] (see also Reference [25]). We approximated the distribution of the longest increasing run statistic over an i.i.d. sequence as a particular case of the scan statistic distribution over a block-factor model.

Author Contributions

Conceptualization, A.A. and C.P.; methodology, A.A. and C.P.; software, A.A.; validation, A.A. and C.P.; formal analysis, A.A. and C.P.; writing original draft preparation, C.P.; writing review and editing, A.A. and C.P.; visualization, A.A. and C.P.; supervision, A.A. and C.P.; project administration, C.P.; funding acquisition, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by a grant of the Romanian National Authority for Scientific Research and Innovation, project number POC P-37-257 and MCI National Core Program, project 25 N/2019 BIODIVERS 19270103.

Acknowledgments

The authors wish to thank the anonymous reviewers for their careful reading of the manuscript and their helpful suggestions and comments which led to the improvement of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hoh, J.; Ott, J. Scan statistics to scan markers for susceptibility genes. Proc. Natl. Acad. Sci. USA 2000, 97, 9615–9617. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Sheng, K.-N.; Naus, J. Pattern matching between two non aligned random sequences. Bull. Math. Biol. 1994, 56, 1143–1162. [Google Scholar] [CrossRef]
  3. Naiman, D.Q.; Priebe, C.E. Computing scan statistic p values using importance sampling, with applications to genetics and medical image analysis. J. Comput. Graph. Stat. 2001, 10, 296–328. [Google Scholar] [CrossRef]
  4. Guerriero, M.; Pozdnyakov, V.; Glaz, J.; Willett, P. A repeated significance test with applications to sequential detection in sensor networks. IEEE Trans. Signal Process. 2010, 58, 3426–3435. [Google Scholar] [CrossRef]
  5. Guerriero, M.; Willett, P.; Glaz, J. Distributed target detection in sensor networks using scan statistics. IEEE Trans. Signal Process. 2009, 57, 2629–2639. [Google Scholar] [CrossRef]
  6. Darling, R.W.R.; Waterman, M.S. Extreme value distribution for the largest cube in a random lattice. SIAM J. Appl. Math. 1986, 46, 118–132. [Google Scholar] [CrossRef] [Green Version]
  7. Marcos, R.; Marcos, C. From star complexes to the field: Open cluster families. Astrophys. J. 2008, 672, 342–351. [Google Scholar] [CrossRef]
  8. Boutsikas, M.V.; Koutras, M.V. Reliability approximation for Markov chain imbeddable systems. Methodol. Comput. Appl. Probab. 2000, 2, 393–411. [Google Scholar] [CrossRef]
  9. Glaz, J.; Naus, J. Tight bounds and approximations for scan statistic probabilities for discrete data. Ann. Appl. Probab. 1991, 1, 306–318. [Google Scholar] [CrossRef]
  10. Glaz, J.; Naus, J.; Wallenstein, S. Scan Statistics; Springer Series in Statistics; Springer: New York, NY, USA, 2001. [Google Scholar]
  11. Glaz, J.; Balakrishnan, N. Scan Statistics and Applications; Springer Sciences+Business Media: Berlin, Germany, 1999. [Google Scholar]
  12. Naus, J. Approximations for distributions of scan statistics. J. Am. Stat. Assoc. 1982, 77, 177–183. [Google Scholar] [CrossRef]
  13. Wang, X.; Glaz, J.; Naus, J. Approximations and inequalities for moving sums. Methodol. Comput. Appl. Probab. 2012, 14, 597–616. [Google Scholar]
  14. Chen, J.; Glaz, J. Scan statistics for monitoring data modeled by a negative binomial distribution. Commun. Stat. Theory Methods 2016, 45, 1632–1642. [Google Scholar] [CrossRef]
  15. Naus, J. Probabilities for a generalized birthday problem. J. Am. Stat. Assoc. 1974, 69, 810–815. [Google Scholar] [CrossRef]
  16. Balakrishnan, N.; Koutras, M.V. Runs and Scans with Applications; Wiley Series in Probability and Statistics; Wiley-Interscience [John Wiley & Sons]: New York, NY, USA, 2002. [Google Scholar]
  17. Fu, J.C.; Lou, W. Distribution Theory of Runs and Patterns and Its Applications: A Finite Markov Chain Imbedding Approach; World Scientific Publishing Co., Inc.: River Edge, NJ, USA, 2003. [Google Scholar]
  18. Ebneshahrashoob, M.; Gao, T.; Wu, M. An efficient algorithm for exact distribution of discrete scan statistics. Methodol. Comput. Appl. Probab. 2005, 7, 1423–1436. [Google Scholar] [CrossRef]
  19. Uchida, M. On generating functions of waiting time problems for sequence patterns of discrete random variables. Ann. Inst. Stat. Math. 1998, 50, 650–671. [Google Scholar] [CrossRef]
  20. Chen, J.; Glaz, J. Approximations and inequalities for the distribution of a scan statistic for 0-1 Bernoulli trials. Adv. Theory Pract. Stat. 1997, 1, 285–298. [Google Scholar]
  21. Arratia, R.; Gordon, L.; Waterman, M.S. The Erdos-Rényi law in distribution for coin tossing and sequence matching. Ann. Stat. 1990, 18, 539–570. [Google Scholar] [CrossRef]
  22. Burton, R.M.; Goulet, M.; Meester, R. On one-dependent processes and k-block factors. Ann. Probab. 1993, 21, 2157–2168. [Google Scholar] [CrossRef]
  23. Haiman, G.; Preda, C. One dimensional scan statistics generated by some dependent stationary sequences. Stat. Probab. Lett. 2013, 83, 1457–1463. [Google Scholar] [CrossRef]
  24. Amărioarei, A. Approximation for the Distribution of Extremes of One Dependent Stationary Sequences of Random Variables. arXiv 2012, arXiv:1211.5456v1. [Google Scholar]
  25. Amărioarei, A. Approximations for the Multidimensional Discrete Scan Statistics. Ph.D. Thesis, University of Lille, Lille, France, 2014. [Google Scholar]
  26. Haiman, G. Estimating the distributions of scan statistics with high precision. Extremes 2000, 3, 349–361. [Google Scholar] [CrossRef]
  27. Pittel, B. Limiting behavior of a process of runs. Ann. Probab. 1981, 9, 119–129. [Google Scholar] [CrossRef]
  28. Frolov, A.; Martikainen, A. On the length of the longest increasing run in Rd. Stat. Prob. Lett. 1999, 41, 153–161. [Google Scholar] [CrossRef]
  29. Grill, K. Erdos-Révész type bounds for the length of the longest run from a stationary mixing sequence. Probab. Theory Relat. Fields 1987, 75, 169–179. [Google Scholar] [CrossRef]
  30. Novak, S. Longest runs in a sequence of m-dependent random variables. Probab. Theory Relat. Fields 1992, 91, 269–281. [Google Scholar] [CrossRef]
  31. Révész, P. Three problems on the length of increasing runs. Stochastic Process. Appl. 1983, 5, 169–179. [Google Scholar] [CrossRef] [Green Version]
  32. Csaki, E.; Foldes, A. On the length of the longest monotone block. Studia Scientiarum Mathematicarum Hungarica 1996, 31, 35–46. [Google Scholar]
  33. Eryilmaz, S. A note on runs of geometrically distributed random variables. Discrete Math. 2006, 306, 1765–1770. [Google Scholar] [CrossRef] [Green Version]
  34. Grabner, P.; Knopfmacher, A.; Prodinger, H. Combinatorics of geometrically distributed random variables: Run statistics. Theoret. Comput. Sci. 2003, 297, 261–270. [Google Scholar] [CrossRef] [Green Version]
  35. Louchard, G. Monotone runs of uniformly distributed integer random variables: A probabilistic analysis. Theoret. Comput. Sci. 2005, 346, 358–387. [Google Scholar] [CrossRef] [Green Version]
  36. Mitton, N.; Paroux, K.; Sericola, B.; Tixeuil, S. Ascending runs in dependent uniformly distributed random variables: Application to wireless networks. Methodol. Comput. Appl. Probab. 2010, 12, 51–62. [Google Scholar] [CrossRef]
  37. Malley, J.; Naiman, D.Q.; Bailey-Wilson, J. A comprehensive method for genome scans. Hum. Heredity 2002, 54, 174–185. [Google Scholar] [CrossRef] [PubMed]
  38. Wang, X.; Zhao, B.; Glaz, J. A multiple window scan statistic for time series models. Stat. Probab. Lett. 2014, 94, 196–203. [Google Scholar] [CrossRef]
  39. Shi, J.; Siegmund, D.; Yakir, B. Importance sampling for estimating p values in linkage analysis. J. Am. Stat. Assoc. 2007, 102, 929–937. [Google Scholar] [CrossRef]
  40. Genz, A.; Bretz, F. Computation of Multivariate Normal and T Probabilities; Springer: New York, NY, USA, 2009. [Google Scholar]
  41. Wang, X.; Glaz, J. A variable window scan statistic for MA(1) process. In Proceedings of the 15th International Conference on Applied Stochastic Models and Data Analysis ASMDA 2013, Barcelona, Spain, 25–28 June 2013; pp. 955–962. [Google Scholar]
Figure 1. The block factor model.
Figure 2. Construction of the variables $Z_j$.
Table 1. Selected values of the $l$, $K$ and $\Gamma$ functions in Theorem 1 for $\varepsilon = 10^{-6}$.

α       l(α)      K(α)     Γ(α)
0.1     1.5347    38.63    480.69
0.05    1.1893    21.28    180.53
0.025   1.0835    17.56    145.20
0.01    1.0313    15.92    131.43
Table 2. The distribution of the length of the longest increasing run: $\tilde{T} = 10001$, $ITER_{sim} = 10^4$, $ITER_{app} = 10^5$.

m     Sim           App (Equation (14))   E_total (Equation (15))   LimApp (Equation (25))
5     0.00000700    0.00000733            0.14860299                0.00000676
6     0.17567262    0.17937645            0.01089628                0.17620431
7     0.80257424    0.80362353            0.00110990                0.80215088
8     0.97548510    0.97566460            0.00011579                0.97550345
9     0.99749821    0.99751049            0.00001114                0.99749792
10    0.99977074    0.99977183            0.00000098                0.99977038
11    0.99998075    0.99998083            0.00000008                0.99998073
12    0.99999851    0.99999851            0.00000001                0.99999851
13    0.99999989    0.99999989            0.00000000                0.99999989
14    0.99999999    0.99999999            0.00000000                0.99999999
15    1.00000000    1.00000000            0.00000000                1.00000000
Table 3. MA-like ($q = 2$) model: $m = 20$, $T = 1000$, $X_i = 0.3 Y_i + 0.1 Y_{i+1} + 0.5 Y_{i+2}$, $ITER_{app} = 10^6$, $ITER_{sim} = 10^5$.

x     Sim        AppPT      App (Equation (14))   E_total (Equation (15))
11    0.582252   0.589479   0.584355              0.015156
12    0.770971   0.773700   0.771446              0.004010
13    0.889986   0.890009   0.889431              0.001167
14    0.951529   0.954536   0.951723              0.000370
15    0.980653   0.982433   0.980675              0.000124
16    0.992827   0.993690   0.992791              0.000042
17    0.997486   0.995471   0.997499              0.000014
18    0.999186   0.999411   0.999188              0.000004
19    0.999754   0.999717   0.999754              0.000001
20    0.999930   1          0.999930              0.000000
