Abstract
In this paper, we investigate waiting time problems for a finite collection of patterns in a sequence of independent multi-state trials. By constructing a finite GI/M/1-type Markov chain with a disaster and then using the matrix analytic method, we obtain the probability generating function of the waiting time. From this, we obtain the stopping probabilities and the mean waiting time; it also enables us to compute the waiting time distribution by numerical inversion.
1. Introduction
Waiting time problems for runs and patterns in a random sequence of trials are of theoretical interest and have practical applications in various areas of statistics and applied probability, such as reliability, sampling inspection, quality control, DNA/RNA sequence analysis, and hypothesis testing [1]. For comprehensive surveys and applications of related waiting time problems, refer to the books of Balakrishnan and Koutras [2] and Fu and Lou [3].
Let $\{X_n\}_{n \ge 1}$ be a sequence of random variables taking values in a finite set $A$. A finite sequence of elements of $A$ is called a pattern. We consider a finite collection of patterns $\mathcal{C} = \{\Lambda_1, \ldots, \Lambda_K\}$, possibly of different lengths. For $i = 1, \ldots, K$, let $W_i$ be the waiting time until the first occurrence of pattern $\Lambda_i$ as a run in the sequence $X_1, X_2, \ldots$. Let $W$ be the waiting time until one of the $K$ patterns appears, i.e., $W = \min(W_1, \ldots, W_K)$.
Many researchers have studied waiting time problems for general and specific choices of patterns in a random sequence of trials. When $\{X_n\}$ is a sequence of independent and identically distributed (i.i.d.) Bernoulli trials, Fu and Koutras [4] developed a finite Markov chain embedding method, first employed by Fu [5], to study the exact distributions of the numbers of specified runs and patterns. Fu [6] extended the finite Markov chain embedding method to study the exact distributions of the numbers of runs and patterns in a sequence of i.i.d. multi-state trials. In addition, he obtained the waiting time distribution of a specified pattern.
In this paper, we are mainly interested in computing the waiting time distribution, as well as the stopping probabilities $P(W = W_i)$, $i = 1, \ldots, K$. Li [7], Gerber and Li [8], Guibas and Odlyzko [9], Blom and Thorburn [10] and Antzoulakos [11] considered the case where $\{X_n\}$ is a sequence of i.i.d. multi-state trials. Li [7] and Gerber and Li [8] used the martingale approach to obtain the mean waiting time $E[W]$ and the stopping probabilities $P(W = W_i)$, $i = 1, \ldots, K$, for a finite collection of patterns. Guibas and Odlyzko [9] used a combinatorial method to obtain the probability generating function of the waiting time. Blom and Thorburn [10] also used a combinatorial method to obtain the mean waiting time and the stopping probabilities for a finite collection of patterns of the same length. Antzoulakos [11] used the finite Markov chain embedding method to study waiting time problems for a single pattern as well as for a finite collection of patterns.
Han and Hirano [12], Fu and Chang [13], Glaz et al. [14], Pozdnyakov [15], Gava and Salotti [16], Zhao et al. [17] and Kerimov and Öner [18] considered the case where $\{X_n\}$ is a discrete time homogeneous Markov chain with a finite state space, i.e., a sequence of Markov dependent multi-state trials. Han and Hirano [12] studied waiting time problems for two different patterns. Fu and Chang [13] studied waiting time problems for a finite collection of patterns by using the finite Markov chain embedding method. Glaz et al. [14] obtained the mean waiting time and the probability generating function of the waiting time for a finite collection of patterns in a two-state Markov chain by using the method of gambling teams and the martingale approach. Pozdnyakov [15] investigated the same problems as Glaz et al. [14] for multi-state Markovian trials. Gava and Salotti [16] obtained a system of linear equations for the stopping probabilities $P(W = W_i)$, $i = 1, \ldots, K$, by using the methods developed for gambling teams in [14,15]. Recently, Zhao et al. [17] found a method, based on that of [9], to calculate $E[W]$ and $P(W = W_i)$, $i = 1, \ldots, K$. Even more recently, Kerimov and Öner [18] found oscillation properties of the expected stopping times and stopping probabilities for patterns consisting of two consecutive states. For useful reviews of different approaches to waiting time problems of patterns for both i.i.d. and Markov dependent trials, refer to Fu and Lou [3].
Antzoulakos [11] and Fu and Chang [13] obtained the probability generating function of the waiting time for a finite collection of patterns in a sequence of i.i.d. and Markov dependent multi-state trials, respectively. They used a Markov chain with absorbing states corresponding to the patterns and considered the waiting time as the first entrance time into the set of absorbing states. The Markov chain has a transition probability matrix $P$ of the form
$$P = \begin{pmatrix} N & M \\ O & I \end{pmatrix}, \qquad (1)$$
where $N$ is the submatrix of $P$ whose entries are transition probabilities from a transient state to a transient state, $M$ is the submatrix of $P$ whose entries are transition probabilities from a transient state to an absorbing state, $O$ is the zero matrix and $I$ is the identity matrix. By using the general formula for the probability generating function of the first entrance time into an absorbing state, they obtained the probability generating function of the waiting time. Their results are expressed in terms of the submatrices $N$ and $M$, as well as variants of them. Chang [19] also studied waiting time problems for a finite collection of patterns. He investigated the distribution of the waiting time until the rth occurrence of any pattern in the collection. He also used the expression (1) for his analysis.
In this paper, we consider a sequence of i.i.d. multi-state trials. We also use a Markov chain with a transition probability matrix of the form (1). However, we investigate in detail the structure of the submatrices $N$ and $M$. This enables us to construct a finite GI/M/1-type Markov chain with a disaster and to regard the waiting time as the time until the occurrence of the disaster. Based on this construction and the matrix analytic method, we obtain the probability generating function of the waiting time $W$ on the events $\{W = W_i\}$, $i = 1, \ldots, K$. From this, we can obtain the stopping probabilities $P(W = W_i)$ as well as the conditional/unconditional mean waiting times $E[W \mid W = W_i]$ and $E[W]$; it also enables us to compute the waiting time distribution by numerical inversion. A benefit of our method is that it remains efficient even when the lengths of the patterns are large. Our method can also be extended to Markov dependent multi-state trials.
The paper is organized as follows. In Section 2, we formulate our waiting time problems. In Section 3, we construct a GI/M/1-type Markov chain with a disaster. From this we can obtain our results, which are given in Section 4. In Section 5, numerical examples are presented to illustrate our results. Conclusions are given in Section 6.
2. Problem Formulation
Let $\{X_n\}_{n \ge 1}$ be a sequence of i.i.d. trials taking values in a finite set $A$. Assume that for $x \in A$,
$$P(X_n = x) = p_x,$$
where $p_x > 0$ and $\sum_{x \in A} p_x = 1$. For a finite collection of patterns $\mathcal{C} = \{\Lambda_1, \ldots, \Lambda_K\}$, suppose that pattern $\Lambda_i$ is of the form
$$\Lambda_i = (\Lambda_{i,1}, \ldots, \Lambda_{i,l_i}),$$
where $\Lambda_{i,j} \in A$, i.e., $\Lambda_i$ is a pattern of length $l_i$. Here, $l_1, \ldots, l_K$ are fixed positive integers, and the patterns are labeled in order of their lengths. Recall that $W$ is the waiting time until one of the $K$ patterns appears, i.e.,
$$W = \min(W_1, \ldots, W_K).$$
We will call W the sooner waiting time.
Our main interest is to derive the probability generating function of the sooner waiting time $W$ on the events $\{W = W_i\}$, $i = 1, \ldots, K$, i.e.,
$$E\!\left[z^{W} \mathbf{1}_{\{W = W_i\}}\right], \qquad i = 1, \ldots, K. \qquad (2)$$
From this, we can obtain the stopping probabilities, the conditional/unconditional probability mass functions of W, and the conditional/unconditional means of W as follows:
- The stopping probabilities $P(W = W_i)$, $i = 1, \ldots, K$, are given by $P(W = W_i) = E\big[z^{W} \mathbf{1}_{\{W = W_i\}}\big]\big|_{z = 1}$.
- The conditional probability mass function of $W$ given $W = W_i$, i.e., $P(W = n \mid W = W_i)$, $n = 1, 2, \ldots$, can be computed from the conditional probability generating function of $W$ given $W = W_i$, $E\big[z^{W} \mid W = W_i\big] = E\big[z^{W} \mathbf{1}_{\{W = W_i\}}\big] / P(W = W_i)$, by numerical inversion.
- The probability mass function of $W$, $P(W = n)$, $n = 1, 2, \ldots$, can be computed from $E[z^{W}] = \sum_{i=1}^{K} E\big[z^{W} \mathbf{1}_{\{W = W_i\}}\big]$ by numerical inversion.
- The conditional mean of $W$, $E[W \mid W = W_i]$, $i = 1, \ldots, K$, can be obtained from $E[W \mid W = W_i] = E\big[W \mathbf{1}_{\{W = W_i\}}\big] / P(W = W_i)$, where $E\big[W \mathbf{1}_{\{W = W_i\}}\big] = \frac{d}{dz} E\big[z^{W} \mathbf{1}_{\{W = W_i\}}\big]\big|_{z = 1}$. In addition, the unconditional mean of $W$, $E[W]$, can be obtained from $E[W] = \sum_{i=1}^{K} E\big[W \mathbf{1}_{\{W = W_i\}}\big]$.
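As a quick numerical illustration of the relations in the list above, the following sketch (not from the paper) applies them to hypothetical generating functions $\phi_i(z) = E[z^W \mathbf{1}_{\{W = W_i\}}]$; the derivative at $z = 1$ is taken by a central difference. The two $\phi_i$ used below are toy functions of the required form (a stopping probability times a geometric conditional law), standing in only for the generating functions that Theorem 1 produces.

```python
# Hypothetical phi_i(z) = P(W = W_i) * E[z^W | W = W_i]; the geometric conditional
# laws below are placeholders for illustration only.
phi = [lambda z: 0.6 * (0.5 * z) / (1 - 0.5 * z),
       lambda z: 0.4 * (0.2 * z) / (1 - 0.8 * z)]

def d_dz(f, z=1.0, h=1e-6):
    """Central-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2 * h)

stop_probs = [f(1.0) for f in phi]              # P(W = W_i) = phi_i(1)
cond_means = [d_dz(f) / f(1.0) for f in phi]    # E[W | W = W_i] = phi_i'(1) / phi_i(1)
mean_w = sum(d_dz(f) for f in phi)              # E[W] = sum_i phi_i'(1)
print(stop_probs, cond_means, mean_w)           # ~[0.6, 0.4], ~[2.0, 5.0], ~3.2
```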
3. GI/M/1-Type Markov Chain with a Disaster
In this section, we construct a GI/M/1-type Markov chain with a disaster to obtain an expression for (2). We define the following three terms:
- A pattern $(a_1, \ldots, a_k)$ of length $k \ge 0$ is a subpattern of pattern $\Lambda_i$ if $(a_1, \ldots, a_k) = (\Lambda_{i,j+1}, \ldots, \Lambda_{i,j+k})$ for some $j$ with $0 \le j \le l_i - k$; when $k = 0$, $(a_1, \ldots, a_k)$ means the null pattern (i.e., the pattern of length 0).
- A subpattern $(a_1, \ldots, a_k)$ of pattern $\Lambda_i$ is proper if $k < l_i$ (i.e., if it is not $\Lambda_i$ itself).
- A subpattern of pattern $\Lambda_i$ is a leading subpattern of $\Lambda_i$ if it equals $(\Lambda_{i,1}, \ldots, \Lambda_{i,j})$ for some $j$ with $0 \le j \le l_i$, i.e., if it is a prefix of $\Lambda_i$.
Assume that for $i \ne j$, $\Lambda_i$ is not a proper subpattern of $\Lambda_j$.
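The subpattern relations above are straightforward to check mechanically. The following sketch (not part of the paper; the function names and the three-pattern collection are hypothetical) verifies the standing assumption that no pattern in the collection is a proper subpattern of another.

```python
def is_proper_subpattern(p, q):
    """True if pattern p occurs inside pattern q and is strictly shorter than q."""
    if len(p) >= len(q):
        return False
    return any(q[k:k + len(p)] == p for k in range(len(q) - len(p) + 1))

def check_collection(patterns):
    """Raise an error if some pattern is a proper subpattern of another pattern."""
    for i, p in enumerate(patterns):
        for j, q in enumerate(patterns):
            if i != j and is_proper_subpattern(p, q):
                raise ValueError(f"pattern {i + 1} is a proper subpattern of pattern {j + 1}")
    return True

# Hypothetical collection over the alphabet {0, 1, 2}:
print(check_collection([(0, 1, 2), (1, 1), (2, 0, 2, 1)]))   # True
```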
We now introduce a two-dimensional process $(J_n, M_n)$, $n \ge 0$, where $J_n$ and $M_n$ are defined as follows:
(i) $J_0 = 0$, and for $n \ge 1$,
- $J_n$ is the largest $k \ge 0$ such that $(X_{n-k+1}, \ldots, X_n)$ is a proper leading subpattern of a pattern in $\mathcal{C}$, if none of the $K$ patterns has occurred by time $n$.
- $J_n = \Delta$, where $\Delta$ is an extra point, if one of the $K$ patterns has occurred by time $n$ (i.e., if $W \le n$).
(ii) $M_0 = 1$, and for $n \ge 1$,
- $M_n$ is the smallest $i$ such that $(X_{n-J_n+1}, \ldots, X_n)$ is a proper leading subpattern of pattern $\Lambda_i$, if $J_n \ge 1$.
- $M_n = 1$, if $J_n = 0$.
- $M_n = i$, if $J_n = \Delta$ and $W = W_i$.
To clarify the definitions of $J_n$ and $M_n$, we provide the following example. Let $\{X_n\}_{n \ge 1}$ be a sequence of i.i.d. trials taking values in a finite set $A$, and suppose $\mathcal{C} = \{\Lambda_1, \Lambda_2\}$ for two specified patterns $\Lambda_1$ and $\Lambda_2$. For one sample sequence of trials, the corresponding values of $(J_n, M_n)$ are given in Table 1; for another sample sequence of trials, they are given in Table 2. Note that $\{(J_n, M_n)\}_{n \ge 0}$ is a discrete time Markov chain.
Table 1.
Sample paths of $(J_n, M_n)$ corresponding to the first sample sequence of $\{X_n\}$.
Table 2.
Sample paths of $(J_n, M_n)$ corresponding to the second sample sequence of $\{X_n\}$.
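Tables of this kind can be reproduced mechanically. The sketch below (not the authors' code) computes $(J_n, M_n)$ along a given trial sequence, following the reconstruction of the definitions above; the symbol `DELTA`, the function name and the two-pattern collection at the bottom are assumptions made for illustration.

```python
DELTA = "Delta"   # extra point: some pattern has already occurred

def track(trials, patterns):
    """Yield (J_n, M_n) for n = 1, 2, ... along a trial sequence."""
    occurred, history = None, []
    max_len = max(len(p) for p in patterns)
    for x in trials:
        history.append(x)
        if occurred is None:
            # has one of the patterns just been completed?
            for i, p in enumerate(patterns, start=1):
                if len(history) >= len(p) and tuple(history[-len(p):]) == tuple(p):
                    occurred = i
                    break
        if occurred is not None:
            yield (DELTA, occurred)
            continue
        # J_n: longest suffix of the history that is a proper leading subpattern of some pattern
        J = 0
        for k in range(min(len(history), max_len - 1), 0, -1):
            suffix = tuple(history[-k:])
            if any(len(p) > k and tuple(p[:k]) == suffix for p in patterns):
                J = k
                break
        # M_n: smallest pattern index admitting that suffix as a proper leading subpattern
        suffix = tuple(history[-J:]) if J > 0 else ()
        M = min(i for i, p in enumerate(patterns, start=1)
                if len(p) > J and tuple(p[:J]) == suffix)
        yield (J, M)

# Hypothetical example over the alphabet {a, b}:
print(list(track("abbaba", [("a", "b", "a"), ("b", "b")])))
```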
Define $l_{\max} = \max(l_1, \ldots, l_K)$ and, for $k = 1, \ldots, l_{\max} - 1$, let $r_k$ be the number of patterns in $\mathcal{C}$ whose lengths are larger than $k$, i.e.,
$$r_k = \#\{\, i : l_i > k \,\}.$$
We also define $r_{l_{\max}} = 0$. Note that $K \ge r_1 \ge r_2 \ge \cdots \ge r_{l_{\max} - 1} \ge 1$. If $J_n = k$ with $1 \le k \le l_{\max} - 1$, then $(X_{n-k+1}, \ldots, X_n)$ is a proper leading subpattern of one of the $r_k$ patterns whose lengths exceed $k$. Furthermore, the set of all possible values of $M_n$ when $J_n = k$ is given by the set of indices of these $r_k$ patterns.
Therefore, the state space of the discrete time Markov chain $\{(J_n, M_n)\}_{n \ge 0}$ is
$$S = \{(0, 1)\} \,\cup\, \{(k, i) : 1 \le k \le l_{\max} - 1,\ l_i > k\} \,\cup\, \{(\Delta, i) : 1 \le i \le K\}.$$
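As a hypothetical illustration of the quantities $r_k$ (the lengths below are not those used in the examples of Section 5), suppose $K = 3$ and the pattern lengths are 4, 3 and 2. Then

```latex
r_1 = 3, \qquad r_2 = 2, \qquad r_3 = 1,
```

so the chain has one state at level 0, three states at level 1, two states at level 2, one state at level 3, and $K = 3$ states at the disaster level $\Delta$.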
For each state $(k, i) \in S$, the first component $k$ is called the level. The one-step transition probability matrix $P$ of $\{(J_n, M_n)\}_{n \ge 0}$ is given, in lexicographic order with $\Delta$ being the last element in the set of levels, as follows:
where the submatrices are described below. A matrix whose rows are indexed by the states of one level and whose columns are indexed by the states of another level will be referred to by the corresponding dimensions; for example, an $r_k \times r_m$ matrix has rows indexed by the $r_k$ states of level $k$ and columns indexed by the $r_m$ states of level $m$.
- For each level $k$ ($0 \le k \le l_{\max} - 1$), the block describing transitions from level $k$ to level $0$ is the matrix whose $(i, 1)$-component is $\sum_{x \in A_{k,i}} p_x$, where $A_{k,i}$ is the subset of $A$ consisting of $x$ such that $(\Lambda_{i, m+1}, \ldots, \Lambda_{i, k}, x)$ is not a leading subpattern of a pattern in $\mathcal{C}$ for any $m = 0, 1, \ldots, k$.
- For each level $k$ ($0 \le k \le l_{\max} - 1$), the block describing transitions from level $k$ to the disaster level $\Delta$ is a matrix with one column for each pattern. Its $(i, j)$-component is $p_x$ with $x = \Lambda_{j, l_j}$ if appending $x$ completes pattern $\Lambda_j$, i.e., if $l_j \le k + 1$ and $(\Lambda_{j,1}, \ldots, \Lambda_{j, l_j - 1}) = (\Lambda_{i, k - l_j + 2}, \ldots, \Lambda_{i, k})$. Otherwise, the $(i, j)$-component is $0$.
- For each pair of levels $k$ and $m$ with $0 \le k \le l_{\max} - 1$ and $1 \le m \le \min(k + 1, l_{\max} - 1)$, the block describing transitions from level $k$ to level $m$ is a matrix with one column for each of the $r_m$ patterns whose lengths exceed $m$. Its $(i, j)$-component is $p_x$, with $x = \Lambda_{j, m}$, if the following three conditions hold:
- (i)
- $(\Lambda_{i, k - m + 2}, \ldots, \Lambda_{i, k}, x)$ is a proper leading subpattern of pattern $\Lambda_j$;
- (ii)
- $(\Lambda_{i, k - m + 2}, \ldots, \Lambda_{i, k}, x)$ is not a proper leading subpattern of pattern $\Lambda_{j'}$ for $1 \le j' \le j - 1$;
- (iii)
- $(\Lambda_{i, k - m' + 2}, \ldots, \Lambda_{i, k}, x)$ is not a leading subpattern of a pattern in $\mathcal{C}$ for $m < m' \le k + 1$.
Otherwise, the $(i, j)$-component is $0$.
- O’s are zero matrices (possibly of different sizes).
- $I$ is the $K \times K$ identity matrix corresponding to the absorbing level $\Delta$.
To make it easier to understand how the matrix $P$ in (4) is constructed, consider again the example with $\mathcal{C} = \{\Lambda_1, \Lambda_2\}$ described above. The quantities $l_{\max}$ and $r_1, \ldots, r_{l_{\max}-1}$ follow directly from the two pattern lengths, each submatrix of $P$ is obtained from the rules above, and $P$ is then written out explicitly as a block matrix.
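To complement this example, the following sketch (not the authors' code) assembles the transition matrix of $\{(J_n, M_n)\}$ by brute force from the one-step update of the chain, under the reconstruction of the definitions in this section; the state encoding, the function names and the two-pattern collection at the bottom are assumptions made for illustration. Sorting the states by level makes the block structure of (4) visible in the printed matrix.

```python
import numpy as np

def next_state(state, x, patterns):
    """One-step update of (J, M) when the next trial equals x."""
    J, M = state
    if J == "Delta":
        return state                              # the disaster level is absorbing
    window = patterns[M - 1][:J] + (x,)           # the only symbols that can still matter
    for i, p in enumerate(patterns, start=1):     # has some pattern just been completed?
        if len(p) <= len(window) and window[len(window) - len(p):] == p:
            return ("Delta", i)
    for k in range(len(window), 0, -1):           # longest suffix that is a proper prefix
        suffix = window[len(window) - k:]
        hits = [i for i, p in enumerate(patterns, start=1) if len(p) > k and p[:k] == suffix]
        if hits:
            return (k, min(hits))
    return (0, 1)

def transition_matrix(patterns, probs):
    """Reachable states and transition matrix P; probs maps each symbol to its probability."""
    states, frontier = [(0, 1)], [(0, 1)]
    while frontier:                               # enumerate all states reachable from (0, 1)
        s = frontier.pop()
        for x in probs:
            t = next_state(s, x, patterns)
            if t not in states:
                states.append(t)
                frontier.append(t)
    l_max = max(len(p) for p in patterns)
    states.sort(key=lambda s: (l_max if s[0] == "Delta" else s[0], s[1]))   # levels 0, 1, ..., Delta
    index = {s: r for r, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        for x, px in probs.items():
            P[index[s], index[next_state(s, x, patterns)]] += px
    return states, P

# Hypothetical two-pattern collection over {a, b} with P(a) = 0.6, P(b) = 0.4:
states, P = transition_matrix([("a", "b", "a"), ("b", "b")], {"a": 0.6, "b": 0.4})
print(states)
print(P.round(3))
```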
4. Probability Generating Function of the Waiting Time
In this section, we derive an expression for (2). The analysis is based on the matrix analytic method. For more details, refer to Neuts [20,21] and Latouche and Ramaswami [22]. Let
For $0 \le n \le l_{\max} - 2$, we define
which means the probability generating function for the time of the first visit to state $(n + 1, j)$, starting from state $(n, i)$ at time 0, before the first visit to level $\Delta$. Let $\widetilde{R}_n(z)$ denote the matrix of these probability generating functions, whose $(i, j)$-component is the function just defined. Conditioning on the first transition, we have
where the $(i, j)$-component of the matrix appearing in (5) is
Since
we have
Substituting (6) into (5), we obtain
Equation (7) can be interpreted as follows. Starting from level $n$, the Markov chain may visit level $n + 1$ (while avoiding level $\Delta$) in two ways: it may move up to level $n + 1$ at the very next transition (contributing the first term on the right-hand side of (7)), or it may move to level $k$ ($k \le n$) at the first transition, move up from level $k$ to level $k + 1$, then from level $k + 1$ to level $k + 2$, and so on, until finally moving from level $n$ to level $n + 1$ (contributing the second term on the right-hand side of (7)). From (7), we obtain
where $I$ denotes an identity matrix of the appropriate size.
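For readability, the decomposition just described can be written schematically as follows. The block symbols $B_{n,k}$ are introduced here only for illustration, denoting the one-step transition block from level $n$ to level $k$ with the disaster level excluded; this is a sketch of the structure of (7), not a reproduction of the paper's display.

```latex
\widetilde{R}_n(z) \;=\; z\,B_{n,\,n+1}
  \;+\; \sum_{k=0}^{n} z\,B_{n,\,k}\,
  \widetilde{R}_k(z)\,\widetilde{R}_{k+1}(z)\cdots\widetilde{R}_n(z),
  \qquad 0 \le n \le l_{\max}-2 .
```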
For $0 \le n \le l_{\max} - 1$, we define
which means that (for $0 \le n \le l_{\max} - 2$) it is the probability generating function for the time of the first visit to state $(\Delta, j)$, starting from state $(n, i)$, before the first visit to level $n + 1$, and that (for $n = l_{\max} - 1$) it is the probability generating function for the time of the first visit to state $(\Delta, j)$, starting from state $(n, i)$. Let $\widetilde{G}_n(z)$ denote the matrix of these probability generating functions, whose $(i, j)$-component is the function just defined. Conditioning on the first transition, we have
where the $(i, j)$-component of the matrix appearing in (9) is
Since
we have
Substituting (11) into (9), we obtain
which can be written as
From this equation, we obtain
In summary, we obtain the following theorem.
Theorem 1.
From Theorem 1, we can obtain the following results.
Corollary 1.
- (i)
- The stopping probabilities $P(W = W_i)$, $i = 1, \ldots, K$, are given by
- (ii)
- The conditional probability generating functions of $W$ given $W = W_i$, $i = 1, \ldots, K$, are given by
- (iii)
- The marginal probability generating function of W is given by
Remark. As mentioned in Section 2, the conditional probability mass functions $P(W = n \mid W = W_i)$, $i = 1, \ldots, K$, can be computed from (14) by numerical inversion. In addition, the probability mass function $P(W = n)$ can be computed from (15) by numerical inversion. For the numerical inversion of probability generating functions, refer to Abate and Whitt [23].
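For completeness, the following sketch implements the Fourier-series inversion formula of Abate and Whitt [23] for a probability generating function; choosing the radius $r = 10^{-\gamma/(2n)}$ keeps the aliasing error near $10^{-\gamma}$. The geometric generating function used in the demo is only a stand-in for the generating functions (14) and (15).

```python
import cmath

def invert_pgf(G, n, gamma=8):
    """Approximate p_n from a probability generating function G(z) = sum_n p_n z^n
    by the Abate-Whitt Fourier-series method."""
    if n == 0:
        return complex(G(0.0)).real
    r = 10.0 ** (-gamma / (2.0 * n))          # aliasing error is about 10**(-gamma)
    s = complex(G(r)) + (-1) ** n * complex(G(-r))
    s += 2.0 * sum((-1) ** j * complex(G(r * cmath.exp(1j * cmath.pi * j / n))).real
                   for j in range(1, n))
    return s.real / (2.0 * n * r ** n)

# Demo with a geometric PGF (a stand-in for the generating functions of Corollary 1):
p = 0.3
G = lambda z: p * z / (1 - (1 - p) * z)       # P(N = n) = p (1 - p)**(n - 1), n >= 1
print(invert_pgf(G, 4), p * (1 - p) ** 3)     # the two values should agree closely
```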
By Theorem 1, we can also obtain the conditional/unconditional means of the sooner waiting time. To this end, we introduce
Recall that $P(W = W_i) = E\big[z^{W} \mathbf{1}_{\{W = W_i\}}\big]\big|_{z = 1}$, $i = 1, \ldots, K$. By differentiating (13) with respect to $z$ and evaluating at $z = 1$, we have
Therefore, to obtain an expression for $E\big[W \mathbf{1}_{\{W = W_i\}}\big]$, $i = 1, \ldots, K$, we need to determine $\widetilde{R}_n(1)$ and $\widetilde{R}_n'(1)$, as well as $\widetilde{G}_n(1)$ and $\widetilde{G}_n'(1)$. Equation (8) may be written as
from which
Therefore, $\widetilde{R}_n(1)$ and $\widetilde{R}_n'(1)$, $0 \le n \le l_{\max} - 2$, are obtained as follows:
Similarly, we can obtain $\widetilde{G}_n(1)$ and $\widetilde{G}_n'(1)$ by using Equation (12). Equation (12) may be written as
from which
Therefore, $\widetilde{G}_n(1)$ and $\widetilde{G}_n'(1)$, $0 \le n \le l_{\max} - 1$, are obtained as follows:
Since $E\big[W \mathbf{1}_{\{W = W_i\}}\big] = \frac{d}{dz} E\big[z^{W} \mathbf{1}_{\{W = W_i\}}\big]\big|_{z = 1}$, $i = 1, \ldots, K$, we can obtain the conditional mean waiting times $E[W \mid W = W_i]$, $i = 1, \ldots, K$, from $E[W \mid W = W_i] = E\big[W \mathbf{1}_{\{W = W_i\}}\big] / P(W = W_i)$. We can also obtain the unconditional mean waiting time from $E[W] = \sum_{i=1}^{K} E\big[W \mathbf{1}_{\{W = W_i\}}\big]$. From these two formulas and (16), we obtain the following theorem (Theorem 2).
5. Numerical Examples
In this section, we present numerical results for the computations of the stopping probabilities, the probability mass functions (along with the tail probabilities) of the sooner waiting time, and the conditional/unconditional means of the sooner waiting time. To illustrate our results, we provide two examples.
Example 1.
Let $\{X_n\}_{n \ge 1}$ be a sequence of i.i.d. trials taking values in a finite set $A$. Assume that for $x \in A$,
Suppose that $K = 10$, i.e., the collection $\mathcal{C}$ consists of 10 patterns, $\Lambda_1, \ldots, \Lambda_{10}$. We select the collection of patterns as shown in Table 3, where the lengths of the patterns, $l_1, \ldots, l_{10}$, are chosen as the order statistics of i.i.d. random variables with mean 5. The set of patterns given in Table 3 is an example of a randomly selected pattern set in which no pattern is a subpattern of another. The procedure for randomly selecting such a pattern set is omitted here.
Table 3.
The patterns used in Example 1.
By Theorem 1, we can compute $E\big[z^{W} \mathbf{1}_{\{W = W_i\}}\big]$, $i = 1, \ldots, 10$. Table 4 shows the stopping probabilities $P(W = W_i)$, $i = 1, \ldots, 10$.
Table 4.
The stopping probabilities $P(W = W_i)$, $i = 1, \ldots, 10$, for Example 1.
In Figure 1, we plot the joint probabilities $P(W = n, W = W_i)$, $i = 1, \ldots, 10$, with $n$ varying. These can be computed by numerical inversion of the generating function $E\big[z^{W} \mathbf{1}_{\{W = W_i\}}\big]$.
Figure 1.
Plots of the joint probabilities $P(W = n, W = W_i)$, $i = 1, \ldots, 10$, for Example 1.
In Table 5, we present the probability mass function of $W$, $P(W = n)$, and the tail probability of $W$, $P(W > n)$, with $n$ varying. Here, $P(W = n)$ can be computed by numerical inversion of its generating function $E[z^{W}] = \sum_{i=1}^{10} E\big[z^{W} \mathbf{1}_{\{W = W_i\}}\big]$.
Table 5.
The probability mass function $P(W = n)$ and tail probability $P(W > n)$ for Example 1.
By Theorem 2, we can compute the conditional means of the sooner waiting time $W$, $E[W \mid W = W_i]$, $i = 1, \ldots, 10$, and the unconditional mean of $W$, $E[W]$. Table 6 shows the conditional and unconditional mean waiting times for Example 1.
Table 6.
The conditional means $E[W \mid W = W_i]$, $i = 1, \ldots, 10$, and the unconditional mean $E[W]$ for Example 1.
The next example concerns Bernoulli trials.
Example 2.
Let $\{X_n\}_{n \ge 1}$ be a sequence of i.i.d. Bernoulli trials, i.e., each $X_n$ takes values in the two-element set $A = \{0, 1\}$. Assume that for $n = 1, 2, \ldots$,
Suppose that the collection $\mathcal{C}$ consists of 5 patterns, $\Lambda_1, \ldots, \Lambda_5$, where
For Example 2, the joint probabilities $P(W = n, W = W_i)$, $i = 1, \ldots, 5$, are shown in Figure 2. Also, the stopping probabilities $P(W = W_i)$, $i = 1, \ldots, 5$, the probability mass function of $W$ (along with the tail probability) and the conditional/unconditional means of $W$ are shown in Table 7, Table 8 and Table 9, respectively.
Figure 2.
Plots of the joint probabilities $P(W = n, W = W_i)$, $i = 1, \ldots, 5$, for Example 2.
Table 7.
The stopping probabilities $P(W = W_i)$, $i = 1, \ldots, 5$, for Example 2.
Table 8.
The probability mass function $P(W = n)$ and tail probability $P(W > n)$ for Example 2.
Table 9.
The conditional means $E[W \mid W = W_i]$, $i = 1, \ldots, 5$, and the unconditional mean $E[W]$ for Example 2.
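Results such as those in Tables 7–9 can be cross-checked by simulation. The sketch below (not from the paper) estimates the stopping probabilities $P(W = W_i)$ and the mean $E[W]$ by Monte Carlo for a sequence of Bernoulli trials; the success probability and the three-pattern collection used here are hypothetical, not the ones of Example 2.

```python
import random

def simulate_once(patterns, p, rng):
    """Run Bernoulli(p) trials until one pattern occurs; return (pattern index, waiting time)."""
    max_len = max(len(pat) for pat in patterns)
    recent, n = [], 0
    while True:
        n += 1
        recent.append(1 if rng.random() < p else 0)
        recent = recent[-max_len:]               # only the most recent symbols can matter
        for i, pat in enumerate(patterns, start=1):
            if len(recent) >= len(pat) and tuple(recent[-len(pat):]) == pat:
                return i, n

def estimate(patterns, p, runs=100_000, seed=1):
    """Monte Carlo estimates of the stopping probabilities and of E[W]."""
    rng = random.Random(seed)
    counts = {i: 0 for i in range(1, len(patterns) + 1)}
    total = 0
    for _ in range(runs):
        i, w = simulate_once(patterns, p, rng)
        counts[i] += 1
        total += w
    return {i: c / runs for i, c in counts.items()}, total / runs

# Hypothetical collection of 0/1 patterns, success probability 0.5:
stop_probs, mean_w = estimate([(1, 1, 1), (0, 1, 0), (0, 0)], p=0.5)
print(stop_probs, mean_w)
```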
6. Conclusions
We have derived the probability generating function of the sooner waiting time for a finite collection of patterns in a sequence of i.i.d. multi-state trials. From this probability generating function, we have obtained the stopping probabilities and the mean waiting time; it has also enabled us to compute the waiting time distribution by numerical inversion. As mentioned in the Introduction, our method can be extended to Markov dependent multi-state trials.
For further research, we will investigate the tail asymptotics of the sooner waiting time $W$. From Figure 1 and Figure 2, we expect that the distribution of $W$ has a geometric tail. This is true under a certain aperiodicity condition, because $W$ is the first passage time to a subset of the state space of a discrete time Markov chain with a finite state space. Under such an assumption, the distribution of $W$ exhibits geometric tail behavior, i.e.,
$$P(W > n) \sim c\,\rho^{\,n} \quad \text{as } n \to \infty,$$
for some $c > 0$ and $\rho \in (0, 1)$. Here “$\sim$” means that the ratio of the two sides converges to 1. It would be of interest to find explicit expressions for $c$ and $\rho$. We also have the following geometric tail behavior:
$$P(W > n,\, W = W_i) \sim c_i\,\rho^{\,n} \quad \text{as } n \to \infty,$$
for some $c_i > 0$, $i = 1, \ldots, K$. Here $\rho$ is independent of $i$ and is the same as that described above. It would also be of interest to find explicit expressions for $c_i$, $i = 1, \ldots, K$.
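One standard way to make the geometric-tail claim precise (a sketch, under an additional primitivity assumption that is not verified here) is via Perron–Frobenius theory. Write $T$ for the substochastic matrix of one-step transitions among the non-absorbing states (levels $0, 1, \ldots, l_{\max} - 1$) of the chain in Section 3, and $\boldsymbol{\alpha}$ for the initial distribution on those states; the symbols $T$, $\boldsymbol{\alpha}$, $\mathbf{u}$, $\mathbf{v}$ are introduced here only for this sketch.

```latex
% If T is primitive with Perron root rho and right/left Perron vectors v, u
% normalized so that u^T v = 1, then
\begin{aligned}
P(W > n) &= \boldsymbol{\alpha}\, T^{\,n}\, \mathbf{1},\\
T^{\,n} &\sim \rho^{\,n}\, \mathbf{v}\, \mathbf{u}^{\top} \quad (n \to \infty),\\
\text{hence}\quad P(W > n) &\sim \big(\boldsymbol{\alpha}\,\mathbf{v}\big)\,\big(\mathbf{u}^{\top}\mathbf{1}\big)\, \rho^{\,n},
\end{aligned}
```

which identifies $\rho$ with the spectral radius of $T$ and gives one candidate expression for the constant $c$.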
Author Contributions
Conceptualization, B.K. and J.K. (Jeongsim Kim); investigation, B.K. and J.K. (Jeongsim Kim); methodology, B.K. and J.K. (Jeongsim Kim); numerical investigation, J.K. (Jerim Kim); writing–original draft preparation, J.K. (Jeongsim Kim); writing–review and editing, B.K., J.K. (Jeongsim Kim) and J.K. (Jerim Kim). All authors have read and agreed to the published version of the manuscript.
Funding
The first author’s research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2B5B01001864). The second author’s research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1F1A1A01065568).
Acknowledgments
We are grateful to the reviewers for their valuable comments and suggestions, which greatly improved this paper.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Kim, B.; Kim, J. Sooner waiting time problems in a sequence of multi-state trials with random rewards. Stat. Probab. Lett. 2019, 153, 171–179.
- Balakrishnan, N.; Koutras, M.V. Runs and Scans with Applications; Wiley: New York, NY, USA, 2002.
- Fu, J.C.; Lou, W.Y.W. Distribution Theory of Runs and Patterns and Its Applications: A Finite Markov Chain Imbedding Approach; World Scientific: Singapore, 2003.
- Fu, J.C.; Koutras, M.V. Distribution theory of runs: A Markov chain approach. J. Am. Stat. Assoc. 1994, 89, 1050–1058.
- Fu, J.C. Reliability of large consecutive-k-out-of-n: F systems with (k-1)-step Markov dependence. IEEE Trans. Reliab. 1986, R-35, 602–606.
- Fu, J.C. Distribution theory of runs and patterns associated with a sequence of multi-state trials. Stat. Sin. 1996, 6, 957–974.
- Li, S.-Y.R. A martingale approach to the study of occurrence of sequence patterns in repeated experiments. Ann. Probab. 1980, 8, 1171–1176.
- Gerber, H.U.; Li, S.-Y.R. The occurrence of sequence patterns in repeated experiments and hitting times in a Markov chain. Stoch. Process. Their Appl. 1981, 11, 101–108.
- Guibas, L.J.; Odlyzko, A.M. String overlaps, pattern matching, and nontransitive games. J. Comb. Theory Ser. A 1981, 30, 183–208.
- Blom, G.; Thorburn, D. How many random digits are required until given sequences are obtained? J. Appl. Probab. 1982, 19, 518–531.
- Antzoulakos, D.L. Waiting times for patterns in a sequence of multistate trials. J. Appl. Probab. 2001, 38, 508–518.
- Han, Q.; Hirano, K. Sooner and later waiting time problems for patterns in Markov dependent trials. J. Appl. Probab. 2003, 40, 73–86.
- Fu, J.C.; Chang, Y.M. On probability generating functions for waiting time distributions of compound patterns in a sequence of multistate trials. J. Appl. Probab. 2002, 39, 70–80.
- Glaz, J.; Kulldorff, M.; Pozdnyakov, V.; Steele, J.M. Gambling teams and waiting times for patterns in two-state Markov chains. J. Appl. Probab. 2006, 43, 127–140.
- Pozdnyakov, V. On occurrence of patterns in Markov chains: Method of gambling teams. Stat. Probab. Lett. 2008, 78, 2762–2767.
- Gava, R.J.; Salotti, D. Stopping probabilities for patterns in Markov chains. J. Appl. Probab. 2014, 51, 287–292.
- Zhao, M.-Z.; Xu, D.; Zhang, H.-Z. Waiting times and stopping probabilities for patterns in Markov chains. Appl. Math. J. Chin. Univ. 2018, 33, 25–34.
- Kerimov, A.; Öner, A. Oscillation properties of expected stopping times and stopping probabilities for patterns consisting of consecutive states in Markov chains. Rocky Mt. J. Math. Available online: https://projecteuclid.org/euclid.rmjm/1588298550 (accessed on 1 May 2020).
- Chang, Y.M. Distribution of waiting time until the rth occurrence of a compound pattern. Stat. Probab. Lett. 2005, 75, 29–38.
- Neuts, M.F. Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach; Johns Hopkins University Press: Baltimore, MD, USA, 1981.
- Neuts, M.F. Structured Stochastic Matrices of M/G/1 Type and Their Applications; Marcel Dekker, Inc.: New York, NY, USA, 1989.
- Latouche, G.; Ramaswami, V. Introduction to Matrix Analytic Methods in Stochastic Modeling; ASA-SIAM Series on Statistics and Applied Probability; SIAM: Philadelphia, PA, USA, 1999.
- Abate, J.; Whitt, W. Numerical inversion of probability generating functions. Oper. Res. Lett. 1992, 12, 245–251.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).