Consistent Estimation of Partition Markov Models
Abstract
1. Introduction
2. Preliminaries
- (i)
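- $s, r \in \mathcal{S}$ are equivalent (denoted by $s \sim_p r$) if $P(a \mid s) = P(a \mid r)$ for all $a \in A$;
- (ii)
- $(X_t)$ is a Markov chain with partition $\mathcal{L} = \{L_1, L_2, \ldots, L_{|\mathcal{L}|}\}$ if this partition is the one defined by the equivalence relationship $\sim_p$ introduced by item (i).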
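Definition 1 amounts to grouping the states by their conditional distributions. Two states fall in the same part exactly when their next-symbol probabilities coincide. The following is a minimal sketch that assumes the true probabilities are known. They are stored in a hypothetical dictionary that maps each state (a string of length $M$) to its vector of probabilities over a fixed ordering of $A$. The function name and the probabilities in the usage lines are illustrative only and do not come from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: state s -> (P(a_1|s), ..., P(a_k|s)) for a fixed order of A.
Probs = Dict[str, Tuple[float, ...]]


def partition_from_probs(probs: Probs, ndigits: int = 12) -> List[List[str]]:
    """Return the parts induced by s ~ r  <=>  P(a|s) == P(a|r) for all a.

    Probabilities are rounded to `ndigits` decimals so that vectors differing
    only by floating-point noise are treated as equal.
    """
    parts: Dict[Tuple[float, ...], List[str]] = defaultdict(list)
    for state in sorted(probs):
        key = tuple(round(p, ndigits) for p in probs[state])
        parts[key].append(state)
    # Parts come out ordered by their lexicographically smallest state.
    return list(parts.values())


# Illustrative order-2 chain on A = {0, 1} (made-up probabilities).
probs = {"00": (0.3, 0.7), "01": (0.5, 0.5), "10": (0.3, 0.7), "11": (0.5, 0.5)}
print(partition_from_probs(probs))  # [['00', '10'], ['01', '11']]
```

Rounding is a pragmatic choice for floating-point inputs. In the estimation problem the probabilities are unknown, so the partition has to be inferred from sample counts instead.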
- (i)
- Suppose Then, verifies Definition 1(ii);
- (ii)
- Suppose and Define and then verifies Definition 1(ii), while does not satisfy that definition.
3. Consistent Estimation through the Bayesian Information Criterion
- (i)
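- $L \subseteq \mathcal{S}$ is a good part of $\mathcal{S}$ if $P(a \mid s) = P(a \mid r)$ for all $s, r \in L$ and all values of $a \in A$;
- (ii)
- $\mathcal{L} = \{L_1, \ldots, L_{|\mathcal{L}|}\}$ is a good partition of $\mathcal{S}$ if, for each $i = 1, \ldots, |\mathcal{L}|$, $L_i$ verifies item (i).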
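When the transition probabilities are known, Definition 2 can be checked mechanically. The sketch below assumes a hypothetical dictionary from each state to its vector of next-symbol probabilities. It tests every part against item (i), and it also verifies that the candidate parts are non-empty, disjoint, and cover the state space. The function names and the example probabilities are illustrative assumptions.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical layout: state s -> (P(a_1|s), ..., P(a_k|s)).
Probs = Dict[str, Tuple[float, ...]]


def is_good_part(part: Iterable[str], probs: Probs, tol: float = 1e-12) -> bool:
    """Item (i): every state in the part has the same next-symbol distribution."""
    states = list(part)
    if not states:
        return False  # parts of a partition are non-empty
    ref = probs[states[0]]
    # Comparing with one reference state suffices for exact equality, which is
    # transitive; with a tolerance this is an approximation.
    return all(
        all(math.isclose(p, q, rel_tol=0.0, abs_tol=tol) for p, q in zip(ref, probs[s]))
        for s in states[1:]
    )


def is_good_partition(partition: Sequence[Iterable[str]], probs: Probs) -> bool:
    """Item (ii): a partition of the state space whose parts are all good."""
    parts = [list(p) for p in partition]
    flat = [s for p in parts for s in p]
    is_partition = len(flat) == len(set(flat)) and set(flat) == set(probs)
    return is_partition and all(is_good_part(p, probs) for p in parts)


probs = {"00": (0.3, 0.7), "01": (0.5, 0.5), "10": (0.3, 0.7), "11": (0.5, 0.5)}
print(is_good_partition([["00", "10"], ["01", "11"]], probs))      # True
print(is_good_partition([["00"], ["01"], ["10"], ["11"]], probs))  # True
print(is_good_partition([["00", "01"], ["10", "11"]], probs))      # False
```

The finest partition, with one state per part, is always good. The partition of Definition 1(ii) is the coarsest good partition, because every good part lies inside a single equivalence class of $\sim_p$.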
- (i)
- is a good partition of
- (ii)
- Consider Example 1(ii), and the partition is a good partition of
- (a)
- Let denote the partition where is a partition of and for with
- (b)
- For , we write and In addition,
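Section 3 selects partitions with the Bayesian Information Criterion, but the count notation of item (b) did not survive extraction. The following is a hedged sketch only. It assumes the standard BIC for a partition Markov model, with counts pooled over the states of each part, $N_n(L,a) = \sum_{s \in L} N_n(s,a)$ and $N_n(L) = \sum_{a \in A} N_n(L,a)$. It also assumes one free distribution, with $|A|-1$ parameters, per part. Under these assumptions the criterion is $\mathrm{BIC}(\mathcal{L}) = \sum_{L \in \mathcal{L}} \sum_{a \in A} N_n(L,a) \ln \frac{N_n(L,a)}{N_n(L)} - \frac{(|A|-1)\,|\mathcal{L}|}{2} \ln n$. The sketch takes $n$ to be the sample length, and the helper names are hypothetical.

```python
import math
from typing import Dict, Sequence

# Hypothetical layout: counts[s][a] = N_n(s, a), the number of times state s
# (a string of length M) is followed by symbol a in the sample.
Counts = Dict[str, Dict[str, int]]


def transition_counts(x: str, order: int) -> Counts:
    """Count N_n(s, a) for every length-`order` state s in the sequence x."""
    counts: Counts = {}
    for t in range(order, len(x)):
        row = counts.setdefault(x[t - order:t], {})
        row[x[t]] = row.get(x[t], 0) + 1
    return counts


def bic(partition: Sequence[Sequence[str]], counts: Counts,
        alphabet: Sequence[str], n: int) -> float:
    """Maximized log-likelihood of the partition minus (|A| - 1)|L|/2 * ln n."""
    loglik = 0.0
    for part in partition:
        # Pool the counts of all states in the part: N_n(L, a) and N_n(L).
        pooled = {a: sum(counts.get(s, {}).get(a, 0) for s in part) for a in alphabet}
        total = sum(pooled.values())
        for c in pooled.values():
            if c > 0:  # the convention 0 * ln 0 = 0
                loglik += c * math.log(c / total)
    penalty = 0.5 * (len(alphabet) - 1) * len(partition) * math.log(n)
    return loglik - penalty


x = "0101010101"
c = transition_counts(x, order=1)  # {'0': {'1': 5}, '1': {'0': 4}}
print(bic([["0"], ["1"]], c, ["0", "1"], len(x)))  # two separate parts
print(bic([["0", "1"]], c, ["0", "1"], len(x)))    # both states pooled
```

Here the two-part model scores about −2.30 and the pooled model about −7.33, so BIC keeps the two states apart. This is the expected outcome, since state 0 is always followed by 1 and vice versa.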
3.1. A Metric on the State Space
- (i)
- with equality if and only if
- (ii)
- (iii)
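The defining formula of the metric is missing from this section. One BIC-based choice, stated here purely as an assumption, compares the separate and pooled log-likelihoods of two states $s$ and $r$:
$d_n(s,r) = \frac{2}{(|A|-1)\ln n} \sum_{a \in A} \Big[ N_n(s,a) \ln \frac{N_n(s,a)}{N_n(s)} + N_n(r,a) \ln \frac{N_n(r,a)}{N_n(r)} - \big(N_n(s,a)+N_n(r,a)\big) \ln \frac{N_n(s,a)+N_n(r,a)}{N_n(s)+N_n(r)} \Big]$.
With this normalization, $d_n(s,r) \ge 0$ with equality when the empirical proportions of $s$ and $r$ coincide, and $d_n(s,r) = d_n(r,s)$. The BIC of the partition that pools $s$ and $r$ exceeds that of the partition keeping them apart by $\frac{(|A|-1)\ln n}{2}\,(1 - d_n(s,r))$. Pooling therefore raises the BIC exactly when $d_n(s,r) < 1$. The sketch below assumes a hypothetical count layout and uses hypothetical function names.

```python
import math
from typing import Dict, Sequence

# Hypothetical layout: a row maps each symbol a to the count N_n(s, a).
Row = Dict[str, int]


def _loglik(row: Row, alphabet: Sequence[str]) -> float:
    """Maximized log-likelihood sum_a N(a) ln(N(a) / N) of one row of counts."""
    total = sum(row.get(a, 0) for a in alphabet)
    value = 0.0
    for a in alphabet:
        c = row.get(a, 0)
        if c > 0:
            value += c * math.log(c / total)
    return value


def bic_distance(row_s: Row, row_r: Row, alphabet: Sequence[str], n: int) -> float:
    """Assumed BIC-based distance d_n(s, r); pooling s and r raises BIC iff d_n < 1."""
    pooled = {a: row_s.get(a, 0) + row_r.get(a, 0) for a in alphabet}
    gain = _loglik(row_s, alphabet) + _loglik(row_r, alphabet) - _loglik(pooled, alphabet)
    return 2.0 * gain / ((len(alphabet) - 1) * math.log(n))


A = ["0", "1"]
print(bic_distance({"0": 2, "1": 2}, {"0": 4, "1": 4}, A, n=100))  # ~0: same proportions
print(bic_distance({"1": 5}, {"0": 4}, A, n=10))                  # > 1: keep apart
```

A merge rule then reads `bic_distance(row_s, row_r, A, n) < 1`. The threshold 1 is a consequence of the assumed normalization, not a tuning constant.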
3.2. Consistent Estimation of the Process’s Partition
4. Navigation Patterns on a Web Site (MSNBC.com)
5. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
Appendix A. Proofs
Appendix A.1. Proof of Theorem 1
Appendix A.2. Proof of Theorem 2
Appendix A.3. Proof of Theorem 3
Appendix B. Auxiliary Results
s | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
0.1 | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 |
User | Sequence of page categories visited |
---|---|
User 1 | frontpage, tech, tech, frontpage. |
User 2 | weather, weather, weather, misc, local, weather, weather, weather. |
User 3 | on-air, msn-news, msn-news, msn-news, msn-news, misc, msn-news. |
User 4 | news. |
User 5 | msn-sports, sports, msn-sports. |
User 6 | frontpage, frontpage, frontpage. |
User 7 | news, business, tech, local, business, business. |
User 8 | frontpage. |
User 9 | local. |
User 10 | frontpage, tech, tech. |
User 11 | frontpage, frontpage, business, frontpage. |
User 12 | sports, sports, sports, sports, sports, sports. |
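The sessions listed above are sequences of page categories. The strings in the table of parts (for example `local.local.local`) appear to be order-3 states written with `.` between categories. Under that reading, the transition counts could be collected as in the sketch below. It assumes, as our own choice, that transitions are counted within each session only, so no state spans two users. The function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence


def session_counts(sessions: Iterable[Sequence[str]], order: int) -> Dict[str, Counter]:
    """Count (state, next category) pairs, with states written as 'c1.c2...cM'."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        # Each session is scanned on its own; sessions with at most `order`
        # page views contribute nothing.
        for t in range(order, len(session)):
            state = ".".join(session[t - order:t])
            counts[state][session[t]] += 1
    return dict(counts)


user_2 = ["weather", "weather", "weather", "misc", "local", "weather", "weather", "weather"]
user_12 = ["sports"] * 6
counts = session_counts([user_2, user_12], order=3)
print(counts["weather.weather.weather"])  # Counter({'misc': 1})
print(counts["sports.sports.sports"])     # Counter({'sports': 3})
```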
Order 3
Method | Number of Parts | BIC Value |
---|---|---|
(i) | 196 | −2957442 |
(ii) | 210 | −2895322 |
(iii) | 269 | −2865622 |
Order 2
Method | Number of Parts | BIC Value |
---|---|---|
(i) | 177 | −3614825 |
(ii) | 177 | −3613655 |
(iii) | 181 | −3611092 |
Part | Strings | |
---|---|---|
msn-news.news.local, msn-news.business.local, on-air.tech.local | 0.7257 | |
tech.local.local, msn-news.tech.local, business.local.local | ||
on-air.local.local, msn-news.local.local | ||
health.news.local, health.local.local, news.local.local | 0.6096 | |
local.local.local | 0.8874 | |
misc.local.local, tech.weather.local, frontpage.opinion.misc | 0.6355 | |
local.news.misc, local.misc.local, misc.misc.local | ||
weather.local.local, local.weather.misc, weather.weather.misc | 0.7822 | |
local.local.misc, on-air.weather.misc, msn-news.weather.misc, local.misc.misc | 0.7373 | |
misc.local.misc | 0.8563 |
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
García, J.E.; González-López, V.A. Consistent Estimation of Partition Markov Models. Entropy 2017, 19, 160. https://doi.org/10.3390/e19040160