Fixed-Rate Universal Lossy Source Coding and Model Identification: Connection with Zero-Rate Density Estimation and the Skeleton Estimator
Abstract
1. Introduction
Contributions of This Work
2. Preliminaries
2.1. Basic Definitions
2.2. Fixed-Rate Universal Lossy Source Coding with Memory or Training Data
2.3. Raginsky’s Two-Stage Joint Universal Coding and Modeling
3. Connections with Zero-Rate Density Estimation
3.1. Density Estimation with a Rate Constraint
3.2. Main Results
- (i) can be expressed by where is a bounded metric in with and
- (ii) for all , for all , and for all , there exists a -block code, say , that achieves the n-order operational DRF in (7).
4. Joint Source Coding and Modeling Achievability Results
4.1. Main Result: The Skeleton Density Estimator
4.2. Examples of -Totally Bounded Classes
4.2.1. Finite Mixture Classes
4.2.2. Monotone Densities in
4.2.3. r-Moment Smooth Class in
4.3. Yatracos Classes with Finite VC Dimension
- (i) is -totally bounded,
- (ii) the Yatracos class has a finite VC dimension (Definition A1 in Appendix B), and
- (iii) the Kolmogorov entropy of associated with the sequence grows strictly sub-linearly, i.e., is $o(n)$,
5. The Parametric Scenario
The Practical Skeleton Estimator
6. Summary of the Results
- Proposition 1 and Theorem 1 formalize the interplay between the two-stage joint fixed-rate coding and modeling objective and the problem of zero-rate uniformly consistent (in expected total variation) density estimation.
- Theorem 2 establishes a necessary and sufficient condition on a family of densities for the existence of a strongly minimax joint coding and modeling scheme achieving both source coding and model identification objectives (Definition 4). The result is obtained for the rich non-parametric collection of -totally bounded densities.
- For the modeling stage, we propose using the skeleton estimator, which first quantizes the data and then finds the minimum-distance decision on this finite set of density candidates (42). This is a practical solution in the sense that the inference (minimization) is carried out over a finite set.
- Under combinatorial regularity conditions on the family of distributions , the skeleton scheme achieves rate of convergence in the n-order distortion redundancy and the same rate in the expected total variation distance for the modeling part (Theorem 3).
- Finally, for a relevant parametric setting, a practical skeleton-based joint coding and modeling scheme is proposed that achieves a rate of for the n-order distortion redundancy (Theorem 4). This rate is slightly better than the achieved in [18] under the same rate overhead of . Furthermore, Theorem 4 removes the finite-VC-dimension assumption over the Yatracos class considered in [18] (Theorem 3.2), while achieving the same performance rates in terms of n-order distortion redundancy , uniform expected risk to learn the density , and rate overhead .
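The minimum-distance step mentioned in the skeleton-estimator bullet above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a finite alphabet (so each candidate is a probability mass function stored as a dict), and it uses the standard minimum-distance rule over Yatracos sets, that is, the sets where one candidate exceeds another. The names `yatracos_sets` and `minimum_distance_index` are hypothetical.

```python
from itertools import permutations


def yatracos_sets(candidates):
    """Return the sets {x : f(x) > g(x)} for every ordered pair f != g."""
    return [
        frozenset(x for x in f if f[x] > g.get(x, 0.0))
        for f, g in permutations(candidates, 2)
    ]


def minimum_distance_index(candidates, sample):
    """Index of the candidate whose mass on the Yatracos sets best matches
    the empirical frequencies of `sample` (assumed non-empty)."""
    sets = yatracos_sets(candidates)
    n = len(sample)
    # Empirical measure of each Yatracos set.
    empirical = [sum(1 for x in sample if x in a) / n for a in sets]

    def score(f):
        # Largest gap between model mass and empirical mass over all sets.
        return max(
            (abs(sum(f.get(x, 0.0) for x in a) - e)
             for a, e in zip(sets, empirical)),
            default=0.0,  # a single candidate has no Yatracos sets
        )

    return min(range(len(candidates)), key=lambda i: score(candidates[i]))
```

For example, with the two-element skeleton `[{0: 0.5, 1: 0.5}, {0: 0.9, 1: 0.1}]` and a sample of nine zeros and one 1, the second candidate matches the empirical frequencies exactly and is selected. Because the search runs over a finite set, the cost is governed by the number of candidates and Yatracos sets, which is the sense in which the bullet calls the scheme practical.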
7. Conclusions
8. Proofs of Results
8.1. Proposition 1
8.2. Theorem 1
8.3. Theorem 2
8.4. Theorem 3
8.5. Lemma 1
8.6. Theorem 4
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A. Proof of (22) and (23)
Appendix B. Basic Definitions of Vapnik and Chervonenkis Theory
Appendix C. Pseudo Algorithm to Implement the Practical ϵ-Covering Presented in Lemma 1
- In each of the k dimensions of , the interval is partitioned uniformly with sub-intervals of length . This produces a scalar quantization of with prototypes per coordinate.
- A product partition of is made with the scalar quantizations of the previous step. From the proof of Lemma 1, this is a -covering of with prototypes. Let us denote this set by .
- From the proof of Lemma 1, the covering of constructed in the previous step induces an -covering of by applying the indexing function , i.e., by
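The three steps above can be sketched as follows. Since the symbols in the steps were lost in extraction, the interval endpoints `low` and `high`, the cell length `step`, and the indexing function `index_fn` are all assumptions introduced for illustration, as are the function names; cell midpoints are used as prototypes, which is one natural choice rather than necessarily the one in Lemma 1.

```python
import itertools
import math


def scalar_grid(low, high, step):
    """Midpoints of a uniform partition of [low, high] into cells of
    length at most `step` (the per-coordinate scalar quantization)."""
    n_cells = max(1, math.ceil((high - low) / step))
    width = (high - low) / n_cells
    return [low + (i + 0.5) * width for i in range(n_cells)]


def product_covering(low, high, step, k):
    """Product of k identical scalar grids: a covering of [low, high]^k
    in the sup norm with radius at most step / 2."""
    axis = scalar_grid(low, high, step)
    return list(itertools.product(axis, repeat=k))


def induced_skeleton(prototypes, index_fn):
    """Map each parameter prototype to a model through the (assumed)
    indexing function, giving a finite set of candidates."""
    return [index_fn(theta) for theta in prototypes]
```

With `low=0.0`, `high=1.0`, `step=0.25`, and `k=2`, the grid has 4 prototypes per coordinate and 16 in total. The number of prototypes grows exponentially in `k`, so a sketch of this kind is only workable for low-dimensional parameter spaces.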
References
- Csiszár, I.; Shields, P.C. Information Theory and Statistics: A Tutorial; Now Inc.: Houston, TX, USA, 2004.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley Interscience: New York, NY, USA, 2006.
- Györfi, L.; Páli, I.; van der Meulen, E. There is no universal source code for an infinite source alphabet. IEEE Trans. Inf. Theory 1994, 40, 267–271.
- Davisson, L.D. Universal noiseless coding. IEEE Trans. Inf. Theory 1973, 19, 783–785.
- Kieffer, J.C. A unified approach to weak universal source coding. IEEE Trans. Inf. Theory 1978, 24, 674–682.
- Rissanen, J. Universal coding, information, prediction, and estimation. IEEE Trans. Inf. Theory 1984, 30, 629–636.
- Boucheron, S.; Garivier, A.; Gassiat, E. Coding on countably infinite alphabets. IEEE Trans. Inf. Theory 2009, 55, 358–373.
- Bontemps, D.; Boucheron, S.; Gassiat, E. About adaptive coding on countable alphabets. IEEE Trans. Inf. Theory 2014, 60, 808–821.
- Bontemps, D. Universal coding on infinite alphabets: exponentially decreasing envelopes. IEEE Trans. Inf. Theory 2011, 57, 1466–1478.
- Silva, J.F.; Piantanida, P. Almost lossless variable-length source coding on countably infinite alphabets. In Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 1–5.
- Silva, J.F.; Piantanida, P. The redundancy gains of almost lossless universal source coding over envelope families. In Proceedings of the IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017; pp. 1–5.
- Silva, J.F.; Piantanida, P. Universal weak variable-length source coding on countable infinite alphabets. arXiv 2017, arXiv:1708.08103.
- Berger, T.; Gibson, J.D. Lossy source coding. IEEE Trans. Inf. Theory 1998, 44, 2693–2723.
- Linder, T.; Lugosi, G.; Zeger, K. Rates of convergence in the source coding theorem, in empirical quantization design, and in universal lossy source coding. IEEE Trans. Inf. Theory 1994, 40, 1728–1740.
- Linder, T.; Lugosi, G.; Zeger, K. Fixed-rate universal lossy source coding and rate of convergence for memoryless sources. IEEE Trans. Inf. Theory 1995, 41, 665–676.
- Neuhoff, D.L.; Gray, R.M.; Davisson, L.D. Fixed rate universal block source coding with a fidelity criterion. IEEE Trans. Inf. Theory 1975, 21, 511–523.
- Ziv, J. Coding of sources with unknown statistics-Part II: Distortion relative to a fidelity criterion. IEEE Trans. Inf. Theory 1972, 18, 389–394.
- Raginsky, M. Joint fixed-rate universal lossy coding and identification of continuous-alphabet memoryless sources. IEEE Trans. Inf. Theory 2008, 54, 3059–3077.
- Chou, P.; Effros, M.; Gray, R.M. A vector quantization approach to universal noiseless coding and quantization. IEEE Trans. Inf. Theory 1996, 42, 1109–1138.
- Rissanen, J. Stochastic complexity and modeling. Ann. Stat. 1986, 14, 1080–1100.
- Barron, A.; Rissanen, J.; Yu, B. The minimum description length principle in coding and modeling. IEEE Trans. Inf. Theory 1998, 44, 2743–2760.
- Barron, A.; Györfi, L.; van der Meulen, E.C. Distribution estimation consistent in total variation and in two types of information divergence. IEEE Trans. Inf. Theory 1992, 38, 1437–1454.
- Tao, G. Adaptive Control Design and Analysis; Wiley-IEEE Press: Hoboken, NJ, USA, 2003.
- Berger, T. Rate Distortion Theory; Prentice Hall: Upper Saddle River, NJ, USA, 1971.
- Devroye, L.; Györfi, L. Nonparametric Density Estimation: The L1 View; Wiley Interscience: New York, NY, USA, 1985.
- Devroye, L.; Györfi, L. Principles of Nonparametric Learning; Chapter Distribution and Density Estimation; Springer: New York, NY, USA, 2001.
- Shannon, C.E. Coding theorems for a discrete source with fidelity criterion. IRE Int. Conv. Rec. 1959, 4, 325–350.
- Gallager, R.G. Information Theory and Reliable Communication; John Wiley & Sons: Hoboken, NJ, USA, 1968.
- Yatracos, Y.G. Rates of convergence of minimum distance estimators and Kolmogorov’s entropy. Ann. Stat. 1985, 13, 768–774.
- Devroye, L.; Lugosi, G. Combinatorial Methods in Density Estimation; Springer: New York, NY, USA, 2001.
- Silva, J.F.; Derpich, M.S. Necessary and sufficient conditions for zero-rate density estimation. In Proceedings of the Information Theory Workshop (ITW), Paraty, Brazil, 16–20 October 2011.
- Halmos, P.R. Measure Theory; Van Nostrand: New York, NY, USA, 1950.
- Scheffé, H. A useful convergence theorem for probability distributions. Ann. Math. Stat. 1947, 18, 434–458.
- Gersho, A.; Gray, R. Vector Quantization and Signal Compression; Kluwer Academic: Norwell, MA, USA, 1992.
- Gray, R.; Neuhoff, D. Quantization. IEEE Trans. Inf. Theory 1998, 44, 2325–2384.
- Gray, R.M. Entropy and Information Theory; Springer: New York, NY, USA, 1990.
- Kolmogorov, A.N.; Tikhomirov, V.M. ϵ-entropy and ϵ-capacity of sets in function spaces. Transl. Am. Math. Soc. 1961, 17, 277–364.
- Yatracos, Y.G. A note on L1 consistent estimation. Can. J. Stat. 1988, 16, 283–292.
- Vapnik, V.; Chervonenkis, A.J. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 1971, 16, 264–280.
- Vapnik, V. Statistical Learning Theory; John Wiley: Hoboken, NJ, USA, 1998.
- Dudley, R.M. Central limit theorems for empirical measures. Ann. Probab. 1978, 6, 899–929.
- Devroye, L.; Lugosi, G. A universally acceptable smoothing factor for kernel density estimation. Ann. Stat. 1996, 24, 2499–2512.
- Devroye, L.; Lugosi, G. Nonasymptotic universal smoothing factors, kernel complexity and Yatracos classes. Ann. Stat. 1997, 25, 2626–2637.
- Hoeffding, W. Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 1963, 58, 13–30.
- Devroye, L.; Györfi, L.; Lugosi, G. A Probabilistic Theory of Pattern Recognition; Springer: New York, NY, USA, 1996.
- Breiman, L. Probability; Addison-Wesley: Boston, MA, USA, 1968.
- Varadhan, S. Probability Theory; American Mathematical Society: Providence, RI, USA, 2001.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Silva, J.F.; Derpich, M.S. Fixed-Rate Universal Lossy Source Coding and Model Identification: Connection with Zero-Rate Density Estimation and the Skeleton Estimator. Entropy 2018, 20, 640. https://doi.org/10.3390/e20090640