Rate Distortion Theory for Descriptive Statistics
Abstract
1. Introduction
1.1. Preliminaries on Rate Distortion Theory
1.2. Lossy Source Coding and Statistics
1.3. Calculations in the R Program
1. We draw a number of random probability vectors over the source alphabet from a Dirichlet distribution.
2. For each probability vector over the source alphabet, the optimal reconstruction point for these weights is calculated using the Nelder–Mead algorithm, a general-purpose optimization algorithm built into the R program.
3. If one of the reconstruction points has probability close to zero, it is removed.
4. If the conditional distributions of source points given two different reconstruction points are close together, then one of the two reconstruction points is removed.
5. Each reconstruction point is replaced by a new reconstruction point that is optimal with respect to the joint distribution, and the procedure returns to step 3.
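
The steps above can be illustrated with a minimal sketch. It is written in Python rather than R, and it is not the author's implementation. It rests on several assumptions that the outline does not state: a uniform source on real-valued points, squared-error distortion (so the optimal reconstruction point is the weighted mean, which stands in for the Nelder–Mead search), and a Blahut–Arimoto style update of the joint distribution at a fixed slope `beta`. The function names and the thresholds `min_prob` and `merge_tol` are hypothetical.

```python
import math
import random


def dirichlet(n, rng, alpha=1.0):
    """Draw one probability vector of length n from a symmetric Dirichlet."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(g)
    return [v / total for v in g]


def best_point(xs, weights):
    """Minimiser of the weighted squared error, i.e. the weighted mean.

    Assumption: this closed form replaces the Nelder-Mead search that a
    general distortion function would need.
    """
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


def fit(xs, beta, n_start=8, n_iter=200, min_prob=1e-6, merge_tol=1e-3, seed=0):
    """Return a sorted list of (reconstruction point, probability) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    # Random Dirichlet weights, then one optimal point per weight vector.
    points = [best_point(xs, dirichlet(n, rng)) for _ in range(n_start)]
    q = [1.0 / n_start] * n_start
    for _ in range(n_iter):
        # Joint distribution of (source point, reconstruction point),
        # assuming a uniform source and p(j | x) proportional to
        # q_j * exp(-beta * d(x, point_j)).
        joint = []
        for x in xs:
            d = [(x - c) ** 2 for c in points]
            d_min = min(d)  # shift the exponents for numerical stability
            row = [qj * math.exp(-beta * (dj - d_min)) for qj, dj in zip(q, d)]
            z = sum(row)
            joint.append([r / (z * n) for r in row])
        q = [sum(row[j] for row in joint) for j in range(len(points))]
        # Remove reconstruction points with probability close to zero.
        alive = [j for j in range(len(points)) if q[j] > min_prob]
        cond = {j: [row[j] / q[j] for row in joint] for j in alive}
        # Remove a point whose conditional distribution over the source
        # is close (in L1 distance) to that of a point already kept.
        kept = []
        for j in alive:
            twin = next(
                (k for k in kept
                 if sum(abs(a - b) for a, b in zip(cond[j], cond[k])) < merge_tol),
                None,
            )
            if twin is None:
                kept.append(j)
            else:
                q[twin] += q[j]
        # Re-optimise every surviving point against the joint distribution.
        points = [best_point(xs, cond[j]) for j in kept]
        total = sum(q[j] for j in kept)
        q = [q[j] / total for j in kept]
    return sorted(zip(points, q))
```

For two well-separated groups of values, for instance `fit([-0.1, 0.0, 0.1, 9.9, 10.0, 10.1], beta=5.0)`, the sketch should settle on one reconstruction point per group. A larger `beta` penalises distortion more heavily and tends to keep more reconstruction points alive.
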
1.4. Organization of the Paper
2. Cluster Analysis
2.1. Centroid-Based Clustering
2.2. Hierarchical Clustering
3. Gaussian Mixture Models
4. Anscombe’s Quartet
4.1. First Dataset
4.2. Second Dataset
4.3. Third Dataset
4.4. Fourth Dataset
5. Analysis of Orientation of Early Mosques
5.1. Historical Background
5.2. Distortion
- The architect may not have been able to determine the qibla bearing exactly.
- Local obstacles or other practical problems may have influenced the orientation of the site.
- The original structure may now be a ruin, or it may have been rebuilt so that the original orientation is difficult to determine.
- Sometimes the orientation is difficult to measure, as discussed in [49].
5.3. Test of Great Circles vs. Rhumb Lines
5.4. Outlier Detection for Sites before 700 CE
5.5. Determination of Rate and Reconstruction Point
5.6. Calculation of a Descriptive Confidence Region
5.7. Cross-Validation
- All the mosques in the Maghreb appear as outliers and are removed from the dataset.
- The reconstruction point associated with Petra appears again with approximately the same coordinates and the same confidence region.
- A reconstruction point appears with Jerusalem as qibla. Only a single mosque at Qasr Tuba is associated with this qibla with high confidence. A few other mosques could also have had this qibla.
- Two new reconstruction points emerge south and southeast of Petra. If the slope is increased to , then these two reconstruction points merge into a single reconstruction point that we label .
6. Discussion
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kullback, S.; Leibler, R. On Information and Sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
- Kullback, S. Information Theory and Statistics; Wiley: New York, NY, USA, 1959.
- Csiszár, I.; Shields, P. Information Theory and Statistics: A Tutorial; Foundations and Trends in Communications and Information Theory; Now Publishers Inc.: Hanover, MA, USA, 2004.
- Quine, M.P.; Robinson, J. Efficiencies of chi-square and likelihood ratio goodness-of-fit tests. Ann. Stat. 1985, 13, 727–742.
- Harremoës, P.; Vajda, I. Entropy Testing is Efficient. In Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; pp. 1841–1846.
- Harremoës, P.; Vajda, I. On the Bahadur-Efficient Testing of Uniformity by means of the Entropy. IEEE Trans. Inf. Theory 2008, 54, 321–331.
- Holst, L. Asymptotic normality and efficiency for certain goodness-of-fit tests. Biometrika 1972, 59, 138–145.
- Zamanzade, E.; Arghami, N.R. Goodness-of-fit test based on correcting moments of modified entropy estimator. J. Stat. Comput. Simul. 2011, 81, 2077–2093.
- Mahdizadeh, M.; Zamanzade, E. New goodness of fit tests for the Cauchy distribution. J. Appl. Stat. 2017, 44, 1106–1121.
- Zamanzade, E.; Mahdizadeh, M. Entropy Estimation From Ranked Set Samples with Application to Test of Fit. Rev. Colomb. Estad. 2017, 40, 223.
- Rissanen, J. Modelling by shortest data description. Automatica 1978, 14, 465–471.
- Barron, A.R.; Rissanen, J.; Yu, B. The Minimum Description Length Principle in Coding and Modeling. IEEE Trans. Inf. Theory 1998, 44, 2743–2760.
- Grünwald, P. The Minimum Description Length Principle; MIT Press: Cambridge, MA, USA, 2007.
- Harremoës, P. Extendable MDL. In Proceedings of the 2013 IEEE International Symposium on Information Theory, Istanbul, Turkey, 7–12 July 2013; pp. 1516–1520.
- Linnik, Y.V. An information-theoretic proof of the central limit theorem with Lindeberg condition. Theory Probab. Appl. 1959, 4, 288–299.
- Brown, L.D. A Proof of the Central Limit Theorem motivated by the Cramer-Rao Inequality. In Statistics and Probability: Essays in Honor of C. R. Rao; Kallianpur, G., Krishnaiah, P.R., Ghosh, J.K., Eds.; North-Holland: Amsterdam, The Netherlands, 1982; pp. 141–148.
- Barron, A.R. Entropy and the Central Limit Theorem. Ann. Probab. 1986, 14, 336–342.
- Johnson, O. Information Theory and Central Limit Theorem; Imperial College Press: London, UK, 2004.
- Barron, A.R. The strong ergodic theorem for densities: Generalized Shannon-McMillan-Breiman theorem. Ann. Probab. 1985, 13, 1292–1303.
- Shields, P. The Interaction Between Ergodic Theory and Information Theory. IEEE Trans. Inf. Theory 1998, 44, 2079–2093.
- Barron, A. Information Theory and Martingales. In Proceedings of the International Symposium on Information Theory, Budapest, Hungary, 24–28 June 1991.
- Harremoës, P. Martingales and Information Divergence. In Proceedings of the 2005 IEEE International Symposium on Information Theory, Adelaide, Australia, 4–9 September 2005; pp. 164–168.
- Csiszár, I. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. 1963, 8, 95–108.
- Harremoës, P.; Holst, K.K. Convergence of Markov Chains in Information Divergence. J. Theor. Probab. 2009, 22, 186–202.
- Harremoës, P. Poisson’s Law and Information Theory. In Proceedings of the 2001 IEEE International Symposium on Information Theory, Washington, DC, USA, 24–29 June 2001; p. 46.
- Harremoës, P.; Ruzankin, P. Rate of Convergence to Poisson Law in Terms of Information Divergence. IEEE Trans. Inf. Theory 2004, 50, 2145–2149.
- Harremoës, P.; Johnson, O.; Kontoyiannis, I. Thinning, Entropy and the Law of Thin Numbers. IEEE Trans. Inf. Theory 2010, 56, 4228–4244.
- Shannon, C.E. Coding theorems for a discrete source with a fidelity criterion. IRE Natl. Conv. Rec. 1959, 7, 142–163.
- Berger, T. Rate Distortion Theory: A Mathematical Basis for Data Compression; Prentice-Hall: Englewood Cliffs, NJ, USA, 1971.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 1991.
- Blau, Y.; Michaeli, T. Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 675–685.
- Eldar, Y.C.; Kutyniok, G. (Eds.) Compressed Sensing: Theory and Applications; Cambridge University Press: Cambridge, UK, 2012.
- Harremoës, P. Testing Goodness-of-Fit via Rate Distortion. In Proceedings of the 2009 IEEE Information Theory Workshop on Networking and Information Theory, Volos, Greece, 10–12 June 2009; pp. 17–21.
- Harremoës, P. The Rate Distortion Test of Normality. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 241–245.
- Sims, C.R. RateDistortion: Routines for Solving Rate-Distortion Problems. Package for R Program. Not Updated. Available online: https://rdrr.io/cran/RateDistortion/ (accessed on 8 January 2022).
- Harremoës, P. ArimotoBlahut.R. Package for Rate Distortion Calculations in R. Available online: http://www.harremoes.dk/Peter/ArimotoBlahut.R (accessed on 6 January 2022).
- Harremoës, P. QiblaScript.R. Worksheet for Analyzing Qibla Data in R. Available online: http://www.harremoes.dk/Peter/QiblaSkript.R (accessed on 25 December 2022).
- Blahut, R.E. Computation of Channel Capacity and Rate-Distortion Functions. IEEE Trans. Inf. Theory 1972, 18, 460–473.
- Everitt, B.; Landau, S.; Leese, M.; Stahl, D. Cluster Analysis, 5th ed.; Wiley Series in Probability and Statistics; Wiley: Hoboken, NJ, USA, 2011.
- Banerjee, A.; Merugu, S.; Dhillon, I.S.; Ghosh, J. Clustering with Bregman Divergences. J. Mach. Learn. Res. 2005, 6, 1705–1749.
- Harremoës, P.; Tishby, N. The Information Bottleneck Revisited or How to Choose a Good Distortion Measure. In Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; pp. 566–570.
- Anscombe, F.J. Graphs in Statistical Analysis. Am. Stat. 1973, 27, 17–21.
- Gibson, D. Early Islamic Qibla Database. Available online: https://figshare.com/articles/dataset/Early_Islamic_Qibla_Database/13570655/1 (accessed on 25 December 2022).
- Schumm, W.R. How Accurately could Early (622–900 C.E.) Muslims Determine the Direction of Prayers (Qibla)? Religions 2020, 11, 102.
- Schumm, W.R.; Crawford, D.W.; Barkey, P.E.; Bush, D.; Bosch, D.W. Using statistics to analyze anthropological/religious issues from the distant past. Insights Anthropol. 2021, 5, 337–346.
- Amine, A. L’islam de Pétra Réponse à la thèse de Dan Gibson: Présentation & Revue critique; BoD—Books on Demand: Norderstedt, Germany, 2020.
- Rignell, K. Letter from Jacob of Edessa to John the Stylite of Litarab Concerning Ecclesiastical Canons; Gleerup: Lund, Sweden, 1979.
- Von Grunebaum, G.E. Classical Islam; Taylor & Francis Ltd.: New York, NY, USA, 2017.
- Gibson, D. Early Islamic Qiblas; Scholars Press: Vancouver, BC, Canada, 2017.
- Crone, P. Meccan Trade and the Rise of Islam; Gorgias Press: Piscataway, NJ, USA, 1987.
- Khan, M.A. The Unveiling Origin of Mecca; AuthorHouse: Bloomington, IN, USA, 2021.
- al Tamimi, A.J. The Byzantine-Arabic Chronicle: Full Translation and Analysis. August 2019. Available online: https://aymennjawad.org/23129/the-byzantine-arabic-chronicle-full-translation (accessed on 25 December 2022).
- Hottinger, J.H. Historia Orientalis, Quae ex Variis Orientalium Monumentis Collecta; Joh. Jacobi Bodmeri: Zürich, Switzerland, 1651.
- Mardia, K.V.; Jupp, P.E. Directional Statistics; Wiley Series in Probability and Statistics; Wiley: Chichester, UK, 2000.
- Harremoës, P. Information Theory for Angular Data. In Proceedings of the 2010 IEEE Information Theory Workshop (ITW 2010), Cairo, Egypt, 6–8 January 2010; pp. 181–185.
- King, D.A. The Petra Fallacy—Early Mosques do Face the Sacred Kaaba in Mecca but Dan Gibson Doesn’t Know How/Comparing Historical Orientations with Modern Directions Can Lead to False Results. 2020. Available online: https://muslimheritage.com/the-petra-fallacy/ (accessed on 25 December 2022).
- Voss, J. An Introduction to Statistical Computing; Wiley: Chichester, UK, 2014.
- Macdonald, M.C.A. Arabs and Empires before Islam; Chapter Arabs and Empires before the Sixth Century; Oxford University Press: Oxford, UK, 2015; pp. 11–89.
- Gibson, D.; Harremoës, P. Early Islamic Qibla Database 2022. 2022. Available online: https://figshare.com/articles/dataset/Early_Islamic_Qibla_Database/13570655/2 (accessed on 25 December 2022).
x (I) | y (I) | x (II) | y (II) | x (III) | y (III) | x (IV) | y (IV) |
---|---|---|---|---|---|---|---|
10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |
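
Anscombe's quartet is known for four datasets whose standard summary statistics nearly coincide even though their scatter plots look very different. The following Python sketch (Python rather than R, and not taken from the paper) recomputes the means, sample variances and Pearson correlation from the values in the table above. The helper name `summary` and the dictionary `QUARTET` are illustrative only.

```python
from math import sqrt

# Values copied from the table above; datasets I-III share the same x column.
X123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(xs, ys):
    """Means, sample variances and Pearson correlation of paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
    }


for name, (xs, ys) in QUARTET.items():
    stats = summary(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Because these statistics barely differ between the four datasets, they cannot by themselves separate a linear trend from a curved one or from a single outlier.
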
| Rec. Point | Latitude | Longitude | Weight | Distortion |
|---|---|---|---|---|
| | | | 77.16% | 0.00446 |
| Ma | | | 15.34% | 0.00266 |
| SG | | | 7.50% | 0.00134 |
Site | |||
---|---|---|---|
Massawa Mosque | 100.0 | 0.0 | 0.0 |
Huaisheng Mosque | 86.9 | 0.1 | 13.0 |
Hama Great Mosque | 91.1 | 8.9 | 0.0 |
Palmyra Congregational | 92.2 | 7.8 | 0.0 |
Amr ibn -Al-As | 100.0 | 0.0 | 0.0 |
Sidi Ghanem | 0.0 | 0.0 | 100.0 |
Graveyard of Sidi ’Ukba | 0.1 | 58.2 | 41.7 |
Qasr Humeima | 100.0 | 0.0 | 0.0 |
Zawailah | 0.0 | 100.0 | 0.0 |
Dome of the Chain | 100.0 | 0.0 | 0.0 |
Ka’ba | 100.0 | 0.0 | 0.0 |
Qasr El-Bai’j | 79.2 | 29.6 | 0.0 |
Um Jimal Later Castellum | 84.7 | 15.3 | 0.0 |
Kathisma Church | 100.0 | 0.0 | 0.0 |
Qasr Mushash | 86.1 | 13.9 | 0.0 |
Seven Sleepers Mosque | 92.1 | 7.9 | 0.0 |
Husn Umayyad Mosque | 97.4 | 2.6 | 0.0 |
Zeila Qiblatain Mosque (Rt) | 100.0 | 0.0 | 0.0 |
Zeila Qiblatain Mosque (Lft) | 100.0 | 0.0 | 0.0 |
| Rec. Point | Latitude | Longitude | Weight | Distortion |
|---|---|---|---|---|
| | | | 99.87% | 0.00481 |
| | | | 0.13% | 0.00498 |
| Rec. Point | Latitude | Longitude | Weight | Distortion |
|---|---|---|---|---|
| | | | 2.57% | 0.00662 |
| | | | 34.01% | 0.01134 |
| | | | 63.42% | 0.01329 |
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Harremoës, P. Rate Distortion Theory for Descriptive Statistics. Entropy 2023, 25, 456. https://doi.org/10.3390/e25030456