# Compression Challenges in Large Scale Partial Differential Equation Solvers

## Abstract

## 1. Introduction

## 2. Compression Aspects of PDE Solvers

#### 2.1. High Entropy Data Necessitates Lossy Compression

#### 2.2. Data Layout Affects Compression Design Space

#### 2.3. Error Metrics and Error Propagation Affect Compression Accuracy Needs

#### 2.4. Compression Speed and Complexity Follow the Memory Hierarchy

## 3. In-Memory Compression

#### 3.1. Scalar Quantization for Overlapping Schwarz Smoothers

#### 3.2. Fixed-Rate Transform Coding

## 4. Communication in Distributed Systems

#### 4.1. Inexact Parallel-in-Time Integrators

#### 4.2. Multilevel Transform Coding on Unstructured Grids for Compressed Communication

#### 4.3. Error Metrics

## 5. Mass Storage

#### 5.1. Adjoint Solutions

#### 5.1.1. PDE-Constrained Optimization

#### 5.1.2. Goal-Oriented Error Estimation

#### 5.1.3. Adaptive Grid Refinement

#### 5.2. Checkpoint/Restart

#### 5.3. Postprocessing and Archiving

#### 5.3.1. Crash Simulation

#### 5.3.2. Weather and Climate

#### 5.3.3. Computational Fluid Dynamics

## 6. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

- Strikwerda, J. Finite Difference Schemes and Partial Differential Equations; SIAM: Philadelphia, PA, USA, 2007.
- Deuflhard, P.; Weiser, M. Adaptive Numerical Solution of PDEs; de Gruyter: Berlin, Germany, 2012.
- Zienkiewicz, O.; Taylor, R.; Zhu, J. The Finite Element Method; Elsevier Butterworth-Heinemann: Oxford, UK, 2005.
- McCalpin, J. Memory Bandwidth and System Balance in HPC Systems. 2016. Available online: https://sites.utexas.edu/jdm4372/2016/11/22/sc16-invited-talk-memory-bandwidth-and-system-balance-in-hpc-systems/ (accessed on 16 September 2019).
- McCalpin, J. Memory Bandwidth and Machine Balance in Current High Performance Computers. IEEE Tech. Comm. Comput. Archit. (TCCA) Newsl. **1995**, 2, 19–25.
- McKee, S. Reflections on the memory wall. In Proceedings of the Conference Computing Frontiers, Ischia, Italy, 14–16 April 2004; pp. 162–167.
- Alted, F. Why Modern CPUs Are Starving and What Can Be Done about It. Comput. Sci. Eng. **2010**, 12, 68–71.
- Reed, D.; Dongarra, J. Exascale computing and big data. Commun. ACM **2015**, 58, 56–68.
- Lindstrom, P.; Isenburg, M. Fast and Efficient Compression of Floating-Point Data. IEEE Trans. Vis. Comput. Graph. **2006**, 12, 1245–1250.
- Burtscher, M.; Ratanaworabhan, P. FPC: A High-Speed Compressor for Double-Precision Floating-Point Data. IEEE Trans. Comput. **2009**, 58, 18–31.
- Claggett, S.; Azimi, S.; Burtscher, M. SPDP: An Automatically Synthesized Lossless Compression Algorithm for Floating-Point Data. In Proceedings of the IEEE 2018 Data Compression Conference, Snowbird, UT, USA, 27–30 March 2018; p. 17936904.
- Filgueira, R.; Singh, D.; Carretero, J.; Calderón, A.; García, F. Adaptive-Compi: Enhancing MPI-Based Applications’ Performance and Scalability by using Adaptive Compression. Int. J. High Perform. Comput. Appl. **2011**, 25, 93–114.
- Lakshminarasimhan, S.; Shah, N.; Ethier, S.; Ku, S.H.; Chang, C.; Klasky, S.; Latham, R.; Ross, R.; Samatova, N. ISABELA for effective in situ compression of scientific data. Concurr. Comput. Pract. Exp. **2013**, 25, 524–540.
- Iverson, J.; Kamath, C.; Karypis, G. Fast and Effective Lossy Compression Algorithms for Scientific Datasets. In Euro-Par 2012 Parallel Processing; Kaklamanis, C., Papatheodorou, T., Spirakis, P., Eds.; Springer: Berlin, Germany, 2012; pp. 843–856.
- Lindstrom, P. Fixed-Rate Compressed Floating-Point Arrays. IEEE Trans. Vis. Comput. Graph. **2014**, 20, 2674–2683.
- Lindstrom, P. Error distributions of lossy floating-point compressors. In Proceedings of the Joint Statistical Meetings, Baltimore, MD, USA, 29 July–3 August 2017; Volume 2017, pp. 2574–2589.
- Diffenderfer, J.; Fox, A.; Hittinger, J.; Sanders, G.; Lindstrom, P. Error Analysis of ZFP Compression for Floating-Point Data. SIAM J. Sci. Comput. **2019**, 41, A1867–A1898.
- Di, S.; Cappello, F. Fast error-bounded lossy HPC data compression with SZ. In Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, IL, USA, 23–27 May 2016; pp. 730–739.
- Tao, D.; Di, S.; Chen, Z.; Cappello, F. Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization. In Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Orlando, FL, USA, 29 May–2 June 2017; pp. 1129–1139.
- Liang, X.; Di, S.; Tao, D.; Li, S.; Li, S.; Guo, H.; Chen, Z.; Cappello, F. Error-controlled lossy compression optimized for high compression ratios of scientific datasets. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 438–447.
- Weiser, M.; Götschel, S. State Trajectory Compression for Optimal Control with Parabolic PDEs. SIAM J. Sci. Comput. **2012**, 34, A161–A184.
- Götschel, S. Adaptive Lossy Trajectory Compression for Optimal Control of Parabolic PDEs. Ph.D. Thesis, Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany, 2015.
- Demaret, L.; Dyn, N.; Floater, M.; Iske, A. Adaptive Thinning for Terrain Modelling and Image Compression. In Advances in Multiresolution for Geometric Modelling; Dodgson, N., Floater, M., Sabin, M., Eds.; Springer: Berlin, Germany, 2005; pp. 319–338.
- Solin, P.; Andrs, D. On Scientific Data and Image Compression Based on Adaptive Higher-Order FEM. Adv. Appl. Math. Mech. **2009**, 1, 56–68.
- Shafaat, T.; Baden, S. A method of adaptive coarsening for compressing scientific datasets. In Applied Parallel Computing. State of the Art in Scientific Computing; Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J., Eds.; Springer: Berlin, Germany, 2007; pp. 774–780.
- Unat, D.; Hromadka, T.; Baden, S. An Adaptive Sub-sampling Method for In-memory Compression of Scientific Data. In Proceedings of the IEEE 2009 Data Compression Conference, Snowbird, UT, USA, 16–18 March 2009; p. 10666336.
- Austin, W.; Ballard, G.; Kolda, T. Parallel Tensor Compression for Large-Scale Scientific Data. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium, Chicago, IL, USA, 23–27 May 2016; p. 16158560.
- Ballard, G.; Klinvex, A.; Kolda, T. TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition. arXiv **2019**, arXiv:1901.06043.
- Ballester-Ripoll, R.; Lindstrom, P.; Pajarola, R. TTHRESH: Tensor Compression for Multidimensional Visual Data. IEEE Trans. Vis. Comput. Graph. **2019**.
- Ainsworth, M.; Tugluk, O.; Whitney, B.; Klasky, S. Multilevel techniques for compression and reduction of scientific data – the multilevel case. SIAM J. Sci. Comput. **2019**, 41, A1278–A1303.
- Peyrot, J.L.; Duval, L.; Payan, F.; Bouard, L.; Chizat, L.; Schneider, S.; Antonini, M. HexaShrink, an exact scalable framework for hexahedral meshes with attributes and discontinuities: Multiresolution rendering and storage of geoscience models. Comput. Geosci. **2019**, 23, 723–743.
- Tao, D.; Di, S.; Liang, X.; Chen, Z.; Cappello, F. Optimizing lossy compression rate-distortion from automatic online selection between sz and zfp. IEEE Trans. Parallel Distrib. Syst. **2019**, 30, 1857–1871.
- Maglo, A.; Lavoué, G.; Dupont, F.; Hudelot, C. 3D Mesh Compression: Survey, Comparisons, and Emerging Trends. ACM Comput. Surv. **2015**, 47, 44.
- Götschel, S.; von Tycowicz, C.; Polthier, K.; Weiser, M. Reducing Memory Requirements in Scientific Computing and Optimal Control. In Multiple Shooting and Time Domain Decomposition Methods; Carraro, T., Geiger, M., Körkel, S., Rannacher, R., Eds.; Springer: Berlin, Germany, 2015; pp. 263–287.
- Nasiri, F.; Bidgoli, N.M.; Payan, F.; Maugey, T. A Geometry-aware Framework for Compressing 3D Mesh Textures. In Proceedings of the ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 4015–4019.
- Caillaud, F.; Vidal, V.; Dupont, F.; Lavoué, G. Progressive compression of arbitrary textured meshes. Comput. Graphics Forum **2016**, 35, 475–484.
- Anzt, H.; Dongarra, J.; Flegar, G.; Higham, N.; Quintana-Ortí, E. Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers. Concurr. Comput. Pract. Exp. **2019**, 31, e4460.
- Schneck, J.; Weiser, M.; Wende, F. Impact of Mixed Precision and Storage Layout on Additive Schwarz Smoothers; Report 18-62; Zuse Institute: Berlin, Germany, 2018.
- Hackbusch, W. A sparse matrix arithmetic based on $\mathscr{H}$-matrices, Part I: introduction to $\mathscr{H}$-matrices. Computing **1999**, 62, 89–108.
- Dahmen, W.; Harbrecht, H.; Schneider, R. Compression techniques for boundary integral equations – asymptotically optimal complexity estimates. SIAM J. Numer. Anal. **2006**, 43, 2251–2271.
- Lu, T.; Liu, Q.; He, X.; Luo, H.; Suchyta, E.; Choi, J.; Podhorszki, N.; Klasky, S.; Wolf, M.; Liu, T.; et al. Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data. In Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium, Vancouver, BC, USA, 21–25 May 2018; p. 17974767.
- Poppick, A.; Nardi, J.; Feldman, N.; Baker, A.; Hammerling, D. A Statistical Analysis of Compressed Climate Model Data. In Proceedings of the 4th International Workshop Data Reduction for Big Scientific Data, Frankfurt, Germany, 28 June 2018.
- Baker, A.H.; Hammerling, D.M.; Mickelson, S.A.; Xu, H.; Stolpe, M.B.; Naveau, P.; Sanderson, B.; Ebert-Uphoff, I.; Samarasinghe, S.; De Simone, F.; et al. Evaluating lossy data compression on climate simulation data within a large ensemble. Geosci. Model Dev. **2016**, 9, 4381–4403.
- Hoang, D.; Klacansky, P.; Bhatia, H.; Bremer, P.T.; Lindstrom, P.; Pascucci, V. A Study of the Trade-off Between Reducing Precision and Reducing Resolution for Data Analysis and Visualization. IEEE Trans. Vis. Comput. Graph. **2019**, 25, 1193–1203.
- Whitney, B. Multilevel Techniques for Compression and Reduction of Scientific Data. Ph.D. Thesis, Brown University, Providence, RI, USA, 2018.
- Götschel, S.; Weiser, M. Lossy Compression for PDE-constrained Optimization: Adaptive Error Control. Comput. Optim. Appl. **2015**, 62, 131–155.
- Jacob, B.; Ng, S.; Wang, D. Memory Systems: Cache, DRAM, Disk; Morgan Kaufmann: Burlington, MA, USA, 2010.
- Cappello, F.; Di, S.; Li, S.; Liang, X.; Gok, A.; Tao, D.; Yoon, C.; Wu, X.C.; Alexeev, Y.; Chong, F. Use cases of lossy compression for floating-point data in scientific data sets. Int. J. High Perform. Comput. Appl. **2019**.
- Williams, S.; Waterman, A.; Patterson, D. Roofline: An insightful visual performance model for multicore architectures. Commun. ACM **2009**, 52, 65–76.
- Pekhimenko, G.; Seshadri, V.; Kim, Y.; Xin, H.; Mutlu, O.; Gibbons, P.; Kozuch, M.; Mowry, T. Linearly compressed pages: A low-complexity, low-latency main memory compression framework. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, 7–11 December 2013; p. 16657326.
- Shafiee, A.; Taassori, M.; Balasubramonian, R.; Davis, A. MemZip: Exploring unconventional benefits from memory compression. In Proceedings of the 20th International Symposium on High Performance Computer Architecture, Orlando, FL, USA, 15–19 February 2014; p. 14393980.
- Young, V.; Nair, P.; Qureshi, M. DICE: Compressing DRAM caches for bandwidth and capacity. In Proceedings of the 44th Annual International Symposium on Computer Architecture, Toronto, ON, Canada, 24–28 June 2017; p. 17430274.
- Jain, A.; Hill, P.; Lin, S.C.; Khan, M.; Haque, M.; Laurenzano, M.; Mahlke, S.; Tang, L.; Mars, J. Concise loads and stores: The case for an asymmetric compute-memory architecture for approximation. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, 15–19 October 2016; p. 16545476.
- Mittal, S.; Vetter, J. A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems. IEEE Trans. Parallel Distrib. Syst. **2016**, 27, 1524–1536.
- Kahan, W. 754-2008 - IEEE Standard for Floating-Point Arithmetic; IEEE: Los Alamitos, CA, USA, 2008.
- Baboulin, M.; Buttari, A.; Dongarra, J.; Kurzak, J.; Langou, J.; Langou, J.; Luszczek, P.; Tomov, S. Accelerating scientific computations with mixed precision algorithms. Comput. Phys. Commun. **2009**, 180, 2526–2533.
- Anzt, H.; Luszczek, P.; Dongarra, J.; Heuveline, V. GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement. In Euro-Par 2012 Parallel Processing; Lecture Notes in Computer Science; Kaklamanis, C., Papatheodorou, T., Spirakis, P., Eds.; Springer: Berlin, Germany, 2012; Volume 7484.
- Grout, R. Mixed-Precision Spectral Deferred Correction; Preprint CP-2C00-64959; National Renewable Energy Laboratory: Golden, CO, USA, 2015.
- Giraud, L.; Haidar, A.; Watson, L. Mixed-Precision Preconditioners in Parallel Domain Decomposition Solvers. In Domain Decomposition Methods in Science and Engineering XVII; Lecture Notes in Computational Science and Engineering; Langer, U., Discacciati, M., Keyes, D., Widlund, O., Zulehner, W., Eds.; Springer: Berlin, Germany, 2008; Volume 60, pp. 357–364.
- Ahmed, N.; Natarajan, T.; Rao, K.R. Discrete Cosine Transform. IEEE Trans. Comput. **1974**, C-23, 90–93.
- Said, A.; Pearlman, W. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans. Circ. Syst. Video Technol. **1996**, 6, 243–250.
- Toselli, A.; Widlund, O. Domain Decomposition Methods - Algorithms and Theory; Computational Mathematics; Springer: Berlin, Germany, 2005; Volume 34.
- Gander, M. 50 Years of Time Parallel Time Integration. In Multiple Shooting and Time Domain Decomposition Methods; Carraro, T., Geiger, M., Körkel, S., Rannacher, R., Eds.; Springer: Cham, Switzerland, 2015; Volume 9, pp. 69–113.
- Emmett, M.; Minion, M. Toward an efficient parallel in time method for partial differential equations. Comm. Appl. Math. Comp. Sci. **2012**, 7, 105–132.
- Fischer, L.; Götschel, S.; Weiser, M. Lossy data compression reduces communication time in hybrid time-parallel integrators. Comput. Vis. Sci. **2018**, 19, 19–30.
- Martin, G. Range encoding: An algorithm for removing redundancy from a digitised message. In Proceedings of the Video & Data Recording Conference, Southampton, Hampshire, UK, 24–27 July 1979.
- Sweldens, W. The Lifting Scheme: A Construction of Second Generation Wavelets. SIAM J. Math. Anal. **1998**, 29, 511–546.
- Stevenson, R. Locally supported, piecewise polynomial biorthogonal wavelets on nonuniform meshes. Constr. Approx. **2003**, 19, 477–508.
- Cohen, A.; Echeverry, L.M.; Sun, Q. Finite Element Wavelets; Technical Report; Université Pierre et Marie Curie: Paris, France, 2000.
- Ochoa, I.; Asnani, H.; Bharadia, D.; Chowdhury, M.; Weissman, T.; Yona, G. QualComp: A new lossy compressor for quality scores based on rate distortion theory. BMC Bioinform. **2013**, 14, 187.
- Götschel, S.; Chamakuri, N.; Kunisch, K.; Weiser, M. Lossy Compression in Optimal Control of Cardiac Defibrillation. J. Sci. Comput. **2014**, 60, 35–59.
- Böhm, C.; Hanzich, M.; de la Puente, J.; Fichtner, A. Wavefield compression for adjoint methods in full-waveform inversion. Geophysics **2016**, 81, R385–R397.
- Lindstrom, P.; Chen, P.; Lee, E.J. Reducing disk storage of full-3D seismic waveform tomography (F3DT) through lossy online compression. Comput. Geosci. **2016**, 93, 45–54.
- Oden, J.T.; Prudhomme, S. Goal-oriented error estimation and adaptivity for the finite element method. Comput. Math. Appl. **2001**, 41, 735–756.
- Volin, Y.M.; Ostrovskii, G.M. Automatic computation of derivatives with the use of the multilevel differentiating techniques - 1. Algorithmic basis. Comput. Math. Appl. **1985**, 11, 1099–1114.
- Griewank, A. Achieving logarithmic growth of temporal and spatial complexity in reverse automatic differentiation. Optim. Methods Softw. **1992**, 1, 35–54.
- Griewank, A.; Walther, A. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation; SIAM: Philadelphia, PA, USA, 2008.
- Colli Franzone, P.; Deuflhard, P.; Erdmann, B.; Lang, J.; Pavarino, L. Adaptivity in Space and Time for Reaction-Diffusion Systems in Electrocardiology. SIAM J. Sci. Comput. **2006**, 28, 942–962.
- Nagaiah, C.; Kunisch, K.; Plank, G. Numerical solution for optimal control of the reaction-diffusion equations in cardiac electrophysiology. Comput. Optim. Appl. **2011**, 49, 149–178.
- Dennis, J.E., Jr.; Moré, J.J. Quasi-Newton methods, motivation and theory. SIAM Rev. **1977**, 19, 46–89.
- Borzí, A.; Schulz, V. Computational Optimization of Systems Governed by Partial Differential Equations. In Computational Science and Engineering; SIAM: Philadelphia, PA, USA, 2012.
- Deuflhard, P.; Leinen, P.; Yserentant, H. Concepts of an Adaptive Hierarchical Finite Element Code. Impact Comput. Sci. Engrg. **1989**, 1, 3–35.
- von Tycowicz, C.; Kälberer, F.; Polthier, K. Context-Based Coding of Adaptive Multiresolution Meshes. Comput. Graphics Forum **2011**, 30, 2231–2245.
- Becker, R.; Rannacher, R. A feed-back approach to error control in finite element methods: Basic analysis and examples. East West J. Numer. Math. **1996**, 4, 237–264.
- Becker, R.; Kapp, H.; Rannacher, R. Adaptive finite element methods for optimal control of partial differential equations: Basic concepts. SIAM J. Control Optim. **2000**, 39, 113–132.
- Rannacher, R. On the adaptive discretization of PDE-based optimization problems. In PDE Constrained Optimization; Heinkenschloss, M., Ed.; Springer: Berlin, Germany, 2006.
- Weiser, M. On goal-oriented adaptivity for elliptic optimal control problems. Optim. Methods Softw. **2013**, 28, 969–992.
- Cyr, E.; Shadid, J.; Wildey, T. Towards efficient backward-in-time adjoint computations using data compression techniques. Comput. Methods Appl. Mech. Eng. **2015**, 288, 24–44.
- Tao, D.; Di, S.; Liang, X.; Chen, Z.; Cappello, F. Improving Performance of Iterative Methods by Lossy Checkponting. In Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, Tempe, AZ, USA, 11–15 June 2018; ACM: New York, NY, USA, 2018; pp. 52–65.
- Calhoun, J.; Cappello, F.; Olson, L.; Snir, M.; Gropp, W. Exploring the feasibility of lossy compression for PDE simulations. Int. J. High Perform. Comput. Appl. **2019**, 33, 397–410.
- Young, J.W. A First Order Approximation to the Optimum Checkpoint Interval. Commun. ACM **1974**, 17, 530–531.
- Daly, J.T. A higher order estimate of the optimum checkpoint interval for restart dumps. Future Gener. Comput. Syst. **2006**, 22, 303–312.
- Di, S.; Robert, Y.; Vivien, F.; Cappello, F. Toward an Optimal Online Checkpoint Solution under a Two-Level HPC Checkpoint Model. IEEE Trans. Parallel Distrib. Syst. **2017**, 28, 244–259.
- Thole, C.A. Compression of LS-DYNA3D™ Simulation Results using FEMZIP©. In Proceedings of the 3rd LS-DYNA Anwenderforum, Bamberg, Germany, 14–15 October 2004; pp. E-III-1–E-III-5.
- Teran, R.I.; Thole, C.A.; Lorentz, R. New Developments in the Compression of LS-DYNA Simulation Results using FEMZIP. In Proceedings of the 6th European LS-DYNA Users’ Conference, Salzburg, Austria, 9–11 May 2007.
- Mertler, S.; Müller, S.; Thole, C. Predictive Principal Component Analysis as a Data Compression Core in a Simulation Data Management System. In Proceedings of the 2015 Data Compression Conference, Snowbird, UT, USA, 7–9 April 2015; pp. 173–182.
- Düben, P.D.; Leutbecher, M.; Bauer, P. New methods for data storage of model output from ensemble simulations. Mon. Weather Rev. **2019**, 147, 677–689.
- Kuhn, M.; Kunkel, J.M.; Ludwig, T. Data compression for climate data. Supercomput. Front. Innov. **2016**, 3, 75–94.
- Otero, E.; Vinuesa, R.; Marin, O.; Laure, E.; Schlatter, P. Lossy data compression effects on wall-bounded turbulence: Bounds on data reduction. Flow Turbul. Combust. **2018**, 101, 365–387.
- Marin, O.; Schanen, M.; Fischer, P. Large-Scale Lossy Data Compression Based on an a Priori Error Estimator in a Spectral Element Code; Technical Report; ANL/MCS-p6024-0616; Argonne National Laboratory: Lemont, IL, USA, 2016.

**Figure 1.** Naive roofline model showing achievable performance vs. arithmetic intensity. Some computations, e.g., dense matrix-matrix multiplication (GEMM), perform many flops per byte fetched from or written to memory, so their execution speed is bounded by the peak floating point performance. Others, such as sparse matrix-vector multiplication (SpMV), require many bytes to be fetched from memory for each flop and are therefore memory-bound (filled diamonds). Data compression for memory-bound computations reduces the amount of data to be read or written and therefore increases the arithmetic intensity. Different compression schemes achieve different compression factors (empty diamonds at the top) and thus different arithmetic intensities. The computational overhead of compression and decompression can, however, reduce the performance gain (empty circles at the bottom), depending on the complexity of the compression method used.
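The relation described in the caption of Figure 1 can be sketched in a few lines. This is a minimal illustration, not the model used to produce the figure: the machine parameters, the SpMV intensity, and the way overhead is modelled (as a fraction of run time spent in compression and decompression) are all assumptions chosen for the example.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Naive roofline: performance is capped either by the compute peak
    or by memory bandwidth times arithmetic intensity (flops per byte)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def with_compression(intensity, factor, peak_gflops, bandwidth_gbs, overhead=0.0):
    """Compressing data by `factor` moves `factor` times fewer bytes, so the
    arithmetic intensity grows by that factor. `overhead` is a hypothetical
    fraction of the run time spent on compression and decompression."""
    raw = attainable_gflops(intensity * factor, peak_gflops, bandwidth_gbs)
    return raw * (1.0 - overhead)


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
PEAK, BW = 1000.0, 100.0
SPMV_INTENSITY = 0.25  # flops per byte, an assumed value for a memory-bound kernel

baseline = attainable_gflops(SPMV_INTENSITY, PEAK, BW)
compressed = with_compression(SPMV_INTENSITY, 4.0, PEAK, BW, overhead=0.2)
print(baseline, compressed)
```

With these assumed numbers the memory-bound kernel gains from a compression factor of 4 even after losing a fifth of its run time to overhead; once the intensity reaches the ridge point `PEAK / BW`, further compression no longer helps.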

**Figure 2.** Run times of BLAS level 2 operations on $2048\times 2048$ matrices for overlapping Schwarz smoothers with mixed precision. Depending on the access patterns, a speedup over the Intel Math Kernel Library (MKL) almost on par with the compression factor can be achieved [38].

**Figure 3.** Theoretically estimated parallel efficiency $E=T_{\mathrm{seq}}/(N T_{\mathrm{par}})$ when varying different parameters around the nominal scenario (marked by *). Left: varying communication bandwidth in terms of the communication time for uncompressed data. Right: varying requested tolerance.

**Figure 4.** Representation of 1D linear finite element functions $u_{h}$ in the nodal and hierarchical basis.
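The change of basis in Figure 4 can be illustrated with a small sketch. It assumes a uniform 1D grid with $2^J+1$ nodes (the figure itself does not require uniformity), and the function names are hypothetical. Each hierarchical coefficient is the nodal value minus the linear interpolant of its two neighbours on the next coarser level.

```python
import numpy as np


def nodal_to_hierarchical(u):
    """Convert nodal values of a 1D linear finite element function on a
    uniform grid with 2**J + 1 nodes to hierarchical basis coefficients."""
    c = np.array(u, dtype=float)
    n = len(c) - 1  # number of intervals
    assert n >= 1 and n & (n - 1) == 0, "need 2**J + 1 nodes"
    step = 1
    while step < n:  # sweep from the finest level to the coarsest
        idx = np.arange(step, n, 2 * step)  # nodes that are new on this level
        # subtract the coarse-level linear interpolant at the new nodes
        c[idx] -= 0.5 * (c[idx - step] + c[idx + step])
        step *= 2
    return c


def hierarchical_to_nodal(c):
    """Inverse transform: sweep from the coarsest level to the finest."""
    u = np.array(c, dtype=float)
    n = len(u) - 1
    step = n // 2
    while step >= 1:
        idx = np.arange(step, n, 2 * step)
        u[idx] += 0.5 * (u[idx - step] + u[idx + step])
        step //= 2
    return u


x = np.linspace(0.0, 1.0, 9)
coeffs = nodal_to_hierarchical(x**2)
restored = hierarchical_to_nodal(coeffs)
```

For the smooth function $u(x)=x^2$ used here, a node inserted between neighbours at distance $h$ receives the coefficient $-h^2$, so the coefficients shrink quickly on finer levels. That decay is what makes quantizing hierarchical coefficients attractive for smooth data.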

**Figure 5.** Error vs. compression factor: a priori estimates for transform coding of finite element functions with hierarchical basis transform, cf. [21].

**Figure 6.** Comparison of quantization errors, i.e., errors in the reconstructed solution yielding the same $H^{-1}$ error norm. Left: hierarchical basis. Right: wavelets. Measuring the error in the $H^{-1}$ norm allows larger pointwise absolute reconstruction errors than the $L^{\infty}$ error metric, and thus higher compression factors.

**Figure 7.** Optimization progress of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method for the monodomain example (9), (10), using different quantization tolerances for the state trajectory. No delta-encoding between timesteps was used. The horizontal line shows the approximate discretization error of the reduced gradient. See also [34].
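To make the terms in the caption of Figure 7 concrete, the following sketch shows uniform scalar quantization with a pointwise tolerance, with optional delta-encoding between timesteps. It is a generic illustration with hypothetical function names, not the compressor used for the figure; `delta=False` corresponds to the setting the caption describes.

```python
import numpy as np


def quantize(x, tol):
    """Uniform scalar quantization: index of the bin of width 2*tol."""
    return np.rint(np.asarray(x, dtype=float) / (2.0 * tol)).astype(np.int64)


def dequantize(q, tol):
    """Reconstruct bin midpoints; the pointwise error is at most tol."""
    return 2.0 * tol * q


def encode_trajectory(states, tol, delta=False):
    """Quantize each timestep. With delta=True, quantize the difference to
    the previously *reconstructed* step, so errors do not accumulate."""
    codes, prev = [], None
    for u in states:
        u = np.asarray(u, dtype=float)
        pred = prev if (delta and prev is not None) else np.zeros_like(u)
        q = quantize(u - pred, tol)
        prev = pred + dequantize(q, tol)  # what the decoder will see
        codes.append(q)
    return codes


def decode_trajectory(codes, tol, delta=False):
    """Mirror of encode_trajectory, using the same prediction rule."""
    out, prev = [], None
    for q in codes:
        pred = prev if (delta and prev is not None) else np.zeros(q.shape)
        prev = pred + dequantize(q, tol)
        out.append(prev)
    return out
```

The integer codes would still need an entropy coder to yield an actual size reduction. Delta-encoding helps when consecutive timesteps are similar, because the indices then cluster around zero.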

**Figure 9.** Goal-oriented error estimation: generated meshes with (left) and without (middle) compression (with compression factor up to 32) for Example 3b in [87]. Meshes were generated using weights according to Weiser [87]. Estimated errors (right) are shown for weights due to Weiser [87] (ws) and due to Becker et al. [85] (bkr), both with and without compression. Differences between estimated errors with and without compression are barely visible and negligible compared to the differences between the two error concepts.

**Figure 10.** Left: highly local peak function. Right: adaptively refined mesh with 4237 vertices for minimal finite element interpolation error. A uniform grid with the same local resolution has 263,169 vertices.

**Figure 11.** Influence of $T_{\mathrm{CP}}$, comparing the nominal scenario (green) to scenarios with $T_{\mathrm{CP}}$ reduced (blue) and increased (red) by a factor of 4. Left: overall runtime vs. number of checkpoints for $N=4$ and $T_{C}=100{,}000\,\mathrm{s}$. Right: overall runtime vs. actual computation time for $N=100$. In both cases $p_{\mathrm{RS}}=7.74\times 10^{-7}$, $T_{\mathrm{CP}}=245.583\,\mathrm{s}$ and $T_{\mathrm{R}}=545.583\,\mathrm{s}$ were used. The runtimes were measured for the parallel-in-time solution of a 3D heat equation on the HLRN-III Cray XC30/XC40 supercomputer (www.hlrn.de); the probability of failure was determined from HLRN-III logfiles.

**Table 1.** Compression factors for adaptive mesh refinement and transform coding of the peak function shown in Figure 10. Adaptive and uniform mesh refinement yield the same interpolation error; the same error tolerance for lossy compression was used in the two cases.

Mesh | Double | Transform Coding |
---|---|---|
uniform | 1 | 54 |
adaptive | 62 | 744 |
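As a plausibility check on Table 1, and assuming that the factor 62 for the adaptive mesh stored in double precision simply reflects the vertex counts given in Figure 10 (the caption does not state this explicitly), the numbers combine as follows:

```latex
\frac{263{,}169}{4237} \approx 62.1,
\qquad
\frac{744}{62} = 12 .
```

Under that reading, transform coding contributes a further factor of 12 on the adaptive mesh, compared with 54 on the uniform grid. A smaller gain on the adaptive mesh is what one would expect, since adaptive refinement has already removed much of the redundancy that transform coding exploits on the uniform grid.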

Symbol | Meaning |
---|---|
$n$ | number of checkpoints |
$N$ | number of compute cores |
$p_{\mathrm{RS}}$ | probability of failure per unit time and core [1/s] |
$N_{\mathrm{RS}}$ | number of restarts |
$T$ | overall runtime (wall clock) [s] |
$T_{C}$ | time for actual computation [s] |
$T_{\mathrm{CP}}$ | time to write/read a checkpoint [s] |
$T_{\mathrm{DS}}$ | time to recover data structures [s] |
$T_{\mathrm{R}}$ | recovery time $=T_{\mathrm{CP}}+T_{\mathrm{DS}}$ [s] |
$T_{\mathrm{RS}}$ | time for restart [s] |
$b$ | $:=1-T_{\mathrm{R}}\,p_{\mathrm{RS}}\,N$ |
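To give a feeling for how the checkpoint time $T_{\mathrm{CP}}$ enters, the sketch below evaluates Young's classical first-order approximation of the optimum checkpoint interval (Young, Commun. ACM 1974, listed in the references) with the parameter values quoted in the caption of Figure 11. This is a stand-in chosen for illustration, not the runtime model behind Figure 11, and it assumes independent failures so that the system failure rate is $p_{\mathrm{RS}}N$.

```python
import math


def young_interval(t_cp, p_rs, n_cores):
    """Young's first-order optimum time between checkpoints,
    sqrt(2 * T_CP * MTBF), where the system mean time between failures
    is taken as 1 / (p_RS * N). Assumes independent core failures."""
    mtbf = 1.0 / (p_rs * n_cores)
    return math.sqrt(2.0 * t_cp * mtbf)


# Values quoted in the caption of Figure 11 (right panel, N = 100).
P_RS, T_CP, N = 7.74e-7, 245.583, 100

nominal = young_interval(T_CP, P_RS, N)
faster = young_interval(T_CP / 4.0, P_RS, N)  # checkpoint written 4x faster
print(nominal, faster)
```

In this approximation the optimum interval scales with the square root of $T_{\mathrm{CP}}$: writing checkpoints four times faster, for instance through compression, halves the interval, so checkpoints become cheap enough to take twice as often.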

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Götschel, S.; Weiser, M.
Compression Challenges in Large Scale Partial Differential Equation Solvers. *Algorithms* **2019**, *12*, 197.
https://doi.org/10.3390/a12090197
