The file sizes produced by the different compression algorithms can be interpreted as estimates of the information content of the data that were compressed. The description methods used to achieve efficient compression can be interpreted as models for those data. Comparing the HydroZIP results with the benchmark can therefore serve to test whether hydrological knowledge leads to better models for hydrological data, when model adequacy is measured by compression efficiency. Furthermore, the spatial patterns of the compression results and of the most efficient methods are presented.
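As a rough illustration of reading file size as an information estimate, the sketch below compares the zeroth-order entropy bound of an 8-bit series with the smallest file produced by a few general-purpose compressors. The synthetic series, the choice of compressors (`zlib`, `bz2`, `lzma` from the Python standard library) and all function names are assumptions made for illustration; they are not the benchmark set or the data used in this study.

```python
import bz2
import lzma
import math
import random
import zlib
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Zeroth-order (marginal) Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def best_benchmark_size(data):
    """Name and size (bytes) of the smallest output among three stdlib compressors."""
    sizes = {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }
    name = min(sizes, key=sizes.get)
    return name, sizes[name]


# Hypothetical 8-bit series: mostly zeros ("dry days") with occasional positive values.
rng = random.Random(0)
series = bytes(0 if rng.random() < 0.7 else rng.randint(1, 255) for _ in range(5000))

h = entropy_bits_per_symbol(series)
entropy_bound_bytes = h * len(series) / 8
name, size = best_benchmark_size(series)
print(f"entropy: {h:.2f} bits/symbol, bound: {entropy_bound_bytes:.0f} bytes")
print(f"best benchmark: {name}, {size} bytes")
```

Because this synthetic series has no temporal structure, the compressed size is expected to stay close to the entropy bound; series with temporal dependencies can fall below it.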

#### 4.1. Results of the Benchmark Algorithms

Figure 6 gives an overview of the best compression rates achieved by existing algorithms on P and Q, as a function of location of the data. The results generally show a better compression for P, due to its lower entropy, but Q can often be compressed well below its entropy, due to the strong autocorrelation. Furthermore, it is visible that western climate, with more long dry spells, generally yields more compressible time series of P and Q. Note, however, that results should be interpreted with caution due to the influence of the scaling and quantization procedure used before compressing. For more elaborate discussion on the results of benchmark algorithms and spatial patterns, and a discussion on subjectivity of information content, see [5].
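The statement that an autocorrelated series can be compressed below its (marginal) entropy can be made concrete by comparing the marginal entropy with the entropy conditioned on the previous value. The following is a minimal sketch on a hypothetical bounded random walk; the series and the function names are illustrative assumptions, and plug-in estimates of conditional entropy are biased low for short series.

```python
import math
import random
from collections import Counter


def marginal_entropy(x):
    """H(X_t) estimated from value counts, in bits per symbol."""
    n = len(x)
    return -sum(c / n * math.log2(c / n) for c in Counter(x).values())


def conditional_entropy(x):
    """H(X_t | X_{t-1}) estimated from pair counts, in bits per symbol."""
    pairs = Counter(zip(x[:-1], x[1:]))
    prev = Counter(x[:-1])
    n = len(x) - 1
    return -sum(c / n * math.log2(c / prev[a]) for (a, _), c in pairs.items())


# Hypothetical autocorrelated series: a bounded random walk on 16 levels.
rng = random.Random(1)
level, q = 8, []
for _ in range(20000):
    level = min(15, max(0, level + rng.choice((-1, 0, 0, 1))))
    q.append(level)

h_marg = marginal_entropy(q)
h_cond = conditional_entropy(q)
print(f"marginal: {h_marg:.2f} bits, conditional on previous value: {h_cond:.2f} bits")
```

A compressor that exploits the dependence on the previous value can approach the conditional figure, which is why file sizes below the marginal entropy bound are possible for Q.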

**Figure 6.**
Spatial distribution of the compression results with the best benchmark algorithms for each time series of rainfall (left) and streamflow (right).


#### 4.2. Results for Compression with HydroZIP

We can see from Figure 7 that HydroZIP outperformed the benchmark algorithms on all rainfall series and a good part of the streamflow series, with 90% of the compressed file size reductions falling between 1.4% and 11.8% for compression of P; see Table 2 for more statistics. For the permuted series on the right, where temporal dependencies cannot be exploited for compression, the results are even more pronounced. This result indicates that the main gain of HydroZIP is due to the efficient characterization of the distribution, and the fact that these parametric distributions are good fits to the hydrological data. The less pronounced difference in the original time series may indicate that there is still room for improvement for the coding of temporal dependencies, which apparently works well for some of the benchmark algorithms.
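The permutation experiment can be sketched as follows: compress a series and a random permutation of it, so that both have the same distribution of values but only the original retains its temporal structure. The synthetic series (values held over several time steps), the use of `zlib` as a stand-in for a benchmark compressor, and the variable names are assumptions for illustration only.

```python
import random
import zlib


def compressed_size(values):
    """Size in bytes of the zlib-compressed 8-bit series."""
    return len(zlib.compress(bytes(values), 9))


# Hypothetical persistent series: each value is held for 5 to 30 time steps.
rng = random.Random(2)
series = []
while len(series) < 10000:
    value = rng.randint(0, 255)
    series.extend([value] * rng.randint(5, 30))
series = series[:10000]

# Random permutation: same histogram, temporal dependencies destroyed.
permuted = series[:]
rng.shuffle(permuted)

size_original = compressed_size(series)
size_permuted = compressed_size(permuted)
dependency_gain = 100.0 * (size_permuted - size_original) / size_permuted
print(f"original: {size_original} bytes, permuted: {size_permuted} bytes")
print(f"share of size removed by exploiting temporal structure: {dependency_gain:.1f}%")
```

The difference between the two sizes isolates what a compressor gains from temporal dependencies; what remains for the permuted series reflects only how well the distribution of values is described.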

As can be seen from Figure 8, HydroZIP can outperform all benchmark algorithms in all basins by using an efficient description of a two-parameter gamma distribution for wet days, and the probability of dry days as a separate parameter, either after run length encoding of the dry spells (legend: RLE) or directly (legend: $\Gamma + \mathrm{P}(0)$). Analogously, we can say that the compression experiment yielded the inference that daily precipitation is best modeled as a mixture of an occurrence process and an intensity process. This model is in line with [63,64], who also used AIC for model selection, which can be interpreted as analogous to our present approach of finding the shortest description for the data. Also, the fact that we found the gamma distribution to be a powerful description for compression of the data is in line with the widespread use of the gamma distribution to describe daily rainfall amounts [65,66,67], although other distributions are sometimes found to behave better, especially in the tail [68]. In general, finding good compressors for environmental time series is completely analogous to modeling those series, as done in, e.g., [69,70,71].
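To illustrate how a mixture of a dry-day probability and a gamma distribution for wet days translates into a description length, the sketch below computes the ideal code length, the sum of $-\log_2 p$ over all values, for a quantized series. The method-of-moments fit, the discretization by normalized density values, the synthetic data and all names are assumptions for illustration; the cost of storing the three parameters is not counted, and this is not the HydroZIP implementation.

```python
import math
import random
from collections import Counter


def gamma_log_pdf(x, shape, scale):
    """Natural log of the two-parameter gamma density at x > 0."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def mixture_code_length_bits(values, n_levels=256):
    """Ideal code length in bits under P(0) plus a discretized gamma for wet levels."""
    n = len(values)
    wet = [v for v in values if v > 0]
    p_dry = (n - len(wet)) / n
    mean = sum(wet) / len(wet)
    var = sum((v - mean) ** 2 for v in wet) / len(wet)
    shape, scale = mean * mean / var, var / mean  # method-of-moments fit
    # Weight each wet level by the gamma density, then normalize to sum to 1 - p_dry.
    weights = [math.exp(gamma_log_pdf(v, shape, scale)) for v in range(1, n_levels)]
    total = sum(weights)
    p_level = {0: p_dry}
    for v, w in zip(range(1, n_levels), weights):
        p_level[v] = (1 - p_dry) * w / total
    return -sum(math.log2(p_level[v]) for v in values)


def empirical_entropy_bits(values):
    """n times the zeroth-order entropy: lower bound for any memoryless code."""
    n = len(values)
    return -sum(c * math.log2(c / n) for c in Counter(values).values())


# Hypothetical quantized rainfall: 60% dry days, gamma-distributed wet-day amounts.
rng = random.Random(4)
values = [0 if rng.random() < 0.6 else min(255, max(1, round(rng.gammavariate(0.8, 25.0))))
          for _ in range(5000)]

model_bits = mixture_code_length_bits(values)
entropy_bits = empirical_entropy_bits(values)
print(f"gamma + P(0) model: {model_bits / len(values):.2f} bits/value")
print(f"empirical entropy:  {entropy_bits / len(values):.2f} bits/value")
```

The empirical entropy is always the smaller of the two numbers, but reaching it requires transmitting the full histogram, whereas the parametric model needs only three parameters. This is the trade-off that makes a well-fitting parametric distribution attractive for compression.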

**Figure 7.**
Comparison of file size after compression between HydroZIP and the best compression of the benchmark algorithms. Each point represents the data for one catchment. Points below the line indicate that HydroZIP outperforms the benchmark. The left figure shows results for the time series of rainfall and runoff. For the right figure, these time series are randomly permuted to exclude the effect of temporal dependencies.


**Table 2.**
Quantiles over the set of 431 basins of percentage file size reduction of HydroZIP over benchmark. Negative values indicate a larger file size for HydroZIP.

| Quantile | P (%) | Q (%) | Pperm (%) | Qperm (%) |
|---|---|---|---|---|
| Min | −2.5 | −83.2 | 3.5 | 15.0 |
| 5% | 1.4 | −9.6 | 4.7 | 48.4 |
| 50% | 5.5 | −0.3 | 6.8 | 57.7 |
| 95% | 11.8 | 2.8 | 17.0 | 70.5 |
| Max | 34.2 | 15.4 | 36.7 | 83.5 |
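Statistics of the kind reported in Table 2 could be computed along the following lines. The definition of the reduction relative to the benchmark file size is an assumption (the text does not spell out the denominator), and the file sizes below are invented for illustration.

```python
import statistics


def percent_reduction(size_hydrozip, size_benchmark):
    """Percentage file size reduction; negative if the HydroZIP file is larger."""
    return 100.0 * (size_benchmark - size_hydrozip) / size_benchmark


# Hypothetical (HydroZIP, best benchmark) file sizes in bytes, one pair per basin.
sizes = [(950, 1000), (1020, 1000), (880, 1000), (990, 1000), (700, 1000), (940, 1000)]
reductions = sorted(percent_reduction(h, b) for h, b in sizes)

# 19 cut points at 5%, 10%, ..., 95%; index 0 is the 5% and index 18 the 95% quantile.
cuts = statistics.quantiles(reductions, n=20, method="inclusive")
summary = {
    "Min": min(reductions),
    "5%": cuts[0],
    "50%": statistics.median(reductions),
    "95%": cuts[18],
    "Max": max(reductions),
}
for label, value in summary.items():
    print(f"{label:>4}: {value:6.1f}")
```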

**Figure 8.**
Geographical spread of best performance compression methods for P.


**Figure 9.**
Geographical spread of best performance compression methods for Q. For the locations with circles, one of the benchmark algorithms performed best. The stars indicate the HydroZIP algorithm using coding of the differences.


The best compression algorithms for streamflow show a more diverse geographical picture (Figure 9). Only in 199 out of 431 basins was HydroZIP able to outperform the best benchmark algorithm, using the method of coding the lag-1 differences with a skew-Laplace distribution (the stars in Figure 9); see Figure 4 for an example of the efficiency of this method. From Figure 10, it becomes clear that the better performance of the benchmark algorithms is mainly due to the efficient use of temporal dependencies in those algorithms, since for the randomly permuted series HydroZIP outperforms the benchmark in 364 out of 431 basins.
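The idea of coding lag-1 differences with a skew-Laplace distribution can be sketched as below. The parameterization (an integer mode with separate exponential scales on either side), the fit (grid search over the mode combined with the closed-form maximum-likelihood scales of the continuous density), the discretization and the synthetic streamflow-like series are all assumptions for illustration; the actual HydroZIP coding may differ, and the cost of storing the three parameters is not counted.

```python
import math
import random


def fit_skew_laplace(diffs):
    """Grid-search an integer mode m; closed-form ML scales for each side."""
    n = len(diffs)
    best = None
    for m in range(min(diffs), max(diffs) + 1):
        a = sum(d - m for d in diffs if d > m) / n  # mean excess above the mode
        b = sum(m - d for d in diffs if d < m) / n  # mean deficit below the mode
        score = math.sqrt(a) + math.sqrt(b)         # continuous ML minimizes this sum
        if best is None or score < best[0]:
            best = (score, m, a, b)
    score, m, a, b = best
    s_up = max(math.sqrt(a) * score, 1e-6)          # scale for differences >= m
    s_down = max(math.sqrt(b) * score, 1e-6)        # scale for differences < m
    return m, s_up, s_down


def code_length_bits(diffs, m, s_up, s_down, max_abs=255):
    """Ideal code length of the differences under the discretized, normalized model."""
    def log_w(d):
        return -(d - m) / s_up if d >= m else (d - m) / s_down
    log_total = math.log(sum(math.exp(log_w(d)) for d in range(-max_abs, max_abs + 1)))
    return sum(log_total - log_w(d) for d in diffs) / math.log(2)


# Hypothetical 8-bit streamflow-like series: occasional sharp rises, slow recessions.
rng = random.Random(3)
q, series = 50, []
for _ in range(5000):
    if rng.random() < 0.05:
        q = min(255, q + rng.randint(20, 80))
    else:
        q = int(q * 0.95)
    series.append(q)

diffs = [b - a for a, b in zip(series[:-1], series[1:])]
m, s_up, s_down = fit_skew_laplace(diffs)
bits = 8 + code_length_bits(diffs, m, s_up, s_down)  # first value stored raw in 8 bits
print(f"mode: {m}, scale up: {s_up:.2f}, scale down: {s_down:.2f}")
print(f"code length: {bits / len(series):.2f} bits/value (raw storage: 8 bits/value)")
```

Fixing the mode at zero would force a nearly symmetric fit, because the differences of a series sum to the difference between its last and first value; the location parameter is what allows the model to capture the asymmetry between slow recessions and sharp rises.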

**Figure 10.**
Geographical spread of best performance compression methods for the randomly permuted Q series. HydroZIP outperforms the benchmark everywhere except at the green and dark blue circles.
