Multiband and Lossless Compression of Hyperspectral Images

Hyperspectral images are widely used in several real-life applications. In this paper, we investigate on the compression of hyperspectral images by considering different aspects, including the optimization of the computational complexity in order to allow implementations on limited hardware (i.e., hyperspectral sensors, etc.). We present an approach that relies on a three-dimensional predictive structure. Our predictive structure, 3D-MBLP, uses one or more previous bands as references to exploit the redundancies among the third dimension. The achieved results are comparable, and often better, with respect to the other state-of-art lossless compression techniques for hyperspectral images.


Introduction
Hyperspectral imaging instruments collect information by exploring the electromagnetic spectrum of a specific geographical area.In contrast to the human eye and traditional camera sensors, which can only perceive visible light (i.e., the wavelengths between 360 to 760 nanometers (nm)), spectral imaging techniques allow to cover a significant portion of wavelengths (i.e., the frequencies of ultraviolet and infrared rays).It is important to note that the spectrum is subdivided into different spectral bands.Therefore, hyperspectral images can be viewed as three-dimensional data (often referred as datacubes).
For instance, the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [1] hyperspectral sensor (NASA Jet Propulsion Laboratory (JPL) [2]) measures from 380 to 2500 nm of the electromagnetic spectrum.In particular, the spectrum is subdivided into 224 spectral bands.
From the analysis of hyperspectral data, it is possible to identify and/or classify materials, objects, etc.Such capabilities are related to the fact that some objects and materials have a unique signature (a sort of fingerprint) in the electromagnetic spectrum, therefore this fingerprint can be used for identification purposes.
Hyperspectral data are widely used in real-life applications including agriculture, mineralogy, physics, surveillance, etc.For instance, in geological applications the capabilities of hyperspectral remote sensing are exploited to identify various types of minerals or to search for minerals and oil.
One of the most important parameters to evaluate the precision of a sensor is the spectral resolution, which is the width between two adjacent bands.For instance, by considering the AVIRIS hyperspectral images, the spectral resolution is 10 nm.The spatial resolution is a relevant aspect too.Informally, the spatial resolution denotes how extensive is the geographical area mapped by the sensor into a pixel.It could be difficult to recognize materials and/or objects from a pixel, if a too wide area is mapped into it.
Many hundreds of gigabytes can be produced every day by a single hyperspectral sensor.Therefore, it is necessary to compress these data, in order to transmit and to store them efficiently.
Since such data are often used in delicate tasks and there are high costs involved in the acquisitions, lossless compression is generally required.
This paper focuses on a novel technique for the lossless compression of hyperspectral images.The proposed algorithm is based on the predictive coding model and the proposed predictive structure uses a configurable multiband three-dimensional structure.It is possible to customize our predictor by individuating the number of the previous bands which will be used as references and the wideness of the prediction context.Through appropriate configurations of such parameters, the computational complexity and the memory usage can be optimized depending on the hardware available.
Because of its high configurability, our algorithm is suitable for "on board" implementations on hardware with limited capabilities, as for example on an airplane or on a satellite.
The experimental results we have achieved are comparable and often better, with respect to other state of the art approaches.Our scheme provides a good trade-off between computational complexity/memory usage and compression performances.
The rest of the paper is organized as follows: Section 2 briefly reviews previous work on lossless and lossy compression of hyperspectral images, Section 3 outlines the proposed lossless compression approach and Section 4 focuses on the description of experimental results.Finally, Section 5 highlights our conclusion and future work directions.

Related Works
Lossless compression of hyperspectral images is generally based on the predictive coding model.The predictive-based approaches have different advantages: they use limited resources in terms of computational power and memory usage and achieve good compression performances.Often, these models are suitable for on board implementations.
The Consultative Committee for Space Data Systems (CCSDS) has specified the CCSDS 123 standard, which outlines a method for lossless compression of multispectral and hyperspectral image data and a format for storing the compressed data [7,8].The main objective is to establish a Recommended Standard for a multispectral and hyperspectral images, and to specify the compressed data format.In literature, many proposed approaches implement the recommendations of the CCSDS 123 standard for the lossless compression of hyperspectral images, as for instance, the ones described in [9][10][11].
Other approaches are designed for offline compression, since they use more sophisticated techniques and/or they require the complete availability of the hyperspectral image.These approaches are not suitable for an on board implementation but can achieve better compression performances.Mielikainen, in [12], proposed an approach for the compression of hyperspectral image through Look-Up Table (LUT).LUT predicts each pixel by using all the pixels in the current and in the previous band, by searching the nearest neighbor, in the previous band, which has the same pixel value as the pixel located in the same spatial coordinates as the current pixel.LUT has high compression performances, but it uses more resources in terms of memory and CPU usage.
Other lossless techniques are based on dimensionality reduction through principal component transform [13] or they are based on the clustered differential pulse code modulation [14].An error-resilient lossless compression technique is proposed in [15].
For the lossy compression of hyperspectral images, the compression algorithms are generally based on 3D frequency transforms: as for examples 3-D Discrete Wavelet Transform (3D-DWT) [16], 3-D Discrete Cosine Transform (3D-DCT) [17], Karhunen-Loève transform (KLT) [18], etc.These approaches are easily scalable.On the other hand, they must maintain the entire hyperspectral image at the same time in memory.Locally optimal Partitioned Vector Quantization (LPVQ) [19,20] applies a Partitioned Vector Quantization (PVQ) scheme independently to each pixel of the hyperspectral image.
The variable sizes of the partitions are chosen adaptively and the indices are entropy coded.The codebook is included as part of the coded output.This technique can be used also in lossless mode, but the high costs required in terms of CPU and memory do not allow an on board implementation.

Lossless Multiband Compression for Hyperspectral Images (LMBHI)
Hyperspectral images present two typologies of correlations: In particular, contiguous bands are strongly correlated (inter-band correlation) and the pixels are generally correlated, since, for instance, two adjacent pixels map adjacent areas, possibly composed of the same material, etc. (intra-band correlation).Such characterizations are exploited by the compression strategies, in order to optimize the redundancy among the third dimension.The main aim of our approach, which we denoted as Lossless MultiBand compression for Hyperspectral Images (LMBHI), is to exploit the correlation with a predictive coding model.
In detail, for each pixel, X, of the input hyperspectral image, LMBHI performs the prediction of the current pixel, X, by selecting the appropriate prediction context of X (three-dimensional or a bi-dimensional contexts).
All the pixels that belong to the first band are predicted by using a bi-dimensional predictive structure: the 2-D Linearized Median Predictor (2-D LMP) [21], which exploits only the intra-band correlation, since the first band has no reference bands.The other pixels are predicted by using a new three-dimensional predictive approach, which uses a prediction context composed of the neighboring pixels of X and its reference pixels in the previous bands.
Once the prediction step is computed, the prediction error e (defined in Equation ( 1)) is modeled and coded.

Review of the 2-D Linearized Median Predictor (2D-LMP)
The 2-D Linearized Median Predictor (2D-LMP) [21] uses a prediction context that is composed by three neighboring pixels of X, namely, I A , I B , and I C , as shown in Figure 1.In particular, the predictive structure is derived from the well-established 2-D Median Predictor, which is used in JPEG-LS [22].The 2-D Median Predictor has the following predictive structure outlined in the Equation (2).
X " Algorithms 2016, 9, 16 3 of 15 The variable sizes of the partitions are chosen adaptively and the indices are entropy coded.The codebook is included as part of the coded output.This technique can be used also in lossless mode, but the high costs required in terms of CPU and memory do not allow an on board implementation.
In particular, contiguous bands are strongly correlated (inter-band correlation) and the pixels are generally correlated, since, for instance, two adjacent pixels map adjacent areas, possibly composed of the same material, etc. (intra-band correlation).Such characterizations are exploited by the compression strategies, in order to optimize the redundancy among the third dimension.The main aim of our approach, which we denoted as Lossless MultiBand compression for Hyperspectral Images (LMBHI), is to exploit the correlation with a predictive coding model.
In detail, for each pixel, , of the input hyperspectral image, LMBHI performs the prediction of the current pixel, , by selecting the appropriate prediction context of X (three-dimensional or a bi-dimensional contexts).
All the pixels that belong to the first band are predicted by using a bi-dimensional predictive structure: the 2-D Linearized Median Predictor (2-D LMP) [21], which exploits only the intra-band correlation, since the first band has no reference bands.The other pixels are predicted by using a new three-dimensional predictive approach, which uses a prediction context composed of the neighboring pixels of and its reference pixels in the previous bands.Once the prediction step is computed, the prediction error (defined in Equation ( 1)) is modeled and coded.

Review of the 2-D Linearized Median Predictor (2D-LMP)
The 2-D Linearized Median Predictor (2D-LMP) [21] uses a prediction context that is composed by three neighboring pixels of , namely, , , and , as shown in Figure 1.In particular, the predictive structure is derived from the well-established 2-D Median Predictor, which is used in JPEG-LS [22].The 2-D Median Predictor has the following predictive structure outlined in the Equation (2).

= max( , )
≥ min( , ) min( , ) ≤ max( , ) + − ℎ (2) Basically, Median Predictor is in charge of selecting one of the above three options, depending on the context.By combining all the three options, it is possible to obtain the predictive structure of 2D-LMP, defined as in the Equation (3).Basically, Median Predictor is in charge of selecting one of the above three options, depending on the context.By combining all the three options, it is possible to obtain the predictive structure of 2D-LMP, defined as in the Equation (3).

3-D MultiBand Linear Predictor (3D-MBLP)
The Multiband Linear Predictor (3D-MBLP) uses a prediction context by considering two parameters: ‚ B: number of the previous bands, that are considered for the prediction; ‚ N: number of the samples for the current and each previous band, which will be used for the creation of the prediction context.
First of all, we define a bi-dimensional enumeration E, graphically represented in Figure 2. The main aim of such an enumeration, is to permit the relative indexing of the pixels with respect to the pixel which is currently under analysis (which has 0 as index in Figure 2).

•
: number of the previous bands, that are considered for the prediction; • : number of the samples for the current and each previous band, which will be used for the creation of the prediction context.First of all, we define a bi-dimensional enumeration , graphically represented in Figure 2. The main aim of such an enumeration, is to permit the relative indexing of the pixels with respect to the pixel which is currently under analysis (which has 0 as index in Figure 2).In order to define the prediction context of 3D-MBLP, we use the following notations:

•
, : indicates the -th pixel of the -th band, according to the enumeration ; • , : denotes the pixel that has the same spatial coordinates of , of the j-th band, according to the enumeration .In the following, we suppose that the current band is the -th band.In particular, by using our notations, it is possible to observe that can be also addressed as , In detail, the 3D-MBLP predictor is based on the least squares optimization technique and the prediction is computed by means of the Equation (4).

=
• , The coefficients: = , … , are chosen to minimize the energy of the prediction error described by the Equation ( 5).

=
, − , can be rewritten in matrix notation by means of the following equation: In order to define the prediction context of 3D-MBLP, we use the following notations: ‚ I i,j : indicates the i-th pixel of the j-th band, according to the enumeration E; ‚ I 0,j : denotes the pixel that has the same spatial coordinates of X, of the j-th band, according to the enumeration E.
In the following, we suppose that the current band is the k-th band.In particular, by using our notations, it is possible to observe that X can be also addressed as I 0,k In detail, the 3D-MBLP predictor is based on the least squares optimization technique and the prediction is computed by means of the Equation (4).X " The coefficients: α 0 " rα 1 , . . ., α B s T are chosen to minimize the energy of the prediction error described by the Equation (5).
P can be rewritten in matrix notation by means of the following equation: Subsequently, by taking the derivate of P and by setting it to zero, we obtain the optimal coefficients by means of the Equation (6).
Once the coefficients α 0 , which solve the linear system Equation ( 6), are computed, the prediction, X, of the current pixel X, can be calculated.

Modeling and Coding of Prediction Errors
Starting from the consideration that a prediction error can assume positive or negative values.Similarly to [23], we use an invertible mapping function (highlighted in the Equation ( 7)), in order to have only non-negative values.It is important to note that the mapping function does not alter the redundancy among the errors.For the coding of the mapped prediction errors we use the Arithmetic Coder (AC) scheme.

Computational Complexity
The main computational costs of our approach are due to the resolution of the linear system Equation ( 6), used to generate the optimal coefficients, which need the computation of the predicted pixel.By using the normal equation method, the linear system can be solved with ˆN `B 3 ˙¨B 2 floating-point operations [24].Figure 3 shows the trend of the computational complexity of our predictive model, in terms of number of operations (Y-axis) that are required for the solving of the linear system, by using configurations with different parameters (X-axis).

Experimental Results
We performed experiments on two datasets of AVIRIS hyperspectral images: the 1997 AVIRIS Dataset (Section 4.1) and the CCSDS Dataset (Section 4.2).
In our experiments we considered also the PAQ8 algorithm (described in [25]) for the coding of the prediction error.PAQ8 is a state-of-the-art lossless compression algorithm that belongs to the PAQ family of compression algorithm.It is important to note that the PAQ8 family is strictly related to the well-established Prediction by Partial Matching scheme (PPM) [26].In general, the PAQ8 algorithm achieve a high degree of compression performances, but the PAQ8 scheme has significant computational complexity.Therefore, such scheme is not fully adequate to be used for on board applications.
The experiments are performed by using a non-optimized Java-based proof-of-concept of our If we use only the previous band as a reference (B " 1), only about 20 operations are needed to solve the system.Instead, four or nine times more operations are required if we use two previous bands (B " 2) or three previous bands (B " 3).A linear system can have three kinds of solutions: no solutions, one solution, and infinity solutions.In the first and the third scenarios, the proposed predictive structure cannot perform the prediction.In these scenarios, it is desirable to use another low-complexity predictive structure and we have used the 3-D Distances-based Linearized Median Predictor (3D-DLMP) [21].

Experimental Results
We performed experiments on two datasets of AVIRIS hyperspectral images: the 1997 AVIRIS Dataset (Section 4.1) and the CCSDS Dataset (Section 4.2).
In our experiments we considered also the PAQ8 algorithm (described in [25]) for the coding of the prediction error.PAQ8 is a state-of-the-art lossless compression algorithm that belongs to the PAQ family of compression algorithm.It is important to note that the PAQ8 family is strictly related to the well-established Prediction by Partial Matching scheme (PPM) [26].In general, the PAQ8 algorithm achieve a high degree of compression performances, but the PAQ8 scheme has significant computational complexity.Therefore, such scheme is not fully adequate to be used for on board applications.
The experiments are performed by using a non-optimized Java-based proof-of-concept of our approach, which employs few minutes on a medium-end laptop (equipped with an Intel Core i5 4200 M processor and 8 GB of RAM).

1997 AVIRIS Dataset
Each one of AVIRIS hyperspectral image of the AVIRIS '97 dataset is subdivided into scenes (the number of scenes is highlighted in Table 1).It is important to note that each scene has 614 columns, 512 lines, and 224 spectral bands.In addition, each pixel is stored by using 16 bits.In Tables 2 and 3 we report the results achieved by using B " 1 with N " 8 and N " 16, respectively.Subsequently, in Tables 4 and 5 we report the results achieved, by using B = 2. Finally, Tables 6 and 7 report the results achieved by using B " 3 and the N parameter equal to 8 and equal to 16, respectively.All the results are reported in terms of Bits Per Sample (BPS).In each table we report the results achieved by using both the AC and the PAQ8 schemes for the coding of the prediction errors.In Tables 8 and 9 the average results on all the tested hyperspectral images are reported.In detail, the first column indicates the N parameter and from the second to the fourth columns the average results for B " 1, B " 2, and B " 3, respectively.As it is possible to observe from Figures 4 and 5 which graphically represent the average results, the best results are achieved when the following parameters are used: N " 16 and B " 2 (Figures 4b and 5b).The worst results are obtained by using the following parameters: N " 8 and B " 3.

Comparison with other Approaches
In order to compare the experimental results achieved by our approach, we consider the Compression Ratio (C.R.) as a measure for the compression performances.In detail, Table 10 reports the results achieved by considering several parameters on all the hyperspectral images of the used dataset.More precisely, the results are reported in terms of C.R. and they are compared with other state of the art lossless compression schemes.
From the experimental results, it should be observed that LMBHI gets its best results by using two previous bands as references (i.e. when the following parameter is used: = 2), LMBHI outperforms, in average, all the other state of the art approaches.
On the other hand, when only the previous band is used (i.e., when = 1 ), LMBHI outperforms all the compared state of the art techniques, with the exception of LPVQ.But, LPVQ is not suited for on board implementation.

Comparison with other Approaches
In order to compare the experimental results achieved by our approach, we consider the Compression Ratio (C.R.) as a measure for the compression performances.In detail, Table 10 reports the results achieved by considering several parameters on all the hyperspectral images of the used dataset.More precisely, the results are reported in terms of C.R. and they are compared with other state of the art lossless compression schemes.From the experimental results, it should be observed that LMBHI gets its best results by using two previous bands as references (i.e. when the following parameter is used: B " 2), LMBHI outperforms, in average, all the other state of the art approaches.
On the other hand, when only the previous band is used (i.e., when B " 1), LMBHI outperforms all the compared state of the art techniques, with the exception of LPVQ.But, LPVQ is not suited for on board implementation.
In this latter case, our approach achieves better results with respect to LPVQ on three of the five hyperspectral images: Moffett Field, Jasper Ridge, and Low Altitude, but LPVQ gains on Cuprite and especially on Lunar Lake.In addition, LUT obtains better results of our approach on two of four compared hyperspectral images: Lunar Lake and Jasper Ridge.
The high flexibility and adaptability of our approach makes it considerable for on board implementations.
In fact, the coding parameters can be customized depending on the hardware available.

CCSDS Dataset
In this section we focus on the experimental results we have achieved by considering the CCSDS Dataset, which is composed by five calibrated and seven uncalibrated hyperspectral images.This dataset is provided by Consultative Committee for Space Data Systems (CCSDS) Multispectral and Hyperspectral Data Compression [27].
In Table 11, we shortly describe the dataset by reporting the number of scenes (second column) and the number of samples per line (third column) for the calibrated and the uncalibrated images (first column).The samples of the calibrated and the uncalibrated images are stored by using 16 bits (16-bit signed integer for the calibrated and 16-bit unsigned for the uncalibrated), except for the Hawaii and Maine images in which the samples are stored by using 12 bits (unsigned) [27].Each image is composed by 512 lines.In Table 12, we report our results in terms of bits-per sample (BPS).The results refer to the calibrated hyperspectral images (first column), by using several configurations for our approach (columns from the second to the fourth).Analogously to Table 12, in Table 13 we report our experimental results for the uncalibrated images.In each table, we report the results by using the AC and the PAQ8 schemes.
The best results are achieved when the following configuration is used: B " 2 and N " 16.

Comparison with Other Approaches
We have compared our results on the CCSDS dataset with other state-of-art approaches.Table 14 reports the results achieved by considering several values for the B and N parameters on the calibrated hyperspectral images of the CCSDS dataset, by using the AC scheme as well as the PAQ8 scheme for the coding of the prediction errors.Tables 15 and 16 report the comparison between the proposed approach and other approaches for the 16-bit uncalibrated and 12-bit uncalibrated hyperspectral images of the CCSDS dataset.
From Tables 12 and 13 it comes clear that the best results are achieved when the value of N is equal to 16 and B is equal to 2.
By looking at Table 14, it should be observed that our approach, when the following configuration is used: N " 16 and B " 2, achieves results that are comparable but slightly worse with respect to FL and FL# [27].Our approach outperforms all the other techniques when the PAQ8 scheme is used for the coding of prediction errors with N " 16 and B " 2. However, in such configuration the computational complexity of our approach is not suitable for on board implementations.

Conclusions and Future Works
In this paper, we have investigated on the lossless compression of hyperspectral images by introducing a multiband three-dimensional predictive structure, we named as 3D-MBLP.
Because of its configurability, it is possible to implement the algorithm on different typologies of sensors, by using appropriate configuration for each type of sensors.Moreover, the proposed approach can be also easily scaled for future generation sensors which will have better hardware capabilities.The experimental results we achieved are comparable and often outperform the other state of the art lossless compression techniques.
In future works, we will include a pre-processing stage before the compression of the hyperspectral image, which substantially reorders the bands by considering their correlation.This will possibly improve the compression performance as in [28,29].

Figure 1 .
Figure 1.The prediction context of the 2D-LMP predictive structure.The gray part is already coded and the white part is not coded yet.

Figure 1 .
Figure 1.The prediction context of the 2D-LMP predictive structure.The gray part is already coded and the white part is not coded yet.

Figure 2 .
Figure 2. The enumeration we used for the relative indexing with respect to the current pixel, identified with 0 as index.

Figure 2 .
Figure 2. The enumeration E we used for the relative indexing with respect to the current pixel, identified with 0 as index.

Figure 3 .
Figure 3.The number of operations ( -axis) required to solve the linear system Equation (6), by using different parameters ( -axis).

Figure 3 .
Figure 3.The number of operations (Y-axis) required to solve the linear system Equation (6), by using different parameters (X-axis).

Figure 4 .
Figure 4. Graphical representation of the average results by using the AC scheme for the coding of the prediction errors.= 8 (a) and = 16 (b).

Figure 4 .Figure 4 .Figure 5 .
Figure 4. Graphical representation of the average results by using the AC scheme for the coding of the prediction errors.N " 8 (a) and N " 16 (b).

Figure 5 .
Figure 5. Graphical representation of the average results by using the PAQ8 scheme for the coding of the prediction errors.N " 8 (a) and N " 16 (b).

Table 1 .
Description of the dataset used.

Table 2 .
Achieved results by using the following parameters: N " 8, B " 1. (N.P. indicates that the scene is not present).

Table 3 .
Achieved results by using the following parameters: N " 16, B " 1.

Table 4 .
Achieved results by using the following parameters: N " 8, B " 2.

Table 5 .
Achieved results by using the following parameters: N " 16, B " 2.

Table 6 .
Achieved results by using the following parameters: N " 8, B " 3.

Table 7 .
Achieved results by using the following parameters: N " 16, B " 3.

Table 8 .
Average Results on the 1997 AVIRIS Images (AC).

Table 8 .
Average Results on the 1997 AVIRIS Images (AC).

Table 10 .
Compression results, in terms of compression ratio (C.R.) achieved by LMBHI (by using various parameter configurations), compared to other lossless compression methods.

Table 11 .
Description of the CCSDS dataset.

Table 12 .
Achieved results for the calibrated images of the CCSDS dataset.

Table 13 .
Achieved results for the uncalibrated images of the CCSDS dataset.

Table 14 .
Comparison with other lossless compression methods (calibrated images).The results are reported in bits-per-sample (BPS).

Table 15 .
Comparison with other lossless compression methods (16-bit uncalibrated images).The results are reported in bits-per-sample (BPS).

Table 16 .
Comparison with other lossless compression methods (12-bit uncalibrated images).The results are reported in bits-per-sample (BPS).