
Appl. Sci. 2019, 9(23), 5218; https://doi.org/10.3390/app9235218

Article
A High Efficiency Multistage Coder for Lossless Audio Compression using OLS+ and CDCCR Method
Faculty of Computer Science and Information Technology, West Pomeranian University of Technology, ul. Żołnierska 49, 71-210 Szczecin, Poland
*
Authors to whom correspondence should be addressed.
Received: 14 October 2019 / Accepted: 28 November 2019 / Published: 30 November 2019

Abstract

This paper presents an improvement of the cascaded prediction method. Three types of main predictor block with different levels of complexity are compared, including two complex prediction methods with backward adaptation: the extended Activity Level Classification Model (ALCM+) and the extended Ordinary Least Square method (OLS+). We also present our own implementation of an effective context-dependent constant component removal block, together with an improved adaptive arithmetic coder with short-, medium- and long-term adaptation. In an experiment comparing the results with other known lossless audio coders, our method obtained the best efficiency.
Keywords:
adaptive arithmetic coder; cascaded prediction; context-dependent constant component removing; extended active level classification model; Least Mean Square

1. Introduction

The compression of audio and video signals is receiving increasing attention as transmission bandwidth widens, storage media become cheaper and the demand for better quality grows. This is reflected in the dynamic work of the MPEG Group, active from 1988 to today, and its impressive portfolio [1]. Among the developed standards and technologies, lossless compression algorithms deserve special attention: they convert raw data into a compressed form and back again without losing any information.
Important applications of lossless audio compression include archiving recordings, storing high-quality sound on commercial media (e.g., DVD, Blu-ray and CD (44,100 samples per second, 16 bits)) and selling songs in online music stores to more demanding customers who are not satisfied with the quality of the mp3 format [2]. Moreover, a lossless mode is often required when processing music in a studio, preparing advertising materials and producing radio and television programmes and films (post-production [3]). Lossy coding is avoided in such cases because each iteration of sound editing may accumulate additional distortions.
The motivation for our work is to improve a seemingly complete solution in which we still see room for further development. We hope that this research will help make the archiving and transmission of audio data in lossless form even better and more popular. Our solution is aimed at static long-term archiving, such as music databases and the source distribution of high-quality audio samples, where encoding is performed rarely and the main operations are the distribution of coded files and their decoding on the client side. It is nevertheless also suitable for online encoding and decoding. Our solution is time symmetric, as explained in more detail in Section 8.
Popular lossless audio compression codecs include APE (Monkey’s Audio) [4], Free Lossless Audio Codec (FLAC) [5], Shorten [6], WavPack [7], Nanozip [8], True Audio (TTA) [9], Tom’s verlustfreier Audiokompressor (TAK) [10], Lossless Audio (LA) [11] and MP4-ALS-RM23, the lossless coder most often discussed in scientific publications, which was published as the standard ISO/IEC 14496-3:2009 with its last revision in 2015 [12]. MP4-ALS-RM23 has many switches that customise the operation of the algorithm. In the remainder of this work we therefore distinguish two modes: the default mode, which is used without any parameters, and the best mode, which sets the switches so as to obtain the best results for the tested base.
Lossless audio coders usually have two characteristic stages: modelling and fast compression. Linear prediction is usually used for modelling, and entropy coders for fast compression [13]. Modelling usually consists in replacing the current sample $x(n)$ with its difference from the expected value $\hat{x}(n)$ rounded to the nearest integer:
$$e(n) = x(n) - \lfloor \hat{x}(n) + 0.5 \rfloor. \tag{1}$$
The error signal e(n) has a distribution similar to the Laplace distribution, but because of the use of integers, in practice, the geometric distribution is used (see the Golomb code in Section 7).
The simplest constant prediction model is DPCM, which uses the previous sample as the prediction, $\hat{x}(n) = x(n-1)$. The basic predictive models are those with $r$ fixed coefficients (linear prediction is described in more detail in Section 2, see Formula (2)). A fixed predictor with a fixed set of $r$ prediction coefficients can be used to encode various categories of audio data efficiently. Its universality rests on the assumption that the coefficients sum to 1 (the condition for an unbiased prediction estimator). The prediction order of a universal predictor (one not associated with a specific audio signal) should not be too large ($r \le 4$).
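The rounding in Formula (1) and the DPCM predictor can be illustrated with a minimal Python sketch. The function names and the choice of 0 as the prediction for the very first sample are assumptions made for illustration only; they are not taken from the codec.

```python
import math


def residual(x, x_hat):
    """Integer prediction error e(n) = x(n) - floor(x_hat(n) + 0.5)."""
    return x - math.floor(x_hat + 0.5)


def reconstruct(e, x_hat):
    """Decoder side: add the same rounded prediction back to the error."""
    return e + math.floor(x_hat + 0.5)


def dpcm_residuals(samples):
    """DPCM: each sample is predicted by the previous one.

    Assumption: the first sample is predicted by 0.
    """
    prev = 0
    errors = []
    for x in samples:
        errors.append(residual(x, prev))
        prev = x
    return errors
```

Because the encoder and the decoder round the same prediction in the same way, `reconstruct(residual(x, x_hat), x_hat)` returns `x` exactly, which is what makes the scheme lossless.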
The low order rule does not apply to, e.g., static predictors, in which the coefficients are determined for individual signal frames [14] (or entire sound files [15]), e.g., by means of minimising the mean square error (MMSE).
The results of our research have been published in three stages. In [16], the general concept of the new predictive cascade coding was presented, and in [17], the advantages of adaptive Golomb coding [18] were described. This article describes the last and broadest stage of the research. Among the published solutions of this type, the cascaded RLS-LMS coder (see Section 5) [2] (the implementation of MP4-ALS-RM23 in the best mode) has the highest efficiency. Our method is a significant improvement of this cascading concept. Our coder has a higher compression efficiency and a noticeably shorter total encoding time than the coder in [16]; the experiments in Section 9 confirm this. Section 2 describes the basic prediction methods. Sections 3 and 4 focus on simple and complex prediction methods with backward adaptation, and Section 4.2 describes the most effective prediction block, OLS+. The cascade connection of prediction blocks is described in Section 5. Details of removing the context-dependent cumulative prediction error are presented in Section 6. Section 7 presents a high-efficiency version of the binary arithmetic coder using the Golomb code family, and Section 7.5 analyses practical aspects of implementing the prediction error encoder. A schematic diagram of the proposed cascading audio data encoder is presented in Section 8. The conclusions and summary are given in Section 10.

2. Basic Prediction Method

In addition to the basic prediction methods presented in the literature [19], i.e., constant and static linear predictive models, lossless audio compression uses two approaches to adaptive predictive modelling: forward adaptation and backward adaptation.
In audio coding, the predicted value is most frequently determined at the pre-modelling stage by linear prediction based on the MMSE criterion [13]. This requires solving a system of equations with $r$ unknowns $w_j$, which form the coefficient vector $\mathbf{w} = [w_1, w_2, \ldots, w_r]^T$. Because the character of the data varies between different parts of a recording, methods that adapt the prediction coefficients on an ongoing basis achieve a high degree of compression.
Forward adaptation is a time-asymmetric method whose advantage is fast decoding. The encoding process can take much longer because the initial coding can be performed several times in order to choose the best parameters for the final version of the encoded file. In addition to the classic solutions used in MP4-ALS-RM23 (default version), better prediction models can be sought by extended data analysis [20], by predictive techniques other than the basic LPC computed with the Levinson–Durbin method, such as Laguerre filters [21], and by cascading blocks with different lengths and prediction orders [22].
The predominantly used version is prediction with forward adaptation, which requires the prediction coefficients to be saved for each encoded frame (the vector $\mathbf{w}$ is determined by minimising the mean prediction error over the entire frame, which must therefore be available for analysis before coding [20,23]). Backward adaptation does not require access to future samples because the vector of prediction coefficients is adapted on the basis of already coded samples [23,24]. The literature shows that prediction methods with backward adaptation achieve higher compression efficiency because they allow higher prediction orders and adapt quickly to changes in signal characteristics over time (the parameters are adapted after every single sample). For this reason, this work focuses on this approach. In addition, we currently consider only the CD standard parameters, i.e., 44,100 samples per second with a resolution of 16 bits, as this is still the most common format for commercial music.
In the stereo CD standard, the dependencies between the channels make it possible to build $r$-th order predictive models that use samples from both channels, left $x_L(n-i)$ and right $x_R(n-j)$:
$$\begin{aligned} \hat{x}_L(n) &= \sum_{i=1}^{r_L} a_i\, x_L(n-i) + \sum_{j=1}^{r_R} b_j\, x_R(n-j), \\ \hat{x}_R(n) &= \sum_{j=1}^{r_L} c_j\, x_R(n-j) + \sum_{i=0}^{r_R-1} d_i\, x_L(n-i). \end{aligned} \tag{2}$$
Based on the general form $\mathbf{w}_{Ch} = [w_1, w_2, \ldots, w_r]^T$ used in Formula (2) (where $Ch$ is the channel, L (left) or R (right)), we can distinguish the vector of prediction coefficients for the left channel, $\mathbf{w}_L = [a_1, a_2, \ldots, a_{r_L}, b_1, b_2, \ldots, b_{r_R}]^T$, and for the right channel, $\mathbf{w}_R = [c_1, c_2, \ldots, c_{r_L}, d_0, d_1, \ldots, d_{r_R-1}]^T$. There are two formulas because, when the right-channel sample $x_R(n)$ is coded (decoded), the current left-channel sample $x_L(n)$ is already available. The bit average can depend on which channel is coded first. Note that in both cases $r_L$ is the number of samples taken from the currently coded channel, whereas $r_R$ is the number of samples taken from the opposite channel; in addition, $r = r_L + r_R$.
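The index ranges of the two stereo predictors can be made explicit with a short Python sketch. The function names and the list-based signal representation are assumptions for illustration; the coefficient lists `a`, `b`, `c` and `d` correspond to the vectors $\mathbf{w}_L$ and $\mathbf{w}_R$ described above.

```python
def predict_left(xl, xr, n, a, b):
    """Left-channel prediction: r_L = len(a) own samples, r_R = len(b) right samples."""
    own = sum(a[i - 1] * xl[n - i] for i in range(1, len(a) + 1))
    other = sum(b[j - 1] * xr[n - j] for j in range(1, len(b) + 1))
    return own + other


def predict_right(xl, xr, n, c, d):
    """Right-channel prediction: the current left sample xl[n] is already known,
    so the left-channel index starts at 0 (coefficient d[0])."""
    own = sum(c[j - 1] * xr[n - j] for j in range(1, len(c) + 1))
    other = sum(d[i] * xl[n - i] for i in range(len(d)))
    return own + other
```

The only structural difference between the two functions is the starting index of the opposite-channel sum, which reflects the assumed coding order (left before right).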
The selection of a universal $r_R/r_L$ ratio is problematic. The most common proportions in the literature are 1:1 and 1:2. Bit-minimising experiments (in forward adaptation mode) showed that, within the test database, the best prediction order $r$ depends on the frame size ($2^q$). In addition, increasing the prediction order $r$ decreases the value of the ratio $r_R/r_L$ [14]. For backward adaptation (the methods described in Section 4), good results were obtained for a 1:1 proportion.

3. Simple Prediction Method with Backward Adaptation

The basic advantage of adapting the prediction coefficients is the ability to adjust the $w_j$ coefficients to the variability of the signal over time. An additional property, specific to backward adaptation, is that no initial analysis of the entire signal frame is needed. Such an analysis is required by, for example, static models (forward adaptation), where in the case of MMSE one or more systems of equations with $r$ unknowns must be set up and solved.
One of the simplest methods of adapting prediction coefficients is Least Mean Square (LMS). In its basic version it is less effective than the RLS, ALCM+ and OLS+ methods discussed in the next section, and its relatively slow convergence makes it difficult to reach even the results obtained by MMSE with forward adaptation.
The prediction coefficients in LMS are adapted according to the relation
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e(n)\, \mathbf{x}(n), \tag{3}$$
where the vector $\mathbf{x}(n) = [x(n-1), x(n-2), \ldots, x(n-r)]^T$ contains $r$ samples of the currently encoded channel. Most often, the vector is initialised as $\mathbf{w}(0) = [\beta, 0, \ldots, 0]^T$, where $\beta = 0$ or $\beta = 1$. The complexity of this procedure is linear, $O(r)$; therefore, it is suitable for building high-order predictive models.
The step size $\mu$ is very important here, as it determines how quickly the coefficient vector $\mathbf{w}(n)$ adapts to the current signal characteristics. The choice of this value has a significant impact on the final bit average of the encoded audio track, and a universal value is difficult to find experimentally. The same problem applies to the prediction order.
To a large extent, these problems can be solved by an extended version of this method. Normalised LMS (NLMS) [25] adapts faster because it takes into account the energy level of the most recently encoded signal samples. This involves changing the learning factor:
$$\mu(n) = \frac{\mu}{1 + \mathbf{x}^T(n+1)\,\mathbf{x}(n+1)} = \frac{\mu}{1 + \sum_{i=0}^{r-1} x^2(n-i)}. \tag{4}$$
In our solution, we extend Formula (3) by a scaling factor located on the diagonal of the matrix $\mathbf{C} = \mathrm{diag}([0.995^1, 0.995^2, \ldots, 0.995^r])$, which gives the ES-NLMS formula for adapting the prediction coefficients (an idea taken from the exponentially weighted step-size NLMS [26]); the value 0.995 was selected experimentally:
$$\mathbf{w}_{NLMS}(n+1) = \mathbf{w}_{NLMS}(n) + \mu(n)\, e(n)\, \mathbf{C}\, \mathbf{x}(n). \tag{5}$$
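A single adaptation step of this exponentially scaled NLMS update can be sketched in Python as follows. The function name, the default value of `mu` and the use of an unrounded floating-point error are assumptions for illustration; only the decay base 0.995 comes from the text.

```python
def es_nlms_step(w, history, x_now, mu=0.01, decay=0.995):
    """One ES-NLMS step.

    history = [x(n-1), ..., x(n-r)] is the vector x(n); x_now is x(n).
    Returns (prediction, error, updated coefficient list).
    mu is a hypothetical default, not a value from the paper.
    """
    pred = sum(wi * xi for wi, xi in zip(w, history))
    err = x_now - pred
    # x(n+1) = [x(n), x(n-1), ..., x(n-r+1)] supplies the energy term
    next_vec = [x_now] + history[:-1]
    energy = sum(v * v for v in next_vec)
    mu_n = mu / (1.0 + energy)
    # diagonal matrix C scales the i-th coefficient update by decay**(i+1)
    new_w = [
        wi + mu_n * err * (decay ** (i + 1)) * xi
        for i, (wi, xi) in enumerate(zip(w, history))
    ]
    return pred, err, new_w
```

The per-coefficient factor `decay ** (i + 1)` makes coefficients attached to older samples adapt more slowly, and each step costs $O(r)$ operations.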

4. Complex Prediction Method with Backward Adaptation

In [24], the authors presented a highly efficient cascade method combining several stages of prediction (DPCM + RLS + LMS1 + LMS2 + LMS3; see Section 5) with backward adaptation (implemented in MP4-ALS-RM23 in the best mode). This method uses a high-complexity recursive least square (RLS) block and a large total number of prediction coefficients adapted in the LMS1, LMS2 and LMS3 blocks.
In RLS, the backward adaptation of the linear prediction coefficients has a computational complexity of $O(r^2)$, which is much greater than that of NLMS, but it offers faster and more effective prediction at a small prediction order (the adaptation of coefficients in RLS is described in detail in [24]). To further improve its performance, preliminary data modelling in the form of a DPCM block is used. This reduces the dynamics of the signal fed to the input of the RLS block (see Figure 1). In this figure, the RLS predictive block is the only one in the cascade that uses interchannel relationships in stereo. The same applies to the ALCM+ and OLS+ blocks described in the next two subsections.

4.1. ALCM+

In the proposed solution, the complexity of the main prediction block can be significantly reduced by replacing the RLS block with the rapid ALCM+ method. In ALCM+, the adaptation of the prediction coefficients has linear complexity with respect to the prediction order and does not require any square matrix to be updated.

4.1.1. ALCM Rapid Adaptation Method

The adaptive prediction method Activity Level Classification Model (ALCM) [27] was developed in the 1990s for image coding purposes. It has a lower, though similar, computational complexity than the classic LMS solution. Originally, the method operated on linear prediction models of the fifth or sixth order, and in each step only two properly selected coefficients from the whole set were adapted (up to six in the ALCM+ version proposed in this paper) by the constant $\mu = 1/256$ for 8-bit samples.
Despite its simplicity, this method gives relatively good results also with regard to audio coding. The method is applied with respect to the interchannel stereo mode, and therefore two prediction orders {rL, rR} correspond to the data from the currently coded and opposite channels, and the total prediction order of the ALCM+ method equals r = rL + rR. Generalising some formulas, we can introduce the designation Ch (channel) in place of L (left) and R (right).
To increase the efficiency of the method, an initial DPCM block is used. The DPCM block returns an error $e^{DPCM}_{Ch}(n)$ at time $n$, and the errors from the $i$-th previous moments are buffered and denoted $g(i)$. The ALCM+ input stream therefore consists of differences between two consecutively coded samples, $g(i) = x(n-i) - x(n-i-1)$. Assuming that the left-channel sample is encoded before the right-channel sample, the ALCM+ input vectors at time $n$ have the following form. For the left channel,
$$\mathbf{g}_L(n) = [g_L(1), g_L(2), \ldots, g_L(r)]^T = [x_L(n-1) - x_L(n-2), \ldots, x_L(n-r_L) - x_L(n-r_L-1),\; x_R(n-1) - x_R(n-2), \ldots, x_R(n-r_R) - x_R(n-r_R-1)]^T,$$
while for the right channel,
$$\mathbf{g}_R(n) = [g_R(1), g_R(2), \ldots, g_R(r)]^T = [x_R(n-1) - x_R(n-2), \ldots, x_R(n-r_L) - x_R(n-r_L-1),\; x_L(n) - x_L(n-1), \ldots, x_L(n-r_R+1) - x_L(n-r_R)]^T.$$
Due to the specificity of the vector $\mathbf{g}_{Ch}(n)$, the predicted values are determined as
$$\hat{x}_{Ch}(n) = x_{Ch}(n-1) + \sum_{i=1}^{r} w_{Ch}(i)\, g_{Ch}(i). \tag{6}$$
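The construction of the two input vectors and the resulting prediction can be sketched in Python as follows. The function names and the list-based signal layout are assumptions for illustration; the index ranges follow the left-before-right coding order described above.

```python
def g_left(xl, xr, n, r_l, r_r):
    """Input vector for the left channel: r_l own differences, then r_r right ones."""
    own = [xl[n - i] - xl[n - i - 1] for i in range(1, r_l + 1)]
    other = [xr[n - i] - xr[n - i - 1] for i in range(1, r_r + 1)]
    return own + other


def g_right(xl, xr, n, r_l, r_r):
    """Input vector for the right channel: the left-channel differences start
    at the current sample xl[n], which is already coded."""
    own = [xr[n - i] - xr[n - i - 1] for i in range(1, r_l + 1)]
    other = [xl[n - i] - xl[n - i - 1] for i in range(0, r_r)]
    return own + other


def alcm_predict(prev_sample, w, g):
    """Prediction = previous sample of the coded channel + weighted differences."""
    return prev_sample + sum(wi * gi for wi, gi in zip(w, g))
```

Adding `prev_sample` outside the sum is what incorporates the DPCM stage into the predictor: the weights only model the differences.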

4.1.2. The Principle of Adaptation in Our Proposition of the Improved Version of ALCM+

In the new version of ALCM+ presented in this paper, we introduce five parallel predictive models with orders $r_j \in \{10, 22, 30, 56, 110\}$; these orders were chosen after many experiments optimising the compression efficiency on the learning base.
The predicted value is determined as the arithmetic mean of these five prediction models:
$$\hat{x}_{Ch}(n) = x_{Ch}(n-1) + \frac{1}{5} \sum_{j=1}^{5} \sum_{i=1}^{r_j} w^{(j)}_{Ch}(i)\, g_{Ch}(i), \tag{7}$$
where $Ch$ indicates the L (left) or R (right) channel and $j$ is the number of one of the five prediction models.
In each of these models, six prediction coefficients are adapted after the next sample is coded. For this purpose, the three smallest and the three largest values are selected among the first $r_j$ elements $g_{Ch}(i)$ of the vector $\mathbf{g}_{Ch}(n)$, where $i = 1, \ldots, r_j$. The three smallest values are denoted as
$$g_{Ch}\big(q^{(j)}_{Ch(1)}\big) \le g_{Ch}\big(q^{(j)}_{Ch(2)}\big) \le g_{Ch}\big(q^{(j)}_{Ch(3)}\big) \tag{8}$$
and the three largest values as
$$g_{Ch}\big(p^{(j)}_{Ch(3)}\big) \le g_{Ch}\big(p^{(j)}_{Ch(2)}\big) \le g_{Ch}\big(p^{(j)}_{Ch(1)}\big). \tag{9}$$
Then, for each $j$-th predictive model ($j = 1, \ldots, 5$), provided that the condition $g_{Ch}\big(q^{(j)}_{Ch(1)}\big) < g_{Ch}\big(p^{(j)}_{Ch(1)}\big)$ is true, the prediction coefficients are adapted as follows (see Algorithm 1).
Algorithm 1: Adaptation of prediction coefficients by predictive models.
$$\begin{array}{l} \textbf{if } \hat{x}_{Ch}(n) < x_{Ch}(n) \textbf{ then} \\ \quad w^{(j)}_{p^{(j)}_{Ch(k)}}(n+1) = w^{(j)}_{p^{(j)}_{Ch(k)}}(n) + \mu^{(j)}_{Ch(k)}, \quad \text{for } k = \{1, 2, 3\} \\ \quad w^{(j)}_{q^{(j)}_{Ch(k)}}(n+1) = w^{(j)}_{q^{(j)}_{Ch(k)}}(n) - \mu^{(j)}_{Ch(k)}, \quad \text{for } k = \{1, 2, 3\} \\ \textbf{else if } \hat{x}_{Ch}(n) > x_{Ch}(n) \textbf{ then} \\ \quad w^{(j)}_{p^{(j)}_{Ch(k)}}(n+1) = w^{(j)}_{p^{(j)}_{Ch(k)}}(n) - \mu^{(j)}_{Ch(k)}, \quad \text{for } k = \{1, 2, 3\} \\ \quad w^{(j)}_{q^{(j)}_{Ch(k)}}(n+1) = w^{(j)}_{q^{(j)}_{Ch(k)}}(n) + \mu^{(j)}_{Ch(k)}, \quad \text{for } k = \{1, 2, 3\} \end{array}$$
The modifying step size $\mu^{(j)}_{Ch(k)}$ is determined as follows:
$$\mu^{(j)}_{Ch(k)} = \min\big\{2^{-6};\ \bar{\mu}^{(j)}_{Ch(k)}\big\}, \tag{10}$$
where
$$\bar{\mu}^{(j)}_{Ch(k)} = \frac{|x_{Ch}(n) - \hat{x}_{Ch}(n)|}{2^{3} \sum_{k=1}^{3} \alpha_k \Big( g_{Ch}\big(p^{(j)}_{Ch(k)}\big) - g_{Ch}\big(q^{(j)}_{Ch(k)}\big) \Big)} \tag{11}$$
and $\alpha_k = \{1; 0.5; 0.5\}$.
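The adaptation of one of the five models can be sketched in Python as below. This is only an illustration under stated assumptions: the step size is read as $\min\{2^{-6}, |x - \hat{x}| / (2^3 \sum_k \alpha_k (g(p_k) - g(q_k)))\}$, the same step is applied to all three pairs, the vector `g` has at least six elements, and the function name is hypothetical.

```python
def alcm_plus_adapt(w, g, x_true, x_pred, alphas=(1.0, 0.5, 0.5), mu_cap=2.0 ** -6):
    """Return a new coefficient list with six entries adapted.

    w and g have the same length (the order r_j of one model, assumed >= 6).
    """
    order = sorted(range(len(g)), key=lambda i: g[i])
    q = order[:3]          # indices of the three smallest g values (q[0] smallest)
    p = order[::-1][:3]    # indices of the three largest g values (p[0] largest)
    new_w = list(w)
    # no adaptation when all differences are equal or the prediction was exact
    if not g[q[0]] < g[p[0]] or x_pred == x_true:
        return new_w
    spread = sum(a * (g[pk] - g[qk]) for a, pk, qk in zip(alphas, p, q))
    mu = min(mu_cap, abs(x_true - x_pred) / (8.0 * spread))
    sign = 1.0 if x_pred < x_true else -1.0
    for pk, qk in zip(p, q):
        new_w[pk] += sign * mu   # weights of the largest differences
        new_w[qk] -= sign * mu   # weights of the smallest differences
    return new_w
```

In a full ALCM+ block this function would be called once per model, with `g` truncated to the first $r_j$ elements, and the five model outputs would then be averaged.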

4.2. OLS+

The extended version of Ordinary Least Square (OLS+) has a slightly higher computational complexity than RLS. In addition to updating the square autocovariance matrix $\mathbf{R}(n)$ of dimensions $r \times r$ (complexity $O(r^2)$, as in RLS), it requires a matrix inversion of complexity $O(r^3)$.
In the OLS+ method, the prediction coefficients are calculated adaptively, individually for each coded sample, by minimising the mean square error over a limited backward area. The influence of older samples is limited by a forgetting effect controlled by the factor $ff$.
In OLS+, the basic matrix equation used to calculate the vector $\mathbf{w}(n)$ is extended by the element $u_{bias}$, which comes from the principles of ridge regression [28]. The $u_{bias}$ element modifies the main diagonal of the autocovariance matrix $\mathbf{R}(n)$. This prevents the matrix from becoming singular and also improves the overall modelling efficiency. The vector of prediction coefficients is obtained by solving the matrix equation (for simplicity, the L and R channel designations are omitted here; recall that each channel has its own matrix $\mathbf{R}_{Ch}(n)$ and vectors $\mathbf{w}_L(n)$ and $\mathbf{w}_R(n)$, as described in Section 2, Formula (2)):
$$\mathbf{w}(n+1) = \mathbf{R}_{bias}^{-1}(n+1) \cdot \mathbf{q}(n+1), \tag{12}$$
where the matrix $\mathbf{R}_{bias}(n+1)$ is complemented by the value $u_{bias}(n+1)$:
$$\mathbf{R}_{bias}(n+1) = \mathbf{R}(n+1) + u_{bias}(n+1)\, \mathbf{I}, \tag{13}$$
where $\mathbf{I}$ is the identity matrix. In [16], $u_{bias}$ was a fixed value. Based on the work in [28], we propose to determine $u_{bias}$ in the following way:
$$u_{bias}(n+1) = \frac{r\,(1 - ff)\, c_{OLS}\, e^2(n)}{\max\{1,\ w_j^2(n)\}}, \quad \text{for } j = 1, \ldots, r. \tag{14}$$
The elements $R_{j,i}(n+1)$ of the matrix $\mathbf{R}(n+1)$ with dimensions $r \times r$ are updated after each subsequent sample is encoded as follows:
$$R_{j,i}(n+1) = c_{OLS}\, x(n-i) \cdot x(n-j) + ff \cdot R_{j,i}(n). \tag{15}$$
Likewise, the elements $q_j(n+1)$ of the vector $\mathbf{q}(n+1)$ with dimensions $r \times 1$ are updated as follows:
$$q_j(n+1) = c_{OLS}\, x(n) \cdot x(n-j) + ff \cdot q_j(n). \tag{16}$$
For the first 100 samples, the predictive model is not determined because the data in the autocovariance matrix are not yet sufficiently representative, so a simple first-order DPCM prediction model is used. The weight
$$c_{OLS} = \big(s_{err}(n+1) + 2\big)^{-\frac{3}{4}} \tag{17}$$
depends on the iteratively calculated value
$$s_{err}(n+1) = hf \cdot s_{err}(n) + |e(n)|. \tag{18}$$
The weight $c_{OLS}$ is intended to reduce the role of input data for which larger absolute prediction errors $|e(n)|$ are obtained. The best compression results were obtained for $r = 20$ (a predictor using 10 previous samples from each of the two channels) and with the forgetting factors $ff = 0.9983$ and $hf = 0.69$.
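The per-sample OLS+ update can be sketched in pure Python as below. This is a minimal single-channel illustration under several assumptions: the weight is read as $c_{OLS} = (s_{err} + 2)^{-3/4}$, the ridge term as $u_{bias} = r(1 - ff)\,c_{OLS}\,e^2 / \max\{1, \max_j w_j^2\}$, the 100-sample DPCM warm-up is omitted, a tiny lower bound on $u_{bias}$ is added as a guard, and the class and function names are hypothetical.

```python
def solve(a, b):
    """Solve a*w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][c] * w[c] for c in range(i + 1, n))
        w[i] = s / m[i][i]
    return w


class OlsPlus:
    def __init__(self, r, ff=0.9983, hf=0.69):
        self.r, self.ff, self.hf = r, ff, hf
        self.R = [[0.0] * r for _ in range(r)]
        self.q = [0.0] * r
        self.w = [0.0] * r
        self.s_err = 0.0

    def step(self, past, x_now):
        """past = [x(n-1), ..., x(n-r)]; returns the prediction made for x(n)."""
        pred = sum(wj * xj for wj, xj in zip(self.w, past))
        err = x_now - pred
        self.s_err = self.hf * self.s_err + abs(err)
        c = (self.s_err + 2.0) ** -0.75          # assumed reading of c_OLS
        for j in range(self.r):
            self.q[j] = c * x_now * past[j] + self.ff * self.q[j]
            for i in range(self.r):
                self.R[j][i] = c * past[i] * past[j] + self.ff * self.R[j][i]
        w_max = max(wj * wj for wj in self.w)
        u = self.r * (1.0 - self.ff) * c * err * err / max(1.0, w_max)
        u = max(u, 1e-9)                          # illustrative guard, not from the paper
        rb = [[self.R[j][i] + (u if i == j else 0.0) for i in range(self.r)]
              for j in range(self.r)]
        self.w = solve(rb, self.q)                # O(r^3) solve instead of explicit inverse
        return pred
```

Solving the linear system directly avoids forming the inverse matrix explicitly while keeping the $O(r^3)$ cost mentioned above; a production coder would use a numerically tuned routine instead of this textbook elimination.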
The OLS+ method has a greater computational complexity than RLS because the matrix must be inverted. Its advantage is a more efficient predictive model, obtained through additional operations on the autocovariance matrix before it is inverted, according to Formulas (13), (17) and (18). This differs from the fast adaptation in RLS, which operates on the already inverted matrix; that matrix is more sensitive, and it is difficult to determine a common forgetting factor that ensures proper efficiency for all files in the test base. For this reason, the forgetting factor in MP4-ALS-RM23 has a universal value of 1.

5. Cascade Connection of Prediction Method

In the solution described in [24], which is based on prediction models with backward adaptation, a cascade of five stages (blocks) is used. The first is the constant DPCM predictor, whereas the second uses the RLS method, for example with order $r_{RLS} = 16$ (a $\mathbf{w}$ vector with eight $w_j$ coefficients per channel in stereo mode [24,29]). The remaining three blocks (see Figure 1) use the NLMS method (see Section 3), with each subsequent block using a smaller prediction order; the outputs of subsequent blocks depend on the values calculated in the preceding blocks. For the NLMS blocks, the prediction orders $r_i = \{384, 112, 16\}$ were proposed.
Our previous approach in [16] also uses a cascade of five blocks including three subsequent blocks of the modified version of NLMS with orders {1000, 25, 10}. In the solution proposed here, which is an extension of the codec from work [16], the prediction order in the middle NLMS block has been increased from r = 25 to r = 380. Instead of the OLS block, an improved version of OLS+ has been introduced, whose simplified version has been successfully used earlier in lossless image compression [30,31].
In Section 4, two new approaches to the initial prediction stage (ALCM+ and OLS+) are proposed. In Figure 2, the proposed solution with the ALCM+ block (of lower complexity) is shown. The higher complexity version differs only in that the OLS+ block has been introduced in place of DPCM and ALCM+ blocks.
If we define $\mathbf{w}^{(j)}(n)$ as the vector that stores the prediction coefficients used in the $j$-th stage of the cascade, then the relationships between the successive stages of the cascading system are as follows:
$$y_1(n) = x(n-1),$$
$$y_j(n) = \sum_{i=1}^{r_j} w_i^{(j)}(n)\, e_{j-1}(n-i), \quad \text{for } j > 1,$$
$$e_1(n) = x(n) - y_1(n),$$
$$e_j(n) = e_{j-1}(n) - y_j(n), \quad \text{for } j > 1.$$
Compared to the original approach presented in [24,29], the initial DPCM block has been abandoned (y1(n) = 0), and the RLS block has been replaced with the extended OLS+ block, where interchannel dependencies are removed.
After the $K$ prediction stages, a last block refines the final predicted value $\hat{x}(n)$ by removing the cumulative prediction error: the value $C_{mix}$ is calculated as a constant component (bias) that depends on the context number determined individually for each successively coded sample $x(n)$ (see Section 6). This method was previously used successfully only in lossless image compression [32]. The block is denoted Context-Dependent Constant Component Removing (CDCCR) and is described in detail in Section 6, just as the modified NLMS blocks are described in Section 3.
The final prediction error is calculated from
$$e(n) = x(n) - \lfloor y_1(n) + y_2(n) + \cdots + y_K(n) + C_{mix} + 0.5 \rfloor.$$
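The stage relations and the final rounding can be sketched together in Python. The function name and the argument layout are assumptions for illustration: `weights[k]` holds the coefficients of stage $k + 2$, and `histories[k]` holds the past errors of the preceding stage, $[e_{j-1}(n-1), e_{j-1}(n-2), \ldots]$.

```python
import math


def cascade_step(x_now, x_prev, weights, histories, c_mix=0.0):
    """Run one sample through a prediction cascade.

    Returns (stage outputs y_1..y_K, last-stage error e_K, final integer error).
    """
    ys = [x_prev]                      # y_1(n) = x(n-1), the DPCM stage
    e = x_now - ys[0]                  # e_1(n)
    for w, hist in zip(weights, histories):
        y = sum(wi * hi for wi, hi in zip(w, hist))
        ys.append(y)                   # y_j(n)
        e -= y                         # e_j(n) = e_{j-1}(n) - y_j(n)
    # the sum of all stage outputs plus the bias correction is rounded once
    final = x_now - math.floor(sum(ys) + c_mix + 0.5)
    return ys, e, final
```

Each stage predicts the error left by the stage before it, so the final prediction is simply the sum of all stage outputs, rounded once at the end.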
The last two blocks (marked in Figure 2 as Golomb Code and CABAC) in the cascade audio data compression system proposed here are used to efficiently code prediction errors e(n) into the resulting binary data stream (see Section 7).
Table 1 presents the first-order entropy results for 16 test files (first-order entropy is discussed further in Section 7). The first three columns contain the entropy after applying DPCM + ALCM+, DPCM + RLS and OLS+, respectively. The last three columns present the results for the same set with three additional NLMS blocks of prediction orders {1000, 380, 10}, respectively. In both the RLS and OLS+ blocks, the prediction order $r = 20$ was used (the average first-order entropy over the whole database is shown in bold in the last row of Table 1).
An important conclusion is that the advantage of OLS+ over RLS becomes noticeable only after three NLMS blocks are included in the cascade prediction system. This shows how misleading purely theoretical considerations can be: only practical experiments (see [24]) make it possible to determine the final universal form of the cascade prediction system.
The final solution proposed in this work is therefore the cascade shown in Figure 2, with the first two blocks, DPCM + ALCM+, replaced by the OLS+ block.

6. Removing Context-Dependent Cumulate Prediction Error

Prediction methods often introduce a constant component (bias) into the prediction errors. The nature of this component depends on the properties of the context, defined as a (heuristically selected) set of features of the nearest neighbourhood. In practice, a large number of contexts is used (from several hundred to several thousand), which makes it possible to determine precisely the type of the closest neighbourhood of the coded sample from various features of the previous few samples and previously coded prediction errors. The context number can be represented, e.g., as an $i$-bit number in which each bit is determined in a separate way, for example by a two-state quantisation of the absolute value of the previous prediction error (checking whether $|e(n-1)| > 50$) or by checking the condition $x(n-1) > x(n-2)$.
For example, $i = 11$ different rules of this type create 2048 contexts, and each context can adaptively calculate a separate mean or median of the previous prediction errors, which can be subtracted from the prediction error at the next occurrence of the given context in the coding step.
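Packing binary decisions into a context number can be sketched in Python as follows. The function names and the bit order (first rule in the least significant bit) are assumptions for illustration; the two example rules are the ones mentioned above.

```python
def context_number(bits):
    """Pack a sequence of binary decisions into an integer context number.

    Assumption: bits[0] becomes the least significant bit.
    """
    number = 0
    for k, bit in enumerate(bits):
        number |= (1 if bit else 0) << k
    return number


def example_bits(x, e, n):
    """Two example rules: a thresholded previous error and a local signal trend."""
    return [abs(e[n - 1]) > 50, x[n - 1] > x[n - 2]]
```

With eleven such rules the largest possible context number is 2047, which gives the 2048 contexts mentioned above.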
This idea is commonly used in relation to image coding [32]. The adaptive method of removing a constant component is used both in the CALIC [33] and JPEG-LS algorithms [34].
The procedure is shown in Algorithm 2, where $j$ is the context number, $S^{(Ch)}_{(j)}$ is the current cumulative value of the prediction error $e^{(Ch)}_K(n)$ in the given context, $N^{(Ch)}_{(j)}$ is the number of occurrences of the given context and $C^{(Ch)}_{(j)}$ is the current correction value that should be added to the predicted value as its correction.
Algorithm 2: Our own algorithm for adapting the $C^{(Ch)}_{(j)}$ value, based on the CALIC method.
Initial values: $S^{(Ch)}_{(j)} := C^{(Ch)}_{(j)} := 0$ and $N^{(Ch)}_{(j)} := 4$, for every $j$.
$$\begin{array}{l} \textbf{if } \big|e^{(Ch)}_K(n)\big| < 4 \sqrt[4]{\mathrm{Var}(X)} \textbf{ then } \{\, S^{(Ch)}_{(j)} := S^{(Ch)}_{(j)} + e^{(Ch)}_K(n); \quad N^{(Ch)}_{(j)} := N^{(Ch)}_{(j)} + 1; \,\} \\ C^{(Ch)}_{(j)} := S^{(Ch)}_{(j)} / N^{(Ch)}_{(j)}; \end{array}$$
Periodic forgetting is also useful, as it additionally adjusts the constants $C^{(Ch)}_{(j)}$ to the local properties of the $j$-th context: if $N^{(Ch)}_{(j)}$ is greater than 127, then $N^{(Ch)}_{(j)}$ is set to 64 and $S^{(Ch)}_{(j)}$ is set to $S^{(Ch)}_{(j)}/2$.
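The mean-based correction with periodic forgetting can be sketched in Python as below. The class name is hypothetical, and the outlier threshold is passed in by the caller as `limit` rather than computed here, because its exact form is not restated in this sketch.

```python
class MeanBias:
    """Per-context running mean of prediction errors with periodic forgetting."""

    def __init__(self, n_contexts):
        self.S = [0.0] * n_contexts   # cumulative error per context
        self.N = [4] * n_contexts     # occurrence counters, initialised to 4
        self.C = [0.0] * n_contexts   # current correction values

    def update(self, j, err, limit):
        """Update context j with error err; errors with |err| >= limit are ignored."""
        if abs(err) < limit:
            self.S[j] += err
            self.N[j] += 1
        if self.N[j] > 127:           # periodic forgetting
            self.N[j] = 64
            self.S[j] /= 2
        self.C[j] = self.S[j] / self.N[j]
        return self.C[j]
```

Halving both the sum and the counter leaves the current mean unchanged but doubles the influence of future errors, which is how the correction tracks local changes in the context statistics.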
To remove a context-dependent constant component, the median of the prediction errors that previously appeared in a given context can be used instead of the arithmetic mean used in CALIC.
In this case, too, it is worthwhile to apply periodic forgetting once the occurrence counter $N^{(Ch)}_{(j)}$ of the $j$-th context has reached 128. This improves the overall compression efficiency and also reduces to 64 elements the vector that stores the prediction errors that appeared earlier in the given context. Periodic forgetting halves the 128-element vector by sorting it in ascending order and removing its first 32 and last 32 elements.
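The median variant for a single context can be sketched in Python with a sorted buffer. The class name is hypothetical, and returning 0 for an empty buffer and averaging the two middle values for an even count are assumptions not stated in the text.

```python
import bisect


class MedianBias:
    """Median of past prediction errors in one context, with periodic trimming."""

    def __init__(self):
        self.sorted_errs = []

    def update(self, err):
        """Insert err, trim the buffer when it reaches 128 entries, return the median."""
        bisect.insort(self.sorted_errs, err)
        if len(self.sorted_errs) >= 128:
            # drop the 32 smallest and the 32 largest stored errors
            self.sorted_errs = self.sorted_errs[32:-32]
        return self.median()

    def median(self):
        v = self.sorted_errs
        if not v:
            return 0
        mid = len(v) // 2
        return v[mid] if len(v) % 2 else (v[mid - 1] + v[mid]) / 2
```

Keeping the buffer sorted makes both the median lookup and the trimming of the extremes trivial, at the cost of a linear-time insertion.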
In the presented codec, this solution is used with equally good effect for encoding an audio signal: as the last block of the cascade, CDCCR, it refines the predicted value of the coded samples. As a result, the bit average can be reduced by up to 1% (depending on the input data and the efficiency of the preceding prediction blocks).
The idea is based on four types of context, each of which is used twice (with the arithmetic mean and with the median, marked with the symbols AVE and MED, respectively). In this way, eight constant components are created, from which the final weighted value $C^{(Ch)}_{mix}$ is determined:
$$C^{(Ch)}_{mix} = \frac{1}{36} \left( 4 \sum_{i=1}^{4} C^{(Ch)\,AVE}_{(j(i))} + 5 \sum_{i=1}^{4} C^{(Ch)\,MED}_{(j(i))} \right).$$
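The weighting can be checked with a few lines of Python. The function name is hypothetical; the weights 4 and 5 and the divisor 36 are taken from the formula, and they sum consistently, since $4 \cdot 4 + 5 \cdot 4 = 36$.

```python
def c_mix(ave, med):
    """Weighted mix of four mean-based and four median-based corrections."""
    if len(ave) != 4 or len(med) != 4:
        raise ValueError("expected four AVE and four MED correction values")
    return (4 * sum(ave) + 5 * sum(med)) / 36
```

If all eight components agree on the same value, the mix returns that value unchanged, and the median-based components carry slightly more weight than the mean-based ones.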
The high efficiency of the mixed method of constant component correction results from mixing correction values that each carry some level of uncertainty and that, taken individually, may in some situations worsen the final prediction error (through an incorrectly predicted correction value). Mixing (a weighted average) largely compensates for these uncertainties. At the same time, this method reduces the effect of asymmetry in the error distribution, which manifests itself in the $j$-th context as a difference between the position in the histogram indicated by the arithmetic mean $C^{(Ch)}_{(j)}$ of the prediction errors and the actual position of the histogram maximum (the histogram of a Laplace distribution has a single maximum), as indicated in [35].
Table 2 lists the decision rules on the basis of which the context numbers are determined. Each decision is a two-state (binary) quantizer that returns a bit value of 0 or 1 (the individual bits are denoted here as $\alpha_i$), corresponding to the fulfilment or non-fulfilment of a given condition (a YES/NO answer). Because the rules are partially repeated across the different ways of building the context number, they are collected in Table 2, and the construction rules for the four types of context number are described in Section 6.1, Section 6.2, Section 6.3 and Section 6.4.

6.1. First Type of Context

The context number of type 1 is determined as a ten-bit number $\kappa_9\kappa_8\kappa_7\kappa_6\kappa_5\kappa_4\kappa_3\kappa_2\kappa_1\kappa_0$, where $\kappa_{i-1} = \alpha_i$ for i = {1, 2, …, 8}. The remaining two bits $\kappa_9\kappa_8$ are determined using a four-state quantizer of the sum $S_1$ with three thresholds {50, 250, 700}, where
$$S_1 = \sum_{i=1}^{5} \left| x(n-i) - \hat{x}(n) \right|.$$
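A minimal sketch of how such a context number could be assembled is given below. The names `quantize` and `context_type1` are hypothetical, the decision bits $\alpha_1, \ldots, \alpha_8$ of Table 2 are taken as ready-made inputs, and assigning a value equal to a threshold to the higher interval is an assumption, since the text does not specify the treatment of ties.

```python
from bisect import bisect_right


def quantize(value, thresholds):
    """Return the index (0..len(thresholds)) of the interval holding value."""
    return bisect_right(thresholds, value)


def context_type1(alpha, past_samples, prediction):
    """Build the ten-bit context number of type 1.

    alpha        -- the eight decision bits alpha_1..alpha_8 (0 or 1)
    past_samples -- x(n-1), x(n-2), ..., x(n-5)
    prediction   -- the predicted value for the current sample
    """
    s1 = sum(abs(x - prediction) for x in past_samples[:5])
    ctx = 0
    for i, bit in enumerate(alpha[:8]):         # kappa_{i-1} = alpha_i
        ctx |= (bit & 1) << i
    ctx |= quantize(s1, (50, 250, 700)) << 8    # bits kappa_9 kappa_8
    return ctx


print(context_type1([1] * 8, [0] * 5, 0))       # 255
```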

6.2. Second Type of Context

The context number of type 2 is determined as an eleven-bit number $\kappa_{10}\kappa_9\kappa_8\kappa_7\kappa_6\kappa_5\kappa_4\kappa_3\kappa_2\kappa_1\kappa_0$, where $\kappa_{i-1} = \alpha_{i+19}$ for i = {1, 2}, $\kappa_{i+1} = \alpha_{i+8}$ for i = {1, 2, 3}, $\kappa_{i+4} = \alpha_{i+11}$ for i = {1, 2, 3, 4} and $\kappa_{i+8} = \alpha_{i+15}$ for i = {1, 2}.

6.3. Third Type of Context

The context number of type 3 is determined as an eleven-bit number $\kappa_{10}\kappa_9\kappa_8\kappa_7\kappa_6\kappa_5\kappa_4\kappa_3\kappa_2\kappa_1\kappa_0$, where $\kappa_{i-1} = \alpha_{i+19}$ for i = {1, 2}, $\kappa_{i+1} = \alpha_{i+8}$ for i = {1, 2}, $\kappa_{i+3} = \alpha_{i+3}$ for i = {1, 2}, $\kappa_{i+5} = \alpha_{i+17}$ for i = {1, 2} and $\kappa_8 = \alpha_{22}$. The remaining two bits $\kappa_{10}\kappa_9$ are determined using a four-state quantizer of the sum $S_2$ with three thresholds {100, 400, 1550}, where
$$S_2 = \sum_{i=1}^{4} \frac{1}{i} \left| x(n-i) - \hat{x}(n) \right|.$$

6.4. Fourth Type of Context

The context number of type 4 is determined as a composite of three values. The first two are determined by a six-state quantizer with thresholds {−100, −10, 0, 10, 100} applied to the values $d_1$ and $d_2$, respectively, where $d_i = x(n-i) - x(n-i-1)$ for i = {1, 2}. The third value is the six-bit number $\kappa_5\kappa_4\kappa_3\kappa_2\kappa_1\kappa_0$, where $\kappa_{i-1} = \alpha_{i+8}$ for i = {1, 2}, $\kappa_2 = \alpha_2$, and the three bits $\kappa_5\kappa_4\kappa_3$ are determined using an eight-state quantizer of the sum $S_2$ with seven thresholds {40, 80, 180, 400, 1000, 2000, 4000}. In total, $6 \cdot 6 \cdot 2^6 = 2304$ contexts of type 4 are obtained.
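The count of 2304 contexts can be checked with a small sketch that packs the three values into one index. The mixed-radix ordering used here (first $d_1$, then $d_2$, then the six-bit part) and the name `context_type4` are assumptions made for illustration only; the text states the three components but not how they are combined into a single number.

```python
from bisect import bisect_right

D_THRESHOLDS = (-100, -10, 0, 10, 100)   # six-state quantizer for d1 and d2


def context_type4(d1, d2, kappa6):
    """Combine two quantized differences and a six-bit value into one index."""
    q1 = bisect_right(D_THRESHOLDS, d1)  # 0..5
    q2 = bisect_right(D_THRESHOLDS, d2)  # 0..5
    return (q1 * 6 + q2) * 64 + (kappa6 & 0x3F)


indices = {context_type4(d1, d2, k)
           for d1 in (-500, -50, -5, 5, 50, 500)
           for d2 in (-500, -50, -5, 5, 50, 500)
           for k in range(64)}
print(len(indices))                      # 2304
```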

7. Adaptive Golomb Code and Context Adaptive Binary Arithmetic Coder

Practical implementation of effective data encoding involves minimising the average number of bits per symbol generated by the source S (in audio compression the input data are samples; in this work we adopted the standard 16 bits per sample, so the source S is a stream of samples from a CD). Depending on the kind of source, we distinguish sources without memory, DMS (Discrete Memoryless Source), and sources with memory, CSM (Conditional Source Model). From the point of view of the Markov model, in the first case the lower limit of the bit average is the unconditional entropy $H(S_{DMS})$ (zero order entropy), and in the second case it is the conditional entropy of the k-th order, $H(S|C^{(k)})$, where for an audio signal the context $C^{(k)}$ is defined, for example, as the k preceding samples. Denoting the total entropy by $H(S)$, we obtain the relation $H(S) \le H(S|C^{(k)}) \le H(S_{DMS})$.
Using a static version of the Huffman code for 16-bit data, for example, requires passing the probability distribution of the encoded file to the decoder. This is impractical: with 16-bit samples, even if the individual probabilities $p_i$ are stored with low precision, for example 2 bytes per $p_i$ value, the file header alone occupies $2^{17}$ B. In addition, the zero order entropy, which is the lower limit of the bit average of the static encoder, indicates that this approach gives unsatisfactory results (see the second column in Table 3).
For this reason, adaptive versions of codecs are used in practice. Moreover, since audio data do not constitute a sequence of independent values, it is difficult to determine and apply a Markov model of the k-th order. In practice, it is assumed that the interdependencies can be removed by predictive techniques, with only the prediction errors written into the file. The prediction model can be a linear model of the k-th order or a more complex nonlinear solution, but we reduce it to a single predicted value $\hat{x}(n)$ (creating a first-order Markov model that is not explicitly defined), which we subtract from the current sample x(n), see Formula (1). As a result, the sequence of prediction errors can be interpreted as a source for which the first-order entropy is noticeably smaller than the zero order entropy (see the third column in Table 3). It should be noted that it is extremely difficult to determine the total entropy H(S), because it is difficult to determine the order of the Markov model that would take into account all the interdependencies between the individual data (an example is the linear prediction of order r = 1000 used in the first NLMS stage of the cascaded prediction technique proposed here).
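The two quantities discussed above, the header size of a static code and the drop in entropy after prediction, can be illustrated with a short sketch. The previous-sample predictor used here is a deliberately trivial, hypothetical stand-in for the cascade described in this paper, and the ramp signal is synthetic test data, not one of the test files.

```python
import math
from collections import Counter


def zero_order_entropy(symbols):
    """Zero order (memoryless) entropy in bits per symbol."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Header of a static code: one 2-byte probability per 16-bit symbol value.
header_bytes = (2 ** 16) * 2
print(header_bytes == 2 ** 17)                    # True

# Synthetic ramp: every sample value occurs once, so the entropy is maximal,
# while the errors of the predictor x_hat(n) = x(n-1) are all identical.
samples = list(range(256))
errors = [samples[n] - samples[n - 1] for n in range(1, len(samples))]
print(zero_order_entropy(samples))                # 8.0
print(zero_order_entropy(errors))                 # 0.0
```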
Usually, instead of a static Huffman code, the adaptive Rice coder [36] and the multivalued arithmetic coder with quantisation [15] are used for efficient coding of prediction errors. Quantisation reduces the size of the alphabet, so that some of the less-significant bits representing the prediction error value are stored without further coding [37]. The method proposed here has an implementation complexity similar to that of the method described in [37], where a simple mapping function indicates the currently best Rice code parameter, i.e., the one that statistically gives the most suitable fit (in the medium-term sense) to the distribution used by the coder.
Our solution uses an adaptive Golomb code, whose output data is additionally compressed using two context adaptive binary arithmetic coders (CABAC). Each of these coders has its own way of determining the context number with which an individual probability distribution of bits 0 and 1 is associated. This solution is faster than the one used in [16], while being characterised by greater flexibility in adjusting to changes in signal features over time. The combination of adaptive versions of the arithmetic coder and the Golomb code makes it possible to take into account changes in the characteristics of the probability distribution of prediction errors over time, which leads to a further decrease of the bit average.

7.1. Short-Term Feature of the Probability Distribution

Analysing the closest neighbourhood, composed of the samples x(n − i) and the prediction error signal e(n − i), makes it possible to exploit the short-term dependencies between consecutively coded data. On the basis of these features, the type of distribution of the currently coded modified prediction error $\bar{e}(n)$ can be determined quite accurately, where $\bar{e}(n) = 2e(n) - 1$ for e(n) > 0 and $\bar{e}(n) = -2e(n)$ otherwise.
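The mapping of signed prediction errors to non-negative modified errors, together with its inverse as needed by a decoder, can be sketched as follows. The function names are hypothetical; the mapping itself follows the definition of $\bar{e}(n)$ given above.

```python
def map_error(e):
    """Map a signed prediction error to a non-negative integer."""
    return 2 * e - 1 if e > 0 else -2 * e


def unmap_error(e_bar):
    """Invert map_error: odd values were positive, even values were <= 0."""
    return (e_bar + 1) // 2 if e_bar % 2 else -(e_bar // 2)


print([map_error(e) for e in (0, 1, -1, 2, -2)])   # [0, 1, 2, 3, 4]
```

Interleaving positive and negative errors in this way keeps small magnitudes at small indices, which suits the roughly geometric distribution assumed by the Golomb code.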
Starting from this assumption, a contextual arithmetic coder is usually designed so that it has t probability distributions associated with individual context numbers from 0 to t − 1. Theoretically, compression efficiency is expected to increase with the number of contexts. However, there is the problem of too slow an adaptation of the distributions to the approximate actual state. The adaptive calculation of the probability distributions requires that the approximate target shape of each of the t distributions be established quickly. Therefore, a compromise must be made between the number of contexts and the speed of adaptation of their distributions. In image coding, t is 8 [33], 16 and even 20 [38].
The proposed solution uses nine classes that define coarse signal characteristics from a short-term point of view, using principles similar to those used in [39,40]. The parameter ω is calculated as a weighted average of the z = 17 previous prediction error moduli |e(n − i)|:
$$\omega = \frac{5}{4z} \sum_{i=1}^{z} \frac{\left| e(n-i) \right|}{i}.$$
The ω value is quantised using t − 1 thresholds th(j) to obtain the short-term arithmetic context number. For t = 9, the thresholds were set as th = {4, 10, 30, 50, 80, 180, 500, 1100}, which yields a number $b_{medium}$ in the range ⟨0; 8⟩.
It is also possible to include ultra-short-term features based on only the four closest errors, by calculating the value max{e(n − 1), e(n − 2), e(n − 3), 0.6⋅e(n − 4)}. If this value is greater than 1500, the bit $b_{ultra}$ assumes the value 1, otherwise 0.
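The short-term and ultra-short-term features can be sketched as below. This is only an illustration under stated assumptions: the weighting $\frac{5}{4z}\sum |e(n-i)|/i$ is one reading of the formula for ω, a value equal to a threshold is assigned to the higher class, and the maximum is taken over the error values exactly as printed in the text (without absolute values). The function names are hypothetical.

```python
from bisect import bisect_right

TH = (4, 10, 30, 50, 80, 180, 500, 1100)   # t - 1 = 8 thresholds


def omega(past_errors, z=17):
    """Weighted average of |e(n-i)|; past_errors[0] is e(n-1)."""
    weighted = sum(abs(past_errors[i - 1]) / i for i in range(1, z + 1))
    return 5.0 / (4 * z) * weighted


def b_medium(past_errors):
    """Class number 0..8 obtained by quantising omega."""
    return bisect_right(TH, omega(past_errors))


def b_ultra(past_errors):
    """1 if the weighted maximum of the four closest errors exceeds 1500."""
    e1, e2, e3, e4 = past_errors[:4]
    return 1 if max(e1, e2, e3, 0.6 * e4) > 1500 else 0


print(b_medium([0] * 17), b_ultra([2000, 0, 0, 0]))   # 0 1
```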

7.2. Medium-Term Features of the Probability Distribution

The values of $b_{medium}$ and $b_{ultra}$ presented in Section 7.1 provide information on short-term features of the currently estimated probability distribution. Additionally, the prediction error stream is assumed to have a distribution that is stationary in the medium term and usually similar to the geometric distribution. Data with such features can be coded in a highly efficient manner using the Golomb code family [18].
For each coded value $\bar{e}(n)$, an individual group number m is calculated. For this purpose, the adaptive average cost of coding with the Golomb code is used, and one of 40 appropriately selected probability distributions is applied. The probability distributions are defined by the formula $G(i) = (1 - p) \cdot p^i$, with the individual value p(j) associated with the j-th distribution depending on the parameter m(j) of the Golomb code, which is the j-th number from the experimentally selected set m = {1, 2, 3, 4, 6, 8, 12, 16, 20, 24, 32, 40, 48, 64, 80, 96, 128, 160, 192, 256, 320, 384, 448, 512, 576, 640, 768, 896, 1024, 1152, 1280, 1536, 1792, 2048, 2560, 3072, 4096, 5120, 6144, 8192}. This increased flexibility gives the Golomb codes an advantage over the Rice code, for which m can only be an integer power of 2 (e.g., $2^1 = 2$, $2^2 = 4$, $2^3 = 8$, $2^4 = 16$, …).
To calculate the current predicted value of m, an estimate using backward adaptation was proposed in [17]. It applies a local assumption of stationarity and uses S(n), the average value of the modified prediction errors. Assuming that the expected value of the modified prediction errors is inversely proportional to 1 − p, we obtain p = (S(n) − 1)/S(n), where the expected value S(n) equals
$$S(n) = \frac{1}{N_G} \sum_{i=1}^{N_G} \bar{e}(n-i), \tag{28}$$
where $N_G$ specifies the number of previously coded modified prediction errors. The group number m can be calculated using the formula presented in [41,42] (at δ = 0):
$$m = \left\lceil -\frac{\log_{10}(1+p)}{\log_{10} p} \right\rceil + \delta. \tag{29}$$
An experimentally determined correction value δ = 0.41 [17] was introduced into Formula (29), which resulted in a slight decrease in the bit average (for the whole test base). At δ = 0, there is a linear relationship m = (ln 2) ⋅ S(n) between m and S(n) [43]; after taking δ = 0.41 into account, an approximated form of the formula was determined:
$$m = 0.693147 \cdot S + 0.563636.$$
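The relation between the logarithmic expression and the linear approximation can be explored numerically with the following sketch. It evaluates the ratio $-\log_{10}(1+p)/\log_{10} p$ before any rounding or correction, with p = (S − 1)/S, next to the linear form; the function names are hypothetical, and the sketch makes no claim about how the coefficients of the linear form were derived.

```python
import math


def golomb_m_ratio(s):
    """-log10(1 + p) / log10(p) for p = (S - 1) / S, valid for S > 1."""
    p = (s - 1.0) / s
    return -math.log10(1.0 + p) / math.log10(p)


def golomb_m_linear(s):
    """Linear approximation of m as a function of the average S."""
    return 0.693147 * s + 0.563636


for s in (10.0, 100.0, 1000.0):
    print(s, round(golomb_m_ratio(s), 3), round(golomb_m_linear(s), 3))
```

For larger S both expressions grow with the same slope of about ln 2 and differ by a nearly constant offset, which is what makes a cheap linear form attractive in an adaptive coder.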
After simplifying Formula (28) to an iterative form and using the auxiliary sum $S_\gamma$, which is adapted using the formula $S_\gamma^{(q)}(n) = \bar{e}(n-1) + \gamma_q S_\gamma^{(q)}(n-1)$, we obtain the value $S^{(q)}(n) = (1 - \gamma_q)\,S_\gamma^{(q)}(n)$. The value $\gamma_q$ is an experimentally determined forgetting factor; in [17] it was set at a compromise level of 0.952. A noticeable improvement was obtained by using two forgetting factors ($\gamma_1 = 0.935$ and $\gamma_2 = 0.992$) to calculate two sums, $S^{(1)}(n)$ and $S^{(2)}(n)$. In addition, these sums are weighted: we calculate the current average costs $L_{cost}^{(q)}$ of encoding the input data stream (for q = {1, 2}) using Golomb codes separately for the parameters $m_1$ and $m_2$:
$$L_{cost}^{(q)}(n) = 0.8\,L_{cost}^{(q)}(n-1) + \mathrm{length}^{(q)}(\bar{e}(n)),$$
which allows calculating the weights $\gamma_{cost}^{(q)}(n) = 0.7^{L_{cost}^{(q)}(n)}$ necessary to calculate the final value of m:
$$m = 0.693147 \cdot \frac{\gamma_{cost}^{(1)}(n)\,S^{(1)}(n) + \gamma_{cost}^{(2)}(n)\,S^{(2)}(n)}{\gamma_{cost}^{(1)}(n) + \gamma_{cost}^{(2)}(n)} + 0.563636,$$
which is additionally scaled to best approximate one of the 40 values found in the vector m. The scaling works as follows: for m < 1, we assign m = 1; values between 1 and 6 remain unchanged; and for m > 6, we calculate $m := 1.65\,m$. The last step is the approximation of the calculated m value to the nearest of those in the vector m. In this way, the index of the quantised value of m (hereinafter referred to as $b_{Golomb}$) is obtained. The $b_{Golomb}$ value is a fragment of the context number and remains unchanged while all bits of the next Golomb code word, representing the number $\bar{e}(n+1)$, are encoded.
This allows for a highly flexible adaptation to the local features of the probability distribution of the currently coded prediction errors as well as to the rate of change of these features. At the same time, quantising the parameter m to only 40 values is a compromise that keeps the adaptation of each of the 40 probability distributions used in the binary arithmetic coder uninterrupted; this coder encodes the bit stream coming from the Golomb coder.
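The backward adaptation with two forgetting factors can be sketched as follows. The class and function names are hypothetical, the code lengths of the two candidate Golomb codes are supplied by the caller, and the intermediate rescaling for m > 6 is omitted, so the returned index is only indicative. The weights are computed relative to the smaller cost, which leaves their ratio unchanged and merely avoids numerical underflow.

```python
M_SET = (1, 2, 3, 4, 6, 8, 12, 16, 20, 24, 32, 40, 48, 64, 80, 96, 128, 160,
         192, 256, 320, 384, 448, 512, 576, 640, 768, 896, 1024, 1152, 1280,
         1536, 1792, 2048, 2560, 3072, 4096, 5120, 6144, 8192)
GAMMAS = (0.935, 0.992)                 # forgetting factors gamma_1, gamma_2


class GolombParameterEstimator:
    def __init__(self):
        self.s_gamma = [0.0, 0.0]       # auxiliary sums S_gamma^(q)
        self.l_cost = [0.0, 0.0]        # running coding costs L_cost^(q)

    def update(self, e_bar, code_lengths):
        """Fold in the value just coded and its length under m_1 and m_2."""
        for q in range(2):
            self.s_gamma[q] = e_bar + GAMMAS[q] * self.s_gamma[q]
            self.l_cost[q] = 0.8 * self.l_cost[q] + code_lengths[q]

    def estimate_m(self):
        """Cost-weighted estimate of m, clamped to at least 1."""
        s = [(1.0 - GAMMAS[q]) * self.s_gamma[q] for q in range(2)]
        base = min(self.l_cost)         # common factor cancels in the ratio
        w = [0.7 ** (cost - base) for cost in self.l_cost]
        mixed = (w[0] * s[0] + w[1] * s[1]) / (w[0] + w[1])
        return max(1.0, 0.693147 * mixed + 0.563636)


def nearest_m_index(m):
    """Index of the value in M_SET closest to m."""
    return min(range(len(M_SET)), key=lambda j: abs(M_SET[j] - m))


estimator = GolombParameterEstimator()
for _ in range(5000):
    estimator.update(100, (10, 10))     # constant hypothetical input
print(round(estimator.estimate_m(), 3), nearest_m_index(estimator.estimate_m()))
```

With a constant input both smoothed averages converge to that input, so the estimate approaches the value of the linear formula for S = 100.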
The Golomb code word for the value $\bar{e}(n)$ consists of two elements. The first is the number $u_G = \lfloor \bar{e}(n)/m \rfloor$, specifying the group number, which is written in unary form (a sequence of $u_G$ zeros terminated by a one). The second is the number $v_G = \bar{e}(n) - u_G \cdot m$, called the number of the element in the group (the remainder of division by m). It is coded using a phased-in binary code (a variant of the Huffman code for sources with m equally probable symbols [42]). With the parameter $k = \lceil \log_2 m \rceil$, in each group the first $l = 2^k - m$ elements $v_G$ are coded using k − 1 bits, and the remaining m − l elements are coded as the number $v_G + l$ using k bits [13].
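A minimal sketch of the Golomb code word construction with a phased-in binary remainder is shown below, together with a decoder used only to check the round trip. The function names and the use of bit strings are choices made for readability; a real coder would pass the individual bits to the arithmetic coders instead of building strings.

```python
def golomb_encode(e_bar, m):
    """Return (unary_bits, remainder_bits) as strings of '0' and '1'."""
    u = e_bar // m                      # group number u_G
    v = e_bar - u * m                   # element number v_G within the group
    unary = "0" * u + "1"
    if m == 1:
        return unary, ""                # a single element needs no remainder
    k = (m - 1).bit_length()            # k = ceil(log2(m)) for m >= 2
    l = (1 << k) - m                    # number of short (k - 1 bit) codes
    if v < l:
        return unary, format(v, "0{}b".format(k - 1))
    return unary, format(v + l, "0{}b".format(k))


def golomb_decode(unary, remainder, m):
    """Invert golomb_encode for a single code word."""
    u = len(unary) - 1
    if m == 1:
        return u
    k = (m - 1).bit_length()
    l = (1 << k) - m
    v = int(remainder, 2)
    if len(remainder) == k:             # long code words store v + l
        v -= l
    return u * m + v


print(golomb_encode(13, 6))             # ('001', '01')
print(golomb_encode(16, 6))             # ('001', '110')
```

For m = 6 the parameters are k = 3 and l = 2, so the remainders 0 and 1 take two bits and the remainders 2 to 5 take three bits.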

7.3. The Way of Determining the Number of Contexts

The binary streams representing $u_G$ and $v_G$ are coded using separate arithmetic coders, each with its own way of determining the context number. The context number $ctx_u$, used to encode the $u_G$ series of zeros terminated by a one, is calculated for each consecutively coded bit as follows:
$$ctx_u = 2^3 \cdot (18 \cdot b_{Golomb} + 2 \cdot b_{medium} + b_{ultra}) + b_{unary}.$$
The $b_{unary}$ value is a three-bit number (in the range from 0 to 7) specifying the position (counting from the most to the least significant) of the currently coded unary bit of the number $u_G$. When the bit position is greater than 7 (which happens in less than 1% of cases), $b_{unary} = 7$. We therefore obtain $2^3 \cdot 18 \cdot 40 = 5760$ contexts $ctx_u$.
The second coder, used to encode the number $v_G$, has slightly fewer contexts, namely $2^4 \cdot 5 \cdot 40 = 3200$. The number $ctx_v$ is calculated from the dependency
$$ctx_v = 2^4 \cdot (5 \cdot b_{Golomb} + b_{phased\text{-}in}) + 2^3 \cdot b_{binary2} + 2^2 \cdot b_{binary1} + b_{unary2},$$
where $b_{unary2} = \min\{b_{unary}, 3\}$. The value $b_{binary1}$ is the oldest (first in coding order) bit of the value $v_G$, and $b_{binary2}$ is the second oldest (second coded) bit of $v_G$. The value $b_{phased\text{-}in}$ is a number (in the range from 0 to 4) specifying the position (counting from the more-significant to the less-significant bits) of the currently coded bit of $v_G$; if the bit position is greater than 4, then $b_{phased\text{-}in} = 4$. Furthermore, when $b_{phased\text{-}in} = 0$, then $b_{binary1} = b_{binary2} = 0$ is set, and if $b_{phased\text{-}in} = 1$, then $b_{binary2} = 0$.
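The two context numbers can be computed as in the sketch below. The function names are hypothetical, and counting bit positions from zero is an assumption consistent with the stated ranges of $b_{unary}$ and $b_{phased\text{-}in}$.

```python
def ctx_u(b_golomb, b_medium, b_ultra, unary_bit_index):
    """Context number for a bit of the unary part u_G (0..5759)."""
    b_unary = min(unary_bit_index, 7)
    return 8 * (18 * b_golomb + 2 * b_medium + b_ultra) + b_unary


def ctx_v(b_golomb, bit_index, b_binary1, b_binary2, b_unary):
    """Context number for a bit of the phased-in part v_G (0..3199)."""
    b_phased_in = min(bit_index, 4)
    if b_phased_in == 0:                # no bit of v_G has been coded yet
        b_binary1 = b_binary2 = 0
    elif b_phased_in == 1:              # only the first bit is known
        b_binary2 = 0
    b_unary2 = min(b_unary, 3)
    return (16 * (5 * b_golomb + b_phased_in)
            + 8 * b_binary2 + 4 * b_binary1 + b_unary2)


print(ctx_u(39, 8, 1, 20), ctx_v(39, 9, 1, 1, 7))   # 5759 3199
```

The largest values returned, 5759 and 3199, match the stated totals of 5760 and 3200 contexts.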

7.4. Long-Term Adaptation

In a context adaptive binary arithmetic coder, each context number is associated with a probability distribution consisting of only two values, p(0) and p(1). In practice, the numbers of occurrences of zeros and ones, denoted n(0) and n(1), respectively, are counted; then p(0) = n(0)/(n(0) + n(1)) and p(1) = 1 − p(0). The ongoing adaptation of the distribution consists in increasing by 1 (in a given context) the counter of occurrences of the currently encoded bit. Additionally, if the total number of occurrences of zeros and ones in a given context exceeds the value $N_{max}$, the counters n(0) and n(1) are divided by 2, which is equivalent to a long-term forgetting method. For this reason, the designer of such a binary coder should choose both the $N_{max}$ value and the initial values of n(0) and n(1) within each context so as to obtain the best fit to the encoded data.
In the proposed solution, $N_{max} = 2^9$ was set in all contexts of the $u_G$ coder, and the counters n(0) and n(1) were initialised with the value 1. In the case of the $v_G$ coder, $N_{max} = 2^{12}$ was set in all contexts, and the counters were initialised as n(0) = 64 and n(1) = 60.
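The counter-based adaptation with halving can be sketched as follows. The class name `AdaptiveBitModel` is hypothetical, and keeping each counter at 1 or more after halving is an assumption made here so that no probability collapses to zero; the text only states that the counters are divided by 2.

```python
class AdaptiveBitModel:
    """Occurrence counters of one context with long-term forgetting."""

    def __init__(self, n0=1, n1=1, n_max=2 ** 9):
        self.n = [n0, n1]
        self.n_max = n_max

    def p0(self):
        """Current estimate of the probability of a zero bit."""
        return self.n[0] / (self.n[0] + self.n[1])

    def update(self, bit):
        """Count the coded bit and halve both counters above n_max."""
        self.n[bit] += 1
        if self.n[0] + self.n[1] > self.n_max:
            self.n[0] = max(1, self.n[0] // 2)
            self.n[1] = max(1, self.n[1] // 2)


u_model = AdaptiveBitModel()                        # settings of the u_G coder
v_model = AdaptiveBitModel(64, 60, n_max=2 ** 12)   # settings of the v_G coder
u_model.update(0)
print(u_model.n, v_model.n)                         # [2, 1] [64, 60]
```

A smaller $N_{max}$ makes the model forget faster, which suits the unary bits whose statistics change quickly, whereas the larger limit for $v_G$ gives a smoother estimate.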

7.5. Analysis of Practical Aspects of Prediction Error Encoder Implementation

Table 3 presents a comparison of the results for several different encoder settings, using the example of a database containing 16 test files (the average for the whole database is shown in bold in the last row of Table 3). The second column contains the values of the unconditional entropy $H(S_{DMS})$, here referred to as zero order entropy. The third column contains the values of the first-order entropy $H(S|C^{(1)})$ for prediction errors (this is the conditional entropy in which, according to the first-order Markov model, the context is the predicted value calculated using cascade prediction, see Formula (23)). There is a significant decrease (in the bit average for the whole database of 16 files) from 13.813 to 9.172 bits per sample.
The fourth column gives the actual bit average after using our proposed adaptive Golomb code described in Section 7.2. The bit average obtained is lower than $H(S|C^{(1)})$ by 0.407 bits per sample, which shows that predictive modelling is not able to fully remove the mutual information.
On the other hand, introducing a context-free arithmetic encoder (with default long-term adaptation active, see Section 7.4), which additionally encodes the stream of bits coming out of the Golomb encoder, allowed a further decrease of the bit average by another 0.14 bits per sample.
After introducing context rules (medium-term adaptation) into the CABAC encoder, the bit average was reduced by 0.082 bits per sample, and after including additional contexts in CABAC (short-term adaptation, see Section 7.1), the result was improved by a further 0.004 bits per sample to the target level of 8.539 bits per sample.

8. Schematic Diagram of the Proposed Cascading Audio Data Encoder Method

Figure 3 presents the diagram of the target encoder proposed in this work. Although the cascading concept has been used previously [24,29] and was considered the most effective way of lossless audio compression, our approach demonstrates that there is scope for further significant improvement within this concept.
In general, we can distinguish two main stages. In the first stage, the prediction value $\hat{x}(n)$ and the prediction error e(n) are calculated (see Formula (23)) by the blocks enclosed in the dashed border: OLS+, three stages of LMS and CDCCR. The second stage codes the prediction error into a binary stream (implemented by the Golomb code and CABAC blocks).
Introducing our proposed OLS+ method into the initial prediction stage makes it possible to remove the interchannel dependencies with the highest correlation with the currently coded sample x(n). Formula (2) is used here, together with the method of adaptation of the prediction coefficients described in Section 4.2. The resulting prediction error $e_1(n)$ and the previous errors $e_1(n-i)$ contained in the buffer become the input data stream for the next block, which is the first stage of LMS. In LMS1, the dependencies between more distant samples are removed by linear prediction of the order $r_1 = 1000$.
Because an optimised version of adaptive linear prediction (NLMS, see Section 3) with a relatively slowly converging coefficient adaptation procedure (Formula (5)) is used, it cannot fully remove these dependencies. For this reason, the method is applied in a cascade manner in two further steps of prediction improvement (using the descending orders $r_2 = 380$ and $r_3 = 10$), minimising the mean prediction error. These three LMS blocks use Formula (20) to calculate the consecutive predicted values $y_2(n)$, $y_3(n)$ and $y_4(n)$, and the corresponding prediction errors $e_2(n)$, $e_3(n)$ and $e_4(n)$ are obtained from Formula (22).
The last stage of the prediction part is an attempt to remove the context-dependent constant component using our proposed CDCCR method, which has not been widely used in lossless audio compression before. After subtracting from $e_4(n)$ the constant component $C_{mix}$ calculated from Formula (24), we obtain the final form of the prediction error e(n), which is passed to the input of the adaptive Golomb encoder (Section 7.2). This block generates a bit stream, which is then encoded with an arithmetic encoder that exploits the context dependencies between individual bits. In contrast to the frequently used multivalue arithmetic encoders, the binary variant, i.e., the CABAC adaptive binary encoder, is used (see Section 7.3).
The proposed solution is time symmetrical, which means that decoding has the same complexity as encoding. The decoding procedure is very similar to the coding stage. In the first two steps (arithmetic decoder and Golomb decoder), the prediction error e(n) is recovered from the binary stream. This prediction error is added to the predicted value, which is calculated in the same way as in the encoder (see the blocks enclosed in the dashed border in Figure 3), giving the decoded sample x(n). This is a simple transformation of Formula (23) into the following generalised form of the K-step predictive cascade:
$$x(n) = e(n) + \left\lfloor y_1(n) + y_2(n) + \ldots + y_K(n) + C_{mix} + 0.5 \right\rfloor.$$
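The symmetry between encoder and decoder can be illustrated with the following sketch, in which the stage predictions $y_k(n)$ and the constant component $C_{mix}$ are treated as given numbers. Rounding the summed prediction with floor(· + 0.5) is an assumption based on the + 0.5 term in the formula above, and the function names are hypothetical.

```python
import math


def predicted_value(stage_predictions, c_mix):
    """Round the sum of all cascade stage outputs and C_mix to an integer."""
    return math.floor(sum(stage_predictions) + c_mix + 0.5)


def encode_sample(x, stage_predictions, c_mix):
    """Prediction error e(n) that is passed on to the entropy coder."""
    return x - predicted_value(stage_predictions, c_mix)


def decode_sample(e, stage_predictions, c_mix):
    """Reconstruct x(n) from e(n) using the same predictions as the encoder."""
    return e + predicted_value(stage_predictions, c_mix)


y = [1200.4, -35.2, 3.1, 0.25]          # hypothetical stage outputs y_1..y_4
e = encode_sample(1170, y, c_mix=-0.4)
print(e, decode_sample(e, y, c_mix=-0.4))
```

Lossless reconstruction relies on the decoder reproducing exactly the same stage predictions, which is why all adaptation in the cascade is backward, i.e., based only on already decoded data.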
Our implementation of the proposed solution was written in the C language, without any code optimisation or attempts at parallelisation. However, even for stereo audio at 44,100 samples per second, it still works online, and the encoding and decoding times depend linearly on the number of samples. The encoding program loads one core of a 3.4 GHz i5 processor to 69.07% (version with ALCM+, see Figure 2) and to 93.36% (version with the OLS+ block, see Figure 3). In the latter case, dividing the total coding time between the blocks gives the following proportions: 32.5% for the OLS+ block with the prediction order r = 20, 53.7% for the three LMS blocks with the total prediction order r = 1390 and 7.5% for the CDCCR block. The remaining 6.3% is spent in the Golomb code and CABAC blocks and on input/output operations.
It is worth noting that the faster version of the codec proposed here (see Figure 2), in which our proposed ALCM+ method with quick adaptation of prediction coefficients is used instead of the OLS+ block, obtained a bit average of 8.667 bits per sample (average for the whole test base). This value is lower than that of the MP4-ALS-RM23 model in the best mode (8.718 bits per sample, see Table 4), which uses an RLS block of higher implementation complexity than ALCM+. For more comparisons with other publicly available codecs, see Section 9.

9. Experimental Research

Sixteen fragments of recordings (various genres of music as well as men’s and women’s speech recordings) were used for the efficiency analysis. The recordings are several seconds long, and all are available in the database [44]. Compared to the general-purpose coder RAR v5.0, our solution achieves a result that is better by as much as 21.08%, because the bit average obtained by compressing the same test base with RAR equals 10.82 bits per sample. The results of encoding these recordings by the proposed method are presented in Table 4 and Table 5, where the results of other effective available audio coders are also included (the best results for the individual files in Table 4 are underlined, and the bit average for the whole database is shown in bold in the last row of Table 4 and Table 5). The bit average obtained with the multistage OLS+/NLMS method proposed in this work was the lowest of all coders listed in the tables. The proposed method had an 8.7% lower bit average than the MP4-ALS-RM23 method in default mode, and a 2.1% lower bit average than MP4-ALS-RM23 in the best mode.

10. Conclusions

This work presents an extended version of the cascading audio data encoder. In classic solutions of lossless audio coding, adaptive versions of Rice and arithmetic encoders are used interchangeably. The proposed solution uses an adaptive Golomb code, which is a generalised form of the Rice code with potentially higher compression efficiency because the code adapts better to the current probability distribution of the data being encoded. The Golomb codec output is additionally compressed using a context-dependent adaptive binary arithmetic encoder. In contrast to CABAC [45], where the Exp-Golomb distribution is used, our proposition adapts better to the sequence of prediction errors characterised by the geometric distribution.
The adaptive arithmetic coder proposed in this paper has a lower computational complexity than the coder used in [16]. A higher degree of compression was obtained thanks to a better algorithm for the selection of context numbers, and also thanks to replacing the multivalued arithmetic coder with a binary variant (which makes the adaptation to the probability distributions of the individual bits of the values $u_G$ and $v_G$, calculated in the Golomb encoder block, more dynamic). This solution is also more flexible than classic solutions using the adaptive Rice code.
In the proposed solution, the increase in compression efficiency compared to the most efficient option of MP4-ALS-RM23 (working in backward adaptation mode) was possible thanks to introducing a more efficient OLS+ block in place of RLS, adding a CDCCR block to the cascaded predictive model and introducing an efficient CABAC based on initial compression using the adaptive Golomb code. The introduced improvements entail a disproportionately large increase in the implementation complexity relative to the reference version, for which MP4-ALS-RM23 in the best mode was adopted. Despite this increase, the implementation complexity has been kept at a level enabling real-time encoding and decoding. Similar conclusions can also be drawn by comparing the reference version (the best mode) with MP4-ALS-RM23 in default mode. Therefore, one should be aware that each subsequent percent by which the output files are shortened is paid for with a growing increase in implementation complexity. This is similar to the nonlinear increase in the energy necessary to accelerate objects approaching the speed of light in vacuum. The complexity of an efficient cascade prediction system can also be reduced by means other than modifying the orders of the predictive models in the NLMS blocks: we propose using the ALCM+ block, whose implementation complexity is much lower than that of the RLS and OLS+ blocks.

Author Contributions

Conceptualisation, G.U. and C.W.; methodology, G.U.; software, G.U., C.W.; validation, G.U., C.W.; formal analysis, G.U.; investigation, G.U., C.W.; resources, G.U., C.W.; data curation, C.W.; writing—original draft preparation, G.U., C.W.; writing—review and editing, C.W.; visualization, C.W.; supervision, G.U.; project administration, G.U.

Funding

This research received no external funding.

Acknowledgments

We would like to thank all open-access publishers and free access databases for allowing us to use their collections. We would also like to thank our university for its support.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. The Moving Picture Experts Group Homepage. Available online: https://mpeg.chiariglione.org/ (accessed on 28 April 2019).
  2. Liebchen, T.; Reznik, Y.A. Improved Forward-Adaptive Prediction for MPEG-4 Audio Lossless Coding. In Proceedings of the 118th AES Convention, Barcelona, Spain, 28–31 May 2005; pp. 1–10.
  3. Andriani, S.; Calvagno, G.; Erseghe, T.; Mian, G.A.; Durigon, M.; Rinaldo, R.; Knee, M.; Walland, P.; Koppetz, M. Comparison of lossy to lossless compression techniques for digital cinema. In Proceedings of the International Conference on Image Processing ICIP’04, Singapore, 24–27 October 2004; Volume 1, pp. 513–516.
  4. Monkey’s Audio Codec Homepage. Available online: http://www.monkeysaudio.com/ (accessed on 28 April 2019).
  5. Free Lossless Audio Codec Homepage. Available online: https://www.xiph.org/flac (accessed on 28 April 2019).
  6. Robinson, T. SHORTEN: Simple Lossless and Near-Lossless Waveform Compression; Technical report 156; Cambridge University Engineering Department: Cambridge, UK, 1994; pp. 1–17.
  7. Hybrid Lossless Audio Compression Homepage. Available online: http://wavpack.com/ (accessed on 12 October 2019).
  8. NanoZip Experimental File Archiver Software Homepage. Available online: http://nanozip.ijat.my/ (accessed on 1 October 2019).
  9. TTA Lossless Audio Codec Overview Homepage. Available online: http://tausoft.org/wiki/True_Audio_Codec_Overview (accessed on 12 October 2019).
  10. Tom’s lossless Audio Kompressor (TAK) Codec Homepage. Available online: http://thbeck.de/Tak/Tak.html (accessed on 12 October 2019).
  11. Lossless Audio (LA) Compression Program Homepage. Available online: http://www.lossless-audio.com/ (accessed on 14 October 2019).
  12. Coding of Audio-Visual Objects ISO Standard Homepage. Available online: https://www.iso.org/standard/53943.html (accessed on 28 April 2019).
  13. Sayood, K. Introduction to Data Compression, 5th ed.; Morgan Kaufmann Publishers/Elsevier: Cambridge, MA, USA, 2018.
  14. Wernik, C.; Ulacha, G. Analysis of inter-channel dependencies in audio lossless block coding. In Proceedings of the Federated Conference on Computer Science and Information Systems (FedCSIS), Poznań, Poland, 9 September 2018; pp. 135–139.
  15. Ulacha, G.; Stasiński, R. Lossless Audio Coding by Predictor Blending. In Proceedings of the 36th International Conference on Telecommunications and Signal Processing (TSP), Rome, Italy, 2–4 July 2013; pp. 502–506.
  16. Ulacha, G.; Stasiński, R. Entropy coder for audio signals. Int. J. Electron. Telecommun. 2015, 61, 219–224.
  17. Wernik, C.; Ulacha, G. Application of adaptive Golomb codes for lossless audio compression. In Proceedings of the 23rd Conference Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznań, Poland, 19–20 September 2018; pp. 203–207.
  18. Golomb, S.W. Run-length encoding. IEEE Trans. Inf. Theory 1966, 12, 399–401.
  19. Moriya, T.; Yang, D.; Liebchen, T. Extended Linear Prediction Tools for Lossless Audio Coding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’04), Montreal, QC, Canada, 17–21 May 2004; Volume 3, pp. 1008–1011.
  20. Ghido, F.; Tabus, I. Sparse Modeling for Lossless Audio Compression. IEEE Trans. Audio Speech Lang. Process. 2013, 21, 14–28.
  21. Biswas, A.; den Brinker, B. Lossless compression of digital audio using Laguerre-based pure linear prediction. In Proceedings of the SPS 2004 (4th IEEE Benelux Signal Processing Symposium), Hilvarenbeek, The Netherlands, 15–16 April 2004; pp. 49–52.
  22. Wernik, C.; Ulacha, G. Audio lossless encoding with adaptive Context-Dependent Constant Component Removing. In Proceedings of the 27th International Conference on Software, Telecommunications and Computer Networks (SoftCOM), Split, Croatia, 19–21 September 2019.
  23. Vaseghi, S.V. Advanced Digital Signal Processing and Noise Reduction, 2nd ed.; John Wiley & Sons Ltd.: Hoboken, NJ, USA, 2000; pp. 228–262. ISBN 0-471-62692-9.
  24. Huang, H.; Fränti, P.; Huang, D.; Rahardja, S. Cascaded RLS-LMS prediction in MPEG-4 lossless audio coding. IEEE Trans. Audio Speech Lang. Process. 2008, 16, 554–562.
  25. Ravelli, E.; Gournay, P.; Lefebvre, R. A Two-Stage MLP+NLMS Lossless coder for stereo audio. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’06), Toulouse, France, 14–19 May 2006; Volume 5, pp. 177–180.
  26. Makino, S.; Kaneda, Y.; Koizumi, N. Exponentially weighted stepsize NLMS adaptive filter based on the statistics of a room impulse response. IEEE Trans. Speech Audio Process. 1993, 1, 101–108.
  27. Sayood, K. Lossless Compression Handbook; Academic Press: San Diego, CA, USA, 2003.
  28. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67.
  29. Huang, H.; Rahardja, S.; Lin, X.; Yu, R.; Fränti, P. Cascaded RLS-LMS prediction in MPEG-4 lossless audio coding. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), Toulouse, France, 14–19 May 2006; Volume 5, pp. 181–184.
  30. Ulacha, G.; Stasiński, R. Performance Optimized Predictor Blending Technique for Lossless Image Coding. In Proceedings of the 36th International Conference on Acoustics, Speech and Signal Processing ICASSP’11, Prague, Czech Republic, 22–27 June 2011; pp. 1541–1544.
  31. Ye, H.; Deng, G.; Devlin, J.C. Adaptive linear prediction for lossless coding of greyscale images. In Proceedings of the IEEE International Conference on Image Processing (CDROM), Vancouver, BC, Canada, 10–13 September 2000.
  32. Ulacha, G.; Stasiński, R. Context based lossless coder based on RLS predictor adaptation scheme. In Proceedings of the International Conference on Image Processing ICIP 2009, Cairo, Egypt, 7–10 November 2009; pp. 1917–1920.
  33. Wu, X.; Memon, N.D. CALIC-A Context Based Adaptive Lossless Image Coding Scheme. IEEE Trans. Commun. 1996, 45, 437–444. [Google Scholar] [CrossRef]
  34. Weinberger, M.J.; Seroussi, G.; Sapiro, G. LOCO-I Lossless Image Compression Algorithm: Principles and Standardization into JPEG-LS. IEEE Trans. Image Process. 2000, 9, 1309–1324. [Google Scholar] [CrossRef] [PubMed]
  35. Topal, C.; Gerek, Ö.N. Pdf sharpening for multichannel predictive coders. In Proceedings of the 14th European Signal Processing Conference (EUSIPCO-06), Florence, Italy, 4–8 September 2006. [Google Scholar]
  36. Rice, R.F. Some Practical Universal Noiseless Coding Techniques; JPL Publication 79-22; Jet Propulsion Labolatory: Pasadena, CA, USA, March 1979. [Google Scholar]
  37. Reznik, Y.A. Coding of prediction residual in MPEG-4 standard for lossless audio coding (MPEG-4 ALS). In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’04), Montreal, QC, Canada, 17–21 May 2004; Volume 3, pp. 1024–1027. [Google Scholar]
  38. Deng, G.; Ye, H. Lossless image compression using adaptive predictor combination, symbol mapping and context filtering. In Proceedings of the IEEE 1999 International Conference on Image Processing, Kobe, Japan, 24–28 October 1999; Volume 4, pp. 63–67. [Google Scholar]
  39. Aiazzi, B.; Alparone, L.; Baronti, S. Context modeling for nearlossless image coding. IEEE Signal Process. Lett. 2002, 9, 77–80. [Google Scholar] [CrossRef]
  40. Matsuda, I.; Ozaki, N.; Umezu, Y.; Itoh, S. Lossless coding using Variable Blok-Size adaptive prediction optimized for each image. In Proceedings of the 13th European Signal Processing Conference EUSIPCO-05 CD, Antalya, Turkey, 4–8 September 2005. [Google Scholar]
  41. Bhaskaran, V.; Konstantinides, K. Image and Video Compression Standards: Algorithms and Architectures, 2nd ed.; Kluwer Academic Publishers: Palo Alto, CA, USA, 1997. [Google Scholar]
  42. Salomon, D. Data Compression: The Complete Reference, 3rd ed.; Springer-Verlag as part of Springer Science & Business Media: New York, NY, USA, 2004. [Google Scholar]
  43. Sugiura, R.; Kamamoto, Y.; Harada, N.; Moriya, T. Optimal Golomb-Rice Code Extension for Lossless Coding of Low-Entropy Exponentially Distributed Sources. IEEE Trans. Inf. Theory 2018, 64, 3153–3161. [Google Scholar] [CrossRef]
  44. Test Database. Available online: http://www.rarewares.org/test_samples/ (accessed on 28 April 2019).
  45. Thiele, C.C.; Vizzotto, B.B.; Martins, A.L.M.; da Rosa, V.S.; Bampi, S. A low-cost and high efficiency entropy encoder architecture for H.264/AVC. In Proceedings of the 2012 IEEE/IFIP 20th International Conference on VLSI and System-on-Chip (VLSI-SoC), Santa Cruz, CA, USA, 7–10 October 2012; pp. 117–122. [Google Scholar]
Figure 1. Cascaded RLS-LMS prediction in MPEG-4 lossless audio coding [24,29].
Figure 2. Schema of our approach for lossless audio coding.
Figure 3. Schema of our approach for lossless audio coding.
Table 1. Comparison of first-order entropy for DPCM + ALCM+, DPCM + RLS and OLS+, without and with three Normalised Least Mean Square (NLMS) blocks connected in cascade.
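| File | ALCM+ (without three NLMS blocks) | RLS (without three NLMS blocks) | OLS+ (without three NLMS blocks) | ALCM+ (with three NLMS blocks in cascade) | RLS (with three NLMS blocks in cascade) | OLS+ (with three NLMS blocks in cascade) |
|---|---|---|---|---|---|---|
| ATrain | 8.3507 | 8.1934 | 8.2817 | 7.7531 | 7.7088 | 7.6675 |
| BeautySlept | 9.8388 | 9.7497 | 9.8416 | 8.8806 | 8.8194 | 8.7118 |
| chanchan | 10.3974 | 10.3807 | 10.3537 | 10.2931 | 10.2278 | 10.1932 |
| death2 | 6.4038 | 7.0594 | 6.5856 | 6.6436 | 6.9072 | 6.3777 |
| experiencia | 11.5514 | 11.5710 | 11.5828 | 11.3683 | 11.3198 | 11.2973 |
| female_spech | 5.8583 | 5.6922 | 5.7743 | 5.7413 | 5.5754 | 5.5857 |
| FloorEssence | 10.4917 | 10.4153 | 10.3952 | 10.1471 | 10.0712 | 10.0134 |
| ItCouldBeSweet | 9.6181 | 9.3986 | 9.4630 | 9.3595 | 9.2149 | 9.2689 |
| Layla | 10.6077 | 10.5589 | 10.6175 | 10.2457 | 10.1759 | 10.1559 |
| LifeShatters | 11.1976 | 11.1492 | 11.1698 | 11.0151 | 10.9659 | 10.9481 |
| macabre | 10.1090 | 10.1758 | 10.1952 | 9.5819 | 9.5489 | 9.5070 |
| MaleSpeech | 5.8177 | 5.7310 | 5.7632 | 5.7054 | 5.6156 | 5.5813 |
| SinceAlways | 11.5095 | 11.3424 | 11.3904 | 11.3111 | 11.2132 | 11.2034 |
| thear1 | 12.0500 | 11.9855 | 12.0026 | 11.9159 | 11.8660 | 11.8490 |
| TomsDiner | 8.2075 | 7.9984 | 8.0337 | 7.9481 | 7.7740 | 7.7600 |
| velvet | 11.4401 | 11.2113 | 11.1379 | 11.0710 | 10.8298 | 10.7851 |
| Average 1st order entropy | 9.5906 | 9.5383 | 9.5368 | 9.3113 | 9.2396 | 9.1816 |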
Table 2. A set of decision rules for determining the number of contexts.
| Bits | Condition |
|---|---|
| $\alpha_i$ | $2 \cdot x(n-i) - x(n-i-1) > \hat{x}(n)$, for $i = \{1, 2, 3\}$ |
| $\alpha_{i+3}$ | $x(n-i) > \hat{x}(n)$, for $i = \{1, 2, 3\}$ |
| $\alpha_{i+6}$ | $e(n-i) > 0$, for $i = \{1, 2\}$ |
| $\alpha_{i+8}$ | $e(n-i) > e(n-i-1)$, for $i = \{1, 2, 3\}$ |
| $\alpha_{i+11}$ | $\lvert x(n-i) - \hat{x}(n) \rvert > Th(i)$, for $i = \{1, 2, 3, 4\}$, $Th(i) = \{250, 100, 1500, 1500\}$ |
| $\alpha_{i+15}$ | $\lvert e(n-i) \rvert > Th(i)$, for $i = \{1, 2\}$, $Th(i) = \{50, 150\}$ |
| $\alpha_{i+17}$ | $\lvert e(n-i) \rvert > Th(i)$, for $i = \{1, 2\}$, $Th(i) = \{750, 900\}$ |
| $\alpha_{i+19}$ | $x(n-i) > x(n-i-1)$, for $i = \{1, 2\}$ |
| $\alpha_{22}$ | $1.75 \cdot x(n-1) - 0.75 \cdot x(n-2) > \hat{x}(n)$ |
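The decision rules of Table 2 can be illustrated with a short Python sketch. It is only a reading of the table under stated assumptions, not the authors' implementation: the function names `context_bits` and `context_index` are hypothetical, the samples `x`, the prediction errors `e` and the prediction `x_hat` are taken to be plain numbers, each bit is taken to be 1 when its condition holds, the fifth rule is read as the absolute difference |x(n − i) − x̂(n)|, and packing the 22 bits into one integer with α1 as the least significant bit is an arbitrary choice made here.

```python
def context_bits(x, e, x_hat, n):
    """Evaluate the 22 binary conditions of Table 2 for sample index n.

    x     -- sequence of past signal samples, x[n - i] is x(n - i)
    e     -- sequence of past prediction errors, e[n - i] is e(n - i)
    x_hat -- prediction of the current sample, i.e. x^(n)
    Requires n >= 4 so that x(n - 4) and e(n - 4) exist.
    """
    bits = []
    # alpha_1..alpha_3: linear extrapolation of two neighbours vs. the prediction
    bits += [2 * x[n - i] - x[n - i - 1] > x_hat for i in (1, 2, 3)]
    # alpha_4..alpha_6: past sample vs. the prediction
    bits += [x[n - i] > x_hat for i in (1, 2, 3)]
    # alpha_7..alpha_8: sign of the past prediction errors
    bits += [e[n - i] > 0 for i in (1, 2)]
    # alpha_9..alpha_11: growth of consecutive prediction errors
    bits += [e[n - i] > e[n - i - 1] for i in (1, 2, 3)]
    # alpha_12..alpha_15: |x(n - i) - x^(n)| against thresholds (assumed reading)
    bits += [abs(x[n - i] - x_hat) > th
             for i, th in zip((1, 2, 3, 4), (250, 100, 1500, 1500))]
    # alpha_16..alpha_17: |e(n - i)| against the lower thresholds
    bits += [abs(e[n - i]) > th for i, th in zip((1, 2), (50, 150))]
    # alpha_18..alpha_19: |e(n - i)| against the higher thresholds
    bits += [abs(e[n - i]) > th for i, th in zip((1, 2), (750, 900))]
    # alpha_20..alpha_21: growth of consecutive samples
    bits += [x[n - i] > x[n - i - 1] for i in (1, 2)]
    # alpha_22: weighted extrapolation vs. the prediction
    bits.append(1.75 * x[n - 1] - 0.75 * x[n - 2] > x_hat)
    return [int(b) for b in bits]


def context_index(bits):
    """Pack the bits into one integer (alpha_1 = least significant bit; assumption)."""
    return sum(b << k for k, b in enumerate(bits))


# Toy history, invented purely for illustration.
x_hist = [0, 100, 200, 300, 400]
e_hist = [0, 10, -20, 60, -800]
demo_bits = context_bits(x_hist, e_hist, x_hat=450, n=5)
demo_index = context_index(demo_bits)
print(demo_bits, demo_index)
```

With 22 binary conditions the packed index ranges over 2^22 values; how the coder actually groups or uses these bits is described in the body of the paper, not in this sketch.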
Table 3. Entropy and bit-average values for several encoder settings.
| File | Zero Order Entropy | First-Order Entropy | Golomb Code ¹ | CABAC ² | CABAC ³ | Our Proposition |
|---|---|---|---|---|---|---|
| ATrain | 11.894 | 7.664 | 7.370 | 7.210 | 7.135 | 7.134 |
| BeautySlept | 11.942 | 8.712 | 8.502 | 8.346 | 8.266 | 8.265 |
| chanchan | 14.214 | 10.181 | 9.959 | 9.795 | 9.728 | 9.711 |
| death2 | 14.559 | 6.323 | 5.840 | 5.779 | 5.652 | 5.642 |
| experiencia | 14.856 | 11.285 | 11.106 | 10.945 | 10.879 | 10.874 |
| female_spech | 12.537 | 5.580 | 4.637 | 4.611 | 4.496 | 4.493 |
| FloorEssence | 14.708 | 10.001 | 9.506 | 9.342 | 9.279 | 9.267 |
| ItCouldBeSweet | 15.125 | 9.262 | 8.544 | 8.383 | 8.308 | 8.307 |
| Layla | 13.753 | 10.153 | 9.811 | 9.649 | 9.574 | 9.573 |
| LifeShatters | 14.830 | 10.946 | 11.030 | 10.871 | 10.786 | 10.786 |
| macabre | 13.852 | 9.506 | 9.314 | 9.159 | 9.082 | 9.082 |
| MaleSpeech | 12.421 | 5.574 | 4.767 | 4.712 | 4.594 | 4.589 |
| SinceAlways | 14.652 | 11.200 | 10.615 | 10.456 | 10.379 | 10.377 |
| thear1 | 15.128 | 11.847 | 11.644 | 11.487 | 11.405 | 11.406 |
| TomsDiner | 12.654 | 7.752 | 7.356 | 7.188 | 7.116 | 7.114 |
| velvet | 13.884 | 10.773 | 10.243 | 10.068 | 10.002 | 9.999 |
| average | 13.813 | 9.172 | 8.765 | 8.625 | 8.543 | 8.539 |

¹ after using Golomb code; ² using long-term adaptation; ³ using long- and medium-term adaptation.
Table 4. The bit average of 16 encoded test files using different audio coders.
| File | Monkey’s Audio ¹ | TAK v2.3.0 | MP4 ² | LA v0.4b | OLS-NLMS [16] | Our Proposition |
|---|---|---|---|---|---|---|
| ATrain | 7.441 | 7.571 | 7.232 | 7.204 | 7.199 | 7.134 |
| BeautySlept | 8.826 | 8.850 | 8.305 | 8.318 | 8.491 | 8.265 |
| chanchan | 9.938 | 9.971 | 9.886 | 9.782 | 9.746 | 9.711 |
| death2 | 5.930 | 5.778 | 6.660 | 5.907 | 5.873 | 5.642 |
| experiencia | 11.029 | 11.153 | 10.992 | 10.908 | 10.911 | 10.874 |
| female_spech | 5.085 | 4.674 | 4.710 | 5.302 | 4.500 | 4.493 |
| FloorEssence | 9.750 | 9.841 | 9.509 | 9.362 | 9.355 | 9.267 |
| ItCouldBeSweet | 8.577 | 8.577 | 8.396 | 8.591 | 8.255 | 8.307 |
| Layla | 9.885 | 9.943 | 9.691 | 9.586 | 9.633 | 9.573 |
| LifeShatters | 10.874 | 10.968 | 10.836 | 10.777 | 10.828 | 10.786 |
| macabre | 9.275 | 9.433 | 9.076 | 9.096 | 9.166 | 9.082 |
| MaleSpeech | 5.221 | 4.781 | 4.812 | 5.233 | 4.629 | 4.589 |
| SinceAlways | 10.539 | 10.650 | 10.473 | 10.404 | 10.394 | 10.377 |
| thear1 | 11.504 | 11.622 | 11.425 | 11.398 | 11.435 | 11.406 |
| TomsDiner | 7.423 | 7.341 | 7.268 | 7.153 | 7.116 | 7.114 |
| velvet | 10.508 | 10.314 | 10.212 | 10.248 | 10.029 | 9.999 |
| Bit average | 8.863 | 8.842 | 8.718 | 8.704 | 8.597 | 8.539 |

¹ Monkey’s Audio version 4.33; ² MP4-ALS-RM23 in the best mode.
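To put the Table 4 figures in perspective, the following Python sketch recomputes the mean bit average for two of its columns and the relative saving between them. The per-file values are copied from the "OLS-NLMS [16]" and "Our Proposition" columns; the helper name `relative_saving` and the use of an unweighted mean over the 16 files are assumptions made for illustration, although the unweighted mean agrees with the "Bit average" row to within rounding.

```python
from statistics import mean

# Per-file bit averages copied from Table 4, in the table's file order.
ols_nlms = [7.199, 8.491, 9.746, 5.873, 10.911, 4.500, 9.355, 8.255,
            9.633, 10.828, 9.166, 4.629, 10.394, 11.435, 7.116, 10.029]
proposed = [7.134, 8.265, 9.711, 5.642, 10.874, 4.493, 9.267, 8.307,
            9.573, 10.786, 9.082, 4.589, 10.377, 11.406, 7.114, 9.999]


def relative_saving(reference, candidate):
    """Percentage by which `candidate` is smaller than `reference`."""
    return 100.0 * (reference - candidate) / reference


avg_ref = mean(ols_nlms)   # unweighted mean over files (assumption)
avg_new = mean(proposed)
saving_pct = relative_saving(avg_ref, avg_new)

# Number of files on which the proposed coder gives the lower bit average.
wins = sum(p < r for p, r in zip(proposed, ols_nlms))

print(f"OLS-NLMS mean: {avg_ref:.3f}, proposed mean: {avg_new:.3f}")
print(f"relative saving: {saving_pct:.2f}%, files improved: {wins}/{len(proposed)}")
```

The same helper can be applied to any other pair of columns in Tables 4 and 5.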
Table 5. The bit average of 16 encoded test files using different audio coders.
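| File | Shorten v3.6.1 | TTA v3.4.1 | MP4 ¹ | FLAC v1.3.2 | WavPack v4.60.1 | Nanozip v0.09a |
|---|---|---|---|---|---|---|
| ATrain | 8.637 | 8.189 | 7.862 | 7.933 | 7.792 | 7.396 |
| BeautySlept | 10.724 | 10.188 | 10.049 | 10.005 | 9.825 | 8.671 |
| chanchan | 10.863 | 10.077 | 10.160 | 10.127 | 10.032 | 9.980 |
| death2 | 7.152 | 6.306 | 6.496 | 6.284 | 6.620 | 5.982 |
| experiencia | 12.290 | 11.334 | 11.377 | 11.371 | 11.252 | 11.099 |
| female_spech | 7.539 | 5.267 | 5.242 | 5.329 | 5.204 | 5.048 |
| FloorEssence | 11.464 | 10.176 | 10.202 | 10.174 | 9.922 | 9.677 |
| ItCouldBeSweet | 11.587 | 9.186 | 9.016 | 9.004 | 8.858 | 8.776 |
| Layla | 10.871 | 10.300 | 10.377 | 10.344 | 10.202 | 9.859 |
| LifeShatters | 12.177 | 11.145 | 11.182 | 11.146 | 11.052 | 10.927 |
| macabre | 10.564 | 10.062 | 9.965 | 9.895 | 9.928 | 9.267 |
| MaleSpeech | 7.532 | 5.530 | 5.590 | 5.649 | 5.333 | 5.024 |
| SinceAlways | 12.192 | 11.243 | 11.164 | 11.211 | 10.749 | 10.564 |
| thear1 | 12.574 | 11.703 | 11.742 | 11.746 | 11.635 | 11.685 |
| TomsDiner | 9.709 | 8.561 | 8.534 | 8.404 | 8.087 | 7.480 |
| velvet | 11.067 | 11.464 | 10.672 | 10.679 | 10.843 | 10.656 |
| Bit average | 10.434 | 9.421 | 9.352 | 9.331 | 9.208 | 8.881 |

¹ MP4-ALS-RM23 in default mode.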