2.2. Proposed Adaptive DCT Algorithm
Let us represent the image as a function $g(x, y)$, where $x$ and $y$ are the coordinates of an image pixel. The image restored after compression with thresholds $p_m$ and $p_s$ can then be denoted as $\hat{g}(x, y; p_m, p_s)$.
Let us introduce an evaluation function $f(p_m, p_s)$. We assume that it has a local minimum at certain optimal threshold values $p_m$, $p_s$. Thus, the problem is formally stated as finding
$$\arg\min_{p_m,\, p_s} f(p_m, p_s)$$
for each image. Then, a data-fitting approximation of arbitrary order is applied to the set of “original level of image detail–optimal threshold value” pairs for each of the two thresholds.
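The text does not specify the form of the data-fitting approximation. As a minimal sketch, assuming an ordinary least-squares polynomial of a chosen degree, fitting optimal threshold values against a detail measure (for example, the image ITDV) could look as follows. `polyfit` and `polyval` are hypothetical helper names, and the sample points are synthetic, not experimental data.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients c0..c_degree (lowest order first).

    Solves the normal equations (V^T V) c = V^T y, where V is the Vandermonde
    matrix of xs, by Gaussian elimination with partial pivoting.
    """
    n = degree + 1
    if len(xs) != len(ys) or len(xs) < n:
        raise ValueError("need at least degree + 1 points of matching length")
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        coeffs[r] = (b[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, n))) / a[r][r]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Synthetic illustration only: points lying exactly on 0.2 + 0.01 * x.
detail = [10, 11, 12, 13, 14, 15]
threshold = [0.2 + 0.01 * x for x in detail]
coeffs = polyfit(detail, threshold, degree=1)
print(coeffs)  # approximately [0.2, 0.01]
```

One fit of this kind would be made per threshold, so that pm and ps can each be predicted from the detail level of a new image.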
It should be noted that, when the value of the function
f is close to the local minimum, the value of MS-SSIM approaches 1. The standard SSIM formula is as follows:
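$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},$$
where $x$ and $y$ here denote the two compared images (not pixel coordinates), $\mu_x$ is the pixel sample mean of $x$, $\mu_y$ the pixel sample mean of $y$, $\sigma_x^2$ the variance of $x$, $\sigma_y^2$ the variance of $y$, $\sigma_{xy}$ the covariance of $x$ and $y$, $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ two variables that stabilize the division with a weak denominator, $L$ the dynamic range of the pixel values ($L = 255$), and $k_1 = 0.01$ and $k_2 = 0.03$. MS-SSIM is a composition of several SSIM measurements taken for downscaled versions of the image. The default relative importance coefficients are used for the five scales, namely 0.0448, 0.2856, 0.3001, 0.2363 and 0.1333 for scales 1 to 5, respectively [18].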
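To make the SSIM formula concrete, the sketch below evaluates it once over two equally sized lists of pixel values, using the default constants $k_1 = 0.01$, $k_2 = 0.03$ and $L = 255$. This is a simplification assumed for illustration: standard SSIM implementations average the formula over local (typically Gaussian-weighted) windows, and MS-SSIM additionally combines downscaled versions with the per-scale weights listed above; neither step is shown. The function name `ssim_global` is hypothetical.

```python
from statistics import fmean


def ssim_global(x, y, dynamic_range=255, k1=0.01, k2=0.03):
    """SSIM formula evaluated once over two equally sized lists of pixel values.

    Simplification: real SSIM averages this over local windows; this is not MS-SSIM.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    n = len(x)
    var_x = sum((a - mu_x) ** 2 for a in x) / n  # population variance
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


ramp = [0, 50, 100, 150, 200, 250]
print(ssim_global(ramp, ramp))                  # 1.0 for identical inputs
print(ssim_global(ramp, list(reversed(ramp))))  # negative: the structure is inverted
```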
To set up the optimization process, three quality levels were defined. They differ in the target MS-SSIM (the minimum acceptable value during optimization) and the quantization value:
High: target MS-SSIM = 0.98, quantization value 10, typically minor degradation;
Medium: target MS-SSIM = 0.90, quantization value 30, noticeable degradation in quality, but many details are preserved;
Low: target MS-SSIM = 0.80, quantization value 100, significant quality degradation while the most detailed areas are preserved.
An example of adaptive DCT is shown in Figure 3, which illustrates the basic principles of constructing an image quadtree with the proposed method.
In this study, we do not use RMSE to determine ADCT thresholds; instead, we use TDV and ITDV (defined below), because expanding the tonal range of an image increases tone visibility without directly affecting detail, yet it changes the RMSE value. Conversely, narrowing the tonal range does not increase detail, yet it changes the RMSE value as well. Examples of tonal histograms of images are shown in Figure 4.
One can see that the amount of detail cannot be measured from the actual tone values: the RMSE changes as the histogram changes. However, the image histogram can be used for its original purpose, namely describing the tonal distribution. Counting the pixels of each brightness value yields an array of 256 elements, each corresponding to one brightness value and holding the number of image pixels with that brightness. The order of these elements does not affect the mean value or the variance of the array, so brightness shifts or histogram normalization, which only move counts between elements, do not affect the calculations either.
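The tonal distribution array can be sketched as a plain histogram of brightness values, assuming 8-bit integer pixels in the range 0–255. The usage example shifts the brightness of a small fragment and checks that the collection of counts, and therefore any statistic that ignores element order, stays the same. The function name `tonal_distribution` is hypothetical.

```python
def tonal_distribution(pixels):
    """Return TD, a 256-element list where TD[i] is the number of pixels of brightness i."""
    td = [0] * 256
    for value in pixels:
        if not 0 <= value <= 255:
            raise ValueError(f"brightness {value} is outside 0..255")
        td[value] += 1
    return td


fragment = [12, 12, 40, 40, 40, 90, 200, 7]
brighter = [v + 40 for v in fragment]  # shift without clipping at 255

td = tonal_distribution(fragment)
td_brighter = tonal_distribution(brighter)
# The counts move to other elements, but the collection of counts is the same.
print(sorted(td) == sorted(td_brighter))  # True
```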
The root mean square error could be calculated for this distribution of 256 tone values. However, a more elegant and efficient approach, based on the tonal distribution variance, is presented below.
For an input image or image fragment consisting of $L$ pixels, the mean tonal distribution (TD) value is $L/256$, since the 256 counts always sum to $L$, however the tones are distributed. (In this part of the section, $L$ denotes the number of pixels rather than the dynamic range used in the SSIM formula.)
Therefore, the variance can be calculated as follows:
$$\sigma_{TD}^2 = \frac{1}{256}\sum_{i=0}^{255}\left(TD_i - \frac{L}{256}\right)^2,$$
where $i$ is the pixel tone index, $L$ is the total number of pixels in the image or fragment and $TD$ is the tonal distribution array, consisting of 256 counts of pixels of specific brightness. For example, $TD_0$ is the number of black pixels and $TD_{255}$ is the number of white pixels in the fragment or image. Examples of this distribution can be observed in Figure 4.
This variance, however, depends on the input size, because a larger fragment contributes more pixels to the distribution. By instead dividing the sum of squared errors by $255L^2/256$, a normalized value between 0 and 1 is obtained regardless of the value of $L$. A value of 0 means zero variance in the distribution, i.e., a completely uniform tonal distribution, and 1 corresponds to the maximum possible variance, where one solid tone fills a single element of the tonal distribution array and leaves the other elements empty.
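Let us prove this statement. The mean TD value is $L/256$; in the extreme case, only one tone array element is filled with $L$ pixels and the rest are empty. Then, the sum of squared errors is
$$\left(L - \frac{L}{256}\right)^2 + 255\left(0 - \frac{L}{256}\right)^2,$$
where the first term is the squared error between the mean and the maximum value of the range and the second term is the squared error between the mean and the zeros for the remaining 255 out of 256 tones.
This expression is equal to
$$\frac{255^2 L^2}{256^2} + \frac{255 L^2}{256^2} = \frac{255 L^2 (255 + 1)}{256^2} = \frac{255 L^2}{256}.$$
No other distribution exceeds this value: since $\sum_i \left(TD_i - L/256\right)^2 = \sum_i TD_i^2 - L^2/256$ and $\sum_i TD_i^2 \le \left(\sum_i TD_i\right)^2 = L^2$ for non-negative counts, the sum of squared errors is at most $255L^2/256$.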
Therefore, dividing the sum of squared errors by $255L^2/256$ instead of 256 provides a more practical and general value. This value lies between 0 and 1 and is derived from a sum of squared values. To make the scale more gradual, we take the square root of the quotient, which keeps the value within 0–1 and preserves its ordering. Since the value is related to the variance, we call it tonal distribution variance (TDV). The resulting formula is as follows:
$$\mathrm{TDV} = \sqrt{\frac{256}{255L^2}\sum_{i=0}^{255}\left(TD_i - \frac{L}{256}\right)^2}.$$
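The TDV formula translates directly into code. The sketch below (the function name `tdv` is ours, not from the text) takes a 256-element tonal distribution array and reproduces the two limiting cases discussed above: 0 for a perfectly uniform distribution and 1 for a single solid tone.

```python
import math


def tdv(td):
    """Tonal distribution variance: sqrt(256 / (255 * L**2) * sum((TD_i - L/256)**2))."""
    if len(td) != 256:
        raise ValueError("expected 256 tone counts")
    total = sum(td)  # L, the number of pixels in the image or fragment
    if total == 0:
        raise ValueError("the fragment contains no pixels")
    mean = total / 256
    sse = sum((count - mean) ** 2 for count in td)  # sum of squared errors
    return math.sqrt(sse * 256 / (255 * total ** 2))


uniform = [1] * 256  # every tone appears exactly once
solid = [0] * 256
solid[17] = 64       # an 8 x 8 fragment of a single tone
print(tdv(uniform), tdv(solid))  # 0.0 1.0
```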
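For comparison, the RMSE of an image fragment with respect to its mean brightness can be calculated with the following formula:
$$\mathrm{RMSE} = \sqrt{\frac{1}{L}\sum_{j=1}^{L}\left(g_j - \bar{g}\right)^2},$$
where $g_j$ is the brightness of the $j$-th pixel of the fragment and $\bar{g}$ is the mean brightness of the fragment.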
Tonal distribution variance directly correlates with RMSE up to a constant multiplier. However, multiplying or dividing pixel brightness values changes the tonal range and, consequently, the RMSE value, whereas the TDV remains constant unless the multiplication makes pixels reach the maximum brightness or the division merges distinct brightness values into one. This demonstrates that TDV is more versatile than RMSE, because it is independent of the tonal range of the image.
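The invariance claim can be checked numerically. The sketch below assumes that the RMSE of a fragment is measured about its mean brightness and models a division of brightness values as integer division, which can merge neighboring tones; both are illustrative assumptions, and the helper names are hypothetical. Doubling the values of a small fragment doubles its RMSE but leaves its TDV unchanged, while a coarse integer division merges tones and changes the TDV.

```python
import math


def tdv_of_pixels(pixels):
    """TDV computed directly from 8-bit pixel values."""
    td = [0] * 256
    for value in pixels:
        td[value] += 1
    total = len(pixels)
    mean = total / 256
    sse = sum((count - mean) ** 2 for count in td)
    return math.sqrt(sse * 256 / (255 * total ** 2))


def rmse_about_mean(pixels):
    """RMSE between the fragment and its mean brightness (assumed reference)."""
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((v - mean) ** 2 for v in pixels) / len(pixels))


fragment = [10, 10, 20, 30, 30, 30, 40, 60]
stretched = [2 * v for v in fragment]  # wider tonal range, no value reaches 255
merged = [v // 25 for v in fragment]   # coarse division: 10 and 20 both become 0

print(math.isclose(tdv_of_pixels(fragment), tdv_of_pixels(stretched)))          # True
print(math.isclose(rmse_about_mean(stretched), 2 * rmse_about_mean(fragment)))  # True
print(math.isclose(tdv_of_pixels(fragment), tdv_of_pixels(merged)))             # False
```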
Thus, using this metric allows one to safely set the thresholds inside the 0–1 range for arbitrary image size and tonal range.
The first threshold ps (split) determines whether the image fragment should be further split into four quadrants. If the TDV of a fragment is below ps, the fragment is presumably less uniform in appearance: its pixels are spread over more tones and probably contain more details, so it is logical to split it to preserve these details after the discrete cosine transform is performed.
The second threshold pm (mean) determines whether the fragment is monochromatic enough (TDV above pm) to be replaced by a single value, its average brightness. This yields the highest possible level of compression, since only one non-zero DCT coefficient is left in the spectrum. Mean-value tiles do not noticeably degrade image quality as long as they are small enough not to affect valuable details. For our purposes, it is acceptable to paint unimportant areas, such as road surface or sky, in solid color, as long as the resulting quality is sufficient. Solid color fragments are demonstrated in Figure 5.
When constructing a quadtree using the abovementioned thresholds, the fragment is first tested with the threshold pm and only then checked for further fragmentation. This means that for each fragment there are three possible outcomes:
Replacing with a solid color if its TDV is higher than pm;
Leaving as is if its TDV is between pm and ps;
Splitting into four sub-fragments if its TDV is below ps.
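The three outcomes can be combined into a recursive quadtree construction. The sketch below assumes a square 8-bit grayscale image stored as a list of rows with a power-of-two side, and it introduces a hypothetical minimum fragment size `min_size` (8 here, echoing 8 × 8 DCT tiles) so that the recursion terminates; the text does not state how splitting is bounded. The DCT of the kept fragments is not shown, and the function names are ours.

```python
import math


def fragment_tdv(image, x, y, size):
    """TDV of the size x size fragment whose top-left corner is (x, y)."""
    td = [0] * 256
    for row in image[y:y + size]:
        for value in row[x:x + size]:
            td[value] += 1
    total = size * size
    mean = total / 256
    sse = sum((count - mean) ** 2 for count in td)
    return math.sqrt(sse * 256 / (255 * total ** 2))


def build_quadtree(image, ps, pm, x=0, y=0, size=None, min_size=8):
    """Return the quadtree leaves as (x, y, size, outcome) tuples.

    outcome is "mean" (replace with a solid color) or "keep" (leave as is).
    min_size is a hypothetical stopping size; the text does not specify one.
    """
    if size is None:
        size = len(image)  # assumes a square image whose side is a power of two
    value = fragment_tdv(image, x, y, size)
    if value > pm:  # tested first: monochromatic enough for a solid tile
        return [(x, y, size, "mean")]
    if value < ps and size > min_size:  # detailed fragment: split into quadrants
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += build_quadtree(image, ps, pm, x + dx, y + dy, half, min_size)
        return leaves
    return [(x, y, size, "keep")]  # TDV between ps and pm, or minimum size reached


# 16 x 16 test image: a detailed top-left quadrant, solid gray elsewhere.
image = [[200] * 16 for _ in range(16)]
for r in range(8):
    for c in range(8):
        image[r][c] = r * 8 + c
for leaf in build_quadtree(image, ps=0.8, pm=0.9):
    print(leaf)
```

With these illustrative thresholds, the whole image is split once; the detailed quadrant is kept and the three solid quadrants become mean-value tiles.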
In our experiments, the TDV of many traffic images was in the range between 1/15 and 1/10, whereas completely solid color frames have a TDV equal to 1. For convenience and readability, we further use the inverse of TDV, called ITDV, which describes the distribution of frames with convenient numbers instead of fractions:
$$\mathrm{ITDV} = \frac{1}{\mathrm{TDV}}.$$
The majority of the frames are then found in the 10–15 ITDV range.
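As a small sketch (the function name `itdv` is ours), ITDV can be computed with a guard for the perfectly uniform case, where TDV = 0 and the inverse is unbounded; returning infinity there is an assumption, since the text does not discuss this case.

```python
import math


def itdv(tdv_value):
    """Inverse tonal distribution variance, ITDV = 1 / TDV."""
    if not 0 <= tdv_value <= 1:
        raise ValueError("TDV is normalized to the range 0..1")
    if tdv_value == 0:
        return math.inf  # perfectly uniform tonal distribution
    return 1 / tdv_value


# The TDV range reported for many traffic images maps onto roughly 10..15 ITDV.
print(itdv(1 / 10), itdv(1 / 15))
```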
We introduce an optimization process consisting of the following steps: choosing pairs of ADCT threshold values, constructing quadtrees with them, performing the discrete cosine transform on their fragments and checking whether the results satisfy certain requirements. The threshold values are optimized for images with various original ITDVs, i.e., the tonal distribution variance is calculated for the entire original image, which we assume to be an acceptable measure of image detail. Optimal thresholds should provide maximum compression efficiency while preserving a certain level of image quality.
Figure 6 illustrates a quality comparison of images compressed at the three quality levels.
For simplicity, the compression ratio is calculated here as the ratio between the number of non-zero DCT coefficients and the total number of coefficients. It does not necessarily reflect the actual compression ratio of the stored file, but we consider it sufficiently suitable for this study. Given appropriate data compression methods for the pixel information, such as run-length encoding, this metric allows a good relative comparison between methods. It also spares us from implementing a full-scale compression method to measure bits per pixel or a similar metric. For example, to compare the performance of ADCT with regular JPEG, one can set thresholds that split the image into 8 × 8 tiles without creating solid color fragments, thus mimicking regular DCT, and then compare the resulting ratios.
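The coefficient-count metric can be illustrated with a naive orthonormal 2D DCT-II. This sketch makes two assumptions: a single uniform quantization value is applied to all coefficients (such as the values 10, 30 and 100 given for the quality levels), whereas JPEG uses per-frequency tables; and the ratio is oriented as total over non-zero coefficients, so that a larger value means stronger compression, consistent with the search selecting the largest ratio. All helper names are hypothetical.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def nonzero_quantized(block, q):
    """Count DCT coefficients that remain non-zero after uniform quantization by q."""
    return sum(1 for row in dct2(block) for c in row if round(c / q) != 0)


def compression_ratio(blocks, q):
    """Total coefficients divided by non-zero quantized coefficients (assumed orientation)."""
    total = sum(len(b) ** 2 for b in blocks)
    nonzero = sum(nonzero_quantized(b, q) for b in blocks)
    return total / nonzero if nonzero else math.inf


solid = [[120] * 8 for _ in range(8)]  # a mean-value tile
gradient = [[8 * r + c for c in range(8)] for r in range(8)]
print(nonzero_quantized(solid, 10))  # 1: only the DC coefficient survives
print(compression_ratio([solid, gradient], 10))
```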
The optimization process consists of three levels of search for the optimal ps and pm thresholds. Since TDV is normalized between 0 and 1, we can evaluate the compression result for different pairs of threshold values from this range, increasing the precision of the search step by step.
At the first level of optimization, pairs of threshold values are evaluated with 0.1 accuracy (0.05:0.05, 0.05:0.15, …, 0.95:0.95). For each pair, a quadtree is constructed for the tested image and ADCT is performed to obtain the compression ratio and the MS-SSIM between the result and the original image; the MS-SSIM must be above the target value for the chosen compression quality. Then, the pair with the best resulting compression ratio is chosen for the next level of search with higher accuracy (0.01, then 0.001), where new values are searched for in the vicinity of the previous optimal values. As a result, the optimal values are obtained with an accuracy of three decimal places.
A visual representation of the search algorithm is shown in Figure 7. The grid represents the pairs of threshold values. The colors, from blue to red, represent the compression ratio obtained after ADCT: blue represents the minimum value at the search level and red the maximum value. Black represents invalid threshold value pairs or pairs for which the resulting MS-SSIM is below the target value for the chosen quality. The vicinity of the threshold value pair with the largest compression ratio at one level is searched with higher precision at the next level, with three levels of precision in total. If several threshold value pairs have similar compression ratios, the one with the best MS-SSIM is chosen. However, an equal compression ratio usually means an equal quadtree structure and equal MS-SSIM; in that case, the first pair in search order is chosen.
Figure 7 illustrates the search process at all three levels: first, the 0.25:0.35 pair is chosen from the 0.05:0.05 (top left) to 0.95:0.95 (bottom right) field; then, 0.27:0.31 is chosen from the 0.21:0.31–0.30:0.40 field; finally, the leftmost red cell, representing 0.267:0.306, is chosen from the 0.266:0.306–0.275:0.315 field.
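The three-level search can be sketched as follows. `evaluate(ps, pm)` is a hypothetical callback standing in for quadtree construction, ADCT and MS-SSIM measurement; it returns a `(compression_ratio, ms_ssim)` tuple, or `None` for an invalid pair. Each finer level tests the ten values covering the previous cell, which reproduces the fields described for Figure 7 (0.21–0.30 around 0.25, 0.266–0.275 around 0.27). Ties are resolved by comparing `(ratio, ms_ssim)` tuples and keeping the first maximum in search order; treating only exactly equal ratios as ties is a simplification of "similar" ratios. The toy objective in the usage example is synthetic.

```python
QUALITY_LEVELS = {  # quality level -> (target MS-SSIM, quantization value), from the text
    "high": (0.98, 10),
    "medium": (0.90, 30),
    "low": (0.80, 100),
}


def level_values(center, step):
    """Ten candidate values of one threshold at the given precision."""
    if center is None:  # first level: 0.05, 0.15, ..., 0.95
        return [round(0.05 + 0.1 * i, 3) for i in range(10)]
    # Finer levels: the ten values covering the previous cell, e.g. 0.21..0.30 around 0.25.
    return [round(center + (i - 4) * step, 3) for i in range(10)]


def search_thresholds(evaluate, target_ms_ssim):
    """Coarse-to-fine search for the (ps, pm) pair with the largest compression ratio."""
    best = None
    for step in (0.1, 0.01, 0.001):
        ps_center, pm_center = best if best else (None, None)
        level_key, level_pair = None, None
        for ps in level_values(ps_center, step):
            for pm in level_values(pm_center, step):
                result = evaluate(ps, pm)
                if result is None or result[1] < target_ms_ssim:
                    continue  # invalid pair or quality below target (a black cell)
                key = tuple(result)  # (ratio, ms_ssim): ratio first, MS-SSIM breaks ties
                if level_key is None or key > level_key:  # strict: first maximum wins
                    level_key, level_pair = key, (ps, pm)
        if level_pair is None:
            break  # nothing acceptable at this precision; keep the coarser result
        best = level_pair
    return best


def toy_evaluate(ps, pm):
    """Synthetic stand-in for ADCT: peak ratio at (0.267, 0.306); ps > pm is invalid."""
    if ps > pm:
        return None
    ratio = 10 - ((ps - 0.267) ** 2 + (pm - 0.306) ** 2)
    return ratio, 0.99


target, _quantization = QUALITY_LEVELS["medium"]
print(search_thresholds(toy_evaluate, target))  # (0.267, 0.306)
```

With this toy objective, the intermediate choices are 0.25:0.35 and then 0.27:0.31, mirroring the path described for Figure 7.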