Quality Control for the BPG Lossy Compression of Three-Channel Remote Sensing Images

Abstract: This paper deals with providing the desired quality in the Better Portable Graphics (BPG)-based lossy compression of color and three-channel remote sensing (RS) images. Quality is described by the Mean Deviation Similarity Index (MDSI), which is proven to be one of the best metrics for characterizing compressed image quality due to its high conventional and rank-order correlation with the Mean Opinion Score (MOS) values. The MDSI properties are studied and three main areas of interest are determined. It is shown that quite different quality and compression ratios (CR) can be observed for the same values of the quality parameter Q that controls compression, depending on the compressed image complexity. To provide the desired quality, a modified two-step procedure is proposed and tested. It has a preliminary stage carried out offline (in advance). At this stage, an average rate-distortion curve (MDSI on Q) is obtained and it is available until the moment when a given image has to be compressed. Then, in the first step, an image is compressed using the starting Q determined from the average rate-distortion curve for the desired MDSI. After this, the image is decompressed and the produced MDSI is calculated. In the second step, if necessary, the parameter Q is corrected using the average rate-distortion curve, and the image is compressed with the corrected Q. Such a procedure allows a decrease in the MDSI variance by around one order after two steps compared to variance after the first step. This is important for the MDSI of approximately 0.2–0.25 corresponding to the distortion invisibility threshold. The BPG performance comparison to some other coders is performed and examples of its application to real-life RS images are presented.


Introduction
A general tendency in modern remote sensing (RS) imaging is to acquire data with higher resolution, for larger territories and more frequently [1]. This leads to collecting data having a very large size and running into typical problems of big data, i.e., difficulties with image processing, transferring, storage, and dissemination [2]. These difficulties with the RS data can be partially solved by their compression [3]. As is widely known, there are lossless and lossy image compression techniques that may be applied for general-purpose images [4]. The former group of methods allows original (undistorted) data to be obtained after decompression. However, in this case, the compression ratio (CR) is relatively small (limited by image entropy) and may reach values of around 5:1 only for hyper-spectral images with very high inter-band correlation [5]. Nevertheless, a larger CR is often needed in practical applications, particularly for RS images; hence, the only possibility is the use of lossy compression with inevitable distortions [6].
An interesting analysis of the influence of lossy compression on the quality of aerial images using a weighted combination of qualitative parameters is presented in the paper [7], where a multi-criteria decision-making framework has been proposed for quality evaluation. Although there are many different requirements for lossy compression, their priority and type (obligatory or desired) usually depend on the application [8]. For example, there may be an initial condition to provide the minimum required CR. This may happen if the channel bandwidth and/or the data transfer time are limited. In such a situation, a coder should provide a simple way to set the desired CR (as JPEG2000-like methods do) and keep the introduced distortions small enough to prevent their negative influence on further image processing, such as the image analysis or classification studied in the paper [9] and object recognition, whose effectiveness depends on the RS image quality [10]. For multi-channel data (e.g., color, multi-spectral, hyper-spectral), the compression performance may be improved by a preliminary decorrelation and the use of three-dimensional (3D) approaches for compression [11][12][13]. Another recent direction of research is the application of deep neural networks for the optimization of observer-dependent image compression towards a trade-off between the human visual system and classification accuracy [14]. However, in some applications, the priority may instead be to keep the introduced distortions (characterized by a specified measure or metric) below a given level while attaining the highest possible CR. The related issues include the fast and efficient control of the distortion level [15] and the choice of an adequate metric to characterize distortions, considering the task to be further solved using the compressed data [16].
Some additional requirements may concern a certain format (according to some standards), compression speed, or, e.g., limited power consumption [17].
To satisfy the above requirements concerning the metrics, some pre-requisites have appeared recently, and considerable attention has been paid to metrics able to characterize the quality of RS data [16,18,19], including the artificial visible-like images based on SAR data generated with the use of deep CNNs [20]. In particular, special attention has been paid to the so-called visually lossless compression for RS image browsing and other applications [12,13,21-23]. This is important since compressed RS images are often subject to visual inspection. Meanwhile, JPEG2000 is not the best compression technique, and the well-known full-reference image quality metric SSIM used in [12,13] is clearly not the best among existing visual quality metrics either. Perceptual quality has also received attention in the papers [24,25]. The ability to provide a desired compression ratio and quality quickly enough is important in practical applications where processing time and resources are limited [26]. No-reference metrics can potentially also be used for quality assessment purposes [27]; however, to date, no-reference metrics are less adequate in characterizing image quality than full-reference ones.
It has been demonstrated recently [16] that the Mean Deviation Similarity Index (MDSI) [28] and some other elementary metrics can perform well in the characterization of three-channel RS images with distortions typical for remote sensing imagery, including distortions caused by lossy compression. It has also been shown that lossy compression, under certain conditions, can lead to practically the same or even better performance of image classification compared to the classification of original uncompressed data [29][30][31][32][33]. This may happen when noise suppression is observed or if distortions cannot be detected visually [33]. This means that two benefits can be provided simultaneously: one obtains a CR that differs significantly from unity, and an improved (or, at least, not worse) classification is observed. Additionally, considerable work has been carried out towards accelerating lossy compression while attaining a predefined desired quality. For this purpose, iterative compression [34] has been found to be accurate, but it requires an unpredictable number of iterations, which might cause problems with the time and computational efficiency of compression. To solve this issue, two-step methods and algorithms have been proposed and studied [35,36]. It has been demonstrated that lossy compression providing a given quality according to a chosen quality metric can be carried out with the appropriate accuracy in two steps.
For the first step, one needs an average dependence of a chosen metric on a compression controlling parameter (sometimes referred to as the PCC [35,36]; however, in this paper, the abbreviation CCP is used to avoid confusion with the Pearson Correlation Coefficient, typically used in image quality assessment), obtained in advance for a set of basic images (see details in Section 4). Knowing such a dependence, it is possible to determine the initial CCP that corresponds to a desired quality metric value according to the average rate-distortion curve, e.g., the dependence of the Peak Signal-to-Noise Ratio (PSNR) on the quantization step (QS) or the number of bits per pixel (bpp). The value of the metric for the first step may be obtained after compression and decompression of an image with the initial CCP. After this, the CCP is corrected using a linear interpolation based on the average rate-distortion curve, and the resulting final CCP is used for the final compression in the second step. Such a two-step procedure is quite fast, accurate, and universal, working well particularly for coders based on the Discrete Cosine Transform (DCT) and wavelets (e.g., SPIHT).
Nevertheless, an important feature of the proposed two-step approach is that it may be effectively applied to Better Portable Graphics (BPG), a novel compression method tending to replace JPEG due to its considerably better performance compared to JPEG, JPEG2000 [21], and some other popular lossy compression techniques. For the BPG, compression characteristics can be varied by the so-called quality parameter Q; its increase leads to a larger compression ratio but also to more introduced distortions. However, the same value of the Q parameter leads to different quality for different images if quality is characterized by a certain quality metric [36], e.g., the PSNR-HVS-M metric [37], where HVS denotes the human vision system and M stands for masking. This means that Q should be adjusted depending on the individual image subject to compression, as shown for grayscale images in the paper [36]. However, in remote sensing practice, many modern systems acquire multi-channel images, for which the use of 3D compression is expedient.
Thus, the first research goal is to check whether or not it is possible to apply the two-step method for compressing multi-channel RS images or, more precisely, three-channel images that include color images and vision range data of multi-spectral imagery. Since there are no commonly accepted quality metrics for an arbitrary number of RS data components, there is also a need to evaluate some recently proposed metrics in terms of their applicability for the quality control of such images. Hence, the second goal of the paper is to investigate some important properties of the MDSI metric, pre-selected as the most appropriate, and to verify its usefulness for the proposed two-step approach. We also analyze the degree of accuracy of the MDSI that should and can be achieved in practice.
The original contributions of the paper are related to the following:
• To the best of our knowledge, MDSI has never been used and analyzed for lossy image compression in general and lossy compression of remote sensing data in particular;
• MDSI is shown to be very useful for the considered application due to several benefits it provides, in particular, high linear and rank-order correlation of MDSI with MOS values for the types of distortions of interest and fast calculation;
• The main areas of MDSI values have been determined and the behavior of the MDSI metric for them has been analyzed;
• The analysis has been carried out for the BPG coder that outperforms known standards and provides performance comparable to the state-of-the-art compression techniques;
• The two-step procedure of providing the desired quality has been tested for the considered metric and coder, showing peculiarities dealing with the CCP (integer values of the Q parameter).
The paper structure is as follows. Section 2 considers some properties of three-channel images. Some important properties of the MDSI metric are obtained using color images from the TID2013 database [38]. Then, the proposed two-step procedure is described in Section 3 and its implementation for the BPG is studied in Section 4. Some simulation results are provided in Section 5 and the obtained accuracy is discussed in Section 6, followed by the conclusions.

Properties of Three-Channel RS Images
Many modern RS systems produce multi-channel images, where the term "multi-channel" concerns multi-spectral, hyper-spectral, dual and multi-polarization radar data, etc. In this paper, we concentrate on three-channel images for three main reasons. Firstly, together with dual-polarization radar data, three-channel images are the simplest example of multi-channel ones, being also convenient to process and analyze. Methods and results obtained for three-channel images can then be generalized to images with a higher number of channels. Secondly, three-channel RS images can be easily visualized as color ones, which makes visual analysis simpler than in the case of multi-channel image representation in pseudo-colors. Although the wavelengths of the channels of multi-spectral images usually do not coincide with the wavelengths in traditional RGB representations, visualized three-channel images usually look quite clear for an observer. Finally, it is expected that the BPG compression can be extended to images with a higher number of channels. However, at the moment, only the BPG version designed for the compression of three-channel color images is available.
Compared to conventional color images (photos), RS images have some specific features. First of all, they are usually more highly structured and each object has a semantic meaning [39], whereas natural images are more chaotic. The objects present in RS images need to be analyzed in further stages of data processing, particularly target recognition, classification, segmentation, and parameter estimation. For example, image segmentation, i.e., partitioning an image into a set of segments that are homogeneous in terms of color or texture, is highly important for remote sensing data [40]. Meanwhile, the fact that RS images often include large areas of background, which is much less important than foreground objects [41], may also be taken into account in compression.
Secondly, the correlation between component images of three-channel RS data can differ from the correlation of red, green, and blue components of color images. Whilst, for color images, the cross-correlation factor is usually around 0.7 [42], the correlation factor between components of three-channel RS data may be significantly higher [5]. This might noticeably influence the final compression performance if 3D approaches are to be applied.
Since individual channels of RGB images are represented as 8-bit data and such components of RS images might be initially represented in another way, e.g., as 16-bit data, it may be difficult to adequately compare the compression of color and three-channel RS images. Therefore, three-channel images with the 8-bit representation of channels (also referred to as component images) are considered in a further part of the paper. Additionally, if component images originally have more than 8 bits, it is supposed that they are prenormalized to 8 bits before lossy compression. Certainly, this leads to the introduction of some additional errors, although relatively small, due to rounding to the nearest integer. The PSNR of images subject to normalization, rounding-off, and re-normalization is usually around 59 dB, whereas distortions introduced during lossy compression usually lead to a much smaller PSNR. Thus, distortions caused by normalization (if applied) may be ignored in further analysis.
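The figure of about 59 dB is consistent with a simple back-of-the-envelope model. The sketch below is an illustration only and rests on an assumption that is not stated in the text: the rounding error is uniformly distributed on [-0.5, 0.5], so its variance (and hence the MSE) equals 1/12. The function name is hypothetical.

```python
import math


def rounding_psnr_db(peak=255.0):
    """PSNR caused only by rounding to the nearest integer.

    Assumption (illustrative model, not taken from the paper): the rounding
    error is uniform on [-0.5, 0.5], which gives an MSE of 1/12.
    """
    mse = 1.0 / 12.0
    return 10.0 * math.log10(peak ** 2 / mse)


print(f"PSNR due to rounding: {rounding_psnr_db():.2f} dB")
```

Under this model the result is slightly below 59 dB, which agrees with the order of magnitude quoted above and is far higher than the PSNR typically produced by lossy compression.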
Furthermore, there are no commonly accepted databases of "pristine" (reference, distortion-free) RS images. Moreover, types of distortions inherent for RS images and color images partly coincide but are partly different. For example, image dithering is not met in practice in RS images. Meanwhile, speckle noise is not typical for color images but it might be an important factor for RS images of a special kind (synthetic aperture radar ones). This obstacle prevents the direct use of color image databases for making conclusions and recommendations for RS images. However, recently, the TID2013 dataset has been indirectly used to analyze the subsets of distortion types that might be present in RS images. This has allowed the determination of good elementary visual quality metrics for the adequate characterization of RS image quality and the design of combined metrics. Already known visual quality metrics, used in a combined metric [16] as one of the inputs, are referred to as elementary metrics. In particular, the MDSI [28] has been presented as one of the best elementary metrics [16]. Hence, more details about this metric are provided below, together with an explanation of why it has caught our attention.
It should be kept in mind that the visual quality metrics describe the quality of data from a specific viewpoint and the relation between visual quality metrics and, e.g., text recognition from document images or image classification accuracy is not fully known [29]. Nevertheless, preliminary results of the classification of compressed images have already demonstrated that visual quality has a high correlation with classification accuracy, especially for classes represented by small-sized, prolonged, and textural objects, i.e., for classes that are quite heterogeneous [16]. Since high-frequency information can be lost due to lossy compression for large CR, it might harm the classification as well.
Our task of providing desired quality in compressing a given image can be treated as a particular case of applying the theory of generative adversarial nets [43][44][45] since the image at hand is supposed to belong (according to its basic properties) to the set of images used at the method training stage when the average rate distortion curve has been obtained. Although there are some papers studying how distortions affect image classification and object recognition tasks, particularly with the use of neural networks, it should be kept in mind that lossy compression should not lead to an essential degradation in performance for image classification and object recognition compared to uncompressed original data.

Analysis of Some Elementary Image Quality Metrics
Image quality assessment (IQA) plays a significant role in numerous image processing applications, for example, image acquisition, lossy compression, restoration, denoising, etc. IQA techniques and metrics can be divided into three categories according to the availability of the original image, namely Full-Reference (FR), Reduced-Reference (RR), and No-Reference (NR). Different methods may be chosen according to the requirements, their priority, and their application. In this paper, the FR IQA methods are utilized to evaluate the compressed images and to provide the desired visual quality in RS lossy compression. The main reason is the knowledge of the reference image (this is simply the image to be compressed). Another reason is that the FR IQA metrics are usually simpler and more adequate (accurate) than the metrics that belong to the two latter groups.
The simplest classical FR IQA metric is the Mean Squared Error (MSE), computed by averaging the squared intensity differences of distorted and reference images pixel-wise. Another well-known metric strictly related to MSE is the Peak Signal-to-Noise Ratio (PSNR). The significant advantages of MSE and PSNR are that their calculation is simple and their physical meaning is clear since the comparison is performed at the pixel level. Nevertheless, the most relevant weakness of these metrics is that they are not very well matched to perceived visual quality [46]. Because of this, numerous metrics based on the MSE and other principles, particularly calculated locally using the sliding window approach, have been proposed and intensively studied in the last three decades, including Structural Similarity (SSIM) [47][48][49], Feature Similarity (FSIM) [50], and the MDSI [28].
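The two classical metrics can be written in a few lines. The following is a minimal sketch in pure Python, assuming images are given as flat sequences of 8-bit pixel values (so the peak value is 255); the function names are illustrative.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized sequences of pixel values."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((r - d) ** 2 for r, d in zip(reference, distorted))
    return total / len(reference)


def psnr_db(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


# Toy example: every pixel is off by one gray level, so MSE = 1.
ref = [10, 20, 30, 40]
dist = [11, 21, 31, 41]
print(mse(ref, dist), round(psnr_db(ref, dist), 2))
```

For a three-channel image, one common convention (assumed here, not prescribed by the text) is to pool the squared errors over all pixels of all channels before averaging.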
In this paper, the MDSI is adopted as the visual quality metric for three-channel RS images. To explain the reasons for this choice, some aspects and requirements for IQA should be recalled. As is widely known, subjective image visual quality is assessed during quite complicated experiments involving a large number of participants and test images of different complexity [38]. The result of such testing is a set of Mean Opinion Score (MOS) or Differential Mean Opinion Score (DMOS) values. A metric is considered good if, for different databases, it has high absolute values of the correlation factor between the metric and MOS, where both conventional (Pearson) and rank-order (Spearman and/or Kendall) correlations can be taken into consideration (ideally, it is desired that both Pearson and Spearman correlation coefficients have absolute values close to unity). The first requirement is the monotonicity of the dependence of a metric on image quality, characterized by the Spearman Rank Order Correlation Coefficient (SROCC). Meanwhile, linearity of this dependence, better characterized by the Pearson Linear Correlation Coefficient (PLCC), is desired as well. Note that the above-mentioned FSIM [50] depends on image quality and MOS in a considerably nonlinear manner [51]. During some experiments conducted for 50 metrics [16], SROCC values have been calculated as one quantitative criterion of metric performance for all distortion types and three subsets of the Tampere Image Database (TID2013). According to these calculations, the SROCC value determined for MDSI versus MOS is 0.8897 for all types and levels of distortions, being higher than for most other metrics, whereas the SROCC for the subset Noise&Actual is 0.9374. It is the highest value among all the considered elementary metrics. Meanwhile, the statistics of average calculation time demonstrate that the computational efficiency of the MDSI is very high.
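To make the two criteria concrete, the sketch below computes the PLCC and the SROCC (as the Pearson correlation of average ranks) in pure Python. The metric and MOS values are made up for illustration and are not taken from TID2013; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation coefficient (SROCC)."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: a larger metric value means more distortion, so MOS drops.
metric = [0.10, 0.20, 0.30, 0.45]
mos = [6.5, 5.8, 4.0, 3.9]
print(spearman(metric, mos), pearson(metric, mos))
```

In this toy example the dependence is monotonic but not linear, so the SROCC reaches -1 while the PLCC has a smaller absolute value. This is why absolute values of the coefficients are compared when a metric decreases or increases with quality.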
Additionally, the SROCC between MDSI and MOS has been calculated for images with three types of distortions related to lossy compression in the TID2013 dataset. It is equal to 0.966, i.e., very high, meaning that the MDSI is able to adequately characterize the visual quality of lossy compressed images.
Although the detailed SROCC values for 50 elementary metrics obtained for three subsets and the whole TID2013 dataset may be found in the paper [16], an additional verification of these metrics may be conducted for the Konstanz Artificially Distorted Image quality Database (KADID-10k) [52], containing 81 pristine images, each degraded by 25 distortions in five levels, and particularly for the subsets containing distortions characteristic for RS images. Considering the JPEG and JPEG2000 compressed images, only two metrics, HaarPSI and MDSI, achieve SROCC values over 0.925 and PLCC over 0.94 (after nonlinear fitting) simultaneously, demonstrating both high prediction monotonicity and accuracy. Nevertheless, it is worth noting that the HaarPSI metric is around 1.4 times slower than MDSI, which is one of the fastest elementary metrics (detailed results are presented in the paper [16]). The performance comparison of individual metrics for the lossy compressed images from the KADID-10k dataset is presented in Appendix A (Table A1).

The MDSI Metric and Its Properties
An important property of the MDSI metric is that, during its computation, a gradient magnitude is used to measure structural distortions, whereas chrominance features are used to measure color distortions (recalling that both these types of distortions are equally important for three-channel RS images). Subsequently, the two obtained similarity maps are combined to form a gradient-chromaticity similarity map. Unlike SSIM and FSIM, the MDSI uses the deviation pooling strategy to compute the final quality score. In comparison to previous research, this new gradient similarity map is more likely to follow the human visual system (HVS).
Providing the desired visual quality in lossy compression is a challenging task; however, it would be possible if a metric value was associated with a certain level of quality. It could also be useful to know a range of metric values for which distortions are practically invisible. In lossy compression, the desired visual quality is often within a certain range, and this also holds for RS images. As illustrated by some already completed analyses based on other metrics [33], the lower limit is such that lossy compression has no negative impact on further image processing. It means that the introduced distortions may be noticeable or even visible but not annoying. Concerning the upper limit, the lossy compression should provide a higher CR than can be achieved by lossless compression (limited by entropy). A reasonable threshold should be set in such a way that the introduced distortions are invisible, so that the visual quality of the compressed data is indistinguishable from that of losslessly compressed images while a higher CR is achieved. This threshold is around 40 dB in terms of the PSNR-HVS-M metric for multi-channel RS images [33].
To provide a reasonable range for the metric MDSI, we have tested 3000 color images of the database TID2013 to obtain the statistical data results [38] and put them into three categories, namely visible, just noticeable, and invisible distortions. Since MOS values have been provided for each image in the TID2013 dataset, the scatter plot for MDSI vs. MOS for the three mentioned classes of images is shown in Figure 1. Therefore, the reasonable range (under interest in this paper) is set as the range from 0.10 to 0.25. It is also worth noting that the relation between MDSI and MOS is almost linear and this should be considered as one more advantage of the MDSI metric.

The Two-Step Method for Lossy Compression
The two-step image compression method has been recently proposed to control the visual quality in lossy compression, and further provide the desired visual quality for handling images in terms of a chosen visual quality metric. The previous research has proven that this method works well for DCT-based coders (such as AGU, ADCTC) [35,53] as well as for the DWT-based coder (SPIHT) [54]. The latest conference paper [36] demonstrates also some initial results and the advantages of the two-step compression method for the BPG coder.
Although lossy compression can easily achieve higher CR than lossless compression, usually, it reduces the visual quality of an image noticeably. However, the visual quality is also important and it can be even the most important requirement in some cases. Consequently, in such applications, lossy compression should be applied with additional control of introduced distortions. To control the visual quality in lossy compression, the CCP can be adjusted (or just properly set), considering the so-called rate-distortion curve, representing the dependence of visual quality on the CCP. However, for a given lossy compression coder, visual quality dependence on the CCP varies, depending also on image characteristics. Although it is difficult to know the rate-distortion curve for each image to be compressed in advance, it is still possible to obtain the general trend for particular image categories, e.g., three-channel RS images.
Since the provided approach should be possible to apply for images of different terrains or, in other words, of different complexity containing various types of objects, a relatively high number of test images should be used in experiments. This requirement is one of the main reasons for the methodology of design and analysis applied at different stages of the study presented in the paper.
First of all, a certain number of images are chosen to be compressed/decompressed assuming a series of CCP values, further referred to as the basic image set (sample images are shown in Figure 2). Each value of MDSI for each test image obtained after compression/decompression using given CCP values should be registered to evaluate the distortions. As a consequence, a model of visual quality metric dependence on the CCP may be obtained from these statistic data. At this stage, it is possible to obtain individual dependencies of the metric on the CCP, further averaged for all basic images corresponding to each CCP (more details are provided in Section 4). In this way, it is also possible to obtain an averaged dependence, the so-called average rate-distortion curve, which reflects the monotonous change in visual quality with the CCP. This process is performed offline; hence, it does not influence the time efficiency of the two-step method. Based on this average rate-distortion curve, the second step of the method can be carried out.
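The offline stage described above can be sketched as follows. This is a minimal illustration, assuming that the metric value of every basic image has already been measured for the same set of CCP values; the data layout, the function name, and the numbers are hypothetical.

```python
def average_rd_curve(per_image_curves):
    """Average the metric over all basic images for each CCP value.

    per_image_curves: list of dicts, one per basic image, each mapping a
    CCP value to the metric value measured after compression/decompression.
    Returns a list of (ccp, mean_metric) pairs sorted by CCP.
    """
    if not per_image_curves:
        raise ValueError("at least one basic image is required")
    ccps = sorted(per_image_curves[0])
    for curve in per_image_curves:
        if sorted(curve) != ccps:
            raise ValueError("all images must use the same CCP values")
    n = len(per_image_curves)
    return [(q, sum(curve[q] for curve in per_image_curves) / n) for q in ccps]


# Hypothetical measurements for two basic images at three CCP values.
simple_image = {30: 0.10, 35: 0.15, 40: 0.20}
complex_image = {30: 0.20, 35: 0.25, 40: 0.30}
for ccp, mean_metric in average_rd_curve([simple_image, complex_image]):
    print(ccp, round(mean_metric, 3))
```

Storing the result in tabular form, as in this sketch, is enough for the later steps, because only the table entries around the desired metric value and the local slope are needed.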
The block diagram of the two-step compression method is illustrated in Figure 3; in the first step, the initial CCP is determined using the desired visual quality and the average rate-distortion curve. In general, it is calculated using the following equation:
$$\mathrm{CCP}_{init} = \mathrm{CCP}_{est} + \frac{M_{des} - M_{avg}}{M'},$$
where $M_{des}$ is the desired visual quality pre-set by the user, and the other three parameters come from the average rate-distortion curve presented in a tabular form. $M_{avg}$ is the value closest to the desired $M_{des}$ at the right end of the corresponding interval of the average rate-distortion data array, whereas $\mathrm{CCP}_{est}$ is the value corresponding to $M_{avg}$. The curve derivative for the corresponding $\mathrm{CCP}_{est}$ is denoted as $M'$. These calculations do not require image compression, so the $\mathrm{CCP}_{init}$ value is the same for all images being compressed, assuming a given desired visual quality. The first step of the proposed method is the compression and decompression of the original image with the initial CCP; subsequently, the visual quality value $M_{init}$ of the decompressed image can be calculated using the original image as the reference. Since the metric value $M_{init}$ is expected to be close to the desired $M_{des}$, the absolute error is calculated as
$$\Delta M = \left| M_{init} - M_{des} \right|$$
and further evaluated as acceptable or not in order to decide on further actions. It is worth noting that, for some images, the above $\Delta M$ error can already be within the required level, so the second step is not needed and the image compressed in the first step can be treated as the final output.
To improve the accuracy of the provided visual quality, the CCP value needs to be corrected before the second step using the following equation:
$$\mathrm{CCP}_{des} = \mathrm{CCP}_{init} + \frac{M_{des} - M_{init}}{M'},$$
where $M'$ is the same derivative of the average rate-distortion curve as used in the first step. This corrected CCP value may be different for different images. Finally, the second compression step is carried out using $\mathrm{CCP}_{des}$, and the compressed image file obtained after the second step is considered as the final output with the desired quality.
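The whole procedure can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the linear corrections are taken as CCP_init = CCP_est + (M_des - M_avg)/M' and CCP_des = CCP_init + (M_des - M_init)/M', the derivative M' is approximated by a finite difference over the table interval, and `measure` is a hypothetical callback that compresses and decompresses the image with a given CCP and returns the resulting metric value.

```python
def initial_ccp(curve, m_des):
    """Return (ccp_init, slope) from a tabulated average rate-distortion curve.

    curve: list of (ccp, metric) pairs sorted by CCP, metric increasing.
    The slope M' is approximated by a finite difference (an assumption).
    """
    if len(curve) < 2:
        raise ValueError("the curve needs at least two points")
    for i in range(1, len(curve)):
        ccp_est, m_avg = curve[i]
        # Stop at the right end of the interval that contains m_des.
        if m_avg >= m_des or i == len(curve) - 1:
            ccp_prev, m_prev = curve[i - 1]
            slope = (m_avg - m_prev) / (ccp_est - ccp_prev)
            return ccp_est + (m_des - m_avg) / slope, slope


def two_step(curve, m_des, measure, tolerance):
    """Two-step quality control; returns (final_ccp, final_metric, steps)."""
    ccp_init, slope = initial_ccp(curve, m_des)
    m_init = measure(ccp_init)              # step 1: compress, decompress, measure
    if abs(m_init - m_des) <= tolerance:    # error acceptable: step 2 not needed
        return ccp_init, m_init, 1
    ccp_des = ccp_init + (m_des - m_init) / slope
    return ccp_des, measure(ccp_des), 2     # step 2: compress with corrected CCP


# Toy stand-in for a real codec: this image is more complex than average.
avg_curve = [(30, 0.15), (35, 0.20), (40, 0.25)]
toy_measure = lambda ccp: 0.01 * ccp - 0.12
print(two_step(avg_curve, m_des=0.22, measure=toy_measure, tolerance=0.005))
```

In the toy run the first step overshoots the desired value because the image is more complex than the average one, and the correction brings the metric back to the target. For a coder with an integer-valued CCP, the computed values would additionally have to be rounded before compression.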

The Implementation of the Two-Step Method for the BPG Coder
Since the demand for high-quality images with acceptable size has increased dramatically, numerous conventional lossy compression coders have been developed [21,31,54-56], and they may be applied not only to typical three-channel RGB images. One of the most interesting recent concepts is known as the Better Portable Graphics (BPG), which is a novel image compression method that aims to replace the decades-old JPEG format due to considerably better performance in the sense of higher quality and/or lower size [57]. Even though some learned lossless and lossy image compression methods have been proposed recently, the BPG is still highly competitive [58]. In this paper, it is utilized as the lossy compression coder applied to compress three-channel RS images. The implementation of the two-step compression method for the BPG coder is described below.

The BPG Coder
The BPG coder, proposed by Fabrice Bellard in the form of open-source code (more details available at: https://bellard.org/bpg/ (accessed on 23 February 2022)), is based on the High Efficiency Video Coding (HEVC) [59,60] technique. Compared to JPEG, the BPG coder ensures a higher compression ratio for the same quality. Additionally, the BPG has the following important attributes:
• it provides a JavaScript decoder, increasing the accessibility of the BPG to most web browsers and portable applications due to its lightweight code;
• it supports lossless and lossy compression.
As mentioned above, the BPG algorithm relies on the intra-frame coding of the HEVC video encoder [59,60], which is treated as a significant advance in video compression technology. The high efficiency of HEVC is explained by its flexible choice of block size and the use of the DCT or the Discrete Sine Transform in the spatial frequency domain.
Regardless of some patent issues of the HEVC implementations (More details available at: https://www.loc.gov/preservation/digital/formats/fdd/fdd000530.shtml (accessed on 30 March 2022)), all the above features make the open-source BPG a highly competitive candidate for the replacement of many existing image compression algorithms. Some details concerning its hardware implementation may be found in the paper [61].
The parameter Q, used internally to control the compression ratio and image quality in the BPG encoder, is considered as the CCP, where Q ∈ {1, . . . , 51} and a larger Q results in a higher CR, corresponding to lower visual quality [36]. Unlike the CCP in AGU and SPIHT, the parameter Q takes only integer values. The BPG encoder may also use JPEG or PNG images as input, performing BPG compression, and producing the corresponding BPG image.
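The role of Q as the CCP can be illustrated with a minimal sketch that wraps the reference command-line tools. It assumes that the `bpgenc` and `bpgdec` executables from the BPG distribution are installed and on the path; the helper names (`bpg_encode_cmd`, `bpg_decode_cmd`, `round_trip`) and the output file naming are hypothetical and introduced only for illustration. The range check follows the set Q ∈ {1, . . . , 51} used in this paper.

```python
import shutil
import subprocess
from pathlib import Path

Q_MIN, Q_MAX = 1, 51  # range of the Q parameter considered in the text


def bpg_encode_cmd(src, dst, q, chroma="422"):
    """Build a bpgenc command line for an integer Q (larger Q, higher CR)."""
    if not isinstance(q, int) or not Q_MIN <= q <= Q_MAX:
        raise ValueError(f"Q must be an integer in [{Q_MIN}, {Q_MAX}], got {q!r}")
    # -q sets the quantizer parameter, -f the chroma format, -o the output file
    return ["bpgenc", "-q", str(q), "-f", chroma, "-o", dst, src]


def bpg_decode_cmd(src, dst):
    """Build a bpgdec command line that decodes a BPG file to PNG."""
    return ["bpgdec", "-o", dst, src]


def round_trip(src_png, q, workdir="."):
    """Compress and decompress one image; returns (bpg_path, decoded_png_path)."""
    if shutil.which("bpgenc") is None or shutil.which("bpgdec") is None:
        raise RuntimeError("bpgenc/bpgdec not found on PATH")
    bpg = str(Path(workdir) / f"q{q}.bpg")   # hypothetical naming scheme
    dec = str(Path(workdir) / f"q{q}.png")
    subprocess.run(bpg_encode_cmd(src_png, bpg, q), check=True)
    subprocess.run(bpg_decode_cmd(bpg, dec), check=True)
    return bpg, dec
```

The decoded PNG returned by `round_trip` is the image on which the quality metric would then be computed against the original.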
The BPG performance has already been compared to many other coders in terms of standard and visual quality metrics [62–64]. It has been demonstrated that the BPG outperforms JPEG and JPEG2000 [62]. It also performs approximately at the same level as modern learning-based autoencoders [63,64].
In addition, a special comparison has been carried out for the considered case, i.e., for the MDSI metric and color RS images. Two 512 × 512 pixel color images of simple and complex structure (Frisco and Diego, respectively), shown in Figure 4, have been used. The CR values have been obtained for the BPG (version 4:2:2), the 3D version of the AGU coder [65], JPEG, and JPEG2000 [66]. The data are collected in Table 1. Their analysis shows that the BPG slightly outperforms the 3D AGU and JPEG2000 for the middle values of the Q parameter (high quality of compressed images) and has obvious benefits for large Q values (low quality of the compressed image). Additional data on the comparison of coders can be found in [32,33], showing that better performance of a coder usually leads to better classification.

Calculation of the Average Rate-Distortion Curve According to the MDSI
The prime task in the implementation of the two-step method for the BPG coder is to obtain the average rate-distortion curve for the considered visual quality metric (MDSI). Based on the analysis provided in Section 3, this average rate-distortion curve results from statistical data collected for a certain number of images. In this paper, twelve 1024 × 1024 pixel three-channel RS images taken from the USC-SIPI dataset (volume 2: Aerials) (the USC-SIPI image database is available at: http://sipi.usc.edu/database/database.php?volume=aerials (accessed on 23 February 2022)) have been chosen as the basic image set, shown in Figure 5. Then, the compression and decompression have been carried out for each image with all possible values of the Q parameter used as the CCP.
To obtain the average rate-distortion curve, MDSI values have been calculated for all 612 compressed images (12 original images and 51 values of the Q parameter) and then averaged over the images for each individual Q value. The obtained curves are presented in Figure 6 independently for each of the images used in the experiments. The obtained curves may be divided into four areas for further analysis. Consequently, the optimal work area of the Q parameter seems to be [26, 40], and the corresponding range of the MDSI values is [0.1, 0.25].
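The averaging described above can be sketched as follows. This is a minimal illustration that assumes the MDSI values have already been measured and stored per image as a mapping from Q to MDSI; the function name `average_rd_curve`, the data layout, and the numbers in the usage example are hypothetical, not measured values.

```python
from statistics import mean


def average_rd_curve(mdsi_by_image):
    """Average per-image MDSI-on-Q curves into one average curve.

    mdsi_by_image: dict mapping an image name to a dict {Q: MDSI}.
    Returns a dict {Q: mean MDSI over all images}, restricted to the Q values
    available for every image. At least one image is expected.
    """
    curves = list(mdsi_by_image.values())
    common_q = sorted(set.intersection(*(set(c) for c in curves)))
    return {q: mean(c[q] for c in curves) for q in common_q}


# Usage with made-up numbers for two images and three Q values
example = {
    "RSI_a": {30: 0.12, 35: 0.16, 40: 0.21},
    "RSI_b": {30: 0.14, 35: 0.19, 40: 0.25},
}
average_curve = average_rd_curve(example)
```

For the basic image set, `mdsi_by_image` would hold 12 entries with 51 values each, so the resulting curve has one averaged MDSI value per Q.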
The two-step compression can be carried out for the BPG using the average rate-distortion curve (shown in Figure 6), but the implementation differs from the previous scheme. Formula (1) is not needed and Q_init may be set as the nearest integer Q producing the desired MDSI value according to the average rate-distortion curve. In other words, Q_init equals Q_est. Additionally, since the value of Q after the correction according to Equation (3) is, most probably, non-integer, it has to be rounded to the nearest integer. More detailed experimental results obtained for the basic image set are provided in Appendix A (Table A2).
Since the curves in Figure 6 partly overlap, the MDSI values obtained for ten values of the Q parameter for the considered test images are also presented in Table 2. Whilst the MDSI values are almost the same for small Q values (where the introduced distortions are anyway invisible), considerable differences can be observed for large Q (e.g., data for Q equal to 40 and 45, which are of practical interest).
Table 2. Results of the MDSI metric obtained for 10 values of the Q parameter using the basic images (RSI #1-#12) presented in Figure 6.
To obtain an initial insight about the CR and its variation depending on image properties, Figure 7 illustrates the corresponding data. Whilst the CR values are small and do not vary significantly for small Q values, they are quite large (significantly larger than for lossless compression) and vary strongly for the Q values starting from 20. This explains our special interest in the Q values within the limits from around 20 to approximately 45, corresponding to MDSI values from around 0.07 to 0.28. The pseudo-code illustrating the main idea of the method is provided in Algorithm 1.
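The main idea of the two-step procedure for an integer-valued Q can be sketched in code as follows. This is an illustration under stated assumptions rather than the exact algorithm of the paper: the correction is assumed to be linear, using the slope of the average rate-distortion curve at the starting Q estimated by finite differences, and the average curve is assumed to be given for at least two consecutive integer Q values. The names `slope_at`, `two_step_q`, and the callback `measure_mdsi` (which stands for compressing with a given Q, decompressing, and computing MDSI) are hypothetical.

```python
def slope_at(curve, q):
    """Finite-difference estimate of d(MDSI)/dQ on the average curve at q."""
    qs = sorted(curve)
    lo = max(q - 1, qs[0])
    hi = min(q + 1, qs[-1])
    return (curve[hi] - curve[lo]) / (hi - lo)


def two_step_q(curve, measure_mdsi, mdsi_des):
    """Return (Q, MDSI) finally used for one image.

    curve: dict {Q: average MDSI} obtained offline.
    measure_mdsi: callable Q -> MDSI of the given image compressed with Q.
    mdsi_des: desired MDSI value.
    """
    qs = sorted(curve)
    # Step 1: nearest integer Q on the average curve for the desired MDSI
    q_init = min(qs, key=lambda q: abs(curve[q] - mdsi_des))
    mdsi_init = measure_mdsi(q_init)
    # Step 2: linear correction along the average curve (assumed form),
    # rounded to an integer and kept within the available Q range
    slope = slope_at(curve, q_init)
    if slope <= 0:
        return q_init, mdsi_init
    q_des = round(q_init + (mdsi_des - mdsi_init) / slope)
    q_des = max(qs[0], min(qs[-1], q_des))
    if q_des == q_init:
        return q_init, mdsi_init  # second compression is not needed
    return q_des, measure_mdsi(q_des)


# Usage with a synthetic average curve and a synthetic, more complex image
avg_curve = {q: 0.005 * q for q in range(1, 52)}
q_used, mdsi_used = two_step_q(avg_curve, lambda q: 0.006 * q, 0.25)
```

In the synthetic example, the starting Q taken from the average curve gives a too large MDSI for the image, and the correction moves Q downwards; when the corrected Q coincides with the starting one, the second compression is skipped.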

The Experimental Results
To evaluate the performance of the proposed two-step method for the BPG coder for three-channel RS images, some additional test experiments are necessary since the average rate-distortion curve model has been obtained only for the basic image set, which might not be representative enough. Consequently, the two-step compression method has first been applied to the basic image set to provide four typical values of the MDSI metric, representing the four classes provided in Section 4.2. These four typical values have been set as 0.1, 0.15, 0.2, and 0.25, respectively, and the achieved statistical data are shown in Table 3, where M_des denotes the desired value of the MDSI metric, VAR_1 stands for the variance of MDSI provided in the first step, and VAR_2 is the variance of MDSI provided in the second step. For a better understanding of the data, the mean MDSI values finally provided in the second step are given as well, denoted as MEAN_2.
The analysis of the data provided in Table 3 leads to the conclusion that the variance after the second step of compression has decreased by approximately one order of magnitude for each desired value. This indicates that the proposed two-step procedure works well in the considered conditions. It can also be noticed that both variances VAR_1 and VAR_2 tend to increase if the desired MDSI increases. This means that the task of providing the desired MDSI is more important for larger M_des values, e.g., 0.2 or 0.25 (for M_des = 0.1, the distortions are invisible and they remain invisible if the desired MDSI is provided with an error of around 0.01).
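The statistics reported for each desired value can be computed as in the following sketch. It assumes that the variances are sample variances of the MDSI values around their own mean, which is an assumption, since the exact estimator is not specified here; the function name `step_statistics` and the numbers in the usage example are made up for illustration only.

```python
from statistics import mean, variance


def step_statistics(mdsi_step1, mdsi_step2, m_des):
    """Summarize the MDSI provided after the first and the second step.

    mdsi_step1, mdsi_step2: MDSI values per image (same image order).
    m_des: desired MDSI value.
    """
    var_1 = variance(mdsi_step1)   # sample variance (n - 1), assumed estimator
    var_2 = variance(mdsi_step2)
    mean_2 = mean(mdsi_step2)
    return {
        "VAR_1": var_1,
        "VAR_2": var_2,
        "MEAN_2": mean_2,
        "ABS_ERR": abs(m_des - mean_2),   # |M_des - MEAN_2|
        "VAR_RATIO": var_1 / var_2 if var_2 > 0 else float("inf"),
    }


# Usage with made-up MDSI values for three images and M_des = 0.25
stats = step_statistics([0.20, 0.25, 0.30], [0.24, 0.25, 0.26], 0.25)
```

A `VAR_RATIO` of around 10 or more corresponds to the decrease of the variance by about one order of magnitude discussed above.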
The mean absolute error of the desired quality, calculated as |M_des − MEAN_2|, does not exceed 0.034, and its value increases as the desired visual quality decreases, which is similar to the trend observed in previous works with the other coders [35,54].
To verify the representativeness of the basic set, another 12 RS images, also taken from the USC-SIPI dataset, have been chosen as the test image set, shown in Figure 8 (further referred to as RSI #13-#24). Then, the two-step compression method has been applied to these images to verify the correctness and universality of the previously obtained curve model, leading to the statistical data shown in Table 4. As shown in Table 4, for each desired MDSI value, the variance after the second step of compression has also decreased by approximately an order of magnitude, and the mean error does not exceed 0.035. As one can see, the tendencies and values are similar to those observed in Table 3, so the basic image set has been chosen correctly to obtain the average rate-distortion curve and this model works well for other three-channel RS images.
To analyze the data for the 12 test images (RSI #13-#24) in detail, results obtained for M_des = 0.25 are presented in Table 5, where Q_init denotes the parameter Q used for the first step of compression. It comes from the average rate-distortion curve and equals 45 for all images. As may be seen in Table 5, although the initial MDSI values are different for individual test images, their variance after the second step has decreased significantly, being only 1/15 of its value after the first step. The mean value of the MDSI after the second step is also noticeably closer to 0.25; hence, its average relative error has also decreased (from 2.08% to 1.4%). In general, the accuracy has improved considerably due to the second step of compression. Meanwhile, there are cases when the MDSI after the second step is the same as for the first step, e.g., this happens for RSI #13. This means that there is no need to correct the Q parameter and apply the second step of compression in such cases. As shown in Table 5, 5 out of 12 images needed only one-step compression to meet the quality requirements, whilst the other seven images needed the second step to improve the accuracy. For all verification experiments (carried out for all 24 images and four desired MDSI values), 28.1% of images needed only the first step of the two-step compression to provide the desired visual quality. The other tables with more detailed results, both for the basic and the test image sets, achieved for different desired MDSI values, are provided in Appendix A (Tables A3-A9).

Discussion
To analyze the accuracy of the provided visual quality for the BPG-based lossy compression of three-channel remote sensing images, three images (RSI #13, #14, and #16) have been selected as representative examples with the desired visual quality (MDSI) equal to 0.25.
The decompressed images for the two-step compression method are shown in the middle (third) column in Figure 9. For the desired MDSI value equal to 0.25, the initial Q is equal to 45; the calculated Q_des values are different for different images (equal to 44, 45, and 47, respectively). The two images to the left of the third column are the images obtained if the parameter Q is set as Q_des − 2 and Q_des − 1, and the two images to the right are the images obtained when Q is set as Q_des + 1 and Q_des + 2.
For RSI #14, Q_init appears to be the appropriate value since, for any change of it (increase or decrease), the error ΔMDSI = MDSI_final − MDSI_des increases. For RSI #13, the parameter Q is corrected to 44, and in comparison to the four other values, compression with Q_des produces the MDSI_final that is the closest to MDSI_des. In contrast, for RSI #16, the initial Q is corrected to 47, and compression with this Q_des produces the smallest error between MDSI_final and MDSI_des. Concerning the error, for RSI #16 considered as an example, the provided MDSI_final is 0.2447, and the error between MDSI_final and MDSI_des is 0.0053. Figure 9 shows five decompressed images resulting from RSI #16, where images compressed with two values of Q differing by unity seem to be practically identical, but if Q differs by 2 or more, e.g., for RSI #16 (45,
Figure 9. The sample decompressed images obtained for the Q values around Q_des. The values provided in brackets concern Q, MDSI, and CR values, respectively.
In summary, for images where the second step is necessary, regardless of whether the correction is forward or reverse (initial Q is increased or decreased), it has a positive impact and eventually provides the visual quality that is the closest to the desired one. Additionally, the CR values provided by lossy compression in the neighborhood of the distortion invisibility threshold are considerably higher than those achievable using lossless compression.
To analyze the computational efficiency of the proposed approach, some tests have also been performed using a notebook with an Intel® Core™ i7-4710HQ CPU @ 2.50 GHz and 16.0 GB RAM, running the 64-bit Windows 10 Pro operating system for the x64 processor architecture. For 512 × 512 pixel images, the compression time is from 0.02 s to 0.05 s depending on image complexity and the value of the parameter Q (more time is needed for images of more complex structure). The decompression time is from 0.006 s to 0.019 s (again, more time is spent on images of more complex structure). For 1024 × 1024 pixel images, the compression time is from 0.06 s to 0.12 s; the decompression time is considerably smaller (from 0.02 s to 0.06 s). The MDSI values can be calculated very quickly (the time for their calculation is only around 1.5 times longer than for the calculation of MSE).

Conclusions
In this paper, a two-step algorithm for providing the desired visual quality for the BPG-based lossy compression of three-channel remote sensing images has been proposed. The MDSI metric has been applied to evaluate the visual quality of the decompressed image. The main contributions of this paper concern the extensions of the basic two-step algorithm utilizing the MDSI metric and some features of the BPG encoder, e.g., integer form of the Q parameter, as well as the properties of three-channel RS images, such as high correlation of multi-channel data.
The MDSI metric has been studied to evaluate the quality of three-channel images, providing a reasonable operation range for the lossy compression of three-channel RS images.
Three visual quality levels have been proposed, corresponding to the MDSI values appropriate for excellent quality, good quality, as well as middle and bad quality, respectively.
Experimental results have demonstrated the effectiveness of the proposed algorithm. It allows images to be compressed quickly and with acceptably small errors with respect to the desired quality characterized by the MDSI metric. If the parameter Q calculated for the second step is equal to the initial Q, the second step can be skipped, and the procedure may be accelerated for some similar images or video frames. Otherwise, the second step is needed to improve the accuracy with the corrected Q value. Statistical data show that, due to the second step, the accuracy is considerably improved and the provided visual quality is very close to the desired one. Our methodology is quite general and can also be applied to some other metrics having similar performance to MDSI.
In the future, it is expected that BPG-based lossy compression can be applied to provide the desired characteristics of the classification of decompressed images. A study concerning the impact of visual quality evaluated by the MDSI on the accuracy of classification is planned, as well as a discussion of the applicability of the two-step algorithm for the BPG coder in the classification task of high-resolution multi-channel remote sensing images.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A
The tables provided in Appendix A contain the performance of the selected elementary metrics for the KADID-10k dataset (Table A1), experimental results obtained for the basic image set (Table A2), as well as the results achieved for various desired MDSI values.