Lossy Compression of Multichannel Remote Sensing Images with Quality Control

Lossy compression is widely used to decrease the size of multichannel remote sensing data. Alongside this positive effect, lossy compression may lead to a negative outcome such as worse image classification. Thus, if possible, lossy compression should be carried out carefully, controlling the quality of compressed images. In this paper, the dependence between the classification accuracy of maximum likelihood and neural network classifiers, applied to three-channel test and real-life images, and the quality of compressed images, characterized by standard and visual quality metrics, is studied. The following is demonstrated. First, classification accuracy starts to decrease faster when image quality, degraded by an increasing compression ratio, reaches the distortion visibility threshold. Second, the classes with a wider distribution of features start to “take pixels” from classes with narrower distributions of features. Third, classification accuracy might depend essentially on the training methodology, i


Introduction
Nowadays, remote sensing (RS) is used in numerous applications [1][2][3] for the following main reasons. Different types of useful information can potentially be retrieved from RS images, especially high resolution and multichannel data (i.e., a set of co-registered component images of the same territory acquired for different wavelengths, polarizations, or even by different sensors [3][4][5][6]). Modern RS sensors often offer a possibility of fast and frequent data collection; good examples are the multichannel sensors Sentinel-1 and Sentinel-2, which have been launched recently and have started to produce a great amount of valuable RS data [7,8].
Therefore, the volume of RS data greatly increases due to the aforementioned factors: better spatial resolution, a larger number of channels, and more frequent observations. This causes challenges that relate to all basic stages of RS data processing: co-registration, calibration, pre- or post-filtering, compression, segmentation, and classification [9,10]. One of the most serious challenges is the compression of multichannel RS images [11][12][13]. Compression is applied to diminish data size before downlink transfer from spaceborne (more rarely, airborne or UAV) sensors, to store acquired images in on-ground RS data collection centers or special depositories, and to pass images to potential customers. Lossless compression is often unable to meet the requirements on compression ratio (CR), since, even in the most favorable situations of high inter-band correlation of component images [14], the CR attained by the best existing lossless compression techniques reaches only 4…5 [12].
Lossy compression can provide considerably larger CR values [12,13,[15][16][17] but at the expense of introduced distortions. It is always a problem to reach an appropriate compromise between compressed image quality (characterized in many different ways) and CR [11][12][13][18][19][20]. There are several reasons behind this. First, compression providing a fixed CR, often used in practice [21,22], leads to compressed images whose quality can vary within wide limits [23]. For a given CR, introduced distortions are smaller and compressed image quality is higher for images with a simpler structure, whilst for images having a more complex structure (containing many small-sized details and textures), losses are larger and image quality can thus be inappropriate, since some important information is inevitably lost. Certainly, some improvements can be reached by employing a better coder, adapting to image content [24], exploiting inter-channel correlation through three-dimensional (3D) compression [12,13,21,22], or by other means. However, the positive effect can be limited, and/or there can be restrictions, e.g., the necessity to apply an image compression standard [12,19,21].
Thus, we are more interested in a different approach that presumes lossy compression of multichannel RS images with a certain control of introduced distortions, combined with a simultaneous desire to provide a larger CR. Let us explain the advantages of such an approach, when it can be reasonable, and what the conditions for its realization are. Recall that RS data compressed in a lossy manner can be useful if: (a) introduced losses do not lead to a substantial degradation in solving the final tasks of RS image processing, such as segmentation, object detection, spectral unmixing, classification, and parameter estimation [17,[25][26][27][28][29][30][31][32][33][34]; (b) distortions due to lossy compression do not manifest themselves as artifacts leading to undesired negative effects (e.g., ghost artifacts) in solving the aforementioned final tasks.
Below, we consider the impact of lossy compression on multichannel RS image classification [27,28,[33][34][35]. Specifically, we focus on the case of a limited number of channels, e.g., color, multi-polarization, or multispectral images, for the following reasons: (a) it is simpler to demonstrate the effects that lossy compression produces in images, and how these effects influence classification, for a small number of image components; (b) accurate classification of multichannel images with a small number of components is usually a more difficult task than classification of hyperspectral data (images with a relatively large number of components) because of the limited number of available features and the necessity to reliably discriminate classes in feature space.
Returning to the advantages of lossy compression of multichannel RS data, we can state the following. First, images compressed in a lossy manner might be classified better than original images [27,34]. Such an effect usually takes place if original images are noisy and coder parameters are adjusted so that the noise removal effect of lossy compression [13,[36][37][38] is maximal. However, even if noise is absent (or, more exactly, peak signal-to-noise ratio (PSNR) values in original images are high and noise is not seen in visualized component images), the accuracy of image classification may remain practically the same over a relatively wide range of CR [27,34,38].
The width of this range can differ [39]. It depends on the following main factors:
• The classifier used, how efficient it is, and how it is trained;
• The applied compression technique (coder);
• How well classes are discriminated, and which features are used;
• How many classes are present in each multichannel image;
• How "complex" the considered image is (whether it is highly textural, whether it contains many small-sized or prolonged objects, and what the mean size of fragments belonging to each class is).
Therefore, researchers have a wide field of studies aimed at understanding how to set a CR or a coder's parameter that controls compression (PCC) for a given multichannel image, lossy compression technique, and classifier. Here we would like to recall some already known aspects and dependencies that will be exploited below. First, for the same CR, considerably larger distortions are introduced into images with higher complexity [40]. Moreover, larger distortions are introduced into more complex and/or noisier images for the same PCC, for example, the quantization step (QS) of coders based on orthogonal transforms (discrete cosine transform (DCT) or wavelets [40]). Second, the desired quality of a compressed image (a value of a metric that characterizes image quality) can nowadays be predicted and provided with appropriate accuracy [40,41], at least for some DCT-based coders such as AGU [42]. Third, there exist certain dependencies between values of conventional (e.g., PSNR) and/or visual quality metrics and the accuracy of correct classification of an entire image or of particular classes [43]. Classification accuracy for classes mainly represented by large homogeneous objects (water surfaces, meadows) correlates better with conventional metrics, whilst the probability of correct classification for classes mainly represented by textures, small-sized and prolonged objects (forests, urban areas, roads, narrow rivers) correlates with visual quality metrics. Besides, classification accuracies for different classes can depend on PCC or CR differently [38]: for the latter type of classes, the probability of correct classification usually decreases faster as CR increases. Fourth, different classifiers produce classification results that can significantly differ both in the obtained classification maps and in quantitative characteristics such as the total probability of correct classification, probabilities for classes, and confusion matrices [43,44]. Fifth, there are quite successful attempts to predict the total probability of correct classification (TPCC) of compressed RS images based on fractal models [28].
Aggregating all these, we assume the following:
• There are strict dependencies between distortions due to lossy compression (that can be characterized by certain metrics) and TPCC for multichannel RS images;
• Having such dependences, it is possible to recommend what metric values must be provided to ensure practically the same TPCC for compressed data as for original data;
• It is possible to provide the recommended metric values, at least for some modern compression techniques based on DCT;
• Metric thresholds might depend upon the classifier used and the properties of the multichannel image subject to lossy compression;
• Classifier performance might also depend on how it is trained (for example, using an original or compressed multichannel image).
The goal of this paper is to carry out a preliminary analysis of whether these assumptions are valid. The main contributions are the following. First, we show that it is worth applying lossy compression with distortions around their invisibility threshold to provide either some improvement of TPCC (achieved for simple structure images) or an acceptable reduction of TPCC (for complex structure images) compared to TPCC for the corresponding original image. Moreover, we show how such compression can be carried out for a particular coder. Second, we demonstrate that classifier training on compressed data usually provides slightly better results than training on original (uncompressed) images.
The paper is structured as follows. Section 2 deals with a preliminary analysis of the performance criteria used in lossy compression of RS images. In Section 3, two popular methods of RS data classification, based on the maximum likelihood method (MLM) and a neural network (NN), are described. Section 4 contains the results of classifier testing for the synthetic three-channel test image. Section 5 provides the results of experiments carried out for a real-life three-channel image. Discussion and analysis of some practical peculiarities are presented in Section 6; the conclusions are given in Section 7.

Performance Criteria of Lossy Compression and Their Preliminary Analysis
Consider an image $I^{t}_{ij}$, where $i, j$ denote pixel indices ($i = 1, \ldots, I_{Im}$; $j = 1, \ldots, J_{Im}$) and $I_{Im}$, $J_{Im}$ define the processed image size. If one deals with a multichannel image, a channel index $q$ can be added. Let us start with the case of a single-component image. To characterize compression efficiency, we have used the following three metrics. The first one, the traditional PSNR, is defined by

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{DR^2}{MSE}\right), \qquad MSE = \frac{1}{I_{Im} J_{Im}} \sum_{i=1}^{I_{Im}} \sum_{j=1}^{J_{Im}} \left(I^{c}_{ij} - I^{t}_{ij}\right)^2, \quad (1)$$

where DR denotes the range of image representation, $I^{c}_{ij}$ is the compressed image, and MSE is the mean square error of distortions introduced by lossy compression. In general, DR in remote sensing applications might differ considerably from 255, which is common for processing RGB color images. Below, we will mainly consider single and multichannel RS images having DR = 255 but will also analyze what should be done if DR is not equal to 255.
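Equation (1) can be computed directly. The sketch below is a minimal illustration, assuming images stored as NumPy arrays and a multichannel layout of shape (I_Im, J_Im, Q); the function names `psnr` and `psnr_per_channel` are hypothetical, not part of any tool used by the authors.

```python
import numpy as np

def psnr(original, compressed, dr=255.0):
    """PSNR in dB as in Equation (1); dr is the range of image representation DR."""
    original = np.asarray(original, dtype=np.float64)
    compressed = np.asarray(compressed, dtype=np.float64)
    mse = np.mean((compressed - original) ** 2)   # MSE over all I_Im x J_Im pixels
    if mse == 0:
        return float("inf")                        # identical images
    return float(10.0 * np.log10(dr ** 2 / mse))

def psnr_per_channel(original, compressed, dr=255.0):
    """Component-wise PSNR for an (I_Im, J_Im, Q) multichannel image, one value per q."""
    q_count = original.shape[-1]
    return [psnr(original[..., q], compressed[..., q], dr) for q in range(q_count)]
```

Passing `dr` explicitly keeps the same function usable for data whose DR is not 255.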
PSNR is the conventional metric widely used in many image processing applications. Meanwhile, the drawbacks of PSNR are well known. The main drawback of PSNR is that it is not adequate in characterizing image visual quality [45,46]. Currently, there are numerous other, so-called HVS (human vision system) metrics, that are considered to be more adequate [45][46][47]. Some of the existing HVS-metrics can be applied only to color images. As we need metrics that are good enough and applicable to single component images, we apply the metrics PSNR-HVS and PSNR-HVS-M. The former HVS-metric takes into account the lower sensitivity of human vision to distortions in high-frequency spectral components; the latter also takes into consideration the masking effect of image texture and other heterogeneities [47]. These metrics are defined by

$$\text{PSNR-HVS} = 10 \log_{10}\left(\frac{DR^2}{MSE_{HVS}}\right), \quad (2)$$

$$\text{PSNR-HVS-M} = 10 \log_{10}\left(\frac{DR^2}{MSE_{HVS\text{-}M}}\right), \quad (3)$$

where $MSE_{HVS}$ and $MSE_{HVS\text{-}M}$ are MSEs determined considering the aforementioned effects.
If PSNR-HVS is approximately equal to PSNR, then the distortions due to lossy compression have properties similar to additive white Gaussian noise. If PSNR-HVS is smaller than PSNR, the distortions are more similar to non-Gaussian (heavy-tailed) noise, spatially correlated noise, or both [47] (see also [48] for details). If PSNR-HVS-M is considerably larger than PSNR, then the masking effect of image content is significant (most probably, the image is highly textural). In turn, if PSNR-HVS-M is approximately equal to or smaller than PSNR, then a masking effect is absent and/or the distortions are like spatially correlated noise. Figure 1a,c,e present three grayscale test RS images of different complexity: the image Frisco has the simplest structure whilst the image Diego is the most textural. Figure 1b,d,f represent the dependences of the metrics in Equations (1)-(3) on the quantization step, which serves as PCC in the coder AGU [42] (this compression method employs 2D DCT in 32 × 32 pixel blocks, an advanced algorithm for coding uniformly quantized DCT coefficients, and embedded deblocking after decompression). As one may expect, all metrics become smaller (worse) as QS increases. Meanwhile, for the same QS, compressed image quality is not the same. For example, for QS = 20, PSNR and PSNR-HVS-M for the test image Frisco are practically the same (about 40 dB). For the test images Airfield and Diego, which have more complex structures, PSNR-HVS-M values are also about 40 dB whilst PSNR values are about 34 dB. This means the following: (1) there are masking effects, i.e., textures and heterogeneities still substantially mask the introduced distortions; (2) the introduced distortions can be noticed by visual inspection (comparison of original and compressed images). Recall that distortion visibility thresholds are about 35…38 dB according to the metric PSNR and about 40…42 dB according to the metric PSNR-HVS-M [45]. Note that distortions are invisible if QS is smaller than 18…20 for grayscale images represented as 8-bit data arrays [23] (more generally, if QS ≤ DR/(12…13)). One should also keep in mind that the same QS leads to considerably different CR values. For example, for QS = 20, CR values are equal to 26.3, 4.6, and 4.5 for the test images Frisco, Airfield, and Diego, respectively. These facts and the dependences given in Figure 1 confirm the basic statements presented in the Introduction.
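The general rule QS ≤ DR/(12…13) makes it easy to transfer the 8-bit guideline to data with a different DR. A minimal sketch, assuming DR = 2^b − 1 for b-bit data (an assumption for illustration; in practice DR may be taken as the actual range of values); the function name is hypothetical. For DR = 255 the rule gives about 19.6…21.3, slightly above the empirical 18…20 quoted for 8-bit data, so the lower end is the more conservative choice.

```python
def invisibility_qs_bounds(dr):
    """Return (DR/13, DR/12): QS values near the distortion invisibility threshold.

    Generalizes the 8-bit guideline (QS below about 18...20 for DR = 255).
    """
    return dr / 13.0, dr / 12.0

# Hypothetical data ranges: 8-bit, 10-bit and 12-bit imagery with DR = 2**bits - 1
for bits in (8, 10, 12):
    dr = 2 ** bits - 1
    low, high = invisibility_qs_bounds(dr)
    print(f"{bits}-bit (DR = {dr}): QS <= {low:.1f} ... {high:.1f}")
```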
Alongside the differences in metric values for QS ≈ 20 considered above (similar differences take place for QS > 20; compare the corresponding plots in Figure 1b,d,f), there are interesting observations for QS ≤ 10 (more generally, QS ≤ DR/25). In this case, $MSE \approx QS^2/12$ [20], PSNR exceeds 39 dB, and PSNR-HVS-M is not smaller than PSNR and exceeds 45 dB. Thus, the introduced distortions are not visible, and the desired $PSNR_{des}$ (or, respectively, $MSE_{des}$) can be easily provided by a proper setting of QS as $QS \approx (12\,MSE_{des})^{1/2}$.
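The setting $QS \approx (12\,MSE_{des})^{1/2}$ follows from inverting Equation (1) for the target MSE. The sketch below computes QS for a desired PSNR and checks the $QS^2/12$ model by quantizing synthetic values; it is only an illustration under the stated small-QS assumption (QS ≤ DR/25), not the AGU coder, and the function names are hypothetical. For an orthonormal transform such as the DCT, the quantization MSE of the coefficients carries over to the pixel domain, which is why the model applies to DCT-based coders (approximately, since AGU also applies deblocking).

```python
import math
import numpy as np

def qs_for_target_psnr(psnr_des_db, dr=255.0):
    """QS giving a desired PSNR under the approximation MSE ~= QS^2 / 12.

    Valid only in the small-QS regime discussed above (QS <= DR/25).
    """
    mse_des = dr ** 2 / 10.0 ** (psnr_des_db / 10.0)   # invert Equation (1)
    return math.sqrt(12.0 * mse_des)                    # QS ~= (12 MSE_des)^(1/2)

def uniform_quantization_mse(values, qs):
    """Empirical MSE of uniform (mid-tread) quantization with step qs."""
    reconstructed = qs * np.round(values / qs)
    return float(np.mean((values - reconstructed) ** 2))

qs = qs_for_target_psnr(42.0)
print(f"QS for PSNR_des = 42 dB: {qs:.2f}")   # about 7.02

# Check the QS^2/12 model on synthetic, densely spread values
rng = np.random.default_rng(0)
coeffs = rng.uniform(-500.0, 500.0, size=100_000)
print(uniform_quantization_mse(coeffs, 8.0), 8.0 ** 2 / 12)
```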
Certainly, numerous quality metrics have been proposed so far. Below we carry out our study based on PSNR-HVS-M for the following reasons. First, it is one of the best component-wise quality metrics [49] and can be calculated efficiently. Second, the cross-correlation between the best existing quality metrics is high (see data in the Supplementary Materials Section). Therefore, other quality metrics can be used in our approach provided that a corresponding preliminary analysis of their properties is carried out. Note that we have earlier widely used PSNR-HVS-M in image compression and denoising applications [23] and are well acquainted with its main properties.

Considered Approaches to Multichannel Image Classification
As mentioned above, there are numerous approaches to the classification of multichannel images. Below we consider two of them. The first one is based on the maximum likelihood method (MLM) [50,51] and the second one relies on neural network (NN) training [50]. In both cases, pixel-wise classification is studied. There are several reasons for using the pixel-wise approach and these particular classifiers: (a) to simplify the classification task and to use only Q features (q = 1, …, Q), i.e., the values of a given multichannel image in each pixel; (b) to show the problems of pixel-wise classification; (c) MLM and NN based classifiers are considered to be among the best ones [50].
Note that classifiers of RS data can be trained in different ways. One option is to train a classifier in advance using earlier acquired images, e.g., uncompressed ones stored for training or other purposes. Another option is to use a part of the obtained compressed image for classifier training and then apply the classifier to the entire image [39]. One can expect the classification results to differ; both options have advantages and drawbacks.

Maximum Likelihood Classifier
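The classification (recognition) function Φ for MLM is based on the calculation of some metric in the feature space: $\Phi(\vec{x}, \vec{x}_s)$ takes the value 1 if the similarity $\rho(\vec{x}, \vec{x}_s)$ satisfies the decision threshold $tr$ and 0 otherwise, where $\vec{x}$ and $\vec{x}_s$ are feature (attribute) vectors for the current and sample objects (image pixels in our case), $\rho(\cdot, \cdot)$ is the metric of vector similarity used, and $tr$ denotes the decision threshold.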
As components of the feature vector, in general, one can consider both original ones (pixel (voxel) values of a multichannel image) and "derivative" features calculated on their basis (e.g., ratios of these values).
If Φ = 1, then the current object (pixel), described by the feature vector $\vec{x}$, is assigned to the class $a_s$. Taking into account the stochastic nature of the features, each class $a_k$ is described by its conditional probability density function (PDF) $f(\vec{x} \mid a_k)$. This information is used for creating statistical decision rules

$$\Phi = \begin{cases} 1, & L = \dfrac{f(\vec{x} \mid a_s)}{f(\vec{x} \mid a_r)} \ge tr, \\ 0, & \text{otherwise}, \end{cases} \quad (5)$$

where L is the likelihood ratio. The threshold tr is determined by the statistical criterion used; for example, the Bayesian criterion yields the optimal classifier when the following information is available: PDFs for all sets of patterns, probabilities of each class occurrence, and losses associated with misclassifications.
In most practical cases, statistical sample information is used for object description, i.e., statistical estimates of PDFs obtained at the training stage are employed in the likelihood ratio L. Then, the sample size, the reliability of information about observed objects, and the efficiency of using this information in the decision rule mainly determine the quality of the decisions made.
Training samples are formed using pixels representing a given class; usually, this is some area (or areas, a set of pixels) in an image identified based on ground-truth data for the sensed terrain. The main requirement for the training samples is their representativeness: the pixels of the sample must correspond to one class on the ground, and such a class should occupy a territory that is fairly well represented by pixels in the image with a given resolution. In other words, the number of pixels in the selected area of the image should be sufficient for making statistically significant decisions.
When constructing a multi-alternative decision rule (K > 2), the maximum likelihood criterion is used; the threshold tr in Equation (5) is assumed to be equal to 1. The maximum likelihood criterion makes it possible to eliminate the uncertainty of the solution (when none of the classes can be considered preferable to the others), does not require knowledge of the a priori probabilities of the classes and the loss function, allows evaluating the reliability of the solutions, and can be easily generalized to the case of many classes. In accordance with this criterion, the control sample (measured values of features in the current image pixel) is assigned to the class $a_u$, 1 ≤ u ≤ K, for which the likelihood function (or rather, its estimate obtained at the training stage) is maximal:

$$u = \arg\max_{1 \le k \le K} \hat{f}(\vec{x} \mid a_k). \quad (6)$$

If the assumption about the normal distribution law of the feature vector holds, the maximum likelihood method in Equation (6) provides optimal recognition. The assumption of normality of features, on the one hand, simplifies the estimation of the parameters of distribution laws (DL) and makes it possible to take into account mutual correlation between data in different channels in class models. On the other hand, it is known that the distributions of real multichannel data are often non-Gaussian. Even if the observations are assumed to be approximately normal due to the central limit theorem, the nonlinearity introduced during the reception and primary processing of information, which manifests itself in a change in the shape and parameters of the observed distribution, cannot be ignored.
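Equation (6) reduces to an argmax over class likelihoods. The sketch below illustrates it under the normality assumption discussed above, with one multivariate Gaussian per class; it is not the Johnson-based model the text proposes next, and all function names are hypothetical. Log-likelihoods are compared instead of likelihoods for numerical stability, which does not change the argmax because the logarithm is monotonic.

```python
import numpy as np

def fit_gaussian_classes(samples_by_class):
    """Estimate (mean vector, covariance matrix) for each class from its training pixels.

    samples_by_class: list of K arrays, each of shape (n_k, c).
    """
    models = []
    for samples in samples_by_class:
        samples = np.asarray(samples, dtype=np.float64)
        models.append((samples.mean(axis=0), np.cov(samples, rowvar=False)))
    return models

def gaussian_log_likelihood(x, mean, cov):
    """log f(x | a_k) of a multivariate normal model for the rows of x, shape (n, c)."""
    c = mean.shape[0]
    diff = x - mean
    inv_cov = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = np.einsum("ni,ij,nj->n", diff, inv_cov, diff)
    return -0.5 * (mahalanobis + logdet + c * np.log(2.0 * np.pi))

def ml_classify(x, models):
    """Assign each feature vector to the class with the largest likelihood (Equation (6))."""
    x = np.atleast_2d(np.asarray(x, dtype=np.float64))
    scores = np.stack([gaussian_log_likelihood(x, m, s) for m, s in models], axis=1)
    return np.argmax(scores, axis=1)   # 0-based index; the text numbers classes 1..K
```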
To increase the accuracy of approximating empirical non-Gaussian distributions of features and to take into consideration the possible presence of strong correlation relationships, it is advisable to apply multiparameter distribution laws based on non-linear transformations of normal random variables.These transformations include Johnson's transformations.
For describing the spectral features of a class, we propose to apply the Johnson $S_B$ distribution [52,53]:

$$f(x) = \frac{\eta \lambda}{\sqrt{2\pi}\,(x - \varepsilon)(\varepsilon + \lambda - x)} \exp\left\{-\frac{1}{2}\left[\gamma + \eta \ln\frac{x - \varepsilon}{\varepsilon + \lambda - x}\right]^2\right\}, \quad \varepsilon < x < \varepsilon + \lambda, \quad (7)$$

where η and γ are two shape parameters connected with skewness and kurtosis (η > 0, γ ∈ (−∞, +∞)); ε and λ are the location and scale parameters of a random variable x (ε ∈ (−∞, +∞), λ > 0). Due to its large number of parameters, the model in Equation (7) is quite universal and able to approximate almost any unimodal distribution as well as a wide spectrum of bimodal distributions. Besides, since Johnson distributions are based on nonlinear transformations of normal random variables, their use in describing empirical distributions allows staying within the correlation theory framework, which is important for the considered features characterized by non-Gaussian distributions (see examples below) and strong inter-channel correlation. The multidimensional variant of Equation (7) is the following [52]:

$$f_c(\vec{x}) = \frac{1}{(2\pi)^{c/2} |\Xi|^{1/2}} \prod_{\nu=1}^{c} \frac{\eta_\nu \lambda_\nu}{(x_\nu - \varepsilon_\nu)(\varepsilon_\nu + \lambda_\nu - x_\nu)} \exp\left(-\frac{1}{2}\, \vec{z}^{\,T} \Xi^{-1} \vec{z}\right), \qquad z_\nu = \gamma_\nu + \eta_\nu \ln\frac{x_\nu - \varepsilon_\nu}{\varepsilon_\nu + \lambda_\nu - x_\nu}, \quad (8)$$

where c is the size of the feature vector $\vec{x}$ and Ξ is the sample correlation matrix. Thus, to estimate the parameters of these PDFs and construct a multidimensional statistical model of correlated data, processing consists of several stages. At the first stage, a traditional statistical analysis is performed by estimating the moments of distributions and histograms of the components of the random vector. Sample estimates of the cross-correlation coefficients of the components of $\vec{x}$ are also found. At the second stage, one-dimensional statistical models of the form of Equation (7) are constructed for each component of the multidimensional data. The initial estimate of the distribution density (the histogram of the feature $x_\nu$) is used as a target for the desired model $f(x_\nu)$ described by the Johnson DL parameter vector $\vec{\theta}$. The multidimensional statistical model is built based on multidimensional histograms $f_c(\vec{x})$.
To construct a multidimensional model as in Equation (8) for a class $a_k$, one needs to estimate the 4 × c matrix of distribution parameters $\Theta_k = [\varepsilon_{ck}, \lambda_{ck}, \eta_{ck}, \gamma_{ck}]$ and the sample correlation matrix Ξ.
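Given a 4 × c parameter matrix and a correlation matrix, the one- and multidimensional Johnson $S_B$ densities can be evaluated as sketched below. This is a minimal illustration, assuming the rows of `theta` are ordered [ε, λ, η, γ] and that Ξ is the correlation matrix of the normalized variables z (as in Johnson's translation system); parameter fitting to histograms is not shown, and the function names are hypothetical. SciPy's `scipy.stats.johnsonsb` offers the one-dimensional case with shape parameters a = γ, b = η, loc = ε, scale = λ.

```python
import numpy as np

def johnson_sb_z(x, eps, lam, eta, gam):
    """Normalizing transform z = gamma + eta * ln((x - eps) / (eps + lam - x))."""
    return gam + eta * np.log((x - eps) / (eps + lam - x))

def johnson_sb_pdf(x, eps, lam, eta, gam):
    """One-dimensional Johnson S_B density, Equation (7); zero outside (eps, eps + lam)."""
    x = np.atleast_1d(np.asarray(x, dtype=np.float64))
    out = np.zeros_like(x)
    inside = (x > eps) & (x < eps + lam)
    xi = x[inside]
    z = johnson_sb_z(xi, eps, lam, eta, gam)
    jacobian = eta * lam / ((xi - eps) * (eps + lam - xi))   # dz/dx
    out[inside] = jacobian * np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return out

def mv_johnson_sb_pdf(x, theta, xi_corr):
    """Multidimensional Johnson S_B density, Equation (8), for one feature vector.

    theta: (4, c) array with rows [eps, lam, eta, gam]; xi_corr: (c, c) matrix Xi.
    """
    x = np.asarray(x, dtype=np.float64)
    eps, lam, eta, gam = theta
    if np.any(x <= eps) or np.any(x >= eps + lam):
        return 0.0                                   # outside the bounded support
    z = johnson_sb_z(x, eps, lam, eta, gam)
    c = x.shape[0]
    jacobian = np.prod(eta * lam / ((x - eps) * (eps + lam - x)))
    quad = z @ np.linalg.solve(xi_corr, z)           # z^T Xi^{-1} z
    norm = (2.0 * np.pi) ** (-c / 2.0) / np.sqrt(np.linalg.det(xi_corr))
    return float(norm * jacobian * np.exp(-0.5 * quad))
```

With Ξ equal to the identity matrix, Equation (8) factorizes into the product of the c one-dimensional densities, which gives a simple sanity check.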
After the training samples have been created and the class models estimated, image pixels are assigned to the classes based on the decision rule in Equation (6). If pixel-wise classification is applied, data for each pixel with coordinates (i, j) are processed separately (independently). The value vector for the current pixel $\vec{x}^{*}_{ij}$ is substituted into the class models in Equation (8). The obtained results $f_c(\vec{x}^{*}_{ij} \mid a_k)$, k = 1, …, K, are compared with each other, and the maximal estimate of the likelihood function is chosen; its index is the number of the class to which the current pixel (i, j) is assigned.
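The pixel-wise procedure described above can be vectorized by reshaping the image into a list of feature vectors. A minimal sketch, assuming each class model is supplied as a callable returning log-likelihoods for an (n, c) array (for example, the logarithm of a fitted class density); the function name `classify_image` is hypothetical.

```python
import numpy as np

def classify_image(image, class_log_pdfs):
    """Pixel-wise maximum likelihood classification of an (H, W, c) image.

    class_log_pdfs: list of K callables, each mapping an (n, c) array of feature
    vectors to n log-likelihood values for one class a_k.
    Returns an (H, W) map of 0-based class indices.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)                 # each pixel treated independently
    scores = np.stack([f(pixels) for f in class_log_pdfs], axis=1)   # (H*W, K)
    return np.argmax(scores, axis=1).reshape(h, w)                   # maximal likelihood per pixel
```

Working with log-likelihoods avoids underflow for products of small densities; a zero density (pixel outside a bounded $S_B$ support) maps to −inf, which simply never wins the argmax unless all classes give zero.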

NN Classifier Description
For NN-based classification, a feedforward neural network is used. Nowadays, there exist many NNs and approaches to their training and optimization [54]. We have chosen a simple but efficient NN for our application that can be easily implemented on different platforms and devices. The employed classifier is a fully connected perceptron combined with a self-organizing map that processes the obtained weights, or probabilities of a pixel belonging to each of the classes. The architecture details are the following: an input layer for the three color components, and one hidden layer with 30 fully connected neurons (this number has been shown to perform as well as larger numbers of hidden neurons). The hidden layer uses the tanh activation function. Scaled conjugate gradient backpropagation is used as the global training function. As mentioned above, the output layer of the NN is a self-organizing map fully connected with the neurons of the hidden layer. It maps the output decision probabilities to the list of classes as vectors.
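The data flow through the 3-30-K architecture can be sketched as a forward pass. This is a hypothetical illustration only: weights are randomly initialized (scaled conjugate gradient training is not shown), and a softmax output replaces the self-organizing map layer described in the text. The class name `PixelMLP` and the default of seven classes are assumptions.

```python
import numpy as np

class PixelMLP:
    """Forward pass of a 3-30-K perceptron with a tanh hidden layer.

    Hypothetical sketch: random, untrained weights; a softmax stands in
    for the self-organizing map output layer used in the text.
    """

    def __init__(self, n_inputs=3, n_hidden=30, n_classes=7, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(n_inputs), (n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_classes))
        self.b2 = np.zeros(n_classes)

    def predict_proba(self, x):
        """x: (n, 3) color values scaled to [0, 1]; returns (n, K) class probabilities."""
        h = np.tanh(x @ self.w1 + self.b1)            # hidden layer with tanh activation
        logits = h @ self.w2 + self.b2
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, x):
        """0-based index of the most probable class for each pixel."""
        return np.argmax(self.predict_proba(x), axis=1)
```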
Training and validation have been performed in the following way. A given image is divided into a set of pixel color value vectors. The obtained set of vectors is randomly permuted before each training run. For NN training, we have taken 70% of this set; the other 30% of the data are used for validation. Validation on the same dataset at this first stage is performed to verify the chosen NN architecture, NN parameters, and other peculiarities of the training process. We have considered different configurations of NN-based classifiers, varying in particular the number of hidden layers, neurons, and connections among them. It has been established that more complex architectures, such as multi-layer perceptrons with several hidden layers, do not provide better classification efficiency and even cause over-fitting on the same training dataset. As a result, the chosen architecture was an NN with one hidden layer and an optimal number of training epochs equal to 50. The proposed classifier is easy to use and fast (which is crucial for the classification of remotely sensed data). The overall training process was repeated 100 times with full permutation of the dataset, and the obtained classifier has been applied to test images.
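The permutation and 70/30 split described above can be sketched as follows. The helper `split_pixels` and the random test image are hypothetical; only the 70% fraction and the 100 repetitions come from the text.

```python
import numpy as np

def split_pixels(features, labels, train_fraction=0.7, rng=None):
    """Randomly permute pixel vectors and split them into training and validation parts."""
    rng = np.random.default_rng() if rng is None else rng
    n = features.shape[0]
    order = rng.permutation(n)                  # fresh random order for every training run
    n_train = int(round(train_fraction * n))
    tr, va = order[:n_train], order[n_train:]
    return features[tr], labels[tr], features[va], labels[va]

# Usage sketch with a hypothetical labelled image (H, W, 3) and ground-truth map (H, W)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3))
truth = rng.integers(0, 7, size=(64, 64))
features, labels = image.reshape(-1, 3), truth.ravel()
for run in range(100):                          # 100 repetitions, each with a new permutation
    x_tr, y_tr, x_va, y_va = split_pixels(features, labels, rng=rng)
    # ... train the classifier on (x_tr, y_tr) and validate on (x_va, y_va)
```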

MLM Classifier Results
Creating a test multichannel image, we have taken into account the following typical properties of classes that might take place in real-life RS images:
Property 1. Features rarely have distributions close to Gaussian; since, in our case, color component values (see the test color image and its components in Figure 2a-d, respectively) are the used features, this property relates to them;

Property 2. Objects that relate to a certain class are rarely absolutely homogeneous (even a water surface usually cannot be considered absolutely homogeneous); many factors lead to diversity (variations) of pixel values, such as variations of physical and chemical characteristics of the reflecting or irradiating surface, noise, etc.;

Property 3. Features overlap (intersect) in the feature space; this property usually leads to misclassifications, especially if such overlapping takes place for many features;
Property 4. The numbers of pixels belonging to different classes in each RS image can differ greatly: the percentage of pixels belonging to one class may be less than 1%, whilst another class may account for tens of percent of the pixels;
Property 5. There can be objects of two or even more classes in one pixel [54,55], which requires the application of unmixing methods, but we do not consider such more complex situations in our further studies (usually more than three components of multichannel RS data are needed to carry out efficient unmixing).
Let us show that we have, to a greater or lesser extent, incorporated these properties into our test image. Table 1 contains data about seven classes, their histograms, and approximations using the Johnson $S_B$ distribution. Class 1 can be conditionally treated as "Roads", Class 2 as agricultural fields (Field-Y), Class 3 as agricultural fields of another color (Field-R), Class 4 as "Water", Class 5 as "Grass", Class 6 as "Trees", and Class 7 as some very textural or heterogeneous class like an urban area or vineyard ("Texture"). As one can see, Property 1 (non-Gaussian distribution) holds for practically all particular distributions. Property 2 (internal heterogeneity) takes place for all classes, especially for Classes 2 and 7. Property 3 (intersection of features) takes place for many classes: all three features overlap for Classes 1 and 2 and for Classes 5 and 6, and features of Class 7 overlap with practically all other classes (intersection of features can also be seen from a comparison of component images where some classes can hardly be distinguished, e.g., Classes 2 and 4 in the Green component, Figure 2c). Property 4 is also observed: there is a small percentage of pixels belonging to Class 1, whilst quite many pixels belong to Classes 4, 5, and 7.
The MLM classifier applied pixel-wise to the original image (Figure 3a) produces the classification map given in Figure 3b. As one can see, there are quite many misclassifications. Classes 4 and 6 are classified best (the probabilities of correct classification P44 and P66 are equal to 0.85 and 0.82, respectively), whilst Class 5 is recognized worst (P55 = 0.19). Classes 1 and 7 are not recognized well either (P11 and P77 are equal to 0.48 and 0.42, respectively). Classes 2 and 3 are characterized by P22 = 0.67 and P33 = 0.77, respectively.
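The probabilities Pii quoted above are the diagonal entries of a row-normalized confusion matrix: the share of reference pixels of class i that the classifier also assigns to class i. A minimal sketch, assuming the reference (ground-truth) map and the classification map have been flattened into equal-length label sequences; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_correct_probability(reference: Sequence[Hashable],
                                  predicted: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return {class i: P_ii}: the fraction of reference pixels of class i that
    were also labelled i, i.e. the diagonal of the row-normalised confusion matrix."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted maps must have equal length")
    totals = Counter(reference)  # number of reference pixels per class
    hits = Counter(r for r, p in zip(reference, predicted) if r == p)  # correctly labelled pixels
    return {c: hits[c] / n for c, n in totals.items()}
```

Normalizing by the class size, not by the image size, keeps a small class such as Class 1 from being swamped by large classes (Property 4).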
Such low probabilities of correct classification can be explained by several factors. Some of them have already been mentioned above: the intersections in the feature space are substantial. Another possible reason is that the MLM classifier relies on distribution approximations, and these approximations may be imperfect (see the histograms and their approximations in Table 1). Quite many pixels are erroneously assigned to Class 7 ("Texture"), whose distributions are very wide and intersect with those of the other classes.
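The pixel-wise maximum likelihood rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the three channels are treated as independent, so that the class likelihood is the product of per-channel densities (for example, the Johnson SB approximations of Table 1). The function name and the model layout (a list of one density callable per channel for each class) are hypothetical.

```python
import math
from typing import Callable, Dict, Hashable, Sequence

Pdf = Callable[[float], float]


def ml_classify_pixel(pixel: Sequence[float],
                      class_models: Dict[Hashable, Sequence[Pdf]],
                      floor: float = 1e-300) -> Hashable:
    """Return the class whose model gives the pixel the largest likelihood.

    Simplifying assumption: channels are treated as independent, so the class
    likelihood is the product of per-channel densities; log-densities are summed
    to avoid numerical underflow.
    """
    best_class, best_score = None, -math.inf
    for label, pdfs in class_models.items():
        if len(pdfs) != len(pixel):
            raise ValueError("one density per channel is required")
        # The floor keeps log() defined when a value lies outside a bounded support.
        score = sum(math.log(max(pdf(v), floor)) for pdf, v in zip(pdfs, pixel))
        if score > best_score:
            best_class, best_score = label, score
    return best_class
```

With a narrow class and a wide class, a pixel that fits neither model well goes to the wide class. This is a small-scale analogue of pixels being attracted to Class 7 ("Texture").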
Figure 2e,f presents images compressed to PSNR-HVS-M ≈ 36 dB and PSNR-HVS-M ≈ 30 dB. In these cases, CR values are about 4.5 and 8.9, respectively (for component-wise lossy compression). For PSNR-HVS-M ≈ 36 dB, the introduced distortions are visible [45]. They mostly manifest themselves as smoothing of variations in quasi-homogeneous regions (compare the fragments for Class 6 (dark blue) in the images in Figure 2a,e). Such suppression of noise or high-frequency variations is a known effect of lossy compression [36,37], and it might even have a positive effect on classification (under certain conditions). For PSNR-HVS-M ≈ 30 dB, the distortions due to lossy compression (see the compressed image in Figure 2f) manifest themselves as smoothing of variations and smearing of edges and objects (this can be seen well for Class 1). Clearly, such effects might have an impact on classification.
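PSNR-HVS-M, used throughout this paper, extends conventional PSNR with a DCT-domain model of contrast sensitivity and masking, and it is not reproduced here. For reference only, the sketch below computes conventional PSNR for 8-bit data and the compression ratio CR as the ratio of the original size to the compressed size. The function names are hypothetical.

```python
import math
from typing import Sequence


def psnr(original: Sequence[float], distorted: Sequence[float], peak: float = 255.0) -> float:
    """Conventional PSNR in dB between two equally sized, flattened images."""
    if len(original) != len(distorted) or not original:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """CR = size of the original data / size of the compressed data."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes
```

For instance, CR ≈ 4.5 means that the compressed data occupy roughly 22% of the original volume, and CR ≈ 8.9 roughly 11%.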
The probabilities of correct classification for the classes are given in Table 2. Note that the MLM classifier has been "trained" using distribution approximations obtained for the original (uncompressed) three-channel image. As one can see, the probabilities for particular classes depend on compressed image quality and compression ratio in different ways. P11 steadily decreases and becomes close to zero. This happens for two reasons. First, the features of Class 1 substantially intersect with those of Class 2. Second, due to lossy compression, the feature distributions after compression differ from those of the original image (for which the MLM classifier has been trained), and this difference grows as PSNR-HVS-M decreases and CR increases. This is illustrated by the distributions presented in Table 3 for Class 1 and, partly, in Table 4 for Class 2.
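The second reason can be illustrated with a deliberately simplified toy model that uses one feature and two classes. The ML rule is built from the training (original-image) class models, but it is applied to pixels whose distribution has shifted. The Gaussian class models and all numbers below are hypothetical and are not taken from Tables 3 and 4. The sketch shows only the mechanism.

```python
import math
from typing import Dict, Hashable, Tuple

Gauss = Tuple[float, float]  # (mean, standard deviation)


def gauss_logpdf(x: float, mu: float, sigma: float) -> float:
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def prob_correct(train: Dict[Hashable, Gauss], actual: Gauss, target: Hashable,
                 lo: float = 0.0, hi: float = 255.0, steps: int = 25500) -> float:
    """Probability that a pixel of class `target`, whose feature actually follows
    N(actual), is labelled `target` by an ML rule built from the `train` models.
    Uses midpoint-rule integration over the feature range [lo, hi]."""
    h = (hi - lo) / steps
    mu_a, sigma_a = actual
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        # The decision uses only the training (original-image) models.
        scores = {c: gauss_logpdf(x, *params) for c, params in train.items()}
        if max(scores, key=scores.get) == target:
            total += math.exp(gauss_logpdf(x, mu_a, sigma_a)) * h
    return total
```

With a narrow class N(60, 5) and a wider class N(80, 15), shifting the narrow class by 1.2 of its standard deviations toward the wider one lowers its probability of correct classification from about 0.95 to about 0.68. The wider class absorbs the lost pixels.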
Analysis of the data in Table 2 also shows the following. For some classes (Classes 3-6), a larger CR and a smaller PSNR-HVS-M (i.e., stronger lossy compression) result in a reduction of the probabilities of correct classification (see the columns for P33, P44, P55, and P66). Meanwhile, for other classes the probability of correct classification increases (see the data for P77). This is due to the decrease in misclassifications in the texture area (see the maps in Figure 3c,d). There is also one class (Class 2) for which P22 is largest at PSNR-HVS-M of about 40 dB (due to the noise filtering effect), but it decreases for smaller PSNR-HVS-M (larger CR).
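The noise filtering effect invoked for Class 2 can be pictured with a very crude proxy. Any operation that suppresses high-frequency variations, as DCT-based lossy coders tend to do, narrows the spread of pixel values inside a quasi-homogeneous region. The sketch below uses a 1-D moving average for this purpose. The function name is hypothetical, and the filter is not a model of the coder used in the paper.

```python
from statistics import pstdev
from typing import List, Sequence


def moving_average(signal: Sequence[float], width: int = 3) -> List[float]:
    """1-D box filter (valid part only): a crude low-pass proxy for the
    suppression of high-frequency variations by lossy compression."""
    if not 1 <= width <= len(signal):
        raise ValueError("window width must be between 1 and len(signal)")
    return [sum(signal[i:i + width]) / width for i in range(len(signal) - width + 1)]
```

An alternating ±3 pattern around a constant level has a standard deviation of 3. After a 3-sample average it drops to 1. This kind of narrowing is consistent with the observation that moderate compression may even benefit some classes.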
One probable reason why the MLM classifier trained on original images and applied to compressed ones loses efficiency is that the feature distributions observed for compressed data differ from those observed for the original RS data and used for classifier training.
Then, one can expect that the MLM classifier has to be trained for compressed images.We have carried out the corresponding study and the results are presented in Table 5.The conclusion Analysis of data in Table 2 also shows the following.There are some classes (Classes 3-6) for which a larger CR and smaller PSNR-HVS-M (that correspond to more sufficient lossy compression) results in a reduction of probabilities of correct classification (analyze data in columns for P 33 , P 44 , P 55, and P 66 ).Meanwhile, there are classes, for which probabilities of correct classification increase-see data for P 77 .This is because of the decrease in misclassifications in the texture area (see the maps in Figure 3c,d).There is also one class (Class 2) for which P 22 is the largest for PSNR-HVS-M about 40 dB (due to noise filtering effect) but it reduces for smaller PSNR-HVS-M (larger CR).
One probable reason why the MLM classifier trained for original images and applied to compressed ones loses efficiency is that feature distributions observed for compressed data differ from those ones observed for original RS data and used for classifier training.
Then, one can expect that the MLM classifier should be trained on compressed images. We have carried out the corresponding study, and the results are presented in Table 5. The conclusion that follows from the analysis of data in Table 5 might seem trivial: the MLM classifier should be trained on images compressed with the same characteristics. Following this rule leads to a considerable improvement in classification accuracy. For example, if original data are used in training and the MLM classifier is then applied to data compressed with PSNR-HVS-M = 42 dB, Pcc = 0.501. At the same time, if the image compressed with PSNR-HVS-M = 42 dB is used in training, Pcc increases radically to 0.579. Even more surprising results are observed for images compressed with PSNR-HVS-M equal to 36 and 30 dB: with the corresponding training, Pcc reaches 0.597 and 0.527, respectively. Note that Pcc = 0.627 for the original image classified after training on the original image. Thus, it turns out that for images compressed with PSNR-HVS-M about 40 dB, almost the same Pcc as for the original image is observed if the classifier is trained on the corresponding compressed images. Classification maps obtained with this training methodology are presented in Figure 4a,b. They can be compared to the maps in Figure 3c,d, respectively. This comparison clearly demonstrates that training on the corresponding compressed data is preferable: many classes are recognized considerably better. One interesting point is that many misclassifications might appear in the neighborhoods of sharp edges in multichannel images (near edges of Class 4). This is because such edges are smeared by lossy compression, and pixels with "intermediate values" that can be assigned to "wrong classes" appear.
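For readers who want to reproduce the train-on-compressed-data experiment, a minimal pixel-wise Gaussian maximum likelihood classifier can be sketched in plain Python. It assumes that the MLM classifier models each class by a three-dimensional Gaussian with equal priors, and it assumes that Pcc is the average of the per-class probabilities Pii; the paper's actual implementation and Pcc definition may differ (e.g., Pcc could be weighted by class sizes). The function names are hypothetical.

```python
import math

def mean_cov(samples):
    """Sample mean vector and unbiased covariance matrix of 3-channel pixels."""
    n = len(samples)
    mu = [sum(s[k] for s in samples) / n for k in range(3)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
            for j in range(3)] for i in range(3)]
    return mu, cov

def det3(m):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inv3(m):
    """Inverse of a 3x3 matrix computed as adjugate / determinant."""
    d = det3(m)
    adj = [
        [m[1][1] * m[2][2] - m[1][2] * m[2][1], m[0][2] * m[2][1] - m[0][1] * m[2][2], m[0][1] * m[1][2] - m[0][2] * m[1][1]],
        [m[1][2] * m[2][0] - m[1][0] * m[2][2], m[0][0] * m[2][2] - m[0][2] * m[2][0], m[0][2] * m[1][0] - m[0][0] * m[1][2]],
        [m[1][0] * m[2][1] - m[1][1] * m[2][0], m[0][1] * m[2][0] - m[0][0] * m[2][1], m[0][0] * m[1][1] - m[0][1] * m[1][0]],
    ]
    return [[adj[i][j] / d for j in range(3)] for i in range(3)]

def train_mlm(training):
    """training: dict class_id -> list of (R, G, B) pixels from the training fragments.
    Passing pixels of a compressed image trains the classifier for that compression level."""
    model = {}
    for c, samples in training.items():
        mu, cov = mean_cov(samples)
        model[c] = (mu, inv3(cov), math.log(det3(cov)))
    return model

def classify(model, x):
    """Assign pixel x to the class with the largest Gaussian log-likelihood (equal priors)."""
    def loglik(params):
        mu, icov, logdet = params
        d = [x[k] - mu[k] for k in range(3)]
        maha = sum(d[i] * icov[i][j] * d[j] for i in range(3) for j in range(3))
        return -0.5 * (maha + logdet)
    return max(model, key=lambda c: loglik(model[c]))

def confusion_and_pcc(model, verification):
    """Row-normalized confusion matrix P[true][predicted]; Pcc = mean of P[c][c] (assumed)."""
    classes = sorted(model)
    P = {}
    for c in classes:
        labels = [classify(model, x) for x in verification[c]]
        P[c] = {k: labels.count(k) / len(labels) for k in classes}
    pcc = sum(P[c][c] for c in classes) / len(classes)
    return P, pcc
```

To follow the recommendation above, `train_mlm` would be called with training pixels taken from the image compressed with the same PSNR-HVS-M (or QS) as the image to be classified, rather than from the original image.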

NN-Based Classification
Let us start by analyzing data for classifying the original three-channel image using NN training on this image. The obtained confusion matrix is presented in Table 6, and the corresponding map is given in Figure 5a. As one can see, several classes are recognized well: Field-R (Class 3), Water (Class 4), and Trees (Class 6). There are many misclassifications for the classes "Roads" (confused with the class Field-Y, which has practically the same colors) and "Texture" (its pixels can be erroneously assigned to the classes "Water", "Trees", "Grass", and, more rarely, Field-Y). The results are relatively poor for Class 2 and Class 5. Thus, even for this almost ideal case (the classified image is the same as the one used for training, and the NN-based classifier, which is able to work with non-Gaussian feature distributions, is applied), the classification results are far from perfect.
Taking into account the results for the MLM classifier, we also considered the case when the NN classifier is trained on the original image and then applied to compressed images. The data obtained for PSNR-HVS-M equal to 42 dB, 36 dB, and 30 dB are given in Tables 7-9, respectively. Analysis of data in Table 7 shows the following. As expected, all probabilities have changed. Let us consider the diagonal elements (marked in bold) that correspond to probabilities of correct classification of particular classes. Comparing the corresponding data in Tables 6 and 7 gives interesting observations: some probabilities have decreased considerably (see P11), and some have decreased slightly (P33, P44, P66, P77). Meanwhile, some probabilities have slightly increased (P22 and P55).
Analysis of data in Tables 8 and 9 shows that lossy compression negatively influences P11, i.e., the probability of correct classification for the class "Roads". Recall that the same effect has been observed for the MLM classifier. With a larger CR (smaller provided PSNR-HVS-M), probabilities P33 and P77 continue to decrease, P22 slowly increases, P44 remains practically the same and very high, P66 also remains practically the same, and P55 decreases. Thus, being trained on the original image, the NN classifier continues to work well for PSNR-HVS-M about 40 dB, and then its performance degrades faster with further increase of CR.

Figure 5 shows classification results for the test image compressed with three different values of PSNR-HVS-M. The results are quite similar. However, some differences can be noticed:

2. There are quite many misclassifications around the edge of the class "Water".
3. More misclassifications appear for the class "Grass" when CR increases.
4. Many pixels and even some objects that belong to Class 2 (Field-Y) are classified as "Texture" (white color). This example once more shows the problems in image classification if an image to be classified contains a class like "Texture". The problems occur for the class "Texture" itself and for the classes whose features intersect with those of the class "Texture".
Let us now check what happens if compressed images are used for training. Table 10 presents the confusion matrix obtained for NN training on the image compressed with PSNR-HVS-M = 42 dB. These data can be compared to the corresponding data in Table 7. If the compressed image is employed for training, probabilities P11, P55, and P77 are considerably better, probability P22 is considerably worse, and the other probabilities are slightly worse. Thus, the quality of classification, in general, slightly improves. The classification map is presented in Figure 6b.
The next considered case is the classification of the image compressed with PSNR-HVS-M = 36 dB. The obtained probabilities are given in Table 11, and the classification map is presented in Figure 6c. The probabilities of correct classification in Table 11 are almost the same as the corresponding values in Table 10. Comparing the data in Tables 8 and 11, it is possible to state that it is worth using a compressed image for training (most probabilities of correct classification for classes are better in this case). The classification map in Figure 6c is quite similar to those in Figure 6a,b. This means that there is no considerable degradation of classification accuracy compared to the previous two cases.
Finally, the last case is the classification of the image compressed with PSNR-HVS-M = 30 dB, i.e., with considerable distortions. The confusion matrix is given in Table 12, and the classification map is shown in Figure 6d. Compared to the corresponding data in Table 9, a considerable increase of P11, P55, P66, and P77 is observed, and P22, P33, and P44 have increased a little too. This means that it is worth using the compressed image for training, especially if compression is carried out with a large CR. Attentive analysis of the classification map in Figure 6d reveals quite many misclassifications near the edge between the classes "Water" and "Trees", where the misclassified pixels are assigned to "Texture". We associate this effect with edge smearing and other effects that happen near high-contrast edges if the compression ratio is quite high.
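The edge effect described above can be illustrated with a toy one-dimensional example. It is only a sketch under explicit assumptions: a 3-tap moving average stands in for the smoothing introduced by lossy compression (real coders such as AGU introduce different, block-based distortions), a single channel replaces the three color features, and the class means are hypothetical values chosen so that "Texture" lies between "Water" and "Trees" in feature space.

```python
def smooth(profile, radius=1):
    """Moving average with edge replication: a crude stand-in for compression smoothing."""
    n = len(profile)
    out = []
    for i in range(n):
        window = [profile[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

# Hypothetical single-channel class means; "Texture" sits between the other two.
CLASS_MEANS = {"Water": 20.0, "Texture": 70.0, "Trees": 120.0}

def nearest_class(value):
    """Minimum-distance labeling (a simplification of the MLM/NN decision rules)."""
    return min(CLASS_MEANS, key=lambda c: abs(value - CLASS_MEANS[c]))

edge = [20.0] * 5 + [120.0] * 5  # ideal step edge: Water | Trees
labels_original = [nearest_class(v) for v in edge]
labels_smoothed = [nearest_class(v) for v in smooth(edge)]
print(labels_smoothed.count("Texture"))  # prints 2
```

The two pixels adjacent to the edge receive intermediate values (about 53.3 and 86.7) and are labeled "Texture" although that class is not present there, which is consistent with the behavior described for Figure 6d.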

Preliminary Conclusions
Summarizing the obtained results and conclusions, we can state the following:
1. Classification by any classifier is not perfect if features intersect in feature space and/or if not all possible features are used; here we mean that pixel-wise classification is unable to exploit spatial information that can be useful for improving classification accuracy;
2. If a classifier is trained on original (uncompressed) data and then applied to compressed images, more compression (a larger CR, a smaller PSNR or PSNR-HVS-M) leads, in general, to classification worsening; meanwhile, the differences in Pcc and in probabilities of correct classification for particular classes can be negligible, and classification accuracy for some particular classes can even improve due to noise/variation suppression (if noise or high-frequency variations are present);
3. It is worth training a classifier (at least, MLM and NN-based classifiers) for the "conditions" in which it will be applied; these conditions can be described by the values of quality metrics planned to be provided at the compression stage or by the QS value that will be used;
4. Different classifiers can produce considerably different classification maps for the same RS data, and the quantitative parameters that characterize classification can differ considerably as well; a good example is the data for Class 6: the MLM classifier does not recognize it well enough, whilst the NN classifier performs considerably better; the situation is the opposite for Class 7;
5. Lossy compression negatively influences classes that are mainly associated with small-sized and prolonged objects such as roads, narrow rivers and creeks, and small houses; the reason is that lossy compression smears such objects and changes feature distributions; in this sense, the use of visual quality metrics and setting desired values for them at the compression stage can be useful to avoid significant degradation of classification accuracy for such classes;
6. For pixel-wise classification, misclassifications usually appear as isolated pixels; if distortions due to compression are larger, misclassifications might appear as groups of pixels, especially in the neighborhoods of high-contrast edges in RS images; this is undesirable, and it is worth avoiding such situations, which mostly happen if distortions due to lossy compression are visible (in our experiments, if PSNR-HVS-M = 30 dB and, more rarely, if PSNR-HVS-M = 36 dB);
7. This leads us to the idea that, possibly, it is worth carrying out lossy compression either without visually noticeable distortions or with distortions relating to partial suppression of noise in quasi-homogeneous image regions; this can be controlled by setting either the desired PSNR-HVS-M (≈40 dB) or the corresponding PCC for the used coder (e.g., QS for AGU).
Certainly, these conclusions are based on the analysis of only one test image, and more studies are needed. One opportunity is to check them for real-life RS data.

Analysis of Real-Life Three-Channel Image
Our experiments with real-life images have been done using a three-channel Landsat TM image used earlier in our studies [38,39]. This test image (shown in pseudo-color representation in Figure 7a) contains three component images acquired in optical bands with central wavelengths equal to 0.66 µm, 0.56 µm, and 0.49 µm, respectively. They have been associated with the R, G, and B components of the color image, which has a size of 512 × 512 pixels. The image has five recognizable classes, namely, "Soil" (Class 1), "Grass" (Class 2), "Water" (Class 3), "Urban (Roads and Buildings)" (Class 4), and "Bushes" (Class 5). The image fragments used in classifier training are shown in Figure 7b, whilst the pixels used for verification of classifiers are marked by the same colors in Figure 7c. Details concerning the numbers of pixels are given in [39]. As one can see, the total number of classes is smaller than for the test image in the previous section, but some classes are the same or similar: Class 1 is similar to Field-R, and Class 5 to the class "Texture". As in the test RS data, color features for many classes intersect (consider the classes "Grass" and "Soil", and the classes "Soil" and "Bushes"). This means that classification is not a simple task.
A part of the data concerning component-wise lossy compression of this image is presented in Table 13 (more details can be found in [39]). As one can see, to provide PSNR-HVS-Mdes = 42 dB, one has to use QS ≈ 17.3, whereas a considerably larger QS ≈ 29.3 is needed for a smaller PSNR-HVS-Mdes (see Table 13). This is in good agreement with the data in Figure 1 for other test images. Thus, it is possible to give recommendations on which QS to use for providing a given PSNR-HVS-Mdes (see details in [23]). Note that the provided CR can be quite large: it is about 6 for PSNR-HVS-Mdes = 42 dB and reaches 16.5 for PSNR-HVS-Mdes = 30 dB.
We have carried out the analysis for six values of PSNR-HVS-M (45, 42, 39, 36, 33, and 30 dB). Figure 8 presents the considered three-channel image (in pseudo-color representation) compressed with PSNR-HVS-M equal to 42 dB, 36 dB, and 30 dB. They are all very similar to each other and to the original image. A more attentive analysis reveals differences for the images in Figure 8b,c that mainly manifest themselves in the neighborhoods of small-sized objects and high-contrast edges.

MLM Classifier Results
Let us start by considering the results of applying the MLM classifier trained on the original image. Tables 14-17 present the obtained data in the form of confusion matrices. As can be seen, probabilities of correct classification for particular classes vary from 0.75 to 0.99, and the smallest probabilities take place for the rather heterogeneous classes "Soil" and "Bushes". For the image compressed with PSNR-HVS-M = 42 dB (Table 15), probabilities of correct classification are reduced by 0.002…0.033. The largest reduction is observed for the most heterogeneous class "Bushes".
Consider now the data obtained if the MLM classifier is applied to the image compressed with visible distortions (PSNR-HVS-M = 36 dB). The confusion matrix is given in Table 16. Most probabilities of correct classification for particular classes continue to decrease, although this reduction is not significant (up to 0.021 compared to the previous case, Table 15).
Finally, let us analyze data for the compressed image (PSNR-HVS-M = 30 dB) to which the MLM classifier has been applied. The data are presented in Table 17. The tendency is the same: most probabilities of correct classification for particular classes continue to decrease, but, again, the reduction is not large.
Classification maps for the three images are presented in Figure 9. They are quite similar, although some differences can be found. Compression does not lead to radical degradation of classification accuracy.
One may be interested in the behavior of Pcc depending on compression. For the original image, Pcc = 0.865; for compressed images, it equals 0.853, 0.854, 0.853, 0.849, 0.842, and 0.839 for PSNR-HVS-M equal to 45, 42, 39, 36, 33, and 30 dB, respectively. So, it is possible to state that lossy compression providing PSNR-HVS-M about 40 dB does not lead to a significant reduction of Pcc for the considered case. Moreover, if training is done on the image compressed under the same conditions as the image subject to classification, classification results can improve. For example, if training has been done on the compressed image (PSNR-HVS-M = 36 dB) and the classifier is then applied to this image (to the verification set of pixels), Pcc increases to 0.855.
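The Pcc values quoted above can be re-expressed as losses relative to the original image with a few lines of arithmetic. This sketch uses only the numbers reported in the text; the dictionary and function names are illustrative.

```python
PCC_ORIGINAL = 0.865
# Pcc of the MLM classifier trained on the original image, keyed by PSNR-HVS-M (dB).
PCC_COMPRESSED = {45: 0.853, 42: 0.854, 39: 0.853, 36: 0.849, 33: 0.842, 30: 0.839}

def pcc_losses(original, compressed):
    """Absolute Pcc drop relative to the uncompressed image, rounded to 3 decimals."""
    return {q: round(original - p, 3) for q, p in compressed.items()}

losses = pcc_losses(PCC_ORIGINAL, PCC_COMPRESSED)
for q in sorted(losses, reverse=True):
    print(f"PSNR-HVS-M = {q} dB: Pcc drop = {losses[q]:.3f}")
```

The drop stays at about 0.011…0.012 down to 39 dB and grows to 0.016, 0.023, and 0.026 at 36, 33, and 30 dB; with training on the 36 dB image, Pcc = 0.855 corresponds to a drop of only 0.010.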
We have analyzed feature distributions before and after compression. One reason why Pcc does not decrease radically as CR increases is that the corresponding distributions do not differ much. The largest differences are observed for the classes "Soil" and "Bushes". It might be slightly surprising that the class "Urban" is recognized so well. The reason is that, for the considered image, features of this class do not overlap significantly with features of other classes.
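The text does not state which measure was used to compare feature distributions before and after compression. One simple option, shown here as an assumption rather than the authors' method, is the Bhattacharyya coefficient between normalized histograms of one feature (e.g., one color channel) for a given class: values close to 1 indicate distributions that differ little, values close to 0 indicate almost no overlap. The function names and the 32-bin setting are hypothetical.

```python
import math

def histogram(values, bins=32, lo=0.0, hi=256.0):
    """Normalized histogram of feature values in [lo, hi); out-of-range values are clipped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = int((v - lo) // width)
        counts[min(max(k, 0), bins - 1)] += 1
    return [c / len(values) for c in counts]

def bhattacharyya_coefficient(p, q):
    """1.0 for identical histograms, 0.0 for histograms without overlap."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))
```

Computed per class and per channel for the original and compressed images, such a coefficient would be expected to be lowest for classes like "Soil" and "Bushes" if the observation above holds.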

NN Classifier Results
Consider now the results of NN-based classification of the real-life image. The ideal-case data (the NN trained on the original image is applied to the original image) are given in Table 18. The results can be compared to the data in Table 14: the NN classifier better recognizes the classes "Soil" and "Grass", and the results for the class "Water" are approximately the same. Suppose now that this classifier (NN trained on the original image) is applied to compressed images. The results for different qualities of compressed data are presented in Tables 19-21. If PSNR-HVS-M is equal to 42 dB or 36 dB, the results remain practically the same; only the probability of correct classification for "Bushes" steadily decreases. The reduction of classification accuracy turns out to be larger for the image compressed with PSNR-HVS-M equal to 30 dB; mainly, the reduction takes place for the classes "Water" and "Urban". Some classification maps are presented in Figure 10. They do not differ much from each other. Some pixels that belong to the class "Water" for the narrow river (see the lower left corner in Figure 10c) "disappear" (become misclassified). This is because of the smearing of prolonged objects due to lossy compression. The MLM classifier training for compressed images was performed using training samples for the fragments shown in Figure 7b. Due to the usage of the same data for both training and validation, a noticeable classification improvement can be observed (at least, for several classes).

Brief Analysis for Sentinel-2 Three-Channel Images
One may wonder whether the observed dependences hold for other images or their fragments, other imagers, and other compression techniques. Sentinel-2 offers wide possibilities to check this since it provides a huge amount of data that can be exploited for different purposes. For our study, we have taken three-channel images of the Kharkiv region (Ukraine) in the visible range acquired on 30 August 2019, when there were practically no clouds (images are available at [56]). The analyzed 512 × 512 pixel fragments are for the neighborhood of Staryi Saltiv (45 km north-east of Kharkiv, Ukraine, set 1) and the northern part of Kharkiv (set 2); see Figure 11. The main reason for choosing these fragments is the availability of ground truth data that allow easy marking of four typical classes: Urban, Water, Vegetation, and Bare Soil. One more reason is that the image in Figure 11a is considerably less complex (textural) than the image in Figure 11b.
First, these fragments have been compressed component-wise, providing a set of PSNR-HVS-M values, using three coders: AGU, used in the previous part of this paper; SPIHT [57], which can be considered an analog of the JPEG2000 standard; and the Advanced DCT coder (ADCTC) [58], which uses partition scheme optimization for better adaptation to image content. Basic data on compression performance are given in Table 22. The observed dependences are predictable. CR for all coders increases as the desired PSNR-HVS-M decreases. CR values for different components for the same desired PSNR-HVS-M and set differ, but not considerably. CR for AGU is usually slightly larger than for SPIHT (for the same conditions), and ADCTC outperforms both coders. CR values for set 2 images are several times smaller than for the corresponding set 1 images due to the higher complexity of set 2 RS data.

Discussion
Above, we have considered tendencies for one test and three real-life three-channel images presented as 8-bit 2D data for each component. In practice, images can be presented differently, for example, by 16-bit data [13] or by 10-bit data after certain normalization [59]. Then, the question arises of how to provide the desired PSNR-HVS-M (e.g., 40 dB) for a given multichannel image to be compressed. To answer this question, let us consider some data. First, Figure 12, taken from [60], presents the dependences of PSNR-HVS-M on QS for nine 8-bit grayscale test images of different complexity for AGU, together with the average curve obtained for these nine test images. This average curve allows approximate setting of QS to provide the desired PSNR-HVS-M. For example, to provide PSNR-HVS-M ≈ 43 dB, one has to set $QS_{rec} \approx 15$; to provide PSNR-HVS-M ≈ 40 dB, it is possible to set $QS_{rec} \approx 20$. If the image range D is not equal to 255, the recommended QS is

$QS_{recD} = QS_{rec} \cdot D / 255, \quad (9)$

where $QS_{rec}$ is determined from the average curve in Figure 12. As follows from the analysis of data in Figure 12, the use of $QS_{rec}$ or $QS_{recD}$ provides the desired PSNR-HVS-M only approximately: errors can be up to 1…2 dB depending on the complexity of the image to be compressed. Such accuracy can be treated as acceptable since, as shown in the previous Section, a change of PSNR-HVS-M by even 2 dB does not lead to radical changes in Pcc and in probabilities of correct classification for particular classes. Second, if errors in providing the desired PSNR-HVS-M are unacceptable, accuracy can be improved by applying the two-step procedure proposed in [60]. As shown in Table 13, the QS that should be used in component-wise compression is practically the same for all components. This means that it is enough to determine $QS_{rec}$ or $QS_{recD}$ for one component image and then apply it to compress the other components (this can save time and resources at the data compression stage). Moreover, $QS_{rec}$ or $QS_{recD}$ determined according to the recommendations given above can be used in joint compression of all components of multichannel images, or of groups of components [59], by the 3D version of AGU. In this case, the positive effect is twofold. First, a larger CR is provided compared to component-wise compression. Second, a slightly higher quality of the compressed image can be ensured. Figure 12 demonstrates the dependences of PSNR-HVS-M on QS for different test images.
Figure 13 shows the RS data processing flowchart. Acquired images (e.g., on-board) are subject to "careful" lossy compression in the Quality Control Compression Unit, where the PCC (e.g., QS) is determined using the desired threshold for a chosen quality metric (e.g., 40 dB for PSNR-HVS-M) and rate/distortion curves obtained in advance (like those in Figure 12). PCC corrections can be made if, e.g., images are normalized before compression. In the Image Classification Unit, RS data are classified, and the classifier can be trained using either earlier processed images or data that have just been received. If time limitations are not strict, the second option seems preferable (according to the results obtained in our analysis).
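The QS selection step of the Quality Control Compression Unit can be sketched as follows. This is a minimal illustration, assuming that the average PSNR-HVS-M(QS) curve is available as a table of points and that linear interpolation between tabulated points is acceptable. Only the two points quoted in the text (43 dB for QS ≈ 15 and 40 dB for QS ≈ 20) are used, so the result is rough and carries at least the 1…2 dB uncertainty noted above. The function names are hypothetical; `scale_qs` implements Eq. (9), and D = 1023 for 10-bit data is an assumed example.

```python
def interpolate_qs(curve, target_db):
    """Linearly interpolate QS for a desired PSNR-HVS-M on a tabulated curve.

    curve: list of (psnr_hvs_m_db, qs) points of a monotone rate/distortion curve.
    """
    pts = sorted(curve)  # ascending PSNR-HVS-M
    if not pts[0][0] <= target_db <= pts[-1][0]:
        raise ValueError("desired PSNR-HVS-M is outside the tabulated range")
    for (p0, q0), (p1, q1) in zip(pts, pts[1:]):
        if p0 <= target_db <= p1:
            if p1 == p0:
                return q0
            t = (target_db - p0) / (p1 - p0)
            return q0 + t * (q1 - q0)
    return pts[0][1]  # single-point curve

def scale_qs(qs_rec, d_range):
    """Eq. (9): QS_recD = QS_rec * D / 255 for data whose range D differs from 255."""
    return qs_rec * d_range / 255.0

# Only the two points of the average AGU curve quoted in the text.
AVERAGE_CURVE = [(43.0, 15.0), (40.0, 20.0)]

qs_8bit = interpolate_qs(AVERAGE_CURVE, 40.0)  # QS_rec for 8-bit data
qs_10bit = scale_qs(qs_8bit, 1023)             # QS_recD, assuming D = 1023
```

With a denser tabulated curve (e.g., the full average curve), the same function would apply unchanged; the interpolation step is a design shortcut, not the two-step procedure of [60].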
Another question that has arisen several times is whether classification accuracy can be improved. One answer that easily follows from our analysis is that pixel-wise classification (at least, the simple version used by us) does not exploit information from neighboring pixels. Such information can be of different types and can be used in different ways. For example, a pixel can be preliminarily classified as belonging to a homogeneous, textural, or locally active area (by locally active, we mean a pixel belonging to a small-sized object or an edge, or to their neighborhood). Then, different features can be determined and used for such pre-classified pixels. However, this approach is beyond the scope of this paper.
There are also two other possible approaches, both proposed in [39]. First, several elementary classifiers (for example, MLM, NN, and SVM ones) can be applied in parallel, and then their outputs can be combined. Second, classification results can be post-processed with preliminary detection of edges. Note that these approaches can lead to a substantial increase of $P_{cc}$ (up to 0.1–0.2), especially if it is low for the originally classified data. Meanwhile, more experiments are needed to verify the methods developed in [39].
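The first approach (parallel classifiers whose outputs are combined) can be sketched as per-pixel majority voting over the label maps of the elementary classifiers. Majority voting is our assumption of a simple combination rule; the rule used in [39] may differ (for example, it may weight classifiers by reliability). Label maps are represented as equal-length lists of class indices, the function name is hypothetical, and ties are resolved in favour of the classifier listed first.

```python
from collections import Counter


def combine_by_majority(label_maps):
    """Fuse per-pixel class labels from several classifiers by majority vote.

    label_maps: list of equal-length sequences, one per classifier
    (e.g., MLM, NN, SVM), each holding a class index per pixel.
    Ties go to the label proposed by the earliest classifier in the list.
    """
    if not label_maps:
        raise ValueError("need at least one label map")
    n = len(label_maps[0])
    if any(len(m) != n for m in label_maps):
        raise ValueError("label maps must have the same length")
    fused = []
    for votes in zip(*label_maps):  # votes: labels for one pixel
        counts = Counter(votes)
        best = max(counts.values())
        # First vote (in classifier order) that reaches the top count wins.
        fused.append(next(v for v in votes if counts[v] == best))
    return fused


mlm = [1, 2, 2, 3]   # toy label maps, not data from the paper
nn = [1, 2, 3, 3]
svm = [2, 2, 3, 1]
print(combine_by_majority([mlm, nn, svm]))  # [1, 2, 3, 3]
```

With three classifiers, a pixel keeps its label whenever at least two classifiers agree, which is where the gain for weakly separated classes would be expected to come from.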
Finally, the last question is how to improve CR for multichannel RS data without losing classification performance. In our opinion, 3D compression should be applied; however, additional studies are needed in this case as well.

Conclusions
We have considered lossy compression of multichannel images, introducing different levels of distortions to analyze their influence on classification accuracy. Two classifiers, namely MLM and NN, have been studied. The DCT-based coder AGU has been used; in addition, two other transform-based coders have been employed with a brief analysis of their performance. One artificial and three real-life three-channel images have been thoroughly analyzed. Component-wise compression has been addressed, with quality controlled (characterized) by the visual quality metric PSNR-HVS-M.
The following has been shown for the case of classifier training on original (uncompressed) data:
- there is no substantial decrease in classification accuracy if images are compressed without visible distortions (PSNR-HVS-M exceeds 40–42 dB); in some cases, the total probability of correct classification (TPCC) can even increase compared to TPCC for original (uncompressed) data;
- lossy compression might have no considerable negative impact on classification accuracy even if distortions are visible (PSNR-HVS-M < 40 dB), but this happens when the compressed image has a simple structure;
- distortions of the aforementioned level result in different CR depending upon image complexity; this means that images with complex structure have to be compressed with smaller CR to avoid a considerable reduction of $P_{cc}$;
- dependences of the probabilities of correct classification of particular classes on compression parameters are different; for classes mostly represented by large-size objects, lossy compression is not so crucial, and it can even improve classification accuracy if noise or high-frequency variations are partly suppressed by it;
- for classes represented by small-sized and elongated objects, as well as some textures, lossy compression might lead to a substantial reduction of classification accuracy. This is due to the image smearing effect that transforms features.
The situation is slightly different if the training is done on compressed data. Then, it is worth training on compressed data characterized by the same quality (for example, approximately the same PSNR-HVS-M value) as the data to which classification will be applied. In this case, even for PSNR-HVS-M of about 36 dB, the classification can be almost as good as for the original image classified by the classifier trained on original data.
It has also been shown that the main dependencies are observed for different coders (in particular, compression techniques based on DCT and wavelets) and for data provided by different multispectral
be found by numerical optimization of a loss function that minimizes the integral MSE of representing the empirical distribution (histogram) by the theoretical model in Equation (7).
presents the true map of classes, where Class 1 is shown in light brown, Class 2 in yellow, Class 3 in red, Class 4 in blue, Class 5 in light green, Class 6 in dark green, and Class 7 in white.
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 35

Figure 3 .
Figure 3. Classification results: true map (a); classification map for original image (b); classification map for image compressed with PSNR-HVS-M = 36 dB (c); classification map for image compressed with PSNR-HVS-M = 30 dB (d).

Figure 4 .
Figure 4. Classification maps obtained by the maximum likelihood method (MLM) classifier trained for the same compressed image: image compressed to provide PSNR-HVS-M = 36 dB (a); image compressed to provide PSNR-HVS-M = 30 dB (b).

Class 1 is marked in red, Class 2 in green, Class 3 in dark blue, Class 4 in yellow, and Class 5 in azure.

Figure 7 .
Figure 7. The three-channel image in pseudo-color representation (a); pixel groups used for training (b); pixel groups used for verification (c).

Figure 12 .
Figure 12. Dependences of PSNR-HVS-M on QS for nine test images (see the list in the upper part of the plot) and the average curve for AGU.

Figure 13 .
Figure 13.Flowchart of the proposed processing approach.

Table 1 .
Information about classes.

Table 2 .
Probabilities of correct classification for particular classes depending on image quality.

Table 3 .
Illustration of distribution changes due to lossy compression for Class 1.

Table 4 .
Illustration of distribution changes due to lossy compression for Class 2.

Table 5 .
Total probabilities of correct classification Pcc for different images depending upon "training" data.
The conclusion that follows from the analysis of data in Table 5 might seem trivial: the MLM classifier should be trained on compressed images with the same compression characteristics. Following this rule leads to a considerable improvement in classification accuracy. For example, if original data are used in training and the MLM classifier is then applied to data compressed to PSNR-HVS-M = 42 dB, then $P_{cc}$ = 0.501. At the same time, if the image compressed to PSNR-HVS-M = 42 dB is used in training, $P_{cc}$ increases considerably and becomes equal to 0.579. Even more surprising results are observed for images compressed to PSNR-HVS-M equal to 36 and 30 dB: $P_{cc}$ for the corresponding training reaches 0.597 and 0.527, respectively. Note that $P_{cc}$ = 0.627 for the original image classified using training on the original image. Thus, it turns out that for images compressed with PSNR-HVS-M of about 40 dB, almost the same $P_{cc}$ is observed if the classifier is trained on the corresponding compressed images.
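The training rule above can be expressed as choosing, among classifiers trained on images compressed to several quality levels, the one whose training quality is closest to the PSNR-HVS-M of the image to be classified. The helper below is a hypothetical sketch, not part of the processing chain described in the paper; the levels 42, 36, and 30 dB mirror those in Table 5, training on uncompressed data is left out for simplicity, and ties are resolved in favour of the level listed first.

```python
def pick_training_quality(target_db, trained_levels_db):
    """Return the training quality level (dB) nearest to the data quality.

    target_db: PSNR-HVS-M of the compressed image to be classified.
    trained_levels_db: PSNR-HVS-M values of the compressed images that
    classifiers have been trained on; ties go to the earlier entry.
    """
    if not trained_levels_db:
        raise ValueError("no trained classifiers available")
    # min() returns the first of several equally close candidates.
    return min(trained_levels_db, key=lambda q: abs(q - target_db))


levels = [42.0, 36.0, 30.0]
print(pick_training_quality(40.0, levels))  # 42.0
print(pick_training_quality(33.0, levels))  # 36.0
```

Listing levels from highest to lowest quality makes the tie rule favour the less distorted training set.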

Table 6 .
Confusion matrix for the original image classified by the neural network (NN) classifier trained for this image.

Table 7 .
Confusion matrix for the compressed image classified by NN trained for the original image and applied to the compressed image (PSNR-HVS-M = 42 dB).

Table 8 .
Confusion matrix for the compressed image classified by NN trained for the original image and applied to the compressed image (PSNR-HVS-M = 36 dB).

Table 9 .
Confusion matrix for the compressed image classified by NN trained for the original image and applied to the compressed image (PSNR-HVS-M = 30 dB).

Table 10 .
Confusion matrix for the compressed image classified by NN trained for this image (PSNR-HVS-M = 42 dB).

Table 11 .
Confusion matrix for the compressed image classified by NN trained for this image (PSNR-HVS-M = 36 dB).

Table 12 .
Confusion matrix for the compressed image classified by NN trained for this image (PSNR-HVS-M = 30 dB).

Table 14 .
Classification probabilities for the MLM method trained for the original image and applied to it.

Table 15 .
Classification probabilities (confusion matrix) for the MLM method trained for the original image and applied to the compressed image (PSNR-HVS-M = 42 dB).

If the same classifier is applied to the compressed image (PSNR-HVS-M = 42 dB), the results are slightly worse (see data in Table 15): the probabilities of correct classification decrease by 0.002–0.033. The largest reduction is observed for the most heterogeneous class, "Bushes".
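The comparison of Table 14 with Table 15 amounts to subtracting the diagonals of two row-normalized confusion matrices (rows: true class, columns: assigned class). A minimal sketch follows; the function name is hypothetical, and the class names and probabilities in the usage lines are toy values chosen for illustration, not the entries of Tables 14 and 15.

```python
def diagonal_changes(p_ref, p_test, class_names):
    """Per-class change in the probability of correct classification.

    p_ref, p_test: square confusion matrices of probabilities (rows sum to 1),
    e.g., the classifier applied to the original and to the compressed image.
    Returns {class name: p_test[i][i] - p_ref[i][i]} and the class with the
    largest reduction (most negative change).
    """
    if not (len(p_ref) == len(p_test) == len(class_names)):
        raise ValueError("matrices and class list must have the same size")
    changes = {name: p_test[i][i] - p_ref[i][i]
               for i, name in enumerate(class_names)}
    worst = min(changes, key=changes.get)
    return changes, worst


names = ["Class A", "Class B"]                 # toy example
original = [[0.98, 0.02], [0.05, 0.95]]
compressed = [[0.97, 0.03], [0.09, 0.91]]
changes, worst = diagonal_changes(original, compressed, names)
print(worst)  # Class B
```

Off-diagonal entries are not used here, but comparing them in the same way shows which classes "take pixels" from the degraded class.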

Table 16 .
Classification probabilities (confusion matrix) for the MLM trained for the original image and applied to the compressed image (PSNR-HVS-M = 36 dB).

Table 17 .
Classification probabilities (confusion matrix) for the MLM trained for the original image and applied to the compressed image (PSNR-HVS-M = 30 dB).

Table 18 .
Classification probabilities for the NN-based method trained for the original image and applied to it.

Table 19 .
Classification probabilities (confusion matrix) for the NN-based method trained for the original image and applied to the compressed image (PSNR-HVS-M = 42 dB).

Table 20 .
Classification probabilities (confusion matrix) for the NN-based method trained for the original image and applied to the compressed image (PSNR-HVS-M = 36 dB).

Table 21 .
Classification probabilities (confusion matrix) for the NN-based method trained for the original image and applied to the compressed image (PSNR-HVS-M = 30 dB).

Some classification maps are presented in Figure

Table 22 .
CR comparison for real-life data for three coders, both sets.

Table 29 .
Probabilities of correct classification depending on compressed image quality for Set 1, ADCTC, ML classifier.

Table 30 .
Probabilities of correct classification depending on compressed image quality for Set 2, ADCTC coder, ML classifier.