Combined No-Reference Image Quality Metrics for Visual Quality Assessment Optimized for Remote Sensing Images

No-reference image quality assessment is one of the most demanding areas of image analysis for many applications where the results of the analysis should be strongly correlated with the quality of an input image and the corresponding reference image is unavailable. One example is remote sensing, since the transmission of the acquired images often requires lossy compression and the images are often distorted, e.g., by noise and blur. Since the practical usefulness of acquired and/or preprocessed images is directly related to their quality, there is a need for reliable and adequate metrics that do not require any reference images. As the performance and universality of many existing metrics are quite limited, one possible solution is the design and application of combined metrics. Several approaches to their composition have been previously proposed and successfully used for full-reference metrics. In the paper, three possible approaches to the development and optimization of no-reference combined metrics are investigated and verified for a dataset of images containing distortions typical for remote sensing. The proposed approach leads to good results, significantly improving the correlation of the obtained results with subjective quality scores.


Introduction
Modern remote sensing (RS) systems produce an enormous number of images that are later used for many valuable applications such as ecological monitoring, agriculture, and urban planning [1][2][3]. It is often assumed that all RS images are of high quality. In practice, this is not the case for several reasons. Noise and other distortions can be present in acquired data due to the RS sensor principle of operation, such as speckle in radar images [4], wave absorption in specific bands as in junk channels of hyperspectral data [5], bad imaging conditions [6], and imaging or communication system failures [7]. Then, an important task is to estimate the image quality for an original (acquired) image or a processed (e.g., filtered) image [6,8-11]. There are practical situations when full-reference metrics [12,13] can be employed for this purpose [8]. This happens when there is an image that can be considered as "pristine" (reference) and, after processing (e.g., lossy compression), one has the corresponding distorted image that should be compared to the reference one using a certain quality metric, where both traditional metrics (such as Mean Square Error (MSE) or Peak Signal-to-Noise Ratio (PSNR)) and visual quality metrics (e.g., Structural Similarity (SSIM) [14,15]) can be applied.
Nevertheless, one often encounters situations where full-reference metrics cannot be used since reference images are unavailable. Then, there are two options: either to try predicting the values of trustworthy full-reference metrics or to apply no-reference or reduced-reference ones. Examples of the first option are given in the papers [16,17], where trained neural networks (NNs) are applied to characterize the quality of images, including the TID2013), we have trained the NNs that used different numbers of elementary full-reference metrics as NN inputs. As a result, we have obtained an SROCC of about 0.965 for the best structures and configurations of the NN-based metrics for the subset Noise + Actual (NA). Keeping these results in mind, we have decided to carry out a similar study for NR IQA metrics. Thus, the main novelty of this paper consists in the design of general-purpose combined NR metrics using NN and weighted sum/product-based approaches of elementary metric aggregation with application to distortions typical for three-channel RS images.
The paper structure is as follows: in Section 2 the TID2013 database, NR metrics, earlier results, and the applied methodology are discussed, Section 3 focuses on the presentation of experimental results with their analysis. A brief discussion is given in Section 4.

Overview of the TID2013 Dataset and Earlier Results in the Combined Metric Design
To design and verify a new metric, one needs a database or several databases that contain images of the needed type corrupted by distortions with types and intensities inherent to the images of interest. Besides, these databases should provide Mean Opinion Score (MOS) or Differential Mean Opinion Score (DMOS) values for all images. In this sense, there are some problems in the design and verification of visual quality metrics for RS images. First, there is a very limited number of available databases of RS images, and they have a limited number of distortion types and/or are intended for other purposes [32,33]. Second, remote sensing images of a certain type can be intended for a particular purpose; therefore, IQA results may agree poorly with this purpose and the criteria used. For example, it is currently not fully clear how traditional criteria used in lossy compression relate to criteria characterizing object detection, classification, and segmentation of compressed images. Third, even if RS images have three components, there are several variants of their visualization as RGB color images. If the number of components is larger than three, there are numerous ways to represent RS data in pseudo-colors, and then the correspondence between the assessed quality of visualized images and the real values of RS data is not clear.
Because of this, similarly to [11], the analysis is restricted to three-channel RS images assuming that they are represented as RGB color images. Consideration of more complex cases of more channels (components) of RS images is out of the scope of this paper.
Recall that the synthesis and analysis of the full-reference and no-reference quality metrics for RS images using the database TID2013 has already been carried out [11]. The reasons for this may be summarized as follows:

• The TID2013 dataset contains 25 reference images and images with 24 types and 5 levels of distortions, where many of the distortion types occur in the practice of remote sensing;
• These types of distortions are concentrated in two subsets, namely Noise and Actual, that can be processed and analyzed separately (see the details below);
• MOS values have been obtained for all distorted images in TID2013, including the aforementioned subsets, with a large number of participants (volunteers) involved in the experiments (this is important since MOS has to be estimated accurately enough to minimize the negative influence of possible inaccuracy on the SROCC calculation and the comparison of metrics performance).
It is worth noting here that some other types of distortions may also be present in RS images. For example, there may be multiple distortions such as blur + noise or blur + blockiness. However, currently, the analysis is restricted by the availability of the databases and appropriate subsets.
Previous experiments with the design of the combined FR and NR metrics [11,24,29,30] show the following:
• SROCC for a given metric, elementary or combined, depends not only on the metric but also on the database, whereas for databases with a smaller number of distortions and/or more typical distortions SROCC is usually considerably higher (because of this, the metric performance for the old LIVE database is commonly much better than for the other, more complicated, databases);
• SROCC values for metrics for such subsets of TID2013 as Noise or Actual are usually larger than for the entire database;
• Combined metrics, especially NN-based ones, can provide better results than the best elementary metric; meanwhile, for a given number of elementary metrics N, this does not mean that the best N elementary metrics have to be combined to produce the best NN-based metric; elementary metrics that are complementary to each other constitute the best solutions.
Here it is worth recalling that there are three main types of combining elementary metrics:
• CM = R(EMfMOS(n), n = 1, ..., N), where R means a robust operator (e.g., a sample median or α-trimmed mean) applied to a set of elementary metrics after fitting to MOS (EMfMOS), and N is the number of elementary metrics, which is usually quite small (e.g., equal to 5); the advantage is that the method is simple; the drawback is that metric-to-MOS fitting is needed;
• CM = WSUM(EM(n), n = 1, ..., N) or CM = WPROD(EM(n), n = 1, ..., N), where WSUM and WPROD denote the weighted sum or weighted product of N elementary metrics with weights optimized to provide the highest correlation with subjective scores for a given dataset; the advantages are that good results may be obtained for a relatively small number of elementary metrics, their combining is very simple, and fitting (in general) is not needed; the drawback is that the obtained combined metrics are usually less efficient than the NN-based ones;
• CM = NN(EM(n), n = 1, ..., N), where NN() means that a neural network is applied to a set of input parameters where elementary metrics (without fitting) serve as inputs; the advantage is that the obtained metrics are usually the most efficient; the drawback is that there are several questions to be answered at the stage of metric design.
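As a minimal illustration of the first two combination types listed above, the following Python sketch applies a robust operator to already fitted scores and forms a weighted sum and a weighted product. The function names are hypothetical, and treating the product weights as exponents is an assumption made here for illustration only; the scores and weights in any real use would come from the metric design stage.

```python
from statistics import median


def alpha_trimmed_mean(values, alpha=0.2):
    """Mean of the values left after dropping a fraction alpha at each end."""
    ordered = sorted(values)
    k = int(alpha * len(ordered))
    if 2 * k >= len(ordered):
        raise ValueError("alpha removes all values")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def robust_combination(fitted_scores, operator=median):
    """CM = R(EMfMOS(n)): robust operator over scores already fitted to MOS."""
    return operator(fitted_scores)


def weighted_sum(scores, weights):
    """CM = WSUM(EM(n)): plain weighted sum of elementary metric values."""
    return sum(w * q for w, q in zip(weights, scores))


def weighted_product(scores, exponents):
    """CM = WPROD(EM(n)): product with weights used as exponents (assumption)."""
    result = 1.0
    for q, w in zip(scores, exponents):
        result *= q ** w
    return result
```

The robust operators need the elementary metrics to be on a common (MOS-like) scale, which is why fitting is required for the first type but not for the weighted sum or product.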
Because of the mentioned advantages and drawbacks, this paper concentrates on the two latter approaches to the design of combined metrics. In both cases, the provided performance depends upon many factors. The two main ones are the number of elementary metrics and the choice of the elementary metrics used. A general tendency is that a larger number of elementary metrics leads to better performance, but only up to a certain limit. For the combined metrics presented as a weighted sum or product [29,30], it is often enough to have fewer than 8-10 metrics since a further increase of N does not bring substantial benefits in performance while making the combined metric more complicated. The same holds for NN-based metrics [11], where it is usually enough to apply 20-30 elementary metrics instead of 40-50 elementary metrics used as inputs. In this case, the Lasso method [34] allows the set of employed elementary metrics to be restricted [11], simplifying the design and the final structure of the obtained NN-based combined metric.
An important role is also played by the optimization and training methods. For example, in [24] only a very limited number of weights was allowed in the design of the product (multiplicative) type of combined metrics. However, an optimization procedure that avoids getting trapped in local extrema is important as well. It is worth noting that for the NN-based combined metrics, it is practically impossible to use the convolutional NNs which are very popular nowadays [35]. The main reason is the limited number of distorted images available for training and verification. Additionally, a division of images into smaller patches and data augmentation cannot be conducted since MOS values are available only for whole images. Thus, simpler NN structures have to be applied. However, even in this case, there is a wide variety of possible variants.

Brief Overview of the No-Reference Image Quality Assessment Methods
The no-reference objective IQA metrics are particularly interesting for many applications where "pristine" reference images without any distortions are unavailable. However, since the distorted images cannot be compared with such references, each NR method should not only determine the amount and/or type of distortions in a way that is as highly correlated with their subjective perception as possible. Typically, such methods should also be able to detect the presence of some types of distortions without knowledge of the original reference image. These requirements cause much lower universality of the NR metrics in comparison to the full-reference IQA methods, as well as their significantly lower correlation with subjective quality evaluation results available in the IQA datasets, such as LIVE or TID2013. Therefore, many general-purpose or more specialized NR metrics have been proposed by various researchers, and some of them are briefly presented below.
Some of the most widely known NR metrics for natural images, currently implemented, among others, in the MATLAB environment, are Naturalness Image Quality Evaluator (NIQE) [26], Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [36], and Perception-based Image Quality Evaluator (PIQE) [37]. The first one is based on the measurement of the distances between the image features utilizing the natural scene statistics (NSS) for the assessed image and the same features obtained from an image database during model training with the use of a multivariate Gaussian model. Its extension, known as IL-NIQE [25], utilizes such models calculated for each image patch, and the overall quality score is obtained by average pooling.
Another relatively well-known NR metric based on the NSS is the Blind Image Quality Index (BIQI) [38], a two-step framework composed of 5 sub-indexes sensitive to distortion types present in the LIVE database, utilizing wavelet decomposition, generalized Gaussian distribution (GGD), and a classifier based on the support vector machine (SVM). The BLIINDS-2 [39] metric utilizes the NSS model of the discrete cosine transform (DCT) coefficients and a simple Bayesian inference approach for quality prediction, whereas Distortion Identification-based Image Verity and Integrity Evaluation (DIIVINE) [40] does not use any distortion-specific models to extend its universality.
An interesting general-purpose NR objective metric, known as the COdebook Representation for No-reference Image Assessment (CORNIA), has been proposed by Ye et al. [41] utilizing unsupervised feature learning. Instead of handcrafted features, raw image patches extracted from a set of unlabeled images have been used as local descriptors. Xue et al. [42] have proposed a quality-aware clustering (QAC) method to learn a set of centroids for each quality level without the necessity of using images scored by humans in learning.
Some examples of more specialized metrics, e.g., for sharpness evaluation might be ARISM [43] based on the autoregressive model parameters, Cumulative Probability of Blur Detection (CPBDM) [20], JNBM [19] based on the idea of just noticeable blur (JNB), or the blur metric proposed by Crété-Roffet et al. [44]. Some other examples may be the wavelet-based Fast Image SHarpness (FISH) [45] metric, as well as Perceptual Sharpness Index (PSI) [46]. A specialized metric designed to measure blocking effects and relative blur is known as WNJE [47], and an exemplary metric designed for blur assessment based on discrete orthogonal moments is known as BIBLE [48].
The use of High Order Statistics Aggregation (HOSA) has been proposed by Xu et al. [49], whereas Min et al. proposed the BPRI metric [50] based on the use of pseudo-reference image, consisting of three sub-indexes: PSS blockiness measure, LSSs sharpness measure, and LSSn noisiness measure. An interesting hybrid NR metric, called SISBLIM, designed also for evaluation of multiply distorted images, has been proposed by Gu et al. [51] in four versions (SISBLIM_SM, SISBLIM_SFB, SISBLIM_WM, and SISBLIM_WFB).
Since the topic of the paper is related to the methods of an efficient combination of existing metrics, a more detailed description of some other NR metrics used in experiments may be found in respective papers referred for each metric in Table 1, presenting their performance.

The Design of the No-Reference Combined Metrics for RS Images
Improvement of no-reference metrics is a demanding topic of research. Robust and stable results still pose a challenge and are required for many tasks. One of the most effective approaches to increase their performance is the combination of elementary metrics. The design of the NR combined metrics that would reflect the distortions met in RS images is based on the "Noise and Actual" (NA) subset from the TID2013 database, concerning 13 selected types of distortions listed in Section 2.1. This subset contains 1625 images out of 3000 present in the whole database (it is still more than included in some other commonly used datasets, such as LIVE or CSIQ [52]).
Considering the combination based on the weighted product, the generalized formula of the combined metric may be presented as:
$$CM = \prod_{i=1}^{N} Q_i^{w_i}, \qquad (1)$$
assuming the elementary metrics $Q_i$ with the exponents (weights) $w_i$. The second investigated approach utilizes the weighted sum expressed as:
$$CM^{+} = \sum_{i=1}^{N} a_i \cdot Q_i^{w_i}, \qquad (2)$$
with additional weights $a_i$, which increase the flexibility of the designed combined metric and make it possible to increase its correlation with subjective scores. Such an approach to the combination of metrics, applied previously for full-reference metrics, has led to encouraging results [30].
In both cases, the values of all parameters (weights) for the selected set of elementary metrics are obtained as the results of optimization using the direct search method based on the Nelder-Mead simplex. For this purpose, the MATLAB fminsearch function may be effectively used.
The experiments started with the calculation of the SROCC values of the elementary metrics for the NA subset. As shown in Table 1, the best accuracy of visual quality estimation for the NA subset is provided by IL-NIQE [25], whose SROCC reaches 0.72. Due to the relatively small number of possible combinations, each pair of metrics has been subject to both types of combinations, according to Equations (1) and (2), with optimization of parameters using the SROCC as the criterion.
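The optimization of a weighted product against SROCC can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the Nelder-Mead routine is a minimal hand-written variant (the paper mentions MATLAB fminsearch), the toy metric values and MOS are invented, and all function names are hypothetical. Because SROCC depends only on ranks, the sketch scores the logarithm of the product, which yields the same ranking as the product itself and avoids numerical overflow for large exponents.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def log_weighted_product(metric_rows, exponents):
    """log(prod Q_i^w_i) = sum w_i * log(Q_i); requires positive metric values."""
    return [sum(w * math.log(q) for q, w in zip(row, exponents))
            for row in metric_rows]


def nelder_mead(f, start, step=0.5, iterations=200):
    """Minimal simplex search: reflection, expansion, contraction, shrink."""
    n = len(start)
    simplex = [list(start)] + [
        [s + (step if i == j else 0.0) for j, s in enumerate(start)]
        for i in range(n)
    ]
    for _ in range(iterations):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflect) < f(best):
            expand = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):
            simplex[-1] = reflect
        else:
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:  # shrink every vertex towards the best one
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


# Invented toy data: two elementary metrics for six images and their MOS.
rows = [(0.9, 2.0), (0.4, 5.0), (0.5, 3.0), (0.3, 6.5), (0.8, 2.5), (0.6, 3.5)]
mos = [5.8, 3.2, 4.5, 2.1, 6.3, 3.9]


def objective(exponents):
    # Maximizing |SROCC| is written as minimizing its negative.
    return -abs(srocc(log_weighted_product(rows, exponents), mos))


start = [1.0, 1.0]
best = nelder_mead(objective, start)
```

Since SROCC is piecewise constant in the exponents, a simplex search can stall on flat regions, so restarting from several starting points is a reasonable precaution in practice.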
The further choice of elementary metrics for the combination of three or more of them has been based on the correlation of their subset: the five best combinations of two metrics have been selected as the basis, and the third metric has been combined with them (all metrics have been checked and the weights have been optimized). Then the five best combinations of three metrics have been selected, and each of the other elementary metrics has been checked as the potential fourth element, and so on. The choice of the five best combinations for each N is motivated by an observation from previous experiments with full-reference metrics applied to multiply distorted images [30]: in some cases, the second or third "best" combination of N elementary metrics has led to better results in combination with the (N + 1)-th metric.
An important element in the design of the combined metrics is also the calculation time. To verify the possible limitations, some experiments have been conducted in which selected elementary metrics were calculated using a 10th-generation Intel i7 laptop processor. Although the execution time of many metrics is below 1 s, it is worth mentioning some of the "slowest" algorithms, namely BLIINDS2 (nearly 16 s), NJQA (over 8 s), IL-NIQE (over 6 s), ARISM/ARISMc (about 5 s), OG-IQA (over 4 s), and CORNIA (nearly 3 s). Nevertheless, since the processing time may vary, the precise measurement results have not been presented, limiting this thread to the indication of the most computationally demanding metrics verified using an exemplary computer for comparison purposes. The graphical illustration of the differences is presented in Figure 1 in logarithmic scale with respect to the fastest metric.
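The selection procedure described above (all pairs first, then keeping the five best combinations for each N and trying every remaining metric as the next element) resembles a beam search of width 5. A minimal Python sketch follows; the function name, the `score` callback, and the toy data are hypothetical. In a real design, `score` would optimize the weights of the given combination and return the resulting SROCC.

```python
from itertools import combinations


def select_combinations(metric_names, score, max_size, beam_width=5):
    """Grow metric subsets one element at a time, keeping the best few per size."""
    names = sorted(metric_names)
    # Every pair is scored exhaustively; only the best `beam_width` survive.
    beam = sorted(combinations(names, 2), key=score, reverse=True)[:beam_width]
    best = {2: beam[0]}
    for size in range(3, max_size + 1):
        # Extend each surviving combination with every metric not yet in it.
        candidates = sorted({
            tuple(sorted(combo + (name,)))
            for combo in beam
            for name in names
            if name not in combo
        })
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        best[size] = beam[0]
    return best


# Invented toy scoring: the "quality" of a combination is a plain sum.
toy_quality = {"A": 0.7, "B": 0.6, "C": 0.5, "D": 0.1}


def toy_score(combo):
    return sum(toy_quality[name] for name in combo)


toy_result = select_combinations(toy_quality, toy_score, max_size=3)
```

Keeping several candidates per size, rather than only the single best one, is what allows a second or third "best" combination of N metrics to become the best one after the (N + 1)-th metric is added.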
Figure 1. Computation times of the elementary metrics listed in Table 1 with respect to the fastest one (WNJE), presented in the logarithmic scale.

The Design of the Combined Metrics Based on Neural Networks
One of the main results obtained in earlier papers [11,24,70] concerning the use of neural networks is the confirmation of their high efficiency for the optimization of full-reference combined metrics. As the result of multiple parameters optimization, combined metrics based on neural networks provide several benefits. For a given image database they usually outperform any single elementary metric, and it is enough to use a simple NN model with several layers to achieve that result. Therefore, this stage is characterized by high computational performance, and the duration of computations is determined primarily by the slowest metric (in the case of parallel computations). In this paper, an approach tested previously on full-reference metrics has been applied to no-reference metrics and verified experimentally for the NA subset and the whole TID2013 database. Some preliminary results may also be found in [24], although only general-purpose metrics have been designed there, demonstrating quite limited universality, given the results obtained for the different considered datasets.
Since the use of more complex deep learning models [71,72] may not always be applicable for different image processing tasks, due to its limitations on the structure and size of the trained model and its computational complexity, the experiments are focused on the application of simpler NN models. Taking into account the distortions characteristic for RS images, some additional NR metrics, not considered previously in [24], have been included in calculations, increasing the possibilities of their choice and optimization.
Since the accuracy values of even the best NR metric (IL-NIQE) are quite low, a remarkable improvement may be expected using the combined NN metric, potentially better than using both approaches considered above (weighted product and weighted sum). Nevertheless, it would be difficult to directly compare the obtained combined metric with the results presented in [24], since the target datasets for the NN differ significantly. However, based on the results for 11 metrics and the fact that the NA subset has been included as a part of previous experiments, the expected improvement should reach SROCC values of at least 0.75-0.8 for a comparable number of metrics.

When designing and training a neural network, it is necessary to take into account some conditions that significantly affect its efficiency:
• Input and output data of the neural network. The values of the visual quality metrics are used as inputs. The NN result should be an indicator corresponding to the human visual system (HVS). Therefore, the target data are the MOS estimates of the test image databases and the accuracy of the [...];
• The number of the applied metrics. One of the design goals is to keep the computational complexity as low as possible, so a smaller number of metrics is preferable. To analyze the influence of the number of input metrics on the overall efficiency, neural networks with different numbers of metrics, up to over 40, are considered;
• The choice of the applied metrics. Based on the research results for the reference metrics of visual quality in [11], the regression analysis approach using the Lasso method is quite effective. This method allows us to reduce the complexity of the network by assigning zero weights to non-essential features (elementary metrics in our case). By using different thresholds, we can define the "cut-off" levels for each of the possible metrics, thereby determining their importance in the resulting composite metric. It should be noted that a simple linear model is used here, and the selection of results will be approximate. Therefore, it is recommended to validate and carry out single replacements as needed;
• Type of the neural network. As the results in [72] have shown, the use of more complex modifications with nonlinear dependencies between layers, as compared to feed-forward networks, leads to longer training but may lead to a noteworthy increase in the accuracy of the designed metric. At the same time, the results obtained for the best types (cascade and Elman networks) gave comparable efficiency. Therefore, in the paper, the feed-forward and cascade-forward neural networks are analyzed.
The Lasso algorithm has an important feature that should be taken into account during the calculations. It performs a sequential optimization of the weights, so its result depends on the starting point of reference. In the context of neural networks, this means that the order of the metrics in the input data is essential. Therefore, the training procedure was carried out in two stages.
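To make the role of Lasso in metric selection more concrete, the sketch below solves the Lasso problem by cyclic coordinate descent with soft-thresholding, one common solver that updates the weights one at a time. Whether the cited experiments used this particular solver is not stated in the text, so this is an assumption. The sketch also assumes centered inputs with no intercept, and all names and toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink a value towards zero by `threshold`; small values become exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=200):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate updates."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for the initial w = 0
    for _ in range(sweeps):
        for j in range(p):  # weights are updated one after another
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            rho = sum(v * (r + v * w[j]) for v, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, lam) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - v * delta for r, v in zip(residual, col)]
                w[j] = new_w
    return w


def nonzero_indices(weights):
    """Indices of the features (elementary metrics) kept by Lasso."""
    return [j for j, v in enumerate(weights) if v != 0.0]


# Invented toy data: the first column explains y well, the second barely.
toy_X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
toy_y = [2.0, -2.0, 0.1, -0.1]
toy_w = lasso_coordinate_descent(toy_X, toy_y, lam=0.1)
```

Raising `lam` drives more weights to exactly zero, which corresponds to the different thresholds and "cut-off" levels used to shortlist elementary metrics; the number of surviving weights is the NNZ value.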
At the first stage, 100 sequences have been randomly generated from the considered metrics. For the initial configuration with 5 input metrics, neural networks have been calculated assuming the use of both types of neural networks and different numbers of layers and neurons (20 training rounds with a randomly subdivided test set). As shown in previously obtained results [11], the efficiency is enhanced with an increase in the number of metrics, and for the maximum size (all metrics) the influence of Lasso is minimized. Therefore, the minimum set of input data is the most informative. It should be noted that the presence of only 3 input data vectors (elementary metrics) does not ensure the optimal choice of metrics (SROCC < 0.7), and more accurate results have been obtained when searching for 5 metrics (SROCC > 0.8). Depending on the configuration and the type of the neural network (feed-forward or cascade-forward), the difference in the SROCC value can reach 0.04 for a smaller number of inputs. In this case, the advantages over the elementary metrics can be provided by both of them, although more often by the cascade-forward networks, which are also used in the resulting network.
At the second stage, for each type of neural network, several configurations are considered to select the optimal structure, similarly to [11]: from 1 to 5 hidden layers, with an equal or evenly decreasing number of neurons in each layer. Similarly, it has been previously determined that the appropriate selection of suitable NN activation functions eliminates the need for preliminary data linearization. The use of a higher number of neurons does not provide significantly better results. They are very similar for each number of layers and mostly dependent on the used datasets; therefore, the decision to use 5 layers has been made. As the activation function in hidden layers, we used tansig, which maps the infinite range of values to the limited range [−1, 1], and the linear one for the last layer.
After Lasso's calculations, the combinations of metrics shown in Table 2 are considered (NNZ means the number of non-zero values). Another advantage of the application of Lasso is that the use of even the smallest thresholds eliminates the weakest metrics.
Since there are some other databases for which it is possible to check the performance of the optimized combined metrics, such an analysis has been performed for the combination of 5 and 10 elementary metrics. To verify the universality of the proposed approach, it has been checked for the widely known LIVE database [28] (for all types of distortions), for several subsets of distortions that can be formed for TID2013, as well as for several subsets of the KADID-10k database [73] that contain distortions typical for RS applications. For comparison, the elementary metrics used as inputs of the combined NN metric have been taken.
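For readers less familiar with the network type mentioned above, the following Python sketch shows a forward pass through a cascade-forward structure with tansig hidden activations and a linear output layer. In such a network, each layer receives the original inputs together with the outputs of all preceding layers. The weights here are placeholders chosen by hand rather than trained values, and the function names (apart from the MATLAB-style name tansig) are hypothetical.

```python
import math


def tansig(x):
    """MATLAB-style tansig, 2/(1+exp(-2x)) - 1, which equals tanh(x)."""
    return math.tanh(x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W * inputs + b)."""
    return [
        activation(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def cascade_forward(x, layers):
    """Forward pass; `layers` is a non-empty list of (weights, biases) pairs.

    Each layer sees the original inputs plus the outputs of all earlier
    layers. Hidden layers use tansig and the last layer is linear.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    seen = list(x)
    out = []
    for index, (weights, biases) in enumerate(layers):
        is_last = index == len(layers) - 1
        out = dense(seen, weights, biases, (lambda v: v) if is_last else tansig)
        seen = seen + out  # later layers also receive this layer's outputs
    return out


# Placeholder example: 2 inputs, one hidden neuron, one linear output neuron.
toy_layers = [
    ([[1.0, 1.0]], [0.0]),       # hidden layer sees the 2 inputs
    ([[1.0, 2.0, 3.0]], [0.5]),  # output sees the 2 inputs and 1 hidden output
]
toy_output = cascade_forward([0.0, 0.0], toy_layers)
```

A plain feed-forward network would differ only in passing `out` alone, instead of the accumulated `seen` vector, to the next layer.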

Results
The results of the optimization of the pairs of elementary metrics confirm the validity of the proposed approach since a significant increase of the SROCC may be observed for the best combinations in comparison to IL-NIQE shown in Table 1. The best five combinations for the NA subset are presented in Table 3 together with the absolute values of KROCC and PLCC, although Spearman's correlation has been assumed as the objective function during the optimization. Similar results obtained for the whole TID2013 database are shown in Table 4. Due to very low values of Pearson's correlation (below 0.1) for some metrics, their precise values have not been presented. The results provided in Tables 3 and 4 illustrate some interesting properties of the combined metrics. Firstly, all five "best" combinations for the NA subset contain the IL-NIQE metric; however, the situation for the whole TID2013 database is a bit different, particularly for the $CM^{+}_2$ family of metrics. Secondly, the elementary metrics combined with IL-NIQE are not the "best" among the elementary metrics listed in Table 1. This confirms that the combination of various types of metrics leads to better performance than the use of those based on similar assumptions.
Starting from the combinations listed in Tables 3 and 4, the respective $CM_3$ and $CM^{+}_3$ families of the combined metrics have been optimized, as well as similar combinations of more elementary metrics. The obtained "best" combinations for the NA subset are presented in Table 5. As may be observed, for the combinations of five elementary metrics, three of them are the same for both types of combined metrics, whereas two of them are different. After a further increase of the number of elementary metrics (N > 5), there are no significant improvements in the performance, while the computational complexity becomes noticeably higher. Table 6 illustrates the best results obtained using the combinations of 3 to 5 elementary metrics using formulas (1) and (2) for the whole TID2013 database.
Table 5. Performance of the best combinations of NR elementary metrics (N > 2) for the NA subset assuming the use of the weighted product ($CM$) and weighted sum ($CM^{+}$).
Table 6. Performance of the best combinations of NR elementary metrics (N > 2) for the whole TID2013 database assuming the use of the weighted product ($CM$) and weighted sum ($CM^{+}$).
The results obtained for neural networks with 5, 10, and 15 inputs selected by the Lasso algorithm are shown in Table 7. In its description, 1L-4L denote the number of hidden layers, whereas "equal" and "half" stand for the number of neurons. The number of neurons in the first hidden layer was equal to the number of elementary metrics. There were two variants for determining the number of neurons in the next hidden layers. The first option was to set the same number of neurons, whilst the second was to reduce the number of neurons approximately twice in each next hidden layer.
Table 7. Results of the cascade NN obtained for the NA subset using the 5, 10, and 15 input metrics (1L-4L denote the number of hidden layers, "equal" and "half" are the number of neurons; the best SROCC and PLCC correlations are marked with bold fonts).
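The "equal" and "half" variants of the hidden layer sizes described above can be written down as a short Python helper. The text only says that the number of neurons is reduced "approximately twice", so the rounding rule used here (rounding up, with at least one neuron per layer) is an assumption, and the function name is hypothetical.

```python
import math


def hidden_layer_sizes(n_inputs, n_hidden_layers, variant="equal"):
    """Neurons per hidden layer; the first layer matches the number of inputs."""
    sizes = [n_inputs]
    for _ in range(n_hidden_layers - 1):
        if variant == "equal":
            sizes.append(sizes[-1])
        elif variant == "half":
            # Rounding up is an assumption; the source only says "approximately".
            sizes.append(max(1, math.ceil(sizes[-1] / 2)))
        else:
            raise ValueError("variant must be 'equal' or 'half'")
    return sizes
```

For example, under this rounding rule a network with 10 input metrics and four hidden layers would use 10, 5, 3, and 2 neurons in the "half" variant.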
A graphical illustration of the performance obtained for some of the "best" combinations is presented in the scatter plots shown in Figure 2 and discussed in Section 4.

The dependence of the accuracy of the neural network (expressed as the SROCC values calculated for the NA subset) on the number of input metrics is shown in Figure 3. Fairly similar results have been obtained for different network configurations. For this task, the most critical factors are the choice of the metrics used and their number, whereas the specific network configuration is less crucial. Considering all stages of network training together, it can be noted that a network with a single hidden layer does not always provide the necessary computing capacity, and the use of 4 hidden layers does not lead to a visible advantage. Hence, a stable result with a low-complexity cascade network is ensured when 2-3 hidden layers are used, in most cases with an equal number of neurons in the layers.
The results of the verification of the universality of the proposed NN-based metric are presented in Table 8. The results obtained for the whole LIVE database are not the best: the NN-based metric is better than some elementary metrics and worse than others, although the SROCC obtained for this database can be considered sufficiently high. It should be noted that some elementary metrics have been developed for just the limited set of distortion types present in the LIVE database, which explains these observations. For different subsets of TID2013, the results for the designed NN-based metric are either the best or close to the best.
For the subsets of the KADID-10k database, the data for 5 and 10 metrics have been considered separately. This database contains the following groups of distortions [73]: blurs (Nos. 1-3), color distortions (Nos. 4-8), compression (Nos. 9-10), noise (Nos. 11-15), brightness change (Nos. 16-18), spatial distortions (Nos. 19-23), over-sharpening (No. 24), and contrast change (No. 25).
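For per-subset evaluation, the distortion groups listed above can be encoded as a simple lookup. The index ranges are taken directly from the text; the dictionary and function names are hypothetical and only illustrate one convenient way of splitting the database by group.

```python
# Distortion groups of KADID-10k as listed in the text (indices 1 to 25).
KADID10K_GROUPS = {
    "blurs": range(1, 4),
    "color distortions": range(4, 9),
    "compression": range(9, 11),
    "noise": range(11, 16),
    "brightness change": range(16, 19),
    "spatial distortions": range(19, 24),
    "over-sharpening": range(24, 25),
    "contrast change": range(25, 26),
}


def group_of(distortion_id):
    """Return the name of the group that a distortion index belongs to."""
    for name, ids in KADID10K_GROUPS.items():
        if distortion_id in ids:
            return name
    raise ValueError(f"unknown distortion index: {distortion_id}")


def split_by_group(records):
    """Group (distortion_id, metric_value, mos) tuples by distortion group."""
    subsets = {name: [] for name in KADID10K_GROUPS}
    for distortion_id, metric_value, mos in records:
        subsets[group_of(distortion_id)].append((metric_value, mos))
    return subsets


# Toy records, not taken from the database
records = [(1, 0.41, 4.2), (10, 0.77, 2.1), (12, 0.63, 3.0), (25, 0.52, 3.6)]
for name, pairs in split_by_group(records).items():
    print(name, pairs)
```

Correlations such as SROCC can then be computed separately on each list of (metric value, MOS) pairs.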
For 5 metrics, the results for the NN-based metric are close to those of the best elementary metric, whereas different elementary metrics are the best for different subsets. This confirms the observed benefit of combining elementary metrics that are not the best individually but are complementary, exploiting the various "nature" of image data (different features, data representations, transforms, etc.) and being based on different assumptions. To illustrate the idea of this "compensation", the scatter plots obtained for elementary metrics and their combination for the KADID-10k database are presented in Figure 4. Different outliers may be easily observed for the elementary metrics, whereas the NN-based metric provides the smallest number of obvious outliers and a practically linear dependence between MOS and the metric.

Discussion
The visual analysis of the scatter plots presented in Figure 2c,d shows one important benefit of the NN-based combined metrics. The dependence of the metric on MOS values is practically linear, and this produces high SROCC and PLCC simultaneously. This is caused by the fact that MOS values have been used as the target function in the NN training. For the designed $CM^{+}_{5}$ metric based on the weighted sum (Figure 2b), the dependence is not linear and, due to this, PLCC can be noticeably smaller than SROCC. However, this drawback can be easily removed by an appropriate fitting applied after the calculation of $CM^{+}$ and the optimization of its coefficients.
For some of the $CM$ and $CM^{+}$ metrics, particularly those based on the combination of a relatively small number of elementary metrics, the optimization of their coefficients to maximize Spearman's correlation has led to very low values of Pearson's correlation and a highly nonlinear relation between them and MOS values. One possible approach to avoid this problem is to choose the PLCC as the goal function, as in some earlier papers [29,30]; however, in this case, smaller SROCC values may be achieved.
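The effect of a monotonic fitting on the two correlations can be illustrated with a small sketch. It rests on assumptions: the data are synthetic, and a power law is used only because it can be fitted in closed form in log-log coordinates, whereas a logistic function is the more common choice in image quality assessment. Since the fitted mapping is monotonic, the ranks (and thus SROCC) are unchanged, while the linear correlation increases.

```python
import math


def ranks(values):
    """Return 1-based ranks (this toy example contains no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def srocc(x, y):
    """Spearman rank-order correlation: PLCC computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def fit_power_law(metric, mos):
    """Least-squares fit of mos ~ a * metric**b in log-log coordinates."""
    lx = [math.log(v) for v in metric]
    ly = [math.log(v) for v in mos]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
         / sum((u - mean_x) ** 2 for u in lx))
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic metric values and scores with a monotonic but nonlinear relation
metric = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mos = [1.1, 3.8, 9.3, 15.5, 25.6, 35.2, 49.9, 63.0]

a, b = fit_power_law(metric, mos)
fitted = [a * m ** b for m in metric]

print("before fitting: SROCC", srocc(metric, mos), "PLCC", pearson(metric, mos))
print("after fitting:  SROCC", srocc(fitted, mos), "PLCC", pearson(fitted, mos))
```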
Although for 5 elementary metrics better SROCC results may be obtained using the weighted sum, even in comparison with the NN-based approach, the advantages of the neural networks may be observed when more elementary metrics are used as inputs, as illustrated in Figure 3.
The choice of elementary metrics made by the Lasso algorithm during the NN-based design differs from the "best" combinations used in the $CM$ and $CM^{+}$ metrics. For the case of five elementary metrics, the following metrics have been selected by the Lasso: SISBLIM_WFB, IL-NIQE, HOSA, CORNIA, and MSGF, whereas the highest SROCC value for the $CM^{+}_{5}$ metric has been obtained for the combination of ARISMC, IL-NIQE, CORNIA, WNJE, and QAC, as presented in Table 5. Interestingly, only two of the metrics selected in these two cases (IL-NIQE and CORNIA) are the same.
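How the Lasso selects a subset of inputs can be shown with a minimal coordinate-descent sketch. This is not the implementation used in the paper, whose solver and regularization strength are not given in this section; the objective form, the value of `alpha`, and the toy orthogonal design (chosen so that the result can be checked by hand) are all assumptions. Inputs whose coefficients are shrunk exactly to zero are discarded, and the remaining ones would be passed to the network.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept).

    X is a list of rows (one row per image, one column per metric).
    Each coordinate is updated in turn with the others held fixed.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Prediction using all features except feature j
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            column_energy = sum(X[i][j] ** 2 for i in range(n))
            w[j] = soft_threshold(rho / n, alpha) / (column_energy / n)
    return w


# Toy design with mutually orthogonal columns (hypothetical standardized
# metric values); the target depends strongly on column 1 and weakly on
# column 2, and not at all on columns 0 and 3.
X = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
    [1.0, -1.0, -1.0, 1.0],
]
y = [3.0 * row[1] + 0.2 * row[2] for row in X]

w = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, coefficient in enumerate(w) if coefficient != 0.0]
print("coefficients:", w)
print("selected inputs:", selected)
```

With this penalty only the dominant input survives, which illustrates why the Lasso selection need not coincide with the combinations found by directly optimizing the SROCC.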
As shown in Table 8, for 10 elementary metrics, the performance of the NN-based metric is the best for 4 out of 5 subsets of the KADID-10k database and close to the best in the remaining case. Thus, in general, although optimized for the TID2013 database, the designed NN-based metrics perform reasonably well on other databases.
The results presented in the paper confirm the usefulness of the various methods of combining elementary no-reference image quality metrics for the evaluation of images subject to distortions typical for RS images. Nevertheless, considering the results obtained for the whole TID2013 database, further experiments would be necessary to improve the correlation of such metrics with subjective quality scores and to enhance the universality of the combined NR metrics. The design of such metrics, sensitive to various types of distortions, should be one of the natural directions of further research.