BM-IQE: An Image Quality Evaluator with Block-Matching for Both Real-Life Scenes and Remote Sensing Scenes

Like natural images, remote sensing scene images, whose quality reflects the imaging performance of the remote sensor, suffer from degradation caused by the imaging system. However, the methods currently used in engineering applications to measure imaging performance require particular image patterns and lack generality. A more universal approach is therefore needed to assess the imaging performance of remote sensors without constraints on land cover. Because existing general-purpose blind image quality assessment (BIQA) methods do not achieve satisfactory results on remote sensing scene images, in this work we propose BM-IQE, a BIQA model with improved performance on both natural images and remote sensing scene images. We employ a novel block-matching strategy called Structural Similarity Block-Matching (SSIM-BM) to match and group similar image patches. In this way, the latent local information shared among different patches can be expressed, which strengthens the validity of natural scene statistics (NSS) feature modeling. At the same time, we introduce several features to better characterize remote sensing images. The NSS features are extracted from each group, and the feature vectors are then fitted to a multivariate Gaussian (MVG) model. This MVG model is compared against a reference MVG model learned from a corpus of high-quality natural images to produce a basic quality estimate for each patch (the centroid of each group). A refined quality estimate for each patch is then obtained as a weighted average of the basic quality estimates of its similar patches. The overall quality score of the test image is computed through average pooling of the patch estimates.
Extensive experiments demonstrate that the proposed BM-IQE method not only outperforms other BIQA methods on remote sensing scene image datasets but also achieves competitive performance on general-purpose natural image datasets compared with existing state-of-the-art FR/NR-IQA methods.


Introduction
Real-time monitoring of the performance of imaging equipment is important in practical applications such as environmental monitoring and resource exploration [1,2]. In particular, the imaging performance of on-orbit space remote sensors can only be assessed by processing and analyzing the images transmitted from the satellite. Like natural images, remote sensing images also suffer from degradations caused by the imaging system (as shown in Figure 1). The most salient and primary impact of image degradation is the decrease of image definition, that is, the decrease of the visual perception effect, which influences the subsequent interpretation of remote sensing images. Remote sensing image quality assessment (RS-IQA) is therefore helpful. Current solutions for evaluating the performance of remote sensors include the Target method [3], the Knife-edge method [4], the Pulse method [5], and others. However, not all on-orbit space remote sensors can obtain the target images required by the Target method, and for the Knife-edge and Pulse methods it is challenging to ensure that every returned image contains usable knife-edges or pulses. In addition to their harsh imaging conditions, these deficiencies make the methods infeasible for, or hard to generalize to, other types of remote sensors. At present, normative approaches for assessing the imaging performance of remote sensors are lacking. It is therefore urgent to develop a universal method to evaluate the imaging performance of remote sensors via RS-IQA.

Related Works
Current approaches to assessing the performance of remote sensors typically include those in references [3][4][5]. To the best of our knowledge, there is no IQA method dedicated to RS images; we therefore review the development of IQA methods for natural images. In general, IQA methods can be categorized into two classes: subjective assessment methods and objective assessment methods. Because subjective monitoring of image quality by humans is costly and inefficient, the demand for objective assessment methods in practical applications has grown prominent and urgent over the past few decades. Objective IQA tasks can be divided into three categories: full-reference IQA (FR-IQA), reduced-reference IQA (RR-IQA), and no-reference IQA (NR-IQA), among which NR-IQA is the most common and the most challenging. In NR-IQA, the task must be accomplished with degraded images alone, since the pristine images are usually unavailable. Although a great number of IQA algorithms have emerged in the last two decades with the common goal of making computational evaluation conform to human perception, they cover only a limited share of the application requirements met in practice. Hence, there is still considerable potential, and an important gap to be filled, in NR-IQA.
Early NR-IQA models commonly operate under the hypothesis that images are degraded by one particular kind or several specified kinds of distortion [6][7][8][9], which requires a priori knowledge of the image distortion types. Limited by the selection of distortion types, algorithms that depend on such priors cannot make further progress. Later, a new class of NR-IQA methods that needs no prior knowledge of distortion types, labeled blind image quality assessment (BIQA), appeared. The main idea of BIQA is to train a model on a database of distorted images associated with subjective assessment scores. A typical model is the Blind Image Quality Index (BIQI) [10]. Given a pre-trained model for estimating distortion types, the BIQI method first extracts scene statistics from a test image. These statistics are then used to determine which distortion type(s) the image suffers from, and the final image quality score is computed from the extracted scene statistics and the pre-judged distortion types. The BIQI model was later extended to the DIIVINE model [11], which improves on it by adopting a richer set of natural scene statistics. Besides BIQI and DIIVINE, Saad et al. successively proposed two models called BLINDS [12] and BLINDS-II [13]. Both methods essentially learn a probabilistic model from a natural scene statistics-based feature set, and they differ in the computational complexity of feature extraction. Moreover, Mittal et al. proposed a model called BRISQUE [14], which applies locally normalized luminance coefficients to estimate the loss of naturalness of the degraded image and derives the final image quality score from this loss measurement. Ye et al. [15] proposed CORNIA, an unsupervised feature learning framework for BIQA that operates in a coding manner and combines feature training with regression training. CORNIA was later refined into the semantic obviousness metric (SOM) [16], in which object-like regions are detected and processed. In [17], another BIQA model named DESIQUE was presented, which adopts features in both the spatial and frequency domains.
Sensors 2020, 20, 3472
However, all these approaches share the common problem of weak generalization ability [18]. Specifically, these models need to be trained on certain distorted image database(s) to learn a regression model. When applied to other different databases, they show rather weak performance [18]. What is more, since the distortion types are changeable and numerous in the real world and an image can suffer from a single or multiple distortions, it is impossible for a BIQA algorithm to train on a database perfectly containing all such distortion types. In other words, in no way can we acquire complete prior knowledge of image distortion types, which will inevitably result in the poor generalization ability of such algorithms. Therefore, it is of great significance to develop more general and more practical BIQA methods.
The Natural Image Quality Evaluator (NIQE) model [19] possesses better generalization ability. The NIQE model is first trained on a corpus of high-quality images to learn a reference multivariate Gaussian (MVG) model. Then, for a given test image, the NIQE model extracts an NSS-based feature set and fits the feature vectors to an MVG model. Finally, the overall quality of the test image is predicted by measuring the distance between its MVG model and the reference model. However, this method may lose useful information: since only one MVG model is used to characterize the test image, some local information of the image is neglected. To tackle this problem, Zhang et al. proposed a new model, IL-NIQE [18]. The IL-NIQE model partitions a test image into patches and extracts an enriched feature set from each patch. A set of MVG models is thus obtained, and the final image quality score is computed by average pooling. The purely data-driven, end-to-end deep neural network proposed by Bosse et al. [20] for NR-IQA and FR-IQA also takes advantage of local information; it extracts deep features with 10 convolutional layers and five pooling layers and performs regression with two fully connected layers. Recently, a new method called MUSIQUE [21] was proposed. Unlike previous methods, which were applicable only to singly distorted images, the MUSIQUE model was designed for both single and multiple distortions: it estimates three distortion parameters from NSS-based features and maps these parameters to an overall image quality score.
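To make the MVG comparison used by NIQE-style models concrete, the following is a minimal sketch of the distance between a test MVG model and a reference MVG model, in the form commonly given for NIQE [19]: a Mahalanobis-like distance with the two covariance matrices averaged. The function names `solve_linear` and `mvg_distance` are illustrative assumptions and do not come from any reference implementation; the linear system is solved directly in pure Python so that the sketch stays self-contained.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [list(map(float, a[r])) + [float(b[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mvg_distance(mu_test, cov_test, mu_ref, cov_ref):
    """sqrt((mu1 - mu2)^T ((cov1 + cov2) / 2)^-1 (mu1 - mu2))."""
    n = len(mu_test)
    diff = [mu_test[k] - mu_ref[k] for k in range(n)]
    pooled = [[(cov_test[r][c] + cov_ref[r][c]) / 2.0 for c in range(n)]
              for r in range(n)]
    z = solve_linear(pooled, diff)  # z = pooled^-1 diff
    return math.sqrt(sum(diff[k] * z[k] for k in range(n)))
```

A larger distance indicates that the statistics of the test image deviate more from those of the high-quality corpus, that is, lower predicted quality.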
However, we observed that these methods obtained unsatisfying results when applied to images with the content of remote sensing scenes (refer to Section 5 for more information), which shows a weak generalization ability to diverse tasks. Therefore, in this work, we seek a method to efficaciously evaluate the quality of images for both real-life natural scenes and RS scenes.
Note that, in this paper, the natural image represents the images of real-life scenes, and the remote sensing image represents that of RS scenes, so as to differentiate the image content as well as to conform to the RGB image format of these two kinds of images.

Our Contributions
In this paper, we propose BM-IQE, a general-purpose BIQA model for real-life scenes as well as remote sensing scenes. Inspired by [18] and building on IL-NIQE, we introduce an enriched feature bag (EFB) and a structural similarity block-matching (SSIM-BM) strategy to ensure that the proposed method performs well in RS-IQA applications. Meanwhile, the proposed method achieves competitive performance on natural images compared with existing state-of-the-art BIQA methods. A general framework of the proposed BM-IQE model is shown in Figure 2. The contributions of our BM-IQE model are as follows:
1. Datasets for BIQA of RS scene images are constructed for the first time, based on public scene datasets and simulated degraded images.
2. Imaging performance evaluation of the remote sensor by means of image quality assessment is studied for the first time, and a new way of indirectly evaluating the imaging performance of remote sensors is presented.
3. We introduce a block-matching strategy to assist the image patch strategy. This operation better expresses the intrinsic features of the image patches (such as affinity and correlation) and makes the quality prediction less sensitive to image degradations. In this way, the image quality assessment can achieve higher efficacy and accuracy.
4. We adopt four classic gray-level co-occurrence matrix (GLCM) statistics as texture features so that our method is better suited to remote sensing applications than existing IQA models, which enhances the universality and practicality of the proposed model.
We conducted an extensive series of experiments on various large-scale public databases, including RS scene image datasets, singly distorted image databases, and multiply distorted image databases. Note that in many remote sensing applications, such as scene classification and target detection, RGB images are usually used for content understanding, because images with the visible red, green, and blue bands fully present the color, texture, and contour features of the land cover from the perspective of human visual perception. Moreover, the algorithm proposed in this paper is a general image quality assessment approach oriented to a wide range of natural scene types, including both RS scenes and common real-life scenes, so RGB color plays a fundamental role as a low-level image feature for recognizing objects and understanding content. Thus, only images with RGB channels are used for the experiments in this paper, and the proposed algorithm was accordingly designed for RGB images of RS scenes and real-life scenes. Experimental results show that the proposed BM-IQE method outperforms other state-of-the-art IQA models on RS scenes and is highly efficacious and robust on real-life scenes.
The rest of this paper is organized as follows. Section 2 introduces the block-matching strategy used in BM-IQE. Section 3 introduces the features we adopt to predict image quality. Section 4 illustrates how the proposed new model is designed. Section 5 presents the experimental results and Section 6 presents the general conclusions of this paper.

The Proposed Framework
Zhang et al. [18] indicate that partitioning a test image into several patches makes better use of local image features and thereby minimizes the loss of useful information. However, this operation only enhances the expression of individual patch features; it cannot exploit the latent feature information shared between image patches. Moreover, degradation effects are unevenly distributed over an image, so the degradation-induced variation in the natural scene statistics extracted from one patch may differ considerably from that of other patches. Thus, the statistics calculated from a single patch are not strongly representative, and the final pooling result may not estimate the distortion of the entire image well. To mine the correlations among image patches and make the statistical features of each patch more representative, in this paper block-matching is introduced as an efficient means of grouping the similar patches of an image for better feature extraction. Grouping similar patches has two benefits. First, the latent local information shared among different patches can be expressed through group-level feature extraction. Second, the impact of distortion on fitting the features to an MVG model is attenuated: as the amount of natural scene statistics data increases, the NSS features of the image are expressed more strongly and are therefore less affected by distortions.

Grouping
Grouping can be realized in many ways, such as self-organizing maps [22], fuzzy clustering [23], vector quantization [24], etc. However, these methods are computationally demanding and may produce groups with overlaps. Furthermore, in [25], the author indicates that the clustering can cause unequal treatment of different fragments, because the ones that are close to the centroid in the group are more representative than those far from it. Therefore, a more efficient and precise way to realize grouping is needed.

Block-Matching
Block-matching is a strategy often used for image denoising and motion estimation. Matching finds fragments that are similar to a reference by estimating the similarity between the reference fragment and each candidate fragment. Typical matching criteria are the mean absolute difference (MAD) [26], the sum of absolute differences (SAD) [27], the sequential similarity detection algorithm (SSDA) [28], the sum of absolute transformed differences (SATD) [29], etc. Compared with the grouping methods above, matching methods achieve grouping much more effectively and efficiently. Usually, similarity is measured by the distance between two fragments, and the fragments whose distance from the reference is smaller than a specified threshold are considered mutually similar and are subsequently grouped [25]. However, commonly used distance measures such as the Euclidean distance and the Manhattan distance have two inadequacies. First, they are easily affected by the dimension of the data. Second, they cannot effectively reflect the intrinsic correlation of the data [30,31]. Therefore, to avoid such problems, in this paper we introduce a new matching model called SSIM-BM, which takes the structural similarity index as the distance metric.
SSIM is a widely accepted FR-IQA metric in which the similarity is computed by comprehensively integrating local luminance, local contrast, and image structure [32].
Natural images are statistically highly structured, which has been confirmed in both the spatial and frequency domains in previous research. RS images in particular are remarkably characterized by structure and texture features that convey information important for human visual perception. Hence, to make use of structure-level features, we take SSIM as the distance metric for block-matching, comparing local patterns of pixel intensities that have been normalized for luminance and contrast. For two patches $X$ and $Y$, the local luminance $\mu_X$, local contrast $\sigma_X$, local structure $\sigma_{XY}$, and the resulting similarity are given by Equations (1)-(4), respectively, written here in the standard form of [32]:

$$\mu_X = \frac{1}{RC}\sum_{i=1}^{R}\sum_{j=1}^{C} X(i,j) \tag{1}$$

$$\sigma_X = \left(\frac{1}{RC-1}\sum_{i=1}^{R}\sum_{j=1}^{C}\bigl(X(i,j)-\mu_X\bigr)^2\right)^{1/2} \tag{2}$$

$$\sigma_{XY} = \frac{1}{RC-1}\sum_{i=1}^{R}\sum_{j=1}^{C}\bigl(X(i,j)-\mu_X\bigr)\bigl(Y(i,j)-\mu_Y\bigr) \tag{3}$$

$$\mathrm{SSIM}(X,Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)} \tag{4}$$

where $i$ and $j$ are spatial coordinates, $R$ and $C$ represent the size of the input data, and $c_1$ and $c_2$ are the small stabilizing constants of [32]. In this way, we can more efficaciously exploit the potential similarity and thus estimate the true signals among the image fragments. An illustration of block-matching and grouping is presented in Figure 3.
Note that to determine the threshold for SSIM-BM, we observe the grouping results after each pass of the BM test and adjust the threshold accordingly. The grouping accuracy (a higher value leads to a smaller group scale) and the group scale need to be balanced; the tuning criterion is therefore to choose the threshold value that yields no false positive results and no fewer than two patches (including the reference patch) in each group.
By grouping similar image patches together, we obtain a larger amount of feature data through group-level feature extraction. Hence, the feature vectors are represented more soundly and the true signals among different patches can be expressed. Besides the array of similar patch groups, the products of SSIM-BM include a similarity matrix and a location matrix. The former records the similarity computed during each matching operation, and the latter records the location of each matched patch in the given test image. Thus, both matrices have the same size as the group array, and their elements correspond one-to-one. All these products are later used in the pooling strategy.
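The matching step described above can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the paper: patches are non-overlapping squares, SSIM is evaluated once over the whole patch rather than with a sliding window, the constants `c1` and `c2` take the usual 8-bit defaults of [32], and the names `split_patches` and `block_match` are hypothetical. The three returned lists play the roles of the group array, the similarity matrix, and the location matrix.

```python
def patch_stats(p):
    """Flattened values, mean, and unbiased variance of a patch (list of rows)."""
    vals = [v for row in p for v in row]
    n = len(vals)
    mu = sum(vals) / n
    var = sum((v - mu) ** 2 for v in vals) / (n - 1)
    return vals, mu, var


def ssim(x, y, c1=6.5025, c2=58.5225):
    """Single-window SSIM between two equally sized patches."""
    xv, mx, vx = patch_stats(x)
    yv, my, vy = patch_stats(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(xv, yv)) / (len(xv) - 1)
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den


def split_patches(img, size):
    """Cut an image (list of rows) into non-overlapping size x size patches."""
    patches, locations = [], []
    for r in range(0, len(img) - size + 1, size):
        for c in range(0, len(img[0]) - size + 1, size):
            patches.append([row[c:c + size] for row in img[r:r + size]])
            locations.append((r, c))
    return patches, locations


def block_match(patches, locations, ref_index, threshold):
    """Group every patch whose SSIM to the reference reaches the threshold."""
    ref = patches[ref_index]
    group, sims, locs = [], [], []
    for patch, loc in zip(patches, locations):
        s = ssim(ref, patch)
        if s >= threshold:
            group.append(patch)  # matched patch
            sims.append(s)       # its similarity to the reference
            locs.append(loc)     # its top-left position in the image
    return group, sims, locs
```

Because the reference patch always matches itself with a similarity of 1, every group contains at least the reference; the threshold then controls how many further patches join it.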

NSS Feature Extraction
In previous works, NSS-based features have shown strong image quality prediction ability, which is why they are widely used in BIQA tasks for natural images. NSS-based features can be extracted from the spatial domain [11,12], the DCT domain [13,14], the wavelet domain [33], etc. Each NSS feature conveys useful information about one aspect of an image. RS images are characterized by spectrum features, edge features, and texture features. However, existing general-purpose BIQA models mainly focus on grayscale and structure features and rarely discuss image texture features [18,21]. Therefore, to improve the evaluation performance and enlarge the application scope of general-purpose BIQA models, BM-IQE introduces texture features to better suit remote sensing applications. Although some of the features we adopt have been introduced in previous works, we collectively propose a new feature bag consisting of texture features and several existing features. Experiments show that the newly introduced feature bag possesses superior image quality prediction performance.

Statistical Features of MSCN Coefficients
Human visual perception is sensitive to high-contrast areas of an image, and changes in image contrast can significantly affect image quality. Since image contrast is closely related to both subjective and objective image quality evaluation, we adopt contrast features for our BIQA tasks. The mean subtracted contrast normalized (MSCN) coefficient is a commonly used contrast feature. Ruderman et al. [34] pointed out that the locally normalized luminance map of a natural grayscale photographic image $I$ conforms to a Gaussian distribution. The products of the normalization process are called MSCN coefficients, and the process is given by:

$$\hat{I}(i,j) = \frac{I(i,j) - \mu(i,j)}{\sigma(i,j) + 1} \tag{5}$$

$$\mu(i,j) = \sum_{k=-K}^{K}\sum_{l=-L}^{L} \omega_{k,l}\, I(i+k, j+l) \tag{6}$$

$$\sigma(i,j) = \sqrt{\sum_{k=-K}^{K}\sum_{l=-L}^{L} \omega_{k,l}\,\bigl[I(i+k, j+l) - \mu(i,j)\bigr]^2} \tag{7}$$

where $i$ and $j$ are spatial coordinates and $\omega = \{\omega_{k,l} \mid k = -K, \ldots, K,\ l = -L, \ldots, L\}$ defines a unit-volume Gaussian window. $\mu$ is the local mean field and $\sigma$ is the corresponding local variance field; they denote the image local mean and the image local contrast, respectively. An important attribute of the MSCN coefficients is that their local correlation over the image content is weak, which helps the employed features remain effective across different image scenes. The MSCN coefficient map has been shown to conform to a unit normal distribution. According to (5)-(7), the luminance map $I$ of a test image is decorrelated through local mean subtraction and divisive normalization to yield the MSCN coefficient map. Then, a zero-mean generalized Gaussian distribution (GGD) is adopted to model the histogram of the MSCN coefficient map $\hat{I}(i,j)$, and its density function is given by

$$f(x;\alpha,\beta) = \frac{\alpha}{2\beta\,\Gamma(1/\alpha)} \exp\left(-\left(\frac{|x|}{\beta}\right)^{\alpha}\right)$$

where $\Gamma(\cdot)$ is the gamma function,

$$\Gamma(a) = \int_{0}^{\infty} t^{a-1} e^{-t}\, dt, \quad a > 0.$$

The parameter $\alpha$ controls the shape of the GGD, and $\beta$ controls the corresponding variance. The two parameters are employed as NSS features for image quality prediction.
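As an illustration of the normalization in (5)-(7), the following minimal sketch computes the MSCN coefficient map of a small grayscale image stored as a list of rows. Several choices here are assumptions rather than details given in the text: the 7 × 7 window with a Gaussian scale of 7/6 follows a setting that is common in BRISQUE-style implementations, border pixels are replicated, and the function names are hypothetical.

```python
import math


def gaussian_window(half, sigma):
    """Unit-volume 2D Gaussian window of size (2*half+1) x (2*half+1)."""
    w = [[math.exp(-(k * k + l * l) / (2.0 * sigma * sigma))
          for l in range(-half, half + 1)]
         for k in range(-half, half + 1)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]


def mscn(image, half=3, sigma=7.0 / 6.0):
    """Mean subtracted contrast normalized coefficients of a grayscale image."""
    rows, cols = len(image), len(image[0])
    w = gaussian_window(half, sigma)

    def pixel(i, j):
        # Replicate border pixels (an assumption; padding is not specified).
        return image[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            neigh = [(w[k + half][l + half], pixel(i + k, j + l))
                     for k in range(-half, half + 1)
                     for l in range(-half, half + 1)]
            mu = sum(wt * v for wt, v in neigh)               # local mean
            var = sum(wt * (v - mu) ** 2 for wt, v in neigh)  # local variance
            out[i][j] = (image[i][j] - mu) / (math.sqrt(var) + 1.0)
    return out
```

On a perfectly flat image the local mean equals every pixel value, so all coefficients vanish; structure and distortion show up as departures from zero.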
In addition to the MSCN coefficient histogram, some NSS features derived from it were also introduced in previous works. In [14], the statistical relationships between neighboring pixels were modeled. The authors suggested that while MSCN coefficients are more homogeneous for pristine images, the signs of adjacent coefficients also exhibit a regular structure, which is disturbed in the presence of distortion. Pairwise products of neighboring MSCN coefficients are computed along four orientations: horizontal (H), vertical (V), main-diagonal (MD), and secondary-diagonal (SD); they are denoted by $\hat{I}(i,j)\hat{I}(i,j+1)$, $\hat{I}(i,j)\hat{I}(i+1,j)$, $\hat{I}(i,j)\hat{I}(i+1,j+1)$, and $\hat{I}(i,j)\hat{I}(i+1,j-1)$, respectively, where $i$ and $j$ are the spatial coordinates. The paired products of neighboring MSCN coefficients are observed to follow a zero-mode asymmetric generalized Gaussian distribution (AGGD),

$$f(x;\gamma,\beta_l,\beta_r) = \begin{cases} \dfrac{\gamma}{(\beta_l+\beta_r)\,\Gamma(1/\gamma)} \exp\left(-\left(\dfrac{-x}{\beta_l}\right)^{\gamma}\right), & x < 0 \\[2ex] \dfrac{\gamma}{(\beta_l+\beta_r)\,\Gamma(1/\gamma)} \exp\left(-\left(\dfrac{x}{\beta_r}\right)^{\gamma}\right), & x \ge 0 \end{cases}$$

The mean of the AGGD is

$$\eta = (\beta_r - \beta_l)\,\frac{\Gamma(2/\gamma)}{\Gamma(1/\gamma)}$$

Then, the parameters $(\gamma, \beta_l, \beta_r, \eta)$ are taken as derived NSS features at the four orientations for our BIQA tasks. In particular, all MSCN statistical features are extracted at two scales (the original scale and a low-pass down-sampled scale) to capture multi-scale information.
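The four paired-product maps can be sketched directly from their definitions. In this illustrative snippet the function names are hypothetical, and only pairs that lie fully inside the map are kept, which is an assumption about border handling (some implementations use circular shifts instead). The helper `aggd_mean` evaluates the AGGD mean given its three parameters.

```python
import math


def paired_products(m):
    """Products of neighboring MSCN coefficients along H, V, MD, and SD."""
    rows, cols = len(m), len(m[0])
    h = [[m[i][j] * m[i][j + 1] for j in range(cols - 1)]
         for i in range(rows)]
    v = [[m[i][j] * m[i + 1][j] for j in range(cols)]
         for i in range(rows - 1)]
    md = [[m[i][j] * m[i + 1][j + 1] for j in range(cols - 1)]
          for i in range(rows - 1)]
    sd = [[m[i][j] * m[i + 1][j - 1] for j in range(1, cols)]
          for i in range(rows - 1)]
    return {"H": h, "V": v, "MD": md, "SD": sd}


def aggd_mean(gamma, beta_l, beta_r):
    """Mean of the AGGD: (beta_r - beta_l) * Gamma(2/gamma) / Gamma(1/gamma)."""
    return (beta_r - beta_l) * math.gamma(2.0 / gamma) / math.gamma(1.0 / gamma)
```

Each of the four maps would then be fitted with an AGGD, giving four parameters per orientation.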

Statistical Features of Colors
In order to exploit further information in color images that is closely related to human perception, we resort to a classical NSS model. Ruderman et al. [35] pointed out that the distributions of natural image statistics conform well to a Gaussian probability model in a logarithmic-scale opponent color space. The color values (R, G, B) for each pixel in an image are determined from three human cone quantal catches through a direct linear correspondence (refer to [36] for details), and the specific coordinate transformation is as follows. Each of the three channels $R(i,j)$, $G(i,j)$, and $B(i,j)$ is converted to a logarithmic signal, from which the mean is subtracted:

$$\mathcal{R}(i,j) = \log R(i,j) - \mu_R, \quad \mathcal{G}(i,j) = \log G(i,j) - \mu_G, \quad \mathcal{B}(i,j) = \log B(i,j) - \mu_B$$

where $\mu_R$, $\mu_G$, and $\mu_B$ are the mean values of $\log R(i,j)$, $\log G(i,j)$, and $\log B(i,j)$ over the image. Then, an orthogonal decorrelation is applied to the three logarithmic signals, producing three principal axes, which are given by:

$$l_1 = \frac{\mathcal{R} + \mathcal{G} + \mathcal{B}}{\sqrt{3}}, \quad l_2 = \frac{\mathcal{R} + \mathcal{G} - 2\mathcal{B}}{\sqrt{6}}, \quad l_3 = \frac{\mathcal{R} - \mathcal{G}}{\sqrt{2}}$$

The coefficients $l_1$, $l_2$, and $l_3$ are observed to follow a symmetric Gaussian distribution. The empirical density functions of $l_1$, $l_2$, and $l_3$ are given by a Gaussian model,

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$

By estimating the parameters $\mu$ and $\sigma^2$, we obtain two additional NSS features for each of the three channels.
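The color feature pipeline above (log conversion, mean subtraction, orthogonal decorrelation, Gaussian fit) can be sketched as follows. The decorrelation weights are the usual Ruderman-style axes $(1,1,1)/\sqrt{3}$, $(1,1,-2)/\sqrt{6}$, $(1,-1,0)/\sqrt{2}$, assumed here to be the ones the text refers to. The sketch also assumes strictly positive channel values, since the handling of zero-valued pixels is not described.

```python
import math

def log_opponent(pixels):
    """Map a list of (R, G, B) tuples with positive values to (l1, l2, l3) tuples."""
    logs = [(math.log(r), math.log(g), math.log(b)) for r, g, b in pixels]
    n = len(logs)
    # Per-channel means of the logarithmic signals over the whole image.
    mu = [sum(p[c] for p in logs) / n for c in range(3)]
    out = []
    for lr, lg, lb in logs:
        r, g, b = lr - mu[0], lg - mu[1], lb - mu[2]
        out.append(((r + g + b) / math.sqrt(3),
                    (r + g - 2 * b) / math.sqrt(6),
                    (r - g) / math.sqrt(2)))
    return out

def gaussian_fit(samples):
    """Maximum-likelihood mean and variance of a 1-D sample."""
    n = len(samples)
    mu = sum(samples) / n
    return mu, sum((s - mu) ** 2 for s in samples) / n
```

For a purely gray image (R = G = B at every pixel), the two chromatic axes `l2` and `l3` vanish and only `l1` carries information, which is a quick sanity check on the transform.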

Structure Features Extraction
Generally speaking, the optics features we introduced in Section 3.1 can provide a large amount of useful information. However, these features are mainly computed from individual pixels, which restricts them to the "unstructured" aspects of the image, equivalent to an investigation of individual rays of light [37]. As Pouli et al. [37] pointed out, the first and most obvious way to look at image structure is to examine the relationship between pairs of pixels. The gradient is such a statistic. As given by (20)-(21), the gradients at pixel $(i,j)$ are calculated by convolving the luminance map I with the Gaussian derivative filter along the horizontal and vertical orientations, respectively:

$$D_i(i,j) = I(i,j) \otimes \frac{\partial G(i,j)}{\partial i} \tag{20}$$

$$D_j(i,j) = I(i,j) \otimes \frac{\partial G(i,j)}{\partial j} \tag{21}$$

where $\otimes$ denotes convolution and $G(i,j)$ is a two-dimensional Gaussian distribution with scale $\sigma$, expressed as

$$G(i,j) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{i^2 + j^2}{2\sigma^2} \right)$$

$D_i$ is the horizontal gradient and $D_j$ is the vertical gradient. It has been found that natural image gradients are well modeled by a GGD [18]. An example of the gradient distributions for an image is given in Figure 4, where $D_x$ and $D_y$ denote the horizontal and vertical gradients. Therefore, by fitting the histograms of the gradient components $D_i$ and $D_j$ to the GGD model, we can estimate the parameters $(\alpha, \beta)$ and adopt them as quality-aware features.
Besides the gradient components, it is common to calculate the gradient magnitude at a given location from the horizontal and vertical components [37]:

$$M(i,j) = \sqrt{D_i(i,j)^2 + D_j(i,j)^2}$$

The gradient magnitude of natural images conforms well to a Weibull distribution:

$$p(x; a, b) = \frac{a}{b} \left( \frac{x}{b} \right)^{a-1} \exp\left( -\left( \frac{x}{b} \right)^{a} \right), \quad x \geq 0 \tag{24}$$

where the parameters $a$ and $b$ control the shape and scale of the Weibull distribution, respectively. Recent studies in neuroscience suggest that the responses of visual neurons are strongly correlated with Weibull statistics during image processing [38]. Hence, with an optimal fitting of the image gradient magnitude histogram to the Weibull distribution, we obtain the two parameters $a$ and $b$ as NSS features.
In order to further investigate how distortions are expressed in image color space, we map the RGB images into a perceptually relevant opponent color space, in which the conversion weights are perceptually optimized on human visual statistics [39]. Based on this, the gradient components and derived magnitude features are also computed on each of the opponent channels $O_1$, $O_2$, and $O_3$ as NSS features.

Statistics of GLCM
Texture analysis has been widely used in remote sensing image interpretation and processing. Textures in an image are generally understood as a repetitive arrangement of some basic patterns, which can to some extent reflect the structural characteristics of objects. The main idea of texture analysis is to specify the spatial distribution patterns of grayscale images, and it can be effectively realized by means of the GLCM.
GLCM is commonly used for remote sensing image analysis. It utilizes the spatial distribution of gray levels to describe the image texture. Since the gray level distribution is significantly influenced by the presence of distortions, we take GLCM as a powerful analysis tool in our BIQA tasks. Figure 5 is an illustration of the GLCM calculation. Given the image luminance map I, we can obtain the corresponding GLCM map. Specifically, from Figure 5b we can see that the GLCM value at the coordinate (1, 1) is 0, which means that no adjacent pixel pair with gray values of (1, 1) can be found in the luminance map (Figure 5a). Likewise, the GLCM value at the coordinate (1, 2) is 10, which means that 10 pairs of adjacent pixels with gray values of (1, 2) can be found in the image. In general, the GLCM map essentially counts all possible combinations of adjacent gray values, where adjacency can be understood in different directions. When the pixel pair is given by $f(i,j)$ and $f(i+1,j)$, we regard the pixels as adjacent in the 0° orientation. When the pixel pair is given by $f(i,j)$ and $f(i,j+1)$, we regard the pixels as adjacent in the 90° orientation. When the pixel pair is given by $f(i,j)$ and $f(i+1,j+1)$, we regard the pixels as adjacent in the 45° orientation. When the pixel pair is given by $f(i,j)$ and $f(i-1,j+1)$, we regard the pixels as adjacent in the 135° orientation. Hence, the GLCM map is computed at four orientations with three dimensions for color images.
After we obtain the GLCM map, derived statistics can be further computed from it. In this work, we introduce four classic statistics of the GLCM map: contrast, energy, entropy, and correlation. The contrast statistic is closely related to the definition and the texture depth of the image. The deeper the texture groove, the larger the contrast value, and the clearer the image. The energy statistic reflects the uniformity of the gray level distribution, which essentially represents the texture fineness. A uniform GLCM map represents a fine texture pattern producing a smaller energy value, and an uneven GLCM map represents a coarse texture pattern yielding a larger energy value. The entropy statistic measures the randomness of the image information. When the values in the GLCM map are all equal, the image pixels show the greatest randomness and the entropy reaches its maximum. Furthermore, the entropy shows the complexity of the image gray level distribution: a larger entropy value suggests a more complex image structure. Correlation is often used to measure the similarity of gray levels in the row or column direction, and a larger correlation indicates a greater similarity of image gray levels.
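The calculations of the four GLCM statistics are given by

$$\mathrm{Contrast} = \sum_{i} \sum_{j} (i - j)^2\, p(i,j)$$

$$\mathrm{Energy} = \sum_{i} \sum_{j} p(i,j)^2$$

$$\mathrm{Entropy} = -\sum_{i} \sum_{j} p(i,j) \log p(i,j)$$

$$\mathrm{Correlation} = \sum_{i} \sum_{j} \frac{(i - \mu_i)(j - \mu_j)\, p(i,j)}{\sigma_i \sigma_j}$$

where $i$, $j$ are the spatial coordinates of the GLCM map, $p(i,j)$ is the GLCM value at $(i,j)$ normalized so that the map sums to one, and $\mu_i$, $\mu_j$, $\sigma_i$, $\sigma_j$ are the means and standard deviations of its row and column marginal distributions.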
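To make the GLCM construction and its four statistics concrete, here is a minimal sketch in plain Python. It assumes the classic Haralick-style definitions on a GLCM normalized to sum to one, and it indexes gray levels from 0 (the figure in the text counts from 1). The default offset `(di, dj) = (1, 0)` follows the text's convention for the 0° orientation; the other orientations correspond to `(0, 1)`, `(1, 1)`, and `(-1, 1)`. Function names are hypothetical.

```python
import math

def glcm(image, levels, di=1, dj=0):
    """Normalized co-occurrence matrix of pixel pairs f(i, j), f(i + di, j + dj)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for i in range(h):
        for j in range(w):
            ii, jj = i + di, j + dj
            if 0 <= ii < h and 0 <= jj < w:
                counts[image[i][j]][image[ii][jj]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def glcm_stats(p):
    """Contrast, energy, entropy and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    # Correlation is undefined for a degenerate marginal; report 0.0 in that case.
    correlation = cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 0.0
    return {"contrast": contrast, "energy": energy,
            "entropy": entropy, "correlation": correlation}
```

For a color image, the same routine would be applied per channel and per orientation, and the resulting statistics concatenated into the feature vector.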
To verify that the four statistics introduced above are effective and useful NSS features for image quality prediction, we conducted a set of demonstrative experiments according to the following scheme.

1. Compute the GLCM map and four derived statistics on a high-quality test image.

2. Use a 5 × 5 Gaussian kernel to blur the pristine image. The deviation of the Gaussian filter is set to four levels: 0, 0.5, 1, and 5. The GLCM map and four derived statistics are computed on each blurred image.

3. Add white noise with zero mean and various variances to the pristine image. The variance of the white noise is set to four levels: 0, 0.001, 0.005, and 0.01. The GLCM map and four derived statistics are also computed on each degraded image.

Figure 6 presents the experimental results. It shows that the four GLCM statistics keep great consistency and linearity under varying degradations, which suggests that they are useful and effective for image quality prediction tasks. Hence, we adopt them as quality-aware features.

[Figure 6: the four GLCM statistics under Gaussian blur and white noise. As shown on the x-axis, four distortion levels were set for each distortion type.]

Statistics of Log-Gabor Filter Response
Previous research has shown that visual cortex neurons respond selectively to stimuli at different orientations and frequencies [18,40]. Hence, the multi-scale and multi-orientation filter responses to the image also carry significant information for quality assessment tasks.
The Fourier transform is known as a powerful tool for signal processing. However, a noticeable disadvantage of the Fourier transform is that image elements of the same frequency at different spatial locations are often mixed together, which makes it impossible to achieve a local multi-scale and multi-orientation analysis. Unlike the Fourier transform, the Gabor filter can extract specified frequency and orientation information from local spatial fields, which makes it an effective texture analysis tool.
In order to make better use of the characteristics of the Gabor filter, in this work we deploy perceptually relevant log-Gabor filters [40] with multiple orientations and scales to accomplish the filtering. A 2D log-Gabor filter is given by

$$G(\omega, \theta) = \exp\left( -\frac{\left( \log(\omega / \omega_0) \right)^2}{2\sigma_r^2} \right) \cdot \exp\left( -\frac{(\theta - \theta_j)^2}{2\sigma_\theta^2} \right)$$

where $\omega_0$ is the center frequency, $\theta_j = j\pi/J$ is the orientation angle, $j \in \{0, 1, \ldots, J-1\}$ is the orientation index, and $J$ is the number of orientations. $\sigma_r$ and $\sigma_\theta$ control the radial bandwidth and the angular bandwidth of the filter, respectively. With log-Gabor filters possessing $N$ different center frequencies and $J$ different orientations applied to an image $f(x)$, we acquire a set of $2NJ$ responses $\{R_{n,j}(x), I_{n,j}(x) \mid n = 0, \ldots, N-1,\ j = 0, \ldots, J-1\}$, where $R_{n,j}(x)$ and $I_{n,j}(x)$ are the real and imaginary components of the log-Gabor filter responses, respectively. As the filter response maps are observed to follow a Gaussian distribution, we use the GGD to model the distributions of $\{R_{n,j}(x)\}$ and $\{I_{n,j}(x)\}$, and the best-fit parameters $(\alpha, \beta)$ are taken as quality prediction features. Besides, we use the GGD to model the distributions of the smoothed directional gradient components of $R_{n,j}(x)$ and $I_{n,j}(x)$, and the best-fit parameters are extracted as NSS features. Furthermore, we use the Weibull distribution (Equation (24)) to model the smoothed directional gradient magnitudes of $R_{n,j}(x)$ and $I_{n,j}(x)$, and take the best-fit parameters $(a, b)$ as NSS features.
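A small sketch of the log-Gabor frequency response may help to see how the center frequency and orientation shape the filter. It assumes the standard separable form (a Gaussian on the log-frequency axis times a Gaussian on the angular axis); the defaults for `sigma_r` and `sigma_theta` are the values reported in the Training Details, while the circular wrapping of the angular difference is an implementation choice of this sketch rather than something stated in the paper.

```python
import math

def log_gabor(omega, theta, omega0, j, J, sigma_r=0.60, sigma_theta=0.71):
    """Response of one 2-D log-Gabor filter at polar frequency (omega, theta)."""
    if omega <= 0.0:
        return 0.0  # a log-Gabor filter has no DC component
    theta_j = j * math.pi / J
    # Wrap the angular difference into [-pi, pi) so orientation distance is circular.
    d_theta = (theta - theta_j + math.pi) % (2.0 * math.pi) - math.pi
    radial = math.exp(-(math.log(omega / omega0) ** 2) / (2.0 * sigma_r ** 2))
    angular = math.exp(-(d_theta ** 2) / (2.0 * sigma_theta ** 2))
    return radial * angular
```

The response peaks at 1 when `omega == omega0` and `theta == theta_j`, and it is symmetric on a logarithmic frequency axis: doubling and halving the frequency relative to `omega0` give the same attenuation. In practice, the response would be sampled on the 2-D frequency grid of the image and multiplied with its Fourier transform, which is not shown here.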

Algorithm
The BM-IQE model was constructed in two stages: the training stage and the testing stage. In the training stage, we learn a reference multivariate Gaussian (MVG) model of the NSS-based features from a collection of pristine images. Then, in the testing stage, similar patches of a given test image are grouped through block-matching and a feature vector is extracted from each group. An MVG model is fitted to each feature vector, and the basic quality estimation of each patch (the centroid of the group) is given by measuring the distance between the group MVG model and the reference MVG model. A further quality estimation of each patch is obtained by a weighted average of the basic estimations of all its similar patches. Finally, the overall quality of the test image is computed by average pooling of the patch quality estimations.

Reference MVG Model Learning
Inspired by IL-NIQE, we learn a reference MVG model to characterize the statistical NSS features of natural pristine images. We collected online a total of 100 high-quality natural pristine images, which cover people, animals, architecture, plants, etc. Only images generally recognized as high-quality were eventually selected. Note that experiments were conducted with training image sets of different sizes, which demonstrated that 100 images are enough to produce a sound reference model. Besides, there is no overlap between the selected pristine images and the IQA databases that are later used in the algorithm performance evaluation, which ensures the effectiveness of our BIQA method.
After the natural pristine image corpus was created, we started learning the reference MVG model. First, we partitioned the pristine images into patches of size $p \times p$ and selected the high-quality patches. Then, the EFB we introduced in Section 3 were extracted from each patch, yielding a feature vector $x_i$. As some features can be correlated with each other, such as the gradient components and magnitude features, we applied PCA to the feature vectors to reduce the computational cost. The PCA can be described as

$$x'_i = \Phi^T x_i \tag{32}$$

where $x_i$ are the columns of $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$, the matrix of feature vectors, and $\Phi \in \mathbb{R}^{d \times m}$ is the learned projection matrix. Its dimension $m$ is determined by solving an error function (Equation (33)) of the eigenvalue matrix $S$, where $d$ denotes the original dimension and $m$ denotes the wanted dimension. As a result of PCA, we obtained $x'_i$, $i = 1, \ldots, n$, as samples for the MVG distribution.
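The following sketch illustrates one common way to pick the reduced dimension from the eigenvalues and to apply a learned projection. The selection rule (keep the smallest `m` whose discarded eigenvalue share does not exceed `max_loss`) is an assumption standing in for the paper's error function, whose exact form is not reproduced here; `choose_dimension`, `project`, and `max_loss` are hypothetical names.

```python
def choose_dimension(eigenvalues, max_loss=0.05):
    """Smallest m whose discarded share of total eigenvalue mass is <= max_loss."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    kept = 0.0
    for m, v in enumerate(vals, start=1):
        kept += v
        if 1.0 - kept / total <= max_loss:
            return m
    return len(vals)

def project(phi, x):
    """Compute x' = Phi^T x, with Phi stored as d rows of m columns."""
    d, m = len(phi), len(phi[0])
    return [sum(phi[r][c] * x[r] for r in range(d)) for c in range(m)]
```

With eigenvalues `[5, 3, 1, 1]` and `max_loss=0.25`, two components are kept because they carry 80% of the total. The same `project` call would later be reused on test-image features with the projection matrix learned in training.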
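The final reference MVG model is given by

$$f(X) = \frac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right)$$

where $X$ is the vector variable, and $(\mu, \Sigma)$ are the mean vector and the covariance matrix of $X$, respectively. The reference MVG model can be fully described by the parameters $(\mu, \Sigma)$.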

Distorted Image Quality Prediction
Like the pristine images in the training stage, the given test image is partitioned into patches of size $p \times p$, and these patches are numbered in sequence. Then, similar patches are grouped together through SSIM-BM. Specifically, each patch is used in turn as a reference patch, and the SSIM metric is computed between the reference patch and all the other patches. The patches whose metric values exceed a given threshold (that is, whose distance from the reference patch is small) are considered mutually similar and are grouped together. For each group, the reference patch can be regarded as the "centroid" of the group; the sequence numbers and metric values of the other patches are stored in a location matrix $M_{LOC}$ and a similarity matrix $M_{SIM}$, respectively, for later use. Then, the NSS feature vector $y_i$ is extracted from each group, and the feature vector $y'_i$ after PCA is obtained using the pre-learned projection matrix $\Phi$:

$$y'_i = \Phi^T y_i$$

Additional experiments were conducted to verify the validity of using $\Phi$ for distorted images. After obtaining $\Phi$ in the training stage, PCA was applied to arbitrary images. The error function (33) was used to measure the information loss of PCA and was computed between the feature vectors before and after the PCA operation. The test results were all below 0.05. Therefore, the projection matrix is assumed to generalize to distorted images.
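The grouping step can be sketched as follows. Because the SSIM-BM definition (Equations (1)-(4)) lies outside this section, the sketch substitutes the standard global SSIM formula computed over whole patches, with the usual constants for 8-bit data; this is an assumption, not the paper's exact matching rule. Each returned group plays the role of one row of the location and similarity matrices, and the default threshold is the value reported in the Training Details.

```python
def global_ssim(x, y, data_range=255.0):
    """Global SSIM between two equally sized patches given as flat lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return (((2 * mx * my + c1) * (2 * cxy + c2))
            / ((mx * mx + my * my + c1) * (vx + vy + c2)))

def block_match(patches, t=0.69):
    """For each reference patch, list (index, similarity) of patches with SSIM >= t."""
    groups = []
    for i, ref in enumerate(patches):
        group = [(i, 1.0)]  # the reference patch is the centroid of its own group
        for j, other in enumerate(patches):
            if j != i:
                s = global_ssim(ref, other)
                if s >= t:
                    group.append((j, s))
        groups.append(group)
    return groups
```

The exhaustive pairwise comparison costs on the order of n² SSIM evaluations for n patches, which is acceptable for large patches but would need a restricted search window for fine partitions.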
After obtaining the feature vector set $\{y'_i\}$, $i = 1, \ldots, n$, an MVG model denoted by $(y'_i, \Sigma')$ is fitted to each group, where $y'_i$ plays the role of $\mu_i$ and the empirical covariance $\Sigma'$ of $\{y'_i\}$ plays the role of $\Sigma_i$. By comparing the group MVG model against the reference MVG model, we obtain the basic quality estimation $Q_i$ of the $i$-th group centroid patch. The basic quality score is given by

$$Q_i = \sqrt{ (\mu - y'_i)^T \left( \frac{\Sigma + \Sigma'}{2} \right)^{-1} (\mu - y'_i) } \tag{36}$$

where $(\mu, \Sigma)$ describes the reference MVG model and $(y'_i, \Sigma')$ describes the MVG model of patch $i$ (the centroid of group $i$). Then, by utilizing the location matrix and similarity matrix obtained after SSIM-BM, the further quality estimation $\hat{Q}_i$ of each patch $i$ is given by a weighted average of the basic quality estimations of all its similar patches, where the similarity is taken as the weight for averaging. Finally, the overall quality score $Q$ of the test image is computed by average pooling of all the patch quality estimations. The process of computing the overall image quality score is illustrated in Figure 7, and the BM-IQE algorithm (Algorithm 1) is summarized below.

Algorithm 1. BM-IQE
for each of the n image patches
    for the rest n − 1 patches
        run global SSIM-BM using Equations (1)-(4)
        while similarity ≥ t
            Update the patch group, the similarity matrix and the location matrix
        end while
    end for
end for
Extract feature matrices from patch groups using Equations (5)-(31)
Apply PCA to the feature matrices using Equation (32)
Conform the results of PCA to MVG models respectively using Equation (33)
Calculate $Q_i$ according to the reference MVG model using Equation (36)
Calculate $\hat{Q}_i$ as the similarity-weighted average of the $Q_j$ of its similar patches
Average $\hat{Q}_i$ to produce the overall quality score $Q$
Output: $Q$
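The scoring stage described above can be sketched in a few lines of plain Python. The distance follows the NIQE/IL-NIQE style form $\sqrt{(\mu - y)^T ((\Sigma + \Sigma_y)/2)^{-1} (\mu - y)}$, which is assumed to be the one used here, and the weighted average divides by the sum of similarities, which is also an assumption about the normalization. A small Gauss-Jordan solver replaces an explicit matrix inverse; all function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def basic_quality(mu, sigma, y, sigma_y):
    """Distance between the reference MVG (mu, sigma) and a group MVG (y, sigma_y)."""
    n = len(mu)
    diff = [mu[k] - y[k] for k in range(n)]
    pooled = [[(sigma[r][c] + sigma_y[r][c]) / 2.0 for c in range(n)]
              for r in range(n)]
    z = solve(pooled, diff)  # z = pooled^{-1} diff
    return math.sqrt(sum(d * v for d, v in zip(diff, z)))

def pooled_quality(basic, groups):
    """Similarity-weighted average per patch, then average pooling over patches."""
    refined = [sum(s * basic[j] for j, s in g) / sum(s for _, s in g)
               for g in groups]
    return sum(refined) / len(refined)
```

Here `groups[i]` is a list of `(patch index, similarity)` pairs for patch `i`, including the patch itself with similarity 1, so a patch with no similar neighbors simply keeps its basic score.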

Results and Discussion
In this section, we present the experimental results and analyze BM-IQE's performance on blind image quality assessment by employing various natural image databases and remote sensing scene datasets as well as comparing it with existing FR/NR-IQA algorithms.

Training Details
Inspired by [18], we tuned the parameters on a subset of the TID2013 database. The subset consists of 10 reference images and the associated 1200 distorted images. The tuning criterion is that the parameter value leading to a higher Spearman rank-order correlation coefficient (SROCC) is chosen. In our final implementation, we set the patch size $p$ to 84 and the dimension $m$ of the PCA-transformed features to 430. The parameters of the log-Gabor filters are set as follows: $N = 3$, $J = 4$, $\sigma_r = 0.60$, $\sigma_\theta = 0.71$, $\omega_0^1 = 0.417$, $\omega_0^2 = 0.318$, and $\omega_0^3 = 0.243$, where $\omega_0^1$, $\omega_0^2$, and $\omega_0^3$ represent the center frequencies of the log-Gabor filters at the three scales. For the block-matching threshold $t$, we tuned the parameter on a dataset consisting of 50 distorted images that cover different image content. The testing results were manually inspected to ensure that the entire testing dataset complies with the tuning rule presented in Section 2.2. In the final implementation, the threshold $t$ was set to 0.69.

Testing Details
To evaluate the prediction ability of the BM-IQE model, five general-purpose IQA databases were used: (1) the LIVE database [41], (2) the TID2013 database [42], (3) the CSIQ database [43], (4) the IVC database [44], and (5) the LIVE MD database [45]. Besides, two RS image datasets, the UC Merced Land Use Dataset [46] and the Aerial Image Dataset (AID) [47], were also adopted. Among the IQA databases, the LIVE MD and TID2013 databases include multiply distorted images, and the remaining three are singly distorted image databases. The information of these databases is summarized in Table 1. We compare BM-IQE with nine state-of-the-art FR/NR-IQA models: (1) SSIM [35], (2) FSIM [48], (3) VIF [49], (4) BRISQUE [14], (5) DIIVINE [11], (6) NIQE [19], (7) IL-NIQE [18], (8) MUSIQUE [21], and (9) WaDIQaM-NR [20].
In the IQA research area, the goodness of an algorithm is gauged by measuring the correlation of algorithmic scores with subjective (human-scored) mean opinion scores (DMOS/MOS) on a large dataset covering different distortions. Thus, three typical metrics were adopted to evaluate the prediction performance of the competing algorithms: (1) the Spearman rank-order correlation coefficient (SROCC), (2) the Pearson linear correlation coefficient (PLCC), and (3) the root mean square error (RMSE), which measure the rank-order correlation, the linear correlation, and the degree of dispersion between the two groups of scores, respectively. The SROCC metric is computed between the objective scores predicted by the BIQA algorithms and the subjective mean opinion scores (MOS) provided by the image databases, and is generally used to evaluate the prediction monotonicity and accuracy. The PLCC and RMSE metrics are computed between the subjective and objective scores following a nonlinear regression:

$$f(x) = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp\left( \beta_2 (x - \beta_3) \right)} \right) + \beta_4 x + \beta_5$$

where $x$ is the objective score and $\beta_i$, $i = 1, 2, \ldots, 5$, are the parameters to be fitted. In short, the higher the SROCC and PLCC values and the lower the RMSE value, the better the prediction ability.
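For reference, the three evaluation metrics can be computed with the following stdlib-only sketch. SROCC is obtained as the Pearson correlation of average ranks, which handles ties. `logistic5` is the five-parameter logistic mapping commonly used in IQA studies, assumed here to be the regression form intended by the text; fitting its parameters requires a nonlinear least-squares routine and is not shown.

```python
import math

def ranks(values):
    """Average 1-based ranks, with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

def srocc(x, y):
    """Spearman rank-order correlation coefficient."""
    return plcc(ranks(x), ranks(y))

def rmse(x, y):
    """Root mean square error between two score lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def logistic5(x, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping from objective to subjective scale."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (x - b3)))) + b4 * x + b5
```

A typical evaluation would compute `srocc(pred, mos)` directly, then map `pred` through the fitted `logistic5` before computing `plcc` and `rmse` against the MOS.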

Evaluation of Features
In our algorithm, six types of features were employed: (1) MSCN features, (2) adjacent MSCN features, (3) color features, (4) gradient features, (5) log-Gabor filter response features, and (6) GLCM features. In order to demonstrate the effectiveness and efficiency of each selected feature, we evaluated the performance of each feature on the IQA databases. The pre-trained projection matrix remained fixed for all experiments, and the SROCC metric was used to evaluate the prediction ability of each feature. Experimental results are shown in Table 2. Moreover, since block-matching is introduced to extract group-level features, a comparative experiment was conducted to verify that it effectively improves on the expression of individual patch features. The SROCC and PLCC metrics were used to measure the algorithm performance. The overall results are reported in Table 3, and the best-performing results are in bold. From Table 2, we can see that the first five types of features all show poor BIQA performance, whereas the four GLCM features show superior performance on all databases, which suggests that texture conveys more important information for visual perception evaluation. Besides, among the first five features, the log-Gabor filter response feature performs comparatively better and the color feature performs the worst. This indicates that the selected multi-scale and multi-orientation filter response feature can effectively characterize image quality, while the color space we adopted is not suitable for the statistical description of natural images. More representative feature models are needed in future work.
From Table 3, we can see that on all employed databases, the algorithm with the block-matching strategy performs much better than that without block-matching. The only deficiency of our model is the PLCC value on the TID2013 database; however, the SROCC metric is worth more for performance measurement than the PLCC metric, and the SROCC result of our model is numerically slightly better. Therefore, we conclude that the proposed block-matching strategy is effective and valuable.

Performance on Remote Sensing Databases
In this section, we analyze the quality prediction abilities of BM-IQE and other FR/NR-IQA models on remote sensing image datasets. Two benchmark datasets were used in the experiments. The first one is the UC Merced Land Use Dataset, which contains land use images in 21 classes and is meant for research purposes. There are 100 images for each of the following classes: agricultural, airplane, golf course, beach, buildings, chaparral, dense residential, forest, freeway, harbor, baseball diamond, intersection, medium residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis court. Each image measures 256 × 256 pixels. The second one is the Aerial Image Dataset (AID), which has 30 different scene classes and about 200 to 400 samples of size 600 × 600 in each class.
In this work, we selected 15 pristine RS images of different classes from the UC Merced Land Use Dataset and 20 from the Aerial Image Dataset. The thumbnails of all these images are shown in Figure 8. Note that in order to evaluate the imaging performance of an imaging system truly and objectively, only the degradation types caused by the imaging system should be considered, regardless of environmental disturbances such as cloud cover or atmospheric turbulence. The band-to-band mismatch distortion, though caused by the imaging system, usually occurred on remote sensors of earlier years. With the technology of spectral splitting, the geometric deviation between the bands of a multispectral image is negligible. Thus, aiming at applications for current remote sensors, we see little necessity to consider the band-to-band mismatch distortion. The dropping line is another distortion type caused by the imaging system. This kind of distortion has a noticeable manifestation, and the visual effect of the image declines significantly, which straightforwardly reflects a fault in the remote sensor imaging equipment and the decline of imaging performance. Hence, there is no necessity to quantify the dropping line distortion through the image quality assessment process. Based on the above considerations, we selected white noise, Gaussian blur, and JPEG compression distortions to conduct the experiments on RS images.
We added white noise, Gaussian blur, and JPEG compression distortions to each pristine image to set up a degraded image dataset. Each type of distortion was set to five different levels, and the subjective score was determined by averaging the grading results of 10 people. Note that the 10 people were all trained on the standard IQA databases to ensure a consistent guideline for their judgments. Six NR-IQA algorithms were tested on the RS image dataset: BRISQUE, DIIVINE, NIQE, IL-NIQE, WaDIQaM-NR, and BM-IQE. SROCC and PLCC were used to measure the quality prediction performance of each IQA model. An example of distorted images and the corresponding objective scores is given in Figure 9. The overall testing results are reported in Tables 4 and 5.
From Tables 4 and 5, we can see that on the RS scene image datasets, BM-IQE shows much better BIQA performance than the other algorithms. For all types of distortions, BM-IQE obtains the best evaluation results among the compared models, and the metric values show clear progress on the RS-IQA problem. Nevertheless, none of the tested models obtains a truly satisfying result. A possible reason is that texture is not soundly expressed by the existing features: though the GLCM features have effectively improved the IQA performance, they are far from meeting our demands. Hence, the quality assessment of RS scene images deserves further exploration.

Performance on Individual Databases
In this section, we evaluate the quality prediction performance of BM-IQE and other algorithms on individual distortion databases. Three FR-IQA models and five NR-IQA models were compared with BM-IQE: SSIM, FSIM, VIF, BRISQUE, DIIVINE, NIQE, IL-NIQE, and WaDIQaM-NR.
First, all algorithms were trained on a subset of the TID2013 database, and this subset was excluded from the later testing stage. Then, the algorithms were tested on four benchmark image databases: LIVE, CSIQ, TID2013, and IVC. SROCC, PLCC, and RMSE were computed to measure the consistency and accuracy of the predicted scores with respect to the MOS. The overall testing results are reported in Table 6, where the best-performing results of the FR and NR classes are highlighted in bold. From Table 6, we can draw the following conclusions. First, compared with other state-of-the-art NR-IQA models, the proposed BM-IQE model shows competitive prediction ability on all databases. In particular, for the SROCC metric, BM-IQE obtains the best results on the CSIQ database and the second-best results on the LIVE, TID2013, and IVC databases, which indicates that BM-IQE is highly consistent with the MOS values across various datasets. For the RMSE metric, our model outperforms all the others on each database. Second, compared with the stronger FR-IQA models, BM-IQE shows comparable quality prediction performance. Third, although the BM-IQE model did not obtain the best results on the IVC database, the overall result still shows balanced performance, which demonstrates the robustness of our model across all databases. Figure 10 presents the scatter plots of the objective scores predicted by BM-IQE versus the subjective scores (the MOS) provided by the four databases. Despite the presence of some outliers, the points generally follow a linear trend. In summary, when looking at the overall performance across all databases, BM-IQE demonstrates better average performance than the other NR-IQA models.
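As an illustration of the RMSE metric used above, the sketch below maps objective scores onto the MOS scale and then measures the residual error. The section does not state which mapping was applied before computing PLCC and RMSE; IQA studies often use a nonlinear logistic fit, and the ordinary least-squares line here is an assumption made purely for brevity. All scores and function names are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit y ~ a * x + b (assumed mapping, see lead-in)."""
    mx, my = mean(x), mean(y)
    a = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
    return a, my - a * mx


def rmse(pred, target):
    """Root-mean-square error between mapped predictions and the MOS."""
    return mean((p - t) ** 2 for p, t in zip(pred, target)) ** 0.5


# Hypothetical objective scores and MOS values for four test images.
objective = [10.0, 20.0, 30.0, 40.0]
mos = [80.0, 62.0, 38.0, 20.0]

a, b = fit_line(objective, mos)
mapped = [a * s + b for s in objective]  # objective scores on the MOS scale

print("slope:", a, "intercept:", b)
print("RMSE after mapping:", rmse(mapped, mos))
```

Without such a mapping, RMSE would mostly reflect the difference in scale between the two score ranges rather than prediction accuracy.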

Performance on Particular Distortion Types
In our review of the previous literature, we found that existing NR-IQA algorithms cannot produce accurate quality predictions for some specific distortion types. We examine BM-IQE on five particular distortion types of the TID2013 database: non-eccentricity pattern noise, local block-wise distortions of different intensity, mean shift (intensity shift), contrast change, and change of color saturation. Five NR-IQA models were compared with BM-IQE using the SROCC metric: BRISQUE, DIIVINE, NIQE, IL-NIQE, and WaDIQaM-NR. The testing results are presented in Table 7, and the best-performing results are in bold.
Table 7. Overall SROCC values of NR-IQA algorithms on particular distortion types of the TID2013 database.
From Table 7, we can infer that the BM-IQE model outperforms all other algorithms on several specific distortion types. For non-eccentricity pattern noise, local block-wise distortions of different intensity, mean shift (intensity shift), and change of color saturation, our BM-IQE model obtains better results than the other IQA models. Although the results are still not fully satisfying, this is a meaningful improvement. The reason for the improvement may be the adoption of texture features, which, as mentioned in Section 5.4, are of great significance to visual perception. However, for the contrast change distortion type, the result of the BM-IQE model is barely satisfactory. One possible reason is that this distortion type is hard to characterize using existing features; another is that the color space we employed does not match human visual perception well. In future work, we need to investigate how to handle such distortion types more properly.

Performance on Multiply Distorted Database
In this section, we evaluate the quality prediction performance of BM-IQE and other FR/NR-IQA algorithms on multiply distorted image databases. The LIVE MD database consists of two parts, and we treat them as two separate datasets: LIVE MD1 and LIVE MD2. Images in LIVE MD1 are degraded by Gaussian blur (Gblur) and JPEG distortions, and images in LIVE MD2 are degraded by Gblur and white noise (WN) distortions. We compare BM-IQE with five NR-IQA models. The SROCC, PLCC, and RMSE metrics were computed to evaluate the performance of each algorithm. The overall results are presented in Table 8, with the best-performing results in bold. As shown in Table 8, BM-IQE shows competitive performance on multiply distorted image datasets. Specifically, compared to the other algorithms, BM-IQE obtains better SROCC results, which indicates better consistency with the subjective scores. Although the PLCC and RMSE values of BM-IQE are not the best ones, they are close to the best, which indicates that our approach, designed for singly distorted image quality assessment, can also be applied to multiply distorted cases.

Computational Cost
In this section, we analyze the computational cost of the proposed algorithm. All of the experiments were performed by running Matlab code on a laptop (Intel(R) Core(TM) i5-6300HQ CPU@2.3GHz, 8 GB RAM, Windows 10 Pro 64-bit, Lenovo, Beijing, China). The software platform was Matlab R2016a (MathWorks, Natick, Massachusetts, USA). The time cost of each BIQA model was measured by predicting the quality of a 512 × 512 color image, and the results are listed in Table 9. BM-IQE has a higher computational cost, which can be attributed to the block-matching procedure that occupies most of the running time. The computational complexity of the training stage is O(n) and that of the testing stage is O(n^2).
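To illustrate why block-matching can dominate the running time, the sketch below groups patches by exhaustively comparing every patch with every other patch using a single-window SSIM. It is not the SSIM-BM procedure itself, whose search strategy, patch size, and threshold are defined elsewhere in the paper: the exhaustive search, the 0.9 threshold, the tiny 2 × 2 patches, and all names are assumptions chosen to make the quadratic growth in the number of comparisons visible.

```python
from statistics import mean


def ssim(p, q, dynamic_range=255.0):
    """Single-window SSIM between two equally sized, flattened patches."""
    c1 = (0.01 * dynamic_range) ** 2  # stabilizing constants from the standard SSIM definition
    c2 = (0.03 * dynamic_range) ** 2
    mp, mq = mean(p), mean(q)
    var_p = mean((a - mp) ** 2 for a in p)
    var_q = mean((b - mq) ** 2 for b in q)
    cov = mean((a - mp) * (b - mq) for a, b in zip(p, q))
    return ((2 * mp * mq + c1) * (2 * cov + c2)) / (
        (mp ** 2 + mq ** 2 + c1) * (var_p + var_q + c2)
    )


def group_patches(patches, threshold=0.9):
    """For each reference patch, collect the indices of sufficiently similar patches.

    Exhaustive search (an assumption for illustration): n patches lead to
    n * (n - 1) SSIM evaluations.
    """
    comparisons = 0
    groups = []
    for i, ref in enumerate(patches):
        members = [i]  # the reference patch always belongs to its own group
        for j, cand in enumerate(patches):
            if j == i:
                continue
            comparisons += 1
            if ssim(ref, cand) >= threshold:
                members.append(j)
        groups.append(members)
    return groups, comparisons


# Hypothetical flattened 2 x 2 patches; the third one is structurally different.
patches = [
    [10, 20, 30, 40],
    [11, 21, 29, 41],
    [200, 10, 200, 10],
    [12, 19, 31, 39],
]
groups, comparisons = group_patches(patches)
print(groups)
print("SSIM evaluations:", comparisons)
```

Doubling the number of patches roughly quadruples the number of SSIM evaluations in this sketch, which is consistent with a testing stage whose cost grows quadratically.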

Conclusions
In this work, we propose a BIQA model to effectively evaluate the quality of images for both real-life natural scenes and remote sensing scenes. Our model, BM-IQE, employs a novel block-matching strategy so that image patch characteristics are expressed more soundly, and it adopts GLCM statistics as texture features to better describe RS images. Extensive experiments show that BM-IQE outperforms the other BIQA models on RS image datasets and achieves state-of-the-art performance on general-purpose natural image datasets. The main achievements of our work can be summarized as follows. First, BM-IQE has been confirmed to be suitable for BIQA of remote sensing scenes. Therefore, it can be applied to help evaluate the imaging performance of remote sensors by assessing the visually representative RGB bands of their images. Second, for natural images, the performance of blind image quality assessment has been meaningfully improved through the block-matching strategy.
Future work can proceed in the following directions. The first is the joint effect of block-matching and texture features: we can further investigate why they lead to better prediction ability on remote sensing images. The second is that more distortion types can be included in the investigation of the RS-IQA problem, which is of great value for subsequent remote sensing image interpretation.