Statistical Evaluation of No-Reference Image Quality Assessment Metrics for Remote Sensing Images

Image quality assessment plays an important role in image processing applications. In many image applications, e.g., image denoising, deblurring, and fusion, a reference image is rarely available for comparison with the enhanced image. Thus, the quality of enhanced images must be evaluated blindly without references. In recent years, many no-reference image quality metrics (IQMs) have been proposed for assessing digital image quality. In this paper, we first review 21 commonly employed no-reference IQMs. Second, we apply these measures to Quickbird images with three different types of general content (urban, rural, and harbor) subjected to three types of degradation (average filtering, Gaussian white noise, and linear motion degradation), each with 40 degradation levels. We evaluate the robustness of the IQMs based on the criteria of prediction accuracy, prediction monotonicity, and prediction consistency. Then, we perform factor analysis on those IQMs deemed robust, and cluster them into several components. We then select the IQM with the highest loading coefficient as the representative IQM for that component. Experimental results suggest that different measures perform differently for images with different contents and subjected to different types of degradation. Generally, the degradation method has a stronger effect than the image content on the evaluation results of an IQM. The same IQM can provide opposite dependences on the level of degradation for different degradation types, and an IQM that performed well with one type of degradation may not perform well with another type. The training-based measures are not appropriate for remote sensing images because the results are highly dependent on the samples employed for training. Only seven of the 21 IQMs were found to fulfill the requirements of robustness. Edge intensity (EI) and just noticeable distortion (JND) are suggested for evaluating the quality of images subjected to average filter degradation. 
EI, blind image quality assessment through anisotropy (BIQAA), and mean metric (MM) are suggested for evaluating the quality of images subjected to Gaussian white noise degradation. Laplacian derivative (LD), JND, and standard deviation (SD) are suggested for evaluating the quality of images subjected to linear motion. Finally, EI is suggested for evaluating the quality of an image subjected to an unknown type of degradation.


Introduction
The quality of digital images can be degraded during acquisition, transmission, storage, and reconstruction by various sources of degradation, such as distortion of the spatial resolution, motion blur, and transmission noise [1]. Identifying the distortion and quantifying its impact on image quality is essential for various applications, such as monitoring image quality in quality control systems and optimizing the output of image processing algorithms [2]. The development of effective image quality assessment is therefore necessary for these purposes [3,4].
Because human beings are generally the end users in most image processing applications, the most reliable means of assessing image quality is subjective evaluation. A subjective image quality metric (IQM) can be computed by preparing test images, selecting an appropriate number of human observers, and obtaining their opinions based on specified criteria and conditions. Widely used subjective IQMs are the mean opinion score (MOS) and difference MOS (DMOS) [5]. However, subjective IQMs require the services of multiple human observers, and are thus expensive, time-consuming, and impractical for real-time implementation. Moreover, the subjective test results depend on a number of factors that are difficult to quantify, such as the background and motivation of observers [6,7]. As a result, the development of objective IQMs is presently receiving increasing attention. The goal is to design objective IQMs that quantify the image quality automatically and yield reliable results that are well correlated with subjective assessments [8]. In general, objective IQMs can be classified into three categories according to the extent to which a reference image is required: full-reference (FR), reduced-reference (RR), and no-reference (NR). In an objective FR IQM, a reference image is required to assess the quality of the test image by comparing the extent of similarity or difference between the test image and the reference image. Objective FR IQMs include the classical mean squared error (MSE), peak signal-to-noise ratio (PSNR), and the more recently introduced structural similarity (SSIM) [9]. In an objective RR IQM, some extracted features of a reference image are required to assess the quality of a test image. Objective RR IQMs include reduced-reference image quality assessment (RRIQA) [10] and C4 [11]. In an objective NR IQM, the statistical metric is calculated from the distorted image itself. Objective NR IQMs include entropy, gradient, and standard deviation [12]. In contrast to FR or RR IQMs, an NR IQM to some extent calculates the quality of the test image directly according to particular criteria, rather than assessing its fidelity or similarity to the reference image. Moreover, in many image applications, e.g., image denoising, deblurring, and fusion, a reference image is rarely available for comparison with the enhanced image. Thus, the image quality must be evaluated in the absence of a reference image [13-16].
Although a number of new objective IQMs have been developed in the past few decades, the majority of these require the original undistorted image as a reference [17]. The development of objective NR IQMs is a relatively new topic in the field of image processing, and, in more recent years, a large number of NR metrics have been proposed for evaluating image quality. However, most NR metrics have been designed for gray or color images, and whether they are suitable for multi-spectral remote sensing images is still unknown. In this paper, we first review 21 commonly employed objective NR metrics. Then, we apply these measures to Quickbird images with three different contents (urban, rural, and harbor) using three types of degradation (average filtering, Gaussian white noise, and linear motion degradation), each with 40 degradation levels. We then investigate the robustness of the individual IQMs based on the criteria of prediction accuracy, prediction monotonicity, and prediction consistency. Finally, we analyze those NR metrics deemed robust, and determine representative NR metrics suitable for evaluating remote sensing images with different types of degradation.

Commonly Employed No-Reference Image Quality Metrics: An Overview
This section presents an overview of commonly employed objective NR IQMs proposed in recent years. These fall into several categories, i.e., distortion-specific metrics, training-based metrics, and metrics based on natural scene statistics.

• Auto correlation (AC): Derived from the auto-correlation function. The AC metric uses the difference between auto-correlation values at two different distances along the horizontal and vertical directions, respectively. If an image is blurred or the edges are smoothed, the correlation between neighboring pixels becomes high.

• Average gradient (AG): Reflects the contrast and the clarity of the image. It can be used to measure the spatial resolution of a test image, where a larger AG indicates better spatial resolution [18].

• Blind image quality index (BIQI): A two-step framework based on natural scene statistics. Once trained, the framework requires no knowledge of the distortion process, and the framework is modular, in that it can be extended to any number of distortions [19].

• Blind image quality assessment through anisotropy (BIQAA): Measures the averaged anisotropy of an image by means of a pixel-wise directional entropy. A pixel-wise directional entropy is obtained by measuring the variance of the expected Rényi entropy and the normalized pseudo-Wigner distribution of the image for a set of predefined directions. BIQAA is capable of distinguishing the presence of noise in images [20].

• Blur metric (BM): Based on the discrimination between different levels of blur perceptible on the same image [22].

• Blind/referenceless image spatial quality evaluator (BRISQE): A distortion-generic blind image quality assessment model based on natural scene statistics, which operates in the spatial domain. Scene statistics of locally normalized luminance coefficients are employed to quantify possible losses of naturalness in the image due to the presence of distortions, thereby leading to a holistic measure of quality [23].

• Cumulative probability of blur detection (CPBD): Based on the cumulative probability of blur detection, which is used to classify the visual quality of images into a finite number of quality classes [24].

• Distortion measure (DM): Computes the deviation of frequency distortion from an allpass response of unity gain, and then the deviation is weighted by a model of the frequency response of the human visual system and integrated over the visible frequencies [25].

• Edge intensity (EI): Calculated by the gradient of the Sobel filtered edge image.

• Entropy metric (EM): Measures the information content of an image. If the probability of occurrence of each gray level is low, the entropy is high, and vice versa [26].

• Just noticeable distortion (JND): Integrates spatial masking factors with the nonlinear additivity model for masking. The JND estimator applies to all color components and accounts for the compound impact of luminance masking, texture masking, and temporal masking [28,29].

• Kurtosis metric (KM): Kurtosis is a statistical measure of the degree of sharpness or flatness of a distribution (i.e., average slope and energy concentration). The KM increases with increasing depth of focus [30,31].

• Laplacian derivative (LD): Includes the first-order (gradient) and second-order (Laplacian) derivative metrics. These metrics act as a high-pass filter in the frequency domain. Image sharpness increases with increasing LD.

• Mean metric (MM): Calculated as the mean pixel value of the image, which indicates its average brightness level. For equivalent scenery, image brightness increases with increasing MM.

• Naturalness image quality evaluator (NIQE): A quality-aware collection of statistical features based on a simple and successful space domain natural scene statistic model. These features are derived from a corpus of natural, undistorted images [32].

• Quality aware clustering (QAC): Distorted images are partitioned into overlapping patches, and a percentile pooling strategy is used to estimate the local quality of each patch. Then, a centroid for each quality level is learned by quality aware clustering. These centroids are then used as a codebook to infer the quality of each patch in a given image, and a perceptual quality score can be obtained subsequently for the entire image [33].

• Standard deviation (SD): Calculated as the square root of the image variance. SD reflects the contrast of the image, where the image contrast increases with increasing SD.

• Skewness metric (SM): Skewness is a statistical measure of the direction and extent to which a distribution departs from symmetry. A normal distribution has zero skewness, so a high absolute skewness indicates asymmetry of the data. In this case, the data contains a greater amount of information.
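
To make the simplest statistics-based metrics in the list above concrete, the following is a minimal sketch of MM, SD, and EM for a single image band. It is an illustration only, not the implementation used in this study: the function names are hypothetical, the band is assumed to be a flat list of integer gray levels, SD uses the population variance, and EM is taken as the Shannon entropy (in bits) of the gray-level histogram.

```python
import math
from collections import Counter

def mean_metric(pixels):
    """MM: mean pixel value (average brightness)."""
    return sum(pixels) / len(pixels)

def standard_deviation(pixels):
    """SD: square root of the population variance (an assumption)."""
    mu = mean_metric(pixels)
    return math.sqrt(sum((p - mu) ** 2 for p in pixels) / len(pixels))

def entropy_metric(pixels):
    """EM: Shannon entropy, in bits, of the gray-level histogram."""
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical 2 x 4 single-band image, flattened row by row.
band = [0, 0, 255, 255, 0, 0, 255, 255]
mm = mean_metric(band)
sd = standard_deviation(band)
em = entropy_metric(band)
```

For this two-level example, the two gray levels are equally likely, so the entropy is exactly one bit, and the mean and standard deviation both equal 127.5.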

Test Images and Degradation Methods
This section describes the initial testing images and the degradation methods used to apply a particular level of image degradation for evaluating the performance of the objective NR IQMs presented in Section 2.

Test Images
A Quickbird image was obtained from IGARSS 2012. The image was acquired on 11 November 2007, and covers the city of San Francisco, CA, USA. The spatial resolution of the multi-spectral image is 2.4 m. We selected the 12 subset images shown in Figure 1a-l with uniform sizes of 256 × 256 pixels. The test images can be classified into three categories: Figure 1a-d


Degradation Methods
The methods employed to simulate the distortion of the test images are introduced as follows.

(a) Average filter degradation
Average filtering replaces each pixel value in an image with the average value of its neighbors and itself. Average filtering is a kind of convolution filter. Like other convolution filters, it is based on a kernel, which represents the shape and size of the neighborhood to be sampled when calculating the average. In this paper, the kernel is a square matrix with an edge dimension ranging from 1 to 40 pixels in increments of 1. Average filtering provides image distortion that is representative of spatial resolution degradation.

To allow for a visual interpretation of the relative effects on image quality obtained after the application of average filter degradation in terms of monotonicity, we present Figure 1a subjected to average filter degradation for kernels with an edge dimension ranging from 5 to 40 pixels in increments of 5, as shown in Figure 2. We note from Figure 2 that the image quality obviously decreased with increasing kernel size. Visually, the differences in the image degradation between level 5 and level 20 are much greater than between level 25 and level 40, where the latter levels present only relatively slight differences. These images show that the average filter degradation has a decreasing effect on image quality with an increasing coarseness of spatial resolution.
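
As an illustration of the average filter degradation described above, the following minimal sketch applies a k × k mean kernel to a small image stored as nested lists. The function name is hypothetical, and the border handling (clipping the window at the image edge) is an assumption, since the text does not specify how borders were treated.

```python
def average_filter(image, k):
    """Replace each pixel by the mean of the k x k window centered on it.

    Border handling is an assumption: windows are clipped at the image
    edge, so border pixels are averaged over fewer neighbors.
    """
    rows, cols = len(image), len(image[0])
    lo = (k - 1) // 2   # window extent above/left of the center
    hi = k // 2         # window extent below/right of the center
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[i][j]
                for i in range(max(0, r - lo), min(rows, r + hi + 1))
                for j in range(max(0, c - lo), min(cols, c + hi + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out

# Hypothetical 3 x 3 image with a single bright pixel.
image = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
blurred = average_filter(image, 3)    # 3 x 3 kernel spreads the bright pixel
unchanged = average_filter(image, 1)  # a 1 x 1 kernel leaves the image as is
```

A kernel with an edge dimension of 1 leaves the image unchanged, so under this sketch the first of the 40 levels corresponds to the undegraded image.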

(b) Gaussian white noise degradation
Gaussian white noise with a probability density function satisfying a normal distribution was added to the images. Here, the mean value of the noise was set to 0 and the variance ranged from 0.0005 to 0.02 in increments of 0.0005.
We present Figure 1a subjected to Gaussian white noise degradation with variance ranging from 0.0025 to 0.02 in increments of 0.0025, as shown in Figure 3. We note from Figure 3 that the image quality obviously decreased with increasing variance of Gaussian white noise. The street in Figure 1a is heavily blurred by the noise with variance larger than 0.005. The building and trees are mixed when the variance of the noise reaches 0.0125. These images show that the Gaussian white noise has a decreasing effect on image quality with an increasing variance of noise.
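
The noise degradation above can be sketched as follows. This is a hedged illustration rather than the procedure used in this study: the function name is hypothetical, and scaling pixel values to [0, 1] and clipping the noisy result to that range are assumptions (the text specifies only the noise mean and variance).

```python
import random

def add_gaussian_noise(image, variance, seed=0):
    """Add zero-mean Gaussian noise to an image with values in [0, 1].

    Scaling to [0, 1] and clipping the result are assumptions; the text
    only specifies the noise mean (0) and variance.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    return [
        [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in image
    ]

# The 40 variance levels described above: 0.0005, 0.0010, ..., 0.02.
variances = [0.0005 * level for level in range(1, 41)]

image = [[0.5] * 64 for _ in range(64)]   # hypothetical flat gray image
noisy = add_gaussian_noise(image, variances[-1])
```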

ISPRS Int. J. Geo-Inf. 2017, 6, 133

(c) Linear motion degradation
The images were convolved with a filter that simulates the linear motion of a camera moving by m pixels at an angle of n degrees counterclockwise from the horizontal direction to the right. In this paper, n is set to 45° and m ranges from 1 to 40 pixels in increments of 1.
We present Figure 1a subjected to linear motion degradation at an angle of 45° with lengths ranging from 5 to 40 pixels in increments of 5, as shown in Figure 4. We note from Figure 4 that the image quality obviously decreased with increasing motion length. It is difficult to distinguish land covers when the motion length reaches 25 pixels. Similar to the results of average filter degradation, the differences in the image degradation between lengths of 5 and 20 are much greater than between lengths of 25 and 40, where the latter lengths present only relatively slight differences.
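
A simplified sketch of the linear motion degradation at 45° is given below. It is not the filter used in this study: the function names are hypothetical, the motion of m pixels is approximated by m equally weighted taps on the anti-diagonal of an m × m kernel (a practical kernel would typically anti-alias the line), and zero padding at the borders is an assumption.

```python
def motion_kernel_45(m):
    """Normalized m x m kernel with equal weights on the anti-diagonal.

    Assumption: the m-pixel motion at 45 degrees counterclockwise is
    approximated by m taps running from bottom-left to top-right.
    """
    return [[1.0 / m if c == m - 1 - r else 0.0 for c in range(m)]
            for r in range(m)]

def convolve_same(image, kernel):
    """2-D convolution with zero padding; output has the input's size."""
    rows, cols, k = len(image), len(image[0]), len(kernel)
    half = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Convolution flips the kernel; the anti-diagonal line
                    # is symmetric under that flip, so the blur is unchanged.
                    rr, cc = r + half - i, c + half - j
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[i][j] * image[rr][cc]
            out[r][c] = acc
    return out

# Hypothetical 5 x 5 image with a single bright pixel at the center.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0
smeared = convolve_same(image, motion_kernel_45(3))
```

In this example the single bright pixel is smeared along the diagonal running from lower left to upper right, and the total intensity is preserved because the kernel weights sum to one.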

Statistical Analysis of Evaluation Results
Owing to the 40 levels for each of the three classes of distortion investigated, 120 degraded images were obtained for each original image. Therefore, a total of 1440 (12 × 120) images were employed as samples for evaluation. As discussed in a past study [6], a good IQM must provide good prediction accuracy, prediction monotonicity, and prediction consistency. The prediction accuracy was determined by a one-way analysis of variance (ANOVA) test, the prediction monotonicity was determined by the scatter plot of the degradation level and the IQM values, and the prediction consistency was determined by the Pearson linear correlation coefficient. The IQMs that passed all three tests were deemed robust. As redundancy may exist among them, the robust IQMs were then classified into various components (or clusters) using factor analysis (FA), and the IQM with the highest loading coefficient was selected as the representative metric for each component.

Prediction Accuracy
An objective IQM that provides good prediction accuracy is unaffected by image content. As such, the evaluation results of an IQM should be similar for equivalent degradation levels, regardless of the image content. One-way ANOVA was employed to evaluate the prediction accuracy. One-way ANOVA weighs a hypothesis that each sample is drawn from the same underlying probability distribution against an alternative hypothesis that the underlying probability distributions are not the same for all samples. The hypotheses for the comparison of k independent groups are

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k, \qquad H_1: \mu_i \neq \mu_j \ \text{for at least one pair } (i, j),$$

where $H_0$ denotes that the mean values of all groups are equal, and $H_1$ denotes that the mean values of two or more groups are not equal. The null hypothesis indicates that no significant difference exists between the sample means. A high value for the F test indicates that the null hypothesis is rejected. Thus, any test result with an F value larger than the critical value is significant, and the null hypothesis is rejected. This is used to determine whether the variation in the scores of IQMs arises predominantly from image degradation or from the image content. The metrics that are sensitive to image content are not suitable for objective image quality assessment.
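
The F statistic of a one-way ANOVA can be computed as in the following sketch. The function name and the scores are hypothetical, and treating each degradation level as one group (with one IQM score per image in each group) is an assumption about how the test was set up.

```python
def one_way_anova_f(groups):
    """Return the F statistic: between-group over within-group mean square."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total number of scores
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical IQM scores: one list per degradation level,
# one value per image within each level.
scores_by_level = [
    [9.0, 10.0, 11.0],   # level 1
    [6.0, 7.0, 8.0],     # level 2
    [3.0, 4.0, 5.0],     # level 3
]
f_value = one_way_anova_f(scores_by_level)
```

Here the scores differ far more between levels than between images within a level, so the F value is large; it would then be compared with the critical value for (k − 1, n − k) degrees of freedom.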

Prediction Monotonicity
To be consistent with visual inspection, an IQM should demonstrate a monotonic dependence on the level of degradation of an image and exhibit small variations for different images with equivalent levels of degradation [34]. A scatter plot is used to test the prediction monotonicity.

Prediction Consistency
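The evaluation results of an IQM are judged according to how well the results correlate with the degradation level. The Pearson linear correlation coefficient (PLCC) is employed to quantitatively measure the correlation between image degradation levels and the results of NR metrics. The PLCC is defined as

$$\mathrm{PLCC} = \frac{\sum_{i=1}^{N} \left(\mathrm{Level}(i) - \mathrm{Level}_{avg}\right)\left(\mathrm{NR}(i) - \mathrm{NR}_{avg}\right)}{\sqrt{\sum_{i=1}^{N} \left(\mathrm{Level}(i) - \mathrm{Level}_{avg}\right)^2}\,\sqrt{\sum_{i=1}^{N} \left(\mathrm{NR}(i) - \mathrm{NR}_{avg}\right)^2}},$$

where N is the number of images, Level(i) is the degradation level of the ith image, Level_avg is the average degradation level of all images, NR(i) is the evaluation result of an NR metric for the ith image, and NR_avg is the average evaluation result of an NR metric for all images.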
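
A direct transcription of the PLCC into code is sketched below. The function name and the two score series are hypothetical and serve only to show how the coefficient behaves for a linear and a saturating response.

```python
import math

def plcc(levels, scores):
    """Pearson linear correlation between degradation levels and IQM scores."""
    n = len(levels)
    level_avg = sum(levels) / n
    nr_avg = sum(scores) / n
    cov = sum((l - level_avg) * (s - nr_avg) for l, s in zip(levels, scores))
    var_l = sum((l - level_avg) ** 2 for l in levels)
    var_s = sum((s - nr_avg) ** 2 for s in scores)
    return cov / math.sqrt(var_l * var_s)

levels = list(range(1, 41))                           # 40 degradation levels
linear = [100.0 - 2.0 * level for level in levels]    # hypothetical linear IQM
saturating = [1.0 / level for level in levels]        # hypothetical IQM that flattens out

r_linear = plcc(levels, linear)
r_saturating = plcc(levels, saturating)
```

A strictly linear response yields a PLCC of −1, whereas a response that drops sharply and then flattens is still monotonic but gives a weaker linear correlation.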

Cluster Analysis of Robust Image Quality Metrics
As there may exist redundancy in robust IQMs, factor analysis (FA) based on principal component analysis (PCA) was employed to group similar IQMs into fewer factors. To verify the appropriateness of FA for this study, the Kaiser-Meyer-Olkin (KMO) measurement of sample adequacy and Bartlett's test of sphericity were performed on the correlation matrix of IQMs. When the KMO was greater than 0.5, the sample was considered adequate for FA [35,36]. Bartlett's test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix. When this null hypothesis is rejected, the FA is appropriate for clustering robust IQMs. Each IQM is assumed to depend on a linear combination of the common factors, and the coefficients are known as loadings. Rotation was used to reorient the factor loadings so that the factors were more interpretable. The simplest case of rotation was an orthogonal rotation (varimax) in which the angle between the reference axes of factors was maintained at 90°. This type of rotation was used with PCA. We performed FA on the metrics deemed robust for the different types of image degradation considered.
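
For reference, the common-factor model and the KMO measure mentioned above can be written in their standard textbook forms as follows. The symbols are assumptions introduced here for illustration and do not appear in the original text.

```latex
% Common-factor model: x_j is the j-th standardized IQM, f_k the k-th common
% factor, \lambda_{jk} the loading of IQM j on factor k, and \varepsilon_j
% the unique (specific) term; p IQMs are described by m < p factors.
x_j = \sum_{k=1}^{m} \lambda_{jk} f_k + \varepsilon_j, \qquad j = 1, \dots, p

% KMO measure: r_{ij} are the correlations and a_{ij} the partial
% correlations between IQMs i and j.
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^2}{\sum_{i \neq j} r_{ij}^2 + \sum_{i \neq j} a_{ij}^2}
```

The KMO value approaches 1 when the partial correlations are small relative to the ordinary correlations, which is the situation in which shared factors explain most of the association between IQMs.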

Prediction Accuracy
The results of one-way ANOVA testing for 21 IQMs based on 40 degradation levels for each type of degradation applied to the 12 sample images are listed in Table 1. The abbreviations used in Table 1 are defined in Section 2. The critical value of the F test for each IQM in Table 1 is 1.427. The values in grey in Table 1 therefore do not reject the null hypothesis, and, thus, the corresponding IQMs were affected by image content more than by the image degradation level. For average filter degradation, all of the IQMs were robust for different image contents. For Gaussian noise degradation, the results of EM were affected by image content. For linear motion degradation, the results of BIQAA, BLIINDS-II, EM, and KM were affected by image content. The metrics that are sensitive to image content are not suitable for image quality assessment.

Prediction Monotonicity
The scatter plot results for the 21 IQMs based on 40 degradation levels for each type of degradation applied to the 12 sample images are presented and discussed in this subsection.

(a) Average filter degradation
The scatter plots for the 21 IQMs based on the 40 levels of average filter degradation applied to the 12 sample images are presented in Figure 5. The IQMs were classified into four groups according to their degree of monotonicity, as Decreasing, Increasing, Fluctuating, and Unchanging. The four groups are given as follows:
For the Decreasing group, the evaluation results sharply decreased over the first 10 degradation levels, and changed little for the remaining 30 degradation levels, which is consistent with a visual inspection of Figure 2. For the Increasing group, the evaluation results were negatively correlated with the degradation levels. For the Fluctuating group, BIQI, BLIINDS-II, BRISQE, CPBD, JNBM, and QAC were trained for a specific digital image dataset, which produced fluctuating evaluation results for different image contents. Therefore, it can be concluded that the results of the training-based IQMs are highly dependent on the training samples employed, and cannot be directly applied for the image quality evaluation of remote sensing images. For the other IQMs in this group, BM was proposed based on subjective tests and psychophysics functions and was limited to the specific images. KM measures the depth of focus, while Quickbird images are obtained with a uniform depth of focus. SM measures the asymmetry of the data, which is not a suitable metric for remote sensing images. The Unchanging group provided equivalent evaluations for all images regardless of the degradation level. DM was designed to measure the effect of frequency distortion, and is therefore not sensitive to average filter degradation. MM measures the mean pixel value of the image, which is unaffected by average filter operations, such that the results of MM remained unchanged with increasing degradation.

(b) Gaussian white noise degradation
The scatter plots for the 21 IQMs based on the 40 levels of Gaussian white noise degradation applied to the 12 sample images are presented in Figure 6. The IQMs were classified into three groups according to their degree of monotonicity, as Decreasing, Increasing, and Fluctuating. The three groups are given as follows:

Decreasing: AC, BIQAA, BM, KM, SM
Increasing: AG, BIQI, DM, EI, FISH, JND, LD, MM, SD
Fluctuating: BLIINDS-II, BRISQE, CPBD, EM, JNBM, NIQE, QAC
For the Decreasing group, the evaluation results of AC, BM, KM, and SM demonstrate a decreasing trend with increasing degradation, while, with respect to the level of average filter degradation, the results of AC demonstrated an increasing trend and the results of BM, KM, and SM fluctuated. The evaluation results of the Increasing group demonstrate an increasing trend with increasing degradation. Meanwhile, the results of AG, EI, FISH, JND, LD, and SD demonstrated a decreasing trend with increasing average filter degradation, the results of BIQI fluctuated with respect to the level of average filter degradation, and the results of DM and MM were unchanging with respect to the level of average filter degradation. The members of the Fluctuating group here largely coincide with those of the Fluctuating group obtained for average filter degradation. For those members not in the same group, EM demonstrated unchanging results and the results of NIQE fluctuated with respect to the level of average filter degradation.

(c) Linear motion degradation
The scatter plots for the 21 IQMs based on the 40 levels of linear motion degradation applied to the 12 sample images are presented in Figure 7.
The IQMs were classified into four groups according to their degree of monotonicity. The four groups are given as follows:

Decreasing: AG, EI, EM, FISH, JND, LD, SD
Increasing: AC, BM
Fluctuating: BIQAA, BIQI, BLIINDS-II, BRISQUE, CPBD, JNBM, KM, NIQE, QAC, SM
Unchanging: DM, MM

The results of linear motion degradation were very similar to the results obtained for average filter degradation. We note that the results of BIQAA and NIQE demonstrate a fluctuating trend with respect to the level of linear motion degradation while having, respectively, demonstrated decreasing and increasing trends with respect to the level of average filter degradation. The results of BM demonstrate an increasing trend with increasing linear motion degradation while having demonstrated a fluctuating trend with respect to the level of average filter degradation.

Prediction Consistency
The PLCC values between degradation levels for the three types of degradation and the IQM evaluation results for the 12 sample images are listed in Table 2. The results marked in gray in the table reside below the 0.05 confidence level, indicating that the corresponding IQM fails to fulfill the requirements of prediction consistency.
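A consistency check of this kind can be sketched as follows. This is an illustration under stated assumptions, not the computation behind Table 2: the p-value is approximated with the Fisher z-transformation and a normal distribution, an IQM is treated as failing when its correlation is not significant at the 0.05 level (one reading of the criterion above), and the function names and score curves are hypothetical.

```python
import math
from statistics import NormalDist


def plcc(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("PLCC is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Approximate two-sided p-value for zero correlation (Fisher z)."""
    r = max(-1.0, min(1.0, r))
    if abs(r) == 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def is_consistent(levels, scores, alpha=0.05):
    """Return (PLCC, True if significant at the assumed alpha level)."""
    r = plcc(levels, scores)
    return r, approx_p_value(r, len(levels)) < alpha


levels = list(range(1, 41))  # 40 degradation levels
# Synthetic scores: a perturbed linear decrease and a pure oscillation.
trending = [100 - 2 * l + (3 if l % 2 else -3) for l in levels]
oscillating = [(-1.0) ** l for l in levels]

r_trend, ok_trend = is_consistent(levels, trending)
r_osc, ok_osc = is_consistent(levels, oscillating)
print(round(r_trend, 3), ok_trend, round(r_osc, 3), ok_osc)
```

An exact test would use the t distribution with n − 2 degrees of freedom; the normal approximation is used here only to keep the sketch within the standard library.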

Tables 6-9 present the component loading matrix for each type of degradation after conducting orthogonal rotation. The loading value is the correlation coefficient between an IQM and a retained component. The importance of an IQM to a component increases with increasing loading value. The IQMs clustered in two-dimensional and three-dimensional component space are shown in Figure 9. The IQM with the highest loading value on a component was selected as the representative IQM for a given type of degradation. The results are summarized as follows:
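• Average filter degradation: From the perspective of the spatial resolution of an image, the results suggest EI and JND for evaluating the quality of an image.
• Gaussian white noise degradation: The results suggest EI, BIQAA, and MM for evaluating the quality of an image.
• Linear motion degradation: The results suggest LD, JND, and SD for evaluating the quality of an image.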
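The selection of a representative IQM per component can be sketched as below. The metric names and loading values are placeholders invented for illustration and are not taken from Tables 6-9; the use of the absolute loading (so that a strong negative loading also counts) is likewise an assumption, as is the function name `representatives`.

```python
def representatives(names, loadings):
    """Pick, for each component, the metric with the largest |loading|.

    loadings[i][j] is the rotated loading of metric i on component j.
    """
    n_components = len(loadings[0])
    chosen = []
    for j in range(n_components):
        best = max(range(len(names)), key=lambda i: abs(loadings[i][j]))
        chosen.append(names[best])
    return chosen


# Placeholder metrics and loadings (hypothetical, not results of this study).
names = ["IQM_A", "IQM_B", "IQM_C", "IQM_D"]
loadings = [
    [0.95, 0.10],
    [0.90, 0.20],
    [0.15, -0.97],
    [0.30, 0.85],
]

reps = representatives(names, loadings)
print(reps)
```

Each component yields one representative, so the number of suggested IQMs for a degradation type equals the number of retained components.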

Conclusions
In this paper, 21 objective NR IQMs were reviewed and evaluated for Quickbird images with urban, rural, and harbor contents subjected to 40 different levels of average filter, Gaussian white noise, and linear motion degradation. The experimental results provide a number of suggestions.
(1) Different IQMs performed differently for different image contents and different types of image degradation. Generally, the effect of the degradation type on the evaluation results was stronger than that of the image content.
(2) The same IQM can provide opposite dependences on the level of degradation for different degradation types. For example, the evaluation results of AC demonstrated a decreasing trend with increasing Gaussian white noise degradation and an increasing trend with increasing average filter degradation, whereas the evaluation results of AG, EI, FISH, JND, LD, and SD demonstrated increasing trends with increasing Gaussian white noise degradation and decreasing trends with increasing average filter degradation.
(3) An IQM that performed well with one type of degradation may not perform well with another type. For example, the evaluation results of BIQAA fluctuated with respect to the level of linear motion degradation and demonstrated a decreasing trend with increasing average filter degradation, the evaluation results of BM demonstrated an increasing trend with increasing linear motion degradation and fluctuated with respect to the level of average filter degradation, and the evaluation results of NIQE fluctuated with respect to the level of linear motion degradation and increased with increasing average filter degradation.
(4) The general results of clustering provided suggestions for the representative IQMs most appropriate for the different types of degradation. For average filter degradation, EI and JND are suggested for evaluating the quality of an image. For Gaussian white noise degradation, EI, BIQAA, and MM are suggested. For linear motion degradation, LD, JND, and SD are suggested.
(5) For image quality assessment without knowledge of the degradation type, EI is suggested for evaluating the quality of an image.
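The opposite dependences noted in point (2) can be reproduced on synthetic data. The sketch below takes SD to be the standard deviation of the pixel values, which is an assumption about its definition, and uses a hypothetical smooth test image, periodic borders, and arbitrary filter sizes and noise levels; it is not the experimental setup of this study.

```python
import math
import random
import statistics


def box_filter(img, k):
    """Apply a k x k average filter (k odd) with wrap-around borders."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    total += img[(y + dy) % h][(x + dx) % w]
            out[y][x] = total / (k * k)
    return out


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian white noise with standard deviation sigma."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]


def pixel_sd(img):
    """Population standard deviation of all pixel values."""
    return statistics.pstdev([v for row in img for v in row])


size = 32
# Hypothetical smooth test pattern, not a Quickbird image.
image = [
    [128 + 40 * math.sin(x / 3) + 40 * math.cos(y / 5) for x in range(size)]
    for y in range(size)
]

rng = random.Random(0)
sd_original = pixel_sd(image)
sd_blurred = [pixel_sd(box_filter(image, k)) for k in (3, 5, 7)]
sd_noisy = [pixel_sd(add_gaussian_noise(image, s, rng)) for s in (10, 20, 30)]
print(sd_original, sd_blurred, sd_noisy)
```

Averaging suppresses local variation and lowers the standard deviation, whereas additive noise contributes extra variance and raises it, which matches the opposite trends reported for SD.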

A Quickbird image was obtained from IGARSS 2012. The image was acquired on 11 November 2007, and covers the city of San Francisco, CA, USA. The spatial resolution of the multi-spectral image is 2.4 m. We selected the 12 subset images shown in Figure 1a-l with uniform sizes of 256 × 256 pixels. The test images can be classified into three categories: urban areas (Figure 1a-d), rural areas (Figure 1e-g), and harbor areas (Figure 1h-l).

Figure 2. Average filter degradation for the image in Figure 1a.

Figure 3. Gaussian white noise degradation for the image in Figure 1a.

Figure 4. Linear motion degradation for the image in Figure 1a.

Figure 5. Scatter plot of evaluation results for average filter degradation.

Figure 6. Scatter plots of evaluation results for Gaussian white noise degradation.

IQM groups for linear motion degradation:
Decreasing: AG, EI, EM, FISH, JND, LD, SD
Increasing: AC, BM
Fluctuating: BIQAA, BIQI, BLIINDS-II, BRISQUE, CPBD, JNBM, KM, NIQE, QAC, SM
Unchanging: DM, MM

ISPRS Int. J. Geo-Inf. 2017, 6, 133

Figure 7. Scatter plots of evaluation results for linear motion degradation.

Figure 8. Scatter plots of the eigenvalues for all components with respect to each degradation type.

Figure 9. Clustering of image quality metrics in two-dimensional and three-dimensional space.

• Blind image integrity notator using discrete cosine transform (DCT) statistics (BLIINDS-II): Relies on a Bayesian model to predict image quality scores given certain extracted features. The features are based on a natural scene statistics model of the image DCT coefficients [21].

Table 1. One-way ANOVA test results for different types of degradation.

Table 2. PLCC values between degradation levels and image qualities.

Table 6. Rotated component matrix for average filter degradation.

Table 7. Rotated component matrix for Gaussian white noise degradation.

Table 8. Rotated component matrix for linear motion degradation.

Table 9. Rotated component matrix for all degradation types together.