Thermodynamics-Based Evaluation of Various Improved Shannon Entropies for Configurational Information of Gray-Level Images

The quality of an image affects its utility, and image quality assessment has been a hot research topic for many years. One widely used measure for image quality assessment is Shannon entropy, which has a well-established information-theoretic basis. The value of this entropy can be interpreted as the amount of information. However, Shannon entropy is poorly suited to information measurement in images, because it captures only the compositional information of an image and ignores the configurational aspect. To fix this problem, improved Shannon entropies have been actively proposed in the last few decades, but a thorough evaluation of their performance is still lacking. This study presents such an evaluation, involving twenty-three improved Shannon entropies based on various tools such as gray-level co-occurrence matrices and local binary patterns. For the evaluation, we propose: (a) a strategy to generate testing (gray-level) images by simulating the mixing of ideal gases in thermodynamics; (b) three criteria consisting of validity, reliability, and ability to capture configurational disorder; and (c) three measures to assess the fulfillment of each criterion. The evaluation results show that only the improved entropies based on local binary patterns are invalid for use in quantifying the configurational information of images, and that the best variant of Shannon entropy in terms of reliability and ability is the one based on the average distance between same/different-value pixels. These conclusions are theoretically important in setting a direction for future research on improving entropy and are practically useful in selecting an effective entropy for various image processing applications.


Introduction
Image quality assessment plays a fundamental role in the field of digital image processing [1][2][3][4][5][6], where it is useful in monitoring the quality of image systems, benchmarking image processing applications, and optimizing image processing algorithms [7,8]. The most reliable approach to assess image quality is a visual observation with the naked eye [9], but this approach depends largely on individual interpretations of quality and is thus subjective. For objective image quality assessment, one simple and widely used approach is to quantify the amount of (syntactic) information contained in an image using information-theoretic measures [10][11][12][13][14][15][16][17]. It is believed that the more information an image contains, the better the quality of the image is [12].
The most basic information-theoretic measure is entropy, which was proposed by Shannon [18] in the area of telecommunication. Shannon entropy (also called information entropy) is widely recognized as a cornerstone of information theory [19], and it has been used in various fields such as physics e.g., [20], chemistry e.g., [21], and biology e.g., [22]. Although Shannon entropy was originally used to quantify the information (i.e., disorder) of a one-dimensional message (e.g., a telegram message consisting of a series of letters), it has also been actively utilized as a measure of information content for gray-level (or grayscale) images, which can be considered as two-dimensional messages, in various applications including registration, segmentation, and fusion [23][24][25][26][27][28].
However, the information contained in a gray-level image (hereafter simply image) cannot be fully characterized by Shannon entropy as it only captures the image's compositional (or non-spatial) information such as the proportions and gray values of different pixels. The configurational (or spatial) information (i.e., the spatial distribution of pixels) of an image is ignored by Shannon entropy; see an example in Figure 1, where four images with different configurations of pixels have the same Shannon entropy. In fact, this problem of Shannon entropy has been pointed out by a number of researchers [29][30][31][32][33][34], questioning the applicability of Shannon entropy as a measure of information content of two-dimensional messages such as images, maps, and digital elevation models. To overcome the above problem, many improved Shannon entropies have been proposed in the last few decades to quantify the configurational information of an image, or, more specifically, the configurational disorder (or configuration) of pixels in an image. Nevertheless, to the best of our knowledge, no comparative study has been conducted concerning the performance of different improved Shannon entropies. More seriously, in the original papers on improved Shannon entropies, evaluations were either omitted e.g., [35] or simply performed in one of the two following ways:


• to check whether the improved Shannon entropies of a few examples of spatial patterns are different e.g., [36], or
• to examine whether the performance of a Shannon entropy-based image processing algorithm is improved e.g., [37].
Such evaluations are not comprehensive and are sometimes case dependent. This study aims to systematically evaluate and compare the performance of improved Shannon entropies.
The remainder of this article is organized as follows: Section 2 presents a critical review of Shannon entropy and its improvements. Section 3 describes the design of the experiments to evaluate the performance of various improved Shannon entropies. A strategy to simulate configurational disorder (used as the experimental data) and a set of measures for evaluation are also proposed in this section. Then, Section 4 reports the experimental results and the analysis in terms of validity, reliability, and ability. It is found that the improved Shannon entropies based on local binary patterns are invalid for use in quantifying the configurational information of images, and the best variant of Shannon entropy in terms of reliability and ability is the one based on the average distance between same/different-value pixels. Section 5 presents a further discussion, followed by some concluding remarks in Section 6.
Entropy 2018, 20, 19

A Critical Review of Improved Entropies
The formula of Shannon entropy (referred to as Sh48, which is a short name formed from the letters of the author's surname and digits of the year of publication) is given as follows:

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i) \qquad (1)$$

where $X$ is a discrete random variable with possible values of $\{x_1, x_2, \cdots, x_i, \cdots, x_n\}$, and $P(x_i)$ is the probability of $X$ taking the value of $x_i$. When Sh48 is used for an image, $X$ denotes the pixel of the image, and $P(x_i)$ is the proportion of the pixels with a gray value of $x_i$.
To make Shannon entropy capable of quantifying the configurational information of an image, one should first characterize the configuration of image pixels using a certain tool and then reflect the characterization in the computation of Shannon entropy. Six tools have been used in the literature, leading to six categories of improved Shannon entropies as follows:
1. Entropies based on the gray-level co-occurrence matrix of an image;
2. Entropies based on the gray-level variance of the neighborhood of a pixel;
3. Entropy based on the Sobel gradient of a pixel;
4. Entropy based on the local binary pattern of an image;
5. Entropy based on the Laplacian pyramid of an image; and
6. Entropy based on the distance between pixels of the same/different value.
These six categories are reviewed in the remainder of this section.
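As a concrete illustration of why Sh48 ignores configuration, the following minimal Python sketch computes the entropy of two small images that share the same composition but differ in configuration. The function name `shannon_entropy`, the two toy images, and the choice of base-2 logarithms (giving bits) are assumptions made for this illustration only.

```python
import numpy as np

def shannon_entropy(image):
    """Sh48 of a gray-level image: -sum of p * log2(p) over occurring gray values."""
    _, counts = np.unique(np.asarray(image), return_counts=True)
    p = counts / counts.sum()            # proportion of pixels at each gray value
    return float(-np.sum(p * np.log2(p)))  # assumption: base-2 logarithm (bits)

# Two 4 x 4 images with the same composition (eight 0s and eight 255s)
# but clearly different configurations of pixels.
blocks = np.array([[0, 0, 255, 255]] * 4)
checker = np.indices((4, 4)).sum(axis=0) % 2 * 255

print(shannon_entropy(blocks), shannon_entropy(checker))
```

Both calls return the same value because only the proportions of gray values enter the computation; the positions of the pixels never do.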

Entropies Based on the Gray-Level Co-Occurrence Matrix of an Image
The gray-level co-occurrence matrix (GLCM) was first proposed by Haralick, et al. [35] and is still widely used in image processing e.g., [38,39]. The basic idea behind it is the co-occurrence of two gray levels in an image. For example, there are nine co-occurrences of gray levels when scanning the image in Figure 2 from left to right and pixel by pixel. The GLCM of the image, also shown in Figure 2, is a matrix that records the frequency of such co-occurrence of every two gray levels. In this example, the element $f_{ij}$ of the matrix indicates that the $j$-th gray level occurs $f_{ij}$ time(s) at the immediate right of the $i$-th gray level.

Formally, the GLCM of an $M \times N$ image with $L$ gray levels is given as an $L \times L$ matrix $[f_{ij}]$ ($1 \le i \le L$, $1 \le j \le L$), the element of which is computed according to Equation (2):

$$f_{ij} = \sum_{m=1}^{M} \sum_{n=1}^{N} \begin{cases} 1, & I(m, n) = G(i) \text{ and } I(m + \Delta x, n + \Delta y) = G(j) \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

where $G(x)$ is the value of the $x$-th gray level in the image, $I(m, n)$ denotes the gray value of the pixel located at $(m, n)$, and $(\Delta x, \Delta y)$ is a pair of pre-set parameters called the displacement operator (denoted as $d$). Haralick, et al. [35] provided a total of eight displacement operators (Figure 3), which can be used to generate GLCMs along eight different directions.
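The counting behind a GLCM can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in the study: the function name `glcm`, the mapping of $\Delta x$ to columns and $\Delta y$ to rows, and the decision to skip pairs that fall outside the image are all assumptions.

```python
import numpy as np

def glcm(image, dx=1, dy=0):
    """Count how often gray level j occurs at offset (dx, dy) from gray level i."""
    img = np.asarray(image)
    levels = np.unique(img)                  # sorted gray levels G(1), ..., G(L)
    idx = np.searchsorted(levels, img)       # gray value -> index of its level
    rows, cols = img.shape
    f = np.zeros((levels.size, levels.size), dtype=int)
    for m in range(rows):
        for n in range(cols):
            m2, n2 = m + dy, n + dx          # assumption: dx moves along columns
            if 0 <= m2 < rows and 0 <= n2 < cols:   # assumption: skip outside pairs
                f[idx[m, n], idx[m2, n2]] += 1
    return levels, f

example = np.array([[0, 0, 1],
                    [1, 2, 2],
                    [0, 1, 2]])
levels, f = glcm(example)                    # default offset: immediate right
print(levels)
print(f)
```

With the default offset, `f[i, j]` counts how many times the $j$-th gray level appears immediately to the right of the $i$-th gray level, which matches the left-to-right scanning described for Figure 2. Other directions are obtained by changing `dx` and `dy`.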
It should be pointed out that each of the eight improved Shannon entropies by Haralick, et al. [35] is computed based on a GLCM generated along only one direction. One may argue that the configurational information quantified by such Shannon entropies is incomplete. For this reason, three other methods to generate a GLCM were proposed for the computation of a GLCM-based improved Shannon entropy.
(1) GLCM generated along two directions
In computing a GLCM-based improved Shannon entropy, Pal and Pal [40] proposed generating a GLCM with displacement operators along two directions, namely "R" and "D". In other words, the element ($f_{ij}$) of such a GLCM is derived using Equations (4)-(6). The resultant improved Shannon entropy is referred to as PP89 in this study.
(2) GLCM generated along eight directions
Abutaleb [41] proposed considering all eight directions when generating a GLCM with an image. In his method, the element ($f_{ij}$) of the GLCM of an image is computed using Equations (7) and (8). Note that in this way, the term "gray-level co-occurrence" in "GLCM" is actually redefined to be the co-occurrence of the gray level of a pixel and the average gray level of the pixel's eight neighbors, $Ave(m, n)$. The resultant improved Shannon entropy is referred to as Ab89.
(3) GLCM generated along four directions
Brink [42] proposed the use of only four directions containing "R", "RD", "D", and "LD" when computing the GLCM-based Shannon entropy (referred to as Br95) of an image; that is, each element of the GLCM of an image is derived using Equations (9)-(13). In this way, the GLCM employed by Brink [42] is based on the asymmetrical neighborhood of a pixel, rather than the symmetrical neighborhood used by Abutaleb [41]. It is worth noting that such asymmetrical neighborhoods are now widely used in generating the GLCM of an image [43].

Entropies Based on the Gray-Level Variance of Neighborhoods of a Pixel
The configuration of pixels of an image can also be captured by the gray-level variance (GLV) computed for the neighborhood of each pixel. This is because two pixels with the same gray value but different neighbors are likely to have different GLVs, as shown in Figure 4. In the literature, there are two improved Shannon entropies based on the GLVs of pixels.
The first GLV-based improved Shannon entropy (referred to as Br96) was proposed by Brink [44] in the form of Equations (14)-(16), where $n$ is the number of pixels in an image; $N_3$ is the 3 × 3 neighborhood (including the pixel itself) of a pixel; $\mu_{N_3}$ is the average gray value of pixels in $N_3$; $\delta_i$ is the GLV of $N_3$; and $g_i$ is the gray value of pixel $i$. Note that in this improved Shannon entropy, the probability $p_i$ is computed for each pixel rather than for each gray level as in the original Shannon entropy.
The other GLV-based improved Shannon entropy (referred to as Qu12-V) was proposed by Quweider [37]. In its definition, $n$ is the number of gray levels in an image; $l$ denotes a gray level; $\Omega_l$ is the collection of coordinates of pixels with a gray value of $l$; $|\Omega_l|$ is the number of elements in $\Omega_l$; and $\delta(i, j)$ is the GLV of the 3 × 3 neighborhood of pixel $(i, j)$. Note that the probability $p_l$ in Equation (17) is computed for all pixels at the same gray level, rather than for a single pixel in Equation (14). In the literature, the parameter $m_l$ is commonly referred to as the busyness or activity of the gray level $l$ [37,45].
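The quantity shared by both GLV-based entropies is the variance of the 3 × 3 neighborhood of each pixel. A minimal Python sketch of that step is given below; it does not reproduce Br96 or Qu12-V themselves. The function name `local_variance`, the use of population variance, and the replication of border pixels for neighborhoods that extend past the image edge are assumptions.

```python
import numpy as np

def local_variance(image):
    """Gray-level variance of the 3 x 3 neighborhood (pixel included) of every pixel."""
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, 1, mode="edge")     # assumption: replicate border pixels
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return windows.var(axis=(-2, -1))        # population variance of each window

flat = np.full((3, 3), 5)                    # center pixel with identical neighbors
spike = np.zeros((3, 3))
spike[1, 1] = 9                              # center pixel that differs from its neighbors
print(local_variance(flat)[1, 1], local_variance(spike)[1, 1])
```

The two center pixels illustrate the point made for Figure 4: the GLV depends on the neighbors of a pixel, so it responds to configuration in a way that a gray-value histogram cannot.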

Entropy Based on the Sobel Gradient of a Pixel
Different configurations of pixels may lead to different edges, which can be detected by computing the gradient of each pixel [46,47]. One of the commonly used tools to determine the gradient of a pixel is the Sobel operator [48], which consists of two 3 × 3 kernels (Figure 5) used to convolve an image (denote the convolved images as $G_x$ and $G_y$, respectively).
The first kernel aims to detect the edges of the image in the horizontal direction, whereas the second kernel operates in the vertical direction. Based on $G_x$ and $G_y$, the (Sobel) gradient $G(i, j)$ of a pixel $(i, j)$ is computed according to Equation (19). Quweider [37] proposed a Sobel gradient-based Shannon entropy, referred to as Qu12-G. This entropy is also computed using Equation (17), but the busyness $m_l$ in Equation (17) is redefined as the average Sobel gradient of all pixels with a gray value of $l$, as shown in Equation (20):

$$m_l = \frac{1}{|\Omega_l|} \sum_{(i, j) \in \Omega_l} G(i, j) \qquad (20)$$

where $\Omega_l$ denotes the collection of coordinates of pixels with a gray value of $l$; $|\Omega_l|$ is the number of elements in $\Omega_l$; and $G(i, j)$ is the Sobel gradient computed according to Equation (19).
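The busyness used by Qu12-G, i.e., the average Sobel gradient of all pixels sharing a gray value, can be sketched as follows. Several details are assumptions made for illustration: the standard Sobel kernel coefficients (Figure 5 is not reproduced here), the gradient taken as the magnitude $\sqrt{G_x^2 + G_y^2}$, the replication of border pixels, and the helper names `sobel_gradient` and `busyness`.

```python
import numpy as np

# Assumption: the standard Sobel kernels.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # change along columns
KY = KX.T                                                          # change along rows

def sobel_gradient(image):
    """Gradient magnitude of every pixel from the two Sobel responses."""
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, 1, mode="edge")     # assumption: replicate border pixels
    win = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    gx = (win * KX).sum(axis=(-2, -1))
    gy = (win * KY).sum(axis=(-2, -1))
    return np.hypot(gx, gy)                  # assumption: magnitude sqrt(gx^2 + gy^2)

def busyness(image, gradient):
    """Average gradient over all pixels that share a gray value, per gray value."""
    img = np.asarray(image)
    return {int(l): float(gradient[img == l].mean()) for l in np.unique(img)}

edge = np.array([[0, 0, 10, 10]] * 4)        # a single vertical edge
print(busyness(edge, sobel_gradient(edge)))
```

The sign difference between correlation and true convolution does not matter here, because only the magnitude of the two responses is used.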

Entropy Based on the Local Binary Pattern of an Image
A specific configuration of pixels may form a specific local binary pattern (LBP), which is a popular local texture descriptor that was first introduced by Ojala et al. [49] and is widely used in image analysis e.g., [50,51]. The LBP of an image is expressed as a series of integers called the LBP values, which are assigned to each pixel of an image. The procedure to determine the LBP value of a pixel is as follows (an example is shown in Figure 6).

1. Read the gray value ($y$) of the pixel and those of the pixel's eight immediate neighbors, starting from the top left and proceeding in clockwise order (denoted as $x_0, x_1, \cdots, x_7$).
2. Create an 8-digit binary number, each digit of which takes a value of either 0 or 1.
3. Compare each neighbor to the pixel, and set the corresponding binary digit according to the result of the comparison.
4. Convert the binary number to its decimal equivalent, which is the LBP value of the pixel.
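The procedure for determining an LBP value can be sketched in Python as below. The comparison rule is an assumption (a digit is set to 1 when the neighbor is greater than or equal to the center pixel, as in common LBP formulations), as is the reading of the first neighbor as the most significant digit. The function name `lbp_value` is hypothetical, and the sketch only handles pixels that are not on the image border.

```python
import numpy as np

# Offsets (row, column) of the eight neighbors, from the top left in clockwise order.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_value(image, row, col):
    """LBP value of an interior pixel at (row, col)."""
    img = np.asarray(image)
    y = img[row, col]
    # Assumption: a digit is 1 when the neighbor is >= the center pixel, else 0.
    bits = [1 if img[row + dr, col + dc] >= y else 0 for dr, dc in OFFSETS]
    # Assumption: the first neighbor gives the most significant binary digit.
    return int("".join(str(b) for b in bits), 2)

patch = np.array([[9, 1, 9],
                  [1, 5, 1],
                  [9, 1, 9]])
print(lbp_value(patch, 1, 1))
```

In this patch the four corner neighbors exceed the center and the four edge neighbors do not, so the binary number alternates between 1 and 0 before it is converted to decimal.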

An LBP-based Shannon entropy (referred to as Qu12-L) was suggested by Quweider [37] in the same form as Equation (17), but the busyness $m_l$ in Equation (17) is computed from $LBP(i, j)$, the LBP value of pixel $(i, j)$, over $\Omega_l = \{(i, j) \mid I(i, j) = l\}$, the collection of coordinates of pixels with a gray value of $l$.

Entropy Based on the Laplacian Pyramid of an Image
Rakshit and Mishra [52] pointed out that the configuration of pixels in an image can be captured by its Laplacian pyramid, which was proposed by Burt and Adelson [53] and has been widely used for image analysis [54]. The Laplacian pyramid is a type of multi-scale representation for images, and it is constructed by decomposing an image into multiple scales (or levels, denoted as $L_0, L_1, \cdots, L_i, \cdots, L_{n-1}, L_n$), as shown in Figure 7.
Figure 7. The gray-level Lena image $G_0$ (256 × 256 pixels) and its Laplacian pyramid, which consists of nine levels: $L_0$, $L_1$, $L_2$, $L_3$, $L_4$, $L_5$, $L_6$, $L_7$, and $L_8$.
In a Laplacian pyramid, the size of the first level ($L_0$) is the same as that of the original image, whereas the size of each of the other levels is half of that of its previous level (please see [55] for more technical details on the Laplacian pyramid).
The assumption behind the argument of Rakshit and Mishra [52] is that two different images with the same composition of pixels are likely to have different Laplacian pyramids; thus, the difference in the configuration of pixels in the two images can be reflected in measures based on the Laplacian pyramid. Based on this assumption, they proposed an improved Shannon entropy (referred to as RM06) that is computed from $H(L_i)$, the Shannon entropy of the $i$-th level (denoted as $L_i$, where $i = 0, 1, \cdots, n$) of the Laplacian pyramid of an image.

Entropy Based on the Average Distance between Same/Different-Value Pixels
The configuration of pixels (or geographic features in general) determines their correlation, which can be estimated, according to Claramunt [36], by using the Euclidean distance. Following this line of thought, Claramunt [36] proposed an improved Shannon entropy based on the distance between two pixels, or the geographic features in general.
The distance between two pixels, as pointed out by Claramunt [36], can be considered as the key factor in determining the correlation between them, because the First Law of Geography [56] states that "everything is related to everything else, but near things are more related than distant things" [57]. This key factor, according to Claramunt [36], should also be used in determining the correlation among all the pixels of an image, or the configurational disorder of an image. He assumed that the degree of the configurational disorder of an image would decrease if the average distance between every two pixels of the same gray value (or same-value pixels in short) becomes shorter and/or the average distance between every two pixels of different gray values (or different-value pixels) becomes longer. With this assumption, Claramunt [36] proposed an improved Shannon entropy (referred to as Cl05), which is computed by three equations in which $i$ denotes the $i$-th gray level, and $n$ and $N$ are the total number of gray levels and that of pixels, respectively. $p_i$, $N_i$, and $C_i$ are the proportion, the total number, and the collection of pixels at the $i$-th gray level, respectively. $j$ and $k$ denote the $j$-th and $k$-th pixel in $C_i$, respectively, and the Euclidean distance between them is denoted by $d_{jk}$. $\lambda$ is a pre-set parameter taking a small value such as 0.1 or 0.2.
The $d_s(i)$ computed using Equation (24) is, by nature, the average of the distances between every two pixels at the $i$-th gray level. Therefore, $d_s$ is termed the average distance between the same-value pixels in this study. In contrast, $d_d(i)$ is actually the average of the distances between each of the pixels at the $i$-th gray level and each of the pixels at the other gray levels, so $d_d$ is referred to as the average distance between the different-value pixels. In the work by Leibovici, et al. [58], $d_s/d_d$ is termed the discriminant ratio.
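To make the roles of the two average distances concrete, the following Python sketch computes a Cl05-style entropy for a small image. It assumes the form commonly cited for Claramunt's measure, $-\sum_i \frac{d_s(i)}{d_d(i)} \, p_i \log_2 p_i$, with $\lambda$ substituted whenever an average distance is undefined; these details, the base-2 logarithm, and the function name `cl05` are assumptions rather than a transcription of the equations in the original paper.

```python
import numpy as np

def cl05(image, lam=0.1):
    """Cl05-style entropy: Shannon terms weighted by d_s / d_d per gray level."""
    img = np.asarray(image)
    rows, cols = np.indices(img.shape)
    coords = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
    values = img.ravel()
    total = values.size
    # All pairwise Euclidean distances (only feasible for small images).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    h = 0.0
    for g in np.unique(values):
        same = values == g
        n_i = int(same.sum())
        p_i = n_i / total
        # d_s: mean distance over ordered pairs of distinct same-value pixels.
        d_s = dist[np.ix_(same, same)].sum() / (n_i * (n_i - 1)) if n_i > 1 else lam
        # d_d: mean distance between pixels at this level and all other pixels.
        d_d = dist[np.ix_(same, ~same)].mean() if n_i < total else lam
        h -= (d_s / d_d) * p_i * np.log2(p_i)
    return float(h)

blocks = np.array([[0, 0, 255, 255]] * 4)            # same-value pixels clustered
checker = np.indices((4, 4)).sum(axis=0) % 2 * 255   # same-value pixels dispersed
print(cl05(blocks), cl05(checker))
```

Under these assumptions the clustered image receives a lower value than the checkerboard, because its same-value pixels are closer together and its different-value pixels are farther apart, which is the behavior that the assumption behind Cl05 calls for.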
It is worth noting that, although a comprehensive evaluation is lacking, Cl05 has found some applications in geographic information science. Examples of these applications include spatial data classification [59] and clustering [60].

Design of the Thermodynamics-Based Evaluation
The basic idea of the evaluation is to compute the values of an improved Shannon entropy for a sequence of increasingly configuration-disordered images and then to examine whether these values capture the increasing disorder or not. However, there is no standard sequence of images that are increasingly disordered in terms of configuration. In this section, a thermodynamics-based strategy is first proposed and used to generate such images. Then, the criteria for the evaluation are defined and measures for each criterion are developed.

A Thermodynamics-Based Strategy for Generating Testing Images
To obtain a sequence of increasingly configuration-disordered images, one natural strategy is to generate a group of images with the same composition of pixels and then rank these images according to their degrees of configurational disorder. Such a strategy requires a measure of (configurational) disorder that can be employed to rank different configuration-disordered images, or configurational disorders in general. However, the long-used standard measure of disorder is Shannon entropy itself [61,62], whose value, as mentioned in the introduction, is not related to configurational disorder.
To escape the above paradox, the origin of the entropy concept, thermodynamics, was revisited in this study. In thermodynamics, the terms entropy and disorder are used interchangeably [63]. The classical example of increasing disorder is the mixing of ideal gases [64], as shown in Figure 8. In this example, two ideal gases are initially separated by a partition in a closed system (Figure 8a), and then they mix together because the partition is removed (Figure 8b-d). During the mixing process, the disorder/entropy of the system increases logarithmically until the system achieves equilibrium [65], at which time the disorder/entropy reaches its maximum value.
One possible strategy for generating a sequence of increasingly configuration-disordered images is to simulate this classical example in thermodynamics, i.e., the mixing of ideal gases. To this end, a simulation strategy, referred to as the thermodynamics-based strategy, was proposed in this study. The strategy works with a user-supplied image, referred to as a "seed" (image), which is regarded as the initial state of a closed system. In the strategy, pixels of the seed image are regarded as gas molecules, whose "mixing" is simulated using the following iterative algorithm:

1. Get the size, $r \times c$, of the seed image, which is taken as the output of Iteration 0.
2. Randomly select $(r \times c)/2$ pixels in the resultant image of the previous iteration.
3. Exchange the position of each of the selected pixels with that of a randomly selected neighboring pixel.
4. Output the resultant image as the result of the current iteration of mixing.
5. Go back to Step 2 until the number of iterations reaches some threshold.
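The five steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study: the neighborhood is taken to be the eight immediate neighbors that lie inside the image, pixels are selected without replacement, swaps are applied one after another, and the names `mix_once` and `mix` are hypothetical.

```python
import numpy as np

# Assumption: "neighboring" means the eight immediate neighbors.
STEPS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def mix_once(image, rng):
    """One iteration: swap (r * c) / 2 randomly chosen pixels with a random neighbor."""
    img = np.array(image)                    # copy, so the input is left untouched
    r, c = img.shape
    chosen = rng.choice(r * c, size=(r * c) // 2, replace=False)
    for flat in chosen:
        i, j = divmod(int(flat), c)
        # Keep only the neighbors that fall inside the image.
        valid = [(i + di, j + dj) for di, dj in STEPS
                 if 0 <= i + di < r and 0 <= j + dj < c]
        ni, nj = valid[rng.integers(len(valid))]
        img[i, j], img[ni, nj] = img[ni, nj], img[i, j]
    return img

def mix(seed_image, iterations, rng=None):
    """Return the outputs of Iterations 0, 1, ..., iterations."""
    rng = np.random.default_rng() if rng is None else rng
    outputs = [np.array(seed_image)]         # Iteration 0 is the seed itself
    for _ in range(iterations):
        outputs.append(mix_once(outputs[-1], rng))
    return outputs

seed = np.arange(16).reshape(4, 4)
sequence = mix(seed, iterations=5, rng=np.random.default_rng(0))
print(sequence[-1])
```

Because every operation is a swap, each output has exactly the same composition of gray values as the seed, so Sh48 stays constant along the sequence while the configuration changes.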

A Set of Testing Images Generated Using the Proposed Strategy
Using the thermodynamics-based strategy, a set of testing images was generated in this study. The testing image set is a sequence of increasingly configuration-disordered images generated using a natural image (Figure 9a) as the seed. This seed image contains 150 × 215 pixels, with values ranging from 0 to 215. The threshold in implementing the thermodynamics-based strategy was determined using the following procedure:

1. Set its initial value to a large enough number (e.g., 100,000) to obtain numerous outputs.
2. View the outputs of the (10,000 × k)-th (k = 1, 2, 3, ...) iterations with the naked eye, and select one from these viewed outputs as the "total disorder".
3. Set the final value of the threshold to the number of iterations of the "total disorder".
Following the preceding procedure, the threshold was determined as 20,000. In other words, the testing image set contains 20,000 increasingly configuration-disordered images (see a few of these images in Figure 9b-l), each of which is the output of the i-th (i = 1, 2, 3, ..., 20,000) iteration of mixing using the natural image (Figure 9a) as the seed.
Some readers may wonder what the mixing result is like after 20,000 iterations. Our experiment, consisting of 100,000 iterations of mixing, showed that there was little visual difference between two resultant images after 20,000 iterations (the results of 100,000 iterations are available from the authors upon request).

Criteria and Measures for Evaluation
Three criteria are defined in this section for evaluating the improved Shannon entropies, i.e., their validity, reliability, and ability to capture configurational disorder. In addition to the definition of these criteria, three measures were developed to assess the fulfillment of each criterion.
(1) Validity and its measure
Validity is the most important criterion; it indicates "whether the instrument is actually measuring the concept it claims to measure" [66]. In this study, the validity of an improved Shannon entropy refers to whether the entropy really captures configurational disorder or not. In dealing with the testing images, the values of a valid improved Shannon entropy for these images should exhibit a logarithmic trend over the iterations of mixing. Such a trend characterizes the logarithmic growth of the degree of configurational disorder of the pixels (which simulate gas molecules in mixing) over the iterations. The measure of validity, referred to as V, is qualitatively defined as follows:

$$V = \begin{cases} \text{yes}, & r^2 > thre \\ \text{no}, & \text{otherwise} \end{cases} \tag{26}$$

where yes means valid, and no indicates invalid. The parameter $thre$ is a pre-set threshold, and $r^2$ is the coefficient of determination obtained when performing a least-squares regression between (a) the values of an improved Shannon entropy for the testing images and (b) the iterations of mixing, using a logarithmic model. The value of $r^2$ indicates the goodness of fit of a regression model to data [67], so in the context of this study it demonstrates whether the logarithmic trend shown by these values over the iterations of mixing is strong. In this study, the value of $thre$ was set as 50% because a regression model can usually be regarded as a good fit if $r^2$ is greater than one half [68].
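
The validity check described above can be sketched as follows. This is an illustrative sketch rather than the code used in this study; it assumes that the logarithmic model has the form v = a ln(i) + b, where i is the iteration number starting from 1, and the function names `log_fit_r2` and `validity` are hypothetical.

```python
import numpy as np


def log_fit_r2(values):
    """r^2 of a least-squares fit of v_i = a * ln(i) + b, with i = 1..n."""
    v = np.asarray(values, dtype=float)
    x = np.log(np.arange(1, len(v) + 1))  # logarithm of the iteration number
    slope, intercept = np.polyfit(x, v, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((v - predicted) ** 2)  # residual sum of squares
    ss_tot = np.sum((v - v.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


def validity(values, thre=0.5):
    """Return "yes" if the logarithmic fit exceeds the threshold, else "no"."""
    return "yes" if log_fit_r2(values) > thre else "no"
```

A series that follows a logarithmic curve exactly gives an $r^2$ of essentially 1 and is judged valid, whereas a series that merely alternates between two values has an $r^2$ close to zero and is judged invalid.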
(2) Reliability and its measure
The reliability of a measure refers to "whether something is being measured consistently" [69]. The meaning of reliability is two-fold. First, a reliable measure "produces the same results when used repeatedly to measure the same thing" [70]. Second, the values of a reliable measure for two similar things are close. In the second sense, if an improved Shannon entropy is reliable, the difference between its values for the configuration-disordered images at two consecutive iterations of mixing should be tiny. In other words, if the values of a reliable improved Shannon entropy for the testing images are shown in a scatter plot, the polyline (hereafter referred to as the scatter line) connecting every two consecutive scatter points should be smooth (see [71,72] for more information on scatter plots). The measure of reliability, referred to as R, is quantitatively defined as follows:

$$R = \frac{\sum_{i=1}^{n-1} \left| v_{i+1} - v_i \right|}{\mathit{max} - \mathit{min}} \tag{27}$$

where $v_i$ is the value of an improved Shannon entropy for the configuration-disordered image at the i-th iteration of mixing (i = 1, 2, 3, ..., n); n is the total number of iterations; and $\mathit{max}$ and $\mathit{min}$ are the maximum and minimum of all $v_i$, respectively. It can be seen from Equation (27) that R is the ratio of (a) the cumulative change in value of an improved Shannon entropy for the configuration-disordered images from the first iteration to the last to (b) the value range of this entropy for the images of all iterations. The smaller this ratio, the smoother the scatter line (see an example in Figure 10), and the more reliable the improved Shannon entropy.
(3) Ability and its measure
The ability to capture configurational disorder refers to the range of configurations, in terms of the degree of disorder, that can be captured by an improved Shannon entropy. An improved Shannon entropy of high ability should capture a large range of configurations, say, from (nearly) completely ordered to totally disordered. For the testing images, the values of a high-ability improved Shannon entropy should converge slowly over the iterations of mixing. In contrast, for an improved Shannon entropy of low ability, its values converge quickly. The measure of ability, referred to as A, is defined by the following formula:

$$A = \frac{S_1}{S_2} = \frac{\sum_{i=1}^{n} \left( v_i - \mathit{min} \right)}{n \times \left( \mathit{max} - \mathit{min} \right)} \tag{28}$$

where $v_i$, n, $\mathit{max}$, and $\mathit{min}$ hold the same meaning as in Equation (27). In essence, A is the ratio of the areas (i.e., $S_1$ and $S_2$) of two shapes formed in the scatter plot of the values of an improved Shannon entropy for a sequence of increasingly configuration-disordered images, as shown in Figure 11. A smaller value of this ratio means that the value of an improved entropy converges more slowly over the iterations of mixing, as shown in Figure 12. Therefore, the smaller this ratio is, the higher the ability of the improved entropy.
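
The two quantitative measures can be sketched in Python as follows. This is a sketch under assumptions rather than the authors' code: R is taken as the sum of absolute differences between consecutive values divided by the value range, and A is taken as the area between the scatter line and the minimum, approximated by the sum of (v_i − min), divided by the area of the bounding rectangle n × (max − min). This discretization of the two areas is an assumption, and the function names are hypothetical.

```python
import numpy as np


def reliability(values):
    """R: cumulative absolute change between consecutive values / value range."""
    v = np.asarray(values, dtype=float)
    return np.sum(np.abs(np.diff(v))) / (v.max() - v.min())


def ability(values):
    """A: area under the scatter line (above min) / area of the bounding box.

    Assumption: S1 is approximated by sum(v_i - min) and S2 by n * (max - min).
    """
    v = np.asarray(values, dtype=float)
    s1 = np.sum(v - v.min())
    s2 = len(v) * (v.max() - v.min())
    return s1 / s2
```

For example, the monotone series 0, 1, 2, 3 gives R = 1.0, while the zigzag series 0, 3, 0, 3 gives R = 3.0, so a fluctuating entropy is penalized. Likewise, the slowly converging series 0, 1, 2, 3 gives A = 0.5, while the quickly converging series 0, 3, 3, 3 gives A = 0.75.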

Methods to be Evaluated: Original and Modified
Methods that were evaluated in this study are listed in Table 1. These methods comprise the original Shannon entropy and all the improved methods reviewed in Section 2. In addition, some modified improved Shannon entropies are also tabulated in Table 1. They result from the following two modifications.
(1) Changing the size of the neighborhood
Br96 and Qu12-V were also computed using the neighborhood of 5 × 5 pixels; the results are referred to as Br96-5 and Qu12-V-5, respectively. The size of the neighborhood used in other entropies was not changed because their computation is limited to only the original size; for example, the size of the neighborhood used in computing Qu12-G is fixed at 3 × 3 pixels by the Sobel operator.
(2) Avoiding dividing by zero
There is a problem of dividing by zero in the three improved Shannon entropies by Quweider [37], i.e., Qu12-V, Qu12-G, and Qu12-L, if the busyness $m_l$ in Equation (17) takes the value of zero. To fix this problem, the strategy used in Br96 (adding one to the denominator, as shown in Equation (15)) was adopted in this study. Accordingly, a modified version of Equation (17) was proposed in this study, as shown in Equation (29). The modified results of Qu12-V/G/L and Qu12-V-5 computed using Equation (29) are referred to as Qu12-V'/G'/L' and Qu12-V-5', respectively.

Results of the Evaluation
The entropies of each increasingly configuration-disordered image generated in this study are shown in Figure 13. Note that the logarithmic base in computing each entropy was set as two in this study, although other bases such as 10 and e are also acceptable. Furthermore, this figure shows the results of the regression analysis for each Shannon entropy, namely the regression equation and $r^2$. The validity, reliability, and ability, measured by V, R, and A, respectively, of each Shannon entropy are listed in Table 2.
Note: N/A means "not applicable".
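
As a point of reference for the role of the logarithmic base, the original Shannon entropy of a gray-level image can be sketched as follows. This is a minimal illustration of the standard histogram-based definition, not the code used in this study, and the function name `shannon_entropy` is hypothetical.

```python
import numpy as np


def shannon_entropy(image, base=2.0):
    """Shannon entropy of the gray-level histogram of an image."""
    _, counts = np.unique(np.asarray(image), return_counts=True)
    p = counts / counts.sum()  # proportion of pixels at each gray level
    # Changing the base only rescales the result by a constant factor.
    return float(-np.sum(p * np.log(p)) / np.log(base))
```

With base two the result is in bits per pixel. Because the value depends only on the histogram, any rearrangement of the same pixels (such as the outputs of the mixing simulation) yields the same value, which is why the original Shannon entropy cannot reflect configurational disorder.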

Analysis of the Results on Validity
Among the 23 improved Shannon entropies, only Qu12-L and Qu12-L' turn out to be invalid in the evaluation, as shown in Table 2. Although both of these improved Shannon entropies are based on LBP, they are invalid for different reasons.
Qu12-L is not valid because its algorithm returned a "dividing by zero" error when using Equation (17). In other words, the parameter $m_l$ in Equation (17) can take the value of zero when dealing with the testing images. In fact, this error is to be expected when computing Qu12-L with any image. According to Equation (21), $m_l$ takes the value of zero if the LBP value of each pixel at the gray level of l equals zero, or, in other words, if all the immediate neighbors of the pixels at the gray level of l have a gray value not greater than l. This condition always holds when l equals the greatest gray value in the image, whatever the image is.
Qu12-L' is invalid because its values for the testing images present a convex trend, rather than a logarithmic trend, over the iterations of mixing. This convex trend can be revealed by a close look at the scatter plot of Qu12-L': As shown in Figure 14, the value of Qu12-L' first presents an upward trend, peaks at about Iteration 3000, and then shows a downward trend.
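
The reason why the division by zero is unavoidable for Qu12-L can be illustrated with a short sketch. The LBP variant below is an assumption inferred from the condition stated above (a neighbor contributes a 1-bit only if its gray value is greater than that of the center pixel); the bit ordering, the treatment of image borders, and the function name `lbp_codes` are illustrative choices rather than the definition in Equation (21).

```python
import numpy as np

# Eight immediate neighbors, one bit each (bit order is an arbitrary choice).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """LBP code per pixel; a bit is set only if the neighbor is greater."""
    img = np.asarray(image)
    r, c = img.shape
    codes = np.zeros((r, c), dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        for y in range(r):
            for x in range(c):
                ny, nx = y + dy, x + dx
                # Neighbors outside the image are ignored (assumption).
                if 0 <= ny < r and 0 <= nx < c and img[ny, nx] > img[y, x]:
                    codes[y, x] += 1 << bit
    return codes
```

No pixel can have a neighbor greater than the greatest gray value in the image, so every pixel at that gray level receives a code of zero. Any sum or average of the codes over that gray level is therefore zero, which is the situation in which a denominator built from it vanishes.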

Analysis of the Results on Reliability
The ranking of different improved Shannon entropies can be determined according to the measure of reliability (i.e., R), as shown in Table 3. It can be seen from this table that the most reliable improved Shannon entropy is the one based on the average distance between same/different-value pixels, i.e., Cl05, followed by the improved Shannon entropies based on GLV, namely Qu12-V-5, Qu12-V-5', Qu12-V, Qu12-V', Br96-5, and Br96 (ranked 2nd-7th, respectively). It is also noted that the improved Shannon entropies based on the GLCM generated along multiple directions (i.e., Br95, Ab89, and PP89) are more reliable than those based on the GLCM generated along a single direction.
The most unreliable improved Shannon entropy is the one based on the Laplacian pyramid, i.e., RM06, whose R-value is significantly higher than that of the other improved Shannon entropies, as shown in Figure 15. A possible explanation for the low reliability of RM06 (i.e., the great fluctuation in the value of RM06) is that in the mixing simulation, the "motion" of each pixel has a "butterfly effect" on the resultant Laplacian pyramid. In other words, the motion of even a single pixel is enough to change all the levels of the Laplacian pyramid of an image.

Analysis of the Results on Ability
The rankings of various improved Shannon entropies in terms of ability are shown in Table 4. It can be seen from the rankings that Cl05 is the improved Shannon entropy with the highest ability to capture configurational disorder, followed by RM06 with the second highest ability. In addition, the ability of these two improved Shannon entropies, especially Cl05, is significantly better than that of the others, as shown in Figure 16. This significant difference arises because these two improved Shannon entropies are sensitive not only to configurations within a pixel's neighborhood of a pre-set size (referred to as local configurations) but also to configurations outside the neighborhood (global configurations).
Let us take the two images (the upper one and the lower) in Figure 17 as an example. The only difference between the two images is the location of the pixel with a gray value of seven. For this pixel, its local configuration within a pre-set size, say 3 × 3, in the upper image is the same as that in the lower image, but its global configurations are different between the two images (clearly evident in the distance between this pixel and the one with a gray value of eight). The values of all improved Shannon entropies of these two images were computed and are shown in Table 5. One can note from this table that, among all these improved Shannon entropies, only Cl05 and RM06 capture the difference between the two images in Figure 17.
Note: Some rankings are bolded to indicate that they are the same as their previous one.
Figure 17. Two simple images with a slight difference.
Table 5. The values of all improved Shannon entropies of the two images in Figure 17.

Effects of Modifications on Improved Shannon Entropies
In this section, we investigate the effects of modifications on improved Shannon entropies. As described in Section 4.1, the first modification is to change the size of the neighborhood used in computing Br96 and Qu12-V, resulting in two modified improved Shannon entropies, namely Br96-5 and Qu12-V-5. A comparison between the performance of Br96 and that of Br96-5 reveals that such a modification increases the reliability but decreases the ability of Br96. The change in the size of the neighborhood, however, improves both the reliability and the ability of Qu12-V. These findings imply that neighborhoods of larger sizes are not always better than those of smaller sizes in improving Shannon entropy.
The second modification was aimed at avoiding the problem of dividing by zero when computing Qu12-V, Qu12-V-5, Qu12-G, and Qu12-L, but this problem was encountered only in the computation of Qu12-L in the evaluation (as shown in Figure 13). It is worth noting that, although the other three improved Shannon entropies, i.e., Qu12-V, Qu12-V-5, and Qu12-G, can be computed for the testing images in this study, this does not negate the necessity of the modification. For example, these three improved Shannon entropies cannot be computed for an image where all the pixels have the same gray value.

Computational Efficiency of Various Improved Shannon Entropies
In this section, the computational efficiency of these improved Shannon entropies is discussed. It is necessary to note that an efficiency evaluation (in terms of central processing unit, CPU, time [73]) was not formally included in this study for two reasons. First, the algorithms of the improved Shannon entropies were implemented in different programming environments in this study. More specifically, the algorithm of RM06 was implemented in MATLAB (MathWorks, R2016a), while those of the other improved Shannon entropies were implemented in Visual Studio (Microsoft, 2015). Second, some of the improved Shannon entropy algorithms were optimized in this study to improve their efficiency; otherwise, according to preliminary estimates, it would take a week with a desktop computer to compute all the improved Shannon entropies of the 20,000 testing images.
To provide an intuitive insight into the computational efficiency of different Shannon entropies, the following experiment was carried out with a desktop computer equipped with an Intel Core i7-4790 CPU @ 3.60 GHz and 8.00 GB RAM. First, a total of 100 configuration-disordered images were randomly selected from the testing image dataset. Then, all the Shannon entropies of each selected image were computed using algorithms without any optimization. The CPU time required by each computation was recorded and is shown in Table 6. It can be seen from this table that Cl05 is the most time-consuming Shannon entropy.
It has been shown in the evaluation that Cl05 is the best method according to the three criteria defined in this study. However, one may argue that such a method is essentially not a Shannon entropy because it can be replaced by its coefficient, $d_s/d_d$, which is an index of correlation. To examine this argument, we first removed the probability component from the equation of Cl05, leaving only the coefficients, as shown in Equation (30) (referred to as Coef_Cl05). Then, we computed the values of Coef_Cl05 for all the testing images and found that the trend shown by Coef_Cl05 is similar to that of Cl05, as shown in Figure 18. A further regression analysis shows that there is a strong linear relationship between Cl05 and Coef_Cl05, as shown in Figure 19.
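
The timing experiment described in this section (the CPU time of each entropy over a random sample of testing images) could be organized along the following lines. This is only a sketch of the measurement procedure: the helper names `sample_images` and `mean_cpu_time` are hypothetical, and the entropy function passed in stands for any of the evaluated methods.

```python
import time

import numpy as np


def sample_images(images, k, seed=0):
    """Randomly select k images, without replacement, from a list of images."""
    rng = np.random.default_rng(seed)
    indices = rng.choice(len(images), size=k, replace=False)
    return [images[i] for i in indices]


def mean_cpu_time(entropy_func, images):
    """Average CPU time (in seconds) of entropy_func per image."""
    start = time.process_time()  # CPU time of the process, not wall-clock time
    for image in images:
        entropy_func(image)
    return (time.process_time() - start) / len(images)
```

Using `time.process_time` rather than a wall-clock timer keeps the measurement less sensitive to other programs running on the same computer, although results obtained in different programming environments remain difficult to compare.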

Figure 19. The relationship between Cl05 and Coef_Cl05.

Thermodynamic Entropy and Fractal Dimension
It is appropriate at this point to mention two relevant topics, namely thermodynamic entropy and fractal dimension. The concept of thermodynamic entropy, as its name suggests, originates from thermodynamics, which is a branch of physics dealing with the movement of energy [74]. Thermodynamic entropy (sometimes referred to as Boltzmann [75] entropy) is similar, or even equivalent in some sense [76], to Shannon entropy, as both of them can be used to statistically characterize the disorder of a system [77,78]. A clear difference between them, however, is that Shannon entropy is commonly expressed in binary digits per unit (e.g., bits per pixel), while thermodynamic entropy is quantified in units of energy divided by temperature [79].
Although Shannon entropy is sometimes capable of characterizing the disorder of a system, the characterization depends largely on the scale adopted to measure that system (i.e., the measurement scale). That is, the value of Shannon entropy may vary considerably with the measurement scale. In this sense, one needs to determine the characteristic scale [80][81][82][83] of a system before computing an entropy. However, a large number of systems, such as urban forms and coastlines, are "scale-free" [84,85]; that is, they have no characteristic scale. In this case, fractal metrics, such as fractal dimension [86,87], information dimension [88,89], and ht-index [90][91][92][93], can be used as effective alternatives to Shannon entropy because these metrics are independent of measurement scales.

Conclusions
In this study, a systematic evaluation of various improved Shannon entropies was conducted. In doing so, a critical review was first undertaken on the improvements on Shannon entropy for quantifying the configurational information (i.e., the configurational disorder) of a gray-level image. Next, a systematic evaluation of various improved Shannon entropies was designed. To generate testing data for such an evaluation, a strategy for simulating the mixing of ideal gases (a thermodynamic process of increasing entropy) was proposed in this study. Furthermore, to evaluate the performance of improved Shannon entropies, three criteria were defined (i.e., validity, reliability, and ability to capture configurational disorder) and three measures were developed to assess the fulfillment of each criterion. Finally, 23 variants of Shannon entropy (Table 1) were evaluated, with a testing dataset containing 20,000 increasingly configuration-disordered images. From the results of the evaluation, the following can be concluded:

1. Among all the variants of Shannon entropy, only the two based on LBP (local binary pattern), namely Qu12-L and Qu12-L', are invalid for quantifying the configurational information of an image. However, it is worth noting that, although valid with the testing images in this study, Qu12-V, Qu12-V-5, and Qu12-G may be invalid with other images due to dividing by zero.
2. Variants of Shannon entropy differ significantly in terms of reliability. The most reliable variant of Shannon entropy is Cl05, with an R-value of 2.50. In contrast, the least reliable one is RM06, with an R-value of 331.23, which is roughly 132 times that of Cl05.
3. In terms of the ability to quantify configurational information (i.e., to capture configurational disorder), the best two variants of Shannon entropy are Cl05 (with an A-value of 0.82) and RM06 (with an A-value of 0.88). The other variants perform similarly, with A-values ranging from 0.96 to 0.98.
4. Cl05 is the best variant of Shannon entropy for quantifying the configurational information of images according to the three criteria defined in this study. However, from a theoretical point of view, it is debatable whether Cl05 is still essentially a Shannon entropy; from a technical point of view, practical applications of Cl05 in remote sensing image processing may be limited by its high computational complexity.
The significance of this study can be seen from two perspectives. Theoretically, it presents for the first time a comprehensive evaluation framework (including testing data, criteria, and measures) for the usability of various entropies. This evaluation framework will play a guiding role in further improving the usability of information-theoretic measures for spatial sciences. Practically, the conclusions of this study are useful for various image processing applications in selecting an entropic measure. For example, a number of band selection algorithms [94][95][96][97] for hyperspectral remote sensing images rely on entropic measures for characterizing the information content of each band. In this case, the improved Shannon entropies that are valid and reliable in this study can be used as effective alternatives to the original Shannon entropy.
Future research is recommended in two areas. First, the computational efficiency of Cl05 can be improved to achieve real-time performance with large datasets. For this purpose, some advanced computational means, such as parallel [98,99] and cloud computing [100,101], may be of use. Second, a comparison can be made between the improved Shannon entropies and Boltzmann entropy, which is "both configurational and compositional" [102] and has been recommended for use as an alternative to Shannon entropy in characterizing spatial disorder [31,103].