Two-Dimensional EspEn: A New Approach to Analyze Image Texture by Irregularity

Image processing plays a relevant role in various industries, where the main challenge is to extract specific features from images. In particular, texture characterizes the occurrence of a pattern along the spatial distribution of pixel intensities, and it has been applied in classification and segmentation tasks. Several feature extraction methods have been proposed in recent decades, but few of them rely on entropy, which is a measure of uncertainty. Moreover, entropy algorithms have been little explored for two-dimensional data. Nevertheless, there is growing interest in developing algorithms that overcome current limitations, since Shannon Entropy does not consider spatial information and SampEn2D generates unreliable values for small image sizes. We introduce a new algorithm, EspEn (Espinosa Entropy), to measure the irregularity present in two-dimensional data; its calculation requires setting three parameters: m (length of the square window), r (tolerance threshold), and ρ (percentage of similarity). Three experiments were performed; the first two used simulated images contaminated with different noise levels, and the last used grayscale images from the Normalized Brodatz Texture database (NBT). First, we compared the performance of EspEn against Shannon Entropy and SampEn2D. Second, we evaluated the dependence of EspEn on variations of the parameters m, r, and ρ. Third, we evaluated the EspEn algorithm on NBT images. The results revealed that EspEn could discriminate images with different sizes and degrees of noise. Finally, EspEn provides an alternative algorithm to quantify the irregularity in 2D data; the recommended parameters for better performance are m = 3, r = 20, and ρ = 0.7.


Introduction
Image-processing applications have allowed the advance of several technologies in medicine, informatics, microscopy, agriculture, and others. Over time, various issues in science and technology have prompted improvements in algorithms for extracting features from digital images, which are useful in face detection, character recognition, and augmented reality [1]. Therefore, processing techniques let us handle the digitized images mathematically to obtain quantitative data and perform detection, recognition, segmentation, and classification tasks in order to obtain high-quality products while reducing time and costs in production [2].
Texture is a crucial feature of active interest in computer vision systems. Although there is currently no consensus on the formal definition of texture, it has been related to the surface of an object or to the phenomenon of repetitive patterns in images [3]. In fact, texture elements (called texels) give us information on the spatial distribution of local intensity variations of pixels in a neighborhood [4].
to compare EspEn with DistrEn2D, because it is more focused on small-sized textures. Furthermore, mixing random values with an image does not significantly change the value of DistrEn2D. This document has the following structure. In Section 2, two of the most popular entropy algorithms in image processing (Shannon Entropy and SampEn2D) and the EspEn algorithm proposal are presented. Section 3 describes the methodology used to evaluate the performance of the proposed algorithm and compares it to the other algorithms' performance. Section 4 presents the results and discussions. Finally, Section 5 contains the conclusion of the study.

Shannon Entropy
Shannon Entropy is a measure of uncertainty related to a probability distribution, and it has been used as a Haralick descriptor to categorize the texture of an image. The normalized histogram is an intensity function that shows the count of pixels with equal intensity regardless of position. Entropy is the amount of individual information weighted by the probability of occurrence of each element [25] and is defined as follows:

$$E = -\sum_{i=1}^{n} p_i \log_2 (p_i)$$

The probability of occurrence of each gray intensity is $p_i = g_i/N$, where $g_i$ is the histogram count for the $i$-th gray level, $n$ is the number of gray levels, and $N$ is the number of positions (pixels) in the matrix.
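The histogram-based definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `shannon_entropy`, the 256-level assumption, and the base-2 logarithm (entropy in bits) are choices made here for the example.

```python
import numpy as np

def shannon_entropy(image: np.ndarray, levels: int = 256) -> float:
    """Histogram-based Shannon entropy in bits (base-2 logarithm assumed)."""
    # g_i: number of pixels with gray level i, regardless of position
    counts = np.bincount(image.ravel().astype(np.int64), minlength=levels)
    p = counts / counts.sum()   # p_i = g_i / N
    p = p[p > 0]                # empty bins contribute nothing to the sum
    return float(-np.sum(p * np.log2(p)))

# Two images with the same histogram but very different spatial structure
stripes = np.zeros((8, 8), dtype=np.uint8)
stripes[:, 4:] = 255
scrambled = np.random.default_rng(0).permutation(stripes.ravel()).reshape(8, 8)
print(shannon_entropy(stripes), shannon_entropy(scrambled))
```

Both calls return the same value because the histogram ignores pixel positions, which is the limitation that motivates the spatially aware estimators discussed next.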

SampEn2D Entropy
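SampEn2D is an extension of the 1D SampEn algorithm to images that seeks to preserve the original proposal as a measure of irregularity [24]. The SampEn2D algorithm considers an image $u(i,j)$ with width $W$ and height $H$. Let $x_m(i,j)$ be the set of pixels that form a square window of length $m$, with column range $j$ to $j + m - 1$ and row range $i$ to $i + m - 1$.
Let $N_m$ be the number of square windows $x_m(i,j)$ within $u$ that can be generated for both $m$ and $m + 1$; it is calculated as $N_m = (W - m) \times (H - m)$. Considering a similarity threshold $r$, SampEn2D is defined as follows:

$$\mathrm{SampEn2D}(u, m, r) = -\ln \frac{U^{m+1}(r)}{U^{m}(r)}$$

where

$$U^{m}(r) = \frac{1}{N_m} \sum_{i=1}^{H-m} \sum_{j=1}^{W-m} U^{m}_{i,j}(r), \qquad U^{m+1}(r) = \frac{1}{N_m} \sum_{i=1}^{H-m} \sum_{j=1}^{W-m} U^{m+1}_{i,j}(r)$$

and

$$U^{m}_{i,j}(r) = \frac{\#\{x_m(a,b) : d[x_m(i,j), x_m(a,b)] \le r\}}{N_m - 1}, \qquad U^{m+1}_{i,j}(r) = \frac{\#\{x_{m+1}(a,b) : d[x_{m+1}(i,j), x_{m+1}(a,b)] \le r\}}{N_m - 1}$$

where $a$ ranges from 1 to $H - m$, $b$ ranges from 1 to $W - m$, $(a, b) \ne (i, j)$ to exclude self-matches, and $r$ can be defined as a fraction of the image standard deviation. The distance function, $d$, is defined by the following:

$$d[x_m(i,j), x_m(a,b)] = \max_{k,l} \left| u(i+k, j+l) - u(a+k, b+l) \right|$$

where $k$ and $l$ range from 0 to $m - 1$.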
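The definition above can be illustrated with a brute-force NumPy sketch. It is a reading of the description in this section (windows anchored at the $(H - m) \times (W - m)$ positions shared by $m$ and $m + 1$, maximum absolute difference as distance, self-matches excluded), not the reference implementation of [24]; the names `sampen2d` and `_match_fraction` are hypothetical, and the quadratic cost makes it suitable only for small images.

```python
import numpy as np

def _match_fraction(u, side, rows, cols, r):
    """Fraction of ordered pairs of distinct windows whose max distance is <= r."""
    # One flattened window per anchor (i, j), with i < rows and j < cols
    x = np.stack([u[i:i + side, j:j + side].ravel()
                  for i in range(rows) for j in range(cols)])
    n = x.shape[0]  # assumes at least two windows
    matches = 0
    for k in range(n):
        d = np.max(np.abs(x - x[k]), axis=1)      # max over k, l of |difference|
        matches += np.count_nonzero(d <= r) - 1   # drop the self-match
    return matches / (n * (n - 1))

def sampen2d(u, m=2, r=None):
    u = np.asarray(u, dtype=float)
    H, W = u.shape
    if r is None:
        r = 0.2 * u.std()        # r as a fraction of the image standard deviation
    rows, cols = H - m, W - m    # same anchors for m and m + 1, so N_m windows each
    um = _match_fraction(u, m, rows, cols, r)
    um1 = _match_fraction(u, m + 1, rows, cols, r)
    if um == 0:
        return float("nan")      # no matches at all: ratio undefined
    if um1 == 0:
        return float("inf")      # no matches for m + 1: -ln(0)
    return float(-np.log(um1 / um))

noise = np.random.default_rng(0).integers(0, 256, size=(12, 12))
print(sampen2d(noise, m=2))      # small noisy images tend to give inf or nan
```

The explicit `nan` and `inf` branches make visible the failure mode discussed in the next section: when no pair of windows matches, the estimate is undefined or infinite.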

Espinosa Entropy Proposal (EspEn) for 2D
On the one hand, although Shannon Entropy is often used as a measure of image irregularity [26–29], it does not take spatial information into account. Consequently, noisy images and grayscale images with different texture information can yield similar entropy values when their histograms are similar, so estimates of entropy as irregularity in an image may be wrong. Furthermore, Shannon Entropy does not compare pixels with one another; thereby, the user cannot tune the comparison to the characteristics and conditions of the image, which is an advantage of current entropy algorithms.
On the other hand, due to the popularity of SampEn in the analysis of time signals, the SampEn2D extension has gained some visibility for the analysis of irregularity in images (details in References [24,30–33]); some new methods have incorporated the calculation of SampEn2D into their algorithms, generating interesting alternatives, such as multiscale entropy (MSE2D) and its variant ModMSE2D [34]. However, extending the SampEn1D method to 2D data or images calls for additional considerations, such as the number of points taken as a pattern for comparison: in SampEn1D, typically m = 2 or m = 3 points, which is fewer than the number of points (pixels) taken as a pattern in SampEn2D. Some researchers have analyzed m = 1, m = 2, and m = 3, representing a square window of m × m pixels; for instance, m = 3 in SampEn2D means that 9 pixels are taken [24]. When a pattern of 9 pixels is compared with another set of 9 pixels within the image, 1 pixel may differ while the remaining 8 are the same; in this example, the similarity is 89%, which the user could accept as a match. SampEn2D is too strict in this comparison: if at least 1 pixel differs by more than r between the pattern and the window of the same dimensions, there is no similarity, and the situation is even more critical with m + 1. This leads to a vector $U^{m}_{i,j}(r)$ of zeros, and therefore the final estimate of SampEn2D is infinite (Inf) or undefined (NaN), as shown by Silva et al. [24] when they evaluated SampEn2D for m = 3 on noisy images. They also refer to the role of r (tolerance threshold) and the different effects produced by varying r for 1D signals and 2D data, effects that could be associated with problems in estimating the entropy of an image.
In addition, r is linked to the standard deviation (std) of the image. In image processing, pixel values are integers from 0 to 255, so for very low r factors or low standard deviations (for example, an image with a single gray level) the tolerance could be close to zero. Consequently, a tolerance threshold of zero or a very small value restricts the comparison distance to a few gray levels; an extreme case is a tolerance threshold lower than 1, in which case the vector $U^{m}_{i,j}(r)$ would not exist.
At this point, we propose a simplified entropy estimator, called EspEn (Espinosa Entropy), which combines the relevant aspects of Shannon Entropy and SampEn2D: the comparison of a pattern with the remaining pixels grouped in windows of the same dimensions, as proposed in SampEn2D, and the simplicity of calculating a probability of occurrence, as in Shannon Entropy. In this way, we tried to overcome the weaknesses of each algorithm when quantifying the irregularity of an image.

EspEn Algorithm for Two Dimensions
EspEn is an estimator of the irregularity of an image that considers the probability of occurrence of a set of samples, with dimension $m^2$, that are similar within a similarity threshold $r$, with an acceptable percentage in the number of similar samples. The EspEn algorithm, similar to SampEn2D, considers an image $u(i,j)$ with width $W$ and height $H$. Let $x_m(i,j)$ be the set of pixels that form a square window, with column range $j$ to $j + m - 1$ and row range $i$ to $i + m - 1$. The window construction is $x_m(i,j) = [u(i,j), u(i,j+1), \ldots, u(i,j+m-1), u(i+1,j), u(i+1,j+1), \ldots, u(i+1,j+m-1), \ldots, u(i+m-1,j+m-1)]$. Then, EspEn is defined by the following:

$$\mathrm{EspEn}(u, m, r) = -\ln(D_m) \qquad (8)$$

where $D_m$ is the probability of occurrence of similar windows, $\rho$ is fixed and represents the percentage of similarity acceptable for the study, expressed in decimals, and the windows compared satisfy $(a, b) \ne (i, j)$ to exclude self-matches. The distance function, $d$, for EspEn is defined by the following:

$$d[x_m(i,j), x_m(a,b)] = \left| u(i+k, j+l) - u(a+k, b+l) \right| \qquad (12)$$

where $k$ and $l$ vary from 0 to $m - 1$. Note that, in Equation (12), the maximum value of the distances is not estimated; instead, each of the distances calculated between the pattern and the set of pixels of the same dimensions is evaluated. In Figure 1a, an example of square windows is shown; $x_m(i,j)$ and $x_m(a,b)$ with m = 3 have different gray values. In Figure 1b, we see the distances between $x_m(i,j)$ and $x_m(a,b)$. Moreover, $\phi(r)$ is calculated by counting the distances within the threshold of similarity r; in the example, there are 8 distances ≤ r. This result is divided by the total number of possibilities ($m^2$ = 9); the resulting 0.88 is compared to ρ to establish the acceptable similarity between the windows, given by the observer.

The similarity threshold parameter (r) in EspEn should be fixed, considering the standard deviation of the image but not linked to it. We consider this parameter as an
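The window comparison described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the names `window_similarity` and `espen` are hypothetical, windows are anchored at all $(H - m + 1) \times (W - m + 1)$ positions (the count mentioned in the computational cost discussion), and $D_m$ is taken to be the fraction of ordered pairs of distinct windows whose $\phi(r)$ reaches ρ, which is an assumed normalization.

```python
import numpy as np

def window_similarity(xa, xb, r):
    """phi(r): fraction of the m*m pixel-wise distances that are <= r."""
    d = np.abs(np.asarray(xa, dtype=float) - np.asarray(xb, dtype=float))
    return np.count_nonzero(d <= r) / d.size

def espen(u, m=3, r=20, rho=0.7):
    """Brute-force EspEn sketch; assumes 0 < rho <= 1 and at least two windows."""
    u = np.asarray(u, dtype=float)
    H, W = u.shape
    # One flattened m x m window per anchor position
    x = np.stack([u[i:i + m, j:j + m].ravel()
                  for i in range(H - m + 1) for j in range(W - m + 1)])
    n = x.shape[0]
    similar = 0
    for k in range(n):
        # Every pixel-wise distance is checked against r (no maximum is taken)
        phi = np.count_nonzero(np.abs(x - x[k]) <= r, axis=1) / (m * m)
        similar += np.count_nonzero(phi >= rho) - 1   # self-match has phi = 1
    d_m = similar / (n * (n - 1))                     # assumed normalization
    return float(-np.log(d_m)) if d_m > 0 else float("inf")

# Example in the spirit of Figure 1: 8 of 9 distances fall within r
pattern = np.zeros(9)
window = np.zeros(9)
window[4] = 100
phi = window_similarity(pattern, window, r=20)
print(phi >= 0.7)   # the pair counts as similar for rho = 0.7
```

With ρ = 1, every pixel-wise distance must be within r, which corresponds to the strict comparison attributed to SampEn2D in the previous section; lowering ρ relaxes it.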

Set of Images
This section describes the characteristics of the images used to analyze the performance of EspEn in evaluating the irregularity of an image. Synthetic images with repeating (predictable) and clearly identifiable patterns (shapes) were created. These images were progressively contaminated with uniform white noise, similar to the MIX2D process in Reference [24], defined as follows:

$$\mathrm{MIX}(p)_{ij} = (1 - p)\,X_{ij} + p\,Y_{ij}$$

where $X_{ij}$ represents the synthetic image, $Y_{ij}$ is the noise image with uniformly distributed random values from 0 to 255 at each pixel, and $p$ represents the degree of contamination: p = 0 (without contamination) and p = 1 (only noise).

Initially, four $X_{ij}$ images were generated with class uint8 and dimensions of 500 × 500 pixels. Figure 2a is based on sinusoidal functions created with the same process described in Reference [24], where $X_{ij} = \sin(2\pi i/48) + \sin(2\pi j/48)$. Figure 2b is a checkerboard image with 50 black squares (pixels of value 0) and 50 white squares (pixels of value 255) interspersed; the box in the upper left corner is black, and the size of each box on the board is 50 × 50 pixels.

Figure 2c represents vertical stripes, which were created by traversing the matrix in blocks of 50 columns, taking all the rows, and setting the first 25 columns of each block to 255 (white) and the remaining 25 columns to 0 (black). This process was repeated until the full dimensions of the image were reached, thus obtaining 10 white stripes and 10 black stripes interspersed (each stripe had 500 rows × 25 columns), starting with a white stripe.

Figure 2d represents horizontal stripes, which were created by traversing the matrix in blocks of 50 rows, selecting all the columns, and setting the first 25 rows of each block to 255 (white) and the remaining 25 rows to 0 (black). This process was repeated until the full dimensions of the image were reached, thus obtaining 10 white stripes and 10 black stripes interspersed (each stripe had 25 rows × 500 columns), starting with a white stripe.

In each case, the complement images were also considered to expand the set, giving a total of 8 synthetic images.
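The image set described above could be generated along the following lines. This is a sketch under assumptions, not the authors' script: the function names are hypothetical, the sinusoidal image is rescaled to 0–255 (the rescaling is not stated in the text), and the contamination is modeled as the pixel-wise blend (1 − p)·X + p·Y with Y drawn uniformly from 0 to 255.

```python
import numpy as np

def sinusoid(size=500, period=48):
    t = np.sin(2 * np.pi * np.arange(1, size + 1) / period)
    x = t[:, None] + t[None, :]                  # sin(2*pi*i/48) + sin(2*pi*j/48)
    x = (x - x.min()) / (x.max() - x.min())      # assumed rescaling to [0, 1]
    return np.round(255 * x).astype(np.uint8)

def checkerboard(size=500, box=50):
    idx = np.arange(size) // box
    # Upper-left box is black (0); neighbouring boxes alternate with white (255)
    return (((idx[:, None] + idx[None, :]) % 2) * 255).astype(np.uint8)

def vertical_stripes(size=500, width=25):
    row = np.where((np.arange(size) // width) % 2 == 0, 255, 0)  # starts white
    return np.tile(row, (size, 1)).astype(np.uint8)

def horizontal_stripes(size=500, width=25):
    return vertical_stripes(size, width).T.copy()

def complement(x):
    return (255 - x.astype(np.int64)).astype(np.uint8)

def mix(x, p, rng):
    """Assumed contamination model: (1 - p) * X + p * Y, Y uniform on 0..255."""
    y = rng.integers(0, 256, size=x.shape)
    return np.round((1 - p) * x.astype(float) + p * y).astype(np.uint8)

rng = np.random.default_rng(0)
base = [sinusoid(), checkerboard(), vertical_stripes(), horizontal_stripes()]
images = base + [complement(x) for x in base]    # 8 synthetic images
contaminated = [mix(images[1], p, rng) for p in (0, 0.33, 0.66, 1)]
```

Passing an explicit random generator keeps the noise reproducible across the MIX levels being compared.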

Experiment and Parameters
Three experiments were performed: The first consisted of implementing 3 entropy algorithms (Shannon, SampEn2D, and EspEn) on synthetic images (MIX(0), MIX(0.33), MIX(0.66), and MIX(1)) with different sizes. The only input argument to the Shannon algorithm is the image. The parameters used in the SampEn2D algorithm were the image (u), the length of the square window (m = 2), and the tolerance factor or similarity threshold (r = 0.2 × standard deviation of each image). The parameters used in the EspEn algorithm were the image (u), length of the square window (m = 3), percentage of similarity between windows (ρ = 0.7), and the similarity threshold (r = 20).

The second numerical experiment consisted of applying the EspEn algorithm to synthetic images MIX(0), MIX(0.33), MIX(0.66), and MIX(1) with a size of 100 × 100 pixels while varying the values of the parameters m, r, and ρ, to observe their influence on the response of EspEn.
The third experiment consisted of applying the EspEn algorithm with m = 3, ρ = 0.7, and r = 20 to 112 images from the Normalized Brodatz Texture database [35], subsampled with s = 6 to obtain images of 107 × 107 pixels.
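As a rough check of the image size quoted above, the snippet below keeps every sixth pixel in each direction. Two points are assumptions made for the example: that "s = 6" denotes plain decimation, and that the original NBT images are 640 × 640 pixels (the text does not state the original size); the function name `subsample` is hypothetical.

```python
import numpy as np

def subsample(image: np.ndarray, s: int = 6) -> np.ndarray:
    """Keep every s-th row and column (plain decimation, no smoothing)."""
    return image[::s, ::s]

nbt_like = np.zeros((640, 640), dtype=np.uint8)  # assumed original NBT size
print(subsample(nbt_like).shape)                 # (107, 107)
```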

EspEn (m, r, and ρ) Applied to Images from Normalized Brodatz's Textures Database
The normalized Brodatz texture database (NBT) contains images with different shapes and textures, where the spectral informational background of the grayscale Brodatz textures was removed so that the discrimination of the texture does not depend on the background information, using first-order statistics [35]. The set of images has been used in investigations related to the analysis of texture and irregularity; some investigations in which this database has been used are References [34,36,37].
We applied the EspEn algorithm with parameters m = 3, ρ = 0.7, and r = 20 to images from the NBT database; the images were subsampled with s = 6 to obtain dimensions of 107 × 107 pixels, and 112 images were processed.
The algorithm that used the least time was Shannon, and the one that took the most time was EspEn, about 87 times more than SampEn2D. The images of 500 × 500 pixels could not be evaluated because the time used by EspEn exceeded 2 days of processing. The Shannon and SampEn2D algorithms spent more time processing regular images (MIX(0)). SampEn2D and EspEn took more computation time as the images increased in size; this time apparently increased exponentially as a function of size. Similar results were reported by da Silva et al. in 2018, when they used the multiscale entropy algorithm (MSE) adapted for two-dimensional data processing, which used SampEn2D as the basis for calculating irregularity [34].

Computational Cost
Figure 4 shows the entropy values of each algorithm applied to images with different sizes and different degrees of contamination with uniformly distributed noise. Shannon Entropy (Figure 4a) can process regular images (MIX(0)) and irregular images (MIX(1)). Nevertheless, the entropy values for MIX(0.33) and MIX(0.66) are very close to the maximum entropy value. Consequently, it is difficult to distinguish between images with different degrees of contamination. The resulting values do not vary in a relevant way with image size, which gives the measurement a certain stability.
The EspEn algorithm takes longer because it compares each possible pattern, of which there are (H − m + 1) × (W − m + 1), with the square window of length m anchored at each pixel of the image; for example, if a 500 × 500 pixel image is processed with m = 3, 6.1506 × 10^10 comparison procedures are performed.
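The figure quoted above can be reproduced with a short calculation. The sketch assumes that every window is compared with every window anchor, so the count is the square of the number of windows, and that the example uses m = 3; the function names are hypothetical.

```python
def window_count(height: int, width: int, m: int) -> int:
    """Number of m x m windows that fit in a height x width image."""
    return (height - m + 1) * (width - m + 1)

def comparison_count(height: int, width: int, m: int) -> int:
    """Window-to-window comparisons when every pattern is checked against every anchor."""
    n = window_count(height, width, m)
    return n * n

print(f"{comparison_count(500, 500, 3):.4e}")   # 6.1506e+10
print(comparison_count(500, 500, 3) // comparison_count(100, 100, 3))
```

The second line compares 500 × 500 with 100 × 100 images: a fivefold increase in side length multiplies the number of comparisons by several hundred, since the count grows with the square of the number of pixels.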

Shannon, SampEn2D, and EspEn Results (All Images)
These extensive comparisons are a problem for many algorithms to quantify the irregularity of 2D data. This problem is so evident that the EntropyHub: Matlab platform (https://www.entropyhub.xyz/matlab/EHmatlab.html, accessed on 12 September 2021), which contains a repository of entropy algorithms, indicates in the documentation of some algorithms (for example, SampEn2D) a warning message related to the size of the images: " . . . By default, 'SampEn2D' only allows arrays with a maximum size of 128 × 128 to avoid RAM overhead . . . ** WARNING: unlocking the allowed array size can cause memory errors that could cause Matlab to crash **".
The reduction of time in the 2D data processing, to quantify the irregularity, is a current problem that presents an interesting challenge for future proposals. Currently, the delay for processing in existing algorithms (including EspEn) is an impediment for real-time applications.

EspEn Validation
This section presents the impact on the entropy measurements obtained with EspEn of varying the parameters m, r, and ρ when the algorithm was applied to 100 × 100 pixel images contaminated with different degrees of white noise (MIX(p)). Figure 5a shows the behavior of the entropy measurements when m changes. A low value of m causes low entropy values for all MIX groups, which makes the groups difficult to differentiate. Increasing m separates the entropy values of the groups of MIX images, which is desirable for the classification between regular and irregular images. Table 1 shows that the most significant differences between MIX groups were obtained with m = 3 and m = 4.
Figure 4b shows the measurements of SampEn2D(u, m, r), which exhibit a problem already reported by da Silva et al. [24], related to the length of the square window (m = 2 or higher): in this case, the entropy values obtained were "Inf" for the MIX(1) images smaller than 250 × 250 pixels and for the MIX(0.66) images of 50 × 50 pixels. Figure 4c shows the EspEn results, where there is a distinction between images contaminated with different degrees of noise. The lowest entropy value was for regular images (MIX(0)), the highest value was for irregular images (MIX(1)), and clearly differentiated intermediate values were obtained for MIX(0.33) and MIX(0.66), with MIX(0.66) > MIX(0.33). The entropy measurements varied little across the sizes evaluated.

Figure 5b shows the effect of the variation of r on the entropy measurements, using EspEn. When r is low (a distance of five gray levels on the graph), there is a large separation between regular and irregular images, with lower entropy values for MIX(0) and higher entropy values for MIX(1) and MIX(0.66). As r increases, the entropy values decrease for images with some degree of contamination by noise. When r is near or greater than the standard deviation of the image, it is difficult to differentiate between MIX groups. At the other extreme, when r is very small (e.g., r ≤ 1), differentiating between images contaminated with a high degree of noise (MIX(0.66) and MIX(1)) is difficult. Table 2 shows that the most significant differences between MIX groups were obtained with 15 < r < 35.
Figure 5c shows the behavior of the entropy values as ρ changes for the MIX images. A low value of ρ causes low entropy values for all the MIX groups, which makes the groups difficult to differentiate. Increasing ρ separates the entropy values of the groups of MIX images, as is desirable for the classification between regular and irregular images.
Table 3 shows that, when ρ ≥ 0.7, the differentiation between the MIX groups improves.
Although NBT images have been used in investigations of texture and irregularity analysis of two-dimensional data, there is no validated and accepted classification regarding the regularity or irregularity of each image in the database. Table 4 shows the entropy values obtained by applying EspEn to NBT images, ordered from the lowest entropy value (regularity) to the highest entropy value (irregularity). This information can be a reference for new algorithm proposals to quantify the irregularity of an image. Figure 6 shows 35 NBT images as an example, distributed in five columns and seven rows; each row represents a range of entropy values, from greater regularity (first rows) to greater irregularity (last rows). For a better understanding, Table 4 uses color coding to mark the images and entropy values shown in Figure 6. Column 6 of Figure 6 shows the EspEn entropy range obtained from the images of the corresponding row.

Summary Characteristics of EspEn (u, m, r, ρ)
EspEn (u, m, r, ρ) is an innovative algorithm that allows users to quantify the irregularity present in an image. The following parameter recommendations apply. For the image (u), a small size is recommended because of the computational cost, so large images (≥250 × 250 pixels) should be subsampled. The value of m is recommended to be 2 ≤ m ≤ 4, with a typical value of m = 3. The value of r is recommended to be 15 ≤ r ≤ 25, with a typical value of r = 20 for std = ±80; avoid r ≥ std and r ≤ 1. The value of ρ is recommended to be 0.7 ≤ ρ ≤ 0.9, with a typical value of ρ = 0.7. If m ≥ 5 is used, decrease the value of ρ if a weaker similarity between windows is acceptable.
Figure 6. Database of Normalized Brodatz textures. The EspEn algorithm was applied to the images from https://multibandtexture.recherche.usherbrooke.ca/normalized_brodatz_more.html (accessed on 23 September 2021), and 5 images were taken as an example. The last column specifies the entropy value range obtained with EspEn for the images in the corresponding row. Each row can be interpreted as a degree of irregularity.

Conclusions
Entropy algorithms applied to images to estimate irregularity provide relevant information that can be used in texture analysis, classification, or segmentation processes. These algorithms have been useful in various fields of industry, agronomy, and biomedicine. We have proposed a new algorithm to quantify the irregularity of an image, called EspEn. The measurements provided by EspEn are consistent and robust for images contaminated with different degrees of noise. The following characteristics of EspEn stand out: (i) The entropy measurements show little variation with the change of the image size, overcoming the limitations that SampEn2D presents for small image sizes. (ii) The percentage of acceptable similarity (ρ) lets the researcher decide how many pixel distances below r are required to accept a similarity between the pattern and the window. EspEn is therefore more flexible than SampEn2D in the comparison between pattern and window, whereas the rigidity of SampEn2D generates results that cannot be manipulated or interpreted. (iii) The similarity threshold takes into account the standard deviation but does not depend on it, which allows the limits of the comparison to be controlled and the entropy to be quantified without bias. (iv) EspEn is a simplified algorithm compared to SampEn2D, because it does not need to evaluate the algorithm for m + 1, and it is more robust than Shannon Entropy, because it takes into account spatial information from the image.
The most notable disadvantage of EspEn is the high computational cost for large images, which can be overcome by subsampling the image, because the entropy value does not differ greatly with the change in size.