Recovery of Natural Scenery Image by Content Using Wiener-Granger Causality: A Self-Organizing Methodology

Abstract: One of the most important applications of data science and data mining is organizing, classifying, and retrieving digital images on the Internet. Current research focuses on developing methods for content-based exploration of natural scenery images. In this research paper, a self-organizing method for natural scene images using Wiener-Granger causality theory is proposed. Features are extracted at random points within the image and organized in time-series form, and Wiener-Granger causality analysis is then carried out on these series. Once the causal relationships are obtained, the k-means algorithm is applied to achieve the self-organization of these attributes. For classification, the k-NN distance classification algorithm is used to find the most similar images that share the causal relationships between the elements of the scenes. The proposed methodology is validated on three public image databases, obtaining 100% recovery results.

Author Contributions: ... and C.B.-A.; validation, C.B.-A.; conceptualization, C.A.-C. and C.B.-A.; formal analysis, C.B.-A., E.R.-M. and A.Z.-L.; methodology, C.B.-A. and C.A.-C.; C.A.-C. supervised the overall research work. All authors contributed to the discussion and conclusion of this research. All authors have read and agreed to the published version of the manuscript.


Introduction
With the increasing use of the Internet and digital devices, content-based image retrieval (CBIR) has grown and been applied in fields such as artificial vision and artificial intelligence [1]. Recent CBIR approaches have reported improvements, and several effective algorithms now allow images to be searched and retrieved by content from an input image [2][3][4][5][6]. Application areas include fashion, people identification, e-commerce product retrieval, remote sensing image retrieval, brand image retrieval, and natural scene retrieval, among others [7,8].
In computer vision (or artificial vision) research, the current aim is to emulate the human visual system as closely as possible. The objective of artificial vision is therefore to provide computers with three-dimensional human visual capabilities, generally starting from 2D images [9].
Since no single algorithm can efficiently recognize every object or scene, there is room for computer vision applications that combine areas such as digital image processing, pattern recognition, and machine learning.
Due to the exponential increase in natural scene images on the web, one task of automated image recognition systems is to successfully classify and identify such images (a scene is said to be natural if the image shows no human intervention or alteration). It is estimated that more than half of the information on the Internet consists of images, of which 85% were taken with mobile devices, with an estimated total of five billion images up to 2018 [10]. To use images efficiently, a CBIR system is needed to help users find relevant images through self-contained features based on human visual perception.
In this work, a natural scene retrieval system is developed by applying the Wiener-Granger Causality theory (WGC) [11] as a tool to analyze images through self-organized information. The causal relationships between the local textures contained in an image were identified, which helps in characterizing a descriptive pattern of a set of natural scenes within an image data set.
The main stages of the developed system (shown in Figure 1) are as follows:
1. Image reading: The images in the dataset are read, and the color space is changed from Red-Green-Blue (RGB) to Hue-Saturation-Intensity (HSI).
2. Feature extraction: Statistical CBIR features are extracted at 300 randomly generated image points.
3. Time series conformation: Texture features are organized as a time series for each image.
4. Causality analysis: WGC analysis is applied to calculate the causal relationship matrix between different textures.
5. Classification application: The k-nearest neighbors (k-NN) classification algorithm is used to find the features closest or most similar to the searched one.
The work proposes a causality analysis of the natural scene classes based on an autogenerated texture dictionary and the WGC analysis of the CBIR features [12,13] to provide the characterization of the data set. It is also possible to search for a type of constitutive element of the scenes, i.e., water, clouds, forests, etc. The proposed system recovers all the scenes that contain a particular element, even when they are classified into different natural scenes.
The proposed methodology was tested on three natural scene databases: Vogel and Schiele (V_S) [14], Oliva and Torralba (O_T) [15], and Shullani (Sh) [16]. Given the 100% recovery obtained on these databases, the proposed method could be implemented in an autonomous natural scene recognition system mounted on a car or drone.
The rest of the paper is organized as follows. Section 2 reviews the state of the art. Section 3 presents the theoretical support of the WGC model to be applied. Section 4 presents the methodology, and Section 5 the results. Finally, the conclusions and future work are presented in Section 6.

State of the Art
Since the beginning of the Internet, the need to search and retrieve information has increased. Initially, the information available on the Internet was mostly text [17], but with technological progress it has expanded to multimedia data, i.e., text, voice, image, video, and graphics, each of which requires its own search and retrieval methods. In this research work, search and retrieval of images of natural scenes is carried out using Wiener-Granger causality.
Nowadays, people need to search for complete scenes or for particular elements within scenes, in the context of large and complex image databases. Many important search and retrieval methodologies and algorithms have therefore been presented [18]. Since the existing algorithms remain limited in performance, image search and retrieval is still an open research problem.
The classification of CBIR systems is shown in Figure 2. Table 1 lists some application fields of CBIR methodologies, e.g., medical, biology, academic, design, video/image processing, and COVID-19, among others.
This research work adopts conventional methodologies together with attributes of the Wiener-Granger causality theory, framed in a self-organizing system for natural scenes. The proposed methodology comprises a feature extraction stage at random points within the image, after which these features are organized in time-series form and the Wiener-Granger causality is estimated. Once the causal relationships are obtained, the k-means clustering algorithm is applied to achieve the self-organization of attributes. For classification, the k-NN distance classification algorithm is used to find the most similar images that share the causal relationships between the elements of the scenes. Our methodology is validated on three public image databases. The advantage of the proposed system is that it can search for a particular element in a scenery image, e.g., clouds, forest, or water. Thus, search and recovery in the proposed system may also be driven by a single component of a scene.

Table 1. Application fields of CBIR methodologies.

| Area | Application/Reference |
|---|---|
| Medical | Computer-aided diagnosis of mammography masses based on a supervised content-based image retrieval approach [4]. |
| | Relevance feedback for enhancing content based image retrieval and automatic prediction of semantic image features: Application to bone tumor radiography [3]. |
| Vegetable | Vegetable Image Retrieval with Fine-tuning VGG Model and Image Hash [32]. |
| COVID-19 | Deep metric learning-based image retrieval system for chest radiography and its clinical applications in COVID-19 [33]. |
| Academic/Educational | Design and implementation of CBIR system for academic/educational purposes [5]. |
| Video/Image processing | Comparative analysis on different degrees of jpeg compression used in CBIR systems [34]. |
| Design | Diagram Image Retrieval and Analysis: Challenges and Opportunities [35]. |
| Biology | Indexing and Image Search by the Content according to the Biological Base of the Cognitive Processing of Information using a Neural Sensor [36]. |
| Cloud Repositories | Practical Privacy-Preserving Content-Based Retrieval in Cloud Image Repositories [37]. |
| | Towards Privacy-Preserving Content-Based Image Retrieval in Cloud Computing [38]. |

Wiener-Granger Causality Analysis Theoretical Fundamentals
The Wiener-Granger theory of causality is used in different areas of knowledge. In neurology, for example, WGC theory is used [39] to examine brain areas and the causal relationships between them; the analysis has been performed using sensors [40,41] and on magnetic resonance images [42,43] to study causal relationships between brain areas that perform activities. Other fields where WGC theory has been applied are video processing for massive identification of people and vehicles [44][45][46] and the analysis of complex scenarios [47]. In this proposal, WGC theory is applied for the first time to the recovery of natural elements and scenes.
For the sake of brevity, and to avoid extensive mathematical models, the theory is presented for only two random processes; it extends to n processes. In our approach, a random process corresponds to a signal reading associated with a type of texture within a natural scene. Thus, for the presented analysis, each texture reading corresponds to a stochastic process represented by $C_i$, i being the i-th texture that has a stochastic behavior within a scene.

Stochastic Autoregressive Model
We consider the analysis of two signals, C1 and C2, which is easily extensible to n signal-textures. Each stationary process represents a texture of the scene and can be represented by an auto-regressive model, given by Equations (1) and (2), where $\eta_{1C1}$ and $\eta_{1C2}$ are Gaussian random noise terms with zero mean and unit standard deviation, and $K_{1C1}(k)$ and $K_{1C2}(k)$ are the coefficients of the regression models for the textures C1 and C2, respectively.
The joint auto-regressive model for the two textures is defined by Equations (3) and (4), where $\Sigma_{2C1}$ and $\Sigma_{2C2}$ are the variances of the residual terms $\eta_{2C1}$ and $\eta_{2C2}$, respectively. The variances and covariances of the residual terms $\eta_{2Ci}$ are then analyzed using the matrix form $\Sigma$ of Equation (5). Starting from the previous conditions, and using the concept of statistical independence between two random processes taken in pairs, causality can be defined over time. An example of the causality between C1 and C2 is the expression given by Equation (6).
Equation (6) is commonly known as time-domain causality. From this equation, if the random processes C1(t) and C2(t) are statistically independent, then $F_{C1,C2} = 0$; otherwise, there is causality from one to the other.
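For reference, Equations (1) to (8) cited in this section can be written in the standard bivariate form of the Wiener-Granger formulation, which is what the surrounding definitions describe. The following is a sketch under that assumption, not necessarily the authors' exact typesetting: the model order $m$, the cross-coefficient symbols $K_{2C1C2}$ and $K_{2C2C1}$, and the covariance symbol $\Upsilon$ are introduced here for illustration only.

```latex
\begin{align}
C1(t) &= \sum_{k=1}^{m} K_{1C1}(k)\,C1(t-k) + \eta_{1C1}(t),
  \qquad \operatorname{var}(\eta_{1C1}) = \Sigma_{1C1} \tag{1}\\
C2(t) &= \sum_{k=1}^{m} K_{1C2}(k)\,C2(t-k) + \eta_{1C2}(t),
  \qquad \operatorname{var}(\eta_{1C2}) = \Sigma_{1C2} \tag{2}\\
C1(t) &= \sum_{k=1}^{m} K_{2C1C1}(k)\,C1(t-k)
       + \sum_{k=1}^{m} K_{2C1C2}(k)\,C2(t-k) + \eta_{2C1}(t) \tag{3}\\
C2(t) &= \sum_{k=1}^{m} K_{2C2C1}(k)\,C1(t-k)
       + \sum_{k=1}^{m} K_{2C2C2}(k)\,C2(t-k) + \eta_{2C2}(t) \tag{4}\\
\Sigma &= \begin{pmatrix} \Sigma_{2C1} & \Upsilon_{12} \\ \Upsilon_{21} & \Sigma_{2C2} \end{pmatrix},
  \qquad \Upsilon_{12} = \Upsilon_{21} = \operatorname{cov}(\eta_{2C1}, \eta_{2C2}) \tag{5}\\
F_{C1,C2} &= \ln \frac{\Sigma_{1C1}\,\Sigma_{1C2}}{|\Sigma|} \tag{6}\\
F_{C2 \to C1} &= \ln \frac{\Sigma_{1C1}}{\Sigma_{2C1}} \tag{7}\\
F_{C1 \to C2} &= \ln \frac{\Sigma_{1C2}}{\Sigma_{2C2}} \tag{8}
\end{align}
```

As a consistency check on this form: when the two processes are independent, the cross terms in (3) and (4) vanish, so $\Sigma_{2C1} = \Sigma_{1C1}$, $\Sigma_{2C2} = \Sigma_{1C2}$ and $\Upsilon_{12} = 0$. Then $|\Sigma| = \Sigma_{1C1}\Sigma_{1C2}$ and all three measures are zero, in agreement with the text.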
In Equation (1), $\Sigma_{1C1}$ measures the precision of the auto-regressive model in predicting C1(t) from its past samples. In turn, $\Sigma_{2C1}$ in Equation (3) measures the precision in predicting C1(t) from the previous values of both C1(t) and C2(t). Returning to the case of two textures C1(t) and C2(t) and according to [11], if $\Sigma_{2C1} < \Sigma_{1C1}$, then C2(t) is said to have a causal influence on C1(t). This causality is defined by Equation (7): if $F_{C2 \to C1} = 0$, there is no causal influence of C2(t) on C1(t); any nonzero value indicates a causal influence. Likewise, the causal influence of C1(t) on C2(t) is established by Equation (8).
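The comparison of residual variances described above can be illustrated numerically. The sketch below is not the MVGC toolbox used later in the paper; it is a minimal least-squares estimate of the bivariate time-domain measure, assuming a fixed model order and the usual log-ratio form ln(restricted variance / full variance). The function names and the synthetic signals are hypothetical.

```python
import numpy as np


def lagged(series, order):
    """Columns series[t-1], ..., series[t-order] for t = order .. n-1."""
    n = len(series)
    return np.column_stack([series[order - k:n - k] for k in range(1, order + 1)])


def residual_variance(target, regressors):
    """Mean squared residual of the least-squares fit of target on regressors."""
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    residual = target - regressors @ coef
    return float(np.mean(residual ** 2))


def granger_f(c1, c2, order=2):
    """Estimate the influence of c2 on c1: ln(Sigma_1C1 / Sigma_2C1).

    Sigma_1C1 comes from predicting c1 with its own past only;
    Sigma_2C1 comes from predicting c1 with the past of both signals.
    ASSUMPTION: fixed model order, ordinary least squares, no significance test.
    """
    c1 = np.asarray(c1, dtype=float) - np.mean(c1)
    c2 = np.asarray(c2, dtype=float) - np.mean(c2)
    target = c1[order:]
    own_past = lagged(c1, order)
    joint_past = np.hstack([own_past, lagged(c2, order)])
    return float(np.log(residual_variance(target, own_past)
                        / residual_variance(target, joint_past)))


# Synthetic example: c1 is driven by the previous sample of c2, not vice versa.
rng = np.random.default_rng(0)
n = 2000
c2 = rng.standard_normal(n)
c1 = np.zeros(n)
c1[1:] = 0.9 * c2[:-1] + 0.1 * rng.standard_normal(n - 1)

print("C2 -> C1:", granger_f(c1, c2))  # clearly positive
print("C1 -> C2:", granger_f(c2, c1))  # close to zero
```

Because the restricted model is nested in the joint one, the estimate is never negative; in practice a statistical test decides whether a small positive value counts as zero.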

Methodology
The training stage of the proposed methodology is presented in Figure 3. It includes reading the image database, changing the color space, selecting random points, extracting attributes, auto-grouping, generating the time series, calculating causality, and classifying and recovering the k nearest images; finally, the most similar natural scenes are shown.
The main hypothesis of the proposed methodology is that an automatically generated texture dictionary can represent the elements contained in the scenes, such as water, foliage, clouds, rocks, and sand (see Figure 4). This dictionary depends on the self-organization of the information through the k-means clustering algorithm. Each block in the methodology is described in detail below.

1. DB: Contains the set of natural scene images.
2. Preprocessing: The image is loaded from the DB, the histogram is equalized in the three layers, and the RGB color space is changed to the HSI (Hue, Saturation, Intensity) color space, which provides the texture-related information.
3. Random points seeding: 300 uniformly distributed random points are seeded inside the image.
4. Feature extraction: This step is carried out in two main parts; to obtain several samples of the image, the process is repeated r times.
• Neighborhood generation: At each random point, a window of p × p pixels is created with the interest point in the upper right corner, as shown in Figure 5, where p < image_rows and p < image_columns.
• CBIR feature extraction: In each neighborhood, three CBIR features (mean, standard deviation, and homogeneity) are extracted in each of the three layers of the HSI color space. This yields a 1 × 9 CBIR feature vector per point and, for an image, a matrix $F_{CBIR}$ of size $N_P$ × 9, where $N_P$ is the number of random points and $F_{CBIR}$ are the extracted CBIR features.
Figure 5. Image with random points, each one generating a neighborhood of 10 × 10 pixels.
5. Grouping CBIR textures: Once the CBIR features of each image in the database are extracted, an $F_{CBIR}$ matrix of size (r × $N_I$ × $N_P$) × 9 is generated, where r is the number of repetitions, $N_I$ is the number of images in the database, $N_P$ is the number of random points, and $F_{CBIR}$ are the CBIR features at each point for each HSI color layer. By means of the k-means algorithm, the CBIR features are grouped into k clusters, which constitute the k most representative textures of the database and generate a $CM_k$ matrix.
6. Time series generation: Each item of the $F_{CBIR}$ matrix from the previous step is compared with the entries of the $CM_k$ matrix of the automatic dictionary to construct a discrete signal as a time series $TS$ of size k × r × $N_I$, where k is the number of automatic textures of the k-means algorithm, r is the number of repetitions of the experiment, and $N_I$ is the number of images in the database.
7. Wiener-Granger causality analysis: An entry in the $TS$ matrix has a size of k × r for each image in the database. This entry feeds the WGC analysis, as shown in Figure 6. The causality analysis was calculated with the MVGC causality toolbox [48]. Once the causality analysis has been performed for each image in the database, a causality relationship matrix $\Lambda_I$ of size k × k is obtained. The element $F_{C_i,C_j}$ represents the causal relationship from scene component $C_i$ towards $C_j$, ∀ i, j ∈ [1...k] (see Equation (9)). If $F_{C_i,C_j} = 0$, there is no causal relationship i → j; otherwise, there is a causal relationship between them. In Equation (9), the constant $ALL_I$ is defined by Equation (10). The causality matrices $\Lambda_I$ are converted into vectors by concatenating their rows; in the same step, the elements of the main diagonal are deleted because there is no causal relationship of a variable with itself. Each vector is now a representative pattern of an image in the database and is stored in an array named Θ, whose size is determined by $N_I$, the number of images, and k, the number of automatic textures in the dictionary.
8. Finally, the Θ matrix is grouped again using the k-means algorithm to create a set of classes, each of which represents its causality patterns by an average value. The number of clusters is set to k = $N_c$, so that $N_c$ classes are obtained within the patterns in the Θ array. Therefore, the generated class array has a size of $N_c$ × ((r × r) − r) elements.
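The feature extraction, dictionary, and time-series steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the input is taken to be an HSI image already scaled to [0, 1], homogeneity is assumed to be a GLCM-style measure over horizontally adjacent quantized levels, each time-series sample is assumed to be the count of points assigned to each dictionary texture in one repetition, and the small k-means routine stands in for any standard implementation. All function names are hypothetical.

```python
import numpy as np


def window_features(hsi, row, col, p=10, levels=16):
    """1 x 9 vector: (mean, std, homogeneity) for each of the H, S, I layers.

    (row, col) is the upper-right corner of the p x p neighborhood.
    """
    window = hsi[row:row + p, col - p + 1:col + 1, :]
    feats = []
    for layer in range(3):
        w = window[:, :, layer]
        # ASSUMPTION: homogeneity = mean of 1 / (1 + |difference|) between
        # horizontally adjacent quantized levels (a GLCM-style definition).
        q = np.floor(w * (levels - 1))
        homogeneity = np.mean(1.0 / (1.0 + np.abs(q[:, 1:] - q[:, :-1])))
        feats.extend([w.mean(), w.std(), homogeneity])
    return np.array(feats)


def image_features(hsi, n_points=300, p=10, rng=None):
    """Seed n_points uniform random points; return the N_P x 9 feature matrix."""
    rng = np.random.default_rng() if rng is None else rng
    n_rows, n_cols = hsi.shape[:2]
    rows = rng.integers(0, n_rows - p + 1, size=n_points)  # window fits below
    cols = rng.integers(p - 1, n_cols, size=n_points)      # window fits to the left
    return np.array([window_features(hsi, r, c, p) for r, c in zip(rows, cols)])


def nearest_center(feats, centers):
    """Index of the closest dictionary texture (Euclidean) for each feature row."""
    dist = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def kmeans(data, k, n_iter=50, rng=None):
    """Plain Lloyd iterations; returns the k x 9 texture dictionary."""
    rng = np.random.default_rng() if rng is None else rng
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_center(data, centers)
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers


def texture_time_series(repetitions, centers):
    """k x r matrix for one image: texture counts per repetition (assumed reading)."""
    k = len(centers)
    ts = np.zeros((k, len(repetitions)))
    for t, feats in enumerate(repetitions):
        ts[:, t] = np.bincount(nearest_center(feats, centers), minlength=k)
    return ts


def off_diagonal_vector(matrix):
    """Concatenate the rows of a k x k matrix, dropping the main diagonal."""
    return matrix[~np.eye(matrix.shape[0], dtype=bool)]


# Toy run with random stand-ins for HSI images.
rng = np.random.default_rng(0)
images = [rng.random((64, 64, 3)) for _ in range(3)]
r, k = 4, 9
samples = [[image_features(img, n_points=50, rng=rng) for _ in range(r)]
           for img in images]
dictionary = kmeans(np.vstack([f for reps in samples for f in reps]), k, rng=rng)
series = [texture_time_series(reps, dictionary) for reps in samples]
print(series[0].shape)  # (9, 4)
```

Each k × r matrix in `series` would then be passed to the causality analysis, and `off_diagonal_vector` applied to the resulting k × k matrix gives one row of the pattern array.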

Experiments and Results
The evaluation of the proposal was carried out on a workstation with 19 dual-core processors. The processor is an Intel Xeon E5-2670 v3 CPU at 2.30 GHz, with 128 GB of RAM.
The methodology was tested on three natural scene databases. The distance metric used was Euclidean.
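Recovery with the Euclidean metric amounts to a nearest-neighbor lookup over the causality patterns. A minimal sketch follows, assuming each image is represented by one pattern vector; the function name `retrieve` and the random stand-in patterns are hypothetical.

```python
import numpy as np


def retrieve(query, patterns, k=5):
    """Return indices and distances of the k patterns closest to the query."""
    dist = np.linalg.norm(patterns - query, axis=1)  # Euclidean distance per row
    order = np.argsort(dist, kind="stable")[:k]
    return order, dist[order]


# Resubstitution-style query: the query pattern is itself in the database,
# so it is returned first with distance zero.
rng = np.random.default_rng(0)
patterns = rng.random((20, 72))  # 20 stand-in images, 9 * 9 - 9 = 72 values each
indices, distances = retrieve(patterns[3], patterns, k=5)
print(indices[0], distances[0])
```

This also shows why the resubstitution results are the easier case: the query is always its own nearest neighbor, whereas in cross-validation the query is absent from the database.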

Recovery Results
The first two image databases mentioned above (V_S and O_T) were concatenated to obtain a single broad image base; the second image base is Shullani (Sh). The natural scenes are forests, skies, coasts, mountains, fields, and rivers. The results were obtained with two performance evaluation methods: (i) resubstitution (see Figure 7a) and (ii) cross-validation with a 70%/30% split (see Figure 7b). For the number of centers in the k-means algorithm, a value of K = 9 was used, since it provided the best results.
As can be seen in Figure 7a,b, the query image is recovered along with the 5 most similar images. In the resubstitution method, the proposed methodology achieves a 100% result when the recovery of the most similar image is considered. In the cross-validation method, the five recovered images belong to the same type of natural scene.
Finally, the performance of the proposed system is quantified in Table 2 through a confusion matrix showing 100% recovery for each natural scene. Figure 8 shows the query results for an image reduced in size by 50% and 75%. As the confusion matrix in Table 3 shows, 100% recovery of the query image is achieved within the 5 closest images.

Rotation Recovery Results
An important test in this proposal is the rotational invariance of the natural scenes. Since images can be taken from a drone, light aircraft, helicopter, satellite, etc., they may be acquired rotated at a given angle. Using scenes from the proposed databases, Figure 9 shows that the query image is among the most similar images recovered. Although the five most similar images are sought, the first image recovered is the image sought. The confusion matrix for the rotation tests is given in Table 3.

Noise Recovery Results
The most challenging tests for the proposed methodology involve noisy images: because the CBIR methodology works directly with texture, noise directly affects CBIR performance. Figure 10 shows the recovery results for images contaminated with salt-and-pepper noise at three levels, 0.1%, 0.3%, and 0.5%; the searched image again appears within the 5 most similar images. The confusion matrix for the noise tests is given in Table 3.
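For readers who want to reproduce this kind of test, salt-and-pepper contamination at a given density can be sketched as below. This is an illustrative routine, assuming the stated percentages are the fraction of affected pixels and that pixel values lie in [0, 1]; the paper does not specify the tool used to add the noise.

```python
import numpy as np


def salt_and_pepper(image, density, rng=None):
    """Set roughly `density` of the pixels to 0 (pepper) or 1 (salt), half each."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    u = rng.random(image.shape[:2])    # one draw per pixel, shared by all layers
    noisy[u < density / 2] = 0.0       # pepper
    noisy[u > 1 - density / 2] = 1.0   # salt
    return noisy


rng = np.random.default_rng(0)
clean = np.full((200, 200, 3), 0.5)
for density in (0.001, 0.003, 0.005):  # 0.1%, 0.3%, 0.5%
    noisy = salt_and_pepper(clean, density, rng=rng)
    changed = np.mean(np.any(noisy != clean, axis=2))
    print(density, changed)
```

Even at these low densities, a corrupted pixel inside a 10 × 10 window shifts the window's standard deviation and homogeneity, which is consistent with the sensitivity to noise noted above.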

Recovery Results for Vision Database
The second set of natural scene images (Shullani [16]), taken with cellphone devices, was used to test the same methodology. Figure 11 shows some natural scenes such as forest, sky, coast, mountain, prairie, and river. The precision measure, defined in Equation (11), was used to determine the performance of the proposal:
$$P = \frac{TP}{TP + FP} \tag{11}$$
where P = precision, TP = true positives, and FP = false positives. As shown in Figures 12 and 13, the query image is recovered along with the 5 most similar images. The proposed methodology achieves a 100% result when the recovery of the most similar image is considered in the resubstitution method. Figure 13 gathers all the tests carried out on the referenced database, i.e., rotated, scaled, and noisy (salt-and-pepper) images. The proposed method achieves 100% for the rotated and scaled images, whereas performance falls to 50% for noisy images.

Recovery Results by Element
The proposed methodology also allows natural scenes to be searched by their constituent elements, that is, retrieving from the image base all the scenes that contain a particular (or common) element. Figure 14 shows some examples of natural scenes, i.e., fields, rivers, forests, beaches, and mountains. The lower part of Figure 14 shows some semantic concepts that constitute the scenes, i.e., clouds, sky, water, grass, mountains, etc. The experimental protocol to quantitatively evaluate scene recovery by element was as follows: one of the six elements to recover was defined, the base elements being cloud, sky, water, mountain, grass, and river. Subsequently, 100 scenes containing the searched item were recovered, and the number of those 100 that actually contained the searched element was counted. Table 4 gives the confusion matrix for the recovery of the four elements searched in the scenes. A good recovery is obtained, since the proposal is well suited to searching for one item at a time, for example water, thus returning natural scenes that contain water.

• Water element: Figure 15 shows an example of recovering natural scenes containing the element water. For the proposed methodology, the type of scene is not important; only scenes containing water (sea water or river water) are sought. Of the 20 images recovered, 14 contained water.
• Cloud element: Figure 16 shows the recovery results for the cloud element; for the sake of clarity, the first 20 images recovered are presented. In this example, 15 out of 20 images contained cloud.
• Foliage element: For the foliage element, Figure 17 shows that 15 out of 20 recovered scenes contained foliage.
• Rock element: Finally, for the rock element, Figure 18 provides an example with 20 recovered scenes, of which 17 contained rock.
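As a worked example, the precision P = TP / (TP + FP) can be applied to the counts quoted for the four figure examples above, where each query returned 20 scenes. These values describe only the 20-image examples shown in the figures, not the 100-scene protocol summarized in Table 4.

```python
# (scenes containing the element, scenes recovered) for each figure example
examples = {
    "water": (14, 20),
    "cloud": (15, 20),
    "foliage": (15, 20),
    "rock": (17, 20),
}


def precision(true_positives, recovered):
    """P = TP / (TP + FP), where TP + FP is the number of recovered scenes."""
    return true_positives / recovered


for element, (tp, total) in examples.items():
    print(f"{element}: {precision(tp, total):.0%}")
```

For these examples the precision ranges from 70% (water) to 85% (rock).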

Comparison with Related Works
The proposed method is compared with other competitive methods reported in the literature in Table 5, which presents the classification performance for the six natural scenes as well as the mean performance. The comparison also includes the best competitive methods using Convolutional Neural Networks [12,28-30]. Our proposal recognizes each of the natural images under consideration with 100% accuracy. As shown in Table 5, this result was obtained on the same database used in the other research works.

Conclusions and Future Work
This research paper proposes the use of Wiener-Granger causality theory together with CBIR self-organization analysis. The proposal is applied to image retrieval of six natural scenes: coast, forest, mountain, field, river/lake, and sky/cloud. With the proposed methodology, the causality matrix yielded a set of descriptors that represent a type of natural scene. Texture patterns could be defined from an automatic set of reference textures. From the self-organized attributes, it is now possible to classify any unknown natural scene.
With the proposed methodology, 100% image retrieval is achieved for the three data sets. The proposal is advantageous since no prior labeling or knowledge of the natural scenes content is required.
The proposed methodology achieves 100% recovery on the vision image database for rotated and scaled images; performance falls to 50% for noisy images.
Another important contribution of our proposal is the ability to recover natural scenes by a contained element, i.e., water, cloud, foliage, or rock. The percentage of recovery of these natural elements is above 70% using the proposal presented in this research paper.
The experimental results show that our proposal outperforms the most competitive methods, reported by Damodaran [28], Sharma [29], Damodaran [30], and Serrano-Talamantes [12], with an average recognition of 100% on the same image datasets.
Future work will pursue a parallel implementation of the entire methodology using CPU and GPU technology, which could make the scene recovery task more efficient, particularly in the image feature extraction stage. Parallel algorithms might also help to jointly analyze the textures of the image, seeking to characterize the image and its associations with the paradigm of visual understanding.

Conflicts of Interest:
The authors declare that there is no conflict of interest regarding the publication of this paper.