A Synthetic Fusion Rule for Salient Region Detection under the Framework of DS-Evidence Theory

Abstract: Saliency detection is one of the most valuable research topics in computer vision. It focuses on detecting the most significant objects or regions in images and reduces the computational cost of extracting the desired information from salient regions. Researchers have actively used local saliency detection and common pattern discovery schemes to address saliency detection problems. In this paper, we propose a bottom-up saliency fusion method that exploits the Dempster–Shafer (DS) theory of evidence. Firstly, we calculate saliency maps from different algorithms based on pixel-level, patch-level and region-level methods. Secondly, we fuse the pixels based on foreground and background information under the framework of DS-Evidence theory (evidence theory allows one to combine evidence from different sources and arrive at a degree of belief that takes all the available evidence into account). Fusing saliency information under DS-Evidence theory yields better saliency predictions. Experiments are conducted on four publicly available datasets (MSRA, ECSSD, DUT-OMRON and PASCAL-S). Our saliency detection method performs well and shows prominent results compared with state-of-the-art algorithms.


Introduction
Nowadays, images have become an important medium for information transmission, information retrieval and information security. Saliency information extraction from images is one of the most active research areas in the fields of computer and robot vision. Because of the sheer volume of data, it is difficult to process large numbers of images quickly and accurately. The task of image saliency detection is to determine the areas on which the human visual system focuses in images and videos. Several important aspects should be considered when extracting saliency information from images, such as motion, depth, color and localization factors. If the salient parts are extracted accurately, the computation time can be significantly reduced and fast processing of images can be realized. The idea of saliency detection was first introduced by Itti et al. [1]. At present, applications of image saliency detection are very successful in image transmission systems [2], image quality assessment [3], image compression [4], image segmentation [5], target recognition [6], image scaling [7], image retrieval [8], foot plant detection [9], and other areas as well.
Several saliency detection methods have been introduced over the last three decades to overcome these problems in computer vision; some concentrate solely on low-level visual cues. The rest of this paper is organized as follows. Pixel-level, patch-level and region-level saliency detection methods are described in Section 2 (Related Work). Section 3 (Proposed Algorithm) elaborates on the two major steps: first, a review of the DS-Evidence theory; second, a brief discussion of our proposed approach together with a summary of the complete algorithm. Illustrations and experimental results on four different datasets are presented in Section 4. Finally, conclusions are drawn in Section 5. Figure 1 shows the pipeline of the proposed (DS-OUR) saliency fusion framework. Figure 1. Pipeline of our proposed (DS-OUR) saliency fusion framework (FG denotes the foreground area containing salient pixels, BG represents the background area containing salient pixels and DS-Fusion represents Dempster–Shafer theory of evidence based fusion).

Related Work
Various saliency computational models are based on the structure of Feature Integration Theory [23]. In this theory, objects in a visual scene are assumed to attract attention through their shape, color, orientation, etc., and attention is focused by combining these features. Itti et al. [1] implemented these features for the first time to construct a saliency map. Some researchers used pixel-wise local priors for saliency detection. Lu et al. [24] calculated pixel-level saliency by integrating multi-scale reconstruction errors and refining them with a Gaussian model under a Bayesian framework. Seo et al. [25] calculated pixel-level saliency by comparing each pixel to its surroundings. Perazzi et al. [14] introduced a pixel-level saliency detection method in which an image is decomposed into compact and homogeneous elements to extract the necessary detail; the uniqueness and the spatial distribution of these elements are then measured to estimate saliency information.
Researchers have also used patch-level information to estimate the region of interest. Shi et al. [26] used image patches to tackle the saliency problem and introduced a new saliency benchmark dataset (ECSSD). They also proposed a multi-layer approach for saliency map construction: saliency information is extracted by a hierarchical model with three layers of different patch sizes, and these layers are then combined to produce the final saliency map. Tong et al. [27] used a multi-scale saliency detection method. They segmented the image into multi-scale super-pixels and then estimated three different cues (integrity, contrast and central bias) on each scale within a Bayesian framework. They used a guided filter to smooth the final saliency map obtained by summing the saliency information.
Some researchers computed saliency maps from region-level information by segmenting images into foreground and background. Wei et al. [13] proposed a region-based geodesic saliency detection method in which a saliency map is produced by taking into account background priors in the form of boundary and connectivity priors. A graph-based saliency detection method was introduced by Yang et al. [19], in which the image is represented as a close-loop graph with super-pixels as nodes; background- and foreground-based nodes are then ranked together to estimate the salient region based on affinity matrices. Xie et al. [20] proposed a region-level saliency method based on a Bayesian framework. They calculated a saliency map using low- and mid-level visual cues: saliency information is first extracted using color priors, a convex hull is then computed to estimate the foreground and background, and the saliency map is finally computed pixel-wise inside and outside the convex hull under the Bayesian framework. Also exploiting the Bayesian framework, Ayoub et al. [21] calculated the saliency map using color and texture features of images. Zhu et al. [28] proposed a saliency detection method in which the spatial layout of image boundary regions is characterized by a robust background measure, called boundary connectivity; an optimization framework is then used to integrate low-level visual cues.
DS-Evidence theory is more applicable to saliency fusion than other fusion methods. DS-Evidence theory is based on Dempster's work [29] on upper and lower probabilities. The use of belief functions in artificial intelligence was introduced by Barnett [30], who framed the degree of belief as a numerical function that combines all possible evidence instead of testing a null hypothesis. Lowrance et al. [31] defined DS-Evidence functions as evidential reasoning, which manipulates all possible combinations of evidence. A great deal of work has been done to improve DS-Evidence theory so as to extract the maximum information from the input data, and various saliency estimation methods have improved the accuracy and recall of saliency maps. It is better to fuse all possible evidence of saliency values than to rely on a single probability value to reach a degree of acceptance. Using DS-Evidence theory to fuse all possible outcomes instead of a null hypothesis can raise the quality of saliency estimation above that of each individual saliency detection method. Figure 2 shows the comparison of saliency maps calculated by our algorithm (DS-OUR) and the nine other state-of-the-art approaches DSR [24], GS [13], HS [26], MR [19], MS [27], SF [14], SST [25], XL [20] and wCtr [28], respectively.

DS-Evidence Theory Review
DS-Evidence theory was proposed in 1967 by Dempster [29], who introduced a system of upper and lower probabilities. In the context of statistical inference, Shafer developed the theory into a framework for uncertainty modeling in 1976 [32]. Beynon et al. [33] showed that DS-Evidence theory has numerous advantages over various statistical methods and Bayesian decision theory because of its performance in modeling reasoning under uncertainty. DS-Evidence theory can be defined as follows. Denote by Θ the universal set (frame of discernment), representing a finite set of all possible hypotheses of a problem, and by 2^Θ its power set, which enumerates the candidate propositions. The basic probability assignment (BPA) m : 2^Θ → [0, 1] is defined by

m(∅) = 0,  ∑_{A∈2^Θ} m(A) = 1. (1)

The probability mass m(A) expresses the degree of confidence that proposition A is true. Any set A ∈ 2^Θ with nonzero BPA, m(A) > 0, is called a focal element. In this formalism, imprecise knowledge can be handled by assigning a nonzero probability mass to the union of two or more classes.
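As an illustration (our own sketch, not code from the paper), a BPA over the frame Θ = {FG, BG} can be represented as a dict keyed by frozensets, and the two axioms above checked directly:

```python
FG, BG = "FG", "BG"

def is_valid_bpa(m, tol=1e-9):
    """Check the BPA axioms: m(empty set) = 0 and masses sum to 1."""
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    return abs(sum(m.values()) - 1.0) < tol

# Example: 0.7 belief in foreground, 0.2 in background,
# and 0.1 left on the whole frame {FG, BG} (explicit ignorance).
m = {frozenset({FG}): 0.7, frozenset({BG}): 0.2, frozenset({FG, BG}): 0.1}
```

Leaving mass on the whole frame {FG, BG} is exactly the "union of two or more classes" device mentioned above for handling imprecise knowledge.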
The combination rule proposed by Dempster plays a central role in DS-Evidence theory. Given M mass functions m_1, m_2, ..., m_M, the rule is defined as

(m_1 ⊕ m_2 ⊕ ... ⊕ m_M)(A) = (1 / (1 − K)) ∑_{A_1∩A_2∩...∩A_M = A} ∏_{i=1}^{M} m_i(A_i),  A ≠ ∅, (2)

where

K = ∑_{A_1∩A_2∩...∩A_M = ∅} ∏_{i=1}^{M} m_i(A_i) (3)

is the degree of conflict among the pieces of evidence. The ability of DS-Evidence theory to combine evidence from different sources makes it well suited to reasoning under uncertainty. Probability mass can be assigned to any subset of the frame of discernment, without requiring the hypotheses to be mutually exclusive and exhaustive. DS-Evidence theory regards each subset as a single hypothesis, which allows it to simulate reasoning similar to human logic. Therefore, DS-Evidence theory is particularly applicable to fusion tasks.
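Dempster's rule for two mass functions can be sketched as follows; the dict-of-frozensets representation and the function name are our own illustration, not the paper's implementation:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions over the same frame with Dempster's rule.
    Masses are dicts mapping frozenset focal elements to probabilities."""
    combined, conflict = {}, 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + pa * pb
        else:
            conflict += pa * pb  # mass K assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    # renormalize by 1 - K so the combined masses sum to 1
    return {a: p / (1.0 - conflict) for a, p in combined.items()}

FG, BG = frozenset({"FG"}), frozenset({"BG"})
m1 = {FG: 0.8, BG: 0.2}
m2 = {FG: 0.6, BG: 0.4}
m12 = dempster_combine(m1, m2)
```

Note how two independent pieces of evidence that both lean toward FG reinforce each other: the combined m(FG) is larger than either input mass.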

DS-Fusion Method
We propose a method for fusing saliency maps based on DS-Evidence theory, with the aim of overcoming the shortcomings of the prior state-of-the-art methods mentioned above.
In the first step, n initial saliency images are generated using the n (n > 1) methods to be fused. In our experiments, we used nine different saliency detection methods, i.e., n = 9.
In the second step, for each pixel, we define the mass functions corresponding to the n saliency images. Define the frame Θ = {FG, BG}, where FG represents the hypothesis that the pixel belongs to the foreground and BG the hypothesis that it belongs to the background. The recognition framework contains 2^2 = 4 subsets, forming the power set 2^Θ. The mass function must satisfy ∑_{A∈2^Θ} m(A) = 1. Thus, we define the mass functions (basic trust assignment functions) for the n saliency images as shown in Equations (4) and (5):

m_i(FG) = S_i(x, y), (4)
m_i(BG) = 1 − S_i(x, y), (5)

where m_i(FG) is the mass function corresponding to the ith saliency map and S_i(x, y) ∈ [0, 1] is the normalized saliency value of the pixel to be fused in the ith map. FG indicates that the pixel is in the foreground, with mass equal to the saliency value at the corresponding pixel; BG indicates that the corresponding pixel is in the background.
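This step can be sketched in NumPy; the min–max normalization to [0, 1] is an assumption on our part, since the paper does not state how the raw maps are rescaled:

```python
import numpy as np

def pixel_mass_functions(saliency_maps):
    """saliency_maps: list of n 2-D arrays (e.g. uint8 in [0, 255]).
    Returns an (n, H, W) array of m_i(FG); m_i(BG) is its complement."""
    masses = []
    for s in saliency_maps:
        s = s.astype(np.float64)
        rng = s.max() - s.min()
        # rescale to [0, 1] (assumed normalization); flat maps become zero
        s = (s - s.min()) / rng if rng > 0 else np.zeros_like(s)
        masses.append(s)
    return np.stack(masses)

maps = [np.array([[0, 128], [255, 64]], dtype=np.uint8)]
m_fg = pixel_mass_functions(maps)  # m_bg would be 1.0 - m_fg
```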
In the third step, we calculate the similarity coefficients between the mass functions corresponding to the saliency images (i.e., between pieces of evidence) and assemble them into a similarity matrix. The similarity coefficient is calculated by Equation (6):

d_ij = 1 − |m_i(FG) − m_j(FG)|, (6)

where the similarity coefficient d_ij (d_ij ∈ [0, 1]) describes the degree of similarity between two pieces of evidence: d_ij = 1 indicates that the two pieces of evidence are identical and d_ij = 0 indicates that they are completely different. From these coefficients, we obtain the similarity matrix for the n pieces of evidence by Equation (7):

D = [d_ij]_{n×n}. (7)

In the fourth step, we find the support level and credibility of each piece of evidence. The degree of support of a piece of evidence indicates how strongly it is supported by the other evidence; if one piece of evidence is similar to the others, its mutual support is considered higher. The support of the ith piece of evidence is given by Equation (8):

Sup(m_i) = ∑_{j=1, j≠i}^{n} d_ij. (8)

The credibility of a piece of evidence reflects its reliability relative to the evidence as a whole, and is calculated by Equation (9):

Crd_i = Sup(m_i) / ∑_{j=1}^{n} Sup(m_j). (9)

In the fifth step, using the credibilities as weights on the mass functions, we obtain the weighted mass function m_ave(FG), the basic probability assignment of the pixel to the foreground, by Equation (10):

m_ave(FG) = ∑_{i=1}^{n} Crd_i · m_i(FG). (10)
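A hedged sketch of steps three and four for a single pixel. The similarity measure d_ij = 1 − |m_i(FG) − m_j(FG)| is an assumed form that satisfies the stated properties (d_ij ∈ [0, 1], equal to 1 for identical evidence); support and credibility follow the weighted-evidence scheme described in the text:

```python
import numpy as np

def credibility_weights(m_fg):
    """m_fg: length-n vector of m_i(FG) values for one pixel.
    Returns the credibilities Crd_i, which sum to 1."""
    m_fg = np.asarray(m_fg, dtype=np.float64)
    # similarity matrix (assumed measure): 1 minus absolute difference
    d = 1.0 - np.abs(m_fg[:, None] - m_fg[None, :])
    # support: sum of similarities to the *other* evidence (exclude self)
    sup = d.sum(axis=1) - np.diag(d)
    return sup / sup.sum()  # normalize supports into credibilities

crd = credibility_weights([0.9, 0.85, 0.1])
```

The outlier evidence (0.1) disagrees with the other two, so it receives the lowest credibility and contributes least to the weighted mass function.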
We use the weighted mass function value as the saliency value of the fused saliency image; this preliminary synthesis already detects all probable salient regions effectively. We take m_ave(FG) as the initial saliency map over the foreground pixels:

sal_1(x, y) = m_ave(FG). (11)

In the sixth step, the weighted average evidence is combined with itself n − 1 times under the DS synthesis rule of Equation (2) to obtain a second fused saliency image. Because both hypotheses are singletons, the synthetic mass function has the closed form shown in Equation (12):

m_s(FG) = (m_ave(FG))^n / k, (12)

where k = (m_ave(FG))^n + (1 − m_ave(FG))^n, and the second saliency map is taken as

sal_2(x, y) = m_s(FG). (13)
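The closed form above follows because combining a binary mass function with itself under Dempster's rule multiplies the singleton masses and renormalizes. A sketch (function name ours):

```python
import numpy as np

def ds_self_combine(m_ave_fg, n):
    """Combine the weighted-average evidence with itself n - 1 times:
    m_s(FG) = m^n / (m^n + (1 - m)^n), applied element-wise."""
    m = np.asarray(m_ave_fg, dtype=np.float64)
    k = m**n + (1.0 - m)**n  # normalization constant from the text
    return m**n / k

sal1 = np.array([0.7, 0.5, 0.2])
sal2 = ds_self_combine(sal1, n=9)
```

Self-combination sharpens the map: values above 0.5 are pushed toward 1, values below 0.5 toward 0, while exactly ambiguous pixels (0.5) stay at 0.5.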
In the seventh step, we calculate the weighted fusion of the initial saliency maps sal_1 and sal_2. The final saliency map is obtained by merging both initial saliency maps with different weights as follows:

Sal = μ_1 · sal_1 + μ_2 · sal_2, (14)

where μ_1, μ_2 are the composition weights. We set μ_1 = 0.35 and μ_2 = 0.65 in our experimental work. The method differs from existing methods in that it takes advantage of several saliency detection methods, and comparison with traditional methods shows that the proposed method (DS-OUR) outperforms each individual saliency detection method. The complete algorithm of the proposed saliency fusion method is summarized in Algorithm 1. Figure 3 shows the saliency map results calculated by Equations (11), (13) and (14). Figure 3. (b) Sal_1 (initial saliency map by Equation (11)); (c) K (addition of foreground and background pixels); (d) Sal_2 (combined saliency map by Equation (13)); (e) Sal (DS-OUR), final saliency map by Equation (14); (f) GT (ground truth mask).
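Step seven is a convex combination of the two intermediate maps with the weights reported in the paper (the function name is ours):

```python
import numpy as np

def fuse_final(sal1, sal2, mu1=0.35, mu2=0.65):
    """Final fusion: Sal = mu1 * sal_1 + mu2 * sal_2 (weights from the paper)."""
    return mu1 * np.asarray(sal1, dtype=np.float64) + \
           mu2 * np.asarray(sal2, dtype=np.float64)

sal = fuse_final(np.array([0.6, 0.2]), np.array([0.9, 0.1]))
```

Since μ_1 + μ_2 = 1, the fused map stays in [0, 1] whenever both inputs do, with the sharpened map sal_2 contributing the larger share.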

Algorithm 1: DS-Saliency Fusion Algorithm
for A = RGB Image do
    Calculate S_pi = S_npi(A), where npi represents the number of pixel-level saliency methods;
    Calculate S_pa = S_npa(A), where npa represents the number of patch-level saliency methods.

Data-Sets
For evaluation purposes, we use four publicly available datasets to compare the efficiency of our algorithm against nine state-of-the-art saliency detection models. These datasets have been widely used by researchers to evaluate their algorithms.
MSRA consists of 1000 images and was introduced by Achanta et al. [10]. It contains pixel-wise ground-truth masks of the salient objects, derived from bounding-box annotations marked by 3–9 users. It is one of the datasets most widely used by the computer vision community for saliency detection and segmentation comparisons.
ECSSD was introduced by Shi et al. [26] as an extended version of the CSSD dataset and contains 1000 images acquired from the Internet. The images in the ECSSD dataset have structurally more complex backgrounds than those in CSSD. Pixel-wise ground-truth masks were annotated by five different users for this dataset.
DUT-OMRON was introduced by Yang et al. [19]. It consists of 5168 high-quality images, manually selected from more than 140,000 images. These images contain one or more salient objects with complex backgrounds. Ground-truth annotations, in terms of both bounding boxes and pixel-wise masks, were provided by 25 users.

PASCAL-S was built on the validation set of the PASCAL VOC 2010 segmentation challenge by Li et al. [34]. It contains 850 natural images as a subset of the PASCAL VOC dataset, with both saliency segmentation ground truth and eye fixation ground truth. PASCAL-S is a less biased dataset and contains one or more objects per image. Pixel-wise ground-truth masks were annotated by 12 users for this dataset. Figure 4 shows the saliency maps calculated by our saliency detection method (DS-OUR) and the nine state-of-the-art approaches DSR [24], GS [13], HS [26], MR [19], MS [27], SF [14], SST [25], XL [20] and wCtr [28], respectively, on the four datasets.

Precision-Recall Curves
Precision-recall curves are used to evaluate performance in estimating salient regions. Many researchers have employed precision-recall metrics to show the accuracy of their algorithms: saliency maps are binarized at thresholds varying over the domain [0, 255] and compared with the ground-truth masks. At each binarization threshold, precision and recall are calculated by Equation (15):

Precision = t_np / s_np,  Recall = t_np / g_np, (15)

where t_np is the number of salient pixels common to the binarized saliency map and the ground truth, s_np is the number of salient pixels in the binarized saliency map, and g_np is the number of salient pixels in the ground truth.
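The threshold sweep of Equation (15) can be sketched as follows (our own illustration of the standard procedure):

```python
import numpy as np

def pr_curve(saliency, gt, thresholds=range(256)):
    """saliency: uint8 map in [0, 255]; gt: boolean ground-truth mask.
    Returns precision and recall arrays, one entry per threshold."""
    gt = gt.astype(bool)
    precision, recall = [], []
    for t in thresholds:
        binary = saliency >= t
        t_np = np.logical_and(binary, gt).sum()  # overlap with ground truth
        s_np = binary.sum()                      # salient pixels in binarized map
        # convention: precision is 1 when the binarized map is empty
        precision.append(t_np / s_np if s_np else 1.0)
        recall.append(t_np / gt.sum())
    return np.array(precision), np.array(recall)

sal = np.array([[250, 10], [200, 5]], dtype=np.uint8)
gt = np.array([[1, 0], [1, 0]], dtype=bool)
p, r = pr_curve(sal, gt)
```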

ROC-AUC Curves
Receiver operating characteristic (ROC) curves are used to quantify how well a saliency detection algorithm distinguishes fixation from non-fixation regions. AUC denotes the area under the ROC curve; a greater AUC indicates a better classifier. The AUC is computed from the true positive rate (TPR) and the false positive rate (FPR). TPR is the proportion of ground-truth fixation pixels whose saliency values lie above a given threshold; FPR is the proportion of non-fixation pixels whose saliency values lie above that threshold. TPR and FPR are computed as follows:

TPR = TP / (TP + FN),  FPR = FP / (FP + TN),

where TP and FP are the numbers of fixation and non-fixation pixels above the threshold, and FN and TN are the numbers below it. Figure 5 shows the AUC comparison results on the four datasets for DS-OUR, DSR [24], GS [13], HS [26], MR [19], MS [27], SF [14], SST [25], XL [20] and wCtr [28], respectively. Our saliency detection method performs well, with higher AUC values.

F-Measure
F-measure scores are calculated as a weighted harmonic mean of average precision and average recall:

F = ((1 + α²) · Precision · Recall) / (α² · Precision + Recall),

where α² is set to 0.3 to weight precision more than recall. Precision, recall and F-measure are averaged over the total number of images. Figure 6 shows the precision-recall curves, ROC curves and F-measure comparison results on the databases presented in [10,19,26,34].
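The F-measure formula is a one-liner (function name ours; the α² = 0.3 default is the value used in the paper):

```python
def f_measure(precision, recall, alpha2=0.3):
    """Weighted harmonic mean of precision and recall, alpha^2 = 0.3."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero on a degenerate map
    return (1 + alpha2) * precision * recall / (alpha2 * precision + recall)

f = f_measure(0.8, 0.6)
```

Because α² < 1, errors in precision pull the score down more strongly than equal errors in recall.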

MAE Evaluation
Mean absolute error (MAE) measures how close the predicted saliency map is to the ground truth. MAE is computed between the saliency map S and the ground truth G as

MAE = (1 / (W × H)) ∑_{x=1}^{W} ∑_{y=1}^{H} |S(x, y) − G(x, y)|,

where W and H are the width and height of the map. Figure 7 shows the MAE comparison results on the four datasets. Our saliency detection method performs better than the other saliency detection methods, showing the minimum mean absolute error. Figure 7. MAE results on four different datasets, from left to right: DS-OUR, DSR [24], GS [13], HS [26], MR [19], MS [27], SF [14], SST [25], XL [20] and wCtr [28], respectively.
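The MAE computation can be sketched as follows (function name ours; rescaling uint8 maps to [0, 1] before comparison is a common convention we assume here):

```python
import numpy as np

def mae(S, G):
    """Mean absolute per-pixel difference between saliency map S and
    ground truth G, both taken in [0, 1]."""
    S = S.astype(np.float64) / 255.0 if S.dtype == np.uint8 else S
    return np.abs(S - G.astype(np.float64)).mean()

S = np.array([[255, 0], [128, 0]], dtype=np.uint8)
G = np.array([[1, 0], [1, 0]], dtype=float)
err = mae(S, G)
```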

Performance Comparison
For evaluation purposes, we compare the quantitative and qualitative results obtained by our method and nine state-of-the-art algorithms, assessing the quality of the resulting saliency maps with respect to object detection in the images. The method XL [20] detects uninteresting background pixels as salient; it calculates the saliency of each pixel inside and outside the region of interest using the convex-hull method with low-level visual features, and cannot perform well when background of the same color as the foreground is detected as foreground. The approach SF [14] fails to detect the salient pixels of prominent objects, as it treats saliency as the uniqueness of pixels and the spatial distribution of elements. Our approach consistently estimates accurate pixels on the dominant objects and their contextual surroundings by considering both foreground and background pixels and fusing them for better results. Our final saliency maps were computed by fusing the individual saliency maps using Equation (14). We took two sample images from each of the databases presented in [10,19,26,34]; these databases contain the original images and annotated ground truths used in the saliency map comparisons. Figure 4 compares our saliency map results with those of biologically inspired saliency detection approaches; our algorithm gives better performance in terms of overall thresholded accuracy. Figure 5 shows the AUC comparison results on the four datasets, where our method achieves higher AUC values. Figure 6 shows the precision-recall curves, ROC curves and F-measure comparison results. Figure 7 shows the MAE (mean absolute error) results on the four datasets; our (DS-OUR) algorithm shows the minimum error in overall saliency detection performance compared with the other nine state-of-the-art algorithms.
In some of the images, only the prominent object was annotated. For fair evaluation, we ran each algorithm on all of these datasets. Table 1 shows the quantitative comparison of our algorithm against the nine state-of-the-art algorithms. The quantitative results obtained by our algorithm show the best performance in terms of AUC (higher is better), MAE (lower is better) and F-measure (higher is better).

Conclusions
In this paper, we have presented a saliency fusion method based on DS-Evidence theory. Our method regards saliency information extraction as an important issue and considers foreground and background regions to obtain salient pixel information. Inspired by processes of the human visual and cognitive systems, our proposed method uses a relationship model between saliency labels. We calculate saliency maps using different methods based on pixel-level, patch-level and region-level information, and fuse these saliency maps under the framework of DS-Evidence theory. Extensive experiments on four publicly available image datasets demonstrate that the proposed method significantly outperforms state-of-the-art saliency detection methods, particularly in its insensitivity to the choice of features. In the future, we plan to extend our framework to saliency detection in noisy and incomplete scenes.