Article

Multi-Scale Pure Graphs with Multi-View Subspace Clustering for Salient Object Detection

Mingxian Wang, Hongwei Yang, Yi Zhang, Wenjie Wang and Fan Wang
1 School of Earth Science and Engineering, Xi’an Shiyou University, Xi’an 710065, China
2 Shaanxi Key Laboratory of Petroleum Accumulation Geology, Xi’an Shiyou University, Xi’an 710065, China
3 College of Petroleum Engineering, Xi’an Shiyou University, Xi’an 710065, China
4 School of Science, Xi’an Shiyou University, Xi’an 710065, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(8), 1262; https://doi.org/10.3390/sym17081262
Submission received: 30 June 2025 / Revised: 21 July 2025 / Accepted: 1 August 2025 / Published: 7 August 2025
(This article belongs to the Section Engineering and Materials)

Abstract

Salient object detection is a challenging task in computer vision. Graph-based models, which construct graphs to formulate the intrinsic structure of an image, have attracted considerable research attention and achieved remarkable progress in this task. Nevertheless, existing graph-based salient object detection methods still face two major challenges: (1) previous graphs are constructed with the Gaussian kernel and are often corrupted by noise in the original data; (2) they fail to capture the common representations and complementary diversity of multi-view features. Both issues degrade saliency performance. In this paper, we propose a novel method, called multi-scale pure graphs with multi-view subspace clustering, for salient object detection. Its main contribution is a new two-stage graph, constructed and constrained by multi-view subspace clustering with sparsity and low-rank constraints. One advantage is that the multi-scale pure graphs protect the saliency performance from the propagation of noise through the graph matrix. Another advantage is that the multi-scale pure graphs exploit consistency and complementary information among multi-view features, which effectively boosts the capability of the graphs. In addition, to verify the impact of the symmetry of the multi-scale pure graphs on salient object detection performance, we compared the proposed two-stage graphs with and without the multi-scale pure graphs. The experiments were conducted on several RGB benchmark datasets against several state-of-the-art algorithms. The results demonstrate that the proposed method outperforms the state-of-the-art approaches in terms of multiple standard evaluation metrics. This paper shows that multi-view subspace clustering is beneficial for graph-based saliency detection.

1. Introduction

Saliency detection aims to find the most interesting region or important object in a natural scene by simulating the visual attention mechanism. It is a hot topic in computer vision and is widely applied in many tasks, such as image segmentation [1], image retrieval [2], image fusion [3], visual tracking [4], dynamic driving scenes [5], and others.
Existing saliency detection algorithms address either eye-fixation prediction or salient object detection. Early works [6,7,8] focused on the former topic, predicting where human gaze focuses in a given image. Later algorithms [9,10,11,12] were mainly geared toward salient object detection, aiming to extract complete object information from RGB images. Salient object detection approaches include bottom-up models and top-down models. Bottom-up models [11,13,14] are stimuli-driven, without specific task guidance, employing low-level features such as colors, textures, orientations, and spatial distances to detect salient objects. Top-down models [15,16,17] are task-driven, requiring supervised learning with manually labeled ground truth and exploiting high-level information to better abstract salient objects.
It is worth mentioning that the bottom-up models [13,14] are pioneering graph-based methods that have shown promising performance with simplicity and efficiency. These methods mainly involve two aspects, i.e., graph construction and foreground/background seed selection. For saliency detection, graph construction is a critical issue. Traditionally, graph models rely on the Gaussian kernel function to compute affinity matrices from a single-view feature (CIELab color), and background seeds follow the boundary prior (boundary patches are mostly background). However, a single-view feature cannot capture the rich information of an image. To address this problem, graph models [18,19,20,21] were further constructed on multi-view low-level features for salient object detection. These methods still have two limitations. On the one hand, the traditional graphs are constructed from the Gaussian kernel function and are often corrupted by noise. On the other hand, they fail to efficiently exploit both the consistency and the complementary intrinsic structure of multi-view features. Both limitations degrade saliency performance (Figure 1).
To overcome the aforementioned limitations, building on existing graph models, we propose novel multi-scale pure graphs with multi-view subspace clustering for salient object detection. The method builds upon two-stage multi-scale graphs and follows the background prior to compute the saliency score of each superpixel. In detail, to capture the local structure of the multi-view low-level features, we first utilized the Gaussian kernel function to calculate the traditional graph matrix. Secondly, to further depict the global structure and eliminate the noise of the multi-view low-level features, multi-view low-rank sparse subspace clustering [22] was applied to generate a shared affinity matrix. Afterward, the traditional graph matrix and affinity matrix were decomposed by singular value decomposition (SVD). To exploit the consistency and complementary intrinsic structure of the multi-view low-level features, a joint affinity graph matrix was learned from multi-view subspace clustering based on a low-rank representation with diversity regularization and a rank constraint [23]. Based on these graph matrices, we constructed two-stage multi-scale graphs for graph-based manifold ranking. The main contributions are summarized as follows:
1. To depict the global structure and remove the noise of multi-view low-level features, multi-view low-rank sparse subspace clustering was applied to induce a shared affinity graph matrix.
2. To further capture the consistency and complementary intrinsic structure of multi-view features, multi-view subspace clustering was explored based on a low-rank representation with diversity regularization.
3. To clearly describe the local and global structure of multi-view low-level features, two-stage multi-scale pure graphs were constructed based on the above graph matrices.
4. Extensive experiments demonstrate that our two-stage, multi-scale pure graphs consistently achieve better saliency performance than several state-of-the-art graph models on five benchmark datasets.
The rest of this paper is organized as follows. Section 2 describes the works related to the proposed method. Section 3 introduces the main aspects of our designed framework. Section 4 presents the experiments and comparisons. Section 5 gives a discussion of the proposed method. Finally, Section 6 concludes our work.

2. Related Work

In the early stage of graph-based saliency detection, the representative methods [14,24,25,26] mainly adopted the CIELab color feature to compute the affinity matrix. However, these graph models use only the CIELab color space and cannot represent the rich diversity of salient objects and backgrounds. Even worse, such traditional graphs fail to capture the global structure of an image.
In particular, the work that inspired our proposed method is graph-based manifold ranking (GMR) [14], which has attracted a lot of attention in salient object detection. In this method, an RGB image is mapped onto a graph $G = (V, E)$ with $N$ nodes $\{v_1, v_2, \ldots, v_N\}$ and a set of edges $E$. Traditionally, $E$ is encoded by an affinity graph matrix $W = [w_{ij}]_{N \times N}$, where $w_{ij}$ is calculated by
$$ w_{ij} = \begin{cases} \exp\!\left(-\dfrac{\|x_i - x_j\|}{\sigma^2}\right) & \text{if } j \in N_i \\ 0 & \text{otherwise} \end{cases} \tag{1} $$
where $x_i$ and $x_j$ represent the mean feature values of superpixels $v_i$ and $v_j$ in the feature space, respectively, $\sigma$ is a constant, and $N_i$ denotes the neighbors of superpixel $v_i$. From the graph matrix $W = [w_{ij}]_{N \times N}$, the corresponding degree matrix $D = \mathrm{diag}\{d_{11}, \ldots, d_{NN}\}$ is obtained, where $d_{ii} = \sum_{j=1}^{N} w_{ij}$. Accordingly, let $y = [y_1, y_2, \ldots, y_N]^T$ be an indication vector, in which $y_i = 1$ if $x_i$ is a query and $y_i = 0$ otherwise. The optimal ranking of the queries is derived by solving the following optimization problem:
$$ f^* = \arg\min_{f} \frac{1}{2}\left( \sum_{i,j=1}^{N} w_{ij} \left\| \frac{f_i}{\sqrt{d_{ii}}} - \frac{f_j}{\sqrt{d_{jj}}} \right\|^2 + \mu \sum_{i=1}^{N} \| f_i - y_i \|^2 \right) \tag{2} $$
where the first term, $\sum_{i,j=1}^{N} w_{ij} \left\| \frac{f_i}{\sqrt{d_{ii}}} - \frac{f_j}{\sqrt{d_{jj}}} \right\|^2$, is the smoothness constraint and the second term, $\sum_{i=1}^{N} \| f_i - y_i \|^2$, is the fitting constraint. The parameter $\mu$ controls the balance between the two terms, and $f^*$ is the ranking result. Setting the derivative of Equation (2) to zero yields
$$ f^* = (I - \alpha S)^{-1} y \tag{3} $$
where $S = D^{-1/2} W D^{-1/2}$ denotes the symmetrically normalized affinity matrix and $\alpha = \frac{1}{1+\mu} = 0.99$.
To suit salient object detection, an unnormalized Laplacian matrix is embedded into Equation (3), and the final ranking result is then generated as
$$ f^* = (D - \alpha W)^{-1} y \tag{4} $$
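For concreteness, the closed-form ranking in Equation (4) amounts to a single linear solve once the affinity matrix and the seed indicator are available. The sketch below is a minimal NumPy illustration of this step (our illustration, not the authors' released code); `W`, `y`, and `alpha = 0.99` follow the definitions above, and the toy graph is a hypothetical example.

```python
import numpy as np

def manifold_ranking(W, y, alpha=0.99):
    """Closed-form graph-based manifold ranking, f* = (D - alpha*W)^{-1} y."""
    D = np.diag(W.sum(axis=1))              # degree matrix, d_ii = sum_j w_ij
    return np.linalg.solve(D - alpha * W, y)  # solve instead of forming the explicit inverse

# Toy usage: a 4-node chain graph with node 0 as the query/seed.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 0.0, 0.0, 0.0])
print(manifold_ranking(W, y))               # ranking scores decay with graph distance from the seed
```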
The key contribution of graph-based manifold ranking lies in its two-stage "superior" graphs. For a "superior" graph matrix $W \in \mathbb{R}^{N \times N}$, each element $w_{ij}$ is an edge weight that reflects the similarity between two adjacent nodes/superpixels $x_i$ and $x_j$ of an input image. Inspired by this, researchers [18,20,27] extracted multi-view appearance features from an image and developed corresponding graph-based approaches. They computed a traditional graph matrix for each appearance feature individually and obtained the final graph matrix through linear fusion or dot products. However, these approaches still cannot capture the global structure. Furthermore, their graphs do not sufficiently encode the consistency and complementary intrinsic structures of the multi-view features and therefore do not yield significantly enhanced saliency results. Benefiting from the high-level features of deep learning [28,29], which contain rich semantic concepts of salient objects, several improved graph-based algorithms [30,31,32,33] have been proposed. These methods construct traditional graph matrices by extracting multi-view features that combine high-level semantic information with appearance features. Since the high-level semantic information effectively describes the salient objects, these algorithms achieve better saliency performance. However, learning the high-level semantic information requires a substantial amount of training time on sample datasets. Moreover, the pooling operations of convolutional neural networks severely blur the position information of salient objects. In recent years, with the superior performance of deep learning [34,35], convolutional neural networks [36,37] and attention-based graph models [38,39,40] have been developed for salient object detection and have achieved outstanding saliency results. However, they share a common shortcoming: they require a huge amount of labeled data along with a GPU-enabled system. In this study, different from all the aforementioned salient object detection methods, our proposed method explicitly explores subspace clustering and constructs a robust graph model from the multi-view low-level features of an image. Compared with unsupervised graph-based saliency models using multi-view low-level features, our proposed method generates promising performance and breaks through the limitations of traditional graph models.

3. Methodology

A diagram of our proposed method is shown in Figure 2, and the details will be elaborated upon below.

3.1. Multi-View Feature Extraction

To effectively describe the difference between salient objects and the background, our proposed method first extracts 64-dimensional low-level features, as listed in Table 1. Secondly, SLIC [41] is applied to over-segment the input image into $N$ nonoverlapping superpixels $P = \{P_1, P_2, \ldots, P_N\}$. For each superpixel $P_i$, the $v$-th view feature matrix is represented as $X^{(v)} = \{x_1^{(v)}, x_2^{(v)}, \ldots, x_N^{(v)}\} \in \mathbb{R}^{d_v \times N}$, where $v \in \{1, 2, \ldots, V\}$.
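The sketch below illustrates this step for one feature channel using scikit-image: SLIC over-segmentation followed by averaging the CIELab values inside each superpixel. The use of scikit-image and the helper name `superpixel_lab_means` are our choices for illustration; the full 64-dimensional feature set of Table 1 would be assembled by repeating this pooling over every feature channel.

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.segmentation import slic

def superpixel_lab_means(image_rgb, n_segments=400):
    """Over-segment an RGB image with SLIC and return per-superpixel mean CIELab values.

    Returns the (H, W) superpixel label map and a (3, N) matrix whose columns are
    the mean Lab features of the N superpixels.
    """
    labels = slic(image_rgb, n_segments=n_segments, compactness=10)
    lab = rgb2lab(image_rgb)
    seg_ids = np.unique(labels)
    X_lab = np.zeros((3, len(seg_ids)))
    for col, seg in enumerate(seg_ids):
        X_lab[:, col] = lab[labels == seg].mean(axis=0)   # mean Lab value inside this superpixel
    return labels, X_lab
```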

3.2. Multi-View Pure Graph Construction

This section includes the traditional graph matrix computation and joint affinity matrix learning with multi-view subspace clustering.

3.2.1. Adjacent Graph

Given the suitability of the traditional graph matrix for salient object detection, the similarity between any two adjacent nodes $P_i$ and $P_j$ in the multi-view features is calculated as follows:
$$ w_{ij}^{(l)} = \begin{cases} \exp\!\left(-\dfrac{\|x_i^{(l)} - x_j^{(l)}\|^2}{\sigma^2}\right) & \text{if } j \in N_i \\ 0 & \text{otherwise} \end{cases} \tag{5} $$
$$ w_{ij}^{(o)} = \begin{cases} \exp\!\left(-\dfrac{\|x_i^{(o)} - x_j^{(o)}\|^2}{\sigma^2}\right) & \text{if } j \in N_i \\ 0 & \text{otherwise} \end{cases} \tag{6} $$
where $x_i^{(l)}$ and $x_j^{(l)}$ represent the mean values of superpixels $P_i$ and $P_j$ in the CIELab color feature, respectively, and $x_i^{(o)}$ and $x_j^{(o)}$ represent the mean values of superpixels $P_i$ and $P_j$ in the other multi-view features, respectively. $\sigma$ is a constant parameter that controls the degree of similarity. To integrate the two graph matrices $W^{(o)} = [w_{ij}^{(o)}]_{N \times N}$ and $W^{(l)} = [w_{ij}^{(l)}]_{N \times N}$, the traditional graph matrix is generated by
$$ W^{(T)} = (W^{(l)})^2 + (W^{(o)})^2 \tag{7} $$
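As a hedged illustration of Equations (5)-(7), the sketch below builds the two Gaussian-kernel graphs over a given superpixel-adjacency structure and fuses them element-wise. The neighbor lists, $\sigma$, and the helper names are assumptions for illustration; the squared-sum fusion reads Equation (7) literally as an element-wise operation.

```python
import numpy as np

def gaussian_graph(X, neighbors, sigma=0.1):
    """Adjacency-restricted Gaussian-kernel graph of Equations (5)/(6).

    X         : (d, N) matrix of per-superpixel mean features (one column per superpixel)
    neighbors : neighbors[i] is an iterable of superpixel indices adjacent to superpixel i
    """
    n = X.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            W[i, j] = np.exp(-np.linalg.norm(X[:, i] - X[:, j]) ** 2 / sigma ** 2)
    return np.maximum(W, W.T)            # symmetrize in case the neighbor lists are one-sided

def fuse_traditional_graph(W_l, W_o):
    """Element-wise fusion of the CIELab graph and the other-feature graph, Equation (7)."""
    return W_l ** 2 + W_o ** 2
```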

3.2.2. Affinity Graph Learning

The multi-view features $X = \{X^{(1)}; X^{(2)}; \ldots; X^{(V)}\} \in \mathbb{R}^{d \times N}$ should share consensus information because the different features represent the same input image simultaneously. Firstly, to eliminate the confusion caused by noise, an affinity matrix shared among the multiple views is learned with low-rank and sparsity constraints, defined as
$$ \min_{C^{(1)}, C^{(2)}, \ldots, C^{(V)}, C^*} \; \sum_{v=1}^{V} \left( \beta_1 \|C^{(v)}\|_* + \beta_2 \|C^{(v)}\|_1 + \lambda^{(v)} \|C^{(v)} - C^*\|_F^2 \right) \quad \text{s.t. } X^{(v)} = X^{(v)} C^{(v)}, \; \mathrm{diag}(C^{(v)}) = 0 \tag{8} $$
where $C^{(v)}$ is the low-rank representation matrix of the $v$-th view, $C^*$ is the consensus low-rank representation matrix shared by all views, and the parameters $\beta_1$, $\beta_2$, and $\lambda^{(v)}$ weight the low-rank, sparsity, and consensus terms, respectively. In this study, we set $\beta_1 = 0.1$, $\beta_2 = 1 - \beta_1$, $\lambda^{(v)} = 0.3$, and $V = 8$. The objective function is optimized with the method of [22]. The affinity matrix is then obtained by
$$ W^{(C)} = \frac{|C^*| + |C^*|^T}{2} \tag{9} $$
To better explore the local similarity information of the traditional graph matrix $W^{(T)}$ and the global similarity information of the affinity matrix $W^{(C)}$, we utilize the Hadamard product to combine these two matrices, as follows:
$$ W^{(TC)} = (\widetilde{W}^{(C)})^T \circ W^{(T)} \circ \widetilde{W}^{(C)} + (\widetilde{W}^{(T)})^T \circ W^{(C)} \circ \widetilde{W}^{(T)} \tag{10} $$
where $\circ$ denotes the Hadamard product of $N \times N$ matrices. The Hadamard product reduces the computational complexity of the matrix fusion, and the saliency results of our proposed method are insensitive to this choice. For the matrices $W^{(C)}$ and $W^{(T)}$, the angular information of the principal directions of two low-rank vectors extracted from the same subspace is typically larger than that of vectors extracted from different subspaces. Specifically, we compute $\widetilde{W}^{(C)} = U^{(C)} \Sigma (V^{(C)})^T$ and $\widetilde{W}^{(T)} = U^{(T)} \Sigma (V^{(T)})^T$, the SVDs of $W^{(C)}$ and $W^{(T)}$, respectively. Thus, $W^{(TC)}$, which incorporates global clustering information, can be used to refine the neighbors of the traditional graph. Furthermore, to sufficiently exploit the consistency and complementary information among the multi-view features, a graph regularization term and a rank constraint are combined in the objective model, and a joint affinity matrix is learned as follows:
$$ \min_{\hat{Z}, E^{(v)}, w} \; \|\hat{Z}\|_* + \sum_{v=1}^{V} w^{(v)} \|E^{(v)}\|_{2,1} + \lambda\, w^T H w + \beta\, \mathrm{Tr}(\hat{Z} L_s \hat{Z}^T) \quad \text{s.t. } X^{(v)} = X^{(v)} \hat{Z} + E^{(v)}, \; w^T \mathbf{1}_V = 1, \; \mathrm{rank}(L_s) = N - C \tag{11} $$
where $\hat{Z}$ is the joint representation matrix of the multi-view features, $E^{(v)}$ is the representation residual matrix of the $v$-th view, and $L_s$ is the Laplacian matrix of the graph matrix $W^{(TC)}$. $\lambda$ and $\beta$ are two positive balance parameters, both set to 1. $C$ is the number of classes of the multi-view data, set to $C = 10$. $H = [H_{i,j}]_{V \times V} \in \mathbb{R}^{V \times V}$ with $H_{i,j} = \mathrm{Tr}(T_i^T T_j)$ measures the similarity between the $i$-th and $j$-th views, where $T_i = D_i^{-1} S_i$ $(i = 1, 2, \ldots, V)$ is the probability transition matrix of the random walk on the $i$-th view. $\mathbf{1}_V \in \mathbb{R}^V$ is the all-ones vector. Given a weight vector $w \in \mathbb{R}_+^V$, the diversity regularization term is obtained as follows:
$$ \min_{w \in \mathbb{R}_+^V} \sum_{i,j=1}^{V} w_i w_j H_{i,j} = w^T H w \quad \text{s.t. } w^T \mathbf{1} = 1 \tag{12} $$
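To make the diversity term concrete, the sketch below (our illustration, not the authors' solver) forms $H$ from the random-walk transition matrices $T_i = D_i^{-1} S_i$ and minimizes $w^T H w$ over the probability simplex with a simple projected-gradient loop. The helper names and the projected-gradient scheme are assumptions; the full model in Equation (11) is optimized with ALM-ADM as described next.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def view_similarity_matrix(S_list):
    """H_ij = Tr(T_i^T T_j) with T_i = D_i^{-1} S_i (per-view random-walk transition matrices)."""
    T = [np.diag(1.0 / S.sum(axis=1)) @ S for S in S_list]   # assumes strictly positive row sums
    V = len(T)
    return np.array([[np.trace(T[i].T @ T[j]) for j in range(V)] for i in range(V)])

def diversity_weights(H, n_iter=200, lr=0.01):
    """Projected gradient descent for min_w w^T H w  s.t.  w >= 0, 1^T w = 1 (Equation (12))."""
    w = np.full(H.shape[0], 1.0 / H.shape[0])
    for _ in range(n_iter):
        w = project_simplex(w - lr * 2.0 * H @ w)   # gradient of w^T H w is 2 H w
    return w
```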
We adopt the augmented Lagrange multiplier method with alternating direction minimization (ALM-ADM) to optimize the objective function in Equation (11), following the solver in [23]. Accordingly, the joint affinity matrix is obtained by
$$ W^{(Z)} = \frac{|\hat{Z}| + |\hat{Z}|^T}{2} \tag{13} $$
In order to compute saliency maps with homogeneity and integrity, $W^{(Z)}$ is further normalized as follows:
$$ W^{(Z)*} = (D^{(Z)})^{-1} W^{(Z)} \tag{14} $$
where $D^{(Z)}$ is the degree matrix of the graph matrix $W^{(Z)}$. On this basis, $W^{(Z)*}$ is applied to strengthen the above graph matrices. Accordingly, the graph matrix of the first-stage graph-based manifold ranking is
$$ W^{(1)} = (W^{(Z)*})^T W^{(TC)} W^{(Z)} + \eta_1 (W^{(Z)})^T W^{(T)} W^{(Z)} \tag{15} $$
Inspired by the background prior, the coarse saliency score is computed by propagating the background seeds y on the graph, which is formulated as
$$ f^{*(1)} = (D^{(1)} - \alpha W^{(1)})^{-1} y \tag{16} $$
where $y \in \{y_t, y_d, y_l, y_r\}$, in which $y_t$, $y_d$, $y_l$, and $y_r$ represent the seeds of the top, down, left, and right boundaries of the image, respectively. $f^{*(1)}$ is the saliency result of the first-stage graph-based manifold ranking.
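A sketch of this first-stage propagation is given below. We assume that the four boundary-seeded ranking maps are normalized, complemented, and multiplied to form the coarse saliency map, following the convention of the original GMR scheme [14]; the graph $W^{(1)}$ is taken as given, and the function names are ours.

```python
import numpy as np

def rank_with_seeds(W, seed_idx, alpha=0.99):
    """Manifold ranking f = (D - alpha*W)^{-1} y with y = 1 on the seed superpixels."""
    y = np.zeros(W.shape[0])
    y[np.asarray(seed_idx)] = 1.0
    D = np.diag(W.sum(axis=1))
    return np.linalg.solve(D - alpha * W, y)

def first_stage_saliency(W1, boundary_seeds, alpha=0.99):
    """Coarse saliency from the four boundary priors (Equation (16)).

    boundary_seeds maps 'top'/'down'/'left'/'right' to the indices of the
    superpixels touching that image boundary.
    """
    sal = np.ones(W1.shape[0])
    for side in ('top', 'down', 'left', 'right'):
        f = rank_with_seeds(W1, boundary_seeds[side], alpha)
        f = (f - f.min()) / (f.max() - f.min() + 1e-12)   # normalize to [0, 1]
        sal *= 1.0 - f                                    # complement: boundary-like => low saliency
    return sal
```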
After segmenting $f^{*(1)}$ with the threshold $\kappa \cdot \mathrm{mean}(f^{*(1)})$, the foreground seeds $y_f$ are produced. For the faithful propagation of the foreground seeds $y_f$, another "good" graph matrix needs to be constructed. Considering the saliency $f^{*(1)}$, the new graph matrix is computed as
$$ W^{(2)} = (W^{(T)})^T W^{(f)} W^{(T)} + \eta_2 (W^{(Z)*})^T W^{(f)} W^{(Z)*} \tag{17} $$
where $W^{(f)}$ is the traditional graph matrix computed using the saliency $f^{*(1)}$. Finally, the saliency map is obtained by Algorithm 1:
$$ f^{*(2)} = (D^{(2)} - \alpha W^{(2)})^{-1} y_f \tag{18} $$
The saliency maps of the proposed framework are presented in Figure 3.
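The second stage can then be sketched as follows: the coarse map is thresholded at $\kappa \cdot \mathrm{mean}(f^{*(1)})$ to obtain the foreground seed indicator, and ranking is repeated on $W^{(2)}$ as in Equation (18). The graph $W^{(2)}$ is taken as given, and the helper name and final min-max normalization are our assumptions.

```python
import numpy as np

def second_stage_saliency(W2, coarse_sal, kappa=1.0, alpha=0.99):
    """Refine the coarse map by propagating foreground seeds on W^(2) (Equation (18))."""
    y_f = (coarse_sal > kappa * coarse_sal.mean()).astype(float)   # foreground seed indicator
    D2 = np.diag(W2.sum(axis=1))
    f2 = np.linalg.solve(D2 - alpha * W2, y_f)
    return (f2 - f2.min()) / (f2.max() - f2.min() + 1e-12)         # final per-superpixel saliency
```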
Computational Complexity: In Algorithm 1, the two main computational costs come from solving the objective functions in Equations (8) and (11). Following [22], the complexity of solving Equation (8) is $O(TVN^3)$, where $T$ is the number of iterations, $V$ is the number of views, and $N$ is the number of superpixels. Following [23], the complexity of solving Equation (11) is $O(d_i^2 N + d_i N^2 + N^3)$, where $d_i$ is the dimension of each view.
Algorithm 1 Multi-scale pure graphs with multi-view subspace clustering for salient object detection
Require: Multi-view features $X$, background seeds $y \in \{y_t, y_d, y_l, y_r\}$.
Ensure: Saliency result $f^{*(2)}$.
1: Compute the traditional graph matrices $W^{(l)} = [w_{ij}^{(l)}]_{N \times N}$ and $W^{(o)} = [w_{ij}^{(o)}]_{N \times N}$ by Equations (5) and (6), respectively.
2: Fuse $W^{(l)}$ and $W^{(o)}$ to generate $W^{(T)}$ by Equation (7).
3: Learn the affinity graph matrix $W^{(C)}$ by solving Equation (8) and applying Equation (9).
4: Combine $W^{(C)}$ and $W^{(T)}$ to produce $W^{(TC)}$ by Equation (10).
5: Obtain the affinity graph matrix $W^{(Z)}$ by solving Equation (11) and applying Equation (13).
6: Normalize $W^{(Z)}$ to obtain $W^{(Z)*}$ by Equation (14).
7: Construct the first-stage graph matrix $W^{(1)}$ by Equation (15).
8: Propagate the background seeds $y \in \{y_t, y_d, y_l, y_r\}$ and compute the coarse saliency map $f^{*(1)}$ by Equation (16).
9: Construct the second-stage graph matrix $W^{(2)}$ by Equation (17).
10: Obtain the foreground seeds $y_f$ by segmenting $f^{*(1)}$.
11: Propagate the foreground seeds $y_f$ and calculate the fine saliency result $f^{*(2)}$ by Equation (18).

4. Experimental Design and Analysis

In this section, we evaluate the saliency performance of the proposed method on five public RGB datasets, provide qualitative and quantitative comparisons with ten state-of-the-art methods, and carry out ablation experiments to analyze the effectiveness of each component.

4.1. Experimental Setup

4.1.1. Implementation Details

All experiments were performed in MATLAB 2016b on an Intel(R) Core(TM) i5-3700K CPU @ 3.40 GHz with 32.0 GB of RAM. In the proposed framework, the numbers of multi-scale superpixels were set to 200, 300, and 400.

4.1.2. Benchmark Dataset

To demonstrate the effectiveness of the proposed approach, we conducted extensive experiments on five public RGB saliency detection datasets: ECSSD [46], SOD [47], DUTOMRON [46], HUK-IS [48], and SED2 [49].
ECSSD has 1000 images of complex scenes with meaningful and rich semantic concepts.
SOD contains 300 images from the Berkeley segmentation dataset; most images contain several objects, which is challenging for bottom-up models.
DUTOMRON contains 5168 images with structural complexity and at least one object in a scene.
HUK-IS contains 4447 images with multiple salient objects.
SED2 contains 100 images and each image has two salient objects.

4.1.3. Baseline Models

To show the superior saliency performance of the proposed method, we compared it with ten state-of-the-art methods, namely DRFI [13], GMR [14], RBD [50], BSCA [26], SMD [42], DP2-LSG [51], RCRR [52], WMR [18], AME [54], and HCA [55], as well as with PDP [53] on the high-resolution dataset in Section 4.4.
GMR is the baseline of our method.
AME is a promising graph-based salient object detection approach among existing bottom-up models in which the graph is constructed by multi-view high-level features.
HCA is an improvement of BSCA (single-layer cellular automata); it utilizes both multi-view high-level features and multi-view low-level features.
SMD is a saliency detection method based on low-rank matrix recovery theory that exploits multi-view low-level features.
DRFI is a supervised learning approach that exploits multi-view low-level features to highlight salient regions and achieves remarkable saliency performance.
Others are representative graph-based methods.

4.1.4. Evaluation Metrics

To achieve a faithful evaluation of all experimental results, the comparison experiments relied on eight evaluation metrics: the precision–recall (PR) curve [14], S-measure [56], E-measure [57], F-measure ($F_m$), mean absolute error (MAE) [12], area under the curve (AUC), overlap ratio (OR) [42], and weighted F-measure [58].
PR: in the precision–recall (PR) curve, precision was the proportion of correct detections within the detection result, and recall was the proportion of correctly detected pixels within the ground truth (GT). They were defined as follows:
$$ \mathrm{Precision} = \frac{|M \cap GT|}{|M|}, \quad \mathrm{Recall} = \frac{|M \cap GT|}{|GT|} \tag{19} $$
where M denotes the binarized saliency map and GT represents the corresponding ground truth.
The S-measure was adopted to assess the detection performance; it combines region-aware and object-aware structural similarity between the saliency map and the corresponding ground-truth map and was computed by
$$ S_\beta = \beta \cdot S_o + (1 - \beta) \cdot S_r \tag{20} $$
where $S_o$ and $S_r$ are the object-aware and region-aware structural similarity measures, respectively.
The E-measure was an enhanced-alignment measure that integrated local pixel scores with the image-level mean score, thereby capturing both image-level statistics and local pixel matching information. It was defined as
$$ E_\xi = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} \theta(\xi) \tag{21} $$
where $\theta(\xi)$ denotes the enhanced-alignment matrix, and $W$ and $H$ are the width and height of the map.
F-measure was used to obtain an overall measure for the saliency results and was formulated as
$$ F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}} \tag{22} $$
$\beta^2$ was set to 0.3 to emphasize precision. On this basis, the weighted F-measure was computed by
$$ F_\beta^\omega = \frac{(1 + \beta^2) \cdot \mathrm{Precision}^\omega \cdot \mathrm{Recall}^\omega}{\beta^2 \cdot \mathrm{Precision}^\omega + \mathrm{Recall}^\omega} \tag{23} $$
MAE was an auxiliary measure that calculated the average absolute difference between the saliency map and the corresponding ground truth:
$$ \mathrm{MAE} = \frac{1}{W \times H} \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} |S(i,j) - GT(i,j)| \tag{24} $$
A lower computed value meant a better performance.
AUC was defined as the area under the ROC curve; the ROC curve was obtained by computing the true positive rate (TPR) and false positive rate (FPR) under different thresholds.
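For reproducibility, the sketch below computes MAE, precision/recall at a fixed threshold, and the F-measure with $\beta^2 = 0.3$ from a saliency map and its ground truth. This is our illustrative implementation of the definitions above, not the benchmarks' official evaluation code; the default adaptive threshold is an assumption noted in the comments.

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between a saliency map and its ground truth, both scaled to [0, 1]."""
    return np.abs(sal - gt).mean()

def f_measure(sal, gt, beta2=0.3, threshold=None):
    """Precision, recall, and F-measure of the binarized saliency map (Equations (19) and (22)).

    When no threshold is given, the common adaptive choice of twice the mean
    saliency is used (our assumption; the paper does not specify the threshold).
    """
    if threshold is None:
        threshold = 2.0 * sal.mean()
    M = sal >= threshold
    GT = gt >= 0.5
    tp = np.logical_and(M, GT).sum()
    precision = tp / (M.sum() + 1e-12)
    recall = tp / (GT.sum() + 1e-12)
    f = (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-12)
    return precision, recall, f
```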

4.2. Comparison with State-of-the-Art Methods

This section comprehensively evaluates and analyzes both quantitative and qualitative comparisons of the experimental results for the ECSSD, SOD, DUTOMRON, HUK-IS, and SED2 datasets.
Figure 4 shows the PR curves of our proposed method and the compared methods. It is noteworthy that our method outperformed the other state-of-the-art methods on DUTOMRON, ECSSD, SOD, and HUK-IS, and it was clearly better than all baseline methods except AME and HCA. The main reason is that AME and HCA utilize deeper or learned semantic features, while our proposed method employs only multi-view low-level features. On SED2, our method also achieved a better result. Moreover, Table 2, Table 3, Table 4, Table 5 and Table 6 present the quantitative comparison results, including the S-measure, E-measure, F-measure, MAE, AUC, WF, and OR metrics. Table 2, Table 3, Table 4 and Table 5 further demonstrate the superiority of our proposed method on ECSSD, SOD, HUK-IS, and DUTOMRON. As shown in Table 6, our method achieved considerable performance compared with the baseline methods.
Visual comparisons of saliency maps of different algorithms in typical scenarios are shown in Figure 5. Intuitively, our method could accurately extract the salient objects in complex scenes. Even when salient objects were extremely similar to their surroundings (Images 4, 5, 11, 12, and 15), the proposed method could still generate better saliency results. These examples illustrate the effectiveness and robustness of the proposed method. Overall, the visual comparisons clearly demonstrate that our proposed method performed better than the state-of-the-art methods on challenging scenes.

4.3. Comparison of Ablation Experiments

In this subsection, we describe the related ablation experiments, including the graph model analysis and each stage of graph contribution.
In the first stage of graph-based manifold ranking, the key component is the newly constructed graph $W^{(1)}$. Accordingly, we tested the proposed scheme with the traditional multi-view graph $W^{(T)}$ and with the new multi-view graph $W^{(1)}$. From Figure 6a and Table 7, we can observe that the new multi-view graph $W^{(1)}$ performed significantly better than the traditional multi-view graph $W^{(T)}$ and the baseline method. Meanwhile, Figure 7 illustrates the superiority of the proposed method through visual comparisons. As for the contributions of the two-stage graphs, Figure 6b and Table 8 present the PR curves and other quantitative results. It can be seen that the affinity graph $W^{(TC)}$ contributed to the two-stage graph-based manifold ranking. Table 8 quantitatively demonstrates the contribution of the proposed second-stage graph. Figure 8 provides visual comparisons that further show the progression from coarse to fine.
Finally, we analyzed the influence of the number of superpixels, as shown in Figure 9 and Table 9. The best performance of the proposed method was obtained when the number of superpixels was set to $N = 400$.

4.4. Extended Experiment on High-Resolution Dataset

To demonstrate the potential applicability of our proposed method in broader contexts, we provide quantitative comparisons on a high-resolution dataset. Most graph-based saliency detection methods focus on natural images with low resolutions, such as 400 × 400 or smaller, which limits their potential applications. Therefore, we tested our method on the high-resolution dataset HRSOD and compared it with the graph-based models GMR and IDCL and with the PDP method. Figure 10 presents the PR curves of the evaluated methods on the HRSOD dataset; the proposed method achieved the best performance. These results indicate that, compared with the three competing methods, the proposed method has strong potential for practical applications.

4.5. Running Time

The computational time of the proposed method was analyzed on the ECSSD benchmark dataset. The proposed method was implemented in MATLAB 2016b on a 13th Gen Intel(R) Core(TM) i7-13700K CPU @ 3.40 GHz with 32.0 GB of RAM. It took approximately 5.79 s on average to generate the saliency map for experimental images of 400 × 267 pixels. Table 10 shows the average runtime of our method and the baseline models. Our method was slower than some of the competing methods but achieved better comprehensive evaluation performance. Moreover, the competing models AME and HCA additionally require a significant amount of time to train multi-view semantic features using neural networks.

5. Discussion

In this work, we explored graphs with low-level multi-view features and computed saliency scores by following the background prior. This can produce poor saliency results, as shown in Figure 11: for challenging scenes, low-level multi-view features alone are insufficient to identify the salient objects. In a future study, we plan to exploit multi-view deep-level semantic information to construct a "good" graph matrix for hyperspectral images. In addition, it is worth mentioning that eye movement data can be used to describe clinically relevant regions in a medical image and can potentially be integrated into an artificial intelligence (AI) system for automatic diagnosis in medical imaging. Therefore, we will be committed to researching graph neural networks for saliency detection and applying them to chest X-rays.

6. Conclusions

In this paper, we proposed novel multi-scale pure graphs with multi-view subspace clustering for salient object detection. This work presented two-stage graphs constrained by multi-view subspace clustering with sparsity and low-rank constraints and embedded them into manifold ranking to compute the saliency map. The proposed graphs limit the propagation of noise in the graph model and exploit consistency and complementary information among multi-view features; both properties effectively boost the capability of the graph model. To validate the promising performance of the proposed method in terms of standard evaluation indexes, experiments on several RGB benchmark datasets and comparisons with several state-of-the-art baseline methods were carried out. The superior saliency performance demonstrates the good generalization capability of our proposed saliency detection framework.

Author Contributions

Conceptualization, M.W. and F.W.; Methodology, M.W. and F.W.; Software, H.Y. and W.W.; Validation, M.W., F.W. and H.Y.; Formal analysis, M.W. and F.W.; Resources, F.W.; Data curation, M.W.; Writing—original draft, M.W.; Writing—review and editing, M.W., Y.Z. and F.W.; Visualization, H.Y.; Supervision, M.W., Y.Z. and F.W.; Project administration, M.W.; Funding acquisition, M.W., Y.Z. and F.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported in part by the National Natural Science Foundation of China under grant 12401673, the Graduate Education Comprehensive Reform Project of Xi’an Shiyou University under grant 2023-X-YJG-021, the Deep Earth Probe and Mineral Resources Exploration-National Science and Technology Major Project under grant 2024ZD1004406, the Natural Science Basic Research Program of Shaanxi under grant 2024JC-YBQN-0670, and the Youth Innovation Team of Shaanxi Universities (Yi Zhang).

Data Availability Statement

All data generated or analysed during this study are included in this article.

Acknowledgments

We sincerely thank all the creators and funding programs that were involved in the writing of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hsu, K.J.; Lin, Y.Y.; Chuang, Y.Y. DeepCO3: Deep Instance Co-Segmentation by Co-Peak Search and Co-Saliency Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8838–8847. [Google Scholar] [CrossRef]
  2. Yang, X.; Qian, X.; Xue, Y. Scalable Mobile Image Retrieval by Exploring Contextual Saliency. IEEE Trans. Image Process. 2015, 24, 1709–1721. [Google Scholar] [CrossRef] [PubMed]
  3. Ma, J.; Tang, L.; Xu, M.; Zhang, H.; Xiao, G. STDFusionNet: An Infrared and Visible Image Fusion Network Based on Salient Target Detection. IEEE Trans. Instrum. Meas. 2021, 70, 5009513. [Google Scholar] [CrossRef]
  4. Wang, Q.; Chen, F.; Xu, W. Saliency selection for robust visual tracking. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; pp. 2785–2788. [Google Scholar]
  5. Deng, T.; Yang, K.; Li, Y.; Yan, H. Where Does the Driver Look? Top-Down-Based Saliency Detection in a Traffic Driving Environment. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2051–2062. [Google Scholar] [CrossRef]
  6. Itti, L.; Koch, C.; Niebur, E. A Model of Saliency-based Visual Attention for Rapid Scene Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 1254–1259. [Google Scholar] [CrossRef]
  7. Harel, J.; Koch, C.; Perona, P. Graph-Based Visual Saliency. In Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference; MIT Press: Cambridge, MA, USA, 2007; pp. 545–552. [Google Scholar]
  8. Hou, X.; Zhang, L. Saliency Detection: A Spectral Residual Approach. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8. [Google Scholar]
  9. Achanta, R.; Estrada, F.; Wils, P.; Süsstrunk, S. Salient Region Detection and Segmentation; Springer: Berlin, Germany, 2008. [Google Scholar] [CrossRef]
  10. Achanta, R.; Hemami, S.; Estrada, F.; Süsstrunk, S. Frequency-tuned salient region detection. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009, Miami, FL, USA, 20–25 June 2009; pp. 1597–1604. [Google Scholar] [CrossRef]
  11. Cheng, M.M.; Zhang, G.X.; Mitra, N.; Huang, X.; Hu, S.M. Global Contrast Based Salient Region Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 37, 409–416. [Google Scholar] [CrossRef]
  12. Perazzi, F.; Krähenbühl, P.; Pritch, Y.; Hornung, A. Saliency filters: Contrast based filtering for salient region detection. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 733–740. [Google Scholar]
  13. Jiang, H.; Wang, J.; Yuan, Z.; Wu, Y.; Zheng, N.; Li, S. Salient Object Detection: A Discriminative Regional Feature Integration Approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 2083–2090. [Google Scholar]
  14. Yang, C.; Zhang, L.; Lu, H.; Ruan, X.; Yang, M.H. Saliency Detection via Graph-Based Manifold Ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 3166–3173. [Google Scholar]
  15. Li, X.; Zhao, L.; Wei, L.; Yang, M.H.; Wu, F.; Zhuang, Y.; Ling, H.; Wang, J. DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection. IEEE Trans. Image Process. 2015, 25, 3919–3930. [Google Scholar] [CrossRef]
  16. Liu, N.; Han, J. DHSNet: Deep hierarchical saliency network for salient object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 678–686. [Google Scholar]
  17. Li, G.; Yu, Y. Visual Saliency Detection Based on Multiscale Deep CNN Features. IEEE Trans. Image Process. 2016, 25, 5012–5024. [Google Scholar] [CrossRef]
  18. Zhu, X.; Tang, C.; Wang, P.; Xu, H.; Wang, M.; Tian, J. Saliency Detection via Affinity Graph Learning and Weighted Manifold Ranking. Neurocomputing 2018, 312, 239–250. [Google Scholar] [CrossRef]
  19. Zhang, M.; Pang, Y.; Wu, Y.; Du, Y.; Sun, H.; Zhang, K. Saliency Detection via Local Structure Propagation. J. Vis. Commun. Image Represent. 2018, 52, 131–142. [Google Scholar] [CrossRef]
  20. Ji, Y.; Zhang, H.; Tseng, K.K.; Chow, T.W.; Wu, Q.J. Graph Model-based Salient Object Detection Using Objectness and Multiple Saliency Cues. Neurocomputing 2018, 323, 188–202. [Google Scholar] [CrossRef]
  21. Liu, Y.; Han, J.; Zhang, Q.; Wang, L. Salient Object Detection via Two-Stage Graphs. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 1023–1037. [Google Scholar] [CrossRef]
  22. Brbić, M.; Kopriva, I. Multi-view low-rank sparse subspace clustering. Pattern Recognit. 2018, 73, 247–258. [Google Scholar] [CrossRef]
  23. Tang, C.; Zhu, X.; Liu, X.; Li, M.; Wang, P.; Zhang, C.; Wang, L. Learning a Joint Affinity Graph for Multiview Subspace Clustering. IEEE Trans. Multimed. 2019, 21, 1724–1736. [Google Scholar] [CrossRef]
  24. Jiang, B.; Zhang, L.; Lu, H.; Yang, C.; Yang, M.H. Saliency Detection via Absorbing Markov Chain. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 1665–1672. [Google Scholar]
  25. Zhou, L.; Yang, Z.; Yuan, Q.; Zhou, Z.; Hu, D. Salient Region Detection via Integrating Diffusion-Based Compactness and Local Contrast. IEEE Trans. Image Process. 2015, 24, 3308–3320. [Google Scholar] [CrossRef]
  26. Qin, Y.; Lu, H.; Xu, Y.; Wang, H. Saliency Detection via Cellular Automata. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 110–119. [Google Scholar]
  27. Wang, F.; Wang, M.; Peng, G. Multiview diffusion-based affinity graph learning with good neighbourhoods for salient object detection. Appl. Intell. 2025, 55, 37. [Google Scholar] [CrossRef]
  28. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  29. Razavian, A.S.; Azizpour, H.; Sullivan, J.; Carlsson, S. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 512–519. [Google Scholar] [CrossRef]
  30. Wang, Q.; Zheng, W.; Piramuthu, R. GraB: Visual Saliency via Novel Graph Model and Background Priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 535–543. [Google Scholar]
  31. Zhang, Y.Y.; Zhang, S.; Zhang, P.; Song, H.Z.; Zhang, X.G. Local Regression Ranking for Saliency Detection. IEEE Trans. Image Process. 2019, 29, 1536–1547. [Google Scholar] [CrossRef]
  32. Xia, C.; Zhang, H.; Gao, X.; Li, K. Exploiting background divergence and foreground compactness for Salient object detection. Neurocomputing 2019, 383, 194–211. [Google Scholar] [CrossRef]
  33. Deng, C.; Yang, X.; Nie, F.; Tao, D. Saliency Detection via a Multiple Self-Weighted Graph-Based Manifold Ranking. IEEE Trans. Multimed. 2020, 22, 885–896. [Google Scholar] [CrossRef]
  34. Zhang, K.; Li, T.; Shen, S.; Liu, B.; Chen, J.; Liu, Q. Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-Saliency Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9047–9056. [Google Scholar] [CrossRef]
  35. Ji, W.; Li, X.; Wei, L.; Wu, F.; Zhuang, Y. Context-Aware Graph Label Propagation Network for Saliency Detection. IEEE Trans. Image Process. 2020, 29, 8177–8186. [Google Scholar] [CrossRef] [PubMed]
  36. Fang, X.; Jiang, M.; Zhu, J.; Shao, X.; Wang, H. GroupTransNet: Group transformer network for RGB-D salient object detection. Neurocomputing 2024, 594, 127865. [Google Scholar] [CrossRef]
  37. Zhong, M.; Sun, J.; Ren, P.; Wang, F.; Sun, F. MAGNet: Multi-scale Awareness and Global fusion Network for RGB-D salient object detection. Knowl.-Based Syst. 2024, 299, 112126. [Google Scholar] [CrossRef]
  38. Zhao, J.; Jia, Y.; Ma, L.; Yu, L. Recurrent Adaptive Graph Reasoning Network With Region and Boundary Interaction for Salient Object Detection in Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5630720. [Google Scholar] [CrossRef]
  39. Wu, Z.; Lu, J.; Han, J.; Bai, L.; Zhang, Y.; Zhao, Z.; Song, S. Domain Separation Graph Neural Networks for Saliency Object Ranking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 3964–3974. [Google Scholar]
  40. Yang, Q.; Gao, W.; Li, C.; Wang, H.; Dai, W.; Zou, J.; Xiong, H.; Frossard, P. 360Spred: Saliency Prediction for 360-Degree Videos Based on 3D Separable Graph Convolutional Networks. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 9979–9996. [Google Scholar] [CrossRef]
  41. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC Superpixels Compared to State-of-the-Art Superpixel Methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282. [Google Scholar] [CrossRef]
  42. Peng, H.; Li, B.; Ling, H.; Hu, W.; Xiong, W.; Maybank, S.J. Salient Object Detection via Structured Matrix Decomposition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 818–832. [Google Scholar] [CrossRef] [PubMed]
  43. Lan, R.; Zhou, Y.; Tang, Y.Y. Quaternionic Weber Local Descriptor of Color Images. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 261–274. [Google Scholar] [CrossRef]
  44. Zhang, L.; Gu, Z.; Li, H. SDSP: A novel saliency detection method by combining simple priors. In Proceedings of the 2013 IEEE International Conference on Image Processing, Melbourne, Australia, 15–18 September 2013; pp. 171–175. [Google Scholar]
  45. Tong, N.; Lu, H.; Ruan, X.; Yang, M.H. Salient Object Detection via Bootstrap Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1884–1892. [Google Scholar]
  46. Yan, Q.; Xu, L.; Shi, J.; Jia, J. Hierarchical Saliency Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 1155–1162. [Google Scholar]
  47. Movahedi, V.; Elder, J.H. Design and perceptual validation of performance measures for salient object segmentation. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 49–56. [Google Scholar]
  48. Li, G.; Yu, Y. Visual saliency based on multiscale deep features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5455–5463. [Google Scholar]
  49. Alpert, S.; Galun, M.; Brandt, A.; Basri, R. Image Segmentation by Probabilistic Bottom-Up Aggregation and Cue Integration. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 315–327. [Google Scholar] [CrossRef]
  50. Zhu, W.; Liang, S.; Wei, Y.; Sun, J. Saliency Optimization from Robust Background Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 2814–2821. [Google Scholar]
  51. Zhou, L.; Yang, Z.; Zhou, Z.; Hu, D. Salient Region Detection Using Diffusion Process on a Two-Layer Sparse Graph. IEEE Trans. Image Process. 2017, 26, 5882–5894. [Google Scholar] [CrossRef]
  52. Zheng, Q.; Yu, S.; You, X. Coarse-to-fine salient object detection with low-rank matrix recovery. Neurocomputing 2020, 376, 232–243. [Google Scholar] [CrossRef]
  53. Xiao, X.; Zhou, Y.; Gong, Y.J. RGB-‘D’ Saliency Detection With Pseudo Depth. IEEE Trans. Image Process. 2019, 28, 2126–2139. [Google Scholar] [CrossRef]
  54. Zhang, L.; Ai, J.; Jiang, B.; Lu, H.; Li, X. Saliency Detection via Absorbing Markov Chain With Learnt Transition Probability. IEEE Trans. Image Process. 2018, 27, 987–998. [Google Scholar] [CrossRef] [PubMed]
  55. Qin, Y.; Feng, M.; Lu, H.; Cottrell, G.W. Hierarchical Cellular Automata for Visual Saliency. Int. J. Comput. Vis. 2018, 126, 751–770. [Google Scholar] [CrossRef]
  56. Fan, D.P.; Cheng, M.M.; Liu, Y.; Li, T.; Borji, A. Structure-Measure: A New Way to Evaluate Foreground Maps. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4548–4557. [Google Scholar]
  57. Fan, D.P.; Gong, C.; Cao, Y.; Ren, B.; Cheng, M.M.; Borji, A. Enhanced-alignment Measure for Binary Foreground Map Evaluation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence Main Track, Stockholm, Sweden, 13–19 July 2018; pp. 698–704. [Google Scholar]
  58. Margolin, R.; Zelnik-Manor, L.; Tal, A. How to Evaluate Foreground Maps. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 248–255. [Google Scholar] [CrossRef]
Figure 1. Visual results of graph-based manifold ranking. (a) Input image, (b) GMR [14], (c) WMR [18], (d) proposed method, (e) ground truth.
Figure 2. Diagram of our proposed method.
Figure 3. Visual results of the first-stage graph-based manifold ranking. (a) Input image, (b) GMR [14], (c) WMR [18], (d) proposed method, (e) ground truth.
Figure 4. Quantitative comparisons on five datasets in terms of PR curves: (a) ECSSD, (b) SOD, (c) DUTOMRON, (d) HUK-IS, (e) SED2.
Figure 5. Visual comparisons of saliency maps of different algorithms in different scenarios: (a) input image, (b) GMR, (c) RBD, (d) BSCA, (e) SMD, (f) DP2-LSG, (g) RCRR, (h) WMR, (i) DRFI, (j) HCA, (k) AME, (l) Ours, (m) GT.
Figure 6. Illustrations of the superiority of our proposed “superior” affinity matrix on the ECSSD dataset: (a) Comparisons of the PR curves of our proposed affinity matrix W ( 1 ) and the traditional multi-view graph W T , (b) Comparisons of the PR curves of two-stage graphs with W ( T C ) and without W ( T C ) , respectively.
Figure 7. Saliency maps of ablation experiments in different scenarios: (a) input image, (b) GMR, (c) ours by W ( T ) , (d) ours, (e) GT.
Figure 8. Saliency maps of ablation experiments in different scenarios: (a) input image, (b) first stage of ours without W ( T C ) , (c) second stage of ours without W ( T C ) , (d) first stage of ours with W ( T C ) , (e) second stage of ours with W ( T C ) , (f) GT.
Figure 9. Comparisons of the PR curves of the different numbers of superpixels on the overall performance.
Figure 10. Comparisons of the PR curves of different graph-based methods on the HRSOD dataset.
Figure 11. Failure cases of our proposed method: (a) input image, (b) our proposed method, (c) GT.
Table 1. The details of the multi-view features.
Types | Feature Descriptions | Dim
Color features | The average RGB values of each superpixel | 3
Color features | The average LAB values of each superpixel | 3
Color features | The average HSV values of each superpixel | 3
Texture features | The Gabor features of each superpixel [42] | 36
Texture features | The steerable pyramids features of each superpixel [42] | 12
Texture features | The average LBP features of each superpixel | 1
Texture features | The QWLD features of each superpixel [43] | 1
Saliency priors | The warm color prior map [44] | 1
Saliency priors | The SR prior map [8] | 1
Saliency priors | The dark channel prior maps [45] | 3
Table 2. Quantitative comparisons on ECSSD in terms of S-measure, E-measure, F-measure, MAE, AUC, WF and OR scores.
Methods | S-Measure ↑ | E-Measure ↑ | F-Measure ↑ | MAE ↓ | AUC ↑ | OR ↑ | WF ↑
DRFI | 0.752 | 0.816 | 0.733 | 0.164 | 0.833 | 0.584 | 0.542
GMR | 0.689 | 0.774 | 0.689 | 0.189 | 0.790 | 0.520 | 0.493
RBD | 0.689 | 0.787 | 0.676 | 0.189 | 0.781 | 0.525 | 0.513
BSCA | 0.725 | 0.797 | 0.702 | 0.182 | 0.815 | 0.549 | 0.513
SMD | 0.734 | 0.800 | 0.712 | 0.173 | 0.811 | 0.560 | 0.537
2LSG | 0.702 | 0.786 | 0.703 | 0.181 | 0.795 | 0.541 | 0.510
RCRR | 0.694 | 0.781 | 0.693 | 0.184 | 0.793 | 0.529 | 0.498
WMR | 0.698 | 0.779 | 0.684 | 0.191 | 0.798 | 0.527 | 0.497
AME | 0.775 | 0.824 | 0.789 | 0.168 | 0.832 | 0.628 | 0.586
HCA | 0.707 | 0.825 | 0.778 | 0.119 | 0.781 | 0.616 | 0.674
Ours | 0.756 | 0.821 | 0.756 | 0.145 | 0.816 | 0.614 | 0.603
* Red indicates the first-best, green indicates the second-best, and blue indicates the third-best.
Table 3. Quantitative comparisons on SOD in terms of S-measure, E-measure, F-measure, MAE, AUC, WF, and OR scores.
Methods | S-Measure ↑ | E-Measure ↑ | F-Measure ↑ | MAE ↓ | AUC ↑ | OR ↑ | WF ↑
DRFI | 0.625 | 0.714 | 0.626 | 0.226 | 0.752 | 0.437 | 0.438
GMR | 0.589 | 0.676 | 0.577 | 0.259 | 0.714 | 0.384 | 0.405
RBD | 0.589 | 0.700 | 0.596 | 0.229 | 0.706 | 0.406 | 0.428
BSCA | 0.622 | 0.692 | 0.582 | 0.252 | 0.738 | 0.396 | 0.432
SMD | 0.632 | 0.702 | 0.606 | 0.234 | 0.732 | 0.378 | 0.411
2LSG | 0.591 | 0.670 | 0.606 | 0.254 | 0.702 | 0.378 | 0.420
RCRR | 0.590 | 0.672 | 0.574 | 0.256 | 0.714 | 0.529 | 0.498
WMR | 0.591 | 0.672 | 0.558 | 0.266 | 0.717 | 0.356 | 0.409
AME | 0.633 | 0.704 | 0.677 | 0.229 | 0.752 | 0.454 | 0.490
HCA | 0.639 | 0.702 | 0.634 | 0.203 | 0.694 | 0.435 | 0.537
Ours | 0.642 | 0.716 | 0.637 | 0.219 | 0.732 | 0.454 | 0.491
* Red indicates the first-best, green indicates the second-best, and blue indicates the third-best.
Table 4. Quantitative comparisons on DUTOMRON in terms of S-measure, E-measure, F-measure, MAE, AUC, WF, and OR scores.
Methods | S-Measure ↑ | E-Measure ↑ | F-Measure ↑ | MAE ↓ | AUC ↑ | OR ↑ | WF ↑
DRFI | 0.696 | 0.738 | 0.623 | 0.155 | 0.857 | 0.451 | 0.408
GMR | 0.645 | 0.723 | 0.527 | 0.197 | 0.781 | 0.419 | 0.379
RBD | 0.681 | 0.720 | 0.528 | 0.144 | 0.814 | 0.432 | 0.428
BSCA | 0.652 | 0.706 | 0.567 | 0.191 | 0.808 | 0.409 | 0.392
SMD | 0.680 | 0.728 | 0.572 | 0.166 | 0.809 | 0.440 | 0.424
2LSG | 0.664 | 0.741 | 0.573 | 0.177 | 0.795 | 0.494 | 0.406
RCRR | 0.649 | 0.720 | 0.527 | 0.182 | 0.779 | 0.421 | 0.384
AME | 0.613 | 0.713 | 0.692 | 0.271 | 0.841 | 0.425 | 0.283
HCA | 0.671 | 0.701 | 0.539 | 0.156 | 0.776 | 0.438 | 0.475
Ours | 0.704 | 0.753 | 0.589 | 0.144 | 0.815 | 0.491 | 0.464
* Red indicates the first-best, green indicates the second-best, and blue indicates the third-best.
Table 5. Quantitative comparisons on HUK-IS in terms of S-measure, E-measure, F-measure, MAE, AUC, WF, and OR scores.
Methods | S-Measure ↑ | E-Measure ↑ | F-Measure ↑ | MAE ↓ | AUC ↑ | OR ↑ | WF ↑
DRFI | 0.735 | 0.831 | 0.738 | 0.148 | 0.849 | 0.571 | 0.498
GMR | 0.674 | 0.792 | 0.661 | 0.175 | 0.794 | 0.501 | 0.456
RBD | 0.707 | 0.812 | 0.677 | 0.143 | 0.810 | 0.538 | 0.516
BSCA | 0.700 | 0.794 | 0.649 | 0.176 | 0.821 | 0.509 | 0.464
SMD | 0.726 | 0.815 | 0.689 | 0.157 | 0.825 | 0.549 | 0.512
2LSG | 0.692 | 0.807 | 0.663 | 0.166 | 0.808 | 0.539 | 0.479
RCRR | 0.679 | 0.794 | 0.664 | 0.171 | 0.797 | 0.507 | 0.459
AME | 0.765 | 0.860 | 0.772 | 0.137 | 0.845 | 0.636 | 0.573
HCA | 0.743 | 0.824 | 0.740 | 0.113 | 0.786 | 0.581 | 0.628
Ours | 0.745 | 0.829 | 0.725 | 0.132 | 0.826 | 0.592 | 0.567
* Red indicates the first-best, green indicates the second-best, and blue indicates the third-best.
Table 6. Quantitative comparisons on SED2 in terms of S-measure, E-measure, F-measure, MAE, AUC, WF, and OR scores.
Methods | S-Measure ↑ | E-Measure ↑ | F-Measure ↑ | MAE ↓ | AUC ↑ | OR ↑ | WF ↑
DRFI | 0.766 | 0.810 | 0.731 | 0.130 | 0.828 | 0.613 | 0.637
GMR | 0.688 | 0.807 | 0.727 | 0.184 | 0.728 | 0.541 | 0.570
RBD | 0.751 | 0.830 | 0.780 | 0.130 | 0.776 | 0.598 | 0.641
BSCA | 0.716 | 0.791 | 0.704 | 0.159 | 0.772 | 0.539 | 0.540
SMD | 0.753 | 0.832 | 0.755 | 0.131 | 0.776 | 0.588 | 0.636
2LSG | 0.707 | 0.817 | 0.747 | 0.161 | 0.744 | 0.579 | 0.591
RCRR | 0.692 | 0.798 | 0.727 | 0.160 | 0.733 | 0.542 | 0.576
WMR | 0.707 | 0.798 | 0.704 | 0.153 | 0.741 | 0.539 | 0.577
AME | 0.701 | 0.765 | 0.698 | 0.156 | 0.745 | 0.507 | 0.548
Ours | 0.746 | 0.834 | 0.763 | 0.136 | 0.770 | 0.608 | 0.637
* Red indicates the first-best, green indicates the second-best, and blue indicates the third-best.
Table 7. Quantitative comparisons on ECSSD in terms of ablation experiments.
Methods | S-Measure ↑ | E-Measure ↑ | F-Measure ↑ | MAE ↓ | AUC ↑ | OR ↑ | WF ↑
GMR | 0.689 | 0.774 | 0.689 | 0.189 | 0.790 | 0.520 | 0.493
Ours by $W^{(T)}$ | 0.697 | 0.771 | 0.671 | 0.192 | 0.803 | 0.524 | 0.497
Ours | 0.756 | 0.821 | 0.756 | 0.145 | 0.816 | 0.614 | 0.603
* Red indicates the first-best, green indicates the second-best, and blue indicates the third-best.
Table 8. Quantitative comparisons on ECSSD in terms of ablation experiments.
Methods | S-Measure ↑ | E-Measure ↑ | F-Measure ↑ | MAE ↓ | AUC ↑ | OR ↑ | WF ↑
Stage 1 of ours with $W^{(TC)}$ | 0.744 | 0.814 | 0.748 | 0.162 | 0.828 | 0.608 | 0.559
Stage 2 of ours with $W^{(TC)}$ | 0.756 | 0.821 | 0.756 | 0.145 | 0.816 | 0.614 | 0.603
Stage 1 of ours without $W^{(TC)}$ | 0.729 | 0.807 | 0.739 | 0.164 | 0.821 | 0.593 | 0.549
Stage 2 of ours without $W^{(TC)}$ | 0.733 | 0.803 | 0.737 | 0.150 | 0.804 | 0.593 | 0.586
* Red indicates the first-best, green indicates the second-best, and blue indicates the third-best.
Table 9. Quantitative comparisons of the different numbers of superpixels on the overall performance.
Superpixels | F-Measure ↑ | MAE ↓ | AUC ↑ | OR ↑ | WF ↑
N = 200 | 0.677 | 0.148 | 0.782 | 0.579 | 0.591
N = 300 | 0.677 | 0.144 | 0.805 | 0.605 | 0.602
N = 400 | 0.660 | 0.146 | 0.817 | 0.617 | 0.603
N = 500 | 0.673 | 0.150 | 0.824 | 0.613 | 0.594
Table 10. Average runtime comparison of methods on ECSSD1000 dataset.
Methods | GMR | BSCA | DRFI | AME | HCA | Ours
Runtime (s) | 0.21 | 0.29 | 2.93 | 5.11 | 1.59 | 2.83
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
