Beyond Missing Data: A Multi-Scale Graph Fusion Framework for Sustainable Development Insights

Chen, Zhikui; Zhang, Hongwei; Liu, Zhenjiao; Zheng, Hao; Zhao, Liang

doi:10.3390/su17031136

by Zhikui Chen ¹,², Hongwei Zhang ¹,², Zhenjiao Liu ¹,², Hao Zheng ¹,² and Liang Zhao ¹,²,*

¹ School of Software Technology, Dalian University of Technology, Dalian 116620, China
² Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian 116620, China
* Author to whom correspondence should be addressed.

Sustainability 2025, 17(3), 1136; https://doi.org/10.3390/su17031136

Submission received: 13 December 2024 / Revised: 15 January 2025 / Accepted: 28 January 2025 / Published: 30 January 2025

(This article belongs to the Special Issue Data-Driven Sustainable Development: Techniques and Applications)


Abstract

In the context of sustainable development, particularly in environmental monitoring and resource management, data from multiple heterogeneous sources are often incomplete or inconsistent. This presents a significant challenge for data-driven analysis, especially in tasks like clustering, where the goal is to extract meaningful patterns from multi-view data. Incomplete multi-view clustering (IMVC) aims to address this challenge by effectively leveraging complementary and consistent information despite the missing data. However, traditional graph-based clustering methods that rely on Euclidean distance often fail to capture the complex structures in high-dimensional incomplete data. To overcome this limitation, we propose Motif-Based Multi-Scale Bipartite Graph Fusion (MMBGF_IMC), a novel framework that combines multi-scale measurements with ensemble clustering. By integrating higher-order graph motifs, MMBGF_IMC significantly enhances the representation of inter-instance correlations. Empirical results on seven real-world datasets demonstrate that MMBGF_IMC outperforms existing methods by an average of 5–15% in clustering accuracy (ACC) and normalized mutual information (NMI), offering an effective solution for data fusion, modeling, and mining in sustainable development applications such as ecological monitoring, urban planning, and resource management.

Keywords: graph fusion; incomplete multi-view clustering; motif; spectral clustering

1. Introduction

In the face of increasingly severe global challenges, including resource depletion, environmental degradation, and the pressing need for sustainable development, data-driven approaches have emerged as crucial tools in addressing these issues. Sustainable development, which aims to balance economic, environmental, and social needs, has become a global priority. In this context, big data offers a wealth of information and potential insights that can significantly contribute to the realization of sustainable development goals (SDGs) [1]. From ecological conservation [2] to energy management [3], urban planning [4], and public health [5], diverse fields can benefit from data-driven solutions that enable more informed decision-making. However, the data sources for these applications are often complex, fragmented, and incomplete, making effective data integration and analysis a key challenge in achieving the goals of sustainable development.

Multi-view clustering (MVC) is a powerful technique that can address this challenge by combining information from multiple views or sources to improve the clustering accuracy and robustness compared to single-view methods [6,7,8,9,10]. In real-world applications, data are often collected from multiple sources that offer different perspectives on the same underlying objects, such as sensor data, satellite images, or social media content. Multi-view clustering has the potential to provide a more comprehensive understanding by integrating these diverse data sources. However, existing MVC methods struggle with specific problems: they often assume complete data across all views, which is rarely the case in practice. Missing or partial data in certain views are a common issue, especially when data are collected from heterogeneous sources. This problem is particularly evident in areas like environmental monitoring, where sensors might fail, or in healthcare, where data are often sparse across different domains. Hence, addressing the issue of data incompleteness in multi-view clustering is critical for enabling its widespread application in real-world scenarios, particularly in the context of sustainable development.

To overcome these challenges, graph-based incomplete multi-view clustering (IMVC) methods have been developed to manage missing data by employing strategies that complete or infer the absent views. Graph-based methods, especially spectral clustering, have proven to be highly effective in capturing the complex relationships and intrinsic structures within data [11,12,13,14]. Spectral clustering typically involves constructing similarity graphs for each view, followed by the computation of the eigenvectors of the graph’s Laplacian matrix to partition the data into clusters [15,16]. These methods have shown significant promise in multi-view clustering tasks, particularly in high-dimensional or heterogeneous datasets. However, these methods are constrained by their reliance on Euclidean distance metrics, which fall short in capturing the non-linear and intricate structures inherent in high-dimensional, incomplete data [17,18]. This limitation is particularly problematic in sustainable development applications, where data from diverse sources such as environmental sensors, satellite imagery, and health records exhibit complex relationships that are not effectively modeled by traditional similarity measures. Moreover, when the data are incomplete, traditional graph-based methods tend to struggle, as missing data in one or more views can lead to inaccurate or suboptimal clustering results. For instance, certain approaches construct similarity graphs for each view and apply weighted fusion techniques to mitigate the impact of missing data.

In this paper, we propose a novel approach to incomplete multi-view clustering by introducing motif-based bipartite graph fusion. The motivation behind our method stems from the need to effectively integrate complementary and consistent information from multiple views while overcoming the limitations of existing IMVC techniques. By leveraging multi-scale measurement techniques and ensemble clustering, our method constructs robust bipartite similarity graphs that can accommodate varying neighborhood structures and capture complex relationships within the data. This approach not only addresses the challenge of data incompleteness but also captures more nuanced relationships between data points by integrating information across different scales and views. By using multi-scale measurements and various k-nearest neighbor strategies, our method creates more reliable similarity graphs, improving the clustering accuracy and robustness. The motif-based graph fusion further enhances the integration of diverse views, leading to more effective multi-view clustering, especially in the presence of missing data.

The main contributions of this work are as follows:

We introduce a novel incomplete multi-view clustering framework that integrates multi-scale measurements and ensemble clustering into the graph construction process, generating robust bipartite similarity graphs.
We propose a motif-based multi-scale bipartite graph fusion method that effectively integrates information from multiple views, addressing data incompleteness while preserving the underlying structural relationships.
We demonstrate the effectiveness and robustness of our method through experiments on several benchmark datasets, highlighting its ability to handle incomplete and heterogeneous data in the context of sustainable development research.

Our method represents a significant advancement in the field of multi-view clustering, providing a more accurate and robust solution to the challenge of incomplete data. It holds great potential for real-world applications in sustainable development, where data from diverse sources are often incomplete, fragmented, or noisy. By enhancing the ability to integrate and analyze multi-view data, our approach can contribute to more informed decision-making in critical areas such as environmental monitoring, resource management, healthcare, and urban planning.

2. Materials and Methods

2.1. Preliminaries

Spectral clustering, which relies on graph theory, is widely recognized for its effectiveness in identifying clusters within complex data structures. Given a data matrix $X = [x_{1}, x_{2}, \dots, x_{n}]$, where each instance $x_{i} \in \mathbb{R}^{d \times 1}$, spectral clustering models these instances as nodes in a graph $G = (X, A)$, with $A_{ij}$ representing the similarity between instances $x_{i}$ and $x_{j}$ [19]. A common similarity function is the Gaussian kernel, defined as follows:

$$A_{ij} = \exp\left(-\frac{\|x_{i} - x_{j}\|_{2}^{2}}{2\sigma^{2}}\right), \tag{1}$$

where $\sigma$ determines the rate of similarity decay with respect to the distance between instances. The degree matrix $D$ of the graph $G$, which encodes the connectivity of nodes, is diagonal with entries $D_{ii} = d_{i}$, computed as follows:

$$d_{i} = \sum_{j=1}^{n} A_{ij}. \tag{2}$$

Spectral clustering seeks to find a low-dimensional representation $F \in \mathbb{R}^{n \times c}$ for clustering purposes by solving the following optimization problem:

$$\min_{F^{T} F = I} \operatorname{Tr}(F^{T} L F), \tag{3}$$

where Tr(·) denotes the trace operation and $L \in \mathbb{R}^{n \times n}$ is the Laplacian matrix of the graph, defined as $L = D - A$ for an unnormalized graph, or $L = I - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ for a normalized graph. Equation (3) is an eigenvalue decomposition problem, whose solution consists of the eigenvectors corresponding to the c smallest eigenvalues of $L$. These eigenvectors form the new representation $F$, which is subsequently clustered using the k-means algorithm.
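The steps in Equations (1)–(3) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (their experiments use MATLAB); the function names, the toy data, the choice of $\sigma$, and the convention that rows of `X` are instances are all assumptions made for illustration. The final k-means step is omitted.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Eq. (1): A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    Assumption: rows of X are instances (the text stacks them as columns)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def spectral_embedding(A, c):
    """Eqs. (2)-(3): eigenvectors of the unnormalized Laplacian L = D - A
    that belong to the c smallest eigenvalues."""
    D = np.diag(A.sum(axis=1))            # degree matrix, Eq. (2)
    L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvecs[:, :c]

# Hypothetical toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
A = gaussian_affinity(X, sigma=0.5)
F = spectral_embedding(A, c=2)
```

In this toy case the rows of `F` are nearly identical within each group and clearly different across groups, which is why running k-means on the rows of `F` recovers the two clusters.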

Spectral clustering is particularly suited for multi-view data due to its ability to capture the intrinsic geometry of the data manifold, which is often multi-modal and high-dimensional. By leveraging the eigenvectors of the Laplacian matrix, spectral clustering effectively integrates information from multiple views, facilitating the discovery of clusters that are consistent across different data representations. This makes it an ideal choice for incomplete multi-view clustering (IMVC), where data from some views may be missing or incomplete.

2.2. Multi-Scale Anchor Bipartite Graph

Let $X^{(1)}, X^{(2)}, \dots, X^{(V)}$ denote the multi-view data with V views, where $X^{(v)} \in \mathbb{R}^{N \times d_{v}}$, with N representing the number of samples and $d_{v}$ the feature dimension of the v-th view. Specifically, the data are divided into incomplete and complete subsets for each view: $X_{I} = \{X_{I}^{(v)}\}$ and $X_{C} = \{X_{C}^{(v)}\}$, where $X_{I}^{(v)} \in \mathbb{R}^{N_{I}^{(v)} \times d_{v}}$ and $X_{C}^{(v)} \in \mathbb{R}^{N_{C}^{(v)} \times d_{v}}$ represent incomplete and complete samples, respectively, in the v-th view.

To construct the similarity graphs for each view, we use an anchor-based approach to capture the intrinsic structural relationships. Anchors serve as representative points, and several selection strategies exist, such as random selection or k-means-based selection. In this work, we use a hybrid anchor selection strategy [20], which balances the efficiency of random selection with the representativeness of k-means clustering. Specifically, a subset of candidate anchors is randomly selected to ensure diversity and computational efficiency, and then k-means clustering is applied to these candidates to identify the most representative anchor points. This combined approach leverages the strengths of both methods, resulting in a more robust and representative set of anchors. For each v-th view, this strategy selects p anchors by first randomly selecting $p'$ candidate anchors from $X_{C}^{(v)}$, where $p < p' \ll N$, followed by the k-means algorithm to obtain p cluster centers as the anchor set $A^{(v)}$.
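The hybrid strategy described above can be sketched as follows. This is a hedged illustration only: the function name `hybrid_anchors`, the plain Lloyd-style k-means loop, the iteration count, and the random toy data are assumptions, not details given in the paper or in [20].

```python
import numpy as np

def hybrid_anchors(X_c, p, p_prime, n_iter=20, seed=0):
    """Sample p_prime candidates at random from the complete samples X_c,
    then run a simple k-means on the candidates to obtain p anchors."""
    rng = np.random.default_rng(seed)
    n = X_c.shape[0]
    assert p < p_prime <= n
    cand = X_c[rng.choice(n, size=p_prime, replace=False)]
    # Initialize the centers with p distinct candidates.
    centers = cand[rng.choice(p_prime, size=p, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((cand[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(p):
            members = cand[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

# Hypothetical complete subset of one view: 200 samples, 4 features.
rng = np.random.default_rng(1)
X_c = rng.normal(size=(200, 4))
anchors = hybrid_anchors(X_c, p=5, p_prime=40)
```

Because k-means only sees the $p'$ candidates rather than all N samples, its cost depends on $p'$ and not on the full dataset size, which is the efficiency argument made in the text.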

Once the anchor points are selected for each view, we proceed to construct the anchor bipartite graphs. The anchor bipartite graph is a two-layer graph where one layer represents the original samples and the other layer represents the anchor points. The edges between the two layers signify the similarity between samples and their nearest anchors. This bipartite structure reduces computational complexity and memory usage, making it suitable for large-scale datasets.

To fully exploit the geometric structure within each view, we construct anchor bipartite graphs using multi-scale measurement techniques. For each sample $X_{i}^{(v)} \in X_{C}^{(v)}$ and anchor $A_{j}^{(v)}$, we measure the similarity between them using multiple scales. Specifically, we compute the distance between $X_{i}^{(v)}$ and $A_{j}^{(v)}$ using five different distance metrics: Euclidean, cosine, correlation, Chebyshev, and Jaccard. Each distance metric captures a different aspect of similarity, allowing the framework to integrate diverse perspectives on data relationships.

Each scale corresponds to a distinct base clustering in ensemble clustering, enhancing the robustness of the similarity graph. For each sample $X_{i}^{(v)}$, we select its k nearest anchors, with k drawn at random from the range 3 to 5. This variability in k introduces multiple perspectives on neighborhood structures, which are crucial for capturing the inherent data manifold’s complexity. Using these nearest neighbors, we construct a sparse bipartite graph $B = \{b_{ij}\}_{N \times p}$, where the following holds:

$$b_{ij} = \begin{cases} 1, & \text{if } X_{i}^{(v)} \in X_{C}^{(v)} \text{ and } A_{j}^{(v)} \in N_{k}(X_{i}^{(v)}), \\ 0, & \text{otherwise,} \end{cases} \tag{4}$$

where $N_{k}(X_{i}^{(v)})$ denotes the set of the k nearest anchors of $X_{i}^{(v)}$.

This multi-scale approach ensures that the similarity graph captures both local and global structural information, thereby improving the clustering performance, especially in the presence of incomplete data. The resulting bipartite graphs encapsulate multiple neighborhood relationships, which are later fused to form a comprehensive similarity matrix.
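A small pure-Python sketch of Equation (4) is given below, building one binary bipartite graph per distance metric. Only three of the five metrics are shown; the helper names, the use of `None` to mark an instance that is missing in the view, and the toy coordinates are assumptions for illustration.

```python
import math

def euclidean(x, a):
    return math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, a)))

def chebyshev(x, a):
    return max(abs(xi - ai) for xi, ai in zip(x, a))

def cosine(x, a):
    # Cosine distance; assumes neither vector is all zeros.
    dot = sum(xi * ai for xi, ai in zip(x, a))
    nx = math.sqrt(sum(xi * xi for xi in x))
    na = math.sqrt(sum(ai * ai for ai in a))
    return 1.0 - dot / (nx * na)

def bipartite_graph(samples, anchors, k, dist):
    """Eq. (4): b_ij = 1 iff anchor j is among the k nearest anchors of
    sample i. A sample given as None is missing in this view (zero row)."""
    B = []
    for x in samples:
        row = [0] * len(anchors)
        if x is not None:
            order = sorted(range(len(anchors)), key=lambda j: dist(x, anchors[j]))
            for j in order[:k]:
                row[j] = 1
        B.append(row)
    return B

# Hypothetical view with four instances (the third is missing) and four anchors.
samples = [[0.0, 1.0], [1.0, 0.9], None, [4.0, 4.2]]
anchors = [[0.0, 1.2], [1.0, 1.0], [4.0, 4.0], [9.0, 0.5]]
graphs = {name: bipartite_graph(samples, anchors, k=2, dist=d)
          for name, d in [("euclidean", euclidean),
                          ("chebyshev", chebyshev),
                          ("cosine", cosine)]}
```

Each observed row contains exactly k ones, so every graph stays sparse regardless of the metric used; the metrics differ only in which anchors are selected.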

2.3. Motif-Based Bipartite Graph Fusion

Motifs, which capture higher-order interactions in networks, are pivotal in understanding complex graph structures [21]. We extend the concept of motifs, traditionally applied to networks, to bipartite graphs. In our method, a motif is defined as a subgraph consisting of undirected links, as shown in Figure 1a. The third-order ($M^{3}$) and fourth-order ($M^{4}$) motifs are considered, and their corresponding motif adjacency matrices $W_{M}$ are constructed from the bipartite graphs generated earlier.

2.3.1. Third-Order Motif ( $M^{3}$ )

For the third-order motif, the motif adjacency matrix $W_{M^{3}, s}^{(v)}$ is computed by counting the number of shared anchors between each pair of instances within a specific view v and scale s. Mathematically, this can be expressed as follows:

$$W_{M^{3}, s}^{(v)}(i, j) = \sum_{k=1}^{p} I\left(B_{i,k}^{(v)} = 1 \land B_{j,k}^{(v)} = 1\right), \tag{5}$$

where $I(\cdot)$ is the indicator function that returns 1 if both instances i and j are connected to anchor k in view v at scale s, and 0 otherwise. This formulation effectively captures the number of shared anchors between each pair of instances, thereby reflecting their similarity based on third-order interactions.
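Equation (5) can be checked on a small example. The sketch below counts shared anchors directly from a binary bipartite matrix; the matrix `B` is made up for illustration, and including the diagonal entries (i = j) is an assumption, since the paper does not say how they are treated.

```python
def third_order_motif(B):
    """Eq. (5): W[i][j] = number of anchors k with B[i][k] = 1 and B[j][k] = 1."""
    n, p = len(B), len(B[0])
    return [[sum(1 for k in range(p) if B[i][k] == 1 and B[j][k] == 1)
             for j in range(n)]
            for i in range(n)]

# Hypothetical bipartite graph: 3 instances (rows) and 4 anchors (columns).
B = [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1]]
W3 = third_order_motif(B)
```

For a binary matrix this count is the same as the matrix product $B B^{T}$, so in practice it could be computed with a single sparse multiplication.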

2.3.2. Fourth-Order Motif ( $M^{4}$ )

For the fourth-order motif, the motif adjacency matrix $W_{M^{4}, s}^{(v)}$ accounts for pairs of instances that co-occur with pairs of anchors within the same view v and scale s. This higher-order interaction is captured by the following:

$$W_{M^{4}, s}^{(v)}(i, j) = \sum_{k_{1}=1}^{p-1} \sum_{k_{2}=k_{1}+1}^{p} I\left(B_{i,k_{1}}^{(v)} = 1 \land B_{j,k_{1}}^{(v)} = 1 \land B_{i,k_{2}}^{(v)} = 1 \land B_{j,k_{2}}^{(v)} = 1\right). \tag{6}$$

This equation counts the anchor pairs $(k_{1}, k_{2})$ to which both instances of a pair are simultaneously connected in view v at scale s. By doing so, $W_{M^{4}, s}^{(v)}$ captures more complex structural relationships, reflecting the consistency and collaborative connections between instances across multiple anchors.
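As a sanity check on Equation (6), the sketch below evaluates the double sum by brute force and compares it with a closed form: if instances i and j share w anchors, the number of anchor pairs they share is $\binom{w}{2}$. The closed form follows directly from the definition but is not stated in the paper; the helper names and the toy matrix are assumptions.

```python
from math import comb

def fourth_order_motif(B):
    """Eq. (6): count anchor pairs (k1 < k2) connected to both i and j."""
    n, p = len(B), len(B[0])
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k1 in range(p - 1):
                for k2 in range(k1 + 1, p):
                    if B[i][k1] == B[j][k1] == B[i][k2] == B[j][k2] == 1:
                        W[i][j] += 1
    return W

def shared_anchors(B, i, j):
    """Third-order count for one pair, as in Eq. (5)."""
    return sum(a * b for a, b in zip(B[i], B[j]))

# Hypothetical bipartite graph: 3 instances and 4 anchors.
B = [[1, 1, 1, 0],
     [1, 1, 1, 1],
     [0, 0, 1, 1]]
W4 = fourth_order_motif(B)
W4_closed = [[comb(shared_anchors(B, i, j), 2) for j in range(3)] for i in range(3)]
```

The closed form avoids the explicit loop over anchor pairs, which may matter when p is large.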

The consensus motif adjacency matrix for each motif type $m \in \{M^{3}, M^{4}\}$ is then obtained by summing the motif adjacency matrices across all views and scales:

$$W_{m} = \sum_{v=1}^{V} \sum_{s=1}^{c} \left(W_{m, s}^{(v)} + W_{m, s}^{(v) T}\right). \tag{7}$$

This aggregation captures the higher-order structural information from all views and scales, ensuring that the consensus similarity matrix $S$ reflects the collective insights from the multi-view data.

The consensus similarity matrix is derived by combining the motif adjacency matrices for third-order and fourth-order motifs:

$$S = W_{M^{3}} + \log(W_{M^{3}} + 1) + \log(W_{M^{4}} + 1). \tag{8}$$

The combination of third-order and fourth-order motifs allows the similarity matrix to encapsulate both triadic and tetradic relationships among data instances, providing a richer representation of the data’s structural properties. The logarithmic transformation ensures that the influence of higher motif counts is tempered, preventing domination by overly connected motifs.
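A short worked example of Equations (7) and (8) follows. It assumes that the logarithm in Equation (8) is the natural logarithm applied element-wise, which the paper does not state explicitly; the function names and the small per-(view, scale) matrices are hypothetical.

```python
import math

def consensus(mats):
    """Eq. (7): sum of W + W^T over all per-(view, scale) motif matrices."""
    n = len(mats[0])
    out = [[0.0] * n for _ in range(n)]
    for W in mats:
        for i in range(n):
            for j in range(n):
                out[i][j] += W[i][j] + W[j][i]
    return out

def similarity(W3, W4):
    """Eq. (8), assuming an element-wise natural logarithm."""
    n = len(W3)
    return [[W3[i][j] + math.log(W3[i][j] + 1) + math.log(W4[i][j] + 1)
             for j in range(n)]
            for i in range(n)]

# Hypothetical motif matrices for two (view, scale) combinations, two instances.
W3_list = [[[2, 1], [1, 2]], [[3, 0], [0, 2]]]
W4_list = [[[1, 0], [0, 1]], [[3, 0], [0, 1]]]
W3 = consensus(W3_list)
W4 = consensus(W4_list)
S = similarity(W3, W4)
```

Here the off-diagonal entry is `S[0][1] = 2 + ln(3) + ln(1)`, showing how the logarithmic terms grow much more slowly than the raw third-order count.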

The resulting matrix $S$ represents a global consistency graph that is used in spectral clustering. To perform spectral clustering, we compute the Laplacian matrix $L = D - S$, where $D$ is the degree matrix with diagonal entries $D_{i, i} = \sum_{j=1}^{n} S_{i, j}$. By performing singular value decomposition (SVD) on $L$, we obtain the latent representation $V$, which is then clustered using the k-means algorithm.

An overview of the proposed framework is shown in Figure 2, and the complete procedure is outlined in Algorithm 1.

Algorithm 1 Motif-based multi-scale bipartite graph fusion for incomplete multi-view clustering

Input: Original incomplete multi-view data $\{X^{(v)}\}_{v=1}^{V}$; Number of anchors p; Range of scales c; Number of nearest neighbors k; Distance metrics $\{Dist^{(s)}\}_{s=1}^{c}$;
Output: Final clustering results.

1: Initialize anchor sets $\{A^{(v)}\}_{v=1}^{V}$ using a hybrid strategy;
2: Initialize empty set for bipartite graphs $\{B^{(v)}\}_{v=1}^{V}$;
3: for $v = 1$ to V do
4:     for $s = 1$ to c do
5:         Compute $Dist(X_{i}^{(v)}, A_{j}^{(v)}, s)$ for each sample $X_{i}^{(v)} \in X_{C}^{(v)}$ and anchor $A_{j}^{(v)}$;
6:         Select $k_{s}$ nearest anchors based on the computed distance;
7:         Update bipartite graph $B_{s}^{(v)}$ with selected anchors;
8:     end for
9:     Fuse multi-scale bipartite graphs to form $B^{(v)}$ by Equation (4);
10: end for
11: Extract motif adjacency matrices $W_{M^{3}, s}^{(v)}$ and $W_{M^{4}, s}^{(v)}$ from $\{B^{(v)}\}_{v=1}^{V}$ using third-order and fourth-order motifs as defined in Equations (5) and (6);
12: Compute the consensus motif adjacency matrices $W_{M^{3}}$ and $W_{M^{4}}$ by Equation (7);
13: Compute consensus similarity matrix $S$ using Equation (8);
14: Spectral Clustering:
15:     Compute Laplacian matrix $L = D - S$;
16:     Perform Singular Value Decomposition (SVD) on $L$ to obtain latent representation $V$;
17:     Apply k-means clustering on $V$ to obtain final clusters;
18: return Final clustering results;

2.4. Computational Complexity

The proposed Motif-Based Multi-Scale Bipartite Graph Fusion for Incomplete Multi-View Clustering (MMBGF_IMC) framework comprises several key components, each contributing to the overall computational complexity. The complexity analysis for each component is detailed below:

Anchor Selection: The hybrid strategy involves a two-step process: random selection followed by k-means clustering. Selecting $p'$ candidate anchors randomly from the complete subset $X_{C}^{(v)}$ has a linear complexity of $O(N p V d_{v})$, where N is the number of samples, p is the number of anchors, V is the number of views, and $d_{v}$ is the feature dimension of the v-th view. Applying the k-means algorithm to the $p'$ candidate anchors to obtain p cluster centers incurs a complexity of $O(N p V d_{v})$. The combined complexity for anchor selection is therefore $O(N p V d_{v})$.
Similarity Graph Construction: For each scale and view, computing distances between samples and anchors, followed by identifying the k-nearest neighbors, incurs a complexity of $O(N p V c d_{v})$, where c is the number of scales.
Motif Extraction: Extracting motifs from bipartite graphs involves identifying higher-order interactions. The complexity for motif extraction is the following:
– For the third-order motif ($M^{3}$), the complexity is $O(p \cdot m^{2})$, where m is the average number of anchors connected to each instance.
– For the fourth-order motif ($M^{4}$), the complexity is $O(p^{2} \cdot N)$, where N is the number of instances and p is the number of anchors.
Given that two motif types ($M^{3}$ and $M^{4}$) are considered, the total complexity for motif extraction becomes $O(p \cdot m^{2} + p^{2} \cdot N)$.
Spectral Clustering: The eigenvalue decomposition required for spectral clustering traditionally has a complexity of $O(N^{3})$ in the worst case. However, utilizing fast approximate methods can reduce this to $O(k^{3}) + O(k n t)$, where k is the number of clusters and t is the number of iterations in the algorithm.

Overall Time Complexity: Summing the complexities of all components, the overall time complexity of the MMBGF_IMC framework is dominated by the spectral clustering step, especially for large values of N. However, with the application of approximate spectral clustering methods, the overall complexity can be effectively managed. The cumulative time complexity can be expressed as $O(N p V c d_{v})$. This indicates that the framework scales linearly with respect to the number of samples (N) and the number of views (V), making it suitable for large-scale applications. The linear scalability ensures that the framework remains computationally feasible even as the dataset size and the number of views increase.

Space Complexity: In addition to time complexity, space complexity is an important consideration, especially for large-scale datasets. The space complexity is primarily influenced by the storage of similarity graphs and motif adjacency matrices, which require $O(N p V c)$ space. Efficient data structures and sparse representations can be employed to optimize memory usage, further enhancing the framework’s scalability.
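To make the space argument concrete, the following back-of-the-envelope calculation compares entry counts for hypothetical sizes. All numbers (N, p, V, c, k) are illustrative assumptions and are not taken from the paper's datasets.

```python
def dense_bipartite_entries(N, p, V, c):
    """Entries stored if every N x p bipartite graph is kept dense: O(N p V c)."""
    return N * p * V * c

def sparse_bipartite_entries(N, k, V, c):
    """Non-zeros if each sample keeps only its k nearest anchors per graph."""
    return N * k * V * c

def full_graph_entries(N, V):
    """Entries for one dense N x N graph per view, as in full-graph methods."""
    return V * N * N

# Hypothetical sizes chosen only for illustration.
N, p, V, c, k = 20_000, 200, 5, 5, 5
dense = dense_bipartite_entries(N, p, V, c)
sparse = sparse_bipartite_entries(N, k, V, c)
full = full_graph_entries(N, V)
```

Under these assumed sizes, the dense bipartite storage is 20 times smaller than five full $N \times N$ graphs, and the sparse form is smaller again by a factor of p / k = 40.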

3. Experiments

Experimental Settings

We assess the performance of various methods on seven widely used multi-view datasets: ALOI [22], BBC4 [23], ORL [24], Out-Scene [25], YALE [26], BDGP [27], and Reuters [28]. Detailed statistics and descriptions of these datasets are presented in Table 1. To simulate incomplete data, we first select a subset of instances according to the paired rate and then randomly remove one view from each of the remaining instances. We generate incomplete datasets with paired rates ranging from 0.1 to 0.9 in increments of 0.2.
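The incompleteness protocol can be sketched as a mask generator. This is one plausible reading of the description, assuming that the selected (paired) instances keep all of their views; the function name, the seed handling, and the toy sizes are assumptions.

```python
import random

def make_incomplete_mask(n, n_views, paired_rate, seed=0):
    """Return mask[i][v] = True if instance i is observed in view v.
    A fraction `paired_rate` of instances keeps all views (assumption);
    every other instance loses exactly one randomly chosen view."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_paired = round(paired_rate * n)
    mask = [[True] * n_views for _ in range(n)]
    for i in idx[n_paired:]:
        mask[i][rng.randrange(n_views)] = False
    return mask

# Paired rates from 0.1 to 0.9 in steps of 0.2, on a hypothetical 100 x 3 dataset.
masks = {rate: make_incomplete_mask(100, 3, rate)
         for rate in (0.1, 0.3, 0.5, 0.7, 0.9)}
```

A lower paired rate therefore means fewer fully observed instances, which is the harder setting discussed in the results.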

We compare the proposed MMBGF_IMC method against several state-of-the-art Incomplete Multi-View Clustering (IMVC) algorithms, including UEAF [29], CBG [30], BSV [31], CONCAT [31], BGIMVSC [32], SAGF_IMC [33], and GIMVC [34]. We run each method 10 times and report the average performance along with the standard deviation. Clustering performance is evaluated using three metrics: clustering accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI). We employ the default hyperparameters as recommended in the original publications of the respective methods. All experiments are performed on a server equipped with an Intel(R) Xeon(R) Silver 4214 processor, 128 GB of RAM, and MATLAB R2023b (64-bit).
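For readers unfamiliar with the metrics, a minimal pure-Python sketch of ACC and NMI is given below. It is not the evaluation code used in the paper. Two choices are assumptions: ACC searches all label permutations by brute force (practical only for a handful of clusters; the Hungarian algorithm is the usual choice), and NMI is normalized by the arithmetic mean of the two entropies, which is one of several common conventions. ARI is omitted for brevity.

```python
import math
from collections import Counter
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of instances matched under the best one-to-one mapping
    from predicted cluster ids to true labels (brute-force search)."""
    true_ids = sorted(set(y_true))
    pred_ids = sorted(set(y_pred))
    assert len(pred_ids) <= len(true_ids)
    best = 0
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(1 for t, p in zip(y_true, y_pred) if mapping[p] == t)
        best = max(best, hits)
    return best / len(y_true)

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(y_true, y_pred):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    n = len(y_true)
    ct, cp = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum((c / n) * math.log(c * n / (ct[t] * cp[p]))
             for (t, p), c in joint.items())
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    return mi / denom if denom > 0 else 1.0

# Hypothetical labels: one instance out of six is assigned to the wrong cluster.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0]
acc = clustering_accuracy(y_true, y_pred)
score = nmi(y_true, y_pred)
```

Both metrics are invariant to how cluster ids are numbered, which is why a prediction with swapped ids but identical grouping still scores 1.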

4. Results

4.1. Experimental Results and Analysis

The experimental results for various methods on the incomplete multi-view datasets are summarized in Table 2. The best results for each dataset are highlighted in bold. We make the following key observations based on the results:

Among the evaluated methods, the proposed MMBGF_IMC consistently outperforms competing approaches across most datasets and metrics, demonstrating its superior capability in handling incomplete multi-view data through multi-perspective structural analysis.

MMBGF_IMC exhibits remarkable robustness and stability across varying paired rates, maintaining high performance even as the proportion of missing data changes. This stability is particularly evident when compared to other spectral clustering-based methods, which exhibit higher sensitivity to fluctuations in paired rates. The method’s ability to consistently achieve the highest scores in clustering accuracy (ACC) and normalized mutual information (NMI) underscores its effectiveness in capturing the underlying structures of diverse data types, including images, text, and gene sequences.

When compared to baseline methods such as BSV and CONCAT, multi-view learning-based Incomplete Multi-View Clustering (IMC) methods, including MMBGF_IMC, demonstrate significant advantages by capturing more comprehensive information from incomplete datasets. This superiority highlights the limitations of simple imputation techniques, such as setting missing instances or graphs to zero or averaging, which are inadequate for addressing incomplete clustering problems. In contrast, MMBGF_IMC’s approach of thoroughly exploring the aligned information among available views proves to be a more effective strategy for tackling the complexities inherent in IMC tasks.

Furthermore, while certain graph learning-based and kernel learning-based methods (e.g., SAGF_IMC and BGIMVC) perform exceptionally well on specific datasets such as BDGP, YALE, and OutScene, they encounter significant challenges when applied to larger-scale datasets like Reuters due to memory constraints. The analysis reveals that graph learning-based methods generally achieve more discriminative representations than matrix factorization (MF)-based IMC methods, underscoring the critical role of accurately capturing the geometric structure of data in unsupervised clustering tasks. However, these graph- and kernel-based approaches necessitate the computation of multiple $n \times n$ graph or kernel matrices for datasets with n samples, resulting in substantial memory storage requirements and thereby limiting their scalability. This limitation highlights the advantage of MMBGF_IMC in efficiently handling large and complex datasets without compromising performance.

The BDGP dataset, in particular, exhibits distinct characteristics compared to other datasets, primarily due to the inherent unavailability of certain original instances, which are represented as zero vectors. This inherent incompleteness poses significant challenges for the proposed MMBGF_IMC method, especially under low-paired-rate scenarios. MMBGF_IMC relies on constructing multi-scale anchor bipartite graphs and subsequently fusing them through motif-based mechanisms to derive a consensus similarity matrix. In the context of BDGP, the presence of zero vectors disrupts the accurate construction of bipartite graphs. Specifically, zero vectors can lead to misleading similarity measurements, as the distance metrics may not effectively capture the true relationships between instances and anchors. This distortion is exacerbated when the paired rate is low, resulting in a limited number of reliable paired instances to anchor the graph construction.

Additionally, the motif-based fusion process depends on the integrity of the bipartite graphs across multiple scales. With a scarcity of valid paired data, the motifs extracted from these graphs become less representative of the underlying data structure, leading to an unreliable consensus similarity matrix. Consequently, the spectral clustering stage, which operates on this consensus matrix, suffers from reduced accuracy and robustness, manifesting in poorer clustering performance metrics.

In summary, while no single IMC method consistently delivers superior performance across all types of datasets, MMBGF_IMC stands out for its robust generalization capabilities, methodological flexibility, and balanced performance across multiple metrics. These attributes make it a versatile and reliable choice for multi-view clustering tasks in real-world scenarios characterized by incomplete and diverse datasets. Future research could focus on integrating MMBGF_IMC with advanced deep learning techniques to further enhance its scalability and applicability to even larger and more complex datasets, as well as exploring its effectiveness in other data domains such as audio or time-series data.

4.2. Ablation Study

To validate the contribution of each component in our framework, we conduct an ablation study by comparing MMBGF_IMC with three variants that progressively eliminate key parts of the model:

w.o. MS (without multi-scale part): This version evaluates the distances between samples using a standard Euclidean metric, without the multi-scale analysis;
w.o. MF (without motif fusion part): This variant employs the multi-scale analysis but omits the motif fusion component, relying on the similarity matrix $\sum_{v = 1}^{V} B^{(v)} \cdot B^{{(v)}^{T}}$ to perform spectral clustering;
w.o. MS and MF (without both components): This version excludes both the multi-scale and motif fusion parts, essentially reducing the method to a basic spectral clustering approach.

Table 3 presents the results of this ablation study on the Out-Scene dataset with a paired rate of 0.5. The performance metrics (NMI, ACC, and ARI) show the following:

MMBGF_IMC outperforms all variants, demonstrating that both multi-scale analysis and motif fusion contribute significantly to its superior performance;
Without the multi-scale part (w.o. MS), the method performs noticeably worse, highlighting the importance of capturing hierarchical structures in the data;
Excluding motif fusion (w.o. MF) also leads to a performance drop, underscoring the role of motif fusion in combining complementary information from different views.

Table 3. Ablation study on Out-Scene with a paired rate of 0.5. The best result in each column is obtained by the full MMBGF_IMC model.

Methods	meanNMI	meanAcc	meanARI
w.o. MS	0.5055	0.6347	0.3705
w.o. MF	0.4988	0.6146	0.3624
w.o. MS & MF	0.4873	0.5947	0.3634
MMBGF_IMC	0.5272	0.6727	0.4103

These findings confirm that both components are crucial for achieving the best clustering performance.

4.3. Parameter Study

To optimize the performance of MMBGF_IMC, we performed a parameter study focusing on the number of nearest neighbors (Knn) and the anchor rate. We explore Knn values in the range of [1, 10] and anchor rates in the range of [0.1, 0.9]. The results, shown in Figure 3, demonstrate the following:

Sensitivity to anchor rate: MMBGF_IMC is more sensitive to changes in the anchor rate than the number of nearest neighbors. As the anchor rate increases, performance improves, particularly in terms of ACC and NMI;
Effect of Knn: For larger datasets, performance improves with Knn values in the range of 7–10, while for smaller datasets, Knn values in the range of 3–5 yield the best results.

Figure 3. ACC and NMI w.r.t. different #Knns and anchor rates on ORL.

Based on these observations, we set the anchor rate to 0.2 and Knn to 7–10 for large datasets. For smaller datasets, the anchor rate is set to 0.8 and Knn to 3–5. These settings provide a good balance between performance and computational efficiency.

5. Conclusions

5.1. Main Conclusions

This study introduces a novel Motif-Based Multi-Scale Bipartite Graph Fusion for Incomplete Multiview Clustering (MMBGF_IMC), specifically designed to tackle the challenges posed by incomplete and heterogeneous data in the realm of sustainable development research.

By integrating multi-scale measurements, ensemble clustering, and motif-based bipartite graph fusion, the proposed framework adeptly captures intricate relationships and structural subtleties within multi-view data, even when faced with data incompleteness. This comprehensive approach ensures that the framework can effectively leverage the complementary information inherent in diverse data sources.

MMBGF_IMC was evaluated on seven widely recognized multi-view datasets: ALOI, BBC4, ORL, Out-Scene, YALE, BDGP, and Reuters. The results in Table 2 show that MMBGF_IMC outperforms existing state-of-the-art methods in most settings in terms of clustering accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI). Notably, clustering metrics improve by 7–12% on challenging datasets such as Out-Scene and Reuters. These improvements support the effectiveness of the proposed method in handling diverse and complex data structures.

Furthermore, MMBGF_IMC showcases remarkable stability across varying paired rates, underscoring its robustness and reliability for real-world applications where data incompleteness is prevalent. Ablation studies reinforce the indispensable roles of both multi-scale analysis and motif fusion within the framework. The multi-scale component enhances the ability to capture hierarchical and multi-faceted data relationships, while motif fusion effectively integrates higher-order structural information, leading to superior clustering performance.

The experimental findings support the primary hypothesis of this research, namely that multi-scale measurements combined with motif-based graph fusion can significantly enhance incomplete multi-view clustering performance. The consistent gains of MMBGF_IMC across multiple datasets and varied experimental conditions indicate that integrating these components is crucial for achieving high-quality clustering results in the presence of incomplete and heterogeneous data.

Additionally, the framework’s capability to seamlessly handle incomplete data and amalgamate information from diverse sources positions MMBGF_IMC as a potent tool for sustainable development applications. It facilitates more accurate and insightful analyses in critical areas such as environmental monitoring, urban planning, resource management, public health, and beyond, thereby contributing to informed and evidence-based decision-making processes.

5.2. Policy Recommendations

To further promote the practical adoption and impact of data-driven multi-view clustering methods in sustainable development, several key recommendations are proposed. Firstly, it is imperative for governments and organizations to invest in robust data collection and management infrastructures. Ensuring the availability of high-quality, multi-view datasets is foundational for the success of sustainable development research.

Moreover, fostering interdisciplinary collaborations between computer scientists, environmental experts, urban planners, and other relevant stakeholders will enhance the refinement and applicability of clustering methodologies. Such collaborations can drive the customization of algorithms to better address specific real-world challenges, maximizing their relevance and effectiveness.

Addressing privacy and ethical considerations is also crucial as data-driven techniques become more pervasive. Establishing stringent data governance frameworks and ensuring transparency in data usage will help maintain public trust and ensure equitable outcomes.

5.3. Limitations

Despite the promising advancements presented by the MMBGF_IMC framework, several limitations warrant consideration. One primary limitation is the sensitivity of the method to certain hyperparameters, such as the number of anchors and the number of nearest neighbors (Knn). While optimal parameter ranges have been identified, these settings may need further refinement for different datasets, suggesting the need for automated parameter selection techniques in future research.

Additionally, the computational complexity associated with motif-based fusion and spectral clustering may pose scalability challenges for extremely large datasets. Future work could explore the incorporation of efficient approximation algorithms or leverage parallel computing architectures to mitigate these issues.

Another limitation pertains to the framework’s focus on static datasets. Many real-world applications involve dynamic or temporal data, necessitating extensions of MMBGF_IMC to handle such complexities. Integrating temporal dynamics would significantly enhance the framework’s applicability to time-sensitive domains like disaster management and traffic prediction.

Lastly, while the framework has been validated on general multi-view datasets, its performance on domain-specific datasets—such as those in genomics, finance, or social sciences—remains unexplored. Future studies should assess and adapt MMBGF_IMC to these specialized contexts to fully harness its potential across a broader spectrum of applications.

6. Discussion

This research underscores the transformative potential of advanced data-driven techniques in addressing global sustainable development challenges. By leveraging multi-view data and innovative clustering methodologies, deeper insights into complex systems can be attained, thereby informing more effective and evidence-based decision-making. The proposed MMBGF_IMC framework bridges the gap between theoretical advancements and practical applications, offering a robust tool for sustainable development research. Moving forward, it is essential to refine and expand these methods to ensure their applicability to real-world problems, thereby providing actionable insights that support informed policy and decision-making across diverse fields. The continued evolution of such data-driven frameworks will be pivotal in tackling some of the most pressing challenges of our time, particularly in sustainable development and environmental management.

Author Contributions

Conceptualization, Z.C. and L.Z.; methodology, H.Z. (Hongwei Zhang); software, H.Z. (Hongwei Zhang); validation, H.Z. (Hongwei Zhang), Z.L. and H.Z. (Hao Zheng); formal analysis, H.Z. (Hongwei Zhang); investigation, Z.L.; resources, H.Z. (Hao Zheng); data curation, Z.L.; writing—original draft preparation, H.Z. (Hongwei Zhang); writing—review and editing, H.Z. (Hongwei Zhang); visualization, H.Z. (Hao Zheng); supervision, L.Z.; project administration, Z.C.; funding acquisition, Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Natural Science Foundation of China No. 62476038 and Science and Technology Planning Project of Liaoning Province No. 2023JH2/101300092 and 2023JH26/10100008.

Institutional Review Board Statement

This research was approved by the authors’ college of the relevant university.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author(s) upon request.

Acknowledgments

We would like to thank the anonymous reviewers for their time and effort devoted to improving the quality of this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Definition of bipartite motifs and example motif adjacency matrices. (a) The solid blue and green circles represent instances and anchors, respectively. The dashed circles are the missing instances of the corresponding views. (b) As an example, each sample is connected to its two nearest anchors by solid links, from which the motif adjacency matrices are constructed.

Figure 2. Framework of the proposed incomplete multi-view clustering method. In the bipartite graph generation stage, each sample has a varying range of k-nearest neighbors, from 3 to 5. As a result, the motif structure of each bipartite graph is distinct, and this is visualized through transparent nodes and links in the motif-based bipartite graph fusion stage. The proposed method focuses on learning a consensus similarity graph from multiple views for clustering.

Table 1. Summary of multi-view datasets.

Dataset	#Clusters	#Instances	#Features
ALOI	100	10,800	77/13/64/125
BBC4	5	685	4659/4633/4665/4684
BDGP	5	2500	1750/79
ORL	40	400	4096/3304/6750
Out-Scene	8	2688	512/432/256/48
Reuters	6	18,715	10/10
YALE	15	165	4096/3304/6750

Table 2. Mean NMI, ACC, and ARI of different methods on various datasets at paired rates p from 0.1 to 0.9. The best results are shown in bold.

		MeanNMI					MeanACC					MeanARI
Dataset	Method/p	0.1	0.3	0.5	0.7	0.9	0.1	0.3	0.5	0.7	0.9	0.1	0.3	0.5	0.7	0.9
ALOI	BSV	0.5108	0.5383	0.5701	0.6096	0.6368	0.3490	0.3685	0.3895	0.4119	0.4329	0.0703	0.1046	0.1546	0.2216	0.2810
	CONCAT	0.2995	0.3176	0.3359	0.3523	0.3711	0.1215	0.1276	0.1336	0.1383	0.1445	0.0183	0.0276	0.0421	0.0582	0.0744
	UEAF	0.5586	0.5676	0.5756	0.5909	0.6190	0.3469	0.3437	0.3570	0.3880	0.4108	0.1884	0.2085	0.2196	0.2397	0.2749
	CBG	0.4521	0.4526	0.4539	0.5055	0.5561	0.2305	0.2094	0.2269	0.2754	0.2984	0.0379	0.0425	0.0449	0.0637	0.0880
	SAGF_IMC	0.6218	0.6441	0.6665	0.6773	0.6785	0.5109	0.5309	0.5520	0.5584	0.5628	0.0803	0.1007	0.1255	0.1472	0.1446
	BGIMVC	0.7303	0.7312	0.7366	0.7361	0.7407	0.6517	0.6513	0.6751	0.6493	0.6795	0.2315	0.2211	0.2039	0.2317	0.2025
	GIMVC	0.7124	0.7194	0.7375	0.7447	0.7535	0.4349	0.4461	0.4642	0.5051	0.5089	0.2375	0.2309	0.2746	0.3176	0.2841
	MMBGF_IMC	0.7411	0.7448	0.7552	0.7529	0.7644	0.6825	0.6860	0.6882	0.6889	0.6986	0.2583	0.2442	0.2894	0.2560	0.2955
BBC4	BSV	0.0605	0.0678	0.0553	0.0808	0.1052	0.3742	0.3874	0.3674	0.3826	0.4035	0.0104	0.0190	0.0236	0.0133	0.0311
	CONCAT	0.0520	0.0172	0.0598	0.1142	0.0941	0.3615	0.3428	0.3680	0.3775	0.3841	0.0162	0.0043	0.0202	0.0219	0.0484
	UEAF	0.5729	0.5935	0.6343	0.6151	0.5720	0.7136	0.7422	0.7962	0.7455	0.6863	0.5193	0.5449	0.6176	0.5871	0.4950
	CBG	0.4323	0.4494	0.4925	0.5541	0.4376	0.5012	0.5109	0.6536	0.7616	0.5004	0.2511	0.2626	0.3754	0.5038	0.2879
	SAGF_IMC	0.4768	0.4659	0.4760	0.4779	0.4995	0.7032	0.6993	0.6964	0.6949	0.6978	0.4846	0.4682	0.4784	0.4798	0.4950
	BGIMVC	0.6120	0.5907	0.6354	0.6283	0.5937	0.8155	0.7847	0.8222	0.8253	0.8000	0.5958	0.5541	0.6195	0.6126	0.5654
	GIMVC	0.1842	0.1873	0.1105	0.1943	0.1544	0.4942	0.4781	0.4223	0.4629	0.4467	0.1880	0.1582	0.1125	0.1861	0.1636
	MMBGF_IMC	0.6470	0.6795	0.6950	0.7014	0.7059	0.8301	0.8590	0.8673	0.8692	0.8692	0.6845	0.7179	0.7330	0.7456	0.7451
ORL	BSV	0.4772	0.5390	0.5719	0.6433	0.7154	0.3665	0.4105	0.4540	0.4868	0.5348	0.0562	0.0897	0.1148	0.2225	0.3488
	CONCAT	0.4819	0.5332	0.5872	0.6211	0.6883	0.3603	0.3833	0.4368	0.4435	0.4998	0.0580	0.0950	0.1448	0.2051	0.3072
	UEAF	0.6859	0.7134	0.7140	0.7154	0.7310	0.5035	0.5198	0.5413	0.5388	0.5528	0.3204	0.3555	0.3595	0.3697	0.3866
	CBG	0.8164	0.8333	0.8506	0.8447	0.8788	0.6518	0.6610	0.6798	0.6663	0.7308	0.4515	0.4868	0.5031	0.5182	0.6312
	SAGF_IMC	0.6877	0.6900	0.6938	0.6691	0.6886	0.5163	0.5285	0.5350	0.5118	0.5133	0.2695	0.2710	0.2736	0.2473	0.2651
	BGIMVC	0.5990	0.7173	0.7159	0.6241	0.6447	0.3815	0.5248	0.4978	0.4050	0.3898	0.1871	0.3741	0.3648	0.2123	0.2330
	GIMVC	0.8339	0.8451	0.8537	0.8502	0.8618	0.6935	0.7003	0.7155	0.7065	0.7242	0.5579	0.5614	0.5664	0.5791	0.6007
	MMBGF_IMC	0.8535	0.8433	0.8399	0.8504	0.8616	0.7583	0.7383	0.7320	0.7358	0.7588	0.6349	0.5667	0.5778	0.6159	0.6426
Out-Scene	BSV	0.3368	0.3423	0.3723	0.3967	0.4201	0.4465	0.4825	0.4889	0.5256	0.5363	0.1762	0.2084	0.2389	0.2796	0.3075
	CONCAT	0.1479	0.1472	0.1482	0.1673	0.1736	0.3026	0.3005	0.3081	0.3218	0.3247	0.0758	0.0790	0.0894	0.1068	0.1153
	UEAF	0.0599	0.0514	0.0511	0.0517	0.0528	0.1938	0.1914	0.1910	0.1940	0.2028	0.0219	0.0182	0.0195	0.0220	0.0229
	CBG	0.3620	0.3510	0.3837	0.3842	0.3906	0.4810	0.5160	0.5186	0.5116	0.5143	0.2666	0.2662	0.2940	0.2894	0.2964
	SAGF_IMC	0.4427	0.4678	0.4567	0.4988	0.5302	0.5374	0.4700	0.5580	0.5735	0.5916	0.2904	0.3445	0.3427	0.3823	0.4190
	BGIMVC	0.4530	0.4566	0.4961	0.5222	0.4689	0.5741	0.5553	0.5688	0.6078	0.5676	0.3589	0.3194	0.3849	0.4298	0.3350
	GIMVC	0.5080	0.5316	0.5240	0.5220	0.5264	0.6344	0.6699	0.6365	0.6516	0.6289	0.4035	0.4049	0.4037	0.4586	0.4365
	MMBGF_IMC	0.5262	0.5269	0.5272	0.5303	0.5320	0.6636	0.6762	0.6727	0.6775	0.6797	0.4056	0.4108	0.4103	0.4137	0.4152
YALE	BSV	0.3404	0.3824	0.4227	0.4592	0.5454	0.3285	0.3715	0.3988	0.4048	0.4830	0.0599	0.0897	0.1306	0.1811	0.2934
	CONCAT	0.3309	0.3259	0.3977	0.4394	0.4719	0.3236	0.3067	0.3618	0.3964	0.4091	0.0547	0.0612	0.1216	0.1523	0.2114
	UEAF	0.5782	0.5975	0.5999	0.5924	0.5752	0.5297	0.5267	0.5303	0.5364	0.5182	0.3448	0.3627	0.3682	0.3635	0.3400
	CBG	0.6710	0.6216	0.6044	0.6782	0.6540	0.6182	0.5655	0.5564	0.6200	0.6042	0.4458	0.3861	0.3625	0.4540	0.4258
	SAGF_IMC	0.5349	0.5307	0.5541	0.5661	0.5744	0.4806	0.4788	0.4970	0.5097	0.5152	0.2748	0.3120	0.2930	0.3352	0.3388
	BGIMVC	0.4438	0.4649	0.4687	0.4463	0.4583	0.3891	0.4133	0.3885	0.3988	0.4164	0.1543	0.2032	0.1827	0.1746	0.1837
	GIMVC	0.6636	0.6261	0.6472	0.6636	0.6620	0.6267	0.5806	0.6042	0.6370	0.6309	0.4397	0.3903	0.4094	0.4715	0.4354
	MMBGF_IMC	0.6687	0.6823	0.6748	0.6727	0.6697	0.6552	0.6806	0.6655	0.6703	0.6636	0.4479	0.4677	0.4621	0.4338	0.4365
BDGP	BSV	0.1631	0.1875	0.2519	0.3012	0.3246	0.3514	0.3634	0.4359	0.4567	0.4949	0.0332	0.0442	0.0919	0.1069	0.1621
	CONCAT	0.1860	0.2132	0.2596	0.3076	0.3612	0.3615	0.3804	0.4084	0.4632	0.4994	0.0414	0.0594	0.0800	0.1179	0.1562
	UEAF	0.4320	0.5939	0.6739	0.7171	0.7743	0.6998	0.8253	0.8664	0.8892	0.9173	0.4154	0.6183	0.6988	0.7440	0.8047
	CBG	0.3668	0.4762	0.5030	0.6255	0.6825	0.5836	0.6732	0.7040	0.8172	0.8487	0.2441	0.3145	0.3662	0.5848	0.6441
	SAGF_IMC	0.4619	0.5189	0.7152	0.6420	0.6683	0.6651	0.7172	0.8844	0.7196	0.7348	0.6651	0.7172	0.8844	0.7196	0.7348
	BGIMVC	0.0251	0.0220	0.0487	0.0785	0.0313	0.2240	0.2220	0.2423	0.2724	0.2304	0.0009	0.0005	0.0035	0.0156	0.0012
	GIMVC	0.3734	0.5605	0.5997	0.6049	0.6306	0.5463	0.6886	0.6893	0.6812	0.6739	0.2997	0.5278	0.5529	0.5505	0.5600
	MMBGF_IMC	0.4139	0.5535	0.7138	0.7960	0.8938	0.5161	0.7260	0.8739	0.9251	0.9588	0.2074	0.3903	0.7046	0.8228	0.9004
Reuters	BSV	0.0610	0.0728	0.0892	0.1030	0.1214	0.3380	0.3498	0.3642	0.3873	0.3976	0.0305	0.0361	0.0532	0.0712	0.0987
	CONCAT	0.0550	0.0732	0.1178	0.1446	0.1743	0.3097	0.3297	0.3891	0.4075	0.4383	0.0260	0.0364	0.0783	0.1053	0.1380
	CBG	0.0972	0.1160	0.1333	0.1743	0.1671	0.3683	0.3735	0.3719	0.4258	0.4142	0.1232	0.1319	0.1212	0.1548	0.1676
	GIMVC	0.2142	0.2351	0.2457	0.2447	0.2413	0.4342	0.4482	0.4551	0.4543	0.4342	0.1653	0.1598	0.1638	0.1645	0.1582
	MMBGF_IMC	0.1353	0.1626	0.2160	0.2482	0.2948	0.4058	0.4239	0.4642	0.4888	0.5233	0.1192	0.1212	0.1664	0.2054	0.2569
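For readers who wish to reproduce the three metrics reported in Table 2, the sketch below computes ACC, NMI, and ARI from two label lists using only the Python standard library. Several points are assumptions rather than details taken from the paper: NMI is normalized by the arithmetic mean of the two entropies, ACC uses the best one-to-one mapping between predicted clusters and true classes, and that mapping is found by brute force over permutations, which is only practical for a handful of clusters. For datasets such as ALOI with 100 clusters, an assignment solver such as `scipy.optimize.linear_sum_assignment` would be needed instead.

```python
from collections import Counter
from itertools import permutations
from math import comb, log


def acc(true, pred):
    """Clustering accuracy under the best one-to-one cluster-to-class mapping.

    Brute force over permutations: illustrative only, feasible for few clusters.
    """
    joint = Counter(zip(true, pred))
    pred_ids = list(dict.fromkeys(pred))
    true_ids = list(dict.fromkeys(true))
    # Pad with None so every predicted cluster can be assigned something.
    true_ids += [None] * max(0, len(pred_ids) - len(true_ids))
    best = max(
        sum(joint[(t, p)] for t, p in zip(assignment, pred_ids))
        for assignment in permutations(true_ids, len(pred_ids))
    )
    return best / len(true)


def nmi(true, pred):
    """Normalized mutual information (arithmetic-mean normalization assumed)."""
    n = len(true)
    joint = Counter(zip(true, pred))
    rows, cols = Counter(true), Counter(pred)
    mi = sum(
        c / n * log(n * c / (rows[t] * cols[p])) for (t, p), c in joint.items()
    )
    h_true = -sum(c / n * log(c / n) for c in rows.values())
    h_pred = -sum(c / n * log(c / n) for c in cols.values())
    denom = (h_true + h_pred) / 2
    return 1.0 if denom == 0 else mi / denom


def ari(true, pred):
    """Adjusted Rand index computed from the contingency table."""
    n = len(true)
    if n < 2:
        return 1.0
    joint = Counter(zip(true, pred))
    index = sum(comb(c, 2) for c in joint.values())
    sum_rows = sum(comb(c, 2) for c in Counter(true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    y_true = [0, 0, 0, 1, 1, 1]
    y_pred = [1, 1, 0, 0, 0, 0]  # cluster ids are arbitrary; one sample misplaced
    print(f"ACC={acc(y_true, y_pred):.4f}  "
          f"NMI={nmi(y_true, y_pred):.4f}  "
          f"ARI={ari(y_true, y_pred):.4f}")
```

All three scores are invariant to how cluster ids are numbered, which is why a prediction that merely swaps the two labels still scores 1.0 on each metric.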

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
