1. Introduction
Sustainability in the context of supply chains has attracted increasing attention in recent years, driven by the heightened exposure to environmental risks, regulatory pressure, and recurring disruptions across global production and distribution networks. Climate change-related events, geopolitical instability, resource scarcity, and evolving governance requirements have increased the need for supply chains that are not only efficient and resilient, but also environmentally and socially responsible [1]. In particular, the growing emphasis on carbon reduction and net-zero commitments has placed renewed focus on upstream supply chain activities, where indirect or Scope 3 emissions often account for the majority of the environmental impact [2]. Despite their significance, such emissions remain difficult to measure, manage, and mitigate due to limited visibility and fragmented data across multi-tier networks.
Artificial Intelligence (AI) has been widely highlighted as a key enabler for addressing complexity in modern supply chains [3]. The recent literature points to its potential to improve forecasting accuracy, coordination, and responsiveness, while also contributing indirectly to sustainability objectives through efficiency gains and waste reduction [4,5]. Advances in machine learning, optimisation, and data-driven decision support have further strengthened expectations around the role of AI in sustainable supply chain management [6]. However, much of this work remains focused on performance optimisation or impact assessment, offering limited insight into how AI can be operationalised to support concrete sustainability-oriented decisions under real-world constraints.
A growing body of research has explored the use of AI techniques and related technologies, such as evolutionary computation [7,8] and the Internet of Things [9], to address sustainability challenges such as emissions reduction, green production planning, and environmentally responsible supplier selection. While these approaches demonstrate that sustainability objectives can be embedded into predictive or optimisation models, they often rely on simplified representations of supply chains, treating entities and relationships as independent or locally connected. As a result, indirect dependencies, hidden supplier relationships, and incomplete upstream information are frequently abstracted away, despite their central importance for sustainability-critical decisions such as supplier substitution or compliance assessment. Moreover, many AI-based approaches provide limited transparency into how recommendations are derived, which poses challenges for trust, accountability, and adoption in organisational contexts where sustainability decisions must be justified to diverse stakeholders [10,11].
Graph-based AI offers a promising alternative by explicitly modelling supply chains as networks of interconnected entities and relationships [12]. Knowledge graphs, in particular, provide a semantically rich and interpretable representation of heterogeneous supply chain data, enabling reasoning over incomplete and multi-tier structures [13]. When combined with Graph Neural Networks (GNNs), such representations allow inductive learning over relational context, supporting the inference of missing relationships and unobserved attributes. Despite these advantages, existing graph-based supply chain research has largely focused on visibility enhancement, risk analysis, or operational optimisation, with sustainability considerations either treated implicitly or omitted altogether [14,15]. Explicit decision support for sustainability-oriented tasks, such as supplier selection or substitution under environmental constraints, remains underexplored.
In this paper, we aim to address this gap by proposing a knowledge graph-based AI framework for sustainability-oriented decision support in supply chains. Sustainability is considered primarily in terms of environmental performance and governance, including carbon emissions as a proxy for Scope 3 impacts, supplier sustainability performance, and compliance or certification-related attributes. The proposed approach models a multi-tier supply chain as a heterogeneous knowledge graph and employs GNNs to perform two complementary knowledge graph completion tasks: link prediction, used to infer feasible alternative supplier relationships, and node classification, used to estimate sustainability-related labels where data are incomplete. Rather than optimising prediction accuracy in isolation, the approach is designed to support feasibility-oriented decision-making under uncertainty, providing structured and explainable recommendations that can assist practitioners in navigating sustainability trade-offs.
The main contributions of our work are threefold. First, it provides a structured analysis of existing AI and graph-based approaches to supply chain sustainability, highlighting a persistent gap between methodological capability and sustainability-oriented decision support. Second, it introduces a knowledge graph-based AI methodology that integrates explicit relational representation with inductive learning to support sustainability-aware supplier selection and substitution. Third, it demonstrates the applicability of the proposed approach through empirical evaluation on two real-world supply chain datasets characterised by heterogeneity and data sparsity.
The remainder of this paper is organised as follows. Section 2 reviews related work on AI and graph-based methods in supply chains, with a focus on sustainability. Section 3 presents the proposed methodology, including knowledge graph construction, GNN learning, and sustainability reasoning mechanisms. Section 4 describes the evaluation setup and datasets. Section 5 discusses the empirical results and their implications for sustainability-oriented decision support. Finally, Section 6 concludes and outlines directions for future research.
2. Related Work
This section reviews the literature on AI in supply chains with a particular emphasis on its role in supporting sustainability-oriented decision-making. In this paper, sustainability is considered primarily in terms of environmental performance and governance, including carbon emissions as a proxy for Scope 3 impacts, supplier sustainability performance, and compliance or certification-related attributes. The review is organised around three complementary perspectives. First, it considers studies that examine the impact and adoption of AI technologies in supply chains, often motivated by gains in efficiency, resilience, or operational performance, and increasingly framed in relation to sustainability objectives. Second, it analyses operational AI applications that seek to address sustainability-related challenges, such as emissions reduction, supplier evaluation, and resource efficiency, typically through optimisation or predictive modelling. Third, it critically assesses the extent to which existing AI approaches, both graph-agnostic and graph-oriented, explicitly support sustainability-aware tasks and processes. Through this structure, the review highlights a persistent gap between the acknowledged potential of AI for sustainable supply chains and the limited availability of graph-based, explainable, and decision-oriented methods capable of sustainability-aware reasoning.
2.1. Impact of AI on Supply Chain Sustainability
The recent literature has increasingly examined the adoption of AI across supply chains and its implications for operational performance and sustainability objectives. Most authors position AI as a key enabler of improved efficiency, coordination, and resilience in complex supply networks, with sustainability often framed as a downstream benefit of these improvements. Across multiple sectors, including manufacturing, agriculture, healthcare, finance, and energy, AI is frequently associated with enhanced decision support, waste reduction, and more effective resource utilisation.
Representative studies summarised in Table 1 illustrate this broad perspective. Akbari and Hopkins [16] highlight AI, alongside big data analytics and robotics, as a central component of digital transformation for Industry 4.0 and beyond, with projected adoption rates exceeding those of many other emerging technologies in manufacturing. Sector-specific studies [17,18,19] further argue that AI can contribute to sustainability-related goals by supporting more efficient distribution networks, improving energy efficiency, and reducing waste in supply chain operations. In energy-intensive and globalised supply chains, as discussed by Zhong et al. [20], AI-driven coordination between supply and demand has also been linked to potential reductions in energy consumption and associated emissions.
Despite these contributions, studies in this area do not move beyond discussing impact, offering limited insight into how AI can be operationalised to support concrete sustainability-oriented decisions. In particular, they do not address how AI methods might assist with decision-making processes that have a direct or indirect effect on sustainability. This limitation motivates the need to move beyond discussions of impact towards AI approaches that explicitly embed sustainability considerations within supply chain decision-making.
2.2. Operational AI Approaches for Supply Chain Sustainability
A substantial body of research has explored the use of AI to support sustainability objectives in supply chains through operational and optimisation-oriented approaches. These studies typically employ machine learning, deep learning, reinforcement learning, and hybrid optimisation frameworks to address sustainability-related challenges, including emissions reduction, waste minimisation, energy efficiency, and environmentally responsible supplier selection. This is generally achieved by operationalising sustainability through measurable indicators, such as carbon emissions, energy consumption, cost–emissions trade-offs, or compliance-related constraints.
Table 2 includes indicative examples of these approaches. These demonstrate how AI has been applied across a range of industries to support sustainability-relevant outcomes, including carbon emission forecasting [21], green production planning [22], supplier evaluation and selection [23], blockchain-based traceability enhancement [24], and out-of-stock prediction [25,26]. Several contributions explicitly embed sustainability objectives within learning or optimisation processes, for instance, by incorporating emissions-related costs, green production constraints, or environmental performance indicators into decision models. Collectively, these works illustrate that AI approaches can play an effective role in advancing sustainability goals when such goals are clearly defined and quantifiable.
However, as reflected across the studies in Table 2, these approaches predominantly rely on tabular, sequential, or otherwise network-agnostic representations of supply chain data. While effective for local optimisation or prediction tasks, this limits their ability to capture the relational and multi-tier structure that characterises real-world supply chains. As a result, indirect dependencies, hidden supplier relationships, and incomplete upstream information are often abstracted away, despite their importance for sustainability-critical challenges such as Scope 3 emissions assessment or sustainability-aware supplier substitution. In addition, many of these approaches provide limited support for transparent and explainable reasoning. Although optimisation outcomes or predictions may be accurate, the underlying rationale for sustainability-related recommendations is not always easily traceable, which poses challenges for sustainability reporting, auditing, and regulatory compliance.
These limitations suggest that, while some AI methods have demonstrated value for sustainability-focused supply chain tasks, there remains a need for approaches that can explicitly model relational structure, support learning under incomplete information, and provide interpretable decision support. This observation motivates the growing interest in knowledge graph-based representations, which offer a natural mechanism for capturing heterogeneous supply chain entities, relationships, and sustainability attributes in a unified and semantically rich structure.
2.3. Graph-Based AI in Supply Chains
Graph-based AI has become an increasingly prominent approach for modelling supply chains as complex, interconnected systems rather than linear sequences of activities. As shown in Table 3, existing research adopts graph representations to capture heterogeneous entities such as firms, products, locations, and infrastructure, together with the relational dependencies that govern material, information, and financial flows. Across this literature, graph-based methods are used to support a wide range of objectives, including visibility enhancement, risk identification, forecasting, routing, and operational optimisation [28,29,30,31].
From a representational perspective, prior work spans three broad categories. Several studies rely on explicit knowledge graphs, often grounded in domain ontologies, to structure supply chain information and enable semantic querying and reasoning [15,34,35,38]. These approaches are commonly used for tasks such as visibility enhancement, risk analysis, and information extraction, where the primary contribution lies in organising heterogeneous data into a coherent relational structure [28,38]. In contrast, a larger body of work employs learned graph representations, where supply chain networks are constructed directly from observed interactions or transactions and subsequently processed using GNNs [29,30,33,39]. These learned graphs typically prioritise predictive performance in tasks such as demand forecasting, risk detection, or coordination analysis [29,30]. A few studies combine both paradigms, integrating structured graph representations with embedding-based or multimodal learning to support more complex decision workflows [31,32,37,40].
In terms of inference and learning, Table 3 shows that most graph-based approaches rely on data-driven inference mechanisms, including GNNs, Graph Convolutional Networks (GCNs), variational graph autoencoders, large language models, and federated graph learning frameworks [29,30,36,39,41,42]. These models learn latent representations through message passing and aggregation, enabling the prediction of missing links, node attributes, or future demand patterns [15,35]. Knowledge-graph-centric approaches place greater emphasis on explicit reasoning, such as graph traversal, logical inference, or neurosymbolic reasoning, often to improve explainability and practitioner trust [15,32]. Hybrid approaches combine these techniques, using learned embeddings to enhance reasoning over structured graphs or to integrate additional data modalities [31,37].
Despite the methodological sophistication of these approaches, Table 3 reveals a consistent limitation across the graph-based supply chain literature: sustainability is rarely operationalised as an explicit decision objective. With the notable exception of Ni et al. [31], which frames optimisation in terms of energy intensity and circular economy considerations, most studies do not encode environmental or social sustainability variables directly into their graph representations, learning objectives, or inference mechanisms [28,29,34,35]. Instead, sustainability is either absent, treated implicitly, or subsumed under broader goals such as efficiency, resilience, or risk reduction [30,33,39]. Even when sustainability is mentioned, as in the case of Wang et al. [32], it is conceptual rather than measurable, and is not linked to concrete decision criteria such as carbon emissions, compliance status, or supplier sustainability performance.
This gap highlights a disconnect between the demonstrated capability of graph-based AI to model complex supply chain structures and its limited use in supporting sustainability-oriented decisions. Existing approaches show that graphs are well suited for representing multi-tier relationships and reasoning under uncertainty, yet they stop short of using these capabilities to guide supplier selection or substitution under explicit sustainability constraints [15,35]. Addressing this limitation motivates the methodology proposed in this paper. By integrating sustainability-relevant attributes into a supply chain knowledge graph and leveraging graph completion techniques, we aim to move beyond visibility and prediction towards decision support for sustainable supplier selection and substitution.
3. Methodology
This section presents our proposed knowledge graph-based AI methodology for sustainability-oriented decision support in supply chains. We model a multi-tier supply network as a heterogeneous knowledge graph capturing entities such as products, firms, locations, and sustainability-related attributes, and we learn predictive representations using GNNs. The following two complementary prediction tasks are considered: (i) link prediction, used to infer missing relationships and recommend feasible supplier alternatives under disruption or sustainability constraints [39,43,44], and (ii) node classification, used to estimate sustainability-related labels such as emissions risk bands or compliance status where data are incomplete [45]. This combination performs knowledge graph completion within a heterogeneous supply chain graph.
To assess the impact of different architectures, we evaluate multiple GNN variants. GraphSAGE [46] is treated as a baseline inductive aggregation model, and its performance is compared against attention-based and relation-aware architectures, specifically Graph Attention Networks (GAT) [47] and Relational Graph Convolutional Networks (RGCN) [48]. By combining explicit relational representation with inductive graph learning, the approach aims to support sustainability-aware supplier selection and substitution in complex, data-sparse supply chain settings. The proposed methodology is illustrated in Figure 1.
We focus on a sustainability-aware supplier selection and substitution problem in which decision-makers must identify feasible alternative suppliers or partners when existing relationships are disrupted or fail to meet sustainability requirements. The decision is subject to multiple constraints, including limited visibility across multi-tier networks, incomplete or missing relational data, and the need to account for sustainability performance indicators such as carbon emissions, certification status, or compliance proxies. Rather than assuming complete and reliable information, the problem is formulated under uncertainty, where the objective is to infer plausible supplier alternatives and assess their sustainability suitability using graph-based prediction. The outputs are therefore not deterministic prescriptions but ranked or classified recommendations that support informed, sustainability-conscious decision-making.
3.1. Knowledge Graph Construction
Supply chains are naturally characterised by heterogeneous entities and multi-relational dependencies that are difficult to capture using flat or purely sequential data structures. To represent this complexity explicitly, we model the supply chain as a heterogeneous knowledge graph, in which entities are represented as nodes and typed relationships are represented as directed edges. This representation provides a structured and interpretable view of the supply network, enabling reasoning over incomplete and multi-tier relationships [49].
Formally, a supply chain knowledge graph is defined as follows:
$$\mathcal{G} = (\mathcal{E}, \mathcal{R}),$$
where $\mathcal{E}$ is a set of entities (nodes) and $\mathcal{R}$ is a set of relations (edges). Equivalently, a knowledge graph can be defined as a set of triples $\mathcal{T} = \{(h, r, t)\} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$, where $h \in \mathcal{E}$ and $t \in \mathcal{E}$ denote two entities linked through relation $r \in \mathcal{R}$. Entities correspond to supply chain actors and artefacts, such as suppliers, manufacturers, products, locations, and sustainability-related attributes, while relations encode interactions such as manufactures, supplies to, is located in, or holds certification. The ontology underlying the graph specifies permissible entity types and relation semantics, ensuring consistency and interpretability across the network [50].
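To make the triple-based definition concrete, the following is a minimal sketch of a knowledge graph stored as a set of $(h, r, t)$ triples, with a small ontology that restricts which entity types each relation may connect. The schema, entity type names, and relation identifiers below are hypothetical illustrations that mirror the relations named in the text; they are not the schema used in this study.

```python
from dataclasses import dataclass, field

# Hypothetical ontology: relation type -> (permitted head type, permitted tail type).
SCHEMA = {
    "manufactures": ("Manufacturer", "Product"),
    "supplies_to": ("Supplier", "Manufacturer"),
    "is_located_in": ("Manufacturer", "Location"),
    "holds_certification": ("Manufacturer", "Certification"),
}

@dataclass
class KnowledgeGraph:
    entity_types: dict = field(default_factory=dict)  # entity id -> entity type (the set E)
    triples: set = field(default_factory=set)         # set of (h, r, t) triples

    def add_entity(self, entity: str, etype: str) -> None:
        self.entity_types[entity] = etype

    def add_triple(self, h: str, r: str, t: str) -> None:
        # Reject triples whose relation or endpoint types the ontology does not permit.
        if r not in SCHEMA:
            raise ValueError(f"unknown relation: {r}")
        head_type, tail_type = SCHEMA[r]
        if self.entity_types.get(h) != head_type or self.entity_types.get(t) != tail_type:
            raise ValueError(f"type violation for ({h}, {r}, {t})")
        self.triples.add((h, r, t))

    def relations(self) -> set:
        # The relation set R actually used by the stored triples.
        return {r for _, r, _ in self.triples}

# Usage: a manufacturer producing one product.
kg = KnowledgeGraph()
kg.add_entity("M1", "Manufacturer")
kg.add_entity("P1", "Product")
kg.add_triple("M1", "manufactures", "P1")
```

Validating every triple against the schema at insertion time is one simple way to realise the consistency guarantee attributed to the ontology; a production system would more likely rely on an ontology language and a graph database.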
The knowledge graph schema is designed to accommodate sustainability-relevant information alongside operational relationships [51]. Sustainability attributes, such as carbon emission scores or certification indicators, are associated with entities as node attributes or linked entities, allowing them to be queried and compared within the same relational structure. This design choice enables sustainability considerations to be integrated into downstream decision-making without requiring a separate data model.
Unlike purely learned graph representations, the proposed knowledge graph is constructed explicitly from available data sources using a predefined schema rather than inferred solely from interaction patterns. This explicit representation facilitates transparency and traceability, which are important for sustainability-oriented decision support. At the same time, the resulting graph remains incomplete and sparse in practice, reflecting real-world limitations in supply chain data availability. Addressing this incompleteness motivates the learning and inference mechanisms introduced in the subsequent sections.
3.2. Graph Neural Network Model
The explicit knowledge graph constructed in Section 3.1 provides a structured representation of the supply chain, but by itself it does not resolve missing relationships or incomplete sustainability information. To address these limitations, we employ GNNs to learn latent representations of entities and relations directly from the graph structure and associated attributes. These learned representations are then used to support predictive tasks under data sparsity and uncertainty.
We evaluate multiple GNN variants to assess the impact of different architectures. GraphSAGE is treated as a baseline model due to its suitability for large, evolving supply networks and its ability to generalise to previously unseen entities without retraining on the entire graph [52]. In addition, we consider an attention-based architecture (GAT) to capture the varying importance of neighbouring nodes and a relation-aware architecture (RGCN) to explicitly model typed relationships within heterogeneous supply chain graphs.
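The neighbourhood aggregation behind the GraphSAGE baseline can be sketched as a single message-passing step with mean aggregation, written here in plain Python for transparency. The update $h_v' = \mathrm{ReLU}(W_{\text{self}} h_v + W_{\text{neigh}} \cdot \mathrm{mean}_{u \in N(v)} h_u)$ is equivalent to applying one weight matrix to the concatenation of $h_v$ and the neighbour mean, as in the original GraphSAGE formulation. The weight matrices and toy features are illustrative assumptions. GAT would replace the uniform mean with learned attention weights over neighbours, and RGCN would use a separate neighbour weight matrix per relation type; PyTorch Geometric provides these layers as `SAGEConv`, `GATConv`, and `RGCNConv`.

```python
def matvec(W, x):
    # Dense matrix-vector product on Python lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sage_mean_layer(h, neighbours, W_self, W_neigh):
    """One GraphSAGE-style update with mean aggregation.

    h: dict node -> feature list; neighbours: dict node -> list of neighbour nodes.
    """
    dim = len(next(iter(h.values())))
    out = {}
    for v, h_v in h.items():
        nbrs = neighbours.get(v, [])
        if nbrs:
            # Every neighbour contributes equally to the aggregated message.
            agg = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            agg = [0.0] * dim  # isolated node: no neighbour message
        z = [a + b for a, b in zip(matvec(W_self, h_v), matvec(W_neigh, agg))]
        out[v] = [max(0.0, zi) for zi in z]  # ReLU non-linearity
    return out

# Usage: three nodes, identity weights, one layer.
I2 = [[1.0, 0.0], [0.0, 1.0]]
h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nbrs = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
h1 = sage_mean_layer(h, nbrs, I2, I2)
```

Stacking two such calls (feeding `h1` back in) gives the two-hop receptive field of a two-layer model.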
Two complementary learning tasks are defined. First, link prediction is used to infer missing or potential relationships between entities, such as alternative supplier–product or product–location links. Given node embeddings learned through neighbourhood aggregation, the model estimates the likelihood that a relation exists between a pair of entities, enabling the identification of feasible but previously unobserved connections. Second, node classification is used to assign sustainability-related labels or risk bands to individual entities based on their learned representations and attributes. This task supports the estimation of sustainability performance where direct measurements are unavailable or incomplete. These tasks jointly contribute to completing the supply chain knowledge graph, providing a more informative basis for sustainability-oriented reasoning.
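The two tasks can be illustrated by the prediction heads applied to learned node embeddings. This is a minimal sketch that assumes a dot-product decoder for link prediction and a linear softmax head for node classification; the text does not specify the decoder or classifier head, so both choices, the weights, and the label names are assumptions.

```python
import math

def link_probability(z_u, z_v):
    # Dot-product decoder: sigmoid of the inner product of two node embeddings.
    score = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-score))

def classify_node(z_v, W_cls, labels):
    # Linear head followed by a softmax over sustainability labels (e.g., risk bands).
    logits = [sum(w * x for w, x in zip(row, z_v)) for row in W_cls]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Usage with hypothetical embeddings and classifier weights.
p_link = link_probability([0.0, 0.0], [1.0, 1.0])
label, probs = classify_node([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                             ["low", "medium", "high"])
```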
Each GNN variant is implemented using stacked message-passing layers that update node embeddings by aggregating information from neighbouring nodes and combining it with the node’s current representation. Non-linear transformations are applied between message-passing layers to capture complex relational dependencies. Model training is performed using supervised objectives appropriate to each task, including binary cross-entropy loss for link prediction and classification loss functions for node classification. Optimisation is carried out using adaptive gradient-based methods to ensure stable convergence in heterogeneous and high-dimensional settings.
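For link prediction, the binary cross-entropy objective treats observed edges as label 1 and negatively sampled edges as label 0. The sketch below computes the mean loss directly from raw scores (logits) in a numerically stable form, which matches in form what `torch.nn.BCEWithLogitsLoss` computes in PyTorch; the function name is a hypothetical illustration.

```python
import math

def softplus(x):
    # log(1 + exp(x)) computed without overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def bce_link_loss(pos_scores, neg_scores):
    """Mean binary cross-entropy over raw link scores.

    For label 1: -log(sigmoid(s)) = softplus(-s).
    For label 0: -log(1 - sigmoid(s)) = softplus(s).
    """
    losses = [softplus(-s) for s in pos_scores] + [softplus(s) for s in neg_scores]
    return sum(losses) / len(losses)

# Usage: confident, correct scores give a loss close to zero.
loss_uninformed = bce_link_loss([0.0], [0.0])   # equals log(2)
loss_confident = bce_link_loss([10.0], [-10.0])
```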
By separating explicit graph representation from inductive graph learning, the proposed approach provides a flexible mechanism for uncovering latent structure in the supply chain while preserving the interpretability afforded by the underlying knowledge graph. The outputs of the learning stage serve as inputs to the sustainability reasoning mechanisms described in the following section.
3.3. Sustainability Reasoning Mechanism
The predictions produced by the GNN are used to support sustainability-oriented reasoning and decision-making in supply chains. Rather than treating model outputs as end results, the proposed approach interprets link prediction and node classification outcomes as decision signals that can be evaluated against sustainability constraints and operational requirements.
For link prediction, inferred relationships between entities are used to identify feasible supplier or partner alternatives that are not explicitly present in the observed data. In the context of supplier substitution, predicted links indicate potential suppliers capable of fulfilling similar roles to existing partners. These alternatives are then filtered or ranked based on sustainability-related attributes, such as carbon emission proxies, certification status, or compliance indicators. This process enables decision-makers to consider sustainability performance alongside feasibility, thereby supporting substitution decisions that reduce environmental impact or improve compliance without disrupting supply continuity.
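The filtering and ranking step can be sketched as follows, assuming each candidate carries a predicted link probability, a relative emission proxy (lower is better), and a certification flag. The field names, the probability threshold, and the tie-breaking rule are hypothetical choices for illustration, not the paper's exact procedure.

```python
def rank_alternatives(candidates, min_prob=0.5, require_certified=True):
    """Keep feasible, compliant candidates and rank them by the emission proxy.

    candidates: list of dicts with hypothetical keys 'supplier', 'link_prob',
    'emission_score' (relative proxy, lower is better) and 'certified' (bool).
    """
    feasible = [
        c for c in candidates
        if c["link_prob"] >= min_prob and (c["certified"] or not require_certified)
    ]
    # Lowest emission proxy first; ties broken by higher predicted link probability.
    return sorted(feasible, key=lambda c: (c["emission_score"], -c["link_prob"]))

# Usage with hypothetical GNN outputs for four candidate suppliers.
cands = [
    {"supplier": "s1", "link_prob": 0.90, "emission_score": 0.7, "certified": True},
    {"supplier": "s2", "link_prob": 0.80, "emission_score": 0.3, "certified": True},
    {"supplier": "s3", "link_prob": 0.95, "emission_score": 0.1, "certified": False},
    {"supplier": "s4", "link_prob": 0.40, "emission_score": 0.2, "certified": True},
]
ranked = rank_alternatives(cands)
```

Here `s3` is excluded for lacking certification and `s4` for low feasibility, leaving `s2` ahead of `s1` on the emission proxy.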
For node classification, entities are assigned sustainability-related labels or risk bands based on their learned representations and associated attributes. These labels provide an interpretable abstraction of sustainability performance, for example by categorising suppliers into low, medium, or high emission risk groups. Such classifications can be used to flag entities that require intervention, monitoring, or replacement, and to prioritise sustainability improvements across the network.
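Low, medium, and high risk bands require class labels for training. One plausible way to derive them from a continuous emission proxy is to cut at tertiles, as sketched below; the tertile rule is an assumption for illustration, since the text does not state how band boundaries are set.

```python
def risk_bands(scores):
    """Assign low/medium/high emission-risk bands by tertiles of a relative proxy.

    scores: dict entity -> emission score. Tertile cut-offs are an illustrative choice.
    """
    ordered = sorted(scores.values())
    n = len(ordered)
    low_cut = ordered[n // 3]          # scores below this fall in the lowest third
    high_cut = ordered[(2 * n) // 3]   # scores at or above this fall in the top third
    bands = {}
    for entity, s in scores.items():
        if s < low_cut:
            bands[entity] = "low"
        elif s < high_cut:
            bands[entity] = "medium"
        else:
            bands[entity] = "high"
    return bands

# Usage with six hypothetical supplier scores.
bands = risk_bands({"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4, "e": 0.5, "f": 0.6})
```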
Explainability arises naturally from the graph-based structure underpinning both tasks. Because predictions are derived from neighbourhood aggregation and relational context, it is possible to trace which connected entities and relations contribute to a given recommendation. For link prediction, explanatory context can be inspected through shared neighbours and direct relational overlap between existing and predicted alternative suppliers. For node classification, neighbourhood composition and attribute similarity provide insight into why an entity is assigned to a particular sustainability category. This relational transparency distinguishes our approach from black-box prediction models and supports trust in sustainability-oriented decision support.
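The shared-neighbour explanation for a predicted substitute can be sketched as a simple overlap computation between the neighbourhoods of an existing supplier and a predicted alternative. The adjacency structure and the use of Jaccard overlap as a summary score are assumptions for illustration.

```python
def explain_substitution(adj, existing, candidate):
    """Summarise relational evidence linking a predicted alternative to an existing supplier.

    adj: dict node -> set of neighbour nodes (e.g., products a manufacturer produces).
    Returns the shared neighbours (sorted) and their Jaccard overlap.
    """
    a, b = adj.get(existing, set()), adj.get(candidate, set())
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard

# Usage: two manufacturers with partly overlapping product portfolios.
adj = {"m1": {"p1", "p2", "p3"}, "m2": {"p2", "p3", "p4"}}
shared, overlap = explain_substitution(adj, "m1", "m2")
```

Listing the shared neighbours gives a practitioner concrete evidence (here, two common products) rather than only a score.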
By explicitly linking graph-based predictions to sustainability constraints and substitution logic, the proposed reasoning mechanism moves beyond visibility and forecasting. It enables actionable, interpretable recommendations that support sustainability-aware supplier selection and substitution under uncertainty, providing a practical foundation for responsible supply chain decision-making.
4. Evaluation
This section empirically evaluates the proposed knowledge graph-based methodology in terms of its ability to support sustainability-oriented decision tasks in supply chains. Specifically, the experiments assess the following: (i) the effectiveness of link prediction in identifying feasible alternative relationships within incomplete supply networks; and (ii) the performance of node classification in estimating sustainability-related labels under limited data availability. The evaluation is conducted on two real-world supply chain datasets that exhibit heterogeneous structure and structural sparsity, reflecting the challenges encountered in practice. The following subsections describe the datasets, experimental setup, and evaluation metrics used in the analysis.
4.1. Dataset Description
To evaluate the proposed knowledge graph-based methodology under realistic supply chain conditions, we use two real-world datasets that differ in structure, scale, and domain focus. Both datasets exhibit heterogeneity and relational sparsity, reflecting the incomplete and fragmented nature of supply chain data encountered in practice. Together, they enable the assessment of our approach across distinct decision contexts and prediction tasks.
The first dataset represents a food supply chain derived from the LODHalal project (https://github.com/utomogirraz/graph-app-kit, accessed on 12 January 2026) and converted into a knowledge graph format following the process explained in Sunmola et al. [53]. The dataset focuses on relationships between food products, manufacturers, ingredients, and certifications. For the purposes of this study, the analysis concentrates on food–manufacturer relationships, which are particularly relevant for supplier substitution and continuity decisions. The dataset contains 43,101 food entities and 2520 manufacturer entities, connected through explicit production relationships. Despite its size, the dataset is structurally sparse, with many potential relationships unobserved, making it suitable for evaluating link prediction in incomplete supply networks. This dataset is primarily used to assess the ability of the proposed method to infer feasible alternative supplier relationships.
The second dataset, the AnonNet Dataset [54] obtained from the Supply Chain Data Hub (https://supplychaindatahub.org/, accessed on 12 January 2026), captures a manufacturing supply network consisting of products and production locations connected through operational relationships. The network includes 3813 nodes, of which a small subset of 106 corresponds to locations and the remainder to products, resulting in an imbalanced and heterogeneous graph structure. This dataset is used to explore both link prediction and node classification tasks in a more complex industrial context.
Direct, standardised sustainability indicators are not available in either dataset, reflecting a common limitation in real-world supply chain data. To enable evaluation of sustainability-aware reasoning, we introduce synthetic sustainability attributes in the second dataset in the form of carbon emission scores associated with selected entities. The scores were produced using a generative process informed by publicly reported industry-level emissions ranges. They are used solely as relative proxies rather than absolute measurements, analogous to the Scope 3 emissions indicators or compliance-related risk measures used in practice when direct measurements are unavailable. The use of synthetic proxies allows controlled evaluation of the proposed reasoning mechanisms while preserving the structural characteristics of the underlying real-world networks.
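A generative process of this kind can be sketched as drawing a raw value per entity from its sector's range and then rescaling to $[0, 1]$, so that only relative differences carry meaning. This is one plausible reading under stated assumptions: the sector labels and ranges below are placeholders, not real reported figures, and the uniform distribution and min-max rescaling are illustrative choices.

```python
import random

# Placeholder ranges (NOT real reported figures); substitute published
# industry-level ranges when reproducing the setup.
HYPOTHETICAL_RANGES = {"sector_a": (10.0, 40.0), "sector_b": (30.0, 90.0)}

def synthetic_emission_proxies(entity_sector, ranges, seed=0):
    """Draw a raw emission value per entity from its sector's range, then min-max
    normalise to [0, 1] so the scores act only as relative proxies."""
    rng = random.Random(seed)  # fixed seed for reproducible proxies
    raw = {e: rng.uniform(*ranges[s]) for e, s in entity_sector.items()}
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all values coincide
    return {e: (v - lo) / span for e, v in raw.items()}

# Usage with three hypothetical entities.
sectors = {"x": "sector_a", "y": "sector_b", "z": "sector_a"}
proxies = synthetic_emission_proxies(sectors, HYPOTHETICAL_RANGES, seed=42)
```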
For both datasets, the graphs are partitioned into training, validation, and test sets following standard practice for graph learning tasks. Care is taken to preserve relational structure across splits to avoid information leakage. The resulting datasets support systematic evaluation of the proposed methodology’s ability to operate under data sparsity, heterogeneity, and limited sustainability information.
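For link prediction, a leakage-aware split can be sketched as a random partition of observed edges into disjoint train, validation, and test sets, where only training edges are later used for message passing. The fractions and seed are illustrative assumptions. PyTorch Geometric's `RandomLinkSplit` transform offers comparable functionality for homogeneous and heterogeneous graphs.

```python
import random

def split_edges(edges, val_frac=0.1, test_frac=0.1, seed=0):
    """Randomly partition observed edges into disjoint train/validation/test sets.

    Only the training edges should feed message passing, so held-out edges cannot
    leak into the embeddings used to score them.
    """
    edges = sorted(set(edges))  # deduplicate and fix order for reproducibility
    random.Random(seed).shuffle(edges)
    n_val, n_test = int(len(edges) * val_frac), int(len(edges) * test_frac)
    val = edges[:n_val]
    test = edges[n_val:n_val + n_test]
    train = edges[n_val + n_test:]
    return train, val, test

# Usage with 100 hypothetical food-manufacturer edges.
edges = [(f"f{i}", f"m{i % 5}") for i in range(100)]
train, val, test = split_edges(edges)
```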
4.2. Implementation Details
All experiments were implemented using the PyTorch deep learning framework v2.9.1, together with the PyTorch Geometric v2.8 and Deep Graph Library (DGL) toolkits, which provide support for heterogeneous graph representations and scalable GNN training. Computations were performed on a CUDA-enabled GPU environment through Google Colab to ensure efficient training on large and sparse graph structures.
Prior to model training, raw supply chain data were preprocessed to construct the heterogeneous graph representation described in Section 3.1. Entity identifiers were mapped to contiguous indices, and node features were encoded using one-hot or categorical representations as appropriate. For each dataset, both forward and reverse relations were included to preserve bidirectional information flow, and the resulting graphs were converted to undirected form where required for message passing.
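The identifier mapping and reverse-relation step can be sketched in plain Python as follows. The triple format, the function name `build_hetero_edges` and the `rev_` prefix for reverse relations are illustrative assumptions rather than the paper's preprocessing code; PyTorch Geometric also offers a built-in transform for adding reverse edges (`torch_geometric.transforms.ToUndirected`).

```python
from collections import defaultdict


def build_hetero_edges(triples):
    """Map raw entity IDs to contiguous per-type indices and add a reverse
    relation for every forward relation.

    triples: iterable of (head_type, head_id, relation, tail_type, tail_id).
    """
    index = defaultdict(dict)  # node_type -> {raw_id: contiguous index}
    edges = defaultdict(list)  # (src_type, relation, dst_type) -> [(src, dst)]

    def to_idx(node_type, raw_id):
        table = index[node_type]
        if raw_id not in table:
            table[raw_id] = len(table)  # assign the next free index
        return table[raw_id]

    for h_type, h_id, rel, t_type, t_id in triples:
        h, t = to_idx(h_type, h_id), to_idx(t_type, t_id)
        edges[(h_type, rel, t_type)].append((h, t))           # forward edge
        edges[(t_type, "rev_" + rel, h_type)].append((t, h))  # reverse edge
    return dict(index), dict(edges)
```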
The GNN models were implemented as two-layer message-passing architectures. GraphSAGE employs mean-based neighbourhood aggregation, GAT uses attention mechanisms to weight neighbouring contributions, and RGCN applies relation-specific transformations to model typed edges explicitly. These configurations were selected to balance expressive power and computational efficiency in inductive learning settings while enabling comparison across neighbourhood aggregation strategies. For link prediction tasks, negative sampling was used to generate non-existent edges during training, enabling the model to distinguish between plausible and implausible relationships. A higher ratio of negative to positive samples was used to reduce false positive predictions in sparse graphs.
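Negative sampling can be illustrated with a simple tail-corruption sampler that pairs each observed source with randomly drawn destinations it is not known to link to. The default of five negatives per positive is a hypothetical value, since the exact ratio is not reported; note also that sampled negatives may include true but unobserved links, which is inherent to the technique. In practice, PyTorch Geometric provides `torch_geometric.utils.negative_sampling` for the same purpose.

```python
import random
from collections import Counter


def sample_negatives(pos_edges, num_dst, ratio=5, seed=0):
    """Tail-corruption negative sampling: for every observed (src, dst) edge,
    draw `ratio` destinations that are not observed neighbours of src."""
    rng = random.Random(seed)
    positives = set(pos_edges)
    out_degree = Counter(src for src, _ in positives)
    negatives = []
    for src, _ in pos_edges:
        if out_degree[src] >= num_dst:
            continue  # src is linked to every destination; no negatives exist
        drawn = 0
        while drawn < ratio:
            dst = rng.randrange(num_dst)
            if (src, dst) not in positives:  # reject observed links
                negatives.append((src, dst))
                drawn += 1
    return negatives
```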
Model training followed a standard supervised learning procedure. As explained in Section 4.1, the datasets were partitioned into training, validation, and test sets, and mini-batch training was performed using neighbourhood sampling to scale to large graphs. Binary cross-entropy loss was used for link prediction, while standard classification loss functions were applied for node classification tasks. Optimisation was carried out using the Adam optimiser with adaptive learning rates to ensure stable convergence across heterogeneous feature spaces.
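To make the link prediction objective concrete, the sketch below pairs a dot-product decoder with a numerically stable binary cross-entropy computed from raw scores (logits). The decoder is an assumption, as the section does not state how edge scores are derived from node embeddings; the loss is the standard BCE-with-logits formulation, and the function names are hypothetical.

```python
import numpy as np


def link_logits(z_src, z_dst, edges):
    """Dot-product decoder: the logit for edge (u, v) is <z_src[u], z_dst[v]>."""
    src, dst = np.asarray(edges).T
    return np.sum(z_src[src] * z_dst[dst], axis=1)


def bce_with_logits(logits, labels):
    """Mean binary cross-entropy computed stably from raw logits:
    max(x, 0) - x * y + log(1 + exp(-|x|))."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    loss = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())


# Usage: positives (label 1) and sampled negatives (label 0) share one loss.
z = np.random.default_rng(0).normal(size=(4, 8))
pos, neg = [(0, 1), (2, 3)], [(0, 3), (1, 2)]
loss = bce_with_logits(link_logits(z, z, pos + neg), [1, 1, 0, 0])
```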
Hyperparameters, including the learning rate, batch size, and neighbourhood sampling depth, were selected empirically based on validation performance and held constant across experiments for consistency. Early stopping was applied where appropriate to prevent overfitting. Model performance was evaluated exclusively on held-out test data using the metrics described in the following section.
4.3. Baselines and Evaluation Metrics
To assess the effectiveness of the proposed approach, we conduct a comparative evaluation across multiple GNN architectures. GraphSAGE is treated as a baseline inductive aggregation model, and its performance is compared against GAT across both datasets. In addition, for the LODHalal dataset, we also evaluate RGCN to analyse the impact of explicitly modelling typed relationships within the heterogeneous supply chain graph. RGCN is not evaluated on the AnonNet dataset, as that dataset contains only a single edge type.
For link prediction, all models operate within the same knowledge graph completion setting and are trained using identical negative sampling strategies. The comparison therefore isolates the effect of architectural differences on the identification of plausible supplier substitutions and relational inferences.
For node classification, we evaluate GraphSAGE on its ability to assign sustainability-related labels based on learned relational representations. More broadly, by comparing architectures with differing structural inductive biases, we assess the extent to which incorporating attention mechanisms or explicit relation modelling improves relational inference under data sparsity. Node classification experiments are conducted only on the AnonNet dataset, as the LODHalal dataset does not provide suitable node-level attributes for sustainability-oriented classification.
Model performance is evaluated using multiple metrics. For link prediction, we report accuracy, AUC-ROC, F1-score, Mean Reciprocal Rank (MRR), and Hits@10. Ranking-based metrics (MRR and Hits@10) are particularly appropriate for sparse link prediction tasks, as they assess the model’s ability to prioritise plausible relationships among many negative candidates. For node classification, we report accuracy. All metrics are computed on held-out test sets to assess generalisation performance.
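The ranking metrics can be computed as in the following sketch, which assumes that each true test edge is ranked against its own list of negative candidate scores. Ties are resolved in favour of the true edge here; this convention, the function name and the input layout are illustrative assumptions, and evaluation toolkits may treat ties differently.

```python
def ranking_metrics(pos_scores, neg_scores, k=10):
    """Return (MRR, Hits@k).

    pos_scores[i] is the score of the i-th true link; neg_scores[i] is the
    list of scores of the negative candidates it is ranked against.
    """
    # 1-based rank: one plus the number of negatives scored strictly higher.
    ranks = [1 + sum(n > p for n in negs) for p, negs in zip(pos_scores, neg_scores)]
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_at_k = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits_at_k
```

MRR rewards placing the true link at the very top, whereas Hits@10 only checks whether it appears within the first ten candidates.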
5. Results and Discussion
This section presents and discusses the empirical results obtained from evaluating the proposed knowledge graph-based methodology on two real-world supply chain datasets. The analysis focuses on link prediction and node classification tasks and reports performance using the metrics discussed in Section 4.3. Beyond numerical performance, the discussion interprets the results in the context of sustainability-oriented decision support, examining how predictive outcomes can inform supplier substitution and sustainability classification under incomplete and heterogeneous data conditions.
5.1. Link Prediction Performance
Here we report the predictive performance of the proposed knowledge graph-based approach on link prediction tasks. For the LODHalal dataset, results for the link prediction task are summarised in Table 4. The results show that all models capture useful patterns in the graph, with performance improving as relational information is modelled more explicitly. GraphSAGE provides a strong baseline, GAT improves upon it, and RGCN achieves the best overall performance, likely owing to its ability to learn relation-specific transformations. This suggests that explicitly modelling heterogeneous relations in the supply chain graph improves the identification of supplier-product links.
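For reference, the relation-specific transformation that distinguishes RGCN can be sketched in NumPy as below, following the standard R-GCN update in which each relation r has its own weight matrix W_r and incoming messages are normalised by the number of neighbours under that relation. Function and argument names are hypothetical, bias terms and basis decomposition are omitted, and this is not the implementation used in the experiments.

```python
import numpy as np


def rgcn_layer(h, edges_by_rel, w_self, w_rel):
    """One R-GCN-style layer:
    h_i' = ReLU(h_i W_0 + sum_r sum_{j in N_r(i)} h_j W_r / |N_r(i)|)

    h: (num_nodes, d_in) node features
    edges_by_rel: {relation: [(src, dst), ...]} typed, directed edges
    w_self: (d_in, d_out) self-connection weights W_0
    w_rel: {relation: (d_in, d_out)} one weight matrix W_r per relation
    """
    out = h @ w_self  # self-connection term
    for rel, edges in edges_by_rel.items():
        msg = np.zeros_like(out)
        deg = np.zeros(h.shape[0])
        for src, dst in edges:
            msg[dst] += h[src] @ w_rel[rel]  # relation-specific transformation
            deg[dst] += 1
        out += msg / np.maximum(deg, 1)[:, None]  # normalise per relation
    return np.maximum(out, 0)  # ReLU
```

A homogeneous model such as GraphSAGE would instead apply one shared neighbour weight matrix to all edges, which is why it cannot distinguish, for example, a "produces" link from a "certified by" link.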
Results for link prediction in the AnonNet dataset are reported in Table 5 and are consistent with this dataset having a single relation type. Both models achieve comparable ranking performance, with similar MRR and Hits@10 values, indicating that neighbourhood structure alone captures much of the relational information present in the graph. However, GAT produces higher accuracy, AUC-ROC, and F1 scores, suggesting that attention-based aggregation helps prioritise more informative neighbours, even in a single-relation setting.
5.2. Sustainability-Oriented Supply Chain Decisions
Building on the results presented in the previous section, the proposed approach is intended not to optimise prediction accuracy in isolation, but to support feasibility-oriented decision-making. In particular, it aims to identify plausible alternative supplier relationships and to categorise entities using sustainability-related proxies. This framing enables decision support under uncertainty, where sustainability-oriented supply chain decisions must often be made in the presence of incomplete and heterogeneous information.
Figure 2 and Figure 3 provide visual evidence of the link prediction outcomes for the two datasets.
Figure 2 illustrates the predicted missing links in the LODHalal dataset, highlighting additional manufacturer–product relationships inferred by the proposed model beyond those explicitly observed in the original data. For example, food product 9593 (Canned Retorted Guava Slices) is predicted to be linked to manufacturer 5371 (Jain Farm Fresh Foods). These inferred links represent feasible alternative sourcing options within the halal food supply chain, which decision-makers could consider in the event of supply disruption or sustainability-related non-compliance. The visualisation demonstrates how graph-based learning can uncover latent structural relationships that are not readily visible through direct inspection of transactional data.
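Turning link scores into a shortlist of alternative manufacturers can be sketched as follows: score every manufacturer against a product embedding, discard manufacturers already linked to that product, and keep the top k. The dot-product scoring and the function name `suggest_alternatives` are assumptions for illustration only.

```python
import numpy as np


def suggest_alternatives(z_product, z_manufacturers, existing, k=3):
    """Rank manufacturers for one product by dot-product score, skipping
    manufacturers that are already linked to it."""
    scores = z_manufacturers @ z_product
    order = np.argsort(-scores)  # highest score first
    return [int(m) for m in order if int(m) not in existing][:k]
```

The shortlist is a set of candidates for further assessment, not a sourcing decision in itself.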
Similarly, Figure 3 visualises the predicted missing links in the AnonNet manufacturing dataset. The inferred relationships between products and locations reveal alternative production or sourcing pathways within a complex industrial network. From a sustainability perspective, such inferred links can support decisions that involve relocating production, diversifying suppliers, or avoiding geographically concentrated risks, all of which are relevant to maintaining resilient and sustainable supply chains.
Figure 4 complements the link prediction results by illustrating the outcomes of the node classification task on the AnonNet dataset. The figure shows the predicted sustainability-related categories assigned to manufacturing entities based on proxy indicators. The reported accuracy for these preliminary results is 0.61. While the classification labels are simplified representations, the visualisation highlights how entities can be differentiated into relative sustainability bands using relational context rather than isolated attributes. This supports decision-making processes such as identifying suppliers with comparatively lower sustainability risk or prioritising entities for further assessment or engagement.
In summary, the integration of link prediction and node classification within a knowledge graph-based framework enables a unified view of supply chain structure, alternatives, and sustainability proxies. By visualising inferred relationships and classifications, the proposed approach provides transparency into how predictions are formed, supporting more informed and accountable sustainability-oriented supply chain decisions.
5.3. Explainability and Trustworthiness
A key motivation for adopting a knowledge graph-based approach is its potential to support explainable and trustworthy decision-making in sustainability-oriented supply chain contexts. Unlike black-box predictive models that produce isolated outputs, the proposed method operates over an explicit relational structure, enabling predictions to be interpreted in terms of entities, relationships, and neighbourhood context.
For the link prediction task, explainability is supported through the underlying graph structure and the aggregation of neighbouring nodes. Predicted relationships can be examined by tracing the local subgraphs that contribute to a given inference, as illustrated in Figure 2 and Figure 3. This allows practitioners to inspect which existing manufacturer–product or product–location relationships influence the identification of alternative links, providing a transparent rationale for why certain relationships are considered feasible.
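The local subgraph behind a prediction can be extracted with a breadth-first search around both endpoints of the predicted link. Because the models use two message-passing layers, a two-hop neighbourhood covers every node whose features can influence the two endpoint embeddings (neighbourhood sampling may use only a subset of it). The sketch assumes an undirected adjacency dictionary with comparable node identifiers; the function name is hypothetical.

```python
from collections import deque


def explanation_subgraph(adj, u, v, hops=2):
    """Collect nodes within `hops` of either endpoint of a predicted link (u, v)
    and the observed undirected edges among them."""
    dist = {u: 0, v: 0}
    queue = deque([u, v])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not expand beyond the receptive field
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    nodes = set(dist)
    edges = {tuple(sorted((a, b))) for a in nodes for b in adj.get(a, ()) if b in nodes}
    return nodes, edges
```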
In the case of node classification, trustworthiness is enhanced by the ability to relate predicted categories to the surrounding relational context. As shown in Figure 4, entities are classified based not only on individual attributes, but also on their position within the broader supply network. This supports explainability by enabling decision-makers to understand classifications in terms of network structure and similarity to other entities, rather than relying on non-transparent scoring mechanisms.
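One simple way to ground a classification in similarity to other entities is to retrieve the labelled entities whose learned embeddings are closest to the entity in question. The sketch below uses cosine similarity over a NumPy embedding matrix; the embedding source, the function name and the choice of k are assumptions, and this retrieval step is an illustration rather than a component reported in the study.

```python
import numpy as np


def most_similar_labelled(z, query, labelled, k=3):
    """Return the k labelled entities whose embeddings are most cosine-similar
    to entity `query`, as (index, similarity) pairs, most similar first."""
    labelled = np.asarray(labelled)
    norms = np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    zn = z / norms  # unit-length rows, so dot product = cosine similarity
    sims = zn[labelled] @ zn[query]
    top = np.argsort(-sims)[:k]
    return [(int(labelled[i]), float(sims[i])) for i in top]
```

Showing the labels of the retrieved entities next to a prediction gives decision-makers a concrete point of comparison.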
It should be noted that our approach does not claim to provide definitive or causal explanations of sustainability performance. Instead, it offers structured, easily inspected reasoning pathways that support informed judgement under uncertainty. Trustworthiness is further supported by treating link prediction and node classification outputs as recommendations, produced from sparse and proxy-based data, on which relevant stakeholders can base informed decisions.
5.4. Scalability and Practical Considerations
The proposed approach is designed to operate in supply chain environments characterised by heterogeneity, sparsity, and incomplete sustainability information, which are common conditions in real-world settings. From a scalability perspective, the use of inductive graph learning enables the model to generalise beyond the specific entities observed during training, supporting the inclusion of new suppliers, products, or locations without requiring complete retraining of the network. This property is particularly relevant for dynamic supply chains, where entities and relationships evolve over time.
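The inductive property can be illustrated with a GraphSAGE-style mean-aggregation layer, h_i = ReLU(x_i W_self + mean of x_j W_neigh over neighbours j), which follows the form of PyTorch Geometric's `SAGEConv` layer. Because the weight matrices are shared across all nodes, a new supplier can be embedded by appending its features and neighbour list and re-running the layer with the already-trained weights. The function below is a simplified single-layer sketch with hypothetical names, omitting bias terms and neighbourhood sampling.

```python
import numpy as np


def sage_mean_layer(x, neighbours, w_self, w_neigh):
    """GraphSAGE-style mean aggregation for every node i:
    h_i = ReLU(x_i W_self + mean_{j in N(i)} x_j W_neigh).
    The weights are shared across nodes, so they apply to unseen nodes too."""
    out = np.empty((len(x), w_self.shape[1]))
    for i in range(len(x)):
        nbrs = neighbours.get(i, [])
        agg = x[nbrs].mean(axis=0) if nbrs else np.zeros(x.shape[1])
        out[i] = x[i] @ w_self + agg @ w_neigh
    return np.maximum(out, 0)  # ReLU
```

In this single-layer sketch, adding a new node changes only the embeddings of nodes whose neighbour lists include it, so existing entities need not be re-embedded from scratch.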
In practical terms, the approach is intended to function as a decision support tool rather than as an automated decision-making system. The inferred links and classified entities provide structured information that can assist practitioners in narrowing down feasible alternatives, prioritising further investigation, or identifying potential sustainability risks within complex networks. By operating over an explicit relational representation, the model supports inspection and validation of predictions, which is essential in organisational contexts where sustainability decisions must be justified to internal and external stakeholders.
At the same time, several practical considerations must be acknowledged. The quality of inferred relationships and classifications remains dependent on the underlying graph structure and the relevance of the available attributes. Given this, our approach should be viewed as complementary to existing assessment processes, such as audits or expert evaluations, rather than as a substitute for them. Future deployments could benefit from integrating richer sustainability indicators, temporal information, or external data sources as they become available.
Sustainability-related data availability presents additional considerations, which are discussed separately in Section 5.5.
5.5. Implications of Proxy-Based Sustainability Indicators
In this paper, the evaluation relied on synthetically generated carbon emission scores as relative sustainability proxies, owing to the limited availability of standardised, multi-tier sustainability disclosures in real-world supply chain datasets. While this allows us to conduct a feasibility exploration of the proposed framework, it simplifies the multidimensional nature of verified Scope 3 reporting, where environmental, governance, and compliance indicators may be interdependent and non-linearly correlated.
Within the proposed framework, sustainability information is incorporated as node-level attributes or categorical risk bands. Transitioning to verified, multi-dimensional Scope 3 data would primarily affect the reasoning layer rather than the underlying graph learning mechanism. Richer Environmental, Social, and Governance (ESG) indicators could be encoded as vector-valued node features or additional typed relations, allowing the GNN to propagate more granular sustainability signals through relational aggregation.
In practice, this would shift decision support from threshold-based filtering towards multi-criteria assessment of predicted supplier alternatives, where a range of trade-offs need to be considered. The present results should therefore be interpreted as a proof-of-concept demonstration of sustainability-aware reasoning under data sparsity, with the framework designed to accommodate more comprehensive sustainability indicators as they become available.
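The contrast between threshold-based filtering and multi-criteria assessment can be illustrated with a weighted-sum model, one of the most basic multi-criteria decision analysis techniques. In the sketch, each candidate carries a predicted link probability (higher is better) and normalised risk proxies in [0, 1] (lower is better); the candidate values, criteria names, weights and threshold are all hypothetical.

```python
def threshold_filter(candidates, max_carbon=0.5):
    """Threshold-based filtering on a single relative carbon proxy."""
    return [c["id"] for c in candidates if c["carbon"] <= max_carbon]


def rank_alternatives(candidates, weights):
    """Weighted-sum ranking. 'link_prob' is a benefit (higher is better);
    every other criterion is a risk in [0, 1] and is inverted before weighting."""
    def score(c):
        return sum(
            w * (c[name] if name == "link_prob" else 1.0 - c[name])
            for name, w in weights.items()
        )
    return [c["id"] for c in sorted(candidates, key=score, reverse=True)]


candidates = [
    {"id": "A", "link_prob": 0.9, "carbon": 0.6, "governance": 0.2},
    {"id": "B", "link_prob": 0.7, "carbon": 0.3, "governance": 0.1},
    {"id": "C", "link_prob": 0.8, "carbon": 0.4, "governance": 0.9},
]
print(threshold_filter(candidates, max_carbon=0.5))  # ['B', 'C']
print(rank_alternatives(candidates, {"link_prob": 0.4, "carbon": 0.4, "governance": 0.2}))  # ['B', 'A', 'C']
```

With these toy values, candidate C passes the carbon threshold but ranks last once its governance proxy is weighed in, while A, which fails the threshold, ranks second.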
6. Conclusions and Future Work
In this paper, we proposed a graph-based AI framework for supporting sustainability-oriented decision-making in supply chains under conditions of uncertainty and incomplete information. By modelling multi-tier supply networks as heterogeneous knowledge graphs and applying graph neural networks for knowledge graph completion through link prediction and node classification, the approach enables the identification of feasible alternative supplier relationships and the categorisation of entities using sustainability-related proxies. Unlike prior work that focuses primarily on visibility, optimisation, or performance metrics, the proposed methodology emphasises feasibility-oriented and explainable decision support, allowing practitioners to reason about sustainability trade-offs in complex and data-sparse environments. Empirical evaluation on two real-world supply chain datasets demonstrates that the approach can improve predictive accuracy over baseline methods while providing interpretable insights that align with sustainability-aware supplier selection and substitution tasks.
Several directions for future research emerge from this work. First, the integration of richer and standardised sustainability indicators, including verified Scope 3 emissions data and social impact measures, would strengthen the practical applicability of the approach. This would enable further empirical validation using supplier-level ESG datasets, such as those available through Open Supply Hub (https://opensupplyhub.org/, accessed on 12 January 2026) or verified corporate disclosures. Second, incorporating temporal dynamics into the knowledge graph could enable reasoning about sustainability trends, supplier evolution, and long-term risk. Third, tighter integration with complementary decision-making frameworks, such as multi-criteria decision analysis [55] or policy-driven constraints, could further support adoption in real-world settings. In particular, sustainability objectives could be integrated directly within the training process through constraint-aware or multi-objective loss formulations. Finally, exploring hybrid neurosymbolic extensions and human-in-the-loop validation mechanisms [56] would enhance trustworthiness and robustness, particularly in high-stakes sustainability contexts, by enabling explicit encoding of regulatory constraints and certification rules alongside learned relational representations. Together, these directions point towards the development of more comprehensive, transparent, and actionable AI-assisted decision support systems for sustainable supply chain management.