Article

Integrating Large Language Model and Logic Programming for Tracing Renewable Energy Use Across Supply Chain Networks

1 School of Automation, Nanjing Institute of Technology, Nanjing 211167, China
2 Department of Engineering Design, KTH Royal Institute of Technology, 11428 Stockholm, Sweden
3 Engineer School, Qinghai Institute of Technology, Xining 810016, China
* Authors to whom correspondence should be addressed.
Appl. Syst. Innov. 2025, 8(6), 160; https://doi.org/10.3390/asi8060160
Submission received: 18 August 2025 / Revised: 7 October 2025 / Accepted: 17 October 2025 / Published: 22 October 2025

Abstract

Global warming is a critical issue today, largely due to the widespread use of fossil fuels in everyday life. One promising way to reduce reliance on conventional energy sources is to promote the use of renewable power. In particular, to encourage renewable energy use in industrial sectors that develop and manufacture industrial artifacts, there is continuous demand for tracing energy sources within production processes. However, for a sophisticated industrial product involving diverse and extensive components and suppliers, traceability analysis across its production is a critical challenge to ensuring the full utilization of renewable energy. To alleviate this issue, this paper presents a functional framework that integrates Large Language Models (LLMs) and logic programming to trace renewable energy use across supply chain networks. Specifically, the proposed framework contains the following components: (1) graph-based models to process and manage the extensive information within supply chain networks; (2) Retrieval-Augmented Generation (RAG) techniques that support the LLM in processing information related to supply chain networks and generating relevant responses with structured representations; and (3) a logic programming-based solution that supports traceability analysis of renewable energy over the responses from the LLM. As a case study, we use a public dataset to evaluate the proposed framework by comparing it to a RAG-based LLM and its variant. The experiments show that the proposed framework achieves significant improvement over baseline methods that rely solely on LLMs.

1. Introduction

Global warming is a critical issue today, largely due to the excessive carbon emissions caused by the extensive use of fossil energy across industries and daily activities [1]. In particular, industrial sectors that consume energy extensively and at large scale to develop and produce industrial artifacts play a critical role. Recent studies show that the industrial sector accounted for roughly 34% of greenhouse gas emissions in 2024 [2]. Given such a large proportion of emissions from industrial sectors, continuous efforts have been devoted to establishing international standards and regulations to control their carbon emissions. For example, ISO 14064 provides a set of guidelines to support various organizations, including industrial sectors, in quantifying, monitoring, reporting, and verifying greenhouse gas emissions [3]. In addition to analyzing the carbon footprint, this standard, along with other regulations, demands reductions in carbon emissions by replacing or decreasing the use of fossil energy [4]. Such demands encourage industries to adopt renewable energy generated from natural resources that are replenished faster than they are consumed [5]. Consequently, original equipment manufacturers and authorities additionally need to ensure the use of renewable energy throughout product lifecycles by identifying its usage across supply chains. However, modern industries, especially those involved in production processes, often work with various suppliers who offer diverse and complex components from different regions and profiles, making it challenging to distinguish and analyze the energy used by suppliers.
To address this challenge, a common solution is to trade renewable energy as a specific form of guarantee [6]. With such guarantees, organizations (e.g., authorities and industries) can trace the renewable energy flow via promising techniques. For example, a blockchain network, which acts as a decentralized database, can enhance the farm-to-fork traceability of guarantees for end consumers through a set of predefined specifications [7]. Additionally, smart contracts within the blockchain enable formal verification to mathematically prove the correctness of these specifications, providing a complete and sound traceability process [8]. Despite the advantages of these consumer-oriented solutions, they still pose limitations for the following reasons: (1) Trading renewable energy leads to risks of falsification, where conventional energy is falsely presented as renewable by labeling it with a guarantee [9]. For example, greenwashing refers to deceptive behavior that presents energy as renewable regardless of its actual source. Such behaviors imply that solely relying on guarantees is insufficient for tracing renewable energy use. (2) The guarantees certified by smart contracts pose challenges for certain regional small- and medium-scale manufacturers, who may be unable to obtain such guarantees due to the additional costs, even if they already use renewable energy [10,11]. This situation may reduce the scalability of blockchain-based solutions for tracing renewable energy. (3) Formal specification in smart contracts requires a set of models, which still demands further effort to gain acceptance from different organizations and domains. For example, although current studies propose different approaches to model the specifications used in smart contracts for tracing renewable energy [12], more effort is needed to flexibly and effectively leverage their strengths.
To alleviate the aforementioned limitations, Large Language Models (LLMs) show promising potential due to their flexible and effective capabilities. Therefore, we formulate the Research Question (RQ) of the presented work as follows: How can LLMs trace renewable energy within supply chain networks? To answer this RQ, this paper presents a functional framework that integrates LLMs and logic programming for tracing renewable energy use across supply chain networks. Compared to existing blockchain-based solutions focused on the customer side, the proposed work offers an alternative perspective to support industries in managing and analyzing their suppliers in the context of renewable energy use. Specifically, the proposed framework contains the following components to attain the traceability of renewable energy across supply chain networks: (1) adopting graph-based models to process and manage the extensive relational information within supply chain networks; (2) using the Retrieval-Augmented Generation (RAG) technique to support the LLM in analyzing the graph-based models by synthesizing domain knowledge; and (3) presenting a logic programming-based solution to formulate the specifications that support the traceability analysis of renewable energy over the responses from LLMs. In addition to the novel framework described above, we present a hybrid solution that combines embeddings with heuristic-based methods to enable the LLM to retrieve information from the knowledge base efficiently and effectively. To this end, we summarize the contributions of the proposed work as follows:
  • Proposing a functional framework to support tracing renewable energy use across the supply chain network from the original equipment manufacturer perspective. Compared to consumer-oriented solutions, such as green energy trading, the proposed framework supports manufacturers in analyzing and managing diverse information (e.g., renewable energy use) within the supply chain context, thereby reducing reliance on blockchain-based solutions.
  • Combining the LLM and domain knowledge to manage and analyze the extensive unstructured information provides a generic solution to trace the renewable energy across supply chain networks. Compared to existing works that rely on smart contracts with strictly pre-defined models, the LLM enables semantic and syntactic analysis of unstructured information to facilitate downstream tasks by generating structured responses.
  • Using logic programming to support the traceability analysis of renewable energy across supply chain networks by formulating specifications in terms of rules and facts. Compared to formal verification, which involves exhaustive recursion in elements of smart contracts, logic programming offers a more flexible solution that allows end-users to refine and update specifications regarding the analyzed information from domain knowledge.
To this end, we organize the rest of this paper as follows: Section 2 introduces the existing work related to the proposed work. Section 3 elaborates the methodological design of integrating LLMs and logic programming. The case study is presented in Section 4, evaluating the proposed framework on a public dataset combined with synthetic data generation. Finally, we conclude the proposed work and outline future work in Section 5.

2. Related Work

To provide a landscape of key concepts relevant to the proposed work, we first introduce existing approaches that model and manage supply chain information. Next, we present recent studies that apply LLMs for traceability analysis across various applications, which inform and support the techniques leveraged in our proposed framework. Finally, we present recent work that uses logic programming to support domain-specific modeling, which aligns with our goal of tracing renewable energy use in the proposed framework.

2.1. Modeling and Managing Supply Chain Information

Modeling and managing supply chain information typically involves collecting and analyzing all relevant activities, such as procurement, manufacturing, and demand management, throughout a product’s lifecycle by integrating automation, digitization, and networking across different organizations [13]. These activities typically generate diverse data that capture objects and their interactions as relational data, collected from multiple sources such as on-device sensors and documents with different data structures [14]. Subsequently, different solutions can be used to process and analyze relational data via multi-source information. Specifically, these solutions can be broadly categorized as follows:
(1)
Knowledge-enabled approaches are heuristic-based solutions that analyze the input data via deductive learning (reasoning) over a set of predefined rules [15]. These rules can be formulated as clauses based on expert experience and industrial standards. Each clause is a composition of literals, which are atomic elements representing objects related to the key concepts of the supply chain. These clauses and literals can represent the ontology of a set of relational models. Within the ontology, the Terminology Box (TBox) refers to concepts and their properties, and the Assertion Box (ABox) includes the set of statements (e.g., relationships) among the concepts defined in the TBox. Knowledge bases are typical solutions to initialize the instances, consisting of a set of ABoxes and TBoxes [16,17]. To materialize the knowledge base, one common solution is to adopt knowledge graphs (KGs), a kind of graph-based model that identifies relational data in terms of entities and their interactions. For example, a heuristic-based solution is proposed in [18] to search for keywords and model the supply chain information. Specifically, the entities and their interactions related to the keywords are structured as nodes and edges to facilitate the construction of the KG. Additionally, a decision-focused framework is used in lean chain management, derived from a set of logic rules and specifications [19]. However, these heuristic-based approaches are often less flexible due to the strict specifications of their rules. Specifically, given the variety of input data representations, these specifications become obstacles to accurately interpreting the input data. While most existing tools use different query languages (e.g., SQL and CQL) to model and manage knowledge, they are often inefficient and inflexible at reasoning over relational data for tailored use cases [20].
(2)
Data-driven approaches are usually learning-enabled methods that learn patterns from extensive input data via inductive learning. In particular, Deep Neural Networks (DNNs) with extensive parameters can model spatial and temporal patterns of supply chain information by analyzing and extracting features from multivariate data in supply chains. For example, by analyzing numeric data related to the supply chain, Long Short-Term Memory (LSTM) networks are used to predict time-series data and detect anomalies across the supply chain [21]. Additionally, due to the interactions implied within multivariate supply chain data, DNNs also allow us to identify the underlying relationships by analyzing complicated unstructured data (e.g., natural language). For example, combining an LSTM and an autoencoder supports the identification and extraction of the entities and relationships related to supply chain information [22]. Although data-driven methods are promising solutions to support supply chain management by handling various data representations, their training-intensive nature usually requires extensive data, posing challenges to collecting sufficient and balanced data in the field of supply chain management. Moreover, given the multiple rules needed to model supply chain networks, solely relying on data-driven solutions is insufficient to further exploit the intricate relationships.

2.2. Using LLMs for Traceability Analysis Across Applications

Large Language Models (LLMs) refer to a class of neural networks with extensive parameters. Benefiting from the widespread use of the Transformer architecture and its attention mechanism, LLMs are capable of handling various types of unstructured data, such as text and images [23,24,25,26]. Current LLMs, such as Generative Pretrained Transformers (GPT), are based on Transformer architectures that use self-attention mechanisms to process sequences by mapping keys and values. Within this process, key–value caches are introduced to improve the efficiency and effectiveness of encoding input data. Additionally, Mixture-of-Experts (MoE) layers are proposed to facilitate the utilization of knowledge from extensive training data through conditional computation [26,27,28,29]. By leveraging their powerful processing and analytical capabilities to generate responses, LLMs enable the recovery of traceability by directly analyzing semantic and syntactic similarities within specific applications. Based on the data representations to be traced, the following major trends can be identified in traceability analysis:
(1)
Across unstructured and structured data. This kind of traceability analysis usually requires analyzing unstructured data, such as natural language representations, and tracing their potential links to structured models that contain a set of specifications (e.g., domain-specific models). For example, an LLM-based framework used in the automotive industry is proposed to generate functional safety traceability by analyzing requirements [30]. Given the extensive safety requirements provided by domain experts, LLMs enable traceability analysis by linking these natural language requirements to potential functional specifications defined within system architectural models.
(2)
Within structured data. For example, an LLM-based approach is used to recover traceability between functional requirements and goals, both represented in structured data formats (e.g., JSON files), within software engineering to mitigate potential threats [31]. Specifically, the LLMs trace the requirements to the goals by analyzing graph-based models of virtual interior designs for software systems.
(3)
Within unstructured data. This approach involves a conversation-based agent that processes and responds using natural language. In the field of blockchain networks, an LLM-based interface is designed for attaining farm-to-fork traceability [32]. Specifically, this work proposes the use of RAG to enhance the traceability generated by the LLMs through the synthesis of a knowledge base containing domain knowledge. As a result, the LLM-based agent enables human-understandable traceability analysis of user queries.

2.3. Using Formal Logic to Support LLM Inference

Given the variety of data representations used to analyze traceability across applications requiring the comprehension of corresponding domain knowledge, solely relying on learning-enabled methods could lead to ambiguous results due to hallucinations, which are common in such models. Inspired by the integration of formal logic into Deep Neural Networks [33,34], the use of formal logic offers a promising approach to overcoming such ambiguous responses. Logic programming is a typical programming paradigm that utilizes formal logic to represent knowledge and perform computations [35]. As an example, in [36], LLMs are integrated with logic programming by recursively translating natural language into first-order logic. The generated logic clauses are then processed and analyzed using Prolog, a typical logic-based solver. The solver is expected to return true or false to determine the correctness of the logic clauses that describe and represent the objects within the natural language. Another example is to support the analysis of domain-specific KGs by integrating logic programming and LLMs [37].
To better illustrate the advantages of the proposed work, we summarize the main advantages in Table 1. Specifically, given the relational data across supply chain networks, this work proposes a knowledge-enabled approach via KGs. The graph-based knowledge base provides an interpretable solution to model and analyze the information. To address the inflexibility of retrieving knowledge graphs via heuristic-based methods, which are commonly constrained by strict query requirements, we employ an LLM to process unstructured data and generate structured representations. Additionally, using RAG-based techniques mitigates hallucination by retrieving domain knowledge from the graph-based knowledge base. Given the limitations of LLM-enabled methods for multi-step reasoning over graph-based data, we employ logic programming to enhance traceability analysis by reasoning over the structured outputs generated by the LLM.

3. Methodology

This section elaborates the proposed framework, which is shown in Figure 1. Specifically, we first present the methodology of modeling the graph-based knowledge base to support supply chain management in Section 3.1. Next, the workflow of generating answers via LLMs by retrieving the KG from the graph-based knowledge base is elaborated in Section 3.2. Finally, we conduct traceability analysis of renewable energy by reasoning over the generated responses using logic programming in Section 3.3.

3.1. Modeling the KG via Knowledge-Enabled Methods

As mentioned above, original equipment manufacturers typically rely on various suppliers to support the design and production of industrial artifacts. These suppliers and their related information naturally form networks that contain rich supply chain information. Based on the topological features stemming from supply chain networks, we use a KG to model and represent the supply chain information. Additionally, the graph-based models within the KG show a promising trend to support LLMs in retrieving relational data, including entities and their interactions [38]. We present the main workflow of modeling the KG in the following steps:
Modeling the ontology via meta-models: The meta-models consist of stereotypes and relationships to represent the TBox and ABox. Any stereotype S_i ∈ S refers to the terminology of domain models by assigning specific characteristics or behaviors, such as classes and types. Any relationship R_i ∈ R represents possible connections between these stereotypes. A meta-model defines a semantic schema that describes the structure of all stereotypes and their interactions. Specifically, we model the meta-model G_m as follows:
G_m = {S, R}  (1)
where S = {S_1, …, S_n} refers to the stereotype set and R = {R_1, …, R_m} refers to the relationship set.
Based on common supply chain information [39,40,41], we define the following stereotypes in the proposed framework to support traceability analysis within supply chain networks: (1) Manufacturers refer to the suppliers involved in a specific product or component. (2) Materials refer to raw industrial inputs (e.g., steel and plastic) that are processed by manufacturers; in addition to manufacturers that specialize in processing a specific material, some are also capable of handling multiple materials simultaneously. (3) Services represent the functional capabilities provided by manufacturers to enable operations such as material processing and machining; depending on the scale of the manufacturer, multiple services may be available. (4) Certifications represent credentials and qualifications awarded to manufacturers that meet different industry-specific standards and requirements; for example, ISO 9001 [42] certifies the quality management of a manufacturer. (5) Industries denote the sectors of potential customers for products from manufacturers. (6) Energy refers to the power type consumed by suppliers during production; this information can be roughly provided by the suppliers when managing product processing. To this end, we visualize the meta-model G_m based on Equation (1) in Figure 2.
Creating KGs regarding the meta-models: The KG is an instantiation of the meta-models mentioned above, created by processing and analyzing the tabular data illustrated in Figure 1. By extracting all elements, including entities and their relationships, from the tabular data, each entity is identified as a node e_i ∈ E, which is an instance of a stereotype. Any two entities e_i, e_j may be connected by an interaction v_ij ∈ V. Any pair of nodes e_i, e_j and their edge v_ij involved in the tabular data is modeled as a triplet <e_i, v_ij, e_j> ∈ T. To this end, we model the KG G_k as follows:
G_k = {E, V, T}  (2)
where E refers to the node set of the extracted entities. Any node within the set E obtains a stereotype via a mapping function F_n: E → S. V refers to the edge set that represents the identified relationships between nodes. Any edge obtains a relational label via a mapping function F_v: V → R. T is the set of triplets describing pairs of nodes connected by any relationship identified in the relationship set R. The purpose of the mapping functions F_n, F_v is to depict the process of matching input data to the predefined meta-models, thereby generating graph-based knowledge by aligning input entities and relationships with those models.
Given the extensive numbers of nodes and edges from the source data, we adopt a graph-based database to manage their corresponding triplets for the following reasons: (1) Using a graph-based database allows merging duplicated elements by assigning a unique identifier to each node, supporting traceability analysis across all elements. (2) The graph-based database improves the efficiency of analyzing a specific topic (e.g., instances or stereotypes) from the KG G_k by managing and querying a sub-graph G_i ⊆ G_k related to this topic. To this end, a graph-based knowledge base is created by synthesizing the tabular data and the meta-models via the KG.
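As a minimal sketch, the instantiation step above can be expressed in Python. The stereotype and relationship names follow the meta-model, but the row layout and the example suppliers are illustrative assumptions, not the schema of the actual dataset.

```python
# Sketch of instantiating the KG G_k = {E, V, T} from tabular rows.
from collections import defaultdict

STEREOTYPES = {"Manufacturer", "Material", "Service", "Certification", "Industry", "Energy"}
RELATIONSHIPS = {"processes", "offers", "holds", "serves", "uses"}

def build_kg(rows):
    """Map tabular rows to nodes E, edges V, and triplets T.

    Each row is (head, relation, tail, head_type, tail_type); the mapping
    functions F_n and F_v are realized as dictionary assignments here.
    """
    nodes, triplets = {}, set()
    edges = defaultdict(set)
    for head, rel, tail, head_type, tail_type in rows:
        assert head_type in STEREOTYPES and tail_type in STEREOTYPES
        assert rel in RELATIONSHIPS
        nodes[head] = head_type          # F_n: E -> S
        nodes[tail] = tail_type
        edges[(head, tail)].add(rel)     # F_v: V -> R
        triplets.add((head, rel, tail))  # <e_i, v_ij, e_j> in T
    return nodes, edges, triplets

# Illustrative rows; "Acme Metals" and friends are hypothetical suppliers.
rows = [
    ("Acme Metals", "processes", "Steel", "Manufacturer", "Material"),
    ("Acme Metals", "uses", "Solar", "Manufacturer", "Energy"),
    ("Acme Metals", "holds", "ISO 14001", "Manufacturer", "Certification"),
]
nodes, edges, triplets = build_kg(rows)
```

In a deployment, the returned triplets would be loaded into a graph database, which also handles the duplicate merging described above.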

3.2. Generating Responses via LLMs by Retrieving from the Knowledge Base

While a graph-based knowledge base supports querying domain knowledge related to supply chain information, it remains inflexible for end-users and stakeholders to obtain relevant information efficiently, since doing so requires designing precise and accurate input queries. Additionally, the results from the knowledge base are expressed as compositions of triplets, which creates obstacles for interpretation. Therefore, we adopt the LLM illustrated in Figure 1 to query relevant information from the knowledge base and interpret the results by analyzing the input prompts in the context of natural language and generating human-understandable responses via the Retrieval-Augmented Generation (RAG) framework [43]. Specifically, the RAG framework contains the following steps:
Indexing the graph-based models: Indexing refers to the process of encoding the graph-based models within the knowledge base to facilitate LLM-generated responses by enabling the extraction and retrieval of relevant information. To encode the graph-based models, we propose the following methods to index them by decomposing the graphs into chunks [38]: (1) Indexing the chunks via keywords defines a set of keywords D_k as follows:
D_k = {d_1, …, d_n}  (3)
where d_i ∈ D_k refers to a pre-defined keyword. We specify the keywords as the elements of the meta-models (e.g., stereotypes and relationships), node names, and aliases.
(2) While keyword-based chunks support strict search by precisely matching keywords, indexing the chunks via vector models uses a DNN model F_e to provide a robust and generalizable solution for generating embeddings from the keywords. We model the generated embeddings as follows:
d_i^e = F_e(d_i)  (4)
where d_i^e refers to the embedding of the keyword d_i ∈ D_k generated by the vector model F_e.
To this end, we create a look-up table I to support chunk indexing by pairing keywords with their corresponding embeddings as follows:
I = {(d_1^e, d_1), …, (d_n^e, d_n)}  (5)
Retrieving the relevant information: Given an input prompt P from the user, we perform information retrieval from the knowledge base by extracting both explicit and implicit knowledge. Specifically, explicit knowledge refers to information that exists both in the prompt and in the look-up table I. We present the explicit knowledge retrieval according to the following indexing methods [44]: (1) Retrieving the keywords via a heuristic method F_h as follows:
D_p = F_h(P)  (6)
where D_p = {d_i, …, d_j} refers to the keywords identified in the input prompt that also exist in D_k.
(2) Retrieving the chunks by measuring semantic and contextual similarity, which first encodes the prompt as follows:
d_p^e = F_d(P)  (7)
where F_d refers to the vector model that shares the same parameters as the one defined in Equation (4). The term d_p^e refers to the embedding encoded from the tokens of the prompt. Next, we define a distance-based function F_s that computes the similarity between the embeddings stored in the look-up table and the embedding generated from the tokens in the input prompt:
τ_d = ||d_p^e − d_i^e||  (8)
where ||·|| refers to a distance-based metric (e.g., Euclidean distance) between the embeddings d_p^e and d_i^e. If the distance score τ_d falls below a predefined threshold τ, the corresponding tokens from the prompt are aligned with their matching keywords.
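The hybrid retrieval in Equations (6)–(8) can be sketched as follows. The letter-frequency embedding is only a toy stand-in for the real vector models F_e and F_d, and the threshold value and keyword list are illustrative assumptions.

```python
# Sketch of explicit-knowledge retrieval: a heuristic keyword match F_h
# followed by an embedding-distance check against the look-up table.
import math

def embed(text):
    """Toy vector model: normalized 26-dim letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def distance(u, v):
    """Euclidean distance ||u - v|| between two embeddings (Equation (8))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def retrieve_keywords(prompt, keywords, tau=0.2):
    """F_h: exact substring match, then embedding distance below tau."""
    matched = {d for d in keywords if d.lower() in prompt.lower()}
    prompt_e = embed(prompt)
    for d in keywords:
        if d not in matched and distance(prompt_e, embed(d)) < tau:
            matched.add(d)
    return matched

keywords = ["Manufacturer", "Certification", "Solar"]
hits = retrieve_keywords("Which manufacturers rely on solar power?", keywords)
```

With a trained sentence-embedding model, the distance branch would also catch paraphrases (e.g., "green certificates") that the substring match misses.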
Based on the explicit knowledge retrieved by the above methods, we further explore the implicit knowledge related to the explicit knowledge by querying across the KG. Specifically, we propose a multi-hop retrieval across the KG [45], extracting the nodes connected to the identified explicit knowledge in the graph-based knowledge base. We model the retrieved KG G_r ⊆ G_k to represent the explicit and implicit information identified in the knowledge base.
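The multi-hop expansion can be sketched as a breadth-first traversal: starting from the explicitly matched nodes, collect everything reachable within a fixed number of hops to form the retrieved sub-graph G_r. The toy triplets and the two-hop limit below are illustrative assumptions.

```python
# Sketch of multi-hop retrieval over the KG triplets.
from collections import deque

def multi_hop(triplets, seeds, hops=2):
    """Breadth-first expansion of seed nodes over undirected triplet edges."""
    adj = {}
    for h, _, t in triplets:
        adj.setdefault(h, set()).add(t)
        adj.setdefault(t, set()).add(h)
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    # G_r keeps only triplets whose endpoints were both reached.
    return {(h, r, t) for h, r, t in triplets if h in seen and t in seen}

triplets = [
    ("Acme Metals", "uses", "Solar"),
    ("Acme Metals", "supplies", "OEM Corp"),
    ("OEM Corp", "serves", "Automotive"),
]
g_r = multi_hop(triplets, {"Solar"}, hops=2)
```

In practice, this traversal is delegated to the graph database's query engine rather than performed in application code.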
Generating the responses regarding the requirements: To this end, we model a pre-trained LLM F_l used to generate the responses R_l as follows:
R_l = F_l(P | G_r)  (9)
where the term G_r refers to the information retrieved from the graph-based knowledge base, and the term P refers to the input prompt. While prompts typically consist of a set of questions from users, we further formulate each prompt as in Equation (10) via in-context learning [46], where we additionally provide constraints, demonstrations, and instructions to avoid ambiguous responses.
P = {P_I, P_C, P_D}  (10)
where the term P_I refers to instructions that specify the tasks of LLMs by defining the patterns (e.g., structured and unstructured data) and expected content of the generated responses. The term P_C refers to the constraints that support LLMs in optimizing the context and limiting the scope of the generated responses. The term P_D refers to the specific questions posed by the users.
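Assembling the structured prompt of Equation (10) can be sketched as follows; the section wording and the JSON-style instruction are illustrative assumptions, not the exact template used in the framework.

```python
# Sketch of composing P = {P_I, P_C, P_D} together with the retrieved G_r.
def build_prompt(instruction, constraints, question, retrieved_triplets):
    """Join instruction (P_I), constraints (P_C), context (G_r), question (P_D)."""
    context = "\n".join(f"({h}, {r}, {t})" for h, r, t in retrieved_triplets)
    return (
        f"### Instruction (P_I)\n{instruction}\n\n"
        f"### Constraints (P_C)\n{constraints}\n\n"
        f"### Retrieved knowledge (G_r)\n{context}\n\n"
        f"### Question (P_D)\n{question}\n"
    )

prompt = build_prompt(
    "Answer with structured facts of the form energy(Supplier, Type).",
    "Use only the retrieved knowledge; do not invent suppliers.",
    "Which suppliers use renewable energy?",
    [("Acme Metals", "uses", "Solar")],  # hypothetical retrieved triplet
)
```

Requesting structured facts in P_I is what allows the downstream logic programming step to parse the responses directly.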

3.3. Tracing the Use of Renewable Energy via Logic Programming

Given the LLM’s output R_l, which contains implicit information related to renewable energy usage, we further apply logic programming to automatically reason about renewable energy use across the supply chain network from the generated responses. By defining specifications and rules regarding renewable energy, logic programming enables us to reason about facts in the context of the graph-based models within the knowledge base. Specifically, we propose principles for defining specifications and rules, derived from commonsense knowledge and domain experts, to align with the graph-based models in the knowledge base. Although some services may not directly indicate renewable energy usage, commonsense knowledge from related works and online resources covers most stereotypes defined in the meta-model, as follows:
Material-based rules: The presence of some materials correlates with the renewable energy use of their manufacturers. For example, energy-intensive critical materials such as steel and aluminum usually require extensive power to process, relying heavily on conventional energy [47]. In contrast, manufacturers that can process renewable materials (e.g., biogenic feedstocks) commonly support the use of renewable energy [48].
Industry-based rules: Suppliers that are directly tied to sustainability-focused industries tend to adopt renewable energy. For example, manufacturers involved in renewable energy sectors (e.g., solar panel production, wind turbine components, battery manufacturing, and biofuels) are more likely to deploy renewable energy in their operations or have stronger incentives to adopt it, since it builds a credible green narrative, giving them an advantage in bidding, certification, and high-end markets [49].
Certification-based rules: Tracing suppliers that have been awarded standards and certifications related to renewable energy is a straightforward way to verify their use of renewable energy. For example, a manufacturer that holds ISO 14001 [50] certification or similar environmental management system certifications is more likely to use renewable energy than one that does not [51].
Energy-related rules: While some suppliers can provide information on the types of energy they use, further effort is needed to trace this usage according to additional rules. For example, if suppliers use solar energy, it is more reasonable to maximize its use during summer, as the combination of long daylight hours, a high sun angle, and clear skies significantly boosts production, thereby lowering energy costs [52].
To this end, we present the logic formulations to support the traceability analysis of using renewable energy across supply chain networks as follows:
$$O = \psi(R_l), \qquad O_v = \{\, o \in O \mid o \models \phi \,\} \tag{11}$$
where the term ψ(·) refers to the propositional formulas that express the above rules in the logic programming tool to support logic inference. The term R_l denotes the predicate logic generated from the LLM, as defined in Equation (9). The term O refers to the output set generated by the logic programming. The term O_v refers to the set of valid entities that satisfy the specifications ϕ in the context of using renewable energy after verifying the logic formulas.

4. Case Study

This section presents case studies to evaluate the proposed framework using a public dataset on supply chain management [39,41,53]. This CSV-formatted dataset includes over 7000 entities, including manufacturers, processed materials, services, and certifications, and 112,494 relationships across these entities. Within this dataset, the stereotypes and relationships of all entities are labeled in separate rows. We select this dataset for the following reasons: (1) Some datasets lack sufficient information to construct a knowledge base. For example, the datasets in [54,55] contain unstructured data intended to facilitate time-series analysis within the supply chain, which is not suitable for building a knowledge base. (2) While some datasets contain information that could be used to construct a knowledge base, they are restricted to specific domains. For example, the dataset in [56] contains structured data similar to what we use in our work; however, all of its data are collected solely from the automotive industry, with only a few suppliers and limited information. (3) While the selected dataset still lacks certain information (e.g., supplier certifications and energy usage), it is more comprehensive than the other datasets we have found, and it has already been validated by many previous publications, which supports its suitability for academic use. In addition, we collect data on manufacturers related to renewable sectors from public documents and online sources, and organize these data according to the schema. While this dataset contains the most essential information, directly accessing the energy data within it remains challenging. Therefore, we also adopt synthetic data generation techniques [57,58] to enrich the original dataset by assigning energy types using random distributions.
While synthetic data generation can introduce bias, this is manageable in our case, since the proposed framework reasons about the context within supply chain networks by understanding their semantic and logical relationships rather than analyzing statistical features. Moreover, synthetic data generation is used here only as an auxiliary method to support reasoning across supply chain networks by enriching the diversity of the rules. Any bias introduced by synthetic data generation is therefore likely to have a trivial impact on the rule-based reasoning performed via Prolog.
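A minimal sketch of this enrichment step, assuming a hypothetical list of energy types and weights (the paper does not specify the exact distribution used):

```python
import random

# Assign an energy type to each supplier record via a weighted random draw.
# The labels and weights below are illustrative assumptions.
ENERGY_TYPES = ["solar", "wind", "hydro", "coal", "natural gas"]
WEIGHTS = [0.25, 0.20, 0.10, 0.25, 0.20]

def enrich_with_energy(suppliers, seed=42):
    rng = random.Random(seed)  # fixed seed keeps the enrichment reproducible
    for supplier in suppliers:
        supplier["energy_type"] = rng.choices(ENERGY_TYPES, weights=WEIGHTS, k=1)[0]
    return suppliers
```

Seeding the generator makes repeated runs of the pipeline produce identical synthetic labels, which keeps downstream reasoning results comparable across experiments.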

4.1. Creation of the Graph-Based Knowledge Base

To model the graph-based data within the knowledge base, we first analyze and extract the entities and their interactions from the tabular data based on the stereotypes and relationships formulated in Equation (1). To organize the extracted data according to the meta-model schema shown in Figure 2, we use Neo4j to process and manage the extracted data with Cypher queries [59]. Neo4j is a widely used and stable industrial database for managing large-scale graph-based information. Cypher, the native query language of Neo4j, offers an efficient way to manipulate content and generate specific knowledge graphs through user queries [60,61]. It supports creating each extracted entity as a node, with their interactions represented as edges. Additionally, we use the names and other features of each extracted entity as attributes of the corresponding node. Next, we merge duplicate entities by searching for matching node names within Neo4j. In Figure 3, we present an example KG, generated according to Equation (2), by searching for the nodes that interact with specific manufacturers. From this figure, we observe that the pattern-matching functions within Neo4j can handle simple conditions that support the retrieval of relational data in the context of supply chain information. For example, the information associated with these manufacturers can be traced by specific queries. However, the traceability analysis of renewable energy within these queried results still requires further analysis.
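As an illustration of this step, a tabular record can be translated into Cypher MERGE statements, which create nodes and edges while deduplicating entities by name. The labels below are illustrative, not the exact schema used in the knowledge base.

```python
# Translate one tabular record into a Cypher statement. MERGE (rather than
# CREATE) matches existing nodes by name, so duplicate entities are merged.
def row_to_cypher(source, src_type, relation, target, tgt_type):
    return (
        f"MERGE (a:{src_type} {{name: '{source}'}}) "
        f"MERGE (b:{tgt_type} {{name: '{target}'}}) "
        f"MERGE (a)-[:{relation}]->(b)"
    )

query = row_to_cypher("149401-us.all.biz", "Manufacturer", "PROCESS", "rubber", "Material")
```

The generated string can then be executed against Neo4j through any Cypher session.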

4.2. Structured Response Generation via RAG-Based LLMs

To index the graph-based models within the knowledge base, we generate embeddings of the keywords d_i^e defined in Equation (4) and of the tokens in a specific question P_D defined in Equation (10) within the prompts, using Gemini embedding models as the function F_e defined in Equation (4) [62]. The look-up table I defined in Equation (5), containing the keywords D_k defined in Equation (3) and their embeddings, is stored in plain format to facilitate indexing of explicit information from the graph-based knowledge base by querying the corresponding nodes. To facilitate its use, we cache the look-up table locally in a serialized .pkl file so that it can be readily manipulated by the LLM. We use cosine similarity to measure semantic similarity and set τ, defined in Equation (8), to 0.4; these configurations are commonly used in RAG-based frameworks [63]. Next, we propose a 2-hop retrieval of implicit information to generate the KG G_r defined in Equation (9), which is constructed as CSV files to facilitate downstream usage.
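The retrieval step can be sketched as follows, with toy 3-dimensional vectors standing in for the Gemini embeddings; only the cosine measure and the threshold τ = 0.4 come from the configuration described above.

```python
import math

# Keep knowledge-base keywords whose cosine similarity to the query meets tau.
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(query_vec, lookup_table, tau=0.4):
    """Return keywords from the look-up table that semantically match the query."""
    return [kw for kw, vec in lookup_table.items()
            if cosine_similarity(query_vec, vec) >= tau]
```

The matched keywords then seed the 2-hop graph expansion that produces G_r.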
Based on the retrieved graph-based models G_r, we use the Gemini 2.5 Flash API as the LLM to analyze the prompts and generate the responses R_l defined in Equation (9). Given the high cost of retraining pre-trained LLMs, we calibrate the output from the Gemini API via few-shot learning [63,64], a common technique in domain-specific tasks to ensure the quality of LLM responses. In few-shot learning, the prompt typically includes an illustrative example consisting of a small set of desired responses, which guides the language model to align its outputs with the desired format. If the LLM generates incorrect responses, we further calibrate the answers by providing the correct responses along with their justifications until it produces correct responses. We set the temperature to 0.7 and the maximum token length to 8192. Additionally, we constrain the generation process using top-k = 5 and top-p = 0.75, thereby balancing diversity with output stability. We specify the instructions P_I defined in Equation (10) to inform the scope and purpose of the LLM outputs as follows:
  • You are an expert AI assistant specializing in supply chain management. Your task is to give responses based on the retrieved graph-based models.
  • The retrieved graph-based models should not exceed a depth of 2 hops. If the retrieved models cannot directly match the queries, infer the possible results based on the known situations.
The constraints P_C defined in Equation (10) are used to optimize the responses with delimitation by presenting demonstrations. In this case, we expect the generated responses to use structured representations that seamlessly support analysis via logic programming. In particular, this structured representation is aligned with the logic programming to facilitate downstream tasks. Therefore, we specify the constraints as follows:
  • After generating the responses regarding the retrieved graph-based models, convert the generated responses following this format: Stereotype(Node Name), Relationship(Node Name, Node Name).
  • For example, the manufacturer 149401-us.all.biz can process rubber, and it holds the ISO 9001 certification. The expected output is Manufacturer (149401-us.all.biz). Certification (ISO 9001). Material (rubber). Certify_award (149401-us.all.biz, ISO 9001). Process (149401-us.all.biz, rubber).
Since the proposed framework is primarily used to trace renewable energy across different suppliers, the questions P_D mainly focus on potential manufacturers and their possible relationships regarding specific services. We use a template-based approach [65] to formulate these questions as follows:
  • Given specific services, which manufacturers can provide this service?
Here, the service in this question can be replaced by any instance with a service stereotype that exists in the knowledge base.
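A minimal sketch of this template-based formulation (the template wording paraphrases the question above):

```python
# Fill the service slot of the question template with any instance that carries
# the service stereotype in the knowledge base.
TEMPLATE = "Given the service '{service}', which manufacturers can provide this service?"

def make_questions(services):
    return [TEMPLATE.format(service=s) for s in services]
```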
The LLM is intended to retrieve manufacturers along with their associated materials, certifications, industries, and energy sources. The responses are then structured according to the formats specified in the constraints P_C to support the downstream tasks. All the context within the input prompts P defined in Equation (10), represented in natural language, is stored as plain-format files. The input prompts are processed in Python 3.13 and sent to the LLM via an API call.
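Given the structured response format specified in the constraints, a simple parser can convert the LLM output into (predicate, arguments) tuples ready to be asserted as facts. This parser is an illustrative sketch, not the implementation used in the framework.

```python
import re

# Match "Predicate (arg)" and "Predicate (arg1, arg2)" patterns, e.g.
# "Manufacturer (149401-us.all.biz). Process (149401-us.all.biz, rubber)."
FACT_PATTERN = re.compile(r"(\w+)\s*\(([^)]*)\)")

def parse_response(text):
    return [(name, tuple(arg.strip() for arg in args.split(",")))
            for name, args in FACT_PATTERN.findall(text)]
```

Each tuple maps directly onto a Prolog fact such as process('149401-us.all.biz', rubber).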

4.3. Traceability Analysis via Logic Programming

We use Prolog as the solver to support the traceability analysis via logic inference [66]. Prolog requires facts and rules to perform logic inference. The facts comprise a set of entities and their relationships whose semantics and syntax are formulated by the LLM. To attain efficient logic inference, these facts are updated with each LLM session. These updates ensure that the results produced by the logic programming align directly with the questions within the prompts. We further define the following facts to support logic inference via predicate logic: (1) specifications of renewable and energy-intensive materials; (2) specifications of certifications related to renewable energy; (3) optimal seasons and periods for using specific renewable energy; and (4) industrial sectors that tend to use renewable energy. We formulate the rules, including the material-based, industry-based, certification-based, and energy-related rules defined in Section 3.3, into propositional logic compatible with Prolog, as shown in Listing 1:
Listing 1. Prolog implementation of the proposed rules.
[Listing 1: Prolog rule definitions (image)]
where \+ is the not-provable operator: a statement preceded by \+ holds false if the argument following the operator is provable. Considering the possible incompleteness of the collected data, we represent the same predicate eligible_for_renewable(Company) with multiple clauses (e.g., Lines 1–17 of Listing 1). These clauses work together as alternative rules, which can be interpreted as a logical OR (disjunction). The disjunction is used to synthesize all the known information provided by the dataset. For example, even if a supplier cannot provide any certification, if it uses solar energy during summer, a season defined in our rule set, we can still infer that it relies on renewable energy.
Additionally, some industrial categories retrieved by the LLM may consist of multiple related sectors. We can further enrich the rules by combining these retrieved sectors. For example, while the automotive sector does not explicitly exist in the graph-based knowledge base, the LLM can identify related sectors from the inherent knowledge acquired during training, such as engine, electric, and electrical. The sector composition defined in Lines 31–33 supports synthesizing information across these sectors.
To synthesize the above separate rules, we propose using logical connectors to compose predicate logic expressions for inferring and tracing energy usage, as shown in Listing 2:
Listing 2. Synthesized logic to support the traceability of renewable energy use.
[Listing 2: synthesized Prolog logic (image)]
The above logic enables answering different questions by reasoning over the facts defined in predicate logic. Based on the given logic assertions, Prolog can infer which manufacturers are eligible or ineligible to use renewable energy, as well as the type of renewable energy suitable for a given geography and climate, through interpretable and efficient logic statements. Beyond the exemplified rules shown above, Prolog allows stakeholders to refine and update rules according to various specifications without additional compilation, thereby improving the flexibility of traceability analysis for various use cases.
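To illustrate the style of inference performed over the asserted facts, the sketch below emulates two alternative clauses (certification-based and energy-related) in Python. The predicate names mirror the listings, but the implementation is a simplified stand-in for the actual Prolog program.

```python
# Evaluate two alternative eligibility clauses over a set of asserted facts,
# combined as a disjunction, mirroring multiple Prolog clauses for one predicate.
def eligible_manufacturers(facts):
    """facts: a set of (predicate, args) tuples asserted from the LLM output."""
    manufacturers = {args[0] for pred, args in facts if pred == "manufacturer"}
    certified = {args[0] for pred, args in facts
                 if pred == "certify_award" and args[1] == "ISO 14001"}
    summer = ("season", ("summer",)) in facts  # a global fact about the period
    solar_users = {args[0] for pred, args in facts
                   if pred == "uses_energy" and args[1] == "solar"} if summer else set()
    # Either clause suffices: certification-based OR energy-related evidence.
    return manufacturers & (certified | solar_users)
```

A real Prolog engine would additionally provide backtracking and rule updates without recompilation, which this flat sketch does not attempt to reproduce.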

4.4. Comparative Studies

We focus on designing comparative studies through quantitative analysis. While Neo4j can retrieve the desired information, it requires experts to construct sophisticated queries; in contrast, the main advantage of our work is that it provides a more flexible and user-friendly solution by analyzing natural language. If Neo4j were included as another benchmark, the evaluation would have to emphasize flexibility, an aspect that is difficult to assess quantitatively with commonly applied metrics. Similarly, while the sole use of Prolog yields precise and complete inferred results, in our case over 7000 entities and more than 100,000 relationships would need to be formulated as facts, and reasoning across such a large fact set as a comparative study is clearly impractical and inefficient. Therefore, the design of the comparative studies mainly focuses on performance analysis by comparing the proposed framework, which adopts logic programming, with LLM-based solutions. We select a RAG-based LLM and a RAG-based LLM enhanced with Chain-of-Thought (CoT) reasoning [67] as baseline methods. To enable these baseline methods to directly support the traceability analysis in the context of renewable energy, we modify the prompts to directly identify manufacturers that use renewable energy. In particular, the RAG-based LLM with CoT prompts generates responses by combining logical representations through chain rules, a typical natural language approach that facilitates reasoning for the LLM. We use Recall-Oriented Understudy for Gisting Evaluation (ROUGE), formulated in Equation (12), as the metric to evaluate validity across these methods [63,68]. This metric is commonly used in RAG-based tasks, particularly those involving graph-based models [64,69]. ROUGE reflects the coverage rate between the inferred results and the ground truth; higher ROUGE scores indicate better performance.
$$P_{R1} = \frac{|O_v \cap G|}{|G|}, \qquad R_{R1} = \frac{|O_v \cap G|}{|O_v|}, \qquad F1_{R1} = \frac{2 \times P_{R1} \times R_{R1}}{P_{R1} + R_{R1}} \tag{12}$$
where R1 refers to ROUGE-1, the unigram overlap, in terms of entities, between the generated output O_v defined in Equation (11) and the ground truth G labeled by domain experts. |O_v ∩ G| refers to the number of overlapping entities in both O_v and G. The terms P_{R1} and R_{R1} refer to the precision and recall of ROUGE-1, respectively. F1_{R1} is the F1 score, which measures the performance of the proposed framework by balancing precision and recall.
The comparative experiments are run on a platform with an 11th-generation Intel Core i7 processor, 32 GB of RAM, and an NVIDIA GeForce RTX 3070 GPU designed and manufactured by NVIDIA Corporation. To collect the ground truth, we annotate the relational data concerning suppliers and their relevant information within the knowledge base. Next, we issue the template-based queries described previously, filling in the supplier names whose ground truth has been annotated. Finally, the generated results are compared against the ground truth using ROUGE-1, a widely recognized evaluation metric. To measure the performance of the proposed framework, we design a prompt set containing 20 questions that support tracing renewable energy. Table 2 presents the ROUGE-1 metrics of the different frameworks. Due to its use of rule chains, the LLM with CoT performs better than the RAG-based framework. The precision of our framework shows significant improvement compared to the other methods. However, omissions often occur when the LLM retrieves relevant information from the knowledge base, meaning the method is still imperfect, a limitation commonly observed in LLM-based use cases. In contrast, since the facts generated by the LLM always exist in the ground truth during our experiments, the valid set O_v is always a subset of G, so the recall of our framework is notably high. We attribute this to the following reasons: (1) Since the LLM retrieves information from the graph-based knowledge base and rephrases the results as predicate logic, this process does not generate any novel entities or relationships; the retrieved information is always a subset of the ground truth. (2) While the rules are pre-defined, the facts are updated for each LLM session in the proposed design, meaning that the existing facts relate only to the current questions.
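The entity-level ROUGE-1 computation can be sketched directly from Equation (12), which normalizes precision by the ground-truth size |G| and recall by the output size |O_v|:

```python
# Entity-level ROUGE-1 following Equation (12): precision over |G|, recall over
# |O_v|, and the harmonic mean as F1.
def rouge1_entities(generated, ground_truth):
    overlap = len(set(generated) & set(ground_truth))
    precision = overlap / len(ground_truth) if ground_truth else 0.0
    recall = overlap / len(generated) if generated else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Note that when O_v ⊆ G, the overlap equals |O_v| and the recall term reaches 1.0, which matches the behavior reported for the proposed framework.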

5. Conclusion and Future Work

This work addresses the RQ by first adopting a RAG-based technique that integrates a graph-based knowledge base with the LLM. By using the RAG-based LLM, the proposed framework can process and analyze prompts in natural language, offering a highly flexible and generalizable capability to support the analysis of relational data across supply chain networks. We then introduce a logic programming solution to mitigate hallucinations and improve the precision of the output generated by the RAG-based LLM. It enables the traceability analysis of renewable energy use through defined rules and specifications. Compared to the baseline methods, our approach achieves competitive results.
The future work can be developed in the following aspects:
  • While the proposed framework demonstrates the use of logic programming with rules derived from commonsense knowledge, refining these rules with expert knowledge aligned with the renewable energy domain could further improve the efficiency of traceability analysis. For example, contract-based green power certifications often incorporate formal logic rules, whose definitions and specifications could be integrated into logic programming to enhance the proposed framework with sound and complete logical inference. In addition, to ensure the completeness and rigor of the facts generated by the LLM, their alignment still needs to be verified using syntax checkers (e.g., data validators or compilers) in future work.
  • While we present several types of rules to support logical reasoning, future work could extend these rules by incorporating additional collected information and applying various techniques to interpret this information into Prolog-supported logical representations. Additionally, the formulation of the rules can further optimize the efficiency of Prolog. With more clearly defined rules, Prolog compilation can mitigate redundant iterations.
  • While Prolog provides deterministic logical inference for traceability analysis, renewable energy usage often exhibits probabilistic and statistical characteristics, making Prolog less effective for certain cases. For example, geographical information, including the location and country of suppliers, can provide a statistical perspective on renewable energy usage. ProbLog, which extends Prolog with probabilistic reasoning, supports incorporating these features.
  • While the proposed framework follows a sequential pipeline from question analysis to logic programming, the inferred results from Prolog can further enrich and augment the graph-based models in the knowledge base due to the sound and precise nature of logical reasoning. For example, defining rules through logic programming provides a flexible solution to investigate novel relationship types that were not initially present or specified in the knowledge base.
  • While the proposed framework shows promising performance on the current dataset, further optimization could focus on cloud-based or distributed deployment of the knowledge base to reduce computational cost as the size of the knowledge base increases.

Author Contributions

Conceptualization, P.S., D.C. and W.W.; Methodology, P.S., D.C., R.X. and W.W.; Software, P.S., W.W. and R.X.; Validation, R.X. and P.S.; Formal analysis, P.S. and R.X.; Writing—original draft, P.S.; Writing—review & editing, W.W. and D.C.; Supervision, D.C.; Funding acquisition, W.W. and D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by: (1) Qinghai Province New-Type R&D Institutions 2024 (Qinghai Institute of Technology): 007hxky005; (2) KTH Royal Institute of Technology with the industrial research project ADinSOS: 2019065006.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pasgaard, M.; Strange, N. A quantitative analysis of the causes of the global climate change research distribution. Glob. Environ. Chang. 2013, 23, 1684–1693. [Google Scholar] [CrossRef]
  2. International Energy Agency. Global Energy Review 2025; International Energy Agency: Paris, France, 2025; Available online: https://www.iea.org/reports/global-energy-review-2025 (accessed on 9 September 2025).
  3. Wintergreen, J.; Delaney, T. ISO 14064, international standard for GHG emissions inventories and verification. In Proceedings of the 16th Annual International Emissions Inventory Conference, Raleigh, NC, USA, 26 April 2007. [Google Scholar]
  4. Aristizábal-Alzate, C.E.; González-Manosalva, J.L. Application of NTC-ISO 14064 standard to calculate the Greenhouse Gas emissions and Carbon Footprint of ITM’s Robledo campus. Dyna 2021, 88, 88–94. [Google Scholar] [CrossRef]
  5. Panwar, N.L.; Kaushik, S.C.; Kothari, S. Role of renewable energy sources in environmental protection: A review. Renew. Sustain. Energy Rev. 2011, 15, 1513–1524. [Google Scholar] [CrossRef]
  6. Siddi, M. The European Green Deal: Assessing its current state and future implementation. In Upi Report; United Press International: Washington, DC, USA, 1 January 2020; Volume 114. [Google Scholar]
  7. Belchior, R.; Vasconcelos, A.; Guerreiro, S.; Correia, M. A survey on blockchain interoperability: Past, present, and future trends. ACM Comput. Surv. (CSUR) 2021, 54, 1–41. [Google Scholar] [CrossRef]
  8. Tolmach, P.; Li, Y.; Lin, S.W.; Liu, Y.; Li, Z. A survey of smart contract formal specification and verification. ACM Comput. Surv. (CSUR) 2021, 54, 1–38. [Google Scholar] [CrossRef]
  9. Liu, Q.; Fang, D. Deceptive greenwashing by retail electricity providers under renewable portfolio standards: The impact of market transparency. Energy Policy 2025, 202, 114591. [Google Scholar] [CrossRef]
  10. Liu, D.; Jiang, Y.; Peng, C.; Jian, J.; Zheng, J. Can green certificates substitute for renewable electricity subsidies? A Chinese experience. Renew. Energy 2024, 222, 119861. [Google Scholar] [CrossRef]
  11. Li, A.; Choi, J.A.; Long, F. Securing smart contract with runtime validation. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, London, UK, 15–20 June 2020; pp. 438–453. [Google Scholar]
  12. Jia, R.; Li, Y.; Zhou, Y.; Gao, M.; Sun, H.; Jiang, W. Design and implementation of smart contracts for power networked command system. In Proceedings of the 2024 8th International Conference on Electrical, Mechanical and Computer Engineering (ICEMCE), Xi’an, China, 25–27 October 2024; pp. 881–886. [Google Scholar]
  13. Abdirad, M.; Krishnan, K. Industry 4.0 in logistics and supply chain management: A systematic literature review. Eng. Manag. J. 2021, 33, 187–201. [Google Scholar] [CrossRef]
  14. Sheth, A.; Kusiak, A. Resiliency of smart manufacturing enterprises via information integration. J. Ind. Inf. Integr. 2022, 28, 100370. [Google Scholar] [CrossRef]
  15. Xiao, F.; Li, C.M.; Luo, M.; Manya, F.; Lü, Z.; Li, Y. A branching heuristic for SAT solvers based on complete implication graphs. Sci. China Inf. Sci. 2019, 62, 72103. [Google Scholar] [CrossRef]
  16. Tiddi, I.; Schlobach, S. Knowledge graphs as tools for explainable machine learning: A survey. Artif. Intell. 2022, 302, 103627. [Google Scholar] [CrossRef]
  17. Su, P.; Kang, S.; Tahmasebi, K.N.; Chen, D. Enhancing safety assurance for automated driving systems by supporting operation simulation and data analysis. In Proceedings of the ESREL 2023, 33rd European Safety and Reliability Conference, Southampton, UK, 3–8 September 2023. [Google Scholar]
  18. Guo, L.; Yan, F.; Li, T.; Yang, T.; Lu, Y. An automatic method for constructing machining process knowledge base from knowledge graph. Robot. Comput.-Integr. Manuf. 2022, 73, 102222. [Google Scholar] [CrossRef]
  19. Liu, S.; Leat, M.; Moizer, J.; Megicks, P.; Kasturiratne, D. A decision-focused knowledge management framework to support collaborative decision making for lean supply chain management. Int. J. Prod. Res. 2013, 51, 2123–2137. [Google Scholar] [CrossRef]
  20. Junghanns, M.; Kießling, M.; Averbuch, A.; Petermann, A.; Rahm, E. Cypher-based graph pattern matching in Gradoop. In Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems, Chicago, IL, USA, 19 May 2017; pp. 1–8. [Google Scholar]
  21. Nguyen, H.D.; Tran, K.P.; Thomassey, S.; Hamad, M. Forecasting and Anomaly Detection approaches using LSTM and LSTM Autoencoder techniques with the applications in supply chain management. Int. J. Inf. Manag. 2021, 57, 102282. [Google Scholar] [CrossRef]
  22. Kumar, A.; Starly, B. “FabNER”: Information extraction from manufacturing process science domain literature using named entity recognition. J. Intell. Manuf. 2022, 33, 2393–2407. [Google Scholar] [CrossRef]
  23. Su, P.; Xu, R.; Quan, Y.; Chen, D. Leveraging large language models for health management in cyber-physical systems. In IET Conference Proceedings CP927; IET: Stevenage, UK, 2025; Volume 2025, pp. 91–97. [Google Scholar]
  24. Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
  25. Liu, A.; Feng, B.; Xue, B.; Wang, B.; Wu, B.; Lu, C.; Zhao, C.; Deng, C.; Zhang, C.; Ruan, C.; et al. Deepseek-v3 technical report. arXiv 2024, arXiv:2412.19437. [Google Scholar]
  26. Wu, T.; He, S.; Liu, J.; Sun, S.; Liu, K.; Han, Q.L.; Tang, Y. A brief overview of ChatGPT: The history, status quo and potential future development. IEEE/CAA J. Autom. Sin. 2023, 10, 1122–1136. [Google Scholar] [CrossRef]
  27. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A survey of large language models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
  28. Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.; Hinton, G.; Dean, J. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv 2017, arXiv:1701.06538. [Google Scholar]
  29. Pope, R.; Douglas, S.; Chowdhery, A.; Devlin, J.; Bradbury, J.; Heek, J.; Xiao, K.; Agrawal, S.; Dean, J. Efficiently scaling transformer inference. Proc. Mach. Learn. Syst. 2023, 5, 606–624. [Google Scholar]
  30. Bonner, M.; Zeller, M.; Schulz, G.; Savu, A. LLM-based Approach to Automatically Establish Traceability between Requirements and MBSE. In Proceedings of the INCOSE International Symposium, Dublin, Ireland, 2–6 July 2024; Wiley Online Library: Hoboken, NJ, USA, 2024; Volume 34, pp. 2542–2560. [Google Scholar]
  31. Hassine, J. An llm-based approach to recover traceability links between security requirements and goal models. In Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, Salerno, Italy, 18–21 June 2024; pp. 643–651. [Google Scholar]
  32. Benzinho, J.; Ferreira, J.; Batista, J.; Pereira, L.; Maximiano, M.; Távora, V.; Gomes, R.; Remédios, O. LLM Based Chatbot for Farm-to-Fork Blockchain Traceability Platform. Appl. Sci. 2024, 14, 8856. [Google Scholar] [CrossRef]
  33. Xie, Y.; Xu, Z.; Kankanhalli, M.S.; Meel, K.S.; Soh, H. Embedding symbolic knowledge into deep networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
  34. Xie, Y.; Zhou, F.; Soh, H. Embedding symbolic temporal knowledge into deep sequential models. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 4267–4273. [Google Scholar]
  35. Yu, D.; Yang, B.; Liu, D.; Wang, H.; Pan, S. A survey on neural-symbolic learning systems. Neural Netw. 2023, 166, 105–126. [Google Scholar] [CrossRef] [PubMed]
  36. Di, Z.; Zhang, C.; Lv, H.; Cui, L.; Liu, L. LoRP: LLM-based Logical Reasoning via Prolog. Knowl.-Based Syst. 2025, 327, 114140. [Google Scholar] [CrossRef]
  37. Bashir, A.; Peng, R.; Ding, Y. Logic-infused knowledge graph QA: Enhancing large language models for specialized domains through Prolog integration. Data Knowl. Eng. 2025, 157, 102406. [Google Scholar] [CrossRef]
  38. Peng, B.; Zhu, Y.; Liu, Y.; Bo, X.; Shi, H.; Hong, C.; Zhang, Y.; Tang, S. Graph retrieval-augmented generation: A survey. arXiv 2024, arXiv:2408.08921. [Google Scholar] [CrossRef]
  39. Su, P.; Chen, D. Designing a knowledge-enhanced framework to support supply chain information management. J. Ind. Inf. Integr. 2025, 47, 100874. [Google Scholar] [CrossRef]
  40. Kosasih, E.E.; Margaroli, F.; Gelli, S.; Aziz, A.; Wildgoose, N.; Brintrup, A. Towards knowledge graph reasoning for supply chain risk management using graph neural networks. Int. J. Prod. Res. 2024, 62, 5596–5612. [Google Scholar] [CrossRef]
  41. Li, Y.; Liu, X.; Starly, B. Manufacturing service capability prediction with Graph Neural Networks. J. Manuf. Syst. 2024, 74, 291–301. [Google Scholar] [CrossRef]
  42. International Organization for Standardization (ISO). ISO 9001:2015; Quality Management Systems–Requirements. ISO: Geneva, Switzerland, 2015. Available online: https://www.iso.org/standard/62085.html (accessed on 17 September 2025).
  43. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474. [Google Scholar]
  44. Bahr, L.; Wehner, C.; Wewerka, J.; Bittencourt, J.; Schmid, U.; Daub, R. Knowledge graph enhanced retrieval-augmented generation for failure mode and effects analysis. J. Ind. Inf. Integr. 2025, 45, 100807. [Google Scholar] [CrossRef]
  45. Shu, Y.; Yu, Z.; Li, Y.; Karlsson, B.F.; Ma, T.; Qu, Y.; Lin, C.Y. Tiara: Multi-grained retrieval for robust question answering over large knowledge bases. arXiv 2022, arXiv:2210.12925. [Google Scholar] [CrossRef]
  46. Dong, Q.; Li, L.; Dai, D.; Zheng, C.; Ma, J.; Li, R.; Xia, H.; Xu, J.; Wu, Z.; Liu, T.; et al. A survey on in-context learning. arXiv 2022, arXiv:2301.00234. [Google Scholar]
  47. International Energy Agency. Steel and Aluminium; International Energy Agency: Paris, France, 2023; Available online: https://www.iea.org/reports/steel-and-aluminium (accessed on 9 September 2025).
  48. Capodaglio, A.G. Developments and Issues in Renewable Ecofuels and Feedstocks. Energies 2024, 17, 3560. [Google Scholar] [CrossRef]
  49. Tang, J.; Xiao, X.; Han, M.; Shan, R.; Gu, D.; Hu, T.; Li, G.; Rao, P.; Zhang, N.; Lu, J. China’s sustainable energy transition path to low-carbon renewable infrastructure manufacturing under green trade barriers. Sustainability 2024, 16, 3387. [Google Scholar] [CrossRef]
  50. International Organization for Standardization (ISO). ISO 14001:2015; Environmental Management Systems—Requirements with Guidance for Use. ISO: Geneva, Switzerland, 2015. Available online: https://www.iso.org/standard/60857.html (accessed on 16 October 2025).
  51. Ikram, M.; Zhang, Q.; Sroufe, R.; Shah, S.Z.A. Towards a sustainable environment: The nexus between ISO 14001, renewable energy consumption, access to electricity, agriculture and CO2 emissions in SAARC countries. Sustain. Prod. Consum. 2020, 22, 218–230. [Google Scholar] [CrossRef]
  52. Jiang, H.; Yao, L.; Qin, J.; Bai, Y.; Brandt, M.; Lian, X.; Davis, S.J.; Lu, N.; Zhao, W.; Liu, T.; et al. Globally interconnected solar-wind system addresses future electricity demands. Nat. Commun. 2025, 16, 4523. [Google Scholar] [CrossRef] [PubMed]
  53. Yan, H.; Yang, J.; Wan, J. KnowIME: A system to construct a knowledge graph for intelligent manufacturing equipment. IEEE Access 2020, 8, 41805–41813. [Google Scholar] [CrossRef]
  54. Supply Chain Data Set for Data Analytics Project Portfolio. Available online: https://www.kaggle.com/datasets/shivaiyer129/supply-chain-data-set (accessed on 17 September 2025).
  55. DataCo Smart Supply Chain for Big Data Analysis. Available online: https://www.kaggle.com/datasets/shashwatwork/dataco-smart-supply-chain-for-big-data-analysis (accessed on 17 September 2025).
  56. Supply Chain Management for Car. Available online: https://www.kaggle.com/datasets/prashantk93/supply-chain-management-for-car/data (accessed on 17 September 2025).
  57. Lu, Y.; Shen, M.; Wang, H.; Wang, X.; van Rechem, C.; Fu, T.; Wei, W. Machine learning for synthetic data generation: A review. arXiv 2023, arXiv:2302.04062. [Google Scholar]
  58. Su, P. Supporting Self-Management in Cyber-Physical Systems by Combining Data-Driven and Knowledge-Enabled Methods. Ph.D. Thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2025. [Google Scholar]
  59. Miller, J.J. Graph database applications and concepts with Neo4j. In Proceedings of the Southern Association for Information Systems Conference, Atlanta, GA, USA, 23–24 March 2013; Volume 2324, pp. 141–147. [Google Scholar]
  60. Green, A.; Guagliardo, P.; Libkin, L.; Lindaaker, T.; Marsault, V.; Plantikow, S.; Schuster, M.; Selmer, P.; Voigt, H. Updating graph databases with Cypher. In Proceedings of the 45th International Conference on Very Large Data Bases (VLDB), Los Angeles, CA, USA, 26–30 August 2019; Volume 12, pp. 2242–2254. [Google Scholar]
  61. Francis, N.; Green, A.; Guagliardo, P.; Libkin, L.; Lindaaker, T.; Marsault, V.; Plantikow, S.; Rydberg, M.; Selmer, P.; Taylor, A. Cypher: An evolving query language for property graphs. In Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA, 10–15 June 2018; pp. 1433–1445. [Google Scholar]
  62. Gemini Team; Anil, R.; Borgeaud, S.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; Millican, K.; et al. Gemini: A family of highly capable multimodal models. arXiv 2023, arXiv:2312.11805. [Google Scholar] [CrossRef]
  63. Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Wang, H.; Wang, H. Retrieval-augmented generation for large language models: A survey. arXiv 2023, arXiv:2312.10997. [Google Scholar]
  64. Luo, L.; Zhao, Z.; Haffari, G.; Li, Y.F.; Gong, C.; Pan, S. Graph-constrained reasoning: Faithful reasoning on knowledge graphs with large language models. arXiv 2024, arXiv:2410.13080. [Google Scholar] [CrossRef]
  65. Bast, H.; Haussmann, E. More accurate question answering on freebase. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Melbourne, Australia, 18–23 October 2015; pp. 1431–1440. [Google Scholar]
  66. Wielemaker, J.; Schrijvers, T.; Triska, M.; Lager, T. SWI-Prolog. Theory Pract. Log. Program. 2012, 12, 67–96. [Google Scholar] [CrossRef]
  67. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837. [Google Scholar]
  68. Zhang, J. Graph-ToolFormer: To empower LLMs with graph reasoning ability via prompt augmented by ChatGPT. arXiv 2023, arXiv:2304.11116. [Google Scholar]
  69. Pan, L.; Albalak, A.; Wang, X.; Wang, W.Y. Logic-LM: Empowering large language models with symbolic solvers for faithful logical reasoning. arXiv 2023, arXiv:2305.12295. [Google Scholar] [CrossRef]
Figure 1. Overview of the proposed framework. This figure presents the main components: construction of the knowledge base, generation of responses via LLMs by retrieving relevant information from the knowledge base, and logic reasoning via Prolog by synthesizing the responses from LLMs with pre-defined rules from domain knowledge.
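To make the retrieval step of the RAG component concrete, the sketch below ranks stored passages by token overlap with the query and prepends the best matches to the prompt. This is a minimal illustrative stand-in, not the paper's implementation: the actual framework retrieves structured information from the graph-based knowledge base, and the scoring function and prompt template here are assumptions.

```python
def retrieve(query, passages, k=2):
    """Rank passages by simple token overlap with the query and return the
    top-k. A toy stand-in for the graph/dense retrieval a real RAG uses."""
    q = set(query.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(q & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, passages, k=2):
    """Assemble an augmented prompt: retrieved context, then the question."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The augmented prompt grounds the LLM's answer in retrieved facts, which is how RAG mitigates hallucination.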
Figure 2. Visualization of meta-models. This figure presents the common stereotypes and relationships regarding supply chain networks.
Figure 3. An example of the KG queried from the graph-based knowledge base. The KG is generated via Cypher queries by searching multiple supplier names. Neo4j supports the retrieval of these suppliers and their relevant information.
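A query like the one behind this figure can be composed programmatically. The Python sketch below builds a parameterized Cypher query for a hypothetical Neo4j schema; the node label `Supplier`, relationship type `SUPPLIES`, and property `name` are illustrative assumptions, not the paper's actual schema.

```python
def build_supplier_query(supplier_names):
    """Compose a parameterized Cypher query that retrieves the given
    suppliers and their outgoing relationships. Schema is hypothetical."""
    if not supplier_names:
        raise ValueError("at least one supplier name is required")
    # $names is bound by the Neo4j driver at run time; parameterization
    # avoids injection issues with user-provided supplier names.
    query = (
        "MATCH (s:Supplier)-[r:SUPPLIES]->(c) "
        "WHERE s.name IN $names "
        "RETURN s, r, c"
    )
    return query, {"names": list(supplier_names)}

# With the official neo4j Python driver, this would be run roughly as:
#   with driver.session() as session:
#       records = session.run(*build_supplier_query(["Acme", "Globex"]))
```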
Table 1. Overall advantages of the proposed framework.
Graph-Based Knowledge Base:
(1) Graph-based models support the relational data within supply chain networks
(2) A knowledge base built on graph-based models provides an interpretable solution

RAG-Based LLM:
(1) Flexible analysis of input queries with natural language representations
(2) Mitigation of hallucination by retrieving domain knowledge

Logic Programming:
(1) A flexible solution to analyze the traceability of each output from the LLM
(2) Support for multi-step reasoning via user-specified rules
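The multi-step reasoning of the logic-programming component can be mimicked in plain Python. The sketch below encodes one hypothetical rule in the spirit of the framework's Prolog reasoning: an entity is renewably traced if its own production uses renewable energy and every direct supplier is itself renewably traced. The facts, names, and rule are illustrative, not taken from the paper's actual rule base.

```python
def renewably_traced(entity, uses_renewable, supplied_by, _seen=None):
    """Recursive traceability check, mirroring a Prolog-style rule:
        renewably_traced(E) :- uses_renewable(E),
                               forall(supplies(S, E), renewably_traced(S)).
    `uses_renewable` is the set of entities with renewable production;
    `supplied_by` maps an entity to the set of its direct suppliers."""
    seen = _seen if _seen is not None else set()
    if entity in seen:      # guard against cycles in the supply network
        return True
    seen.add(entity)
    if entity not in uses_renewable:
        return False
    return all(
        renewably_traced(s, uses_renewable, supplied_by, seen)
        for s in supplied_by.get(entity, ())
    )

# Illustrative supply network: a car built from steel and aluminium parts.
renewable = {"car", "steel_part", "aluminium_part"}
suppliers = {"car": {"steel_part", "aluminium_part"}}
```

In the actual framework, SWI-Prolog evaluates such rules against facts extracted from the LLM's structured responses; this Python version only illustrates the recursion.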
Table 2. ROUGE-1 metrics of the comparative studies.
Metric       Our Framework   RAG-Based LLM   LLM with CoT
Precision    0.73            0.52            0.59
Recall       1.00            0.55            0.65
F1-score     0.84            0.53            0.57
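For reference, ROUGE-1 measures unigram overlap between a candidate and a reference text. A minimal sketch (whitespace tokenization and case-folding only; full ROUGE implementations also apply stemming and other normalization):

```python
from collections import Counter

def rouge1(candidate, reference):
    """Compute ROUGE-1 precision, recall, and F1 from unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each candidate token counts at most as often
    # as it appears in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return precision, recall, f1
```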
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Su, P.; Xu, R.; Wu, W.; Chen, D. Integrating Large Language Model and Logic Programming for Tracing Renewable Energy Use Across Supply Chain Networks. Appl. Syst. Innov. 2025, 8, 160. https://doi.org/10.3390/asi8060160