Article

An Industry Application of Secure Augmentation and Gen-AI for Transforming Engineering Design and Manufacturing

Dulana Rupanetti, Corissa Uberecken, Adam King, Hassan Salamy, Cheol-Hong Min and Samantha Schmidgall

1 Department of Electrical & Computer Engineering, University of Saint Thomas, Saint Paul, MN 55105, USA
2 College of Science and Engineering, University of Minnesota, Minneapolis, MN 55105, USA
* Author to whom correspondence should be addressed.

Algorithms 2025, 18(7), 414; https://doi.org/10.3390/a18070414
Submission received: 23 May 2025 / Revised: 24 June 2025 / Accepted: 29 June 2025 / Published: 4 July 2025

Abstract

This paper explores the integration of Large Language Models (LLMs) and secure Gen-AI technologies within engineering design and manufacturing, with a focus on improving inventory management, component selection, and recommendation workflows. The system is intended for deployment and evaluation in a real-world industrial environment. It utilizes vector embeddings, vector databases, and Approximate Nearest Neighbor (ANN) search algorithms to implement Retrieval-Augmented Generation (RAG), enabling context-aware searches for inventory items and addressing the limitations of traditional text-based methods. Built on an LLM framework enhanced by RAG, the system performs similarity-based retrieval and part recommendations while preserving data privacy through selective obfuscation using the ROT13 algorithm. In collaboration with an industry sponsor, real-world testing demonstrated strong results: 88.4% for Answer Relevance, 92.1% for Faithfulness, 80.2% for Context Recall, and 83.1% for Context Precision. These results demonstrate the system’s ability to deliver accurate and relevant responses while retrieving meaningful context and minimizing irrelevant information. Overall, the approach presents a practical and privacy-aware solution for manufacturing, bridging the gap between traditional inventory tools and modern AI capabilities and enabling more intelligent workflows in design and production processes.

1. Introduction

Modern manufacturing enterprises operate across complex, layered systems that range from equipment-level automation to enterprise-wide planning and control. While innovations in robotics, Programmable Logic Controllers (PLCs), Supervisory Control and Data Acquisition (SCADA), Manufacturing Execution Systems (MESs), and Enterprise Resource Planning (ERP) have enabled greater throughput and coordination [1], inventory search and management remain a persistent operational challenge, especially in legacy environments.
In particular, inventory search systems within industrial settings struggle to keep pace with the evolving complexity and scale of manufacturing data, and despite technological advancements, legacy systems remain a persistent bottleneck. Over time, organizations accumulate vast amounts of data across changing software, naming conventions, and reference structures. This evolution often renders historical data unstructured, inconsistent, and difficult to query. Inventory systems suffer especially from fragmentation, making it hard to interpret part names, categories, and usage history without extensive programming or insider knowledge.
To address this critical issue, we present an effective, secure, and intelligent inventory management and recommendation system that has been developed and rigorously tested in a real-world manufacturing setting. Traditional systems rely on keyword-based queries and static filters, forcing users to iteratively adjust search terms to locate parts or components. These methods fall short in handling semantic variability, complex queries, or incomplete input.
This paper introduces a novel framework that leverages Large Language Models (LLMs) to enable intelligent, context-aware inventory search and recommendations. By integrating state-of-the-art techniques, such as Retrieval-Augmented Generation (RAG), which incorporates concepts like data vectorization and vector search, the proposed system significantly enhances both the relevance and efficiency of inventory access. Furthermore, in response to growing concerns over data confidentiality in industrial AI applications, our solution includes robust privacy-preserving mechanisms to secure sensitive information throughout the search process.
LLMs have shown transformative potential across various industries by enhancing communication, streamlining workflows, and supporting data-driven decision-making. In manufacturing, LLMs can interpret ERP and internal database queries more intelligently, addressing persistent communication challenges such as order inconsistencies and feedback loops between customers and producers [2]. They also support predictive maintenance by classifying work orders, estimating duration, and identifying key failure factors, thereby reducing downtime and improving resource allocation [3]. Additionally, LLMs help address challenges in data preparation, a significant barrier to the adoption of ML in manufacturing. The labor-intensive nature of data wrangling often limits the scalability of ML solutions. LLMs, however, can automate aspects of this process, enabling non-experts to engage with data science workflows and improving interdisciplinary collaboration [4]. Their ability to parse complex data and derive actionable insights supports the broader vision of intelligent manufacturing and data-centric operations [5].
In summary, our contribution is an LLM-enhanced, privacy-aware inventory search and recommendation system specifically designed to address the challenges of legacy industrial data. The solution is validated in a manufacturing context and is extensible to other domains requiring intelligent inventory or product search.
The rest of this article is organized as follows: Section 2 reviews related work relevant to the methods presented in this study. Section 3 details the methodology. Section 4 provides the performance evaluation metrics, while Section 5 provides the implementation of the proposed methods and the results. Finally, Section 6 presents the conclusions.

2. Related Work

Emerging inventory management trends address the limitations of traditional search systems by integrating ML and AI to automate and improve search processes. These technologies enable dynamic, context-aware searches, thereby reducing the need for manual query refinement [6,7]. Despite this, traditional systems remain prevalent in manufacturing, underscoring the need for continued development.
One significant development is the integration of AI with legacy inventory systems. This approach seeks to overcome the challenges posed by traditional systems, which often struggle with data silos and a lack of real-time insights. Singh [2] discusses innovative strategies that leverage AI to enhance the functionality of existing inventory systems, drawing on case studies that demonstrate successful integrations across various sectors. By utilizing AI, manufacturers can automate data processing, improve demand forecasting, and optimize inventory levels, thereby reducing the need for manual adjustments and enhancing overall operational efficiency.
Furthermore, the application of AI in Just-In-Time (JIT) inventory management is gaining traction. Pal [8] highlights how these technologies are utilized to elevate demand forecasting accuracy, which is crucial for aligning inventory levels with fluctuating market demands. This integration not only streamlines inventory management but also minimizes waste and reduces holding costs, addressing some of the inefficiencies associated with traditional inventory systems. Another promising trend is the development of AI-driven real-time monitoring systems. Okuyelu [9] emphasizes the importance of real-time quality monitoring and process optimization in manufacturing. By implementing AI systems that continuously analyze inventory data and production processes, manufacturers can make informed decisions quickly, thereby reducing their reliance on manual input and improving responsiveness to inventory changes.
With advances in the inference capabilities of Deep Learning (DL) and ML models, recommendation systems have also garnered significant attention in manufacturing environments. Although they are better suited to platforms such as entertainment streaming, where quality depends on vast amounts of collected data, these systems can be incorporated into manufacturing with considerable success. The authors of [10] examined several architectures of recommendation systems that have been the focus of modern development, discussing recommendation techniques, the data used in recommendation systems, deep learning, potential applications, ML algorithms, evaluation metrics, and open challenges. The primary concern addressed was preventing a recommendation system from overwhelming the user with excessive information. Clustering is a standard ML technique in these systems, and the most common accuracy metrics are Mean Absolute Error, Precision, Recall, and F-measure. Scalability and latency remain significant open issues, and privacy and security were also discussed.
Marcuzzo et al. [11] focus their work on introducing current trends in recommendation systems, updating the taxonomy, and outlining the different trends in research, as well as the problems that have yet to be addressed. The authors define and discuss item recommendations, learning objectives, ranking, sampling, and taxonomies and provide an overview of the methods, experimental factors, accuracy metrics, recent advancements, and challenges. The relevant factors affecting model design, such as available data and chosen evaluation metrics, are introduced and compared to provide the foundation of knowledge that several recommendation systems reference. The authors emphasize the need for clearly defined testing protocols and benchmarks to create more universal systematic evaluation procedures to indicate the differences in each model’s performance.
The work of He et al. [12] focuses on tackling problems in collaborative filtering based on implicit feedback. Key topics include learning from implicit data, matrix factorization, neural collaborative filtering, a fusion of generalized matrix factorization and multi-layer perceptron models, and the performance of the proposed solution. The authors indicate that this framework is simple and generic, serving as a guideline for developing new DL models and opening a new avenue for future work, especially in extending models to incorporate auxiliary information and building multimedia recommender systems.
Additionally, the use of human-centered design principles in technology implementation is becoming increasingly important. Berretta et al. [13] argue that incorporating human factors into the design of AI systems can enhance user experience and improve the effectiveness of inventory management tools. This shift towards a more user-centric approach ensures that technology complements human decision-making rather than complicates it, thereby addressing some of the frustrations associated with traditional inventory management methods. Moreover, integrating AI-powered analytics into supply chain management transforms how manufacturers approach inventory optimization. Adegbola [14] discusses the potential of advanced financial modeling techniques and AI-driven analytics to reduce inventory costs and enhance overall competitiveness. By leveraging these technologies, manufacturers can gain deeper insights into their inventory dynamics, enabling more informed decision-making and enhanced operational performance.
Most existing inventory management solutions leverage ML for search, predictive maintenance, and real-time monitoring while typically remaining confined to text-based queries, manual adjustments, and fixed search parameters. Our approach departs from these conventions by integrating LLMs with RAG to provide context-aware semantic recommendations. Additionally, our privacy-preserving mechanisms address critical data security concerns in industrial settings. Extensive real-world testing further demonstrated the scalability and adaptability of our framework, making it a robust solution for next-generation inventory management. An overall, high-level implementation flow of our application is depicted in Figure 1. The contributions of this paper are summarized as follows:
  • Integration of LLMs with RAG, vector embeddings, and ANN search for dynamic, context-aware inventory recommendations.
  • Incorporation of robust privacy-preserving mechanisms suitable for industrial applications.
  • Demonstration of scalability and effectiveness through real-world industrial testing.

3. Methodology

Since LLMs are a relatively recent development, the technologies surrounding them are still evolving and continually improving. This rapid pace of advancement means that new techniques and methodologies are frequently introduced, making it a dynamic field. However, despite this ongoing evolution, several foundational concepts are consistently utilized in many LLM-based applications to achieve desired outcomes. This section discusses, in detail, the concepts that are used in this study to develop a robust framework—including techniques like RAG, data orchestration, fine-tuning, context-aware generation, and leveraging large-scale pre-training—that forms the backbone of how LLMs are applied across various domains. Understanding and effectively implementing these concepts is crucial for maximizing the potential of LLMs in real-world applications.

3.1. Large Language Models

LLMs have significantly advanced Natural Language Processing by learning from large, diverse corpora, enabling them to understand and generate human-like text beyond the capabilities of rule-based systems [15]. Their effectiveness stems from transformer-based architectures (Figure 2), which process input text using embeddings, positional encoding, self-attention layers, and decoders that predict word sequences through probabilistic outputs. This design supports parallel data processing and captures complex linguistic patterns using deep neural networks.
Models like GPT-4 demonstrate high performance in generating coherent, context-aware responses, making them valuable for tasks such as content generation and dialogue systems [17]. Beyond text generation, LLMs are being adopted in various fields, including education, government, and recommendation systems. In academia, they support personalized learning and administrative efficiency [18], while in digital governance they enhance service delivery and citizen interaction via conversational interfaces [19,20].
In the work presented in this paper, we employed OpenAI’s GPT-4o model. GPT-4o (“o” for “omni”) is OpenAI’s flagship multimodal model, supporting text and image inputs with text-based outputs, including structured formats. It features a 128,000-token context window, supports up to 16,384 output tokens, and has a training data cutoff of 30 September 2023. GPT-4o is optimized for most tasks, offering strong performance across modalities, though audio input is not supported. The model supports key features such as streaming, function calling, structured outputs, fine-tuning, and tool integration (e.g., web search, image generation, code interpreter). It is accessible via multiple endpoints, including the chat, batch, and assistants APIs.
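For illustration, a minimal sketch of how such a model can be queried through OpenAI’s chat completions endpoint is shown below; the system and user prompts are illustrative placeholders, not the prompts used in this work.

```python
# Minimal sketch: querying GPT-4o through the OpenAI chat completions endpoint.
# Assumes the OPENAI_API_KEY environment variable is set; prompts are illustrative.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You recommend electrical parts from retrieved inventory context."},
        {"role": "user",
         "content": "Suggest an in-stock substitute for fuse FLNR015."},
    ],
    temperature=0.2,  # a low temperature favors more deterministic recommendations
)
print(response.choices[0].message.content)
```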

3.2. Retrieval-Augmented Generation

While LLMs face numerous challenges, particularly in terms of ethical considerations and generating biased content, hallucination is one of the most prominent issues in applications such as industrial automation. Hallucinations occur when LLMs generate false but plausible-sounding information due to gaps in their knowledge or when they are given too many tokens in a prompt. To address this issue, one of the most robust methods used is Retrieval-Augmented Generation (RAG).
RAG addresses the limitations of LLMs by incorporating an information retrieval component into the text generation process. This integration enables LLMs to access current and domain-specific knowledge from external sources, thereby enhancing the accuracy and relevance of their outputs. By relying less on static training data, RAG helps reduce hallucinations and enhances the reliability of LLMs in important use cases. Instead of supplying all the data, RAG provides the ability to extract only the necessary information relevant to a user’s prompt, enabling more accurate answers.
Several key components are necessary for a successful RAG implementation. The following subsections discuss each major component used in this paper’s RAG pipeline.

3.2.1. Vector Embeddings

Vector embeddings are a fundamental concept in RAG, representing objects such as control panel components as vectors in a continuous vector space. This method captures functional relationships and similarities between parts, supporting intelligent applications like automated part classification, predictive maintenance, and inventory optimization. The embedding process transforms discrete part information into numerical vector representations that reflect both semantic and functional characteristics.
Embedding models play a crucial role in this process by converting words or terms into vectors based on their meanings and usage within a specific context. Trained on large collections of text, these models learn to position semantically related terms closer together in a high-dimensional space [21]. In the context of control panel manufacturing, for example, embedding models can identify that ‘relay’ and ‘contactor’ are functionally similar and frequently used together, mapping them to nearby points in the vector space. This numerical encoding preserves meaningful relationships, enabling systems to perform tasks such as clustering, classification, and analogy detection more effectively.
Figure 3 and Figure 4 illustrate how this works. Terms like ‘relay’ and ‘contactor’ appear close to each other due to their similar roles, while components like ‘timer’ are positioned further apart, reflecting their distinct functions. The diagrams also highlight analogical patterns such as the relationship between ‘switch:block’ and ‘button:light’, demonstrating how embeddings capture structure and meaning within technical vocabularies.
We utilize an open-source embedding model, BAAI/bge-small-en-v1.5 [22]. The bge-small-en-v1.5 model, developed by the Beijing Academy of Artificial Intelligence (BAAI) as part of the FlagEmbedding project, is a compact English text embedding model designed for efficient performance in resource-constrained environments. As a smaller variant of the larger bge-base and bge-large models, it utilizes 384-dimensional embeddings.
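As a sketch of how such 384-dimensional embeddings can be generated, the example below loads the model through the sentence-transformers library; the part descriptions are illustrative.

```python
# Sketch: generating 384-dimensional embeddings with BAAI/bge-small-en-v1.5
# via the sentence-transformers library; part descriptions are illustrative.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

parts = [
    "Fuse, Class J, Time Delay, 600 VAC, 15 A",
    "Relay, 24 VDC coil, SPDT, 10 A contacts",
]
# Normalized embeddings make cosine similarity a simple dot product.
embeddings = model.encode(parts, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)
```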

3.2.2. Vector Search

Vector search is a technique used to find items that are the most similar to a given query by comparing their vector representations in a high-dimensional space. Unlike traditional keyword-based search, which relies on exact or partial word matches, vector search uses numerical embeddings that capture the semantic meaning of text, images, or other data types. This enables more flexible and accurate retrieval, particularly in instances where relevant content may not share the same vocabulary as the query. By measuring the distance or similarity between vectors using metrics such as cosine similarity or Euclidean distance, vector search enables systems to return results that are conceptually related, even if they differ in wording or structure.
In the application of this work, we utilized vector search to find matching vectors based on the user’s query. The query was itself converted into a vector, which was then compared against the stored vector embeddings to find similar items.
The Approximate Nearest Neighbor (ANN) search is a method for efficiently finding points in high-dimensional space that are close to a query point without guaranteeing exact matches. It is beneficial for large datasets where an exact search is too slow or costly. By allowing for slight inaccuracies, ANN significantly speeds up the search, making it practical for applications such as recommendation systems, image recognition, and Natural Language Processing.
Common ANN algorithms include methods like Locality-Sensitive Hashing (LSH), Product Quantization (PQ), Hierarchical Navigable Small World (HNSW) graphs, and tree-based approaches such as KD-Trees and Ball Trees. These techniques reduce search time and memory usage by organizing data in a way that allows for the quick approximation of nearest neighbors. ANN typically begins with dimensionality reduction to simplify computations, and it operates within metric spaces using distance measures, such as Euclidean or cosine similarity, to evaluate the closeness of data points to one another.
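To make the ANN idea concrete, the sketch below builds an HNSW index with the hnswlib library and queries it. hnswlib is one of several possible ANN backends and is not the library used in this work; the random vectors stand in for real part embeddings.

```python
# Illustrative ANN search with an HNSW index (hnswlib). Random vectors stand in
# for real part embeddings; hnswlib is one possible backend, not the one used here.
import numpy as np
import hnswlib

dim, num_items = 384, 10_000
data = np.float32(np.random.rand(num_items, dim))

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_items, ef_construction=200, M=16)
index.add_items(data, np.arange(num_items))
index.set_ef(50)  # higher ef = more accurate but slower queries

labels, distances = index.knn_query(data[0], k=5)  # approximate 5 nearest neighbors
print(labels, distances)
```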
BM25 (Best Matching 25) is a widely used ranking function in information retrieval that estimates the relevance of documents to a given query. It is based on the probabilistic retrieval framework and incorporates key factors, including term frequency, inverse document frequency, and document length normalization. The BM25 scoring function rewards documents containing query terms that occur frequently within the document yet are rare across the corpus, while penalizing excessively long documents to prevent length bias. The relevance score of a document D with respect to a query Q is given by
$$\mathrm{Score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$

Here, $f(q_i, D)$ is the frequency of term $q_i$ in document $D$, $|D|$ is the length of the document, and avgdl is the average document length in the corpus. The parameters $k_1$ and $b$ are typically set to values such as 1.2 and 0.75, respectively. $\mathrm{IDF}(q_i)$ represents the inverse document frequency of term $q_i$, which gives more weight to informative terms.
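For concreteness, the scoring function above can be transcribed directly into code. The sketch below uses a common IDF variant and a toy corpus of tokenized part descriptions; all data and parameter choices are illustrative.

```python
# Sketch: the BM25 score defined above, with a common IDF variant and a toy
# corpus of tokenized part descriptions; all data and parameters are illustrative.
import math
from collections import Counter

corpus = [
    "fuse time delay 250 vac 15 a".split(),
    "fuse class j time delay 600 vac 15 a".split(),
    "relay 24 vdc spdt".split(),
]
N = len(corpus)
avgdl = sum(len(doc) for doc in corpus) / N

def idf(term: str) -> float:
    n_t = sum(term in doc for doc in corpus)  # documents containing the term
    return math.log((N - n_t + 0.5) / (n_t + 0.5) + 1)

def bm25(query: list[str], doc: list[str], k1: float = 1.2, b: float = 0.75) -> float:
    freq = Counter(doc)
    score = 0.0
    for q in query:
        f = freq[q]  # term frequency f(q_i, D)
        score += idf(q) * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

query = "time delay fuse 250 vac".split()
ranked = sorted(corpus, key=lambda doc: bm25(query, doc), reverse=True)
print([" ".join(doc) for doc in ranked])
```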
In this work, we employed ANN search alongside BM25 to perform a technique called Hybrid Vector Search. While the method’s performance depends heavily on how the two scoring functions are tuned and combined, it tends to perform better on large datasets, such as part inventories.
On the ANN aspect, the cosine similarity search was utilized. Cosine similarity measures the similarity between two vectors by calculating the cosine of the angle between them. Unlike other metrics, it is independent of vector magnitude, focusing solely on the orientation of vectors within the vector space.
Mathematically, cosine similarity is defined as the dot product of two vectors divided by the product of their magnitudes:
$$\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\|\, \|\mathbf{B}\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\, \sqrt{\sum_{i=1}^{n} B_i^2}}$$
This property makes cosine similarity especially useful in fields like text analysis and NLP, where documents are often represented as high-dimensional vectors based on term frequency–inverse document frequency (TF–IDF) or word embeddings [23,24]. It is frequently preferred in vector similarity searches, due to its ability to handle sparse data efficiently. In information retrieval systems, for example, cosine similarity enables the ranking of documents by relevance to a query, facilitating the retrieval of the most pertinent results [25,26]. Additionally, cosine similarity performs well in high-dimensional spaces, where traditional metrics like Euclidean distance may struggle due to the curse of dimensionality [27].
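A direct implementation of this definition is straightforward; the short sketch below computes cosine similarity for two illustrative vectors.

```python
# Sketch: cosine similarity exactly as defined above, for two illustrative vectors.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([0.2, 0.7, 0.1])   # stand-ins for part embeddings
b = np.array([0.25, 0.6, 0.2])
print(cosine_similarity(a, b))  # approaches 1.0 for similarly oriented vectors
```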

3.3. Vector Databases

Vector databases differ from traditional databases in both structure and function, offering advantages for applications that rely on similarity search rather than exact matching. Traditional databases store structured data and use rule-based queries. In contrast, vector databases manage high-dimensional embeddings, numerical representations of unstructured data such as text, images, or video, to enable semantic search using similarity metrics, including cosine similarity or Euclidean distance [28]. Most vector database systems include built-in support for embedding generation, vector computation, and optimization. These databases are crucial in AI-driven use cases, such as recommendation systems, image retrieval, and personalized search, where relevance depends on meaning rather than keywords. They also provide efficient indexing and storage at scale, support real-time querying, reduce latency in machine learning workflows, and lower the cost and complexity of building custom retrieval solutions [29,30].
In this work, we utilized a self-hosted instance of Qdrant, an open-source and commercially available database solution running within a Docker container. Qdrant is a high-performance vector similarity search engine designed for managing and querying high-dimensional vectors with optional metadata, known as payloads. It is well-suited for applications such as semantic search and recommendation systems, where traditional databases often fall short. Qdrant supports distance metrics such as cosine similarity, dot product, and Euclidean distance, as well as BM25 hybrid search, and it utilizes efficient indexing methods like HNSW for fast Approximate Nearest Neighbor search. Data is organized into collections of points, each consisting of a vector, an ID, and optional payloads for filtering and enriched search results. With flexible storage options, a simple API, and support for various deployment environments, Qdrant offers an efficient and scalable solution for vector-based retrieval tasks.
The database instance was optimized to utilize an ANN + BM25 hybrid search for improved vector search. The data was accessed through the REST API functionality provided by the Qdrant database.
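As a sketch of how such a setup can be exercised from Python, the example below creates a collection, inserts a point, and runs a thresholded dense search with the qdrant-client library; the URL, collection name, payload, and placeholder vectors are assumptions for illustration.

```python
# Sketch: creating a collection, inserting a part embedding, and running a
# thresholded search with qdrant-client. URL, names, and vectors are assumptions.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")  # self-hosted Docker instance

client.recreate_collection(
    collection_name="parts",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

client.upsert(
    collection_name="parts",
    points=[PointStruct(id=1, vector=[0.0] * 384,  # placeholder embedding
                        payload={"part_number": "FLNR015"})],
)

hits = client.search(
    collection_name="parts",
    query_vector=[0.0] * 384,  # the embedded user query would go here
    limit=5,                   # top-5 candidates, as in this work
    score_threshold=0.7,       # similarity threshold used in this work
)
```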

3.4. Data Privacy

Data privacy concerns are increasingly relevant in the use of LLMs, primarily due to the risk of unintentionally retaining or exposing sensitive data from training datasets. While LLMs do not store information in a conventional memory structure, the AI community has concerns that LLM providers may collect user prompts and responses for model refinement. This practice can integrate sensitive information into the model’s knowledge, making it vulnerable to exposure in later interactions.
The primary interface between a user or application and an LLM is the prompt, making any prompt that contains sensitive information a potential privacy risk. Traditional privacy-preserving techniques, such as Differential Privacy (DP), often fall short in the context of LLMs. Shi et al. [31] highlight that standard DP methods treat all data points uniformly, which can degrade model performance. As an alternative, prompt obfuscation methods have emerged in research as a simple yet effective approach to enhance privacy without significantly impacting utility.
Prompt obfuscation involves transforming the original text or a part of it to obscure its meaning, significantly reducing its readability and recognizability while preserving the ability to recover the original content accurately. This balance ensures both privacy protection and data integrity. Several methods can be used for obfuscation, each with varying levels of complexity and effectiveness. Base64 encoding converts text into an ASCII representation of binary data, making it less readable to humans. ROT13 applies a simple letter substitution by rotating each character 13 positions in the alphabet. Hex encoding represents each character as a two-digit hexadecimal number, while URL encoding replaces special characters with percent-encoded equivalents. Finally, reversing the string provides a basic yet sometimes effective obfuscation by simply inverting the order of characters. The choice of algorithm depends on the desired trade-off between simplicity, obfuscation strength, and ease of reversibility.
This application involves processing customer and component information, some of which is considered sensitive. Applying heavy obfuscation to the entire prompt would degrade the performance of the language model due to the added complexity. To balance privacy and model efficiency, this work utilizes ROT13 to obfuscate only the sensitive words identified prior to prompt construction. ROT13 is a simple Caesar cipher variant that shifts each letter by 13 positions (e.g., ‘A’ becomes ‘N,’ ‘Z’ becomes ‘M’). Although not designed for strong encryption, ROT13 is effective for lightweight text scrambling, making it well-suited for scenarios where obfuscation, not security, is the primary goal [32]. An example of ROT13 obfuscation is shown in Figure 5.
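Because ROT13 is available through Python’s standard codecs module, selective obfuscation of this kind can be sketched in a few lines; the sensitive-word list and prompt below are illustrative.

```python
# Sketch: selective ROT13 obfuscation of predefined sensitive tokens before
# prompt construction. The sensitive-word list and prompt are illustrative.
import codecs

SENSITIVE_TERMS = {"AcmeCorp", "JaneDoe"}

def obfuscate(text: str) -> str:
    for term in SENSITIVE_TERMS:
        text = text.replace(term, codecs.encode(term, "rot_13"))
    return text

prompt = "Customer AcmeCorp needs a 15 A time-delay fuse."
print(obfuscate(prompt))  # "Customer NpzrPbec needs a 15 A time-delay fuse."
```

Because ROT13 is its own inverse, applying the same transformation to the obfuscated text recovers the original content.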

4. Performance Evaluation Metrics

For evaluation purposes, multiple performance metrics were used, namely Context Precision, Context Recall, Faithfulness, and Answer Relevancy, as shown in Table 1.

4.1. Context Precision

Context Precision measures how well the relevant context chunks are ranked toward the top among all the retrieved pieces of context. It evaluates if the chunks that contain the correct information (according to the ground truth) appear early in the ranked list. The metric is calculated using question, ground_truth, and contexts. Precision is computed at each rank k up to K, using
$$\text{Precision@}k = \frac{\text{true positives@}k}{\text{true positives@}k + \text{false positives@}k}$$

The final Context Precision at K is given by

$$\text{Context Precision@}K = \frac{\sum_{k=1}^{K} \left(\text{Precision@}k \times v_k\right)}{\text{total number of relevant items in the top } K \text{ results}}$$

Here, $v_k \in \{0, 1\}$ indicates whether the item at rank $k$ is relevant.
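A small worked computation of this formula, assuming an illustrative relevance ranking over K = 5 retrieved chunks, is sketched below.

```python
# Sketch: Context Precision@K from a ranked list of relevance flags v_k (1 = relevant).
relevance = [1, 0, 1, 1, 0]  # illustrative flags for K = 5 retrieved chunks

def context_precision_at_K(v: list[int]) -> float:
    num_relevant = sum(v)
    if num_relevant == 0:
        return 0.0
    total = 0.0
    for k in range(1, len(v) + 1):
        precision_at_k = sum(v[:k]) / k  # precision over the top-k items
        total += precision_at_k * v[k - 1]
    return total / num_relevant

print(round(context_precision_at_K(relevance), 3))  # 0.806 for this example
```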

4.2. Context Recall

Context Recall assesses how well the retrieved context aligns with the information in the ground-truth answer. For each claim in the ground-truth answer, the method checks whether it can be attributed to the retrieved context. The inputs used are question, ground_truth, and the retrieved context. The formula is
$$\text{Context Recall} = \frac{|\text{GT claims that can be attributed to the context}|}{|\text{total number of claims in the GT}|}$$
A score closer to 1 indicates that the retrieved context is more complete and better covers the answer.

4.3. Faithfulness

Faithfulness assesses whether the generated answer stays consistent with the context used to produce it. It checks if the retrieved context can back each claim in the answer. First, claims in the generated answer are extracted, and then each is verified against the context. The Faithfulness score (FS) is computed as

$$FS = \frac{|\text{claims in the generated answer that can be inferred from the given context}|}{|\text{total number of claims in the generated answer}|}$$
Scores closer to 1 indicate that the generated content is factually consistent with the source.

4.4. Answer Relevance

Answer Relevance measures how well the generated answer addresses the given question. It compares the semantic similarity between the original question and a set of artificial questions that are generated based on the answer. Using cosine similarity between their embeddings, the metric is calculated as follows:
$$\text{Answer Relevancy} = \frac{1}{N} \sum_{i=1}^{N} \cos(E_{g_i}, E_o)$$

or, equivalently,

$$\text{Answer Relevancy} = \frac{1}{N} \sum_{i=1}^{N} \frac{E_{g_i} \cdot E_o}{\|E_{g_i}\|\, \|E_o\|}$$

where $E_{g_i}$ is the embedding of the $i$-th generated question, $E_o$ is the embedding of the original question, and $N$ is the number of generated questions. Higher scores indicate better alignment with the original intent of the question.
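A minimal sketch of this computation, with illustrative embeddings standing in for real ones, follows.

```python
# Sketch: Answer Relevancy as the mean cosine similarity between the original
# question embedding E_o and embeddings E_g of N generated questions (illustrative).
import numpy as np

def answer_relevancy(E_o: np.ndarray, E_g: list[np.ndarray]) -> float:
    sims = [float(np.dot(e, E_o) / (np.linalg.norm(e) * np.linalg.norm(E_o)))
            for e in E_g]
    return sum(sims) / len(sims)

E_o = np.array([0.9, 0.1, 0.3])
E_g = [np.array([0.8, 0.2, 0.3]), np.array([0.7, 0.1, 0.4])]
print(round(answer_relevancy(E_o, E_g), 3))
```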

5. Implementation and Results

This article implements an LLM-based search framework designed to enhance inventory search and recommendation processes within a control panel manufacturing facility. The framework utilized a dataset provided by our industry sponsor, which includes part numbers, availability, and usage information. Developed as a Python 3.11-based API, the system integrates seamlessly with internal engineering and production tools, providing flexible access to inventory data. As illustrated in Figure 6, the architecture is designed to support efficient searches across a large and dynamic inventory, thereby reducing dependence on tribal knowledge and minimizing inefficiencies associated with manual search refinement.
The core pipeline for search and recommendation is detailed in Algorithm 1. When a user submits a query, it is first converted into a 384-dimensional vector embedding, using a transformer-based model. This embedding is then used to search a vector database for similar items, applying a similarity threshold of 0.7 and returning up to five candidate parts. These candidates are subsequently evaluated by the LLM, which generates a context-aware recommendation that is returned to the user. The specific parameters, including embedding dimension, similarity threshold, and maximum result count, were selected based on system testing to balance performance with the computational cost of LLM inference.
A distinguishing feature of the proposed framework is its use of a vector database that dynamically expands as new queries are processed. This capability enables continuous learning, allowing the system to generate increasingly accurate and context-aware search results over time. By serving as an intelligent assistant, the system provides engineers with rapid access to relevant component and design information, thereby streamlining workflows and supporting data-driven decision-making.
Algorithm 1 Inventory Search and Recommendation Pipeline

Input: q, the user query; model, the embedding model (dimension d = 384); db, the vector database (similarity threshold 0.7); N, the maximum number of results (N = 5)

procedure SimpleSearchAndRecommend(q, model, db, N)
    v_q ← model.get_embedding(q)
    C ← VectorDB_search(v_q, db, N, 0.7)
    response ← LLM_recommend(q, C)
    return response
end procedure
The framework utilized the all-MiniLM-L6-v2 transformer model for embedding generation [33]. To further enhance search and recommendation accuracy, the embedding model was optionally fine-tuned using domain-specific part lists and sample queries, as guided by the model provider [34]. A high-level overview of this training process is depicted in Figure 7.
The trained embedding model transformed the complete parts database into a vector database. Each part and its associated characteristics were represented as a single embedding and stored in the vector database. We used Qdrant to store these vector embeddings, both as dense and sparse vectors, utilizing Qdrant’s internal tools. The embedding model maps sentences and paragraphs into a 384-dimensional dense vector space suitable for clustering and semantic search [33]. Algorithm 2 describes the basic steps of converting parts into embeddings.
The search process begins by taking a user query, converting it to a vector using the embedding model, and performing an initial vector search using Qdrant’s search tools. It then retrieves a specified number (N) of vectors based on both dense and sparse vector matching.
Algorithm 2 Convert text to vector embeddings

Input: P, a list of parts. Output: V, a list of vector embeddings.

procedure GenerateEmbeddings(P)
    model ← all_MiniLM_L6_v2
    V ← empty list
    for each item i in P do
        v_i ← model.get_embedding(i)
        append v_i to V
    end for
    return V
end procedure
These retrieved vectors undergo a secondary vector search to refine results based on specific characteristics such as voltage, amperage, and product availability. Due to LLM token limitations, the program may need to filter and select only a subset of these vectors; in this case, five vectors are chosen to be sent to the LLM. The second vector search uses Faiss, a library for efficient similarity search and clustering of dense vectors that provides algorithms for searching sets of vectors of any size [35].
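The sketch below illustrates this second, Faiss-based refinement step under assumed inputs: random vectors stand in for the candidates returned by the first search and for the per-property query embeddings.

```python
# Sketch: the second, Faiss-based refinement step. Random vectors stand in for
# the candidates returned by the first search and the per-property query embeddings.
import numpy as np
import faiss

dim = 384
candidates = np.float32(np.random.rand(50, dim))   # first-step (Qdrant) results
query_props = np.float32(np.random.rand(3, dim))   # embeddings of query properties

faiss.normalize_L2(candidates)                     # normalize so inner product = cosine
faiss.normalize_L2(query_props)

index = faiss.IndexFlatIP(dim)                     # exact inner-product index
index.add(candidates)

scores, ids = index.search(query_props, 5)         # top-5 candidates per property
print(ids)
```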
The steps of converting a user query and performing the multi-step vector search are depicted in Figure 8. The figure uses the example discussed later in this section. The selected results are then passed through an obfuscation module, which scrambles any predefined sensitive information. ROT13 is used for this purpose, with predefined sensitive information such as customer names, proprietary product names, and any contact information. The sensitive information can vary from application to application, depending on how this main framework and other tools are utilized. Figure 9 depicts applying ROT13 to the selected parts and parts data.
We now present an experimental example to demonstrate the process of part retrieval within the proposed system. The vector search output, shown in Table 2, reflects the system’s ability to identify similar parts available in inventory that align with the specified characteristics of a given part number and description, using Algorithm 3. For this example, the input requested parts matching a specified fuse type, FLNR015, with the characteristics ‘Fuse, Delay, 250 VAC, 15 A, 200 kA.’ The table shows the first five results of the vector search with the highest similarity scores. These results can then be sent to the LLM for reasoning.
Algorithm 3 Two-Step Vector Search

Input: query, the query against which to match embeddings; model, the embedding model; vector_db, the Qdrant vector database; F, the Faiss search index. Output: V, a list of selected embeddings.

procedure SearchEmbeddings(query)
    db_result ← vector_db.search(query)        ▷ first step: Qdrant search
    model ← all_MiniLM_L6_v2
    S ← empty list                             ▷ embeddings of each property in the query
    V ← empty list                             ▷ selected vector embeddings
    for each property i in query do
        append model.get_embedding(i) to S
    end for
    for each embedding j in S do
        score ← F.index.search(j)              ▷ second step: Faiss search over db_result
        V ← the 5 embeddings with the highest score
    end for
    return V
end procedure
The system is configured to apply characteristic-specific matching rules, such as avoiding undersized fuses unless explicitly requested by the user. To achieve this functionality, the embedding model is trained and deployed alongside a powerful LLM, such as GPT-4o, providing a robust combination for effective part retrieval. Assessing the accuracy of an LLM-based software application is inherently challenging due to the lack of universally perfect evaluation methods. However, actively analyzing the system’s input and output using a curated dataset can provide valuable insights into its effectiveness and accuracy. In this application, a dataset of Question-and-Answer pairs was generated using a process inspired by the principles of LLM distillation [36].
Distillation is a widely used method in machine learning, particularly in the context of LLMs, where a larger, more powerful model (the “teacher”) is used to train a smaller, more efficient model (the “student”). The teacher model generates extensive training data, such as Question-and-Answer pairs or other forms of structured outputs, which capture its advanced reasoning, knowledge, and decision-making capabilities. This generated data serves as a simplified and targeted representation of the teacher model’s understanding, allowing the student model to learn from it. The distillation transfers knowledge from the teacher to the student and enables the creation of domain-specific models that are faster, more resource-efficient, and tailored to specific applications. For example, chain-of-thought distillation, an advanced variant of this approach, involves generating step-by-step reasoning Question-and-Answer pairs. This method helps train smaller models to mimic not just the conclusions of the teacher model but also its reasoning pathways, improving the interpretability and reliability of the distilled models.
In this work, while a traditional distillation process was not employed, since the application used a pre-trained high-end LLM (GPT-4o), the distillation principles were leveraged in creating a curated dataset. This dataset, generated using a similar chain-of-thought methodology, was used to evaluate the effectiveness and accuracy of the application rather than to train a new model. Figure 10 illustrates an example of an advanced distillation process, highlighting the generation of chain-of-thought Question-and-Answer pairs. This evaluation approach ensures that the system’s outputs align closely with the expected results and demonstrates the utility of distillation techniques for assessing LLM-based applications.
The system was thoroughly evaluated using real-world manufacturing inventory data provided by the research sponsors. The dataset included a diverse range of part descriptions, both well-structured and poorly formatted, including entries with special characters or inconsistent terminology. This diversity was intentional to assess the robustness of the RAG-based system and the underlying language model when exposed to noisy, industry-specific input.
To benchmark performance, a total of 200 Question-and-Answer pairs were generated using the language model. A representative subset is shown in Figure 11. The system responses were then evaluated using the DeepEval framework [37], a modern, open-source toolkit for assessing LLM outputs based on key quality dimensions.
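As a sketch of this evaluation flow, the snippet below scores one Question-and-Answer pair with a single DeepEval metric; the test case contents are illustrative, and DeepEval’s default LLM-as-judge configuration (an OpenAI API key) is assumed to be set up.

```python
# Sketch: scoring a single Q&A pair with DeepEval. The inputs are illustrative;
# DeepEval uses an LLM judge under the hood, so an API key is assumed configured.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="Which in-stock fuses match FLNR015 (time delay, 250 VAC, 15 A)?",
    actual_output="FRN-R-15 is a 250 V, 15 A Class RK5 time-delay fuse in stock.",
    retrieval_context=["FRN-R-15: Fuse Rejection Type 250 V 15 A Class RK5"],
)

metric = AnswerRelevancyMetric(threshold=0.7)
evaluate(test_cases=[test_case], metrics=[metric])
```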
The evaluation focused on four core metrics: Answer Relevance, Faithfulness, Context Recall, and Context Precision. The results, summarized in Figure 12, revealed strong performance across all dimensions: 88.4% for Answer Relevance, 92.1% for Faithfulness, 80.2% for Context Recall, and 83.1% for Context Precision. These findings indicate that the system not only delivers highly relevant and factually accurate responses but also retrieves meaningful context while minimizing irrelevant noise.
Although a subset of generated answers diverged from the expected phrasing, they generally conveyed the correct information and aligned well with user intent. These results affirm the effectiveness of our RAG-based pipeline for part lookup and recommendation tasks, with promising adaptability to other domains involving structured data and context-dependent retrieval.
Table 3 summarizes and contrasts the core features of the traditional SQL search, generic machine learning-based methods, the recent RALLRec [38] framework, and our proposed LLM + RAG pipeline. The comparison highlights the capabilities of each approach in handling semantic queries, supporting context-aware recommendations, integrating privacy mechanisms, and operating in real-world industrial environments. Our method uniquely combines semantic understanding, advanced retrieval, and privacy features that are not present in other approaches.

6. Conclusions

In this paper, we examined several state-of-the-art approaches to deploying Large Language Models in industrial applications for enhancing inventory management and component selection and recommendation processes. We thoroughly discussed the generation of vector embeddings, vector search, data storage, orchestration, and application-specific prompting, highlighting each technique’s role in facilitating robust industrial solutions. The practical application of these technologies was demonstrated through an implementation designed to assist manufacturing and engineering teams in efficiently selecting components or parts from a database. Testing of this implementation indicated improved accuracy and relevance of results within the context of the application, underscoring the potential of LLM-based tools to enhance decision-making in industrial environments.

Author Contributions

Conceptualization, D.R., H.S., and C.-H.M.; methodology, D.R., H.S., S.S., and C.-H.M.; software, D.R., C.U., and A.K.; validation, D.R., H.S., and C.-H.M.; formal analysis, D.R., C.U., S.S., and A.K.; investigation, D.R., C.U., S.S., and A.K.; resources, D.R. and C.U.; data curation, D.R. and S.S.; writing—original draft preparation, D.R., C.U., and A.K.; writing—review and editing, H.S. and C.-H.M.; visualization, D.R., C.U., S.S., and A.K.; supervision, H.S. and C.-H.M.; project administration, H.S. and C.-H.M.; funding acquisition, H.S. and C.-H.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Design Ready Controls grant number 26199.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors gratefully acknowledge the funding and support provided by Design Ready Controls, Inc. (DRC) for the project titled “Manufacturing Automation.” The content presented in this paper has been independently prepared by the authors and does not necessarily reflect the official views of DRC. The text in this article was improved using AI tools such as Grammarly.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Vyskočil, J.; Douda, P.; Novák, P.; Wally, B. A Digital Twin-Based Distributed Manufacturing Execution System for Industry 4.0 with AI-Powered On-The-Fly Replanning Capabilities. Sustainability 2023, 15, 6251. [Google Scholar] [CrossRef]
  2. Singh, N. Challenges and solutions in integrating AI with legacy inventory systems. Int. J. Res. Appl. Sci. Eng. Technol. 2023, 11, 609–613. [Google Scholar] [CrossRef]
  3. Navinchandran, M.; Sharp, M.; Brundage, M.; Sexton, T. Studies to predict maintenance time duration and important factors from maintenance workorder data. In Proceedings of the Annual Conference of the PHM Society, Scottsdale, AZ, USA, 21–26 September 2019; Volume 11. [Google Scholar] [CrossRef]
  4. Zhou, B.; Svetashova, Y.; Pychynski, T.; Baimuratov, I.; Soylu, A.; Kharlamov, E. SemFE: Facilitating ML pipeline development with semantics. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland, OR, USA, 14–19 June 2020. [Google Scholar] [CrossRef]
  5. Sexton, T.; Hodkiewicz, M.; Brundage, M. Categorization errors for data entry in maintenance work-orders. In Proceedings of the Annual Conference of the PHM Society, Scottsdale, AZ, USA, 21–26 September 2019; Volume 11. [Google Scholar] [CrossRef]
  6. Lubis, A. Information system design in warehouse inventory control. J. Logist. Supply Chain. 2023, 3, 35–44. [Google Scholar] [CrossRef]
  7. Bisri, C. Design of a web-based inventory information system at CV Berjaya Jaya Abadi. J. Multimed. Dan Teknol. Inf. (Jatilima) 2024, 6, 46–56. [Google Scholar] [CrossRef]
  8. Pal, S. Advancements in AI-enhanced just-in-time inventory: Elevating demand forecasting accuracy. Int. J. Res. Appl. Sci. Eng. Technol. 2023, 11, 282–289. [Google Scholar] [CrossRef]
  9. Okuyelu, O. AI-driven real-time quality monitoring and process optimization for enhanced manufacturing performance. J. Adv. Math. Comput. Sci. 2024, 39, 81–89. [Google Scholar] [CrossRef]
  10. Khanal, S.; Prasad, P.; Alsadoon, A.E.A. A systematic review: Machine learning based recommendation systems for e-learning. Educ. Inf. Technol. 2020, 25, 2635–2664. [Google Scholar] [CrossRef]
  11. Marcuzzo, M.; Zangari, A.; Albarelli, A.; Gasparetto, A. Recommendation Systems: An Insight Into Current Development and Future Research Challenges. IEEE Access 2022, 10, 86578–86623. [Google Scholar] [CrossRef]
  12. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web (WWW ’17), Perth, Australia, 3–7 April 2017; pp. 173–182. [Google Scholar] [CrossRef]
  13. Berretta, S.; Tausch, A.; Peifer, C.; Kluge, A. The job perception inventory: Considering human factors and needs in the design of human—AI work. Front. Psychol. 2023, 14, 1128945. [Google Scholar] [CrossRef]
  14. Adegbola, A. Advanced financial modeling techniques for reducing inventory costs: A review of strategies and their effectiveness in manufacturing. Financ. Account. Res. J. 2024, 6, 801–824. [Google Scholar] [CrossRef]
  15. Hassija, V. Unleashing the potential of conversational AI: Amplifying ChatGPT’s capabilities and tackling technical hurdles. IEEE Access 2023, 11, 143657–143682. [Google Scholar] [CrossRef]
  16. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  17. Sha, S. Navigating the web of disinformation and misinformation: Large language models as double-edged swords. IEEE Access 2024, 1. [Google Scholar] [CrossRef]
  18. Idri, M. Revolutionizing higher education: Unleashing the potential of large language models for strategic transformation. IEEE Access 2024, 12, 67738–67757. [Google Scholar] [CrossRef]
  19. Ha, J. Intelligent practices of large language models in digital government services. IEEE Access 2024, 12, 8633–8640. [Google Scholar] [CrossRef]
  20. Kouba, A. Exploring chatgpt capabilities and limitations: A survey. IEEE Access 2023, 11, 118698–118721. [Google Scholar] [CrossRef]
  21. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
  22. Xiao, S.; Liu, Z.; Zhang, P.; Muennighoff, N. C-Pack: Packaged Resources To Advance General Chinese Embedding. arXiv 2023, arXiv:2309.07597. [Google Scholar]
  23. Sitikhu, P.; Pahi, K.; Thapa, P.; Shakya, S. A Comparison of Semantic Similarity Methods for Maximum Human Interpretability. In Proceedings of the 2019 Artificial Intelligence for Transforming Business and Society (AITB), Kathmandu, Nepal, 5 November 2019. [Google Scholar] [CrossRef]
  24. Guo, X.; Zhern, T.; Soo, W.; Tan, Y.; Shuan, L. News Reliability Evaluation Using Latent Semantic Analysis. Telkomnika (Telecommun. Comput. Electron. Control) 2018, 16, 1704. [Google Scholar] [CrossRef]
  25. Singh, J.; Dwivedi, S. Performance Evaluation of Search Engines Using Enhanced Vector Space Model. J. Comput. Sci. 2015, 11, 692–698. [Google Scholar] [CrossRef]
  26. Shaik, N.; Chitralingappa, P.; Harichandana, B. The Nexus of AI and Vector Databases: Revolutionizing NLP with LLMs. Int. J. Sci. Res. Eng. Manag. 2024, 8, 1–5. [Google Scholar] [CrossRef]
  27. Schubert, E. A Triangle Inequality for Cosine Similarity. arXiv 2021, arXiv:2107.04071. [Google Scholar]
  28. Coyne, B.; Gupta, A. Vector Databases for Efficient Semantic Search in Natural Language Processing. IEEE Access 2022, 10, 125744–125753. [Google Scholar]
  29. AWS. What Is a Vector Database? 2024. Available online: https://aws.amazon.com/what-is/vector-databases/ (accessed on 1 March 2025).
  30. Jimenez, C.; Xu, F. Scaling Vector Databases for Real-Time Applications. IEEE Trans. Data Knowl. Eng. 2022, 34, 3952–3961. [Google Scholar]
  31. Shi, W.; Cui, A.; Li, E.; Jia, R.; Yu, Z. Selective differential privacy for language modeling. arXiv 2021, arXiv:2108.12944. [Google Scholar]
  32. Milian, Y.; Sulistyo, W. Model Pengembangan Keamanan Data dengan Algoritma ROT13 Extended Vernam Cipher dan Stream Cipher. J. Teknol. Inf. Dan Komun. (JTIK) 2023, 7, 208–216. [Google Scholar] [CrossRef]
  33. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019. [Google Scholar]
  34. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084. [Google Scholar]
  35. Douze, M.; Guzhva, A.; Deng, C.; Johnson, J.; Szilvasy, G.; Mazaré, P.E.; Lomeli, M.; Hosseini, L.; Jégou, H. The Faiss library. arXiv 2024, arXiv:2401.08281. [Google Scholar]
  36. Hsieh, C.Y.; Li, C.L.; Yeh, C.K.; Nakhost, H.; Fujii, Y.; Ratner, A.; Krishna, R.; Lee, C.Y.; Pfister, T. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes. arXiv 2023, arXiv:2305.02301. [Google Scholar]
  37. Confident-AI. GitHub—Confident-AI/Deepeval: The LLM Evaluation Framework. Available online: https://github.com/confident-ai/deepeval (accessed on 1 March 2025).
  38. Xu, J.; Luo, S.; Chen, X.; Huang, H.; Hou, H.; Song, L. RALLRec: Improving Retrieval Augmented Large Language Model Recommendation with Representation Learning. arXiv 2025, arXiv:2502.06101. [Google Scholar]
Figure 1. A high-level abstraction of the proposed work.
Figure 2. The transformer architecture [16].
Figure 3. An example structure of word embeddings and their relationships.
Figure 4. An example of words in a vector space.
Figure 5. An example of applying ROT13.
Figure 6. High-level depiction of the implementation.
Figure 7. Training an embedding model.
Figure 8. Flow of the steps from a user query to LLM response.
Figure 9. Using ROT13 to scramble customer names.
Figure 10. Example of Few-Shot chain-of-thought Question/Answer pair.
Figure 11. Excerpt of the Question and Answer pairs generated.
Figure 12. Results of evaluated metrics.
Table 1. Summary of metrics.

Category | Metric and Description
Generation | Faithfulness: How factually accurate is the generated answer?
Generation | Answer Relevancy: How relevant is the answer to the question?
Retrieval | Context Precision: The signal-to-noise ratio of the retrieved context.
Retrieval | Context Recall: Can it retrieve all the relevant information required?
Table 2. Results and recommendations of the initial vector search.

Score | Category | Part Number | Description
0.923 | Fuse | 25860505 | Fuse Rejection Type 250 V 15 A Class RK5
0.911 | Fuse | FRN-R-15 | Fuse Rejection Type 250 V 15 A Class RK5
0.813 | Fuse | JTD015 | Fuse, Class J, Time Delay, 600 VAC, 15 A, 200 KAIC
0.745 | Fuse | KLDR015 | Fuse, Class CC, Time Delay, 600 VAC, 15 A, 200 KAIC
0.757 | Fuse | 349937133 | Fuse Rejection Time Delay 15 A Class CC
Table 3. Comparison of key features across four inventory search and recommendation systems.

Feature/Capability | SQL Search | ML-Based | RALLRec [38] | Ours
Query type | Keyword only | Keyword/structured | Text, prompts | Natural language, semantic
Semantic understanding | ✗ | Limited | ✔ | ✔
Vector embeddings | ✗ | Sometimes | ✔ | ✔
Context-aware recommendation | ✗ | Limited | ✔ | ✔
Privacy mechanisms | ✗ | ✗ | ✗ | ✔ (ROT13)
Industrial testing | ✗ | Sometimes | ✗ | ✔
Unstructured/noisy data support | ✗ | Partial | ✗ | ✔
Hybrid dense/sparse retrieval | ✗ | ✗ | ✗ | ✔
Evaluation metrics | Precision/Recall | Accuracy, F1 | Custom | Relevance, Faithfulness, Recall/Precision
Unique features | Simple, interpretable | Pattern learning | LLM-based reasoning | Semantic, privacy, context-aware