Local LLMs for Industrial Supervision and Control: An Edge AI Event-Driven Architecture for Proactive Operational Context Management in Real Industrial Environments

Hidalgo-Castelo, Fernando; Guerrero-González, Antonio; García-Córdova, Francisco; Lloret-Abrisqueta, Francisco; Piñera-Marín, Antonio

doi:10.3390/electronics15122547

Open Access Article

by Fernando Hidalgo-Castelo, Antonio Guerrero-González *, Francisco García-Córdova, Francisco Lloret-Abrisqueta and Antonio Piñera-Marín

Departamento de Automática, Ingeniería Eléctrica y Tecnología Electrónica, Universidad Politécnica de Cartagena, 30202 Cartagena, Spain

* Author to whom correspondence should be addressed.

Electronics 2026, 15(12), 2547; https://doi.org/10.3390/electronics15122547 (registering DOI)

Submission received: 15 May 2026 / Revised: 29 May 2026 / Accepted: 5 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue Multimodal Intelligence and Digital–Physical Systems in the Industrial Internet of Things)

Abstract

Access to operational information in industrial plants forces operators to interrupt their tasks, walk to the human–machine interface (HMI) terminals, and navigate heterogeneous platforms—namely programmable logic controllers (PLC), supervisory control and data acquisition (SCADA) systems, manufacturing execution systems (MES), and enterprise resource planning (ERP) systems—consuming 15–30 min per query. Previous work integrated local large language models (LLMs) into a five-layer cognitive architecture deployed in a precast concrete plant, reducing that time to 14–23 s through voice-based conversational queries; however, model inference accounted for 55.3% of total latency and the system remained reactive. This work incorporates the event-driven paradigm as a non-intrusive augmentation layer that keeps the operational context permanently updated, continuously monitoring the process and refreshing knowledge only when significant changes occur. The architecture is fully local, cloud-independent, graphics processing unit (GPU)-free, and containerized via Docker Compose. Experimental results demonstrate a 26–31% reduction in response times (means of 9.84 s, 11.23 s, and 16.47 s for simple, moderate, and complex queries), an 8.4 °C reduction in peak hardware temperature (from 79.6 °C to 71.2 °C), a 41.6% decrease in thermal variability, and an expansion of the safety margin before central processing unit (CPU) throttling from 5.4 °C to 13.8 °C. The system achieved 100% success rate and availability over 30 min of autonomous operation, validated in a real industrial environment.

Keywords:

edge AI; event-driven architecture; Industrial Internet of Things; large language models; n8n; Node-RED; Ollama; llama3.2; industrial supervision; Raspberry Pi

1. Introduction

In modern manufacturing plants, the operator is the first line of response to any incident. However, accessing basic process information requires interrupting ongoing physical tasks, walking to the control room, authenticating in multiple systems, and navigating heterogeneous interfaces that were not designed to interoperate. In a precast concrete plant, verifying the feasibility of an urgent order requires sequentially querying the PLC for equipment status, the SCADA system for process monitoring, the MES for production scheduling, and the ERP system for raw material inventory. This workflow can consume 15–30 min, delaying decision-making and removing the operator from direct process supervision, with the consequent increase in the risk of undetected deviations [1].

The maturity reached by LLMs has offered a promising solution to this information fragmentation. The ability to interact with complex industrial systems using natural language, without requiring knowledge of underlying protocols or topologies, represents a paradigm shift in the operator–plant relationship [2,3]. However, cloud-based commercial solutions present critical limitations for industrial environments: network latencies of 200–800 ms, dependence on permanent connectivity, exposure of sensitive operational data to external infrastructures, and recurring costs that are unaffordable for most manufacturing plants [4,5].

Advances in quantization techniques and CPU-optimized inference engines have made it possible to run LLMs locally on low-cost hardware, preserving data privacy and eliminating cloud dependency [6,7]. GPT-Generated Unified Format (GGUF) quantization at 4-bit precision enables models with 3–13 billion parameters to operate on hardware without dedicated GPU acceleration [8]. Ollama has emerged as a practical framework for containerized local LLM deployment, supporting multiple quantized models, including llama3.2 [9]. On this basis, previous work developed and implemented a five-layer cognitive architecture in a real precast concrete manufacturing plant, integrating the Mistral-7B model quantized on a Raspberry Pi 5 with heterogeneous industrial protocols—namely OPC Unified Architecture (OPC UA), Message Queuing Telemetry Transport (MQTT), and REST Application Programming Interface (REST API)—achieving a reduction in information access time from 15–30 min to 14–23 s, with a 26–77× improvement over the manual method and 100% reliability during autonomous operation [10].

Despite these results, the architecture operated under a reactive paradigm in which model inference represented 55.3% of total latency, with thermal peaks of up to 79.6 °C and a safety margin of only 5.4 °C before the 85 °C hard-throttling threshold of the Raspberry Pi 5 [10,11]. The Raspberry Pi 5 initiates soft throttling at 80 °C and enforces hard throttling at 85 °C, progressively reducing ARM core clock frequency to protect the processor [11]; sustained operation near these limits represents a critical constraint for continuous industrial deployments.

The event-driven architecture (EDA) paradigm, widely applied in Industrial Internet of Things (IIoT) contexts [12,13], offers a structural solution by decoupling computation from the moment of the user request, distributing the computational load over time rather than concentrating it at the moment of user interaction [14]. Node-RED has been validated as an event processing platform for industrial IoT applications, providing visual workflow programming and native support for industrial protocols, including Modbus TCP, OPC UA, and MQTT [15]. n8n extends this paradigm with advanced orchestration capabilities, conditional logic, and integration with external services [16,17]. Recent work has demonstrated the feasibility of integrating LLMs with event-driven information models to provide real-time context for industrial automation systems [2,18]; however, such approaches have been validated primarily in simulation or cloud environments, and their integration with local LLMs on resource-constrained edge hardware in real operational technology (OT) networks remains largely unexplored.

Thermal behavior is a critical constraint for continuous LLM inference on single-board computers (SBCs). Thermal cycling caused by repeated temperature variations is a primary degradation mechanism in embedded devices, and following the Arrhenius relationship, a reduction in peak temperature and thermal variability directly extends expected hardware lifespan [19]. Recent benchmarks confirm that SBCs such as the Raspberry Pi 5 can sustain quantized LLM workloads, but thermal management must be explicitly designed for sustained operation [20]. None of the reviewed works combines an event-driven architecture with a local LLM and experimental validation in a real industrial plant with heterogeneous OT protocol integration, which is the precise contribution of this work.

The main contributions of this work are: (1) an event-driven augmentation layer that keeps the operational context permanently updated without modifying any component of the reference architecture, enabling non-intrusive incremental adoption in existing industrial deployments; (2) a 26–31% reduction in response times through pre-built context that reduces LLM inferential load in the critical response path; (3) a significant improvement in edge hardware thermal sustainability, expanding the safety margin before throttling from 5.4 °C to 13.8 °C, extending hardware lifespan and broadening deployable environments; and (4) experimental validation in a real industrial environment at Frumecar S.L. operating heterogeneous OT protocols across all five levels of the automation pyramid.

The article is organized as follows: Section 2 reviews related work; Section 3 describes the architecture and methods; Section 4 presents the experimental results; Section 5 discusses the findings and limitations; and Section 6 summarizes the main conclusions.

2. Related Work

2.1. Local LLMs for Industrial Applications

The deployment of LLMs in edge environments with limited resources has gained significant momentum following advances in model quantization and CPU-optimized inference [6,7]. GGUF quantization at 4-bit precision enables 3–13B parameter models on hardware without dedicated GPU, opening viable paths for industrial edge deployment [8]. Ollama has emerged as a practical framework for local LLM deployment, supporting multiple quantized models including llama3.2 [9]. Recent benchmarks confirm that SBCs such as the Raspberry Pi 5 can sustain quantized LLM workloads using these frameworks, though thermal behavior under sustained inference requires explicit management [20].

Previous work on LLMs in industrial contexts has focused primarily on cloud-based deployments [3] or simulation environments [2]. Xia et al. proposed a framework integrating LLMs as agents for industrial automation control through an event-driven information modeling mechanism that supplies real-time context to the model [2]; however, their validation was conducted in simulation rather than on a live OT network. The architecture presented in [10] represents one of the first validated deployments of a local LLM integrated with OPC UA, MQTT, and REST API in an operating manufacturing plant. More recently, García et al. proposed a hybrid AI and LLM-enabled agent architecture for real-time decision support in industrial batch processes [18]; unlike the present work, their system does not address local edge deployment, thermal sustainability, or event-driven pre-computation. Bihlmaier et al. demonstrated natural language interfaces for industrial HMIs using LLMs and OPC UA on deployed machines [21], but they relied on cloud-hosted proprietary models without addressing edge deployment constraints.

2.2. Event-Driven Architectures in IIoT

Event-driven architectures have been widely applied in IIoT contexts for real-time monitoring and anomaly detection [14,15]. The central principle of EDA is the decoupling of event producers from consumers through asynchronous messaging, distributing computational load over time [12,13]. Node-RED has been validated as an event processing platform for industrial IoT applications with native support for Modbus TCP, OPC UA, and MQTT [15]. n8n extends this paradigm with advanced orchestration capabilities, delta/full update strategies, and native REST API integration [16,17]. A recent IIoT demonstration combining event-driven automation with containerized deployment validated that orchestration overheads are negligible relative to process dynamics [12]. No prior work integrates EDA with a local LLM for conversational industrial supervision in a real OT environment.

2.3. Thermal Management in Edge AI Deployments

Thermal behavior is a critical constraint for continuous LLM inference on SBCs. The Raspberry Pi 5 initiates progressive ARM core frequency reduction at 80 °C and enforces full throttling at 85 °C [11]; sustained operation near these thresholds is a recognized challenge in edge AI deployments. The reactive architecture validated in [10] reported peak temperatures of 79.6 °C, leaving a safety margin of only 5.4 °C under controlled ambient conditions (24 °C). Thermal cycling is one of the primary degradation mechanisms in embedded devices; following the Arrhenius relationship applied to semiconductor degradation, a reduction in peak operating temperature and thermal variability directly extends expected hardware lifespan [19]. Recent benchmarks confirm that models up to 3B parameters can be reliably sustained on Raspberry Pi 5 with active cooling strategies [20].

2.4. Orchestration Platforms for Industrial AI

Low-code orchestration platforms have emerged as enablers for industrial AI integration, reducing the deployment barrier without requiring deep software engineering expertise [22,23]. n8n, an open-source workflow automation platform, enables self-hosted, API-driven, and event-based workflows that interconnect enterprise and OT systems; its event-driven flow model, delta/full update logic, and native REST API support make it particularly suitable for the operational context management described in this work [16,17]. None of the reviewed works combines an event-driven architecture with a local LLM and experimental validation in a real industrial plant with heterogeneous OT protocol integration across all five levels of the automation pyramid—the precise contribution of this work.

3. Materials and Methods

3.1. Architecture Overview

The proposed system incorporates the event-driven paradigm into the five-layer cognitive architecture for industrial supervision with local generative AI on edge devices [10], keeping the operational context continuously up to date. The complete architecture is distributed across two complementary hardware platforms: a Raspberry Pi 5 that hosts the cognitive layers for natural language processing, reasoning, industrial protocol access, and operator feedback; and an x86-64 workstation that runs the orchestration, persistence, and local LLM inference stack via Docker Compose [24]. Both platforms communicate over the plant’s OT network, connected via Ethernet to the automation systems of Frumecar S.L. The complete architecture is presented in Figure 1.

3.1.1. Reference Cognitive Architecture

The multimodal five-layer cognitive architecture [10], whose source code is publicly available at https://github.com/futuryteg/LLM-cognitive-industrial (accessed on 1 May 2026) [25], constitutes the base on which the event-driven paradigm is incorporated. Layer 1 manages the multimodal interface through Spanish speech recognition and response synthesis. Layer 2 hosts the Mistral-7B-Instruct-v0.2 model quantized in GGUF Q4_0 (3.82 GB) on a Raspberry Pi 5, executed via llama-cpp-python with OpenBLAS optimizations for ARM architecture [7,8]. Layer 3 extracts parameters, verifies the validity of the requested control, and determines which automation level must be consulted. Layer 4 centralizes access to the five automation levels of Frumecar S.L. via OPC UA (Levels 2 and 3), MQTT (Level 1), and REST API (Levels 4 and 5). Layer 5 manages operator feedback and maintains interaction traceability in JSON format.
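The level-to-protocol assignment of Layer 4 described above can be summarized as a lookup table. The following is only an illustrative sketch: the names `PROTOCOL_BY_LEVEL` and `protocol_for_level` are hypothetical and are not taken from the published repository.

```python
# Hypothetical lookup mirroring the level-to-protocol assignment in the text:
# Level 1 via MQTT, Levels 2 and 3 via OPC UA, Levels 4 and 5 via REST API.
PROTOCOL_BY_LEVEL = {
    1: "MQTT",
    2: "OPC UA",
    3: "OPC UA",
    4: "REST API",
    5: "REST API",
}


def protocol_for_level(level: int) -> str:
    """Return the protocol used to reach a given automation level (1 to 5)."""
    try:
        return PROTOCOL_BY_LEVEL[level]
    except KeyError:
        raise ValueError(f"unknown automation level: {level}") from None
```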

The selection of Mistral-7B-Instruct-v0.2 as the inference model for the cognitive architecture was not arbitrary. A systematic comparative evaluation of local quantized models on Raspberry Pi 5 hardware (Mistral-7B, Llama-2-7B, Phi-2, and TinyLlama-1.1B) demonstrated that Mistral-7B achieved the best balance between accuracy (90%), response time (≈8.4 s), and thermal stability for industrial edge deployment [26]. Llama-2-7B reached only 60% accuracy with latencies exceeding 14 s, while Phi-2 and TinyLlama-1.1B showed insufficient semantic comprehension for industrial command interpretation in Spanish, with accuracies of 10% and 30%, respectively. Recent benchmarks on single-board computers confirm these findings [20].

3.1.2. Incorporation of the Event-Driven Paradigm into the Control Layer

Layer 4 of the reference architecture operates under a reactive paradigm: it accesses the five automation levels only when the operator formulates a query, with model inference representing 55.3% of total latency, thermal peaks of up to 79.6 °C, and a safety margin of only 5.4 °C before the throttling threshold [10]. This work incorporates the event-driven paradigm into this layer without modifying any component of the original implementation, adding an external stack that continuously monitors the process, detects significant changes, and maintains the operational context permanently updated.

To enable this integration, the event-driven stack repository includes a REST module deployable on the Raspberry Pi that exposes the aggregated state of the five automation levels via an HTTP GET request to the /estado-planta endpoint, returning in a single JSON object the current state of all levels without duplicating the industrial protocol access logic already implemented [25,27]. Node-RED detects real-time events on the industrial process via Modbus TCP and via periodic queries to the /estado-planta endpoint [15], identifying six types of significant change: process variable deviation greater than ±5%, mode transition, setpoint change, alarm activation, alarm recovery, and communication failure. Emergency detection operates with maximum priority, dispatching a webhook to n8n and a direct alert to the Telegram alerts channel in less than 1 s.
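The six change types listed above can be illustrated with a small comparison between two consecutive plant snapshots. This is a minimal sketch in Python rather than the Node-RED flow itself. The `Snapshot` fields, the event labels, and the choice of measuring the ±5% deviation against the previous reading are assumptions, since the text does not state the reference value.

```python
from dataclasses import dataclass
from typing import List

DEVIATION_THRESHOLD = 0.05  # +/-5 %, as stated in the text


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical reduced plant state; field names are illustrative."""
    value: float      # process variable, e.g. tank level
    setpoint: float
    mode: str         # assumed labels such as "AUTO" / "MANUAL"
    alarm: bool
    comm_ok: bool


def detect_events(prev: Snapshot, curr: Snapshot) -> List[str]:
    """Return the significant-change types observed between two snapshots."""
    if not curr.comm_ok:
        # Without communication the remaining fields cannot be trusted.
        return ["communication_failure"] if prev.comm_ok else []

    events: List[str] = []
    if prev.value != 0:
        deviates = abs(curr.value - prev.value) / abs(prev.value) > DEVIATION_THRESHOLD
    else:
        deviates = curr.value != 0
    if deviates:
        events.append("process_deviation")
    if curr.mode != prev.mode:
        events.append("mode_transition")
    if curr.setpoint != prev.setpoint:
        events.append("setpoint_change")
    if curr.alarm and not prev.alarm:
        events.append("alarm_activation")
    if prev.alarm and not curr.alarm:
        events.append("alarm_recovery")
    return events
```

Returning early on a communication failure is a design choice of this sketch: it avoids reporting spurious deviations computed from stale or default values.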

n8n acts as an event-driven orchestrator through two workflows [16,17]. The primary workflow operates in 20 s cycles, persisting readings in PostgreSQL and managing the alarm state machine with duplicate event filtering. Upon each event signal, it executes the delta/full decision logic, evaluating the type and magnitude of the event to determine whether an incremental (delta) context update is sufficient; independently of events, a full regeneration is performed every 30 min. When multiple significant events occur within a short time interval, the duplicate event filtering in the alarm state machine ensures that only the first event of each type is processed within a given cycle; subsequent identical events are discarded. The full regeneration cycle every 30 min acts as a global reconciliation mechanism, guaranteeing operational context coherence regardless of intermediate events. PostgreSQL persists the operational history in three tables: historial_proceso (continuous process variable readings at 20 s intervals), historial_alarmas (events with type differentiation and duplicate prevention), and fallos_comunicacion (failure events with start and recovery timestamps). Flask acts as a microservice for generating PNG charts of process variable history, delivered as attachments in email reports.
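The delta/full decision and the per-cycle duplicate filtering can be sketched as a small state holder. This is a simplified illustration and not the n8n workflow: the class and method names are hypothetical, and the sketch decides only on event type and elapsed time, whereas the text states that event magnitude is also evaluated.

```python
from typing import Optional, Set

FULL_REGEN_INTERVAL_S = 30 * 60  # full regeneration every 30 min (from the text)


class ContextUpdater:
    """Hypothetical helper deciding between 'full', 'delta', and 'skip'."""

    def __init__(self) -> None:
        self.last_full_ts: Optional[float] = None
        self.seen_in_cycle: Set[str] = set()

    def start_cycle(self) -> None:
        """Reset duplicate filtering at the start of each 20 s cycle."""
        self.seen_in_cycle.clear()

    def decide(self, event_type: Optional[str], now: float) -> str:
        """Return the update strategy for an event (or None for a timer tick)."""
        if event_type is not None and event_type in self.seen_in_cycle:
            return "skip"  # only the first event of each type per cycle counts
        if event_type is not None:
            self.seen_in_cycle.add(event_type)
        full_due = (
            self.last_full_ts is None
            or now - self.last_full_ts >= FULL_REGEN_INTERVAL_S
        )
        if full_due:
            self.last_full_ts = now
            return "full"
        return "delta" if event_type is not None else "skip"
```

Timestamps are passed in explicitly (in seconds) so that the logic can be exercised without waiting for real time to elapse.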

Ollama executes llama3.2 (3B parameters, approximately 3.0 GB) locally on CPU, synthesizing in natural language the complete operational context from the PostgreSQL history and the current state obtained from the /estado-planta endpoint [9]. The llama3.2 3B model was selected for its optimized architecture for edge and on-device deployment, outperforming Gemma 2 2.6B and Phi 3.5-mini on summarization and instruction-following tasks [28]. Explicit anti-hallucination restrictions are applied in the prompt: a maximum of 8 lines, prohibition of speculation on absent data, independent counters for each alarm type, and a strictly verified data block. The result is the Industry Report, a pre-built and permanently available context that the reference cognitive architecture consumes to respond to the operator, significantly reducing the inferential load compared to the reactive paradigm.
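The prompt restrictions described above could be assembled along the following lines. The actual prompt wording is not given in the article, so the rule texts, the function names, and the example field names (`nivel_pct`, `nivel_alto`) are assumptions. The payload shape follows the `model`, `prompt`, and `stream` fields of Ollama's generate endpoint; sending the request is left out so that the sketch stays self-contained.

```python
import json

MAX_LINES = 8  # line limit stated in the text


def build_report_prompt(current_state: dict, alarm_counts: dict) -> str:
    """Assemble a constrained prompt around a verified data block."""
    data_block = json.dumps(
        {"current_state": current_state, "alarm_counts": alarm_counts},
        ensure_ascii=False,
        indent=2,
        sort_keys=True,
    )
    rules = [
        f"Answer in at most {MAX_LINES} lines.",
        "Use only the values in the DATA block; do not speculate about missing data.",
        "Report each alarm type with its own counter.",
    ]
    lines = ["You summarize the operational state of an industrial process.", "RULES:"]
    lines += [f"- {rule}" for rule in rules]
    lines += ["DATA:", data_block]
    return "\n".join(lines)


def build_generate_payload(prompt: str, model: str = "llama3.2") -> dict:
    """Build a non-streaming request body for a local Ollama instance."""
    return {"model": model, "prompt": prompt, "stream": False}
```

Keeping the data block as serialized JSON makes it straightforward to check afterwards that every figure in the generated report appears in the input.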

As a complement to the voice interface, the event-driven stack deploys two independent Telegram bots for supervision and control of the mixing water tank—a critical variable in precast concrete production governing the water–cement ratio: the Control Tank bot, which accepts direct operational commands (/on, /off, /sp:XX, /estado, /grafica, /informe), and the Alerts Tank bot, a read-only channel for automatic critical event notifications generated by n8n.
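The command set of the Control Tank bot lends itself to a small parser. The sketch below handles only the message text and does not use any Telegram library. The function name, the return convention, and the acceptance of decimal setpoints are assumptions, and no setpoint range check is included because the valid range is not stated.

```python
import re
from typing import Optional, Tuple

SIMPLE_COMMANDS = {"/on", "/off", "/estado", "/grafica", "/informe"}
SETPOINT_RE = re.compile(r"^/sp:(\d+(?:\.\d+)?)$")


def parse_command(text: str) -> Optional[Tuple[str, Optional[float]]]:
    """Map a bot message to (command, argument), or None if unrecognized."""
    text = text.strip().lower()
    if text in SIMPLE_COMMANDS:
        return (text[1:], None)
    match = SETPOINT_RE.match(text)
    if match:
        return ("sp", float(match.group(1)))
    return None
```

Rejecting anything outside the known command set keeps free-form text from reaching the control path.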

3.1.3. Hardware Platforms

The integrated architecture is deployed on two complementary hardware platforms connected via Ethernet to the OT network of Frumecar S.L. The reference cognitive architecture runs on a Raspberry Pi 5 with a quad-core ARM Cortex-A76 processor at 2.4 GHz and 8 GB random access memory (RAM), running 64-bit Raspberry Pi OS with Python 3.11.2 in an isolated virtual environment. During nominal operation, RAM usage is maintained at 5.2 GB average with a 5.8 GB peak, preserving a 27.5% operational margin [10]. The event-driven stack is deployed on an x86-64 workstation (Intel Core i7-8700, 16 GB RAM, Ubuntu 22.04 LTS, Docker 24.0.5) via Docker Compose, without specialized hardware requirements or GPU acceleration, with Node-RED, n8n, PostgreSQL, Flask, and Ollama/llama3.2 running as independent containers on the same Docker network [24].

3.2. Experimental Methodology

The experimental methodology follows the validation protocol established in previous work [10], enabling direct comparison between the reactive and event-driven paradigms. All experiments were conducted under controlled ambient conditions at 24 ± 1 °C at the Frumecar S.L. facilities. Results were automatically recorded in CSV format and processed with NumPy 1.26.4 [29] for calculation of descriptive statistics (mean, standard deviation, and coefficient of variation). CPU temperature was monitored using the psutil v7.1.0 library, which accesses the System-on-Chip (SoC) thermal sensors at regular intervals [10]. Measurements were recorded every 10 s throughout the entire experimental session and automatically stored in CSV format with a timestamp.
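The descriptive statistics named above can be computed as in the following sketch. It uses the Python standard library instead of NumPy to stay dependency-free, and it uses the sample standard deviation (n − 1 in the denominator); the article does not state whether the sample or population form was used, so that choice is an assumption.

```python
import statistics
from typing import Dict, Sequence


def describe(samples: Sequence[float]) -> Dict[str, float]:
    """Return mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)  # sample SD; requires at least two values
    return {"mean": mean, "sd": sd, "cv": sd / mean}
```

The coefficient of variation is the ratio SD/mean, which is how the values quoted in Section 4.1.2 relate to the reported means and standard deviations.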

3.2.1. Response Time Characterization

System performance was evaluated using 15 representative questions classified into three complexity levels: simple queries (n = 5), involving direct reading of a single process variable; moderate queries (n = 5), requiring simultaneous access to multiple variables or simple calculations; and complex queries (n = 5), involving multilevel reasoning with data from all five automation levels [10]. Each query was executed 3 times. Total response time was recorded from the detection of the end of the voice command to the onset of response synthesis.

3.2.2. Communication Protocol Latency

Industrial communication performance was characterized through automated latency measurements on Modbus TCP read operations. A total of 10,163 continuous read operations were executed on the mixing water tank level control process of Frumecar S.L. [30], recording the time from request to complete data reception to characterize communication stack behavior under real industrial network conditions.
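A latency measurement of this kind can be sketched as a timing loop around a read operation. The measurement code used in the study is not shown in the article, so the function below is only an assumption-based illustration; `read_fn` is a hypothetical callable that would wrap a single Modbus TCP read.

```python
import time
from typing import Callable, List


def measure_latencies_ms(read_fn: Callable[[], object], n_reads: int) -> List[float]:
    """Time n_reads consecutive calls to read_fn and return latencies in ms."""
    latencies: List[float] = []
    for _ in range(n_reads):
        start = time.perf_counter()
        read_fn()  # request plus complete data reception
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies
```

`time.perf_counter` is used because it is monotonic and has the highest available resolution, which matters for millisecond-scale measurements.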

3.2.3. Reliability Test

Operational robustness was evaluated through a 30 min autonomous operation test without manual intervention. A partial failure is defined as a query that generated a timeout or exception subsequently recovered within 30 s without system restart. Random queries were generated following a uniform temporal distribution. Metrics recorded were: success rate (percentage of queries with a valid response), failure rate (percentage with timeout or uncontrolled exception), and availability (percentage of time with the system operational without automatic restarts).
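The three metrics defined above reduce to simple ratios. The following sketch uses a hypothetical function name and argument layout; the example values in the usage note are those reported in Section 4.4.

```python
from typing import Dict


def reliability_metrics(
    n_valid: int, n_failed: int, uptime_s: float, total_s: float
) -> Dict[str, float]:
    """Compute success rate, failure rate, and availability as percentages."""
    n_total = n_valid + n_failed
    if n_total == 0 or total_s <= 0:
        raise ValueError("at least one query and a positive duration are required")
    return {
        "success_rate_pct": 100.0 * n_valid / n_total,
        "failure_rate_pct": 100.0 * n_failed / n_total,
        "availability_pct": 100.0 * uptime_s / total_s,
    }
```

For the reported test, `reliability_metrics(41, 0, 1800.0, 1800.0)` corresponds to 41 valid responses, no failures, and 30 min of uninterrupted operation.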

3.2.4. Case Study: Production Feasibility Verification at Frumecar S.L.

The operational impact was evaluated for the request documented in previous work [10]: production of 45 m³ of structural concrete C25/30 for delivery the following day at 07:00. The verification required contrasting plant availability, raw material inventory, scheduled production orders, and specific admixture stock, accessing all five automation levels. Three methods were compared: traditional manual navigation, the reactive paradigm, and the event-driven paradigm proposed in this work, employing the same evaluation metrics: total response time, systems consulted, physical operator displacement, supervision interruption time, and qualitative risk of human error.

4. Results

4.1. Experiment 1: Response Time Characterization

4.1.1. Quantitative Results

Table 1 presents the response times obtained per complexity level, compared with values recorded under the reactive paradigm.

4.1.2. Statistical Analysis

Simple queries recorded a mean response time of 9.84 s (SD = 4.12 s), compared with 14.19 s under the reactive paradigm, representing an improvement of 4.35 s (30.6%). The reduction in variability is equally significant: the coefficient of variation decreases from 0.53 to 0.42, indicating greater response consistency when pre-built context is available.

Moderate queries recorded a mean response time of 11.23 s (SD = 3.85 s), compared with 16.45 s, with an improvement of 5.22 s (31.7%). The reduction in the interquartile range (from 8.11 s to 5.70 s) confirms a more concentrated and predictable time distribution.

Complex queries recorded the greatest absolute improvement, with a mean response time of 16.47 s (SD = 4.23 s) compared with 23.24 s, representing a reduction of 6.77 s (29.1%). This improvement is especially relevant because in the reactive paradigm complex queries required sequential access to multiple automation levels at the moment of interaction; in the event-driven paradigm, this access was performed proactively and the model processes an already-structured, available context.

The response time improvement directly reflects the effect of pre-built context on the inferential load of Mistral-7B. The thermal behavior of the system, described in the following section, yields the finding with the greatest operational impact of the event-driven architecture.

To further characterize the practical significance of the observed improvements, effect sizes were calculated using Cohen’s d for each complexity level, comparing the event-driven paradigm against the reactive baseline reported in [10]. Results indicate a medium-large effect for simple queries (d = 0.71), a large effect for moderate queries (d = 0.99), and a very large effect for complex queries (d = 1.22), following standard interpretation thresholds (small: d > 0.2; medium: d > 0.5; large: d > 0.8). The progressive increase in effect size with query complexity confirms that the event-driven paradigm provides the greatest practical benefit precisely in the scenarios where the reactive paradigm was most constrained, namely complex multilevel queries requiring simultaneous access to heterogeneous automation levels. Figure 2 illustrates the response time comparison between both paradigms across the three complexity levels.
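The effect-size calculation can be illustrated for the simple-query level. The article does not state which pooled-variance formula was used, so the equal-sample-size pooled standard deviation below is an assumption. The reactive standard deviation is likewise not given directly and is derived here from the reported coefficient of variation (0.53) and mean (14.19 s).

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d using a pooled SD that assumes equal sample sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# Simple queries: reactive SD reconstructed as CV x mean (an assumption).
reactive_sd_simple = 0.53 * 14.19
d_simple = cohens_d(14.19, reactive_sd_simple, 9.84, 4.12)
print(f"Cohen's d (simple queries): {d_simple:.2f}")
```

Under these assumptions the result lands close to the reported value for simple queries; the moderate and complex levels cannot be reproduced the same way because the reactive variability for those levels is not given in standard-deviation form.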

4.2. Experiment 2: Thermal Behavior Under Sustained Operation

4.2.1. Thermal Results

The system operated in steady-state regime from the onset of evaluation, with a mean temperature of 62.1 °C (SD = 3.52 °C) and an operating range between 56.3 °C and 71.2 °C. The absence of pronounced thermal peaks during complex queries, a characteristic of the reactive paradigm, confirms the effect of temporal decoupling of computational loads. Table 2 presents the thermal comparison between paradigms.

4.2.2. Interpretation

The 8.4 °C reduction in peak temperature and the expansion of the thermal safety margin from 5.4 °C to 13.8 °C constitute the finding of the greatest operational impact of this work. In the reactive paradigm, thermal peaks of up to 79.6 °C coincided with complex multilevel queries requiring simultaneous Mistral-7B inference with OPC UA, MQTT, and REST API protocol access; in the event-driven paradigm, these protocol queries are performed in a time-distributed manner during Industry Report construction, decoupling the inference load from the data acquisition load and eliminating the associated thermal peaks.

The 41.6% reduction in thermal variability (SD from 6.03 °C to 3.52 °C) is equally significant: a more stable operating temperature implies less thermal cycling of the processor, which, following the Arrhenius relationship applied to semiconductor degradation, translates into a significant extension of expected hardware lifespan [19]. In continuous industrial deployments where the system operates for months or years without hardware replacement, this improvement has a direct impact on total cost of ownership.
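The thermal figures quoted in this section are mutually consistent when the safety margin is measured against the 85 °C hard-throttling limit, which is an inference from the reported numbers (85 − 79.6 = 5.4). A brief sketch of the arithmetic, with a hypothetical function name:

```python
from typing import Dict

HARD_THROTTLE_C = 85.0  # hard-throttling limit of the Raspberry Pi 5 [11]


def thermal_summary(
    peak_reactive_c: float,
    peak_event_c: float,
    sd_reactive_c: float,
    sd_event_c: float,
) -> Dict[str, float]:
    """Derive margins, peak reduction, and variability reduction."""
    return {
        "margin_reactive_c": HARD_THROTTLE_C - peak_reactive_c,
        "margin_event_c": HARD_THROTTLE_C - peak_event_c,
        "peak_reduction_c": peak_reactive_c - peak_event_c,
        "sd_reduction_pct": 100.0 * (sd_reactive_c - sd_event_c) / sd_reactive_c,
    }


summary = thermal_summary(79.6, 71.2, 6.03, 3.52)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```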

The expanded 13.8 °C safety margin before throttling also has direct implications for industrial deployment: while the reactive paradigm required ambient temperatures below 25 °C to avoid sporadic throttling, the event-driven architecture enables reliable operation in environments up to 33–35 °C, covering typical summer conditions in Mediterranean industrial plants such as those in the Murcia region [11]. Figure 3 presents the CPU thermal behavior comparison between both paradigms during the 30 min sustained operation session.

4.3. Experiment 3: Communication Protocol Latency

Table 3 summarizes the Modbus TCP latency results over 10,163 continuous read operations.

The mean latency of 9.14 ms is consistent with the 8.93 ms recorded under the reactive paradigm [10], confirming that incorporating the event-driven stack does not introduce additional overhead in industrial communication. To quantify the relative contribution of communication latency to total response time, the ratio was calculated for each complexity level: 9.14 ms/9840 ms = 0.093% for simple queries, 9.14 ms/11,230 ms = 0.081% for moderate queries, and 9.14 ms/16,470 ms = 0.055% for complex queries. In all cases, communication latency represents less than 0.1% of total operator response time, confirming that the system bottleneck remains LLM inference and not industrial data acquisition. This finding validates the architectural decision to maintain enterprise data in their native systems and access them on demand via heterogeneous protocols, without local replication on the edge device. Figure 4 presents the complete latency breakdown by component and the proportion of communication latency relative to total response time.
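The share of communication latency in total response time is a unit conversion followed by a ratio. The function name below is illustrative; the inputs are the mean latency and mean response times reported above.

```python
def comm_share_pct(comm_latency_ms: float, response_time_s: float) -> float:
    """Percentage of total response time spent on one Modbus TCP read."""
    return 100.0 * comm_latency_ms / (response_time_s * 1000.0)


for level, total_s in {"simple": 9.84, "moderate": 11.23, "complex": 16.47}.items():
    print(f"{level}: {comm_share_pct(9.14, total_s):.3f}%")
```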

4.4. Experiment 4: Reliability During Autonomous Operation

Table 4 summarizes the operational reliability metrics during the 30 min test.

The system successfully completed all 41 queries, achieving a 100% success rate with no partial or critical failures throughout the evaluation period. The 100% availability confirms the operational robustness of the integrated architecture under continuous operation conditions. The mean temperature of 62.1 °C and peak of 71.2 °C are consistent with the results of Experiment 2, confirming the absence of cumulative thermal degradation during sustained operation.

4.5. Case Study: Operational Impact at Frumecar S.L.

4.5.1. Operational Scenario

Replicating the case study documented in previous work [10], an operator at Frumecar S.L. with 18 months of experience received the following request from the commercial department during the concrete curing process supervision: production of 45 m³ of structural concrete C25/30 for delivery the following day at 07:00. The verification required contrasting plant availability, raw material inventory, scheduled production orders, and C25/30 admixture stock, accessing all five automation levels.

4.5.2. Method Comparison

Table 5 presents the comparison between the three evaluated methods.

4.5.3. Latency Breakdown Analysis

Table 6 presents the temporal breakdown of the production feasibility query under the event-driven paradigm, compared with the reactive paradigm. These values correspond to a single representative execution; statistical data for each stage are presented in Table 1.

The elimination of the protocol data retrieval stage contributes marginally to the total improvement. The most significant reduction comes from the lower Mistral-7B inference load when processing already-structured context (5.8 s vs. 7.2 s) and from the reduction of auxiliary processes (0.5 s vs. 3.2 s). The operator maintained curing process supervision without interruption throughout the entire interaction.

4.5.4. Qualitative Impact

Beyond the quantitative metrics, the case study shows qualitative operational benefits. Because the operator stays at the post during the query, the 16–24 min risk window of the traditional method, during which the curing process remained without direct supervision, is eliminated. The reduced cognitive load of navigating heterogeneous systems further lowers the risk of human error when integrating information from multiple sources. The pre-built Industry Report ensures that the information delivered to the operator is coherent and verified, since it is grounded in a context systematically constructed from data validated in PostgreSQL.

5. Discussion

5.1. Interpretation of Main Findings

The results confirm that incorporating the event-driven paradigm into the reference cognitive architecture produces measurable and consistent improvements across the three evaluated dimensions: response times, thermal behavior, and operational reliability. The 26–31% improvement in response times directly reflects the effect of pre-built context on the inferential load of Mistral-7B: when the Industry Report is available at query time, the model synthesizes a response over already-structured information rather than traversing heterogeneous protocols in the critical path.

These response times (9.84–16.47 s) are consistent with the operational requirements of industrial supervision and compare favorably with cloud-hosted LLM approaches, which face structural latency constraints of 200–800 ms at the network layer alone, as well as dependency on permanent connectivity and exposure of sensitive operational data to external infrastructures [4,5]. The fully local, GPU-free, cloud-independent architecture achieves 100% availability, positioning it as an operationally viable alternative for manufacturing plants where cloud deployment is not technically or economically justifiable [31].

When compared with the closest related work, the present system shows complementary and differentiated contributions. Xia et al. proposed LLM agents for industrial automation with event-driven information modeling, but they were validated in simulation rather than on a live OT network [2]. García et al. demonstrated hybrid AI and LLM-based real-time decision support in an industrial batch process but relied on cloud-hosted models [18]. Bihlmaier et al. achieved high command accuracy on a real deployed machine using LLMs and OPC UA but with cloud-hosted proprietary models [21]. The present work is, to the authors’ knowledge, the first to combine a local LLM, an event-driven context pre-computation layer, and experimental validation on a real OT network with heterogeneous protocol integration across all five automation levels, achieving sub-17 s response times without GPU or cloud dependency.

5.2. Implications for Edge Hardware Lifespan and Total Cost of Ownership

The thermal improvement represents the finding of greatest operational impact. In the reactive paradigm, the temporal coincidence between Mistral-7B inference and simultaneous access to OPC UA, MQTT, and REST API protocols generated thermal peaks of up to 79.6 °C, leaving a safety margin of only 5.4 °C before the Raspberry Pi 5 throttling threshold [10,11]. The event-driven paradigm structurally eliminates this coincidence by distributing data acquisition across the Industry Report construction cycle, so that at query time the model operates on pre-structured context with significantly lower instantaneous computational load.

The resulting 41.6% reduction in thermal variability (SD from 6.03 °C to 3.52 °C) has direct implications for hardware durability. Following the Arrhenius relationship, a reduction in both peak temperature and thermal variability translates into a significant extension of expected hardware lifespan [19]. In continuous industrial deployments where hardware operates for months or years without replacement, this improvement has a direct impact on total cost of ownership. The expanded 13.8 °C safety margin enables reliable operation in environments up to 33–35 °C, covering typical summer conditions in Mediterranean industrial plants such as those in the Murcia region [11].
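A rough sense of scale for the Arrhenius argument can be sketched as follows. Two inputs are assumptions rather than values from this work: the activation energy (0.7 eV, a commonly quoted illustrative figure for semiconductor failure mechanisms) and the 85 °C throttling threshold, which is inferred here from the reported peak temperatures and safety margins. The calculation also uses mean temperatures only, so it ignores thermal cycling and should be read as an order-of-magnitude illustration, not a lifetime prediction.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K
ASSUMED_ACTIVATION_ENERGY_EV = 0.7   # ASSUMPTION: illustrative, not from the paper
THROTTLE_THRESHOLD_C = 85.0          # ASSUMPTION: inferred from 79.6 °C + 5.4 °C


def to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15


def arrhenius_acceleration(
    hot_c: float,
    cool_c: float,
    activation_energy_ev: float = ASSUMED_ACTIVATION_ENERGY_EV,
) -> float:
    """Ratio of Arrhenius reaction rates at hot_c relative to cool_c."""
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / to_kelvin(cool_c) - 1.0 / to_kelvin(hot_c)
    )
    return math.exp(exponent)


def safety_margin(peak_c: float, threshold_c: float = THROTTLE_THRESHOLD_C) -> float:
    """Distance in °C between the peak temperature and the throttling threshold."""
    return threshold_c - peak_c


# Mean temperatures of the reactive and event-driven paradigms (Table 2).
af_mean = arrhenius_acceleration(hot_c=69.3, cool_c=62.1)
margin_reactive = safety_margin(79.6)
margin_event_driven = safety_margin(71.2)

print(f"acceleration factor (reactive vs. event-driven): {af_mean:.2f}")
print(f"safety margins: {margin_reactive:.1f} °C -> {margin_event_driven:.1f} °C")
```

Under these assumptions the acceleration factor falls between 1.6 and 1.7, meaning temperature-driven wear would proceed noticeably faster at the reactive mean temperature. A different activation energy changes this figure substantially.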

5.3. Value of the Event-Driven Paradigm vs. the Reactive Paradigm

The comparison between paradigms reveals that the benefit of the event-driven approach does not reside solely in individual metric improvements, but in a qualitative change in the system’s operating model. In the reactive paradigm, the system remains idle between queries and concentrates its entire computational load at the moment of interaction. In the event-driven paradigm, the system works proactively and continuously, distributing the computational load over time and reserving for query time only the final synthesis over an already-constructed context.

This operating model is conceptually related to Retrieval-Augmented Generation (RAG) architectures [32], in which external knowledge is pre-indexed and made available for retrieval at inference time. The Industry Report functions as a domain-specific, event-updated operational context that serves as the retrieval substrate for the LLM—a proactive, event-triggered RAG layer rather than a static document index. Unlike conventional RAG approaches that retrieve from a static knowledge base, the Industry Report is dynamically maintained through the event-driven pipeline, ensuring that the context provided to the LLM reflects the actual plant state at query time.
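The idea of an event-updated context that is simply read at query time can be illustrated with a small in-memory sketch. It is not the authors' implementation (which uses Node-RED, n8n, PostgreSQL, and Flask); the class, method, and variable names below are hypothetical, and the LLM call itself is omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class IndustryReport:
    """Hypothetical in-memory stand-in for the pre-built operational context."""

    values: Dict[str, float] = field(default_factory=dict)
    updated_at: Optional[datetime] = None

    def apply_event(self, variable: str, value: float) -> None:
        """Update the context when the event pipeline reports a significant change."""
        self.values[variable] = value
        self.updated_at = datetime.now(timezone.utc)

    def as_context(self) -> str:
        """Render the current state as plain text for inclusion in a prompt."""
        return "\n".join(
            f"- {name}: {value}" for name, value in sorted(self.values.items())
        )


def build_prompt(report: IndustryReport, question: str) -> str:
    """Combine the pre-built context with the operator question (no I/O here)."""
    return (
        "Operational context (pre-built):\n"
        f"{report.as_context()}\n\n"
        f"Operator question: {question}\n"
    )


# Usage: events arrive over time; the query only reads the cached context.
report = IndustryReport()
report.apply_event("cement_silo_level_pct", 62.0)  # hypothetical variable name
report.apply_event("mixer_available", 1.0)         # hypothetical variable name
prompt = build_prompt(report, "Can we produce 45 m3 of C25/30 for tomorrow?")
print(prompt)
```

The point of the design is that `build_prompt` performs no protocol access: all acquisition cost is paid earlier, in `apply_event`, outside the critical path of the query.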

This change of model has three practical consequences. First, the operator perceives lower response times because most of the work is already done. Second, the hardware operates under more favorable and stable thermal conditions because load peaks are distributed over time. Third, the system is inherently more robust to simultaneous or high-complexity queries, because the available Industry Report acts as an operational cache that absorbs inferential load variability.

5.4. Complementarity with the Reactive Paradigm

The results do not suggest that the event-driven paradigm should replace the reactive paradigm in all contexts; both are complementary and appropriate for different scenarios. The reactive paradigm is optimal for environments with very limited hardware resources, infrequent queries, or processes with low state variability where the cost of maintaining an updated context is not justified. The event-driven paradigm is preferable when the industrial process exhibits frequent variability, when thermal conditions are demanding, or when response latency and operator operational continuity are critical. This complementarity is consistent with findings in the broader IIoT literature, where event-driven and polling-based approaches serve different operational requirements [12,13]. The architecture demonstrates that both paradigms can coexist on the same hardware infrastructure, with the event-driven paradigm acting as a non-intrusive augmentation layer over the reference cognitive architecture.

5.5. Limitations

Several limitations must be acknowledged. The system was validated in a single installation in the precast concrete sector; generalization to other industrial sectors requires additional validation. The evaluation protocol (15 representative queries classified into three complexity levels, n = 5 per level, 3 executions each, following the validated methodology of the reference cognitive architecture) enables direct comparison between paradigms under controlled conditions but does not cover the full diversity of operational scenarios across different industrial domains.

The event detection thresholds (±5% for process variables) were defined empirically for the precast concrete production process at Frumecar S.L. For deployment in other industrial contexts, threshold calibration is best performed by the process engineer responsible for the installation, who has direct knowledge of the normal variability of each variable and can determine what magnitude of change constitutes an operationally significant event. This criterion is consistent with standard practice in statistical process control. The architecture facilitates this adaptation since the detection thresholds are configurable parameters in the n8n workflow, requiring no modification of the underlying codebase. As a starting reference for processes without historical data, the ±5% value used in this work can serve as an initial baseline, subject to adjustment based on real operational experience [14].

The delta/full update logic in n8n increases in complexity with the number of monitored variables, requiring formal testing methodologies for large-scale deployments [16,17]. Finally, while the architecture implements semantic interpretation and partial pragmatic reasoning through the five-layer cognitive structure, full pragmatic competence, including deep operator intent modeling, organizational context awareness, and theory-of-mind reasoning, lies beyond the current scope. Addressing this limitation represents a relevant direction toward next-generation manufacturing systems, where neurosymbolic architectures and explicit intent models could extend the system’s cognitive depth beyond the current operational envelope.
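The ±5% detection threshold discussed above can be illustrated with a minimal relative-change detector. This is a sketch under assumptions, not the n8n workflow used in this work: the class name, the choice of a strict comparison, the handling of the first sample, and the policy of moving the reference value only when an event fires are all illustrative decisions.

```python
from typing import Dict


def relative_change(reference: float, current: float) -> float:
    """Absolute relative change of current with respect to reference."""
    if reference == 0:
        # Any departure from a zero reference is treated as unbounded change.
        return 0.0 if current == 0 else float("inf")
    return abs(current - reference) / abs(reference)


class ThresholdEventDetector:
    """Hypothetical detector that flags changes larger than a relative threshold."""

    def __init__(self, threshold: float = 0.05) -> None:
        self.threshold = threshold
        self._reference: Dict[str, float] = {}

    def observe(self, variable: str, value: float) -> bool:
        """Return True when the sample should trigger a context update."""
        reference = self._reference.get(variable)
        if reference is None:
            # First sample: initialise the context for this variable.
            self._reference[variable] = value
            return True
        if relative_change(reference, value) > self.threshold:
            # Significant change: move the reference to the new value.
            self._reference[variable] = value
            return True
        return False


detector = ThresholdEventDetector(threshold=0.05)
readings = [100.0, 104.0, 106.0, 108.0]  # hypothetical process values
events = [detector.observe("curing_temp", reading) for reading in readings]
print(events)
```

Keeping the reference fixed between events means that slow drift accumulates until it crosses the threshold, rather than being hidden by small sample-to-sample steps. Per-variable thresholds, as suggested for other installations, would only require passing a different `threshold` for each variable.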

5.6. Future Work

The results obtained open several development directions to extend the scope of the system. As a natural continuation, the operational period at Frumecar S.L. will be extended under continuous production conditions to characterize long-term system behavior. Extension of the event-driven paradigm to all variables across the five automation levels would broaden the scope of the Industry Report and further enhance response time improvements for complex multilevel queries. Integration of adaptive learning mechanisms for automatic calibration of event-detection thresholds would reduce dependence on process-specific manual configuration.

Regarding hardware, the current validation was conducted on a PC plus Raspberry Pi 5 (8 GB RAM) configuration. As future work, an existing installation could add a second Raspberry Pi 5 to host the event-driven stack, while a new deployment could use two units with 8 or 16 GB of RAM, depending on project scope.

Finally, a formal comparison of the Industry Report response accuracy against static RAG approaches for industrial knowledge bases represents a valuable avenue for validation [32].

6. Conclusions

This work presents the incorporation of the event-driven paradigm into a five-layer industrial cognitive architecture for supervision with local LLMs on edge devices, demonstrating that the temporal decoupling of industrial data acquisition from model inference produces measurable and consistent improvements in response times, thermal behavior, and hardware operational sustainability. The architecture operates entirely locally without cloud dependency or GPU acceleration, extending the validated results of the reference reactive system to a new operational envelope that is broader, thermally safer, and qualitatively more robust.

Three main experimental contributions are established. First, a 26–31% reduction in response times—reaching mean times of 9.84 s, 11.23 s, and 16.47 s for simple, moderate, and complex queries, respectively—demonstrates that the pre-built Industry Report allows Mistral-7B to synthesize responses over already-structured information, reducing the inferential load at query time. This positions the system favorably compared to cloud-based approaches that face structural network latency constraints before any inference cost.

Second, the reduction of 8.4 °C in peak temperature (from 79.6 °C to 71.2 °C), the 41.6% decrease in thermal variability, and the expansion of the safety margin before CPU throttling from 5.4 °C to 13.8 °C constitute the finding of greatest operational impact. This thermal improvement extends expected hardware lifespan through reduced thermal cycling, broadens the range of deployable industrial environments to ambient temperatures up to 33–35 °C, and directly reduces total cost of ownership in continuous industrial deployments.

Third, the 100% success rate and availability over 30 min of autonomous operation confirm that incorporating the event-driven stack does not compromise the operational robustness of the reference architecture. The non-intrusive augmentation model—in which no component of the original implementation is modified and the pre-built Industry Report integrates transparently as an immediate context for the LLM—facilitates incremental adoption in existing industrial deployments without requiring system replacement or reconfiguration.

The principal limitation of this work is the validation scope: a single installation in the precast concrete sector with a small statistical sample and a 30 min autonomous operation window.

In summary, the convergence of the event-driven paradigm, low-code orchestration, and local LLM inference on edge hardware represents a concrete step toward proactive, thermally sustainable, and operationally robust industrial supervision systems, aligned with the human-centric and cognitive democratization objectives of Industry 5.0, enabling operators to remain continuously focused on their primary supervision tasks while the system proactively maintains up-to-date operational knowledge.

Author Contributions

Conceptualization, F.H.-C. and A.G.-G.; methodology, F.H.-C. and A.P.-M.; software, F.H.-C. and A.P.-M.; validation, F.H.-C., A.P.-M. and F.L.-A.; formal analysis, F.H.-C. and F.G.-C.; investigation, F.H.-C., A.P.-M. and A.G.-G.; resources, A.G.-G. and F.G.-C.; data curation, F.H.-C. and A.P.-M.; writing—original draft preparation, F.H.-C. and A.P.-M.; writing—review and editing, F.H.-C., A.G.-G., F.G.-C. and F.L.-A.; visualization, F.H.-C.; supervision, A.G.-G. and F.L.-A.; project administration, A.G.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The source code of the reference cognitive architecture is publicly available at https://github.com/futuryteg/LLM-cognitive-industrial (accessed on 1 May 2026). The source code of the event-driven stack is available at https://github.com/antoniojpmarin/Industrial-Digital-Assistant (accessed on 1 May 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

LLM	Large Language Model
IIoT	Industrial Internet of Things
EDA	Event-Driven Architecture
HMI	Human–Machine Interface
PLC	Programmable Logic Controller
SCADA	Supervisory Control and Data Acquisition
MES	Manufacturing Execution System
ERP	Enterprise Resource Planning
OPC UA	OPC Unified Architecture
MQTT	Message Queuing Telemetry Transport
REST API	Representational State Transfer Application Programming Interface
CPU	Central Processing Unit
RAM	Random Access Memory
GPU	Graphics Processing Unit
GGUF	GPT-Generated Unified Format
SBC	Single-Board Computer
OT	Operational Technology
SD	Standard Deviation
RAG	Retrieval-Augmented Generation

References

1. Xu, X.; Lu, Y.; Vogel-Heuser, B.; Wang, L. Industry 4.0 and Industry 5.0—Inception, conception and perception. J. Manuf. Syst. 2021, 61, 530–535.
2. Xia, Y.; Jazdi, N.; Zhang, J.; Shah, C.; Weyrich, M. Control Industrial Automation System with Large Language Model Agents. In Proceedings of the 30th IEEE International Conference on Emerging Technologies and Factory Automation, Padova, Italy, 9–12 September 2025; pp. 1–8.
3. Lee, J.; Su, H. A Unified Industrial Large Knowledge Model Framework in Industry 4.0 and Smart Manufacturing. Int. J. AI Mater. Des. 2024, 1, 41–47.
4. Friha, O.; Ferrag, M.A.; Kantarci, B.; Canad, B.; Zemouri, A.; Shu, L.; Yu, X. LLM-Based Edge Intelligence: A Comprehensive Survey on Architectures, Applications, Security and Trustworthiness. IEEE Open J. Commun. Soc. 2024, 5, 5799–5843.
5. Zheng, Y.; Chen, Y.; Qian, B.; Shi, X.; Shu, Y.; Chen, J. A Review on Edge Large Language Models: Design, Execution, and Applications. ACM Comput. Surv. 2025, 57, 209.
6. Xu, D.; Liu, Y.; Guo, L.; Lin, X.; Xu, C. LLMCad: Fast and Scalable On-Device Large Language Model Inference. arXiv 2023, arXiv:2309.04255.
7. Qin, R.; Hu, Z.; Zhao, K.; Chen, Z.; Wu, M.; Pan, R.; Xiao, B.; Zhu, Z.; Du, Y.; Shu, L.; et al. Empirical Guidelines for Deploying LLMs onto Resource-Constrained Edge Devices. ACM Trans. Des. Autom. Electron. Syst. 2024, 57, 15.
8. Rajbhandari, S.; Yao, Z.; Awan, A.A.; Aminabadi, R.Y.; Ma, C.; Zheng, L.; He, Y. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers. Adv. Neural Inf. Process. Syst. 2022, 35, 13168–13183.
9. Stafman, L.; Forshaw, M.; Mehta, N.; Bates, O. Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency. ACM Trans. Internet Things 2025, 6, 28.
10. Hidalgo-Castelo, F.; Guerrero-González, A.; García-Córdova, F.; Lloret-Abrisqueta, F.; Torregrosa Bonet, C. Multimodal Cognitive Architecture with Local Generative AI for Industrial Control of Concrete Plants on Edge Devices. Sensors 2025, 25, 7540.
11. Raspberry Pi Ltd. Frequency Management and Thermal Control. Raspberry Pi Documentation. Available online: https://www.raspberrypi.com/documentation/computers/raspberry-pi.html (accessed on 1 May 2026).
12. Zyrianoff, I.; Kamienski, C.; Rautenberg, S.; Borger, S. Sugar Shack 4.0: Practical Demonstration of an IIoT-Based Event-Driven Automation System. arXiv 2025, arXiv:2510.15708.
13. Wen, Y.; Zhang, S.; Zhang, S.; Zyrianoff, I. Event-Driven Architecture and Intelligent Decision Tree Facilitated Sustainable Trade Activity Monitoring Model Design. PLoS ONE 2025, 20, e0331663.
14. Raeiszadeh, M.; Ebrahimzadeh, A.; Glitho, R.H.; Eker, J.; Mini, R.A.F. Real-Time Adaptive Anomaly Detection in Industrial IoT Environments. IEEE Trans. Netw. Serv. Manag. 2024, 21, 6839–6856.
15. Adhikari, D.; Jiang, W.; Zhan, J.; Rawat, D.B.; Bhattarai, A. Recent Advances in Anomaly Detection in Internet of Things: Status, Challenges, and Perspectives. Comput. Sci. Rev. 2024, 54, 100665.
16. Venkiteela, P. n8n: An Open-Source Workflow Automation Platform for Enterprise Integration and AI-Driven Orchestration. Int. J. Comput. Appl. 2025, 187, 44–52.
17. Angarita, R.; Santos, J.S.; Furtado, P.H.T.; Moreira, P.C.M.; Diehl, F.C. Evaluating Workflow Automation Efficiency Using n8n: A Small-Scale Business Case Study. arXiv 2026, arXiv:2602.01311.
18. García, J.; Rios-Colque, L.; Peña, A.; Rojas, L. Hybrid AI and LLM-Enabled Agent-Based Real-Time Decision Support Architecture for Industrial Batch Processes: A Clean-in-Place Case Study. AI 2026, 7, 51.
19. Suhir, E. Three-Step Concept in Modeling Reliability: Boltzmann–Arrhenius–Zhurkov Physics-of-Failure-Based Equation Sandwiched Between Two Statistical Models. Microelectron. Reliab. 2014, 54, 2594–2603.
20. Hadidi, R.; Banitalebi-Dehkordi, A.; Farhadloo, M.; Alian, A. An Evaluation of LLMs Inference on Popular Single-Board Computers. arXiv 2025, arXiv:2511.07425.
21. Xia, Y.; Xiao, Z.; Jazdi, N.; Weyrich, M. Generation of Asset Administration Shell with Large Language Model Agents: Toward Semantic Interoperability in Digital Twins in the Context of Industry 4.0. IEEE Access 2024, 12, 84863–84877.
22. Viswanadhapalli, V. The Future of Intelligent Automation: How Low-Code/No-Code Platforms Are Transforming AI Decisioning. Int. J. Eng. Comput. Sci. 2025, 14, 26809–26817.
23. Sundberg, L.; Holmström, J. Democratizing Artificial Intelligence: How No-Code AI Can Leverage Machine Learning Operations. Bus. Horiz. 2023, 66, 777–788.
24. Krivic, P.; Kusek, M.; Skocir, P.; Cavrak, I. Performance Evaluation of Container Orchestration Tools in Edge Computing Environments. Sensors 2023, 23, 4008.
25. Hidalgo-Castelo, F. LLM Cognitive Industrial—Reference Architecture. GitHub Repository. Available online: https://github.com/futuryteg/LLM-cognitive-industrial (accessed on 1 May 2026).
26. Hidalgo-Castelo, F. Aplicación de la Inteligencia Artificial Generativa en la Robótica Colaborativa para Mejorar la Eficiencia y Flexibilidad en Entornos Industriales. Ph.D. Thesis, Universidad Politécnica de Cartagena, Cartagena, Spain, 2026.
27. Hidalgo-Castelo, F.; Piñera-Marín, A. Industrial Digital Assistant—Event-Driven Stack. GitHub Repository. Available online: https://github.com/antoniojpmarin/Industrial-Digital-Assistant (accessed on 1 May 2026).
28. Grattafiori, A.; Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Vaughan, A.; et al. The Llama 3 Herd of Models. arXiv 2024, arXiv:2407.21783.
29. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array Programming with NumPy. Nature 2020, 585, 357–362.
30. Zaheri, D.; Refan, M.H. Design and Implementation of Modbus RTU/TCP to Profibus Gateway Using Raspberry Pi. In Proceedings of the 15th International Conference Computer and Automation Engineering (ICCAE), Sydney, Australia, 3–5 March 2023; pp. 109–113.
31. Dong, Q.; Chen, X.; Satyanarayanan, M. Creating Edge AI from Cloud-based LLMs. In Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications (HotMobile), San Diego, CA, USA, 28–29 February 2024.
32. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474.

Figure 1. Five-layer cognitive architecture for continuous industrial supervision. Layer 4 integrates the Industrial Data Layer on Raspberry Pi 5 (blue), centralizing access to all five automation pyramid levels via OPC UA, MQTT, and REST API, with the event-driven stack on PC (orange) implementing continuous operational context maintenance via Node-RED, n8n, Ollama/llama3.2, PostgreSQL, and Flask.

Figure 2. Response time comparison between the reactive paradigm [10] and the event-driven paradigm (this work) for three query complexity levels (mean ± SD, n = 15 per level). Percentage improvements of 30.6%, 31.7%, and 29.1% are shown for simple, moderate, and complex queries, respectively.

Figure 3. CPU thermal behavior comparison between the reactive paradigm [10] and the event-driven paradigm (this work) over 30 min of sustained operation at ambient temperature 24 °C ± 1 °C. The event-driven paradigm reduces peak temperature from 79.6 °C to 71.2 °C and expands the safety margin before throttling from 5.4 °C to 13.8 °C.

Figure 4. Communication latency as a fraction of total response time. Left panel: complete latency breakdown by component (LLM inference, voice processing, other processes, and communication) for each complexity level. Right panel: proportion of communication latency (9.14 ms) relative to total response time, confirming values below 0.1% across all complexity levels.

Table 1. Response times (seconds) per complexity level compared with the reactive paradigm.

Complexity	N	Mean	SD	Median	Q1	Q3	Min	Max	Improvement
Simple	5	9.84	4.12	9.90	6.20	13.40	4.10	15.30	4.35 s (30.6%)
Moderate	5	11.23	3.85	10.95	8.40	14.10	7.20	17.80	5.22 s (31.7%)
Complex	5	16.47	4.23	15.80	13.20	19.90	11.40	24.10	6.77 s (29.1%)

Table 2. Thermal behavior comparison between paradigms.

Metric	Reactive Paradigm	Event-Driven Paradigm	Variation
Mean temperature	69.3 °C	62.1 °C	−7.2 °C
Peak temperature	79.6 °C	71.2 °C	−8.4 °C
Standard deviation	6.03 °C	3.52 °C	−41.6%
Throttling safety margin	5.4 °C	13.8 °C	+8.4 °C
Minimum temperature	61.5 °C	56.3 °C	−5.2 °C

Table 3. Modbus TCP communication latency over 10,163 continuous read operations.

Metric	Value
Mean read time	9.14 ms
Maximum time	32.40 ms
Minimum time	4.23 ms
Total readings	10,163

Table 4. Operational reliability metrics over 30 min of autonomous operation.

Metric	Value
Total queries executed	41
Successfully completed	41
Partial failures (recovered)	0
Critical failures	0
Success rate	100%
Availability	100%
Mean RAM usage (Raspberry Pi)	5.3 GB
Peak RAM usage (Raspberry Pi)	5.9 GB
Mean CPU temperature	62.1 °C
Peak CPU temperature	71.2 °C

Table 5. Operational comparison of the three methods for production feasibility verification.

Metric	Traditional	Reactive	Event-Driven
Total time	16–24 min	17.9 s	13.2 s
Physical displacement	240 m	0 m	0 m
Manual systems consulted	4	0	0
Supervision interruption	16–24 min	0 s	0 s
Improvement vs. traditional	—	54–80×	73–109×
Improvement vs. reactive	—	—	26%

Table 6. Latency breakdown for production feasibility verification query.

Stage	Reactive	Event-Driven
Voice transcription	1.2 s	1.2 s
Intent analysis	2.7 s	2.1 s
Data retrieval (protocols)	45 ms	0 ms (pre-built)
Response synthesis (Mistral-7B)	7.2 s	5.8 s
Voice synthesis	3.6 s	3.6 s
Other processes	3.2 s	0.5 s
Total	17.9 s	13.2 s
