1. Introduction
In modern manufacturing plants, the operator is the first line of response to any incident. However, accessing basic process information requires interrupting ongoing physical tasks, walking to the control room, authenticating in multiple systems, and navigating heterogeneous interfaces that were not designed to interoperate. In a precast concrete plant, verifying the feasibility of an urgent order requires sequentially querying the PLC for equipment status, the SCADA system for process monitoring, the MES for production scheduling, and the ERP system for raw material inventory. This workflow can consume 15–30 min, delaying decision-making and removing the operator from direct process supervision, which increases the risk of undetected deviations [1].
The maturity reached by large language models (LLMs) offers a promising solution to this information fragmentation. The ability to interact with complex industrial systems using natural language, without requiring knowledge of underlying protocols or topologies, represents a paradigm shift in the operator–plant relationship [2,3]. However, cloud-based commercial solutions present critical limitations for industrial environments: network latencies of 200–800 ms, dependence on permanent connectivity, exposure of sensitive operational data to external infrastructures, and recurring costs that are unaffordable for most manufacturing plants [4,5].
Advances in quantization techniques and CPU-optimized inference engines have made it possible to run LLMs locally on low-cost hardware, preserving data privacy and eliminating cloud dependency [6,7]. GPT-Generated Unified Format (GGUF) quantization at 4-bit precision enables models with 3–13 billion parameters to operate on hardware without dedicated GPU acceleration [8]. Ollama has emerged as a practical framework for containerized local LLM deployment, supporting multiple quantized models, including llama3.2 [9]. On this basis, previous work developed and implemented a five-layer cognitive architecture in a real precast concrete manufacturing plant, integrating the quantized Mistral-7B model on a Raspberry Pi 5 with heterogeneous industrial protocols, namely OPC Unified Architecture (OPC UA), Message Queuing Telemetry Transport (MQTT), and REST Application Programming Interface (REST API). That architecture reduced information access time from 15–30 min to 14–23 s, a 26–77× improvement over the manual method, with 100% reliability during autonomous operation [10].
Despite these results, the architecture operated under a reactive paradigm in which model inference represented 55.3% of total latency, with thermal peaks of up to 79.6 °C and a safety margin of only 5.4 °C before the throttling threshold of the Raspberry Pi 5 [10,11]. The Raspberry Pi 5 initiates soft throttling at 80 °C and enforces hard throttling at 85 °C, progressively reducing ARM core clock frequency to protect the processor [11]; sustained operation near these limits represents a critical constraint for continuous industrial deployments.
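The margin quoted above can be reproduced with simple arithmetic. The following minimal sketch assumes that the 5.4 °C figure is measured against the 85 °C hard-throttling threshold, which is what the reported numbers imply; the function and constant names are illustrative and are not taken from the reference implementation.

```python
# Thresholds as stated in the text for the Raspberry Pi 5.
SOFT_THROTTLE_C = 80.0  # soft throttling onset
HARD_THROTTLE_C = 85.0  # hard throttling threshold


def thermal_margin(peak_c: float, threshold_c: float = HARD_THROTTLE_C) -> float:
    """Return the headroom in degrees Celsius between a peak and a threshold."""
    # Rounded to one decimal to match the resolution of the reported values.
    return round(threshold_c - peak_c, 1)


reactive_peak_c = 79.6  # peak temperature reported for the reactive paradigm

print(thermal_margin(reactive_peak_c))                   # headroom to hard throttling
print(thermal_margin(reactive_peak_c, SOFT_THROTTLE_C))  # headroom to soft throttling
```

Under this reading, the same peak leaves well under 1 °C of headroom before soft throttling begins, which illustrates why operation near these limits is described as a critical constraint.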
The event-driven architecture (EDA) paradigm, widely applied in Industrial Internet of Things (IIoT) contexts [12,13], offers a structural solution by decoupling computation from the moment of the user request, distributing the computational load over time rather than concentrating it at the moment of user interaction [14]. Node-RED has been validated as an event processing platform for industrial IoT applications, providing visual workflow programming and native support for industrial protocols, including Modbus TCP, OPC UA, and MQTT [15]. n8n extends this paradigm with advanced orchestration capabilities, conditional logic, and integration with external services [16,17]. Recent work has demonstrated the feasibility of integrating LLMs with event-driven information models to provide real-time context for industrial automation systems [2,18]; however, such approaches have been validated primarily in simulation or cloud environments, and their integration with local LLMs on resource-constrained edge hardware in real operational technology (OT) networks remains largely unexplored.
Thermal behavior is a critical constraint for continuous LLM inference on single-board computers (SBCs). Thermal cycling caused by repeated temperature variations is a primary degradation mechanism in embedded devices, and following the Arrhenius relationship, a reduction in peak temperature and thermal variability directly extends expected hardware lifespan [19]. Recent benchmarks confirm that SBCs such as the Raspberry Pi 5 can sustain quantized LLM workloads, but thermal management must be explicitly designed for sustained operation [20]. None of the reviewed works combines an event-driven architecture with a local LLM and experimental validation in a real industrial plant with heterogeneous OT protocol integration, which is the precise contribution of this work.
The main contributions of this work are: (1) an event-driven augmentation layer that keeps the operational context permanently updated without modifying any component of the reference architecture, enabling non-intrusive incremental adoption in existing industrial deployments; (2) a 26–31% reduction in response times through pre-built context that reduces LLM inferential load in the critical response path; (3) a significant improvement in edge hardware thermal sustainability, expanding the safety margin before throttling from 5.4 °C to 13.8 °C, extending hardware lifespan and broadening deployable environments; and (4) experimental validation in a real industrial environment at Frumecar S.L. operating heterogeneous OT protocols across all five levels of the automation pyramid.
The article is organized as follows: Section 2 reviews related work; Section 3 describes the architecture and methods; Section 4 presents the experimental results; Section 5 discusses the findings and limitations; and Section 6 summarizes the main conclusions.
5. Discussion
5.1. Interpretation of Main Findings
The results confirm that incorporating the event-driven paradigm into the reference cognitive architecture produces measurable and consistent improvements across the three evaluated dimensions: response times, thermal behavior, and operational reliability. The 26–31% improvement in response times directly reflects the effect of pre-built context on the inferential load of Mistral-7B: when the Industry Report is available at query time, the model synthesizes a response over already-structured information rather than traversing heterogeneous protocols in the critical path.
These response times (9.84–16.47 s) are consistent with the operational requirements of industrial supervision and compare favorably with cloud-hosted LLM approaches, which face structural latency constraints of 200–800 ms at the network layer alone, as well as dependency on permanent connectivity and exposure of sensitive operational data to external infrastructures [4,5]. The fully local, GPU-free, cloud-independent architecture achieves 100% availability, positioning it as an operationally viable alternative for manufacturing plants where cloud deployment is not technically or economically justifiable [31].
When compared with the closest related work, the present system shows complementary and differentiated contributions. Xia et al. proposed LLM agents for industrial automation with event-driven information modeling, but they were validated in simulation rather than on a live OT network [2]. García et al. demonstrated hybrid AI and LLM-based real-time decision support in an industrial batch process but relied on cloud-hosted models [18]. Bihlmaier et al. achieved high command accuracy on a real deployed machine using LLMs and OPC UA but with cloud-hosted proprietary models [21]. The present work is, to the authors’ knowledge, the first to combine a local LLM, an event-driven context pre-computation layer, and experimental validation on a real OT network with heterogeneous protocol integration across all five automation levels, achieving sub-17 s response times without GPU or cloud dependency.
5.2. Implications for Edge Hardware Lifespan and Total Cost of Ownership
The thermal improvement represents the finding of greatest operational impact. In the reactive paradigm, the temporal coincidence between Mistral-7B inference and simultaneous access to OPC UA, MQTT, and REST API protocols generated thermal peaks of up to 79.6 °C, leaving a safety margin of only 5.4 °C before the Raspberry Pi 5 throttling threshold [10,11]. The event-driven paradigm structurally eliminates this coincidence by distributing data acquisition across the Industry Report construction cycle, so that at query time the model operates on pre-structured context with significantly lower instantaneous computational load.
The resulting 41.6% reduction in thermal variability (from 6.03 °C to 3.52 °C) has direct implications for hardware durability. Following the Arrhenius relationship, a reduction in both peak temperature and thermal variability translates into a significant extension of expected hardware lifespan [19]. In continuous industrial deployments where hardware operates for months or years without replacement, this improvement has a direct impact on total cost of ownership. The expanded 13.8 °C safety margin enables reliable operation in environments up to 33–35 °C, covering typical summer conditions in Mediterranean industrial plants such as those in the Murcia region [11].
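To give a sense of scale for the Arrhenius argument, the sketch below evaluates the standard acceleration factor, exp[(Ea/k)(1/T_cool − 1/T_hot)], for the two reported peak temperatures. The activation energy of 0.7 eV is an assumed, purely illustrative value that does not come from this work, and the peak temperatures are used only as stand-ins for the effective operating temperature. The model covers the steady-temperature term only and says nothing about thermal cycling amplitude.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_hot_c: float, t_cool_c: float,
                           activation_energy_ev: float) -> float:
    """Relative lifetime factor when moving from t_hot_c to t_cool_c."""
    t_hot_k = t_hot_c + 273.15
    t_cool_k = t_cool_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_cool_k - 1.0 / t_hot_k
    )
    return math.exp(exponent)


# ASSUMPTION: illustrative activation energy, not a value measured or
# reported for the Raspberry Pi 5 in this work.
ASSUMED_EA_EV = 0.7

factor = arrhenius_acceleration(79.6, 71.2, ASSUMED_EA_EV)
print(f"Relative lifetime factor: {factor:.2f}x")
```

With this assumed activation energy the factor comes out at roughly 1.75; the result is highly sensitive to the chosen value, so it should be read as an order-of-magnitude illustration rather than a lifetime prediction.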
5.3. Value of the Event-Driven Paradigm vs. the Reactive Paradigm
The comparison between paradigms reveals that the benefit of the event-driven approach does not reside solely in individual metric improvements, but in a qualitative change in the system’s operating model. In the reactive paradigm, the system remains idle between queries and concentrates its entire computational load at the moment of interaction. In the event-driven paradigm, the system works proactively and continuously, distributing the computational load over time and reserving for query time only the final synthesis over an already-constructed context.
This operating model is conceptually related to Retrieval-Augmented Generation (RAG) architectures [32], in which external knowledge is pre-indexed and made available for retrieval at inference time. The Industry Report functions as a domain-specific, event-updated operational context that serves as the retrieval substrate for the LLM: a proactive, event-triggered RAG layer rather than a static document index. Unlike conventional RAG approaches that retrieve from a static knowledge base, the Industry Report is dynamically maintained through the event-driven pipeline, ensuring that the context provided to the LLM reflects the actual plant state at query time.
This change of model has three practical consequences. First, the operator perceives lower response times because most of the work is already done. Second, the hardware operates under more favorable and stable thermal conditions because load peaks are distributed over time. Third, the system is inherently more robust to simultaneous or high-complexity queries, because the available Industry Report acts as an operational cache that absorbs inferential load variability.
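The operational-cache idea can be sketched as follows. This is a simplified, hypothetical illustration rather than the implemented system: the class `IndustryReport`, its methods, the variable names, and the prompt layout are all assumptions introduced here. Events update the cached context off the query path, and query time only formats the already-built context for the model.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IndustryReport:
    """Hypothetical in-memory stand-in for the pre-built operational context."""
    values: Dict[str, float] = field(default_factory=dict)
    updated_at: Dict[str, float] = field(default_factory=dict)

    def on_event(self, variable: str, value: float,
                 timestamp: Optional[float] = None) -> None:
        """Record a change pushed by the event pipeline (off the query path)."""
        self.values[variable] = value
        self.updated_at[variable] = time.time() if timestamp is None else timestamp

    def as_prompt_context(self) -> str:
        """Serialize the cached state; no protocol access happens here."""
        return "\n".join(f"{name}: {value}"
                         for name, value in sorted(self.values.items()))


def build_prompt(report: IndustryReport, question: str) -> str:
    """Query-time step: combine the cached context with the operator question."""
    return f"Plant state:\n{report.as_prompt_context()}\n\nOperator question: {question}"


report = IndustryReport()
report.on_event("mixer_1.speed_rpm", 42.0)      # hypothetical variable names
report.on_event("silo_2.cement_level_pct", 63.5)
print(build_prompt(report, "Can we start an urgent order now?"))
```

The design point is that `build_prompt` touches only local memory, so the cost of reaching OPC UA, MQTT, or REST endpoints is paid when events arrive rather than when the operator asks.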
5.4. Complementarity with the Reactive Paradigm
The results do not suggest that the event-driven paradigm should replace the reactive paradigm in all contexts; both are complementary and appropriate for different scenarios. The reactive paradigm is optimal for environments with very limited hardware resources, infrequent queries, or processes with low state variability where the cost of maintaining an updated context is not justified. The event-driven paradigm is preferable when the industrial process exhibits frequent variability, when thermal conditions are demanding, or when response latency and operator continuity are critical. This complementarity is consistent with findings in the broader IIoT literature, where event-driven and polling-based approaches serve different operational requirements [12,13]. The architecture demonstrates that both paradigms can coexist on the same hardware infrastructure, with the event-driven paradigm acting as a non-intrusive augmentation layer over the reference cognitive architecture.
5.5. Limitations
Several limitations must be acknowledged. The system was validated in a single installation in the precast concrete sector; generalization to other industrial sectors requires additional validation. The evaluation protocol, which comprises 15 representative queries classified into three complexity levels (5 per level, 3 executions each) and follows the validated methodology of the reference cognitive architecture, enables direct comparison between paradigms under controlled conditions but does not cover the full diversity of operational scenarios across different industrial domains. The event detection thresholds (±5% for process variables) were defined empirically for the precast concrete production process at Frumecar S.L. For deployment in other industrial contexts, threshold calibration is best performed by the process engineer responsible for the installation, who has direct knowledge of the normal variability of each variable and can determine what magnitude of change constitutes an operationally significant event. This criterion is consistent with standard practice in statistical process control. The architecture facilitates this adaptation since the detection thresholds are configurable parameters in the n8n workflow, requiring no modification of the underlying codebase. As a starting reference for processes without historical data, the ±5% value used in this work can serve as an initial baseline, subject to adjustment based on real operational experience [14]. The delta/full update logic in n8n increases in complexity with the number of monitored variables, requiring formal testing methodologies for large-scale deployments [16,17]. Finally, while the architecture implements semantic interpretation and partial pragmatic reasoning through the five-layer cognitive structure, full pragmatic competence (including deep operator intent modeling, organizational context awareness, and theory-of-mind reasoning) lies beyond the current scope. Addressing this limitation represents a relevant direction toward next-generation manufacturing systems, where neurosymbolic architectures and explicit intent models could extend the system’s cognitive depth beyond the current operational envelope.
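The threshold criterion discussed above can be illustrated with a short sketch. In the deployed system the thresholds are parameters of the n8n workflow; the Python below is only an illustration, and several details are assumptions made here: the change is measured relative to the last reported value, the comparison is inclusive, and the names `is_significant` and `EventDetector` are hypothetical.

```python
from typing import Dict


def is_significant(previous: float, current: float,
                   rel_threshold: float = 0.05) -> bool:
    """Return True when the relative change meets or exceeds the threshold."""
    if previous == 0:
        # ASSUMPTION: any departure from zero counts as an event, since a
        # relative change is undefined for a zero baseline.
        return current != 0
    return abs(current - previous) / abs(previous) >= rel_threshold


class EventDetector:
    """Tracks the last reported value per variable and flags significant changes."""

    def __init__(self, rel_threshold: float = 0.05) -> None:
        self.rel_threshold = rel_threshold  # configurable per installation
        self.last_reported: Dict[str, float] = {}

    def observe(self, variable: str, value: float) -> bool:
        """Return True if this sample should trigger a context update."""
        previous = self.last_reported.get(variable)
        if previous is None or is_significant(previous, value, self.rel_threshold):
            self.last_reported[variable] = value
            return True
        return False


detector = EventDetector(rel_threshold=0.05)
for sample in (100.0, 104.0, 106.0, 108.0):
    # "mixer_1.torque" is a hypothetical variable name used for illustration.
    print(sample, detector.observe("mixer_1.torque", sample))
```

Comparing against the last reported value rather than the previous sample prevents a slow drift from passing unnoticed through many small steps; a process engineer calibrating a different installation would mainly adjust `rel_threshold` per variable.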
5.6. Future Work
The results obtained open several development directions to extend the scope of the system. As a natural continuation, the operational period at Frumecar S.L. will be extended under continuous production conditions to characterize long-term system behavior. Extension of the event-driven paradigm to all variables across the five automation levels would broaden the scope of the Industry Report and further enhance response time improvements for complex multilevel queries. Integration of adaptive learning mechanisms for automatic calibration of event-detection thresholds would reduce dependence on process-specific manual configuration.
Regarding hardware, the current validation was conducted on a configuration comprising a PC and a Raspberry Pi 5 (8 GB RAM). As future work, an existing installation can incorporate a second Raspberry Pi 5 to host the event-driven stack, while a new deployment can opt for two units of 8 or 16 GB according to project scope.
Finally, a formal comparison of the Industry Report response accuracy against static RAG approaches for industrial knowledge bases represents a valuable avenue for validation [32].
6. Conclusions
This work presents the incorporation of the event-driven paradigm into a five-layer industrial cognitive architecture for supervision with local LLMs on edge devices, demonstrating that the temporal decoupling of industrial data acquisition from model inference produces measurable and consistent improvements in response times, thermal behavior, and hardware operational sustainability. The architecture operates entirely locally without cloud dependency or GPU acceleration, extending the validated results of the reference reactive system to a new operational envelope that is broader, thermally safer, and qualitatively more robust.
Three main experimental contributions are established. First, a 26–31% reduction in response times—reaching mean times of 9.84 s, 11.23 s, and 16.47 s for simple, moderate, and complex queries, respectively—demonstrates that the pre-built Industry Report allows Mistral-7B to synthesize responses over already-structured information, reducing the inferential load at query time. This positions the system favorably compared to cloud-based approaches that face structural network latency constraints before any inference cost.
Second, the reduction of 8.4 °C in peak temperature (from 79.6 °C to 71.2 °C), the 41.6% decrease in thermal variability, and the expansion of the safety margin before CPU throttling from 5.4 °C to 13.8 °C constitute the finding of greatest operational impact. This thermal improvement extends expected hardware lifespan through reduced thermal cycling, broadens the range of deployable industrial environments to ambient temperatures up to 33–35 °C, and directly reduces total cost of ownership in continuous industrial deployments.
Third, the 100% success rate and availability over 30 min of autonomous operation confirm that incorporating the event-driven stack does not compromise the operational robustness of the reference architecture. The non-intrusive augmentation model—in which no component of the original implementation is modified and the pre-built Industry Report integrates transparently as an immediate context for the LLM—facilitates incremental adoption in existing industrial deployments without requiring system replacement or reconfiguration.
The principal limitation of this work is the validation scope: a single installation in the precast concrete sector with a small statistical sample and a 30 min autonomous operation window.
In summary, the convergence of the event-driven paradigm, low-code orchestration, and local LLM inference on edge hardware represents a concrete step toward proactive, thermally sustainable, and operationally robust industrial supervision systems, aligned with the human-centric and cognitive democratization objectives of Industry 5.0, enabling operators to remain continuously focused on their primary supervision tasks while the system proactively maintains up-to-date operational knowledge.