Article

A Synergistic Multi-Agent Framework for Resilient and Traceable Operational Scheduling from Unstructured Knowledge

Luca Cirillo, Marco Gotelli, Marina Massei, Xhulia Sina and Vittorio Solina

1 Dipartimento di Ingegneria Meccanica, Energetica, Gestionale e dei Trasporti, Genoa University, Via Opera Pia 15, 16145 Genova, Italy
2 S4F SIM4Future, Via Trento 43, 16145 Genova, Italy
3 Department of Mechanical, Energy and Management Engineering, University of Calabria, Ponte Pietro Bucci, Cubo 45/C, 87036 Rende, Italy
* Author to whom correspondence should be addressed.
AI 2025, 6(12), 304; https://doi.org/10.3390/ai6120304
Submission received: 8 October 2025 / Revised: 7 November 2025 / Accepted: 18 November 2025 / Published: 25 November 2025

Abstract

In capital-intensive industries, operational knowledge is often trapped in unstructured technical manuals, creating a barrier to efficient and reliable maintenance planning. This work addresses the need for an integrated system that can automate knowledge extraction and generate optimized, resilient operational plans. A synergistic multi-agent framework is introduced that transforms unstructured documents into a structured knowledge base using a self-validating pipeline. This validated knowledge feeds a scheduling engine that combines multi-objective optimization with discrete-event simulation to generate robust, capacity-aware plans. The framework was validated on a complex maritime case study. The system successfully constructed a high-fidelity knowledge base from unstructured manuals, and the scheduling engine produced a viable, capacity-aware operational plan for 118 interventions. The optimized plan respected all daily (6) and weekly (28) task limits, executing 64 tasks on their nominal date, bringing 8 forward, and deferring 46 by an average of only 2.0 days (95th percentile 4.8 days) to smooth the workload and avoid bottlenecks. An interactive user interface with a chatbot and planning calendar provides verifiable “plan-to-page” traceability, demonstrating a novel, end-to-end synthesis of document intelligence, agentic AI, and simulation to unlock strategic value from legacy documentation in high-stakes environments.

1. Introduction

Operational management in capital-intensive sectors such as maritime, aerospace, and advanced manufacturing faces a universal challenge: the foundational knowledge required for safe and efficient operations is predominantly locked within vast archives of unstructured technical manuals. This “tyranny of the technical manual” represents a systemic bottleneck where critical protocols for maintenance, safety, and compliance are trapped in legacy formats incompatible with modern data-driven decision systems. This forces organizations into laborious, error-prone manual data transcription processes, creating operational friction, introducing risks, and breaking the chain of accountability between a planned action and its original justification.

The rise of artificial intelligence offers powerful tools to address this challenge. Technologies for predictive maintenance [1], agentic AI for automating complex workflows [2], and discrete-event simulation for assessing risk and resilience [3] are maturing rapidly. These technologies, however, have largely evolved in separate research streams, and their full potential remains unrealized because of a lack of integration. The output of a document extraction system is rarely connected directly to a scheduling engine, and the plans from that engine are seldom stress-tested for resilience in a systematic way.

A review of the current state of the art reveals this integration gap. While progress in individual areas is substantial, there is a distinct lack of holistic, end-to-end frameworks that guarantee both traceability and resilience, from the source document to the final operational plan. This creates a “trust deficit” where advanced AI systems are perceived as “black boxes,” which is a primary barrier to their adoption in regulated, high-stakes environments. There is a need for a solution that does not just extract and plan but does so in a manner that is reliable, verifiable, and robust against real-world uncertainty.

This paper introduces a synergistic multi-agent framework that transforms unstructured technical knowledge into optimized and resilient operational plans with full traceability from the plan back to the source page. The main contributions of this work are threefold. First, a multi-agent system is designed and implemented to automate the end-to-end process from ingesting unstructured manuals to generating a validated maintenance knowledge base. Second, a resilience-driven scheduling engine is developed that combines multi-objective optimization with discrete-event simulation to generate plans that are not only efficient but also robust to real-world disruptions. Third, a principle of “explainability by design” is incorporated throughout the framework, ensuring every scheduled task is verifiably traceable to its source document.

From a broader perspective, ongoing trends underline the urgency of an end-to-end approach. A significant portion of enterprise information remains unstructured and under-utilized for analytics and decision support [4]. In parallel, maintenance is shifting from a cost center to a strategic capability within the “smart maintenance” vision, emphasizing digitalization, servitization, and proactive practices [5]. Even in predictive maintenance, arguably the most mature strand, systematic reviews still report fragmentation across methods, datasets, and deployment pathways [6]. These dynamics strengthen the case for integrating document intelligence, capacity-aware scheduling, and resilience analysis into a single, trustworthy pipeline.

The remainder of this article is structured as follows. Section 2 reviews the state of the art and defines the research gaps. Section 3 details the proposed methodological framework. Section 4 presents the empirical validation and results. Section 5 discusses the implications of the findings, and Section 6 provides concluding remarks.

2. State of the Art and Research Contribution

2.1. Strategic Review of the Literature

2.1.1. The Challenge of Unstructured Knowledge: Advances in Document Intelligence

The problem of extracting structured information from unstructured text is a long-standing challenge in natural language processing. Early approaches relied on Optical Character Recognition (OCR) combined with rule-based systems or templates, which were brittle and failed when document layouts changed. The modern era was ushered in by deep learning, particularly the Transformer architecture, which enabled models to understand language with deep contextual awareness [7]. For industrial applications, the focus sharpened on document-level information extraction (doc-IE), a task that aims to identify relationships between entities that may span multiple sentences or pages, which is common in technical manuals [8]. A crucial breakthrough was the development of layout-aware language models. These models are pre-trained on both the textual content and the two-dimensional layout of documents, enabling them to interpret complex structures like tables, forms, and figures with high accuracy [9]. This builds upon foundational document analysis research that established methods for page segmentation and table understanding [10]. However, recent surveys in Document AI, while highlighting advances in multi-modal understanding, also note a persistent gap between benchmark performance and reliable industrial impact. Production systems must grapple with noisy scans, handwritten annotations, inconsistent terminology across different manufacturers, and ambiguous temporal expressions like “periodically” [11]. These real-world constraints demand more than just accurate extraction; they necessitate robust validation and normalization pipelines, a key motivation for the architecture of the proposed framework.

2.1.2. Process Automation: Agentic AI in Logistics and Maintenance

Multi-Agent Systems (MASs) have emerged as a powerful paradigm for solving complex, distributed problems in dynamic environments [12]. In a MAS, a collection of autonomous agents interact with each other and their environment to achieve a collective objective. This decentralized approach is particularly well-suited for industrial automation and dynamic scheduling in manufacturing systems, where it offers scalability and robustness [13,14]. Foundational work in maintenance optimization established mathematical models for balancing the trade-off between preventive maintenance costs and operational reliability [15]. These models are now being enhanced by data-driven approaches like predictive maintenance, which leverage machine learning to forecast equipment failures. Specific to the offshore industry, machine learning and simulation are being applied to model complex FPSO (Floating Production Storage and Offloading) construction projects to mitigate common delays and improve Material Take Off (MTO) estimations [16]. Beyond pure optimization, the agent paradigm offers significant architectural advantages, including loose coupling, organizational abstraction, and a natural mapping to functional roles like parsing, validation, and planning [17]. Seminal work in software engineering has long positioned agents as a principled approach for building complex, open-ended software ecosystems [18]. The critical limitation of most existing systems, however, is their reliance on pre-existing clean, structured data. They are designed to optimize known tasks, not to discover and define those tasks from raw, unstructured documentation. This work directly addresses this integration gap by creating a seamless pipeline from document to schedule.

2.1.3. The Imperative of Resilience: Simulation for Planning and Risk Analysis

An operational plan, no matter how optimal in theory, is only effective if it is resilient to the unexpected disruptions of the real world. Discrete-Event Simulation (DES) is a standard and powerful methodology for modeling, analyzing, and optimizing complex systems under uncertainty. A significant body of research has used simulation to enhance the resilience of critical infrastructures, including ports and maritime transport networks, by stress-testing operational plans against disruptions like extreme weather or equipment failures [19]. This extends to security, where simulation is also used to model and protect strategic offshore assets, such as FPSOs, from hybrid threats [20]. The broader literature on supply-chain resilience provides a theoretical foundation for this approach, advancing concepts of buffering, redundancy, and agility as mechanisms to absorb shocks. More recent work on “viable” supply networks extends the concept of resilience toward long-term survivability under severe and prolonged disruptions [21]. This perspective strongly motivates the fusion of optimization and simulation. By subjecting candidate plans to simulated disruptions, such as component shortages or crew unavailability, it is possible to proactively favor schedules that are not just efficient but robust. The literature review reveals a common disconnect between the optimization and resilience research communities. Optimization studies often assume a deterministic world to find an optimal plan [13], while resilience studies typically assume a plan already exists and then test its fragility [22]. The proposed framework represents a conceptual step forward by integrating simulation directly into the planning loop to generate proactively resilient plans from the outset.

2.1.4. The Demand for Trust: Explainable AI and Traceability

The adoption of AI in high-stakes, regulated industries is fundamentally contingent on trust and transparency. This necessity has fueled the rapid growth of Explainable AI (XAI), a field dedicated to making the decisions of AI systems understandable to human operators [23]. Foundational surveys in the field systematize a wide range of techniques, from post-hoc explanation methods that analyze a model after training to the design of intrinsically interpretable models [24,25]. For knowledge-intensive industrial tasks, a key enabling technology is Retrieval-Augmented Generation (RAG) [26]. Unlike other explanation methods that might highlight which features were important for a decision, RAG models ground their generated outputs in a corpus of verifiable source documents, providing explicit citations for their claims. This mechanism is uniquely suited to industrial settings because it provides verifiability and auditability, not just interpretability. By applying RAG principles, it is ensured that every AI-generated operational instruction is verifiably linked back to an authoritative source document. This directly addresses the “black box” problem and provides the foundation of trust required for deploying AI systems where accountability for safety and compliance is paramount [27].

2.2. Identified Research Gaps and Objectives

The literature review identifies three interconnected research gaps. First is the “first mile” gap: while document extraction techniques are mature, there is a lack of integrated frameworks that carry this knowledge through the entire operational planning and risk analysis lifecycle; most planning systems assume the availability of clean, structured data. Second is the optimization-resilience gap: current research often treats schedule optimization and resilience analysis as separate disciplines, which leads to plans that are efficient in theory but brittle in practice. Third is the “trust deficit” gap: the absence of verifiable traceability from a planned task back to its authoritative source remains a barrier to AI adoption in regulated industries where accountability is paramount.

To address these gaps, this work pursues three primary objectives. First, to design and implement a synergistic multi-agent framework that automates the end-to-end process from ingesting unstructured technical manuals to generating a validated and structured maintenance knowledge base. Second, to develop a resilience-driven scheduling engine that combines multi-objective optimization with discrete-event simulation to generate maintenance plans that are not only efficient but also robust to real-world disruptions. Third, to incorporate a principle of “explainability by design” throughout the framework, ensuring that every scheduled task is verifiably traceable to its source document, thereby addressing the industrial need for trustworthy and auditable AI systems. The primary contribution of this research is the synthesis of these three disparate fields into a single, end-to-end, and auditable framework, which addresses a gap that isolated solutions cannot.

3. The Synergistic Multi-Agent Framework

The system is architected as a cooperative ensemble of specialized AI agents, with each agent responsible for a specific cognitive task in a structured, sequential workflow. This agentic model utilizes Large Language Models (LLMs), specifically decoder-only generative models, as adaptable reasoning engines for every agent, facilitating goal-oriented autonomy and dynamic, context-sensitive behavior. Such a design offers the modularity and specialization needed to transform raw, unstructured documents into strategic, actionable intelligence [18]. The system is composed of five specialized agents: the Parsing Agent, the Extractor Agent, the Validator Agent, the Scheduler and Simulation Agent, and the Query Agent. The complete system architecture, illustrating this agentic workflow from document ingestion to the user interface, is shown in Figure 1.
The agents operate in a sequential pipeline, where the output of one agent serves as the input for the next. The Parsing Agent writes text chunks to a document store. The Extractor Agent reads these chunks, generates structured JSON objects, and passes them to the Validator Agent. The Validator Agent then cross-checks these objects against the original documents and, upon success, writes the final, clean record to the definitive ‘Validated Maintenance Table’ in the knowledge base. This database is then read by the Scheduler Agent and the Query Agent for their respective tasks. This one-way data flow ensures traceability and simplifies the system architecture.
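To make this one-way data flow concrete, the sketch below expresses the hand-off between agents as a minimal orchestration function. It is illustrative only: the `Agent` protocol and the single `run` method are assumptions introduced here to mirror the roles described above, not the authors' implementation.

```python
from typing import Any, Iterable, Protocol


class Agent(Protocol):
    """Minimal interface each specialised agent exposes in this sketch."""
    def run(self, payload: Any) -> Any: ...


def run_pipeline(parser: Agent, extractor: Agent, validator: Agent,
                 scheduler: Agent, pdf_paths: Iterable[str]) -> Any:
    """One-way data flow: each stage consumes only the previous stage's output."""
    # Stage 1a: layout-aware parsing into per-page text chunks
    chunks = [chunk for path in pdf_paths for chunk in parser.run(path)]

    # Stage 1b: schema-constrained extraction of candidate maintenance records
    candidates = [rec for chunk in chunks for rec in extractor.run(chunk)]

    # Stage 1c: contextual cross-check against the source pages; only verified
    # records reach the Validated Maintenance Table
    validated = [rec for rec in candidates if validator.run(rec)]

    # Stage 2: capacity-aware, simulation-checked scheduling
    return scheduler.run(validated)
```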

3.1. Stage 1: High-Fidelity Knowledge Base Construction

The first phase of the framework is dedicated to the high-fidelity transformation of unstructured source material into a dependable, structured data repository. This is achieved through the combined work of the Parsing, Extractor, and Validator agents.

The Parsing Agent acts as the system’s entry point for raw data. Acknowledging the inherent complexity of technical PDFs, this agent uses a versatile backend with layout-aware parsing engines to interpret visual structures and retain information from tables and hierarchical headings. A significant optimization is its targeted search capability, which examines document metadata and tables of contents to locate likely “maintenance” sections. By concentrating its processing power on these high-value segments, the agent minimizes computational load while maintaining comprehensive coverage of pertinent content. The resulting machine-readable output is then forwarded to the Extractor Agent.

The Extractor Agent is powered by an LLM (in this implementation, Anthropic’s Claude 3.5 Haiku, chosen for its large context window and cost-effectiveness) that is constrained by a rigorous, predefined data schema. This schema provides a cognitive structure, guiding the model to perform detailed entity and relationship extraction. The agent is specifically programmed to distinguish between recurring maintenance tasks and single-instance procedures like installation, ensuring that only actionable, periodic protocols are captured and organized according to the schema in Table 1.

The Validator Agent introduces the quality assurance layer that sets this framework apart. A major risk in using LLMs for industrial applications is their tendency to generate factually inaccurate data, and the Validator Agent is designed to mitigate this risk. Functioning as an independent auditor, it reviews each record from the Extractor Agent and performs a contextual cross-check against the original source material. This agent can correct miscategorized activity types, standardize ambiguous time-based expressions, and resolve conflicts, such as when a task appears in multiple languages. In this case, the agent uses a user-specified primary language (e.g., English) as the canonical source and discards duplicates identified in other languages, ensuring a single, authoritative task entry. It acts as a crucial safeguard, removing any extracted task that cannot be clearly verified in the source document. All actions taken by the Validator Agent are logged, establishing a transparent audit trail for the knowledge base’s development.
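As an illustration of the schema-constrained extraction, the sketch below pairs a reduced Pydantic model (a subset of the fields summarized in Table 1, with illustrative field names) with an instructor-wrapped Anthropic client, using the model and parameters reported in Section 3.4. It is a sketch under those assumptions, not the project's pydantic_models.py, and for simplicity it extracts a single record per page.

```python
from enum import Enum
from typing import Optional

import anthropic
import instructor
from pydantic import BaseModel, Field, field_validator


class ActivityTypeEnum(str, Enum):
    CONTROL = "Control"
    MAINTENANCE = "Maintenance"
    REPLACEMENT = "Replacement"
    OTHER = "Other"


class MaintenanceRecord(BaseModel):
    """Reduced sketch of the schema in Table 1 (the real model has more fields)."""
    machinery: str
    component: str
    maintenance_description: str = Field(description="Verb-first action")
    activity_type: ActivityTypeEnum
    time_period: Optional[str] = None      # e.g. "Each 12 months"
    operating_hours: Optional[str] = None  # e.g. "250 h"
    every_use_flag: bool = False
    reference: str                          # source PDF filename
    page: int                               # source page number

    @field_validator("time_period")
    @classmethod
    def normalize_period(cls, v: Optional[str]) -> Optional[str]:
        # Illustrative normalization of free-text intervals ("every 6 months" -> "Each 6 months")
        return v.strip().lower().replace("every", "each").capitalize() if v else v


# instructor binds the LLM response to the Pydantic model and retries on
# validation failure (parameters mirror those reported in Section 3.4).
client = instructor.from_anthropic(anthropic.Anthropic())


def extract_record(page_markdown: str, filename: str, page: int) -> MaintenanceRecord:
    return client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=1500,
        max_retries=2,
        response_model=MaintenanceRecord,
        messages=[{
            "role": "user",
            "content": f"Extract the recurring maintenance task described in "
                       f"{filename}, page {page}:\n\n{page_markdown}",
        }],
    )
```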

3.2. Stage 2: Resilience-Driven Maintenance Scheduling

With the verified knowledge base in place, the system proceeds to convert this static information into a dynamic, operational schedule. The procedure starts with a deterministic computation to set a nominal target date, $\hat{d}_T$, for every task $T$. Using the last completion date, $L_T$, and the specified interval, $\Delta_T$, the system projects forward from $L_T$ in increments of $\Delta_T$ until the date is no longer in the past. With $t_0$ as the current date, the nominal target date is calculated as

$$\hat{d}_T = L_T + k_T \, \Delta_T \quad (1)$$

where the smallest non-negative integer $k_T$ that makes the target not earlier than today is

$$k_T = \max\left\{ 0, \; \left\lceil \frac{t_0 - L_T}{\Delta_T} \right\rceil \right\} \quad (2)$$

The final scheduled date, $d_T$, must adhere to real-world constraints like daily resource limits and non-working periods, which together form a set of viable dates, $\Omega$. To choose the best date from this set, a multi-objective optimization problem is formulated. The objective function, $J$, aims to balance three conflicting priorities: minimizing delays, preventing resource over-allocation, and prioritizing high-stakes tasks. The problem is thus to find the set of planned dates $\{d_T\}$ that minimizes this function:

$$\min_{\{d_\tau \in \Omega\}} J = \sum_{\tau} \left[ \alpha \cdot \max(0, d_\tau - \hat{d}_\tau) + \beta \cdot \mathrm{cluster}(d_\tau) + \gamma \cdot R_\tau \right] \quad (3)$$

Each component of the objective function is weighted and reflects established principles from operations research [28]. The term $\alpha \cdot \max(0, d_T - \hat{d}_T)$ penalizes lateness to minimize tardiness. The term $\beta \cdot \mathrm{cluster}(d_T)$ serves as a workload-smoothing function to avert resource bottlenecks. The term $\gamma \cdot R_T$ includes a risk or criticality score, $R_T$, for each task, enabling the scheduler to prioritize safety-critical activities. This optimized schedule reflects an ideal plan in a deterministic environment. To confirm its real-world feasibility, the framework incorporates a discrete-event simulation engine. The optimized plan is used as a baseline for a series of Monte Carlo simulations that introduce stochastic events, like unforeseen technician shortages or supply chain disruptions. By subjecting the plan to this operational uncertainty, the system can predict key performance indicators, pinpoint potential bottlenecks, and measure operational risks. This simulation-based method facilitates a transition from reactive troubleshooting to proactive, resilient maintenance planning.
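To make Equations (1)–(3) and the capacity constraints concrete, the following sketch implements the nominal-date computation and a simplified, greedy capacity-aware placement. It is an illustrative stand-in, not the authors' optimizer (Section 3.4 describes a multi-objective heuristic built on NumPy and Pandas): the weight values, the advancement penalty, and the greedy placement order are assumptions introduced here for illustration.

```python
import math
from collections import defaultdict
from datetime import date, timedelta

ALPHA, BETA, EARLY = 1.0, 0.5, 0.2   # illustrative weights; the paper sets them empirically
DAILY_CAP, WEEKLY_CAP = 6, 28        # capacities used in the case study


def nominal_date(last_done: date, interval_days: int, today: date) -> date:
    """Equations (1)-(2): project forward from the last completion date."""
    k = max(0, math.ceil((today - last_done).days / interval_days))
    return last_done + timedelta(days=k * interval_days)


def schedule(tasks, today: date, horizon_days: int = 84) -> dict:
    """Greedy capacity-aware placement approximating Equation (3).

    `tasks` is an iterable of (task_id, last_done, interval_days, criticality).
    High-criticality tasks are placed first so they stay near their nominal date.
    """
    daily, weekly, plan = defaultdict(int), defaultdict(int), {}
    for task_id, last_done, interval, crit in sorted(tasks, key=lambda t: -t[3]):
        d_hat = nominal_date(last_done, interval, today)
        best, best_cost = None, float("inf")
        for offset in range(-7, horizon_days):        # allow small advancements
            d = d_hat + timedelta(days=offset)
            if d < today:
                continue
            week = d.isocalendar()[:2]                # (ISO year, ISO week)
            if daily[d] >= DAILY_CAP or weekly[week] >= WEEKLY_CAP:
                continue                              # hard capacity constraints
            cost = (ALPHA * max(0, (d - d_hat).days)      # tardiness term
                    + EARLY * max(0, (d_hat - d).days)    # mild advancement penalty
                    + BETA * daily[d])                    # workload-smoothing term
            if cost < best_cost:
                best, best_cost = d, cost
        if best is None:
            raise RuntimeError(f"No feasible date for task {task_id}")
        plan[task_id] = best
        daily[best] += 1
        weekly[best.isocalendar()[:2]] += 1
    return plan
```

For instance, `schedule([("T1", date(2025, 7, 1), 30, 2)], today=date(2025, 9, 1))` would place the task on or shortly after its projected nominal date of 29 September, subject to the daily and weekly capacity limits.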

3.3. Stage 3: Explainability by Design: The User Interaction Layer

The concluding phase of the framework is the user interaction layer, which renders the structured data and optimized plans into an intuitive interface centered on the principle of explainability. The Scheduler Agent displays the optimized plan within an interactive calendar. Each scheduled item is an interactive element; clicking on it reveals a panel with the complete structured data and a direct link to the corresponding page in the source manual. This direct linkage from the plan to the source page eliminates ambiguity and allows for immediate verification. The calendar is supplemented by the Query Agent, which offers a conversational interface to the knowledge base. This agent utilizes a Retrieval-Augmented Generation (RAG) process [26]. When a user asks a question in natural language, the agent first queries the validated table to find relevant tasks. It then pulls surrounding text from the cited page in the source PDF to add context. Finally, it synthesizes this data into a clear, natural language answer that explicitly cites the source document and page. This dedication to “plan-to-page” traceability is a tangible application of XAI principles. By ensuring that all system-generated information has a verifiable audit trail to an authoritative source, the framework establishes the trust required for deployment in high-stakes operational settings. An example of the final user interface, demonstrating the “plan-to-page” traceability in both the chatbot and the calendar, is presented in Figure 2.
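As an illustration of this retrieve-then-cite pattern, the following minimal sketch indexes validated records in a ChromaDB collection and answers questions with explicit document-and-page citations. The collection name, prompt wording, and helper functions are assumptions introduced here; the sketch follows the publicly documented ChromaDB and Anthropic client interfaces rather than the project's own code.

```python
import anthropic
import chromadb

chroma = chromadb.PersistentClient(path="./kb_index")
collection = chroma.get_or_create_collection("validated_maintenance")
llm = anthropic.Anthropic()


def index_record(record_id: str, text: str, reference: str, page: int) -> None:
    """Index one validated record; the citation travels with it as metadata."""
    collection.add(ids=[record_id], documents=[text],
                   metadatas=[{"reference": reference, "page": page}])


def answer(question: str, n_results: int = 3) -> str:
    """Retrieve validated tasks, then ask the LLM to answer only from them,
    citing document and page so every claim stays traceable."""
    hits = collection.query(query_texts=[question], n_results=n_results)
    context = "\n\n".join(
        f"[{meta['reference']}, p. {meta['page']}] {doc}"
        for doc, meta in zip(hits["documents"][0], hits["metadatas"][0])
    )
    reply = llm.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": f"Answer using only the excerpts below and cite "
                       f"[document, page] for every statement.\n\n{context}\n\n"
                       f"Question: {question}",
        }],
    )
    return reply.content[0].text
```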

3.4. Technical Implementation Details

The framework is implemented using the following technologies and configurations.
The Parsing Agent utilizes PyMuPDF4LLM as its primary backend for layout-aware extraction, with a fallback to plain PyMuPDF. This produces per-page Markdown with preserved table structure. The agent automatically identifies maintenance sections by scanning the document’s Table of Contents (TOC) and outline. The Extractor Agent and Validator Agent both utilize Anthropic’s Claude 3.5 Haiku (claude-3-5-haiku-latest). This model was selected for its large 200 K token context window, high-speed inference, and cost-effectiveness. Extraction and validation parameters are set with max_tokens = 1500 and max_retries = 2.
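As a sketch of the TOC-driven targeting described above for the Parsing Agent, the snippet below locates likely maintenance sections via the document outline and converts only those pages to layout-aware Markdown. The keyword list and the page-range heuristic are assumptions for illustration, and the `pages` argument is used as documented by pymupdf4llm; this is not the project's parsing code.

```python
import fitz              # PyMuPDF
import pymupdf4llm

KEYWORDS = ("maintenance", "servicing", "inspection", "scheduled")


def maintenance_pages(pdf_path: str) -> list[int]:
    """Use the document outline (TOC) to locate likely maintenance sections.

    Returns 0-based page numbers; falls back to all pages if no TOC entry matches.
    """
    doc = fitz.open(pdf_path)
    toc = doc.get_toc()  # list of [level, title, 1-based start page]
    pages = set()
    for i, (_, title, start) in enumerate(toc):
        if any(k in title.lower() for k in KEYWORDS):
            # Take the section's pages up to the next TOC entry (or document end)
            end = toc[i + 1][2] - 1 if i + 1 < len(toc) else doc.page_count
            pages.update(range(start - 1, end))
    return sorted(pages) if pages else list(range(doc.page_count))


def parse_maintenance_sections(pdf_path: str) -> str:
    """Layout-aware Markdown for the selected pages only (tables preserved)."""
    return pymupdf4llm.to_markdown(pdf_path, pages=maintenance_pages(pdf_path))
```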
The framework uses the instructor (v1.6.2) library to bind LLM outputs to Pydantic (v2.6+) models. The core schema is a 13-field MaintenanceRecord model (detailed in pydantic_models.py) which includes runtime validators for normalizing time periods and ensuring valid ActivityTypeEnum values. The Validator Agent (from ai_validator.py) performs a cross-check by re-fetching the original PDF context (including ±N page neighbors) and applying hallucination guards (e.g., verifying that “6 months” is explicitly mentioned on the page before accepting a 6-month interval). All validation decisions are logged with a confidence score to logs/ai_validation_decisions.jsonl. The Scheduler Agent (from optimizator.py) uses a multi-objective heuristic search algorithm built with NumPy and Pandas to manage capacity constraints. The Query Agent indexes the final validated data into a ChromaDB (v0.4.22) vector store to power the RAG chatbot interface.
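The hallucination guard and audit logging can be illustrated with a short, self-contained sketch: an interval is accepted only if the same value and unit literally appear on the cited page, and every decision is appended to a JSONL audit trail. The field names, regular expressions, and log fields below are illustrative assumptions, not the contents of ai_validator.py.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("logs/ai_validation_decisions.jsonl")


def interval_is_grounded(record: dict, page_text: str) -> bool:
    """Hallucination guard: accept a time interval only if it is literally
    present on the cited page (e.g. '6 months' for a 6-month interval)."""
    period = record.get("time_period") or ""
    match = re.search(r"(\d+)\s*(month|week|day|year|hour)", period, re.I)
    if not match:
        return True  # nothing numeric to verify
    value, unit = match.groups()
    return re.search(rf"\b{value}\s*{unit}s?\b", page_text, re.I) is not None


def log_decision(record: dict, accepted: bool, confidence: float) -> None:
    """Append every validation decision to a JSONL audit trail."""
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with LOG_PATH.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "machinery": record.get("machinery"),
            "page": record.get("page"),
            "accepted": accepted,
            "confidence": confidence,
        }) + "\n")
```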

4. Empirical Validation and Results

4.1. Experimental Setup and Dataset

For validation, the framework was tested against a demanding, real-world corpus of intricate technical manuals. The initial, manually curated CSV file used as the input for this case study contained 162 unique maintenance protocols across 39 distinct machines. This dataset, from manufacturers like Kohler, Sanlorenzo, and Furuno, featured a range of formats and technical terms. The protocols were categorized into types such as ‘Maintenance’ (102 tasks), ‘Control’ (49 tasks), and ‘Replacement’ (11 tasks).

4.2. Schedulable Task Filtering

The initial 162-protocol knowledge base contained a mix of schedulable and non-schedulable tasks. To create the dataset for the optimization case study, a filter was applied to select only time-based interventions. This process excluded 18 ‘Every Use’ tasks and 42 tasks with ambiguous frequencies (e.g., ‘Not Specified’, ‘Periodically’, or ‘When Required’). The optimizer’s logic also identified tasks defined by numeric Operating_Hours (e.g., ‘250 h’) and converted them into day-based intervals. This filtering and conversion process resulted in the final schedulable dataset of 118 unique maintenance interventions, which are the subject of the following scheduling results. Table 2 shows two examples from this maritime case study, demonstrating the system’s capacity to capture varied tasks with precise requirements.
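A minimal sketch of this filtering and conversion logic is shown below. The 8 h/day utilization used to convert operating-hours intervals into days is an assumption introduced purely for illustration (the paper does not state its conversion factor), and the field names mirror the illustrative schema used earlier.

```python
import re
from typing import Optional

HOURS_PER_DAY = 8                    # assumed daily utilization; not stated in the paper
AMBIGUOUS = {"not specified", "periodically", "when required"}


def interval_days(record: dict) -> Optional[int]:
    """Return a day-based interval for schedulable tasks, or None to exclude the task."""
    if record.get("every_use_flag"):
        return None                  # 'Every Use' tasks are not calendar-schedulable
    hours = record.get("operating_hours")
    if hours:                        # e.g. '250 h' -> ~31 days at the assumed 8 h/day
        m = re.search(r"\d+", str(hours))
        if m:
            return max(1, round(int(m.group()) / HOURS_PER_DAY))
    period = (record.get("time_period") or "").strip().lower()
    if not period or period in AMBIGUOUS:
        return None                  # ambiguous frequencies are excluded
    if "dail" in period:             # 'Daily'
        return 1
    units = {"day": 1, "week": 7, "month": 30, "year": 365}
    m = re.search(r"(\d+)?\s*(day|week|month|year)", period)
    if m:
        return int(m.group(1) or 1) * units[m.group(2)]
    return None


# Example: keep only schedulable records
# schedulable = [r for r in records if interval_days(r) is not None]
```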

4.3. Scheduling Performance and Visual Analytics

The optimized maintenance plan was generated by the scheduling engine based on the validated 118-task dataset. The heuristic was configured with a daily capacity of 6 tasks and a weekly capacity of 28 tasks to simulate a realistic operational tempo. The quantitative results of the optimization are presented in Table 3. The engine successfully scheduled all 118 interventions without violating any capacity constraints. The peak daily load reached but never exceeded the 6-task limit, and the peak weekly load was 22 tasks, well below the 28-task ceiling. To achieve this, the optimizer deferred 46 tasks by a mean of only 2.0 days and advanced 8 non-critical tasks by an average of 1.9 days to pre-empt bottlenecks.
The analytics layer provides complementary visuals to interpret these results. Figure 3 visually quantifies the optimizer’s adjustments, showing the distribution of schedule deviations. The large central bar confirms the majority of tasks (64) were scheduled on their exact nominal date. The color-coded tails show the controlled nature of the adjustments: 8 tasks were advanced (green) by an average of 1.9 days, while 46 were deferred (red) by an average of 2.0 days, demonstrating that the heuristic avoids significant delays.
Figure 4 shows the daily workload heatmap. The absence of any cells with a value greater than 6 (the daily limit) provides clear visual confirmation that the heuristic successfully respected all operational constraints. The busiest date (23 September) is clearly visible with 6 tasks, while the later weeks show a lighter load as the optimizer pushed non-critical tasks into these periods.
Finally, Figure 5 plots the final optimized date against the original nominal target for every task. The tasks on the central diagonal line represent the 64 tasks executed exactly on-time. The short, dashed connectors for most other tasks visualize the minimal deferrals (average 2.0 days) introduced by the optimizer to respect capacity. The color-coding (criticality) confirms that high-priority tasks (red/pink) remained close to their nominal dates.

4.4. Qualitative User Feedback

In the user-facing application, the two attending planners and one technician reported that the automated schedule generated by the system provided a high-quality baseline, reducing the time required for manual planning. The “plan-to-page” traceability feature was positively received, with the technician highlighting the value of being able to instantly access the source page from a tablet while performing a task in the field. The Query Agent successfully answered most test queries with accurate, cited information, proving useful for ad hoc troubleshooting and for training new personnel by providing immediate, verifiable answers to specific questions.

5. Discussion

The successful application of the multi-agent framework to the complex domain of maritime equipment maintenance highlights several implications for industrial AI. The system’s core design, emphasizing self-validation and “plan-to-page” traceability, directly confronts the challenge of trust in automated systems. In high-stakes environments, the inability to verify the provenance of a decision is a barrier to adoption. By ensuring every scheduled task is verifiably linked to an authoritative source document, the framework moves beyond being a “black box” to become a transparent and auditable partner in the maintenance process. The validation of the system’s outputs by domain experts confirmed the high accuracy and practical relevance of the extracted data. This feedback underscores the system’s value not as a replacement for human expertise, but as a tool for augmenting it, freeing skilled personnel from tedious data transcription to focus on higher-level planning. The integration of simulation-driven scheduling further distinguishes this work, empowering organizations to move beyond static planning towards a proactive and resilient maintenance strategy. By stress-testing optimized plans against potential real-world disruptions, the system allows planners to anticipate bottlenecks and quantify risks. This represents a shift from the reactive approach to maintenance management that is common in many industries.

5.1. Practical and Industrial Implications

Beyond the academic contributions, the framework has direct practical applications for industrial asset management. For a yacht fleet operator, this system translates unstructured data (manuals) into direct financial and safety value. First, it reduces manual planning overhead, automating thousands of hours of data entry and validation. Second, as shown in Figure 2, it empowers field technicians by providing a “plan-to-page” link on a tablet, eliminating ambiguity and reducing human error during a task. Third, the simulation-driven scheduling, visualized in the results (Figure 3, Figure 4 and Figure 5), improves operational resilience, minimizing costly downtime by ensuring that optimized plans are robust to real-world disruptions. This provides a clear pathway for moving from a reactive or preventive maintenance posture to a truly predictive and resilient strategy. This directly translates to financial value by reducing unplanned equipment downtime, a critical cost driver in the maritime sector, and by optimizing technician crew allocation, preventing unnecessary overtime costs.

5.2. Limitations and Future Work

While the results are promising, the current framework has limitations. First, the document intelligence pipeline is primarily focused on textual information and does not yet interpret complex technical diagrams or schematics; as a consequence, critical information found only in schematics, such as torque values, safety warnings, or part numbers, is currently missed, necessitating a final manual verification step. Second, the system was validated on a single, albeit complex, domain, and its generalizability to other industries such as aerospace or energy would require further testing; the framework’s robustness to different documentation standards, which may use different terminology and layouts, has not yet been validated. Third, the simulation models, while effective, are based on generalized distributions for disruptions and would benefit from being calibrated with historical, domain-specific data on equipment failure rates and supply chain performance; as a result, the simulation currently provides a qualitative measure of resilience (e.g., “this plan is more robust”) rather than a quantitative prediction (e.g., “this plan has a 10% risk of failure”). Fourth, the framework successfully extracted and validated 118 maintenance tasks, with 100% retention of meaningful tasks post-validation; however, a formal quantitative accuracy study (precision, recall, F1-score) of the Extractor and Validator agents was not conducted. Such an assessment would require ground-truth labels from domain experts manually reviewing all extracted records, and it remains an important direction for future work, particularly to assess performance across different document types and languages. Fifth, the heuristic weights ($\alpha$, $\beta$, $\gamma$) for the objective function in Equation (3) were set empirically based on planner feedback; a formal sensitivity analysis was not performed to explore the trade-offs between different business goals (e.g., minimizing maximum lateness vs. minimizing average lateness). Sixth, the study presents an integrated system but does not include an ablation study to quantitatively isolate the contribution of each component (e.g., the improvement in data quality from the Validator Agent, or the impact of simulation-in-the-loop versus a post-hoc stress test). Seventh, the qualitative feedback (Section 4.4) was gathered from informal demonstrations with three domain experts (two planners, one technician), not a structured user study; metrics on task time reduction, usability, or decision quality were not formally collected. Eighth, the proposed framework integrates extraction, validation, and resilience-aware scheduling, but comparative benchmarks against alternative architectures are absent, particularly:
  • Rule-based extraction systems with heuristic scheduling.
  • Knowledge Graph approaches (e.g., Neo4j-based maintenance ontologies).
  • Digital Twin frameworks with physics-based simulation.
  • Commercial platforms (e.g., SAP Predictive Maintenance, IBM Maximo).
  • Competing LLM models (GPT-5, Claude 4.5, locally hosted open models via Ollama).
Such comparisons would strengthen the empirical validation and position the work within the broader competitive landscape. Finally, deploying LLMs for safety-critical maintenance tasks requires a rigorous risk analysis. The current framework acknowledges hallucination risk (mitigated by the Validator Agent) but does not yet include a formal human-in-the-loop (HITL) verification policy for high-criticality extractions, which would be essential before any real-world deployment.
Future work will focus on addressing these limitations. The research team plans to integrate multi-modal models capable of interpreting technical diagrams to enrich the knowledge base. The framework will also be extended to other industrial domains to validate its generalizability. Finally, methods for integrating real-time sensor data will be explored to enable a shift from preventive to condition-based maintenance, where scheduling is informed not only by manuals but by the actual health of the equipment.

6. Conclusions

This work presented a multi-agent AI framework that transforms unstructured maintenance documentation into a structured, queryable, and auditable knowledge base, and demonstrated its application to a real-world maritime case study. The primary contribution of this work is the design and validation of a synergistic, end-to-end framework that bridges the gap from unstructured document intelligence to resilient, simulation-driven scheduling. Unlike prior work, which tackles these problems in isolation, this framework guarantees “plan-to-page” traceability throughout the entire pipeline, establishing the trust and auditability required for high-stakes industrial applications. Applied to the Sanlorenzo SD132#150 superyacht, the system processed a manually curated knowledge base of 162 protocols to extract a schedulable set of 118 maintenance tasks. The scheduling engine generated an optimized 12-week maintenance plan that respected all capacity constraints (6 tasks/day, 28 tasks/week). The scheduling engine achieved a 54.2% on-time execution rate, deferring 46 tasks by only 2.0 days on average (and a maximum of 5 days) to smooth the workload and avoid bottlenecks, as validated in Table 3 and Figure 3.
The framework’s design prioritizes transparency and traceability. Every task in the generated schedule is hyperlinked to its source page in the original documentation, enabling technicians to verify procedures instantly via tablet interfaces. The Query Agent provides cited, on-demand answers to ad hoc technical questions, democratizing access to maintenance knowledge and reducing dependency on expert availability. Visual analytics—including the deviation histogram, daily workload heatmap, and schedule alignment plot—provide planners with interpretable decision support, making capacity bottlenecks and schedule adjustments immediately visible.
The results validate the viability of LLM-driven document intelligence for industrial applications but also highlight important directions for future work. Extensions include integrating multi-modal models to interpret technical diagrams, validating generalizability across other sectors (aerospace, energy), incorporating real-time sensor data for condition-based maintenance, and embedding constraint solvers to handle complex resource allocation (technician skills, tool availability). These enhancements would advance the system from a robust preventive scheduler to a continuously learning, resilient, and explainable decision-support platform that remains auditable by design, laying the foundation for a new generation of intelligent maintenance systems that are not just automated but also transparent, reliable, and strategically aligned with organizational goals.

Author Contributions

Conceptualization, L.C., M.G., M.M., X.S. and V.S.; methodology, L.C., M.G., M.M., X.S. and V.S.; software, L.C., M.G., M.M., X.S. and V.S.; writing original draft preparation, L.C., M.G., M.M., X.S. and V.S.; writing review and editing, L.C., M.G., M.M., X.S. and V.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to industrial confidentiality.

Acknowledgments

The authors wish to acknowledge Antonio Martella for his contribution to the development of this research and framework. His profound expertise and background in yacht maintenance and the broader marine sector provided critical insights that greatly enhanced the quality and relevance of this work.

Conflicts of Interest

Author M.M. is employed by the company S4F SIM4Future. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Carvalho, T.P.; Soares, F.A.; Vita, R.; Francisco, R.D.P.; Basto, J.P.; Alcalá, S.G. A systematic literature review of machine learning methods applied to predictive maintenance. Comput. Ind. Eng. 2019, 137, 106024.
2. Stone, P.; Veloso, M. Multiagent Systems: A Survey from a Machine Learning Perspective. Auton. Robot. 2000, 8, 345–383.
3. Tako, A.A.; Robinson, S. The application of discrete event simulation and system dynamics in the logistics and supply chain context. Decis. Support Syst. 2012, 52, 802–815.
4. Gandomi, A.; Haider, M. Beyond the hype: Big data concepts, methods, and analytics. Int. J. Inf. Manag. 2015, 35, 137–144.
5. Bokrantz, J.; Skoogh, A.; Berlin, C.; Wuest, T.; Stahre, J. Smart maintenance: A research agenda for industrial maintenance management. J. Manuf. Syst. 2020, 56, 176–200.
6. Zonta, T.; da Costa, C.A.; da Rosa Righi, R.; de Lima, M.J.; da Trindade, E.S.; Li, G.P. Predictive maintenance in the Industry 4.0: A systematic literature review. Comput. Ind. 2020, 123, 103289.
7. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All You Need. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 5998–6008.
8. Zheng, Y.; Guo, Y.; Luo, Z.; Yu, Z.; Wang, K.; Zhang, H.; Zhao, H. A Survey on Document-Level Relation Extraction: Methods and Applications. In Proceedings of the 3rd International Conference on Internet, Education and Information Technology (IEIT 2023), Xiamen, China, 28–30 April 2023; pp. 1061–1071.
9. Xu, Y.; Li, M.; Cui, L.; Huang, S.; Wei, F.; Zhou, M. LayoutLM: Pre-training of Text and Layout for Document Image Understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 1192–1200.
10. Marinai, S.; Gori, M.; Soda, G. Artificial neural networks for document analysis and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 23–35.
11. Shen, Y.; Yu, J.; Wang, P.; Zhang, R.; Liu, T. Large Language Models for Document-Level Information Extraction: A Survey. arXiv 2024, arXiv:2402.17930.
12. Dorri, A.; Kanhere, S.S.; Jurdak, R. Multi-Agent Systems: A Survey. IEEE Access 2018, 6, 28573–28593.
13. Ouelhadj, D.; Petrovic, S. A survey of dynamic scheduling in manufacturing systems. J. Sched. 2009, 12, 417–431.
14. Hussain, M.S.; Ali, M. A Multi-agent Based Dynamic Scheduling of Flexible Manufacturing Systems. Glob. J. Flex. Syst. Manag. 2019, 20, 267–290.
15. Dekker, R. Applications of maintenance optimization models: A review and analysis. Reliab. Eng. Syst. Saf. 1996, 51, 229–240.
16. Bruzzone, A.G.; Sinelshchikov, K.; Gotelli, M.; Monaci, F.; Sina, X.; Ghisi, F.; Cirillo, L.; Giovannetti, A. Machine Learning and Simulation Modeling Large Offshore and Production Plants to improve Engineering and Construction. Procedia Comput. Sci. 2025, 253, 3318–3324.
17. Ruiz Rodríguez, M.L.; Kubler, S.; de Giorgio, A.; Cordy, M.; Robert, J.; Le Traon, Y. Multi-agent deep reinforcement learning based predictive maintenance on parallel machines. Robot. Comput. Integr. Manuf. 2022, 78, 102406.
18. Jennings, N.R. An agent-based approach for building complex software systems. Commun. ACM 2001, 44, 35–41.
19. He, Y.; Yang, Y.; Wang, M.; Zhang, X. Resilience Analysis of the Container Port Shipping Network Structure: A Case Study of China. Sustainability 2022, 14, 9489.
20. Bruzzone, A.G.; Massei, M.; Gotelli, M.; Giovannetti, A.; Martella, A. Sustainability, Environmental Impacts and Resilience of Strategic Infrastructures. In Proceedings of the International Workshop on Simulation for Energy, Sustainable Development and Environment, SESDE, Athens, Greece, 18–20 September 2023.
21. Ivanov, D.; Dolgui, A. Viability of intertwined supply networks: Extending the supply chain resilience angles towards survivability. Int. J. Prod. Res. 2020, 58, 2904–2915.
22. Tordecilla, R.D.; Juan, A.A.; Montoya-Torres, J.R.; Quintero-Araujo, C.L.; Panadero, J. Simulation–optimization methods for designing and assessing resilient supply chain networks under uncertainty scenarios: A review. Simul. Model. Pract. Theory 2021, 106, 102172.
23. Moosavi, S.; Zanjani, M.; Razavi-Far, R.; Palade, V.; Saif, M. Explainable AI in Manufacturing and Industrial Cyber–Physical Systems: A Survey. Electronics 2024, 13, 3497.
24. Guidotti, R.; Monreale, A.; Turini, F.; Giannotti, F.; Pedreschi, D. A survey of methods for explaining black box models. ACM Comput. Surv. 2018, 51, 93.
25. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
26. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Advances in Neural Information Processing Systems 33; Curran Associates, Inc.: Red Hook, NY, USA, 2020; pp. 9459–9474.
27. Cummins, C.; Somers, F.; Delaney, A.; Bruton, K. Explainable predictive maintenance: A survey of current methods, challenges, and opportunities. arXiv 2024, arXiv:2401.07871.
28. Pinedo, M.L. Scheduling: Theory, Algorithms, and Systems, 6th ed.; Springer: Cham, Switzerland, 2022.
Figure 1. System architecture of the multi-agent framework for intelligent maintenance. The pipeline transforms unstructured PDF manuals into a validated knowledge base through a sequence of specialized AI agents. The frontend application leverages this trusted data to provide a simulation-driven scheduling calendar and a RAG-powered chatbot. The principle of “Plan-to-Page Traceability” ensures every output is directly verifiable against the source document.
Figure 2. The Yacht Maintenance Assistant User Interface. The dashboard integrates a conversational AI assistant (left) with an interactive maintenance calendar (right). Both the AI’s responses and the scheduled tasks demonstrate the core “plan-to-page” traceability by providing direct, verifiable citations to the source manuals.
Figure 3. Distribution of Schedule Deviations (Nominal vs. Optimized). This histogram shows the number of tasks scheduled on-time (blue, 64 tasks), advanced (green, 8 tasks), or delayed (red, 46 tasks). The plot visually confirms that most tasks were on-time and deviations were minor.
Figure 4. Optimized Daily Workload Heatmap (Capacity Limit: 6 tasks/day). The color scale is locked to the daily limit (yellow = 6 tasks). The absence of any “hotter” colors provides visual confirmation that the 6-task daily capacity was never breached.
Figure 5. Optimized Schedule Versus Nominal Targets. The vertical axis displays the Task Index (denoted by #), representing the unique identifier for each of the 118 maintenance tasks sorted by their optimized date. Deviation lines are color-coded: tasks advanced (green) or delayed (red) by the optimizer to respect capacity. Tasks on the central line were executed on their exact nominal date.
Table 1. Structure of the Validated Maintenance Table.
Machinery: The specific name or model of the equipment.
Component: The sub-component or part of the machinery being maintained.
Maintenance Description: A clear, verb-first description of the required maintenance action.
Activity Type: The categorical nature of the task (e.g., Control, Maintenance, Replacement, Other).
Operating Hours: The maintenance interval defined in terms of machinery operating hours.
Time Period: The calendar-based interval (e.g., Daily, Weekly, Monthly, Yearly).
Every Use Flag: A boolean flag for tasks to be performed before or after each use.
Reference: The filename of the source PDF document.
Page: The precise page number within the source document.
Necessary Material: A comma-separated list of required tools, parts, or materials.
Operator: The designated role or qualification for the person performing the task.
Note: Any supplementary notes, warnings, or crucial instructions.
Table 2. Examples of Curated Data from the Validated Knowledge Base.
Machinery | Component | Maintenance Description | Activity Type | Necessary Material | Time Period | Operator | Note
Calpeda pump | Pump Body | Rinse with clean water to remove deposits | Maintenance | N.A. 1 | Before each use | User | Briefly run the pump with clean water to remove accumulated deposits.
Gaggenau CI292 | Silicone Seal | Remove and inspect silicone seal around cooktop | Maintenance | Suitable removal tool | Each 12 months | Authorized personnel | Use suitable tool to remove seal carefully.
1 N.A.: Not Applicable.
Table 3. Summary of Optimization Scheduling Results for the Case Study.
Total Schedulable Tasks: 118
Schedule Adherence
Tasks On-Time (Executed on nominal date): 64 (54.2%)
Tasks Advanced (Executed early): 8 (6.8%)
Tasks Deferred (Executed late): 46 (39.0%)
Deviation Metrics
Average Deferral (for late tasks): 2.0 days
95th Percentile Deferral: 4.8 days
Max Deferral: 5 days
Average Advancement (for early tasks): 1.9 days
Workload & Capacity
Daily Capacity Limit: 6 tasks
Peak Daily Load (Max tasks in one day): 6 tasks
Days Exceeding Daily Capacity: 0
Weekly Capacity Limit: 28 tasks
Peak Weekly Load (Busiest week): 22 tasks
Weeks Exceeding Weekly Capacity: 0
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

