Article
Peer-Review Record

LLM-Augmented Algorithmic Management: A Governance-Oriented Architecture for Explainable Organizational Decision Systems

by Nikolay Hinov 1,2,* and Maria Ivanova 3
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 29 January 2026 / Revised: 28 February 2026 / Accepted: 6 March 2026 / Published: 10 March 2026

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper presents a governance-oriented architecture for integrating Large Language Models (LLMs) into algorithmic management systems. The authors propose a three-layer framework consisting of an algorithmic decision core that performs computational decisions, an LLM-based cognitive interface that generates natural language explanations and supports dialogue, and a verification and governance layer that enforces policy constraints, maintains audit trails, and enables human-in-command oversight. The architecture is motivated by the need to address opacity in algorithmic management while avoiding new risks introduced by LLMs, such as hallucinated rationales, bias amplification, and automation dependence.

Detailed comments:

- The conceptual nature of the contribution presents both strengths and limitations. The authors acknowledge in Section 3 that their work is primarily conceptual rather than empirical, and they describe the simulation as "demonstrative" rather than evaluative. While conceptual contributions certainly have value in establishing design principles and architectural patterns, the paper would be significantly strengthened by more extensive discussion of validation pathways. The simulation uses synthetic data with simplified validity metrics, which fundamentally limits the ability to assess whether the architecture would actually deliver on its governance promises in real organizational settings. The explanation validity metric described in Section 6.2 is synthetic and not grounded in actual verification of LLM outputs against source documents. Similarly, the constraint satisfaction rate of 94.2% is presented without discussion of what happens to the failed cases or whether this failure rate would be acceptable across different risk contexts. The latency trade-off analysis shown in Figure 6 reveals only weak correlation between validity and latency, yet the paper does not explore what this implies for governance design decisions.

- The paper would benefit from deeper engagement with existing literature on explainable AI architectures and verification frameworks, particularly those addressing privacy and security concerns. While the authors position their work within the algorithmic management literature and reference governance instruments, they miss opportunities to connect with related architectural approaches that have grappled with similar challenges of explainability, verification, privacy preservation, and security in different domains. The integration of LLMs into decision systems raises critical security and privacy questions that have been addressed in adjacent research areas, yet the paper does not systematically engage with this body of work to inform its governance layer design. I therefore suggest considering, among others, the following articles:

[1] Ferrag, Mohamed Amine, et al. "From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows." arXiv preprint arXiv:2506.23260 (2025).
[2] Breve, Bernardo, et al. "Towards Explainable Security for ECA Rules." In: EMPATHY@ AVI. 2022. p. 26-30.
[3] Bao, Aorigele, and Yi Zeng. "Understanding the dilemma of explainable artificial intelligence: a proposal for a ritual dialog framework." Humanities and Social Sciences Communications 11.1 (2024): 1-9.

- The paper would benefit from deeper consideration of how the proposed metrics would be operationalized in practice, what potential failure modes exist beyond those already identified, and how organizations would calibrate governance controls without empirical baselines. The simulation also does not model human oversight delays, escalation costs, or the possibility that verification itself might fail. These are not merely technical details but central to understanding whether the architecture can function as intended in real organizational environments.

- The implementation challenges deserve more thorough treatment throughout the paper. The architecture assumes several capabilities that are far from trivial to realize in practice. For example, the concept of policy-as-code is referenced repeatedly, yet the paper provides limited guidance on how complex, ambiguous, or context-dependent organizational policies would be formalized. Many real-world policies involve judgment, interpretation, and implicit knowledge that may resist straightforward encoding. How would the system handle policies that require contextual interpretation?

- Similarly, while the authors propose retrieval-augmented generation (RAG) for grounding explanations in authoritative sources, they do not adequately address the known limitations of RAG systems. These include retrieval failures when relevant information is not properly indexed, context window constraints that limit how much retrieved information can be incorporated, contradictory sources that provide conflicting guidance, and the fundamental challenge of determining source authority in organizational contexts where documentation may be outdated, incomplete, or contradictory. The paper assumes a relatively clean information environment, but most organizations struggle with fragmented, inconsistent, and evolving documentation.

- The constraint checking mechanism is presented as straightforward automated validation, but this glosses over significant complexity. How would the system handle partial constraint satisfaction, where some but not all requirements are met? What about constraint conflicts, where satisfying one rule necessarily violates another? How would temporal dependencies be managed, where constraints apply differently based on timing or sequencing? Perhaps most importantly, many organizational constraints require interpretation rather than mechanical checking. The boundary between constraints that can be automatically verified and those requiring human judgment is not clearly delineated in the paper.

With these revisions, the paper could make a strong contribution to the emerging literature on responsible AI deployment in organizational contexts.

Author Response

Response to Reviewer 1

We sincerely thank the reviewer for the detailed, constructive, and technically grounded feedback. We appreciate the recognition of the manuscript’s motivation and architectural intent, and we agree that a governance-oriented contribution must be especially careful in (i) delimiting what is validated, (ii) articulating credible validation pathways, and (iii) operationalising practical failure handling rather than assuming idealised conditions. In the revised manuscript, we substantially strengthened the research design framing, expanded the simulation interpretation and failure routing, deepened the discussion of operational metrics and calibration, and enhanced the treatment of implementation challenges (policy-as-code, RAG limitations, constraint checking complexity). We also expanded the Related Work synthesis to engage more directly with adjacent literature on LLM security, explainable security, and dialogical XAI as recommended.

Below we respond to each comment in detail.

Comment 1 (Conceptual contribution, validation pathways, synthetic metrics, failed cases, weak latency–validity correlation)

Reviewer comment (summary): The conceptual nature is acknowledged, but validation pathways are underdeveloped; “validity” is synthetic and not grounded in document verification; CSR is reported without discussing failed cases or acceptability by risk context; latency–validity correlation is weak but implications are not explored.

Response: We strongly agree, and we have revised the manuscript to make the scope and validation logic both clearer and more operationally meaningful.

  1. Clarified validation scope and pathways (conceptual vs. empirical).
    We expanded the research design and methodology to explicitly position the work as a design-oriented architectural artefact and to specify what forms of validation are appropriate at this stage (architectural plausibility, control placement logic, operational trade-off illustration), as well as what requires future empirical evaluation (semantic correctness, organisational outcomes, compliance effectiveness, behavioural effects). We added explicit assumptions to delimit claims and to prevent over-interpretation of the demonstrative simulation.

  2. Made the synthetic “validity” metric explicit as a proxy and bounded indicator.
    We revised the simulation section to clarify that the explanation-validity score is a synthetic proxy intended to represent “consistency under assumed conditions,” not semantic truth. We now explicitly state that real-world validity would require domain evaluation protocols (e.g., expert review and evidence-consistency checking against authoritative sources), and we reflect this in the limitations and future research agenda.

  3. Explicit handling of failed cases and risk-tier acceptability.
    We revised the results to state clearly that the 7 failed events are routed to human review and not executed, and we added interpretation that the acceptability of a 5.8% routing rate depends on decision risk, oversight capacity, and false-positive tolerance. This also reinforces the paper’s governance thesis: exception handling is a first-class workflow, not a footnote.

  4. Interpreted the weak latency–validity relationship as a governance design insight.
    We extended the latency–validity discussion to explain that weak correlation implies latency cannot be used as a proxy for explanation quality or governance robustness. We explicitly derive a governance implication: multi-metric gating (constraint status + grounding/validity + uncertainty indicators) is preferable to latency thresholds alone. We also note that real deployments may exhibit different relationships depending on retrieval depth, explanation length, and verification strictness.
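The multi-metric gating idea in item 4 can be sketched as follows. This is only an illustration under stated assumptions: the names `GateInputs` and `release_decision` and both threshold values are hypothetical and do not come from the manuscript. The point it shows is that latency is recorded but never used as a release criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInputs:
    constraints_ok: bool   # result of the policy/constraint check
    validity: float        # grounding/validity proxy in [0, 1]
    uncertainty: float     # uncertainty indicator in [0, 1]
    latency_ms: float      # recorded for monitoring only, never gated on


def release_decision(x: GateInputs,
                     min_validity: float = 0.8,
                     max_uncertainty: float = 0.3) -> str:
    """Return 'release' or 'human_review'. Thresholds are illustrative."""
    if not x.constraints_ok:
        return "human_review"
    if x.validity < min_validity:
        return "human_review"
    if x.uncertainty > max_uncertainty:
        return "human_review"
    return "release"


# A fast but weakly grounded explanation is held back,
# while a slower, well-grounded one is released.
fast_but_weak = GateInputs(True, validity=0.55, uncertainty=0.10, latency_ms=90.0)
slow_but_sound = GateInputs(True, validity=0.92, uncertainty=0.05, latency_ms=140.0)
print(release_decision(fast_but_weak))   # human_review
print(release_decision(slow_but_sound))  # release
```

In a real deployment the thresholds would presumably be set per risk tier rather than fixed as defaults.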

Manuscript changes: Expanded Section 3 (research design/assumptions), Section 6.2–6.5 (metrics formalisation, failed-case routing, correlation interpretation), and Section 9.3 (limitations), with explicit links to Section 10 (future research).

Comment 2 (Need deeper engagement with explainable AI architectures and verification frameworks; suggested citations)

Reviewer comment (summary): The paper should connect to adjacent architectural approaches addressing explainability, verification, privacy, and security. Suggested: Ferrag et al. (LLM agent workflow threats), Breve et al. (explainable security for ECA rules), Bao & Zeng (ritual dialog framework).

Response: We appreciate these highly relevant pointers and agree that they strengthen the architectural grounding of our governance layer.

  • We expanded the Related Work and synthesis to connect our design to security-driven architectural separations, especially where LLM interfaces become attack surfaces (e.g., prompt injection leading to tool misuse / protocol exploits). This directly supports our argument that the governance layer is not merely compliance overhead but also a security boundary mediating access and release.

  • We integrated the explainable-security perspective (ECA rule systems) as an architectural analogue for transparent rule execution, auditability, and interaction effects between constraints—paralleling our emphasis on constraint checking + provenance.

  • We incorporated the dialogical/ritual view of explainability to reinforce the framing that explanation is not merely a text output but a structured interaction requiring contestability, escalation pathways, and interpretive accountability—precisely what the oversight workflow is designed to support.

Manuscript changes: Strengthened Section 2.2 (security/agent-workflow risks), Section 2.4 (synthesis and architectural implications), and reinforced these linkages in Sections 4–5 (governance layer as verification/security boundary).

Comment 3 (Operationalising metrics; failure modes; calibration without baselines; simulation omits oversight delays, escalation costs, verification failure)

Reviewer comment (summary): The paper should better explain how metrics would be operationalised, explore additional failure modes, and discuss how governance controls would be calibrated without empirical baselines. Oversight delays and verification failures are not modeled.

Response: We fully agree that these aspects are central to governance feasibility. We addressed this by:

  • Decomposing augmented latency into explanation overhead and governance/verification overhead in the metric definitions and interpretation, making the “cost of governance” explicit rather than implicit.

  • Expanding failure handling to include not only decision-constraint failure but also the possibility that verification itself may fail (false positives/false negatives/inconclusive results), and stating that such outcomes should route to review rather than force a binary “pass/fail” illusion.

  • Expanding limitations to explicitly acknowledge that we do not model: reviewer workload, queue dynamics, escalation delays, disagreement among reviewers, and verification error rates.

  • Adding calibration language: acceptable latency overhead and routing rates must be set by decision risk and organisational tolerance; empirical baselines are therefore a key direction for future work rather than an implicit assumption.

Manuscript changes: Section 6.2–6.5 strengthened; Section 9.3 expanded; Section 10 explicitly tied to limitations.

Comment 4 (Implementation challenges: policy-as-code; complex/ambiguous policies; contextual interpretation)

Reviewer comment (summary): Policy-as-code is not trivial; many policies require judgment; the paper should explain how contextual interpretation is handled.

Response: We agree, and we revised the manuscript to treat policy formalisation as a spectrum, not an assumption of full codifiability. Specifically, we now distinguish:

  • Machine-checkable constraints (thresholds, role segregation, eligibility rules, approval chains)

  • Semi-formal constraints requiring structured interpretation (conditional clauses, exceptions, contextual dependencies)

  • Judgment-heavy policies requiring human adjudication (ethical trade-offs, nuanced compliance interpretations)

We clarified that the governance layer should support hybrid verification: automatic checks where possible, and evidence surfacing + human-in-command escalation where policy interpretation cannot be mechanised. This directly aligns with the paper’s authority-separation principle.
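One way to picture this hybrid verification is a dispatcher keyed on the policy tier. The sketch below is an assumption-laden illustration, not the manuscript's implementation: `PolicyTier`, `verify`, the returned labels, and the overtime rule are all hypothetical.

```python
from enum import Enum
from typing import Callable, Optional


class PolicyTier(Enum):
    MACHINE_CHECKABLE = "machine_checkable"   # thresholds, eligibility rules
    SEMI_FORMAL = "semi_formal"               # conditional clauses, exceptions
    JUDGMENT_HEAVY = "judgment_heavy"         # ethical trade-offs


Check = Callable[[dict], bool]


def verify(tier: PolicyTier, decision: dict, check: Optional[Check] = None) -> str:
    """Pick a verification mode for one policy applied to one decision."""
    if tier is PolicyTier.MACHINE_CHECKABLE and check is not None:
        return "auto_pass" if check(decision) else "auto_fail"
    if tier is PolicyTier.SEMI_FORMAL:
        # The check only pre-screens; a human still confirms the reading.
        flagged = check is not None and not check(decision)
        return "human_review_flagged" if flagged else "human_review"
    # Judgment-heavy policies, or machine-checkable ones with no encoded rule.
    return "escalate_to_human"


def overtime_within_limit(decision: dict) -> bool:
    # Hypothetical rule: at most 10 overtime hours per week.
    return decision["overtime_hours"] <= 10


case = {"overtime_hours": 12}
print(verify(PolicyTier.MACHINE_CHECKABLE, case, overtime_within_limit))  # auto_fail
print(verify(PolicyTier.SEMI_FORMAL, case, overtime_within_limit))        # human_review_flagged
print(verify(PolicyTier.JUDGMENT_HEAVY, case))                            # escalate_to_human
```

The design choice illustrated here is that only the first tier can ever produce an automatic verdict; the other two always end with a person.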

Manuscript changes: Expanded implementation realism discussion in the governance/verification sections and reinforced in limitations/future work.

Comment 5 (RAG limitations: retrieval failure, context limits, contradictions, source authority)

Reviewer comment (summary): The paper assumes a clean info environment; RAG has limitations; organisations have fragmented documentation; authority determination is hard.

Response: We strongly agree. We revised the manuscript to acknowledge that RAG improves grounding but does not guarantee correctness, and we explicitly added the following limitations and governance needs:

  • Retrieval failures due to incomplete indexing and poor document hygiene

  • Context window constraints affecting evidence inclusion

  • Contradictory or stale sources requiring authority resolution

  • The need for source curation, versioning, authority ranking, and retrieval monitoring as governance primitives rather than engineering afterthoughts

  • Explicit uncertainty handling (e.g., “insufficient evidence found” as a governed state that triggers review)
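A minimal sketch of how authority ranking, staleness, and the "insufficient evidence" state listed above might fit together. It assumes each retrieved document carries an integer authority rank, a validity date, and a one-word stance; the `Source` type and `resolve_evidence` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Source:
    doc_id: str
    authority: int      # higher means more authoritative (assumed ranking)
    valid_until: date   # the document counts as stale after this date
    stance: str         # what the document says, e.g. "allowed"


def resolve_evidence(sources: List[Source], today: date) -> Tuple[str, List[Source]]:
    """Return a governed state plus the sources that support it."""
    current = [s for s in sources if s.valid_until >= today]
    if not current:
        return "insufficient_evidence", []      # triggers review
    top = max(s.authority for s in current)
    best = [s for s in current if s.authority == top]
    if len({s.stance for s in best}) > 1:
        return "conflicting_sources", best      # triggers review
    return "grounded", best


today = date(2026, 3, 1)
sources = [
    Source("hr-policy-v1", 2, date(2025, 12, 31), "allowed"),    # stale
    Source("hr-policy-v2", 2, date(2026, 12, 31), "forbidden"),
    Source("team-wiki", 1, date(2026, 12, 31), "allowed"),       # lower rank
]
state, used = resolve_evidence(sources, today)
print(state, [s.doc_id for s in used])  # grounded ['hr-policy-v2']
```

Note that this handles only stale and contradictory sources; retrieval failures and context-window limits would need separate monitoring.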

Manuscript changes: Strengthened Sections 2.2/2.4 (risks + implications), Sections 4–5 (governance mechanisms), and Section 9.3 (limitations), with follow-on future directions.

Comment 6 (Constraint checking complexity: partial satisfaction, conflicts, temporal dependencies, interpretive constraints; boundary between automated vs human judgment unclear)

Reviewer comment (summary): Constraint checking is not straightforward; need to address partial satisfaction, conflicts, temporal constraints, and interpretive constraints; delineate what can be automated.

Response: We agree and revised the constraint-checking discussion to be more realistic and governance-aligned. We now clarify that constraint outcomes should be classified beyond binary pass/fail, for example:

  • Pass

  • Conditional pass (requires additional evidence or approval)

  • Fail

  • Inconclusive / unresolved (insufficient evidence; conflicting rules; ambiguity)

We also state that conflict resolution and temporal dependencies require explicit precedence logic and escalation policies, with audit logging. Most importantly, we explicitly delineate machine-verifiable vs human-interpreted constraints, and we position the boundary as context-dependent and risk-tier dependent.
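The four-way classification can be illustrated with a small sketch. The severity ordering used to combine several constraint results and the routing labels are assumptions made for this example; the manuscript's actual precedence logic is not reproduced in this record.

```python
from enum import IntEnum
from typing import Iterable


class Outcome(IntEnum):
    # Ordered by assumed severity so that max() picks the governing result.
    PASS = 0
    CONDITIONAL_PASS = 1
    INCONCLUSIVE = 2
    FAIL = 3


# Hypothetical routing table: only a clean pass is executed automatically.
ROUTING = {
    Outcome.PASS: "execute",
    Outcome.CONDITIONAL_PASS: "request_evidence_or_approval",
    Outcome.INCONCLUSIVE: "human_review",
    Outcome.FAIL: "block_and_review",
}


def aggregate(outcomes: Iterable[Outcome]) -> Outcome:
    """Combine per-constraint results; no results counts as inconclusive."""
    return max(outcomes, default=Outcome.INCONCLUSIVE)


results = [Outcome.PASS, Outcome.CONDITIONAL_PASS, Outcome.PASS]
overall = aggregate(results)
print(overall.name, "->", ROUTING[overall])
# CONDITIONAL_PASS -> request_evidence_or_approval
```

Treating an empty result set as inconclusive, rather than as a pass, reflects the idea that missing verification should never silently release a decision.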

Manuscript changes: Expanded Sections 4.3–4.4 (verification mechanisms and decision cycle), plus Section 9.3 (limitations) and Section 10 (future research).

Closing remark to Reviewer 1

We again thank the reviewer for exceptionally constructive feedback. We believe the revision materially improves the manuscript’s rigor and practical credibility while preserving the intended contribution: a governance-first architecture that treats LLMs as accountable explanatory interfaces mediated by verification and human-in-command oversight. We are grateful for the recommended adjacent literature, which has strengthened the manuscript’s architectural positioning and security-awareness.

Reviewer 2 Report

Comments and Suggestions for Authors

The article proposes an LLM-augmented approach to Algorithmic Management (AM): a governance-oriented architecture for explainable organizational decision systems. The article is well-written and balanced in its structure. However, the following points should be addressed.

Abstract:

Provide the values obtained for the various metrics, in contrast to the state of the art, to show the effectiveness of the approach.

Introduction:

  1. The references are not cited in sequence. Please rectify this and check the rest of the article.
  2. The manuscript would benefit from a clearer articulation of its main contributions. Please explicitly state what distinguishes this work from existing studies and why the proposed approach provides a meaningful advancement. In this regard, do the following:
    • Specify the limitations/research gaps as discrete numbered or bulleted items.
    • Likewise, provide the contributions in line with the research gaps identified.

Background and Related Work:

After sections 2.1-2.3, add section 2.4 summarizing the findings of related work and potential background.

Research Design and Methodology

The methods are not properly described, and several details are missing. Please clarify key components and any assumptions made during research design.

Currently the section is too brief. Adding a methodological pipeline (diagram) is recommended.

One suggestion is to merge sections 3 and 4 to make a comprehensive methodological section.

Risks and Governance Controls in LLM-Augmented Management

At the end of the section, add a paragraph showing how the proposed architecture would address the identified risks.

Demonstrative Simulation: Operational Plausibility and Trade-Offs

Figure 4 needs more elaboration. Provide explicit details of all four components depicted.

Fix the typo in the figure “Cdgnitive Layer.”

Table 2 is neither cited in the text nor explained properly. Kindly explain it.

Metrics should be explained preferably in the form of equations and parameters.

Illustrative Scenarios and Potential Applications

Link the proposed approach, the scenarios and risk and controls identified in the previous section.

Business and Industry Applications of Cognitive Governance

Kindly provide the discrete essence of the proposed architecture in each application.

Discussion

The limitations of the study are too brief in the current form. Elaborate on each limitation.

Provide an explanation of Table 3. Currently, the whole subsection contains just a table, with no details in the text.

Future Research Directions

Kindly link the limitations to the future research directions.

It would be better to merge this section into the Discussion, revising the section title to “Discussion and Future Directions.”

Make this section subsection 9.5.

Conclusion:

Refer to the obtained metric values in the conclusion as well, for example how the obtained latency can impact industry.

Author Response

Responses to Reviewer 2

We sincerely thank the reviewer for the careful reading and constructive, well-structured comments. We appreciate the positive assessment of the manuscript’s writing and balance, and we fully agree that the paper benefits from (i) clearer contribution positioning, (ii) expanded methodological transparency, and (iii) stronger integration across risks, controls, scenarios, and the demonstrative simulation. In the revised manuscript, we have implemented the requested changes comprehensively. Below we respond point-by-point.

Abstract

R2-A1. Provide obtained/achieved value of various metrics in contrast to state-of-the-art to show effectiveness.

Response: Thank you for this important suggestion. Since our contribution is conceptual and governance-architectural rather than a state-of-the-art predictive/control benchmark, we avoided introducing potentially misleading “SOTA” comparisons without directly comparable baselines. Instead, we strengthened the abstract by reporting obtained design-level indicators from the demonstrative simulation (e.g., baseline vs. augmented latency, validity proxy, constraint satisfaction rate), and we explicitly clarify that these are illustrative metrics intended to demonstrate operational plausibility and trade-offs under synthetic assumptions rather than a competitive performance benchmark. This addresses the request to present achieved values while preserving methodological integrity and preventing overclaiming.

Implemented change: Abstract updated to include obtained metric values and a clear caveat on interpretation.

Introduction

R2-I1. References are not in sequence. Please rectify it (and check rest of article).

Response: We appreciate the careful observation. We corrected the reference sequencing in the Introduction and conducted a full manuscript pass to ensure consistent ordering throughout.

Implemented change: Reference order corrected across the manuscript.

R2-I2. Clearer articulation of main contributions; explicitly state what distinguishes this work.
Discretely specify limitations/research gap in numbering/bullets; provide contributions aligned with gaps.

Response: We agree completely. We substantially revised Section 1.3 to make the novelty and advancement explicit. The revised text now presents:

  • numbered/bulleted research gaps (G1–G4), and

  • aligned contributions (C1–C5) mapped directly to the gaps.

This also strengthens the “what is new” narrative and clarifies why the proposed governance-oriented architecture advances existing explainability discussions by treating verification and oversight as first-class architectural components.

Implemented change: Section 1.3 rewritten with G1–G4 and C1–C5 alignment.

Background and Related Work

R2-B1. Add Section 2.4 summarizing findings of related work and background implications.

Response: Implemented as suggested. We added Section 2.4 as a synthesis subsection that summarises key insights from Sections 2.1–2.3 and explicitly derives architectural implications (functional separation, governance mediation, risk-proportional controls), providing a clear transition into the methodology and architecture sections.

Implemented change: New Section 2.4 added.

Research Design and Methodology

R2-M1. Methods not properly described; clarify key components and assumptions; section too brief.

Response: We appreciate this guidance and agree the methodology needed strengthening. We expanded Section 3 to include:

  • explicit positioning of the work as conceptual/design-oriented,

  • a clear delineation of what is and is not validated at this stage, and

  • explicit assumptions (A1–A5) that define scope and interpretive limits.

Implemented change: Section 3 expanded with research design and assumptions.

R2-M2. Add a methodological pipeline (diagram recommended).

Response: Implemented. We added a methodological pipeline description in Section 3.2, including a structured stepwise sequence from literature synthesis → risk/control objectives → layered architecture → scenario instantiation → demonstrative simulation → limitations/future validation. We also added a figure placeholder / concise pipeline list to support later diagram insertion.

Implemented change: Section 3.2 added/expanded with pipeline and diagram placeholder.

R2-M3. Suggest merging Sections 3 and 4.

Response: We agree with the reviewer’s motivation. To avoid destabilising section numbering late in revision, we did not fully merge the sections, but we strengthened Section 3 so it functions as a comprehensive methodological foundation and added explicit transition text linking methodology to the architecture section. This preserves readability while addressing the core concern (methodology too brief).

Implemented change: Section 3 strengthened + clearer transitions into Section 4.

Risks and Governance Controls in LLM-Augmented Management

R2-R1. Add a closing paragraph showing how the proposed architecture addresses the identified risks.

Response: Implemented. We added a synthesis paragraph at the end of Section 5 explicitly mapping key risk categories (hallucination/ungrounded rationales, bias amplification, privacy/security exposure, automation dependence, regulatory exposure) to concrete controls (grounding, constraint checking, provenance/audit trails, access restrictions, escalation/human-in-command oversight).

Implemented change: Closing synthesis paragraph added to Section 5.

Demonstrative Simulation: Operational Plausibility and Trade-Offs

R2-S1. Figure 4 needs elaboration; provide explicit details of all four components.
R2-S2. Fix typo “Cdgnitive Layer.”

Response: Implemented. We expanded the Figure 4 explanation to explicitly describe the four components and their interaction within the simulation flow, and we corrected the typo (“Cognitive Layer”).

Implemented change: Section 6.1 revised + Figure corrected.

R2-S3. Table 2 is neither cited nor explained properly; kindly explain it.

Response: Implemented. We added in-text citation to Table 2 and inserted a paragraph explaining the synthetic parameterisation, explicitly stating that these values are illustrative rather than empirical benchmarks.

Implemented change: Table 2 cited and explained in Section 6.

R2-S4. Metrics should be explained preferably in equations and parameters.

Response: Implemented. Section 6.2 now defines the metrics formally (baseline/augmented latency decomposition, mean overhead ratio, validity proxy, CSR/ERR) and clarifies their interpretation as design-level indicators.

Implemented change: Section 6.2 rewritten with equations and parameter definitions.
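For orientation, the figures quoted elsewhere in this record (120 decision events, 7 failed events, mean latencies of about 100.3 ms and 115.8 ms) are consistent with definitions of the following form. The symbols are assumed notation rather than the manuscript's own, and the ratio of means shown here need not equal a mean of per-event overhead ratios.

```latex
% Per-event latency decomposition (assumed notation)
\[
T_{\mathrm{aug}} = T_{\mathrm{base}} + T_{\mathrm{expl}} + T_{\mathrm{gov}}
\]

% Overhead implied by the reported mean latencies
\[
\bar{T}_{\mathrm{aug}} - \bar{T}_{\mathrm{base}} = 115.8 - 100.3 = 15.5\ \text{ms},
\qquad
\frac{\bar{T}_{\mathrm{aug}}}{\bar{T}_{\mathrm{base}}} = \frac{115.8}{100.3} \approx 1.155
\]

% Constraint satisfaction and routing rates with N = 120, N_fail = 7
\[
\mathrm{CSR} = \frac{N - N_{\mathrm{fail}}}{N} = \frac{113}{120} \approx 94.2\%,
\qquad
\frac{N_{\mathrm{fail}}}{N} = \frac{7}{120} \approx 5.8\%
\]
```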

Illustrative Scenarios and Potential Applications

R2-Sc1. Link the proposed approach, the scenarios, and the risks/controls identified earlier.

Response: Implemented. We revised the scenario section to explicitly connect each scenario to the relevant risk categories and governance controls, making the narrative cross-referential rather than siloed.

Implemented change: Sections 7–8 updated with explicit risk/control linkages.

Business and Industry Applications of Cognitive Governance

R2-App1. Provide the discrete essence of the proposed architecture in each application.

Response: Implemented. For each application, we added a concise “essence” summary (role of decision core; LLM role; primary governance controls; oversight mode), making the architecture’s recurring pattern explicit and comparable across domains.

Implemented change: Section 8 enhanced with discrete essence summaries.

Discussion

R2-D1. Limitations too brief; elaborate on each limitation.

Response: Implemented. We expanded the limitations section substantially to include: conceptual vs. empirical scope, synthetic proxy metrics, policy-as-code realism, verification failure and oversight workflow constraints, RAG limitations under messy documentation, and calibration without empirical baselines.

Implemented change: Section 9.3 expanded.

R2-D2. Provide explanation to Table 3; currently only a table without details.

Response: Implemented. We added (i) a brief introduction before Table 3 and (ii) interpretive explanation after Table 3, ensuring the table is integrated into the argument rather than standing alone.

Implemented change: Section 9.4 updated with pre-/post-table explanation.

Future Research Directions

R2-F1. Link limitations to future research directions.
R2-F2. Better merge into discussion (“Discussion and Future Directions”), make subsection 9.5.

Response: Implemented in substance. We explicitly linked the future research agenda to the limitations (opening paragraph of the future directions section), and we revised headings/structure to present future work as a continuation of discussion. Where full renumbering was not feasible without cascading edits, we ensured the content alignment and cross-references are clear and consistent.

Implemented change: Future directions explicitly tied to limitations; discussion framing updated.

Conclusion

R2-C1. Refer to obtained metric values in the conclusion; discuss latency implications for industry.

Response: Implemented. We revised the conclusion to reference the demonstrative obtained metrics (baseline vs. augmented latency, CSR, validity proxy) and interpret them as design-level indicators of operational plausibility and governance trade-offs, including a short discussion of why latency overhead matters for adoption and oversight workflow feasibility in industry and public-sector contexts.

Implemented change: Conclusion updated accordingly.

Closing remark to Reviewer 2

We again thank the reviewer for the highly actionable guidance. We believe the revisions substantially improve the manuscript’s clarity, methodological transparency, and cross-sectional integration, and we are grateful for the reviewer’s emphasis on explicit contribution framing and operational interpretability.

Reviewer 3 Report

Comments and Suggestions for Authors

The paper proposes a governance-oriented architecture for LLM-augmented algorithmic management, combining an algorithmic decision core, an LLM explanation layer, and a verification/governance layer, with support from literature synthesis, illustrative scenarios, and a small synthetic simulation. The topic is timely and the paper is readable, but the contribution is still more conceptual than technically validated.

 

  1. The main contribution is framed as an “architecture,” but the novelty over existing governance-aware XAI, RAG-based decision support, and human-in-command AI frameworks is not yet sharply demonstrated. The manuscript would benefit from a more explicit comparison table showing exactly what is new here beyond a repackaging of known principles such as grounding, policy checks, logging, and human oversight.

  2. The simulation is too weak to support several of the practical claims. It is explicitly synthetic, uses only 120 decision events, and reports stylized metrics such as baseline latency around 100.3 ms, LLM-augmented latency around 115.8 ms, explanation validity of 85.6%, and constraint satisfaction of 94.2%, but these values are generated from assumed distributions rather than real organizational data, so they demonstrate plausibility rather than evidence of operational effectiveness.

  3. The paper repeatedly uses strong practical language such as “practical blueprint,” “operational reliability,” and “cross-domain deployment,” but there is no real implementation, no user study, no field experiment, and no comparison against existing governance workflows. The conclusions should therefore be toned down and clearly limited to a conceptual architecture plus an illustrative simulation, and the authors may also strengthen the discussion of real domain-oriented intelligent decision support by considering related evidence from DOI: 10.1109/ACCESS.2023.3240162.

  4. The “explanation validity” metric is not sufficiently defined to be scientifically convincing. Since it is introduced as a synthetic proxy rather than a validated measure, the manuscript should explain how this score would be operationalized in a real system, who would assess it, and how it would differ from factual correctness, groundedness, and user-perceived usefulness.

  5. The compliance discussion is useful, especially around the EU AI Act, GDPR, and ISO/IEC 42001, but it remains mostly normative and high level. To make the paper stronger, the authors should map specific architectural components to concrete compliance obligations, for example which layer handles data minimization, which component enforces human oversight, and what exact audit artifacts would be produced for a regulated deployment.

 

Author Response

We sincerely thank Reviewer 3 for the careful reading and the highly constructive critique. We fully agree that the contribution is primarily architectural and governance-oriented, and we have revised the manuscript to sharpen novelty, to delimit claims consistent with a conceptual design plus demonstrative synthetic simulation, and to operationalise previously normative elements through explicit mappings and workflow detail.

R3-1. Novelty not sharply demonstrated; add explicit comparison table.

Response: Thank you; implemented. We added an explicit comparison table that positions the proposed three-layer architecture relative to governance-aware XAI, RAG-based decision support, and human-in-command frameworks, clarifying what is new beyond known principles (grounding, checks, logging, oversight). This table highlights that the novelty is architectural: explicit separation of decision authority, explanation generation, and governance release-gating as first-class components.
Manuscript change: Added Table 3 and accompanying synthesis paragraphs in the Discussion.

R3-2. Simulation is too weak for practical claims; synthetic values demonstrate plausibility, not effectiveness.

Response: We agree and revised framing accordingly. Throughout the manuscript (Abstract/Simulation/Discussion/Conclusions), we clarify that the simulation is a demonstrative synthetic-trace illustration intended to show design-level plausibility and trade-offs, not operational effectiveness, field performance, or benchmark superiority. We also strengthened the limitations to prevent over-interpretation.
Manuscript change: Simulation scope and caveats reinforced in Section 6 (“Demonstrative Simulation”), and limitations expanded in Section 9.3.

R3-3. Strong practical language (“practical blueprint,” “operational reliability,” “cross-domain deployment”) should be toned down; consider DOI: 10.1109/ACCESS.2023.3240162.

Response: Thank you. Implemented in substance; we also aligned the tone to avoid any implied deployment validation. We (i) explicitly delimit the contribution as a conceptual architecture plus illustrative simulation, and (ii) added a clarification contrasting our architectural contribution with domain-specific intelligent decision-support studies that report task-level empirical performance. We cite the suggested IEEE Access study (Meng et al., 2023) as an example of domain-evaluated audit-oriented NLP work, and we state clearly that our manuscript does not provide accuracy benchmarking, user study evidence, field experiment results, or workflow comparisons.
Manuscript change: Added explicit contrast paragraph in Discussion (Section 9.1).

R3-4. “Explanation validity” metric not convincing; explain how it would be operationalized; distinguish from correctness/groundedness/usefulness.

Response: Implemented. We expanded the Metrics subsection to explicitly decompose explanation quality into distinct constructs: factual correctness, groundedness, constraint consistency, and user-perceived usefulness. We clarify that the simulation uses a bounded synthetic proxy and we outline a practical operationalization pathway combining automated signals (evidence coverage, citation validity, rule-pass status, uncertainty flags) with structured expert evaluation (rubrics).
Manuscript change: Section 6.2 (Metrics) expanded accordingly.
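
For illustration only, and not taken from the manuscript, the automated part of this pathway could be sketched as below. The definitions of evidence coverage (share of claims that cite at least one source) and citation validity (share of citations that point to a known source) are assumptions made for this sketch, as are all function and parameter names; the structured expert evaluation is not modelled here.

```python
from typing import Dict, List, Set


def evidence_coverage(claim_citations: Dict[str, List[str]]) -> float:
    """Assumed definition: share of explanation claims citing at least one source."""
    if not claim_citations:
        return 0.0
    supported = sum(1 for cites in claim_citations.values() if cites)
    return supported / len(claim_citations)


def citation_validity(claim_citations: Dict[str, List[str]],
                      known_sources: Set[str]) -> float:
    """Assumed definition: share of citations that refer to a known source id."""
    cited = [src for cites in claim_citations.values() for src in cites]
    if not cited:
        return 0.0
    valid = sum(1 for src in cited if src in known_sources)
    return valid / len(cited)


def automated_signals(claim_citations: Dict[str, List[str]],
                      known_sources: Set[str],
                      rules_passed: bool,
                      uncertainty_flagged: bool) -> Dict[str, object]:
    """Bundle the four automated signals; they are kept separate, not averaged."""
    return {
        "evidence_coverage": evidence_coverage(claim_citations),
        "citation_validity": citation_validity(claim_citations, known_sources),
        "rule_pass": rules_passed,
        "uncertainty_flag": uncertainty_flagged,
    }


# Hypothetical example: three claims, one uncited, one citing an unknown source.
claims = {"c1": ["P1"], "c2": [], "c3": ["P2", "P9"]}
signals = automated_signals(claims, {"P1", "P2"}, rules_passed=True,
                            uncertainty_flagged=False)
```

Keeping the signals as separate fields, rather than collapsing them into one score, mirrors the distinction drawn above between groundedness, constraint consistency, and usefulness.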

R3-5. Compliance discussion is too normative; map components to obligations and specify audit artifacts.

Response: Implemented. We added an operational mapping table from architectural mechanisms to governance obligations and representative audit artifacts, clarifying which layer supports data minimization/access control, constraint enforcement, oversight, and what documentary outputs are expected in regulated settings (e.g., policy/rule versions, access logs, provenance logs, reviewer approvals/overrides).
Manuscript change: Added Table 4 and explanatory text in Section 9.2.

We would like to express our sincere appreciation to Reviewer 3 for the depth, precision, and constructive rigor of the comments provided. The reviewer’s emphasis on sharpening novelty, clearly delimiting empirical claims, strengthening operational grounding, and mapping architectural principles to concrete compliance mechanisms significantly improved the manuscript. In particular, the recommendation to explicitly contrast our contribution with domain-oriented intelligent decision-support research helped us refine both the tone and positioning of the paper. The resulting revision is substantially clearer in scope, more precise in its claims, and more robust in its governance articulation. We are grateful for the reviewer’s thoughtful engagement, which materially strengthened the manuscript.

Reviewer 4 Report

Comments and Suggestions for Authors

The core purpose of this study is to address the trust and control challenges when introducing generative AI into management decision-making, proposing a system architecture that emphasises "verification before fluency". The article goes beyond the technical level to explore the organisational implications of latency as a governance parameter, noting that rapid unverified decisions may erode trust, thus providing a valuable perspective for practical deployment in enterprises and holding practical significance. The authors are advised to further enhance the manuscript from the following aspects:

(1) Page 4 mentions that this is an "initial technical demonstration". It is recommended that the Discussion section more deeply analyse the potential deviations between this simulation based on synthetic data and real-world scenarios. If possible, add a concrete, even if hypothetical, detailed use case demonstrating how a decision is processed, intercepted, or verified through the three-layer architecture.

(2) It is recommended that the authors add a discussion in Section 4 or the Discussion section on how existing technologies specifically support this architecture. The current description leans towards functional requirements and lacks feasibility analysis of technical implementation.

(3) In Table 1, under the "Fairness" column, it is mentioned that bias may be "amplified" through explanations. It is suggested to further elaborate on this point, as it is commonly believed that explanations help discover bias.

(4) The authors list "field studies" as future work, but the conclusion should more strongly call for the community to focus on specific evaluation metrics for the transition from "linguistic fluency" to "fact verification".

Author Response

We sincerely thank Reviewer 4 for the thoughtful and constructive feedback. We especially appreciate the recognition of the governance significance of treating latency as a design parameter and of prioritising verification over mere linguistic fluency. The comments prompted us to deepen the manuscript’s operational realism, feasibility analysis, and evaluation framing. We respond to each point below.

R4-1. Deviation between synthetic simulation and real-world environments; add concrete detailed use case.

Response:
We agree that the initial discussion did not sufficiently articulate the gap between the synthetic simulation and real-world organisational conditions. In the revised manuscript, we expanded the Discussion and Limitations sections to explicitly analyse potential deviations, including policy ambiguity, contradictory documentation, retrieval noise, oversight queue delays, and verification failure modes.

In addition, we introduced a detailed “Use Case Walkthrough” subsection that traces a single decision event through the three-layer architecture (decision core → retrieval and explanation → constraint checking → escalation → audit logging). This example illustrates how verification, conditional routing, and human-in-command oversight function in practice, thereby operationalising the abstract architecture.
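
As a rough illustration of this flow (a minimal sketch that is not taken from the manuscript), a single decision event could be traced as below. Every name, the policy store, the risk threshold of 0.7, and the template standing in for the LLM call are hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Decision:
    event_id: str
    action: str        # output of the algorithmic decision core
    risk_score: float  # hypothetical score in [0, 1]


@dataclass
class AuditLog:
    entries: List[Tuple[str, str, object]] = field(default_factory=list)

    def record(self, event_id: str, stage: str, detail: object) -> None:
        self.entries.append((event_id, stage, detail))


def retrieve_evidence(decision: Decision,
                      policy_store: Dict[str, List[str]]) -> List[str]:
    # Hypothetical retrieval: policy passages are keyed by action name.
    return policy_store.get(decision.action, [])


def generate_explanation(decision: Decision, evidence: List[str]) -> str:
    # Stand-in for the LLM layer: a template that only restates the evidence.
    basis = "; ".join(evidence) if evidence else "no supporting policy found"
    return f"Action '{decision.action}' proposed. Basis: {basis}."


def check_constraints(decision: Decision, evidence: List[str],
                      risk_threshold: float = 0.7) -> List[str]:
    # Assumed rules: explanations must be grounded; high-risk actions need review.
    violations = []
    if not evidence:
        violations.append("ungrounded_explanation")
    if decision.risk_score >= risk_threshold:
        violations.append("high_risk")
    return violations


def process(decision: Decision, policy_store: Dict[str, List[str]],
            log: AuditLog) -> Tuple[str, str]:
    log.record(decision.event_id, "decision_core", decision.action)
    evidence = retrieve_evidence(decision, policy_store)
    explanation = generate_explanation(decision, evidence)
    log.record(decision.event_id, "explanation", explanation)
    violations = check_constraints(decision, evidence)
    log.record(decision.event_id, "constraint_check", violations)
    # Release only when no constraint is violated; otherwise escalate to a human.
    status = "escalated_to_human" if violations else "released"
    log.record(decision.event_id, "routing", status)
    return status, explanation


store = {"reassign_shift": ["Policy 4.2: shifts may be reassigned with 48h notice"]}
log = AuditLog()
status_ok, _ = process(Decision("e1", "reassign_shift", 0.2), store, log)
status_esc, _ = process(Decision("e2", "terminate_contract", 0.9), store, log)
```

In this sketch the governance layer, not the explanation layer, decides whether anything is released, and every stage writes to the audit log regardless of outcome.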

R4-2. Add feasibility analysis; current description reads as functional requirements rather than implementable system.

Response:
We appreciate this important clarification. To address it, we added an implementation-feasibility discussion mapping each architectural layer to commonly available enterprise building blocks (decision/rule engines, controlled RAG stacks, policy engines, identity and access management, workflow orchestration, and audit logging/versioning systems).

This addition clarifies that the proposed architecture is not technologically speculative: its novelty lies in governance structuring and integration patterns, not in new computational primitives.

R4-3. Clarify fairness/bias statement; explanations may also help uncover bias.

Response:
We agree that the dual role of explanations needed clarification. We revised Table 1 and the corresponding section to explicitly state that natural-language explanations can both expose bias and amplify it through framing effects, selective justification, or persuasive narrative reinforcement.

The revised text now clarifies this duality and proposes mitigation mechanisms (structured review rubrics, counterfactual checks, narrative framing audits, sampling-based review) to prevent explanation-driven bias amplification.

R4-4. Strengthen conclusion with call for evaluation metrics separating fluency from verification.

Response:
Implemented. We strengthened the conclusion and future research section to explicitly call for standardized evaluation protocols that distinguish linguistic fluency from verifiable justification. We list candidate metrics such as evidence coverage, citation validity, constraint consistency, calibrated uncertainty, and oversight workload metrics.

This aligns with the reviewer’s recommendation and reinforces the paper’s central thesis that governance-ready deployment must be measured in terms of verifiability and accountability, not narrative quality alone.

We are sincerely grateful to Reviewer 4 for the constructive and forward-looking perspective provided. The emphasis on practical feasibility, real-world deviation analysis, and the need for stronger evaluation framing significantly improved the manuscript’s operational clarity and balance. The revisions directly reflect these insights, and we believe the manuscript is materially stronger as a result of this rigorous and thoughtful feedback.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

All my comments have been addressed, and I believe the manuscript is now in a suitable form for publication.

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for addressing the comments adequately. I don't have any further suggestions.

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript has been revised based on the review comments, and the responses are convincing. Therefore, the manuscript may be considered for publication in its present form.

Reviewer 4 Report

Comments and Suggestions for Authors

Thanks for submitting the revised manuscript to address my concerns. I am satisfied with the responses and changes made by the authors.
