1. Introduction
The digital transformation of cities has accelerated the adoption of artificial intelligence (AI) to optimize critical urban functions, including mobility, energy distribution, public safety, healthcare delivery, environmental monitoring, and citizen services [1,2,3,4,5]. Despite these advancements, many operational AI deployments remain single-objective, siloed, or domain-specific, where optimization focuses on isolated efficiency metrics rather than interdependent city-scale trade-offs. Urban optimization objectives frequently conflict, as demonstrated by multi-objective routing and control studies that jointly optimize delay, energy consumption, emissions, and throughput [5,6,7,8,9,10]. These competing priorities reveal a fundamental challenge: smart city decision systems must evolve from isolated optimization toward adaptive, multi-objective, governance-aware decision reasoning [9,11]. To understand this transition toward governance-aware decision intelligence, it is necessary to distinguish between three foundational layers of urban AI systems. First, multi-objective optimization addresses competing performance metrics. Second, sustainability and governance alignment introduce societal accountability into optimization objectives. Third, safe policy validation mechanisms are required before deployment in real municipal environments. The following paragraphs stage these dimensions progressively to clarify how they converge into a unified governance-centric learning framework.
At the algorithmic level, Multi-Objective Reinforcement Learning (MORL) enables learning Pareto-optimal policies under competing objectives rather than maximizing a single utility [12,13,14]. However, MORL deployments in smart-city contexts remain domain-fragmented and typically performance-centric, with sustainability alignment, accountability, and deployment safety treated as external evaluation criteria rather than intrinsic learning objectives [6,7,15,16,17]. Complementing MORL, Digital Twins (DTs) provide simulation infrastructure for scenario testing and risk-aware validation, yet most reported DT implementations focus on monitoring or forecasting and are rarely integrated as policy-validation modules within reinforcement-learning loops [18,19]. As a result, current research lacks an end-to-end pipeline that learns multi-objective urban policies, validates them safely in simulation, and supports governance-aligned auditing and selection before real-world adoption.
Several recent studies have advanced multi-objective reinforcement learning (RL) in transportation control, energy management, and surveillance optimization (e.g., [12,15,20]), while others have explored DT-enabled governance modeling and simulation-based urban planning (e.g., [21]). However, existing MORL applications remain predominantly domain-specific and performance-driven, without embedding governance-aligned reward structures or formal pre-deployment policy certification. Conversely, DT governance studies emphasize simulation fidelity and policy visualization but rarely integrate adaptive multi-objective learning or Pareto-based policy auditing within the decision loop. The proposed MORL-Smart Governance Framework (MORL-SGF) differs by explicitly unifying governance-aware MORL, DT validation, and Pareto-based accountability mechanisms within a single end-to-end learning pipeline.
To address these systemic limitations, this paper introduces the MORL-SGF, a unified architecture that integrates:
Multi-Objective Reinforcement Learning for adaptive policy learning under competing objectives [12,14];
DT simulation pipelines for risk-aware pre-deployment policy evaluation [18,22];
Governance-aligned reward shaping derived from ESG (Environmental, Social, Governance) and United Nations Sustainable Development Goals (SDGs) metrics [11,23];
Pareto-frontier policy auditing for transparent and accountable decision selection [10,16].
Unlike conventional MORL solutions that optimize performance alone, MORL-SGF extends MORL by embedding sustainability objectives within the reward structure and incorporating simulation-backed validation mechanisms [18,23]. The framework does not target a single application domain but instead establishes a cross-domain decision layer capable of governing city-scale policies under shared sustainability constraints [9,11].
It is important to clarify the scope of validation in this study. The proposed MORL-SGF framework is analytically and conceptually validated through formal modeling, architectural specification, and structured evidence synthesis rather than empirical deployment or large-scale simulation benchmarking. The objective of this work is to establish a governance-aware decision architecture and functional design blueprint, laying the groundwork for subsequent simulation-based and real-world implementation studies.
Collectively, these limitations motivate the need for a governance-aware learning-to-deployment architecture capable of integrating multi-objective optimization, sustainability alignment, and simulation-based policy validation within a unified decision framework. MORL-SGF is designed to address this integration gap systematically. The following contributions formalize the structural, algorithmic, and evaluative components of this proposed framework. This work is positioned as a design-oriented methodological framework study that formalizes governance-aware reinforcement learning architecture rather than as a survey, policy manifesto, or empirical benchmarking report.
The primary contributions of this work are:
A novel MORL-based governance framework (MORL-SGF) that unifies multi-domain urban decision learning under sustainability and governance objectives.
A governance reward shaping mechanism translating ESG and SDG indicators into formal MORL reward signals.
A Digital Twin integration protocol enabling safe, non-intrusive policy validation, risk tracing, and Pareto compliance auditing.
A policy selection and accountability model that ranks Pareto-optimal policies using sustainability-aware governance scoring.
Evidence-driven validation informed by synthesis and characterization of 79 smart city studies, demonstrating research readiness and real-world demand for governance-aware MORL systems.
The remainder of the paper proceeds as follows. Section 2 reviews foundational concepts in MORL, DTs, and governance-aware intelligent systems. Section 3 formalizes research gaps and functional system requirements. The methodology applied in this research is described in Section 4. Section 5 presents the MORL-SGF architecture, reward formulation, and decision pipeline. Section 6 synthesizes design evidence from the smart city literature. Section 7 outlines representative use cases. Section 8 discusses open challenges, followed by conclusions and future research directions in Section 9.
2. Background and Foundations
Smart city decision systems operate in highly dynamic and heterogeneous environments characterized by continuous sensing, multiple stakeholders, constrained resources, sustainability requirements, and inherently conflicting policy objectives [1,4,9]. While artificial intelligence (AI) and reinforcement learning have demonstrated strong optimization capabilities across urban domains, many deployed systems remain single-objective, siloed, or narrowly performance-driven. Such formulations fail to capture the multidimensional trade-offs required in real municipal decision-making, motivating the need for multi-objective learning, governance-aware reasoning, and safe pre-deployment validation mechanisms [10,11,16].
This section reviews the foundational concepts underpinning the proposed MORL-SGF, focusing on (i) multi-objective reinforcement learning, (ii) governance-aware AI grounded in ESG and SDG targets, and (iii) DTs as policy validation environments.
2.1. Multi-Objective Reinforcement Learning (MORL)
Reinforcement Learning aims to learn a policy π(a|s) that maximizes cumulative reward through interaction with an environment [10,13]. In real smart-city deployments, however, decision objectives are inherently multidimensional. Urban policies must simultaneously reduce emissions, minimize congestion, control energy consumption, maintain fairness across districts, and ensure safety and service continuity [7,16]. Encoding such competing objectives into a single scalar reward often leads to opaque trade-offs and hidden policy bias.
MORL extends classical RL by optimizing a vector-valued reward

R(s, a) = [R1(s, a), R2(s, a), …, Rm(s, a)] ∈ ℝ^m

and by learning a set of Pareto-optimal policies rather than a single optimal solution. This formulation preserves trade-offs explicitly and avoids collapsing heterogeneous objectives into manually weighted sums. From a mathematical perspective, the multi-objective formulation induces an m-dimensional objective space ℝ^m, where each axis corresponds to one governance-aligned reward component (e.g., efficiency, sustainability, fairness, safety, participation). Each learned policy π is therefore mapped to a coordinate vector J(π) = [J1(π), …, Jm(π)] ∈ ℝ^m, where Ji(π) denotes the expected cumulative return for objective i. The Pareto front is defined as the subset of non-dominated coordinate points in this objective space. Thus, the “coordinates” correspond to governance-relevant performance metrics, not spatial variables, and represent trade-off positions in governance-aligned objective space.
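To make the dominance relation concrete, the following minimal Python sketch filters a set of policy value vectors down to the non-dominated subset; the function names, policy scores, and three-objective setup are illustrative assumptions, not part of any published MORL-SGF implementation.

```python
import numpy as np

def dominates(j_a: np.ndarray, j_b: np.ndarray) -> bool:
    """True if value vector j_a Pareto-dominates j_b (maximization):
    at least as good on every objective, strictly better on at least one."""
    return bool(np.all(j_a >= j_b) and np.any(j_a > j_b))

def pareto_front(J: np.ndarray) -> np.ndarray:
    """Indices of non-dominated rows in J (one row per policy, m columns)."""
    keep = []
    for i, j_i in enumerate(J):
        if not any(dominates(j_k, j_i) for k, j_k in enumerate(J) if k != i):
            keep.append(i)
    return np.array(keep)

# Five hypothetical policies scored on [efficiency, sustainability, fairness]:
J = np.array([[0.9, 0.2, 0.5],
              [0.6, 0.7, 0.6],
              [0.5, 0.6, 0.5],   # dominated by the second policy
              [0.3, 0.9, 0.7],
              [0.9, 0.2, 0.4]])  # dominated by the first policy
print(pareto_front(J))           # -> [0 1 3]
```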
In smart-city contexts, MORL provides several structural advantages over scalarized reinforcement learning approaches.
Explicit trade-off preservation: Vector-valued rewards retain conflicting objectives—such as energy efficiency versus mobility throughput—without collapsing them into manually weighted aggregates, thereby maintaining transparency across heterogeneous urban metrics [24].
Pareto-based policy diversity: Learning non-dominated policies enables systematic trade-off analysis and prevents dominance by a single performance metric [10,15].
Adaptive multi-agent scalability: MORL architectures extend naturally to distributed and cooperative agents, supporting large-scale, non-stationary urban environments [25].
Governance objective integration: Multi-dimensional reward structures allow fairness, sustainability, and accountability objectives to be embedded directly during learning rather than evaluated post hoc [11,23].
Collectively, these characteristics position MORL as a promising foundation for governance-aware urban decision systems.
Despite these strengths, existing MORL deployments in smart-city research remain predominantly subsystem-oriented, optimizing transportation, energy, or communication infrastructures independently rather than as interconnected governance problems. While such domain-specific optimizations demonstrate technical performance gains, they typically omit explicit sustainability alignment, cross-sector trade-off coordination, and institutional accountability mechanisms. In contrast, governance-aware decision systems require integrated objective modeling, transparent trade-off exposure, and validation protocols that extend beyond isolated performance metrics. This comparative gap between performance-centric MORL applications and governance-centric urban requirements underscores the need for a structurally unified framework.
However, optimization alone—even when multi-objective—does not guarantee societal legitimacy or regulatory alignment. This limitation motivates the explicit incorporation of governance principles into the learning objective itself.
2.2. Governance-Aware AI and Sustainability Targets (ESG & SDGs)
Modern smart-city AI systems are no longer assessed solely on operational efficiency. Governance, accountability, fairness, and environmental impact have become mandatory design criteria for public-sector AI deployment [11,23]. Urban decision systems must increasingly satisfy three interdependent dimensions:
Environmental objectives, including emissions reduction, energy efficiency, pollution mitigation, and waste minimization;
Social objectives, such as fairness, accessibility, safety, and inclusive service delivery;
Governance objectives, encompassing transparency, explainability, regulatory compliance, and accountability.
These dimensions align directly with globally recognized sustainability frameworks, including ESG principles and the SDGs. However, many AI systems operationalize these goals only as post hoc evaluation metrics rather than embedding them into the learning objective itself.
To enable governance-aware learning, sustainability and accountability targets must be mathematically encoded into the reward structure of the decision system.
Table 1 summarizes representative governance dimensions and their corresponding interpretations in MORL-based policy design.
Despite recognition of these objectives, many existing MORL deployments treat them as after-optimization evaluation metrics, rather than embedding them into the reward function at training time [11,23].
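As a concrete illustration of such training-time encoding, the minimal sketch below maps raw ESG/SDG-style city indicators onto normalized, maximization-oriented reward components; the indicator names and value ranges are hypothetical assumptions used only to make the mapping tangible.

```python
def normalize(value: float, lo: float, hi: float, maximize: bool = True) -> float:
    """Min-max normalize to [0, 1]; flip orientation for minimization metrics."""
    x = (value - lo) / (hi - lo)
    x = min(max(x, 0.0), 1.0)
    return x if maximize else 1.0 - x

def governance_reward(indicators: dict) -> list[float]:
    """Build a vector reward [environmental, social, governance] consumed
    during learning, rather than computed as a post hoc evaluation metric."""
    r_env = normalize(indicators["co2_tons_per_day"], 100, 1000, maximize=False)
    r_soc = normalize(indicators["service_access_parity"], 0.0, 1.0)  # 1 = equal access
    r_gov = normalize(indicators["decisions_with_audit_trail"], 0.0, 1.0)
    return [r_env, r_soc, r_gov]

print(governance_reward({"co2_tons_per_day": 400,
                         "service_access_parity": 0.8,
                         "decisions_with_audit_trail": 0.95}))
```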
While embedding governance objectives into reward structures addresses normative alignment, it does not resolve deployment risk. Even governance-aware policies require structured validation before activation in complex urban environments.
2.3. Digital Twins as a Policy Validation Layer
To address the deployment and risk-validation dimension introduced above, Digital Twins provide high-fidelity virtual representations of physical urban systems that synchronize with real-world data to enable simulation, forecasting, and scenario analysis. They are increasingly adopted in urban planning, transportation modeling, energy systems, and emergency response [18,19]. However, their integration into reinforcement learning pipelines—particularly for governance validation—remains limited [22]. Recent studies suggest that DTs have the potential to support risk-aware experimentation and policy evaluation when integrated with AI pipelines [21].
When integrated with MORL, DTs provide a critical policy safety layer. They enable stress testing of learned policies under extreme or rare conditions, validation of Pareto trade-offs before deployment, estimation of long-term sustainability impacts, and early rejection of unsafe or inequitable policies. This shifts DTs from passive monitoring tools into active governance sandboxes, where policy behavior—not just infrastructure performance—is evaluated.
Conventional DT deployments focus on simulating physical system behavior or forecasting demand. In contrast, governance-aware DT integration evaluates policy outcomes, including risk exposure, fairness violations, and ESG/SDG compliance, before any real-world enactment. This distinction is essential for responsible urban AI deployment and forms a central pillar of the MORL-SGF framework.
From an algorithmic perspective, DTs can interact with reinforcement learning pipelines in three complementary roles. First, the DT functions as a high-fidelity simulated environment in which candidate policies generated by MORL agents are executed without real-world risk. Second, simulation outputs provide structured feedback signals—including performance metrics, constraint violations, and externality indicators—that can be incorporated into reward adjustment or policy filtering mechanisms. Third, the DT enables pre-deployment stress testing under adversarial or rare-event scenarios, allowing governance thresholds to be evaluated before real-world activation. In MORL-SGF, the DT is therefore not a passive visualization layer but an active policy validation module integrated into the learning-to-deployment loop.
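A schematic sketch of these three roles is given below; the DigitalTwin class, its rollout method, and the scenario names are hypothetical placeholders intended to make the interaction pattern concrete, not an API defined by this framework.

```python
import random

class DigitalTwin:
    def rollout(self, policy, scenario="nominal", steps=100):
        """Role 1: execute a candidate policy risk-free in simulation.
        Role 2: return structured feedback (return + constraint violations)."""
        random.seed(hash((id(policy), scenario)) % 2**32)  # reproducible stub
        stress = 0.3 if scenario != "nominal" else 0.0     # stubbed dynamics
        ret = sum(random.uniform(0.4, 1.0) - stress for _ in range(steps)) / steps
        return {"return": ret, "violations": int(ret < 0.2)}

def stress_test(twin, policy, scenarios=("nominal", "demand_spike", "sensor_noise")):
    """Role 3: pre-deployment stress testing across rare-event scenarios,
    producing per-scenario feedback for governance-threshold checks."""
    return {s: twin.rollout(policy, s) for s in scenarios}

print(stress_test(DigitalTwin(), policy=object()))
```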
The above foundations highlight why multi-objective learning, governance-aligned objective design, and simulation-based validation must be treated jointly in urban decision systems.
Section 3 consolidates the observed limitations into explicit research gaps and corresponding capability requirements. Furthermore, the identified limitations and their corresponding system-level requirements are consolidated in Table 2.
3. Research Gap and Motivation
Smart cities increasingly rely on autonomous, data-driven decision systems to coordinate critical urban operations, including mobility management [26,27], energy distribution [28,29], public safety monitoring [13,30], environmental sustainability [3,31], healthcare logistics [32,33,34], and long-term infrastructure planning [18,35]. Advances in artificial intelligence, reinforcement learning, and computer vision have enabled substantial performance improvements across these domains [26,36,37,38]. Despite these advances, recurring structural limitations are evident across deployments, particularly in the limited integration of governance-related objectives such as public accountability [23], social equity [16,39], sustainability [2,4,5], and transparent decision logic [13,40].
This pattern reveals a structural misalignment between algorithmic optimization objectives and the societal, regulatory, and ethical requirements of urban governance. Recent surveys indicate that contemporary smart-city AI deployments predominantly emphasize efficiency and throughput metrics, while governance integration, fairness considerations, sustainability alignment, and citizen-centric accountability mechanisms remain underrepresented [41,42]. Rather than reiterating general concerns, the following subsections formalize these deficiencies into explicit capability gaps.
3.1. Why Governance Must Become a Primary Optimization Objective
Urban policy decisions differ from classical engineering optimization in that they directly affect public welfare, require explicit justification, and involve inherently conflicting objectives that must remain transparent and auditable. Despite these realities, many AI-driven city systems continue to optimize operational key performance indicators (KPIs) such as latency, throughput, or resource utilization [15,20,43,44], while treating governance constraints as offline checks or post-deployment evaluation criteria [13,23,40]. This disconnect underscores the need to elevate governance to a primary optimization objective in urban AI systems.
3.2. Systemic Limitations of Existing Smart City AI Approaches
A cross-domain analysis of smart-city AI literature reveals five recurring and structural limitations that hinder responsible urban decision-making.
First, many reinforcement learning formulations rely on scalarized or manually weighted objectives, collapsing multi-dimensional societal trade-offs into a single reward signal [11,43,44]. This practice introduces hidden policy bias and obscures the rationale behind trade-off decisions. Second, governance-aware reward shaping is largely absent: sustainability, fairness, accountability, and social impact are rarely encoded during policy learning, leading AI systems to learn efficient yet potentially inequitable or unsustainable behaviors [2,11,39]. Third, policy interpretability and auditability remain limited. Few systems expose Pareto trade-offs or decision rationales in a form suitable for regulatory review or public accountability [13,23,40].
Fourth, pre-deployment policy stress testing is uncommon. While DTs and simulators are widely used to model infrastructure behavior, they are rarely integrated as validation checkpoints for reinforcement learning policies, allowing unsafe or non-compliant decisions to reach deployment without governance assurance [22,45,46]. Finally, many solutions remain domain-isolated, optimizing traffic, energy, safety, or environment independently, without a unified view of system-wide governance trade-offs [4,6,29].
An alternative strategy for prioritizing structural limitations or governance contributions involves multi-criteria decision-making methods such as the Analytic Hierarchy Process (AHP). AHP can assist municipal stakeholders in assigning structured pairwise preference weights to governance dimensions, thereby supporting strategic prioritization in complex urban systems [47,48,49,50]. While MORL-SGF preserves objective independence during learning, AHP-style post-learning preference structuring may complement Pareto-based selection by guiding final policy choice under stakeholder-defined priorities. This integration remains a promising direction for future research.
3.3. Research Gaps
Addressing the above limitations requires explicit capability targets that are currently missing from the literature. First, there is a lack of true Pareto-based multi-objective policy learning, where trade-offs remain transparent rather than being implicitly resolved through scalarization. Second, governance-aware reward modeling—explicitly encoding ESG, SDG, fairness, and sustainability objectives during learning—is largely absent. Third, explainable policy trade-offs suitable for public audit and regulatory approval are rarely supported. Fourth, DT environments are underutilized as policy sandboxing platforms for high-stress, pre-deployment validation. Finally, cross-domain generalization remains limited, with no unified governance layer capable of orchestrating decisions across mobility, energy, safety, and environmental systems.
These gaps collectively indicate that existing approaches lack the architectural, methodological, and governance foundations required for responsible, large-scale urban autonomy.
3.4. Motivating Vision
To operationalize governance within AI-driven decision systems, a next-generation smart-city learning framework must satisfy five core principles. Governance objectives must be optimized jointly with performance metrics rather than evaluated post hoc. Decision systems should expose a Pareto frontier of auditable policy alternatives instead of producing a single opaque policy. Policies must be validated under large-scale synthetic scenarios using DTs to ensure safety, robustness, and fairness. Decision rationale must be explainable in structured, human-interpretable formats suitable for public institutions. Finally, governance reasoning must generalize across urban domains instead of reinforcing isolated AI silos.
Together, these principles motivate a fundamental design requirement: smart-city AI must evolve from performance-centric learning to governance-centric policy intelligence.
Translating these principles into an implementable learning architecture requires explicit formulation of reward design, validation mechanisms, policy selection criteria, and cross-domain integration strategies. The subsequent section formalizes these elements as concrete research objectives and contributions of the proposed MORL-SGF framework.
3.5. Research Objectives and Contributions
Guided by the identified gaps and motivating vision, this work delivers five primary contributions, summarized in Table 3. These include a governance-aware MORL reward formulation embedding ESG and SDG metrics, a DT-based policy validation loop for safe deployment, Pareto-based policy ranking with explainability support, a unified cross-domain governance architecture, and a generalizable smart governance framework (MORL-SGF) applicable across urban infrastructures. Here, G1–G5 refer to the systemic research gaps identified in Section 3.3, namely scalarized objectives (G1), lack of governance-aware rewards (G2), absence of policy explainability (G3), lack of pre-deployment validation (G4), and domain-isolated optimization (G5).
Current smart-city AI research remains dominated by isolated, performance-oriented optimization approaches that lack governance guarantees, sustainability alignment, and policy accountability [51]. Existing learning paradigms rarely treat governance as an optimization objective, provide limited explainability, and fail to validate policies prior to deployment. These shortcomings motivate the MORL-SGF, which positions multi-objective reinforcement learning as a governance engine—validated through DTs and guided by ESG/SDG-aware reward modeling—to enable auditable, sustainable, and policy-aligned urban intelligence.
4. Methodology
The methodology adopted in this study is design-oriented and evidence-synthesized rather than empirical in the experimental sense. The proposed MORL-SGF framework is architecturally specified and analytically formalized, with validation grounded in systematic literature synthesis and structural feasibility analysis rather than real-world deployment or controlled simulation experiments. This positioning reflects the objective of establishing a governance-aware learning architecture that can guide future empirical implementations.
This study adopts a multi-phase methodological pipeline that integrates systematic evidence synthesis, governance-aligned problem modeling, multi-objective reinforcement learning design, and DT-based validation. The purpose of this methodology is to provide a transparent, reproducible foundation showing how the MORL-SGF was derived, structured, and theoretically validated.
The methodological workflow follows seven sequential steps, aligned exactly with the process illustrated in Figure 1. These steps map the evolution from evidence gathering to governance-aligned policy generation and deployment-ready validation.
4.1. Literature Search and Screening Protocol
To ensure methodological transparency and replicability, the evidence synthesis underlying this study followed a structured literature search and screening protocol. Publications were retrieved from Scopus, Web of Science, IEEE Xplore, and ScienceDirect, covering the period 2015–2025.
Search queries combined terms related to multi-objective reinforcement learning, smart cities, governance modeling, sustainability alignment, and DT validation. Representative search expressions included:
(“multi-objective reinforcement learning” OR “MORL”) AND (“smart city” OR “urban systems”) AND (“governance” OR “sustainability” OR “ESG” OR “SDG”) AND (“digital twin” OR “simulation validation”).
Inclusion criteria required that studies:
(1) Addressed reinforcement learning or AI-based optimization in urban contexts;
(2) Incorporated multi-objective or trade-off formulations;
(3) Discussed governance, sustainability, fairness, or accountability dimensions;
(4) Employed simulation or Digital Twin environments for validation.
Exclusion criteria removed purely theoretical optimization works without urban relevance, single-objective formulations lacking governance implications, non-peer-reviewed articles, and studies with insufficient methodological clarity.
After duplicate removal and abstract-level screening, 79 studies were retained for detailed governance-readiness evaluation. These studies correspond to the core peer-reviewed works cited in Sections 2–6 and form the evidentiary basis for the governance-readiness analysis.
4.2. Overview of the Methodological Pipeline
The methodology is organized into seven tightly coupled phases:
(i) Systematic literature review and evidence collection;
(ii) Governance readiness coding and gap identification;
(iii) Requirements derivation and multi-objective problem formalization;
(iv) Governance-aware reward modeling;
(v) MORL-based policy learning design;
(vi) Digital Twin–driven validation and governance-based policy filtering;
(vii) Deployment logic with continual governance alignment.
Rather than operating as isolated steps, these phases form a continuous decision pipeline in which analytical insights, formal models, and validation outcomes are progressively refined. This design ensures that governance considerations are embedded from the earliest conceptual stages and preserved through policy learning and validation.
4.3. Step 1—Systematic Literature Review
The methodological process begins with a structured systematic literature review conducted in accordance with widely accepted review guidelines [52]. Studies were collected from multiple smart-city-related domains, including transportation systems, energy management, digital health, surveillance, and urban governance [53,54,55,56,57]. To ensure methodological rigor and reproducibility, the screening and selection process followed PRISMA 2020 principles, with the full screening flow summarized in Figure 2. Figure 2 provides the PRISMA 2020 flow diagram, while Table 4 summarizes the corresponding numerical counts for clarity and reproducibility.
An initial corpus of more than 400 studies was identified through database searches and manual screening. These records were filtered based on relevance to artificial intelligence, reinforcement learning, governance modeling, and smart-city optimization. After title, abstract, and full-text screening, a final set of 79 peer-reviewed studies was retained, covering multi-objective optimization, reinforcement learning, DT environments, and governance frameworks. This evidence base serves as the empirical foundation for identifying systemic limitations in existing research and motivating the design of MORL-SGF. While the PRISMA-based screening resulted in a final corpus of 79 studies for systematic evidence synthesis, additional references were included throughout the manuscript to support background concepts, methodological foundations, and comparative discussion.
4.4. Step 2—Governance Readiness Coding & Gap Identification
Each study included in the final corpus was systematically evaluated using the Governance Readiness Index (GRI) described in Section 6.1. The coding process assessed five governance-critical dimensions: objective formulation (single versus multi-objective), level of governance integration, use of simulators or DTs, degree of explainability, and validation rigor. The design of this coding scheme was informed by prior governance and accountability analyses [23,58,59].
The aggregated coding results revealed five recurring deficiencies across the literature. First, governance objectives are rarely encoded directly into learning objectives. Second, DT usage is often limited to visualization or basic simulation rather than policy validation. Third, explainability mechanisms are typically applied post hoc rather than embedded within the learning process. Fourth, cross-domain reasoning across city subsystems remains largely absent. Finally, although MORL techniques exist, they are seldom applied for governance-oriented reasoning. These findings directly informed the functional requirements underlying the MORL-SGF architecture.
4.5. Step 3—Requirements Derivation & MOMDP Problem Formalization
Building on the gaps identified in Step 2, smart-city decision-making was formalized as a Multi-Objective Markov Decision Process (MOMDP). This formulation captures the inherently stochastic, multi-stakeholder nature of urban governance problems. The MOMDP representation incorporates state variables reflecting system congestion, energy demand, emissions, and spatial fairness disparities; action spaces corresponding to mobility control, routing, energy dispatch, or allocation decisions; and transition dynamics derived either from real-world data or DT simulations.
Crucially, the reward function is defined as a vector-valued governance-aligned objective space, rather than a scalar performance metric. This formalization establishes the mathematical foundation for the MORL component described later in Section 4 and ensures that governance trade-offs are preserved throughout the learning process.
The MOMDP formulation assumes full observability at the system level through aggregated state representations derived either from sensor networks or DT integration. While urban environments are inherently non-stationary due to demand fluctuations, seasonal variation, and evolving infrastructure conditions, non-stationarity is addressed through adaptive policy updates within the continual governance alignment stage (Step 7). The framework supports both centralized and multi-agent configurations. In centralized settings, a global policy operates over aggregated city-scale states; in multi-agent configurations, decentralized agents (e.g., traffic controllers, energy nodes) share a common governance-aware reward structure while interacting within a coordinated DT environment. This flexible modeling assumption ensures applicability across heterogeneous smart-city deployment scenarios.
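For concreteness, the MOMDP structure described above can be sketched as a typed container; the field names follow the paper's notation (S, A, P, R, γ), while the container itself and the toy deterministic transition are illustrative assumptions rather than a prescribed implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np

@dataclass
class MOMDP:
    states: Sequence      # S: aggregated city-scale states
    actions: Sequence     # A: admissible control actions
    transition: Callable  # stand-in for P(s'|s, a), from data or a DT simulator
    reward: Callable      # R(s, a) -> np.ndarray of shape (m,), vector-valued
    gamma: float          # discount factor in (0, 1)
    m: int                # number of governance objectives

def demo_reward(s, a) -> np.ndarray:
    # e.g., [efficiency, sustainability, fairness, safety, participation]
    return np.array([0.7, 0.5, 0.6, 0.9, 0.4])

momdp = MOMDP(states=range(10), actions=range(3),
              transition=lambda s, a: (s + a) % 10,  # toy deterministic dynamics
              reward=demo_reward, gamma=0.95, m=5)
print(momdp.reward(0, 1).shape)  # -> (5,)
```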
4.6. Step 4—Governance-Aware Reward Modeling
To operationalize governance within the learning process, abstract sustainability and accountability principles were translated into measurable reward components. Reward design was informed by ESG and SDG indicators, municipal governance key performance indicators, and fairness and sustainability metrics reported in prior studies [39,60]. The resulting reward vector captures multiple governance dimensions, including efficiency, sustainability, fairness, safety, cost, and public participation.
To prevent dominance of any single objective, normalization and independence constraints were applied across reward dimensions. This design ensures that the MORL algorithm learns trade-offs intrinsically rather than relying on manually tuned scalar weights. The output of this stage is a governance-aware reward vector that serves as the direct input to policy learning.
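One hedged way to realize the normalization constraint is per-objective running min-max scaling, sketched below so that no single objective dominates the vector reward; the specific normalizer is an illustrative design choice, not the only admissible one.

```python
import numpy as np

class RewardNormalizer:
    """Track per-objective observed ranges and rescale each reward
    dimension to [0, 1], keeping objectives on comparable scales."""
    def __init__(self, m: int):
        self.lo = np.full(m, np.inf)
        self.hi = np.full(m, -np.inf)

    def __call__(self, r: np.ndarray) -> np.ndarray:
        self.lo = np.minimum(self.lo, r)
        self.hi = np.maximum(self.hi, r)
        span = np.where(self.hi > self.lo, self.hi - self.lo, 1.0)
        return (r - self.lo) / span

norm = RewardNormalizer(m=3)
for r in [np.array([120.0, 0.4, 3.0]), np.array([80.0, 0.9, 5.0])]:
    print(norm(r))   # raw magnitudes differ by orders of magnitude; outputs do not
```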
4.7. Step 5—MORL-Based Policy Learning Design
The MORL-SGF framework operationalizes Pareto-based multi-objective reinforcement learning as its core policy optimization mechanism. Rather than relying on scalarized reward aggregation, the framework adopts dominance-based policy learning in which vector-valued rewards are preserved throughout training. This approach enables explicit exploration of the Pareto frontier and maintains transparent trade-offs among governance-aligned objectives.
Algorithmically, the framework is compatible with established Pareto-oriented MORL implementations, including Pareto Q-learning (value-based), multi-objective actor–critic architectures, and hypervolume-guided policy optimization methods [25,27,60]. In the conceptual instantiation presented here, dominance relations are used to evaluate candidate policies during training or post-training selection, ensuring that no objective dimension is collapsed into a fixed scalar weight. This design choice preserves governance transparency and prevents hidden objective bias.
4.8. Step 6—Digital Twin Validation & Governance-Based Policy Filtering
Candidate policies are validated within a high-fidelity DT environment before any real-world consideration. Drawing on DT governance studies [18,60], this phase evaluates policies under diverse operational and stress conditions. Validation focuses on robustness, stability, fairness impact, and compliance with predefined governance thresholds.
Governance thresholds are defined as quantitative upper or lower bounds on governance-critical indicators derived from municipal regulations, ESG benchmarks, SDG targets, or policy-defined risk tolerances. Examples include maximum allowable emission increases, minimum fairness parity ratios, acceptable congestion ceilings, or budgetary constraints. These thresholds function as hard feasibility constraints during DT validation: any policy violating at least one governance constraint is removed from the candidate set Π*.
Importantly, threshold values are not assumed to be static. Within the continual governance alignment stage (Step 7), thresholds may be updated to reflect evolving policy priorities, regulatory revisions, or stakeholder-driven adjustments. Such updates trigger re-evaluation of policies within the DT environment, ensuring that governance compliance remains dynamic rather than fixed at design time.
Policies that violate governance risk constraints are filtered out, while remaining candidates are ranked using governance-aligned Pareto dominance and hypervolume metrics. The output of this stage is a validated policy set Πvalid ⊆ Π*, the subset of Pareto-optimal policies that satisfy both performance requirements and governance risk constraints under DT validation, ensuring that unsafe or non-compliant policies are excluded prior to deployment.
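The filtering step can be illustrated with a minimal sketch, assuming hypothetical threshold values and indicator names; in a real deployment these bounds would be drawn from municipal regulations or ESG/SDG targets as described above.

```python
candidates = [
    {"name": "pi_1", "emission_increase": 0.02, "fairness_parity": 0.85},
    {"name": "pi_2", "emission_increase": 0.12, "fairness_parity": 0.90},  # emissions too high
    {"name": "pi_3", "emission_increase": 0.01, "fairness_parity": 0.60},  # parity too low
]

# Hard feasibility constraints: ("max", b) = upper bound, ("min", b) = lower bound.
THRESHOLDS = {"emission_increase": ("max", 0.05),
              "fairness_parity":   ("min", 0.75)}

def satisfies(policy: dict) -> bool:
    """A policy is dropped if it violates at least one governance constraint."""
    for key, (kind, bound) in THRESHOLDS.items():
        if kind == "max" and policy[key] > bound:
            return False
        if kind == "min" and policy[key] < bound:
            return False
    return True

pi_valid = [p["name"] for p in candidates if satisfies(p)]
print(pi_valid)  # -> ['pi_1']
```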
4.9. Step 7—Deployment Logic & Continual Governance Alignment
The final methodological stage addresses real-world deployment dynamics. A continual feedback loop compares real-world performance metrics with DT predictions and evolving governance indicators. Policy parameters are iteratively adjusted to correct deviations and maintain long-term alignment with governance objectives, consistent with adaptive reinforcement learning principles in [61,62].
This step ensures that governance alignment is not treated as a static design-time constraint but as an ongoing operational requirement throughout the policy lifecycle. It is important to emphasize that the present study develops a formal architectural and methodological framework rather than an empirical implementation or simulation benchmark. The objective is to specify the structural integration of governance-aware MORL, DT validation, and Pareto-based policy auditing at the architectural level. Empirical deployment, simulation benchmarking, and prototype implementation are intentionally reserved for future work, as the primary contribution of this study lies in formal system design and governance-aligned optimization modeling.
5. MORL–Smart Governance Framework (MORL-SGF)
5.1. Framework Overview
City-scale decision-making inherently involves multiple, often conflicting objectives spanning operational performance, environmental sustainability, social equity, safety, and public accountability. Such objectives cannot be adequately addressed through traditional single-objective optimization pipelines, which typically collapse diverse societal priorities into a single utility function [3,33]. To overcome this limitation, the MORL-SGF introduces a unified architecture that integrates multi-objective reinforcement learning with governance-aware reward design and Digital Twin-based policy validation.
At its core, MORL-SGF is structured around three tightly coupled intelligence layers. First, MORL is employed to learn sets of Pareto-optimal policies rather than a single optimal solution, explicitly preserving trade-offs among competing objectives [6,63]. Second, governance-aware reward modeling embeds sustainability, equity, safety, participation, and cost considerations directly into the learning objective, ensuring that governance principles influence policy formation rather than serving only as post hoc evaluation criteria [2,39,64]. Third, a DT validation layer serves as a risk-aware sandbox in which candidate policies are stress-tested, filtered, and certified before real-world deployment [25,65].
Unlike conventional reinforcement learning pipelines that return a single opaque policy, MORL-SGF produces a portfolio of non-dominated policies, enabling public authorities to select decisions aligned with explicit governance priorities such as sustainability targets, equity mandates, safety constraints, or fiscal limitations [45,52]. The overall architecture and its boundary between the external smart-city context and the internal governance core are illustrated in Figure 3.
5.2. Problem Formulation: MOMDP Modeling
Smart-city decision processes within MORL-SGF are formalized as a Multi-Objective Markov Decision Process (MOMDP) [1,44], defined as:

M = (S, A, P, R, γ)

The MOMDP formulation follows the standard MDP structure with vector-valued rewards, as commonly adopted in the multi-objective reinforcement learning literature. Here, S denotes the system state space (e.g., congestion levels, energy demand, emissions, service access disparities); A denotes the set of admissible control actions (e.g., signal timing, routing strategies, energy dispatch decisions); P(s′|s, a) describes the stochastic state transition dynamics, derived either from historical data or from a calibrated DT simulator [5,9]; and R(s, a) is an m-dimensional reward vector. The reward function R = [R1, R2, …, Rm] is vector-valued, where each component corresponds to a distinct governance-aligned objective, and γ ∈ (0, 1) is the discount factor. All objectives are normalized to comparable ranges and oriented consistently (larger is better), with minimization objectives negated or transformed.
At every step, the MORL agent receives a multi-dimensional reward rather than a scalar objective [66]:

R(s, a) = [R1(s, a), R2(s, a), …, Rm(s, a)] ∈ ℝ^m

Here, R(s, a) denotes a vector-valued reward function R: S × A → ℝ^m, where m is the number of governance objectives. In the present framework, m = 5, corresponding to efficiency, sustainability, fairness, safety, and participation objectives. The learning objective is to derive a set of optimal policies Π* = {π1, π2, …, πK} such that no policy in the set strictly dominates another across all objectives [33,67]. This Pareto-optimal formulation ensures that improvements in one governance dimension necessarily incur trade-offs in at least one other, thereby making policy compromises explicit and auditable [14].
Let Π* denote the Pareto-optimal policy set (portfolio), and let K be the number of non-dominated policies returned by the MORL procedure, where each πi is a stationary policy mapping states to actions. During training, the MORL agent iteratively samples trajectories under the current policy, updates vector-valued value estimates, and retains non-dominated policy candidates based on Pareto dominance relations. This process continues until convergence criteria are met (e.g., stability of the Pareto front or bounded hypervolume improvement).
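The retention of non-dominated candidates can be sketched as a Pareto-archive update inside the training loop, shown below with a stubbed policy evaluation; the archive mechanics are an illustrative assumption, and any dominance-preserving variant would serve equally.

```python
import numpy as np

def dominates(a, b):
    """Pareto dominance for maximization of all objectives."""
    return bool(np.all(a >= b) and np.any(a > b))

archive = []  # list of (policy_id, value_vector) pairs: the running Pareto set

def update_archive(policy_id, J_pi):
    global archive
    if any(dominates(J, J_pi) for _, J in archive):
        return                                     # dominated candidate: discard
    archive = [(p, J) for p, J in archive if not dominates(J_pi, J)]
    archive.append((policy_id, J_pi))              # retain non-dominated candidate

rng = np.random.default_rng(0)
for it in range(20):                               # stand-in for MORL iterations
    update_archive(it, rng.uniform(size=3))        # stubbed policy evaluation
print(len(archive), "non-dominated policies retained")
```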
5.3. Governance-Aware Reward Design
A defining feature of MORL-SGF is the direct encoding of governance principles into the reward space, inspired by ESG and SDG frameworks [4,42,68]. Rather than collapsing governance objectives into a weighted scalar reward, MORL-SGF maintains them as independent reward components to preserve transparency and avoid subjective pre-weighting [38].
Efficiency rewards capture service performance metrics such as congestion reduction or latency minimization. Sustainability rewards penalize environmental externalities, including carbon emissions and energy overuse. Fairness rewards quantify spatial or demographic service disparities, while safety rewards reflect accident rates or risk exposure. Cost rewards account for operational and budgetary efficiency, and participation rewards capture citizen engagement or public feedback indicators. Operational cost is treated as a component of the efficiency objective rather than as a separate reward dimension.
By maintaining independence among these reward dimensions and applying normalization constraints, MORL-SGF ensures that trade-offs are learned rather than imposed, allowing governance priorities to emerge naturally through Pareto reasoning rather than manual tuning.
5.4. Pareto-Based Multi-Objective Policy Learning
Policy learning within MORL-SGF is conducted using MORL architectures such as actor–critic methods, Soft Actor–Critic (SAC), or Pareto Q-learning, all of which estimate vector-valued value functions [15,20,27,63]. The expected return is defined over cumulative vector rewards:

Q(s, a) = E[ ∑_{t=0}^{∞} γ^t R(st, at) | s0 = s, a0 = a ] ∈ ℝ^m

Here, Q(s, a) is a vector-valued action–value function, consistent with the vector-valued reward formulation. Policy gradients are computed in a multi-objective setting, enabling simultaneous optimization across governance dimensions. Rather than selecting a single optimal policy, the learning process yields a Pareto front of candidate solutions. To support governance-driven comparison, dominance-aware metrics such as hypervolume contribution are used to rank policies within the Pareto set [10,41]:

ΔHV(π) = HV(Π*) − HV(Π* \ {π})

where the hypervolume HV is computed with respect to a fixed reference point z_ref ∈ ℝ^m, and ΔHV(π) ranks policies by their contribution to the Pareto set. This approach preserves the diversity of governance-relevant solutions and avoids premature convergence toward policies that over-optimize a single objective at the expense of others. Policy updates are computed using multi-objective extensions of actor–critic or Q-learning methods, where vector-valued returns are handled through dominance-based evaluation or preference-conditioned optimization rather than a single scalar objective. Importantly, no scalar aggregation of objectives is performed during training. Any governance-based ranking occurs only after Pareto set generation, preserving the multi-objective learning structure while enabling accountable stakeholder selection.
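A small two-objective sketch of hypervolume-contribution ranking follows; higher-dimensional fronts would normally use an established implementation (e.g., in pymoo), and the front values and reference point here are illustrative.

```python
import numpy as np

def hypervolume_2d(points: np.ndarray, z_ref: np.ndarray) -> float:
    """Area dominated by `points` (maximization) relative to reference z_ref,
    computed by sweeping the points in descending order of objective 1."""
    pts = points[np.argsort(-points[:, 0])]
    hv, prev_y = 0.0, z_ref[1]
    for x, y in pts:
        if y > prev_y:                      # only non-dominated steps add area
            hv += (x - z_ref[0]) * (y - prev_y)
            prev_y = y
    return hv

front = np.array([[0.9, 0.2], [0.6, 0.7], [0.3, 0.9]])
z_ref = np.array([0.0, 0.0])
total = hypervolume_2d(front, z_ref)
for i in range(len(front)):                 # leave-one-out contribution dHV
    rest = np.delete(front, i, axis=0)
    print(f"policy {i}: dHV = {total - hypervolume_2d(rest, z_ref):.3f}")
```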
To mitigate Pareto-front explosion and cognitive overload, MORL-SGF supports dominance filtering, hypervolume contribution ranking, and stakeholder-conditioned preference queries to reduce candidate policies to a manageable subset. Hierarchical decomposition may also be applied, whereby governance objectives are clustered into higher-level categories prior to frontier evaluation. While full real-time frontier pruning guarantees are not formalized in the present work, scalable MORL variants and preference-conditioned policy learning represent active research directions.
5.5. Digital Twin Policy Validation and Risk Filtering
Before real-world deployment, all candidate policies are evaluated within a city-scale DT environment that replicates operational dynamics under normal and extreme conditions [31,69]. The DT enables systematic stress testing, robustness analysis, and risk assessment without exposing real citizens or infrastructure to experimental failures [18].
The Digital Twin is interfaced with the MORL agent through a simulation evaluation pipeline: candidate policies generated during training are executed within the DT environment across multiple controlled scenarios. The resulting performance vectors are logged and compared against governance thresholds before policies are admitted into the validated set. This interface decouples policy learning from real-world execution while maintaining governance-aware certification prior to deployment.
Policies are assessed for stability, governance compliance, and risk tolerance. Only those satisfying predefined governance thresholds—such as acceptable risk levels and minimum robustness criteria—are retained:

Πvalid = {π ∈ Π* : Risk(π) ≤ τ_risk and Robustness(π) ≥ τ_rob}

Here, Risk(π) is operationalized as the expected performance degradation under predefined stress-test scenarios within the DT, measured as the weighted variance or worst-case deviation of governance reward components across simulated perturbations, e.g., Risk(π) = max_{ω∈Ω} [J_nom(π) − J_ω(π)]. Robustness(π) denotes policy stability under distributional shifts, estimated as the consistency of reward performance across multiple scenario samples, e.g., Robustness(π) = min_{ω∈Ω} J_ω(π)/(J_nom(π) + ϵ). These definitions are domain-instantiable and may be computed using statistical dispersion, constraint-violation frequency, or scenario-based sensitivity metrics. Risk(π) and Robustness(π) denote DT-derived evaluation scores, and τ_risk and τ_rob are governance thresholds set by stakeholders/regulators. Ω is a predefined finite set of DT stress-test scenarios (demand spikes, sensor noise, incidents); J_nom(π) is the nominal return, J_ω(π) is the return under scenario ω, and ϵ > 0 avoids division by zero. This validation stage replaces unsafe online trial-and-error exploration with certified, simulation-backed policy approval, ensuring that only governance-compliant policies proceed to deployment [70]. Furthermore, thresholds are defined contextually by municipal governance requirements and are not fixed constants within the framework.
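Under the worst-case instantiation given above, these scores reduce to a few lines of code; the scenario returns and the specific aggregation below are illustrative assumptions.

```python
EPS = 1e-8  # avoids division by zero in the robustness ratio

def risk(j_nom: float, j_scenarios: dict) -> float:
    """Worst-case return degradation across DT stress-test scenarios."""
    return max(j_nom - j for j in j_scenarios.values())

def robustness(j_nom: float, j_scenarios: dict) -> float:
    """Worst-case consistency of return across scenarios (1.0 = no degradation)."""
    return min(j / (j_nom + EPS) for j in j_scenarios.values())

j_nom = 0.82  # nominal return of a candidate policy (illustrative)
omega = {"demand_spike": 0.71, "sensor_noise": 0.78, "incident": 0.64}
print(f"Risk = {risk(j_nom, omega):.2f}, Robustness = {robustness(j_nom, omega):.2f}")
# A policy is retained only if Risk <= tau_risk and Robustness >= tau_rob.
```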
5.6. Explainable Policy Selection for City Stakeholders
Following DT validation, decision makers are presented with a transparent portfolio of Pareto-optimal policies rather than a single recommendation [8,71]. Each policy can be compared across governance dimensions such as emissions, equity, cost, and safety, enabling explicit alignment with municipal priorities.
For example, one policy may prioritize emissions reduction at the expense of cost efficiency, while another may favor equity and safety in underserved regions. By exposing these trade-offs explicitly, MORL-SGF supports accountable, stakeholder-driven policy selection rather than algorithmic opacity [17,22].
Legitimacy, auditability, and governance alignment are operationalized in MORL-SGF through three measurable mechanisms: (i) explicit vector-valued objective representation, ensuring that all governance dimensions are preserved and inspectable; (ii) Digital Twin-based validation metrics (risk, robustness, compliance thresholds), enabling pre-deployment stress testing; and (iii) transparent Pareto portfolio exposure, allowing decision-makers to inspect trade-offs prior to selection. These mechanisms provide structural and procedural auditability rather than post hoc justification. Empirical validation at full city scale remains a future deployment-stage objective.
5.7. Deployment Feedback and Continual Policy Alignment
Once deployed, policies remain subject to continual governance alignment through real-world monitoring and feedback. Observed performance and governance metrics are compared against simulated expectations, and policy parameters are iteratively adjusted using governance-aware feedback signals derived from discrepancies between real-world and simulated outcomes [11,35]:

θ ← θ + α · Δg, where Δg = g_DT − g_real

This update rule represents a conceptual governance-aware feedback heuristic rather than a standardized reinforcement learning update, intended to illustrate how discrepancies between simulated and real-world governance outcomes may guide adaptive policy adjustment. This adaptive loop ensures long-term alignment with evolving societal goals, regulatory constraints, and urban dynamics [72,73]. When deployed performance deviates from governance targets—such as sustainability thresholds, fairness constraints, or safety metrics—the feedback term Δg drives corrective adaptation. Unlike conventional online reinforcement learning, this update mechanism is explicitly governance-aware, ensuring that policy evolution remains bounded by regulatory, ethical, and societal constraints rather than purely performance-driven optimization.
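A toy rendering of this heuristic is sketched below; the parameter vector, adaptation rate, metric values, and clipping rule are all illustrative assumptions intended only to show the direction of the corrective signal.

```python
import numpy as np

alpha = 0.1                                # adaptation rate (illustrative)
theta = np.array([0.50, 0.30])             # deployed policy parameters

g_twin = np.array([0.80, 0.90])            # DT-predicted [sustainability, fairness]
g_real = np.array([0.72, 0.93])            # observed post-deployment metrics

delta_g = g_twin - g_real                  # governance discrepancy signal
theta = theta + alpha * delta_g            # corrective adaptation step
theta = np.clip(theta, 0.0, 1.0)           # keep parameters within admissible bounds
print(theta)                               # -> [0.508 0.297]
```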
5.8. Discussion and Framework Implications
MORL-SGF provides an architecture-level integration of governance-aware MORL and DT validation, enabling explicit trade-off exposure and policy filtering prior to deployment. Its primary implication is a shift from single-policy optimization to auditable policy portfolios, where governance constraints and stakeholder priorities can be operationalized transparently. The framework is intended as a reusable blueprint that can be instantiated with different MORL algorithms and domain-specific DT environments.
6. Evidence Synthesis and Multi-Domain Insights from Smart City Literature
The design of the MORL-SGF is grounded in a systematic synthesis of 79 peer-reviewed studies, complemented by additional references used for contextual and methodological support, spanning smart transportation, energy systems, surveillance, healthcare, and urban infrastructure planning. While these studies demonstrate progress in AI, reinforcement learning, and simulation for city-scale systems (e.g., [1,2,3,5,8,10,11]), the synthesis identifies persistent limitations in governance integration, explainability, and cross-domain reasoning.
This section evaluates the governance readiness of contemporary smart-city AI research and positions MORL-SGF in relation to the identified gaps. The validation presented in this section is interpretive and synthesis-driven, aimed at demonstrating architectural readiness rather than empirical performance benchmarking.
6.1. Review Methodology and Governance Readiness Scoring
Each selected study was evaluated using a GRI designed to assess whether an AI system is suitable for accountable, deployable urban decision-making. Five governance-critical dimensions were considered:
Objective formulation (O): single- vs. multi-objective learning;
Governance integration (G): explicit modeling of fairness, sustainability, or accountability;
Digital Twin or simulator usage (D): from none to high-fidelity twins;
Explainability (X): post hoc vs. intrinsic interpretability;
Validation setting (V): theoretical, simulated, or real-world deployment.
To ensure consistency and interpretability of the GRI, each dimension is discretized on a bounded ordinal scale reflecting increasing levels of governance maturity. Specifically, objective formulation is encoded as O ∈ {0, 1}, where 0 denotes single-objective optimization and 1 denotes explicit multi-objective formulation. Governance integration, DT usage, explainability, and validation rigor are encoded as G, D, X, V ∈ {0, 1, 2}, where 0 indicates absence, 1 denotes partial or post hoc implementation, and 2 represents explicit, intrinsic, or high-fidelity integration. Under this encoding, the maximum achievable composite score is 10, corresponding to full multi-objective formulation with explicit governance modeling, high-fidelity DT validation, intrinsic explainability, and real-world or deployment-level validation.
The normalized index is defined as

GRI = (O + G + D + X + V) / 10

where GRI ∈ [0, 1] and higher values indicate stronger alignment with governance-aware and deployment-ready AI.
This scoring framework enables quantitative comparison across heterogeneous domains while preserving interpretability for policy-oriented analysis.
Given the interpretive nature of governance assessment, potential subjectivity in scoring was addressed through a structured coding rubric with explicit operational definitions for each dimension (O, G, D, X, V). For example, governance integration (G = 2) required direct embedding of fairness, sustainability, or accountability indicators within the learning objective, whereas post hoc evaluation corresponded to G = 1. Similarly, DT integration (D = 2) required active policy-loop validation rather than standalone simulation or visualization. To enhance internal consistency, a two-pass coding process was adopted, consisting of initial classification followed by structured cross-verification against predefined criteria.
While formal inter-rater reliability analysis was not conducted, the transparent scoring rubric and ordinal definitions mitigate arbitrariness and allow independent replication or reassessment by future researchers.
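Under the rubric above, the composite computation is direct; the sketch below reproduces the scoring arithmetic, with the example codings chosen purely for illustration.

```python
def gri(O: int, G: int, D: int, X: int, V: float) -> float:
    """Governance Readiness Index: ordinal codings normalized by the
    maximum composite score of 10, yielding GRI in [0, 1]."""
    assert O in (0, 1) and all(0 <= v <= 2 for v in (G, D, X, V))
    return (O + G + D + X + V) / 10.0

# A hypothetical performance-centric RL study vs. the idealized MORL-SGF
# profile reported in Section 6.3.
print(gri(0, 1, 0, 1, 1))     # -> 0.3
print(gri(1, 2, 2, 2, 1.9))   # -> 0.89
```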
6.2. Key Empirical Findings
6.2.1. Finding 1—Governance Objectives Are Rarely Embedded into Learning
Although 37% of reviewed studies adopt multi-objective formulations ([3,12,63]), only 11% explicitly encode governance objectives such as fairness, sustainability, or accountability directly into the reward function ([1,8,73]). Many systems continue to optimize classical engineering KPIs (e.g., latency, throughput, cost) without governance coupling:

P(O = 1) = 0.37, P(G = 2) = 0.11

All reported percentages were computed through frequency analysis of the 79 retained studies. Each study was coded according to the predefined ordinal rubric (O, G, D, X, V). Binary or ordinal classifications were then aggregated by counting occurrences of each category and normalizing by the total sample size (n = 79). For example, P(O = 1) = 0.37 indicates that 29 out of 79 studies explicitly adopted multi-objective formulations. No inferential statistical modeling was performed; results represent descriptive frequency analysis of governance-readiness characteristics.
6.2.2. Finding 2—Digital Twins Are Underutilized for Policy Validation
Only a limited number of works use DTs or high-fidelity simulations as part of the RL loop ([2,20,70]), leaving most approaches reliant solely on abstract simulators. In this coding, D = 2 denotes high-fidelity DT environments explicitly integrated into the policy evaluation loop rather than used solely for visualization or offline simulation. Many DT implementations remain focused on forecasting or visualization rather than policy-validation integration.
6.2.3. Finding 3—Explainability Is Mostly Post Hoc, Not Intrinsic
Interpretability frameworks appear in a few mobility and energy applications ([3,5]), but intrinsic policy explainability appears in fewer than 10% of reviewed systems:

P(X = 2) < 0.10
6.2.4. Finding 4—Cross-Domain Decision Learning Is Largely Absent
The literature remains dominated by domain-isolated optimization, with transportation [20,42,63] and energy systems [10,28,70] accounting for the majority of studies. Only 4% of systems address cross-sector trade-offs.
6.2.5. Finding 5—MORL Is Rare and Not Aligned with Governance Reasoning
Although some works explore MORL in transportation and energy [12,20,73], governance-aware Pareto selection is still absent: few MORL-based approaches link Pareto policy selection to governance or sustainability criteria.
6.3. Governance Readiness Benchmarking
Table 5 positions common research paradigms according to average Governance Readiness Index. The MORL-SGF score reflects a framework-level idealized evaluation rather than an empirical deployment benchmark.
The GRI score of 0.89 assigned to MORL-SGF represents an idealized framework-level evaluation derived directly from the ordinal scoring rubric defined in Section 6.1. Under this rubric, MORL-SGF satisfies the maximum attainable scores in objective formulation (O = 1), governance integration (G = 2), DT validation (D = 2), and intrinsic explainability (X = 2). Deployment validation is modeled at near-full maturity (V ≈ 1.9), reflecting structured DT-based validation and governance-constrained deployment logic rather than large-scale empirical field implementation. The resulting composite score, (1 + 2 + 2 + 2 + 1.9)/10 ≈ 0.89, therefore represents a theoretical upper bound on architectural alignment with governance-aware AI design.
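A worked sketch of this composite computation is shown below; the function name is illustrative, while the rubric scores and the fixed normalization constant of 10 follow the text.

```python
def governance_readiness_index(O, G, D, X, V, norm=10.0):
    """Composite GRI per the rubric: sum of ordinal dimension scores
    normalized by a fixed constant (10, following the text)."""
    return (O + G + D + X + V) / norm

# Idealized framework-level scores assigned to MORL-SGF in Section 6.3
gri = governance_readiness_index(O=1, G=2, D=2, X=2, V=1.9)
print(f"GRI(MORL-SGF) = {gri:.2f}")  # -> 0.89
```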
Importantly, this benchmarking is comparative and structural rather than empirical. The score is not derived from real-world deployment metrics but from architectural capability alignment under the defined governance maturity criteria. This distinction ensures transparency and prevents conflation between conceptual completeness and empirical validation. The improvement margin between MORL-SGF and the nearest paradigm is

$$\Delta_{\mathrm{GRI}} = \mathrm{GRI}_{\mathrm{MORL\text{-}SGF}} - \mathrm{GRI}_{\mathrm{nearest}}, \qquad \frac{\Delta_{\mathrm{GRI}}}{\mathrm{GRI}_{\mathrm{nearest}}} \approx 0.37,$$

a 37% relative increase in governance alignment. This improvement reflects the cumulative impact of governance-aware rewards, DT validation, and Pareto-based explainability.
6.4. Domain-Specific Gaps and MORL-SGF Contributions
Key shortcomings are consistent across domains, with representative examples drawn from transportation [20,42], energy [10,28,70], healthcare [33,53], surveillance [13,68], and urban systems [35,73]. MORL-SGF responds directly to these limitations by:
Embedding governance objectives into learning;
Validating policies under extreme scenarios;
Enabling auditable, stakeholder-driven policy selection;
Supporting cross-domain governance reasoning.
6.5. Key Implications and Positioning of MORL-SGF
The evidence synthesis highlights persistent gaps in governance integration, DT validation, explainability, cross-domain coordination, and governance-aligned MORL deployment. These findings provide analytical justification for the architectural components of MORL-SGF described in Section 5.
8. Challenges, Limitations, and Open Research Directions
While MORL-SGF introduces a structured path toward accountable and sustainability-driven policy learning in smart cities, deploying such a framework at scale presents persistent challenges spanning governance modeling, technical feasibility, data readiness, multi-stakeholder alignment, and real-world reliability [40,62,64,76]. Recognizing these challenges transparently is essential for a responsible research agenda, future benchmarking, and practical road-to-deployment planning. This section discusses key limitations and research opportunities across governance, learning, infrastructure, evaluation, and deployment dimensions.
8.1. Governance Representation and Quantification Challenges
Although sustainability and governance are widely endorsed goals in city planning, modeling them as mathematical objectives remains nontrivial [67,79]. Unlike latency or energy consumption metrics, governance constructs such as fairness, accountability, citizen trust, and participatory inclusion lack universally standardized or machine-readable formulations [33]. Consequently, reward shaping becomes highly context-dependent and often relies on proxy variables that can introduce bias or oversimplification. Critical studies on smart-city governance caution that technological optimization alone cannot resolve institutional accountability, democratic legitimacy, or social trust, underscoring the need for governance-aware AI frameworks that explicitly expose and manage policy trade-offs rather than obscuring them through automated decision-making [80].
The following expression illustrates a common composite governance formulation discussed in the literature; it is not used within MORL-SGF, which preserves vector-valued rewards:

$$R_{\mathrm{gov}}(s,a) \;=\; \sum_{i} w_i \, g_i(s,a), \qquad \sum_{i} w_i = 1, \quad w_i \ge 0,$$

where each $g_i$ denotes a normalized governance indicator (e.g., fairness, sustainability, or accountability). This weighted form is shown only as an optional stakeholder preference aggregation for post-learning selection, not as the MORL training reward (which remains vector-valued). However, determining the weights $w_i$ without embedding subjective political or institutional bias is itself a governance challenge. Future research must develop standardized, audit-ready governance metrics derived from regulatory frameworks such as ISO 37120 [81], the UN SDG [82] indicator taxonomy, or local municipal policy KPIs. Equally important is the construction of mechanisms that allow stakeholder-controlled adjustment of governance weights, ensuring alignment with democratic priorities instead of opaque technical defaults.
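As a minimal illustration of such stakeholder-controlled aggregation, the sketch below ranks a learned Pareto set under a supplied weight vector for post-learning selection only; the policies, indicator values, and weights are hypothetical.

```python
import numpy as np

# Governance scores of Pareto-optimal policies along illustrative
# indicators (fairness, sustainability, accountability), each in [0, 1].
pareto_policies = {
    "pi_1": np.array([0.82, 0.61, 0.70]),
    "pi_2": np.array([0.64, 0.88, 0.59]),
    "pi_3": np.array([0.71, 0.74, 0.81]),
}

def select_policy(policies, weights):
    """Post-learning scalarization: rank Pareto policies by a
    stakeholder-supplied weight vector (normalized for auditability)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    scores = {name: float(w @ g) for name, g in policies.items()}
    return max(scores, key=scores.get), scores

# Stakeholders prioritizing fairness over the other indicators
best, scores = select_policy(pareto_policies, weights=[0.5, 0.25, 0.25])
print(best, scores)
```

Because the weights act only on an already-learned Pareto set, they can be revised and re-audited without retraining, which is one way to keep preference aggregation under stakeholder control.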
Beyond weight subjectivity, governance-aware reward modeling introduces deeper ethical risks related to reward mis-specification and governance manipulation. Even when vector-valued rewards are preserved during MORL training, the selection and normalization of governance indicators may unintentionally privilege measurable proxies over latent civic values. For example, fairness metrics based solely on service distribution variance may overlook structural inequalities, while sustainability metrics may neglect long-term ecological externalities not captured in short-horizon simulations.
Additionally, governance objectives may be strategically influenced if institutional actors adjust threshold definitions or post-learning aggregation weights to favor politically convenient outcomes. Such manipulation risks transforming governance-aware AI into governance-appearing optimization without substantive accountability. Addressing these vulnerabilities requires transparent metric documentation, stakeholder-auditable reward design, periodic recalibration of governance thresholds, and independent policy review mechanisms capable of detecting metric gaming or reward exploitation.
8.2. Scalability of Multi-Objective Reinforcement Learning
Scaling MORL to real city environments remains computationally expensive, especially where high-dimensional state spaces, long planning horizons, and dense Pareto front approximations are required. MORL scalability challenges such as Pareto explosion, long horizons, and high-dimensional objectives have been highlighted in advanced RL research [26,52,83], and hierarchical or preference-based MORL architectures may mitigate the computational overhead [78,84]. The size of the non-dominated policy set grows rapidly with the number of objectives; a classical expected-case estimate for randomly distributed candidates is

$$\mathbb{E}\big[\,|\mathcal{P}|\,\big] \;=\; O\!\left(\frac{(\ln k)^{\,m-1}}{(m-1)!}\right),$$

where $k$ is the number of policy candidates and $m$ is the number of objectives. This growth compromises real-time decision-making, especially when governance objectives are expanded beyond efficiency and sustainability into auditability, equity, and citizen participation. The expression illustrates the general combinatorial growth trend of the non-dominated policy set with respect to the number of objectives rather than a strict theoretical complexity bound.
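This growth trend can be observed empirically. The sketch below filters the non-dominated set from k randomly sampled reward vectors and reports its size as m increases; the uniform sampling model is an illustrative assumption, not a property of MORL-SGF.

```python
import numpy as np

def non_dominated(points):
    """Return the rows of `points` not dominated by any other row,
    treating all objectives as maximized."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q >= p) and np.any(q > p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(p)
    return front

rng = np.random.default_rng(seed=0)
k = 500                     # candidate policies
for m in (2, 3, 5, 8):      # number of objectives
    rewards = rng.random((k, m))
    print(f"m = {m}: {len(non_dominated(rewards))} of {k} policies are non-dominated")
```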
Promising directions include dimensionality reduction for objective pruning, preference-based MORL to limit non-actionable Pareto regions, and hierarchical MORL where governance objectives are learned at a slower policy cycle while operational objectives are updated in real time.
While MORL-SGF provides structural governance alignment, computational scalability remains a non-trivial challenge. Multi-objective reinforcement learning inherently increases optimization complexity relative to scalarized RL due to the need to approximate or maintain a Pareto policy set rather than a single optimal solution. The computational burden grows with (i) the dimensionality of the reward vector, (ii) the size of the state–action space, and (iii) the number of non-dominated candidate policies retained during training.
In high-dimensional objective spaces, Pareto front approximation may scale super-linearly as the number of objectives increases, potentially leading to policy set expansion and increased storage and evaluation costs. Additionally, integration with DT environments introduces further computational overhead, particularly when stress-testing policies under multiple extreme scenarios or multi-agent urban simulations.
In large-scale city deployments, scalability may require distributed MORL training, objective-space pruning mechanisms, or adaptive Pareto sampling strategies to prevent combinatorial growth. These computational trade-offs highlight the need for future research on scalable governance-aware MORL implementations capable of operating under real-time urban constraints.
8.3. Reliability of Digital Twin Environments
DT fidelity concerns are documented in the smart-infrastructure and simulation-based validation literature [39,76], and the need for adversarial testing, uncertainty modeling, and cross-domain synchronization is emphasized in multiple DT studies [33,53]. DTs play a critical role in pre-deployment policy evaluation, yet their representational fidelity remains a bottleneck. If the DT fails to model rare but high-impact scenarios (e.g., flash floods, mass transit failures, sensor blackouts, civil events), policies validated in simulation may fail catastrophically in real deployment. Moreover, many city DTs are fragmented, vertically siloed, or lack real-time bidirectional data streams.
A robust validation loop ideally cycles from candidate-policy simulation through adversarial stress testing and governance auditing to staged deployment, with real-world feedback used to recalibrate the twin.
Future progress requires adaptive DTs capable of uncertainty modeling, adversarial stress testing, domain randomization, and cross-sector data fusion pipelines that maintain causal consistency between simulated and real urban dynamics.
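One possible way to operationalize such a loop is sketched below: each candidate policy is stress-tested across DT scenarios, including rare high-impact events, and must satisfy every governance threshold before staged deployment. All interfaces here (the stub twin, its rollout method, and the metric names) are hypothetical placeholders, not a specified MORL-SGF API.

```python
class StubTwin:
    """Stand-in for a high-fidelity DT interface (hypothetical)."""
    def rollout(self, policy, scenario):
        # A real twin would simulate the policy under the scenario;
        # fixed metrics are returned here purely for illustration.
        return {"equity": 0.80, "safety": 0.90}

def validate_in_digital_twin(policy, twin, scenarios, thresholds):
    """Pre-deployment gate: a policy passes only if it meets every
    governance threshold under all stress scenarios."""
    for scenario in scenarios:
        metrics = twin.rollout(policy, scenario)
        for name, minimum in thresholds.items():
            if metrics[name] < minimum:
                return False, f"violated '{name}' under scenario '{scenario}'"
    return True, "eligible for staged deployment"

ok, verdict = validate_in_digital_twin(
    policy="pi_3",
    twin=StubTwin(),
    scenarios=["flash_flood", "mass_transit_failure", "sensor_blackout"],
    thresholds={"equity": 0.70, "safety": 0.85},
)
print(ok, verdict)
```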
8.4. Explainability, Accountability, and Policy Auditability
Explainability requirements for urban AI have been previously noted in governance and fairness research [40,64,78]. Policy-level interpretability tools, such as counterfactual explanations, are increasingly recognized as essential for public institutions [34,78]. Although some of these studies focus on domain-specific digital systems rather than reinforcement learning algorithms, they provide empirical evidence that transparency, perceived fairness, and user trust are decisive factors in the acceptance of AI-enabled public-sector technologies, thereby motivating auditable and explainable policy-learning frameworks for smart-city governance. Despite MORL producing Pareto-optimal policies, explaining the rationale behind policy trade-offs to non-technical stakeholders remains a governance barrier: current MORL literature focuses on dominance ranking, yet city planning requires actionable justification. Illustrative examples of governance trade-offs include:
Why a policy sacrifices 12% of traffic efficiency to improve equity by 34% in underserved districts;
Which demographic zones benefit or lose from specific Pareto solutions;
What minimum governance threshold a policy violates if deployed.
Empirical studies on citizen acceptance of smart-city technologies demonstrate that transparency, explainability, and perceived fairness are stronger predictors of public trust than technical performance alone, reinforcing the necessity of auditable and interpretable policy-learning systems in urban governance [85]. Future research must formalize automatically generated policy audit trails, counterfactual explanations, and compliance certificates tied to governance constraints. These should resemble model cards or policy sheets, but at the policy level rather than the model level.
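A policy-level audit record could take the following illustrative form, adapting the model-card idea to Pareto policy selection; the schema and field names are assumptions for exposition, not a proposed standard.

```python
from dataclasses import dataclass, field

@dataclass
class PolicyAuditCard:
    """Illustrative policy-level analogue of a model card: records why a
    Pareto policy was selected and whom it benefits or disadvantages."""
    policy_id: str
    tradeoffs: dict            # e.g., {"traffic_efficiency": -0.12, "equity": +0.34}
    affected_zones: list       # demographic zones that gain or lose
    thresholds_violated: list  # governance constraints breached, if any
    counterfactuals: list = field(default_factory=list)  # rejected alternatives and why

card = PolicyAuditCard(
    policy_id="pi_3",
    tradeoffs={"traffic_efficiency": -0.12, "equity": +0.34},
    affected_zones=["district_7", "district_12"],
    thresholds_violated=[],
)
print(card)
```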
8.5. Data Readiness, Bias, and Social Representation Risks
Challenges of biased or incomplete urban data have been documented in the smart-city sensing and equity-focused AI literature [26,33,62,67]. Reward-level fairness calibration and demographic stress-testing extensions are necessary to prevent systemic bias propagation [78]. MORL-SGF assumes access to representative city-wide data, but many real municipal datasets contain sampling bias, sensor sparsity, missing demographic coverage, or socioeconomic blind spots. If left uncorrected, governance-aware learning may optimize biased realities rather than ideal civic outcomes. This is particularly critical when encoding fairness as a reward component, for example as a variance penalty of the form

$$r_{\mathrm{fair}} = -\,\mathrm{Var}_{z \in \mathcal{Z}}\big(q_z\big),$$

where $q_z$ denotes the service level delivered to demographic zone $z$: if the measured $q_z$ themselves reflect sensing gaps, the learned policy inherits those gaps.
Future work must introduce bias-aware reward normalization, fairness-calibrated state sampling, synthetic data augmentation for marginalized zones, and worst-case equity stress testing before policy approval.
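As a minimal sketch of bias-aware reward normalization, the fairness term below weights per-zone service levels by inverse sensor coverage so that under-sensed zones are not discounted; the weighting scheme is an illustrative assumption, not part of the framework specification.

```python
import numpy as np

def fairness_reward(service_levels, sensor_coverage):
    """Illustrative fairness term: negative coverage-weighted variance of
    per-zone service levels. Upweighting poorly sensed zones is a simple
    bias-aware correction, not a definitive formulation."""
    s = np.asarray(service_levels, dtype=float)
    w = 1.0 / np.clip(np.asarray(sensor_coverage, dtype=float), 1e-6, None)
    w = w / w.sum()                  # normalize inverse-coverage weights
    mean = np.sum(w * s)
    return -float(np.sum(w * (s - mean) ** 2))  # closer to 0 = more equitable

# Zone 2 is under-sensed (30% coverage), so its service gap counts more
print(fairness_reward([0.9, 0.6, 0.8], sensor_coverage=[1.0, 0.3, 0.8]))
```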
8.6. Synthesis and Research Outlook
The challenges identified in this section highlight that advancing governance-aware reinforcement learning for smart cities requires progress across multiple dimensions, including governance quantification, learning scalability, simulation fidelity, explainability, and data integrity. As summarized in Table 6, current limitations are not isolated technical shortcomings but systemic gaps that emerge when AI systems are deployed in complex, socio-technical urban environments [39,64,76]. The framework has not yet been implemented in a live DT environment or evaluated through numerical benchmarking, which remains a necessary next step for empirical validation.
A central challenge remains the lack of standardized, machine-interpretable governance metrics. While sustainability, fairness, and accountability are widely recognized as policy objectives, their translation into robust reward formulations continues to rely on context-specific proxies, raising concerns about bias, subjectivity, and regulatory alignment. Addressing this gap will require the development of audit-ready governance indicators grounded in international standards such as SDGs, ISO frameworks, and municipal policy KPIs, alongside mechanisms that allow stakeholder-adjustable governance preferences.
Scalability poses another critical barrier. As the number of objectives increases, MORL systems face Pareto front explosion and increased computational cost, limiting their applicability in real-time city operations. Promising research directions include preference-based MORL, hierarchical learning architectures, and objective-space reduction techniques that preserve governance relevance while maintaining tractability.
The reliability of DT environments also remains a bottleneck. Simulation fidelity gaps, limited modeling of rare but high-impact events, and fragmented cross-domain representations can undermine policy validation. Future DTs must evolve toward uncertainty-aware, adversarially tested, and causally consistent platforms capable of supporting governance-critical decision validation across multiple urban sectors.
Explainability and auditability represent equally important governance requirements. While Pareto-optimality exposes trade-offs mathematically, city administrators and regulators require interpretable policy justifications, counterfactual explanations, and compliance certificates that articulate why a policy was selected and whom it benefits or disadvantages. Embedding such policy-level explanations into MORL pipelines remains an open research challenge.
Finally, data readiness and social representation risks must be addressed to prevent governance-aware learning from reinforcing existing inequities. Bias-aware reward normalization, demographic stress testing, and synthetic data augmentation for underserved populations are essential to ensure that learned policies reflect equitable civic objectives rather than biased data realities.
Overall, addressing these challenges is essential to move MORL-SGF from a conceptual framework toward operational civic infrastructure. The value of MORL-SGF lies not only in optimizing urban systems, but in producing governable, auditable, and deployable policy portfolios that city authorities can justify, adapt, and regulate through transparent policy reasoning rather than algorithmic opacity. As governance-aware AI becomes a foundational requirement for future smart cities, MORL-SGF represents a necessary transition from performance-driven autonomy toward responsibility-driven urban intelligence.
9. Conclusions and Future Research Agenda
Smart cities require decision systems that extend beyond single-objective performance optimization toward governance-aware and accountable policy learning. This study introduced MORL-SGF, a unified architecture integrating multi-objective reinforcement learning, ESG/SDG-aligned reward modeling, DT-based policy validation, and Pareto-based policy auditing within a single decision pipeline.
From a methodological perspective, MORL-SGF formalizes governance objectives—such as sustainability, equity, safety, and accountability—as intrinsic components of vector-valued reinforcement learning rewards rather than post hoc evaluation criteria. The framework replaces single-policy optimization with Pareto-governed policy portfolios, supports structured DT validation prior to deployment, and establishes a cross-domain governance layer capable of coordinating heterogeneous urban subsystems under shared accountability constraints. The structured synthesis of 79 smart-city studies provides empirical grounding for this architectural design.
The evidence synthesis in Section 6 quantitatively substantiates the structural gap motivating this work. Within the reviewed corpus, 37% of studies adopted multi-objective formulations, yet only 11% explicitly embedded governance objectives within reinforcement learning reward functions. DT integration into policy-loop validation appeared in fewer than one-third of cases, while intrinsic explainability mechanisms were present in fewer than 10% of systems. Cross-domain governance-aware MORL implementations were rare. These findings indicate that contemporary smart-city AI research remains largely performance-centric.
Responsible deployment of MORL-SGF requires careful governance reward specification, high-fidelity DT modeling, human-in-the-loop policy selection, bias-aware data handling, and scalable MORL implementation strategies. These considerations highlight that governance-aware learning is both a technical and institutional challenge.
Future research should focus on standardized governance reward libraries aligned with SDG and ESG indicators, scalable and preference-based MORL architectures, causal and neuro-symbolic governance modeling, continual post-deployment governance adaptation, citizen-in-the-loop preference integration, and benchmarking within high-fidelity urban DT environments.
By embedding governance considerations directly into learning and validation processes, MORL-SGF provides a structured foundation for advancing toward accountable, auditable, and governance-aligned urban decision intelligence.