Article

Deep Learning-Enabled Policy Optimization for Sustainable Ship Registry Selection

1
School of International Studies, Hainan University, Haikou 570228, China
2
Law School, Hainan University, Haikou 570228, China
3
School of Computer Science and Technology, Hainan University, Haikou 570228, China
*
Author to whom correspondence should be addressed.
Sustainability 2025, 17(23), 10836; https://doi.org/10.3390/su172310836
Submission received: 6 November 2025 / Revised: 29 November 2025 / Accepted: 1 December 2025 / Published: 3 December 2025
(This article belongs to the Section Sustainable Transportation)

Abstract

The global maritime industry faces a conflict between economic competition and sustainability standards. Economic pressure often incentivizes ship registries toward regulatory leniency, degrading environmental and social standards. Traditional static models often overlook how current flag choices impact future inspection risks and financing costs. To address this, we propose a Deep Reinforcement Learning framework that models flag selection as a sequential decision problem. Using a Markov Decision Process, we integrate economic, environmental, and social rewards. We analyze Port State Control records, AIS data, and 27 policy factors to quantify policy effectiveness within the simulation environment. The results show significant heterogeneity in policy performance. Reducing corporate income tax yielded the highest reward improvement (+131.37, p < 0.001). This suggests that, within the model, economic viability serves as a foundation for environmental investments. Enhanced safety standards also generate significant value (+58.35, p < 0.001) by reducing accident penalties and improving reputation metrics. Conversely, increasing tonnage taxes incentivizes the agent toward registries with lax oversight (−87.61, p < 0.001). These findings demonstrate that economic competitiveness and sustainability are mutually reinforcing. This framework provides maritime administrations with a “policy sandbox” for evidence-based decision-making, enabling a transition to sustainability without sacrificing competitiveness.

1. Introduction

The global shipping industry faces increasing pressure to balance economic viability with sustainability objectives. While maritime transport underpins over 80% of global trade, its environmental and social impacts attract intense scrutiny [1,2]. A structural barrier to sustainability lies in the “open registry” system. Shipowners can flag vessels in jurisdictions offering favorable tax rates and regulatory flexibility, known as Flags of Convenience (FOCs) [3]. This competitive dynamic has shifted the majority of the global fleet to open registries like Panama and Liberia, while traditional maritime nations have seen their market share decline [4,5,6,7].
This competition creates a “sustainability paradox”. To attract tonnage, some registries may lower regulatory oversight, leading to a “race to the bottom” in safety and environmental standards [8,9]. Responsible maritime nations thus face a dual challenge [10]. They must design policies that remain economically competitive while enforcing high Environmental, Social, and Governance (ESG) standards. However, flag states act primarily as ecosystem designers rather than direct operators. They establish fiscal and regulatory frameworks, but shipowners and Recognized Organizations (ROs) influence day-to-day compliance based on economic logic. Therefore, effective policy must make sustainability economically rational for shipowners.
Traditional analytical methods struggle to resolve this paradox. Econometric studies successfully identify static correlations between costs and flag choice [11,12] but cannot model the sequential nature of these decisions. Flag selection is a dynamic problem: a decision to cut costs today affects future inspection risks, insurance premiums, and access to green financing [13,14]. Static optimization models often overlook these long-term feedback loops [15]. Consequently, they may fail to capture shipowners’ adaptive behavior in a changing regulatory environment [16,17].
To address these limitations, we propose a framework using Deep Reinforcement Learning (DRL) [18,19]. Unlike traditional methods, DRL is designed for sequential decision-making. It combines Deep Learning to process complex state spaces (27 heterogeneous policy factors) with Reinforcement Learning to optimize long-term cumulative rewards [20]. This allows our model to simulate how shipowners adapt to policy changes over time, balancing immediate economic incentives against future environmental and social benefits [21,22]. By modeling these dynamics, we provide a “policy sandbox” to test which interventions achieve sustainable competitiveness.
This study makes three principal contributions to maritime policy research:
  • We apply Deep Reinforcement Learning to the flag-of-registry selection problem, addressing the limitations of traditional static discrete choice models. We formulate flag selection as a sequential decision process. This approach captures how current choices influence future inspection risks and operational costs, offering a new theoretical lens for sustainability trade-offs.
  • We provide a systematic quantification of feature importance across 27 specific policy levers. Our results reveal significant heterogeneity in effectiveness, demonstrating that economic incentives (e.g., corporate tax reductions) can serve as the necessary foundation for environmental investments, while improper fiscal measures may trigger a “race to the bottom” dynamic.
  • We develop a dynamic policy simulation framework that allows maritime administrations to conduct ex ante scenario analysis. This tool enables policymakers to estimate the potential long-term cumulative effects of integrated policy portfolios on fleet competitiveness and ESG performance before actual implementation, reducing trial-and-error costs in governance.
The remainder of this paper is organized as follows. Section 2 reviews the literature on flag selection and methodological gaps. Section 3 details the DRL framework, including the Markov Decision Process formulation and reward function design. Section 4 describes the data and 27 quantified policy factors. Section 5 presents the experimental results, analyzing policy effectiveness across economic, environmental, and social dimensions. Section 6 discusses strategic implications, and Section 7 concludes with policy recommendations and future research directions.

2. Literature Review

Research on ship registry selection has traditionally focused on economic determinants. Early studies modeled flag choice as a cost minimization problem [23,24], identifying registration fees, taxes, and operational costs as primary drivers [25]. In this view, Flags of Convenience (FOCs) attract tonnage through regulatory leniency and fiscal incentives, often creating a “race to the bottom” in standards. While this explains the historical growth of open registries, emerging research suggests that purely cost-based advantages are eroding as international scrutiny intensifies [26,27,28,29].
Subsequent scholarship has integrated regulatory quality into these frameworks. Factors such as Port State Control (PSC) performance, safety standards, and administrative efficiency have been shown to significantly influence flag attractiveness [30,31]. However, most existing studies frame environmental and social regulations primarily as compliance burdens. There remains a lack of analysis on how flag state policies can simultaneously advance economic viability and sustainability goals (ESG) without treating them as zero-sum trade-offs.
Methodologically, the field relies heavily on static discrete choice models, such as Logit and Probit regressions [11,12]. While these methods effectively identify correlational factors, they treat flag selection as an isolated, one-time event. They fail to capture the sequential nature of the decision process, where a choice made today influences future states—such as inspection risks, insurance premiums, and access to green financing [13,14]. Recent applications of machine learning have improved predictive capabilities for specific operational risks [32,33,34,35], but they typically lack the capacity to optimize long-term policy strategies in dynamic environments.
Deep Reinforcement Learning (DRL) addresses these limitations by modeling decision-making as a sequential process under uncertainty [36,37,38,39]. Unlike static optimization models, DRL agents learn strategies through interaction with a simulated environment, maximizing cumulative long-term rewards rather than immediate utility. This approach has been successfully applied to complex maritime problems such as autonomous navigation and collision avoidance [40,41,42,43,44]. This study extends DRL to maritime policy analysis, providing a dynamic framework to evaluate how fiscal and regulatory interventions influence the long-term sustainability of ship registries.

3. Methodology

3.1. Problem Definition and Factor Identification

To clarify our methodological choice, we distinguish between Machine Learning (ML), Deep Learning (DL), and Deep Reinforcement Learning (DRL). ML algorithms typically perform static predictions (e.g., predicting detention risk) [45,46]. DL, a subset of ML, employs neural networks to process high-dimensional data [47,48]. DRL integrates DL with sequential decision-making. We select DRL because ship registry selection is fundamentally a sequential, not static, problem. A decision to choose a low-standard flag today creates long-term feedback loops—affecting future inspection probabilities, insurance costs, and financing access—that static models cannot capture. DRL optimizes policies by accounting for these delayed consequences and path dependencies [49,50,51].
We first structured the policy factors influencing flag selection into economic, environmental, and social dimensions (Table 1). Table 2 details the quantification and normalization methods for these 27 factors. We distinguish between active policy levers (e.g., tax rates, subsidies) that regulators can directly adjust, and contextual state variables (e.g., market indices) that constitute the decision environment but are exogenous to the registry’s immediate control. The RL agent observes both sets of variables to form the state space but can only act upon the active policy levers.

3.2. Sustainable Flag Selection Modeling Based on Reinforcement Learning

To address the sequential nature of flag selection, we model the shipowner’s decision-making process as a Markov Decision Process (MDP), defined by the tuple $(S, A, P, R, \gamma)$. This formulation allows the agent to optimize long-term sustainability objectives rather than short-term profits.
State Space ($S$). The state $s_t \in S$ at time $t$ encapsulates the decision environment, comprising both vessel-specific attributes (e.g., vessel age, current flag) and the external market environment. The state vector includes the quantified values of the 27 policy factors (F1–F27) defined in Table 2, along with dynamic market indicators such as freight rate indices and fuel prices.
Action Space ($A$). The discrete action set from which the shipowner selects $a_t$ is defined as:
$$A = \{\text{Maintain Current Flag},\ \text{Switch to National Flag},\ \text{Switch to FOC}\}, \tag{1}$$
Constraints are applied to ensure validity (e.g., a vessel cannot switch to a flag it is already registered under).
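The validity constraint can be illustrated with a simple action mask. This is a minimal sketch under stated assumptions: the two flag categories (`"national"` and `"foc"`), the `Action` enum, and the function name `valid_actions` are hypothetical and are not taken from the authors' implementation.

```python
from enum import Enum
from typing import List


class Action(Enum):
    MAINTAIN = 0         # Maintain Current Flag
    SWITCH_NATIONAL = 1  # Switch to National Flag
    SWITCH_FOC = 2       # Switch to FOC


def valid_actions(current_flag: str) -> List[Action]:
    """Return the actions allowed for a vessel registered under `current_flag`.

    Assumption: flags are grouped into two categories, "national" and "foc",
    and switching to the category a vessel already belongs to is invalid.
    """
    if current_flag not in ("national", "foc"):
        raise ValueError(f"unknown flag category: {current_flag!r}")
    allowed = [Action.MAINTAIN]
    if current_flag != "national":
        allowed.append(Action.SWITCH_NATIONAL)
    if current_flag != "foc":
        allowed.append(Action.SWITCH_FOC)
    return allowed
```

In a Q-learning agent, a mask of this kind would typically be applied before the greedy action is chosen, so that invalid actions are never selected.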
Reward Function ( R ). The immediate reward R t serves as the guiding signal for sustainability optimization. As defined in Equation (2), it integrates three conflicting objectives:
$$R_t = w_1 \cdot R_{\text{economic}} + w_2 \cdot R_{\text{environmental}} + w_3 \cdot R_{\text{social}} - C(a_t), \tag{2}$$
where $R_{\text{economic}}$ captures operating profits and tax efficiencies (F1–F9); $R_{\text{environmental}}$ reflects compliance status and safety records (F10–F13, F16); and $R_{\text{social}}$ accounts for labor standards and service quality (F14–F27). $C(a_t)$ represents the transaction and friction costs of flagging out, capturing the organizational disruption that undermines sustained sustainability improvements. The weights $w_1, w_2, w_3$ (specified in Table 3) balance these dimensions.
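As a minimal sketch of the reward in Equation (2), the function below combines the three component rewards and subtracts the switching cost, consistent with its description as a transaction and friction cost. The default weights are placeholders chosen for illustration only; the actual values are those specified in Table 3, which is not reproduced here. The function and parameter names are hypothetical.

```python
from typing import Tuple


def step_reward(
    r_economic: float,
    r_environmental: float,
    r_social: float,
    switch_cost: float,
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # placeholder, not Table 3
) -> float:
    """Weighted sum of the three reward components minus the switching cost C(a_t)."""
    w1, w2, w3 = weights
    return w1 * r_economic + w2 * r_environmental + w3 * r_social - switch_cost


# Maintaining the current flag incurs no switching cost in this toy example.
keep = step_reward(10.0, 5.0, 5.0, switch_cost=0.0)
# Flagging out with the same component rewards is penalized by the cost term.
switch = step_reward(10.0, 5.0, 5.0, switch_cost=2.0)
```

With identical component rewards, the only difference between the two calls is the cost term, which is what discourages frequent switching.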
To solve this high-dimensional MDP, we employ a Deep Q-Network (DQN). Unlike standard black-box neural networks, we incorporate a specialized attention mechanism layer before the fully connected layers. This mechanism computes a compatibility score between the state vector and a trainable context vector, normalized via a Softmax function.
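$$\alpha_i = \frac{\exp(e_i)}{\sum_{k=1}^{N} \exp(e_k)} \tag{3}$$
where $e_i$ denotes the compatibility score for the $i$-th input feature and $N$ the number of features. These attention weights $\alpha_i$ dynamically highlight which policy factors (e.g., “Corporate Tax” vs. “Safety Standards”) dominate the agent’s decision at any given step, which improves model interpretability.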
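The Softmax normalization of the attention scores can be sketched in a few lines. The text does not specify how the compatibility score is computed, so the sketch assumes an element-wise product between each state feature and the matching entry of the context vector; that choice and the function name `attention_weights` are assumptions for illustration.

```python
import math
from typing import List


def attention_weights(state: List[float], context: List[float]) -> List[float]:
    """Softmax-normalized attention weights over state features.

    Assumption: the compatibility score is e_i = state[i] * context[i].
    """
    if len(state) != len(context):
        raise ValueError("state and context must have the same length")
    scores = [s * c for s, c in zip(state, context)]
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(e - shift) for e in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Three toy features; the second has the largest score and so the largest weight.
alphas = attention_weights([0.2, 0.9, 0.5], [1.0, 2.0, 1.0])
```

The weights are non-negative and sum to one, so they can be read as the relative emphasis placed on each factor at a given step.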
The training process follows the standard Q-learning update rule with Experience Replay to stabilize convergence. The complete hyperparameters (e.g., learning rate $\alpha = 0.001$, discount factor $\gamma = 0.95$) are detailed in Table 3, and the full training logic is presented in Algorithm 1.
Algorithm 1 Training Procedure of Sustainability-Oriented Attention-DQN
Require: Factors $S$, Actions $A$, Weights $w$, $\gamma = 0.95$, $\alpha = 0.001$, Batch $B = 32$.
 1: Initialize: Q-net $\theta$, Target $\theta^{-} \leftarrow \theta$, Buffer $D$, $\epsilon \leftarrow 1.0$.
 2: for episode $m = 1$ to $M$ do
 3:     Reset state $s_t$;
 4:     for step $t = 1$ to $T$ do
 5:         Select $a_t \leftarrow$ random action if $\mathrm{rand}() < \epsilon$, otherwise $\arg\max_{a} Q(s_t, a; \theta)$; execute $a_t$ and observe $r_t$ (Equation (2)) and $s_{t+1}$;
 6:         Store $(s_t, a_t, r_t, s_{t+1})$ in $D$.
 7:         if $|D| > B$ then
 8:             Sample batch $\{(s_j, a_j, r_j, s_{j+1})\}_{j=1}^{B}$ from $D$.
 9:             Target $y_j \leftarrow r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta^{-})$.
10:             Update $\theta$ by minimizing loss $L = \frac{1}{B} \sum_{j} \left( y_j - Q(s_j, a_j; \theta) \right)^2$.
11:         end if
12:         Every $C$ steps: $\theta^{-} \leftarrow \theta$.
13:         Update $s_t \leftarrow s_{t+1}$, decay $\epsilon$.
14:     end for
15: end for
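Three steps of Algorithm 1 (the ε-greedy selection, the bootstrapped target, and the mean squared loss) can be sketched without a deep learning library. This is an illustrative fragment, not the authors' PyTorch code: the Q-function is passed in as a plain callable, the helper names are hypothetical, and terminal states are not treated specially because Algorithm 1 does not mention them.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (state, action index, reward, next state)
Transition = Tuple[Sequence[float], int, float, Sequence[float]]


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_targets(
    batch: List[Transition],
    target_q: Callable[[Sequence[float]], Sequence[float]],
    gamma: float = 0.95,
) -> List[float]:
    """Targets y_j = r_j + gamma * max_a' Q(s_{j+1}, a'; theta^-)."""
    return [r + gamma * max(target_q(s_next)) for (_s, _a, r, s_next) in batch]


def mse_loss(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean squared error between targets y_j and predictions Q(s_j, a_j; theta)."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


rng = random.Random(0)
greedy = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)  # always greedy
batch = [([0.0], 0, 1.0, [0.0])]
targets = td_targets(batch, target_q=lambda s: [1.0, 2.0], gamma=0.95)
loss = mse_loss(targets, predictions=[2.4])
```

Keeping the target network as a separate callable mirrors the role of $\theta^{-}$: it changes only when it is explicitly synchronized, which is what stabilizes the bootstrapped targets.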

3.3. Sustainability Policy Impact Quantification and Priority Ranking

To translate the model’s learned strategies into actionable policy insights, we employ a hybrid assessment framework combining data-driven interpretability with expert-based feasibility analysis. To quantify the relative importance of each policy factor, we apply SHAP (SHapley Additive exPlanations) to the trained DQN agent. SHAP values originate from cooperative game theory and provide a robust measure of feature importance. For our framework, the SHAP value ϕ j represents the marginal contribution of policy factor j to the agent’s expected long-term reward (Q-value). A positive SHAP value indicates that a specific policy improvement (e.g., reducing tax F1 or enhancing safety F10) significantly influences the agent’s decision metrics toward a sustainable registry, while the magnitude reflects its relative efficacy compared to other interventions.
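The Shapley attribution that underlies SHAP can be computed exactly for a handful of features. The sketch below enumerates all coalitions for a toy value function; the per-factor contributions are invented for illustration and are not results from the study. Exact enumeration grows exponentially with the number of features, so applying SHAP to a 27-factor Q-network would in practice rely on an approximation.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values: weighted average marginal contribution of each feature."""
    n = len(features)
    phi: Dict[str, float] = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {j}) - value(s))
        phi[j] = total
    return phi


# Hypothetical additive contributions to the Q-value (illustration only).
TOY_CONTRIBUTION = {"F1": 3.0, "F10": 1.5, "F2": -2.0}


def toy_q(active: FrozenSet[str]) -> float:
    return sum(TOY_CONTRIBUTION[f] for f in active)


phi = shapley_values(list(TOY_CONTRIBUTION), toy_q)
```

For an additive value function each Shapley value equals the feature's own contribution, and the values always sum to the difference between the full and the empty coalition.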
While SHAP identifies what is effective, it does not account for implementation costs or political resistance. To address this, we complement the RL results with the Analytic Hierarchy Process (AHP). We convened an expert panel comprising 8 specialists: 3 academic researchers in maritime economics, 3 senior executives from shipping companies, and 2 maritime legal advisors. The panel performed pairwise comparisons of the top-performing policies identified by the RL model, evaluating them against three criteria:
  • Implementation Feasibility: The administrative and legal complexity of enacting the policy.
  • Fiscal Efficiency: The cost–benefit ratio from the government’s perspective.
  • Strategic Alignment: Consistency with long-term national maritime goals.
The Consistency Ratio (CR) for all expert judgment matrices was maintained below 0.10, ensuring the logical validity of the weights. The final policy priorities are derived by synthesizing the data-driven SHAP effectiveness scores with the expert-derived feasibility weights.
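The consistency check can be sketched as follows. The pairwise matrix below is a hypothetical, perfectly consistent set of judgments over the three criteria, not the panel's data; the priority vector is obtained by power iteration, and the random index values are Saaty's standard ones for matrices of order 3 to 5.

```python
from typing import List, Tuple

# Saaty's random consistency index for matrix orders 3 to 5.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix: List[List[float]], iters: int = 100) -> Tuple[List[float], float]:
    """Return (priority weights, consistency ratio) for a pairwise comparison matrix."""
    n = len(matrix)
    if n not in RANDOM_INDEX:
        raise ValueError("this sketch only supports matrices of order 3 to 5")
    w = [1.0 / n] * n
    for _ in range(iters):  # power iteration toward the principal eigenvector
        v = [sum(matrix[i][k] * w[k] for k in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    aw = [sum(matrix[i][k] * w[k] for k in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)  # consistency index
    return w, ci / RANDOM_INDEX[n]


# Hypothetical judgments: feasibility vs. fiscal efficiency vs. strategic alignment.
judgments = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, cr = ahp_weights(judgments)
```

A matrix would pass the threshold stated above when the returned ratio is below 0.10; the toy matrix is fully consistent, so its ratio is essentially zero.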

4. Case Study and Data Description

4.1. Dataset Construction and Feature Engineering

This study focuses exclusively on the open international shipping market where shipowners have the legal discretion to choose their flag registry. Protected domestic markets governed by strict cabotage regimes (e.g., the U.S. Jones Act trades) are excluded, as registry choice in these sectors is legally mandated rather than economically optimized.
We integrated data from three primary sources to construct the state space: (1) Port State Control (PSC) Records: Sourced from the Tokyo and Paris MoUs, these records provide objective indicators of safety, environmental, and social compliance (e.g., deficiencies and detentions). (2) AIS Trajectory Data: We utilized Automatic Identification System data to track vessel movements, port calls, and operational patterns. (3) Policy Factors: We quantified 27 regulatory and economic factors as detailed in Table 1. While PSC and AIS data serve as external proxies for internal decision-making, they represent the standard objective metrics for assessing fleet behavior in maritime economics.
The dataset covers the period from January to June 2024. This window was selected to capture immediate behavioral responses to the post-IMO 2023 greenhouse gas strategy, minimizing concept drift associated with older historical data. Although the calendar span is short, the data density is extremely high: it includes 8,008,935 AIS trajectory records and more than 1,025,721 unique state transition samples. While this high data density ensures the stability of the reinforcement learning training process, we acknowledge that the six-month period limits the model’s ability to capture multi-year economic cycles. Therefore, the results should be interpreted as reflecting short-to-medium-term tactical adaptations rather than long-term strategic fleet migration trends.
The 27 policy factors were quantified using a three-tier framework to ensure replicability:
  • Tier 1 (Public Databases): 15 factors (e.g., corporate tax rates, convention ratifications) were sourced from open international databases such as OECD Statistics, IMO GISIS, and the World Bank.
  • Tier 2 (Industry Reports): 8 factors (e.g., registration fees, compliance costs) were compiled from official registry fee schedules and industry reports (e.g., BIMCO, Drewry).
  • Tier 3 (Constructed Indicators): 4 factors were synthesized methodologically. For instance, “Policy Stability” (F25) was calculated as the inverse variance of regulatory changes over a 5-year rolling window.
Missing values (approximately 8%) were handled via median imputation within flag categories.
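Median imputation within flag categories can be sketched with the standard library alone. The record layout, the field names `flag_category` and `registration_fee`, and the numbers are hypothetical; the sketch only illustrates the grouping logic.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List


def impute_by_group(rows: List[Dict], group_key: str, value_key: str) -> List[Dict]:
    """Fill missing (None) values with the median of the row's own group."""
    observed = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            observed[row[group_key]].append(row[value_key])
    medians = {group: median(values) for group, values in observed.items()}
    filled_rows = []
    for row in rows:
        filled = dict(row)  # copy so the input is left unchanged
        if filled[value_key] is None:
            # Raises KeyError if a group has no observed value at all.
            filled[value_key] = medians[filled[group_key]]
        filled_rows.append(filled)
    return filled_rows


records = [
    {"flag_category": "foc", "registration_fee": 1.0},
    {"flag_category": "foc", "registration_fee": 3.0},
    {"flag_category": "foc", "registration_fee": None},
    {"flag_category": "national", "registration_fee": 10.0},
    {"flag_category": "national", "registration_fee": None},
]
imputed = impute_by_group(records, "flag_category", "registration_fee")
```

Imputing within each flag category rather than across the whole sample avoids pulling national-flag values toward open-registry levels, and vice versa.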
The processed dataset and model code are available from the corresponding author upon request. Raw third-party AIS and PSC data remain subject to commercial licensing restrictions.

4.2. Baseline Parameter Configuration

Table 4 summarizes the simulation parameters. The environment operates on an annual cycle (365 steps). We set the discount factor γ to 0.95 to reflect the long-term nature of capital planning in shipping. Economic volatility is set to 0.1 to simulate market stochasticity.
The experiments were conducted using Python 3.8 and PyTorch 1.9.0 on a workstation equipped with an NVIDIA GeForce RTX 3090 GPU. We employ two distinct baselines: (1) Competitive Baseline (Training), which reflects real-world disparities and is used to train the agent. For example, national flags are modeled with higher tax rates (0.25) and lower inspection probabilities (0.1) than FOCs (0.05 tax, 0.4 inspection risk). (2) Neutral Baseline (Experiments), which is used for the counterfactual analysis in Section 5. Here, all 27 policy factors are set to their cross-registry median values. This control setting isolates the marginal contribution of individual policy interventions by eliminating confounding variables from the baseline assumptions.
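The construction of the neutral baseline can be sketched as a per-factor median across registries. The example uses only the two competitive-baseline profiles quoted above, with hypothetical factor names; the study takes the median over all registries and all 27 factors.

```python
from statistics import median
from typing import Dict


def neutral_baseline(registries: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Set every factor to its median value across registries."""
    factor_names = next(iter(registries.values())).keys()
    return {
        name: median(profile[name] for profile in registries.values())
        for name in factor_names
    }


# Competitive-baseline values from the text; the dictionary keys are assumptions.
competitive = {
    "national": {"tax_rate": 0.25, "inspection_prob": 0.1},
    "foc": {"tax_rate": 0.05, "inspection_prob": 0.4},
}
neutral = neutral_baseline(competitive)
```

With only two registries the median coincides with the midpoint, so the neutral tax rate here is about 0.15 and the neutral inspection probability about 0.25.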

5. Results and Analysis

We present experimental evaluations of the proposed framework. Our results show that while economic competitiveness incentivizes flag selection, long-term sustainability requires balancing cost minimization with ESG standards. The analysis proceeds in five stages: model validation (Section 5.1), single-policy impact quantification (Section 5.2), risk-return profiling (Section 5.3), multi-dimensional evaluation (Section 5.4), and category-level analysis (Section 5.5).

5.1. Model Performance Comparison: Validating Sustainable Policy Learning

We first benchmark the Attention-based Deep Q-Network (Attn-DQN) against baseline policies to validate its ability to balance economic, environmental, and social objectives. Our reward function explicitly integrates these three dimensions (Equation (2)), enabling the agent to discover strategies that generate long-term value despite short-term compliance costs.
Figure 1 compares the average cumulative rewards of different strategies. The DQN agent outperforms all baselines with a mean reward of 218.2, representing an 88% improvement over the random policy (116.1) and a 73% improvement over the “Always Keep” heuristic (126.0).
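The two percentages follow directly from the reported mean rewards, as the short check below shows:

```latex
\[
\frac{218.2 - 116.1}{116.1} \approx 0.879 \;(\approx 88\%), \qquad
\frac{218.2 - 126.0}{126.0} \approx 0.732 \;(\approx 73\%)
\]
```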
Notably, the “Always National” policy performs poorly (mean reward 54.2). This yields three insights: (1) Economic viability is the foundation of competitiveness; blindly pursuing national registration without cost advantages is unsustainable. (2) Effective policies must make compliance economically attractive, not just legally mandated. (3) A context-aware decision framework is necessary to identify when high-standard registries become economically viable.
The box plots show that the DQN policy has a wider distribution (max ∼340), indicating its adaptive capacity to capitalize on favorable market conditions. In contrast, the “Always Keep” policy shows high variance, leaving vessels vulnerable to deteriorating standards.
Figure 2 shows the agent’s action distribution. The agent maintains the current flag in 94.7% of decisions, switches to a convenience flag (FOC) in 5.2%, and switches to the national flag in only 0.1%.
This conservative strategy reflects three principles: First, stability enables sustainability; frequent switching disrupts operations and safety culture. Second, the 5.2% shift to FOCs represents a rational survival response to financial distress, preventing total insolvency. Third, the rare selection of the national flag (0.1%) quantitatively reproduces the established “sustainability paradox”: without economic competitiveness, aspirational standards incentivize vessels toward regulatory arbitrage.
Figure 3 displays the reward distribution, which approximates normality (mean 218.2, median 215.6). The positive skew suggests the agent successfully captures opportunities to avoid penalties or gain preferential treatment.

5.2. Policy Reward Improvement Comparative Analysis

We simulated 27 policy interventions over 1000 episodes each. Figure 4 ranks their effectiveness. Results show significant heterogeneity, with reward changes ranging from −87.61 to +131.37.
Reducing Corporate Income Tax (F1) is associated with the largest reward improvement (ΔR = +131.37, p < 0.001). Within the model, this indicates that economic viability enables investments in green technology and safety. Enhanced Financial Subsidies (F5) rank second (ΔR = +73.06), serving as an effective alternative when tax restructuring is difficult. Enhanced Safety Inspection Standards (F10) rank third (ΔR = +58.35), indicating that rigorous oversight generates value by reducing accident costs and improving reputation.
Increasing Tonnage Tax (F2) corresponds to a strong negative reward signal (ΔR = −87.61, p < 0.001). This asymmetry, in which the harm exceeds benefits of comparable size, is consistent with loss aversion. Critically, unviable tax burdens push vessels toward FOCs with lax oversight, triggering a “race to the bottom” that degrades environmental and social performance.
Administrative improvements yield moderate gains. Registration Process Simplification (F19) and Policy Stability (F25) improve rewards by approximately 44 points. While less transformative than tax incentives, these factors reduce transaction costs and provide the certainty needed for long-term capital commitments. Regulatory policies (F11–F18) show smaller improvements (∼14–18 points), acting as necessary “hygiene factors” rather than primary differentiators.
We identify a three-tier policy architecture: (1) Transformative (Tier 1): Policies with improvements above 50 (F1, F5, F10). These form the core of competitiveness. (2) Complementary (Tier 2): Improvements between 20 and 50 (F19, F25). These enhance efficiency. (3) Foundational (Tier 3): Improvements below 20. These build institutional legitimacy. Strategic prioritization should focus on Tier 1 policies to establish viability before layering Tier 2 and 3 interventions.
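The tier thresholds can be expressed as a small classification function. The handling of negative reward changes is an assumption added for illustration, since the three tiers are defined only for improvements; the function name `policy_tier` is likewise hypothetical.

```python
def policy_tier(delta_r: float) -> str:
    """Map a reward change to the three-tier architecture described in the text."""
    if delta_r < 0:
        # Assumption: negative changes (e.g., F2) fall outside the three tiers.
        return "Harmful (outside the three tiers)"
    if delta_r > 50:
        return "Tier 1 (Transformative)"
    if delta_r >= 20:
        return "Tier 2 (Complementary)"
    return "Tier 3 (Foundational)"


# Reward changes reported in the text for selected policies.
reported = {"F1": 131.37, "F5": 73.06, "F10": 58.35, "F19": 44.11, "F2": -87.61}
tiers = {name: policy_tier(delta) for name, delta in reported.items()}
```

Applied to the reported values, F1, F5, and F10 land in Tier 1 and F19 in Tier 2, matching the grouping given above.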

5.3. Risk-Return Profile Analysis

Figure 5 maps the risk-return profiles of all policies.
Volatility is remarkably consistent across scenarios (σ ≈ 54.5). This indicates that risk stems mainly from exogenous market factors rather than policy choices. Policymakers can therefore pursue ambitious sustainability goals (e.g., F1, F10) without amplifying operational risk for shipowners.
The most effective policies (F1, F5, F10, F19, F25) form a rightward frontier. F1 maximizes return without increasing risk, while F10 improves rewards significantly. This clustering demonstrates that economic competitiveness and governance quality are mutually reinforcing.
The extreme position of F2 (μ = −52.37) confirms that fiscal mismanagement undermines all sustainability dimensions. Short-term cost pressures push vessels toward low-standard regimes, creating a cascade of environmental and social degradation.

5.4. Multi-Dimensional Evaluation

Table 5 details the performance of the top policies across the triple bottom line and provides precise numerical benchmarks.
The performance gap between F1 and F2 (218.97 points) highlights the critical role of fiscal policy. F1’s high mean reward (166.60) and lower coefficient of variation (0.326 vs. baseline 1.547) show that economic sustainability enhances operational stability. The top-tier policies (F1, F5, F10) successfully integrate economic incentives with environmental standards, validating their complementarity.
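These figures can be cross-checked against values reported elsewhere in the results. The check below assumes that the volatility from Section 5.3 (σ ≈ 54.5) applies to both scenarios and that the baseline mean equals the F1 mean minus its reward improvement (166.60 − 131.37 = 35.23); neither assumption is stated explicitly in the text, so the results agree with the reported coefficients only up to rounding.

```latex
\[
\mu_{F1} - \mu_{F2} = 166.60 - (-52.37) = 218.97
\]
\[
\mathrm{CV} = \frac{\sigma}{\mu}, \qquad
\mathrm{CV}_{F1} \approx \frac{54.5}{166.60} \approx 0.33, \qquad
\mathrm{CV}_{\text{baseline}} \approx \frac{54.5}{35.23} \approx 1.55
\]
```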
We propose a phased implementation strategy based on these quantitative rankings: Phase 1 (Governance): Implement F19 and F25 to build credibility (ΔR ≈ +44). Phase 2 (Enablement): Deploy F1 and F5 to create economic capacity (ΔR ≈ +200). Phase 3 (Consolidation): Introduce F10 to attract premium tonnage (ΔR ≈ +58). This sequence ensures that economic and governance foundations are in place to support rigorous environmental standards.
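The Phase 2 figure of roughly 200 appears to correspond to the sum of the single-policy effects of F1 and F5. This reading assumes that the two effects are additive, which the single-policy simulations do not by themselves establish. The Phase 1 figure, by contrast, matches the individual gain of F19 or F25 (about 44 each) rather than their sum.

```latex
\[
\Delta R_{F1} + \Delta R_{F5} = 131.37 + 73.06 = 204.43 \approx 200
\]
```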

5.5. Category-Level Analysis

We aggregated interventions into Economic (F1–F9), Environmental (F10–F13, F16–F17, F24), and Social (F14–F15, F18–F27) categories. Figure 6 displays their performance.
These categories exhibit distinct strategic roles: (1) Economic Policies: Provide the strongest reward generation (Mean: 59.82) but carry high variance due to the risk of poor fiscal design (e.g., F2). They are foundational for viability. (2) Environmental Policies: While offering modest average improvement (+14.73), they provide the highest stability (Normalized Score: 0.85), serving as essential hygiene factors and competitive differentiators. (3) Social Policies: These deliver 80% of the effectiveness of economic policies with strong stability. They act as enablers by reducing friction and building trust.
The heatmap reveals an integrated “Sustainability Triangle”: Economic policies provide the foundation, Environmental policies ensure stability, and Social policies reduce barriers. An optimal portfolio might consist of 60% Core Economic instruments (F1, F5), 25% Social Enablers (F19, F25), and 15% Environmental differentiators (F10). This confirms that sustainable competitiveness requires a holistic approach where economic efficiency and ESG responsibility are treated as synergistic rather than opposing forces.

6. Policy Recommendations

Based on our empirical analysis of sustainability-oriented flag state policy optimization, we propose the following phased recommendations derived from the simulation results. These recommendations are designed to support the simultaneous enhancement of economic viability, environmental responsibility, and social equity—the three pillars of sustainable maritime competitiveness. Our simulation results suggest that these dimensions can be modeled as mutually reinforcing objectives when policies are strategically sequenced and integrated.

6.1. Short-Term Priorities (0–12 Months): Establishing Economic Sustainability Foundations

The immediate goal proposed by the framework is to create the economic conditions necessary for shipowners to invest in sustainable operations while preventing migration to flags of convenience with lax environmental and social standards. This phase reflects the simulation finding that economic viability serves as a foundational pillar—without cost competitiveness, vessels may flee to regulatory havens, undermining the registry’s long-term capacity to enforce environmental and social standards.
Implement Competitive Tax Incentives (F1). Our analysis identifies corporate income tax reduction as a highly effective lever within the simulation (ΔR = +131.37), suggesting it acts not merely for profit maximization but as a key facilitator of sustainability investments. Policymakers might consider phased tax reductions for vessels meeting environmental standards (e.g., IMO EEDI/EEXI compliance, alternative fuel adoption) and social criteria (certified crew welfare systems). This approach aims to ensure economic competitiveness supports rather than undermines ESG objectives, mitigating the risk of a “race to the bottom” phenomenon where cost pressures incentivize vessels toward registries with minimal environmental oversight (as evidenced by F2’s substantial negative impact of ΔR = −87.61).
Streamline Administrative Processes (F19). Registration process simplification (ΔR = +44.11) reduces transaction costs while signaling governance quality. Establishing a one-stop digital portal with multilingual support, minimizing documentation through risk-based verification, and providing 24/7 technical assistance may signal administrative efficiency that could help attract responsible operators seeking stable, predictable regulatory environments for long-term sustainability planning.

6.2. Medium-Term Strategy (1–3 Years): Building Environmental–Social Excellence

Building on economic foundations, the focus shifts to differentiation through superior environmental performance and social responsibility that attract premium tonnage. This phase leverages the financial capacity created in Phase 1 to implement rigorous ESG standards that generate long-term competitive advantages.
Deploy Targeted Green Vessel Subsidies (F5). Financial subsidies (ΔR = +73.06) could prioritize vessels demonstrating environmental leadership: alternative fuel systems, emission control technologies, and certified environmental management systems. This may create positive selection effects, attracting shipowners committed to decarbonization pathways aligned with IMO 2030/2050 targets while building the registry’s reputation as an ESG leader.
Strengthen Safety and Environmental Standards (F10). Our model suggests that enhanced safety inspection standards can generate substantial value (ΔR = +58.35) through reduced accident costs, improved PSC performance, and enhanced reputation among ESG-conscious charterers and financiers. Rigorous enforcement of MARPOL Annexes, ballast water management conventions, and crew welfare standards may position the registry as a preferred choice for responsible maritime operations.

6.3. Long-Term Vision (3+ Years): Sustainable Governance Leadership

The long-term objective is to support the establishment of the registry as a global exemplar of integrated sustainability governance, influencing international maritime policy evolution.
Institutionalize Dynamic Policy Evaluation (F25). The reinforcement learning framework developed in this study could serve as a decision-support infrastructure for continuous policy optimization. This adaptive governance approach enables real-time assessment of how evolving policy combinations affect the economic–environmental–social triple bottom line, aiming to support sustained competitiveness as global sustainability requirements intensify.
Champion International Sustainability Standards. Proactive engagement in IMO deliberations on carbon pricing mechanisms, crew welfare conventions, and circular economy principles for ship recycling may amplify influence while aligning national interests with global sustainable development goals. Leadership in international standard-setting potentially creates first-mover advantages for domestic operators while elevating the entire industry’s sustainability performance.
These phased recommendations reflect the model’s insight that sustainable maritime competitiveness requires simultaneous progress across all three dimensions: economic instruments that enable green investments, environmental–social standards that ensure responsible operations, and governance excellence that maintains long-term credibility and stability.

7. Discussion and Conclusions

We first clarify the mechanism of maritime governance to contextualize our findings. Flag states function as ecosystem designers rather than direct operators. While they establish fiscal and regulatory frameworks, shipowners make compliance decisions based primarily on economic logic. Our simulation results align with this “economic-primacy” view: Corporate Income Tax Reduction (F1) acts as the most effective policy lever ($\Delta R = +131.37$). This suggests that environmental and social policies are most effective when they align with shipowners’ economic rationality, either by reducing operational risks (e.g., safety standards) or by enhancing market access.
Our primary conclusion is that, within the simulation, economic sustainability serves as the foundation for environmental and social responsibility. Rather than competing with ESG objectives, fiscal viability enables them. The high effectiveness of tax incentives suggests that competitive fiscal policies provide shipowners with the necessary financial capacity to invest in green technologies and crew welfare. Without this economic foundation, aspirational environmental standards are difficult to enforce.
Second, we find that rigorous regulations can generate economic value. Enhanced Safety Inspection Standards (F10, $\Delta R = +58.35$) improve registry reputation and reduce accident-related costs, challenging the assumption that regulation is merely a financial burden. Conversely, the negative impact of Increased Tonnage Tax (F2, $\Delta R = -87.61$) quantitatively reproduces the “sustainability paradox.” Excessive fiscal burdens push the simulated agents toward flags of convenience with lax oversight, triggering a “race to the bottom” that degrades safety and environmental performance. This indicates that economic and ESG dimensions are interdependent.
Methodologically, the Deep Reinforcement Learning (DRL) framework advances beyond static econometric models by capturing the sequential nature of flag selection. It allows for the discovery of synergistic strategies. Our analysis suggests a three-pillar architecture for sustainable competitiveness: (1) Economic policies (F1, F5) ensure financial viability; (2) Environmental policies (F10, F11) build a long-term reputation and attract responsible operators; (3) Social/Governance policies (F19, F25) reduce transaction costs and ensure stability.
We acknowledge several limitations in this study. First, the dataset covers a short period (January–June 2024), which may not fully capture long-term market cycles. Second, we rely on proxy variables for certain governance factors, which may not perfectly reflect institutional and geopolitical nuances. Third, the model uses a representative agent approach, potentially overlooking the heterogeneity between large international fleets and smaller, cost-sensitive operators.
Future research should address these gaps by extending the longitudinal data and employing multi-agent reinforcement learning to simulate competitive dynamics between shipowners. Despite these limitations, this study demonstrates that sustainable maritime competitiveness is achievable. By integrating economic incentives with rigorous ESG standards, maritime administrations can design policies that are both commercially viable and socially responsible. This framework offers a data-driven tool for navigating the industry’s necessary transition toward sustainability.

Author Contributions

Conceptualization, G.X.; Methodology, Y.L.; Investigation, B.Z.; Writing – original draft, Z.Z.; Writing – review & editing, G.X. and B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key Research and Development Program of Hainan Province (Grant No. ZDYF2022GXJS348), the Hainan Provincial Philosophy and Social Science Planning Project, 2025 (Grant No. HNSK(ZX)25-118) and Hainan Provincial Philosophy and Social Science Planning Project (Grant No. HNSK(YB)23-48).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. United Nations Conference on Trade and Development. Review of Maritime Transport 2022; United Nations: Geneva, Switzerland, 2022. [Google Scholar]
  2. Thakur, A.S.; Alex, T.L.; Nighojkar, A. Artificial Intelligence in Maritime Anomaly Detection: A Decadal Bibliometric Analysis (2014–2024). J. Inst. Eng. (India) Ser. C 2025, 106, 665–689. [Google Scholar] [CrossRef]
  3. International Transport Workers’ Federation. The ITF FOC Campaign; ITF Global: London, UK, 2021. [Google Scholar]
  4. Hoffman, J.; Rydbergh, T.; Stevenson, A. Decarbonizing Shipping: What Role for Flag States; UNCTAD: Geneva, Switzerland, 2020. [Google Scholar]
  5. Chang, C.C.; Jhang, C.W. Reducing speed and fuel transfer of the green flag incentive program in kaohsiung port taiwan. Transp. Res. Part D Transp. Environ. 2016, 46, 1–10. [Google Scholar] [CrossRef]
  6. Garcia, B.; Foerster, A.; Lin, J. Net zero for the international shipping sector? An Analysis of the Implementation and Regulatory Challenges of the IMO Strategy on Reduction of GHG Emissions. J. Environ. Law 2021, 33, 85–112. [Google Scholar] [CrossRef]
  7. Van Pham, T. The impact of the flag of convenience regime into shipping industry. J. Marit. Res. JMR 2024, 21, 280–284. [Google Scholar]
  8. Gould, A. The customer is always right? Flags of convenience and the assembling of maritime affairs. Int. Relations 2023. [Google Scholar] [CrossRef]
  9. Alcaidea, J.I.; Piniella, F.; Rodr’ıguez-D’ıaz, E. The “Mirror Flags”: Ship registration in globalised ship breaking industry. Transp. Res. Part D Transp. Environ. 2016, 48, 378–392. [Google Scholar] [CrossRef]
  10. Alcock, F. Flagging Standards: Globalization and Environmental, Safety and Labor Regulations at Sea. Glob. Environ. Politics 2008, 8, 154–156. [Google Scholar] [CrossRef]
  11. Goss, R.O. The history of a competitive industry. In The Global Shipping Industry; Routledge: London, UK, 1990; pp. 1–24. [Google Scholar]
  12. Dixit, A.K.; Pindyck, R.S. Investment Under Uncertainty; Princeton University Press: Princeton, NJ, USA, 1994. [Google Scholar]
  13. Hoffmann, J.; Sanchez, R.J.; Talley, W.K. Determinants of vessel flag. Res. Transp. Econ. 2005, 12, 173–219. [Google Scholar] [CrossRef]
  14. Luo, M.; Fan, L.; Li, K.X. Flag choice behaviour in the world merchant fleet. Transp. A Transp. Sci. 2013, 9, 429–450. [Google Scholar] [CrossRef]
  15. Kavussanos, M.; Tsekrekos, A.E. The option to change the flag of a vessel. In International Handbook of Maritime Economics; Edward Elgar Publishing: Cheltenham, UK, 2011; pp. 47–62. [Google Scholar]
  16. Cariou, P.; Wolff, F.-C. Do Port State Control inspections influence flag-and class-hopping phenomena in shipping? J. Transp. Econ. Policy (JTEP) 2011, 45, 155–177. [Google Scholar]
  17. Paris MoU. Paris MoU. Paris memorandum of understanding on port state control. In The Legal Order of the Oceans; Paris MoU: Paris, France, 2021. [Google Scholar]
  18. Lapeyrolerie, M.; Chapman, M.S.; Norman, K.E.A.; Boettiger, C. Deep reinforcement learning for conservation decisions. Methods Ecol. Evol. 2022, 13, 2649–2662. [Google Scholar] [CrossRef]
  19. Zuccotto, M.; Castellini, A.; Torre, D.L.; Mola, L.; Farinelli, A. Reinforcement learning applications in environmental sustainability: A review. Artif. Intell. Rev. 2024, 57, 88. [Google Scholar] [CrossRef]
  20. Zhao, Y.; Wen, S.; Zhao, Q.; Zhang, B.; Huang, Y. Deep Reinforcement Learning-Based Energy Management Strategy for Green Ships Considering Photovoltaic Uncertainty. J. Mar. Sci. Eng. 2025, 13, 565. [Google Scholar] [CrossRef]
  21. Dey, A.; Ejohwomu, O.A.; Chan, P.W. Sustainability challenges and enablers in resource recovery industries: A systematic review of the ship-recycling studies and future directions. J. Clean. Prod. 2021, 329, 129787. [Google Scholar] [CrossRef]
  22. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  23. Metaxas, B.N. The Economics of Tramp Shipping; The Athlone Press: London, UK, 1971. [Google Scholar]
  24. Pan, R.; Zhang, W.; Wang, S.; Kang, S. Deep reinforcement learning model for Multi-Ship collision avoidance decision making design implementation and performance analysis. Sci. Rep. 2025, 15, 21250. [Google Scholar] [CrossRef]
  25. Garcia-Cant’on, S.; RuizdeMendoza, C.; Cervell’o-Pastor, C.; Sallent, S. Multi-Agent Reinforcement Learning-Based Routing and Scheduling Models in Time-Sensitive Networking for Internet of Vehicles Communications Between Transportation Field Cabinets. Appl. Sci. 2025, 15, 1122. [Google Scholar] [CrossRef]
  26. Sheriff, A.M.; Anantharaman, M.; Islam, R.; Nguyen, H.-O. An in-depth analysis of port state control inspections: A bibliometric analysis and systematic review. J. Int. Marit. Saf. Environ. Aff. Shipp. 2025, 9, 2454754. [Google Scholar] [CrossRef]
  27. Vaidheeswaran, V.; Jayakody, D.; Mulay, S.; Lo, A.; Alam, M.M.; Spadon, G. Goal-Conditioned Reinforcement Learning for Data-Driven Maritime Navigation. arXiv 2025, arXiv:2509.01838. [Google Scholar]
  28. M"uller, M.; Finkeldei, F.; Krasowski, H.; Arcak, M.; Althoff, M. Falsification-Driven Reinforcement Learning for Maritime Motion Planning. arXiv 2025, arXiv:2510.06970. [Google Scholar] [CrossRef]
  29. Wang, H.; Li, J.; Tao, H.; Liu, J.; Li, C.; Wang, K.; Xu, M. Autonomous dynamic formation for maritime target tracking using multi-agent reinforcement learning. Eng. Appl. Artif. Intell. 2025, 154, 110904. [Google Scholar] [CrossRef]
  30. Yen, V.T.H. Capital solutions to promote fleet investment in shipping in countries such as Vietnam. Int. J. Prof. Bus. Rev. 2022, 7, 16. [Google Scholar] [CrossRef]
  31. Fan, L.; Zhang, S.; Yin, J. Structural analysis of shipping fleet capacity. J. Adv. Transp. 2018, 2018, 3854090. [Google Scholar] [CrossRef]
  32. Jin, J.; Fu, X.; Gao, X.; Cheng, T.; Yan, R. MSD-LLM: Predicting Ship Detention in Port State Control Inspections with Large Language Model. arXiv 2025, arXiv:2505.19568. [Google Scholar] [CrossRef]
  33. Wang, R.; Zhang, M.; Gong, F.; Wang, S.; Yan, R. Improving port state control through a transfer learning-enhanced XGBoost model. Reliab. Eng. Syst. Saf. 2025, 253, 110558. [Google Scholar] [CrossRef]
  34. Tsou, M.-C. A Machine Learning-Based Model for Predicting High Deficiency Risk Ships in Port State Control: A Case Study of the Port of Singapore. J. Mar. Sci. Eng. 2025, 13, 1485. [Google Scholar] [CrossRef]
  35. Guan, Y.; Tian, X.; Wu, Y.; Wang, S. Equitable port state control in maritime transportation: A data-driven optimization approach. Transp. Res. Part C Emerg. Technol. 2025, 180, 105303. [Google Scholar] [CrossRef]
  36. Mitroussi, K.; Arghyrou, M.G. Institutional performance and ship registration. Transp. Res. Part E Logist. Transp. Rev. 2016, 85, 90–106. [Google Scholar] [CrossRef]
  37. Iwunze, V. Has the Exception Become the Rule? Examining the Growing Dominance of Flags of Convenience in International Shipping. Int. J. Comp. Law Leg. Philos. (IJOCLLEP) 2021, 3, 15. [Google Scholar]
  38. Wang, W.; Xiong, W.; Ouyang, X.; Chen, L. TPTrans: Vessel Trajectory Prediction Model Based on Transformer Using AIS Data. ISPRS Int. J. Geo-Inf. 2024, 13, 400. [Google Scholar] [CrossRef]
  39. Nguyen, D.; Fablet, R. A transformer network with sparse augmented data representation and cross entropy loss for AIS-based vessel trajectory prediction. IEEE Access 2024, 12, 21596–21609. [Google Scholar] [CrossRef]
  40. Zhang, B.; Hirayama, K.; Ren, H.; Wang, D.; Li, H. Ship anomalous behavior detection using clustering and deep recurrent neural network. J. Mar. Sci. Eng. 2023, 11, 763. [Google Scholar] [CrossRef]
  41. Wang, L.; Wang, J.; Hua, Y.; Shi, W.; Yang, Z.; Sha, M. Machine learning approaches for identifying substandard ships in port state control inspections with imbalanced data. Ocean Eng. 2025, 334, 121614. [Google Scholar] [CrossRef]
  42. Xie, Z.; Tu, E.; Fu, X.; Yuan, G.; Han, Y. AIS Data-Driven Maritime Monitoring Based on Transformer: A Comprehensive Review. arXiv 2025, arXiv:2505.07374. [Google Scholar] [CrossRef]
  43. Løvland, A.B. Predicting the Destination Port of Fishing Vessels. Master’s Thesis, UiT Norges Arktiske Universitet, Tromsø, Norway, 2024. [Google Scholar]
  44. Alqithami, S. CH-MARL: Constrained Hierarchical Multiagent Reinforcement Learning for Sustainable Maritime Logistics. arXiv 2025, arXiv:2502.02060. [Google Scholar] [CrossRef]
  45. Waltz, M.; Paulig, N.; Okhrin, O. 2-level reinforcement learning for ships on inland waterways: Path planning and following. Expert Syst. Appl. 2025, 274, 126933. [Google Scholar] [CrossRef]
  46. Wilson, A.; Menzies, R.; Morarji, N.; Foster, D.; Mont, M.C.; Turkbeyler, E.; Gralewski, L. Multi-agent reinforcement learning for maritime operational technology cyber security. arXiv 2024, arXiv:2401.10149. [Google Scholar] [CrossRef]
  47. Liu, D.; Soares, C.G. Ship abnormal behaviour detection based on AIS data at the approach to ports. Reliab. Eng. Syst. Saf. 2025, 226, 111712. [Google Scholar] [CrossRef]
  48. Liang, M.; Weng, L.; Gao, R.; Li, Y.; Du, L. Unsupervised maritime anomaly detection for intelligent situational awareness using AIS data. Knowl.-Based Syst. 2024, 284, 111313. [Google Scholar] [CrossRef]
  49. Kontopoulos, I.; Chatzikokolakis, K.; Zissis, D.; Tserpes, K.; Spiliopoulos, G. Real-time maritime anomaly detection: Detecting intentional AIS switch-off. Int. J. Big Data Intell. 2020, 7, 85–96. [Google Scholar] [CrossRef]
  50. Zhang, R.; Qin, X.; Pan, M.; Li, S.; Shen, H. Adaptive Temporal Reinforcement Learning for Mapping Complex Maritime Environmental State Spaces in Autonomous Ship Navigation. J. Mar. Sci. Eng. 2025, 13, 514. [Google Scholar] [CrossRef]
  51. Minßen, F.-M.; Klemm, J.; Steidel, M.; Niemi, A. Predicting vessel tracks in waterways for maritime anomaly detection. Trans. Marit. Sci. 2024, 13. [Google Scholar] [CrossRef]
Figure 1. Carrier Selection Policy Performance: Baseline Policies vs. DQN. The (left) panel shows the mean reward achieved by each policy, with error bars representing the standard deviation. The (right) panel presents box plots illustrating the full distribution of total rewards; the horizontal red line within each box indicates the median value.
Figure 2. Action Selection Distribution of the Trained DQN Agent during evaluation period.
Figure 3. Evaluation Reward Distribution of the DQN Policy. The histogram shows the frequency distribution of cumulative rewards with an overlaid kernel density estimate.
Figure 4. Systematic Policy Reward Improvement Ranking. The horizontal bar chart displays all 27 policy interventions ranked by their mean cumulative reward improvement over the baseline scenario ($\mu_{\text{baseline}} = 35.24$). Each bar’s length represents $\Delta R_k$, with color coding distinguishing policy categories: Economic Incentives (blue), Regulatory Environment (red), and Administrative Services (green). The red bar indicates a counterproductive policy.
Figure 5. Risk-Return Landscape of Policy Interventions. The scatter plot positions each policy scenario by its mean cumulative reward (x-axis) and standard deviation (y-axis). Point size is proportional to $|\Delta R_k|$, and color gradient maps to improvement magnitude (green = positive, red = negative). The baseline scenario is marked with a red star at (35.24, 54.53).
Figure 6. Policy Category Performance Heatmap. (Left) panel displays actual metric values; (right) panel shows normalized scores (0–1 scale) for cross-metric comparison. Rows represent policy categories; columns represent performance dimensions.
Table 1. Sustainability-Oriented Classification of Policy Factors Influencing Flag Selection.

| Sustainability Dimension | Policy Factors (Indicators) |
|---|---|
| Economic Sustainability | (F1) Corporate Income Tax Rate |
| | (F2) Tonnage Tax Level |
| | (F3) Initial Ship Registration Fees |
| | (F4) Annual Ship Maintenance Fees |
| | (F5) Government Financial Subsidies and Incentives |
| | (F6) Favorable Financing Policies (e.g., low-interest loans) |
| | (F7) Accelerated Depreciation Policies |
| | (F8) Tax Exemptions on Crew Wages |
| | (F9) Foreign Exchange Control Policies |
| Environmental Sustainability | (F10) Safety Inspection Standards and Frequency |
| | (F11) Stringency of Environmental Regulations |
| | (F12) Ship Technical Standard Requirements |
| | (F13) Port State Control (PSC) Record and Environmental Performance |
| | (F16) Compliance Costs (e.g., environmental and safety compliance burden) |
| | (F17) Government’s Stance on International Maritime Conventions |
| | (F24) Scope of Recognized Classification Societies |
| Social Sustainability | (F14) Crew Nationality and Labor Rights Requirements |
| | (F15) Legal System and Maritime Dispute Resolution Mechanism |
| | (F18) Crisis Response and Maritime Security Capabilities |
| | (F19) Convenience and Efficiency of the Registration Process |
| | (F20) Level of Digitalization in Government Services |
| | (F21) 24/7 Emergency Response and Technical Support |
| | (F22) Multilingual Service Capability |
| | (F23) Diplomatic Protection and Consular Services |
| | (F25) Policy Stability and Predictability |
| | (F26) Communication and Consultation Mechanisms with the Industry |
| | (F27) Market Access Opportunities (e.g., cabotage rights) |
Table 2. Operationalization and Quantification of Policy Factors.

| Factor | Measurement Scale | Operationalization | Normalization Method |
|---|---|---|---|
| F1 | 0–50% | Statutory corporate tax rate applicable to shipping companies | Min–Max: $x / x_{\max}$ |
| F2 | Binary (0/1) | Availability of tonnage-based taxation option | Z-score: $(x - \mu)/\sigma$ |
| F3 | USD/GT | One-time fee per gross tonnage for new registrations | Log: $\log(1 + x)$ |
| F4 | USD/GT/year | Recurring annual fee per gross tonnage | Log: $\log(1 + x)$ |
| F5 | 0–10 scale | Government subsidies as % of average vessel value (normalized) | Min–Max: $x / x_{\max}$ |
| F6 | 0–10 scale | Access to policy bank loans (rate differential vs. market) | Min–Max: $x / x_{\max}$ |
| F7 | Binary (0/1) | Availability of tax depreciation benefits for vessels | Min–Max: $x / x_{\max}$ |
| F8 | 0–100% | Maximum allowed foreign equity ownership | Min–Max: $x / x_{\max}$ |
| F9 | 0–1 (Chinn-Ito) | Inverse of capital account restrictiveness index | Direct (0–1 scale) |
| F10 | 0–100% | Annual PSC inspection rate (inspections/port calls) | Min–Max: $x / x_{\max}$ |
| F11 | 0–10 scale | Composite score: IMO convention ratification + enforcement record | Direct (0–10 scale) |
| F12 | 0–10 scale | Classification society requirements beyond SOLAS minimum | Direct (0–10 scale) |
| F13 | 0–10 deficiencies | Average deficiencies per inspection before detention | Direct (0–10 scale) |
| F14 | 0–10 scale | MLC compliance score + additional crew welfare provisions | Direct (0–10 scale) |
| F15 | −2.5 to +2.5 | World Bank Rule of Law index (WGI) | Direct (−2.5–+2.5 scale) |
| F16 | USD/vessel/year | Estimated annual compliance costs (surveys, audits, reporting) | Direct (USD/year) |
| F17 | 0–10 scale | Extent of derogations/exemptions from international conventions | Direct (0–10 scale) |
| F18 | 0–10 scale | Inverse of security incident frequency + emergency response capability | Direct (0–10 scale) |
| F19 | Days | Average time from application to certificate issuance | Direct (Days) |
| F20 | 0–1 (EGDI) | UN E-Government Development Index score | Direct (0–1 scale) |
| F21 | 0–10 scale | Geographic coverage of registry offices + 24/7 support | Direct (0–10 scale) |
| F22 | 0–10 scale | PSC white/grey/black list status + industry perception surveys | Min–Max: $x / x_{\max}$ |
| F23 | Number of missions | Count of embassies/consulates worldwide | Direct (Count) |
| F24 | Binary (0/1) | Acceptance of IACS-member classification societies only | Direct (Binary) |
| F25 | 0–10 scale | Inverse of regulatory change frequency (5-year rolling window) | Direct (0–10 scale) |
| F26 | 0–10 scale | Stakeholder engagement in policymaking (frequency + transparency) | Direct (0–10 scale) |
| F27 | 0–10 scale | Preferential access through bilateral shipping agreements | Direct (0–10 scale) |
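The three non-trivial normalization methods named in Table 2 can be sketched as small helper functions. This is an illustrative sketch only: the function names are hypothetical, the logarithm is assumed to be the natural log, and the example inputs (a 25% tax rate on the 0–50% scale) are chosen for illustration rather than taken from the authors' pipeline.

```python
import math


def min_max(x: float, x_max: float) -> float:
    """Scale by the maximum of the range, as in the 'Min-Max: x / x_max' rows."""
    if x_max <= 0:
        raise ValueError("x_max must be positive")
    return x / x_max


def z_score(x: float, mean: float, std: float) -> float:
    """Standardize to zero mean and unit variance: (x - mu) / sigma."""
    if std <= 0:
        raise ValueError("std must be positive")
    return (x - mean) / std


def log_scale(x: float) -> float:
    """Compress fee-like quantities with log(1 + x); natural log is an assumption."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.log1p(x)


# Illustrative inputs (hypothetical, not from the paper's dataset).
f1_normalized = min_max(0.25, 0.50)   # 25% rate on the 0-50% scale
f3_normalized = log_scale(0.0)        # a registry charging no registration fee
print(f1_normalized, f3_normalized)
```

Using `math.log1p` rather than `math.log(1 + x)` keeps the transform accurate for very small fees, which matters little here but costs nothing.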
Table 3. Hyperparameter Configuration for DQN Model.

| Hyperparameter | Value |
|---|---|
| Learning Rate ($\alpha$) | 0.001 |
| Batch Size | 32 |
| Discount Factor ($\gamma$) | 0.95 |
| Replay Buffer Size | 10,000 |
| Target Network Update Frequency | 100 steps |
| Initial Rate ($\epsilon_{\text{start}}$) | 1.0 |
| Minimum Rate ($\epsilon_{\min}$) | 0.01 |
| Exploration Decay Rate | 0.995 |
| State Dimension | 20 |
| Hidden Layer Dimension | 128 |
| Action Dimension | 3 |
| Training Episodes | 1000 |
| Evaluation Episodes | 100 |
| Random Seed | 42 |
| Economic ($w_1$) | 0.50 |
| Environmental ($w_2$) | 0.30 |
| Social ($w_3$) | 0.20 |
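To make the exploration settings in Table 3 concrete, the following sketch computes an epsilon-greedy schedule from the listed values ($\epsilon_{\text{start}} = 1.0$, $\epsilon_{\min} = 0.01$, decay rate 0.995). It assumes a multiplicative decay applied once per training episode; the table does not state whether decay is per episode or per step, and the function names are hypothetical.

```python
def epsilon_schedule(episode: int,
                     eps_start: float = 1.0,
                     eps_min: float = 0.01,
                     decay: float = 0.995) -> float:
    """Exploration rate after `episode` multiplicative decay steps, floored at eps_min."""
    return max(eps_min, eps_start * decay ** episode)


def episodes_until_floor(eps_start: float = 1.0,
                         eps_min: float = 0.01,
                         decay: float = 0.995) -> int:
    """Number of decay steps needed before the schedule first reaches eps_min."""
    episode = 0
    while eps_start * decay ** episode > eps_min:
        episode += 1
    return episode


print(episodes_until_floor())    # 919
print(epsilon_schedule(1000))    # 0.01
```

Under the per-episode assumption, the floor is reached only at episode 919 of the 1000 training episodes, so the agent would keep exploring above the minimum rate for most of training. If the decay were instead applied per environment step, the floor would be reached far earlier.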
Table 4. Environment and Model Parameter Configuration.

| Category | Parameter | Value |
|---|---|---|
| Environment | Max Steps per Episode | 365 days |
| | Discount Factor ($\gamma$) | 0.95 |
| | Economic Volatility | 0.1 |
| | Random Seed | 42 |
| Economic Baseline | Corporate Tax (National/FOC) | 0.25/0.05 |
| | Regulatory Burden (National/FOC) | 0.8/0.3 |
| | Safety Inspection Prob. (National/FOC) | 0.1/0.4 |
| | Other Policies (F2–F9, F11–F27) | Neutral baseline |
| RL Agent | Algorithm | DQN with Attention |
| | Hidden Dimension | 128 |
| | Learning Rate | 0.001 |
| | Training Episodes | 1000 |
| | Evaluation Episodes | 100 |
| Reward Weights | Economic ($w_1$) | 0.50 |
| | Environmental ($w_2$) | 0.30 |
| | Social ($w_3$) | 0.20 |
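The reward weights in Table 4 suggest a weighted combination of the economic, environmental, and social reward components. A minimal sketch follows, assuming a simple linear sum $R = w_1 R_{\text{econ}} + w_2 R_{\text{env}} + w_3 R_{\text{soc}}$; the linear form, the function name, and the example component values are assumptions for illustration, not the authors' stated reward function.

```python
# Weights copied from Table 4; the linear aggregation itself is an assumption.
REWARD_WEIGHTS = {"economic": 0.50, "environmental": 0.30, "social": 0.20}


def composite_reward(components: dict, weights: dict = REWARD_WEIGHTS) -> float:
    """Weighted sum of per-dimension rewards for a single environment step."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("reward weights should sum to 1")
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing reward components: {sorted(missing)}")
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical step: an economic gain partly offset by a social penalty.
step_reward = composite_reward(
    {"economic": 1.0, "environmental": 0.0, "social": -1.0}
)
print(round(step_reward, 2))  # 0.3
```

With these weights, a unit change in the economic component moves the composite reward 2.5 times as much as a unit change in the social component, which is worth keeping in mind when reading the policy ranking.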
Table 5. Performance Comparison of the Top 15 Policies.

| Rank | Policy | Mean Reward | Improv. | Std Dev | Risk Level |
|---|---|---|---|---|---|
| - | Baseline | 35.24 | 0.00 | 54.53 | HIGH |
| 1 | F1 Low Corporate Income Tax | 166.60 | 131.37 | 54.39 | HIGH |
| 2 | F5 High Financial Subsidy | 108.30 | 73.06 | 54.51 | HIGH |
| 3 | F10 High Safety Inspection Freq | 93.58 | 58.35 | 54.52 | HIGH |
| 4 | F25 High Policy Stability | 79.00 | 43.77 | 54.55 | HIGH |
| 5 | F19 High Registration Convenience | 78.98 | 43.74 | 54.47 | HIGH |
| 6 | F12 High Tech Standard | 49.94 | 14.71 | 54.52 | HIGH |
| 7 | F8 Crew Wage Tax Exemption | 49.89 | 14.65 | 54.51 | HIGH |
| 8 | F27 High Market Access | 49.84 | 14.60 | 54.42 | HIGH |
| 9 | F7 Accelerated Depreciation | 49.83 | 14.60 | 54.53 | HIGH |
| 10 | F21 Emergency Support 24/7 | 49.83 | 14.59 | 54.47 | HIGH |
| 11 | F16 Low Compliance Cost | 49.82 | 14.59 | 54.53 | HIGH |
| 12 | F14 High Crew Qualification | 49.82 | 14.58 | 54.56 | HIGH |
| 13 | F17 Positive Convention Stance | 49.81 | 14.58 | 54.45 | HIGH |
| 14 | F20 High Service Digitalization | 49.81 | 14.57 | 54.57 | HIGH |
| 15 | F18 High Crisis Response | 49.81 | 14.57 | 54.55 | HIGH |
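The "Improv." column in Table 5 appears to be the policy's mean reward minus the baseline mean, i.e., $\Delta R_k = \mu_k - \mu_{\text{baseline}}$ with $\mu_{\text{baseline}} = 35.24$. The sketch below recomputes this difference for the top five rows and flags any row that disagrees with the reported value by more than a rounding tolerance. The helper names and the 0.02 tolerance are assumptions introduced for illustration.

```python
BASELINE_MEAN = 35.24  # baseline mean cumulative reward from Table 5

# (policy label, mean reward, reported improvement), copied from Table 5.
TOP_POLICIES = [
    ("F1 Low Corporate Income Tax", 166.60, 131.37),
    ("F5 High Financial Subsidy", 108.30, 73.06),
    ("F10 High Safety Inspection Freq", 93.58, 58.35),
    ("F25 High Policy Stability", 79.00, 43.77),
    ("F19 High Registration Convenience", 78.98, 43.74),
]


def reward_improvement(mean_reward: float, baseline: float = BASELINE_MEAN) -> float:
    """Delta R_k: mean cumulative reward under policy k minus the baseline mean."""
    return mean_reward - baseline


def inconsistent_rows(rows, tol: float = 0.02):
    """Labels whose reported improvement differs from mean - baseline by more than tol."""
    return [
        label
        for label, mean_reward, reported in rows
        if abs(reward_improvement(mean_reward) - reported) > tol
    ]


mismatches = inconsistent_rows(TOP_POLICIES)
print(mismatches)  # []
```

All five rows agree to within 0.01, consistent with the improvements having been computed from unrounded means before the table values were rounded to two decimals.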
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xie, G.; Liang, Y.; Zhang, B.; Zhang, Z. Deep Learning-Enabled Policy Optimization for Sustainable Ship Registry Selection. Sustainability 2025, 17, 10836. https://doi.org/10.3390/su172310836