Risk Management Challenges in Maritime Autonomous Surface Ships (MASSs): Training and Regulatory Readiness

Park, Hyeri; Kim, Jeongmin; Jung, Min; Kang, Suk-young; Kim, Daegun; Kim, Changwoo; Jang, Unkyu

doi:10.3390/app152010993

by Hyeri Park ¹, Jeongmin Kim ²,*, Min Jung ³, Suk-young Kang ³, Daegun Kim ³, Changwoo Kim ³ and Unkyu Jang ³

¹ Korea Maritime Institute, 26, Haeyang-ro 301beon-gil, Yeongdo-gu, Busan 49111, Republic of Korea

² Korea Institute of Maritime and Fisheries Technology (Yongdang Campus), 93, Sinseon-ro 365beon-gil, Nam-gu, Busan 48562, Republic of Korea

³ Korea Institute of Maritime and Fisheries Technology, 367 Haeyang-ro, Yeongdo-gu, Busan 49111, Republic of Korea

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(20), 10993; https://doi.org/10.3390/app152010993

Submission received: 16 September 2025 / Revised: 5 October 2025 / Accepted: 11 October 2025 / Published: 13 October 2025

(This article belongs to the Special Issue Risk and Safety of Maritime Transportation)


Abstract

Maritime Autonomous Surface Ships (MASSs) raise safety and regulatory challenges that extend beyond technical reliability. This study builds on a published system-theoretic process analysis (STPA) of degraded operations that identified 92 loss scenarios. These scenarios were reformulated into a two-round Delphi survey with 20 experts from academic, industry, seafaring, and regulatory backgrounds. Panelists rated each scenario on severity, likelihood, and detectability. To avoid the rank reversal that is common with the Risk Priority Number, an adjusted index was applied. Initial concordance was low (Kendall’s W = 0.07), reflecting diverse perspectives. After feedback, Round 2 reached substantial agreement (W = 0.693, χ² = 3265.42, df = 91, p < 0.001) and produced a stable Top 10. High-priority items involved propulsion and machinery, communication links, sensing, integrated control, and human–machine interaction. These risks are further exacerbated by oceanographic conditions, such as strong currents, wave-induced motions, and biofouling, which can impair propulsion efficiency and sensor accuracy. This highlights the importance of environmental resilience in MASS safety. These clusters were translated into five action bundles that addressed fallback procedures, link assurance, sensor fusion, control chain verification, and alarm governance. The findings show that Remote Operator competence and oversight are central to MASS safety. At the same time, MASSs rely on artificial intelligence systems that can fail in degraded states, for example, through reduced explainability in decision making, vulnerabilities in sensor fusion, or adversarial conditions such as fog-obscured cameras. Recognizing these AI-specific challenges highlights the need for both human oversight and resilient algorithmic design. Together, the findings support the explicit inclusion of Remote Operators in the STCW convention, along with watchkeeping and fatigue rules for Remote Operation Centers. This study provides a consensus-based baseline for regulatory debate, while future work should extend these insights through quantitative system modeling.

Keywords:

Maritime Autonomous Surface Ships (MASS); Delphi; STPA; Remote Operator (RO); STCW; degraded state; risk prioritization

1. Introduction

Maritime Autonomous Surface Ships (MASSs) are often described as a fourth-generation innovation in maritime transport. They combine artificial intelligence, advanced sensors, satellite communication, and real-time data processing into a single control system that reduces human intervention and strengthens safety [1,2]. The idea is no longer limited to theory. In 2018, Finland operated a car ferry by remote control, and, in 2021, Norway launched sea trials of a fully autonomous electric container ship [2]. East Asian countries, such as South Korea, Japan, and China, have also begun large-scale programs, reflecting the rapid globalization of MASS development [1,2].

To address these advances, the International Maritime Organization (IMO) is preparing the MASS Code, which is scheduled to enter into force in 2032. The code is intended to provide harmonized international standards for safe and accountable operation [3]. This is an important step, yet discussions so far have focused heavily on technical matters. Legal, institutional, and human factors remain less developed, which leaves a gap between technological progress and regulatory preparation [3,4].

One of the most urgent questions concerns the Remote Operator (RO). ROs supervise and control MASSs from a land-based Remote Operation Center (ROC), taking on duties that would otherwise belong to shipboard officers. Although the draft MASS Code recognizes the human element, it does not resolve where command authority should lie, or how accountability should be shared in remote operations [4,5,6]. This also raises the broader question of whether a master must remain on board, or whether command can legally be exercised from shore. Proposals supporting the latter emphasize efficiency, but critics warn about communication delays and reduced situational awareness [4,5]. In addition, AI-based perception and decision-making systems face technical limitations. Machine learning models may fail under data scarcity or adversarial conditions, such as obscured visual sensors or corrupted training data, while neural networks for object detection can produce unreliable outputs in degraded states. These vulnerabilities underscore the need for ROs to not only compensate for communication gaps, but also to monitor and intervene when AI performance deteriorates.

Existing conventions add to the difficulty. The International Convention on Standards of Training, Certification and Watchkeeping for Seafarers (STCW) applies only to crew physically on board. ROs are excluded, which leaves competence requirements, watchkeeping rules, and fatigue management outside the international framework [6,7,8,9]. As MASSs move toward commercial use, this gap becomes a major safety concern. Recent IMO debates show that work on the MASS Code cannot be separated from new training and certification standards for ROs [6].

Safety engineering studies reinforce this point. MASSs can operate in degraded states, where they remain functional, but outside their intended design conditions. Failures of sensors, propulsion, or communication can lead to unsafe control actions that escalate into serious losses. Park and Kim [10] applied system-theoretic process analysis (STPA) to MASSs under such conditions and identified 31 unsafe control actions (UCAs) and 92 detailed loss scenarios (LSs). Their work highlights the role of ROs in compensating for degraded autonomy, especially in situations where the detectability of failures is low.

Other research points to the distinct skills required of ROs. Kuntasa and Lirn [8] outlined a competency model that emphasizes situational awareness, workload management, and emergency response. Their findings show that traditional competence tables cannot be applied directly to remote operations. Legal scholars add that ROs do not fit neatly into existing categories of “crew” or “master,” which raises unresolved questions of liability and accountability [9].

Taken together, these studies show that safe deployment of MASSs depends on more than reliable technology. It also requires legal clarity, institutional readiness, and recognition of RO competence. Building on the STPA-based identification of loss scenarios [10], this study applies structured expert judgment through the Delphi method. The outputs are then linked to the STCW convention to argue that RO competence, watchkeeping, and fatigue management should be explicitly incorporated into international maritime law. The overall aim is to provide both empirical and normative grounds for embedding ROs within the STCW framework as a foundation for safe MASS operations.

The rest of this paper is organized as follows. Section 2 reviews earlier research on MASS safety, regulation, and human factors. Section 3 explains the framework that combines STPA and Delphi. Section 4 presents the survey results and develops a roadmap. Section 5 draws conclusions and points to implications for training and regulation.

This study is guided by the following research questions:

What are the critical loss scenarios (LSs) for MASSs in degraded states identified through STPA?
How can expert consensus (Delphi) help prioritize these risks?
In what ways can these risks inform training and regulatory frameworks under the STCW convention?
How do human–AI interactions affect MASS safety management?

2. Literature Review

2.1. Safety Analysis of MASSs Using STPA

System-theoretic process analysis has emerged as one of the most promising approaches for assessing the safety of complex socio-technical systems. In the maritime domain, Park and Kim [10] applied STPA to MASSs operating under degraded states. Their study defined system losses, identified 31 UCAs, and developed 92 detailed LSs. These scenarios covered navigation, propulsion, sensing, communication, cybersecurity, and human interaction. The findings demonstrated how degraded states amplify operational risk, particularly when communication delays, conflicting commands, or incomplete situational awareness disrupt remote decision making. In addition, oceanographic variability can aggravate these failures: strong currents and wave-induced motions may destabilize navigation and increase propulsion load, while biofouling and salinity changes can reduce the accuracy of sonar and LIDAR sensors. These factors demonstrate that degraded states in MASSs are inseparable from the marine environment, a concern also reflected in IMO discussions on environmental resilience [3]. This work provides a structured foundation for quantifying degraded-state hazards and forms a direct link between risk analysis and regulatory debate.

The theoretical underpinnings of STPA were developed by Leveson [11], who emphasized that accidents should be understood as control problems in dynamic systems, rather than the sum of component failures. This perspective shifted the focus of safety analysis from reactive failure counting to proactive identification of unsafe control paths. The method was further refined and popularized in the STPA Handbook by Leveson and Thomas [12], which offered step-by-step procedures for practical application. The Republic of Korea institutionalized these advances through the Risk Analysis Guide Using STPA [13], which illustrates the growing recognition of STPA as a standard tool for analyzing safety in emerging technologies.

2.2. Applications of STPA in Maritime Contexts

The maritime research community has increasingly employed STPA to understand risks specific to autonomous operations. Wróbel et al. [14] proposed a system-theoretic model for autonomous merchant vessels, demonstrating how STPA could capture hazards overlooked by traditional methods. Wróbel et al. [15] extended this work with preliminary empirical findings, confirming the method’s feasibility for MASSs. Rokseth, Haugen, and Utne [16] applied STPA to safety verification and showed that the technique uncovered vulnerabilities in autonomous control architectures that would otherwise remain hidden.

Further developments integrated organizational and technical elements. Utne et al. [17] advanced supervisory risk control, arguing that MASS safety depends on multi-layered oversight, rather than technical redundancy alone. Glomsrud and Xie [18] applied STPA to co-analyze safety and security, revealing how cyber vulnerabilities could intersect with traditional hazards. Chaal et al. [19] formalized hierarchical control structures, enabling STPA to reflect the multi-level complexity of autonomous vessels. Eriksen and Lützen [20] explored redundancy in machinery systems, finding that even well-designed redundancy cannot guarantee safety without integration into system-theoretic risk models. Collectively, these works confirm that STPA is not only adaptable, but also necessary for capturing the layered risks of MASSs.

2.3. Regulatory Foundations and IMO Discussions

While safety analysis has advanced rapidly, regulatory developments have lagged behind. The IMO’s regulatory scoping exercise [21] revealed gaps across existing conventions and set the foundation for the draft MASS Code [3]. The code is the first attempt to provide an international regulatory framework for autonomous shipping, but it remains non-mandatory and broad in scope. Ongoing discussions address whether the master must remain on board [4], how human element provisions should be strengthened [3], and how training and certification of ROs should be formalized [5,6].

National submissions also broaden the debate. Belgium’s proposal on remote operations management [22] suggested that MASSs and ROCs should be certified as an integrated system. This view introduces accountability for system updates and cybersecurity alongside operational control. Reports from the MASS Working Group [23] confirm that these topics are politically sensitive, as they touch on liability and command authority. The outcome is a fragmented regulatory environment, in which technology is advancing faster than international law.

2.4. Human Factors, Competence, and Training

Human elements remain central to the safe operation of MASSs. Porath et al. examined situational awareness in remote centers and showed that interface design, display quality, and alarm management significantly affect operator performance [24]. Building on these insights, Kuntasa and Lirn [8] developed a competency model for ROs. They argued that remote operators need unique skills in information processing, workload management, and emergency response, which cannot simply be borrowed from existing seafarer competence tables.

These findings align with the concerns raised by the IMO that, without proper training and certification frameworks [5,6], ROs may face excessive workload and fatigue, potentially leading to unsafe operations. The literature confirms that competency standards are essential, not optional, if MASSs are to be safely integrated into international shipping.

2.5. Legal and Institutional Perspectives

The legal status of ROs remains unresolved. Choi et al. [9] argued that current conventions exclude ROs from the definitions of “crew” or “master.” This exclusion creates ambiguity in liability, insurance, and accountability. Proposals within the IMO [4] indicate that consensus has not been reached on whether the master must remain physically present on board. National legal frameworks, such as Korea’s Act on the Investigation of and Inquiry into Marine Accidents, are beginning to adapt independently [25]. States may therefore move ahead without international agreement if the IMO process remains slow.

2.6. Synthesis

STPA provides a structured and proactive framework for safety analysis [10,11,12,13,14,15,16,17,18,19,20]. The IMO’s regulatory efforts [3,4,5,6,21,22,23] mark important progress, but leave key issues unresolved. Human factor studies [8,24] emphasize that ROs require competencies distinct from seafarers on board. Legal debates [4,9,25] expose ambiguity in accountability, while national measures reveal divergence across jurisdictions. Finally, research on cybersecurity and redundancy [18,19,20,26] underscores that technical safeguards alone cannot ensure safety.

Together, these findings demonstrate that the safety of MASSs depends on integrating engineering analysis, human competence, and legal clarity. This study builds directly on these insights by applying Delphi-based expert judgment to prioritize degraded-state risks and by mapping the results to the STCW framework. In doing so, it links technical safety methods with regulatory and institutional reform, offering a pathway toward safer and more accountable MASS operations.

3. Methodology

3.1. Research Framework

This study adopts a multi-stage research framework that integrates system-theoretic safety analysis, structured expert judgment, dynamic modeling, and regulatory alignment. The approach is explicitly sequential, with each stage providing the foundation for the next. By design, the framework seeks to overcome a common limitation in safety studies of MASSs, namely the gap between abstract hazard identification and the formulation of actionable regulatory standards.

Stage 1: Hazard identification through STPA. The process begins with STPA, which has been recognized as a leading method for identifying unsafe control actions in complex socio-technical systems [10,11,12,13]. The STPA study by Park and Kim [10] defined system-level losses and hazards and produced 31 UCAs and 92 LSs. These LSs constitute a comprehensive risk inventory that reflects degraded operational states of MASSs. STPA ensures that the initial dataset is structured, traceable, and directly linked to control actions within the ROC–RO–MASS system boundary.
Stage 2: Prioritization through Delphi. Because 92 scenarios cannot all be treated with equal weight, the second stage applies the Delphi technique to elicit expert judgment. Experts rate each LS across three dimensions, severity (S), likelihood (L), and detectability (D), producing quantitative data that capture both perceived importance and uncertainty. The Delphi method was chosen because MASS operations lack a large body of accident statistics, and structured consensus building provides a credible substitute for empirical frequencies. Iterative rounds of feedback and convergence increase the reliability of the prioritization.
Stage 3: Regulatory alignment with the STCW convention. Finally, the outcomes of the Delphi survey are mapped onto the competence requirements of the STCW convention. In particular, the framework identifies where ROs should be explicitly included in the STCW convention, covering competence elements such as situational awareness, multi-vessel monitoring, emergency transition procedures, and fatigue management. This linkage ensures that this study does not end at risk identification, but produces normative implications for international maritime regulation [2,3,4,5,6,22,23].

By connecting these three stages, the research framework establishes a closed loop between safety analysis, expert prioritization, and institutional reform, as shown in Figure 1. This design reflects the central premise of this study: the safe operation of MASSs requires not only technical robustness, but also legal clarity and explicit recognition of human competence within global regulatory instruments.

3.2. Scenario Source: STPA Outputs

The foundation of the methodology is the STPA study by Park and Kim [10]. That study defined four types of losses (L-1 to L-4) and ten categories of hazards (H-1 to H-10). Using a control structure that included the ROC and RO, the MASS itself, and the regulatory authority, the study identified 31 UCAs and developed 92 detailed LSs. These scenarios cover navigation, propulsion and speed control, sensing, communication, cybersecurity, and human interaction.

The present study adopts these LSs in full. Each scenario is reformulated into a Delphi survey item. The wording of each item remains consistent with the original STPA terminology and numbering to preserve traceability. The classification also ensures that risks associated with human competence and communication—often neglected in technical analyses—receive explicit attention [11,12,13].

3.3. Delphi Survey Design

The Delphi technique was selected because it combines expert knowledge with iterative feedback to build consensus in areas where empirical data are scarce. This is especially relevant for MASS operations, where large-scale accident statistics do not yet exist. Approximate reasoning has been applied in other risk studies, but it was not used here. It may handle uncertainty well, but it is harder to interpret and does not fit the step-by-step design of Delphi. Instead, consensus was judged using Kendall’s W and the interquartile range, which give clear and transparent results that fit the Delphi process.

Panel composition: The panel consisted of 20 experts from four areas: academia, industry, seafaring professionals, and government and regulatory agencies. This composition ensured that technical, operational, investigative, and ergonomic perspectives were all represented. The number of experts aligns with established Delphi methodology, which typically recommends panels of 10–30 participants to balance diversity and manageability [27,28]. The two-round design also reflects common practice in consensus-building studies [27,28].
Rounds: Two rounds were conducted. In the first round, each panelist rated every LS on three dimensions: S, L, and D. A five-point scale was used for each dimension. In the second round, the aggregated group statistics (median, interquartile range, and distribution) were provided as feedback, and panelists were invited to revise their ratings in light of this information. A third round would have been considered for scenarios where consensus was not achieved in the first two rounds.
Consensus criteria: Agreement was measured using Kendall’s coefficient of concordance (W) and the interquartile range (IQR). A scenario was considered to have reached consensus if W ≥ 0.7 or if the IQR was less than or equal to 1. These two measures provided the basis for assessing convergence between rounds. In interpreting detectability, the panel primarily considered operational detectability. Computational detectability, i.e., the ability of AI-based anomaly detection or predictive analytics to identify failures in real time, is also relevant. Although not implemented in the present Delphi framework, such AI-informed criteria offer a pathway for future refinement of the risk index. Scenarios that failed to meet these thresholds were flagged for sensitivity analysis in later modeling stages.
Priority index: In this study, risk prioritization was based on three dimensions: S, L, and D. This triad originates from classical Failure Mode and Effects Analysis (FMEA), where the Risk Priority Number (RPN) is computed as the simple product shown in Equation (1).

$\mathrm{RPN} = S \times L \times D$ (1)

While widely adopted, the traditional RPN has been criticized for rank reversal problems and disproportionate sensitivity to detectability values [29]. As Bowles and Pelaez pointed out, the multiplicative form often produces identical scores for scenarios with very different underlying risk characteristics, leading to ambiguity in prioritization [30]. To address these limitations, several authors have suggested modifications in which detectability is treated not as a core multiplier, but as a weighting factor that adjusts the baseline risk derived from severity and likelihood [30]. Pillay and Wang demonstrated that approximate reasoning methods could provide more stable rankings by handling detectability as an auxiliary adjustment variable [31]. Furthermore, the international standard “International Electrotechnical Commission (IEC) 60812:2018” recommends moving away from sole reliance on RPN and toward Action Priority (AP), while still recognizing detectability as an important modifier in judging whether additional mitigation is necessary [32].
Accordingly, this study adopts the adjusted risk function shown in Equation (2).

$\mathrm{Risk} = S_{\mathrm{med}} \times L_{\mathrm{med}} \times \left(1 + 0.7 \times \frac{D_{\mathrm{med}} - 1}{4}\right)$ (2)

This ensures that severity and likelihood form the structural basis of the risk index, while scenarios with low detectability are elevated in ranking to reflect the practical difficulty of timely intervention.
Prioritization: Within each functional group, scenarios are ranked by their priority indices. The highest-ranked scenarios across all groups form a Top Risk Set of 10 scenarios. These constitute the critical pathways that require further regulatory attention.
Quality control: To maintain validity, rounds are conducted at fixed intervals, panelist anonymity is preserved, and only statistical summaries are shared between rounds. This minimizes group pressure and ensures that consensus reflects genuine convergence of opinion.
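
The contrast between the classical RPN and the adjusted index can be illustrated with a minimal Python sketch. The function names and the three example rating triples are hypothetical and are not taken from the survey data. The sketch also assumes a 1 to 5 scale on which a higher D score means the failure is harder to detect.

```python
def rpn(s, l, d):
    """Classical FMEA Risk Priority Number: plain product of the three scores."""
    return s * l * d


def adjusted_risk(s_med, l_med, d_med, weight=0.7, scale_max=5):
    """Adjusted index: S x L forms the base, D acts only as a modifier.

    The modifier ranges from 1.0 (D = 1) to 1 + weight (D = scale_max).
    """
    return s_med * l_med * (1 + weight * (d_med - 1) / (scale_max - 1))


# Hypothetical (S, L, D) triples that all share the same RPN
examples = {
    "severe, likely, easy to detect": (5, 4, 1),
    "minor, likely, hard to detect": (1, 4, 5),
    "moderate, unlikely, hard to detect": (2, 2, 5),
}

for label, (s, l, d) in examples.items():
    print(f"{label}: RPN={rpn(s, l, d)}, adjusted={adjusted_risk(s, l, d):.2f}")
```

All three triples share the same RPN, whereas the adjusted index separates the severe and likely case from the two low-consequence cases. On a five-point scale the adjusted index ranges from 1 to 42.5.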

4. Results

4.1. First-Round Delphi Result

4.1.1. Panel and Response Summary

Round 1 collected a complete set of evaluations for all 92 loss scenarios across three dimensions (severity, likelihood, detectability). The dataset comprises N = 1840 judgments, which corresponds to 20 panelists × 92 scenarios. All invited experts completed the survey (100% response rate), as shown in Table 1.

The panel was designed to ensure balanced representation of educational, industrial, operational, and governmental perspectives. To protect anonymity, individual identities are not disclosed, and only aggregated domain-level information is reported. The average professional experience among participants was 12 years, with a range of 8–21 years. This diversity ensured that the evaluation of loss scenarios incorporated academic rigor, industrial feasibility, practical seamanship, and policy perspectives.

4.1.2. Concordance and Global Test of 1st Round

The Friedman/Kendall analysis shows low initial concordance, but significant differentiation, across dimensions, as shown in Table 2.

4.1.3. Prioritization of Loss Scenarios

The Round 1 Delphi survey produced median values for S, L, and D across all 92 scenarios. To identify the most critical pathways for subsequent modeling, a shortlist of 30 scenarios was generated. Prioritization was based on the following rules.

Risk index as the baseline measure, defined as in Equation (2).
Scenarios with Risk ≥ 22 were considered of high priority.
Consensus gate: Scenarios were retained if all three IQR values (S, L, D) were ≤ 1.
Safety override: Scenarios with $S_{\mathrm{med}}$ = 5 and $L_{\mathrm{med}}$ ≥ 3 were included even if the risk index was below the threshold, to ensure that low-frequency/high-severity risks were not excluded.
Balanced representation: The final set ensured adequate coverage of navigation, propulsion, sensor, connectivity, cyber, and human factors.
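
The threshold, consensus gate, and safety override can be sketched as a small filter. This is an illustrative reading of the rules, not the authors' implementation. It assumes that the consensus gate applies to every scenario and that the safety override bypasses only the risk threshold. The scenario identifiers and values are hypothetical, and the balanced-representation rule is not encoded.

```python
from dataclasses import dataclass

RISK_THRESHOLD = 22.0  # cutoff stated in the text


@dataclass
class ScenarioSummary:
    ls_id: str
    s_med: float   # median severity
    l_med: float   # median likelihood
    risk: float    # precomputed risk index
    iqr_s: float
    iqr_l: float
    iqr_d: float


def is_shortlisted(x: ScenarioSummary) -> bool:
    """Apply the consensus gate, then the threshold or the safety override."""
    passes_gate = max(x.iqr_s, x.iqr_l, x.iqr_d) <= 1
    high_priority = x.risk >= RISK_THRESHOLD
    safety_override = x.s_med == 5 and x.l_med >= 3
    # Assumption: the override relaxes only the threshold, not the gate.
    return passes_gate and (high_priority or safety_override)


# Hypothetical scenarios (identifiers are placeholders, not real LS numbers)
scenarios = [
    ScenarioSummary("X-1", 5, 4, 27.0, 1, 1, 1),     # above threshold
    ScenarioSummary("X-2", 5, 3, 17.625, 1, 0, 1),   # kept by safety override
    ScenarioSummary("X-3", 4, 4, 27.2, 1, 1, 2),     # fails consensus gate
    ScenarioSummary("X-4", 3, 3, 12.15, 0, 1, 1),    # below threshold
]

shortlist = [x.ls_id for x in scenarios if is_shortlisted(x)]
print(shortlist)
```

Keeping each rule as a named boolean makes it easy to test alternative readings, for example letting the safety override bypass the consensus gate as well.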

The resulting Top 30 scenarios are shown in Figure 2 and Table 3. These scenarios represent the most severe, most probable, and least detectable risks identified in Round 1. They provide the foundations for both Round 2 Delphi refinement and calibration of the System Dynamics model.

The cutoff threshold of Risk ≥ 22 was selected to capture approximately the top 30% of scenarios, which aligns with common practice in FMEA-based prioritization, where thresholds are calibrated to the five-point scoring scale [31]. This approach ensured that the prioritization did not overly concentrate on marginal differences at the lower end of the distribution. In addition, scenarios with severity = 5 and likelihood ≥ 3 were retained, even when their composite risk index fell below the threshold. This safety override rule was introduced to prevent the exclusion of low-frequency but catastrophic events, reflecting established recommendations in risk engineering that emphasize precautionary treatment of extreme-severity outcomes [29,30,31]. In this way, the rules ensured both statistical robustness and safety sensitivity in the construction of the Top 30 set.

The cutoff rule and quality control follow established practices in risk engineering and FMEA prioritization [32]. Reducing the indices to a manageable set ensures interpretability and aligns with prior applications in maritime and aviation safety studies. The threshold of 22 was set a priori to capture the upper tier of risk scores while keeping the evaluation manageable and transparent. Scenarios were considered “dominated” when they scored lower on severity and likelihood without disadvantages in detectability. Such items were deprioritized, but low-frequency/high-severity cases were retained through a safety override to ensure that catastrophic outcomes were not excluded.

In practice, quality control was carried out through three checks: (i) verifying completeness of expert responses, (ii) confirming that rating scales were applied consistently, and (iii) monitoring stability of the final Top 30 across Delphi rounds. These steps ensured that the prioritization results reflected consistent expert judgment, rather than artifacts of calculation.

4.2. Second Round Delphi Result

4.2.1. Concordance and Global Test of 2nd Round

Round 2 achieved a substantial improvement in consensus among the 20 experts. The Friedman/Kendall analysis indicated strong agreement across severity, likelihood, and detectability ratings, as shown in Table 4.

A Kendall’s W of 0.693 indicates substantial concordance and approaches the commonly cited Delphi convergence threshold of 0.70. Together with a large chi-square (χ² = 3265.42, df = 91, p < 0.001), this shows effective alignment after feedback.

For comparison, the overall Kendall’s W in Round 1 was only 0.07, reflecting minimal agreement among the experts. The progression from W = 0.07 in Round 1 to W = 0.693 in Round 2 clearly demonstrates that structured feedback successfully transformed initially divergent judgments into strongly aligned consensus.
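
For readers unfamiliar with the statistic, the following pure-Python sketch computes Kendall's W with the standard tie correction. It is a generic illustration and not the authors' analysis script. The paper does not state how the ratings were arranged for the test, so the layout assumed here (one row per rater, one column per scenario) and the toy data are assumptions.

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """ratings[j][i] is the score given by rater j to item i."""
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie correction: sum of (t^3 - t) over each rater's groups of tied scores
    tie_term = sum(
        sum(c ** 3 - c for c in Counter(row).values()) for row in ratings
    )
    denom = m ** 2 * (n ** 3 - n) - m * tie_term
    return 12 * s / denom if denom else float("nan")


# Toy data: three raters in full agreement, then two raters in full opposition
agree = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
oppose = [[1, 2, 3], [3, 2, 1]]
print(kendalls_w(agree), kendalls_w(oppose))
```

W equals 1 when all raters order the items identically and 0 when the rank totals are indistinguishable, which is why the rise between rounds is read as convergence of opinion.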

4.2.2. Prioritization and Stability of Loss Scenarios

Compared with Round 1, the rank order of critical scenarios stabilized substantially. All of the previously identified Top 30 scenarios were retained, but the interquartile ranges narrowed, and medians converged. Importantly, the Top 10 scenarios showed near-unanimous agreement across all three dimensions, as shown in Figure 3 and Table 5.
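
The narrowing of interquartile ranges between rounds can be checked with a short sketch based on Python's standard `statistics` module. The ratings below are hypothetical. The quartile method ("inclusive") is also an assumption, since the paper does not specify how quartiles were computed.

```python
import statistics


def summarize(ratings):
    """Return (median, IQR) for one scenario on one dimension."""
    q1, _, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    return statistics.median(ratings), q3 - q1


def has_consensus(ratings, max_iqr=1):
    """IQR-based consensus check (IQR <= 1 on the five-point scale)."""
    return summarize(ratings)[1] <= max_iqr


# Hypothetical ratings from five panelists for a single scenario
round1 = [1, 2, 3, 4, 5]
round2 = [3, 3, 3, 4, 4]

for name, ratings in (("Round 1", round1), ("Round 2", round2)):
    median, iqr = summarize(ratings)
    print(f"{name}: median={median}, IQR={iqr}, consensus={has_consensus(ratings)}")
```

In this toy case the median stays the same while the spread halves, which is the pattern the feedback step is meant to produce.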

4.2.3. Patterns of Change from Round 1

The null hypothesis was that there would be no concordance among experts (W = 0). Given the diversity of the panel, Round 1 was expected to show low agreement, while Round 2 was expected to show improvement after structured feedback. The results confirmed this expectation: the first round yielded a very low W, whereas the second round produced a substantially higher W, indicating strong consensus.

Round 2 showed a marked improvement in consensus, with Kendall’s W rising from 0.07 in Round 1 to 0.693, indicating a clear shift from divergent views to strong alignment. Kendall’s W measures only the degree of concordance among panel members; it should not be read as a measure of predictive accuracy, model capacity, or computational performance. The rise in W therefore shows that structured feedback led experts toward a common view. The primary set of indices was deliberately kept minimal to maintain interpretability; broader sets added complexity without materially improving prioritization. The ranking proved stable, as critical propulsion and communication failures (LS-009, LS-030, LS-066, LS-062) consistently dominated the Top 10. Human factor issues with low detectability, such as LS-037 and LS-083, remained prominent, underscoring their perceived importance despite relatively lower median scores. Conversely, scenarios with persistently low ratings and wide dispersion (e.g., LS-059, LS-089) fell to the bottom of the list, confirming stable expert rejection.

Specifically, the Top 10 scenarios identified in Round 2 were as follows: LS-009 (MASS fails to avoid navigational hazards following communication failure), LS-066 (false distress signal triggers unnecessary evasive action), LS-058 (MASS moves abnormally due to erratic sensor switching), LS-062 (MASS drifts into traffic without emergency signals), LS-012 (MASS ignores emergency stop due to blocked command), LS-030 (MASS sustains further damage from propulsion reactivation), LS-076 (hacker sends false navigation command to MASS), LS-034 (MASS collides because of late emergency stop), LS-010 (MASS loses control in a congested waterway, disrupting traffic), and LS-078 (ransomware locks MASS control systems). Together, these scenarios reflect a combination of propulsion and machinery failures, communication breakdowns, sensing and cybersecurity vulnerabilities, and human–machine interaction risks. This list underscores that the Delphi panel consistently prioritized both technical and human factor hazards as critical pathways for degraded MASS operations. Detailed descriptions of all 92 scenarios are provided in Appendix A.

4.2.4. Implications for Modeling

The stabilized Top 10 scenarios offer a solid empirical base for the next stage of analysis. Round 2 concordance reached W = 0.693. This shows that structured expert judgment can deliver a dependable priority list, even when large accident datasets are not available. At the same time, expert panels are not uniform across organizations or operations. Results can shift with a different mix of disciplines or contexts. Further checks are therefore needed.

Future work should carry these prioritized scenarios into quantitative models that track how risks change over time. Simulation-based approaches can represent operator workload, communication degradation, and recovery after detection. Delphi outputs can supply parameter ranges or priors. These inputs allow the model to look for thresholds, tipping points, recovery rates, and path dependence that static ranking cannot show.

Such models can also test policy choices. Examples include the number of vessels per RO, watchkeeping schedules, and communication redundancy. Scenarios should run under representative traffic and weather conditions. Outputs should include risk curves, time to recovery, and probabilities of loss events. If the Top 10 remain critical across settings, the list is robust. If not, the analysis will show where the leverage points lie for regulation, training, and design.

In short, Delphi delivers a reasoned map of concern; quantitative modeling then stress-tests that map and turns it into actionable evidence for a comprehensive MASS risk-management framework.
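The kind of policy test outlined in this subsection can be illustrated with a deliberately small simulation. The sketch below assumes a single RO who handles alarms one at a time in arrival order, with random (exponential) alarm arrivals and handling times; it estimates how often an alarm waits longer than a chosen limit as the number of supervised vessels grows. The queue model, the function name `simulate_ro_queue`, and all parameter values are assumptions made for illustration; none of them is taken from the Delphi results.

```python
import random


def simulate_ro_queue(vessels, alarms_per_vessel_hour, mean_handling_min,
                      max_wait_min, hours=1000.0, seed=0):
    """Return the share of alarms that wait longer than max_wait_min.

    Assumes a positive alarm rate and one operator serving alarms first come,
    first served.
    """
    rng = random.Random(seed)
    rate_per_min = vessels * alarms_per_vessel_hour / 60.0
    horizon = hours * 60.0
    t = 0.0                 # arrival clock in minutes
    operator_free_at = 0.0  # time at which the RO finishes the current alarm
    total = late = 0
    while True:
        t += rng.expovariate(rate_per_min)
        if t > horizon:
            break
        start = max(t, operator_free_at)
        if start - t > max_wait_min:
            late += 1
        operator_free_at = start + rng.expovariate(1.0 / mean_handling_min)
        total += 1
    return late / total if total else 0.0


# Hypothetical inputs: 2 alarms per vessel-hour, 5 min mean handling, 2 min limit.
for k in (1, 2, 4):
    share = simulate_ro_queue(k, 2.0, 5.0, 2.0)
    print(f"{k} vessel(s) per RO: {share:.1%} of alarms wait more than 2 min")
```

In a fuller model, the Delphi severity, likelihood, and detectability ranges could inform such parameters, and the single queue would be replaced by representations of communication degradation and recovery.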

4.3. Practical Prioritization Outcomes

The Round 2 shortlist yields a stable Top 10 with tight interquartile ranges. These scenarios, already critical in degraded states, become even more hazardous under adverse oceanographic conditions, such as high sea states or strong currents, which place additional stress on propulsion systems and limit maneuverability. They cluster around four technical domains and one human factor domain:

Propulsion and machinery: LS-009, LS-030, LS-066, LS-062;
Communication and links: LS-010;
Sensing and perception: LS-076, LS-078;
Integrated control/systems: LS-012, LS-058;
Human–machine interaction: LS-034.

This pattern is consistent with degraded-state observations in prior STPA work [10] and redundancy limits reported in maritime studies [20]. The presence of LS-034 in the Top 10 underscores the roles of display logic and alarm behavior. It also supports competency concerns raised for ROs [8].

For near-term use, the Top 10 can be mapped to five action bundles:

Propulsion fallback and verification (LS-009/030/066/062): pre-sail validation, auto-fallback logic, health monitoring thresholds, and shore-based escalation rules;
Link assurance and congestion control (LS-010): dual-path links, latency budgets, hold-last-command limits, and degraded-mode command filters;
Sensor fusion hardening (LS-076/078): cross-checks across sensors, confidence scores to the Human–Machine Interface (HMI), and minimum data quality for remote maneuvers;
Integrated control checks (LS-012/058): watchdogs on actuation chains, rollback of unsafe parameter updates, and test-before-application for remote patches;
HMI and alarm governance (LS-034): alarm suppression rules, grouping of correlated alarms, and RO rate-limited advisories with acknowledgments.
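The bundle assignment above can be kept auditable with a simple lookup structure. The sketch below encodes the five bundles exactly as listed and checks that every Top 10 scenario is assigned to exactly one bundle; the names `TOP10`, `ACTION_BUNDLES`, and `coverage_report` are hypothetical and chosen only for this illustration.

```python
# Top 10 scenario IDs in the order given for Round 2.
TOP10 = ["LS-009", "LS-066", "LS-058", "LS-062", "LS-012",
         "LS-030", "LS-076", "LS-034", "LS-010", "LS-078"]

# Action bundles and their scenarios, copied from the list above.
ACTION_BUNDLES = {
    "Propulsion fallback and verification": ["LS-009", "LS-030", "LS-066", "LS-062"],
    "Link assurance and congestion control": ["LS-010"],
    "Sensor fusion hardening": ["LS-076", "LS-078"],
    "Integrated control checks": ["LS-012", "LS-058"],
    "HMI and alarm governance": ["LS-034"],
}


def coverage_report(top10, bundles):
    """Return (scenarios in no bundle, scenarios in more than one bundle)."""
    counts = {ls: 0 for ls in top10}
    for members in bundles.values():
        for ls in members:
            counts[ls] = counts.get(ls, 0) + 1
    unassigned = [ls for ls in top10 if counts[ls] == 0]
    duplicated = [ls for ls, c in counts.items() if c > 1]
    return unassigned, duplicated


unassigned, duplicated = coverage_report(TOP10, ACTION_BUNDLES)
print("Unassigned:", unassigned, "Duplicated:", duplicated)
```

A check of this kind becomes useful if the ranking is re-run with a different panel and the Top 10 membership changes.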

4.4. Regulatory Implications

The findings have direct bearing on STCW and the draft MASS Code. Three implications follow.

First, regarding competence profiles for ROs, high-ranking scenarios involve multi-modal diagnosis under latency or partial failure. ROs, therefore, need assessed skills in workload management, degraded-mode navigation, failure triage, and cross-vessel supervision [8]. The STCW tables should add elements for remote monitoring, communication budgets, cyber-aware troubleshooting, and recovery planning. This addresses the present gap, where STCW covers only on-board personnel [6,9].

Second, regarding ROC watchkeeping and fatigue control, the Top 10 scenarios show risk growth when alarm frequency and vessel count rise. This supports mandatory ROC watchkeeping standards: duty cycles, rest minima, and load limits per operator. Human factor exposure must be auditable under the MASS Code’s human element provisions [3,4,5,6]. Fatigue rules should mirror bridge practice, yet reflect ROC realities, such as night shifts and multi-vessel oversight.

Third, regarding fallback and degraded-mode procedures, the ranking justifies codified degraded-mode playbooks: who commands, which actions are allowed, and when to hand off to local assistance. These procedures should be referenced in the MASS Code annexes and tied to certification of both the ship and the ROC as a single system [3,22,23].

4.5. Proposed Example of Implication Roadmap

This roadmap is derived directly from the prioritized Round 2 results, clustering the Top 10 scenarios into domain groups such as propulsion, communication, sensing, and human–machine interaction. It translates these consensus risks into phased actions for training and regulatory adoption. Staged adoption of this kind reduces burden while closing the most significant gaps.

Phase 1—Operational safeguards (0–6 months): These measures are technical in nature, but they also involve moderate costs for software updates and training. Administrative effort is needed to align inspection and audit requirements between flag states.
- Apply the five action bundles to high-risk routes.
- Introduce link-latency budgets, alarm governance rules, and propulsion health gates.
- Begin RO drills for degraded operation and remote handover.
Phase 2—Competence and oversight (6–18 months): This stage requires more resources. Simulator-based training and certification cycles are expensive. Regulatory barriers may arise when states apply different rules for watchkeeping. Data privacy must also be reviewed, since monitoring RO performance involves sensitive information.
- Pilot competence assessments aligned to the Top 10 skills [8].
- Establish ROC watchkeeping schedules and fatigue limits with audit trails.
- Add Key Performance Indicator (KPI) dashboards: alarm rate per RO, recovery time from degraded modes, and frequency of safe handovers.
Phase 3—Certification alignment (18+ months): The final stage demands both technical and administrative effort. International agreement on certification is slow and costly. Gaps remain in the way MASS control systems interact with national infrastructure. Policy on data handling and privacy must be updated before wide adoption is possible.
- Prepare documentation sets for MASS-plus-ROC certification as an integrated system [22,23].
- Propose STCW table additions for RO training outcomes, assessment methods, and revalidation cycles [5,6,9].
- Feed operational KPIs back into risk reviews to sustain improvement.
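The KPI dashboard items named in Phase 2 (alarm rate per RO, recovery time from degraded modes, and safe handovers) could be derived from a simple event log. The sketch below assumes a hypothetical log schema (`Event`) and reports safe handovers as a share of all handovers, which is one possible reading of "frequency"; neither the schema nor that interpretation comes from the source.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Event:
    operator: str       # RO identifier (hypothetical)
    kind: str           # "alarm", "recovery", or "handover"
    value: float = 0.0  # recovery: minutes in degraded mode; handover: 1.0 if safe, else 0.0


def kpi_summary(events, hours_on_watch):
    """hours_on_watch maps each RO to hours supervised (assumed > 0) in the window."""
    summary = {}
    for op, hours in hours_on_watch.items():
        own = [e for e in events if e.operator == op]
        alarms = sum(1 for e in own if e.kind == "alarm")
        recoveries = [e.value for e in own if e.kind == "recovery"]
        handovers = [e.value for e in own if e.kind == "handover"]
        summary[op] = {
            "alarms_per_hour": alarms / hours,
            "mean_recovery_min": mean(recoveries) if recoveries else None,
            "safe_handover_share": mean(handovers) if handovers else None,
        }
    return summary


# Made-up log for one watch.
log = [
    Event("RO-A", "alarm"), Event("RO-A", "alarm"), Event("RO-A", "alarm"),
    Event("RO-A", "recovery", 4.0), Event("RO-A", "recovery", 6.0),
    Event("RO-A", "handover", 1.0), Event("RO-A", "handover", 0.0),
]
print(kpi_summary(log, {"RO-A": 2.0, "RO-B": 8.0}))
```

Because such logs describe individual operator performance, any real implementation would need to respect the data privacy review noted for Phase 2.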

The roadmap presented above is an illustrative example, rather than a fixed sequence. The actual pathway of adoption will vary across flag states, vessel types, and organizational practices.

What remains essential is that these measures are validated during the Experience-Building Phase (EBP). This phase, with its prototype operations and pilot projects, provides an opportunity to test degraded-mode procedures under conditions that are controlled, yet realistic. In this context, the roadmap also serves as a framework for assessing RO competence. Areas such as workload management, failure triage, and multi-vessel supervision must be evaluated so that human performance is considered alongside technical reliability. The EBP should, therefore, be treated as a proving ground for human capability. This ensures that the integration of MASSs is guided not only by engineering readiness, but also by evidence-based understanding of human competence and its limitations. In this way, the staged adoption process achieves more than filling current regulatory gaps. It establishes a data-driven foundation for embedding competence standards into the STCW framework and supporting the safe and accountable introduction of MASSs.

The roadmap should, therefore, be read as qualitative and illustrative. The phase durations, short term (0–6 months), medium term (6–18 months), and long term (18+ months), are indicative and intended to reflect typical development and training cycles. Cost implications are expressed in relative terms, such as additional training versus technological upgrades, rather than absolute figures. Future validation and quantification will be required before this roadmap can be generalized beyond the present study.

5. Conclusions

This study examined the safety and regulatory implications of MASSs by integrating a system-theoretic safety analysis with structured expert judgment. Beginning with a comprehensive STPA, the research established a dataset of 92 LSs reflecting degraded operational states of MASSs, ranging from propulsion and navigation failures to human–machine interaction risks. These scenarios provided a structured and traceable basis for expert evaluation, ensuring that the risk inventory was anchored in system-level control logic, rather than anecdotal incidents.

The Delphi process was then applied to elicit and refine expert judgment in areas where empirical accident statistics are scarce. Across two rounds, the 20-member panel—drawn from academia, industry, seafaring professionals, and regulatory bodies—rated each LS in terms of severity, likelihood, and detectability. Round 1 demonstrated the expected divergence of opinion, with a Kendall’s W of 0.07, reflecting heterogeneity of perspectives. However, Round 2’s structured feedback and convergence mechanisms elevated concordance to W = 0.693, well above conventional thresholds for Delphi consensus. This result highlights the utility of iterative expert elicitation in domains of high uncertainty and demonstrates that even diverse panels can achieve robust alignment when guided by transparent procedures.

The prioritization process produced both stability and clarity. High-severity and high-likelihood scenarios, such as propulsion control failures (LS-009, LS-066) and critical communication breakdowns (LS-030, LS-062), consistently occupied the top ranks. These scenarios reflect hazards with immediate catastrophic potential, where operator intervention may be ineffective without redundant safeguards. Equally important, the results highlighted scenarios with high severity but low detectability, such as cognitive overload of ROs and delayed alarm recognition (LS-037, LS-083). Although these risks were sometimes rated lower in likelihood, experts converged on their salience due to the challenges of monitoring human performance in real time. Conversely, scenarios with low severity and high dispersion (e.g., LS-059, LS-089) dropped to the bottom, confirming stable expert rejection. Taken together, these patterns underscore the multidimensional nature of risk: MASS safety cannot be understood solely in terms of technical redundancy, but must also address human–machine integration and cognitive resilience.

Beyond prioritization, the findings carry broader implications for maritime regulation and training. The exclusion of ROs from the current STCW convention creates a regulatory vacuum in competence, watchkeeping, and fatigue management. The Delphi results strengthen the argument that these omissions are not marginal, but fundamental to MASS safety. The Top 10 scenarios emphasize precisely those domains—propulsion, communication, and human factors—where RO competence is most critical. Integrating such competence requirements into STCW tables would align international law with the operational realities of MASSs and provide a basis for accountability in remote control environments.

At the methodological level, this study contributes by modifying the traditional RPN formula. By re-weighting detectability as an adjustment factor, rather than a simple multiplier, the adopted risk index reduces rank reversal and provides more stable prioritization across diverse scenarios. This refinement, although heuristic, aligns with the recommendations of IEC 60812:2018 and recent scholarship advocating for action-based, rather than purely numeric, prioritization. The result is a risk ranking system that is transparent, sensitive to expert input, and more reflective of the operational challenges unique to MASSs.

Nonetheless, this study also recognizes its boundaries. Expert judgment, while systematic, remains dependent on panel composition, disciplinary balance, and operational background. Different mixes of regulators, operators, or technical specialists may yield variations in the ranking of scenarios. To mitigate this, this study ensured diversity of panel membership and applied strict consensus criteria, yet further replication with larger or differently composed panels would be valuable. Additionally, the Delphi method captures relative prioritization, but does not reveal how risks evolve dynamically over time or interact through feedback loops.

Future research should build on this work using quantitative modeling. The Top 10 scenarios can be applied in structured models to examine thresholds, escalation dynamics, and resilience under different conditions of vessel count, communication latency, and operator workload. Such modeling would also let regulators and industry stakeholders test policy options such as shift length, communication redundancy, and alarm interface design before adoption. The Delphi study provides a prioritized and evidence-based input set, while quantitative modeling offers a way to measure interactions and identify leverage points.

In conclusion, this study demonstrates that safe deployment of MASSs requires a combined approach: system-theoretic analysis to identify hazards, expert judgment to prioritize risks, and regulatory integration to close institutional gaps. The results emphasize that Remote Operators are not peripheral actors, but central agents whose competence, workload management, and accountability must be codified in international law. By aligning technical risk analysis with institutional reform, the research provides both empirical evidence and normative direction. It argues that embedding ROs within the STCW framework is not optional, but necessary if MASSs are to be operated safely, legally, and responsibly in global waters. Regarding limitations and outlook, findings rely on expert judgment and may be sensitive to panel composition; validation in real-world MASS operations remains necessary. Future work should incorporate quantitative simulation and AI-informed detectability to complement consensus-based prioritization.

The Delphi process itself also deserves emphasis as a methodological contribution. Twenty experts representing academia, industry, seafaring professionals, and regulatory authorities participated in two structured rounds, evaluating all 92 loss scenarios across severity, likelihood, and detectability. Round 1 revealed minimal concordance (Kendall’s W = 0.07), while structured feedback and statistical summaries in Round 2 elevated agreement to substantial levels (W = 0.693, p < 0.001). This transparent design ensured that convergence was not incidental, but the result of informed reconsideration by a diverse panel. The stability of the Top 10 list across propulsion, communication, sensing, integrated control, and human–machine interaction domains demonstrates that even heterogeneous groups can achieve robust alignment when guided by iterative and evidence-based procedures. These features underscore that the findings are not merely a set of subjective expert views, but a reproducible consensus baseline for regulatory debate and training development.

This study has some limitations that must be noted. The Delphi process depends on expert judgment, so the outcome may shift if the panel composition changes. At present, there are no large datasets of MASS accidents, which restricts validation. The adjusted index used in this study is a practical tool, rather than a full causal model. Future work should test these results through simulation and real trials during the EBP. It should also explore how AI-based detection and sensor fusion perform in degraded conditions. These steps will provide stronger evidence for integrating Remote Operator competence into international regulation.

6. Patents

A review of relevant databases found no applicable patents directly related to MASS degraded operations within the scope of this study. This section is therefore intentionally left without citations.

Author Contributions

Conceptualization, H.P. and J.K.; methodology, H.P. and J.K.; software, M.J.; validation, H.P., S.-y.K., and J.K.; formal analysis, J.K. and H.P.; investigation, H.P., S.-y.K., and M.J.; resources, D.K. and C.K.; data curation, M.J. and J.K.; writing—original draft preparation, H.P. and J.K.; writing—review and editing, J.K., S.-y.K., U.J. and C.K.; visualization, M.J.; supervision, J.K.; project administration, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. This study did not involve humans or animals.

Informed Consent Statement

Not applicable. This study did not involve humans.

Data Availability Statement

Data supporting the findings of this study are available from the corresponding author upon reasonable request. No new publicly archived datasets were created.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AP	Action Priority
EBP	Experience-Building Phase
D	Detectability
DP	Dynamic Positioning
FMEA	Failure Mode and Effects Analysis
HMI	Human–Machine Interface
IEC	International Electrotechnical Commission
IMO	International Maritime Organization
IQR	Interquartile Range
KPI	Key Performance Indicator
L	Likelihood
LS	Loss Scenario
MASS	Maritime Autonomous Surface Ship
R	Risk
ROC	Remote Operation Center
RO	Remote Operator
RPN	Risk Priority Number
S	Severity
STCW	International Convention on Standards of Training, Certification and Watchkeeping for Seafarers
STPA	System-Theoretic Process Analysis
UCA	Unsafe Control Action
VTS	Vessel Traffic Services
W	Kendall’s Coefficient of Concordance

Appendix A

This appendix provides the complete set of 92 loss scenarios (LSs) identified through the STPA of MASS operations under degraded states. The full list is presented here to ensure transparency and reproducibility of the Delphi survey design. Summaries of each scenario are included to supplement the main text, where only aggregated results (Top 30 and Top 10 sets) are discussed.

Table A1. Complete list of loss scenarios for MASS operation.

LS ID	Loss Scenario (Summary)
LS-1	MASS drifts uncontrollably due to lack of navigation commands.
LS-2	MASS enters restricted area when commands are not issued.
LS-3	MASS misses berthing schedule because of command delay.
LS-4	MASS fails to react to an approaching vessel due to late navigation update.
LS-5	MASS cannot adjust to sudden weather changes.
LS-6	MASS miscalculates approach angle during docking.
LS-7	MASS responds too late to ice detection warning.
LS-8	MASS drifts after communication loss without switching to remote control.
LS-9	MASS fails to avoid navigational hazards following comms failure.
LS-10	MASS loses control in a congested waterway, disrupting traffic.
LS-11	MASS continues executing outdated commands after comms loss.
LS-12	MASS ignores emergency stop due to blocked command.
LS-13	MASS denies course correction request.
LS-14	MASS moves erratically from conflicting navigation commands.
LS-15	MASS turns the wrong way after receiving outdated command.
LS-16	MASS engages throttle before completing course change.
LS-17	MASS attempts to stop before adjusting course.
LS-18	MASS continues into typhoon because of outdated weather data.
LS-19	MASS fails to reroute around an ice field.
LS-20	MASS makes unnecessary course change in minor weather.
LS-21	MASS reduces speed unnecessarily due to weather.
LS-22	MASS drifts into traffic zone without backup propulsion.
LS-23	MASS drifts toward shallow waters.
LS-24	MASS is unresponsive during docking due to propulsion loss.
LS-25	MASS drifts for extended time, causing major delay.
LS-26	MASS engages propulsion despite damage, causing fire or accident.
LS-27	MASS moves with jammed thruster.
LS-28	MASS fails to stop because of overload.
LS-29	MASS burns excess fuel due to malfunction.
LS-30	MASS sustains further damage from propulsion reactivation.
LS-31	MASS moves into hazardous area following forced operation.
LS-32	MASS accelerates abruptly, damaging cargo.
LS-33	Propulsion restart overloads electrical system.
LS-34	MASS collides because of late emergency stop.
LS-35	MASS runs aground from delayed deceleration.
LS-36	Cargo is damaged due to sudden braking.
LS-37	Delay in stopping causes operational delays.
LS-38	MASS becomes unstable due to conflicting speed commands.
LS-39	MASS overreacts to sudden speed reduction.
LS-40	Crew cancels emergency stop, leading to collision.
LS-41	Crew increases speed in rough seas, raising accident risk.
LS-42	MASS drifts into traffic without sensor backup.
LS-43	MASS drifts toward shallow area due to sensor loss.
LS-44	MASS unresponsive during docking without sensor support.
LS-45	MASS drifts for hours because of sensor failure.
LS-46	Cargo shift goes undetected, leading to capsizing.
LS-47	Undetected cargo movement causes structural damage.
LS-48	Cargo shift reduces maneuverability, vessel drifts.
LS-49	Cargo shift damages propulsion and onboard systems.
LS-50	MASS misidentifies hazardous cargo and initiates false emergency.
LS-51	Misidentified cargo causes mishandling accidents.
LS-52	False hazard alert wastes emergency resources.
LS-53	False detection forces unnecessary rerouting.
LS-54	MASS halts in open waters from misinterpreted sensor data.
LS-55	MASS reroutes unnecessarily.
LS-56	MASS slows unexpectedly, causing congestion.
LS-57	MASS takes evasive action for a false hazard.
LS-58	MASS moves abnormally due to erratic sensor switching.
LS-59	MASS remains in fail-safe longer from delayed comms recovery.
LS-60	MASS navigation not updated after reconnection.
LS-61	MASS ignores critical updates after reconnection.
LS-62	MASS drifts into traffic without emergency signals.
LS-63	MASS drifts into shallow water without warning.
LS-64	Tug rescue delayed, as no distress alert is issued.
LS-65	Port is unaware of vessel distress.
LS-66	False distress signal triggers unnecessary evasive action.
LS-67	Unnecessary rescue deployed due to false signal.
LS-68	MASS denied port entry following false distress signal.
LS-69	False NUC leads to unnecessary collision avoidance maneuvers.
LS-70	MASS receives conflicting data from excessive channel switching.
LS-71	MASS loses connection due to excessive switching.
LS-72	MASS pushed off course by current without stabilization.
LS-73	MASS rolls excessively, damaging cargo.
LS-74	Crew delays engine response due to comms failure.
LS-75	Crew misinterprets command, maneuver errors occur.
LS-76	Hacker sends false navigation command to MASS.
LS-77	Cyberattack manipulates vessel speed.
LS-78	Ransomware locks MASS control systems.
LS-79	Cyberattack disables emergency systems.
LS-80	MASS remains exposed due to delayed cyber defense.
LS-81	MASS ignores emergency stop because legitimate command blocked.
LS-82	MASS denies course correction request.
LS-83	RO override causes collision.
LS-84	RO override extends voyage duration.
LS-85	Crew ignores engine failure alert.
LS-86	Crew ignores navigation hazard alert.
LS-87	Crew ignores onboard security breach.
LS-88	RO fails to respond to engine failure.
LS-89	RO slow to respond to approaching vessel.
LS-90	RO delays response to abnormal condition.
LS-91	RO issues contradictory commands, destabilizing MASS.
LS-92	RO switches modes incorrectly, confusing the system.

References

1. Yun, I.H. Prepare for the Era of Autonomous Ships, Need to Change the Maritime Education and Evaluation Paradigm. Monthly Hyundai Ocean of Dec. 2024; pp. 20–21.
2. Kim, J.M. A Study on the Development of Demand Prediction Model for Fleet Capacity of Dynamic Positioning Vessel. Doctoral Dissertation, Korea Maritime and Ocean University, Busan, Republic of Korea, 2024.
3. International Maritime Organization. Draft MASS Code: Annex 1 of MSC 108/WP.7; International Maritime Organization: London, UK, 2023.
4. International Maritime Organization. Proposed Amendments to Chapter 15 (Human Element) of the Draft MASS Code (MSC 110/5/1); Submitted to the 110th Session of the Maritime Safety Committee; International Maritime Organization: London, UK, 2025.
5. International Maritime Organization. Role of the Master and Obligation to be on Board the MASS (MSC 110/5/14); Submitted to the 110th Session of the Maritime Safety Committee; International Maritime Organization: London, UK, 2025.
6. International Maritime Organization. Training, Certification and Watchkeeping Requirements for Remote Operators of MASS (MSC 110/5/4); Submitted to the 110th Session of the Maritime Safety Committee; International Maritime Organization: London, UK, 2025.
7. International Maritime Organization. Report on the Progress of the MASS Code Development and Implications for Training and Certification (MSC 110/13/3); Submitted to the 110th Session of the Maritime Safety Committee; International Maritime Organization: London, UK, 2025.
8. Kuntasa, T.; Lirn, T.-C. A conceptual model of autonomous ship remote operators’ competency. J. Navig. 2024, 76, 653–674.
9. Choi, J.-H.; Yoo, J.-H.; Lee, S.-I. Roles and legal status of the remote operator in a maritime autonomous surface ship: Focusing on the concept of a crew and a master. J. Marit. Law Stud. 2018, 30, 155–185.
10. Park, H.R.; Kim, J.M. STPA analysis for safe operation of maritime autonomous surface ship (MASS) under degradation state. Front. Mar. Sci. 2025, 12, 1601515.
11. Leveson, N.G. Engineering a Safer World: Systems Thinking Applied to Safety; MIT Press: Cambridge, MA, USA, 2011.
12. Leveson, N.G.; Thomas, J.P. STPA Handbook; Massachusetts Institute of Technology: Cambridge, MA, USA, 2018.
13. Ministry of Science and ICT; Telecommunication Technology Association. Risk Analysis Guide Using STPA; Ministry of Science and ICT; Telecommunication Technology Association: Seoul, Republic of Korea, 2018; pp. 2–14.
14. Wróbel, K.; Montewka, J.; Kujala, P. Towards the development of a system-theoretic model for safety assessment of autonomous merchant vessels. Reliab. Eng. Syst. Saf. 2018, 178, 209–224.
15. Wróbel, K.; Krata, P.; Montewka, J. Preliminary results of a system-theoretic assessment of maritime autonomous surface ships’ safety. TransNav Int. J. Mar. Navig. Saf. Sea Transp. 2019, 13, 717–723.
16. Rokseth, B.; Haugen, O.I.; Utne, I.B. Safety verification for autonomous ships. MATEC Web Conf. 2019, 273, 02002.
17. Utne, I.B.; Rokseth, B.; Sørensen, A.J.; Vinnem, J.E. Towards supervisory risk control of autonomous ships. Reliab. Eng. Syst. Saf. 2020, 196, 106757.
18. Glomsrud, J.A.; Xie, J. A structured STPA safety and security co-analysis framework for autonomous ships. In Proceedings of the ESREL 2019, Hannover, Germany, 22–26 September 2019.
19. Chaal, M.; Valdez Banda, O.A.; Glomsrud, J.A.; Basnet, S.; Hirdaris, S.; Kujala, P. A framework to model the STPA hierarchical control structure of an autonomous ship. Saf. Sci. 2020, 132, 104939.
20. Eriksen, S.; Lützen, M. The impact of redundancy on reliability in machinery systems on unmanned ships. WMU J. Marit. Aff. 2022, 21, 161–177.
21. International Maritime Organization. Outcome of the Regulatory Scoping Exercise for MASS, MSC.1/Circ.1638; International Maritime Organization: London, UK, 2021.
22. International Maritime Organization. Development of a Goal-Based Instrument for MASS: Concept on the Management of Remote Operations, MSC 108/4/2; International Maritime Organization: London, UK, 2024.
23. International Maritime Organization. Report of the Working Group (MASS), MSC 109/WP.8; International Maritime Organization: London, UK, 2024.
24. Porathe, T.; Prison, J.; Man, Y. Situation awareness in remote control centers for unmanned ships. In Proceedings of the Human Factors in Ship Design and Operation, London, UK, 26–27 February 2014; pp. 93–101.
25. Ministry of Oceans and Fisheries. Act on the Investigation of and Inquiry of Marine Accidents; Ministry of Oceans and Fisheries: Sejong, Republic of Korea, 2020.
26. Banda, O.A.V.; Kannos, S. Hazard Analysis Process for Autonomous Vessels, Technical Report; Yrkeshögskolan Novia: Vaasa, Finland, 2017.
27. Dalkey, N.; Helmer, O. An Experimental Application of the DELPHI Method to the Use of Experts. Manag. Sci. 1963, 9, 458–467.
28. Keeney, S.; Hasson, F.; McKenna, H. The Delphi Technique in Nursing and Health Research; Wiley-Blackwell: Oxford, UK, 2011.
29. Stamatis, D.H. Failure Mode and Effect Analysis: FMEA from Theory to Execution; ASQ Quality Press: Milwaukee, WI, USA, 2003.
30. Bowles, J.B.; Pelaez, C.E. Fuzzy logic prioritization of failures in a system failure mode, effects and criticality analysis. Reliab. Eng. Syst. Saf. 1995, 50, 203–213.
31. Pillay, A.; Wang, J. Modified failure mode and effects analysis using approximate reasoning. Reliab. Eng. Syst. Saf. 2003, 79, 69–85.
32. International Electrotechnical Commission (IEC). IEC 60812: Failure Modes and Effects Analysis (FMEA and FMECA); International Electrotechnical Commission: Geneva, Switzerland, 2018.

Figure 1. Framework of research.

Figure 2. Risk Distribution of Top30 LSs.

Figure 3. Factor and total risk scores for the Top10 LSs.

Table 1. Panel composition.

Domain	Number of Experts	Background
Academia (professors, lecturers, maritime trainers, etc.)	6	Faculty in maritime universities, training centers
Industry (shipyards, equipment manufacturers)	5	Engineers, safety managers, R&D staff
Seafaring professionals (masters, chief engineers)	5	Licensed officers with command/engine experience
Government and regulatory agencies	4	Ministry officials, maritime safety regulators
Total	20

Average experience: 12 years (range 8–21); participation rate: 100% (20/20); anonymity: maintained throughout, with only aggregated statistics shared between rounds.

Table 2. Friedman/Kendall’s W of 1st Round.

Statistic	Value
Kendall’s W (overall)	0.07
χ²	256.569
df	2
Asymp. Sig. (p-value)	<0.001

W = 0.070 indicates limited consensus in Round 1, as expected. The significant χ² confirms that experts discriminated among scenarios.

Table 3. Top 30 loss scenarios (Round 1).

Rank	LS ID	S_med	L_med	D_med	Risk
1	LS-083	4.75	3.95	4.25	29.43
2	LS-063	4.65	4.00	4.30	29.34
3	LS-076	4.50	4.25	4.05	29.33
4	LS-084	4.70	3.90	4.25	28.76
5	LS-018	4.50	4.20	3.85	28.33
6	LS-062	4.75	4.00	3.80	28.31
7	LS-057	4.50	4.10	4.05	28.30
8	LS-034	4.50	4.00	4.25	28.24
9	LS-070	4.45	4.00	4.35	28.24
10	LS-010	4.60	3.95	4.10	28.03
11	LS-031	4.60	4.05	3.85	27.92
12	LS-078	4.55	3.95	4.10	27.72
13	LS-014	4.55	4.00	3.85	27.28
14	LS-071	4.55	3.95	3.95	27.25
15	LS-058	4.35	4.10	4.00	27.20
16	LS-065	4.40	4.10	3.90	27.20
17	LS-037	4.55	3.90	4.00	27.06
18	LS-090	4.50	3.95	3.95	26.95
19	LS-066	4.45	3.90	4.15	26.92
20	LS-055	4.40	3.85	4.30	26.72
21	LS-030	4.40	4.00	3.95	26.69
22	LS-009	4.40	3.90	4.10	26.47
23	LS-088	4.55	3.70	4.20	26.26
24	LS-001	4.65	3.65	4.05	26.03
25	LS-054	4.35	3.80	4.00	25.21
26	LS-012	4.45	3.80	3.75	25.05
27	LS-092	4.45	3.75	3.60	24.28
28	LS-059	3.15	2.30	2.10	8.64
29	LS-089	3.00	2.30	1.90	7.99
30	LS-029	2.95	2.25	2.00	7.80

Table 4. Friedman test and Kendall’s W for Round 2.

Statistic	Value
Kendall’s W (overall)	0.693
χ²	3265.42
df	91
Asymp. Sig. (p-value)	<0.001

Table 5. Top 10 loss scenarios (Round 2).

Rank	LS ID	S_med	L_med	D_med	Risk
1	LS-009	5.00	5.00	5.00	42.50
1	LS-066	5.00	5.00	5.00	42.50
3	LS-058	5.00	5.00	4.75	41.41
4	LS-062	5.00	4.85	4.90	40.80
5	LS-012	5.00	4.90	4.80	40.79
6	LS-030	4.70	5.00	5.00	39.95
7	LS-076	4.95	4.70	4.95	39.35
8	LS-034	4.60	5.00	5.00	39.10
9	LS-010	4.85	4.40	5.00	36.28
10	LS-078	4.55	4.55	4.20	32.30
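
The movement of individual scenarios between rounds can be read off by joining Table 5 to Table 3 on LS ID. The following is a minimal sketch of that join, not the authors' procedure: the Rank and LS ID columns are transcribed from the two tables, while the names `round1_top30`, `round2_top10`, and `rank_shifts` are illustrative assumptions. A positive shift means the scenario moved up the ranking in Round 2. The comparison covers only the published Top 30 and Top 10 lists, not all 92 scenarios.

```python
# Round 1 order transcribed from Table 3 (position in the list = Round 1 rank).
round1_top30 = [
    "LS-083", "LS-063", "LS-076", "LS-084", "LS-018", "LS-062",
    "LS-057", "LS-034", "LS-070", "LS-010", "LS-031", "LS-078",
    "LS-014", "LS-071", "LS-058", "LS-065", "LS-037", "LS-090",
    "LS-066", "LS-055", "LS-030", "LS-009", "LS-088", "LS-001",
    "LS-054", "LS-012", "LS-092", "LS-059", "LS-089", "LS-029",
]

# (Round 2 rank, LS ID) pairs transcribed from Table 5; the tie at rank 1 is kept as printed.
round2_top10 = [
    (1, "LS-009"), (1, "LS-066"), (3, "LS-058"), (4, "LS-062"),
    (5, "LS-012"), (6, "LS-030"), (7, "LS-076"), (8, "LS-034"),
    (9, "LS-010"), (10, "LS-078"),
]


def rank_shifts(r1_order, r2_ranked):
    """Return {LS ID: (round1_rank, round2_rank, shift)} for each Round 2 entry.

    shift = round1_rank - round2_rank, so a positive value is an upward move.
    round1_rank and shift are None if the scenario is absent from the Round 1 list.
    """
    r1_rank = {ls_id: pos for pos, ls_id in enumerate(r1_order, start=1)}
    result = {}
    for r2_rank, ls_id in r2_ranked:
        r1 = r1_rank.get(ls_id)
        shift = None if r1 is None else r1 - r2_rank
        result[ls_id] = (r1, r2_rank, shift)
    return result


shifts = rank_shifts(round1_top30, round2_top10)

for ls_id, (r1, r2, shift) in shifts.items():
    print(f"{ls_id}: Round 1 rank {r1} -> Round 2 rank {r2} (shift {shift})")
```

In this reading, every Round 2 Top 10 scenario already appears in the Round 1 Top 30; LS-009 and LS-012 rise the most (21 places each), while LS-076 is the only entry that drops (from 3rd to 7th).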

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
