1. Introduction
Maritime Autonomous Surface Ships (MASSs) are often described as a fourth-generation innovation in maritime transport. They combine artificial intelligence, advanced sensors, satellite communication, and real-time data processing into a single control system that reduces human intervention and strengthens safety [1,2]. The idea is no longer limited to theory. In 2018, Finland operated a car ferry by remote control, and, in 2021, Norway launched sea trials of a fully autonomous electric container ship [2]. East Asian countries, such as South Korea, Japan, and China, have also begun large-scale programs, reflecting the rapid globalization of MASS development [1,2].
To address these advances, the International Maritime Organization (IMO) is preparing the MASS Code, which is scheduled to enter into force in 2032. The code is intended to provide harmonized international standards for safe and accountable operation [3]. This is an important step, yet discussions so far have focused heavily on technical matters. Legal, institutional, and human factors remain less developed, which leaves a gap between technological progress and regulatory preparation [3,4].
One of the most urgent questions concerns the Remote Operator (RO). ROs supervise and control MASSs from a land-based Remote Operation Center (ROC), taking on duties that would otherwise belong to shipboard officers. Although the draft MASS Code recognizes the human element, it does not resolve where command authority should lie or how accountability should be shared in remote operations [4,5,6]. This also raises the broader question of whether a master must remain on board or whether command can legally be exercised from shore. Proposals supporting shore-based command emphasize efficiency, but critics warn about communication delays and reduced situational awareness [4,5]. In addition, AI-based perception and decision-making systems face technical limitations. Machine learning models may fail under data scarcity or under adverse and adversarial conditions, such as obscured visual sensors or corrupted training data, while neural networks for object detection can produce unreliable outputs in degraded states. These vulnerabilities underscore the need for ROs not only to compensate for communication gaps, but also to monitor the AI systems and intervene when their performance deteriorates.
Existing conventions add to the difficulty. The International Convention on Standards of Training, Certification and Watchkeeping for Seafarers (STCW) applies only to crew physically on board. ROs are excluded, which leaves competence requirements, watchkeeping rules, and fatigue management outside the international framework [6,7,8,9]. As MASSs move toward commercial use, this gap becomes a major safety concern. Recent IMO debates show that work on the MASS Code cannot be separated from new training and certification standards for ROs [6].
Safety engineering studies reinforce this point. MASSs can operate in degraded states, in which they remain functional but run outside their intended design conditions. Failures of sensors, propulsion, or communication can lead to unsafe control actions that escalate into serious losses. Park and Kim [10] applied system-theoretic process analysis (STPA) to MASSs under such conditions and identified 31 unsafe control actions (UCAs) and 92 detailed loss scenarios (LSs). Their work highlights the role of ROs in compensating for degraded autonomy, especially in situations where the detectability of failures is low.
Other research points to the distinct skills required of ROs. Kuntasa and Lirn [8] outlined a competency model that emphasizes situational awareness, workload management, and emergency response. Their findings show that traditional competence tables cannot be applied directly to remote operations. Legal scholars add that ROs do not fit neatly into existing categories of “crew” or “master,” which raises unresolved questions of liability and accountability [9].
Taken together, these studies show that safe deployment of MASSs depends on more than reliable technology. It also requires legal clarity, institutional readiness, and recognition of RO competence. Building on the STPA-based identification of loss scenarios [10], this study applies structured expert judgment through the Delphi method. The outputs are then linked to the STCW convention to argue that RO competence, watchkeeping, and fatigue management should be explicitly incorporated into international maritime law. The overall aim is to provide both empirical and normative grounds for embedding ROs within the STCW framework as a foundation for safe MASS operations.
The rest of this paper is organized as follows.
Section 2 reviews earlier research on MASS safety, regulation, and human factors.
Section 3 explains the framework that combines STPA and Delphi.
Section 4 presents the survey results and develops a roadmap.
Section 5 draws conclusions and points to implications for training and regulation.
This study is guided by the following research questions:
What are the critical loss scenarios (LSs) for MASSs in degraded states identified through STPA?
How can expert consensus (Delphi) help prioritize these risks?
In what ways can these risks inform training and regulatory frameworks under the STCW convention?
How do human–AI interactions affect MASS safety management?
5. Conclusions
This study examined the safety and regulatory implications of MASSs by integrating a system-theoretic safety analysis with structured expert judgment. Beginning with a comprehensive STPA, the research established a dataset of 92 LSs reflecting degraded operational states of MASSs, ranging from propulsion and navigation failures to human–machine interaction risks. These scenarios provided a structured and traceable basis for expert evaluation, ensuring that the risk inventory was anchored in system-level control logic, rather than anecdotal incidents.
The Delphi process was then applied to elicit and refine expert judgment in areas where empirical accident statistics are scarce. Across two rounds, the 20-member panel, drawn from academia, industry, seafaring professionals, and regulatory bodies, rated each LS in terms of severity, likelihood, and detectability. Round 1 showed the expected divergence of opinion, with a Kendall’s W of 0.07, reflecting heterogeneous perspectives. In Round 2, structured feedback and statistical summaries raised concordance to W = 0.693, a level commonly interpreted as moderate-to-strong agreement in Delphi studies. This result highlights the utility of iterative expert elicitation in domains of high uncertainty and demonstrates that even diverse panels can achieve robust alignment when guided by transparent procedures.
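Kendall’s W, the concordance statistic reported for both rounds, can be computed directly from the experts’ rankings. The following is a minimal sketch, assuming each expert supplies a complete ranking of the same scenarios with no ties; the function name kendalls_w and the toy data are illustrative assumptions, not the study’s tooling or results.

```python
from typing import Sequence


def kendalls_w(rankings: Sequence[Sequence[float]]) -> float:
    """Kendall's coefficient of concordance W, without tie correction.

    rankings[j][i] is the rank (1..n, no ties) that expert j gives to
    scenario i. W = 12 * S / (m**2 * (n**3 - n)), where S is the sum of
    squared deviations of each scenario's rank total from the mean total.
    """
    m = len(rankings)        # number of experts
    n = len(rankings[0])     # number of scenarios being ranked
    if any(len(r) != n for r in rankings):
        raise ValueError("every expert must rank the same set of scenarios")
    # Rank total for each scenario, summed over all experts.
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    # The mean rank total is always m * (n + 1) / 2.
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    # Toy rankings (not the study's data).
    identical = [[1, 2, 3, 4]] * 3              # three experts, same order
    print(kendalls_w(identical))                # 1.0 (perfect agreement)
    mixed = [[1, 2, 3], [1, 3, 2], [2, 1, 3]]
    print(round(kendalls_w(mixed), 3))          # 0.444
```

W runs from 0 (no agreement) to 1 (identical rankings). If ratings on a short scale are converted to ranks, ties are unavoidable; the usual correction subtracts m times the sum of (t³ − t) over all tied groups from the denominator.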
The prioritization process produced both stability and clarity. High-severity and high-likelihood scenarios, such as propulsion control failures (LS-009, LS-066) and critical communication breakdowns (LS-030, LS-062), consistently occupied the top ranks. These scenarios reflect hazards with immediate catastrophic potential, where operator intervention may be ineffective without redundant safeguards. Equally important, the results highlighted scenarios with high severity but low detectability, such as cognitive overload of ROs and delayed alarm recognition (LS-037, LS-083). Although these risks were sometimes rated lower in likelihood, experts converged on their salience due to the challenges of monitoring human performance in real time. Conversely, scenarios with low severity and high dispersion (e.g., LS-059, LS-089) dropped to the bottom, confirming stable expert rejection. Taken together, these patterns underscore the multidimensional nature of risk: MASS safety cannot be understood solely in terms of technical redundancy, but must also address human–machine integration and cognitive resilience.
Beyond prioritization, the findings carry broader implications for maritime regulation and training. The exclusion of ROs from the current STCW convention creates a regulatory vacuum in competence, watchkeeping, and fatigue management. The Delphi results strengthen the argument that these omissions are not marginal, but fundamental to MASS safety. The Top 10 scenarios emphasize precisely those domains—propulsion, communication, and human factors—where RO competence is most critical. Integrating such competence requirements into STCW tables would align international law with the operational realities of MASSs and provide a basis for accountability in remote control environments.
At the methodological level, this study contributes by modifying the traditional RPN formula. By re-weighting detectability as an adjustment factor, rather than a simple multiplier, the adopted risk index reduces rank reversal and provides more stable prioritization across diverse scenarios. This refinement, although heuristic, aligns with the recommendations of IEC 60812:2018 and recent scholarship advocating for action-based, rather than purely numeric, prioritization. The result is a risk ranking system that is transparent, sensitive to expert input, and more reflective of the operational challenges unique to MASSs.
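The exact form of the adjusted index is not stated in this section, so the sketch below is only a hypothetical illustration of treating detectability as a bounded adjustment factor rather than a full multiplier. The 1–5 rating scales, the weight alpha = 0.5, and the names Scenario, classic_rpn, and adjusted_index are all assumptions. It shows how the classic product of severity, likelihood, and detectability can give identical scores to a rare catastrophic scenario and a frequent minor one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Ratings for one loss scenario on hypothetical 1-5 scales.

    As in conventional FMEA, a higher detectability score means the
    failure is HARDER to detect.
    """
    sid: str
    severity: int
    likelihood: int
    detectability: int


def classic_rpn(s: Scenario) -> int:
    """Traditional risk priority number: severity x likelihood x detectability."""
    return s.severity * s.likelihood * s.detectability


def adjusted_index(s: Scenario, alpha: float = 0.5, d_max: int = 5) -> float:
    """Hypothetical index: severity x likelihood x a bounded detectability factor.

    The factor rises linearly from 1.0 (D = 1, easily detected) to
    1 + alpha (D = d_max, hardest to detect), so detectability can raise
    the base score by at most alpha * 100 percent instead of multiplying
    it by up to d_max.
    """
    factor = 1.0 + alpha * (s.detectability - 1) / (d_max - 1)
    return s.severity * s.likelihood * factor


if __name__ == "__main__":
    # Toy ratings, not taken from the study.
    rare_catastrophic = Scenario("LS-A", severity=5, likelihood=2, detectability=1)
    frequent_minor = Scenario("LS-B", severity=1, likelihood=5, detectability=2)
    for sc in (rare_catastrophic, frequent_minor):
        print(sc.sid, classic_rpn(sc), adjusted_index(sc))
    # LS-A 10 10.0
    # LS-B 10 5.625
```

Under the classic product both toy scenarios score 10; with the bounded factor, the severity × likelihood ordering stays dominant and detectability only shifts scores within a limited band. This is one possible reading of "adjustment factor", not a reproduction of the index used in the study.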
Nonetheless, this study also recognizes its boundaries. Expert judgment, while systematic, remains dependent on panel composition, disciplinary balance, and operational background. Different mixes of regulators, operators, or technical specialists may yield different scenario rankings. To mitigate this, the panel was deliberately diversified and strict consensus criteria were applied, yet further replication with larger or differently composed panels would be valuable. Additionally, the Delphi method captures relative prioritization, but does not reveal how risks evolve dynamically over time or interact through feedback loops.
Future research should build on this work using quantitative modeling. The Top 10 scenarios can be applied in structured models to examine thresholds, escalation dynamics, and resilience under different conditions of vessel count, communication latency, and operator workload. Such modeling would also let regulators and industry stakeholders test policy options such as shift length, communication redundancy, and alarm interface design before adoption. The Delphi study provides a prioritized and evidence-based input set, while quantitative modeling offers a way to measure interactions and identify leverage points.
In conclusion, this study demonstrates that safe deployment of MASSs requires a combined approach: system-theoretic analysis to identify hazards, expert judgment to prioritize risks, and regulatory integration to close institutional gaps. The results emphasize that Remote Operators are not peripheral actors, but central agents whose competence, workload management, and accountability must be codified in international law. By aligning technical risk analysis with institutional reform, the research provides both empirical evidence and normative direction. It argues that embedding ROs within the STCW framework is not optional, but necessary if MASSs are to be operated safely, legally, and responsibly in global waters. Regarding limitations and outlook, the findings rely on expert judgment and may be sensitive to panel composition, and validation in real-world MASS operations remains necessary. Future work should incorporate quantitative simulation and AI-informed detectability to complement consensus-based prioritization.
The Delphi process itself also deserves emphasis as a methodological contribution. Twenty experts representing academia, industry, seafaring professionals, and regulatory authorities participated in two structured rounds, evaluating all 92 loss scenarios across severity, likelihood, and detectability. Round 1 revealed minimal concordance (Kendall’s W = 0.07), while structured feedback and statistical summaries in Round 2 elevated agreement to substantial levels (W = 0.693, p < 0.001). This transparent design ensured that convergence was not incidental, but the result of informed reconsideration by a diverse panel. The stability of the Top 10 list across propulsion, communication, sensing, integrated control, and human–machine interaction domains demonstrates that even heterogeneous groups can achieve robust alignment when guided by iterative and evidence-based procedures. These features underscore that the findings are not merely a set of subjective expert views, but a reproducible consensus baseline for regulatory debate and training development.
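The reported significance level can be checked with the standard large-sample test for Kendall’s W, in which m(n − 1)W is approximately chi-square distributed with n − 1 degrees of freedom. The sketch below assumes that W = 0.693 was computed over all 92 loss scenarios ranked by the 20 experts; the function names are hypothetical, and the tail probability uses the Wilson–Hilferty normal approximation so that only the standard library is needed.

```python
from __future__ import annotations

import math


def kendalls_w_chi2(w: float, m: int, n: int) -> tuple[float, int]:
    """Large-sample significance statistic for Kendall's W.

    With m raters and n ranked items, m * (n - 1) * W is approximately
    chi-square distributed with n - 1 degrees of freedom under the null
    hypothesis of no agreement.
    """
    return m * (n - 1) * w, n - 1


def chi2_sf_wilson_hilferty(x: float, df: int) -> float:
    """Approximate P(X >= x) for X ~ chi-square(df).

    Uses the Wilson-Hilferty cube-root transformation to a standard normal
    variable; it is a close approximation for moderate-to-large df, not an
    exact tail probability.
    """
    k = 2 / (9 * df)
    z = ((x / df) ** (1 / 3) - (1 - k)) / math.sqrt(k)
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Assumes W = 0.693 was computed over 92 scenarios ranked by 20 experts.
    stat, df = kendalls_w_chi2(0.693, m=20, n=92)
    p = chi2_sf_wilson_hilferty(stat, df)
    print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.1e}")
```

Under that assumption the statistic is about 1261 on 91 degrees of freedom, far beyond any conventional critical value, which is consistent with p < 0.001. Where SciPy is available, scipy.stats.chi2.sf gives the exact tail probability in place of the approximation.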
This study has some limitations that must be noted. The Delphi process depends on expert judgment, so the outcome may shift if the panel composition changes. At present, there are no large datasets of MASS accidents, which restricts validation. The adjusted index used in this study is a practical tool, rather than a full causal model. Future work should test these results through simulation and real trials during the experience-building phase (EBP) of the IMO MASS Code. It should also explore how AI-based detection and sensor fusion perform in degraded conditions. These steps will provide stronger evidence for integrating RO competence into international regulation.