Future Directions for Human-Centered Transparent Systems for Engine Room Monitoring in Shore Control Centers

Many autonomous ship projects reflect the increasing interest in incorporating the concept of autonomy into the maritime transportation sector. However, autonomy is not a silver bullet, as exemplified by many past incidents involving human and machine interaction; rather, it introduces new Human Factor (HF) challenges. These challenges are especially critical for Engine Room Monitoring (ERM) in Shore Control Centres (SCCs) due to the system's complexity and the absence of human senses in the decision-making process. A transparent system, which provides the rationale behind its suggestions, is one potential solution. However, diverse implementations of transparency schemes have resulted in prevalent inconsistencies in their reported effects. This literature review investigates 17 transparency studies published over the last eight years to identify (a) different approaches to developing transparent systems, (b) the effects of transparency on key HFs, and (c) the effects of information presentation methods and uncertainty information. The findings suggest that the explicit presentation of information could strengthen the benefits of the transparent system and could be promising for performance improvements in ERM tasks in the SCC.


Maritime Autonomous Surface Ships
Recent technological breakthroughs, including artificial intelligence, blockchain, and the internet of things, have accelerated a trend towards the transformation of conventional industries into smarter, safer, and more efficient business models across the medical, aviation, civil, and maritime sectors. The transformation in the maritime domain is often referred to as Maritime 4.0, a term derived from Industry 4.0, the German government's initiative to pursue digital transformation in the manufacturing sector. Maritime 4.0 includes transformations in vessel design, construction, operations, and shipping [1]. This digitalization is envisioned as a disruptive opportunity, requiring a high level of cooperation between regulatory bodies, industrial partners, labor organizations and other stakeholders [2]. One of its endeavors is autonomous ship development, in which the estimated investment reached USD 5.8 billion in 2020 and is expected to increase to USD 14.2 billion by 2030 [3]. In response to the increasing interest in autonomous vessels, the International Maritime Organization (IMO) identified four degrees of autonomy as a guideline for Maritime Autonomous Surface Ships (MASS), shown in Table 1 [4]. A notable keyword in this classification is remote operation, specified in Degrees Two and Three, indicating that the role of the SCC will be significant.

Table 1. Degrees of autonomy for MASS as identified by the IMO.
Degree One: Ship with automated processes and decision support
Degree Two: Remotely controlled ship with seafarers on board
Degree Three: Remotely controlled ship without seafarers on board
Degree Four: Fully autonomous ship being able to make decisions by itself

Shore Control Centre
The SCC is envisaged as a central control tower, where remote operators are expected to receive information about multiple vessels via satellites and engage in various tasks, such as monitoring, voyage and maintenance planning, and legal and organizational arrangements onshore. The SCC scheme offers economic, environmental and social benefits, primarily through the removal of the human presence from ships. A ship will no longer need accommodation facilities (e.g., refrigeration and sanitary systems), lowering construction and maintenance costs. It will be lighter and faster, enabling slow steaming to reduce carbon emissions. The sense of isolation, motion sickness and sleep deprivation, which are possible causes of a significant future shortfall in seafarers, will no longer be a challenge facing the industry [5]. The SCC will also be beneficial in mitigating human errors, which are associated with psychological, workplace, and external factors and are involved in 75 to 96% of marine accidents in conventional navigation systems [6,7]. Labor costs, accounting for up to 30% of the total operation cost, can be minimized, provided that remote operators successfully monitor multiple vessels at once [8]. These potential benefits have encouraged many countries, companies, and universities globally to initiate MASS development with the SCC scheme. Two past projects, MUNIN (Maritime Unmanned Navigation through Intelligence in Networks) and AAWA (The Advanced Autonomous Waterborne Applications), investigated the feasibility of remotely operated vessels. Both identified the recovery of situational awareness, which is degraded by the loss of ship sense, as a major challenge. This implies that the simple replication of current ship monitoring systems is not a favorable option [9]. This insight has been taken up in current MASS projects, and some of the large projects are described in Table 2.
When these projects are viewed from the perspective of traditional shipping operations, the focal development areas can be divided into two primary sectors, navigation-relevant and engine room-relevant technologies, which show different trends in supporting remote operators. For example, systems that visualize environmental data (e.g., weather, obstacles and collision levels) and ship states (roll and pitch motions) to enhance situation awareness are under development in the KASS (situational awareness system) and AUTOSHIP (intelligent awareness system) projects. In engine room operation, however, automation appears to be at the center, serving as a decision-making tool rather than as a decision support system involving human operators.
It is apparent that data-driven technologies, such as artificial intelligence, would be highly beneficial for the future of the maritime industry. However, a purely technological approach may not guarantee operational safety, as shown by past accidents in the aviation industry. The two Boeing 737 MAX disasters in 2018 and 2019, caused by faulty software that a digital twin (DT) failed to detect, showed that the sole use of advanced technologies is not an ultimate solution. Ibrion et al. [13] reviewed these accidents and suggested a multidisciplinary approach to mitigate the risk of DT implementation in the maritime industry, such as the integration of expertise and the role of experts. The problems posed by highly automated processes were discussed by [14], who introduced the ironies of automation. One of the ironies is manual skill deterioration, which can devastate takeover performance when automation fails. This underlines the importance of situational awareness, which is very likely to be poor given the characteristics of ERM.

Engine Room Monitoring and Human Factor Challenges
The conventional engine room system is built and optimized for humans acting as direct on-site supervisors. This means that current sensor systems alone do not cover the whole area and are built to inform operators of the general functionality of the machinery system rather than indicating the exact cause of the problem. For example, bilge alarms are only installed at certain locations with a high potential for water leakage, such as the ballast pump room, and only report the presence of liquid. They do not inform remote operators as to the type of liquid and the cause of the leakage, which are critical in the formulation of counter-measures. Another example is diesel engines, whose general functionality is commonly inferred from combustion status. It is challenging to clearly diagnose the cause of abnormalities with current sensor information (i.e., exhaust gas temperature), preventing remote operators from pre-planning what is to be replaced or repaired in the next port. The lack of bodily feelings is another characteristic that limits the operator's ability to understand the situation [15]. This includes the loss of visual, haptic and auditory information, which many operators utilize to understand the status of equipment and process [16]. These two factors, the limitations of current sensor systems and the lack of bodily feelings, require more information to be produced from each vessel to decompose the problem and compensate for the loss of bodily information. With the current industrial approach towards remote engine room operation, it may be possible to reduce the amount of information that remote operators should handle and to mitigate human errors, such as performance degradation due to motion sickness. However, the approach will induce new forms of human error, called HF challenges [17].
As concluded in the MUNIN project, it is still unknown whether one operator can control multiple ships (six were used in the project) with adequate situational awareness and workload [18]. Man et al. [19] discovered difficulties in developing sufficient situational awareness, as well as decision-making latencies, when SCC operators received information through ordinary navigation devices such as radar. In addition, Man et al. evaluated an SCC prototype system in which each operator monitored six vessels and communicated with a supervisor (coordinator), captains (decision makers), and engineers (technical consultants). One of the significant findings was that the captains exhibited the worst situation awareness. These HF challenges, namely decision-making latency or response time, workload and situation awareness, are critical in determining crew configurations in the SCC [20]. Trust is another key HF challenge that is highly relevant to the misuse and disuse of automation and is expected to be critical for ERM systems embedded with data-driven technologies. The question of how to calibrate an operator's trust against system trustworthiness should be addressed to support both the utilization and verification of automation, consequently leading to high performance. In response to these challenges, one potential solution is a transparent system that could ameliorate the decrease in situational awareness resulting from a high level of automation [21].

Transparency
Transparency is potentially vital in highly automated control room environments, where it can provide an understandable overview of a complex work domain. Roundtree et al. [22] identified explainability, performance, usability, and trust as direct factors in transparency. Other researchers have addressed transparency as the observability and predictability of system behaviour for situation awareness recovery [23]. For example, Battiste et al. [24] established a preliminary model of human and machine teaming (HAT) for flight control. In their study, they found that as soon as ground operators engaged with new tasks, they quickly gained situational awareness through the environmental and systems data, including reasoning information from the suggestions or actions of the machine agent. They further described that prompt situational awareness recovery is likely to reduce the need for continuous monitoring of individual aircraft. This can be one of the critical aspects of successful SCC operation, as it will not be possible to continuously monitor every system of multiple vessels under the control of a single operator [25].
The most common way of implementing the transparent system is to provide the information corresponding to the three stages of situational awareness [26]: perception, comprehension, and projection of the environment. Chen et al. [27] further shaped Endsley's concept and developed the Situation Awareness-Based Agent Transparency (SAT) model by specifying which forms of information align with these stages. For example, only basic environmental information is provided at the lowest transparency level, accompanied by reasoning information at the medium level, and predicted outcomes, often with uncertainty information, at the highest level. Examples of uncertainty information include time projections [28], uncertain zones in the route [29] and heart rate, implying a sensor visibility range in fog [30].
Many researchers have also tried to expand the field of transparency. Skraaning and Jamieson [31] employed verbal and diagnostic feedback on system behaviour for nuclear system monitoring, focusing on automation observability. Dikmen et al. [32] implemented an automation support system for a target identification task that informs operators of its limitations (i.e., factors not considered in its suggestions). Panganiban et al. [33] introduced benevolent transparency, in which the automation exhibits social support. Other interesting approaches include the presentation of the agent's strategies during a collaborative game [34], of both the robot's decision-making process and its understanding of the human's decision-making process [35], and of a normalcy exemplar corresponding to the detected anomaly [36].
The effects of transparency on the key HFs (i.e., trust, workload, situational awareness, response time, and performance) have been studied in many industries, but mostly in the aviation and military domains, in which a strong similarity exists between seafarers and flight pilots [37]. However, the effects of transparency on key HFs appear to vary across empirical studies depending on how the concept of transparency is interpreted and employed, especially with regard to the availability of uncertainty information and the presentation of reasoning information. Therefore, this paper aims to investigate different transparency models and their impact on the key HFs that are envisioned as critical for ERM and operation in the SCC.

Materials and Methods
Our meta-analysis was developed by following the guidelines by [38]. A search period of the past eight years, from 2014 to 2021, was targeted, considering the nature of the fast-evolving HAT research domain. The initial search was performed in May 2021 to find publications that met the search term in the databases, which included Scopus, Sage, Web of Science, and ACM DL. The following search term was chosen to include as many studies as possible that investigated HF challenges in the HAT environment, to which the transparent system is a potential solution.

• Search Terms: (Human-Machine OR Human-Agent OR Human-Automation) AND (Supervis* OR Team*)

Consequently, 2603 articles were identified from these four databases. These articles included irrelevant and only marginally relevant papers, such as papers on information technology algorithm development. The first screening, of the abstracts, was conducted to find articles involving system management, monitoring, and interaction regarding HF challenges in general. As a result, 657 papers were collected for a full-text review. The second screening was conducted to identify empirical studies that investigated the effects of transparency in the HAT domain and met the following inclusion criteria.

• The article should have experiments conducted under the HAT scheme, with at least one human operator and one agent.
• The article should investigate at least one key HF at varying levels of transparency regardless of its transparency type.
• The article should be written in English or translated into English.
Consequently, seventeen papers met the inclusion criteria for in-depth investigation, and eighteen experiments were investigated. The flow of information, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), is shown in Figure 1. The details of the studies included in the meta-analysis are listed in Table 3. For the data analysis, this study was inspired by the work of [49]. As a first step, we listed all the HFs used as dependent variables and selected key factors from more than five samples to describe a general trend in transparency effects over the last eight years. Next, we ranked the levels of transparency from one for the lowest level (perception) to three or four for the highest level (projection), according to Endsley's three stages of situational awareness. Some studies separated the highest level of transparency into two levels, future outcome and uncertainty, and provided uncertainty information at the highest transparency level. In cases where the studies developed their own transparency levels based on different approaches (e.g., level of information), the levels described in their studies were used for our analysis. There were no tied ranks in the transparency levels. For key HFs, different rankings were assigned when there was a significant difference between levels (p < 0.05); otherwise, they were ranked the same (tied ranks). Most studies (n = 15) employed an ANOVA test or its variations (ANCOVA and MANOVA), while Akash et al. [47] and Kunze et al. [30] employed a likelihood ratio test and Satterthwaite's method, respectively.
Ranking examples for a study comparing the effects of three transparency levels (low, medium, and high) on trust are provided in Figure 2. In Example B, trust was significantly lower at low transparency than at the other two levels, and there was no significant difference in trust between medium and high transparency. Therefore, the tied rank (2.5) was assigned to trust at medium and high transparency. In Example C, trust at the low transparency level was significantly lower than that at the medium transparency level. However, there was no significant difference between the medium and high levels or between the low and high levels. In this case, we assigned tied ranks to trust at the low and high transparency levels, since trust at the high transparency level was still lower than that at the medium transparency level.

Figure 2. A methodology used to rank raw data. In Example (A), low transparency (rank: 1) is matched with trust (rank: 1.5), medium transparency (rank: 2) is matched with trust (rank: 3) and, lastly, high transparency (rank: 3) is matched with trust (rank: 1.5). As there is no significant difference in trust at low and high transparency, and trust is significantly higher at medium than at the other levels, the tied rank (1.5) is assigned to trust at low and high transparency. The other two examples (B,C) are explained above.
We employed Kendall's tau-b correlation coefficient for each study to investigate the strength and direction of the association between an independent variable (transparency level) and dependent variables (key HFs). The data could be expressed in rank order (ordinal variables) and featured tied ranks. A positive value indicated a positive association, and zero indicated no association. Kendall's tau-b coefficient is expressed as

$$\tau_B = \frac{n_c - n_d}{\sqrt{(n_0 - n_1)(n_0 - n_2)}} \qquad (1)$$

where

$$n_0 = \frac{n(n - 1)}{2} \qquad (2)$$

Here, n is the number of paired observations, n_c and n_d are the numbers of concordant and discordant pairs, and n_1 and n_2 are the numbers of pairs tied on the first and second variable, respectively (the standard definition of tau-b). As there are no standard rules to interpret the strength of the association, the coefficients were evaluated according to the guideline proposed by [50]. A similar guideline is also suggested by [51]. However, it should be noted that this guideline is only one of several ways of interpreting the strength of tau-b correlations.
• Less than ±0.10: very weak
• ±0.10 to ±0.19: weak
• ±0.20 to ±0.29: moderate
• ±0.30 or above: strong

When the dependent variables were measured by several means or segmented into several groups (e.g., level 1 situational awareness and level 2 situation awareness), the relevant coefficients were aggregated and averaged to describe an overall trend. The overall tau-b coefficients were tested with one-tailed t tests to find whether they were significantly different from zero. One-tailed t tests were employed in consideration of the small sample size. As the last step, we further classified the resultant coefficients in terms of information presentation methods (implicit or explicit) and transparency conditions (availability of uncertainty information) to evaluate whether these influenced the general trend. However, t tests were not used for these subgroups due to the insufficient number of samples (e.g., n = 1 for situation awareness in explicit transparency).

In the implicit condition, all information was presented on the screen, requiring operators to search, combine, and understand critical information. In the explicit condition, by contrast, a clear description of the automation's rationale was provided to operators along with other information. For example, Lyons et al. [40] provided a clear statement (i.e., the landing crosswind is too high for a safe landing) when the automation indicated that the runway was not acceptable, instead of presenting the crosswind speed somewhere on the screen.
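The analysis steps described in this section (tau-b with tied ranks, the strength guideline, and a one-tailed t test of the coefficients against zero) can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the review: the function names are hypothetical, the rank of 1 for low transparency in Example B is inferred from the description of Figure 2, the test direction (towards the sign of the observed mean) is an assumption, and the coefficients passed to the t test are made-up values chosen so that the result can be checked by hand.

```python
from itertools import combinations
from math import gamma, pi, sqrt
from statistics import mean, stdev


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length rank sequences (ties allowed)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        tied_x += dx == 0  # pair tied on the first variable
        tied_y += dy == 0  # pair tied on the second variable
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = sqrt((n0 - tied_x) * (n0 - tied_y))
    # Undefined when one variable is constant across all observations.
    return (concordant - discordant) / denom if denom else float("nan")


def strength(tau):
    """Label |tau| with the guideline thresholds quoted in the text."""
    size = abs(tau)
    if size < 0.10:
        return "very weak"
    if size < 0.20:
        return "weak"
    if size < 0.30:
        return "moderate"
    return "strong"


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + t * t / df) ** (-(df + 1) / 2)


def one_tailed_t_test(taus, steps=2000):
    """One-sample t test of mean(taus) against zero.

    Returns (t, p), where p is the tail area beyond |t| on one side,
    i.e. a one-tailed test in the direction of the observed mean
    (an assumption; the review does not state the direction).
    """
    if len(taus) < 2:
        raise ValueError("need at least two coefficients")
    se = stdev(taus) / sqrt(len(taus))
    if se == 0:
        raise ValueError("coefficients are identical; t is undefined")
    t, df = mean(taus) / se, len(taus) - 1
    # Simpson's rule (steps must be even) for the area between 0 and |t|.
    h = abs(t) / steps
    area = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return t, 0.5 - area * h / 3


levels = [1, 2, 3]  # low, medium, high transparency
example_a = kendall_tau_b(levels, [1.5, 3, 1.5])  # Figure 2, Example A
example_b = kendall_tau_b(levels, [1, 2.5, 2.5])  # Figure 2, Example B
print(example_a, strength(example_a))             # 0.0 very weak
print(round(example_b, 3), strength(example_b))   # 0.816 strong

# Hypothetical coefficients giving t = 1 with 1 degree of freedom.
t_stat, p_value = one_tailed_t_test([0.0, 1.0])
print(round(t_stat, 3), round(p_value, 3))        # 1.0 0.25
```

Averaging the coefficients of one study before testing, as described above, corresponds to calling `mean` on that study's values and passing the per-study means to `one_tailed_t_test`. The numerical integration is used only to keep the sketch free of third-party dependencies; a statistics library would normally supply the p-value.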

Results
We investigated seventeen studies that contained eighteen experiments to identify (a) different approaches to developing transparent systems in the last eight years, (b) the general effect of transparency on key HFs, and (c) the effects of information presentation methods and the availability of uncertainty information on key HFs.

Overview of Transparency Studies
It was found that the SAT model (n = 9) was widely used for implementing the concept of transparency in the HAT environment. While most of the studies in our investigation used the conventional SAT model, a few attempted to expand it further, such as through the agent-and-team SAT [35] or the in-depth SAT model [43]. Other approaches to developing transparent systems are shown in Table 4. The HFs investigated in the HAT research area from 2014 to 2021 are described in Table 5.

Trust
The general relationship between trust and transparency level was studied in 13 experiments, and a moderate positive association (0.20 ≤ τ_B ≤ 0.29) was found that was significantly different from zero (p = 0.014). No studies indicated a negative relationship. Interestingly, the associations in the two information presentation groups, implicit transparency without a clear description and explicit transparency with a clear description, differed noticeably. A strong positive association (τ_B ≥ 0.30) was observed in the explicit transparency studies, while a weak positive association (0.10 ≤ τ_B ≤ 0.19) was found in the implicit transparency studies. It was also found that five out of seven studies reported a decrease in trust level when uncertainty information was provided at the highest transparency level.

Performance
Overall, a strong positive association (τ_B = 0.474) between performance and transparency level that was significantly different from zero (p = 0.004) was found in 11 experiments, while one study reported a moderate negative relationship. The greatest difference was observed between the two information presentation groups. The association in explicit transparency studies (τ_B = 0.863) was more than three times greater than in implicit transparency studies (τ_B = 0.250). It was also found that the presentation of uncertainty information did not exert negative influences on performance.

Workload
A moderate negative association (τ_B = −0.222) between workload and transparency level was observed, indicating a decrease in workload with increased transparency level. However, it was not significantly different from zero (p = 0.085), and this result may not reflect a clear association, since seven out of nine studies demonstrated no association. This association became stronger in explicit transparency studies (τ_B = −0.333), while it became weaker in implicit transparency studies (τ_B = −0.167). Lastly, only one study reported an increase in workload when uncertainty information was provided.

Response Time
A weak positive association (τ_B = 0.117) between response time and transparency level was observed in seven experiments. However, it was not significantly different from zero (p = 0.368). The association became moderate and positive (τ_B = 0.250) in implicit transparency studies, while a very weak negative association (|τ_B| < 0.10) was observed in explicit transparency studies. Only one study reported an increase in response time upon the presentation of uncertainty information.

Situation Awareness
A very weak negative association (τ_B = −0.072) between situational awareness and transparency level was observed in five experiments, and it was not significantly different from zero (p = 0.274). This association varied little between the two information presentation types: a weak association for implicit transparency studies and no association for explicit transparency studies. Two studies reported a negative influence of uncertainty information on situation awareness, as shown in Table 6.

Table 6. The overall associations (τ_B coefficients) between key HFs and transparency levels, and the statistics of the one-tailed t tests. The overall associations under the implicit and explicit information presentation conditions are denoted as I and E, respectively. Lastly, the number of experiments that reported positive or negative effects of uncertainty information on each key HF is shown. For example, two experiments reported positive effects on trust. Positive effects indicate an increase in trust, performance, and situation awareness, or a decrease in workload and response time.

Discussion
The general transparency effects found in the 18 experiments were mostly in line with the common belief that the presentation of reasoning information plays a key role in human and machine interaction. The analysis shows that transparent systems could mitigate HF challenges to different degrees, from weak to strong, and that their positive effects could be amplified by providing a clear description to human operators. The rationale for these effects likely stems from the process of human reasoning.
The human reasoning process in complex work environments is similar to solving a puzzle, in which information elements are puzzle pieces. The transparent system provides the most relevant pieces to operators, helping them to form the right shape quickly. However, excessive information creates ambiguity and difficulties in interpreting the machine agent's action, which still applies in transparent systems. One of the main issues is that there may be multiple shapes (interpretations) derived from the same information elements at different amplitudes (signal strengths). This is especially likely in ERM tasks in the SCC, which are mostly indeterminate [52].
For instance, a transparent system may indicate a sudden decrease in the boiler's water level to the ERM operators in the SCC with the intention of warning of a potential feed water pump breakage. However, the ERM operators may interpret this information as a water tube breakage and decide to stop the boiler. This is a misinterpretation of intentions between the system and its operators, which can lead to catastrophic disasters. One such accident was the oil spill in the Kalamazoo River in Michigan, United States, in which operators decided to pump more oil instead of isolating the system after receiving the low-pressure pipeline alarm [53]. Furthermore, when operators allocate more attention to interpreting information, less attention is paid to monitoring and updating information and supporting performance [54].
The presentation of possible interpretations and combinations of critical information clues may help direct the operator to verify its suggestions or analysis, minimizing the excessive attention allocated to data interpretation. By contrast, implicit transparency displaying key information elements on the screen may still require the operator to analyze information elements and decipher possible interpretations. This difference can be explained in terms of short-answer and multiple-choice questions. The two-process theory explains that recall involves more processes such as the formulation of the relation between memories and retrieval strategies than recognition [55][56][57]. This is well understood in the education sector when evaluating students' performance in different formats. It is known that multiple-choice questions demand the recognition of the right answer while short-answer questions demand the recall of the right answer [58], and significantly more time is required for short-answer questions than multiple-choice questions [59]. Chan and Kennedy [60] reported that multiple options seem to help students to form answers. Furthermore, it is possible that the presentation of multiple options may serve as a trigger to retrieve long-term memories when considering thought as a product of images, sounds, ideas, and words [61,62]. As in multiple-choice questions, the presentation of possible interpretations may benefit operators by guiding them to focus on verification; that is, to recognize a correct interpretation instead of analyzing and forming their own interpretation, which is a recall of a right interpretation, as shown in Figure 3. The allocation of data interpretation functions is likely to yield stronger associations between HF (performance, workload, and trust) and transparency levels in explicit models. 
A clear description of the current status transforms key information into context-specific information and mitigates the chances of misinterpreting key information, increasing operators' performance, as reflected by the highest positive association between performance and transparency level in explicit models. The higher association may also be related to operators' preferred type of information display, which takes the form of a written description in explicit models. Wright et al. [63] reported that operators had a stronger tendency to opt for a simpler display (i.e., a text box) rather than a graph. Operators are likely to experience less workload, since the transparent system executes the part of the reasoning process that integrates key data. This is in line with the general notion that the more functions are executed by automation, the less workload is undertaken by operators [64]. The stronger positive association of explicit models in the trust dimension may be relevant to how feedback is described [65]. When the interpretations are presented in a way similar to human logical reasoning, in which a series of data are integrated to deduce the situation, operators may tend to trust the system more. For example, Skraaning and Jamieson [31] provided feedback (i.e., a program is starting up due to A) and reported the highest positive association. Similarly, three written reasons for a suggestion were presented (i.e., A vehicle will arrive faster than B vehicle because A vehicle follows a direct flight path) in the study by [41], yielding the second-highest association between trust and transparency level. Furthermore, when a transparent system interprets critical information, it is likely to minimize response time and maximize situation awareness, as it mitigates the burden of being in charge of data analysis and the likelihood of misconstruing key information.
However, for explicit transparency studies, the results showed only a very weak association for response time and no association for situational awareness. This was likely due to the low number of samples (n = 2 for response time and n = 1 for situation awareness), and more empirical studies are required for a precise evaluation of the explicit transparency concept.
The presentation of uncertainty information at the highest transparency level, expressed as a percentage of risk, failure and success, has mixed effects on HFs. The effect was not notable for workload and response time, as only one study reported an effect for each. However, negative effects on trust level and situation awareness and positive effects on performance were observed. The decrease in trust level and situational awareness may have occurred because, in addition to the fact that uncertainty information raises doubts as to the system's capability, operators cannot tell how such a value is calculated, which prevents them from attaining a clear picture. The negative effects of uncertainty on trust were also discussed in the study by [66]. The positive effects of uncertainty information on performance may be related to how uncertainty information is presented. Kirschenbaum et al. [67] reported that the spatial representation of uncertainty information increased the performance of submarine officers who engaged with spatial tasks. In addition, the presentation of uncertainty information after operators' initial assessment was found to increase their performance from 20 to 45% by guiding them to reconsider their decision [68]. However, it should be noted that all the studies were based on simulation experiments, in which uncertainty information may not influence operator performance significantly, as an apparent solution was presented. In fact, the increase in performance may have been due to the increase in transparency level, which requires further investigation.

Conclusions
Many studies have been conducted on developing and investigating transparent systems to mitigate HF challenges in the HAT environment. However, it was found that there are many inconsistencies in the empirical results. This is likely due to the various implementations of the transparency scheme, and none of the studies investigated a transparent system for ERM tasks in the SCC. We investigated different transparency models and their impact on key HFs. We found that when the agent provided a clear description supporting its suggestions, there were stronger associations between transparency level and certain factors (performance, workload, and trust). We also examined the positive and negative effects of uncertainty information on performance, trust, and situational awareness and found that the negative influence is likely to be a problem of how uncertainty information is presented. We acknowledge that our findings are based on a relatively small sample size, especially when it comes to the effects of uncertainty information. The role of individual, task, and cultural characteristics in HF research areas was not considered. We propose that transparent system development via a human-centered design approach is the key to mitigating HF challenges in performing ERM tasks in the autonomous maritime era, and that it can start from an understanding of which information is required, for which decisions, and under which circumstances. Our future research will seek to identify which information maritime engineers use for which decisions, and under which circumstances, in order to develop an explicit transparent system for ERM in the SCC.