Human–Machine Interface for Remote Crane Operation: A Review

Abstract: Cranes are traditionally controlled by operators who are present on-site. While this operation mode is still common nowadays, a significant amount of progress has been made to move operators away from their cranes, so that they would not be exposed to hazardous situations that may occur in their workplace. Despite its apparent benefits, remote operation has a major challenge that does not exist in on-site operation, i.e., the amount of information that operators could receive remotely is more limited than what they could receive by being on-site. Since operators and their cranes are located separately, human–machine interface plays an important role in facilitating information exchange between operators and their machines. This article examines various kinds of human–machine interfaces for remote crane operation that have been proposed within the scientific community, discusses their possible benefits, and highlights opportunities for future research.


Introduction
Cranes are machinery primarily used for lifting and moving heavy loads from one place to another. As many industrial activities require the lifting of heavy loads, cranes can be found in various domains, such as manufacturing, construction, and maritime industries. Cranes also come in different forms and sizes depending on the environment where they are deployed and the weight of the load to be lifted. Some are installed in a fixed position, e.g., tower cranes, while others have wheels and can be moved around, e.g., gantry cranes.
Cranes are traditionally controlled by operators who are also present on-site. While this operation mode is still common nowadays, a significant amount of progress has been made to move operators away from their cranes [1]. The transition from on-site operation to remote operation (hereinafter referred to as "teleoperation") is mainly driven by safety concerns, since operators would not be exposed to hazardous situations that may occur around their machines [2]. Moreover, the design of crane cabins has also been reported to have ergonomic issues that would cause physical problems to operators over a long period of time [3,4]. Therefore, the ability to perform teleoperation is not only beneficial in case of accidents, but also for improving operators' wellbeing in general.
Despite its apparent benefits, teleoperation has a major challenge that does not exist in on-site operation. By being physically present on-site, operators can capture a rich amount of information directly through their senses. In the case of teleoperation, the amount of information that operators can receive is limited to what can be captured through sensors installed on cranes [5], what can be transmitted over the network in a timely manner [6], and what can be reasonably presented to operators [7]. On-site crane operation is already considered complex and challenging [8], and the limitations presented above make crane teleoperation even more challenging [9].
The major challenge of crane teleoperation opens up opportunities for research on how the human–machine interface (HMI) could assist crane operators in performing teleoperation productively and safely. The HMI plays an important role because it is the instrument for information exchange between operators and their machines [10]. An HMI for teleoperation is supposed to help operators observe the remote environment, make correct decisions, and provide necessary inputs, while minimizing cognitive and motor workload [11]. The shift from on-site crane operation towards crane teleoperation is still an ongoing process [12], and thus it is relevant to explore what kinds of HMIs have been proposed to address the major challenge of crane teleoperation. For that purpose, this article examines different kinds of HMIs for crane teleoperation that have been proposed within the scientific community, discusses the results reported in the scientific literature, and highlights future research opportunities.
The remainder of this article is divided into four sections. Section 2 describes how the publications were retrieved, filtered, and analyzed. Section 3 presents the HMIs that have been proposed within the scientific community and describes the results that other researchers have reported. Section 4 discusses open issues based on the reviewed publications and highlights opportunities for future research, while Section 5 concludes the study in this article.

Method
The process for searching and selecting relevant publications in this article was performed according to the PRISMA guideline [13], which requires authors to clearly specify their source databases, search strategies, the criteria used in the selection process, and the number of publications screened throughout the selection process. Scopus was selected as the database due to its extensive coverage of scientific publications from different disciplines [14]. The following search string was used to find relevant publications on Scopus, and the search was conducted across all fields and limited to publications written in English: ("remote" OR "teleoperation" OR "tele-operation") AND "crane" AND "interface". The search returned 2192 publications that fit the search string shown above. No time limitation was used in the search process, and thus everything published up to 15 February 2022 was considered. The next step was to remove publications that do not focus on cranes. The first filtering process was done manually by checking the publications using the following exclusion criteria:

• The term "crane" is used to refer to species of birds;
• The term "crane" appears in the authors' names or in the bibliography section only;
• The term "crane" appears in the body text, but it is mentioned in a passing manner. For example, the term is only mentioned once or twice in the body text.
The first filtering process provided 130 publications that do not meet any of the exclusion criteria listed above. Since this article focuses on HMI, the next step was to exclude publications that do not propose an HMI for crane teleoperation. This was done by examining the materials and methods section of each publication. To have a clear definition of what constitutes an HMI in this context, this article defines HMI as the medium that informs operators about the situation of remote cranes and their surroundings (see "Display" in Figure 1), as well as the medium that allows operators to give commands to remote cranes (see "Controller" in Figure 1). The second filtering process provided 41 publications.
After examining the methods section, it was found that some of the proposed HMIs would allow operators to control their cranes without physical contact, but operators are still required to be present very close to their cranes. Some examples of this kind of approach are found in [15–17], which respectively proposed the use of sticks with reflective materials, radio-frequency tags, and laser pointers as replacements for control pendants for controlling bridge cranes. Since this article focuses on HMIs that would allow operators to control their cranes from a separate location, the publications that proposed HMIs that still require operators to be present near their cranes were also excluded. The third filtering process provided 21 publications that were reviewed in detail. The PRISMA guideline [13] also requires explicit statements on which questions the review addresses. The detailed review focused on finding answers to the following questions:
1. What kind of HMI was proposed?
2. What is the purpose of the HMI?
3. For what type of crane was the HMI proposed?
4. Was the HMI evaluated with test users?
5. What were the findings from the evaluation with test users?
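The selection funnel described in this section can be tallied with a short script. This is only an illustrative sketch: the stage labels and the function name `screening_funnel` are hypothetical and not part of the PRISMA guideline, while the four counts are the ones reported above.

```python
# Publications retained after each screening stage, as reported in this section.
# The stage labels are paraphrased for illustration only.
STAGES = [
    ("Scopus search result", 2192),
    ("focus on cranes", 130),
    ("propose an HMI for crane teleoperation", 41),
    ("operator works from a separate location", 21),
]


def screening_funnel(stages):
    """Return (stage name, retained, excluded at this stage) for each stage."""
    rows = []
    previous = None
    for name, retained in stages:
        excluded = 0 if previous is None else previous - retained
        rows.append((name, retained, excluded))
        previous = retained
    return rows


funnel = screening_funnel(STAGES)
for name, retained, excluded in funnel:
    print(f"{name}: {retained} retained, {excluded} excluded")
```

Reporting the number excluded at each stage, and not only the number retained, makes it easier to check that the counts are consistent from one filtering step to the next.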

Results
As shown in Table 1, teleoperation has been investigated for various types of cranes, including bridge cranes (seven publications), tower cranes (five publications), gantry cranes (four publications), all-terrain cranes (three publications), deck cranes (two publications), and loader cranes (two publications). The various types of cranes also represent diverse industrial settings. Bridge cranes are often used in factories and warehouses, while tower cranes and all-terrain cranes are usually used on construction sites. Gantry cranes and deck cranes are typically used for handling cargo for sea shipping, while loader cranes can be found in different kinds of worksites, as they can be used for lifting various kinds of goods. Table 1 also shows different kinds of HMIs that have been proposed for operating cranes remotely. Although the kinds of HMIs vary, they were mainly proposed for three purposes: general teleoperation (six publications), load sway reduction (six publications), and collision prevention (nine publications). General teleoperation refers to the HMIs that were mainly proposed to allow operators to observe the remote environment and control their cranes remotely. Load sway reduction refers to the HMIs that were specifically proposed for assisting operators in handling load sway that could happen due to crane movement and weather conditions. Finally, collision prevention refers to the HMIs that were specifically proposed for avoiding collisions between the crane or the lifted load and surrounding objects.
Although the proposed HMIs could be classified into three main purposes, the approaches to achieve those purposes differ from one study to another. The following subsections describe the different kinds of HMIs that have been proposed in the reviewed publications. Some of the reviewed publications also report evaluations with test users, while the others are limited to technical evaluations only.
Table 1 footnote: 1 The profile of the participants is not specified in the publications.

Graphical User Interfaces for Performing Teleoperation
The HMIs in this category represent the graphical user interfaces (GUIs) that operators could use to observe the remote environment and control their cranes remotely. There are four GUIs presented in this subsection, where three GUIs were designed to be used on typical desktop computers and the remaining one was designed to run on mobile devices.
In the context of all-terrain cranes, Singhose et al. [23] proposed a GUI that shows one video feed from either the cabin view or the top view, as well as the information related to the crane's boom and the lifted load. The GUI also contains buttons that could be used to operate the crane remotely. In addition, they proposed two similar GUIs, designed for remotely operating a bridge crane and a tower crane, respectively. However, their GUIs were designed for educational purposes only, so that students could learn about crane-related concepts. Therefore, they do not report the effectiveness of the proposed GUIs for crane teleoperation.
In the context of gantry cranes, Kim [6] proposed a GUI that simultaneously shows two video feeds: (1) the cabin view; and (2) the spreader view, as well as the buttons for operating a gantry crane (see Figure 2). However, the study focused on measuring the network delay for transmitting data between remote cranes and the teleoperation station. Hence, it does not report how the proposed GUI shown in Figure 2 would facilitate crane teleoperation.
In the context of bridge cranes, Yu et al. [2] proposed a GUI that simultaneously shows the video feeds from four camera views: (1) global view; (2) cabin view; (3) bird's eye view; and (4) top view (see Figure 3). In addition, the GUI contains the buttons to control the crane and four icons that could be clicked to change the input technique. Yu et al. [2] do report an evaluation with test users, but the evaluation focused on comparing the effectiveness of four input techniques. As such, the findings from the evaluation are presented in Section 3.7.
Differently from the previous examples, He et al. [35] proposed a mobile augmented reality (AR) application that places a virtual replica (also known as a digital twin) of a physical tower crane in the operator's physical environment and presents the tower crane's current status (see Figure 4) as an alternative to showing the video feed taken from a tower crane. The AR application was designed for monitoring a tower crane rather than actively controlling it. In the future, the role of a crane operator may change to that of a crane supervisor, as cranes become more autonomous and require less human control. With the AR application, operators could still monitor the tower crane even when they are away from their workstations. The AR application also has buttons that operators could use to operate the tower crane remotely in case of emergency (see the bottom-right buttons in Figure 4). Twenty non-operator participants were split into two groups, where one group used the AR application and the other group used a real-time dashboard. Both groups were asked to monitor the current state of the tower crane and respond to any unsafe situation as soon as possible. The results show that the participants who used the AR application had a shorter response time in detecting unsafe conditions than the participants who used the dashboard. Regarding the response accuracy, the participants who used the AR application also made fewer wrong responses than the participants who used the dashboard.

Different Ways of Presenting Video Feed from Different Camera Views
As already presented in Section 3.1, it is a common practice to have video feeds from different camera views for crane teleoperation. Having different camera views allows operators to observe the remote environment from different angles. However, depending on the size of the monitor being used, it could be challenging to present all visual information in a readable manner. This subsection describes the HMIs that specifically explore different ways of presenting video feed from different camera views.
In the context of gantry cranes, Karvonen et al. [28] investigated the suitable number of camera views to be shown on a 32-inch monitor. The first option is a two-camera view, which allows operators to manually choose the video feeds from two out of four cameras to be shown on the monitor. The second option is a four-camera view, where each view is set to show the video feed from one camera. Six crane operators were involved to evaluate how the two-camera view and the four-camera view would influence the operators' work. The results suggest that the two-camera view was preferred over the four-camera view. Since only one monitor was used to show the different camera views, the size of each view in the four-camera view was considered too small to be seen. On the other hand, the size of each view in the two-camera view was considered large enough to allow operators to observe relevant objects in the remote environment. In addition, the operators commented that they were able to focus better with the two-camera view than with the four-camera view, as the two-camera view enabled them to easily estimate the operation status at a glance.
In the context of tower cranes, Chi et al. [7] proposed a four-monitor setup, where each monitor is assigned to show the video feed from one of four camera views: (1) left-side view; (2) right-side view; (3) top view; and (4) global view (see Figure 5). To evaluate the effectiveness of the four-monitor setup, 30 non-operator participants and five crane operators were involved in an evaluation that compared the four-monitor setup with the one-monitor setup (showing the top view only) with verbal guidance. The results suggest that both groups of participants had a shorter completion time when they used the four-monitor setup than when they used the one-monitor setup with verbal guidance.
Similar to the previous example, Chen et al. [29] also proposed a four-monitor setup to be used in the context of tower cranes. The only difference is that Chen et al. [29] used the two upper monitors to show operation-related information, while the two lower monitors were used to show the video feed from different camera angles (see Figure 6). This arrangement was made to allow operators to perceive both the remote environment and the information from other sensors installed in the remote environment. Thirty non-operator participants were involved in an evaluation that compared their four-monitor setup shown in Figure 6 and the four-monitor setup with camera views only. The results suggest that the participants required a shorter time to complete the given task while using the four-monitor setup with camera views only, even though the difference between both setups was not significant. However, fewer participants encountered unsafe situations with the proposed four-monitor setup shown in Figure 6.
Figure 6. The four-monitor setup proposed by Chen et al. [29] (used with permission from American Society of Civil Engineers). The two upper monitors present operation-related information, while the two lower monitors show the video feeds from different camera views. The left image illustrates the kinds of information to be presented in a safe situation, while the right image illustrates the kinds of information to be presented when a collision is imminent. The yellow dotted line represents the recommended lifting path.

Overlay Supportive Information into Video Feed
As operators and their cranes are located separately, operators rely on the video feed from on-site cameras to observe the remote environment. The HMIs presented here aim to help operators by overlaying supporting information into the video feed. Hence, operators could see the remote environment and the supportive information simultaneously.
In the context of all-terrain cranes, Yoneda et al. [18] proposed to overlay four types of visual information onto the video feed that would help operators to reduce load sway. The four types of visual information are: (1) a shadow of the lifted load; (2) an arrow that indicates the desirable joystick direction; (3) two bars that each indicates the current joystick angle and the desirable joystick angle; and (4) a side view that shows both the current and the desirable states of the jib and the hoist cable. Four participants were involved in an evaluation that compared how fast the presented information would help the participants to reduce load sway. However, due to the low number of participants, the results are inconclusive with respect to which visual information was the most beneficial for the participants. Nevertheless, compared to the condition with no supportive information, all the participants managed to reduce the load sway more quickly when any of the visual information was present.
In addition to the four-monitor setups described in Section 3.2, Chi et al. [7] and Chen et al. [29] also proposed to overlay the recommended lifting path and the collision warning into the video feeds to help operators avoid collisions with nearby objects (see Figures 5 and 6). To evaluate the effectiveness of the overlaid information, Chi et al. [7] involved 30 non-operator participants and five crane operators in an evaluation that compared the conditions with and without overlaid information. The results suggest that the crane operators had shorter completion time in the condition with overlaid information. In contrast, the non-operator participants had shorter completion time in the condition without overlaid information. Nevertheless, the presence of overlaid information was rated positively by both groups of participants, since the overlaid information helped them to mitigate potential collisions more easily.
Still related to overlaying the recommended lifting path into the video feed, Chi et al. [27] proposed an algorithm for generating lifting paths that take into account the camera's angle and the presence of obstacles around the worksite. Therefore, the generated lifting path is not only efficient but also realistic. Their study focused on how quickly the algorithm could generate the recommended lifting path for situations of differing complexity. The results suggest that the algorithm was able to generate recommended lifting paths in real time.
In the context of gantry cranes, Gao et al. [31] proposed an algorithm that overlays the safe area for each container onto the video feed, since knowing the safe area would help operators to detect potential collisions from the video feed. The proposed algorithm automatically detects edges of visible containers, and then generates the estimated safe area for each visible container. When two or more safe areas overlap, a collision warning would also be overlaid into the video feed to warn operators that a collision is imminent. As their study focused on the accuracy of the proposed algorithm in detecting containers, they do not report to what extent the overlaid information would influence operators' work.
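As a rough illustration of the overlap test described above, the following sketch assumes that each detected container has already been reduced to an axis-aligned rectangle in image coordinates and that its safe area is obtained by growing that rectangle by a fixed margin. Both assumptions, along with the names `Rect`, `safe_area`, and `collision_warnings`, are hypothetical and are not taken from Gao et al. [31]; the edge-detection step is omitted entirely.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in image coordinates (pixels)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def safe_area(container: Rect, margin: float) -> Rect:
    """Grow a container outline by a fixed margin (an assumed rule)."""
    return Rect(container.x_min - margin, container.y_min - margin,
                container.x_max + margin, container.y_max + margin)


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share a region of non-zero area."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def collision_warnings(containers, margin):
    """Return index pairs of containers whose safe areas overlap."""
    areas = [safe_area(c, margin) for c in containers]
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(areas), 2)
            if overlaps(a, b)]


# Three containers on one row; the first two are 10 px apart.
boxes = [Rect(0, 0, 100, 50), Rect(110, 0, 210, 50), Rect(400, 0, 500, 50)]
print(collision_warnings(boxes, margin=10))  # [(0, 1)]
print(collision_warnings(boxes, margin=0))   # []
```

Each returned pair would correspond to one warning overlaid on the video feed. A pairwise check is quadratic in the number of visible containers, which is acceptable only as long as few containers are in view at once.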

Provide Auditory Information to Operators
Among the reviewed publications, Yoneda et al. [18] is the only one that specifically proposed the use of auditory information for helping operators of all-terrain cranes to reduce load sway. The proposed auditory information consists of the swinging sound that indicates the intensity of the load sway. In principle, the volume of the swinging sound would be lower as the intensity of the load sway decreases, and vice versa. As briefly described in Section 3.3, Yoneda et al. [18] also proposed four types of visual information as part of their evaluation. However, due to the low number of participants, the results are inconclusive in determining which of the supportive information would provide the highest benefit to the participants in terms of reducing load sway. Nevertheless, the presence of the swinging sound enabled the participants to reduce load sway more quickly than the condition without any supportive information.

Provide Force Feedback to Operators
When operators are physically present on-site, they could also receive information through their body movements and any perceptible feedback through their skin, which is also helpful to inform operators about the current state of their operation. However, this kind of information is mostly lost in the case of teleoperation. The HMIs presented in this section aim to provide artificial feedback that operators could perceive through their skin as a way to inform operators about the status of remote cranes.
In the context of bridge cranes, Farkhatdinov and Ryu [21] proposed three kinds of force feedback for helping operators to reduce load sway. The three kinds of force feedback are generated based on three states of crane movements: (1) dynamical model; (2) angular velocity; and (3) sway angle. Five non-operator participants were involved in an evaluation that compared the condition without any force feedback and the conditions with the three kinds of force feedback. The results indicate that the presence of force feedback enabled the participants to reduce load sway more quickly than the condition without any force feedback. Comparing the three kinds of force feedback, the force feedback based on angular velocity produced the shortest completion time, followed by the force feedback based on dynamical model and sway angle.
Suzuki and Murakami [26] also proposed the use of force feedback to assist bridge crane operators to mitigate load sway. The intensity of the proposed force feedback is automatically generated based on the crane acceleration, the length of the hoist cable, and the angle of the current load sway. To evaluate the effectiveness of the proposed force feedback, one participant was involved in an evaluation that compared the conditions with and without the force feedback. The results show that the presence of force feedback enabled the participant to complete the given task much faster and produced less load sway.
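To illustrate why crane acceleration, hoist cable length, and sway angle are natural inputs for such feedback, the following sketch simulates a textbook point-mass pendulum hanging from an accelerating trolley, with a fixed cable length and no damping. It is not the model used in [21] or [26]; the function name `simulate_sway`, the time step, and the example values are assumptions made for illustration only.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def simulate_sway(trolley_accel, cable_length, duration, dt=0.001):
    """Integrate theta'' = -(g*sin(theta) + a*cos(theta)) / L from rest.

    trolley_accel: function of time (s) returning trolley acceleration (m/s^2).
    cable_length:  fixed hoist cable length L in metres.
    Returns the sway angle in radians at every time step.
    """
    theta, omega = 0.0, 0.0
    angles = []
    steps = int(round(duration / dt))
    for k in range(steps):
        a = trolley_accel(k * dt)
        alpha = -(G * math.sin(theta) + a * math.cos(theta)) / cable_length
        omega += alpha * dt  # semi-implicit Euler: update the velocity first
        theta += omega * dt  # then update the angle with the new velocity
        angles.append(theta)
    return angles


# Constant 1 m/s^2 trolley acceleration with a 10 m cable (assumed values).
angles = simulate_sway(lambda t: 1.0, cable_length=10.0, duration=10.0)
peak_deg = math.degrees(max(abs(x) for x in angles))
print(f"peak sway: {peak_deg:.1f} degrees")
```

In this simplified model, the load lags behind the accelerating trolley and swings around the angle at which gravity and the trolley acceleration balance, while the cable length sets how fast the swing evolves.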
In the context of deck cranes, Chu et al. [30] developed a customized haptic device for generating force feedback (see the left image in Figure 7). To evaluate the proposed haptic device, three non-operator participants were involved in an evaluation that compared the handling of load sway in the conditions with and without the force feedback. The results show that the presence of force feedback enabled the participants to reduce the angle of the load sway from 20° to 2° more quickly than when the force feedback was not present.
Figure 7. The left image shows the haptic device (see "NHD" in the image) that could be used to operate a deck crane remotely and deliver force feedback to the operator, while the right image shows the crane simulator being used in the study [30] (used with permission from Springer Nature).
In the context of gantry cranes, Heikkinen and Handroos [25] also suggested the use of force feedback to help operators reduce load sway. They proposed three kinds of force feedback, each of which is given based on the swing angle, the swing speed, or the swing direction. The effectiveness of the proposed force feedback was evaluated by involving five non-operator participants. The results indicate that giving force feedback in the same direction as the swing direction helped the participants to mitigate the load sway. On the other hand, giving force feedback in the opposite direction to the swing direction led to even stronger load sway. In the case of giving force feedback based on the swing speed, the participants were able to reduce the load sway, but they were not able to completely prevent it from happening. Furthermore, giving force feedback according to the swing angle was found to be more effective at reducing load sway than giving force feedback based on the swing speed.
While the presented examples so far are about giving force feedback for handling load sway, Villaverde et al. [24] suggested the use of force feedback to help bridge crane operators avoid collisions with surrounding objects. The intensity of the force feedback varies depending on the proximity between the crane and surrounding objects, where stronger feedback is given if the distance between the crane and another object decreases, and vice versa. One participant was involved to evaluate how the presence of force feedback would influence his performance. The results show that the presence of force feedback enabled the participant to work 20% faster than the condition without any force feedback.
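The idea of feedback that strengthens as an obstacle gets closer can be sketched as a simple distance-to-force mapping. The linear ramp, the 5 m warning distance, the 10 N ceiling, and the function name `proximity_force` are all assumptions made for illustration; the actual mapping used by Villaverde et al. [24] is not described here.

```python
def proximity_force(distance_m, warn_distance_m=5.0, max_force_n=10.0):
    """Return a feedback force (N) that grows linearly as an obstacle nears.

    The force is zero at or beyond warn_distance_m and reaches max_force_n
    at contact (distance 0). All default values are hypothetical.
    """
    if warn_distance_m <= 0:
        raise ValueError("warn_distance_m must be positive")
    if distance_m >= warn_distance_m:
        return 0.0
    closeness = 1.0 - max(distance_m, 0.0) / warn_distance_m
    return max_force_n * closeness


for d in (6.0, 5.0, 2.5, 0.0):
    print(f"{d:.1f} m -> {proximity_force(d):.1f} N")
```

A linear ramp is only one option; a mapping that rises more steeply near contact would give a stronger cue at short range while keeping the cue weak at larger distances.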

Improve Telepresence Using Immersive Technologies
The HMIs in this category aim to improve the feeling of presence for crane teleoperation (also called telepresence) by immersing operators in a virtual representation of the remote environment. Therefore, operators could feel as if they are present in the remote environment.
In the context of all-terrain cranes, Goh et al. [32] proposed a virtual reality (VR) system that offers two views to observe the remote environment: (1) in-cabin view and (2) observer view (see Figure 8). The in-cabin view allows operators to see the remote environment as if they are located inside the cabin of their cranes, while the observer view allows operators to observe the surrounding environment. To help operators work safely, a virtual box, which is slightly larger than the lifted load, is visualized to indicate the permissible distance between the lifted load and nearby objects (see the yellow box in Figure 8). When a collision between the lifted load and nearby objects is imminent, red boxes are visualized to indicate the area that should be avoided (see the red boxes in Figure 8). Since their study was limited to the technical feasibility of the proposed visualization, they do not report to what extent the proposed visualization would influence operators' capability to avoid collisions.
Figure 8. The left image shows the view from the observer view, while the right image shows the view from inside the crane cabin [32] (used with permission from American Society of Civil Engineers). The yellow bounding box indicates the minimum safe distance between the lifted load and nearby objects. When a collision is imminent, the red bounding boxes also appear to indicate the area that should not be approached by the operator.
Major et al. [34] suggested using a cave automatic virtual environment (CAVE), where multiple projectors are used to present the virtual environment on the surrounding wall. The projection on the surrounding wall allows operators to see the remote environment, as if they are located onboard the ship. The virtual environment also contains virtual replicas (or digital twins) of a ship and a deck crane. The virtual environment and the virtual replicas were also reconstructed based on transmitted data from on-site sensors. However, their study was also limited to the technical feasibility of this approach, and thus they do not report to what extent the proposed approach would facilitate crane teleoperation.
In addition to the force feedback presented in Section 3.5, Heikkinen and Handroos [25] also suggested using a CAVE environment, where multiple projectors are used to present the cabin view of gantry cranes. Although they report an evaluation with test users, the evaluation focused on evaluating the effectiveness of different types of force feedback (see the results of this evaluation in Section 3.5). Hence, they do not report to what extent the use of multiple projectors would improve telepresence.

Provide Different Input Techniques to Perform Teleoperation
Among the reviewed publications, Yu et al. [2] is the only one that specifically investigated different input techniques for operating a bridge crane remotely. As briefly mentioned in Section 3.1, the GUI proposed by Yu et al. [2] contains four icons that represent four input techniques (see the bottom-left icons in Figure 3). The proposed input techniques are: (1) clicking the buttons on the top-right part of the GUI using a mouse; (2) using a keyboard; (3) using a joystick; and (4) using hand gestures. They involved 21 non-operator participants and 11 crane operators to determine how the different input techniques would influence the participants' completion time and task accuracy. The results show that both groups of participants had the shortest completion time when they used the joystick, followed by the keyboard, the mouse, and the hand gestures. In terms of task accuracy, both groups of participants also had the highest accuracy when they used the joystick, followed by the keyboard, the mouse, and the hand gestures.

Incorporate Higher Levels of Automation into Crane Teleoperation
Using the levels of automation proposed by Parasuraman et al. [36], many HMIs presented in Sections 3.1–3.6 already offer some sort of automation, as the HMIs automatically analyze incoming data and/or generate some sort of feedback to operators. This subsection describes HMIs that were also proposed along with action automation, which executes the inputs or decisions that operators make. Hence, operators do not need to give continuous inputs to control their cranes remotely.
In the context of bridge cranes, Sorensen et al. [20] proposed a GUI that shows the video feed of the remote environment and allows operators to specify the coordinate of the target location and the preferred lifting height. Once the target coordinate has been inserted, the system automatically moves the crane to the target location in a way that would produce less load sway. They involved 19 non-operator participants to evaluate the effectiveness of the proposed GUI against input devices traditionally used for controlling bridge cranes, such as a remote joystick and a control pendant. The results indicate that using the proposed GUI enabled the participants to complete lifting paths that require no hoisting faster than when they used the remote joystick and the control pendant. However, the opposite occurred for lifting paths that require hoisting, as the participants worked faster using the remote joystick and the control pendant than when they used the proposed GUI.
Osumi et al. [22] also proposed a GUI that could be used for controlling a bridge crane remotely. The GUI shows the video feed of the remote environment, and the crane can be moved by clicking any location within the video feed. After the input is given, the system automatically moves the crane in a way that prevents overshooting, i.e., the crane stopping beyond the intended location. Since their study was limited to the accuracy of the crane movement under this approach, they do not report how it would influence crane operators' work.
Top et al. [33] proposed a tablet application to control a loader crane by touching any location within the video feed of the remote environment. After the input is made, the system automatically moves the crane and its joints to the target location. They also proposed a manual version of the tablet application, which allows operators to manually control every joint that the crane has. They involved 28 crane operators and 28 non-operator participants in an evaluation that compared the effectiveness between the tablet application with automated control, the tablet application with manual control, and using a remote joystick. A remote joystick was included in the evaluation, since current loader cranes could be controlled using a remote joystick. The results show that the crane operators had the highest accuracy when they used the remote joystick, followed by the tablet application with manual control and the tablet application with automated control. The opposite happened to the non-operator participants, as they had the highest accuracy when they used the tablet application with automated control, followed by the remote joystick and the tablet application with manual control. In terms of completion time, the operators required the shortest time when they used the remote joystick, followed by the tablet application with manual control and the tablet application with automated control. The non-operator participants also had the shortest time when they used the remote joystick, followed by the tablet application with automated control and the tablet application with manual control.
Differently from the previous examples, Moon and Bernold [19] proposed four levels of control that could be used for operating a full-scale loader crane remotely:
1. Manual control: The operator is responsible for controlling and monitoring the crane;
2. Human-led control: The operator indicates the target lifting location and the system automatically moves the crane to the target location;
3. Machine-led control: The operator controls the crane based on the visual information provided by the system;
4. Autonomous control: The system completely controls the crane from the starting location to the target location.
Moon and Bernold [19] involved four crane operators to evaluate how each level of control would facilitate the completion of tasks with five levels of complexity: Level 1 (without any obstacles); Level 2 (with one obstacle); Level 3 (with two obstacles); Level 4 (with three obstacles); and Level 5 (with four obstacles). The results suggest that the autonomous control required the shortest time for completing the tasks at every level of complexity. Excluding the autonomous control, the manual control produced a shorter completion time than the human-led and machine-led controls for the tasks at Levels 1 to 3. However, both human-led and machine-led controls produced shorter completion times than the manual control for the tasks at Levels 4 and 5.
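The four levels of control can be read as different ways of splitting work between the operator and the system. The sketch below encodes that reading as a small data structure. The names ControlLevel, RoleSplit, and needs_continuous_input are hypothetical, and the role assignments are an interpretation of the list above, not a model taken from Moon and Bernold [19].

```python
from dataclasses import dataclass
from enum import Enum


class ControlLevel(Enum):
    MANUAL = 1
    HUMAN_LED = 2
    MACHINE_LED = 3
    AUTONOMOUS = 4


@dataclass(frozen=True)
class RoleSplit:
    moves_crane: str       # "operator" or "system"
    system_support: str    # what the system contributes at this level


# Interpretation of the four levels; wording follows the list in the text.
ROLES = {
    ControlLevel.MANUAL: RoleSplit("operator", "none"),
    ControlLevel.HUMAN_LED: RoleSplit("system", "moves crane to the indicated target"),
    ControlLevel.MACHINE_LED: RoleSplit("operator", "provides visual information"),
    ControlLevel.AUTONOMOUS: RoleSplit("system", "controls crane from start to target"),
}


def needs_continuous_input(level):
    """True when the operator, not the system, moves the crane."""
    return ROLES[level].moves_crane == "operator"


if __name__ == "__main__":
    for level in ControlLevel:
        print(level.name, needs_continuous_input(level), ROLES[level].system_support)
```

Under this reading, the human-led and autonomous levels are the ones that relieve the operator of continuous input, which is the property that this subsection focuses on.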

Open Issues and Future Research Opportunities
This section emphasizes issues that emerge based on the findings reported in the reviewed publications and highlights opportunities for future research.

Involvement of Crane Operators
Out of 21 publications, 14 report some sort of user evaluation (see Table 1). Among those 14 publications, only five explicitly involved crane operators as part of the user evaluations [2,19,27,28,33]. The relatively low level of involvement suggests that it is difficult to involve crane operators as part of the design process. However, this situation is not unique to cranes, since a similar situation has also been reported in the heavy machinery domain in both academic [1] and industrial [37] contexts.
Among the reviewed publications, three involved both crane operators and non-operator participants in their user evaluations [2,7,33]. These three publications show some notable differences between the results from crane operators and non-operator participants. Yu et al. [2] report that crane operators took a significantly longer time to complete the given tasks, but they had better task accuracy than non-operator participants. Chi et al. [7] also observed that crane operators completed the given tasks more cautiously than non-operator participants. In addition, they report that the results from crane operators and non-operator participants are not always aligned with each other. For instance, as briefly described in Section 3.3, their crane operators worked faster in the condition with overlaid information, while their non-operator participants worked faster in the condition without overlaid information. Although Top et al. [33] do not report a notable behavioral difference in how crane operators and non-operator participants completed the given tasks, they do report differences between the results from the two groups. For example, as briefly described in Section 3.8, their crane operators had the highest accuracy when they used the remote joystick, followed by the tablet application with manual control and the tablet application with automated control. On the other hand, their non-operator participants had the highest accuracy when they used the tablet application with automated control, followed by the remote joystick and the tablet application with manual control. These comparisons suggest that researchers should carefully consider what kinds of measurements to collect when non-operator participants are involved, since the obtained results may not reflect the results that would be obtained if crane operators were involved.

Design of Teleoperation Stations
The multi-monitor setups presented in Section 3.2 resemble remote crane stations that are currently used in industry (see Kalmar [38] for an example). Using multiple monitors increases the available area to present visual information to operators, even though this could also be achieved by using one large monitor. Nevertheless, multiple monitors are also widely used for teleoperation in other contexts [11,39].
Among the reviewed publications, three aim to improve the feeling of presence by immersing operators in virtual environments [25,32,34]. Instead of having their view limited by the size of their monitors, operators could be immersed in virtual environments that give them the feeling of operating from inside their cranes. As described in Section 3.6, two approaches have been proposed in the reviewed publications: (1) using virtual reality and (2) using a CAVE environment. However, these approaches have not been evaluated yet, and thus it is still unclear how the immersive setups would influence operators' capability to operate their cranes remotely.
Some types of cranes, such as loader cranes and all-terrain cranes, can be transported and used in various workplaces. In this case, operators are responsible for transporting their cranes and they are also required to inspect the work environment to ensure that it is safe to perform lifting operations there [40,41]. Therefore, a fixed teleoperation station seems to be less suitable for these types of cranes. As an alternative, operators may rely on mobile devices, such as tablets, to perform teleoperation due to the mobility requirement. Among the five publications that focused on these types of cranes (see Table 1), three proposed the use of desktop computers [18,19,23] and the remaining two proposed the use of mobile devices [33,35]. In the future, it would be interesting to compare these two setups to determine their suitability for operating these types of cranes remotely.

Facilitate Telepresence through Multimodal Feedback
As mentioned in Section 1, remote operators are unable to capture the same amount of information as they could by being physically present on-site. This situation opens research opportunities on how to make operators feel as if they are present on-site. Based on the reviewed publications, visual feedback is still the most common way to inform operators about the situation of the remote environment. In addition, some publications proposed the use of auditory and force feedback to inform operators about the condition of the remote environment (see Sections 3.4 and 3.5). However, it is important to note that the reviewed publications still consider the different modalities in isolation from each other. For example, Heikkinen and Handroos [25] proposed using three kinds of force feedback in addition to the CAVE environment, but their evaluation focused on the impact of the proposed force feedback only. Another example is Yoneda et al. [18], who investigated how visual and auditory information could help operators to mitigate load sway. Their evaluation compared each kind of supportive information separately, instead of determining how the combination of visual and auditory information would help operators to perform crane teleoperation. The use of multimodal feedback has the potential to facilitate operators' feeling of being present in the remote environment [11,42].

Mitigate the Impact of Time Delay on Teleoperation
Since operators and their cranes are located in different places, teleoperation is never free from time delay [43]. Time delay is not only produced by data transmission between operators and their cranes (also known as network delay), but also by the time needed for processing and executing incoming data [44]. Having an acceptably low time delay is essential in any kind of teleoperation, as a large time delay could influence operators' capability to perform teleoperation. The presence of a time delay produces the "move-and-wait" situation, where operators make one input and then wait for incoming feedback before giving further inputs [45]. A large time delay not only reduces operators' capability to work quickly, but also their ability to work correctly. In the context of remote car driving, Neumeier et al. [46] reported that drivers' capability to follow the planned route started to deteriorate when the time delay reached 300 ms. Moreover, a large time delay also has the potential to harm operators' wellbeing. For example, Brunnström et al. [47] found that operators of forest machinery started to feel discomfort when the time delay exceeded 400 ms.
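The cost of the "move-and-wait" situation can be illustrated with a deliberately simple model: if an operator waits for feedback after every input, each input adds one round-trip delay to the task. The function names and the example numbers below are hypothetical, and the model ignores everything else that affects real completion times.

```python
def delay_overhead_s(n_inputs, round_trip_delay_s):
    """Extra time spent waiting when feedback is awaited after every input."""
    if n_inputs < 0 or round_trip_delay_s < 0:
        raise ValueError("n_inputs and round_trip_delay_s must be non-negative")
    return n_inputs * round_trip_delay_s


def move_and_wait_time_s(n_inputs, move_time_s, round_trip_delay_s):
    """Total task time: each input takes its own duration plus one wait."""
    return n_inputs * move_time_s + delay_overhead_s(n_inputs, round_trip_delay_s)


if __name__ == "__main__":
    # Hypothetical task: 20 discrete inputs of 1.5 s each.
    for delay in (0.05, 0.3, 0.4):
        total = move_and_wait_time_s(20, 1.5, delay)
        print(f"delay {delay * 1000:.0f} ms -> total {total:.1f} s")
```

In this model the overhead grows linearly with both the delay and the number of inputs, which is one reason why reducing the number of inputs an operator must give is attractive when the delay is large.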
Among the reviewed publications, there are four publications that attempted to mitigate network delay produced by data transmission. Kim [6] specifically investigated the use of Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) to determine how different network protocols would reduce network delay for data transmission between a teleoperation station and remote gantry cranes. The remaining three publications attempted to reduce the amount of data to be transmitted by eliminating the need for transmitting video from the remote environment, since video transmission usually consumes the highest network bandwidth [48]. Villaverde et al. [24], Major et al. [34], and He et al. [35] decided to show the virtual replicas (or digital twins) of the remote cranes, which were reconstructed based on real-time data from sensors installed in the remote environment, as an alternative to transmitting video feed. However, those three publications do not investigate how the interaction with the proposed virtual replicas or digital twins would support operators' capability to perform teleoperation in the presence of relatively large network delay. Furthermore, the deployment of newer network technologies, e.g., 5G, is also expected to reduce network delay produced by data transmission [49].
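To give a rough sense of why sending sensor data instead of video reduces the amount of data, the sketch below packs a crane state into a fixed binary message and compares its size with one uncompressed video frame. The message layout, the field names, and the frame resolution are assumptions made for illustration; none of the reviewed publications is claimed to use this format, and real video streams are compressed, so the actual gap is smaller than the raw comparison suggests.

```python
import struct

# Hypothetical layout: timestamp as float64, then three float32 joint values,
# little-endian without padding.
STATE_FORMAT = "<dfff"


def pack_crane_state(timestamp_s, slew_deg, boom_deg, hoist_m):
    """Serialize one crane state sample into a compact binary message."""
    return struct.pack(STATE_FORMAT, timestamp_s, slew_deg, boom_deg, hoist_m)


def unpack_crane_state(message):
    """Inverse of pack_crane_state; returns a 4-tuple of floats."""
    return struct.unpack(STATE_FORMAT, message)


def raw_frame_bytes(width_px, height_px, bytes_per_pixel=3):
    """Size of one uncompressed video frame (default: 24-bit color)."""
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    message = pack_crane_state(1700000000.0, 45.0, 30.0, 12.5)
    frame = raw_frame_bytes(1280, 720)
    print(len(message), "bytes per state sample")
    print(frame, "bytes per raw 1280x720 frame")
```

The receiving side would then use the unpacked values to pose a virtual replica of the crane, which is the role that the digital twins play in the publications above.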
The presence of time delay opens research opportunities on how HMIs could help operators to control their cranes remotely when the time delay is large. Although none of the reviewed publications proposed a solution in this research area, there are two notable approaches that could be adopted in crane teleoperation. One approach is to visualize both the presence and the magnitude of time delay, so that operators could prepare their own strategy for mitigating its impact [50]. Another approach is to provide predictive information that indicates the near-future state of the crane, which would enable operators to estimate what would soon happen without having to wait for incoming feedback [51].
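Both approaches can be sketched together in a few lines. The example below estimates the age of the latest crane sample (the delay magnitude to be shown to the operator) and extrapolates the crane position over that delay. It assumes a constant velocity during the delay and synchronized clocks on both sides; CraneSample and predict_position are hypothetical names, and this is not the method of the cited works [50,51].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CraneSample:
    position_m: float      # last reported position along one axis
    velocity_m_s: float    # last reported velocity along the same axis
    sent_at_s: float       # time at which the crane sent the sample


def predict_position(sample, now_s):
    """Return (predicted_position_m, delay_s) for the current time.

    Assumption: the crane keeps its last reported velocity while the
    sample is in transit (simple dead reckoning).
    """
    delay_s = max(0.0, now_s - sample.sent_at_s)
    predicted_m = sample.position_m + sample.velocity_m_s * delay_s
    return predicted_m, delay_s


if __name__ == "__main__":
    sample = CraneSample(position_m=10.0, velocity_m_s=0.5, sent_at_s=100.0)
    predicted, delay = predict_position(sample, now_s=100.4)
    print(f"delay: {delay * 1000:.0f} ms, predicted position: {predicted:.2f} m")
```

An HMI could draw the predicted position as a ghost marker next to the delayed video or replica and display the delay value alongside it. The constant-velocity assumption becomes less reliable as the delay grows or while the crane is accelerating.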

Considerations for Human-Automation Interaction
In teleoperation in general, automating some of the operators' work is preferred because time delay could hinder operators' capability to perform continuous control [11,42]. Instead of performing continuous control, operators could provide high-level commands and let automation execute the rest. This kind of automation has been proposed in the context of crane teleoperation, where operators only need to indicate the target location and the automation systems move the cranes on their own (see Section 3.8).
As technology progresses, cranes are expected to be increasingly autonomous in the future. Among the reviewed publications, Moon and Bernold [19] provide an example of such autonomous control, where the crane could lift a load and navigate through obstacles without any intervention from an operator. However, even if cranes could work autonomously, they will still need to inform human operators about their status and what they intend to do, so that operators could intervene in a timely manner when necessary [11,39]. In this case, the role of an operator would progressively change from crane controller to crane supervisor [1]. He et al. [35] and Major et al. [34] are the only examples of research in this area among the reviewed publications, as their proposed digital twins were specifically designed for monitoring the physical cranes rather than for controlling them actively. The shift from crane controller to crane supervisor opens further research opportunities on how HMIs could convey both the status and the intention of autonomous cranes to operators [52] and how HMIs could help operators to maintain their vigilance for handling exceptional situations that could happen at any time [53].

Examine the Impact of the Proposed HMIs on User Experience
As the instrument that allows operators to observe the remote environment and control their cranes remotely, the HMI also has the potential to affect how well operators could perform their work and how operators view and experience their work [54]. As shown in Table 1, most of the publications that report some sort of user evaluation investigate the impact of the proposed HMIs on the participants' performance. This trend can be seen from the reliance on performance-related metrics, such as completion time and task accuracy, to determine the impact of the proposed HMIs (see Table 1). Hence, the impact of the proposed HMIs on the participants' experience was often not investigated in the reviewed publications. Among the reviewed publications, Karvonen et al. [28] is the only example that explicitly states what kinds of experiences the designers aim to provide with the proposed HMIs and discusses to what extent their participants could achieve such experiences.
Savioja et al. [55] argue that it is important to examine the impact of a new tool on user experience, especially in safety-critical domains, e.g., heavy machinery, since user experience could indicate the overall appropriateness of the new tool for the work that should be performed. Any negative experiences should be addressed, since they could indicate inadequacies of the new tool for the performed work. Since safety-critical domains are characterized by failures that may lead to serious damage to life, property, and the environment [56], any inadequacies of the new tool should be addressed in order to prevent such failures from occurring [55].

Conclusions
To prevent crane operators from being exposed to hazardous situations, a significant amount of progress has been made to allow operators to control their cranes remotely. As the instrument that allows operators to observe the remote environment and control their cranes remotely, HMIs could affect operators' capability to perform teleoperation. This article has examined various kinds of HMIs that have been designed to help operators mitigate challenges that exist in crane teleoperation. The results show that crane teleoperation has been investigated for different types of cranes, such as bridge cranes, tower cranes, gantry cranes, all-terrain cranes, deck cranes, and loader cranes. While the kinds of HMIs vary widely, they were mainly designed for three broad purposes: (1) to enable operators to perform teleoperation; (2) to help operators to reduce load sway; and (3) to assist operators in preventing collisions with nearby objects. Although not all the reviewed publications report some sort of user evaluation, the overall results suggest that the proposed HMIs bring improvements over the conditions without them. Furthermore, this article has also highlighted six open issues that could be investigated to further explore how HMIs could improve crane operators' capability to perform teleoperation.