A Review on Methods and Systems for Remote Collaboration

Abstract: Due to the appearance of COVID-19 in 2019, person-to-person interactions were drastically reduced. The impact of these restrictions on the economic environment was significant. For example, technical assistance for commissioning or adjusting the parameters of some complex machines/installations had to be postponed. Economic operators became interested in the possibility of remote collaboration, depending on the manufactured products and the performance of the production lines that they owned. This bibliographic research was undertaken to address these needs. The purpose of this review was to analyze the current solutions, approaches, and technologies that workers and specialists can implement to obtain a reliable remote collaboration system. This survey focuses on techniques, devices, and tools that are being used in different contexts to provide remote guidance. We present communication cues and methods being employed, the implemented technological support, and the areas that benefit from remote collaboration. We hope that our effort will be useful to those who develop such systems and people who want to learn about the existence of collaborative solutions, and that it will increase awareness about the applications and the importance of the domain. We are convinced that, with the development of communication systems, the advancement of remote support systems will be a goal for many economic operators.


Introduction
When talking about remote assistance, different situations requiring remote collaboration might come to mind. In some cases, work assignments, problems, or other issues might be easy to clarify during a phone call, but other tasks may necessitate the use of images or video communication so that an expert can understand each detail of a problem that an employee may be having. However, even a video call may not be sufficient for some tasks, e.g., if the problem is critical and highly complex.
There are multiple situations that may interrupt the expected continuation of a process, and in order to meet deadlines, they must be fixed immediately. The whole context is more complicated now due to the restrictions that arose from the COVID-19 pandemic. The reality is that there are not enough experts, and they are sometimes servicing or supervising multiple production sites. Thus, efficient collaboration systems for instant access are in high demand. Technical advancements and miniaturization enable real-time remote support, facilitating interactions and explanations for teams with workers located in different corners of the world. Infrastructure for remote collaboration enables fast problem identification and problem-solving, prompt intervention due to virtual presence, accessibility, and cost-effectiveness.
Remote collaboration is an emergent domain, and the tendency is for it to be democratized at a large scale due to technological advances and falling costs. Moreover, we must mention that privacy concerns, security threats, and standardization gaps are major challenges. To the best of our knowledge, there has not been any comprehensive review of remote assistive technology until now. The motivation for conducting this review was the need to have an overview of the current technologies employed. There was no such type of review when we searched the IEEE Xplore, Springer Link, and ACM Digital Library online databases. It is of great interest for researchers to have a clear view of the main methods and systems that would allow them to identify the position of their research in the field. Therefore, this paper reviews the current methods and systems employed in remote assistance and collaboration to gain clear guidelines for future research.
Appl. Sci. 2021, 11, 10035
We review state-of-the-art remote assistance technologies, devices used, visual display technologies, and communication cues.

Methodology
Our research methodology was organized in three phases: planning the review, conducting the review, and documenting the review.
The first step was to define the scope of the review by identifying the answers to the following research questions:
• "What kinds of devices are used in remote collaboration?"
• "What are the industrial domains making use of remote collaboration infrastructure?"
• "What are the technologies used in displaying the information?"
• "What are the types of communication cues used in remote collaboration?"
• "What can be improved to offer a better user experience?"
The second step was to conduct the review by selecting the existing studies in the field, extracting the required content, and synthesizing the data for the last stage of elaborating the review.
A total of 230 articles were selected by querying two databases, IEEE [23] and ACM [24], using the following keywords: "remote collaboration", "remote collaboration system", "remote video collaboration", "remote assistance", and "remote guidance". We chose these two databases because the number of articles that resulted from the queries was significant, and we considered that the content was relevant for our research. Figure 2 shows the dynamics of the "remote collaboration" subject in academia and industry. Figure 2 depicts the distribution graph of the relevant articles sorted by database and year. Starting in 2010, a steadily increasing trend can be observed, almost tripling from 11 works (3 in IEEE and 8 in ACM) in 2010 to 31 works (6 in IEEE and 25 in ACM) in 2020. For the keywords "remote collaboration" and "remote intervention", Google Trends records data starting in 2004 [25]. We observed some overlap during 2004, when the interest was quite high, until early 2005. This interest can be explained by the first efforts in remote collaboration using augmented reality. As with each new technology that appears, the interest was high in the first year; afterwards, it decreased but had occasional spikes between 2005 and 2014. Interestingly, in March 2020, a sudden rise in interest could be observed, which coincided with the beginning of the COVID-19 pandemic.
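The growth factor quoted for 2010 to 2020 can be checked with a few lines of arithmetic. The sketch below uses only the per-database counts stated in the text; the dictionary layout and the function name `yearly_total` are illustrative assumptions and not part of the review's methodology.

```python
# Per-year article counts quoted in the text, split by database.
counts = {
    2010: {"IEEE": 3, "ACM": 8},
    2020: {"IEEE": 6, "ACM": 25},
}


def yearly_total(year: int) -> int:
    """Sum the articles from both databases for one year."""
    return sum(counts[year].values())


# Ratio between the 2020 and 2010 totals ("almost tripling").
growth = yearly_total(2020) / yearly_total(2010)
print(f"{yearly_total(2010)} -> {yearly_total(2020)} works, factor {growth:.2f}")
```

The factor comes out just below 3, which is consistent with the "almost tripling" wording.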
From the total of 230 articles found in the two indexed databases, we selected the most relevant ones, excluding the papers focusing on videoconferencing, thus retaining 118 papers pertinent to the subject of remote support. After extracting the most valuable information from the selected articles, we identified three main directions for this review. Accordingly, the content of this survey is arranged on three axes, which correspond to the topics of technologies in the remote assistance sector. The first topic is represented by the electronic devices that serve the remote assistance task. Secondly, we examined in depth the display technologies embodied in the devices employed, and, finally, we focused on the communication cues as means used by the expert for guiding the remote user to identify objects and follow instructions. We depict in Figure 3 the main topics of this study, which will be addressed in the following sections: existing electronic devices, display technologies, and communication cues.
For each of the three aforementioned topics, the information was systematized in a table that identifies the works in a specific scientific field and the different methods/equipment used. Thus, future researchers will be able to more quickly locate the scientific works and the methods/equipment used by different authors for a certain field/task/problem.
Readers will be able to focus only on the solutions that apply to their activity, and they will be able to identify the advantages or disadvantages of the various existing/applied solutions from the comparison done in this review.

Electronic Devices Used in Remote Assistance
When developing a remote collaboration system, particular devices with different form factors are adapted for various purposes and scenarios. For example, video cameras can be head-mounted or worn on the shoulder; information can be displayed on a PC screen, projected, or seen through a head-mounted device (HMD).
The system proposed by Villaruel et al. [1] was intended for remote surgery. A 2 DOF (degree of freedom) robotic arm was controlled distantly by electromyographic signals captured from the remote user's muscles, corresponding to the human arm's natural movements.
Another system for remote surgery is presented in [2]; the doctor performing the surgery uses an HMD and controls a surgical robot remotely. The article's title is "The rise of robots in surgical environments during COVID-19", which underlines the favorable context for the proliferation of remote systems during the COVID-19 pandemic. A more complex system used in remote surgery that employs a robotic arm is presented by Suthakorn et al. in [4]. The system has an expert station that allows the surgeon to control the surgical robots located at the remote intervention site through a 3D force-feedback haptic robot.
M. Bauer [26] was among the first researchers who proposed a system to be used by two remote users for educational purposes, the aim being to set up a wired circuit. The prototype developed for the worker's side was composed of a video camera attached to a head-mounted device (HMD). The camera sensor was pointing away from the user in the approximate direction of the local user's gaze and tried to capture images of the task area. On the remote side, the expert analyzed the images and instructed the user on what steps to follow via a telepointer.
Susan R. Fussell's [27] study showed that in a remote bike repairing context, in addition to a head-mounted camera worn by the repairer, a scene camera provided valuable visual information for remote collaboration on physical tasks and augmented the feeling of co-presence for the remote helper.
The wearable active camera laser (WACL) described in [28] introduced the idea of placing the camera on the shoulder of the local user. A laser pointer was attached to the video camera, and the remote user controlled the laser beam intended for spatial guidance. The expert and the worker had headsets with microphones used to convey instructions and explanations from the working site. The system was tested in the context of an assembly task (Lego assembly). The results from [28] showed that, compared with a head-mounted camera, the WACL was more comfortable and caused less fatigue, although, if the wearer moved the laser pointer slightly, the pointer position changed; hence, guidance became difficult.
Machino [29,30] proposed a 2 DOF robot on wheels, situated on a worksite and controlled by a remote expert. A video camera was fixed to the top of the robot, allowing it to capture the local environment, while a projector was used to project the expert's instructions to the remote worker. The solution was targeted to the manufacturing industry for tasks involving maintenance operations.
M. Adcock and C. Gunn [31] presented a remote guidance system that enabled a remote expert to instruct a mobile worker using sticky annotations. The worker wore a helmet equipped with a small camera and a laser pico projector placed on top of the helmet. These devices were connected to a laptop in the local user's backpack. The remote helper received the video stream on a tablet that also communicated with the worker's PC. The tablet allowed the expert to sketch the instructions, which were transmitted, with the help of the pico projector, as a 3D line into the local user's environment. This system was aimed to be deployed in the domains of industrial manufacturing for maintenance and healthcare for medical consultations.
Starting in 2010, we noticed that smart devices (mobile phones or tablets) were designed to facilitate remote visualization and collaboration between workers in factories and experts in different locations [14,15].
The prototype proposed by Gauglitz in [32] is a mobile remote collaboration system used to assist a user in operating a mock-up airplane cockpit. The pilot was guided by the instructions of a remote helper. The local user was equipped with a tablet device with an integrated camera for sending environmental information remotely, and the display showed the virtual annotations placed by the remote expert. The remote user was presented with a view of the local user's environment and, besides controlling the annotations, the helper could also "freeze" his viewpoint at any time. Later, Gauglitz et al. [33] presented an improved solution for mobile remote collaboration. The solution enabled the helper to have an independent view of the local user's environment by controlling the remote video camera and the augmented reality (AR) annotations displayed on the local user's smart device (tablet). It was tested by a user repairing a car following the instructions of a remote helper.
Teleadvisor [34], a novel design used to solve complex wiring tasks, comprises two devices, the Teleadvisor component and the controller display. A worker controls the position of the Teleadvisor, represented by a robotic arm with a camera and a pico projector mounted on the top. Teleadvisor's camera captures the visual information, and the data are transmitted to an off-site controller operated by an expert. With the help of the projector, the information sent from the controller back to the Teleadvisor is displayed as an overlay emphasizing objects in the worker's environment. Thanks to the mobile robotic arm, the helper can move the camera's viewpoint closer to the object and zoom in and out.
All the aforementioned solutions [26–34] require a fixed setup for the cameras, either on the worker's or the helper's side. In contrast, HandsInAir [35] includes a new feature that allows mobility for both the remote helper and the worker. This novel approach is highly relevant for factories where workers need to walk around and inspect different machines. The solution requires that both users are wearing the same equipment, consisting of a camera mounted on a helmet and a custom-built near-eye display beneath the brim. The worker's camera is used to capture the worker's actions, and the remote user's camera captures the hand gestures of the helper, providing instructions for the local user.
The MobileHelper presented in [36] proposes a similar approach as [35]. The worker has a helmet equipped with a video camera and a near-eye display, but the remote helper uses a tablet. The video transmitted from the worker's camera is displayed on the helper's device. The tablet's video camera captures the hand gestures of the helper and combines them with the video data, and the result is displayed on the helper's tablet screen and the worker's near-eye display.
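Combining the helper's hand gestures with the worker's video amounts to compositing one image over another. The following is a minimal sketch under stated assumptions: frames are small grayscale grids, the hand region has already been segmented into a boolean mask, and the names `overlay_hands` and `alpha` are hypothetical. It is not the implementation used in [36].

```python
from typing import List

Frame = List[List[int]]  # grayscale image, pixel values 0..255 (assumed format)


def overlay_hands(scene: Frame, hands: Frame, mask: List[List[bool]],
                  alpha: float = 0.6) -> Frame:
    """Blend hand pixels over the scene wherever the mask is True.

    `alpha` is the opacity of the hand layer; masked-out pixels keep
    the original scene value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    out = []
    for scene_row, hand_row, mask_row in zip(scene, hands, mask):
        out.append([
            round(alpha * h + (1 - alpha) * s) if m else s
            for s, h, m in zip(scene_row, hand_row, mask_row)
        ])
    return out


# Toy 2x2 example: only the top-left pixel belongs to the helper's hand.
scene = [[100, 100], [100, 100]]
hands = [[200, 0], [0, 0]]
mask = [[True, False], [False, False]]
combined = overlay_hands(scene, hands, mask, alpha=0.5)
```

In a setup like the one described, the same combined frame would be sent to both the helper's screen and the worker's near-eye display, so that both users see identical guidance.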
The study in [37] focused on how AR can be used to enhance remote space collaboration. The system used a Kinect depth sensor to capture a 3D model of the local user's surroundings, on which a remote user could overlay annotations. These labels appeared projected onto a screen with the help of a laser projector, which in turn could be manipulated by the remote expert. This solution also allowed the possibility for the off-site expert to manipulate the scene independently of the view of the local user.
Domova [38] presented a system conceived for solving physical tasks in the manufacturing industry, allowing both the on-site worker and the off-site helper to annotate video feeds. The worker carried a mobile phone with video streaming capabilities to capture the details of the local environment, and the scene information was displayed on the remote user's PC. Both users could take snapshots, and the stream could be frozen on either side when one of the users wanted to point at a particular scene at a chosen moment.
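The freeze-and-annotate behaviour described for [38] can be sketched as a small state holder. This is an illustrative assumption about how such a view might be organized, not the design reported by the authors: frames are represented by placeholder strings, and the class and method names (`SharedView`, `push_frame`, `freeze`, `annotate`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SharedView:
    """One user's view of the video stream, with a freeze toggle."""
    live_frame: Optional[str] = None
    frozen_frame: Optional[str] = None
    # Each annotation is (author, x, y) in image coordinates.
    annotations: List[Tuple[str, int, int]] = field(default_factory=list)

    def push_frame(self, frame: str) -> None:
        # Incoming frames always update the live buffer, even while frozen.
        self.live_frame = frame

    def freeze(self) -> None:
        # Keep the current frame on screen as a snapshot.
        self.frozen_frame = self.live_frame

    def unfreeze(self) -> None:
        # Return to the live stream.
        self.frozen_frame = None

    def displayed(self) -> Optional[str]:
        # A frozen snapshot takes priority over the live stream.
        return self.frozen_frame if self.frozen_frame is not None else self.live_frame

    def annotate(self, author: str, x: int, y: int) -> None:
        # Both the worker and the helper may add annotations.
        self.annotations.append((author, x, y))


view = SharedView()
view.push_frame("frame-1")
view.freeze()
view.push_frame("frame-2")         # the stream continues in the background
shown_while_frozen = view.displayed()
view.annotate("worker", 10, 20)
view.annotate("helper", 5, 5)
view.unfreeze()
shown_after_unfreeze = view.displayed()
```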
PopArm [39] is a solution based on a robotic arm that seems to pop out from a video streamed between two remote users. The robot arm is synchronized with the remote instructor's arm movements, and it moves and rotates on the local user's display, enabling the off-site user to point to and touch remote objects in the local user's environment. PopArm is useful when there is a need to point to specific objects remotely.
RemotIO [40] consists of two head-mounted cameras, one for the helper and one for the novice user. The local user shares his view; the video is then used for immersing the expert into the worker's field of view. The expert's hands are tracked with a depth sensor and superimposed in real time onto the worker's environment. The local user's hands and the superimposed expert's hands can be seen by both users. RemotIO aims to be used in the industry for remote maintenance and repair.
ExpertOnWheels [41] is a mobile telepresence robot designed to support collaboration between a field worker and a remote expert in the manufacturing industry. The architecture of the collaboration system is showcased in Figure 4. The mobile robot comprises a video camera for capturing the environment's details, a projector used for annotations, a speaker, a microphone for sound, and wheels for movement. A computer is the processing brain of the ExpertOnWheels (worker's side), which receives commands via the internet from the remote expert's PC controls. The local worker moves the robot only where intervention is needed.
To determine the task efficiency when collaborating remotely, Johnson et al. [42] conducted a comparative study for the use of handheld devices and head-mounted devices (HMD). The conclusions showed that, for static tasks, handheld devices were more suitable, but for dynamic tasks, head-mounted devices helped the user to finish a task faster. In another article, Johnson explored how the field of view impacts collaboration [43]. In this study, a robot controlled by a remote participant captured the distant environment at different angles. The results indicated that using a wide angle and a panoramic view contributes to quicker task completion.
The JackInHead system [44] tries to overcome the field of view limitation of the remote user by using headgear with multiple cameras to capture an omnidirectional video. The local user wears the headgear, while the remote user wears a head-mounted display with head tracking functionality.
Tait and Billinghurst [45] investigated how the independent view helps users to finish tasks faster when using an AR interface. The prototype allowed a remote user to navigate independently in the local user's space using a 3D scanned model of the local user's environment. The local user wore a head-mounted display that helped to render the annotations of the remote user. A four-video-camera system was used to track the objects from the local environment and the head movements of the host.
Smart Phone/Pad and Robot for Tele-operation and Tele-presence (SPRinT) [46] is another remote collaboration system using a robot located at a local user's site. The robot is equipped with a video camera and a projector to display the instructions of the remote user. The local user can adjust the robot and the projector position while the expert controls it using a mobile phone.
ADAMAAS [47] aims to determine the local user's actions in order to trigger instructions that are displayed as visual images or text on the HMD worn by the respective user. The system is composed of a head-mounted display with AR capabilities and an eye-tracking module.
In order to increase the situational awareness of the remote user, S. Kratz and F. Rabelo Ferriera [48] focused their research on improving the user's view using a mobile telepresence robot situated at the worker's site. The study's main interest was to see how quickly the job was completed when the worker was using an HMD with head-tracking and mono camera systems, while the helper was using a fixed 2D monitor to see the worker's space. The video feed, captured at the local site by a camera mounted on the robot pan/tilt servo system, was seen by the remote user on his HMD. Using a head-tracking module, the helper could control the robot and the camera's orientation so that he could see the remote environment from various angles.
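Driving a pan/tilt camera from head tracking reduces to mapping head orientation onto servo angles within the mechanical range. The sketch below illustrates that idea only; the angle limits, the `ServoLimits` type, and the function `head_to_servo` are assumptions made for illustration and are not taken from [48].

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ServoLimits:
    """Assumed mechanical range of the pan/tilt unit, in degrees."""
    pan_min: float = -90.0
    pan_max: float = 90.0
    tilt_min: float = -45.0
    tilt_max: float = 45.0


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def head_to_servo(yaw_deg: float, pitch_deg: float,
                  limits: ServoLimits = ServoLimits()) -> Tuple[float, float]:
    """Map head yaw/pitch to pan/tilt commands, clamped to the servo range."""
    pan = clamp(yaw_deg, limits.pan_min, limits.pan_max)
    tilt = clamp(pitch_deg, limits.tilt_min, limits.tilt_max)
    return pan, tilt


# A head pose inside the range passes through unchanged;
# a pose outside the range is limited to the nearest reachable angle.
inside = head_to_servo(30.0, -10.0)
outside = head_to_servo(120.0, -60.0)
```

A real system would typically also smooth the head-tracking signal before sending commands, which is omitted here.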
Most of the previously described prototypes concentrate on the remote user's awareness rather than the local user's reactions. In contrast, Empathy Glasses [49,50], proposed by Lee et al., focus on acquiring the local user's heart rate and facial expression as well as his gaze and viewpoint. To accomplish this, the system combines a see-through display with a head-mounted camera, an eye tracker, a facial expression tracker, and a heart rate monitor. The helper sees all the information acquired from those modules on a remote desktop interface.
Traditional remote collaboration solutions have examined what a single user sees in his local space, but JackInSpace [51] utilizes a new approach that allows a remote user to switch between the view of different local users (body users). In the prototype, the local user's side includes a head-mounted fisheye camera and depth sensors that capture the data and recreate a virtual 3D environment for the ghost (remote user). The remote user can enter the first local user's view to analyze the environment and, afterwards, can switch to another local user's view to see the scene from a different angle. The ghost wears 3D glasses to see the virtual perspective of the body user, displayed on three screens with the help of three projectors, an HDMI splitter, and a motion capture system.
MirrorTablet [52] is a low-cost system used for capturing hand gestures and is proposed for situations where both the helper and the worker are equipped with a mobile device (tablet). The hardware setup on the helper's side is composed of a mobile tablet on top of which is positioned a mirror within the field of view of the tablet's video camera. The mirror reflects the tablet's screen and the image of the hand gestures to make them visible to the camera. The video data captured from the worker are transmitted to the helper, and the instructions from the helper are overlaid on the worker's screen.
Unlike traditional remote collaboration systems, where the communication is done face to face, Gutsy-Avatar [53] allows the remote user to have the same view as the local one in an unusual way. This is performed using an electronically controlled t-shirt. The system architecture is presented in Figure 5. A local user wears a t-shirt embedded with a smart device/tablet with an attached camera. The device captures the local user's surroundings and displays them to the remote user. Behind the tablet, there are four servo motors commanded by a microcontroller, which receives commands from the expert's PC via the user's tablet. For the remote user, a PC with a video camera is needed to capture the user's face and send instructions related to the direction of the local camera view. The remote user changes the camera's view by sending commands to the server.
Microsoft Remote Assist [17–19] is based on a HoloLens app that enables the remote collaborator to see everything that the local user is seeing (including holograms and the real environment) and to add annotations that can be seen by the local user. The host wears a Microsoft HoloLens device equipped with a video camera and depth sensors capable of tracking hand gestures and eye and head movements.
Remote Manipulator (ReMa) [54] focuses on reproducing a local object's manipulation at the remote site using a proxy object. At the tracking site, the object's manipulations are captured by infrared cameras and are transmitted to the manipulator's site, where they are displayed on a similar object, called a proxy object. The proxy object is automatically oriented by a robotic arm to reflect the position from the tracking site.

SharedSphere, proposed by Lee et al. [55], uses mixed reality (MR) to add gestures and cues to a 360-degree live panorama video. The local user wears a see-through AR HMD with an attached 360-degree video camera. The guest user wears a virtual reality (VR) HMD to watch the live panorama scene. A head tracking sensor is mounted on the host's side so that the remote user can have an independent view of the local user's environment even if the local user moves his head. The system uses view frames to indicate where each user is looking.
Unver et al. [56] addressed the use of multi-cameras for remote collaboration. The study compared the efficiency of using handheld devices over the hands-free experience in the context of using multiple video cameras. The system is composed of three types of cameras focused on the room (space), task, and the collaborator's face. Cameras could be switched manually or automatically depending upon the activity in the room. The study showed that task performance was the same when using handheld devices compared with when the user had his hands free, but users preferred the flexibility of the hands-free setup.
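Switching between the room, task, and face cameras, either manually or based on activity, can be sketched as a small selection rule. The source does not state how activity was measured, so the per-camera activity scores, the `margin` used to avoid rapid flipping, and the function name `select_camera` are all assumptions made for illustration.

```python
from typing import Dict, Optional

CAMERAS = ("room", "task", "face")


def select_camera(activity: Dict[str, float],
                  manual_choice: Optional[str] = None,
                  current: str = "room",
                  margin: float = 0.1) -> str:
    """Pick the camera to show.

    A manual choice always wins. Otherwise the most active camera is
    selected, but only if it beats the current one by `margin`, so the
    view does not flicker between cameras with similar activity.
    """
    if manual_choice is not None:
        if manual_choice not in CAMERAS:
            raise ValueError(f"unknown camera: {manual_choice}")
        return manual_choice
    best = max(CAMERAS, key=lambda name: activity.get(name, 0.0))
    if activity.get(best, 0.0) - activity.get(current, 0.0) > margin:
        return best
    return current


# Clear activity at the task camera triggers an automatic switch.
auto = select_camera({"room": 0.2, "task": 0.9, "face": 0.1})
# A small difference is not enough to leave the current camera.
sticky = select_camera({"room": 0.5, "task": 0.55, "face": 0.1}, current="room")
# A manual selection overrides the activity scores.
manual = select_camera({"room": 0.2, "task": 0.9, "face": 0.1}, manual_choice="face")
```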
In the context of an industrial environment, S. Kesavan [16] proposed a video collaboration platform for remote interaction based on mobile devices and web cameras. The system has three main components: a dashboard receiver, a cloud service, and a plant operator. If a worker needs assistance in configuring industrial equipment, the machine's information is loaded onto a dashboard receiver, where it can be accessed by a remote expert. A cloud service hosts the APIs, which the collaborators can access to connect to the dashboard and operate. The system allows the operator to connect with the expert using a tablet device with a camera and to point the camera at the malfunctioning equipment. The collaborators can exchange information through video and audio channels and text messages through the chat functionality.
Kwon et al. [57] defined a fully asymmetric remote collaboration system where the remote user identified the local user's problem and had full control of the worker's environment while the worker had limited responsibilities. On the local user's side, a wheeled robot controlled remotely by a handheld device was equipped with a projector, a high-resolution camera, and a 360-degree camera. The remote user wore an HMD device that displayed information about the local user's surroundings captured by the robot. The remote user was capable of zooming in and out, pointing to objects, or adding annotations that could be projected onto the local user's environment.
The system proposed by Teo et al. [58] is composed of an AR HMD device with a 360-degree camera on top, on the local user's side, and a VR HMD on the remote user's side. The local user captures 360-degree panoramas of the local environment and sends them to the remote user, where they are displayed as a prebuilt 3D scene. Verbal communication is accomplished through the speakers of the HMD, and for non-verbal communication, the remote user can use the VR controller for pointing and instructing the local user.
OmniGlobe [59] uses a VR HMD device on the worker's side and, on the other side, a spherical display with a 360-degree camera, which is located on a rotating platform, allowing the specialist to move around. The system has a 360-degree first-person mode that allows the expert to see the VR environment from a panoramic view, and a third-person mode that makes it possible to see the environment from a higher elevation. Communication is enhanced because the system lets the users share body gestures, gaze cues, and facial expressions.
Mohr et al. [60] proposed an MR collaboration system using mobile devices that uses light fields for orientation. The system records the local space and sends images to the remote user, who annotates them and sends them back to be visualized in AR by the local user.
Table 1 presents the distribution of the aforementioned articles and the system components used in the proposed solutions. Interested readers can quickly identify the works that address certain areas/tasks and the methods/equipment on which each remote assistance/collaboration solution is based.

Visual Display Technologies
In this section, we focus on methods that are used to share the visual space between two distant collaborators.
The most common method used in remote assistance and collaboration is a simple video camera with the ability to stream a 2D video, which is shown on a monitor at the helper's side. In order to give the remote collaborator the feeling of being collocated with the local user, the researchers also experimented with the use of 3D videos, 360-degree panoramas, virtual reality, augmented reality, and mixed reality. For each of the last three technologies, we will also provide a short definition within our discussion for a better understanding.
Most remote collaboration systems use video cameras installed in the local user's (worker's) environment. The video captured by these cameras is shared in 2D on a display such as a PC, TV, or smartphone. However, this method does not allow the remote user to understand the spatial relationships between objects; thus, communication is more difficult.

3D View
Studies have been performed to obtain a 3D view that is presented to both operators. In one of these studies, G. Welch [61] presented a prototype that uses an array of cameras to obtain a three-dimensional view of a remote environment.
OmniKinect [62] also tries to obtain a 3D view, but instead of using multiple cameras, it uses Kinects, which have depth sensors, to acquire real-time video.
The technology 3D Helping Hands [63] uses special cameras that are capable of 3D video capture in real time on both the worker's and the helper's side. The 3D videos acquired from the worker and the helper are then fused in real time to form a single, common workspace with an augmented view.
The prototype presented in [64] uses depth cameras in the worker's space to capture the worker's position and environment in 3D. The cameras are connected to a fusion PC that renders the video stream with the helper's visual instructions.
Gauglitz [65] presented a new model of a remote collaboration system, where the local user has a smart device (tablet or smartphone) that captures the video information and sends it to the distant user. The remote user uses AR and adds annotations, which are overlaid on the source video and displayed on the local user's device.
Gao et al. [66] presented a mobile-based collaboration system in which the local user has a mobile phone equipped with a depth sensor in order to capture the local environment and display it in virtual reality (VR) as a 3D environment for the remote user. The system allows the expert to move independently in the virtual environment and to switch the view of the local user's smartphone to show the local changes in real time in the form of a 2D video.

Virtual Reality (VR)
Virtual reality is a visual technology that allows a person to immerse themselves completely in a digital world, independent of the physical environment, using special devices. The digital environment is computer-generated and contains objects and scenes that appear to be real. These environments can be used in training, games, or other live broadcast events. The user is equipped with a VR headset to access the applications that support VR. Popular VR devices include Oculus Rift [67], HTC Vive [68], and Oculus Quest [69].
Traditionally, immersive technologies have been mostly used to facilitate guidance during the exploration of an unfamiliar environment. From 1970 to 1990, VR devices were used for flight simulation, military training, and medical purposes. In the decade following 1990, these devices started to be commercialized for entertainment purposes. In recent years, VR devices have become more affordable. The combination of affordability and COVID-19 restrictions has fueled an enormous rise in their use [70].
CoVar [71] is a mixed reality system that supports collaboration between augmented reality and virtual reality users by sharing a 3D reconstructed environment. The VR user wears an HTC Vive Lighthouse device, which displays a reconstructed mesh. The AR user captures his real, local environment and shares it with a remote augmented virtuality user. Both collaborate on the tasks in the shared space. The system enhances the communication by using awareness cues (field of view and gaze cues) and an AV-Snap-to-AR interface, which enables the AV user to control his head orientation. The interaction between the users is improved by eye-gaze, head-gaze, and hand gestures.
Gao et al. [72] presented a prototype for remote guidance that uses VR headsets for both the host and the remote user. The workspaces of the worker and the helper are captured using camera depth sensors, which results in sharing 3D environment data for the remote user, who can also track the worker's viewpoint. At the remote user's site, the sensors capture the helper's hands to indicate the instructions that have to be followed by the worker. Thus, in the shared virtual space, the worker sees the helper's hands overlaid on his own hands.
The study in [73] focused on allowing the helper to have an independent view and study objects from any direction in a shared environment by capturing and reconstructing a copy of the local worker's space in 3D. Before the collaboration starts, the local user moves around their space using the VR device to capture all the details of the local environment. The remote helper can see the scene in a VR environment.
Elvezio et al. [74] proposed a remote collaboration in AR and VR intended to allow the remote expert to create or manipulate virtual replicas of the real objects. The remote user gives instructions to a local user by manipulating, pointing, or annotating the virtual objects in his VR environment. The same annotations appear in the AR environment of the local user. Both users wear a head-worn display (HWD).

Augmented Reality (AR)
Augmented reality places digital objects or annotations onto the real world. Historically, augmented reality was used with HMD devices capable of displaying digital content overlaid on a user's view of the real world.
In 1957, augmented reality appeared in the form of the Sensorama [75], an invention that could deliver visuals, sounds, vibrations, and smells to a viewer. Nevertheless, the term AR started to be used in 1990, when several workers wearing HMDs were guided in assembling electrical wires in aircraft. Over the last decade, different applications and devices that employ AR have emerged; for example, the design tool ARToolkit appeared in 2009, Google Glass in 2013, and HoloLens in 2015 [75].
One system that implemented AR was the Télé-Assistance-Collaborative system developed by Bottecchia [76], in which the operator was equipped with a specific AR display device. Its design enabled it to capture a video stream of exactly what the wearer's eye saw (flow A) and a wide-angle video stream (flow B). On the other side, the expert had a catalogue of images and annotations that he could apply to the video stream seen in real time by the operator. J. Gu [12] presented another solution where AR was used with mobile devices to overlay objects onto the real environment.
In most collaboration systems where AR was utilized, head-mounted displays were employed. Schneider et al. [77] proposed an AR application that used edge computing. This solution consisted of three main components. The local user had a mobile AR device that captured images of the site and displayed the incoming video. These images were processed by an edge server, which sent them to the remote user and then back to the operator after overlaying the expert's annotations. The helper used a laptop/PC that displayed the incoming video stream and a software program that allowed him to draw instructions on the images.
Zillner et al. [78] focused on the dense reconstruction of the local user's surroundings by generating a three-dimensional mesh automatically. The local user wore a pair of AR glasses equipped with a depth sensor used for scene reconstruction. The remote user could explore the scene independently, add annotations, and create 3D animations.
SceneCam [79,80] was a multi-camera remote collaboration system that allowed a remote user to have multiple views of the task space. The local user wore an AR HMD that collected information about the local space and ran an algorithm that determined which camera view best captured his actions. Based on this selection, the system drew the remote user's attention to the optimal view. The system also implemented automatic camera view selection, in which the view chosen by the algorithm became the remote user's primary view.
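The idea of choosing the view that best captures the worker's actions can be illustrated with a minimal sketch. The scoring rule below is an assumption made for illustration and is not the algorithm used by SceneCam: each camera is scored by how close the worker's hand position lies to its optical axis and how near it is to the camera. The names `Camera`, `view_score`, and `select_view` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    name: str
    position: tuple  # (x, y, z) in metres
    forward: tuple   # unit vector along the optical axis


def view_score(camera, target):
    """Assumed heuristic: higher when the target is near the optical axis and close."""
    offset = tuple(t - p for t, p in zip(target, camera.position))
    distance = math.sqrt(sum(c * c for c in offset))
    if distance == 0.0:
        return 0.0
    cos_angle = sum(o * f for o, f in zip(offset, camera.forward)) / distance
    if cos_angle <= 0.0:
        return 0.0  # the target is behind the camera
    return cos_angle / (1.0 + distance)


def select_view(cameras, target):
    """Return the camera with the highest score for the tracked target position."""
    return max(cameras, key=lambda cam: view_score(cam, target))


cameras = [
    Camera("front", (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    Camera("side", (5.0, 0.0, 1.0), (-1.0, 0.0, 0.0)),
]
hands = (0.0, 0.0, 1.0)  # hypothetical hand position reported by the AR HMD
best = select_view(cameras, hands)
```

In a system with automatic view selection, the returned camera would become the remote user's primary view; in an attention-guiding variant, it would only be highlighted.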

Mixed Reality (MR)
Unlike VR, which implies a complete immersion experience that shuts out the physical world, mixed reality (MR) is a method that mixes the real world with the artificial world. Digital objects are superimposed, and images are overlaid onto the real world. MR systems enable distant collaborators to feel as though they are in the same space. The user needs a headset capable of MR (e.g., Microsoft HoloLens) in order to experience virtual objects interacting with the real world or physical view.
Yang and Peng in [81] presented a remote collaboration solution based on mixed reality. The real objects were virtualized using computing virtualization and displayed at the remote user's site. The local user could see how the remote user interacted with the virtual objects and could take the same actions on the real objects.
To increase the collaborators' shared view and reduce the ambiguity of deictic expressions, Muller et al. [82] proposed the use of shared virtual landmarks in a mixed reality environment.
Feick et al. [83] proposed a design for an MR system where both the novice and the expert wore head-mounted devices with video capturing capabilities. Based on the object of interest from the novice's environment, a proxy object is created in a virtual environment that can be seen by the expert in a split display together with the remote live video feed. The novice can also see the virtual proxy object and the expert's gestures.
Teo et al. [84,85] proposed an MR collaboration system that used a live 360-degree panorama in a 3D reconstructed scene. A remote user could choose the way in which he interacted with the local user's environment, either via a live 360-degree panorama video, using past static images captured from the 360-degree panorama, or using the 3D reconstructed scene. In [86], the authors compared the results obtained when allowing the user to switch between the 3D view and the 360-degree panoramic view of the local environments. The results showed that users reported better concentration when solving their task in the 3D mode and a good understanding of the collaborator's focus in the 360-degree mode.
In addition to the capabilities of the system proposed by Teo et al. in [84], Gao et al. [87] allowed the local user to share a 2D first-person view of the local environment. The local surroundings were captured as a 3D scene and displayed on the remote VR HMD. The expert could analyze the virtual space from three different perspectives: the 2D first-person view, a 3D static view, and a 360-degree panoramic view. The results showed that the users preferred the 360-degree view, as it offered more control and independence compared to the 2D view.

360-Degree Panorama View
Another display option for collaborating remotely is the 360-degree panorama view, where the expert user can see all the surroundings of the local user.
LiveSphere [88] is a remote collaboration solution that consists of a system with wearable camera headgear that provides 360-degree spherical images of the local user's surrounding environment. The headgear has six video cameras, and the video streams are fused into a spherical video that is transmitted to a remote user. This setup allows the off-site user to see the local user's view and also to have the freedom to look around via the on-site user's view.
JackInHead [89] allows a local user to send an omnidirectional video to a remote user, who has an independent view of the local user's space. The prototype involves the use of headgear with multiple cameras worn by the local user and an HMD worn by the remote user. The omnidirectional video resulting from the images captured by the headgear is sent to the remote user, where it is mapped in a spherical virtual space.
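Mapping an omnidirectional frame into a spherical virtual space can be sketched as follows. This is a minimal illustration that assumes the frame is stored in equirectangular layout, which is a common convention and not necessarily the one used by JackInHead; the function name and the axis convention (y up, z forward) are assumptions.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Convert a pixel (u, v) of an equirectangular frame to a unit direction.

    Assumed convention: u spans longitude from -pi to pi (left to right),
    v spans latitude from +pi/2 to -pi/2 (top to bottom), y is up, z is forward.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


# The centre of the frame looks straight ahead; the top row looks straight up.
centre = equirect_pixel_to_direction(960, 480, 1920, 960)
top = equirect_pixel_to_direction(960, 0, 1920, 960)
```

Texturing the inside of a sphere with these directions lets the remote user's HMD orientation select which part of the frame is visible, which is what gives the remote user a view that is independent of the local user's head motion.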
SharedSphere [90,91] is a system used in interactive collaborations and allows 360-degree panorama video live sharing between two users. Hand gestures are overlaid on the live video; thus, mixed reality is used to enhance the collaboration. The guest user wears a VR HMD on which a high-resolution camera and a second 360-degree panorama camera are positioned. The video captured at the local site is sent to the host user, who wears a see-through HMD. The guest user can see the local user's hand gestures thanks to the video cameras, but the hands of the guest user need to be tracked with a hand-tracking sensor in order to be seen by the host user.
Kangas et al. [92] proposed a remote collaboration solution that enabled the remote expert to have a 360-degree view of the remote environment while instructing the local user with the help of a projector equipped with a close-up attached camera and pointer capabilities. This setup allowed the expert to analyze the local user's environment independently.
In contrast with other remote collaboration systems that offered a 360-degree view that allowed interaction between only two users, 360Anywhere [93] allows multiple users to interact remotely by annotating a 360-degree video and projecting it back to the local users using AR. The remote collaborator has a total view of the local environment without the need for the local user to move or adjust the cameras. Besides adding annotations, the user can rewind the video, chat with the remote user, calibrate the system, and track the collaborator's gaze.
On the Shoulder of the Giant [94] is a multi-scale mixed reality system sharing a 360-degree panoramic video that facilitates collaboration between a local VR user and a remote AR user. The study presents two modes of collaboration realized by the system: one shares a 3D reconstruction with the remote user, and the other shares a panoramic video at a different scale, using a 360-degree camera controlled by the remote user. Both users are allowed to switch between the two modes of collaboration. The use of the 360-degree camera was tested in different positions: head, back, hand, and shoulder. The most preferred camera position was on the shoulder, which allowed the user to see the collaborator's face and environment.
Table 2 shows the distribution of the articles and the display viewing technologies used in remote collaboration by application field. It can be seen that, for maintenance, object manipulation, or other physical tasks, the visual display methods are important for both remote collaborators to have a clear view of the environment or the instructions that have to be followed.
Similar to the previous section, in the following table, interested readers can easily and quickly identify the papers that address certain areas/tasks and the methods/equipment on which the solution of remote assistance/collaboration is based.
As can be seen in the table, in this context (visual display technologies), most of the researchers focused on systems that allow a helper to give instructions on tasks involving object manipulation.

Communication Methods and Cues
In face-to-face collaboration, what matters to the collaborators exchanging information is not only what a person is saying but also where they are looking and what their hands and body gestures are showing.
Fussell [95] showed that, for remote collaboration, video communication is better than audio alone. For a helper-worker pair involved in a bicycle repair task, Fussell compared task performance in two scenarios: one in which communication relied on visual information and another in which it was audio-only. The results showed that a shared visual space was essential for the collaborative task.
In addition, in order to establish common ground, users need to use communication cues such as speech and non-speech audio, gaze, facial expression, and hand and body gestures. Many of these cues can be captured using different video telecommunication systems, such as Skype, Google Hangouts, or Zoom. However, in some cases, the remote user needs to point to a specific object or draw specific instructions; here, pointing gestures and representational gestures (annotations) are important [96]. Kirk and Stanton demonstrated how remote gestures influence the structure of collaborative discourse and how their use can also influence the temporal nature of the grounding process [97].
In [98], Kirk and Stanton performed a study on three different gesture formats. The research compared the performance of unmediated hands only, hands and sketch, and digital sketch only.
We divide the following section into two subsections. In the first, we present studies on pointers and annotations, and in the second, we look at behavior detection.

Pointers and Annotations
One of the first methods applied in remote assistance was a reality-augmenting telepointer controlled remotely to guide the local user [26]. The results of this approach showed that a remote user could effectively guide and direct a local user's activities. The participants of the study conducted in [99] for remote construction of a robot valued the pointing device because it helped them to make references to objects faster than they could with video.
The visible light path laser projector (VLLP) described in [100] was equipped with a laser projector and a mist generator. The advantage of this solution was that the VLLP could give instructions not only with a laser spot but also with simple line drawings.
A new setup was presented in [101], where a remote expert could annotate, point, and draw on the local worker's objects using a laser projector. At the time that it was developed, this method was intended to support pre- and post-surgical consultations between a surgeon and a remotely located patient.
A prototype system that enables a mobile worker to receive guidance from a remote helper in real time using freehand sketches projected directly onto the worker's environment is presented in [31].
A comparative study of the use of pointers and annotations in live video and still images was performed in [102]. The results showed that users collaborate more efficiently using annotation cues than pointer cues for communicating object position and orientation information.
H. Jo [103] showed how Chili, a mobile phone call system, can provide on-video drawing capabilities and control of the viewpoint.
S. Kim showed, in his research on augmented visual communication cues [104], that both pointers and annotations could improve the feeling of being connected, of understanding the remote partner, and of being together.
StickyLight [105] is a system based on a pico projector that allows a remote expert to draw annotations in the local user's real environment. The expert can draw and point via his tablet, which is connected to the local user's PC. The pico projector is paired with the local user's computer and projects what the remote user is drawing.
Fakourfar et al., in their study about stabilized annotations [106], found that, by freezing the video, the temporal stabilization of the annotations was efficient if the camera did not have a fixed position. In this scenario, it could be difficult for the remote collaborator to return to live video if the viewed perspective was changed. Stabilized annotations proved to be useful when referencing objects but not when pointing or simulating object manipulation. The research found that, overall, users preferred stabilized annotations, even though they did not outperform non-stabilized annotations in all tasks.
In [107], a user was able to navigate in a virtual space in order to collaborate with other users, making use of 2D gestures in 3D reconstructed scenes. This method can be applied in virtual and augmented reality.
Rice et al., in their study [108], proposed a remote assistance platform that gave the remote user the ability to use video and virtual annotations to give explanations to the other user. The platform was compared with an instant messaging application that could be used to send text instructions for each captured image. The results showed that visual annotations are valuable when objects have a similar appearance.
AlphaRead [109] is a tool developed to reference objects by annotating them in a remote collaboration context. The system introduces the feature of object tracking, a solution for cases when the objects or the camera are moving. Each object has a readable label that the users can read in order to give instructions. This study showed that users found the object annotations useful and readable.

Behavior Detection
In this section, we look at studies that focused on the behavior of the human body in a remote collaboration context. Fussell [27] proposed a method where the helper could follow a local worker's gaze through an eye tracker and head-mounted camera worn by the worker. However, this system had a problem: users could not make eye contact, and gaze awareness was lost. To improve this aspect, a conventional telepresence system was developed in [110], based on a see-through display that gave local and remote users the sensation that they were separated only by a vertical sheet of glass. This offered the ability to make eye contact and use non-verbal communication such as gaze and gestures.
As augmented reality is starting to be used more often, researchers are attempting to use it to overlay hands on a video stream. In [111], a solution is presented allowing hand gestures to be identified using augmented reality, and this method enables intuitive interaction during real-time computer vision.
HandsOnVideo [112] gives helpers the possibility to point to objects using their hands. The remote helper's hands are overlaid onto the video stream, on which the local user is able to see the instructions. An improved version of HandsOnVideo is proposed in [63], which introduces hand gestures in a 3D context and allows the remote helper to use his hands directly to gesture. A special camera is used to capture hand movements in 3D.
BeThere [113] is based on a mobile smartphone configuration and depth sensors that allow two remote users to perform 3D mobile collaboration. This solution allows the remote user to perform 3D gestures in a shared environment and navigate in the 3D shared environment. In order to direct the user's attention, the system uses awareness cues such as helping hands and 3D annotations.
In his research, Zenati-Henda [114] showed how a gesture-recognizing module could detect and recognize the gestures of a remote user. The system recorded the position and the description of the gesture and displayed them as virtual hands on the local user's display.
In their study, Gupta et al. [115] tried to evaluate whether sharing the eye-tracking information of the local worker could improve performance during a remote assistance task. The study combined an eye tracker and a pointer and showed that the cues could enhance the feeling of co-presence between remote users.
Another approach to using both the eyes and the hands was proposed by Higuchi et al. [116]. The study addressed the problem of eye fixation during a remote guidance operation. Helpers used their eyes to identify the object of interest, and the hands were projected onto the local user's space to provide instructions for the manipulation of the objects.
A study by Li et al. [117] explored how shared gaze awareness impacted remote collaboration. The results showed that shared gaze information could be disruptive but improved remote coordination.
SharedSphere [90,91] is a prototype in which both the local and the remote user's hand gestures are captured and displayed on head-mounted devices worn by the users. The local user's hand gestures are captured by a video camera, while those of the remote user are captured by a hand-tracking sensor.
The study in [118] researched how gaze sharing in both directions influenced collaboration and communication between two remote participants. An eye tracker module was placed on the HMD of the local user and on the display of the remote helper. Results for most of the users showed that the system improved their awareness of their partner's focus, while some users mentioned that they could determine what the next step for their job execution would be.
HandsInTouch [119] combines gesture sharing with sketches in order to collaborate remotely. The study shows that hand gestures might be sufficient when performing easy tasks, but sketches together with gestures give better results when used for complex physical tasks involving objects.
Otsuki et al. [120] focused on how gaze cues support remote collaboration when using the ThirdEye display. ThirdEye is a hemispherical display that helps the local user to determine the gaze direction of the remote helper, which is then displayed on a mobile terminal. The experiment showed that the local user's attention is drawn to the objects of interest faster when using ThirdEye than when not using it.
Omnigaze [121], another telepresence system, uses an omnidirectional video camera that has a spherical display on top of it. An eye tracker positioned at the remote user's location captures their gaze direction, which is then represented as information on the spherical display. This system enhanced the remote collaboration to some extent; however, it had many disadvantages, one being that the remote user could see the local user but not vice versa, which caused discomfort.
Wearable RemoteFusion [122] shares eye gaze and hand gestures in an MR environment. The local user can see in AR the hand gestures of the remote user guiding him, while the remote user has a view of the local worker's eye gaze in VR. The study showed that users had an above-average feeling of co-presence, while the remote user was under less psychical load and concentration.
Wang et al. [123] investigated whether sharing the eye gaze or the head pointer improved performance when resolving a task between two remote collaborators. The results showed that there were no significant differences between the two methods. Thus, the head pointer method, which is less expensive, could be used instead of the eye tracking one.
ZoomTouch [124] is a system that allows multiple users to remotely control a robot using a hand-tracking module. Recognized hand gestures are used to control the robot, which has embedded tactile sensors.
Xiao et al. [125] reviewed eye-tracking prototypes categorized by their functionalities, subject used, and physical task types.

Combining Visual Communication Cues
The use of annotations and body cues has been studied separately, but there are several other studies that focus on the mixed use of visual communication cues, both related to body gestures and annotations.
Teo et al. [126] studied the use of hand gestures and annotations overlaid on a live 360-degree video. The system was based on MR; the local user wore an AR HMD, and the remote user wore a VR HMD. The remote user's gestures were tracked using a sensor and displayed in a 360-degree video feed as virtual hands. Besides the hand gesture cues, the user was allowed to use annotations, and his pointing gestures were captured and displayed as a ray pointer as seen through the local user's AR device. The annotations had fixed spots both in the virtual and the real environment, even if the local user's head direction changed. The study showed that the participants could finish the task faster and understand the remote instructions better when using visual annotations.
Kim et al. [127] researched the effect of combining visual communication cues such as hand gestures, sketches, and pointing in a mixed reality remote collaboration. The study showed that using sketches and hand gestures gave the best results regarding task performance, but the pointing cues did not improve the performance significantly. Moreover, the sketches and the pointing cues involved higher mental effort for the users.
The research of Teo et al. [128,129] focused on proposing communication cues when using an MR system based on a 360-degree panorama reconstructed in a 3D environment. The remote user could control a virtual ray pointer using hand gestures or add sketches (drawings), which were displayed through the AR HMD of the local user and overlaid onto the 360-degree panoramas seen by the remote user.
Bai et al. [130] modeled the eye gaze of the remote user as a virtual ray cast line and the remote user's hand as a 3D mesh; both were overlaid onto the local user's AR view. In order to help the users to identify their partner's location and viewing direction, the authors proposed the use of a 3D arrow cue and a virtual avatar as a virtual head frustum that points to the location of the other user. The head frustum represents the other user's head direction, while the 3D arrow is modeled as a pin arrow pointing to the other user's avatar head. The results showed that when combining hand gestures and eye gaze, the task completion time was reduced, and the users had a better feeling of co-presence. For the local user, the required mental effort was considered lower when using the gesture cues, but both users said that they preferred combined cues over gesture or gaze alone.
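The geometry behind an arrow cue that points toward the partner can be sketched in a few lines. This is an illustrative simplification and not the implementation of Bai et al.: the arrow is taken as the unit vector from the user's head to the partner's head, and the partner's visibility is tested against a viewing cone rather than a full head frustum. The function names and the default half field of view of 30 degrees are assumptions.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


def arrow_direction(own_head, partner_head):
    """Unit vector from this user's head toward the partner's avatar head."""
    return normalize(tuple(p - o for p, o in zip(partner_head, own_head)))


def partner_in_view(own_head, own_forward, partner_head, half_fov_deg=30.0):
    """True if the partner lies inside a viewing cone around own_forward."""
    to_partner = arrow_direction(own_head, partner_head)
    forward = normalize(own_forward)
    cos_angle = sum(a * b for a, b in zip(to_partner, forward))
    return cos_angle >= math.cos(math.radians(half_fov_deg))


# Show the arrow only while the partner is outside the user's current view.
head, forward = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
partner = (2.0, 0.0, 0.0)
show_arrow = not partner_in_view(head, forward, partner)
```

Hiding the arrow once the partner is in view is one possible design choice; a system could equally keep the cue visible at all times.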
In Table 3, we classify the publications that address the use of communication cues in remote collaboration and assistance by the application and the communication method employed to accomplish the given task. In this case, the synthesis in Table 3 allows the straightforward identification of scientific papers that present a method applied in a certain field. With the advent of new 5G communications technologies, it is possible that researchers' efforts will focus on such methods. Moreover, many more applications will be developed, and the global state of the pandemic will accelerate research in these areas.
Table 3:
Wiring Assembly: [108]
Education: [111], [110]
Industry: [112]
Printer Assembly: [114]
Design: [124]
Objects Manipulation: [126,129], [126,129,130], [120]
Physical Task: [125]
Surveying the scientific literature, it can be observed that efficient communication presupposes sound, image, and orientation towards the object being discussed. Digitization together with remote assistance and collaboration to accomplish tasks will be the future challenges for technological development.

Discussions
In this literature review, we identify the main applications where remote assistance is used. Table 4 represents the type of engineering applications described and utilized as case studies throughout this survey. Table 4. Applications of remote collaboration system.

| Applications | Article Reference |
| --- | --- |
| Healthcare | [1-10,61,101] |
| Industry | [13-21,30,31,36,38,40,41,92,108,112] |
| Education | [11,12,26,29,34,39,47,53,95,96,99,110,111,121,125] |
| Assembly Task | [28,35,37,42,48- |
| Design | [59,118,124] |

To summarize the current literature review, Figure 6 depicts the trend for devices used to display the working context, surveying the development of the industry over the periods 1999-2006, 2007-2010, 2011-2015, and 2016-2020. We observe an increase in the production of device types over all periods for all the studied technologies (HMCs, HMDs, video cameras, and smart devices and displays), with the exception of robotic devices and projectors, which are not as widely examined over time, perhaps because of their complexity and automation needs. We also observe a decrease in the use of discrete video cameras and displays, since these functions are integrated into HMCs and HMDs. The tendency is towards the smallest form factor of the product; HMDs were the dominant solution for the period 2016-2020. The interest in LCD displays and video cameras decreased in the periods 2011-2015 and 2016-2020. Few HMDs and HMCs were identified during 1999-2006 because integration and miniaturization techniques were still at an early stage. We can state that the emergence of these devices (HMCs, HMDs, smart devices such as tablets, phones, and remote controls, robotic devices, and projectors) started with a proof of concept, and the demonstration of their feasibility persuaded the manufacturing companies that the chosen solution was reliable.

The occurrence of communication cues over the same periods is presented in Figure 7. We can see that 20 years ago, pointing was used more often as a communication cue when collaborating remotely; in the last four years, the emphasis has been on annotations, gestures, and eye gaze. Overall, studies showed that, during a remote collaboration, tasks are completed better and faster when the visual workspace is shared between the participants.

Conclusions
From the remote collaboration systems presented above, several issues linked to the devices and video cameras can be specified:

• Using fixed cameras can reduce the field of view for the remote specialist and make it difficult to determine where the local user is looking.
• Head-mounted cameras (HMCs), besides being uncomfortable for the wearer, restrict the field of view for the helper, who can only see what the worker is seeing. On the other hand, HMCs allow the worker to use their hands to perform tasks.
• Handheld devices limit a local user's ability to use both hands freely, but the research shows that they are a good option when working on static tasks.
• Head-mounted devices proved to be suitable for dynamic tasks but need stabilization techniques for the captured video because the shaky images can make viewers dizzy.
• An independent view for the remote participant improves the viewer's confidence significantly, and less verbal communication is needed during the collaboration.
• Smartphone applications used in remote collaboration still have limited performance because streaming video places a heavy load on the network.
• No published works were found on the extent to which multiple experts in geographically distant locations can offer collaborative assistance for a machine, process, etc.
When comparing the visual display methods, mixed reality proved to be very useful for tasks that needed complex design instructions, and it decreased task time and mental effort. Reconstructing 3D scenes is challenging because they often have to be dynamically updated, which requires higher processing power.
Regarding communication cues, pointing was the best method to use when a user wished to be quick and precise when indicating an object. Annotation cues such as sketches proved to be useful, providing spatial information when manipulating objects. Hand gestures could express more information, such as pointing, emotion, appreciation, and shapes. Researchers found that users finished a task faster when using hand gestures than without them.
This study reviewed different prototypes, systems, and methods used in remote collaboration over the last two decades. We categorized and analyzed the systems and functions of the devices used in remote collaboration, the display view of the local environment, and the communication cues used between remote collaborators. We discussed the limitations and disadvantages of the current system components and methods to better understand why these kinds of systems are not used on a large scale in industry and to support future decisions when designing a remote collaboration system. Table 5 summarizes the benefits and the drawbacks of the devices used in remote collaboration.
Remote assistance began to be used in surgery more than 20 years ago, and it is now beginning to be employed in other fields, such as education or industry.
We have seen that the latest technologies used in remote collaboration are those that allow the remote user to have an independent view of the local user's environment and offer the capability of a 360-degree panorama view. These systems employ HMDs together with MR, AR, or VR. Smart devices such as mobile phones and tablets, even though they are more accessible than HMDs and have better capabilities (such as video streaming), are not employed on a large scale for remote assistance.
It can be concluded that there are many challenges in obtaining a remote collaboration system that gives remote users a controlled view, makes them feel as if they are immersed within the local user's environment, and lets them transmit instructions in a natural manner. Digitization, together with remote assistance and collaboration to accomplish tasks, will be the future challenges for technological development. Due to the new 5G communications technologies, researchers' efforts will certainly focus on such methods. Moreover, many more applications will be developed, and the problems caused by the pandemic situation will accelerate research in these areas.