Immersive Robotic Telepresence for Remote Educational Scenarios

Abstract: Social robots have an enormous potential for educational applications and allow for cognitive outcomes that are similar to those with human involvement. Remotely controlling a social robot to interact with students and peers in an immersive fashion opens up new possibilities for instructors and learners alike. Using immersive approaches can promote engagement and have beneficial effects on remote lesson delivery and participation. However, the performance and power consumption associated with the involved devices are often not sufficiently contemplated, despite being particularly important in light of sustainability considerations. The contributions of this research are thus twofold. On the one hand, we present telepresence solutions for a social robot’s location-independent operation using (a) a virtual reality headset with controllers and (b) a mobile augmented reality application. On the other hand, we perform a thorough analysis of their power consumption and system performance, discussing the impact of employing the various technologies. Using the QTrobot as a platform, direct and immersive control via different interaction modes, including motion, emotion, and voice output, is possible. By not focusing on individual subsystems or motor chains, but the cumulative energy consumption of an unaltered robot performing remote tasks, this research provides orientation regarding the actual cost of deploying immersive robotic telepresence solutions.


Introduction
The potential of social robots for educational applications is enormous, allowing cognitive outcomes that are similar to those with human involvement [1]. While many research efforts focus on aspects related to autonomous and cognitive robotics for education [2][3][4][5], enabling learners and instructors to control a social robot remotely and to immersively interact with their peers and students opens up further possibilities for effective lesson delivery, participation, and tutoring in the classroom. For the operator, who could be either a teacher or a tutor (peer) [6], directly interacting with students is crucial for acquiring non-verbal feedback and observing immediate reactions to evaluate their comprehension [7].
Educational research distinguishes various communication mechanisms between students and instructors, i.e., teachers or tutors, which include non-verbal clues that are visible to the instructor during the lesson [7,8]. These clues involve monitoring and tracking motion to different extents and the time that students spend looking at materials or looking away. Conversely, instructor feedback to students is equally important to the learning process [9], and could be transmitted to students using the verbal and non-verbal capabilities of the robot. A social robot with diverse interaction modalities would thus increase the quality and amount of feedback delivered to students.
Virtual reality (VR) and augmented reality (AR) technologies lend themselves perfectly to complement pure human-robot interaction (HRI) scenarios where a robot is controlled remotely. Operators using a VR headset combined with motion-based control can naturally translate their movements into input, which, together with the visual, acoustic, and further channels of expression, allows for a highly immersive interaction between operator and student. Technologies such as emotion recognition and face detection further enhance how the operator can perceive the student.
Due to the proliferation and high availability of mobile networked devices, such as smartphones and tablets, we also introduce, in addition to the VR-based telepresence solution discussed in [10], a mobile application for the location-independent operation of a social robot. Albeit slightly less immersive, an AR layer in a mobile application can also present much of the above information. Motion controls can be realized through touch or gyroscopic input.
The recent COVID-19 pandemic has resulted in remote teaching at an unprecedented scale, with telepresence and related technologies gaining further significance. However, the performance and power consumption associated with the involved devices are often not sufficiently contemplated, despite being particularly important in light of sustainability considerations. In addition to discussing VR- and app-based telepresence solutions for the location-independent operation of a social robot (cf. Figure 1), we thus perform a thorough analysis of their power consumption and system performance, examining the impact of employing the various technologies. By not focusing on individual subsystems or motor chains, but the cumulative energy and message flows within an unaltered robot performing remote tasks, this research provides orientation regarding the actual cost of deploying immersive robotic telepresence (IRT) solutions.

Figure 1. Immersive telepresence approaches that offer the operator different views and interaction modes: 3D/stereoscopic, motion-focused (VR, top right) versus 2D, touch-focused (App, bottom right).

Problem Description
Robotic telepresence in classroom settings has been the subject of research for several decades, resulting in different applied commercial solutions, such as Beam [11], Double [12], or Ubbo [13], along with many purely experimental approaches.
Zhang et al. [14], for instance, proposed a telepresence system based on a robot avatar controlled via a mobile-app solution where the robot fulfills the role of the teacher. In their work, they reviewed the major hardware and software components associated with telepresence systems. Other research approaches, such as that by Cha et al. [15], evaluated different design elements associated with robot and interface systems for deploying the platforms in classroom scenarios. Based on the Beam system, they analyzed different user experience challenges in teaching sessions through telepresence, such as communications, inclusion, or embodiment. Gallon et al. [16] presented an overview based on Self-Determination Theory (SDT), a psychological theory that was applied to understand the learning process and how to motivate and attract students' attention. Using Transactional Distance (TD) theory, they suggested a multi-device system comprising a tablet serving as a whiteboard and a computer for robot control. TD focuses on technology-enabled teaching and learning, determining the degree of psychological distance between the student and the teacher, including the three variables of dialogue, structure, and learner autonomy.
The robots in all of these experiments consisted of a display or tablet computer mounted on a mobile drive unit, only vaguely resembling a human. Even with this non-anthropomorphic shape, such robots prove to be valuable for education, and embodiment constitutes a cornerstone of student engagement [17]. The addition of other humanoid features to the robot is usually omitted due to bandwidth requirements, as well as an increased interaction complexity and higher cognitive load for users [15]. In terms of interface, the final operator is provided with information similar to that of videoconferencing software, i.e., audiovisual data from the local robot are transmitted to the remote operator, and vice versa. Some complex tasks can be performed autonomously, such as approaching students or navigating in the classroom [14].
The majority of studies focus on enabling telepresence in the first place and rarely take an immersive approach [18], which would involve dedicated social robots, as in [10], or adapted robots, as in [19]. IRT solutions are much more common in, e.g., drone operation [20], since, once airborne, more degrees of freedom (DoFs) can be easily mapped to motion and sensorics, and VR technology [21] can potentially be involved to convey a higher informational bandwidth.
VR elements in an IRT system for educational purposes add another dimension to curricular components, offering many promising applications [22]. For instance, individuals interacting with VR textbooks show better performance and obtain better feedback than under traditional or video conditions [23]. VR has also been successfully employed in physical education for people with disabilities [24]. However, such advantages observed in VR studies also require analysis from a Robotics in Education (RiE) perspective [25], from a curricular point of view [26], and in the broader perspective of Inclusive Education [27] in order to be able to provide equal opportunities for all.
Although most of the reviewed IRT approaches are well motivated and well intended, the literature mainly focuses on the analysis of bandwidth consumption, operator experience, and final user acceptability. Deployment cost and resource consumption are rarely contemplated fully, which, under sustainability considerations, is highly relevant and might result in different design decisions, choices, and concerns.

Challenges
When deploying robots outside of a laboratory context, power consumption constraints become a central concern. Depending on the application, the robot's battery can be drained quickly, which deteriorates the user experience. For instance, it is well known in the robotics community that vision systems and their associated sensors account for most of a system's overall consumption.
Modeling a robot's resource consumption is tackled similarly to modeling that of any other computer system [28]. However, the process is not straightforward when the robot is not meant to be tampered with and must remain unopened. In that case, it is necessary to first identify and estimate the power consumption of the platform's various hardware components. Subsequently, the power associated with software applications needs to be modeled, since different heuristics produce different levels of power consumption [29].

Research Question(s)
This article focuses on IRT for remote educational scenarios. By discussing a dedicated social robot's location-independent operation using (a) a VR headset and controllers and (b) a mobile application, we explore the following central aspects and associated research questions:
• RQ1-Applicability. How can we use immersive technologies, such as VR and AR, to promote engagement in remote educational scenarios involving robots?
• RQ2-Sustainability. How do IRT solutions fare in light of sustainability considerations?
  - RQ2.1 (explanatory). What is the cumulative energy consumption?
  - RQ2.2 (exploratory). What are the effects of different immersive technology types on robot performance?
  - RQ2.3 (exploratory). What are the deployment costs of each system?
By answering the above research questions and analyzing power consumption, performance, and other vital indices, this article provides a general overview and a set of guidelines on the potential and associated costs to consider when deploying different IRT solutions using both VR-based and AR-based approaches. A more detailed discussion of the underlying framework and an initial qualitative study on how the different stakeholders in educational settings, i.e., instructors, learners, and peers, receive IRT solutions can be found in [10].

Robot
This project's robotic platform is LuxAI's QTrobot [30], a humanoid robot with an expressive social appearance. It has a screen as its face, allowing it to display facial expressions and emotions using animated characters, along with 12 DoFs to present upper-body gestures. Eight DoFs are motor-controlled: two in each shoulder, one in each arm, plus pitch and yaw movements of the head. The other four (one in each wrist and one in each hand) can be manually configured. As shown in Figure 2, amongst other features, the QTrobot has a close-range 3D camera mounted on its forehead and is provided with a six-microphone array. The QTrobot is powered by an Intel NUC processor and Ubuntu 16.04 LTS, providing it with a native Robot Operating System (ROS) interface.

Robot Power Model
Throughout an HRI-based telepresence session in the classroom, the robot performs different interaction patterns with the students. Generally, social robotics research favors the inclusion of new technologies and approaches in the robot and its interfaces to enhance operator experience and perceived usability (e.g., offering multiple devices for control with augmented information on the robot's environment using lasers or 3D cameras, or adding arms to enhance the acceptability of the robot). Therefore, it is crucial to consider all hardware components involved in the robot routine to achieve better energy efficiency and, in some cases, to avoid specific software solutions that increase the overall power consumption σ. Hence, the consumption model proposed here combines the consumption of the hardware components (P HW) with the consumption of the behaviors defined from the software point of view (P SW):

σ = P HW + P SW (1)

Hardware Perspective
From the hardware perspective, this study generalizes robot power consumption. Following the approach suggested by Mei et al. [31], the robot model is simplified into the following formula:

P HW = P Sensors + P Motors + P MCU + P Other (2)

An electronic device can function as both a sensor and an actuator, e.g., a motor whose movement can also be measured by analyzing its consumption. However, to reduce complexity, we assign one device per action. For instance, a piezoelectric transducer in a speaker produces sounds, and we do not consider it as a microphone.
• Sensors. Every robot has a set of sensors that measure various physical properties associated with the robot and its environment. Again, we follow Mei et al. [31], who suggested a linear function to model the power consumption of sensors:

P s = c1 + c2 f (3)

The proposal connects the sensing power (P s) to two physical constants (c1, c2) associated with the device, and couples it with the sensing frequency f.
• Actuators. This study simplifies the actuators into motors, which convert electrical energy into mechanical energy. The motor power consumption is associated with the mechanical engine and the transforming loss related to friction or resistance, such as those associated with grasping and manipulating objects or the surface on which the robot is moving. Once more, Mei et al. [31] proposed a possible model to be applied in this case:

P m = m(a + gµ)v + P l (4)

A motor's motion power P m is associated with mass m, v represents the velocity, and a defines the acceleration. P l defines the transforming loss, and m(a + gµ)v is the mechanical power, where g is the gravitational constant and µ the friction coefficient.
• Main Controller Unit. This element is responsible for managing the robot and parts of the controllers with their associated devices. It comprises the central processing unit (CPU) and can sequence and trigger robot behaviors using the different hardware elements under its control. This study simplifies the model in that all components are evaluated together, without distinguishing between the directly (hard drive and fans) and indirectly measurable (network ports) devices.
• Others. Further devices need to be considered, such as controllers, routers, external fans, batteries, displays, or speakers. Each one is modeled on its own, with values defined by the product specifications.
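As a minimal illustration, the two component models above can be expressed directly in code; all constants below are illustrative placeholders, not measured QTrobot values:

```python
# Sketch of the hardware power models of Mei et al. [31].
# All constants are illustrative placeholders, not measured QTrobot values.

G = 9.81  # gravitational acceleration (m/s^2)

def sensor_power(c1, c2, f):
    """Linear sensing-power model: P_s = c1 + c2 * f,
    where c1, c2 are device constants and f is the sensing frequency (Hz)."""
    return c1 + c2 * f

def motor_power(m, a, v, mu, p_loss):
    """Motion-power model: P_m = m * (a + g * mu) * v + P_l."""
    return m * (a + G * mu) * v + p_loss

# Example: a camera sampling at 30 Hz and one arm motor moving a 0.2 kg limb.
p_s = sensor_power(c1=0.5, c2=0.01, f=30.0)                  # 0.8 W
p_m = motor_power(m=0.2, a=0.1, v=0.3, mu=0.2, p_loss=0.4)   # ~0.52 W
```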

Software Perspective
To assess the power consumption impact of a particular software, this study uses the model presented by Acar et al. [32], which encompasses the following model subtypes:
• CPU Model. The power consumed by a specific process given a set of constants and the percentage of CPU use over a period of time.
• Memory Model. The power that a process needs when triggering one of the four states of the random access memory (RAM): read, write, activate, and precharge.
• Disk Usage Model. The power consumption associated with the read/write processes of a given application while the disk is in active mode.
Combined, these yield the following formula for modeling the software-related power consumption:

P SW = P CPU + P Memory + P Disk (5)

At times, the power consumption associated with sending or receiving data through the network is also considered. However, as this is highly specific to how the various ROS topics and the associated ROS message types are defined, we opt for a model that does not include these data.
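The combined software-side model can be sketched as follows; the linear CPU interpolation and the per-state memory costs are simplifying assumptions for illustration, not the exact coefficients from [32]:

```python
# Illustrative sketch of the software power model: P_SW = P_CPU + P_Memory + P_Disk.
# All coefficients are hypothetical.

def cpu_power(p_idle, p_busy, cpu_util):
    """CPU model: interpolate between idle and full-load power by the
    fraction of CPU time the process uses (0..1)."""
    return p_idle + (p_busy - p_idle) * cpu_util

def memory_power(rates, costs):
    """RAM model: sum the event rates of the four DRAM states
    (read, write, activate, precharge), each weighted by a per-state cost."""
    return sum(rates[s] * costs[s] for s in ("read", "write", "activate", "precharge"))

def disk_power(p_active, active_fraction):
    """Disk model: active-mode power scaled by the share of time the
    application keeps the disk in active mode."""
    return p_active * active_fraction

def software_power(p_cpu, p_mem, p_disk):
    return p_cpu + p_mem + p_disk
```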

QTrobot Power Model
While Section 2.2 discusses the generalized robot power model, this section details how it specifically maps onto the QTrobot platform employed in this study. The resulting model constitutes the basis for the experiments and measurements discussed later.

Hardware Perspective
Physically, the QTrobot (cf. Figure 2) is a non-mobile platform. Power is supplied through a single entry point, a regular 220 V wall plug. As detailed in Section 2.1, the QTrobot has eight DoFs for arm and head movements. In addition, it includes a few extra actuators and sensors, mainly a RealSense camera and a microphone array. The robot also includes a speaker used for presenting audible information. Furthermore, it has an eight-inch display and an Intel NUC, and we generalize other components that potentially consume energy in the system (e.g., fans).
• NUC: The robot integrates an Intel NUC i7 computer with 16 GB of RAM running the Ubuntu 16.04 LTS operating system. The NUC kits are well known for their bounded consumption [33].
• Camera: The QTrobot is equipped with an Intel RealSense D435 depth camera. According to the Intel documentation [34], it draws 0.7 A during operation.
• Motors: The robot has eight motors rendering eight DoFs for the robot's neck and two arms. The neck's two motors provide pitch and yaw, while each arm contains two motors in the shoulder and one in the elbow. Evaluating motor efficiency is out of this work's scope, so we generalize the power consumption without explicitly dealing with copper, iron, mechanical, and stray losses.
• Display: The QTrobot features an LCD panel that is active from the moment the robot is switched on. This eight-inch multicolor graphic TFT LCD with 800 × 480 pixels mainly shows animated facial expressions. The current version does not allow changes to the backlight brightness, so it is assumed to work under the same operating voltage and current as the robot. It is not possible to measure or extract more information about its consumption without disassembling the display.
• Speaker: The robot has a 2.8 W stereo-class audio amplifier with a frequency range of 800-7000 Hz.
• Other: Any regulators, network devices, or other control mechanisms beyond our knowledge that drain power.
Combined, these yield the following formula for modeling the hardware-related power consumption: P HW = P NUC + P Camera + P Motors + P Display + P Speaker + P Other (6) As mentioned above, handling the power consumption of each element individually would require disassembling the robot; therefore, this study treats the robot's consumption as a single aggregate element.
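For illustration, Equation (6) can be evaluated with rough per-component estimates; the wattages below are assumptions for discussion, not values measured on the QTrobot:

```python
# Illustrative evaluation of Equation (6). Per-component wattages are
# rough assumptions, not values measured on the QTrobot.

COMPONENTS_W = {
    "NUC": 15.0, "Camera": 3.5, "Motors": 8.0,
    "Display": 2.5, "Speaker": 2.8, "Other": 3.0,
}

def hardware_power(components=COMPONENTS_W):
    """P_HW = P_NUC + P_Camera + P_Motors + P_Display + P_Speaker + P_Other."""
    return sum(components.values())

# hardware_power() -> 34.8 (W)
```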

Software Perspective
From the software perspective, power consumption is either static or dynamic. Static power consumption results from a component's characteristics as defined by the manufacturer and therefore remains unalterable. Hence, we only model and investigate dynamic power consumption, which depends on the software's implementation and source code.
The main applications used in the IRT experience in this article are:
• Robot Operating System (ROS) [35]: ROS is considered the de facto standard for robotics middleware. It provides a set of libraries and tools for building and running robot applications.
• Vision components: Such components are notorious in the robotics community for having a higher CPU consumption than other software components deployed on the robot. Specifically, Find_Object_2D [38] is used, a webcam-based feature extractor employed to detect objects. Upon detection, the component publishes the object ID and its position on a dedicated ROS topic.
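For context, Find_Object_2D publishes detections as a flat float array; a common layout is 12 values per object (ID, width, height, and a 3 × 3 homography). The standalone parser below assumes that layout and should be verified against the installed version:

```python
# Standalone parser for a Find_Object_2D-style detection array.
# Assumed layout (verify against the installed version): 12 floats per
# object -> [id, width, height, h11, h12, h13, h21, h22, h23, h31, h32, h33].

def parse_objects(data, stride=12):
    """Split a flat detection array into (object_id, width, height, homography) tuples."""
    objects = []
    for i in range(0, len(data), stride):
        chunk = data[i:i + stride]
        if len(chunk) < stride:
            break  # ignore trailing partial records
        obj_id, width, height = int(chunk[0]), chunk[1], chunk[2]
        homography = [chunk[3:6], chunk[6:9], chunk[9:12]]
        objects.append((obj_id, width, height, homography))
    return objects

# One detected object (id 7, 64x48 template) with an identity homography:
msg = [7.0, 64.0, 48.0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
detections = parse_objects(msg)
```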

Telepresence
The different IRT solutions discussed in this research (cf. Figure 1) have slightly different rationales behind their development: on the one hand, the VR-based solution for full immersion, and on the other hand, the mobile-app-based solution for maximum availability.

VR-Based IRT
VR technology comprises two central elements: the hardware, i.e., all physical components conveying the experience of and interaction with the environment, such as screens, gloves, and controllers, and software, which allows the development of the virtual environments.
In terms of hardware, we employ the Oculus Rift headset together with the accompanying Touch controllers for the VR-based telepresence solution (cf. Figure 1, top right). The HMD consists of two PenTile OLED displays with an overall resolution of 2160 × 1200 at 90 Hz and a 110-degree field of view. This dual-display arrangement is complemented by two adjustable lenses that rectify the 1080 × 1200 image for each eye to create a stereoscopic 3D image. The headset features rotational and positional tracking and comes with integrated headphones supporting 3D-audio effects. The Oculus Touch controllers utilize the same low-latency tracking technology as the headset, providing a setup with joysticks and buttons for input and the opportunity for haptic feedback. Both the headset and the controllers are tracked using Oculus' Constellation sensors, a pair of external infrared cameras mounted on dedicated desk stands.
Developers can easily integrate the Oculus hardware with existing game engines, such as Unity, to create realistic VR experiences. We chose Unity, since it provides the flexibility to deploy and develop the software on a wide range of different platforms [39]. Moreover, it has a large community of developers, and there are previous results where the engine has been put to good use in robot-VR scenarios [39,40]. Different ROS topics communicate the essential data between the robot and the VR setup; worth mentioning are the two separate streams provided by the stereo camera, which are necessary to generate the dual image that is later corrected by the HMD's lenses (cf. the render view in Figure 3a). The VR-based IRT solution favors direct interaction control and the use of different sensory channels. The operator's head and arm motions are directly forwarded and translated to the robot interacting with the users and bystanders. Other input, such as audio, can also be forwarded and processed in real time. Botev and Rodríguez Lera [10] provided further details on the software architecture, user interface, and various interaction modes.
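The direct forwarding of head motion can be sketched as a simple clamped mapping onto the robot's two neck DoFs; the joint names and limits below are hypothetical, not QTrobot specifications:

```python
# Minimal sketch of mapping the operator's HMD orientation onto the robot's
# two neck DoFs. Joint names and limits are hypothetical, not QTrobot specs.

HEAD_LIMITS = {"yaw": (-60.0, 60.0), "pitch": (-25.0, 25.0)}  # degrees (assumed)

def clamp(value, low, high):
    return max(low, min(high, value))

def hmd_to_neck(hmd_yaw_deg, hmd_pitch_deg):
    """Clamp the tracked head orientation into the robot's reachable range,
    so extreme operator motion degrades gracefully instead of faulting a servo."""
    return {
        "HeadYaw": clamp(hmd_yaw_deg, *HEAD_LIMITS["yaw"]),
        "HeadPitch": clamp(hmd_pitch_deg, *HEAD_LIMITS["pitch"]),
    }

# hmd_to_neck(90.0, -10.0) -> {'HeadYaw': 60.0, 'HeadPitch': -10.0}
```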

App-Based IRT
The app-based telepresence solution (cf. Figure 1, bottom right) aims to provide the same feature set and functionality as the VR-based IRT, but on a more widely available hardware platform. Due to its dominant global market share of around 85 percent [41], we chose Android as the target platform for the mobile application, which was also developed in Unity (Version 2019.3.0a8). For the measurements, the application ran on a Samsung A40 smartphone (Model SM-A405FN) with Android 10 installed. It has 4 GB of RAM and comes with a 5.9-inch Super AMOLED display with an FHD+ resolution of 1080 × 2340 pixels, powered by an Exynos 7885 CPU and Mali-G71 MP2 GPU. The built-in sensors include an accelerometer, fingerprint sensor, gyroscopic sensor, geomagnetic sensor, hall sensor, light sensor, and proximity sensor.
The smartphone's gyroscopic sensor is used to translate head motion. Simultaneously, virtual joysticks with separate elbow motion sliders rendered on a semi-transparent layer on top of the robot's camera stream (cf. the render view in Figure 3b) allow the operator to control arm movement. The partial control of some motors via touch input together with the perceived screen size hampers immersion, but ensures full controllability. The robot transmits an unprocessed, single-camera image to the device via a dedicated ROS topic. The other topics are nearly identical to those of the VR-based IRT implementation, with their data only rendered in a different way.
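The virtual-joystick input handling can be sketched as follows; the dead-zone radius and output range are illustrative choices, not the application's actual parameters:

```python
# Sketch of turning a touch-joystick displacement into a normalized command
# for one arm chain, with a dead zone to suppress accidental touches.
# The dead-zone radius and output range are illustrative, not the app's values.

import math

def joystick_to_command(dx, dy, radius, dead_zone=0.15):
    """Map a touch offset (pixels from the stick center) to per-axis values
    in (-1..1); inputs inside the dead zone yield no motion."""
    nx, ny = dx / radius, dy / radius
    magnitude = math.hypot(nx, ny)
    if magnitude < dead_zone:
        return 0.0, 0.0
    scale = min(magnitude, 1.0) / magnitude
    return nx * scale, ny * scale

# joystick_to_command(5, 5, radius=100) -> (0.0, 0.0)    (inside dead zone)
# joystick_to_command(200, 0, radius=100) -> (1.0, 0.0)  (clamped to unit range)
```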

Experiment Modes and Measurements
To measure the robotic system's energy consumption in the different experiments, we used a Fluke 289 True-RMS Digital Multimeter capable of high-resolution event- and interval-based logging of the electric current. The multimeter was placed between the outlet and the robot's power supply to intercept and record the actual, unalloyed AC values.
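From such an interval-based current log, the cumulative energy can be derived by numerical integration; the sketch below assumes a fixed 220 V mains voltage and a power factor of 1, which overestimates real consumption:

```python
# Sketch of deriving cumulative energy from interval-logged current samples,
# as produced by a logging multimeter. Assumes a fixed 220 V mains voltage and
# a power factor of 1 for simplicity, which overestimates real consumption.

def cumulative_energy(currents_a, interval_s, mains_v=220.0):
    """Trapezoidal integration of I(t): E = V * sum(0.5*(I_k + I_k+1)*dt), in joules."""
    energy_j = 0.0
    for i_prev, i_next in zip(currents_a, currents_a[1:]):
        energy_j += mains_v * 0.5 * (i_prev + i_next) * interval_s
    return energy_j

# A one-minute log at 1 Hz with a constant draw of 0.3 A:
samples = [0.3] * 61
# cumulative_energy(samples, 1.0) -> 220 * 0.3 * 60 = 3960 J (= 1.1 Wh)
```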
In addition to the externally gathered data, internal statistics were compiled using system tools and a dedicated ROS profiling tool [42]. This tool harvests statistics about each captured event, recording the window start/stop times, samples, and running threads. An event contains information about CPU usage (as a percentage of total local use), virtual memory use, and real memory use, which helps with the evaluation of P Software (cf. Section 2.2.2).
Moreover, ROS bags were used for storing different performance and message data on the robot during operation. The bags were created by subscribing to dedicated ROS topics and storing the received data in an efficient file structure.
The experiment modes employed for evaluating the impact of the different IRT configurations can be classified into two behavioral categories:
• Test/Calibration: Robot behavior associated with various motions and gestures to check the motor status and perform calibration tasks.
• Natural: Classic HRI-related robot behavior comprising motions and gestures such as greetings, head tilts, or hand rubbing.
The next sections detail the various routines that we devised for the baseline and realistic measurements, introducing the different experimental modes that form the basis of the results discussed in Sections 3 and 4.

Baseline
To better evaluate the performance and power consumption with natural behaviors and provide a common baseline for the measurements, we integrated two different test-specific modes into our experimental setup. On the one hand, we assessed the robot in idle mode, i.e., the QTrobot at the beginning or end of a session without any external or scripted activity. On the other hand, a scripted demo routine provided a further indication of the aggregate cost of coordinated motion using the robot's different actuators. The demo relied on the basic interfaces supplied by the manufacturer to trigger predefined motion sequences as follows: The robot (1) spreads its arms while moving its head to look up, (2) moves its head from left to right and, at the same time, twists and folds the stretched arms so that its hands touch, and (3) moves its head to the upper left while the arms move down, stretch, and settle in the default, relaxed position. Both baseline modes were carried out with and without active 3D tracking to explore the associated cost.

Realistic
This study's central element is the measurement under realistic conditions, i.e., comprising the full experimental setup with VR- and app-based IRT to control the QTrobot. The following calibration routine with simplistic robot movements covers the respective servo motors' full play: (1) left arm (up/down), (2) right arm (up/down), (3) nod of the head (up/down), and (4) shake of the head (left/right). Finally, we tested the system with a natural motion routine mimicking realistic behaviors extending over all DoFs: (1) wave with right arm bent (welcome), (2) fling up both arms (cheer), (3) grab and look (investigate), (4) nod swiftly (sneeze), and (5) wave with right arm stretched (farewell). These gestures are examples of purposeful and meaningful interactions for robotic telepresence, as proposed, e.g., in [43,44].
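Such a routine can be encoded as an ordered gesture script; the gesture names and durations below are illustrative, not the robot's built-in gesture identifiers:

```python
# The natural routine above encoded as an ordered gesture script.
# Names and durations are illustrative, not the robot's built-in gesture IDs.

NATURAL_ROUTINE = [
    ("welcome", 2.0),      # wave with right arm bent
    ("cheer", 1.5),        # fling up both arms
    ("investigate", 3.0),  # grab and look
    ("sneeze", 1.0),       # nod swiftly
    ("farewell", 2.0),     # wave with right arm stretched
]

def run_routine(routine, play_gesture):
    """Play each gesture in order via the supplied callback;
    returns the total scripted duration in seconds."""
    total = 0.0
    for name, duration in routine:
        play_gesture(name, duration)
        total += duration
    return total
```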

Results
This section presents the experimental results obtained from the profiling tools and power consumption measurements. All experiments were performed in five iterations (or more), except for the baseline measurements, which needed less individual capturing since no applications were running.
The experiment results are presented below as follows: Firstly, we show the overall real power consumption obtained from the power meter device; secondly, we focus on the software results, where the CPU and RAM consumption is evaluated. In total, we ran 39 different test routines, comprising 14 baseline measurements and 25 measurements under realistic conditions (divided into five loops each, plus an extra five rounds for calibration in VR-based IRT). The 14 baseline measurements are distributed into four different collections: (1) idle: default robot behavior; (2) idle with NUI: default plus the NuiTrack application; (3) demo: default robot behavior running find_object_2d for entertainment purposes; and (4) demo with NUI: similar to the previous case, but adding NuiTrack. The reason for using five extra loops for VR-based IRT was the appearance of non-normal outlier cases during data analysis; thus, after removing some of them, we decided to maintain ten iterations. Overall, the experimental results supported our core hypotheses and are consistent with related research.

Hardware Perspective
The robot's regular power consumption was measured using the external multimeter (cf. Section 2.5) for periods of approximately one minute, during which data on the electric current were recorded.
Research question RQ2.1 ("What is the cumulative energy consumption?") is partially answered by Table 1, which details the robot power in watts in both the baseline and realistic (i.e., app- and VR-based IRT) scenarios, underlining the importance of specific applications in the robot. Firstly, the baseline cases without NuiTrack had a 44% and 37% lower power consumption, respectively. Secondly, there was a measurable difference between the two IRT options, with the VR-based approach producing a slightly higher consumption. This establishes that telepresence roughly doubles the overall power consumption, which natural-input, VR-based IRT further exacerbates.
Figure 4 illustrates the cumulative consumption for the different experimental modes as a function of power over time based on the events captured by the multimeter. The x-axis indicates the event count (over time), and the y-axis presents the consumption in Amps. Looking at the realistic modes, we can discern two cases: (1) calibration (Figure 4a,b), where the line slightly fluctuates around a mean below 0.5, and (2) natural (Figure 4c,d), where the line shows a pronounced fluctuation around a mean above 0.5 for VR-based IRT and below 0.5 for app-based IRT. The baseline (Figure 4e,g) and demo (Figure 4f,h) modes also showed clearly discernible behaviors: The baseline remained mostly constant except for a few sudden spikes, while the demo mode consistently showed dramatic power consumption changes. In these cases, the consumption was around 0.3 Amps in the non-NUI approaches and close to 0.5 Amps in those deploying NuiTrack, which is also reflected in the cumulative power consumption data and relative behavior presented in Table 1.

Software Perspective
This section mostly covers the exploratory research question RQ2.2 ("What are the effects of different immersive technology types on robot performance?"). The question is associated with P Software, and we examined the robot's performance in terms of two central characteristics: CPU and memory. We refrained from measuring disk usage, assuming that the robot processed everything in memory based on ROS's publish/subscribe paradigm without necessitating paging or using swap memory.

CPU Consumption
The percentage of mean CPU consumption associated with each routine is described in Table 2, showing the cumulative mean CPU loads induced by the ROS processes during the routine tested in each experimental approach. There were six different nodes involved in every experiment associated with ROS: find_object2d, qt_emotion_app, record nodes, rosprofiler, rostopic, and qt_nuitrack_app. Outside of the baseline modes, i.e., during the actual IRT experiments, the following ROS nodes were also active: infoJoints, repub, rosapi, and rosbridge_websocket. Table 2 indicates a low CPU consumption of under 15% for the idle and demo baseline scenarios. However, when involving NuiTrack or in the realistic IRT modes, CPU consumption leaped to beyond 200%. In contrast, the simple baseline required only around 6.4% CPU consumption.
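The aggregation of per-node CPU samples can be sketched as follows; the node names match the experiments above, while the sample values are invented for illustration:

```python
# Sketch of aggregating per-node CPU samples as reported by a ROS profiler,
# where values are percentages of a single core (so totals can exceed 100%).
# Node names follow the experiments above; the sample values are invented.

def mean_cpu_per_node(samples):
    """samples: {node_name: [cpu_pct, ...]} -> {node_name: mean_pct}."""
    return {node: sum(vals) / len(vals) for node, vals in samples.items() if vals}

def cumulative_mean_load(samples):
    """Sum of per-node means; >100 means more than one core is busy on average."""
    return sum(mean_cpu_per_node(samples).values())

demo = {
    "find_object_2d": [80.0, 120.0, 100.0],
    "qt_nuitrack_app": [90.0, 110.0, 100.0],
    "rosprofiler": [5.0, 5.0, 5.0],
}
# cumulative_mean_load(demo) -> 100.0 + 100.0 + 5.0 = 205.0
```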
The most distinctive peaks could be observed in the realistic scenarios, specifically during calibration, where maximum values of up to 280% were reached, i.e., more than 30% above the actual mean values. Figure 5 presents a high-level overview of the mean CPU consumption in percent over time, with the x-axis indicating the number of events registered by the profiling tool. In accordance with the profiling tool's documentation [42], the sample rate was set to 0.1 s with an update rate of 2 s. The y-axis shows the mean CPU load in percent across the total number of CPUs; values above 100 imply that more than one CPU core was used. The overlaid plots in Figure 5 show that the baseline graphs level off around their means (cf. Figure 6e,g,h). However, the demo mode with active NuiTrack exhibited four drastic drops in find_object_2d. This might have been due to the app being blocked for a short period of time, which decreased its performance. We included these data in tabular form (cf. Table 3) for clarification; however, the specific behavior of this application is beyond this article's scope, and these drop points are considered outliers compared to the otherwise constant behavior of the demo. The app-based IRT approach, in turn, showed constant behavior over time (cf. Figure 6b,d). The other realistic mode, VR-based IRT, showed more fluctuating behavior over time; however, this appears to be consistent with the scenario's average performance.

Memory Usage
Memory usage is also associated with the software perspective and is one of the elements that affect power consumption. Although the default consumption is low, a component view of RAM power consumption [45] revealed values between 3 and 4.5 Watts for a DDR2 RAM module at 1.8 Volts and 2 Watts for a DDR3 RAM module at 1.5 Volts. Memory size and usage should be considered in cases where large amounts of memory are needed and paging becomes likely, as storing data on disk further adds to the power consumption.
This study presents a general overview of the total RAM used by all ROS nodes running on the system, using the same nodes presented in Section 3.2.1 on CPU usage. Table 4 contains the values measured for each scenario, which, again, exhibited consistent behavior across scenarios (baseline and realistic). Baselines without NuiTrack used between 200 MB (idle) and 270 MB (demo), while the values increased to between 820 and 873 MB with NuiTrack active. The realistic scenarios, in turn, roughly doubled the mean memory consumption. To illustrate these values, Figure 7 shows a box plot revealing that the scenarios with active NuiTrack behaved similarly to the realistic app-based option. Finally, the realistic VR-based solution consumed almost 1.5 GB of RAM.

Collateral Effects of the Telepresence Option
The remote telepresence component is not independent of the robot-side telepresence components: the remote control or visualization tool relies on the hardware and software components available on the robot. For instance, if the robot provides stereo-vision services through its camera, this affects runtime performance. This means that a telepresence system component can affect, and be affected by, other software components, whether they are related to IRT or not.
The chosen example concerns the robot's camera and the assumption that a stream of stereo images generates higher consumption. In particular, we analyze the Rosbridge performance. By definition [46], Rosbridge provides a JSON API to ROS functionality for non-ROS applications. In our specific case, ROS needs to work with the Unity framework, which both the VR-based and app-based IRT solutions require. Table 5 presents an overview of the component performance. For VR-based IRT, Rosbridge sends more information due to the requirement of simulating human binocular vision. This requires sending more than one image per frame to perceive depth, nearly doubling the machine's CPU consumption. The memory consumption, in contrast, is quite similar for app-based operation and calibration. When comparing the results using means, it should be noted that the calibration mode in VR shows a high value. However, when focusing on the individual modes' values, this pattern can be correlated with the other cases; both are of the same order of magnitude and can be considered similarly minimal.
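To make the mono-versus-stereo difference concrete, the following sketch constructs the kind of rosbridge v2 JSON subscription messages a Unity client would send for the camera streams. The topic names are hypothetical; the "op", "topic", "type", and "throttle_rate" fields follow the rosbridge protocol.

```python
import json

# Sketch: rosbridge v2 subscription messages for camera streams.
# Topic names are hypothetical examples, not the robot's actual topics.

def subscribe_msg(topic, throttle_rate_ms=0):
    """Build a rosbridge 'subscribe' operation as a JSON string."""
    return json.dumps({
        "op": "subscribe",
        "topic": topic,
        "type": "sensor_msgs/CompressedImage",
        "throttle_rate": throttle_rate_ms,  # ms between messages; 0 = all
    })

# App-based IRT: a single monocular stream suffices
mono_topics = ["/camera/color/image_raw/compressed"]
# VR-based IRT: two streams to simulate human binocular vision
stereo_topics = ["/camera/left/image_raw/compressed",
                 "/camera/right/image_raw/compressed"]

for topic in stereo_topics:
    print(subscribe_msg(topic))
```

Doubling the number of subscribed image streams doubles the serialization and transport work Rosbridge performs per frame, which is consistent with the near-doubled CPU consumption observed for VR-based IRT.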

Impact of Measurement Tools
Measurement tools were essential for determining the power consumption in this study. While the correctness of an external multimeter relies on the device manufacturer's calibration, it is necessary to evaluate the tool's impact when measuring system performance using software.
On the one hand, the rosprofiler tool's impact was minor; its mean CPU load in regular operation was 3.66%, with a mode of 3.5% and a standard deviation of 0.34 (using 692 events throughout the experimental measurements).
Regarding memory, the tool's mean usage was approximately 58.8 MB (58,836,646.466 bytes, with a standard deviation of 219,576.960 bytes) in all cases. This mean value corresponded to 28% of the total memory in the baseline excluding the NuiTrack application, 7% in the idle/demo cases including NuiTrack, and 3.5% in VR-based IRT. These values reveal a normalized behavior, with a significant impact on the baseline but only a marginal impact on the other cases.
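The relative-impact percentages above follow from relating the profiler's roughly constant footprint to each scenario's total memory use. A short sketch, using rough mid-range totals taken from the memory measurements (the exact scenario totals are assumptions for illustration):

```python
# Sketch: relating the profiler's ~58.8 MB footprint to each scenario's
# total memory use. Scenario totals are rough, assumed mid-range values.
PROFILER_MB = 58.8

scenario_mb = {
    "baseline (no NuiTrack)": 210.0,
    "idle/demo (NuiTrack)": 840.0,
    "VR-based IRT": 1500.0,
}

for name, total in scenario_mb.items():
    share = 100.0 * PROFILER_MB / total
    print(f"{name}: {share:.1f}% of total memory")
```

A constant absolute overhead thus weighs heavily only on the low-footprint baseline, while it is close to negligible for the memory-hungry IRT scenarios.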
On the other hand, we used the rosbag tool during some of the experiments to record ROS topics and play them back afterwards. The tool is designed for high performance, and its working mode avoids the deserialization and reserialization of the harvested messages. Thus, it used around 1% of the mean CPU load and a mean of around 12 MB of memory.
As we are evaluating the power consumption, it is important to consider the related footprint of recording rosbags (cf. Table 6). Recording increases the power consumption only minimally, mainly in the idle baseline scenarios; we observed slightly elevated values, but the difference is not significant.
To sum up, these results indicate acceptable overhead for our measurement tools, with only a minimal, negligible impact on the general metrics outside of the baseline cases with their very low mean consumption.

Effect of Immersive Technologies on the Power Consumption Model
When employing novel technologies to create immersive experiences, it is vital to assess and weigh the various cost factors associated with their deployment. For both the VR-based and app-based IRT solutions, the transfer of video data constitutes the largest payload and, thus, the most significant power consumption factor. While there is a sizable difference between transmitting stereoscopic and regular video, the transmission of the video data and commands/controls is, in terms of electricity, completely "overpowered" by the AI-based processing on the QTrobot. However, it is essential to note that the realistic scenarios have a significant impact on the P_Software trends. In both cases, we continuously measured a CPU consumption above 200%, and the memory consumption in the VR-based IRT setting increased sharply. Unsurprisingly, CPU and RAM consumption reached their lowest levels in the baseline scenarios and the minimal demonstration modes.

Developing Energy-Efficient Demos
To our knowledge, the development of energy-efficient demos remains mostly unexplored in social robotics research. Usually, the different demo modes included with social robots aim at increasing acceptability without involving sustainability considerations. During the experiments, we observed that a demo is highly dependent on depth information or detection services, which dominate the robot's baseline energy consumption. In addition, in light of the economic aspects discussed in the following section, it is advisable to address sustainability issues as early as the design of demonstration routines [29].

Economic Efficiency
To assess the economic impact of operating a robot or a set of robots, we can estimate the cost associated with power consumption. To this end, we consult the European Union's official electricity price statistics [47] and use the current average of 0.2126 EUR/kWh.
The results obtained through projection of valid measurements (cf. Table 7) indicate that whether or not a demo runs by default on the robot can significantly impact the annual cost and needs to be taken into account. In contrast, the difference between IRT experiences using VR-based versus app-based telepresence solutions has only a minor impact. However, as the number of robots increases, the telepresence mode might also become a relevant factor with regard to the overall energy footprint. Due to the sporadic nature of the calibration process, it is excluded from the annual projection.
To put these results further into context, consider a typical desktop computer setup that draws an average of around 200 W: the computer itself has an average consumption of 171 W, the network card uses 10 W, the printer uses 5 W, and the speakers use 20 W. Assuming that the computer is also operated for six hours a day, the annual consumption amounts to circa 450 kWh (i.e., 95.535 EUR/year). This corresponds to CO2 emissions of more than 130 kg per year, which is roughly 1.3 percent of the average total emissions of a Belgian national [48].
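The projection arithmetic for the desktop example can be sketched as follows, using the EU average price quoted above; the per-kWh CO2 factor is an assumption (~0.29 kg/kWh) chosen to be consistent with the figures reported.

```python
# Sketch: annual cost and CO2 projection for a given mean power draw.
# CO2_KG_PER_KWH is an assumed emission factor, not an official figure.
PRICE_EUR_PER_KWH = 0.2126
CO2_KG_PER_KWH = 0.29

def annual_cost(mean_watts, hours_per_day=6.0):
    """Return (kWh/year, EUR/year, kg CO2/year) for a given mean draw."""
    kwh = mean_watts * hours_per_day * 365 / 1000.0
    return kwh, kwh * PRICE_EUR_PER_KWH, kwh * CO2_KG_PER_KWH

# Desktop setup: computer 171 W + network card 10 W + printer 5 W
# + speakers 20 W ≈ 206 W
kwh, eur, co2 = annual_cost(171 + 10 + 5 + 20)
print(f"{kwh:.0f} kWh/year, {eur:.2f} EUR/year, {co2:.0f} kg CO2/year")
# → roughly 450 kWh, 96 EUR, and 130 kg CO2 per year
```

The same function, applied to the robot's measured mean power in each mode, yields the annual projections of Table 7.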

Social Efficiency
Using a robotics platform for remote presence and IRT in different assistive scenarios can create new jobs in (health)care and other interactive settings. In addition, beyond the current pandemic situation, it is foreseeable that IRT has the potential to enable new interactive applications and create a positive impact in various collective contexts and any type of event, even beyond educational or healthcare scenarios.
At the same time, it is necessary to increase the number of sensors and actuators associated with HRI in IRT to expand the interaction bandwidth and create alternative mechanisms, such as those favoring kinesthetic learning or those requiring dedicated physical feedback.
Again, this requires a careful cost-benefit analysis of the various factors associated with the deployment of these new technologies, particularly classic and resource-hungry approaches, such as OpenCV-based object recognition (find_object_2d), or NuiTrack's more modern approach based on deep learning. Table 3 shows that these applications exhibit nearly the same CPU consumption as the immersive system itself. In addition, it is necessary to consider that newer RGBD cameras run some of these algorithms on board. According to their spec sheets, they can have peaks of 0.7 A (more than our current immersive consumption), so manufacturers and developers need to select the type of camera with care.

Conclusions
The in-depth analysis of the VR-based and app-based IRT solutions revealed several interesting aspects related to the initial set of research questions.
In terms of applicability (RQ1), our results confirm the huge potential for immersive telepresence and the associated positive effects, as discussed in [10]. However, when including sustainability considerations (RQ2), different features and interaction modes need to be carefully weighed in a cost-benefit analysis. Particularly, employing AI-based technologies or other computationally expensive features can dramatically influence the cumulative energy consumption (RQ2.1) and, consequently, the system's deployment costs (RQ2.3). Compared to these, immersive technologies, per se, only marginally affect the overall robot performance (RQ2.2), and they can help improve acceptance for operators, users, and bystanders alike without causing overly high costs.
Unless AI-based technologies become substantially more energy efficient, their footprint directly impacts the feature sets of IRT systems that are operated continuously and over extended periods. Similarly, demo routines should not be designed solely to maximize acceptance while omitting sustainability considerations. Even minor adjustments here can be hugely impactful and can help in deploying more affordable and practical solutions.
Social efficiency in education linked to public goods can only be realized when instruction is effective and learning is universal [49]. With this in mind, IRT can only benefit collectives and society on a larger scale if, on the one hand, teachers adapt lessons, assignments, and assessment methods to immersive interactive scenarios for the instruction to be effective. On the other hand, the required technologies need to be readily available to students for learning to be universal. Therefore, it is vital to not only apply and adapt IRT solutions to the curriculum to increase knowledge transfer and skill development, but to extend the use of IRT technologies in the classroom beyond academic approaches.
Integrating IRT systems with robotic platforms entails additional evaluations by manufacturers and developers alike in order to optimize efficiency and battery consumption. Such integration requires an in-depth and fine-grained analysis of all the processes running on the robot and the impact of each algorithm selected to perform specific tasks.
Furthermore, it is essential to analyze flexibility and potential pitfalls when considering robotics in general and IRT systems in particular [22]. Albeit guided by teachers, traditional classes are already highly dynamic settings where, alongside the lessons, students ask questions in the classroom, i.e., in the same location. The IRT solution and the software embedded in the robot platform, as well as the IRT hardware and robot skills available to the students at home, determine the set of tasks students can perform and need to be carefully adapted to the different scenarios.