CHARMIE: A Collaborative Healthcare and Home Service and Assistant Robot for Elderly Care

The global population is ageing at an unprecedented rate. With changes in life expectancy across the world, three major issues arise: an increasing proportion of senior citizens; cognitive and physical problems progressively affecting the elderly; and a growing number of single-person households. The available data point to an ever-increasing need for efficient elderly care solutions such as healthcare service and assistive robots. Additionally, such robotic solutions provide safe healthcare assistance in public health emergencies such as the SARS-CoV-2 virus (COVID-19). CHARMIE is an anthropomorphic collaborative healthcare and domestic assistant robot capable of performing generic service tasks in non-standardised healthcare and domestic environment settings. The combination of its hardware and software solutions demonstrates map building and self-localisation, safe navigation through dynamic obstacle detection and avoidance, different human-robot interaction systems, speech and hearing, pose/gesture estimation and household object manipulation. Moreover, CHARMIE performs end-to-end chores in nursing homes, domestic houses, and healthcare facilities. Some examples of these chores are helping users transport items, detecting falls, tidying up rooms, following users, and setting up a table. The robot can perform a wide range of chores, either independently or collaboratively. CHARMIE provides a generic robotic solution such that older people can live longer, more independent, and healthier lives.


Introduction
The world's population is consistently growing older, with the over-65 age group outgrowing all other age groups. The elderly population worldwide is predicted to grow from 702 million in 2019 to over 1.5 billion by 2050 [1]. In 2014, the over-55-years-old population outnumbered persons aged 15 to 24 years old, and by 2035 it is expected that the ages 0 to 14 will also be outnumbered [2]. The elder age group is estimated to continuously grow and outnumber all youth and child populations under 24 years old by 2050. It is estimated that the percentage of elderly people (the over-65 age group) in the European Union will grow from 19% to 29% over the next approximately 50 years [3]. Population ageing and public health expenses primarily dedicated to older dependent persons present significant challenges, with implications not just on the social aspect but also economically [4]. The World Health Organization (WHO, Geneva, Switzerland) even developed a global strategy and action plan on ageing and health that focused, among other strategic objectives, on improving measurement, monitoring, and research on healthy ageing [5].
These statistics indicate a growing need for efficient elderly care solutions regarding therapy, rehabilitation, companionship and activity planning, but most importantly, healthcare robotics [6][7][8][9] capable of aiding in day-to-day tasks, collaboratively or independently. Healthcare generic service and assistive robots can provide practical help to increase the elderly population's quality of life, improving cognitive and physical health. These robots can play an essential role in healthcare support and independent living, especially when problems related to ageing start to appear. Moreover, service robots provide safe healthcare assistance in public health emergencies such as the SARS-CoV-2 virus (COVID-19) [10][11][12].
The Collaborative Healthcare/Home Assistant Robot by Minho Industrial Electronics (CHARMIE), represented in Figure 1, is an anthropomorphic healthcare and domestic service and assistive robot capable of performing tasks in non-standardised environmental settings. The social goal of the CHARMIE project is the development of a robot capable of aiding in nursing homes, healthcare facilities and domestic houses, among other settings. The scientific objective of CHARMIE is the development of a multifaceted anthropomorphic robot capable of performing a broad set of tasks based solely on machine learning solutions that allow the robot to learn how to perform and improve tasks via observation and trial-and-error direct interaction with the environment.  Some service robots are already being implemented in geriatric care [13]. The development of such robots is encouraged with funding from the European Union with projects such as Hobbit [14], ENRICHME [15], ARNA [16] and Sobi [17]. These four robots go beyond the simpler pet-like social companion robots already possessing sensors and actuators that allow them to perform more complex tasks. Focusing on enhancing older people's well-being, some examples of tasks already performed are user entertainment, object manipulation and transportation, empathic and social human-robot interaction, monitoring persons and home-related chores. MOnarCH [18] presents a multi-robot cognitive system whose operation is already being tested in hospitals. It targets the development of a

Materials and Methods
To perform the wide variety of non-standardised chores, CHARMIE's system and hardware went through different development choices regarding hardware components and related dependencies. The complexity required by some tasks dictated that the robot's system and hardware solutions had to contemplate a significant number of degrees of movement. From the omnidirectional platform to the arms, all components were dimensioned with the complex movements the robot must make in mind. An example is pushing wheeled trolleys, which forces the robot to adjust the position of both arms throughout the action and to match the platform movement to the moving object. Additionally, the anthropomorphic shape of CHARMIE makes users more receptive to interacting with the robot, as described in [33]. From a practical perspective, it is easier to interact with the real world if the robot has the same shape as a human, since the environment is optimised for human-shape interaction. From a social perspective, giving the robot the physical appearance of a small human creates the impression that users are interacting with a small, friendly robot, which translates into a sense of friendliness and proximity to its users. Even though, at the moment, CHARMIE is not fully anthropomorphic, some parts are, such as the arm. The final objective is to reach the anthropomorphic level presented in Figure 1a,b.

System and Hardware
As described in Figure 2, CHARMIE's hardware [34] can be divided into four sections: (i) the motion platform; (ii) the robot arms; (iii) the lifting mechanism and torso; and (iv) the robot head. The goal is to provide solutions that best fit generic service tasks to improve elderly care. A more in-depth description of hardware sections (i) and (ii) is provided in [34]. One aspect of the robot's design that must also be addressed is its anthropomorphic shape. One study concerning robot shape indicates that the robot's visual design is directly related to the number of human-robot interactions initiated by humans [35]. The anthropomorphic shape grants the people who interact with the robot a greater sense of comfort or friendliness than differently shaped robots. Different shapes made human users more reluctant to interact with the robot due to fear of the unknown and of its movement. Having the shape of a human body allowed the robot to have a higher number of interactions, resulting in more tasks performed by the robot and thus more significant help for its users, both the elderly and healthcare workers. Another significant advantage of the anthropomorphic shape is that every day-to-day human environment is tailored to the human body height-wise, weight-wise, and shape-wise. This aspect facilitates robot interaction with day-to-day environments without requiring an adaptation of the world to best fit the robot's capabilities. With this shape, the concept is precisely the opposite: adjust the developed robot shape to enhance its interaction with human environments, rather than adapting every human environment the robot must interact with.
Appl. Sci. 2021, 11, 7248

Motion Platform
For service and assistive robots that operate in highly dynamic environments with humans working beside them, it is a significant advantage to have a platform that allows movement in any direction at any point in time. The locomotion system is the only part of CHARMIE that is not intended to be anthropomorphic (bipedal), for stability and simplicity reasons. The motion platform uses four omnidirectional wheels with individual suspension systems. This design was conceived with the robot's predicted interaction environments in mind, mainly large/medium indoor environments such as hospitals, healthcare facilities, nursing homes and houses. The wheels are 203 mm double aluminium omnidirectional wheels with roller bearings. When a motion platform uses three omnidirectional wheels and its centre of mass is considerably high, the platform may be at risk of falling under certain circumstances: if a linear momentum is applied in the 120° gap between wheels, the robot may lose balance and fall. Adding a fourth wheel reduces the angular gap between wheels and significantly improves the robot's safety and stability. However, with this addition, a wheel can occasionally lose contact with the floor due to slight surface irregularities or slopes, which translates into unpredictable and incorrect movement. Thus, a compact MacPherson suspension system [36] (Figure 3a) was developed to overcome floor irregularities, small bumps and slope variations while improving control smoothness. The motion platform is a regular octagon with a 54 cm diameter, designed so that CHARMIE fits through every standard door frame size. It is purposely heavy (~20 kg without batteries) to guarantee a low centre of mass for safety reasons, ensuring the robot does not fall if an external force pushes it.
Additionally, this motion platform can transport a load of approximately 65 kg, as shown in Figure 3b. In [34], a more in-depth analysis of the motion platform and suspension system is presented.
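The stability argument for moving from three to four wheels can be illustrated with a simple geometric sketch. It assumes evenly spaced wheel contact points on a circle and compares the shortest distance from the platform centre to the edge of the support polygon. The contact radius of 0.27 m (half of the 54 cm platform diameter) is an assumption for illustration only; the actual wheel positions are not given in the text.

```python
import math


def tipping_margin(num_wheels: int, contact_radius_m: float) -> float:
    """Shortest horizontal distance from the platform centre to an edge of the
    support polygon formed by evenly spaced wheel contact points."""
    if num_wheels < 3:
        raise ValueError("a static support polygon needs at least three contacts")
    # The inradius of a regular polygon with circumradius R is R * cos(pi / n).
    return contact_radius_m * math.cos(math.pi / num_wheels)


# Assumed contact radius: half of the 54 cm platform diameter (illustrative).
CONTACT_RADIUS_M = 0.27

three_wheels = tipping_margin(3, CONTACT_RADIUS_M)
four_wheels = tipping_margin(4, CONTACT_RADIUS_M)
print(f"3 wheels: {three_wheels:.3f} m")
print(f"4 wheels: {four_wheels:.3f} m")
print(f"margin gain: {four_wheels / three_wheels:.2f}x")
```

Under these assumptions the centre of mass can shift roughly 1.4 times further before leaving the support polygon with four wheels than with three, which is consistent with the qualitative claim above.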
Figure 3. (a) MacPherson suspension system designed for CHARMIE to improve stability and guarantee all four wheels are in contact with the floor surface. Image from [34]. (b) CHARMIE's motion platform load transportation experimental test. The platform was tested moving for approximately 5 min while maintaining movement performance and stability. The load in the picture corresponds to an average adult human weight of approximately 65 kg.
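As an illustration of how a four-wheel omnidirectional platform can move in any direction, the following minimal sketch maps a desired body velocity to individual wheel speeds. It is not the controller used on CHARMIE: the wheel mounting angles (45°, 135°, 225°, 315°) and the mounting radius of 0.27 m are assumptions, and only the 203 mm wheel diameter comes from the text.

```python
import math

WHEEL_RADIUS_M = 0.203 / 2  # 203 mm wheel diameter, from the text
MOUNT_RADIUS_M = 0.27       # assumption: wheels at the edge of the 54 cm platform
# Assumption: wheels evenly spaced, first wheel at 45 degrees from the x-axis.
WHEEL_ANGLES_RAD = tuple(math.radians(a) for a in (45, 135, 225, 315))


def wheel_speeds(vx: float, vy: float, omega: float) -> list[float]:
    """Inverse kinematics for tangentially mounted omnidirectional wheels.

    vx, vy are body-frame linear velocities in m/s and omega is the yaw rate
    in rad/s. Returns the angular speed of each wheel in rad/s.
    """
    speeds = []
    for phi in WHEEL_ANGLES_RAD:
        # Rim speed along the rolling direction, which is perpendicular to
        # the line from the platform centre to the wheel.
        rim_speed = -math.sin(phi) * vx + math.cos(phi) * vy + MOUNT_RADIUS_M * omega
        speeds.append(rim_speed / WHEEL_RADIUS_M)
    return speeds


# Sideways motion without rotating the body, something a differential drive cannot do.
print([round(s, 2) for s in wheel_speeds(0.0, 0.3, 0.0)])
# Rotation on the spot: every wheel turns at the same speed.
print([round(s, 2) for s in wheel_speeds(0.0, 0.0, 1.0)])
```

With symmetric wheel placement, the wheel speeds for a pure translation sum to zero and a pure rotation drives all wheels equally, which is a quick sanity check on the sign conventions chosen here.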

Lifting Mechanism and Torso
A significant number of the tasks that generic service and assistive robots must perform involve object manipulation. The motion platform allows the robot to move throughout the environment on the x-axis and y-axis. To interact at different heights with objects such as cabinets with shelves at different heights or with users, from children to adults, it is of extreme value to implement a system that allows the robot to have a z-axis DOF (Degree of Freedom). By implementing a lifting system, the workspace of the redundant manipulators attached to the torso increases. Moreover, the robot can move up and down in its z-axis, allowing it to interact with objects at different heights, from picking/placing objects from/to the floor to interacting with objects on tables, counters and shelves, among others. In the elderly care context, the robot needs to collect things from the floor. This is primarily due to older people having significant difficulties in lowering themselves to pick up something from the floor, which sometimes leads to falls or injuries that can be avoided with the proposed robot solution. The initial lifting system solution shown in Figure 4a,b consists of a ball screw spindle mechanism. The torso that is in the threaded shaft has two linear bearings connected to aiding beams that go from the motion platform to the head. By activating the motor, the torso moves linearly up and down, providing the lifting mechanism. In this system, only the torso moves, and the only parts assembled to the torso are the redundant manipulators.
Moreover, two new systems were developed to overcome the problems of the initial elevation system. The first, shown in Figure 4c,d, resembles the lifting mechanism presented by AMIGO [37,38]. Similarly to the initial lifting system, a ball screw spindle mechanism lifts the torso. However, in this implementation, an aluminium tube is connected to the threaded shaft. This system allows the torso to be assembled on the aluminium tube, which, with three slider rails, can move the torso up and down in a linear movement. The two main differences between the AMIGO-like elevation system and the initial lifting system design are: (i) instead of moving just the torso, and consequently the redundant manipulators, in the z-axis, it also moves the head; and (ii) the variation of the robot's height. As previously described, this system can adjust the torso height to best fit its goals. CHARMIE can pick objects from the floor in its lower position, and in its higher position it can place its arms at a height similar to that of an average-size adult human. Since the head is now also mounted on the torso in this solution, all sensors and the multimodal user interface can alter their height. This system can now move the head height-wise, positioning the camera to better analyse the environment using the computer vision system. In the initial implementation, the only degree of freedom was a rotation on the neck so the robot could look up and down. Often, this DOF proved insufficient to analyse the environment since, at times, the robot needed to place its head at the height of what it had to analyse for better results.
Computer vision-wise, object and user detection are simplified by allowing the robot to position itself at the best angle. This system provides that movement, which significantly improved all computer vision-related tasks. The multimodal user interface previously connected to the head can now also move up and down to best fit the user's height. The other advantage of this system is the ability to change the robot's height. Psychologically, it is highly advantageous to alter the robot's height according to the person with whom it interacts. Similarly to a human body, when the robot moves down to pick an object from the floor, it moves all of its body with that purpose. The initial solution could only move the arms, which did not follow the anthropomorphic ideology. The field tests conducted demonstrated that people were more interactive with this version of the robot due to it being more anthropomorphic. The robot altered its height to be slightly smaller than the user it was interacting with. Creating a sense of inferiority to the human reduces the users' reluctance to interact with the robot. This way, CHARMIE can operate most features in a domestic environment while at the same time having a friendly appearance.
The second lifting mechanism under testing, shown in Figure 4e,f, is based on human legs. For this elevation system, the following requirements were taken into consideration: (i) structural integrity to support the robot's weight; (ii) anthropomorphic look; (iii) self-locking actuation to reduce energy consumption; and (iv) allowing the robot to squat, increasing its workspace and making it able to interact with objects on the floor. This lifting mechanism has the same advantages as the first one (all of the upper body can move up and down, and the robot's height can change), with the addition of resembling the human body even more. Although the mechanical system's complexity increases substantially in this iteration, a design based on the same mechanical principles is being made, which results in a more straightforward and more reliable solution with a significant increase in robustness. Thus, this mechanism was developed to encompass only 1 DOF that controls the ankles, the knees and the hips of the robot, allowing a complete squatting movement. In [34], a more in-depth analysis of this anthropomorphic lifting mechanism is presented.
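For the ball screw based lifting stages described above, the relation between motor rotation and torso travel is simply the screw lead multiplied by the number of revolutions. The sketch below illustrates this relation only; the lead of 5 mm per revolution, the 400 mm travel and the 600 rpm motor speed are hypothetical values chosen for the example, since the text does not specify the spindle or motor.

```python
def revolutions_for_travel(travel_mm: float, lead_mm_per_rev: float) -> float:
    """Number of screw revolutions needed to move the torso by travel_mm."""
    if lead_mm_per_rev <= 0:
        raise ValueError("screw lead must be positive")
    return travel_mm / lead_mm_per_rev


def travel_time_s(travel_mm: float, lead_mm_per_rev: float, motor_rpm: float) -> float:
    """Time to cover travel_mm at a constant screw speed (no ramps, no gearing)."""
    if motor_rpm <= 0:
        raise ValueError("motor speed must be positive")
    return revolutions_for_travel(abs(travel_mm), lead_mm_per_rev) / motor_rpm * 60.0


# Hypothetical values, not taken from the paper.
LEAD_MM_PER_REV = 5.0
TRAVEL_MM = 400.0
MOTOR_RPM = 600.0

print(revolutions_for_travel(TRAVEL_MM, LEAD_MM_PER_REV), "revolutions")
print(travel_time_s(TRAVEL_MM, LEAD_MM_PER_REV, MOTOR_RPM), "seconds")
```

A small lead gives fine height resolution and helps the screw hold its position, at the cost of slower travel, which is the usual trade-off when sizing such a stage.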

Robotic Arm
The arm's design objective was to develop an affordable, lightweight component with a human-like design. The initial design demonstrates an autonomous robotic manipulator with four degrees of freedom, named the Robotic Arm for Collaboration with Humans in Industrial Environment (RACHIE) [39]. This robot's primary goal was to perform a simplified service pick and place task with human interaction to sort cans according to their colour and shape. By simplifying some of the tasks that generic service robots must perform, it allowed the development of a robotic arm that fulfilled all of the objectives previously stated. However, this manipulator did not provide the degrees of freedom necessary for some generic service robot tasks' complexity. The desired solution, to also comply with the anthropomorphic objective, should present similarities to a shoulder, elbow and hand redundant manipulator [40].
The solution implemented in CHARMIE is based on the InMoov arm [41], initially designed by Gaël Langevin. InMoov is the first open-source 3D-printed life-size robot. The main differences between the original InMoov arm and CHARMIE's arm reside in the bicep and shoulder. The whole arm is printable on a 12 × 12 × 12 cm 3D printer; its PLA parts weigh around 1.414 kg and the actuators around 0.766 kg, giving a total arm weight of approximately 2.2 kg. From the shoulder to the hand, the arm measures 75 cm and can lift a maximum load of around 400 g.
This arm can be divided into three main parts: (i) hand and forearm; (ii) bicep; and (iii) shoulder. Most generic service robots with robotic manipulators tend to use grippers to simplify this task. However, the anthropomorphic hand presents more DOF that allow the robot to best use the hand according to the object, providing greater dexterity [42,43]. The hand is composed of five fingers, similar to a human hand, to provide different ways to pick up various objects. Each finger has a different number of joints, as can be seen in Figure 5a. The thumb has two DOF, the index and middle finger have three DOF, and the ring and little finger have four DOF. Each finger has one actuator (servo motor) that moves all DOF of the respective finger. The movement is based on a pulley mechanism, similar to tendons, where the motors are located in the robot's forearm, as shown in Figure 5b, with fishing line tendons leading to the fingers. So, regarding kinematic architecture, this is an underactuated hand since, in total, it has seventeen DOF, but only five actuators. The wrist has one servo motor responsible for rotating the hand. The bicep has two servo motors, one responsible for lifting and lowering the forearm, and the second responsible for rotating the bicep and the forearm. The shoulder has two individual DOF for moving the whole arm parallel to the robot body and lifting the whole arm perpendicular to the body. In total, as shown in Figure 5c, this redundant manipulator has 22 DOF with 10 actuators, which define the movement of the arm. The Denavit-Hartenberg parameters for the arm are described in Table 1.
The four parameters are described as: $\theta_i$, the rotation around the $Z_{i-1}$ axis by the angle between the links; $d_i$, the translation along the $Z_{i-1}$ axis of the distance between the links; $l_i$, the translation along the $X_i$ axis (rotated $X_{i-1}$ axis) of the length of the link; and $\alpha_i$, the rotation about the $X_i$ axis of the twist angle.
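To show how the four Denavit-Hartenberg parameters are used, the following minimal sketch builds the homogeneous transform of one link in the standard convention (rotation about Z by θ, translation along Z by d, translation along X by l, rotation about X by α) and chains several links. The function names are illustrative and the parameter rows in the example are placeholders for a planar two-link chain, not the values of Table 1.

```python
import math

Matrix = list[list[float]]


def dh_transform(theta: float, d: float, l: float, alpha: float) -> Matrix:
    """Homogeneous transform from frame i-1 to frame i (standard DH convention)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, l * ct],
        [st, ct * ca, -ct * sa, l * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_kinematics(dh_rows: list[tuple[float, float, float, float]]) -> Matrix:
    """Chain the link transforms; each row is (theta, d, l, alpha)."""
    pose = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, d, l, alpha in dh_rows:
        pose = matmul(pose, dh_transform(theta, d, l, alpha))
    return pose


# Placeholder planar two-link chain (lengths in metres, NOT Table 1 values).
rows = [(math.pi / 2, 0.0, 0.30, 0.0), (-math.pi / 2, 0.0, 0.25, 0.0)]
pose = forward_kinematics(rows)
x, y, z = pose[0][3], pose[1][3], pose[2][3]
print(f"end position: x={x:.3f} m, y={y:.3f} m, z={z:.3f} m")
```

Substituting one row per joint from a DH table into `forward_kinematics` would give the pose of the last frame relative to the first.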
For elderly care specifically, the arm and hand allow the robot to perform essential tasks. Picking up objects from the floor, low tables, counters and shelves and transporting them to the desired place eases many tasks that otherwise would either not be done or be done with many associated risks. With the addition of the anthropomorphic hand, the robot can easily interact with everyday objects specifically designed for the human hand, using different types of grips and grasps. Figure 4 presents solutions with both two arms and one arm. The real-world robot and real-world tasks only use one arm, as can be seen in Figure 5c, whereas the simulated robot already uses two arms.

Robotic Head
The robot head holds the RGB-D camera, the multimodal user interface and the microphone. From a human interaction perspective, the head is the part of the robot that users look at when communicating with it, so it must be appealing and functional. Since the robot's head height could not be adjusted in the initial lifting mechanism system, a DOF was introduced to simulate a neck, so the robot can tilt its head up and down, similar to a nodding ("yes") movement. This movement allows the robot to adjust its RGB-D camera angle to see objects at different heights. In the medium position, parallel to the motion platform, it can see objects on tables and shelves, as well as people of above-average height, as can be seen in Figure 6a. In the minimum position, at −60° from the horizontal, it can see objects on the floor touching the front of the robot's motion platform, as can be seen in Figure 6b. Lastly, at +30°, it can see objects and people positioned higher than the robot, as can be seen in Figure 6c. For now, if the robot needs to see further to the sides, it rotates its motion platform, so it is always facing what it is trying to analyse. With the addition of the two new lifting mechanisms, the same system is used just for the RGB-D camera. This camera is responsible for all vision-related tasks, such as: (i) user recognition; (ii) pose detection and tracking; (iii) gesture recognition; (iv) obstacle detection in navigation; (v) mapping; and (vi) object learning and recognition.
Both the microphone and the multimodal user interface were part of the initial lifting system head. Being able to adapt the height of both systems to the user's height allowed a cleaner interaction overall. The microphone is placed at the same height as the user's head, allowing better voice recognition in environments with a significant noise level. Moreover, the multimodal user interface is more comfortable to interact with when it matches the user's height. Thus, from the initial lifting mechanism onwards, these two systems are part of the lifting mechanism since, unlike the RGB-D camera, they do not need to rotate.
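As a worked illustration of the tilt range described above, the following minimal sketch computes the tilt angle that points the camera's optical axis at a target and clamps it to the −60° to +30° range. The function name, its arguments and the example targets are assumptions made for illustration; the 1.50 m camera height is the maximum value quoted in the sensor description, and the real controller may work differently.

```python
import math

# Tilt limits come from the text; the camera height is the maximum value quoted
# for the sensor layout. Names and example targets are illustrative only.
TILT_MIN_DEG = -60.0
TILT_MAX_DEG = 30.0
CAMERA_HEIGHT_M = 1.50


def head_tilt_for_target(target_height_m, distance_m, camera_height_m=CAMERA_HEIGHT_M):
    """Return the tilt (degrees) that aims the optical axis at the target,
    clamped to the mechanical range of the neck."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    tilt = math.degrees(math.atan2(target_height_m - camera_height_m, distance_m))
    return max(TILT_MIN_DEG, min(TILT_MAX_DEG, tilt))


floor_tilt = head_tilt_for_target(0.0, 1.0)    # object on the floor, 1 m ahead
table_tilt = head_tilt_for_target(0.75, 1.5)   # object on a table, 1.5 m ahead
tall_tilt = head_tilt_for_target(3.0, 0.5)     # target far above the camera (clamped)
```

Targets that would require a tilt outside the mechanical range are simply clamped here; on the robot, such cases would presumably be handled by moving the platform or the lifting mechanism.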

Components
To accomplish generic service and assistive tasks that aid day-to-day life, for both the elderly and healthcare workers, CHARMIE must safely navigate healthcare-related and domestic environments, perceive and track its human users, recognise gestures or poses, and detect and manipulate different everyday objects. To achieve this, CHARMIE has a set of sensors and actuators suited to both its environment and its tasks. With the sensor data, the robot must perform some low-level cognitive functions that, when combined, allow it to perform more complex chores, both independently and collaboratively. The low-level functions can be classified into four groups of tasks: (i) map building and self-localisation; (ii) navigation; (iii) human-robot interaction; and (iv) object detection and manipulation. The training of all systems that require neural networks is performed offline, and the main processor is an MSI Cubi 2 mini-PC with a Core i5 processor and 4 GB of RAM.

Sensor System
To move safely in its operating environment and perform the necessary tasks, CHARMIE must have adequate perception for the following low-level functions:

• Map building and self-localisation;
• Safe navigation (obstacle detection and avoidance);
• Human-robot interaction (user and pose/gesture detection);
• Object detection and subsequent manipulation.
For map building and self-localisation, CHARMIE uses two sensors: (i) a 2D LiDAR (HOKUYO URG-04LX-UG01) mounted on the motion platform; and (ii) an RGB-D camera (Microsoft Kinect) located on the robot head. The laser range finder provides a 2D map of the environment near the floor, whereas the RGB-D camera provides a 3D map. Combining the two lets the robot exploit the strengths of each: the 2D map captures small objects on the floor or at the motion platform height, while the 3D map represents the complete environment.
For safe navigation in various indoor environments such as hospitals, nursing homes and domestic houses, the robot uses the same sensors as in the mapping and self-localisation tasks. Again, the combination of the 2D laser range finder and the RGB-D camera lets the robot detect different types of obstacles and react appropriately. The 2D LiDAR detects small objects on the floor that are harder to detect using the RGB-D camera, and the 3D information covers objects at all other heights. In the motion platform, the motors have embedded encoders to close the control loop. In addition, the motion platform has current and voltage sensors for every motor. These sensors allow CHARMIE to know whether a wheel is not touching the floor, whether a motor is stuck and whether the robot is pushing against an object.
For human-robot interaction, CHARMIE uses the RGB-D camera to detect its users, their pose and some specific recognisable gestures. Recent work on this robot has already started using a different RGB-D camera (Intel® RealSense™ Depth Camera D455). One of the goals of CHARMIE is to communicate with its human users, both healthcare workers and older people, the same way humans communicate with each other: by talking and listening. Thus, the primary way to communicate with the robot is to speak a set of instructions that are interpreted and converted into tasks. This lets users who are not familiar with technology interact easily and take full advantage of CHARMIE. The robot has an MV5 Digital Condenser Microphone, which allows it to hear over 360°, and a JBL GO speaker. To overcome some difficulties that this communication system may present in particular situations, the robot has a multimodal user interface in its body so users can select the necessary tasks.
For object detection, the robot uses its RGB-D camera. The objects that need to be detected are usually on the floor, on counters and tables, or hand-delivered by a user to the robot. Thus, the focus lies primarily on objects that are either between 50 cm and 120 cm high or lying on the floor. To grasp objects that are hand-delivered or on counters, the robot adjusts its lifting mechanism to best position its redundant manipulators relative to the object. The same applies to objects on the floor, which is why all lifting mechanisms can lower themselves so that the robot arms can pick objects from the floor.
The robot's sensory system setup [44] is presented in Figure 7, where each sensor location is described for both implementations of CHARMIE's body. In the head, the RGB-D camera sits at a maximum height of 1.50 m when the lifting mechanism is at its top position, and it can rotate, tilting the head up and down. The RGB-D camera initially used for all tasks is the Microsoft Kinect. However, with the recent acquisition of the Intel® RealSense™ Depth Camera D455 for CHARMIE, some human pose estimation tasks have already been developed using the new camera. The microphone, the speaker and the multimodal user interface are located on the torso, at a maximum height of 1.30 m. The laser range finder is on top of the motion platform, at a height of approximately 25 cm. All the sensors related to the motors (voltage, current and encoders) are attached to the motors.

Map Building and Self-Localisation
To map the environment, it is necessary to detect points of interest, known as keypoints. Additionally, to calculate the travelled trajectory, it is necessary to quantify the movement that occurred between frames. Rather than using all the pixels of an image to detect keypoints, the FAST [45] algorithm offers a computationally more efficient strategy than similar solutions. The algorithm examines a circle of pixels around every pixel that might be a corner. As a quick first test, at least three of four pixels spaced evenly around the circle must differ in intensity from the centre pixel by more than a threshold µ. In addition, the circle must contain a contiguous arc of pixels whose intensity differs from the centre by more than µ. Both conditions must be satisfied for the region to be considered a corner, and thus a keypoint. Next, since the Microsoft Kinect does not have an IMU (Inertial Measurement Unit), visual odometry is estimated from the camera alone, using a monocular odometry algorithm. It detects keypoints with FAST on consecutive frames and associates them between frames by estimating the essential matrix with the LMedS (least median of squares) algorithm.
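The corner test can be illustrated with a small pure-Python sketch of the FAST segment test as it is commonly described: a 16-pixel circle of radius 3, an intensity threshold, and a required contiguous arc length. This is not the implementation used on the robot; the names `mu` and `arc`, the synthetic 9 × 9 images and the default values are assumptions chosen for illustration.

```python
# Offsets (dx, dy) of the 16-pixel circle of radius 3 used by the segment test.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def longest_circular_run(flags):
    """Length of the longest run of True values, treating the list as a ring."""
    if all(flags):
        return len(flags)
    best = run = 0
    for flag in flags + flags:  # doubling the list handles wrap-around
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def is_fast_corner(img, x, y, mu=20, arc=12):
    """Segment test on pixel (x, y); img is a list of rows of grey levels."""
    centre = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [p > centre + mu for p in ring]
    darker = [p < centre - mu for p in ring]
    if arc >= 12:
        # Quick rejection: an arc of 12 always covers at least 3 compass pixels.
        compass = (0, 4, 8, 12)
        if (sum(brighter[i] for i in compass) < 3
                and sum(darker[i] for i in compass) < 3):
            return False
    return max(longest_circular_run(brighter), longest_circular_run(darker)) >= arc


def make_image(is_bright, size=9):
    """Synthetic test image: 200 where is_bright(x, y) holds, 50 elsewhere."""
    return [[200 if is_bright(x, y) else 50 for x in range(size)] for y in range(size)]


corner = make_image(lambda x, y: x <= 4 and y <= 4)    # 90-degree corner at (4, 4)
edge = make_image(lambda x, y: x <= 4)                 # straight vertical edge
flat = make_image(lambda x, y: True)                   # uniform region
line_end = make_image(lambda x, y: y == 4 and x <= 4)  # thin line ending at (4, 4)
```

In this sketch a right-angle corner passes with `arc=9` but not with `arc=12`, which only accepts sharper features such as the line end; straight edges and flat regions are rejected in both cases.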
The 3D map of the environment is built with OctoMap [46], which uses octrees to group all points. This method loses part of the detail in exchange for higher computational efficiency. For CHARMIE's purpose, a time-efficient algorithm is more valuable than a highly detailed 3D map, since the robot only needs the map to navigate safely to the desired position. Figure 8 shows a point cloud converted into an OctoMap from three different perspectives, representing an object on top of a table and a guitar next to the wall. Figure 9 shows an example of a complete indoor environment (office) mapped by CHARMIE.
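The trade-off between detail and efficiency can be seen in a minimal octree sketch: every point is pushed down a fixed number of levels, so all points falling inside the same leaf cube collapse into a single occupied cell. This is a simplified illustration and not the OctoMap library, which additionally stores probabilistic occupancy per node; the class names, the map volume and the sample points are assumptions.

```python
class OctreeNode:
    def __init__(self):
        self.children = {}  # octant index (0..7) -> OctreeNode


class Octree:
    """Fixed-depth octree over a cube centred at `centre` with half-width `half_size`."""

    def __init__(self, centre, half_size, max_depth):
        self.root = OctreeNode()
        self.centre = centre
        self.half_size = half_size
        self.max_depth = max_depth

    @property
    def leaf_size(self):
        """Edge length of a leaf cube, i.e. the map resolution."""
        return 2.0 * self.half_size / (2 ** self.max_depth)

    def insert(self, point):
        """Mark the leaf containing `point` as occupied.
        Points outside the cube end up in the nearest boundary leaf."""
        node = self.root
        c = list(self.centre)
        h = self.half_size
        for _ in range(self.max_depth):
            octant = 0
            h /= 2.0
            for axis in range(3):
                if point[axis] >= c[axis]:
                    octant |= 1 << axis   # one bit per axis selects the octant
                    c[axis] += h
                else:
                    c[axis] -= h
            node = node.children.setdefault(octant, OctreeNode())

    def occupied_leaves(self):
        """Number of distinct leaf cubes that received at least one point."""
        def count(node, depth):
            if depth == self.max_depth:
                return 1
            return sum(count(child, depth + 1) for child in node.children.values())
        return count(self.root, 0)


# Hypothetical 8 m cube with 1 m leaves; three of the four points share a leaf.
cloud = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (0.9, 0.9, 0.9), (1.5, 0.1, 0.1)]
tree = Octree(centre=(0.0, 0.0, 0.0), half_size=4.0, max_depth=3)
for p in cloud:
    tree.insert(p)
```

Increasing `max_depth` by one halves the leaf size and recovers more detail, at the cost of up to eight times as many leaves.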
The mobile platform's self-localisation is done using Adaptive Monte Carlo Localization (AMCL) [47]. In this algorithm, the robot's pose is represented as a set of multiple hypotheses with respect to a previously known map. AMCL combines the information from the mobile platform's odometry and the 2D LiDAR. It starts by performing global localisation, so it does not depend on the initial position, and, once its location is known, only performs local tracking using adaptive particle filters.
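The idea of representing the pose as multiple hypotheses can be sketched with a one-dimensional Monte Carlo localisation loop. This is a deliberately simplified illustration: it uses a fixed number of particles, whereas AMCL adapts the particle count, and the corridor, the single range-to-wall measurement and all noise values are assumptions made for the example.

```python
import math
import random


def mcl_1d(true_start, controls, wall_at=10.0, n_particles=500,
           motion_sigma=0.05, sensor_sigma=0.2, seed=0):
    """Localise a robot moving along a corridor that ends in a wall at `wall_at`.
    The only measurement is a noisy range to that wall."""
    rng = random.Random(seed)
    true_x = true_start
    # Global localisation: particles start spread over the whole corridor.
    particles = [rng.uniform(0.0, wall_at) for _ in range(n_particles)]
    for u in controls:
        true_x += u
        z = (wall_at - true_x) + rng.gauss(0.0, sensor_sigma)  # simulated range reading
        # Motion update: apply the odometry step with noise to every hypothesis.
        particles = [p + u + rng.gauss(0.0, motion_sigma) for p in particles]
        # Measurement update: weight each hypothesis by the range likelihood.
        weights = [math.exp(-0.5 * ((z - (wall_at - p)) / sensor_sigma) ** 2)
                   for p in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # degenerate case: keep all hypotheses
        # Resampling: hypotheses that explain the measurement survive.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    estimate = sum(particles) / n_particles
    return estimate, true_x


estimate, truth = mcl_1d(true_start=2.0, controls=[0.5] * 8)
```

After a few updates the particle cloud collapses around the true position, which corresponds to the switch from global localisation to local tracking described above.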

Navigation (Obstacle Detection and Avoidance)
To safely navigate a previously mapped environment, CHARMIE must detect dynamic and static obstacles that might not be in the environment map and navigate accordingly to overcome these.
Regarding obstacle detection, the robot uses the same sensors as in the mapping function: the 2D LiDAR and the RGB-D camera. The 2D LiDAR is used for small obstacles at the motion platform height. It checks whether there is any obstacle inside a 1.50 m radius and calculates both the obstacle's position relative to the robot and its size. The RGB-D camera follows the same principle: within the 1.50 m radius, it calculates each obstacle's position relative to the robot and its size, creating a virtual obstacle from the floor to the robot's height. Figure 10 displays the same scene from three different angles, showing an example of data fusion between the RGB camera and the depth camera. The obstacles derived from both sensors are combined in a temporary virtual obstacle map that is constantly updated. By applying this method to consecutive frames, the direction of movement of dynamic obstacles is also added to the virtual obstacle map.
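One possible way to turn a LiDAR scan into obstacle positions and sizes is sketched below: beams closer than the 1.50 m radius are grouped into clusters, and each cluster yields a position in the robot frame and an approximate width. The clustering rule, the `gap` parameter and the example scan are assumptions for illustration and are not taken from the robot's software.

```python
import math


def lidar_obstacles(ranges, angle_min, angle_step, max_range=1.5, gap=0.2):
    """Group consecutive beams shorter than `max_range` into obstacles.
    Returns a list of dicts with the obstacle centre (x, y), distance, bearing
    and width, all expressed in the robot frame."""
    obstacles = []
    cluster = []

    def flush():
        if cluster:
            x = sum(p[0] for p in cluster) / len(cluster)
            y = sum(p[1] for p in cluster) / len(cluster)
            obstacles.append({
                "x": x,
                "y": y,
                "distance": math.hypot(x, y),
                "bearing": math.atan2(y, x),
                "width": math.dist(cluster[0], cluster[-1]),
            })
            cluster.clear()

    for i, r in enumerate(ranges):
        if not (0.0 < r <= max_range):   # too far, invalid or missing return
            flush()
            continue
        angle = angle_min + i * angle_step
        point = (r * math.cos(angle), r * math.sin(angle))
        if cluster and math.dist(cluster[-1], point) > gap:
            flush()                      # jump in space: start a new obstacle
        cluster.append(point)
    flush()
    return obstacles


# Hypothetical scan: one obstacle spanning three beams and one single-beam obstacle.
scan = [3.0, 1.0, 1.0, 1.0, 3.0, 3.0, 1.2, 3.0]
found = lidar_obstacles(scan, angle_min=-0.35, angle_step=0.1)
```

Running the same grouping on consecutive scans and comparing cluster centres would give a rough estimate of the direction of movement of dynamic obstacles.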

After parameterising all the static and dynamic obstacles within a nearby radius and defining a target location, CHARMIE uses non-linear dynamical systems [48,49] as a distributed control architecture that generates navigation. Task constraints are component forces that are cast together into the vector field of this dynamical system. For example, the directions φ = ψ_obs (where obstacles lie from the robot's viewpoint) and the direction φ = ψ_tar (where the target lies) are constraints represented by repulsive and attractive forces acting on the heading direction. The attractive force pulls the system towards the desired heading direction, whereas the repulsive forces prevent the system from moving in an undesired direction. As the robot moves, the directions to the target and to the obstacles change and, consequently, the attractor and repellers move in the vector field.
As stated above, the virtual obstacle map provides the direction and size of all obstacles that must influence the robot's trajectory, expressed in the robot's reference frame. A repulsive force is therefore applied for each obstacle i, written in terms of φ, the robot direction, and ψ_i, the obstacle direction, so that (φ − ψ_i) is the obstacle direction relative to the robot. σ is the angular range over which a repulsive force acts; it is defined from ∆θ, the angle the robot occupies, R_robot, the robot radius, and d_i, the distance from the robot's centre to the obstacle. λ_i is the maximum repulsion force; it is defined from β_1, which controls the maximum repulsion strength, and β_2, which controls the decay rate with increasing distance. The repulsor contributions from all obstacles are summed, creating the repulsor vector field. For the target, the target coordinates are first transformed into the target direction ψ_tar, and an attractive force of magnitude λ_tar is applied towards it. To complete the dynamic field vector system, all contributions are added.
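For reference, a form of these forces that is commonly used in the attractor dynamics literature, and that is consistent with the symbol definitions above, is given below. This is an assumption: whether it matches the authors' exact equations cannot be confirmed from the text. σ is written σ_i here because it depends on the distance d_i.

```latex
% Assumed forms, consistent with the symbols defined in the text.
\begin{align}
  f_{obs,i}(\varphi) &= \lambda_i \,(\varphi - \psi_i)\,
      \exp\!\left[-\frac{(\varphi - \psi_i)^2}{2\sigma_i^2}\right] \\
  \sigma_i &= \arctan\!\left[\tan\!\left(\frac{\Delta\theta}{2}\right)
      + \frac{R_{robot}}{R_{robot} + d_i}\right] \\
  \lambda_i &= \beta_1 \exp\!\left(-\frac{d_i}{\beta_2}\right) \\
  f_{tar}(\varphi) &= -\lambda_{tar} \sin(\varphi - \psi_{tar}) \\
  \frac{d\varphi}{dt} &= f_{tar}(\varphi) + \sum_i f_{obs,i}(\varphi)
\end{align}
```

With these forms, each obstacle creates a repeller centred on ψ_i whose strength decays with distance, and the target creates an attractor at ψ_tar.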
In Figure 11, two different scenarios regarding nearby objects are presented. Figure 11a shows three obstacles, in red, that lie inside the maximum security distance circle. Consequently, repulsors are created according to their distance to the robot and their size, as represented in the second graph. In the first graph, the attractor (pink) and the sum of all repulsors (green) are added to create the dynamic field vector system (black), which yields the angle to which the robot should rotate to avoid the obstacles. The graphs in Figure 11b represent the same variables as in Figure 11a.
Figure 11. Two different variations (a,b) of the dynamic field vector system are displayed depending on the various obstacles that the robot faces. The blue and red 2D graph on the right of both images represents the 2D LiDAR data. The bottom graph represents the velocity graph with one attractor to the desired speed. The middle graph represents the influence of the various obstacles the robot faces and consequential repulsors. The top graph represents the angle attractor (pink), the sum of all repulsors (green) and the final dynamic field vector system (black).
The difference between the two images lies in the increase in the obstacle size at the right. This translates into a more aggressive repulsor on the right side, represented in blue in the middle graph. Consequently, both the sum of all repulsors (green) and the dynamic field vector system (black) will also be more aggressive. The third graph of each image represents the speed of the robot, also manipulated by an attractor.
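The behaviour of the heading dynamics can be explored with a short numerical sketch. It assumes a form commonly used in the attractor dynamics literature (a sinusoidal attractor towards the target and Gaussian-shaped repellers for the obstacles, as written in the code comments); the parameter values, the function names and the Euler integration step are illustrative assumptions, not the values used on CHARMIE.

```python
import math


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def heading_rate(phi, psi_tar, obstacles, lambda_tar=1.0, beta1=2.0, beta2=0.5,
                 delta_theta=math.radians(10.0), r_robot=0.25):
    """Rate of change of the heading phi (rad/s).
    obstacles: list of (psi_i, d_i) pairs, i.e. direction (rad) and distance (m)."""
    # Attractor (assumed form): f_tar = -lambda_tar * sin(phi - psi_tar)
    rate = -lambda_tar * math.sin(phi - psi_tar)
    for psi_i, d_i in obstacles:
        lam = beta1 * math.exp(-d_i / beta2)                  # repulsion strength
        sigma = math.atan(math.tan(delta_theta / 2.0)
                          + r_robot / (r_robot + d_i))        # angular range
        diff = wrap(phi - psi_i)
        # Repeller (assumed form): lam * diff * exp(-diff^2 / (2 sigma^2))
        rate += lam * diff * math.exp(-diff ** 2 / (2.0 * sigma ** 2))
    return rate


def simulate(phi, psi_tar, obstacles, dt=0.1, steps=200):
    """Integrate the heading with the explicit Euler method."""
    for _ in range(steps):
        phi = wrap(phi + dt * heading_rate(phi, psi_tar, obstacles))
    return phi


final_heading = simulate(phi=1.0, psi_tar=0.0, obstacles=[])
```

With no obstacles the heading relaxes to the target direction; an obstacle slightly to one side of the current heading produces a rate of the opposite sign, turning the robot away from it, and a closer or larger obstacle produces a stronger repeller.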
To fit CHARMIE's medium-term objective of using machine learning algorithms to solve all of the proposed tasks, a different implementation of the robot's autonomous movement, based on reinforcement learning, was developed. Deep reinforcement learning is widely regarded as a promising direction in artificial intelligence, representing a step towards building autonomous systems with a higher-level understanding of their environment. Applied to robotics, it allows robots to learn control policies directly from real-world sensory information through trial-and-error interactions. The tasks performed by service robots such as CHARMIE can be characterised by long-term planning, high-dimensional continuous action spaces and, in most cases, incomplete information. Even though this is a problem for some reinforcement learning algorithms, novel solutions such as those presented in [50-53] already solve a wide range of simulated tasks with similar characteristics. The algorithm selected for CHARMIE's motion platform is based on Q-Learning [54,55]. The reinforcement learning setup consists of an agent (the CHARMIE motion platform) interacting with the environment (a simulated indoor environment) in discrete timesteps. At each timestep, the agent receives an observation, takes an action and receives a scalar reward. The agent uses its sensory information, the 2D LiDAR, to navigate towards the target position while avoiding the obstacles in its way, similarly to [56]. Through this reinforcement learning method, a mapless motion planner can be trained end-to-end with neither manually designed features nor prior demonstrations. The trained planner can be directly applied to previously unseen environments. Figure 12 shows how the reinforcement learning algorithm, without any previous knowledge about the environment, solved three mazes of increasing complexity. The first, simplest maze is learnt purely by trial and error, and the two following mazes reuse the knowledge gathered from the previous maze.
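The observation, action and reward loop can be illustrated with a minimal tabular Q-learning sketch on a tiny grid maze. This is a simplification for illustration only: the robot's planner uses LiDAR observations and a learnt function approximator rather than a table, and the maze layout, rewards and hyperparameters below are assumptions.

```python
import random

# Hypothetical maze: S = start, G = goal, # = wall, . = free cell.
MAZE = [
    "S.#.",
    "..#.",
    "...G",
]
ACTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # up, down, left, right as (dx, dy)
START = (0, 0)


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    outside = not (0 <= ny < len(MAZE) and 0 <= nx < len(MAZE[0]))
    if outside or MAZE[ny][nx] == "#":
        return state, -1.0, False        # collision: stay in place, penalty
    if MAZE[ny][nx] == "G":
        return (nx, ny), 10.0, True      # goal reached
    return (nx, ny), -0.1, False         # small cost per move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(100):             # cap on episode length
            qs = q.setdefault(state, [0.0] * 4)
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                action = max(range(4), key=qs.__getitem__)
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q.setdefault(nxt, [0.0] * 4))
            # Q-learning update towards reward + discounted best next value.
            qs[action] += alpha * (reward + gamma * best_next - qs[action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the learnt greedy policy from the start cell."""
    state, path = START, [START]
    for _ in range(max_steps):
        qs = q.get(state, [0.0] * 4)
        state, _, done = step(state, max(range(4), key=qs.__getitem__))
        path.append(state)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
```

Reusing a learnt table or network as the starting point for a harder maze, instead of starting from zeros, is one way to carry knowledge from one maze to the next.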
Appl. Sci. 2021, 11, x FOR PEER REVIEW 15 of 31

Human-Robot Interaction (User and Gesture Detection)
One of the biggest concerns of the CHARMIE project is to provide tools that ease communication with users, both older people and healthcare workers. To initiate a communication process, the robot must first recognise its users and their poses/gestures. The user detection functionalities include solutions using the RGB-D camera, both with and without the depth image. The initial 2D visual user detection algorithm detects faces, crops and aligns them, and then recognises which previously trained or new users are attempting to communicate.
For this purpose, a Multi-task Cascaded Convolutional Network (MTCNN) [57] is used. It consists of three convolutional networks of increasing complexity chained together, where the output of each network is the input of the next. The first network, known as P-Net (Proposal Network), generates proposals of where the faces may be, which are used as inputs to the following network. Next, the R-Net (Refinement Network) analyses the proposals and filters out false positives. The final network, O-Net (Output Network), creates the final output: the detected face's image and its facial landmarks.
To classify the faces, the faces detected by the MTCNN are fed to an Inception-ResNet v1 neural network [58], which transforms each face into a representation vector, also known as an embedding. This network combines the Inception and ResNet architectures. The Inception architecture uses multiple inception modules, each containing convolutional layers with different kernel sizes operating in parallel. These filter the same input at a given level of the architecture, and their outputs are concatenated for the next level, extracting varied features with fewer stacked convolutional layers. This makes the network less deep without losing information. The ResNet architecture introduces residual connections. In traditional neural networks, a layer feeds data only to the next one, whereas a residual connection also passes information directly to a deeper layer. These blocks improve two aspects: training time, since skip connections let information bypass layers, and the information lost during gradient descent, since deeper networks otherwise have a hard time staying accurate without overfitting.
The last step is implemented through an SVC-C (Support Vector Classification) model, with a training and a test dataset. This supervised learning algorithm separates the embeddings of the different classes using a hyperplane whose margins are optimised through support vectors. Figure 13a shows an application of the MTCNN algorithm, detecting different famous people's faces. It was trained to recognise 64 different personalities with the "Labeled Faces in the Wild" dataset [59]. Using only information from the 2D camera, it successfully identified all of the trained people in the test set, with certainty percentages of over 60% in the worst cases. Additionally, Figure 13b shows the testing dataset's mapping with PCA (Principal Component Analysis), where images with the same label/output are projected close to each other and far from different ones. For better visualisation, the photos also have a colour differentiation filter. Even though it can recognise 64 different personalities, CHARMIE is set to recognise 20 different people in order to maintain the same levels of accuracy.
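The separating-hyperplane idea can be sketched with a small linear soft-margin SVM trained by subgradient descent on toy two-dimensional "embeddings". This is a simplified stand-in written for illustration: it is binary and linear, whereas the classifier in the text handles many identities on high-dimensional embeddings, and the data, learning rate and function names are assumptions.

```python
def train_linear_svm(samples, labels, c=1.0, lr=0.01, epochs=200):
    """Full-batch subgradient descent on the soft-margin objective
    0.5 * ||w||^2 + c * sum(max(0, 1 - y * (w.x + b))), with labels in {-1, +1}."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)          # gradient of the regulariser 0.5 * ||w||^2
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:      # sample inside the margin or misclassified
                for k in range(dim):
                    grad_w[k] -= c * y * x[k]
                grad_b -= c * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy embeddings of two hypothetical users (labels +1 and -1).
embeddings = [(2.0, 2.0), (2.5, 1.8), (1.8, 2.4),
              (-2.0, -1.5), (-1.7, -2.2), (-2.3, -1.9)]
identities = [1, 1, 1, -1, -1, -1]
weights, bias = train_linear_svm(embeddings, identities)
```

A new face embedding is then classified with `predict(weights, bias, embedding)`; recognising more than two users requires a multi-class extension such as training one classifier per identity.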
To classify the faces, the detected faces output by the MTCNN are introduced as input in an Inception-ResNet v1 neural network [58], transforming it into an image representation vector in space, also known as embedding. This network combines the Inception and ResNet architectures. The Inception architecture uses multiple inception modules, which have different convolutional-layers with different kernel sizes operating in parallel. These filter the same level layer in the architecture, concatenating in the next level, finding various features with fewer convolutional-layers. This process makes the network less deep without losing information. The Resnet network introduces residual connections. In traditional neural networks, a layer feeds data to the next one, but with this algorithm, it sends direct information to a deeper layer. These blocks improve two areas. The training time, given that skip-connections can jump layers without training and the lost information when making gradient descent since more deep networks have a hard time being accurate without overfitting or working more straightforward tasks.
The last step is implemented through an SVC-C (Support Vector Classification), with a training and a test dataset. This supervised learning algorithm separates the samples of the different classes, represented by their embeddings, using a hyperplane whose margins are optimised through support vectors. Figure 13a shows an application of the MTCNN algorithm, detecting different famous people's faces. It was trained to detect 64 different personalities with the "Labeled Faces in the Wild" dataset [59]. Using only information from the 2D camera, it successfully identified all of the trained people in the test set, with certainty percentages of over 60% in the worst cases. Additionally, Figure 13b shows the testing dataset's mapping with PCA (Principal Component Analysis), where images with the same label are projected close to each other and far from those with different labels. For better visualisation, the photos also have a colour differentiation filter. Even though it can recognise 64 different personalities, CHARMIE is set to recognise 20 different people in order to maintain the same levels of accuracy.
When updating to 3D user recognition technology, some works that use the same camera (Microsoft Kinect) [60] show algorithms that can overcome different face poses, expressions, illumination and disguises. Other solutions, such as FaceNet [61], demonstrate a deep convolutional network trained to directly optimise the embedding itself. The depth solution CHARMIE employs to recognise a small number of users uses various deep convolutional neural networks, similar to the RGB-only solution, but integrating both the RGB and the depth image. Figure 14a demonstrates a classification example of two users using the depth information in addition to the RGB. The small number of trained users can justify the significantly higher certainty values compared to the 2D solution. However, some problems, such as lighting conditions and pictures of users, can be successfully filtered with the addition of the depth camera. An example of the lighting condition being filtered is shown in the right image of Figure 14a. Figure 14b shows the two images input to the neural networks: the RGB image and the depth matrix.
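The classification of embeddings by a separating hyperplane, described earlier for the 2D solution, can be sketched as follows. This is a simplified illustration and not the SVC-C implementation used on CHARMIE: it assumes a binary, linear SVM trained by sub-gradient descent on the hinge loss, and the two-dimensional "embeddings" in `SAMPLES` are invented toy values (real face embeddings have far more dimensions and many classes).

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]

# Invented toy embeddings for two users; labels are +1 and -1.
SAMPLES: List[Vector] = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.0),
                         (-2.0, -2.0), (-3.0, -2.5), (-2.5, -3.0)]
LABELS: List[int] = [1, 1, 1, -1, -1, -1]


def decision_value(w: Vector, b: float, x: Vector) -> float:
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(samples: Sequence[Vector], labels: Sequence[int],
                     epochs: int = 200, learning_rate: float = 0.1,
                     reg: float = 0.01) -> Tuple[List[float], float]:
    """Minimise the regularised hinge loss with per-sample sub-gradient steps."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            violated = y * decision_value(w, b, x) < 1.0
            # The regularisation term shrinks the weights at every step.
            w = [wi * (1.0 - learning_rate * reg) for wi in w]
            if violated:
                # Samples inside the margin push the hyperplane away from them.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0.0 else -1


if __name__ == "__main__":
    weights, bias = train_linear_svm(SAMPLES, LABELS)
    print(predict(weights, bias, (2.2, 2.1)), predict(weights, bias, (-2.4, -2.0)))
```

Only the samples closest to the boundary trigger weight updates once training settles, which mirrors the role of the support vectors mentioned in the text.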
After recognising the users with whom the robot is establishing an interaction, CHARMIE analyses their poses. Essential information can be interpreted from the pose: (i) the human's position, such as standing, sitting or lying; (ii) an estimate of their intention to perform an action; (iii) whether they are pointing at something; and (iv) an analysis of their motion when doing collaborative tasks.
Skeleton tracking processes depth image data to determine multiple skeleton joints' positions on a human body.
The cameras distinguish a human from the background and identify the position of several features or joints, such as the head, knees, elbows and hands. Once identified, the software connects the joints into a humanoid skeleton and tracks their position in real-time, providing the X, Y and Z coordinates for each of the skeleton points. The addition of depth cameras [62] allows the skeleton tracking system to remove uncertainties between overlapping or occluded objects or limbs, making the method more robust to different lighting conditions than a 2D camera-based algorithm [63]. The algorithm used is based on the Skeleton Tracking SDK by cubemos. It provides fast and highly accurate 2D and 3D human pose estimation that allows tracking of 18 joints simultaneously (two ankles, two knees, two hips, two wrists, two elbows, two shoulders, one between the chest and the neck, one nose, two eyes and two ears). Owing to its artificial intelligence algorithms, it can track up to five people in real-time. Figure 15 shows the skeleton-fitting pose estimation overlaid on the users. It can detect high-speed movements and estimate a joint location even when the joint is hidden from the camera. Figure 15c shows that, using pose estimation alone, the robot can detect when a person raises their arm in the air to ask for assistance.
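As an illustration of how a raised arm could be detected from tracked joints, the following minimal sketch compares wrist and shoulder heights. The joint names, the coordinate convention (Y in metres, growing upwards) and the `margin` threshold are assumptions made for this example; they are not the cubemos SDK's data format or CHARMIE's actual rule.

```python
from typing import Dict, Tuple

# (x, y, z) in metres; y is assumed to grow upwards in this sketch.
Joint = Tuple[float, float, float]


def is_arm_raised(joints: Dict[str, Joint], margin: float = 0.05) -> bool:
    """Return True if either wrist is clearly above its shoulder.

    Joints that the tracker could not estimate are simply absent
    from the dictionary and are skipped.
    """
    for side in ("left", "right"):
        wrist = joints.get(f"{side}_wrist")
        shoulder = joints.get(f"{side}_shoulder")
        if wrist is None or shoulder is None:
            continue
        if wrist[1] > shoulder[1] + margin:
            return True
    return False


if __name__ == "__main__":
    asking_for_help = {
        "left_shoulder": (0.2, 1.40, 2.0), "left_wrist": (0.3, 1.80, 2.0),
        "right_shoulder": (-0.2, 1.40, 2.0), "right_wrist": (-0.3, 0.90, 2.0),
    }
    print(is_arm_raised(asking_for_help))
```

The margin avoids triggering on small estimation noise when a wrist is roughly level with the shoulder.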
The joints tracked by the algorithm also allow the agents to communicate or transmit information using different gestures. Some technologies, like optical flow, provide solutions to track movement between images. Optical flow captures the movement of people or animals, as long as it is in a different direction from the camera's movement, and allows 2D verification of the movement direction. FlowNet2 [64] presents an end-to-end solution based on convolutional neural networks to estimate optical flow. It uses various methods that allow estimating movements at both quick and slow speeds. All of the methods have different purposes, and their combination, despite providing good results, is very computationally expensive. LiteFlowNet [65] therefore introduces a different end-to-end convolutional neural network architecture to estimate optical flow. This network, used by CHARMIE for gesture recognition, uses an optimised structure whose goal is to match the precision of FlowNet at a lower computational expense. In Figure 16, the human agent performs a gesture, from (a) to (c), in which they show the palm of the hand to the robot, then close the fist and bring it to their chest. The robot can detect different sets of movements that can be configured and associated with a specific task the user intends the robot to do.
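The 2D verification of the movement direction mentioned above can be sketched as a simple post-processing step on a flow field. The sketch assumes the flow vectors (dx, dy) have already been produced by an optical flow estimator such as LiteFlowNet; the function name, the `min_magnitude` threshold and the direction labels are hypothetical and do not describe CHARMIE's gesture classifier.

```python
import math
from typing import Iterable, Tuple


def dominant_direction(flow: Iterable[Tuple[float, float]],
                       min_magnitude: float = 0.5) -> str:
    """Classify the mean flow vector as left, right, up, down or none.

    Image coordinates are assumed: x grows to the right, y grows downwards.
    """
    vectors = list(flow)
    if not vectors:
        return "none"
    mean_dx = sum(dx for dx, _ in vectors) / len(vectors)
    mean_dy = sum(dy for _, dy in vectors) / len(vectors)
    # Ignore very small average motion, which is likely noise.
    if math.hypot(mean_dx, mean_dy) < min_magnitude:
        return "none"
    if abs(mean_dx) >= abs(mean_dy):
        return "right" if mean_dx > 0 else "left"
    return "down" if mean_dy > 0 else "up"


if __name__ == "__main__":
    print(dominant_direction([(2.0, 0.1), (3.0, -0.2)]))
```

A sequence of such per-frame directions could then be matched against a configured gesture, although the matching itself is outside the scope of this sketch.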
Apart from the visual human-machine interfaces, the robot has two more communication systems. One is through speaking and listening, similar to a human-human conversation. CHARMIE uses the CMU Sphinx tools from Carnegie Mellon University for speech recognition and the Emic 2 Text-to-Speech Module to perform text-to-speech conversion. All conversations with the robot must be in English. The most significant advantage of this method is that users do not need any prior knowledge of the robot to communicate with it successfully. The robot recognises sequences of keywords from the human. Even though the robot may not understand the entire conversation from a user, it understands keywords and confirms that the information received is correct by asking the user whether what it understood is right. Some examples of keywords are names of its users, names of different rooms, names of objects, and actions.
The robot's response may vary according to its perception, location and priorities, and whether it is performing a task or moving somewhere. Every sentence a user says to the robot must start with the word "CHARMIE", so the robot knows whether it is being addressed. The video in Appendix A shows CHARMIE introducing itself and some conversations with users.
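The keyword-based dialogue described above can be sketched on already-transcribed text. The vocabulary in `KEYWORDS`, the function names and the wording of the confirmation question are illustrative assumptions; the sketch does not use the CMU Sphinx API and only shows the wake-word check, keyword spotting and confirmation steps.

```python
import re
from typing import Dict, List, Optional

WAKE_WORD = "charmie"

# Hypothetical vocabulary; the robot's real keyword lists are not given here.
KEYWORDS: Dict[str, List[str]] = {
    "action": ["bring", "carry", "follow", "find"],
    "room": ["kitchen", "bedroom", "living room"],
    "object": ["bottle", "medicine", "bag"],
}


def parse_command(sentence: str) -> Optional[Dict[str, List[str]]]:
    """Return the keywords found, or None if the robot is not addressed."""
    text = re.sub(r"[^a-z\s]", " ", sentence.lower())
    words = text.split()
    if not words or words[0] != WAKE_WORD:
        return None
    rest = " ".join(words[1:])
    found: Dict[str, List[str]] = {}
    for category, vocabulary in KEYWORDS.items():
        hits = [kw for kw in vocabulary
                if re.search(rf"\b{re.escape(kw)}\b", rest)]
        if hits:
            found[category] = hits
    return found


def confirmation_question(parsed: Dict[str, List[str]]) -> str:
    """Build the question the robot asks to confirm what it understood."""
    parts = [f"{category}: {', '.join(words)}"
             for category, words in parsed.items()]
    return "Did you mean " + "; ".join(parts) + "?"


if __name__ == "__main__":
    command = parse_command("CHARMIE, bring the bottle from the kitchen")
    if command is not None:
        print(confirmation_question(command))
```

Sentences that do not start with the wake word return `None`, so ordinary conversation between people in the room is ignored.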
The last human-machine interface the robot has is a multimodal user interface. This system can be used in environments with a significant noise level, or when the user has difficulty speaking or cannot make a predefined gesture. A tablet at the robot's torso presents a menu that lets users select all the features offered by the previous human-robot interactions, with the addition of being available in languages other than English.

Object Detection and Subsequent Manipulation
For learning and recognising objects, both healthcare-related and household items, CHARMIE uses the supervised learning algorithm named YOLO (You Only Look Once) [66]. YOLO is a state-of-the-art, real-time object detection system known for its high speed and accuracy. The YOLOv3 [67] algorithm starts by separating the image into a grid. Each grid cell predicts several bounding boxes around objects that score highly with predefined classes. Each bounding box has a confidence score reflecting how accurate the prediction is assumed to be, and detects only one object. The bounding box priors are generated by clustering the dimensions of the ground truth boxes from the original dataset to find the most common shapes and sizes. Unlike other models, YOLO looks at the entire image when testing, so its predictions reflect the image's global context. It makes predictions with a single network evaluation, unlike systems such as R-CNN, which require thousands of evaluations for a single image. This makes it extremely fast: more than a thousand times faster than R-CNN and a hundred times faster than Fast R-CNN. However, YOLO is not ideal to use with models where large datasets may be hard to obtain. Even with its high speed, YOLO is not fast enough to run on embedded devices such as a Raspberry Pi. To help make YOLO even faster, the algorithm's creators defined a YOLO architecture variation called Tiny-YOLO. This architecture is approximately 442% faster than YOLO and can achieve 244 FPS on a single GPU. Since Tiny-YOLO is a more compact version, it is also less accurate. The architecture used in YOLOv3 is called Darknet-53 [67]. With its 53 convolutional layers and no max-pooling, its main job is to perform feature extraction. Batch normalisation and a leaky ReLU follow each convolution operation. The Darknet-53 architecture has proved to be an extremely efficient network for object classification.
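The grid and confidence-score ideas above can be illustrated with a short sketch. It assumes box centres normalised to [0, 1], a 13 x 13 grid and a confidence threshold of 0.5; these values, and the `Detection` structure, are assumptions for the example and do not reproduce the Darknet implementation.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    label: str
    confidence: float
    cx: float  # box centre x, normalised to [0, 1]
    cy: float  # box centre y, normalised to [0, 1]
    w: float   # box width, normalised
    h: float   # box height, normalised


def responsible_cell(cx: float, cy: float, grid_size: int) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the box centre."""
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col


def filter_detections(detections: List[Detection],
                      threshold: float = 0.5) -> List[Detection]:
    """Keep only the boxes whose confidence reaches the threshold."""
    return [d for d in detections if d.confidence >= threshold]


if __name__ == "__main__":
    candidates = [
        Detection("bottle", 0.91, 0.50, 0.50, 0.10, 0.30),
        Detection("can", 0.32, 0.20, 0.70, 0.08, 0.12),
    ]
    for det in filter_detections(candidates):
        print(det.label, responsible_cell(det.cx, det.cy, grid_size=13))
```

The clamp with `grid_size - 1` keeps a centre lying exactly on the right or bottom image border inside the last cell.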
CHARMIE uses the Tiny-YOLOv3 architecture to detect different healthcare-related and household objects. Figure 17 shows some of the things that the robot has already learnt to detect, like bottles, cans and bags of chips. To introduce new objects into CHARMIE's database, CHARMIE records a video of the item and collects all the frames. This information is later used to retrain Tiny-YOLOv3.
Figure 17. Two different outputs (a,b) from the YOLO detection algorithm that detect and locate various pre-trained household objects simultaneously in real-time.
To grasp the detected objects, CHARMIE uses its redundant manipulator, the robot arm. Some objects with unusual or changing shapes, such as a bag of chips, have dedicated collection algorithms. Nevertheless, for most items, the picking-up procedure is similar. After detecting the desired object, CHARMIE calculates the inverse kinematics to place the robot arm and the lifting mechanism in the best position to collect the object. Then the hand is closed according to the object's physical properties. Figure 18 demonstrates an example of picking up a can. As stated, the robot moves its platform and lifting mechanism to best fit the item's position (Figure 18a). The arm is moved towards the object to be manipulated (Figure 18b), and the hand is placed right next to it (Figure 18c). In this case, the robot starts by using the thumb as a back wall (Figure 18d), and then closes the fingers one by one, from the index finger to the little finger (Figure 18e). This movement allows the robot to pick up the can similarly to how a human would (Figure 18f).
Figure 18. A step-by-step demonstration of CHARMIE's process of picking up the desired item to be used in subsequent tasks.
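The inverse kinematics step mentioned in the grasping procedure can be illustrated with the textbook planar two-link case. This is a deliberate simplification: CHARMIE's arm is redundant and works together with a lifting mechanism, so the link lengths, function names and the closed-form solution below are assumptions for illustration rather than the robot's solver.

```python
import math
from typing import Tuple


def two_link_ik(x: float, y: float, l1: float, l2: float) -> Tuple[float, float]:
    """Return (shoulder, elbow) angles in radians that reach point (x, y).

    Uses the law of cosines and returns one of the two possible elbow
    configurations (elbow angle in [0, pi]).
    """
    d2 = x * x + y * y
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow


def forward(shoulder: float, elbow: float, l1: float, l2: float) -> Tuple[float, float]:
    """Forward kinematics, used here to check the inverse solution."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y


if __name__ == "__main__":
    angles = two_link_ik(0.30, 0.20, l1=0.30, l2=0.25)
    print(angles, forward(*angles, 0.30, 0.25))
```

Feeding the returned angles back through `forward` recovers the target point, which is a convenient sanity check before commanding any actuator.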

Results
CHARMIE performs a wide range of tasks as a service and assistant robot, combining the four low-level functions previously stated. When developing algorithms to perform new tasks, the central focus lies mostly on how helpful these are for elderly people or healthcare workers. By aiding both, CHARMIE can provide a higher quality of life for older people. Some central problems the robot tries to tackle regard tasks: (i) where older people have a lack of mobility (picking objects from the floor); (ii) where they have a lack of strength (carrying heavy objects); (iii) that are safety-related (if a person falls and cannot get up); and (iv) that happen on a day-to-day basis and that may require a lot of energy and include injury risks. As stated, CHARMIE can perform a wide variety of tasks. Of those, five chores that encompass different scenarios and different interactions are thoroughly explained.

Help Me Carry This Bag
One of the most severe difficulties for older people is carrying heavy objects. A common practice among the elderly is to shop for groceries more regularly, which translates to less weight but more trips to the stores. For now, CHARMIE is only intended for indoor use, with some exceptions. Thus, in groceries-carrying tasks, the robot is designed to receive the bags at the entrance of the environment and transport them to the desired location. Figure 19 shows all the steps CHARMIE undergoes to complete this task. Additionally, this task can be seen in the video in Appendix A.

Figure 19. A step-by-step demonstration of CHARMIE's "Help me carry this bag" task: receiving the task from the first user, collecting the bag, navigating to the desired location and hand-delivering it to the second user.
The user starts by initiating the dialogue, requesting help from the robot to collect and transport the grocery bags, as can be seen in Figure 19a. The user must indicate where the robot must transport the loads and, optionally, whether these must be given to another known user. After receiving this information, CHARMIE expresses that it is ready to receive the bag for transportation and asks the user to place it in its hand, as can be seen in Figure 19b. The robot detects when the user places the load in its hand through human pose estimation and confirms it via the force the arm actuators must provide, as can be seen in Figure 19c. The robot then starts moving to the delivery location while performing safe navigation with static and dynamic obstacle avoidance, as can be seen in Figure 19d. If the robot does not have to deliver the grocery bags to another user, it analyses the table surface and tries to find an empty spot to drop the bags. If there is no open space for the bags, the robot waits for a user to clarify where the bags should be placed. If the robot's task is to hand the bag to a human user, it searches for the user when arriving at the desired location. If no user is there to receive it, the robot tries to place it on the table. Otherwise, the robot moves in the user's direction and asks whether they are available to receive the bag, as can be seen in Figure 19e. If the answer is no, the robot waits until the user is available. When the user confirms their availability, the robot extends its arm in the human's direction and asks the user to collect the bag, as can be seen in Figure 19f. Afterwards, the robot returns to the initial user to verify whether there are more grocery bags to transport.
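The delivery decisions described above can be summarised in a small decision function. The function name, its boolean inputs and the returned action labels are hypothetical; the sketch only encodes the branching described in the text and not CHARMIE's task controller.

```python
from typing import Optional


def delivery_action(recipient: Optional[str],
                    recipient_present: bool,
                    recipient_available: bool,
                    table_has_free_spot: bool) -> str:
    """Choose what to do with the bag on arrival at the delivery location."""
    # No second user requested, or the requested user was not found:
    # fall back to placing the bag on the table.
    if recipient is None or not recipient_present:
        if table_has_free_spot:
            return "place_on_table"
        return "wait_for_instructions"
    # The recipient is there but has not yet confirmed availability.
    if not recipient_available:
        return "wait_for_recipient"
    return "hand_over"


if __name__ == "__main__":
    print(delivery_action("second user", True, True, table_has_free_spot=False))
    print(delivery_action(None, False, False, table_has_free_spot=True))
```

Keeping the decision separate from perception and motion makes each branch easy to check against the behaviour shown in Figure 19.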

Can You Find This Item and Bring It to Me
With ageing, older people tend to have more difficulties getting up from a sitting position. Additionally, with mental health deterioration, there is a tendency for memory to start failing and for people to forget where an object was placed. One of the main chores CHARMIE performs is to collect things and return them to the user. Some examples of objects might be medicine boxes, cans, bottles, cellphones and remote controllers. A variation of this task is when a user does not know the exact location of the object they ask the robot to retrieve. In the example provided for this task, a scenario is presented where the user is lying in bed, not feeling very well and needing to drink some water.
The user starts by calling CHARMIE, and upon arrival the robot sees that the person is lying in bed and stays alert to the task it must do. Through dialogue, the user indicates that they are not feeling very well and cannot get up, but need to drink some water. In this situation, the user can either say exactly where the drink is or give only the general location, in which case the robot must find it. CHARMIE starts moving towards the kitchen through safe navigation. When arriving at the kitchen, if the user stated where the drink is, the robot goes directly to that location. If the user did not define the area, the robot must search around the kitchen, starting with open spaces like the counters, tables and open shelves. After detecting the specified drink, the robot picks up the object with its redundant manipulator and, if possible, closes any kitchen cabinets it opened. Figure 18 displays a scenario where CHARMIE has various drinks and objects placed on a table, and it must analyse and collect the required drink. The robot returns to the user and asks whether they are available to receive the drink. When the user is available, the robot extends the arm in the human's direction and, through pose estimation, waits for the user to collect it. After finishing, the robot asks whether the user is feeling better and whether they need anything else.

Check on the Patients
One of the major causes of serious injuries in the elderly is falls. The reflexes to protect themselves start to degrade, and older people lose balance and movement capabilities, which may cause serious falls. In situations where the person lives alone, this is extremely dangerous, since the person is only checked for potential injuries when someone goes to their house. One of CHARMIE's most safety-related tasks consists of patrolling indoor areas and, through pose estimation, understanding whether a person is in danger. The robot can patrol nursing homes or hospitals at night to check whether the patients are lying in their beds, lowering the detection time for an older person who fell. CHARMIE can also check whether a person is not in bed, or is sitting down and needing some assistance. In cases where older people live without any health professional, the robot can quickly send an alert message to the emergency contacts.
In the nursing home patrolling task, CHARMIE uses the known environment map, where the patient rooms are defined. When patrolling the bedrooms, the robot starts by calmly opening or pushing the door and slowly moving inside the bedroom so as not to scare or wake up the patients. As displayed in Figure 20, the robot may encounter various scenarios when estimating the human's pose to evaluate patient safety.
In the nursing home patrolling task, CHARMIE uses the known environment map in which the patient rooms are defined. When patrolling the bedrooms, the robot starts by very calmly opening or pushing the door and slowly moving inside the bedroom so as not to scare or wake up the patients. As displayed in Figure 20, the robot may encounter various scenarios when estimating the human's pose to evaluate patient safety. The first possible scenario is the detection of a patient lying in bed, which has several variations. The first, displayed in Figure 20a, shows the pose detection of a patient lying on the bed without any covers. The second, shown in Figure 20b, shows a patient lying in bed but covered from the waist down. The third, Figure 20c, shows a patient lying in bed but covered from the neck down, with an arm also showing. In the first two cases, the pose estimation algorithm properly detects that the user is safely lying in bed, but in the third case detection does not always succeed. Thus, when the robot cannot estimate the pose, it analyses the bed height variation to differentiate between the patient lying in bed and the bed being empty. If no pose is estimated inside the room, two different situations may occur: the patient is either missing or sleeping deep under the covers. If the patient is lying in bed, the robot calmly leaves the room and slightly closes the door.
If the bed height analysis indicates that there is no patient in the room, the robot sends an emergency warning to the healthcare workers that a patient might be missing. Another scenario occurs when a patient is sitting on the bed, Figure 20d, and requires non-emergency help. In this case, CHARMIE also summons a healthcare professional, but in a different way from the first case. The last scenario is the worst case: an elderly person has fallen out of bed and cannot get up, as can be seen in Figure 20e,f. If CHARMIE detects this scenario, it immediately sends an emergency message to all of the healthcare professionals. This recognition process is repeated in every room of the nursing home. For an older person who lives by themselves, this recognition system can be temporarily programmed.
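The patrol scenarios described above can be summarised as a small decision function. The following is a minimal sketch under stated assumptions, not the authors' implementation: the pose labels, the `bed_occupied` flag (standing in for the result of the bed height analysis) and the action names are all hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import Optional


class Pose(Enum):
    """Hypothetical pose labels produced by a pose estimator."""
    LYING_IN_BED = "lying_in_bed"
    SITTING_ON_BED = "sitting_on_bed"
    ON_FLOOR = "on_floor"


class Action(Enum):
    """Hypothetical robot responses for each patrol scenario."""
    LEAVE_QUIETLY = "leave_quietly"
    REQUEST_ASSISTANCE = "request_assistance"  # non-emergency help
    EMERGENCY_MISSING = "emergency_missing"    # patient may be missing
    EMERGENCY_FALL = "emergency_fall"          # alert all professionals


def decide_action(pose: Optional[Pose], bed_occupied: bool) -> Action:
    """Map one room observation to an action.

    `pose` is None when no pose could be estimated in the room.
    `bed_occupied` is an assumed boolean result of the bed height analysis.
    """
    if pose is Pose.ON_FLOOR:
        return Action.EMERGENCY_FALL
    if pose is Pose.SITTING_ON_BED:
        return Action.REQUEST_ASSISTANCE
    if pose is Pose.LYING_IN_BED:
        return Action.LEAVE_QUIETLY
    # No pose detected: fall back to the bed height analysis.
    if bed_occupied:
        return Action.LEAVE_QUIETLY  # patient is likely under the covers
    return Action.EMERGENCY_MISSING
```

Checking the fall case first reflects the priority given in the text, where a fallen patient is the worst case and triggers an immediate emergency message.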
3.4. "Store the Groceries" or "Clean up the Room"
CHARMIE can also execute some cleaning and tidying tasks. Even though these tasks do not require carrying heavy loads, they involve a different physical restriction that may end in injury. When tidying a room, older people may have to adopt positions that can lead to accidents, such as bending to pick objects up from the floor or stretching to reach the top or bottom shelves. With the goal of helping the elderly, and at the same time freeing healthcare workers to focus on more critical patient-related chores, CHARMIE can analyse the environments that need to be cleaned. By perceiving which objects are out of place, the robot can collect them and place them in previously indicated areas. To describe this task, two different variations are presented.
To store groceries lying on the kitchen table, the correct place for every object must have been previously indicated to the robot. After the bags have been unloaded, the robot analyses the various items it must store. If an object is not in the robot's database, CHARMIE asks the user for the correct spot to place the item. While storing this product, the robot captures images from different angles to later add to its training data. The remaining known products are stored in the correct place using CHARMIE's redundant manipulators. If the robot comes across a full shelf when attempting to place an item, it returns the item to the kitchen table and informs the user.
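The storing procedure above can be sketched as a simple loop. This is an illustrative sketch only: the `Shelf` class, the `known_places` dictionary, and the `ask_user` callback are hypothetical names that do not come from the CHARMIE software, and the sketch assumes that the place named by the user already exists in `shelves`. Image capture for the training data is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Shelf:
    """Hypothetical storage place with a fixed number of slots."""
    capacity: int
    items: List[str] = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.items) >= self.capacity


def store_groceries(
    groceries: List[str],
    known_places: Dict[str, str],
    shelves: Dict[str, Shelf],
    ask_user: Callable[[str], str],
) -> List[str]:
    """Store each item and return the items left on the table.

    An item is left on the table when its shelf is already full.
    """
    returned: List[str] = []
    for item in groceries:
        if item not in known_places:
            # Unknown object: ask the user and remember the answer.
            known_places[item] = ask_user(item)
        shelf = shelves[known_places[item]]
        if shelf.is_full():
            returned.append(item)  # back to the table, user is informed
        else:
            shelf.items.append(item)
    return returned
```

Returning the list of items that could not be stored keeps the "inform the user" step separate from the storing logic, so the caller can decide how to report it.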
When cleaning a specific room, the robot starts by analysing the environment, extracting information about all objects on the floor or tables. With its depth image, the robot can differentiate objects from flat surfaces and save their location. As in the groceries example, the robot must already know where each object belongs. Unlike the groceries example, the places where the items go vary in height, which forces the robot to adapt its own height to successfully manipulate the out-of-place objects.
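One simple way to separate objects from a flat surface is to threshold heights against the dominant surface level. The sketch below assumes that the depth image has already been converted into a grid of heights in metres (`height_map`) and that the flat surface covers most of that grid; the function name and the 2 cm threshold are hypothetical, and the paper does not state which method CHARMIE actually uses.

```python
from statistics import median
from typing import List, Tuple


def find_object_cells(
    height_map: List[List[float]], min_height: float = 0.02
) -> List[Tuple[int, int]]:
    """Return (row, col) cells that rise above the dominant flat surface.

    The surface level is estimated as the median height of the grid,
    which assumes the surface occupies more than half of the cells.
    """
    all_heights = [h for row in height_map for h in row]
    surface = median(all_heights)
    return [
        (r, c)
        for r, row in enumerate(height_map)
        for c, h in enumerate(row)
        if h - surface > min_height
    ]
```

The returned cell coordinates play the role of the saved object locations mentioned in the text.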

"Follow Me" and "Lay the Table"
Another beneficial task of a service robot is to follow a user who needs help performing a task. At times, CHARMIE might be in a different room from the one where the user needs it for the next job. Therefore, to ease the whole process, the user might ask CHARMIE to follow him/her to the new room. This chore is particularly useful in large indoor environments, where following a user is significantly more efficient than having the robot search for the worker. While following, the robot can avoid static and dynamic obstacles or even calculate a new trajectory when it realises it cannot pass.
Figure 21, extracted from the video in Appendix A, shows all of the steps of the "Follow Me" task. The user starts by asking the robot to follow him/her to the room where help is needed, as can be seen in Figure 21a. The robot uses both the 2D LiDAR and the RGB-D camera to lock onto the user it is following, as can be seen in Figure 21b. During the process, the robot can navigate narrow spaces and even avoid collisions with humans who pass between the followed user and CHARMIE, as can be seen in Figure 21c,d. Almost at the end of the route, a new obstacle appears: the robot detects an insuperable barrier, similar to a wall, between itself and the followed user (represented by the black cardboard shown in the video in Appendix A), as can be seen in Figure 21e. Thus the robot, using the environment's map, calculates an alternative route to get back to its user, as can be seen in Figure 21f. On the return path, CHARMIE finds novel obstacles that it successfully overcomes.
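The rerouting step can be illustrated with a shortest-path search on an occupancy grid. The paper does not state which planner CHARMIE uses, so the sketch below uses breadth-first search as a stand-in; the grid encoding (0 for free, 1 for blocked) and the function name `plan_path` are assumptions made for this example.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def plan_path(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Breadth-first search on a 4-connected occupancy grid.

    Cells with value 0 are free and cells with value 1 are blocked.
    Returns the shortest list of cells from start to goal, or None
    when the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    parents: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the parents to rebuild the path.
            path: List[Cell] = []
            current: Optional[Cell] = cell
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None
```

When a newly detected barrier is marked as blocked in the grid, calling `plan_path` again from the robot's current cell yields an alternative route around it, or `None` if the user cannot be reached.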

Figure 21. A step-by-step demonstration of CHARMIE's "Follow Me" task, from receiving the task to navigating behind the human user while avoiding static and dynamic obstacles that cross the path between the robot and the user. When the robot cannot continue following the user, it calculates a new trajectory.
As an example of a collaborative duty, the robot can lay the table with a human user. The methodology is similar to that of the storing/cleaning tasks, but with a collaborative twist. When this task is selected, CHARMIE knows that it needs five items to lay on the table: a plate, a fork, a spoon, a knife and a cup. The robot analyses the objects already laid on the table by a worker or a human user. It then complements the user's work by signalling which items it will distribute, while also checking whether the human user forgot an item in a previously laid set.
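The complementing step amounts to comparing the detected objects with the required set. The sketch below is illustrative only: the five item names come from the text, while the function name and the use of plain strings as object labels are assumptions.

```python
from typing import Iterable, List

# The five items named in the text for one place setting.
PLACE_SETTING = ("plate", "fork", "spoon", "knife", "cup")


def missing_items(detected: Iterable[str]) -> List[str]:
    """Return the place-setting items not yet on the table, in a fixed order."""
    present = set(detected)
    return [item for item in PLACE_SETTING if item not in present]
```

The same function covers both uses described above: applied to the current set it gives the items the robot should distribute, and applied to a previously laid set it reveals any item the user forgot.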

Conclusions
This article presents the hardware and software solutions of CHARMIE, a collaborative healthcare and domestic service and assistant robot. It also describes results from the development of the initial prototype and from the first set of user trials, conducted in a controlled laboratory setting and focused on developing an assistive care robot for older adults. The chores presented cover several scenarios where CHARMIE can directly impact the quality of life of older people, mainly regarding physical safety, but also concerning social interactions and mental health. The majority of the robot's tasks designed for geriatric care involve fall detection and prevention, such as patrolling nursing or domestic homes and picking up objects from the floor. In addition to the tasks demonstrated, CHARMIE can also work as a social companion robot, asking questions throughout the day, allowing the user to play games in its multimodal user interface, and reminding users of their schedules. Even though most tasks mainly aid the elderly, these chores can be adapted to be more oriented to healthcare workers or people with reduced mobility.
From a different perspective, given the difficulties brought by the COVID-19 pandemic and its associated lockdowns, robots such as CHARMIE provide a safer way to overcome this disease's challenges. Robots provide an excellent solution, since the virus cannot replicate in a robot as it does in a human, and their use drastically reduces person-to-person contact. Elderly care facilities and all other healthcare-related environments are particularly at risk of severe outbreaks, since they house a very vulnerable population. Without a robotic solution, uninfected residents of such facilities face a higher chance of contamination, which may happen through asymptomatic healthcare professionals. In addition to all of the social and mental health tasks that social robots can perform, service and assistant robots such as CHARMIE can provide more direct help not only to patients, but also to healthcare professionals in a wide range of healthcare facilities. The persistence of the virus has meant that all healthcare workers undergo very long working shifts, with little sleep, and in many situations away from their families. This can lead to healthcare workers reaching a state of high fatigue and burnout in overburdened health systems, which can be eased if some of the tasks are performed by healthcare robots such as CHARMIE. Some examples of such tasks are: transporting goods, providing patients with information, patrolling the facilities, and sending an alert when any unexpected patient behaviour is detected. Additionally, due to the COVID-19 pandemic, the tests planned for real-world environments, in this case two nursing homes, two domestic homes and one hospital, initially scheduled for 2020, had to be indefinitely postponed as mandated by the national public health committee.
As short-term and middle-term objectives, novel concepts and tasks are planned for development and implementation in CHARMIE. The robot can successfully detect falls and prevent some scenarios where older people might fall by picking up items from the floor. However, one of the most significant difficulties regarding the movement and mobility of older people arises when trying to get up from a sitting position, which is aggravated in single-person households. This scenario happens several times during the day: when getting out of bed, after a meal, or when getting up from the sofa. In these situations, the elderly may not have enough strength, or may apply too much force and lose balance, which results in dangerous falls. The goal is to provide active help to older people when trying to stand up. Another goal is to extend the robot's working area to buildings with more than one floor. Since healthcare facilities are commonly prepared for people with reduced mobility or wheelchairs, it is uncommon to have rooms that can only be accessed through stairs. Since the motion platform cannot climb stairs, the goal is to create a system that allows the robot to move between floors using elevators. The map would consist of all the floors in the building, and the robot would move to the elevator to switch floors. However, this task encompasses many steps that are still being worked on, such as calling the elevator, pressing the correct buttons, and entering and leaving without colliding with other users. In order to benchmark all of these different tasks, it is intended to apply for RoboCup@Home participation. The CHARMIE project had already done so in 2017; the video in Appendix A is the qualification video, part of the necessary qualification material. The video demonstrates CHARMIE performing some of the tasks previously described.
The long-term goals for the CHARMIE project can be divided into two categories. From a technological perspective, it is intended to create the hardware and software solutions needed to transport larger wheeled objects, such as hospital beds and wheelchairs, that may be carrying patients. Moving these wheeled platforms with human patients on top is extremely challenging. The robot must adapt its omnidirectional motion to the wheeled object, detect obstacles further away and analyse the patient's pose. All this must be performed while respecting the strict safety measures that come with patient transportation. Additionally, as previously stated, it is intended to use machine learning algorithms for a wide range of tasks, allowing the robot to learn how to perform and improve chores via observation and trial-and-error interaction with the environment. However, these household and healthcare tasks usually involve long-term planning, high-dimensional continuous action spaces, the simulation-to-real-world transfer problem and, in most cases, incomplete information. Such issues require highly complex reinforcement learning algorithms to be solved. This methodology enables adaptive learning of reinforcement-learning-based service and assistive elderly care chores like those previously described. From a global perspective, it is planned to start thoroughly testing CHARMIE in real healthcare and domestic environments. All functions, such as map building, self-localisation, obstacle detection and avoidance, human-robot interaction, and object detection and manipulation, will be executed in a hospital, two nursing homes, and two domestic houses. This will allow a qualitative and quantitative evaluation, by both developers and users, of how the robot performs in such environments. The final results will comprise a fitting framework for a socially assistive robot for end-to-end chores whose final goal is to enhance the quality of life of all its users.
Acknowledgments: The authors of this work would like to thank the members of the Laboratory of Automation and Robotics from the University of Minho as well as all former CHARMIE project members.