Article

Object Affordance-Based Implicit Interaction for Wheelchair-Mounted Robotic Arm Using a Laser Pointer

State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2023, 23(9), 4477; https://doi.org/10.3390/s23094477
Submission received: 29 March 2023 / Revised: 27 April 2023 / Accepted: 3 May 2023 / Published: 4 May 2023
(This article belongs to the Special Issue Robot Assistant for Human-Robot Interaction and Healthcare)

Abstract

With the growth of the world’s population, limited healthcare resources cannot provide adequate nursing services for all people in need. The wheelchair-mounted robotic arm (WMRA) with interactive technology could help to improve users’ self-care ability and relieve nursing stress. However, users struggle to control the WMRA because of its complex operation. To reduce the burden of using the WMRA, this paper proposes an object affordance-based implicit interaction technology using a laser pointer. Firstly, a laser semantic identification algorithm combining YOLOv4 and the support vector machine (SVM) is designed to identify laser semantics. Then, an implicit action intention reasoning algorithm based on the concept of object affordance is explored to infer users’ intentions and learn their preferences. To perform the actions associated with the task intention in the scene, dynamic movement primitives (DMP) and a finite state machine (FSM) are used to generalize the action trajectories and to reorder the sequence of actions in the template library, respectively. Finally, we verified the feasibility of the proposed technology on a WMRA platform. Compared with the previous method, the proposed technology can output the desired intention faster and significantly reduces the user’s limb involvement time (by about 85%) when operating the WMRA on the same task.

1. Introduction

The world’s population continues to grow [1]. The increasing number of people challenges the social care service system, and most care workers face unprecedented pressure [2]. The WMRA can help people complete some household tasks independently [3], which improves quality of life and reduces caregiving stress. However, the traditional joystick remote control mode of the WMRA requires frequent limb movements from the user, which brings physical and psychological burdens. The WMRA should be easy to operate and give the user a sense of control [4,5]. Therefore, helping users interact with the WMRA with fewer limb movements and convey their intentions with minimal operation is a current research focus that affects the performance and user acceptance of the WMRA.
To facilitate the operation of robots for users with different levels of physical ability, many human–robot interaction (HRI) interfaces utilizing residual limb abilities have been studied, such as those using the chin [6], shoulder [7], gestures [8], and eye movements [9]. In these studies, the interaction maps residual limb movements to robot instructions, such as forward, back, left, right, rotation, or other Cartesian motions of the robotic arm, as well as some preset simple household tasks. Such interfaces help users perform structured tasks but still require frequent limb movement. In addition, using biological signals to remotely control robots to perform tasks would substantially reduce limb movements and physical burden [10,11,12,13]. However, time-consuming signal recognition, complex operation, and high cost limit their development.
In recent years, researchers have tried to use the concept of “shared attention” [14] to reduce the complexity of manipulating robots. The concept refers to “what you see is what you want”, so that robots can grasp or place target objects for a user by capturing the focus of the user’s attention. For instance, screen tapping [15], eye gaze [16], a laser pointer [17,18], and electroencephalogram (EEG) recognition [19] can share attention with robots. However, compared with a laser pointer, gaze, EEG, and screen-based interfaces demand too much of the user’s attention, causing them to ignore the surrounding environment and leading to safety risks.
Users can convey simple intentions to the robot through shared attention. However, people’s intentions are complex and robots’ tasks are unstructured, which requires a robot to make deeper inferences about intention from a simple interaction. The concept of object affordance refers to actions that match the physical properties of an object [20,21], which helps in reasoning about deeper task intentions from the user’s shared attention. Therefore, we propose an implicit intention interaction technology using a laser pointer based on the idea of object affordance. A user points to objects in the scene with a laser pointer, and the WMRA infers the action intentions from the selected objects and executes the corresponding actions.
The proposed technology is composed of two major parts. One is the laser semantics identification model based on the fusion of the SVM and YOLOv4 algorithms, which identifies the corresponding semantics mainly by detecting the flicker frequency of the laser spot. The other is the household task intention reasoning model based on the conditional random field (CRF) and the Q-learning algorithm. We formalize the concept of object affordance with the CRF algorithm and reason about the user’s intention from the object information in the scene. In addition, through the Q-learning algorithm, the reasoning model can learn user preferences from the long-term history of intention predictions, since people may have different operation intentions for the same object on different occasions. Based on the above models, our implicit interaction technology can use the laser pointer to achieve “shared attention” with the WMRA and reason about the user’s intention. Finally, to execute the household tasks and verify the practicability of the proposed method, we embed the DMP algorithm into the laser interaction technology to generalize the action trajectories of the tasks and achieve the logical state transitions between different actions through the finite state machine.
The contributions of this paper are summarized below:
  • The WMRA with laser interaction is designed, and it can identify the laser semantics correctly even if users point to the wrong position due to hand tremors.
  • The model of intention reasoning and execution based on object affordance is proposed so that the WMRA can identify the desired intention faster and perform tasks with less limb involvement from the user.
The rest of this paper is organized as follows. Section 2 presents related works. Section 3 describes our system and laser semantic identification. Section 4 describes the technology for reasoning about intentions and executing tasks. Section 5 reports the experiments and results. Lastly, the paper is concluded and discussed in Section 6.

2. Related Works

2.1. Interaction Using Laser Pointer

Interaction using a laser pointer has been widely studied because of its simple operation and clear instructions. Gualtieri et al. created a grasping assistance system for activities of daily living (ADLs); its novel user interface and grasping capabilities enable the system to grasp objects automatically when a user points to an object with a laser pointer [18]. Sprute et al. introduced virtual borders to a mobile robot using a laser pointer so that it could navigate according to the user’s wishes [21]. Fukuda et al. used a laser spot to guide the direction of an electric wheelchair and successfully bypassed obstacles [22]. Minato et al. proposed a robot navigation system with pattern recognition of figures drawn by a laser pointer [23]. Widodo et al. used a laser spot to select buttons on a large screen to control the movement of a robot for people with limited mobility [24]. Kemp et al. proposed a human–computer interaction interface that used a laser spot to direct the EL-E robot. It allowed humans to select a three-dimensional position in the world and communicate with the mobile robot intuitively, picking up the object selected by the laser spot [25]. Nguyen et al. improved on the work of Kemp et al. and realized the picking and placement of designated objects according to the behavior of the laser spot and the context of the environment, in which the robot can pick up a designated object from the floor or a table, deliver the object to a designated person, and place the object on a designated table [26]. Chavez et al. used automatic object capture technology to grasp objects locked onto by the laser spot in an unstructured environment [27].
These studies presented user-friendly and low-cost interactive methods based on a laser pointer and proved their efficacy. A laser-pointing device places little burden on the human body, and users can participate in the process of performing tasks (acquiring a sense of engagement). However, in existing research on laser interaction, robots can only execute simple grasping tasks. There is still little discussion of how to convey more complex user intentions and control the robot to complete complex tasks using a laser pointer.

2.2. Object Implicit Intention Identification

Object affordance holds that people perceive the behavioral possibilities offered by things [28]. Grezes et al. found that when people perceive a task object, they simultaneously activate the cortical areas that store visuomotor information [29]. From the perspective of neurophysiology, this provides evidence that perceiving an object automatically activates the action-related neural representations of that object. The psychological experiments of Anna et al. show that the cognitive system serves action and that different actions activate different object constructions [30]. Hassanin et al. argued that research on object affordance helps to predict the behavior of robots or people, understand social scenes and the hidden value of objects, achieve fine-grained scene understanding, and recognize intention [20]. Martijn et al. drew on the concept of object affordance and realized the recognition of the current assembly action and the prediction of the next assembly action according to the sequence of objects operated by the operator in a video [31]. Mi et al. combined speech recognition with the concept of object affordance and proposed an affordance-based multi-modal fusion framework, which infers the user’s grasping intention from the user’s voice commands and finally achieves the desired goal [32]. Mo et al. studied the affordance relationships between objects and used the implicit attributes of objects to predict the execution modes of four household tasks [33]. Mandikal et al. embedded the concept of object affordance into a deep reinforcement learning loop to learn grasping policies favored by people [34]. Deng et al. built a dataset composed of 18 executable actions and 23 types of objects, which can help a robot identify the grasping actions implied by objects [35]. Xu et al. studied the representation of affordance, analyzed the long-term execution effect of objects in a task, and predicted the action to be performed next [36]. In addition, some studies have integrated human actions with object category clues to predict users’ intentions [37,38].
The above research shows that there is a relationship between the object to be operated and the action to be performed. Veronica et al. believed that intention recognition based on the idea of object affordance is a new direction for planning more complex tasks using people’s attention and object recognition [39]. However, the affordance relationship between objects and actions in human society is a psychological tendency, not a strict mapping rule. Therefore, the digital modeling of affordance relationships and the adaptability of the model to human habits still hinder the development of implicit interaction.
Kim et al. showed that a robot can perceive the operator’s attention by understanding the user’s gestures, identify the objects of interest to the user, and infer the task intentions [40]. Kester et al. first proposed an action intention reasoning method based on object categories, but the method could only infer actions for a single object, without considering the possible impact of the sequence of attended objects on action intention recognition [41]. Li et al. observed the user’s eyes with visual technology, inferred the position of the gazed object on the screen, and reasoned about possible subsequent tasks using a simple Bayesian probability model [42]. Wang et al. presented an off-line training and action intention recognition method based on long short-term memory networks that tracks eye movement signals such as fixation on objects, line-of-sight inclination, and the rate of change of inclination, and that can recognize four sub-action intentions: reach, move, set down, and manipulate [39]. Hitherto, scholars have imitated the human ability to infer action intention based on shared attention and the concept of affordance, which provides a way forward for disabled users to realize implicit intelligent interaction with assistive robots such as the WMRA. However, the existing task reasoning methods based on the concept of affordance are difficult to generalize, and the parsing and execution of tasks mostly depend on expert coding [43]. Therefore, the affordance modeling between objects and actions, the learning mechanism for user preferences, and the generalized task execution mechanism need further study.

3. Implicit Interaction Using a Laser Pointer

3.1. WMRA Specifications

The method proposed in this paper is developed on a WMRA with an NVIDIA Jetson TX2 embedded board. The WMRA mainly consists of a robotic arm, an electric wheelchair, and other components, and the whole robot is powered by two DC 12 V batteries in series, as shown in Figure 1. The robot is equipped with a Realsense D435i RGB-D camera and a Realsense D435 RGB-D camera. One provides global vision (D435i, eye-to-hand) and is mounted on the top of the wheelchair, and the other provides operational vision (D435, eye-in-hand) on the robotic arm. It is worth mentioning that the D435i, used for global vision, additionally contains an inertial measurement unit (IMU) compared with the D435, which can measure the triaxial attitude angle and acceleration of the WMRA.

3.2. System Overview

In our system, a user could activate the interaction by pointing to a target object using a laser pointer, as shown in Figure 2. The visual system would capture the laser spot, focus on the target object, and extract scene information. The information would be processed by six functional modules:
  • Object recognition: Affordance information of target objects will be accessed, such as visual appearance and spatial relationships.
  • Point cloud data processing: 3D point clouds in the scene are filtered to acquire the objects’ coordinates using the Point Cloud Library (PCL).
  • Semantics identification: Identifying the user instructions by the flicker frequency of the laser spot.
  • Intention reasoning: Inferring the user’s intention based on affordance information and semantics information.
  • Grasping posture: An appropriate grasping posture about task intention for the robot arm is generated.
  • Sub-actions reordering: A task expressed by intention consists of some objects and sub-actions. This module will choose and reorder the sub-actions within an action template library and generalize their trajectory.
The touch screen interface enables users to acquire interactive information, such as scene information, the inferred intention, confirm and emergency buttons, and the operating status of the system. The WMRA performs the task after the user confirms that the robot’s displayed intention matches their own.
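To make the data flow between these modules concrete, the following minimal sketch chains them in a single interaction cycle. All of the class and function names (camera, detector, reasoner, and so on) are illustrative placeholders assuming a modular design; the paper does not specify its software interfaces at this granularity.

```python
# Hypothetical orchestration of the six functional modules for one interaction
# cycle. Every name below is a placeholder, not an interface from the paper.

def interaction_cycle(camera, detector, semantics, reasoner, grasp_planner, executor, ui):
    frames = camera.capture_stream()                   # RGB-D stream containing the laser spot
    detections = [detector.detect(f) for f in frames]  # object recognition (YOLOv4)
    clouds = camera.point_clouds()                     # point cloud data processing (PCL filtering)
    labels = semantics.identify(detections)            # semantics identification (SVM on flicker pattern)
    intention = reasoner.infer(detections, labels)     # intention reasoning (OAIN)
    if not ui.confirm(intention):                      # touch screen confirmation by the user
        return None
    pose = grasp_planner.plan(intention, clouds)       # grasping posture generation
    return executor.run(intention, pose)               # sub-action reordering (FSM) and execution (DMP)
```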

3.3. Semantic Recognition of Laser Pointer Interaction

Precise detection of the laser spot is crucial for the WMRA to focus on the target objects and subsequently identify the semantics. Instead of using conventional background difference or template matching methods [44], we propose a combined algorithm of YOLOv4 [45] and the support vector machine (SVM) to detect laser spots and recognize laser semantics. As shown in Figure 3, the visual system captures the laser spot and transmits the image stream to the YOLOv4 algorithm. Then, the SVM receives the resulting labels and identifies the interactive semantics. Finally, the module outputs the semantic information, for example, that the user has chosen a cup and a bowl.
In the process of laser interaction, the laser spot flashes regularly on the target objects, as shown in Figure 2c. To calculate the flashing duration and position of the laser spot, the YOLOv4 algorithm, which has fast and stable image recognition capabilities, is used to process the image stream transmitted from the vision system, as shown in Figure 3. To enhance the precision of laser spot detection by YOLOv4, an additional 1800 images of the laser spot in diverse home environments are added to the COCO dataset and augmented with random brightness changes, random rotation, and salt-and-pepper noise. Considering the potential misreading caused by laser spot shaking due to the limited limb movement abilities of users, we add missed-operation samples to the dataset to improve detection accuracy and robustness.
The object category and laser semantic mapping labels output by the YOLOv4 algorithm are processed to generate a 26-dimensional array, which is defined as the input of the SVM. Four semantic images and their labels are presented in the first and fourth rows of Figure 3, respectively. If the laser spot is detected but does not fall inside the bounding box of any object, a 1 is appended to the label array. If the laser spot is detected inside the bounding box of an object, a numerical label (2, 3, 4, etc.) is appended according to the serial number of that object. If no laser spot is detected, a 0 is appended.
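As a concrete illustration of this labeling rule, the sketch below builds such an array under the assumption that each captured frame contributes one entry to the 26-dimensional input and that YOLOv4 detections arrive as (class name, bounding box) pairs; the helper names and data layout are ours, not the paper’s implementation.

```python
# Minimal sketch of constructing the 26-dimensional label array from per-frame
# detections. Assumed frame format: list of (class_name, (x1, y1, x2, y2)).

def point_in_box(point, box):
    """Check whether an (x, y) point lies inside an (x1, y1, x2, y2) box."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def frame_label(detections, object_serials):
    """Map one frame of detections to a single integer label.

    0 -> no laser spot detected
    1 -> laser spot detected, but not inside any object's bounding box
    2, 3, 4, ... -> laser spot detected inside the box of that serial-numbered object
    """
    spots = [d for d in detections if d[0] == "laser_spot"]
    if not spots:
        return 0
    x1, y1, x2, y2 = spots[0][1]
    center = ((x1 + x2) / 2, (y1 + y2) / 2)
    for name, box in detections:
        if name != "laser_spot" and point_in_box(center, box):
            return object_serials.get(name, 1)  # e.g., {"cup": 2, "bowl": 3, ...}
    return 1

def build_label_array(frames, object_serials, length=26):
    """Concatenate per-frame labels into the fixed-length SVM input vector."""
    labels = [frame_label(f, object_serials) for f in frames[:length]]
    return labels + [0] * (length - len(labels))  # pad with 0 (no laser spot)
```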
The SVM is well suited to learning from small samples due to the maximum-margin principle [46,47]. To train an SVM classifier for semantic identification, we labeled 100 groups of the training set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, in which the input $x_i$ is a 26-dimensional array and the output $y_i$ is the semantic information. For ease of operation, we design three types of semantic information: (1) a short click indicates the selection of an object (the numerical label determines the target object) or the user’s decision “yes”; (2) a long press indicates that object selection has ended or the user’s decision “no”; and (3) a double click indicates resetting the semantics or ending the task in progress. To classify the three types of semantic information, the Gaussian kernel is used to construct the dataset features:
$$K(x, x_i) = \exp\left(-\gamma \lVert x - x_i \rVert^2\right),$$
where $x$ is the sample, $x_i$ is the landmark of the Gaussian kernel, and $\gamma$ is the bandwidth.
After solving for the weight parameters of the SVM, the complete semantic identification module can recognize the interactive information from the laser pointer.
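As a rough sketch of this step, the snippet below trains an RBF-kernel SVM on 26-dimensional label arrays with scikit-learn; the synthetic data, class encoding, and hyperparameters are placeholders rather than the values used in the paper.

```python
# Minimal sketch of training the Gaussian-kernel SVM for the three laser
# semantics (short click, long press, double click). Training data here is
# random stand-in data; in the paper, 100 labeled groups are used.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 26)).astype(float)  # stand-in 26-dimensional label arrays
y = rng.integers(0, 3, size=100)                       # 0 = short click, 1 = long press, 2 = double click

# kernel="rbf" corresponds to K(x, x_i) = exp(-gamma * ||x - x_i||^2)
clf = SVC(kernel="rbf", gamma="scale", C=1.0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
clf.fit(X, y)

# At run time, a new 26-dimensional array is classified into one of the three semantics.
print("Predicted semantic:", clf.predict(np.zeros((1, 26))))
```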

4. Intention Reasoning and Task Execution

The semantic identification module transmits the category information of the selected objects to the intention reasoning module, as shown in Figure 2d. To infer the user’s intention, we explore the implicit actions of objects and construct an object–action intention network (OAIN) based on the concept of object affordance. Figure 4 illustrates the workflow of the reasoning module. First, the OAIN receives the semantic information and outputs several intentions with different probabilities. The user then expresses a decision, which is used to learn user preferences. When the user confirms an intention, the final intention is output to the next module and logically assembled for execution.

4.1. Implicit Action Intention Recognition

The International Classification of Functioning, Disability and Health (ICF) guideline describes the tasks necessary to maintain people’s lives in a home environment. Therefore, we construct a knowledge base dataset of household tasks by extracting the executed object and action labels from the ICF, as shown in Table 1. The dataset contains three types of task descriptions, which consist of different numbers of object–action pairs. Objects represent the targets on which the WMRA will act, and actions reflect the functional affordance of the adjacent objects.
In order to formalize the affordance between objects and actions, we propose an intention reasoning method based on the conditional random field (CRF). The CRF is a machine learning algorithm that can predict hidden random variables $A$ from observed random variables $O$ by solving for the maximum conditional probability $P(A \mid O)$. Reasoning about action intention with object category information as a context clue can thus be regarded as finding the maximum-probability hidden variable “action intention” given the observed variable “object”.
The CRF model consists of nodes and edges. As depicted in Figure 5, nodes represent variables and edges represent transition features. The edge between an object node and an action node represents the state transition feature, defined as $s_l(a_i, o, i)$, and the edge between action nodes represents the action transition feature, defined as $t_k(a_{i-1}, a_i, o, i)$.
The object and action sets are represented as $O = \{O_1, O_2, \ldots, O_n\}$ and $A = \{A_1, A_2, \ldots, A_n\}$, respectively. Given that $O$ takes the value $o$ and $A$ takes the value $a$, the conditional probability model $P(a \mid o)$ is expressed as:
$$P(a \mid o) = \frac{1}{Z(o)} \exp\left( \sum_{i,k} \lambda_k t_k(a_{i-1}, a_i, o, i) + \sum_{i,l} \mu_l s_l(a_i, o, i) \right),$$
where
$$Z(o) = \sum_{a} \exp\left( \sum_{i,k} \lambda_k t_k(a_{i-1}, a_i, o, i) + \sum_{i,l} \mu_l s_l(a_i, o, i) \right),$$
where $\lambda_k$ and $\mu_l$ represent the weights of the action transition features and state transition features in the model, $k$ and $l$ index the different transition features, and $Z(o)$ is the normalization coefficient, i.e., the sum over all action sequences for the given objects.
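To make the formula concrete, the toy sketch below evaluates $P(a \mid o)$ by brute-force enumeration for a two-object sequence; the miniature action vocabulary and feature weights are invented for illustration and are not learned values from the paper.

```python
# Toy evaluation of the linear-chain CRF probability P(a | o): the score sums
# weighted state features s(a_i, o_i) and transition features t(a_{i-1}, a_i),
# and Z(o) is computed by enumerating every candidate action sequence.
import itertools
import math

ACTIONS = ["grasp", "pour", "drink"]

def score(a_seq, o_seq, state_w, trans_w):
    total = 0.0
    for i, (a, o) in enumerate(zip(a_seq, o_seq)):
        total += state_w.get((a, o), 0.0)                 # state transition feature weight
        if i > 0:
            total += trans_w.get((a_seq[i - 1], a), 0.0)  # action transition feature weight
    return total

def prob(a_seq, o_seq, state_w, trans_w):
    z = sum(math.exp(score(cand, o_seq, state_w, trans_w))
            for cand in itertools.product(ACTIONS, repeat=len(o_seq)))
    return math.exp(score(a_seq, o_seq, state_w, trans_w)) / z

# Hypothetical weights: a cup affords grasping or drinking, a bowl affords pouring into.
state_w = {("grasp", "cup"): 2.0, ("drink", "cup"): 1.0, ("pour", "bowl"): 2.0}
trans_w = {("grasp", "pour"): 1.5}

print(prob(["grasp", "pour"], ["cup", "bowl"], state_w, trans_w))
```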
To solve for the weight parameters, we use the log-likelihood equation and the improved iterative scaling method. The log-likelihood equation is expressed as:
$$L(w) = \log \prod_{o,a} P_w(a \mid o)^{\tilde{P}(o,a)},$$
where
$$\tilde{P}(o,a) = \frac{v(o,a)}{N},$$
and
$$w_k = \begin{cases} \lambda_k, & k = 1, 2, \ldots, K_1, \\ \mu_l, & k = K_1 + l;\ l = 1, 2, \ldots, K_2, \end{cases}$$
where $\tilde{P}(o,a)$ represents the empirical probability distribution, $v(o,a)$ represents the frequency of the sample $(o,a)$ in the training dataset, and $N$ is the total number of samples.
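For readers who want to reproduce this kind of model, the sketch below fits a linear-chain CRF on toy object-to-action sequences with the sklearn-crfsuite package; note that this library optimizes with L-BFGS rather than the improved iterative scaling used here, and the training pairs are invented, so it only illustrates the data layout.

```python
# Hypothetical CRF training on object sequences (observations) and action
# sequences (labels) with sklearn-crfsuite; used here only to show the format.
import sklearn_crfsuite

def featurize(objects):
    """One feature dict per position: the observed object plus its neighbors."""
    feats = []
    for i, o in enumerate(objects):
        f = {"object": o}
        if i > 0:
            f["prev_object"] = objects[i - 1]
        if i < len(objects) - 1:
            f["next_object"] = objects[i + 1]
        feats.append(f)
    return feats

# Toy knowledge-base samples: object sequences and their action-intention labels.
X_objects = [["cup"], ["cup", "bowl"], ["spoon", "bowl", "person"]]
y_actions = [["grasp"], ["grasp", "pour"], ["pick", "scoop", "feed"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100,
                           all_possible_transitions=True)
crf.fit([featurize(o) for o in X_objects], y_actions)

# Predict the action-intention sequence for a newly selected object sequence.
print(crf.predict([featurize(["cup", "bowl"])]))
```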
The prediction step of the OAIN uses the Viterbi algorithm, which can quickly traverse the probability distribution for a known input sequence and select the maximum-probability result. The predicted action sequence with the maximum probability can be expressed as:
$$a^{*} = \arg\max_{a} P_w(a \mid o),$$
where $a^{*}$ represents the optimal action intention sequence.
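A compact sketch of Viterbi decoding for the toy weights introduced above is shown below; maximizing the unnormalized score is equivalent to maximizing $P_w(a \mid o)$, since $Z(o)$ does not depend on $a$. The vocabularies and weights remain illustrative assumptions.

```python
# Minimal Viterbi decoder for the toy linear-chain CRF: returns the action
# sequence a* with the highest score given the observed object sequence.
ACTIONS = ["grasp", "pour", "drink"]
state_w = {("grasp", "cup"): 2.0, ("drink", "cup"): 1.0, ("pour", "bowl"): 2.0}
trans_w = {("grasp", "pour"): 1.5}

def viterbi(o_seq, actions, state_w, trans_w):
    # delta[a] = best score of any prefix ending in action a; back stores predecessors
    delta = {a: state_w.get((a, o_seq[0]), 0.0) for a in actions}
    back = []
    for o in o_seq[1:]:
        new_delta, pointers = {}, {}
        for a in actions:
            prev, best = max(((p, delta[p] + trans_w.get((p, a), 0.0)) for p in actions),
                             key=lambda pair: pair[1])
            new_delta[a] = best + state_w.get((a, o), 0.0)
            pointers[a] = prev
        delta, back = new_delta, back + [pointers]
    # Backtrack from the best final action
    path = [max(delta, key=delta.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

print(viterbi(["cup", "bowl"], ACTIONS, state_w, trans_w))  # ['grasp', 'pour']
```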

4.2. Learning Mechanism

Although we have trained the OAIN using the CRF algorithm, the OAIN may not correctly output the user’s intention when the user’s preference changes. As shown in Figure 4, in order to adapt to the user preference, we embed the Q-learning algorithm into the intention reasoning module so that the OAIN can automatically update weight parameters after the user makes a decision about action intention.
Q-learning is a value iteration algorithm based on the Markov decision process in the field of reinforcement learning. The core of Q-learning is making decisions by evaluating changes in the Q value. The Q value is the reward or penalty received after the agent selects an action, and the action triggers a state change for the agent. The Q value is iterated according to the Bellman equation and stored in the Q table, which is indexed by states and actions.
The low-probability intention candidates produced during the traversal of the Viterbi algorithm are placed in a data cache pool so that the user can express a decision about the intention through laser interaction, as depicted in Figure 4. After the user locks in an intention, the learning mechanism quantifies the interaction process and transmits the quantization results to the reward mechanism, which is the Bellman equation in Q-learning:
$$Q^{*}(s, a) = \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \right],$$
where $s$ is the previous state (previous intention), $a$ is the action (the interaction process), $s'$ is the current state (current intention), $T(s, a, s')$ is the transition function, $R(s, a, s')$ is the reward function, and $\gamma$ is the discount coefficient.
In our scenario, the user’s intention may change over time, which means that a fixed optimal strategy does not exist. Therefore, we adjust the Bellman equation and embed it into our framework. The iteration is expressed as:
$$q(s, a) = r(s, a) + \gamma\, q(s', a'),$$
where $q(s, a)$ represents the Q value of the previous state; $r(s, a)$ is the reward function, whose value is 0 when the user chooses “no” and 1 when the user chooses “yes”; and $\gamma$ is the discount coefficient, set to 1.
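The sketch below shows one plausible reading of this preference update: each cached intention candidate keeps a Q value that is refreshed with reward 1 for “yes” and 0 for “no” (with gamma = 1), and the candidates are re-ranked before the next query. The data structures and ranking step are our assumptions, not the paper’s exact bookkeeping.

```python
# Minimal preference-learning sketch following q(s, a) = r(s, a) + gamma * q(s', a').
from collections import defaultdict

GAMMA = 1.0
q_table = defaultdict(float)  # keyed by (selected objects, intention)

def update_preference(objects, prev_intention, curr_intention, user_said_yes):
    """Update the Q value of the previously queried intention after a user decision."""
    r = 1.0 if user_said_yes else 0.0  # reward: "yes" -> 1, "no" -> 0
    q_table[(objects, prev_intention)] = r + GAMMA * q_table[(objects, curr_intention)]

def rank_candidates(objects, candidates):
    """Reorder cached Viterbi candidates so preferred intentions are queried first."""
    return sorted(candidates, key=lambda c: q_table[(objects, c)], reverse=True)

# Example: the user rejects "drink-cup", then accepts "grasp-cup pour-bowl".
objs = ("cup", "bowl")
update_preference(objs, "drink-cup", "grasp-cup pour-bowl", user_said_yes=False)
update_preference(objs, "grasp-cup pour-bowl", "grasp-cup pour-bowl", user_said_yes=True)
print(rank_candidates(objs, ["drink-cup", "grasp-cup pour-bowl"]))
```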

4.3. Intention Execution

To perform the action intention sequence continuously, the finite state machine (FSM) is used to logically assemble the actions. The FSM is a state transition mechanism with the characteristics of state storage and logical coordination. Figure 6 illustrates an example of the “grasp cup pour bowl” task. First, the WMRA processes the point cloud data of the target objects to calculate their positions. It then chooses template trajectories from our action template library according to the action intention and generates new trajectories adapted to the positions of the target objects in the scene. Lastly, the trajectories are reordered by the FSM, and the WMRA follows the FSM-defined rules to execute the action intention automatically.
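As a simplified illustration of this logical assembly, the sketch below encodes the “grasp cup pour bowl” task as a finite state machine whose states trigger sub-actions and advance on success. The particular states, sub-action names, and executor callback are assumptions for illustration, not the WMRA’s actual controller.

```python
# Minimal FSM sketch that sequences the sub-actions of a "grasp cup pour bowl" task.
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    MOVE_TO_CUP = auto()
    GRASP_CUP = auto()
    MOVE_TO_BOWL = auto()
    POUR = auto()
    RETREAT = auto()
    DONE = auto()

# Each state maps to (sub-action to execute, next state on success).
TRANSITIONS = {
    State.IDLE: ("locate_targets", State.MOVE_TO_CUP),
    State.MOVE_TO_CUP: ("move_to_cup", State.GRASP_CUP),
    State.GRASP_CUP: ("close_gripper", State.MOVE_TO_BOWL),
    State.MOVE_TO_BOWL: ("move_to_bowl", State.POUR),
    State.POUR: ("tilt_wrist", State.RETREAT),
    State.RETREAT: ("move_home", State.DONE),
}

def run_task(execute):
    """Step through the FSM; `execute(action, state)` should return True on success."""
    state = State.IDLE
    while state != State.DONE:
        action, next_state = TRANSITIONS[state]
        if not execute(action, state):
            print(f"Sub-action '{action}' failed in {state.name}; aborting task.")
            break
        state = next_state
    return state

# Dummy executor that always succeeds, just to show the control flow.
final = run_task(lambda action, state: print(f"{state.name}: {action}") or True)
print("Final state:", final.name)
```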
Since the demonstrated trajectories in the template library cannot adapt to changes in object positions in the scene, the dynamic movement primitives (DMP) algorithm [48,49] is used to post-process the trajectories. DMP is a well-known trajectory generalization algorithm [50] that generalizes the trajectories of similar actions by solving for the optimal curve characteristics of the demonstrated trajectory. The generalization of one-dimensional motion is generated by the following differential equations:
$$\tau \dot{v} = K(g - x) - Dv + K(g - x_0)s + K f(s),$$
$$\tau \dot{x} = v,$$
where $x$ is the position on the trajectory, $v$ is the velocity of the trajectory point, $x_0$ is the starting point of the trajectory, $g$ is the goal of the trajectory, $\tau$ is the time scaling factor, $K$ and $D$ are the gain coefficients of the system, which are used to adjust the convergence of the system, and $f(s)$ is the nonlinear function of the demonstrated trajectory.
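A minimal numerical sketch of this transformation system, using the signs as reconstructed above and a simple forward-Euler integration, is given below; the gains, the canonical-system decay, and the zero forcing term are placeholder choices, and a learned f(s) would reshape the path toward the demonstrated trajectory.

```python
# One-dimensional DMP rollout: tau*v' = K(g - x) - D*v + K(g - x0)*s + K*f(s), tau*x' = v.
import numpy as np

def rollout_dmp(x0, g, forcing, tau=1.0, K=100.0, D=20.0, dt=0.001, alpha_s=4.0):
    x, v, s = x0, 0.0, 1.0
    positions = [x]
    for _ in range(int(tau / dt)):
        v_dot = (K * (g - x) - D * v + K * (g - x0) * s + K * forcing(s)) / tau
        x_dot = v / tau
        v += v_dot * dt
        x += x_dot * dt
        s += (-alpha_s * s / tau) * dt   # canonical system: tau * s' = -alpha_s * s
        positions.append(x)
    return np.array(positions)

# With f(s) = 0 the DMP reduces to a spring-damper that converges to the new goal;
# replacing the forcing term with one learned from a template trajectory
# generalizes that trajectory to new start and goal positions.
trajectory = rollout_dmp(x0=0.0, g=0.3, forcing=lambda s: 0.0)
print(round(float(trajectory[-1]), 3))  # close to the goal 0.3
```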

5. Experiment and Results

In this section, the feasibility of the proposed implicit interaction technology is verified on a WMRA platform. We test the laser interaction, the intention reasoning model, the learning mechanism, and the task execution. Compared with some previous works [41,44], our model achieves better results.

5.1. Laser Interaction Evaluation

We deployed the YOLOv4 and SVM-based laser interaction on the embedded board, Jetson TX2. Compared to other visual algorithms, such as the Transformer [51], the YOLOv4 algorithm requires fewer computational resources, has faster recognition speed, and is more compatible with the hardware environment. Therefore, YOLOv4 is selected as our visual detection algorithm.
As shown in Figure 7, researchers conducted 50 clicking tests on each object and observed whether the output results matched the target objects. The success rates are shown in Table 2. The results show that the minimum success rate exceeds 92%, and the accuracy for the table even reaches 100%. We attribute the high success rates to larger clickable areas, more uniform surface curvature, and less laser reflection; the objects with lower success rates do not possess these characteristics.
Then, we test the accuracy of the SVM for semantic recognition and construct a neural network (NN) with three fully connected layers and Rectified Linear Unit (ReLU) activation for comparison. Before the test, we created a guide describing the target semantics to be performed (each semantic is repeated 20 times in random order in the guide). Researchers clicked the laser pointer to output the semantics and judged whether the result was consistent with the guide. The recognition accuracy is shown in Table 3.
A timing-based method, which recognizes laser semantics by measuring how long the laser spot appears, is used as a baseline. The results show that the SVM and NN methods are much better than the baseline. However, compared to the black-box nature of the NN, the SVM has a rigorous mathematical foundation and is a more transparent classification algorithm. Therefore, we combine the YOLOv4 and SVM algorithms to recognize laser semantics.

5.2. Intention Reasoning Evaluation

In order to compare the intention reasoning ability with the previous work [41], which combines a Markov network and Bayesian incremental learning, we select the same objects and actions for intention reasoning, as shown in Table 4.
Under the same conditions regarding the number and type of objects and actions, our work can output 36 single-object intentions and 19 multi-object intentions. In the other work, the reasoning model could only output intentions consisting of one object and one action, which is fewer than our model. Compared to their work, we can output intentions spanning several objects and actions. In our opinion, more intentions reflect that the robot has a better comprehension of the user and affords the user a wider range of options.

5.3. Learning Ability Evaluation

The main metric for evaluating performance is the number of human–computer interactions in each session. The first time the user selects an object in the scene is defined as entering the first session, and each “yes” or “no” laser semantic output by the user as a decision is counted as one human–computer interaction. In the first session, the user may need multiple interactions to judge the candidate intentions and find the desired one. When the user selects the same objects for a second time, this is defined as the second session, in which the user may need fewer interactions to determine the target intention. The ideal situation occurs when the number of interactions is one, indicating that the first intention queried is the expected one. Table 5 lists the groupings of intentions tested in the evaluation.
As shown in Figure 8, four interactions are required when our method outputs the user’s expected intention for the first time. Over twenty sessions, the average number of interactions required by our framework decreases monotonically, and finally the average value of all groups equals one. After twenty sessions, the average reduction in interactions is 70%. Because we use the same scene and intentions as the previous work, and the number of sessions needed to reach the desired intention is reduced by 75%, we believe that our method learns user preferences and is superior.
In addition, we also evaluate our method on intentions composed of multiple objects, which, to our knowledge, the previous work has not achieved. As shown in Table 6, we set up two evaluation groups in which the task intentions are composed of two and three objects, respectively. The evaluation metric and process are similar to those above.
As shown in Figure 9, reaching the desired intention in the first session required more interactions than for the single-object intentions. Moreover, three-object intentions required more interactions than two-object intentions in the first session. We attribute this to the fact that the Viterbi algorithm predicts intentions by calculating probabilities layer by layer, and each layer contains many invalid intentions. However, by constantly learning the correct intentions, the number of interactions decreases and reaches the ideal situation. Therefore, in general, our method can identify the user’s intention and adapt to user preferences successfully.
Furthermore, we record the time spent on intention reasoning. As can be seen from Table 7, the average intention recognition time for a single object is about 0.16 s, and the average intention recognition time for multiple objects is about 0.19 s; both are less than 0.3 s and meet the requirements of real-time recognition.

5.4. Household Task Execution Evaluation

To evaluate the execution ability of laser implicit intention interaction technology on the WMRA platform, we record and analyze the process of performing tasks, and compare our method with joystick operation.
As illustrated in Figure 10, we take the task “grasp cup pour bowl” as an example and calculate the proportion of human–computer interaction in the video stream. The interaction time required by our method is about 17 s, accounting for 14.78% of the total time. In manual mode, however, the user needs to continuously control the joystick when performing household tasks. Although we only list the interaction time required for the task “grasp cup pour bowl”, the time required for other tasks will not change greatly unless the number of objects changes. Therefore, compared with joystick operation, the proposed technology not only saves task execution time but also significantly reduces the user’s limb involvement time (by about 85%), which effectively alleviates the user’s physical burden.

6. Conclusions and Discussion

To improve the sense of control, usability, and adaptability of the WMRA, we propose an object affordance-based implicit interaction technology using a laser pointer. A laser pointer can lock onto target objects, similar to the human gaze. Therefore, a YOLOv4 and SVM-based laser semantic identification algorithm is proposed to convey user intention intuitively. Moreover, the robustness of the algorithm contributes to the automatic correction of wrong semantics when the user points to a wrong position because of hand tremors. Next, inspired by the cognitive psychology of human interaction processes, we designed an object affordance-based intention reasoning algorithm based on CRF and Q-learning. The WMRA realizes implicit action intention identification for target objects and learns user preferences from the decisions made during user interaction. Lastly, in order to execute the action sequence in the scene, the trajectories of the actions are generalized by the DMP and transmitted to the FSM for logical reordering.
We selected many objects and conducted the clicking test of laser interaction. The results show that the lowest success rate of correctly selecting a desired object is 92%, and the highest is 99%. We infer that the main factors affecting the success rate are surface curvature and material, which is consistent with the discussion in [24]. However, compared with previous works on laser interaction [22,23,24], we designed more forms of laser clicking and achieved more laser semantics through intention recognition.
Based on the experiments and results, we believe that our research can reduce the interaction burden between disabled users and assistive robots. Users only need to sequentially click on the target objects and can then wait for the robot to execute the task automatically. For safety reasons, the research did not include tests with human subjects. However, in our opinion, the less participation required during task execution, the lower the user’s interaction burden. Overall, this paper realizes the conveyance of complex task intentions through simple interactive operations and executes them, which lays a technical foundation for the rapid practical application of the WMRA.

7. Limitations and Future Work

This paper explores an implicit interaction system that can assist users in completing household self-care tasks and reduce the interaction burden of operating the robot. However, some limitations of our research should be noted. For safety reasons, we have not conducted tests on older adults or other individuals, which limits the diversity of the experimental samples. To enable interactive experiments between robots and older adults, the reliability of the proposed system and the psychological pressure that robots place on users, such as trust testing between users and robots [52], need further research.
Based on the above limitations, we should not only focus on the execution ability of the robots but also consider the cognitive and decision-making burden on users during robot operation. These considerations provide ideas for future research. As reviewed in [53], ergonomics and human factors (E&HF) should be taken into account in the design process of robots. E&HF contributes to understanding users’ capabilities and limitations and using this information to minimize the cognitive burden on humans in the process of operating robots [54]. Therefore, we will further study the reliability and ergonomics of our laser interaction system and test its ability to interact with older adults in future work.

Author Contributions

Conceptualization, Y.L. (Yaxin Liu), Y.L. (Yan Liu), Y.Y. and M.Z.; methodology, Y.L. (Yaxin Liu), Y.L. (Yan Liu), and M.Z.; software, Y.L. (Yan Liu); validation, Y.L. (Yaxin Liu), Y.L. (Yan Liu), and M.Z.; formal analysis, Y.L. (Yaxin Liu), Y.L. (Yan Liu), and M.Z.; investigation, Y.Y.; resources, Y.Y. and M.Z.; data curation, Y.L. (Yaxin Liu); writing—original draft preparation, Y.L. (Yan Liu); writing—review and editing, Y.L. (Yaxin Liu); visualization, Y.L. (Yan Liu); supervision, M.Z.; project administration, M.Z.; funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Key R&D Program of China (Grant No. 2018YFB1309400).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. United Nations. World Population Prospects 2022: Summary of Results; United Nations: New York, NY, USA, 2022. [Google Scholar]
  2. Keller, E.; Hittle, B.M.; Smith, C.R. Tiredness Takes Its Toll: An Integrative Review on Sleep and Occupational Outcomes for Long-Term Care Workers. J. Gerontol. Nurs. 2023, 49, 27–33. [Google Scholar] [CrossRef]
  3. Argall, B.D. Turning Assistive Machines into Assistive Robots. In Proceedings of the Conference on Quantum Sensing and Nanophotonic Devices XII, San Francisco, CA, USA, 8–12 February 2015. [Google Scholar] [CrossRef]
  4. Kim, D.J.; Hazlett, R.; Godfrey, H.; Rucks, G.; Portee, D.; Bricout, J.; Cunningham, T.; Behal, A. On the Relationship between Autonomy, Performance, and Satisfaction: Lessons from a Three-Week User Study with post-SCI Patients using a Smart 6DOF Assistive Robotic Manipulator. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Anchorage, AK, USA, 3–8 May 2010; pp. 217–222. [Google Scholar] [CrossRef]
  5. Shishehgar, M.; Kerr, D.; Blake, J. The effectiveness of various robotic technologies in assisting older adults. Health Inform. J 2019, 25, 892–918. [Google Scholar] [CrossRef] [PubMed]
  6. Graser, A.; Heyer, T.; Fotoohi, L.; Lange, U.; Kampe, H.; Enjarini, B.; Heyer, S.; Fragkopoulos, C.; Ristic-Durrant, D. A Supportive FRIEND at Work Robotic Workplace Assistance for the Disabled. IEEE Robot. Autom. Mag. 2013, 20, 148–159. [Google Scholar] [CrossRef]
  7. Bien, Z.; Chung, M.J.; Chang, P.H.; Kwon, D.S.; Kim, D.J.; Han, J.S.; Kim, J.H.; Kim, D.H.; Park, H.S.; Kang, S.H.; et al. Integration of a rehabilitation robotic system (KARES II) with human-friendly man-machine interaction units. Auton. Robot. 2004, 16, 165–191. [Google Scholar] [CrossRef]
  8. Jiang, H.; Wachs, J.P.; Duerstock, B.S. Integrated vision-based robotic arm interface for operators with upper limb mobility impairments. IEEE Int. Conf. Rehabil. Robot. 2013, 2013, 6650447. [Google Scholar] [CrossRef]
  9. Rubies, E.; Palacin, J.; Clotet, E. Enhancing the Sense of Attention from an Assistance Mobile Robot by Improving Eye-Gaze Contact from Its Iconic Face Displayed on a Flat Screen. Sensors 2022, 22, 4282. [Google Scholar] [CrossRef]
  10. Perera, C.J.; Lalitharatne, T.D.; Kiguchi, K. EEG-controlled meal assistance robot with camera-based automatic mouth position tracking and mouth open detection. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 1760–1765. [Google Scholar] [CrossRef]
  11. Quiles, E.; Dadone, J.; Chio, N.; Garcia, E. Cross-Platform Implementation of an SSVEP-Based BCI for the Control of a 6-DOF Robotic Arm. Sensors 2022, 22, 5000. [Google Scholar] [CrossRef] [PubMed]
  12. Saha, S.; Mamun, K.A.; Ahmed, K.; Mostafa, R.; Naik, G.R.; Darvishi, S.; Khandoker, A.H.; Baumert, M. Progress in Brain Computer Interface: Challenges and Opportunities. Front. Syst. Neurosci. 2021, 15, 20. [Google Scholar] [CrossRef]
  13. Belkhiria, C.; Boudir, A.; Hurter, C.; Peysakhovich, V. EOG-Based Human-Computer Interface: 2000–2020 Review. Sensors 2022, 22, 4914. [Google Scholar] [CrossRef]
  14. Shteynberg, G. Shared Attention. Perspect. Psychol. Sci. 2015, 10, 579–590. [Google Scholar] [CrossRef]
  15. Quintero, C.P.; Ramirez, O.; Jagersand, M. VIBI: Assistive Vision-Based Interface for Robot Manipulation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 4458–4463. [Google Scholar]
  16. Fuchs, S.; Belardinelli, A. Gaze-Based Intention Estimation for Shared Autonomy in Pick-and-Place Tasks. Front. Neurorobotics 2021, 15, 17. [Google Scholar] [CrossRef] [PubMed]
  17. Kemp, C.C.; Anderson, C.D.; Nguyen, H.; Trevor, A.J.; Xu, Z. A point-and-click interface for the real world: Laser designation of objects for mobile manipulation. In Proceedings of the 2008 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI), Amsterdam, The Netherlands, 12–15 March 2008; pp. 241–248. [Google Scholar] [CrossRef]
  18. Gualtieri, M.; Kuczynski, J.; Shultz, A.M.; Pas, A.T.; Platt, R.; Yanco, H. Open world assistive grasping using laser selection. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 4052–4057. [Google Scholar] [CrossRef]
  19. Padfield, N.; Camilleri, K.; Camilleri, T.; Fabri, S.; Bugeja, M. A Comprehensive Review of Endogenous EEG-Based BCIs for Dynamic Device Control. Sensors 2022, 22, 5802. [Google Scholar] [CrossRef] [PubMed]
  20. Hassanin, M.; Khan, S.; Tahtali, M. Visual Affordance and Function Understanding: A Survey. ACM Comput. Surv. 2022, 54, 35. [Google Scholar] [CrossRef]
  21. Sprute, D.; Tonnies, K.; Konig, M. This Far, No Further: Introducing Virtual Borders to Mobile Robots Using a Laser Pointer. In Proceedings of the 3rd IEEE International Conference on Robotic Computing (IRC), Naples, Italy, 25–27 February 2019; pp. 403–408. [CrossRef]
  22. Yoshihisa, F.; Yosuke, K.; Kazuyuki, K.; Kajiro, W. Development of electric wheelchair interface based on laser pointer. In Proceedings of the 2009 ICCAS-SICE, Fukuoka, Japan, 18–21 August 2009; pp. 1148–1151. [Google Scholar]
  23. Minato, Y.; Tsujimura, T.; Izumi, K. Sign-at-ease: Robot navigation system operated by connoted shapes drawn with laser beam. In Proceedings of the SICE Annual Conference 2011, Tokyo, Japan, 13–18 September 2011; pp. 2158–2163. [Google Scholar]
  24. Widodo, R.B.; Chen, W.J.; Matsumaru, T. Interaction Using the Projector Screen and Spot-light from a Laser Pointer: Handling Some Fundamentals Requirements. In Proceedings of the Annual Conference of the Society-of-Instrument-and-Control-Engineers (SICE), Akita University, Akita, Japan, 20–23 August 2012; pp. 1392–1397.
  25. Jain, A.; Kemp, C.C. EL-E: An assistive mobile manipulator that autonomously fetches objects from flat surfaces. Auton. Robot. 2010, 28, 45–64. [Google Scholar] [CrossRef]
  26. Nguyen, H.; Jain, A.; Anderson, C.; Kemp, C.C. A Clickable World: Behavior Selection Through Pointing and Context for Mobile Manipulation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, France, 22–26 September 2008; pp. 787–793. [Google Scholar] [CrossRef]
  27. Chavez, F.; Fernandez, F.; Alcala, R.; Alcala-Fdez, J.; Herrera, F. Evolutionary Learning of a Laser Pointer Detection Fuzzy System for an Environment Control System. In Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ 2011), Taipei, Taiwan, 27–30 June 2011; pp. 256–263. [Google Scholar]
  28. Szokolszky, A.; Gibson, E. An interview with Eleanor Gibson. Ecol. Psychol. 2003, 15, 271–281. [Google Scholar] [CrossRef]
  29. Grezes, J.; Decety, J. Does visual perception of object afford action? Evidence from a neuroimaging study. Neuropsychologia 2002, 40, 212–222. [Google Scholar] [CrossRef] [PubMed]
  30. Borghi, A.M. Object concepts and action: Extracting affordances from objects parts. Acta Psychol. 2004, 115, 69–96. [Google Scholar] [CrossRef]
  31. Cramer, M.; Cramer, J.; Kellens, K.; Demeester, E. Towards robust intention estimation based on object affordance enabling natural human-robot collaboration in assembly tasks. In Proceedings of the 6th CIRP Global Web Conference on Envisaging the Future Manufacturing, Design, Technologies and Systems in Innovation Era (CIRPe), Shantou, China, 23–25 October 2018; pp. 255–260. [Google Scholar] [CrossRef]
  32. Mi, J.P.; Tang, S.; Deng, Z.; Goerner, M.; Zhang, J.W. Object affordance based multimodal fusion for natural Human-Robot interaction. Cogn. Syst. Res. 2019, 54, 128–137. [Google Scholar] [CrossRef]
  33. Mo, K.; Qin, Y.; Xiang, F.; Su, H.; Guibas, L. O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning. In Proceedings of the 5th Conference on Robot Learning, London, UK, 8–11 November 2021; pp. 1666–1677. [Google Scholar]
  34. Mandikal, P.; Grauman, K. Learning Dexterous Grasping with Object-Centric Visual Affordances. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 6169–6176. [Google Scholar] [CrossRef]
  35. Deng, S.H.; Xu, X.; Wu, C.Z.; Chen, K.; Jia, K. 3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 1778–1787. [Google Scholar] [CrossRef]
  36. Xu, D.F.; Mandlekar, A.; Martin-Martin, R.; Zhu, Y.K.; Savarese, S.; Li, F.F. Deep Affordance Foresight: Planning Through What Can Be Done in the Future. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 6206–6213. [CrossRef]
  37. Muller, S.; Wengefeld, T.; Trinh, T.Q.; Aganian, D.; Eisenbach, M.; Gross, H.M. A Multi-Modal Person Perception Framework for Socially Interactive Mobile Service Robots. Sensors 2020, 20, 722. [Google Scholar] [CrossRef]
  38. Jain, A.; Zamir, A.R.; Savarese, S.; Saxena, A. Structural-RNN: Deep Learning on Spatio-Temporal Graphs. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 5308–5317. [Google Scholar] [CrossRef]
  39. Wang, X.Y.; Fathaliyan, A.H.; Santos, V.J. Toward Shared Autonomy Control Schemes for Human-Robot Systems: Action Primitive Recognition Using Eye Gaze Features. Front. Neurorobotics 2020, 14, 17. [Google Scholar] [CrossRef]
  40. Kim, S.; Jung, J.; Kavuri, S.; Lee, M. Intention Estimation and Recommendation System Based on Attention Sharing. In Proceedings of the Neural Information Processing: 20th International Conference, Daegu, Republic of Korea, 3–7 November 2013; pp. 395–402. [Google Scholar] [CrossRef]
  41. Duncan, K.; Sarkar, S.; Alqasemi, R.; Dubey, R. Scene-Dependent Intention Recognition for Task Communication with Reduced Human-Robot Interaction. In Proceedings of the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 730–745. [Google Scholar] [CrossRef]
  42. Li, S.P.; Zhang, X.L.; Webb, J.D. 3-D-Gaze-Based Robotic Grasping Through Mimicking Human Visuomotor Function for People with Motion Impairments. IEEE Trans. Biomed. Eng. 2017, 64, 2824–2835. [Google Scholar] [CrossRef] [PubMed]
  43. Oliver, P.; Uwe, L.; Henning, K.; Christian, M.; Axel, G. Programming of Intelligent Service Robots with the Process Model “FRIEND::Process” and Configurable Task-Knowledge. In Robotic Systems; Ashish, D., Ed.; IntechOpen: Rijeka, Croatia, 2012; p. 25. [Google Scholar]
  44. Zhong, M.; Zhang, Y.Q.; Yang, X.; Yao, Y.F.; Guo, J.L.; Wang, Y.P.; Liu, Y.X. Assistive Grasping Based on Laser-point Detection with Application to Wheelchair-mounted Robotic Arms. Sensors 2019, 19, 303. [Google Scholar] [CrossRef] [PubMed]
  45. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. Scaled-YOLOv4: Scaling Cross Stage Partial Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 13024–13033. [Google Scholar] [CrossRef]
  46. Noble, W.S. What is a support vector machine? Nat. Biotechnol. 2006, 24, 1565–1567. [Google Scholar] [CrossRef] [PubMed]
  47. Suykens, J.A.K.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300. [Google Scholar] [CrossRef]
  48. Ijspeert, A.J.; Nakanishi, J.; Schaal, S. Learning attractor landscapes for learning motor primitives. In Proceedings of the 15th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 9–14 December 2002; pp. 1547–1554. [Google Scholar]
  49. Schaal, S. Dynamic movement primitives—A framework for motor control in humans and humanoid robotics. In Proceedings of the International Symposium on Adaptive Motion of Animals and Machines (AMAM), Kyoto, Japan, 4–8 March 2003; pp. 261–280. [Google Scholar] [CrossRef]
  50. Chi, M.S.; Yao, Y.F.; Liu, Y.X.; Zhong, M. Learning, Generalization, and Obstacle Avoidance with Dynamic Movement Primitives and Dynamic Potential Fields. Appl. Sci. Basel 2019, 9, 1535. [Google Scholar] [CrossRef]
  51. Fan, H.Q.; Xiong, B.; Mangalam, K.; Li, Y.H.; Yan, Z.C.; Malik, J.; Feichtenhofer, C. Multiscale Vision Transformers. In Proceedings of the 18th IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 6804–6815. [Google Scholar] [CrossRef]
  52. Alves, C.; Cardoso, A.; Colim, A.; Bicho, E.; Braga, A.C.; Cunha, J.; Faria, C.; Rocha, L.A. Human-Robot Interaction in Industrial Settings: Perception of Multiple Participants at a Crossroad Intersection Scenario with Different Courtesy Cues. Robotics 2022, 11, 59. [Google Scholar] [CrossRef]
  53. Cardoso, A.; Colim, A.; Bicho, E.; Braga, A.C.; Menozzi, M.; Arezes, P. Ergonomics and Human Factors as a Requirement to Implement Safer Collaborative Robotic Workstations: A Literature Review. Safety 2021, 7, 71. [Google Scholar] [CrossRef]
  54. Gualtieri, L.; Rauch, E.; Vidoni, R. Emerging research fields in safety and ergonomics in industrial collaborative robotics: A systematic literature review. Robot. Comput. Integr. Manuf. 2021, 67, 30. [Google Scholar] [CrossRef]
Figure 1. Specifications of the WMRA.
Figure 2. The program flow chart of the interaction system with a laser pointer (green laser).
Figure 3. The principles diagram of semantic identification of laser interaction.
Figure 4. The schematic of intention reasoning with a preference learning mechanism.
Figure 5. Object–action intention network model. Blue and black circles represent objects and actions in the dataset, respectively. Colored lines represent probabilistic relationships between objects and actions. Object 1, object 2, and object 3 are selected targets by a user with a laser pointer. Action 1, action 2, and action 3 are the output action intention sequences based on the objects.
Figure 6. Intention execution mechanism. (a) The workflow diagram of the FSM. (b) The point cloud data of the selected targets are processed, and the filter functions in PCL are used to calculate the position coordinates of the targets in the scene. (c) A library of action templates that we build by dragging teaching.
Figure 7. Example of the experimental procedure for the clicking test. The left picture shows the researcher pointing a laser spot on the table. The right picture shows our system recognizing the laser spot position and the selected objects.
Figure 8. The result of the single-object intention interaction. (a–d) The node represents the average number of interactions in each group, and the line represents the standard deviation in the number of interactions required by each intention in the group.
Figure 9. The result of the multi-objects intention interaction. (a) First; (b) Second.
Figure 10. Comparison diagram of limb involvement during the handle interaction and laser interaction.
Table 1. Partial examples from our knowledge base dataset 1.
Single Object (a1, o1) | Two Objects (a1, o1, a2, o2) | Three Objects (a1, o1, a2, o2, a3, o3)
pick cup | grasp cup pour plants | pick spoon scoop bowl feed person
grasp cup | grasp cup pour bottle | pick spoon scoop plate feed person
push cup | grasp cup drink person | pick spoon scoop pot feed person
… | … | …
1 For more detailed information, please contact the corresponding author.
Table 2. The result of the clicking test.
Object | Success | Fail | Success Rate
Table | 50 | 0 | 100%
Cup | 49 | 1 | 98%
Bottle | 49 | 1 | 98%
Bowl | 47 | 3 | 94%
Spoon | 46 | 4 | 92%
Table 3. The result of the semantics recognition test.
Semantics | Timing-Based | NN | SVM
Short click | 80% | 95% | 95%
Long click | 5% | 90% | 90%
Double click | 65% | 90% | 95%
Table 4. Experimental dataset. Previous work [41] chose 11 objects and 7 actions to evaluate the ability of intention reasoning and preference learning. We, therefore, select the same objects and actions to compare with their work 1.
Category | Dataset
Objects (11) | Bottle, Bowl, Box, Can, Carton, Cup, Mug, Spray-can, Tin, Tube, Tub
Actions (7) | Drink, Grasp, Move, Open, Pour, Push, Squeeze
Object–action intention (36) | Drink-Bottle, Grasp-Bottle, Move-Bottle, Open-Bottle, Pour-Bottle, Grasp-Bowl, Move-Bowl, Push-Bowl, Grasp-Box, Move-Box, Open-Box, Push-Box, Drink-Can, ……
Objects–actions intention (19) | Grasp-Carton-Move-Box, Grasp-Can-Move-Box, Grasp-Bottle-Move-Box, Grasp-Box-Move-Box, Grasp-Cup-Move-Box, Grasp-Can-Pour-Bowl, Grasp-Can-Pour-Cup, ……
1 For more detailed data, please contact the corresponding author.
Table 5. Single-object intention evaluation groups. Single-object intentions consist of an object and an action. To compare with [41], we choose the same four sets of intentions.
Group | Intention
First | Grasp-Box, Open-Box, Grasp-Carton, Open-Carton, Pour-from-Carton
Second | Grasp-Can, Move-Can, Pour-from-Can, Drink-from-Cup
Third | Grasp-Carton, Move-Carton, Pour-from-Carton, Drink-from-Cup
Fourth | Grasp-Bottle, Move-Bottle, Pour-from-Bottle, Drink-from-Cup
Table 6. Multi-objects intention evaluation groups.
Group | Intention
First | Grasp-Cup-Move-Box, Grasp-Can-Pour-Bowl, Open-Can-Pour-Bowl, Open-Bottle-Pour-Bowl
Second | Grasp-Can-Pour-Cup-Drink-Person, Grasp-Cup-Pour-Bowl-Drink-Person, Open-Bottle-Pour-Cup-Drink-Person
Table 7. Time of objects–actions intentions reasoning.
Single-Object | Time (s) | Multi-Objects | Time (s)
Cup | 0.146924 | Cup, Bowl | 0.210154
Bowl | 0.185253 | Cup, Box | 0.182532
Can | 0.180976 | Bottle, Box | 0.188147
Box | 0.143749 | Bowl, Cup, Person | 0.189620
Bottle | 0.165264 | Spoon, Bowl, Person | 0.166144
