Design and Evaluation of Anthropomorphic Robotic Hand for Object Grasping and Shape Recognition

We developed an anthropomorphic multi-finger artificial hand for fine-scale object grasping tasks that senses the grasped object's shape. The robotic hand was created using a 3D printer and has a servo bed for stand-alone finger movement. The data containing the robotic fingers' angular positions are acquired using the Leap Motion device, and a hybrid Support Vector Machine (SVM) classifier is used for object shape identification. We trained the designed robotic hand on a few simple convex-shaped items similar to everyday objects (a ball, a cylinder, and a rectangular box) using supervised learning techniques, achieving a mean object shape recognition accuracy of 94.4%.


Introduction
Human-robot interfaces have many applications, including prosthetics and artificial wrists [1,2], manufacturing and industrial assembly lines [3], surgery in medical robotics [4], hand rehabilitation [5], assisted living and care [6], soft wearable robotics [7], territory patrolling [8], drone-based delivery and logistics [9], military robotics applications [10], smart agriculture [11,12], and student teaching [13,14]. However, achieving efficient object grasping and dexterous manipulation capabilities in robots remains an open challenge [15]. When designing anthropomorphic robot hands, i.e., robotic manipulators that have a structure (joints and links) similar to a human hand, it is challenging to obtain feedback from the actuators, sensors, and the mechanical part of the manipulator about the shape, texture, and other physical characteristics of the grasped object. Grasping is considered one of the must-have skills that robots need to master before being successfully adopted as a replacement for many manual operations. The robotic grasping task is commonly implemented using rigid robotic hands, which need accurate control with tactile sensor feedback. Robotic manipulation tasks become very demanding when a gripper needs a feasible plan to solve grasping problems under uncertainties such as items in unknown positions or with properties (such as shape) unknown to the robot.
Developing accurate robot grasping for a wide range of object shapes is a serious challenge for order delivery and home service robotics. The anthropomorphic robot hand has to grasp and lift items without prior knowledge of their weight and damping characteristics. Optimizing reliability and range is complex due to the inherent uncertainty in control, sensing, and tactile feedback [16]. The robotic hand control parameters depend upon the geometric properties of the item, such as shape and position, while grasping control is related to the structure of the gripper [17].
Currently, robots empowered by artificial intelligence algorithms can accomplish moving objects through grasping. Grasp detection based on neural networks can help robots precisely perceive their surrounding environments. For example, Alkhatib et al. [18] evaluated the grasp robustness of a three-fingered robotic hand based on the position and movement speed measured at each joint, achieving 93.4% accuracy in predicting grasping stability for unknown gripped items using inexpensive tactile sensors. Dogar et al. [19] considered the problem of searching for optimal robot configurations for grasping operations during a collaborative part assembly task as a constraint satisfaction problem (CSP). They proposed an algorithm that simplifies the problem by dividing it into a sequence of atomic grasping actions and optimizes it by removing unnecessarily repeated grasps from the plan. Gaudeni et al. [20] suggested an innovative grasping strategy based on a soft modular pneumatic surface, which uses pressure sensors to assess the item's pose and center of mass and recognize the contact between the robot's gripper and the grasped item. The strategy was validated on multiple items of different shapes and dimensions. Golan et al. [21] developed a general-purpose robotic hand of variable structure that can adapt to fit a wide range of objects. The adaptation is ensured by rearranging the hand's structure for the desired grasp so that previously unseen items can be grasped. Homberg et al. [22] designed a soft hand to effectively grasp and recognize object shapes based on state measurements. The internal sensors allow the hand configuration and the item to be detected. A clustering algorithm is adopted to recognize each grasped item in the presence of uncertainty in the item's shape. Hu et al. [23] proposed using a trained Gaussian process classifier for determining the feasibility of a robot's grasping points.
Ji et al. [24] linked vision and robot hand grasping control to attain reliable and accurate item grasping in a complex cluttered scene. By fusing sensor data, real-time grasping control was obtained that provided the capability to manipulate various items of unknown weight, stiffness, and friction. Kang et al. [25] developed an integrated gripper that fitted an under-actuated gripper with a suction gripping system for grasping different items in various environments. Experiments using a diverse range of items under various grasping scenarios were executed to demonstrate the grasping capability. Kim et al. [26] developed a 14 degrees of freedom (DoF) robotic hand with five fingers and a wrist with a tendon-driven mechanism that minimizes friction and optimizes efficiency and back-drivability for human-like payload and compact dexterity. Mu et al. [27] constructed a robot with a prototype end-effector for picking kiwifruit, whose artificial fingers were mounted with fiber sensors to find the best position for grabbing the fruit without any damage. Neha et al. [28] performed grasping simulations of a four-fingered robotic hand in Matlab to demonstrate that the developed robotic hand model can grasp items of different shapes and sizes. Zhou et al. [29] proposed an intuitive grasping control approach for a custom 13-DoF anthropomorphic soft robotic hand. The Leap Motion controller acquires the human hand joint angles in real time, which are mapped onto the robotic hand joints. The hand was demonstrated to attain good grasping performance for safe and intuitive interactions without strict accuracy requirements.
Neural networks and deep learning have been successfully adopted to improve the control of robotic hands and object grasping. For example, Mahler et al. [16] trained separate Grasp Quality Convolutional Neural Networks (GQ-CNNs) for each robot gripper. The grasping policy was trained on the Dex-Net 4.0 dataset to maximize efficiency while using a separate GQ-CNN for each gripper. The approach was validated on the bin-picking task with up to 50 diverse heaps of previously unseen items. James et al. [30] used Randomized-to-Canonical Adaptation Networks (RCANs) to train a vision-based grasping reinforcement learning unit in a simulator with no real-world data and then transferred it to the physical world, achieving 70% zero-shot grasping success on unknown items. Setiawan et al. [31] suggested a data-driven approach for controlling robotic fingers to assist users in bi-hand item manipulation. The trained neural network is used to control the robotic finger motion. The method was tested on ten bimanual tasks, such as operating a tablet, holding a large item, grasping a bottle, and opening a bottle cap. Song et al. [32] performed robotic grasp detection using region proposal networks. Experiments performed on the Cornell grasp and Jacquard datasets demonstrated high grasp detection accuracy. Yu et al. [33] suggested a vision-based grasping method based on a deep grasping guidance network (DgGNet) and the MobileNetv2 recognition network, which can recognize occluded items, while DgGNet calculates the best grasp action for the grasped item and controls the manipulator movement.
A typical limitation of previous implementations is their high cost, which constrains wide adoption by end-users. Our approach uses 3D printing technology and a consumer-grade Leap Motion (Leap Motion Inc., San Francisco, CA, USA) sensor to develop an anthropomorphic multi-finger robotic hand that is affordable to end-users and efficient in small-scale grasping applications.
This article aims to develop and evaluate a robotic hand that can identify the shape of the item it is holding, for custom grasping tasks. We trained the developed robotic hand on several simple-shaped items using supervised learning techniques and hand gesture data acquired by the Leap Motion device. This paper is an extended version of the paper presented at the ICCSA 2020 conference [34]. In [34], we used the Naïve Bayes algorithm to predict the shape of objects grasped by the robotic hand from its fingers' angular positions, achieving an accuracy of 92.1%. In this paper, we improve on that result by adopting a hybrid classifier based on deep feature extraction with a transforming autoencoder and a Support Vector Machine (SVM) for object shape recognition.

Formal Definition of a Problem
From the perspective of mechanics, the hand is a multi-link "mechanism". The kinematic chain of the bone links of the human hand "mechanism" can be represented as the diagram shown in Figure 1 (for the sake of simplicity, the palm is represented as a plane). We can consider the robotic arm as a spatial mechanism with 27 degrees of freedom (DoF). The hinges of the robotic fingers have 3, 2, and 2 degrees of freedom; therefore, the total number of DoFs per finger is 7. Except for the thumb, each finger has two connections with one DoF and one connection with two DoFs. In addition to a wide variety of hand movements, the hand and the fingers have great mobility, flexibility, and a wealth of possible movements. All this, within the reach of the fingers, provides a grip on an object of any shape and makes it possible to perform various actions on objects with the fingers. Given an object σ with spatial dimensions z ∈ R3, denote the 3D shape of the object's surface as S. A grasp by a robotic hand manipulator can be defined as g = (C, θ), where C = (x; y; z) ∈ R3 are the coordinates of the robotic hand fingers, and θ = (ϕ, ε, τ) is the Euler angle vector representing the 3D orientation of the fingers. To draw up a kinematic model, the manipulator is specified by a base coordinate system and each link's coordinate system. The base coordinate system is called the "zero" coordinate system (x0, y0, z0), which is the inertial coordinate system of the manipulator. For each link, a Cartesian orthonormal coordinate system (xi, yi, zi, 1) is defined on the axis of its articulation, where i = 1, 2, ..., n, and n is equal to the number of DoFs of the robotic manipulator.
When the electric drive sets the i-th joint in motion, the i-th link begins to move relative to the (i-1)-th link. Since the i-th coordinate system is attached to the i-th link, it moves with it, so that the n-th coordinate system moves together with the last, n-th link of the manipulator.
When considering the anthropomorphic robotic fingers only, each is composed of three links (called phalanxes), except for the thumb, which has only two phalanxes. The three phalanxes are called proximal, medial, and distal, while their joints are the metacarpophalangeal (MCP), proximal interphalangeal (PIP), and distal interphalangeal (DIP) joints, respectively. The MCP joints have two DoFs, while the PIP and DIP joints have only one DoF each (see Figure 2). The origin of the coordinate system is assigned to the center of the MCP joint. The angle of the MCP joint is denoted as ϕ, the angle of the PIP joint as ε, and the angle of the DIP joint as τ.
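Under this joint convention, the fingertip position follows from the cumulative flexion angles. The sketch below is a minimal planar forward-kinematics model, assuming all three joints flex in a single plane; the phalanx lengths are hypothetical values, not measurements taken from the printed hand.

```python
import math

def fingertip_position(phi, eps, tau, links=(40.0, 25.0, 18.0)):
    """Planar forward kinematics of a three-phalanx finger.

    phi, eps, tau -- MCP, PIP, and DIP flexion angles in radians
    links         -- proximal, medial, distal phalanx lengths in mm
                     (hypothetical values)

    Returns (x, y): fingertip coordinates in the MCP joint frame.
    """
    # Each phalanx is rotated by the sum of all preceding joint angles.
    cumulative = (phi, phi + eps, phi + eps + tau)
    x = sum(l * math.cos(a) for l, a in zip(links, cumulative))
    y = sum(l * math.sin(a) for l, a in zip(links, cumulative))
    return x, y
```

With all angles at zero the finger is fully extended, so the fingertip lies at the summed phalanx lengths along the x-axis of the MCP frame.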

Outline
The supervised prediction of object shape using the robotic hand has four steps: 1. Data acquisition and labeling. 2. Feature selection and dimensionality reduction. 3. Classification model training. 4. Object shape prediction. The steps are summarized in Figure 3 and explained below.
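As an illustration only, the four steps can be sketched end to end with a stand-in model. This sketch assumes numpy and uses a nearest-centroid classifier in place of the autoencoder and SVM described later; the function names are ours.

```python
import numpy as np

def normalize(X):
    """Part of step 2: scale each angle feature to zero mean, unit variance."""
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-9
    return (X - mu) / sigma, mu, sigma

def train_centroids(X, y):
    """Step 3 stand-in: a nearest-centroid model (placeholder for the
    autoencoder + SVM pipeline used in this paper)."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def predict_shape(model, x):
    """Step 4: assign the shape whose centroid is closest to the sample."""
    return min(model, key=lambda label: float(np.linalg.norm(x - model[label])))
```

Step 1 (acquisition and labeling) would supply `X` as the matrix of finger angles and `y` as the shape labels of the gripped objects.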
For data acquisition, we employ the Leap Motion sensor and the hand motion theory described in detail in [35]. Note that here we recognize the motions of the anthropomorphic robotic hand rather than a human hand. The Leap Motion controller uses two infrared (IR) cameras and three IR LEDs. The cameras track IR light, which lies outside the spectrum visible to the human eye.
The captured image data are sent to a personal computer (PC) to extract tracking information about the hand, the hand's fingers, and the grasped objects. The Leap Motion SDK has an inbuilt function for recognizing each finger of the hand. The angles between each finger's proximal and intermediate bones are calculated (see "Raw data" in Figure 3) and used for further processing. The gripping tasks are then performed using the robotic hand, and the data are labeled based on the shape of the gripped object. All the finger data are captured and streamed for pre-processing (denoising and normalization of data). The collected dataset contains the angle values for each separate finger for 1200 instances from three differently shaped objects: Ball, Cylinder, and Rectangular Box (see Figure 4). The data attributes represent the three angles (ϕ, ε, τ) between the bones of the individual fingers.
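The resulting table can be laid out with one column per finger-angle pair, as in the sketch below. The column names are our own hypothetical choices, not the dataset's actual attribute names, and the sketch assumes every finger reports three angles.

```python
FINGERS = ["Thumb", "Index", "Middle", "Ring", "Pinky"]
ANGLES = ["phi", "eps", "tau"]  # MCP, PIP, DIP flexion angles (degrees)

# 5 fingers x 3 angles = 15 feature columns plus the shape label.
HEADER = [f"{f}_{a}" for f in FINGERS for a in ANGLES] + ["shape"]

def grasp_to_row(finger_angles, shape):
    """Flatten {finger: (phi, eps, tau)} into one labeled dataset row."""
    row = [finger_angles[f][i] for f in FINGERS for i in range(len(ANGLES))]
    return row + [shape]
```

Each grasp then contributes one row; 1200 such rows make up the dataset described above.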

Architecture of the Classification Model
To implement the detector of robotic hand grasping actions, we use a transforming autoencoder [36], a subtype of the more general class of autoencoder neural networks [36]. The advantage of the transforming autoencoder is that it produces a compact representation of the input variables that is robust to changes in them. Next, we consider the architecture and principles of operation of the transforming autoencoder. A transforming autoencoder is a neural network trained by the backpropagation method. It is based on the following principle: the output layer of the network is structurally equal to the input layer, and the input values are used as reference values for training the autoencoder; thus, the neural network learns to predict the same data that it receives at the input. The function encapsulated by such a network is, in the general case, trivial, but in the case of an autoencoder an additional restriction is imposed on the network: the presence of a "bottleneck" in one of the intermediate (hidden) layers, i.e., a layer that has fewer neurons than the input layer. The activations of such a layer thus form a compact representation of the input data. Given the use of non-linear activation functions and many autoencoder layers, such a representation can be compact and accurate. Unlike classical multilayer perceptrons, which have a homogeneous structure within a layer, the transforming autoencoder is a heterogeneous network consisting of several smaller networks. Each such network is called a capsule. All autoencoder capsules have the same structure, and each capsule contains one decision neuron taking a value in the range [0, 1], corresponding to the probability that the object is present in the input. The capsule encodes the spatial position of the object in a compact form corresponding to the selected representation coordinates.
Thus, the network architecture (presented in Figure 5) makes it possible to obtain not only a compact object representation but also to explicitly assign a semantic value to each code element of the autoencoder. Since the code generated by the transforming autoencoder represents the object positioning parameters, in cases where position information is available it becomes possible to conduct explicit supervised learning by comparing the value approximated by the autoencoder with the given positioning values.
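To make the bottleneck principle concrete, here is a minimal sketch: a plain linear autoencoder trained by backpropagation on the reconstruction error. It deliberately omits the capsule structure and non-linearities of the transforming autoencoder; the sizes, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

def train_autoencoder(X, code_dim=2, lr=0.05, epochs=3000, seed=0):
    """Linear autoencoder with a bottleneck ("code") layer.

    The network learns to reproduce its own input: the encoder W1 maps
    the input to a code with fewer units than the input layer, and the
    decoder W2 maps the code back to the input space.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, code_dim))  # encoder weights
    W2 = rng.normal(0.0, 0.1, (code_dim, d))  # decoder weights
    losses = []
    for _ in range(epochs):
        Z = X @ W1                    # compact code (bottleneck)
        Xh = Z @ W2                   # reconstruction of the input
        err = Xh - X                  # reconstruction error
        losses.append(float((err ** 2).mean()))
        # Backpropagate the mean squared reconstruction error.
        dW2 = Z.T @ err / n
        dW1 = X.T @ (err @ W2.T) / n
        W1 -= lr * dW1
        W2 -= lr * dW2
    return W1, W2, losses
```

On data that truly lies in a low-dimensional subspace, the reconstruction error approaches zero, which is what makes the bottleneck code a faithful compact representation.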


Implementation of Robotic Hand
The robotic hand architecture used in this study is based on known robotic hand prototypes with a pulley-tendon transmission, whose fingers are moved by serial kinematic chains with revolute joints (e.g., see [37]). Such a design minimizes friction losses and improves performance and back-drivability. The robotic hand was 3D printed using an open-source design obtained from InMoov (http://inmoov.fr/) following the principles of pattern-based design [38]. The InMoov designs support the principal elements of human hand anatomy, such as bones, the ulna, joints, and tendons (Figure 6).
Designing a system to perform shape recognition of held items involves (1) servo actions, (2) an I/O interface, and (3) an algorithm suite (see Figure 7). As defined by the algorithm shown in Figure 7, the robotic hand first performs the gripping task on an object of unknown shape, and the pressure sensors register the contact with the object. Then the servo motors controlling the hand tendons' movements stop their action, and the spatial positions of the finger joints are recorded. The spatial positions of the fingers are transformed into angles (in degrees) and stored as a dataset. After data normalization, the data are sent to the spot-checking procedure, which checks data validity and removes measurements with corrupt (or unavailable) values. Then, the prediction of the shape of the gripped object is performed. If there is not enough data to make a reliable prediction, the gripping action is repeated.
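This grip-then-predict loop can be sketched compactly with the hand and classifier abstracted as callables, since the actual servo and sensor APIs are hardware-specific; the function names and thresholds below are our own hypothetical choices.

```python
def grasp_and_identify(grip, predict, max_attempts=3, min_valid=10):
    """Grip-then-predict loop in the spirit of the Figure 7 algorithm.

    grip    -- callable performing one gripping action; returns a list of
               joint-angle readings in degrees, with None marking corrupt
               or unavailable measurements
    predict -- callable mapping the validated readings to a shape label

    Repeats the gripping action when too few valid readings remain after
    spot-checking, and gives up after max_attempts.
    """
    for _ in range(max_attempts):
        readings = grip()
        valid = [r for r in readings if r is not None]  # spot-checking
        if len(valid) >= min_valid:
            return predict(valid)
    return None  # not enough reliable data to predict a shape
```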



Data Collection
The process of data collection and neural network training is summarized in Figure 8.
As each finger has three bones joined together, three angular values are calculated for each finger. To recognize these bones and each finger on the hand, we use the in-built function provided by the Leap Motion SDK. Once all the finger angles are captured and stored against the hand holding the objects, they are saved to a .csv file. The collected dataset includes three objects (ball, rectangular box, and cylinder) collected from 12 different people as an initial approximation for algorithm analysis. An example of the data collected for different object grasping tasks is shown in Figure 9.
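The per-joint angle itself can be recovered from two adjacent bone direction vectors; below is a minimal sketch, with the vectors assumed to come from the Leap Motion SDK's per-finger bone data (the vectors in the comment are hypothetical).

```python
import math

def bone_angle(u, v):
    """Angle in degrees between two adjacent bone direction vectors,
    e.g. the proximal and intermediate phalanx directions of a finger."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift in acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))
```

Applying this to each pair of adjacent bones of each finger yields the three angular values per finger described above.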


Analysis of Features
The data encoded by the autoencoder network are used as features characterizing finger positions and joint bending. We evaluated the importance of each shape's angular features using the two-sample t-test with a pooled variance estimate. The results of the feature ranking are presented in Figure 10. Note that different features are relevant for the recognition of different item shapes. For recognizing the shape of the Ball object, the most relevant features are provided by the Thumb and Index fingers; for the Rectangular Box object, by the Ring and Thumb fingers; and for the Cylinder object, by the Thumb finger.
For example, the distributions of the most informative feature values (according to the feature ranking) are shown for the Middle and Pinky finger angles in Figure 11, the Ring and Index finger angles in Figure 12, and the Pinky and Thumb finger angles in Figure 13.
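The ranking statistic here is the standard two-sample t-value with pooled variance; a minimal sketch, computed per angular feature between two shape classes:

```python
import math

def pooled_t_statistic(a, b):
    """Two-sample t-statistic with a pooled variance estimate.

    a, b -- samples of one angular feature for two shape classes;
    a larger |t| means the feature separates the classes better.
    """
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))
```

Ranking the features by |t| for each shape against the others yields per-shape importance orderings like those in Figure 10.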

Evaluation of Results
Finally, we used the autoencoder features as input to the Support Vector Machine (SVM) classifier. We used an SVM with a radial basis function (RBF) kernel. The classifier has two hyperparameters, gamma and C; the best-fitting values were found using the grid search method. The classifier's evaluation estimates how well the object shape recognition algorithm would work in a real-world environment.
To evaluate the performance of the classification quantitatively, we used 10-fold cross-validation. The mean accuracy of object shape recognition achieved is 94.4%. The confusion matrix of the results is given in Figure 14. Note that the Cylinder and Ball shapes are confused more often due to their similarity in shape.
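The tuning-and-evaluation procedure can be reproduced with scikit-learn. Below is a sketch on synthetic three-class data standing in for the encoded angle features; the grid values and cluster parameters are our assumptions, not the values used in the experiments.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the encoded angle features of the three shapes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.5, size=(60, 4)) for m in (0.0, 2.0, 4.0)])
y = np.repeat(["Ball", "Cylinder", "Box"], 60)

# Grid search over the RBF-SVM hyperparameters C and gamma.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X, y)

# 10-fold cross-validation of the best model, as in the evaluation above.
scores = cross_val_score(grid.best_estimator_, X, y, cv=10)
print(f"mean accuracy: {scores.mean():.3f}")
```

`cross_val_score` with a classifier and class labels uses stratified folds, so each of the 10 folds contains all three shapes.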


Discussion and Conclusions
An anthropomorphic robotic manipulator's design allows the robot to be used efficiently in applications such as object grasping and to operate in environments suited to human manual work [39]. Although a human hand has many unique characteristics that allow grabbing and holding objects of diverse shapes, the anthropomorphic robotic hand can still be used for repetitive object grabbing and moving tasks, such as on industrial conveyors [40] or in medical laboratories [41]. Specifically, a multi-fingered hand that can grasp and hold reliably and has versatile manipulative abilities cannot be replaced by a generic gripper [42]. The designed hand is a low-cost alternative to other known 3D-printed robotic hands [43][44][45].
Several previous works have combined the Leap Motion sensor with a physical or virtual robotic hand to recognize grasping movements and control the hand. Moldovan and Staretu [46] used Leap Motion to control a virtual robotic hand by recognizing a human hand's grasping movements; however, no quantitative evaluation of the ball-grasping experiment was performed. Zhang et al. [47] used the Leap Motion controller and a ray detection rendering method to generate tactile feedback. They used four shape types (cube, ball, cylinder, and pyramid) for recognition but evaluated shape recognition using only a qualitative 10-point scale. Zhou et al. [29] also used the Leap Motion controller to capture a human hand's joint angles in real time; the joint angle positions were then mapped onto the robotic hand to perform object grasping. However, they also did not attempt to recognize the shape of an object.
In this paper, the robotic hand was designed to execute human-like grasping of items of various simple shapes such as balls or rectangular boxes. Using the robotic hand and the Leap Motion device's data, we have achieved a 94.4% accuracy of shape recognition, which improved the results reported in our previous paper [34].
Author Contributions: All authors have contributed equally to this manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.
