Robust Kalman Filtering Cooperated Elman Neural Network Learning for Vision-Sensing-Based Robotic Manipulation with Global Stability

In this paper, a global-state-space visual servoing scheme is proposed for uncalibrated model-independent robotic manipulation. The scheme is based on robust Kalman filtering (KF), in conjunction with Elman neural network (ENN) learning techniques. The global map relationship between the vision space and the robotic workspace is learned using an ENN. This learned mapping is shown to be an approximate estimate of the Jacobian in global space. In the testing phase, the desired Jacobian is arrived at using a robust KF to improve the ENN learning result so as to achieve robotic precise convergence of the desired pose. Meanwhile, the ENN weights are updated (re-trained) using a new input-output data pair vector (obtained from the KF cycle) to ensure robot global stability manipulation. Thus, our method, without requiring either camera or model parameters, avoids the corrupted performances caused by camera calibration and modeling errors. To demonstrate the proposed scheme's performance, various simulation and experimental results have been presented using a six-degree-of-freedom robotic manipulator with eye-in-hand configurations.


Introduction
Visual sensors integrated with robotic manipulators can be increasingly beneficial for robotic perception and behavioral flexibility in unstructured environments [1]; such sensors have caught much attention, and have applications in all walks of life [2][3][4][5].
Vision-based robotic manipulation depends mainly on visual information feedback to control the positioning or motioning of a manipulator [6]. There are two general categories of manipulation: position-based visual servoing (PBVS) and image-based visual servoing (IBVS). There is also a hybrid of the two techniques [7]. These different categories arise from differential definitions of visual feature and feedback control information.
The PBVS vision usage provides information to regulate the end-effector pose (relative to the object in the Cartesian space). Owing to its characteristic of global asymptotic stability, this method is suitable for most industrial robotic manipulators [8]. However, the servoing task is inevitably related to the tedious affine model which is associated with world coordinates, camera frame, and robot base coordinates. Consequently, PBVS is more sensitive, with respect to the camera and object modeling errors [9], and to the possibility of disappearing image features [10][11][12].
In IBVS, there is direct control of the feature points on the image plane for robotic manipulation, and the image Jacobian matrix is used to describe the differential relation between the image error and the end-effector pose [13]; such direct control has simplified PBVS computation. However, IBVS has its intrinsic drawbacks. Consequently, modified IBVS method have recently mainly focused on: (1) the dynamic visual servo control law, which was proposed to improve poor dynamic performances due to either low vision acquisition frequency or to the time latency processing system [14]; (2) image frame path planning as an effective solution for making camera calibration and robotic kinematics modeling errors more robust [15][16][17] (such optimization techniques are selected for minimizing both end-effector trajectories in Cartesian space and features trajectories on image planes [18]); and (3) redundant and cooperative 2D visual servo systems [5] and intelligent hybrid schemes [19], which solve typical problems, such as features extraction errors and task singularities, etc.
With respect to modified methods, the presented image Jacobian matrix requires depth information and camera calibration parameters (as in traditional IBVS methodologies); such requirements inevitably lead to task singularities, thus making it difficult to ensure the stabilized convergence of a desired target [20]. The Jacobian identification is considered to be a dynamic parameter estimation problem. Solutions to this issue are broadly grouped into two categories.
One category comprises of online estimation techniques (e.g., the famous Broyden-based method and its modified variants) [21,22]. However, these estimation techniques may actually depend on the system configurations and the tasks to be accomplished. Moreover, these techniques are ill-suited for dealing with system environmental noises. In [23], an M-estimator was applied to the Jacobian estimation task, which does not require the model and system parameters. The first literature discussing the Kalman-Bucy filter (KBF) on state space for online Jacobian estimation is proposed in [24]. Those methods have the same properties as recursive least squares-based estimation processing's. However, the Jacobian estimation task may involve singularity instability in global space; failure to have end-effector positioning in cases of large displacement between the initial and desired poses is also a risk.
Another solution to this estimation problem involves machine learning techniques [25,26], which are based on biologically inspired approaches, such as neural networking. These approaches enables the machine to learn the kinematic relationship of manipulators during action-perception cycles [27,28]. While in IBVS leaning techniques, the Jacobian is presented by a trained nonlinear network [25], this approach approximates convergence to the desired position; reaching levels of high servoing precision is difficult because network training is an incomplete convergence process and is sensitive to training samples.
In this paper, the discussion will focus on non-parameter Jacobian matrix estimation problems. A new global-state-space IBVS scheme, which associates KF and ENN learning techniques, is proposed for uncalibrated model-independent robotic manipulation that has robust stability in global-state-space, where the image features are constrained on the camera field-of-view (FOV). The proposed scheme is different from traditional PBVS methods; IBVS methods possess the merits of not needing calibration of either camera parameters or the robotic kinematics model. Moreover, for IBVS methods, Jacobian dynamic estimation does not require depth information. The main contributions of the paper are as follows: (1) Jacobian online identification problems were solved by introducing state-space infrastructure, which has been incorporated into robust KF techniques. The traditional KF is a minimumvariance state estimator for linear dynamic systems with Gaussian white noises sequence. In most practices, however, the observation noise is compound and the noises are statistically correlated noises (as to being simple white noise sequences). Therefore, we have derived a robust KF estimator with multiple orders of noise for the Jacobian online estimation task. (2) The KF is sensitive with respect to the initial robotic state and the initial noises' statistical characteristics (i.e., a small perturbation of noise characteristics to dynamic modeling will lead to serious positioning error). For this problem, the ENN was adopted as a global estimator for Jacobian learning, and then the global map relationship between vision space and the robot workspace is represented by the ENN. Notice that this learning method is an approximation of the Jacobian matrix, with respect to the end-effector desired pose in scale motion space; the approximation error present in the offline learning is due to the limited size of the training data. (3) In the online testing phase, the precise positioning problem is solved by using a robust KF to improve ENN learning so as to achieve robotic convergence to the desired pose. After, the ENN's weights are updated through re-training using a new input-output data pair vector, which is obtained from the KF cycle to ensure robotic global stability manipulation. Finally, we have designed a novel global-state-space IBVS framework associated with robust KF cooperated ENN learning. In our finding, the image Jacobian matrix is estimated without accounting for camera calibration and modeling error. Our servoing system performs robustly despite outside noises and system destabilization.
The paper is organized as follows: a general description of visual servoing (VS) for uncalibrated model-independent robotic manipulation is presented in the next section. The Jacobian estimation with KF is presented in the Section 3; in that section, we have also derived a robust KF with statistically correlated noises. The ENN for global Jacobian learning and a novel global-state-space IBVS scheme are presented in Section 4. The simulation and experimental results are discussed in Section 5. Section 6 presents our conclusions.

Visual Servoing to Uncalibrated Model-Independent Robotic Manipulation
In this section, we assume that we have a robotic manipulator, R, that lacks a kinematics model, and where the camera parameters are uncalibrated. As described in Figure 1, uncalibrated model-independent visual servoing (VS) aims to drive the robotic end-effector from the current image feature (S(k), ) to the desired image feature (S * (k), ) by using the control variable (U(k), ). Generally, a simple local motion of the end-effector will result in the nonlinear complex change of many features on the image plane. The solution for describing the change of features and the robot motion, through the adoption of an image Jacobian matrix, was originally proposed in [13]. The task error χ e(k) in the n-dimensional workspace can be described as: where χ P E (k) and χ P T (k) represent the end-effector and target pose, respectively.
The m-dimensional feature vector on image frame I is used for task manipulation; image error I e(k) is given by: Where I ħ E , I ħ T belong to the end-effector and target; these variables result from the projection of χ P E (k), χ P T (k) to the image plane, respectively. The Equation (2) is a decision function, when I e(k) = 0, that meaning is actually equivalent to robot achieving the task manipulation, i.e., χ e(k) = 0.
The association of I e(k) with the differential changes of the end-effector pose χ P E (k) is achieved by assuming linearity through the image Jacobian matrix J(k); this is done through the equation [13]:

The State Equation and Observation Equation
Robot attached visual sensors can be considered an instrumental system whose state vector is formed by concatenations of the row and column elements of the image Jacobian matrix J(k) [24], i.e., The robotic state-space model is a linear discrete-time dynamical system according to: where is the state vector, is the model noise with zero mean, and variances is Q(k) , is the observation noise vector, is the output vector, given by: The Kalman-Bucy filter (KBF) [29] is a minimum-variance state estimator for linear dynamic systems with Gaussian white noise sequences; the KBF has been used for real-time state estimation [30], as well as for solving the Jacobina matrix online estimation problem [24]. In most practices, however, the observation noises (V (k) ) may be compound, and statistically correlated with model noises (W (k) ), rather than being simple white noise. In the next section, we provide a robust Kalman filtering with statistically correlated noises for image Jacobian estimation task.

Robust KF for Jacobian Estimation
For the universal environment, the observation noise vector (V (k) ) that meets the Markov chain model, given by: where, ψ (k − 1) is the coefficient matrix, η (k) is the Gaussian white noise sequence with zero mean, and the variance is R (k) .
Based upon Equations (6), (7) and (10), we obtain the observation value of system state at k + 1 time, as follows: Equation (11) can be rewritten as: Equation (12) can be considered as an equivalent observation equation to replace Equation (7); the statistic characteristics of W (k) and V * (k) are as follows: Equation (13) shows that the observation noise (V * (k) ) is a Gaussian white noise sequence with zero mean, the variance is R * (k) . V * (k) is statistically correlated with the model noise W (k) , and the cross-covariance matrix is S (k) . In this case, the application of traditional KF methods was limited. To solve this problem, we introduced a filtering-revise-vector ρ (k) , such that state Equation (6) can be written as: In order to eliminate the relevance between W * (k) and V * (k), we let , i.e.: Thus, the filtering-revised-vector is: Equations (12) and (14) can be considered as a equivalent observation equation and system state equation, respectively. According to the optimal estimation theory [29], the robust Kalman filtering model of statistically correlated noises can be derived as follows: (1) Estimation equations: the one-step state estimation equation and the variance matrix of estimation error are given by: (2) Updating equations: the state filtering equation, filtering gain, and variance matrix for filtering error are as follows: (18) As shown in Figure 2, a robust KF has two basic processes, which are the estimation and the observation updates. The estimation process uses prior-state information and a state model to estimate the current state. The observation update uses the discrepancy between the estimated value and the observed value. Figure 2. The structure of a robust KF for Jacobian matrix estimation. We introduce a filtering-revise-vector ρ (k) to improve the robustness of the filter's performance with respect to universal dynamic noises.

A Global-State-Space IBVS Scheme for Robotic Manipulation
In this section, a novel global-state-space IBVS method is proposed, which attaches the robust KF Cooperate with Elman neural network learning techniques, so as to enable robot stability manipulation in global-state-space.

Elman Neural Network for Jacobian Approximate Leaning
The original Elman neural network (ENN) was proposed by Elman in 1990 [31]. ENN is a typical dynamic recurrent network, with particular context nodes (other than the common input layer): the hidden and output layers. ENN's biggest advantage is that its context nodes may be used to memorize the previous activations of the hidden nodes; this makes ENN applicable in the fields of dynamic system identification and prediction control. In this paper, as shown in Figure 3, the framework of feedback ENN with its three layer structure is selected; the framework consists of the input vector S(k), output vector J(k), connection weights W Li (where i = 1, 2, …, 5), the nodal activation functions, h(•), g(•), f(•), and the thresholds, α I , α j , α k , which correspond to the input layer, hidden layer, and output layer, respectively. The mathematical of the ENN is described as follows [32]: Input layers: Hidden layers: Output layers: where the activation function of input layer h(x) and hidden layer g(x) are taken as a sigmoidal function: The output layer activation function f(x) is taken as a linear function, such that Equation (21) is written as: The learning algorithm is applied to determine the neural network structure. The goal of training is to obtain the convergence of the connection weights W Li (i = 1, 2, …, 5) of the input layer to hidden layer, and the hidden layer to output layer. Where 600 samples are used to train the ENN, the input

Input layer
Hidden layer Output layer samples to the ENN are the image feature vector (S'(k) = [s 1 (k), s 2 (k), …, s 600 (k)] 8 × 600 ), and the output samples of the network are the Jacobian matrix (J'(k) = [x 1 (k), x 2 (k),…, x 600 (k)] 48 × 600 ). The learning laws with the gradient descent method [32], and the training result (approximately at epoch 107), reach the best validation performances, and the minimum sum squared error (MSE) is 1.3. It is seen that the test output of the network is not very close to the training output, which means that the actual end-effector positioning is approximately close to the desired target; this is due to the approximation error present in offline learning.

Global-State-Space IBVS Scheme
The schematic of the proposed global-state-space IBVS is shown in Figure 4. The goal is to extract image features by vision sensor to estimate, online, the desired Jacobian Ĵ(k), so as to drive the end-effector to reach the desired pose, as employed by a control law. The scheme consists of the following steps: (i) First, the Jacobian for the global mapping relationship between the vision space and the robot workspace is approximated using the ENN. As mentioned in last section, it is not necessary to have a very low approximation error during offline training. But in the testing phase, the supposed desired Jacobian Ĵ(k), with respect to the desired end-effector pose, is estimated by the robust KF (as shown in the dashed box in Figure 4). The ENN's weights were undated and use the gradient descent algorithm to minimize output error e(k) for a given input image features vector S(k − 1).
. The update law of the connecting weight is given by: where η is the learning rate.
(ii) The initial state of the robot is very important to robot stability manipulation. The authors in [33] use a fuzzy neural network to provide this initial join guess. In [34], the authors start from the current robot configuration and in [27], a KSOM-based sub-clustering network is used to provide an initial join guess to the inversion algorithm. For the initial Jacobian guess, the common method involves introducing the robotic probe moving at the neighborhood of its initial position n times , and observing the corresponding features displacements in the image frame. The initial Jacobian matrix could then be obtained reasonably through [24]: In contrast, in this paper, at the initial time k = 0, the initial image features S(0) captured by the vision sensor are used as an input for ENN. After, the ENN outputs an initial Jacobian guess J(0), which can be considered the initial state X(0) of the robot. Therefore, our scheme is more flexible since it avoids introducing complex probe motions.
(iii) The robust KF estimator (mentioned in Section 3.2) is used to estimate the desired Jacobian Ĵ(k) through the following steps: Step 1 robot state initialization as Step (ii).
Step 5 The best state estimate is obtained using the robust KF Equations (17) and (18) (iv) The control law should be employed to drive the robot from its present pose to its desired pose. The image error, as shown in Equation (2) at the time instant k, is rewritten as: where S * (k) is the desired image feature, and is the estimate value of the image features. Let χ P E (k) = U(k); then, according to Equation (3), the control law is formed by: where λ is the control rate, and Ĵ + (k) is the inverse Jacobian matrix.

The State Definition of Robotic System
The simulation data consist of four-feature-points, which are used for robot manipulation. The desired features vector S * (k) does not change over time, and therefore can be calculated before the main control loop of the experiment. On the other hand, the feature vector S(k) is not constant due to the camera motion and is obtained in each time instant. The features vector is obtained through:   (29) where . Supposing the linear velocity of the end-effector is V(k) = [v x , v y , v z ] T , the angular velocity is W(k) = [w x , w y , w z ] T , and the robotic control variable is U(k) = [V(k), W(k)] T 6 × 1 . According to Equation (4), the 8 × 6 Jacobian matrix is given by: According to Equation (5) (31) where j ik refers to the ith row and kth column of J(k).

Simulation Evaluation
An eye-in-hand simulation environment was set up to test the performance of the proposed method. Camera movement covers linear, rotational movement, as well as the combination of the translational and rotational movements in the workspace. In simulation, we consider only two difficult tasks, which consist of four cases (combined movement and rotation movement). The evaluation goals of Cases 1 and 2 are test the feature trajectories and the camera trajectory performances of PBVS, IBVS, and our method. Cases 3 and 4 test the performance's global stability and robustness for both KBF and our method.
Case 1 involves a combination of the translational and rotational movements of the camera. The results are illustrated in Figure 5a. The feature trajectories on the image plane, obtained through our method, are constrained on the camera field-of-view (FOV), but the PBVS method for this task the feature trajectories easy leaves the FOV (Figure 6a). Additionally, Figure 5b shows that the camera trajectory in the Cartesian space obtained by our method moves on a seemingly straight line from the initial pose toward the desired pose. This result is similar to that obtained by the PBVS method, as illustrated in Figure 6b. The IBVS method for the same task, the feature trajectories are constrained by the controller to move on straight lines from the initial toward desired feature points (Figure 7a), but the camera motion in the Cartesian space (Figure 7b) becomes slight odd, i.e., the camera undergoes an abrupt movement, that is not perfect compared with our method.
Case 2 involves a pure camera rotational movement around the optical axis. The feature points obtained by our method (Figure 8a) move in almost straight lines on the image plane; this movement is very similar to those obtained by the IBVS method. This is because our method is similar to IBVS in that we increase redundant manipulation on the Y-axis to minimize the image feature trajectories ( Figure 8b). Additionally, as illustrated in Figure 8c, the PBVS method converge to the desired pose as the image feature points are constrained by the controller to move on circular trajectories. In such circumstances, the feature points almost leave the FOV.
Therefore, compared with the IBVS and PBVS methods, our approach utilizes the advantages of the PBVS method to improve the camera motion trajectory, and takes advantages of the IBVS method to constrain the feature trajectories to avoid situations where features points leave the FOV.  Furthermore, because for PBVS, planar homography is used for pose estimation of the object with respect to the end-effector, and because IBVS is associated with Jacobian computation, both methods are sensitive to camera calibration error. In works [35], the influenceof the calibration parameters on the system is investigated. One solution to this problem uses intelligent hybrid control laws (as proposed in [19]) by introducing neural network reinforcement learning. In our method, the Jacobian is an online estimation, which has nothing to do with camera calibration and system modeling errors. Therefore, our method avoids the corrupted performances caused by calibration and modeling errors.
Case 3 deals with dynamic performances in a noisy environment. This case examines the stability of our method, in comparison with the KBF method [21], in a harsh noisy environment. At each sampling instant, uniformly distributed random noises are added to the control system. A Gaussian noise sequence with zero mean, where the variances are 1 × 10 −3 , were added to the state and observation models (the additive noises shown in Figure 4), respectively. The results, shown in Figure 9, are compared with KBF's. In the former, the camera movement and feature trajectories are smoother and more preferable. When the variance is 9 × 10 −3 , the camera pose in the Cartesian space and the feature trajectories on the image plane for this situation are given in Figure 10. It is obvious that the results of the KBF method are seriously deteriorated by the additive noises; the task goals of reaching the desired position and zeroing the errors are accomplished, but the camera engages in unnecessary retreat, and the image features in the image plane almost leave the FOV. Not only does our method enable convergence on the desired pose, it also robustly performs in the presence of noise. On the other hand, when the noise variances are around 1 × 10 −2 , The KBF method failure to have end-effector positioning, i.e., the camera randomly moves in Cartesian space and the image features in image plan are unknown. In contrast, our method always converges to the desired pose without camera retreat. It is clear that our method has global stability irrespective of the presence of noise. Case 4 deals with dynamic performances due to system destabilization. During the actual robot manipulation, the statistical characteristics of model noise and observation noise are variational, which leads to system destabilization. For simplicity, but without loss of generality, we consider the statistical parameters Q(k) and R * (k) (shown in Figure 2) are classical situation, and then a challenging The dynamic performance of the KBF method is shown in Figure 11. Here, the image features leave the FOV (Figure 11a), and the velocity of the camera does not zero (Figure 11d). As a result, the camera pose fails to converge toward the desired pose at the steady state. For the same case, the performance of our method is presented in Figure 12. It can be seen that the local minima problem of the KBF method has been avoided, and the convergence of both the camera pose and the image features toward their desired targets has been achieved.
On the other hand, different values of Q(k) and R(k) are considered for different simulations. The results tell us that when the covariance of noises changes slightly, the results of the KBF method will alter largely, and even lose their tracking ability (due to serious camera retreat). When the robust KF was selected to cooperate with ENN learning, our method always worked well, even though the Q(k) and R(k) were changing in a larger region. This means that the proposed global-state-space IBVS is the robust despite system destabilization.  (c) (d)

Experimental Results and Discussion
The experimental results have been carried out using eye-in-hand configurations (Figure 13a). The task is the online dynamic estimation of a Jacobian J(k) so as to drive the end-effector from the initial pose (Figure 13b) to the desired pose (Figure 13c). Our VS system consists of a DENSO RC7M-VSG6BA robotic controller, a computer with an Intel Corei5 2.67-GHz CPU, 4GBs of RAM for image processing (The robotic controller and image processing computer can communicate through RS232C serial interface), and a DENSO VS-6556GM six-DoF robotic manipulator with a Basler scA1300-32fm/fc camera mounted at its end-effector. The object is an A4 paper with four black-colored small circular disks on it. The object images are captured by the camera at a rate of 30 Hz. The resolution is 640 × 480, and the center points of the small circular disks are used as feature points.   (c) (d) The experimental results are shown in Figures 14-16. For Case 1 and Case 3, the camera involves rotational movement with low rotational angle values (due to the limitation of the joints angle and the workspace). The results of the image error are minimized, with the initial features converging toward the desired features. The feature trajectories are almost straight lines on the image plane, and the steady-state errors in the image space are in order of 4 pixels for Case 1 and Case 3, while the errors are less than 6 pixels for Case 2. In addition, the camera trajectory in the Cartesian space obtained by our method show that all the tasks of the three cases are completed with global stability without camera retreat. Moreover, the image features remain within the FOV.  In the experiment, our approach direct control of the feature points on the image plane for robotic task manipulation, in other words, the proposed method of observing the changing of features on image plane direct control to the robot converging toward the desired pose with six-degree-of-freedom. As described in Section 2, if the feature errors converge toward zero that meaning is the robot successful achieve the task manipulation. Therefore, the experimental results can validate the proposed method.

Conclusions
In this work, a new global-state-space IBVS scheme for uncalibrated model-independent robot manipulation in an eye-in-hand configuration is discussed. Here, a robust KF cooperates with ENN learning techniques so as to ensure robust stability in global-state-space with respect to image features within the camera field-of-view (FOV). Also, the image Jacobian matrix is estimated without requiring camera parameters and depth information. Therefore, our method avoids the corrupted performances caused by calibration and modeling errors. Through various simulation and experimental results through IBVS, PBVS, KBF and our method, we have shown that our approach takes the advantages of PBVS to improve the camera moving trajectory, and takes the advantages of IBVS to avoid the loss of image features. Finally, in comparison with the KBF method, our method is robust despite outside and system noises.